Libbash will enable programs to parse and interpret shell scripts directly through an Abstract Syntax Tree (AST) instead of relying on regular expressions. Most bash 3.2 syntax will be supported. This will benefit programs both inside and outside Gentoo, including Portage/Paludis and repoman.
You can refer to qiaomuf’s blog or the gentoo-soc mailing list for the latest progress. We already have an introduction with benchmarks for egencache, instruo, and instruo reimplemented with libbash.
The main parts of the project can be illustrated as follows:

bashast/bashast.g is the ANTLR grammar for the lexer and parser. It contains the logic needed to build an Abstract Syntax Tree (AST) from a given shell script. You can use the AST to validate the script according to your domain logic. For example, repoman could be reimplemented to do its QA checks on the AST instead of relying on complicated regular expressions. Our tree walker uses the AST to execute actions.
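To make the idea of validating a script through its AST more concrete, here is a minimal sketch. The `Node` struct, the node type names ("LIST", "COMMAND", "WORD"), and the helper functions are all hypothetical and exist only for illustration; the tree that the ANTLR grammar actually produces has its own node types and layout.

```cpp
#include <string>
#include <vector>

// Hypothetical AST node, NOT libbash's real tree representation.
struct Node {
    std::string type;            // e.g. "LIST", "COMMAND", "WORD" (assumed names)
    std::string text;            // token text for leaf nodes
    std::vector<Node> children;  // sub-trees, in source order
};

// Walk the tree depth-first and record the name of every command invoked.
// By assumption, the first child of a COMMAND node holds the command name.
void collect_commands(const Node& node, std::vector<std::string>& out) {
    if (node.type == "COMMAND" && !node.children.empty())
        out.push_back(node.children.front().text);
    for (const Node& child : node.children)
        collect_commands(child, out);
}

// A domain rule in the spirit of a QA check: does the script call `name`?
bool calls_command(const Node& root, const std::string& name) {
    std::vector<std::string> commands;
    collect_commands(root, commands);
    for (const std::string& command : commands)
        if (command == name) return true;
    return false;
}

// Hand-built tree for the script: mv a b || die
Node example_tree() {
    return Node{"LIST", "||", {
        Node{"COMMAND", "", {Node{"WORD", "mv", {}},
                             Node{"WORD", "a", {}},
                             Node{"WORD", "b", {}}}},
        Node{"COMMAND", "", {Node{"WORD", "die", {}}}}
    }};
}
```

With this tree, `calls_command(example_tree(), "die")` is true and `calls_command(example_tree(), "exec")` is false. The check looks only at tree structure, so a word such as `die` inside a string or comment would not match by accident, which is the kind of case a regular expression tends to get wrong.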
A tree walker extracts information from the AST and computes ancillary data structures. The walker contains the following:
The project is a C++ shared library. Because the library is implemented in C++, we provide a C++ interface first. In the future, I’ll provide a Python binding.
The license is the GNU General Public License, version 2 or later. We agree to change the license to LGPL if GPL-2 becomes unsuitable for the library in the future.
Bash uses GPL-3 but Portage uses GPL-2. Those two are not compatible.
We don't have enough time :). Bash uses more than 150,000 lines of code for its implementation, and we won’t cut out many bash features.
boost::spirit is a good option, but here are some points that speak for ANTLR:
The goal for this summer is to support metadata generation. Grey-colored features won’t be implemented this summer. Green-colored features are supported now. Blue-colored features are partly supported now.
Feature Name | Details | Usage | Estimated Amount of work | Remark |
Quoting | Escape Characters; Single, Double Quotes; ANSI-C Quoting | \r, \t, \\, "", '' | * | Locale Translation won’t be implemented |
Comments | Comments | # blah, #!/bin/sh | * | |
Pipelines | | |, |& | ** | |
Lists of Commands | | mv a b || die | ** | |
Compound Commands | Looping Constructs; Conditional Constructs; Command Grouping | until, while, for, if, case, (), {} | **** | |
Coprocesses | Won’t be implemented | Won’t be implemented | | |
Shell functions | | | ** | |
Shell Parameters | Positional Parameters; Special Parameters | $0, $@ | *** | |
Brace Expansion | | echo a{d,c,b}e | * | Not required for metadata generation. |
Tilde Expansion | Won’t be implemented | ~/foo, ~+/foo | | Not required for metadata generation, rarely used in ebuilds and eclasses |
Shell Parameter Expansion | | ${parameter:-word}, ${!name[@]}, ${parameter,,pattern} | ** | Depends on Pattern Matching |
Command Substitution | | $(command), `command` | * | |
Arithmetic Expansion | | $(( expression )), let expression | ** | |
Process Substitution | Won’t be implemented | Won’t be implemented | | |
Word Splitting | | | ** | |
Pattern Matching | | *.*, e?, [:alnum:] | **** | Pattern matching can be used by other features |
Filename Expansion | | | ** | Not required for metadata generation, depends on Pattern Matching |
Quote Removal | After all the above expansions | | * | |
Redirection | Redirecting input, output, standard input, standard output; Appending Redirected Output, standard input, standard output; Here Documents and Strings; Duplicating, Moving, Opening File Descriptors | >word, >>word, 2>&1 | **** | Not all features are required for metadata generation |
Executing Commands | Simple Command Expansion; Command Search and Execution; Command Execution Environment; Environment; Exit Status; Signal | | ***** | Executing external commands is not required for metadata generation. Signals will be ignored. |
Bash Conditional Expressions | | -a, -b, -x | *** | Use with [[ or test |
Regular Expression | | | ** | Not required for metadata generation. |
Arrays | | declare -a name, name=(v1 .. vn) | ** | |
Directory Stack | | dirs, popd, pushd | *** | Use with Directory Stack built-in. Not required for metadata generation. |
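As an illustration of one of the smaller features listed above, the following sketch expands a word such as `a{d,c,b}e` the way brace expansion does. It is a simplified stand-in written for this page, not libbash's implementation: the function name `brace_expand` is hypothetical, and nested braces, `{1..5}` sequences, quoting, and escapes are deliberately not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Expand the first {x,y,...} group in `word`, then recurse on what follows.
// Simplifying assumptions: no nesting, no sequences, no quoting or escapes.
std::vector<std::string> brace_expand(const std::string& word) {
    const std::size_t npos = std::string::npos;
    const std::size_t open = word.find('{');
    const std::size_t close = (open == npos) ? npos : word.find('}', open);

    // Without a closed group that contains a comma, leave the word unchanged.
    if (close == npos || word.find(',', open) >= close)
        return {word};

    const std::string prefix = word.substr(0, open);
    const std::string body = word.substr(open + 1, close - open - 1);
    const std::string suffix = word.substr(close + 1);

    std::vector<std::string> result;
    std::size_t start = 0;
    while (true) {
        // Take the next comma-separated alternative from the group body.
        const std::size_t comma = body.find(',', start);
        const std::string alternative =
            body.substr(start, comma == npos ? npos : comma - start);
        // Combine it with every expansion of the remaining text.
        for (const std::string& tail : brace_expand(suffix))
            result.push_back(prefix + alternative + tail);
        if (comma == npos) break;
        start = comma + 1;
    }
    return result;
}
```

For the table's example, `brace_expand("a{d,c,b}e")` yields `ade`, `ace`, `abe` in that order, matching the left-to-right order bash preserves. A word without a complete group, such as `{a}`, comes back unchanged.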
Command | Used in eclass | Used in ebuild | Estimated amount of work | Remark |
: | No | Yes | * | Do nothing |
. | No | Yes | ** | A synonym for source. |
break | Yes | Yes | * | |
cd | Yes | Yes | *** | Not required for metadata generation. |
continue | Yes | Yes | * | |
eval | Yes | Yes | **** | |
exec | No | No | | Replace current process with the new command |
exit | Yes | Yes | * | Not required for metadata generation. |
export | Yes | Yes | * | |
getopts | No | No | | Used by shell scripts to parse positional parameters |
hash | No | No | | Remember the full pathnames of commands |
pwd | Yes | Yes | * | Not required for metadata generation. |
readonly | Yes, only once | No | * | Not required for metadata generation. |
return | Yes | Yes | * | |
shift | Yes | Yes | * | |
test | Yes | Yes | ** | Use with Bash Conditional Expressions |
times | No | No | | |
trap | No | Yes | *** | The commands in arg are to be read and executed when the shell receives signal sigspec. |
umask | Yes | Yes | ** | Not required for metadata generation. |
unset | Yes | Yes | * | |
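One way an interpreter can organize built-ins like those in the table above is a lookup table from name to handler. The sketch below is only an assumption about how that could look, not a description of libbash's design: `Args`, `Builtin`, `make_builtins`, and `run_builtin` are hypothetical names, and the two handlers are toy versions (for instance, the real `shift` accepts a count argument).

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

using Args = std::vector<std::string>;       // the shell's positional parameters
using Builtin = std::function<int(Args&)>;   // a handler returns an exit status

// Hypothetical registry covering two rows of the table.
std::map<std::string, Builtin> make_builtins() {
    std::map<std::string, Builtin> table;
    // ":" does nothing and always succeeds.
    table[":"] = [](Args&) { return 0; };
    // Toy "shift": drop the first positional parameter, fail if there is none.
    table["shift"] = [](Args& positional) {
        if (positional.empty()) return 1;
        positional.erase(positional.begin());
        return 0;
    };
    return table;
}

// Look up and run a built-in by name. Returning 127 for an unknown name
// mirrors the shell's "command not found" exit status.
int run_builtin(const std::map<std::string, Builtin>& table,
                const std::string& name, Args& positional) {
    const auto it = table.find(name);
    if (it == table.end()) return 127;
    return it->second(positional);
}
```

A table like this keeps each built-in isolated, so built-ins marked as unused in ebuilds and eclasses can simply be left unregistered.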
Command | Used in eclass | Used in ebuild | Estimated amount of work | Remark |
alias | No | Yes | ** | |
bind | No | No | | |
builtin | No | No | | Run a shell builtin, passing it args, and return its exit status. |
caller | No | No | | Returns the context of any active subroutine call. |
command | No | No | | Runs command with arguments ignoring any shell function named command. |
declare | Yes | Yes | *** | Not required for metadata generation. |
echo | Yes | Yes | ** | |
enable | No | No | | Enable and disable built-in shell commands. |
help | No | No | | |
let | Yes | Yes | * | |
local | Yes | Yes | * | |
logout | No | No | | |
mapfile | No | No | | Read lines from the standard input into the indexed array variable array, or from file descriptor fd if the -u option is supplied. |
printf | Yes | Yes | * | Not required for metadata generation. |
read | Yes | Yes | **** | |
readarray | No | No | | A synonym for mapfile. |
source | Yes | Yes | ** | |
type | Yes | Yes | *** | Not required for metadata generation. |
typeset | Yes | Yes | | Deprecated in favor of the declare built-in command. |
ulimit | No | No | | |
unalias | No | No | | |
Command | Used in eclass | Used in ebuild | Estimated amount of work | Remark |
set | Yes | Yes | **** | |
shopt | Yes | Yes | **** | We probably won’t support many options |
* : The estimated amount of work is based on the number of lines of code that GNU bash uses to implement the corresponding built-in.