Welcome to libbash

Introduction

Libbash will enable programs to parse and interpret shell scripts directly through an Abstract Syntax Tree (AST) instead of relying on regular expressions. Most bash 3.2 syntax will be supported. This will be a great benefit to programs both outside and inside Gentoo, including Portage/Paludis and repoman.

Current progress

You can refer to qiaomuf’s blog or the gentoo-soc mailing list for the latest progress. We already have an introduction with benchmarks for egencache, instruo, and instruo reimplemented with libbash.

Resources

  1. Canonical repository
  2. Blog
  3. Ohloh page
  4. betelgeuse’s overlay (contains one dependency of the library: antlr-c)

Design

Rationale

The main parts of the project can be illustrated as follows:

Lexer & Parser

bashast/bashast.g is the ANTLR grammar for the lexer and parser. It contains the logic needed to build an Abstract Syntax Tree (AST) from a given shell script. You can use the AST to validate the script according to your domain logic. For example, repoman could be reimplemented to do its QA checks on the AST instead of with complicated regular expressions. Our tree walker uses the AST to execute actions.
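
To make the contrast with regular expressions concrete, here is a minimal Python sketch. It is not libbash's grammar or API: Python's shlex module stands in for the ANTLR lexer, the two node classes are hypothetical, and oldhelper is a made-up command name. It only shows that a check over command nodes ignores comments and quoted strings, which a plain regular expression cannot tell apart.

```python
import re
import shlex
from dataclasses import dataclass


# Hypothetical node types for a toy AST; libbash's real tree comes from bashast.g.
@dataclass
class Assignment:
    name: str
    value: str


@dataclass
class Command:
    name: str
    args: list


ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=(.*)", re.S)


def parse_script(text):
    """Build one node per line: a lone NAME=value word or a simple command."""
    nodes = []
    for line in text.splitlines():
        # shlex drops comments and resolves quotes (a stand-in for a real lexer).
        words = shlex.split(line, comments=True)
        if not words:
            continue
        match = ASSIGNMENT.fullmatch(words[0])
        if match and len(words) == 1:
            nodes.append(Assignment(match.group(1), match.group(2)))
        else:
            nodes.append(Command(words[0], words[1:]))
    return nodes


SCRIPT = '''# oldhelper is banned
DESCRIPTION="Never call oldhelper directly"
oldhelper -i file
einfo "oldhelper finished"
'''

# Regex check: counts every textual occurrence, including the comment and strings.
regex_hits = re.findall(r"\boldhelper\b", SCRIPT)

# AST check: only nodes that really invoke the command.
ast_hits = [node for node in parse_script(SCRIPT)
            if isinstance(node, Command) and node.name == "oldhelper"]

print(len(regex_hits), len(ast_hits))
```

On this input the regular expression reports four hits while the AST check reports one, the actual invocation on the third line.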

Tree Walker

A tree walker extracts information from the AST and computes ancillary data structures. The walker contains the following:

  1. A tree parser grammar. It contains the grammar for the tree parser and embedded code to run the logic.
  2. Ancillary data structures. These implement the data structures the walker needs, e.g. a symbol table.
  3. Implementation of bash logic and features, including most bash 3.1 features. They are listed at the end of the page.
  4. Implementation of bash built-ins. Not all of the built-ins are required for our project (bind, for example). They are listed at the end of the page.
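
The walker's parts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not libbash's tree grammar: the tuple-based node shapes (LIST, ASSIGN, STRING, LITERAL, VAR_REF), the SymbolTable class, and the walk function are all hypothetical. The tree is built by hand to stand in for what the parser would produce.

```python
class SymbolTable:
    """Ancillary structure: maps variable names to string values."""

    def __init__(self):
        self._values = {}

    def define(self, name, value):
        self._values[name] = value

    def resolve(self, name):
        # Like bash, an unset variable expands to the empty string.
        return self._values.get(name, "")


def walk(node, symbols):
    """Visit one node of a tuple-based toy AST and return its string value."""
    kind = node[0]
    if kind == "LIST":        # ("LIST", [child, ...])
        for child in node[1]:
            walk(child, symbols)
        return ""
    if kind == "ASSIGN":      # ("ASSIGN", name, value_node)
        symbols.define(node[1], walk(node[2], symbols))
        return ""
    if kind == "STRING":      # ("STRING", [part, ...])
        return "".join(walk(part, symbols) for part in node[1])
    if kind == "LITERAL":     # ("LITERAL", text)
        return node[1]
    if kind == "VAR_REF":     # ("VAR_REF", name)
        return symbols.resolve(node[1])
    raise ValueError(f"unknown node kind: {kind}")


# Hand-built tree for:  PN=libbash; PV=0.1; P="${PN}-${PV}"
TREE = ("LIST", [
    ("ASSIGN", "PN", ("LITERAL", "libbash")),
    ("ASSIGN", "PV", ("LITERAL", "0.1")),
    ("ASSIGN", "P", ("STRING", [
        ("VAR_REF", "PN"), ("LITERAL", "-"), ("VAR_REF", "PV"),
    ])),
])

symbols = SymbolTable()
walk(TREE, symbols)
print(symbols.resolve("P"))  # libbash-0.1
```

After the walk, the symbol table holds the values a metadata generator would read back, without any external command having been executed.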

Public Interface

The project is a C++ shared library. Because the library is implemented in C++, we provide a C++ interface first. In the future, I’ll provide a Python binding.

License

The license is the GNU General Public License, version 2 or later. We agree to change the license to the LGPL if GPL-2 becomes unsuitable for the library in the future.

Q&A

  1. Why not just use the source code of GNU bash?

Bash uses GPL-3 but Portage uses GPL-2. Those two are not compatible.

  2. Why not use C?

We don't have enough time :). Bash's implementation is more than 150000 lines of code, and we won’t drop many bash features.

  3. As you’ve already used boost, why not just use boost::spirit to implement the parser?

boost::spirit is a good option, but here are some points that speak for ANTLR:

  1. There’s a graphical tool called antlrworks that helps develop the grammar.
  2. Out-of-the-box testing and debugging tools (gUnit + antlrworks).
  3. The same grammar can generate parsers in multiple languages, so other projects based on libbash can generate, for example, a Python parser.
  4. Out-of-the-box tools to build ASTs and work with them.

Appendix

The goal of this summer is to support metadata generation. Grey-colored features won’t be implemented this summer. Green-colored features are supported now. Blue-colored features are partly supported now.

Bash Shell Features

| Feature Name | Details | Usage | Estimated amount of work | Remark |
|---|---|---|---|---|
| Quoting | Escape Characters; Single, Double Quotes; ANSI-C Quoting | \r, \t, \\, "", '' | * | Locale Translation won’t be implemented |
| Comments | Comments | # blah, #!/bin/sh | * | |
| Pipelines | | \|, \|& | ** | |
| Lists of Commands | | mv a b \|\| die | ** | |
| Compound Commands | Looping Constructs; Conditional Constructs; Command Grouping | until, while, for, if, case, (), {} | **** | |
| Coprocesses | Won’t be implemented | | | Won’t be implemented |
| Shell functions | | | ** | |
| Shell Parameters | Positional Parameters; Special Parameters | $0, $@ | *** | |
| Brace Expansion | | echo a{d,c,b}e | * | Not required for metadata generation. |
| Tilde Expansion | Won’t be implemented | ~/foo, ~+/foo | | Not required for metadata generation, rarely used in ebuilds and eclasses |
| Shell Parameter Expansion | | ${parameter:-word}, ${!name[@]}, ${parameter,,pattern} | ** | Depends on Pattern Matching |
| Command Substitution | | $(command), `command` | * | |
| Arithmetic Expansion | | $(( expression )), let expression | ** | |
| Process Substitution | Won’t be implemented | | | Won’t be implemented |
| Word Splitting | | | ** | |
| Pattern Matching | | *.*, e?, [:alnum:] | **** | Pattern matching can be used by other features |
| Filename Expansion | | | ** | Not required for metadata generation, depends on Pattern Matching |
| Quote Removal | After all the above expansions | | * | |
| Redirection | Redirecting input, output, standard input, standard output; Appending Redirected Output, standard input, standard output; Here Documents and Strings; Duplicating, Moving, Opening File Descriptors | >word, >>word, 2>&1 | **** | Not all features are required for metadata generation |
| Executing Commands | Simple Command Expansion; Command Search and Execution; Command Execution Environment; Environment; Exit Status; Signal | | ***** | Executing external commands is not required for metadata generation. Signals will be ignored. |
| Bash Conditional Expressions | | -a -b -x | *** | Use with [[ or test |
| Regular Expression | | | ** | Not required for metadata generation. |
| Arrays | | declare -a name, name=(v1 .. vn) | ** | |
| Directory Stack | | dirs, popd, pushd | *** | Use with Directory Stack built-in. Not required for metadata generation. |
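
The remark that Shell Parameter Expansion depends on Pattern Matching can be illustrated with a short Python sketch. The function names are hypothetical, and Python's fnmatch module is only a stand-in for bash pattern matching (the two differ in details); this is not how libbash implements either feature.

```python
from fnmatch import fnmatchcase


def expand_default(variables, name, word):
    """${name:-word}: use word when name is unset or empty."""
    value = variables.get(name, "")
    return value if value else word


def lowercase_matching(value, pattern="?"):
    """${name,,pattern}: lower-case each character that matches pattern.

    An omitted pattern behaves like '?', which matches every character.
    """
    return "".join(
        char.lower() if fnmatchcase(char, pattern) else char
        for char in value
    )


variables = {"PN": "LibBash"}
print(expand_default(variables, "ROOT", "/"))        # /
print(lowercase_matching(variables["PN"], "[A-K]"))  # Libbash
print(lowercase_matching(variables["PN"]))           # libbash
```

The second expansion cannot be written without a pattern matcher, which is why the pattern-matching feature has to exist before this form of parameter expansion can be completed.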

Built-ins

Bourne Shell built-ins

| Command | Used in eclass | Used in ebuild | Estimated amount of work | Remark |
|---|---|---|---|---|
| : | No | Yes | * | Do nothing |
| . | No | Yes | ** | A synonym for source. |
| break | Yes | Yes | * | |
| cd | Yes | Yes | *** | Not required for metadata generation. |
| continue | Yes | Yes | * | |
| eval | Yes | Yes | **** | |
| exec | No | No | | Replace current process with the new command |
| exit | Yes | Yes | * | Not required for metadata generation. |
| export | Yes | Yes | * | |
| getopts | No | No | | Used by shell scripts to parse positional parameters |
| hash | No | No | | Remember the full pathnames of commands |
| pwd | Yes | Yes | * | Not required for metadata generation. |
| readonly | Yes, only once | No | * | Not required for metadata generation. |
| return | Yes | Yes | * | |
| shift | Yes | Yes | * | |
| test | Yes | Yes | ** | Use with Bash Conditional Expressions |
| times | No | No | | |
| trap | No | Yes | *** | The commands in arg are to be read and executed when the shell receives signal sigspec. |
| umask | Yes | Yes | ** | Not required for metadata generation. |
| unset | Yes | Yes | * | |
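
One way a walker could hand commands to built-in implementations is a name-to-function table. The Python sketch below is an assumption for illustration only: the state dictionary, the function names, and the dispatch table are hypothetical and do not describe libbash's C++ design. It covers three of the cheapest built-ins from the table above.

```python
def builtin_colon(state, args):
    """':' does nothing and succeeds."""
    return 0


def builtin_shift(state, args):
    """'shift [n]' drops the first n positional parameters (default 1)."""
    count = int(args[0]) if args else 1
    if count < 0 or count > len(state["positional"]):
        return 1  # out of range: fail and leave the parameters untouched
    del state["positional"][:count]
    return 0


def builtin_unset(state, args):
    """'unset name...' removes each named variable."""
    for name in args:
        state["variables"].pop(name, None)
    return 0


BUILTINS = {":": builtin_colon, "shift": builtin_shift, "unset": builtin_unset}


def run_builtin(state, name, args):
    """Dispatch by name; 127 mirrors the shell's 'command not found' status."""
    handler = BUILTINS.get(name)
    return handler(state, args) if handler else 127


state = {"positional": ["a", "b", "c"], "variables": {"S": "/tmp/work"}}
print(run_builtin(state, "shift", []), state["positional"])
```

Built-ins that the project does not need, such as bind, are simply absent from the table, so leaving one out requires no special handling.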

Bash built-in Commands

| Command | Used in eclass | Used in ebuild | Estimated amount of work | Remark |
|---|---|---|---|---|
| alias | No | Yes | ** | |
| bind | No | No | | |
| builtin | No | No | | Run a shell builtin, passing it args, and return its exit status. |
| caller | No | No | | Returns the context of any active subroutine call. |
| command | No | No | | Runs command with arguments ignoring any shell function named command. |
| declare | Yes | Yes | *** | Not required for metadata generation. |
| echo | Yes | Yes | ** | |
| enable | No | No | | Enable and disable built-in shell commands. |
| help | No | No | | |
| let | Yes | Yes | * | |
| local | Yes | Yes | * | |
| logout | No | No | | |
| mapfile | No | No | | Read lines from the standard input into the indexed array variable array, or from file descriptor fd if the -u option is supplied. |
| printf | Yes | Yes | * | Not required for metadata generation. |
| read | Yes | Yes | **** | |
| readarray | No | No | | A synonym for mapfile. |
| source | Yes | Yes | ** | |
| type | Yes | Yes | *** | Not required for metadata generation. |
| typeset | Yes | Yes | | Deprecated in favor of the declare built-in command. |
| ulimit | No | No | | |
| unalias | No | No | | |

Modifying Shell Behavior

| Command | Used in eclass | Used in ebuild | Estimated amount of work | Remark |
|---|---|---|---|---|
| set | Yes | Yes | **** | |
| shopt | Yes | Yes | **** | We probably won’t support many options |

* : estimated amount of work, based on the number of lines of code that GNU bash uses to implement the corresponding built-in (more asterisks mean more work).

Shell Variables (implemented as needed)
  1. Bourne Shell Variables
  2. Bash Variables