Formal definition of command syntax

This chapter defines the syntax of the shell command language.

Note	Some of the syntactic features described below are not supported in the POSIXly-correct mode.

Tokenization

The characters of the input source code are first delimited into tokens. Tokens are delimited so that each token extends as far as possible before the next one begins (longest match). A sequence of one or more unquoted blank characters delimits a token.

The following tokens are the operator tokens:

& && ( ) ; ;; | || < << <& <( <<- <<< <> > >> >& >( >>| >| (newline)

Note	Unlike in many other programming languages, the newline is an operator token rather than white space.
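
As a rough illustration of the longest-match rule and of blank delimiters, the following sketch uses only portable constructs. The helper name count_args is made up for this example and is not part of the shell.

```shell
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

# No blanks are needed around an operator. "true&&echo" is delimited into
# the word "true", the operator "&&" (longest match, not two "&" tokens),
# and the word "echo".
greedy=$(true&&echo yes)

# Unquoted blanks delimit tokens; quoted blanks do not.
unquoted=$(count_args a   b)
quoted=$(count_args 'a   b')

# A newline is an operator token that ends the command, so the two lines
# below are two separate commands.
lines=$(echo first
echo second)

printf '%s\n' "$greedy" "$unquoted" "$quoted" "$lines"
```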

Characters that are neither blank nor part of an operator compose a word token. Words are parsed by the following parsing expression grammar:

Word: (WordElement / !SpecialChar .)+
WordElement: \ . /
' (!' .)* ' /
" QuoteElement* " /
Parameter /
Arithmetic /
CommandSubstitution
QuoteElement: \ ([$`"\] / <newline>) /
Parameter /
Arithmetic /
CommandSubstitutionQuoted /
![`"\] .
Parameter: $ [@*#?-$! [:digit:]] /
$ PortableName /
$ ParameterBody
PortableName: ![0-9] [0-9 ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_]+
ParameterBody: { ParameterNumber? (ParameterName / ParameterBody / $ ParameterBody / Arithmetic / CommandSubstitution) ParameterIndex? ParameterMatch? }
ParameterNumber: # ![}+=:/%] !([-?#] !})
ParameterName: [@*#?-$!] /
[[:alnum:] _]+
ParameterIndex: [ ParameterIndexWord (, ParameterIndexWord)? ]
ParameterIndexWord: (WordElement / !["'],] .)+
ParameterMatch: :? [-+=?] ParameterMatchWord /
(# / ## / % / %%) ParameterMatchWord /
(:/ / / [#%/]?) ParameterMatchWordNoSlash (/ ParameterMatchWord)?
ParameterMatchWord: (WordElement / !["'}] .)*
ParameterMatchWordNoSlash: (WordElement / !["'/}] .)*
Arithmetic: $(( ArithmeticBody* ))
ArithmeticBody: \ . /
Parameter /
Arithmetic /
CommandSubstitution /
( ArithmeticBody* ) /
![`()] .
CommandSubstitution: $( CompleteProgram ) /
` CommandSubstitutionBody* `
CommandSubstitutionQuoted: $( CompleteProgram ) /
` CommandSubstitutionBodyQuoted* `
CommandSubstitutionBody: \ [$`\] /
!` .
CommandSubstitutionBodyQuoted: \ [$`\"] /
!` .
SpecialChar: [|&;<>()`\"' [:blank:]] / <newline>

The set of terminals of the grammar is the set of characters that can be handled in the environment in which the shell runs (the execution character set), except that the set does not contain the null character ('\0').

Strictly speaking, the definition above is not a complete parsing expression grammar because the rules for CommandSubstitution and CommandSubstitutionQuoted depend on CompleteProgram, which is a non-terminal of the command syntax rather than of the word grammar.
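
The sketch below shows a few words that the grammar above accepts. It is limited to forms that are portable across POSIX-style shells, so ParameterIndex and the slash forms of ParameterMatch are not exercised. All variable names are chosen for illustration only.

```shell
name=world

# One word made of several WordElements: a plain character, a single-quoted
# string, a double-quoted Parameter, an escaped blank, and another character.
word=a'b c'"$name"\ d

# Inside double quotes a backslash is special only before $, `, ", \ and
# newline, so "\a" keeps its backslash while "\$" yields a plain "$".
dq="\a\$"

# Parameter forms: ${#name} gives the length (ParameterNumber);
# :-, #, and %% are ParameterMatch operators.
path=dir/file.tar.gz
unset missing
length=${#name}
fallback=${missing:-default}
base=${path#*/}
stem=${base%%.*}

# Arithmetic may nest parentheses; command substitution has two spellings.
sum=$(( (1 + 2) * 3 ))
modern=$(echo "$name")
legacy=`echo "$name"`

printf '%s\n' "$word" "$dq" "$length" "$fallback" "$base" "$stem" "$sum" "$modern" "$legacy"
```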

Token classification

After a word token is delimited, the token may be further classified as an IO_NUMBER token, reserved word, name word, assignment word, or just normal word. A classification other than normal word applies only where the context in which the word appears allows it. See Tokens and keywords for the list of the reserved words (keywords) and the context in which a word may be recognized as a reserved word.
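
A small sketch of context-dependent classification, assuming a POSIX-style shell: the same spelling is a reserved word where a command may start and a normal word elsewhere. The variable names are illustrative.

```shell
# "if", "then", and "fi" are reserved words only where a command may start.
# As arguments to echo they are classified as normal words.
as_args=$(echo if then fi)

# In command position the same spellings are recognized as reserved words.
as_keywords=no
if true; then as_keywords=yes; fi

printf '%s\n' "$as_args" "$as_keywords"
```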

A token is an IO_NUMBER token if and only if it is composed of digit characters only and is immediately followed by < or >.
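
The effect of this rule can be observed with a redirection to /dev/null. The following is a sketch with illustrative variable names; it assumes /dev/null exists.

```shell
# "2" is all digits and immediately followed by ">", so it is an IO_NUMBER:
# standard error is redirected and echo still prints "hello".
io_number=$(echo hello 2>/dev/null)

# "hello2" is not composed of digits only, so it is a normal word and the
# redirection applies to standard output instead. Nothing is captured.
not_digits=$(echo hello2>/dev/null)

# A blank between "2" and ">" makes "2" a normal word as well.
separated=$(echo 2 >/dev/null)

printf '[%s] [%s] [%s]\n' "$io_number" "$not_digits" "$separated"
```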

An assignment word is a token that starts with a name followed by =:

AssignmentWord: AssignmentPrefix Word
AssignmentPrefix: Name =
Name: ![[:digit:]] [[:alnum:] _]+
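
The following sketch contrasts tokens that are classified as assignment words with tokens that are not. It assumes that sh is available on PATH and that no command named 1x=2 exists; the variable names are illustrative.

```shell
# "greeting" is a valid Name, so this token is an assignment word.
greeting=hello

# An assignment before a command name sets the variable for that command only.
scoped=$(greeting=hi sh -c 'echo "$greeting"')

# After the command name, a token containing "=" is a normal word.
as_argument=$(echo greeting=bye)

# "1x" starts with a digit, so "1x=2" is not an assignment word. The shell
# treats it as a command name, which normally fails with exit status 127.
rc=0
(1x=2) 2>/dev/null || rc=$?

printf '%s\n' "$greeting" "$scoped" "$as_argument" "$rc"
```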

Comments

A comment begins with # and continues up to (but not including) the next newline character. A comment is treated like a blank character and does not become part of a token. The initial # of a comment must appear where it would otherwise be the first character of a word token; any other # is treated as part of a word token.

Comment: # (!<newline> .)*
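
A minimal sketch of where # starts a comment. It uses set -- to inspect how the rest of the line is tokenized, which replaces the positional parameters of the current shell; the variable names are illustrative.

```shell
# "#" in the middle of a word is an ordinary character: one argument.
set -- a#b
inside="$#:$1"

# "#" where a new word would start begins a comment: only "a" remains.
set -- a #b
leading="$#:$1"

# A quoted "#" never starts a comment: two arguments.
set -- a '#b'
quoted="$#:$2"

printf '%s\n' "$inside" "$leading" "$quoted"
```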

Syntax

After tokens have been delimited, the sequence of tokens is parsed according to the context-free grammar defined below, where *, +, and ? are to be interpreted in the same manner as in standard regular expressions:

CompleteProgram: NL* | CompoundList
CompoundList: NL* AndOrList ((; | & | NL) CompleteProgram)?
AndOrList: Pipeline ((&& | ||) NL* Pipeline)*
Pipeline: !? Command (| NL* Command)*
Command: CompoundCommand Redirection* |
FunctionDefinition |
SimpleCommand
CompoundCommand: Subshell |
Grouping |
IfCommand |
ForCommand |
WhileCommand |
CaseCommand |
DoubleBracketCommand |
FunctionCommand
Subshell: ( CompoundList )
Grouping: { CompoundList }
IfCommand: if CompoundList then CompoundList (elif CompoundList then CompoundList)* (else CompoundList)? fi
ForCommand: for Name ((NL* in Word*)? (; | NL) NL*)? do CompoundList done
WhileCommand: (while | until) CompoundList do CompoundList done
CaseCommand: case Word NL* in NL* CaseList? esac
CaseList: CaseItem (;; NL* CaseList)?
CaseItem: (? Word (| Word)* ) CompleteProgram
DoubleBracketCommand: [[ Ors ]]
Ors: Ands (|| Ands)*
Ands: Nots (&& Nots)*
Nots: !* Primary
Primary: (-b | -c | -d | -e | -f | -G | -g | -h | -k | -L | -N | -n | -O | -o | -p | -r | -S | -s | -t | -u | -w | -x | -z) Word |
Word (-ef | -eq | -ge | -gt | -le | -lt | -ne | -nt | -ot | -veq | -vge | -vgt | -vle | -vlt | -vne | = | == | === | =~ | != | !== | < | >) Word |
( Ors ) |
Word
FunctionCommand: function Word (( ))? NL* CompoundCommand Redirection*
FunctionDefinition: Name ( ) NL* CompoundCommand Redirection*
SimpleCommand: (Assignment | Redirection) SimpleCommand? |
Word (Word | Redirection)*
Assignment: AssignmentWord |
AssignmentPrefix( NL* (Word NL*)* )
Redirection: IO_NUMBER? RedirectionOperator Word |
IO_NUMBER? <( CompleteProgram ) |
IO_NUMBER? >( CompleteProgram )
RedirectionOperator: < | <> | > | >| | >> | >>| | <& | >& | << | <<- | <<<
NL: <newline>

In the rule for Primary, Word tokens must not be ]]. Additionally, if a Primary starts with a Word, that Word must not be any of the unary operators listed in the rule.

In the rule for SimpleCommand, a Word token is accepted only when the token cannot be parsed as the first token of an Assignment.

In the rule for Assignment, the ( token must immediately follow the AssignmentPrefix token, without any blank characters in between.
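
Several rules of the grammar above can be exercised with portable commands. The sketch below is not exhaustive and leaves out the double-bracket command and the function keyword, which are not portable. The function names join_args, classify, and warn are hypothetical.

```shell
# AndOrList: pipelines joined by && and || are evaluated left to right.
and_or=$(false || printf A && printf B)

# Pipeline: a leading "!" negates the exit status of the whole pipeline.
negated=no
if ! true | false; then negated=yes; fi

# ForCommand: without "in", the loop iterates over the positional parameters.
join_args() {
  joined=
  for arg do joined=$joined$arg; done
  printf '%s' "$joined"
}
for_out=$(join_args x y z)

# CaseCommand: a pattern list may start with "(", and the last item
# needs no ";;".
classify() {
  case $1 in
    (*.txt | *.md) echo text ;;
    *) echo other
  esac
}
case_text=$(classify notes.md)
case_other=$(classify image.png)

# FunctionDefinition: redirections after the body apply on every call.
warn() { echo "warning: $*"; } >&2
warn_err=$(warn disk full 2>&1)
warn_out=$(warn disk full 2>/dev/null)

printf '%s\n' "$and_or" "$negated" "$for_out" "$case_text" "$case_other" "$warn_err" "[$warn_out]"
```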

Here-document contents do not appear as part of the grammar above. They are parsed just after the newline (NL) token that follows the corresponding redirection operator.
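
The placement of here-document contents can be seen in the following sketch, which assumes that cat and tr are available. The delimiter and variable names are illustrative.

```shell
# The rest of the line after "<<EOF" is tokenized normally, so the pipeline
# continues; the here-document body starts only after the newline token.
upper=$(cat <<EOF | tr a-z A-Z
hello
EOF
)

# Two here-documents on one line: their bodies follow the newline in the
# order in which the operators appear.
both=$(cat <<ONE; cat <<TWO
first
ONE
second
TWO
)

printf '%s\n' "$upper" "$both"
```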

Alias substitution

Word tokens are subject to alias substitution according to the following rules:

If a word is going to be parsed as a Word of a SimpleCommand, the word is subjected to alias substitution of any kind (normal and global aliases).
If a word is the next token after the result of an alias substitution and the substitution string ends with a blank character, then the word is also subjected to alias substitution of any kind.
Other words are subjected to global alias substitution unless the shell is in the POSIXly-correct mode.

Tokens that are classified as reserved words are not subject to alias substitution.
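
The first two rules can be sketched with normal aliases as follows. Global aliases are left out because they are not portable. The alias names are made up for this example. The shopt line is an assumption that only matters when the script runs under bash, which disables alias substitution in non-interactive shells by default; other shells are expected to ignore it.

```shell
# Enable alias substitution under bash; harmless where shopt does not exist.
shopt -s expand_aliases 2>/dev/null || true

alias say='printf %s'
alias say_next='printf %s '   # the value ends with a blank
alias hello='HELLO'

# Aliases are substituted when a command is parsed, so eval is used here to
# make sure these commands are parsed after the definitions above.

# "hello" follows an alias whose value does not end with a blank, so it is
# an ordinary argument and is left alone.
eval 'plain=$(say hello)'

# The value of "say_next" ends with a blank, so the next word "hello" is
# also subjected to alias substitution.
eval 'chained=$(say_next hello)'

printf '%s\n' "$plain" "$chained"
```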