This chapter defines the syntax of the shell command language.
Note: Some of the syntactic features described below are not supported in the POSIXly-correct mode.
Tokenization
The characters of the input source code are first split into tokens. Tokens are delimited so that each token extends as far as possible before the next one begins (longest match). A sequence of one or more unquoted blank characters delimits a token.
The following tokens are the operator tokens:
&
&&
(
)
;
;;
|
||
<
<<
<&
<(
<<-
<<<
<>
>
>>
>&
>(
>>|
>|
(newline)
Note: Unlike in many other programming languages, the newline operator is a token rather than white space.
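As a small illustration of the tokenization rules above, the following sketch uses only operators that are also portable to other POSIX-style shells. The variable names are invented for the example.

```shell
# Operator tokens delimit words without any blanks around them:
# this line is read as `true`, `&&`, `echo`, `yes`.
joined=$(true&&echo yes)

# Longest match: `||` is read as one OR operator, not as two `|` operators.
fallback=$(false||echo no)

# A newline is a token that separates commands, much like `;`.
same_line=$(echo a; echo b)
two_lines=$(echo a
echo b)

printf '%s\n' "$joined" "$fallback"
```

Both `same_line` and `two_lines` end up holding the same two lines of output, because `;` and the newline token separate the two commands in the same way.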
Characters that are neither blank nor part of an operator compose a word token. Words are parsed by the following parsing expression grammar:
- Word
-
(WordElement / !SpecialChar .)+
- WordElement
-
\ . /
' (!' .)* ' /
" QuoteElement* " /
Parameter /
Arithmetic /
CommandSubstitution
- QuoteElement
-
\ ([$`"\] / <newline>) /
Parameter /
Arithmetic /
CommandSubstitutionQuoted /
![`"\] .
- Parameter
-
$ [@*#?-$![:digit:]] /
$ PortableName /
$ ParameterBody
- PortableName
-
![0-9] [0-9ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_]+
- ParameterBody
-
{ ParameterNumber? (ParameterName / ParameterBody / $ ParameterBody / Arithmetic / CommandSubstitution) ParameterIndex? ParameterMatch? }
- ParameterNumber
-
# ![}+=:/%] !([-?#] !})
- ParameterName
-
[@*#?-$!] /
[[:alnum:]_]+
- ParameterIndex
-
[ ParameterIndexWord (, ParameterIndexWord)? ]
- ParameterIndexWord
-
(WordElement / !["'],] .)+
- ParameterMatch
-
:? [-+=?] ParameterMatchWord /
(# / ## / % / %%) ParameterMatchWord /
(:/ / / [#%/]?) ParameterMatchWordNoSlash (/ ParameterMatchWord)?
- ParameterMatchWord
-
(WordElement / !["'}] .)*
- ParameterMatchWordNoSlash
-
(WordElement / !["'/}] .)*
- Arithmetic
-
$(( ArithmeticBody* ))
- ArithmeticBody
-
\ . /
Parameter /
Arithmetic /
CommandSubstitution /
( ArithmeticBody ) /
![`()] .
- CommandSubstitution
-
$( CompleteProgram ) /
` CommandSubstitutionBody* `
- CommandSubstitutionQuoted
-
$( CompleteProgram ) /
` CommandSubstitutionBodyQuoted* `
- CommandSubstitutionBody
-
\ [$`\] /
!` .
- CommandSubstitutionBodyQuoted
-
\ [$`\`] /
!` .
- SpecialChar
-
[|&;<>()`\"'[:blank:]] / <newline>
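To make the Word rule concrete, here is a minimal sketch in which a single word token is built from every kind of WordElement: a backslash escape, single quotes, double quotes, a parameter, an arithmetic expansion, and a command substitution. The helper `count_args` and the variable names are invented for the example.

```shell
# Print the number of arguments received.
count_args() { echo "$#"; }

x=X

# One word token made of six adjacent WordElements; the quoted and
# escaped blanks do not delimit tokens.
n=$(count_args a\ b'c d'"e $x"$((1+2))$(echo z))
word=$(printf '%s' a\ b'c d'"e $x"$((1+2))$(echo z))

echo "$n argument: $word"
```

Because none of the blanks is unquoted, the tokenizer sees one word, and the command receives one argument.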
The set of terminals of the grammar is the set of characters that can be handled in the environment in which the shell is run (the execution character set), except that the set does not contain the null character ('\0').
Strictly speaking, the definition above is not a complete parsing expression grammar because the rule for CommandSubstitution (Quoted) depends on CompleteProgram which is a non-terminal of the syntax.
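The Parameter, ParameterNumber, and ParameterMatch rules correspond to the familiar expansion forms. The following is a minimal sketch that uses only forms that are also portable to other POSIX-style shells; the path value and variable names are arbitrary examples.

```shell
path=/usr/local/lib/libfoo.so.1
unset missing

base=${path##*/}           # ## removes the longest matching prefix
dir=${path%/*}             # %  removes the shortest matching suffix
stem=${base%%.*}           # %% removes the longest matching suffix
fallback=${missing:-none}  # :- substitutes a default for an unset or empty value
length=${#base}            # leading # (ParameterNumber) yields the length

printf '%s\n' "$base" "$dir" "$stem" "$fallback" "$length"
```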
Token classification
After a word token is delimited, the token may be further classified as an IO_NUMBER token, reserved word, name word, assignment word, or just normal word. Classification other than the normal word is applied only when applicable in the context in which the word appears. See Tokens and keywords for the list of the reserved words (keywords) and the context in which a word may be recognized as a reserved word.
A token is an IO_NUMBER token if and only if it is composed of digit characters only and is immediately followed by < or >.
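The effect of the "immediately followed" condition can be seen by comparing two commands that differ only in one blank. This is a sketch that assumes `mktemp -d` is available; the file and variable names are made up for the example.

```shell
io_dir=$(mktemp -d)

# A blank separates `2` from `>`, so `2` is an ordinary word:
# the text "2" is written to the file.
echo 2 >"$io_dir/word"

# `2` is immediately followed by `>`, so it is an IO_NUMBER:
# file descriptor 2 is redirected and echo prints only an empty line.
stdout_text=$(echo 2>"$io_dir/ionumber")

word_file=$(cat "$io_dir/word")
ionumber_file=$(cat "$io_dir/ionumber")
rm -rf "$io_dir"

echo "word file: '$word_file', IO_NUMBER file: '$ionumber_file'"
```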
An assignment token is a token that starts with a name followed by =:
- AssignmentWord
- AssignmentPrefix
-
Name =
- Name
-
![[:digit:]] [[:alnum:]_]+
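The Name rule forbids a leading digit, which decides whether a token of the form `text=value` is an assignment at all. A minimal sketch, with invented variable names; the second case relies on the shell reporting a failure for a command it cannot find:

```shell
# `_count1` is a valid Name, so this token is an assignment word.
_count1=5

# `1count` starts with a digit, so `1count=5` is not an assignment word.
# It is treated as a command name, which is not expected to exist.
if (1count=5) 2>/dev/null; then
  digit_first="ran as a command"
else
  digit_first="not an assignment"
fi

echo "_count1=$_count1; 1count=5 was $digit_first"
```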
Comments
A comment begins with # and continues up to (but not including) the next newline character.
Comments are treated like a blank character and do not become part of a token.
The initial # of a comment must appear where it would otherwise be the first character of a word token; any other # is treated as part of a word token.
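The difference between a # that starts a comment and a # inside a word can be sketched as follows. The function name is invented for the example.

```shell
show_hash() {
  # The first # below is in the middle of a word, so it is kept.
  # The second # would start a new word, so it begins a comment.
  echo a#b c # this text is ignored
}

comment_out=$(show_hash)
echo "$comment_out"
```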
Syntax
After tokens have been delimited, the sequence of the tokens is parsed according to the context-free grammar defined below, where *, +, and ? should be interpreted in the same manner as in standard regular expressions:
- CompleteProgram
-
NL* | CompoundList
- CompoundList
-
NL* AndOrList ((; | & | NL) CompleteProgram)?
- AndOrList
- Pipeline
- Command
-
CompoundCommand Redirection* |
FunctionDefinition |
SimpleCommand
- CompoundCommand
-
Subshell |
Grouping |
IfCommand |
ForCommand |
WhileCommand |
CaseCommand |
DoubleBracketCommand |
FunctionCommand
- Subshell
-
( CompoundList )
- Grouping
-
{ CompoundList }
- IfCommand
-
if CompoundList then CompoundList (elif CompoundList then CompoundList)* (else CompoundList)? fi
- ForCommand
-
for Name ((NL* in Word*)? (; | NL) NL*)? do CompoundList done
- WhileCommand
-
(while | until) CompoundList do CompoundList done
- CaseCommand
- CaseList
- CaseItem
-
(? Word (| Word)* ) CompleteProgram
- DoubleBracketCommand
-
[[ Ors ]]
- Ors
-
Ands (|| Ands)*
- Ands
-
Nots (&& Nots)*
- Nots
-
!* Primary
- Primary
-
(-b | -c | -d | -e | -f | -G | -g | -h | -k | -L | -N | -n | -O | -o | -p | -r | -S | -s | -t | -u | -w | -x | -z) Word |
Word (-ef | -eq | -ge | -gt | -le | -lt | -ne | -nt | -ot | -veq | -vge | -vgt | -vle | -vlt | -vne | = | == | === | =~ | != | !== | < | >) Word |
( Ors ) |
Word
- FunctionCommand
-
function Word (( ))? NL* CompoundCommand Redirection*
- FunctionDefinition
-
Name ( ) NL* CompoundCommand Redirection*
- SimpleCommand
-
(Assignment | Redirection) SimpleCommand? |
Word (Word | Redirection)*
- Assignment
-
AssignmentWord |
AssignmentPrefix( NL* (Word NL*)* )
- Redirection
-
IO_NUMBER? RedirectionOperator Word |
IO_NUMBER? <( CompleteProgram ) |
IO_NUMBER? >( CompleteProgram )
- RedirectionOperator
-
< | <> | > | >| | >> | >>| | <& | >& | << | <<- | <<<
- NL
-
<newline>
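A few of the optional parts in the grammar above are easy to overlook. The sketch below exercises three of them using constructs that are also portable to other POSIX-style shells: a FunctionDefinition, a ForCommand with the whole `in` clause omitted, and CaseItems with and without the optional leading `(`. The function name and messages are invented for the example.

```shell
classify_args() {
  # No `in` clause: the loop iterates over the positional parameters.
  for arg do
    case $arg in
      (a*) echo "starts with a: $arg" ;;      # optional leading (
      b*|c*) echo "starts with b or c: $arg" ;;
      *) echo "other: $arg" ;;
    esac
  done
}

classified=$(classify_args apple cherry zebra)
echo "$classified"
```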
In the rule for Primary, Word tokens must not be ]]. Additionally, if a Primary starts with a Word, it must not be any of the possible unary operators allowed in the rule.
In the rule for SimpleCommand, a Word token is accepted only when the token cannot be parsed as the first token of an Assignment.
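The position of a token decides whether it is parsed as an Assignment or as a Word. In the following sketch (variable name invented), the first `x=...` token precedes the command name and is an Assignment, while the second follows the command name and is passed as an ordinary argument.

```shell
# `x=1` is an Assignment; `echo` is the command; `x=2` is just a Word.
printed=$(x=1 echo x=2)
echo "$printed"
```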
In the rule for Assignment, the ( token must immediately follow the AssignmentPrefix token, without any blank characters in between.
Here-document contents do not appear as part of the grammar above. They are parsed just after the newline (NL) token that follows the corresponding redirection operator.
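Because here-document contents are read after the newline token rather than at the position of the operator, the rest of the line containing `<<` is still parsed as ordinary syntax. A minimal sketch, with an invented function name:

```shell
heredoc_demo() {
  # The `; echo` part belongs to the same line as the redirection.
  # The here-document contents start on the following line.
  cat <<EOF; echo "after cat"
first line
second line
EOF
}

heredoc_out=$(heredoc_demo)
echo "$heredoc_out"
```

The output lists the two here-document lines first and "after cat" last, which shows that the contents were attached to `cat` even though another command follows the operator on the same line.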
Alias substitution
Word tokens are subject to alias substitution.
- If a word is going to be parsed as a Word of a SimpleCommand, the word is subjected to alias substitution of any kind (normal and global aliases).
- If a word is the next token after the result of an alias substitution and the substitution string ends with a blank character, then the word is also subjected to alias substitution of any kind.
- Other words are subjected to global alias substitution unless the shell is in the POSIXly-correct mode.
Tokens that are classified as reserved words are not subject to alias substitution.
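The trailing-blank rule can be sketched with ordinary (non-global) aliases, which are also available in other POSIX-style shells. The alias names are invented for the example. Two details are assumptions about the environment rather than part of the rules above: bash expands aliases in scripts only after `shopt -s expand_aliases`, and `eval` is used so that each command line is parsed after the aliases have been defined.

```shell
# Needed in bash scripts only; other shells fail this line silently.
shopt -s expand_aliases 2>/dev/null || true

alias say='echo '        # value ends with a blank
alias quiet_say='echo'   # value does not end with a blank
alias name='world'

# After `say`, the next word is also checked for aliases.
with_blank=$(eval 'say name')

# After `quiet_say`, the next word is left alone.
without_blank=$(eval 'quiet_say name')

echo "with blank: $with_blank, without blank: $without_blank"
```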