The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright © 2001-2004 The IEEE and The Open Group

C.2 Shell Command Language

C.2.1 Shell Introduction

The System V shell was selected as the starting point for the Shell and Utilities volume of IEEE Std 1003.1-2001. The BSD C shell was excluded from consideration for a number of reasons.

The construct "#!" is reserved for implementations wishing to provide that extension. If it were not reserved, the Shell and Utilities volume of IEEE Std 1003.1-2001 would disallow it by forcing it to be a comment. As it stands, a strictly conforming application must not use "#!" as the first two characters of the file.

C.2.2 Quoting

There is no additional rationale provided for this section.

Escape Character (Backslash)

There is no additional rationale provided for this section.

Single-Quotes

A backslash cannot be used to escape a single-quote in a single-quoted string. An embedded quote can be created by writing, for example: "'a'\''b'", which yields "a'b". (See the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.6.5, Field Splitting for a better understanding of how portions of words are either split into fields or remain concatenated.) A single token can be made up of concatenated partial strings containing all three kinds of quoting or escaping, thus permitting any combination of characters.
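
For example, in an interactive shell the quoting technique just described behaves as follows:

$ echo 'a'\''b'
a'b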

Double-Quotes

The escaped <newline> used for line continuation is removed entirely from the input and is not replaced by any white space. Therefore, it cannot serve as a token separator.

In double-quoting, if a backslash is immediately followed by a character that would be interpreted as having a special meaning, the backslash is deleted and the subsequent character is taken literally. If a backslash does not precede a character that would have a special meaning, it is left in place unmodified and the character immediately following it is also left unmodified. Thus, for example:

"\$"  ->  $

"\a" -> \a

It would be desirable to include the statement "The characters from an enclosed "${" to the matching '}' shall not be affected by the double quotes", similar to the one for "$()". However, historical practice in the System V shell prevents this.

The requirement that double-quotes be matched inside "${...}" within double-quotes and the rule for finding the matching '}' in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.6.2, Parameter Expansion eliminate several subtle inconsistencies in expansion for historical shells in rare cases; for example:

"${foo-bar"}

yields bar when foo is not defined, and is an invalid substitution when foo is defined, in many historical shells. The differences in processing the "${...}" form have led to inconsistencies between historical systems. A consequence of this rule is that single-quotes cannot be used to quote the '}' within "${...}" ; for example:

unset bar
foo="${bar-'}'}"

is invalid because the "${...}" substitution contains an unpaired unescaped single-quote. The backslash can be used to escape the '}' in this example to achieve the desired result:

unset bar
foo="${bar-\}}"

The differences in processing the "${...}" form have led to inconsistencies between the historical System V shell, BSD, and KornShells, and the text in the Shell and Utilities volume of IEEE Std 1003.1-2001 is an attempt to converge them without breaking too many applications. The only alternative to this compromise between shells would be to make the behavior unspecified whenever the literal characters single-quote, double-quote, '{', and '}' appear within "${...}". To write a portable script that uses these values, a user would have to assign variables; for example:

squote=\' dquote=\" lbrace='{' rbrace='}'
${foo-$squote$rbrace$squote}

rather than:

${foo-"'}'"}

Some implementations have allowed the end of the word to terminate the backquoted command substitution, such as in:

"`echo hello"

This usage is undefined; the matching backquote is required by the Shell and Utilities volume of IEEE Std 1003.1-2001. The other undefined usage can be illustrated by the example:

sh -c '` echo "foo`'

The description of the recursive actions involving command substitution can be illustrated with an example. Upon recognizing the introduction of command substitution, the shell parses input (in a new context), gathering the source for the command substitution until an unbalanced ')' or '`' is located. For example, in the following:

echo "$(date; echo "
    one" )"

the double-quote following the echo does not terminate the first double-quote; it is part of the command substitution script. Similarly, in:

echo "$(echo *)"

the asterisk is not quoted since it is inside command substitution; however:

echo "$(echo "*")"

is quoted (and represents the asterisk character itself).

C.2.3 Token Recognition

The "((" and "))" symbols are control operators in the KornShell, used for an alternative syntax of an arithmetic expression command. A conforming application cannot use "((" as a single token (with the exception of the "$((" form for shell arithmetic).

On some implementations, the symbol "((" is a control operator; its use produces unspecified results. Applications that wish to have nested subshells, such as:

((echo Hello);(echo World))

must separate the "((" characters into two tokens by including white space between them. Some systems may treat these as invalid arithmetic expressions instead of subshells.
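
For example, the nested subshells above can be written portably by separating the parentheses with white space:

( (echo Hello); (echo World) )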

Certain combinations of characters are invalid in portable scripts, as shown in the grammar. Implementations may use these combinations (such as "|&" ) as valid control operators. Portable scripts cannot rely on receiving errors in all cases where this volume of IEEE Std 1003.1-2001 indicates that a syntax is invalid.

The (3) rule about combining characters to form operators is not meant to preclude systems from extending the shell language when characters are combined in otherwise invalid ways. Conforming applications cannot use invalid combinations, and test suites should not penalize systems that take advantage of this fact. For example, the unquoted combination "|&" is not valid in a POSIX script, but has a specific KornShell meaning.

The (10) rule about '#' applies only when the current character is the first in the sequence in which a new token is being assembled; that is, '#' starts a comment only when it is at the beginning of a token. The rule is also written to indicate that the search for the end-of-comment does not treat an escaped <newline> specially, so a comment cannot be continued to the next line.

Alias Substitution

The alias capability was added in the User Portability Utilities option because it is widely used in historical implementations by interactive users.

The definition of "alias name" precludes an alias name containing a slash character. Since the text applies to the command words of simple commands, reserved words (in their proper places) cannot be confused with aliases.

The placement of alias substitution in token recognition makes it clear that it precedes all of the word expansion steps.

An example concerning trailing <blank>s and reserved words follows. If the user types:

$ alias foo="/bin/ls "
$ alias while="/"

The effect of executing:

$ while true
> do
> echo "Hello, World"
> done

is a never-ending sequence of "Hello, World" strings to the screen. However, if the user types:

$ foo while

the result is an ls listing of /. Since the alias substitution for foo ends in a <space>, the next word is checked for alias substitution. The next word, while, has also been aliased, so it is substituted as well. Since it is not in the proper position as a command word, it is not recognized as a reserved word.

If the user types:

$ foo; while

while retains its normal reserved-word properties.

C.2.4 Reserved Words

All reserved words are recognized syntactically as such in the contexts described. However, note that in is the only meaningful reserved word after a case or for; similarly, in is not meaningful as the first word of a simple command.

Reserved words are recognized only when they are delimited (that is, meet the definition of the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.435, Word), whereas operators are themselves delimiters. For instance, '(' and ')' are control operators, so that no <space> is needed in (list). However, '{' and '}' are reserved words in { list;}, so that in this case the leading <space> and semicolon are required.
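
A brief illustration of the delimiter difference described above:

# '(' and ')' are operators, so no <blank> is needed around the list:
(date)
# '{' and '}' are reserved words, so the <space> after '{' and the ';'
# (or a <newline>) before '}' are required:
{ date; }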

The list of unspecified reserved words is from the KornShell, so conforming applications cannot use them in places a reserved word would be recognized. This list contained time in early proposals, but it was removed when the time utility was selected for the Shell and Utilities volume of IEEE Std 1003.1-2001.

There was a strong argument for promoting braces to operators (instead of reserved words), so they would be syntactically equivalent to subshell operators. Concerns about compatibility outweighed the advantages of this approach. Nevertheless, conforming applications should consider quoting '{' and '}' when they represent themselves.

The restriction on ending a name with a colon is to allow future implementations that support named labels for flow control; see the RATIONALE for the break built-in utility.

It is possible that a future version of the Shell and Utilities volume of IEEE Std 1003.1-2001 may require that '{' and '}' be treated individually as control operators, although the token "{}" will probably be a special-case exemption from this because of the often-used find {} construct.

C.2.5 Parameters and Variables

Positional Parameters

There is no additional rationale provided for this section.

Special Parameters

Most historical implementations implement subshells by forking; thus, the special parameter '$' does not necessarily represent the process ID of the shell process executing the commands since the subshell execution environment preserves the value of '$'.

If a subshell were to execute a background command, the value of "$!" for the parent would not change. For example:

(
date &
echo $!
)
echo $!

would echo two different values for "$!".

The "$-" special parameter can be used to save and restore set options:

Save=$(echo $- | sed 's/[ics]//g')
...
set +aCefnuvx
if [ -n "$Save" ]; then
    set -$Save
fi

The three options are removed using sed in the example because they may appear in the value of "$-" (from the sh command line), but are not valid options to set.

The descriptions of parameters '*' and '@' assume the reader is familiar with the field splitting discussion in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.6.5, Field Splitting and understands that portions of the word remain concatenated unless there is some reason to split them into separate fields.

Some examples of the '*' and '@' properties, including the concatenation aspects:

set "abc" "def ghi" "jkl"

echo $*        =>  "abc" "def" "ghi" "jkl"
echo "$*"      =>  "abc def ghi jkl"
echo $@        =>  "abc" "def" "ghi" "jkl"

but:

echo "$@"      => "abc" "def ghi" "jkl"
echo "xx$@yy"  => "xxabc" "def ghi" "jklyy"
echo "$@$@"    => "abc" "def ghi" "jklabc" "def ghi" "jkl"

In the preceding examples, the double-quote characters that appear after the "=>" do not appear in the output and are used only to illustrate word boundaries.

The following example illustrates the effect of setting IFS to a null string:

$ IFS=''
$ set foo bar bam
$ echo "$@"
foo bar bam
$ echo "$*"
foobarbam
$ unset IFS
$ echo "$*"
foo bar bam

Shell Variables

See the discussion of IFS in Field Splitting and the RATIONALE for the sh utility.

The prohibition on LC_CTYPE changes affecting lexical processing protects the shell implementor (and the shell programmer) from the ill effects of changing the definition of <blank> or the set of alphabetic characters in the current environment. It would probably not be feasible to write a compiled version of a shell script without this rule. The rule applies only to the current invocation of the shell and its subshells; invoking a shell script or performing exec sh would subject the new shell to the changes in LC_CTYPE.

Other common environment variables used by historical shells are not specified by the Shell and Utilities volume of IEEE Std 1003.1-2001, but they should be reserved for the historical uses.

Tilde expansion for components of PATH in an assignment such as:

PATH=~hlj/bin:~dwc/bin:$PATH

is a feature of some historical shells and is allowed by the wording of the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.6.1, Tilde Expansion. Note that the tildes are expanded during the assignment to PATH, not when PATH is accessed during command search.

The following entries represent additional information about variables included in the Shell and Utilities volume of IEEE Std 1003.1-2001, or rationale for common variables in use by shells that have been excluded:

_
(Underscore.) While underscore is historical practice, its overloaded usage in the KornShell is confusing, and it has been omitted from the Shell and Utilities volume of IEEE Std 1003.1-2001.
ENV
This variable can be used to set aliases and other items local to the invocation of a shell. The file referred to by ENV differs from $HOME/.profile in that .profile is typically executed at session start-up, whereas the ENV file is executed at the beginning of each shell invocation. The ENV value is interpreted in a manner similar to a dot script, in that the commands are executed in the current environment and the file needs to be readable, but not executable. However, unlike dot scripts, no PATH searching is performed. This is used as a guard against Trojan Horse security breaches.
ERRNO
This variable was omitted from the Shell and Utilities volume of IEEE Std 1003.1-2001 because the values of error numbers are not defined in IEEE Std 1003.1-2001 in a portable manner.
FCEDIT
Since this variable affects only the fc utility, it has been omitted from this more global place. The value of FCEDIT does not affect the command-line editing mode in the shell; see the description of set -o vi in the set built-in utility.
PS1
This variable is used for interactive prompts. Historically, the "superuser" has had a prompt of '#'. Since privileges are not required to be monolithic, it is difficult to define which privileges should cause the alternate prompt. However, a sufficiently powerful user should be reminded of that power by having an alternate prompt.
PS3
This variable is used by the KornShell for the select command. Since the POSIX shell does not include select, PS3 was omitted.
PS4
This variable is used for shell debugging. For example, the following script:
PS4='[${LINENO}]+ '
set -x
echo Hello

writes the following to standard error:

[3]+ echo Hello

RANDOM
This pseudo-random number generator was not seen as being useful to interactive users.
SECONDS
Although this variable is sometimes used with PS1 to allow the display of the current time in the prompt of the user, it is not one that would be manipulated frequently enough by an interactive user to include in the Shell and Utilities volume of IEEE Std 1003.1-2001.

C.2.6 Word Expansions

Step (2) refers to the "portions of fields generated by step (1)". For example, if the word being expanded were "$x+$y" and IFS=+, the word would be split only if "$x" or "$y" contained '+'; the '+' in the original word was not generated by step (1).
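
A short sketch of this distinction (the variable values are illustrative); the '+' inside the expansion of x delimits a field, while the literal '+' written in the word does not:

IFS=+
x=a+b y=c
printf '<%s>\n' $x+$y    # writes <a> and then <b+c>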

IFS is used for performing field splitting on the results of parameter and command substitution; it is not used for splitting all fields. Previous versions of the shell used it for splitting all fields during field splitting, but this has severe problems because the shell can no longer parse its own script. There are also important security implications caused by this behavior. All useful applications of IFS use it for parsing input of the read utility and for splitting the results of parameter and command substitution.

The rule concerning expansion to a single field requires that if foo=abc and bar=def, then:

"$foo""$bar"


expands to the single field:

abcdef

The rule concerning empty fields can be illustrated by:

$ unset foo
$ set $foo bar '' xyz "$foo" abc
$ for i
> do
>     echo "-$i-"
> done
-bar-
--
-xyz-
--
-abc-

Step (1) indicates that parameter expansion, command substitution, and arithmetic expansion are all processed simultaneously as they are scanned. For example, the following is valid arithmetic:

x=1
echo $(( $(echo 3)+$x ))

An early proposal stated that tilde expansion preceded the other steps, but this is not the case in known historical implementations; if it were, and if a referenced home directory contained a '$' character, expansions would result within the directory name.

Tilde Expansion

Tilde expansion generally occurs only at the beginning of words, but an exception based on historical practice has been included:

PATH=/posix/bin:~dgk/bin

This is eligible for tilde expansion because tilde follows a colon and none of the relevant characters is quoted. Consideration was given to prohibiting this behavior because any of the following are reasonable substitutes:

PATH=$(printf %s ~karels/bin : ~bostic/bin)

for Dir in ~maart/bin ~srb/bin ...
do
    PATH=${PATH:+$PATH:}$Dir
done

In the first command, explicit colons are used for each directory. In all cases, the shell performs tilde expansion on each directory because all are separate words to the shell.

Note that expressions in operands such as:

make -k mumble LIBDIR=~chet/lib

do not qualify as shell variable assignments, and tilde expansion is not performed (unless the command does so itself, which make does not).

Because of the requirement that the word is not quoted, the following are not equivalent; only the last causes tilde expansion:

\~hlj/   ~h\lj/   ~"hlj"/   ~hlj\/   ~hlj/

In an early proposal, tilde expansion occurred following any unquoted equals sign or colon, but this was removed because of its complexity and to avoid breaking commands such as:

rcp hostname:~marc/.profile .

A suggestion was made that the special sequence "$~" should be allowed to force tilde expansion anywhere. Since this is not historical practice, it has been left for future implementations to evaluate. (The description in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.2, Quoting requires that a dollar sign be quoted to represent itself, so the "$~" combination is already unspecified.)

The results of giving tilde with an unknown login name are undefined because the KornShell "~+" and "~-" constructs make use of this condition, but in general it is an error to give an incorrect login name with tilde. The results of having HOME unset are unspecified because some historical shells treat this as an error.

Parameter Expansion

The rule for finding the closing '}' in "${...}" is the one used in the KornShell and is upwardly-compatible with the Bourne shell, which does not determine the closing '}' until the word is expanded. The advantage of this is that incomplete expansions, such as:

${foo

can be determined during tokenization, rather than during expansion.

The string length and substring capabilities were included because of the demonstrated need for them, based on their usage in other shells, such as C shell and KornShell.
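
A brief sketch of the string length and prefix/suffix (substring) removal forms (the pathname is illustrative):

x=/usr/local/bin/sh
echo ${#x}       # string length: writes 17
echo ${x##*/}    # remove largest prefix matching "*/": writes sh
echo ${x%/*}     # remove smallest suffix matching "/*": writes /usr/local/bin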

Historical versions of the KornShell have not performed tilde expansion on the word part of parameter expansion; however, it is more consistent to do so.

Command Substitution

The "$()" form of command substitution solves a problem of inconsistent behavior when using backquotes. For example:

Command                    Output

echo '\$x'                 \$x
echo `echo '\$x'`          $x
echo $(echo '\$x')         \$x

Additionally, the backquoted syntax has historical restrictions on the contents of the embedded command. While the newer "$()" form can process any kind of valid embedded script, the backquoted form cannot handle some valid scripts that include backquotes. For example, these otherwise valid embedded scripts do not work in the left column, but do work on the right:

echo `                         echo $(
cat <<\eof                     cat <<\eof
a here-doc with `              a here-doc with )
eof                            eof
`                              )

echo `                         echo $(
echo abc # a comment with `    echo abc # a comment with )
`                              )

echo `                         echo $(
echo '`'                       echo ')'
`                              )

Because of these inconsistent behaviors, the backquoted variety of command substitution is not recommended for new applications that nest command substitutions or attempt to embed complex scripts.

The KornShell feature:

If command is of the form < word, word is expanded to generate a pathname, and the value of the command substitution is the contents of this file with any trailing <newline>s deleted.

was omitted from the Shell and Utilities volume of IEEE Std 1003.1-2001 because $(cat word) is an appropriate substitute. However, to prevent breaking numerous scripts relying on this feature, it is unspecified to have a script within "$()" that has only redirections.
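
A sketch of the two forms (the file name is illustrative):

# KornShell form; a "$()" containing only a redirection is unspecified:
contents=$(< /etc/motd)
# The portable substitute described above:
contents=$(cat /etc/motd)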

The requirement to separate "$(" and '(' when a single subshell is command-substituted is to avoid any ambiguities with arithmetic expansion.

IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/4 is applied, changing the text from: "If a command substitution occurs inside double-quotes, it shall not be performed on the results of the substitution." to: "If a command substitution occurs inside double-quotes, field splitting and pathname expansion shall not be performed on the results of the substitution.". The replacement text taken from the ISO POSIX-2:1993 standard is clearer about the items that are not performed.

Arithmetic Expansion

The standard developers agreed that there was a strong desire for some kind of arithmetic evaluator to provide functionality similar to expr, and that relating it to '$' makes it work well with the standard shell language and provides access to arithmetic evaluation in places where accessing a utility would be inconvenient.

The syntax and semantics for arithmetic were revised for the ISO/IEC 9945-2:1993 standard. The language represents a simple subset of the previous arithmetic language (which was derived from the KornShell "(())" construct). The syntax was changed from that of a command denoted by ((expression)) to an expansion denoted by $((expression)). The new form is a dollar expansion ( '$' ) that evaluates the expression and substitutes the resulting value. Objections to the previous style of arithmetic included that it was too complicated, did not fit in well with the use of variables in the shell, and its syntax conflicted with subshells. The justification for the new syntax is that the shell is traditionally a macro language, and if a new feature is to be added, it should be accomplished by extending the capabilities presented by the current model of the shell, rather than by inventing a new one outside the model; adding a new dollar expansion was perceived to be the most intuitive and least destructive way to add such a new capability.

The standard requires assignment operators to be supported (as listed in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 1.7.2, Concepts Derived from the ISO Standard), and since arithmetic expansions are not specified to be evaluated in a subshell environment, changes to variables there have to be in effect after the arithmetic expansion, just as in the parameter expansion "${x=value}".

Note, however, that "$(( x=5 ))" need not be equivalent to "$(( $x=5 ))". If the value of the environment variable x is the string "y=", the expansion of "$(( x=5 ))" would set x to 5 and output 5, but "$(( $x=5 ))" would output 0 if the value of the environment variable y is not 5 and would output 1 if the environment variable y is 5. Similarly, if the value of the environment variable x is 4, the expansion of "$(( x=5 ))" would still set x to 5 and output 5, but "$(( $x=5 ))" (which would be equivalent to "$(( 4=5 ))") would yield a syntax error.
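
A transcript sketch of the distinction drawn above (the variable values are illustrative):

$ y=3
$ x='y='
$ echo $(( x=5 ))
5
$ x='y='
$ echo $(( $x=5 ))    # $x expands to "y=", giving $(( y==5 ))
0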

In early proposals, a form $[expression] was used. It was functionally equivalent to the "$(())" of the current text, but objections were lodged that the 1988 KornShell had already implemented "$(())" and there was no compelling reason to invent yet another syntax. Furthermore, the "$[]" syntax had a minor incompatibility involving the patterns in case statements.

The portion of the ISO C standard arithmetic operations selected corresponds to the operations historically supported in the KornShell. In addition to the exceptions listed in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.6.4, Arithmetic Expansion, the use of the following is explicitly outside the scope of the rules defined in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 1.7.2.1, Arithmetic Precision and Operations:

It was concluded that the test command ('[') was sufficient for the majority of relational arithmetic tests, and that tests involving complicated relational expressions within the shell are rare, yet could still be accommodated by testing the value of "$(())" itself. For example:

# a complicated relational expression
while [ $(( (($x + $y)/($a * $b)) < ($foo*$bar) )) -ne 0 ]

or better yet, the rare script that has many complex relational expressions could define a function like this:

val() {
    return $((!$1))
}

and complicated tests would be less intimidating:

while val $(( (($x + $y)/($a * $b)) < ($foo*$bar) ))
do
    # some calculations
done

A suggestion that was not adopted was to modify true and false to take an optional argument, and true would exit true only if the argument was non-zero, and false would exit false only if the argument was non-zero:

while true $(($x > 5 && $y <= 25))

There is a minor portability concern with the new syntax. The example "$((2+2))" could have been intended to mean a command substitution of a utility named "2+2" in a subshell. The standard developers considered this to be obscure and isolated to some KornShell scripts (because "$()" command substitution existed previously only in the KornShell). The text on command substitution requires that the "$(" and '(' be separate tokens if this usage is needed.
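
For example, the space between "$(" and '(' is what distinguishes the two readings:

echo $((2+2))          # arithmetic expansion: writes 4
echo $( (echo 2+2) )   # command substitution of a subshell: writes 2+2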

An example such as:

echo $((echo hi);(echo there))

should not be misinterpreted by the shell as arithmetic because attempts to balance the parentheses pairs would indicate that they are subshells. However, as indicated by the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.112, Control Operator, a conforming application must separate two adjacent parentheses with white space to indicate nested subshells.

The standard is intentionally silent about how a variable's numeric value in an expression is determined from its normal "sequence of bytes" value. It could be done as a text substitution, as a conversion like that performed by strtol(), or even recursive evaluation. Therefore, the only cases for which the standard is clear are those for which both conversions produce the same result. The cases where they give the same result are those where the sequence of bytes form a valid integer constant. Therefore, if a variable does not contain a valid integer constant, the behavior is unspecified.

For the commands:

x=010; echo $((x += 1))

the output must be 9.

For the commands:

x=' 1'; echo $((x += 1))

the results are unspecified.

For the commands:

x=1+1; echo $((x += 1))

the results are unspecified.

Although the ISO/IEC 9899:1999 standard now requires support for long long and allows extended integer types with higher ranks, IEEE Std 1003.1-2001 only requires arithmetic expansions to support signed long integer arithmetic. Implementations are encouraged to support signed integer values at least as large as the size of the largest file allowed on the implementation.

Implementations are also allowed to perform floating-point evaluations as long as an application won't see different results for expressions that would not overflow signed long integer expression evaluation. (This includes appropriate truncation of results to integer values.)

Changes made in response to IEEE PASC Interpretation 1003.2 #208 removed the requirement that the integer constant suffixes l and L had to be recognized. The ISO POSIX-2:1993 standard did not require the u, ul, uL, U, Ul, UL, lu, lU, Lu, and LU suffixes since only signed integer arithmetic was required. Since all arithmetic expressions were treated as handling signed long integer types anyway, the l and L suffixes were redundant. No known scripts used them and some historic shells did not support them. When the ISO/IEC 9899:1999 standard was used as the basis for the description of arithmetic processing, the ll and LL suffixes and combinations were also not required. Implementations are still free to accept any or all of these suffixes, but are not required to do so.

There was also some confusion as to whether the shell was required to recognize character constants. Syntactically, character constants were required to be recognized, but the requirements for the handling of the backslash ( '\' ) and single-quote characters (needed to specify character constants) within an arithmetic expansion were ambiguous. Furthermore, no known shells supported them. Changes made in response to IEEE PASC Interpretation 1003.2 #208 removed the requirement to support them (if they were indeed required before). IEEE Std 1003.1-2001 clearly does not require support for character constants.

IEEE Std 1003.1-2001/Cor 2-2004, item XCU/TC2/D6/3 is applied, clarifying arithmetic expressions.

Field Splitting

The operation of field splitting using IFS, as described in early proposals, was based on the way the KornShell splits words, but it is incompatible with other common versions of the shell. However, each has merit, and so a decision was made to allow both. If the IFS variable is unset or is <space><tab><newline>, the operation is equivalent to the way the System V shell splits words. Using characters outside the <space><tab><newline> set yields the KornShell behavior, where each of the non-<space>, non-<tab>, and non-<newline> characters is significant. This behavior, which affords the most flexibility, was taken from the way the original awk handled field splitting.

Rule (3) can be summarized as a pseudo-ERE:

(s*ns*|s+)

where s is an IFS white space character and n is a character in the IFS that is not white space. Any string matching that ERE delimits a field, except that the s+ form does not delimit fields at the beginning or the end of a line. For example, if IFS contains <space>, <comma>, and <tab>, the string:

<space><space>red<space><space>,<space>white<space>blue

yields the three colors as the delimited fields.
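
A sketch of rule (3) using only <space> and <comma> in IFS (adding <tab> behaves the same way):

IFS=' ,'
var='  red  , white blue'
printf '<%s>\n' $var    # writes <red>, <white>, and <blue>, one per line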

Pathname Expansion

There is no additional rationale provided for this section.

Quote Removal

There is no additional rationale provided for this section.

C.2.7 Redirection

In the System Interfaces volume of IEEE Std 1003.1-2001, file descriptors are integers in the range 0-({OPEN_MAX}-1). The file descriptors discussed in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.7, Redirection are that same set of small integers.

Having multi-digit file descriptor numbers for I/O redirection can cause some obscure compatibility problems. Specifically, scripts that depend on an example command:

echo 22>/dev/null

echoing "2" to standard error or "22" to standard output are no longer portable. However, the file descriptor number must still be delimited from the preceding text. For example:

cat file2>foo

writes the contents of file2, not the contents of file.

The ">|" format of output redirection was adopted from the KornShell. Along with the noclobber option, set -C, it provides a safety feature to prevent inadvertent overwriting of existing files. (See the RATIONALE for the pathchk utility for why this step was taken.) The restriction on regular files is historical practice.

The System V shell and the KornShell have differed historically on pathname expansion of word; the former never performed it, the latter only when the result was a single field (file). As a compromise, it was decided that the KornShell functionality was useful, but only as a shorthand device for interactive users. No reasonable shell script would be written with a command such as:

cat foo > a*

Thus, shell scripts are prohibited from doing it, while interactive users can select the shell with which they are most comfortable.

The construct "2>&1" is often used to redirect standard error to the same file as standard output. Since the redirections take place beginning to end, the order of redirections is significant. For example:

ls > foo 2>&1

directs both standard output and standard error to file foo. However:

ls 2>&1 > foo

only directs standard output to file foo because standard error was duplicated as standard output before standard output was directed to file foo.

The "<>" operator could be useful in writing an application that worked with several terminals, and occasionally wanted to start up a shell. That shell would in turn be unable to run applications that run from an ordinary controlling terminal unless it could make use of "<>" redirection. The specific example is a historical version of the pager more, which reads from standard error to get its commands, so standard input and standard output are both available for their usual usage. There is no way of saying the following in the shell without "<>" :

cat food | more - >/dev/tty03 2<>/dev/tty03

Another example of "<>" is one that opens /dev/tty on file descriptor 3 for reading and writing:

exec 3<> /dev/tty

An example of creating a lock file for a critical code region:

set -C
until    2> /dev/null > lockfile
do       sleep 30
done
set +C
perform critical function
rm lockfile

Since /dev/null is not a regular file, no error is generated by redirecting to it in noclobber mode.

Tilde expansion is not performed on a here-document because the data is treated as if it were enclosed in double quotes.

Redirecting Input

There is no additional rationale provided for this section.

Redirecting Output

There is no additional rationale provided for this section.

Appending Redirected Output

Note that when a file is opened (even with the O_APPEND flag set), the initial file offset for that file is set to the beginning of the file. Some historic shells set the file offset to the current end-of-file when append mode shell redirection was used, but this is not allowed by IEEE Std 1003.1-2001.

Here-Document

There is no additional rationale provided for this section.

Duplicating an Input File Descriptor

There is no additional rationale provided for this section.

Duplicating an Output File Descriptor

There is no additional rationale provided for this section.

Open File Descriptors for Reading and Writing

There is no additional rationale provided for this section.

C.2.8 Exit Status and Errors

Consequences of Shell Errors

There is no additional rationale provided for this section.

Exit Status for Commands

There is a historical difference in sh and ksh non-interactive error behavior. When a command named in a script is not found, some implementations of sh exit immediately, but ksh continues with the next command. Thus, the Shell and Utilities volume of IEEE Std 1003.1-2001 says that the shell "may" exit in this case. This puts a small burden on the programmer, who has to test for successful completion following a command if it is important that the next command not be executed if the previous command was not found. If it is important for the command to have been found, it was probably also important for it to complete successfully. The test for successful completion would not need to change.

Historically, shells have returned an exit status of 128+ n, where n represents the signal number. Since signal numbers are not standardized, there is no portable way to determine which signal caused the termination. Also, it is possible for a command to exit with a status in the same range of numbers that the shell would use to report that the command was terminated by a signal. Implementations are encouraged to choose exit values greater than 256 to indicate programs that terminate by a signal so that the exit status cannot be confused with an exit status generated by a normal termination.
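
As an illustration of the 128+n convention, on an implementation where SIGKILL is signal 9 the following transcript might be observed (any diagnostic written by the shell is omitted, and the value 137 cannot be relied upon portably):

$ sh -c 'kill -s KILL $$'
$ echo $?
137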

Historical shells make the distinction between "utility not found" and "utility found but cannot execute" in their error messages. Specifying two seldom-used exit status values for these cases, 127 and 126 respectively, gives an application the opportunity to make use of this distinction without having to parse an error message that would probably change from locale to locale. The command, env, nohup, and xargs utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 have also been specified to use this convention.

When a command fails during word expansion or redirection, most historical implementations exit with a status of 1. However, there was some sentiment that this value should probably be much higher so that an application could distinguish this case from the more normal exit status values. Thus, the language "greater than zero" was selected to allow either method to be implemented.

C.2.9 Shell Commands

A description of an "empty command" was removed from an early proposal because it is only relevant in the cases of sh -c "", system( "" ), or an empty shell-script file (such as the implementation of true on some historical systems). Since it is no longer mentioned in the Shell and Utilities volume of IEEE Std 1003.1-2001, it falls into the silently unspecified category of behavior where implementations can continue to operate as they have historically, but conforming applications do not construct empty commands. (However, note that sh does explicitly state an exit status for an empty string or file.) In an interactive session or a script with other commands, extra <newline>s or semicolons, such as:

$ false
$
$ echo $?
1

would not qualify as the empty command described here because they would be consumed by other parts of the grammar.

Simple Commands

The enumerated list is used only when the command is actually going to be executed. For example, in:

true || $foo *

no expansions are performed.

The following example illustrates both how a variable assignment without a command name affects the current execution environment, and how an assignment with a command name only affects the execution environment of the command:

$ x=red
$ echo $x
red
$ export x
$ sh -c 'echo $x'
red
$ x=blue sh -c 'echo $x'
blue
$ echo $x
red

This next example illustrates that redirections without a command name are still performed:

$ ls foo
ls: foo: no such file or directory
$ > foo
$ ls foo
foo

A command without a command name, but one that includes a command substitution, has an exit status of the last command substitution that the shell performed. For example:

if      x=$(command)
then    ...
fi

An example of redirections without a command name being performed in a subshell shows that the here-document does not disrupt the standard input of the while loop:

IFS=:
while    read a b
do       echo $a
         <<-eof
         Hello
         eof
done </etc/passwd

Following are examples of commands without command names in AND-OR lists:

> foo || {
    echo "error: foo cannot be created" >&2
    exit 1
}

# set saved if /vmunix.save exists
test -f /vmunix.save && saved=1

Command substitution and redirections without command names both occur in subshells, but they are not necessarily the same ones. For example, in:

exec 3> file
var=$(echo foo >&3) 3>&1

it is unspecified whether foo is echoed to the file or to standard output.

Command Search and Execution

This description requires that the shell can execute shell scripts directly, even if the underlying system does not support the common "#!" interpreter convention. That is, if file foo contains shell commands and is executable, the following executes foo:

./foo

The command search shown here does not match all historical implementations. A more typical sequence has been:

But there are problems with this sequence. Since the programmer has no idea in advance which utilities might have been built into the shell, a function cannot be used to override portably a utility of the same name. (For example, a function named cd cannot be written for many historical systems.) Furthermore, the PATH variable is partially ineffective in this case, and only a pathname with a slash can be used to ensure a specific executable file is invoked.

After the execve() failure described, the shell normally executes the file as a shell script. Some implementations, however, attempt to detect whether the file is actually a script and not an executable from some other architecture. The method used by the KornShell is allowed by the text that indicates non-text files may be bypassed.

The sequence selected for the Shell and Utilities volume of IEEE Std 1003.1-2001 acknowledges that special built-ins cannot be overridden, but gives the programmer full control over which versions of other utilities are executed. It provides a means of suppressing function lookup (via the command utility) for the user's own functions and ensures that any regular built-ins or functions provided by the implementation are under the control of the path search. The mechanisms for associating built-ins or functions with executable files in the path are not specified by the Shell and Utilities volume of IEEE Std 1003.1-2001, but the wording requires that if either is implemented, the application is not able to distinguish a function or built-in from an executable (other than in terms of performance, presumably). The implementation ensures that all effects specified by the Shell and Utilities volume of IEEE Std 1003.1-2001 resulting from the invocation of the regular built-in or function (interaction with the environment, variables, traps, and so on) are identical to those resulting from the invocation of an executable file.
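
A brief sketch of using the command utility to suppress function lookup (the function body is illustrative):

ls() { echo "this is a function"; }
ls             # invokes the function
command ls     # bypasses the function; runs the regular built-in or the
               # executable located via the PATH search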

IEEE Std 1003.1-2001/Cor 2-2004, item XCU/TC2/D6/4 is applied, updating the case where execve() fails due to an error equivalent to the [ENOEXEC] error.

Examples

Consider three versions of the ls utility:

  1. The application includes a shell function named ls.

  2. The user writes a utility named ls and puts it in /fred/bin.

  3. The example implementation provides ls as a regular shell built-in that is invoked (either by the shell or directly by exec) when the path search reaches the directory /posix/bin.

If PATH=/posix/bin, various invocations yield different versions of ls:

Invocation                                        Version of ls

ls (from within application script)               (1) function
command ls (from within application script)       (3) built-in
ls (from within makefile called by application)   (3) built-in
system("ls")                                      (3) built-in
PATH="/fred/bin:$PATH" ls                          (2) user's version

Pipelines

Because pipeline assignment of standard input or standard output or both takes place before redirection, it can be modified by redirection. For example:

$ command1 2>&1 | command2

sends both the standard output and standard error of command1 to the standard input of command2.

The reserved word ! allows more flexible testing using AND and OR lists.

It was suggested that it would be better to return a non-zero value if any command in the pipeline terminates with non-zero status (perhaps the bitwise-inclusive OR of all return values). However, the last-specified command semantics are historical practice, and changing them would cause applications to break. An example of historical behavior:

$ sleep 5 | (exit 4)
$ echo $?
4
$ (exit 4) | sleep 5
$ echo $?
0

Lists

The equal precedence of "&&" and "||" is historical practice. The standard developers evaluated the model used more frequently in high-level programming languages, such as C, to allow the shell logical operators to be used for complex expressions in an unambiguous way, but they could not allow historical scripts to break in the subtle way unequal precedence might cause. Some arguments were posed concerning the "{}" or "()" groupings that are required historically. There are some disadvantages to these groupings:

IEEE PASC Interpretation 1003.2 #204 is applied, clarifying that the operators "&&" and "||" are evaluated with left associativity.
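
The equal precedence and left associativity can be seen in:

true || echo one && echo two

which is parsed as (true || echo one) && echo two and therefore writes two; with C-language precedence it would be parsed as true || (echo one && echo two) and would write nothing.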

Asynchronous Lists

The grammar treats a construct such as:

foo & bar & bam &

as one "asynchronous list", but since the status of each element is tracked by the shell, the term "element of an asynchronous list" was introduced to identify just one of the foo, bar, or bam portions of the overall list.

Unless the implementation has an internal limit, such as {CHILD_MAX}, on the retained process IDs, it would require unbounded memory for the following example:

while true
do      foo & echo $!
done

The treatment of the signals SIGINT and SIGQUIT with asynchronous lists is described in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.11, Signals and Error Handling.

Since the connection of the input to the equivalent of /dev/null is considered to occur before redirections, the following script would produce no output:

exec < /etc/passwd
cat <&0 &
wait

Sequential Lists

There is no additional rationale provided for this section.

AND Lists

There is no additional rationale provided for this section.

OR Lists

There is no additional rationale provided for this section.

Compound Commands

Grouping Commands

The semicolon shown in { compound-list;} is an example of a control operator delimiting the } reserved word. Other delimiters are possible, as shown in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.10, Shell Grammar; <newline> is frequently used.

A proposal was made to use the <do-done> construct in all cases where command grouping in the current process environment is performed, identifying it as a construct for the grouping commands, as well as for shell functions. This was not included because the shell already has a grouping construct for this purpose ( "{}" ), and changing it would have been counter-productive.

For Loop

The format is shown with generous usage of <newline>s. See the grammar in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.10, Shell Grammar for a precise description of where <newline>s and semicolons can be interchanged.

Some historical implementations support '{' and '}' as substitutes for do and done. The standard developers chose to omit them, even as an obsolescent feature. (Note that these substitutes were only for the for command; the while and until commands could not use them historically because they are followed by compound-lists that may contain "{...}" grouping commands themselves.)

The reserved word pair do ... done was selected rather than do ... od (which would have matched the spirit of if ... fi and case ... esac) because od is already the name of a standard utility.

PASC Interpretation 1003.2 #169 has been applied changing the grammar.

Case Conditional Construct

An optional left parenthesis before pattern was added to allow numerous historical KornShell scripts to conform. At one time, using the leading parenthesis was required if the case statement was to be embedded within a "$()" command substitution; this is no longer the case with the POSIX shell. Nevertheless, many historical scripts use the left parenthesis, if only because it makes matching-parenthesis searching easier in vi and other editors. This is a relatively simple implementation change that is upwards-compatible for all scripts.

Consideration was given to requiring break inside the compound-list to prevent falling through to the next pattern action list. This was rejected as not being existing practice. An interesting undocumented feature of the KornShell is that using ";&" instead of ";;" as a terminator causes the exact opposite behavior: the flow of control continues with the next compound-list.

The pattern '*', given as the last pattern in a case construct, is equivalent to the default case in a C-language switch statement.
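
A sketch combining the optional leading parenthesis with a '*' default pattern (the variable and patterns are illustrative):

case $answer in
(yes | y) echo affirmative;;
(no | n)  echo negative;;
(*)       echo unrecognized;;
esac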

The grammar shows that reserved words can be used as patterns, even if one is the first word on a line. Obviously, the reserved word esac cannot be used in this manner.

If Conditional Construct

The precise format for the command syntax is described in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.10, Shell Grammar.

While Loop

The precise format for the command syntax is described in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.10, Shell Grammar.

Until Loop

The precise format for the command syntax is described in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.10, Shell Grammar.

Function Definition Command

The description of functions in an early proposal was based on the notion that functions should behave like miniature shell scripts; that is, except for sharing variables, most elements of an execution environment should behave as if they were a new execution environment, and changes to these should be local to the function. For example, traps and options should be reset on entry to the function, and any changes to them do not affect the traps or options of the caller. There were numerous objections to this basic idea, and the opponents asserted that functions were intended to be a convenient mechanism for grouping common commands that were to be executed in the current execution environment, similar to the execution of the dot special built-in.

It was also pointed out that the functions described in that early proposal did not provide a local scope for everything a new shell script would, such as the current working directory, or umask, but instead provided a local scope for only a few select properties. The basic argument was that if a local scope is needed for the execution environment, the mechanism already existed: the application can put the commands in a new shell script and call that script. All historical shells that implemented functions, other than the KornShell, have implemented functions that operate in the current execution environment. Because of this, traps and options have a global scope within a shell script. Local variables within a function were considered and included in another early proposal (controlled by the special built-in local), but were removed because they do not fit the simple model developed for functions and because there was some opposition to adding yet another new special built-in that was not part of historical practice. Implementations should reserve the identifier local (as well as typeset, as used in the KornShell) in case this local variable mechanism is adopted in a future version of IEEE Std 1003.1-2001.

A separate issue from the execution environment of a function is the availability of that function to child shells. A few objectors maintained that just as a variable can be shared with child shells by exporting it, so should a function. In early proposals, the export command therefore had a -f flag for exporting functions. Functions that were exported were to be put into the environment as name()= value pairs, and upon invocation, the shell would scan the environment for these and automatically define these functions. This facility was strongly opposed and was omitted. Some of the arguments against exportable functions were as follows:

As far as can be determined, the functions in the Shell and Utilities volume of IEEE Std 1003.1-2001 match those in System V. Earlier versions of the KornShell had two methods of defining functions:

function fname { compound-list }

and:

fname() { compound-list }

The latter used the same definition as the Shell and Utilities volume of IEEE Std 1003.1-2001, but differed in semantics, as described previously. The current edition of the KornShell aligns the latter syntax with the Shell and Utilities volume of IEEE Std 1003.1-2001 and keeps the former as is.

The name space for functions is limited to that of a name because of historical practice. Complications in defining the syntactic rules for the function definition command and in dealing with known extensions such as the "@()" usage in the KornShell prevented the name space from being widened to a word. Using functions to support synonyms such as the "!!" and '%' usage in the C shell is thus disallowed to conforming applications, but acceptable as an extension. For interactive users, the aliasing facilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 should be adequate for this purpose. It is recognized that the name space for utilities in the file system is wider than that currently supported for functions, if the portable filename character set guidelines are ignored, but it did not seem useful to mandate extensions in systems for so little benefit to conforming applications.

The "()" in the function definition command consists of two operators. Therefore, intermixing <blank>s with the fname, '(', and ')' is allowed, but unnecessary.

An example of how a function definition can be used wherever a simple command is allowed:

# If variable i is equal to "yes",
# define function foo to be ls -l
#
[ "$i" = yes ] && foo() {
    ls -l
}

C.2.10 Shell Grammar

There are several subtle aspects of this grammar where conventional usage implies rules about the grammar that in fact are not true.

For compound_list, only the forms that end in a separator allow a reserved word to be recognized, so usually only a separator can be used where a compound list precedes a reserved word (such as Then, Else, Do, and Rbrace). Explicitly requiring a separator would disallow such valid (if rare) statements as:

if (false) then (echo x) else (echo y) fi

See the Note under special grammar rule (1).

Concerning the third sentence of rule (1) ("Also, if the parser ..."):

Note that the body of here-documents are handled by token recognition (see the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.3, Token Recognition) and do not appear in the grammar directly. (However, the here-document I/O redirection operator is handled as part of the grammar.)

The start symbol of the grammar ( complete_command) represents either input from the command line or a shell script. It is repeatedly applied by the interpreter to its input and represents a single "chunk" of that input as seen by the interpreter.

Shell Grammar Lexical Conventions

There is no additional rationale provided for this section.

Shell Grammar Rules

There is no additional rationale provided for this section.

C.2.11 Signals and Error Handling

There is no additional rationale provided for this section.

C.2.12 Shell Execution Environment

Some implementations have implemented the last stage of a pipeline in the current environment so that commands such as:

command | read foo

set variable foo in the current environment. This extension is allowed, but not required; therefore, a shell programmer should consider a pipeline to be in a subshell environment, but not depend on it.
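
For example (command here stands for any utility, as in the text above):

# Not reliable: foo may be set only in a subshell environment.
command | read foo
# A portable alternative; note that this captures the entire output, with
# trailing <newline>s removed, rather than a single line:
foo=$(command)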

In early proposals, the description of execution environment failed to mention that each command in a multiple command pipeline could be in a subshell execution environment. For compatibility with some historical shells, the wording was phrased to allow an implementation to place any or all commands of a pipeline in the current environment. However, this means that a POSIX application must assume each command is in a subshell environment, but not depend on it.

The wording about shell scripts is meant to convey the fact that describing "trap actions" can only be understood in the context of the shell command language. Outside of this context, such as in a C-language program, signals are the operative condition, not traps.

C.2.13 Pattern Matching Notation

Pattern matching is a simpler concept and has a simpler syntax than REs, as the former is generally used for the manipulation of filenames, which are relatively simple collections of characters, while the latter is generally used to manipulate arbitrary text strings of potentially greater complexity. However, some of the basic concepts are the same, so this section points liberally to the detailed descriptions in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 9, Regular Expressions.

Patterns Matching a Single Character

Both quoting and escaping are described here because pattern matching must work in three separate circumstances:

  1. Calling directly upon the shell, such as in pathname expansion or in a case statement. All of the following match the string or file abc:

    abc "abc" a"b"c a\bc a[b]c a["b"]c a[\b]c a["\b"]c a?c a*c
    
    

    The following do not:

    "a?c" a\*c a\[b]c
    
    
  2. Calling a utility or function without going through a shell, as described for find and the fnmatch() function defined in the System Interfaces volume of IEEE Std 1003.1-2001.

  3. Calling utilities such as find, cpio, tar, or pax through the shell command line. In this case, shell quote removal is performed before the utility sees the argument. For example, in:

    find /bin -name "e\c[\h]o" -print
    
    

    after quote removal, the backslashes are presented to find and it treats them as escape characters. Both precede ordinary characters, so the c and h represent themselves and echo would be found on many historical systems (that have it in /bin). To find a filename that contained shell special characters or pattern characters, both quoting and escaping are required, such as:

    pax -r ... "*a\(\?"
    
    

    to extract a filename ending with "a(?".

Conforming applications are required to quote or escape the shell special characters (sometimes called metacharacters). If used without this protection, syntax errors can result or implementation extensions can be triggered. For example, the KornShell supports a series of extensions based on parentheses in patterns.

The restriction on a circumflex in a bracket expression is to allow implementations that support pattern matching using the circumflex as the negation character in addition to the exclamation mark. A conforming application must use something like "[\^!]" to match either character.

Patterns Matching Multiple Characters

Since each asterisk matches zero or more occurrences, the patterns "a*b" and "a**b" have identical functionality.

Examples
a[bc]
Matches the strings "ab" and "ac".
a*d
Matches the strings "ad", "abd", and "abcd", but not the string "abc".
a*d*
Matches the strings "ad", "abcd", "abcdef", "aaaad", and "adddd".
*a*d
Matches the strings "ad", "abcd", "efabcd", "aaaad", and "adddd".
Patterns Used for Filename Expansion

The caveat about a slash within a bracket expression is derived from historical practice. The pattern "a[b/c]d" does not match such pathnames as abd or a/d. On some implementations (including those conforming to the Single UNIX Specification), it matched a pathname of literally "a[b/c]d". On other systems, it produced an undefined condition (an unescaped '[' used outside a bracket expression). In this version, the XSI behavior is now required.

Filenames beginning with a period historically have been specially protected from view on UNIX systems. A proposal to allow an explicit period in a bracket expression to match a leading period was considered; it is allowed as an implementation extension, but a conforming application cannot make use of it. If this extension becomes popular in the future, it will be considered for a future version of the Shell and Utilities volume of IEEE Std 1003.1-2001.

Historical systems have varied in their permissions requirements. Matching f*/bar has required read permission on the f* directories in the System V shell, but the Shell and Utilities volume of IEEE Std 1003.1-2001, the C shell, and the KornShell require only search permission.

C.2.14 Special Built-In Utilities

See the RATIONALE sections on the individual reference pages.

