The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright © 2001-2004 The IEEE and The Open Group

A.12 Utility Conventions

A.12.1 Utility Argument Syntax

The standard developers considered that recent trends toward diluting the SYNOPSIS sections of historical reference pages to the equivalent of:

command [options][operands]

were a disservice to the reader. Therefore, considerable effort was placed into rigorous definitions of all the command line arguments and their interrelationships. The relationships depicted in the synopses are normative parts of IEEE Std 1003.1-2001; this information is sometimes repeated in textual form, but that is only for clarity within context.

The use of "undefined" for conflicting argument usage and for repeated usage of the same option is meant to prevent conforming applications from using conflicting arguments or repeated options unless specifically allowed (as is the case with ls, which allows simultaneous, repeated use of the -C, -l, and -1 options). Many historical implementations will tolerate this usage, choosing either the first or the last applicable argument. This tolerance can continue, but conforming applications cannot rely upon it. (Other implementations may choose to print usage messages instead.)

The use of "undefined" for conflicting argument usage also allows an implementation to make reasonable extensions to utilities where the implementor considers mutually-exclusive options according to IEEE Std 1003.1-2001 to have a sensible meaning and result.

IEEE Std 1003.1-2001 does not define the result of a command when an option-argument or operand is not followed by ellipses and the application specifies more than one of that option-argument or operand. This allows an implementation to define valid (although non-standard) behavior for the utility when more than one such option or operand is specified.

The following table summarizes the requirements for option-arguments:

 

SYNOPSIS Shows:                       -a arg             -barg              -c[arg]
Conforming application uses:          -a arg             -barg              -carg or -c
System supports:                      -a arg and -aarg   -b arg and -barg   -carg and -c
Non-conforming applications may use:  -aarg              -b arg             N/A

Allowing <blank>s after an option (that is, placing an option and its option-argument into separate argument strings) when IEEE Std 1003.1-2001 does not require it encourages portability of users, while still preserving backwards-compatibility of scripts. Inserting <blank>s between the option and the option-argument is preferred; however, historical usage has not been consistent in this area; therefore, <blank>s are required to be handled by all implementations, but implementations are also allowed to handle the historical syntax. Another justification for selecting the multiple-argument method was that the single-argument case is inherently ambiguous when the option-argument can legitimately be a null string.
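As a rough illustration of the "System supports" row, the following sketch uses Python's standard getopt module, which is modeled on the getopt() function. The option letters -a and -b mirror the table; the helper name parse_ab is hypothetical and not part of any standard.

```python
import getopt

def parse_ab(argv):
    """Parse argv for -a and -b, each of which requires an option-argument."""
    opts, operands = getopt.getopt(argv, "a:b:")
    return opts, operands

# The multiple-argument and single-argument spellings give the same result.
separate = parse_ab(["-a", "arg"])
joined = parse_ab(["-aarg"])

# A null option-argument can only be passed as a separate argument string;
# joined to the option it would collapse to a bare "-a".
null_arg = parse_ab(["-a", ""])
```

Here separate and joined compare equal, while the null-string case shows why the single-argument form is ambiguous: there is no way to write an empty option-argument adjacent to the option letter.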

IEEE Std 1003.1-2001 explicitly states that digits are permitted as operands and option-arguments. The lower and upper bounds for the values of the numbers used for operands and option-arguments were derived from the ISO C standard values for {LONG_MIN} and {LONG_MAX}. The requirement on the standard utilities is that numbers in the specified range do not cause a syntax error, although the specification of a number need not be semantically correct for a particular operand or option-argument of a utility. For example, the specification of:

dd obs=3000000000

would yield undefined behavior for the application and could be a syntax error because the number 3000000000 is outside of the range -2147483647 to +2147483647. On the other hand:

dd obs=2000000000

may cause some error, such as "blocksize too large", rather than a syntax error.
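The distinction can be sketched as a simple range check. The bounds below are the ones quoted in the text; the function name in_portable_range is hypothetical, and a real utility may still reject an in-range value for semantic reasons of its own.

```python
# Range quoted in the text; values outside it may be treated as syntax errors.
NUM_MIN = -2147483647
NUM_MAX = 2147483647

def in_portable_range(text):
    """Return True if text is a decimal integer within the quoted range."""
    value = int(text)  # raises ValueError for non-numeric text
    return NUM_MIN <= value <= NUM_MAX
```

Under this sketch, "2000000000" passes the range check (any later failure would be semantic, such as a block size that is too large), whereas "3000000000" does not.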

A.12.2 Utility Syntax Guidelines

This section is based on the rules listed in the SVID. It was included for two reasons:

  1. The individual utility descriptions in the Shell and Utilities volume of IEEE Std 1003.1-2001, Chapter 4, Utilities needed a set of common (although not universal) actions on which they could anchor their descriptions of option and operand syntax. Most of the standard utilities actually do use these guidelines, and many of their historical implementations use the getopt() function for their parsing. Therefore, it was simpler to cite the rules and merely identify exceptions.

  2. Writers of conforming applications need suggested guidelines if the POSIX community is to avoid the chaos of historical UNIX system command syntax.

It is recommended that all future utilities and applications use these guidelines to enhance "user portability". The fact that some historical utilities could not be changed (to avoid breaking historical applications) should not deter this future goal.

The voluntary nature of the guidelines is highlighted by repeated uses of the word should throughout. This usage should not be misinterpreted: utilities that claim conformance to the guidelines in their OPTIONS sections always conform.

Guidelines 1 and 2 encourage utility writers to use only characters from the portable character set because use of locale-specific characters may make the utility inaccessible from other locales. Use of uppercase letters is discouraged due to problems associated with porting utilities to systems that do not distinguish between uppercase and lowercase characters in filenames. Use of non-alphanumeric characters is discouraged due to the number of utilities that treat non-alphanumeric characters in "special" ways depending on context (such as the shell using whitespace characters to delimit arguments, various quote characters for quoting, the dollar sign to introduce variable expansion, etc.).

In the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.9.1, Simple Commands, it is further stated that a command used in the Shell Command Language cannot be named with a trailing colon.

Guideline 3 was changed to allow alphanumeric characters (letters and digits) from the character set to allow compatibility with historical usage. Historical practice allows the use of digits wherever practical, and there are no portability issues that would prohibit the use of digits. In fact, from an internationalization viewpoint, digits (being non-language-dependent) are preferable over letters (a -2 is intuitively self-explanatory to any user, while in the -f filename the letter 'f' is a mnemonic aid only to speakers of Latin-based languages where "filename" happens to translate to a word that begins with 'f'). Since Guideline 3 still retains the word "single", multi-digit options are not allowed. Instances of historical utilities that used them have been marked obsolescent, with the numbers being changed from option names to option-arguments.

It was difficult to achieve a satisfactory solution to the problem of name space in option characters. When the standard developers desired to extend the historical cc utility to accept ISO C standard programs, they found that all of the portable alphabet was already in use by various vendors. Thus, they had to devise a new name, c89 (now superseded by c99), rather than something like cc -X. There were suggestions that implementors be restricted to providing extensions through various means (such as using a plus sign as the option delimiter or using option characters outside the alphanumeric set) that would reserve all of the remaining alphanumeric characters for future POSIX standards. These approaches were resisted because they lacked the historical style of UNIX systems. Furthermore, if a vendor-provided option should become commonly used in the industry, it would be a candidate for standardization. It would be desirable to standardize such a feature using historical practice for the syntax (the semantics can be standardized with any syntax). This would not be possible if the syntax was one reserved for the vendor. However, since the standardization process may lead to minor changes in the semantics, it may prove to be better for a vendor to use a syntax that will not be affected by standardization.

Guideline 8 includes the concept of comma-separated lists in a single argument. It is up to the utility to parse such a list itself because getopt() just returns the single string. This situation was retained so that certain historical utilities would not violate the guidelines. Applications preparing for international use should be aware of an occasional problem with comma-separated lists: in some locales, the comma is used as the radix character. Thus, if an application is preparing operands for a utility that expects a comma-separated list, it should avoid generating non-integer values through one of the means that is influenced by setting the LC_NUMERIC variable (such as awk, bc, printf, or printf()).
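A minimal sketch of this division of labor, using Python's standard getopt module as a stand-in for getopt(): the parser returns the option-argument as one string, and the utility splits it. The option letter -o and the function name parse_list_option are hypothetical.

```python
import getopt

def parse_list_option(argv):
    """Return the items given to a hypothetical -o option, plus the operands."""
    opts, operands = getopt.getopt(argv, "o:")
    items = []
    for name, value in opts:
        if name == "-o":
            # getopt hands back a single string; splitting is the utility's job.
            items.extend(value.split(","))
    return items, operands

items, operands = parse_list_option(["-o", "a,b,c", "file"])
```

The radix-character hazard follows directly: if a caller intends the two values "1,5" and "2" but formats the first one in a locale whose radix character is a comma, the utility receives "1,5,2" and splits it into three items.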

Applications calling any utility with a first operand starting with '-' should usually specify --, as indicated by Guideline 10, to mark the end of the options. This is true even if the SYNOPSIS in the Shell and Utilities volume of IEEE Std 1003.1-2001 does not specify any options; implementations may provide options as extensions to the Shell and Utilities volume of IEEE Std 1003.1-2001. The standard utilities that do not support Guideline 10 indicate that fact in the OPTIONS section of the utility description.
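The effect of the -- delimiter can be sketched with Python's standard getopt module, which follows the same convention. The utility here is hypothetical and accepts only a -a option; the operand "-file" is an invented example.

```python
import getopt

def parse_only_a(argv):
    """Parse argv for a hypothetical utility whose only option is -a."""
    return getopt.getopt(argv, "a")

# With "--", the operand "-file" is not mistaken for a group of options.
opts, operands = parse_only_a(["--", "-file"])

# Without "--", the same string is parsed as options and rejected.
try:
    parse_only_a(["-file"])
    rejected = False
except getopt.GetoptError:
    rejected = True
```

In this sketch the first call yields no options and the single operand "-file", while the second fails because 'f' is not a recognized option letter.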

Guideline 11 was modified to clarify that the order of different options should not matter relative to one another. However, the order of repeated options that also have option-arguments may be significant; therefore, such options are required to be interpreted in the order that they are specified. The make utility is an instance of a historical utility that uses repeated options in which the order is significant. Multiple files are specified by giving multiple instances of the -f option; for example:

make -f common_header -f specific_rules target
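A sketch of order-preserving handling for the command line above, again using Python's standard getopt module in place of getopt(). The function name makefiles_in_order is hypothetical; this is not how any particular make implementation is written.

```python
import getopt

def makefiles_in_order(argv):
    """Collect the option-arguments of repeated -f options in command-line order."""
    opts, operands = getopt.getopt(argv, "f:")
    # opts preserves the order in which the options were specified.
    files = [value for name, value in opts if name == "-f"]
    return files, operands

files, operands = makefiles_in_order(
    ["-f", "common_header", "-f", "specific_rules", "target"]
)
```

Because the parsed options are kept in the order given, common_header is read before specific_rules, which is the behavior the guideline requires for repeated options with option-arguments.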

Guideline 13 does not imply that all of the standard utilities automatically accept the operand '-' to mean standard input or output, nor does it specify the actions of the utility upon encountering multiple '-' operands. It simply says that, by default, '-' operands are not used for other purposes in utilities that read or write files (but not in those that merely use stat(), unlink(), touch, and so on). All information concerning actual treatment of the '-' operand is found in the individual utility sections.
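For a utility that does choose to honor the convention, the handling might look like the following sketch. The utility is hypothetical and takes no options; the labels "stdin" and "file" and the function name classify_operands are illustrative only, and an actual utility's treatment of '-' is whatever its own description specifies.

```python
import getopt

def classify_operands(argv):
    """Label each operand of a hypothetical file-reading utility."""
    # A lone "-" is never parsed as an option, so it arrives as an operand.
    _, operands = getopt.getopt(argv, "")
    return [("stdin" if op == "-" else "file", op) for op in operands]

labels = classify_operands(["-", "notes.txt"])
```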

An area of concern was that as implementations mature, implementation-defined utilities and implementation-defined utility options will result. The idea was expressed that there needed to be a standard way, say an environment variable or some such mechanism, to identify implementation-defined utilities separately from standard utilities that may have the same name. It was decided that there already exist several ways of dealing with this situation and that it is outside of the scope to attempt to standardize in the area of non-standard items. A method that exists on some historical implementations is the use of the so-called /local/bin or /usr/local/bin directory to separate local or additional copies or versions of utilities. Another method that is also used is to isolate utilities into completely separate domains. Still another method to ensure that the desired utility is being used is to request the utility by its full pathname. There are many approaches to this situation; the examples given above serve to illustrate that there is more than one.


UNIX ® is a registered Trademark of The Open Group.
POSIX ® is a registered Trademark of The IEEE.