CS 11: How to write usage statements

If a program is called with incorrect command-line arguments, it should detect that fact and print a usage statement to the terminal. This document tells you what that usage statement should contain, and describes the conventions we expect you to use. It is not exhaustive; there are fairly standard ways to write even more elaborate usage statements than those that are described here, but the information below will at least get you started writing decent usage statements.

Usage statements are essentially the same for all programming languages. The conventions we describe below are for Unix-based operating systems (Linux and Mac OS X); similar conventions exist for Windows but some details may be different.

If you don't know what a command-line argument is, you are not ready to read this document. Go back to your language tutorial, learn what a command-line argument is, come back here and continue reading.

Purpose of a usage statement

A usage statement is a printed summary of how to invoke a program from a shell prompt (i.e. how to write a command line). It will include a description of all the possible command-line arguments that the program might take:

how many command-line arguments there are
what each command-line argument represents
whether a given command-line argument is optional or not
any other information relating to the command-line arguments

When and where to print a usage statement

When

A usage statement should always be printed when the user invokes a program with either:

the wrong number of command-line arguments (not enough or too many)
invalid command-line arguments (e.g. an English word where a number is expected)

It's important that you explicitly check the number and contents of command-line arguments so as to make sure that they are valid. This process is slightly different for each programming language, so we won't go into it here (see the language tracks for details about this).
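
Although the details belong to the language tracks, a minimal Python sketch gives the flavor of such a check. The program and its two arguments (input_file and ngens) are hypothetical, chosen only for illustration, as is the rule that ngens must be a non-negative integer.

```python
import sys


def parse_args(argv):
    """Return (input_file, ngens), or None if the arguments are invalid.

    argv is the full argument list, with the program name in argv[0].
    The two arguments are hypothetical examples.
    """
    # Wrong number of arguments: exactly two are required.
    if len(argv) != 3:
        return None
    input_file = argv[1]
    # Invalid contents: ngens must be an integer (assumed non-negative).
    try:
        ngens = int(argv[2])
    except ValueError:
        return None
    if ngens < 0:
        return None
    return input_file, ngens


def main(argv):
    """Return the exit status: 0 on success, 1 after a usage error."""
    args = parse_args(argv)
    if args is None:
        print(f"usage: {argv[0]} input_file ngens", file=sys.stderr)
        return 1
    input_file, ngens = args
    print(f"running {ngens} generations on {input_file}")
    return 0
```

A script would typically end with sys.exit(main(sys.argv)) so that the shell sees a non-zero exit status after a usage error.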

To force a program to print its usage message, all you need to do is to type in the program name with no arguments (for programs that take at least one command-line argument). This is a common trick. However, some programs don't have any required command-line arguments (all the arguments are optional; see below). In cases like these, there is usually an optional argument called -help which will cause the usage message to be printed. Some programs use --help instead of -help for this purpose.
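
As a rough sketch of the help-flag convention, a Python program whose arguments are all optional might look for -help or --help before doing anything else. The function names here are hypothetical, and the sketch accepts both spellings only to show that either is possible.

```python
import sys


def help_requested(argv):
    """Return True if -help or --help appears among the arguments."""
    # argv[0] is the program name, so only argv[1:] is searched.
    return any(arg in ("-help", "--help") for arg in argv[1:])


def main(argv):
    if help_requested(argv):
        # Square brackets mark -help as an optional argument.
        print(f"usage: {argv[0]} [-help]", file=sys.stderr)
        return
    print("doing the normal work")
```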

Where

The usage message is normally printed to the terminal. In Unix systems, "printing to the terminal" comes in two flavors. Normal printing to the terminal means printing to "standard output" (called stdout in the C language and similar things in other languages). This is not where you should print usage statements. Usage statements should be printed to "standard error" (called stderr in the C language and similar things in other languages). It's important to always print usage statements to stderr and not to stdout.

You might wonder why this is important, given that printing to both stdout and stderr prints to the same terminal window. The reason is that it is possible to redirect either stdout or stderr independently to a file rather than to a terminal. This is very useful in practice. For instance, you might want to log all of your error messages to a log file, but have the normal output go to the terminal as usual. Or you may want to redirect the non-error output to one file, and all the error messages to another file. Having error messages printed to stderr instead of to stdout makes this easy. If all error messages were printed to stdout, the normal output and error messages would not be easy to separate.
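
The difference is easy to see in a small Python sketch. The two functions below are hypothetical; the only point is that normal output goes to sys.stdout while the usage statement goes to sys.stderr.

```python
import sys


def report(result, out=sys.stdout):
    """Print normal program output to standard output."""
    print(result, file=out)


def print_usage(prog, err=sys.stderr):
    """Print the usage statement to standard error."""
    print(f"usage: {prog} input_file output_file", file=err)
```

If a script using these functions were run from a Unix shell as python myprog.py > out.txt 2> err.log , the normal output would end up in out.txt and the usage statement in err.log .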

What your usage statement should contain

Your usage statement should contain:

The name of the program
Every non-optional command-line argument your program takes
Every optional command-line argument your program takes
Any extra descriptive material that the user should know about

Formatting guidelines

The usage statement should always begin with the literal word usage in lower case, followed by a colon and a space, followed by the rest of the usage message. So, the usage message starts with usage: . If there are no command-line arguments, the usage message will be very simple, e.g.
```
    usage: myprog
```
where myprog is the name of the program.
Every command-line argument in the usage statement should be a single word, with no spaces. If you want to write an argument as multiple words, join the words together using underscores. So don't write number of generations as a command-line argument; write number_of_generations .
Do not surround a command-line argument name with parentheses, square brackets, angle brackets, curly brackets, or quotation marks. For our example, don't write [number_of_generations] or <number_of_generations> or "number_of_generations" ; just write number_of_generations . This is important because square brackets and angle brackets have special meanings in usage messages, and the other forms just look ugly.
The word representing a particular command-line argument should be descriptive. It should say what the command-line argument is supposed to represent, at least in general terms. So this would be a bad usage statement:
```
    usage: myprog arg1 arg2
```
because arg1 and arg2 could mean anything. This would be a better usage message:
```
    usage: myprog input_file output_file
```
because it suggests that the first command-line argument (input_file) represents the name of an input file, and the second command-line argument (output_file) presumably represents the name of an output file. This is useful information to someone using the program.
Do not separate successive command-line arguments with commas or semicolons; just separate them with single spaces. So don't write
```
    usage: myprog input_file, output_file
```
Instead, write:
```
    usage: myprog input_file output_file
```
Don't use symbolic characters in command-line argument names in usage statements. For instance, instead of writing number_of_generations above you might be tempted to write #generations . Don't do this; use a common abbreviation instead, such as ngenerations (or even ngens if you want it to be shorter). Using "n" as the first letter of a command-line argument name is a commonly used abbreviation for "number of".
If you have multiple command-line arguments of the same kind, you can number them. So if your program takes three files of the same type, you might write the usage statement as
```
    usage: myprog file1 file2 file3
```
Make sure you don't put a space before the number! In other words, don't do this:
```
    usage: myprog file 1 file 2 file 3
```
The reason for this is that it's hard to tell if the 1, 2, and 3 are supposed to be separate command-line arguments of their own.
If you need to explain in detail what a particular command-line argument means, do it on the lines following the first line. Don't feel that you have to cram the entire usage message on one line. For instance, this is bad:
```
    usage: myprog infile (the input file) outfile (the output file)
```
It should instead look like this:
```
    usage: myprog infile outfile
      infile:  the input file
      outfile: the output file
```
Note how the first line contains an example of the usage, while the subsequent lines explain what the command-line arguments mean in detail. This is a good pattern to follow.
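
One way to keep the first line and the explanatory lines in agreement is to build both from the same list. The Python sketch below assumes a hypothetical helper, usage_message, that takes the program name and a list of (name, description) pairs; the two-space indentation and the aligned descriptions are layout choices made for this example, not requirements.

```python
def usage_message(prog, args):
    """Build a multi-line usage message from (name, description) pairs."""
    # First line: the word "usage", the program name, then each argument name.
    lines = ["usage: " + " ".join([prog] + [name for name, _ in args])]
    # Pad every "name:" label to the same width so the descriptions line up.
    width = max((len(name) for name, _ in args), default=0) + 1
    for name, desc in args:
        lines.append(f"  {(name + ':').ljust(width)} {desc}")
    return "\n".join(lines)
```

Calling usage_message("myprog", [("infile", "the input file"), ("outfile", "the output file")]) returns a three-line message with the same wording as the good example above.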

Optional arguments in usage statements

Many programs have optional command-line arguments. There are various situations in which this can arise:

There can be optional "flags" which alter the behavior of the program in some way.
The optional flags can themselves have arguments.
There can be a variable number of arguments of the same type.

There are a few conventions you should follow with optional arguments. Here they are:

Optional arguments should always be surrounded by square brackets. Do not use square brackets for any other purpose in usage messages. Don't use anything but square brackets for this purpose. Square brackets mean that an argument is optional, always.
Optional "flags" (arguments that change the way the program works) should start with a dash. Very often (but not always) they will have a name which is a dash followed by a single letter, which identifies what it is. Here's an example, which is a greatly simplified version of the usage message for the Unix ls program which lists files:
```
    usage: ls [-a] [-F]
```
This program is called ls and can be called with no arguments, with one optional argument ( ls -a or ls -F ) or with two optional arguments ( ls -a -F ). In this particular case, the -a optional argument says to list hidden files as well as normal ones, and -F changes the way the output of the program is formatted.

Note that each optional argument gets its own set of square brackets.
If an optional "flag" argument itself has arguments, put them inside the square brackets. For instance:
```
    usage: chess [-strength r]
        -strength r: playing strength in approximate rating (800-3000)
                     (default strength is 2200)
```
might represent a chess program that can either play with its default strength (when invoked with no command-line arguments) or with some other strength if the optional argument is used. Note the extra explanatory text that goes after the usage statement itself. This program could be invoked like this:
```
% chess
```
(where % is the prompt) or like this:
```
% chess -strength 2600
```
but not like this:
```
% chess -strength
```
because that wouldn't make sense (the strength isn't specified).
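A minimal Python sketch of how the chess program might check these cases follows. The function name parse_strength is hypothetical, and the sketch assumes that the rating must be an integer within the range given in the usage message.
```python
DEFAULT_STRENGTH = 2200  # default taken from the usage message above


def parse_strength(argv):
    """Return the playing strength, or None if the arguments are invalid."""
    args = argv[1:]
    # No arguments at all: play at the default strength.
    if not args:
        return DEFAULT_STRENGTH
    # Otherwise the only valid form is exactly: -strength r
    if len(args) != 2 or args[0] != "-strength":
        return None
    try:
        rating = int(args[1])
    except ValueError:
        return None
    # Assumed here: ratings outside 800-3000 are rejected.
    if not 800 <= rating <= 3000:
        return None
    return rating
```
When parse_strength returns None, the caller would print the usage statement to stderr and exit.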
If your program has multiple arguments of the same type on the command line (e.g. it takes a series of numbers between 1 and 10 of arbitrary length, but at least one of them), put any non-optional arguments as normal arguments in the usage message (normally this will be the first one of the arguments) and put the rest in square brackets. Since the number of arguments is arbitrary, you can use an ellipsis ( ... ) to indicate that any number of arguments can follow. For instance:
```
    usage: average n1 [n2 ...]
        n1, n2, etc.: numbers between 1 and 10
        Maximum length of the list: 100
```
This usage message indicates that the program (which is called average ) takes at least one number (called n1 ) and perhaps some other numbers, all between 1 and 10. There can be at most 100 numbers. Note how the lines after the usage statement itself were used to explain the meaning of the command-line arguments and also describe constraints on them (that there be no more than 100 numbers, and that each number is between 1 and 10).
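
The constraints stated in the usage message for average could be checked along the following lines. This is only a Python sketch: the function names are hypothetical, and it assumes that the numbers may be non-integers, which the usage message does not specify.

```python
MAX_NUMBERS = 100  # maximum list length, from the usage message above


def parse_numbers(argv):
    """Return the list of numbers, or None if the arguments are invalid."""
    args = argv[1:]
    # At least one number (n1) is required, and at most MAX_NUMBERS.
    if not 1 <= len(args) <= MAX_NUMBERS:
        return None
    try:
        numbers = [float(arg) for arg in args]
    except ValueError:
        return None
    # Every number must lie between 1 and 10.
    if any(not 1 <= n <= 10 for n in numbers):
        return None
    return numbers


def usage(prog):
    """Return the usage message, including the constraints."""
    return (f"usage: {prog} n1 [n2 ...]\n"
            f"    n1, n2, etc.: numbers between 1 and 10\n"
            f"    Maximum length of the list: {MAX_NUMBERS}")
```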

Other notes

You shouldn't hard-code the program name into your usage message. Hard-coding makes no difference to how the message is displayed, but if you later rename the program and forget to update the usage message, the message will be wrong. Instead, write your code so that the program name gets inserted into the usage message before printing. In C, the program name is argv[0] (the first element of the argv array), whereas in Python it's sys.argv[0] .
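In Python, a sketch of this might look as follows. The function name usage is hypothetical, and stripping the directory part with os.path.basename is an optional choice made for this example; printing sys.argv[0] unchanged is also common.
```python
import os
import sys


def usage(argv0):
    """Return a usage message built from the name the program was run as."""
    # Optional: drop any leading directories, e.g. "./bin/myprog" -> "myprog".
    prog = os.path.basename(argv0)
    return f"usage: {prog} input_file output_file"


def print_usage():
    """Print the usage message for the running program to stderr."""
    print(usage(sys.argv[0]), file=sys.stderr)
```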
Some programs (e.g. C track lab 2) use the less-than ( < ) and greater-than ( > ) symbols to perform Unix file redirection on the command line. The less-than symbol means that the contents of a file will look to the program like input typed at the terminal, and the greater-than symbol means that output that would normally go to the terminal is redirected to a file instead. If this is done, it is a good idea to indicate this in the usage message (though many, many programs don't do this). For instance:
```
    usage: myprog  < input_file  > output_file
```
This says that the program ( myprog ) takes its input from the file input_file using input file redirection (remember to put the < symbol before the file name!) and puts its output into the file output_file using output file redirection.

Note that this also explains why you don't want to use the less-than ( < ) or greater-than ( > ) characters as angle brackets around a command-line argument name; a user might think you meant input/output redirection.
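
A program that relies on redirection never sees the < and > symbols or the file names; the shell opens the files and connects them to the program's standard input and standard output. The Python sketch below illustrates this with a hypothetical program that upper-cases its input; the transformation is arbitrary and only stands in for real work.

```python
import sys


def copy_upper(infile, outfile):
    """Copy infile to outfile, upper-casing each line."""
    for line in infile:
        outfile.write(line.upper())


def main(argv):
    """Return the exit status: 0 on success, 1 after a usage error."""
    # The shell handles the redirection, so the program itself
    # expects no command-line arguments at all.
    if len(argv) != 1:
        print(f"usage: {argv[0]} < input_file > output_file", file=sys.stderr)
        return 1
    copy_upper(sys.stdin, sys.stdout)
    return 0
```

Run as python myprog.py < input_file > output_file , the script reads input_file through sys.stdin and writes output_file through sys.stdout .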