It is important to understand that while some computer languages (e.g. Scheme or Basic) are normally used with an interactive interpreter (where you type in commands that are immediately executed), C doesn't work that way. C source code files are always compiled into binary code by a program called a "compiler" and then executed. This is actually a multi-step process which we describe in some detail here.
Compiling C programs requires you to work with four kinds of files:
Regular source code files. These files contain function
definitions, and have names which end in ".c" by convention.
Header files. These files contain function declarations (also
known as function prototypes) and various preprocessor statements (see
below). They are used to allow source code files to access
externally-defined functions. Header files end in ".h" by
convention.
Object files. These files are produced as the output of the
compiler. They consist of function definitions in binary form, but they are
not executable by themselves. Object files end in ".o" by
convention, although on some operating systems (e.g. Windows, MS-DOS),
they often end in ".obj".
Binary executables. These are produced as the output of a
program called a "linker". The linker links together a number of object
files to produce a binary file which can be directly executed. Binary
executables have no special suffix on Unix operating systems, although they
generally end in ".exe" on Windows.
There are other kinds of files as well, notably static libraries
(".a" files) and shared libraries (".so" files),
but you won't normally need to deal with them directly.
Before the C compiler starts compiling a source code file, the file is
processed by a preprocessor. This is in reality a separate program
(normally called "cpp", for "C preprocessor"), but it is invoked
automatically by the compiler before compilation proper begins. What the
preprocessor does is convert the source code file you write into another
source code file (you can think of it as a "modified" or "expanded" source
code file). That modified file may exist as a real file in the file system,
or it may only be stored in memory for a short time before being sent to the
compiler. Either way, you don't have to worry about it, but you do have to
know what the preprocessor commands do.
Preprocessor commands start with the pound sign ("#"). There
are several preprocessor commands; two of the most important are:
#define. This is mainly used to define constants.
For instance,
#define BIGNUM 1000000
specifies that wherever the name BIGNUM appears as a separate word
in the rest of the file (but not inside a quoted string), 1000000
should be substituted for it. For instance, the statement:
int a = BIGNUM;
becomes
int a = 1000000;
#define is used in this way so as to avoid having to
explicitly write out some constant value in many different places in a source
code file. This is important in case you need to change the constant value
later on; it's much less bug-prone to change it once, in the
#define, than to have to change it in multiple places scattered
all over the code.
#include. This is used to access declarations of functions
that are defined outside of a source code file. For instance:
#include <stdio.h>
causes the preprocessor to paste the contents of
<stdio.h> into the source code file at the location of the
#include statement before it gets compiled.
#include is almost always used to include header files, which
are files which mainly contain function declarations and #define
statements. In this case, we use #include in order to be able
to use functions such as printf and scanf, whose
declarations are located in the file stdio.h. C compilers do
not allow you to use a function unless it has previously been declared or
defined in that file; #include statements are thus the way to
re-use previously-written code in your C programs.
There are a number of other preprocessor commands as well, but we will deal with them as we need them.
After the C preprocessor has included all the header files and expanded
out all the #define and #include statements (as
well as any other preprocessor commands that may be in the original file),
the compiler can compile the program. It does this by turning the C source
code into an object code file, which is a file ending in
".o" which contains the binary version of the source code.
Object code is not directly executable, though. In order to make an
executable, you also have to add the code for all of the library functions
whose declarations were #included into the file (#include only brings in
the declarations, not the code for the functions themselves). This is the job
of the linker (see the next section).
In general, the compiler is invoked as follows:
% gcc -c foo.c
where % is the Unix prompt. This tells the compiler to run
the preprocessor on the file foo.c and then compile it into the
object code file foo.o. The -c option means to
compile the source code file into an object file but not to invoke the
linker. If your entire program is in one source code file, you can instead
do this:
% gcc foo.c -o foo
This tells the compiler to run the preprocessor on foo.c,
compile it and then link it to create an executable called foo.
The -o option states that the next word on the line is the name
of the binary executable file (program). If you don't specify the
-o,
i.e. if you just type gcc foo.c, the executable will be named
a.out for silly historical reasons.
Note also that the name of the compiler we are using is gcc,
which stands for "GNU C compiler" or "GNU compiler collection" depending on
who you listen to. Other C compilers exist; many of them have the name
cc, for "C compiler". On most Linux systems cc is simply
another name for gcc.
The job of the linker is to link together a bunch of object files
(.o files) into a binary executable. This includes both the
object files that the compiler created from your source code files as well as
object files that have been pre-compiled for you and collected into
library files. These files have names which end in .a or
.so, and you normally don't need to know about them, as the
linker knows where most of them are located and will link them in
automatically as needed.
Like the preprocessor, the linker is a separate program called
ld. Also like the preprocessor, the linker is invoked
automatically for you when you use the compiler. The normal way of using the
linker is as follows:
% gcc foo.o bar.o baz.o -o myprog
This line tells the compiler to link together three object files
(foo.o, bar.o, and baz.o) into a
binary executable file named myprog. Now you have a file called
myprog that you can run and which will hopefully do something
cool and/or useful.
This is all you need to know to begin compiling your own C programs.
Generally, we also recommend that you use the -Wall command-line
option:
% gcc -Wall -c foo.c
The -Wall option causes the compiler to warn you about legal
but dubious code constructs, and will help you catch a lot of bugs very
early. If you want to be even stricter (and who doesn't?), do this:
% gcc -Wall -Wstrict-prototypes -ansi -pedantic -c foo.c
The -Wstrict-prototypes option means that the compiler will
warn you if you haven't written correct prototypes for all your functions.
The -ansi option tells the compiler to follow the original ANSI C standard
(C89/C90), and -pedantic makes it warn about any construct that standard
does not allow (i.e. constructs that may be legal in gcc but not in all
standard C compilers; such features should usually be avoided).
Kernighan and Ritchie, The C Programming Language, 2nd Ed.
The man page for gcc. Type: man gcc
at the Unix prompt.
The GNU Info documentation on gcc. Warning! This
is far more information than most people could possibly absorb in the average
millennium.
Info documentation on gcc can be accessed through the GNU
Emacs editor by typing "M-x info" (where "M-x" means to hit the meta-key and
"x" simultaneously), or "C-h i" (where "C-h" means to hit the control key and
"h" simultaneously, and then "i" by itself), followed by "mgcc<return>". Type
"minfo<return>" instead for a quick tour of how to use info. You can
also access the info documentation from the Unix command line by typing
info gcc.