C++ track: compiling C++ programs.
It is important to understand that while some computer languages (e.g.
Scheme or BASIC) are normally used with an interactive interpreter (where you
type in commands that are immediately executed), C++ doesn't work that way.
C++ source code files are always compiled into binary code by a
program called a "compiler" and then executed. This is actually a multi-step
process which we describe in some detail here.
The different kinds of files
Compiling C++ programs requires you to work with four kinds of files:
- Regular source code files. These files contain function
definitions, and have names which end in ".cc" by convention (although
sometimes you will see source code filenames which end in ".cpp" or ".C").
- Header files. These files contain class declarations, function
declarations (also known as function prototypes) and various preprocessor
statements (see below). They are used to allow source code files to access
externally-defined classes and functions. Header files end in ".hh" or ".h"
by convention.
- Object files. These files are produced as the output of the
compiler. They consist of function definitions in binary form, but they are
not executable by themselves. Object files end in ".o" by convention,
although on some operating systems (Windows, MS-DOS), they often end in
".obj".
- Binary executables. These are produced as the output of a
program called a "linker". The linker links together a number of object
files to produce a binary file which can be directly executed. Binary
executables have no special suffix on Unix operating systems, although they
generally end in ".exe" on Windows.
There are other kinds of files as well, notably libraries (".a" files) and
shared libraries (".so" files), but you won't normally need to deal with them
directly.
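As a rough illustration of how the first two kinds of files relate, the sketch below shows what a header file and its matching source code file might contain. The file names square.hh and square.cc and the function square are made up for this example. Both pieces are shown in a single listing so that it compiles on its own.

```cpp
// ---- what might go in a header file, e.g. square.hh (hypothetical name) ----
// A function declaration (prototype): the name and the types, but no body.
int square(int x);

// ---- what might go in a source code file, e.g. square.cc (hypothetical) ----
// In a real project this file would start with:  #include "square.hh"
// The function definition: the actual code for the function.
int square(int x)
{
    return x * x;
}
```

Compiling such a source code file produces an object file (here it would be square.o), which holds the binary form of square but cannot be run by itself, because it has no main function.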
The preprocessor
Before the C++ compiler starts compiling a source code file, the file is
processed by a preprocessor. This is traditionally a separate program (normally
called "cpp", for "C preprocessor"), but it is invoked automatically by the
compiler before compilation proper begins. Preprocessor commands start with
the pound sign ("#"). There is really only one preprocessor command you need
to know for this track:
- #include. This is used to access declarations of functions, classes and
objects that are defined outside of a source code file. For instance:
#include <iostream>
causes the preprocessor to paste the contents of <iostream> into
your source code file. #include is almost always used to include
header files. In this case, we use #include in order to be able to
use the cin and cout objects (input/output streams), whose
declarations are located in the standard header iostream. C++ compilers do not
allow you to use a class, function or global object unless it has previously
been declared or defined in that file; #include statements are thus
the way to re-use previously-written code in your C++ programs. Note that,
unlike C headers, the standard C++ library headers have no file extension.
Your own header files still need their full names, written in double quotes
instead of angle brackets (e.g. #include "foo.hh").
There are a number of other preprocessor commands as well, but we won't be
needing them. In particular, C programmers should note that you should
never use #define to define a constant! Instead, use
const:
const int BIGNUM = 1000000;
This is much safer, since the compiler can use the type information to check
that BIGNUM is being used correctly. It's good to do this in C code as well.
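The difference can be sketched as follows. A name created with #define is replaced textually by the preprocessor before the compiler ever sees it, so it has no type and ignores scope. A const is an ordinary typed object. The function steps_to_bignum is made up for this example.

```cpp
const int BIGNUM = 1000000;   // a typed constant, as recommended above

// Because BIGNUM is an int, the compiler type-checks every use of it
// just as it would for any other int.
// (steps_to_bignum is a hypothetical function for this sketch.)
int steps_to_bignum(int step_size)
{
    return BIGNUM / step_size;   // integer division of two ints
}
```

Any attempt to modify BIGNUM, such as BIGNUM = 5, is rejected by the compiler.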
Making the object file: the compiler
After the preprocessor has expanded all the #include statements and pasted in
the header files, the compiler can compile the program. It does this by
turning the C++ source code into an object code file, which is a file
ending in ".o" which contains the binary version of the source code. Object
code is not directly executable, though. In order to make an executable, you
also have to add the code for all of the library functions whose declarations
were #included into the file (#include only brings in the declarations, not
the compiled code). This is the job of the
linker (see the next section).
In general, the compiler is invoked as follows:
% g++ -c foo.cc
where "%" is the Unix prompt. This tells the compiler to run the
preprocessor on the file foo.cc and then compile it into the object
code file foo.o. The -c option means to compile the source
code file into an object file but not to invoke the linker. If your entire
program is in one source code file, you can instead do this:
% g++ foo.cc -o foo
This tells the compiler to run the preprocessor on foo.cc, compile it
and then link it to create an executable called foo. The -o
option states that the next word on the line is the name of the binary
executable file (program). If you don't specify the -o, i.e.
if you just type g++ foo.cc, the executable will be named a.out
for silly historical reasons.
Note also that the name of the compiler we are using is g++, which is
related to the GNU C compiler gcc (it shares most of its internals
with gcc).
Putting it all together: the linker
The job of the linker is to link together a bunch of object files
(.o files) into a binary executable. This includes both the object
files that the compiler created from your source code files as well as object
files that have been pre-compiled for you and collected into library
files. These files have names which end in .a or .so,
and you normally don't need to know about them, as the linker knows where
most of them are located and will link them in automatically as needed.
Like the preprocessor, the linker is a separate program called ld.
Also like the preprocessor, the linker is invoked automatically for you when
you use the compiler. The normal way of using the linker is as follows:
% g++ foo.o bar.o baz.o -o myprog
This line tells the compiler to link together three object files
(foo.o, bar.o, and baz.o) into a binary executable
file named myprog. Now you have a file called myprog that
you can run and which will hopefully do something cool and/or useful.
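To make the division of labor concrete, here is a sketch of what two of those source code files might contain. The contents of foo.cc and bar.cc, and the functions triple and nine_times, are assumptions made up for this example. Both are shown in one listing so that it compiles on its own.

```cpp
// ---- foo.cc (contents assumed for this sketch) ----
int triple(int x);         // declaration only; normally comes from a header

int nine_times(int x)      // compiled into foo.o, which merely refers to triple
{
    return triple(triple(x));
}

// ---- bar.cc (contents assumed for this sketch) ----
int triple(int x)          // definition; its binary code ends up in bar.o
{
    return 3 * x;
}
```

The compiler only needs the declaration of triple to compile foo.cc into foo.o. It is the linker that connects the call in foo.o to the code in bar.o. If no object file on the command line defined triple, g++ -c foo.cc would still succeed, but the link step would fail with an "undefined reference" error.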
This is all you need to know to begin compiling your own C++ programs.
Generally, we also recommend that you use the -Wall command-line
option:
% g++ -Wall -c foo.cc
The -Wall option causes the compiler to warn you about legal but
dubious code constructs, and will help you catch a lot of bugs very early.
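As one example of a "legal but dubious" construct, the sketch below shows an assignment used where a comparison was intended, which g++ reports when -Wall is given. The function is_zero is made up for this example. The dubious form appears only in a comment, so the code shown is the corrected version.

```cpp
// Returns true if x is zero. (is_zero is a hypothetical name for this sketch.)
bool is_zero(int x)
{
    // Dubious but legal version that -Wall warns about:
    //     if (x = 0)     // assigns 0 to x, so the test is always false
    // Intended version, a comparison:
    if (x == 0) {
        return true;
    }
    return false;
}
```

Without -Wall, the dubious version compiles silently and the function always returns false.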
References
- The man page for g++. Type: man g++ | more
at the Unix prompt. This is actually the same as the man page for
gcc. Just look for the material specific to g++.
- The GNU Info documentation on gcc. This also includes full
documentation of g++. Warning! This is far more information than most
people could possibly absorb in the average millennium.
Info documentation on gcc is accessed through the GNU Emacs editor by typing
"M-x info" (where "M-x" means to hit the meta-key and "x" simultaneously), or
"C-h i" (where "C-h" means to hit the control key and "h" simultaneously, and
then to hit "i"), followed by "mgcc<return>". Type "minfo<return>" instead for a
quick tour of how to use info. You can also access the info documentation
from the Unix command line by typing info gcc.