Porting BSD Unix Through the GNU C Compiler

by John Gilmore

GNU's Bulletin

June, 1989

I have ported the University of California at Berkeley's latest Unix
sources through the GNU C Compiler (GCC).  In the process, I made
Berkeley Unix more compatible with the draft ANSI C standard, made many
programs less machine-dependent and less compiler-dependent, and tested
GCC.

Berkeley Unix has set the standard for high-powered Unix systems for
many years, and continues to offer an improved alternative to AT&T Unix
releases.  However, Berkeley's C compiler is based on an old version of
PCC, the Portable C Compiler from AT&T.  By merging GCC into the
Berkeley release, we provided ANSI C compatibility, better optimization,
and improved compiler maintenance.  The GNU project gained an important
test case for GCC, and a strong collaborator in the free software
movement.

I conceived the project, aided by Keith Bostic and Mike Karels of
Berkeley and Richard Stallman of the FSF.  I did most of the actual
porting, while Keith and Mike provided machine resources,
collaborated on major decisions, and arbitrated the style and content of
the changes to Unix.  Richard provided quick turnaround on compiler bug
fixes and problem solving.

We are producing a Unix source tree which can be compiled by both the
old and the new compilers.  Rather than introducing new `#ifdef''s, we
are rewriting the code so that it does not depend on the features of
either compiler.  Whenever we have to make a change, we are moving in
the direction of ANSI C, POSIX compatibility, and machine independence.

We have used GCC releases 1.15 through 1.35.  I did four complete
"passes" over the Unix source tree; each involved running "make clean;
make" on the entire source tree, and examining 500K to 800K of resulting
output.  I'd fix as many errors as I could, testing small parts of the
source tree in the process, then merge my changes back into the master
sources and rebuild the whole thing again.

The errors fell into two general categories: language changes in ANSI C,
and non-portable code.  In some cases it was hard to tell the
difference.

The major ANSI C problem was the generation of character constants in
the preprocessor, where a macro argument is substituted inside single
quotes.  Heavy use of this now-obsolete feature in system header files
caused us to change about 10 include files and about 45 source
modules.  Another preprocessor problem was that ANSI C uses a
different syntax for token concatenation; we rewrote pieces of five
modules to avoid having to concatenate tokens.  ANSI C clarified the
rules for the scope of names declared `extern'.  We moved extern
declarations around, or added global function declarations, in more than
38 files to handle this.  Nine programs used new ANSI keywords, such as
`signed' or `const', as identifiers; we picked new identifiers.  Eleven
modules used typedef names as formal parameter names, or used `unsigned'
with a typedef.
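
As an illustration of the two preprocessor changes, here is a minimal
sketch written for a current ANSI C compiler.  The names (`CTRL',
`bell_char', `DISPATCH', `do_read', `run_read') are hypothetical and
chosen for this example; they are not taken from the files we changed.
It assumes the ASCII character set.

```c
/*
 * Pre-ANSI preprocessors substituted macro arguments inside quotes,
 * so a header could say
 *
 *     #define CTRL(c) ('c' & 037)        and callers wrote CTRL(g)
 *
 * ANSI C leaves the inside of a character constant alone.  Letting
 * the caller supply the constant works with either preprocessor.
 */
#define CTRL(c) ((c) & 037)

/* Hypothetical caller, now written CTRL('g') instead of CTRL(g). */
int bell_char(void)
{
    return CTRL('g');
}

/*
 * Old code joined tokens by putting an empty comment between them;
 * ANSI C uses the ## operator instead (do_##x).  Passing the whole
 * identifier avoids concatenation, so neither spelling is needed.
 */
#define DISPATCH(fn) fn()

static int do_read(void)
{
    return 1;
}

int run_read(void)
{
    return DISPATCH(do_read);
}
```

The point of both rewrites is the one made earlier: the macro no longer
depends on the behavior of either preprocessor, so no `#ifdef' is
needed to choose between them.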

The worst non-portable construct we found in the Unix sources was the
use of pointers with member names that aren't right for the pointer
type.  Fixing this problem caused a lot of work, because we had to
figure out what each untyped or mistyped pointer was really being used
for, then fix its type, and the references to it.  We changed 5 modules
due to this, and abandoned one program, efl, which would have required
too much work to fix.
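
To illustrate the kind of repair involved, the sketch below uses
hypothetical structures (`struct header', `struct packet') and helper
functions that do not come from the Unix sources.  It assumes the old
code reached a member through a pointer declared with some unrelated
type, which older compilers tolerated.

```c
#include <string.h>

struct header {
    int type;
    int length;
};

struct packet {
    struct header hdr;
    char data[16];
};

/*
 * Old style (rejected by ANSI compilers):
 *
 *     char *p;
 *     ... p->length ...
 *
 * The fix is to work out what p really points at, declare it with
 * that type, and update each reference to match.
 */
int packet_length(struct packet *p)
{
    return p->hdr.length;
}

/* Build a packet so the accessor above can be exercised. */
struct packet make_packet(const char *text)
{
    struct packet pkt;

    memset(&pkt, 0, sizeof pkt);
    pkt.hdr.type = 1;
    strncpy(pkt.data, text, sizeof pkt.data - 1);
    pkt.hdr.length = (int)strlen(pkt.data);
    return pkt;
}
```

Once the pointer has its real type, the compiler can check every
member reference, which is where most of the effort went.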

Another problem was caused by using CPP as a macro processor for
assembler source.  We circumvented this problem by making the assembler
source acceptable to both old-CPP and ANSI CPP.

A major problem was `asm' constructs in C source.  Some programs were
written in C with intermixed assembler code, producing a mess when
compiled with anything but the original compiler.  Other routines, such
as compress, drop in an `asm' here or there as an optimization.  Still
more modules, including the kernel, run a sed script over the assembler
code generated by the C compiler, before assembling and linking it.  We
eliminated as many uses of `asm' as we could, and turned others into
assembler language subroutines in `.s' files.  Both the Pascal and Lisp
interpreters used heavy hacking with sed scripts; each of these took
several days to fix.

We fixed three programs that used multi-character constants; two were
clearly errors.  Fifteen programs declared functions or variables while
omitting both the type and the storage class; we added `int' to the
declarations.  In two modules this exposed errors caused by use of `;'
where `,' was intended.  Changes to the rules for parsing
declarations made us fix five modules, and declaration bugs in six more
were caught by GCC's improved error checking.  We fixed miscellaneous
pointer usage bugs in fifteen programs.  GCC caught bugs in five modules
caused by misunderstood sign extension.  Five or ten other miscellaneous
bugs were caught and fixed.
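
The sign extension bugs are of the following general shape.  This is a
sketch with hypothetical function names (`is_marker', `count_markers'),
not code from the affected modules.  Whether plain `char' is signed
depends on the machine, so comparing a `char' that holds the byte 0377
against the integer 0377 succeeds on some machines and fails on others.

```c
/*
 * Buggy form:
 *
 *     char c;
 *     if (c == 0377) ...
 *
 * Where char is signed, c is widened to -1 before the comparison,
 * so the test never succeeds.  Masking after the widening gives the
 * same answer whether char is signed or unsigned.
 */
int is_marker(char c)
{
    return (c & 0377) == 0377;
}

/* Count marker bytes in a buffer, using the portable test. */
int count_markers(const char *buf, int len)
{
    int i, n = 0;

    for (i = 0; i < len; i++)
        if (is_marker(buf[i]))
            n++;
    return n;
}
```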

We are pleased with the results so far.  Most of the Unix code compiled
without problems, and the parts which we have executed are free from
code generation bugs.  The worst of the ANSI C changes touched only
about fifty modules, and there were only two problems of this
magnitude.  A total of twenty bugs in GCC have been found so far,
and most of them are now fixed.  We expected several times this many
bugs; the compiler is in better shape than any of us expected.

Many minor problems and nit incompatibilities with ANSI C have been
removed from the Unix sources.  Far fewer user programs should require
attention when doing a BSD Unix port now.  However, we did not attempt
to make Berkeley Unix fully ANSI C compliant.  In particular, we kept
preprocessor comments (`#endif FOO') as well as machine-specific
`#define''s (`#ifdef vax').  GCC supports these features even though
ANSI C does not.
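
For readers unfamiliar with the two retained constructs, a strictly
conforming spelling might look like the sketch below.  The names
`HOST_NAME' and `host_name' are hypothetical, and the use of `__vax__'
as the reserved-namespace counterpart of `vax' is an assumption about
the compiler's predefined macros rather than something we changed.

```c
/*
 * Retained BSD style, accepted by GCC but not by strict ANSI C:
 *
 *     #ifdef vax
 *     ...
 *     #endif vax
 *
 * A conforming version turns the trailing text into a comment and
 * tests a name from the reserved namespace.  __vax__ is assumed here.
 */
#ifdef __vax__
#define HOST_NAME "vax"
#else
#define HOST_NAME "other"
#endif /* __vax__ */

const char *host_name(void)
{
    return HOST_NAME;
}
```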

Unfinished work remains.  The BSD kernel has not yet been ported to GCC,
though it has been syntax-checked.  Optimization of the kernel will
cause problems until `volatile' declarations are used in all the right
places.  Pieces of the Portable C Compiler are still used inside lint,
f77, and pc.  Various sources still need their `setjmp' calls fixed so
that the only variables expected to keep their values after a `longjmp'
are declared `volatile'.
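
The `setjmp' repair can be sketched as follows.  The names (`env',
`fail', `count_attempts') are hypothetical.  ANSI C promises that a
local variable changed between `setjmp' and `longjmp' still holds its
new value afterwards only if it is declared `volatile'; otherwise an
optimizing compiler may keep it in a register that `longjmp' restores.

```c
#include <setjmp.h>

static jmp_buf env;

/* Stand-in for an error path that unwinds back to the setjmp. */
static void fail(void)
{
    longjmp(env, 1);
}

int count_attempts(void)
{
    /* Changed between setjmp and longjmp, so it must be volatile. */
    volatile int attempts = 0;

    setjmp(env);            /* control returns here after each longjmp */
    attempts++;
    if (attempts < 3)
        fail();
    return attempts;
}
```

Without the `volatile' qualifier, the value of `attempts' after each
jump would be indeterminate, and the loop might never reach three.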

Our changes will be available to recipients of Berkeley's next software
distribution, whenever that is.  We will also make diffs available to
others involved in porting Unix to ANSI C.

Future projects include building a complete set of ANSI C and POSIX
compatible include files and libraries (including function prototypes),
and converting the existing sources to use them.  An eventual goal is to
produce a fully standard-conforming Unix system---not only in the
interface provided to users, but with sources which will compile and run
on any standard-conforming compiler and libraries.

The success of this collaboration between GNU and Berkeley has
encouraged further cooperation.  The GNU project is working to provide
reimplementations of System V features that Berkeley Unix lacks, such as
improved shells and make commands.  In return, Berkeley has released
much of its software to the public, eliminating the AT&T license
requirement for programs that AT&T did not supply.  A large set of
"freed" BSD software is available by uucp or ftp from `uunet.uu.net' in
the subdirectory `bsd-sources', as well as on the GNU Compiler tape and
the UUNET tapes.

Copyright 1989