What is a compiler driver?

A compiler driver performs the following:

  1. preprocesses each source file (e.g., replaces #include <stdio.h> with the contents of that header)
  2. compiles the preprocessed source into assembly language
  3. runs the assembler to convert the assembly language into a relocatable object file containing machine code
  4. repeats the steps above for each of the other source files
  5. runs a linker such as ld to combine all the object files into an executable object file

What is a linker?

A linker is a program that combines one or more relocatable object files, generated by the compiler and assembler, into a single file that can be loaded into memory and executed.

What is an object file?

Object files contain binary code and data. There are three forms of object files:

  • relocatable object file
  • executable object file
  • shared object file

Relocatable object file

Assemblers produce relocatable object files. They are “relocatable” because the functions and variables are not yet bound to any specific address. Instead, they are still referred to by symbol, and the linker assigns the final addresses later.

This file contains binary code and data that can be combined with other relocatable object files at link time to create an executable object file.

An example of a relocatable object file might be a collection of math functions.

/* math.c
 * A simple math library
 */

int add(int a, int b)
{
  return a + b;
}

unix > gcc -c math.c                      # Create relocatable obj file (math.o)
unix > readelf -h math.o | grep Type      # Read math.o with readelf
Type:            REL (Relocatable file)   # and verify its type

This relocatable object file can now be linked with any other program to create an executable object file.

/* math.h */
#ifndef MATH_H
#define MATH_H
int add(int a, int b);
#endif

/* math_test.c */
#include <stdio.h>
#include "math.h"

int main(void)
{
  int result = add(1, 2);
  printf("result: %d\n", result);
  return 0;
}

unix > gcc math_test.c -o math_test
/tmp/cclRibQq.o: In function `main':
math_test.c:(.text+0x19): undefined reference to `add'
collect2: ld returned 1 exit status

What just happened? Why couldn’t I produce a math_test program? My math_test program references the add function, but the implementation of add is in the math.o relocatable object file. Including math.h only supplies the declaration, and gcc has no way of knowing which object file holds the definition, so the linker reports an undefined reference. Thus, I must explicitly specify that I’m linking in math.o.

unix > gcc math_test.c math.o -o math_test
unix > ./math_test
result: 3

Extra: Relocatable Object File

To more deeply understand the meaning of “relocatable”, look at the difference between the symbol tables of a relocatable object file and an executable object file.

unix > gcc -c main.c
unix > readelf --symbols main.o
Num:     Value   Size     Type      Bind       Vis     Ndx  Name
  0:  00000000      0   NOTYPE     LOCAL   DEFAULT     UND
  1:  00000000      0   FILE       LOCAL   DEFAULT     ABS  main.c
  2:  00000000      0   OBJECT    GLOBAL   DEFAULT     3    buf
  3:  00000000      0   OBJECT    GLOBAL   DEFAULT     1    main

unix > gcc main.c -o main
unix > readelf --symbols main
Num:     Value   Size     Type      Bind       Vis     Ndx  Name
 53:  08048460      2     FUNC    GLOBAL   DEFAULT     13   __libc_csu_fini
 54:  08048462      0     FUNC    GLOBAL   HIDDEN      13   __i686.get_pc_thunk.bx
 55:  0804a018      4     OBJECT  GLOBAL   DEFAULT     13   bufp0

Above are excerpts of the two symbol tables. In the relocatable object file, the symbols shown have the value 00000000, which is an offset within their section, because the compiler and assembler generate each code and data section starting at address 0. In the executable object file, the symbols have real addresses: the linker has relocated the sections by assigning each one a location in memory.

Executable Object Files

Having seen in detail what a relocatable object file is, it should be no surprise that an executable object file is essentially the result of combining relocatable object files and assigning their addresses, so that it can be loaded into memory and executed. There are a few other details, but at a high level, this is the most important point.

When the program is invoked at the command line, a program called the loader copies the code and data from the executable into main memory. The loader then runs the program by jumping to the instruction at the entry point, which by default is the address of the _start symbol.

The code at _start usually runs some initialization code. Then it calls the main routine, which is defined in every C program. Finally, it calls _exit, which returns control to the operating system.

Shared Object Files

Relocatable object files are linked together when the program is built, which is known as static linking. As stated before, once the symbols in the relocatable object files are assigned addresses, those addresses cannot be changed unless the program is linked again. Another disadvantage is that the entire object file is copied into the final executable object file.

A more modern approach is the shared object file, which is linked into a program when the program is loaded or while it is running. This is known as dynamic linking.

Only one copy of each shared object file needs to exist on the file system, and every program that uses it runs the exact same machine code. Shared object files carry the .so suffix.

How to create a shared object file

unix > gcc -shared -fPIC -o libtest.so test.c

What does ELF stand for?

Executable and Linkable Format

What is an ELF file?

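An ELF file is a binary file laid out in the Executable and Linkable Format. Relocatable, executable, and shared object files on Linux are all ELF files. The ELF header at the start of the file records which of the three types it is, which is what the readelf -h command shown earlier reports.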

Resources