Guide to Object File Linking
What is a compiler driver?
A compiler driver performs the following:
- preprocesses the source file (e.g., replaces #include <stdio.h> with the contents of that header)
- compiles the preprocessed high-level source file into assembly language
- runs the assembler to convert the assembly language into a relocatable object file with machine code
- performs the above steps for the other source files
- runs a linker program like ld to combine all the object files into an executable object file
What is a linker?
A program that combines one or more object files generated by the compiler into a single file that can be copied into memory and executed.
What is an object file?
Object files contain binary code. There are three forms of object files:
- relocatable object file
- executable object file
- shared object file
Relocatable object file
Assemblers produce relocatable object files. They are “relocatable” because the functions and variables are not bound to any specific address. Instead, the addresses are still symbols.
This file contains binary code and data that can be combined with other relocatable object files at link time to create an executable object file.
An example of a relocatable object file might be a collection of math functions.
unix > gcc -c math.c # Create relocatable obj file (math.o)
unix > readelf -h math.o | grep Type # Read math.o with readelf
Type: REL (Relocatable file) # and verify its type
This relocatable object file can now be linked with any other program to create an executable object file.
unix > gcc math_test.c -o math_test
/tmp/cclRibQq.o: In function `main':
math_test.c:(.text+0x19): undefined reference to `add'
collect2: ld returned 1 exit status
What just happened? Why couldn’t I produce a math_test program? My math_test program references the add function, but the implementation of add is in the math.o relocatable object file. I never told gcc about math.o, so the linker had no definition of add to resolve the reference against. Thus, I must specify that I’m linking in math.o.
unix > gcc math_test.c math.o -o math_test
unix > ./math_test
result: 3
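The sources for this example are not listed in this guide. As a minimal sketch, math.c might look like the following; the signature of add and the second function are assumptions made for illustration.

```c
/* math.c (hypothetical contents; the original source is not shown). */

/* Defines the global symbol `add` that math_test.c refers to. */
int add(int a, int b)
{
    return a + b;
}

/* A second function, included only to make this a small collection. */
int subtract(int a, int b)
{
    return a - b;
}
```

Under these assumptions, math_test.c would only declare `int add(int a, int b);` and call it from main. The compiler leaves that call as an undefined reference in the object file, and the linker resolves it once math.o is named on the command line.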
Extra: Relocatable Object File
To more deeply understand the meaning of “relocatable”, look at the difference between the symbol tables of a relocatable object file and an executable object file.
unix > gcc -c main.c
unix > readelf --symbols main.o
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS main.c
2: 00000000 0 OBJECT GLOBAL DEFAULT 3 buf
3: 00000000 0 OBJECT GLOBAL DEFAULT 1 main
unix > gcc main.c -o main
unix > readelf --symbols main
Num: Value Size Type Bind Vis Ndx Name
53: 08048460 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
54: 08048462 0 FUNC GLOBAL HIDDEN 13 __i686.get_pc_thunk.bx
55: 0804a018 4 OBJECT GLOBAL DEFAULT 13 bufp0
Above are excerpts of the symbol tables of the two object files. In the relocatable object file, the symbols have the value 00000000. In the executable object file, the symbols have real addresses. When the compiler and assembler generate a relocatable object file, each section starts at address 0, and a symbol's value is an offset within its section. The linker then relocates these sections by assigning each one a location in memory and updating the symbol values to match.
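To connect these tables to source code, the sketch below shows a few C definitions and, in comments, the kind of symbol table entry each one typically produces in a relocatable object file. The contents of main.c are not shown in this guide, so every name here other than buf is hypothetical.

```c
#include <string.h>

/* Initialized global variable: an OBJECT symbol with GLOBAL binding,
   placed in the .data section. */
int buf[2] = {1, 2};

/* Static variable: an OBJECT symbol with LOCAL binding, so other
   object files cannot refer to it. Zero-initialized data goes in .bss. */
static int call_count;

/* Function definition: a FUNC symbol with GLOBAL binding in .text.
   strlen is declared in <string.h> but not defined in this file, so it
   typically appears as an undefined (UND) symbol for the linker to resolve. */
size_t counted_length(const char *s)
{
    call_count++;
    return strlen(s);
}

/* Another GLOBAL FUNC symbol; it exposes the LOCAL variable's value. */
int calls_so_far(void)
{
    return call_count;
}
```

Compiling a file like this with gcc -c and running readelf --symbols on the result should show these entries with small values, because they are offsets within a section and not final addresses.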
Executable Object Files
Having seen in detail what a relocatable object file is, it should be no surprise that an executable object file is essentially a relocatable object file with its addresses assigned so that it can be placed into memory for execution. There are a few other details, but at a high level, this is the most important point.
When the program is invoked at the command line, a program called the loader copies the code and data from the executable into main memory. The loader then runs the program by jumping to the instruction at the entry point, which by default is the address of the _start symbol.
The code at _start usually runs some initialization code. Then it calls the main routine, which is defined in every C program. Finally, it runs _exit to end the process.
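The startup sequence can be modeled in plain C. This is only a conceptual sketch: the real _start is provided by the C runtime, and every name below (model_start, model_runtime_init, sample_main) is hypothetical.

```c
#include <stddef.h>

/* The signature of a program's main routine. */
typedef int (*main_fn)(int argc, char **argv);

/* Set once the (modeled) runtime initialization has run. */
static int runtime_ready;

/* Stand-in for the initialization work done before main is called. */
static void model_runtime_init(void)
{
    runtime_ready = 1;
}

/* Models the order of events at the entry point. */
int model_start(main_fn program_main, int argc, char **argv)
{
    model_runtime_init();                   /* 1. run initialization code */
    int status = program_main(argc, argv);  /* 2. call the main routine   */
    return status;  /* 3. real startup code ends the process here
                          with this status instead of returning */
}

/* Stand-in for a program's main: returns argc if initialization
   happened first, and -1 otherwise. */
int sample_main(int argc, char **argv)
{
    (void)argv;
    return runtime_ready ? argc : -1;
}
```

Calling model_start(sample_main, 2, NULL) returns 2, which shows that initialization ran before the main routine and that main's return value becomes the exit status.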
Shared Object Files
Relocatable object files are linked together at link time, when the program is built. As stated before, once the symbols in the relocatable object files are assigned addresses, those addresses cannot be changed unless the program is linked again. Another disadvantage is that the entire object file is copied into the final executable object file, so every program that uses it carries its own copy.
A modern innovation is the development of shared object files, which are linked into programs at load time or run time. This is known as dynamic linking.
Only one copy of each shared object file needs to exist on the file system, and every program that uses it shares the exact same machine code. Shared object files have the .so suffix.
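Normally the dynamic linker resolves shared objects automatically when a program is loaded. A program can also request the same thing explicitly at run time through the POSIX dlopen interface. The following is a minimal sketch, assuming a glibc-based Linux system; the helper name call_from_shared_object is hypothetical, and older glibc versions need -ldl on the link line.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Type of the functions this sketch can call: double -> double. */
typedef double (*unary_fn)(double);

/* Loads the shared object `lib`, looks up `symbol`, and stores the
   result of calling it with x in *out.
   Returns 0 on success and -1 if loading or lookup fails. */
int call_from_shared_object(const char *lib, const char *symbol,
                            double x, double *out)
{
    void *handle = dlopen(lib, RTLD_LAZY);   /* map the .so into memory */
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                               /* clear any stale error */
    unary_fn fn = (unary_fn)dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return -1;
    }

    *out = fn(x);                            /* call into the shared code */
    dlclose(handle);
    return 0;
}
```

For example, call_from_shared_object("libm.so.6", "cos", 0.0, &result) looks up cos in the math library at run time; the library file name is the glibc one and is an assumption about the system.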
How to create a shared object file
unix > gcc -shared -fPIC -o libtest.so test.c
What does ELF stand for?
Executable and Linkable Format
What is an ELF file?
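An ELF file is a binary file laid out according to the Executable and Linkable Format, the standard object file format on Linux and many other Unix-like systems. Relocatable, executable, and shared object files are all ELF files. Each begins with an ELF header that records, among other things, the file's type and, for executables, the entry point address.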
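The Type line that readelf -h prints comes from the e_type field near the start of the file. The sketch below reads that field by hand, using the offsets and values from the ELF specification; the function and constant names are hypothetical and are not part of any library.

```c
#include <stddef.h>
#include <stdio.h>

/* Values of the e_type field defined by the ELF specification. */
enum { ELF_TYPE_REL = 1, ELF_TYPE_EXEC = 2, ELF_TYPE_DYN = 3 };

/* Returns e_type from an ELF header held in buf, or -1 if the buffer
   is too short or does not start with the ELF magic number. */
int elf_file_type(const unsigned char *buf, size_t len)
{
    if (len < 18)
        return -1;
    /* Bytes 0-3 of every ELF file: 0x7f 'E' 'L' 'F'. */
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    /* Byte 5 gives the byte order: 1 = little-endian, 2 = big-endian.
       e_type is the 16-bit field at offset 16. */
    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];
    return -1;
}

/* Reads the first bytes of the file at path and returns its e_type,
   or -1 if the file cannot be read or is not an ELF file. */
int elf_file_type_from_path(const char *path)
{
    unsigned char header[18];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(header, 1, sizeof header, f);
    fclose(f);
    return elf_file_type(header, n);
}
```

On a file such as math.o, elf_file_type_from_path would be expected to return ELF_TYPE_REL, matching the REL type that readelf reports.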