This document is intended as the definitive reference for the DJGPP COFF format. It is not necessarily complete, but since this format isn't really documented elsewhere, this is as good as it gets. All programs that read COFF files should include <coff.h>.
Note: Unless otherwise specified, all numeric fields are stored in host native order, which is LSB-first (little endian) for DJGPP, and all file offsets are relative to the beginning of the COFF object (i.e. the file header is always at offset zero, even when the object is inside a library).
Comments to dj@delorie.com
Structure | Location | Purpose |
---|---|---|
File Header | Beginning of file | Overview of the file; controls layout of other sections |
Optional Header | Follows file header | For executables, used to store the initial %eip |
Section Headers | Follow the optional header; count determined by file header | Maintain location and size information about code and data sections |
Section Data | Offset stored in section header | Contains code and data for the program |
Relocation Directives | Offset stored in section header | Contain fixup information needed when relocating a section |
Line Numbers | Offset stored in section header | Hold address of each line number in code/data sections |
Symbol Table | Offset stored in file header | Contains one entry for each symbol this file defines or references |
String Table | Follows symbol table | Stores symbol names; first four bytes are total length |
typedef struct {
unsigned short f_magic; /* magic number */
unsigned short f_nscns; /* number of sections */
unsigned long f_timdat; /* time & date stamp */
unsigned long f_symptr; /* file pointer to symtab */
unsigned long f_nsyms; /* number of symtab entries */
unsigned short f_opthdr; /* sizeof(optional hdr) */
unsigned short f_flags; /* flags */
} FILHDR;
This structure always exists at the beginning of the COFF object. When reading this header, you should read FILHSZ bytes, and not rely on sizeof(FILHDR) to give the correct size.
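As a minimal sketch of the advice above, the header can be decoded field by field from a raw FILHSZ-byte buffer, which avoids any dependence on struct padding or host byte order. The names `COFF_FILHSZ`, `struct coff_filehdr`, `get_u16`, `get_u32`, and `coff_parse_filehdr` are hypothetical helpers for illustration, not part of <coff.h>; the value 20 is an assumption derived by summing the field sizes shown in FILHDR.

```c
#include <assert.h>
#include <stddef.h>

#define COFF_FILHSZ 20  /* assumed on-disk size of FILHDR: 2+2+4+4+4+2+2 bytes */

/* Hypothetical host-independent mirror of FILHDR. */
struct coff_filehdr {
    unsigned f_magic, f_nscns;
    unsigned long f_timdat, f_symptr, f_nsyms;
    unsigned f_opthdr, f_flags;
};

/* Read a 16-bit little endian value. */
static unsigned get_u16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Read a 32-bit little endian value. */
static unsigned long get_u32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Decode the first COFF_FILHSZ bytes of a COFF object into *h. */
static void coff_parse_filehdr(const unsigned char *buf, struct coff_filehdr *h)
{
    assert(buf != NULL && h != NULL);
    h->f_magic  = get_u16(buf + 0);
    h->f_nscns  = get_u16(buf + 2);
    h->f_timdat = get_u32(buf + 4);
    h->f_symptr = get_u32(buf + 8);
    h->f_nsyms  = get_u32(buf + 12);
    h->f_opthdr = get_u16(buf + 16);
    h->f_flags  = get_u16(buf + 18);
}
```

A caller would read exactly `COFF_FILHSZ` bytes from offset zero of the COFF object into a buffer and pass that buffer to `coff_parse_filehdr`.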
Bit | Symbol | Meaning |
---|---|---|
0x0001 | F_RELFLG | If set, there is no relocation information in this file. This is usually clear for objects and set for executables. |
0x0002 | F_EXEC | If set, all unresolved symbols have been resolved and the file may be considered executable. |
0x0004 | F_LNNO | If set, all line number information has been removed from the file (or was never added in the first place). |
0x0008 | F_LSYMS | If set, all the local symbols have been removed from the file (or were never added in the first place). |
0x0100 | F_AR32WR | Indicates that the file is 32-bit little endian. |
The optional header immediately follows the file header in the COFF file. The size of this header is stored in the f_opthdr field of the file header. You must read that many bytes from the file regardless of how big you expect the optional header to be.
Two optional headers are defined for DJGPP objects:
Struct | Size (bytes) | Purpose |
---|---|---|
AOUTHDR | 28 | Added to executables to provide the entry point of the program |
GNU_AOUT | 32 | Unknown |
typedef struct {
unsigned short magic; /* type of file */
unsigned short vstamp; /* version stamp */
unsigned long tsize; /* text size in bytes, padded to FW bdry*/
unsigned long dsize; /* initialized data " " */
unsigned long bsize; /* uninitialized data " " */
unsigned long entry; /* entry pt. */
unsigned long text_start; /* base of text used for this file */
unsigned long data_start; /* base of data used for this file */
} AOUTHDR;
The only two fields you should rely on are described below.
typedef struct {
char s_name[8]; /* section name */
unsigned long s_paddr; /* physical address, aliased s_nlib */
unsigned long s_vaddr; /* virtual address */
unsigned long s_size; /* section size */
unsigned long s_scnptr; /* file ptr to raw data for section */
unsigned long s_relptr; /* file ptr to relocation */
unsigned long s_lnnoptr; /* file ptr to line numbers */
unsigned short s_nreloc; /* number of relocation entries */
unsigned short s_nlnno; /* number of line number entries */
unsigned long s_flags; /* flags */
} SCNHDR;
This structure always exists immediately following any optional header in the COFF file (or following the file header, if f_opthdr is zero). When reading this header, you should read SCNHSZ bytes, and not rely on sizeof(SCNHDR) to give the correct size. The number of section headers present is given in the f_nscns field of the file header.
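The layout rule above reduces to a small offset calculation. The following is a sketch only: `COFF_FILHSZ`, `COFF_SCNHSZ`, and `coff_scnhdr_offset` are hypothetical names, and the sizes 20 and 40 are assumptions obtained by summing the field sizes of FILHDR and SCNHDR as listed in this document.

```c
#include <assert.h>

#define COFF_FILHSZ 20  /* assumed on-disk size of FILHDR */
#define COFF_SCNHSZ 40  /* assumed on-disk size of SCNHDR: 8 + 6*4 + 2*2 + 4 bytes */

/* File offset of section header `index` (0-based), relative to the start
   of the COFF object. f_opthdr and f_nscns come from the file header. */
static unsigned long coff_scnhdr_offset(unsigned f_opthdr, unsigned f_nscns,
                                        unsigned index)
{
    assert(index < f_nscns);
    return COFF_FILHSZ + (unsigned long)f_opthdr
         + (unsigned long)index * COFF_SCNHSZ;
}
```

For an executable with a 28-byte AOUTHDR, the first section header would therefore sit at offset 48; for an object with no optional header, at offset 20.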
Bit | Symbol | Meaning |
---|---|---|
0x0020 | STYP_TEXT | If set, indicates that this section contains only executable code. |
0x0040 | STYP_DATA | If set, indicates that this section contains only initialized data. |
0x0080 | STYP_BSS | If set, indicates that this section defines uninitialized data, and has no data stored in the coff file for it. |
typedef struct {
unsigned long r_vaddr; /* address of relocation */
unsigned long r_symndx; /* symbol we're adjusting for */
unsigned short r_type; /* type of relocation */
} RELOC;
Warning: This structure's size is not a multiple of four. When reading from file, it is strongly recommended that you either (1) read each entry in a loop, reading RELSZ bytes each time, or (2) allocate a block of memory and calculate a pointer to each entry you need by multiplying its index by RELSZ. In no case should you assume that array addressing or sizeof(RELOC) will be useful.
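The second approach (one raw block, entries located by multiplying by the entry size) might look like the sketch below. `COFF_RELSZ`, `struct coff_reloc`, `le16`, `le32`, and `coff_get_reloc` are hypothetical names; the stride of 10 bytes is an assumption taken from the field sizes of RELOC (4 + 4 + 2).

```c
#include <assert.h>
#include <stddef.h>

#define COFF_RELSZ 10  /* assumed on-disk size of RELOC: 4 + 4 + 2 bytes */

/* Hypothetical decoded form of one relocation entry. */
struct coff_reloc {
    unsigned long r_vaddr;
    unsigned long r_symndx;
    unsigned r_type;
};

/* Read a 16-bit little endian value. */
static unsigned le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Read a 32-bit little endian value. */
static unsigned long le32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Decode entry `i` of a section's relocation table held in `table`.
   Entries are located by byte offset, never by array indexing. */
static struct coff_reloc coff_get_reloc(const unsigned char *table, size_t i)
{
    const unsigned char *p;
    struct coff_reloc r;
    assert(table != NULL);
    p = table + i * COFF_RELSZ;
    r.r_vaddr  = le32(p + 0);
    r.r_symndx = le32(p + 4);
    r.r_type   = le16(p + 8);
    return r;
}
```

The same pattern applies to any on-disk structure whose size is not a multiple of four.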
There are only two types of relocations that you will encounter in a normal DJGPP COFF object.
Type | Value | Purpose |
---|---|---|
RELOC_ADDR32 | 6 | Relocate a 32-bit absolute reference |
RELOC_REL32 | 20 | Relocate a 32-bit relative reference |
For any relocation, you must determine the new address of the symbol that the relocation adjusts for. If the symbol is in another object (external), the symbol table will contain a reference to that external symbol, and the relocation will refer to that symbol table entry. If the symbol is in the same object, the relocation will refer to one of the symbol table entries for the sections themselves (these are always present and always private). When you relocate the section itself, these symbols will reflect its new location.
RELOC_ADDR32
To do this relocation, you must perform the following steps:
RELOC_REL32
This relocation happens normally only in executable sections, and refers only to external symbols. To do this relocation, you must perform the following steps:
typedef struct {
union {
unsigned long l_symndx; /* function name symbol index */
unsigned long l_paddr; /* address of line number */
} l_addr;
unsigned short l_lnno; /* line number */
} LINENO;
Warning: This structure's size is not a multiple of four. When reading from file, it is strongly recommended that you either (1) read each entry in a loop, reading LINESZ bytes each time, or (2) allocate a block of memory and calculate a pointer to each entry you need by multiplying its index by LINESZ. In no case should you assume that array addressing or sizeof(LINENO) will be useful.
Each executable section has its own line number table. Each function in that section is numbered independently, with the start of the function (the line with the opening brace) numbered as line one for that function. Each function in the line number table will have one entry where l_lnno is zero and the symbol table entry for the function is in l_symndx. This entry is followed by entries for each line of the function, with l_lnno set to the function-relative line number (1..N) and l_paddr set to the address of the first instruction generated for that line.
To figure out absolute line numbers, look in the symbol table for the function and find its "beginning of function" symbol (storage class C_FCN, usually right after the function's C_EXT or C_STAT symbol). The absolute line number for the function (equivalent to line one in the line number table's scheme) is stored there, in AUXENT.x_sym.x_misc.x_lnsz.x_lnno. Add that to the relative line numbers in the table.
The trick to getting line numbers right is to remember that the lines of the source file start at one (the first line in the file is line one) and functions are numbered starting at one also. When you add them up, you get one too many ones, so you must then subtract one to get the right line number.
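The off-by-one adjustment described above can be written as a one-line helper. This is a sketch; `coff_abs_line` and its parameter names are hypothetical and do not appear in <coff.h>.

```c
#include <assert.h>

/* func_first_line: absolute source line of the function, taken from the
   x_lnno field of the auxiliary entry of its C_FCN ".bf" symbol.
   l_lnno: function-relative line from the line number table (1..N).
   Both counts start at one, so one is subtracted from their sum. */
static unsigned long coff_abs_line(unsigned long func_first_line,
                                   unsigned l_lnno)
{
    assert(l_lnno >= 1);  /* l_lnno == 0 marks the function entry, not a line */
    return func_first_line + l_lnno - 1;
}
```

For a function whose opening brace is on source line 42, relative line 1 maps to line 42 and relative line 5 maps to line 46.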
typedef struct {
union {
char e_name[E_SYMNMLEN];
struct {
unsigned long e_zeroes;
unsigned long e_offset;
} e;
} e;
unsigned long e_value;
short e_scnum;
unsigned short e_type;
unsigned char e_sclass;
unsigned char e_numaux;
} SYMENT;
The symbol table is probably one of the most complex parts of the COFF object, mostly because there are so many symbol types. The symbol table has entries for all symbols and meta-symbols, including public, static, external, section, and debugging symbols.
Symbol | Value | Meaning |
---|---|---|
N_UNDEF | 0 | An undefined (extern) symbol |
N_ABS | -1 | An absolute symbol (e_value is a constant, not an address) |
N_DEBUG | -2 | A debugging symbol |
Type | Bits | Meaning |
---|---|---|
T_NULL | ---- 0000 | No symbol |
T_VOID | ---- 0001 | void function argument (not used) |
T_CHAR | ---- 0010 | character |
T_SHORT | ---- 0011 | short integer |
T_INT | ---- 0100 | integer |
T_LONG | ---- 0101 | long integer |
T_FLOAT | ---- 0110 | floating point |
T_DOUBLE | ---- 0111 | double precision float |
T_STRUCT | ---- 1000 | structure |
T_UNION | ---- 1001 | union |
T_ENUM | ---- 1010 | enumeration |
T_MOE | ---- 1011 | member of enumeration |
T_UCHAR | ---- 1100 | unsigned character |
T_USHORT | ---- 1101 | unsigned short |
T_UINT | ---- 1110 | unsigned integer |
T_ULONG | ---- 1111 | unsigned long |
T_LNGDBL | ---1 0000 | long double (special case bit pattern) |
DT_NON | --00 ---- | No derived type |
DT_PTR | --01 ---- | pointer to T |
DT_FCN | --10 ---- | function returning T |
DT_ARY | --11 ---- | array of T |
The BTYPE(x) macro extracts the base type from e_type. Note that all DT_* must be shifted by N_BTSHIFT to get actual values, as in:
e_type = base + (derived << N_BTSHIFT);
There are also macros ISPTR, ISFCN, and ISARY that test the upper bits for the derived type.
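To make the bit layout concrete, here is a small sketch that packs and unpacks a type value using the bit patterns from the table above. All names (`MY_*`, `make_type`, `base_type`, `derived_type`) are hypothetical stand-ins rather than the real <coff.h> macros, and the shift of 4 and the masks are assumptions read off the table (base type in the low four bits, derived type in the next two). The T_LNGDBL special case is not handled.

```c
#include <assert.h>

/* Values copied from the type table; names are local to this sketch. */
enum { MY_T_CHAR = 2, MY_T_INT = 4 };
enum { MY_DT_NON = 0, MY_DT_PTR = 1, MY_DT_FCN = 2, MY_DT_ARY = 3 };

#define MY_BTSHIFT 4     /* assumed: derived type starts at bit 4 */
#define MY_BTMASK  0x0f  /* assumed: base type occupies the low four bits */
#define MY_TMASK   0x30  /* assumed: derived type occupies bits 4 and 5 */

/* Combine a base type and a derived type. The parentheses matter:
   in C, + binds tighter than <<. */
static unsigned make_type(unsigned base, unsigned derived)
{
    assert(base <= MY_BTMASK && derived <= MY_DT_ARY);
    return base + (derived << MY_BTSHIFT);
}

/* Extract the base type (what BTYPE is described as doing). */
static unsigned base_type(unsigned t)
{
    return t & MY_BTMASK;
}

/* Extract the derived type as one of the MY_DT_* values. */
static unsigned derived_type(unsigned t)
{
    return (t & MY_TMASK) >> MY_BTSHIFT;
}
```

Under these assumptions, a pointer to char encodes as 0x12 and a function returning int as 0x24.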
Class | Value | Meaning |
---|---|---|
C_NULL | 0 | No entry |
C_AUTO | 1 | Automatic variable |
C_EXT | 2 | External (public) symbol - this covers globals and externs |
C_STAT | 3 | static (private) symbol |
C_REG | 4 | register variable |
C_EXTDEF | 5 | External definition |
C_LABEL | 6 | label |
C_ULABEL | 7 | undefined label |
C_MOS | 8 | member of structure |
C_ARG | 9 | function argument |
C_STRTAG | 10 | structure tag |
C_MOU | 11 | member of union |
C_UNTAG | 12 | union tag |
C_TPDEF | 13 | type definition |
C_USTATIC | 14 | undefined static |
C_ENTAG | 15 | enumeration tag |
C_MOE | 16 | member of enumeration |
C_REGPARM | 17 | register parameter |
C_FIELD | 18 | bit field |
C_AUTOARG | 19 | auto argument |
C_LASTENT | 20 | dummy entry (end of block) |
C_BLOCK | 100 | ".bb" or ".eb" - beginning or end of block |
C_FCN | 101 | ".bf" or ".ef" - beginning or end of function |
C_EOS | 102 | end of structure |
C_FILE | 103 | file name |
C_LINE | 104 | line number, reformatted as symbol |
C_ALIAS | 105 | duplicate tag |
C_HIDDEN | 106 | ext symbol in dmert public lib |
C_EFCN | 255 | physical end of function |
SClass | Meaning of the Value |
---|---|
C_AUTO, C_ARG | Address of the variable, relative to %ebp |
C_EXT, C_STAT, others | The address of the symbol |
C_REG | The register number assigned to this variable |
C_MOS | Offset of the member from the beginning of the structure |
C_MOE | The value of this enum member |
C_FIELD | The mask for this field |
C_EOS | size of struct/union/enum |
The string table contains the names of symbols that are too long to inline in the symbol table. To read the string table, position the file pointer just after the symbol table (usually, you read the strings right after you read the symbols anyway), and read four bytes as one 32-bit little endian integer. Allocate this much memory. Set the first four bytes of the memory to zero, and read the remainder of the string table (length-4) into the remainder of the memory (ptr+4). A code sample would look like this:
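int i;                 /* total table length in bytes, including the length field itself */
char *s;
read(fd, &i, 4);       /* assumes a 32-bit little endian host, as under DJGPP */
s = (char *)malloc(i);
memset(s, 0, 4);       /* zero the length field so that offset 0 is an empty string */
read(fd, s+4, i-4);    /* the strings themselves follow the length field */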
All references to strings in this table are offsets from the beginning of this memory block. Note that offsets of zero are legal and will result in a zero-length string because of those four zeros you put at the beginning (where the length used to be).
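The sketch below shows how a symbol's name might then be resolved against this memory block. It relies on an assumption this document does not spell out: following the usual COFF convention, a symbol whose e_zeroes field is zero keeps its name in the string table at e_offset, and any other symbol stores its name inline in e_name, without a terminating NUL when the name is exactly eight characters long. `COFF_SYMNMLEN` and `coff_sym_name` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define COFF_SYMNMLEN 8  /* assumed value of E_SYMNMLEN */

/* ent:    raw bytes of one symbol table entry as read from the file
   strtab: string table block prepared as described above
   out:    scratch buffer of at least COFF_SYMNMLEN + 1 bytes
   Returns a NUL-terminated name, either in `out` or inside `strtab`. */
static const char *coff_sym_name(const unsigned char *ent, const char *strtab,
                                 char out[COFF_SYMNMLEN + 1])
{
    unsigned long off;
    assert(ent != NULL && out != NULL);
    if (ent[0] | ent[1] | ent[2] | ent[3]) {
        /* e_zeroes is nonzero: the name is stored inline. */
        memcpy(out, ent, COFF_SYMNMLEN);
        out[COFF_SYMNMLEN] = '\0';  /* eight-character names lack a NUL */
        return out;
    }
    /* e_zeroes is zero: e_offset (little endian) indexes the string table. */
    off = (unsigned long)ent[4] | ((unsigned long)ent[5] << 8) |
          ((unsigned long)ent[6] << 16) | ((unsigned long)ent[7] << 24);
    assert(strtab != NULL);
    return strtab + off;
}
```

Because the first four bytes of the block were zeroed, an e_offset of zero yields an empty string rather than garbage.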