DJGPP COFF Spec

This document is intended to be the definitive reference for the DJGPP COFF format. That doesn't mean it's complete, but since this format isn't really documented elsewhere, this is as good as it gets. All programs reading COFF files should include <coff.h>.

Note: Unless otherwise specified, all numeric fields are stored in host native order, which is LSB-first (little endian) for DJGPP, and all file offsets are relative to the beginning of the COFF object (i.e. the file header is always at offset zero, even when the object is inside a library).

Comments to dj@delorie.com

Structure              Located?                                                  Purpose
File Header            Beginning of file                                         Overview of the file; controls layout of other sections
Optional Header        Follows file header                                       For executables, used to store the initial %eip
Section Header         Follows optional header; count determined by file header  Maintain location and size information about code and data sections
Section Data           Stored in section header                                  Contains code and data for the program
Relocation Directives  Stored in section header                                  Contain fixup information needed when relocating a section
Line Numbers           Stored in section header                                  Hold address of each line number in code/data sections
Symbol Table           Stored in file header                                     Contains one entry for each symbol this file defines or references
String Table           Follows symbol table                                      Stores symbol names; first four bytes are total length

 

File Header

typedef struct {
unsigned short f_magic; /* magic number */
unsigned short f_nscns; /* number of sections */
unsigned long f_timdat; /* time & date stamp */
unsigned long f_symptr; /* file pointer to symtab */
unsigned long f_nsyms; /* number of symtab entries */
unsigned short f_opthdr; /* sizeof(optional hdr) */
unsigned short f_flags; /* flags */
} FILHDR;

This structure always exists at the beginning of the COFF object. When reading this header, you should read FILHSZ bytes, and not rely on sizeof(FILHDR) to give the correct size.

f_magic - magic number
This is a constant value for all COFF files, and is used to detect the fact that the file is COFF. The value of this field must be I386MAGIC (0x14c) and is stored in little-endian format, so the first two bytes of any DJGPP COFF file are 0x4c and 0x01.

f_nscns - number of sections
The number of sections (and thus section headers) contained within this file.

f_timdat - file time & date stamp
The time at which this COFF file was created. The value has the same meaning as a time_t value.

f_symptr - symbol table pointer
Contains the file offset of the symbol table.

f_nsyms - number of symbols in the symbol table
The number of symbols in the symbol table.

f_opthdr - optional header size
The number of extra bytes that follow the file header, before the section headers begin. Often used to store the optional a.out header. Regardless of what optional header you expect, you should read (or skip) exactly the number of bytes given in this field before reading the section headers.

f_flags - flag bits
These flags provide additional information about the state of this COFF object. The flags are as follows:

Bit Symbol Meaning
0x0001 F_RELFLG If set, there is no relocation information in this file. This is usually clear for objects and set for executables.
0x0002 F_EXEC If set, all unresolved symbols have been resolved and the file may be considered executable.
0x0004 F_LNNO If set, all line number information has been removed from the file (or was never added in the first place).
0x0008 F_LSYMS If set, all the local symbols have been removed from the file (or were never added in the first place).
0x0100 F_AR32WR Indicates that the file is 32-bit little endian
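
As a rough illustration of the advice above (read a fixed number of bytes rather than trusting sizeof), the following sketch decodes a file header from a raw byte buffer. The names coff_filehdr, coff_parse_filehdr, rd16, rd32, COFF_FILHSZ and COFF_I386MAGIC are hypothetical and are not taken from <coff.h>; the value 20 for the header size is an assumption derived by adding up the field sizes in FILHDR.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_FILHSZ    20      /* assumed on-disk size: 2+2+4+4+4+2+2 bytes */
#define COFF_I386MAGIC 0x14c   /* f_magic value given in the text */

/* Host-independent copy of the FILHDR fields (hypothetical type). */
struct coff_filehdr {
    uint16_t f_magic, f_nscns;
    uint32_t f_timdat, f_symptr, f_nsyms;
    uint16_t f_opthdr, f_flags;
};

/* Little-endian readers, so the code also works on big-endian hosts. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode COFF_FILHSZ bytes from buf.
   Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int coff_parse_filehdr(const unsigned char *buf, size_t len,
                              struct coff_filehdr *out)
{
    if (len < COFF_FILHSZ)
        return -1;
    out->f_magic  = rd16(buf + 0);
    out->f_nscns  = rd16(buf + 2);
    out->f_timdat = rd32(buf + 4);
    out->f_symptr = rd32(buf + 8);
    out->f_nsyms  = rd32(buf + 12);
    out->f_opthdr = rd16(buf + 16);
    out->f_flags  = rd16(buf + 18);
    return out->f_magic == COFF_I386MAGIC ? 0 : -1;
}
```

Decoding field by field like this sidesteps any padding the compiler might add to a struct, at the cost of a few extra lines.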

 

Optional Header

The optional header immediately follows the file header in the COFF file. The size of this header is stored in the f_opthdr field of the file header. You must read that many bytes from the file regardless of how big you expect the optional header to be.

Two optional headers are defined for DJGPP objects:

Struct Size Purpose
AOUTHDR 28 Added to executables to provide the entry point of the program
GNU_AOUT 32 Unknown

 

typedef struct {
unsigned short magic; /* type of file */
unsigned short vstamp; /* version stamp */
unsigned long tsize; /* text size in bytes, padded to FW bdry*/
unsigned long dsize; /* initialized data " " */
unsigned long bsize; /* uninitialized data " " */
unsigned long entry; /* entry pt. */
unsigned long text_start; /* base of text used for this file */
unsigned long data_start; /* base of data used for this file */
} AOUTHDR;

The only two fields you should rely on are described below.

magic - magic number
Always the value ZMAGIC (0x010b).

entry - entry point
This should be used to provide the initial value of %eip when the program is initialized.
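
One way to pick up the entry point, sketched under a few assumptions: the caller has already read exactly f_opthdr bytes into memory, the byte offsets are derived from the AOUTHDR field sizes shown above, and only a 28-byte header is treated as AOUTHDR (a 32-byte GNU_AOUT header is left alone because its layout isn't described here). The names coff_entry_point, COFF_AOUTSZ and COFF_ZMAGIC are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_AOUTSZ 28      /* size of AOUTHDR from the table above */
#define COFF_ZMAGIC 0x010b  /* magic value given in the text */

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* opt points at the f_opthdr bytes that follow the file header.
   Stores the entry point and returns 1 if they look like an AOUTHDR;
   returns 0 otherwise and leaves *entry untouched. */
static int coff_entry_point(const unsigned char *opt, size_t f_opthdr,
                            uint32_t *entry)
{
    if (f_opthdr != COFF_AOUTSZ)      /* some other optional header, or none */
        return 0;
    if (rd16(opt + 0) != COFF_ZMAGIC) /* magic */
        return 0;
    *entry = rd32(opt + 16);          /* entry follows magic, vstamp, tsize, dsize, bsize */
    return 1;
}
```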

 

Section Header

typedef struct {
char s_name[8]; /* section name */
unsigned long s_paddr; /* physical address, aliased s_nlib */
unsigned long s_vaddr; /* virtual address */
unsigned long s_size; /* section size */
unsigned long s_scnptr; /* file ptr to raw data for section */
unsigned long s_relptr; /* file ptr to relocation */
unsigned long s_lnnoptr; /* file ptr to line numbers */
unsigned short s_nreloc; /* number of relocation entries */
unsigned short s_nlnno; /* number of line number entries */
unsigned long s_flags; /* flags */
} SCNHDR;

This structure always exists immediately following any optional header in the COFF file (or following the file header, if f_opthdr is zero). When reading this header, you should read SCNHSZ bytes, and not rely on sizeof(SCNHDR) to give the correct size. The number of section headers present is given in the f_nscns field of the file header.

s_name - section name
The name of the section. The section name will never be more than eight characters, but be careful to handle the case where it's exactly eight characters - there will be no trailing null in the file! For shorter names, the field is padded with null bytes.

s_paddr - physical address of section data
This is the address at which the section data should be loaded into memory. For linked executables, this is the absolute address within the program space. For unlinked objects, this address is relative to the object's address space (i.e. the first section is always at offset zero).

s_vaddr - virtual address of section data
Always the same value as s_paddr in DJGPP.

s_size - section data size
The number of bytes of data stored in the file for this section. You should always read this many bytes from the file, beginning s_scnptr bytes from the beginning of the object.

s_scnptr - section data pointer
This contains the file offset of the section data.

s_relptr - relocation data pointer
The file offset of the relocation entries for this section.

s_lnnoptr - line number table pointer
The file offset of the line number entries for this section.

s_nreloc - number of relocation entries
The number of relocation entries for this section. Beware files with more than 65535 entries; this field truncates the count, and the file provides no other way to get the "real" value.

s_nlnno - number of line number entries
The number of line number entries for this section. Beware files with more than 65535 entries; this field truncates the count, and the file provides no other way to get the "real" value.

s_flags - flag bits
These flags provide additional information for each section. Flags other than those listed below may be set, but they tell you nothing useful beyond what these three provide.

Bit Symbol Meaning
0x0020 STYP_TEXT If set, indicates that this section contains only executable code.
0x0040 STYP_DATA If set, indicates that this section contains only initialized data.
0x0080 STYP_BSS If set, indicates that this section defines uninitialized data, and has no data stored in the COFF file for it.
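
Two of the points above are easy to get wrong: the unterminated eight-character name, and sections that have no data in the file. The sketch below handles both on a raw section header. The helper names and COFF_* constants are hypothetical; the byte offsets and the 40-byte size are assumptions derived from the SCNHDR field sizes, and returning zero for STYP_BSS sections is one reading of the flag description rather than something the format mandates.

```c
#include <stdint.h>
#include <string.h>

#define COFF_SCNHSZ   40      /* assumed on-disk size of SCNHDR */
#define COFF_STYP_BSS 0x0080  /* flag value from the table above */

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy the 8-byte s_name field into a 9-byte buffer and terminate it,
   so that names of exactly eight characters are handled correctly. */
static void coff_section_name(const unsigned char *scnhdr, char name[9])
{
    memcpy(name, scnhdr, 8);
    name[8] = '\0';
}

/* Number of bytes to read from s_scnptr for this section:
   s_size normally, but nothing for a BSS section. */
static uint32_t coff_section_file_bytes(const unsigned char *scnhdr)
{
    uint32_t s_size  = rd32(scnhdr + 16);
    uint32_t s_flags = rd32(scnhdr + 36);
    return (s_flags & COFF_STYP_BSS) ? 0 : s_size;
}
```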

 

Relocation Directives

typedef struct {
unsigned long r_vaddr; /* address of relocation */
unsigned long r_symndx; /* symbol we're adjusting for */
unsigned short r_type; /* type of relocation */
} RELOC;

Warning: This structure's size is not a multiple of four. When reading from file, it is strongly recommended that you either (1) read each entry in a loop, reading RELSZ bytes each time, or (2) allocate a block of memory and calculate a pointer to each entry you need by multiplying its index by RELSZ. In no case should you assume that array addressing or sizeof(RELOC) will be useful.
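
A sketch of the second approach, indexing into a block of raw entries by a fixed stride. The names coff_reloc, coff_get_reloc and COFF_RELSZ are hypothetical, and the stride of 10 bytes is an assumption derived from the RELOC field sizes (4 + 4 + 2); real code would use RELSZ from <coff.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_RELSZ 10   /* assumed on-disk size of one RELOC entry */

/* Host-independent copy of one relocation entry (hypothetical type). */
struct coff_reloc {
    uint32_t r_vaddr;
    uint32_t r_symndx;
    uint16_t r_type;
};

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode entry i from a block holding the s_nreloc entries that were
   read from file offset s_relptr. The caller must ensure i < s_nreloc. */
static struct coff_reloc coff_get_reloc(const unsigned char *block, size_t i)
{
    const unsigned char *p = block + i * COFF_RELSZ;
    struct coff_reloc r;
    r.r_vaddr  = rd32(p + 0);
    r.r_symndx = rd32(p + 4);
    r.r_type   = rd16(p + 8);
    return r;
}
```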

There are only two types of relocations that you will encounter in a normal DJGPP COFF object.

Type Value Purpose
RELOC_ADDR32 6 Relocate a 32-bit absolute reference
RELOC_REL32 20 Relocate a 32-bit relative reference

For any relocation, you must determine the new address of the symbol being adjusted for. If the symbol is in another object (external), the symbol table will contain an entry for that external symbol, and the relocation will refer to that symbol table entry. If the symbol is in the same object, the relocation will instead refer to a symbol table entry for the section itself; these section symbols are always there and always private. When you relocate the section itself, these symbols will reflect its new location.

RELOC_ADDR32

To do this relocation, you must perform the following steps:

RELOC_REL32

This relocation happens normally only in executable sections, and refers only to external symbols. To do this relocation, you must perform the following steps:

 

Line Numbers

typedef struct {
union {
unsigned long l_symndx; /* function name symbol index */
unsigned long l_paddr; /* address of line number */
} l_addr;
unsigned short l_lnno; /* line number */
} LINENO;

Warning: This structure's size is not a multiple of four. When reading from file, it is strongly recommended that you either (1) read each entry in a loop, reading LINESZ bytes each time, or (2) allocate a block of memory and calculate a pointer to each entry you need by multiplying its index by LINESZ. In no case should you assume that array addressing or sizeof(LINENO) will be useful.

Each executable section has its own line number table. Each function in that section is numbered independently, with the start of the function (the line with the opening brace) numbered as line one for that function. Each function in the line number table will have one entry where l_lnno is zero and the symbol table entry for the function is in l_symndx. This entry is followed by entries for each line of the function, with l_lnno set to the function-relative line number (1..N) and l_paddr set to the address of the first instruction generated for that line.

To figure out absolute line numbers, you must look in the symbol table for the function, find the "beginning of function" symbol (type C_FCN, usually right after the function's C_EXT or C_STAT symbol) where the absolute line number for the function (equivalent to line one in the line number table's scheme) is stored (in AUXENT.x_sym.x_misc.x_lnsz.x_lnno), and add that to the relative line numbers in the table.

The trick to getting line numbers right is to remember that the lines of the source file start at one (the first line in the file is line one) and functions are numbered starting at one also. When you add them up, you get one too many ones, so you must then subtract one to get the right line number.
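
The off-by-one adjustment described above can be sketched as follows. The helper names and COFF_LINESZ are hypothetical; the 6-byte entry size is an assumption derived from the LINENO layout (a 4-byte union followed by a 2-byte l_lnno), and fcn_first_line stands for the x_lnno value taken from the function's C_FCN aux entry.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_LINESZ 6   /* assumed on-disk size of one LINENO entry */

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return the l_lnno field of entry i in a block of raw line number
   entries. A result of zero marks the entry that introduces a function. */
static uint16_t coff_lineno_lnno(const unsigned char *block, size_t i)
{
    return rd16(block + i * COFF_LINESZ + 4);
}

/* Convert a function-relative line number (1..N) to a source file line.
   Both numbering schemes start at one, hence the single subtraction.
   Do not call this for entries whose l_lnno is zero. */
static uint32_t coff_absolute_line(uint32_t fcn_first_line, uint16_t l_lnno)
{
    return fcn_first_line + l_lnno - 1;
}
```

For example, if a function's opening brace is on line 42 of the source file, its relative line 1 maps to 42 and its relative line 3 maps to 44.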

 

Symbol Table

typedef struct {
union {
char e_name[E_SYMNMLEN];
struct {
unsigned long e_zeroes;
unsigned long e_offset;
} e;
} e;
unsigned long e_value;
short e_scnum;
unsigned short e_type;
unsigned char e_sclass;
unsigned char e_numaux;
} SYMENT;

The symbol table is probably one of the most complex parts of the COFF object, mostly because there are so many symbol types. The symbol table has entries for all symbols and meta-symbols, including public, static, external, section, and debugging symbols.

e.e_name - inlined symbol name
If the symbol's name is eight characters or fewer, it is stored in this field. Note that the first four characters overlap the e_zeroes field - by doing so, the e_zeroes field can be used to determine if the symbol name has been inlined. Beware that the name is null terminated only if it is less than eight characters long.

e.e.e_zeroes - flag to tell if name is inlined
If this field is zero, then the symbol name is found by using e_offset as an offset into the string table. If it is nonzero, then the name is in the e_name field.

e.e.e_offset - offset of name in string table
If e_zeroes is zero, this field contains the offset of the symbol name in the string table.
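
The three name fields work together, so a small sketch may help. It operates on one raw symbol table entry and on a string table that has already been loaded into memory with its offsets relative to the start of the block. The function name coff_symbol_name is hypothetical, and the sketch does not check that e_offset lies within the string table, which real code should do.

```c
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* sym:        one raw symbol table entry
   strtab:     the string table as loaded into memory
   inline_buf: scratch space used when the name is stored inline
   Returns a null-terminated name, pointing into strtab or inline_buf. */
static const char *coff_symbol_name(const unsigned char *sym,
                                    const char *strtab,
                                    char inline_buf[9])
{
    if (rd32(sym + 0) == 0)               /* e_zeroes == 0: name is in the string table */
        return strtab + rd32(sym + 4);    /* e_offset */

    memcpy(inline_buf, sym, 8);           /* e_name, possibly unterminated */
    inline_buf[8] = '\0';
    return inline_buf;
}
```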

e_value - the value of the symbol
The value of the symbol. For example, if the symbol represents a function, this contains the address of the function. The meaning of the value depends on the type of symbol (below).

e_scnum - section number
The number of the section that this symbol belongs to. The first section in the section table is section one. In addition, e_scnum may be one of the following values:

Symbol Value Meaning
N_UNDEF 0 An undefined (extern) symbol
N_ABS -1 An absolute symbol (e_value is a constant, not an address)
N_DEBUG -2 A debugging symbol

e_type - symbol type
The type of the symbol. This is made up of a base type and a derived type. For example, "pointer to int" combines the derived type "pointer to T" with the base type "int".

Type Bits Meaning
T_NULL ---- 0000 No symbol
T_VOID ---- 0001 void function argument (not used)
T_CHAR ---- 0010 character
T_SHORT ---- 0011 short integer
T_INT ---- 0100 integer
T_LONG ---- 0101 long integer
T_FLOAT ---- 0110 floating point
T_DOUBLE ---- 0111 double precision float
T_STRUCT ---- 1000 structure
T_UNION ---- 1001 union
T_ENUM ---- 1010 enumeration
T_MOE ---- 1011 member of enumeration
T_UCHAR ---- 1100 unsigned character
T_USHORT ---- 1101 unsigned short
T_UINT ---- 1110 unsigned integer
T_ULONG ---- 1111 unsigned long
T_LNGDBL ---1 0000 long double (special case bit pattern)
DT_NON --00 ---- No derived type
DT_PTR --01 ---- pointer to T
DT_FCN --10 ---- function returning T
DT_ARY --11 ---- array of T

The BTYPE(x) macro extracts the base type from e_type. Note that all DT_* must be shifted by N_BTSHIFT to get actual values, as in:

e_type = base + (derived << N_BTSHIFT);
There are also macros ISPTR, ISFCN, and ISARY that test the upper bits for the derived type.
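
To make the bit layout concrete, here is a sketch that packs and unpacks a type using plain functions instead of the <coff.h> macros. All names are hypothetical, and the shift of 4 and the masks are assumptions read off the bit patterns in the table above. Note that T_LNGDBL's special-case pattern occupies one of the derived-type bits, so this simple decoder would not recognize it.

```c
#include <stdint.h>

/* Values read off the table above; names are illustrative only. */
enum {
    COFF_T_CHAR = 2, COFF_T_INT = 4,
    COFF_DT_NON = 0, COFF_DT_PTR = 1, COFF_DT_FCN = 2, COFF_DT_ARY = 3
};

#define COFF_BTMASK  0x0fu  /* low four bits hold the base type */
#define COFF_BTSHIFT 4      /* derived type sits just above them */
#define COFF_DTMASK  0x03u  /* two bits of derived type */

/* Combine a base type and a derived type into an e_type value. */
static unsigned coff_make_type(unsigned base, unsigned derived)
{
    return base + (derived << COFF_BTSHIFT);
}

/* Extract the base type (T_*) from an e_type value. */
static unsigned coff_base_type(unsigned e_type)
{
    return e_type & COFF_BTMASK;
}

/* Extract the derived type (DT_*) from an e_type value. */
static unsigned coff_derived_type(unsigned e_type)
{
    return (e_type >> COFF_BTSHIFT) & COFF_DTMASK;
}
```

The parentheses around the shift matter: in C, + binds more tightly than <<, so leaving them out would shift the sum rather than the derived type alone.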

e_sclass - storage class
This tells what the symbol represents and where it lives.

Class Value Meaning
C_NULL 0 No entry
C_AUTO 1 Automatic variable
C_EXT 2 External (public) symbol - this covers globals and externs
C_STAT 3 static (private) symbol
C_REG 4 register variable
C_EXTDEF 5 External definition
C_LABEL 6 label
C_ULABEL 7 undefined label
C_MOS 8 member of structure
C_ARG 9 function argument
C_STRTAG 10 structure tag
C_MOU 11 member of union
C_UNTAG 12 union tag
C_TPDEF 13 type definition
C_USTATIC 14 undefined static
C_ENTAG 15 enumeration tag
C_MOE 16 member of enumeration
C_REGPARM 17 register parameter
C_FIELD 18 bit field
C_AUTOARG 19 auto argument
C_LASTENT 20 dummy entry (end of block)
C_BLOCK 100 ".bb" or ".eb" - beginning or end of block
C_FCN 101 ".bf" or ".ef" - beginning or end of function
C_EOS 102 end of structure
C_FILE 103 file name
C_LINE 104 line number, reformatted as symbol
C_ALIAS 105 duplicate tag
C_HIDDEN 106 ext symbol in dmert public lib
C_EFCN 255 physical end of function

e_numaux - number of auxiliary entries
Each symbol is allowed to have additional data that follows it in the symbol table. This field tells how many equivalent SYMENTs are used for aux entries. For most symbols, this is zero. A value of one allows up to SYMESZ bytes of auxiliary information for that symbol. A non-exhaustive list of auxiliary entries follows, based on the storage class (e_sclass) or type (e_type) of the symbol.
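
Because auxiliary entries occupy slots in the same table, a walk over the symbols has to step over them. The sketch below counts the primary (non-aux) entries in a raw symbol table. The function name and COFF_SYMESZ are hypothetical; the 18-byte entry size and the position of e_numaux as the last byte are assumptions derived from the SYMENT layout, and the sketch assumes f_nsyms counts aux entries as well as primary ones.

```c
#include <stddef.h>
#include <stdint.h>

#define COFF_SYMESZ 18   /* assumed on-disk size of one SYMENT */

/* Walk a raw symbol table of f_nsyms entries and count the primary
   symbols, skipping the aux entries that follow each one. */
static size_t coff_count_primary_symbols(const unsigned char *symtab,
                                         uint32_t f_nsyms)
{
    size_t count = 0;
    uint32_t i = 0;

    while (i < f_nsyms) {
        const unsigned char *sym = symtab + (size_t)i * COFF_SYMESZ;
        unsigned e_numaux = sym[17];   /* last byte of the entry */

        count++;
        i += 1 + e_numaux;             /* step over this symbol's aux entries */
    }
    return count;
}
```

The same loop shape applies when doing real work per symbol: process the entry at index i, look at the e_numaux entries after it if needed, then advance by 1 + e_numaux.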

Auxiliary Entries

DT_ARY
.x_sym.x_misc.x_lnsz.x_size
size in bytes (size*count)

T_STRUCT
T_UNION
T_ENUM
.x_sym.x_tagndx
syment index for list of tags (will point to C_STRTAG, C_UNTAG, or C_ENTAG)
.x_sym.x_misc.x_lnsz.x_size
size in bytes (size*count)

T_NULL | C_STAT - section symbols (like .text)
.x_scn.x_scnlen
section length (bytes)
.x_scn.x_nreloc
number of relocation entries (ushort)
.x_scn.x_nlinno
number of line numbers (ushort)

C_STRTAG - will be followed by C_MOS's and C_EOS
C_UNTAG - will be followed by C_MOU's and C_EOS
C_ENTAG - will be followed by C_MOE's and C_EOS
.x_sym.x_misc.x_lnsz.x_size
The size of the struct/union/enum
.x_sym.x_fcnary.x_fcn.x_endndx
The symbol index after our list.

C_EOS
.x_sym.x_misc.x_lnsz.x_size
the size of the struct/union/enum
.x_sym.x_tagndx
The symbol index of the start of our list.

C_FIELD
.x_sym.x_misc.x_lnsz.x_size
the number of bits

C_BLOCK
.x_sym.x_misc.x_lnsz.x_lnno
starting line number
.x_sym.x_fcnary.x_fcn.x_endndx
The symbol index after our block (if .bb)

C_FCN
.x_sym.x_misc.x_lnsz.x_lnno
starting line number
.x_sym.x_misc.x_lnsz.x_size
size in bytes

C_FILE
.x_file.x_fname
.x_file.x_n.x_zeroes
.x_file.x_n.x_offset
These three specify the file name, just like the three fields used to specify the symbol name.

Meanings of the Values

SClass Meaning of the Value
C_AUTO, C_ARG Address of the variable, relative to %ebp
C_EXT, C_STAT, others The address of the symbol
C_REG The register number assigned to this variable
C_MOS Offset of the member from the beginning of the structure
C_MOE The value of this enum member
C_FIELD The mask for this field
C_EOS size of struct/union/enum

 

String Table

The string table contains the names of symbols that are too long to inline in the symbol table. To read the string table, position the file pointer just after the symbol table (usually, you read the strings right after you read the symbols anyway), and read four bytes as one 32-bit little endian integer. Allocate this much memory. Set the first four bytes of the memory to zero, and read the remainder of the string table (length-4) into the remainder of the memory (ptr+4). A code sample would look like this:
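
A minimal sketch of those steps, assuming the stream is already positioned just past the symbol table. The function name coff_read_strtab is hypothetical, and the sketch simply returns NULL when the length word is missing or smaller than four; how to treat a file without a string table is left to the caller.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the string table from fp, which must be positioned at its start.
   Returns a malloc'd block of *length_out bytes whose first four bytes
   are zero, or NULL on error. The caller frees the block. */
static char *coff_read_strtab(FILE *fp, uint32_t *length_out)
{
    unsigned char lenbuf[4];
    uint32_t length;
    char *tab;

    if (fread(lenbuf, 1, 4, fp) != 4)
        return NULL;

    /* The total length is stored as a 32-bit little endian integer. */
    length = (uint32_t)lenbuf[0] | ((uint32_t)lenbuf[1] << 8) |
             ((uint32_t)lenbuf[2] << 16) | ((uint32_t)lenbuf[3] << 24);
    if (length < 4)
        return NULL;

    tab = malloc(length);
    if (tab == NULL)
        return NULL;

    memset(tab, 0, 4);   /* zero the bytes where the length used to be */
    if (fread(tab + 4, 1, length - 4, fp) != length - 4) {
        free(tab);
        return NULL;
    }

    if (length_out != NULL)
        *length_out = length;
    return tab;
}
```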

All references to strings in this table are offsets from the beginning of this memory block. Note that offsets of zero are legal and will result in a zero-length string because of those four zeros you put at the beginning (where the length used to be).

