Tuesday, August 12, 2008

The PE File Format (1)


What is a PE file?

The PE file format is the format of executable binaries (DLLs and programs) for Microsoft Windows NT, Windows 95, and Win32s. It is called the Portable Executable (PE) format because it was meant to be portable across all of Microsoft's 32-bit operating systems.

The format was designed by Microsoft and standardized by the Tool Interface Standard (TIS) Committee (Microsoft, Intel, Borland, Watcom, IBM, and others) in 1993. It is apparently based on a good knowledge of COFF, the "Common Object File Format" used for object files and executables on several UNIX systems and on VMS.

It is helpful to understand the PE file format because PE files are almost identical on disk and in RAM. Learning about the PE format is also helpful for understanding many operating system concepts.

STRUCTURE OF A PE FILE

Apart from the sections that hold the actual data, a PE file contains various headers that describe the sections and the important information stored in them. The structure is shown at right.

A PE file starts with the DOS executable header. It is followed by a small program (the DOS stub) that prints an error message saying that the program cannot be run in DOS mode.

After the DOS header and the DOS executable stub comes the PE header. A field in the DOS header points to this new header. The PE header starts with a 4-byte signature: the ASCII characters "PE" followed by two null bytes.
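The lookup described above can be sketched in a few lines of Python. This is only an illustration, not a full parser: the function name find_pe_header is made up for this post, and it assumes the pointer field is the 32-bit little-endian e_lfanew value stored at offset 0x3C of the DOS header.

```python
import struct

DOS_MAGIC = b"MZ"               # first two bytes of the DOS header
PE_SIGNATURE = b"PE\x00\x00"    # "PE" followed by two null bytes
E_LFANEW_OFFSET = 0x3C          # DOS header field that points to the PE header


def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature (hypothetical helper)."""
    if len(data) < 0x40 or data[:2] != DOS_MAGIC:
        raise ValueError("missing DOS header (no 'MZ' magic)")
    # e_lfanew is a 32-bit little-endian file offset.
    (pe_offset,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    if data[pe_offset:pe_offset + 4] != PE_SIGNATURE:
        raise ValueError("PE signature not found at the offset in e_lfanew")
    return pe_offset
```

Reading a whole file with open(path, "rb").read() and passing the bytes to this function should give the offset at which the rest of the headers start.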

The PE signature is followed by the object file header borrowed from COFF. This header is also present in the object files produced by Microsoft's 32-bit compilers. It contains some general information about the file, such as the target machine ID, the number of sections in the file, and so forth.
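As a rough sketch, the COFF-style header can be unpacked as a fixed 20-byte record that sits right after the 4-byte signature. The names CoffHeader and parse_coff_header are invented for this example, and the machine table lists only two well-known IDs.

```python
import struct
from typing import NamedTuple


class CoffHeader(NamedTuple):
    machine: int                   # target machine ID
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int           # file-level flags, e.g. 0x2000 for a DLL


COFF_FORMAT = "<HHIIIHH"           # 20 bytes, little-endian
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}   # small subset only


def parse_coff_header(data: bytes, pe_offset: int) -> CoffHeader:
    """Unpack the COFF header that follows the "PE\\0\\0" signature."""
    return CoffHeader(*struct.unpack_from(COFF_FORMAT, data, pe_offset + 4))
```

The size_of_optional_header field matters later: it tells where the optional header ends and the section table begins.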

The COFF-style header is followed by the optional header. This header is optional in the sense that it is not required for object files. For executables and DLLs, it is mandatory. The optional header has two parts. The first part is inherited from COFF and can be found in all COFF files. The second part is an NT-specific extension of COFF. Apart from other NT-specific information, such as the subsystem type, this part also contains the data directory. The data directory is an array in which each entry points to some important piece of information. One of the entries in the data directory points to the import table of the executable or DLL, another entry points to the export table of the DLL, and so on. The data directory is followed by the section table. The section table is an array of section headers. A section header summarizes the important information about the respective section.
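To make the data directory more concrete, here is a minimal sketch that reads the subsystem type and the directory entries from the optional header. It assumes the standard field offsets for the 32-bit (magic 0x10B) and 64-bit (magic 0x20B) layouts; parse_data_directory and the index constants are names chosen for this example.

```python
import struct

# Indices of a few well-known data directory entries.
EXPORT_TABLE = 0
IMPORT_TABLE = 1
BASE_RELOCATION_TABLE = 5

SUBSYSTEM_OFFSET = 68          # same position in both layouts
# Offset of NumberOfRvaAndSizes, keyed by the optional header magic.
COUNT_OFFSET = {0x10B: 92, 0x20B: 108}


def parse_data_directory(data: bytes, opt_offset: int):
    """Return (subsystem, [(rva, size), ...]) for the optional header at opt_offset."""
    (magic,) = struct.unpack_from("<H", data, opt_offset)
    if magic not in COUNT_OFFSET:
        raise ValueError(f"unexpected optional header magic {magic:#x}")
    (subsystem,) = struct.unpack_from("<H", data, opt_offset + SUBSYSTEM_OFFSET)
    count_at = opt_offset + COUNT_OFFSET[magic]
    (count,) = struct.unpack_from("<I", data, count_at)
    entries = []
    for i in range(min(count, 16)):      # the array holds at most 16 entries
        # Each entry is a relative virtual address plus a size in bytes.
        entries.append(struct.unpack_from("<II", data, count_at + 4 + 8 * i))
    return subsystem, entries
```

The optional header itself starts 24 bytes after the start of the PE signature (4 bytes of signature plus the 20-byte COFF header). An entry whose address and size are both zero means that the corresponding table is not present in the file.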

Finally, the section table is followed by the sections. Each section has some flags (its alignment, the kind of data it contains, such as initialized data, whether it can be shared, and so on) and the data itself. Most, but not all, sections contain one or more directories referenced through the entries of the optional header's data directory array, like the directory of exported functions or the directory of base relocations. Contents that have no directory include, for example, executable code and initialized data.
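The section headers and their flags can be decoded along the same lines. The sketch below assumes the usual 40-byte section header layout and decodes only a handful of the characteristics bits; parse_sections and the flag labels are illustrative names, not part of any official API.

```python
import struct

SECTION_FORMAT = "<8sIIIIIIHHI"                   # one section header
SECTION_SIZE = struct.calcsize(SECTION_FORMAT)    # 40 bytes

# A few of the bits in the Characteristics field.
FLAGS = {
    0x00000020: "code",
    0x00000040: "initialized data",
    0x00000080: "uninitialized data",
    0x10000000: "shared",
    0x20000000: "executable",
    0x40000000: "readable",
    0x80000000: "writable",
}


def parse_sections(data: bytes, table_offset: int, count: int):
    """Return one dict per section header found in the section table."""
    sections = []
    for i in range(count):
        fields = struct.unpack_from(
            SECTION_FORMAT, data, table_offset + i * SECTION_SIZE)
        characteristics = fields[9]
        sections.append({
            # Names are padded with null bytes up to 8 characters.
            "name": fields[0].rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_address": fields[2],
            "raw_size": fields[3],
            "raw_offset": fields[4],
            "flags": [label for bit, label in FLAGS.items()
                      if characteristics & bit],
        })
    return sections
```

The table offset is the start of the optional header plus the size of the optional header recorded in the COFF header, and the count is the number of sections from the same header. The raw offset and raw size then say where the section's data lives in the file.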

Reference:
http://www.windowsitlibrary.com/Content/356/11/1.html
http://webster.cs.ucr.edu/Page_TechDocs/pe.txt
