Understanding PE Structure, The Layman’s Way – Malware Analysis Part 2

PE header format

Hello readers! In this article, we will look at the PE header, which is essential for understanding the internals of an executable file. Once you have an overall idea of what's inside an executable file and how Windows runs it, analyzing executables becomes much easier as you advance on your malware analysis journey. I hope this article shows why the PE header matters when you analyze a malware binary. I'll keep this post as simple as possible, since I'm assuming you are new to the exciting world of malware analysis and I don't want you to get distracted. So, let's get started.

Introduction

Windows executables build on the Common Object File Format (COFF), a format for executables, object code, and shared libraries originally used on Unix systems. The PE (Portable Executable) format is a modified version of COFF used for executables, object code, DLLs, FON font files, and core dumps in 32-bit and 64-bit versions of Windows. And if you ask me what's on the plate for Linux? Well, Linux uses the Executable and Linkable Format (ELF). Since this post is dedicated to Windows PE headers, I will discuss the ELF format in a later post.

The PE format is a data structure that tells the Windows OS loader what information it needs in order to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data, and TLS (thread-local storage) data. The data structures on disk are the same ones used in memory, so if you know how to find something in a PE file, you can almost certainly find the same information after the file is loaded into memory. Note, however, that a PE file is not simply mapped into memory as a single memory-mapped file. Instead, the Win32 loader looks at the PE file and decides which portions of it to map in.

A module in memory represents all the code, data, and resources from an executable file that are needed by a process. Other parts of a PE file may be read but not mapped in (for instance, relocations). Some parts may not be mapped in at all, for example, debug information placed at the end of the file. A field in the optional header (SizeOfImage) tells the system how much memory to set aside for mapping the executable. Data that won't be mapped in is placed at the end of the file, past any parts that will be.

The PE data structures include the DOS Header, DOS Stub, PE File Header, Image Optional Header, Section Table, Data Directories, and Sections.

PE Structure
PE (Portable Executable) Format

Let me explain these data structures with an example. I'll use the Calculator (calc.exe) and open it in the HxD hex editor. You can grab this handy tool from here.

Calc in HxD

DOS_Header

The DOS Header occupies the first 64 bytes of the file, i.e., the first 4 rows of the hex editor (16 bytes per row), as seen in the image below. Notice the ASCII string "MZ" at the very beginning of the file. These two bytes (hexadecimal 4D 5A) form the e_magic field, also called the magic number, which identifies an MS-DOS-compatible executable. Because x86 stores multi-byte values in little-endian order, the two bytes are read as the WORD 0x5A4D. MZ are the initials of Mark Zbikowski, one of the developers of MS-DOS. All MS-DOS-compatible executable files set this field to 0x5A4D, i.e., "MZ" in ASCII (WINNT.H names this value IMAGE_DOS_SIGNATURE).

The last field, e_lfanew, is a 4-byte value at offset 0x3C that holds the file offset of the PE header. Here it reads F0 00 00 00, which in little-endian order is 0x000000F0. Check the PE header section below for this offset location.
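To make these two fields concrete, here is a minimal C sketch that checks e_magic and reads e_lfanew from a buffer holding the start of a file. The names (read_u16_le, dos_header_lfanew and the macros) are my own, not WINNT.H's; the offsets 0x00 and 0x3C are the documented positions of the two fields, and the sketch assumes the first 64 bytes of the file are already in memory.

```c
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC_MZ    0x5A4Du /* bytes 4D 5A ("MZ") read as a little-endian WORD */
#define E_LFANEW_OFFSET 0x3Cu   /* position of e_lfanew inside the DOS header */
#define DOS_HEADER_SIZE 64u

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns e_lfanew if buf starts with a valid DOS header, or 0 otherwise. */
uint32_t dos_header_lfanew(const uint8_t *buf, size_t len) {
    if (len < DOS_HEADER_SIZE) return 0;                /* too short for a DOS header */
    if (read_u16_le(buf) != DOS_MAGIC_MZ) return 0;     /* not an "MZ" file */
    return read_u32_le(buf + E_LFANEW_OFFSET);          /* file offset of the PE header */
}
```

Given the first 64 bytes of the calc.exe shown above, this would return 0xF0.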

DOS Header

DOS_Stub Program

A stub is a tiny program or piece of code. The DOS stub is a small real-mode MS-DOS program that follows the DOS header. If we run a Win32-based program in an environment that doesn't support Win32, such as MS-DOS, MS-DOS runs this stub when it loads the executable, and the stub prints the informative message "This program cannot be run in DOS mode" and exits. Windows itself skips the stub and goes straight to the PE header via e_lfanew. When the Windows loader maps a PE file into memory, the mapping starts at the first byte of the file, so the DOS header and stub appear at the start of the image in memory as well.

DOS Stub

PE File Header

The PE header is located through the e_lfanew field of the MS-DOS header, which gives its file offset (WINNT.H describes e_lfanew as the "file address of new exe header"). The main PE header is a structure of type IMAGE_NT_HEADERS (IMAGE_NT_HEADERS64 for 64-bit images) with three members: Signature, FileHeader (an IMAGE_FILE_HEADER), and OptionalHeader (an IMAGE_OPTIONAL_HEADER).

The Signature is a 4-byte DWORD. In this case, the PE header starts at offset 000000F0, where we find the bytes 50 45 00 00: the letters "PE" followed by two zero bytes.
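A minimal C sketch of this check, assuming the file (or at least its headers) is in a byte buffer and e_lfanew has already been read. It compares raw bytes; equivalently, WINNT.H defines IMAGE_NT_SIGNATURE as 0x00004550, which is the same four bytes read as a little-endian DWORD. has_pe_signature is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "PE\0\0" as it appears on disk: 50 45 00 00 */
static const uint8_t PE_SIGNATURE[4] = { 0x50, 0x45, 0x00, 0x00 };

/* Returns 1 if the 4 bytes at offset e_lfanew are the PE signature, else 0. */
int has_pe_signature(const uint8_t *buf, size_t len, uint32_t e_lfanew) {
    /* Bounds check written so that e_lfanew + 4 cannot overflow. */
    if (len < 4 || e_lfanew > len - 4) return 0;
    return memcmp(buf + e_lfanew, PE_SIGNATURE, sizeof PE_SIGNATURE) == 0;
}
```

Checking the bounds before reading matters in malware analysis, since a malformed or hostile file can put any value in e_lfanew.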

PE Header

IMAGE_FILE_HEADER:- The file header is the next 20 bytes of the PE file and contains only the most basic information about the layout of the file.

PE Header

The part highlighted in the image above is the file header, which every portable executable file has at this position.

The fields of the Image File Header are as follows:-

Image File Header
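For reference, here is a minimal C sketch that decodes those 20 bytes field by field. FileHeader, parse_file_header and is_dll are hypothetical names mirroring IMAGE_FILE_HEADER; the sketch assumes p points at e_lfanew + 4, just past the PE signature. The Machine and Characteristics values in the comments are common WINNT.H constants.

```c
#include <stdint.h>

/* Field-by-field copy of IMAGE_FILE_HEADER (20 bytes on disk). */
typedef struct {
    uint16_t machine;                 /* e.g. 0x014C = x86, 0x8664 = x64 */
    uint16_t number_of_sections;      /* entries in the section table */
    uint32_t time_date_stamp;         /* usually link time, seconds since 1970-01-01 UTC */
    uint32_t pointer_to_symbol_table; /* normally 0 for images */
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header; /* needed to find the section table */
    uint16_t characteristics;         /* flags, e.g. 0x0002 EXECUTABLE_IMAGE, 0x2000 DLL */
} FileHeader;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* p must point at the file header, i.e. e_lfanew + 4 bytes into the file. */
FileHeader parse_file_header(const uint8_t *p) {
    FileHeader fh;
    fh.machine                 = rd16(p + 0);
    fh.number_of_sections      = rd16(p + 2);
    fh.time_date_stamp         = rd32(p + 4);
    fh.pointer_to_symbol_table = rd32(p + 8);
    fh.number_of_symbols       = rd32(p + 12);
    fh.size_of_optional_header = rd16(p + 16);
    fh.characteristics         = rd16(p + 18);
    return fh;
}

/* IMAGE_FILE_DLL (0x2000) marks the image as a DLL rather than an EXE. */
int is_dll(const FileHeader *fh) { return (fh->characteristics & 0x2000u) != 0; }
```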

IMAGE_OPTIONAL_HEADER:- Don't let the name fool you. This header is optional only for object files; every executable image (EXE or DLL) must have one. It contains critical information beyond the basics held in the IMAGE_FILE_HEADER structure. Some of the important fields you should be aware of are listed below:-

  • Magic:- The Magic field tells whether an executable image is 32-bit or 64-bit. It is 0x10b (IMAGE_NT_OPTIONAL_HDR32_MAGIC) in a 32-bit (PE32) image and 0x20b (IMAGE_NT_OPTIONAL_HDR64_MAGIC) in a 64-bit (PE32+) image. WINNT.H also defines IMAGE_NT_OPTIONAL_HDR_MAGIC, which maps to one of these two depending on whether you compile for 32-bit or 64-bit.
  • Address of Entry Point:- The address where the Windows loader begins execution. This is an RVA (Relative Virtual Address), i.e., an offset from the image base, and it usually points into the .text section. For executables this is the starting address; for device drivers it is the address of the initialization function. The entry point is optional for DLLs, and when none is present this field is zero.
  • Image Base:- The preferred address at which the image is mapped into memory. If that address is not available, the loader relocates the image using its relocation information. In early Windows NT, the default image base for an executable was 0x10000 and for a DLL it was 0x400000. On Windows 95, however, the address 0x10000 couldn't be used to load 32-bit EXEs because it lies within the linear address region shared by all processes, so Microsoft changed the default base address for Win32 executables to 0x400000. The current PE specification documents 0x00400000 as the default for executables and 0x10000000 for DLLs.
  • Section Alignment:- When an executable is mapped into memory, each section starts at a virtual address that is a multiple of this value (by default, the page size of the architecture).
  • Subsystem:- Identifies the subsystem an executable targets, i.e., the kind of user interface it uses. The possible values are defined in WINNT.H, where each identifier carries an IMAGE_SUBSYSTEM_ prefix:-
      Value  Identifier   Meaning
      1      NATIVE       Doesn't require a subsystem (e.g., device drivers)
      2      WINDOWS_GUI  Runs in the Windows GUI subsystem
      3      WINDOWS_CUI  Runs in the Windows character (console) subsystem
      5      OS2_CUI      Runs in the OS/2 character subsystem
      7      POSIX_CUI    Runs in the POSIX character subsystem
  • IMAGE_DATA_DIRECTORY:- The data directory array tells the loader where to find other important parts of the executable, such as the import and export tables. It sits at the end of the optional header. The PE format reserves room for 16 entries (the NumberOfRvaAndSizes field gives the actual count, normally 16). Older documentation described only 11 of them as in use; the current specification assigns meanings to the first 15 and marks the last one as reserved.
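The fields above live at fixed offsets inside the optional header, but the layout differs between PE32 and PE32+: PE32+ drops BaseOfData and widens ImageBase to 8 bytes, and the stack/heap size fields also grow to 8 bytes, which shifts NumberOfRvaAndSizes and the data directory array. Below is a minimal C sketch, with offsets taken from Microsoft's PE format documentation, that checks Magic and then reads a handful of these fields. The OptionalSummary struct and all function names are hypothetical, and it assumes the whole optional header is already in memory.

```c
#include <stdint.h>

#define OPT_MAGIC_PE32     0x10Bu /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define OPT_MAGIC_PE32PLUS 0x20Bu /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

typedef struct {
    uint16_t magic;
    uint32_t address_of_entry_point;  /* RVA */
    uint64_t image_base;              /* preferred load address */
    uint32_t section_alignment;
    uint32_t size_of_image;
    uint16_t subsystem;               /* 2 = WINDOWS_GUI, 3 = WINDOWS_CUI, ... */
    uint32_t number_of_rva_and_sizes; /* data directory count, normally 16 */
} OptionalSummary;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32);
}

/* p points at the optional header (right after the 20-byte file header).
   Returns 0 on success, -1 if Magic is neither PE32 nor PE32+. */
int parse_optional_header(const uint8_t *p, OptionalSummary *out) {
    out->magic = rd16(p);
    /* These offsets are the same in PE32 and PE32+. */
    out->address_of_entry_point = rd32(p + 16);
    out->section_alignment      = rd32(p + 32);
    out->size_of_image          = rd32(p + 56);
    out->subsystem              = rd16(p + 68);
    if (out->magic == OPT_MAGIC_PE32) {
        out->image_base = rd32(p + 28);               /* 4-byte ImageBase after BaseOfData */
        out->number_of_rva_and_sizes = rd32(p + 92);
    } else if (out->magic == OPT_MAGIC_PE32PLUS) {
        out->image_base = rd64(p + 24);               /* 8-byte ImageBase, no BaseOfData */
        out->number_of_rva_and_sizes = rd32(p + 108);
    } else {
        return -1;
    }
    return 0;
}

/* Virtual address of the entry point if the image loads at its preferred base. */
uint64_t entry_point_va(const OptionalSummary *o) {
    return o->image_base + o->address_of_entry_point;
}

/* Round value up to a multiple of alignment (assumed to be a power of two). */
uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}
```

entry_point_va only holds when the image is loaded at its preferred base; if the loader relocates the image, add the RVA to the actual base address instead.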

Below are some of the data directory indices defined in WINNT.H:-

// Directory Entries (indices into the DataDirectory array)

// Export Directory
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0
// Import Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1
// Resource Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2
// Exception Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3
// Security Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY 4
// Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5
// Debug Directory
#define IMAGE_DIRECTORY_ENTRY_DEBUG 6
// Description String (x86 usage; newer WINNT.H names index 7 IMAGE_DIRECTORY_ENTRY_ARCHITECTURE)
#define IMAGE_DIRECTORY_ENTRY_COPYRIGHT 7
// Machine Value (MIPS GP)
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8
// TLS Directory
#define IMAGE_DIRECTORY_ENTRY_TLS 9
// Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10

Each data directory entry specifies the relative virtual address and size of the directory. To locate a particular directory, we first read its RVA from the data directory array in the optional header. Then we use that RVA to determine which section the directory is in, i.e., the section whose virtual address range contains it. Once we have identified that section, its section header gives us the exact file offset of the directory: offset = RVA - section VirtualAddress + section PointerToRawData.
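The lookup above boils down to a range check, one subtraction and one addition per section. Here is a minimal C sketch, assuming the relevant section header fields have already been read; SectionRange and rva_to_file_offset are hypothetical names. One caveat: the Security (certificate) directory is an exception, because its entry holds a file offset rather than an RVA, so it should not go through this conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal view of a section header: only the fields this lookup needs. */
typedef struct {
    uint32_t virtual_address;     /* RVA where the section starts in memory */
    uint32_t virtual_size;        /* size of the section in memory */
    uint32_t size_of_raw_data;    /* size of the section's data on disk */
    uint32_t pointer_to_raw_data; /* file offset of the section's data */
} SectionRange;

/* Translate an RVA (e.g. DataDirectory[i].VirtualAddress) to a file offset.
   Returns 0 and stores the offset on success, or -1 if the RVA is not backed
   by file data in any section. */
int rva_to_file_offset(const SectionRange *secs, size_t n, uint32_t rva,
                       uint32_t *offset) {
    for (size_t i = 0; i < n; i++) {
        const SectionRange *s = &secs[i];
        /* A section covers the larger of its in-memory and on-disk sizes. */
        uint32_t span = s->virtual_size > s->size_of_raw_data
                            ? s->virtual_size : s->size_of_raw_data;
        if (rva >= s->virtual_address && rva - s->virtual_address < span) {
            uint32_t delta = rva - s->virtual_address;
            /* Past the raw data the bytes exist only in memory (zero-filled). */
            if (delta >= s->size_of_raw_data) return -1;
            *offset = s->pointer_to_raw_data + delta;
            return 0;
        }
    }
    return -1;
}
```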

The data directory is another important concept you should be aware of. To keep this post from getting too long, I will cover data directories in a separate post.

Section Table

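The section table contains information about the various sections in the image of an executable file. It is an array of IMAGE_SECTION_HEADER structures (40 bytes each, NumberOfSections of them) that immediately follows the optional header, and its entries are sorted by RVA rather than alphabetically. Each entry contains the following important fields:-

  • Name of Section (Name)
  • Relative Virtual Address (VirtualAddress)
  • Size of Raw Data (SizeOfRawData)
  • Pointer to Raw Data (PointerToRawData)
  • Virtual Size (VirtualSize)
  • Characteristics of Section (Characteristics)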
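A minimal C sketch of reading those fields: the section table starts at e_lfanew + 4 (signature) + 20 (file header) + SizeOfOptionalHeader, and each header is 40 bytes. SectionInfo and the function names are hypothetical; the field offsets follow the IMAGE_SECTION_HEADER layout, and the executable flag is IMAGE_SCN_MEM_EXECUTE (0x20000000).

```c
#include <stdint.h>
#include <string.h>

#define SECTION_HEADER_SIZE 40u
#define SCN_MEM_EXECUTE 0x20000000u /* IMAGE_SCN_MEM_EXECUTE */

typedef struct {
    char     name[9];             /* 8 bytes on disk plus a terminating NUL */
    uint32_t virtual_size;
    uint32_t virtual_address;     /* RVA */
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data; /* file offset */
    uint32_t characteristics;
} SectionInfo;

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* File offset of the first section header. */
uint32_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header) {
    return e_lfanew + 4u + 20u + size_of_optional_header;
}

/* p points at one 40-byte section header; header i is at
   section_table_offset(...) + i * SECTION_HEADER_SIZE. */
SectionInfo parse_section_header(const uint8_t *p) {
    SectionInfo s;
    memcpy(s.name, p, 8);
    s.name[8] = '\0';
    s.virtual_size        = rd32(p + 8);
    s.virtual_address     = rd32(p + 12);
    s.size_of_raw_data    = rd32(p + 16);
    s.pointer_to_raw_data = rd32(p + 20);
    s.characteristics     = rd32(p + 36);
    return s;
}

/* True if the section is mapped as executable memory. */
int section_is_executable(const SectionInfo *s) {
    return (s->characteristics & SCN_MEM_EXECUTE) != 0;
}
```

The 8-byte Name field is not NUL-terminated when the name is exactly 8 characters long, which is why the sketch copies it into a 9-byte buffer.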

Sections

Below are some common sections you will find in an executable file:-

  • .text:- contains the executable code. Some older linkers, such as Borland's, name this section CODE.
  • .data:- contains initialized data.
  • .reloc:- contains base relocation information.
  • .rsrc:- contains the resources of a module.
  • .debug:- contains debug information.
  • .edata, .idata:- contain export and import data, respectively.

