Mastering PE Structure for Malware Analysis: A Layman's Guide

In this article, we will look at the PE Structure or Portable Executable file format (PE File Format), which is important in understanding the internal structure of an executable file.

Once you have an overall idea of what’s inside an executable file and how Windows handles it, analyzing any executable becomes much easier as you advance along the malware analysis path.

Hopefully, by the end of this article you will see why I wrote it up and why the PE file format matters when analyzing any malware binary.

Also, I will try to keep this post as simple as possible, since I assume you are new to this exciting world of malware analysis, and I don’t want you to get overwhelmed.

Let’s have an in-depth look into the Win32 Portable Executable file format.

Table Of Contents

Introduction to PE Structure
What are the contents of the PE Structure?
DOS Header (IMAGE_DOS_HEADER)
DOS Stub Program
NT Header (IMAGE_NT_HEADERS)
Section Header Table
PE Sections
- The .edata and .idata Sections
Advanced PE Format Concepts and Malware Analysis
Conclusion
Frequently Asked Questions (FAQs)
References:

Introduction to PE Structure

The PE format is like a blueprint for Windows programs – it tells the system how to load and run an app, from where to start to what to use.

The PE format is derived from COFF (Common Object File Format), a format used for executables, object code, and shared libraries on Unix-like systems.

The PE file format is the COFF-derived format used today for executables, object code, DLLs, FON font files, and core dumps in 32-bit and 64-bit versions of Windows operating systems. Examples of PE file extensions include .exe, .dll, .scr, and .sys, to name just a few.

And if you ask me what’s on the plate for Linux: Linux uses the Executable and Linkable Format (ELF). Since I have dedicated this post to the Windows PE headers, I will discuss the ELF format in a later post.

The PE (Portable Executable) file format is a data structure that gives the Windows OS loader the information it needs to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data, and TLS data.

The data structures on disk are the same data structures used in the memory, and if you know how to find something in a PE file, you can almost certainly find the exact information after the file is loaded into the memory.

It is important to note that PE files are not just mapped into memory as a single memory-mapped file. Instead, the Win32 loader looks at the PE file and decides what portions of the file to map in.

A module in memory represents all the code, data, and resources from an executable file needed by a process.

Other parts of a PE file may be read but not mapped in (for instance, relocations). Some parts may not be mapped in at all, for example, when debug information is placed at the end of the file.

A field in the PE header tells the system how much memory needs to be set aside for mapping the executable into memory. Data that won’t be mapped in is placed at the end of the file, past any parts that will be mapped in.

Understanding the PE structure is fundamental for both static analysis and dynamic analysis in the realm of malware analysis. Malware often comes in the form of PE files, and analyzing their structure can reveal suspicious characteristics and behaviors.

What are the contents of the PE Structure?

A PE file is broken into chunks – headers, tables, and sections – that each tell Windows something important about how to run the program.

The PE data structures include the DOS Header, DOS Stub, PE File Header, Image Optional Header, Section Table, Data Directories, and Sections.

PE Structure — *Diagram illustrating the layout of a Portable Executable (PE) file.*

Let me explain these data structures with the help of an example. I am taking Calculator (calc.exe) as the example here and opening it in a hex editor (HxD).

*Screenshot of calc.exe opened in HxD hex editor.*

DOS Header (IMAGE_DOS_HEADER)

This is the first part of the file – starts with “MZ” and basically says, ‘Hey, there’s more to see here!’ It also points to where the real action starts.

The DOS Header occupies the first 64 bytes of the file, i.e., the first 4 rows of the hex editor as seen in the image below. At the very beginning of the file you will see the ASCII string “MZ”. It occupies the first two bytes of the DOS Header (4D 5A on disk), which, read as a little-endian WORD, give the value 0x5A4D.

This signature identifies the file as an MS-DOS-compatible executable, which every valid PE file must be. The IMAGE_DOS_HEADER structure looks like this:

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // Offset to the NT header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

MZ are the initials of Mark Zbikowski, one of the developers of MS-DOS. This field is called e_magic, or the magic number, and it identifies an MS-DOS-compatible file type. All MS-DOS-compatible executable files set this value to 0x5A4D, i.e., “MZ” in ASCII.

The final field, e_lfanew, sits at file offset 0x3C. It is a 4-byte offset (F0 00 00 00 on disk here, i.e., 0x000000F0) that tells where the PE Header is located. Check the PE header section below for this offset location.

Screenshot showing DOS Header (IMAGE_DOS_HEADER) of a PE File
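
To make the two fields that matter here concrete, below is a minimal Python sketch that checks e_magic and reads e_lfanew from raw bytes. The function name read_dos_header and the sample bytes are made up for illustration; only the offsets (0x00 for e_magic, 0x3C for e_lfanew) come from the structure above.

```python
import struct

def read_dos_header(data: bytes) -> int:
    """Check the MZ signature and return e_lfanew (offset of the PE header)."""
    if len(data) < 64:
        raise ValueError("too short for a 64-byte IMAGE_DOS_HEADER")
    # e_magic is the first WORD; bytes 4D 5A read little-endian give 0x5A4D
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != 0x5A4D:
        raise ValueError("missing MZ signature")
    # e_lfanew is the last field of the header: a 4-byte LONG at offset 0x3C
    (e_lfanew,) = struct.unpack_from("<l", data, 0x3C)
    return e_lfanew

# Hypothetical header: "MZ", 58 zero bytes, then e_lfanew = 0xF0
sample = b"MZ" + bytes(58) + struct.pack("<l", 0xF0)
print(hex(read_dos_header(sample)))  # 0xf0
```

On a real file you would pass the first 64 bytes read from disk instead of the hand-built sample.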

DOS Stub Program

A stub is a tiny program or a piece of code that is run by default when the execution of an application starts.

This stub prints out the message “This program cannot be run in DOS mode” when the program is started under MS-DOS rather than Windows.

So if we run a Win32-based program in an environment that doesn’t support Win32, we will get this informative error message.

In this case, the real-mode stub program is run by MS-DOS when the executable is loaded. When the Windows loader maps a PE file into memory, the first byte of the mapped file corresponds to the first byte of the MS-DOS header, which is followed by the stub.

Screenshot showing an internal structure of a DOS Stub

NT Header (IMAGE_NT_HEADERS)

This is where things get serious. The NT header (starts with “PE”) tells Windows, “Here comes the real program structure – pay attention!”

The PE header is located by looking at the e_lfanew field of the MS-DOS Header, which gives the file offset of the PE header. The name e_lfanew stands for the file address of the new .exe header. The main PE Header is a structure of type IMAGE_NT_HEADERS and contains three parts: Signature, IMAGE_FILE_HEADER, and IMAGE_OPTIONAL_HEADER.

SIGNATURE: This is a 4-byte DWORD signature. In this case, e_lfanew gives 0x000000F0 as the offset of the PE header, and at that offset we find the PE signature 50 45 00 00 (the letters “PE” followed by two terminating zeroes).
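
The lookup described above (follow e_lfanew, then compare four bytes) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: has_pe_signature is a hypothetical helper name, and the sample buffer is hand-built so that the signature lands at offset 0xF0 as in the calc.exe example.

```python
import struct

def has_pe_signature(data: bytes) -> bool:
    """Return True if the bytes at e_lfanew are 'PE' followed by two zero bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<l", data, 0x3C)
    if e_lfanew < 0:  # e_lfanew is a signed LONG; a negative offset is invalid
        return False
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

# Hypothetical file: DOS header with e_lfanew = 0xF0, padding, then the signature
sample = b"MZ" + bytes(58) + struct.pack("<l", 0xF0)
sample += bytes(0xF0 - len(sample)) + b"PE\x00\x00"
print(has_pe_signature(sample))  # True
```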

File Header (IMAGE_FILE_HEADER)

This short header acts like a label – it gives basic info like how many sections are coming up and what kind of file this is.

The file header is the next 20 bytes of the PE file and contains only the most basic information about the file’s layout.

The above-highlighted part of the image signifies the file header of any portable executable file.

The fields of the Image File Header are as follows:

typedef struct _IMAGE_FILE_HEADER {
  WORD  Machine;
  WORD  NumberOfSections;
  DWORD TimeDateStamp;
  DWORD PointerToSymbolTable;
  DWORD NumberOfSymbols;
  WORD  SizeOfOptionalHeader;
  WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

The most important struct members are:

Machine – This member specifies the target architecture of the image, e.g., x86 (0x014C), x64 (0x8664), or ARM64 (0xAA64). An image file can run only on the specified machine or on a system that emulates it.
NumberOfSections – Specifies the number of sections in a PE file and also indicates the size of the section table.
TimeDateStamp – Indicates when the file was created, stored as the number of seconds since January 1, 1970 (UTC). This can be useful in malware analysis, although it can be easily spoofed. Malware analysts may examine this timestamp to estimate the build time of the malware.
SizeOfOptionalHeader – The size of the IMAGE_OPTIONAL_HEADER structure that follows the file header.
Characteristics – Flags that specify certain attributes of the file, such as whether it is an executable image (IMAGE_FILE_EXECUTABLE_IMAGE, 0x0002) or a dynamic-link library (IMAGE_FILE_DLL, 0x2000). Whether a program is a console application is recorded in the Subsystem field of the Optional Header instead.
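
As a rough illustration of how these seven fields decode, here is a small Python sketch that unpacks the 20-byte IMAGE_FILE_HEADER. The helper name parse_file_header, the returned dictionary keys, and the sample values are my own assumptions for the example; the field order and sizes follow the struct above.

```python
import struct
from datetime import datetime, timezone

# WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD = 20 bytes, little-endian
FILE_HEADER = struct.Struct("<HHIIIHH")

MACHINES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000

def parse_file_header(raw: bytes) -> dict:
    """Decode the interesting members of a raw IMAGE_FILE_HEADER."""
    (machine, n_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = FILE_HEADER.unpack_from(raw, 0)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": n_sections,
        # TimeDateStamp is seconds since the Unix epoch; remember it can be spoofed
        "built": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "optional_header_size": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }

# Hypothetical header: x64 DLL with 6 sections
sample = FILE_HEADER.pack(0x8664, 6, 1_600_000_000, 0, 0, 0xF0, 0x2022)
info = parse_file_header(sample)
print(info["machine"], info["sections"], info["is_dll"])
```

In a real file these 20 bytes start right after the 4-byte PE signature, i.e., at e_lfanew + 4.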

Optional Header (IMAGE_OPTIONAL_HEADER)

Despite its name, it’s not optional at all for executables – it’s packed with critical info like where the program starts running and how much memory it’ll use. It is called optional because some file types, such as object files, do not have this header.

This header contains some critical information that is beyond the basic information contained in the IMAGE_FILE_HEADER data structure.

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

Some of the important fields that one should be aware of are listed below:

Magic: The magic field tells whether an executable image is 32-bit or 64-bit. The value is IMAGE_NT_OPTIONAL_HDR32_MAGIC (0x10b) for a 32-bit (PE32) image and IMAGE_NT_OPTIONAL_HDR64_MAGIC (0x20b) for a 64-bit (PE32+) image. The name IMAGE_NT_OPTIONAL_HDR_MAGIC resolves to one of these two depending on the build target.

AddressOfEntryPoint: It is the address where the Windows loader will begin execution. This holds the RVA (Relative Virtual Address) of the Entry Point (EP) of the module and is usually found in the .text section. For executables, this is the starting address. For device drivers, this is the address of the initialization function. The entry point function is optional for DLLs, and when no entry point is present, this member is zero.

The BaseOfCode and BaseOfData members hold the RVAs of the beginning of the code and data sections, respectively. BaseOfData exists only in the 32-bit (PE32) header.

ImageBase: It is the preferred address at which the executable file is memory-mapped. In Windows NT, the original default image base for an executable was 0x10000. In Windows 95, however, the address 0x10000 can’t be used to load 32-bit EXEs because it lies within the linear address region shared by all processes. Because of this, Microsoft changed the default base address for Win32 executables to 0x400000. So, the default is 0x400000 for applications and 0x10000000 for DLLs.

SectionAlignment and FileAlignment: These members give the alignment of the sections in memory and in the file, respectively. When an executable is mapped into memory, each section starts at a virtual address that is a multiple of SectionAlignment. In the file, the raw data of each section starts at an offset that is a multiple of FileAlignment.

SizeOfImage: The SizeOfImage member indicates the memory size occupied by the PE file at runtime. It has to be a multiple of the SectionAlignment value.
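
The "multiple of" rule is just rounding up, which a short worked example makes clearer. In this Python sketch, align_up is a hypothetical helper, and 0x1000 and 0x200 are assumed as typical SectionAlignment and FileAlignment values; a real file states its own values in the Optional Header.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

SECTION_ALIGNMENT = 0x1000  # assumed typical value (one memory page)
FILE_ALIGNMENT = 0x200      # assumed typical value (one disk sector)

# A section holding 0x1234 bytes of data:
in_memory = align_up(0x1234, SECTION_ALIGNMENT)  # space reserved when mapped
on_disk = align_up(0x1234, FILE_ALIGNMENT)       # space used in the file

print(hex(in_memory), hex(on_disk))  # 0x2000 0x1400
```

The same section therefore occupies different amounts of space on disk and in memory, which is why file offsets and RVAs drift apart as you move through the sections.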

Subsystem: This field identifies the target subsystem for an executable file, i.e., the type of subsystem an executable uses for its user interface. Each possible subsystem value is defined in the WINNT.H file with an IMAGE_SUBSYSTEM_ prefix. Common values include WINDOWS_GUI (2) for graphical applications and WINDOWS_CUI (3) for command-line (console) applications.

Identifier	Value	Meaning
UNKNOWN	0	Unknown subsystem
NATIVE	1	Image doesn’t require a subsystem
WINDOWS_GUI	2	Runs in the Windows GUI subsystem
WINDOWS_CUI	3	Runs in the Windows character subsystem
OS2_CUI	5	Runs in the OS/2 character subsystem
POSIX_CUI	7	Runs in the Posix character subsystem
WINDOWS_CE_GUI	9	Windows CE GUI subsystem
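
To tie the fields above together, here is a minimal Python sketch that pulls Magic, AddressOfEntryPoint, ImageBase, and Subsystem out of raw Optional Header bytes. The helper name parse_optional_header and the sample values are assumptions for illustration. The offsets are counted from the struct shown earlier; the 64-bit (PE32+) layout drops BaseOfData and widens ImageBase to 8 bytes, which is why the two branches differ.

```python
import struct

# Subset of the subsystem table above
SUBSYSTEMS = {1: "NATIVE", 2: "WINDOWS_GUI", 3: "WINDOWS_CUI"}

def parse_optional_header(opt: bytes) -> dict:
    """Pull a few key fields out of raw IMAGE_OPTIONAL_HEADER bytes."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    (entry_rva,) = struct.unpack_from("<I", opt, 16)       # AddressOfEntryPoint
    if magic == 0x10B:                                      # PE32
        (image_base,) = struct.unpack_from("<I", opt, 28)   # follows BaseOfData
        bits = 32
    elif magic == 0x20B:                                    # PE32+
        (image_base,) = struct.unpack_from("<Q", opt, 24)   # no BaseOfData field
        bits = 64
    else:
        raise ValueError(f"unexpected Magic {magic:#x}")
    (subsystem,) = struct.unpack_from("<H", opt, 68)        # same offset in both
    return {
        "bits": bits,
        "entry_rva": entry_rva,
        "image_base": image_base,
        "subsystem": SUBSYSTEMS.get(subsystem, str(subsystem)),
    }

# Hypothetical PE32 console program: entry point RVA 0x1000, base 0x400000
sample = bytearray(96)
struct.pack_into("<H", sample, 0, 0x10B)
struct.pack_into("<I", sample, 16, 0x1000)
struct.pack_into("<I", sample, 28, 0x400000)
struct.pack_into("<H", sample, 68, 3)
info = parse_optional_header(bytes(sample))
print(info)
```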

Data Directory

Think of Data Directory as a table of contents that points to other cool things – like imports, exports, and resources – inside the executable.

Finally, at the end of the IMAGE_OPTIONAL_HEADER structure is the so-called Data Directory, which is one of the most important members of the Optional Header. It is an array of IMAGE_DATA_DIRECTORY structures, each of which looks like this:

typedef struct _IMAGE_DATA_DIRECTORY {
  DWORD VirtualAddress;
  DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

The Data Directory member is a pointer to the first IMAGE_DATA_DIRECTORY structure.

IMAGE_DATA_DIRECTORY: The data directory field indicates where to find the other important components of executable information in the file. The structures of this field are located at the bottom of the optional header structure. The current PE file format defines 16 possible data directories; 15 of them have an assigned meaning and the last one is reserved.

A specific data directory can be accessed using its index in the array:


#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

A few important ones are the ExportTableAddress (table of exported functions), the ImportTableAddress (table of imported functions), and the ResourcesTable (table of resources such as images embedded in the PE), and the ImportAddressTable (IAT), which stores the runtime addresses of the imported functions.
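
Indexing into the array can be sketched as follows. This is a minimal Python illustration, not a full parser: parse_data_directories is a hypothetical helper, the sample header is hand-built, and the offsets of NumberOfRvaAndSizes (92 for PE32, 108 for PE32+) are counted from the Optional Header layout shown earlier.

```python
import struct

# Names in index order, taken from the IMAGE_DIRECTORY_ENTRY_* defines above
DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR",
]

def parse_data_directories(opt: bytes) -> dict:
    """Return {name: (rva, size)} for every non-empty data directory entry."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic not in (0x10B, 0x20B):
        raise ValueError(f"unexpected Magic {magic:#x}")
    # NumberOfRvaAndSizes sits right before the DataDirectory array
    count_off = 92 if magic == 0x10B else 108
    (count,) = struct.unpack_from("<I", opt, count_off)
    entries = {}
    for i in range(min(count, len(DIRECTORY_NAMES))):
        # Each IMAGE_DATA_DIRECTORY is two DWORDs: VirtualAddress, Size
        rva, size = struct.unpack_from("<II", opt, count_off + 4 + i * 8)
        if rva or size:
            entries[DIRECTORY_NAMES[i]] = (rva, size)
    return entries

# Hypothetical PE32 optional header with only IMPORT and TLS populated
sample = bytearray(96 + 16 * 8)
struct.pack_into("<H", sample, 0, 0x10B)
struct.pack_into("<I", sample, 92, 16)
struct.pack_into("<II", sample, 96 + 1 * 8, 0x2000, 0x50)
struct.pack_into("<II", sample, 96 + 9 * 8, 0x3000, 0x18)
dirs = parse_data_directories(bytes(sample))
print(dirs)
```

A populated TLS entry like the one in this sample is worth a second look during triage, since TLS callbacks run before the entry point.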

Let’s take a look at the main data directories below. The most important ones are the Export Directory and the Import Address Table.

Export Directory: Contains information about functions and data exported by this PE file (typically a DLL).
Import Directory: Contains information about the DLLs and functions imported by this PE file; each of its entries references the Import Address Table (IAT) slots for one DLL. Analyzing the imports is a common static analysis technique in malware analysis to understand the functionality of the executable by identifying the external libraries and APIs it uses. Malware may employ techniques to avoid import table analysis.
Resource Directory: Points to the resource section of the PE file, which contains things like icons, images, and other resources used by the application. Malware may hide malicious payloads within the resource section.
Base Relocation Table: Contains information about base relocations, which are necessary if the PE file cannot be loaded at its preferred ImageBase address.
TLS Table: Points to the Thread Local Storage (TLS) directory. TLS allows each thread in a multithreaded application to have its own private storage location for certain data. Malware can use TLS callbacks to execute code early in the process lifecycle, even before the main entry point, as an anti-debugging or anti-analysis technique.
Load Configuration Table: Contains load-time settings such as the stack security cookie and the list of legitimate structured exception handlers (SafeSEH). Note that the opt-in flags for Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) themselves live in the DllCharacteristics field of the Optional Header. Malware might attempt to bypass these security features.
Bound Import Table: Contains information about bound imports, where the importing executable already contains the actual addresses of functions in the exporting module. The table is used to confirm that these addresses are still valid.
Import Address Table: The import address table is a data structure in a PE file that contains the addresses of functions imported from other executable files. These addresses are used to access the functions and data in the other executables.
Delay Import Descriptor: Contains information for delay-load imports, where DLLs are loaded only when one of their functions is first called. Malware might use this to evade initial detection.

Each data directory entry specifies the size and relative virtual address of the directory. To locate a particular directory, we first need to determine the relative virtual address from the data directory array in the optional header. Then we have to use the virtual address to determine which section the directory is in.

Once we identify which section contains the directory, the section header for that section is then used to find the exact file offset location of the data directory.
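
The two-step lookup described above (find the section containing the RVA, then translate through that section's header) can be sketched like this. It is a simplified illustration under my own assumptions: rva_to_offset is a hypothetical helper, and the section tuples are invented values rather than anything read from calc.exe.

```python
def rva_to_offset(rva, sections):
    """Translate an RVA to a file offset.

    sections: iterable of (VirtualAddress, VirtualSize,
                           PointerToRawData, SizeOfRawData) tuples.
    Returns None if the RVA is outside every section or has no bytes on disk.
    """
    for vaddr, vsize, raw_ptr, raw_size in sections:
        span = vsize or raw_size          # some files leave VirtualSize at 0
        if vaddr <= rva < vaddr + span:
            delta = rva - vaddr           # distance from the section start
            if delta >= raw_size:         # exists only in memory (e.g. .bss-like)
                return None
            return raw_ptr + delta
    return None

# Hypothetical section table: .text at RVA 0x1000, .rdata at RVA 0x3000
sections = [
    (0x1000, 0x1800, 0x0400, 0x1800),
    (0x3000, 0x0600, 0x1C00, 0x0600),
]

# An import directory at RVA 0x3010 would live at this file offset:
print(hex(rva_to_offset(0x3010, sections)))  # 0x1c10
```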

The data directory is another important concept that one should be aware of. As I don’t want to make this post too lengthy, I have covered the Data Directory topic in a separate post mentioned below:

Read more: Journey Towards Import Address Table of an Executable

Section Header Table

Section Header Table is actually like a map – each entry tells where a section starts, how big it is, and what it’s used for.

The Section Header Table immediately follows the Optional Header. It is an array of IMAGE_SECTION_HEADER structures, where each structure contains information about one of the sections in the PE file. The number of entries in this table is indicated by the NumberOfSections field in the File Header.

The IMAGE_SECTION_HEADER structure looks like this:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];   // 8 bytes
    union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

The Section Header Table contains the following important fields:

Name: An 8-byte array containing the name of the section (e.g., “.text”, “.data”, “.rsrc”). While standard section names exist, malware may use unusual section names to hide malicious code or as an anti-analysis technique.
VirtualSize: The size of the section when loaded into memory.
VirtualAddress: The address of the first byte of the section when loaded into memory, relative to the ImageBase. This is also known as the Relative Virtual Address (RVA). Adding VirtualSize to VirtualAddress and rounding the result up to the next multiple of SectionAlignment normally gives the RVA of the next section.
SizeOfRawData: The size of the section’s data in the PE file on disk. A significant difference between VirtualSize and SizeOfRawData can be an indicator of a packed executable.
PointerToRawData: The file offset where the section’s raw data starts. Adding SizeOfRawData to it normally gives the file offset where the next section starts, since both values are multiples of FileAlignment.
PointerToRelocations: A file pointer to the beginning of relocation entries for the section. It’s set to 0 for executable files.
PointerToLinenumbers: A file pointer to the beginning of COFF line-number entries for the section. It’s set to 0 because COFF debugging information is deprecated.
NumberOfRelocations: The number of relocation entries for the section. It’s set to 0 for executable images.
NumberOfLinenumbers: The number of COFF line-number entries for the section. It’s set to 0 because COFF debugging information is deprecated.
Characteristics: This tells about the memory access rights for that section in memory, denoted as flags (R, RW, RWE, etc.). These flags describe the characteristics of the section, such as whether it contains code, initialized data, or read-only data, and the memory access permissions (read, write, execute).
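
Here is a small Python sketch of decoding one 40-byte section header, including turning the Characteristics flags into a permission string. The helper name parse_section and the sample values are assumptions for illustration; the three permission bits are the IMAGE_SCN_MEM_* values from WINNT.H. I write the execute flag as X here, which is the same thing the E in RWE refers to.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics = 40 bytes
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def parse_section(raw: bytes, offset: int = 0) -> dict:
    """Decode one IMAGE_SECTION_HEADER starting at offset."""
    (name, vsize, vaddr, raw_size, raw_ptr,
     _relocs, _lines, _nrelocs, _nlines, flags) = SECTION_HEADER.unpack_from(raw, offset)
    perms = "".join(
        letter for letter, bit in (("R", IMAGE_SCN_MEM_READ),
                                   ("W", IMAGE_SCN_MEM_WRITE),
                                   ("X", IMAGE_SCN_MEM_EXECUTE))
        if flags & bit
    )
    return {
        # Names are padded with zero bytes up to 8 characters
        "name": name.rstrip(b"\x00").decode("ascii", "replace"),
        "virtual_size": vsize,
        "rva": vaddr,
        "raw_size": raw_size,
        "file_offset": raw_ptr,
        "perms": perms,
    }

# Hypothetical .text header: code, readable and executable (0x60000020)
sample = SECTION_HEADER.pack(b".text", 0x1234, 0x1000, 0x1400, 0x400,
                             0, 0, 0, 0, 0x60000020)
sec = parse_section(sample)
print(sec["name"], sec["perms"])  # .text RX
```

A section that decodes to RWX is unusual for compiler output and is often worth checking for a packer or self-modifying code.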

PE Sections

These are the main parts of the executable – where code, data, and resources actually live and do their job.

PE sections contain the actual code and data of the executable program. Each section has a specific purpose.

PE file section headers also specify the section name using a simple character array field called Name.

Below are the various common sections that exist in almost every PE file.

.text (or .code): This is normally the first section and contains the executable code for the application. Inside this section is also an entry point of the application: the address of the first application instruction that will be executed. An application can have more than one section with the executable code.
.data: This section contains initialized data of an application, such as strings.
.rdata or .idata: Usually, these section names are used for the sections where the import table is located. This table lists the Windows API functions used by the application (along with the names of their associated DLLs). Using this, the Windows loader knows which functions to look up in which system DLLs in order to retrieve their addresses.
.rsrc: This is the common name for the resource-container section, which contains things like images used for the application’s UI.
.bss: Represents uninitialized data. This section doesn’t take up space in the file on disk; it’s only allocated memory when the program is loaded.
.reloc: Contains relocation information, which is used to adjust memory addresses if the executable is loaded at a different base address than its preferred one.
.pdata: Present in 64-bit executables, this section stores exception-handling information.
.tls: Contains Thread Local Storage (TLS) data.
.debug: contains debug information.

One thing to note is that the author can modify the names of these sections. The names above are only common conventions, so one shouldn’t assume that they will always be used with the same name or for the same purpose.

And because the Name array is only 8 bytes long, PE section names are limited to 8 ASCII characters. Each section also has its own characteristics (access permissions in memory).

For example, the .text section usually has read/execute access, the .data section read/write access, and the .rsrc section read-only access.

The .edata and .idata Sections

Besides the common sections mentioned above, a PE file may include other important sections, namely .edata and .idata, which contain the tables of exported and imported functions.

The export directory and import directory entries in the DataDirectory array (under the Optional Header) refer to these sections.

The .idata section specifies which functions and data the binary imports from shared libraries or DLLs, and the .edata section lists the functions, and their addresses, that a DLL exports for use by other binaries.

Advanced PE Format Concepts and Malware Analysis

Beyond the basic structure, several advanced PE format concepts are crucial for in-depth malware analysis:

Anti-Analysis Techniques: Malware employs various methods to hinder analysis. Besides packing and overlays, this includes techniques like manipulating the TimeDateStamp to mislead analysts, using TLS callbacks for early execution before a debugger can attach, and exploiting delay-load imports to hide malicious activity until it’s executed.

Relocations: This process adjusts memory addresses when a PE file is loaded into memory at a different base address than its preferred one. The .reloc section contains the necessary information for these adjustments. Malware might manipulate or strip relocation information as an anti-analysis technique.

Bound Imports: This technique involves pre-resolving import dependencies to speed up loading. The Bound Import Table in the Data Directory contains information about these pre-resolved imports.

Delay-Load Imports: These imports allow DLLs to be loaded only when they are first called, potentially evading initial detection. The Delay Import Descriptor in the Data Directory points to the delay-load information.

Thread Local Storage (TLS) Callbacks: These are functions executed on thread creation or destruction. Malware can use TLS callbacks for early execution, even before the main entry point, as an anti-debugging or anti-analysis technique. The TLS Table in the Data Directory points to the TLS data and callback addresses.

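Load Configuration Directory: This directory contains load-time security settings such as the stack security cookie and the table of legitimate exception handlers (SafeSEH). The DEP and ASLR opt-in flags are stored separately, in the DllCharacteristics field of the Optional Header. Malware might attempt to bypass these security features.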

PE Overlays: These are extra data appended to the end of a PE file, beyond the defined sections. This data is ignored by the loader but can be used by malware to hide additional malicious code, configuration data, or even entire executables, acting as an anti-analysis technique.
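
A quick way to spot an overlay is to compare the file size with the end of the last section's raw data. The Python sketch below shows the idea; overlay_offset is a hypothetical helper and the section values are invented. One caveat I am assuming you will handle separately: a digitally signed file stores its certificate data past the last section too, so not every overlay is suspicious.

```python
def overlay_offset(file_size, sections):
    """Return the file offset where overlay data begins, or None if there is none.

    sections: iterable of (PointerToRawData, SizeOfRawData) tuples.
    """
    # The last byte the loader cares about is the end of the furthest section
    end_of_sections = max(
        (raw_ptr + raw_size for raw_ptr, raw_size in sections if raw_size),
        default=0,
    )
    if end_of_sections and file_size > end_of_sections:
        return end_of_sections
    return None

# Hypothetical file: two sections ending at 0x2200, but 0x3000 bytes on disk
sections = [(0x0400, 0x1800), (0x1C00, 0x0600)]
start = overlay_offset(0x3000, sections)
print(hex(start), "overlay bytes:", 0x3000 - start)
```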

Packing and Unpacking: Malware often uses packing, a form of obfuscation where the original executable code is compressed or encrypted and wrapped with an unpacking stub. This makes static analysis difficult as the main malicious code is hidden. Identifying and unpacking malware is a key skill in malware analysis, often involving tools like PEiD, Exeinfo PE, and debuggers.

Conclusion

PE structure might sound scary at first, but it’s just a smart way Windows keeps things organized – and attackers know it too.

The PE format is one of the most important concepts in understanding any malware binary. It provides a wealth of information about an executable, such as whether it’s for a 32-bit or 64-bit machine and whether it’s a DLL or another file type. Hopefully, this article has given you a basic understanding of the PE file format and what to look for when you start analyzing an executable.

Understanding the PE structure is a fundamental step in both static and dynamic malware analysis.

Recognizing advanced concepts like relocations, bound imports, delay-load imports, TLS callbacks, the Load Configuration Directory, PE overlays, and packing/unpacking techniques, as well as common anti-analysis techniques, will significantly enhance your ability to analyze and understand malicious software.

Frequently Asked Questions (FAQs)

What does a PE file start with?

A PE file starts with a header that contains information about the file, such as its size, type, and entry point. Specifically, it begins with the “MZ” signature in the DOS Header.

What is the difference between EXE and PE?

EXE is a file extension for an executable file, and PE is the format of that executable file in Windows.

What is the signature of a PE file?

The signature of a PE file is “PE\0\0” located in the NT Header, following the DOS Header and DOS Stub.

Can section names always be trusted?

No, section names cannot always be trusted. While there are common conventions (e.g., .text for code, .data for data), malware authors can (and frequently do) use misleading or non-standard section names to hide malicious code or data. For example, executable code might be placed in a section named .data or a custom name. Always check the characteristics of a section (e.g., if it’s marked as executable) in the Section Header, rather than relying solely on its name.

Why is understanding PE structure important for malware analysis?

Understanding the PE file format is crucial for malware analysis because nearly all Windows-based malware comes in this format. By dissecting its structure, analysts can uncover hidden malicious code, identify suspicious behaviors (like unusual imports or manipulated headers), and understand how malware operates, evades detection, or persists on a system. This knowledge is fundamental for both static and dynamic analysis techniques.

How can malware use PE overlays?

A PE overlay refers to extra data appended to the end of a PE file, beyond its defined sections. This data is typically ignored by the Windows loader during execution. Malware authors exploit this by hiding additional malicious code, configuration data, or even entire executables within the overlay. This technique can help malware evade detection by security tools that only scan the officially defined sections of a PE file or by inflating the file size to bypass upload limits of automated analysis sandboxes.

How can I tell if a PE file is packed?

Packed executables are a common anti-analysis technique used by malware. You can often identify them by looking for:

High Entropy: Sections containing compressed or encrypted data will show high randomness (entropy).

Unusual Section Sizes: A significant difference between VirtualSize (size in memory) and SizeOfRawData (size on disk) for a section can indicate packing.

Few Recognizable Strings: Malware often encrypts strings to hide its functionality.

Minimal Imports: Packed executables might only import a few functions like LoadLibrary and GetProcAddress to dynamically resolve other APIs at runtime.

Specific Packer Signatures: Tools like PEiD or Exeinfo PE can often identify known packers by their unique signatures.
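
The entropy check from the list above is easy to compute yourself. Here is a minimal Python sketch of Shannon entropy over bytes; the function name is my own, and any threshold you pick for "high" is a rule of thumb rather than something defined by the PE format. The result ranges from 0.0 (all bytes identical) to 8.0 (every byte value equally likely), and compressed or encrypted data sits close to the top of that range.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over every byte value that occurs
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())

low = shannon_entropy(bytes(1024))        # one repeated byte value
high = shannon_entropy(bytes(range(256))) # every byte value exactly once
print(low, high)  # 0.0 8.0
```

In practice you would run this over each section's raw data (PointerToRawData, SizeOfRawData) and compare sections against each other.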

How do I locate a specific section in a PE file?

To locate a specific section, you typically follow these steps:

Find NumberOfSections: This field in the IMAGE_FILE_HEADER tells you how many sections are present in the PE file.

Navigate to Section Header Table: This table immediately follows the Optional Header. Each entry in this table is an IMAGE_SECTION_HEADER structure.

Read Section Headers: Iterate through each IMAGE_SECTION_HEADER to find the section you’re interested in. Key fields to look at are:

Name: The section’s name (e.g., “.text”, “.data”, “.rsrc”).

VirtualAddress (RVA): The offset of the section in memory relative to the ImageBase.

PointerToRawData: The file offset where the section’s raw data begins on disk.

SizeOfRawData: The size of the section’s data on disk.

Tools: You can use hex editors like HxD to manually navigate to offsets, or specialized PE analysis tools like PEview, PE-bear, or CFF Explorer which parse and display section information in a more readable format.
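
If you prefer to script the steps above instead of walking them in a hex editor, they can be sketched roughly like this in Python. The helper find_section and the tiny hand-built sample are assumptions for illustration (the sample contains only the bytes this walk needs, not a loadable program). The arithmetic relies on the layout covered earlier: a 4-byte signature, a 20-byte File Header, the Optional Header, and then 40-byte section headers.

```python
import struct

def find_section(data: bytes, wanted: str):
    """Walk the section table and return location info for the named section."""
    (e_lfanew,) = struct.unpack_from("<l", data, 0x3C)
    # File Header starts after the 4-byte "PE\0\0" signature
    (n_sections,) = struct.unpack_from("<H", data, e_lfanew + 4 + 2)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 4 + 16)
    # The section table immediately follows the Optional Header
    table = e_lfanew + 4 + 20 + opt_size
    for i in range(n_sections):
        off = table + i * 40
        name = data[off:off + 8].rstrip(b"\x00").decode("ascii", "replace")
        if name == wanted:
            # VirtualAddress, SizeOfRawData, PointerToRawData at +12, +16, +20
            rva, raw_size, raw_ptr = struct.unpack_from("<III", data, off + 12)
            return {"rva": rva, "raw_size": raw_size, "file_offset": raw_ptr}
    return None

# Hypothetical minimal layout: e_lfanew = 0x40, PE32-sized optional header (0xE0)
buf = bytearray(0x40 + 24 + 0xE0 + 2 * 40)
buf[0:2] = b"MZ"
struct.pack_into("<l", buf, 0x3C, 0x40)
buf[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<H", buf, 0x40 + 6, 2)       # NumberOfSections
struct.pack_into("<H", buf, 0x40 + 20, 0xE0)   # SizeOfOptionalHeader
table = 0x40 + 24 + 0xE0
struct.pack_into("<8sIIII", buf, table, b".text", 0x1234, 0x1000, 0x1400, 0x400)
struct.pack_into("<8sIIII", buf, table + 40, b".rsrc", 0x500, 0x3000, 0x600, 0x1800)

found = find_section(bytes(buf), ".rsrc")
print(found)
```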

What is the difference between Virtual Address (VA) and Relative Virtual Address (RVA)?

Virtual Address (VA): This is the actual memory address where an item (like code or data) resides once the PE file is loaded into a process’s memory. Each process has its own distinct virtual address space.

Relative Virtual Address (RVA): This is an offset from the base address at which the image is loaded. The ImageBase field holds the preferred base, but the actual base can be randomized by ASLR (Address Space Layout Randomization) at runtime, which is why the PE file structure on disk specifies locations as RVAs. The formula is VA = load base + RVA, where the load base equals ImageBase when the image loads at its preferred address.
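
As a tiny worked example of the formula, here is a Python sketch with two hypothetical helpers, rva_to_va and va_to_rva. The relocated base address used below is made up to stand in for whatever ASLR picks on a given run.

```python
def rva_to_va(rva: int, load_base: int) -> int:
    """VA = load base + RVA."""
    return load_base + rva

def va_to_rva(va: int, load_base: int) -> int:
    """Inverse: subtract the load base from a virtual address."""
    return va - load_base

PREFERRED_BASE = 0x400000   # default ImageBase for a 32-bit application
RELOCATED_BASE = 0xA30000   # hypothetical base chosen by ASLR

entry_rva = 0x1000
print(hex(rva_to_va(entry_rva, PREFERRED_BASE)))  # 0x401000
print(hex(rva_to_va(entry_rva, RELOCATED_BASE)))  # 0xa31000
```

The RVA stays the same in both cases, which is why addresses you note during static analysis need the current load base added before you set a breakpoint in a debugger.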

References:

https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

Last Updated: May 24, 2025

3 comments

Pablo Caballero says:
July 11, 2023 at 10:50 pm
hI Satyajit! thank you so much for writing this article (sorry, but it seems I can only write in uppercase).
Do you know what sections are read while computing the signature on a Windows executable?
thank you so much!
best!
pablo
Satyajit Daulaguphu says:
July 15, 2023 at 2:54 pm
Hi Pablo,
Thanks and Yes, I just noticed… I can also type in Uppercase only. But system has already posted your comment as small case later. I’ll look into this.
Well, to answer your query, I am not sure how it works exactly but i think that during signature computation, specific sections or portions of an executable file which I had mentioned earlier like code section, data section, imports/exports and resources are included in the hashing process. And once the hash value is computed it is encrypted with the private key to create a digital signature. Finally this signature is then embedded within the executable file.
Hope I answered your query.
