Before we move on to the PE file format, which describes the internal structure of all Windows executable files, you should know the concepts of Virtual Address (VA), Relative Virtual Address (RVA) and file offset, as these are the foundation for understanding the technical parts of the PE file format.
This post is a building block for the later posts on the PE file format. I highly recommend reading it before proceeding to the more advanced topic of exploring the internals of a Windows executable file.
Understanding how VA, RVA, and file offset are interconnected, and how to calculate one from the other, is critical. We have to deal with these conversions very often, including in every reversing challenge that we will take up later.
Virtual Address (VA)
Applications do not directly access physical memory; they only access virtual memory. In other words, Virtual Addresses (VAs) are the memory addresses that an application references.
Virtualizing access to memory provides flexibility in the way applications use available physical memory. In fact, an application doesn’t have to occupy a contiguous piece of physical memory; it can be broken down into parts, without the application even needing to know about it.
Relative Virtual Address (RVA)
A Relative Virtual Address (RVA from here on) is the difference between two Virtual Addresses (VAs): the address of an item once it is loaded into memory and the base address of the image. The Virtual Address is the full address in memory, whereas the RVA is that address expressed relative to the ImageBase. ImageBase here means the base address at which the executable file is loaded into memory.
We can calculate RVA with the help of the following formula:
RVA = VA – ImageBase
Here is an example to make this concrete:
An application is loaded into memory with a base address of 0x400000, and the VA of interest is 0x401000. So the RVA is calculated as:
Virtual Address = 0x00401000
ImageBase = 0x00400000
RVA = 0x00001000
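The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula above; the function names `va_to_rva` and `rva_to_va` are made up for this example and do not come from any library.

```python
def va_to_rva(va: int, image_base: int) -> int:
    """Return the RVA of a virtual address: RVA = VA - ImageBase."""
    if va < image_base:
        raise ValueError("VA lies below the image base")
    return va - image_base


def rva_to_va(rva: int, image_base: int) -> int:
    """Inverse conversion: VA = ImageBase + RVA."""
    return image_base + rva


# Values taken from the example above.
image_base = 0x00400000
va = 0x00401000

rva = va_to_rva(va, image_base)
print(f"RVA = 0x{rva:08X}")  # RVA = 0x00001000
```

Going the other way, `rva_to_va(0x1000, 0x400000)` gives back 0x401000.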
File Offsets
When we talk about offsets, we usually refer to a position in physical memory, in a physical file on disk, or more generally in any data that we treat as raw bytes.
A file offset is a location within a particular file. Put simply, it is the distance from a starting point, which is either the start of the file or the start of a memory region. The offset is added to the base to determine the actual location.
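As a small illustration of "base plus offset", the sketch below reads one byte at a given offset. It uses an in-memory `io.BytesIO` buffer as a stand-in for a file on disk, and the 16-byte content is invented for the example so that each byte's value equals its position.

```python
import io

# Stand-in for a file on disk: 16 bytes whose value equals their position.
data = bytes(range(16))
f = io.BytesIO(data)

base = 0       # start of the file
offset = 0x0A  # distance from the start, in bytes

# Seeking to base + offset positions us at the byte we want.
f.seek(base + offset)
value = f.read(1)[0]
print(f"byte at offset 0x{offset:02X} = 0x{value:02X}")  # byte at offset 0x0A = 0x0A
```

With a real file, the same `seek` and `read` calls would work on an object returned by `open(path, "rb")`.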
Suppose we have to calculate the file offset of the entry point in a PE file. Consider the tables below, which show the important fields of the PE optional header and the section headers for a particular application.
Optional Header

Number of Sections | Section Alignment | Address of Entry Point | File Alignment | Image Base
---|---|---|---|---
06 | 00001000 | 0000739D | 00000200 | 01000000

Section Headers

Section Name | Virtual Size | Virtual Address | Size of Raw Data | Pointer to Raw Data | Characteristics
---|---|---|---|---|---
.text | 00007748 | 00001000 | 00007800 | 00000400 | 60000020
.data | 00001BA8 | 00009000 | 00000800 | 00007C00 | C0000040
.rsrc | 00008958 | 0000B000 | 00008A00 | 00008400 | 40000040
The steps to calculate the file offset at which execution starts are as follows:
- First, determine the Address of entry point from the field under Optional Header.
- Next, check in which section’s virtual space the address of entry point lies.
- Once the right section header is determined, make a note of its virtual address and pointer to raw data fields.
- Now, calculate the difference between the address of entry point and the virtual address of the earlier identified section in which the entry point lies.
- Finally, add the difference to the pointer to the raw data which will give the file-based execution start offset of that file.
In short, the formula for calculating the file offset of the execution start is:
Offset of entry point in EXE file = (AddressOfEntryPoint – .section[VirtualAddress]) + .section[PointerToRawData]
In this case, the address of entry point (0x0000739D) lies in the .text section, because .text starts at 0x00001000 and, with a virtual size of 0x00007748, ends at 0x00008748 (0x00001000 + 0x00007748).
So, the file offset for the execution start is:
(0x0000739D – 0x00001000) + 0x00000400 = 0x0000679D
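The whole procedure can be sketched in Python using the values from the section table above. This is a minimal illustration, not a PE parser: the `Section` type and the helper names `find_section` and `rva_to_file_offset` are hypothetical, and the sketch assumes a section covers the range from its Virtual Address up to Virtual Address + Virtual Size.

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int


# The three section headers listed in the table above.
SECTIONS = [
    Section(".text", 0x7748, 0x1000, 0x7800, 0x0400),
    Section(".data", 0x1BA8, 0x9000, 0x0800, 0x7C00),
    Section(".rsrc", 0x8958, 0xB000, 0x8A00, 0x8400),
]


def find_section(rva: int, sections: List[Section]) -> Optional[Section]:
    """Return the section whose virtual range contains the RVA, if any.

    Assumption: the range is [VirtualAddress, VirtualAddress + VirtualSize).
    """
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return s
    return None


def rva_to_file_offset(rva: int, sections: List[Section]) -> int:
    """(RVA - section.VirtualAddress) + section.PointerToRawData."""
    s = find_section(rva, sections)
    if s is None:
        raise ValueError(f"RVA 0x{rva:X} is not inside any listed section")
    return (rva - s.virtual_address) + s.pointer_to_raw_data


entry_point_rva = 0x739D  # Address of Entry Point from the optional header
offset = rva_to_file_offset(entry_point_rva, SECTIONS)
print(f"Entry point file offset = 0x{offset:08X}")  # Entry point file offset = 0x0000679D
```

The lookup returns `None` for an RVA such as 0x8800, which falls in the gap between the end of .text (0x8748) and the start of .data (0x9000).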
7 comments
Thanks for the easy explanation… Keep up the good work… This helped…
Thanks a ton. Glad to know that it helped.
Please correct me if I am wrong.
In the table you have mentioned the virtual address of .text section as 0x1000 and virtual size as 0x7748. But then you have mentioned that the section ends at 0x7748. Shouldn’t it end at 0x8748 or 0x8747?
Apart from that doubt this article was to the point and easy to follow. Thanks.
Hi @thanursanmahesan,
Let me clear that doubt for you. Every executable file contains a few sections, and every section has a VirtualAddress and a VirtualSize. Here, the VirtualAddress of the .text section denotes the address of the first byte of the section when loaded in memory, which is 0x1000 (hex), and the VirtualSize denotes the total size (in bytes) of that section in memory, which in our example table is 0x7748, or 30536 bytes. Consider this VirtualSize as a boundary for that section. Hence, the .text section ends at 0x7748. Hope this made it clear.
Got it. 🙂
I think you are wrong. The .text section is 30536 (decimal) bytes, which is 0x7748 in hex, and this section starts at 0x1000, so it will end at 0x1000 + 0x7748 = 0x8748.
Neat, that was helpful. Could you explain why the virtual size is lower than the raw size?