2008-01-14

Rich Header

The Rich header is the bit of data in a Portable Executable (PE) file[1] that follows the DOS stub but precedes the actual PE header. Its format is only documented in a handful of places on the searchable net, and that documentation isn't terribly in-depth.[2] This post tries to shed some light on the Rich header, but I fear it poses more questions than it answers.

While looking into the format of the PE header, I became curious about an odd collection of consistent data in PE files produced by Visual Studio. It always began with the word Rich, and it wasn't in any of the documents made public by Microsoft. After spending some quality time with Google, I found a bunch of references to the Rich header, but none of them described it in depth. Most just said it wasn't used by Windows and could be safely removed to save a few bytes. So I gave up.

Later on, I was reading something about anti-disassembling that discussed some fun modifications that one could make to a PE file. I don't recall where I saw this document[3], as it was quite some time ago, but it basically dealt with crashing various debuggers and disassemblers. At one point, this document talked about how the Rich header had been used in some court case to attribute a virus to its creator. That was all the document said, and THAT got my attention.

Some additional quality time with Google eventually led me to the forum posting[2] by lifewire, who I'm assuming is some Russian hacker or something. It gave, with a couple of confusing typos, a little more information about the Rich header. In fact, most of what is described here can be ascertained by reading his rich.txt file. (I'd post that file, but I haven't figured out how to put up attachments.) Unfortunately, I couldn't get the details through my thick skull without taking his document and firing up a free copy of IDA Pro[4] to disassemble the link.exe file provided in Microsoft Visual Studio .NET 2003.

Using IDA Pro and frequently referring to lifewire's rich.txt file, I learned that the Rich header consists of, in order: a marker, a checksum repeated three times, some encoded values, the word Rich in ASCII, and finally the checksum value again.

The encoded values are "comp.id" fields from Visual Studio (compids from here on). I don't entirely understand them, but they're really key in making the Rich header forensically useful. For every library file that is linked against, for every object file that is compiled in, and for every tool in Visual Studio that touches data in the build process, there is a compid that is recorded.[5] Some of the Visual Studio tools have an associated compid. Each compid is stored along with an occurrence count, so the encoded data in the Rich header is a series of pairs: the compid XORed with the checksum, followed by the number of times that compid is referenced, also XORed with the checksum.
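To make the encoding concrete, here is a minimal Python sketch of undoing it. It assumes that both values in a pair are 32-bit little-endian integers (the checksum is 32 bits, and PE files are little-endian, but I haven't confirmed the width of the count). The name decode_entries and the sample numbers are made up for illustration.

```python
import struct

def decode_entries(encoded: bytes, checksum: int):
    """Undo the XOR on a run of (compid, count) pairs.

    Assumes each pair is two 32-bit little-endian values.
    """
    entries = []
    # Step through the data 8 bytes (one pair) at a time,
    # ignoring any trailing partial pair.
    for offset in range(0, len(encoded) - 7, 8):
        enc_compid, enc_count = struct.unpack_from("<II", encoded, offset)
        entries.append((enc_compid ^ checksum, enc_count ^ checksum))
    return entries

# Made-up values: one compid that was referenced 5 times.
checksum = 0x11223344
raw = struct.pack("<II", 0x00010203 ^ checksum, 5 ^ checksum)
print(decode_entries(raw, checksum))
```

Since XOR is its own inverse, the same function would also encode a list of pairs if you packed its output back into bytes.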

The checksum itself is computed by iterating over every byte of the DOS header, skipping the e_lfanew field: each byte is copied into a 32-bit field, the field is rotated left by the byte's offset in the PE file, and the result is added to a running sum. But that's not all. Next, for every compid in the list, the value resulting from compid XOR occurrence count is added to that 32-bit sum.
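Here is a rough Python sketch that follows the description above literally. Several things in it are my assumptions rather than anything confirmed: the starting value of the sum (the seed parameter, defaulting to 0), the rotation count wrapping modulo 32, and the sum wrapping at 32 bits. The step that folds in each compid is passed in as a function (combine, a hypothetical name) so it is easy to swap if disassembly shows something different from compid XOR count.

```python
MASK32 = 0xFFFFFFFF
E_LFANEW = range(0x3C, 0x40)  # the 4-byte e_lfanew field at offset 0x3C

def rol32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left; the shift wraps modulo 32 (assumption)."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32

def rich_checksum(dos_header: bytes, entries, seed: int = 0,
                  combine=lambda compid, count: compid ^ count) -> int:
    """Checksum as described in the text; seed and combine are assumptions."""
    total = seed
    # First pass: every byte of the header except e_lfanew,
    # rotated left by its own offset.
    for offset, byte in enumerate(dos_header):
        if offset in E_LFANEW:
            continue
        total = (total + rol32(byte, offset)) & MASK32
    # Second pass: fold in each (compid, count) pair.
    for compid, count in entries:
        total = (total + combine(compid, count)) & MASK32
    return total

# Tiny made-up input: 1 rotated by 0, 1 rotated by 1, then 6 XOR 3.
print(hex(rich_checksum(bytes([1, 1]), [(6, 3)])))  # 0x8
```

Comparing the result against the checksum stored in a real file is the obvious way to find out which of these assumptions hold.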

The last detail I have is that the marker at the beginning of the Rich header is the computed checksum XORed with 0x536e6144. (Note that it's DanS in ASCII; another name to put with MZ and Rich.)
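The marker also gives a way to find where the header starts. The sketch below is one way to do it under a few assumptions: the first occurrence of the bytes "Rich" in the file is the real one, the checksum is the 32-bit little-endian value right after it, and the marker is found by a plain byte search with no alignment check. The function name find_rich_header is mine.

```python
import struct

DANS = 0x536E6144  # reads "DanS" when stored little-endian

def find_rich_header(data: bytes):
    """Return (start_offset, checksum) of the Rich header, or None."""
    rich_at = data.find(b"Rich")
    if rich_at < 0 or rich_at + 8 > len(data):
        return None
    # The checksum follows the word Rich.
    (checksum,) = struct.unpack_from("<I", data, rich_at + 4)
    # The header starts with checksum XOR "DanS"; look for it before Rich.
    marker = struct.pack("<I", checksum ^ DANS)
    start = data.rfind(marker, 0, rich_at)
    if start < 0:
        return None
    return start, checksum
```

Given the layout described earlier, the encoded compid pairs would then sit between start + 16 (the marker plus three copies of the checksum) and the offset of the word Rich.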

So what does the Rich header tell us about the executable? It can tell us how many different object files, libraries, and tools were used to create the executable. What additional information could it tell us? If one were to create a table of compids and the corresponding components in Microsoft's tools, the Rich header would tell us what tools were used to create that executable. Additionally, by verifying the checksum, one can make guesses about whether the PE file was modified. I suspect that a table of the components used to build the PE might provide enough detail to work out whether the executable itself was modified manually, by testing whether the object files and tools specified by the compids actually left a mark on the rest of the PE file.

In closing, here's some Python (forgive its rough edges, I'm new to Python) that parses compids from PE files: rich.py via GoogleDocs

EDIT: here's a revised one via pastebin: rich.py

Sources:
[1] PE/COFF Format Spec
[2] Forum Posting of rich.txt by lifewire
[3] If anyone recognizes this description, please post a comment!
[4] IDA Pro Freeware Download
[5] This was speculation incorrectly included as though it were fact -- and it's false

2 comments:

Blah said...

Your python code is broken because it is incorrectly linewrapped, probably by stupid GoogleDocs.

stephen said...

Hey, you're right. Not every line was wrapped though, so I'm not sure what the problem was here.

Fixed and uploaded. Try it now.