IDA – New Code Signature Plugin

When reversing embedded code, you will often find that completely different devices are built around a common code base, whether through code reuse by the vendor or through the use of third-party software. This is especially true of devices running the same real-time operating system (RTOS).

For example, I have two different routers, manufactured by two different vendors, and released about four years apart. Both devices run VxWorks, but the firmware for the older device included a symbol table, making it trivial to identify most of the original function names:

VxWorks Symbol Table

The older device with the symbol table is running VxWorks 5.5, while the newer device (with no symbol table) runs VxWorks 5.5.1, so they are pretty close in terms of their OS version. However, even simple functions contain a very different sequence of instructions when compared between the two firmwares:

strcpy from the VxWorks 5.5 firmware

strcpy from the VxWorks 5.5.1 firmware

Of course, binary variations can be the result of any number of things, including differences in the compiler version and changes to the build options.

Despite this, it would still be quite useful to take the known symbol names from the older device, particularly those of standard and common subroutines, and apply them to the newer device in order to facilitate the reversing of higher level functionality.

Existing Solutions

The IDB_2_PAT plugin will generate FLIRT signatures from the IDB with a symbol table; IDA’s FLIRT analysis can then be used to identify functions in the newer, symbol-less IDB:

Functions identified by IDA FLIRT analysis

With the FLIRT signatures, IDA was able to identify 164 functions, some of which, like os_memcpy and udp_cksum, are quite useful.

Of course, FLIRT signatures will only identify functions that start with the same sequence of instructions, and many of the standard POSIX functions, such as printf and strcmp, were not found.

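Because FLIRT signatures match primarily on the first 32 bytes of a function (backed only by a CRC16 over a limited run of the bytes that follow), there are also many signature collisions between similar functions, which can be problematic. Each entry below gives the function name, the CRC length, the CRC16, and the leading pattern bytes: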

;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules

div_r                                               54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF
ldiv_r                                              54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF

proc_sname                                          00 0000 0000102127BDFEF803E0000827BD0108................................
proc_file                                           00 0000 0000102127BDFEF803E0000827BD0108................................

atoi                                                00 0000 000028250809F52A2406000A........................................
atol                                                00 0000 000028250809F52A2406000A........................................

PinChecksum                                         FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum1                                       FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum2                                       FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD

_d_cmp                                              FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF
_d_cmpe                                             FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF

_f_cmp                                              A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824
_f_cmpe                                             A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824

m_get                                               00 0000 00803021000610423C04803D8C8494F0................................
m_gethdr                                            00 0000 00803021000610423C04803D8C8494F0................................
m_getclr                                            00 0000 00803021000610423C04803D8C8494F0................................
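To get a feel for how many names are lost to collisions, the listing can be grouped by its signature fields. The following is only a rough sketch: the `group_collisions` helper is hypothetical, and it assumes each entry has exactly four whitespace-separated fields (name, CRC length, CRC16, pattern), as in the lines shown above.

```python
from collections import defaultdict

# Sample entries copied from the collision listing above.
LISTING = """\
; add '+' at the start of a line to select a module
atoi      00 0000 000028250809F52A2406000A........................................
atol      00 0000 000028250809F52A2406000A........................................
m_get     00 0000 00803021000610423C04803D8C8494F0................................
m_gethdr  00 0000 00803021000610423C04803D8C8494F0................................
m_getclr  00 0000 00803021000610423C04803D8C8494F0................................
"""


def group_collisions(text):
    """Map each (crc_len, crc, pattern) key to the names that share it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and sigmake comment lines
        fields = line.split()
        if len(fields) != 4:
            continue  # not an entry in the assumed four-field layout
        name, crc_len, crc, pattern = fields
        groups[(crc_len, crc, pattern)].append(name)
    # Only keys shared by two or more names are collisions.
    return {key: names for key, names in groups.items() if len(names) > 1}


collisions = group_collisions(LISTING)
for names in collisions.values():
    print(" / ".join(names))
```

Every group printed here is a set of functions that FLIRT cannot tell apart from its signature alone, so at most one name per group can be kept.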


Alternative Signature Approaches

Examining the functions in the two VxWorks firmwares shows that a small fraction (about 3%) of unique subroutines are identical in both firmware images:

bcopy from the VxWorks 5.5 firmware

bcopy from the VxWorks 5.5.1 firmware

Signatures can be created over the entirety of these functions in order to generate more accurate fingerprints, without the possibility of collisions due to similar or identical function prologues in unrelated subroutines.
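One way to picture a whole-function signature is a hash over every byte of the function, kept only when that hash is unique within the image. This is a minimal sketch under stated assumptions, not the plugin's actual code: SHA-256 is an arbitrary choice, the helper names are hypothetical, and the byte strings are made-up placeholders rather than the real bcopy.

```python
import hashlib
from collections import defaultdict


def _by_digest(functions):
    """Group function identifiers by the SHA-256 of their full byte content."""
    groups = defaultdict(list)
    for ident, code in functions.items():
        groups[hashlib.sha256(code).hexdigest()].append(ident)
    return groups


def exact_signatures(named):
    """Return {digest: name}, dropping digests shared by several functions."""
    return {d: ids[0] for d, ids in _by_digest(named).items() if len(ids) == 1}


def apply_exact(signatures, unnamed):
    """Return {address: name} for unnamed functions with a known, unique digest."""
    return {
        ids[0]: signatures[d]
        for d, ids in _by_digest(unnamed).items()
        if len(ids) == 1 and d in signatures
    }


# Placeholder bytes for illustration only (not taken from either firmware).
BCOPY_BYTES = bytes.fromhex("0085102b104000030000000003e0000800000000")
STUB_BYTES = bytes.fromhex("03e0000800000000")  # MIPS "jr $ra; nop"

named = {"bcopy": BCOPY_BYTES, "stub_a": STUB_BYTES, "stub_b": STUB_BYTES}
unnamed = {0x80001000: BCOPY_BYTES, 0x80002000: bytes.fromhex("0000000000000000")}

signatures = exact_signatures(named)
matches = apply_exact(signatures, unnamed)
print(matches)
```

The two identical stubs are deliberately excluded from the signature set, since a hash that maps to more than one name cannot identify anything. Note that this only works for functions whose bytes do not embed image-specific absolute addresses.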

Still other functions are very nearly identical, such as the following pair, which differs by only a couple of instructions:

A function from the VxWorks 5.5 firmware

The same function, in the VxWorks 5.5.1 firmware

A simple way to identify these similar, but not identical, functions in an architecture-independent manner is to generate “fuzzy” signatures based only on easily identifiable actions, such as memory accesses, references to constant values, and function calls.

In the above function, for example, there are six code blocks: one that references the immediate value 0xFFFFFFFF, one that has a single function call, and one that contains two function calls. As long as no other function matches this “fuzzy” signature, we can use these unique metrics to identify the same function in other IDBs. Although this type of matching can catch functions that would otherwise go unidentified, it also has a higher propensity for false positives.
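The fuzzy signature described above can be sketched as a per-block feature list that ignores the actual instructions. Everything here is an assumption for illustration: the `Block` type, the choice to sort blocks so that block order does not matter, and the function name `example_func` are all hypothetical, and a real plugin would extract these features from the disassembly rather than from hand-written data.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Block(NamedTuple):
    """Features of one code block: referenced immediates and number of calls."""
    immediates: Tuple[int, ...] = ()
    calls: int = 0


def fuzzy_signature(blocks):
    """Order-independent summary of a function's per-block features."""
    return tuple(sorted((tuple(sorted(b.immediates)), b.calls) for b in blocks))


def _by_signature(functions):
    groups = defaultdict(list)
    for ident, blocks in functions.items():
        groups[fuzzy_signature(blocks)].append(ident)
    return groups


def match_fuzzy(named, unnamed):
    """Return {address: name} where the signature is unique on both sides."""
    named_groups = _by_signature(named)
    return {
        addrs[0]: named_groups[sig][0]
        for sig, addrs in _by_signature(unnamed).items()
        if len(addrs) == 1 and len(named_groups.get(sig, [])) == 1
    }


# Six blocks: one with 0xFFFFFFFF, one with one call, one with two calls.
old_blocks = [Block(), Block((0xFFFFFFFF,)), Block(calls=1),
              Block(calls=2), Block(), Block()]
# Same features in a different block order, as a recompiled build might have.
new_blocks = [Block(), Block(calls=1), Block(), Block((0xFFFFFFFF,)),
              Block(), Block(calls=2)]

named = {"example_func": old_blocks}
unnamed = {0x80123450: new_blocks, 0x80123600: [Block(calls=1), Block()]}
fuzzy_matches = match_fuzzy(named, unnamed)
print(fuzzy_matches)
```

Requiring the signature to be unique in both images is what keeps false positives down; relaxing that rule finds more candidates at the cost of more wrong names.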

A somewhat more reliable metric is a unique string reference, such as this one in gethostbyname:

gethostbyname string xref

Likewise, unique constants can also be used for function identification, particularly for subroutines related to crypto or hashing:

Constant 0x41C64E6D used by rand
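String and constant references can be handled the same way: a reference is only useful if exactly one function in each image uses it. The sketch below assumes the references have already been collected per function. The helper names, the addresses, and the resolver string are hypothetical placeholders; only the constant 0x41C64E6D and its association with rand come from the example above.

```python
from collections import defaultdict


def unique_refs(func_refs):
    """Return {reference: function} for references used by exactly one function."""
    owners = defaultdict(set)
    for func, refs in func_refs.items():
        for ref in refs:
            owners[ref].add(func)
    return {ref: next(iter(f)) for ref, f in owners.items() if len(f) == 1}


def match_by_refs(named_refs, unnamed_refs):
    """Return {address: name} for functions sharing a reference unique to both."""
    known = unique_refs(named_refs)
    target = unique_refs(unnamed_refs)
    candidates = defaultdict(set)
    for ref in known.keys() & target.keys():
        candidates[target[ref]].add(known[ref])
    # Drop addresses whose unique references point at different names.
    return {a: next(iter(n)) for a, n in candidates.items() if len(n) == 1}


RESOLVER_STRING = "hypothetical resolver error string"  # placeholder text

named_refs = {
    "rand": {0x41C64E6D},
    "gethostbyname": {RESOLVER_STRING},
    "printf": {"%s"},   # "%s" is shared, so it identifies nothing
    "puts": {"%s"},
}
unnamed_refs = {
    0x80010000: {0x41C64E6D},
    0x80020000: {RESOLVER_STRING},
    0x80030000: {"%s"},
    0x80030100: {"%s"},
}

ref_matches = match_by_refs(named_refs, unnamed_refs)
print(ref_matches)
```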

Even identifying functions whose names we don’t know can be useful. Consider the following code snippet in sub_801A50E0, from the VxWorks 5.5 firmware:

Function calls from sub_801A50E0

This unidentified function calls memset, strcpy, atoi, and sprintf; hence, if we can find this same function in other VxWorks firmware, we can identify these standard functions by association.
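Identification by association can be sketched as pairing up the call lists of two functions that have already been matched to each other. This is a simplified illustration with assumptions: calls are paired strictly by position, the pairing is rejected outright if the lists differ in length or imply conflicting names, and the target addresses are invented.

```python
def names_by_association(known_calls, unknown_calls):
    """Pair callee names from a matched function with the callee addresses
    of its counterpart. Returns {address: name}, or {} if inconsistent."""
    if len(known_calls) != len(unknown_calls):
        return {}  # different call counts: the functions diverge too much
    by_addr, by_name = {}, {}
    for name, addr in zip(known_calls, unknown_calls):
        # The same address must always get the same name, and vice versa.
        if by_addr.setdefault(addr, name) != name:
            return {}
        if by_name.setdefault(name, addr) != addr:
            return {}
    return by_addr


# Calls made by sub_801A50E0 in the VxWorks 5.5 firmware, in order.
known_calls = ["memset", "strcpy", "atoi", "sprintf"]
# Hypothetical callee addresses in the matching VxWorks 5.5.1 function.
unknown_calls = [0x80101000, 0x80101200, 0x80101400, 0x80101600]

associated = names_by_association(known_calls, unknown_calls)
for addr, name in associated.items():
    print(hex(addr), name)
```

Refusing to guess on any inconsistency is a conservative choice; a single wrong pairing here would spread bad names to every function identified through it.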

Alternative Signatures in Practice

I wrote an IDA plugin to automate these signature techniques and apply them to the VxWorks 5.5.1 firmware:

Output from the Rizzo plugin

This identified nearly 1,300 functions, and although some of those are probably incorrect, it was quite successful in locating many standard POSIX functions:

Functions identified by Rizzo

Like any such automated process, this is sure to produce some false positives/negatives, but having used it successfully against several RTOS firmwares now, I’m quite happy with it (read: “it works for me”!).



CREDIT:  Craig – devttys0
