Sunday, September 20, 2009

About the LC_DYLD_INFO[_ONLY] command.

With the introduction of the new __LINKEDIT format in iPhone OS 3.1, many tools in the open toolchain are broken. This is all due to the previously unknown load commands LC_DYLD_INFO[_ONLY]. Although many people now know they exist, I found no useful documentation about the new format, so I'll outline what it is. Alternatively, you can study the source code of dyldinfo, which contains all of the information given here.



The LC_DYLD_INFO[_ONLY] commands


These load commands are numerically 0x22 and 0x80000022. The only difference between them is that LC_DYLD_INFO_ONLY (the one with the high bit, LC_REQ_DYLD, set) makes loading abort when dyld doesn't understand the new format.

The structure of this load command has been described before. It refers to 5 chunks of data in the __LINKEDIT segment, which are called rebase, bind/weak_bind/lazy_bind, and export.
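As a rough sketch of what reading this load command looks like, the snippet below unpacks it with Python's struct module. It assumes the field order of struct dyld_info_command from <mach-o/loader.h> (a cmd/cmdsize header followed by an offset/size pair for each of the 5 chunks) and a little-endian file. The names DyldInfo and parse_dyld_info are made up for this illustration.

```python
import struct
from collections import namedtuple

LC_REQ_DYLD = 0x80000000
LC_DYLD_INFO = 0x22
LC_DYLD_INFO_ONLY = 0x22 | LC_REQ_DYLD

# Field order assumed from struct dyld_info_command in <mach-o/loader.h>.
DyldInfo = namedtuple(
    "DyldInfo",
    "cmd cmdsize rebase_off rebase_size bind_off bind_size "
    "weak_bind_off weak_bind_size lazy_bind_off lazy_bind_size "
    "export_off export_size",
)


def parse_dyld_info(buf, offset=0):
    """Unpack one LC_DYLD_INFO[_ONLY] command (12 little-endian uint32 fields)."""
    info = DyldInfo._make(struct.unpack_from("<12I", buf, offset))
    if info.cmd not in (LC_DYLD_INFO, LC_DYLD_INFO_ONLY):
        raise ValueError("not an LC_DYLD_INFO[_ONLY] command")
    return info
```

Each *_off field is a file offset and each *_size a byte count, so the bind chunk, for instance, would be file_bytes[info.bind_off : info.bind_off + info.bind_size].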

bind/weak_bind/lazy_bind


These 3 chunks are encoded with the same format. Think of the data in these chunks as a tiny assembly language whose only purpose is to "bind" (map) VM addresses to symbols.

Each instruction is encoded as one byte, optionally followed by extra data:
Bits 7-4: opcode
Bits 3-0: imm operand
(then any extra data the opcode needs)

So at most 16 different opcodes can be used, and the immediate operand can hold a value from 0 to 15. Most of the time, though, a value >15 or even non-numeric data is needed. In these cases, extra data is appended after this byte.

A large number is encoded in the "LEB128" format. In this format, each byte is split into a "continue bit" (bit 7) and the "digits" (bits 0-6).

Suppose we want to encode the number 123456 in LEB128. First, we write 123456 in binary and separate it into groups of 7 digits: 0000111,1000100,1000000. Then we prepend the "continue bit" to each group: 1 for every group except the most significant one, which gets 0 to signal the end of the number: 00000111,11000100,11000000. Finally, it should be in little endian, so we flip the order and write out the result: 0xC0 0xC4 0x07.
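The steps above can be sketched in a few lines of Python. This is only an illustration of the unsigned variant. The function names encode_uleb128 and decode_uleb128 are my own, not something taken from dyld.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative numbers")
    out = bytearray()
    while True:
        group = value & 0x7F          # lowest 7 "digit" bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow: set the continue bit
        else:
            out.append(group)         # most significant group: continue bit is 0
            return bytes(out)


def decode_uleb128(data, pos=0):
    """Decode unsigned LEB128 starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


print(encode_uleb128(123456).hex())              # c0c407
print(decode_uleb128(bytes([0xC0, 0xC4, 0x07])))  # (123456, 3)
```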

Apple so far defined 13 opcodes:
opcode  Symbol: Meaning
0  DONE: Finished defining a symbol.
1  SET_DYLIB_ORDINAL_IMM: Set the library ordinal of the current symbol to the imm operand.
2  SET_DYLIB_ORDINAL_ULEB: Same as above, but the library ordinal is read from the unsigned LEB128-encoded extra data.
3  SET_DYLIB_SPECIAL_IMM: Same as above, but the ordinal is zero or negative: imm is sign-extended from 4 bits, so imm 0xF means -1 and imm 0xE means -2. Typical values are:
  • 0 = SELF
  • -1 = MAIN_EXECUTABLE
  • -2 = FLAT_LOOKUP
4  SET_SYMBOL_TRAILING_FLAGS_IMM: Set the flags of the symbol to imm, and read the symbol name as a C string from the extra data. The flags are:
  • 1 = WEAK_IMPORT
  • 8 = NON_WEAK_DEFINITION
5  SET_TYPE_IMM: Set the type of the symbol to imm. Known values are:
  • 1 = POINTER
  • 2 = TEXT_ABSOLUTE32
  • 3 = TEXT_PCREL32
6  SET_ADDEND_SLEB: Set the addend of the symbol to the signed LEB128-encoded extra data. Usage unknown.
7  SET_SEGMENT_AND_OFFSET_ULEB: Set the address being bound to the imm-th segment, at an offset read from the unsigned LEB128-encoded extra data.
8  ADD_ADDR_ULEB: Increase the offset (as above) by the unsigned LEB128-encoded extra data.
9  DO_BIND: Define a symbol from the gathered information. Increase the offset by 4 (or 8 on 64-bit targets) after this operation.
A  DO_BIND_ADD_ADDR_ULEB: Same as above, but besides the 4-byte increment, the unsigned LEB128-encoded extra data is also added to the offset.
B  DO_BIND_ADD_ADDR_IMM_SCALED: Same as DO_BIND, but an extra imm*4 bytes (imm*8 on 64-bit targets) is also added to the offset.
C  DO_BIND_ULEB_TIMES_SKIPPING_ULEB: This is the most complex operation. Two unsigned LEB128-encoded numbers are read from the extra data. The first is the count of symbols to define, and the second is the number of bytes to skip after each symbol is defined. In pseudocode, all it does is:
    for i = 1 to count
        define symbol
        offset += 4 + skip    (8 + skip on 64-bit targets)
    end for


For example, suppose we want to bind the address 0x2020 (in the __DATA segment, which starts at 0x2000) to the symbol _XXHello, which is defined in the 9th loaded dylib, Hello.dylib. We would perform the following operations:
SET_DYLIB_ORDINAL_IMM(9)
SET_SYMBOL_TRAILING_FLAGS_IMM(0, "_XXHello")
SET_SEGMENT_AND_OFFSET_ULEB(2, 0x20) ; usually __DATA is the 2nd segment.
DO_BIND()

So in binary it will be
0x19 0x40 "_XXHello\0" 0x72 0x20 0x90
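To check an encoding like this one, here is a minimal sketch of an interpreter for the opcode table above, run on the example bytes. It is not dyld's implementation: the names decode_bind and state are hypothetical, DONE is simply skipped (so the same loop also copes with lazy_bind, where DONE separates entries), and offsets are not wrapped to the pointer width.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value at data[pos]; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def read_sleb128(data, pos):
    """Decode a signed LEB128 value at data[pos]; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:              # sign bit of the last group
                value -= 1 << shift
            return value, pos


def decode_bind(data, pointer_size=4):
    """Run the bind opcodes in data; return one dict per binding performed."""
    state = {"ordinal": 0, "name": "", "flags": 0, "type": 0,
             "addend": 0, "segment": 0, "offset": 0}
    binds = []
    pos = 0

    def do_bind():
        binds.append(dict(state))        # snapshot of the gathered information
        state["offset"] += pointer_size

    while pos < len(data):
        opcode, imm = data[pos] >> 4, data[pos] & 0x0F
        pos += 1
        if opcode == 0x0:                # DONE (skipped in this sketch)
            continue
        elif opcode == 0x1:              # SET_DYLIB_ORDINAL_IMM
            state["ordinal"] = imm
        elif opcode == 0x2:              # SET_DYLIB_ORDINAL_ULEB
            state["ordinal"], pos = read_uleb128(data, pos)
        elif opcode == 0x3:              # SET_DYLIB_SPECIAL_IMM (sign-extend 4 bits)
            state["ordinal"] = imm - 16 if imm else 0
        elif opcode == 0x4:              # SET_SYMBOL_TRAILING_FLAGS_IMM
            end = data.index(b"\0", pos)
            state["name"] = data[pos:end].decode("ascii")
            state["flags"] = imm
            pos = end + 1
        elif opcode == 0x5:              # SET_TYPE_IMM
            state["type"] = imm
        elif opcode == 0x6:              # SET_ADDEND_SLEB
            state["addend"], pos = read_sleb128(data, pos)
        elif opcode == 0x7:              # SET_SEGMENT_AND_OFFSET_ULEB
            state["segment"] = imm
            state["offset"], pos = read_uleb128(data, pos)
        elif opcode == 0x8:              # ADD_ADDR_ULEB
            delta, pos = read_uleb128(data, pos)
            state["offset"] += delta
        elif opcode == 0x9:              # DO_BIND
            do_bind()
        elif opcode == 0xA:              # DO_BIND_ADD_ADDR_ULEB
            do_bind()
            delta, pos = read_uleb128(data, pos)
            state["offset"] += delta
        elif opcode == 0xB:              # DO_BIND_ADD_ADDR_IMM_SCALED
            do_bind()
            state["offset"] += imm * pointer_size
        elif opcode == 0xC:              # DO_BIND_ULEB_TIMES_SKIPPING_ULEB
            count, pos = read_uleb128(data, pos)
            skip, pos = read_uleb128(data, pos)
            for _ in range(count):
                do_bind()
                state["offset"] += skip
        else:
            raise ValueError(f"unknown bind opcode {opcode:#x}")
    return binds


example = bytes([0x19, 0x40]) + b"_XXHello\0" + bytes([0x72, 0x20, 0x90])
for bind in decode_bind(example):
    print(bind)
```

For the example bytes this yields a single binding with ordinal 9, name _XXHello, segment 2 and offset 0x20, which matches the four operations listed above.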


rebase


I don't think rebase is useful, and it encodes its info with an approach similar to bind's, so I'm ignoring it here.

export


Unlike bind, export is an entirely different beast. The content of the export chunk defines a trie, or a prefix tree. A node in this trie is encoded as:
Node = «uint8_t terminal_size» [Terminal] «uint8_t child_count» [Child] [Child] [Child] ...
Child = «char* suffix» «uleb128 offset»
Terminal = «uleb128 flags» «uleb128 address»


Known flags are:
  • 1 = THREAD_LOCAL
  • 4 = WEAK_DEFINITION
  • 8 = INDIRECT_DEFINITION
  • 0x10 = HAS_SPECIALIZATIONS


For example, suppose a dylib exports _XXHello at 0x1022, _XXWorld at 0x1064, and _XXHelloWorld2 at 0x1558. A trie that represents these symbols would be:

_XX - [Hello] - [World2]
    \
      [World]


So we encode our root node as
00 01 "_XX\0" (offset to _XX)

and _XX as
00 02 "Hello\0" (offset to Hello) "World\0" (offset to World)

If we place the _XX node right after the root node, its offset would be 7 (the root node is 7 bytes long), so the root node is
00 01 "_XX\0" 07

The offsets of the rest can be obtained in the same way. Now for the Hello node, since it defines a symbol, we have to fill in the Terminal info. The terminal_size is 3, because the flags take 1 byte and the address takes 2:
03 00 A2 20 /*=0x1022*/ 01 "World2\0" (offset to World2)

etc.
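To make the layout concrete, here is a small sketch that walks a trie in the format described above and lists the exported symbols. The name walk_export_trie is hypothetical. EXAMPLE_TRIE is my own completion of the example: each node is placed right after the previous one, in the order root, _XX, Hello, World2, World, which is just one possible layout.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value at data[pos]; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def walk_export_trie(data, offset=0, prefix=b""):
    """Yield (name, flags, address) for each terminal node reachable from offset."""
    terminal_size = data[offset]
    pos = offset + 1
    if terminal_size:
        flags, after_flags = read_uleb128(data, pos)
        address, _ = read_uleb128(data, after_flags)
        yield prefix.decode("ascii"), flags, address
    pos += terminal_size                     # skip over the Terminal info
    child_count = data[pos]
    pos += 1
    for _ in range(child_count):
        end = data.index(b"\0", pos)         # the suffix is a C string
        suffix = data[pos:end]
        child_offset, pos = read_uleb128(data, end + 1)
        yield from walk_export_trie(data, child_offset, prefix + suffix)


# Assumed layout: nodes stored back to back, offsets relative to the trie start.
EXAMPLE_TRIE = (
    b"\x00\x01_XX\x00\x07"                   # root node, at offset 0
    b"\x00\x02Hello\x00\x17World\x00\x29"    # _XX node, at offset 7
    b"\x03\x00\xa2\x20\x01World2\x00\x24"    # Hello node, at offset 0x17
    b"\x03\x00\xd8\x2a\x00"                  # World2 node, at offset 0x24
    b"\x03\x00\xe4\x20\x00"                  # World node, at offset 0x29
)

for name, flags, address in walk_export_trie(EXAMPLE_TRIE):
    print(f"{name} flags={flags} address={address:#x}")
```

With this layout the walk visits _XXHello (0x1022), _XXHelloWorld2 (0x1558) and _XXWorld (0x1064), in that order.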
