DWARF as a Shared Reverse Engineering Format

nneonneo

Ghidra to DWARF was already possible via this plugin: https://github.com/cesena/ghidra2dwarf. I’ve used this a lot (and contributed some patches) as it’s immensely useful in reverse engineering code. It gives me source-level debug capabilities in GDB on binaries without the original source code, by lining up the decompiled source code with the generated DWARF debug info. It works practically like magic: you get the ability to inspect variables (including complex structure types!), see arguments to functions, get fully symbolicated backtraces, see source code as you step, use line-level breakpoints, and more.
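
For anyone wondering what "lining up" means in practice: the core idea is a table that maps instruction addresses to lines of the decompiled source file, which is roughly what a DWARF line table provides to the debugger. Below is a minimal sketch of that lookup, not ghidra2dwarf's actual code. The addresses, line numbers, and the names LINE_TABLE and line_for_pc are made up for illustration.

```python
import bisect

# Hypothetical rows: (start address, line number in the decompiled .c file).
# Real DWARF line programs carry more state (file, column, is_stmt, ...).
# Rows must be sorted by address.
LINE_TABLE = [
    (0x1000, 10),
    (0x1008, 11),
    (0x1014, 13),
    (0x1020, 14),
]


def line_for_pc(table, pc):
    """Return the line of the last row starting at or before pc, or None."""
    addrs = [addr for addr, _ in table]
    i = bisect.bisect_right(addrs, pc) - 1
    # No end-of-sequence marker in this sketch, so the last row is open-ended.
    return table[i][1] if i >= 0 else None
```

A debugger does the equivalent lookup every time it stops, which is how stepping by source line and line-level breakpoints become possible on a binary that never had source.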

stefan_

What's wrong with using Ghidra to debug directly?

PennRobotics

I just tried on a microcontroller target.

Some instructions cause a ghidra.pcode.exec.PcodeExecutionException but can be manually skipped.

I can get to the same stopping points as a commercial simulator, but nothing is displayed in the Stack window in the Ghidra debugger, and single-stepping while ignoring side effects (mostly for cpsie/cpsid/isb/dsb/msr/mrs op codes) takes 10 or 15 seconds for each skipped instruction. Watches on complex variable types (ListItem_t *) are clearly displayed wrong (value { 38 }), and I'd prefer watches to disappear when they're not in scope. I must be doing something wrong: I can step through the machine code, but I would prefer to step through each C++ line without using line-by-line breakpoints, and I don't know if that exists or how to set it up.

Right now, it's easier for me to use a commercial debugger/simulator; it shows the call history, variables are displayed properly once I import their structure, browsing SRAM is fast/intuitive, MCU peripheral registers are displayed/changed on a single screen with a few clicks or keystrokes (even after importing the SVD into Ghidra), op codes don't need to be skipped (so the processor handles the stack, privilege states, and core/system registers correctly).

Also, the default debugger windows in Ghidra are bonkers. The entire left side shows meaningless data. The right side has tabs for windows I'd like to see simultaneously (memory and decompilation). I'd also rather have multiple child windows for different memory ranges, since FreeRTOS and application variables are interspersed in the code but separated in RAM. The bottom panel is useless for me except for Stack (which is empty) and Watches. On a laptop, the whole UI is cramped. It's easy to change this, but I'd rather just have a useful workflow as an opinionated default.

When I can export what I've manually decompiled to DWARF, I'm more comfortable in a commercial Arm debugger than in gdb or anything that looks/feels like an Eclipse IDE debugger.

There's a lot of potential. That is obvious. I do wonder if I'm missing something; if changing the emulator/backend or the right tool setting will unlock nirvana. I also feel like I'm dropping the ball by not opening issues, but I'm sure there are plenty of Ghidra users trying to analyze Arm Cortex who have the same stumbling blocks as me but perhaps more free time.

nneonneo

Eh, using GDB directly is a lot more convenient for me. Integrations with pwntools, good scripting support, and familiarity are big reasons I keep using GDB. Plus the last few times I’ve tried to use Ghidra’s debugger I ran into weird issues with the debugger freezing or otherwise refusing to update the program state properly; after chasing down and trying to fix some of them I concluded that it wasn’t a sane use of my time.
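
To give a flavor of what GDB scripting can look like, here is a minimal sketch using GDB's Python API (gdb.Breakpoint, gdb.parse_and_eval, gdb.write). It is only an illustration, not one of my actual scripts: it logs a function's integer arguments each time the function is hit and then keeps running. The names LogCall, format_call, parse_packet, buf, and len are hypothetical, and the argument names only resolve if the debug info (e.g. generated DWARF) defines them.

```python
try:
    import gdb  # only available inside GDB's embedded Python interpreter
except ImportError:
    gdb = None  # lets the pure helper below be imported outside GDB


def format_call(func, args):
    """Render a call like 'parse_packet(buf=0x2000, len=0x10)'."""
    rendered = ", ".join(f"{name}={value:#x}" for name, value in args)
    return f"{func}({rendered})"


if gdb is not None:

    class LogCall(gdb.Breakpoint):
        """Breakpoint that logs integer/pointer arguments and never stops."""

        def __init__(self, func, arg_names):
            super().__init__(func)
            self.func = func
            self.arg_names = arg_names

        def stop(self):
            # Evaluate each argument name in the frame that hit the breakpoint.
            args = [(n, int(gdb.parse_and_eval(n))) for n in self.arg_names]
            gdb.write(format_call(self.func, args) + "\n")
            return False  # returning False tells GDB to keep running

    # Hypothetical usage; parse_packet, buf and len are made-up names:
    # LogCall("parse_packet", ["buf", "len"])
```

Save it to a file and load it inside GDB with the source command.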

delusional

What do you use GDB scripting for, if you don't mind me asking? I always try to use it, but it's such an awkward language and it never feels quite worth it to me.

l-albertovich

There doesn't have to be anything wrong for people to choose whatever they feel most comfortable with. Back when I used to do a lot of Android (and some iOS) based RE, I sometimes used gdb on the device itself through ssh because it was the path of least resistance, which allowed me to focus on the hard parts of the job.

senkora

I imagine that having access to rr (which wraps gdb) would also be quite useful.

nandkeypull

See also https://github.com/aldelaro5/ghidra-ExportDwarfELFSymbols, which lets you generate an ELF out of a ghidra database for even non-ELF binaries (e.g. bare-metal firmware dumps).

boricj

Exporting DWARF symbols is a feature I've been craving for my own Ghidra extension for exporting relocatable object files, in order to improve the debugging experience for executables containing them. Unfortunately, I've always chickened out on this, partly because resynthesizing debugging data is tricky and partly because I recoil in horror at the sight of the DWARF specification (because I write my own ELF/COFF serializer).

That blog post gave me pause. I don't know yet if this particular implementation will be a good fit, but I need to stop kicking this can down the road.

amluto

> That blog post gave me pause.

Did it give you pause or did it give you resume?

(sorry for the content-free post, but I couldn’t resist.)

drob518

Upvoted for bland humor

xvilka

Importing IDB is possible via python-idb[1] but it needs some work to support recent versions of IDA Pro.

[1] https://github.com/williballenthin/python-idb

barosl

Off topic, but it is good to see an article about the LIEF library on Hacker News. I recently had a need to modify the header of an ELF file and LIEF was a lifesaver. Thanks to all the authors and contributors!

PennRobotics

I've been chipping away at recreating source for a dumped FreeRTOS binary for about a year using Ghidra and Trace32. (Disclosure: I work on Trace32.)

I wanted to implement something like https://github.com/neuviemeporte/mzretools (the tool for reversing F15SE, a DOS game) but I would need to port all of that tooling to an Arm cross-compiler and accept it will never produce a 1:1 binary in my case. (There's evidence the original was built with IAR which I will not be buying solely for a hobby project.)

Ghidra (especially after using BSim to load all of the NXP demos and some TI library code for adjacent ICs) is extremely useful at getting the entire structure: library functions, data types, tasks, queues, I2C/USB comm, pin setup, data locations, etc. You can re-code the flash part easily, but SRAM/stack is all zeros. The code that fills SRAM uses pointer math and some loops and is generally unreadable on its own. I tried the Ghidra simulator, but it didn't work out of the box on Arm microcontroller code.

Trace32 makes it easy to load the raw binary into an Arm simulator and step through all the way to the IDLE task, but there's no high-level listing. You just have to detect if you're in a loop and look at the loop address in Ghidra to see what holds you back. Without a simulation model (which I do not have---at least not outside of work) you have to manually jump past a few spots where the board waits for an acceptable status flag from a clock source, etc., and then manually call SysTick_Handler once, but the SRAM is eventually filled with appropriate data. Once you have a stack, you can start picking out which FreeRTOS macro parameters were set and figuring out the size and type of vendor-created structures which then streamlines further Ghidra analysis. Beyond that, you can't use RTOS awareness without symbols, so unless you write a script to import these from Ghidra, a more insightful view of the binary remains out of reach.

The analysis has been a LOT of looking back and forth between the simulator (which shows a hex address and machine code) and the Ghidra decompilation listing; it is a bit like getting Don Quixote verbally read to you, one letter at a time. Anytime there's an I2C_Master_Transmit(), you need to comb through datasheets for DSP settings and the various I2C-linked chip registers. It's a USB product (with multiple endpoints) so I use Wireshark and Python to check the decompiled USB application code and identify valid packet data that are never sent by the vendor software.

DWARF can be imported and exported in Trace32, so exporting symbols created in Ghidra as DWARF would vastly simplify my workflow (as would having a real model of the NXP chip and/or dumping the bootloader binary and secrets from a production device).

I'll definitely check out this project, as well as ghidra2dwarf linked by nneonneo.

bobmcnamara

I miss trace32.

I miss how everything is in its place.

I miss the number of times I've wondered if some functions could be combined and, to my delight, they worked wonderfully.

Just writing in to say I'm a fan.

cryptonector

DWARF is fantastic. I've used it to generate C headers for SPIs based on APIs.
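
A rough sketch of the idea, assuming the debug info has already been parsed into plain dictionaries: walk a structure DIE and emit a C declaration. The tag and attribute names are real DWARF ones, but the POINT_DIE data and the struct_to_c helper are hypothetical, and in a real file DW_AT_type is a reference to another DIE rather than a ready-made type name.

```python
# Hand-written stand-in for one DW_TAG_structure_type DIE and its children.
# DW_AT_type is pre-resolved to a C type name to keep the sketch short.
POINT_DIE = {
    "tag": "DW_TAG_structure_type",
    "DW_AT_name": "point",
    "DW_AT_byte_size": 8,
    "children": [
        {"tag": "DW_TAG_member", "DW_AT_name": "x",
         "DW_AT_type": "int32_t", "DW_AT_data_member_location": 0},
        {"tag": "DW_TAG_member", "DW_AT_name": "y",
         "DW_AT_type": "int32_t", "DW_AT_data_member_location": 4},
    ],
}


def struct_to_c(die):
    """Render a struct DIE as a C declaration with offset comments."""
    if die["tag"] != "DW_TAG_structure_type":
        raise ValueError("expected a structure DIE")
    lines = [f"struct {die['DW_AT_name']} {{"]
    for child in die["children"]:
        if child["tag"] != "DW_TAG_member":
            continue  # skip nested types and other non-member children
        offset = child["DW_AT_data_member_location"]
        lines.append(
            f"    {child['DW_AT_type']} {child['DW_AT_name']};"
            f"  /* offset {offset} */"
        )
    lines.append(f"}};  /* size {die['DW_AT_byte_size']} */")
    return "\n".join(lines)


print(struct_to_c(POINT_DIE))
```

The offset and size comments make it easy to check the generated header against what the compiler actually laid out.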

saagarjha

I just use binsync tbh

Philpax

TIL! Thanks.
