LibLISA – Instruction Discovery and Analysis on x86-64
17 comments
·October 24, 2024saagarjha
jxors
Evaluating how much of instruction space we cover was indeed difficult. Initially, we wanted to parse Intel XED's datafiles to generate a map of valid instruction space, but we ended up going for the simpler approach of computing coverage by selecting instructions randomly and from real-world binaries because of time constraints.
From Table 7 you can get an idea of how many instruction variants we cover (~1500 covered, ~700 enumerated but not synthesized, 744 out of enumeration scope). Instruction variants correspond much more closely with the mnemonics listed in the reference manuals, and this is typically the number reported by related work.
saagarjha
Yes, but I still think this falls victim to the problem I mentioned: you might have two dozen arithmetic instructions, and two that change privilege state. It is generally the latter that is more interesting to those doing this kind of analysis. (Not saying that the former is completely useless; I am sure emulator developers and similar would find it interesting. But most of the research effort going into finding new instructions or whatever is going towards the not-simple instructions.)
pabs3
Reminds me of sandsifter, a fuzzer for the x86 ISA:
amy-petrik-214
great piece on application of such - https://media.defcon.org/DEF%20CON%2026/DEF%20CON%2026%20pre...
pabs3
The video of the talk about this is well worth watching too.
specialgoodness
This is interesting work but it totally misses the boat when it talks about the current state of the art. They cite a 2014 version of the Goel-Hunt-et al formal x86 model in ACL2, but they fail to talk about its modern version. The modern version (developed at Centaur and then Intel!) is an ACL2 x86 model that is so precise that it can boot Linux and run user-land programs. Let me say that again: it is a formal mathematical model of a processor that is so precise that it can boot Linux and run user-land programs! This is a monumental accomplishment and is not even mentioned in their paper.
jf
I've long wanted to have a way to see what actually happens inside a CPU when a set of instructions are executed. I'm pretty excited after skimming this paper as it looks like they developed a technique to automatically determine how the x86-64 instructions actually work by observing real world CPU behavior.
dzaima
This is determining the directly observable behavior rather than anything about microarchitectural specifics.
ddingus
And still quite a useful tool, particularly when exploring undocumented instructions.
fragmede
blinkenlights might be up your alley, if you haven't seen it before.
rbanffy
Every computer deserves a blinking lights front panel.
westurner
From https://news.ycombinator.com/item?id=33563857 :
> Memory Debugger
Valgrind > Memcheck, None: https://en.wikipedia.org/wiki/Valgrind ; TIL Memcheck's `none` provides a traceback where the shell would normally just print "Segmentation fault"
DynamoRio > Dr Memory: https://en.wikipedia.org/wiki/DynamoRIO#Dr._Memory
Intel Pin: https://en.wikipedia.org/wiki/Pin_(computer_program)
https://news.ycombinator.com/item?id=22095435, : SoftICE, EPT, Hypervisor, HyperDbg, PulseDbg, BugChecker, pyvmidbg (libVMI + GDB), libVMI Python, volatilityfoundation/volatility, Google/rekall -> yara, winpmem, Microsoft/linpmem, AVML,
rr, Ghidra Trace Format: https://github.com/NationalSecurityAgency/ghidra/discussions... https://github.com/NationalSecurityAgency/ghidra/discussions... : appliepie, orange_slice, cannoli
GDB can help with register introspection: https://web.stanford.edu/class/archive/cs/cs107/cs107.1202/l... :
> Auto-display and Printing Registers: The `info reg` command [and `info all-registers` (`i r a`)]
emu86 implements X86 instructions in Python, optionally in Jupyter notebooks; still w/o X86S, SIMD, AVX-512, x86-84-v4
westurner
> Valgrind `none` provides a traceback where the shell would normally just print "Segmentation fault"
What would it take to get `bash` to print a --traceback after "Segmentation fault\n", and then possibly also --gdb like pytest --pdb?
- [ ] ENH: bash: add valgrind `none`-style ~ --traceback after "Segmentation fault" given an env var or by default?
jxors
Hi! I'm one of the authors. Cool to see our work show up on HN!
I'm happy to answer questions if there are any.
jtotheh
This may be a really dumb question, but is that much of the behavior of an x86_64 CPU variable and undefined? Until recently I thought the chipmakers provided full information (recently I found an article about people investigating the undocumented innards of the 286, IIRC). This seems like a pretty shaky foundation for software.
jxors
Not a dumb question at all!
Documentation is definitely not one of x86's strengths. Other architectures do much better. For example, ARM provides formal models of their CPUs, and RISC-V is so simple you could implement all its semantics in a few thousand lines of code.
There are quite a few instructions with undefined behavior, but it is not that much of an issue if you can choose to avoid it -- for example in a compiler. Almost all UB is found in flags or when using invalid instruction prefixes. And although there is some unexpected UB, like `imul`'s zero flag being UB instead of being set according to the result of the multiplication [1], reading the manual and sticking to the parts that are clearly not UB gets you most of the way.
However, it becomes an issue if you need to analyze a binary that uses UB. Then you can't choose which instructions to use, so you need to have a complete model of all UB. That's much more difficult, and for example most decompilers currently fail at this. We have an example of this in Figure 1 of our paper.
This is neat but the analysis of their work leaves a bit to be desired. You can't just randomly select instructions and see if you did a good job, because the instruction space is not really uniform on any axis that people care about. For example, on a hypothetical ISA that has most the encoding space that is, like, simple arithmetic ops then you can get "good" coverage really easily. But that's not actually very useful because the instructions people care about when doing this kind of analysis are specific and usually more esoteric, and difficult to analyze with a simple bitstring approximation. Like, this definitely cannot discover the semantics of syscall, or rdrand. The authors claim they would have been able to discover reptar if they extended their work slightly, but I think it is pretty dubious that their methodology is powerful enough to do so.