Grease: An Open-Source Tool for Uncovering Hidden Vulnerabilities in Binary Code
11 comments · March 20, 2025

chc4
Thinking about it more: a lot of bugs arise downstream of the initial function inputs, so you'd still catch things like "heap allocation, then an out-of-bounds read from that allocation with an offset derived from input" just fine, since the least-constrained model only infers constraints for the inputs. That probably covers a lot of the normal use cases for Angr, plus automatically harnessing inputs to reach them, which sounds pretty useful.
shw1n
I’d built an AI agent to accomplish this using Ghidra + GDB for dynamic analysis (tested it on crackmes)
It worked surprisingly well
Applied to YC with it, sadly no interview
Was later told by some accepted friends/VCs that our application was good, but without pedigree we needed traction to de-risk / get accepted :(
nicce
I think AI is currently much weaker for this use case if you want to generalize it. There is less assembly training data available in which existing bad coding patterns are matched to actual bug descriptions. Assembly is also more verbose, so it consumes more of an LLM's context window. False positives are the biggest pain in this area. With LLMs it is also surprisingly difficult to test for the existence of a vulnerability in general: often you give a hint about the possible issue in the prompt itself. Do it at scale and false positives are everywhere.
theturtletalks
Would this have uncovered the XZ Utils scandal quicker?
TheAdamist
No.
This is looking for coding bugs that allow unintentional behavior, not intentionally malicious code.
mrbluecoat
I've Got Chills, They're Multiplying
ITwork2019
Grease is the time, is the place, is the motion
I'm suspicious of the effectiveness. Most people do symbolic execution to find bad pointer dereferences as bugs, whereas this tool does it to build the least-constrained model and then checks the code against that same model. Wouldn't any code paths discovered during symbolic exploration that contain out-of-bounds reads/writes then be inferred away as constraints rather than reported as bugs?

It also seems unable to detect memory corruption in the form of controlled pointer-value overwrites: you can't say that every pointer dereference derived from your symbolic input is a bug allowing attacker-controlled memory corruption, because the tool has no concept of which inputs are under user control, unlike most uses of Angr or other symbolic tainting tools.

Is there a better list of the bug classes or heuristics this is able to catch? Are there any numbers on the false positive/false negative rates against a dataset?