[HN Gopher] G-3PO: A protocol droid for Ghidra, or GPT-3 for rev...
___________________________________________________________________

G-3PO: A protocol droid for Ghidra, or GPT-3 for reverse-engineering

Author : AlbertoGP
Score  : 158 points
Date   : 2023-01-04 20:20 UTC (2 hours ago)

(HTM) web link (medium.com)
(TXT) w3m dump (medium.com)

| bri3d wrote:
| I'm partial to Gepetto for IDA, which includes an especially
| hilarious trick in which it instructs ChatGPT to phrase its
| responses in JSON, and then uses this JSON directly to name
| variables in the decompilation. If the JSON is incorrect, it
| politely asks ChatGPT to please fix its JSON output, which
| usually works.
|
| https://github.com/JusticeRage/Gepetto/blob/main/gepetto.py#...

| popinman322 wrote:
| I've been waiting to see something like this. There's certainly
| room to fine-tune an LLM for this task; in that vein, I wonder
| whether Ghidra's pcode would produce better results? It's a bit
| better suited to this task in that the model wouldn't need to be
| tuned for each possible instruction set. Training on code
| compiled at different optimization levels might also produce
| interesting results.
|
| You could probably also take the explanations from the LLM,
| convert those into embeddings, and then do semantic search over
| all functions in a binary. For example, searching for "get
| process handle and inject dll" and getting a list of prospects.
| It's less useful in an obfuscated binary, but for things like
| modding games or extending end-of-life software it could be very
| useful.

| TOMDM wrote:
| I'd never considered semantic search for code vulnerabilities.
|
| Maybe this is the next generation of automated code scanning.
|
| Next feature on github: "Our LLM has scanned your code and
| found a potential buffer overflow. Please mark as a bug or a
| false report"

| popinman322 wrote:
| I know there's some active work on this (using LLMs, not
| traditional methods), not on the binary side but on the
| source analysis side. See https://grit.io/, which tries to
| detect bugs (and maybe vulnerabilities?) and automatically
| submits PRs to patch them for you. I think morgante is their
| contact on HN.
|
| It feels like it'd be difficult to acquire a large corpus of
| vulnerabilities to train on.

| AlbertoGP wrote:
| A few days ago this went mostly ignored
| (https://news.ycombinator.com/item?id=34161642) and I was asked
| to re-submit it (https://news.ycombinator.com/item?id=34250150)
| so that it gets a second chance.
|
| That's a script for the reverse-engineering tool Ghidra that uses
| GPT-3 to de-compile machine code and to write plain English
| explanations of what a piece of code does.
|
| The article is quite detailed and describes both its capabilities
| and its limitations. That G-3PO script is open source, MIT
| license: https://github.com/tenable/ghidra_tools/tree/main/g3po
|
| There was also another HN story about what at first sight looks
| like an alternative implementation of the same idea: "GptHidra -
| Ghidra plugin that asks OpenAI Chat GPT to explain functions"
|
| https://news.ycombinator.com/item?id=34165291
|
| This one is more recent and lacks that good write-up mentioned
| above. The script is smaller and it seems to have fewer features.
|
| I suggest checking both of them.

| mdaniel wrote:
| Wow, I wouldn't have expected Tenable to shell out to curl,
| especially when the curl only adds two headers and they omitted
| the "--fail" that would cause non-200 responses to return a
| non-zero exit code :-(
|
| https://github.com/tenable/ghidra_tools/blob/main/g3po/g3po....

| saagarjha wrote:
| The real question is how a human should merge these results with
| their own reversing, honestly. I can't really trust GPT-3 to be
| accurate like I would actually trust the decompiler (and, as any
| reverser knows, you don't trust the decompiler). I think I would
| treat the output of this as I might a suggestion from a friend
| who I let glance over the code: "hmm, that might be a SHA-1?" and
| then I go confirm the results for myself.

| trenchgun wrote:
| Exactly. GPT-3 shines where we can make it solve hard problems
| into a format where they are easy to verify
___________________________________________________________________
(page generated 2023-01-04 23:00 UTC)
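
Editor's sketch: the "ask for JSON, then politely ask for a fix" loop that bri3d describes is a small instance of the easy-to-verify format trenchgun mentions, since a reply either parses as a name mapping or it does not. The Python below is a minimal illustration under stated assumptions, not Gepetto's or G-3PO's actual code. `ask_model` is a hypothetical callable standing in for whatever sends text to the model and returns its reply, and `get_rename_map`, the retry count, and the retry wording are likewise made up for this example.

```python
import json
from typing import Callable, Dict


def get_rename_map(
    ask_model: Callable[[str], str],  # hypothetical: sends text, returns the reply
    prompt: str,
    max_retries: int = 3,
) -> Dict[str, str]:
    """Ask for a JSON object mapping old names to new names.

    If the reply is not valid JSON, or not an object of string values,
    send it back with a request to fix it, up to max_retries attempts.
    """
    message = prompt
    for _ in range(max_retries):
        reply = ask_model(message)
        try:
            parsed = json.loads(reply)
        except json.JSONDecodeError as err:
            # Verification failed cheaply: hand the broken reply back.
            message = (
                f"Please fix this JSON ({err.msg}) and reply with JSON only:\n"
                f"{reply}"
            )
            continue
        if isinstance(parsed, dict) and all(
            isinstance(value, str) for value in parsed.values()
        ):
            return parsed
        message = (
            "Please reply with one JSON object mapping old names to new names:\n"
            f"{reply}"
        )
    raise ValueError("model did not return a valid rename map")


if __name__ == "__main__":
    # Stand-in model: the first reply is truncated JSON, the second is valid.
    replies = iter(['{"v1": "buffer_len",', '{"v1": "buffer_len"}'])
    print(get_rename_map(lambda _msg: next(replies), "Suggest a name for v1."))
```

Only the syntax is checked here. Whether `buffer_len` is a good name for `v1` is still the reverser's call, which is the caveat saagarjha raises.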