[HN Gopher] Reverse engineering my router's firmware with binwalk
       ___________________________________________________________________
        
       Reverse engineering my router's firmware with binwalk
        
       Author : sprado
       Score  : 480 points
       Date   : 2020-02-06 13:52 UTC (9 hours ago)
        
 (HTM) web link (embeddedbits.org)
 (TXT) w3m dump (embeddedbits.org)
        
       | bonyt wrote:
       | This is indeed a cool tool! I've used it before when forensically
       | analyzing a cell phone, and found interesting things. For
       | example, I found that a web browser had cached the unencrypted
       | bytes from an HTTP message. Binwalk identified the gzip header's
       | magic number (1f 8b), and after decompression there were
       | interesting results.
       | 
       | Another cool tool I learned about recently is signsrch. It's more
       | for reverse engineering binaries of software that implements
       | encryption of some type. It'll find signatures in the binaries of
       | these encryption methods, giving you a place to look when, for
       | example, reverse engineering a file format that you suspect is
       | encrypted in some way.
       | 
       | https://www.oreilly.com/library/view/learning-malware-analys...
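The gzip carving bonyt describes can be sketched in a few lines of Python. This is only an illustration of the idea (look for the `1f 8b` magic, then try to decompress from each hit), not how binwalk itself is implemented; the function name `carve_gzip_members` is made up for this example.

```python
import zlib

# gzip magic (1f 8b) followed by the compression method byte 08 (deflate).
GZIP_MAGIC = b"\x1f\x8b\x08"


def carve_gzip_members(blob: bytes) -> list:
    """Return (offset, decompressed_bytes) for every gzip stream that decodes."""
    found = []
    start = blob.find(GZIP_MAGIC)
    while start != -1:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=31)
        try:
            data = decoder.decompress(blob[start:])
            if decoder.eof:  # only keep streams that ended cleanly
                found.append((start, data))
        except zlib.error:
            pass  # false positive: the magic bytes occurred by chance
        start = blob.find(GZIP_MAGIC, start + 1)
    return found
```

Most magic-number hits in a large dump are false positives, which is why the sketch only keeps streams that decompress all the way to a valid trailer.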
        
       | xenocratus wrote:
       | I first found out about binwalk from this YT video on Firmware
       | Reverse Engineering: https://www.youtube.com/watch?v=GIU4yJn2-2A
       | 
       | Quite a good, short intro into the subject as well!
        
       | ggcdn wrote:
       | A slightly related question for HNers: Is there any easy tool for
       | a non-cs guy to reverse engineer a binary file containing numbers
       | and text in some specific format?
       | 
       | I have to work with some old structural analysis software. The
       | material and element definitions come in an obscure file format
       | ".PF3CMP". I know it contains text like the material names, and
       | numbers/letters for the material properties.
       | 
       | Ultimately it's my goal to be able to write these files from
       | matlab or python, instead of using the horribly clunky user
       | interface. But first I need to know the structure of the file,
       | and I'm not even sure how to begin figuring that out.
       | 
       | [0] is what it looks like when opened in a hex editor
       | 
       | [0] https://imgur.com/a/jvqV3k8
        
         | mml wrote:
         | related possibly? what domain is this file from?
         | 
         | https://techdocs.broadcom.com/content/broadcom/techdocs/us/e...
        
           | ggcdn wrote:
           | thanks but sadly not, it's from a structural analysis program
           | called PERFORM-3D.
           | 
           | I've contacted the developer but they will not release the
           | format of the files to me.
        
         | thebruce87m wrote:
         | The Linux tool "od" might help you here. The -c flag will print
         | ASCII characters.
         | 
         | You can get it with WSL on Windows, or even just install git
         | and you'll get git-bash for another easy option.
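For anyone without `od` at hand, a rough stand-in is easy to write in Python. The sketch below prints offsets, hex bytes and printable ASCII side by side; it is similar in spirit to `od -c` or a hex editor view but does not reproduce od's exact output format, and `hex_ascii_dump` is a name invented for this example.

```python
def hex_ascii_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as 'offset  hex bytes  ascii', one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' so text fields stand out.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)
```

Reading a file with `open(path, "rb").read()` and passing the result to this function makes embedded names (material names, in ggcdn's case) easy to spot next to the binary fields around them.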
        
         | PeterisP wrote:
         | Depending on how weird the format is, it might be more
         | efficient to reverse-engineer the file-reading routines of that
         | program which can work with these files.
        
         | Youden wrote:
         | I don't know of any straightforward tools, most people I've
         | seen reverse engineer a format do it with a hex editor and
         | writing custom scripts. It's not directly relevant but the best
         | I've seen is this presentation about reverse engineering the
         | protocol used to communicate within a car:
         | https://www.youtube.com/watch?v=KkgxFplsTnM
         | 
         | It uses some techniques that might be relevant, like monitoring
         | different parts of a file as you make different changes (like
         | accelerating or decelerating). In your case it might be
         | possible to compare between different material definitions for
         | example.
        
           | [deleted]
        
           | ggcdn wrote:
           | Ok thanks, I'll take a look. It's possible for me to generate
           | these files for each of the various material settings so I
           | can manually 'diff' them, similar to what you're describing
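The "change one setting, then diff the files" approach discussed above can be automated. The following is a minimal sketch assuming two files saved from the same program that differ in one setting; `changed_ranges` is a made-up helper name, and nothing here is specific to the PERFORM-3D format.

```python
def changed_ranges(a: bytes, b: bytes) -> list:
    """Return half-open (start, end) offset ranges where a and b differ.

    If the inputs have different lengths, the extra tail is reported
    as one final range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

If changing one numeric property always moves the same 4 or 8 bytes, that offset is a good candidate for an integer or floating-point field, which `struct.unpack` can then decode.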
        
       | josteink wrote:
       | Did I read the blog wrong, or was the stock firmware too based on
       | an OpenWRT kernel?
       | 
       | That would be pretty hilarious if it was true.
        
         | fencepost wrote:
         | I'm pretty sure a lot of stock firmware is based on OpenWRT or
         | used to be, though I'm pretty sure most of them lag well behind
         | the current version. I haven't paid much attention for a while,
         | but I think a lot were based on Kamikaze which is more than 10
         | years old now.
         | 
         | For the vendors with access to closed-source drivers and
         | chipset info they can likely support devices not supported on
         | the open source packages.
         | 
         | Edit: Per Wikipedia, "Qualcomm's QCA Software Development Kit
         | (QSDK) which is being used as a development basis by many OEMs
         | is an OpenWrt derivative"
         | 
         | It also notes Ubiquiti's wireless router firmware as being
         | derived from OpenWRT, but I thought I remembered discussion of
         | Ubiquiti being derived from a different open source
         | distribution - unless perhaps the routers and wireless devices
         | don't share a code base.
        
           | josteink wrote:
           | That's pretty cool. I didn't know that.
           | 
           | Looking into the equivalent firmware[1] for my Archer C7 v2,
           | I didn't find any OpenWRT bits though. I was honestly a
           | little bit disappointed.
           | 
           | I guess the difference between hardware revisions might be
           | more fundamental than I assumed.
           | 
           | DECIMAL       HEXADECIMAL     DESCRIPTION
           | ------------------------------------------------------------------------
           | 0             0x0             TP-Link firmware header, firmware version:
           |                               1.-15188.3, image version: "", product ID:
           |                               0x0, product version: -956301310, kernel
           |                               load address: 0x0, kernel entry point:
           |                               0x80002000, kernel offset: 16384512,
           |                               kernel length: 512, rootfs offset: 855873,
           |                               rootfs length: 1048576, bootloader offset:
           |                               15204352, bootloader length: 0
           | 71520         0x11760         Certificate in DER format (x509 v3), header
           |                               length: 4, sequence length: 64
           | 98560         0x18100         U-Boot version string,
           |                               "U-Boot 1.1.4 (Mar  5 2018 - 13:57:29)"
           | 98736         0x181B0         CRC32 polynomial table, big endian
           | 131584        0x20200         TP-Link firmware header, firmware version:
           |                               0.0.3, image version: "", product ID: 0x0,
           |                               product version: -956301310, kernel load
           |                               address: 0x0, kernel entry point:
           |                               0x80002000, kernel offset: 16252928,
           |                               kernel length: 512, rootfs offset: 855873,
           |                               rootfs length: 1048576, bootloader offset:
           |                               15204352, bootloader length: 0
           | 132096        0x20400         LZMA compressed data, properties: 0x5D,
           |                               dictionary size: 33554432 bytes,
           |                               uncompressed size: 2451644 bytes
           | 1180160       0x120200        Squashfs filesystem, little endian, version
           |                               4.0, compression:lzma, size: 9878520 bytes,
           |                               789 inodes, blocksize: 131072 bytes,
           |                               created: 2018-03-05 06:16:10
           | 
           | [1] https://static.tp-link.com/2018/201806/20180611/Archer%20C7(...
        
             | mjevans wrote:
             | The BOM can vary quite a lot between 'revisions', using
             | your product as an example...
             | 
             | https://openwrt.org/toh/tp-link/archer-c7-1750 (Scroll down
             | to the Info Links table and the Wikidevi Info column)
             | 
             | v1 to v2 upgrades the flash (8MB to 16MB) and uses a
             | slightly different AN+AC wifi chip. v2 and v3 seem pretty
             | similar at a glance. v4 is rated at 12V 2A rather than
             | 2.5A, and uses a completely different BGN (2.4GHz) chip
             | and a different ethernet chip/switch. v5 is lower power
             | still at 1.5A, but it's less obvious where that change
             | happened due to lack of pictures. A guess based on the
             | simpler antenna list is that it uses fewer antennas.
        
           | bradknowles wrote:
           | Ubiquiti is based on Vyatta.
        
         | crankylinuxuser wrote:
         | Given this line...
         | 
         | image name: "MIPS OpenWrt Linux-3.3.8"
         | 
         | I would say you are right.
        
       | hyper_reality wrote:
       | It's a good article but there are much easier ways to use binwalk
       | than presented here.
       | 
       | In the first example he uses the "--signature" and "--term"
       | flags, which are unnecessary: running binwalk with no flags will
       | produce the same output.
       | 
       | To extract part of the file, he also uses dd with the "skip" and
       | "count" options painfully calculated. You can just use:
       | 
       | binwalk --dd='.*' img.bin
       | 
       | and it will extract everything that matches the pattern - the
       | pattern above will extract all found files.
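For comparison, the manual `dd bs=1 skip=... count=...` step is just a byte slice. A minimal Python sketch of that carving step follows; the helper names `carve` and `carve_file` are invented for this example and are not part of binwalk or dd.

```python
from typing import Optional


def carve(blob: bytes, skip: int, count: Optional[int] = None) -> bytes:
    """Return `count` bytes starting at offset `skip` (to the end if count is None).

    Roughly what `dd bs=1 skip=<skip> count=<count>` produces.
    """
    if skip < 0 or (count is not None and count < 0):
        raise ValueError("skip and count must be non-negative")
    end = len(blob) if count is None else skip + count
    return blob[skip:end]


def carve_file(src_path: str, dst_path: str, skip: int,
               count: Optional[int] = None) -> int:
    """File-to-file wrapper around carve(); returns the number of bytes written."""
    with open(src_path, "rb") as src:
        data = carve(src.read(), skip, count)
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return len(data)
```

With the numbers quoted elsewhere in this thread, the article's bootloader extraction would correspond to `carve_file("archer-c7.bin", "u-boot.bin.lzma", 23296, 41162)`.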
        
       | leeoniya wrote:
       | glad i flashed latest dd-wrt beta on my archer-c7 v5 :D. though
       | my wan-facing device runs OPNSense.
       | 
       | i actually prefer to run Tomato, but archer c7 is not broadcom :(
       | 
       | can anyone offer advice about dd-wrt vs openwrt (considering
       | trying openwrt).
        
         | josteink wrote:
         | Latest version of OpenWRT (19) runs noticeably better on this
         | device, with better HW offloading support and based on a nearly
         | mainline, modern Linux kernel and a brand new device-tree for
         | the Atheros SoC.
         | 
         | What reasons do you have to stay on dd-wrt?
        
           | leeoniya wrote:
           | > What reasons do you have to stay on dd-wrt?
           | 
           | mostly that i've used it before. can i gui-flash to openwrt
           | from dd-wrt? i've done tftp flashes before but they're pretty
           | fiddly with getting the stupid 30-30-30 or whatever timing
           | right. also i think these routers try to "pull" from a tftp
           | server rather than having you push to one that they bootstrap
           | - i've never been able to get the "pull" variant to work.
           | 
           | would be hell of a lot easier if the router could be booted
           | into something like android's (arm's?) fastboot or flashmode
           | mode so i can just push an image.
        
             | SpikedCola wrote:
             | Going from dd-wrt to openwrt should be as simple as a
             | firmware flash from the web gui, and an nvram reset. Worst
             | case, you can flash a "revert to stock" image from ddwrt to
             | go back to factory, then flash openwrt as if the device was
             | factory.
             | 
             | Openwrt also has a handy failsafe built into a lot of
             | models. It boots a stripped down http server where you can
             | upload recovery firmware.
             | 
             | Used to swear by dd-wrt, now I prefer openwrt.
        
             | josteink wrote:
             | Flashing the OpenWRT "factory" (as opposed to sysupgrade)
             | image in the web UI should probably work fine, but don't
             | quote me on it.
             | 
             | That's how I flashed from stock to OpenWRT on 3+ Archer
             | units anyway. Make sure not to keep settings.
        
         | bxparks wrote:
         | I use Gargoyle on my Archer C7 v2. This thread
         | (https://www.gargoyle-router.com/phpbb/viewtopic.php?t=11896)
         | says that C7 v5 is supported.
        
         | magduf wrote:
         | >i actually prefer to run Tomato, but archer c7 is not broadcom
         | :(
         | 
         | Not being Broadcom is a very good thing.
        
         | 12bits wrote:
         | Did you notice your wireless signal strength considerably lower
         | when going to dd-wrt?
         | 
         | I put openwrt on my c7 V5 and could barely get any bars.
         | 
         | Flashed back to the stock and was back in business.
         | 
         | Another thing I've read is the third party firmwares don't get
         | hardware access to NAT resulting in speed hits.
         | 
         | Cheers
        
           | leeoniya wrote:
           | yes, and i had throughput issues when running in full-width
           | G/N mixed mode compared to my previous Tomato/Asus RT-N16
           | setup. my phone would also drop out and reconnect
           | intermittently with the c7. but in dedicated AC it seems to
           | be doing well thus far. i cannot say for sure whether this
           | was due to DD-WRT or not as i did not do a thorough
           | comparison to stock.
           | 
           | > Another thing I've read is the third party firmwares don't
           | get hardware access to NAT
           | 
           | i read that too :(
        
         | dpcx wrote:
         | Where can one find the dd-wrt you used for your c7? I have the
         | same device and have been unable to get it to flash anything
         | other than official firmware.
        
           | RussianCow wrote:
           | These are the instructions I successfully followed on my C7
           | V2:
           | https://wiki.dd-wrt.com/wiki/index.php/TP_Link_Archer_C7#Ins...
           | 
           | Here is the exact `factory-to-ddwrt` image I used (this will
           | depend on which version you have):
           | ftp://ftp.dd-wrt.com/betas/2019/10-15-2019-r41328/tplink_archer-c7-v2/
        
         | TimSchumann wrote:
         | I'm running OpenWRT and the Archer c7 is on the list of
         | supported devices. I'd say give it a try.
        
       | ChuckNorris89 wrote:
       | _> Although the firmware was released last year (August 2019) as
       | I write this article, it uses an old Linux kernel version (3.3.8)
       | released in 2012 compiled with a very old GCC version (4.6) also
       | from 2012!_
       | 
       | This is what happens when you pay peanuts for embedded devs and
       | outsource development to the cheapest sweatshop you can find so
       | your products can meet a competitive price point.
       | 
       | Sadly this will not change until there's regulation in place to
       | hold manufacturers accountable for their massively obvious
       | vulnerabilities since nobody cares that they're flooding the
       | market with potential botnet hosts when they're overworked, paid
       | miserably and have a manager constantly breathing down their
       | neck.
        
         | bluesign wrote:
         | It is mostly related to drivers to soc, not about paying devs
        
           | scoutt wrote:
           | Exactly. What I see is that the SoC provider just _freezes_
           | everything at a given version and supports just that. For
           | example I am currently building Android 9 on a QCOM SoC with
           | a 4.9 Kernel. I don't think it will receive any future
           | update...
        
           | ChuckNorris89 wrote:
           | So how did OpenWRT manage to build firmware with up to date
           | components for it? The Qualcomm chips inside of it seem
           | fairly modern for such an old kernel.
        
             | prashnts wrote:
             | Note that openwrt has a big community of contributors and
             | not all devices/features are supported. In contrast the
             | manufacturer firmware is at least feature complete and easy
             | for regular users to set up.
        
               | rahuldottech wrote:
               | OpenWrt is also free. Both as free software, and free of
               | cost. When you're paying a manufacturer for a product,
               | surely it's not too much to expect them to ship with
               | functional software that also happens to be up-to-date
               | and secure?
        
               | jschwartzi wrote:
               | You can get that, but not at consumer-grade router
               | prices. I have a separate router that I put behind my
               | stand-alone cable modem. I paid for that separate router
               | about $200.00. And another $100 for the modem. A wifi
               | access point cost me another $100.
               | 
               | So it's about $400.00 for a router that has updated
               | firmware(pfSense). Or you can cheap out and spend only
               | $100.00. This is what you get by doing that.
        
             | bluesign wrote:
             | Because OpenWrt doesn't care if some feature doesn't work,
             | but the OEM should support all features
        
             | mjevans wrote:
             | Support varies, you should purchase devices that include
             | hardware which is supported by the Open Source drivers
             | (even if you have to compromise and it still uses some
             | small blobs that are free to distribute).
             | 
             | You should also purchase a device that includes enough
             | storage space and RAM to support more than the bare
             | minimum; that will help keep things future proof.
        
             | tenebrisalietum wrote:
             | OpenWRT doesn't guarantee support of all hardware. I have a
             | router flashed to a certain version with a newer kernel,
             | and the Wifi doesn't work because of no driver available.
        
         | Thaxll wrote:
         | Most routers run 2.6.32 kernel.
        
           | non-entity wrote:
           | Is there any particular reason for this? Like some feature
           | that was removed in later versions?
        
             | gvb wrote:
             | The primary reason is likely because the hardware (SoC
             | peripherals) drivers were written for 2.6.x and not forward
             | ported to newer versions of the linux kernel. A lot of
             | hardware drivers were (are) written by the hardware (chip)
             | manufacturers and then abandoned.
        
           | ChuckNorris89 wrote:
           | I have the feeling most home routers are designed by the same
           | OEM shop in Shenzhen.
        
       | GEBBL wrote:
       | This is amazing! I've used binwalk extract for 'capture the flag'
       | challenges but I never really thought about the practical
       | applications of it. Wow! Thank you
        
         | LeonM wrote:
         | Funny, I always assumed that there would be no application for
         | binwalk other than for extracting binary firmware images of
         | embedded devices.
         | 
         | Using binwalk for CTF challenges is actually a new insight for
         | me :)
        
           | beefhash wrote:
           | Conversely, it's a convenient tool for obfuscation. You can
           | trigger plausible false positives all over, while also making
           | sure that there's nothing of immediate use with binwalk left.
        
       | commandlinefan wrote:
       | From the output I see:
       | 
       |     23296         0x5B00          LZMA compressed data, properties: 0x5D,
       |                                   dictionary size: 8388608 bytes,
       |                                   uncompressed size: 97476 bytes
       |     64968         0xFDC8          XML document, version: "1.0"
       | 
       | So it looks like the size of the bootloader should be 64968 -
       | 23296 = 41672. But he extracts 41162:
       | 
       |     $ dd if=archer-c7.bin of=u-boot.bin.lzma bs=1 skip=23296 count=41162
       | 
       | Curious if anybody knows why 41162; is this a block-size
       | alignment requirement?
        
         | mrspeaker wrote:
         | I'm wondering how these values are determined too. I'm
         | "following along at home" without any idea what I'm doing
         | (though all the files, bytes, and offsets are matching with the
         | tutorial... Also, if the original author finds this thread:
         | amazing write-up - got me really interested in the topic!).
         | 
         | At the step where they remove the header with
         | 
         |     dd if=uImage of=Image.lzma bs=1 skip=72
         | 
         | it results in a file that, when I try to uncompress it with
         | `unlzma Image.lzma`, fails with "Compressed data is corrupt".
         | 
         | I don't know where the magic number "72" comes from. Is it
         | likely that could be different on my machine (a mac)?
         | 
         | [edit: I think there's something else wrong - if I use
         | `mkimage` to examine the uImage file I only get:
         | 
         |     mkimage -l uImage
         |     GP Header: Size 27051956 LoadAddr 78a267ff
         | 
         | instead of image information]
        
         | zerocrates wrote:
         | The 41162 bytes comes from the preceding uImage header, you'll
         | see it listed in that big description. I'm not sure what the
         | 510 bytes of padding are, though. Just padding? A checksum?
        
           | jschwartzi wrote:
           | Maybe bootloader code?
        
       | JoeAltmaier wrote:
       | Cool tool! I wrote something for reverse-engineering code, as a
       | consultant years ago. They had a radio module but the
       | manufacturer had lost the source code.
       | 
       | So the tool was called Golem. It had tables for defining opcode
       | to assembler pattern matching, that could be written for any
       | machine (instead of just the one I was cracking).
       | 
       | It worked iteratively. You ran it over the binary once, it
       | produced arbitrary labels from jump-points. You could annotate
       | that output by changing the labels to something human-readable
       | (e.g. Loop-back, Main, TimerISR etc) and add comments.
       | 
       | The next iteration would read that back in to build a symbol
       | table, rescan the binary and re-output. But this time it would
       | understand that the symbols were always on opcode boundaries,
       | distinguish data table from code entry points (because you marked
       | them) etc. So it would do a better job of staying in sync with
       | the code.
       | 
       | Once I was done with that project (and had re-compilable source
       | for the radio module) I put it away and never thought of it
       | again.
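The iterative workflow described above (disassemble, rename labels and mark data by hand, feed that back in, disassemble again) can be sketched as a toy. This is not Golem: the opcode table is for a made-up 4-instruction machine, and `disassemble`, `symbols` and `data_ranges` are names invented here to show the feedback loop.

```python
# Hypothetical 1-byte opcode table: opcode -> (mnemonic, operand_bytes, is_jump).
OPCODES = {
    0x00: ("NOP", 0, False),
    0x01: ("LOAD", 1, False),
    0x02: ("JMP", 1, True),  # operand is an absolute target address
    0x03: ("RET", 0, False),
}


def disassemble(code, symbols=None, data_ranges=()):
    """One pass. Returns (listing_lines, symbols) so results can be fed back in."""
    symbols = dict(symbols or {})
    data_addrs = {a for start, end in data_ranges for a in range(start, end)}

    rows = []  # (address, mnemonic, operand, is_jump)
    pc = 0
    while pc < len(code):
        entry = OPCODES.get(code[pc])
        fits = entry is not None and pc + 1 + entry[1] <= len(code)
        if pc in data_addrs or not fits:
            # User-marked data or an undecodable byte: emit it raw.
            rows.append((pc, ".byte", code[pc], False))
            pc += 1
            continue
        mnemonic, operand_bytes, is_jump = entry
        operand = code[pc + 1] if operand_bytes else None
        rows.append((pc, mnemonic, operand, is_jump))
        pc += 1 + operand_bytes

    # Jump targets get an arbitrary label unless the user already named them.
    for _, _, operand, is_jump in rows:
        if is_jump:
            symbols.setdefault(operand, f"L_{operand:04X}")

    lines = []
    for addr, mnemonic, operand, is_jump in rows:
        if addr in symbols:
            lines.append(f"{symbols[addr]}:")
        if operand is None:
            lines.append(f"    {mnemonic}")
        elif is_jump:
            lines.append(f"    {mnemonic} {symbols[operand]}")
        else:
            lines.append(f"    {mnemonic} 0x{operand:02X}")
    return lines, symbols
```

On the bytes `02 03 01 03`, the first pass decodes the data byte at address 2 as a LOAD and swallows the real jump target at address 3, so the label never appears in the listing. Passing `symbols={3: "Done"}` and `data_ranges=[(2, 3)]` on the second pass keeps the decoder in sync and yields a `Done:` label in front of the RET.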
        
         | souprock wrote:
         | You were on your way to cloning IDA Pro, Ghidra, Binary Ninja,
         | or Hopper Disassembler. To varying degrees, sometimes as a pay-
         | extra option, those tools can produce source code.
        
           | JoeAltmaier wrote:
           | Um. I think they post-dated me! But I didn't go anywhere with
           | it.
        
             | souprock wrote:
             | IDA Pro started as a 16-bit MS-DOS program. It's real old.
             | I'm pretty sure I was using it back in 1992, when it was
             | already a well-developed program.
             | 
             | Ghidra is old too, although only recently public. It
             | couldn't be older than Java, which is from 1996.
        
               | JoeAltmaier wrote:
               | Cool. I did mine in 2006. Hey, those have mostly Intel
               | disassemblers. Mine did any machine code you cared to
               | write a dissector for.
               | 
               | Are they iterative? Can you add human clues/cues so they
               | do a better job the next time?
        
               | souprock wrote:
               | They are not at all mostly Intel disassemblers, though
               | some of them have freeware versions (to suppress
               | competition) or time-limited demo versions that are
               | purposely limited. They are very much designed around
               | humans adding clues: you can declare function parameters,
               | struct types, enumerations, and the meaning of various
               | offsets in code. They are interactive GUI tools,
               | continuously updating automated analysis as the user
               | assists by providing clues to the analysis engine. Ghidra
               | and Binary Ninja can be simultaneously multi-user,
               | storing the database on a server for collaboration.
               | 
               | IDA Pro supports dozens of processor architectures. I
               | count about 70, not including model variations and not
               | including community support.
               | https://www.hex-rays.com/products/ida/processors/
               | 
               | Ghidra supports "X86 16/32/64, ARM/AARCH64, PowerPC
               | 32/64/VLE, MIPS 16/32/64/micro, 68xxx, Java / DEX
               | bytecode, PA-RISC, PIC 12/16/17/18/24, Sparc 32/64,
               | CR16C, Z80, 6502, 8051, MSP430, AVR8, AVR32, and variants
               | of these processors."
               | 
               | Binary Ninja officially supports x86, x64, ARMv7, Thumb2,
               | ARMv8, PowerPC, MIPS, 6502. Community support adds AVR,
               | MSP430, and VMNDH-2k12.
               | 
               | Hopper Disassembler supports "x86{16,32,64}, Dalvik, avr,
               | ARM, java, PowerPC, Sparc, MIPS"
        
               | dmitrygr wrote:
               | ida handles any arch
               | 
               | it is interactive (so by definition iterative)
        
       | tasubotadas wrote:
       | I am really surprised that firmware images are not just .tar.gz
       | files renamed to .bin :/. That's how I would have implemented a
       | distribution of new firmware.
        
         | josteink wrote:
         | And how do you partition boot-loaders, kernels, and rootfs and
         | such in that tar.gz?
         | 
         | Embedded device will be hard coded to look at a fixed point and
         | start booting from there, there's no UEFI. How will you ensure
         | boot-loaders get unpacked precisely where they need to be?
         | 
         | And that doesn't even touch the idea of having a _router_
         | understand a file system before any firmware code is loaded.
         | 
         | Routers really are quite different from PCs.
        
           | tenebrisalietum wrote:
           | I think firmware images are typically not the fixed ROM code
           | the CPU first encounters upon startup, even if they contain
           | U-Boot. Especially if stored in NAND flash they probably
           | aren't.
           | 
           | On the AR7 platform, for example, the MIPS core runs a small
           | ROM that initializes RAM, then reads some blocks from flash.
           | Not sure how much code you'd need to unpack a tar.gz, but
           | it's completely possible.
        
           | bshipp wrote:
           | True enough, but I think they used to be even more unique and
           | over time they've become more like PCs.
           | 
           | One of these days I'm going to log in to the admin interface
           | and find candy crush installed.
        
             | vlovich123 wrote:
             | They're "like PCs" in the sense that the instruction set of
             | the CPUs has caught up and in theory you can attach more
             | complicated peripherals. However, unless your embedded
             | product has MMC flash attached (for many applications it
             | doesn't due to cost + physical size) you're SOL for the
             | following reasons:
             | 
             | 1. For M4s your storage is typically some kind of SPI flash
             |    which doesn't act like the traditional desktop flash
             |    you're dealing with. You have to manually specify the
             |    address you're reading/writing & you have to do it on
             |    block boundaries (multiple KB). You're generally looking
             |    at 8-64MB.
             | 2. For M0 your storage is typically flash built-in with
             |    potentially even more restrictions.
             | 3. These devices have _very_ little RAM. Decompression means
             |    you have to have a way of enforcing constraints on the
             |    amount of space you'll need. Aside from the space needed
             |    regularly for decompression you may need to buffer the
             |    decompressed content in-memory to align with block
             |    boundaries.
             | 
             | All of this means development time, increased costs & risk
             | for something you may not be able to pull off.
             | 
             | If your vendor actually internally compresses their image
             | then great but generally they don't for all the same
             | reasons (+ sometimes this is touching ROM code in the
             | chip).
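The buffering problem in point 3 can be illustrated with a short sketch: decompress in small bounded steps and only ever hand whole blocks to the flash writer. Python stands in for what would be C on a real microcontroller. `write_block` is a hypothetical callback, the zlib format is chosen only because it is in the standard library, and padding the last block with 0xFF (the usual erased-flash value) is an assumption.

```python
import zlib


def stream_to_blocks(compressed_chunks, write_block, block_size=4096,
                     max_out=512, pad_byte=b"\xff"):
    """Decompress a zlib stream chunk by chunk, emitting block-aligned writes.

    write_block(index, data) is called with exactly block_size bytes each time.
    Returns the number of blocks written.
    """
    decoder = zlib.decompressobj()
    pending = bytearray()  # decompressed bytes not yet forming a whole block
    written = 0

    def drain(final=False):
        nonlocal written
        while len(pending) >= block_size:
            write_block(written, bytes(pending[:block_size]))
            del pending[:block_size]
            written += 1
        if final and pending:
            padding = pad_byte * (block_size - len(pending))
            write_block(written, bytes(pending) + padding)
            pending.clear()
            written += 1

    for chunk in compressed_chunks:
        data = chunk
        while True:
            # max_out caps how much output one call may produce.
            out = decoder.decompress(data, max_out)
            pending.extend(out)
            drain()
            data = decoder.unconsumed_tail
            if not data and len(out) < max_out:
                break

    pending.extend(decoder.flush())
    drain(final=True)
    return written
```

During the main loop the pending buffer stays below `block_size + max_out` bytes, which is the kind of hard bound a RAM-constrained device needs. The decompressor's own window (up to 32 KB for deflate) comes on top of that, which is part of the cost the comment is pointing at.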
        
           | monocasa wrote:
           | > And how do you partition boot-loaders, kernels, and rootfs
           | and such in that tar.gz?
           | 
           | In the past, each of those would be a separate MTD partition
           | with a separate device file. You just dd them over those
           | files.
        
       | andrewshadura wrote:
       | Another similar tool to look at is Hachoir.
        
       ___________________________________________________________________
       (page generated 2020-02-06 23:00 UTC)