[HN Gopher] What's the Most Portable Way to Include Binary Blobs... ___________________________________________________________________ What's the Most Portable Way to Include Binary Blobs in an Executable? Author : Tomte Score : 33 points Date : 2022-07-25 09:15 UTC (1 day ago) Link : tratt.net | DethNinja wrote: | Assuming the binary blob is relatively small: | | Just generate source code from a template and store the data as a byte array in the | language of your choice. | | For example, if you are using C/C++, you can zip everything, then | use a small Python script to generate a C/C++ header where this | data is available as a uint8_t array. | | Keep in mind that all this data will be loaded into memory, so I | don't recommend this approach for anything north of 10 MB. | kazinator wrote: | On a modern virtual-memory system, statically initialized data will be | mapped into memory, not loaded. So you have to worry about its | virtual footprint, not physical memory use. | kelseyfrog wrote: | https://thephd.dev/finally-embed-in-c23 | jll29 wrote: | Here's a standalone (and Rust-implemented) version similar to xxd | (if you don't like the vim dependency): | https://github.com/jochenleidner/ltools/blob/main/src/bin/bi... | | What I found is that many compilers don't like to compile very | large source files; so if the binaries you'd like to integrate | are big, it might be better to integrate their constituent | objects one by one (if applicable). | tomn wrote: | My colleague wrote this solution for C++ and CMake: | | https://github.com/ebu/libear/commit/40a4000296190c3f91eba79... | | This is a CMake function which generates C++ files using no | external tools. It's probably not very fast, but if you don't | need to handle big files and are already using CMake, this is easy | to integrate, adds no dependencies, and works on all platforms. | jreese wrote: | Make a ZIP file containing the blob, and catenate it to the end | of the executable binary.
The ZIP format specifically puts all of | the key metadata at the back of the file, so pretty much any ZIP | tool can correctly read/list/extract data from the ZIP portion of | the file. Anything that needs to be linked at runtime can just be | extracted to a temp dir, and then cleaned up on exit. Bonus | points for getting "free" compression on text data blobs. | | We do this for Python applications, by combining a ZIP containing | the "link tree" of sources/packages/modules with a shell | bootstrap script that automatically sets up the environment, | import path, etc., and Python itself has built-in support for | importing pure-Python modules from a ZIP file. All that's needed | for native modules is a simple import hook that extracts the | native objects into temp space and then loads them appropriately. | deivid wrote: | Just in case you are unaware, take a look at shiv: | https://github.com/linkedin/shiv which does this quite neatly. | mmastrac wrote: | One missing approach is just appending the binary data to the end | of the file, and then reading the resource from /proc/self/exe on | Linux (or the equivalents on Mac and Windows). | | It's not "portable" per se, but all modern platforms [1] have a | way to interrogate the binary contents of the currently-running | executable. | | [1] _NSGetExecutablePath, GetModuleFileName(), getexecname(), etc. | | EDIT: Apparently https://github.com/gpakosz/whereami will manage | a lot of this complexity for you. | anyfoo wrote: | Don't do this, it's ugly and relies on assumptions that aren't | true. I haven't checked each spec, but it is very unlikely that | your ELF/Mach-O/PE/... is still valid with added junk at the | end.
| | I've written loaders for all of the executable formats you | mentioned, and maybe a dozen more. I know of none where this | would violate a strict reading of the | spec. | | That being said, valid file != happy OS. | anyfoo wrote: | Agreed. As above: it may, for example, run, but not be | accepted by other parts of the OS (as evidenced). | fabian2k wrote: | I'd be interested in any example where this approach would | produce an invalid executable. I have used this without | issues, but I have certainly not tried this in | every possible environment. | anyfoo wrote: | Computing history is chock full of examples where something | "seems to work" but is actually invalid (and a Mach-O | treated that way would be invalid [EDIT: or just "not | accepted" by some parts of the system, see below], whether | it runs or not), and then Raymond Chen has to write a blog | post about it decades later. Here's just one out of many as | a random example: https://devblogs.microsoft.com/oldnewthing/20041026-00/?p=37... | | Back to this particular case, the binary will fail strict | code signing validation on macOS. It may still _run_ | because the kernel does not access the binary past the | coverage of the code signature (and all the bits there are | still intact), similar to how multiarch binaries work, but | you will at least be severely hampered in distributing your | binary, since Gatekeeper won't be happy either. | naasking wrote: | And what about microcontrollers, where embedded binaries are essential? | duskwuff wrote: | Most microcontrollers run code directly from flash memory -- | there's no "executable file" (or, indeed, any files) involved | at all. | kazinator wrote: | In TXR Lisp, I did this: | | https://www.nongnu.org/txr/txr-manpage.html#N-0389D15E | | There is a 128-byte area prefixed by the character sequence | @(txr):. It normally contains all zeros (an empty null-terminated | string).
If you put a non-empty UTF-8 string there, it gets | executed. | | Of course, the problem of including a binary blob is trivial if | it can just be declared as an array; the interesting problem is | doing it to an already-built executable, without any compiling or | linking. | branon wrote: | Something like https://justine.lol/ape.html perhaps? | mrlonglong wrote: | C23 will soon have the #embed directive to include such blobs. | This will ease portability concerns. | avrionov wrote: | This was discussed a few days ago in "Embed is in C23": | | https://news.ycombinator.com/item?id=32201951 | | For C++, "std::embed" has been proposed (P1040): https://open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1040r6... | ghoward wrote: | The answer to this is easy. At least it was for me; I didn't know | it was such a problem. | | My solution is [1]. It generates a C file with a specific array | name passed in on the command line. It also has a few other | niceties that I need. | | It works on Windows, Mac OS X, Linux, and the BSDs, no matter the | compiler or linker. | | I use it to generate the arrays for the help texts ([2] and [3]), | as well as two math libraries ([4] and [5]). | | People are welcome to adopt and adapt it. Just follow the | license, as usual. I've even adapted it to my other software. | [6] | | [1]: | https://git.yzena.com/gavin/bc/src/branch/master/gen/strgen.... | | [2]: | https://git.yzena.com/gavin/bc/src/branch/master/gen/bc_help... | | [3]: | https://git.yzena.com/gavin/bc/src/branch/master/gen/dc_help... | | [4]: https://git.yzena.com/gavin/bc/src/branch/master/gen/lib.bc | | [5]: https://git.yzena.com/gavin/bc/src/branch/master/gen/lib2.bc | | [6]: | https://git.yzena.com/Yzena/Yc/src/branch/master/tests/strge... | ufo wrote: | How do you feel about the problem the linked blog post | mentioned, of this being slow for large blobs, particularly when | compiling with Clang? | hikarudo wrote: | You can split it up into several files, then concatenate the | arrays at runtime.
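The build-time generator that DethNinja, jll29, and ghoward describe (read a file, write it out as a C array, compile that) can be written in C itself, so the build needs no Python, vim, or other extra tool. This is a minimal sketch, not any commenter's actual tool: `emit_c_array` and `IDENT_CHARS` are hypothetical names, the output layout is arbitrary, and the length is emitted as a separate `NAME_len` variable so binary data never depends on a trailing NUL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Characters allowed in the generated C identifier (hypothetical helper). */
#define IDENT_CHARS \
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"

/* Hypothetical helper, not taken from any tool linked in this thread:
 * copy every byte of `in` into C source written to `out`, as
 *     const unsigned char NAME[] = { 0x.., ... };
 *     const size_t NAME_len = N;
 * Returns the number of bytes embedded, or -1 on a bad name or read error. */
long emit_c_array(FILE *in, FILE *out, const char *name)
{
    long n = 0;
    int c;

    assert(in != NULL && out != NULL && name != NULL);
    /* The name is pasted into C source, so it must be a plain identifier. */
    if (name[0] == '\0' || isdigit((unsigned char)name[0]) ||
        strspn(name, IDENT_CHARS) != strlen(name))
        return -1;

    fprintf(out, "#include <stddef.h>\n");
    fprintf(out, "const unsigned char %s[] = {", name);
    while ((c = fgetc(in)) != EOF) {
        if (n % 12 == 0)            /* start a new row every 12 bytes */
            fprintf(out, "\n    ");
        fprintf(out, "0x%02x, ", (unsigned)c);
        n++;
    }
    if (ferror(in))
        return -1;
    /* An empty input would give a zero-length array, which C does not allow. */
    if (n == 0)
        fprintf(out, "0");
    fprintf(out, "\n};\n");
    /* Record the real length explicitly instead of relying on a trailing NUL. */
    fprintf(out, "const size_t %s_len = %ld;\n", name, n);
    return n;
}
```

A small `main` that opens the input with `fopen(path, "rb")` and calls `emit_c_array(in, stdout, "help_text")` would turn this into a build step. It does nothing about ufo's concern: very large generated arrays are still slow to compile, so hikarudo's workaround (several smaller arrays joined at runtime) or `#embed` would have to be layered on top.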
| vgel wrote: | If you're already using Clang (and thus LLVM & its platform | constraints), I wonder if the best way would be to link in a tiny | Rust / Zig `.o` using `include_bytes!` / `@embedFile`... | kevin_thibedeau wrote: | This is broken: | |   .incbin "string_blob.txt" |   ... |   printf("%s\n", string_blob); | | Text files aren't necessarily NUL-terminated. The proper way | to embed data with the .incbin directive is to add a label after | the file and use that directly for pointer arithmetic, or to compute | the size with another assembler directive. | kelnos wrote: | In this case it works because the author explicitly put a NUL | at the end of the string in the text file. I don't think the | author was trying to suggest that you can do this with | arbitrary data. | jandrese wrote: | Couldn't you follow up the .incbin statement with a .byte 0 or | something similar? | ncmncm wrote: | ELF provides for any number of different kinds of "section" that | you can have automatically mapped into your address space at | startup. You just need a way for your program to know where it | is. There are lots of different ways to get that. | titzer wrote: | Yes, but the article was mostly about what tools you use to | get that section into the ELF. | jviotti wrote: | My team is working on this problem in the context of creating | Node.js single-executable applications. While the naive approach | of just appending data to the end of the binary works, it does not | play well with code signing on macOS and Windows, given that | signing operates on Mach-O and PE sections. | | We have recently open-sourced a small tool called Postject | (https://github.com/postmanlabs/postject), which is able to | inject arbitrary data as proper ELF/Mach-O/PE sections for all | major operating systems (with AIX support coming). The tool also | provides C/C++ cross-platform headers for easily traversing the | final binary and introspecting whether the segment is present or | not.
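kevin_thibedeau's fix (bracket the `.incbin` data with labels and take the size from them) and jandrese's terminator idea can be combined. This is a minimal sketch, assuming GCC or Clang targeting ELF (e.g. Linux) with GNU assembler syntax; `blob_start`, `blob_end`, `blob_size`, and `print_blob_text` are hypothetical names. So that the demo needs no separate input file, it embeds its own source through `__FILE__`; in real use that string would be the blob's path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: ELF target, GNU assembler syntax (GCC or Clang).
 * blob_start/blob_end bracket the included bytes; the extra .byte 0 placed
 * after blob_end is a terminator that is not counted in blob_size().
 * The demo embeds this very source file via __FILE__; substitute the
 * blob's path in real use. */
__asm__(
    ".pushsection .rodata\n"
    ".global blob_start\n"
    ".global blob_end\n"
    "blob_start:\n"
    ".incbin \"" __FILE__ "\"\n"
    "blob_end:\n"
    ".byte 0\n"
    ".popsection\n");

extern const unsigned char blob_start[];
extern const unsigned char blob_end[];

/* Size comes from label arithmetic, not from searching for a NUL.
 * Converting to uintptr_t avoids subtracting pointers to distinct objects. */
size_t blob_size(void)
{
    return (size_t)((uintptr_t)blob_end - (uintptr_t)blob_start);
}

/* Print the blob as text; only valid if it has no NUL before the terminator. */
void print_blob_text(void)
{
    assert(memchr(blob_start, '\0', blob_size()) == NULL);
    fputs((const char *)blob_start, stdout);
}
```

Placing `blob_end` before the `.byte 0` keeps both uses honest: binary consumers get an exact length, while text consumers get a terminator without having to store one in the file, which is what kelnos notes the article's author did. Mach-O and PE need different section directives (and Mach-O prefixes C symbols with an underscore), so this block is not portable as written.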
| | The tool is based on the LIEF (https://github.com/lief-project/LIEF) project. | | At Postman, we are making use of this on our custom Node.js | single-executable applications and soon on our custom Electron.js | builds too. ___________________________________________________________________ (page generated 2022-07-26 23:00 UTC)