[HN Gopher] What's the Most Portable Way to Include Binary Blobs...
       ___________________________________________________________________
        
       What's the Most Portable Way to Include Binary Blobs in an
       Executable?
        
       Author : Tomte
       Score  : 33 points
       Date   : 2022-07-25 09:15 UTC (1 days ago)
        
 (HTM) web link (tratt.net)
 (TXT) w3m dump (tratt.net)
        
       | DethNinja wrote:
       | Assuming binary blob is relatively small:
       | 
       | Just generate the data from a template and store it as a byte
       | array in the language of your choice.
       | 
       | For example, if you are using C/C++, you can zip everything and
       | then use a small Python script to generate a C/C++ header where
       | this data is available as a uint8_t array.
       | 
       | Keep in mind that all this data will be loaded into memory, so I
       | don't recommend this approach for anything north of 10 MB.
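       | 
       | A minimal sketch of such a generator script, assuming the blob
       | is non-empty and fits in memory. The function name
       | make_c_header and the <name>_len companion constant are
       | hypothetical, chosen only for illustration:

```python
def make_c_header(data: bytes, array_name: str = "blob") -> str:
    """Render `data` as a C header defining a uint8_t array and its length."""
    if not data:
        # ISO C has no zero-length arrays, so refuse empty input.
        raise ValueError("cannot embed an empty blob")
    guard = f"{array_name.upper()}_H"
    lines = [
        f"#ifndef {guard}",
        f"#define {guard}",
        "#include <stddef.h>",
        "#include <stdint.h>",
        "",
        f"static const uint8_t {array_name}[] = {{",
    ]
    # Emit 12 bytes per row to keep the generated lines short.
    for offset in range(0, len(data), 12):
        row = ", ".join(f"0x{byte:02x}" for byte in data[offset:offset + 12])
        lines.append(f"    {row},")
    lines += [
        "};",
        f"static const size_t {array_name}_len = {len(data)};",
        "",
        f"#endif /* {guard} */",
    ]
    return "\n".join(lines) + "\n"
```

       | Write the returned string to a file such as blob.h and include
       | it. Because the array is declared static, every translation
       | unit that includes the header gets its own copy, so include it
       | from a single source file.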
        
         | kazinator wrote:
         | On a modern VM system, the static initialized data will be
         | mapped to memory, not loaded. So you have to worry about its
         | virtual footprint, not physical memory use.
        
       | kelseyfrog wrote:
       | https://thephd.dev/finally-embed-in-c23
        
       | jll29 wrote:
       | Here's a standalone (and Rust-implemented) version similar to xxd
       | (if you don't like the vim dependency):
       | https://github.com/jochenleidner/ltools/blob/main/src/bin/bi...
       | 
       | What I found is that many compilers don't like to compile very
       | large source files; so if the binaries you'd like to integrate
       | are big, it might be better to integrate their constituent
       | objects one by one (if applicable).
        
       | tomn wrote:
       | My colleague wrote this solution for C++ and cmake:
       | 
       | https://github.com/ebu/libear/commit/40a4000296190c3f91eba79...
       | 
       | This is a cmake function which generates C++ files using no
       | external tools. It's probably not very fast, but if you don't
       | need to handle big files and are already using cmake this is easy
       | to integrate, adds no dependencies and works on all platforms.
        
       | jreese wrote:
       | Make a ZIP file containing the blob, and catenate it to the end
       | of the executable binary. The ZIP format specifically puts all of
       | the key metadata at the back of the file, so pretty much any ZIP
       | tool can correctly read/list/extract data from the ZIP portion of
       | the file. Anything that needs to be linked at runtime can just be
       | extracted to a temp dir, and then cleaned up on exit. Bonus
       | points for getting "free" compression on text data blobs.
       | 
       | We do this for Python applications, by combining a ZIP containing
       | the "link tree" of sources/packages/modules, with a shell
       | bootstrap script that automatically sets up the environment,
       | import path, etc, and Python itself has built in support for
       | importing pure-python modules from a ZIP file. All that's needed
       | for native modules is a simple import hook that extracts the
       | native objects into temp space and then loads them appropriately.
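       | 
       | A rough sketch of the idea using Python's standard zipfile
       | module, which locates the archive from the end of the file and
       | so tolerates leading non-ZIP bytes. The helper names
       | append_zip and read_member are hypothetical, and the
       | "executable" here is just a byte string standing in for a real
       | binary:

```python
import io
import zipfile


def append_zip(executable: bytes, files: dict) -> bytes:
    """Return `executable` with a ZIP archive of `files` concatenated after it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return executable + buffer.getvalue()


def read_member(combined: bytes, name: str) -> bytes:
    """Read one member back out of the combined image."""
    # zipfile finds the end-of-central-directory record at the back of the
    # data and compensates for the bytes that precede the archive.
    with zipfile.ZipFile(io.BytesIO(combined)) as archive:
        return archive.read(name)
```

       | At runtime the program would open its own file and pass that to
       | zipfile.ZipFile instead of an in-memory buffer.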
        
         | deivid wrote:
         | Just in case you are unaware, take a look at shiv:
         | https://github.com/linkedin/shiv which does this quite neatly
        
       | mmastrac wrote:
       | One missing approach is just appending the binary data to the end
       | of the file, and then reading the resource from /proc/self/exe on
       | Linux (or the equivalents on Mac and Windows).
       | 
       | It's not "portable" per se, but all modern platforms [1] have a
       | way to interrogate the binary contents of the currently-running
       | executable.
       | 
       | [1] _NSGetExecutablePath, GetModuleFileName(), getexecname() etc
       | 
       | EDIT: Apparently https://github.com/gpakosz/whereami will manage
       | a lot of this complexity for you
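       | 
       | One way to make the appended data findable is a fixed-size
       | trailer at the very end of the file. The sketch below is an
       | assumption, not something from the article: the 8-byte MAGIC
       | value, the trailer layout, and both function names are made up
       | for illustration.

```python
import struct

MAGIC = b"BLOBv1\x00\x00"        # hypothetical 8-byte marker
TRAILER = struct.Struct("<8sQ")  # marker + payload length, little-endian


def append_payload(image: bytes, payload: bytes) -> bytes:
    """Return image + payload + trailer describing the payload."""
    return image + payload + TRAILER.pack(MAGIC, len(payload))


def extract_payload(image: bytes):
    """Return the appended payload, or None if no valid trailer is present."""
    if len(image) < TRAILER.size:
        return None
    magic, length = TRAILER.unpack(image[-TRAILER.size:])
    end = len(image) - TRAILER.size
    if magic != MAGIC or length > end:
        return None
    return image[end - length:end]
```

       | A native program on Linux would read the bytes from
       | /proc/self/exe (or the path returned by the platform call from
       | [1]) and run the same trailer check on them.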
        
         | anyfoo wrote:
         | Don't do this, it's ugly and relies on assumptions that aren't
         | true. I haven't checked each spec, but it is very unlikely that
          | your ELF/Mach-O/PE/... is still valid with added junk at the
         | end. You may try it out and it may work, but that is true for
         | many things that may come back to bite you (or others) in
         | spectacular ways.
        
           | dmitrygr wrote:
           | > it is very unlikely that your ELF/mach-O/PE/... is still
           | valid with added junk at the end.
           | 
           | I've written loaders for all of the executable formats you
           | mentioned, and maybe a dozen more. I know of none where this
           | would violate the strict interpretation of the word of the
           | spec.
           | 
           | That being said, valid file != happy OS
        
             | anyfoo wrote:
             | Agreed. As above: It may for example run, but not be
             | accepted by other parts of the OS (as evidenced).
        
           | fabian2k wrote:
           | I'd be interested in any example where this approach would
           | produce an invalid executable. I have used this without
           | issues, but of course I have certainly not tried this in
           | every possible environment.
        
             | anyfoo wrote:
              | Computing history is chock full of examples where something
              | "seems to work" but is actually invalid (and a Mach-O
             | treated that way would be invalid [EDIT: or just "not
             | accepted" by some parts of the system, see below], whether
             | it runs or not), and then Raymond Chen has to write a blog
             | post about it decades later. Here's just one out of many as
              | a random example:
              | https://devblogs.microsoft.com/oldnewthing/20041026-00/?p=37...
             | 
             | Back to this particular case, the binary will fail strict
             | code signing validation on macOS. It may still _run_
             | because the kernel does not access the binary past the
             | coverage of the code signature (and all the bits there are
             | still intact), similar to how multiarch binaries work, but
              | you will at least be severely hampered in distributing your
              | binary, since Gatekeeper won't be happy either.
        
         | naasking wrote:
         | And on microcontrollers where embedded binaries are essential?
        
           | duskwuff wrote:
           | Most microcontrollers run code directly from flash memory --
           | there's no "executable file" (or, indeed, any files) involved
           | at all.
        
       | kazinator wrote:
       | In TXR Lisp, I did this:
       | 
       | https://www.nongnu.org/txr/txr-manpage.html#N-0389D15E
       | 
       | There is a 128-byte area prefixed by the character sequence
       | @(txr):. It normally contains all zeros (empty null-terminated
       | string). If you put a non-empty UTF-8 string there, it gets
       | executed.
       | 
       | Of course, the problem of including a binary blob is trivial if
       | it can just be declared as an array; the interesting problem is
       | doing it to the executable, without doing any compiling or
       | linking.
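       | 
       | A sketch of what patching such a reserved area could look like
       | from the outside. It assumes the 128 bytes follow the marker
       | directly and that the string must stay NUL-terminated; the
       | function name patch_reserved_area is hypothetical and this is
       | not TXR's own tooling:

```python
MARKER = b"@(txr):"
AREA_SIZE = 128  # size of the reserved area, assumed to follow the marker


def patch_reserved_area(image: bytes, text: str) -> bytes:
    """Return a copy of `image` with `text` written into the reserved area."""
    encoded = text.encode("utf-8")
    if len(encoded) >= AREA_SIZE:
        # Keep at least one trailing NUL so the string stays terminated.
        raise ValueError("text does not fit in the reserved area")
    position = image.find(MARKER)
    if position < 0:
        raise ValueError("marker not found in image")
    start = position + len(MARKER)
    if start + AREA_SIZE > len(image):
        raise ValueError("reserved area is truncated")
    area = encoded.ljust(AREA_SIZE, b"\x00")
    # Same length in, same length out: nothing else in the file moves.
    return image[:start] + area + image[start + AREA_SIZE:]
```

       | Because the file size and every other offset stay unchanged, no
       | compiler or linker is involved.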
        
       | branon wrote:
       | Something like https://justine.lol/ape.html perhaps?
        
       | mrlonglong wrote:
       | C23 will soon have the #embed directive to include such blobs.
       | This will ease portability concerns.
        
       | avrionov wrote:
       | This was discussed a few days ago "Embed is in C23":
       | 
       | https://news.ycombinator.com/item?id=32201951
       | 
       | C++ has a proposal for "std::embed":
       | https://open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1040r6...
        
       | ghoward wrote:
       | The answer to this is easy. At least it was for me; I didn't know
       | it was such a problem.
       | 
       | My solution is [1]. It generates a C file with a specific array
       | name passed in on the command line. It also has a few other
       | niceties that I need.
       | 
       | It works on Windows, Mac OS X, Linux, and the BSDs, no matter the
       | compiler or linker.
       | 
       | I use it to generate the arrays for the help texts ([2] and [3]),
       | as well as two math libraries ([4] and [5]).
       | 
       | People are welcome to adopt and adapt it. Just follow the
       | license, as per usual. I've even adapted it for my other
       | software. [6]
       | 
       | [1]:
       | https://git.yzena.com/gavin/bc/src/branch/master/gen/strgen....
       | 
       | [2]:
       | https://git.yzena.com/gavin/bc/src/branch/master/gen/bc_help...
       | 
       | [3]:
       | https://git.yzena.com/gavin/bc/src/branch/master/gen/dc_help...
       | 
       | [4]: https://git.yzena.com/gavin/bc/src/branch/master/gen/lib.bc
       | 
       | [5]: https://git.yzena.com/gavin/bc/src/branch/master/gen/lib2.bc
       | 
       | [6]:
       | https://git.yzena.com/Yzena/Yc/src/branch/master/tests/strge...
        
         | ufo wrote:
         | How do you feel about the problem the linked blog post
         | mentioned, of this being slow for large blobs, particularly
         | when compiling with Clang?
        
           | hikarudo wrote:
           | You can split it up into several files, then concatenate the
           | arrays at runtime.
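           | 
           | A sketch of the splitting step, assuming a generator that
           | emits one small C file per chunk plus an index file listing
           | the parts in order. The function name and the
           | <prefix>_parts / <prefix>_part_sizes symbols are hypothetical:

```python
def split_into_c_sources(data: bytes, chunk_size: int, prefix: str = "blob") -> dict:
    """Return {file name: C source}: one array per chunk plus an index table."""
    if not data or chunk_size <= 0:
        raise ValueError("need non-empty data and a positive chunk size")
    sources = {}
    parts = []  # (symbol name, chunk length), in order
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        name = f"{prefix}_part{index}"
        body = ", ".join(str(byte) for byte in chunk)
        sources[f"{name}.c"] = (
            f"const unsigned char {name}[{len(chunk)}] = {{ {body} }};\n"
        )
        parts.append((name, len(chunk)))

    # The index file lets the program walk the parts without knowing the count.
    lines = ["#include <stddef.h>"]
    lines += [f"extern const unsigned char {n}[{size}];" for n, size in parts]
    pointers = ", ".join(n for n, _ in parts)
    sizes = ", ".join(str(size) for _, size in parts)
    lines.append(f"const unsigned char *const {prefix}_parts[] = {{ {pointers} }};")
    lines.append(f"const size_t {prefix}_part_sizes[] = {{ {sizes} }};")
    lines.append(f"const size_t {prefix}_part_count = {len(parts)};")
    sources[f"{prefix}_index.c"] = "\n".join(lines) + "\n"
    return sources
```

           | Each part compiles separately (and in parallel), and the
           | program can copy the parts into one buffer at startup or
           | just iterate over them in place. The generated files are
           | meant to be compiled as C, where const file-scope arrays
           | have external linkage.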
        
       | vgel wrote:
       | If you're already using Clang (and thus LLVM & its platform
       | constraints), I wonder if the best way would be to link in a tiny
       | Rust / Zig `.o` using `include_bytes!` / `@embedFile`...
        
       | kevin_thibedeau wrote:
       | This is broken:
       | 
       |     .incbin "string_blob.txt"
       |     ...
       |     printf("%s\n", string_blob);
       | 
       | Text files don't have to be NUL-terminated. The proper way
       | to embed data with the .incbin directive is to add a label after
       | the file and use that directly for pointer arithmetic or compute
       | the size with another assembly directive.
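       | 
       | A sketch of a small generator for such an assembly stub,
       | assuming GNU as syntax on an ELF target (other assemblers and
       | Mach-O need different section names and symbol prefixes). The
       | function name incbin_stub and the <symbol>_end label are
       | hypothetical:

```python
def incbin_stub(path: str, symbol: str) -> str:
    """Return GNU as source that embeds `path` between two global labels."""
    return "\n".join([
        "    .section .rodata",
        f"    .global {symbol}",
        f"    .global {symbol}_end",
        f"{symbol}:",
        f'    .incbin "{path}"',
        f"{symbol}_end:",
        # Extra NUL after the end label: the data can be printed as a C
        # string, while {symbol}_end - {symbol} still gives the true size.
        "    .byte 0",
        "",
    ])
```

       | On the C side, both labels can be declared as extern const char
       | arrays, and the size is the difference between the two
       | addresses.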
        
         | kelnos wrote:
         | In this case it works because the author explicitly put a NUL
         | at the end of the string in the text file. I don't think the
         | author was trying to suggest that you can do this with
         | arbitrary data.
        
         | jandrese wrote:
         | Couldn't you follow up the .incbin statement with a .const 0 or
         | something similar?
        
       | ncmncm wrote:
       | ELF provides for any number of different kinds of "section" that
       | you can have automatically mapped into your address space at
       | startup. You just need a way for your program to know where it
       | is. There are lots of different ways to get that.
        
         | titzer wrote:
         | Yes, but the article was mostly about which tools you use to
         | get that section into the ELF.
        
       | jviotti wrote:
       | My team is working on this problem in the context of creating
       | Node.js single-executable applications. While the naive approach
       | of just appending data at the end of the binary works, it does
       | not play well with code signing on macOS and Windows, given that
       | signing operates on PE and Mach-O sections.
       | 
       | We have recently open-sourced a small tool called Postject
       | (https://github.com/postmanlabs/postject), which is able to
       | inject arbitrary data as proper ELF/Mach-O/PE sections for all
       | major operating systems (with AIX support coming). The tool also
       | provides cross-platform C/C++ headers for easily traversing the
       | final binary and checking whether the segment is present.
       | 
       | The tool is based on the LIEF
       | (https://github.com/lief-project/LIEF) project.
       | 
       | At Postman, we are making use of this on our custom Node.js
       | single-executable applications and soon on our custom Electron.js
       | builds too.
        
       ___________________________________________________________________
       (page generated 2022-07-26 23:00 UTC)