[HN Gopher] C Portability Lessons from Weird Machines
       ___________________________________________________________________
        
       C Portability Lessons from Weird Machines
        
       Author : rsecora
       Score  : 102 points
       Date   : 2022-02-21 16:45 UTC (6 hours ago)
        
 (HTM) web link (begriffs.com)
 (TXT) w3m dump (begriffs.com)
        
       | zokier wrote:
       | I don't think there is anything wrong in writing platform-
       | specific code; in certain circles there is this weird
        | fetishization of portability, placing it on the highest
       | pedestal as a metric of quality. This happens in C programming
       | and also for example in shell scripting, people advocating for
       | relying only on POSIX-defined behavior. If a platform specific
       | way of doing something works better in some use-case then there
       | should be no shame in using that. What is important is that the
       | code relies on well-defined behavior, and also that the platform
       | assumptions/requirements are documented to a degree.
       | 
       | Of course it is wonderful that you can make programs that are
       | indeed portable between this huge range of computers; just that
       | not every program needs to do so.
        
         | Filligree wrote:
         | > Of course it is wonderful that you can make programs that are
         | indeed portable between this huge range of computers; just that
         | not every program needs to do so.
         | 
         | Isn't most code that would behave differently on different
         | architectures subject to undefined behaviour, however? The
         | signed overflow case mentioned, for example.
         | 
         | Sure, some of it is implementation-defined, but in practice you
         | need to write ultra-portable code anyway in order for your
         | compiler not to pull the rug out underneath you.
        
         | LeifCarrotson wrote:
         | > If a platform specific way of doing something works better in
         | some use-case then there should be no shame in using that.
         | 
         | I fully agree, but the real problem is not the limitation to
         | platform-specific logic but the conflation of fundamental
         | requirements and incidental requirements. There's no good way
         | to know when you meant "int" as just a counter of a user's
         | files, or "int" as in a 32-bit bitmask, or "int" as the value
         | of a pointer. For the former, it probably doesn't matter if
         | someone later compiles it for a different architecture, for the
         | latter, if you mean int32_t or uintptr_t, use that!
        
         | ithinkso wrote:
         | I love how the concept of 'platform independent' evolved - you
         | would think that it means you can run it anywhere but almost
         | all software that uses 'platform-independent' code is very
         | platform-dependent
         | 
         | Because if you make an Electron app it's only logical that
         | because it is platform-independent it can only be run on macOS
        
         | xemdetia wrote:
         | I would say most _modern_ code is not the kind of code written
         | with the absurd platform assumptions that old code used to do.
          | There are no magic addresses you have to know to talk to
          | hardware, there's no implicit memory segmentation/memory
          | looping, and so on. Ever since modern OSs started preventing
          | direct access and randomizing address spaces, it's just hard
          | to write code as insane as we used to.
         | 
          | So the question is: are you contesting the POSIX-defined
          | behaviour over more logical interfaces from the OS, the wild
          | west where people were hacking around broken platform-
          | specific features and often broken or awkward system
          | libraries, or just the more modern case where people use the
          | standard interface instead of a more performant one? In the
          | latter case I agree; I wish there were a nicer way of doing
          | dynamic and efficient feature detection without making
          | simple C programs crazy complex.
        
           | Someone wrote:
           | I think it also helps that modern hardware is a lot less
           | diverse. Most of the tools only run on systems that are
           | little endian, where _NULL_ is zero, chars are 8 bits, ints
           | are two's complement and silently wrap around, floats are 32
           | bits IEEE 754, etc, so code that erroneously assumes those to
           | be true in theory isn't portable, but in practice is.
           | 
            | And newer C standards may even unbreak such code. Ints
           | already are two's complement in the latest version of C++ and
           | will be in the next version of C, for example.
        
         | thesuperbigfrog wrote:
          | >> in certain circles there is this weird fetishization of
         | portability, placing it on the highest pedestal as a metric of
         | quality.
         | 
          | It's not a fetish if you have ever ported legacy code that was
          | not written with portability in mind.
         | 
         | "ints are always 32-bit and can be used to store pointers with
         | a simple cast" may have worked when the legacy program was
         | written, but it sure makes porting it a pain.
        
           | nyanpasu64 wrote:
           | And `unsigned long` can store pointers just fine on Linux 32,
           | Linux 64, Windows x86-32, and MSYS2 64... but not Windows
           | x86-64. https://github.com/cc65/cc65/issues/1680#issuecomment
           | -104641...
        
         | bombcar wrote:
         | C (and to some extent shell) programmers are the ones with the
         | most experience of the machine under them changing, perhaps
         | drastically - few other languages have even been around long
         | enough for that to have happened.
         | 
         | Java sidesteps this, of course, by defining a JVM to run on and
         | leaving the underlying details to the implementation.
        
       | rwmj wrote:
        | C23 just dropped support for any non-two's-complement
       | architectures. No more C on Unisys for you! http://www.open-
       | std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
        
         | eqvinox wrote:
         | That doesn't preclude C23 on Unisys, it just forces the
         | compiler to hide that fact from the programmer ;D
         | 
         | (SCNR)
        
         | titzer wrote:
         | It's amazing the abilities that 50 years can bring a
         | programming language. Longest, most painful design debate ever.
        
       | viddi wrote:
       | Haven't read the article yet, but I have noticed that the tab
       | keeps loading even after 10 minutes. Aborting the loading process
       | leads to broken media.
       | 
       | I am no expert in HTML video delivery and haven't tried it out,
       | but maybe setting the preload attribute to "none" or "metadata"
       | might help?
        
       | PhantomGremlin wrote:
       | _the MIPS R3000 processor ... raises an exception for signed
       | integer overflow, unlike many other processors which silently
       | wrap to negative values._
       | 
       | Too bad programmer laziness won and most current hardware doesn't
       | support this.
       | 
       | As a teenager I remember getting hit by this all the time in
       | assembly language programming for the IBM S/360. (FORTRAN turned
        | it off).
        | 
        |     S0C8 Fixed-point overflow exception
       | 
       | When you're a kid you just do things quickly. This was the
       | machine's way of slapping you upside your head and saying: "are
       | you sure about that?"
        
         | laumars wrote:
         | > When you're a kid you just do things quickly.
         | 
          | I don't think this is an age problem. Plenty of adults are lazy
         | and plenty of kids aren't.
        
         | masklinn wrote:
         | > Too bad programmer laziness won and most current hardware
         | doesn't support this.
         | 
          | There were discussions around this a few years back when Regehr
          | brought up the subject, and one of the issues commonly brought
         | up is if you want to handle (or force handling of) overflow,
         | traps are pretty shit, because it means you have to update the
         | trap handler _before each instruction which can overflow_ ,
         | because a global interrupt handler won't help you as it will
         | just be a slower overflow flag (at which point you might as
         | well just use an overflow flag). Traps are fine if you can set
         | up a single trap handler then go through the entire program,
         | but that's not how high-level languages deal with these issues.
         | 
         | 32b x86 had INTO, and compilers didn't bother using it.
        
         | flohofwoe wrote:
         | Modulo wraparound is just as much a feature in some situations
         | as it is a bug in others. And signed vs unsigned are just
         | different views on the same bag of bits (assuming two's
         | complement numbers), most operations on two's complement
         | numbers are 'sign agnostic' - I guess from a hardware
         | designer's pov, that's the whole point :)
         | 
         | The question is rather: was it really a good idea to bake
         | 'signedness' into the type system? ;)
        
           | pornel wrote:
           | That's why Rust has separate operations for wrapping and non-
           | wrapping arithmetic. When wrapping matters (e.g. you're
           | writing a hash function), you make it explicit you want
           | wrapping. Otherwise arithmetic can check for overflow (and
           | does by default in debug builds).
        
           | zozbot234 wrote:
           | Modulo wraparound _is_ convenient in non-trivial expressions
           | involving addition, subtraction and multiplication because it
           | will always give a correct in-range result if one exists.
           | "Checking for overflow" in such cases is necessarily more
           | complex than a simple check per operation; it must be
           | designed case by case.
        
         | zozbot234 wrote:
         | Overflow checks are trivial, there's no need for special
         | hardware support. It's pretty much exclusively a language-level
         | concern.
        
           | addaon wrote:
           | Overflow checks can be very expensive without hardware
           | support. Even on platforms with lightweight support (e.g. x86
           | 'INTO'), you're replacing one of the fastest instructions out
           | there -- think of how many execution units can handle a basic
           | add -- with a sequence of two dependent instructions.
        
             | zozbot234 wrote:
              | The vast majority of the cost is missed optimization due to
              | having to compute partial states in connection with overflow
             | errors. The checks themselves are trivially predicted, and
             | that's when the compiler can't optimize them out.
        
         | monocasa wrote:
         | In practice the vast majority of MIPS code uses addu, the non
         | trapping variant.
         | 
         | And in x86 land there's the into instruction, interrupt if
         | overflow bit set, so you're left with the same options.
        
           | spc476 wrote:
            | Which has to be done after every instruction
            | (http://boston.conman.org/2015/09/05.2), but it is quite slow.
           | Using a conditional jump after each instruction is faster
           | than using INTO (http://boston.conman.org/2015/09/07.1).
        
             | monocasa wrote:
              | It's more complicated than what shows up in microbenchmarks
              | like that. Since when you do it, it's pretty much every
             | add, you end up polluting your branch predictor by using jo
             | instructions everywhere and it can lead to worse overall
             | perf.
        
             | colejohnson66 wrote:
             | My guess would be a pipelining issue where `INTO` isn't
             | treated as a `Jcc`, but as an `INT` (mainly because it _is_
              | an interrupt). Agner Fog's instruction tables[0] show (for
              | the Pentium 4) `Jcc` takes one uOP with a throughput of
              | 2-4. `INTO`, OTOH, when _not taken_ uses four uOPs with a
              | throughput of _18_! Zen 3 is much better with a throughput
              | of 2, but that's still worse than `JO raiseINTO`.
             | 
             | [0]: https://www.agner.org/optimize/instruction_tables.pdf
        
       | rjsw wrote:
        | There are C compilers for the PDP-10; it must count as fairly
        | weird.
        
       | AnimalMuppet wrote:
       | Overall a good article. I was a bit amused and/or disgruntled to
       | see a TRS80 in the "Motorola 68000" section, though...
        
         | rjsw wrote:
         | Why disgruntled? I never saw a Model 16 [1] but they did exist.
         | 
         | [1] https://en.wikipedia.org/wiki/TRS-80_Model_II#model16
        
       | astrobe_ wrote:
       | > Everyone who writes about programming the Intel 286 says what a
       | pain its segmented memory architecture was
       | 
       | Actually this concerns more pre-80286 processors, since 80286
       | introduced virtual memory, and the segment registers were less
       | prominent in "protected mode". Moreover I wouldn't say it was a
       | pain, at least at the assembly level, once you understood the
        | trick. C had no concept of segmented memory, so you had to tell
       | the compiler which "memory model" it should use.
       | 
       | > One significant quirk is that the machine is very sensitive to
       | data alignment.
       | 
        | I remember from my school days a "barrel register" that
        | removed this limitation, but it was only introduced in the
        | 68020.
       | 
       | On the topic itself, I like to say that a program is portable if
       | it has been ported once (likewise a module is reusable if it has
        | been reused once). I remember porting a program from a 68K
        | descendant to ARM; the only non-obvious portability issue was
        | that the C standard doesn't mandate whether the _char_ type is
        | signed or unsigned (it's implementation-defined).
        
         | spc476 wrote:
         | The segment registers were less prominent on the 80386 in
         | protected mode since you also have paging, and each segment can
         | be 4G in size. On the 80286 in protected mode the segment
         | registers are still very much there (no paging, each segment is
         | still limited to 64k).
        
         | zwieback wrote:
         | > > Everyone who writes about programming the Intel 286 says
         | what a pain its segmented memory architecture was
         | 
         | > Actually this concerns more pre-80286 processors, since 80286
         | introduced virtual memory,
         | 
         | 86 had segments, 286 added protected mode, 386 added virtual. I
         | would agree, though, 286 wasn't as bad as people make it sound.
         | In OS/2 1.x it was quite usable.
        
         | shadowofneptune wrote:
         | Having done some 8086 programming recently, I did find segments
         | rather helpful once you get used to them. They make it easier
         | to think about handling data in a modular fashion; a large (64k
         | maximum) data structure can be given its own segment. The 286
         | went farther by providing protection to allocated segments. I
         | have a feeling overlays only really become a nuisance once you
         | start working on projects far bigger than were ever intended
         | for that generation of '86. MS-DOS not having a true successor
         | didn't help either.
        
       | zwieback wrote:
       | I wrote a fair amount of code for TI's TMS320C4x DSPs. They had
       | 32 bit sized char, short, int, long, float and double and a long
       | double with 40 bits.
       | 
       | Took a bit to get used to but really the only way to get to the
       | good stuff was by writing assembly code and hand-tuning all the
       | pipeline stuff.
        
       | rsecora wrote:
       | It still amazes me how the PDP-11 has the NUXI [1] problem at
       | nibble level and how the PDP-11 was bytesexual [2].
       | 
       | [1] http://catb.org/jargon/html/N/NUXI-problem.html
       | 
       | [2] http://catb.org/jargon/html/B/bytesexual.html
        
         | gus_massa wrote:
         | [If you remove the spaces at the beginning of the line, HN will
         | make the links clicky. You probably need to add an enter in
         | between to get the correct formatting.]
        
           | rsecora wrote:
           | Done, thank you
        
       | gwern wrote:
       | Note: "weird machines" here has nothing to do with the well-known
       | security concept, just referring to unusual or obscure computers.
        
       | nivertech wrote:
       | The author forgot to mention that 8051 has a bit-addressable
       | lower part of RAM.
       | 
       | PDP-11 had a weird RAM overlay scheme of squeezing 256KB RAM into
       | a 64KB 16-bit address space.
       | 
       | IBM System/360 also had a weird addressing scheme with base
       | register and up to 4KB offsets.
       | 
       | https://en.wikipedia.org/wiki/IBM_System/360_architecture#Ad...
        
       | ChuckMcM wrote:
        | I scored 7 (I have written C code on seven of the architectures
        | mentioned: PDP-11, i86, VAX, 68K, IBM 360, AT&T 3B2, and DG
        | Eclipse). I have also written C code on the DEC KL-10 (36 bit
       | machine) which isn't covered. And while I have a PDP-8, I only
       | have FOCAL and FORTRAN for it rather than C. I'm sure there is a
       | C compiler out there somewhere :-).
       | 
       | With the imminent C23 spec I'm really amazed at how well C has
       | held up over the last half century. A lot of things in computers
       | are very 'temporal' (in that there are a lot of things that are
       | all available at a certain point in time that are required for
       | the system to work) but C has managed to dodge much of that.
        
       | eqvinox wrote:
       | On a slightly related note, chances are good anyone reading this
       | has an 8051 within a few meters of them - they're close to
       | omnipresent in USB chips, particularly hubs, storage bridges and
        | keyboards / mice. The architecture is just as atrocious as the
        | 6502.
       | 
       | btw: a good indicator is GCC support - AVR, also an 8-bit uC - is
       | perfectly well supported by GCC. 8051 and 6502, you need
       | something like SDCC [http://sdcc.sourceforge.net/]
        
         | dfox wrote:
          | One thing to keep in mind while programming the AVR in C is
          | that it is still a small-ish MCU with separate address spaces.
          | This means that if you do not use the correct compiler-specific type
         | annotations for read-only data these will get copied into RAM
         | on startup (both avr-libc and arduino contain some macrology
         | that catches the obvious cases like some string constants, but
         | you still need to keep this in mind).
        
         | RicoElectrico wrote:
         | Hope RISC-V will displace 8051 over time. It's such an absurd
         | thing to extend this architecture in myriad non-interoperable
         | (although backwards-compatible with OG 8051) ways. And don't
         | forget about the XRAM nonsense. Yuck.
        
         | unwiredben wrote:
         | For 6502 fans, there's a new port of Clang and LLVM that seems
         | to be doing some nice code generation. See https://llvm-
         | mos.org/
        
         | yuubi wrote:
          | The 6502 has a single 16-bit address space with some parts
          | (zero page, stack) addressable by means other than full 16-bit
          | addresses. The 8051 has 16-bit read-only code space, 16-bit
          | read/write external memory space, and 8-bit internal memory
          | space, except half of it is special: if you use indirect access
          | (address in an 8-bit register), you get memory. But if you
          | encode that same address literally in an instruction, you get a
          | peripheral register.
          | 
          | At least that's the part I remember.
        
         | jazzyjackson wrote:
         | > The architecture is equally atrocious as the 6502.
         | 
         | I only ever hear glowing/nostalgic reviews of 6502 programming,
         | I guess from the retro/8bit gaming scene, curious what you find
         | so atrocious.
        
           | tenebrisalietum wrote:
           | 6502 is awesome to program from assembly.
           | 
           | What makes the 6502 atrocious for C is:
           | 
           | - internal CPU registers are 8 bits, no more, no less and you
           | only have 3 of them (5 if you count the stack pointer and
           | processor status register).
           | 
           | - fixed 8-bit stack pointer so things like automatic
           | variables and pass-by-value can't take up a lot of space.
           | 
           | - things like "access memory location Z + contents of
           | register A + this offset" aren't possible without a lot of
           | instructions.
           | 
           | - no hardware divide or multiply.
           | 
           | Many CPUs have instructions that map neatly to C operations,
           | but not 6502. With enough instructions C is hostable by any
           | CPU (e.g. ZPU) but a lot of work is needed to do that on the
           | 6502 and the real question is - will it fit in 16K, 32K, as
           | most 6502 CPUs only have 16 address lines - meaning they only
           | see 64K of addresses at once. Mechanisms exist to extend that
           | but they are platform specific.
           | 
            | IMHO the Z80 is better in this regard with its 16-bit stack
            | pointer and combinable index registers.
        
           | adrian_b wrote:
           | 6502 was nice only in comparison with Intel 8080/8085, but it
           | was very limited in comparison with better architectures.
           | 
           | The great quality of 6502 was that it allowed a very cheap
           | implementation, resulting in a very low price.
           | 
           | A very large number of computers used 6502 because it was the
           | cheapest, not because it was better than the alternatives.
           | 
            | For a very large number of programmers, the 6502 was the
            | first CPU whose assembly language they learned, and possibly
            | the only one, as later they used only high-level languages,
            | so it is fondly remembered. That does not mean that it was
            | really good.
           | 
           | I also have nostalgic happy memories about my first programs,
            | which I entered on punched cards. That does not mean
           | that using punched cards is preferable to modern computers.
           | 
           | Programming a 6502 was tedious, because you had only 8-bit
           | operations (even Intel 8080 had 16-bit additions, which
            | greatly simplified multiply routines and many other
           | computations) and you had only a small number of 8-bit
           | registers with dedicated functions.
           | 
           | So most or all variables of normal sizes had to be stored in
           | memory and almost everything that would require a single
           | instruction in modern microcontrollers, e.g. ARM, required a
           | long sequence of instructions on 6502. (e.g. a single 32-bit
           | integer addition would have required a dozen instructions in
           | the best case, for statically allocated variables, but up to
           | 20 instructions or even more when run-time address
           | computations were also required, for dynamically-allocated
           | variables.)
           | 
            | A good macro-assembler could greatly simplify programming
            | on a 6502, by writing all programs with a set of
            | macroinstructions designed to simulate a more powerful CPU.
           | 
           | I do not know whether good macro-assemblers existed for 6502,
            | as in those days I mostly used CP/M computers, which had
           | a good macro-assembler from Microsoft, or the much more
            | powerful Motorola 6809. I have programmed the 6502 only a
            | couple of times, at friends' houses on their Commodore
            | computers, and it
           | was weak in comparison with more recent CPUs, e.g. Zilog Z80
           | (which appeared one year later than 6502).
        
             | mwcampbell wrote:
             | > I do not know whether good macro-assemblers existed for
             | 6502
             | 
             | They certainly did. I don't know about the communities that
             | grew around Commodore, Atari, or other 6502-based
             | computers, but in the Apple II world, there were multiple
             | macro assemblers available. Possibly the most famous was
             | Merlin. As a pre-teen, I used the Mindcraft Assembler.
             | Mindcraft even sold another product called Macrosoft, which
             | was a macro library for their assembler that tried to
             | combine the speed of assembly language with a BASIC-like
             | syntax. The major downside, compared to both hand-coded
             | assembly and Applesoft BASIC (which was stored in a pre-
             | tokenized binary format), was the size of the executable.
             | 
             | Edit: Speaking of simulating a more powerful CPU, Steve
             | Wozniak implemented the SWEET16 [1] bytecode interpreter as
             | part of his original Apple II ROM. Apple Pascal used
             | p-code. And a more recent bytecode interpreter for the
             | Apple II is PLASMA [2].
             | 
             | [1]: https://en.wikipedia.org/wiki/SWEET16
             | 
             | [2]: https://github.com/dschmenk/plasma
        
             | le-mark wrote:
             | Wozniak's sweet 16 was along these lines:
             | 
             | https://en.m.wikipedia.org/wiki/SWEET16
        
           | eqvinox wrote:
           | > curious what you find so atrocious.
           | 
           | In the context of the original post, it's a bad target for C
           | -- I have no clue about other 6502 use :)
        
       | kwertyoowiyop wrote:
       | Given C's origin on the PDP-11, it's amazing it ended up so
       | portable to all these crazy architectures. Even as an old-timer,
       | the 8051 section made me say "WTF"!
        
       ___________________________________________________________________
       (page generated 2022-02-21 23:00 UTC)