[HN Gopher] C Portability Lessons from Weird Machines ___________________________________________________________________ C Portability Lessons from Weird Machines Author : rsecora Score : 102 points Date : 2022-02-21 16:45 UTC (6 hours ago) (HTM) web link (begriffs.com) (TXT) w3m dump (begriffs.com) | zokier wrote: | I don't think there is anything wrong in writing platform- | specific code; in certain circles there is this weird | fetishization of portability, placing it on the highest | pedestal as a metric of quality. This happens in C programming | and also for example in shell scripting, people advocating for | relying only on POSIX-defined behavior. If a platform specific | way of doing something works better in some use-case then there | should be no shame in using that. What is important is that the | code relies on well-defined behavior, and also that the platform | assumptions/requirements are documented to a degree. | | Of course it is wonderful that you can make programs that are | indeed portable between this huge range of computers; just that | not every program needs to do so. | Filligree wrote: | > Of course it is wonderful that you can make programs that are | indeed portable between this huge range of computers; just that | not every program needs to do so. | | Isn't most code that would behave differently on different | architectures subject to undefined behaviour, however? The | signed overflow case mentioned, for example. | | Sure, some of it is implementation-defined, but in practice you | need to write ultra-portable code anyway in order for your | compiler not to pull the rug out underneath you. | LeifCarrotson wrote: | > If a platform specific way of doing something works better in | some use-case then there should be no shame in using that. | | I fully agree, but the real problem is not the limitation to | platform-specific logic but the conflation of fundamental | requirements and incidental requirements.
There's no good way | to know when you meant "int" as just a counter of a user's | files, or "int" as in a 32-bit bitmask, or "int" as the value | of a pointer. For the former, it probably doesn't matter if | someone later compiles it for a different architecture; for the | latter, if you mean int32_t or uintptr_t, use that! | ithinkso wrote: | I love how the concept of 'platform independent' evolved - you | would think that it means you can run it anywhere but almost | all software that uses 'platform-independent' code is very | platform-dependent | | Because if you make an Electron app it's only logical that | because it is platform-independent it can only be run on macOS | xemdetia wrote: | I would say most _modern_ code is not the kind of code written | with the absurd platform assumptions that old code used to make. | There are no magic addresses you have to know to talk to | hardware, there's no implicit memory segmentation/memory | looping, and so on. Since most modern OSs try to prevent | direct access, randomize address spaces, and so on, it is just | hard to write code that is insane the way we were. | | So the question is are you contesting the POSIX defined | behaviour over using more logical interfaces from the OS or the | wild west where people were hacking around broken, platform- | specific features and often broken or awkward system libraries | or just the more modern case where people use the standard | interface instead of a more performant one. In the latter case | I agree I wish there was a more 'nice' way of doing more | dynamic and efficient feature detection without making simple C | programs crazy complex. | Someone wrote: | I think it also helps that modern hardware is a lot less | diverse.
Most of the tools only run on systems that are | little endian, where _NULL_ is zero, chars are 8 bits, ints | are two's complement and silently wrap around, floats are 32 | bits IEEE 754, etc, so code that erroneously assumes those to | be true in theory isn't portable, but in practice is. | | And newer C standards may even unbreak such code. Ints | already are two's complement in the latest version of C++ and | will be in the next version of C, for example. | thesuperbigfrog wrote: | >> in certain circles there is this weird fetishization of | portability, placing it on the highest pedestal as a metric of | quality. | | It's not a fetish if you have ever ported legacy code that was | not written with portability in mind. | | "ints are always 32-bit and can be used to store pointers with | a simple cast" may have worked when the legacy program was | written, but it sure makes porting it a pain. | nyanpasu64 wrote: | And `unsigned long` can store pointers just fine on Linux 32, | Linux 64, Windows x86-32, and MSYS2 64... but not Windows | x86-64. https://github.com/cc65/cc65/issues/1680#issuecomment | -104641... | bombcar wrote: | C (and to some extent shell) programmers are the ones with the | most experience of the machine under them changing, perhaps | drastically - few other languages have even been around long | enough for that to have happened. | | Java sidesteps this, of course, by defining a JVM to run on and | leaving the underlying details to the implementation. | rwmj wrote: | C23 just dropped support for any non-twos-complement | architectures. No more C on Unisys for you! http://www.open- | std.org/jtc1/sc22/wg14/www/docs/n2412.pdf | eqvinox wrote: | That doesn't preclude C23 on Unisys, it just forces the | compiler to hide that fact from the programmer ;D | | (SCNR) | titzer wrote: | It's amazing the abilities that 50 years can bring a | programming language. Longest, most painful design debate ever.
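To make the pointer-in-an-integer point from LeifCarrotson, thesuperbigfrog and nyanpasu64 concrete, here is a minimal sketch in C. The helper names are hypothetical and not from the thread or the article, and it assumes a platform that provides the optional `uintptr_t` type from `<stdint.h>` (mainstream ones do). It stores a pointer in `uintptr_t` rather than in `int` or `unsigned long`, whose widths vary by platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a pointer into the integer type that is
 * meant to hold one. uintptr_t is optional in the C standard, so this
 * assumes a platform that provides it. */
uintptr_t pointer_to_bits(const void *p) {
    return (uintptr_t)p;
}

/* Hypothetical helper: recover the pointer from the stored integer. */
const void *bits_to_pointer(uintptr_t bits) {
    return (const void *)bits;
}

/* Hypothetical helper: read an int through a pointer that took a
 * detour through uintptr_t. */
int load_int_via_bits(const int *p) {
    assert(p != NULL);
    const int *q = bits_to_pointer(pointer_to_bits(p));
    return *q;
}
```

The same round trip through `unsigned long` would silently truncate on a platform where `long` is 32 bits and pointers are 64 bits, which is the Windows x86-64 case mentioned above.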
| viddi wrote: | Haven't read the article yet, but I have noticed that the tab | keeps loading even after 10 minutes. Aborting the loading process | leads to broken media. | | I am no expert in HTML video delivery and haven't tried it out, | but maybe setting the preload attribute to "none" or "metadata" | might help? | PhantomGremlin wrote: | _the MIPS R3000 processor ... raises an exception for signed | integer overflow, unlike many other processors which silently | wrap to negative values._ | | Too bad programmer laziness won and most current hardware doesn't | support this. | | As a teenager I remember getting hit by this all the time in | assembly language programming for the IBM S/360. (FORTRAN turned | it off). S0C8 Fixed-point overflow exception | | When you're a kid you just do things quickly. This was the | machine's way of slapping you upside your head and saying: "are | you sure about that?" | laumars wrote: | > When you're a kid you just do things quickly. | | I don't think this is an age problem. Plenty of adults are lazy | and plenty of kids aren't. | masklinn wrote: | > Too bad programmer laziness won and most current hardware | doesn't support this. | | There were discussions around this a few years back when Regehr | brought up the subject, and one of the issues commonly brought | up is that if you want to handle (or force handling of) overflow, | traps are pretty shit, because it means you have to update the | trap handler _before each instruction which can overflow_, | because a global interrupt handler won't help you as it will | just be a slower overflow flag (at which point you might as | well just use an overflow flag). Traps are fine if you can set | up a single trap handler then go through the entire program, | but that's not how high-level languages deal with these issues. | | 32b x86 had INTO, and compilers didn't bother using it. | flohofwoe wrote: | Modulo wraparound is just as much a feature in some situations | as it is a bug in others.
And signed vs unsigned are just | different views on the same bag of bits (assuming two's | complement numbers), most operations on two's complement | numbers are 'sign agnostic' - I guess from a hardware | designer's pov, that's the whole point :) | | The question is rather: was it really a good idea to bake | 'signedness' into the type system? ;) | pornel wrote: | That's why Rust has separate operations for wrapping and non- | wrapping arithmetic. When wrapping matters (e.g. you're | writing a hash function), you make it explicit you want | wrapping. Otherwise arithmetic can check for overflow (and | does by default in debug builds). | zozbot234 wrote: | Modulo wraparound _is_ convenient in non-trivial expressions | involving addition, subtraction and multiplication because it | will always give a correct in-range result if one exists. | "Checking for overflow" in such cases is necessarily more | complex than a simple check per operation; it must be | designed case by case. | zozbot234 wrote: | Overflow checks are trivial, there's no need for special | hardware support. It's pretty much exclusively a language-level | concern. | addaon wrote: | Overflow checks can be very expensive without hardware | support. Even on platforms with lightweight support (e.g. x86 | 'INTO'), you're replacing one of the fastest instructions out | there -- think of how many execution units can handle a basic | add -- with a sequence of two dependent instructions. | zozbot234 wrote: | A vast majority of the cost is missed optimization due to | having to compute partial states in connection to overflow | errors. The checks themselves are trivially predicted, and | that's when the compiler can't optimize them out. | monocasa wrote: | In practice the vast majority of MIPS code uses addu, the non | trapping variant. | | And in x86 land there's the into instruction, interrupt if | overflow bit set, so you're left with the same options. 
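The distinction pornel and zozbot234 draw between explicit wrapping and checked arithmetic can be sketched in portable C without any hardware trap or compiler builtin. This is only an illustration with hypothetical helper names. The check runs before the addition because signed overflow itself is undefined behaviour in C, while unsigned arithmetic is defined to wrap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: add two ints, reporting overflow instead of
 * performing it. Returns true and stores the sum on success; returns
 * false and leaves *sum untouched if the result would not fit. */
bool checked_add(int a, int b, int *sum) {
    assert(sum != NULL);
    if (b > 0 && a > INT_MAX - b) return false; /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false; /* would go below INT_MIN */
    *sum = a + b;
    return true;
}

/* Hypothetical helper: explicit modulo wraparound. Unsigned addition
 * is defined by the standard to wrap modulo UINT_MAX + 1. */
unsigned wrapping_add(unsigned a, unsigned b) {
    return a + b;
}
```

A caller would use `wrapping_add` where wraparound is the point (a hash function, say) and `checked_add` where an out-of-range result is a bug. Whether the per-operation check is cheap in practice is exactly what addaon, zozbot234 and monocasa disagree about above.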
| spc476 wrote: | Which has to be done after every instruction | (http://boston.conman.org/2015/09/05.2) but it is quite slow. | Using a conditional jump after each instruction is faster | than using INTO (http://boston.conman.org/2015/09/07.1). | monocasa wrote: | It's more complicated than shows up in micro benchmarks | like that. Since when you do it, it's pretty much every | add, you end up polluting your branch predictor by using jo | instructions everywhere and it can lead to worse overall | perf. | colejohnson66 wrote: | My guess would be a pipelining issue where `INTO` isn't | treated as a `Jcc`, but as an `INT` (mainly because it _is_ | an interrupt). Agner Fog's instruction tables[0] show (for | the Pentium 4) `Jcc` takes one uOP with a throughput of | 2-4. `INTO`, OTOH, when _not taken_ uses four uOPs with a | throughput of _18_! Zen 3 is much better with a throughput | of 2, but that's still worse than `JO raiseINTO`. | | [0]: https://www.agner.org/optimize/instruction_tables.pdf | rjsw wrote: | There are C compilers for the PDP-10, it must count as fairly | weird. | AnimalMuppet wrote: | Overall a good article. I was a bit amused and/or disgruntled to | see a TRS80 in the "Motorola 68000" section, though... | rjsw wrote: | Why disgruntled? I never saw a Model 16 [1] but they did exist. | | [1] https://en.wikipedia.org/wiki/TRS-80_Model_II#model16 | astrobe_ wrote: | > Everyone who writes about programming the Intel 286 says what a | pain its segmented memory architecture was | | Actually this concerns more pre-80286 processors, since 80286 | introduced virtual memory, and the segment registers were less | prominent in "protected mode". Moreover I wouldn't say it was a | pain, at least at the assembly level, once you understood the | trick. C had no concept of segmented memory, so you had to tell | the compiler which "memory model" it should use. | | > One significant quirk is that the machine is very sensitive to | data alignment.
| | I remember from school a "barrel register" that | removed this limitation, but it was introduced with the | 68020. | | On the topic itself, I like to say that a program is portable if | it has been ported once (likewise a module is reusable if it has | been reused once). I remember porting a program from a 68K | descendant to ARM; the only non-obvious portability issue was | that the C standard doesn't mandate whether the _char_ type is | signed or unsigned (it's implementation- | defined). | spc476 wrote: | The segment registers were less prominent on the 80386 in | protected mode since you also have paging, and each segment can | be 4G in size. On the 80286 in protected mode the segment | registers are still very much there (no paging, each segment is | still limited to 64k). | zwieback wrote: | > > Everyone who writes about programming the Intel 286 says | what a pain its segmented memory architecture was | | > Actually this concerns more pre-80286 processors, since 80286 | introduced virtual memory, | | 86 had segments, 286 added protected mode, 386 added virtual. I | would agree, though, 286 wasn't as bad as people make it sound. | In OS/2 1.x it was quite usable. | shadowofneptune wrote: | Having done some 8086 programming recently, I did find segments | rather helpful once you get used to them. They make it easier | to think about handling data in a modular fashion; a large (64k | maximum) data structure can be given its own segment. The 286 | went farther by providing protection to allocated segments. I | have a feeling overlays only really become a nuisance once you | start working on projects far bigger than were ever intended | for that generation of '86. MS-DOS not having a true successor | didn't help either. | zwieback wrote: | I wrote a fair amount of code for TI's TMS320C4x DSPs. They had | 32 bit sized char, short, int, long, float and double and a long | double with 40 bits.
| | Took a bit to get used to but really the only way to get to the | good stuff was by writing assembly code and hand-tuning all the | pipeline stuff. | rsecora wrote: | It still amazes me how the PDP-11 has the NUXI [1] problem at | nibble level and how the PDP-11 was bytesexual [2]. | | [1] http://catb.org/jargon/html/N/NUXI-problem.html | | [2] http://catb.org/jargon/html/B/bytesexual.html | gus_massa wrote: | [If you remove the spaces at the beginning of the line, HN will | make the links clicky. You probably need to add an enter in | between to get the correct formatting.] | rsecora wrote: | Done, thank you | gwern wrote: | Note: "weird machines" here has nothing to do with the well-known | security concept, just referring to unusual or obscure computers. | nivertech wrote: | The author forgot to mention that 8051 has a bit-addressable | lower part of RAM. | | PDP-11 had a weird RAM overlay scheme of squeezing 256KB RAM into | a 64KB 16-bit address space. | | IBM System/360 also had a weird addressing scheme with base | register and up to 4KB offsets. | | https://en.wikipedia.org/wiki/IBM_System/360_architecture#Ad... | ChuckMcM wrote: | I scored 7: I have written C code on seven of the architectures | mentioned (PDP 11, i86, VAX, 68K, IBM 360, AT&T 3B2, and DG | Eclipse). I have also written C code on the DEC KL-10 (36 bit | machine), which isn't covered. And while I have a PDP-8, I only | have FOCAL and FORTRAN for it rather than C. I'm sure there is a | C compiler out there somewhere :-). | | With the imminent C23 spec I'm really amazed at how well C has | held up over the last half century. A lot of things in computers | are very 'temporal' (in that there are a lot of things that are | all available at a certain point in time that are required for | the system to work) but C has managed to dodge much of that.
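Two of the char-related surprises mentioned above, astrobe_'s implementation-defined signedness of plain char and zwieback's DSP with a 32-bit char, can be queried through `<limits.h>` rather than assumed. A minimal sketch in C, with hypothetical helper names that do not come from the thread or the article:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is plain char a signed type with this compiler?
 * CHAR_MIN is 0 when plain char is unsigned and negative otherwise. */
bool plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Hypothetical helper: object size in bits, padding bits included.
 * sizeof counts in units of char, and CHAR_BIT says how wide a char is
 * (8 on most machines, but the standard only requires at least 8). */
size_t size_in_bits(size_t size_in_chars) {
    return size_in_chars * CHAR_BIT;
}

/* Hypothetical helper: read a character as a value in 0..UCHAR_MAX,
 * whichever signedness plain char happens to have. */
unsigned byte_value(const char *p) {
    assert(p != NULL);
    return (unsigned char)*p;
}
```

On a machine like the one zwieback describes, `size_in_bits(sizeof(char))` would presumably report 32, since `sizeof(char)` is 1 by definition. Going through `unsigned char`, as `byte_value` does, is the usual way to keep table lookups and comparisons from changing behaviour between a signed-char and an unsigned-char target.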
| eqvinox wrote: | On a slightly related note, chances are good anyone reading this | has an 8051 within a few meters of them - they're close to | omnipresent in USB chips, particularly hubs, storage bridges and | keyboards / mice. The architecture is as atrocious as the | 6502's. | | btw: a good indicator is GCC support: AVR, also an 8-bit uC, is | perfectly well supported by GCC. For the 8051 and 6502, you need | something like SDCC [http://sdcc.sourceforge.net/] | dfox wrote: | One thing to keep in mind while programming AVR in C is that it | is still a small-ish MCU with different address spaces. This | means that if you do not use correct compiler-specific type | annotations for read-only data, it will get copied into RAM | on startup (both avr-libc and arduino contain some macrology | that catches the obvious cases like some string constants, but | you still need to keep this in mind). | RicoElectrico wrote: | Hope RISC-V will displace 8051 over time. It's such an absurd | thing to extend this architecture in myriad non-interoperable | (although backwards-compatible with OG 8051) ways. And don't | forget about the XRAM nonsense. Yuck. | unwiredben wrote: | For 6502 fans, there's a new port of Clang and LLVM that seems | to be doing some nice code generation. See https://llvm- | mos.org/ | yuubi wrote: | the 6502 has a single 16-bit address space with some parts | (zero page, stack) addressable by means other than full 16-bit | addresses. the 8051 has 16-bit read-only code space, 16-bit | read/write external memory space, and 8-bit internal memory | space, except half of it is special: if you use indirect access | (address in an 8-bit register), you get memory. but if you | encode that same address literally in an instruction, you get a | peripheral register. | | at least that's the part I remember | jazzyjackson wrote: | > The architecture is as atrocious as the 6502's.
| | I only ever hear glowing/nostalgic reviews of 6502 programming, | I guess from the retro/8bit gaming scene, curious what you find | so atrocious. | tenebrisalietum wrote: | 6502 is awesome to program from assembly. | | What makes the 6502 atrocious for C is: | | - internal CPU registers are 8 bits, no more, no less and you | only have 3 of them (5 if you count the stack pointer and | processor status register). | | - fixed 8-bit stack pointer so things like automatic | variables and pass-by-value can't take up a lot of space. | | - things like "access memory location Z + contents of | register A + this offset" aren't possible without a lot of | instructions. | | - no hardware divide or multiply. | | Many CPUs have instructions that map neatly to C operations, | but not 6502. With enough instructions C is hostable by any | CPU (e.g. ZPU) but a lot of work is needed to do that on the | 6502 and the real question is - will it fit in 16K, 32K, as | most 6502 CPUs only have 16 address lines - meaning they only | see 64K of addresses at once. Mechanisms exist to extend that | but they are platform specific. | | IMHO Z80 is better in this regard with its 16-bit stack | pointer and combinable index registers. | adrian_b wrote: | 6502 was nice only in comparison with Intel 8080/8085, but it | was very limited in comparison with better architectures. | | The great quality of 6502 was that it allowed a very cheap | implementation, resulting in a very low price. | | A very large number of computers used 6502 because it was the | cheapest, not because it was better than the alternatives. | | For a very large number of programmers, 6502 was the 1st CPU | whose assembly language they learned and possibly the | only one, as later they used only high-level languages, | so it is fondly remembered. That does not mean that it was | really good. | | I also have nostalgic happy memories about my first programs, | which I entered on punched cards.
That does not mean | that using punched cards is preferable to modern computers. | | Programming a 6502 was tedious, because you had only 8-bit | operations (even Intel 8080 had 16-bit additions, which | greatly simplified multiply routines and many other | computations) and you had only a small number of 8-bit | registers with dedicated functions. | | So most or all variables of normal sizes had to be stored in | memory and almost everything that would require a single | instruction in modern microcontrollers, e.g. ARM, required a | long sequence of instructions on 6502. (E.g. a single 32-bit | integer addition would have required a dozen instructions in | the best case, for statically allocated variables, but up to | 20 instructions or even more when run-time address | computations were also required, for dynamically-allocated | variables.) | | A good macro-assembler could greatly simplify programming | on a 6502, by writing all programs with a set of | macroinstructions designed to simulate a more powerful CPU. | | I do not know whether good macro-assemblers existed for 6502, | as in those days I mostly used CP/M computers, which had | a good macro-assembler from Microsoft, or the much more | powerful Motorola 6809. I programmed the 6502 only a couple | of times, with some friends who had Commodore computers, and it | was weak in comparison with more recent CPUs, e.g. Zilog Z80 | (which appeared one year later than 6502). | mwcampbell wrote: | > I do not know whether good macro-assemblers existed for | 6502 | | They certainly did. I don't know about the communities that | grew around Commodore, Atari, or other 6502-based | computers, but in the Apple II world, there were multiple | macro assemblers available. Possibly the most famous was | Merlin. As a pre-teen, I used the Mindcraft Assembler.
| Mindcraft even sold another product called Macrosoft, which | was a macro library for their assembler that tried to | combine the speed of assembly language with a BASIC-like | syntax. The major downside, compared to both hand-coded | assembly and Applesoft BASIC (which was stored in a pre- | tokenized binary format), was the size of the executable. | | Edit: Speaking of simulating a more powerful CPU, Steve | Wozniak implemented the SWEET16 [1] bytecode interpreter as | part of his original Apple II ROM. Apple Pascal used | p-code. And a more recent bytecode interpreter for the | Apple II is PLASMA [2]. | | [1]: https://en.wikipedia.org/wiki/SWEET16 | | [2]: https://github.com/dschmenk/plasma | le-mark wrote: | Wozniak's sweet 16 was along these lines: | | https://en.m.wikipedia.org/wiki/SWEET16 | eqvinox wrote: | > curious what you find so atrocious. | | In the context of the original post, it's a bad target for C | -- I have no clue about other 6502 use :) | kwertyoowiyop wrote: | Given C's origin on the PDP-11, it's amazing it ended up so | portable to all these crazy architectures. Even as an old-timer, | the 8051 section made me say "WTF"! ___________________________________________________________________ (page generated 2022-02-21 23:00 UTC)