[HN Gopher] Possible reasons for 8-bit bytes
___________________________________________________________________

Possible reasons for 8-bit bytes

Author : cpach
Score  : 89 points
Date   : 2023-03-07 13:14 UTC (9 hours ago)

(HTM) web link (jvns.ca)
(TXT) w3m dump (jvns.ca)

| billpg wrote:
| This was three or four jobs ago, but I remember reviewing
| someone's C code and they kept different collections of char*
| and int* pointers where they could have used a single collection
| of void* and the handler code would have been a lot simpler.
|
| The justification was that on this particular platform, char*
| pointers were structured differently from int* pointers, because
| char* pointers had to reference a single byte and int* pointers
| didn't.
|
| EDIT - I appear to have cut this story short. See my response to
| "wyldfire" for the rest. Sorry for causing confusion.
| dahart wrote:
| It is true that on at least some platforms an int* that is
| 4-byte aligned is _faster_ to access than a pointer that is not
| aligned. I don't know if there are platforms where int* is
| assumed to be 4-byte aligned, or if the C standard allows or
| disallows that, but it seems plausible that some compiler
| somewhere defaulted to assuming an int* is aligned. Some
| compilers might generate 2 load instructions for an unaligned
| load, which incurs extra latency even if your data is already in
| the cache line. These days you'd usually use some kind of
| alignment directive to enforce these things, which works on any
| pointer type, but it does seem possible that the code you
| reviewed wasn't incorrect to assume there's a difference between
| those pointer types, even if there was a better option.
| wyldfire wrote:
| > because char* pointers had to reference a single byte and
| int* pointers didn't.
|
| I must be missing some context or you have a typo. Probably most
| architectures I've ever worked with had `int *` refer to a
| register/word-sized value, and I've not yet worked with an
| architecture that had single-byte registers.
|
| Decades ago I worked on a codebase that used void * everywhere
| and rampant casting of pointer types to and fro. It was a total
| nightmare - the compiler was completely out of the loop and
| runtime was the only place to find your bugs.
| mlyle wrote:
| There are architectures where all you have is word addressing to
| memory. If you want to get a specific byte out, you need to
| retrieve it and shift/mask yourself. In turn, a pointer to a
| byte is a software construct rather than something there's
| actual direct architectural support for.
| loeg wrote:
| Do C compilers for those platforms transparently implement this
| for your char pointers as GP suggests? I would expect that you
| would need to do it manually and that native C pointers would
| only address the same words as the machine itself.
| billpg wrote:
| Depends on how helpful the compiler is. This particular compiler
| had an option to switch off adding in bit shifting code when
| reading characters and instead set CHAR_BIT to 32, meaning
| strings would have each character taking up 32 bits of space.
| (So many zero bits, but already handles emojis.)
| mlyle wrote:
| > Do C compilers for those platforms transparently implement
| this for your char pointers as GP suggests?
|
| Yes. Lots of little microcontrollers and older big machines have
| this "feature" and C compilers fix it for you.
|
| There are nightmarish microcontrollers with Harvard
| architectures and standards-compliant C compilers that fix this
| all up behind the scenes for you. E.g. the 8051 is ubiquitous,
| and it has a Harvard architecture: there are separate
| buses/instructions to access program memory and normal data
| memory. The program memory is only word addressable, and the
| data memory is byte addressable.
|
| So, a "pointer" in many C environments for 8051 says what bus
| the data is on and stashes in other bits what the byte address
| is, if applicable. And dereferencing the pointer involves a
| whole lot of conditional operations.
|
| Then there are things like the PDP-10, where there's hardware
| support for doing fancy things with byte pointers, but the
| pointers still have a different format than word pointers (e.g.
| they stash the byte offset in the high bits, not the low bits).
|
| The C standard makes relatively few demands upon pointers so
| that you can do interesting things if necessary for an
| architecture.
| yurish wrote:
| I have seen a DSP processor that could address only 16-bit
| words. And the C compiler did not fix it; bytes had 16 bits
| there.
| loeg wrote:
| Yeah, this is what I have heard of and was expecting. Sibling
| comment says it's not universal -- some C compilers for these
| platforms emulate byte addressing.
| csense wrote:
| x86 is byte addressable, but internally, the x86 memory bus is
| word addressable. So an x86 CPU does the shift/mask process
| you're referring to internally. Which means it's actually slower
| to access (for example) a 32-bit value that is not aligned to a
| 4-byte boundary.
|
| C/C++ compilers often by default add extra bytes if necessary to
| make sure everything's aligned. So if you have struct X { int a;
| char b; int c; char d; } and struct Y { int a; int b; char c;
| char d; } then X actually takes up more memory than Y, because X
| needs 6 extra bytes to align the int fields to 32-bit boundaries
| (or 14 bytes to align to a 64-bit boundary) while Y only needs 2
| bytes (or 6 bytes for 64-bit).
|
| Meaning you can sometimes save significant amounts of memory in
| a C/C++ program by re-ordering struct fields [1].
|
| [1] http://www.catb.org/esr/structure-packing/
| mlyle wrote:
| Sure, unaligned access to memory is always expensive (on
| architectures that allow it at all).
|
| But I'm talking about retrieving the 9th to 16th bit of a word,
| which is a little different. x86 does this just fine/quickly,
| because bytes are addressable.
| kjs3 wrote:
| _an architecture that had single-byte registers_
|
| Wild guess, but the OP might be talking about the Intel 8051.
| Single-byte registers, and depending on the C compiler (and
| there are a few of them) an 8-bit int* pointing to the first
| 128/256 bytes of memory, but up to 64K of (much slower) memory
| is supported in different memory spaces with different
| instructions and a 16-bit register called DPTR (and some
| implementations have 2 DPTR registers). C support for these
| additional spaces is mostly via compiler extensions analogous
| to, but different from, the old 8086 NEAR and FAR pointers. I'm
| obviously greatly simplifying and leaving out a ton of details.
|
| Oh, yeah...on 8051 you need to support bit addressing as well,
| at least for the 16 bytes from 20h to 2Fh. It's an odd chip.
| billpg wrote:
| I forget the details (long time ago) but char* and int* pointers
| had a different internal structure. The assembly generated by
| the compiler when code accessed a char* pointer was optimized
| for accessing single bytes and was very different from the code
| generated for an int* pointer.
|
| Digging deeper, this particular microcontroller was tuned for
| accessing 32 bits at a time. Accessing individual bytes needed
| extra bit-shuffling code to be added by the compiler.
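
A rough C rendering of the access sequence described here - one
native word load, a shift driven by extra bits stashed in the
pointer, then an AND to clear the top 24 bits. The pointer layout is
invented for illustration (byte offset in the low two bits; as mlyle
notes above, the PDP-10 stashed it in the high bits instead):

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated word-addressed memory: indices are word addresses;
       there is no native byte load. */
    static uint32_t mem[2] = { 0x64636261u, 0x00676665u }; /* "abcdefg" */

    /* Hypothetical "fat" char pointer: upper bits = word address,
       low 2 bits = byte-within-word. */
    typedef uint32_t char_ptr;

    /* The sequence a compiler might insert for `*p`: load, shift, mask. */
    static uint8_t deref_char_ptr(char_ptr p) {
        uint32_t word  = mem[p >> 2];              /* one native word read */
        uint32_t shift = (p & 3u) * 8u;            /* byte offset -> bit shift */
        return (uint8_t)((word >> shift) & 0xFFu); /* clear top 24 bits */
    }

    int main(void) {
        for (char_ptr p = 0; p < 7; p++)
            putchar(deref_char_ptr(p));            /* prints "abcdefg" */
        putchar('\n');
        return 0;
    }

On such a machine an int* is just a plain word address and
dereferencing it is the single load; only char* pays for the extra
shuffling, which is the asymmetry described in the parent comments.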
| wyldfire wrote:
| > char* and int* pointers had a different internal structure.
| The assembly generated by the compiler when code accessed a
| char* pointer was optimized for accessing single bytes and was
| very different from the code generated for an int* pointer.
|
| But -- they _are_ different. Architectures where they're treated
| the same are probably the exception. Depending on what you mean
| by "very different" - most architectures will emit different
| code for byte access versus word access.
| billpg wrote:
| Accessing a 32-bit word was a simple read op.
|
| When accessing an 8-bit byte through a pointer, the compiler
| would insert assembly code into the generated object code. The
| "normal" part of the pointer would be read, loading four
| characters into a 32-bit register. Two extra bits were
| squirreled away somewhere in the pointer and these would feed
| into a shift instruction so the requested byte would appear in
| the least-significant 8 bits of the register. Finally, an AND
| instruction would clear the top 24 bits.
| leeter wrote:
| Sounds like M68K or something similar, although Alpha AXP had
| similar byte-level access issues. A compiler on either of those
| platforms would likely add a lot of fix-up code to deal with the
| fact that they have to load the aligned unit (either 16-bit in
| the M68K case or 32-bit IIRC in Alpha) and then do bitwise ANDs
| and shifts depending on the pointer's lower bits.
|
| Raymond's blog on the Alpha:
| https://devblogs.microsoft.com/oldnewthing/20170816-00/?p=96...
| monocasa wrote:
| M68k was byte addressable just fine. Early Alpha had that issue
| though, as did later Cray compilers. Alpha fixed it with BWX
| (byte word extension). Early Cray compilers simply defined char
| as being 64 bits, but later added support for the
| shift/mask/thick pointer scheme to pack 8 chars in a word.
| leeter wrote:
| Must have depended on the variant; the one we used in college
| would throw a GP fault for misaligned access. It literally
| didn't have an A0 line. That said, it's been over 10 years and I
| could be remembering the very hard instruction alignment rules
| as applying to data too...
| monocasa wrote:
| 16-bit accesses had to be aligned. It didn't have an A0 because
| of the 16-bit pathway, but it did have byte select lines (#UDS,
| #LDS) for when you'd move.b d0,ADDR so that devices external to
| the CPU could see an 8-bit data access if that's what you were
| doing.
| 908B64B197 wrote:
| > ("a word is the natural unit of data used by a particular
| processor design") Apparently on x86 the word size is 16 bits,
| even though the registers are 64 bits.
|
| That's true for the original x86 instruction set. IA-32 has a
| 32-bit word size and x86-64 has... you guessed it, 64.
|
| 16 and 32-bit registers are still retained for compatibility
| reasons (just look at the instruction set!).
| stefan_ wrote:
| It's extra fun because not only are the registers retained, they
| were only _extended_. So you can use their 16 and 32-bit names
| to refer to smaller-sized parts of them.
| loeg wrote:
| Some x86 categorizations would call those dwords and qwords
| respectively.
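
A little-endian C union gives a rough picture of stefan_'s point
about x86 register names aliasing parts of the same storage. This is
an analogy only: registers aren't memory, AH (the high byte of AX)
isn't modeled, and on real x86-64 a write to EAX zeroes the upper
half of RAX, which a union won't do.

    #include <stdint.h>
    #include <stdio.h>

    /* RAX/EAX/AX/AL as overlapping views of one 64-bit value. */
    union reg {
        uint64_t rax;
        uint32_t eax;
        uint16_t ax;
        uint8_t  al;
    };

    int main(void) {
        union reg r;
        r.rax = 0x1122334455667788u;
        /* On little-endian targets the narrower names read the
           low-order bytes: eax=55667788 ax=7788 al=88 */
        printf("eax=%08x ax=%04x al=%02x\n",
               (unsigned)r.eax, (unsigned)r.ax, (unsigned)r.al);
        return 0;
    }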
| tom_ wrote:
| Words seem to always be 16 bits for x86 and derivatives - see
| the data types section of the software developer manual.
| dragonwriter wrote:
| I think "word" _as a datatype_ evolved from its original and
| more general computing meaning of "the natural unit of data used
| by a processor design" (the thing we talk about with an "x-bit
| processor") to "16 bits" during the period of 16-bit dominance
| and the explosion of computing that was happening around it.
|
| Essentially, enough stuff got written assuming that "word" was
| 16 bits (and double word, quad word, etc., had their obvious
| relationship) that even though the term had not previously been
| fixed, it would break the world to let it change, even as
| processors with larger word sizes (in the "natural unit of data"
| sense) became available, then popular, then dominant.
| IIAOPSW wrote:
| Binary coded decimal makes perfect sense if you're going to
| output the value to a succession of 7-segment displays (such as
| in a calculator). You would have to do that conversion in
| hardware anyway. A single repeated circuit mapping 4 bits to 7
| segments gets you the rest of the way to readable output. Now
| that I think about it, it's surprising ASCII wasn't designed
| around ease of translation to segmented displays.
| kibwen wrote:
| _> Now that I think about it, it's surprising ASCII wasn't
| designed around ease of translation to segmented displays._
|
| Wikipedia has a section on the design considerations of ASCII:
| https://en.wikipedia.org/wiki/ASCII#Design_considerations
| jodrellblank wrote:
| I love that there's a fractal world down there: the digits 0-9
| start with bit pattern 0011 followed by their value in binary,
| to make for easy conversion to/from BCD; the control codes Start
| Message and End Message were positioned to maximise the Hamming
| distance between them, so they're maximally different and least
| likely to be misinterpreted as each other in case of bits being
| mixed up; 7-bit ASCII used on 8-bit tape drives left room for a
| parity bit for each character; lowercase and uppercase letters
| differ only by the toggling of a single bit; and some of the
| digit/shift-symbol pairings date back to the first typewriter
| with a shift key in 1878...
| bregma wrote:
| Maybe because ASCII is from the early 1960s and 7-segment
| displays didn't become widespread until 15 years or so later.
| karmakaze wrote:
| EBCDIC 1963/64 (i.e. E-BCD-IC) was an extension of BCD to
| support characters.
|
| [0] https://en.wikipedia.org/wiki/EBCDIC
| ant6n wrote:
| Maybe another vague reason: when PCs came about in the era of
| the 8008...8086s, 64K of RAM was a high but reasonable amount.
| So you need 16-bit pointers, which require exactly 2 bytes.
| kleton wrote:
| ML might benefit a lot from 10-bit bytes. Accelerators have a
| separate memory space from the CPU after all, and have their own
| HBM DRAM as close as possible to the dies. In exchange, you
| could get a decent exponent size on a float10 that might not
| kill your gradients when training a model.
| londons_explore wrote:
| There seems to be as yet no consensus on the best math
| primitives for ML.
|
| People have invented new ones for ML (e.g. the Brain Float16),
| but even then some people have demonstrated training on int8 or
| even int4.
|
| There isn't even consensus on how to map the state space onto
| the number line - is linear (as in ints) or exponential (as in
| floats) better? Perhaps some entirely new mapping?
|
| And obviously there could be different optimal number systems
| for different ML applications or different phases of training or
| inference.
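
For a concrete taste of these trade-offs: bfloat16, mentioned above,
is simply the top 16 bits of an IEEE-754 float32 - the same sign bit
and 8-bit exponent (so the full float32 range, which is gentle on
gradients), but only 7 mantissa bits instead of 23. That makes
conversion a 16-bit shift; a minimal sketch:

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* float32 -> bfloat16 by truncation (real converters usually
       round-to-nearest-even; omitted here for brevity). */
    static uint16_t f32_to_bf16(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return (uint16_t)(u >> 16);
    }

    /* bfloat16 -> float32: shift the 16 bits back into place. */
    static float bf16_to_f32(uint16_t b) {
        uint32_t u = (uint32_t)b << 16;
        float f;
        memcpy(&f, &u, sizeof f);
        return f;
    }

    int main(void) {
        float x = 3.14159f;
        /* Round trip keeps the range but loses precision: ~3.140625 */
        printf("%f -> %f\n", x, bf16_to_f32(f32_to_bf16(x)));
        return 0;
    }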
| kibwen wrote:
| The reason to have a distinction between bits and bytes in the
| first place is so that you can have a unit of addressing that is
| different from the smallest unit of information.
|
| But what would we lose if we just got rid of the notion of bytes
| and let every bit be addressable?
|
| To start, we'd still be able to fit the entire address space
| into a 64-bit pointer. The maximum address space would merely be
| reduced from 16 exabytes to 2 exabytes.
|
| I presume there's some efficiency reason why we can't address
| bits in the first place. How much does that still apply? I
| admit, I'd just rather live in a world where I don't have to
| think about alignment or padding ever again. :P
| jecel wrote:
| The TMS340 family used bit addresses, but pointers were 32 bits.
|
| https://en.wikipedia.org/wiki/TMS34010
| ElevenLathe wrote:
| 64 bits of addressing is actually much more than most (any?)
| actually-existing processors have, for the simple reason that
| there is little demand for processors that can address 16
| exabytes of memory, and all those address lines still cost
| money.
| FullyFunctional wrote:
| More to the point, storing the _pointers_ costs memory.
| Switching from 32-bit to 64-bit effectively halved the caches
| for pointer-rich programs. AMD64 was a win largely due to all
| the things they did to compensate (including doubling the number
| of registers).
| cpleppert wrote:
| There are a couple of efficiency reasons besides the simple fact
| that every piece of hardware in existence operates on data sizes
| that are multiples of the byte. To start off with, it would be
| fantastically inefficient to build a CPU that could load
| arbitrary bit locations, so you would either be restricted to
| loading memory locations that are some reasonable fraction of
| the internal cache line or pay a massive performance penalty to
| load a bit address. Realistically, what would you gain by doing
| this when the CPU would have to divide any location by eight (or
| some other fraction) to figure out which cache line it needs to
| load?
|
| The article touches on this, but having your addressable unit
| fit a single character is incredibly convenient. If you are
| manipulating text you will never worry about single bits in
| isolation. Ditto for mathematical operations: do you really need
| an addressable unit too small to hold the numbers 0-255? It is a
| lot more convenient to think about memory locations as some
| reasonable unit that covers 99% of your computing use cases.
| AdamH12113 wrote:
| For those who are confused about bytes vs. words:
|
| The formal definition of a byte is that it's the smallest
| _addressable_ unit of memory. Think of a memory as a linear
| string of bits. A memory address points to a specific group of
| bits (say, 8 of them). If you add 1 to the address, the new
| address points to the group of bits immediately after the first
| group. The size of those bit groups is 1 byte.
|
| In modern usage, "byte" has come to mean "a group of 8 bits",
| even in situations where there is no memory addressing. This is
| due to the overwhelming dominance of systems with 8-bit bytes.
| Another term for a group of 8 bits is "octet", which is used in
| e.g. the TCP standard.
|
| Words are a bit fuzzier. One way to think of a word is that it's
| the largest number of bits acted on in a single operation
| without any special handling. The word size is typically the
| size of a CPU register or memory bus. x86 is a little weird with
| its register addressing, but if you look at an ARM Cortex-M you
| will see that its general-purpose CPU registers are 32 bits
| wide. There are instructions for working on smaller or larger
| units of data, but if you just do a generic MOV, LDR (load), or
| ADD instruction, you will act on 32 register bits. This is what
| it means for 32 bits to be the "natural" unit of data. So we say
| that an ARM Cortex-M is a 32-bit CPU, even though there are a
| few instructions that modify 64 bits (two registers) at once.
|
| Some of the fuzziness in the definition comes from the fact that
| the sizes of the CPU registers, address space, and physical
| address bus can all be different. The original AMD64 CPUs had
| 64-bit registers, implemented a 48-bit address space, and
| brought out 40 address lines. x86-64 CPUs now have 256-bit SIMD
| instructions. "32-bit" and "64-bit" were also used as marketing
| terms, with the definitions stretched accordingly.
|
| What it comes down to is that "word" is a very old term that is
| no longer quite as useful for describing CPUs. But memories also
| have word sizes, and here there is a concrete definition. The
| word size of a memory is the number of bits you can read or
| write at once -- that is, the number of data lines brought out
| from the memory IC.
|
| (Note that a memory "word" is technically also a "byte" from the
| memory's point of view -- it's both the natural unit of data and
| the smallest addressable unit of data. CPU bytes are split out
| from the memory word by the memory bus or the CPU itself. Since
| computers are all about running software, we take the CPU's
| perspective when talking about byte size.)
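
C bakes this definition of "byte" into the language itself: sizeof
counts bytes, <limits.h> exposes the byte width as CHAR_BIT (8 on
mainstream targets, but e.g. 16 on the DSP mentioned upthread), and
char* arithmetic strides exactly one byte at a time. A quick
illustration:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = 0;
        char *p = (char *)&x;
        printf("CHAR_BIT    = %d bits per byte\n", CHAR_BIT);
        printf("sizeof(int) = %zu bytes\n", sizeof x);
        /* Adjacent byte addresses differ by exactly one: */
        printf("&x = %p, one byte on = %p\n", (void *)p, (void *)(p + 1));
        return 0;
    }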
| FullyFunctional wrote:
| It's not entirely historically accurate. Early machines were
| "word addressable" (where the word wasn't 8 bits), which by your
| definition should have been called "byte addressable".
|
| There were even bit-addressable computers, but it didn't catch
| on :)
|
| If it wasn't for text, there would be nothing "natural" about an
| 8-bit byte (but powers of two are natural in binary computers).
| fanf2 wrote:
| In the Microsoft world, "word" generally means 16 bits, because
| their usage dates back to the 16-bit era. Other sizes are double
| words and quad words.
|
| In the ARM ARM, a word is 32 bits, because that was ARM's
| original word size. Other sizes are half words and double words.
|
| It is a very context-sensitive term.
| AdamH12113 wrote:
| > In the Microsoft world, "word" generally means 16 bits,
| because their usage dates back to the 16-bit era. Other sizes
| are double words and quad words.
|
| Ah, yes. That terminology is still used in the Windows registry,
| although Windows 10 seems to be limited to DWORD and QWORD.
| Probably dates back to the 286 or earlier. :-)
| ajross wrote:
| FWIW, those conventions come from Intel originally; Microsoft
| took them from there. ARM borrowed from VAX Unix conventions,
| which got them from DEC.
| cwoolfe wrote:
| Because humans have 10 fingers and 8 is the closest power of two
| to that.
| gtop3 wrote:
| The article points out that a power-of-two bit count is actually
| less important than many of us assume at first.
| williamDafoe wrote:
| I worked on the UIUC PLATO system in the 1970s: CDC-6600 and
| 7600 CPUs with 60-bit words.
| Back then everything used magnetic core memory and that memory
| was unbelievably expensive! Sewn together by women in southeast
| Asia, maybe $1 per word!
|
| Having 6-bit bytes on a CDC was a terrific PITA! The byte size
| was a tradeoff between saving MONEY (RAM) and the hassle of
| shift codes (070) used to get uppercase letters and rare
| symbols! Once semiconductor memory began to be available (2M
| words of 'ECS' - "extended core storage", actually semiconductor
| memory - was added to our 1M byte memory in ~1978), computer
| architects could afford to burn the extra 2 bits in every word
| to make programming easier...
|
| At about the same time, microprocessors like the 8008 were
| starting to take off (1975). If the basic instruction could not
| support a 0-100 value it would be virtually useless! There was
| only 1 microprocessor that DID NOT use the 8-bit byte, and that
| was the 12-bit Intersil 6100, which copied the PDP-8 instruction
| set!
|
| Also, the invention of double-precision floating point made
| 32-bit floating point okay. From the 40s till the 70s the most
| critical decision in computer architecture was the size of the
| floating point word: 36, 48, 52, 60 bits... But 32 is clearly
| inadequate. The idea that you could have a second, larger
| floating point size - an FPU that handled 32 AND 64-bit words -
| made 32-bit floating point acceptable.
|
| Also, in the early 1970s text processing took off, partly from
| the invention of ASCII (1963), partly from 8-bit
| microprocessors, partly from a little-known OS whose fundamental
| idea was that characters should be the only unit of I/O (Unix -
| father of Linux).
|
| So why do we have 8-bit bytes? Thank you, Gordon Moore!
| kjs3 wrote:
| I worked on the later CDC Cyber 170/180 machines, and yeah,
| there was a C compiler (2, in fact). 60-bit words, 18-bit
| addresses and index registers, and the choice of 5-bit or 12-bit
| chars. The highly extended CDC Pascal dialect papered over more
| of this weirdness and was much less torturous to use. The Algol
| compiler was interesting as well.
|
| The 180 introduced a somewhat less wild, certainly more
| C-friendly, 64-bit arch revision.
|
| _There was only 1 microprocessor that DID NOT use the 8-bit
| byte_
|
| Toshiba had a 12-bit single-chip processor at one time that I'm
| pretty sure you could make a similar claim about. More of a
| microcontroller for automotive than a general-purpose processor,
| tho.
| gumby wrote:
| The author doesn't mention that several of those machines with
| 36-bit words had byte instructions allowing you to point at a
| particular byte (your choice as to width, from 1-36 bits wide)
| and/or to stride through memory byte by byte (so an array of
| 3-bit fields was as easy to manipulate as any other size).
|
| Also, the ones I used to program (PDP-6/10/20) had an 18-bit
| address space - two addresses per 36-bit word, which you may
| note is a CONS cell. In fact the PDP-6 (first installed in 1964)
| was designed with LISP in mind and several of its common
| instructions were LISP primitives (like CAR and CDR).
| drfuchs wrote:
| Even more so, 6-bit characters were often used (supporting upper
| case only), in order to squeeze six characters into a word.
| Great for filenames and user ids. And for text files, 7 bits
| were enough to get upper and lower case and all the symbols, and
| you could pack five characters into a word. What could be
| better?
| downvotetruth wrote:
| The obvious 6-bit encoding of the most commonly occurring
| characters, [A-Za-z0-9. ]+ - upper, lower, digits, dot, and
| space, exactly 64 symbols - seems conspicuously absent.
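
That character set does come out to exactly 26 + 26 + 10 + 2 = 64
symbols, so it fits 6 bits with nothing to spare, and four such
characters pack into three 8-bit bytes - a 25% saving. A toy sketch
of such a packing (the code assignments are invented for
illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* A-Z=0..25, a-z=26..51, 0-9=52..61, '.'=62, ' '=63. */
    static int enc6(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '.') return 62;
        return 63;  /* space (and anything unmappable) */
    }

    /* Pack 4 characters (4 x 6 = 24 bits) into 3 bytes. */
    static void pack4(const char *s, uint8_t out[3]) {
        uint32_t v = 0;
        for (int i = 0; i < 4; i++)
            v = (v << 6) | (uint32_t)enc6(s[i]);
        out[0] = (uint8_t)(v >> 16);
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)v;
    }

    int main(void) {
        uint8_t out[3];
        pack4("Abc9", out);
        printf("%02x %02x %02x\n", out[0], out[1], out[2]);
        return 0;
    }

The trade-off is the one the thread keeps circling: anything outside
the 64 symbols (control characters, parity, lowercase on older sets)
needs shift codes or escapes, which is much of why 6-bit codes lost
out.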
| samtho wrote:
| I'm kind of disappointed that embedded computing was not
| mentioned. It is the longest-running use case for resource-
| constrained applications, and there are cases where not only are
| you using 8-bit bytes but also an 8-bit CPU. BCD is still widely
| used in this space to encode data for 7-segment displays, or
| just as data is relayed over the wire between chips.
| williamDafoe wrote:
| I agree completely! See my answer up above. Only 7 or 8 bits
| makes sense for a microprocessor; it's not useful if you cannot
| store 0-100 in a byte! With ASCII (1963) becoming ubiquitous,
| the 8008 had to be 8 bits! Otherwise it would have been the 7007
| lol ...
| moremetadata wrote:
| > why was BCD popular?
|
| https://www.truenorthfloatingpoint.com/problem
|
| Floating point arithmetic has its problems.
|
| [1] Ariane 5 ROCKET, Flight 501
| [2] Vancouver Stock Exchange
| [3] PATRIOT MISSILE FAILURE
| [4] The sinking of the Sleipner A offshore platform
|
| [1] https://en.wikipedia.org/wiki/Ariane_flight_V88
| [2] https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange#Rounding_errors_on_its_Index_price
| [3] https://www-users.cse.umn.edu/~arnold/disasters/patriot.html
| [4] https://en.wikipedia.org/wiki/Sleipner_A#Collapse
| elpocko wrote:
| Can you elaborate? How/why is BCD a better alternative to
| floating point arithmetic?
| moremetadata wrote:
| For the reasons others have mentioned, plus BCD doesn't suffer
| data type issues in the same way, unless the output data type is
| wrong - but then the coder has more problems than they realise.
|
| The only real disadvantage of BCD is that it's not as quick as
| floating point arithmetic or bit-swapping data types, but with
| today's faster processors, for most people I'd say the slower
| speed of BCD is a non-issue.
|
| Throw in other hardware issues, like bit flips in non-ECC
| memory, and the chances of errors accumulating rise if not using
| BCD.
| finnh wrote:
| Floating point error. BCD guarantees you that 1/10th, 1/100th,
| 1/1000th, etc. (to some configurable level) will be perfectly
| accurate, without accumulating error during repeated
| calculations.
|
| Floating point cannot do that; its precision is based on powers
| of 2 (1/2, 1/4, 1/8, and so on). For small values (in the range
| 0-1), there are _so many_ values represented that the powers of
| 2 map pretty tightly to the powers of 10. But as you repeat
| calculations, or get into larger values (say, in the range
| 1,000,000 - 1,000,001), the floating point values become more
| sparse and errors crop up even more easily.
|
| For example, using 32-bit floating point values, each
| consecutive floating point value in the range 1,000,000 -
| 1,000,001 is 0.0625 away from the next.
|
|     jshell> Math.ulp((float)1_000_000)
|     $5 ==> 0.0625
| ajross wrote:
| As others are pointing out, decimal fidelity and "error" are
| different things. Any fixed-width mantissa representation in any
| base has a minimal precision of one unit in its last place; the
| question is just which numbers are exactly representable and
| which results have only inexact representations that can
| accumulate error.
|
| BCD is attractive to human beings programming computers to
| duplicate algorithms (generally financial ones) intended for
| other human beings to execute using Arabic numerals. But it's
| not any more "accurate" (per transistor, it's actually less
| accurate due to the overhead).
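
The same measurement in C, for the curious: nextafterf() from math.h
returns the adjacent representable float, so subtracting gives the
local gap, matching the Math.ulp() result above (compile with -lm):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        float a = 1.0f, b = 1000000.0f;
        /* Gaps between adjacent float32 values grow with magnitude. */
        printf("gap near 1.0       = %g\n", nextafterf(a, INFINITY) - a);
        printf("gap near 1,000,000 = %g\n", nextafterf(b, INFINITY) - b);
        return 0;  /* prints ~1.19209e-07 and 0.0625 */
    }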
| danbruc wrote:
| You are confusing two things. Usually you represent decimal
| numbers as rational fractions p/q with two integers. If you fix
| q, you get a fixed point format; if you allow q to vary, you get
| a floating point format. Unless you are representing rational
| numbers, you usually limit the possible values of q to either
| powers of two or powers of ten. Powers of two will give you your
| familiar floating point numbers, but there are also base-ten
| floating point numbers, for example currency data types.
|
| BCD is a completely different thing: instead of tightly encoding
| an integer, you encode it digit by digit, wasting some fraction
| of a bit each time, but making conversion to and from decimal
| numbers much easier. But there is no advantage compared to a
| base-ten fixed or floating point representation when it comes to
| representable numbers.
| elpocko wrote:
| You can have infinite precision in pretty much any accurate
| representation though, no? Where is the advantage in using BCD
| over any other fixed point representation?
| KMag wrote:
| The Ariane bug was an overflow casting 64-bit floating point to
| a 16-bit integer. It would still have overflowed at the same
| point if it had been 64-bit decimal floating point using the
| same units. The integer part of the floating point number still
| wouldn't have fit in a signed 16-bit integer.
|
| As per the provided link, the Patriot missile error was 24-bit
| fixed point arithmetic, not floating point. Granted, a fixed-
| point representation in tenths of a second would have fixed this
| particular problem, as would using a clock frequency that's a
| power of 1/2 (in Hz). Though using a base-10 representation
| would have prevented this rounding error, it would also have
| reduced the time before overflow.
|
| I think IEEE-754r decimal floating point is a huge step forward.
| In particular, I think a huge opportunity was missed when the
| open spreadsheet formats were defined: no decimal floating point
| option was included.
|
| However, binary floating point rounding is irrelevant to the
| Patriot fixed-point bug.
|
| It's not reasonable to expect accountants and laypeople to
| understand binary floating point rounding. I've seen plenty of
| programmers make goofy rounding errors in financial models and
| trading systems. I've encountered a few developers who literally
| believed the least significant few bits of a floating point
| calculation are non-deterministic. (As best I can tell, they
| thought spilling/loading x87 80-bit floats from 64-bit stack-
| allocated storage resulted in whatever bits were already present
| in the low-order bits of the x87 registers.)
| pestatije wrote:
| BCD is not floating point
| coldtea wrote:
| That's the parent's point
| pflanze wrote:
| Avoiding floating point doesn't imply BCD. Any representation
| for integers would do fine, including binary.
|
| There are two reasons for BCD: (1) to avoid the cost of division
| for conversion to a human-readable representation, as implied in
| the OP; (2) when used to represent floating point, to avoid
| "odd" representations in the human format resulting from the
| conversion (like 1/10 not shown as 0.1). (2) implies floating
| point.
|
| Even in floating point represented using BCD you'd have rounding
| errors when doing numeric calculations; that's independent of
| the conversion to human-readable formats. So I don't see any
| reason to think that BCD would have avoided any disasters unless
| humans were involved. BCD or not is all about talking to humans,
| not to physics.
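
The representability point danbruc and pflanze are making, made
concrete: 0.1, 0.2, and 0.3 have no finite base-2 expansion, so a
binary double rounds each of them, while scaled integers (or any
base-ten representation, BCD included) keep human-entered decimal
values exact:

    #include <stdio.h>

    int main(void) {
        double a = 0.1, b = 0.2, c = 0.3;
        printf("0.1 + 0.2 == 0.3 ?  %s\n", (a + b == c) ? "yes" : "no");
        printf("0.1 + 0.2 = %.17g\n", a + b);  /* 0.30000000000000004 */

        /* The usual decimal fix: scale to integers (e.g. cents). */
        long cents = 10 + 20;
        printf("10c + 20c = %ldc, exactly\n", cents);
        return 0;
    }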
| coldtea wrote:
| > _Avoiding floating point doesn't imply BCD_
|
| Parent didn't say it's a logical necessity, as in "avoid
| floating point ==> MUST use BCD".
|
| They just casually mentioned that one reason BCD got popular was
| to sidestep such issues with floating point.
|
| (I'm not saying that's the reason, or that it's the best such
| option. It might even be historically untrue that this was the
| reason - just saying the parent's statements can and probably
| should be read like that.)
| pflanze wrote:
| Sidestep which issue? The one of human representation, or the
| problems with floating point?
|
| If they _just_ want to sidestep problems with floating point
| rounding targeting the physical world, they need to go with
| integers. Choosing BCD to represent those integers makes no
| sense at all for that purpose. All I sense is a conflation of
| issues.
|
| Also, thinking about it from a different angle, avoiding issues
| with the physical world is a matter of calculating properly so
| that rounding errors cease to be an issue. Choosing integers
| probably helps with that more in the sense that it makes the
| programmer aware. Integers are still discrete and you'll have
| rounding issues. Higher precision can hide the risk of rounding
| errors becoming relevant, which is why f64 is often chosen over
| f32. Going with an explicit resolution and range will presumably
| (I'm not a specialist in this area) make issues more upfront.
| Maybe at the risk of missing some others (like with the Ariane
| rocket that blew up because of a range overflow on integer
| numbers -- Edit: that didn't happen _on_ the integer numbers
| though, but when converting to them).
|
| A BCD number representation helps over the binary representation
| when humans are involved who shouldn't be surprised by the
| machine having different rounding than what the human is used to
| from base 10. And _maybe_ historically the cost of conversion.
| That's all. (Pocket calculators and finance are the only areas
| I'm aware of where that matters.)
|
| PS. danbruc (https://news.ycombinator.com/item?id=35057850) says
| it better than me.
| sargstuff wrote:
| Modern-day vacuum tube hobby take on 8-bit ASCII from an
| unabstracted signal processing point of view (pre-type-punning):
|
| The 1920's-1950's were initially reusing prior
| experience/knowledge of each punch card hole as an individual
| electric on/off switch [1].
|
| Electronic relays required 4 electrical inputs [2] (flow
| control/reset done per end-of-current-row hole punches).
|
| 10 holes per line -> 3 relays; 8 holes per line -> 2 relays,
| where each relay deals with 4 bits.
|
| Switching away from physical punch card media to electric/audio:
| 7 holes per line, with an extra bit for indicating 'done' with
| the current set of row holes.
|
| 8 holes per line needed 'software support' or had to make use of
| the hardware for the 3rd relay (formerly needed for 10 holes in
| a line).
|
| Numbers were faster because with 6 bits you don't need the 3rd
| relay to do flow control.
|
| Wonder if the pairing of a binary sequence with a graphic glyph
| could be considered the origin of the closure concept.
|
| Modern-day abstractions based on the '4-wire relay' concept:
|
| tcp/ip twisted pair
|
| USB prior to 3.2 vs. USB 3.2 variable lane width
|
| PCIe fixed lane vs. latest PCIe spec with variable-width lanes
|
| -----
|
| [1]: http://quadibloc.com/comp/cardint.htm
|
| [2]: https://en.wikipedia.org/wiki/Vacuum_tube
| PeterWhittaker wrote:
| Or maybe it was C?
| http://www.catb.org/~esr/faqs/things-every-hacker-once-knew/...
| cpleppert wrote:
| The transition started before C: EBCDIC was 8 bits, and ASCII
| was essentially a byte encoding. Unless you were designing some
| exotic hardware, you probably needed to handle text, and that
| meant an eight-bit byte. One motivation for the C type system
| was to extend the B programming language to support ASCII
| characters.
___________________________________________________________________
(page generated 2023-03-07 23:00 UTC)