[HN Gopher] The essence of Reed-Solomon coding
       ___________________________________________________________________
        
       The essence of Reed-Solomon coding
        
       Author : rostayob
       Score  : 104 points
       Date   : 2022-11-06 14:01 UTC (8 hours ago)
        
 (HTM) web link (mazzo.li)
 (TXT) w3m dump (mazzo.li)
        
       | est wrote:
       | I think Reed-solomon should be considered in future network
       | protocols designs to combat censorship. Every byte should be
       | demuxed into bits and transferred in independent data streams, so
       | MITM boxes can only intercept incomplete streams, and aggregate
       | streams back to original would be insanely difficult. Let
       | transport layers do only one job and no distinguish whatever the
       | content might be inside.
       | 
       | Currently H2 does support M:N stream muxing but popular browsers
       | only support N:1 mode.
        
         | not2b wrote:
         | That seems an inferior approach to just using encryption.
        
         | vlovich123 wrote:
         | It's a comparatively expensive operation (CPU and memory)
         | compared with just encrypting the information which also blinds
         | the network operator to the same extent. Unless you're saying
         | that you'd send the stream across multiple disparate networks.
         | But if you're able to get packets out of one, what's stopping
         | you from getting the whole stream out that network?
        
       | vlovich123 wrote:
       | I really wish physical and OS network stacks would be able to
       | give you the ability to send uncorrected bit streams. That way
       | you can tune the error rate at the application level that makes
       | sense. For example, with video streaming, you probably don't need
       | much error correction on the data stream as periodic i-frames
       | would correct any transient glitches (you'd only bother to EC the
       | control headers for the video). Then WiFi networks maybe wouldn't
       | have to be as careful about time multiplexing all the coex
       | streams and some noise due to conflicts would be fine and not
       | require retransmission (because the application layer could
       | handle it).
       | 
       | This does come with tradeoffs (eg it may take your application
       | longer to recover from the noise than a quick retransmit at the
       | physical layer).
       | 
       | From a cost perspective it's also maybe impractical because the
       | computer industry gets efficiency gains by solving a problem for
       | everyone at some quality threshold by giving up optimality for
       | applications that could do something with it. Also you would
       | still need to correct the control layer of the network (IP + MAC)
       | just to make it work at all so it may be a wash (ie the
       | incremental cost of correcting the data vs control + data may be
       | insignificant).
       | 
       | Still, at least having the option as a switch that could be
       | flipped for experimentation purposes would be quite neat to allow
       | the curious to find new techniques / layers of abstractions vs
       | what's orthodoxy today.
        
         | sgtnoodle wrote:
         | You can do it in linux, with drivers that implement packet
         | injection. You're giving up a lot, though. I think you're right
         | about it being a wash with regard to control overhead. In
         | particular, because of the data ACKs, the firmware can
         | aggressively ramp the 802.11 frame data rate up and down.
         | Thanks to packet aggregation, the cost of the frame preamble
         | gets amortized across all buffered up data. Letting each
         | application schedule its own transmission would quickly eat up
         | the available channel bandwidth in frame overhead. It works for
         | a single specialized application, but monopolizes the entire
         | shared channel.
         | 
         | Folk do that sort of thing when transmitting low latency first-
         | person-video from drones, using commodity wifi hardware.
        
         | jonathanlydall wrote:
         | This is already done by many audio/video chat compression
         | algorithms which use UDP.
         | 
         | The idea being that with something like voice or video, if a
         | packet doesn't arrive, instead of stalling everything while you
         | wait for a retransmit, you instead just carry on regardless.
         | 
         | The occasional missing packet in voice is almost imperceptible,
         | however the more packets that get lost the more the voice seems
         | to "break up".
         | 
         | With video the symptoms are an accumulation of visual
         | corruption until the next key frame.
         | 
         | FPS games also do this (or at least used to), servers would
         | send a stream of UDP packets of entity states (as opposed to
         | sending deltas of state changes), so things like player1 is at
         | x,y,z coordinate with velocity a and heading vector of b.
         | 
         | If clients missed a packet they would just wait for the next
         | one since it's useless to know later where someone else was,
         | all you actually care about is where they are, or what's
         | happening, now.
        
           | YZF wrote:
           | I think the ask is for something like let a bit flip inside
           | the UDP packet or let a byte fall out of the UDP packet. The
           | integrity of UDP packets is still checked for (at different
           | layers).
           | 
           | For most networks the error rate is so low that I don't think
           | it's that valuable. Also you'd still want your metadata
           | checked otherwise it'll get potentially delivered to the
           | wrong place.
        
             | nsteel wrote:
             | That's what I assumed they meant but there's so many layers
             | of correction/protection going on and most of it is non-
             | optional if you want a working system. For example, if you
             | disabled the FEC that's used over high-speed serdes links
             | within a core router you'll be left with a broken system,
             | the error rate is much too high. In the designs I've worked
             | on you can't disable the internal CRC/ECC on the databuses
             | without risking corrupting the control data, which won't
             | end well. Nobody provides separate data and control ECC
             | protection, that's pointless overhead.
             | 
             | I guess they probably meant disable checking the ethernet
             | FCS, that might still work but I think it's a very bad
             | plan. I doubt this option is even exposed to network
             | operators.
        
           | ultrahax wrote:
           | Most engines I've seen will delta all the entity states
           | against the client's last acknowledged state. Costs some
           | memory and computation on both sides to keep the deltas
           | valid, but keeps the state update to under an MTU, generally.
        
         | [deleted]
        
       | nsteel wrote:
       | I think my first exposure to this was parchive files from usenet.
       | That feeling when your 3-month download of some "huge" 700MB iso
       | was corrupt, you load up quickpar and suddenly (quite a few
       | minutes later) it's all fixed! No idea how it worked at the time,
       | just magic.
        
       | nayuki wrote:
       | Using Reed-Solomon coding to recover from erasures (at known
       | positions) is relatively straightforward. It can be understood
       | with a basic knowledge of linear algebra and finite field
       | arithmetic.
       | 
       | Using RS for error correction (at initially unknown positions) is
       | quite difficult. I wrote a step-by-step guide on it including
       | demo code, and it doesn't even cover the most efficient decoding
       | algorithm (I used PGZ instead of Berlekamp-Massey):
       | https://www.nayuki.io/page/reed-solomon-error-correcting-cod...
        
         | rogers18445 wrote:
         | Usually, you would see RS used in a setting where any error is
         | going to be erasure (non erasure error -> failed checksum ->
         | erasure error).
         | 
         | Then you use something like LDPC codes for in-packet ECC. RS
         | for multi-packet ECC.
        
         | Hanschri wrote:
         | To understand how error correction works and to learn more
         | about Hamming codes & Reed-Solomon, 3Blue1Brown and Ben Eater
         | were invaluable. 3Blue1Brown and Ben Eater are by far some of
         | the best educational content creators within their fields,
         | mathematics and computer engineering respectively.
         | 
         | I would strongly recommend anyone interested in the topic to
         | check out any of these videos:
         | 
         | How to send a self-correcting message (Hamming codes):
         | https://www.youtube.com/watch?v=X8jsijhllIA
         | 
         | Hamming codes part 2, the elegance of it all:
         | https://www.youtube.com/watch?v=b3NxrZOu_CE
         | 
         | And any of Ben Eater's five videos on error correction:
         | https://eater.net/crc
         | 
         | As an aside, Ben Eater does all of his videos and
         | demonstrations using an 8-bit computer he has built step by
         | step in videos on a breadboard. Very impressive and inspiring.
        
         | abdullahkhalids wrote:
         | Is there a resource which lists out different codes and the
         | decoding algorithms that work with them?
        
       | aortega wrote:
       | There are many types of Reed-Solomon codes, each with different
       | amount of redundancy, they are a special case of BCH codes, a
       | kind of code that approach the Shannon limit (the maximum rate of
       | error-free data that can be transferred over a noisy channel).
       | The bandwidth of a channel (like radio, or optical fiber) is not
       | really limited by the media, but only by the noise it has.
       | Optical fiber have almost zero noise, that's why they are so
       | fast.
       | 
       | There are much better codes nowadays that are closer to the
       | Shannon limit, like LDPC, or convolutional codes. But they are
       | usually much more computationally intensive. They are used in
       | space probes where computation time don't matter, but you often
       | have channels with much more noise than signal.
        
       | terramars wrote:
       | I implemented the RS encoder that ended up being used on the
       | production satellite bus for the telemetry system in space
       | systems / Loral upgraded bus during my engineering coop in
       | school! All in verilog. It's complicated especially the decoder
       | but understandable with time. The turbo codes and more modern
       | extensions that enable lte / 5g and other low power / small
       | antenna applications are absolute black magic though. Such a cool
       | field!
        
       | agsamek wrote:
       | Reed-Solomon is the foundation of today's computing. It is used
       | in data storage (hdd, ssd) and in data transfer protocols. It
       | allows for building of a reliable system on top of an unreliable
       | real life fenomens with desired level of certainty. This is so
       | incredible tech that once implemented we can just forget about it
       | in the higher level abstractions.
        
       | tromp wrote:
       | This is also the essence behind Shamir's Secret Sharing where the
       | sample at x=0 is a secret shared by k+t parties any k of which
       | can recover the secret.
       | 
       | [1] https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing
        
       | QuinnyPig wrote:
       | This is, for example, how Amazon S3 works.
        
         | kccqzy wrote:
         | It's also how GCS (and the underlying Colossus system) works.
         | 
         | https://news.ycombinator.com/item?id=11713406
        
         | nayuki wrote:
         | This is how Backblaze works.
         | https://www.backblaze.com/blog/reed-solomon/ ,
         | https://github.com/Backblaze/JavaReedSolomon ,
         | https://news.ycombinator.com/item?id=9726890
        
           | markc wrote:
           | And Sia distributed storage network https://gitlab.com/Nebulo
           | usLabs/Sia/-/blob/master/modules/er...
        
       | userbinator wrote:
       | I've found that, just as with CRCs, there's an abundance of
       | articles that show the theoretical explanation of RS, but aren't
       | much help for those wanting to actually implement it. Here's a
       | good practical explanation of implementing RS, including the GF
       | operations:
       | https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_f...
        
         | corsix wrote:
         | Alternatively, the following pair of articles, the first of
         | which is already referenced as a footnote in the OP:
         | http://www.corsix.org/content/galois-field-instructions-2021...
         | http://www.corsix.org/content/reed-solomon-for-software-raid
        
       | klodolph wrote:
       | Some additions, as "exercise for the reader":
       | 
       | 1. The finite field you choose has a minimum size. What is the
       | minimum size field 2^bits for an RS(N,K) coding system? What
       | happens when you try to construct a Reed-Solomon code with a
       | finite field that is too small?
       | 
       | 2. Consider a Reed-Solomon coding system which uses a lookup
       | table for the finite field multiplication operation that fits in
       | L1 cache. Given that the table already fits in L1 cache, how
       | could you make the encoder/decoder faster, if you had a smaller
       | finite field?
        
         | nayuki wrote:
         | > What is the minimum size field 2^bits for an RS(N,K) coding
         | system?
         | 
         | Your field size must be at least N+1, noting that you shouldn't
         | use the value 0 in the encoding matrix.
         | 
         | > What happens when you try to construct a Reed-Solomon code
         | with a finite field that is too small?
         | 
         | Your system of linear equations doesn't have enough linearly
         | independent equations.
         | 
         | > how could you make the encoder/decoder faster
         | 
         | Maybe put the lookup table in a 64-bit general-purpose register
         | and use bit shifting, or in a 128/256/512-bit SIMD register and
         | use extraction instructions (shuffle bytes, etc.).
        
       ___________________________________________________________________
       (page generated 2022-11-06 23:00 UTC)