[HN Gopher] How Do Routers Work, Really?
       ___________________________________________________________________
        
       How Do Routers Work, Really?
        
       Author : turingbook
       Score  : 136 points
       Date   : 2020-09-10 19:00 UTC (3 hours ago)
        
 (HTM) web link (kamila.is)
 (TXT) w3m dump (kamila.is)
        
       | bogomipz wrote:
       | >"It needs to be routed: the router, based on L3 information,
       | decides where it needs to go, in L3 speak - it will decide which
       | host to send it to, but not how. This corresponds to the routing
       | table (or FIB)."
       | 
       | This is not correct. The FIB (forwarding information base) is
       | concerned with layer 2. The RIB (routing information base)
       | determines the next hop. The RIB is what is used to populate
       | entries in the FIB with the correct outgoing interface. These two
       | terms are basic router terms. It was kind of surprising to see
       | this statement in a post titled "How Do Routers Work, Really?"
        
         | anotherkamila_ wrote:
         | You're right, I noticed it about an hour ago -- no idea what
         | was going on in my head then :-/ Fixed already. Thank you!
        
       | Cyph0n wrote:
       | > If that is the case, my condolences.
       | 
       | As a software engineer working on IOS-XR, that gave me a chuckle
       | :p
       | 
       | In the case of enterprise- and SP-grade routers, the data-plane -
       | i.e., where the actual forwarding and lookups take place - runs
       | entirely on a dedicated network processor (NP), mainly for
       | performance reasons. Information on the NP is populated by the
       | router's operating system in response to user configuration,
       | network topology changes, or protocol state updates. On the other
       | hand, the control plane runs mainly on the CPU(s). This is
       | required so that the protocols running on the router OS (e.g.,
       | BGP) can receive and send out updates based on their state
       | machines.
        
         | peterwwillis wrote:
         | I think the simplest way for people familiar with PCs to
         | visualize it is the FirePOWER devices. Network cards plugged
         | into some slot have embedded chips which can be programmed to,
         | say, filter specific kinds of traffic, or pass it on to the host
         | CPU for more advanced logic, while the machine's central CPU
         | runs a web interface, manages local databases, downloads
         | updates, manages clusters, records metrics, etc. And either can
         | even be hot-pluggable, interchangeable blades in a larger
         | machine chassis.
         | 
         | Protocol-wise, isn't it common now for the NP on higher end
         | stuff to handle L4 and higher protocols? Or are those still
         | largely managed by the CPU?
        
           | Cyph0n wrote:
           | Yeah, NPs can handle L4 protocols, but I believe it's usually
           | a hybrid approach where the logic is split between CPU and
           | NP.
        
         | anotherkamila_ wrote:
         | > As a software engineer working on IOS-XR, that gave me a
         | chuckle :p
         | 
         | Good good :D
         | 
         | Thanks for the clear data plane / control plane explanation,
         | that's a good way to summarise the distinction. May I link to
         | it from the article?
        
           | Cyph0n wrote:
           | Thanks! Sure, go ahead!
        
       | rabuse wrote:
       | I learned a lot about networking when setting up servers in
       | racks. Had to deal with issues arising from terrible UIs on a
       | lot of the routers out there, so I just kept digging deeper and
       | deeper into how it all works. Also, if you want to look further
       | into how packets are actually routed, look into BGP, and how CDNs work.
       | Great stuff.
        
         | walshemj wrote:
         | I would start with how internal routing works before starting
         | on WAN routing.
         | 
         | I'd look at the Cisco Press and CCNA training materials.
        
       | anotherkamila_ wrote:
       | Hi, I'm the author. Uh hi w00t how why what's it doing here?! :D
       | 
       | I promise to make it better and actually finish it now! Check
       | back in a day or two I guess? Also I should post the code I
       | promised. Hello from the ADHD squirrel!
        
         | anotherkamila_ wrote:
         | Also thanks a ton for your suggestions, I really appreciate
         | them!
        
       | dnautics wrote:
       | This is great, if for no other reason than that in section 1 it
       | explains the difference between a switch and a router (which took
       | me a decade? to really understand). I really wish someone could
       | have laid it out clearly for me.
        
       | xg15 wrote:
       | > _Note that the next hop's IP address is in the router's memory
       | only: it does not appear in the packet at any time._
       | 
       | This clears some points that always puzzled me:
       | 
       | If the gateway is identified by an IP address, but the
       | destination host is also an IP address, which address exactly is
       | put into the packet? And how can a packet be routed if the
       | gateway's IP is itself part of the subnet that's supposed to be
       | routed to it. (E.g. 192.168.0.0/24 with default gateway
       | 192.168.0.1)
       | 
       | So the answer is, if I send the packet to host 1.1.1.1 but the
       | routing table has 2.2.2.2 as the next hop, the packet will have
       | 1.1.1.1 as the destination in the IP part but the _MAC of
       | 2.2.2.2_ as destination of the Ethernet part (or equivalent). It
       | doesn't matter which subnet the next hop's IP is in, as the
       | routing table isn't consulted for it anyway - it's only used in
       | ARP.
       | 
       | This leaves the question, why the indirection and why the mucking
       | around with ARP and IPs that are never used as the destination to
       | anything?
       | 
       | Couldn't you simply put the next hop's MAC address (instead of IP
       | address) into the routing table and be able to route packets just
       | as well, with a lot less complexity?
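
The lookup described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the routing table, the ARP cache, the MAC address, and the function names are all hypothetical, and a real stack would send an ARP request on a cache miss instead of raising an error.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next-hop IP, interface).
# A next hop of None means the prefix is directly connected.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("2.2.2.2"), "eth0"),
    (ipaddress.ip_network("2.2.2.0/24"), None, "eth0"),
]

# Hypothetical ARP cache mapping neighbour IPs to MAC addresses.
ARP_CACHE = {
    ipaddress.ip_address("2.2.2.2"): "aa:bb:cc:00:00:02",
}


def lookup_route(dst_ip):
    """Return the matching route with the longest prefix."""
    matches = [route for route in ROUTES if dst_ip in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen)


def build_frame(dst_ip_str):
    """Pick the L2 destination for a packet without touching its L3 destination."""
    dst_ip = ipaddress.ip_address(dst_ip_str)
    _, next_hop, iface = lookup_route(dst_ip)
    # ARP is done for the next hop, or for the destination itself if on-link.
    arp_target = next_hop if next_hop is not None else dst_ip
    dst_mac = ARP_CACHE[arp_target]  # a real stack would ARP on a miss
    # The next hop's IP is used only for the ARP lookup; it never enters the packet.
    return {"iface": iface, "eth_dst": dst_mac, "ip_dst": str(dst_ip)}


frame = build_frame("1.1.1.1")
print(frame)
```

The resulting frame carries 1.1.1.1 in the IP header and the MAC of 2.2.2.2 in the Ethernet header, which is the split the comment describes.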
        
         | monocasa wrote:
         | A lot of protocols don't end up using Ethernet as the physical
         | layer, even ones you still use today.
         | 
         | Qemu (and I think Docker too?) use SLIRP internally for access
         | between VMs which is ultimately an IP layer bridge.
         | 
         | On the WAN side (at least at one point, I could be out of date
         | here) they didn't use Ethernet, but instead IP layer routing as
         | well, on top of stuff like PPP and SONET.
        
         | james412 wrote:
         | IP addresses sharing a route have a common prefix. This is not
         | true of MAC addresses. They are allocated essentially randomly.
         | If you wanted to route solely using MAC addresses, every router
         | in the world would need a lookup table containing every MAC
         | address, and route aggregation would be impossible.
         | 
         | That's not /the/ reason why a MAC address is involved. It's
         | because that's the address for a physical device at a lower
         | layer in the stack. As others mention, IP is media-independent;
         | it cannot depend on a lower-tier addressing scheme without
         | becoming fused to that medium.
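
The aggregation point can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch; the 10.1.x.0/24 subnets are made-up examples, not anything from the article.

```python
import ipaddress

# Four adjacent /24 networks that, by assumption, sit behind the same next hop.
subnets = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# Because they share a common prefix, they collapse into a single route.
aggregated = list(ipaddress.collapse_addresses(subnets))
print(aggregated)  # [IPv4Network('10.1.0.0/22')]

# One table entry now covers every host in all four subnets.
print(ipaddress.ip_address("10.1.2.77") in aggregated[0])  # True
```

MAC addresses offer no such shared prefix per location, so a table keyed on them would need one entry per device.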
        
         | w7 wrote:
         | You can have network segments which do not use Ethernet and
         | therefore have no MAC addresses, but still use IP addressing and
         | need to be routable. It doesn't make sense to tie the next hop
         | in a table to MAC addresses, which are an implementation detail
         | of a lower layer. A good, popular example of this that you can
         | test yourself without obscure hardware is WireGuard.
        
         | rmetzler wrote:
         | If you put the next hop's MAC address in the routing table and
         | the device fails and needs to be replaced, all the routing
         | tables would need to be rewritten, because MACs are supposed to
         | be unique. You couldn't just take a spare device, configure it
         | accordingly and be done with it.
        
         | bluecmd wrote:
         | IPv6 commonly does that. Your next hop is installed as a
         | link-local fe80 entry, which is derived from the MAC address. Not
         | exactly what you're after, but it removes the need for IP numbering.
        
         | yabones wrote:
         | The reason for that is that IP is not 'integrated' with
         | layer-2 tech like Ethernet. In fact, for a very long time
         | Ethernet was only really used on local networks. Point-to-Point
         | Protocol (PPP) [1] is a completely separate data link layer
         | technology with no real concept of MAC addresses, because there
         | can only be two devices on the bus.
         | 
         | Most of the very expensive 'multilayer' switches [2] do a form
         | of this where they associate a next-hop IP with a MAC address
         | entry and store that in the TCAM or data layer. It's not used
         | as much because Cisco has a ton of patents on this type of
         | technology, and also because general purpose hardware has
         | gotten quick enough that it's not as important as it was ~15
         | years ago...
         | 
         | [1] https://en.wikipedia.org/wiki/Point-to-Point_Protocol
         | 
         | [2]
         | https://en.wikipedia.org/wiki/Multilayer_switch#Layer-3_swit...
        
         | wmf wrote:
         | Historically, some links didn't have MAC addresses and
         | different link types have different address types so it's
         | easier for the routing protocols to work in terms of IP
         | addresses.
        
         | jcrawfordor wrote:
         | To give a simplified but largely accurate summation: IP and
         | Ethernet were each designed in different time periods and
         | largely without knowledge of the other. Ethernet was
         | historically used in such a fashion that multiple hosts (more
         | than 2) occupied the same collision domain, that is, they were
         | physically connected to the same cable, or through hubs that
         | repeated frames to all interfaces without routing. This means
         | that Ethernet required an addressing scheme so that hosts on
         | the same media knew which frames were for them (higher-level
         | protocols at the time did not necessarily handle this).
         | 
         | Ethernet's addressing scheme was not designed to accommodate
         | large hierarchical networks and so is unsuitable for the IP use
         | case, but more importantly, IP was designed completely
         | separately from Ethernet, and was not used primarily with
         | Ethernet until later, so IP could not "assume" that the layer
         | below it handled addressing (typically there was either no
         | layer below [point-to-point] or only a very simple one).
         | 
         | The result is that Ethernet and IP duplicate functionality to
         | some extent. It is theoretically possible, although not common,
         | to build a network which uses only layer 3 routing without any
         | reliance on Ethernet addressing. A significant reason this is
         | rare, arguably _the_ most significant reason, is that IP is now
         | carried over Ethernet a significant majority of the time and L2
         | Ethernet devices (like switches) require the use of Ethernet
         | addressing for the network to function. You usually see "pure
         | IP" in virtual networking environments where the IP is
         | encapsulated in, well, more IP, but even then Ethernet frames
         | are sometimes used because, well, just like network hardware,
         | operating system network stacks generally expect them (examine,
         | e.g., the Linux bridge implementation). It is completely
         | possible to build network stacks and network appliances which
         | do not require the use of Ethernet but it is expensive and
         | there's not much of a motivation to do so, and you'd run into
         | issues with any kind of equipment not so designed.
         | 
         | Addressing is not the only duplicate functionality between
         | Ethernet and IP, and it's one of the less significant ones
         | since Ethernet addressing does provide utility even if not
         | strictly required. Ethernet frames are checksummed, and IP
         | headers are also checksummed, even though the Ethernet checksum
         | is already over them. The IP header checksum exists because IP
         | was historically carried over lower layers that did not provide
         | integrity checking. This is basically pure wasted space in
         | typical networks, so IPv6 drops the header checksum to remove
         | the overhead.
         | 
         | In general, though, network protocols tend to make more sense
         | when you have some awareness of the history of their
         | development. When you try to view the modern internet as an
         | elegant, monolithic design, as some authors attempt, a lot of
         | things won't make sense because they simply are that way for
         | historic reasons. Ethernet and IP were each designed in the
         | '70s, but separately, and their use has accumulated significant
         | cruft since then, including some radical changes in the ways
         | that they were used (for example the transition of Ethernet
         | from shared media to point-to-point, which occurred de facto
         | earlier but became largely formalized with the introduction of
         | GbE which prohibits more than two hosts in a collision domain,
         | and of course ironically the introduction of multiple hosts in
         | a collision domain as an even larger issue with wireless
         | protocols, which requires additional handling below, or
         | actually in lieu of, the Ethernet layer, 802.11 being a
         | replacement for Ethernet that happens to behave similarly in
         | many ways for compatibility).
         | 
         | Finally, the OSI model is something that tends to add
         | complexity and confusion to these discussions, which is why I
         | doggedly discourage its use in teaching. The OSI Model
         | describes the OSI protocols, which were contemporary
         | competitors to the TCP/IP protocols. Arguably, one of the
         | reasons that the OSI protocols fell out of use (in favor of IP)
         | is exactly because they assumed seven layers, and each was
         | fairly complex. Some OSI protocols are still in use, for
         | example IS-IS (OSI layer 2) in the telecom industry and some
         | backbone IP transit, but in niches and generally being replaced
         | with IP. IP is intentionally simpler, and can be fully
         | described using four layers, what's usually referred to as the
         | TCP/IP model.
         | 
         | The OSI layers do not map 1:1 to the TCP/IP layers, even if you
         | simply ignore the ones that map more poorly as instructors
         | often do. Even worse, many instructors and textbook authors
         | feel such a strong compulsion to map modern networks to the
         | obsolete OSI model that they cram application-layer protocols
         | into OSI layers 5 and 6 in order to have examples of them. I
         | have seen cases as extreme as an instructor claiming that HTTP
         | cookies represent the session layer. This kind of thing is
         | nonsense and hinders understanding rather than contributing to
         | it. If the OSI model is taught (not a bad idea at all as
         | students should realize that TCP/IP is merely the popular way,
         | and certainly not the only way), it should be taught
         | specifically by contrasting it to the different TCP/IP model.
         | Unfortunately few instructors and website authors today seem to
         | even be aware that the OSI protocol stack existed separately
         | from IP.
         | 
         | And, if you are wondering, yes, Ethernet can be used in a
         | switched network completely independently from IP (although not
         | really in a routed network unless you are generous about how
         | you define routing). This was more common decades ago; the only
         | equipment I have ever personally encountered that used bare
         | Ethernet was a very outdated CNC setup.
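
The IPv4 header checksum mentioned in this comment is a 16-bit ones' complement sum over the header. A minimal sketch follows, assuming a 20-byte header with no options; the sample header bytes are a commonly used textbook example and the function name is hypothetical.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of all 16-bit words."""
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ipv4_header_checksum(header)
print(hex(checksum))  # 0xb861

# A receiver recomputes over the full header; a valid header yields 0.
filled = header[:10] + struct.pack("!H", checksum) + header[12:]
print(ipv4_header_checksum(filled))  # 0
```

Every router that decrements the TTL has to update this field, which is part of the overhead that IPv6 avoids by dropping the header checksum.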
        
         | swinglock wrote:
         | > It doesn't matter which subnet the next hop's IP is in, as
         | the routing table isn't consulted for it anyway - it's only
         | used in ARP)
         | 
         | You can only ARP for hosts on the same subnet as you, terrible
         | hacks excluded.
         | 
         | > This leaves the question, why the indirection and why the
         | mucking around with ARP and IPs that are never used as the
         | destination to anything?
         | 
         | Because it was designed in layers so that different layers
         | could be replaced. We didn't know we'd end up with mostly only
         | IP and Ethernet in LANs back then.
         | 
         | > Couldn't you simply put the next hop's MAC address (instead
         | of IP address) into the routing table and be able to route
         | packets just as well, with a lot less complexity?
         | 
         | It could have been done in any number of ways. It's not that
         | much complexity, though, and it would bake Ethernet MACs into
         | everything IP, even in the cases where it's not needed.
        
           | AlphaSite wrote:
           | Fiddling with ARP comes up more often than you'd think,
           | especially as a quick, easy way to handle HA.
        
       | boryas wrote:
       | I believe this piece does a good job with forwarding, but would
       | be improved by a discussion of termination.
       | 
       | Routing is only triggered when the packet is L2 terminated: the
       | destination MAC of the packet is one of the router's own MACs.
       | 
       | If the packet's destination MAC does not belong to the router, it
       | doesn't matter what is in its IP header, it will be switched in
       | the LAN it came in on.
       | 
       | This design also generalizes nicely to the case when the
       | destination IP of a routed packet is one of the router's IPs.
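
The termination rule described in this comment can be sketched as a small decision function. This is a simplified illustration, not the article's code: the MAC and IP values and the function name are hypothetical, and broadcast, multicast, and VLAN handling are deliberately ignored.

```python
# Hypothetical addresses owned by the router itself.
ROUTER_MACS = {"02:00:00:00:00:01"}
ROUTER_IPS = {"192.168.0.1", "10.0.0.1"}


def classify(eth_dst: str, ip_dst: str) -> str:
    """Decide what happens to an incoming frame."""
    # Not L2-terminated: the IP header is never looked at.
    if eth_dst.lower() not in ROUTER_MACS:
        return "switch"
    # L2-terminated and addressed to the router at L3: L3-terminated too.
    if ip_dst in ROUTER_IPS:
        return "deliver-locally"
    # L2-terminated but destined elsewhere: do a routing lookup.
    return "route"


print(classify("02:00:00:00:00:99", "192.168.0.1"))  # switch
print(classify("02:00:00:00:00:01", "8.8.8.8"))      # route
print(classify("02:00:00:00:00:01", "10.0.0.1"))     # deliver-locally
```

Note that the first case is switched even though its destination IP belongs to the router, because the check on the destination MAC comes first.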
        
         | anotherkamila_ wrote:
         | Good point. Incorporating that would require more brain than I
         | have right now (bad timezone :D), but you're right, I
         | completely left that out. May I update the article with a link
         | to this comment?
        
       | geerlingguy wrote:
       | I learned how routers _really_ work from Ericsson's seminal
       | video on the matter, The Good Warriors of the Net:
       | https://www.youtube.com/watch?v=x9XWxD6cJuY
       | 
       | Though I always thought the "router switch" was much more fun.
        
         | Spare_account wrote:
         | I watched this decades ago and forgot just enough about it that
         | I couldn't find it again recently when I tried. Thank you
        
         | jpxw wrote:
         | Just watched the whole video, amazing, nostalgic but also
         | subtly wrong in a number of annoying ways!
        
         | dec0dedab0de wrote:
         | Haha I forgot about this video. It was required viewing at my
         | first job.
        
         | sgillen wrote:
         | Haha thanks for sharing. Interesting how much emphasis there is
         | on "the ping of death" compared to literally any other exploit.
         | Does anyone know if this was really such a big problem when
         | this video came out?
        
           | schoen wrote:
           | What I remember is that the ping of death was extremely
           | surprising in terms of the number of OSes affected, the ease
           | of exploiting it, and the super-noticeable consequence of
           | instantly crashing the target machine. And it came out at a
           | time when there wasn't as much vulnerability research and
           | very few extensively cross-platform vulnerabilities.
           | 
           | Also, with the ping of death, the only way to use it was to
           | very noticeably crash systems -- not to secretly build a
           | botnet or something, as might have been done with RCE
           | vulnerabilities.
        
           | geerlingguy wrote:
           | I do remember hearing about it causing issues here and there
           | in the 90s/early 00s, but rarely. Never hear about it
           | anymore.
           | 
           | But I do remember AppleTalk causing issues more frequently on
           | a network I helped manage that had radio studios with two
           | Macs per studio, but mostly Windows PCs through the rest of
           | the building.
           | 
           | That place also had a Macintosh 512K running its phone system
           | until around 2010!
        
       | pfarrell wrote:
       | I would suggest expanding your terminology section. I know almost
       | nothing about routers and I'm lost in the first sentence of the
       | High Level Overview section: "A switch (or an L2 switch :-) ) is
       | an L2-only thing."
       | 
       | I don't know what L2 means. I suspect a definition of the various
       | levels would expand the audience for this post.
        
         | AlphaSite wrote:
         | I think you need to know your audience and cater to them,
         | trying to explain everything just ends in a book. L2 is
         | especially googleable.
        
           | pfarrell wrote:
           | This is a good point. You have to have _some_ assumptions of
           | what your audience brings.
           | 
           | I'm aware there are levels of information in an IP packet,
           | but I don't know them offhand. If I have to google something
           | on the first sentence in a high level overview, then I'm
           | likely not going to read the piece and the author has lost me
           | as a reader. Maybe I'm not the target audience, though I was
           | interested. I'm providing that as feedback for the original
           | author since the piece mentions that it's still a work in
           | progress.
        
           | [deleted]
        
           | hinkley wrote:
           | To be fair, L2 could be Layer 2 or Level 2 (cache) and it
           | might be a crapshoot what you get. You might get confused
           | trying to answer your own questions.
           | 
           | Discoverability lives in the space between overexplaining and
           | underexplaining.
        
             | wruza wrote:
             | One can just add switch, router, network, etc to the query
             | until it works. Supposedly they'll all work. Weak google fu
             | means no info today, and if OP and the author are not the
             | same person, then the latter may not even have a clue that
             | it was posted on hn, where such high standards apply. If
             | someone brought an electronics forum wiki post, should one
             | expect every TLA1 to be explained there too?
             | 
             | 1 Three Letter Acronym/Abbreviation
        
         | Cerium wrote:
         | The IP stack has the concept of layers, which function as
         | abstractions that hide the implementation of lower layers from
         | the upper layers. Layer 2 (L2) is the physical link layer - it
         | only cares about getting a packet between two devices. Layer 3
         | (L3) is where IP addresses live. As the article describes, a
         | router has functionality to send a packet towards its final
         | destination as well as get it between ports.
        
           | josteink wrote:
           | > The IP stack has the concept of layers, which function as
           | abstractions that hide the implementation of lower layers
           | from the upper layers
           | 
           | Correction: the _network_ stack has layers, where IP is one
           | of them, near the top.
           | 
           | Which is why most software targets IP. It's a good
           | abstraction and it's portable.
        
             | cameronh90 wrote:
             | GP may be referring to the "TCP/IP model" which does indeed
             | define the layers used in common parlance. This model has 4
             | layers in contrast to the OSI model's 7 layers. The TCP/IP
             | model is closer to how most real life network stack
             | implementations are defined.
             | 
             | Arguably even this layering system is too rigid for reality
             | but it's a decent model. See RFC 3439 section 3.
        
           | varjag wrote:
           | L1 is (naturally) physical. L2 is data link.
        
           | Cyph0n wrote:
           | L1 is the physical layer. L2 is the MAC layer.
        
         | tejohnso wrote:
         | https://en.wikipedia.org/wiki/OSI_model#Layer_2:_Data_Link_L...
        
         | hinkley wrote:
         | Reading the replies, I somewhat doubt whether you still know
         | what L2 means. The danger of being a nerd is sometimes you say
         | a lot of words but they don't mean anything.
         | 
         | Ethernet. L2 means Ethernet (or WiFi). Ethernet is the envelope
         | we put Internet traffic in (L3) and the layers above that are
         | about nailing down how exactly a conversation is managed.
         | Sometimes people get upset about what constitutes Layers 5-7,
         | especially since that Tim Berners-Lee joker ruined all the
         | pretty pictures with HTTP. So mostly we only talk about 2,3,4
         | and 7, in the same way you don't bring up religion or politics
         | at a family reunion.
        
         | mav3rick wrote:
         | L3 => IPs L2 => MAC addresses
        
         | msla wrote:
         | It's important to keep layering in mind when talking to people
         | outside the IETF, but the IETF itself is not impressed:
         | 
         | https://en.wikipedia.org/wiki/Internet_protocol_suite#Compar...
         | 
         | > The IETF protocol development effort is not concerned with
         | strict layering. Some of its protocols may not fit cleanly into
         | the OSI model, although RFCs sometimes refer to it and often
         | use the old OSI layer numbers. The IETF has repeatedly stated
         | that Internet protocol and architecture development is not
         | intended to be OSI-compliant. RFC 3439, referring to the
         | Internet architecture, contains a section entitled: "Layering
         | Considered Harmful".
         | 
         | Anyway: People sometimes like to pretend that OSI is a model
         | and TCP/IP implements the model, forgetting that OSI is/was a
         | protocol stack and TCP/IP has no interest in being "compliant"
         | with any other protocol stack to the extent it mimics its
         | layering architecture.
        
           | _jal wrote:
           | For me the OSI tends to come up at work to talk about scope
           | or areas of control. People will say "that happens in layer
           | 3" (for instance) as shorthand, not as a referent that
           | corresponds to any actual thing.
        
           | jlmcguire wrote:
           | This is one of those cases where both sides have some insight
           | depending on viewpoint. The OSI model is like every other
           | model. It isn't reality (at least in TCP/IP) but instead is a
           | helpful abstraction esp. around troubleshooting and
           | understanding networking concepts. There comes a point where
           | the model breaks down but that doesn't mean it's an unhelpful
           | model just that it isn't a complete picture. I try and work
           | networking problems through the OSI layer model but am aware
           | when things don't really fit well into it (MPLS, MSS, ARP,
           | Layer 5-7).
        
             | msla wrote:
             | I agree with you, except that the use of the OSI model
             | seems to be distorting history: TCP/IP went up against OSI
             | and won, even though OSI was favored, because TCP/IP could
             | get working systems faster. That's a lesson which should be
             | learned, but it gets obscured if you think that TCP/IP
             | implemented OSI and there never was a competition.
             | 
             | Plus, the OSI model is rather complicated; there's a
             | "TCP/IP Model" with four layers which is a lot simpler:
             | 
             | https://www.geeksforgeeks.org/tcp-ip-model/
             | 
             | > Process/Application Layer
             | 
             | > Host-to-Host/Transport Layer
             | 
             | > Internet Layer
             | 
             | > Network Access/Link Layer
             | 
             | (This seems to be the RFC 1122 model, BTW.)
             | 
             | RFC 1122 and RFC 871 each have models, too.
             | 
             | RFC 871 has:
             | 
             | > Application/Process
             | 
             | > Host-to-host
             | 
             | > Network interface
             | 
             | https://en.wikipedia.org/wiki/Internet_protocol_suite
        
         | Johnny555 wrote:
         | I don't think the post is meant to be a beginners level
         | introduction to networking, the author writes:
         | 
         |  _This is the inside view of how exactly a router operates. You
         | only need to know this if you are poking inside a router
         | implementation. If that is the case, my condolences._
         | 
         | If you're poking inside a router implementation, it seems fair
         | to expect that you have a basic understanding of OSI networking
         | layers.
        
         | IncRnd wrote:
         | This refers to Layer 2 in the OSI model of the network stack.
         | See https://en.wikipedia.org/wiki/OSI_model
         | 
         | 1. physical layer, 2. data link, 3. network, 4. transport, 5.
         | session, 6. presentation, 7. application layer.
         | 
         | So, many switches are layer 2, but layer 3 switches are often
         | referred to as switching routers. This can cause two different
         | switches to act differently from each other in certain network
         | environments. It isn't that one switch "doesn't work" but that
         | it isn't a router.
         | 
         | A router is nominally a L3 device, though most actually are
         | L1-7. To work, you need L1 & L2, but in today's world, there
         | are applications and interfaces that move the router across
         | L1-7, though not to the same depth as purpose built application
         | devices for example. Topping this off, some routers will switch
         | and some will not. It's the same wide-world of words that we
         | see across the whole computer industry.
         | 
         | The OSI model differs from the TCP/IP model of networking, even
         | though both use numbered layers.
        
       ___________________________________________________________________
       (page generated 2020-09-10 23:00 UTC)