[HN Gopher] How Do Routers Work, Really? ___________________________________________________________________ How Do Routers Work, Really? Author : turingbook Score : 136 points Date : 2020-09-10 19:00 UTC (3 hours ago) (HTM) web link (kamila.is) (TXT) w3m dump (kamila.is) | bogomipz wrote: | >"It needs to be routed: the router, based on L3 information, | decides where it needs to go ,in L3 speak - it will decide which | host to send it to, but not how. This corresponds to the routing | table (or FIB)." | | This is not correct. The FIB(forwarding information base) is | concerned with layer 2. The RIB(routing information base) | determines the next hop. The RIB is what is used to populate | entries in the FIB with the correct outgoing interface. These two | terms are basic router terms. It was kind of surprising to see | this statement in a post titled "How Do Routers Work, Really?" | anotherkamila_ wrote: | You're right, I noticed it about an hour ago -- no idea what | was going on in my head then :-/ Fixed already. Thank you! | Cyph0n wrote: | > If that is the case, my condolences. | | As a software engineer working on IOS-XR, that gave me a chuckle | :p | | In the case of enterprise- and SP-grade routers, the data-plane - | i.e., where the actual forwarding and lookups take place - runs | entirely on a dedicated network processor (NP), mainly for | performance reasons. Information on the NP is populated by the | router's operating system in response to user configuration, | network topology changes, or protocol state updates. On the other | hand, the control plane runs mainly on the CPU(s). This is | required so that the protocols running on the router OS (e.g., | BGP) can receive and send out updates based on their state | machines. | peterwwillis wrote: | I think the simplest way for people familiar with PCs to | visualize it are the FirePOWER devices. 
Network cards plugged | into some slot have embedded chips which can be programmed to, | say, filter specific kinds of traffic, or pass it onto the host | CPU for more advanced logic. While the machine's central CPU | runs a web interface, manages local databases, downloads | updates, manages clusters, records metrics, etc. And either can | even be hot-pluggable, interchangeable blades in a larger | machine chassis. | | Protocol-wise, isn't it common now for the NP on higher end | stuff to handle L4 and higher protocols? Or are those still | largely managed by the CPU? | Cyph0n wrote: | Yeah, NPs can handle L4 protocols, but I believe it's usually | a hybrid approach where the logic is split between CPU and | NP. | anotherkamila_ wrote: | > As a software engineer working on IOS-XR, that gave me a | chuckle :p | | Good good :D | | Thanks for the clear data plane / control plane explanation, | that's a good way to summarise the distinction. May I link to | it from the article? | Cyph0n wrote: | Thanks! Sure, go ahead! | rabuse wrote: | I learned a lot about networking when setting up servers in | racks. Had to deal with issues arising from terrible UI's on a | lot of the routers out there, so I just kept digging deeper and | deeper into how it all works. Also, if more are looking into how | packets are actually routed, look into BGP, and how CDN's work. | Great stuff. | walshemj wrote: | I would start with how internal routing works before starting | on WAN routing. | | Id look at the cisco press and CCNA training materials | anotherkamila_ wrote: | Hi, I'm the author. Uh hi w00t how why what's it doing here?! :D | | I promise to make it better and actually finish it now! Check | back in a day or two I guess? Also I should post the code I | promised. Hello from the ADHD squirrel! | anotherkamila_ wrote: | Also thanks a ton for your suggestions, I really appreciate | them! 
| dnautics wrote: | this is great if for no other reason that in section 1 it | explains the difference between a switch and a router (which took | me a decade? to really understand). I really wish someone could | have laid it out clearly for me. | xg15 wrote: | > _Note that the next hop's IP address is in the router's memory | only: it does not appear in the packet at any time._ | | This clears some points that always puzzled me: | | If the gateway is identified by an IP address, but the | destination host is also an IP address, which address exactly is | put into the packet? And how can a packet be routed if the | gateway's IP is itself part of the subnet that's supposed to be | routed to it. (E.g. 192.168.0.0/24 with default gateway | 192.168.0.1) | | So the answer is, if I send the packet to host 1.1.1.1 but the | routing table has 2.2.2.2 as the next hop, the packet will have | 1.1.1.1 as the destination in the IP part but the _MAC of | 2.2.2.2_ as destination of the Ethernet part (or equivalent). It | doesn 't matter which subnet the next hop's IP is in, as the | routing table isn't consulted for it anyway - it's only used in | ARP) | | This leaves the question, why the indirection and why the mucking | around with ARP and IPs that are never used as the destination to | anything? | | Couldn't you simply put the next hop's MAC address (instead of IP | address) into the routing table and be able to route packets just | as well, with a lot less complexity? | monocasa wrote: | A lot of protocols don't end up using Ethernet as the physical | layer, even ones you still use today. | | Qemu (and I think Docker too?) use SLIRP internally for access | between VMs which is ultimately an IP layer bridge. | | On the WAN side (at least at one point, I could be out of date | here) they didn't use Ethernet, but instead IP layer routing as | well, on top of stuff like PPP and SONET. | james412 wrote: | IP addresses sharing a route have a common prefix. 
This is not | true of MAC addresses. They are allocated essentially randomly. | If you wanted to route solely using MAC addresses, every router | in the world would need a lookup table containing every MAC | address, and route aggregation would be impossible. | | That's not /the/ reason why a MAC address is involved. It's | because that's the address for a physical device at a lower | layer in the stack. As others mention, IP is media-independent; | it cannot depend on a lower tier addressing scheme without | becoming fused to that medium. | w7 wrote: | You can have network segments which do not use Ethernet and | therefore have no MAC addresses, but still use IP addressing and | need to be routable. It doesn't make sense to tie the next-hop | in a table to MAC addresses which are an implementation detail | on a lower layer. A good, popular example of this you can test | yourself without obscure hardware is WireGuard. | rmetzler wrote: | If you put the next hop's MAC address in the routing table and | the device fails and needs to be replaced, all the routing | tables would need to be rewritten, because MACs are supposed to | be unique. You couldn't just take a spare device, configure it | accordingly and be done with it. | bluecmd wrote: | IPv6 commonly does that. Your next hop is installed as a link- | local fe80-entry which is derived from the MAC address. Not | exactly what you're after, but removes the IP numbering need. | yabones wrote: | The reason for that is because IP is not 'integrated' with | layer-2 tech like Ethernet. In fact, for a very long time | Ethernet was only really used on local networks. Point-to-Point | Protocol (PPP) [1] is a completely separate data link layer | technology with no real concept of MAC addresses, because there | can only be two devices on the bus. | | Most of the very expensive 'multilayer' switches [2] do a form | of this where they associate a next-hop IP with a MAC address | entry and store that in the TCAM or data layer. 
It's not used | as much because Cisco has a ton of patents on this type of | technology, and also because general purpose hardware has | gotten quick enough that it's not as important as it was ~15 | years ago... | | [1] https://en.wikipedia.org/wiki/Point-to-Point_Protocol | | [2] | https://en.wikipedia.org/wiki/Multilayer_switch#Layer-3_swit... | wmf wrote: | Historically, some links didn't have MAC addresses and | different link types have different address types so it's | easier for the routing protocols to work in terms of IP | addresses. | jcrawfordor wrote: | To give a simplified but largely accurate summation: IP and | Ethernet were each designed in different time periods and | largely without knowledge of the other. Ethernet was | historically used in such a fashion that multiple hosts (more | than 2) occupied the same collision domain, that is, they were | physically connected to the same cable, or through hubs that | repeated frames to all interfaces without routing. This means | that Ethernet required an addressing scheme so that hosts on | the same media knew which frames were for them (higher-level | protocols at the time did not necessarily handle this). | | Ethernet's addressing scheme was not designed to accommodate | large hierarchical networks and so is unsuitable for the IP use | case, but more importantly, IP was designed completely | separately from Ethernet, and was not used primarily with | Ethernet until later, so IP could not "assume" that the layer | below it handled addressing (typically there was either no | layer below [point-to-point] or only a very simple one). | | The result is that Ethernet and IP duplicate functionality to | some extent. It is theoretically possible, although not common, | to build a network which uses only layer 3 routing without any | reliance on Ethernet addressing. 
A significant reason this is | rare, arguably _the_ most significant reason, is that IP is now | carried over Ethernet a significant majority of the time and L2 | Ethernet devices (like switches) require the use of Ethernet | addressing for the network to function. You usually see "pure | IP" in virtual networking environments where the IP is | encapsulated in, well, more IP, but even then Ethernet frames | are sometimes used because, well, just like network hardware, | operating system network stacks generally expect them (examine, | e.g., the linux bridge implementation). It is completely | possible to build network stacks and network appliances which | do not require the use of Ethernet but it is expensive and | there's not much of a motivation to do so, and you'd run into | issues with any kind of equipment not so designed. | | Addressing is not the only duplicate functionality between | Ethernet and IP, and it's one of the less significant ones | since Ethernet addressing does provide utility even if not | strictly required. Ethernet frames are checksummed, and IP | headers are also checksummed, even though the Ethernet checksum | is already over them. The IP header checksum exists because IP | was historically carried over lower layers that did not provide | integrity checking. This is basically pure wasted space in | typical networks, so IPv6 drops the header checksum to remove | the overhead. | | In general, though, network protocols tend to make more sense | when you have some awareness of the history of their | development, as when you try to view the modern internet as an | elegant, monolithic design as some authors attempt, a lot of | things won't make sense because they simply are that way for | historic reasons. 
Ethernet and IP were each designed in the | '70s, but separately, and their use has accumulated significant | cruft since then, including some radical changes in the ways | that they were used (for example the transition of Ethernet | from shared media to point-to-point, which occurred de facto | earlier but became largely formalized with the introduction of | GbE which prohibits more than two hosts in a collision domain, | and of course ironically the introduction of multiple hosts in | a collision domain as an even larger issue with wireless | protocols, which requires additional handling below, or | actually in lieu of, the ethernet layer, 802.11 being a | replacement for ethernet that happens to behave similarly in | many ways for compatibility). | | Finally, the OSI model is something that tends to add | complexity and confusion to these discussions, which is why I | doggedly discourage its use in teaching. The OSI Model | describes the OSI protocols, which were contemporaries | competitors to the TCP/IP protocols. Arguably, one of the | reasons that the OSI protocols fell out of use (in favor of IP) | is exactly because they assumed seven layers, and each was | fairly complex. Some OSI protocols are still in use, for | example IS-IS (OSI layer 2) in the telecom industry and some | backbone IP transit, but in niches and generally being replaced | with IP. IP is intentionally simpler, and can be fully | described using four layers, what's usually referred to as the | TCP/IP model. | | The OSI layers do not map 1:1 to the TCP/IP layers, even if you | simply ignore the ones that map more poorly as instructors | often do. Even worse, many instructors and textbook authors | feel such a strong compulsion to map modern networks to the | obsolete OSI model that they cram application-layer protocols | into OSI layers 5 and 6 in order to have examples of them. I | have seen cases as extreme as an instructor claiming that HTTP | cookies represent the session layer. 
This kind of thing is | nonsense and hinders understanding rather than contributing to | it. If the OSI model is taught (not a bad idea at all as | students should realize that TCP/IP is merely the popular way, | and certainly not the only way), it should be taught | specifically by contrasting it to the different TCP/IP model. | Unfortunately few instructors and website authors today seem to | even be aware that the OSI protocol stack existed separately | from IP. | | And, if you are wondering, yes, Ethernet can be used in a | switched network completely independently from IP (although not | really in a routed network unless you are generous about how | you define routing). This was more common decades ago; the only | equipment I have ever personally encountered that used bare | Ethernet was a very outdated CNC setup. | swinglock wrote: | > It doesn't matter which subnet the next hop's IP is in, as | the routing table isn't consulted for it anyway - it's only | used in ARP) | | You can only ARP for hosts on the same subnet as you, terrible | hacks excluded. | | > This leaves the question, why the indirection and why the | mucking around with ARP and IPs that are never used as the | destination to anything? | | Because it was designed in layers so that different layers | could be replaced. We didn't know we'd end up with mostly only | IP and Ethernet in LANs back then. | | > Couldn't you simply put the next hop's MAC address (instead | of IP address) into the routing table and be able to route | packets just as well, with a lot less complexity? | | It could have been done in any number of ways. It's not that | much complexity though and it would bake Ethernet MACs into | everything IP, even in the cases where it's not needed. | AlphaSite wrote: | Fiddling with ARP comes up more often than you'd think, | especially as a quick easy way to handle HA. 
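The next-hop behaviour discussed in this subthread can be sketched in a few lines of Python. This is only an illustration, not code from the article: the table contents, the MAC addresses, and the names `ROUTES`, `ARP_CACHE`, `lookup` and `build_frame` are all made up for the example, and a real stack would send an ARP request on a cache miss instead of failing. The point it tries to show is that the next hop's IP is used only to pick a MAC address, while the IP destination in the packet stays unchanged.

```python
import ipaddress

# Hypothetical routing table: (prefix, next-hop IP or None if on-link, interface).
ROUTES = [
    (ipaddress.ip_network("192.168.0.0/24"), None, "eth0"),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.0.1"), "eth0"),
]

# Hypothetical neighbour (ARP) cache: IP address -> MAC address.
ARP_CACHE = {
    ipaddress.ip_address("192.168.0.1"): "aa:aa:aa:aa:aa:01",
    ipaddress.ip_address("192.168.0.7"): "aa:aa:aa:aa:aa:07",
}


def lookup(dst):
    """Longest-prefix match; returns (neighbour IP to resolve, interface)."""
    dst = ipaddress.ip_address(dst)
    matches = [route for route in ROUTES if dst in route[0]]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, next_hop, iface = max(matches, key=lambda route: route[0].prefixlen)
    # On-link destinations are resolved directly; others via the next hop.
    return (next_hop if next_hop is not None else dst), iface


def build_frame(dst):
    """Return the addressing fields of the outgoing frame for `dst`."""
    neighbour, iface = lookup(dst)
    return {
        "eth_dst": ARP_CACHE[neighbour],  # MAC of the next hop (or of dst if on-link)
        "ip_dst": str(ipaddress.ip_address(dst)),  # never rewritten to the next hop
        "out_if": iface,
    }
```

Under these assumptions, `build_frame("1.1.1.1")` yields an Ethernet destination of the gateway's MAC with `ip_dst` still `1.1.1.1`, while `build_frame("192.168.0.7")` resolves the destination itself because the /24 route is more specific than the default route.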
| boryas wrote: | I believe this piece does a good job with forwarding, but would | be improved by a discussion of termination. | | Routing is only triggered when the packet is L2 terminated: the | destination MAC of the packet is one of the router's own MACs. | | If the packet's destination MAC does not belong to the router, it | doesn't matter what is in its IP header, it will be switched in | the LAN it came in on. | | This design also generalizes nicely to the case when the | destination IP of a routed packet is one of the router's IPs. | anotherkamila_ wrote: | Good point. Incorporating that would require more brain that I | have right now (bad timezone :D), but you're right, I | completely left that out. May I update the article with a link | to this comment? | geerlingguy wrote: | I learned how routers _really_ work from Ericsson 's seminal | video on the matter, The Good Warriors of the Net: | https://www.youtube.com/watch?v=x9XWxD6cJuY | | Though I always thought the "router switch" was much more fun. | Spare_account wrote: | I watched this decades ago and forgot just enough about it that | I couldn't find it again recently when I tried. Thank you | jpxw wrote: | Just watched the whole video, amazing, nostalgic but also | subtly wrong in a number of annoying ways! | dec0dedab0de wrote: | Haha I forgot about this video. It was required viewing at my | first job. | sgillen wrote: | Haha thanks for sharing. Interesting how much emphasis there is | on "the ping of death" compared to literally any other exploit. | Does anyone know if this was really such a big problem when | this video came out? | schoen wrote: | What I remember is that the ping of death was extremely | surprising in terms of the number of OSes affected, the ease | of exploiting it, and the super-noticeable consequence of | instantly crashing the target machine. 
And it came out at a | time when there wasn't as much vulnerability research and | very few extensively cross-platform vulnerabilities. | | Also, with the ping of death, the only way to use it was to | very noticeably crash systems -- not to secretly build a | botnet or something, as might have been done with RCE | vulnerabilities. | geerlingguy wrote: | I do remember hearing about it causing issues here and there | in the 90s/early 00s, but rarely. Never hear about it | anymore. | | But I do remember AppleTalk causing issues more frequently on | a network I helped manage that had radio studios with two | Macs per studio, but mostly Windows PCs through the rest of | the building. | | That place also had a Macintosh 512K running its phone system | until around 2010! | pfarrell wrote: | I would suggest expanding your terminology section. I know almost | nothing about routers and I'm lost in the first sentence of the | High Level Overview section. "A switch (or an L2 | switch :-) ) is an L2-only thing." | | I don't know what L2 means. I suspect a definition of the various | levels would expand the audience for this post. | AlphaSite wrote: | I think you need to know your audience and cater to them, | trying to explain everything just ends in a book. L2 is | especially googleable. | pfarrell wrote: | This is a good point. You have to have _some_ assumptions of | what your audience brings. | | I'm aware there are levels of information in an IP packet, | but I don't know them offhand. If I have to google something | on the first sentence in a high level overview, then I'm | likely not going to read the piece and the author has lost me | as a reader. Maybe I'm not the target audience, though I was | interested. I'm providing that as feedback for the original | author since the piece mentions that it's still a work in | progress. | [deleted] | hinkley wrote: | To be fair, L2 could be Layer 2 or Level 2 (cache) and it | might be a crapshoot what you get. 
You might get confused | trying to answer your own questions. | | Discoverability lives in the space between overexplaining and | underexplaining. | wruza wrote: | One can just add switch, router, network, etc to the query | until it works. Supposedly they'll all work. Weak google fu | means no info today, and if OP and the author are not the | same person, then the latter may not even have a clue that | it was posted on hn, where such high standards apply. If | someone brought an electronics forum wiki post, should one | expect every TLA1 to be explained there too? | | 1 Three Letter Acronym/Abbreviation | Cerium wrote: | The IP stack has the concept of layers, which function as | abstractions that hide the implementation of lower layers from | the upper layers. Layer 2 (L2) is the physical link layer - it | only cares about getting a packet between two devices. Layer 3 | (L3) is where IP addresses live. As the article describes a | router has functionality to send a packet towards its final | destination as well as get it between ports. | josteink wrote: | > The IP stack has the concept of layers, which function as | abstractions that hide the implementation of lower layers | from the upper layers | | Correction: the _network_ stack has layers, where IP is one | of them, near the top. | | Which is why most software targets IP. It's a good | abstraction and it's portable. | cameronh90 wrote: | GP may be referring to the "TCP/IP model" which does indeed | define the layers used in common parlance. This model has 4 | layers in contrast to the OSI model's 7 layers. The TCP/IP | model is closer to how most real life network stack | implementations are defined. | | Arguably even this layering system is too rigid for reality | but it's a decent model. See RFC 3439 section 3. | varjag wrote: | L1 is (naturally) physical. L2 is data link. | Cyph0n wrote: | L1 is the physical layer. L2 is the MAC layer. 
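For readers who prefer code to layer numbers, here is a rough Python sketch of what "L2 wraps L3" means in practice. It packs and unpacks only the 14-byte Ethernet II header (destination MAC, source MAC, EtherType) and treats the L3 packet as opaque bytes. The function names and addresses are invented for the example, and the preamble and frame check sequence that real hardware adds are left out.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value that marks the payload as an IPv4 packet


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def ethernet_frame(dst_mac, src_mac, l3_packet, ethertype=ETHERTYPE_IPV4):
    """Prefix an L3 packet with an Ethernet II header (no preamble, no FCS)."""
    header = struct.pack("!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)
    return header + l3_packet


def parse_frame(frame):
    """Split a frame back into (dst MAC, src MAC, EtherType, L3 payload)."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex(":"), src.hex(":"), ethertype, frame[14:]
```

A switch in this picture only ever looks at the first 14 bytes; a router strips them, inspects the payload's IP header, and builds a fresh Ethernet header for the next link.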
| tejohnso wrote: | https://en.wikipedia.org/wiki/OSI_model#Layer_2:_Data_Link_L... | hinkley wrote: | Reading the replies, I somewhat doubt whether you still know | what L2 means. The danger of being a nerd is sometimes you say | a lot of words but they don't mean anything. | | Ethernet. L2 means Ethernet (or WiFi). Ethernet is the envelope | we put Internet traffic in (L3) and the layers above that are | about nailing down how exactly a conversation is managed. | Sometimes people get upset about what constitutes Layers 5-7, | especially since that Tim Berners-Lee joker ruined all the | pretty pictures with HTTP. So mostly we only talk about 2,3,4 | and 7, in the same way you don't bring up religion or politics | at a family reunion. | mav3rick wrote: | L3 => IPs L2 => MAC addresses | msla wrote: | It's important to keep layering in mind when talking to people | outside the IETF, but the IETF itself is not impressed: | | https://en.wikipedia.org/wiki/Internet_protocol_suite#Compar... | | > The IETF protocol development effort is not concerned with | strict layering. Some of its protocols may not fit cleanly into | the OSI model, although RFCs sometimes refer to it and often | use the old OSI layer numbers. The IETF has repeatedly stated | that Internet protocol and architecture development is not | intended to be OSI-compliant. RFC 3439, referring to the | Internet architecture, contains a section entitled: "Layering | Considered Harmful". | | Anyway: People sometimes like to pretend that OSI is a model | and TCP/IP implements the model, forgetting that OSI is/was a | protocol stack and TCP/IP has no interest in being "compliant" | with any other protocol stack to the extent it mimics its | layering architecture. | _jal wrote: | For me the OSI tends to come up at work to talk about scope | or areas of control. People will say "that happens in layer | 3" (for instance) as shorthand, not as a referent that | corresponds to any actual thing. 
| jlmcguire wrote: | This is one of those cases where both sides have some insight | depending on viewpoint. The OSI model is like every other | model. It isn't reality (at least in TCP/IP) but instead is a | helpful abstraction esp. around troubleshooting and | understanding networking concepts. There comes a point where | the model breaks down but that doesn't mean it's an unhelpful | model just that it isn't a complete picture. I try and work | networking problems through the OSI layer model but am aware | when things don't really fit well into it (MPLS, MSS, ARP, | Layer 5-7). | msla wrote: | I agree with you, except that the use of the OSI model | seems to be distorting history: TCP/IP went up against OSI | and won, even though OSI was favored, because TCP/IP could | get working systems faster. That's a lesson which should be | learned, but it gets obscured if you think that TCP/IP | implemented OSI and there never was a competition. | | Plus, the OSI model is rather complicated; there's a | "TCP/IP Model" with four layers which is a lot simpler: | | https://www.geeksforgeeks.org/tcp-ip-model/ | | > Process/Application Layer | | > Host-to-Host/Transport Layer | | > Internet Layer | | > Network Access/Link Layer | | (This seems to be the RFC 1122 model, BTW.) | | RFC 1122 and RFC 871 each have models, too. | | RFC 871 has: | | > Application/Process | | > Host-to-host | | > Network interface | | https://en.wikipedia.org/wiki/Internet_protocol_suite | Johnny555 wrote: | I don't think the post is meant to be a beginners level | introduction to networking, the author writes: | | _This is the inside view of how exactly a router operates. You | only need to know this if you are poking inside a router | implementation. If that is the case, my condolences._ | | If you're poking inside a router implementation, it seems fair | to expect that you have a basic understanding of OSI networking | layers. 
| IncRnd wrote: | This refers to Layer 2 in the OSI model of the network stack. | See https://en.wikipedia.org/wiki/OSI_model | | 1. physical layer, 2. data link, 3. network, 4. transport, 5. | session, 6. presentation, 7. application layer. | | So, many switches are layer 2, but layer 3 switches are often | referred to as switching routers. This can cause two different | switches to act differently from each other in certain network | environments. It isn't that one switch "doesn't work" but that | it isn't a router. | | A router is nominally a L3 device, though most actually are | L1-7. To work, you need L1 & L2, but in today's world, there | are applications and interfaces that move the router across | L1-7, though not to the same depth as purpose built application | devices for example. Topping this off, some routers will switch | and some will not. It's the same wide-world of words that we | see across the whole computer industry. | | The OSI model differs from the TCP/IP model of networking, even | though both use numbered layers. ___________________________________________________________________ (page generated 2020-09-10 23:00 UTC)