[HN Gopher] 5Gbps Ethernet on the Raspberry Pi Compute Module 4 ___________________________________________________________________ 5Gbps Ethernet on the Raspberry Pi Compute Module 4 Author : geerlingguy Score : 124 points Date : 2020-10-30 18:25 UTC (4 hours ago) (HTM) web link (www.jeffgeerling.com) (TXT) w3m dump (www.jeffgeerling.com) | ProAm wrote: | That was a fun read. Thanks. | unilynx wrote: | > "I need four computers, and they all need gigabit network | interfaces... where could I find four computers to do this?" | | Why not loop the ports back to themselves? IIRC, 1gbit ports | should autodetect when they're cross connected so it wouldn't | even need special cables | geerlingguy wrote: | Would that truly be able to test send / receive of a full (up | to) gigabit of data to/from the interface? If it's loopback, it | could test either sending 500 + receiving 500, or... sending | 500 + receiving 500. It's like sending data through localhost; | it doesn't seem to reflect a more real-world scenario (but | could be especially helpful just for testing). | nitrogen wrote: | I think maybe they meant linking Port 1 to Port 2, and Port 3 | to Port 4? Also I believe gigabit ethernet can be full | duplex, so you should be able to send 1000 and receive 1000 | on a single interface at the same time if it's in full duplex | mode. | adrian_b wrote: | When you loop back Ethernet links in the same computer, you | need to take care with the configuration, because normally the | operating system will not route the Ethernet packets through | the external wires but will process them as if they were for | localhost, so you will see a very high speed that has no | relationship to the Ethernet speed. | | How to force the packets through the external wires depends on | the operating system. On Linux you must use namespaces and | assign the two Ethernet interfaces that are looped on each | other to two distinct namespaces, then set appropriate routes.
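adrian_b's namespace approach can be sketched roughly as follows. This is a hypothetical example, not from the thread: eth1/eth2 and the 10.0.0.x addresses are placeholders for whichever two ports you join with a cable, and every command needs root.

```shell
# Move each looped-back port into its own network namespace so the
# kernel cannot short-circuit the traffic through the loopback path.
ip netns add left
ip netns add right
ip link set eth1 netns left
ip link set eth2 netns right

# Address and bring up each side inside its namespace.
ip netns exec left  ip addr add 10.0.0.1/24 dev eth1
ip netns exec right ip addr add 10.0.0.2/24 dev eth2
ip netns exec left  ip link set eth1 up
ip netns exec right ip link set eth2 up

# Traffic between the namespaces now really crosses the wire.
ip netns exec right iperf3 -s -D
ip netns exec left  iperf3 -c 10.0.0.2
```

For a single direct link the connected /24 routes added with the addresses should be enough; adrian_b's "set appropriate routes" becomes relevant once more than one looped pair is chained together.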
| q3k wrote: | Seems to be in the same ballpark as when I got ~3.09Gbps on the | Pi4's PCIe, but on a single 10G link: | https://twitter.com/q3k/status/1225588859716632576 | geerlingguy wrote: | Oh, nice! How did I not find your tweets in all my searching | around? | q3k wrote: | Shitposting on Twitter makes for bad SEO :). | baybal2 wrote: | A much easier option: | | Get a USB 3.0 2.5G or 5G card. With a fully functional DMA on the | USB controller it can get quite close to the PCIe option. | | A setback for all Linux users at the moment: | | The only chipmaker making USB NICs doing 2.5G+ is RealTek, and | RealTek chose to use USB NCM API for their latest chips. | | And as we know, Linux support for NCM is currently super slow and | buggy. | | I barely got 120megs from it. Will welcome any kernel hacker | taking on the problem. | [deleted] | vetinari wrote: | > The only chipmaker making USB NICs doing 2.5G+ is RealTek, | and RealTek chose to use USB NCM API for their latest chips. | | QNAP QNA-UC5G1T uses Marvell AQtion AQC111U. Might be worth a | try. | escardin wrote: | It's probably outside the scope (and possibly cheating) but could | a DPDK stack & supported NIC[1] push you past the PCIe limit? | | [1] https://core.dpdk.org/supported/ | q3k wrote: | Does DPDK actually let you not have to DMA packet data over to | the system memory and back? | geerlingguy wrote: | I think I've found the bottleneck now that I have the setup up | and running again today--ksoftirqd quickly hits 100% CPU and | stays that way until the benchmark run completes. | | See: https://github.com/geerlingguy/raspberry-pi-pcie- | devices/iss... | iscfrc wrote: | You might want to try enabling jumbo frames by setting the MTU | to something >1500 bytes. Doing so should reduce the number of | IRQs per unit of time since each frame will be carrying more | data and therefore there will be fewer of them. | | According to the Intel 82580EB datasheet[1] it supports an MTU | of "9.5KB."
It's unclear if that means 9500 or 9728 bytes. | | I looked briefly for a datasheet that includes the Ethernet | specs of the Broadcom BCM2711 but didn't immediately find | anything. | | Recent versions of iproute2 can output the maximum MTU of an | interface via: ip -d link list # Look for "maxmtu" in the | output | | Barring that you can try incrementally upping the MTU until you | run into errors. | | The MTU of an interface can be set via: ip link | set $interface mtu $mtu | | Note that for symmetrical testing via direct crossover you'll | want to have the MTU be the same on each interface pair. | | [1] | https://www.intel.com/content/www/us/en/embedded/products/ne... | (pg. 25, "Size of jumbo frames supported") | geerlingguy wrote: | I set the MTU to its max (just over 9000 on the intel, heh), | but that didn't make a difference. The one thing that did | move the needle was overclocking the CPU to 2.147 GHz (from | base 1.5 GHz clock), and that got me to 3.4 Gbps. So it seems | to be a CPU constraint at this point. | neurostimulant wrote: | I wonder if using a user-space TCP stack (or anything that | could bypass the kernel) could push the number higher. | syoc wrote: | I would have a look at sending data with either DPDK | (https://doc.dpdk.org/burst-replay/introduction.html) or | AF_PACKET and mmap (https://sites.google.com/site/packetmmap/ ) | | You can also use ethtool -C on the NICs on both ends of the | connection to rate-limit the IRQ handling, allowing you | to optimize for throughput instead of latency. | drewg123 wrote: | _So theoretically, 5 Gbps was possible_ | | No, it is not. That NIC is a PCIe Gen2 NIC. By using only a | single lane, you're limiting the bandwidth to ~500MB/sec | theoretical. That's 4Gb/s theoretical, and getting 3Gb/s is ~75% | of the theoretical bandwidth, which is pretty decent.
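The jumbo-frame and interrupt-coalescing suggestions above might look something like this in practice. This is a sketch only: eth1 and the numeric values are placeholders, and which coalescing knobs exist varies by NIC and driver.

```shell
# Find the interface's maximum MTU (look for "maxmtu" in the output).
ip -d link list dev eth1

# Enable jumbo frames: bigger frames -> fewer frames -> fewer IRQs.
# Set the same MTU on both ends of each link.
ip link set eth1 mtu 9000

# Inspect, then raise, interrupt coalescing so the NIC batches
# interrupts, trading a little latency for throughput.
ethtool -c eth1
ethtool -C eth1 rx-usecs 100 rx-frames 64
```

Both changes need repeating on every interface in the test, including the link partners, or the mismatched side becomes the new bottleneck.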
| geerlingguy wrote: | I'll take pretty decent, then :) | | I mean, before this the most I had tested successfully was a | little over 2 Gbps with three NICs on a Pi 4 B. | drewg123 wrote: | Can you run an lspci -vvv on the Intel NIC? I just re-read | things, and it seems like 1 of those Gb/s is coming from the | on-board NIC. I'm curious if maybe PCIe is running at Gen1. | geerlingguy wrote: | Here you go! https://pastebin.com/A8gsGz3t | drewg123 wrote: | So it's running Gen2 x1, which is good. I was afraid that | it might have downshifted to Gen1. Other threads point to | your CPU being pegged, and I would tend to agree with | that. | | What direction are you running the streams in? In | general, sending is much more efficient than receiving | ("it's better to give than to receive"). From your | statement that ksoftirqd is pegged, I'm guessing you're | receiving. | | I'd first see what bandwidth you can send at with iperf | when you run the test in reverse so this pi is sending. | Then, to eliminate memory bw as a potential bottleneck, | you could use sendfile. I don't think iperf ever | supported sendfile (but it's been years since I've used | it). I'd suggest installing netperf on this pi, running | netserver on its link partners, and running "netperf | -tTCP_SENDFILE -H othermachine" to all 5 peers and see | what happens. | stkdump wrote: | Well, when a LAN is 1Gb/s they are actually not talking about | real bits. It actually is 100MB/s max, not 125MB/s as one might | expect. Back in the old days they used to call it baud. | wmf wrote: | This is wrong; 1 Gbps Ethernet is 125 MB/s (including | headers/trailer and inter-packet gap so you only get ~117 in | practice). Infiniband, SATA, and Fibre Channel cheat but | Ethernet doesn't. | geerlingguy wrote: | Sorry about the slightly-clickbaity title.
I actually have at | least a 10 GbE card (and switch) on the way to test those and see | if I can get more out of it, but for _this_ test, I had a | 4-interface Intel I340-T4, and I managed to get a maximum | throughput of 3.06 Gbps when pumping bits through all 4 of those | plus the built-in Gigabit interface on the Compute Module. | | For some reason I couldn't break that barrier, even though all | the interfaces can do ~940 Mbps on their own, and any three on | the PCIe card can do ~2.8 Gbps. It seems like there's some sort | of upper limit around 3 Gbps on the Pi CM4 (even when combining | the internal interface) :-/ | | But maybe I'm missing something in the Pi OS / Debian/Linux | kernel stack that is holding me back? Or is it a limitation on | the SoC? I thought the Ethernet chip was separate from the PCIe | lanes on it, but maybe there's something internal to the BCM2711 | that's bottlenecking it. | | Also... tons more detail here: | https://github.com/geerlingguy/raspberry-pi-pcie-devices/iss... | wil421 wrote: | Do you think an SFP+ NIC would work? It would be cool to try | out fiber. | baybal2 wrote: | There is no SFP option on 5gbps NICs as I understand it, per | the standard | mmastrac wrote: | Awesome work. Been watching your videos on these (the video | card one was especially interesting). | | At what point are you saturating the poor little ARM CPU (or | its tiny PCIe interface)? | geerlingguy wrote: | Heh, I know that ~3 Gbps is the maximum you can get through | the PCIe interface (x1, PCIe 2.0), so that is expected. But I | was hoping the internal ethernet interface was separate and | could add 1 Gbps more... the CPU didn't seem to be maxed | out and was also not overheating at the time (especially not | with my 12" fan blasting on it). | dualboot wrote: | With some tuning you should be able to saturate the PCIe x1 | slot. | | Excellent reading on this available here : | | http://www.intel.com/content/dam/doc/application- | note/82575-...
| | and here : | | https://blog.cloudflare.com/how-to-achieve-low-latency/ | | _Edit : with the inbound 10Gb card referenced_ | toast0 wrote: | Was all this TCP? You might try UDP as well, in case you're | hitting a bottleneck in the TCP stack. | stratosmacker wrote: | Jeff, | | First off, thank you for doing this kind of 'r&d', it is really | exciting to see what the Pi is capable of after less than a | decade. | | Would you be interested in someone testing a SAS PCI card? I'm | going to pick up one of these as soon as they're not | backordered... | monocasa wrote: | You might be hitting the limits of the RAM. I think LPDDR3 | maxes out at ~4.2Gbps, and running other bus masters like the | HDMI and OS itself would be cutting into that. | wmf wrote: | 32-bit LPDDR4-3200 should give 12.8 Gbytes/s which is 102 | Gbits/s. | monocasa wrote: | You can't just multiply width*frequency for DRAM these | days, as much as I wish we still lived in the days of | ubiquitous SRAM. | | The chip in some of the 2GB RPI4s is rated for only | 3.7Gbps. | | https://www.samsung.com/semiconductor/dram/lpddr4/K4F6E304H | B... | wmf wrote: | No, that chip is rated for 3.7 Gbps _per pin_ and it's | 32 bits wide. Even at ~60% efficiency you're an order of | magnitude off. | monocasa wrote: | Real world tests are seeing around 3 to 4 Gbps of memory | bandwidth. | | https://medium.com/@ghalfacree/benchmarking-the- | raspberry-pi... | | LPDDR cannot sustain anywhere near the max speed of the | interface. It's more of a hope that you can burst | something out and go to sleep rather than trying to | maintain that speed. In a lot of ways DRAM hasn't gotten | faster in decades when you look at how latency in clock | cycles nearly always increases at the same rate as | interface speeds. And LPDDR is the niche where that shines | the most, because it doesn't have oodles of dies to | interleave to hide that issue. | mlyle wrote: | Bits aren't bytes.
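The bits-versus-bytes confusion is the crux of this subthread; as a back-of-the-envelope sanity check on the figures quoted above (wmf's 32-bit LPDDR4-3200 number and drewg123's PCIe Gen2 x1 ceiling), the arithmetic works out as:

```python
# Sanity-check the figures traded above (bits vs. bytes).

# 32-bit LPDDR4-3200: 3200 MT/s, 4 bytes transferred per cycle.
lpddr4_bytes_per_s = 3200e6 * 4                # 12.8 Gbytes/s
lpddr4_bits_per_s = lpddr4_bytes_per_s * 8     # ~102 Gbits/s, wmf's figure

# PCIe Gen2 x1: 5 GT/s on the wire, 8b/10b encoding (8 payload bits
# carried per 10 wire bits).
pcie_wire_bits_per_s = 5e9
pcie_payload_bits_per_s = pcie_wire_bits_per_s * 8 / 10   # 4 Gbits/s
pcie_payload_bytes_per_s = pcie_payload_bits_per_s / 8    # 500 Mbytes/s

print(lpddr4_bits_per_s / 1e9)        # Gbits/s of theoretical DRAM bandwidth
print(pcie_payload_bits_per_s / 1e9)  # Gbits/s of theoretical PCIe bandwidth
```

So the theoretical DRAM interface is an order of magnitude above 5 Gbits/s of network I/O, and the PCIe x1 link, not the RAM, is the 4 Gbits/s hard ceiling drewg123 describes.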
| monocasa wrote: | The y axis is labeled "megabits per second". | hedgehog wrote: | Those numbers look way off, maybe they mixed up the | units? Should be a few GBps at least. | wmf wrote: | Innumeracy strikes again. It's actually 4-5 Gbytes/s [1] | plus whatever bandwidth the video scanout is stealing | (~400 Mbytes/s?). That's only ~40% efficient which is | simultaneously terrible and pretty much what you'd expect | from Broadcom. However 4 Gbytes/s is 32 Gbits/s which | leaves plenty of headroom to do 5 Gbits/s of network I/O. | | [1] | https://www.raspberrypi.org/forums/viewtopic.php?t=271121 | mmastrac wrote: | Is there a way to see if you are hitting memory bandwidth | issues in Linux? | monocasa wrote: | Not in a holistic way AFAIK, and for sure not rigged up | to the Raspbian kernel (since all of that lives on the | videocore side), but I bet Broadcom or the RPi foundation | has access to some undocumented perf counters on the DRAM | controller that could illuminate this if they were the | ones debugging it. | CyberDildonics wrote: | Instead of lying and then apologizing once you get what you | want, it would be better to just not lie in the first place. | geerlingguy wrote: | Technically it's not a lie--there are 5x1 Gbps of interfaces | here. But I wanted to acknowledge that I used a technicality | to get the title how I wanted it, because if I didn't do | that, a lot of people wouldn't read it, and then we wouldn't | get to have this enlightening discussion ;) | ksec wrote: | >Sorry about the slightly-clickbaity title. | | Well yes because 5Gbps Ethernet is actually a thing (NBase-T | or 5GBASE-T). So 1Gbps x 5 would be more accurate. | | Can't wait to see results on 10GbE though :) | | P.S. I really wish 5Gbps Ethernet were more common. | geerlingguy wrote: | True true...
though in my work trying to get a flexible 10 | GbE network set up in my house, I've found that the support | for 2.5 and 5 GbE is iffy at best on many devices :( | ncrmro wrote: | My ATT router made by Nokia has one 5GbE port and the fiber | plugs in directly with SFP! | StillBored wrote: | It's a single-lane PCIe Gen2 interface. The max theoretical is | 500MB/sec. So you can't ever touch 10G with it. In reality | getting 75% of theoretical on PCIe tends to be a rough upper | limit on most PCIe interfaces, so the 3Gbit you're seeing is | pretty close to what one would expect. | | edit: Oh, it's 3Gbit across 5 interfaces, one of which isn't | PCIe, so the PCIe side is probably only running at about 50%. | It might be interesting to see if the CPUs are pegged (or just | one of them). Even so, PCIe on the rpi isn't coherent so that | is going to slow things down too. | leptons wrote: | >It might be interesting to see if the CPUs are pegged (or | just one of them). | | This is very likely the answer. I see a lot of people who | think of the Pi as some kind of workhorse and are trying to | use it for things that it simply can't do. The Pi is a great | little piece of hardware, but it's not really made for this | kind of thing. I'd never think about using a Raspberry Pi if | I had to think about "saturating a NIC". | geerlingguy wrote: | Well it can saturate up to two, and almost three, gigabit | NICs now. So not too shabby. | | But I like to know the limits so I can plan out a project | and know whether I'm safe using a Pi, or a 3-5x more | expensive board or small PC :) | geerlingguy wrote: | It looks like the problem is `ksoftirqd` gets pegged at 100% | and the system just queues up packets, slowing everything | down. See: https://github.com/geerlingguy/raspberry-pi-pcie- | devices/iss... | StillBored wrote: | So, this is sorta indicative of an RSS problem, but on the | rpi it could be caused by other things.
Check | /proc/interrupts to ensure you have balanced MSIs, | although that itself could be a problem too. | | edit: run `perf top` to see if that gives you a better | idea. | geerlingguy wrote: | Results: 15.96% [kernel] | [k] _raw_spin_unlock_irqrestore 12.81% [kernel] | [k] mmiocpy 6.26% [kernel] | [k] __copy_to_user_memcpy 6.02% [kernel] | [k] __local_bh_enable_ip 5.13% [igb] | [k] igb_poll | | When it hit full blast, I started getting "Events are | being lost, check IO/CPU overload!" | SoapSeller wrote: | Another idea would be to increase interrupt coalescing via | ethtool -c/C | dualboot wrote: | This is common even on x86 systems. | | You have to set the IRQ affinity to utilize the available | CPU cores. | | There is a script included with the source you used to | compile drivers called "set_irq_affinity" | | Ex (Sets IRQ Affinity for all available cores) : | | [path-to-i40epackage]/scripts/set_irq_affinity -x all ethX | geerlingguy wrote: | So like https://pastebin.com/2Z4UECPq ? -- this didn't | make a difference in the overall performance :( | dualboot wrote: | Looks like the script needs to be adjusted to function on | the Pi. | | I wish I had the cycles and the kit on hand to play with | this! ___________________________________________________________________ (page generated 2020-10-30 23:00 UTC)