[HN Gopher] Hunting a bug in the i40e Intel driver
___________________________________________________________________
Hunting a bug in the i40e Intel driver

Author : todsacerdoti
Score  : 40 points
Date   : 2021-07-29 21:31 UTC (1 hours ago)

(HTM) web link (blog.cri.epita.fr)
(TXT) w3m dump (blog.cri.epita.fr)

| rzezeski wrote:
| > During those tests, we noticed the machines were randomly
| > freezing after some time, so we decided to upgrade the firmware
| > of the network cards,
|
| Reminds me of the various i40e Tx freezes I debugged while at
| Joyent. Granted, this is the illumos driver, not Intel's, but
| basically there were issues with the programming guide that I had
| to figure out the hard way. The 700-series controllers have not
| been the easiest to work with.
|
| https://smartos.org/bugview/OS-7492 [Tx freeze when b_cont chain
| exceeds 8 descriptors]
|
| https://smartos.org/bugview/OS-7457 [i40e Tx freezes on zero
| descriptors]
|
| nn3 wrote:
| Just to save you a somewhat pointless read, they didn't really
| debug anything but just found the right forum to ask.
|
| AceJohnny2 wrote:
| Not entirely pointless, they did provide some useful tips (I
| wasn't aware of Bcc), but yeah the story ends with them not
| resolving the issue and just using a different version of the
| driver that doesn't have the bug.
|
| kbenson wrote:
| They debugged the system, not the driver. The way they did that
| was to identify and confirm that it was the driver that caused the
| problem, and in what circumstances, so they could report it to
| the people responsible for actually dealing with that.
|
| That's still a form of debugging. It's all a matter of
| perspective. If you had a hardware device that you were
| interacting with directly in an application, and you found that
| if you utilized it in a specific way it crashed, so you changed
| how the application used it so it wouldn't crash, that would be
| debugging the application, even if not really debugging the
| hardware.
|
| MauranKilom wrote:
| As a counterpoint, I found the journey interesting and learned
| a lot about various tools on the way. The only caveat is that they
| didn't end up pinpointing the error - understandable, given
| that they are not paid to fix bugs in Intel code, and that Intel
| had already fixed the bug in a newer version anyway.
___________________________________________________________________
(page generated 2021-07-29 23:00 UTC)