[HN Gopher] Hunting a bug in the i40e Intel driver
       ___________________________________________________________________
        
       Hunting a bug in the i40e Intel driver
        
       Author : todsacerdoti
       Score  : 40 points
       Date   : 2021-07-29 21:31 UTC (1 hour ago)
        
 (HTM) web link (blog.cri.epita.fr)
 (TXT) w3m dump (blog.cri.epita.fr)
        
       | rzezeski wrote:
       | > During those tests, we noticed the machines were randomly
       | freezing after some time, so we decided to upgrade the firmware
       | of the network cards,
       | 
       | Reminds me of the various i40e Tx freezes I debugged while at
       | Joyent. Granted, this is the illumos driver, not Intel's, but
       | basically there were issues with the programming guide that I had
       | to figure out the hard way. The 700-series controllers have not
       | been the easiest to work with.
       | 
       | https://smartos.org/bugview/OS-7492 [Tx freeze when b_cont chain
       | exceeds 8 descriptors]
       | 
       | https://smartos.org/bugview/OS-7457 [i40e Tx freezes on zero
       | descriptors]
        
       | nn3 wrote:
       | Just to save you a somewhat pointless read, they didn't really
       | debug anything but just found the right forum to ask.
        
         | AceJohnny2 wrote:
         | Not entirely pointless, they did provide some useful tips (I
         | wasn't aware of Bcc), but yeah the story ends with them not
         | resolving the issue and just using a different version of the
         | driver that doesn't have the bug.
        
         | kbenson wrote:
         | They debugged the system, not the driver. The way they did that
         | was to identify and confirm it was the driver that caused the
         | problem and in what circumstances, so they could report it to
         | the people responsible for actually dealing with that.
         | 
         | That's still a form of debugging. It's all a matter of
         | perspective. If you had a hardware device that you were
         | interacting with directly in an application, and you found that
         | if you utilized it in a specific way it crashed, so you changed
         | how the application used it so it wouldn't crash, that would be
         | debugging the application, even if not really debugging the
         | hardware.
        
         | MauranKilom wrote:
         | As a counterpoint, I found the journey interesting and learned
         | a lot about various tools on the way. Only caveat is that they
         | didn't end up pinpointing the error - understandable, given
         | that they are not paid to fix bugs in Intel code, and that
         | Intel had already fixed the bug in a newer version anyway.
        
       ___________________________________________________________________
       (page generated 2021-07-29 23:00 UTC)