[HN Gopher] Where Linux's load average comes from in the kernel
       ___________________________________________________________________
        
       Where Linux's load average comes from in the kernel
        
       Author : zdw
       Score  : 113 points
       Date   : 2022-04-18 16:43 UTC (6 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
 (TXT) w3m dump (utcc.utoronto.ca)
        
       | [deleted]
        
       | nousermane wrote:
        | Pretty sure that running "top -b >/var/log/whatever" in the
        | background would catch the culprit(s) of the originally
        | reported "spikes".
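        | 
        | (For instance, roughly this, with the interval and log path
        | only as placeholders:)
        | 
        |     $ top -b -d 5 -c > /var/log/top-samples.log 2>&1 &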
        
       | waynesonfire wrote:
        | If I understand the author's use-case from the first paragraph
        | (getting more refined top results),
       | 
       | "Suppose, not hypothetically, that you have a machine that
       | periodically has its load average briefly soar to relatively
       | absurd levels for no obvious reason; the machine is normally at,
       | say, 0.5 load average but briefly spikes to 10 or 15 a number of
       | times a day. "
       | 
       | $ sudo perf sched record -- sleep 5 && sudo perf sched latency
       | 
        | seems to do the trick. I'm not even a perf engineer, I just
        | did a simple Google search. Though that makes for pretty
        | crummy blog content.
        
         | tanelpoder wrote:
          | Perf sched shows you scheduling latency (time spent waiting
          | on the CPU runqueue in the Runnable state), but not the
          | "demand" for (some of) the system resources that the load
          | figure tries to estimate.
         | 
         | I think the author didn't realize that the unit of load is just
         | "number of threads" doing _something_ (wanting to be on CPU,
         | running on CPU or waiting for I/O in D state on Linux, just CPU
         | stuff on other Unixes).
         | 
          | Load average is just the "average number of threads" doing
          | something over the last 1, 5 and 15 minutes.
         | 
          | So if just the single number averaged over multiple minutes
          | is not good enough for drilling down into your load spikes,
          | then you just go look into the data source (not necessarily
          | the source code) yourself. Just use ps or the /proc
          | filesystem to list the _number of threads_ that are
          | currently either in R or D state. That's your system load at
          | the current moment. If you want some summary/average over 10
          | seconds, run the same command 100x in a row (sleeping a bit
          | in between), count all threads in R & D state, and then
          | divide by 100 to normalize it to an average.
         | 
         | It's basically sampling-based profiling of Linux thread states.
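          | 
          | (A rough sketch of that sampling loop, assuming procps ps;
          | note that the ps invocation counts itself as one R thread:)
          | 
          |     for i in $(seq 1 100); do
          |         ps -eLo state | grep -c '^[RD]'
          |         sleep 0.1
          |     done | awk '{ sum += $1 } END { print sum / NR }'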
        
       | tanelpoder wrote:
        | When I was looking into this, I found that just running ps
        | (with the -L option to see all threads, not just
        | processes/thread group leaders) with some grep/sort/uniq was
        | the easiest way to break down where the "too high Linux
        | system load" comes from. No need to compile C code or have
        | root access to drill down into load. And you can drill down
        | further by sampling some additional /proc/PID/task/TID/
        | files, like "syscall" and "stack", to see which (blocked)
        | syscall is contributing to the load and where in the kernel
        | it is stuck. Knowing what kind of process/thread-level /proc
        | files are available and reading/sampling them with a shell
        | one-liner is a powerful entry point for performance
        | troubleshooting, and may let you put off writing advanced
        | kernel tracing scripts a while longer.
       | 
       | For those that are interested:
       | 
       | https://tanelpoder.com/posts/high-system-load-low-cpu-utiliz...
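        | 
        | (A rough version of that ps one-liner; the columns, and the
        | <pid>/<tid> placeholders below, are just illustrative, and
        | /proc/PID/task/TID/stack may need elevated permissions on
        | some kernels:)
        | 
        |     $ ps -eLo state,comm | grep '^[RD]' | sort | uniq -c |
        |           sort -rn | head
        |     $ cat /proc/<pid>/task/<tid>/syscall
        |     $ cat /proc/<pid>/task/<tid>/stack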
        
         | anonymousDan wrote:
          | Interesting link, thanks. The part I always find tricky
          | with kernel debugging is distinguishing normal vs. abnormal
          | behaviour (e.g. how many kworker threads is too many?).
        
       | wbh1 wrote:
       | Similar article from Brendan Gregg a few years back:
       | https://www.brendangregg.com/blog/2017-08-08/linux-load-aver...
        
       | tie_ wrote:
       | Install atop and configure it to sample every second.
       | 
        | I can't count the number of times this has helped me solve
        | some mysterious behavior. Atop is king.
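        | 
        | (One ad-hoc way to do that, with the log path as a
        | placeholder; the packaged service's sampling interval usually
        | lives in the distro's atop config instead:)
        | 
        |     $ atop -w /tmp/atop.raw 1 &    # raw log, 1-second samples
        |     $ atop -r /tmp/atop.raw        # replay it later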
        
         | geocrasher wrote:
         | Yes, atop is great for this! There's also "sar" (system
         | activity report) which can do similar things. Both are quite
         | helpful.
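          | 
          | (e.g. with sysstat installed, something like this prints
          | run-queue length and the load averages once a second:)
          | 
          |     $ sar -q 1 10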
        
       | belter wrote:
       | From the code comments...
       | 
        |     /*
        |      * kernel/sched/loadavg.c
        |      *
        |      * This file contains the magic bits required to compute
        |      * the global loadavg figure. Its a silly number but
        |      * people think its important. We go through great pains
        |      * to make it work on big machines and tickless kernels.
        |      */
        
         | R0b0t1 wrote:
         | The comment really undersells it. The fact that it is a regular
         | number, despite not being strongly tied to something real, does
          | make it useful. Even when you have a lot of IO wait causing
          | an extremely large number, you are still being told
          | something useful.
        
       | bragr wrote:
        | One of my old Linux sysadmin interview questions was
        | describing a situation with a high load number but low CPU
        | utilization. People who said that was impossible (or similar)
        | got shown the door, and people who knew, or could reason
        | their way to, IO wait as the problem would get to proceed.
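        | 
        | (A quick way to see that situation on a live box, as a
        | sketch: watch the blocked-thread and CPU columns together:)
        | 
        |     $ vmstat 1
        |     # a high 'b' column and 'wa' with low 'us'/'sy' means
        |     # load without much CPU use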
        
         | belter wrote:
         | "The many load averages of Unix(es)"
         | 
         | https://utcc.utoronto.ca/~cks/space/blog/unix/ManyLoadAverag...
        
         | EdSchouten wrote:
          | The fact that the load average also counts threads stuck on
          | I/O happens to be Linux specific. On the BSDs it only
          | counts CPU demand (runnable threads).
         | 
         | So it may be the case that you showed people the door, simply
         | because their experience was based on non-Linux operating
         | systems.
        
           | bragr wrote:
            | One does typically prefer Linux experience for a _Linux_
            | sysadmin, but don't get me wrong, that's just one of many
            | screening questions. That said, if someone doesn't have
            | enough Linux experience to know how Linux differs from
            | *BSD or Solaris or whatever they have time on, that seems
            | like a valid exclusion to me. I don't tend to put any
            | stock in claims of being able to be a quick study unless
            | they've obviously done some interview prep on the
            | subjects they're unfamiliar with. The best way to show
            | you are a motivated, quick study is by being a motivated,
            | quick study.
        
         | tanelpoder wrote:
         | There's also one interesting addition that people may not be
         | aware of: Synchronous I/O that blocks (like pread64/pwrite64)
         | will contribute to Linux system load (threads in D state).
         | 
          | Asynchronous I/O completion checks (libaio's io_getevents)
          | that are willing to wait for I/O completion will not
          | contribute to Linux system load (threads in S state).
          | 
          | Asynchronous I/O submissions (libaio's io_submit) either
          | quickly submit their I/O (a small amount of time in R
          | state) OR get stuck in io_submit() if the underlying block
          | device's I/O queue is full. When io_submit() gets stuck,
          | you're sleeping in D state, thus contributing to system
          | load again.
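          | 
          | (The synchronous case is easy to see for yourself; the file
          | below is just a placeholder for something big enough to
          | keep the disk busy:)
          | 
          |     $ dd if=/path/to/big_file of=/dev/null bs=1M \
          |          iflag=direct &
          |     $ grep State: /proc/$!/status   # often "D (disk sleep)"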
        
         | walrus01 wrote:
          | One possible "correct" answer for something like this would
          | be iowait caused by slow disk performance on a degraded
          | RAID array that was still operational, as one of many
          | possible theoretical scenarios.
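          | 
          | (For Linux software RAID, a degraded md array shows up
          | directly in /proc/mdstat:)
          | 
          |     $ cat /proc/mdstat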
        
         | shon wrote:
          | We used to ask the same question. TBH I haven't read the
          | f'ing article yet, but... I'd summarize this as load =
          | run-queue length: the number of run-eligible processes,
          | plus those blocked by something, where that something is
          | usually IO in a low CPU utilization scenario.
        
         | lamontcg wrote:
          | Processes stuck in a D state on stale NFS mounts.
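          | 
          | (Those are easy to spot, roughly like this; for a stale NFS
          | mount the wchan column usually points at an nfs/rpc wait:)
          | 
          |     $ ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'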
        
       ___________________________________________________________________
       (page generated 2022-04-18 23:00 UTC)