[HN Gopher] Where Linux's load average comes from in the kernel
___________________________________________________________________
 
Where Linux's load average comes from in the kernel
 
Author : zdw
Score  : 113 points
Date   : 2022-04-18 16:43 UTC (6 hours ago)
 
(HTM) web link (utcc.utoronto.ca)
(TXT) w3m dump (utcc.utoronto.ca)
 
| [deleted]
| nousermane wrote:
| Pretty sure that running "top -b >/var/log/whatever" in the
| background would catch the culprit(s) of the originally reported
| "spikes".
| waynesonfire wrote:
| If I understand the author's use case from the first paragraph,
| the goal is to get more refined top results:
|
| "Suppose, not hypothetically, that you have a machine that
| periodically has its load average briefly soar to relatively
| absurd levels for no obvious reason; the machine is normally at,
| say, 0.5 load average but briefly spikes to 10 or 15 a number of
| times a day."
|
|     $ sudo perf sched record -- sleep 5 && sudo perf sched latency
|
| seems to do the trick. I'm not even a perf engineer; I just did
| a simple Google search. Though, that makes for pretty crummy
| blog content.
| tanelpoder wrote:
| Perf sched shows you scheduling latency (time spent waiting on
| the CPU runqueue in Runnable state), but not "demand" for (some
| of) the system resources like the load figure tries to estimate.
|
| I think the author didn't realize that the unit of load is just
| "number of threads" doing _something_ (wanting to be on CPU,
| running on CPU, or waiting for I/O in D state on Linux; just CPU
| stuff on other Unixes).
|
| Load average is just the "average number of threads" doing
| something over the last 1, 5, and 15 minutes.
|
| So if just the single number averaged over multiple minutes is
| not good enough for drilling down into your load spikes, then
| you just go look into the data source (not necessarily the
| source code) yourself. Just use ps or the /proc filesystem to
| list the _number of threads_ that are currently in either R or D
| state. That's your system load at the current moment. If you
| want some summary/average over 10 seconds, run the same command
| 100x in a row (and sleep a bit in between), count all threads in
| R & D state, and then divide by 100 to normalize it to an
| average.
|
| It's basically sampling-based profiling of Linux thread states.
| tanelpoder wrote:
| When I was looking into this, I found that just running ps (with
| the -L option to see all threads, not just processes/thread
| group leaders) with some grep/sort/uniq was the easiest way to
| break down where the "too high Linux system load" comes from. No
| need to compile C code or have root access to drill down into
| load. And you could drill down further by sampling some
| additional /proc/PID/task/TID/ files, like "syscall" and
| "stack", to see which (blocked) syscall is contributing to the
| load and where in the kernel it is stuck. Knowing what kind of
| process/thread-level /proc files are available and
| reading/sampling them with a shell one-liner is a powerful entry
| point for performance troubleshooting and may let you postpone
| writing advanced kernel tracing scripts.
|
| For those who are interested:
|
| https://tanelpoder.com/posts/high-system-load-low-cpu-utiliz...
| anonymousDan wrote:
| Interesting link, thanks. The part I always find tricky with
| kernel debugging is distinguishing normal vs. abnormal behaviour
| (e.g. how many kworker threads is too many?).
| wbh1 wrote:
| Similar article from Brendan Gregg a few years back:
| https://www.brendangregg.com/blog/2017-08-08/linux-load-aver...
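 
A minimal sketch of the sampling approach tanelpoder describes
above, assuming procps-ng ps (-e for all processes, -L for
threads, -o state= for just the one-letter state) and a sleep
that accepts fractional seconds; the 100 samples and the 0.1s
interval are illustrative choices, not from the thread:
 
    #!/bin/sh
    # Count threads in R (runnable) or D (uninterruptible sleep)
    # state 100 times, sleeping briefly between samples, then
    # average: an approximation of "current" load over ~10s.
    total=0
    i=0
    while [ $i -lt 100 ]; do
        # ps -eLo state= prints one state letter per thread
        n=$(ps -eLo state= | grep -c '[RD]')
        total=$((total + n))
        sleep 0.1
        i=$((i + 1))
    done
    # POSIX sh has no floating point; let awk do the division
    echo "$total" | awk '{ printf "avg R+D threads: %.2f\n", $1/100 }'
 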
| tie_ wrote:
| Install atop and configure it to sample every second.
|
| I can't count the number of times this has helped me solve
| mysterious behavior. Atop is king.
| geocrasher wrote:
| Yes, atop is great for this! There's also "sar" (system
| activity report), which can do similar things. Both are quite
| helpful.
| belter wrote:
| From the code comments...
|
|     /*
|      * kernel/sched/loadavg.c
|      *
|      * This file contains the magic bits required to compute the
|      * global loadavg figure. Its a silly number but people think
|      * its important. We go through great pains to make it work
|      * on big machines and tickless kernels.
|      */
| R0b0t1 wrote:
| The comment really undersells it. The fact that it is a regular
| number, despite not being strongly tied to something real, does
| make it useful. Even when you have a lot of IO wait causing an
| extremely large number, you are being told something useful.
| bragr wrote:
| One of my old Linux sysadmin interview questions described a
| situation with a high load number but low CPU utilization.
| People who said that was impossible (or similar) got shown the
| door, and people who knew, or could reason their way to, IO
| wait as the problem would get to proceed.
| belter wrote:
| "The many load averages of Unix(es)"
|
| https://utcc.utoronto.ca/~cks/space/blog/unix/ManyLoadAverag...
| EdSchouten wrote:
| The fact that the load average also counts threads stuck on I/O
| happens to be Linux specific. On the BSDs it only measures CPU
| utilization.
|
| So it may be the case that you showed people the door simply
| because their experience was based on non-Linux operating
| systems.
| bragr wrote:
| One does typically prefer Linux experience for a _Linux_ sys
| admin, but don't get me wrong, that's just one of many
| screening questions. That said, if someone doesn't have enough
| Linux experience to know how Linux differs from *BSD or Solaris
| or whatever they have time on, that seems like a valid
| exclusion to me. I don't tend to put any stock in claims of
| being a quick study unless they've obviously done some
| interview prep on the subjects they're unfamiliar with. The
| best way to show you are a motivated, quick study is by being a
| motivated, quick study.
| tanelpoder wrote:
| There's also one interesting addition that people may not be
| aware of: synchronous I/O that blocks (like pread64/pwrite64)
| will contribute to Linux system load (threads in D state).
|
| Asynchronous I/O completion checks (libaio's io_getevents) that
| are willing to wait for I/O completion will not contribute to
| Linux system load (threads in S state).
|
| Asynchronous I/O submissions (libaio's io_submit) either
| quickly submit their I/O (a small amount of time in R state) OR
| get stuck in io_submit() if the underlying block device I/O
| queue is full. When io_submit() gets stuck, you're sleeping in
| D state, thus contributing to system load again.
| walrus01 wrote:
| One possible "correct" answer for something like this would be
| iowait caused by slow disk performance on a degraded RAID array
| that was still operational.
|
| As one of many possible theoretical scenarios.
| bigcat123 wrote:
| shon wrote:
| We used to ask the same question. TBH I haven't read the f'ing
| article yet, but... I'd summarize this as load = run-queue
| length: the number of run-eligible processes, plus those
| blocked by something, where that something is usually IO in a
| low CPU utilization scenario.
| lamontcg wrote:
| Processes stuck in a D state on stale NFS mounts.
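 
To make the loadavg.c comment quoted above concrete, here is an
awk sketch of the kernel's fixed-point averaging step. FIXED_1
and EXP_1 match the kernel's constants (include/linux/sched/
loadavg.h in current trees); the 0.5 starting average and the 12
active threads are made-up numbers, and calc_load()'s round-up
on rising load is omitted:
 
    # One 5-second tick of the 1-minute load average, as in the
    # kernel's calc_load(): new = (old*EXP + active*(FIXED_1-EXP))
    awk 'BEGIN {
        FIXED_1 = 2048          # 1 << 11, fixed-point "1.0"
        EXP_1   = 1884          # 2048 / e^(5sec/1min), 1-min decay
        load    = 0.5 * FIXED_1 # current 1-min load average
        active  = 12 * FIXED_1  # runnable + D-state threads now
        load = (load * EXP_1 + active * (FIXED_1 - EXP_1)) / FIXED_1
        printf "1-min load after one tick: %.2f\n", load / FIXED_1
    }'
    # prints ~1.42: a spike of 12 active threads pulls a 0.5
    # average up only gradually, so brief spikes get smoothed out
 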
___________________________________________________________________
(page generated 2022-04-18 23:00 UTC)