An article covering everything you need to know about time on Unix. Time, a word that is entangled in everything in our lives, something we're intimately familiar with. Keeping track of it is important for many activities we do.
Linux load averages explained, including why they include the uninterruptible I/O sleep state.
How do you design tools that can protect you against too much traffic? What about slowdowns in downstream services? How do you nicely ask your clients to back off and retry later? Most importantly, when do you do that?
In this post I will show you how to break down Linux system load by the load contributor or reason. You can drill down into the “linux system load in thousands” and “high system load, but low CPU utilization” problem patterns too. The system load metric aims to represent the system “resource demand” as just a single number. On classic Unixes, it only counts the demand for CPU (threads in Runnable state). The unit of the system load metric is “number of processes/threads” (or tasks, as the scheduling unit is called on Linux). The load average is the average number of threads over a time period (the last 1, 5, and 15 minutes) that “compete for CPU” on classic Unixes, or that “either compete for CPU or wait in an uninterruptible sleep state” on Linux. Runnable state means “not blocked by anything”, ready to run on CPU: the thread is either currently running on CPU or waiting in the CPU runqueue for the OS scheduler to put it onto CPU. On Linux, the system load includes threads in both Runnable (R) and Uninterruptible sleep (D) states (typically disk I/O, but not always). So, on Linux, an absurdly high load figure can be caused by having lots of threads in Uninterruptible sleep (D) state, in addition to CPU demand.
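As a rough illustration of the R and D states described above, here is a minimal Python sketch that counts tasks in each state from `/proc/<pid>/task/<tid>/stat`. It is not taken from the linked post; the helper names (`task_state`, `count_load_contributors`, `read_all_task_stats`) are hypothetical. It also gives only an instantaneous snapshot, whereas the load average reported by the kernel is averaged over time, so the two numbers will not match exactly.

```python
import os
from collections import Counter


def task_state(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' and take the next field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_load_contributors(stat_lines):
    """Count tasks in Runnable (R) and Uninterruptible sleep (D) states."""
    states = Counter(task_state(line) for line in stat_lines)
    return {"R": states["R"], "D": states["D"]}


def read_all_task_stats(proc="/proc"):
    """Yield the stat line of every thread on the system (Linux only)."""
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            for tid in os.listdir(task_dir):
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    yield f.read()
        except OSError:
            continue  # the process or thread exited while we were scanning


if __name__ == "__main__":
    # Snapshot of the tasks that currently contribute to Linux system load.
    print(count_load_contributors(read_all_task_stats()))
```

A large D count next to a small R count would correspond to the “high system load, but low CPU utilization” pattern mentioned above.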
Learn to measure Linux CPU performance correctly. Avoid load average mistakes, understand iowait vs CPU usage, and use eBPF and PSI for accurate metrics.
One of the things that struck me the most when observing managers at work, and in particular newly instated managers, is how managers become more and more out of touch with the realities of work. There’s actually a lot of research on that from quite a few different perspectives. Safety research, for example, has interesting things to say about “work as imagined” and “work as done”. This doesn’t happen overnight, of course, but is rather a slow process - and I found it has a lot to do with the shift from doing and experiencing to planning and monitoring. In many ways, this is a shift from intuition-based thinking to analytic type II “slow thinking”, which is very different and requires very different ways of working. Unfortunately, most managers don’t get this and continue using their intuition instead of formal models when planning and monitoring - with disastrous results. This isn’t an argument against intuition or analysis, but one should be aware of which method one is employing and act accordingly. As an example, it is interesting to explore the stark difference between estimates at the individual level, where intuition works very well, and at the team/group level, where our intuition fails.
Linux measures load average differently from other operating systems. In a nutshell, it includes both CPU and disk I/O, and more. Brendan has an excellent...
Curated collection of books, papers, and articles for learning distributed systems and performance engineering
A collection of my favorite posts to read and re-read about optimizing code to an extreme. It is unlikely that I will ever need to go to the extremes that these very talented individuals go to, but it's nice …