Demystifying Kubernetes CPU Limits (and Throttling)

Recently, I've been doing some investigation into high CPU utilization occurring during routine security scans of our Wordpress websites, causing issues such as slow response times, increased errors, and other undesirable outcomes. This is typically limited to a single pod – the one the scanner randomly gets routed to – but can still be user-visible (and Pagerduty-activating), so we want to get better monitoring on it.

Like anyone else in IT investigating something they're not sure of, I turned first to Google. I sought out what other people are doing to monitor CPU usage of pods in Kubernetes. This is what first led me to discover that it's actually far more useful to monitor how much the CPU is being throttled rather than how much it's being used.

I already knew of the kubernetes-mixin project, which provides sane default Prometheus alerting rules for monitoring Kubernetes cluster health, so I looked there first to see what rules they are using to monitor CPU. Currently, the only CPU usage alert bundled in is "CPUThrottlingHigh", which calculates number_of_cpu_cycles_pod_gets_throttled / number_of_cpu_cycles_total (not actual metric names) to give you a percentage of how frequently your pod is getting its CPU throttled.

But wait, what does throttled even mean? Throttled (at least in my mind) means something along the lines of just getting slowed down, but in this case throttled means completely stopped – you cannot use any more CPU until the next CFS period (every 100ms in Kubernetes, which is also the Linux default – more on this later). While abstractly this seems pretty cut and dried, it gets more confusing when you're actually looking at it in practice on production servers with tons of CPU cores.

Conceptualizing

For the purposes of this article, I'll be referring to a server with 128 CPU cores running a pod with a CPU limit of 4.0. If you are not already familiar with the concept of millicores, suffice to say that 1 millicore = 1/1000th of a CPU's time (1000 millicores = 1 whole core). This is the metric used to define CPU requests/limits in Kubernetes. Our example pod has a limit of 4.0, which is 4,000 millicores, or 4 whole cores' worth of work capability.

But how does the operating system kernel even enforce this measure? If you're familiar with how Linux containers work, you have probably heard of cgroups. Cgroups, put simply, are a way to isolate and control groups of processes such that they have no awareness of the other processes also running on the same server. It's why, when you run a Docker container, it thinks that its ENTRYPOINT + CMD is PID 1. Among other things, cgroups use the Linux CFS (Completely Fair Scheduler) to set and enforce resource limits on groups of processes. It does this by setting a quota and a period. A quota is how much CPU time you can use during a given period. Once you use up your quota, you are "throttled" until the next period, when you can begin using CPU again.

Going back to our discussion of millicores, this means that in every 100ms cfs_period in the operating system, we get 400ms of usage allowed. The reason we can get 400ms of work in a 100ms time frame is that each core is capable of doing 100ms of work in a 100ms period: 100ms x 4 cores = 400ms.

Understanding that, the reason for the throttling confusion starts to come into focus. Remember – CPU limits are based on time, not actual vCPUs. This 400ms of work can be broken up in any way: it could translate to 4 vCPUs each doing 100ms of work in a 100ms cfs_period, 8 vCPUs each doing 50ms of work, etc.

So far as I can comprehend, the theoretical upper bound of throttling is n * 100ms - limit, where n is the number of vCPUs and limit is how many milliseconds of CPU you are allotted in a 100ms window (calculated earlier as cpuLimit * 100ms). This means that the theoretical upper bound for throttling on my 128-core machine is 124 seconds of throttled time per second, because (128 cores * 100ms - 400ms) * 10 periods/second = 124,000ms = 124s.

Now things started to finally click in my brain. Note: the actual CPU throttling you see is determined by how many processes you're running and which core(s) they're assigned to by the OS scheduler.
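To make the quota/period arithmetic concrete, here's a minimal sketch converting a millicore limit into the per-period CFS quota. The function name is mine, not a real API; it just mirrors the cpuLimit * 100ms calculation above, using the 100ms default period:

```python
# Sketch: how a Kubernetes CPU limit maps onto a CFS quota.
# Assumes the default CFS period of 100ms (100,000 microseconds),
# which is both the Kubernetes and Linux default.

CFS_PERIOD_US = 100_000  # 100ms period, in microseconds

def cfs_quota_us(limit_millicores: int) -> int:
    """CPU time (microseconds) the cgroup may use per 100ms period."""
    return limit_millicores * CFS_PERIOD_US // 1000

# Our example pod: limit of 4.0 CPUs = 4000 millicores
print(cfs_quota_us(4000))  # 400000 µs = 400ms of CPU time per 100ms period
```

Once those 400,000µs are consumed – however many cores they were spread across – every thread in the cgroup is stopped until the next period begins.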
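The throttling upper bound can be sketched the same way. Again, the names are mine and this is the article's back-of-the-envelope formula (n * 100ms - limit, scaled to 10 periods per second), not a kernel API:

```python
# Sketch: theoretical worst-case throttled time per second, assuming
# every vCPU runs a hot thread for the entire 100ms CFS period.

PERIOD_MS = 100
PERIODS_PER_SECOND = 1000 // PERIOD_MS  # 10

def max_throttled_ms_per_period(vcpus: int, limit_ms: int) -> int:
    """n * 100ms of runnable work minus the limit_ms actually allowed."""
    return vcpus * PERIOD_MS - limit_ms

def max_throttled_seconds_per_second(vcpus: int, limit_ms: int) -> float:
    return max_throttled_ms_per_period(vcpus, limit_ms) * PERIODS_PER_SECOND / 1000

# 128-core machine, pod limited to 4.0 CPUs (400ms per 100ms period):
print(max_throttled_seconds_per_second(128, 400))  # 124.0
```

Real workloads land well below this bound, since the scheduler rarely puts runnable threads on every core at once, but it shows why per-second throttled time can dwarf the one second of wall-clock time it happened in.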