Question 1

What is the difference between cgroups v1 and v2?

Accepted Answer

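cgroups v1 (2007, shipped in Linux 2.6.24) used multiple independent hierarchies, typically one per controller (cpu, memory, blkio, etc.), so a process could live in different cgroups across controllers. That made coordinated accounting difficult and led to inconsistent semantics between controllers. cgroups v2 (official since Linux 4.5, 2016) uses a single unified hierarchy where each process belongs to exactly one cgroup, and a cgroup enables controllers for its children by writing to cgroup.subtree_control. v2 also cleaned up controller interfaces (e.g. memory.max replaces memory.limit_in_bytes), gained per-cgroup pressure stall information (PSI, Linux 4.20), and replaced blkio with an io controller that cooperates with the memory controller so buffered writeback I/O is charged to the cgroup that caused it. Both versions can be mounted on the same system (a "hybrid" setup), but a given controller can be attached to only one hierarchy at a time. systemd has defaulted to the unified hierarchy since v243 (2019), and most modern distros (Fedora 31+, RHEL 9, Ubuntu 22.04+) default to v2-only.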
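
One way to see which setup a machine is running is to inspect /proc/self/cgroup. The sketch below is only an illustration: the function name classify_cgroup_setup is made up for this answer, and it assumes the documented line format hierarchy-ID:controller-list:path, in which the v2 hierarchy appears with ID 0 and an empty controller list.

```python
def classify_cgroup_setup(proc_cgroup_text: str) -> str:
    """Classify the contents of /proc/<pid>/cgroup as 'v1', 'v2', 'hybrid' or 'unknown'."""
    has_v1 = False
    has_v2 = False
    for line in proc_cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, _path = parts
        if hierarchy_id == "0" and controllers == "":
            has_v2 = True  # the unified hierarchy: "0::/some/path"
        else:
            has_v1 = True  # a v1 hierarchy, e.g. "11:cpu,cpuacct:/some/path"
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "unknown"


if __name__ == "__main__":
    # Only meaningful on Linux; other systems have no /proc/self/cgroup.
    try:
        with open("/proc/self/cgroup") as f:
            print(classify_cgroup_setup(f.read()))
    except OSError as exc:
        print(f"cannot read /proc/self/cgroup: {exc}")
```

On a v2-only host the file usually contains a single line such as 0::/user.slice/..., which this sketch reports as v2.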

Question 2

How do cgroups relate to Linux namespaces?

Accepted Answer

They are orthogonal kernel primitives. Namespaces (PID, network, mount, UTS, IPC, user, cgroup) virtualize what a process can see: its own /proc, its own network stack, its own filesystem mounts. cgroups limit what a process can use: how much CPU time, how much RAM, how many processes it may create. A container (Docker, podman, Kubernetes pod) is essentially the combination: the runtime creates new namespaces for isolation and assigns the processes to a cgroup for resource caps. Without cgroups you have isolation, but a noisy neighbor can still starve the host; without namespaces you have caps, but processes can still see and signal each other.
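
Both memberships are visible side by side under /proc. The following is a minimal sketch, assuming a Linux host; the helper names parse_ns_link and describe_process are hypothetical. It reads the namespace links in /proc/<pid>/ns (whose targets look like net:[4026531992]) and the cgroup membership in /proc/<pid>/cgroup.

```python
import os
import re
from typing import Dict, List, Tuple

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target such as 'net:[4026531992]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def describe_process(pid: str = "self") -> Dict[str, object]:
    """Collect what a process can see (namespaces) and what it may use (cgroups)."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces: Dict[str, int] = {}
    for name in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        namespaces[name] = inode
    with open(f"/proc/{pid}/cgroup") as f:
        cgroups: List[str] = f.read().splitlines()
    return {"namespaces": namespaces, "cgroups": cgroups}


if __name__ == "__main__":
    try:
        print(describe_process())
    except OSError as exc:  # not Linux, or /proc is unavailable
        print(f"cannot inspect /proc: {exc}")
```

Two processes share a namespace of a given type when the inode numbers match; whether they share a cgroup is an independent question answered by the second file.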

Question 3

What does cpu.cfs_quota_us actually do?

Accepted Answer

It is the cgroups v1 CPU bandwidth control knob. The Completely Fair Scheduler (CFS) accounts runtime in microseconds within a fixed period (cpu.cfs_period_us, default 100000 = 100 ms). cpu.cfs_quota_us sets how many microseconds of CPU time the cgroup may consume in each period, summed over all CPUs; -1 means no limit. quota=50000, period=100000 means 0.5 CPU; quota=200000, period=100000 means 2.0 CPUs. Because the quota is a total across cores, a cgroup with many runnable threads can burn it in a small fraction of the period. When a cgroup exhausts its quota mid-period, all its tasks are throttled until the next period boundary, which is the source of the infamous CFS throttling latency spikes in Kubernetes (visible as nr_throttled and throttled_time in cpu.stat; v2 reports throttled_usec). cgroups v2 collapses the two files into a single cpu.max file with the format 'quota period', where a quota of max means unlimited.
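
The arithmetic can be sketched in a few lines. This is an illustration rather than kernel code; the function names are invented here, and the inputs are assumed to be the raw text of the cgroup files described in this answer.

```python
from typing import Optional


def cpu_limit(quota_us: int, period_us: int = 100_000) -> Optional[float]:
    """Return the CPU limit in cores for a v1 quota/period pair, or None if unlimited."""
    if quota_us < 0:  # v1 writes -1 for "no limit"
        return None
    return quota_us / period_us


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse the v2 cpu.max format 'quota period'; a quota of 'max' means unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def throttled_fraction(cpu_stat_text: str) -> float:
    """Fraction of enforcement periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    print(cpu_limit(50_000, 100_000))      # 0.5
    print(cpu_limit(200_000, 100_000))     # 2.0
    print(parse_cpu_max("max 100000"))     # None
```

A throttled fraction that stays well above zero suggests the quota is too tight for the workload's burst pattern, even if average CPU usage looks low.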

Question 4

Why is the OOM killer invoked from a memory cgroup?

Accepted Answer

When a cgroup's memory.max is reached and reclaim cannot free enough, the kernel invokes a per-cgroup OOM killer that selects a victim from inside that cgroup rather than running the global OOM killer. This is critical for multi-tenant systems: a container that exceeds its limit dies without disturbing other tenants. The victim is chosen by scoring each candidate on its memory footprint (RSS, swap, and page tables), adjusted by oom_score_adj. Setting memory.oom.group=1 (v2) escalates the kill to the entire cgroup, mirroring how Kubernetes treats a pod as the failure unit. The decision shows up in dmesg as 'Memory cgroup out of memory: Killed process N (name)'. To investigate, read the oom_kill counter in memory.events.
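
A small sketch of that investigation step: memory.events is a flat "key value" file, so counting OOM kills comes down to parsing it and comparing two snapshots. The function names are hypothetical, and the sample text in the usage section is invented for illustration, not captured from a real system.

```python
from typing import Dict


def parse_flat_keyed(text: str) -> Dict[str, int]:
    """Parse a flat-keyed cgroup file such as memory.events into a dict of counters."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key and value.strip().isdigit():
            counters[key] = int(value)
    return counters


def oom_kills_between(before: str, after: str) -> int:
    """Number of OOM kills recorded between two snapshots of memory.events."""
    return parse_flat_keyed(after).get("oom_kill", 0) - parse_flat_keyed(before).get("oom_kill", 0)


if __name__ == "__main__":
    # Invented sample contents, for illustration only.
    earlier = "low 0\nhigh 3\nmax 17\noom 2\noom_kill 1\n"
    later = "low 0\nhigh 5\nmax 40\noom 4\noom_kill 3\n"
    print(parse_flat_keyed(later))
    print(oom_kills_between(earlier, later))
```

On a real host the snapshots would come from reading the memory.events file of the cgroup under investigation at two points in time.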

Question 5

How does Kubernetes set resource limits via cgroups?

Accepted Answer

kubelet translates a pod spec into cgroup writes. resources.limits.cpu: '500m' becomes a cpu.max write of '50000 100000' (50 ms quota in a 100 ms period = 0.5 CPU). resources.limits.memory: '256Mi' becomes memory.max = 268435456. requests.cpu becomes cpu.weight (v2) or cpu.shares (v1), a relative weight that only matters under contention. With the systemd cgroup driver, pod cgroups live under /sys/fs/cgroup/kubepods.slice/: Guaranteed pods sit directly under it, while Burstable and BestEffort pods are grouped under kubepods-burstable.slice and kubepods-besteffort.slice. The QoS class also influences eviction order when the node is under resource pressure. Each container gets its own child cgroup inside the pod cgroup, so container limits apply individually while the pod-level cgroup bounds the pod as a whole.
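
The conversions in this answer are plain arithmetic and can be sketched as follows. This is not kubelet's code: the function names are invented, only binary memory suffixes are handled, and the shares formula assumes the conventional mapping of 1024 shares per CPU with a floor of 2. The v2 cpu.weight mapping is left out because it depends on the runtime version.

```python
CFS_PERIOD_US = 100_000  # default 100 ms enforcement period

_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_millicores(cpu: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return int(round(float(cpu) * 1000))


def cpu_limit_to_cpu_max(cpu: str, period_us: int = CFS_PERIOD_US) -> str:
    """Build the 'quota period' string written to cpu.max for a CPU limit."""
    quota_us = parse_millicores(cpu) * period_us // 1000
    return f"{quota_us} {period_us}"


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '256Mi' to bytes (binary suffixes only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare number is already in bytes


def cpu_request_to_shares(cpu: str) -> int:
    """Map a CPU request to v1 cpu.shares, assuming 1024 shares per CPU and a floor of 2."""
    return max(2, parse_millicores(cpu) * 1024 // 1000)


if __name__ == "__main__":
    print(cpu_limit_to_cpu_max("500m"))   # 50000 100000
    print(memory_to_bytes("256Mi"))       # 268435456
    print(cpu_request_to_shares("250m"))  # 256
```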

Question 6

What is a freezer cgroup?

Accepted Answer

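The freezer pauses (freezes) every task in a cgroup and its descendants: frozen tasks keep all their state but are not scheduled, so they consume no CPU until thawed (on recent kernels the v1 freezer parks them in the TASK_FROZEN state). In v1 it is a separate controller mounted under /sys/fs/cgroup/freezer/ and driven through freezer.state (write FROZEN or THAWED); in v2 it is a core feature exposed as cgroup.freeze in every non-root cgroup (write 1 to freeze, 0 to thaw), with completion reported by the frozen field in cgroup.events. It is used by CRIU (checkpoint/restore in userspace) to snapshot containers consistently, by systemd (systemctl freeze and systemctl thaw) to suspend units, by Android to put cached background apps to sleep, and by docker pause/unpause. Unlike SIGSTOP, freezing applies to the whole cgroup subtree with a single write, is not visible to the tasks as a signal, and cannot be undone by a stray SIGCONT from another process.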
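
The v2 interface is simple enough to drive from a few lines of code. The sketch below assumes a cgroup directory path that the caller has write access to; the function names are made up for this answer, and the path in the usage section is a placeholder.

```python
from pathlib import Path


def set_frozen(cgroup_dir: Path, frozen: bool) -> None:
    """Request a freeze (True) or thaw (False) by writing to cgroup.freeze."""
    (cgroup_dir / "cgroup.freeze").write_text("1" if frozen else "0")


def is_frozen(cgroup_dir: Path) -> bool:
    """Report whether the kernel has finished freezing, per the 'frozen' key in cgroup.events."""
    for line in (cgroup_dir / "cgroup.events").read_text().splitlines():
        key, _, value = line.partition(" ")
        if key == "frozen":
            return value.strip() == "1"
    return False


if __name__ == "__main__":
    # Placeholder path: substitute a real, writable v2 cgroup directory.
    demo = Path("/sys/fs/cgroup/example.scope")
    if demo.is_dir():
        set_frozen(demo, True)
        print("frozen:", is_frozen(demo))
        set_frozen(demo, False)
    else:
        print(f"{demo} does not exist; nothing to freeze")
```

Freezing may take a moment to complete, so a caller that needs a consistent snapshot would poll is_frozen (or watch cgroup.events for changes) before proceeding.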

Control Groups (cgroups)

Why cgroups matter

Anatomy of a cgroup

The CPU controller in detail

The memory controller in detail

Common misconceptions

A concrete example

Frequently asked questions