From: Bharata B Rao

Basic description for usage and effect of CFS Bandwidth Control.

Signed-off-by: Bharata B Rao
Signed-off-by: Paul Turner
---
 Documentation/scheduler/sched-bwc.txt |  104 ++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

Index: tip/Documentation/scheduler/sched-bwc.txt
===================================================================
--- /dev/null
+++ tip/Documentation/scheduler/sched-bwc.txt
@@ -0,0 +1,104 @@
+CFS Bandwidth Control (aka CPU hard limits)
+===========================================
+
+[ This document talks about CPU bandwidth control of CFS groups only.
+  The bandwidth control of RT groups is explained in
+  Documentation/scheduler/sched-rt-group.txt ]
+
+CFS bandwidth control is a group scheduler extension that can be used to
+control the maximum CPU bandwidth obtained by a CPU cgroup.
+
+Bandwidth allowed for a group is specified using quota and period. Within
+a given "period" (microseconds), a group is allowed to consume up to "quota"
+microseconds of CPU time, which is the upper limit or the hard limit. When the
+CPU bandwidth consumption of a group exceeds the hard limit, the tasks in the
+group are throttled and are not allowed to run until the end of the period, at
+which time the group's quota is replenished.
+
+Runtime available to the group is tracked globally. At the beginning of
+every period, the group's global runtime pool is replenished with "quota"
+microseconds worth of runtime. The runtime consumption happens locally at each
+CPU by fetching runtimes in "slices" from the global pool.
+
+Interface
+---------
+Quota and period can be set via cgroup files.
+
+cpu.cfs_quota_us: the maximum allowed bandwidth (microseconds)
+cpu.cfs_period_us: the enforcement interval (microseconds)
+
+Within a period of cpu.cfs_period_us, the group as a whole will not be allowed
+to consume more than cpu.cfs_quota_us worth of runtime.
+
+The default value of cpu.cfs_period_us is 500ms and the default value
+for cpu.cfs_quota_us is -1.
+
+A group with cpu.cfs_quota_us as -1 indicates that the group has infinite
+bandwidth, which means that it is not bandwidth controlled.
+
+Writing any negative value to cpu.cfs_quota_us will turn the group into
+an infinite bandwidth group. Reading cpu.cfs_quota_us for an infinite
+bandwidth group will always return -1.
+
+System-wide settings
+--------------------
+The amount of runtime obtained from the global pool every time a CPU wants the
+group quota locally is controlled by a sysctl parameter
+sched_cfs_bandwidth_slice_us. The current default is 10ms. This can be changed
+by writing to /proc/sys/kernel/sched_cfs_bandwidth_slice_us.
+
+A quota hierarchy is defined to be consistent if the sum of child reservations
+does not exceed the bandwidth allocated to its parent. An entity with no
+explicit bandwidth reservation (i.e. no limit) is considered to inherit its
+parent's limits. This behavior may be managed using
+/proc/sys/kernel/sched_cfs_bandwidth_consistent
+
+Statistics
+----------
+The cpu.stat file lists three different stats related to CPU bandwidth control.
+
+nr_periods: Number of enforcement intervals that have elapsed.
+nr_throttled: Number of times the group has been throttled/limited.
+throttled_time: The total time duration (in nanoseconds) for which the group
+remained throttled.
+
+This file is read-only.
+
+Hierarchy considerations
+------------------------
+Each group's bandwidth (quota and period) can be set independently of its
+parent or child groups. There are two ways in which a group can get
+throttled:
+
+- it consumed its quota within the period.
+- it has quota left but the parent's quota is exhausted.
+
+In the second case, even though the child has quota left, it will not be
+able to run since the parent itself is throttled. Similarly, groups that are
+not bandwidth constrained might end up being throttled if any parent
+in their hierarchy is throttled.
+
+Examples
+--------
+1. Limit a group to 1 CPU worth of runtime.
+
+    If period is 500ms and quota is also 500ms, the group will get
+    1 CPU worth of runtime every 500ms.
+
+    # echo 500000 > cpu.cfs_quota_us /* quota = 500ms */
+    # echo 500000 > cpu.cfs_period_us /* period = 500ms */
+
+2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine.
+
+    With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
+    runtime every 500ms.
+
+    # echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
+    # echo 500000 > cpu.cfs_period_us /* period = 500ms */
+
+3. Limit a group to 20% of 1 CPU.
+
+    With 500ms period, 100ms quota will be equivalent to 20% of 1 CPU.
+
+    # echo 100000 > cpu.cfs_quota_us /* quota = 100ms */
+    # echo 500000 > cpu.cfs_period_us /* period = 500ms */
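
The arithmetic behind the examples (quota = period * CPUs wanted) and the
throttling counters in cpu.stat can be wrapped in a few shell helpers. This is
a minimal sketch and not part of the patch: the function names quota_for_cpus,
set_bandwidth and throttled_percent are hypothetical, the cgroup directory is
supplied by the caller, and cpu.stat is assumed to hold one "name value" pair
per line.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not provided by the kernel.

# quota_for_cpus PERIOD_US CPU_PERCENT
# Print the quota (microseconds) that grants CPU_PERCENT percent of one CPU
# per period. 100 means one full CPU, 200 means two CPUs.
quota_for_cpus() {
    period_us=$1
    cpu_percent=$2
    echo $(( period_us * cpu_percent / 100 ))
}

# set_bandwidth GROUP_DIR PERIOD_US CPU_PERCENT
# Write quota and period into the cgroup files under GROUP_DIR, in the same
# order as the examples (quota first, then period).
set_bandwidth() {
    group_dir=$1
    period_us=$2
    cpu_percent=$3
    quota_us=$(quota_for_cpus "$period_us" "$cpu_percent")
    echo "$quota_us" > "$group_dir/cpu.cfs_quota_us" || return 1
    echo "$period_us" > "$group_dir/cpu.cfs_period_us"
}

# throttled_percent GROUP_DIR
# Print nr_throttled as an integer percentage of nr_periods, read from
# GROUP_DIR/cpu.stat. Assumes "name value" lines; a trailing colon on the
# name is tolerated. Prints 0 when no period has elapsed yet.
throttled_percent() {
    awk '{ sub(/:$/, "", $1) }
         $1 == "nr_periods"   { periods = $2 }
         $1 == "nr_throttled" { throttled = $2 }
         END { if (periods > 0) printf "%d\n", throttled * 100 / periods; else print 0 }' \
        "$1/cpu.stat"
}
```

With GROUP_DIR standing for the group's directory in the mounted cpu cgroup
hierarchy, the call below would write the values from example 3 (100ms quota,
500ms period):

    # set_bandwidth "$GROUP_DIR" 500000 20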