* [PATCH v5 0/3] sched/fair: Burstable CFS bandwidth controller
@ 2021-05-20 12:34 Huaixin Chang
  2021-05-20 12:34 ` [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller Huaixin Chang
  ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Huaixin Chang @ 2021-05-20 12:34 UTC (permalink / raw)
To: changhuaixin
Cc: bsegall, dietmar.eggemann, dtcccc, juri.lelli, khlebnikov,
    linux-kernel, mgorman, mingo, odin, odin, pauld, peterz, pjt,
    rostedt, shanpeic, tj, vincent.guittot, xiyou.wangcong

Changelog:
v5:
- Rearrange into 3 patches, one less than the previous version.
- The interference to other groups is evaluated.
- Put a limit on burst, so that the code is further simplified.
- Rebase upon v5.13-rc3.

v4:
- Adjust assignments in tg_set_cfs_bandwidth(), saving unnecessary
  assignments when quota == RUNTIME_INF.
- Get rid of sysctl_sched_cfs_bw_burst_onset_percent, as there seems to be
  no justification for both controlling the start bandwidth and doing so
  as a percentage.
- Improve the comment on sched_cfs_period_timer() shifts, explaining why
  max_overrun shifting to 0 is a problem.
- Rename previous_runtime to runtime_at_period_start.
- Add cgroup2 interface and documentation.
- Stop exposing current_bw, as there is not enough justification for it
  and it has an updating problem.
- Add justification for the cpu.stat change in the changelog.
- Rebase upon v5.12-rc3.
- Correct SoB chain.
- Several indentation fixes.
- Adjust quota in the schbench test from 700000 to 600000.
Link: https://lore.kernel.org/lkml/20210316044931.39733-1-changhuaixin@linux.alibaba.com/

v3:
- Fix another issue reported by test robot.
- Update docs as Randy Dunlap suggested.
Link: https://lore.kernel.org/lkml/20210120122715.29493-1-changhuaixin@linux.alibaba.com/

v2:
- Fix an issue reported by test robot.
- Rewrite docs. Appreciate any further suggestions or help.
Link: https://lore.kernel.org/lkml/20210121110453.18899-1-changhuaixin@linux.alibaba.com/

v1 Link: https://lore.kernel.org/lkml/20201217074620.58338-1-changhuaixin@linux.alibaba.com/

Previously, Cong Wang and Konstantin Khlebnikov proposed similar features:
https://lore.kernel.org/lkml/20180522062017.5193-1-xiyou.wangcong@gmail.com/
https://lore.kernel.org/lkml/157476581065.5793.4518979877345136813.stgit@buzz/

This time we present more latency statistics and handle overflow while
accumulating.

Huaixin Chang (3):
  sched/fair: Introduce the burstable CFS controller
  sched/fair: Add cfs bandwidth burst statistics
  sched/fair: Add document for burstable CFS bandwidth

 Documentation/admin-guide/cgroup-v2.rst | 17 +++---
 Documentation/scheduler/sched-bwc.rst   | 76 ++++++++++++++++++++++----
 include/linux/sched/sysctl.h            |  1 +
 kernel/sched/core.c                     | 96 ++++++++++++++++++++++-------
 kernel/sched/fair.c                     | 32 ++++++++++-
 kernel/sched/sched.h                    |  4 ++
 kernel/sysctl.c                         |  9 ++++
 7 files changed, 200 insertions(+), 35 deletions(-)

-- 
2.14.4.44.g2045bb6

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller
  2021-05-20 12:34 [PATCH v5 0/3] sched/fair: Burstable CFS bandwidth controller Huaixin Chang
@ 2021-05-20 12:34 ` Huaixin Chang
  2021-05-20 14:00   ` Odin Ugedal
  2021-05-21 14:00   ` Peter Zijlstra
  2021-05-20 12:34 ` [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
  2021-05-20 12:34 ` [PATCH v5 3/3] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang
  2 siblings, 2 replies; 21+ messages in thread
From: Huaixin Chang @ 2021-05-20 12:34 UTC (permalink / raw)
To: changhuaixin
Cc: bsegall, dietmar.eggemann, dtcccc, juri.lelli, khlebnikov,
    linux-kernel, mgorman, mingo, odin, odin, pauld, peterz, pjt,
    rostedt, shanpeic, tj, vincent.guittot, xiyou.wangcong

The CFS bandwidth controller limits the CPU requests of a task group to
quota during each period. However, parallel workloads might be bursty,
so that they get throttled even when their average utilization is under
quota. At the same time they are latency sensitive, so throttling them
is undesired.

Scaling up period and quota allows greater burst capacity. But it might
cause a longer stall until the next refill. Introduce "burst" to allow
accumulating unused quota from previous periods, to be assigned when a
task group requests more CPU than quota during a specific period.

Introducing a burst buffer might also cause interference to other
groups. Thus limit the maximum accumulated buffer by "burst", and limit
the maximum allowed burst by quota, too.

The benefit of burst is seen when testing with schbench:

	echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
	echo 600000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us
	echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us

	# The average CPU usage is around 500%, which is 200ms CPU time
	# every 40ms.
	./schbench -m 1 -t 30 -r 10 -c 10000 -R 500

Without burst:

	Latency percentiles (usec)
		50.0000th: 7
		75.0000th: 8
		90.0000th: 9
		95.0000th: 10
		*99.0000th: 933
		99.5000th: 981
		99.9000th: 3068
	min=0, max=20054
	rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33%

With burst:

	Latency percentiles (usec)
		50.0000th: 7
		75.0000th: 8
		90.0000th: 9
		95.0000th: 9
		*99.0000th: 12
		99.5000th: 13
		99.9000th: 19
	min=0, max=406
	rps: 498.36 p95 (usec) 9 p99 (usec) 12 p95/cputime 0.09% p99/cputime 0.12%

The interference when using burst is evaluated by the possibility of
missing the deadline and the average WCET. Test results showed that when
there are many cgroups or the CPU is underutilized, the interference is
limited. More details are shown in:
https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/

Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
---
 include/linux/sched/sysctl.h |  1 +
 kernel/sched/core.c          | 83 ++++++++++++++++++++++++++++++++++++--------
 kernel/sched/fair.c          | 21 ++++++++++-
 kernel/sched/sched.h         |  1 +
 kernel/sysctl.c              |  9 +++++
 5 files changed, 99 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index db2c0f34aaaf..08432aeb742e 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -73,6 +73,7 @@ extern unsigned int sysctl_sched_uclamp_util_min_rt_default;
 
 #ifdef CONFIG_CFS_BANDWIDTH
 extern unsigned int sysctl_sched_cfs_bandwidth_slice;
+extern unsigned int sysctl_sched_cfs_bw_burst_enabled;
 #endif
 
 #ifdef CONFIG_SCHED_AUTOGROUP
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5226cc26a095..7d34b08ee0e5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8962,7 +8962,8 @@ static const u64
max_cfs_runtime = MAX_BW * NSEC_PER_USEC; static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime); -static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) +static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota, + u64 burst) { int i, ret = 0, runtime_enabled, runtime_was_enabled; struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth; @@ -8992,6 +8993,10 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) if (quota != RUNTIME_INF && quota > max_cfs_runtime) return -EINVAL; + if (quota != RUNTIME_INF && (burst > quota || + burst + quota > max_cfs_runtime)) + return -EINVAL; + /* * Prevent race between setting of cfs_rq->runtime_enabled and * unthrottle_offline_cfs_rqs(). @@ -9013,6 +9018,7 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) raw_spin_lock_irq(&cfs_b->lock); cfs_b->period = ns_to_ktime(period); cfs_b->quota = quota; + cfs_b->burst = burst; __refill_cfs_bandwidth_runtime(cfs_b); @@ -9046,9 +9052,10 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us) { - u64 quota, period; + u64 quota, period, burst; period = ktime_to_ns(tg->cfs_bandwidth.period); + burst = tg->cfs_bandwidth.burst; if (cfs_quota_us < 0) quota = RUNTIME_INF; else if ((u64)cfs_quota_us <= U64_MAX / NSEC_PER_USEC) @@ -9056,7 +9063,7 @@ static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us) else return -EINVAL; - return tg_set_cfs_bandwidth(tg, period, quota); + return tg_set_cfs_bandwidth(tg, period, quota, burst); } static long tg_get_cfs_quota(struct task_group *tg) @@ -9074,15 +9081,16 @@ static long tg_get_cfs_quota(struct task_group *tg) static int tg_set_cfs_period(struct task_group *tg, long cfs_period_us) { - u64 quota, period; + u64 quota, period, burst; if ((u64)cfs_period_us > U64_MAX / NSEC_PER_USEC) return -EINVAL; period = (u64)cfs_period_us * 
NSEC_PER_USEC; quota = tg->cfs_bandwidth.quota; + burst = tg->cfs_bandwidth.burst; - return tg_set_cfs_bandwidth(tg, period, quota); + return tg_set_cfs_bandwidth(tg, period, quota, burst); } static long tg_get_cfs_period(struct task_group *tg) @@ -9095,6 +9103,30 @@ static long tg_get_cfs_period(struct task_group *tg) return cfs_period_us; } +static int tg_set_cfs_burst(struct task_group *tg, long cfs_burst_us) +{ + u64 quota, period, burst; + + if ((u64)cfs_burst_us > U64_MAX / NSEC_PER_USEC) + return -EINVAL; + + burst = (u64)cfs_burst_us * NSEC_PER_USEC; + period = ktime_to_ns(tg->cfs_bandwidth.period); + quota = tg->cfs_bandwidth.quota; + + return tg_set_cfs_bandwidth(tg, period, quota, burst); +} + +static long tg_get_cfs_burst(struct task_group *tg) +{ + u64 burst_us; + + burst_us = tg->cfs_bandwidth.burst; + do_div(burst_us, NSEC_PER_USEC); + + return burst_us; +} + static s64 cpu_cfs_quota_read_s64(struct cgroup_subsys_state *css, struct cftype *cft) { @@ -9119,6 +9151,18 @@ static int cpu_cfs_period_write_u64(struct cgroup_subsys_state *css, return tg_set_cfs_period(css_tg(css), cfs_period_us); } +static s64 cpu_cfs_burst_read_s64(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return tg_get_cfs_burst(css_tg(css)); +} + +static int cpu_cfs_burst_write_s64(struct cgroup_subsys_state *css, + struct cftype *cftype, s64 cfs_burst_us) +{ + return tg_set_cfs_burst(css_tg(css), cfs_burst_us); +} + struct cfs_schedulable_data { struct task_group *tg; u64 period, quota; @@ -9271,6 +9315,11 @@ static struct cftype cpu_legacy_files[] = { .read_u64 = cpu_cfs_period_read_u64, .write_u64 = cpu_cfs_period_write_u64, }, + { + .name = "cfs_burst_us", + .read_s64 = cpu_cfs_burst_read_s64, + .write_s64 = cpu_cfs_burst_write_s64, + }, { .name = "stat", .seq_show = cpu_cfs_stat_show, @@ -9390,27 +9439,29 @@ static int cpu_weight_nice_write_s64(struct cgroup_subsys_state *css, } #endif -static void __maybe_unused cpu_period_quota_print(struct seq_file *sf, - long 
period, long quota) +static void __maybe_unused +cpu_period_quota_print(struct seq_file *sf, long period, long quota, + long burst) { if (quota < 0) seq_puts(sf, "max"); else seq_printf(sf, "%ld", quota); - seq_printf(sf, " %ld\n", period); + seq_printf(sf, " %ld %ld\n", period, burst); } -/* caller should put the current value in *@periodp before calling */ -static int __maybe_unused cpu_period_quota_parse(char *buf, - u64 *periodp, u64 *quotap) +/* caller should put the current value in *@periodp and *@burstp before calling */ +static int __maybe_unused cpu_period_quota_parse(char *buf, u64 *periodp, + u64 *quotap, u64 *burstp) { char tok[21]; /* U64_MAX */ - if (sscanf(buf, "%20s %llu", tok, periodp) < 1) + if (sscanf(buf, "%20s %llu %llu", tok, periodp, burstp) < 1) return -EINVAL; *periodp *= NSEC_PER_USEC; + *burstp *= NSEC_PER_USEC; if (sscanf(tok, "%llu", quotap)) *quotap *= NSEC_PER_USEC; @@ -9427,7 +9478,8 @@ static int cpu_max_show(struct seq_file *sf, void *v) { struct task_group *tg = css_tg(seq_css(sf)); - cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg)); + cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg), + tg_get_cfs_burst(tg)); return 0; } @@ -9436,12 +9488,13 @@ static ssize_t cpu_max_write(struct kernfs_open_file *of, { struct task_group *tg = css_tg(of_css(of)); u64 period = tg_get_cfs_period(tg); + u64 burst = tg_get_cfs_burst(tg); u64 quota; int ret; - ret = cpu_period_quota_parse(buf, &period, "a); + ret = cpu_period_quota_parse(buf, &period, "a, &burst); if (!ret) - ret = tg_set_cfs_bandwidth(tg, period, quota); + ret = tg_set_cfs_bandwidth(tg, period, quota, burst); return ret ?: nbytes; } #endif diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3248e24a90b0..48fad5cf0f7a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -134,6 +134,13 @@ int __weak arch_asym_cpu_priority(int cpu) * (default: 5 msec, units: microseconds) */ unsigned int sysctl_sched_cfs_bandwidth_slice = 
5000UL; + +/* + * A switch for cfs bandwidth burst. + * + * (default: 1, enabled) + */ +unsigned int sysctl_sched_cfs_bw_burst_enabled = 1; #endif static inline void update_load_add(struct load_weight *lw, unsigned long inc) @@ -4628,8 +4635,16 @@ static inline u64 sched_cfs_bandwidth_slice(void) */ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b) { - if (cfs_b->quota != RUNTIME_INF) + if (unlikely(cfs_b->quota == RUNTIME_INF)) + return; + + if (!sysctl_sched_cfs_bw_burst_enabled) { cfs_b->runtime = cfs_b->quota; + return; + } + + cfs_b->runtime += cfs_b->quota; + cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst); } static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg) @@ -4651,6 +4666,9 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b, if (cfs_b->quota == RUNTIME_INF) amount = min_amount; else { + if (!cfs_b->period_active) + __refill_cfs_bandwidth_runtime(cfs_b); + start_cfs_bandwidth(cfs_b); if (cfs_b->runtime > 0) { @@ -5279,6 +5297,7 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) cfs_b->runtime = 0; cfs_b->quota = RUNTIME_INF; cfs_b->period = ns_to_ktime(default_cfs_period()); + cfs_b->burst = 0; INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq); hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a189bec13729..d317ca74a48c 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -366,6 +366,7 @@ struct cfs_bandwidth { ktime_t period; u64 quota; u64 runtime; + u64 burst; s64 hierarchical_quota; u8 idle; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 14edf84cc571..fb27bae7ace2 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1816,6 +1816,15 @@ static struct ctl_table kern_table[] = { .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ONE, }, + { + .procname = "sched_cfs_bw_burst_enabled", + .data = &sysctl_sched_cfs_bw_burst_enabled, + .maxlen = sizeof(unsigned int), + .mode = 
0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, #endif #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) { -- 2.14.4.44.g2045bb6 ^ permalink raw reply related [flat|nested] 21+ messages in thread
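[Editor's note] The core of the patch is the changed refill rule in
__refill_cfs_bandwidth_runtime(): unused quota now carries over between
periods, capped at quota + burst. A minimal userspace sketch of that rule
(struct bw and refill() are illustrative stand-ins, not the kernel's types):

```c
#include <assert.h>
#include <stdint.h>

#define RUNTIME_INF UINT64_MAX

/* Illustrative stand-in for the fields of struct cfs_bandwidth that
 * the refill rule touches (all values in nanoseconds). */
struct bw {
	uint64_t quota;   /* runtime granted each period */
	uint64_t burst;   /* extra runtime allowed to accumulate */
	uint64_t runtime; /* runtime left in the bucket */
};

/* Mirror of the patched __refill_cfs_bandwidth_runtime(): unused
 * runtime carries over, but the bucket never exceeds quota + burst. */
static void refill(struct bw *b)
{
	if (b->quota == RUNTIME_INF)
		return;

	b->runtime += b->quota;
	if (b->runtime > b->quota + b->burst)
		b->runtime = b->quota + b->burst;
}
```

With sysctl_sched_cfs_bw_burst_enabled = 0 (or burst = 0), the cap collapses
to plain quota, which reproduces the old `cfs_b->runtime = cfs_b->quota`
behavior.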
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-20 12:34 ` [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller Huaixin Chang @ 2021-05-20 14:00 ` Odin Ugedal 2021-05-20 17:04 ` Tejun Heo 2021-05-21 9:09 ` changhuaixin 2021-05-21 14:00 ` Peter Zijlstra 1 sibling, 2 replies; 21+ messages in thread From: Odin Ugedal @ 2021-05-20 14:00 UTC (permalink / raw) To: Huaixin Chang Cc: Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, pauld, Peter Zijlstra, Paul Turner, Steven Rostedt, shanpeic, Tejun Heo, Vincent Guittot, xiyou.wangcong Hi, Here are some more thoughts and questions: > The benefit of burst is seen when testing with schbench: > > echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs > echo 600000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us > echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us > echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us > > # The average CPU usage is around 500%, which is 200ms CPU time > # every 40ms. > ./schbench -m 1 -t 30 -r 10 -c 10000 -R 500 > > Without burst: > > Latency percentiles (usec) > 50.0000th: 7 > 75.0000th: 8 > 90.0000th: 9 > 95.0000th: 10 > *99.0000th: 933 > 99.5000th: 981 > 99.9000th: 3068 > min=0, max=20054 > rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33% It should be noted that this was running on a 64 core machine (if that was the case, ref. your previous patch). I am curious how much you have tried tweaking both the period and the quota for this workload. I assume a longer period can help such bursty application, and from the small slowdowns, a slightly higher quota could also help I guess. I am not saying this is a bad idea, but that we need to understand what it fixes, and how, in order to be able to understand how/if to use it. Also, what value of the sysctl kernel.sched_cfs_bandwidth_slice_us are you using? 
What CONFIG_HZ you are using is also interesting, due to how bw is accounted for. There is some more info about it here: Documentation/scheduler/sched-bwc.rst. I assume a smaller slice value may also help, and it would be interesting to see what implications it gives. A high threads to (quota/period) ratio, together with a high bandwidth_slice will probably cause some throttling, so one has to choose between precision and overhead. Also, here you give a burst of 66% the quota. Would that be a typical value for a cgroup, or is it just a result of testing? As I understand this patchset, your example would allow 600% constant CPU load, then one period with 1000% load, then another "long set" of periods with 600% load. Have you discussed a way of limiting how long burst can be "saved" before expiring? > @@ -9427,7 +9478,8 @@ static int cpu_max_show(struct seq_file *sf, void *v) > { > struct task_group *tg = css_tg(seq_css(sf)); > > - cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg)); > + cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg), > + tg_get_cfs_burst(tg)); > return 0; > } The current cgroup v2 docs say the following: > cpu.max > A read-write two value file which exists on non-root cgroups. > The default is "max 100000". This will become a "three value file", and I know a few user space projects who parse this file by splitting on the middle space. I am not sure if they are "wrong", but I don't think we usually break such things. Not sure what Tejun thinks about this. Thanks Odin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-20 14:00 ` Odin Ugedal @ 2021-05-20 17:04 ` Tejun Heo 2021-05-21 9:09 ` changhuaixin 1 sibling, 0 replies; 21+ messages in thread From: Tejun Heo @ 2021-05-20 17:04 UTC (permalink / raw) To: Odin Ugedal Cc: Huaixin Chang, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, pauld, Peter Zijlstra, Paul Turner, Steven Rostedt, shanpeic, Vincent Guittot, xiyou.wangcong Hello, Odin. On Thu, May 20, 2021 at 04:00:29PM +0200, Odin Ugedal wrote: > > cpu.max > > A read-write two value file which exists on non-root cgroups. > > The default is "max 100000". > > This will become a "three value file", and I know a few user space projects > who parse this file by splitting on the middle space. I am not sure if they are > "wrong", but I don't think we usually break such things. Not sure what > Tejun thinks about this. Good point. I haven't thought about that. It would make more sense to separate it out to a separate file then - e.g. sth like cpu.max.burst, but it seems like there are important questions to answer before adding new interfaces. Thanks. -- tejun ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-20 14:00 ` Odin Ugedal 2021-05-20 17:04 ` Tejun Heo @ 2021-05-21 9:09 ` changhuaixin 2021-05-21 9:38 ` Odin Ugedal 1 sibling, 1 reply; 21+ messages in thread From: changhuaixin @ 2021-05-21 9:09 UTC (permalink / raw) To: Odin Ugedal Cc: changhuaixin, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, pauld, Peter Zijlstra, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong > On May 20, 2021, at 10:00 PM, Odin Ugedal <odin@uged.al> wrote: > > Hi, > > Here are some more thoughts and questions: > >> The benefit of burst is seen when testing with schbench: >> >> echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs >> echo 600000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us >> echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us >> echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us >> >> # The average CPU usage is around 500%, which is 200ms CPU time >> # every 40ms. >> ./schbench -m 1 -t 30 -r 10 -c 10000 -R 500 >> >> Without burst: >> >> Latency percentiles (usec) >> 50.0000th: 7 >> 75.0000th: 8 >> 90.0000th: 9 >> 95.0000th: 10 >> *99.0000th: 933 >> 99.5000th: 981 >> 99.9000th: 3068 >> min=0, max=20054 >> rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33% > > It should be noted that this was running on a 64 core machine (if that was > the case, ref. your previous patch). > > I am curious how much you have tried tweaking both the period and the quota > for this workload. I assume a longer period can help such bursty application, > and from the small slowdowns, a slightly higher quota could also help > I guess. I am > not saying this is a bad idea, but that we need to understand what it > fixes, and how, > in order to be able to understand how/if to use it. > Yeah, it is a well tuned workload and configuration. 
I did this because for benchmarks like schbench, workloads are generated in a fixed pattern without burst. So I set schbench params carefully to generate burst during each 100ms periods, to show burst works. Longer period or higher quota helps indeed, in which case more workloads can be used to generate tail latency then. In my view, burst is like the cfsb way of token bucket. For the present cfsb, bucket capacity is strictly limited to quota. And that is changed into quota + burst now. And it shall be used when tasks get throttled and CPU is under utilized for the whole system. > Also, what value of the sysctl kernel.sched_cfs_bandwidth_slice_us are > you using? > What CONFIG_HZ you are using is also interesting, due to how bw is > accounted for. > There is some more info about it here: Documentation/scheduler/sched-bwc.rst. I > assume a smaller slice value may also help, and it would be interesting to see > what implications it gives. A high threads to (quota/period) ratio, together > with a high bandwidth_slice will probably cause some throttling, so one has > to choose between precision and overhead. > Default value of kernel.sched_cfs_bandwidth_slice_us(5ms) and CONFIG_HZ(1000) is used. The following case might be used to prevent getting throttled from many threads and high bandwidth slice: mkdir /sys/fs/cgroup/cpu/test echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us ./schbench -m 1 -t 3 -r 20 -c 80000 -R 20 On my machine, two workers work for 80ms and sleep for 120ms in each round. The average utilization is around 80%. This will work on a two-core system. It is recommended to try it multiple times as getting throttled doesn't necessarily cause tail latency for schbench. > Also, here you give a burst of 66% the quota. Would that be a typical value > for a cgroup, or is it just a result of testing? 
As I understand this Yeah, it is not a typical value, and tuned for this test. > patchset, your example > would allow 600% constant CPU load, then one period with 1000% load, > then another > "long set" of periods with 600% load. Have you discussed a way of limiting how > long burst can be "saved" before expiring? Haven't thought about it much. It is interesting but I doubt the need to do that. > >> @@ -9427,7 +9478,8 @@ static int cpu_max_show(struct seq_file *sf, void *v) >> { >> struct task_group *tg = css_tg(seq_css(sf)); >> >> - cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg)); >> + cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg), >> + tg_get_cfs_burst(tg)); >> return 0; >> } > > The current cgroup v2 docs say the following: > >> cpu.max >> A read-write two value file which exists on non-root cgroups. >> The default is "max 100000". > > This will become a "three value file", and I know a few user space projects > who parse this file by splitting on the middle space. I am not sure if they are > "wrong", but I don't think we usually break such things. Not sure what > Tejun thinks about this. > Thanks, it will be modified in the way Tejun suggests. > Thanks > Odin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-21 9:09 ` changhuaixin @ 2021-05-21 9:38 ` Odin Ugedal 2021-05-21 12:38 ` changhuaixin 0 siblings, 1 reply; 21+ messages in thread From: Odin Ugedal @ 2021-05-21 9:38 UTC (permalink / raw) To: changhuaixin Cc: Odin Ugedal, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, pauld, Peter Zijlstra, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong Hi, > Yeah, it is a well tuned workload and configuration. I did this because for benchmarks > like schbench, workloads are generated in a fixed pattern without burst. So I set schbench > params carefully to generate burst during each 100ms periods, to show burst works. Longer > period or higher quota helps indeed, in which case more workloads can be used to generate > tail latency then. Yeah, that makes sense. When it comes to fairness (you are talking about generating tail latency), I think configuration of cpu shares/weight between cgroups is more relevant. How much more tail latency will a cgroup be able to "create" when doubling the period? > In my view, burst is like the cfsb way of token bucket. For the present cfsb, bucket capacity > is strictly limited to quota. And that is changed into quota + burst now. And it shall be used when > tasks get throttled and CPU is under utilized for the whole system. Well, it is as strict as we can make it, depending on how one looks at it. We cannot guarantee anything more strict than the length of a jiffy or kernel.sched_cfs_bandwidth_slice_us (simplified ofc.), especially since we allow runtime from one period to be used in another. I think there is a "big" distinction between runtime transferred from the cfs_bw to cfs_rq's in a period compared to the actual runtime used. > Default value of kernel.sched_cfs_bandwidth_slice_us(5ms) and CONFIG_HZ(1000) is used. 
You should mention that in the msg then, since it is highly relevant to the results. Can you try to tweak kernel.sched_cfs_bandwidth_slice_us to something like 1ms, and see what the result will be? For such a workload and high cfs_bw_slice, a smaller CONFIG_HZ might also be beneficial (although there are many things to consider when talking about that, and a lot of people know more about that than me). > The following case might be used to prevent getting throttled from many threads and high bandwidth > slice: > > mkdir /sys/fs/cgroup/cpu/test > echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs > echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us > echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us > > ./schbench -m 1 -t 3 -r 20 -c 80000 -R 20 > > On my machine, two workers work for 80ms and sleep for 120ms in each round. The average utilization is > around 80%. This will work on a two-core system. It is recommended to try it multiple times as getting > throttled doesn't necessarily cause tail latency for schbench. When I run this, I get the following results without cfs bandwidth enabled. $ time ./schbench -m 1 -t 3 -r 20 -c 80000 -R 20 Latency percentiles (usec) runtime 20 (s) (398 total samples) 50.0th: 22 (201 samples) 75.0th: 50 (158 samples) 90.0th: 50 (0 samples) 95.0th: 51 (38 samples) *99.0th: 51 (0 samples) 99.5th: 51 (0 samples) 99.9th: 52 (1 samples) min=5, max=52 rps: 19900000.00 p95 (usec) 51 p99 (usec) 51 p95/cputime 0.06% p99/cputime 0.06% ./schbench -m 1 -t 3 -r 20 -c 80000 -R 20 31.85s user 0.00s system 159% cpu 20.021 total In this case, I see 80% load on two cores, ending at a total of 160%. If setting period: 100ms and quota: 100ms (aka. 1 cpu), throttling is what you would expect, or?. In this case, burst wouldn't matter? Thanks Odin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-21 9:38 ` Odin Ugedal @ 2021-05-21 12:38 ` changhuaixin 0 siblings, 0 replies; 21+ messages in thread From: changhuaixin @ 2021-05-21 12:38 UTC (permalink / raw) To: Odin Ugedal Cc: changhuaixin, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, pauld, Peter Zijlstra, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong > On May 21, 2021, at 5:38 PM, Odin Ugedal <odin@uged.al> wrote: > > Hi, > >> Yeah, it is a well tuned workload and configuration. I did this because for benchmarks >> like schbench, workloads are generated in a fixed pattern without burst. So I set schbench >> params carefully to generate burst during each 100ms periods, to show burst works. Longer >> period or higher quota helps indeed, in which case more workloads can be used to generate >> tail latency then. > > Yeah, that makes sense. When it comes to fairness (you are talking > about generating tail > latency), I think configuration of cpu shares/weight between cgroups > is more relevant. > > How much more tail latency will a cgroup be able to "create" when > doubling the period? > Indeed, fairness is another factor relevant to tail latency. However, real workloads benefit from burst feature, too. For java workloads with equal fairness between cgroups, a huge drop of tail latency from 500ms to 27ms is seen after enabling burst feature. I shouldn't delete this info in the msg. I guess tail latency from schbench is small here, because schbench is simple and only measures wakeup latency. For workloads measuring round trip time, the effect of getting throttled is more obvious. > >> In my view, burst is like the cfsb way of token bucket. For the present cfsb, bucket capacity >> is strictly limited to quota. And that is changed into quota + burst now. 
And it shall be used when >> tasks get throttled and CPU is under utilized for the whole system. > > Well, it is as strict as we can make it, depending on how one looks at it. We > cannot guarantee anything more strict than the length of a jiffy or > kernel.sched_cfs_bandwidth_slice_us (simplified ofc.), especially since we allow > runtime from one period to be used in another. I think there is a > "big" distinction between > runtime transferred from the cfs_bw to cfs_rq's in a period compared > to the actual runtime used. > >> Default value of kernel.sched_cfs_bandwidth_slice_us(5ms) and CONFIG_HZ(1000) is used. > > You should mention that in the msg then, since it is highly relevant > to the results. Can you try to tweak Sorry for causing trouble reproducing this. I'll add these info. > kernel.sched_cfs_bandwidth_slice_us to something like 1ms, and see > what the result will be? > After using 1ms kernel.sched_cfs_bandwidth_slice_us I see 99.0th and 99.5th latency drop, and 99.9th latency remains at several ms. I guess I can't tell it from some small spikes now. # 1ms kernel.sched_cfs_bandwidth_slice_us echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs echo 600000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us #echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us cat /sys/fs/cgroup/cpu/test/cpu.stat | grep nr_throttled ./schbench -m 1 -t 30 -r 10 -c 10000 -R 500 Latency percentiles (usec) 50.0000th: 8 75.0000th: 8 90.0000th: 9 95.0000th: 10 *99.0000th: 13 99.5000th: 17 99.9000th: 6408 min=0, max=7576 rps: 497.44 p95 (usec) 10 p99 (usec) 13 p95/cputime 0.10% p99/cputime 0.13% > For such a workload and high cfs_bw_slice, a smaller CONFIG_HZ might > also be beneficial (although > there are many things to consider when talking about that, and a lot > of people know more about that than me). 
> >> The following case might be used to prevent getting throttled from many threads and high bandwidth >> slice: >> >> mkdir /sys/fs/cgroup/cpu/test >> echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs >> echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us >> echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us >> >> ./schbench -m 1 -t 3 -r 20 -c 80000 -R 20 >> >> On my machine, two workers work for 80ms and sleep for 120ms in each round. The average utilization is >> around 80%. This will work on a two-core system. It is recommended to try it multiple times as getting >> throttled doesn't necessarily cause tail latency for schbench. > > When I run this, I get the following results without cfs bandwidth enabled. > > $ time ./schbench -m 1 -t 3 -r 20 -c 80000 -R 20 > Latency percentiles (usec) runtime 20 (s) (398 total samples) > 50.0th: 22 (201 samples) > 75.0th: 50 (158 samples) > 90.0th: 50 (0 samples) > 95.0th: 51 (38 samples) > *99.0th: 51 (0 samples) > 99.5th: 51 (0 samples) > 99.9th: 52 (1 samples) > min=5, max=52 > rps: 19900000.00 p95 (usec) 51 p99 (usec) 51 p95/cputime 0.06% p99/cputime 0.06% > ./schbench -m 1 -t 3 -r 20 -c 80000 -R 20 31.85s user 0.00s system > 159% cpu 20.021 total > > In this case, I see 80% load on two cores, ending at a total of 160%. If setting > period: 100ms and quota: 100ms (aka. 1 cpu), throttling is what > you would expect, or?. In this case, burst wouldn't matter? > Sorry for my mistake. The -R option should be 10 instead of 20. And the case should be: # 1ms kernel.sched_cfs_bandwidth_slice_us mkdir /sys/fs/cgroup/cpu/test echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us ./schbench -m 1 -t 3 -r 20 -c 80000 -R 10 The average CPU usage is around 80%. I ran this 10 times, saw long tail latency 6 times, and saw throttling 8 times. Tail latencies are shown below, and this wasn't the worst case.
Latency percentiles (usec) 50.0000th: 19872 75.0000th: 21344 90.0000th: 22176 95.0000th: 22496 *99.0000th: 22752 99.5000th: 22752 99.9000th: 22752 min=0, max=22727 rps: 9.90 p95 (usec) 22496 p99 (usec) 22752 p95/cputime 28.12% p99/cputime 28.44% Sometimes the measured period of schbench is not throttled and thus no tail latency is seen. Sometimes tasks do not get throttled, because the offset of the schbench workers' start from the period start matters too. In this case, the two workers work for 80ms and sleep for 120ms. If the 80ms burst of the 2 workers is cut across two cfsb periods, they might not get throttled. I'll use this case in the commit log. > > Thanks > Odin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-20 12:34 ` [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller Huaixin Chang 2021-05-20 14:00 ` Odin Ugedal @ 2021-05-21 14:00 ` Peter Zijlstra 2021-05-24 12:42 ` changhuaixin 1 sibling, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2021-05-21 14:00 UTC (permalink / raw) To: Huaixin Chang Cc: bsegall, dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, linux-kernel, mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj, vincent.guittot, xiyou.wangcong On Thu, May 20, 2021 at 08:34:17PM +0800, Huaixin Chang wrote: > The CFS bandwidth controller limits CPU requests of a task group to > quota during each period. However, parallel workloads might be bursty > so that they get throttled even when their average utilization is under > quota. And they are latency sensitive at the same time so that > throttling them is undesired. > > Scaling up period and quota allows greater burst capacity. But it might > cause longer stuck till next refill. Introduce "burst" to allow > accumulating unused quota from previous periods, and to be assigned when > a task group requests more CPU than quota during a specific period. > > Introducing burst buffer might also cause interference to other groups. > Thus limit the maximum accumulated buffer by "burst", and limit > the maximum allowed burst by quota, too. Overall, *much* better than before. However I would like a little bit better discussion of how exactly people are supposed to reason about this. That will also help with the question from Odin on how people are supposed to set/compute this burst value. So traditional (UP-EDF) bandwidth control is something like: (U = \Sum u_i) <= 1 This guarantees both that every deadline is met and that the system is stable.
After all, if U were > 1, then for every second of walltime, we'd have to run more than a second of program time, and obviously miss our deadline, but the next deadline will be further out still, there is never time to catch up, unbounded fail. This work observes that a workload doesn't always execute the full quota; this enables one to describe u_i as a statistical distribution. For example, have u_i = {x,e}_i, where x is the p(95) and x+e p(100) (the traditional WCET). This effectively allows u to be smaller, increasing the efficiency (we can pack more tasks in the system), but at the cost of missing deadlines when all the odds line up. However, it does maintain stability, since every overrun must be paired with an underrun as long as our x is above the average. That is, suppose we have 2 tasks, both specify a p(95) value, then we have a p(95)*p(95) = 90.25% chance both tasks are within their quota and everything is good. At the same time we have a p(5)p(5) = 0.25% chance both tasks will exceed their quota at the same time (guaranteed deadline fail). Somewhere in between there's a threshold where one exceeds and the other doesn't underrun enough to compensate; this depends on the specific CDFs. At the same time, we can say that the worst case deadline miss will be \Sum e_i; that is, there is a bounded tardiness (under the assumption that x+e is indeed WCET). And I think you can compute more fun properties. Now, CFS bandwidth control is not EDF, and the above doesn't fully translate, but much does I think. We borrow time now against our future underrun, at the cost of increased interference against the other system users. All nicely bounded etc..
> Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com> > Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> > Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com> > Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> > Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> > --- > include/linux/sched/sysctl.h | 1 + > kernel/sched/core.c | 83 ++++++++++++++++++++++++++++++++++++-------- > kernel/sched/fair.c | 21 ++++++++++- > kernel/sched/sched.h | 1 + > kernel/sysctl.c | 9 +++++ > 5 files changed, 99 insertions(+), 16 deletions(-) > > diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h > index db2c0f34aaaf..08432aeb742e 100644 > --- a/include/linux/sched/sysctl.h > +++ b/include/linux/sched/sysctl.h > @@ -73,6 +73,7 @@ extern unsigned int sysctl_sched_uclamp_util_min_rt_default; > > #ifdef CONFIG_CFS_BANDWIDTH > extern unsigned int sysctl_sched_cfs_bandwidth_slice; > +extern unsigned int sysctl_sched_cfs_bw_burst_enabled; > #endif > > #ifdef CONFIG_SCHED_AUTOGROUP > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 5226cc26a095..7d34b08ee0e5 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -8962,7 +8962,8 @@ static const u64 max_cfs_runtime = MAX_BW * NSEC_PER_USEC; > > static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime); > > -static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) > +static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota, > + u64 burst) > { > int i, ret = 0, runtime_enabled, runtime_was_enabled; > struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth; > @@ -8992,6 +8993,10 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) > if (quota != RUNTIME_INF && quota > max_cfs_runtime) > return -EINVAL; > > + if (quota != RUNTIME_INF && (burst > quota || > + burst + quota > max_cfs_runtime)) > + return -EINVAL; > + > /* > * Prevent race between setting of cfs_rq->runtime_enabled and 
> * unthrottle_offline_cfs_rqs(). <snip all API crud> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 3248e24a90b0..48fad5cf0f7a 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -134,6 +134,13 @@ int __weak arch_asym_cpu_priority(int cpu) > * (default: 5 msec, units: microseconds) > */ > unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL; > + > +/* > + * A switch for cfs bandwidth burst. > + * > + * (default: 1, enabled) > + */ > +unsigned int sysctl_sched_cfs_bw_burst_enabled = 1; > #endif > > static inline void update_load_add(struct load_weight *lw, unsigned long inc) > @@ -4628,8 +4635,16 @@ static inline u64 sched_cfs_bandwidth_slice(void) > */ > void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b) > { > - if (cfs_b->quota != RUNTIME_INF) > + if (unlikely(cfs_b->quota == RUNTIME_INF)) > + return; > + > + if (!sysctl_sched_cfs_bw_burst_enabled) { > cfs_b->runtime = cfs_b->quota; > + return; > + } > + > + cfs_b->runtime += cfs_b->quota; > + cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst); > } > > static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg) > @@ -4651,6 +4666,9 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b, > if (cfs_b->quota == RUNTIME_INF) > amount = min_amount; > else { > + if (!cfs_b->period_active) > + __refill_cfs_bandwidth_runtime(cfs_b); Why this call? 
> start_cfs_bandwidth(cfs_b); > > if (cfs_b->runtime > 0) { > @@ -5279,6 +5297,7 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) > cfs_b->runtime = 0; > cfs_b->quota = RUNTIME_INF; > cfs_b->period = ns_to_ktime(default_cfs_period()); > + cfs_b->burst = 0; > > INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq); > hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index a189bec13729..d317ca74a48c 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -366,6 +366,7 @@ struct cfs_bandwidth { > ktime_t period; > u64 quota; > u64 runtime; > + u64 burst; > s64 hierarchical_quota; > > u8 idle; > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 14edf84cc571..fb27bae7ace2 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1816,6 +1816,15 @@ static struct ctl_table kern_table[] = { > .proc_handler = proc_dointvec_minmax, > .extra1 = SYSCTL_ONE, > }, > + { > + .procname = "sched_cfs_bw_burst_enabled", > + .data = &sysctl_sched_cfs_bw_burst_enabled, > + .maxlen = sizeof(unsigned int), > + .mode = 0644, > + .proc_handler = proc_dointvec_minmax, > + .extra1 = SYSCTL_ZERO, > + .extra2 = SYSCTL_ONE, > + }, > #endif > #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) > { What's the purpose of this new sysctl? By default it is disabled because burst==0, only if you set burst to some !0 value does this actually do anything. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-21 14:00 ` Peter Zijlstra @ 2021-05-24 12:42 ` changhuaixin 2021-05-25 10:46 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: changhuaixin @ 2021-05-24 12:42 UTC (permalink / raw) To: Peter Zijlstra Cc: changhuaixin, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal, pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong > On May 21, 2021, at 10:00 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > On Thu, May 20, 2021 at 08:34:17PM +0800, Huaixin Chang wrote: >> The CFS bandwidth controller limits CPU requests of a task group to >> quota during each period. However, parallel workloads might be bursty >> so that they get throttled even when their average utilization is under >> quota. And they are latency sensitive at the same time so that >> throttling them is undesired. >> >> Scaling up period and quota allows greater burst capacity. But it might >> cause longer stuck till next refill. Introduce "burst" to allow >> accumulating unused quota from previous periods, and to be assigned when >> a task group requests more CPU than quota during a specific period. >> >> Introducing burst buffer might also cause interference to other groups. >> Thus limit the maximum accumulated buffer by "burst", and limit >> the maximum allowed burst by quota, too. > > Overall, *much* better than before. > > However I would like a little bit better discussion of how exactly > people are supposed to reason about this. That will also help with the > question from Odin on how people are supposed to set/compute this burst > value. > > So traditional (UP-EDF) bandwidth control is something like: > > (U = \Sum u_i) <= 1 > > This guaranteeds both that every deadline is met and that the system is > stable. 
After all, if U were > 1, then for every second of walltime, > we'd have to run more than a second of program time, and obviously miss > our deadline, but the next deadline will be further out still, there is > never time to catch up, unbounded fail. > > This work observes that a workload doesn't always executes the full > quota; this enables one to describe u_i as a statistical distribution. > > For example, have u_i = {x,e}_i, where x is the p(95) and x+e p(100) > (the traditional WCET). This effectively allows u to be smaller, > increasing the efficiency (we can pack more tasks in the system), but at > the cost of missing deadlines when all the odds line up. However, it > does maintain stability, since every overrun must be paired with an > underrun as long as our x is above the average. > > That is, suppose we have 2 tasks, both specify a p(95) value, then we > have a p(95)*p(95) = 90.25% chance both tasks are within their quota and > everything is good. At the same time we have a p(5)p(5) = 0.25% chance > both tasks will exceed their quota at the same time (guaranteed deadline > fail). Somewhere in between there's a threshold where one exceeds and > the other doesn't underrun enough to compensate; this depends on the > specific CDFs. > > At the same time, we can say that the worst case deadline miss, will be > \Sum e_i; that is, there is a bounded tardiness (under the assumption > that x+e is indeed WCET). > > And I think you can compute more fun properties. > > Now, CFS bandwidth control is not EDF, and the above doesn't fully > translate, but much does I think. > > We borrow time now against our future underrun, at the cost of increased > interference against the other system users. All nicely bounded etc.. > I shall improve the commit log then. We did some computation on the probability that the deadline is missed, and on the expected bound.
These values are calculated with different numbers of control groups and variable CPU utilization, when runtime follows an exponential, Poisson or Pareto distribution. The more control groups there are, the more likely the deadline is met and the smaller the expected average WCET, because many equal control groups mean a small chance of U > 1. And the more underutilized the whole system is, the more likely the deadline is met and the smaller the expected average WCET. More details are posted in https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/. > > >> Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com> >> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> >> Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com> >> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> >> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> >> --- >> include/linux/sched/sysctl.h | 1 + >> kernel/sched/core.c | 83 ++++++++++++++++++++++++++++++++++++-------- >> kernel/sched/fair.c | 21 ++++++++++- >> kernel/sched/sched.h | 1 + >> kernel/sysctl.c | 9 +++++ >> 5 files changed, 99 insertions(+), 16 deletions(-) >> >> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h >> index db2c0f34aaaf..08432aeb742e 100644 >> --- a/include/linux/sched/sysctl.h >> +++ b/include/linux/sched/sysctl.h >> @@ -73,6 +73,7 @@ extern unsigned int sysctl_sched_uclamp_util_min_rt_default; >> >> #ifdef CONFIG_CFS_BANDWIDTH >> extern unsigned int sysctl_sched_cfs_bandwidth_slice; >> +extern unsigned int sysctl_sched_cfs_bw_burst_enabled; >> #endif >> >> #ifdef CONFIG_SCHED_AUTOGROUP >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 5226cc26a095..7d34b08ee0e5 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -8962,7 +8962,8 @@ static const u64 max_cfs_runtime = MAX_BW * NSEC_PER_USEC; >> >> static int __cfs_schedulable(struct task_group *tg, u64 period, u64 runtime); >> >> -static int
tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) >> +static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota, >> + u64 burst) >> { >> int i, ret = 0, runtime_enabled, runtime_was_enabled; >> struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth; >> @@ -8992,6 +8993,10 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota) >> if (quota != RUNTIME_INF && quota > max_cfs_runtime) >> return -EINVAL; >> >> + if (quota != RUNTIME_INF && (burst > quota || >> + burst + quota > max_cfs_runtime)) >> + return -EINVAL; >> + >> /* >> * Prevent race between setting of cfs_rq->runtime_enabled and >> * unthrottle_offline_cfs_rqs(). > > <snip all API crud> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 3248e24a90b0..48fad5cf0f7a 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -134,6 +134,13 @@ int __weak arch_asym_cpu_priority(int cpu) >> * (default: 5 msec, units: microseconds) >> */ >> unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL; >> + >> +/* >> + * A switch for cfs bandwidth burst. 
>> + * >> + * (default: 1, enabled) >> + */ >> +unsigned int sysctl_sched_cfs_bw_burst_enabled = 1; >> #endif >> >> static inline void update_load_add(struct load_weight *lw, unsigned long inc) >> @@ -4628,8 +4635,16 @@ static inline u64 sched_cfs_bandwidth_slice(void) >> */ >> void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b) >> { >> - if (cfs_b->quota != RUNTIME_INF) >> + if (unlikely(cfs_b->quota == RUNTIME_INF)) >> + return; >> + >> + if (!sysctl_sched_cfs_bw_burst_enabled) { >> cfs_b->runtime = cfs_b->quota; >> + return; >> + } >> + >> + cfs_b->runtime += cfs_b->quota; >> + cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst); >> } >> >> static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg) >> @@ -4651,6 +4666,9 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b, >> if (cfs_b->quota == RUNTIME_INF) >> amount = min_amount; >> else { >> + if (!cfs_b->period_active) >> + __refill_cfs_bandwidth_runtime(cfs_b); > > Why this call? Since the cfs bandwidth period timer stops on idle with the runtime left unfilled, refill the runtime when the timer restarts, so that the underrun accumulated while the timer was stopped can be used.
Another way to do this might be: throttled = !list_empty(&cfs_b->throttled_cfs_rq); cfs_b->nr_periods += overrun; + __refill_cfs_bandwidth_runtime(cfs_b); + /* * idle depends on !throttled (for the case of a large deficit), and if * we're going inactive then everything else can be deferred */ if (cfs_b->idle && !throttled) goto out_deactivate; - __refill_cfs_bandwidth_runtime(cfs_b); - if (!throttled) { > >> start_cfs_bandwidth(cfs_b); >> >> if (cfs_b->runtime > 0) { >> @@ -5279,6 +5297,7 @@ void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) >> cfs_b->runtime = 0; >> cfs_b->quota = RUNTIME_INF; >> cfs_b->period = ns_to_ktime(default_cfs_period()); >> + cfs_b->burst = 0; >> >> INIT_LIST_HEAD(&cfs_b->throttled_cfs_rq); >> hrtimer_init(&cfs_b->period_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED); >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h >> index a189bec13729..d317ca74a48c 100644 >> --- a/kernel/sched/sched.h >> +++ b/kernel/sched/sched.h >> @@ -366,6 +366,7 @@ struct cfs_bandwidth { >> ktime_t period; >> u64 quota; >> u64 runtime; >> + u64 burst; >> s64 hierarchical_quota; >> >> u8 idle; >> diff --git a/kernel/sysctl.c b/kernel/sysctl.c >> index 14edf84cc571..fb27bae7ace2 100644 >> --- a/kernel/sysctl.c >> +++ b/kernel/sysctl.c >> @@ -1816,6 +1816,15 @@ static struct ctl_table kern_table[] = { >> .proc_handler = proc_dointvec_minmax, >> .extra1 = SYSCTL_ONE, >> }, >> + { >> + .procname = "sched_cfs_bw_burst_enabled", >> + .data = &sysctl_sched_cfs_bw_burst_enabled, >> + .maxlen = sizeof(unsigned int), >> + .mode = 0644, >> + .proc_handler = proc_dointvec_minmax, >> + .extra1 = SYSCTL_ZERO, >> + .extra2 = SYSCTL_ONE, >> + }, >> #endif >> #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) >> { > > What's the purpose of this new sysctl? By default it is disabled because > burst==0, only if you set burst to some !0 value does this actually do > anything. 
Originally, this was designed to turn the burst feature off when the system becomes unstable. I guess we can remove this, as you have questioned it. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-24 12:42 ` changhuaixin @ 2021-05-25 10:46 ` Peter Zijlstra 2021-05-31 6:59 ` luca abeni 2021-05-25 10:47 ` Peter Zijlstra 2021-05-25 10:49 ` Peter Zijlstra 2 siblings, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2021-05-25 10:46 UTC (permalink / raw) To: changhuaixin Cc: Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal, pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong, luca.abeni, tommaso.cucinotta, baruah, anderson On Mon, May 24, 2021 at 08:42:03PM +0800, changhuaixin wrote: > > On May 21, 2021, at 10:00 PM, Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Thu, May 20, 2021 at 08:34:17PM +0800, Huaixin Chang wrote: > >> The CFS bandwidth controller limits CPU requests of a task group to > >> quota during each period. However, parallel workloads might be bursty > >> so that they get throttled even when their average utilization is under > >> quota. And they are latency sensitive at the same time so that > >> throttling them is undesired. > >> > >> Scaling up period and quota allows greater burst capacity. But it might > >> cause longer stuck till next refill. Introduce "burst" to allow > >> accumulating unused quota from previous periods, and to be assigned when > >> a task group requests more CPU than quota during a specific period. > >> > >> Introducing burst buffer might also cause interference to other groups. > >> Thus limit the maximum accumulated buffer by "burst", and limit > >> the maximum allowed burst by quota, too. > > > > Overall, *much* better than before. > > > > However I would like a little bit better discussion of how exactly > > people are supposed to reason about this. That will also help with the > > question from Odin on how people are supposed to set/compute this burst > > value. 
> > > > So traditional (UP-EDF) bandwidth control is something like: > > > > (U = \Sum u_i) <= 1 > > > > This guaranteeds both that every deadline is met and that the system is > > stable. After all, if U were > 1, then for every second of walltime, > > we'd have to run more than a second of program time, and obviously miss > > our deadline, but the next deadline will be further out still, there is > > never time to catch up, unbounded fail. > > > > This work observes that a workload doesn't always executes the full > > quota; this enables one to describe u_i as a statistical distribution. > > > > For example, have u_i = {x,e}_i, where x is the p(95) and x+e p(100) > > (the traditional WCET). This effectively allows u to be smaller, > > increasing the efficiency (we can pack more tasks in the system), but at > > the cost of missing deadlines when all the odds line up. However, it > > does maintain stability, since every overrun must be paired with an > > underrun as long as our x is above the average. > > > > That is, suppose we have 2 tasks, both specify a p(95) value, then we > > have a p(95)*p(95) = 90.25% chance both tasks are within their quota and > > everything is good. At the same time we have a p(5)p(5) = 0.25% chance > > both tasks will exceed their quota at the same time (guaranteed deadline > > fail). Somewhere in between there's a threshold where one exceeds and > > the other doesn't underrun enough to compensate; this depends on the > > specific CDFs. > > > > At the same time, we can say that the worst case deadline miss, will be > > \Sum e_i; that is, there is a bounded tardiness (under the assumption > > that x+e is indeed WCET). Having second thoughts about this exact claim; lightning can strike twice, and if we exceed bounds again before having recovered from the last time we might exceed the bound mentioned. I _think_ the property holds, but the bound might need work. > > And I think you can compute more fun properties. 
> > > > Now, CFS bandwidth control is not EDF, and the above doesn't fully > > translate, but much does I think. > > > > We borrow time now against our future underrun, at the cost of increased > > interference against the other system users. All nicely bounded etc.. > > > > I shall improve the commit log then. Thanks! > We did some compute on the probability that deadline is missed, and the expected > boundary. These values are calculated with different number of control groups and > variable CPU utilization when runtime is under exponential distribution, poisson > distribution or pareto distribution. > > The more control groups there are, the more likely deadline is made and the smaller average > WCET to expect. Because many equal control groups means small chance of U > 1. > > And the more under utilized the whole system is, the more likely deadline is made and the smaller > average WCET to expect. > > More details are posted in > https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/. Indeed you did; I'm a bit sad it's so hard to find papers that cover this. When one Googles for 'Probabilistic WCET' there's a fair number of papers about using Extreme Value Theory to estimate the traditional WCET given measurement based input. Many from the excellent WCET track at ECRTS. The thing is, the last time I attended that conference (which appears to be almost 4 years ago :/), I'm sure I spoke to people about exactly the thing explored here. Albeit, at the time we discussed this as a SCHED_DEADLINE task model extension. Let me Cc a bunch of people that might know more.., ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-25 10:46 ` Peter Zijlstra @ 2021-05-31 6:59 ` luca abeni 0 siblings, 0 replies; 21+ messages in thread From: luca abeni @ 2021-05-31 6:59 UTC (permalink / raw) To: Peter Zijlstra Cc: changhuaixin, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal, pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong, tommaso.cucinotta, baruah, anderson Hi all, On Tue, 25 May 2021 12:46:52 +0200 Peter Zijlstra <peterz@infradead.org> wrote: [...] > > We did some compute on the probability that deadline is missed, and the expected > > boundary. These values are calculated with different number of control groups and > > variable CPU utilization when runtime is under exponential distribution, poisson > > distribution or pareto distribution. > > > > The more control groups there are, the more likely deadline is made and the smaller average > > WCET to expect. Because many equal control groups means small chance of U > 1. > > > > And the more under utilized the whole system is, the more likely deadline is made and the smaller > > average WCET to expect. > > > > More details are posted in > > https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/. > > Indeed you did; I'm a bit sad it's so hard to find papers that cover > this. When one Googles for 'Probabilistic WCET' there's a fair number of > papers about using Extreme Value Theory to estimate the traditional WCET > given measurement based input. Many from the excellent WCET track at > ECRTS. If I understand well the context, you do not need probabilistic WCET here... 
If you assume you know the probability distribution of the inter-arrival times and execution times (this is what is assumed at https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/, right?), then you can use "standard" queuing theory to compute the response time distribution. If I understand correctly, in the link mentioned above the response times are measured by simulating a model of the scheduler. Queuing theory can be used instead, as shown (for example) in these papers: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.7683&rep=rep1&type=pdf http://retis.sssup.it/~giorgio/paps/2001/wpdrts01.pdf (these papers consider a scheduler similar to SCHED_DEADLINE, but the approach can be easily applied to every scheduler that guarantees a runtime in a period --- I think the CFS controller falls in this category, right?) I think the burst mentioned above can be added to this queuing model; I'll have a look at this in the coming days. The problem with this approach is that the execution times of different activations of a task are considered to be independent and identically distributed (this is the infamous "IID assumption"). And this assumption is often unrealistic... The probabilistic WCET approach mentioned above allows you to analyze the behaviour of a scheduler without assuming that the execution (and/or inter-activation) times are IID. Luca > The thing is, the last time I attended that conference (which appears to > be almost 4 years ago :/), I'm sure I spoke to people about exactly the > thing explored here. Albeit, at the time we discussed this as a > SCHED_DEADLINE task model extension. > > Let me Cc a bunch of people that might know more.., ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-24 12:42 ` changhuaixin 2021-05-25 10:46 ` Peter Zijlstra @ 2021-05-25 10:47 ` Peter Zijlstra 2021-05-25 21:25 ` Benjamin Segall 2021-05-25 10:49 ` Peter Zijlstra 2 siblings, 1 reply; 21+ messages in thread From: Peter Zijlstra @ 2021-05-25 10:47 UTC (permalink / raw) To: changhuaixin Cc: Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal, pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong On Mon, May 24, 2021 at 08:42:03PM +0800, changhuaixin wrote: > >> static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg) > >> @@ -4651,6 +4666,9 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b, > >> if (cfs_b->quota == RUNTIME_INF) > >> amount = min_amount; > >> else { > >> + if (!cfs_b->period_active) > >> + __refill_cfs_bandwidth_runtime(cfs_b); > > > > Why this call? > > As the cfs bandwidth timer stops on idle with runtime unfilled, refill runtime when it restarts to make > use of the underrun when period timer stops. Another way to do this might be: > > throttled = !list_empty(&cfs_b->throttled_cfs_rq); > cfs_b->nr_periods += overrun; > > + __refill_cfs_bandwidth_runtime(cfs_b); > + > /* > * idle depends on !throttled (for the case of a large deficit), and if > * we're going inactive then everything else can be deferred > */ > if (cfs_b->idle && !throttled) > goto out_deactivate; > > - __refill_cfs_bandwidth_runtime(cfs_b); > - > if (!throttled) { > Ben, do you have a preference? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-25 10:47 ` Peter Zijlstra @ 2021-05-25 21:25 ` Benjamin Segall 0 siblings, 0 replies; 21+ messages in thread From: Benjamin Segall @ 2021-05-25 21:25 UTC (permalink / raw) To: Peter Zijlstra Cc: changhuaixin, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal, pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong Peter Zijlstra <peterz@infradead.org> writes: > On Mon, May 24, 2021 at 08:42:03PM +0800, changhuaixin wrote: > >> >> static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg) >> >> @@ -4651,6 +4666,9 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b, >> >> if (cfs_b->quota == RUNTIME_INF) >> >> amount = min_amount; >> >> else { >> >> + if (!cfs_b->period_active) >> >> + __refill_cfs_bandwidth_runtime(cfs_b); >> > >> > Why this call? >> >> As the cfs bandwidth timer stops on idle with runtime unfilled, refill runtime when it restarts to make >> use of the underrun when period timer stops. Another way to do this might be: >> >> throttled = !list_empty(&cfs_b->throttled_cfs_rq); >> cfs_b->nr_periods += overrun; >> >> + __refill_cfs_bandwidth_runtime(cfs_b); >> + >> /* >> * idle depends on !throttled (for the case of a large deficit), and if >> * we're going inactive then everything else can be deferred >> */ >> if (cfs_b->idle && !throttled) >> goto out_deactivate; >> >> - __refill_cfs_bandwidth_runtime(cfs_b); >> - >> if (!throttled) { >> > > Ben, do you have a preference? I think I prefer the latter, possibly with a /* Refill extra burst quota even if cfs_b->idle */ ^ permalink raw reply [flat|nested] 21+ messages in thread
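[Editor's note] The ordering Ben prefers can be mirrored in a toy model of the period-timer handler: the refill is hoisted above the idle early-out, so burst headroom is replenished even in the period in which the timer goes inactive. Illustrative types and names only, not the kernel code:

```c
#include <assert.h>

/* Simplified stand-in for struct cfs_bandwidth: only the fields that
 * matter for the ordering question. */
struct bwc {
	long quota, burst, runtime;
	int idle, throttled;
	int period_active;
};

static void refill_runtime(struct bwc *b)
{
	b->runtime += b->quota;
	if (b->runtime > b->quota + b->burst)
		b->runtime = b->quota + b->burst;
}

/* Toy period-timer tick: returns 1 if the timer stays active, 0 if it
 * deactivates (the out_deactivate path in the real code). */
static int do_period(struct bwc *b)
{
	/* Refill extra burst quota even if cfs_b->idle. */
	refill_runtime(b);

	if (b->idle && !b->throttled) {
		b->period_active = 0;
		return 0;
	}

	/* ... distribute runtime to throttled runqueues ... */
	return 1;
}
```

Because the refill now precedes the early-out, a group going inactive with runtime below quota + burst still banks the unused headroom, which is what made the extra refill call in __assign_cfs_rq_runtime() unnecessary in the variant discussed above.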
* Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller 2021-05-24 12:42 ` changhuaixin 2021-05-25 10:46 ` Peter Zijlstra 2021-05-25 10:47 ` Peter Zijlstra @ 2021-05-25 10:49 ` Peter Zijlstra 2 siblings, 0 replies; 21+ messages in thread From: Peter Zijlstra @ 2021-05-25 10:49 UTC (permalink / raw) To: changhuaixin Cc: Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, Odin Ugedal, pauld, Paul Turner, Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong On Mon, May 24, 2021 at 08:42:03PM +0800, changhuaixin wrote: > >> --- a/kernel/sysctl.c > >> +++ b/kernel/sysctl.c > >> @@ -1816,6 +1816,15 @@ static struct ctl_table kern_table[] = { > >> .proc_handler = proc_dointvec_minmax, > >> .extra1 = SYSCTL_ONE, > >> }, > >> + { > >> + .procname = "sched_cfs_bw_burst_enabled", > >> + .data = &sysctl_sched_cfs_bw_burst_enabled, > >> + .maxlen = sizeof(unsigned int), > >> + .mode = 0644, > >> + .proc_handler = proc_dointvec_minmax, > >> + .extra1 = SYSCTL_ZERO, > >> + .extra2 = SYSCTL_ONE, > >> + }, > >> #endif > >> #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) > >> { > > > > What's the purpose of this new sysctl? By default it is disabled because > > burst==0, only if you set burst to some !0 value does this actually do > > anything. > > Originally, this is designed to turn burst feature off when the system becomes unstable. > Guess we can remove this as you have questioned it. Is stability a concern? This is CFS after all, if we overload, we simply share time as per usual. If there is a real use-case for a global knob to limit/disable this I don't object too much, but then please explicitly mention it. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics 2021-05-20 12:34 [PATCH v5 0/3] sched/fair: Burstable CFS bandwidth controller Huaixin Chang 2021-05-20 12:34 ` [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller Huaixin Chang @ 2021-05-20 12:34 ` Huaixin Chang 2021-05-20 14:11 ` Odin Ugedal ` (2 more replies) 2021-05-20 12:34 ` [PATCH v5 3/3] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang 2 siblings, 3 replies; 21+ messages in thread From: Huaixin Chang @ 2021-05-20 12:34 UTC (permalink / raw) To: changhuaixin Cc: bsegall, dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, linux-kernel, mgorman, mingo, odin, odin, pauld, peterz, pjt, rostedt, shanpeic, tj, vincent.guittot, xiyou.wangcong When using cfs_b and running into throttled periods, users can configure a burst buffer to accommodate bursty workloads. Besides configuring a burst buffer and watching whether the throttled periods disappear, some statistics on burst buffer usage are also helpful. 
Thus expose the following statistics into cpu.stat file: nr_burst: number of periods bandwidth burst occurs burst_time: cumulative wall-time that any cpus has used above quota in respective periods Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> --- kernel/sched/core.c | 13 ++++++++++--- kernel/sched/fair.c | 11 +++++++++++ kernel/sched/sched.h | 3 +++ 3 files changed, 24 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7d34b08ee0e5..d442fcd85374 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -9265,6 +9265,9 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v) seq_printf(sf, "wait_sum %llu\n", ws); } + seq_printf(sf, "nr_burst %d\n", cfs_b->nr_burst); + seq_printf(sf, "burst_time %llu\n", cfs_b->burst_time); + return 0; } #endif /* CONFIG_CFS_BANDWIDTH */ @@ -9361,16 +9364,20 @@ static int cpu_extra_stat_show(struct seq_file *sf, { struct task_group *tg = css_tg(css); struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth; - u64 throttled_usec; + u64 throttled_usec, burst_usec; throttled_usec = cfs_b->throttled_time; do_div(throttled_usec, NSEC_PER_USEC); + burst_usec = cfs_b->burst_time; + do_div(burst_usec, NSEC_PER_USEC); seq_printf(sf, "nr_periods %d\n" "nr_throttled %d\n" - "throttled_usec %llu\n", + "throttled_usec %llu\n" + "nr_burst %d\n" + "burst_usec %llu\n", cfs_b->nr_periods, cfs_b->nr_throttled, - throttled_usec); + throttled_usec, cfs_b->nr_burst, burst_usec); } #endif return 0; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 48fad5cf0f7a..d4783b62a010 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4635,6 +4635,8 @@ static inline u64 sched_cfs_bandwidth_slice(void) */ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth 
*cfs_b) { + u64 runtime; + if (unlikely(cfs_b->quota == RUNTIME_INF)) return; @@ -4643,8 +4645,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b) return; } + if (cfs_b->runtime_at_period_start > cfs_b->runtime) { + runtime = cfs_b->runtime_at_period_start - cfs_b->runtime; + if (runtime > cfs_b->quota) { + cfs_b->burst_time += runtime - cfs_b->quota; + cfs_b->nr_burst++; + } + } + cfs_b->runtime += cfs_b->quota; cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst); + cfs_b->runtime_at_period_start = cfs_b->runtime; } static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index d317ca74a48c..b770b553dfbb 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -367,6 +367,7 @@ struct cfs_bandwidth { u64 quota; u64 runtime; u64 burst; + u64 runtime_at_period_start; s64 hierarchical_quota; u8 idle; @@ -379,7 +380,9 @@ struct cfs_bandwidth { /* Statistics: */ int nr_periods; int nr_throttled; + int nr_burst; u64 throttled_time; + u64 burst_time; #endif }; -- 2.14.4.44.g2045bb6 ^ permalink raw reply related [flat|nested] 21+ messages in thread
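[Editor's note] The accounting added to __refill_cfs_bandwidth_runtime() above can be restated as a plain C model. Field names follow the patch; everything else is simplified and hypothetical. The idea: at each refill, if more than one period's quota was drained from the pool since the last period start, the excess must have come out of the burst buffer:

```c
#include <assert.h>

struct bw_stat {
	unsigned long long quota, burst;
	unsigned long long runtime, runtime_at_period_start;
	unsigned long long burst_time;	/* cumulative time above quota */
	int nr_burst;			/* periods in which a burst occurred */
};

static void refill_stats(struct bw_stat *b)
{
	unsigned long long used;

	/* How much was drained from the pool since the period started
	 * (the patch reuses a local `runtime` variable for this delta). */
	if (b->runtime_at_period_start > b->runtime) {
		used = b->runtime_at_period_start - b->runtime;
		if (used > b->quota) {
			b->burst_time += used - b->quota;
			b->nr_burst++;
		}
	}

	b->runtime += b->quota;
	if (b->runtime > b->quota + b->burst)
		b->runtime = b->quota + b->burst;
	b->runtime_at_period_start = b->runtime;
}
```

Note that `used` measures runtime handed out from cfs_b->runtime to the per-CPU silos, not time actually consumed; slices can be transferred but left unused, so nr_burst may tick without real over-quota consumption. This is the gap Odin raises in his reply.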
* Re: [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics 2021-05-20 12:34 ` [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang @ 2021-05-20 14:11 ` Odin Ugedal 2021-05-21 12:42 ` changhuaixin 2021-05-21 14:05 ` Peter Zijlstra 2021-05-20 17:06 ` Tejun Heo 2021-05-21 14:01 ` Peter Zijlstra 2 siblings, 2 replies; 21+ messages in thread From: Odin Ugedal @ 2021-05-20 14:11 UTC (permalink / raw) To: Huaixin Chang Cc: Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, Odin Ugedal, pauld, Peter Zijlstra, Paul Turner, Steven Rostedt, shanpeic, Tejun Heo, Vincent Guittot, xiyou.wangcong I am a bit sceptical about both the nr_burst and burst_time as they are now. As an example: a control group using "99.9%" of the quota each period that is never throttled. Such a group would, with this patch and a burst of X, still get nr_throttled = 0 (as before), but it would get a nr_burst and burst_time that keep increasing. I think there is a big difference between runtime moved/taken from cfs_b->runtime to cfs_rq->runtime_remaining and the actual runtime used in the period. Currently, cfs bw can only supply info on the first one, and not the latter. I think that if people see nr_burst increasing, they will think they _have_ to use cfs burst in order to avoid being throttled, even though that might not be the case. It is probably fine as is, as long as it is explicitly stated what the values mean and imply, and what they do not. I cannot see another way to calculate it as it is now, but maybe someone else has some thoughts. Thanks Odin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics 2021-05-20 14:11 ` Odin Ugedal @ 2021-05-21 12:42 ` changhuaixin 2021-05-21 14:05 ` Peter Zijlstra 1 sibling, 0 replies; 21+ messages in thread From: changhuaixin @ 2021-05-21 12:42 UTC (permalink / raw) To: Odin Ugedal Cc: changhuaixin, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, pauld, Peter Zijlstra, Paul Turner, Steven Rostedt, shanpeic, Tejun Heo, Vincent Guittot, xiyou.wangcong > On May 20, 2021, at 10:11 PM, Odin Ugedal <odin@uged.al> wrote: > > I am a bit sceptical about both the nr_burst and burst_time as they are now. > > As an example; a control group using "99.9%" of the quota each period > and that is never throttled. Such group would with this patch with a burst of X > still get nr_throttled = 0 (as before), but it would get a nr_burst > and burst_time that > will keep increasing. > Agreed, there are false-positive and false-negative cases, as the current implementation uses cfs_b->runtime to judge instead of the actual runtime used. > I think there is a big difference between runtime moved/taken from > cfs_b->runtime to cfs_rq->runtime_remaining and the actual runtime used > in the period. Currently, cfs bw can only supply info the first one, and > not the latter. > > I think that if people see nr_burst increasing, that they think they _have_ > to use cfs burst in order to avoid being throttled, even though that might > not be the case. It is probably fine as is, as long as it is explicitly stated One can't see nr_burst increasing before using the cfs burst feature in the first place. Do you mean people see nr_throttled increasing and use cfs burst, while the actual usage is below quota? In that case, tasks get throttled because there is runtime to be returned from cfs_rq, and get unthrottled shortly. That is a false positive for nr_throttled. When users see that, using burst can help. 
> what the values mean and imply, and what they do not. I cannot see another > way to calculate it as it is now, but maybe someone else has some thoughts. > > Thanks > Odin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics 2021-05-20 14:11 ` Odin Ugedal 2021-05-21 12:42 ` changhuaixin @ 2021-05-21 14:05 ` Peter Zijlstra 1 sibling, 0 replies; 21+ messages in thread From: Peter Zijlstra @ 2021-05-21 14:05 UTC (permalink / raw) To: Odin Ugedal Cc: Huaixin Chang, Benjamin Segall, Dietmar Eggemann, dtcccc, Juri Lelli, khlebnikov, open list, Mel Gorman, Ingo Molnar, pauld, Paul Turner, Steven Rostedt, shanpeic, Tejun Heo, Vincent Guittot, xiyou.wangcong On Thu, May 20, 2021 at 04:11:52PM +0200, Odin Ugedal wrote: > I am a bit sceptical about both the nr_burst and burst_time as they are now. > > As an example; a control group using "99.9%" of the quota each period > and that is never throttled. Such group would with this patch with a burst of X > still get nr_throttled = 0 (as before), but it would get a nr_burst > and burst_time that > will keep increasing. > > I think there is a big difference between runtime moved/taken from > cfs_b->runtime to cfs_rq->runtime_remaining and the actual runtime used > in the period. Currently, cfs bw can only supply info the first one, and > not the latter. > > I think that if people see nr_burst increasing, that they think they _have_ > to use cfs burst in order to avoid being throttled, even though that might > not be the case. It is probably fine as is, as long as it is explicitly stated > what the values mean and imply, and what they do not. I cannot see another > way to calculate it as it is now, but maybe someone else has some thoughts. You can always trace the system. I don't think we have nice tracepoints for any of this, but much can be inferred from the scheduler and hrtimer tracepoints. Also kprobe might be employed to stick in more appropriate thingies I suppose. You can also run the workload without bandwidth controls and measure its job execution times, and from that compute the bandwidth settings, all without tracepoints. 
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics 2021-05-20 12:34 ` [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang 2021-05-20 14:11 ` Odin Ugedal @ 2021-05-20 17:06 ` Tejun Heo 2021-05-21 14:01 ` Peter Zijlstra 2 siblings, 0 replies; 21+ messages in thread From: Tejun Heo @ 2021-05-20 17:06 UTC (permalink / raw) To: Huaixin Chang Cc: bsegall, dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, linux-kernel, mgorman, mingo, odin, odin, pauld, peterz, pjt, rostedt, shanpeic, vincent.guittot, xiyou.wangcong On Thu, May 20, 2021 at 08:34:18PM +0800, Huaixin Chang wrote: > When using cfs_b and meeting with some throttled periods, users shall > use burst buffer to allow bursty workloads. Apart from configuring some > burst buffer and watch whether throttled periods disappears, some > statistics on burst buffer using are also helpful. Thus expose the > following statistics into cpu.stat file: > > nr_burst: number of periods bandwidth burst occurs nr_bursts > burst_time: cumulative wall-time that any cpus has > used above quota in respective periods burst_usec may be more in line with other fields in that file. Thanks. -- tejun ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics 2021-05-20 12:34 ` [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang 2021-05-20 14:11 ` Odin Ugedal 2021-05-20 17:06 ` Tejun Heo @ 2021-05-21 14:01 ` Peter Zijlstra 2 siblings, 0 replies; 21+ messages in thread From: Peter Zijlstra @ 2021-05-21 14:01 UTC (permalink / raw) To: Huaixin Chang Cc: bsegall, dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, linux-kernel, mgorman, mingo, odin, odin, pauld, pjt, rostedt, shanpeic, tj, vincent.guittot, xiyou.wangcong On Thu, May 20, 2021 at 08:34:18PM +0800, Huaixin Chang wrote: > When using cfs_b and meeting with some throttled periods, users shall > use burst buffer to allow bursty workloads. Apart from configuring some > burst buffer and watch whether throttled periods disappears, some > statistics on burst buffer using are also helpful. Thus expose the > following statistics into cpu.stat file: > Helpful how.. the above is a bunch of words without any actual justification for any of this. ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v5 3/3] sched/fair: Add document for burstable CFS bandwidth 2021-05-20 12:34 [PATCH v5 0/3] sched/fair: Burstable CFS bandwidth controller Huaixin Chang 2021-05-20 12:34 ` [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller Huaixin Chang 2021-05-20 12:34 ` [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang @ 2021-05-20 12:34 ` Huaixin Chang 2 siblings, 0 replies; 21+ messages in thread From: Huaixin Chang @ 2021-05-20 12:34 UTC (permalink / raw) To: changhuaixin Cc: bsegall, dietmar.eggemann, dtcccc, juri.lelli, khlebnikov, linux-kernel, mgorman, mingo, odin, odin, pauld, peterz, pjt, rostedt, shanpeic, tj, vincent.guittot, xiyou.wangcong Basic description of usage and effect for CFS Bandwidth Control Burst. Co-developed-by: Shanpei Chen <shanpeic@linux.alibaba.com> Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com> Co-developed-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> --- Documentation/admin-guide/cgroup-v2.rst | 17 +++++--- Documentation/scheduler/sched-bwc.rst | 76 ++++++++++++++++++++++++++++----- 2 files changed, 77 insertions(+), 16 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index b1e81aa8598a..222c9942592e 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1000,6 +1000,8 @@ All time durations are in microseconds. - nr_periods - nr_throttled - throttled_usec + - nr_burst + - burst_usec cpu.weight A read-write single value file which exists on non-root @@ -1020,16 +1022,19 @@ All time durations are in microseconds. the closest approximation of the current weight. cpu.max - A read-write two value file which exists on non-root cgroups. - The default is "max 100000". + A read-write three value file which exists on non-root cgroups. + The default is "max 100000 0". 
The maximum bandwidth limit. It's in the following format:: - $MAX $PERIOD + $MAX $PERIOD $BURST - which indicates that the group may consume upto $MAX in each - $PERIOD duration. "max" for $MAX indicates no limit. If only - one number is written, $MAX is updated. + which indicates that the group may consume upto $MAX from this + period plus $BURST carried over from previous periods in each + $PERIOD duration. "max" for $MAX indicates no limit. "0" for + $BURST indicates no bandwidth can be carried over. $BURST should + not be larger than $MAX. On partial writing, values are updated + accordingly. cpu.pressure A read-write nested-keyed file. diff --git a/Documentation/scheduler/sched-bwc.rst b/Documentation/scheduler/sched-bwc.rst index 845eee659199..3e7ebb7bc562 100644 --- a/Documentation/scheduler/sched-bwc.rst +++ b/Documentation/scheduler/sched-bwc.rst @@ -22,39 +22,74 @@ cfs_quota units at each period boundary. As threads consume this bandwidth it is transferred to cpu-local "silos" on a demand basis. The amount transferred within each of these updates is tunable and described as the "slice". +By default, CPU bandwidth consumption is strictly limited to quota within each +given period. For the sequence of CPU usage u_i served under CFS bandwidth +control, if for any j <= k N(j,k) is the number of periods from u_j to u_k: + + u_j+...+u_k <= quota * N(j,k) + +For a bursty sequence among which interval u_j...u_k are at the peak, CPU +requests might have to wait for more periods to replenish enough quota. +Otherwise, larger quota is required. + +With "burst" buffer, CPU requests might be served as long as: + + u_j+...+u_k <= B_j + quota * N(j,k) <= quota * (N(j,k)+1) + +if for any j <= k N(j,k) is the number of periods from u_j to u_k and B_j is +the accumulated quota from previous periods in burst buffer serving u_j. The +max value of B_j is limited to quota so the total CPU utilization will not be +larger than quota * (N(j,k)+1). 
+Burst buffer helps in that serving whole bursty CPU requests without throttling +them can be done with moderate quota setting and accumulated quota in burst +buffer, if: + + u_0+...+u_n <= B_0 + quota * N(0,n) <= quota * (N(0,n)+1) + +where B_0 is the initial state of burst buffer. The maximum accumulated quota in +the burst buffer is capped by burst. With proper burst setting, the available +bandwidth is still determined by quota and period on the long run. + Management ---------- -Quota and period are managed within the cpu subsystem via cgroupfs. +Quota, period and burst are managed within the cpu subsystem via cgroupfs. .. note:: The cgroupfs files described in this section are only applicable to cgroup v1. For cgroup v2, see :ref:`Documentation/admin-guide/cgroupv2.rst <cgroup-v2-cpu>`. -- cpu.cfs_quota_us: the total available run-time within a period (in - microseconds) +- cpu.cfs_quota_us: run-time replenished within a period (in microseconds) - cpu.cfs_period_us: the length of a period (in microseconds) - cpu.stat: exports throttling statistics [explained further below] +- cpu.cfs_burst_us: the maximum accumulated run-time (in microseconds) The default values are:: cpu.cfs_period_us=100ms - cpu.cfs_quota=-1 + cpu.cfs_quota_us=-1 + cpu.cfs_burst_us=0 A value of -1 for cpu.cfs_quota_us indicates that the group does not have any bandwidth restriction in place, such a group is described as an unconstrained bandwidth group. This represents the traditional work-conserving behavior for CFS. -Writing any (valid) positive value(s) will enact the specified bandwidth limit. -The minimum quota allowed for the quota or period is 1ms. There is also an -upper bound on the period length of 1s. Additional restrictions exist when -bandwidth limits are used in a hierarchical fashion, these are explained in -more detail below. +Writing any (valid) positive value(s) no smaller than cpu.cfs_burst_us will +enact the specified bandwidth limit. 
The minimum quota allowed for the quota or +period is 1ms. There is also an upper bound on the period length of 1s. +Additional restrictions exist when bandwidth limits are used in a hierarchical +fashion, these are explained in more detail below. Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit and return the group to an unconstrained state once more. +A value of 0 for cpu.cfs_burst_us indicates that the group can not accumulate +any unused bandwidth. It makes the traditional bandwidth control behavior for +CFS unchanged. Writing any (valid) positive value(s) no larger than +cpu.cfs_quota_us into cpu.cfs_burst_us will enact the cap on unused bandwidth +accumulation. + Any updates to a group's bandwidth specification will result in it becoming unthrottled if it is in a constrained state. @@ -72,9 +107,15 @@ This is tunable via procfs:: Larger slice values will reduce transfer overheads, while smaller values allow for more fine-grained consumption. +There is also a global switch to turn off burst for all groups:: + /proc/sys/kernel/sched_cfs_bw_burst_enabled (default=1) + +By default it is enabled. Writing a 0 value means no accumulated CPU time can be +used for any group, even if cpu.cfs_burst_us is configured. + Statistics ---------- -A group's bandwidth statistics are exported via 3 fields in cpu.stat. +A group's bandwidth statistics are exported via 5 fields in cpu.stat. cpu.stat: @@ -82,6 +123,9 @@ cpu.stat: - nr_throttled: Number of times the group has been throttled/limited. - throttled_time: The total time duration (in nanoseconds) for which entities of the group have been throttled. +- nr_burst: Number of periods burst occurs. +- burst_time: Cumulative wall-time that any CPUs has used above quota in + respective periods This interface is read-only. @@ -179,3 +223,15 @@ Examples By using a small period here we are ensuring a consistent latency response at the expense of burst capacity. + +4. 
Limit a group to 40% of 1 CPU, and allow accumulate up to 20% of 1 CPU + additionally, in case accumulation has been done. + + With 50ms period, 20ms quota will be equivalent to 40% of 1 CPU. + And 10ms burst will be equivalent to 20% of 1 CPU. + + # echo 20000 > cpu.cfs_quota_us /* quota = 20ms */ + # echo 50000 > cpu.cfs_period_us /* period = 50ms */ + # echo 10000 > cpu.cfs_burst_us /* burst = 10ms */ + + Larger buffer setting (no larger than quota) allows greater burst capacity. -- 2.14.4.44.g2045bb6 ^ permalink raw reply related [flat|nested] 21+ messages in thread
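[Editor's note] Example 4 in the documentation above can be sanity-checked numerically: over any window of N consecutive periods the group may consume at most burst + N * quota, so the long-run share stays quota/period while a single period may momentarily reach (quota + burst)/period. A tiny illustrative helper (not part of the patch):

```c
#include <assert.h>

/* Upper bound on CPU time (in us) consumable in any window of `nperiods`
 * consecutive periods, given the documented quota/burst semantics. */
static long max_usage_us(long quota_us, long burst_us, long nperiods)
{
	return burst_us + nperiods * quota_us;
}
```

With quota = 20000us, burst = 10000us and period = 50000us as in the example, one period may reach 30000us (60% of one CPU), while over 10 periods the cap is 210000us out of 500000us, i.e. 42%, converging to the 40% long-run limit as the window grows.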
end of thread, other threads:[~2021-05-31 6:59 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-20 12:34 [PATCH v5 0/3] sched/fair: Burstable CFS bandwidth controller Huaixin Chang 2021-05-20 12:34 ` [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller Huaixin Chang 2021-05-20 14:00 ` Odin Ugedal 2021-05-20 17:04 ` Tejun Heo 2021-05-21 9:09 ` changhuaixin 2021-05-21 9:38 ` Odin Ugedal 2021-05-21 12:38 ` changhuaixin 2021-05-21 14:00 ` Peter Zijlstra 2021-05-24 12:42 ` changhuaixin 2021-05-25 10:46 ` Peter Zijlstra 2021-05-31 6:59 ` luca abeni 2021-05-25 10:47 ` Peter Zijlstra 2021-05-25 21:25 ` Benjamin Segall 2021-05-25 10:49 ` Peter Zijlstra 2021-05-20 12:34 ` [PATCH v5 2/3] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang 2021-05-20 14:11 ` Odin Ugedal 2021-05-21 12:42 ` changhuaixin 2021-05-21 14:05 ` Peter Zijlstra 2021-05-20 17:06 ` Tejun Heo 2021-05-21 14:01 ` Peter Zijlstra 2021-05-20 12:34 ` [PATCH v5 3/3] sched/fair: Add document for burstable CFS bandwidth Huaixin Chang