* [PATCH v2 2/3] sched/fair: prevent cpu burst too many periods
From: Honglei Wang @ 2021-12-08 14:50 UTC
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel
  Cc: Huaixin Chang, Honglei Wang

Tasks might get more cpu than quota in persistent periods due to the
cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
burstable CFS controller").

For example, consider a task group whose quota is 100ms per period,
which can get 100ms of burst, and whose avg utilization is around 105ms
per period. Once this group gets a free period which leaves enough
runtime, it has a chance to get computing power more than its quota for
10 periods or more in a common bandwidth configuration (say, 100ms as
period). It means tasks can 'steal' the bursted power to do daily jobs,
because all tasks could be scheduled out or sleep to help the group get
free periods.

I believe the purpose of cpu burst is to help handle bursty workloads.
But if one task group can get computing power more than its quota for
persistent periods even if there is no bursty workload, it's kinda
broken.

This patch limits the burst to 2 periods so that it won't break the
quota limit for long. Permitting 2 periods helps in the scenario where
the period refresh lands in the middle of a burst workload. With this,
we can give a task group more cpu burst power to handle the real burst
workload and not worry about the 'stealing'.

Signed-off-by: Honglei Wang <wanghonglei@didichuxing.com>
---
 kernel/sched/fair.c  | 13 ++++++++++---
 kernel/sched/sched.h |  1 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2cd626c22912..4e04cb4269ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4645,14 +4645,21 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 		return;
 	}
 
-	cfs_b->runtime += cfs_b->quota;
-	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	runtime = cfs_b->runtime_snap - cfs_b->runtime - cfs_b->quota;
 	if (runtime > 0) {
 		cfs_b->burst_time += runtime;
 		cfs_b->nr_burst++;
+		cfs_b->burst_periods++;
+	}
+
+	if (cfs_b->burst_periods > 1) {
+		cfs_b->runtime = cfs_b->quota;
+		cfs_b->burst_periods = 0;
+	} else {
+		cfs_b->runtime += cfs_b->quota;
+		cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
 	}
 
-	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
 	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0e66749486e7..f42280bca3b2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -370,6 +370,7 @@ struct cfs_bandwidth {
 	u64			burst;
 	u64			runtime_snap;
 	s64			hierarchical_quota;
 
+	u8			burst_periods;
 	u8			idle;
 	u8			period_active;
-- 
2.14.1
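To make the effect of the refill change easier to follow, here is a minimal user-space sketch of the logic before and after the patch. It is only a model under stated assumptions: `struct bw_model`, `min_i64()` and `periods_over_quota()` are illustrative names that do not exist in the kernel, all values are in milliseconds, and the group is assumed to sit idle for one period and then ask for a fixed amount of cpu every period.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the fields the patch touches (milliseconds).
 * This is NOT the kernel's struct cfs_bandwidth. */
struct bw_model {
	int64_t quota, burst, runtime, runtime_snap, burst_time;
	int nr_burst, burst_periods;
};

static int64_t min_i64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Refill as in the lines the patch removes: unused runtime carries
 * over, capped at quota + burst. */
static void refill_before(struct bw_model *b)
{
	int64_t over;

	b->runtime += b->quota;
	over = b->runtime_snap - b->runtime;
	if (over > 0) {
		b->burst_time += over;
		b->nr_burst++;
	}
	b->runtime = min_i64(b->runtime, b->quota + b->burst);
	b->runtime_snap = b->runtime;
}

/* Refill as in the lines the patch adds: once a second burst period
 * has been counted, runtime is reset to plain quota. */
static void refill_after(struct bw_model *b)
{
	int64_t over = b->runtime_snap - b->runtime - b->quota;

	if (over > 0) {
		b->burst_time += over;
		b->nr_burst++;
		b->burst_periods++;
	}
	if (b->burst_periods > 1) {
		b->runtime = b->quota;
		b->burst_periods = 0;
	} else {
		b->runtime += b->quota;
		b->runtime = min_i64(b->runtime, b->quota + b->burst);
	}
	b->runtime_snap = b->runtime;
}

/* Hypothetical driver: one idle period, then `usage` ms requested in
 * each of `periods` periods. Returns how many periods consumed more
 * than quota. */
static int periods_over_quota(void (*refill)(struct bw_model *),
			      int64_t usage, int periods)
{
	struct bw_model b = { .quota = 100, .burst = 100,
			      .runtime = 100, .runtime_snap = 100 };
	int over = 0;

	assert(usage >= 0 && periods >= 0);
	refill(&b);			/* idle period: burst accumulates */
	for (int i = 0; i < periods; i++) {
		int64_t used = min_i64(usage, b.runtime);

		b.runtime -= used;
		if (used > b.quota)
			over++;
		refill(&b);
	}
	return over;
}
```

In this model, `periods_over_quota(refill_before, 105, 30)` returns 20 and `periods_over_quota(refill_after, 105, 30)` returns 2 for the 100ms quota / 100ms burst / 105ms usage example from the commit message.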
* Re: [PATCH v2 2/3] sched/fair: prevent cpu burst too many periods
From: Peter Zijlstra @ 2021-12-09 13:08 UTC
  To: Honglei Wang
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, Huaixin Chang,
	Honglei Wang

On Wed, Dec 08, 2021 at 10:50:38PM +0800, Honglei Wang wrote:
> Tasks might get more cpu than quota in persistent periods due to the
> cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
> burstable CFS controller").

> For example, consider a task group whose quota is 100ms per period,
> which can get 100ms of burst, and whose avg utilization is around 105ms
> per period.

That would be a mis-configuration, surely..

> Once this group gets a free period which
> leaves enough runtime, it has a chance to get computing power more
> than its quota for 10 periods or more in a common bandwidth configuration
> (say, 100ms as period).

Sure, if it, for some miraculous reason, decides to sleep for a whole
period and then resume, it can indeed consume up to that 100ms extra,
which, if as per the above, done at 5ms per period, would be 20 periods
until depleted.

> It means tasks can 'steal' the bursted power to
> do daily jobs, because all tasks could be scheduled out or sleep to help
> the group get free periods.

That's the design..

> I believe the purpose of cpu burst is to help handle bursty workloads.
> But if one task group can get computing power more than its quota for
> persistent periods even if there is no bursty workload, it's kinda broken.

So if that were bursty, it could consume that 100ms extra in a
single go and that would be fine, but spreading that same amount over 20
periods is somehow a problem? -- even though the interference is less.

> This patch limits the burst to 2 periods so that it won't break the
> quota limit for long. Permitting 2 periods helps in the scenario where
> the period refresh lands in the middle of a burst workload. With this,
> we can give a task group more cpu burst power to handle the real burst
> workload and not worry about the 'stealing'.

I've yet to see an actual reason for any of this...
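The "20 periods until depleted" figure above is just the accumulated burst divided by the per-period overshoot. As a small sketch of that arithmetic (the function name `periods_to_deplete()` is made up for illustration, and it assumes a full burst allowance has been banked and usage is constant):

```c
#include <assert.h>
#include <stdint.h>

/* Number of periods a banked burst allowance lasts when each period
 * consumes `usage` against a refill of `quota` (all in ms).
 * Returns -1 when usage does not exceed quota, since the banked burst
 * is then never drawn down. Illustrative helper, not kernel code. */
static int64_t periods_to_deplete(int64_t burst, int64_t quota, int64_t usage)
{
	int64_t extra = usage - quota;	/* overshoot per period */

	assert(burst >= 0 && quota > 0);
	if (extra <= 0)
		return -1;
	return (burst + extra - 1) / extra;	/* ceiling division */
}
```

With the numbers from the thread, `periods_to_deplete(100, 100, 105)` gives 100 / 5 = 20, while a workload that takes the whole burst at once, `periods_to_deplete(100, 100, 200)`, gives 1.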
* Re: [PATCH v2 2/3] sched/fair: prevent cpu burst too many periods
From: Honglei Wang @ 2021-12-10 16:34 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, Huaixin Chang,
	Honglei Wang

On 2021/12/9 21:08, Peter Zijlstra wrote:
> On Wed, Dec 08, 2021 at 10:50:38PM +0800, Honglei Wang wrote:
>> Tasks might get more cpu than quota in persistent periods due to the
>> cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
>> burstable CFS controller").
>
>> For example, consider a task group whose quota is 100ms per period,
>> which can get 100ms of burst, and whose avg utilization is around 105ms
>> per period.
>
> That would be a mis-configuration, surely..
>

Well, it's a lame example to describe the spreading of the burst..

>> Once this group gets a free period which
>> leaves enough runtime, it has a chance to get computing power more
>> than its quota for 10 periods or more in a common bandwidth configuration
>> (say, 100ms as period).
>
> Sure, if it, for some miraculous reason, decides to sleep for a whole
> period and then resume, it can indeed consume up to that 100ms extra,
> which, if as per the above, done at 5ms per period, would be 20 periods
> until depleted.
>
>> It means tasks can 'steal' the bursted power to
>> do daily jobs, because all tasks could be scheduled out or sleep to help
>> the group get free periods.
>
> That's the design..
>
>> I believe the purpose of cpu burst is to help handle bursty workloads.
>> But if one task group can get computing power more than its quota for
>> persistent periods even if there is no bursty workload, it's kinda broken.
>
> So if that were bursty, it could consume that 100ms extra in a
> single go and that would be fine, but spreading that same amount over 20
> periods is somehow a problem? -- even though the interference is less.
>

The key thought behind the change is that if the spreading of burst
power always happens, it indicates the quota is not comfortable. The
better way is to re-configure the quota for the container, rather than
always getting extra power from spreading the burst part, which is not
taken into consideration by a high-level container dispatcher such as
the k8s scheduler.

The container dispatcher might oversell capacity when dispatching jobs.
The extra power that containers get from burst spreading is invisible
to the dispatcher, so it might mislead the estimation of the overall
capacity of the host.

IMO, cpu burst should focus on real burst workloads, sharp and short
term. Well, if 2 periods are a bit short for some huge cpu calculation
jobs, maybe we can add an option to define the burstable periods, to
let the user make the decision based on the workload, if you think this
idea of limiting periods makes sense.

Thanks,
Honglei

>> This patch limits the burst to 2 periods so that it won't break the
>> quota limit for long. Permitting 2 periods helps in the scenario where
>> the period refresh lands in the middle of a burst workload. With this,
>> we can give a task group more cpu burst power to handle the real burst
>> workload and not worry about the 'stealing'.
>
> I've yet to see an actual reason for any of this...
>
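The "option to define the burstable periods" floated in this reply was not posted as code. Purely as an illustration of what such a knob might look like, the sketch below generalizes the patch's hard-coded limit of 2 into a field. Everything here is an assumption: `max_burst_periods`, `struct bw_limit_model`, `refill_limited()` and `count_over_quota()` are hypothetical names, the model runs in user space with millisecond units, and a value of 2 is chosen to reproduce the `burst_periods > 1` check from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* User-space model only; max_burst_periods is a hypothetical tunable
 * that is not part of the posted patch. */
struct bw_limit_model {
	int64_t quota, burst, runtime, runtime_snap;
	unsigned int burst_periods;
	unsigned int max_burst_periods;
};

static void refill_limited(struct bw_limit_model *b)
{
	int64_t over = b->runtime_snap - b->runtime - b->quota;

	if (over > 0)
		b->burst_periods++;

	/* As in the patch, the counter is cleared only when the limit
	 * is hit, not when a non-burst period intervenes. */
	if (b->burst_periods >= b->max_burst_periods) {
		b->runtime = b->quota;
		b->burst_periods = 0;
	} else {
		b->runtime += b->quota;
		if (b->runtime > b->quota + b->burst)
			b->runtime = b->quota + b->burst;
	}
	b->runtime_snap = b->runtime;
}

/* Hypothetical driver: one idle period, then `usage` ms requested per
 * period with quota = burst = 100 ms. Returns the number of periods
 * that consumed more than quota. */
static int count_over_quota(unsigned int max_periods, int64_t usage,
			    int periods)
{
	struct bw_limit_model b = { .quota = 100, .burst = 100,
				    .runtime = 100, .runtime_snap = 100,
				    .max_burst_periods = max_periods };
	int over = 0;

	assert(usage >= 0 && periods >= 0);
	refill_limited(&b);		/* idle period */
	for (int i = 0; i < periods; i++) {
		int64_t used = usage < b.runtime ? usage : b.runtime;

		b.runtime -= used;
		if (used > b.quota)
			over++;
		refill_limited(&b);
	}
	return over;
}
```

In this model a limit of 2 allows two over-quota periods for the 105ms example, a limit of 4 allows four, and a limit of 0 disables bursting entirely. Whether such a tunable is desirable is exactly the point under debate in the thread.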