All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched/fair: prevent cpu burst too many periods
@ 2021-11-29 16:28 Honglei Wang
  2021-11-29 20:13 ` Benjamin Segall
  0 siblings, 1 reply; 4+ messages in thread
From: Honglei Wang @ 2021-11-29 16:28 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel
  Cc: jameshongleiwang

Tasks might get more cpu than quota in persistent periods due to the
cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
burstable CFS controller"). For example, one task group whose quota is
100ms per period and can get 100ms burst, and its avg utilization is
around 105ms per period. Once this group gets a free period which
leaves enough runtime, it has a chance to get computting power more
than its quota for 10 periods or more in common bandwidth configuration
(say, 100ms as period). It means tasks can 'steal' the bursted power to
do daily jobs because all tasks could be scheduled out or sleep to help
the group get free periods.

I believe the purpose of cpu burst is to help handling bursty worklod.
But if one task group can get computting power more than its quota for
persistent periods even there is no bursty workload, it's kinda broke.

This patch limits the burst to one period so that it won't break the
quota limit for long. With this, we can give task group more cpu burst
power to handle the real bursty workload and don't worry about the
'stealing'.

Signed-off-by: Honglei Wang <wanghonglei@didichuxing.com>
---
 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e476f6d9435..cc2c4567fc81 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4640,14 +4640,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
-	cfs_b->runtime += cfs_b->quota;
-	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	runtime = cfs_b->runtime_snap - cfs_b->quota - cfs_b->runtime;
+
 	if (runtime > 0) {
 		cfs_b->burst_time += runtime;
 		cfs_b->nr_burst++;
+		cfs_b->runtime = cfs_b->quota;
+	} else {
+		cfs_b->runtime += cfs_b->quota;
+		cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
 	}
 
-	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
 	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched/fair: prevent cpu burst too many periods
  2021-11-29 16:28 [PATCH] sched/fair: prevent cpu burst too many periods Honglei Wang
@ 2021-11-29 20:13 ` Benjamin Segall
  2021-11-30  7:58   ` Honglei Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Segall @ 2021-11-29 20:13 UTC (permalink / raw)
  To: Honglei Wang, Huaixin Chang
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, jameshongleiwang

Honglei Wang <wanghonglei@didichuxing.com> writes:

> Tasks might get more cpu than quota in persistent periods due to the
> cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
> burstable CFS controller"). For example, one task group whose quota is
> 100ms per period and can get 100ms burst, and its avg utilization is
> around 105ms per period. Once this group gets a free period which
> leaves enough runtime, it has a chance to get computting power more
> than its quota for 10 periods or more in common bandwidth configuration
> (say, 100ms as period). It means tasks can 'steal' the bursted power to
> do daily jobs because all tasks could be scheduled out or sleep to help
> the group get free periods.
>
> I believe the purpose of cpu burst is to help handling bursty worklod.
> But if one task group can get computting power more than its quota for
> persistent periods even there is no bursty workload, it's kinda broke.
>
> This patch limits the burst to one period so that it won't break the
> quota limit for long. With this, we can give task group more cpu burst
> power to handle the real bursty workload and don't worry about the
> 'stealing'.

CC ing the burst patch author.

Whether or not burst is useful only for burst, or also for a bit of
long-term-only fairness is not entirely clear to me. Assuming we want it
only for burst, cutting off this sharply has a bit of additional
downside because it means that if a period refresh lands in the middle
of a burst then you lose the burst runtime. Permitting only two periods
in a row to make use of burst should be doable but it's yet another
piece of state added to cfs_b for this, and given typical ~100ms periods
that may be low enough odds that we don't care.

>
> Signed-off-by: Honglei Wang <wanghonglei@didichuxing.com>
> ---
>  kernel/sched/fair.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e476f6d9435..cc2c4567fc81 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4640,14 +4640,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  	if (unlikely(cfs_b->quota == RUNTIME_INF))
>  		return;
>  
> -	cfs_b->runtime += cfs_b->quota;
> -	runtime = cfs_b->runtime_snap - cfs_b->runtime;
> +	runtime = cfs_b->runtime_snap - cfs_b->quota - cfs_b->runtime;
> +
>  	if (runtime > 0) {
>  		cfs_b->burst_time += runtime;
>  		cfs_b->nr_burst++;
> +		cfs_b->runtime = cfs_b->quota;
> +	} else {
> +		cfs_b->runtime += cfs_b->quota;
> +		cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>  	}
>  
> -	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>  	cfs_b->runtime_snap = cfs_b->runtime;
>  }

If we do this, it should also be mentioned in
Documentation/scheduler/sched-bwc.rst, since the straightforward
description of burst as extra max runtime is no longer enough.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched/fair: prevent cpu burst too many periods
  2021-11-29 20:13 ` Benjamin Segall
@ 2021-11-30  7:58   ` Honglei Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Honglei Wang @ 2021-11-30  7:58 UTC (permalink / raw)
  To: Benjamin Segall, Huaixin Chang
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel, jameshongleiwang



On 2021/11/30 04:13, Benjamin Segall wrote:
> Honglei Wang <wanghonglei@didichuxing.com> writes:
> 
>> Tasks might get more cpu than quota in persistent periods due to the
>> cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
>> burstable CFS controller"). For example, one task group whose quota is
>> 100ms per period and can get 100ms burst, and its avg utilization is
>> around 105ms per period. Once this group gets a free period which
>> leaves enough runtime, it has a chance to get computting power more
>> than its quota for 10 periods or more in common bandwidth configuration
>> (say, 100ms as period). It means tasks can 'steal' the bursted power to
>> do daily jobs because all tasks could be scheduled out or sleep to help
>> the group get free periods.
>>
>> I believe the purpose of cpu burst is to help handling bursty worklod.
>> But if one task group can get computting power more than its quota for
>> persistent periods even there is no bursty workload, it's kinda broke.
>>
>> This patch limits the burst to one period so that it won't break the
>> quota limit for long. With this, we can give task group more cpu burst
>> power to handle the real bursty workload and don't worry about the
>> 'stealing'.
> 
> CC ing the burst patch author.
> 
> Whether or not burst is useful only for burst, or also for a bit of
> long-term-only fairness is not entirely clear to me. Assuming we want it
> only for burst, cutting off this sharply has a bit of additional
> downside because it means that if a period refresh lands in the middle
> of a burst then you lose the burst runtime. Permitting only two periods
> in a row to make use of burst should be doable but it's yet another
> piece of state added to cfs_b for this, and given typical ~100ms periods
> that may be low enough odds that we don't care.
> 

Originally, I was thinking if we should permit 2 periods, but in another 
hand, I thought if we just permit 1, we can give more cpu.max.burst to 
help the workloads sharp enough and can use up the quota even when they 
start working in the middle of one period. In this way, they have chance 
to get job done in part period with quota+bursted cpu and probably one 
more period with only quota. It's kind of different views of the usage.

If you think it can't cover the burst work loads, we can permit more 
periods.

>>
>> Signed-off-by: Honglei Wang <wanghonglei@didichuxing.com>
>> ---
>>   kernel/sched/fair.c | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6e476f6d9435..cc2c4567fc81 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4640,14 +4640,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>>   	if (unlikely(cfs_b->quota == RUNTIME_INF))
>>   		return;
>>   
>> -	cfs_b->runtime += cfs_b->quota;
>> -	runtime = cfs_b->runtime_snap - cfs_b->runtime;
>> +	runtime = cfs_b->runtime_snap - cfs_b->quota - cfs_b->runtime;
>> +
>>   	if (runtime > 0) {
>>   		cfs_b->burst_time += runtime;
>>   		cfs_b->nr_burst++;
>> +		cfs_b->runtime = cfs_b->quota;
>> +	} else {
>> +		cfs_b->runtime += cfs_b->quota;
>> +		cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>>   	}
>>   
>> -	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>>   	cfs_b->runtime_snap = cfs_b->runtime;
>>   }
> 
> If we do this, it should also be mentioned in
> Documentation/scheduler/sched-bwc.rst, since the straightforward
> description of burst as extra max runtime is no longer enough.
> 

Yep, I'll try to refine the description of the documentation next time 
if we decide to do this.

Thank,
Honglei

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] sched/fair: prevent cpu burst too many periods
@ 2021-11-25 14:20 Honglei Wang
  0 siblings, 0 replies; 4+ messages in thread
From: Honglei Wang @ 2021-11-25 14:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, linux-kernel

Tasks might get more cpu than quota in persistent periods due to the
cpu burst introduced by commit f4183717b370 ("sched/fair: Introduce the
burstable CFS controller"). For example, one task group whose quota is
100ms per period and can get 100ms burst, and its avg utilization is
around 105ms per period. Once this group gets a free period which
leaves enough runtime, it has a chance to get computting power more
than its quota for 10 periods or more in common bandwidth configuration
(say, 100ms as period). It means tasks can 'steal' the bursted power to
do daily jobs because all tasks could be scheduled out or sleep to help
the group get free periods.

I believe the purpose of cpu burst is to help handling bursty worklod.
But if one task group can get computting power more than its quota for
persistent periods even there is no bursty workload, it's kinda broke.

This patch limits the burst to one period so that it won't break the
quota limit for long. With this, we can give task group more cpu burst
power to handle the real bursty workload and don't worry about the
'stealing'.

Signed-off-by: Honglei Wang <jameshongleiwang@126.com>
---
 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e476f6d9435..cc2c4567fc81 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4640,14 +4640,17 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 	if (unlikely(cfs_b->quota == RUNTIME_INF))
 		return;
 
-	cfs_b->runtime += cfs_b->quota;
-	runtime = cfs_b->runtime_snap - cfs_b->runtime;
+	runtime = cfs_b->runtime_snap - cfs_b->quota - cfs_b->runtime;
+
 	if (runtime > 0) {
 		cfs_b->burst_time += runtime;
 		cfs_b->nr_burst++;
+		cfs_b->runtime = cfs_b->quota;
+	} else {
+		cfs_b->runtime += cfs_b->quota;
+		cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
 	}
 
-	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
 	cfs_b->runtime_snap = cfs_b->runtime;
 }
 
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-11-30  8:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-29 16:28 [PATCH] sched/fair: prevent cpu burst too many periods Honglei Wang
2021-11-29 20:13 ` Benjamin Segall
2021-11-30  7:58   ` Honglei Wang
  -- strict thread matches above, loose matches on Subject: below --
2021-11-25 14:20 Honglei Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.