linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] sched/cfs: make util/load_avg stable
@ 2017-04-19 16:29 Vincent Guittot
  2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:29 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall,
	Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains to run. As a consequence,
this remaining part is considered as idle time and generates unexpected
variations of util_avg of a busy CPU in the range ]1002..1024[ whereas
util_avg should stay at 1023.

The 1st patch implements Peter's proposal to remove the contribution of the
current time segment when computing util/load_avg. The 2nd one keeps using
the current segment but updates the max value instead. Both solutions make
load/util_avg stable, with the advantage that the 2nd patch uses the most
up-to-date value. I have split this into 2 patches to show the 2 versions,
but if the 2nd patch looks ok, we should probably squash them into one.
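
For illustration (assuming the usual PELT constants: LOAD_AVG_MAX = 47742 and
y^32 = 0.5, so that LOAD_AVG_MAX*y = LOAD_AVG_MAX - 1024 = 46718): right
after a segment rollover, a fully busy CPU has util_sum ~= 46718 * 1024,
hence util_avg = 46718 * 1024 / 47742 ~= 1002; just before the next rollover,
util_sum ~= 47741 * 1024, hence util_avg ~= 1023. The oscillation between
these two values is the variation described above.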

Vincent Guittot (2):
  sched/cfs: make util/load_avg more stable
  sched/cfs: take into account current segment

 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] sched/cfs: make util/load_avg more stable
  2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
@ 2017-04-19 16:29 ` Vincent Guittot
  2017-04-19 16:29 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:29 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall,
	Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.
In order to keep the metric stable, we should not consider the ongoing time
segment when computing load/util_avg but only the segments that have already
fully elapsed.
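
In other words (SCHED_CAPACITY_SHIFT is 10, so the shift below is a multiply
by 1024), the update becomes:

	util_avg = (util_sum - running * period_contrib * 1024)
			/ (LOAD_AVG_MAX - 1024);

For a CPU that has been fully busy, util_sum is roughly
(LOAD_AVG_MAX*y + period_contrib) * 1024 == (46718 + period_contrib) * 1024
(assuming LOAD_AVG_MAX = 47742), so the period_contrib part cancels out and
the result no longer depends on how far we are into the current 1024us
segment.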

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f83a35..f74da94 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,12 +3017,15 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
+					(LOAD_AVG_MAX - 1024));
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
+			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
+					(LOAD_AVG_MAX - 1024));
 	}
-	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
+					(LOAD_AVG_MAX - 1024);
 
 	return 1;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] sched/cfs: take into account current time segment
  2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
@ 2017-04-19 16:29 ` Vincent Guittot
  2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
  3 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:29 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall,
	Vincent Guittot

Not considering the current time segment adds unwanted latency to the
load/util_avg responsiveness, especially when the time is scaled instead of
the contribution. Instead of waiting for the current time segment to have
fully elapsed before accounting it in load/util_avg, we can already account
for the elapsed part and change the range used to compute load/util_avg
accordingly.
At the very beginning of a new time segment, the past segments have been
decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
to MAX_LOAD_AVG. In fact, the max value is
sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].

As the elapsed part is already accounted in load/util_sum, we update the max
value according to the current position in the time segment instead of
removing its contribution.
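
A minimal sketch of the resulting divisor, written as a hypothetical helper
purely for readability (the patch below open-codes the expression):

	/*
	 * Hypothetical helper, not part of the patch: the maximum value
	 * that load/util_sum can have reached at the current position
	 * inside the 1024us segment.
	 */
	static inline u32 pelt_divider(const struct sched_avg *sa)
	{
		return LOAD_AVG_MAX - 1024 + sa->period_contrib;
	}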

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f74da94..c3b8f0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,15 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
-					(LOAD_AVG_MAX - 1024));
+	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
-					(LOAD_AVG_MAX - 1024));
+			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	}
-	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
-					(LOAD_AVG_MAX - 1024);
+	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
 
 	return 1;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 0/2] sched/cfs: make util/load_avg stable
  2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
  2017-04-19 16:29 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
@ 2017-04-19 16:44 ` Vincent Guittot
  2017-04-19 16:44   ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
  2017-04-19 16:44   ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
  2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
  3 siblings, 2 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:44 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall,
	Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains to run. As a consequence,
this remaining part is considered as idle time and generates unexpected
variations of util_avg of a busy CPU in the range ]1002..1024[ whereas
util_avg should stay at 1023.

The 1st patch implements Peter's proposal to remove the contribution of the
current time segment when computing util/load_avg. The 2nd one keeps using
the current segment but updates the max value instead. Both solutions make
load/util_avg stable, with the advantage that the 2nd patch uses the most
up-to-date value. I have split this into 2 patches to show the 2 versions,
but if the 2nd patch looks ok, we should probably squash them into one.

Vincent Guittot (2):
  sched/cfs: make util/load_avg more stable
  sched/cfs: take into account current segment

 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] sched/cfs: make util/load_avg more stable
  2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
@ 2017-04-19 16:44   ` Vincent Guittot
  2017-04-19 16:44   ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
  1 sibling, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:44 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall,
	Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.
In order to keep the metric stable, we should not consider the ongoing time
segment when computing load/util_avg but only the segments that have already
fully elapsed.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

Sorry some unexpected characters appeared in the commit message of previous
version

 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f83a35..f74da94 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,12 +3017,15 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
+					(LOAD_AVG_MAX - 1024));
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
+			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
+					(LOAD_AVG_MAX - 1024));
 	}
-	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
+					(LOAD_AVG_MAX - 1024);
 
 	return 1;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] sched/cfs: take into account current time segment
  2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:44   ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
@ 2017-04-19 16:44   ` Vincent Guittot
  1 sibling, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:44 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall,
	Vincent Guittot

Not considering the current time segment adds unwanted latency to the
load/util_avg responsiveness, especially when the time is scaled instead of
the contribution. Instead of waiting for the current time segment to have
fully elapsed before accounting it in load/util_avg, we can already account
for the elapsed part and change the range used to compute load/util_avg
accordingly.
At the very beginning of a new time segment, the past segments have been
decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
to MAX_LOAD_AVG. In fact, the max value is
sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].

As the elapsed part is already accounted in load/util_sum, we update the max
value according to the current position in the time segment instead of
removing its contribution.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f74da94..c3b8f0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,15 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
-					(LOAD_AVG_MAX - 1024));
+	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
-					(LOAD_AVG_MAX - 1024));
+			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	}
-	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
-					(LOAD_AVG_MAX - 1024);
+	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
 
 	return 1;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2] sched/cfs: make util/load_avg more stable
  2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
                   ` (2 preceding siblings ...)
  2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
@ 2017-04-19 16:54 ` Vincent Guittot
  2017-04-25 11:05   ` Dietmar Eggemann
  3 siblings, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:54 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall,
	Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains to run. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.

In order to keep the metric stable, we should not consider the ongoing time
segment when computing load/util_avg but only the segments that have already
fully elapsed. But not considering the current time segment adds unwanted
latency to the load/util_avg responsiveness, especially when the time is
scaled instead of the contribution. Instead of waiting for the current time
segment to have fully elapsed before accounting it in load/util_avg, we can
already account for the elapsed part and change the range used to compute
load/util_avg
accordingly.

At the very beginning of a new time segment, the past segments have been
decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
to MAX_LOAD_AVG. In fact, the max value is
sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].

As the elapsed part is already accounted in load/util_sum, we update the max
value according to the current position in the time segment instead of
removing its contribution.
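
As a quick sanity check of the stability claim (assuming LOAD_AVG_MAX = 47742,
so LOAD_AVG_MAX*y = 46718): for a CPU that has been fully busy long enough,
util_sum ~= (46718 + sa->period_contrib) * 1024 at any point inside the
current segment, so

	util_avg = util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib)
		~= (46718 + sa->period_contrib) * 1024
			/ (46718 + sa->period_contrib)
		 = 1024

independently of sa->period_contrib (1023 in practice, once integer
truncation is taken into account), instead of oscillating between ~1002 and
~1024 as before.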

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

Fold both patches in one

 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f83a35..c3b8f0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
+			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	}
-	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
 
 	return 1;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] sched/cfs: make util/load_avg more stable
  2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
@ 2017-04-25 11:05   ` Dietmar Eggemann
  2017-04-25 12:40     ` Vincent Guittot
  0 siblings, 1 reply; 11+ messages in thread
From: Dietmar Eggemann @ 2017-04-25 11:05 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel
  Cc: Morten.Rasmussen, yuyang.du, pjt, bsegall

On 19/04/17 17:54, Vincent Guittot wrote:
> In the current implementation of load/util_avg, we assume that the ongoing
> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
> even if part of the time segment still remains to run. As a consequence, this
> remaining part is considered as idle time and generates unexpected variations
> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should

Why do you use the square brackets the other way around? Just curious.

1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right? Might
be worth mentioning.

> stay at 1023.
> 
> In order to keep the metric stable, we should not consider the ongoing time
> segment when computing load/util_avg but only the segments that have already
> fully elapsed. Bu to not consider the current time segment adds unwanted
> latency in the load/util_avg responsivness especially when the time is scaled
> instead of the contribution. Instead of waiting for the current time segment
> to have fully elapsed before accounting it in load/util_avg, we can already
> account the elapsed part but change the range used to compute load/util_avg
> accordingly.
> 
> At the very beginning of a new time segment, the past segments have been
> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
> to MAX_LOAD_AVG. In fact, the max value is
> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.
> 
> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
> 
> As the elapsed part is already accounted in load/util_sum, we update the max
> value according to the current position in the time segment instead of
> removing its contribution.

Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
which was added in patch 1/2?

> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> 
> Fold both patches in one
> 
>  kernel/sched/fair.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3f83a35..c3b8f0f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  	/*
>  	 * Step 2: update *_avg.
>  	 */
> -	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> +	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	if (cfs_rq) {
>  		cfs_rq->runnable_load_avg =
> -			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
> +			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	}
> -	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> +	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  
>  	return 1;
>  }
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] sched/cfs: make util/load_avg more stable
  2017-04-25 11:05   ` Dietmar Eggemann
@ 2017-04-25 12:40     ` Vincent Guittot
  2017-04-25 14:53       ` Dietmar Eggemann
  0 siblings, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2017-04-25 12:40 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen,
	Yuyang Du, Paul Turner, Ben Segall

On 25 April 2017 at 13:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 19/04/17 17:54, Vincent Guittot wrote:
>> In the current implementation of load/util_avg, we assume that the ongoing
>> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
>> even if part of the time segment still remains to run. As a consequence, this
>> remaining part is considered as idle time and generates unexpected variations
>> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
>
> Why do you use the square brackets the other way around? Just curious.

This refers to the very beginning and very end of time segment formulas below.
That being said, 1024 is not reachable because at the very end we have:
 (1024*MAX_LOAD_AVG*y + 1024*1023) / MAX_LOAD_AVG ~= 1023.9997

1002 is not reachable because at the very beginning we have:
 (1024*MAX_LOAD_AVG*y + 1024*0) / MAX_LOAD_AVG ~= 1002.0577

But we are working with integers so [1002..1024[ is probably more correct
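
For anyone who wants to reproduce these bounds outside the kernel, here is a
small userspace sketch (an illustration only: it uses floating point and the
exact y rather than the kernel's integer tables, and the end-of-segment
figure depends on how close to 1024us period_contrib is taken, so the last
decimals differ from the ones above):

	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
		double y = pow(0.5, 1.0 / 32.0);   /* PELT decay: y^32 = 0.5 */
		double max = 1024.0 / (1.0 - y);   /* geometric limit, ~LOAD_AVG_MAX */

		/* very beginning of a segment: only the decayed past remains */
		printf("begin: %f\n", (1024.0 * max * y + 1024.0 * 0.0) / max);
		/* very end of a segment: ~1023us of the segment already accrued */
		printf("end:   %f\n", (1024.0 * max * y + 1024.0 * 1023.0) / max);
		return 0;
	}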

>
> 1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right ? Might
> be worth mentioning.
>
>> stay at 1023.
>>
>> In order to keep the metric stable, we should not consider the ongoing time
>> segment when computing load/util_avg but only the segments that have already
>> fully elapsed. Bu to not consider the current time segment adds unwanted
>> latency in the load/util_avg responsivness especially when the time is scaled
>> instead of the contribution. Instead of waiting for the current time segment
>> to have fully elapsed before accounting it in load/util_avg, we can already
>> account the elapsed part but change the range used to compute load/util_avg
>> accordingly.
>>
>> At the very beginning of a new time segment, the past segments have been
>> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
>> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
>> to MAX_LOAD_AVG. In fact, the max value is
>> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.
>>
>> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
>> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
>>
>> As the elapsed part is already accounted in load/util_sum, we update the max
>> value according to the current position in the time segment instead of
>> removing its contribution.
>
> Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
> which was added in patch 1/2?

removing its contribution refers to  "- sa->period_contrib * weight"
and "- (running * sa->period_contrib << SCHED_CAPACITY_SHIFT))"  in
patch 1/2 of the previous version

>
>>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>
>> Fold both patches in one
>>
>>  kernel/sched/fair.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3f83a35..c3b8f0f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>       /*
>>        * Step 2: update *_avg.
>>        */
>> -     sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
>> +     sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>       if (cfs_rq) {
>>               cfs_rq->runnable_load_avg =
>> -                     div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>> +                     div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>       }
>> -     sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>> +     sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>
>>       return 1;
>>  }
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] sched/cfs: make util/load_avg more stable
  2017-04-25 12:40     ` Vincent Guittot
@ 2017-04-25 14:53       ` Dietmar Eggemann
  2017-04-25 15:17         ` Vincent Guittot
  0 siblings, 1 reply; 11+ messages in thread
From: Dietmar Eggemann @ 2017-04-25 14:53 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen,
	Yuyang Du, Paul Turner, Ben Segall

On 25/04/17 13:40, Vincent Guittot wrote:
> On 25 April 2017 at 13:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 19/04/17 17:54, Vincent Guittot wrote:
>>> In the current implementation of load/util_avg, we assume that the ongoing
>>> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
>>> even if part of the time segment still remains to run. As a consequence, this
>>> remaining part is considered as idle time and generates unexpected variations
>>> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
>>
>> Why do you use the square brackets the other way around? Just curious.
> 
> This refers to the very beginning and very end of time segment formulas below.
> That being said, 1024 is not reachable because at very end we have :
>  1024*MAX_LOAD_AVG*y+1024*1023 = 1023,9997
> 
> 1002 is not reachable because at very beg we have
> 1024*MAX_LOAD_AVG*y+ 1024*0 = 1002,0577
> 
> But we are working with integer so [1002..1024[ is probably more correct

OK, this is with y = 32nd-rt(0.5) exactly, understood.

I assume you mean LOAD_AVG_MAX instead of MAX_LOAD_AVG.

>> 1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right ? Might
>> be worth mentioning.
>>
>>> stay at 1023.
>>>
>>> In order to keep the metric stable, we should not consider the ongoing time
>>> segment when computing load/util_avg but only the segments that have already
>>> fully elapsed. Bu to not consider the current time segment adds unwanted
>>> latency in the load/util_avg responsivness especially when the time is scaled
>>> instead of the contribution. Instead of waiting for the current time segment
>>> to have fully elapsed before accounting it in load/util_avg, we can already
>>> account the elapsed part but change the range used to compute load/util_avg
>>> accordingly.
>>>
>>> At the very beginning of a new time segment, the past segments have been
>>> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
>>> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
>>> to MAX_LOAD_AVG. In fact, the max value is
>>> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

s/MAX_LOAD_AVG/LOAD_AVG_MAX

>>>
>>> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
>>> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
>>>
>>> As the elapsed part is already accounted in load/util_sum, we update the max
>>> value according to the current position in the time segment instead of
>>> removing its contribution.
>>
>> Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
>> which was added in patch 1/2?
> 
> removing its contribution refers to  "- sa->period_contrib * weight"
> and "- (running * sa->period_contrib << SCHED_CAPACITY_SHIFT))"  in
> patch 1/2 of the previous version

Yup, makes sense, so the '-1024' is the influence of the current 'time
segment' (n = 0) then.
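
Put differently: LOAD_AVG_MAX is roughly the geometric series
1024 * (1 + y + y^2 + ...), so dropping the n = 0 term (the current segment)
leaves 1024 * (y + y^2 + ...) == LOAD_AVG_MAX * y, which is where the
'LOAD_AVG_MAX - 1024' divisor comes from.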

IMHO, removing the contribution in patch 1/2 wouldn't take the freq and
cpu scaling of the contribution (which is still done in accumulate_sum())
into consideration.

>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>>
>>> Fold both patches in one
>>>
>>>  kernel/sched/fair.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 3f83a35..c3b8f0f 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>       /*
>>>        * Step 2: update *_avg.
>>>        */
>>> -     sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
>>> +     sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>       if (cfs_rq) {
>>>               cfs_rq->runnable_load_avg =
>>> -                     div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>>> +                     div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>       }
>>> -     sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>>> +     sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>
>>>       return 1;
>>>  }
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] sched/cfs: make util/load_avg more stable
  2017-04-25 14:53       ` Dietmar Eggemann
@ 2017-04-25 15:17         ` Vincent Guittot
  0 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-25 15:17 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen,
	Yuyang Du, Paul Turner, Ben Segall

On 25 April 2017 at 16:53, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 25/04/17 13:40, Vincent Guittot wrote:
>> On 25 April 2017 at 13:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>> On 19/04/17 17:54, Vincent Guittot wrote:
>>>> In the current implementation of load/util_avg, we assume that the ongoing
>>>> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
>>>> even if part of the time segment still remains to run. As a consequence, this
>>>> remaining part is considered as idle time and generates unexpected variations
>>>> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
>>>
>>> Why do you use the square brackets the other way around? Just curious.
>>
>> This refers to the very beginning and very end of time segment formulas below.
>> That being said, 1024 is not reachable because at very end we have :
>>  1024*MAX_LOAD_AVG*y+1024*1023 = 1023,9997
>>
>> 1002 is not reachable because at very beg we have
>> 1024*MAX_LOAD_AVG*y+ 1024*0 = 1002,0577
>>
>> But we are working with integer so [1002..1024[ is probably more correct
>
> OK, this is with y = 32nd-rt(0.5) exactly, understood.
>
> I assume you mean LOAD_AVG_MAX instead of MAX_LOAD_AVG.

correct

>
>>> 1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right ? Might
>>> be worth mentioning.
>>>
>>>> stay at 1023.
>>>>
>>>> In order to keep the metric stable, we should not consider the ongoing time
>>>> segment when computing load/util_avg but only the segments that have already
>>>> fully elapsed. Bu to not consider the current time segment adds unwanted
>>>> latency in the load/util_avg responsivness especially when the time is scaled
>>>> instead of the contribution. Instead of waiting for the current time segment
>>>> to have fully elapsed before accounting it in load/util_avg, we can already
>>>> account the elapsed part but change the range used to compute load/util_avg
>>>> accordingly.
>>>>
>>>> At the very beginning of a new time segment, the past segments have been
>>>> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
>>>> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
>>>> to MAX_LOAD_AVG. In fact, the max value is
>>>> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.
>
> s/MAX_LOAD_AVG/LOAD_AVG_MAX

ditto

>
>>>>
>>>> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
>>>> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
>>>>
>>>> As the elapsed part is already accounted in load/util_sum, we update the max
>>>> value according to the current position in the time segment instead of
>>>> removing its contribution.
>>>
>>> Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
>>> which was added in patch 1/2?
>>
>> removing its contribution refers to  "- sa->period_contrib * weight"
>> and "- (running * sa->period_contrib << SCHED_CAPACITY_SHIFT))"  in
>> patch 1/2 of the previous version
>
> Yup, makes sense, so the '-1024' is the influence of the current 'time
> segment' (n = 0) then.

Yes

>
> IMHO, the removing of contribution in patch 1/2 wouldn't take freq and
> cpu scaling of contribution (which is still in accumulate_sum()) into
> consideration.

Yes, you're right, but as everything has been folded into a single patch in
v2, it doesn't make any difference now

>
>>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>>> ---
>>>>
>>>> Fold both patches in one
>>>>
>>>>  kernel/sched/fair.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 3f83a35..c3b8f0f 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>       /*
>>>>        * Step 2: update *_avg.
>>>>        */
>>>> -     sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
>>>> +     sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>>       if (cfs_rq) {
>>>>               cfs_rq->runnable_load_avg =
>>>> -                     div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>>>> +                     div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>>       }
>>>> -     sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>>>> +     sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>>
>>>>       return 1;
>>>>  }
>>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-04-25 15:18 UTC | newest]

Thread overview: 11+ messages
2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
2017-04-19 16:29 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
2017-04-19 16:44   ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
2017-04-19 16:44   ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
2017-04-25 11:05   ` Dietmar Eggemann
2017-04-25 12:40     ` Vincent Guittot
2017-04-25 14:53       ` Dietmar Eggemann
2017-04-25 15:17         ` Vincent Guittot
