[PATCH 0/2] sched/cfs: make util/load_avg stable
@ 2017-04-19 16:29 Vincent Guittot
  2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
  ` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:29 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall, Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains to run. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.

The 1st patch implements Peter's proposal to remove the contribution of the
current time segment when computing util/load_avg. The 2nd one keeps using the
current segment but updates the max value instead. Both solutions make
load/util_avg stable, the 2nd patch with the advantage of using the most
up-to-date value. I have split it into 2 patches to show the 2 versions, but
if the 2nd patch looks ok, we should probably squash them into one.

Vincent Guittot (2):
  sched/cfs: make util/load_avg more stable
  sched/cfs: take into account current segment

 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 11+ messages in thread

[PATCH 1/2] sched/cfs: make util/load_avg more stable
2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
@ 2017-04-19 16:29 ` Vincent Guittot
  2017-04-19 16:29 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
  ` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:29 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall, Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.

In order to keep the metric stable, we should not consider the ongoing time
segment when computing load/util_avg but only the segments that have already
fully elapsed.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f83a35..f74da94 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,12 +3017,15 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
+				(LOAD_AVG_MAX - 1024));
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
+			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
+				(LOAD_AVG_MAX - 1024));
 	}
-	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
+			(LOAD_AVG_MAX - 1024);

 	return 1;
 }
-- 
2.7.4

[PATCH 2/2] sched/cfs: take into account current time segment
2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
@ 2017-04-19 16:29 ` Vincent Guittot
  2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
3 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:29 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall, Vincent Guittot

Not considering the current time segment adds unwanted latency in the
load/util_avg responsiveness, especially when the time is scaled instead of
the contribution. Instead of waiting for the current time segment to have
fully elapsed before accounting it in load/util_avg, we can already account
the elapsed part but change the range used to compute load/util_avg
accordingly.

At the very beginning of a new time segment, the past segments have been
decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
to MAX_LOAD_AVG. In fact, the max value is
sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].

As the elapsed part is already accounted in load/util_sum, we update the max
value according to the current position in the time segment instead of
removing its contribution.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f74da94..c3b8f0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,15 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
-				(LOAD_AVG_MAX - 1024));
+	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
-				(LOAD_AVG_MAX - 1024));
+			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	}
-	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
-			(LOAD_AVG_MAX - 1024);
+	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);

 	return 1;
 }
-- 
2.7.4

[PATCH 0/2] sched/cfs: make util/load_avg stable
2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
  2017-04-19 16:29 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
@ 2017-04-19 16:44 ` Vincent Guittot
  2017-04-19 16:44 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
  2017-04-19 16:44 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
  2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
3 siblings, 2 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:44 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall, Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains to run. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.

The 1st patch implements Peter's proposal to remove the contribution of the
current time segment when computing util/load_avg. The 2nd one keeps using the
current segment but updates the max value instead. Both solutions make
load/util_avg stable, the 2nd patch with the advantage of using the most
up-to-date value. I have split it into 2 patches to show the 2 versions, but
if the 2nd patch looks ok, we should probably squash them into one.

Vincent Guittot (2):
  sched/cfs: make util/load_avg more stable
  sched/cfs: take into account current segment

 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4

[PATCH 1/2] sched/cfs: make util/load_avg more stable
2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
@ 2017-04-19 16:44 ` Vincent Guittot
  2017-04-19 16:44 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
1 sibling, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:44 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall, Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.

In order to keep the metric stable, we should not consider the ongoing time
segment when computing load/util_avg but only the segments that have already
fully elapsed.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

Sorry, some unexpected characters appeared in the commit message of the
previous version.

 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f83a35..f74da94 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,12 +3017,15 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
+				(LOAD_AVG_MAX - 1024));
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
+			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
+				(LOAD_AVG_MAX - 1024));
 	}
-	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
+			(LOAD_AVG_MAX - 1024);

 	return 1;
 }
-- 
2.7.4

[PATCH 2/2] sched/cfs: take into account current time segment
2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  2017-04-19 16:44 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
@ 2017-04-19 16:44 ` Vincent Guittot
1 sibling, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:44 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall, Vincent Guittot

Not considering the current time segment adds unwanted latency in the
load/util_avg responsiveness, especially when the time is scaled instead of
the contribution. Instead of waiting for the current time segment to have
fully elapsed before accounting it in load/util_avg, we can already account
the elapsed part but change the range used to compute load/util_avg
accordingly.

At the very beginning of a new time segment, the past segments have been
decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
to MAX_LOAD_AVG. In fact, the max value is
sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].

As the elapsed part is already accounted in load/util_sum, we update the max
value according to the current position in the time segment instead of
removing its contribution.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f74da94..c3b8f0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,15 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64((sa->load_sum - sa->period_contrib * weight),
-				(LOAD_AVG_MAX - 1024));
+	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64((cfs_rq->runnable_load_sum - sa->period_contrib * weight),
-				(LOAD_AVG_MAX - 1024));
+			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	}
-	sa->util_avg = (sa->util_sum - (running * sa->period_contrib << SCHED_CAPACITY_SHIFT)) /
-			(LOAD_AVG_MAX - 1024);
+	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);

 	return 1;
 }
-- 
2.7.4

[PATCH v2] sched/cfs: make util/load_avg more stable
2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
  ` (2 preceding siblings ...)
  2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
@ 2017-04-19 16:54 ` Vincent Guittot
  2017-04-25 11:05   ` Dietmar Eggemann
3 siblings, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2017-04-19 16:54 UTC (permalink / raw)
To: peterz, mingo, linux-kernel
Cc: dietmar.eggemann, Morten.Rasmussen, yuyang.du, pjt, bsegall, Vincent Guittot

In the current implementation of load/util_avg, we assume that the ongoing
time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
even if part of the time segment still remains to run. As a consequence, this
remaining part is considered as idle time and generates unexpected variations
of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
stay at 1023.

In order to keep the metric stable, we should not consider the ongoing time
segment when computing load/util_avg but only the segments that have already
fully elapsed. But not considering the current time segment adds unwanted
latency in the load/util_avg responsiveness, especially when the time is
scaled instead of the contribution. Instead of waiting for the current time
segment to have fully elapsed before accounting it in load/util_avg, we can
already account the elapsed part but change the range used to compute
load/util_avg accordingly.

At the very beginning of a new time segment, the past segments have been
decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
to MAX_LOAD_AVG. In fact, the max value is
sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].

As the elapsed part is already accounted in load/util_sum, we update the max
value according to the current position in the time segment instead of
removing its contribution.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---

Fold both patches in one

 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f83a35..c3b8f0f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
+			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	}
-	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);

 	return 1;
 }
-- 
2.7.4

Re: [PATCH v2] sched/cfs: make util/load_avg more stable
2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
@ 2017-04-25 11:05 ` Dietmar Eggemann
  2017-04-25 12:40   ` Vincent Guittot
0 siblings, 1 reply; 11+ messages in thread
From: Dietmar Eggemann @ 2017-04-25 11:05 UTC (permalink / raw)
To: Vincent Guittot, peterz, mingo, linux-kernel
Cc: Morten.Rasmussen, yuyang.du, pjt, bsegall

On 19/04/17 17:54, Vincent Guittot wrote:
> In the current implementation of load/util_avg, we assume that the ongoing
> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
> even if part of the time segment still remains to run. As a consequence, this
> remaining part is considered as idle time and generates unexpected variations
> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should

Why do you use the square brackets the other way around? Just curious.

1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right? Might
be worth mentioning.

> stay at 1023.
>
> In order to keep the metric stable, we should not consider the ongoing time
> segment when computing load/util_avg but only the segments that have already
> fully elapsed. But not considering the current time segment adds unwanted
> latency in the load/util_avg responsiveness, especially when the time is
> scaled instead of the contribution. Instead of waiting for the current time
> segment to have fully elapsed before accounting it in load/util_avg, we can
> already account the elapsed part but change the range used to compute
> load/util_avg accordingly.
>
> At the very beginning of a new time segment, the past segments have been
> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
> to MAX_LOAD_AVG. In fact, the max value is
> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.
>
> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
>
> As the elapsed part is already accounted in load/util_sum, we update the max
> value according to the current position in the time segment instead of
> removing its contribution.

Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
which was added in patch 1/2?

>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>
> Fold both patches in one
>
>  kernel/sched/fair.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3f83a35..c3b8f0f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  	/*
>  	 * Step 2: update *_avg.
>  	 */
> -	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> +	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	if (cfs_rq) {
>  		cfs_rq->runnable_load_avg =
> -			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
> +			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	}
> -	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> +	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>
>  	return 1;
>  }

Re: [PATCH v2] sched/cfs: make util/load_avg more stable
2017-04-25 11:05 ` Dietmar Eggemann
@ 2017-04-25 12:40 ` Vincent Guittot
  2017-04-25 14:53   ` Dietmar Eggemann
0 siblings, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2017-04-25 12:40 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen,
	Yuyang Du, Paul Turner, Ben Segall

On 25 April 2017 at 13:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 19/04/17 17:54, Vincent Guittot wrote:
>> In the current implementation of load/util_avg, we assume that the ongoing
>> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
>> even if part of the time segment still remains to run. As a consequence, this
>> remaining part is considered as idle time and generates unexpected variations
>> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
>
> Why do you use the square brackets the other way around? Just curious.

This refers to the very beginning and very end of time segment formulas below.
That being said, 1024 is not reachable because at the very end we have:
  (MAX_LOAD_AVG*y + 1023) * 1024 / MAX_LOAD_AVG = 1023.9997

1002 is not reachable because at the very beginning we have:
  (MAX_LOAD_AVG*y + 0) * 1024 / MAX_LOAD_AVG = 1002.0577

But we are working with integers, so [1002..1024[ is probably more correct.

> 1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right? Might
> be worth mentioning.
>
>> stay at 1023.
>>
>> In order to keep the metric stable, we should not consider the ongoing time
>> segment when computing load/util_avg but only the segments that have already
>> fully elapsed. But not considering the current time segment adds unwanted
>> latency in the load/util_avg responsiveness, especially when the time is
>> scaled instead of the contribution. Instead of waiting for the current time
>> segment to have fully elapsed before accounting it in load/util_avg, we can
>> already account the elapsed part but change the range used to compute
>> load/util_avg accordingly.
>>
>> At the very beginning of a new time segment, the past segments have been
>> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
>> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
>> to MAX_LOAD_AVG. In fact, the max value is
>> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.
>>
>> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
>> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
>>
>> As the elapsed part is already accounted in load/util_sum, we update the max
>> value according to the current position in the time segment instead of
>> removing its contribution.
>
> Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
> which was added in patch 1/2?

Removing its contribution refers to "- sa->period_contrib * weight" and
"- (running * sa->period_contrib << SCHED_CAPACITY_SHIFT))" in patch 1/2
of the previous version.

>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>
>> Fold both patches in one
>>
>>  kernel/sched/fair.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3f83a35..c3b8f0f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>  	/*
>>  	 * Step 2: update *_avg.
>>  	 */
>> -	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
>> +	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>  	if (cfs_rq) {
>>  		cfs_rq->runnable_load_avg =
>> -			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>> +			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>  	}
>> -	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>> +	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>
>>  	return 1;
>>  }

Re: [PATCH v2] sched/cfs: make util/load_avg more stable
2017-04-25 12:40 ` Vincent Guittot
@ 2017-04-25 14:53 ` Dietmar Eggemann
  2017-04-25 15:17   ` Vincent Guittot
0 siblings, 1 reply; 11+ messages in thread
From: Dietmar Eggemann @ 2017-04-25 14:53 UTC (permalink / raw)
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen,
	Yuyang Du, Paul Turner, Ben Segall

On 25/04/17 13:40, Vincent Guittot wrote:
> On 25 April 2017 at 13:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 19/04/17 17:54, Vincent Guittot wrote:
>>> In the current implementation of load/util_avg, we assume that the ongoing
>>> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
>>> even if part of the time segment still remains to run. As a consequence, this
>>> remaining part is considered as idle time and generates unexpected variations
>>> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
>>
>> Why do you use the square brackets the other way around? Just curious.
>
> This refers to the very beginning and very end of time segment formulas below.
> That being said, 1024 is not reachable because at the very end we have:
>   (MAX_LOAD_AVG*y + 1023) * 1024 / MAX_LOAD_AVG = 1023.9997
>
> 1002 is not reachable because at the very beginning we have:
>   (MAX_LOAD_AVG*y + 0) * 1024 / MAX_LOAD_AVG = 1002.0577
>
> But we are working with integers, so [1002..1024[ is probably more correct.

OK, this is with y = 32nd-rt(0.5) exactly, understood.

I assume you mean LOAD_AVG_MAX instead of MAX_LOAD_AVG.

>> 1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right? Might
>> be worth mentioning.
>>
>>> stay at 1023.
>>>
>>> In order to keep the metric stable, we should not consider the ongoing time
>>> segment when computing load/util_avg but only the segments that have already
>>> fully elapsed. But not considering the current time segment adds unwanted
>>> latency in the load/util_avg responsiveness, especially when the time is
>>> scaled instead of the contribution. Instead of waiting for the current time
>>> segment to have fully elapsed before accounting it in load/util_avg, we can
>>> already account the elapsed part but change the range used to compute
>>> load/util_avg accordingly.
>>>
>>> At the very beginning of a new time segment, the past segments have been
>>> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
>>> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
>>> to MAX_LOAD_AVG. In fact, the max value is
>>> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.

s/MAX_LOAD_AVG/LOAD_AVG_MAX

>>>
>>> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
>>> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
>>>
>>> As the elapsed part is already accounted in load/util_sum, we update the max
>>> value according to the current position in the time segment instead of
>>> removing its contribution.
>>
>> Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
>> which was added in patch 1/2?
>
> Removing its contribution refers to "- sa->period_contrib * weight" and
> "- (running * sa->period_contrib << SCHED_CAPACITY_SHIFT))" in patch 1/2
> of the previous version.

Yup, makes sense, so the '-1024' is the influence of the current 'time
segment' (n = 0) then.

IMHO, the removing of contribution in patch 1/2 wouldn't take freq and
cpu scaling of contribution (which is still in accumulate_sum()) into
consideration.

>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>>
>>> Fold both patches in one
>>>
>>>  kernel/sched/fair.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 3f83a35..c3b8f0f 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>  	/*
>>>  	 * Step 2: update *_avg.
>>>  	 */
>>> -	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
>>> +	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>  	if (cfs_rq) {
>>>  		cfs_rq->runnable_load_avg =
>>> -			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>>> +			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>  	}
>>> -	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>>> +	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>
>>>  	return 1;
>>>  }

Re: [PATCH v2] sched/cfs: make util/load_avg more stable
2017-04-25 14:53 ` Dietmar Eggemann
@ 2017-04-25 15:17 ` Vincent Guittot
0 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2017-04-25 15:17 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen,
	Yuyang Du, Paul Turner, Ben Segall

On 25 April 2017 at 16:53, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 25/04/17 13:40, Vincent Guittot wrote:
>> On 25 April 2017 at 13:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>> On 19/04/17 17:54, Vincent Guittot wrote:
>>>> In the current implementation of load/util_avg, we assume that the ongoing
>>>> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
>>>> even if part of the time segment still remains to run. As a consequence, this
>>>> remaining part is considered as idle time and generates unexpected variations
>>>> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should
>>>
>>> Why do you use the square brackets the other way around? Just curious.
>>
>> This refers to the very beginning and very end of time segment formulas below.
>> That being said, 1024 is not reachable because at the very end we have:
>>   (MAX_LOAD_AVG*y + 1023) * 1024 / MAX_LOAD_AVG = 1023.9997
>>
>> 1002 is not reachable because at the very beginning we have:
>>   (MAX_LOAD_AVG*y + 0) * 1024 / MAX_LOAD_AVG = 1002.0577
>>
>> But we are working with integers, so [1002..1024[ is probably more correct.
>
> OK, this is with y = 32nd-rt(0.5) exactly, understood.
>
> I assume you mean LOAD_AVG_MAX instead of MAX_LOAD_AVG.

correct

>>> 1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right? Might
>>> be worth mentioning.
>>>
>>>> stay at 1023.
>>>>
>>>> In order to keep the metric stable, we should not consider the ongoing time
>>>> segment when computing load/util_avg but only the segments that have already
>>>> fully elapsed. But not considering the current time segment adds unwanted
>>>> latency in the load/util_avg responsiveness, especially when the time is
>>>> scaled instead of the contribution. Instead of waiting for the current time
>>>> segment to have fully elapsed before accounting it in load/util_avg, we can
>>>> already account the elapsed part but change the range used to compute
>>>> load/util_avg accordingly.
>>>>
>>>> At the very beginning of a new time segment, the past segments have been
>>>> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
>>>> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
>>>> to MAX_LOAD_AVG. In fact, the max value is
>>>> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.
>
> s/MAX_LOAD_AVG/LOAD_AVG_MAX

ditto

>>>>
>>>> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
>>>> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
>>>>
>>>> As the elapsed part is already accounted in load/util_sum, we update the max
>>>> value according to the current position in the time segment instead of
>>>> removing its contribution.
>>>
>>> Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
>>> which was added in patch 1/2?
>>
>> Removing its contribution refers to "- sa->period_contrib * weight" and
>> "- (running * sa->period_contrib << SCHED_CAPACITY_SHIFT))" in patch 1/2
>> of the previous version.
>
> Yup, makes sense, so the '-1024' is the influence of the current 'time
> segment' (n = 0) then.

Yes

>
> IMHO, the removing of contribution in patch 1/2 wouldn't take freq and
> cpu scaling of contribution (which is still in accumulate_sum()) into
> consideration.

Yes, you're right, but as everything has been put in 1 single patch in
v2, it doesn't make any difference now.

>>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>>> ---
>>>>
>>>> Fold both patches in one
>>>>
>>>>  kernel/sched/fair.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 3f83a35..c3b8f0f 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>  	/*
>>>>  	 * Step 2: update *_avg.
>>>>  	 */
>>>> -	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
>>>> +	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>>  	if (cfs_rq) {
>>>>  		cfs_rq->runnable_load_avg =
>>>> -			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>>>> +			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>>  	}
>>>> -	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>>>> +	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>>>>
>>>>  	return 1;
>>>>  }

end of thread, other threads: [~2017-04-25 15:18 UTC | newest]

Thread overview: 11+ messages
2017-04-19 16:29 [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
2017-04-19 16:29 ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
2017-04-19 16:29 ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
2017-04-19 16:44 ` [PATCH 0/2] sched/cfs: make util/load_avg stable Vincent Guittot
2017-04-19 16:44   ` [PATCH 1/2] sched/cfs: make util/load_avg more stable Vincent Guittot
2017-04-19 16:44   ` [PATCH 2/2] sched/cfs: take into account current time segment Vincent Guittot
2017-04-19 16:54 ` [PATCH v2] sched/cfs: make util/load_avg more stable Vincent Guittot
2017-04-25 11:05   ` Dietmar Eggemann
2017-04-25 12:40     ` Vincent Guittot
2017-04-25 14:53       ` Dietmar Eggemann
2017-04-25 15:17         ` Vincent Guittot
