* [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle
@ 2021-06-03 11:38 Odin Ugedal
2021-06-03 12:51 ` Vincent Guittot
2021-06-03 14:39 ` kernel test robot
0 siblings, 2 replies; 6+ messages in thread
From: Odin Ugedal @ 2021-06-03 11:38 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira
Cc: cgroups, linux-kernel, Odin Ugedal
This fixes an issue where fairness is decreased since cfs_rq's can
end up not being decayed properly. For two sibling control groups with
the same priority, this can often lead to a load ratio of 99/1 (!!).

This happens because when a cfs_rq is throttled, all the descendant
cfs_rq's will be removed from the leaf list. When the initial cfs_rq is
unthrottled, it will currently only re-add descendant cfs_rq's if they
have one or more entities enqueued. This is not a sufficient heuristic.

Instead, we insert all cfs_rq's that contain one or more enqueued
entities or that contribute to the load of the task group.

The old behaviour can often lead to situations like this for equally
weighted control groups:
$ ps u -C stress
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 10009 88.8 0.0 3676 100 pts/1 R+ 11:04 0:13 stress --cpu 1
root 10023 3.0 0.0 3676 104 pts/1 R+ 11:04 0:00 stress --cpu 1
Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
Signed-off-by: Odin Ugedal <odin@uged.al>
---
Original thread: https://lore.kernel.org/lkml/20210518125202.78658-3-odin@uged.al/
Changes since v1:
- Use tg_load_avg_contrib instead of adding a new cfs_rq field
- Went from 3 patches to 1; one was merged and one is replaced
  by a new patchset.
kernel/sched/fair.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb945f8..0f1b39ca5ca8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4719,8 +4719,11 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
cfs_rq->throttled_clock_task;
- /* Add cfs_rq with already running entity in the list */
- if (cfs_rq->nr_running >= 1)
+ /*
+ * Add cfs_rq with tg load avg contribution or one or more
+ * already running entities to the list
+ */
+ if (cfs_rq->tg_load_avg_contrib || cfs_rq->nr_running)
list_add_leaf_cfs_rq(cfs_rq);
}
--
2.31.1
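For context, the throttle-side counterpart that the changelog refers to
removes the cfs_rq from the leaf list when its group enters the
throttled state. A rough sketch of that helper, reconstructed from the
surrounding discussion rather than quoted from the tree (details may
differ slightly):

static int tg_throttle_down(struct task_group *tg, void *data)
{
	struct rq *rq = data;
	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

	/*
	 * The group is entering the throttled state: stop the clock and
	 * take this cfs_rq off the leaf list. tg_unthrottle_up() above
	 * is responsible for putting it back, which is where the
	 * heuristic changed by this patch lives.
	 */
	if (!cfs_rq->throttle_count) {
		cfs_rq->throttled_clock_task = rq_clock_task(rq);
		list_del_leaf_cfs_rq(cfs_rq);
	}
	cfs_rq->throttle_count++;

	return 0;
}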
* Re: [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle
2021-06-03 11:38 [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle Odin Ugedal
@ 2021-06-03 12:51 ` Vincent Guittot
2021-06-03 13:12 ` Odin Ugedal
2021-06-03 14:39 ` kernel test robot
1 sibling, 1 reply; 6+ messages in thread
From: Vincent Guittot @ 2021-06-03 12:51 UTC (permalink / raw)
To: Odin Ugedal
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, open list:CONTROL GROUP (CGROUP),
linux-kernel
On Thu, 3 Jun 2021 at 13:41, Odin Ugedal <odin@uged.al> wrote:
>
> This fixes an issue where fairness is decreased since cfs_rq's can
> end up not being decayed properly. For two sibling control groups with
> the same priority, this can often lead to a load ratio of 99/1 (!!).
>
> This happens because when a cfs_rq is throttled, all the descendant
> cfs_rq's will be removed from the leaf list. When the initial cfs_rq is
> unthrottled, it will currently only re-add descendant cfs_rq's if they
> have one or more entities enqueued. This is not a sufficient heuristic.
>
> Instead, we insert all cfs_rq's that contain one or more enqueued
> entities or that contribute to the load of the task group.
>
> The old behaviour can often lead to situations like this for equally
> weighted control groups:
>
> $ ps u -C stress
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 10009 88.8 0.0 3676 100 pts/1 R+ 11:04 0:13 stress --cpu 1
> root 10023 3.0 0.0 3676 104 pts/1 R+ 11:04 0:00 stress --cpu 1
>
> Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
> Signed-off-by: Odin Ugedal <odin@uged.al>
> ---
>
> Original thread: https://lore.kernel.org/lkml/20210518125202.78658-3-odin@uged.al/
> Changes since v1:
> - Use tg_load_avg_contrib instead of adding a new cfs_rq field
> - Went from 3 patches to 1; one was merged and one is replaced
>   by a new patchset.
>
> kernel/sched/fair.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 794c2cb945f8..0f1b39ca5ca8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4719,8 +4719,11 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
> cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
> cfs_rq->throttled_clock_task;
>
> - /* Add cfs_rq with already running entity in the list */
> - if (cfs_rq->nr_running >= 1)
> + /*
> + * Add cfs_rq with tg load avg contribution or one or more
> + * already running entities to the list
> + */
> + if (cfs_rq->tg_load_avg_contrib || cfs_rq->nr_running)
Out of curiosity, why did you decide to use
cfs_rq->tg_load_avg_contrib instead of !cfs_rq_is_decayed(cfs_rq)
which is used to delete the cfs_rq from the list when updating blocked
load ?
> list_add_leaf_cfs_rq(cfs_rq);
> }
>
> --
> 2.31.1
>
* Re: [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle
2021-06-03 12:51 ` Vincent Guittot
@ 2021-06-03 13:12 ` Odin Ugedal
2021-06-03 13:40 ` Vincent Guittot
0 siblings, 1 reply; 6+ messages in thread
From: Odin Ugedal @ 2021-06-03 13:12 UTC (permalink / raw)
To: Vincent Guittot
Cc: Odin Ugedal, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, open list:CONTROL GROUP (CGROUP),
linux-kernel
Hi,
> Out of curiosity, why did you decide to use
> cfs_rq->tg_load_avg_contrib instead of !cfs_rq_is_decayed(cfs_rq)
> which is used to delete the cfs_rq from the list when updating blocked
> load ?
Well, the main reason was that it is currently (without the other
in-flight patches) not safe to use "cfs_rq_is_decayed" directly,
since that could result in a situation where tg_load_avg_contrib!=0
while cfs_rq_is_decayed()==true. I guess we can use cfs_rq_is_decayed()
if you prefer that, once all the other PELT patches are merged. (This
was initially why I thought a new field was a simpler and more elegant
solution to make sure we book-keep correctly, but when the PELT stuff
is fixed properly, that should be no real issue as long as it works as
we expect.)

I was also thinking about the cfs_rq->nr_running part: is there a
chance of a situation where cfs_rq->nr_running==1 but it has no load,
resulting in it being decayed and removed from the list in
"__update_blocked_fair"? I have not looked properly at it, but I am
wondering if that is actually possible.

Also, out of curiosity, are there any implications of a situation
where tg_load_avg_contrib==0 while *_load!=0, or would that not cause
fairness issues?
Thanks
Odin
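For reference, cfs_rq_is_decayed() as discussed here checks whether a
cfs_rq still carries any queued weight or non-zero PELT sums before
__update_blocked_fair() drops it from the leaf list. Roughly along
these lines (a sketch; the exact set of fields checked is an
assumption, not a quote from the tree):

static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
{
	/* Entities still enqueued? */
	if (cfs_rq->load.weight)
		return false;

	/* Any PELT sums left that still need to be decayed? */
	if (cfs_rq->avg.load_sum)
		return false;

	if (cfs_rq->avg.util_sum)
		return false;

	if (cfs_rq->avg.runnable_sum)
		return false;

	return true;
}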
* Re: [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle
2021-06-03 13:12 ` Odin Ugedal
@ 2021-06-03 13:40 ` Vincent Guittot
2021-06-03 13:55 ` Odin Ugedal
0 siblings, 1 reply; 6+ messages in thread
From: Vincent Guittot @ 2021-06-03 13:40 UTC (permalink / raw)
To: Odin Ugedal
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, open list:CONTROL GROUP (CGROUP),
linux-kernel
On Thu, 3 Jun 2021 at 15:13, Odin Ugedal <odin@uged.al> wrote:
>
> Hi,
>
> > Out of curiosity, why did you decide to use
> > cfs_rq->tg_load_avg_contrib instead of !cfs_rq_is_decayed(cfs_rq)
> > which is used to delete the cfs_rq from the list when updating blocked
> > load ?
>
> Well, the main reason was that it is currently (without the other
> in-flight patches) not safe to use "cfs_rq_is_decayed" directly,
> since that could result in a situation where tg_load_avg_contrib!=0
> while cfs_rq_is_decayed()==true. I guess we can use cfs_rq_is_decayed()
> if you prefer that, once all the other PELT patches are merged. (This
> was initially why I thought a new field was a simpler and more elegant
> solution to make sure we book-keep correctly, but when the PELT stuff
> is fixed properly, that should be no real issue as long as it works as
> we expect.)
If it's only a matter of waiting for the other PELT patches to be
merged, we should use cfs_rq_is_decayed().

cfs_rq->tg_load_avg_contrib is used to reduce contention on
tg->load_avg, but it is outside the scope of PELT and the update of
blocked load, so we should avoid using it there.
>
> I was also thinking about the cfs_rq->nr_running part: is there a
> chance of a situation where cfs_rq->nr_running==1 but it has no load,
> resulting in it being decayed and removed from the list in
> "__update_blocked_fair"? I have not looked properly at it, but I am
> wondering if that is actually possible.
>
> Also, out of curiosity, are there any implications of a situation
> where tg_load_avg_contrib==0 while *_load!=0, or would that not cause
Do you mean cfs_rq->avg.load_avg with *_load?
> fairness issues?
If load_avg!=0, we will update it periodically and sync
tg_load_avg_contrib with the former, so it's not a problem.

The other way around was a problem because we stop updating load_avg
and tg_load_avg_contrib when load_avg/load_sum is null, so
tg_load_avg_contrib is stalled with a possibly very old value.
>
> Thanks
> Odin
* Re: [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle
2021-06-03 13:40 ` Vincent Guittot
@ 2021-06-03 13:55 ` Odin Ugedal
0 siblings, 0 replies; 6+ messages in thread
From: Odin Ugedal @ 2021-06-03 13:55 UTC (permalink / raw)
To: Vincent Guittot
Cc: Odin Ugedal, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, open list:CONTROL GROUP (CGROUP),
linux-kernel
> If it's only a matter of waiting for the other PELT patches to be
> merged, we should use cfs_rq_is_decayed().
ACK. will post a v3.
> If load_avg!=0, we will update it periodically and sync
> tg_load_avg_contrib with the former, so it's not a problem.
>
> The other way around was a problem because we stop updating load_avg
> and tg_load_avg_contrib when load_avg/load_sum is null, so
> tg_load_avg_contrib is stalled with a possibly very old value.
Yeah, that makes sense.
Thanks
Odin
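The v3 condition would then presumably look something like this (a
sketch of the suggested direction, not the posted v3 itself):

	/*
	 * Add cfs_rq with load or one or more already running entities
	 * to the list.
	 */
	if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running)
		list_add_leaf_cfs_rq(cfs_rq);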
* Re: [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle
2021-06-03 11:38 [PATCH v2] sched/fair: Correctly insert cfs_rq's to list on unthrottle Odin Ugedal
2021-06-03 12:51 ` Vincent Guittot
@ 2021-06-03 14:39 ` kernel test robot
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2021-06-03 14:39 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]
Hi Odin,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master linux/master linus/master v5.13-rc4 next-20210603]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Odin-Ugedal/sched-fair-Correctly-insert-cfs_rq-s-to-list-on-unthrottle/20210603-194311
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 858f9e11be8855ed62cb97e58174515da595c76b
config: h8300-randconfig-r001-20210603 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/fd98f41f84acfad0ad1c7d1eb553807e8d1e0b9b
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Odin-Ugedal/sched-fair-Correctly-insert-cfs_rq-s-to-list-on-unthrottle/20210603-194311
git checkout fd98f41f84acfad0ad1c7d1eb553807e8d1e0b9b
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
kernel/sched/fair.c: In function 'tg_unthrottle_up':
>> kernel/sched/fair.c:4704:13: error: 'struct cfs_rq' has no member named 'tg_load_avg_contrib'
4704 | if (cfs_rq->tg_load_avg_contrib || cfs_rq->nr_running)
| ^~
vim +4704 kernel/sched/fair.c
4689
4690 static int tg_unthrottle_up(struct task_group *tg, void *data)
4691 {
4692 struct rq *rq = data;
4693 struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
4694
4695 cfs_rq->throttle_count--;
4696 if (!cfs_rq->throttle_count) {
4697 cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
4698 cfs_rq->throttled_clock_task;
4699
4700 /*
4701 * Add cfs_rq with tg load avg contribution or one or more
4702 * already running entities to the list
4703 */
> 4704 if (cfs_rq->tg_load_avg_contrib || cfs_rq->nr_running)
4705 list_add_leaf_cfs_rq(cfs_rq);
4706 }
4707
4708 return 0;
4709 }
4710
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 20340 bytes --]
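The build failure is most likely a configuration issue rather than a
logic one: tg_load_avg_contrib only exists in struct cfs_rq when
CONFIG_SMP (and CONFIG_FAIR_GROUP_SCHED) are enabled, and h8300 is a
UP-only architecture. A sketch of the relevant guard in
kernel/sched/sched.h (abridged from memory; exact layout may differ):

struct cfs_rq {
	/* ... other fields ... */
#ifdef CONFIG_SMP
	struct sched_avg	avg;
#ifdef CONFIG_FAIR_GROUP_SCHED
	unsigned long		tg_load_avg_contrib;
#endif /* CONFIG_FAIR_GROUP_SCHED */
#endif /* CONFIG_SMP */
	/* ... */
};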