From: Vincent Guittot <vincent.guittot@linaro.org>
To: Odin Ugedal <odin@uged.al>
Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>,
	Mel Gorman <mgorman@suse.de>, Daniel Bristot de Oliveira <bristot@redhat.com>,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4] sched/fair: Correctly insert cfs_rq's to list on unthrottle
Date: Mon, 7 Jun 2021 15:29:40 +0200	[thread overview]
Message-ID: <CAKfTPtDHrD_QGoLeUkR0ALRakWH+KOopHZk=29fyi-oonerd9g@mail.gmail.com> (raw)
In-Reply-To: <20210604102314.697749-1-odin@uged.al>

On Fri, 4 Jun 2021 at 12:26, Odin Ugedal <odin@uged.al> wrote:
>
> This fixes an issue where fairness is decreased since cfs_rq's can
> end up not being decayed properly. For two sibling control groups with
> the same priority, this can often lead to a load ratio of 99/1 (!!).
>
> This happens because when a cfs_rq is throttled, all the descendant
> cfs_rq's are removed from the leaf list. When the initial cfs_rq is
> unthrottled, it will currently only re-add descendant cfs_rq's if they
> have one or more entities enqueued. This is not a perfect heuristic.
>
> Instead, insert all cfs_rq's that contain one or more enqueued
> entities, or whose load is not completely decayed.
>
> This can often lead to situations like this for equally weighted
> control groups:
>
>   $ ps u -C stress
>   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>   root     10009 88.8  0.0   3676   100 pts/1    R+   11:04   0:13 stress --cpu 1
>   root     10023  3.0  0.0   3676   104 pts/1    R+   11:04   0:00 stress --cpu 1
>
> Fixes: 31bc6aeaab1d ("sched/fair: Optimize update_blocked_averages()")
> Signed-off-by: Odin Ugedal <odin@uged.al>
> ---
> Changes since v1:
>  - Replaced cfs_rq field with using tg_load_avg_contrib
>  - Went from 3 to 1 patches; one is merged and one is replaced
>    by a new patchset.
> Changes since v2:
>  - Use !cfs_rq_is_decayed() instead of tg_load_avg_contrib
>  - Moved cfs_rq_is_decayed to above its new use
> Changes since v3:
>  - (hopefully) Fix config for !CONFIG_SMP
>
>  kernel/sched/fair.c | 40 +++++++++++++++++++++-------------------
>  1 file changed, 21 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 794c2cb945f8..eec32f214ff8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -712,6 +712,25 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>         return calc_delta_fair(sched_slice(cfs_rq, se), se);
>  }
>
> +static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)

This is not the best place for this function:
- the pelt.h header file is included below, but cfs_rq_is_decayed() uses PELT
- CONFIG_SMP is already defined a few lines below
- cfs_rq_is_decayed() is only used with CONFIG_FAIR_GROUP_SCHED, and now
  with CONFIG_CFS_BANDWIDTH, which depends on the former

so moving cfs_rq_is_decayed() just above update_tg_load_avg(), with the
other functions used for propagating and updating the tg load, seems a
better place.

> +{
> +       if (cfs_rq->load.weight)
> +               return false;
> +
> +#ifdef CONFIG_SMP
> +       if (cfs_rq->avg.load_sum)
> +               return false;
> +
> +       if (cfs_rq->avg.util_sum)
> +               return false;
> +
> +       if (cfs_rq->avg.runnable_sum)
> +               return false;
> +#endif
> +
> +       return true;
> +}
> +
>  #include "pelt.h"
>  #ifdef CONFIG_SMP
>
> @@ -4719,8 +4738,8 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
>                 cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
>                                              cfs_rq->throttled_clock_task;
>
> -               /* Add cfs_rq with already running entity in the list */
> -               if (cfs_rq->nr_running >= 1)
> +               /* Add cfs_rq with load or one or more already running entities to the list */
> +               if (!cfs_rq_is_decayed(cfs_rq) || cfs_rq->nr_running)
>                         list_add_leaf_cfs_rq(cfs_rq);
>  }
>
> @@ -7895,23 +7914,6 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>
> -static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
> -{
> -       if (cfs_rq->load.weight)
> -               return false;
> -
> -       if (cfs_rq->avg.load_sum)
> -               return false;
> -
> -       if (cfs_rq->avg.util_sum)
> -               return false;
> -
> -       if (cfs_rq->avg.runnable_sum)
> -               return false;
> -
> -       return true;
> -}
> -
>  static bool __update_blocked_fair(struct rq *rq, bool *done)
>  {
>         struct cfs_rq *cfs_rq, *pos;
> --
> 2.31.1
>
Thread overview: 8+ messages

2021-06-04 10:23 [PATCH v4] sched/fair: Correctly insert cfs_rq's to list on unthrottle  Odin Ugedal
2021-06-07 13:29 ` Vincent Guittot [this message]
2021-06-07 13:36 ` Odin Ugedal
2021-06-08 16:39 ` Michal Koutný
2021-06-10  6:49 ` Vincent Guittot