linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Phil Auld <pauld@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Odin Ugedal <odin@uged.al>, Rik van Riel <riel@surriel.com>,
	Giovanni Gherdovich <ggherdovich@suse.cz>
Subject: Re: [RFC PATCH v2 4/5] sched/fair: Simplify load_cfs_rq_list maintenance
Date: Fri, 10 Sep 2021 16:19:27 +0200	[thread overview]
Message-ID: <CAKfTPtCAxksmsR6DJuAoSZ7XTon+gesyic36EHL-nQk0LiHWQg@mail.gmail.com> (raw)
In-Reply-To: <20210819175034.4577-5-mkoutny@suse.com>

On Thu, 19 Aug 2021 at 19:50, Michal Koutný <mkoutny@suse.com> wrote:
>
> load_cfs_rq_list should contain cfs_rqs that have runnable entities in
> them.  When we're operating under a throttled hierarchy we always
> update the load_cfs_rq_list in order not to break list_add_load_cfs_rq
> invariant of adding complete branches.
>
> This caused troubles when an entity became runnable (enqueue_entity)
> under a throttled hierarchy (see commit b34cb07dde7c ("sched/fair: Fix
> enqueue_task_fair() warning some more")). (Basically when we add to the
> load list, we have to ensure all the ancestors are added and when
> deleting we have to delete whole subtree.)
>
> This patch simplifies the code by no updates of load_cfs_rq_list when
> we're operating under the throttled hierarchy and defers the
> load_cfs_rq_list update to the point when whole hierarchy is unthrottled
> (tg_unthrottle_up). Specifically, subtrees of a throttled cfs_rq are not
> decaying their PELT when they're being throttled (but the parent of the
> throttled cfs_rq is decayed).

Your proposal looks interesting but I need more time to make sure that
all cases are covered. We have faced several odd corner cases and
sequences in the past that I need time to check that you don't put
some back

>
> The code is now simpler and load_cfs_rq_list contains only cfs_rqs that
> have load to decay.
>
> Signed-off-by: Michal Koutný <mkoutny@suse.com>
> ---
>  kernel/sched/fair.c | 58 ++++++++++-----------------------------------
>  1 file changed, 12 insertions(+), 46 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6f4d5d4dcdd9..9978485334ec 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3086,6 +3086,8 @@ void reweight_task(struct task_struct *p, int prio)
>         load->inv_weight = sched_prio_to_wmult[prio];
>  }
>
> +static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
> +
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  #ifdef CONFIG_SMP
>  /*
> @@ -3196,8 +3198,6 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
>  }
>  #endif /* CONFIG_SMP */
>
> -static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
> -
>  /*
>   * Recomputes the group entity based on the current state of its group
>   * runqueue.
> @@ -4294,10 +4294,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>
>         /*
>          * When bandwidth control is enabled, cfs might have been removed
> -        * because of a parent been throttled but cfs->nr_running > 1. Try to
> -        * add it unconditionally.
> +        * because of a parent been throttled. We'll add it later (with
> +        * complete branch up to se->on_rq/cfs_eq->on_list) in
> +        * tg_unthrottle_up() and unthrottle_cfs_rq().
>          */
> -       if (cfs_rq->nr_running == 1 || cfs_bandwidth_used())
> +       if (cfs_rq->nr_running == 1 && !throttled_hierarchy(cfs_rq))
>                 list_add_load_cfs_rq(cfs_rq);
>
>         if (cfs_rq->nr_running == 1)
> @@ -4936,31 +4937,13 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>                 /* end evaluation on encountering a throttled cfs_rq */
>                 if (cfs_rq_throttled(cfs_rq))
>                         goto unthrottle_throttle;
> -
> -               /*
> -                * One parent has been throttled and cfs_rq removed from the
> -                * list. Add it back to not break the load list.
> -                */
> -               if (throttled_hierarchy(cfs_rq))
> -                       list_add_load_cfs_rq(cfs_rq);
>         }
>
>         /* At this point se is NULL and we are at root level*/
>         add_nr_running(rq, task_delta);
>
>  unthrottle_throttle:
> -       /*
> -        * The cfs_rq_throttled() breaks in the above iteration can result in
> -        * incomplete load list maintenance, resulting in triggering the
> -        * assertion below.
> -        */
> -       for_each_sched_entity(se) {
> -               cfs_rq = cfs_rq_of(se);
> -
> -               if (list_add_load_cfs_rq(cfs_rq))
> -                       break;
> -       }
> -
> +       /* See enqueue_task_fair:enqueue_throttle */
>         assert_list_load_cfs_rq(rq);
>
>         /* Determine whether we need to wake up potentially idle CPU: */
> @@ -5600,13 +5583,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 /* end evaluation on encountering a throttled cfs_rq */
>                 if (cfs_rq_throttled(cfs_rq))
>                         goto enqueue_throttle;
> -
> -               /*
> -                * One parent has been throttled and cfs_rq removed from the
> -                * list. Add it back to not break the load list.
> -                */
> -               if (throttled_hierarchy(cfs_rq))
> -                       list_add_load_cfs_rq(cfs_rq);
>         }
>
>         /* At this point se is NULL and we are at root level*/
> @@ -5630,21 +5606,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>                 update_overutilized_status(rq);
>
>  enqueue_throttle:
> -       if (cfs_bandwidth_used()) {
> -               /*
> -                * When bandwidth control is enabled; the cfs_rq_throttled()
> -                * breaks in the above iteration can result in incomplete
> -                * load list maintenance, resulting in triggering the assertion
> -                * below.
> -                */
> -               for_each_sched_entity(se) {
> -                       cfs_rq = cfs_rq_of(se);
> -
> -                       if (list_add_load_cfs_rq(cfs_rq))
> -                               break;
> -               }
> -       }
> -
> +       /*
> +        * If we got here, subtree of a cfs_rq must have been throttled and
> +        * therefore we did not modify load list or we climbed up to root (or
> +        * joined to an ancestor cfs_rq with on_rq == 1 => on_list).
> +        */
>         assert_list_load_cfs_rq(rq);
>
>         hrtick_update(rq);
> --
> 2.32.0
>

  reply	other threads:[~2021-09-10 14:19 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-19 17:50 [RFC PATCH v2 0/5] leaf_cfs_rq_list cleanups and fix Michal Koutný
2021-08-19 17:50 ` [RFC PATCH v2 1/5] sched/fair: Add ancestors of unthrottled undecayed cfs_rq Michal Koutný
2021-09-09 13:57   ` Vincent Guittot
2021-09-10 11:35     ` Michal Koutný
2021-09-10 13:17       ` Vincent Guittot
2021-08-19 17:50 ` [RFC PATCH v2 2/5] sched: Add group_se() helper Michal Koutný
2021-08-19 17:50 ` [RFC PATCH v2 3/5] sched/fair: Rename leaf_list to more fitting load_list Michal Koutný
2021-09-21 19:06   ` Odin Ugedal
2021-08-19 17:50 ` [RFC PATCH v2 4/5] sched/fair: Simplify load_cfs_rq_list maintenance Michal Koutný
2021-09-10 14:19   ` Vincent Guittot [this message]
2021-09-14  9:22     ` Michal Koutný
2021-09-14  9:45       ` Vincent Guittot
2021-09-15 12:30         ` Vincent Guittot
2021-09-15 18:59           ` Odin Ugedal
2021-09-21 19:21             ` Odin Ugedal
2021-09-21 21:40               ` Michal Koutný
2021-08-19 17:50 ` [RFC PATCH v2 5/5] sched/fair: Simplify ancestor enqueue loops Michal Koutný
2021-09-09 14:04   ` Vincent Guittot
2021-09-10 11:35     ` Michal Koutný
2021-09-10 11:43       ` Vincent Guittot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtCAxksmsR6DJuAoSZ7XTon+gesyic36EHL-nQk0LiHWQg@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=ggherdovich@suse.cz \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=odin@uged.al \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).