* [PATCH] sched: allow newidle balancing to bail out of load_balance
@ 2022-06-09  2:55 Josh Don
  2022-06-09 13:41 ` Vincent Guittot
  2022-06-13  8:43 ` [tip: sched/core] sched: Allow " tip-bot2 for Josh Don
  0 siblings, 2 replies; 6+ messages in thread
From: Josh Don @ 2022-06-09  2:55 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel,
	Josh Don

While doing newidle load balancing, it is possible for new tasks to
arrive, such as with pending wakeups. newidle_balance() already accounts
for this by exiting the sched_domain load_balance() iteration if it
detects these cases. This is very important for minimizing wakeup
latency.
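
For context, the early exit in newidle_balance()'s sched_domain walk
has roughly the following shape (a simplified sketch, not the verbatim
kernel code):

	for_each_domain(this_cpu, sd) {
		...
		if (sd->flags & SD_BALANCE_NEWIDLE)
			pulled_task = load_balance(this_cpu, this_rq, sd,
						   CPU_NEWLY_IDLE,
						   &continue_balancing);
		...
		/*
		 * Stop searching for tasks to pull if there are now
		 * runnable tasks on this rq, or a wakeup is pending
		 * for it.
		 */
		if (pulled_task || this_rq->nr_running > 0 ||
		    this_rq->ttwu_pending)
			break;
	}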

However, if we are already in load_balance(), we may stay there for a
while before returning to newidle_balance(). This is most pronounced
if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A very
straightforward workaround is to adjust should_we_balance() to bail
out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
detected.

This was tested with the following reproduction:
- two threads that take turns sleeping and waking each other up are
  affined to two cores
- a large number of threads with 100% utilization are pinned to all
  other cores

Without this patch, wakeup latency was ~120us for the pair of threads,
almost entirely spent in load_balance(). With this patch, wakeup latency
is ~6us.
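
A minimal userspace sketch of such a reproduction is below. This is an
illustrative stand-in, not the original test harness; the CPU layout
and the pipe-based ping-pong are assumptions:

/* repro.c - wakeup ping-pong pair on CPUs 0-1, pinned spinners on
 * every other CPU.  Build: gcc -O2 -pthread repro.c -o repro
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static int ab[2], ba[2];	/* pipes carrying the wakeups */

static void pin(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* The pair takes turns: one sleeps in read() while the other runs. */
static void *ping(void *arg)
{
	char c = 0;

	pin(0);
	for (;;) {
		write(ab[1], &c, 1);	/* wake the partner */
		read(ba[0], &c, 1);	/* sleep until woken back */
	}
}

static void *pong(void *arg)
{
	char c;

	pin(1);
	for (;;) {
		read(ab[0], &c, 1);
		write(ba[1], &c, 1);
	}
}

static void *spin(void *arg)	/* 100% utilization, can't migrate */
{
	pin((long)arg);
	for (;;)
		;
}

int main(void)
{
	long cpu, ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	pthread_t t;

	pipe(ab);
	pipe(ba);
	pthread_create(&t, NULL, ping, NULL);
	pthread_create(&t, NULL, pong, NULL);
	for (cpu = 2; cpu < ncpus; cpu++)
		pthread_create(&t, NULL, spin, (void *)cpu);
	pause();
	return 0;
}

With the spinners strictly pinned, load_balance() repeatedly hits the
LBF_ALL_PINNED path while the pair's wakeups sit pending.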

Signed-off-by: Josh Don <joshdon@google.com>
---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8c5b74f66bd3..5abf30117824 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9834,9 +9834,15 @@ static int should_we_balance(struct lb_env *env)
 	/*
 	 * In the newly idle case, we will allow all the CPUs
 	 * to do the newly idle load balance.
+	 *
+	 * However, we bail out if we already have tasks or a wakeup pending,
+	 * to optimize wakeup latency.
 	 */
-	if (env->idle == CPU_NEWLY_IDLE)
+	if (env->idle == CPU_NEWLY_IDLE) {
+		if (env->dst_rq->nr_running > 0 || env->dst_rq->ttwu_pending)
+			return 0;
 		return 1;
+	}
 
 	/* Try to find first idle CPU */
 	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
-- 
2.36.1.476.g0c4daa206d-goog



* Re: [PATCH] sched: allow newidle balancing to bail out of load_balance
  2022-06-09  2:55 [PATCH] sched: allow newidle balancing to bail out of load_balance Josh Don
@ 2022-06-09 13:41 ` Vincent Guittot
  2022-06-09 19:40   ` Josh Don
  2022-06-13  8:43 ` [tip: sched/core] sched: Allow " tip-bot2 for Josh Don
  1 sibling, 1 reply; 6+ messages in thread
From: Vincent Guittot @ 2022-06-09 13:41 UTC (permalink / raw)
  To: Josh Don
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel

On Thu, 9 Jun 2022 at 04:55, Josh Don <joshdon@google.com> wrote:
>
> While doing newidle load balancing, it is possible for new tasks to
> arrive, such as with pending wakeups. newidle_balance() already accounts
> for this by exiting the sched_domain load_balance() iteration if it
> detects these cases. This is very important for minimizing wakeup
> latency.
>
> However, if we are already in load_balance(), we may stay there for a
> while before returning to newidle_balance(). This is most pronounced
> if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A very
> straightforward workaround is to adjust should_we_balance() to bail
> out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
> detected.

This check is close to the other tests, and I wonder if it would be
better placed before taking the busiest rq lock and detaching some
tasks.

Besides your use case, where none of the other threads can move to the
local cpu so load_balance() loops and clears the other cpus, most of
the time is probably spent in find_busiest_group() (fbg) and
find_busiest_queue() (fbq). There is thus a good chance that a task
woke up in the meantime, and it becomes useless to take the lock and
move tasks from another cpu if the local cpu is no longer newly idle.

Have you tried other places in load_balance(), and does this one
provide the lowest wakeup latency?

That being said, the current patch makes sense.

>
> This was tested with the following reproduction:
> - two threads that take turns sleeping and waking each other up are
>   affined to two cores
> - a large number of threads with 100% utilization are pinned to all
>   other cores
>
> Without this patch, wakeup latency was ~120us for the pair of threads,
> almost entirely spent in load_balance(). With this patch, wakeup latency
> is ~6us.
>
> Signed-off-by: Josh Don <joshdon@google.com>
> ---
>  kernel/sched/fair.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8c5b74f66bd3..5abf30117824 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9834,9 +9834,15 @@ static int should_we_balance(struct lb_env *env)
>         /*
>          * In the newly idle case, we will allow all the CPUs
>          * to do the newly idle load balance.
> +        *
> +        * However, we bail out if we already have tasks or a wakeup pending,
> +        * to optimize wakeup latency.
>          */
> -       if (env->idle == CPU_NEWLY_IDLE)
> +       if (env->idle == CPU_NEWLY_IDLE) {
> +               if (env->dst_rq->nr_running > 0 || env->dst_rq->ttwu_pending)
> +                       return 0;
>                 return 1;
> +       }
>
>         /* Try to find first idle CPU */
>         for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
> --
> 2.36.1.476.g0c4daa206d-goog
>


* Re: [PATCH] sched: allow newidle balancing to bail out of load_balance
  2022-06-09 13:41 ` Vincent Guittot
@ 2022-06-09 19:40   ` Josh Don
  2022-06-10  8:10     ` Vincent Guittot
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Don @ 2022-06-09 19:40 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel

Thanks Vincent,

On Thu, Jun 9, 2022 at 6:42 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 9 Jun 2022 at 04:55, Josh Don <joshdon@google.com> wrote:
> >
> > While doing newidle load balancing, it is possible for new tasks to
> > arrive, such as with pending wakeups. newidle_balance() already accounts
> > for this by exiting the sched_domain load_balance() iteration if it
> > detects these cases. This is very important for minimizing wakeup
> > latency.
> >
> > However, if we are already in load_balance(), we may stay there for a
> > while before returning to newidle_balance(). This is most pronounced
> > if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A very
> > straightforward workaround is to adjust should_we_balance() to bail
> > out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
> > detected.
>
> This check is close to the other tests, and I wonder if it would be
> better placed before taking the busiest rq lock and detaching some
> tasks.
>
> Besides your use case, where none of the other threads can move to the
> local cpu so load_balance() loops and clears the other cpus, most of
> the time is probably spent in find_busiest_group() (fbg) and
> find_busiest_queue() (fbq). There is thus a good chance that a task
> woke up in the meantime, and it becomes useless to take the lock and
> move tasks from another cpu if the local cpu is no longer newly idle.
>
> Have you tried other places in load_balance(), and does this one
> provide the lowest wakeup latency?
>
> That being said, the current patch makes sense.

I tested with another check after fbg/fbq and there wasn't any
noticeable improvement to observed wakeup latency (not totally
unexpected, since it only helps for wakeups that come during fbg/fbq).
However, I don't think there's any harm in having that extra check in
the CPU_NEWLY_IDLE case; might as well avoid bouncing the rq lock if
we can. fbq+fbg are together taking ~3-4us per iteration in my repro.

If there are no objections I can send a v2 with the added delta:

@@ -9906,6 +9906,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
                goto out_balanced;
        }

+       /*
+        * fbg/fbq can take a while. In the newly idle case, recheck whether
+        * we should continue with balancing, since it is possible that a
+        * task woke up in the interim.
+        */
+       if (env.idle == CPU_NEWLY_IDLE && !should_we_balance(&env)) {
+               *continue_balancing = 0;
+               goto out_balanced;
+       }
+
        BUG_ON(busiest == env.dst_rq);

        schedstat_add(sd->lb_imbalance[idle], env.imbalance);


* Re: [PATCH] sched: allow newidle balancing to bail out of load_balance
  2022-06-09 19:40   ` Josh Don
@ 2022-06-10  8:10     ` Vincent Guittot
  2022-06-10 18:46       ` Josh Don
  0 siblings, 1 reply; 6+ messages in thread
From: Vincent Guittot @ 2022-06-10  8:10 UTC (permalink / raw)
  To: Josh Don
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel

On Thu, 9 Jun 2022 at 21:40, Josh Don <joshdon@google.com> wrote:
>
> Thanks Vincent,
>
> On Thu, Jun 9, 2022 at 6:42 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Thu, 9 Jun 2022 at 04:55, Josh Don <joshdon@google.com> wrote:
> > >
> > > While doing newidle load balancing, it is possible for new tasks to
> > > arrive, such as with pending wakeups. newidle_balance() already accounts
> > > for this by exiting the sched_domain load_balance() iteration if it
> > > detects these cases. This is very important for minimizing wakeup
> > > latency.
> > >
> > > However, if we are already in load_balance(), we may stay there for a
> > > while before returning to newidle_balance(). This is most pronounced
> > > if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A very
> > > straightforward workaround is to adjust should_we_balance() to bail
> > > out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
> > > detected.
> >
> > This check is close to the other tests, and I wonder if it would be
> > better placed before taking the busiest rq lock and detaching some
> > tasks.
> >
> > Besides your use case, where none of the other threads can move to the
> > local cpu so load_balance() loops and clears the other cpus, most of
> > the time is probably spent in find_busiest_group() (fbg) and
> > find_busiest_queue() (fbq). There is thus a good chance that a task
> > woke up in the meantime, and it becomes useless to take the lock and
> > move tasks from another cpu if the local cpu is no longer newly idle.
> >
> > Have you tried other places in load_balance(), and does this one
> > provide the lowest wakeup latency?
> >
> > That being said, the current patch makes sense.
>
> I tested with another check after fbg/fbq and there wasn't any
> noticeable improvement to observed wakeup latency (not totally
> unexpected, since it only helps for wakeups that come during fbg/fbq).

Ok, so IIUC the wakeup has already happened when we start
load_balance() in your case, which makes the additional test useless
there.

> However, I don't think there's any harm in having that extra check in
> the CPU_NEWLY_IDLE case; might as well avoid bouncing the rq lock if
> we can. fbq+fbg are together taking ~3-4us per iteration in my repro.
>
> If there are no objections I can send a v2 with the added delta:

It would be good to get figures showing some benefit from this
additional check on a few benchmarks.

So I think we can stay with your current proposal for now.

>
> @@ -9906,6 +9906,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>                 goto out_balanced;
>         }
>
> +       /*
> +        * fbg/fbq can take a while. In the newly idle case, recheck whether
> +        * we should continue with balancing, since it is possible that a
> +        * task woke up in the interim.
> +        */
> +       if (env.idle == CPU_NEWLY_IDLE && !should_we_balance(&env)) {
> +               *continue_balancing = 0;
> +               goto out_balanced;
> +       }
> +
>         BUG_ON(busiest == env.dst_rq);
>
>         schedstat_add(sd->lb_imbalance[idle], env.imbalance);


* Re: [PATCH] sched: allow newidle balancing to bail out of load_balance
  2022-06-10  8:10     ` Vincent Guittot
@ 2022-06-10 18:46       ` Josh Don
  0 siblings, 0 replies; 6+ messages in thread
From: Josh Don @ 2022-06-10 18:46 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel

On Fri, Jun 10, 2022 at 1:10 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 9 Jun 2022 at 21:40, Josh Don <joshdon@google.com> wrote:
> >
> > Thanks Vincent,
> >
> > On Thu, Jun 9, 2022 at 6:42 AM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > On Thu, 9 Jun 2022 at 04:55, Josh Don <joshdon@google.com> wrote:
> > > >
> > > > While doing newidle load balancing, it is possible for new tasks to
> > > > arrive, such as with pending wakeups. newidle_balance() already accounts
> > > > for this by exiting the sched_domain load_balance() iteration if it
> > > > detects these cases. This is very important for minimizing wakeup
> > > > latency.
> > > >
> > > > However, if we are already in load_balance(), we may stay there for a
> > > > while before returning to newidle_balance(). This is most pronounced
> > > > if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A very
> > > > straightforward workaround is to adjust should_we_balance() to bail
> > > > out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
> > > > detected.
> > >
> > > This check is close to the other tests, and I wonder if it would be
> > > better placed before taking the busiest rq lock and detaching some
> > > tasks.
> > >
> > > Besides your use case, where none of the other threads can move to the
> > > local cpu so load_balance() loops and clears the other cpus, most of
> > > the time is probably spent in find_busiest_group() (fbg) and
> > > find_busiest_queue() (fbq). There is thus a good chance that a task
> > > woke up in the meantime, and it becomes useless to take the lock and
> > > move tasks from another cpu if the local cpu is no longer newly idle.
> > >
> > > Have you tried other places in load_balance(), and does this one
> > > provide the lowest wakeup latency?
> > >
> > > That being said, the current patch makes sense.
> >
> > I tested with another check after fbg/fbq and there wasn't any
> > noticeable improvement to observed wakeup latency (not totally
> > unexpected, since it only helps for wakeups that come during fbg/fbq).
>
> Ok, so IIUC the wakeup has already happened when we start
> load_balance() in your case, which makes the additional test useless
> there.

Not necessarily; the wakeup could also happen while we're in the
ALL_PINNED redo loop (this lasts ~100us), but the added check doesn't
meaningfully affect latency for my specific repro.
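
For reference, the redo path in question has roughly this shape in
load_balance() (a simplified sketch; exact identifiers vary by kernel
version):

	/* All tasks on this runqueue were pinned by CPU affinity */
	if (unlikely(env.flags & LBF_ALL_PINNED)) {
		__cpumask_clear_cpu(cpu_of(busiest), cpus);
		if (!cpumask_subset(cpus, env.dst_grpmask)) {
			env.loop = 0;
			env.loop_break = sched_nr_migrate_break;
			goto redo;
		}
		goto out_all_pinned;
	}

Each pass through 'redo' re-runs should_we_balance(), so the bail-out
in this patch is what cuts the loop short once a wakeup is pending.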

> > However, I don't think there's any harm in having that extra check in
> > the CPU_NEWLY_IDLE case; might as well avoid bouncing the rq lock if
> > we can. fbq+fbg are together taking ~3-4us per iteration in my repro.
> >
> > If there are no objections I can send a v2 with the added delta:
>
> It would be good to get figures showing some benefit from this
> additional check on a few benchmarks.
>
> So I think we can stay with your current proposal for now.

Sounds good, thanks for taking a look!

> >
> > @@ -9906,6 +9906,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >                 goto out_balanced;
> >         }
> >
> > +       /*
> > +        * fbg/fbq can take a while. In the newly idle case, recheck whether
> > +        * we should continue with balancing, since it is possible that a
> > +        * task woke up in the interim.
> > +        */
> > +       if (env.idle == CPU_NEWLY_IDLE && !should_we_balance(&env)) {
> > +               *continue_balancing = 0;
> > +               goto out_balanced;
> > +       }
> > +
> >         BUG_ON(busiest == env.dst_rq);
> >
> >         schedstat_add(sd->lb_imbalance[idle], env.imbalance);


* [tip: sched/core] sched: Allow newidle balancing to bail out of load_balance
  2022-06-09  2:55 [PATCH] sched: allow newidle balancing to bail out of load_balance Josh Don
  2022-06-09 13:41 ` Vincent Guittot
@ 2022-06-13  8:43 ` tip-bot2 for Josh Don
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot2 for Josh Don @ 2022-06-13  8:43 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Josh Don, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     792b9f65a568f48c50b3175536db9cde5a1edcc0
Gitweb:        https://git.kernel.org/tip/792b9f65a568f48c50b3175536db9cde5a1edcc0
Author:        Josh Don <joshdon@google.com>
AuthorDate:    Wed, 08 Jun 2022 19:55:15 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 13 Jun 2022 10:30:01 +02:00

sched: Allow newidle balancing to bail out of load_balance

While doing newidle load balancing, it is possible for new tasks to
arrive, such as with pending wakeups. newidle_balance() already accounts
for this by exiting the sched_domain load_balance() iteration if it
detects these cases. This is very important for minimizing wakeup
latency.

However, if we are already in load_balance(), we may stay there for a
while before returning to newidle_balance(). This is most pronounced
if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A very
straightforward workaround is to adjust should_we_balance() to bail
out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
detected.

This was tested with the following reproduction:
- two threads that take turns sleeping and waking each other up are
  affined to two cores
- a large number of threads with 100% utilization are pinned to all
  other cores

Without this patch, wakeup latency was ~120us for the pair of threads,
almost entirely spent in load_balance(). With this patch, wakeup latency
is ~6us.

Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220609025515.2086253-1-joshdon@google.com
---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7d8ef01..8bed757 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9824,9 +9824,15 @@ static int should_we_balance(struct lb_env *env)
 	/*
 	 * In the newly idle case, we will allow all the CPUs
 	 * to do the newly idle load balance.
+	 *
+	 * However, we bail out if we already have tasks or a wakeup pending,
+	 * to optimize wakeup latency.
 	 */
-	if (env->idle == CPU_NEWLY_IDLE)
+	if (env->idle == CPU_NEWLY_IDLE) {
+		if (env->dst_rq->nr_running > 0 || env->dst_rq->ttwu_pending)
+			return 0;
 		return 1;
+	}
 
 	/* Try to find first idle CPU */
 	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {

