All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Jirka Hladky <jhladky@redhat.com>, Phil Auld <pauld@redhat.com>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Hillf Danton <hdanton@sina.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Douglas Shakshober <dshaks@redhat.com>,
	Waiman Long <longman@redhat.com>, Joe Mario <jmario@redhat.com>,
	Bill Gray <bgray@redhat.com>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6
Date: Fri, 15 May 2020 13:28:15 +0200	[thread overview]
Message-ID: <20200515112815.GT2957@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20200515084740.GJ3758@techsingularity.net>

On Fri, May 15, 2020 at 09:47:40AM +0100, Mel Gorman wrote:
> sched: Wake cache-local tasks via wake_list if wakee CPU is polling
> 
> There are two separate method for waking a task from idle, one for
> tasks running on CPUs that share a cache and one for when the caches are
> separate. The methods can loosely be called local and remote even though
> this is not directly related to NUMA and instead is due to the expected
> cost of accessing data that is cache-hot on another CPU. The "local" costs
> are expected to be relatively cheap as they share the LLC in comparison to
> a remote IPI that is potentially required when using the "remote" wakeup.
> The problem is that the local wakeup is not always cheaper and it appears
> to have degraded even further around the 4.19 mark.
> 
> There appears to be a couple of reasons why it can be slower.
> 
> The first is specific to pairs of tasks where one or both rapidly enters
> idle. For example, on netperf UDP_STREAM, the client sends a bunch of
> packets, wakes the server, the server processes some packets and goes
> back to sleep. There is a relationship between the tasks but it's not
> strictly synchronous. The timing is different if the client/server are on
> separate NUMA nodes and netserver is more likely to enter idle (measured
> as server entering idle 10% more times when tasks are local vs remote
> but machine-specific). However, the wakeups are so rapid that the wakeup
> happens while the server is descheduling. That forces the waker to spin
> on smp_cond_load_acquire for longer. In this case, it can be cheaper to
> add the task to the rq->wake_list even if that potentially requires an IPI.
> 
> The second is that the local wakeup path is simply not always
> that fast. Using ftrace, the cost of the locks, update_rq_clock and
> ttwu_do_activate was measured as roughly 4.5 microseconds. While it's
> a single instance, the cost of the "remote" wakeup for try_to_wake_up
> was roughly 2.5 microseconds versus 6.2 microseconds for the "fast" local
> wakeup. When there are tens of thousands of wakeups per second, these costs
> accumulate and manifest as a throughput regression in netperf UDP_STREAM.
> 
> The essential difference in costs comes down to whether the CPU is fully
> idle, a task is descheduling or polling in poll_idle(). This patch
> special cases ttwu_queue() to use the "remote" method if the CPUs
> task is polling as it's generally cheaper to use the wake_list in that
> case than the local method because an IPI should not be required. As it is
> race-prone, a reschedule IPI may still be sent but in that case the local
> wakeup would also have to send a reschedule IPI so it should be harmless.

We don't in fact send a wakeup IPI when polling. So this might end up
with an extra IPI.


> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9a2fbf98fd6f..59077c7c6660 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2380,13 +2380,32 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>  	struct rq_flags rf;
>  
>  #if defined(CONFIG_SMP)
> -	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
> +	if (sched_feat(TTWU_QUEUE)) {
> +		/*
> +		 * A remote wakeup is often expensive as can require
> +		 * an IPI and the wakeup path is slow. However, in
> +		 * the specific case where the target CPU is idle
> +		 * and polling, the CPU is active and rapidly checking
> +		 * if a reschedule is needed.

Not strictly true; MWAIT can be very deep idle, it's just that with
POLLING we indicate we do not have to send an IPI to wake up. Just
setting the TIF_NEED_RESCHED flag is sufficient to wake up -- the
monitor part of monitor-wait.

>                                             In this case, the idle
> +		 * task just needs to be marked for resched and p
> +		 * will rapidly be requeued which is less expensive
> +		 * than the direct wakeup path.
> +		 */
> +		if (cpus_share_cache(smp_processor_id(), cpu)) {
> +			struct thread_info *ti = task_thread_info(p);
> +			typeof(ti->flags) val = READ_ONCE(ti->flags);
> +
> +			if (val & _TIF_POLLING_NRFLAG)
> +				goto activate;

I'm completely confused... the result here is that if you're polling you
do _NOT_ queue on the wake_list, but instead immediately enqueue.

(which kinda makes sense, since if the remote CPU is idle, it doesn't
have these lines in its cache anyway)

But the subject and comments all seem to suggest the opposite !?

Also, this will fail compilation when !defined(TIF_POLLING_NRFLAGG).

> +		}
> +
>  		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
>  		ttwu_queue_remote(p, cpu, wake_flags);
>  		return;
>  	}
>  #endif
>  
> +activate:

The labels wants to go inside the ifdef, otherwise GCC will complain
about unused labels etc..

>  	rq_lock(rq, &rf);
>  	update_rq_clock(rq);
>  	ttwu_do_activate(rq, p, wake_flags, &rf);

  parent reply	other threads:[~2020-05-15 11:29 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24  9:52 [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6 Mel Gorman
2020-02-24  9:52 ` [PATCH 01/13] sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression Mel Gorman
2020-02-24  9:52 ` [PATCH 02/13] sched/numa: Trace when no candidate CPU was found on the preferred node Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 03/13] sched/numa: Distinguish between the different task_numa_migrate failure cases Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] sched/numa: Distinguish between the different task_numa_migrate() " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 04/13] sched/fair: Reorder enqueue/dequeue_task_fair path Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 05/13] sched/numa: Replace runnable_load_avg by load_avg Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 06/13] sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 07/13] sched/pelt: Remove unused runnable load average Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 08/13] sched/pelt: Add a new runnable average signal Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24 16:01     ` Valentin Schneider
2020-02-24 16:34       ` Mel Gorman
2020-02-25  8:23       ` Vincent Guittot
2020-02-24  9:52 ` [PATCH 09/13] sched/fair: Take into account runnable_avg to classify group Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 10/13] sched/numa: Prefer using an idle cpu as a migration target instead of comparing tasks Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] sched/numa: Prefer using an idle CPU " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 11/13] sched/numa: Find an alternative idle CPU if the CPU is part of an active NUMA balance Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 12/13] sched/numa: Bias swapping tasks based on their preferred node Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 13/13] sched/numa: Stop an exhastive search if a reasonable swap candidate or idle CPU is found Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24 15:16 ` [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6 Ingo Molnar
2020-02-25 11:59   ` Mel Gorman
2020-02-25 13:28     ` Vincent Guittot
2020-02-25 14:24       ` Mel Gorman
2020-02-25 14:53         ` Vincent Guittot
2020-02-27  9:09         ` Ingo Molnar
2020-03-09 19:12 ` Phil Auld
2020-03-09 20:36   ` Mel Gorman
2020-03-12  9:54     ` Mel Gorman
2020-03-12 12:17       ` Jirka Hladky
     [not found]       ` <CAE4VaGA4q4_qfC5qe3zaLRfiJhvMaSb2WADgOcQeTwmPvNat+A@mail.gmail.com>
2020-03-12 15:56         ` Mel Gorman
2020-03-12 17:06           ` Jirka Hladky
     [not found]           ` <CAE4VaGD8DUEi6JnKd8vrqUL_8HZXnNyHMoK2D+1-F5wo+5Z53Q@mail.gmail.com>
2020-03-12 21:47             ` Mel Gorman
2020-03-12 22:24               ` Jirka Hladky
2020-03-20 15:08                 ` Jirka Hladky
     [not found]                 ` <CAE4VaGC09OfU2zXeq2yp_N0zXMbTku5ETz0KEocGi-RSiKXv-w@mail.gmail.com>
2020-03-20 15:22                   ` Mel Gorman
2020-03-20 15:33                     ` Jirka Hladky
     [not found]                     ` <CAE4VaGBGbTT8dqNyLWAwuiqL8E+3p1_SqP6XTTV71wNZMjc9Zg@mail.gmail.com>
2020-03-20 16:38                       ` Mel Gorman
2020-03-20 17:21                         ` Jirka Hladky
2020-05-07 15:24                         ` Jirka Hladky
2020-05-07 15:54                           ` Mel Gorman
2020-05-07 16:29                             ` Jirka Hladky
2020-05-07 17:49                               ` Phil Auld
     [not found]                                 ` <20200508034741.13036-1-hdanton@sina.com>
2020-05-18 14:52                                   ` Jirka Hladky
     [not found]                                     ` <20200519043154.10876-1-hdanton@sina.com>
2020-05-20 13:58                                       ` Jirka Hladky
2020-05-20 16:01                                         ` Jirka Hladky
2020-05-21 11:06                                         ` Mel Gorman
     [not found]                                         ` <20200521140931.15232-1-hdanton@sina.com>
2020-05-21 16:04                                           ` Mel Gorman
     [not found]                                           ` <20200522010950.3336-1-hdanton@sina.com>
2020-05-22 11:05                                             ` Mel Gorman
2020-05-08  9:22                               ` Mel Gorman
2020-05-08 11:05                                 ` Jirka Hladky
     [not found]                                 ` <CAE4VaGC_v6On-YvqdTwAWu3Mq4ofiV0pLov-QpV+QHr_SJr+Rw@mail.gmail.com>
2020-05-13 14:57                                   ` Jirka Hladky
2020-05-13 15:30                                     ` Mel Gorman
2020-05-13 16:20                                       ` Jirka Hladky
2020-05-14  9:50                                         ` Mel Gorman
     [not found]                                           ` <CAE4VaGCGUFOAZ+YHDnmeJ95o4W0j04Yb7EWnf8a43caUQs_WuQ@mail.gmail.com>
2020-05-14 10:08                                             ` Mel Gorman
2020-05-14 10:22                                               ` Jirka Hladky
2020-05-14 11:50                                                 ` Mel Gorman
2020-05-14 13:34                                                   ` Jirka Hladky
2020-05-14 15:31                                       ` Peter Zijlstra
2020-05-15  8:47                                         ` Mel Gorman
2020-05-15 11:17                                           ` Peter Zijlstra
2020-05-15 13:03                                             ` Mel Gorman
2020-05-15 13:12                                               ` Peter Zijlstra
2020-05-15 13:28                                                 ` Peter Zijlstra
2020-05-15 14:24                                             ` Peter Zijlstra
2020-05-21 10:38                                               ` Mel Gorman
2020-05-21 11:41                                                 ` Peter Zijlstra
2020-05-22 13:28                                                   ` Mel Gorman
2020-05-22 14:38                                                     ` Peter Zijlstra
2020-05-15 11:28                                           ` Peter Zijlstra [this message]
2020-05-15 12:22                                             ` Mel Gorman
2020-05-15 12:51                                               ` Peter Zijlstra
2020-05-15 14:43                                       ` Jirka Hladky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200515112815.GT2957@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bgray@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dshaks@redhat.com \
    --cc=hdanton@sina.com \
    --cc=jhladky@redhat.com \
    --cc=jmario@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=pauld@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.