All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Hillf Danton <hdanton@sina.com>
Cc: Jirka Hladky <jhladky@redhat.com>, Phil Auld <pauld@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Douglas Shakshober <dshaks@redhat.com>,
	Waiman Long <longman@redhat.com>, Joe Mario <jmario@redhat.com>,
	Bill Gray <bgray@redhat.com>,
	"aokuliar@redhat.com" <aokuliar@redhat.com>,
	"kkolakow@redhat.com" <kkolakow@redhat.com>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6
Date: Thu, 21 May 2020 17:04:08 +0100	[thread overview]
Message-ID: <20200521160408.GC7167@techsingularity.net> (raw)
In-Reply-To: <20200521140931.15232-1-hdanton@sina.com>

On Thu, May 21, 2020 at 10:09:31PM +0800, Hillf Danton wrote:
> > I'm ignoring the coding style of c++ comments but minimally that should
> > be fixed. More importantly, node_type can be one of node_overloaded,
> > node_has_spare or node_fully busy and this is checking if there is a
> > mismatch. However, it's not taking into account whether the dst_node
> > is more or less loaded than the src and does not appear to be actually
> > checking spare capacity like the comment suggests.
> > 
>
> Type other than node_has_spare is not considered because node is not
> possible to be so idle that two communicating tasks would better
> "stay local."
> 

You hardcode an imbalance of 2 at the start without computing any
imbalance. Then if the source is fully_busy or overloaded while the
dst is idle, a task can move but that is based on an imaginary hard-coded
imbalance of 2. Finally, it appears that that the load balancer and
NUMA balancer are again using separate logic again when one part of the
objective of the series is that the load balancer and NUMA balancer would
not override each other. As the imbalance is never computed, the patch
can create one which the load balancer then overrides. What am I missing?

That said, it does make a certain amount of sense that if the dst has
spare capacity and the src is fully busy or overloaded then allow the
migration regardless of imbalance but that would be a different patch --
something like this but I didn't think too deeply or test it.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..97c0e090e161 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1916,19 +1916,27 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 	 * imbalance that would be overruled by the load balancer.
 	 */
 	if (env->dst_stats.node_type == node_has_spare) {
-		unsigned int imbalance;
+		unsigned int imbalance = 0;
 		int src_running, dst_running;
 
 		/*
-		 * Would movement cause an imbalance? Note that if src has
-		 * more running tasks that the imbalance is ignored as the
-		 * move improves the imbalance from the perspective of the
-		 * CPU load balancer.
-		 * */
-		src_running = env->src_stats.nr_running - 1;
-		dst_running = env->dst_stats.nr_running + 1;
-		imbalance = max(0, dst_running - src_running);
-		imbalance = adjust_numa_imbalance(imbalance, src_running);
+		 * Check the imbalance if both src and dst have spare
+		 * capacity. If src is fully_busy or overloaded then
+		 * allow the task to move as it's both improving locality
+		 * and reducing an imbalance.
+		 */
+		if (env->src_stats.node_type == node_has_spare) {
+			/*
+			 * Would movement cause an imbalance? Note that if src
+			 * has more running tasks that the imbalance is
+			 * ignored as the move improves the imbalance from the
+			 * perspective of the CPU load balancer.
+			 */
+			src_running = env->src_stats.nr_running - 1;
+			dst_running = env->dst_stats.nr_running + 1;
+			imbalance = max(0, dst_running - src_running);
+			imbalance = adjust_numa_imbalance(imbalance, src_running);
+		}
 
 		/* Use idle CPU if there is no imbalance */
 		if (!imbalance) {

> > Then there is this part
> > 
> > +               imbalance = adjust_numa_imbalance(imbalance,
> > +                                               env->src_stats.nr_running);
> > +
> > +               //Do nothing without imbalance
> > +               if (!imbalance) {
> > +                       imbalance = 2;
> > +                       goto check_imb;
> > +               }
> > 
> > So... if there is no imbalance, assume there is an imbalance of 2, jump to
> > a branch that will always be false and fall through to code that ignores
> > the value of imbalance ...... it's hard to see exactly why that code flow
> > is ideal.
> > 
> With spare capacity be certain, then lets see how idle the node is.

But you no longer check how idle it is or what it's relative idleness of
the destination is relative to the source. adjust_numa_imbalance does
not calculate an imbalance, it only decides whether it should be
ignored.

> And I
> can not do that without the tool you created, He bless you. Although not
> sure they're two tasks talking to each other. This is another topic in the
> coming days.
> 

There is insufficient context in this path to determine if two tasks are
communicating. In some cases it may be inferred from last_wakee but that
only works for two tasks, it doesn't work for 1:n waker:wakees such as
might happen with a producer/consumer pattern. I guess you could record
the time that tasks migrated cross-node due to a wakeup and avoid migrating
those tasks for a period of time by either NUMA or load balancer but that
could cause transient overloads of a domain if there are a lot of wakeups
(e.g. hackbench). It's a change that would need to be done on both the
NUMA and load balancer to force the migration even if there are wakeup
relationships if the domain is fully busy or overloaded.

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2020-05-21 16:04 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24  9:52 [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6 Mel Gorman
2020-02-24  9:52 ` [PATCH 01/13] sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression Mel Gorman
2020-02-24  9:52 ` [PATCH 02/13] sched/numa: Trace when no candidate CPU was found on the preferred node Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 03/13] sched/numa: Distinguish between the different task_numa_migrate failure cases Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] sched/numa: Distinguish between the different task_numa_migrate() " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 04/13] sched/fair: Reorder enqueue/dequeue_task_fair path Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 05/13] sched/numa: Replace runnable_load_avg by load_avg Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 06/13] sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 07/13] sched/pelt: Remove unused runnable load average Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 08/13] sched/pelt: Add a new runnable average signal Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24 16:01     ` Valentin Schneider
2020-02-24 16:34       ` Mel Gorman
2020-02-25  8:23       ` Vincent Guittot
2020-02-24  9:52 ` [PATCH 09/13] sched/fair: Take into account runnable_avg to classify group Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2020-02-24  9:52 ` [PATCH 10/13] sched/numa: Prefer using an idle cpu as a migration target instead of comparing tasks Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] sched/numa: Prefer using an idle CPU " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 11/13] sched/numa: Find an alternative idle CPU if the CPU is part of an active NUMA balance Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 12/13] sched/numa: Bias swapping tasks based on their preferred node Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24  9:52 ` [PATCH 13/13] sched/numa: Stop an exhastive search if a reasonable swap candidate or idle CPU is found Mel Gorman
2020-02-24 15:20   ` [tip: sched/core] " tip-bot2 for Mel Gorman
2020-02-24 15:16 ` [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6 Ingo Molnar
2020-02-25 11:59   ` Mel Gorman
2020-02-25 13:28     ` Vincent Guittot
2020-02-25 14:24       ` Mel Gorman
2020-02-25 14:53         ` Vincent Guittot
2020-02-27  9:09         ` Ingo Molnar
2020-03-09 19:12 ` Phil Auld
2020-03-09 20:36   ` Mel Gorman
2020-03-12  9:54     ` Mel Gorman
2020-03-12 12:17       ` Jirka Hladky
     [not found]       ` <CAE4VaGA4q4_qfC5qe3zaLRfiJhvMaSb2WADgOcQeTwmPvNat+A@mail.gmail.com>
2020-03-12 15:56         ` Mel Gorman
2020-03-12 17:06           ` Jirka Hladky
     [not found]           ` <CAE4VaGD8DUEi6JnKd8vrqUL_8HZXnNyHMoK2D+1-F5wo+5Z53Q@mail.gmail.com>
2020-03-12 21:47             ` Mel Gorman
2020-03-12 22:24               ` Jirka Hladky
2020-03-20 15:08                 ` Jirka Hladky
     [not found]                 ` <CAE4VaGC09OfU2zXeq2yp_N0zXMbTku5ETz0KEocGi-RSiKXv-w@mail.gmail.com>
2020-03-20 15:22                   ` Mel Gorman
2020-03-20 15:33                     ` Jirka Hladky
     [not found]                     ` <CAE4VaGBGbTT8dqNyLWAwuiqL8E+3p1_SqP6XTTV71wNZMjc9Zg@mail.gmail.com>
2020-03-20 16:38                       ` Mel Gorman
2020-03-20 17:21                         ` Jirka Hladky
2020-05-07 15:24                         ` Jirka Hladky
2020-05-07 15:54                           ` Mel Gorman
2020-05-07 16:29                             ` Jirka Hladky
2020-05-07 17:49                               ` Phil Auld
     [not found]                                 ` <20200508034741.13036-1-hdanton@sina.com>
2020-05-18 14:52                                   ` Jirka Hladky
     [not found]                                     ` <20200519043154.10876-1-hdanton@sina.com>
2020-05-20 13:58                                       ` Jirka Hladky
2020-05-20 16:01                                         ` Jirka Hladky
2020-05-21 11:06                                         ` Mel Gorman
     [not found]                                         ` <20200521140931.15232-1-hdanton@sina.com>
2020-05-21 16:04                                           ` Mel Gorman [this message]
     [not found]                                           ` <20200522010950.3336-1-hdanton@sina.com>
2020-05-22 11:05                                             ` Mel Gorman
2020-05-08  9:22                               ` Mel Gorman
2020-05-08 11:05                                 ` Jirka Hladky
     [not found]                                 ` <CAE4VaGC_v6On-YvqdTwAWu3Mq4ofiV0pLov-QpV+QHr_SJr+Rw@mail.gmail.com>
2020-05-13 14:57                                   ` Jirka Hladky
2020-05-13 15:30                                     ` Mel Gorman
2020-05-13 16:20                                       ` Jirka Hladky
2020-05-14  9:50                                         ` Mel Gorman
     [not found]                                           ` <CAE4VaGCGUFOAZ+YHDnmeJ95o4W0j04Yb7EWnf8a43caUQs_WuQ@mail.gmail.com>
2020-05-14 10:08                                             ` Mel Gorman
2020-05-14 10:22                                               ` Jirka Hladky
2020-05-14 11:50                                                 ` Mel Gorman
2020-05-14 13:34                                                   ` Jirka Hladky
2020-05-14 15:31                                       ` Peter Zijlstra
2020-05-15  8:47                                         ` Mel Gorman
2020-05-15 11:17                                           ` Peter Zijlstra
2020-05-15 13:03                                             ` Mel Gorman
2020-05-15 13:12                                               ` Peter Zijlstra
2020-05-15 13:28                                                 ` Peter Zijlstra
2020-05-15 14:24                                             ` Peter Zijlstra
2020-05-21 10:38                                               ` Mel Gorman
2020-05-21 11:41                                                 ` Peter Zijlstra
2020-05-22 13:28                                                   ` Mel Gorman
2020-05-22 14:38                                                     ` Peter Zijlstra
2020-05-15 11:28                                           ` Peter Zijlstra
2020-05-15 12:22                                             ` Mel Gorman
2020-05-15 12:51                                               ` Peter Zijlstra
2020-05-15 14:43                                       ` Jirka Hladky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200521160408.GC7167@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=aokuliar@redhat.com \
    --cc=bgray@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dshaks@redhat.com \
    --cc=hdanton@sina.com \
    --cc=jhladky@redhat.com \
    --cc=jmario@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=kkolakow@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@kernel.org \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.