From: Ingo Molnar <mingo@kernel.org>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@techsingularity.net>,
Rik van Riel <riel@surriel.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 5/6] sched/numa: Reset scan rate whenever task moves across nodes
Date: Mon, 10 Sep 2018 10:48:08 +0200
Message-ID: <20180910084808.GE48257@gmail.com>
In-Reply-To: <1533276841-16341-6-git-send-email-srikar@linux.vnet.ibm.com>
* Srikar Dronamraju <srikar@linux.vnet.ibm.com> wrote:
> Currently the task scan rate is reset when the NUMA balancer migrates the
> task to a different node. If the NUMA balancer initiates a swap, the reset
> applies only to the task that initiates the swap. Similarly, no scan rate
> reset is done if the task is migrated across nodes by the traditional load
> balancer.
>
> Instead, move the scan reset to migrate_task_rq. This ensures that a
> task moved out of its preferred node either gets back to its preferred
> node quickly or finds a new preferred node. Doing so is fair to
> all tasks migrating across nodes.
>
> specjbb2005 / bops/JVM / higher bops are better
> on 2 Socket/2 Node Intel
> JVMS  Prev    Current  %Change
> 4     210118  208862   -0.597759
> 1     313171  307007   -1.96825
>
>
> on 2 Socket/4 Node Power8 (PowerNV)
> JVMS  Prev     Current  %Change
> 8     91027.5  89911.4  -1.22611
> 1     216460   216176   -0.131202
>
>
> on 2 Socket/2 Node Power9 (PowerNV)
> JVMS  Prev    Current  %Change
> 4     191918  196078   2.16759
> 1     207043  214664   3.68088
>
>
> on 4 Socket/4 Node Power7
> JVMS  Prev     Current  %Change
> 8     58462.1  60719.2  3.86079
> 1     108334   112615   3.95167
>
>
> dbench / transactions / higher numbers are better
> on 2 Socket/2 Node Intel
> count  Min      Max      Avg      Variance  %Change
> 5      11851.8  11937.3  11890.9  33.5169
> 5      12511.7  12559.4  12539.5  15.5883   5.45459
>
>
> on 2 Socket/4 Node Power8 (PowerNV)
> count  Min      Max      Avg      Variance  %Change
> 5      4791     5016.08  4962.55  85.9625
> 5      4709.28  4979.28  4919.32  105.126   -0.871125
>
>
> on 2 Socket/2 Node Power9 (PowerNV)
> count  Min      Max      Avg     Variance  %Change
> 5      9353.43  9380.49  9369.6  9.04361
> 5      9388.38  9406.29  9395.1  5.98959   0.272157
>
>
> on 4 Socket/4 Node Power7
> count  Min      Max      Avg      Variance  %Change
> 5      149.518  215.412  179.083  21.5903
> 5      157.71   184.929  174.754  10.7275   -2.41731
>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
> kernel/sched/fair.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a5936ed..4ea0eff 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1837,12 +1837,6 @@ static int task_numa_migrate(struct task_struct *p)
>  	if (env.best_cpu == -1)
>  		return -EAGAIN;
>
> -	/*
> -	 * Reset the scan period if the task is being rescheduled on an
> -	 * alternative node to recheck if the tasks is now properly placed.
> -	 */
> -	p->numa_scan_period = task_scan_start(p);
> -
>  	best_rq = cpu_rq(env.best_cpu);
>  	if (env.best_task == NULL) {
>  		ret = migrate_task_to(p, env.best_cpu);
> @@ -6361,6 +6355,19 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu __maybe_unus
>
>  	/* We have migrated, no longer consider this task hot */
>  	p->se.exec_start = 0;
> +
> +#ifdef CONFIG_NUMA_BALANCING
> +	if (!p->mm || (p->flags & PF_EXITING))
> +		return;
> +
> +	if (p->numa_faults) {
> +		int src_nid = cpu_to_node(task_cpu(p));
> +		int dst_nid = cpu_to_node(new_cpu);
> +
> +		if (src_nid != dst_nid)
> +			p->numa_scan_period = task_scan_start(p);
> +	}
> +#endif

Please don't add #ifdeffery inside functions, especially not if they do weird flow control like
a 'return' from the middle of a block.

A properly named inline helper would work I suppose.

Thanks,

	Ingo
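
One possible reading of the inline-helper suggestion, as a sketch only. The helper name update_scan_period() is hypothetical and is not taken from the patch. So that the example compiles outside the kernel tree, struct task_struct, task_cpu(), cpu_to_node(), task_scan_start(), PF_EXITING and the 4-CPUs-per-node topology are all userspace mocks; only the logic inside the helper mirrors the quoted hunk.

```c
/* Userspace mock: every type and helper below stands in for a kernel one. */
#include <stddef.h>

#define CONFIG_NUMA_BALANCING 1	/* assumption: feature enabled for this sketch */
#define PF_EXITING 0x00000004	/* mock flag value, illustration only */
#define CPUS_PER_NODE 4		/* hypothetical topology */

struct task_struct {
	void *mm;			/* NULL for kernel threads */
	unsigned int flags;
	unsigned long *numa_faults;	/* NULL until NUMA stats are allocated */
	unsigned int numa_scan_period;
	int cpu;
};

/* Mock accessors; the real ones live in the scheduler and topology code. */
static int task_cpu(const struct task_struct *p) { return p->cpu; }
static int cpu_to_node(int cpu) { return cpu / CPUS_PER_NODE; }
static unsigned int task_scan_start(struct task_struct *p) { (void)p; return 1000; }

#ifdef CONFIG_NUMA_BALANCING
/* Hypothetical helper: reset the scan period when a task changes node. */
static void update_scan_period(struct task_struct *p, int new_cpu)
{
	if (!p->mm || (p->flags & PF_EXITING))
		return;

	if (!p->numa_faults)
		return;

	if (cpu_to_node(task_cpu(p)) != cpu_to_node(new_cpu))
		p->numa_scan_period = task_scan_start(p);
}
#else
/* Empty stub so the caller needs no #ifdef of its own. */
static inline void update_scan_period(struct task_struct *p, int new_cpu)
{
	(void)p;
	(void)new_cpu;
}
#endif

/* The caller keeps a single straight-line flow with no #ifdef and no early return. */
static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
{
	/* ... existing migration bookkeeping would go here ... */
	update_scan_period(p, new_cpu);
}
```

With this shape the early returns are confined to the helper, and the !CONFIG_NUMA_BALANCING build gets an empty stub, so migrate_task_rq_fair() itself carries no conditional compilation.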