linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Michael Neuling <mikey@neuling.org>,
	Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	Parth Shah <parth@linux.ibm.com>
Subject: [PATCH 04/10] sched/fair: Prefer idle CPU to cache affinity
Date: Thu, 22 Apr 2021 15:53:20 +0530	[thread overview]
Message-ID: <20210422102326.35889-5-srikar@linux.vnet.ibm.com> (raw)
In-Reply-To: <20210422102326.35889-1-srikar@linux.vnet.ibm.com>

Current order of preference to pick a LLC while waking a wake-affine
task:
1. Between the waker CPU and previous CPU, prefer the LLC of the CPU
   that is idle.

2. Between the waker CPU and previous CPU, prefer the LLC of the CPU
   that is less lightly loaded.

In the current situation where waker and previous CPUs are busy, but
only one of its LLC has an idle CPU, Scheduler may end up picking a LLC
with no idle CPUs. To mitigate this, add a method where Scheduler
compares idle CPUs in waker and previous LLCs and picks the appropriate
one.

The new method looks at idle-core to figure out idle LLC. If there are
no idle LLCs, it compares the ratio of busy CPUs to the total number of
CPUs in the LLC. This method will only be useful to compare 2 LLCs. If
the previous CPU and the waking CPU are in the same LLC, this method
would not be useful. For now the new method is disabled by default.

Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Parth Shah <parth@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Based on similar posting:
http://lore.kernel.org/lkml/20210226164029.122432-1-srikar@linux.vnet.ibm.com/t/#u
Some comments in the next patch
- Make WA_WAKER default (Suggested by Rik) : done in next patch
- Make WA_WAKER check more conservative: (Suggested by Rik / Peter)
- Rename WA_WAKER to WA_IDLER_LLC (Suggested by Vincent)
- s/pllc_size/tllc_size while checking for busy case: (Pointed by Dietmar)
- Add rcu_read_lock and check for validity of shared domains
- Add idle-core support

 kernel/sched/fair.c     | 64 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/features.h |  1 +
 2 files changed, 65 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09c33cca0349..943621367a96 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5869,6 +5869,67 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 	return this_eff_load < prev_eff_load ? this_cpu : nr_cpumask_bits;
 }
 
+static int wake_affine_idler_llc(struct task_struct *p, int this_cpu, int prev_cpu, int sync)
+{
+#ifdef CONFIG_NO_HZ_COMMON
+	int pnr_busy, pllc_size, tnr_busy, tllc_size;
+#endif
+	struct sched_domain_shared *tsds, *psds;
+	int diff;
+
+	tsds = rcu_dereference(per_cpu(sd_llc_shared, this_cpu));
+	psds = rcu_dereference(per_cpu(sd_llc_shared, prev_cpu));
+	if (!tsds || !psds)
+		return nr_cpumask_bits;
+
+	if (sync) {
+		if (available_idle_cpu(this_cpu) || sched_idle_cpu(this_cpu))
+			return this_cpu;
+		if (tsds->idle_core != -1) {
+			if (cpumask_test_cpu(tsds->idle_core, p->cpus_ptr))
+				return tsds->idle_core;
+			return this_cpu;
+		}
+	}
+
+	if (available_idle_cpu(prev_cpu) || sched_idle_cpu(prev_cpu))
+		return prev_cpu;
+	if (psds->idle_core != -1) {
+		if (cpumask_test_cpu(psds->idle_core, p->cpus_ptr))
+			return psds->idle_core;
+		return prev_cpu;
+	}
+
+	if (!sync) {
+		if (available_idle_cpu(this_cpu) || sched_idle_cpu(this_cpu))
+			return this_cpu;
+		if (tsds->idle_core != -1) {
+			if (cpumask_test_cpu(tsds->idle_core, p->cpus_ptr))
+				return tsds->idle_core;
+			return this_cpu;
+		}
+	}
+
+#ifdef CONFIG_NO_HZ_COMMON
+	tnr_busy = atomic_read(&tsds->nr_busy_cpus);
+	pnr_busy = atomic_read(&psds->nr_busy_cpus);
+
+	tllc_size = per_cpu(sd_llc_size, this_cpu);
+	pllc_size = per_cpu(sd_llc_size, prev_cpu);
+
+	if (pnr_busy == pllc_size && tnr_busy == tllc_size)
+		return nr_cpumask_bits;
+
+	diff = pnr_busy * tllc_size - tnr_busy * pllc_size;
+	if (diff > 0)
+		return this_cpu;
+	if (diff < 0)
+		return prev_cpu;
+#endif /* CONFIG_NO_HZ_COMMON */
+
+	return nr_cpumask_bits;
+}
+
 static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		       int this_cpu, int prev_cpu, int sync)
 {
@@ -5877,6 +5938,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 	if (sched_feat(WA_IDLE))
 		target = wake_affine_idle(this_cpu, prev_cpu, sync);
 
+	if (sched_feat(WA_IDLER_LLC) && target == nr_cpumask_bits)
+		target = wake_affine_idler_llc(p, this_cpu, prev_cpu, sync);
+
 	if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
 		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
 
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 1bc2b158fc51..c77349a47e01 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -83,6 +83,7 @@ SCHED_FEAT(ATTACH_AGE_LOAD, true)
 
 SCHED_FEAT(WA_IDLE, true)
 SCHED_FEAT(WA_WEIGHT, true)
+SCHED_FEAT(WA_IDLER_LLC, false)
 SCHED_FEAT(WA_BIAS, true)
 
 /*
-- 
2.18.2


  parent reply	other threads:[~2021-04-22 10:24 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22 10:23 [PATCH 00/10] sched/fair: wake_affine improvements Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 01/10] sched/fair: Update affine statistics when needed Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 02/10] sched/fair: Maintain the identity of idle-core Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 03/10] sched/fair: Update idle-core more often Srikar Dronamraju
2021-04-22 10:23 ` Srikar Dronamraju [this message]
2021-04-22 10:23 ` [PATCH 05/10] sched/fair: Call wake_affine only if necessary Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 06/10] sched/idle: Move busy_cpu accounting to idle callback Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 07/10] sched/fair: Remove ifdefs in waker_affine_idler_llc Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 08/10] sched/fair: Dont iterate if no idle CPUs Srikar Dronamraju
2021-04-22 10:23 ` [PATCH 09/10] sched/topology: Introduce fallback LLC Srikar Dronamraju
2021-04-22 15:10   ` kernel test robot
2021-04-22 10:23 ` [PATCH 10/10] powerpc/smp: Add fallback flag to powerpc MC domain Srikar Dronamraju
2021-04-23  8:25 ` [PATCH 00/10] sched/fair: wake_affine improvements Mel Gorman
2021-04-23 10:31   ` Srikar Dronamraju
2021-04-23 12:38     ` Mel Gorman
2021-04-26 10:30       ` Srikar Dronamraju
2021-04-26 11:35         ` Mel Gorman
2021-04-26 10:39   ` Srikar Dronamraju
2021-04-26 11:41     ` Mel Gorman
2021-04-28 12:57       ` Srikar Dronamraju
2021-04-27 14:52 ` Vincent Guittot
2021-04-28 12:49   ` Srikar Dronamraju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210422102326.35889-5-srikar@linux.vnet.ibm.com \
    --to=srikar@linux.vnet.ibm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=ego@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mikey@neuling.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=parth@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=tglx@linutronix.de \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).