From: Rik van Riel <riel@surriel.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>
Subject: [PATCH v2] sched/fair: bring back select_idle_smt, but differently
Date: Sun, 21 Mar 2021 15:03:58 -0400
Message-ID: <20210321150358.71ef52b1@imladris.surriel.com>

    Mel Gorman did some nice work in 9fe1f127b913
    ("sched/fair: Merge select_idle_core/cpu()"), resulting in the kernel
    being more efficient at finding an idle CPU, and in tasks spending less
    time waiting to be run, both according to the schedstats run_delay
    numbers and according to measured application latencies. Yay.
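
    For reference, the run_delay number referred to above is the second
    field of /proc/<pid>/schedstat on kernels with schedstat support
    (CONFIG_SCHEDSTATS): time on CPU, time spent waiting on a runqueue,
    and the number of timeslices run, the first two in nanoseconds. A
    minimal userspace sketch for sampling it (an illustration, not part
    of the patch):

    #include <stdio.h>

    int main(int argc, char **argv)
    {
    	unsigned long long on_cpu, run_delay, slices;
    	char path[64];
    	FILE *f;

    	/* Default to the current task if no pid is given. */
    	snprintf(path, sizeof(path), "/proc/%s/schedstat",
    		 argc > 1 ? argv[1] : "self");
    	f = fopen(path, "r");
    	if (!f || fscanf(f, "%llu %llu %llu",
    			 &on_cpu, &run_delay, &slices) != 3) {
    		perror(path);
    		return 1;
    	}
    	fclose(f);
    	printf("on_cpu %llu ns, run_delay %llu ns, slices %llu\n",
    	       on_cpu, run_delay, slices);
    	return 0;
    }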
    
    The flip side of this is that we see more task migrations (about
    30% more), more cache misses, higher memory bandwidth utilization,
    and higher CPU use, for the same number of requests/second.
    
    This is most pronounced on a memcache type workload, which saw
    a consistent 1-3% increase in total CPU use on the system, due
    to those increased task migrations leading to higher L2 cache
    miss numbers, and higher memory utilization. The exclusive L3
    cache on Skylake does us no favors there.
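
    The migration count itself is cheap to watch. One sketch, using the
    same software counter that "perf stat -e cpu-migrations" reads (an
    illustration, not how the numbers above were collected):

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
    	struct perf_event_attr attr;
    	long long count;
    	int fd;

    	memset(&attr, 0, sizeof(attr));
    	attr.size = sizeof(attr);
    	attr.type = PERF_TYPE_SOFTWARE;
    	attr.config = PERF_COUNT_SW_CPU_MIGRATIONS;

    	/* pid 0, cpu -1: count migrations of the calling task. */
    	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    	if (fd < 0) {
    		perror("perf_event_open");
    		return 1;
    	}

    	/* Stand-in workload: burn CPU so migrations can occur. */
    	for (volatile unsigned long i = 0; i < 1000000000UL; i++)
    		;

    	if (read(fd, &count, sizeof(count)) == sizeof(count))
    		printf("cpu-migrations: %lld\n", count);
    	close(fd);
    	return 0;
    }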
    
    On our web serving workload, that effect is usually negligible,
    but occasionally as large as a 9% regression in the number of
    requests served, due to some interaction between scheduler latency
    and the web load balancer.
    
    It appears that the increased number of CPU migrations is generally
    a good thing, since it leads to lower cpu_delay numbers, reflecting
    the fact that tasks get to run faster. However, the reduced locality
    and the corresponding increase in L2 cache misses hurts a little.
    
    The patch below appears to fix the regression, while keeping the
    benefit of the lower cpu_delay numbers, by reintroducing select_idle_smt
    with a twist: when a socket has no idle cores, check to see if the
    sibling of "prev" is idle, before searching all the other CPUs.
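
    To make that search order concrete, here is a userspace toy model
    (eight CPUs, with cpu i's SMT sibling taken to be i ^ 1; the real
    logic is in the diff below):

    #include <stdbool.h>
    #include <stdio.h>

    #define NR_CPUS 8

    static bool idle[NR_CPUS];

    /* A core is idle only if both of its siblings are. */
    static bool core_idle(int cpu)
    {
    	return idle[cpu] && idle[cpu ^ 1];
    }

    static int pick_idle_cpu(int prev, int target)
    {
    	bool have_idle_core = false;
    	int i, cpu;

    	for (cpu = 0; cpu < NR_CPUS; cpu++)
    		if (core_idle(cpu))
    			have_idle_core = true;

    	/* The twist: no idle core anywhere, so check the sibling
    	 * of prev before scanning everything else. */
    	if (!have_idle_core && idle[prev ^ 1])
    		return prev ^ 1;

    	/* Fallback: wrap-around scan starting at target. */
    	for (i = 0; i < NR_CPUS; i++) {
    		cpu = (target + i) % NR_CPUS;
    		if (idle[cpu])
    			return cpu;
    	}
    	return -1;
    }

    int main(void)
    {
    	idle[3] = idle[6] = true;	/* no fully idle core */
    	/* A plain scan from target would pick CPU 6; the sibling
    	 * check sends the task to CPU 3, next to prev. */
    	printf("picked CPU %d\n", pick_idle_cpu(2, 4));
    	return 0;
    }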
    
    This fixes both the occasional 9% regression on the web serving
    workload, and the continuous 2% CPU use regression on the memcache
    type workload.
    
    With Mel's patches and this patch together, the p95 and p99 response
    times for the memcache type application improve by about 20% over what
    they were before Mel's patches got merged.
    
    Signed-off-by: Rik van Riel <riel@surriel.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb945f8..0c986972f4cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6098,6 +6098,28 @@ static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 	return -1;
 }
 
+/*
+ * Scan the local SMT mask for idle CPUs.
+ */
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	int cpu;
+
+	if (!static_branch_likely(&sched_smt_present))
+		return -1;
+
+	for_each_cpu(cpu, cpu_smt_mask(target)) {
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
+			continue;
+		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+			return cpu;
+	}
+
+	return -1;
+}
+
 #else /* CONFIG_SCHED_SMT */
 
 static inline void set_idle_cores(int cpu, int val)
@@ -6114,6 +6136,11 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 	return __select_idle_cpu(core);
 }
 
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
+{
+	return -1;
+}
+
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6121,7 +6148,7 @@ static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
  * average idle time for this rq (as found in rq->avg_idle).
  */
-static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int prev, int target)
 {
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
@@ -6155,6 +6182,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 		time = cpu_clock(this);
 	}
 
+	if (!smt && cpus_share_cache(prev, target)) {
+		/* No idle core. Check if prev has an idle sibling. */
+		i = select_idle_smt(p, sd, prev);
+		if ((unsigned int)i < nr_cpumask_bits)
+			return i;
+	}
+
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (smt) {
 			i = select_idle_core(p, cpu, cpus, &idle_cpu);
@@ -6307,7 +6341,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (!sd)
 		return target;
 
-	i = select_idle_cpu(p, sd, target);
+	i = select_idle_cpu(p, sd, prev, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 

