From: Suresh Siddha <suresh.b.siddha@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>, Paul Turner <pjt@google.com>,
Mike Galbraith <efault@gmx.de>
Subject: Re: sched: Avoid SMT siblings in select_idle_sibling() if possible
Date: Tue, 15 Nov 2011 17:14:22 -0800
Message-ID: <1321406062.16760.60.camel@sbsiddha-desk.sc.intel.com>
In-Reply-To: <1321350377.1421.55.camel@twins>

On Tue, 2011-11-15 at 01:46 -0800, Peter Zijlstra wrote:
> @@ -2346,25 +2347,38 @@ static int select_idle_sibling(struct ta
> * Otherwise, iterate the domains and find an elegible idle cpu.
> */
> rcu_read_lock();
> +again:
> for_each_domain(target, sd) {
> - if (!(sd->flags & SD_SHARE_PKG_RESOURCES))
> - break;
> + if (!smt && (sd->flags & SD_SHARE_CPUPOWER))
> + continue;
>
> - for_each_cpu_and(i, sched_domain_span(sd), tsk_cpus_allowed(p)) {
> - if (idle_cpu(i)) {
> - target = i;
> - break;
> + if (!(sd->flags & SD_SHARE_PKG_RESOURCES)) {
> + if (!smt) {
> + smt = 1;
> + goto again;
> }
> + break;
> }
It looks like you will end up checking the core domain twice (once with
smt == 0 and again with smt == 1) if there are no idle siblings.
How about this patch, which is more self-explanatory?
---
Keep select_idle_sibling() from picking a sibling thread if there's
an idle core that shares cache.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
kernel/sched.c | 2 +
kernel/sched_fair.c | 54 +++++++++++++++++++++++++++++++++++---------------
2 files changed, 40 insertions(+), 16 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 0e9344a..4b0bc6a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -734,6 +734,8 @@ static inline int cpu_of(struct rq *rq)
#define for_each_domain(cpu, __sd) \
for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); __sd; __sd = __sd->parent)
+#define for_each_lower_domain(sd) for (; sd; sd = sd->child)
+
#define cpu_rq(cpu) (&per_cpu(runqueues, (cpu)))
#define this_rq() (&__get_cpu_var(runqueues))
#define task_rq(p) cpu_rq(task_cpu(p))
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5c9e679..cb7a5ef 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2241,6 +2241,25 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
return idlest;
}
+/**
+ * highest_flag_domain - Return highest sched_domain containing flag.
+ * @cpu: The cpu whose highest level of sched domain is to
+ * be returned.
+ * @flag: The flag to check for the highest sched_domain
+ * for the given cpu.
+ *
+ * Returns the highest sched_domain of a cpu which contains the given flag.
+ */
+static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
+{
+ struct sched_domain *sd;
+
+ for_each_domain(cpu, sd)
+ if (!(sd->flags & flag))
+ return sd->child;
+ return NULL;
+}
+
/*
* Try and locate an idle CPU in the sched_domain.
*/
@@ -2249,6 +2268,7 @@ static int select_idle_sibling(struct task_struct *p, int target)
int cpu = smp_processor_id();
int prev_cpu = task_cpu(p);
struct sched_domain *sd;
+ struct sched_group *sg;
int i;
/*
@@ -2269,25 +2289,27 @@ static int select_idle_sibling(struct task_struct *p, int target)
* Otherwise, iterate the domains and find an elegible idle cpu.
*/
rcu_read_lock();
- for_each_domain(target, sd) {
- if (!(sd->flags & SD_SHARE_PKG_RESOURCES))
- break;
+ sd = highest_flag_domain(target, SD_SHARE_PKG_RESOURCES);
+ for_each_lower_domain(sd) {
+ sg = sd->groups;
+ do {
+ if (!cpumask_intersects(sched_group_cpus(sg),
+ tsk_cpus_allowed(p)))
+ goto next;
- for_each_cpu_and(i, sched_domain_span(sd), tsk_cpus_allowed(p)) {
- if (idle_cpu(i)) {
- target = i;
- break;
+ for_each_cpu(i, sched_group_cpus(sg)) {
+ if (!idle_cpu(i))
+ goto next;
}
- }
- /*
- * Lets stop looking for an idle sibling when we reached
- * the domain that spans the current cpu and prev_cpu.
- */
- if (cpumask_test_cpu(cpu, sched_domain_span(sd)) &&
- cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
- break;
+ target = cpumask_first_and(sched_group_cpus(sg),
+ tsk_cpus_allowed(p));
+ goto done;
+next:
+ sg = sg->next;
+ } while (sg != sd->groups);
}
+done:
rcu_read_unlock();
return target;