From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>, Paul Turner <pjt@google.com>
Subject: Re: sched: Avoid SMT siblings in select_idle_sibling() if possible
Date: Mon, 5 Mar 2012 20:54:44 +0530	[thread overview]
Message-ID: <20120305152443.GE26559@linux.vnet.ibm.com> (raw)
In-Reply-To: <1329764866.2293.376.camhel@twins>

* Peter Zijlstra <peterz@infradead.org> [2012-02-20 20:07:46]:

> On Mon, 2012-02-20 at 19:14 +0100, Mike Galbraith wrote:
> > Enabling SD_BALANCE_WAKE used to be decidedly too expensive to consider.
> > Maybe that has changed, but I doubt it.
> 
> Right, I thought I remembered some such, you could see it on wakeup
> heavy things like pipe-bench and that java msg passing thing, right?

I did some experiments with volanomark and it does turn out to be
sensitive to SD_BALANCE_WAKE, while the other wake-heavy benchmark that I am
dealing with (Trade) benefits from it.

Normalized results for both benchmarks are provided below.

Machine : 2 x quad-core Intel X5570 CPUs (H/T enabled)
Kernel  : tip (HEAD at b86148a) 

			Before patch	After patch

Trade thr'put		1		2.17 (~117% improvement)
volanomark		1		0.8  (20% degradation)


Quick description of benchmarks 
===============================

Trade was run inside an 8-vcpu VM (cgroup). Four other 4-vcpu VMs running
cpu hogs were also present, leading to this cgroup setup:

	/cgroup/sys (1024 shares - hosts all system tasks)
	/cgroup/libvirt (20000 shares)
	/cgroup/libvirt/qemu/VM1 (8192 cpu shares)
	/cgroup/libvirt/qemu/VM2-5 (1024 shares)

Volanomark server/client programs were run in root cgroup.
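
To make the share values above concrete, here is a rough sketch of how
they would translate into CPU-time fractions among sibling cgroups. It
assumes every sibling group is continuously runnable and that CPU time
is split in proportion to shares; it ignores tasks sitting directly in
the root cgroup. share_percent() is a hypothetical helper written for
this illustration, not a kernel or libcgroup function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Percentage (rounded down) of the parent's CPU time that sibling
 * 'idx' would receive, assuming all 'n' siblings are always runnable
 * and time is divided in proportion to their shares.
 */
static unsigned long share_percent(const unsigned long *shares, size_t n,
				   size_t idx)
{
	unsigned long total = 0;

	assert(shares != NULL && idx < n);

	for (size_t i = 0; i < n; i++)
		total += shares[i];
	assert(total > 0);

	return shares[idx] * 100 / total;
}
```

Under those assumptions, with the five VM groups under
/cgroup/libvirt/qemu given as { 8192, 1024, 1024, 1024, 1024 }, VM1
would be entitled to roughly two thirds of the time granted to the VMs
and each hog VM to roughly one twelfth.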

The patch essentially enables balance-on-wake: it looks for any idle cpu in
the same cache domain as the task's prev_cpu (or cur_cpu if wake_affine
obliges) and, failing that, falls back to the least loaded cpu. This helps
minimize latencies for the Trade workload (and thus boosts its score). For
volanomark, it seems to hurt because tasks wake up on a colder L2 cache. The
tradeoff seems to be between latency and cache misses. Short of adding
another tunable, are there better suggestions on how we can address this
sort of tradeoff?
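
The selection order described above can be sketched as a small
user-space model. This is a simplification, not the kernel code: struct
cpu_state and pick_wake_cpu() are hypothetical names, the CPUs passed
in are assumed to share one cache domain and to be laid out as
consecutive SMT sibling groups, and the "least loaded" fallback stands
in for the slow path that the patch falls through to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; not a kernel structure. */
struct cpu_state {
	bool idle;          /* nothing runnable on this CPU */
	unsigned long load; /* illustrative load figure */
	bool allowed;       /* task's affinity mask permits this CPU */
};

/*
 * Pick a wakeup CPU among 'ncpus' CPUs sharing a cache, grouped into
 * consecutive SMT sibling groups of 'group_size' CPUs each.
 * Returns a CPU index, or -1 if the task is allowed on none of them.
 */
static int pick_wake_cpu(const struct cpu_state *cpus, int ncpus,
			 int group_size)
{
	int some_idle_cpu = -1;
	int least_loaded = -1;

	assert(cpus != NULL && ncpus > 0 && group_size > 0);

	for (int base = 0; base < ncpus; base += group_size) {
		int first_allowed = -1;
		bool all_idle = true;

		for (int i = base; i < base + group_size && i < ncpus; i++) {
			if (!cpus[i].allowed)
				continue;
			if (first_allowed < 0)
				first_allowed = i;
			if (cpus[i].idle) {
				/* Remember the first idle CPU seen. */
				if (some_idle_cpu < 0)
					some_idle_cpu = i;
			} else {
				all_idle = false;
			}
			if (least_loaded < 0 ||
			    cpus[i].load < cpus[least_loaded].load)
				least_loaded = i;
		}

		/* 1: a group whose allowed CPUs are all idle wins outright. */
		if (first_allowed >= 0 && all_idle)
			return first_allowed;
	}

	/* 2: otherwise any idle CPU; 3: else the least loaded allowed CPU. */
	return some_idle_cpu >= 0 ? some_idle_cpu : least_loaded;
}
```

Preferring a fully idle sibling group first keeps the woken task off a
core whose other hardware thread is busy; the later fallbacks trade
cache warmth for lower wakeup latency, which is the tradeoff in
question.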

Not-yet-Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

---
 include/linux/topology.h |    4 ++--
 kernel/sched/fair.c      |   26 +++++++++++++++++++++-----
 2 files changed, 23 insertions(+), 7 deletions(-)

Index: current/include/linux/topology.h
===================================================================
--- current.orig/include/linux/topology.h
+++ current/include/linux/topology.h
@@ -96,7 +96,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_NEWIDLE			\
 				| 1*SD_BALANCE_EXEC			\
 				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
+				| 1*SD_BALANCE_WAKE			\
 				| 1*SD_WAKE_AFFINE			\
 				| 1*SD_SHARE_CPUPOWER			\
 				| 0*SD_POWERSAVINGS_BALANCE		\
@@ -129,7 +129,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_NEWIDLE			\
 				| 1*SD_BALANCE_EXEC			\
 				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
+				| 1*SD_BALANCE_WAKE			\
 				| 1*SD_WAKE_AFFINE			\
 				| 0*SD_PREFER_LOCAL			\
 				| 0*SD_SHARE_CPUPOWER			\
Index: current/kernel/sched/fair.c
===================================================================
--- current.orig/kernel/sched/fair.c
+++ current/kernel/sched/fair.c
@@ -2638,7 +2638,7 @@ static int select_idle_sibling(struct ta
 	int prev_cpu = task_cpu(p);
 	struct sched_domain *sd;
 	struct sched_group *sg;
-	int i;
+	int i, some_idle_cpu = -1;
 
 	/*
 	 * If the task is going to be woken-up on this cpu and if it is
@@ -2661,15 +2661,25 @@ static int select_idle_sibling(struct ta
 	for_each_lower_domain(sd) {
 		sg = sd->groups;
 		do {
+			int skip = 0;
+
 			if (!cpumask_intersects(sched_group_cpus(sg),
 						tsk_cpus_allowed(p)))
 				goto next;
 
-			for_each_cpu(i, sched_group_cpus(sg)) {
-				if (!idle_cpu(i))
-					goto next;
+			for_each_cpu_and(i, sched_group_cpus(sg),
+							 tsk_cpus_allowed(p)) {
+				if (!idle_cpu(i)) {
+					if (some_idle_cpu >= 0)
+						goto next;
+					skip = 1;
+				} else
+					some_idle_cpu = i;
 			}
 
+			if (skip)
+				goto next;
+
 			target = cpumask_first_and(sched_group_cpus(sg),
 					tsk_cpus_allowed(p));
 			goto done;
@@ -2677,6 +2687,9 @@ next:
 			sg = sg->next;
 		} while (sg != sd->groups);
 	}
+
+	if (some_idle_cpu >= 0)
+		target = some_idle_cpu;
 done:
 	return target;
 }
@@ -2766,7 +2779,10 @@ select_task_rq_fair(struct task_struct *
 			prev_cpu = cpu;
 
 		new_cpu = select_idle_sibling(p, prev_cpu);
-		goto unlock;
+		if (idle_cpu(new_cpu))
+			goto unlock;
+		sd = rcu_dereference(per_cpu(sd_llc, prev_cpu));
+		cpu = prev_cpu;
 	}
 
 	while (sd) {