* sched: Performance of Trade workload running inside VM
@ 2012-02-14 11:28 Srivatsa Vaddagiri
  2012-02-15 11:59 ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Srivatsa Vaddagiri @ 2012-02-14 11:28 UTC (permalink / raw)
  To: mingo, a.p.zijlstra, pjt, efault, venki, suresh.b.siddha
  Cc: linux-kernel, Nikunj A. Dadhania

I was investigating a performance issue which appears to be linked to
scheduler in some ways. Before I mention the potential scheduler issue,
here's the benchmark description:

Machine : 2 Intel quad-core CPUs with HT enabled (16 logical CPUs), 48GB RAM
Linux kernel version : tip (HEAD at a80142eb)

cpu cgroups:
	/libvirt/qemu/VM1 (cpu.shares = 8192)
	/libvirt/qemu/VM2 (cpu.shares = 1024)
	/libvirt/qemu/VM3 (cpu.shares = 1024)
	/libvirt/qemu/VM4 (cpu.shares = 1024)
	/libvirt/qemu/VM5 (cpu.shares = 1024)

VM1-5 correspond to virtual machines. VM1 has 8 VCPUs, while each of VM2-5 has
4 VCPUs. VM1 runs the (most important) Trade (OLTP) benchmark, while VM2-5 run
CPU hogs to keep all their VCPUs busy.

A load generator running on the host bombards the Trade server running
inside VM1 with requests and measures throughput along with response times.

			Only VM1 active		All VMs active
		=====================================================
Throughput		33395.083/min		18294.48/min  (-45%)
VM1 CPU utilization	21.4%			13.73%	      (-35%)

In the first case, only VM1 (running the Trade server) is kept active
while VM2-5 are suspended. Here, VM1 consumes 21.4% CPU with a benchmark
score of 33395.083/min.

Next, we activate all VMs (VM2-5 are resumed), which causes the benchmark
score to drop by 45% and VM1's CPU utilization to drop by 35%. This is
despite VM1's 8192 shares entitling it to roughly 67% of the CPU on demand
(8192/(8192 + 4*1024) ≈ 67%). Assigning far more shares to VM1 does not
improve the situation at all.

Examining the execution pattern of VM1 (with help from scheduling
traces) revealed that:

a. VCPU tasks of VM1 sleep and run in short bursts (on the microsecond
   scale), stressing the wakeup path of the scheduler.

b. In the "all VMs active" case, VM1's VCPU tasks were found to incur
   "high" wait times when two of VM1's tasks were scheduled on the same
   CPU (i.e., a VCPU task had to wait behind a sibling VCPU task to
   obtain CPU time). A quick way to sample this wait time is sketched
   below.
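
To quantify (b), here is the sort of throwaway sampler I used (a minimal
sketch, not part of any tree; it assumes a kernel with schedstats enabled,
where the three fields of /proc/<pid>/schedstat are time on cpu (ns),
time waiting on a runqueue (ns) and number of timeslices, per task):

/* Sample the runqueue wait time of a task, e.g. a qemu vcpu thread. */
#include <stdio.h>

int main(int argc, char **argv)
{
	unsigned long long on_cpu, wait, slices;
	char path[64];
	FILE *f;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/proc/%s/schedstat", argv[1]);
	f = fopen(path, "r");
	if (!f || fscanf(f, "%llu %llu %llu", &on_cpu, &wait, &slices) != 3) {
		perror(path);
		return 1;
	}
	fclose(f);

	printf("on-cpu %llu ns, waiting %llu ns over %llu timeslices\n",
	       on_cpu, wait, slices);
	return 0;
}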

Further, enabling SD_BALANCE_WAKE at the SMT and MC domains and disabling
SD_WAKE_AFFINE at all domains (SMT/MC/NODE) improved CPU utilization (and
the benchmark score) quite a bit. CPU utilization of VM1 (with all VMs
active) went up to 17.5%.

This led me to investigate the wakeup code path closely, and in
particular select_idle_sibling(). select_idle_sibling() looks for a core
that is fully idle, failing which it wakes the task on prev_cpu (or
cur_cpu). In particular, it does not go hunting for the least loaded
cpu, which is what SD_BALANCE_WAKE provides.
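
For reference, here is a condensed paraphrase of what
select_idle_sibling() does on this kernel (simplified from
kernel/sched/fair.c -- not the verbatim source, and locking/corner
cases are trimmed):

/* Condensed paraphrase of the 3.3-era select_idle_sibling(). */
static int select_idle_sibling(struct task_struct *p, int target)
{
	struct sched_domain *sd;
	struct sched_group *sg;
	int i;

	/* The waking or previous cpu is fine if it is already idle. */
	if (idle_cpu(target))
		return target;

	/* Otherwise look for a *fully idle* core in the cache domain. */
	sd = rcu_dereference(per_cpu(sd_llc, target));
	for_each_lower_domain(sd) {
		sg = sd->groups;
		do {
			if (!cpumask_intersects(sched_group_cpus(sg),
						tsk_cpus_allowed(p)))
				goto next;

			for_each_cpu(i, sched_group_cpus(sg))
				if (!idle_cpu(i))
					goto next;	/* core not fully idle */

			return cpumask_first_and(sched_group_cpus(sg),
						 tsk_cpus_allowed(p));
next:
			sg = sg->next;
		} while (sg != sd->groups);
	}

	/* No fully idle core: fall back to target, however busy it is. */
	return target;
}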

It seemed to me that we could enable SD_BALANCE_WAKE in the SMT/MC
domains at least without losing cache benefits. However, PeterZ has
noted that SD_BALANCE_WAKE can hurt sysbench:

https://lkml.org/lkml/2009/9/16/340

which I could easily verify on this system (i.e., sysbench OLTP
throughput drops when SD_BALANCE_WAKE is enabled).

I have tried coming up with something that keeps SD_BALANCE_WAKE
enabled at the SMT/MC domains, does not hurt sysbench, and also helps
the Trade benchmark that I had begun investigating. The patch falls
back to an SD_BALANCE_WAKE-style balance when the cpu returned by
select_idle_sibling() is not idle.

				tip		tip + patch
			=============================================
sysbench			4032.313	4558.780      (+13%)
Trade thr'put (all VMs active)	18294.48/min	31916.393/min (+74%)
VM1 cpu util (all VMs active)	13.7%		17.3%	      (+26%)

[Note : sysbench was run with 16 threads as:

# sysbench --num-threads=16 --max-requests=100000 --test=oltp --oltp-table-size=500000 --mysql-socket=/var/lib/mysql/mysql.sock --oltp-read-only --mysql-user=root --mysql-password=blah run

]

Any other suggestions to help recover this benchmark score in the
contended situation?

Not yet Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

---
 include/linux/topology.h |    4 ++--
 kernel/sched/fair.c      |    4 +++-
 2 files changed, 5 insertions(+), 3 deletions(-)

Index: linux-3.3-rc3-tip-a80142eb/include/linux/topology.h
===================================================================
--- linux-3.3-rc3-tip-a80142eb.orig/include/linux/topology.h
+++ linux-3.3-rc3-tip-a80142eb/include/linux/topology.h
@@ -96,7 +96,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_NEWIDLE			\
 				| 1*SD_BALANCE_EXEC			\
 				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
+				| 1*SD_BALANCE_WAKE			\
 				| 1*SD_WAKE_AFFINE			\
 				| 1*SD_SHARE_CPUPOWER			\
 				| 0*SD_POWERSAVINGS_BALANCE		\
@@ -129,7 +129,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_BALANCE_NEWIDLE			\
 				| 1*SD_BALANCE_EXEC			\
 				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
+				| 1*SD_BALANCE_WAKE			\
 				| 1*SD_WAKE_AFFINE			\
 				| 0*SD_PREFER_LOCAL			\
 				| 0*SD_SHARE_CPUPOWER			\
Index: linux-3.3-rc3-tip-a80142eb/kernel/sched/fair.c
===================================================================
--- linux-3.3-rc3-tip-a80142eb.orig/kernel/sched/fair.c
+++ linux-3.3-rc3-tip-a80142eb/kernel/sched/fair.c
@@ -2783,7 +2783,9 @@ select_task_rq_fair(struct task_struct *
 			prev_cpu = cpu;
 
 		new_cpu = select_idle_sibling(p, prev_cpu);
-		goto unlock;
+		if (idle_cpu(new_cpu))
+			goto unlock;
+		sd = rcu_dereference(per_cpu(sd_llc, prev_cpu));
 	}
 
 	while (sd) {

* Re: sched: Performance of Trade workload running inside VM
  2012-02-14 11:28 sched: Performance of Trade workload running inside VM Srivatsa Vaddagiri
@ 2012-02-15 11:59 ` Peter Zijlstra
  2012-02-15 17:10   ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2012-02-15 11:59 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

On Tue, 2012-02-14 at 16:58 +0530, Srivatsa Vaddagiri wrote:

> This led me to investigate the wakeup code path closely, and in
> particular select_idle_sibling(). select_idle_sibling() looks for a core
> that is fully idle, failing which it wakes the task on prev_cpu (or
> cur_cpu). In particular, it does not go hunting for the least loaded
> cpu, which is what SD_BALANCE_WAKE provides.
> 
> It seemed to me that we could enable SD_BALANCE_WAKE in the SMT/MC
> domains at least without losing cache benefits. However, PeterZ has
> noted that SD_BALANCE_WAKE can hurt sysbench.


> I have tried coming up with something that keeps SD_BALANCE_WAKE
> enabled at the SMT/MC domains, does not hurt sysbench, and also helps
> the Trade benchmark that I had begun investigating. The patch falls
> back to an SD_BALANCE_WAKE-style balance when the cpu returned by
> select_idle_sibling() is not idle.


> Index: linux-3.3-rc3-tip-a80142eb/kernel/sched/fair.c
> ===================================================================
> --- linux-3.3-rc3-tip-a80142eb.orig/kernel/sched/fair.c
> +++ linux-3.3-rc3-tip-a80142eb/kernel/sched/fair.c
> @@ -2783,7 +2783,9 @@ select_task_rq_fair(struct task_struct *
>  			prev_cpu = cpu;
>  
>  		new_cpu = select_idle_sibling(p, prev_cpu);
> -		goto unlock;
> +		if (idle_cpu(new_cpu))
> +			goto unlock;
> +		sd = rcu_dereference(per_cpu(sd_llc, prev_cpu));
>  	}
>  
>  	while (sd) {

Right, so the problem with this is that it might defeat wake_affine;
wake_affine tries to pull a task towards its wakeup source (irrespective
of the idleness thereof).

Also, wake_balance is somewhat expensive, which seems like a bad thing
considering your workload is already wakeup heavy.

That said, there was a lot of text in your email which hid what your
actual problem was. So please try again: fewer words, more actual
content please.


* Re: sched: Performance of Trade workload running inside VM
  2012-02-15 11:59 ` Peter Zijlstra
@ 2012-02-15 17:10   ` Srivatsa Vaddagiri
  2012-02-15 17:24     ` Peter Zijlstra
  2012-02-15 17:26     ` Peter Zijlstra
  0 siblings, 2 replies; 11+ messages in thread
From: Srivatsa Vaddagiri @ 2012-02-15 17:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

* Peter Zijlstra <a.p.zijlstra@chello.nl> [2012-02-15 12:59:21]:

> > @@ -2783,7 +2783,9 @@ select_task_rq_fair(struct task_struct *
> >  			prev_cpu = cpu;
> >  
> >  		new_cpu = select_idle_sibling(p, prev_cpu);
> > -		goto unlock;
> > +		if (idle_cpu(new_cpu))
> > +			goto unlock;
> > +		sd = rcu_dereference(per_cpu(sd_llc, prev_cpu));
> >  	}
> >  
> >  	while (sd) {
> 
> Right, so the problem with this is that it might defeat wake_affine;
> wake_affine tries to pull a task towards its wakeup source (irrespective
> of the idleness thereof).

Isn't it already broken in some respect, given that
select_idle_sibling() could select a cpu different from the wakeup
source (thus forcing the task to run on a cpu other than the wakeup
source)?

Are there benchmarks you would suggest that could be sensitive to
wake_affine? I have already tried sysbench and found that it benefits
from this patch:

> Also, wake_balance is somewhat expensive, which seems like a bad thing
> considering your workload is already wakeup heavy.

The patch seems to help both my workload and sysbench.

                                tip             tip + patch
                        =============================================
sysbench                        4032.313        4558.780      (+13%)
Trade thr'put (all VMs active)  18294.48/min    31916.393/min (+74%)
VM1 cpu util (all VMs active)   13.7%           17.3%         (+26%)


> That said, there was a lot of text in your email which hid what your
> actual problem was. So please try again: fewer words, more actual
> content please.

Ok ..let me see if these numbers highlight the problem better.

Machine : 2 Quad-core Intel CPUs w/ HT enabled (16 logical cpus)
Host kernel : tip (HEAD at 2ce21a52)

cgroups:
	/libvirt	  (cpu.shares = 20000)
	/libvirt/qemu/VM1 (cpu.shares varied from 1024 -> 131072)
	/libvirt/qemu/VM2 (cpu.shares = 1024)
	/libvirt/qemu/VM3 (cpu.shares = 1024)
	/libvirt/qemu/VM4 (cpu.shares = 1024)
	/libvirt/qemu/VM5 (cpu.shares = 1024)

VM1-5 are (KVM) virtual machines. VM1 runs the most important benchmark
and has 8 vcpus. VM2-5 each have 4 vcpus and run cpu hogs to keep their
vcpus busy. A load generator running on the host bombards the
web+database server running in VM1 and measures throughput along with
response times.

First let's look at the performance of the benchmark when only VM1 is
running (other VMs suspended):

			Throughput 	VM1 %cpu utilization
			(tx/min)	(measured over 30-sec window)
		=========================================================

Only VM1 active		32900		20.35

From this we know that VM1 is capable of delivering up to 32900 tx/min
in an uncontended situation.

Next we activate all VMs. VM2-5 run cpu hogs at a constant cpu.shares
of 1024, while VM1's cpu.shares is varied from 1024 to 131072. The
impact on benchmark performance is noted below:

			Throughput 	VM1 %cpu utilization
VM1 cpu.shares		(tx/min)	(measured over 30-sec window)
========================================================================

1024			1547		4
2048			5900		9
4096			14000		12.4
8192			17700		13.5
16384			18800		13.5
32768			19600		13.6
65536			18323		13.4
131072			19000		13.8


Observed results:
	No matter how high a cpu.shares value we assign to VM1, its
	utilization flattens out at ~14% and the benchmark score does
	not improve beyond ~19000.

Expected results:
	Increasing cpu.shares should let VM1 consume more and more CPU
	until it gets close to its peak demand (20.35%) and delivers
	close to its peak performance (32900 tx/min).

I will share similar results with the patch applied by tomorrow. I am
also trying to recreate the problem using simpler programs (like sload).
Will let you know if I am successful with that!

- vatsa



* Re: sched: Performance of Trade workload running inside VM
  2012-02-15 17:10   ` Srivatsa Vaddagiri
@ 2012-02-15 17:24     ` Peter Zijlstra
  2012-02-15 17:38       ` Srivatsa Vaddagiri
  2012-02-15 17:26     ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2012-02-15 17:24 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

On Wed, 2012-02-15 at 22:40 +0530, Srivatsa Vaddagiri wrote:
> Ok ..let me see if these numbers highlight the problem better.

does this translate like: I've no fscking clue, but my tinker made it go
away?


* Re: sched: Performance of Trade workload running inside VM
  2012-02-15 17:10   ` Srivatsa Vaddagiri
  2012-02-15 17:24     ` Peter Zijlstra
@ 2012-02-15 17:26     ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2012-02-15 17:26 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

On Wed, 2012-02-15 at 22:40 +0530, Srivatsa Vaddagiri wrote:
> Isn't it already broken in some respect, given that
> select_idle_sibling() could select a cpu different from the wakeup
> source (thus forcing the task to run on a cpu other than the wakeup
> source)?

select_idle_sibling() should keep it in the same cache domain, thereby
reducing pain and increasing parallelism.


* Re: sched: Performance of Trade workload running inside VM
  2012-02-15 17:24     ` Peter Zijlstra
@ 2012-02-15 17:38       ` Srivatsa Vaddagiri
  2012-02-15 17:45         ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Srivatsa Vaddagiri @ 2012-02-15 17:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

* Peter Zijlstra <a.p.zijlstra@chello.nl> [2012-02-15 18:24:11]:

> On Wed, 2012-02-15 at 22:40 +0530, Srivatsa Vaddagiri wrote:
> > Ok ..let me see if these numbers highlight the problem better.
> 
> does this translate like: I've no fscking clue,

I'd mentioned a possible reason earlier for what is limiting VM1 from
reaching higher utilization (the time its tasks wait for a cpu after
wakeup):

>b. In the "all VMs active" case, VM1's VCPU tasks were found to incur
>   "high" wait times when two of VM1's tasks were scheduled on the same
>   CPU (i.e., a VCPU task had to wait behind a sibling VCPU task to
>   obtain CPU time).

Let me get cpu wait time data and post it by tomorrow.

> but my tinker made it go away?

Do you have any other suggestions for me to try?

- vatsa



* Re: sched: Performance of Trade workload running inside VM
  2012-02-15 17:38       ` Srivatsa Vaddagiri
@ 2012-02-15 17:45         ` Peter Zijlstra
  2012-02-15 17:56           ` Srivatsa Vaddagiri
  2012-02-18  7:41           ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2012-02-15 17:45 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

On Wed, 2012-02-15 at 23:08 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2012-02-15 18:24:11]:
> 
> > On Wed, 2012-02-15 at 22:40 +0530, Srivatsa Vaddagiri wrote:
> > > Ok ..let me see if these numbers highlight the problem better.
> > 
> > does this translate like: I've no fscking clue,
> 
> I'd mentioned a possible reason earlier for what is limiting VM1 from
> reaching higher utilization (the time its tasks wait for a cpu after
> wakeup):
> 
> >b. In the "all VMs active" case, VM1's VCPU tasks were found to incur
> >   "high" wait times when two of VM1's tasks were scheduled on the same
> >   CPU (i.e., a VCPU task had to wait behind a sibling VCPU task to
> >   obtain CPU time).
> 
> Let me get cpu wait time data and post it by tomorrow.
> 
> > but my tinker made it go away?
> 
> Do you have any other suggestions for me to try?

I'm still waiting for a problem description that isn't a book.

What does the load-balancer do, why is it wrong, why does your patch
sort it etc.

I've really no idea what you're trying to do, other than make your
numbers improve (which, while a noble goal, doesn't help in judging your
patch or suggesting alternative means of getting there).


* Re: sched: Performance of Trade workload running inside VM
  2012-02-15 17:45         ` Peter Zijlstra
@ 2012-02-15 17:56           ` Srivatsa Vaddagiri
  2012-02-18  7:41           ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 11+ messages in thread
From: Srivatsa Vaddagiri @ 2012-02-15 17:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

* Peter Zijlstra <a.p.zijlstra@chello.nl> [2012-02-15 18:45:02]:

> I'm still waiting for a problem description that isn't a book.

:-)

> What does the load-balancer do, why is it wrong, why does your patch
> sort it etc.

Ok. I will get some more load balancer traces and describe what the
balancer is (potentially) doing wrong that affects this benchmark.

> I've really no idea what you're trying to do, other than make your
> numbers improve (which, while a noble goal, doesn't help in judging your
> patch or suggesting alternative means of getting there).

- vatsa



* Re: sched: Performance of Trade workload running inside VM
  2012-02-15 17:45         ` Peter Zijlstra
  2012-02-15 17:56           ` Srivatsa Vaddagiri
@ 2012-02-18  7:41           ` Srivatsa Vaddagiri
  2012-02-20 14:56             ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Srivatsa Vaddagiri @ 2012-02-18  7:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	 Nikunj A. Dadhania

* Peter Zijlstra <a.p.zijlstra@chello.nl> [2012-02-15 18:45:02]:

> I'm still waiting for a problem description that isn't a book.
> 
> What does the load-balancer do,

select_idle_sibling() tries to find a core (sched group) that is fully
idle, failing which it lets the task wake up on prev_cpu.

> why is it wrong,

Take this scenario: the wakeup source for the task is cpu0, while its
prev_cpu is cpu7 (which is in another cache domain).

		cache_dom0		cache_dom1
		(0, 1) (2, 3)		(4, 5) (6, 7)
nr_running ->	 1  1   1  1		 0  1   1  2

In this case we let the task wake up on cpu7 (nr_running == 2), where it
incurs some wait time before being scheduled. A better choice would have
been cpu4, which is idle even though its core is only partially idle
(cpu5 is busy), or in general any other less loaded cpu in the same
cache domain.

> why does your patch sort it etc.

The patch does result in a hunt for the "least" busy cpu when the
target cpu returned by select_idle_sibling() is not idle - thus giving
better scheduling latencies for the task (and in turn better benchmark
scores).

Another variant of the patch could be to have select_idle_sibling()
look for any idle cpu in the same cache domain (rather than requiring a
whole group of cpus to be idle)? A rough sketch of that variant follows.
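
Something like the below is what I have in mind (a rough, untested
sketch against this tree; select_idle_cpu_in_llc() is just a placeholder
name of mine):

/*
 * Accept the first idle logical cpu in the waker's cache domain
 * instead of requiring a fully idle core. Untested sketch.
 */
static int select_idle_cpu_in_llc(struct task_struct *p, int target)
{
	struct sched_domain *sd;
	int i;

	if (idle_cpu(target))
		return target;

	sd = rcu_dereference(per_cpu(sd_llc, target));
	if (!sd)
		return target;

	for_each_cpu_and(i, sched_domain_span(sd), tsk_cpus_allowed(p)) {
		if (idle_cpu(i))
			return i;	/* first idle cpu sharing the cache */
	}

	return target;	/* nothing idle; fall back as before */
}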

- vatsa



* Re: sched: Performance of Trade workload running inside VM
  2012-02-18  7:41           ` Srivatsa Vaddagiri
@ 2012-02-20 14:56             ` Peter Zijlstra
  2012-02-20 15:09               ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2012-02-20 14:56 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

On Sat, 2012-02-18 at 13:11 +0530, Srivatsa Vaddagiri wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2012-02-15 18:45:02]:

> > why does your patch sort it etc.
> 
> The patch does result in a hunt for the "least" busy cpu when the
> target cpu returned by select_idle_sibling() is not idle - thus giving
> better scheduling latencies for the task (and in turn better benchmark
> scores).
> 
> Another variant of the patch could be to have select_idle_sibling()
> look for any idle cpu in the same cache domain (rather than requiring
> a whole group of cpus to be idle)?

Right, so I looked over select_idle_sibling() again and it made my head
hurt :/ I can't immediately tell if it's actually doing the right thing
or not (it _should_ try and avoid using SMT siblings if possible).

It would be very nice not to have both select_idle_sibling() and
SD_BALANCE_WAKE iterate the domain tree. So merging them if at all
possible would be goodness I think.

We'd have WAKE_AFFINE to decide which cache domain etc to stuff the task
on and then use select_idle_sibling() to find the most appropriate cpu
within that cache domain.

There was talk of modifying select_idle_sibling() to also consider the
C-state the cpu was in, preferring shallower over deeper C-states where
there's a choice. This is very similar to what you propose: taking the
least loaded cpu when there isn't a proper idle one around.
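
Roughly (an untested sketch of the idea; idle_state_of() is entirely
made up -- nothing in the tree currently exposes per-cpu C-state depth
to the scheduler, cpuidle would need to publish it):

static int select_shallowest_or_least_loaded(struct task_struct *p, int target)
{
	struct sched_domain *sd = rcu_dereference(per_cpu(sd_llc, target));
	unsigned long min_load = ULONG_MAX;
	int shallowest = INT_MAX;
	int best_idle = -1, least_loaded = target;
	int i;

	if (!sd)
		return target;

	for_each_cpu_and(i, sched_domain_span(sd), tsk_cpus_allowed(p)) {
		if (idle_cpu(i)) {
			int depth = idle_state_of(i);	/* made-up helper */

			/* prefer the idle cpu in the shallowest C-state */
			if (depth < shallowest) {
				shallowest = depth;
				best_idle = i;
			}
		} else if (cpu_rq(i)->load.weight < min_load) {
			/* remember the least loaded cpu as a fallback */
			min_load = cpu_rq(i)->load.weight;
			least_loaded = i;
		}
	}

	return best_idle != -1 ? best_idle : least_loaded;
}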




* Re: sched: Performance of Trade workload running inside VM
  2012-02-20 14:56             ` Peter Zijlstra
@ 2012-02-20 15:09               ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 11+ messages in thread
From: Srivatsa Vaddagiri @ 2012-02-20 15:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, pjt, efault, venki, suresh.b.siddha, linux-kernel,
	Nikunj A. Dadhania

* Peter Zijlstra <a.p.zijlstra@chello.nl> [2012-02-20 15:56:30]:

> > Another variant of the patch could be to have select_idle_sibling()
> > look for any idle cpu in the same cache domain (rather than requiring
> > a whole group of cpus to be idle)?
> 
> Right, so I looked over select_idle_sibling() again and it made my head
> hurt :/

I can vouch for it :-)

> I can't immediately tell if it's actually doing the right thing
> or not (it _should_ try and avoid using SMT siblings if possible).

Yes makes sense.

> It would be very nice not to have both select_idle_sibling() and
> SD_BALANCE_WAKE iterate the domain tree. So merging them if at all
> possible would be goodness I think.

Right. Let me see how that can be worked out in my next version.

> We'd have WAKE_AFFINE to decide which cache domain etc to stuff the task
> on and then use select_idle_sibling() to find the most appropriate cpu
> within that cache domain.
> 
> There was talk of modifying select_idle_sibling() to also consider the
> C-state the cpu was in, preferring shallower over deeper C-states where
> there's a choice.

Ok ..interesting. /me goes and educates himself on how this info can be
dug out.

> This is very similar to what you propose: taking the
> least loaded cpu when there isn't a proper idle one around.

Thanks for the feedback ..

- vatsa


