linux-kernel.vger.kernel.org archive mirror
* CPU scheduler question/problem
@ 2009-01-22 21:34 Pawel Dziekonski
  2009-01-23 15:40 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Pawel Dziekonski @ 2009-01-22 21:34 UTC
  To: linux-kernel

Hello,

all my compute intensive processes stick to one core,
whatever I put in /proc/sys/kernel/sched_min_granularity_ns - big number,
small number.

yes, I read /usr/src/linux/Documentation/scheduler/sched-design-CFS.txt
and yes, I did google about CFS tuning. no, I'm not a kernel developer. ;-)

also creation of cpu.shares does not help.

kernels tested:
2.6.27.7-9-default from openSUSE 11.1 distro
and vanilla 2.6.18.1.

this is a very simple test showing openssl performance:

dd if=/dev/zero bs=1M count=10000 | \
openssl enc -k qqqq -aes-128-cbc | \
pv > /dev/null

(you can skip pv if you like or do not have it installed)

when I start top or pidstat I see that all of them run on the same core
and are competing for CPU time (pv 2%, dd 15%, openssl 83%).
to work around this problem I use taskset to place processes on different cores:

taskset -c 0 dd if=/dev/zero bs=1M count=10000 | \
taskset -c 1 ./openssl enc -k qqqq -aes-128-cbc | \
taskset -c 2 pv > /dev/null

now the benchmark runs on 3 different cores and gives waaay better results
(185 MB/s compared to 115 MB/s without taskset).
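
What taskset does for each stage of the pipeline can also be done from inside a program. The following is a minimal sketch, assuming Linux with glibc; the helper names first_allowed_cpu, pin_to_cpu and allowed_cpu_count are made up for illustration, while sched_getaffinity(), sched_setaffinity() and the CPU_* macros are the standard glibc interfaces.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes sched_setaffinity() and the CPU_* macros */
#endif
#include <sched.h>

/* Lowest-numbered CPU the calling thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/*
 * Restrict the calling thread to a single CPU, similar in effect to
 * starting it under "taskset -c <cpu>". Returns 0 on success, -1 on error.
 */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Number of CPUs in the calling thread's affinity mask, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

The affinity mask is inherited across fork() and exec(), so a small launcher could call pin_to_cpu() and then exec the real worker. Like taskset, this is static placement and does not replace load balancing.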

another example: I start a computing application (quantum chemistry) that spawns
four processes that share workload. it happens very often that 2 processes share
same core for a moment and then jump to another core. taskset -p helps but it is
very inconvenient.

I am not using cpufreq.

question: is there a way to better balance processes over different cores
and do it automagically? automating is necessary because I plan to build a
compute cluster with a batch system.

thanks in advance!

ps. my system is openSUSE 11.1 on Intel Core i7 CPU 940, HT enabled.
pps. I am not subscribed to LKML - please CC me - thanks!


* Re: CPU scheduler question/problem
  2009-01-22 21:34 CPU scheduler question/problem Pawel Dziekonski
@ 2009-01-23 15:40 ` Peter Zijlstra
  2009-01-26 13:48   ` Pawel Dziekonski
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2009-01-23 15:40 UTC
  To: Pawel Dziekonski; +Cc: linux-kernel, Ingo Molnar

On Thu, 2009-01-22 at 22:34 +0100, Pawel Dziekonski wrote:
> question: is there a way to better balance processes over different
> cores and do it automagically?

Hard, the load-balancing code is a bunch of heuristics that work 'well'
for most of the things.

The pipe workload you mentioned would behave that way because pipes
'assume' a producer/consumer behaviour, and thus are more likely to
place both tasks on the same cpu -- but will eventually pull them apart
if they want to run concurrently.

You might enable SCHED_DEBUG=y and try

 echo NO_SYNC_WAKEUPS > /debug/sched_features

For that particular load.
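
The same toggle can be issued from a program. This is only a sketch: the helper name set_sched_feature is made up for illustration, and the location of the sched_features file is an assumption that depends on where debugfs is mounted (/debug here, /sys/kernel/debug on other setups). Writing a feature name enables it and writing the name with a NO_ prefix disables it. Root privileges and SCHED_DEBUG=y are required.

```c
#include <stdio.h>

/*
 * Write one feature token (e.g. "NO_SYNC_WAKEUPS") to the sched_features
 * file at `path`. Returns 0 on success, -1 on failure.
 * Hypothetical helper, not part of any kernel or libc API.
 */
int set_sched_feature(const char *path, const char *token)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%s\n", token) > 0;
    /* fclose() flushes the buffer, so a rejected write is reported here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as set_sched_feature("/debug/sched_features", "NO_SYNC_WAKEUPS") would then correspond to the echo command above.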

About your quantum chemistry application -- you say they share a
workload, does that mean they synchronize a lot on locks? If so, the
scheduler might, at times of serialization, think it is a
producer/consumer load and move tasks together, and then later, when
they run independently, move them apart again.

I'm afraid you'll have to share a bit more of how your application works
in order to get a more informed answer.





* Re: CPU scheduler question/problem
  2009-01-23 15:40 ` Peter Zijlstra
@ 2009-01-26 13:48   ` Pawel Dziekonski
  2009-01-26 13:55     ` Peter Zijlstra
  2009-01-26 22:55     ` Ingo Molnar
  0 siblings, 2 replies; 7+ messages in thread
From: Pawel Dziekonski @ 2009-01-26 13:48 UTC
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar

2009/1/23 Peter Zijlstra <peterz@infradead.org>:

> The pipe workload you mentioned would behave that way because pipes
> 'assume' a producer/consumer behaviour, and thus are more likely to
> place both tasks on the same cpu -- but will eventually pull them apart
> if they want to run concurrently.
>
> You might enable SCHED_DEBUG=y and try
>  echo NO_SYNC_WAKEUPS > /debug/sched_features

Hello,

that did the trick. Openssl now gets a whole core exclusively and gives full
performance.

Regarding quantum chemistry application -- it is also using pipes
for communication between worker processes. Now this app works OK.

Where can I read more on tuning sched_features for different workloads?

Also, is there a way to get a list of available schedulers and how to switch
between them?

thanks, Pawel


* Re: CPU scheduler question/problem
  2009-01-26 13:48   ` Pawel Dziekonski
@ 2009-01-26 13:55     ` Peter Zijlstra
  2009-01-26 22:55     ` Ingo Molnar
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2009-01-26 13:55 UTC
  To: Pawel Dziekonski; +Cc: linux-kernel, Ingo Molnar

On Mon, 2009-01-26 at 14:48 +0100, Pawel Dziekonski wrote:
> 2009/1/23 Peter Zijlstra <peterz@infradead.org>:
> 
> > The pipe workload you mentioned would behave that way because pipes
> > 'assume' a producer/consumer behaviour, and thus are more likely to
> > place both tasks on the same cpu -- but will eventually pull them apart
> > if they want to run concurrently.
> >
> > You might enable SCHED_DEBUG=y and try
> >  echo NO_SYNC_WAKEUPS > /debug/sched_features
> 
> Hello,
> 
> that did the trick. Openssl now gets a whole core exclusively and gives full
> performance.
> 
> Regarding quantum chemistry application -- it is also using pipes
> for communication between worker processes. Now this app works OK.

Hmm, how long does a worker run for each received packet? The thing is,
if the data is cache affine for the issuing cpu, and the worker runs
short it often doesn't make sense to run it remote, as the cache
transfer will hurt more.

If it runs longer, the balancer will usually pick it up and move it
around.

/debug/sched_features is mostly a debug tool, it's not a supported/stable
tuning interface.

> Where can I read more on tuning sched_features for different workloads?

kernel/sched*.[ch] ;-)

> Also, is there a way to get a list of available schedulers and how to switch
> between them?

include/linux/sched.h:

#define SCHED_NORMAL		0
#define SCHED_FIFO		1
#define SCHED_RR		2
#define SCHED_BATCH		3
/* SCHED_ISO: reserved but not implemented yet */
#define SCHED_IDLE		5

and the POSIX sched_{set,get}scheduler() functions.
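
A minimal sketch of the query side of that interface, assuming Linux with glibc: sched_getscheduler(0) returns the policy of the calling process, and the hypothetical helper policy_name() maps the number to the names in the listing above (glibc's headers call SCHED_NORMAL SCHED_OTHER). The numeric values are taken from that listing rather than from the libc headers.

```c
#include <sched.h>

/* Map a policy number to its name, using the values from the listing above. */
const char *policy_name(int policy)
{
    switch (policy) {
    case 0:  return "SCHED_NORMAL";
    case 1:  return "SCHED_FIFO";
    case 2:  return "SCHED_RR";
    case 3:  return "SCHED_BATCH";
    case 5:  return "SCHED_IDLE";
    default: return "unknown";   /* also covers the -1 error return */
    }
}

/* Name of the scheduling policy of the calling process. */
const char *current_policy_name(void)
{
    return policy_name(sched_getscheduler(0));
}
```

Changing the policy goes through sched_setscheduler(), which takes a struct sched_param and normally needs elevated privileges for SCHED_FIFO and SCHED_RR.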



* Re: CPU scheduler question/problem
  2009-01-26 13:48   ` Pawel Dziekonski
  2009-01-26 13:55     ` Peter Zijlstra
@ 2009-01-26 22:55     ` Ingo Molnar
  2009-01-27 16:04       ` Pawel Dziekonski
  1 sibling, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2009-01-26 22:55 UTC
  To: Pawel Dziekonski, Mike Galbraith; +Cc: Peter Zijlstra, linux-kernel


* Pawel Dziekonski <dzieko@gmail.com> wrote:

> 2009/1/23 Peter Zijlstra <peterz@infradead.org>:
> 
> > The pipe workload you mentioned would behave that way because pipes
> > 'assume' a producer/consumer behaviour, and thus are more likely to
> > place both tasks on the same cpu -- but will eventually pull them apart
> > if they want to run concurrently.
> >
> > You might enable SCHED_DEBUG=y and try
> >  echo NO_SYNC_WAKEUPS > /debug/sched_features
> 
> Hello,
> 
> that did the trick. Openssl now gets a whole core exclusively and gives 
> full performance.
> 
> Regarding quantum chemistry application -- it is also using pipes for 
> communication between worker processes. Now this app works OK.

Could you please try the fuller fix below too? Does it still do the
trick and does the scheduler still maximize openssl and your quantum 
chemistry app's throughput?

There should be no need for you to tune anything - the scheduler must get 
such workloads right out of the box.

	Ingo

--------------------------->
From 08a1c2658637045d207b6fa0c328055e589a4009 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon, 26 Jan 2009 17:56:17 +0100
Subject: [PATCH] sched: disable sync wakeups

Pawel Dziekonski reported that the openssl benchmark and his
quantum chemistry application both show slowdowns due to the
scheduler under-parallelizing execution.

The reason is that pipe wakeups still do 'sync' wakeups, which
override the normal buddy wakeup logic - even if waker and
wakee are loosely coupled.

So disable sync wakeups and also fix an inversion of logic
in the buddy wakeup code.

Reported-by: Pawel Dziekonski <dzieko@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c          |    4 ++++
 kernel/sched_fair.c     |    8 +-------
 kernel/sched_features.h |    2 +-
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 8c2be1e..ce1cfc6 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2266,6 +2266,10 @@ static int try_to_wake_up(struct task_struct *p, unsigned int state, int sync)
 	if (!sched_feat(SYNC_WAKEUPS))
 		sync = 0;
 
+	if (!sync && (current->se.avg_overlap < sysctl_sched_migration_cost &&
+			    p->se.avg_overlap < sysctl_sched_migration_cost))
+		sync = 1;
+
 #ifdef CONFIG_SMP
 	if (sched_feat(LB_WAKEUP_UPDATE)) {
 		struct sched_domain *sd;
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5cc1c16..fd789a2 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1189,10 +1189,6 @@ wake_affine(struct sched_domain *this_sd, struct rq *this_rq,
 	if (!(this_sd->flags & SD_WAKE_AFFINE) || !sched_feat(AFFINE_WAKEUPS))
 		return 0;
 
-	if (sync && (curr->se.avg_overlap > sysctl_sched_migration_cost ||
-			p->se.avg_overlap > sysctl_sched_migration_cost))
-		sync = 0;
-
 	/*
 	 * If sync wakeup then subtract the (maximum possible)
 	 * effect of the currently running task from the load
@@ -1419,9 +1415,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int sync)
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 
-	if (sched_feat(WAKEUP_OVERLAP) && (sync ||
-			(se->avg_overlap < sysctl_sched_migration_cost &&
-			 pse->avg_overlap < sysctl_sched_migration_cost))) {
+	if (sched_feat(WAKEUP_OVERLAP) && sync) {
 		resched_task(curr);
 		return;
 	}
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index da5d93b..8134e65 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -4,7 +4,7 @@ SCHED_FEAT(WAKEUP_PREEMPT, 1)
 SCHED_FEAT(START_DEBIT, 1)
 SCHED_FEAT(AFFINE_WAKEUPS, 1)
 SCHED_FEAT(CACHE_HOT_BUDDY, 1)
-SCHED_FEAT(SYNC_WAKEUPS, 1)
+SCHED_FEAT(SYNC_WAKEUPS, 0)
 SCHED_FEAT(HRTICK, 0)
 SCHED_FEAT(DOUBLE_TICK, 0)
 SCHED_FEAT(ASYM_GRAN, 1)
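
Condensed into a standalone function, the sync decision that try_to_wake_up() makes after this patch looks roughly like the sketch below. It is a userspace model for illustration only: effective_sync and its parameter names are made up, and the overlap and cost values are plain nanosecond numbers rather than fields of task_struct.

```c
#include <stdint.h>

/*
 * Model of the wakeup 'sync' decision after the patch:
 *  - with the SYNC_WAKEUPS feature off, a caller-requested sync is dropped;
 *  - a non-sync wakeup is upgraded to sync only when both the waker and the
 *    wakee have an average overlap below the migration cost, i.e. when the
 *    two tasks look tightly coupled.
 */
int effective_sync(int sync_wakeups_enabled, int sync,
                   uint64_t waker_avg_overlap, uint64_t wakee_avg_overlap,
                   uint64_t migration_cost)
{
    if (!sync_wakeups_enabled)
        sync = 0;

    if (!sync && waker_avg_overlap < migration_cost &&
                 wakee_avg_overlap < migration_cost)
        sync = 1;

    return sync;
}
```

In this model a migration cost of 1 allows the upgrade only when both overlaps are 0, so lowering that value makes sync wakeups rarer.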


* Re: CPU scheduler question/problem
  2009-01-26 22:55     ` Ingo Molnar
@ 2009-01-27 16:04       ` Pawel Dziekonski
  0 siblings, 0 replies; 7+ messages in thread
From: Pawel Dziekonski @ 2009-01-27 16:04 UTC
  To: Ingo Molnar; +Cc: Mike Galbraith, Peter Zijlstra, linux-kernel

2009/1/26 Ingo Molnar <mingo@elte.hu>:
>
> * Pawel Dziekonski <dzieko@gmail.com> wrote:
>
>> 2009/1/23 Peter Zijlstra <peterz@infradead.org>:
>>
>> > The pipe workload you mentioned would behave that way because pipes
>> > 'assume' a producer/consumer behaviour, and thus are more likely to
>> > place both tasks on the same cpu -- but will eventually pull them apart
>> > if they want to run concurrently.
>> >
>> > You might enable SCHED_DEBUG=y and try
>> >  echo NO_SYNC_WAKEUPS > /debug/sched_features
>>
>> Hello,
>>
>> that did the trick. Openssl now gets a whole core exclusively and gives
>> full performance.
>>
>> Regarding quantum chemistry application -- it is also using pipes for
>> communication between worker processes. Now this app works OK.
>
> Could you please try the fuller fix below too? Does it still do the
> trick and does the scheduler still maximize openssl and your quantum
> chemistry app's throughput?
>
> There should be no need for you to tune anything - the scheduler must get
> such workloads right out of the box.

hello,

After contacting Ingo directly I downloaded tip/master kernel tree via
http://people.redhat.com/mingo/tip.git/README.

after reboot SYNC_WAKEUPS is enabled by default and my openssl benchmark is
still stuck on one core:

# uname -a
Linux MiP 2.6.29-rc2 #1 SMP Tue Jan 27 16:03:29 CET 2009 x86_64 x86_64 x86_64 GNU/Linux

# cat /sys/kernel/debug/sched_features
NEW_FAIR_SLEEPERS NO_NORMALIZED_SLEEPER ADAPTIVE_GRAN WAKEUP_PREEMPT
START_DEBIT AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK
NO_DOUBLE_TICK ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD
NO_WAKEUP_OVERLAP LAST_BUDDY OWNER_SPIN
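
In the listing above a NO_ prefix marks a feature that is switched off, so SYNC_WAKEUPS is on and HRTICK is off. A small sketch of checking that programmatically, assuming the file content has already been read into a string; sched_feature_state is a made-up helper name.

```c
#include <string.h>

/*
 * Look up `name` in a whitespace-separated sched_features listing.
 * Returns 1 if the feature is listed as enabled, 0 if it is listed with a
 * NO_ prefix (disabled), and -1 if it does not appear at all.
 */
int sched_feature_state(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        size_t tok_len;

        p += strspn(p, " \t\n");          /* skip separators */
        tok_len = strcspn(p, " \t\n");    /* length of the next token */
        if (tok_len == 0)
            break;

        if (tok_len == name_len && strncmp(p, name, name_len) == 0)
            return 1;
        if (tok_len == name_len + 3 && strncmp(p, "NO_", 3) == 0 &&
            strncmp(p + 3, name, name_len) == 0)
            return 0;

        p += tok_len;
    }
    return -1;
}
```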

# pidstat

16:20:43          PID    %usr %system  %guest    %CPU   CPU  Command
16:20:44         4992    0.00    2.00    0.00    2.00     4  dd
16:20:44         4993   81.00    6.00    0.00   87.00     4  openssl
16:20:44         4994    3.00    9.00    0.00   12.00     4  pv

> Also, does
> the slowdown go away (if SYNC_WAKEUPS is enabled) if you reduce/increase
> /proc/sys/kernel/sched_migration_cost?

I tested values of 100000, 500000 (default), 999999. in all cases 3
processes stay together on the same core and are jumping together
between different cores:

# pidstat 1

16:27:07          PID    %usr %system  %guest    %CPU   CPU  Command
16:27:08         4992    0.00    1.00    0.00    1.00     0  dd
16:27:08         4993   78.00    6.00    0.00   84.00     0  openssl
16:27:08         4994    2.00   12.00    0.00   14.00     0  pv

16:27:08          PID    %usr %system  %guest    %CPU   CPU  Command
16:27:09         4992    0.00    2.00    0.00    2.00     4  dd
16:27:09         4993   83.00    3.00    0.00   86.00     4  openssl
16:27:09         4994    3.00    9.00    0.00   12.00     4  pv

However, a value of "1" works:

16:30:08          PID    %usr %system  %guest    %CPU   CPU  Command
16:30:09         6197    0.00    4.00    0.00    4.00     2  dd
16:30:09         6198   94.00    7.00    0.00  101.00     1  openssl
16:30:09         6199    2.00   15.00    0.00   17.00     0  pv

regards, Pawel


* Re: CPU scheduler question/problem
       [not found]                   ` <2cd4df870902091056v7287e53fx8e7c8c5599b856b3@mail.gmail.com>
@ 2009-02-11  9:06                     ` Yinghai Lu
  0 siblings, 0 replies; 7+ messages in thread
From: Yinghai Lu @ 2009-02-11  9:06 UTC
  To: Pawel Dziekonski, Ingo Molnar
  Cc: Peter Zijlstra, Tejun Heo, H. Peter Anvin, Jeremy Fitzhardinge,
	Thomas Gleixner, linux-kernel

Pawel Dziekonski wrote:
> 2009/2/9 Ingo Molnar <mingo@elte.hu>:
> 
>> What you need to do after this is to:
>>  git checkout tip/master
>> to be on the latest tip/master tree.
> 
> looks good.
> 
> again I got warnings (not errors!) around MODPOST about mismatch.
> make CONFIG_DEBUG_SECTION_MISMATCH=y
> shows 2 warnings about some acpi related functions having (or not) the
> necessary __init prefix.
> 
> after reboot I got this:
> 
> ------------[ cut here ]------------
> WARNING: at arch/x86/mm/ioremap.c:616 check_early_ioremap_leak+0x52/0x67()
> Hardware name:
> Debug warning: early ioremap leak of 1 areas detected.
> Modules linked in:


please check

[PATCH] pci: fix one early_ioremap leaking

Impact: fix map leaking

Pawel reported:
------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:616 check_early_ioremap_leak+0x52/0x67()
Hardware name:
Debug warning: early ioremap leak of 1 areas detected.
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.29-rc4-tip #2
...

Reported-by: Pawel Dziekonski <dzieko@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/dmar.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/dmar.c
===================================================================
--- linux-2.6.orig/drivers/pci/dmar.c
+++ linux-2.6/drivers/pci/dmar.c
@@ -42,6 +42,7 @@
 LIST_HEAD(dmar_drhd_units);
 
 static struct acpi_table_header * __initdata dmar_tbl;
+static acpi_size dmar_tbl_size;
 
 static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
 {
@@ -288,8 +289,9 @@ static int __init dmar_table_detect(void
 	acpi_status status = AE_OK;
 
 	/* if we could find DMAR table, then there are DMAR devices */
-	status = acpi_get_table(ACPI_SIG_DMAR, 0,
-				(struct acpi_table_header **)&dmar_tbl);
+	status = acpi_get_table_with_size(ACPI_SIG_DMAR, 0,
+				(struct acpi_table_header **)&dmar_tbl,
+				&dmar_tbl_size);
 
 	if (ACPI_SUCCESS(status) && !dmar_tbl) {
 		printk (KERN_WARNING PREFIX "Unable to map DMAR\n");
@@ -481,6 +483,7 @@ void __init detect_intel_iommu(void)
 			iommu_detected = 1;
 #endif
 	}
+	early_acpi_os_unmap_memory(dmar_tbl, dmar_tbl_size);
 	dmar_tbl = NULL;
 }
 
