* [PATCH -rt] timer: delay waking softirqs from the jiffy tick
From: Peter Zijlstra @ 2009-08-21 9:56 UTC
To: Thomas Gleixner; +Cc: Ingo Molnar, linux-kernel
Hi,
Most people have been complaining about broken load-balancing with the
recent -rt series.
A look at /proc/sched_debug yielded:
cpu#0, 2393.874 MHz
.nr_running : 0
.load : 0
.cpu_load[0] : 177522
.cpu_load[1] : 177522
.cpu_load[2] : 177522
.cpu_load[3] : 177522
.cpu_load[4] : 177522
cpu#1, 2393.874 MHz
.nr_running : 4
.load : 4096
.cpu_load[0] : 181618
.cpu_load[1] : 180850
.cpu_load[2] : 180274
.cpu_load[3] : 179938
.cpu_load[4] : 179758
This indicated that the cpu_load computation was hosed: the 177522 value
means there is exactly one RT task runnable. Initially I thought the
old problem of calculating cpu_load from a softirq had re-surfaced,
but looking at the code shows it's being done from scheduler_tick().
[ we really should fix this RT/cfs interaction some day... ]
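For reference on where that number comes from: in kernels of this
vintage, set_load_weight() accounts a runnable RT task at twice the
weight of nice -20, i.e. 2 * prio_to_weight[0] = 2 * 88761 = 177522.
A minimal userspace illustration of that arithmetic (the constant and
the factor of two are quoted from memory, so treat them as an
assumption rather than verbatim source):

#include <stdio.h>

/*
 * Models the RT branch of set_load_weight(): an RT task is accounted
 * at twice the weight of the heaviest nice level (prio_to_weight[0]
 * in kernel/sched.c, the nice -20 weight).
 */
#define NICE_M20_WEIGHT 88761

int main(void)
{
	/* Matches the 177522 seen in /proc/sched_debug above. */
	printf("one runnable RT task -> cpu_load contribution: %d\n",
	       2 * NICE_M20_WEIGHT);
	return 0;
}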
A few trace_printk()s later:
sirq-timer/1-19 [001] 174.289744: 19: 50:S ==> [001] 0:140:R <idle>
<idle>-0 [001] 174.290724: enqueue_task_rt: adding task: 19/sirq-timer/1 with load: 177522
<idle>-0 [001] 174.290725: 0:140:R + [001] 19: 50:S sirq-timer/1
<idle>-0 [001] 174.290730: scheduler_tick: current load: 177522
<idle>-0 [001] 174.290732: scheduler_tick: current: 0/swapper
<idle>-0 [001] 174.290736: 0:140:R ==> [001] 19: 50:R sirq-timer/1
sirq-timer/1-19 [001] 174.290741: dequeue_task_rt: removing task: 19/sirq-timer/1 with load: 177522
sirq-timer/1-19 [001] 174.290743: 19: 50:S ==> [001] 0:140:R <idle>
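(The scheduler_tick lines above came from throwaway trace_printk()
calls; a hypothetical reconstruction of that instrumentation follows,
with placement and format strings guessed from the trace rather than
taken from any actual debug patch:)

void scheduler_tick(void)
{
	struct rq *rq = this_rq();

	/* ad-hoc debug output, reconstructed from the trace above */
	trace_printk("current load: %lu\n", rq->load.weight);
	trace_printk("current: %d/%s\n", current->pid, current->comm);

	/* ... existing scheduler_tick() body follows ... */
}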
From the trace we can see that the timer softirq is always raised (and
its sirq-timer thread woken) before the load calculation runs. Avoid
this by re-ordering the scheduler_tick() call in update_process_times()
to occur before we deal with timers.
This lowers the load back to sanity and restores regular load-balancing
behaviour.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/timer.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/timer.c b/kernel/timer.c
index 8137cce..96ac1b4 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1221,10 +1221,10 @@ void update_process_times(int user_tick)
 
 	/* Note: this timer irq context must be accounted for as well. */
 	account_process_tick(p, user_tick);
+	scheduler_tick();
 	run_local_timers();
 	if (rcu_pending(cpu))
 		rcu_check_callbacks(cpu, user_tick);
-	scheduler_tick();
 	run_posix_cpu_timers(p);
 }
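For context, update_process_times() after this change looks roughly
like the sketch below. It is reconstructed from the hunk above plus
the usual local declarations; any lines the hunk does not show are
omitted, so do not read it as the verbatim -rt source.

void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id();

	/* Note: this timer irq context must be accounted for as well. */
	account_process_tick(p, user_tick);
	scheduler_tick();	/* sample cpu_load before the softirq thread is woken */
	run_local_timers();	/* may raise (and on -rt, wake) the timer softirq */
	if (rcu_pending(cpu))
		rcu_check_callbacks(cpu, user_tick);
	run_posix_cpu_timers(p);
}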
* Re: [PATCH -rt] timer: delay waking softirqs from the jiffy tick
From: John Kacur @ 2009-08-21 12:12 UTC
To: Peter Zijlstra; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, dinogun
On Fri, Aug 21, 2009 at 11:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> This lowers the load back to sanity and restores regular load-balancing
> behaviour.
Cool! I applied this to the v2.6.31-rc6-rt5 tree, with the
following results from /proc/sched_debug:
Before applying the patch:
cpu#0, 2792.838 MHz
.cpu_load[0] : 180594
.cpu_load[1] : 192061
.cpu_load[2] : 205170
.cpu_load[3] : 204449
.cpu_load[4] : 199281
cpu#1, 2792.838 MHz
.cpu_load[0] : 177522
.cpu_load[1] : 178378
.cpu_load[2] : 178932
.cpu_load[3] : 178808
After applying the patch:
cpu#0, 2792.847 MHz
.cpu_load[0] : 1024
.cpu_load[1] : 960
.cpu_load[2] : 700
.cpu_load[3] : 430
.cpu_load[4] : 260
cpu#1, 2792.847 MHz
.cpu_load[0] : 1024
.cpu_load[1] : 1280
.cpu_load[2] : 1393
.cpu_load[3] : 1390
.cpu_load[4] : 1871
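(Aside, on why the higher indices move more slowly: each cpu_load[i]
is an exponentially decayed average, with index i keeping roughly
old * (2^i - 1) / 2^i plus new / 2^i per tick. A small userspace model
of that update rule -- as I remember update_cpu_load() of this era,
minus its rounding details, so treat it as an approximation:)

#include <stdio.h>

/* Per-tick cpu_load[] update: larger i decays slower, which is why
 * cpu_load[4] above lags behind the instantaneous cpu_load[0]. */
static void tick(unsigned long load[5], unsigned long this_load)
{
	int i;

	for (i = 0; i < 5; i++) {
		unsigned long scale = 1UL << i;

		load[i] = (load[i] * (scale - 1) + this_load) >> i;
	}
}

int main(void)
{
	unsigned long load[5] = { 0 };
	int t;

	/* one nice-0 task (weight 1024) runnable for 10 ticks */
	for (t = 0; t < 10; t++)
		tick(load, 1024);

	printf("%lu %lu %lu %lu %lu\n",
	       load[0], load[1], load[2], load[3], load[4]);
	return 0;
}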
* Re: [PATCH -rt] timer: delay waking softirqs from the jiffy tick
From: Peter Zijlstra @ 2009-08-21 12:19 UTC
To: John Kacur; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, dinogun
On Fri, 2009-08-21 at 14:12 +0200, John Kacur wrote:
> On Fri, Aug 21, 2009 at 11:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > cpu#0, 2393.874 MHz
> cpu#0, 2792.838 MHz
Now clearly the most important information here is that your cpu is
faster than mine :-)
/me runs
* Re: [PATCH -rt] timer: delay waking softirqs from the jiffy tick
From: john stultz @ 2009-08-21 23:32 UTC
To: Peter Zijlstra; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel
On Fri, Aug 21, 2009 at 2:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> This lowers the load back to sanity and restores regular load-balancing
> behaviour.
So I gave this a shot, and it's somewhat improved, but I'm still seeing
a number of threads bunch up on one cpu while other cpus go idle.
Simple C test case attached.
thanks
-john
sched_debug output:
Sched Debug Version: v0.09, 2.6.31-rc6-rt5fixed #11
now at 1452228.842078 msecs
.jiffies : 4296119524
.sysctl_sched_latency : 60.000000
.sysctl_sched_min_granularity : 12.000000
.sysctl_sched_wakeup_granularity : 15.000000
.sysctl_sched_child_runs_first : 0.000001
.sysctl_sched_features : 113917
cpu#0, 2193.084 MHz
.nr_running : 0
.load : 0
.nr_switches : 3625608
.nr_load_updates : 1452203
.nr_uninterruptible : 355
.next_balance : 4296.119775
.curr->pid : 0
.clock : 1452228.012605
.cpu_load[0] : 0
.cpu_load[1] : 512
.cpu_load[2] : 384
.cpu_load[3] : 224
.cpu_load[4] : 120
.rt.rt_nr_running : 0
.rt.rt_nr_uninterruptible : -2
.rto_schedule : 1478928
.rto_schedule_tail : 0
.rto_wakeup : 0
.rto_pulled : 0
.rto_pushed : 0
.yld_count : 0
.sched_switch : 0
.sched_count : 3636126
.sched_goidle : 960497
.ttwu_count : 1938509
.ttwu_local : 1878333
.bkl_count : 165
cfs_rq[0]:
.exec_clock : 525430.235581
.MIN_vruntime : 0.000001
.min_vruntime : 1770489.332594
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : 0.000000
.nr_running : 0
.load : 0
.nr_spread_over : 0
rt_rq[0]:
.rt_nr_running : 0
.rt_throttled : 0
.rt_time : 0.254107
.rt_runtime : 950.000000
runnable tasks:
task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
cpu#1, 2193.084 MHz
.nr_running : 5
.load : 5120
.nr_switches : 4824708
.nr_load_updates : 1451675
.nr_uninterruptible : -135
.next_balance : 4296.119574
.curr->pid : 19623
.clock : 1452228.136825
.cpu_load[0] : 5120
.cpu_load[1] : 49500
.cpu_load[2] : 38405
.cpu_load[3] : 24537
.cpu_load[4] : 15522
.rt.rt_nr_running : 0
.rt.rt_nr_uninterruptible : -2
.rto_schedule : 1472565
.rto_schedule_tail : 0
.rto_wakeup : 0
.rto_pulled : 0
.rto_pushed : 0
.yld_count : 1
.sched_switch : 0
.sched_count : 6529209
.sched_goidle : 432143
.ttwu_count : 3181567
.ttwu_local : 3128804
.bkl_count : 250
cfs_rq[1]:
.exec_clock : 1051264.911762
.MIN_vruntime : 1550428.769298
.min_vruntime : 1550428.613868
.max_vruntime : 1550429.162494
.spread : 0.393196
.spread0 : -220060.718726
.nr_running : 5
.load : 5120
.nr_spread_over : 4
rt_rq[1]:
.rt_nr_running : 0
.rt_throttled : 0
.rt_time : 0.343947
.rt_runtime : 950.000000
runnable tasks:
task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
  spin 19620  1550428.769298      3236   120  1550428.769298      3201.087846       0.133102
  spin 19621  1550428.916072      3230   120  1550428.916072      3205.577748       0.000000
  spin 19622  1550429.162494      3244   120  1550429.162494      3208.822004       0.000000
R spin 19623  1550428.613868      3245   120  1550428.613868      3208.273826       0.000000
  spin 19624  1550428.914231      3320   120  1550428.914231      3279.605516       0.000000
cpu#2, 2193.084 MHz
.nr_running : 1
.load : 1024
.nr_switches : 3654869
.nr_load_updates : 1451358
.nr_uninterruptible : -161
.next_balance : 4296.119525
.curr->pid : 19626
.clock : 1452228.262756
.cpu_load[0] : 1024
.cpu_load[1] : 512
.cpu_load[2] : 256
.cpu_load[3] : 128
.cpu_load[4] : 64
.rt.rt_nr_running : 0
.rt.rt_nr_uninterruptible : -2
.rto_schedule : 1489508
.rto_schedule_tail : 0
.rto_wakeup : 0
.rto_pulled : 0
.rto_pushed : 0
.yld_count : 0
.sched_switch : 0
.sched_count : 3659050
.sched_goidle : 1210178
.ttwu_count : 1982899
.ttwu_local : 1937837
.bkl_count : 316
cfs_rq[2]:
.exec_clock : 271064.442561
.MIN_vruntime : 0.000001
.min_vruntime : 1252745.472872
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : -517743.859722
.nr_running : 1
.load : 1024
.nr_spread_over : 0
rt_rq[2]:
.rt_nr_running : 0
.rt_throttled : 0
.rt_time : 0.245213
.rt_runtime : 950.000000
runnable tasks:
task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R  cat 19626  1252745.472872         4   120  1252745.472872         0.938117       0.001859
cpu#3, 2193.084 MHz
.nr_running : 0
.load : 0
.nr_switches : 3642923
.nr_load_updates : 1451041
.nr_uninterruptible : -59
.next_balance : 4296.119528
.curr->pid : 0
.clock : 1452228.390208
.cpu_load[0] : 0
.cpu_load[1] : 256
.cpu_load[2] : 192
.cpu_load[3] : 112
.cpu_load[4] : 60
.rt.rt_nr_running : 0
.rt.rt_nr_uninterruptible : -2
.rto_schedule : 1488745
.rto_schedule_tail : 0
.rto_wakeup : 0
.rto_pulled : 0
.rto_pushed : 0
.yld_count : 0
.sched_switch : 0
.sched_count : 3647105
.sched_goidle : 1205735
.ttwu_count : 1971911
.ttwu_local : 1935417
.bkl_count : 938
cfs_rq[3]:
.exec_clock : 278467.804587
.MIN_vruntime : 0.000001
.min_vruntime : 1506528.907842
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : -263960.424752
.nr_running : 0
.load : 0
.nr_spread_over : 2
rt_rq[3]:
.rt_nr_running : 0
.rt_throttled : 0
.rt_time : 0.304767
.rt_runtime : 950.000000
runnable tasks:
task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
[-- Attachment #2: spin.c --]
[-- Type: text/x-csrc, Size: 139 bytes --]
#include <unistd.h>

static void spin(void)
{
	while (1)
		;
}

int main(void)
{
	int i;

	/* Each fork()'s parent drops into spin(); the final child spins
	 * too, leaving 5 busy-loop tasks in total. */
	for (i = 0; i < 4; i++)
		if (fork())
			spin();
	spin();
}
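A plausible way to run the test case, assuming gcc (the exact commands
are a suggestion, not from the original mail): compile with
"gcc -o spin spin.c", start ./spin on an otherwise idle SMP machine,
and watch /proc/sched_debug (or top) to see whether the five spinners
spread across cpus or bunch up on one. Clean up with "killall spin".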