All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -rt] timer: delay waking softirqs from the jiffy tick
@ 2009-08-21  9:56 Peter Zijlstra
  2009-08-21 12:12 ` John Kacur
  2009-08-21 23:32 ` john stultz
  0 siblings, 2 replies; 4+ messages in thread
From: Peter Zijlstra @ 2009-08-21  9:56 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, linux-kernel

Hi,

most people were complaining about broken balancing with the recent -rt
series.

A look at /proc/sched_debug yielded:

cpu#0, 2393.874 MHz
  .nr_running                    : 0
  .load                          : 0
  .cpu_load[0]                   : 177522
  .cpu_load[1]                   : 177522
  .cpu_load[2]                   : 177522
  .cpu_load[3]                   : 177522
  .cpu_load[4]                   : 177522
cpu#1, 2393.874 MHz
  .nr_running                    : 4
  .load                          : 4096
  .cpu_load[0]                   : 181618
  .cpu_load[1]                   : 180850
  .cpu_load[2]                   : 180274
  .cpu_load[3]                   : 179938
  .cpu_load[4]                   : 179758


Which indicated the cpu_load computation was hosed, the 177522 value
indicates that there is one RT task runnable. Initially I thought the
old problem of calculating the cpu_load from a softirq had re-surfaced,
however looking at the code shows its being done from scheduler_tick().

[ we really should fix this RT/cfs interaction some day... ]

A few trace_printk()s later:

    sirq-timer/1-19    [001]   174.289744:     19: 50:S ==> [001]     0:140:R <idle>                            
          <idle>-0     [001]   174.290724: enqueue_task_rt: adding task: 19/sirq-timer/1 with load: 177522      
          <idle>-0     [001]   174.290725:      0:140:R   + [001]    19: 50:S sirq-timer/1                      
          <idle>-0     [001]   174.290730: scheduler_tick: current load: 177522                                 
          <idle>-0     [001]   174.290732: scheduler_tick: current: 0/swapper                                   
          <idle>-0     [001]   174.290736:      0:140:R ==> [001]    19: 50:R sirq-timer/1                      
    sirq-timer/1-19    [001]   174.290741: dequeue_task_rt: removing task: 19/sirq-timer/1 with load: 177522    
    sirq-timer/1-19    [001]   174.290743:     19: 50:S ==> [001]     0:140:R <idle>  

We see that we always raise the timer softirq before doing the load
calculation. Avoid this by re-ordering the scheduler_tick() call in
update_process_times() to occur before we deal with timers.

This lowers the load back to sanity and restores regular load-balancing
behaviour.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/timer.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 8137cce..96ac1b4 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1221,10 +1221,10 @@ void update_process_times(int user_tick)
 
 	/* Note: this timer irq context must be accounted for as well. */
 	account_process_tick(p, user_tick);
+	scheduler_tick();
 	run_local_timers();
 	if (rcu_pending(cpu))
 		rcu_check_callbacks(cpu, user_tick);
-	scheduler_tick();
 	run_posix_cpu_timers(p);
 }
 


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH -rt] timer: delay waking softirqs from the jiffy tick
  2009-08-21  9:56 [PATCH -rt] timer: delay waking softirqs from the jiffy tick Peter Zijlstra
@ 2009-08-21 12:12 ` John Kacur
  2009-08-21 12:19   ` Peter Zijlstra
  2009-08-21 23:32 ` john stultz
  1 sibling, 1 reply; 4+ messages in thread
From: John Kacur @ 2009-08-21 12:12 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, dinogun

On Fri, Aug 21, 2009 at 11:56 AM, Peter Zijlstra<peterz@infradead.org> wrote:
> Hi,
>
> most people were complaining about broken balancing with the recent -rt
> series.
>
> A look at /proc/sched_debug yielded:
>
> cpu#0, 2393.874 MHz
>  .nr_running                    : 0
>  .load                          : 0
>  .cpu_load[0]                   : 177522
>  .cpu_load[1]                   : 177522
>  .cpu_load[2]                   : 177522
>  .cpu_load[3]                   : 177522
>  .cpu_load[4]                   : 177522
> cpu#1, 2393.874 MHz
>  .nr_running                    : 4
>  .load                          : 4096
>  .cpu_load[0]                   : 181618
>  .cpu_load[1]                   : 180850
>  .cpu_load[2]                   : 180274
>  .cpu_load[3]                   : 179938
>  .cpu_load[4]                   : 179758
>
>
> Which indicated the cpu_load computation was hosed, the 177522 value
> indicates that there is one RT task runnable. Initially I thought the
> old problem of calculating the cpu_load from a softirq had re-surfaced,
> however looking at the code shows its being done from scheduler_tick().
>
> [ we really should fix this RT/cfs interaction some day... ]
>
> A few trace_printk()s later:
>
>    sirq-timer/1-19    [001]   174.289744:     19: 50:S ==> [001]     0:140:R <idle>
>          <idle>-0     [001]   174.290724: enqueue_task_rt: adding task: 19/sirq-timer/1 with load: 177522
>          <idle>-0     [001]   174.290725:      0:140:R   + [001]    19: 50:S sirq-timer/1
>          <idle>-0     [001]   174.290730: scheduler_tick: current load: 177522
>          <idle>-0     [001]   174.290732: scheduler_tick: current: 0/swapper
>          <idle>-0     [001]   174.290736:      0:140:R ==> [001]    19: 50:R sirq-timer/1
>    sirq-timer/1-19    [001]   174.290741: dequeue_task_rt: removing task: 19/sirq-timer/1 with load: 177522
>    sirq-timer/1-19    [001]   174.290743:     19: 50:S ==> [001]     0:140:R <idle>
>
> We see that we always raise the timer softirq before doing the load
> calculation. Avoid this by re-ordering the scheduler_tick() call in
> update_process_times() to occur before we deal with timers.
>
> This lowers the load back to sanity and restores regular load-balancing
> behaviour.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/timer.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 8137cce..96ac1b4 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1221,10 +1221,10 @@ void update_process_times(int user_tick)
>
>        /* Note: this timer irq context must be accounted for as well. */
>        account_process_tick(p, user_tick);
> +       scheduler_tick();
>        run_local_timers();
>        if (rcu_pending(cpu))
>                rcu_check_callbacks(cpu, user_tick);
> -       scheduler_tick();
>        run_posix_cpu_timers(p);
>  }
>
>
> --

Cool! I applied this to the v2.6.31-rc6-rt5 tree and with the
following results from /proc/sched_debug

Before applying the patch:

cpu#0, 2792.838 MHz
  .cpu_load[0]                   : 180594
  .cpu_load[1]                   : 192061
  .cpu_load[2]                   : 205170
  .cpu_load[3]                   : 204449
  .cpu_load[4]                   : 199281

cpu#1, 2792.838 MHz
  .cpu_load[0]                   : 177522
  .cpu_load[1]                   : 178378
  .cpu_load[2]                   : 178932
  .cpu_load[3]                   : 178808

After applying the patch:

cpu#0, 2792.847 MHz
  .cpu_load[0]                   : 1024
  .cpu_load[1]                   : 960
  .cpu_load[2]                   : 700
  .cpu_load[3]                   : 430
  .cpu_load[4]                   : 260

cpu#1, 2792.847 MHz
  .cpu_load[0]                   : 1024
  .cpu_load[1]                   : 1280
  .cpu_load[2]                   : 1393
  .cpu_load[3]                   : 1390
  .cpu_load[4]                   : 1871

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH -rt] timer: delay waking softirqs from the jiffy tick
  2009-08-21 12:12 ` John Kacur
@ 2009-08-21 12:19   ` Peter Zijlstra
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2009-08-21 12:19 UTC (permalink / raw)
  To: John Kacur; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel, dinogun

On Fri, 2009-08-21 at 14:12 +0200, John Kacur wrote:
> On Fri, Aug 21, 2009 at 11:56 AM, Peter Zijlstra<peterz@infradead.org> wrote:

> > cpu#0, 2393.874 MHz

> cpu#0, 2792.838 MHz

Now clearly the most important information here is that your cpu is
faster than mine :-)

/me runs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH -rt] timer: delay waking softirqs from the jiffy tick
  2009-08-21  9:56 [PATCH -rt] timer: delay waking softirqs from the jiffy tick Peter Zijlstra
  2009-08-21 12:12 ` John Kacur
@ 2009-08-21 23:32 ` john stultz
  1 sibling, 0 replies; 4+ messages in thread
From: john stultz @ 2009-08-21 23:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, Ingo Molnar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 12118 bytes --]

On Fri, Aug 21, 2009 at 2:56 AM, Peter Zijlstra<peterz@infradead.org> wrote:
> most people were complaining about broken balancing with the recent -rt
> series.
>
> A look at /proc/sched_debug yielded:
>
> cpu#0, 2393.874 MHz
>  .nr_running                    : 0
>  .load                          : 0
>  .cpu_load[0]                   : 177522
>  .cpu_load[1]                   : 177522
>  .cpu_load[2]                   : 177522
>  .cpu_load[3]                   : 177522
>  .cpu_load[4]                   : 177522
> cpu#1, 2393.874 MHz
>  .nr_running                    : 4
>  .load                          : 4096
>  .cpu_load[0]                   : 181618
>  .cpu_load[1]                   : 180850
>  .cpu_load[2]                   : 180274
>  .cpu_load[3]                   : 179938
>  .cpu_load[4]                   : 179758
>
>
> Which indicated the cpu_load computation was hosed, the 177522 value
> indicates that there is one RT task runnable. Initially I thought the
> old problem of calculating the cpu_load from a softirq had re-surfaced,
> however looking at the code shows its being done from scheduler_tick().
>
> [ we really should fix this RT/cfs interaction some day... ]
>
> A few trace_printk()s later:
>
>    sirq-timer/1-19    [001]   174.289744:     19: 50:S ==> [001]     0:140:R <idle>
>          <idle>-0     [001]   174.290724: enqueue_task_rt: adding task: 19/sirq-timer/1 with load: 177522
>          <idle>-0     [001]   174.290725:      0:140:R   + [001]    19: 50:S sirq-timer/1
>          <idle>-0     [001]   174.290730: scheduler_tick: current load: 177522
>          <idle>-0     [001]   174.290732: scheduler_tick: current: 0/swapper
>          <idle>-0     [001]   174.290736:      0:140:R ==> [001]    19: 50:R sirq-timer/1
>    sirq-timer/1-19    [001]   174.290741: dequeue_task_rt: removing task: 19/sirq-timer/1 with load: 177522
>    sirq-timer/1-19    [001]   174.290743:     19: 50:S ==> [001]     0:140:R <idle>
>
> We see that we always raise the timer softirq before doing the load
> calculation. Avoid this by re-ordering the scheduler_tick() call in
> update_process_times() to occur before we deal with timers.
>
> This lowers the load back to sanity and restores regular load-balancing
> behaviour.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  kernel/timer.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 8137cce..96ac1b4 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1221,10 +1221,10 @@ void update_process_times(int user_tick)
>
>        /* Note: this timer irq context must be accounted for as well. */
>        account_process_tick(p, user_tick);
> +       scheduler_tick();
>        run_local_timers();
>        if (rcu_pending(cpu))
>                rcu_check_callbacks(cpu, user_tick);
> -       scheduler_tick();
>        run_posix_cpu_timers(p);
>  }

So I gave this a shot, and its somewhat improved, but I'm still seeing
a number of threads bunch up while other cpus go idle.

Simple c test case attached.

thanks
-john



sched_debug output:

Sched Debug Version: v0.09, 2.6.31-rc6-rt5fixed #11
now at 1452228.842078 msecs
  .jiffies                                 : 4296119524
  .sysctl_sched_latency                    : 60.000000
  .sysctl_sched_min_granularity            : 12.000000
  .sysctl_sched_wakeup_granularity         : 15.000000
  .sysctl_sched_child_runs_first           : 0.000001
  .sysctl_sched_features                   : 113917

cpu#0, 2193.084 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 3625608
  .nr_load_updates               : 1452203
  .nr_uninterruptible            : 355
  .next_balance                  : 4296.119775
  .curr->pid                     : 0
  .clock                         : 1452228.012605
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 512
  .cpu_load[2]                   : 384
  .cpu_load[3]                   : 224
  .cpu_load[4]                   : 120
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -2
  .rto_schedule                  : 1478928
  .rto_schedule_tail             : 0
  .rto_wakeup                    : 0
  .rto_pulled                    : 0
  .rto_pushed                    : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 3636126
  .sched_goidle                  : 960497
  .ttwu_count                    : 1938509
  .ttwu_local                    : 1878333
  .bkl_count                     : 165

cfs_rq[0]:
  .exec_clock                    : 525430.235581
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 1770489.332594
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : 0.000000
  .nr_running                    : 0
  .load                          : 0
  .nr_spread_over                : 0

rt_rq[0]:
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.254107
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio
exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

cpu#1, 2193.084 MHz
  .nr_running                    : 5
  .load                          : 5120
  .nr_switches                   : 4824708
  .nr_load_updates               : 1451675
  .nr_uninterruptible            : -135
  .next_balance                  : 4296.119574
  .curr->pid                     : 19623
  .clock                         : 1452228.136825
  .cpu_load[0]                   : 5120
  .cpu_load[1]                   : 49500
  .cpu_load[2]                   : 38405
  .cpu_load[3]                   : 24537
  .cpu_load[4]                   : 15522
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -2
  .rto_schedule                  : 1472565
  .rto_schedule_tail             : 0
  .rto_wakeup                    : 0
  .rto_pulled                    : 0
  .rto_pushed                    : 0
  .yld_count                     : 1
  .sched_switch                  : 0
  .sched_count                   : 6529209
  .sched_goidle                  : 432143
  .ttwu_count                    : 3181567
  .ttwu_local                    : 3128804
  .bkl_count                     : 250

cfs_rq[1]:
  .exec_clock                    : 1051264.911762
  .MIN_vruntime                  : 1550428.769298
  .min_vruntime                  : 1550428.613868
  .max_vruntime                  : 1550429.162494
  .spread                        : 0.393196
  .spread0                       : -220060.718726
  .nr_running                    : 5
  .load                          : 5120
  .nr_spread_over                : 4

rt_rq[1]:
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.343947
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio
exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
            spin 19620   1550428.769298      3236   120
1550428.769298      3201.087846         0.133102
            spin 19621   1550428.916072      3230   120
1550428.916072      3205.577748         0.000000
            spin 19622   1550429.162494      3244   120
1550429.162494      3208.822004         0.000000
R           spin 19623   1550428.613868      3245   120
1550428.613868      3208.273826         0.000000
            spin 19624   1550428.914231      3320   120
1550428.914231      3279.605516         0.000000

cpu#2, 2193.084 MHz
  .nr_running                    : 1
  .load                          : 1024
  .nr_switches                   : 3654869
  .nr_load_updates               : 1451358
  .nr_uninterruptible            : -161
  .next_balance                  : 4296.119525
  .curr->pid                     : 19626
  .clock                         : 1452228.262756
  .cpu_load[0]                   : 1024
  .cpu_load[1]                   : 512
  .cpu_load[2]                   : 256
  .cpu_load[3]                   : 128
  .cpu_load[4]                   : 64
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -2
  .rto_schedule                  : 1489508
  .rto_schedule_tail             : 0
  .rto_wakeup                    : 0
  .rto_pulled                    : 0
  .rto_pushed                    : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 3659050
  .sched_goidle                  : 1210178
  .ttwu_count                    : 1982899
  .ttwu_local                    : 1937837
  .bkl_count                     : 316

cfs_rq[2]:
  .exec_clock                    : 271064.442561
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 1252745.472872
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -517743.859722
  .nr_running                    : 1
  .load                          : 1024
  .nr_spread_over                : 0

rt_rq[2]:
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.245213
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio
exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
R            cat 19626   1252745.472872         4   120
1252745.472872         0.938117         0.001859

cpu#3, 2193.084 MHz
  .nr_running                    : 0
  .load                          : 0
  .nr_switches                   : 3642923
  .nr_load_updates               : 1451041
  .nr_uninterruptible            : -59
  .next_balance                  : 4296.119528
  .curr->pid                     : 0
  .clock                         : 1452228.390208
  .cpu_load[0]                   : 0
  .cpu_load[1]                   : 256
  .cpu_load[2]                   : 192
  .cpu_load[3]                   : 112
  .cpu_load[4]                   : 60
  .rt.rt_nr_running              : 0
  .rt.rt_nr_uninterruptible      : -2
  .rto_schedule                  : 1488745
  .rto_schedule_tail             : 0
  .rto_wakeup                    : 0
  .rto_pulled                    : 0
  .rto_pushed                    : 0
  .yld_count                     : 0
  .sched_switch                  : 0
  .sched_count                   : 3647105
  .sched_goidle                  : 1205735
  .ttwu_count                    : 1971911
  .ttwu_local                    : 1935417
  .bkl_count                     : 938

cfs_rq[3]:
  .exec_clock                    : 278467.804587
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 1506528.907842
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -263960.424752
  .nr_running                    : 0
  .load                          : 0
  .nr_spread_over                : 2

rt_rq[3]:
  .rt_nr_running                 : 0
  .rt_throttled                  : 0
  .rt_time                       : 0.304767
  .rt_runtime                    : 950.000000

runnable tasks:
            task   PID         tree-key  switches  prio
exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------

[-- Attachment #2: spin.c --]
[-- Type: text/x-csrc, Size: 139 bytes --]

#include <stdio.h>




void spin(void)
{
	while  (1) ;
}


void main(void)
{
	int i;
	for(i=0;i<4;i++)
		if(fork())
			spin();
	spin();

}

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-08-21 23:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-21  9:56 [PATCH -rt] timer: delay waking softirqs from the jiffy tick Peter Zijlstra
2009-08-21 12:12 ` John Kacur
2009-08-21 12:19   ` Peter Zijlstra
2009-08-21 23:32 ` john stultz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.