* [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value
2016-06-14 15:27 [PATCH 0/3] sched: Enhance/fix sched_avg_update() Frederic Weisbecker
@ 2016-06-14 15:28 ` Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 2/3] sched: Unloop sched avg decaying Frederic Weisbecker
2016-06-14 15:28 ` [NOT-FOR-MERGE-PATCH 3/3] Debug: Alternate old/new version of sched_avg_update calls for profiling Frederic Weisbecker
2 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
Writing 0 to sysctl value "kernel.sched_time_avg_ms" triggers a lockup:
NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
Modules linked in:
irq event stamp: 81894
hardirqs last enabled at (81893): [<ffffffff8111af88>] rcu_idle_exit+0x68/0xa0
hardirqs last disabled at (81894): [<ffffffff8113536e>] tick_nohz_idle_exit+0x2e/0x110
softirqs last enabled at (81870): [<ffffffff810ab8a1>] _local_bh_enable+0x21/0x50
softirqs last disabled at (81869): [<ffffffff810ace2b>] irq_enter+0x4b/0x70
CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.6.0-rc4+ #269
Hardware name: MSI MS-7850/Z87-G41 PC Mate(MS-7850), BIOS V1.3 08/18/2013
0000000000000000 ffff88021fb05b80 ffffffff8141b023 0000000000000000
0000000000000004 ffff88021fb05ba0 ffffffff81163fcf 0000000000000000
ffff88021fb05c40 ffff88021fb05be0 ffffffff8119ae7c 00000001810077ce
Call Trace:
<NMI> [<ffffffff8141b023>] dump_stack+0x85/0xc2
[<ffffffff81163fcf>] watchdog_overflow_callback+0x13f/0x160
[<ffffffff8119ae7c>] __perf_event_overflow+0x9c/0x1f0
[<ffffffff8119baa4>] perf_event_overflow+0x14/0x20
[<ffffffff8100d235>] intel_pmu_handle_irq+0x1d5/0x4a0
[<ffffffff810062ad>] perf_event_nmi_handler+0x2d/0x50
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff81039cef>] nmi_handle+0xbf/0x2f0
[<ffffffff81039c35>] ? nmi_handle+0x5/0x2f0
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff8103a171>] default_do_nmi+0x71/0x1b0
[<ffffffff8103a3c5>] do_nmi+0x115/0x170
[<ffffffff8194b511>] end_repeat_nmi+0x1a/0x1e
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
<<EOE>> [<ffffffff81177d3f>] ? trace_buffer_lock_reserve+0x3f/0x60
[<ffffffff810d3d51>] ? sched_avg_update+0x51/0xc0
[<ffffffff811781e5>] __trace_bputs+0x75/0x100
[<ffffffff810d3d79>] sched_avg_update+0x79/0xc0
[<ffffffff810dedc3>] cpu_load_update+0xa3/0xd0
[<ffffffff810e694a>] cpu_load_update_nohz_stop+0x7a/0x80
[<ffffffff811353b7>] tick_nohz_idle_exit+0x77/0x110
[<ffffffff810f0866>] cpu_startup_entry+0x176/0x420
[<ffffffff810589f4>] start_secondary+0x104/0x110
This is due to the loop in sched_avg_update(), which fails to make any
progress when the update period is 0.
Since setting that value to 0 makes no sense anyway, enforce a minimum
of 1 millisecond.
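For illustration, here is a minimal user-space sketch of the failure mode. It
is not the kernel code: the clock and rt_avg values are made-up examples
standing in for rq_clock(rq), rq->age_stamp and rq->rt_avg, but the loop has
the same shape, and with a zero period its condition never becomes false, so
the CPU spins until the NMI watchdog fires.

#include <stdint.h>

/* Simplified user-space model of the sched_avg_update() loop. */
static void avg_update(uint64_t clock, uint64_t *age_stamp,
		       uint64_t *rt_avg, int64_t period)
{
	while ((int64_t)(clock - *age_stamp) > period) {
		*age_stamp += period;	/* adds 0 when period == 0: no progress */
		*rt_avg /= 2;
	}
}

int main(void)
{
	uint64_t age_stamp = 0, rt_avg = 1024;

	/* kernel.sched_time_avg_ms = 0 => period = 0 => loops forever */
	avg_update(2000000000ULL, &age_stamp, &rt_avg, 0);
	return 0;
}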
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/sysctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87b2fc3..b730dd6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -345,7 +345,8 @@ static struct ctl_table kern_table[] = {
.data = &sysctl_sched_time_avg,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &one,
},
{
.procname = "sched_shares_window_ns",
--
2.7.0
* [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-14 15:27 [PATCH 0/3] sched: Enhance/fix sched_avg_update() Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value Frederic Weisbecker
@ 2016-06-14 15:28 ` Frederic Weisbecker
2016-06-14 15:58 ` Peter Zijlstra
2016-06-14 15:28 ` [NOT-FOR-MERGE-PATCH 3/3] Debug: Alternate old/new version of sched_avg_update calls for profiling Frederic Weisbecker
2 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
The code that periodically decays the sched rt avg (every 0.5
seconds) catches up with missed updates within a loop. This is fine when
CPUs sleep in dynticks mode for short periods of time, as they only
miss a few updates that are caught up through a few iterations. But CPUs
can sleep for arbitrarily long periods of time, leading to an unbounded
number of loop iterations to catch up. As a result, the execution time of
sched_avg_update() can become a latency issue.
The solution proposed in this patch is to convert this loop into a single
division. Testing under idle load has shown positive results. Below
is a function profile comparison between the upstream version
(renamed sched_avg_update_old) and the new version
(sched_avg_update), with the two being called alternately.
The new version can be up to 2 times faster on average.
Function Hit Time Avg s^2
-------- --- ---- --- ---
trace_stat/function0: sched_avg_update_old 135120 47612.36 us 0.352 us 1049.606 us
trace_stat/function0: sched_avg_update 135120 42668.44 us 0.315 us 1617.970 us
trace_stat/function1: sched_avg_update_old 132074 46618.41 us 0.352 us 3653.566 us
trace_stat/function1: sched_avg_update 132074 41356.76 us 0.313 us 3161.585 us
trace_stat/function2: sched_avg_update_old 121335 43381.10 us 0.357 us 1262.240 us
trace_stat/function2: sched_avg_update 121335 38240.72 us 0.315 us 970.400 us
trace_stat/function3: sched_avg_update_old 37563 17640.67 us 0.469 us 148.206 us
trace_stat/function3: sched_avg_update 37563 11059.65 us 0.294 us 126.548 us
trace_stat/function4: sched_avg_update_old 24200 13557.05 us 0.560 us 57.124 us
trace_stat/function4: sched_avg_update 24200 6687.281 us 0.276 us 9.528 us
trace_stat/function5: sched_avg_update_old 28572 15158.76 us 0.530 us 674.049 us
trace_stat/function5: sched_avg_update 28573 8012.361 us 0.280 us 181.687 us
trace_stat/function6: sched_avg_update_old 23424 12987.64 us 0.554 us 27.639 us
trace_stat/function6: sched_avg_update 23425 6264.433 us 0.267 us 2.965 us
trace_stat/function7: sched_avg_update_old 22192 13083.69 us 0.589 us 25.605 us
trace_stat/function7: sched_avg_update 22191 5947.785 us 0.268 us 1.748 us
Below is a snapshot of the same function profile under full busy load
(perf bench sched messaging). The old and new versions show roughly
the same results there, with a tiny 2 or 3 ns advantage for the old version,
which shouldn't matter much as the function is only called every 0.5 seconds.
Function Hit Time Avg s^2
-------- --- ---- --- ---
trace_stat/function0: sched_avg_update 106699 8029.961 us 0.075 us 0.244 us
trace_stat/function0: sched_avg_update_old 106698 7852.948 us 0.073 us 0.313 us
trace_stat/function1: sched_avg_update 106547 8066.833 us 0.075 us 1.256 us
trace_stat/function1: sched_avg_update_old 106547 7794.896 us 0.073 us 1.521 us
trace_stat/function2: sched_avg_update 106527 8049.326 us 0.075 us 1.141 us
trace_stat/function2: sched_avg_update_old 106528 7818.155 us 0.073 us 1.052 us
trace_stat/function3: sched_avg_update 106534 8056.079 us 0.075 us 0.342 us
trace_stat/function3: sched_avg_update_old 106535 7815.416 us 0.073 us 0.369 us
trace_stat/function4: sched_avg_update 106433 8090.462 us 0.076 us 5.359 us
trace_stat/function4: sched_avg_update_old 106433 7879.694 us 0.074 us 0.433 us
trace_stat/function5: sched_avg_update 106426 8127.800 us 0.076 us 1.304 us
trace_stat/function5: sched_avg_update_old 106425 7854.538 us 0.073 us 2.466 us
trace_stat/function6: sched_avg_update 106436 8067.921 us 0.075 us 0.257 us
trace_stat/function6: sched_avg_update_old 106436 7830.492 us 0.073 us 0.334 us
trace_stat/function7: sched_avg_update 106427 8135.786 us 0.076 us 13.181 us
trace_stat/function7: sched_avg_update_old 106428 7940.925 us 0.074 us 0.982 us
In conclusion, the new version behaves roughly the same under busy
load, while being faster and much better bounded under idle load.
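To make the conversion concrete, here is a small user-space sketch (example
values only, standing in for the rq fields and for div64_u64_rem()) checking
that, for a representative delta, the per-period halving loop and the single
division plus right shift reach the same age stamp and rt_avg:

#include <stdint.h>
#include <assert.h>

int main(void)
{
	uint64_t period = 500000000ULL;		/* 0.5s in ns, example value */
	uint64_t delta  = 1750000000ULL;	/* time since age_stamp, example */
	uint64_t rt_avg = 4096;
	uint64_t stamp_loop = 0, avg_loop = rt_avg;

	/* Old behaviour: one halving per elapsed period */
	while ((int64_t)(delta - stamp_loop) > (int64_t)period) {
		stamp_loop += period;
		avg_loop /= 2;
	}

	/* New behaviour: one division (div64_u64_rem() in the kernel),
	 * then advance the stamp by whole periods and shift the average. */
	uint64_t pending = delta / period;
	uint64_t rem = delta % period;
	uint64_t stamp_div = delta - rem;
	uint64_t avg_div = rt_avg >> pending;

	assert(stamp_loop == stamp_div);
	assert(avg_loop == avg_div);
	return 0;
}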
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/sched/core.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 385c947..0c0578a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
void sched_avg_update(struct rq *rq)
{
s64 period = sched_avg_period();
+ s64 delta;
+ u64 rem;
+ int pending;
- while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
- /*
- * Inline assembly required to prevent the compiler
- * optimising this loop into a divmod call.
- * See __iter_div_u64_rem() for another example of this.
- */
- asm("" : "+rm" (rq->age_stamp));
- rq->age_stamp += period;
- rq->rt_avg /= 2;
- }
+ delta = (s64)(rq_clock(rq) - rq->age_stamp);
+ if (delta <= period)
+ return;
+
+ pending = div64_u64_rem(delta, period, &rem);
+ rq->age_stamp += delta - rem;
+ rq->rt_avg >>= pending;
}
#endif /* CONFIG_SMP */
--
2.7.0
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-14 15:28 ` [PATCH 2/3] sched: Unloop sched avg decaying Frederic Weisbecker
@ 2016-06-14 15:58 ` Peter Zijlstra
2016-06-30 12:52 ` Frederic Weisbecker
0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2016-06-14 15:58 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Tue, Jun 14, 2016 at 05:28:01PM +0200, Frederic Weisbecker wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 385c947..0c0578a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
> void sched_avg_update(struct rq *rq)
> {
> s64 period = sched_avg_period();
> + s64 delta;
> + u64 rem;
> + int pending;
>
> + delta = (s64)(rq_clock(rq) - rq->age_stamp);
> + if (delta <= period)
> + return;
> +
> + pending = div64_u64_rem(delta, period, &rem);
> + rq->age_stamp += delta - rem;
> + rq->rt_avg >>= pending;
> }
Blergh, and now do the profile on a machine that doesn't have a major
transistor count dedicated to divisions.
Why not add the division to the nohz exit path only?
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-14 15:58 ` Peter Zijlstra
@ 2016-06-30 12:52 ` Frederic Weisbecker
2016-06-30 13:20 ` Peter Zijlstra
0 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-30 12:52 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2016 at 05:28:01PM +0200, Frederic Weisbecker wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 385c947..0c0578a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
> > void sched_avg_update(struct rq *rq)
> > {
> > s64 period = sched_avg_period();
> > + s64 delta;
> > + u64 rem;
> > + int pending;
> >
> > + delta = (s64)(rq_clock(rq) - rq->age_stamp);
> > + if (delta <= period)
> > + return;
> > +
> > + pending = div64_u64_rem(delta, period, &rem);
> > + rq->age_stamp += delta - rem;
> > + rq->rt_avg >>= pending;
> > }
>
> Blergh, and now do the profile on machine that doesn't have major
> transistor count dedicated to divisions.
Indeed, I was afraid of something like that. That said, I'm not sure which
is worse between iterating and doing the division without native hardware
support. Would it be worth testing?
>
> Why not add the division to the nohz exit path only?
I think it would be worse, because we may exit nohz much more often
than we reach a sched_avg_period().
So the only safe optimization I can do for now is:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4632d31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -626,8 +626,9 @@ bool sched_can_stop_tick(struct rq *rq)
void sched_avg_update(struct rq *rq)
{
s64 period = sched_avg_period();
+ int i;
- while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+ for (i = 0; (s64)(rq_clock(rq) - rq->age_stamp) > period; i++) {
/*
* Inline assembly required to prevent the compiler
* optimising this loop into a divmod call.
@@ -635,8 +636,8 @@ void sched_avg_update(struct rq *rq)
*/
asm("" : "+rm" (rq->age_stamp));
rq->age_stamp += period;
- rq->rt_avg /= 2;
}
+ rq->rt_avg >>= i;
}
#endif /* CONFIG_SMP */
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-30 12:52 ` Frederic Weisbecker
@ 2016-06-30 13:20 ` Peter Zijlstra
2016-07-06 11:53 ` Frederic Weisbecker
0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2016-06-30 13:20 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Thu, Jun 30, 2016 at 02:52:26PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
> > Why not add the division to the nohz exit path only?
>
> It would be worse I think because we may exit much more often from nohz
> than we reach a sched_avg_period().
>
> So the only safe optimization I can do for now is:
How about something like this then?
---
kernel/sched/core.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3387e4f14fc9..fd1ae4c4105f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -665,9 +665,23 @@ bool sched_can_stop_tick(struct rq *rq)
void sched_avg_update(struct rq *rq)
{
- s64 period = sched_avg_period();
+ s64 delta, period = sched_avg_period();
- while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+ delta = (s64)(rq_clock(rq) - rq->age_stamp);
+ if (likely(delta < period))
+ return;
+
+ if (unlikely(delta > 3*period)) {
+ int pending;
+ u64 rem;
+
+ pending = div64_u64_rem(delta, period, &rem);
+ rq->age_stamp += delta - rem;
+ rq->rt_avg >>= pending;
+ return;
+ }
+
+ while (delta > period) {
/*
* Inline assembly required to prevent the compiler
* optimising this loop into a divmod call.
@@ -675,6 +689,7 @@ void sched_avg_update(struct rq *rq)
*/
asm("" : "+rm" (rq->age_stamp));
rq->age_stamp += period;
+ delta -= period;
rq->rt_avg /= 2;
}
}
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-30 13:20 ` Peter Zijlstra
@ 2016-07-06 11:53 ` Frederic Weisbecker
0 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2016-07-06 11:53 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Thu, Jun 30, 2016 at 03:20:37PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 30, 2016 at 02:52:26PM +0200, Frederic Weisbecker wrote:
> > On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
>
> > > Why not add the division to the nohz exit path only?
> >
> > It would be worse I think because we may exit much more often from nohz
> > than we reach a sched_avg_period().
> >
> > So the only safe optimization I can do for now is:
>
> How about something like this then?
>
> ---
>
> kernel/sched/core.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3387e4f14fc9..fd1ae4c4105f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -665,9 +665,23 @@ bool sched_can_stop_tick(struct rq *rq)
>
> void sched_avg_update(struct rq *rq)
> {
> - s64 period = sched_avg_period();
> + s64 delta, period = sched_avg_period();
>
> - while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
> + delta = (s64)(rq_clock(rq) - rq->age_stamp);
> + if (likely(delta < period))
> + return;
> +
> + if (unlikely(delta > 3*period)) {
> + int pending;
> + u64 rem;
> +
> + pending = div64_u64_rem(delta, period, &rem);
> + rq->age_stamp += delta - rem;
> + rq->rt_avg >>= pending;
> + return;
> + }
> +
> + while (delta > period) {
> /*
> * Inline assembly required to prevent the compiler
> * optimising this loop into a divmod call.
> @@ -675,6 +689,7 @@ void sched_avg_update(struct rq *rq)
> */
> asm("" : "+rm" (rq->age_stamp));
> rq->age_stamp += period;
> + delta -= period;
> rq->rt_avg /= 2;
> }
> }
Makes sense. I'm going to run some tests. We might want to precompute 3*period, though.
Thanks.
* [NOT-FOR-MERGE-PATCH 3/3] Debug: Alternate old/new version of sched_avg_update calls for profiling
2016-06-14 15:27 [PATCH 0/3] sched: Enhance/fix sched_avg_update() Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 2/3] sched: Unloop sched avg decaying Frederic Weisbecker
@ 2016-06-14 15:28 ` Frederic Weisbecker
2 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
This way both versions can be measured and compared with the function profiler.
trace_stat/function0: sched_avg_update_old 764 223.936 us 0.293 us 0.017 us
trace_stat/function0: sched_avg_update 764 199.818 us 0.261 us 0.006 us
trace_stat/function1: sched_avg_update_old 252 99.225 us 0.393 us 0.106 us
trace_stat/function1: sched_avg_update 252 59.080 us 0.234 us 0.008 us
trace_stat/function2: sched_avg_update_old 606 202.465 us 0.334 us 0.007 us
trace_stat/function2: sched_avg_update 605 184.083 us 0.304 us 0.002 us
trace_stat/function3: sched_avg_update_old 902 261.790 us 0.290 us 0.010 us
trace_stat/function3: sched_avg_update 901 238.253 us 0.264 us 0.004 us
trace_stat/function4: sched_avg_update_old 120 50.391 us 0.419 us 0.095 us
trace_stat/function4: sched_avg_update 120 35.947 us 0.299 us 0.004 us
trace_stat/function5: sched_avg_update_old 581 195.280 us 0.336 us 0.011 us
trace_stat/function5: sched_avg_update 582 173.594 us 0.298 us 0.003 us
trace_stat/function6: sched_avg_update_old 202 45.201 us 0.223 us 0.049 us
trace_stat/function6: sched_avg_update 201 37.892 us 0.188 us 0.013 us
trace_stat/function7: sched_avg_update_old 200 80.731 us 0.403 us 0.477 us
trace_stat/function7: sched_avg_update 200 41.759 us 0.208 us 0.012 us
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Not-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/sched/core.c | 16 ++++++++++++++++
kernel/sched/fair.c | 7 ++++++-
kernel/sched/sched.h | 8 +++++++-
3 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c0578a..5410e1bb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -663,6 +663,22 @@ bool sched_can_stop_tick(struct rq *rq)
}
#endif /* CONFIG_NO_HZ_FULL */
+void sched_avg_update_old(struct rq *rq)
+{
+ s64 period = sched_avg_period();
+
+ while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+ /*
+ * Inline assembly required to prevent the compiler
+ * optimising this loop into a divmod call.
+ * See __iter_div_u64_rem() for another example of this.
+ */
+ asm("" : "+rm" (rq->age_stamp));
+ rq->age_stamp += period;
+ rq->rt_avg /= 2;
+ }
+}
+
void sched_avg_update(struct rq *rq)
{
s64 period = sched_avg_period();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6dd8ba..1b487bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4607,6 +4607,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
* see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
* term.
*/
+DEFINE_PER_CPU(int, cpu_lodd);
static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
unsigned long pending_updates)
{
@@ -4647,7 +4648,11 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
}
- sched_avg_update(this_rq);
+ if (__this_cpu_read(cpu_lodd) % 2)
+ sched_avg_update(this_rq);
+ else
+ sched_avg_update_old(this_rq);
+ __this_cpu_add(cpu_lodd, 1);
}
/* Used instead of source_load when we know the type == 0 */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..b7e197b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1427,6 +1427,7 @@ static inline int hrtick_enabled(struct rq *rq)
#endif /* CONFIG_SCHED_HRTICK */
#ifdef CONFIG_SMP
+extern void sched_avg_update_old(struct rq *rq);
extern void sched_avg_update(struct rq *rq);
#ifndef arch_scale_freq_capacity
@@ -1448,10 +1449,15 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
}
#endif
+DECLARE_PER_CPU(int, cpu_lodd);
static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
{
rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
- sched_avg_update(rq);
+ if (__this_cpu_read(cpu_lodd) % 2)
+ sched_avg_update(rq);
+ else
+ sched_avg_update_old(rq);
+ __this_cpu_add(cpu_lodd, 1);
}
#else
static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
--
2.7.0