* [PATCH 0/3] sched: Enhance/fix sched_avg_update()
@ 2016-06-14 15:27 Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value Frederic Weisbecker
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-14 15:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
Hi,
Here is a fix for the sched_time_avg_ms sysctl value and an enhancement for
nohz scenarios.
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
sched/core
HEAD: 8e6e088b73c2172c45cc3aadab3d4971ae47215b
Thanks,
Frederic
---
Frederic Weisbecker (3):
sched: Introduce sched_time_avg_ns minimal value
sched: Unloop sched avg decaying
Debug: Alternate old/new version of sched_avg_update calls for profiling
kernel/sched/core.c | 18 +++++++++++++++++-
kernel/sched/fair.c | 7 ++++++-
kernel/sched/sched.h | 8 +++++++-
kernel/sysctl.c | 3 ++-
4 files changed, 32 insertions(+), 4 deletions(-)
* [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value
2016-06-14 15:27 [PATCH 0/3] sched: Enhance/fix sched_avg_update() Frederic Weisbecker
@ 2016-06-14 15:28 ` Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 2/3] sched: Unloop sched avg decaying Frederic Weisbecker
2016-06-14 15:28 ` [NOT-FOR-MERGE-PATCH 3/3] Debug: Alternate old/new version of sched_avg_update calls for profiling Frederic Weisbecker
2 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
Writing 0 to sysctl value "kernel.sched_time_avg_ms" triggers a lockup:
NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
Modules linked in:
irq event stamp: 81894
hardirqs last enabled at (81893): [<ffffffff8111af88>] rcu_idle_exit+0x68/0xa0
hardirqs last disabled at (81894): [<ffffffff8113536e>] tick_nohz_idle_exit+0x2e/0x110
softirqs last enabled at (81870): [<ffffffff810ab8a1>] _local_bh_enable+0x21/0x50
softirqs last disabled at (81869): [<ffffffff810ace2b>] irq_enter+0x4b/0x70
CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.6.0-rc4+ #269
Hardware name: MSI MS-7850/Z87-G41 PC Mate(MS-7850), BIOS V1.3 08/18/2013
0000000000000000 ffff88021fb05b80 ffffffff8141b023 0000000000000000
0000000000000004 ffff88021fb05ba0 ffffffff81163fcf 0000000000000000
ffff88021fb05c40 ffff88021fb05be0 ffffffff8119ae7c 00000001810077ce
Call Trace:
<NMI> [<ffffffff8141b023>] dump_stack+0x85/0xc2
[<ffffffff81163fcf>] watchdog_overflow_callback+0x13f/0x160
[<ffffffff8119ae7c>] __perf_event_overflow+0x9c/0x1f0
[<ffffffff8119baa4>] perf_event_overflow+0x14/0x20
[<ffffffff8100d235>] intel_pmu_handle_irq+0x1d5/0x4a0
[<ffffffff810062ad>] perf_event_nmi_handler+0x2d/0x50
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff81039cef>] nmi_handle+0xbf/0x2f0
[<ffffffff81039c35>] ? nmi_handle+0x5/0x2f0
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff8103a171>] default_do_nmi+0x71/0x1b0
[<ffffffff8103a3c5>] do_nmi+0x115/0x170
[<ffffffff8194b511>] end_repeat_nmi+0x1a/0x1e
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
<<EOE>> [<ffffffff81177d3f>] ? trace_buffer_lock_reserve+0x3f/0x60
[<ffffffff810d3d51>] ? sched_avg_update+0x51/0xc0
[<ffffffff811781e5>] __trace_bputs+0x75/0x100
[<ffffffff810d3d79>] sched_avg_update+0x79/0xc0
[<ffffffff810dedc3>] cpu_load_update+0xa3/0xd0
[<ffffffff810e694a>] cpu_load_update_nohz_stop+0x7a/0x80
[<ffffffff811353b7>] tick_nohz_idle_exit+0x77/0x110
[<ffffffff810f0866>] cpu_startup_entry+0x176/0x420
[<ffffffff810589f4>] start_secondary+0x104/0x110
This is due to the loop in sched_avg_update() failing to make any
progress when the update period is 0.
Since it makes no sense to set that value to 0 anyway, enforce a minimum
of 1 millisecond.
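For illustration, here is a minimal userspace model of the pre-fix loop (names
and values are made up for the sketch; this is not the kernel code). With a
zero period, age_stamp never advances, so the loop condition stays true and
the CPU spins with interrupts off until the watchdog fires:
#include <stdint.h>
#include <stdio.h>
/* Hypothetical model of the decay loop with kernel.sched_time_avg_ms == 0. */
int main(void)
{
        uint64_t clock = 1000000;       /* stands in for rq_clock(rq), frozen during the update */
        uint64_t age_stamp = 0;
        uint64_t rt_avg = 1024;
        int64_t period = 0;             /* stands in for sched_avg_period() with a 0 sysctl */

        while ((int64_t)(clock - age_stamp) > period) {
                age_stamp += period;    /* += 0: never catches up with clock */
                rt_avg /= 2;            /* decays to 0, then spins forever */
        }

        /* Never reached when period == 0: this models the hard lockup above. */
        printf("caught up, rt_avg=%llu\n", (unsigned long long)rt_avg);
        return 0;
}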
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/sysctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87b2fc3..b730dd6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -345,7 +345,8 @@ static struct ctl_table kern_table[] = {
 		.data		= &sysctl_sched_time_avg,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
 	},
 	{
 		.procname	= "sched_shares_window_ns",
--
2.7.0
* [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-14 15:27 [PATCH 0/3] sched: Enhance/fix sched_avg_update() Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value Frederic Weisbecker
@ 2016-06-14 15:28 ` Frederic Weisbecker
2016-06-14 15:58 ` Peter Zijlstra
2016-06-14 15:28 ` [NOT-FOR-MERGE-PATCH 3/3] Debug: Alternate old/new version of sched_avg_update calls for profiling Frederic Weisbecker
2 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
The code that periodically decays the sched rt avg (every 0.5 seconds)
catches up with lost updates within a loop. This is fine when CPUs sleep
in dynticks mode for short periods of time, as they only miss a few
updates that are caught up through a few iterations. But CPUs can sleep
for arbitrarily long periods, leading to an unbounded number of loop
iterations to catch up. As a result, the computation time of
sched_avg_update() can induce latency issues.
This patch converts the loop into a single division. Testing on idle
load has shown positive results. Below is a function profile comparison
between the upstream version (renamed sched_avg_update_old) and the new
version (sched_avg_update), with both being called alternately.
The new version can be up to 2 times faster on average.
Function Hit Time Avg s^2
-------- --- ---- --- ---
trace_stat/function0: sched_avg_update_old 135120 47612.36 us 0.352 us 1049.606 us
trace_stat/function0: sched_avg_update 135120 42668.44 us 0.315 us 1617.970 us
trace_stat/function1: sched_avg_update_old 132074 46618.41 us 0.352 us 3653.566 us
trace_stat/function1: sched_avg_update 132074 41356.76 us 0.313 us 3161.585 us
trace_stat/function2: sched_avg_update_old 121335 43381.10 us 0.357 us 1262.240 us
trace_stat/function2: sched_avg_update 121335 38240.72 us 0.315 us 970.400 us
trace_stat/function3: sched_avg_update_old 37563 17640.67 us 0.469 us 148.206 us
trace_stat/function3: sched_avg_update 37563 11059.65 us 0.294 us 126.548 us
trace_stat/function4: sched_avg_update_old 24200 13557.05 us 0.560 us 57.124 us
trace_stat/function4: sched_avg_update 24200 6687.281 us 0.276 us 9.528 us
trace_stat/function5: sched_avg_update_old 28572 15158.76 us 0.530 us 674.049 us
trace_stat/function5: sched_avg_update 28573 8012.361 us 0.280 us 181.687 us
trace_stat/function6: sched_avg_update_old 23424 12987.64 us 0.554 us 27.639 us
trace_stat/function6: sched_avg_update 23425 6264.433 us 0.267 us 2.965 us
trace_stat/function7: sched_avg_update_old 22192 13083.69 us 0.589 us 25.605 us
trace_stat/function7: sched_avg_update 22191 5947.785 us 0.268 us 1.748 us
Below is a snapshot of the same function profile under full busy load
(perf bench sched messaging). The old and new versions show roughly the
same results there, with a tiny 2 or 3 ns advantage for the old version,
which shouldn't matter much given that the decay period is 0.5 seconds.
Function Hit Time Avg s^2
-------- --- ---- --- ---
trace_stat/function0: sched_avg_update 106699 8029.961 us 0.075 us 0.244 us
trace_stat/function0: sched_avg_update_old 106698 7852.948 us 0.073 us 0.313 us
trace_stat/function1: sched_avg_update 106547 8066.833 us 0.075 us 1.256 us
trace_stat/function1: sched_avg_update_old 106547 7794.896 us 0.073 us 1.521 us
trace_stat/function2: sched_avg_update 106527 8049.326 us 0.075 us 1.141 us
trace_stat/function2: sched_avg_update_old 106528 7818.155 us 0.073 us 1.052 us
trace_stat/function3: sched_avg_update 106534 8056.079 us 0.075 us 0.342 us
trace_stat/function3: sched_avg_update_old 106535 7815.416 us 0.073 us 0.369 us
trace_stat/function4: sched_avg_update 106433 8090.462 us 0.076 us 5.359 us
trace_stat/function4: sched_avg_update_old 106433 7879.694 us 0.074 us 0.433 us
trace_stat/function5: sched_avg_update 106426 8127.800 us 0.076 us 1.304 us
trace_stat/function5: sched_avg_update_old 106425 7854.538 us 0.073 us 2.466 us
trace_stat/function6: sched_avg_update 106436 8067.921 us 0.075 us 0.257 us
trace_stat/function6: sched_avg_update_old 106436 7830.492 us 0.073 us 0.334 us
trace_stat/function7: sched_avg_update 106427 8135.786 us 0.076 us 13.181 us
trace_stat/function7: sched_avg_update_old 106428 7940.925 us 0.074 us 0.982 us
In conclusion, the new version behaves roughly the same under busy load
but is faster and much better bounded under idle load.
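For intuition, catching up N whole periods by halving rt_avg once per period
is equivalent to a single right shift by the quotient of delta / period, which
is what div64_u64_rem() provides in the new version. Below is a minimal
userspace sketch of that equivalence (values are illustrative; this is not the
kernel code):
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
int main(void)
{
        const uint64_t period = 500000000ULL;           /* 0.5s in ns */
        const uint64_t delta = 7 * period + 123456;     /* CPU idled ~3.5s */
        uint64_t rt_avg_loop = 1 << 20, rt_avg_div = 1 << 20;
        uint64_t caught_up_loop = 0, caught_up_div;

        /* Old behaviour: catch up one period per iteration. */
        while (delta - caught_up_loop > period) {
                caught_up_loop += period;
                rt_avg_loop /= 2;
        }

        /* New behaviour: one division, then shift by the quotient. */
        uint64_t rem = delta % period;
        int pending = (int)(delta / period);
        caught_up_div = delta - rem;
        rt_avg_div >>= pending;

        assert(caught_up_loop == caught_up_div);
        assert(rt_avg_loop == rt_avg_div);
        printf("pending=%d rt_avg=%llu\n", pending, (unsigned long long)rt_avg_div);
        return 0;
}
Note that the strict '>' in the old loop means an exact multiple of the period
decays one step less than the division does; presumably negligible for an
estimate that is halved every period anyway.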
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/sched/core.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 385c947..0c0578a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
+	s64 delta;
+	u64 rem;
+	int pending;
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (rq->age_stamp));
-		rq->age_stamp += period;
-		rq->rt_avg /= 2;
-	}
+	delta = (s64)(rq_clock(rq) - rq->age_stamp);
+	if (delta <= period)
+		return;
+
+	pending = div64_u64_rem(delta, period, &rem);
+	rq->age_stamp += delta - rem;
+	rq->rt_avg >>= pending;
 }
 
 #endif /* CONFIG_SMP */
--
2.7.0
* [NOT-FOR-MERGE-PATCH 3/3] Debug: Alternate old/new version of sched_avg_update calls for profiling
2016-06-14 15:27 [PATCH 0/3] sched: Enhance/fix sched_avg_update() Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value Frederic Weisbecker
2016-06-14 15:28 ` [PATCH 2/3] sched: Unloop sched avg decaying Frederic Weisbecker
@ 2016-06-14 15:28 ` Frederic Weisbecker
2 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner
This way we can measure with the function profiler.
trace_stat/function0: sched_avg_update_old 764 223.936 us 0.293 us 0.017 us
trace_stat/function0: sched_avg_update 764 199.818 us 0.261 us 0.006 us
trace_stat/function1: sched_avg_update_old 252 99.225 us 0.393 us 0.106 us
trace_stat/function1: sched_avg_update 252 59.080 us 0.234 us 0.008 us
trace_stat/function2: sched_avg_update_old 606 202.465 us 0.334 us 0.007 us
trace_stat/function2: sched_avg_update 605 184.083 us 0.304 us 0.002 us
trace_stat/function3: sched_avg_update_old 902 261.790 us 0.290 us 0.010 us
trace_stat/function3: sched_avg_update 901 238.253 us 0.264 us 0.004 us
trace_stat/function4: sched_avg_update_old 120 50.391 us 0.419 us 0.095 us
trace_stat/function4: sched_avg_update 120 35.947 us 0.299 us 0.004 us
trace_stat/function5: sched_avg_update_old 581 195.280 us 0.336 us 0.011 us
trace_stat/function5: sched_avg_update 582 173.594 us 0.298 us 0.003 us
trace_stat/function6: sched_avg_update_old 202 45.201 us 0.223 us 0.049 us
trace_stat/function6: sched_avg_update 201 37.892 us 0.188 us 0.013 us
trace_stat/function7: sched_avg_update_old 200 80.731 us 0.403 us 0.477 us
trace_stat/function7: sched_avg_update 200 41.759 us 0.208 us 0.012 us
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Not-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
kernel/sched/core.c | 16 ++++++++++++++++
kernel/sched/fair.c | 7 ++++++-
kernel/sched/sched.h | 8 +++++++-
3 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c0578a..5410e1bb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -663,6 +663,22 @@ bool sched_can_stop_tick(struct rq *rq)
 }
 #endif /* CONFIG_NO_HZ_FULL */
 
+void sched_avg_update_old(struct rq *rq)
+{
+	s64 period = sched_avg_period();
+
+	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+		/*
+		 * Inline assembly required to prevent the compiler
+		 * optimising this loop into a divmod call.
+		 * See __iter_div_u64_rem() for another example of this.
+		 */
+		asm("" : "+rm" (rq->age_stamp));
+		rq->age_stamp += period;
+		rq->rt_avg /= 2;
+	}
+}
+
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6dd8ba..1b487bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4607,6 +4607,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term.
  */
+DEFINE_PER_CPU(int, cpu_lodd);
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			    unsigned long pending_updates)
 {
@@ -4647,7 +4648,11 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
 	}
 
-	sched_avg_update(this_rq);
+	if (__this_cpu_read(cpu_lodd) % 2)
+		sched_avg_update(this_rq);
+	else
+		sched_avg_update_old(this_rq);
+	__this_cpu_add(cpu_lodd, 1);
 }
 
 /* Used instead of source_load when we know the type == 0 */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..b7e197b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1427,6 +1427,7 @@ static inline int hrtick_enabled(struct rq *rq)
 #endif /* CONFIG_SCHED_HRTICK */
 
 #ifdef CONFIG_SMP
+extern void sched_avg_update_old(struct rq *rq);
 extern void sched_avg_update(struct rq *rq);
 
 #ifndef arch_scale_freq_capacity
@@ -1448,10 +1449,15 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 }
 #endif
 
+DECLARE_PER_CPU(int, cpu_lodd);
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
 	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
-	sched_avg_update(rq);
+	if (__this_cpu_read(cpu_lodd) % 2)
+		sched_avg_update(rq);
+	else
+		sched_avg_update_old(rq);
+	__this_cpu_add(cpu_lodd, 1);
 }
 #else
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
--
2.7.0
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-14 15:28 ` [PATCH 2/3] sched: Unloop sched avg decaying Frederic Weisbecker
@ 2016-06-14 15:58 ` Peter Zijlstra
2016-06-30 12:52 ` Frederic Weisbecker
0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2016-06-14 15:58 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Tue, Jun 14, 2016 at 05:28:01PM +0200, Frederic Weisbecker wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 385c947..0c0578a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
> void sched_avg_update(struct rq *rq)
> {
> s64 period = sched_avg_period();
> + s64 delta;
> + u64 rem;
> + int pending;
>
> + delta = (s64)(rq_clock(rq) - rq->age_stamp);
> + if (delta <= period)
> + return;
> +
> + pending = div64_u64_rem(delta, period, &rem);
> + rq->age_stamp += delta - rem;
> + rq->rt_avg >>= pending;
> }
Blergh, and now do the profile on machine that doesn't have major
transistor count dedicated to divisions.
Why not add the division to the nohz exit path only?
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-14 15:58 ` Peter Zijlstra
@ 2016-06-30 12:52 ` Frederic Weisbecker
2016-06-30 13:20 ` Peter Zijlstra
0 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2016-06-30 12:52 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2016 at 05:28:01PM +0200, Frederic Weisbecker wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 385c947..0c0578a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
> > void sched_avg_update(struct rq *rq)
> > {
> > s64 period = sched_avg_period();
> > + s64 delta;
> > + u64 rem;
> > + int pending;
> >
> > + delta = (s64)(rq_clock(rq) - rq->age_stamp);
> > + if (delta <= period)
> > + return;
> > +
> > + pending = div64_u64_rem(delta, period, &rem);
> > + rq->age_stamp += delta - rem;
> > + rq->rt_avg >>= pending;
> > }
>
> Blergh, and now do the profile on machine that doesn't have major
> transistor count dedicated to divisions.
Indeed, I was afraid of something like that. That said, I'm not sure which
is worse: iterating, or a division without native asm support.
Would it be worth testing?
>
> Why not add the division to the nohz exit path only?
I think that would be worse, because we may exit nohz much more often
than we cross a sched_avg_period().
So the only safe optimization I can do for now is:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4632d31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -626,8 +626,9 @@ bool sched_can_stop_tick(struct rq *rq)
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
+	int i;
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+	for (i = 0; (s64)(rq_clock(rq) - rq->age_stamp) > period; i++) {
 		/*
 		 * Inline assembly required to prevent the compiler
 		 * optimising this loop into a divmod call.
@@ -635,8 +636,8 @@ void sched_avg_update(struct rq *rq)
 		 */
 		asm("" : "+rm" (rq->age_stamp));
 		rq->age_stamp += period;
-		rq->rt_avg /= 2;
 	}
+	rq->rt_avg >>= i;
 }
#endif /* CONFIG_SMP */
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-30 12:52 ` Frederic Weisbecker
@ 2016-06-30 13:20 ` Peter Zijlstra
2016-07-06 11:53 ` Frederic Weisbecker
0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2016-06-30 13:20 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Thu, Jun 30, 2016 at 02:52:26PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
> > Why not add the division to the nohz exit path only?
>
> It would be worse I think because we may exit much more often from nohz
> than we reach a sched_avg_period().
>
> So the only safe optimization I can do for now is:
How about something like this then?
---
kernel/sched/core.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3387e4f14fc9..fd1ae4c4105f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -665,9 +665,23 @@ bool sched_can_stop_tick(struct rq *rq)
 
 void sched_avg_update(struct rq *rq)
 {
-	s64 period = sched_avg_period();
+	s64 delta, period = sched_avg_period();
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+	delta = (s64)(rq_clock(rq) - rq->age_stamp);
+	if (likely(delta < period))
+		return;
+
+	if (unlikely(delta > 3*period)) {
+		int pending;
+		u64 rem;
+
+		pending = div64_u64_rem(delta, period, &rem);
+		rq->age_stamp += delta - rem;
+		rq->rt_avg >>= pending;
+		return;
+	}
+
+	while (delta > period) {
 		/*
 		 * Inline assembly required to prevent the compiler
 		 * optimising this loop into a divmod call.
@@ -675,6 +689,7 @@ void sched_avg_update(struct rq *rq)
 		 */
 		asm("" : "+rm" (rq->age_stamp));
 		rq->age_stamp += period;
+		delta -= period;
 		rq->rt_avg /= 2;
 	}
 }
* Re: [PATCH 2/3] sched: Unloop sched avg decaying
2016-06-30 13:20 ` Peter Zijlstra
@ 2016-07-06 11:53 ` Frederic Weisbecker
0 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2016-07-06 11:53 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner
On Thu, Jun 30, 2016 at 03:20:37PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 30, 2016 at 02:52:26PM +0200, Frederic Weisbecker wrote:
> > On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
>
> > > Why not add the division to the nohz exit path only?
> >
> > It would be worse I think because we may exit much more often from nohz
> > than we reach a sched_avg_period().
> >
> > So the only safe optimization I can do for now is:
>
> How about something like this then?
>
> ---
>
> kernel/sched/core.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3387e4f14fc9..fd1ae4c4105f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -665,9 +665,23 @@ bool sched_can_stop_tick(struct rq *rq)
>
> void sched_avg_update(struct rq *rq)
> {
> - s64 period = sched_avg_period();
> + s64 delta, period = sched_avg_period();
>
> - while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
> + delta = (s64)(rq_clock(rq) - rq->age_stamp);
> + if (likely(delta < period))
> + return;
> +
> + if (unlikely(delta > 3*period)) {
> + int pending;
> + u64 rem;
> +
> + pending = div64_u64_rem(delta, period, &rem);
> + rq->age_stamp += delta - rem;
> + rq->rt_avg >>= pending;
> + return;
> + }
> +
> + while (delta > period) {
> /*
> * Inline assembly required to prevent the compiler
> * optimising this loop into a divmod call.
> @@ -675,6 +689,7 @@ void sched_avg_update(struct rq *rq)
> */
> asm("" : "+rm" (rq->age_stamp));
> rq->age_stamp += period;
> + delta -= period;
> rq->rt_avg /= 2;
> }
> }
Makes sense. I'm going to run some tests. We might want to precompute 3*period, maybe.
Thanks.