* [PATCH 0/3] sched: Enhance/fix sched_avg_update()
From: Frederic Weisbecker @ 2016-06-14 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner

Hi,

Here is a fix for the sched_time_avg_ms sysctl value and an enhancement
for nohz scenarios.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	sched/core

HEAD: 8e6e088b73c2172c45cc3aadab3d4971ae47215b

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      sched: Introduce sched_time_avg_ns minimal value
      sched: Unloop sched avg decaying
      Debug: Alternate old/new version of sched_avg_update calls for profiling


 kernel/sched/core.c  | 18 +++++++++++++++++-
 kernel/sched/fair.c  |  7 ++++++-
 kernel/sched/sched.h |  8 +++++++-
 kernel/sysctl.c      |  3 ++-
 4 files changed, 32 insertions(+), 4 deletions(-)


* [PATCH 1/3] sched: Introduce sched_time_avg_ns minimal value
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner

Writing 0 to sysctl value "kernel.sched_time_avg_ms" triggers a lockup:

	NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
	Modules linked in:
	irq event stamp: 81894
	hardirqs last  enabled at (81893): [<ffffffff8111af88>] rcu_idle_exit+0x68/0xa0
	hardirqs last disabled at (81894): [<ffffffff8113536e>] tick_nohz_idle_exit+0x2e/0x110
	softirqs last  enabled at (81870): [<ffffffff810ab8a1>] _local_bh_enable+0x21/0x50
	softirqs last disabled at (81869): [<ffffffff810ace2b>] irq_enter+0x4b/0x70
	CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.6.0-rc4+ #269
	Hardware name: MSI MS-7850/Z87-G41 PC Mate(MS-7850), BIOS V1.3 08/18/2013
	0000000000000000 ffff88021fb05b80 ffffffff8141b023 0000000000000000
	0000000000000004 ffff88021fb05ba0 ffffffff81163fcf 0000000000000000
	ffff88021fb05c40 ffff88021fb05be0 ffffffff8119ae7c 00000001810077ce
	Call Trace:
	<NMI>  [<ffffffff8141b023>] dump_stack+0x85/0xc2
	[<ffffffff81163fcf>] watchdog_overflow_callback+0x13f/0x160
	[<ffffffff8119ae7c>] __perf_event_overflow+0x9c/0x1f0
	[<ffffffff8119baa4>] perf_event_overflow+0x14/0x20
	[<ffffffff8100d235>] intel_pmu_handle_irq+0x1d5/0x4a0
	[<ffffffff810062ad>] perf_event_nmi_handler+0x2d/0x50
	[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
	[<ffffffff81039cef>] nmi_handle+0xbf/0x2f0
	[<ffffffff81039c35>] ? nmi_handle+0x5/0x2f0
	[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
	[<ffffffff8103a171>] default_do_nmi+0x71/0x1b0
	[<ffffffff8103a3c5>] do_nmi+0x115/0x170
	[<ffffffff8194b511>] end_repeat_nmi+0x1a/0x1e
	[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
	[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
	[<ffffffff81174397>] ? tracing_generic_entry_update+0x97/0xb0
	<<EOE>>  [<ffffffff81177d3f>] ? trace_buffer_lock_reserve+0x3f/0x60
	[<ffffffff810d3d51>] ? sched_avg_update+0x51/0xc0
	[<ffffffff811781e5>] __trace_bputs+0x75/0x100
	[<ffffffff810d3d79>] sched_avg_update+0x79/0xc0
	[<ffffffff810dedc3>] cpu_load_update+0xa3/0xd0
	[<ffffffff810e694a>] cpu_load_update_nohz_stop+0x7a/0x80
	[<ffffffff811353b7>] tick_nohz_idle_exit+0x77/0x110
	[<ffffffff810f0866>] cpu_startup_entry+0x176/0x420
	[<ffffffff810589f4>] start_secondary+0x104/0x110

This is due to the loop in sched_avg_update() failing to make any
progress when the update period is 0.

Since setting that value to 0 makes no sense anyway, enforce a minimum
value of 1 millisecond.
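
For illustration, here is a minimal user-space sketch (not kernel code;
the variables are stand-ins for the rq fields) of why a zero period
keeps the catch-up loop from ever terminating:

	/*
	 * Failure-mode sketch (user-space): with period == 0,
	 * age_stamp never advances, so the loop condition stays true
	 * forever. A cap is added so this demo itself terminates.
	 */
	#include <stdio.h>

	int main(void)
	{
		unsigned long long clock = 1000000;	/* stand-in for rq_clock(rq) */
		unsigned long long age_stamp = 0;	/* stand-in for rq->age_stamp */
		long long period = 0;			/* sched_avg_period() with the sysctl set to 0 */
		unsigned long long rt_avg = 1024;
		int iterations = 0;

		while ((long long)(clock - age_stamp) > period) {
			age_stamp += period;		/* += 0: no progress is made */
			rt_avg /= 2;
			if (++iterations >= 10)		/* cap so the sketch stops */
				break;
		}
		printf("still %llu ns behind after %d iterations, rt_avg=%llu\n",
		       clock - age_stamp, iterations, rt_avg);
		return 0;
	}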

Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sysctl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87b2fc3..b730dd6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -345,7 +345,8 @@ static struct ctl_table kern_table[] = {
 		.data		= &sysctl_sched_time_avg,
 		.maxlen		= sizeof(unsigned int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
 	},
 	{
 		.procname	= "sched_shares_window_ns",
-- 
2.7.0


* [PATCH 2/3] sched: Unloop sched avg decaying
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner

The code that periodically (every 0.5 seconds) decays the scheduler's
rt average catches up with lost updates within a loop. This is fine when
CPUs sleep in dynticks mode for short periods of time, as they only
miss a few updates that are caught up through a few iterations. But CPUs
can sleep for arbitrarily long periods of time, leading to an unbounded
number of loop iterations to catch up. As a result, the computation time
of sched_avg_update() can induce latency issues.

This patch converts the loop into a single division. Testing under idle
load has shown positive results. Below is a function profile comparison
between the upstream version (renamed sched_avg_update_old) and the new
version (sched_avg_update), with both being called alternately.

The new version can be up to twice as fast on average.

	Function                                         Hit    Time            Avg             s^2
	--------                                         ---    ----            ---             ---
	trace_stat/function0:  sched_avg_update_old   135120    47612.36 us     0.352 us        1049.606 us
	trace_stat/function0:  sched_avg_update       135120    42668.44 us     0.315 us        1617.970 us
	trace_stat/function1:  sched_avg_update_old   132074    46618.41 us     0.352 us        3653.566 us
	trace_stat/function1:  sched_avg_update       132074    41356.76 us     0.313 us        3161.585 us
	trace_stat/function2:  sched_avg_update_old   121335    43381.10 us     0.357 us        1262.240 us
	trace_stat/function2:  sched_avg_update       121335    38240.72 us     0.315 us        970.400 us
	trace_stat/function3:  sched_avg_update_old    37563    17640.67 us     0.469 us        148.206 us
	trace_stat/function3:  sched_avg_update        37563    11059.65 us     0.294 us        126.548 us
	trace_stat/function4:  sched_avg_update_old    24200    13557.05 us     0.560 us        57.124 us
	trace_stat/function4:  sched_avg_update        24200    6687.281 us     0.276 us        9.528 us
	trace_stat/function5:  sched_avg_update_old    28572    15158.76 us     0.530 us        674.049 us
	trace_stat/function5:  sched_avg_update        28573    8012.361 us     0.280 us        181.687 us
	trace_stat/function6:  sched_avg_update_old    23424    12987.64 us     0.554 us        27.639 us
	trace_stat/function6:  sched_avg_update        23425    6264.433 us     0.267 us        2.965 us
	trace_stat/function7:  sched_avg_update_old    22192    13083.69 us     0.589 us        25.605 us
	trace_stat/function7:  sched_avg_update        22191    5947.785 us     0.268 us        1.748 us

Below is a snapshot of the same function profile under full busy load
(perf bench sched messaging). The old and new versions show roughly the
same results there, with a tiny 2 or 3 ns advantage for the old version,
which shouldn't matter much as the function is only called every 0.5
seconds.

	Function                                         Hit    Time            Avg             s^2
	--------                                         ---    ----            ---             ---
	trace_stat/function0:  sched_avg_update        106699    8029.961 us     0.075 us        0.244 us
	trace_stat/function0:  sched_avg_update_old    106698    7852.948 us     0.073 us        0.313 us
	trace_stat/function1:  sched_avg_update        106547    8066.833 us     0.075 us        1.256 us
	trace_stat/function1:  sched_avg_update_old    106547    7794.896 us     0.073 us        1.521 us
	trace_stat/function2:  sched_avg_update        106527    8049.326 us     0.075 us        1.141 us
	trace_stat/function2:  sched_avg_update_old    106528    7818.155 us     0.073 us        1.052 us
	trace_stat/function3:  sched_avg_update        106534    8056.079 us     0.075 us        0.342 us
	trace_stat/function3:  sched_avg_update_old    106535    7815.416 us     0.073 us        0.369 us
	trace_stat/function4:  sched_avg_update        106433    8090.462 us     0.076 us        5.359 us
	trace_stat/function4:  sched_avg_update_old    106433    7879.694 us     0.074 us        0.433 us
	trace_stat/function5:  sched_avg_update        106426    8127.800 us     0.076 us        1.304 us
	trace_stat/function5:  sched_avg_update_old    106425    7854.538 us     0.073 us        2.466 us
	trace_stat/function6:  sched_avg_update        106436    8067.921 us     0.075 us        0.257 us
	trace_stat/function6:  sched_avg_update_old    106436    7830.492 us     0.073 us        0.334 us
	trace_stat/function7:  sched_avg_update        106427    8135.786 us     0.076 us        13.181 us
	trace_stat/function7:  sched_avg_update_old    106428    7940.925 us     0.074 us        0.982 us

In conclusion, the new version behaves roughly the same under busy load
but is faster and has much more bounded behaviour under idle load.
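
For reference, below is a minimal user-space sketch (not kernel code;
the period and delta values are made up) checking, for one example
delta, that the single division plus right shift gives the same
age_stamp advance and rt_avg decay as the old per-period loop:

	/*
	 * Equivalence sketch (user-space): decaying rt_avg once per
	 * elapsed period equals one right shift by the number of
	 * pending periods, and age_stamp advances by pending * period,
	 * i.e. delta - (delta % period).
	 */
	#include <assert.h>
	#include <stdio.h>

	int main(void)
	{
		long long period = 250000000LL;		/* made up: 0.5s sysctl / 2, in ns */
		long long delta = 7 * period + 12345;	/* made up: long nohz sleep */
		unsigned long long rt_avg = 1ULL << 20;

		/* Old behaviour: catch up one period per iteration. */
		long long stamp_loop = 0, d = delta;
		unsigned long long avg_loop = rt_avg;

		while (d > period) {
			stamp_loop += period;
			d -= period;
			avg_loop /= 2;
		}

		/* New behaviour: one division, then a shift. */
		long long rem = delta % period;
		int pending = delta / period;
		long long stamp_div = delta - rem;
		unsigned long long avg_div = rt_avg >> pending;

		assert(stamp_loop == stamp_div && avg_loop == avg_div);
		printf("pending=%d age_stamp advance=%lld rt_avg=%llu\n",
		       pending, stamp_div, avg_div);
		return 0;
	}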

Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 385c947..0c0578a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
+	s64 delta;
+	u64 rem;
+	int pending;
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (rq->age_stamp));
-		rq->age_stamp += period;
-		rq->rt_avg /= 2;
-	}
+	delta = (s64)(rq_clock(rq) - rq->age_stamp);
+	if (delta <= period)
+		return;
+
+	pending = div64_u64_rem(delta, period, &rem);
+	rq->age_stamp += delta - rem;
+	rq->rt_avg >>= pending;
 }
 
 #endif /* CONFIG_SMP */
-- 
2.7.0


* [NOT-FOR-MERGE-PATCH 3/3] Debug: Alternate old/new version of sched_avg_update calls for profiling
From: Frederic Weisbecker @ 2016-06-14 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Ingo Molnar, Mike Galbraith, Thomas Gleixner

This way we can compare both versions with the function profiler.

	trace_stat/function0:  sched_avg_update_old                   764    223.936 us      0.293 us        0.017 us
	trace_stat/function0:  sched_avg_update                       764    199.818 us      0.261 us        0.006 us
	trace_stat/function1:  sched_avg_update_old                   252    99.225 us       0.393 us        0.106 us
	trace_stat/function1:  sched_avg_update                       252    59.080 us       0.234 us        0.008 us
	trace_stat/function2:  sched_avg_update_old                   606    202.465 us      0.334 us        0.007 us
	trace_stat/function2:  sched_avg_update                       605    184.083 us      0.304 us        0.002 us
	trace_stat/function3:  sched_avg_update_old                   902    261.790 us      0.290 us        0.010 us
	trace_stat/function3:  sched_avg_update                       901    238.253 us      0.264 us        0.004 us
	trace_stat/function4:  sched_avg_update_old                   120    50.391 us       0.419 us        0.095 us
	trace_stat/function4:  sched_avg_update                       120    35.947 us       0.299 us        0.004 us
	trace_stat/function5:  sched_avg_update_old                   581    195.280 us      0.336 us        0.011 us
	trace_stat/function5:  sched_avg_update                       582    173.594 us      0.298 us        0.003 us
	trace_stat/function6:  sched_avg_update_old                   202    45.201 us       0.223 us        0.049 us
	trace_stat/function6:  sched_avg_update                       201    37.892 us       0.188 us        0.013 us
	trace_stat/function7:  sched_avg_update_old                   200    80.731 us       0.403 us        0.477 us
	trace_stat/function7:  sched_avg_update                       200    41.759 us       0.208 us        0.012 us

Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Not-Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  | 16 ++++++++++++++++
 kernel/sched/fair.c  |  7 ++++++-
 kernel/sched/sched.h |  8 +++++++-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c0578a..5410e1bb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -663,6 +663,22 @@ bool sched_can_stop_tick(struct rq *rq)
 }
 #endif /* CONFIG_NO_HZ_FULL */
 
+void sched_avg_update_old(struct rq *rq)
+{
+	s64 period = sched_avg_period();
+
+	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+		/*
+		 * Inline assembly required to prevent the compiler
+		 * optimising this loop into a divmod call.
+		 * See __iter_div_u64_rem() for another example of this.
+		 */
+		asm("" : "+rm" (rq->age_stamp));
+		rq->age_stamp += period;
+		rq->rt_avg /= 2;
+	}
+}
+
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6dd8ba..1b487bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4607,6 +4607,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term.
  */
+DEFINE_PER_CPU(int, cpu_lodd);
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			    unsigned long pending_updates)
 {
@@ -4647,7 +4648,11 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
 	}
 
-	sched_avg_update(this_rq);
+	if (__this_cpu_read(cpu_lodd) % 2)
+		sched_avg_update(this_rq);
+	else
+		sched_avg_update_old(this_rq);
+	__this_cpu_add(cpu_lodd, 1);
 }
 
 /* Used instead of source_load when we know the type == 0 */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 72f1f30..b7e197b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1427,6 +1427,7 @@ static inline int hrtick_enabled(struct rq *rq)
 #endif /* CONFIG_SCHED_HRTICK */
 
 #ifdef CONFIG_SMP
+extern void sched_avg_update_old(struct rq *rq);
 extern void sched_avg_update(struct rq *rq);
 
 #ifndef arch_scale_freq_capacity
@@ -1448,10 +1449,15 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 }
 #endif
 
+DECLARE_PER_CPU(int, cpu_lodd);
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
 	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
-	sched_avg_update(rq);
+	if (__this_cpu_read(cpu_lodd) % 2)
+		sched_avg_update(rq);
+	else
+		sched_avg_update_old(rq);
+	__this_cpu_add(cpu_lodd, 1);
 }
 #else
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
-- 
2.7.0


* Re: [PATCH 2/3] sched: Unloop sched avg decaying
From: Peter Zijlstra @ 2016-06-14 15:58 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner

On Tue, Jun 14, 2016 at 05:28:01PM +0200, Frederic Weisbecker wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 385c947..0c0578a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
>  void sched_avg_update(struct rq *rq)
>  {
>  	s64 period = sched_avg_period();
> +	s64 delta;
> +	u64 rem;
> +	int pending;
>  
> +	delta = (s64)(rq_clock(rq) - rq->age_stamp);
> +	if (delta <= period)
> +		return;
> +
> +	pending = div64_u64_rem(delta, period, &rem);
> +	rq->age_stamp += delta - rem;
> +	rq->rt_avg >>= pending;
>  }

Blergh, and now do the profile on a machine that doesn't have a major
transistor count dedicated to divisions.

Why not add the division to the nohz exit path only?


* Re: [PATCH 2/3] sched: Unloop sched avg decaying
From: Frederic Weisbecker @ 2016-06-30 12:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner

On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 14, 2016 at 05:28:01PM +0200, Frederic Weisbecker wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 385c947..0c0578a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -666,17 +666,17 @@ bool sched_can_stop_tick(struct rq *rq)
> >  void sched_avg_update(struct rq *rq)
> >  {
> >  	s64 period = sched_avg_period();
> > +	s64 delta;
> > +	u64 rem;
> > +	int pending;
> >  
> > +	delta = (s64)(rq_clock(rq) - rq->age_stamp);
> > +	if (delta <= period)
> > +		return;
> > +
> > +	pending = div64_u64_rem(delta, period, &rem);
> > +	rq->age_stamp += delta - rem;
> > +	rq->rt_avg >>= pending;
> >  }
> 
> Blergh, and now do the profile on a machine that doesn't have a major
> transistor count dedicated to divisions.

Indeed, I was afraid of something like that. That said, I'm not sure
which is worse between iteration and division out of native asm.

Would it be worth testing?

> 
> Why not add the division to the nohz exit path only?

It would be worse I think because we may exit much more often from nohz
than we reach a sched_avg_period().

So the only safe optimization I can do for now is:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4632d31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -626,8 +626,9 @@ bool sched_can_stop_tick(struct rq *rq)
 void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
+	int i;
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+	for (i = 0; (s64)(rq_clock(rq) - rq->age_stamp) > period; i++) {
 		/*
 		 * Inline assembly required to prevent the compiler
 		 * optimising this loop into a divmod call.
@@ -635,8 +636,8 @@ void sched_avg_update(struct rq *rq)
 		 */
 		asm("" : "+rm" (rq->age_stamp));
 		rq->age_stamp += period;
-		rq->rt_avg /= 2;
 	}
+	rq->rt_avg >>= i;
 }
 
 #endif /* CONFIG_SMP */


* Re: [PATCH 2/3] sched: Unloop sched avg decaying
From: Peter Zijlstra @ 2016-06-30 13:20 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner

On Thu, Jun 30, 2016 at 02:52:26PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:

> > Why not add the division to the nohz exit path only?
> 
> It would be worse I think because we may exit much more often from nohz
> than we reach a sched_avg_period().
> 
> So the only safe optimization I can do for now is:

How about something like this then?

---

 kernel/sched/core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3387e4f14fc9..fd1ae4c4105f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -665,9 +665,23 @@ bool sched_can_stop_tick(struct rq *rq)
 
 void sched_avg_update(struct rq *rq)
 {
-	s64 period = sched_avg_period();
+	s64 delta, period = sched_avg_period();
 
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
+	delta = (s64)(rq_clock(rq) - rq->age_stamp);
+	if (likely(delta < period))
+		return;
+
+	if (unlikely(delta > 3*period)) {
+		int pending;
+		u64 rem;
+
+		pending = div64_u64_rem(delta, period, &rem);
+		rq->age_stamp += delta - rem;
+		rq->rt_avg >>= pending;
+		return;
+	}
+
+	while (delta > period) {
 		/*
 		 * Inline assembly required to prevent the compiler
 		 * optimising this loop into a divmod call.
@@ -675,6 +689,7 @@ void sched_avg_update(struct rq *rq)
 		 */
 		asm("" : "+rm" (rq->age_stamp));
 		rq->age_stamp += period;
+		delta -= period;
 		rq->rt_avg /= 2;
 	}
 }


* Re: [PATCH 2/3] sched: Unloop sched avg decaying
From: Frederic Weisbecker @ 2016-07-06 11:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Mike Galbraith, Thomas Gleixner

On Thu, Jun 30, 2016 at 03:20:37PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 30, 2016 at 02:52:26PM +0200, Frederic Weisbecker wrote:
> > On Tue, Jun 14, 2016 at 05:58:42PM +0200, Peter Zijlstra wrote:
> 
> > > Why not add the division to the nohz exit path only?
> > 
> > It would be worse I think because we may exit much more often from nohz
> > than we reach a sched_avg_period().
> > 
> > So the only safe optimization I can do for now is:
> 
> How about something like this then?
> 
> ---
> 
>  kernel/sched/core.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3387e4f14fc9..fd1ae4c4105f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -665,9 +665,23 @@ bool sched_can_stop_tick(struct rq *rq)
>  
>  void sched_avg_update(struct rq *rq)
>  {
> -	s64 period = sched_avg_period();
> +	s64 delta, period = sched_avg_period();
>  
> -	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
> +	delta = (s64)(rq_clock(rq) - rq->age_stamp);
> +	if (likely(delta < period))
> +		return;
> +
> +	if (unlikely(delta > 3*period)) {
> +		int pending;
> +		u64 rem;
> +
> +		pending = div64_u64_rem(delta, period, &rem);
> +		rq->age_stamp += delta - rem;
> +		rq->rt_avg >>= pending;
> +		return;
> +	}
> +
> +	while (delta > period) {
>  		/*
>  		 * Inline assembly required to prevent the compiler
>  		 * optimising this loop into a divmod call.
> @@ -675,6 +689,7 @@ void sched_avg_update(struct rq *rq)
>  		 */
>  		asm("" : "+rm" (rq->age_stamp));
>  		rq->age_stamp += period;
> +		delta -= period;
>  		rq->rt_avg /= 2;
>  	}
>  }

Makes sense. I'm going to do some tests. We might want to precompute 3*period.

Thanks.

