* [PATCH] cpuidle: Change ktime_get() with local_clock()
@ 2016-04-14 19:23 ` Daniel Lezcano
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Lezcano @ 2016-04-14 19:23 UTC (permalink / raw)
  To: rjw; +Cc: peterz, mingo, open list:CPUIDLE DRIVERS, open list

ktime_get() can have a non-negligible overhead; use local_clock()
instead.

To compare ktime_get() and local_clock(), a quick hack was added that,
when triggered via debugfs, calls ktime_get() and local_clock() 10000
times each and measures the elapsed time.

The average, minimum and maximum are then computed for each call.

From userspace, the test above was run 100 times, at 2-second
intervals.

So ktime_get() and local_clock() were each called 1000000 times in
total.

The results are:

ktime_get():
============
 * average: 101 ns (stddev: 27.4)
 * maximum: 38313 ns
 * minimum: 65 ns

local_clock():
==============
 * average: 60 ns (stddev: 9.8)
 * maximum: 13487 ns
 * minimum: 46 ns

local_clock() is both faster and more stable.

Even if it is a drop in the ocean, replacing ktime_get() with
local_clock() saves about 80 ns per idle cycle (entry + exit). In
some circumstances, especially when several CPUs are racing for
access to the clock, it saves tens of microseconds.
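
For readers who want to reproduce the shape of this measurement, here is
a minimal userspace sketch. It is not the debugfs hack used for the
numbers above: it assumes clock_gettime(CLOCK_MONOTONIC) as a stand-in
for the kernel clocks, and the names lat_stats, now_ns(),
call_under_test() and measure() are hypothetical helpers introduced only
for illustration. It times each call individually and keeps the
minimum, maximum and average.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Running min/max/average of per-call latencies, in nanoseconds. */
struct lat_stats {
	uint64_t count;
	uint64_t sum;
	uint64_t min;
	uint64_t max;
};

static void lat_stats_init(struct lat_stats *s)
{
	s->count = 0;
	s->sum = 0;
	s->min = UINT64_MAX;
	s->max = 0;
}

static void lat_stats_add(struct lat_stats *s, uint64_t ns)
{
	s->count++;
	s->sum += ns;
	if (ns < s->min)
		s->min = ns;
	if (ns > s->max)
		s->max = ns;
}

static uint64_t lat_stats_avg(const struct lat_stats *s)
{
	return s->count ? s->sum / s->count : 0;
}

/* Assumption: userspace monotonic clock, not ktime_get()/local_clock(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical function under test: a single clock read. */
static void call_under_test(void)
{
	(void)now_ns();
}

/* Time `loops` individual calls to fn() and record each latency. */
static void measure(void (*fn)(void), unsigned int loops, struct lat_stats *s)
{
	unsigned int i;

	assert(fn && s);
	for (i = 0; i < loops; i++) {
		uint64_t t0 = now_ns();

		fn();
		lat_stats_add(s, now_ns() - t0);
	}
}
```

Calling lat_stats_init() and then measure(call_under_test, 10000, &s)
gives one batch comparable in size to a single debugfs trigger. The
timestamps around fn() add their own overhead to every sample, so the
absolute values are only meaningful when comparing two functions
measured the same way.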

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/cpuidle/cpuidle.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index f996efc..78447bc 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -173,7 +173,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 
 	struct cpuidle_state *target_state = &drv->states[index];
 	bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
-	ktime_t time_start, time_end;
+	u64 time_start, time_end;
 	s64 diff;
 
 	/*
@@ -195,13 +195,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	sched_idle_set_state(target_state);
 
 	trace_cpu_idle_rcuidle(index, dev->cpu);
-	time_start = ktime_get();
+	time_start = local_clock();
 
 	stop_critical_timings();
 	entered_state = target_state->enter(dev, drv, index);
 	start_critical_timings();
 
-	time_end = ktime_get();
+	time_end = local_clock();
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
 
 	/* The cpu is no longer idle or about to enter idle. */
@@ -217,7 +217,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	if (!cpuidle_state_is_coupled(drv, entered_state))
 		local_irq_enable();
 
-	diff = ktime_to_us(ktime_sub(time_end, time_start));
+	/*
+	 * local_clock() returns the time in nanosecond, let's shift
+	 * by 10 (divide by 1024) to have microsecond based time.
+	 */
+	diff = (time_end - time_start) >> 10;
 	if (diff > INT_MAX)
 		diff = INT_MAX;
 
-- 
1.9.1

* Re: [PATCH] cpuidle: Change ktime_get() with local_clock()
  2016-04-14 19:23 ` Daniel Lezcano
  (?)
@ 2016-04-20 12:13 ` Peter Zijlstra
  2016-04-20 12:30   ` Daniel Lezcano
  -1 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-04-20 12:13 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: rjw, mingo, open list:CPUIDLE DRIVERS, open list

On Thu, Apr 14, 2016 at 09:23:54PM +0200, Daniel Lezcano wrote:
> @@ -217,7 +217,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  	if (!cpuidle_state_is_coupled(drv, entered_state))
>  		local_irq_enable();
>  
> -	diff = ktime_to_us(ktime_sub(time_end, time_start));
> +	/*
> +	 * local_clock() returns the time in nanosecond, let's shift
> +	 * by 10 (divide by 1024) to have microsecond based time.
> +	 */
> +	diff = (time_end - time_start) >> 10;

Changelog fails to explain the ramifications of this change...

* Re: [PATCH] cpuidle: Change ktime_get() with local_clock()
  2016-04-20 12:13 ` Peter Zijlstra
@ 2016-04-20 12:30   ` Daniel Lezcano
  2016-04-20 12:58     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Lezcano @ 2016-04-20 12:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: rjw, mingo, open list:CPUIDLE DRIVERS, open list

On Wed, Apr 20, 2016 at 02:13:15PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 14, 2016 at 09:23:54PM +0200, Daniel Lezcano wrote:
> > @@ -217,7 +217,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> >  	if (!cpuidle_state_is_coupled(drv, entered_state))
> >  		local_irq_enable();
> >  
> > -	diff = ktime_to_us(ktime_sub(time_end, time_start));
> > +	/*
> > +	 * local_clock() returns the time in nanosecond, let's shift
> > +	 * by 10 (divide by 1024) to have microsecond based time.
> > +	 */
> > +	diff = (time_end - time_start) >> 10;
> 
> Changelog fails to explain the ramifications of this change...

Sorry, I don't get the point of your comment. Do you mean I should elaborate 
the comment above in the changelog?

* Re: [PATCH] cpuidle: Change ktime_get() with local_clock()
  2016-04-20 12:30   ` Daniel Lezcano
@ 2016-04-20 12:58     ` Peter Zijlstra
  2016-04-20 16:47       ` Daniel Lezcano
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-04-20 12:58 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: rjw, mingo, open list:CPUIDLE DRIVERS, open list

On Wed, Apr 20, 2016 at 02:30:11PM +0200, Daniel Lezcano wrote:
> On Wed, Apr 20, 2016 at 02:13:15PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 14, 2016 at 09:23:54PM +0200, Daniel Lezcano wrote:
> > > @@ -217,7 +217,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> > >  	if (!cpuidle_state_is_coupled(drv, entered_state))
> > >  		local_irq_enable();
> > >  
> > > -	diff = ktime_to_us(ktime_sub(time_end, time_start));
> > > +	/*
> > > +	 * local_clock() returns the time in nanosecond, let's shift
> > > +	 * by 10 (divide by 1024) to have microsecond based time.
> > > +	 */
> > > +	diff = (time_end - time_start) >> 10;
> > 
> > Changelog fails to explain the ramifications of this change...
> 
> Sorry, I don't get the point of your comment. Do you mean I should elaborate 
> the comment above in the changelog?

Yeah, why is /1024 good enough?

* Re: [PATCH] cpuidle: Change ktime_get() with local_clock()
  2016-04-20 12:58     ` Peter Zijlstra
@ 2016-04-20 16:47       ` Daniel Lezcano
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Lezcano @ 2016-04-20 16:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: rjw, mingo, open list:CPUIDLE DRIVERS, open list

On Wed, Apr 20, 2016 at 02:58:37PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 20, 2016 at 02:30:11PM +0200, Daniel Lezcano wrote:
> > On Wed, Apr 20, 2016 at 02:13:15PM +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 14, 2016 at 09:23:54PM +0200, Daniel Lezcano wrote:
> > > > @@ -217,7 +217,11 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
> > > >  	if (!cpuidle_state_is_coupled(drv, entered_state))
> > > >  		local_irq_enable();
> > > >  
> > > > -	diff = ktime_to_us(ktime_sub(time_end, time_start));
> > > > +	/*
> > > > +	 * local_clock() returns the time in nanosecond, let's shift
> > > > +	 * by 10 (divide by 1024) to have microsecond based time.
> > > > +	 */
> > > > +	diff = (time_end - time_start) >> 10;
> > > 
> > > Changelog fails to explain the ramifications of this change...
> > 
> > Sorry, I don't get the point of your comment. Do you mean I should elaborate 
> > the comment above in the changelog?
> 
> Yeah, why is /1024 good enough?

Ok.

The conversion from nanoseconds to microseconds can be done with an integer
division (by 1000) or with a 10-bit right shift (division by 1024).

The following table gives some results at the limits.

 ------------------------------------------
|   nsec   |   div(1000)   |   div(1024)   |
 ------------------------------------------
|   1e3    |        1 usec |      976 nsec |
 ------------------------------------------
|   1e6    |     1000 usec |      976 usec |
 ------------------------------------------
|   1e9    |  1000000 usec |   976562 usec |
 ------------------------------------------

There is a linear deviation of 2.34%. This loss of precision is acceptable
in the context of the resulting diff, which is used for statistics. These
statistics are processed to estimate the duration of the next idle period,
which in turn drives the idle state selection. The selection criteria weigh
that predicted duration against large intervals, represented by the idle
state's target residency.

Dividing by 2^10 is good enough because the error relative to dividing by
1e3 is lost among all the approximations already made when computing the
next idle duration.
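
The arithmetic above can be checked with a small userspace sketch. The
helper names (ns_to_us_exact(), ns_to_us_shift(),
shift_shortfall_centipercent(), idle_diff_us()) are hypothetical and
exist only for this illustration; idle_diff_us() mirrors the shift and
INT_MAX clamp from the patch, using stdint types in place of the
kernel's u64/s64.

```c
#include <limits.h>
#include <stdint.h>

/* Exact conversion: nanoseconds to microseconds. */
static uint64_t ns_to_us_exact(uint64_t ns)
{
	return ns / 1000;
}

/* Approximate conversion from the patch: divide by 1024. */
static uint64_t ns_to_us_shift(uint64_t ns)
{
	return ns >> 10;
}

/*
 * How far the shifted result falls short of the exact one, in
 * hundredths of a percent. Returns 0 when the exact result is 0.
 */
static uint64_t shift_shortfall_centipercent(uint64_t ns)
{
	uint64_t exact = ns_to_us_exact(ns);

	if (!exact)
		return 0;
	return (exact - ns_to_us_shift(ns)) * 10000 / exact;
}

/* Same computation as the patched code: shift, then clamp to INT_MAX. */
static int idle_diff_us(uint64_t time_start, uint64_t time_end)
{
	int64_t diff = (int64_t)((time_end - time_start) >> 10);

	if (diff > INT_MAX)
		diff = INT_MAX;
	return (int)diff;
}
```

For 1e9 ns the shifted result is 976562 against an exact 1000000, a
shortfall of 2.34%. Note that integer truncation makes very short
intervals round down further: 1000 ns shifted by 10 gives 0, where the
table shows the fractional value of 976 nsec.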

Would this explanation be sufficient?
