* [PATCH v4] cpuidle: Fix last_residency division
From: Shreyas B. Prabhu @ 2016-06-30 14:34 UTC
  To: rjw
  Cc: daniel.lezcano, linux-pm, linuxppc-dev, anton, mpe, bsingharora,
	David.Laight, arnd, nicolas.pitre, Shreyas B. Prabhu

Snooze is a poll idle state in the powernv and pseries platforms. Snooze
has a timeout so that if a cpu stays in snooze for more than the target
residency of the next available idle state, it exits, thereby giving the
cpuidle governor a chance to re-evaluate and promote the cpu to a deeper
idle state. Therefore, whenever snooze exits due to this timeout, its
last_residency will be the target_residency of the next deeper state.

commit e93e59ce5b85 ("cpuidle: Replace ktime_get() with local_clock()")
changed the math around the last_residency calculation. Specifically, when
converting last_residency from nanoseconds to microseconds, it does a
right shift by 10, i.e. it divides by 1024 rather than 1000. Due to this,
in snooze timeout exit scenarios the calculated last_residency is roughly
2.3% less than the target_residency of the next available state. This
pattern is picked up by get_typical_interval() in the menu governor, and
therefore expected_interval in menu_select() is frequently less than the
target_residency of any state but snooze.

Due to this we are entering snooze at a higher rate, thereby affecting
the single thread performance.

Fix this by using a better approximation for division by 1000.

Reported-by: Anton Blanchard <anton@samba.org>
Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Suggested-by: David Laight <david.laight@aculab.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
---
Changes in v4
=============
 - Increased the threshold up to which the approximation can be used.
 - Removed the explicit cast; added a comment explaining why the cast
   is safe.

Changes in v3
=============
 - Using approximation suggested by David

Changes in v2
=============
 - Fixing it in the cpuidle core code instead of driver code.

 drivers/cpuidle/cpuidle.c | 11 +++--------
 drivers/cpuidle/cpuidle.h | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a4d0059..f55ad01 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -174,7 +174,6 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	struct cpuidle_state *target_state = &drv->states[index];
 	bool broadcast = !!(target_state->flags & CPUIDLE_FLAG_TIMER_STOP);
 	u64 time_start, time_end;
-	s64 diff;
 
 	/*
 	 * Tell the time framework to switch to a broadcast timer because our
@@ -218,14 +217,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 		local_irq_enable();
 
 	/*
-	 * local_clock() returns the time in nanosecond, let's shift
-	 * by 10 (divide by 1024) to have microsecond based time.
+	 * local_clock() returns the time in nanoseconds, convert it to
+	 * microsecond based time.
 	 */
-	diff = (time_end - time_start) >> 10;
-	if (diff > INT_MAX)
-		diff = INT_MAX;
-
-	dev->last_residency = (int) diff;
+	dev->last_residency = convert_nsec_to_usec(time_end - time_start);
 
 	if (entered_state >= 0) {
 		/* Update cpuidle counters */
diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
index f87f399..a027b35 100644
--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -68,4 +68,42 @@ static inline void cpuidle_coupled_unregister_device(struct cpuidle_device *dev)
 }
 #endif
 
+/*
+ * To ensure that there is no overflow while using the approximation
+ * below to divide val by 1000, we must have:
+ * val + (val >> 5) <= 0xFFFFFFFF
+ * val + val/32 <= 0xFFFFFFFF
+ * val <= (0xFFFFFFFF * 32) / 33
+ * val <= 0xF83E0F82
+ * Hence the threshold for val below which the approximation
+ * can be used is 0xF83E0F82.
+ */
+#define DIV_APPROXIMATION_THRESHOLD 0xF83E0F82UL
+
+/*
+ * Used for calculating last_residency in usec. Optimized for the case
+ * where last_residency in nsec is < DIV_APPROXIMATION_THRESHOLD.
+ * The approximated value has less than 1% error.
+ */
+static inline int convert_nsec_to_usec(u64 nsec)
+{
+	if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
+		u32 usec = nsec;
+
+		usec += usec >> 5;
+		usec = usec >> 10;
+
+		/* Can safely cast to int since usec is < INT_MAX */
+		return usec;
+	} else {
+		u64 usec = div_u64(nsec, 1000);
+
+		if (usec > INT_MAX)
+			usec = INT_MAX;
+
+		/* Can safely cast to int since usec is < INT_MAX */
+		return usec;
+	}
+}
+
 #endif /* __DRIVER_CPUIDLE_H */
-- 
2.1.4

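A quick standalone sketch (not part of the thread) illustrating the
arithmetic above: a right shift by 10 divides by 1024 and under-reports
by about 2.3%, while the (n + n/32) >> 10 approximation divides by
roughly 993 and over-reports by less than 1%.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t nsec = 10000000;	/* 10 ms residency, in ns */
	uint32_t exact = nsec / 1000;	/* 10000 us */
	uint32_t shift = nsec >> 10;	/* 9765 us, ~2.35% low */
	uint32_t approx = (nsec + (nsec >> 5)) >> 10; /* 10070 us, ~0.7% high */

	printf("exact=%u shift=%u approx=%u\n", exact, shift, approx);
	return 0;
}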


* Re: [PATCH v4] cpuidle: Fix last_residency division
From: Daniel Lezcano @ 2016-06-30 14:57 UTC
  To: Shreyas B. Prabhu, rjw
  Cc: nicolas.pitre, arnd, linux-pm, David.Laight, anton, linuxppc-dev

On 06/30/2016 04:34 PM, Shreyas B. Prabhu wrote:
> [ ... ]
> +static inline int convert_nsec_to_usec(u64 nsec)
> +{
> +	if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
> +		u32 usec = nsec;
> +
> +		usec += usec >> 5;
> +		usec = usec >> 10;
> +
> +		/* Can safely cast to int since usec is < INT_MAX */
> +		return usec;
> +	} else {
> +		u64 usec = div_u64(nsec, 1000);
> +
> +		if (usec > INT_MAX)
> +			usec = INT_MAX;
> +
> +		/* Can safely cast to int since usec is < INT_MAX */
> +		return usec;
> +	}
> +}


What bothers me about this division is the benefit of adding an extra,
ultra-optimized division by 1000 in cpuidle.h when ktime.h already
provides an optimized ktime_divns.

Why not:

ts = ns_to_ktime(local_clock());

...

te = ns_to_ktime(local_clock());


diff = ktime_us_delta(te, ts);
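
Put together, a rough sketch of what cpuidle_enter_state() would look
like with this suggestion (hypothetical, assuming the surrounding code
from the patch above; this is not a posted patch):

ktime_t time_start, time_end;
s64 diff;

time_start = ns_to_ktime(local_clock());
/* ... enter the idle state ... */
time_end = ns_to_ktime(local_clock());

/* full-precision ns -> us conversion, clamped as before */
diff = ktime_us_delta(time_end, time_start);
if (diff > INT_MAX)
	diff = INT_MAX;

dev->last_residency = (int) diff;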

* Re: [PATCH v4] cpuidle: Fix last_residency division
From: Nicolas Pitre @ 2016-06-30 15:37 UTC
  To: Daniel Lezcano
  Cc: arnd, linux-pm, rjw, David.Laight, anton, Shreyas B. Prabhu,
	linuxppc-dev

On Thu, 30 Jun 2016, Daniel Lezcano wrote:

> On 06/30/2016 04:34 PM, Shreyas B. Prabhu wrote:
> > [ ... ]
> > +static inline int convert_nsec_to_usec(u64 nsec)
> > +{
> > +	if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
> > +		u32 usec = nsec;
> > +
> > +		usec += usec >> 5;
> > +		usec = usec >> 10;
> > +
> > +		/* Can safely cast to int since usec is < INT_MAX */
> > +		return usec;
> > +	} else {
> > +		u64 usec = div_u64(nsec, 1000);
> > +
> > +		if (usec > INT_MAX)
> > +			usec = INT_MAX;
> > +
> > +		/* Can safely cast to int since usec is < INT_MAX */
> > +		return usec;
> > +	}
> > +}
> 
> 
> What bothers me about this division is the benefit of adding an extra,
> ultra-optimized division by 1000 in cpuidle.h when ktime.h already
> provides an optimized ktime_divns.

It is "optimized" but still much heavier than what is presented above as 
it provides maximum precision.

It all depends on how important the performance gain from the original 
shift by 10 was in the first place.


Nicolas
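
For reference, a simplified sketch of the ktime_divns() logic being
discussed (paraphrased, not the verbatim kernel source): on 64-bit
builds it is a plain C division, while on 32-bit builds even a constant
divisor such as 1000 goes through do_div(), a full 64-by-32 division,
which is where the extra weight relative to a shift-and-add comes from.

#if BITS_PER_LONG < 64
static inline s64 ktime_divns(const ktime_t kt, s64 div)
{
	/* Constant 32-bit divisors still pay for a full do_div() */
	if (__builtin_constant_p(div) && !(div >> 32)) {
		s64 ns = kt;		/* simplified; the union's tv64 */
		u64 tmp = ns < 0 ? -ns : ns;

		do_div(tmp, div);
		return ns < 0 ? -tmp : tmp;
	}
	return __ktime_divns(kt, div);	/* 64-bit divisor slow path */
}
#else
static inline s64 ktime_divns(const ktime_t kt, s64 div)
{
	return kt / div;	/* the compiler strength-reduces constants */
}
#endif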

* Re: [PATCH v4] cpuidle: Fix last_residency division
From: Daniel Lezcano @ 2016-07-01  8:06 UTC
  To: Nicolas Pitre
  Cc: Shreyas B. Prabhu, rjw, linux-pm, linuxppc-dev, anton, mpe,
	bsingharora, David.Laight, arnd

On 06/30/2016 05:37 PM, Nicolas Pitre wrote:
> On Thu, 30 Jun 2016, Daniel Lezcano wrote:

[ ... ]

>>> +	if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
>>> +		u32 usec = nsec;
>>> +
>>> +		usec += usec >> 5;
>>> +		usec = usec >> 10;
>>> +
>>> +		/* Can safely cast to int since usec is < INT_MAX */
>>> +		return usec;
>>> +	} else {
>>> +		u64 usec = div_u64(nsec, 1000);
>>> +
>>> +		if (usec > INT_MAX)
>>> +			usec = INT_MAX;
>>> +
>>> +		/* Can safely cast to int since usec is < INT_MAX */
>>> +		return usec;
>>> +	}
>>> +}
>>
>>
>> What bothers me about this division is the benefit of adding an extra,
>> ultra-optimized division by 1000 in cpuidle.h when ktime.h already
>> provides an optimized ktime_divns.
>
> It is "optimized" but still much heavier than what is presented above as
> it provides maximum precision.
>
> It all depends on how important the performance gain from the original
> shift by 10 was in the first place.

Actually the original shift was there because it was convenient as a 
simple ~div1000 operation. But against all odds, the approximation 
introduced a regression on a very specific use case on PowerPC.

We are not in the hot path and I think we can live with a ktime_divns 
without problem. That would simplify the fix I believe.

Perhaps the div1000 routine could be moved in ktime.h to be used as a 
helper for ktime_divns when we divide by the 1000 constant and submitted 
in a separate patch as an optimization.


* Re: [PATCH v4] cpuidle: Fix last_residency division
From: Balbir Singh @ 2016-07-01 12:41 UTC
  To: Daniel Lezcano, Nicolas Pitre
  Cc: arnd, linux-pm, rjw, David.Laight, anton, Shreyas B. Prabhu,
	linuxppc-dev

On Fri, 2016-07-01 at 10:06 +0200, Daniel Lezcano wrote:
> On 06/30/2016 05:37 PM, Nicolas Pitre wrote:
> > 
> > On Thu, 30 Jun 2016, Daniel Lezcano wrote:
> [ ... ]
> 
> > 
> > > 
> > > > 
> > > > +	if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
> > > > +		u32 usec = nsec;
> > > > +
> > > > +		usec += usec >> 5;
> > > > +		usec = usec >> 10;
> > > > +
> > > > +		/* Can safely cast to int since usec is < INT_MAX */
> > > > +		return usec;
> > > > +	} else {
> > > > +		u64 usec = div_u64(nsec, 1000);
> > > > +
> > > > +		if (usec > INT_MAX)
> > > > +			usec = INT_MAX;
> > > > +
> > > > +		/* Can safely cast to int since usec is < INT_MAX */
> > > > +		return usec;
> > > > +	}
> > > > +}
> > > 
> > > What bothers me about this division is the benefit of adding an extra,
> > > ultra-optimized division by 1000 in cpuidle.h when ktime.h already
> > > provides an optimized ktime_divns.
> > It is "optimized" but still much heavier than what is presented above as
> > it provides maximum precision.
> > 
> > It all depends on how important the performance gain from the original
> > shift by 10 was in the first place.
> Actually the original shift was there because it was convenient as a 
> simple ~div1000 operation. But against all odds, the approximation 
> introduced a regression on a very specific use case on PowerPC.
> 
> We are not in the hot path and I think we can live with a ktime_divns 
> without problem. That would simplify the fix I believe.
> 

I would tend to agree with this, and there are better ways to do
multiplicative inverses if we care.

Balbir Singh.
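
One such multiplicative inverse, sketched for illustration (not code
from this thread): multiplying by 274877907 = ceil(2^38 / 1000) and
shifting right by 38 yields the exact quotient x / 1000 for every
32-bit x, since the residual error of 56x / (1000 * 2^38) stays below
1/1000 across the whole u32 range. Compilers typically strength-reduce
a constant x / 1000 into something similar on their own.

/* Exact x / 1000 for any u32 x via reciprocal multiplication
 * (illustrative helper, hypothetical name)
 */
static inline u32 div1000_exact(u32 x)
{
	return (u32)(((u64)x * 274877907ULL) >> 38);
}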


* Re: [PATCH v4] cpuidle: Fix last_residency division
From: Nicolas Pitre @ 2016-07-01 13:00 UTC
  To: Daniel Lezcano
  Cc: arnd, linux-pm, rjw, David.Laight, anton, Shreyas B. Prabhu,
	linuxppc-dev

On Fri, 1 Jul 2016, Daniel Lezcano wrote:

> On 06/30/2016 05:37 PM, Nicolas Pitre wrote:
> > On Thu, 30 Jun 2016, Daniel Lezcano wrote:
> 
> [ ... ]
> 
> > > > +	if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
> > > > +		u32 usec = nsec;
> > > > +
> > > > +		usec += usec >> 5;
> > > > +		usec = usec >> 10;
> > > > +
> > > > +		/* Can safely cast to int since usec is < INT_MAX */
> > > > +		return usec;
> > > > +	} else {
> > > > +		u64 usec = div_u64(nsec, 1000);
> > > > +
> > > > +		if (usec > INT_MAX)
> > > > +			usec = INT_MAX;
> > > > +
> > > > +		/* Can safely cast to int since usec is < INT_MAX */
> > > > +		return usec;
> > > > +	}
> > > > +}
> > >
> > >
> > > What bothers me about this division is the benefit of adding an extra,
> > > ultra-optimized division by 1000 in cpuidle.h when ktime.h already
> > > provides an optimized ktime_divns.
> >
> > It is "optimized" but still much heavier than what is presented above as
> > it provides maximum precision.
> >
> > It all depends on how important the performance gain from the original
> > shift by 10 was in the first place.
> 
> Actually the original shift was there because it was convenient as a simple
> ~div1000 operation. But against all odds, the approximation introduced a
> regression on a very specific use case on PowerPC.
> 
> We are not in the hot path and I think we can live with a ktime_divns without
> problem. That would simplify the fix I believe.

I agree.

> Perhaps the div1000 routine could be moved in ktime.h to be used as a helper
> for ktime_divns when we divide by the 1000 constant and submitted in a
> separate patch as an optimization.

The proposed patch here still provides an approximation so it wouldn't 
be appropriate for ktime_divns.


Nicolas

* Re: [PATCH v4] cpuidle: Fix last_residency division
From: Nicolas Pitre @ 2016-07-01 13:02 UTC
  To: Balbir Singh
  Cc: arnd, linux-pm, Shreyas B. Prabhu, Daniel Lezcano, rjw,
	David.Laight, anton, linuxppc-dev

On Fri, 1 Jul 2016, Balbir Singh wrote:

> On Fri, 2016-07-01 at 10:06 +0200, Daniel Lezcano wrote:
> > On 06/30/2016 05:37 PM, Nicolas Pitre wrote:
> > > 
> > > On Thu, 30 Jun 2016, Daniel Lezcano wrote:
> > [ ... ]
> > 
> > > 
> > > > 
> > > > > 
> > > > > +	if (likely(nsec < DIV_APPROXIMATION_THRESHOLD)) {
> > > > > +		u32 usec = nsec;
> > > > > +
> > > > > +		usec += usec >> 5;
> > > > > +		usec = usec >> 10;
> > > > > +
> > > > > +		/* Can safely cast to int since usec is < INT_MAX */
> > > > > +		return usec;
> > > > > +	} else {
> > > > > +		u64 usec = div_u64(nsec, 1000);
> > > > > +
> > > > > +		if (usec > INT_MAX)
> > > > > +			usec = INT_MAX;
> > > > > +
> > > > > +		/* Can safely cast to int since usec is < INT_MAX */
> > > > > +		return usec;
> > > > > +	}
> > > > > +}
> > > > 
> > > > What bothers me about this division is the benefit of adding an extra,
> > > > ultra-optimized division by 1000 in cpuidle.h when ktime.h already
> > > > provides an optimized ktime_divns.
> > > It is "optimized" but still much heavier than what is presented above as
> > > it provides maximum precision.
> > > 
> > > It all depends on how important the performance gain from the original
> > > shift by 10 was in the first place.
> > Actually the original shift was there because it was convenient as a 
> > simple ~div1000 operation. But against all odds, the approximation 
> > introduced a regression on a very specific use case on PowerPC.
> > 
> > We are not in the hot path and I think we can live with a ktime_divns 
> > without problem. That would simplify the fix I believe.
> > 
> 
> I would tend to agree with this, and there are better ways to do
> multiplicative inverses if we care.

ktime_divns already does it automatically when applicable.


Nicolas


* Re: [PATCH v4] cpuidle: Fix last_residency division
From: Shreyas B Prabhu @ 2016-07-01 14:16 UTC
  To: Daniel Lezcano, Nicolas Pitre
  Cc: rjw, linux-pm, linuxppc-dev, anton, mpe, bsingharora, David.Laight, arnd



On 07/01/2016 01:36 PM, Daniel Lezcano wrote:
> On 06/30/2016 05:37 PM, Nicolas Pitre wrote:
>> On Thu, 30 Jun 2016, Daniel Lezcano wrote:
>>>> [ ... ]
>>>
>>>
>>> What bothers me about this division is the benefit of adding an extra,
>>> ultra-optimized division by 1000 in cpuidle.h when ktime.h already
>>> provides an optimized ktime_divns.
>>
>> It is "optimized" but still much heavier than what is presented above as
>> it provides maximum precision.
>>
>> It all depends on how important the performance gain from the original
>> shift by 10 was in the first place.
> 
> Actually the original shift was there because it was convenient as a
> simple ~div1000 operation. But against all odds, the approximation
> introduced a regression on a very specific use case on PowerPC.
> 
> We are not in the hot path and I think we can live with a ktime_divns
> without problem. That would simplify the fix I believe.
> 

I agree too. I'll post the next version with this.

Thanks,
Shreyas


