All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Optimize int_sqrt for small values for faster idle
@ 2016-01-28 21:42 Andi Kleen
  2016-01-28 22:04 ` kbuild test robot
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Andi Kleen @ 2016-01-28 21:42 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, davidlohr.bueso, rafael.j.wysocki, lenb, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The menu cpuidle governor does at least two int_sqrt() each time
we go into idle in get_typical_interval to compute stddev

int_sqrts take 100-120 cycles each. Short idle latency is important
for many workloads.

I instrumented the function on my workstation and most values are
16bit only and most others 32bit (50% percentile is 122094,
75% is 3699533).

sqrt is implemented by starting with an initial estimation,
and then iterating. int_sqrt currently only uses a fixed
estimating which is good for 64bits worth of input.

This patch adds some checks at the beginning to start with
a better estimate for values fitting in 8, 16bit and 32bit.
This makes int_sqrt between 60+% faster for values in 16bit,
and still somewhat faster (between 10 and 30%) for larger values
upto 32bit. Full 64bit is slightly slower.

This optimizes the short idle calls and does not hurt the
long sleep (which probably do not care) much.

An alternative would be a full table drive approach, or
trying some inverted sqrt optimization, but this simple change
already seems to have a good payoff.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 lib/int_sqrt.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c
index 1ef4cc3..2479ccf 100644
--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -21,7 +21,15 @@ unsigned long int_sqrt(unsigned long x)
 	if (x <= 1)
 		return x;
 
-	m = 1UL << (BITS_PER_LONG - 2);
+	if (x <= 0xffff) {
+		if (m <= 0xff)
+			m = 1UL << (8 - 2);
+		else
+			m = 1UL << (16 - 2);
+	} else if (x <= 0xffffffff)
+		m = 1UL << (32 - 2);
+	else
+		m = 1UL << (BITS_PER_LONG - 2);
 	while (m != 0) {
 		b = y + m;
 		y >>= 1;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-07-24 13:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-28 21:42 [PATCH] Optimize int_sqrt for small values for faster idle Andi Kleen
2016-01-28 22:04 ` kbuild test robot
2016-01-28 22:11 ` kbuild test robot
2016-01-28 22:15 ` Joe Perches
2016-01-28 22:40   ` Andi Kleen
2016-01-28 22:22 ` Joe Perches
2016-01-28 22:29 ` Eric Dumazet
2016-01-28 22:30 ` Andi Kleen
2016-01-29  3:59 ` Rafael J. Wysocki
2016-01-31  7:27 ` Thomas Rohwer
2016-02-01 21:25 ` Rasmus Villemoes
2016-02-01 21:36   ` Andi Kleen
2016-02-01 23:08     ` Rasmus Villemoes
2016-02-02  0:00       ` Andi Kleen
2016-02-02  0:36       ` Eric Dumazet
2016-02-02 20:46         ` Rasmus Villemoes
2016-02-02 21:30           ` Eric Dumazet
2017-07-20 10:10         ` Peter Zijlstra
2017-07-24 13:28           ` Eric Dumazet
2016-02-07 21:32     ` Rasmus Villemoes
2016-02-09 20:44       ` Andi Kleen
2016-02-10 13:31         ` Fengguang Wu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.