All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document
@ 2017-10-20 16:44 Peter Zijlstra
  2017-10-20 16:44 ` [PATCH -v2 1/3] lib/int_sqrt: Optimize small argument Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-10-20 16:44 UTC (permalink / raw)
  To: torvalds, akpm
  Cc: dave, aksgarg1989, tglx, mingo, will.deacon, joe, peterz,
	linux-kernel, davem, mawilcox, keescook, md

Version with hopefully acceptable Changelogs.

Thanks for reminding me Joe.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH -v2 1/3] lib/int_sqrt: Optimize small argument
  2017-10-20 16:44 [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Peter Zijlstra
@ 2017-10-20 16:44 ` Peter Zijlstra
  2017-10-20 16:44 ` [PATCH -v2 2/3] lib/int_sqrt: Optimize initial value compute Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-10-20 16:44 UTC (permalink / raw)
  To: torvalds, akpm
  Cc: dave, aksgarg1989, tglx, mingo, will.deacon, joe, peterz,
	linux-kernel, davem, mawilcox, keescook, md

[-- Attachment #1: peterz-int_sqrt-opt-1.patch --]
[-- Type: text/plain, Size: 1888 bytes --]

The current int_sqrt() computation is sub-optimal for the case of
small @x. Which is the interesting case when we're going to do
cumulative distribution functions on idle times, which we assume to be
a random variable, where the target residency of the deepest idle state
gives an upper bound on the variable (5e6ns on recent Intel chips).

In the case of small @x, the compute loop:

	while (m != 0) {
		b = y + m;
		y >>= 1;

		if (x >= b) {
			x -= b;
			y += m;
		}
		m >>= 2;
	}

can be reduced to:

	while (m > x)
		m >>= 2;

Because y==0, b==m and until x>=m y will remain 0.

And while this is computationally equivalent, it runs much faster
because there's less code, in particular less branches.

      cycles:                 branches:              branch-misses:

OLD:

hot:   45.109444 +- 0.044117  44.333392 +- 0.002254  0.018723 +- 0.000593
cold: 187.737379 +- 0.156678  44.333407 +- 0.002254  6.272844 +- 0.004305

PRE:

hot:   67.937492 +- 0.064124  66.999535 +- 0.000488  0.066720 +- 0.001113
cold: 232.004379 +- 0.332811  66.999527 +- 0.000488  6.914634 +- 0.006568

POST:

hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219

Averages computed over all values <128k using a LFSR to generate
order. Cold numbers have a LFSR based branch trace buffer 'confuser'
ran between each int_sqrt() invocation.

Fixes: 30493cc9dddb ("lib/int_sqrt.c: optimize square root algorithm")
Suggested-by: Anshul Garg <aksgarg1989@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 lib/int_sqrt.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -22,6 +22,9 @@ unsigned long int_sqrt(unsigned long x)
 		return x;
 
 	m = 1UL << (BITS_PER_LONG - 2);
+	while (m > x)
+		m >>= 2;
+
 	while (m != 0) {
 		b = y + m;
 		y >>= 1;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH -v2 2/3] lib/int_sqrt: Optimize initial value compute
  2017-10-20 16:44 [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Peter Zijlstra
  2017-10-20 16:44 ` [PATCH -v2 1/3] lib/int_sqrt: Optimize small argument Peter Zijlstra
@ 2017-10-20 16:44 ` Peter Zijlstra
  2017-10-20 16:44 ` [PATCH -v2 3/3] lib/int_sqrt: Adjust comments Peter Zijlstra
  2017-10-20 19:31 ` [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Linus Torvalds
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-10-20 16:44 UTC (permalink / raw)
  To: torvalds, akpm
  Cc: dave, aksgarg1989, tglx, mingo, will.deacon, joe, peterz,
	linux-kernel, davem, mawilcox, keescook, md

[-- Attachment #1: peterz-int_sqrt-opt-2.patch --]
[-- Type: text/plain, Size: 1905 bytes --]

The initial value (@m) compute is:

	m = 1UL << (BITS_PER_LONG - 2);
	while (m > x)
		m >>= 2;

Which is a linear search for the highest even bit smaller or equal to @x
We can implement this using a binary search using __fls() (or better
when its hardware implemented).

	m = 1UL << (__fls(x) & ~1UL);

Especially for small values of @x; which are the more common arguments
when doing a CDF on idle times; the linear search is near to worst
case, while the binary search of __fls() is a constant 6 (or 5 on
32bit) branches.

      cycles:                 branches:              branch-misses:

PRE:

hot:   43.633557 +- 0.034373  45.333132 +- 0.002277  0.023529 +- 0.000681
cold: 207.438411 +- 0.125840  45.333132 +- 0.002277  6.976486 +- 0.004219

SOFTWARE FLS:

hot:   29.576176 +- 0.028850  26.666730 +- 0.004511  0.019463 +- 0.000663
cold: 165.947136 +- 0.188406  26.666746 +- 0.004511  6.133897 +- 0.004386

HARDWARE FLS:

hot:   24.720922 +- 0.025161  20.666784 +- 0.004509  0.020836 +- 0.000677
cold: 132.777197 +- 0.127471  20.666776 +- 0.004509  5.080285 +- 0.003874

Averages computed over all values <128k using a LFSR to generate
order. Cold numbers have a LFSR based branch trace buffer 'confuser'
ran between each int_sqrt() invocation.

Acked-by: Will Deacon <will.deacon@arm.com>
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 lib/int_sqrt.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -7,6 +7,7 @@
 
 #include <linux/kernel.h>
 #include <linux/export.h>
+#include <linux/bitops.h>
 
 /**
  * int_sqrt - rough approximation to sqrt
@@ -21,10 +22,7 @@ unsigned long int_sqrt(unsigned long x)
 	if (x <= 1)
 		return x;
 
-	m = 1UL << (BITS_PER_LONG - 2);
-	while (m > x)
-		m >>= 2;
-
+	m = 1UL << (__fls(x) & ~1UL);
 	while (m != 0) {
 		b = y + m;
 		y >>= 1;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH -v2 3/3] lib/int_sqrt: Adjust comments
  2017-10-20 16:44 [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Peter Zijlstra
  2017-10-20 16:44 ` [PATCH -v2 1/3] lib/int_sqrt: Optimize small argument Peter Zijlstra
  2017-10-20 16:44 ` [PATCH -v2 2/3] lib/int_sqrt: Optimize initial value compute Peter Zijlstra
@ 2017-10-20 16:44 ` Peter Zijlstra
  2017-10-20 19:31 ` [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Linus Torvalds
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-10-20 16:44 UTC (permalink / raw)
  To: torvalds, akpm
  Cc: dave, aksgarg1989, tglx, mingo, will.deacon, joe, peterz,
	linux-kernel, davem, mawilcox, keescook, md

[-- Attachment #1: peterz-int_sqrt-opt-3.patch --]
[-- Type: text/plain, Size: 638 bytes --]

Our current int_sqrt() is not rough nor any approximation; it
calculates the exact value of: floor(sqrt()). Document this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 lib/int_sqrt.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -10,10 +10,10 @@
 #include <linux/bitops.h>
 
 /**
- * int_sqrt - rough approximation to sqrt
+ * int_sqrt - computes the integer square root
  * @x: integer of which to calculate the sqrt
  *
- * A very rough approximation to the sqrt() function.
+ * Computes: floor(sqrt(x))
  */
 unsigned long int_sqrt(unsigned long x)
 {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document
  2017-10-20 16:44 [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Peter Zijlstra
                   ` (2 preceding siblings ...)
  2017-10-20 16:44 ` [PATCH -v2 3/3] lib/int_sqrt: Adjust comments Peter Zijlstra
@ 2017-10-20 19:31 ` Linus Torvalds
  2017-11-02 16:43   ` Joe Perches
  3 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2017-10-20 19:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Davidlohr Bueso, Anshul Garg, Thomas Gleixner,
	Ingo Molnar, Will Deacon, Joe Perches, Linux Kernel Mailing List,
	David Miller, Matthew Wilcox, Kees Cook, Michael Davidson

On Fri, Oct 20, 2017 at 12:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Version with hopefully acceptable Changelogs.

Ack.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document
  2017-10-20 19:31 ` [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Linus Torvalds
@ 2017-11-02 16:43   ` Joe Perches
  2017-11-02 16:47     ` Peter Zijlstra
  0 siblings, 1 reply; 8+ messages in thread
From: Joe Perches @ 2017-11-02 16:43 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: Andrew Morton, Davidlohr Bueso, Anshul Garg, Thomas Gleixner,
	Ingo Molnar, Will Deacon, Linux Kernel Mailing List,
	David Miller, Matthew Wilcox, Kees Cook, Michael Davidson

On Fri, 2017-10-20 at 15:31 -0400, Linus Torvalds wrote:
> On Fri, Oct 20, 2017 at 12:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Version with hopefully acceptable Changelogs.
> 
> Ack.

Still not in -next

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document
  2017-11-02 16:43   ` Joe Perches
@ 2017-11-02 16:47     ` Peter Zijlstra
  2017-11-02 16:51       ` Kees Cook
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2017-11-02 16:47 UTC (permalink / raw)
  To: Joe Perches
  Cc: Linus Torvalds, Andrew Morton, Davidlohr Bueso, Anshul Garg,
	Thomas Gleixner, Ingo Molnar, Will Deacon,
	Linux Kernel Mailing List, David Miller, Matthew Wilcox,
	Kees Cook, Michael Davidson

On Thu, Nov 02, 2017 at 09:43:50AM -0700, Joe Perches wrote:
> On Fri, 2017-10-20 at 15:31 -0400, Linus Torvalds wrote:
> > On Fri, Oct 20, 2017 at 12:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > Version with hopefully acceptable Changelogs.
> > 
> > Ack.
> 
> Still not in -next

I got emails from Andrew's bot.. I have no idea where Andrew keeps his
pile-'o-patches these days, but that'd be the place to check.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document
  2017-11-02 16:47     ` Peter Zijlstra
@ 2017-11-02 16:51       ` Kees Cook
  0 siblings, 0 replies; 8+ messages in thread
From: Kees Cook @ 2017-11-02 16:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joe Perches, Linus Torvalds, Andrew Morton, Davidlohr Bueso,
	Anshul Garg, Thomas Gleixner, Ingo Molnar, Will Deacon,
	Linux Kernel Mailing List, David Miller, Matthew Wilcox,
	Michael Davidson

On Thu, Nov 2, 2017 at 9:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Nov 02, 2017 at 09:43:50AM -0700, Joe Perches wrote:
>> On Fri, 2017-10-20 at 15:31 -0400, Linus Torvalds wrote:
>> > On Fri, Oct 20, 2017 at 12:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > > Version with hopefully acceptable Changelogs.
>> >
>> > Ack.
>>
>> Still not in -next
>
> I got emails from Andrew's bot.. I have no idea where Andrew keeps his
> pile-'o-patches these days, but that'd be the place to check.

Looks like they're in mmots:

http://www.ozlabs.org/~akpm/mmots/broken-out/

but not yet mmotm, which is what gets merged for -next. So they're in
process, and should appear soon, AIUI.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-11-02 16:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-20 16:44 [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Peter Zijlstra
2017-10-20 16:44 ` [PATCH -v2 1/3] lib/int_sqrt: Optimize small argument Peter Zijlstra
2017-10-20 16:44 ` [PATCH -v2 2/3] lib/int_sqrt: Optimize initial value compute Peter Zijlstra
2017-10-20 16:44 ` [PATCH -v2 3/3] lib/int_sqrt: Adjust comments Peter Zijlstra
2017-10-20 19:31 ` [PATCH -v2 0/3] lib/int_sqrt: Fix optimize and document Linus Torvalds
2017-11-02 16:43   ` Joe Perches
2017-11-02 16:47     ` Peter Zijlstra
2017-11-02 16:51       ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.