linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-20 14:20 Crt Mori
  2017-12-20 14:39 ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Crt Mori @ 2017-12-20 14:20 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ingo Molnar, Andrew Morton, Kees Cook, Rusty Russell, Ian Abbott,
	Larry Finger, Niklas Soderlund, Thomas Gleixner,
	Krzysztof Kozlowski, Masahiro Yamada, linux-kernel, linux-iio,
	Peter Zijlstra, Joe Perches, Crt Mori

There is no option to perform 64bit integer sqrt on 32bit platform.
Added stronger typed int_sqrt64 enables the 64bit calculations to
be performed on 32bit platforms. Although int_sqrt() is a rough
approximation, the same algorithm is used in int_sqrt64() as good
enough on 32bit platform.

Signed-off-by: Crt Mori <cmo@melexis.com>
---
 include/linux/kernel.h |  9 +++++++++
 lib/int_sqrt.c         | 31 +++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0ad4c3044cf9..e4a3dc64e650 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -460,6 +460,15 @@ extern int func_ptr_is_kernel_text(void *ptr);
 
 unsigned long int_sqrt(unsigned long);
 
+#if BITS_PER_LONG < 64
+u32 int_sqrt64(u64 x);
+#else
+static inline u32 int_sqrt64(u64 x)
+{
+	return (u32)int_sqrt(x);
+}
+#endif
+
 extern void bust_spinlocks(int yes);
 extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
 extern int panic_timeout;
diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c
index 1ef4cc344977..184da54677ec 100644
--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -7,6 +7,7 @@
 
 #include <linux/kernel.h>
 #include <linux/export.h>
+#include <linux/bitops.h>
 
 /**
  * int_sqrt - rough approximation to sqrt
@@ -36,3 +37,33 @@ unsigned long int_sqrt(unsigned long x)
 	return y;
 }
 EXPORT_SYMBOL(int_sqrt);
+
+#if BITS_PER_LONG < 64
+/**
+ * int_sqrt64 - strongly typed int_sqrt function when minimum 64 bit input
+ * is expected.
+ * @x: 64bit integer of which to calculate the sqrt
+ */
+u32 int_sqrt64(u64 x)
+{
+	u64 b, m, y = 0;
+
+	if (x <= 1)
+		return x;
+
+	m = 1ULL << (fls64(x) & ~1ULL);
+	while (m != 0) {
+		b = y + m;
+		y >>= 1;
+
+		if (x >= b) {
+			x -= b;
+			y += m;
+		}
+		m >>= 2;
+	}
+
+	return y;
+}
+EXPORT_SYMBOL(int_sqrt64);
+#endif
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 14:20 [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt Crt Mori
@ 2017-12-20 14:39 ` David Laight
  2017-12-20 15:41   ` Crt Mori
  2017-12-20 16:00   ` Peter Zijlstra
  0 siblings, 2 replies; 20+ messages in thread
From: David Laight @ 2017-12-20 14:39 UTC (permalink / raw)
  To: 'Crt Mori', Jonathan Cameron
  Cc: Ingo Molnar, Andrew Morton, Kees Cook, Rusty Russell, Ian Abbott,
	Larry Finger, Niklas Soderlund, Thomas Gleixner,
	Krzysztof Kozlowski, Masahiro Yamada, linux-kernel, linux-iio,
	Peter Zijlstra, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 14:20
> 
> There is no option to perform 64bit integer sqrt on 32bit platform.
> Added stronger typed int_sqrt64 enables the 64bit calculations to
> be performed on 32bit platforms. Although int_sqrt() is a rough
> approximation, the same algorithm is used in int_sqrt64() as good
> enough on 32bit platform.

The algorithm used gives an exact (rounded down) square root,
not an approximation.

With minor changes it ought to be possible to remove most of the
64bit arithmetic and shifts.

If you care about performance then using 32 bit maths will be much faster.

	David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 14:39 ` David Laight
@ 2017-12-20 15:41   ` Crt Mori
  2017-12-20 16:00   ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Crt Mori @ 2017-12-20 15:41 UTC (permalink / raw)
  To: David Laight
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Peter Zijlstra, Joe Perches

On 20 December 2017 at 15:39, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 14:20
>>
>> There is no option to perform 64bit integer sqrt on 32bit platform.
>> Added stronger typed int_sqrt64 enables the 64bit calculations to
>> be performed on 32bit platforms. Although int_sqrt() is a rough
>> approximation, the same algorithm is used in int_sqrt64() as good
>> enough on 32bit platform.
>
> The algorithm used gives an exact (rounded down) square root,
> not an approximation.
>

OK, noted. This is from new changed function indeed - will change in v11.

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
>
> If you care about performance then using 32 bit maths will be much faster.
>

I need precision not the performance. That is why I would like to
leave int_sqrt as one for good performance while my will be used when
you require a precision.

>         David
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 14:39 ` David Laight
  2017-12-20 15:41   ` Crt Mori
@ 2017-12-20 16:00   ` Peter Zijlstra
  2017-12-20 16:17     ` Crt Mori
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-12-20 16:00 UTC (permalink / raw)
  To: David Laight
  Cc: 'Crt Mori',
	Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
> 
> If you care about performance then using 32 bit maths will be much faster.

Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
indicated that improvement is possible. At the very least y can be made
a u32 I suppose.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:00   ` Peter Zijlstra
@ 2017-12-20 16:17     ` Crt Mori
  2017-12-20 16:46       ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Crt Mori @ 2017-12-20 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Laight, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>
>> With minor changes it ought to be possible to remove most of the
>> 64bit arithmetic and shifts.
>>
>> If you care about performance then using 32 bit maths will be much faster.
>
> Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> indicated that improvement is possible. At the very least y can be made
> a u32 I suppose.

OK, is there any more easy optimizations you see?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:17     ` Crt Mori
@ 2017-12-20 16:46       ` David Laight
  2017-12-20 17:19         ` Joe Perches
  2017-12-20 17:30         ` Crt Mori
  0 siblings, 2 replies; 20+ messages in thread
From: David Laight @ 2017-12-20 16:46 UTC (permalink / raw)
  To: 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 16:17
> 
> On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> >
> >> With minor changes it ought to be possible to remove most of the
> >> 64bit arithmetic and shifts.
> >>
> >> If you care about performance then using 32 bit maths will be much faster.
> >
> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > indicated that improvement is possible. At the very least y can be made
> > a u32 I suppose.
> 
> OK, is there any more easy optimizations you see?

I think this version works.
It doesn't have the optimisation for small values.

unsigned int sqrt64(unsigned long long x)
{
        unsigned int x_hi = x >> 32;

        unsigned int b = 0;
        unsigned int y = 0;
        unsigned int i;

        for (i = 0; i < 32; i++) {
                b <<= 2;
                b |= x_hi >> 30;
                x_hi <<= 2;
                if (i == 15)
                        x_hi = x;
                y <<= 1;
                if (b > y)
                        b -= ++y;
        }
        return y;
}

Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
and compare to that of your version.

	David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:46       ` David Laight
@ 2017-12-20 17:19         ` Joe Perches
  2017-12-21 10:02           ` David Laight
  2017-12-20 17:30         ` Crt Mori
  1 sibling, 1 reply; 20+ messages in thread
From: Joe Perches @ 2017-12-20 17:19 UTC (permalink / raw)
  To: David Laight, 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio

On Wed, 2017-12-20 at 16:46 +0000, David Laight wrote:
> From: Crt Mori
> > Sent: 20 December 2017 16:17
> > 
> > On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> > > 
> > > > With minor changes it ought to be possible to remove most of the
> > > > 64bit arithmetic and shifts.
> > > > 
> > > > If you care about performance then using 32 bit maths will be much faster.
> > > 
> > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > > indicated that improvement is possible. At the very least y can be made
> > > a u32 I suppose.
> > 
> > OK, is there any more easy optimizations you see?
> 
> I think this version works.
> It doesn't have the optimisation for small values.
> 
> unsigned int sqrt64(unsigned long long x)
> {
>         unsigned int x_hi = x >> 32;
> 
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;
> 

Perhaps add:

	if (x <= UINT_MAX)
		return int_sqrt((unsigned long)x);

>         for (i = 0; i < 32; i++) {
>                 b <<= 2;
>                 b |= x_hi >> 30;
>                 x_hi <<= 2;
>                 if (i == 15)
>                         x_hi = x;
>                 y <<= 1;
>                 if (b > y)
>                         b -= ++y;
>         }
>         return y;
> }
> 
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
> 
> 	David
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:46       ` David Laight
  2017-12-20 17:19         ` Joe Perches
@ 2017-12-20 17:30         ` Crt Mori
  2017-12-21 10:08           ` David Laight
                             ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Crt Mori @ 2017-12-20 17:30 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 20 December 2017 at 17:46, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 16:17
>>
>> On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>> >
>> >> With minor changes it ought to be possible to remove most of the
>> >> 64bit arithmetic and shifts.
>> >>
>> >> If you care about performance then using 32 bit maths will be much faster.
>> >
>> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
>> > indicated that improvement is possible. At the very least y can be made
>> > a u32 I suppose.
>>
>> OK, is there any more easy optimizations you see?
>
> I think this version works.
> It doesn't have the optimisation for small values.
>
> unsigned int sqrt64(unsigned long long x)
> {
>         unsigned int x_hi = x >> 32;
>
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;
>
>         for (i = 0; i < 32; i++) {
>                 b <<= 2;
>                 b |= x_hi >> 30;
>                 x_hi <<= 2;
>                 if (i == 15)
>                         x_hi = x;
>                 y <<= 1;
>                 if (b > y)
>                         b -= ++y;
>         }
>         return y;
> }
>
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
>
>         David
>

Wow, thanks. This seems like a major change.

I did a quick run through unit tests for the sensor and the results
are way off. On the sensor I had to convert double calculations to
integer calculations and target was to get end result within 0.02 degC
(with previous approximate sqrt implementation) in sensor working
range. This now gets into 3 degC delta at least and some are way off.
It might be off because of some scaling on the other hand during the
equation (not exactly comparing sqrt implementations here).

I will need a bit more testing of this to see if it is replaceable.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:19         ` Joe Perches
@ 2017-12-21 10:02           ` David Laight
  0 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2017-12-21 10:02 UTC (permalink / raw)
  To: 'Joe Perches', 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio

From: Joe Perches
> Sent: 20 December 2017 17:20
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi = x >> 32;
> >
> >         unsigned int b = 0;
> >         unsigned int y = 0;
> >         unsigned int i;
> >
> 
> Perhaps add:
> 
> 	if (x <= UINT_MAX)
> 		return int_sqrt((unsigned long)x);

Actually something like:
	i = 32;
	if (!x_hi) {
		x_hi = x;
		i = 16;
	}
	
	if (!(x_hi & 0xffff0000)) {
		x_hi <<= 16;
		i -= 8;
	}
Repeat for 0xff000000, 0xf0000000 and 0xc0000000 and adjust loop to count down.

	David

> 
> >         for (i = 0; i < 32; i++) {
> >                 b <<= 2;
> >                 b |= x_hi >> 30;
> >                 x_hi <<= 2;
> >                 if (i == 15)
> >                         x_hi = x;
> >                 y <<= 1;
> >                 if (b > y)
> >                         b -= ++y;
> >         }
> >         return y;
> > }
> >
> > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> > and compare to that of your version.
> >
> > 	David
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:30         ` Crt Mori
@ 2017-12-21 10:08           ` David Laight
  2017-12-21 10:59           ` David Laight
  2017-12-21 11:43           ` David Laight
  2 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2017-12-21 10:08 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 17:30
> To: David Laight
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi = x >> 32;
> >
> >         unsigned int b = 0;
> >         unsigned int y = 0;
> >         unsigned int i;
> >
> >         for (i = 0; i < 32; i++) {
> >                 b <<= 2;
> >                 b |= x_hi >> 30;
> >                 x_hi <<= 2;
> >                 if (i == 15)
> >                         x_hi = x;
> >                 y <<= 1;
> >                 if (b > y)
> >                         b -= ++y;
> >         }
> >         return y;
> > }
> 
> Wow, thanks. This seems like a major change.

Not really, it is just doing it slightly differently.
The algorithm is the 'schoolboy' one that used to be taught at school.
(Although I found it in a boot from the 1930s.)

I compared the object code for amd64 (doesn't need to reload
'x_hi' half way through) against the original loop.
They are much the same size, execution speed will depend the lengths
of the dependency chains.

> I did a quick run through unit tests for the sensor and the results
> are way off. On the sensor I had to convert double calculations to
> integer calculations and target was to get end result within 0.02 degC
> (with previous approximate sqrt implementation) in sensor working
> range. This now gets into 3 degC delta at least and some are way off.
> It might be off because of some scaling on the other hand during the
> equation (not exactly comparing sqrt implementations here).
> 
> I will need a bit more testing of this to see if it is replaceable.

You probably need to put values through both (all 3) functions to see
where you are getting errors.
It might be rounding, or there might be a fubar somewhere.

	David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:30         ` Crt Mori
  2017-12-21 10:08           ` David Laight
@ 2017-12-21 10:59           ` David Laight
  2017-12-21 11:43           ` David Laight
  2 siblings, 0 replies; 20+ messages in thread
From: David Laight @ 2017-12-21 10:59 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 17:30
> >> OK, is there any more easy optimizations you see?
> >
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi = x >> 32;
> >
> >         unsigned int b = 0;
> >         unsigned int y = 0;
> >         unsigned int i;
> >
> >         for (i = 0; i < 32; i++) {
> >                 b <<= 2;
> >                 b |= x_hi >> 30;
> >                 x_hi <<= 2;
> >                 if (i == 15)
> >                         x_hi = x;
> >                 y <<= 1;
> >                 if (b > y)
> >                         b -= ++y;
> >         }
> >         return y;
> > }
..
> 
> I did a quick run through unit tests for the sensor and the results
> are way off. On the sensor I had to convert double calculations to
> integer calculations and target was to get end result within 0.02 degC
> (with previous approximate sqrt implementation) in sensor working
> range. This now gets into 3 degC delta at least and some are way off.
> It might be off because of some scaling on the other hand during the
> equation (not exactly comparing sqrt implementations here).

I didn't get it quite right...
The last few lines need to be:
		if (b > y) {	
			b -= ++y;
			y++;
		}
	}
	return y >> 1;
}

Although that then fails for inputs larger than 2^62.

	David

	

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:30         ` Crt Mori
  2017-12-21 10:08           ` David Laight
  2017-12-21 10:59           ` David Laight
@ 2017-12-21 11:43           ` David Laight
  2017-12-21 13:17             ` Crt Mori
  2 siblings, 1 reply; 20+ messages in thread
From: David Laight @ 2017-12-21 11:43 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 17:30
> I did a quick run through unit tests for the sensor and the results
> are way off
> ...

Try this version instead:
unsigned int sqrt64(unsigned long long x_in)
{
        unsigned int x = x_in >> 32;

        unsigned int b = 0;
        unsigned int y = 0;
        unsigned int i;

        i = 31;
        if (!x) {
                x = x_in;
                i = 15;
        }
        if (!(x & 0xffff0000)) {
                x <<= 16;
                i -= 8;
        }
        if (!(x & 0xff000000)) {
                x <<= 8;
                i -= 4;
        }
        if (!(x & 0xf0000000)) {
                x <<= 4;
                i -= 2;
        }

        do {
                b <<= 2;
                b |= x >> 30;
                x <<= 2;
                if (i == 16)
                        x = x_in;
                y <<= 1;
                if (b > y) {
                        b -= ++y;
                        y++;
                }
        } while (--i);

        /* 'b' becomes 33 bits if the input is greater than 2^62 */
        b <<= 1;
        b |= x >> 31;
        if (b > y || (b == y && x & (1u << 30)))
                y |= 1;

        return y;
}

I've tested that one with more values.

	David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 11:43           ` David Laight
@ 2017-12-21 13:17             ` Crt Mori
  2017-12-21 13:56               ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Crt Mori @ 2017-12-21 13:17 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 21 December 2017 at 12:43, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 17:30
>> I did a quick run through unit tests for the sensor and the results
>> are way off
>> ...
>
> Try this version instead:
> unsigned int sqrt64(unsigned long long x_in)
> {
>         unsigned int x = x_in >> 32;
>
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;

i can be u8. And I will still use explicit typing.

>
>         i = 31;
>         if (!x) {
>                 x = x_in;
>                 i = 15;
>         }
>         if (!(x & 0xffff0000)) {
>                 x <<= 16;
>                 i -= 8;
>         }
>         if (!(x & 0xff000000)) {
>                 x <<= 8;
>                 i -= 4;
>         }
>         if (!(x & 0xf0000000)) {
>                 x <<= 4;
>                 i -= 2;
>         }
>

This part above looks like FLS

>         do {
>                 b <<= 2;
>                 b |= x >> 30;
>                 x <<= 2;
>                 if (i == 16)
>                         x = x_in;
>                 y <<= 1;
>                 if (b > y) {
>                         b -= ++y;
>                         y++;
>                 }
>         } while (--i);
>
>         /* 'b' becomes 33 bits if the input is greater than 2^62 */
>         b <<= 1;
>         b |= x >> 31;
>         if (b > y || (b == y && x & (1u << 30)))
>                 y |= 1;
>
>         return y;
> }
>
> I've tested that one with more values.
>
>         David
>

This one indeed works. I did some more testing this morning and I am
fine with either.

So question is: Do I make change as per David's suggestion with his
sign-off, or leave the version it was before the change?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 13:17             ` Crt Mori
@ 2017-12-21 13:56               ` David Laight
  2017-12-21 14:11                 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: David Laight @ 2017-12-21 13:56 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 21 December 2017 13:18
...
> >         unsigned int i;
> 
> i can be u8. And I will still use explicit typing.

u8 will add extra code, unsigned int is good.
'x' needs to be u32, 'y' and 'b' could be larger.

I was testing in userspace.

...
> This part above looks like FLS
It also does the rest of the required shifts.

...
> This one indeed works. I did some more testing this morning and I am
> fine with either.
> 
> So question is: Do I make change as per David's suggestion with his
> sign-off, or leave the version it was before the change?

If you generate the actual patch I can add an appropriate sign-off
(or whatever is needed).

	David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 13:56               ` David Laight
@ 2017-12-21 14:11                 ` Peter Zijlstra
  2017-12-21 14:48                   ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-12-21 14:11 UTC (permalink / raw)
  To: David Laight
  Cc: 'Crt Mori',
	Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

On Thu, Dec 21, 2017 at 01:56:55PM +0000, David Laight wrote:
> > This part above looks like FLS
> It also does the rest of the required shifts.

Still, fls() + shift is way faster on hardware that has an fls
instruction.

Writing out that binary search doesn't make sense.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 14:11                 ` Peter Zijlstra
@ 2017-12-21 14:48                   ` David Laight
  2017-12-22 13:44                     ` Crt Mori
  0 siblings, 1 reply; 20+ messages in thread
From: David Laight @ 2017-12-21 14:48 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Crt Mori',
	Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

From: Peter Zijlstra
> Sent: 21 December 2017 14:12
...
> > > This part above looks like FLS
> > It also does the rest of the required shifts.
> 
> Still, fls() + shift is way faster on hardware that has an fls
> instruction.
> 
> Writing out that binary search doesn't make sense.

If the hardware doesn't have an appropriate fls instruction
the soft fls()will be worse.

If you used fls() you'd still need quite a bit of code
to generate the correct shift and loop count adjustment.
Given the cost of the loop iterations the 3 tests are noise.
The open coded version is obviously correct...

I didn't add the 4th one because the code always does 2 iterations.

If you were really worried about performance there are faster
algorithms (even doing 2 or 4 bits a time is faster).

	David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 14:48                   ` David Laight
@ 2017-12-22 13:44                     ` Crt Mori
  2018-01-09 15:18                       ` Crt Mori
  0 siblings, 1 reply; 20+ messages in thread
From: Crt Mori @ 2017-12-22 13:44 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

>From simple strong typing of existing int_sqrt we came to something a
bit more complex or better. Can we decide now which we want in, or I
submit v12 and we decide then (although it is not a v12, but whole new
thing)?

On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote:
> From: Peter Zijlstra
>> Sent: 21 December 2017 14:12
> ...
>> > > This part above looks like FLS
>> > It also does the rest of the required shifts.
>>
>> Still, fls() + shift is way faster on hardware that has an fls
>> instruction.
>>
>> Writing out that binary search doesn't make sense.
>
> If the hardware doesn't have an appropriate fls instruction
> the soft fls()will be worse.
>
> If you used fls() you'd still need quite a bit of code
> to generate the correct shift and loop count adjustment.
> Given the cost of the loop iterations the 3 tests are noise.
> The open coded version is obviously correct...
>
> I didn't add the 4th one because the code always does 2 iterations.
>
> If you were really worried about performance there are faster
> algorithms (even doing 2 or 4 bits a time is faster).
>
>         David
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-22 13:44                     ` Crt Mori
@ 2018-01-09 15:18                       ` Crt Mori
  2018-01-12  9:41                         ` David Laight
  0 siblings, 1 reply; 20+ messages in thread
From: Crt Mori @ 2018-01-09 15:18 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

It has been some time now since this moved. I have decided not to use
David's implementation because I want to maintain also range above
2^62

Are there any additional objections for this not to go in as it is?

On 22 December 2017 at 14:44, Crt Mori <cmo@melexis.com> wrote:
>
> From simple strong typing of existing int_sqrt we came to something a
> bit more complex or better. Can we decide now which we want in, or I
> submit v12 and we decide then (although it is not a v12, but whole new
> thing)?
>
> On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote:
> > From: Peter Zijlstra
> >> Sent: 21 December 2017 14:12
> > ...
> >> > > This part above looks like FLS
> >> > It also does the rest of the required shifts.
> >>
> >> Still, fls() + shift is way faster on hardware that has an fls
> >> instruction.
> >>
> >> Writing out that binary search doesn't make sense.
> >
> > If the hardware doesn't have an appropriate fls instruction
> > the soft fls()will be worse.
> >
> > If you used fls() you'd still need quite a bit of code
> > to generate the correct shift and loop count adjustment.
> > Given the cost of the loop iterations the 3 tests are noise.
> > The open coded version is obviously correct...
> >
> > I didn't add the 4th one because the code always does 2 iterations.
> >
> > If you were really worried about performance there are faster
> > algorithms (even doing 2 or 4 bits a time is faster).
> >
> >         David
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2018-01-09 15:18                       ` Crt Mori
@ 2018-01-12  9:41                         ` David Laight
  2018-01-15  8:17                           ` Crt Mori
  0 siblings, 1 reply; 20+ messages in thread
From: David Laight @ 2018-01-12  9:41 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori [mailto:cmo@melexis.com]
> Sent: 09 January 2018 15:18
> 
> It has been some time now since this moved. I have decided not to use
> David's implementation because I want to maintain also range above
> 2^62

The last version I did supported the full range.

	David


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2018-01-12  9:41                         ` David Laight
@ 2018-01-15  8:17                           ` Crt Mori
  0 siblings, 0 replies; 20+ messages in thread
From: Crt Mori @ 2018-01-15  8:17 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 12 January 2018 at 10:41, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori [mailto:cmo@melexis.com]
>> Sent: 09 January 2018 15:18
>>
>> It has been some time now since this moved. I have decided not to use
>> David's implementation because I want to maintain also range above
>> 2^62
>
> The last version I did supported the full range.

Nothing changed below that comment, so I was assuming it is not
supported (or did I miss a mail?).

Also fls discussion had opposite opinions or it just seems inconclusive to me?

>
>         David
>

So you all agree David Laight version is much better and should be
used instead of currently proposed version?

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-01-15  8:18 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-20 14:20 [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt Crt Mori
2017-12-20 14:39 ` David Laight
2017-12-20 15:41   ` Crt Mori
2017-12-20 16:00   ` Peter Zijlstra
2017-12-20 16:17     ` Crt Mori
2017-12-20 16:46       ` David Laight
2017-12-20 17:19         ` Joe Perches
2017-12-21 10:02           ` David Laight
2017-12-20 17:30         ` Crt Mori
2017-12-21 10:08           ` David Laight
2017-12-21 10:59           ` David Laight
2017-12-21 11:43           ` David Laight
2017-12-21 13:17             ` Crt Mori
2017-12-21 13:56               ` David Laight
2017-12-21 14:11                 ` Peter Zijlstra
2017-12-21 14:48                   ` David Laight
2017-12-22 13:44                     ` Crt Mori
2018-01-09 15:18                       ` Crt Mori
2018-01-12  9:41                         ` David Laight
2018-01-15  8:17                           ` Crt Mori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).