All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-20 14:20 Crt Mori
  2017-12-20 14:39   ` David Laight
  0 siblings, 1 reply; 29+ messages in thread
From: Crt Mori @ 2017-12-20 14:20 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Ingo Molnar, Andrew Morton, Kees Cook, Rusty Russell, Ian Abbott,
	Larry Finger, Niklas Soderlund, Thomas Gleixner,
	Krzysztof Kozlowski, Masahiro Yamada, linux-kernel, linux-iio,
	Peter Zijlstra, Joe Perches, Crt Mori

There is no option to perform 64bit integer sqrt on 32bit platform.
Added stronger typed int_sqrt64 enables the 64bit calculations to
be performed on 32bit platforms. Although int_sqrt() is a rough
approximation, the same algorithm is used in int_sqrt64() as good
enough on 32bit platform.

Signed-off-by: Crt Mori <cmo@melexis.com>
---
 include/linux/kernel.h |  9 +++++++++
 lib/int_sqrt.c         | 31 +++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0ad4c3044cf9..e4a3dc64e650 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -460,6 +460,15 @@ extern int func_ptr_is_kernel_text(void *ptr);
 
 unsigned long int_sqrt(unsigned long);
 
+#if BITS_PER_LONG < 64
+u32 int_sqrt64(u64 x);
+#else
+static inline u32 int_sqrt64(u64 x)
+{
+	return (u32)int_sqrt(x);
+}
+#endif
+
 extern void bust_spinlocks(int yes);
 extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
 extern int panic_timeout;
diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c
index 1ef4cc344977..184da54677ec 100644
--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -7,6 +7,7 @@
 
 #include <linux/kernel.h>
 #include <linux/export.h>
+#include <linux/bitops.h>
 
 /**
  * int_sqrt - rough approximation to sqrt
@@ -36,3 +37,33 @@ unsigned long int_sqrt(unsigned long x)
 	return y;
 }
 EXPORT_SYMBOL(int_sqrt);
+
+#if BITS_PER_LONG < 64
+/**
+ * int_sqrt64 - strongly typed int_sqrt function when minimum 64 bit input
+ * is expected.
+ * @x: 64bit integer of which to calculate the sqrt
+ */
+u32 int_sqrt64(u64 x)
+{
+	u64 b, m, y = 0;
+
+	if (x <= 1)
+		return x;
+
+	m = 1ULL << (fls64(x) & ~1ULL);
+	while (m != 0) {
+		b = y + m;
+		y >>= 1;
+
+		if (x >= b) {
+			x -= b;
+			y += m;
+		}
+		m >>= 2;
+	}
+
+	return y;
+}
+EXPORT_SYMBOL(int_sqrt64);
+#endif
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 14:20 [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt Crt Mori
@ 2017-12-20 14:39   ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-20 14:39 UTC (permalink / raw)
  To: 'Crt Mori', Jonathan Cameron
  Cc: Ingo Molnar, Andrew Morton, Kees Cook, Rusty Russell, Ian Abbott,
	Larry Finger, Niklas Soderlund, Thomas Gleixner,
	Krzysztof Kozlowski, Masahiro Yamada, linux-kernel, linux-iio,
	Peter Zijlstra, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 14:20
> 
> There is no option to perform 64bit integer sqrt on 32bit platform.
> Added stronger typed int_sqrt64 enables the 64bit calculations to
> be performed on 32bit platforms. Although int_sqrt() is a rough
> approximation, the same algorithm is used in int_sqrt64() as good
> enough on 32bit platform.

The algorithm used gives an exact (rounded down) square root,
not an approximation.

With minor changes it ought to be possible to remove most of the
64bit arithmetic and shifts.

If you care about performance then using 32 bit maths will be much faster.

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-20 14:39   ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-20 14:39 UTC (permalink / raw)
  To: 'Crt Mori', Jonathan Cameron
  Cc: Ingo Molnar, Andrew Morton, Kees Cook, Rusty Russell, Ian Abbott,
	Larry Finger, Niklas Soderlund, Thomas Gleixner,
	Krzysztof Kozlowski, Masahiro Yamada, linux-kernel, linux-iio,
	Peter Zijlstra, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 14:20
>=20
> There is no option to perform 64bit integer sqrt on 32bit platform.
> Added stronger typed int_sqrt64 enables the 64bit calculations to
> be performed on 32bit platforms. Although int_sqrt() is a rough
> approximation, the same algorithm is used in int_sqrt64() as good
> enough on 32bit platform.

The algorithm used gives an exact (rounded down) square root,
not an approximation.

With minor changes it ought to be possible to remove most of the
64bit arithmetic and shifts.

If you care about performance then using 32 bit maths will be much faster.

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 14:39   ` David Laight
  (?)
@ 2017-12-20 15:41   ` Crt Mori
  -1 siblings, 0 replies; 29+ messages in thread
From: Crt Mori @ 2017-12-20 15:41 UTC (permalink / raw)
  To: David Laight
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Peter Zijlstra, Joe Perches

On 20 December 2017 at 15:39, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 14:20
>>
>> There is no option to perform 64bit integer sqrt on 32bit platform.
>> Added stronger typed int_sqrt64 enables the 64bit calculations to
>> be performed on 32bit platforms. Although int_sqrt() is a rough
>> approximation, the same algorithm is used in int_sqrt64() as good
>> enough on 32bit platform.
>
> The algorithm used gives an exact (rounded down) square root,
> not an approximation.
>

OK, noted. This is from new changed function indeed - will change in v11.

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
>
> If you care about performance then using 32 bit maths will be much faster.
>

I need precision not the performance. That is why I would like to
leave int_sqrt as one for good performance while my will be used when
you require a precision.

>         David
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 14:39   ` David Laight
  (?)
  (?)
@ 2017-12-20 16:00   ` Peter Zijlstra
  2017-12-20 16:17     ` Crt Mori
  -1 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-12-20 16:00 UTC (permalink / raw)
  To: David Laight
  Cc: 'Crt Mori',
	Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
> 
> If you care about performance then using 32 bit maths will be much faster.

Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
indicated that improvement is possible. At the very least y can be made
a u32 I suppose.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:00   ` Peter Zijlstra
@ 2017-12-20 16:17     ` Crt Mori
  2017-12-20 16:46         ` David Laight
  0 siblings, 1 reply; 29+ messages in thread
From: Crt Mori @ 2017-12-20 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Laight, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>
>> With minor changes it ought to be possible to remove most of the
>> 64bit arithmetic and shifts.
>>
>> If you care about performance then using 32 bit maths will be much faster.
>
> Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> indicated that improvement is possible. At the very least y can be made
> a u32 I suppose.

OK, is there any more easy optimizations you see?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:17     ` Crt Mori
@ 2017-12-20 16:46         ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-20 16:46 UTC (permalink / raw)
  To: 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 16:17
> 
> On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> >
> >> With minor changes it ought to be possible to remove most of the
> >> 64bit arithmetic and shifts.
> >>
> >> If you care about performance then using 32 bit maths will be much faster.
> >
> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > indicated that improvement is possible. At the very least y can be made
> > a u32 I suppose.
> 
> OK, is there any more easy optimizations you see?

I think this version works.
It doesn't have the optimisation for small values.

unsigned int sqrt64(unsigned long long x)
{
        unsigned int x_hi = x >> 32;

        unsigned int b = 0;
        unsigned int y = 0;
        unsigned int i;

        for (i = 0; i < 32; i++) {
                b <<= 2;
                b |= x_hi >> 30;
                x_hi <<= 2;
                if (i == 15)
                        x_hi = x;
                y <<= 1;
                if (b > y)
                        b -= ++y;
        }
        return y;
}

Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
and compare to that of your version.

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-20 16:46         ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-20 16:46 UTC (permalink / raw)
  To: 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjAgRGVjZW1iZXIgMjAxNyAxNjoxNw0KPiANCj4gT24g
MjAgRGVjZW1iZXIgMjAxNyBhdCAxNzowMCwgUGV0ZXIgWmlqbHN0cmEgPHBldGVyekBpbmZyYWRl
YWQub3JnPiB3cm90ZToNCj4gPiBPbiBXZWQsIERlYyAyMCwgMjAxNyBhdCAwMjozOToyNlBNICsw
MDAwLCBEYXZpZCBMYWlnaHQgd3JvdGU6DQo+ID4NCj4gPj4gV2l0aCBtaW5vciBjaGFuZ2VzIGl0
IG91Z2h0IHRvIGJlIHBvc3NpYmxlIHRvIHJlbW92ZSBtb3N0IG9mIHRoZQ0KPiA+PiA2NGJpdCBh
cml0aG1ldGljIGFuZCBzaGlmdHMuDQo+ID4+DQo+ID4+IElmIHlvdSBjYXJlIGFib3V0IHBlcmZv
cm1hbmNlIHRoZW4gdXNpbmcgMzIgYml0IG1hdGhzIHdpbGwgYmUgbXVjaCBmYXN0ZXIuDQo+ID4N
Cj4gPiBTb21lLCB1NjQgYWRkL3N1Yi9zaGlmdCBpc24ndCBleGFjdGx5IGV4cGVuc2l2ZSwgYnV0
IHllcywgSSBhbHNvDQo+ID4gaW5kaWNhdGVkIHRoYXQgaW1wcm92ZW1lbnQgaXMgcG9zc2libGUu
IEF0IHRoZSB2ZXJ5IGxlYXN0IHkgY2FuIGJlIG1hZGUNCj4gPiBhIHUzMiBJIHN1cHBvc2UuDQo+
IA0KPiBPSywgaXMgdGhlcmUgYW55IG1vcmUgZWFzeSBvcHRpbWl6YXRpb25zIHlvdSBzZWU/DQoN
CkkgdGhpbmsgdGhpcyB2ZXJzaW9uIHdvcmtzLg0KSXQgZG9lc24ndCBoYXZlIHRoZSBvcHRpbWlz
YXRpb24gZm9yIHNtYWxsIHZhbHVlcy4NCg0KdW5zaWduZWQgaW50IHNxcnQ2NCh1bnNpZ25lZCBs
b25nIGxvbmcgeCkNCnsNCiAgICAgICAgdW5zaWduZWQgaW50IHhfaGkgPSB4ID4+IDMyOw0KDQog
ICAgICAgIHVuc2lnbmVkIGludCBiID0gMDsNCiAgICAgICAgdW5zaWduZWQgaW50IHkgPSAwOw0K
ICAgICAgICB1bnNpZ25lZCBpbnQgaTsNCg0KICAgICAgICBmb3IgKGkgPSAwOyBpIDwgMzI7IGkr
Kykgew0KICAgICAgICAgICAgICAgIGIgPDw9IDI7DQogICAgICAgICAgICAgICAgYiB8PSB4X2hp
ID4+IDMwOw0KICAgICAgICAgICAgICAgIHhfaGkgPDw9IDI7DQogICAgICAgICAgICAgICAgaWYg
KGkgPT0gMTUpDQogICAgICAgICAgICAgICAgICAgICAgICB4X2hpID0geDsNCiAgICAgICAgICAg
ICAgICB5IDw8PSAxOw0KICAgICAgICAgICAgICAgIGlmIChiID4geSkNCiAgICAgICAgICAgICAg
ICAgICAgICAgIGIgLT0gKyt5Ow0KICAgICAgICB9DQogICAgICAgIHJldHVybiB5Ow0KfQ0KDQpQ
dXQgaXQgdGhyb3VnaCBjYyAtTzMgLW0zMiAtYyAtbyBzcXJ0NjQubyBzcXJ0NjQuYyBhbmQgdGhl
biBvYmpkdW1wIHNxcnQ2NC5vDQphbmQgY29tcGFyZSB0byB0aGF0IG9mIHlvdXIgdmVyc2lvbi4N
Cg0KCURhdmlkDQoNCg==

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:46         ` David Laight
@ 2017-12-20 17:19           ` Joe Perches
  -1 siblings, 0 replies; 29+ messages in thread
From: Joe Perches @ 2017-12-20 17:19 UTC (permalink / raw)
  To: David Laight, 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio

On Wed, 2017-12-20 at 16:46 +0000, David Laight wrote:
> From: Crt Mori
> > Sent: 20 December 2017 16:17
> > 
> > On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> > > 
> > > > With minor changes it ought to be possible to remove most of the
> > > > 64bit arithmetic and shifts.
> > > > 
> > > > If you care about performance then using 32 bit maths will be much faster.
> > > 
> > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > > indicated that improvement is possible. At the very least y can be made
> > > a u32 I suppose.
> > 
> > OK, is there any more easy optimizations you see?
> 
> I think this version works.
> It doesn't have the optimisation for small values.
> 
> unsigned int sqrt64(unsigned long long x)
> {
>         unsigned int x_hi = x >> 32;
> 
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;
> 

Perhaps add:

	if (x <= UINT_MAX)
		return int_sqrt((unsigned long)x);

>         for (i = 0; i < 32; i++) {
>                 b <<= 2;
>                 b |= x_hi >> 30;
>                 x_hi <<= 2;
>                 if (i == 15)
>                         x_hi = x;
>                 y <<= 1;
>                 if (b > y)
>                         b -= ++y;
>         }
>         return y;
> }
> 
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
> 
> 	David
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-20 17:19           ` Joe Perches
  0 siblings, 0 replies; 29+ messages in thread
From: Joe Perches @ 2017-12-20 17:19 UTC (permalink / raw)
  To: David Laight, 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio

On Wed, 2017-12-20 at 16:46 +0000, David Laight wrote:
> From: Crt Mori
> > Sent: 20 December 2017 16:17
> > 
> > On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> > > 
> > > > With minor changes it ought to be possible to remove most of the
> > > > 64bit arithmetic and shifts.
> > > > 
> > > > If you care about performance then using 32 bit maths will be much faster.
> > > 
> > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > > indicated that improvement is possible. At the very least y can be made
> > > a u32 I suppose.
> > 
> > OK, is there any more easy optimizations you see?
> 
> I think this version works.
> It doesn't have the optimisation for small values.
> 
> unsigned int sqrt64(unsigned long long x)
> {
>         unsigned int x_hi = x >> 32;
> 
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;
> 

Perhaps add:

	if (x <= UINT_MAX)
		return int_sqrt((unsigned long)x);

>         for (i = 0; i < 32; i++) {
>                 b <<= 2;
>                 b |= x_hi >> 30;
>                 x_hi <<= 2;
>                 if (i == 15)
>                         x_hi = x;
>                 y <<= 1;
>                 if (b > y)
>                         b -= ++y;
>         }
>         return y;
> }
> 
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
> 
> 	David
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 16:46         ` David Laight
  (?)
  (?)
@ 2017-12-20 17:30         ` Crt Mori
  2017-12-21 10:08             ` David Laight
                             ` (2 more replies)
  -1 siblings, 3 replies; 29+ messages in thread
From: Crt Mori @ 2017-12-20 17:30 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 20 December 2017 at 17:46, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 16:17
>>
>> On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>> >
>> >> With minor changes it ought to be possible to remove most of the
>> >> 64bit arithmetic and shifts.
>> >>
>> >> If you care about performance then using 32 bit maths will be much faster.
>> >
>> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
>> > indicated that improvement is possible. At the very least y can be made
>> > a u32 I suppose.
>>
>> OK, is there any more easy optimizations you see?
>
> I think this version works.
> It doesn't have the optimisation for small values.
>
> unsigned int sqrt64(unsigned long long x)
> {
>         unsigned int x_hi = x >> 32;
>
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;
>
>         for (i = 0; i < 32; i++) {
>                 b <<= 2;
>                 b |= x_hi >> 30;
>                 x_hi <<= 2;
>                 if (i == 15)
>                         x_hi = x;
>                 y <<= 1;
>                 if (b > y)
>                         b -= ++y;
>         }
>         return y;
> }
>
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
>
>         David
>

Wow, thanks. This seems like a major change.

I did a quick run through unit tests for the sensor and the results
are way off. On the sensor I had to convert double calculations to
integer calculations and target was to get end result within 0.02 degC
(with previous approximate sqrt implementation) in sensor working
range. This now gets into 3 degC delta at least and some are way off.
It might be off because of some scaling on the other hand during the
equation (not exactly comparing sqrt implementations here).

I will need a bit more testing of this to see if it is replaceable.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:19           ` Joe Perches
@ 2017-12-21 10:02             ` David Laight
  -1 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 10:02 UTC (permalink / raw)
  To: 'Joe Perches', 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio

From: Joe Perches
> Sent: 20 December 2017 17:20
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi = x >> 32;
> >
> >         unsigned int b = 0;
> >         unsigned int y = 0;
> >         unsigned int i;
> >
> 
> Perhaps add:
> 
> 	if (x <= UINT_MAX)
> 		return int_sqrt((unsigned long)x);

Actually something like:
	i = 32;
	if (!x_hi) {
		x_hi = x;
		i = 16;
	}
	
	if (!(x_hi & 0xffff0000)) {
		x_hi <<= 16;
		i -= 8;
	}
Repeat for 0xff000000, 0xf0000000 and 0xc0000000 and adjust loop to count down.

	David

> 
> >         for (i = 0; i < 32; i++) {
> >                 b <<= 2;
> >                 b |= x_hi >> 30;
> >                 x_hi <<= 2;
> >                 if (i == 15)
> >                         x_hi = x;
> >                 y <<= 1;
> >                 if (b > y)
> >                         b -= ++y;
> >         }
> >         return y;
> > }
> >
> > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> > and compare to that of your version.
> >
> > 	David
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-21 10:02             ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 10:02 UTC (permalink / raw)
  To: 'Joe Perches', 'Crt Mori', Peter Zijlstra
  Cc: Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio

From: Joe Perches
> Sent: 20 December 2017 17:20
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi =3D x >> 32;
> >
> >         unsigned int b =3D 0;
> >         unsigned int y =3D 0;
> >         unsigned int i;
> >
>=20
> Perhaps add:
>=20
> 	if (x <=3D UINT_MAX)
> 		return int_sqrt((unsigned long)x);

Actually something like:
	i =3D 32;
	if (!x_hi) {
		x_hi =3D x;
		i =3D 16;
	}
=09
	if (!(x_hi & 0xffff0000)) {
		x_hi <<=3D 16;
		i -=3D 8;
	}
Repeat for 0xff000000, 0xf0000000 and 0xc0000000 and adjust loop to count d=
own.

	David

>=20
> >         for (i =3D 0; i < 32; i++) {
> >                 b <<=3D 2;
> >                 b |=3D x_hi >> 30;
> >                 x_hi <<=3D 2;
> >                 if (i =3D=3D 15)
> >                         x_hi =3D x;
> >                 y <<=3D 1;
> >                 if (b > y)
> >                         b -=3D ++y;
> >         }
> >         return y;
> > }
> >
> > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqr=
t64.o
> > and compare to that of your version.
> >
> > 	David
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:30         ` Crt Mori
@ 2017-12-21 10:08             ` David Laight
  2017-12-21 10:59             ` David Laight
  2017-12-21 11:43             ` David Laight
  2 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 10:08 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 17:30
> To: David Laight
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi = x >> 32;
> >
> >         unsigned int b = 0;
> >         unsigned int y = 0;
> >         unsigned int i;
> >
> >         for (i = 0; i < 32; i++) {
> >                 b <<= 2;
> >                 b |= x_hi >> 30;
> >                 x_hi <<= 2;
> >                 if (i == 15)
> >                         x_hi = x;
> >                 y <<= 1;
> >                 if (b > y)
> >                         b -= ++y;
> >         }
> >         return y;
> > }
> 
> Wow, thanks. This seems like a major change.

Not really, it is just doing it slightly differently.
The algorithm is the 'schoolboy' one that used to be taught at school.
(Although I found it in a boot from the 1930s.)

I compared the object code for amd64 (doesn't need to reload
'x_hi' half way through) against the original loop.
They are much the same size, execution speed will depend the lengths
of the dependency chains.

> I did a quick run through unit tests for the sensor and the results
> are way off. On the sensor I had to convert double calculations to
> integer calculations and target was to get end result within 0.02 degC
> (with previous approximate sqrt implementation) in sensor working
> range. This now gets into 3 degC delta at least and some are way off.
> It might be off because of some scaling on the other hand during the
> equation (not exactly comparing sqrt implementations here).
> 
> I will need a bit more testing of this to see if it is replaceable.

You probably need to put values through both (all 3) functions to see
where you are getting errors.
It might be rounding, or there might be a fubar somewhere.

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-21 10:08             ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 10:08 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjAgRGVjZW1iZXIgMjAxNyAxNzozMA0KPiBUbzogRGF2
aWQgTGFpZ2h0DQouLi4NCj4gPiBJIHRoaW5rIHRoaXMgdmVyc2lvbiB3b3Jrcy4NCj4gPiBJdCBk
b2Vzbid0IGhhdmUgdGhlIG9wdGltaXNhdGlvbiBmb3Igc21hbGwgdmFsdWVzLg0KPiA+DQo+ID4g
dW5zaWduZWQgaW50IHNxcnQ2NCh1bnNpZ25lZCBsb25nIGxvbmcgeCkNCj4gPiB7DQo+ID4gICAg
ICAgICB1bnNpZ25lZCBpbnQgeF9oaSA9IHggPj4gMzI7DQo+ID4NCj4gPiAgICAgICAgIHVuc2ln
bmVkIGludCBiID0gMDsNCj4gPiAgICAgICAgIHVuc2lnbmVkIGludCB5ID0gMDsNCj4gPiAgICAg
ICAgIHVuc2lnbmVkIGludCBpOw0KPiA+DQo+ID4gICAgICAgICBmb3IgKGkgPSAwOyBpIDwgMzI7
IGkrKykgew0KPiA+ICAgICAgICAgICAgICAgICBiIDw8PSAyOw0KPiA+ICAgICAgICAgICAgICAg
ICBiIHw9IHhfaGkgPj4gMzA7DQo+ID4gICAgICAgICAgICAgICAgIHhfaGkgPDw9IDI7DQo+ID4g
ICAgICAgICAgICAgICAgIGlmIChpID09IDE1KQ0KPiA+ICAgICAgICAgICAgICAgICAgICAgICAg
IHhfaGkgPSB4Ow0KPiA+ICAgICAgICAgICAgICAgICB5IDw8PSAxOw0KPiA+ICAgICAgICAgICAg
ICAgICBpZiAoYiA+IHkpDQo+ID4gICAgICAgICAgICAgICAgICAgICAgICAgYiAtPSArK3k7DQo+
ID4gICAgICAgICB9DQo+ID4gICAgICAgICByZXR1cm4geTsNCj4gPiB9DQo+IA0KPiBXb3csIHRo
YW5rcy4gVGhpcyBzZWVtcyBsaWtlIGEgbWFqb3IgY2hhbmdlLg0KDQpOb3QgcmVhbGx5LCBpdCBp
cyBqdXN0IGRvaW5nIGl0IHNsaWdodGx5IGRpZmZlcmVudGx5Lg0KVGhlIGFsZ29yaXRobSBpcyB0
aGUgJ3NjaG9vbGJveScgb25lIHRoYXQgdXNlZCB0byBiZSB0YXVnaHQgYXQgc2Nob29sLg0KKEFs
dGhvdWdoIEkgZm91bmQgaXQgaW4gYSBib290IGZyb20gdGhlIDE5MzBzLikNCg0KSSBjb21wYXJl
ZCB0aGUgb2JqZWN0IGNvZGUgZm9yIGFtZDY0IChkb2Vzbid0IG5lZWQgdG8gcmVsb2FkDQoneF9o
aScgaGFsZiB3YXkgdGhyb3VnaCkgYWdhaW5zdCB0aGUgb3JpZ2luYWwgbG9vcC4NClRoZXkgYXJl
IG11Y2ggdGhlIHNhbWUgc2l6ZSwgZXhlY3V0aW9uIHNwZWVkIHdpbGwgZGVwZW5kIHRoZSBsZW5n
dGhzDQpvZiB0aGUgZGVwZW5kZW5jeSBjaGFpbnMuDQoNCj4gSSBkaWQgYSBxdWljayBydW4gdGhy
b3VnaCB1bml0IHRlc3RzIGZvciB0aGUgc2Vuc29yIGFuZCB0aGUgcmVzdWx0cw0KPiBhcmUgd2F5
IG9mZi4gT24gdGhlIHNlbnNvciBJIGhhZCB0byBjb252ZXJ0IGRvdWJsZSBjYWxjdWxhdGlvbnMg
dG8NCj4gaW50ZWdlciBjYWxjdWxhdGlvbnMgYW5kIHRhcmdldCB3YXMgdG8gZ2V0IGVuZCByZXN1
bHQgd2l0aGluIDAuMDIgZGVnQw0KPiAod2l0aCBwcmV2aW91cyBhcHByb3hpbWF0ZSBzcXJ0IGlt
cGxlbWVudGF0aW9uKSBpbiBzZW5zb3Igd29ya2luZw0KPiByYW5nZS4gVGhpcyBub3cgZ2V0cyBp
bnRvIDMgZGVnQyBkZWx0YSBhdCBsZWFzdCBhbmQgc29tZSBhcmUgd2F5IG9mZi4NCj4gSXQgbWln
aHQgYmUgb2ZmIGJlY2F1c2Ugb2Ygc29tZSBzY2FsaW5nIG9uIHRoZSBvdGhlciBoYW5kIGR1cmlu
ZyB0aGUNCj4gZXF1YXRpb24gKG5vdCBleGFjdGx5IGNvbXBhcmluZyBzcXJ0IGltcGxlbWVudGF0
aW9ucyBoZXJlKS4NCj4gDQo+IEkgd2lsbCBuZWVkIGEgYml0IG1vcmUgdGVzdGluZyBvZiB0aGlz
IHRvIHNlZSBpZiBpdCBpcyByZXBsYWNlYWJsZS4NCg0KWW91IHByb2JhYmx5IG5lZWQgdG8gcHV0
IHZhbHVlcyB0aHJvdWdoIGJvdGggKGFsbCAzKSBmdW5jdGlvbnMgdG8gc2VlDQp3aGVyZSB5b3Ug
YXJlIGdldHRpbmcgZXJyb3JzLg0KSXQgbWlnaHQgYmUgcm91bmRpbmcsIG9yIHRoZXJlIG1pZ2h0
IGJlIGEgZnViYXIgc29tZXdoZXJlLg0KDQoJRGF2aWQNCg0K

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:30         ` Crt Mori
@ 2017-12-21 10:59             ` David Laight
  2017-12-21 10:59             ` David Laight
  2017-12-21 11:43             ` David Laight
  2 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 10:59 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 17:30
> >> OK, is there any more easy optimizations you see?
> >
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi = x >> 32;
> >
> >         unsigned int b = 0;
> >         unsigned int y = 0;
> >         unsigned int i;
> >
> >         for (i = 0; i < 32; i++) {
> >                 b <<= 2;
> >                 b |= x_hi >> 30;
> >                 x_hi <<= 2;
> >                 if (i == 15)
> >                         x_hi = x;
> >                 y <<= 1;
> >                 if (b > y)
> >                         b -= ++y;
> >         }
> >         return y;
> > }
..
> 
> I did a quick run through unit tests for the sensor and the results
> are way off. On the sensor I had to convert double calculations to
> integer calculations and target was to get end result within 0.02 degC
> (with previous approximate sqrt implementation) in sensor working
> range. This now gets into 3 degC delta at least and some are way off.
> It might be off because of some scaling on the other hand during the
> equation (not exactly comparing sqrt implementations here).

I didn't get it quite right...
The last few lines need to be:
		if (b > y) {	
			b -= ++y;
			y++;
		}
	}
	return y >> 1;
}

Although that then fails for inputs larger than 2^62.

	David

	

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-21 10:59             ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 10:59 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjAgRGVjZW1iZXIgMjAxNyAxNzozMA0KPiA+PiBPSywg
aXMgdGhlcmUgYW55IG1vcmUgZWFzeSBvcHRpbWl6YXRpb25zIHlvdSBzZWU/DQo+ID4NCj4gPiBJ
IHRoaW5rIHRoaXMgdmVyc2lvbiB3b3Jrcy4NCj4gPiBJdCBkb2Vzbid0IGhhdmUgdGhlIG9wdGlt
aXNhdGlvbiBmb3Igc21hbGwgdmFsdWVzLg0KPiA+DQo+ID4gdW5zaWduZWQgaW50IHNxcnQ2NCh1
bnNpZ25lZCBsb25nIGxvbmcgeCkNCj4gPiB7DQo+ID4gICAgICAgICB1bnNpZ25lZCBpbnQgeF9o
aSA9IHggPj4gMzI7DQo+ID4NCj4gPiAgICAgICAgIHVuc2lnbmVkIGludCBiID0gMDsNCj4gPiAg
ICAgICAgIHVuc2lnbmVkIGludCB5ID0gMDsNCj4gPiAgICAgICAgIHVuc2lnbmVkIGludCBpOw0K
PiA+DQo+ID4gICAgICAgICBmb3IgKGkgPSAwOyBpIDwgMzI7IGkrKykgew0KPiA+ICAgICAgICAg
ICAgICAgICBiIDw8PSAyOw0KPiA+ICAgICAgICAgICAgICAgICBiIHw9IHhfaGkgPj4gMzA7DQo+
ID4gICAgICAgICAgICAgICAgIHhfaGkgPDw9IDI7DQo+ID4gICAgICAgICAgICAgICAgIGlmIChp
ID09IDE1KQ0KPiA+ICAgICAgICAgICAgICAgICAgICAgICAgIHhfaGkgPSB4Ow0KPiA+ICAgICAg
ICAgICAgICAgICB5IDw8PSAxOw0KPiA+ICAgICAgICAgICAgICAgICBpZiAoYiA+IHkpDQo+ID4g
ICAgICAgICAgICAgICAgICAgICAgICAgYiAtPSArK3k7DQo+ID4gICAgICAgICB9DQo+ID4gICAg
ICAgICByZXR1cm4geTsNCj4gPiB9DQouLg0KPiANCj4gSSBkaWQgYSBxdWljayBydW4gdGhyb3Vn
aCB1bml0IHRlc3RzIGZvciB0aGUgc2Vuc29yIGFuZCB0aGUgcmVzdWx0cw0KPiBhcmUgd2F5IG9m
Zi4gT24gdGhlIHNlbnNvciBJIGhhZCB0byBjb252ZXJ0IGRvdWJsZSBjYWxjdWxhdGlvbnMgdG8N
Cj4gaW50ZWdlciBjYWxjdWxhdGlvbnMgYW5kIHRhcmdldCB3YXMgdG8gZ2V0IGVuZCByZXN1bHQg
d2l0aGluIDAuMDIgZGVnQw0KPiAod2l0aCBwcmV2aW91cyBhcHByb3hpbWF0ZSBzcXJ0IGltcGxl
bWVudGF0aW9uKSBpbiBzZW5zb3Igd29ya2luZw0KPiByYW5nZS4gVGhpcyBub3cgZ2V0cyBpbnRv
IDMgZGVnQyBkZWx0YSBhdCBsZWFzdCBhbmQgc29tZSBhcmUgd2F5IG9mZi4NCj4gSXQgbWlnaHQg
YmUgb2ZmIGJlY2F1c2Ugb2Ygc29tZSBzY2FsaW5nIG9uIHRoZSBvdGhlciBoYW5kIGR1cmluZyB0
aGUNCj4gZXF1YXRpb24gKG5vdCBleGFjdGx5IGNvbXBhcmluZyBzcXJ0IGltcGxlbWVudGF0aW9u
cyBoZXJlKS4NCg0KSSBkaWRuJ3QgZ2V0IGl0IHF1aXRlIHJpZ2h0Li4uDQpUaGUgbGFzdCBmZXcg
bGluZXMgbmVlZCB0byBiZToNCgkJaWYgKGIgPiB5KSB7CQ0KCQkJYiAtPSArK3k7DQoJCQl5Kys7
DQoJCX0NCgl9DQoJcmV0dXJuIHkgPj4gMTsNCn0NCg0KQWx0aG91Z2ggdGhhdCB0aGVuIGZhaWxz
IGZvciBpbnB1dHMgbGFyZ2VyIHRoYW4gMl42Mi4NCg0KCURhdmlkDQoNCgkNCg==

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-20 17:30         ` Crt Mori
@ 2017-12-21 11:43             ` David Laight
  2017-12-21 10:59             ` David Laight
  2017-12-21 11:43             ` David Laight
  2 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 11:43 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 20 December 2017 17:30
> I did a quick run through unit tests for the sensor and the results
> are way off
> ...

Try this version instead:
unsigned int sqrt64(unsigned long long x_in)
{
        unsigned int x = x_in >> 32;

        unsigned int b = 0;
        unsigned int y = 0;
        unsigned int i;

        i = 31;
        if (!x) {
                x = x_in;
                i = 15;
        }
        if (!(x & 0xffff0000)) {
                x <<= 16;
                i -= 8;
        }
        if (!(x & 0xff000000)) {
                x <<= 8;
                i -= 4;
        }
        if (!(x & 0xf0000000)) {
                x <<= 4;
                i -= 2;
        }

        do {
                b <<= 2;
                b |= x >> 30;
                x <<= 2;
                if (i == 16)
                        x = x_in;
                y <<= 1;
                if (b > y) {
                        b -= ++y;
                        y++;
                }
        } while (--i);

        /* 'b' becomes 33 bits if the input is greater than 2^62 */
        b <<= 1;
        b |= x >> 31;
        if (b > y || (b == y && x & (1u << 30)))
                y |= 1;

        return y;
}

I've tested that one with more values.

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-21 11:43             ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 11:43 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjAgRGVjZW1iZXIgMjAxNyAxNzozMA0KPiBJIGRpZCBh
IHF1aWNrIHJ1biB0aHJvdWdoIHVuaXQgdGVzdHMgZm9yIHRoZSBzZW5zb3IgYW5kIHRoZSByZXN1
bHRzDQo+IGFyZSB3YXkgb2ZmDQo+IC4uLg0KDQpUcnkgdGhpcyB2ZXJzaW9uIGluc3RlYWQ6DQp1
bnNpZ25lZCBpbnQgc3FydDY0KHVuc2lnbmVkIGxvbmcgbG9uZyB4X2luKQ0Kew0KICAgICAgICB1
bnNpZ25lZCBpbnQgeCA9IHhfaW4gPj4gMzI7DQoNCiAgICAgICAgdW5zaWduZWQgaW50IGIgPSAw
Ow0KICAgICAgICB1bnNpZ25lZCBpbnQgeSA9IDA7DQogICAgICAgIHVuc2lnbmVkIGludCBpOw0K
DQogICAgICAgIGkgPSAzMTsNCiAgICAgICAgaWYgKCF4KSB7DQogICAgICAgICAgICAgICAgeCA9
IHhfaW47DQogICAgICAgICAgICAgICAgaSA9IDE1Ow0KICAgICAgICB9DQogICAgICAgIGlmICgh
KHggJiAweGZmZmYwMDAwKSkgew0KICAgICAgICAgICAgICAgIHggPDw9IDE2Ow0KICAgICAgICAg
ICAgICAgIGkgLT0gODsNCiAgICAgICAgfQ0KICAgICAgICBpZiAoISh4ICYgMHhmZjAwMDAwMCkp
IHsNCiAgICAgICAgICAgICAgICB4IDw8PSA4Ow0KICAgICAgICAgICAgICAgIGkgLT0gNDsNCiAg
ICAgICAgfQ0KICAgICAgICBpZiAoISh4ICYgMHhmMDAwMDAwMCkpIHsNCiAgICAgICAgICAgICAg
ICB4IDw8PSA0Ow0KICAgICAgICAgICAgICAgIGkgLT0gMjsNCiAgICAgICAgfQ0KDQogICAgICAg
IGRvIHsNCiAgICAgICAgICAgICAgICBiIDw8PSAyOw0KICAgICAgICAgICAgICAgIGIgfD0geCA+
PiAzMDsNCiAgICAgICAgICAgICAgICB4IDw8PSAyOw0KICAgICAgICAgICAgICAgIGlmIChpID09
IDE2KQ0KICAgICAgICAgICAgICAgICAgICAgICAgeCA9IHhfaW47DQogICAgICAgICAgICAgICAg
eSA8PD0gMTsNCiAgICAgICAgICAgICAgICBpZiAoYiA+IHkpIHsNCiAgICAgICAgICAgICAgICAg
ICAgICAgIGIgLT0gKyt5Ow0KICAgICAgICAgICAgICAgICAgICAgICAgeSsrOw0KICAgICAgICAg
ICAgICAgIH0NCiAgICAgICAgfSB3aGlsZSAoLS1pKTsNCg0KICAgICAgICAvKiAnYicgYmVjb21l
cyAzMyBiaXRzIGlmIHRoZSBpbnB1dCBpcyBncmVhdGVyIHRoYW4gMl42MiAqLw0KICAgICAgICBi
IDw8PSAxOw0KICAgICAgICBiIHw9IHggPj4gMzE7DQogICAgICAgIGlmIChiID4geSB8fCAoYiA9
PSB5ICYmIHggJiAoMXUgPDwgMzApKSkNCiAgICAgICAgICAgICAgICB5IHw9IDE7DQoNCiAgICAg
ICAgcmV0dXJuIHk7DQp9DQoNCkkndmUgdGVzdGVkIHRoYXQgb25lIHdpdGggbW9yZSB2YWx1ZXMu
DQoNCglEYXZpZA0KDQo=

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 11:43             ` David Laight
  (?)
@ 2017-12-21 13:17             ` Crt Mori
  2017-12-21 13:56                 ` David Laight
  -1 siblings, 1 reply; 29+ messages in thread
From: Crt Mori @ 2017-12-21 13:17 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 21 December 2017 at 12:43, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 17:30
>> I did a quick run through unit tests for the sensor and the results
>> are way off
>> ...
>
> Try this version instead:
> unsigned int sqrt64(unsigned long long x_in)
> {
>         unsigned int x = x_in >> 32;
>
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;

i can be u8. And I will still use explicit typing.

>
>         i = 31;
>         if (!x) {
>                 x = x_in;
>                 i = 15;
>         }
>         if (!(x & 0xffff0000)) {
>                 x <<= 16;
>                 i -= 8;
>         }
>         if (!(x & 0xff000000)) {
>                 x <<= 8;
>                 i -= 4;
>         }
>         if (!(x & 0xf0000000)) {
>                 x <<= 4;
>                 i -= 2;
>         }
>

This part above looks like FLS

>         do {
>                 b <<= 2;
>                 b |= x >> 30;
>                 x <<= 2;
>                 if (i == 16)
>                         x = x_in;
>                 y <<= 1;
>                 if (b > y) {
>                         b -= ++y;
>                         y++;
>                 }
>         } while (--i);
>
>         /* 'b' becomes 33 bits if the input is greater than 2^62 */
>         b <<= 1;
>         b |= x >> 31;
>         if (b > y || (b == y && x & (1u << 30)))
>                 y |= 1;
>
>         return y;
> }
>
> I've tested that one with more values.
>
>         David
>

This one indeed works. I did some more testing this morning and I am
fine with either.

So question is: Do I make change as per David's suggestion with his
sign-off, or leave the version it was before the change?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 13:17             ` Crt Mori
@ 2017-12-21 13:56                 ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 13:56 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori
> Sent: 21 December 2017 13:18
...
> >         unsigned int i;
> 
> i can be u8. And I will still use explicit typing.

u8 will add extra code, unsigned int is good.
'x' needs to be u32, 'y' and 'b' could be larger.

I was testing in userspace.

...
> This part above looks like FLS
It also does the rest of the required shifts.

...
> This one indeed works. I did some more testing this morning and I am
> fine with either.
> 
> So question is: Do I make change as per David's suggestion with his
> sign-off, or leave the version it was before the change?

If you generate the actual patch I can add an appropriate sign-off
(or whatever is needed).

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2017-12-21 13:56                 ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2017-12-21 13:56 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjEgRGVjZW1iZXIgMjAxNyAxMzoxOA0KLi4uDQo+ID4g
ICAgICAgICB1bnNpZ25lZCBpbnQgaTsNCj4gDQo+IGkgY2FuIGJlIHU4LiBBbmQgSSB3aWxsIHN0
aWxsIHVzZSBleHBsaWNpdCB0eXBpbmcuDQoNCnU4IHdpbGwgYWRkIGV4dHJhIGNvZGUsIHVuc2ln
bmVkIGludCBpcyBnb29kLg0KJ3gnIG5lZWRzIHRvIGJlIHUzMiwgJ3knIGFuZCAnYicgY291bGQg
YmUgbGFyZ2VyLg0KDQpJIHdhcyB0ZXN0aW5nIGluIHVzZXJzcGFjZS4NCg0KLi4uDQo+IFRoaXMg
cGFydCBhYm92ZSBsb29rcyBsaWtlIEZMUw0KSXQgYWxzbyBkb2VzIHRoZSByZXN0IG9mIHRoZSBy
ZXF1aXJlZCBzaGlmdHMuDQoNCi4uLg0KPiBUaGlzIG9uZSBpbmRlZWQgd29ya3MuIEkgZGlkIHNv
bWUgbW9yZSB0ZXN0aW5nIHRoaXMgbW9ybmluZyBhbmQgSSBhbQ0KPiBmaW5lIHdpdGggZWl0aGVy
Lg0KPiANCj4gU28gcXVlc3Rpb24gaXM6IERvIEkgbWFrZSBjaGFuZ2UgYXMgcGVyIERhdmlkJ3Mg
c3VnZ2VzdGlvbiB3aXRoIGhpcw0KPiBzaWduLW9mZiwgb3IgbGVhdmUgdGhlIHZlcnNpb24gaXQg
d2FzIGJlZm9yZSB0aGUgY2hhbmdlPw0KDQpJZiB5b3UgZ2VuZXJhdGUgdGhlIGFjdHVhbCBwYXRj
aCBJIGNhbiBhZGQgYW4gYXBwcm9wcmlhdGUgc2lnbi1vZmYNCihvciB3aGF0ZXZlciBpcyBuZWVk
ZWQpLg0KDQoJRGF2aWQNCg0K

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 13:56                 ` David Laight
  (?)
@ 2017-12-21 14:11                 ` Peter Zijlstra
  2017-12-21 14:48                   ` David Laight
  -1 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-12-21 14:11 UTC (permalink / raw)
  To: David Laight
  Cc: 'Crt Mori',
	Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

On Thu, Dec 21, 2017 at 01:56:55PM +0000, David Laight wrote:
> > This part above looks like FLS
> It also does the rest of the required shifts.

Still, fls() + shift is way faster on hardware that has an fls
instruction.

Writing out that binary search doesn't make sense.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 14:11                 ` Peter Zijlstra
@ 2017-12-21 14:48                   ` David Laight
  2017-12-22 13:44                     ` Crt Mori
  0 siblings, 1 reply; 29+ messages in thread
From: David Laight @ 2017-12-21 14:48 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Crt Mori',
	Jonathan Cameron, Ingo Molnar, Andrew Morton, Kees Cook,
	Rusty Russell, Ian Abbott, Larry Finger, Niklas Soderlund,
	Thomas Gleixner, Krzysztof Kozlowski, Masahiro Yamada,
	linux-kernel, linux-iio, Joe Perches

From: Peter Zijlstra
> Sent: 21 December 2017 14:12
...
> > > This part above looks like FLS
> > It also does the rest of the required shifts.
> 
> Still, fls() + shift is way faster on hardware that has an fls
> instruction.
> 
> Writing out that binary search doesn't make sense.

If the hardware doesn't have an appropriate fls instruction
the soft fls()will be worse.

If you used fls() you'd still need quite a bit of code
to generate the correct shift and loop count adjustment.
Given the cost of the loop iterations the 3 tests are noise.
The open coded version is obviously correct...

I didn't add the 4th one because the code always does 2 iterations.

If you were really worried about performance there are faster
algorithms (even doing 2 or 4 bits a time is faster).

	David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-21 14:48                   ` David Laight
@ 2017-12-22 13:44                     ` Crt Mori
  2018-01-09 15:18                       ` Crt Mori
  0 siblings, 1 reply; 29+ messages in thread
From: Crt Mori @ 2017-12-22 13:44 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

>From simple strong typing of existing int_sqrt we came to something a
bit more complex or better. Can we decide now which we want in, or I
submit v12 and we decide then (although it is not a v12, but whole new
thing)?

On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote:
> From: Peter Zijlstra
>> Sent: 21 December 2017 14:12
> ...
>> > > This part above looks like FLS
>> > It also does the rest of the required shifts.
>>
>> Still, fls() + shift is way faster on hardware that has an fls
>> instruction.
>>
>> Writing out that binary search doesn't make sense.
>
> If the hardware doesn't have an appropriate fls instruction
> the soft fls()will be worse.
>
> If you used fls() you'd still need quite a bit of code
> to generate the correct shift and loop count adjustment.
> Given the cost of the loop iterations the 3 tests are noise.
> The open coded version is obviously correct...
>
> I didn't add the 4th one because the code always does 2 iterations.
>
> If you were really worried about performance there are faster
> algorithms (even doing 2 or 4 bits a time is faster).
>
>         David
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2017-12-22 13:44                     ` Crt Mori
@ 2018-01-09 15:18                       ` Crt Mori
  2018-01-12  9:41                           ` David Laight
  0 siblings, 1 reply; 29+ messages in thread
From: Crt Mori @ 2018-01-09 15:18 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

It has been some time now since this moved. I have decided not to use
David's implementation because I want to maintain also range above
2^62

Are there any additional objections for this not to go in as it is?

On 22 December 2017 at 14:44, Crt Mori <cmo@melexis.com> wrote:
>
> From simple strong typing of existing int_sqrt we came to something a
> bit more complex or better. Can we decide now which we want in, or I
> submit v12 and we decide then (although it is not a v12, but whole new
> thing)?
>
> On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote:
> > From: Peter Zijlstra
> >> Sent: 21 December 2017 14:12
> > ...
> >> > > This part above looks like FLS
> >> > It also does the rest of the required shifts.
> >>
> >> Still, fls() + shift is way faster on hardware that has an fls
> >> instruction.
> >>
> >> Writing out that binary search doesn't make sense.
> >
> > If the hardware doesn't have an appropriate fls instruction
> > the soft fls()will be worse.
> >
> > If you used fls() you'd still need quite a bit of code
> > to generate the correct shift and loop count adjustment.
> > Given the cost of the loop iterations the 3 tests are noise.
> > The open coded version is obviously correct...
> >
> > I didn't add the 4th one because the code always does 2 iterations.
> >
> > If you were really worried about performance there are faster
> > algorithms (even doing 2 or 4 bits a time is faster).
> >
> >         David
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2018-01-09 15:18                       ` Crt Mori
@ 2018-01-12  9:41                           ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2018-01-12  9:41 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

From: Crt Mori [mailto:cmo@melexis.com]
> Sent: 09 January 2018 15:18
> 
> It has been some time now since this moved. I have decided not to use
> David's implementation because I want to maintain also range above
> 2^62

The last version I did supported the full range.

	David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
@ 2018-01-12  9:41                           ` David Laight
  0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2018-01-12  9:41 UTC (permalink / raw)
  To: 'Crt Mori'
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

RnJvbTogQ3J0IE1vcmkgW21haWx0bzpjbW9AbWVsZXhpcy5jb21dDQo+IFNlbnQ6IDA5IEphbnVh
cnkgMjAxOCAxNToxOA0KPiANCj4gSXQgaGFzIGJlZW4gc29tZSB0aW1lIG5vdyBzaW5jZSB0aGlz
IG1vdmVkLiBJIGhhdmUgZGVjaWRlZCBub3QgdG8gdXNlDQo+IERhdmlkJ3MgaW1wbGVtZW50YXRp
b24gYmVjYXVzZSBJIHdhbnQgdG8gbWFpbnRhaW4gYWxzbyByYW5nZSBhYm92ZQ0KPiAyXjYyDQoN
ClRoZSBsYXN0IHZlcnNpb24gSSBkaWQgc3VwcG9ydGVkIHRoZSBmdWxsIHJhbmdlLg0KDQoJRGF2
aWQNCg0K

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt
  2018-01-12  9:41                           ` David Laight
  (?)
@ 2018-01-15  8:17                           ` Crt Mori
  -1 siblings, 0 replies; 29+ messages in thread
From: Crt Mori @ 2018-01-15  8:17 UTC (permalink / raw)
  To: David Laight
  Cc: Peter Zijlstra, Jonathan Cameron, Ingo Molnar, Andrew Morton,
	Kees Cook, Rusty Russell, Ian Abbott, Larry Finger,
	Niklas Soderlund, Thomas Gleixner, Krzysztof Kozlowski,
	Masahiro Yamada, linux-kernel, linux-iio, Joe Perches

On 12 January 2018 at 10:41, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori [mailto:cmo@melexis.com]
>> Sent: 09 January 2018 15:18
>>
>> It has been some time now since this moved. I have decided not to use
>> David's implementation because I want to maintain also range above
>> 2^62
>
> The last version I did supported the full range.

Nothing changed below that comment, so I was assuming it is not
supported (or did I miss a mail?).

Also fls discussion had opposite opinions or it just seems inconclusive to me?

>
>         David
>

So you all agree David Laight version is much better and should be
used instead of currently proposed version?

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-01-15  8:18 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-20 14:20 [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt Crt Mori
2017-12-20 14:39 ` David Laight
2017-12-20 14:39   ` David Laight
2017-12-20 15:41   ` Crt Mori
2017-12-20 16:00   ` Peter Zijlstra
2017-12-20 16:17     ` Crt Mori
2017-12-20 16:46       ` David Laight
2017-12-20 16:46         ` David Laight
2017-12-20 17:19         ` Joe Perches
2017-12-20 17:19           ` Joe Perches
2017-12-21 10:02           ` David Laight
2017-12-21 10:02             ` David Laight
2017-12-20 17:30         ` Crt Mori
2017-12-21 10:08           ` David Laight
2017-12-21 10:08             ` David Laight
2017-12-21 10:59           ` David Laight
2017-12-21 10:59             ` David Laight
2017-12-21 11:43           ` David Laight
2017-12-21 11:43             ` David Laight
2017-12-21 13:17             ` Crt Mori
2017-12-21 13:56               ` David Laight
2017-12-21 13:56                 ` David Laight
2017-12-21 14:11                 ` Peter Zijlstra
2017-12-21 14:48                   ` David Laight
2017-12-22 13:44                     ` Crt Mori
2018-01-09 15:18                       ` Crt Mori
2018-01-12  9:41                         ` David Laight
2018-01-12  9:41                           ` David Laight
2018-01-15  8:17                           ` Crt Mori

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.