All of lore.kernel.org
* [PATCH] trivial, document that div64_u64() is not precise on 32bit platforms
@ 2010-08-02 16:09 Oleg Nesterov
  2010-08-03 22:28 ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2010-08-02 16:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ben Woodard, Brian Behlendorf, Jeremy Fitzhardinge,
	Mark Grondona, linux-kernel

We have a bug report that blames div64_u64() on 32bit platforms.

However, the code obviously doesn't even try to pretend it can do
the 64bit division precisely. If there is something in the high
word of divisor, div64_u64() just shifts both arguments and throws
out the low bits.

Add a small comment to avoid the confusion.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 lib/div64.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/lib/div64.c
+++ b/lib/div64.c
@@ -77,7 +77,10 @@ s64 div_s64_rem(s64 dividend, s32 diviso
 EXPORT_SYMBOL(div_s64_rem);
 #endif
 
-/* 64bit divisor, dividend and result. dynamic precision */
+/*
+ * 64bit divisor, dividend and result. Dynamic precision: unless the
+ * divisor fits in a u32, the result is not exactly correct.
+ */
 #ifndef div64_u64
 u64 div64_u64(u64 dividend, u64 divisor)
 {


* Re: [PATCH] Make div64_u64() precise on 32bit platforms
@ 2010-10-12 19:26 Brian Behlendorf
  2010-10-13 21:37 ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Brian Behlendorf @ 2010-10-12 19:26 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, Oleg Nesterov

[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]


I'm resending the patch as is and adding what I hope are the right CCs.  Also 
let me explain why I opted to add abs64() and use the gcc builtin.

>Can't we just improve abs? Say,

I was reluctant to change abs() since it would have a much larger impact on 
the code base.  Using typeof() should be OK but if any of the callers 
mistakenly call abs() with an unsigned value then we could see compiler 
warnings about '__x < 0' being a useless conditional.

>This is a bit unusual. I mean, it is not that common to use gcc builtins
>in the normal code. And, it seems, we can use __fls(divisor >> 32) or
>just fls64() instead ?

I opted for the gcc builtin because I felt it made the code more readable.  I 
also suspect it will perform slightly better than __fls() on some archs.  For 
example, on powerpc __fls() is implemented using the 'cntlzw' instruction.  
It returns (BITS_PER_LONG - 1 - cntlzw) which is wasted work since my 
function would immediately undo this to get back cntlzw.  If I was lucky the 
compiler would optimize this away for me but if I use the builtin I don't 
need to take the chance.

-- 
Thanks,
Brian Behlendorf

[-- Attachment #2: 0001-Fix-div64_u64-for-32bit-platforms.patch --]
[-- Type: text/plain, Size: 3927 bytes --]

From 3b30f1cf78f88b40360dd65816941cf2be9dd60d Mon Sep 17 00:00:00 2001
From: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Thu, 5 Aug 2010 14:59:11 -0700
Subject: [PATCH] Fix div64_u64 for 32bit platforms

The current implementation of div64_u64 for 32bit systems returns
an approximately correct result when the divisor exceeds 32bits.
Since doing 64bit division using 32bit hardware is a long-since-solved
problem, we just use one of the existing proven methods.

Additionally, add a div64_s64 function to correctly handle doing
signed 64bit division.

---
 include/linux/kernel.h |    5 +++
 include/linux/math64.h |   12 +++++++++
 lib/div64.c            |   64 +++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5de838b..7a00dff 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -162,6 +162,11 @@ extern int _cond_resched(void);
 		(__x < 0) ? -__x : __x;		\
 	})
 
+#define abs64(x) ({				\
+		s64 __x = (x);			\
+		(__x < 0) ? -__x : __x;		\
+	})
+
 #ifdef CONFIG_PROVE_LOCKING
 void might_fault(void);
 #else
diff --git a/include/linux/math64.h b/include/linux/math64.h
index c87f152..23fcdfc 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -35,6 +35,14 @@ static inline u64 div64_u64(u64 dividend, u64 divisor)
 	return dividend / divisor;
 }
 
+/**
+ * div64_s64 - signed 64bit divide with 64bit divisor
+ */
+static inline s64 div64_s64(s64 dividend, s64 divisor)
+{
+	return dividend / divisor;
+}
+
 #elif BITS_PER_LONG == 32
 
 #ifndef div_u64_rem
@@ -53,6 +61,10 @@ extern s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
 extern u64 div64_u64(u64 dividend, u64 divisor);
 #endif
 
+#ifndef div64_s64
+extern s64 div64_s64(s64 dividend, s64 divisor);
+#endif
+
 #endif /* BITS_PER_LONG */
 
 /**
diff --git a/lib/div64.c b/lib/div64.c
index a111eb8..e4e7fc6 100644
--- a/lib/div64.c
+++ b/lib/div64.c
@@ -77,26 +77,68 @@ s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder)
 EXPORT_SYMBOL(div_s64_rem);
 #endif
 
-/* 64bit divisor, dividend and result. dynamic precision */
+/**
+ * div64_u64 - unsigned 64bit divide with 64bit divisor
+ * @dividend:	64bit dividend
+ * @divisor:	64bit divisor
+ *
+ * This implementation is a modified version of the algorithm proposed
+ * in the book 'Hacker's Delight'.  The original source and full proof
+ * can be found at the URL below, available for use without restriction.
+ *
+ * 'http://www.hackersdelight.org/HDcode/newCode/divDouble.c'
+ */
 #ifndef div64_u64
 u64 div64_u64(u64 dividend, u64 divisor)
 {
-	u32 high, d;
-
-	high = divisor >> 32;
-	if (high) {
-		unsigned int shift = fls(high);
+	u64 u0, quot0, quot1;
+	u32 rem;
+	int n;
+
+	if (divisor >> 32 == 0) {
+		if (dividend >> 32 < divisor) {
+			return div_u64_rem(dividend, divisor, &rem);
+		} else {
+			u0 = dividend & 0xFFFFFFFF;
+			quot1 = div_u64_rem(dividend >> 32, divisor, &rem);
+			u0 += ((u64)rem << 32);
+			quot0 = div_u64_rem(u0, divisor, &rem);
+			return (quot1 << 32) + quot0;
+		}
+	} else {
+		n = __builtin_clzll(divisor);
+		quot1 = div_u64_rem(dividend >> 1, (divisor << n) >> 32, &rem);
+		quot0 = (quot1 << n) >> 31;
 
-		d = divisor >> shift;
-		dividend >>= shift;
-	} else
-		d = divisor;
+		if (quot0 != 0)
+			quot0 = quot0 - 1;
+		if ((dividend - quot0 * divisor) >= divisor)
+			quot0 = quot0 + 1;
 
-	return div_u64(dividend, d);
+		return quot0;
+	}
 }
 EXPORT_SYMBOL(div64_u64);
 #endif
 
+/**
+ * div64_s64 - signed 64bit divide with 64bit divisor
+ * @dividend:	64bit dividend
+ * @divisor:	64bit divisor
+ */
+#ifndef div64_s64
+s64 div64_s64(s64 dividend, s64 divisor)
+{
+	s64 quot, t;
+
+	quot = div64_u64(abs64(dividend), abs64(divisor));
+	t = (dividend ^ divisor) >> 63;
+
+	return (quot ^ t) - t;
+}
+EXPORT_SYMBOL(div64_s64);
+#endif
+
 #endif /* BITS_PER_LONG == 32 */
 
 /*
-- 
1.5.4.5


[-- Attachment #3: div64_u64_test.c --]
[-- Type: text/x-csrc, Size: 1555 bytes --]

#include <linux/module.h>

/*
 * Verification test for div64_u64.
 */

#ifndef abs64
#define abs64(x) ({			\
	s64 __x = (x);			\
	(__x < 0) ? -__x : __x;		\
})
#endif

int div64_u64_test(void)
{
	u64 uu, vu, qu, ru;
	int n, i, j, errors = 0;
	const u64 tabu[] = {
	    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
	    10, 11, 12, 13, 14, 15, 16, 1000, 2003,
	    32765, 32766, 32767, 32768, 32769, 32760,
	    65533, 65534, 65535, 65536, 65537, 65538,
	    0x7ffffffeULL, 0x7fffffffULL, 0x80000000ULL, 0x80000001ULL,
	    0x7000000000000000ULL, 0x7000000080000000ULL, 0x7000000080000001ULL,
	    0x7fffffffffffffffULL, 0x7fffffff8fffffffULL, 0x7fffffff8ffffff1ULL,
	    0x7fffffff00000000ULL, 0x7fffffff80000000ULL, 0x7fffffff00000001ULL,
	    0x8000000000000000ULL, 0x8000000080000000ULL, 0x8000000080000001ULL,
	    0xc000000000000000ULL, 0xc000000080000000ULL, 0xc000000080000001ULL,
	    0xfffffffffffffffdULL, 0xfffffffffffffffeULL, 0xffffffffffffffffULL,
	};

	printk("%s", "Testing unsigned 64-bit division.\n");
	n = sizeof(tabu) / sizeof(tabu[0]);
	for (i = 0; i < n; i++) {
		for (j = 1; j < n; j++) {
			uu = tabu[i];
			vu = tabu[j];
			qu = div64_u64(uu, vu);
			ru = uu - qu * vu;
			if (qu > uu || ru >= vu) {
				printk("%016llx/%016llx != %016llx "
				   "rem %016llx\n", uu, vu, qu, ru);
				errors++;
			}
		}
	}

	if (errors) {
		printk("Failed %d/%d tests\n", errors, n * (n - 1));
	} else {
		printk("Passed all %d tests\n", n * (n - 1));
	}

	return 0;
}

void div64_u64_exit(void) { }

module_init(div64_u64_test);
module_exit(div64_u64_exit);

[-- Attachment #4: div64_s64_test.c --]
[-- Type: text/x-csrc, Size: 1533 bytes --]

#include <linux/module.h>

/*
 * Verification test for div64_s64.
 */

#ifndef abs64
#define abs64(x) ({			\
	s64 __x = (x);			\
	(__x < 0) ? -__x : __x;		\
})
#endif

int div64_s64_test(void)
{
	s64 u, v, q, r;
	int n, i, j, k, errors = 0;
	const s64 tabs[] = {
	    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
	    10, 11, 12, 13, 14, 15, 16, 1000, 2003,
	    32765, 32766, 32767, 32768, 32769, 32760,
	    65533, 65534, 65535, 65536, 65537, 65538,
	    0x7ffffffeLL, 0x7fffffffLL, 0x80000000LL, 0x80000001LL,
	    0x7000000000000000LL, 0x7000000080000000LL, 0x7000000080000001LL,
	    0x7fffffffffffffffLL, 0x7fffffff8fffffffLL, 0x7fffffff8ffffff1LL,
	    0x7fffffff00000000LL, 0x7fffffff80000000LL, 0x7fffffff00000001LL,
	    0x0123456789abcdefLL, 0x00000000abcdef01LL, 0x0000000012345678LL,
	};

	printk("%s", "Testing signed 64-bit division.\n");
	n = sizeof(tabs) / sizeof(tabs[0]);
	for (i = 0; i < n; i++) {
		for (j = 1; j < n; j++) {
			for (k = 0; k <= 3; k++) {
				u = (k & 1)  ? -tabs[i] : tabs[i];
				v = (k >= 2) ? -tabs[j] : tabs[j];

				q = div64_s64(u, v);
				r = u - q * v;
				if (abs64(q) >  abs64(u) ||
				    abs64(r) >= abs64(v) ||
				    (r != 0 && (r ^ u) < 0)) {
					printk("%016llx/%016llx != %016llx "
					    "rem %016llx\n", u, v, q, r);
					errors++;
				}
			}
		}
	}

	if (errors) {
		printk("Failed %d/%d tests\n", errors, n * (n - 1));
	} else {
		printk("Passed all %d tests\n", n * (n - 1));
	}

	return 0;
}

void div64_s64_exit(void) { }

module_init(div64_s64_test);
module_exit(div64_s64_exit);

[-- Attachment #5: README --]
[-- Type: text/plain, Size: 2084 bytes --]

linux-2.6.35

Testing Summary     | x86_64 | x86    |
--------------------+--------+--------+
linux-2.6.35        | PASS   | FAIL   |
linux-2.6.35+patch  | PASS   | PASS   |


Testing Details (x86_64)
--------------------------------------------------------------------------
* PASS - linux-2.6.35
Testing unsigned 64-bit division.
Passed all 2756 tests

* PASS - linux-2.6.35+patch
Testing unsigned 64-bit division.
Passed all 2756 tests
Testing signed 64-bit division.
Passed all 2162 tests


Testing Details (x86)
--------------------------------------------------------------------------
* FAIL - linux-2.6.35
Testing unsigned 64-bit division.
7000000080000000/7000000080000001 != 0000000000000001 rem ffffffffffffffff
7fffffff8fffffff/7fffffffffffffff != 0000000000000001 rem ffffffff90000000
7fffffff8ffffff1/7fffffffffffffff != 0000000000000001 rem ffffffff8ffffff2
7fffffff8ffffff1/7fffffff8fffffff != 0000000000000001 rem fffffffffffffff2
7fffffff00000000/7fffffff00000001 != 0000000000000001 rem ffffffffffffffff
7fffffff80000000/7fffffffffffffff != 0000000000000001 rem ffffffff80000001
7fffffff80000000/7fffffff8fffffff != 0000000000000001 rem fffffffff0000001
7fffffff80000000/7fffffff8ffffff1 != 0000000000000001 rem fffffffff000000f
8000000000000000/8000000080000000 != 0000000000000001 rem ffffffff80000000
8000000000000000/8000000080000001 != 0000000000000001 rem ffffffff7fffffff
8000000080000000/8000000080000001 != 0000000000000001 rem ffffffffffffffff
c000000000000000/c000000080000000 != 0000000000000001 rem ffffffff80000000
c000000000000000/c000000080000001 != 0000000000000001 rem ffffffff7fffffff
c000000080000000/c000000080000001 != 0000000000000001 rem ffffffffffffffff
fffffffffffffffd/7fffffffffffffff != 0000000000000002 rem ffffffffffffffff
fffffffffffffffd/fffffffffffffffe != 0000000000000001 rem ffffffffffffffff
fffffffffffffffe/ffffffffffffffff != 0000000000000001 rem ffffffffffffffff
Failed 17/2756 tests

* PASS - linux-2.6.35+patch
Testing unsigned 64-bit division.
Passed all 2756 tests
Testing signed 64-bit division.
Passed all 2162 tests



end of thread, other threads:[~2010-10-21 19:53 UTC | newest]

Thread overview: 12+ messages
2010-08-02 16:09 [PATCH] trivial, document that div64_u64() is not precise on 32bit platforms Oleg Nesterov
2010-08-03 22:28 ` Andrew Morton
2010-08-04  0:12   ` Ben Woodard
2010-08-09 16:30   ` [PATCH] Make div64_u64() " Brian Behlendorf
2010-09-17  0:00     ` Oleg Nesterov
2010-10-12 19:26 Brian Behlendorf
2010-10-13 21:37 ` Oleg Nesterov
2010-10-14 12:11   ` Oleg Nesterov
2010-10-21 17:46     ` Brian Behlendorf
2010-10-21 18:12       ` Oleg Nesterov
2010-10-21 19:22         ` Andrew Morton
2010-10-21 19:49           ` Oleg Nesterov
