From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brian Behlendorf
To: LKML
Cc: Andrew Morton, Oleg Nesterov
Subject: Re: [PATCH] Make div64_u64() precise on 32bit platforms
Date: Tue, 12 Oct 2010 12:26:59 -0700
Message-Id: <201010121227.05735.behlendorf1@llnl.gov>
X-Attachments: 0001-Fix-div64_u64-for-32bit-platforms.patch, div64_u64_test.c,
 div64_s64_test.c, README
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="Boundary-00=_JaLtM9puqmnsqEX"

--Boundary-00=_JaLtM9puqmnsqEX
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I'm resending the patch as-is and adding what I hope are the right CCs.
Let me also explain why I opted to add abs64() and to use the gcc builtin.

> Can't we just improve abs? Say,

I was reluctant to change abs() since that would have a much larger impact
on the code base. Using typeof() should be OK, but if any of the callers
mistakenly call abs() with an unsigned value we could see compiler warnings
about '__x < 0' being a useless conditional.

> This is a bit unusual. I mean, it is not that common to use gcc builtins
> in the normal code. And, it seems, we can use __fls(divisor >> 32) or
> just fls64() instead ?

I opted for the gcc builtin because I felt it made the code more readable.
I also suspect it will perform slightly better than __fls() on some archs.
For example, on powerpc __fls() is implemented using the 'cntlzw'
instruction. It returns (BITS_PER_LONG - 1 - cntlzw), which is wasted work
since my function would immediately undo it to get back cntlzw. If I were
lucky the compiler would optimize this away for me, but by using the
builtin I don't need to take that chance.

-- 
Thanks,
Brian Behlendorf

--Boundary-00=_JaLtM9puqmnsqEX
Content-Type: text/plain; charset="iso-8859-1";
  name="0001-Fix-div64_u64-for-32bit-platforms.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="0001-Fix-div64_u64-for-32bit-platforms.patch"

From 3b30f1cf78f88b40360dd65816941cf2be9dd60d Mon Sep 17 00:00:00 2001
From: Brian Behlendorf
Date: Thu, 5 Aug 2010 14:59:11 -0700
Subject: [PATCH] Fix div64_u64 for 32bit platforms

The current implementation of div64_u64 for 32bit systems returns an
approximately correct result when the divisor exceeds 32bits. Since
doing 64bit division using 32bit hardware is a long-since-solved
problem, just use one of the existing proven methods. Additionally,
add a div64_s64 function to correctly handle signed 64bit division.
---
 include/linux/kernel.h |    5 +++
 include/linux/math64.h |   12 +++++++
 lib/div64.c            |   64 +++++++++++++++++++++++++++++++++--------
 3 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5de838b..7a00dff 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -162,6 +162,11 @@ extern int _cond_resched(void);
 		(__x < 0) ? -__x : __x;		\
 	})
 
+#define abs64(x) ({				\
+		s64 __x = (x);			\
+		(__x < 0) ? -__x : __x;		\
+	})
+
 #ifdef CONFIG_PROVE_LOCKING
 void might_fault(void);
 #else
diff --git a/include/linux/math64.h b/include/linux/math64.h
index c87f152..23fcdfc 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -35,6 +35,14 @@ static inline u64 div64_u64(u64 dividend, u64 divisor)
 	return dividend / divisor;
 }
 
+/**
+ * div64_s64 - signed 64bit divide with 64bit divisor
+ */
+static inline s64 div64_s64(s64 dividend, s64 divisor)
+{
+	return dividend / divisor;
+}
+
 #elif BITS_PER_LONG == 32
 
 #ifndef div_u64_rem
@@ -53,6 +61,10 @@ extern s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
 extern u64 div64_u64(u64 dividend, u64 divisor);
 #endif
 
+#ifndef div64_s64
+extern s64 div64_s64(s64 dividend, s64 divisor);
+#endif
+
 #endif /* BITS_PER_LONG */
 
 /**
diff --git a/lib/div64.c b/lib/div64.c
index a111eb8..e4e7fc6 100644
--- a/lib/div64.c
+++ b/lib/div64.c
@@ -77,26 +77,68 @@ s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder)
 EXPORT_SYMBOL(div_s64_rem);
 #endif
 
-/* 64bit divisor, dividend and result. dynamic precision */
+/**
+ * div64_u64 - unsigned 64bit divide with 64bit divisor
+ * @dividend:	64bit dividend
+ * @divisor:	64bit divisor
+ *
+ * This implementation is a modified version of the algorithm proposed
+ * by the book 'Hacker's Delight'. The original source and full proof
+ * can be found here and is available for use without restriction.
+ *
+ * 'http://www.hackersdelight.org/HDcode/newCode/divDouble.c'
+ */
 #ifndef div64_u64
 u64 div64_u64(u64 dividend, u64 divisor)
 {
-	u32 high, d;
-
-	high = divisor >> 32;
-	if (high) {
-		unsigned int shift = fls(high);
+	u64 u0, quot0, quot1;
+	u32 rem;
+	int n;
+
+	if (divisor >> 32 == 0) {
+		if (dividend >> 32 < divisor) {
+			return div_u64_rem(dividend, divisor, &rem);
+		} else {
+			u0 = dividend & 0xFFFFFFFF;
+			quot1 = div_u64_rem(dividend >> 32, divisor, &rem);
+			u0 += ((u64)rem << 32);
+			quot0 = div_u64_rem(u0, divisor, &rem);
+			return (quot1 << 32) + quot0;
+		}
+	} else {
+		n = __builtin_clzll(divisor);
+		quot1 = div_u64_rem(dividend >> 1, (divisor << n) >> 32, &rem);
+		quot0 = (quot1 << n) >> 31;
 
-		d = divisor >> shift;
-		dividend >>= shift;
-	} else
-		d = divisor;
+		if (quot0 != 0)
+			quot0 = quot0 - 1;
+		if ((dividend - quot0 * divisor) >= divisor)
+			quot0 = quot0 + 1;
 
-	return div_u64(dividend, d);
+		return quot0;
+	}
 }
 EXPORT_SYMBOL(div64_u64);
 #endif
 
+/**
+ * div64_s64 - signed 64bit divide with 64bit divisor
+ * @dividend:	64bit dividend
+ * @divisor:	64bit divisor
+ */
+#ifndef div64_s64
+s64 div64_s64(s64 dividend, s64 divisor)
+{
+	s64 quot, t;
+
+	quot = div64_u64(abs64(dividend), abs64(divisor));
+	t = (dividend ^ divisor) >> 63;
+
+	return (quot ^ t) - t;
+}
+EXPORT_SYMBOL(div64_s64);
+#endif
+
 #endif /* BITS_PER_LONG == 32 */
 
 /*
-- 
1.5.4.5

--Boundary-00=_JaLtM9puqmnsqEX
Content-Type: text/x-csrc; charset="iso-8859-1"; name="div64_u64_test.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="div64_u64_test.c"

#include <linux/module.h>

/*
 * Verification test for div64_u64.
 */

#ifndef abs64
#define abs64(x) ({				\
	s64 __x = (x);				\
	(__x < 0) ? -__x : __x;			\
})
#endif

int div64_u64_test(void)
{
	u64 uu, vu, qu, ru;
	int n, i, j, errors = 0;
	const u64 tabu[] = {
		0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
		11, 12, 13, 14, 15, 16, 1000, 2003,
		32765, 32766, 32767, 32768, 32769, 32760,
		65533, 65534, 65535, 65536, 65537, 65538,
		0x7ffffffeULL, 0x7fffffffULL, 0x80000000ULL, 0x80000001ULL,
		0x7000000000000000ULL, 0x7000000080000000ULL,
		0x7000000080000001ULL, 0x7fffffffffffffffULL,
		0x7fffffff8fffffffULL, 0x7fffffff8ffffff1ULL,
		0x7fffffff00000000ULL, 0x7fffffff80000000ULL,
		0x7fffffff00000001ULL, 0x8000000000000000ULL,
		0x8000000080000000ULL, 0x8000000080000001ULL,
		0xc000000000000000ULL, 0xc000000080000000ULL,
		0xc000000080000001ULL, 0xfffffffffffffffdULL,
		0xfffffffffffffffeULL, 0xffffffffffffffffULL,
	};

	printk("%s", "Testing unsigned 64-bit division.\n");
	n = sizeof(tabu) / sizeof(tabu[0]);
	for (i = 0; i < n; i++) {
		for (j = 1; j < n; j++) {
			uu = tabu[i];
			vu = tabu[j];
			qu = div64_u64(uu, vu);
			ru = uu - qu * vu;
			if (qu > uu || ru >= vu) {
				printk("%016llx/%016llx != %016llx "
				    "rem %016llx\n", uu, vu, qu, ru);
				errors++;
			}
		}
	}

	if (errors) {
		printk("Failed %d/%d tests\n", errors, n * (n - 1));
	} else {
		printk("Passed all %d tests\n", n * (n - 1));
	}

	return 0;
}

void div64_u64_exit(void) { }

module_init(div64_u64_test);
module_exit(div64_u64_exit);

--Boundary-00=_JaLtM9puqmnsqEX
Content-Type: text/x-csrc; charset="iso-8859-1"; name="div64_s64_test.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="div64_s64_test.c"

#include <linux/module.h>

/*
 * Verification test for div64_s64.
 */

#ifndef abs64
#define abs64(x) ({				\
	s64 __x = (x);				\
	(__x < 0) ? -__x : __x;			\
})
#endif

int div64_s64_test(void)
{
	s64 u, v, q, r;
	int n, i, j, k, errors = 0;
	const s64 tabs[] = {
		0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
		11, 12, 13, 14, 15, 16, 1000, 2003,
		32765, 32766, 32767, 32768, 32769, 32760,
		65533, 65534, 65535, 65536, 65537, 65538,
		0x7ffffffeLL, 0x7fffffffLL, 0x80000000LL, 0x80000001LL,
		0x7000000000000000LL, 0x7000000080000000LL,
		0x7000000080000001LL, 0x7fffffffffffffffLL,
		0x7fffffff8fffffffLL, 0x7fffffff8ffffff1LL,
		0x7fffffff00000000LL, 0x7fffffff80000000LL,
		0x7fffffff00000001LL, 0x0123456789abcdefLL,
		0x00000000abcdef01LL, 0x0000000012345678LL,
	};

	printk("%s", "Testing signed 64-bit division.\n");
	n = sizeof(tabs) / sizeof(tabs[0]);
	for (i = 0; i < n; i++) {
		for (j = 1; j < n; j++) {
			for (k = 0; k <= 3; k++) {
				u = (k & 1) ? -tabs[i] : tabs[i];
				v = (k >= 2) ? -tabs[j] : tabs[j];
				q = div64_s64(u, v);
				r = u - q * v;
				if (abs64(q) > abs64(u) ||
				    abs64(r) >= abs64(v) ||
				    (r != 0 && (r ^ u) < 0)) {
					printk("%016llx/%016llx != %016llx "
					    "rem %016llx\n", u, v, q, r);
					errors++;
				}
			}
		}
	}

	if (errors) {
		printk("Failed %d/%d tests\n", errors, n * (n - 1));
	} else {
		printk("Passed all %d tests\n", n * (n - 1));
	}

	return 0;
}

void div64_s64_exit(void) { }

module_init(div64_s64_test);
module_exit(div64_s64_exit);

--Boundary-00=_JaLtM9puqmnsqEX
Content-Type: text/plain; charset="iso-8859-1"; name="README"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="README"

linux-2.6.35 Testing Summary

                    |     x86_64      |       x86       |
--------------------+-----------------+-----------------+
linux-2.6.35        |      PASS       |      FAIL       |
linux-2.6.35+patch  |      PASS       |      PASS       |

Testing Details (x86_64)
--------------------------------------------------------------------------
* PASS - linux-2.6.35

  Testing unsigned 64-bit division.
  Passed all 2756 tests

* PASS - linux-2.6.35+patch

  Testing unsigned 64-bit division.
  Passed all 2756 tests
  Testing signed 64-bit division.
  Passed all 2162 tests

Testing Details (x86)
--------------------------------------------------------------------------
* FAIL - linux-2.6.35

  Testing unsigned 64-bit division.
  7000000080000000/7000000080000001 != 0000000000000001 rem ffffffffffffffff
  7fffffff8fffffff/7fffffffffffffff != 0000000000000001 rem ffffffff90000000
  7fffffff8ffffff1/7fffffffffffffff != 0000000000000001 rem ffffffff8ffffff2
  7fffffff8ffffff1/7fffffff8fffffff != 0000000000000001 rem fffffffffffffff2
  7fffffff00000000/7fffffff00000001 != 0000000000000001 rem ffffffffffffffff
  7fffffff80000000/7fffffffffffffff != 0000000000000001 rem ffffffff80000001
  7fffffff80000000/7fffffff8fffffff != 0000000000000001 rem fffffffff0000001
  7fffffff80000000/7fffffff8ffffff1 != 0000000000000001 rem fffffffff000000f
  8000000000000000/8000000080000000 != 0000000000000001 rem ffffffff80000000
  8000000000000000/8000000080000001 != 0000000000000001 rem ffffffff7fffffff
  8000000080000000/8000000080000001 != 0000000000000001 rem ffffffffffffffff
  c000000000000000/c000000080000000 != 0000000000000001 rem ffffffff80000000
  c000000000000000/c000000080000001 != 0000000000000001 rem ffffffff7fffffff
  c000000080000000/c000000080000001 != 0000000000000001 rem ffffffffffffffff
  fffffffffffffffd/7fffffffffffffff != 0000000000000002 rem ffffffffffffffff
  fffffffffffffffd/fffffffffffffffe != 0000000000000001 rem ffffffffffffffff
  fffffffffffffffe/ffffffffffffffff != 0000000000000001 rem ffffffffffffffff
  Failed 17/2756 tests

* PASS - linux-2.6.35+patch

  Testing unsigned 64-bit division.
  Passed all 2756 tests
  Testing signed 64-bit division.
  Passed all 2162 tests

--Boundary-00=_JaLtM9puqmnsqEX--