From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753378Ab0JMVmQ (ORCPT ); Wed, 13 Oct 2010 17:42:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37889 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752074Ab0JMVmP (ORCPT ); Wed, 13 Oct 2010 17:42:15 -0400 Date: Wed, 13 Oct 2010 23:37:46 +0200 From: Oleg Nesterov To: Brian Behlendorf Cc: LKML , Andrew Morton Subject: Re: [PATCH] Make div64_u64() precise on 32bit platforms Message-ID: <20101013213746.GA27248@redhat.com> References: <201010121227.05735.behlendorf1@llnl.gov> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201010121227.05735.behlendorf1@llnl.gov> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/12, Brian Behlendorf wrote: > > I'm resending the patch as is and adding what I hope are the right CCs. Also > let me explain why I opted to add abs64() and use the gcc builtin. > > >Can't we just improve abs? Say, > > I was reluctant to change abs() since it would have a much larger impact on > the code base. Using typeof() should be OK but if any of the callers > mistakenly call abs() with an unsigned value then we could see compiler > warnings about '__x < 0' being a useless conditional. I see. Probably in this case we want this warning. But I agree, it is better to make a separate patch for such a change. > >This is a bit unusual. I mean, it is not that common to use gcc builtins > >in the normal code. And, it seems, we can use __fls(divisor >> 32) or > >just fls64() instead ? > > I opted for the gcc builtin because I felt it made the code more readable. I > also suspect it will perform slightly better than __fls() on some archs. Well, compared to div_64() we are going to do, this is nothing. But I won't argue. I think the patch is correct. A couple of questions though, > + * 'http://www.hackersdelight.org/HDcode/newCode/divDouble.c' 404 > u64 div64_u64(u64 dividend, u64 divisor) > { > - u32 high, d; > - > - high = divisor >> 32; > - if (high) { > - unsigned int shift = fls(high); > + u64 u0, quot0, quot1; > + u32 rem; > + int n; > + > + if (divisor >> 32 == 0) { > + if (dividend >> 32 < divisor) { > + return div_u64_rem(dividend, divisor, &rem); > + } else { > + u0 = dividend & 0xFFFFFFFF; > + quot1 = div_u64_rem(dividend >> 32, divisor, &rem); > + u0 += ((u64)rem << 32); > + quot0 = div_u64_rem(u0, divisor, &rem); > + return (quot1 << 32) + quot0; > + } Looks correct... but I can't understand these complications. Looks like we can just do if ((divisor >> 32) == 0) { div_u64(dividend, divisor); } else { ... No? > + } else { > + n = __builtin_clzll(divisor); > + quot1 = div_u64_rem(dividend >> 1, (divisor << n) >> 32, &rem); > + quot0 = (quot1 << n) >> 31; I can't understand this "dividend >> 1". It seems to me that quot1 = div_u64(dividend, (divisor << n) >> 32); quot0 = (quot1 << n) >> 32; should be equally correct. Or I missed some overflow? Anyway this looks correct, but I almost died trying to understand this code (or, better say, to convince myself I can understand it ;) Looks like, if we denote A = dividend B = divisor K = 1ull << (32 - n) then quot0 = A / (B - (B % K)) which is obviously >= A/B. All we need is to ensure is that it is <= A/B + 1, and this seems to be true. So, I believe the patch is correct. A bit off-topic, > uu = tabu[i]; > vu = tabu[j]; > qu = div64_u64(uu, vu); > ru = uu - qu * vu; > if (qu > uu || ru >= vu) { > printk("%016llx/%016llx != %016llx " > "rem %016llx\n", uu, vu, qu, ru); > errors++; > } I wouldn't trust this check too much. I mean, it can miss an error. For example, consider something like vu = -1ll uu = vu / 2 qu = 2 // suppose that div64_u64() is very wrong Afaics, this wrong qu passes the check. Oleg.