linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Fix for __div64_32 locks when using some 64 bit numbers
@ 2009-03-17 21:15 davidastro
  2009-03-17 23:03 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: davidastro @ 2009-03-17 21:15 UTC (permalink / raw)
  To: linuxppc-dev


I found a bug when using the function __div64_32 in assembly in a 32 bit ppc
architecture unit.

I tried the numbers 55834565048000000 for the dividend and 4294967079 for
the divisor. When passing these two numbers to the function  __div64_32, I
had a software lock. I searched for possible patches online and in different
forums but I could not find anything related to the assembly implementation
to this function (I would have to apologize if somebody already found a fix
:-) ).

Anyway, when analyzing the assembly code, I found out with gdb the problem.
I am not an expert in ppc architecture but I read the documentation and I am
pretty sure I solved the issue (I have been testing for couple of days using
random 64 to 32 number combinations with good results).

Who or Where should I post the fix to be reviewed.

Thanks for your attention
-- 
View this message in context: http://www.nabble.com/Fix-for-__div64_32-locks-when-using-some-64-bit-numbers-tp22567864p22567864.html
Sent from the linuxppc-dev mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix for __div64_32 locks when using some 64 bit numbers
  2009-03-17 21:15 Fix for __div64_32 locks when using some 64 bit numbers davidastro
@ 2009-03-17 23:03 ` Benjamin Herrenschmidt
  2009-03-18 15:33   ` davidastro
  2009-03-20 19:33   ` davidastro
  0 siblings, 2 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2009-03-17 23:03 UTC (permalink / raw)
  To: davidastro; +Cc: linuxppc-dev

On Tue, 2009-03-17 at 14:15 -0700, davidastro wrote:
> I found a bug when using the function __div64_32 in assembly in a 32 bit ppc
> architecture unit.
> 
> I tried the numbers 55834565048000000 for the dividend and 4294967079 for
> the divisor. When passing these two numbers to the function  __div64_32, I
> had a software lock. I searched for possible patches online and in different
> forums but I could not find anything related to the assembly implementation
> to this function (I would have to apologize if somebody already found a fix
> :-) ).
> 
> Anyway, when analyzing the assembly code, I found out with gdb the problem.
> I am not an expert in ppc architecture but I read the documentation and I am
> pretty sure I solved the issue (I have been testing for couple of days using
> random 64 to 32 number combinations with good results).
> 
> Who or Where should I post the fix to be reviewed.

Here is fine :-)

Ben.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix for __div64_32 locks when using some 64 bit numbers
  2009-03-17 23:03 ` Benjamin Herrenschmidt
@ 2009-03-18 15:33   ` davidastro
  2009-03-23  4:39     ` Paul Mackerras
  2009-03-20 19:33   ` davidastro
  1 sibling, 1 reply; 7+ messages in thread
From: davidastro @ 2009-03-18 15:33 UTC (permalink / raw)
  To: linuxppc-dev



Benjamin Herrenschmidt wrote:
> 
> On Tue, 2009-03-17 at 14:15 -0700, davidastro wrote:
>> I found a bug when using the function __div64_32 in assembly in a 32 bit
>> ppc
>> architecture unit.
>> 
>> I tried the numbers 55834565048000000 for the dividend and 4294967079 for
>> the divisor. When passing these two numbers to the function  __div64_32,
>> I
>> had a software lock. I searched for possible patches online and in
>> different
>> forums but I could not find anything related to the assembly
>> implementation
>> to this function (I would have to apologize if somebody already found a
>> fix
>> :-) ).
>> 
>> Anyway, when analyzing the assembly code, I found out with gdb the
>> problem.
>> I am not an expert in ppc architecture but I read the documentation and I
>> am
>> pretty sure I solved the issue (I have been testing for couple of days
>> using
>> random 64 to 32 number combinations with good results).
>> 
>> Who or Where should I post the fix to be reviewed.
> 
> Here is fine :-)
> 
> Ben.
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
> 
> 


Basically, the numbers shown above was causing the 64 by 32 bit algorithm to
divide by zero making the unit spin and also giving incorrect results.
Here is the code as it was before.


#include <asm/ppc_asm.h>
#include <asm/processor.h>

_GLOBAL(__div64_32)
        lwz     r5,0(r3)        # get the dividend into r5/r6
        lwz     r6,4(r3)
        cmplw   r5,r4
        li      r7,0
        li      r8,0
        blt     1f
        divwu   r7,r5,r4        # if dividend.hi >= divisor,
        mullw   r0,r7,r4        # quotient.hi = dividend.hi / divisor
        subf.   r5,r0,r5        # dividend.hi %= divisor
        beq     3f
1:      mr      r11,r5          # here dividend.hi != 0
        andis.  r0,r5,0xc000
        bne     2f
        cntlzw  r0,r5           # we are shifting the dividend right
        li      r10,-1          # to make it < 2^32, and shifting
        srw     r10,r10,r0      # the divisor right the same amount,
        add     r9,r4,r10       # rounding up (so the estimate cannot
        andc    r11,r6,r10      # ever be too large, only too small)
        andc    r9,r9,r10      #THIS CODE COULD STORE A ZERO IN r9
        or      r11,r5,r11
        rotlw   r9,r9,r0
        rotlw   r11,r11,r0
        divwu   r11,r11,r9      # WARNING r9 COULD BE ZERO
2:      mullw   r10,r11,r4      # to get an estimate of the quotient,
        mulhwu  r9,r11,r4       # multiply the estimate by the divisor,
        subfc   r6,r10,r6       # take the product from the divisor,
        add     r8,r8,r11       # and add the estimate to the accumulated
        subfe.  r5,r9,r5        # quotient
        bne     1b
3:      cmplw   r6,r4
        blt     4f
        divwu   r0,r6,r4        # perform the remaining 32-bit division
        mullw   r10,r0,r4       # and get the remainder
        add     r8,r8,r0
        subf    r6,r10,r6
4:      stw     r7,0(r3)        # return the quotient in *r3
        stw     r8,4(r3)
        mr      r3,r6           # return the remainder in r3
        blr
 

In the line in bold with the warning:
"divwu   r11,r11,r9      # WARNING r9 COULD BE ZERO"

r9 could be zero if giving the right number by the previous operation:
"andc    r9,r9,r10      #THIS CODE COULD STORE A ZERO IN r9"

My propose solution is the following:

#include <asm/ppc_asm.h>
#include <asm/processor.h>

_GLOBAL(__div64_32)
        lwz     r5,0(r3)        # get the dividend into r5/r6
        lwz     r6,4(r3)
        cmplw   r5,r4
        li      r7,0
        li      r8,0
        blt     1f
        divwu   r7,r5,r4        # if dividend.hi >= divisor,
        mullw   r0,r7,r4        # quotient.hi = dividend.hi / divisor
        subf.   r5,r0,r5        # dividend.hi %= divisor
        beq     3f
1:      mr      r11,r5          # here dividend.hi != 0
        andis.  r0,r5,0xc000
        bne     2f
        cntlzw  r0,r5           # we are shifting the dividend right
        li      r10,-1          # to make it < 2^32, and shifting
        srw     r10,r10,r0      # the divisor right the same amount,
        add     r9,r4,r10       # rounding up (so the estimate cannot
        andc    r11,r6,r10      # ever be too large, only too small)
        andc.   r9,r9,r10		# Check result is not equal to zero (r9 is
dividing later on)
        bne		5f
        li		r9,1     		# Magic number to avoid r9 = 0 if ever happens
5:     or		r11,r5,r11        
        rotlw   r9,r9,r0
        rotlw   r11,r11,r0
        divwu   r11,r11,r9      # then we divide the shifted quantities
2:      mullw   r10,r11,r4      # to get an estimate of the quotient,
        mulhwu  r9,r11,r4       # multiply the estimate by the divisor,
        subfc   r6,r10,r6       # take the product from the divisor,
        add     r8,r8,r11       # and add the estimate to the accumulated
        subfe.  r5,r9,r5        # quotient
        bne     1b
3:      cmplw   r6,r4
        blt     4f
        divwu   r0,r6,r4        # perform the remaining 32-bit division
        mullw   r10,r0,r4       # and get the remainder
        add     r8,r8,r0
        subf    r6,r10,r6
4:      stw     r7,0(r3)        # return the quotient in *r3
        stw     r8,4(r3)
        mr      r3,r6           # return the remainder in r3
        blr


Now, I check if the operation:
"andc.   r9,r9,r10"

is equal to zero. If not equal to zero branch to 5:
"bne		5f"

which is the normal behavior of the algorithm, but if equal to zero perform:
"li		r9,1     		# Magic number to avoid r9 = 0 if ever happens"

In this case I assigned a 1 to r9 which is the smallest number close to
zero.
The division algorithm works by approximations and 1 one is a good
approximation for this pass.

Well, I have tested this code for two days and I have compared the results
with the C generic implementation
obtaining the same results but faster execution by the assembly
implementations (6-10 times faster).

Please, let me know if this solution pass the test and if it does, I would
like to know in which kernel version would be applied.


Thanks for your attention,   
-- 
View this message in context: http://www.nabble.com/Fix-for-__div64_32-locks-when-using-some-64-bit-numbers-tp22567864p22581509.html
Sent from the linuxppc-dev mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix for __div64_32 locks when using some 64 bit numbers
  2009-03-17 23:03 ` Benjamin Herrenschmidt
  2009-03-18 15:33   ` davidastro
@ 2009-03-20 19:33   ` davidastro
  2009-03-21 21:46     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 7+ messages in thread
From: davidastro @ 2009-03-20 19:33 UTC (permalink / raw)
  To: linuxppc-dev


Hi Ben:

I was wondering if you have any change to look into and test the propose fix
I suggested in my previous post.
I'd like to know if the fix is correct.

Thanks for your attention,


Benjamin Herrenschmidt wrote:
> 
> On Tue, 2009-03-17 at 14:15 -0700, davidastro wrote:
>> I found a bug when using the function __div64_32 in assembly in a 32 bit
>> ppc
>> architecture unit.
>> 
>> I tried the numbers 55834565048000000 for the dividend and 4294967079 for
>> the divisor. When passing these two numbers to the function  __div64_32,
>> I
>> had a software lock. I searched for possible patches online and in
>> different
>> forums but I could not find anything related to the assembly
>> implementation
>> to this function (I would have to apologize if somebody already found a
>> fix
>> :-) ).
>> 
>> Anyway, when analyzing the assembly code, I found out with gdb the
>> problem.
>> I am not an expert in ppc architecture but I read the documentation and I
>> am
>> pretty sure I solved the issue (I have been testing for couple of days
>> using
>> random 64 to 32 number combinations with good results).
>> 
>> Who or Where should I post the fix to be reviewed.
> 
> Here is fine :-)
> 
> Ben.
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
> 
> 

-- 
View this message in context: http://www.nabble.com/Fix-for-__div64_32-locks-when-using-some-64-bit-numbers-tp22567864p22627440.html
Sent from the linuxppc-dev mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix for __div64_32 locks when using some 64 bit numbers
  2009-03-20 19:33   ` davidastro
@ 2009-03-21 21:46     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2009-03-21 21:46 UTC (permalink / raw)
  To: davidastro; +Cc: linuxppc-dev

On Fri, 2009-03-20 at 12:33 -0700, davidastro wrote:
> Hi Ben:
> 
> I was wondering if you have any change to look into and test the propose fix
> I suggested in my previous post.
> I'd like to know if the fix is correct.

Sorry, I haven't had a chance yet. I will asap.

Ben.

> Thanks for your attention,
> 
> 
> Benjamin Herrenschmidt wrote:
> > 
> > On Tue, 2009-03-17 at 14:15 -0700, davidastro wrote:
> >> I found a bug when using the function __div64_32 in assembly in a 32 bit
> >> ppc
> >> architecture unit.
> >> 
> >> I tried the numbers 55834565048000000 for the dividend and 4294967079 for
> >> the divisor. When passing these two numbers to the function  __div64_32,
> >> I
> >> had a software lock. I searched for possible patches online and in
> >> different
> >> forums but I could not find anything related to the assembly
> >> implementation
> >> to this function (I would have to apologize if somebody already found a
> >> fix
> >> :-) ).
> >> 
> >> Anyway, when analyzing the assembly code, I found out with gdb the
> >> problem.
> >> I am not an expert in ppc architecture but I read the documentation and I
> >> am
> >> pretty sure I solved the issue (I have been testing for couple of days
> >> using
> >> random 64 to 32 number combinations with good results).
> >> 
> >> Who or Where should I post the fix to be reviewed.
> > 
> > Here is fine :-)
> > 
> > Ben.
> > 
> > 
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@ozlabs.org
> > https://ozlabs.org/mailman/listinfo/linuxppc-dev
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix for __div64_32 locks when using some 64 bit numbers
  2009-03-18 15:33   ` davidastro
@ 2009-03-23  4:39     ` Paul Mackerras
  2009-03-23 17:56       ` davidastro
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Mackerras @ 2009-03-23  4:39 UTC (permalink / raw)
  To: davidastro; +Cc: linuxppc-dev

davidastro <davidastro@hotmail.com> writes:

> Basically, the numbers shown above was causing the 64 by 32 bit algorithm to
> divide by zero making the unit spin and also giving incorrect results.
> Here is the code as it was before.
[snip]
>         cntlzw  r0,r5           # we are shifting the dividend right
>         li      r10,-1          # to make it < 2^32, and shifting
>         srw     r10,r10,r0      # the divisor right the same amount,
>         add     r9,r4,r10       # rounding up (so the estimate cannot
>         andc    r11,r6,r10      # ever be too large, only too small)
>         andc    r9,r9,r10      #THIS CODE COULD STORE A ZERO IN r9
>         or      r11,r5,r11
>         rotlw   r9,r9,r0
>         rotlw   r11,r11,r0

That bug was fixed in October 2005 in commit 344480b99730bfd2, and the
fix is in v2.6.15-rc1 and all later kernels.  I fixed it a bit
differently to your suggestion though - see

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=344480b99730bfd205e306d3fd168cdcebe83425

You must be working from an old kernel tree - which kernel version are
you using?

Paul.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fix for __div64_32 locks when using some 64 bit numbers
  2009-03-23  4:39     ` Paul Mackerras
@ 2009-03-23 17:56       ` davidastro
  0 siblings, 0 replies; 7+ messages in thread
From: davidastro @ 2009-03-23 17:56 UTC (permalink / raw)
  To: linuxppc-dev


Paul,

You are correct. Since it was such a small piece of code, I didn't use diff
between the kernel I am using (2.6.14) and the newest ones (2.6.28 on the
Linux). Also, I do not know how to look in the kernel files commit history.
I guess the link you gave me might be of some help. 

Thanks for your help


Paul Mackerras wrote:
> 
> davidastro <davidastro@hotmail.com> writes:
> 
>> Basically, the numbers shown above was causing the 64 by 32 bit algorithm
>> to
>> divide by zero making the unit spin and also giving incorrect results.
>> Here is the code as it was before.
> [snip]
>>         cntlzw  r0,r5           # we are shifting the dividend right
>>         li      r10,-1          # to make it < 2^32, and shifting
>>         srw     r10,r10,r0      # the divisor right the same amount,
>>         add     r9,r4,r10       # rounding up (so the estimate cannot
>>         andc    r11,r6,r10      # ever be too large, only too small)
>>         andc    r9,r9,r10      #THIS CODE COULD STORE A ZERO IN r9
>>         or      r11,r5,r11
>>         rotlw   r9,r9,r0
>>         rotlw   r11,r11,r0
> 
> That bug was fixed in October 2005 in commit 344480b99730bfd2, and the
> fix is in v2.6.15-rc1 and all later kernels.  I fixed it a bit
> differently to your suggestion though - see
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=344480b99730bfd205e306d3fd168cdcebe83425
> 
> You must be working from an old kernel tree - which kernel version are
> you using?
> 
> Paul.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
> 
> 

-- 
View this message in context: http://www.nabble.com/Fix-for-__div64_32-locks-when-using-some-64-bit-numbers-tp22567864p22665081.html
Sent from the linuxppc-dev mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-03-23 17:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-17 21:15 Fix for __div64_32 locks when using some 64 bit numbers davidastro
2009-03-17 23:03 ` Benjamin Herrenschmidt
2009-03-18 15:33   ` davidastro
2009-03-23  4:39     ` Paul Mackerras
2009-03-23 17:56       ` davidastro
2009-03-20 19:33   ` davidastro
2009-03-21 21:46     ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).