A divide by zero bug in lib/math/rational.c (with triggering input)

All of lore.kernel.org
 help / color / mirror / Atom feed

* A divide by zero bug in lib/math/rational.c (with triggering input)
@ 2021-05-21  5:09 Yiyuan guo
       [not found] ` <CAHp75Vf8kQ73w0R9ieDNjDVkxM-V83QRN9mc6BjRZA8xHpPNAA@mail.gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Yiyuan guo @ 2021-05-21  5:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: andy, tpiepho, akpm, oskar, Yiyuan guo

In the file lib/math/rational.c, the function
rational_best_approximation has the following
code:

void rational_best_approximation(
    unsigned long given_numerator, unsigned long given_denominator,
    unsigned long max_numerator, unsigned long max_denominator,
    unsigned long *best_numerator, unsigned long *best_denominator) {
   ...
   if ((n2 > max_numerator) || (d2 > max_denominator)) {
            unsigned long t = min((max_numerator - n0) / n1,
                          (max_denominator - d0) / d1);
   ...
}

d1 may be equal to zero when performing the division, leading to a
divide by zero problem.

One input  to trigger the divide by zero bug is:
rational_best_approximation(31415, 100, (1 << 8) - 1, (1 << 5) - 1, &n, &d)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: A divide by zero bug in lib/math/rational.c (with triggering input)
       [not found]   ` <CAHp75Vft8pnA+m0C=Ok7nRyjERAd2uJJ4q6HcN460j0Hir6Kaw@mail.gmail.com>
@ 2021-05-21  7:55     ` Yiyuan guo
  2021-05-21  9:20       ` Trent Piepho
  0 siblings, 1 reply; 7+ messages in thread
From: Yiyuan guo @ 2021-05-21  7:55 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: linux-kernel, andy, tpiepho, akpm, oskar

Thanks for your timely response.

I am not familiar with the theorem. But any input satisfying the
condition below will
trigger a divide by zero at the first loop iteration:

(given_numerator / given_denominator > max_numerator) || (1 +
given_numerator / given_denominator > max_denominator)

I think such a condition is rather complex and may not be enforced by
all callers of this function.

On Fri, May 21, 2021 at 3:42 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
>
>
>
> On Friday, May 21, 2021, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
>>
>>
>>
>> On Friday, May 21, 2021, Yiyuan guo <yguoaz@gmail.com> wrote:
>>>
>>> In the file lib/math/rational.c, the function
>>> rational_best_approximation has the following
>>> code:
>>>
>>> void rational_best_approximation(
>>>     unsigned long given_numerator, unsigned long given_denominator,
>>>     unsigned long max_numerator, unsigned long max_denominator,
>>>     unsigned long *best_numerator, unsigned long *best_denominator) {
>>>    ...
>>>    if ((n2 > max_numerator) || (d2 > max_denominator)) {
>>>             unsigned long t = min((max_numerator - n0) / n1,
>>>                           (max_denominator - d0) / d1);
>>>    ...
>>> }
>>>
>>> d1 may be equal to zero when performing the division, leading to a
>>> divide by zero problem.
>>>
>>> One input  to trigger the divide by zero bug is:
>>> rational_best_approximation(31415, 100, (1 << 8) - 1, (1 << 5) - 1, &n, &d)
>>
>>
>>
>> Have you read a theorem about this? TL;DR; as far as I can see the input data is not suitable for this function.
>>
>
>
> I think we may add the proper check and saturate the output which in your case should be (255,1).
>
>>
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
>>
>>
>
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: A divide by zero bug in lib/math/rational.c (with triggering input)
  2021-05-21  7:55     ` Yiyuan guo
@ 2021-05-21  9:20       ` Trent Piepho
  2021-05-21  9:53         ` Andy Shevchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Trent Piepho @ 2021-05-21  9:20 UTC (permalink / raw)
  To: Yiyuan guo; +Cc: Andy Shevchenko, linux-kernel, andy, akpm, oskar

On Fri, May 21, 2021 at 12:55 AM Yiyuan guo <yguoaz@gmail.com> wrote:
>
> Thanks for your timely response.
>
> I am not familiar with the theorem. But any input satisfying the
> condition below will
> trigger a divide by zero at the first loop iteration:
>
> (given_numerator / given_denominator > max_numerator) || (1 +
> given_numerator / given_denominator > max_denominator)

I think the error can only occur when the loop exits on the 1st
iteration, when d1 is still zero.  In this case the prior convergent,
n1/d1 = 1/0, does not really exist as this is the 1st iteration.  The
actual series of convergents generated will never have zero terms,
because we stop at zero, so there will never be zero from the prior
iteration as we would have stopped there.

I think the prior version of the code, which did not consider
semi-convergents, would have determined the 1st convergent, 314/1,
exceeded the bounds and would return the prior one, 1/0, without
generating an exception but also not a correct answer, since 1/0 isn't
really part of the series, it's just an initial value to make the math
that generates the series work (d2 = a * d1 + d0).

With semi-convergents, this can actually get the correct answer.  The
best semi-convergent term is correctly found, (max_numerator - n0) /
n1 = 255.  Using this would return 255/1, which is in this case the
best answer.

But the "is semi-convergent better than prior convergent" test does
not consider what I think is a special case of there being no prior
convergent.  In this case it should always select the semi-convergent.

I think this handles it:

                if ((n2 > max_numerator) || (d2 > max_denominator)) {
                       unsigned long t = (max_numerator - n0) / n1;
                       if (!d1 || (t = min(t, max_denominator - d0) / d1)) ||
                           2u * t > a || (2u * t == a && d0 * dp > d1 * d)) {
                               n1 = n0 + t * n1;
                               d1 = d0 + t * d1;
                       }
                       break;
               }

Above !d1 is the special case.  I don't like that, but I'm not seeing
a way to think about the problect that doesn't involve one.

> I think such a condition is rather complex and may not be enforced by
> all callers of this function.
>
> On Fri, May 21, 2021 at 3:42 PM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> >
> >
> >
> > On Friday, May 21, 2021, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> >>
> >>
> >>
> >> On Friday, May 21, 2021, Yiyuan guo <yguoaz@gmail.com> wrote:
> >>>
> >>> In the file lib/math/rational.c, the function
> >>> rational_best_approximation has the following
> >>> code:
> >>>
> >>> void rational_best_approximation(
> >>>     unsigned long given_numerator, unsigned long given_denominator,
> >>>     unsigned long max_numerator, unsigned long max_denominator,
> >>>     unsigned long *best_numerator, unsigned long *best_denominator) {
> >>>    ...
> >>>    if ((n2 > max_numerator) || (d2 > max_denominator)) {
> >>>             unsigned long t = min((max_numerator - n0) / n1,
> >>>                           (max_denominator - d0) / d1);
> >>>    ...
> >>> }
> >>>
> >>> d1 may be equal to zero when performing the division, leading to a
> >>> divide by zero problem.
> >>>
> >>> One input  to trigger the divide by zero bug is:
> >>> rational_best_approximation(31415, 100, (1 << 8) - 1, (1 << 5) - 1, &n, &d)
> >>
> >>
> >>
> >> Have you read a theorem about this? TL;DR; as far as I can see the input data is not suitable for this function.
> >>
> >
> >
> > I think we may add the proper check and saturate the output which in your case should be (255,1).
> >
> >>
> >>
> >> --
> >> With Best Regards,
> >> Andy Shevchenko
> >>
> >>
> >
> >
> > --
> > With Best Regards,
> > Andy Shevchenko
> >
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: A divide by zero bug in lib/math/rational.c (with triggering input)
  2021-05-21  9:20       ` Trent Piepho
@ 2021-05-21  9:53         ` Andy Shevchenko
  2021-05-22 19:07           ` Oskar Schirmer
  2021-05-23  0:05           ` Trent Piepho
  0 siblings, 2 replies; 7+ messages in thread
From: Andy Shevchenko @ 2021-05-21  9:53 UTC (permalink / raw)
  To: Trent Piepho, Daniel Latypov; +Cc: Yiyuan guo, linux-kernel, andy, akpm, oskar

+Cc: Daniel (here is a real case for test cases!)

On Fri, May 21, 2021 at 12:20 PM Trent Piepho <tpiepho@gmail.com> wrote:
> On Fri, May 21, 2021 at 12:55 AM Yiyuan guo <yguoaz@gmail.com> wrote:
> >
> > Thanks for your timely response.
> >
> > I am not familiar with the theorem. But any input satisfying the
> > condition below will
> > trigger a divide by zero at the first loop iteration:
> >
> > (given_numerator / given_denominator > max_numerator) || (1 +
> > given_numerator / given_denominator > max_denominator)
>
> I think the error can only occur when the loop exits on the 1st
> iteration, when d1 is still zero.  In this case the prior convergent,
> n1/d1 = 1/0, does not really exist as this is the 1st iteration.  The
> actual series of convergents generated will never have zero terms,
> because we stop at zero, so there will never be zero from the prior
> iteration as we would have stopped there.

This is my conclusion as well, but you beat me to it.
And below is exactly my understanding of what's going on.

> I think the prior version of the code, which did not consider
> semi-convergents, would have determined the 1st convergent, 314/1,
> exceeded the bounds and would return the prior one, 1/0, without
> generating an exception but also not a correct answer, since 1/0 isn't
> really part of the series, it's just an initial value to make the math
> that generates the series work (d2 = a * d1 + d0).
>
> With semi-convergents, this can actually get the correct answer.  The
> best semi-convergent term is correctly found, (max_numerator - n0) /
> n1 = 255.  Using this would return 255/1, which is in this case the
> best answer.
>
> But the "is semi-convergent better than prior convergent" test does
> not consider what I think is a special case of there being no prior
> convergent.  In this case it should always select the semi-convergent.
>
> I think this handles it:
>
>                 if ((n2 > max_numerator) || (d2 > max_denominator)) {
>                        unsigned long t = (max_numerator - n0) / n1;
>                        if (!d1 || (t = min(t, max_denominator - d0) / d1)) ||
>                            2u * t > a || (2u * t == a && d0 * dp > d1 * d)) {
>                                n1 = n0 + t * n1;
>                                d1 = d0 + t * d1;
>                        }
>                        break;
>                }
>
> Above !d1 is the special case.  I don't like that, but I'm not seeing
> a way to think about the problect that doesn't involve one.

Let me think about it.

> > I think such a condition is rather complex and may not be enforced by
> > all callers of this function.
> >
> > On Fri, May 21, 2021 at 3:42 PM Andy Shevchenko
> > <andy.shevchenko@gmail.com> wrote:
> > > On Friday, May 21, 2021, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > >> On Friday, May 21, 2021, Yiyuan guo <yguoaz@gmail.com> wrote:
> > >>>
> > >>> In the file lib/math/rational.c, the function
> > >>> rational_best_approximation has the following
> > >>> code:
> > >>>
> > >>> void rational_best_approximation(
> > >>>     unsigned long given_numerator, unsigned long given_denominator,
> > >>>     unsigned long max_numerator, unsigned long max_denominator,
> > >>>     unsigned long *best_numerator, unsigned long *best_denominator) {
> > >>>    ...
> > >>>    if ((n2 > max_numerator) || (d2 > max_denominator)) {
> > >>>             unsigned long t = min((max_numerator - n0) / n1,
> > >>>                           (max_denominator - d0) / d1);
> > >>>    ...
> > >>> }
> > >>>
> > >>> d1 may be equal to zero when performing the division, leading to a
> > >>> divide by zero problem.
> > >>>
> > >>> One input  to trigger the divide by zero bug is:
> > >>> rational_best_approximation(31415, 100, (1 << 8) - 1, (1 << 5) - 1, &n, &d)
> > >>
> > >> Have you read a theorem about this? TL;DR; as far as I can see the input data is not suitable for this function.
> > >
> > > I think we may add the proper check and saturate the output which in your case should be (255,1).



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: A divide by zero bug in lib/math/rational.c (with triggering input)
  2021-05-21  9:53         ` Andy Shevchenko
@ 2021-05-22 19:07           ` Oskar Schirmer
  2021-05-24  3:20             ` Trent Piepho
  2021-05-23  0:05           ` Trent Piepho
  1 sibling, 1 reply; 7+ messages in thread
From: Oskar Schirmer @ 2021-05-22 19:07 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Trent Piepho, Daniel Latypov, Yiyuan guo, linux-kernel, andy, akpm

On Fri, May 21, 2021 at 12:53:27 +0300, Andy Shevchenko wrote:
> +Cc: Daniel (here is a real case for test cases!)
> 
> On Fri, May 21, 2021 at 12:20 PM Trent Piepho <tpiepho@gmail.com> wrote:
> > On Fri, May 21, 2021 at 12:55 AM Yiyuan guo <yguoaz@gmail.com> wrote:
> > >
> > > Thanks for your timely response.
> > >
> > > I am not familiar with the theorem. But any input satisfying the
> > > condition below will
> > > trigger a divide by zero at the first loop iteration:
> > >
> > > (given_numerator / given_denominator > max_numerator) || (1 +
> > > given_numerator / given_denominator > max_denominator)
> >
> > I think the error can only occur when the loop exits on the 1st
> > iteration, when d1 is still zero.  In this case the prior convergent,
> > n1/d1 = 1/0, does not really exist as this is the 1st iteration.  The
> > actual series of convergents generated will never have zero terms,
> > because we stop at zero, so there will never be zero from the prior
> > iteration as we would have stopped there.
> 
> This is my conclusion as well, but you beat me to it.
> And below is exactly my understanding of what's going on.
> 
> > I think the prior version of the code, which did not consider
> > semi-convergents, would have determined the 1st convergent, 314/1,
> > exceeded the bounds and would return the prior one, 1/0, without
> > generating an exception but also not a correct answer, since 1/0 isn't
> > really part of the series, it's just an initial value to make the math
> > that generates the series work (d2 = a * d1 + d0).
> >
> > With semi-convergents, this can actually get the correct answer.  The
> > best semi-convergent term is correctly found, (max_numerator - n0) /
> > n1 = 255.  Using this would return 255/1, which is in this case the
> > best answer.
> >
> > But the "is semi-convergent better than prior convergent" test does
> > not consider what I think is a special case of there being no prior
> > convergent.  In this case it should always select the semi-convergent.
> >
> > I think this handles it:
> >
> >                 if ((n2 > max_numerator) || (d2 > max_denominator)) {
> >                        unsigned long t = (max_numerator - n0) / n1;
> >                        if (!d1 || (t = min(t, max_denominator - d0) / d1)) ||
> >                            2u * t > a || (2u * t == a && d0 * dp > d1 * d)) {
> >                                n1 = n0 + t * n1;
> >                                d1 = d0 + t * d1;
> >                        }
> >                        break;
> >                }

Sorry, it does not. E.g. with the given fraction of 31/1000
and the registers restricted to 8 and 5 bits respectively, the
proposed fixed function would still divide by zero, because
n1 == 0. If it was for the division by d1, the test for !d1
will cut the expression for the conditional short, as intended,
but as the branch then uses t, the evaluation for t = ... / d1
will still be performed.

Moreover, for a fraction of 33/1000, both the original and
the latest version would produce 1/30, which is off by some
1.01%, but the proposed fixed version would result in 1/31,
which is worse: 2.24% off.

We are not talking about a science math library, but only
about a helper function for kernel use, and as results with
somewhat less than perfect approximation only occur in cases,
where hardware limitations do not allow the precise result,
I think the original function was not so bad. And the code it
produced was much shorter than the latest version, although
this might not be an argument in times, where a simple OS
kernel is beyond the 40MB.

Admitted, the original version had the flaw of offering bogus
fractions for input beyond the saturation limits, and this
case sure should be handled. And the function is called "best",
so Trents approach to look for better results is sure valid.

Nonetheless, I'ld favour going back to the original Euclidian,
and maybe add a test for the saturation cases to avoid the
caller running into trouble, even though given the use case
the caller will probably run into trouble with this fixed,
when the registers simply cannot hold a good approximation.
So maybe the function should return a boolean to indicate.

best regards,
  Oskar



> > Above !d1 is the special case.  I don't like that, but I'm not seeing
> > a way to think about the problect that doesn't involve one.
> 
> Let me think about it.
> 
> > > I think such a condition is rather complex and may not be enforced by
> > > all callers of this function.
> > >
> > > On Fri, May 21, 2021 at 3:42 PM Andy Shevchenko
> > > <andy.shevchenko@gmail.com> wrote:
> > > > On Friday, May 21, 2021, Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > > >> On Friday, May 21, 2021, Yiyuan guo <yguoaz@gmail.com> wrote:
> > > >>>
> > > >>> In the file lib/math/rational.c, the function
> > > >>> rational_best_approximation has the following
> > > >>> code:
> > > >>>
> > > >>> void rational_best_approximation(
> > > >>>     unsigned long given_numerator, unsigned long given_denominator,
> > > >>>     unsigned long max_numerator, unsigned long max_denominator,
> > > >>>     unsigned long *best_numerator, unsigned long *best_denominator) {
> > > >>>    ...
> > > >>>    if ((n2 > max_numerator) || (d2 > max_denominator)) {
> > > >>>             unsigned long t = min((max_numerator - n0) / n1,
> > > >>>                           (max_denominator - d0) / d1);
> > > >>>    ...
> > > >>> }
> > > >>>
> > > >>> d1 may be equal to zero when performing the division, leading to a
> > > >>> divide by zero problem.
> > > >>>
> > > >>> One input  to trigger the divide by zero bug is:
> > > >>> rational_best_approximation(31415, 100, (1 << 8) - 1, (1 << 5) - 1, &n, &d)
> > > >>
> > > >> Have you read a theorem about this? TL;DR; as far as I can see the input data is not suitable for this function.
> > > >
> > > > I think we may add the proper check and saturate the output which in your case should be (255,1).
> 
> 
> 
> -- 
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: A divide by zero bug in lib/math/rational.c (with triggering input)
  2021-05-21  9:53         ` Andy Shevchenko
  2021-05-22 19:07           ` Oskar Schirmer
@ 2021-05-23  0:05           ` Trent Piepho
  1 sibling, 0 replies; 7+ messages in thread
From: Trent Piepho @ 2021-05-23  0:05 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Daniel Latypov, Yiyuan guo, linux-kernel, andy, akpm, oskar

On Fri, May 21, 2021 at 2:53 AM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> >
> > I think the error can only occur when the loop exits on the 1st
> > iteration, when d1 is still zero.  In this case the prior convergent,
> > n1/d1 = 1/0, does not really exist as this is the 1st iteration.  The
> > actual series of convergents generated will never have zero terms,
> > because we stop at zero, so there will never be zero from the prior
> > iteration as we would have stopped there.
>
> This is my conclusion as well, but you beat me to it.
> And below is exactly my understanding of what's going on.

I came up with some more test cases, and there is another possibility,
if the value is small. e.g.

rational_best_approximation(1,30, 1,10, ...)
rational_best_approximation(1,19, 1,10, ...)

The former should be 0/1 and the latter 1/10.  These will divide by
zero on the 2nd iteration.

But I have a patch now that works.  It gets the closest answer in all
cases, larger than max, less than min but closer to the min than to
zero, and closest to zero.

It ends up being zero additional arithmetic to do this.   All that is
needed is a few additional branches in the termination condition.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: A divide by zero bug in lib/math/rational.c (with triggering input)
  2021-05-22 19:07           ` Oskar Schirmer
@ 2021-05-24  3:20             ` Trent Piepho
  0 siblings, 0 replies; 7+ messages in thread
From: Trent Piepho @ 2021-05-24  3:20 UTC (permalink / raw)
  To: Oskar Schirmer
  Cc: Andy Shevchenko, Daniel Latypov, Yiyuan guo, linux-kernel, andy, akpm

On Sat, May 22, 2021 at 12:08 PM Oskar Schirmer <oskar@scara.com> wrote:
> On Fri, May 21, 2021 at 12:53:27 +0300, Andy Shevchenko wrote:
> > On Fri, May 21, 2021 at 12:20 PM Trent Piepho <tpiepho@gmail.com> wrote:
> > > On Fri, May 21, 2021 at 12:55 AM Yiyuan guo <yguoaz@gmail.com> wrote:
>
> Sorry, it does not. E.g. with the given fraction of 31/1000
> and the registers restricted to 8 and 5 bits respectively, the
> proposed fixed function would still divide by zero, because
> n1 == 0. If it was for the division by d1, the test for !d1

Yes, values less than 1 less than the smallest allowed non-zero value
will divide by zero will finish on the 2nd iteration, with n1 == 0,
and divide by zero.

The finished patch I've since sent fixes this.

> Moreover, for a fraction of 33/1000, both the original and
> the latest version would produce 1/30, which is off by some
> 1.01%, but the proposed fixed version would result in 1/31,
> which is worse: 2.24% off.

Finished patch correctly produces 1/30 in this case.

> I think the original function was not so bad. And the code it
> produced was much shorter than the latest version, although
> this might not be an argument in times, where a simple OS
> kernel is beyond the 40MB.

I measured this.  I've compared the original, which did not consider
semi-convergents nor out of range values, the current version, which
does semi-convergents but fails on out of range, and the patched
version, which handles that too.

Size in bytes:
X64 149 205 278
ARM 164 220 300

So 129 bytes on x64 and 136 bytes on ARM.  Not all that much.  I
didn't try writing a special check for large/small inputs in the older
code to see how large that was, for a more like-to-like comparison to
my latest patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-05-24  3:20 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-21  5:09 A divide by zero bug in lib/math/rational.c (with triggering input) Yiyuan guo
     [not found] ` <CAHp75Vf8kQ73w0R9ieDNjDVkxM-V83QRN9mc6BjRZA8xHpPNAA@mail.gmail.com>
     [not found]   ` <CAHp75Vft8pnA+m0C=Ok7nRyjERAd2uJJ4q6HcN460j0Hir6Kaw@mail.gmail.com>
2021-05-21  7:55     ` Yiyuan guo
2021-05-21  9:20       ` Trent Piepho
2021-05-21  9:53         ` Andy Shevchenko
2021-05-22 19:07           ` Oskar Schirmer
2021-05-24  3:20             ` Trent Piepho
2021-05-23  0:05           ` Trent Piepho

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.