Re: [PATCH v2 07/28] softfloat: Move sqrt_float to softfloat-parts.c.inc

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-devel@nongnu.org, david@redhat.com
Subject: Re: [PATCH v2 07/28] softfloat: Move sqrt_float to softfloat-parts.c.inc
Date: Thu, 03 Jun 2021 10:17:37 +0100	[thread overview]
Message-ID: <87o8cn9uek.fsf@linaro.org> (raw)
In-Reply-To: <20210525150706.294968-8-richard.henderson@linaro.org>

Richard Henderson <richard.henderson@linaro.org> writes:

> Rename to parts$N_sqrt.
> Reimplement float128_sqrt with FloatParts128.
>
> Reimplement with the inverse sqrt newton-raphson algorithm from musl.
> This is significantly faster than even the berkeley sqrt n-r algorithm,
> because it does not use division instructions, only multiplication.
>
> Ordinarily, changing algorithms at the same time as migrating code is
> a bad idea, but this is the only way I found that didn't break one of
> the routines at the same time.

I can't pretend to follow the details of the method as well as I could
the original but that's why we have tests so if they are happy I'm
happy:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

<snip>
> +
> +    if (N == 64) {
> +        /* float64 or smaller */
> +
> +        r32 = ((uint64_t)r32 * u32) >> 31;
> +        /* |r*sqrt(m) - 1| < 0x1.7Bp-16 */
> +
> +        s32 = ((uint64_t)m32 * r32) >> 32;
> +        d32 = ((uint64_t)s32 * r32) >> 32;
> +        u32 = three32 - d32;
> +
> +        if (fmt->frac_size <= 23) {
> +            /* float32 or smaller */
> +
> +            s32 = ((uint64_t)s32 * u32) >> 32;  /* 3.29 */
> +            s32 = (s32 - 1) >> 6;               /* 9.23 */
> +            /* s < sqrt(m) < s + 0x1.08p-23 */
> +
> +            /* compute nearest rounded result to 2.23 bits */
> +            uint32_t d0 = (m32 << 16) - s32 * s32;
> +            uint32_t d1 = s32 - d0;
> +            uint32_t d2 = d1 + s32 + 1;
> +            s32 += d1 >> 31;
> +            a->frac_hi = (uint64_t)s32 << (64 - 25);
> +
> +            /* increment or decrement for inexact */
> +            if (d2 != 0) {
> +                a->frac_hi += ((int32_t)(d1 ^ d2) < 0 ? -1 : 1);
> +            }
> +            goto done;
> +        }
> +
> +        /* float64 */
> +
> +        r64 = (uint64_t)r32 * u32 * 2;
> +        /* |r*sqrt(m) - 1| < 0x1.37-p29; convert to 64-bit arithmetic */
> +        mul64To128(m64, r64, &s64, &discard);
> +        mul64To128(s64, r64, &d64, &discard);
> +        u64 = three64 - d64;
> +
> +        mul64To128(s64, u64, &s64, &discard);  /* 3.61 */
> +        s64 = (s64 - 2) >> 9;                  /* 12.52 */
> +
> +        /* Compute nearest rounded result */
> +        uint64_t d0 = (m64 << 42) - s64 * s64;
> +        uint64_t d1 = s64 - d0;
> +        uint64_t d2 = d1 + s64 + 1;
> +        s64 += d1 >> 63;
> +        a->frac_hi = s64 << (64 - 54);
> +
> +        /* increment or decrement for inexact */
> +        if (d2 != 0) {
> +            a->frac_hi += ((int64_t)(d1 ^ d2) < 0 ? -1 : 1);
> +        }
> +        goto done;
> +    }

I usually take more convincing about gotos but I can't see a way of
doing it more neatly so have a:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

as well :-)

-- 
Alex Bennée