From: Markus Armbruster <armbru@redhat.com>
To: Tao Xu <tao3.xu@intel.com>
Cc: qemu-devel@nongnu.org, mdroth@linux.vnet.ibm.com, ehabkost@redhat.com
Subject: Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits
Date: Thu, 05 Dec 2019 16:29:02 +0100
Message-ID: <87a786sse9.fsf@dusky.pond.sub.org>
In-Reply-To: <20191205021459.29920-1-tao3.xu@intel.com> (Tao Xu's message of "Thu, 5 Dec 2019 10:14:59 +0800")

Tao Xu <tao3.xu@intel.com> writes:

> Parse input string both as a double and as a uint64_t, then use the
> method which consumes more characters. Update the related test cases.
>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
[...]
> diff --git a/util/cutils.c b/util/cutils.c
> index 77acadc70a..b08058c57c 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end,
>                        const char default_suffix, int64_t unit,
>                        uint64_t *result)
>  {
> -    int retval;
> -    const char *endptr;
> +    int retval, retd, retu;
> +    const char *suffix, *suffixd, *suffixu;
>      unsigned char c;
>      int mul_required = 0;
> -    double val, mul, integral, fraction;
> +    bool use_strtod;
> +    uint64_t valu;
> +    double vald, mul, integral, fraction;

Note for later: @mul is double.

> +
> +    retd = qemu_strtod_finite(nptr, &suffixd, &vald);
> +    retu = qemu_strtou64(nptr, &suffixu, 0, &valu);
> +    use_strtod = strlen(suffixd) < strlen(suffixu);
> +
> +    /*
> +     * Parse @nptr both as a double and as a uint64_t, then use the method
> +     * which consumes more characters.
> +     */

The comment is in a funny place.  I'd put it right before the
qemu_strtod_finite() line.

> +    if (use_strtod) {
> +        suffix = suffixd;
> +        retval = retd;
> +    } else {
> +        suffix = suffixu;
> +        retval = retu;
> +    }
>  
> -    retval = qemu_strtod_finite(nptr, &endptr, &val);
>      if (retval) {
>          goto out;
>      }

This is even more subtle than it looks.

A close reading of the function contracts leads to three cases for each
conversion:

* parse error (including infinity and NaN)

  @retu / @retd is -EINVAL
  @valu / @vald is uninitialized
  @suffixu / @suffixd is @nptr

* range error

  @retu / @retd is -ERANGE
  @valu / @vald is our best approximation of the conversion result
  @suffixu / @suffixd points to the first character not consumed by the
  conversion.

  Sub-cases:

  - uint64_t overflow

    We know the conversion result exceeds UINT64_MAX.

  - double overflow

    We know the conversion result's magnitude exceeds the largest
    representable finite double DBL_MAX.

  - double underflow

    We know the conversion result is close to zero (closer than DBL_MIN,
    the smallest normalized positive double).

* success

  @retu / @retd is 0
  @valu / @vald is the conversion result
  @suffixu / @suffixd points to the first character not consumed by the
  conversion.

This leads to a matrix (parse error, uint64_t overflow, success) x
(parse error, double overflow, double underflow, success).  We need to
check the code does what we want for each element of this matrix, and
document any behavior that's not perfectly obvious.
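
A minimal sketch of the three outcomes per conversion, for readers who
want to poke at the matrix.  It uses plain strtoull() and strtod()
rather than the QEMU wrappers, so it is only an approximation of their
contracts: the enum and both helper names are hypothetical, the value
is written even on parse error, and strtod() accepts infinity and NaN,
which qemu_strtod_finite() rejects.  Whether underflow reports ERANGE
is implementation-defined in standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical labels for the three cases discussed above. */
enum conv_status { CONV_PARSE_ERROR, CONV_RANGE_ERROR, CONV_SUCCESS };

/*
 * Convert @s with strtoull() and classify the outcome.
 * *end is set to the first character not consumed.
 */
static enum conv_status classify_u64(const char *s, const char **end,
                                     unsigned long long *val)
{
    char *ep;

    assert(s && end && val);
    errno = 0;
    *val = strtoull(s, &ep, 0);
    *end = ep;
    if (ep == s) {
        return CONV_PARSE_ERROR;        /* nothing consumed */
    }
    return errno == ERANGE ? CONV_RANGE_ERROR : CONV_SUCCESS;
}

/* Same classification for strtod(). */
static enum conv_status classify_double(const char *s, const char **end,
                                        double *val)
{
    char *ep;

    assert(s && end && val);
    errno = 0;
    *val = strtod(s, &ep);
    *end = ep;
    if (ep == s) {
        return CONV_PARSE_ERROR;        /* nothing consumed */
    }
    return errno == ERANGE ? CONV_RANGE_ERROR : CONV_SUCCESS;
}
```

With these helpers, "18446744073709551616" is a range error for
classify_u64() that still consumes all 20 digits, and "1e999" is a
range error for classify_double().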

(success, success): we pick double if qemu_strtod_finite() consumed more
characters than qemu_strtou64(), else uint64_t.  "More" is important
here; when both consume the same number of characters, we *need* to use
the uint64_t result.  Example: for "18446744073709551615", we need to use
uint64_t 18446744073709551615, not double 18446744073709551616.0.  But
for "18446744073709551616.", we need to use the double.  Good.
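
The selection rule and the precision argument can be checked with a
small sketch.  It assumes plain strtoull() and strtod() in place of
the QEMU wrappers and a 64-bit unsigned long long; the helper names
are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Characters consumed by strtoull() on @s; the value goes to *val. */
static size_t consumed_u64(const char *s, unsigned long long *val)
{
    char *end;

    *val = strtoull(s, &end, 0);
    return (size_t)(end - s);
}

/* Characters consumed by strtod() on @s; the value goes to *val. */
static size_t consumed_double(const char *s, double *val)
{
    char *end;

    *val = strtod(s, &end);
    return (size_t)(end - s);
}

/*
 * Selection rule as in the patch: use the double only if strtod()
 * consumed strictly more characters; a tie goes to the integer.
 */
static bool prefer_double(const char *s)
{
    unsigned long long u;
    double d;

    assert(s);
    return consumed_double(s, &d) > consumed_u64(s, &u);
}
```

For "18446744073709551615" both conversions consume 20 characters, so
the tie goes to the integer, which holds the exact value; the double
would have rounded up to 2**64.  For "18446744073709551616." strtod()
also consumes the trailing dot, so the double wins.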

(success, parse error) and (parse error, success): we pick the one that
succeeds, because success consumes characters, and failure to parse does
not.  Good.

(parse error, parse error): neither consumes characters, so we pick
uint64_t.  Good.

(parse error, double overflow), (parse error, double underflow) and
(uint64_t overflow, parse error): we pick the range error, because it
consumes characters.  Good.

These are the simple combinations.  The remainder are hairier: (success,
double overflow), (success, double underflow), (uint64_t overflow,
success).  I lack the time to analyze them today.  Must be done before
we take this patch.  Any takers?

> -    fraction = modf(val, &integral);
> -    if (fraction != 0) {
> -        mul_required = 1;
> +    if (use_strtod) {
> +        fraction = modf(vald, &integral);
> +        if (fraction != 0) {
> +            mul_required = 1;
> +        }
>      }

Here, @suffix points to the suffix character, if any.

> -    c = *endptr;
> +    c = *suffix;
>      mul = suffix_mul(c, unit);
>      if (mul >= 0) {
> -        endptr++;
> +        suffix++;

Now @suffix points to the first character not consumed, *not* the
suffix.

Your patch effectively renames @endptr to @suffix.  I think @endptr is
the better name.  Keeping the name also makes the diff smaller and
slightly easier to review.

>      } else {
>          mul = suffix_mul(default_suffix, unit);

suffix_mul() returns int64_t.  The assignment converts it to double.
Fine before the patch, because @mul was only ever the multiplier for a
double value.  After the patch it also multiplies the uint64_t value,
see below.

>          assert(mul >= 0);
> @@ -238,23 +257,36 @@ static int do_strtosz(const char *nptr, const char **end,
>          retval = -EINVAL;
>          goto out;
>      }
> -    /*
> -     * Values near UINT64_MAX overflow to 2**64 when converting to double
> -     * precision.  Compare against the maximum representable double precision
> -     * value below 2**64, computed as "the next value after 2**64 (0x1p64) in
> -     * the direction of 0".
> -     */
> -    if ((val * mul > nextafter(0x1p64, 0)) || val < 0) {
> -        retval = -ERANGE;
> -        goto out;
> +
> +    if (use_strtod) {
> +        /*
> +         * Values near UINT64_MAX overflow to 2**64 when converting to double
> +         * precision. Compare against the maximum representable double precision
> +         * value below 2**64, computed as "the next value after 2**64 (0x1p64)
> +         * in the direction of 0".
> +         */
> +        if ((vald * mul > nextafter(0x1p64, 0)) || vald < 0) {
> +            retval = -ERANGE;
> +            goto out;
> +        }
> +        *result = vald * mul;

Here, @mul is a multiplier for double vald.
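
For reference, a sketch of why the bound in the quoted comment is the
right one.  The helper name is hypothetical.  To stay free of libm the
bound is written as a hexadecimal floating constant; it is the largest
double below 2**64, the same value as nextafter(0x1p64, 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest double below 2**64, i.e. 2**64 - 2048. */
#define DOUBLE_BELOW_2_64 0x1.fffffffffffffp63

/*
 * True if @v can be converted to uint64_t without overflowing or
 * going negative.  NaN fails both comparisons and is rejected.
 */
static bool double_fits_u64(double v)
{
    return v >= 0 && v <= DOUBLE_BELOW_2_64;
}

/* Convert @v, which the caller must have range-checked. */
static uint64_t double_to_u64(double v)
{
    assert(double_fits_u64(v));
    return (uint64_t)v;
}
```

Comparing against (double)UINT64_MAX would not work: that conversion
rounds up to 2**64, so the value 2**64 itself would pass the check and
then overflow in the conversion back to uint64_t.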

> +    } else {
> +        /* Reject negative input and overflow output */
> +        while (qemu_isspace(*nptr)) {
> +            nptr++;
> +        }
> +        if (*nptr == '-' || UINT64_MAX / (uint64_t) mul < valu) {
> +            retval = -ERANGE;
> +            goto out;
> +        }
> +        *result = valu * (uint64_t) mul;

Here, @mul is a multiplier for uint64_t valu.

Please change @mul to int64_t to reduce conversions.
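
A sketch of what the integer branch could look like with an int64_t
multiplier.  The helper name is hypothetical, and the sketch assumes
the multiplier is always positive at this point, as the assert(mul >= 0)
earlier in the function suggests for valid suffixes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Multiply @val by @mul and store the product in *result.
 * Returns 0 on success, -ERANGE if the product exceeds UINT64_MAX.
 */
static int mul_u64_checked(uint64_t val, int64_t mul, uint64_t *result)
{
    assert(mul > 0);    /* assumption: suffix multipliers are positive */
    assert(result);

    /* val * mul overflows exactly when val > UINT64_MAX / mul. */
    if (val > UINT64_MAX / (uint64_t)mul) {
        return -ERANGE;
    }
    *result = val * (uint64_t)mul;
    return 0;
}
```

This keeps the whole integer path in integer arithmetic; the only
conversion left is the int64_t to uint64_t cast, which is exact for
positive values.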

>      }
> -    *result = val * mul;
>      retval = 0;
>  
>  out:
>      if (end) {
> -        *end = endptr;
> -    } else if (*endptr) {
> +        *end = suffix;
> +    } else if (*suffix) {
>          retval = -EINVAL;
>      }



