Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits

From: Markus Armbruster <armbru@redhat.com>
To: Tao Xu <tao3.xu@intel.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"mdroth@linux.vnet.ibm.com" <mdroth@linux.vnet.ibm.com>
Subject: Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits
Date: Tue, 17 Dec 2019 11:25:39 +0100	[thread overview]
Message-ID: <87y2vbgsf0.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <b7c442e3-cc7e-155e-5370-db9a371928a6@intel.com> (Tao Xu's message of "Mon, 9 Dec 2019 13:38:39 +0800")

Tao Xu <tao3.xu@intel.com> writes:

> On 12/5/19 11:29 PM, Markus Armbruster wrote:
>> Tao Xu <tao3.xu@intel.com> writes:
>>
>>> Parse input string both as a double and as a uint64_t, then use the
>>> method which consumes more characters. Update the related test cases.
>>>
>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>> ---
>> [...]
>>> diff --git a/util/cutils.c b/util/cutils.c
>>> index 77acadc70a..b08058c57c 100644
>>> --- a/util/cutils.c
>>> +++ b/util/cutils.c
>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end,
>>>                         const char default_suffix, int64_t unit,
>>>                         uint64_t *result)
>>>   {
>>> -    int retval;
>>> -    const char *endptr;
>>> +    int retval, retd, retu;
>>> +    const char *suffix, *suffixd, *suffixu;
>>>       unsigned char c;
>>>       int mul_required = 0;
>>> -    double val, mul, integral, fraction;
>>> +    bool use_strtod;
>>> +    uint64_t valu;
>>> +    double vald, mul, integral, fraction;
>>
>> Note for later: @mul is double.
>>
>>> +
>>> +    retd = qemu_strtod_finite(nptr, &suffixd, &vald);
>>> +    retu = qemu_strtou64(nptr, &suffixu, 0, &valu);

Note for later: passing 0 to base accepts octal and hexadecimal
integers.

>>> +    use_strtod = strlen(suffixd) < strlen(suffixu);
>>> +
>>> +    /*
>>> +     * Parse @nptr both as a double and as a uint64_t, then use the method
>>> +     * which consumes more characters.
>>> +     */
>>
>> The comment is in a funny place.  I'd put it right before the
>> qemu_strtod_finite() line.
>>
>>> +    if (use_strtod) {
>>> +        suffix = suffixd;
>>> +        retval = retd;
>>> +    } else {
>>> +        suffix = suffixu;
>>> +        retval = retu;
>>> +    }
>>>   -    retval = qemu_strtod_finite(nptr, &endptr, &val);
>>>       if (retval) {
>>>           goto out;
>>>       }
>>
>> This is even more subtle than it looks.
>>
>> A close reading of the function contracts leads to three cases for each
>> conversion:
>>
>> * parse error (including infinity and NaN)
>>
>>    @retu / @retd is -EINVAL
>>    @valu / @vald is uninitialized
>>    @suffixu / @suffixd is @nptr
>>
>> * range error
>>
>>    @retu / @retd is -ERANGE
>>    @valu / @vald is our best approximation of the conversion result
>>    @suffixu / @suffixd points to the first character not consumed by the
>>    conversion.
>>
>>    Sub-cases:
>>
>>    - uint64_t overflow
>>
>>      We know the conversion result exceeds UINT64_MAX.
>>
>>    - double overflow
>>
>>      we know the conversion result's magnitude exceeds the largest
>>      representable finite double DBL_MAX.
>>
>>    - double underflow
>>
>>      we know the conversion result is close to zero (closer than DBL_MIN,
>>      the smallest normalized positive double).
>>
>> * success
>>
>>    @retu / @retd is 0
>>    @valu / @vald is the conversion result
>>    @suffixu / @suffixd points to the first character not consumed by the
>>    conversion.
>>
>> This leads to a matrix (parse error, uint64_t overflow, success) x
>> (parse error, double overflow, double underflow, success).  We need to
>> check the code does what we want for each element of this matrix, and
>> document any behavior that's not perfectly obvious.
>>
>> (success, success): we pick uint64_t if qemu_strtou64() consumed more
>> characters than qemu_strtod_finite(), else double.  "More" is important
>> here; when they consume the same characters, we *need* to use the
>> uint64_t result.  Example: for "18446744073709551615", we need to use
>> uint64_t 18446744073709551615, not double 18446744073709551616.0.  But
>> for "18446744073709551616.", we need to use the double.  Good.

Also fun: for "0123", we use uint64_t 83, not double 123.0.  But for
"0123.", we use 123.0, not 83.

Do we really want to accept octal and hexadecimal integers?

>> (success, parse error) and (parse error, success): we pick the one that
>> succeeds, because success consumes characters, and failure to parse does
>> not.  Good.
>>
>> (parse error, parse error): neither consumes characters, so we pick
>> uint64_t.  Good.
>>
>> (parse error, double overflow), (parse error, double underflow) and
>> (uint64_t overflow, parse error): we pick the range error, because it
>> consumes characters.  Good.
>>
>> These are the simple combinations.  The remainder are hairier: (success,
>> double overflow), (success, double underflow), (uint64_t overflow,
>> success).  I lack the time to analyze them today.  Must be done before
>> we take this patch.  Any takers?
>
> (success, double overflow), (success, double underflow), pick double
> overflow error, return -ERANGE. Because it consumes
> characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1",
> and prases as uint64_t; but qemu_strtod_finite return -ERANGE and
> consumes all characters. It is OK.

The only way to have double overflow when uint64_t succeeds is an
exponent.  Double consumes the characters making up the exponent,
uint64_t does not.  We use double.

The only way to have double underflow is with an exponent or a decimal
point.  Double consumes their characters, uint64_t does not.  We use
double.

Okay.

> (uint64_t overflow, success), consume the same characters, use the
> uint64_t return -ERANGE. Note that even if qemu_strtod_finite can
> parse these cases such as "18446744073709551617", but the result is
> uint64_t so we also need to return -ERANGE. It is OK.

That's just one of two cases, I think.  The other one is when the
overflowing integer is followed by an exponent or decimal point.  We use
double then.  Converting the double to uint64_t overflows, except when a
negative exponent brings the number into range.

Examples: "18446744073709551617" picks uint64_t overflow,
"18446744073709551617.0" picks double success (but converting it to
uint64_t below overflows), and "18446744073709551617e-10" picks double
success (converted to 1844674407 below).

Okay.

> Thank you for your analysis and suggestion. I will add more test cases
> to cover some of these analysis.

Good move.