From: Tao Xu <tao3.xu@intel.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: "mdroth@linux.vnet.ibm.com" <mdroth@linux.vnet.ibm.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"ehabkost@redhat.com" <ehabkost@redhat.com>
Subject: Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits
Date: Wed, 18 Dec 2019 09:33:59 +0800 [thread overview]
Message-ID: <e731445a-4461-3212-c08d-05dc7ad2b742@intel.com> (raw)
In-Reply-To: <87y2vbgsf0.fsf@dusky.pond.sub.org>
On 12/17/2019 6:25 PM, Markus Armbruster wrote:
> Tao Xu <tao3.xu@intel.com> writes:
>
>> On 12/5/19 11:29 PM, Markus Armbruster wrote:
>>> Tao Xu <tao3.xu@intel.com> writes:
>>>
>>>> Parse input string both as a double and as a uint64_t, then use the
>>>> method which consumes more characters. Update the related test cases.
>>>>
>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>>> ---
>>> [...]
>>>> diff --git a/util/cutils.c b/util/cutils.c
>>>> index 77acadc70a..b08058c57c 100644
>>>> --- a/util/cutils.c
>>>> +++ b/util/cutils.c
>>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end,
>>>> const char default_suffix, int64_t unit,
>>>> uint64_t *result)
>>>> {
>>>> - int retval;
>>>> - const char *endptr;
>>>> + int retval, retd, retu;
>>>> + const char *suffix, *suffixd, *suffixu;
>>>> unsigned char c;
>>>> int mul_required = 0;
>>>> - double val, mul, integral, fraction;
>>>> + bool use_strtod;
>>>> + uint64_t valu;
>>>> + double vald, mul, integral, fraction;
>>>
>>> Note for later: @mul is double.
>>>
>>>> +
>>>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald);
>>>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu);
>
> Note for later: passing 0 to base accepts octal and hexadecimal
> integers.
>
>>>> + use_strtod = strlen(suffixd) < strlen(suffixu);
>>>> +
>>>> + /*
>>>> + * Parse @nptr both as a double and as a uint64_t, then use the method
>>>> + * which consumes more characters.
>>>> + */
>>>
>>> The comment is in a funny place. I'd put it right before the
>>> qemu_strtod_finite() line.
>>>
>>>> + if (use_strtod) {
>>>> + suffix = suffixd;
>>>> + retval = retd;
>>>> + } else {
>>>> + suffix = suffixu;
>>>> + retval = retu;
>>>> + }
>>>> - retval = qemu_strtod_finite(nptr, &endptr, &val);
>>>> if (retval) {
>>>> goto out;
>>>> }
>>>
>>> This is even more subtle than it looks.
>>>
>>> A close reading of the function contracts leads to three cases for each
>>> conversion:
>>>
>>> * parse error (including infinity and NaN)
>>>
>>> @retu / @retd is -EINVAL
>>> @valu / @vald is uninitialized
>>> @suffixu / @suffixd is @nptr
>>>
>>> * range error
>>>
>>> @retu / @retd is -ERANGE
>>> @valu / @vald is our best approximation of the conversion result
>>> @suffixu / @suffixd points to the first character not consumed by the
>>> conversion.
>>>
>>> Sub-cases:
>>>
>>> - uint64_t overflow
>>>
>>> We know the conversion result exceeds UINT64_MAX.
>>>
>>> - double overflow
>>>
>>> we know the conversion result's magnitude exceeds the largest
>>> representable finite double DBL_MAX.
>>>
>>> - double underflow
>>>
>>> we know the conversion result is close to zero (closer than DBL_MIN,
>>> the smallest normalized positive double).
>>>
>>> * success
>>>
>>> @retu / @retd is 0
>>> @valu / @vald is the conversion result
>>> @suffixu / @suffixd points to the first character not consumed by the
>>> conversion.
>>>
>>> This leads to a matrix (parse error, uint64_t overflow, success) x
>>> (parse error, double overflow, double underflow, success). We need to
>>> check the code does what we want for each element of this matrix, and
>>> document any behavior that's not perfectly obvious.
>>>
>>> (success, success): we pick uint64_t if qemu_strtou64() consumed more
>>> characters than qemu_strtod_finite(), else double. "More" is important
>>> here; when they consume the same characters, we *need* to use the
>>> uint64_t result. Example: for "18446744073709551615", we need to use
>>> uint64_t 18446744073709551615, not double 18446744073709551616.0. But
>>> for "18446744073709551616.", we need to use the double. Good.
>
> Also fun: for "0123", we use uint64_t 83, not double 123.0. But for
> "0123.", we use 123.0, not 83.
>
> Do we really want to accept octal and hexadecimal integers?
>
Thank you for reminding me. Octal and hexadecimal may bring more
confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and add
test for input like "0123".
>>> (success, parse error) and (parse error, success): we pick the one that
>>> succeeds, because success consumes characters, and failure to parse does
>>> not. Good.
>>>
>>> (parse error, parse error): neither consumes characters, so we pick
>>> uint64_t. Good.
>>>
>>> (parse error, double overflow), (parse error, double underflow) and
>>> (uint64_t overflow, parse error): we pick the range error, because it
>>> consumes characters. Good.
>>>
>>> These are the simple combinations. The remainder are hairier: (success,
>>> double overflow), (success, double underflow), (uint64_t overflow,
>>> success). I lack the time to analyze them today. Must be done before
>>> we take this patch. Any takers?
>>
>> (success, double overflow), (success, double underflow), pick double
>> overflow error, return -ERANGE. Because it consumes
>> characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1",
>> and prases as uint64_t; but qemu_strtod_finite return -ERANGE and
>> consumes all characters. It is OK.
>
> The only way to have double overflow when uint64_t succeeds is an
> exponent. Double consumes the characters making up the exponent,
> uint64_t does not. We use double.
>
> The only way to have double underflow is with an exponent or a decimal
> point. Double consumes their characters, uint64_t does not. We use
> double.
>
> Okay.
>
>> (uint64_t overflow, success), consume the same characters, use the
>> uint64_t return -ERANGE. Note that even if qemu_strtod_finite can
>> parse these cases such as "18446744073709551617", but the result is
>> uint64_t so we also need to return -ERANGE. It is OK.
>
> That's just one of two cases, I think. The other one is when the
> overflowing integer is followed by an exponent or decimal point. We use
> double then. Converting the double to uint64_t overflows, except when a
> negative exponent brings the number into range.
>
> Examples: "18446744073709551617" picks uint64_t overflow,
> "18446744073709551617.0" picks double success (but converting it to
> uint64_t below overflows), and "18446744073709551617e-10" picks double
> success (converted to 1844674407 below).
>
> Okay.
>
>> Thank you for your analysis and suggestion. I will add more test cases
>> to cover some of these analysis.
>
> Good move.
>
>
Thank you for your further analysis.
next prev parent reply other threads:[~2019-12-18 1:35 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-05 2:14 [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits Tao Xu
2019-12-05 15:29 ` Markus Armbruster
2019-12-09 5:38 ` Tao Xu
2019-12-17 10:25 ` Markus Armbruster
2019-12-18 1:33 ` Tao Xu [this message]
2019-12-18 5:26 ` Tao Xu
2019-12-18 18:26 ` Markus Armbruster
2019-12-19 7:43 ` Tao Xu
2019-12-19 10:15 ` Markus Armbruster
2019-12-18 21:49 ` Eric Blake
2019-12-17 12:04 ` Christophe de Dinechin
2019-12-17 14:08 ` Markus Armbruster
2019-12-17 14:12 ` Christophe de Dinechin
2019-12-17 15:01 ` Markus Armbruster
2019-12-18 2:29 ` Tao Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e731445a-4461-3212-c08d-05dc7ad2b742@intel.com \
--to=tao3.xu@intel.com \
--cc=armbru@redhat.com \
--cc=ehabkost@redhat.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).