From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD356C43603 for ; Wed, 18 Dec 2019 01:35:00 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 92E5B21582 for ; Wed, 18 Dec 2019 01:35:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 92E5B21582 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:48136 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ihOEt-0007Iu-Na for qemu-devel@archiver.kernel.org; Tue, 17 Dec 2019 20:34:59 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:36198) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ihOE4-0006Pf-AC for qemu-devel@nongnu.org; Tue, 17 Dec 2019 20:34:09 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ihOE2-0000hE-0o for qemu-devel@nongnu.org; Tue, 17 Dec 2019 20:34:07 -0500 Received: from mga01.intel.com ([192.55.52.88]:37438) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ihOE1-0000an-LO for qemu-devel@nongnu.org; Tue, 17 Dec 2019 20:34:05 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Dec 2019 17:34:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,327,1571727600"; d="scan'208";a="240627920" Received: from txu2-mobl.ccr.corp.intel.com (HELO [10.239.196.238]) ([10.239.196.238]) by fmsmga004.fm.intel.com with ESMTP; 17 Dec 2019 17:34:00 -0800 Subject: Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits To: Markus Armbruster References: <20191205021459.29920-1-tao3.xu@intel.com> <87a786sse9.fsf@dusky.pond.sub.org> <87y2vbgsf0.fsf@dusky.pond.sub.org> From: Tao Xu Message-ID: Date: Wed, 18 Dec 2019 09:33:59 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.1 MIME-Version: 1.0 In-Reply-To: <87y2vbgsf0.fsf@dusky.pond.sub.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 192.55.52.88 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "mdroth@linux.vnet.ibm.com" , "qemu-devel@nongnu.org" , "ehabkost@redhat.com" Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On 12/17/2019 6:25 PM, Markus Armbruster wrote: > Tao Xu writes: > >> On 12/5/19 11:29 PM, Markus Armbruster wrote: >>> Tao Xu writes: >>> >>>> Parse input string both as a double and as a uint64_t, then use the >>>> method which consumes more characters. Update the related test cases. >>>> >>>> Signed-off-by: Tao Xu >>>> --- >>> [...] >>>> diff --git a/util/cutils.c b/util/cutils.c >>>> index 77acadc70a..b08058c57c 100644 >>>> --- a/util/cutils.c >>>> +++ b/util/cutils.c >>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >>>> const char default_suffix, int64_t unit, >>>> uint64_t *result) >>>> { >>>> - int retval; >>>> - const char *endptr; >>>> + int retval, retd, retu; >>>> + const char *suffix, *suffixd, *suffixu; >>>> unsigned char c; >>>> int mul_required = 0; >>>> - double val, mul, integral, fraction; >>>> + bool use_strtod; >>>> + uint64_t valu; >>>> + double vald, mul, integral, fraction; >>> >>> Note for later: @mul is double. >>> >>>> + >>>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); > > Note for later: passing 0 to base accepts octal and hexadecimal > integers. > >>>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>>> + >>>> + /* >>>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>>> + * which consumes more characters. >>>> + */ >>> >>> The comment is in a funny place. I'd put it right before the >>> qemu_strtod_finite() line. >>> >>>> + if (use_strtod) { >>>> + suffix = suffixd; >>>> + retval = retd; >>>> + } else { >>>> + suffix = suffixu; >>>> + retval = retu; >>>> + } >>>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>>> if (retval) { >>>> goto out; >>>> } >>> >>> This is even more subtle than it looks. >>> >>> A close reading of the function contracts leads to three cases for each >>> conversion: >>> >>> * parse error (including infinity and NaN) >>> >>> @retu / @retd is -EINVAL >>> @valu / @vald is uninitialized >>> @suffixu / @suffixd is @nptr >>> >>> * range error >>> >>> @retu / @retd is -ERANGE >>> @valu / @vald is our best approximation of the conversion result >>> @suffixu / @suffixd points to the first character not consumed by the >>> conversion. >>> >>> Sub-cases: >>> >>> - uint64_t overflow >>> >>> We know the conversion result exceeds UINT64_MAX. >>> >>> - double overflow >>> >>> we know the conversion result's magnitude exceeds the largest >>> representable finite double DBL_MAX. >>> >>> - double underflow >>> >>> we know the conversion result is close to zero (closer than DBL_MIN, >>> the smallest normalized positive double). >>> >>> * success >>> >>> @retu / @retd is 0 >>> @valu / @vald is the conversion result >>> @suffixu / @suffixd points to the first character not consumed by the >>> conversion. >>> >>> This leads to a matrix (parse error, uint64_t overflow, success) x >>> (parse error, double overflow, double underflow, success). We need to >>> check the code does what we want for each element of this matrix, and >>> document any behavior that's not perfectly obvious. >>> >>> (success, success): we pick uint64_t if qemu_strtou64() consumed more >>> characters than qemu_strtod_finite(), else double. "More" is important >>> here; when they consume the same characters, we *need* to use the >>> uint64_t result. Example: for "18446744073709551615", we need to use >>> uint64_t 18446744073709551615, not double 18446744073709551616.0. But >>> for "18446744073709551616.", we need to use the double. Good. > > Also fun: for "0123", we use uint64_t 83, not double 123.0. But for > "0123.", we use 123.0, not 83. > > Do we really want to accept octal and hexadecimal integers? > Thank you for reminding me. Octal and hexadecimal may bring more confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and add test for input like "0123". >>> (success, parse error) and (parse error, success): we pick the one that >>> succeeds, because success consumes characters, and failure to parse does >>> not. Good. >>> >>> (parse error, parse error): neither consumes characters, so we pick >>> uint64_t. Good. >>> >>> (parse error, double overflow), (parse error, double underflow) and >>> (uint64_t overflow, parse error): we pick the range error, because it >>> consumes characters. Good. >>> >>> These are the simple combinations. The remainder are hairier: (success, >>> double overflow), (success, double underflow), (uint64_t overflow, >>> success). I lack the time to analyze them today. Must be done before >>> we take this patch. Any takers? >> >> (success, double overflow), (success, double underflow), pick double >> overflow error, return -ERANGE. Because it consumes >> characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1", >> and prases as uint64_t; but qemu_strtod_finite return -ERANGE and >> consumes all characters. It is OK. > > The only way to have double overflow when uint64_t succeeds is an > exponent. Double consumes the characters making up the exponent, > uint64_t does not. We use double. > > The only way to have double underflow is with an exponent or a decimal > point. Double consumes their characters, uint64_t does not. We use > double. > > Okay. > >> (uint64_t overflow, success), consume the same characters, use the >> uint64_t return -ERANGE. Note that even if qemu_strtod_finite can >> parse these cases such as "18446744073709551617", but the result is >> uint64_t so we also need to return -ERANGE. It is OK. > > That's just one of two cases, I think. The other one is when the > overflowing integer is followed by an exponent or decimal point. We use > double then. Converting the double to uint64_t overflows, except when a > negative exponent brings the number into range. > > Examples: "18446744073709551617" picks uint64_t overflow, > "18446744073709551617.0" picks double success (but converting it to > uint64_t below overflows), and "18446744073709551617e-10" picks double > success (converted to 1844674407 below). > > Okay. > >> Thank you for your analysis and suggestion. I will add more test cases >> to cover some of these analysis. > > Good move. > > Thank you for your further analysis.