From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A77DC43603 for ; Tue, 17 Dec 2019 12:05:36 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DD452207FF for ; Tue, 17 Dec 2019 12:05:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="MtTjCsqJ" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DD452207FF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:39534 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ihBbb-0002bx-3E for qemu-devel@archiver.kernel.org; Tue, 17 Dec 2019 07:05:35 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:36050) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ihBaY-00027J-JC for qemu-devel@nongnu.org; Tue, 17 Dec 2019 07:04:34 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ihBaW-0000M7-Gq for qemu-devel@nongnu.org; Tue, 17 Dec 2019 07:04:29 -0500 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:31031 helo=us-smtp-1.mimecast.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ihBaW-0000J5-Cg for qemu-devel@nongnu.org; Tue, 17 Dec 2019 07:04:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1576584267; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=k7SbSCKV9dl/jpvBY0yOX1eEJGNy7lThtDnwXnGs39g=; b=MtTjCsqJwjQTeWNFIAe6iVr8A9tRIgDSjswkGsgkAG33wnHZrq2ov5rqRuuWKv/hR9Gyru xIi+oNM1F+myGYEpf6WLdqAcTzqC6MIL6+6uS9ESsng7dhNF7Uz4e0rSgd/BHhzrCD+8F6 HZllxz71Kpepq6F+R95/zy2k/MZGHfU= Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-266-DStZX7ZSOTuJyOjwj64cKg-1; Tue, 17 Dec 2019 07:04:25 -0500 Received: by mail-wr1-f72.google.com with SMTP id j13so5248592wrr.20 for ; Tue, 17 Dec 2019 04:04:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=OWae+cuybe3ezTnE36cWgw6y1DmbqygsrVlTmtr37NQ=; b=X69CELy4fevrCbBD60+I+wQFpUYCuWbzd9S71TA05h/1/G8KkRs7Hsce9qES4fGKHg AVy6OALB7wJlPEvgvse+coK0kgNjfllv8wBOn9nXD48P9jF1UmEY4c9V51tvtt4qLpXK gVBXLu0BKjrRaalG3HjrA8Kv1cvNuZ1xPAldJqLdFzIK63Ay1YoHQJwImXgsHEopfJ58 0zpNkQxyMmN8WFI94FORKu09rIRvBgm6VLx+FzyU+oMqbB4eCdga0aYEX9sNKNuyncPK 6MJF1CYwDpXWXJjqi40HqglXX7wsxir2F7EMBDKWz7eTpGfaFOBoWqT3SSAzpAdoirAl Cqdg== X-Gm-Message-State: APjAAAV3r+O5Z5Ubq0vERYjnnJ343AoS3g+7PKTs9qOXUwqopEFGTZCm eKWO1E/KtD5Yk/FhyR9556CzZHovyGJQIdLyXCXNmpdNxpHOuyNGMPnTa5uUfYoniVrWbeR8Gk1 fwSqn4WbDWJ9gUtA= X-Received: by 2002:a1c:1b41:: with SMTP id b62mr4931949wmb.53.1576584264470; Tue, 17 Dec 2019 04:04:24 -0800 (PST) X-Google-Smtp-Source: APXvYqwizPkyetWkCdTFH8pjO9VGTDdR7HinPrMUp7OnnyY7HqISID4j8cJAo8CuMrCFdv+jlrHTLg== X-Received: by 2002:a1c:1b41:: with SMTP id b62mr4931915wmb.53.1576584264129; Tue, 17 Dec 2019 04:04:24 -0800 (PST) Received: from ?IPv6:2a01:e0a:466:71c0:1c42:ed63:2256:4add? ([2a01:e0a:466:71c0:1c42:ed63:2256:4add]) by smtp.gmail.com with ESMTPSA id b15sm2726657wmj.13.2019.12.17.04.04.23 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 17 Dec 2019 04:04:23 -0800 (PST) Mime-Version: 1.0 (Mac OS X Mail 13.0 \(3608.40.2.2.4\)) Subject: Re: [PATCH] util/cutils: Expand do_strtosz parsing precision to 64 bits From: Christophe de Dinechin In-Reply-To: <87a786sse9.fsf@dusky.pond.sub.org> Date: Tue, 17 Dec 2019 13:04:22 +0100 Message-Id: <64E0F779-B750-4766-9978-58A8B4737839@redhat.com> References: <20191205021459.29920-1-tao3.xu@intel.com> <87a786sse9.fsf@dusky.pond.sub.org> To: Markus Armbruster X-Mailer: Apple Mail (2.3608.40.2.2.4) X-MC-Unique: DStZX7ZSOTuJyOjwj64cKg-1 X-Mimecast-Spam-Score: 0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 205.139.110.120 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tao Xu , qemu-devel@nongnu.org, ehabkost@redhat.com, mdroth@linux.vnet.ibm.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" > On 5 Dec 2019, at 16:29, Markus Armbruster wrote: >=20 > Tao Xu writes: >=20 >> Parse input string both as a double and as a uint64_t, then use the >> method which consumes more characters. Update the related test cases. >>=20 >> Signed-off-by: Tao Xu >> --- > [...] >> diff --git a/util/cutils.c b/util/cutils.c >> index 77acadc70a..b08058c57c 100644 >> --- a/util/cutils.c >> +++ b/util/cutils.c >> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char= **end, >> const char default_suffix, int64_t unit, >> uint64_t *result) >> { >> - int retval; >> - const char *endptr; >> + int retval, retd, retu; >> + const char *suffix, *suffixd, *suffixu; >> unsigned char c; >> int mul_required =3D 0; >> - double val, mul, integral, fraction; >> + bool use_strtod; >> + uint64_t valu; >> + double vald, mul, integral, fraction; >=20 > Note for later: @mul is double. >=20 >> + >> + retd =3D qemu_strtod_finite(nptr, &suffixd, &vald); >> + retu =3D qemu_strtou64(nptr, &suffixu, 0, &valu); >> + use_strtod =3D strlen(suffixd) < strlen(suffixu); >> + >> + /* >> + * Parse @nptr both as a double and as a uint64_t, then use the met= hod >> + * which consumes more characters. >> + */ >=20 > The comment is in a funny place. I'd put it right before the > qemu_strtod_finite() line. >=20 >> + if (use_strtod) { >> + suffix =3D suffixd; >> + retval =3D retd; >> + } else { >> + suffix =3D suffixu; >> + retval =3D retu; >> + } >>=20 >> - retval =3D qemu_strtod_finite(nptr, &endptr, &val); >> if (retval) { >> goto out; >> } >=20 > This is even more subtle than it looks. But why it is even necessary? The =E2=80=9Ccontract=E2=80=9D for the function used to be that it returned= rounded values beyond 2^53, which in itself is curious. But now it=E2=80=99s a 6-dimensional matrix of hell with NaNs and barfnots,= when the name implies it=E2=80=99s simply doing a text to u64 conversion=E2=80=A6 There is certainly a reason, but I=E2=80=99m really curious what it is :-) >=20 > A close reading of the function contracts leads to three cases for each > conversion: >=20 > * parse error (including infinity and NaN) >=20 > @retu / @retd is -EINVAL > @valu / @vald is uninitialized > @suffixu / @suffixd is @nptr >=20 > * range error >=20 > @retu / @retd is -ERANGE > @valu / @vald is our best approximation of the conversion result > @suffixu / @suffixd points to the first character not consumed by the > conversion. >=20 > Sub-cases: >=20 > - uint64_t overflow >=20 > We know the conversion result exceeds UINT64_MAX. >=20 > - double overflow >=20 > we know the conversion result's magnitude exceeds the largest > representable finite double DBL_MAX. >=20 > - double underflow >=20 > we know the conversion result is close to zero (closer than DBL_MIN, > the smallest normalized positive double). >=20 > * success >=20 > @retu / @retd is 0 > @valu / @vald is the conversion result > @suffixu / @suffixd points to the first character not consumed by the > conversion. >=20 > This leads to a matrix (parse error, uint64_t overflow, success) x > (parse error, double overflow, double underflow, success). We need to > check the code does what we want for each element of this matrix, and > document any behavior that's not perfectly obvious. >=20 > (success, success): we pick uint64_t if qemu_strtou64() consumed more > characters than qemu_strtod_finite(), else double. "More" is important > here; when they consume the same characters, we *need* to use the > uint64_t result. Example: for "18446744073709551615", we need to use > uint64_t 18446744073709551615, not double 18446744073709551616.0. But > for "18446744073709551616.", we need to use the double. Good. >=20 > (success, parse error) and (parse error, success): we pick the one that > succeeds, because success consumes characters, and failure to parse does > not. Good. >=20 > (parse error, parse error): neither consumes characters, so we pick > uint64_t. Good. >=20 > (parse error, double overflow), (parse error, double underflow) and > (uint64_t overflow, parse error): we pick the range error, because it > consumes characters. Good. >=20 > These are the simple combinations. The remainder are hairier: (success, > double overflow), (success, double underflow), (uint64_t overflow, > success). I lack the time to analyze them today. Must be done before > we take this patch. Any takers? >=20 >> - fraction =3D modf(val, &integral); >> - if (fraction !=3D 0) { >> - mul_required =3D 1; >> + if (use_strtod) { >> + fraction =3D modf(vald, &integral); >> + if (fraction !=3D 0) { >> + mul_required =3D 1; >> + } >> } >=20 > Here, @suffix points to the suffix character, if any. >=20 >> - c =3D *endptr; >> + c =3D *suffix; >> mul =3D suffix_mul(c, unit); >> if (mul >=3D 0) { >> - endptr++; >> + suffix++; >=20 > Now @suffix points to the first character not consumed, *not* the > suffix. >=20 > Your patch effectively renames @endptr to @suffix. I think @endptr is > the better name. Keeping the name also makes the diff smaller and > slightly easier to review. >=20 >> } else { >> mul =3D suffix_mul(default_suffix, unit); >=20 > suffix_mul() returns int64_t. The assignment converts it to double. > Fine before the patch, because @mul is the multiplier for a double > value. No longer true after the patch, see below. >=20 >> assert(mul >=3D 0); >> @@ -238,23 +257,36 @@ static int do_strtosz(const char *nptr, const char= **end, >> retval =3D -EINVAL; >> goto out; >> } >> - /* >> - * Values near UINT64_MAX overflow to 2**64 when converting to doub= le >> - * precision. Compare against the maximum representable double pre= cision >> - * value below 2**64, computed as "the next value after 2**64 (0x1p= 64) in >> - * the direction of 0". >> - */ >> - if ((val * mul > nextafter(0x1p64, 0)) || val < 0) { >> - retval =3D -ERANGE; >> - goto out; >> + >> + if (use_strtod) { >> + /* >> + * Values near UINT64_MAX overflow to 2**64 when converting to = double >> + * precision. Compare against the maximum representable double = precision >> + * value below 2**64, computed as "the next value after 2**64 (= 0x1p64) >> + * in the direction of 0". >> + */ >> + if ((vald * mul > nextafter(0x1p64, 0)) || vald < 0) { >> + retval =3D -ERANGE; >> + goto out; >> + } >> + *result =3D vald * mul; >=20 > Here, @mul is a multiplier for double vald. >=20 >> + } else { >> + /* Reject negative input and overflow output */ >> + while (qemu_isspace(*nptr)) { >> + nptr++; >> + } >> + if (*nptr =3D=3D '-' || UINT64_MAX / (uint64_t) mul < valu) { >> + retval =3D -ERANGE; >> + goto out; >> + } >> + *result =3D valu * (uint64_t) mul; >=20 > Here, @mul is a multiplier for uint64_t valu. >=20 > Please change @mul to int64_t to reduce conversions. >=20 >> } >> - *result =3D val * mul; >> retval =3D 0; >>=20 >> out: >> if (end) { >> - *end =3D endptr; >> - } else if (*endptr) { >> + *end =3D suffix; >> + } else if (*suffix) { >> retval =3D -EINVAL; >> } >=20 >=20