From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42841)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1bDZNQ-000702-1G
	for qemu-devel@nongnu.org; Thu, 16 Jun 2016 11:38:41 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1bDZNJ-0007Az-Ru
	for qemu-devel@nongnu.org; Thu, 16 Jun 2016 11:38:40 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58324)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1bDZNJ-0007AM-KP
	for qemu-devel@nongnu.org; Thu, 16 Jun 2016 11:38:33 -0400
Received: from int-mx10.intmail.prod.int.phx2.redhat.com
	(int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 1904563E01
	for <qemu-devel@nongnu.org>; Thu, 16 Jun 2016 15:38:33 +0000 (UTC)
From: Markus Armbruster <armbru@redhat.com>
References: <1465526889-8339-1-git-send-email-eblake@redhat.com>
	<1465526889-8339-4-git-send-email-eblake@redhat.com>
Date: Thu, 16 Jun 2016 17:38:30 +0200
In-Reply-To: <1465526889-8339-4-git-send-email-eblake@redhat.com> (Eric
	Blake's message of "Thu, 9 Jun 2016 20:48:08 -0600")
Message-ID: <87porhw3d5.fsf@dusky.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers,
 as an extension
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, Luiz Capitulino <lcapitulino@redhat.com>

Eric Blake <eblake@redhat.com> writes:

> There have been times in the past where we have been careless and
> allowed non-finite values to escape as a 'number' type in QMP
> output (such as before commit 27ff42e).  This is in violation of
> the JSON specification in RFC 7159, and usually caused by
> floating-point division by 0 resulting in NaN, although infinity
> is also a possible culprit.  While a future patch will tighten the
> output generator to avoid a lexical error from a strict JSON
> parser, we might as well first fix our parser to accept non-finite
> values as an extension, so that we can always read what we have
> output in the past ("be liberal in what you accept, strict in what
>  you produce").
>
> Rather than complicate the lexer to add LOTS of states for each
> letter within 'nan' and 'inf[inity]', I chose to just abuse the
> 'keyword' token, but had to make it accept upper case.  Also, since
> I want to parse '-inf', I had to tweak IN_NEG_NONZERO_NUMBER; and
> renamed that in the process (we have always accepted '-0', so the
> state name was a misnomer).  Then the parser then does the real magic

Then the parser does

> of creating a QFloat object if a non-finite "keyword" was recognized.
>
> I intentionally did not support NAN(n-char-sequence) forms, even
> though C99 requires it to work for strtod(), in part because
> "implementation-specified" n-char-sequence is rather hard to test
> in a platform-agnostic manner, and in part because we've never
> actually output it.

I'm afraid only because libc didn't.

=C2=A77.19.6.1:

    A double argument representing an infinity is converted in one of
    the styles [-]inf or [-]infinity -- which style is
    implementation-defined.  A double argument representing a NaN is
    converted in one of the styles [-]nan or [-]nan(n-char-sequence) --
    which style, and the meaning of any n-char- sequence, is
    implementation-defined.

> Improve the testsuite to cover the new extension, including working
> around the fact that g_assert_cmpfloat() does NOT test equivalence,
> but merely mathematical equality (0.0 and -0.0 are equal but not
> equivalent, NaN and NaN are equivalent but not equal), and enhance
> the existing tests for -0.0 in the process.
>
> Checkpatch has a false negative, complaining that there should be

Isn't this a false positive?

> spaces around binary '-' in what is really the float literal
> -32.20e-10.  It was over my head to figure out how to silence that
> one.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  qobject/json-lexer.c  | 15 ++++++++-----
>  qobject/json-parser.c | 13 +++++++++--
>  tests/check-qjson.c   | 62 ++++++++++++++++++++++++++++++++++++++++++++-=
------
>  docs/qmp-spec.txt     |  4 ++++
>  4 files changed, 79 insertions(+), 15 deletions(-)
>
> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
> index ebd15d8..de16219 100644
> --- a/qobject/json-lexer.c
> +++ b/qobject/json-lexer.c
> @@ -18,13 +18,14 @@
>  #define MAX_TOKEN_SIZE (64ULL << 20)
>
>  /*
> - * Required by JSON (RFC 7159), plus \' extension in "":
> + * Required by JSON (RFC 7159), plus \' extension in "", and extension
> + * of parsing case-insensitive non-finite numbers like "NaN" and "-Inf":
>   *
>   * \"([^\\\"]|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
>   *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
>   * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
>   * [{}\[\],:]
> - * [a-z]+   # covers null, true, false
> + * -?[a-zA-Z]+   # covers null, true, false, nan, inf[inity]
>   *
>   * Extension of '' strings:
>   *
> @@ -58,7 +59,7 @@ enum json_lexer_state {
>      IN_MANTISSA,
>      IN_MANTISSA_DIGITS,
>      IN_NONZERO_NUMBER,
> -    IN_NEG_NONZERO_NUMBER,
> +    IN_NEG_NUMBER,
>      IN_KEYWORD,
>      IN_ESCAPE,
>      IN_ESCAPE_L,
> @@ -206,15 +207,18 @@ static const uint8_t json_lexer[][256] =3D  {
>          ['.'] =3D IN_MANTISSA,
>      },
>
> -    [IN_NEG_NONZERO_NUMBER] =3D {
> +    [IN_NEG_NUMBER] =3D {
>          ['0'] =3D IN_ZERO,
>          ['1' ... '9'] =3D IN_NONZERO_NUMBER,
> +        ['a' ... 'z'] =3D IN_KEYWORD,
> +        ['A' ... 'Z'] =3D IN_KEYWORD,
>      },
>
>      /* keywords */
>      [IN_KEYWORD] =3D {
>          TERMINAL(JSON_KEYWORD),
>          ['a' ... 'z'] =3D IN_KEYWORD,
> +        ['A' ... 'Z'] =3D IN_KEYWORD,
>      },
>
>      /* whitespace */
> @@ -264,7 +268,7 @@ static const uint8_t json_lexer[][256] =3D  {
>          ['\''] =3D IN_SQ_STRING,
>          ['0'] =3D IN_ZERO,
>          ['1' ... '9'] =3D IN_NONZERO_NUMBER,
> -        ['-'] =3D IN_NEG_NONZERO_NUMBER,
> +        ['-'] =3D IN_NEG_NUMBER,
>          ['{'] =3D JSON_LCURLY,
>          ['}'] =3D JSON_RCURLY,
>          ['['] =3D JSON_LSQUARE,
> @@ -272,6 +276,7 @@ static const uint8_t json_lexer[][256] =3D  {
>          [','] =3D JSON_COMMA,
>          [':'] =3D JSON_COLON,
>          ['a' ... 'z'] =3D IN_KEYWORD,
> +        ['A' ... 'Z'] =3D IN_KEYWORD,
>          ['%'] =3D IN_ESCAPE,
>          [' '] =3D IN_WHITESPACE,
>          ['\t'] =3D IN_WHITESPACE,
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index 67ed727..12519b6 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -450,6 +450,16 @@ static QObject *parse_keyword(JSONParserContext *ctx=
t)
>          return QOBJECT(qbool_from_bool(false));
>      } else if (!strcmp(token->str, "null")) {
>          return qnull();
> +    } else {
> +        double d;
> +        char *p;
> +
> +        /* The lexer feeds us "NaN" and "-Inf" as keywords */
> +        errno =3D 0;
> +        d =3D strtod(token->str, &p);
> +        if (!errno && p !=3D token->str && !*p) {
> +            return QOBJECT(qfloat_from_double(d));
> +        }
>      }
>      parse_error(ctxt, token, "invalid keyword '%s'", token->str);
>      return NULL;
> @@ -519,8 +529,7 @@ static QObject *parse_literal(JSONParserContext *ctxt)
>      }
>      case JSON_FLOAT:
>          /* FIXME dependent on locale; a pervasive issue in QEMU */
> -        /* FIXME our lexer matches RFC 7159 in forbidding Inf or NaN,
> -         * but those might be useful extensions beyond JSON */
> +        /* NaN and infinity are parsed as extensions under parse_keyword=
() */

"Under"?  What about "extensions, by parse_keyword()"?

>          return QOBJECT(qfloat_from_double(strtod(token->str, NULL)));
>      default:
>          abort();
> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 0e158f6..95adf64 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -11,6 +11,7 @@
>   *
>   */
>  #include "qemu/osdep.h"
> +#include <math.h>
>
>  #include "qapi/qmp/qstring.h"
>  #include "qapi/qmp/qint.h"
> @@ -932,34 +933,78 @@ static void float_number(void)
>      struct {
>          const char *encoded;
>          double decoded;
> -        int skip;
> +        const char *canonical;
>      } test_cases[] =3D {
>          { "32.43", 32.43 },
>          { "0.222", 0.222 },
>          { "-32.12313", -32.12313 },
> -        { "-32.20e-10", -32.20e-10, .skip =3D 1 },
> +        { "-32.20e-10", -32.20e-10, "-0" }, /* FIXME: Our rounding is lo=
usy */
> +        { "-0.0", -0.0, "-0" },
>          { },
>      };
>
>      for (i =3D 0; test_cases[i].encoded; i++) {
>          QObject *obj;
>          QFloat *qfloat;
> +        QString *str;
>
>          obj =3D qobject_from_json(test_cases[i].encoded);
>          g_assert(obj !=3D NULL);
>          g_assert(qobject_type(obj) =3D=3D QTYPE_QFLOAT);
>
>          qfloat =3D qobject_to_qfloat(obj);
> -        g_assert(qfloat_get_double(qfloat) =3D=3D test_cases[i].decoded);
> +        g_assert_cmpfloat(qfloat_get_double(qfloat), =3D=3D,
> +                          test_cases[i].decoded);
> +        g_assert_cmpint(signbit(qfloat_get_double(qfloat)), =3D=3D,
> +                        signbit(test_cases[i].decoded));

Could use a comment.

>
> -        if (test_cases[i].skip =3D=3D 0) {
> -            QString *str;
> +        str =3D qobject_to_json(obj);
> +        g_assert_cmpstr(qstring_get_str(str), =3D=3D,
> +                        test_cases[i].canonical ?: test_cases[i].encoded=
);
> +        QDECREF(str);
> +        QDECREF(qfloat);
> +    }
> +}
>
> -            str =3D qobject_to_json(obj);
> -            g_assert(strcmp(qstring_get_str(str), test_cases[i].encoded)=
 =3D=3D 0);
> -            QDECREF(str);
> +static void non_finite_number(void)
> +{
> +    int i;
> +    struct {
> +        const char *encoded;
> +        double decoded;
> +        const char *canonical;
> +    } test_cases[] =3D {
> +        { "nan", NAN },
> +        { "NaN", NAN, "nan" },
> +        /* Not all libc implementations can round-trip '-nan' */
> +        /* We do not support forms like 'NAN(0)' */
> +        { "inf", INFINITY },
> +        { "-INFINITY", -INFINITY, "-inf" },
> +        { },
> +    };
> +
> +    for (i =3D 0; test_cases[i].encoded; i++) {
> +        QObject *obj;
> +        QFloat *qfloat;
> +        QString *str;
> +
> +        obj =3D qobject_from_json(test_cases[i].encoded);
> +        g_assert(obj !=3D NULL);
> +        g_assert(qobject_type(obj) =3D=3D QTYPE_QFLOAT);
> +
> +        qfloat =3D qobject_to_qfloat(obj);
> +        /* g_assert_cmpfloat(NAN, =3D=3D, NAN) doesn't do what we want */
> +        g_assert_cmpint(fpclassify(qfloat_get_double(qfloat)), =3D=3D,
> +                        fpclassify(test_cases[i].decoded));
> +        if (!isnan(test_cases[i].decoded)) {
> +            g_assert_cmpfloat(qfloat_get_double(qfloat), =3D=3D,
> +                              test_cases[i].decoded);
>          }
>
> +        str =3D qobject_to_json(obj);
> +        g_assert_cmpstr(qstring_get_str(str), =3D=3D,
> +                        test_cases[i].canonical ?: test_cases[i].encoded=
);
> +        QDECREF(str);
>          QDECREF(qfloat);
>      }
>  }
> @@ -1520,6 +1565,7 @@ int main(int argc, char **argv)
>
>      g_test_add_func("/literals/number/simple", simple_number);
>      g_test_add_func("/literals/number/float", float_number);
> +    g_test_add_func("/literals/number/non-finite", non_finite_number);
>      g_test_add_func("/literals/number/vararg", vararg_number);
>
>      g_test_add_func("/literals/keyword", keyword_literal);
> diff --git a/docs/qmp-spec.txt b/docs/qmp-spec.txt
> index f8b5356..bfba431 100644
> --- a/docs/qmp-spec.txt
> +++ b/docs/qmp-spec.txt
> @@ -51,6 +51,10 @@ json-string, and both input forms of strings understan=
d an additional
>  escape sequence of "\'" for a single quote. The server will only use
>  double quoting on output.
>
> +As an extension, the server understands case-insensitive forms of
> +non-finite numbers, such as "NaN", "Inf", and "-infinity" (but not
> +"NaN(...)").
> +
>  2.1 General Definitions
>  -----------------------