All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] Guarantee valid JSON in QMP, even for nonfinite numbers
@ 2016-06-10  2:48 Eric Blake
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments Eric Blake
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Eric Blake @ 2016-06-10  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru

Flesh out an idea that came up during my JSON visitor series.
Semantic errors are nicer than lexical errors.  While we generally
consider any non-finite number a bug in our code (that is, we
don't plan to expose them over QMP), at least we can guarantee
that even with a bug in our code we are still giving the user
valid JSON, similar to how we convert non-UTF-8 byte sequences
to use the replacement character for valid UTF-8 output.

I hit a couple of checkpatch false negatives in while writing this
series; one was easy to fix, but the other stumped me.  It doesn't
help that checkpatch tweaks also trigger checkpatch warnings...

Eric Blake (4):
  qobject: Correct JSON lexer grammar comments
  checkpatch: There is no qemu_strtod()
  qobject: Parse non-finite numbers, as an extension
  qobject: Output valid JSON for non-finite numbers

 qobject/json-lexer.c  | 32 ++++++++++++++++++++-------
 qobject/json-parser.c | 13 +++++++++--
 qobject/qjson.c       | 15 ++++++++++---
 tests/check-qjson.c   | 61 ++++++++++++++++++++++++++++++++++++++++++++-------
 docs/qmp-spec.txt     |  4 ++++
 scripts/checkpatch.pl |  2 +-
 6 files changed, 105 insertions(+), 22 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments
  2016-06-10  2:48 [Qemu-devel] [PATCH 0/4] Guarantee valid JSON in QMP, even for nonfinite numbers Eric Blake
@ 2016-06-10  2:48 ` Eric Blake
  2016-06-16 16:19   ` Markus Armbruster
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 2/4] checkpatch: There is no qemu_strtod() Eric Blake
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-06-10  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, Luiz Capitulino

Fix the regex comments describing what we parse as JSON.  No change
to the lexer itself, just to the comments:
- The "" and '' string construction was missing alternation between
different escape sequences
- The construction for numbers forgot to handle optional leading '-'
- The construction for numbers was grouped incorrectly so that it
didn't permit '0.1'
- The construction for numbers forgot to mark the exponent as optional
- No mention that our '' string and "\'" are JSON extensions
- No mention of our %d and related extensions when constructing JSON

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qobject/json-lexer.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 496374d..ebd15d8 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -18,11 +18,22 @@
 #define MAX_TOKEN_SIZE (64ULL << 20)

 /*
- * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
- * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
- * 0|([1-9][0-9]*(.[0-9]+)?([eE]([-+])?[0-9]+))
+ * Required by JSON (RFC 7159), plus \' extension in "":
+ *
+ * \"([^\\\"]|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
+ *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
+ * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
  * [{}\[\],:]
- * [a-z]+
+ * [a-z]+   # covers null, true, false
+ *
+ * Extension of '' strings:
+ *
+ * '([^\\']|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
+ *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
+ *
+ * Extension for vararg handling in JSON construction:
+ *
+ * %((l|ll|I64)?d|[ipsf])
  *
  */

@@ -213,7 +224,7 @@ static const uint8_t json_lexer[][256] =  {
         ['\t'] = IN_WHITESPACE,
         ['\r'] = IN_WHITESPACE,
         ['\n'] = IN_WHITESPACE,
-    },        
+    },

     /* escape */
     [IN_ESCAPE_LL] = {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 2/4] checkpatch: There is no qemu_strtod()
  2016-06-10  2:48 [Qemu-devel] [PATCH 0/4] Guarantee valid JSON in QMP, even for nonfinite numbers Eric Blake
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments Eric Blake
@ 2016-06-10  2:48 ` Eric Blake
  2016-06-16 16:20   ` Markus Armbruster
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension Eric Blake
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers Eric Blake
  3 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-06-10  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, Blue Swirl

Maybe there should be; but until there is, we should not flag
strtod() calls as something to replaced with qemu_strtod().

We also lack qemu_strtof() and qemu_strtold(), but as no one
has been using strtof() or strtold(), it's not worth complicating
the regex for them.

(Ironically, I had to use 'git commit -n' since checkpatch uses
TAB indents, in violation of its own recommendations.)

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index c939a32..cf32c8f 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2453,7 +2453,7 @@ sub process {
 		}

 # recommend qemu_strto* over strto* for numeric conversions
-		if ($line =~ /\b(strto[^k].*?)\s*\(/) {
+		if ($line =~ /\b(strto[^kd].*?)\s*\(/) {
 			WARN("consider using qemu_$1 in preference to $1\n" . $herecurr);
 		}
 # check for module_init(), use category-specific init macros explicitly please
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension
  2016-06-10  2:48 [Qemu-devel] [PATCH 0/4] Guarantee valid JSON in QMP, even for nonfinite numbers Eric Blake
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments Eric Blake
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 2/4] checkpatch: There is no qemu_strtod() Eric Blake
@ 2016-06-10  2:48 ` Eric Blake
  2016-06-16 15:38   ` Markus Armbruster
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers Eric Blake
  3 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-06-10  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, Luiz Capitulino

There have been times in the past where we have been careless and
allowed non-finite values to escape as a 'number' type in QMP
output (such as before commit 27ff42e).  This is in violation of
the JSON specification in RFC 7159, and usually caused by
floating-point division by 0 resulting in NaN, although infinity
is also a possible culprit.  While a future patch will tighten the
output generator to avoid a lexical error from a strict JSON
parser, we might as well first fix our parser to accept non-finite
values as an extension, so that we can always read what we have
output in the past ("be liberal in what you accept, strict in what
 you produce").

Rather than complicate the lexer to add LOTS of states for each
letter within 'nan' and 'inf[inity]', I chose to just abuse the
'keyword' token, but had to make it accept upper case.  Also, since
I want to parse '-inf', I had to tweak IN_NEG_NONZERO_NUMBER; and
renamed that in the process (we have always accepted '-0', so the
state name was a misnomer).  Then the parser then does the real magic
of creating a QFloat object if a non-finite "keyword" was recognized.

I intentionally did not support NAN(n-char-sequence) forms, even
though C99 requires it to work for strtod(), in part because
"implementation-specified" n-char-sequence is rather hard to test
in a platform-agnostic manner, and in part because we've never
actually output it.

Improve the testsuite to cover the new extension, including working
around the fact that g_assert_cmpfloat() does NOT test equivalence,
but merely mathematical equality (0.0 and -0.0 are equal but not
equivalent, NaN and NaN are equivalent but not equal), and enhance
the existing tests for -0.0 in the process.

Checkpatch has a false negative, complaining that there should be
spaces around binary '-' in what is really the float literal
-32.20e-10.  It was over my head to figure out how to silence that
one.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qobject/json-lexer.c  | 15 ++++++++-----
 qobject/json-parser.c | 13 +++++++++--
 tests/check-qjson.c   | 62 ++++++++++++++++++++++++++++++++++++++++++++-------
 docs/qmp-spec.txt     |  4 ++++
 4 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index ebd15d8..de16219 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -18,13 +18,14 @@
 #define MAX_TOKEN_SIZE (64ULL << 20)

 /*
- * Required by JSON (RFC 7159), plus \' extension in "":
+ * Required by JSON (RFC 7159), plus \' extension in "", and extension
+ * of parsing case-insensitive non-finite numbers like "NaN" and "-Inf":
  *
  * \"([^\\\"]|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
  *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
  * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
  * [{}\[\],:]
- * [a-z]+   # covers null, true, false
+ * -?[a-zA-Z]+   # covers null, true, false, nan, inf[inity]
  *
  * Extension of '' strings:
  *
@@ -58,7 +59,7 @@ enum json_lexer_state {
     IN_MANTISSA,
     IN_MANTISSA_DIGITS,
     IN_NONZERO_NUMBER,
-    IN_NEG_NONZERO_NUMBER,
+    IN_NEG_NUMBER,
     IN_KEYWORD,
     IN_ESCAPE,
     IN_ESCAPE_L,
@@ -206,15 +207,18 @@ static const uint8_t json_lexer[][256] =  {
         ['.'] = IN_MANTISSA,
     },

-    [IN_NEG_NONZERO_NUMBER] = {
+    [IN_NEG_NUMBER] = {
         ['0'] = IN_ZERO,
         ['1' ... '9'] = IN_NONZERO_NUMBER,
+        ['a' ... 'z'] = IN_KEYWORD,
+        ['A' ... 'Z'] = IN_KEYWORD,
     },

     /* keywords */
     [IN_KEYWORD] = {
         TERMINAL(JSON_KEYWORD),
         ['a' ... 'z'] = IN_KEYWORD,
+        ['A' ... 'Z'] = IN_KEYWORD,
     },

     /* whitespace */
@@ -264,7 +268,7 @@ static const uint8_t json_lexer[][256] =  {
         ['\''] = IN_SQ_STRING,
         ['0'] = IN_ZERO,
         ['1' ... '9'] = IN_NONZERO_NUMBER,
-        ['-'] = IN_NEG_NONZERO_NUMBER,
+        ['-'] = IN_NEG_NUMBER,
         ['{'] = JSON_LCURLY,
         ['}'] = JSON_RCURLY,
         ['['] = JSON_LSQUARE,
@@ -272,6 +276,7 @@ static const uint8_t json_lexer[][256] =  {
         [','] = JSON_COMMA,
         [':'] = JSON_COLON,
         ['a' ... 'z'] = IN_KEYWORD,
+        ['A' ... 'Z'] = IN_KEYWORD,
         ['%'] = IN_ESCAPE,
         [' '] = IN_WHITESPACE,
         ['\t'] = IN_WHITESPACE,
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 67ed727..12519b6 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -450,6 +450,16 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
         return QOBJECT(qbool_from_bool(false));
     } else if (!strcmp(token->str, "null")) {
         return qnull();
+    } else {
+        double d;
+        char *p;
+
+        /* The lexer feeds us "NaN" and "-Inf" as keywords */
+        errno = 0;
+        d = strtod(token->str, &p);
+        if (!errno && p != token->str && !*p) {
+            return QOBJECT(qfloat_from_double(d));
+        }
     }
     parse_error(ctxt, token, "invalid keyword '%s'", token->str);
     return NULL;
@@ -519,8 +529,7 @@ static QObject *parse_literal(JSONParserContext *ctxt)
     }
     case JSON_FLOAT:
         /* FIXME dependent on locale; a pervasive issue in QEMU */
-        /* FIXME our lexer matches RFC 7159 in forbidding Inf or NaN,
-         * but those might be useful extensions beyond JSON */
+        /* NaN and infinity are parsed as extensions under parse_keyword() */
         return QOBJECT(qfloat_from_double(strtod(token->str, NULL)));
     default:
         abort();
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 0e158f6..95adf64 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -11,6 +11,7 @@
  *
  */
 #include "qemu/osdep.h"
+#include <math.h>

 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qint.h"
@@ -932,34 +933,78 @@ static void float_number(void)
     struct {
         const char *encoded;
         double decoded;
-        int skip;
+        const char *canonical;
     } test_cases[] = {
         { "32.43", 32.43 },
         { "0.222", 0.222 },
         { "-32.12313", -32.12313 },
-        { "-32.20e-10", -32.20e-10, .skip = 1 },
+        { "-32.20e-10", -32.20e-10, "-0" }, /* FIXME: Our rounding is lousy */
+        { "-0.0", -0.0, "-0" },
         { },
     };

     for (i = 0; test_cases[i].encoded; i++) {
         QObject *obj;
         QFloat *qfloat;
+        QString *str;

         obj = qobject_from_json(test_cases[i].encoded);
         g_assert(obj != NULL);
         g_assert(qobject_type(obj) == QTYPE_QFLOAT);

         qfloat = qobject_to_qfloat(obj);
-        g_assert(qfloat_get_double(qfloat) == test_cases[i].decoded);
+        g_assert_cmpfloat(qfloat_get_double(qfloat), ==,
+                          test_cases[i].decoded);
+        g_assert_cmpint(signbit(qfloat_get_double(qfloat)), ==,
+                        signbit(test_cases[i].decoded));

-        if (test_cases[i].skip == 0) {
-            QString *str;
+        str = qobject_to_json(obj);
+        g_assert_cmpstr(qstring_get_str(str), ==,
+                        test_cases[i].canonical ?: test_cases[i].encoded);
+        QDECREF(str);
+        QDECREF(qfloat);
+    }
+}

-            str = qobject_to_json(obj);
-            g_assert(strcmp(qstring_get_str(str), test_cases[i].encoded) == 0);
-            QDECREF(str);
+static void non_finite_number(void)
+{
+    int i;
+    struct {
+        const char *encoded;
+        double decoded;
+        const char *canonical;
+    } test_cases[] = {
+        { "nan", NAN },
+        { "NaN", NAN, "nan" },
+        /* Not all libc implementations can round-trip '-nan' */
+        /* We do not support forms like 'NAN(0)' */
+        { "inf", INFINITY },
+        { "-INFINITY", -INFINITY, "-inf" },
+        { },
+    };
+
+    for (i = 0; test_cases[i].encoded; i++) {
+        QObject *obj;
+        QFloat *qfloat;
+        QString *str;
+
+        obj = qobject_from_json(test_cases[i].encoded);
+        g_assert(obj != NULL);
+        g_assert(qobject_type(obj) == QTYPE_QFLOAT);
+
+        qfloat = qobject_to_qfloat(obj);
+        /* g_assert_cmpfloat(NAN, ==, NAN) doesn't do what we want */
+        g_assert_cmpint(fpclassify(qfloat_get_double(qfloat)), ==,
+                        fpclassify(test_cases[i].decoded));
+        if (!isnan(test_cases[i].decoded)) {
+            g_assert_cmpfloat(qfloat_get_double(qfloat), ==,
+                              test_cases[i].decoded);
         }

+        str = qobject_to_json(obj);
+        g_assert_cmpstr(qstring_get_str(str), ==,
+                        test_cases[i].canonical ?: test_cases[i].encoded);
+        QDECREF(str);
         QDECREF(qfloat);
     }
 }
@@ -1520,6 +1565,7 @@ int main(int argc, char **argv)

     g_test_add_func("/literals/number/simple", simple_number);
     g_test_add_func("/literals/number/float", float_number);
+    g_test_add_func("/literals/number/non-finite", non_finite_number);
     g_test_add_func("/literals/number/vararg", vararg_number);

     g_test_add_func("/literals/keyword", keyword_literal);
diff --git a/docs/qmp-spec.txt b/docs/qmp-spec.txt
index f8b5356..bfba431 100644
--- a/docs/qmp-spec.txt
+++ b/docs/qmp-spec.txt
@@ -51,6 +51,10 @@ json-string, and both input forms of strings understand an additional
 escape sequence of "\'" for a single quote. The server will only use
 double quoting on output.

+As an extension, the server understands case-insensitive forms of
+non-finite numbers, such as "NaN", "Inf", and "-infinity" (but not
+"NaN(...)").
+
 2.1 General Definitions
 -----------------------

-- 
2.5.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers
  2016-06-10  2:48 [Qemu-devel] [PATCH 0/4] Guarantee valid JSON in QMP, even for nonfinite numbers Eric Blake
                   ` (2 preceding siblings ...)
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension Eric Blake
@ 2016-06-10  2:48 ` Eric Blake
  2016-06-16 16:17   ` Markus Armbruster
  3 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-06-10  2:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, Luiz Capitulino

It's better to give downstream clients a valid JSON string,
even if they are semantically expecting a number, than it is
to give them a bare keyword extension that can cause a
lexical error.

Of course, as long as we don't recognize (certain) strings as valid
numbers during a conversion to QObject, this means our extension
of accepting bare keywords for non-finite numbers cannot undergo
a round trip (once converted into a string, we never get back to
a QFloat).  However, non-finite input is rare enough that it's
not worth bothering with at the moment.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qobject/qjson.c     | 15 ++++++++++++---
 tests/check-qjson.c | 11 +++++------
 docs/qmp-spec.txt   |  2 +-
 3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/qobject/qjson.c b/qobject/qjson.c
index ef160d2..465b080 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -12,6 +12,7 @@
  */

 #include "qemu/osdep.h"
+#include <math.h>
 #include "qapi/qmp/json-lexer.h"
 #include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/json-streamer.h"
@@ -236,19 +237,27 @@ static void to_json(const QObject *obj, QString *str, int pretty, int indent)
     }
     case QTYPE_QFLOAT: {
         QFloat *val = qobject_to_qfloat(obj);
+        double d = qfloat_get_double(val);
         char buffer[1024];
         int len;

+        if (!isfinite(d)) {
+            /* Always output valid JSON - a semantic error due to a
+             * string where a number was expected is better than a
+             * lexical error from a strict parser */
+            snprintf(buffer, sizeof(buffer), "\"%f\"", d);
+            qstring_append(str, buffer);
+            break;
+        }
+
         /* FIXME: snprintf() is locale dependent; but JSON requires
          * numbers to be formatted as if in the C locale. Dependence
          * on C locale is a pervasive issue in QEMU. */
-        /* FIXME: This risks printing Inf or NaN, which are not valid
-         * JSON values. */
         /* FIXME: the default precision of 6 for %f often causes
          * rounding errors; we should be using DBL_DECIMAL_DIG (17),
          * and only rounding to a shorter number if the result would
          * still produce the same floating point value.  */
-        len = snprintf(buffer, sizeof(buffer), "%f", qfloat_get_double(val));
+        len = snprintf(buffer, sizeof(buffer), "%f", d);
         while (len > 0 && buffer[len - 1] == '0') {
             len--;
         }
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 95adf64..b0a9178 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -974,12 +974,12 @@ static void non_finite_number(void)
         double decoded;
         const char *canonical;
     } test_cases[] = {
-        { "nan", NAN },
-        { "NaN", NAN, "nan" },
+        { "nan", NAN, "\"nan\"" },
+        { "NaN", NAN, "\"nan\"" },
         /* Not all libc implementations can round-trip '-nan' */
         /* We do not support forms like 'NAN(0)' */
-        { "inf", INFINITY },
-        { "-INFINITY", -INFINITY, "-inf" },
+        { "inf", INFINITY, "\"inf\"" },
+        { "-INFINITY", -INFINITY, "\"-inf\"" },
         { },
     };

@@ -1002,8 +1002,7 @@ static void non_finite_number(void)
         }

         str = qobject_to_json(obj);
-        g_assert_cmpstr(qstring_get_str(str), ==,
-                        test_cases[i].canonical ?: test_cases[i].encoded);
+        g_assert_cmpstr(qstring_get_str(str), ==, test_cases[i].canonical);
         QDECREF(str);
         QDECREF(qfloat);
     }
diff --git a/docs/qmp-spec.txt b/docs/qmp-spec.txt
index bfba431..75c68d9 100644
--- a/docs/qmp-spec.txt
+++ b/docs/qmp-spec.txt
@@ -53,7 +53,7 @@ double quoting on output.

 As an extension, the server understands case-insensitive forms of
 non-finite numbers, such as "NaN", "Inf", and "-infinity" (but not
-"NaN(...)").
+"NaN(...)"); however, such numbers are output as a json-string.

 2.1 General Definitions
 -----------------------
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension Eric Blake
@ 2016-06-16 15:38   ` Markus Armbruster
  2016-06-16 16:25     ` Markus Armbruster
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2016-06-16 15:38 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Luiz Capitulino

Eric Blake <eblake@redhat.com> writes:

> There have been times in the past where we have been careless and
> allowed non-finite values to escape as a 'number' type in QMP
> output (such as before commit 27ff42e).  This is in violation of
> the JSON specification in RFC 7159, and usually caused by
> floating-point division by 0 resulting in NaN, although infinity
> is also a possible culprit.  While a future patch will tighten the
> output generator to avoid a lexical error from a strict JSON
> parser, we might as well first fix our parser to accept non-finite
> values as an extension, so that we can always read what we have
> output in the past ("be liberal in what you accept, strict in what
>  you produce").
>
> Rather than complicate the lexer to add LOTS of states for each
> letter within 'nan' and 'inf[inity]', I chose to just abuse the
> 'keyword' token, but had to make it accept upper case.  Also, since
> I want to parse '-inf', I had to tweak IN_NEG_NONZERO_NUMBER; and
> renamed that in the process (we have always accepted '-0', so the
> state name was a misnomer).  Then the parser then does the real magic

Then the parser does

> of creating a QFloat object if a non-finite "keyword" was recognized.
>
> I intentionally did not support NAN(n-char-sequence) forms, even
> though C99 requires it to work for strtod(), in part because
> "implementation-specified" n-char-sequence is rather hard to test
> in a platform-agnostic manner, and in part because we've never
> actually output it.

I'm afraid only because libc didn't.

§7.19.6.1:

    A double argument representing an infinity is converted in one of
    the styles [-]inf or [-]infinity -- which style is
    implementation-defined.  A double argument representing a NaN is
    converted in one of the styles [-]nan or [-]nan(n-char-sequence) --
    which style, and the meaning of any n-char- sequence, is
    implementation-defined.

> Improve the testsuite to cover the new extension, including working
> around the fact that g_assert_cmpfloat() does NOT test equivalence,
> but merely mathematical equality (0.0 and -0.0 are equal but not
> equivalent, NaN and NaN are equivalent but not equal), and enhance
> the existing tests for -0.0 in the process.
>
> Checkpatch has a false negative, complaining that there should be

Isn't this a false positive?

> spaces around binary '-' in what is really the float literal
> -32.20e-10.  It was over my head to figure out how to silence that
> one.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  qobject/json-lexer.c  | 15 ++++++++-----
>  qobject/json-parser.c | 13 +++++++++--
>  tests/check-qjson.c   | 62 ++++++++++++++++++++++++++++++++++++++++++++-------
>  docs/qmp-spec.txt     |  4 ++++
>  4 files changed, 79 insertions(+), 15 deletions(-)
>
> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
> index ebd15d8..de16219 100644
> --- a/qobject/json-lexer.c
> +++ b/qobject/json-lexer.c
> @@ -18,13 +18,14 @@
>  #define MAX_TOKEN_SIZE (64ULL << 20)
>
>  /*
> - * Required by JSON (RFC 7159), plus \' extension in "":
> + * Required by JSON (RFC 7159), plus \' extension in "", and extension
> + * of parsing case-insensitive non-finite numbers like "NaN" and "-Inf":
>   *
>   * \"([^\\\"]|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
>   *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
>   * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
>   * [{}\[\],:]
> - * [a-z]+   # covers null, true, false
> + * -?[a-zA-Z]+   # covers null, true, false, nan, inf[inity]
>   *
>   * Extension of '' strings:
>   *
> @@ -58,7 +59,7 @@ enum json_lexer_state {
>      IN_MANTISSA,
>      IN_MANTISSA_DIGITS,
>      IN_NONZERO_NUMBER,
> -    IN_NEG_NONZERO_NUMBER,
> +    IN_NEG_NUMBER,
>      IN_KEYWORD,
>      IN_ESCAPE,
>      IN_ESCAPE_L,
> @@ -206,15 +207,18 @@ static const uint8_t json_lexer[][256] =  {
>          ['.'] = IN_MANTISSA,
>      },
>
> -    [IN_NEG_NONZERO_NUMBER] = {
> +    [IN_NEG_NUMBER] = {
>          ['0'] = IN_ZERO,
>          ['1' ... '9'] = IN_NONZERO_NUMBER,
> +        ['a' ... 'z'] = IN_KEYWORD,
> +        ['A' ... 'Z'] = IN_KEYWORD,
>      },
>
>      /* keywords */
>      [IN_KEYWORD] = {
>          TERMINAL(JSON_KEYWORD),
>          ['a' ... 'z'] = IN_KEYWORD,
> +        ['A' ... 'Z'] = IN_KEYWORD,
>      },
>
>      /* whitespace */
> @@ -264,7 +268,7 @@ static const uint8_t json_lexer[][256] =  {
>          ['\''] = IN_SQ_STRING,
>          ['0'] = IN_ZERO,
>          ['1' ... '9'] = IN_NONZERO_NUMBER,
> -        ['-'] = IN_NEG_NONZERO_NUMBER,
> +        ['-'] = IN_NEG_NUMBER,
>          ['{'] = JSON_LCURLY,
>          ['}'] = JSON_RCURLY,
>          ['['] = JSON_LSQUARE,
> @@ -272,6 +276,7 @@ static const uint8_t json_lexer[][256] =  {
>          [','] = JSON_COMMA,
>          [':'] = JSON_COLON,
>          ['a' ... 'z'] = IN_KEYWORD,
> +        ['A' ... 'Z'] = IN_KEYWORD,
>          ['%'] = IN_ESCAPE,
>          [' '] = IN_WHITESPACE,
>          ['\t'] = IN_WHITESPACE,
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index 67ed727..12519b6 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -450,6 +450,16 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
>          return QOBJECT(qbool_from_bool(false));
>      } else if (!strcmp(token->str, "null")) {
>          return qnull();
> +    } else {
> +        double d;
> +        char *p;
> +
> +        /* The lexer feeds us "NaN" and "-Inf" as keywords */
> +        errno = 0;
> +        d = strtod(token->str, &p);
> +        if (!errno && p != token->str && !*p) {
> +            return QOBJECT(qfloat_from_double(d));
> +        }
>      }
>      parse_error(ctxt, token, "invalid keyword '%s'", token->str);
>      return NULL;
> @@ -519,8 +529,7 @@ static QObject *parse_literal(JSONParserContext *ctxt)
>      }
>      case JSON_FLOAT:
>          /* FIXME dependent on locale; a pervasive issue in QEMU */
> -        /* FIXME our lexer matches RFC 7159 in forbidding Inf or NaN,
> -         * but those might be useful extensions beyond JSON */
> +        /* NaN and infinity are parsed as extensions under parse_keyword() */

"Under"?  What about "extensions, by parse_keyword()"?

>          return QOBJECT(qfloat_from_double(strtod(token->str, NULL)));
>      default:
>          abort();
> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 0e158f6..95adf64 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -11,6 +11,7 @@
>   *
>   */
>  #include "qemu/osdep.h"
> +#include <math.h>
>
>  #include "qapi/qmp/qstring.h"
>  #include "qapi/qmp/qint.h"
> @@ -932,34 +933,78 @@ static void float_number(void)
>      struct {
>          const char *encoded;
>          double decoded;
> -        int skip;
> +        const char *canonical;
>      } test_cases[] = {
>          { "32.43", 32.43 },
>          { "0.222", 0.222 },
>          { "-32.12313", -32.12313 },
> -        { "-32.20e-10", -32.20e-10, .skip = 1 },
> +        { "-32.20e-10", -32.20e-10, "-0" }, /* FIXME: Our rounding is lousy */
> +        { "-0.0", -0.0, "-0" },
>          { },
>      };
>
>      for (i = 0; test_cases[i].encoded; i++) {
>          QObject *obj;
>          QFloat *qfloat;
> +        QString *str;
>
>          obj = qobject_from_json(test_cases[i].encoded);
>          g_assert(obj != NULL);
>          g_assert(qobject_type(obj) == QTYPE_QFLOAT);
>
>          qfloat = qobject_to_qfloat(obj);
> -        g_assert(qfloat_get_double(qfloat) == test_cases[i].decoded);
> +        g_assert_cmpfloat(qfloat_get_double(qfloat), ==,
> +                          test_cases[i].decoded);
> +        g_assert_cmpint(signbit(qfloat_get_double(qfloat)), ==,
> +                        signbit(test_cases[i].decoded));

Could use a comment.

>
> -        if (test_cases[i].skip == 0) {
> -            QString *str;
> +        str = qobject_to_json(obj);
> +        g_assert_cmpstr(qstring_get_str(str), ==,
> +                        test_cases[i].canonical ?: test_cases[i].encoded);
> +        QDECREF(str);
> +        QDECREF(qfloat);
> +    }
> +}
>
> -            str = qobject_to_json(obj);
> -            g_assert(strcmp(qstring_get_str(str), test_cases[i].encoded) == 0);
> -            QDECREF(str);
> +static void non_finite_number(void)
> +{
> +    int i;
> +    struct {
> +        const char *encoded;
> +        double decoded;
> +        const char *canonical;
> +    } test_cases[] = {
> +        { "nan", NAN },
> +        { "NaN", NAN, "nan" },
> +        /* Not all libc implementations can round-trip '-nan' */
> +        /* We do not support forms like 'NAN(0)' */
> +        { "inf", INFINITY },
> +        { "-INFINITY", -INFINITY, "-inf" },
> +        { },
> +    };
> +
> +    for (i = 0; test_cases[i].encoded; i++) {
> +        QObject *obj;
> +        QFloat *qfloat;
> +        QString *str;
> +
> +        obj = qobject_from_json(test_cases[i].encoded);
> +        g_assert(obj != NULL);
> +        g_assert(qobject_type(obj) == QTYPE_QFLOAT);
> +
> +        qfloat = qobject_to_qfloat(obj);
> +        /* g_assert_cmpfloat(NAN, ==, NAN) doesn't do what we want */
> +        g_assert_cmpint(fpclassify(qfloat_get_double(qfloat)), ==,
> +                        fpclassify(test_cases[i].decoded));
> +        if (!isnan(test_cases[i].decoded)) {
> +            g_assert_cmpfloat(qfloat_get_double(qfloat), ==,
> +                              test_cases[i].decoded);
>          }
>
> +        str = qobject_to_json(obj);
> +        g_assert_cmpstr(qstring_get_str(str), ==,
> +                        test_cases[i].canonical ?: test_cases[i].encoded);
> +        QDECREF(str);
>          QDECREF(qfloat);
>      }
>  }
> @@ -1520,6 +1565,7 @@ int main(int argc, char **argv)
>
>      g_test_add_func("/literals/number/simple", simple_number);
>      g_test_add_func("/literals/number/float", float_number);
> +    g_test_add_func("/literals/number/non-finite", non_finite_number);
>      g_test_add_func("/literals/number/vararg", vararg_number);
>
>      g_test_add_func("/literals/keyword", keyword_literal);
> diff --git a/docs/qmp-spec.txt b/docs/qmp-spec.txt
> index f8b5356..bfba431 100644
> --- a/docs/qmp-spec.txt
> +++ b/docs/qmp-spec.txt
> @@ -51,6 +51,10 @@ json-string, and both input forms of strings understand an additional
>  escape sequence of "\'" for a single quote. The server will only use
>  double quoting on output.
>
> +As an extension, the server understands case-insensitive forms of
> +non-finite numbers, such as "NaN", "Inf", and "-infinity" (but not
> +"NaN(...)").
> +
>  2.1 General Definitions
>  -----------------------

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers Eric Blake
@ 2016-06-16 16:17   ` Markus Armbruster
  2016-06-17  3:06     ` Eric Blake
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2016-06-16 16:17 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Luiz Capitulino

Eric Blake <eblake@redhat.com> writes:

> It's better to give downstream clients a valid JSON string,
> even if they are semantically expecting a number, than it is
> to give them a bare keyword extension that can cause a
> lexical error.

Incompatible change.  If all clients are choking on non-finite numbers,
then the incompatibility is an improvement.  If a client exists that
groks non-finite numbers, ...  Absence is always hard to show.

Moreover, it turns query-qmp-schema into a liar: the schema it returns
claims a certain member of the reply has "type": "number", and then we
go on to send a string anyway.

> Of course, as long as we don't recognize (certain) strings as valid
> numbers during a conversion to QObject,

That would be even crazier!

>                                         this means our extension
> of accepting bare keywords for non-finite numbers cannot undergo
> a round trip (once converted into a string, we never get back to
> a QFloat).  However, non-finite input is rare enough that it's
> not worth bothering with at the moment.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>

I'm afraid the only sane solution is to find all uses of number in QMP
output, audit the code producing them, then assert isfinite() in the
monitor.  For commands without a side effect, we could fail the command
instead of tripping an assertion.  We'd have to declare such commands.

Let's examine the occurences of "number" in output of query-qmp-schema,
or actually in the qmp-introspect.c that gets generated with -u:

* Object q_obj_migrate_set_downtime-arg member value: input

* Builtin number: d'uh!

* Object MigrationStats member mbps: in output of query-migrate

* Object XBZRLECacheStats member overflow: likewise

* Object KeyValue case number: not a type.

* Object BlockDeviceTimedStats members avg_rd_queue_depth,
  avg_wr_queue_depth: in output of query-blockstats

* Enum CommandLineParameterType member: not a type

* Enum JSONType member: not a type

* Enum KeyValueKind: not a type

* Object PciBusInfo member: not a type

So it's just query-migrate and query-blockstats.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments Eric Blake
@ 2016-06-16 16:19   ` Markus Armbruster
  2016-06-16 17:41     ` Eric Blake
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2016-06-16 16:19 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Luiz Capitulino

Eric Blake <eblake@redhat.com> writes:

> Fix the regex comments describing what we parse as JSON.  No change
> to the lexer itself, just to the comments:
> - The "" and '' string construction was missing alternation between
> different escape sequences
> - The construction for numbers forgot to handle optional leading '-'
> - The construction for numbers was grouped incorrectly so that it
> didn't permit '0.1'
> - The construction for numbers forgot to mark the exponent as optional
> - No mention that our '' string and "\'" are JSON extensions
> - No mention of our %d and related extensions when constructing JSON
>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>

I'll take this one through qapi-next.  Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] checkpatch: There is no qemu_strtod()
  2016-06-10  2:48 ` [Qemu-devel] [PATCH 2/4] checkpatch: There is no qemu_strtod() Eric Blake
@ 2016-06-16 16:20   ` Markus Armbruster
  2016-06-16 16:31     ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2016-06-16 16:20 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Blue Swirl, Paolo Bonzini

Eric Blake <eblake@redhat.com> writes:

> Maybe there should be; but until there is, we should not flag
> strtod() calls as something to replaced with qemu_strtod().
>
> We also lack qemu_strtof() and qemu_strtold(), but as no one
> has been using strtof() or strtold(), it's not worth complicating
> the regex for them.
>
> (Ironically, I had to use 'git commit -n' since checkpatch uses
> TAB indents, in violation of its own recommendations.)
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  scripts/checkpatch.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index c939a32..cf32c8f 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2453,7 +2453,7 @@ sub process {
>  		}
>
>  # recommend qemu_strto* over strto* for numeric conversions
> -		if ($line =~ /\b(strto[^k].*?)\s*\(/) {
> +		if ($line =~ /\b(strto[^kd].*?)\s*\(/) {
>  			WARN("consider using qemu_$1 in preference to $1\n" . $herecurr);
>  		}
>  # check for module_init(), use category-specific init macros explicitly please

Reviewed-by: Markus Armbruster <armbru@redhat.com>

Paolo, if you want me to smuggle this into master via qapi-next, let me
know.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension
  2016-06-16 15:38   ` Markus Armbruster
@ 2016-06-16 16:25     ` Markus Armbruster
  2016-06-17  3:00       ` Eric Blake
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2016-06-16 16:25 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Luiz Capitulino

I think this commit mixes up parsing of non-finite numbers, which we may
or may not want, with general test improvements, which we'll want
regardless.  Please split the patch.

On the parsing of non-finite numbers: the code looks good to me, but as
long as we're not ready to extend QMP to include non-finite numbers both
ways, I doubt we need to parse them.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] checkpatch: There is no qemu_strtod()
  2016-06-16 16:20   ` Markus Armbruster
@ 2016-06-16 16:31     ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2016-06-16 16:31 UTC (permalink / raw)
  To: Markus Armbruster, Eric Blake; +Cc: qemu-devel, Blue Swirl



On 16/06/2016 18:20, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> Maybe there should be; but until there is, we should not flag
>> strtod() calls as something to replaced with qemu_strtod().
>>
>> We also lack qemu_strtof() and qemu_strtold(), but as no one
>> has been using strtof() or strtold(), it's not worth complicating
>> the regex for them.
>>
>> (Ironically, I had to use 'git commit -n' since checkpatch uses
>> TAB indents, in violation of its own recommendations.)
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>  scripts/checkpatch.pl | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>> index c939a32..cf32c8f 100755
>> --- a/scripts/checkpatch.pl
>> +++ b/scripts/checkpatch.pl
>> @@ -2453,7 +2453,7 @@ sub process {
>>  		}
>>
>>  # recommend qemu_strto* over strto* for numeric conversions
>> -		if ($line =~ /\b(strto[^k].*?)\s*\(/) {
>> +		if ($line =~ /\b(strto[^kd].*?)\s*\(/) {
>>  			WARN("consider using qemu_$1 in preference to $1\n" . $herecurr);
>>  		}
>>  # check for module_init(), use category-specific init macros explicitly please
> 
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> 
> Paolo, if you want me to smuggle this into master via qapi-next, let me
> know.

Yes, please do.

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments
  2016-06-16 16:19   ` Markus Armbruster
@ 2016-06-16 17:41     ` Eric Blake
  2016-06-17  7:54       ` Markus Armbruster
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-06-16 17:41 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 1927 bytes --]



On 06/16/2016 10:19 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> Fix the regex comments describing what we parse as JSON.  No change
>> to the lexer itself, just to the comments:
>> - The "" and '' string construction was missing alternation between
>> different escape sequences
>> - The construction for numbers forgot to handle optional leading '-'
>> - The construction for numbers was grouped incorrectly so that it
>> didn't permit '0.1'
>> - The construction for numbers forgot to mark the exponent as optional
>> - No mention that our '' string and "\'" are JSON extensions
>> - No mention of our %d and related extensions when constructing JSON
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> 
> I'll take this one through qapi-next.  Thanks!
> 

You may want to squash this in for shorter lines:


diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index de16219..b030576 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -21,16 +21,14 @@
  * Required by JSON (RFC 7159), plus \' extension in "", and extension
  * of parsing case-insensitive non-finite numbers like "NaN" and "-Inf":
  *
- * \"([^\\\"]|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
- *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
+ * \"([^\\\"]|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*\"
  * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
  * [{}\[\],:]
  * -?[a-zA-Z]+   # covers null, true, false, nan, inf[inity]
  *
  * Extension of '' strings:
  *
- * '([^\\']|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
- *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
+ * '([^\\']|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*'
  *
  * Extension for vararg handling in JSON construction:
  *

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension
  2016-06-16 16:25     ` Markus Armbruster
@ 2016-06-17  3:00       ` Eric Blake
  2016-06-17  8:04         ` Markus Armbruster
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-06-17  3:00 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 1937 bytes --]

On 06/16/2016 10:25 AM, Markus Armbruster wrote:
> I think this commit mixes up parsing of non-finite numbers, which we may
> or may not want, with general test improvements, which we'll want
> regardless.  Please split the patch.
> 
> On the parsing of non-finite numbers: the code looks good to me, but as
> long as we're not ready to extend QMP to include non-finite numbers both
> ways, I doubt we need to parse them.

Even if we want to guarantee that we will never generate a non-finite
number as the result of a QMP command, we DO have to worry about the
'id' field.  With all this series applied:

$  ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp
stdio{"QMP": {"version": {"qemu": {"micro": 50, "minor": 6, "major": 2},
"package": " (v2.6.0-1151-g1bb9f5e)"}, "capabilities": []}}
{'execute':'qmp_capabilities', 'id':[nan,inf,-Infinity]}
{"return": {}, "id": ["nan", "inf", "-inf"]}

and with just 1-3 applied:

$ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp
stdio{"QMP": {"version": {"qemu": {"micro": 50, "minor": 6, "major": 2},
"package": " (v2.6.0-1151-g1bb9f5e)"}, "capabilities": []}}
{'execute':'qmp_capabilities', 'id':[nan,inf,-Infinity]}
{"return": {}, "id": [nan, inf, -inf]}

[okay, I cheated, as evidenced by the new git stuff appended in the
version output being the same between the two lines, but you get the idea].

I'm comfortable with a compromise that asserts that QMP commands will
never output a non-finite number anywhere that introspection lists
'number', but that QMP itself understands the extension of parsing a
bare inf anywhere, and if a bare inf appears under 'id', then the replay
of 'id' will also include a bare inf (if your JSON library allows you to
send non-JSON, it should also allow you to parse non-JSON).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers
  2016-06-16 16:17   ` Markus Armbruster
@ 2016-06-17  3:06     ` Eric Blake
  2016-06-17  8:14       ` Markus Armbruster
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-06-17  3:06 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 3413 bytes --]

On 06/16/2016 10:17 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> It's better to give downstream clients a valid JSON string,
>> even if they are semantically expecting a number, than it is
>> to give them a bare keyword extension that can cause a
>> lexical error.
> 
> Incompatible change.  If all clients are choking on non-finite numbers,
> then the incompatibility is an improvement.  If a client exists that
> groks non-finite numbers, ...  Absence is always hard to show.

The 'id' field is an outlier - there, we replay the user's input with no
contextual interpretation (however, we DO reserve the right to reorder
the keys in the dicts that we replay, and to canonicalize UTF-8 text or
otherwise alter their input to something "equivalent").

> 
> Moreover, it turns query-qmp-schema into a liar: the schema it returns
> claims a certain member of the reply has "type": "number", and then we
> go on to send a string anyway.

The 'id' field is documented as sending ANY JSON value, so if we argue
that canonicalizing their extension input of a bare inf into a proper
JSON string on output is reasonable, then we may want this patch in
addition to adding assertions that none of the QMP commands with
introspectible 'number' ever output non-finite values.

> 
>> Of course, as long as we don't recognize (certain) strings as valid
>> numbers during a conversion to QObject,
> 
> That would be even crazier!
> 
>>                                         this means our extension
>> of accepting bare keywords for non-finite numbers cannot undergo
>> a round trip (once converted into a string, we never get back to
>> a QFloat).  However, non-finite input is rare enough that it's
>> not worth bothering with at the moment.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> I'm afraid the only sane solution is to find all uses of number in QMP
> output, audit the code producing them, then assert isfinite() in the
> monitor.  For commands without a side effect, we could fail the command
> instead of tripping an assertion.  We'd have to declare such commands.
> 
> Let's examine the occurences of "number" in output of query-qmp-schema,
> or actually in the qmp-introspect.c that gets generated with -u:
> 
> * Object q_obj_migrate_set_downtime-arg member value: input

Even though it's not output, it does need to be checked that it will
behave sanely with Inf or NaN input if we extend our parser to allow
those (behaving sanely may include a graceful error that the input was
out of range).

> 
> * Builtin number: d'uh!
> 
> * Object MigrationStats member mbps: in output of query-migrate
> 
> * Object XBZRLECacheStats member overflow: likewise
> 
> * Object KeyValue case number: not a type.
> 
> * Object BlockDeviceTimedStats members avg_rd_queue_depth,
>   avg_wr_queue_depth: in output of query-blockstats
> 
> * Enum CommandLineParameterType member: not a type
> 
> * Enum JSONType member: not a type
> 
> * Enum KeyValueKind: not a type
> 
> * Object PciBusInfo member: not a type
> 
> So it's just query-migrate and query-blockstats.
> 

Okay, looks like I need to respin this, and the rest of my JSON output
visitor on top of it, with this audit done first.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments
  2016-06-16 17:41     ` Eric Blake
@ 2016-06-17  7:54       ` Markus Armbruster
  2016-06-21 13:53         ` Eric Blake
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2016-06-17  7:54 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Luiz Capitulino

Eric Blake <eblake@redhat.com> writes:

> On 06/16/2016 10:19 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>>> Fix the regex comments describing what we parse as JSON.  No change
>>> to the lexer itself, just to the comments:
>>> - The "" and '' string construction was missing alternation between
>>> different escape sequences
>>> - The construction for numbers forgot to handle optional leading '-'
>>> - The construction for numbers was grouped incorrectly so that it
>>> didn't permit '0.1'
>>> - The construction for numbers forgot to mark the exponent as optional
>>> - No mention that our '' string and "\'" are JSON extensions
>>> - No mention of our %d and related extensions when constructing JSON
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> 
>> Reviewed-by: Markus Armbruster <armbru@redhat.com>
>> 
>> I'll take this one through qapi-next.  Thanks!
>> 
>
> You may want to squash this in for shorter lines:
>
>
> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
> index de16219..b030576 100644
> --- a/qobject/json-lexer.c
> +++ b/qobject/json-lexer.c
> @@ -21,16 +21,14 @@
>   * Required by JSON (RFC 7159), plus \' extension in "", and extension
>   * of parsing case-insensitive non-finite numbers like "NaN" and "-Inf":
>   *
> - * \"([^\\\"]|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
> - *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
> + * \"([^\\\"]|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*\"
>   * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
>   * [{}\[\],:]
>   * -?[a-zA-Z]+   # covers null, true, false, nan, inf[inity]
>   *
>   * Extension of '' strings:
>   *
> - * '([^\\']|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
> - *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
> + * '([^\\']|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*'
>   *
>   * Extension for vararg handling in JSON construction:
>   *

Doesn't apply as is, but I get what you mean.  Applied to qapi-next,
please double-check the result there.  Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension
  2016-06-17  3:00       ` Eric Blake
@ 2016-06-17  8:04         ` Markus Armbruster
  0 siblings, 0 replies; 18+ messages in thread
From: Markus Armbruster @ 2016-06-17  8:04 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Luiz Capitulino

Eric Blake <eblake@redhat.com> writes:

> On 06/16/2016 10:25 AM, Markus Armbruster wrote:
>> I think this commit mixes up parsing of non-finite numbers, which we may
>> or may not want, with general test improvements, which we'll want
>> regardless.  Please split the patch.
>> 
>> On the parsing of non-finite numbers: the code looks good to me, but as
>> long as we're not ready to extend QMP to include non-finite numbers both
>> ways, I doubt we need to parse them.
>
> Even if we want to guarantee that we will never generate a non-finite
> number as the result of a QMP command, we DO have to worry about the
> 'id' field.  With all this series applied:
>
> $  ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp
> stdio{"QMP": {"version": {"qemu": {"micro": 50, "minor": 6, "major": 2},
> "package": " (v2.6.0-1151-g1bb9f5e)"}, "capabilities": []}}
> {'execute':'qmp_capabilities', 'id':[nan,inf,-Infinity]}
> {"return": {}, "id": ["nan", "inf", "-inf"]}

You get a different id back, in violation of the QMP spec.

> and with just 1-3 applied:
>
> $ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp
> stdio{"QMP": {"version": {"qemu": {"micro": 50, "minor": 6, "major": 2},
> "package": " (v2.6.0-1151-g1bb9f5e)"}, "capabilities": []}}
> {'execute':'qmp_capabilities', 'id':[nan,inf,-Infinity]}
> {"return": {}, "id": [nan, inf, -inf]}

This would conform to a revision of the QMP spec that supports
non-finite numbers.  With the current spec, it's in "garbage in, garbage
out" territory.  At least it's the same garbage :)

Current master:

    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 6, "major": 2}, "package": " (v2.6.0-1114-g585fcd4)"}, "capabilities": []}}
    {'execute':'qmp_capabilities', 'id':[nan,inf,-Infinity]}
    {"error": {"class": "GenericError", "desc": "Invalid JSON syntax"}}
    {"error": {"class": "GenericError", "desc": "Invalid JSON syntax"}}
    {"error": {"class": "GenericError", "desc": "Invalid JSON syntax"}}
    {"error": {"class": "GenericError", "desc": "Invalid JSON syntax"}}

Our diagnosis and reporting of lexical errors sucks, but that's not
news.

> [okay, I cheated, as evidenced by the new git stuff appended in the
> version output being the same between the two lines, but you get the idea].
>
> I'm comfortable with a compromise that asserts that QMP commands will
> never output a non-finite number anywhere that introspection lists
> 'number', but that QMP itself understands the extension of parsing a
> bare inf anywhere, and if a bare inf appears under 'id', then the replay
> of 'id' will also include a bare inf (if your JSON library allows you to
> send non-JSON, it should also allow you to parse non-JSON).

Yes.

However, non-finite numbers have never worked in input.  I don't see why
we should make them work in input now, unless we extend QMP to include
non-finite numbers, and make them work both ways.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers
  2016-06-17  3:06     ` Eric Blake
@ 2016-06-17  8:14       ` Markus Armbruster
  0 siblings, 0 replies; 18+ messages in thread
From: Markus Armbruster @ 2016-06-17  8:14 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, Luiz Capitulino

Eric Blake <eblake@redhat.com> writes:

> On 06/16/2016 10:17 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>>> It's better to give downstream clients a valid JSON string,
>>> even if they are semantically expecting a number, than it is
>>> to give them a bare keyword extension that can cause a
>>> lexical error.
>> 
>> Incompatible change.  If all clients are choking on non-finite numbers,
>> then the incompatibility is an improvement.  If a client exists that
>> groks non-finite numbers, ...  Absence is always hard to show.
>
> The 'id' field is an outlier - there, we replay the user's input with no
> contextual interpretation (however, we DO reserve the right to reorder
> the keys in the dicts that we replay, and to canonicalize UTF-8 text or
> otherwise alter their input to something "equivalent").

Yes, the response's id must the the same JSON value, but it needn't be
the same text.

>> Moreover, it turns query-qmp-schema into a liar: the schema it returns
>> claims a certain member of the reply has "type": "number", and then we
>> go on to send a string anyway.
>
> The 'id' field is documented as sending ANY JSON value, so if we argue
> that canonicalizing their extension input of a bare inf into a proper
> JSON string on output is reasonable, then we may want this patch in
> addition to adding assertions that none of the QMP commands with
> introspectible 'number' ever output non-finite values.

I read this thrice, and I'm still not sure I got the argument :)

>>> Of course, as long as we don't recognize (certain) strings as valid
>>> numbers during a conversion to QObject,
>> 
>> That would be even crazier!
>> 
>>>                                         this means our extension
>>> of accepting bare keywords for non-finite numbers cannot undergo
>>> a round trip (once converted into a string, we never get back to
>>> a QFloat).  However, non-finite input is rare enough that it's
>>> not worth bothering with at the moment.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> 
>> I'm afraid the only sane solution is to find all uses of number in QMP
>> output, audit the code producing them, then assert isfinite() in the
>> monitor.  For commands without a side effect, we could fail the command
>> instead of tripping an assertion.  We'd have to declare such commands.
>> 
>> Let's examine the occurences of "number" in output of query-qmp-schema,
>> or actually in the qmp-introspect.c that gets generated with -u:
>> 
>> * Object q_obj_migrate_set_downtime-arg member value: input
>
> Even though it's not output, it does need to be checked that it will
> behave sanely with Inf or NaN input if we extend our parser to allow
> those (behaving sanely may include a graceful error that the input was
> out of range).

Yes, *if* we extend QMP.

>> 
>> * Builtin number: d'uh!
>> 
>> * Object MigrationStats member mbps: in output of query-migrate
>> 
>> * Object XBZRLECacheStats member overflow: likewise
>> 
>> * Object KeyValue case number: not a type.
>> 
>> * Object BlockDeviceTimedStats members avg_rd_queue_depth,
>>   avg_wr_queue_depth: in output of query-blockstats
>> 
>> * Enum CommandLineParameterType member: not a type
>> 
>> * Enum JSONType member: not a type
>> 
>> * Enum KeyValueKind: not a type
>> 
>> * Object PciBusInfo member: not a type
>> 
>> So it's just query-migrate and query-blockstats.
>> 
>
> Okay, looks like I need to respin this, and the rest of my JSON output
> visitor on top of it, with this audit done first.

Audit, plus isfinite() assertions to guard the JSON output.

The (misnamed) QMP output visitor shouldn't assert, because it can
legitimately be used for purposes other than QMP.  Only the actual
conversion to JSON should assert.  Currently, to_json().  With your JSON
output visitor, it would be qstring_append_json_number(), or its caller.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments
  2016-06-17  7:54       ` Markus Armbruster
@ 2016-06-21 13:53         ` Eric Blake
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Blake @ 2016-06-21 13:53 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]

On 06/17/2016 01:54 AM, Markus Armbruster wrote:

>>>
>>> I'll take this one through qapi-next.  Thanks!
>>>
>>
>> You may want to squash this in for shorter lines:
>>
>>
>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>> index de16219..b030576 100644
>> --- a/qobject/json-lexer.c
>> +++ b/qobject/json-lexer.c
>> @@ -21,16 +21,14 @@
>>   * Required by JSON (RFC 7159), plus \' extension in "", and extension
>>   * of parsing case-insensitive non-finite numbers like "NaN" and "-Inf":
>>   *
>> - * \"([^\\\"]|(\\\"|\\'|\\\\|\\/|\\b|\\f|\\n|\\r|\\t|
>> - *    \\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
>> + * \"([^\\\"]|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*\"

> Doesn't apply as is, but I get what you mean.  Applied to qapi-next,
> please double-check the result there.  Thanks!

You lost the comment about 'plus \' extension in ""', which is probably
still worth keeping; but otherwise looks okay.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-06-21 13:53 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-10  2:48 [Qemu-devel] [PATCH 0/4] Guarantee valid JSON in QMP, even for nonfinite numbers Eric Blake
2016-06-10  2:48 ` [Qemu-devel] [PATCH 1/4] qobject: Correct JSON lexer grammar comments Eric Blake
2016-06-16 16:19   ` Markus Armbruster
2016-06-16 17:41     ` Eric Blake
2016-06-17  7:54       ` Markus Armbruster
2016-06-21 13:53         ` Eric Blake
2016-06-10  2:48 ` [Qemu-devel] [PATCH 2/4] checkpatch: There is no qemu_strtod() Eric Blake
2016-06-16 16:20   ` Markus Armbruster
2016-06-16 16:31     ` Paolo Bonzini
2016-06-10  2:48 ` [Qemu-devel] [PATCH 3/4] qobject: Parse non-finite numbers, as an extension Eric Blake
2016-06-16 15:38   ` Markus Armbruster
2016-06-16 16:25     ` Markus Armbruster
2016-06-17  3:00       ` Eric Blake
2016-06-17  8:04         ` Markus Armbruster
2016-06-10  2:48 ` [Qemu-devel] [PATCH 4/4] qobject: Output valid JSON for non-finite numbers Eric Blake
2016-06-16 16:17   ` Markus Armbruster
2016-06-17  3:06     ` Eric Blake
2016-06-17  8:14       ` Markus Armbruster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.