* [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups
@ 2018-08-08 12:02 Markus Armbruster
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string Markus Armbruster
                   ` (56 more replies)
  0 siblings, 57 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

JSON is such a simple language, so writing a parser should be easy,
shouldn't it?  Well, the evidence is in, and it's a lot of patches.
Summary of fixes:

* Reject ASCII control characters in strings as RFC 7159 specifies

* Reject all invalid UTF-8 sequences, not just some

* Reject invalid \uXXXX escapes

* Implement \uXXXX surrogate pairs as specified by RFC 7159

* Don't silently ignore \u0000; map it to \xC0\x80 (modified UTF-8)

* qobject_from_json() is ridiculously broken for input containing more
  than one value; fix it

* Don't ignore trailing unterminated structures

* Less cavalierly cruel error reporting

Topped off with tests and cleanups.

If you're into this kind of disaster relief, commit c7a3f25200c
"qapi.py: Restructure lexer and parser" was even funnier.
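To illustrate the two \uXXXX fixes listed above, here is a minimal standalone sketch; it is not QEMU's code, and combine_surrogates() and encode_modified_utf8() are hypothetical helpers.  A high/low surrogate pair such as \uD83D\uDE00 combines into one code point (RFC 7159 section 7), and U+0000 is encoded as the overlong pair C0 80, which keeps NUL bytes out of the resulting C string.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into one code point.
 * Returns -1 unless @hi is a high and @lo a low surrogate. */
static int32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
        return -1;
    }
    return 0x10000 + (((int32_t)(hi - 0xD800) << 10) | (lo - 0xDC00));
}

/* Encode @cp as UTF-8 into @buf (room for 4 bytes), mapping U+0000 to
 * the overlong pair C0 80 ("modified UTF-8").  Returns the number of
 * bytes written, or 0 for surrogates and out-of-range values. */
static size_t encode_modified_utf8(int32_t cp, unsigned char *buf)
{
    if (cp == 0) {
        buf[0] = 0xC0;
        buf[1] = 0x80;
        return 2;
    }
    if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, \uD83D\uDE00 combines to U+1F600, which encodes as F0 9F 98 80, while \u0000 comes out as C0 80 rather than a terminating NUL.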

Marc-André Lureau (2):
  json: remove useless return value from lexer/parser
  json-parser: simplify and avoid JSONParserContext allocation

Markus Armbruster (54):
  check-qjson: Cover multiple JSON objects in same string
  check-qjson: Cover blank and lexically erroneous input
  check-qjson: Cover whitespace more thoroughly
  qmp-cmd-test: Split off qmp-test
  qmp-test: Cover syntax and lexical errors
  test-qga: Clean up how we test QGA synchronization
  check-qjson: Cover escaped characters more thoroughly, part 1
  check-qjson: Streamline escaped_string()'s test strings
  check-qjson: Cover escaped characters more thoroughly, part 2
  check-qjson: Drop redundant string tests
  check-qjson: Cover UTF-8 in single quoted strings
  check-qjson: Simplify utf8_string()
  check-qjson: Fix utf8_string() to test all invalid sequences
  check-qjson qmp-test: Cover control characters more thoroughly
  check-qjson: Cover interpolation more thoroughly
  json: Fix lexer to include the bad character in JSON_ERROR token
  json: Reject unescaped control characters
  json: Revamp lexer documentation
  json: Tighten and simplify qstring_from_escaped_str()'s loop
  check-qjson: Document we expect invalid UTF-8 to be rejected
  json: Reject invalid UTF-8 sequences
  json: Report first rather than last parse error
  json: Leave rejecting invalid UTF-8 to parser
  json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  json: Leave rejecting invalid escape sequences to parser
  json: Simplify parse_string()
  json: Reject invalid \uXXXX, fix \u0000
  json: Fix \uXXXX for surrogate pairs
  check-qjson: Fix and enable utf8_string()'s disabled part
  json: Have lexer call streamer directly
  json: Redesign the callback to consume JSON values
  json: Don't pass null @tokens to json_parser_parse()
  json: Don't create JSON_ERROR tokens that won't be used
  json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL
  json: Treat unwanted interpolation as lexical error
  json: Pass lexical errors and limit violations to callback
  json: Leave rejecting invalid interpolation to parser
  json: Replace %I64d, %I64u by %PRId64, %PRIu64
  json: Nicer recovery from invalid leading zero
  json: Improve names of lexer states related to numbers
  qjson: Fix qobject_from_json() & friends for multiple values
  json: Fix latent parser aborts at end of input
  json: Fix streamer not to ignore trailing unterminated structures
  json: Assert json_parser_parse() consumes all tokens on success
  qjson: Have qobject_from_json() & friends reject empty and blank
  json: Enforce token count and size limits more tightly
  json: Streamline json_message_process_token()
  json: Unbox tokens queue in JSONMessageParser
  json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN
  json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP
  json: Make JSONToken opaque outside json-parser.c
  qobject: Drop superfluous includes of qemu-common.h
  json: Clean up headers
  docs/interop/qmp-spec: How to force known good parser state

 MAINTAINERS                      |    1 +
 block.c                          |    5 -
 docs/interop/qmp-spec.txt        |   37 +-
 include/qapi/qmp/json-lexer.h    |   56 --
 include/qapi/qmp/json-parser.h   |   36 +-
 include/qapi/qmp/json-streamer.h |   46 --
 include/qapi/qmp/qerror.h        |    3 -
 include/qemu/unicode.h           |    1 +
 monitor.c                        |   21 +-
 qapi/qmp-dispatch.c              |    1 -
 qapi/qobject-input-visitor.c     |    5 -
 qga/main.c                       |   15 +-
 qobject/json-lexer.c             |  361 +++++------
 qobject/json-parser-int.h        |   51 ++
 qobject/json-parser.c            |  298 ++++-----
 qobject/json-streamer.c          |  126 ++--
 qobject/qbool.c                  |    1 -
 qobject/qjson.c                  |   31 +-
 qobject/qlist.c                  |    1 -
 qobject/qnull.c                  |    1 -
 qobject/qnum.c                   |    1 -
 qobject/qobject.c                |    1 -
 qobject/qstring.c                |    1 -
 tests/Makefile.include           |    3 +
 tests/check-qjson.c              | 1017 +++++++++++++++---------------
 tests/libqtest.c                 |   56 +-
 tests/libqtest.h                 |   13 +
 tests/qmp-cmd-test.c             |  213 +++++++
 tests/qmp-test.c                 |  248 ++------
 tests/test-qga.c                 |    8 +-
 util/unicode.c                   |   69 +-
 31 files changed, 1398 insertions(+), 1329 deletions(-)
 delete mode 100644 include/qapi/qmp/json-lexer.h
 delete mode 100644 include/qapi/qmp/json-streamer.h
 create mode 100644 qobject/json-parser-int.h
 create mode 100644 tests/qmp-cmd-test.c

-- 
2.17.1

* [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:25   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input Markus Armbruster
                   ` (55 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

qobject_from_json() & friends misbehave when the JSON text has more
than one JSON value.  Add test coverage to demonstrate the bugs.
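Why more than one value is a problem for a single-value API becomes clearer once you see how input splits into top-level values.  The following is a minimal sketch, not QEMU code (count_top_level_values() is hypothetical): it tracks bracket and brace depth, roughly the kind of bookkeeping a JSON streamer does, and treats single-quoted strings like double-quoted ones because QEMU's JSON dialect accepts both.  It does not validate tokens.

```c
#include <ctype.h>
#include <string.h>

/*
 * Count the top-level values in @s by tracking [ ] and { } depth.
 * Returns -1 for a stray closing bracket, an unterminated string, or
 * an unterminated structure.  Tokens are not validated: "tru" counts
 * as one value, and stray ',' or ':' are simply skipped.
 */
static int count_top_level_values(const char *s)
{
    int depth = 0, count = 0;
    const char *p = s;

    while (*p) {
        if (isspace((unsigned char)*p) || *p == ',' || *p == ':') {
            p++;
        } else if (*p == '"' || *p == '\'') {
            char quote = *p++;

            /* Skip the string body, stepping over backslash escapes */
            while (*p && *p != quote) {
                if (*p == '\\' && p[1]) {
                    p++;
                }
                p++;
            }
            if (!*p) {
                return -1;      /* unterminated string */
            }
            p++;
            if (depth == 0) {
                count++;
            }
        } else if (*p == '[' || *p == '{') {
            depth++;
            p++;
        } else if (*p == ']' || *p == '}') {
            if (depth == 0) {
                return -1;      /* closes nothing, as in "} true" */
            }
            depth--;
            p++;
            if (depth == 0) {
                count++;        /* a complete structure at top level */
            }
        } else {
            /* Scalar (number or literal): runs up to a delimiter */
            while (*p && !isspace((unsigned char)*p)
                   && !strchr("[]{},:\"'", *p)) {
                p++;
            }
            if (depth == 0) {
                count++;
            }
        }
    }
    return depth == 0 ? count : -1;   /* "[1" is left unterminated */
}
```

With this sketch, "false true" yields 2 values, which is exactly the input a function documented to return one value must either reject or handle deliberately.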

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index eaf5d20663..cc952c56ea 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1418,6 +1418,25 @@ static void limits_nesting(void)
     g_assert(obj == NULL);
 }
 
+static void multiple_values(void)
+{
+    Error *err = NULL;
+    QObject *obj;
+
+    /* BUG this leaks the syntax tree for "false" */
+    obj = qobject_from_json("false true", &err);
+    g_assert(qbool_get_bool(qobject_to(QBool, obj)));
+    g_assert(!err);
+    qobject_unref(obj);
+
+    /* BUG simultaneously succeeds and fails */
+    /* BUG calls json_parser_parse() with errp pointing to non-null */
+    obj = qobject_from_json("} true", &err);
+    g_assert(qbool_get_bool(qobject_to(QBool, obj)));
+    error_free_or_abort(&err);
+    qobject_unref(obj);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
@@ -1455,6 +1474,7 @@ int main(int argc, char **argv)
     g_test_add_func("/errors/invalid_dict_comma", invalid_dict_comma);
     g_test_add_func("/errors/unterminated/literal", unterminated_literal);
     g_test_add_func("/errors/limits/nesting", limits_nesting);
+    g_test_add_func("/errors/multiple_values", multiple_values);
 
     return g_test_run();
 }
-- 
2.17.1

* [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:29   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly Markus Armbruster
                   ` (54 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

qobject_from_json() can return null without setting an error when it
encounters a lexical error.  I call that a bug.  Add test coverage to
demonstrate it.
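Two of the junk inputs covered here, "00" and "[1e", are malformed numbers.  A minimal sketch of the number grammar from RFC 7159 section 6 shows why both must be rejected; is_json_number() is a hypothetical helper and not how QEMU's lexer is structured.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Return true iff @s is exactly one JSON number, per RFC 7159:
 *   number = [ "-" ] int [ frac ] [ exp ]
 *   int    = "0" / ( digit1-9 *DIGIT )
 */
static bool is_json_number(const char *s)
{
    const char *p = s;

    if (*p == '-') {
        p++;
    }
    if (*p == '0') {
        p++;                    /* a leading zero must stand alone */
    } else if (*p >= '1' && *p <= '9') {
        while (isdigit((unsigned char)*p)) {
            p++;
        }
    } else {
        return false;           /* no integer part at all */
    }
    if (*p == '.') {
        p++;
        if (!isdigit((unsigned char)*p)) {
            return false;       /* "1." lacks fraction digits */
        }
        while (isdigit((unsigned char)*p)) {
            p++;
        }
    }
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-') {
            p++;
        }
        if (!isdigit((unsigned char)*p)) {
            return false;       /* "1e" lacks exponent digits */
        }
        while (isdigit((unsigned char)*p)) {
            p++;
        }
    }
    return *p == '\0';          /* "00" fails here: trailing '0' */
}
```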

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index cc952c56ea..81b92d6b0c 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1307,8 +1307,36 @@ static void simple_varargs(void)
 
 static void empty_input(void)
 {
-    const char *empty = "";
-    QObject *obj = qobject_from_json(empty, &error_abort);
+    QObject *obj = qobject_from_json("", &error_abort);
+    g_assert(obj == NULL);
+}
+
+static void blank_input(void)
+{
+    QObject *obj = qobject_from_json("\n ", &error_abort);
+    g_assert(obj == NULL);
+}
+
+static void junk_input(void)
+{
+    /* Note: junk within strings is covered elsewhere */
+    Error *err = NULL;
+    QObject *obj;
+
+    obj = qobject_from_json("@", &err);
+    g_assert(!err);             /* BUG */
+    g_assert(obj == NULL);
+
+    obj = qobject_from_json("[0\xFF]", &err);
+    error_free_or_abort(&err);
+    g_assert(obj == NULL);
+
+    obj = qobject_from_json("00", &err);
+    g_assert(!err);             /* BUG */
+    g_assert(obj == NULL);
+
+    obj = qobject_from_json("[1e", &err);
+    g_assert(!err);             /* BUG */
     g_assert(obj == NULL);
 }
 
@@ -1462,7 +1490,9 @@ int main(int argc, char **argv)
 
     g_test_add_func("/varargs/simple_varargs", simple_varargs);
 
-    g_test_add_func("/errors/empty_input", empty_input);
+    g_test_add_func("/errors/empty", empty_input);
+    g_test_add_func("/errors/blank", blank_input);
+    g_test_add_func("/errors/junk", junk_input);
     g_test_add_func("/errors/unterminated/string", unterminated_string);
     g_test_add_func("/errors/unterminated/escape", unterminated_escape);
     g_test_add_func("/errors/unterminated/sq_string", unterminated_sq_string);
-- 
2.17.1

* [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string Markus Armbruster
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:36   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test Markus Armbruster
                   ` (53 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 81b92d6b0c..0a9a054c7b 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1236,7 +1236,7 @@ static void simple_whitespace(void)
                     })),
         },
         {
-            .encoded = " [ 43 , { 'h' : 'b' }, [ ], 42 ]",
+            .encoded = "\t[ 43 , { 'h' : 'b' },\n\t[ ], 42 ]\n",
             .decoded = QLIT_QLIST(((QLitObject[]){
                         QLIT_QNUM(43),
                         QLIT_QDICT(((QLitDictEntry[]){
-- 
2.17.1

* [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (2 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:38   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors Markus Armbruster
                   ` (52 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

qmp-test is for QMP protocol tests.  Commit e4a426e75ef added generic,
basic tests of query commands to it.  Move them to their own test
program qmp-cmd-test, to keep qmp-test focused on the protocol.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 MAINTAINERS            |   1 +
 tests/Makefile.include |   3 +
 tests/qmp-cmd-test.c   | 213 +++++++++++++++++++++++++++++++++++++++++
 tests/qmp-test.c       | 191 +-----------------------------------
 4 files changed, 218 insertions(+), 190 deletions(-)
 create mode 100644 tests/qmp-cmd-test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 666e936812..dc129b7034 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1695,6 +1695,7 @@ F: monitor.c
 F: docs/devel/*qmp-*
 F: scripts/qmp/
 F: tests/qmp-test.c
+F: tests/qmp-cmd-test.c
 T: git git://repo.or.cz/qemu/armbru.git qapi-next
 
 Register API
diff --git a/tests/Makefile.include b/tests/Makefile.include
index a49282704e..48f18ad581 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -179,6 +179,8 @@ check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
 check-qtest-generic-y = tests/qmp-test$(EXESUF)
 gcov-files-generic-y = monitor.c qapi/qmp-dispatch.c
+check-qtest-generic-y += tests/qmp-cmd-test$(EXESUF)
+
 check-qtest-generic-y += tests/device-introspect-test$(EXESUF)
 gcov-files-generic-y = qdev-monitor.c qmp.c
 check-qtest-generic-y += tests/cdrom-test$(EXESUF)
@@ -770,6 +772,7 @@ libqos-usb-obj-y = $(libqos-spapr-obj-y) $(libqos-pc-obj-y) tests/libqos/usb.o
 libqos-virtio-obj-y = $(libqos-spapr-obj-y) $(libqos-pc-obj-y) tests/libqos/virtio.o tests/libqos/virtio-pci.o tests/libqos/virtio-mmio.o tests/libqos/malloc-generic.o
 
 tests/qmp-test$(EXESUF): tests/qmp-test.o
+tests/qmp-cmd-test$(EXESUF): tests/qmp-cmd-test.o
 tests/device-introspect-test$(EXESUF): tests/device-introspect-test.o
 tests/rtc-test$(EXESUF): tests/rtc-test.o
 tests/m48t59-test$(EXESUF): tests/m48t59-test.o
diff --git a/tests/qmp-cmd-test.c b/tests/qmp-cmd-test.c
new file mode 100644
index 0000000000..5e4a831adb
--- /dev/null
+++ b/tests/qmp-cmd-test.c
@@ -0,0 +1,213 @@
+/*
+ * QMP command test cases
+ *
+ * Copyright (c) 2017 Red Hat Inc.
+ *
+ * Authors:
+ *  Markus Armbruster <armbru@redhat.com>,
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-introspect.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qobject-input-visitor.h"
+
+const char common_args[] = "-nodefaults -machine none";
+
+/* Query smoke tests */
+
+static int query_error_class(const char *cmd)
+{
+    static struct {
+        const char *cmd;
+        int err_class;
+    } fails[] = {
+        /* Success depends on build configuration: */
+#ifndef CONFIG_SPICE
+        { "query-spice", ERROR_CLASS_COMMAND_NOT_FOUND },
+#endif
+#ifndef CONFIG_VNC
+        { "query-vnc", ERROR_CLASS_GENERIC_ERROR },
+        { "query-vnc-servers", ERROR_CLASS_GENERIC_ERROR },
+#endif
+#ifndef CONFIG_REPLICATION
+        { "query-xen-replication-status", ERROR_CLASS_COMMAND_NOT_FOUND },
+#endif
+        /* Likewise, and require special QEMU command-line arguments: */
+        { "query-acpi-ospm-status", ERROR_CLASS_GENERIC_ERROR },
+        { "query-balloon", ERROR_CLASS_DEVICE_NOT_ACTIVE },
+        { "query-hotpluggable-cpus", ERROR_CLASS_GENERIC_ERROR },
+        { "query-vm-generation-id", ERROR_CLASS_GENERIC_ERROR },
+        { NULL, -1 }
+    };
+    int i;
+
+    for (i = 0; fails[i].cmd; i++) {
+        if (!strcmp(cmd, fails[i].cmd)) {
+            return fails[i].err_class;
+        }
+    }
+    return -1;
+}
+
+static void test_query(const void *data)
+{
+    const char *cmd = data;
+    int expected_error_class = query_error_class(cmd);
+    QDict *resp, *error;
+    const char *error_class;
+
+    qtest_start(common_args);
+
+    resp = qmp("{ 'execute': %s }", cmd);
+    error = qdict_get_qdict(resp, "error");
+    error_class = error ? qdict_get_str(error, "class") : NULL;
+
+    if (expected_error_class < 0) {
+        g_assert(qdict_haskey(resp, "return"));
+    } else {
+        g_assert(error);
+        g_assert_cmpint(qapi_enum_parse(&QapiErrorClass_lookup, error_class,
+                                        -1, &error_abort),
+                        ==, expected_error_class);
+    }
+    qobject_unref(resp);
+
+    qtest_end();
+}
+
+static bool query_is_blacklisted(const char *cmd)
+{
+    const char *blacklist[] = {
+        /* Not actually queries: */
+        "add-fd",
+        /* Success depends on target arch: */
+        "query-cpu-definitions",  /* arm, i386, ppc, s390x */
+        "query-gic-capabilities", /* arm */
+        /* Success depends on target-specific build configuration: */
+        "query-pci",              /* CONFIG_PCI */
+        /* Success depends on launching SEV guest */
+        "query-sev-launch-measure",
+        /* Success depends on Host or Hypervisor SEV support */
+        "query-sev",
+        "query-sev-capabilities",
+        NULL
+    };
+    int i;
+
+    for (i = 0; blacklist[i]; i++) {
+        if (!strcmp(cmd, blacklist[i])) {
+            return true;
+        }
+    }
+    return false;
+}
+
+typedef struct {
+    SchemaInfoList *list;
+    GHashTable *hash;
+} QmpSchema;
+
+static void qmp_schema_init(QmpSchema *schema)
+{
+    QDict *resp;
+    Visitor *qiv;
+    SchemaInfoList *tail;
+
+    qtest_start(common_args);
+    resp = qmp("{ 'execute': 'query-qmp-schema' }");
+
+    qiv = qobject_input_visitor_new(qdict_get(resp, "return"));
+    visit_type_SchemaInfoList(qiv, NULL, &schema->list, &error_abort);
+    visit_free(qiv);
+
+    qobject_unref(resp);
+    qtest_end();
+
+    schema->hash = g_hash_table_new(g_str_hash, g_str_equal);
+
+    /* Build @schema: hash table mapping entity name to SchemaInfo */
+    for (tail = schema->list; tail; tail = tail->next) {
+        g_hash_table_insert(schema->hash, tail->value->name, tail->value);
+    }
+}
+
+static SchemaInfo *qmp_schema_lookup(QmpSchema *schema, const char *name)
+{
+    return g_hash_table_lookup(schema->hash, name);
+}
+
+static void qmp_schema_cleanup(QmpSchema *schema)
+{
+    qapi_free_SchemaInfoList(schema->list);
+    g_hash_table_destroy(schema->hash);
+}
+
+static bool object_type_has_mandatory_members(SchemaInfo *type)
+{
+    SchemaInfoObjectMemberList *tail;
+
+    g_assert(type->meta_type == SCHEMA_META_TYPE_OBJECT);
+
+    for (tail = type->u.object.members; tail; tail = tail->next) {
+        if (!tail->value->has_q_default) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+static void add_query_tests(QmpSchema *schema)
+{
+    SchemaInfoList *tail;
+    SchemaInfo *si, *arg_type, *ret_type;
+    char *test_name;
+
+    /* Test the query-like commands */
+    for (tail = schema->list; tail; tail = tail->next) {
+        si = tail->value;
+        if (si->meta_type != SCHEMA_META_TYPE_COMMAND) {
+            continue;
+        }
+
+        if (query_is_blacklisted(si->name)) {
+            continue;
+        }
+
+        arg_type = qmp_schema_lookup(schema, si->u.command.arg_type);
+        if (object_type_has_mandatory_members(arg_type)) {
+            continue;
+        }
+
+        ret_type = qmp_schema_lookup(schema, si->u.command.ret_type);
+        if (ret_type->meta_type == SCHEMA_META_TYPE_OBJECT
+            && !ret_type->u.object.members) {
+            continue;
+        }
+
+        test_name = g_strdup_printf("qmp/%s", si->name);
+        qtest_add_data_func(test_name, si->name, test_query);
+        g_free(test_name);
+    }
+}
+
+int main(int argc, char *argv[])
+{
+    QmpSchema schema;
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+
+    qmp_schema_init(&schema);
+    add_query_tests(&schema);
+    ret = g_test_run();
+
+    qmp_schema_cleanup(&schema);
+    return ret;
+}
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 487ef946ed..b6eff4fe97 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -13,13 +13,10 @@
 #include "qemu/osdep.h"
 #include "libqtest.h"
 #include "qapi/error.h"
-#include "qapi/qapi-visit-introspect.h"
 #include "qapi/qapi-visit-misc.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qlist.h"
 #include "qapi/qobject-input-visitor.h"
-#include "qapi/util.h"
-#include "qapi/visitor.h"
 #include "qapi/qmp/qstring.h"
 
 const char common_args[] = "-nodefaults -machine none";
@@ -253,184 +250,6 @@ static void test_qmp_oob(void)
     qtest_quit(qts);
 }
 
-/* Query smoke tests */
-
-static int query_error_class(const char *cmd)
-{
-    static struct {
-        const char *cmd;
-        int err_class;
-    } fails[] = {
-        /* Success depends on build configuration: */
-#ifndef CONFIG_SPICE
-        { "query-spice", ERROR_CLASS_COMMAND_NOT_FOUND },
-#endif
-#ifndef CONFIG_VNC
-        { "query-vnc", ERROR_CLASS_GENERIC_ERROR },
-        { "query-vnc-servers", ERROR_CLASS_GENERIC_ERROR },
-#endif
-#ifndef CONFIG_REPLICATION
-        { "query-xen-replication-status", ERROR_CLASS_COMMAND_NOT_FOUND },
-#endif
-        /* Likewise, and require special QEMU command-line arguments: */
-        { "query-acpi-ospm-status", ERROR_CLASS_GENERIC_ERROR },
-        { "query-balloon", ERROR_CLASS_DEVICE_NOT_ACTIVE },
-        { "query-hotpluggable-cpus", ERROR_CLASS_GENERIC_ERROR },
-        { "query-vm-generation-id", ERROR_CLASS_GENERIC_ERROR },
-        { NULL, -1 }
-    };
-    int i;
-
-    for (i = 0; fails[i].cmd; i++) {
-        if (!strcmp(cmd, fails[i].cmd)) {
-            return fails[i].err_class;
-        }
-    }
-    return -1;
-}
-
-static void test_query(const void *data)
-{
-    const char *cmd = data;
-    int expected_error_class = query_error_class(cmd);
-    QDict *resp, *error;
-    const char *error_class;
-
-    qtest_start(common_args);
-
-    resp = qmp("{ 'execute': %s }", cmd);
-    error = qdict_get_qdict(resp, "error");
-    error_class = error ? qdict_get_str(error, "class") : NULL;
-
-    if (expected_error_class < 0) {
-        g_assert(qdict_haskey(resp, "return"));
-    } else {
-        g_assert(error);
-        g_assert_cmpint(qapi_enum_parse(&QapiErrorClass_lookup, error_class,
-                                        -1, &error_abort),
-                        ==, expected_error_class);
-    }
-    qobject_unref(resp);
-
-    qtest_end();
-}
-
-static bool query_is_blacklisted(const char *cmd)
-{
-    const char *blacklist[] = {
-        /* Not actually queries: */
-        "add-fd",
-        /* Success depends on target arch: */
-        "query-cpu-definitions",  /* arm, i386, ppc, s390x */
-        "query-gic-capabilities", /* arm */
-        /* Success depends on target-specific build configuration: */
-        "query-pci",              /* CONFIG_PCI */
-        /* Success depends on launching SEV guest */
-        "query-sev-launch-measure",
-        /* Success depends on Host or Hypervisor SEV support */
-        "query-sev",
-        "query-sev-capabilities",
-        NULL
-    };
-    int i;
-
-    for (i = 0; blacklist[i]; i++) {
-        if (!strcmp(cmd, blacklist[i])) {
-            return true;
-        }
-    }
-    return false;
-}
-
-typedef struct {
-    SchemaInfoList *list;
-    GHashTable *hash;
-} QmpSchema;
-
-static void qmp_schema_init(QmpSchema *schema)
-{
-    QDict *resp;
-    Visitor *qiv;
-    SchemaInfoList *tail;
-
-    qtest_start(common_args);
-    resp = qmp("{ 'execute': 'query-qmp-schema' }");
-
-    qiv = qobject_input_visitor_new(qdict_get(resp, "return"));
-    visit_type_SchemaInfoList(qiv, NULL, &schema->list, &error_abort);
-    visit_free(qiv);
-
-    qobject_unref(resp);
-    qtest_end();
-
-    schema->hash = g_hash_table_new(g_str_hash, g_str_equal);
-
-    /* Build @schema: hash table mapping entity name to SchemaInfo */
-    for (tail = schema->list; tail; tail = tail->next) {
-        g_hash_table_insert(schema->hash, tail->value->name, tail->value);
-    }
-}
-
-static SchemaInfo *qmp_schema_lookup(QmpSchema *schema, const char *name)
-{
-    return g_hash_table_lookup(schema->hash, name);
-}
-
-static void qmp_schema_cleanup(QmpSchema *schema)
-{
-    qapi_free_SchemaInfoList(schema->list);
-    g_hash_table_destroy(schema->hash);
-}
-
-static bool object_type_has_mandatory_members(SchemaInfo *type)
-{
-    SchemaInfoObjectMemberList *tail;
-
-    g_assert(type->meta_type == SCHEMA_META_TYPE_OBJECT);
-
-    for (tail = type->u.object.members; tail; tail = tail->next) {
-        if (!tail->value->has_q_default) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
-static void add_query_tests(QmpSchema *schema)
-{
-    SchemaInfoList *tail;
-    SchemaInfo *si, *arg_type, *ret_type;
-    char *test_name;
-
-    /* Test the query-like commands */
-    for (tail = schema->list; tail; tail = tail->next) {
-        si = tail->value;
-        if (si->meta_type != SCHEMA_META_TYPE_COMMAND) {
-            continue;
-        }
-
-        if (query_is_blacklisted(si->name)) {
-            continue;
-        }
-
-        arg_type = qmp_schema_lookup(schema, si->u.command.arg_type);
-        if (object_type_has_mandatory_members(arg_type)) {
-            continue;
-        }
-
-        ret_type = qmp_schema_lookup(schema, si->u.command.ret_type);
-        if (ret_type->meta_type == SCHEMA_META_TYPE_OBJECT
-            && !ret_type->u.object.members) {
-            continue;
-        }
-
-        test_name = g_strdup_printf("qmp/%s", si->name);
-        qtest_add_data_func(test_name, si->name, test_query);
-        g_free(test_name);
-    }
-}
-
 /* Preconfig tests */
 
 static void test_qmp_preconfig(void)
@@ -474,19 +293,11 @@ static void test_qmp_preconfig(void)
 
 int main(int argc, char *argv[])
 {
-    QmpSchema schema;
-    int ret;
-
     g_test_init(&argc, &argv, NULL);
 
     qtest_add_func("qmp/protocol", test_qmp_protocol);
     qtest_add_func("qmp/oob", test_qmp_oob);
-    qmp_schema_init(&schema);
-    add_query_tests(&schema);
     qtest_add_func("qmp/preconfig", test_qmp_preconfig);
 
-    ret = g_test_run();
-
-    qmp_schema_cleanup(&schema);
-    return ret;
+    return g_test_run();
 }
-- 
2.17.1

* [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (3 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:42   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization Markus Armbruster
                   ` (51 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/libqtest.c | 17 +++++++++++++++++
 tests/libqtest.h | 11 +++++++++++
 tests/qmp-test.c | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 3706f30aa2..c02fc91b37 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -586,6 +586,23 @@ void qtest_qmp_send(QTestState *s, const char *fmt, ...)
     va_end(ap);
 }
 
+void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
+{
+    bool log = getenv("QTEST_LOG") != NULL;
+    va_list ap;
+    char *str;
+
+    va_start(ap, fmt);
+    str = g_strdup_vprintf(fmt, ap);
+    va_end(ap);
+
+    if (log) {
+        fprintf(stderr, "%s", str);
+    }
+    socket_send(s->qmp_fd, str, strlen(str));
+    g_free(str);
+}
+
 QDict *qtest_qmp_eventwait_ref(QTestState *s, const char *event)
 {
     QDict *response;
diff --git a/tests/libqtest.h b/tests/libqtest.h
index def1edaafa..1e831973ff 100644
--- a/tests/libqtest.h
+++ b/tests/libqtest.h
@@ -96,6 +96,17 @@ QDict *qtest_qmp(QTestState *s, const char *fmt, ...)
 void qtest_qmp_send(QTestState *s, const char *fmt, ...)
     GCC_FMT_ATTR(2, 3);
 
+/**
+ * qtest_qmp_send_raw:
+ * @s: #QTestState instance to operate on.
+ * @fmt...: text to send, formatted like sprintf()
+ *
+ * Sends text to the QMP monitor verbatim.  Need not be valid JSON;
+ * this is useful for negative tests.
+ */
+void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
+    GCC_FMT_ATTR(2, 3);
+
 /**
  * qtest_qmpv:
  * @s: #QTestState instance to operate on.
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index b6eff4fe97..5e56be105e 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -42,10 +42,49 @@ static void test_version(QObject *version)
     visit_free(v);
 }
 
+static bool recovered(QTestState *qts)
+{
+    QDict *resp;
+    bool ret;
+
+    resp = qtest_qmp(qts, "{ 'execute': 'no-such-cmd' }");
+    ret = !strcmp(get_error_class(resp), "CommandNotFound");
+    qobject_unref(resp);
+    return ret;
+}
+
 static void test_malformed(QTestState *qts)
 {
     QDict *resp;
 
+    /* syntax error */
+    qtest_qmp_send_raw(qts, "{]\n");
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
+    qobject_unref(resp);
+    g_assert(recovered(qts));
+
+    /* lexical error: impossible byte outside string */
+    qtest_qmp_send_raw(qts, "{\xFF");
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
+    qobject_unref(resp);
+    g_assert(recovered(qts));
+
+    /* lexical error: impossible byte in string */
+    qtest_qmp_send_raw(qts, "{'bad \xFF");
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
+    qobject_unref(resp);
+    g_assert(recovered(qts));
+
+    /* lexical error: interpolation */
+    qtest_qmp_send_raw(qts, "%%p\n");
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
+    qobject_unref(resp);
+    g_assert(recovered(qts));
+
     /* Not even a dictionary */
     resp = qtest_qmp(qts, "null");
     g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
-- 
2.17.1

* [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (4 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:46   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1 Markus Armbruster
                   ` (50 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

To permit recovering from arbitrary JSON parse errors, the JSON parser
resets itself on lexical errors.  We recommend sending a 0xff byte for
that purpose, and test-qga has covered this usage since commit
5229564b832.  That commit had to add an ugly hack to qmp_fd_vsend() to
make it capable of sending this byte (it's designed to send only valid
JSON).

The previous commit added a way to send arbitrary text.  Put that to
use for this purpose, and drop the hack from qmp_fd_vsend().
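
0xFF can never occur in well-formed UTF-8, so the lexer always treats
it as an error and resets.  The following is a minimal POSIX sketch of
a raw, printf-style send that delivers such a byte: the text is
formatted and written verbatim, with no round trip through the JSON
parser.  vsend_raw() and send_raw() are hypothetical analogues of
qmp_fd_vsend_raw() and qmp_fd_send_raw(); they use plain write() where
libqtest uses its own socket_send().

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical analogue of qmp_fd_vsend_raw(): format, then write verbatim. */
static int vsend_raw(int fd, const char *fmt, va_list ap)
{
    va_list ap2;
    char *buf;
    size_t off = 0;
    int len;

    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap2);     /* measure the output first */
    va_end(ap2);
    if (len < 0 || !(buf = malloc((size_t)len + 1))) {
        return -1;
    }
    vsnprintf(buf, (size_t)len + 1, fmt, ap);

    /* No JSON round trip, so a leading 0xFF byte goes out unchanged. */
    while (off < (size_t)len) {
        ssize_t n = write(fd, buf + off, (size_t)len - off);

        if (n < 0 && errno == EINTR) {
            continue;                       /* interrupted, retry */
        }
        if (n < 0) {
            free(buf);
            return -1;
        }
        off += (size_t)n;                   /* handle short writes */
    }
    free(buf);
    return len;
}

/* Hypothetical analogue of qmp_fd_send_raw(). */
static int send_raw(int fd, const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    ret = vsend_raw(fd, fmt, ap);
    va_end(ap);
    return ret;
}
```

Because the helper never parses its input, the same call works for
deliberately malformed text such as "{]", which is what the syntax
and lexical error tests need.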

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/libqtest.c | 39 +++++++++++++++++++++------------------
 tests/libqtest.h |  2 ++
 tests/test-qga.c |  8 ++++----
 3 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index c02fc91b37..9c844874e4 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -489,16 +489,6 @@ void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
 {
     QObject *qobj;
 
-    /*
-     * qobject_from_vjsonf_nofail() chokes on leading 0xff as invalid
-     * JSON, but tests/test-qga.c needs to send that to test QGA
-     * synchronization
-     */
-    if (*fmt == '\377') {
-        socket_send(fd, fmt, 1);
-        fmt++;
-    }
-
     /* Going through qobject ensures we escape strings properly */
     qobj = qobject_from_vjsonf_nofail(fmt, ap);
 
@@ -586,23 +576,36 @@ void qtest_qmp_send(QTestState *s, const char *fmt, ...)
     va_end(ap);
 }
 
-void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
+void qmp_fd_vsend_raw(int fd, const char *fmt, va_list ap)
 {
     bool log = getenv("QTEST_LOG") != NULL;
-    va_list ap;
-    char *str;
-
-    va_start(ap, fmt);
-    str = g_strdup_vprintf(fmt, ap);
-    va_end(ap);
+    char *str = g_strdup_vprintf(fmt, ap);
 
     if (log) {
         fprintf(stderr, "%s", str);
     }
-    socket_send(s->qmp_fd, str, strlen(str));
+    socket_send(fd, str, strlen(str));
     g_free(str);
 }
 
+void qmp_fd_send_raw(int fd, const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+    qmp_fd_vsend_raw(fd, fmt, ap);
+    va_end(ap);
+}
+
+void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
+{
+    va_list ap;
+
+    va_start(ap, fmt);
+    qmp_fd_vsend_raw(s->qmp_fd, fmt, ap);
+    va_end(ap);
+}
+
 QDict *qtest_qmp_eventwait_ref(QTestState *s, const char *event)
 {
     QDict *response;
diff --git a/tests/libqtest.h b/tests/libqtest.h
index 1e831973ff..2d1eb4b282 100644
--- a/tests/libqtest.h
+++ b/tests/libqtest.h
@@ -959,6 +959,8 @@ static inline int64_t clock_set(int64_t val)
 QDict *qmp_fd_receive(int fd);
 void qmp_fd_vsend(int fd, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0);
 void qmp_fd_send(int fd, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
+void qmp_fd_send_raw(int fd, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
+void qmp_fd_vsend_raw(int fd, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0);
 QDict *qmp_fdv(int fd, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0);
 QDict *qmp_fd(int fd, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 
diff --git a/tests/test-qga.c b/tests/test-qga.c
index c552cc0125..4e51898d23 100644
--- a/tests/test-qga.c
+++ b/tests/test-qga.c
@@ -147,10 +147,10 @@ static void test_qga_sync_delimited(gconstpointer fix)
     unsigned char c;
     QDict *ret;
 
-    qmp_fd_send(fixture->fd,
-                "\xff{'execute': 'guest-sync-delimited',"
-                " 'arguments': {'id': %u } }",
-                r);
+    qmp_fd_send_raw(fixture->fd,
+                    "\xff{'execute': 'guest-sync-delimited',"
+                    " 'arguments': {'id': %u } }",
+                    r);
 
     /*
      * Read and ignore garbage until resynchronized.
-- 
2.17.1

* [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (5 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:54   ` Eric Blake
  2018-08-09 14:00   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings Markus Armbruster
                   ` (49 subsequent siblings)
  56 siblings, 2 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

escaped_string() first tests double quoted strings, then repeats a few
tests with single quotes.  Repeat all of them: store the strings to
test without quotes, and wrap them in either kind of quote for
testing.
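
The idea behind the new from_json_str() is to keep only the string
content in the table and add the quotes at test time.  A minimal
sketch of just the wrapping step, using a hypothetical wrap_quoted()
helper with plain malloc() instead of GLib; the caller owns and frees
the result.  It only works because no table entry contains an
unescaped quote character of either kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: wrap unquoted JSON string content in ' or ",
 * so one table entry can be parsed both ways.  Caller frees the result.
 */
static char *wrap_quoted(const char *content, bool single)
{
    char quote = single ? '\'' : '"';
    size_t len = strlen(content);
    char *s = malloc(len + 3);          /* two quotes plus terminator */

    if (!s) {
        return NULL;
    }
    s[0] = quote;
    memcpy(s + 1, content, len);
    s[len + 1] = quote;
    s[len + 2] = '\0';
    return s;
}
```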

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 94 ++++++++++++++++++++++++++-------------------
 1 file changed, 55 insertions(+), 39 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 0a9a054c7b..1c7f24bc4d 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -22,55 +22,71 @@
 #include "qapi/qmp/qstring.h"
 #include "qemu-common.h"
 
+static QString *from_json_str(const char *jstr, Error **errp, bool single)
+{
+    char quote = single ? '\'' : '"';
+    char *qjstr = g_strdup_printf("%c%s%c", quote, jstr, quote);
+
+    return qobject_to(QString, qobject_from_json(qjstr, errp));
+}
+
+static char *to_json_str(QString *str)
+{
+    QString *json = qobject_to_json(QOBJECT(str));
+    char *jstr;
+
+    if (!json) {
+        return NULL;
+    }
+    /* peel off double quotes */
+    jstr = g_strndup(qstring_get_str(json) + 1,
+                     qstring_get_length(json) - 2);
+    qobject_unref(json);
+    return jstr;
+}
+
 static void escaped_string(void)
 {
-    int i;
     struct {
-        const char *encoded;
-        const char *decoded;
+        /* Content of JSON string to parse with qobject_from_json() */
+        const char *json_in;
+        /* Expected parse output; to unparse with qobject_to_json() */
+        const char *utf8_out;
         int skip;
     } test_cases[] = {
-        { "\"\\b\"", "\b" },
-        { "\"\\f\"", "\f" },
-        { "\"\\n\"", "\n" },
-        { "\"\\r\"", "\r" },
-        { "\"\\t\"", "\t" },
-        { "\"/\"", "/" },
-        { "\"\\/\"", "/", .skip = 1 },
-        { "\"\\\\\"", "\\" },
-        { "\"\\\"\"", "\"" },
-        { "\"hello world \\\"embedded string\\\"\"",
+        { "\\b", "\b" },
+        { "\\f", "\f" },
+        { "\\n", "\n" },
+        { "\\r", "\r" },
+        { "\\t", "\t" },
+        { "/", "/" },
+        { "\\/", "/", .skip = 1 },
+        { "\\\\", "\\" },
+        { "\\\"", "\"" },
+        { "hello world \\\"embedded string\\\"",
           "hello world \"embedded string\"" },
-        { "\"hello world\\nwith new line\"", "hello world\nwith new line" },
-        { "\"single byte utf-8 \\u0020\"", "single byte utf-8  ", .skip = 1 },
-        { "\"double byte utf-8 \\u00A2\"", "double byte utf-8 \xc2\xa2" },
-        { "\"triple byte utf-8 \\u20AC\"", "triple byte utf-8 \xe2\x82\xac" },
-        { "'\\b'", "\b", .skip = 1 },
-        { "'\\f'", "\f", .skip = 1 },
-        { "'\\n'", "\n", .skip = 1 },
-        { "'\\r'", "\r", .skip = 1 },
-        { "'\\t'", "\t", .skip = 1 },
-        { "'\\/'", "/", .skip = 1 },
-        { "'\\\\'", "\\", .skip = 1 },
+        { "hello world\\nwith new line", "hello world\nwith new line" },
+        { "single byte utf-8 \\u0020", "single byte utf-8  ", .skip = 1 },
+        { "double byte utf-8 \\u00A2", "double byte utf-8 \xc2\xa2" },
+        { "triple byte utf-8 \\u20AC", "triple byte utf-8 \xe2\x82\xac" },
         {}
     };
+    int i, j;
+    QString *cstr;
+    char *jstr;
 
-    for (i = 0; test_cases[i].encoded; i++) {
-        QObject *obj;
-        QString *str;
-
-        obj = qobject_from_json(test_cases[i].encoded, &error_abort);
-        str = qobject_to(QString, obj);
-        g_assert(str);
-        g_assert_cmpstr(qstring_get_str(str), ==, test_cases[i].decoded);
-
-        if (test_cases[i].skip == 0) {
-            str = qobject_to_json(obj);
-            g_assert_cmpstr(qstring_get_str(str), ==, test_cases[i].encoded);
-            qobject_unref(obj);
+    for (i = 0; test_cases[i].json_in; i++) {
+        for (j = 0; j < 2; j++) {
+            cstr = from_json_str(test_cases[i].json_in, &error_abort, j);
+            g_assert_cmpstr(qstring_get_try_str(cstr),
+                            ==, test_cases[i].utf8_out);
+            if (test_cases[i].skip == 0) {
+                jstr = to_json_str(cstr);
+                g_assert_cmpstr(jstr, ==, test_cases[i].json_in);
+                g_free(jstr);
+            }
+            qobject_unref(cstr);
         }
-
-        qobject_unref(str);
     }
 }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (6 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1 Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 13:57   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2 Markus Armbruster
                   ` (48 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Merge a few closely related test strings, and drop a few redundant
ones.
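
The two merged entries pack every single-character escape into one
string each; the second shows \' decoding to ' as well.  A minimal
sketch of a decoder for exactly those escapes, as a hypothetical
decode_simple_escapes() helper.  It is not QEMU's lexer or parser, and
it deliberately leaves \uXXXX out, rejecting it together with unknown
escapes such as \z.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: decode single-character escapes only.
 * Returns a malloc'd string, or NULL on an unknown, \u, or truncated escape.
 */
static char *decode_simple_escapes(const char *in)
{
    char *out = malloc(strlen(in) + 1);  /* decoding never grows */
    char *p = out;

    if (!out) {
        return NULL;
    }
    while (*in) {
        if (*in != '\\') {
            *p++ = *in++;
            continue;
        }
        in++;                           /* skip the backslash */
        switch (*in) {
        case 'b':  *p++ = '\b'; break;
        case 'f':  *p++ = '\f'; break;
        case 'n':  *p++ = '\n'; break;
        case 'r':  *p++ = '\r'; break;
        case 't':  *p++ = '\t'; break;
        case '\\': *p++ = '\\'; break;
        case '"':  *p++ = '"';  break;
        case '/':  *p++ = '/';  break;
        case '\'': *p++ = '\''; break;  /* accepted, per the test table */
        default:                        /* includes 'u' and '\0' */
            free(out);
            return NULL;
        }
        in++;
    }
    *p = '\0';
    return out;
}
```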

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 1c7f24bc4d..8f51f57af9 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -54,18 +54,8 @@ static void escaped_string(void)
         const char *utf8_out;
         int skip;
     } test_cases[] = {
-        { "\\b", "\b" },
-        { "\\f", "\f" },
-        { "\\n", "\n" },
-        { "\\r", "\r" },
-        { "\\t", "\t" },
-        { "/", "/" },
-        { "\\/", "/", .skip = 1 },
-        { "\\\\", "\\" },
-        { "\\\"", "\"" },
-        { "hello world \\\"embedded string\\\"",
-          "hello world \"embedded string\"" },
-        { "hello world\\nwith new line", "hello world\nwith new line" },
+        { "\\b\\f\\n\\r\\t\\\\\\\"", "\b\f\n\r\t\\\"" },
+        { "\\/\\'", "/'", .skip = 1 },
         { "single byte utf-8 \\u0020", "single byte utf-8  ", .skip = 1 },
         { "double byte utf-8 \\u00A2", "double byte utf-8 \xc2\xa2" },
         { "triple byte utf-8 \\u20AC", "triple byte utf-8 \xe2\x82\xac" },
-- 
2.17.1

* [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (7 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 14:03   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests Markus Armbruster
                   ` (47 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Cover surrogates, invalid escapes, and noncharacters.  This
demonstrates that valid surrogate pairs are misinterpreted, and
invalid surrogates and noncharacters aren't rejected.
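
Above U+FFFF, a \uXXXX escape can express a character only as a UTF-16
surrogate pair, which must be combined before encoding the result as a
single 4-byte UTF-8 sequence.  The bug marker on the first new test
case shows the parser instead encoding each half on its own.  A minimal
sketch of the intended behaviour, with hypothetical helpers that are
not QEMU's code: combine a pair, reject unpaired or backward
surrogates, and recognize the Unicode noncharacters (U+FDD0..U+FDEF
plus the last two code points of every plane).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair; return -1 if it is not a valid pair. */
static int32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
        return -1;              /* unpaired, backward, or not a surrogate */
    }
    return (int32_t)(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}

/* Unicode noncharacters: U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF. */
static bool is_noncharacter(uint32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

/* Encode a Unicode scalar value as UTF-8; return its length, 0 if invalid. */
static size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;               /* lone surrogates are not scalar values */
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                   /* beyond the Unicode range */
}
```

With these, \uD834\uDD1E combines to U+1D11E and encodes as
\xF0\x9D\x84\x9E, the output the bug marker asks for.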

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 53 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 8 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 8f51f57af9..e899e2d361 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -59,6 +59,38 @@ static void escaped_string(void)
         { "single byte utf-8 \\u0020", "single byte utf-8  ", .skip = 1 },
         { "double byte utf-8 \\u00A2", "double byte utf-8 \xc2\xa2" },
         { "triple byte utf-8 \\u20AC", "triple byte utf-8 \xe2\x82\xac" },
+        { "quadruple byte utf-8 \\uD834\\uDD1E", /* U+1D11E */
+          /* bug: want \xF0\x9D\x84\x9E */
+          "quadruple byte utf-8 \xED\xA0\xB4\xED\xB4\x9E", .skip = 1 },
+        { "\\z", NULL },
+        { "\\ux", NULL },
+        { "\\u1x", NULL },
+        { "\\u12x", NULL },
+        { "\\u123x", NULL },
+        { "\\u12345", "\341\210\2645" },
+        { "\\u12345", "\341\210\2645" },
+        { "\\u0000x", "x", .skip = 1}, /* bug: want \xC0\x80x */
+        { "unpaired leading surrogate \\uD800\\uD800",
+          /* bug: not rejected */
+          "unpaired leading surrogate \355\240\200\355\240\200", .skip = 1 },
+        { "unpaired trailing surrogate \\uDC00\\uDC00",
+          /* bug: not rejected */
+          "unpaired trailing surrogate \355\260\200\355\260\200", .skip = 1},
+        { "backward surrogate pair \\uDC00\\uD800",
+          /* bug: not rejected */
+          "backward surrogate pair \355\260\200\355\240\200", .skip = 1},
+        { "noncharacter U+FDD0 \\uFDD0",
+          /* bug: not rejected */
+          "noncharacter U+FDD0 \xEF\xB7\x90", .skip = 1},
+        { "noncharacter U+FDEF \\uFDEF",
+          /* bug: not rejected */
+          "noncharacter U+FDEF \xEF\xB7\xAF", .skip = 1},
+        { "noncharacter U+1FFFE \\uD87F\\uDFFE",
+          /* bug: not rejected */
+          "noncharacter U+1FFFE \xED\xA1\xBF\xED\xBF\xBE", .skip = 1},
+        { "noncharacter U+10FFFF \\uDC3F\\uDFFF",
+          /* bug: not rejected */
+          "noncharacter U+10FFFF \xED\xB0\xBF\xED\xBF\xBF", .skip = 1},
         {}
     };
     int i, j;
@@ -67,15 +99,20 @@ static void escaped_string(void)
 
     for (i = 0; test_cases[i].json_in; i++) {
         for (j = 0; j < 2; j++) {
-            cstr = from_json_str(test_cases[i].json_in, &error_abort, j);
-            g_assert_cmpstr(qstring_get_try_str(cstr),
-                            ==, test_cases[i].utf8_out);
-            if (test_cases[i].skip == 0) {
-                jstr = to_json_str(cstr);
-                g_assert_cmpstr(jstr, ==, test_cases[i].json_in);
-                g_free(jstr);
+            if (test_cases[i].utf8_out) {
+                cstr = from_json_str(test_cases[i].json_in, &error_abort, j);
+                g_assert_cmpstr(qstring_get_try_str(cstr),
+                                ==, test_cases[i].utf8_out);
+                if (!test_cases[i].skip) {
+                    jstr = to_json_str(cstr);
+                    g_assert_cmpstr(jstr, ==, test_cases[i].json_in);
+                    g_free(jstr);
+                }
+                qobject_unref(cstr);
+            } else {
+                cstr = from_json_str(test_cases[i].json_in, NULL, j);
+                g_assert(!cstr);
             }
-            qobject_unref(cstr);
         }
     }
 }
-- 
2.17.1

* [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (8 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2 Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 14:04   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings Markus Armbruster
                   ` (46 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

simple_string() and single_quote_string() add nothing to
escaped_string() anymore.  Drop them.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 59 ---------------------------------------------
 1 file changed, 59 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index e899e2d361..f0e8967a53 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -117,63 +117,6 @@ static void escaped_string(void)
     }
 }
 
-static void simple_string(void)
-{
-    int i;
-    struct {
-        const char *encoded;
-        const char *decoded;
-    } test_cases[] = {
-        { "\"hello world\"", "hello world" },
-        { "\"the quick brown fox jumped over the fence\"",
-          "the quick brown fox jumped over the fence" },
-        {}
-    };
-
-    for (i = 0; test_cases[i].encoded; i++) {
-        QObject *obj;
-        QString *str;
-
-        obj = qobject_from_json(test_cases[i].encoded, &error_abort);
-        str = qobject_to(QString, obj);
-        g_assert(str);
-        g_assert(strcmp(qstring_get_str(str), test_cases[i].decoded) == 0);
-
-        str = qobject_to_json(obj);
-        g_assert(strcmp(qstring_get_str(str), test_cases[i].encoded) == 0);
-
-        qobject_unref(obj);
-        
-        qobject_unref(str);
-    }
-}
-
-static void single_quote_string(void)
-{
-    int i;
-    struct {
-        const char *encoded;
-        const char *decoded;
-    } test_cases[] = {
-        { "'hello world'", "hello world" },
-        { "'the quick brown fox \\' jumped over the fence'",
-          "the quick brown fox ' jumped over the fence" },
-        {}
-    };
-
-    for (i = 0; test_cases[i].encoded; i++) {
-        QObject *obj;
-        QString *str;
-
-        obj = qobject_from_json(test_cases[i].encoded, &error_abort);
-        str = qobject_to(QString, obj);
-        g_assert(str);
-        g_assert(strcmp(qstring_get_str(str), test_cases[i].decoded) == 0);
-
-        qobject_unref(str);
-    }
-}
-
 static void utf8_string(void)
 {
     /*
@@ -1512,10 +1455,8 @@ int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
 
-    g_test_add_func("/literals/string/simple", simple_string);
     g_test_add_func("/literals/string/escaped", escaped_string);
     g_test_add_func("/literals/string/utf8", utf8_string);
-    g_test_add_func("/literals/string/single_quote", single_quote_string);
     g_test_add_func("/literals/string/vararg", vararg_string);
 
     g_test_add_func("/literals/number/simple", simple_number);
-- 
2.17.1

* [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (9 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 14:17   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string() Markus Armbruster
                   ` (45 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

utf8_string() tests only double quoted strings.  Cover single quoted
strings, too: store the strings to test without quotes, then wrap them
in either kind of quote.
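
The table's @json_out column records what unparsing is expected to
produce: \uXXXX for every non-ASCII character (and DEL), a surrogate
pair above U+FFFF, and \uFFFD for malformed input.  A minimal sketch
of such an unparser, as hypothetical utf8_decode() and escape_utf8()
helpers, not QEMU's serializer.  It handles malformed input more
crudely: it replaces each byte that cannot start a valid sequence with
its own \uFFFD, so a 5-byte sequence such as \xF8\x88\x80\x80\x80
yields five of them, whereas this patch's table expects one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence; return its length, or 0 if malformed. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t min;

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2; min = 0x80; *cp = s[0] & 0x1F;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3; min = 0x800; *cp = s[0] & 0x0F;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4; min = 0x10000; *cp = s[0] & 0x07;
    } else {
        return 0;               /* continuation byte or invalid lead */
    }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {    /* also stops at the terminator */
            return 0;
        }
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, surrogates, and values past U+10FFFF */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF)) {
        return 0;
    }
    return len;
}

/* Unparse UTF-8 into JSON string content (without the quotes). */
static char *escape_utf8(const char *in)
{
    const unsigned char *s = (const unsigned char *)in;
    char *out = malloc(strlen(in) * 6 + 1);  /* at most 6 chars per byte */
    char *p = out;
    uint32_t cp;
    size_t n;

    if (!out) {
        return NULL;
    }
    while (*s) {
        n = utf8_decode(s, &cp);
        if (n == 0) {
            cp = 0xFFFD;            /* replace the offending byte */
            n = 1;
        }
        s += n;
        switch (cp) {
        case '"':  p += sprintf(p, "\\\""); break;
        case '\\': p += sprintf(p, "\\\\"); break;
        case '\b': p += sprintf(p, "\\b"); break;
        case '\f': p += sprintf(p, "\\f"); break;
        case '\n': p += sprintf(p, "\\n"); break;
        case '\r': p += sprintf(p, "\\r"); break;
        case '\t': p += sprintf(p, "\\t"); break;
        default:
            if (cp >= 0x20 && cp < 0x7F) {
                *p++ = (char)cp;    /* printable ASCII passes through */
            } else if (cp > 0xFFFF) {
                cp -= 0x10000;      /* split into a surrogate pair */
                p += sprintf(p, "\\u%04X\\u%04X",
                             (unsigned)(0xD800 + (cp >> 10)),
                             (unsigned)(0xDC00 + (cp & 0x3FF)));
            } else {
                p += sprintf(p, "\\u%04X", (unsigned)cp);
            }
        }
    }
    *p = '\0';
    return out;
}
```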

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 427 ++++++++++++++++++++++----------------------
 1 file changed, 214 insertions(+), 213 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index f0e8967a53..75f0a9f18a 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -144,10 +144,14 @@ static void utf8_string(void)
      * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
      */
     static const struct {
+        /* Content of JSON string to parse with qobject_from_json() */
         const char *json_in;
+        /* Expected parse output */
         const char *utf8_out;
-        const char *json_out;   /* defaults to @json_in */
-        const char *utf8_in;    /* defaults to @utf8_out */
+        /* Expected unparse output, defaults to @json_in */
+        const char *json_out;
+        /* Expected parse output for @json_out, defaults to @utf8_out */
+        const char *utf8_in;
     } test_cases[] = {
         /*
          * Bug markers used here:
@@ -165,72 +169,72 @@ static void utf8_string(void)
         /* 1  Some correct UTF-8 text */
         {
             /* a bit of German */
-            "\"Falsches \xC3\x9C" "ben von Xylophonmusik qu\xC3\xA4lt"
-            " jeden gr\xC3\xB6\xC3\x9F" "eren Zwerg.\"",
             "Falsches \xC3\x9C" "ben von Xylophonmusik qu\xC3\xA4lt"
             " jeden gr\xC3\xB6\xC3\x9F" "eren Zwerg.",
-            "\"Falsches \\u00DCben von Xylophonmusik qu\\u00E4lt"
-            " jeden gr\\u00F6\\u00DFeren Zwerg.\"",
+            "Falsches \xC3\x9C" "ben von Xylophonmusik qu\xC3\xA4lt"
+            " jeden gr\xC3\xB6\xC3\x9F" "eren Zwerg.",
+            "Falsches \\u00DCben von Xylophonmusik qu\\u00E4lt"
+            " jeden gr\\u00F6\\u00DFeren Zwerg.",
         },
         {
             /* a bit of Greek */
-            "\"\xCE\xBA\xE1\xBD\xB9\xCF\x83\xCE\xBC\xCE\xB5\"",
             "\xCE\xBA\xE1\xBD\xB9\xCF\x83\xCE\xBC\xCE\xB5",
-            "\"\\u03BA\\u1F79\\u03C3\\u03BC\\u03B5\"",
+            "\xCE\xBA\xE1\xBD\xB9\xCF\x83\xCE\xBC\xCE\xB5",
+            "\\u03BA\\u1F79\\u03C3\\u03BC\\u03B5",
         },
         /* 2  Boundary condition test cases */
         /* 2.1  First possible sequence of a certain length */
         /* 2.1.1  1 byte U+0000 */
         {
-            "\"\\u0000\"",
+            "\\u0000",
             "",                 /* bug: want overlong "\xC0\x80" */
-            "\"\\u0000\"",
+            "\\u0000",
             "\xC0\x80",
         },
         /* 2.1.2  2 bytes U+0080 */
         {
-            "\"\xC2\x80\"",
             "\xC2\x80",
-            "\"\\u0080\"",
+            "\xC2\x80",
+            "\\u0080",
         },
         /* 2.1.3  3 bytes U+0800 */
         {
-            "\"\xE0\xA0\x80\"",
             "\xE0\xA0\x80",
-            "\"\\u0800\"",
+            "\xE0\xA0\x80",
+            "\\u0800",
         },
         /* 2.1.4  4 bytes U+10000 */
         {
-            "\"\xF0\x90\x80\x80\"",
             "\xF0\x90\x80\x80",
-            "\"\\uD800\\uDC00\"",
+            "\xF0\x90\x80\x80",
+            "\\uD800\\uDC00",
         },
         /* 2.1.5  5 bytes U+200000 */
         {
-            "\"\xF8\x88\x80\x80\x80\"",
+            "\xF8\x88\x80\x80\x80",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xF8\x88\x80\x80\x80",
         },
         /* 2.1.6  6 bytes U+4000000 */
         {
-            "\"\xFC\x84\x80\x80\x80\x80\"",
+            "\xFC\x84\x80\x80\x80\x80",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFC\x84\x80\x80\x80\x80",
         },
         /* 2.2  Last possible sequence of a certain length */
         /* 2.2.1  1 byte U+007F */
         {
-            "\"\x7F\"",
             "\x7F",
-            "\"\\u007F\"",
+            "\x7F",
+            "\\u007F",
         },
         /* 2.2.2  2 bytes U+07FF */
         {
-            "\"\xDF\xBF\"",
             "\xDF\xBF",
-            "\"\\u07FF\"",
+            "\xDF\xBF",
+            "\\u07FF",
         },
         /*
          * 2.2.3  3 bytes U+FFFC
@@ -242,122 +246,122 @@ static void utf8_string(void)
          * U+FFFC here.
          */
         {
-            "\"\xEF\xBF\xBC\"",
             "\xEF\xBF\xBC",
-            "\"\\uFFFC\"",
+            "\xEF\xBF\xBC",
+            "\\uFFFC",
         },
         /* 2.2.4  4 bytes U+1FFFFF */
         {
-            "\"\xF7\xBF\xBF\xBF\"",
+            "\xF7\xBF\xBF\xBF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xF7\xBF\xBF\xBF",
         },
         /* 2.2.5  5 bytes U+3FFFFFF */
         {
-            "\"\xFB\xBF\xBF\xBF\xBF\"",
+            "\xFB\xBF\xBF\xBF\xBF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFB\xBF\xBF\xBF\xBF",
         },
         /* 2.2.6  6 bytes U+7FFFFFFF */
         {
-            "\"\xFD\xBF\xBF\xBF\xBF\xBF\"",
+            "\xFD\xBF\xBF\xBF\xBF\xBF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFD\xBF\xBF\xBF\xBF\xBF",
         },
         /* 2.3  Other boundary conditions */
         {
             /* last one before surrogate range: U+D7FF */
-            "\"\xED\x9F\xBF\"",
             "\xED\x9F\xBF",
-            "\"\\uD7FF\"",
+            "\xED\x9F\xBF",
+            "\\uD7FF",
         },
         {
             /* first one after surrogate range: U+E000 */
-            "\"\xEE\x80\x80\"",
             "\xEE\x80\x80",
-            "\"\\uE000\"",
+            "\xEE\x80\x80",
+            "\\uE000",
         },
         {
             /* last one in BMP: U+FFFD */
-            "\"\xEF\xBF\xBD\"",
             "\xEF\xBF\xBD",
-            "\"\\uFFFD\"",
+            "\xEF\xBF\xBD",
+            "\\uFFFD",
         },
         {
             /* last one in last plane: U+10FFFD */
-            "\"\xF4\x8F\xBF\xBD\"",
             "\xF4\x8F\xBF\xBD",
-            "\"\\uDBFF\\uDFFD\""
+            "\xF4\x8F\xBF\xBD",
+            "\\uDBFF\\uDFFD"
         },
         {
             /* first one beyond Unicode range: U+110000 */
-            "\"\xF4\x90\x80\x80\"",
             "\xF4\x90\x80\x80",
-            "\"\\uFFFD\"",
+            "\xF4\x90\x80\x80",
+            "\\uFFFD",
         },
         /* 3  Malformed sequences */
         /* 3.1  Unexpected continuation bytes */
         /* 3.1.1  First continuation byte */
         {
-            "\"\x80\"",
+            "\x80",
             "\x80",             /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* 3.1.2  Last continuation byte */
         {
-            "\"\xBF\"",
+            "\xBF",
             "\xBF",             /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* 3.1.3  2 continuation bytes */
         {
-            "\"\x80\xBF\"",
+            "\x80\xBF",
             "\x80\xBF",         /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         /* 3.1.4  3 continuation bytes */
         {
-            "\"\x80\xBF\x80\"",
+            "\x80\xBF\x80",
             "\x80\xBF\x80",     /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.5  4 continuation bytes */
         {
-            "\"\x80\xBF\x80\xBF\"",
+            "\x80\xBF\x80\xBF",
             "\x80\xBF\x80\xBF", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.6  5 continuation bytes */
         {
-            "\"\x80\xBF\x80\xBF\x80\"",
+            "\x80\xBF\x80\xBF\x80",
             "\x80\xBF\x80\xBF\x80", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.7  6 continuation bytes */
         {
-            "\"\x80\xBF\x80\xBF\x80\xBF\"",
+            "\x80\xBF\x80\xBF\x80\xBF",
             "\x80\xBF\x80\xBF\x80\xBF", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.8  7 continuation bytes */
         {
-            "\"\x80\xBF\x80\xBF\x80\xBF\x80\"",
+            "\x80\xBF\x80\xBF\x80\xBF\x80",
             "\x80\xBF\x80\xBF\x80\xBF\x80", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.9  Sequence of all 64 possible continuation bytes */
         {
-            "\"\x80\x81\x82\x83\x84\x85\x86\x87"
+            "\x80\x81\x82\x83\x84\x85\x86\x87"
             "\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F"
             "\x90\x91\x92\x93\x94\x95\x96\x97"
             "\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F"
             "\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7"
             "\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
             "\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7"
-            "\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF\"",
+            "\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF",
              /* bug: not corrected */
             "\x80\x81\x82\x83\x84\x85\x86\x87"
             "\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F"
@@ -367,27 +371,27 @@ static void utf8_string(void)
             "\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
             "\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7"
             "\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF",
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
-            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\""
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
         },
         /* 3.2  Lonely start characters */
         /* 3.2.1  All 32 first bytes of 2-byte sequences, followed by space */
         {
-            "\"\xC0 \xC1 \xC2 \xC3 \xC4 \xC5 \xC6 \xC7 "
+            "\xC0 \xC1 \xC2 \xC3 \xC4 \xC5 \xC6 \xC7 "
             "\xC8 \xC9 \xCA \xCB \xCC \xCD \xCE \xCF "
             "\xD0 \xD1 \xD2 \xD3 \xD4 \xD5 \xD6 \xD7 "
-            "\xD8 \xD9 \xDA \xDB \xDC \xDD \xDE \xDF \"",
+            "\xD8 \xD9 \xDA \xDB \xDC \xDD \xDE \xDF ",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
-            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \"",
+            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
+            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
             "\xC0 \xC1 \xC2 \xC3 \xC4 \xC5 \xC6 \xC7 "
             "\xC8 \xC9 \xCA \xCB \xCC \xCD \xCE \xCF "
             "\xD0 \xD1 \xD2 \xD3 \xD4 \xD5 \xD6 \xD7 "
@@ -395,159 +399,159 @@ static void utf8_string(void)
         },
         /* 3.2.2  All 16 first bytes of 3-byte sequences, followed by space */
         {
-            "\"\xE0 \xE1 \xE2 \xE3 \xE4 \xE5 \xE6 \xE7 "
-            "\xE8 \xE9 \xEA \xEB \xEC \xED \xEE \xEF \"",
+            "\xE0 \xE1 \xE2 \xE3 \xE4 \xE5 \xE6 \xE7 "
+            "\xE8 \xE9 \xEA \xEB \xEC \xED \xEE \xEF ",
             /* bug: not corrected */
             "\xE0 \xE1 \xE2 \xE3 \xE4 \xE5 \xE6 \xE7 "
             "\xE8 \xE9 \xEA \xEB \xEC \xED \xEE \xEF ",
-            "\"\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
-            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \"",
+            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
+            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
         },
         /* 3.2.3  All 8 first bytes of 4-byte sequences, followed by space */
         {
-            "\"\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 \"",
+            "\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 ",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \"",
+            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
             "\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 ",
         },
         /* 3.2.4  All 4 first bytes of 5-byte sequences, followed by space */
         {
-            "\"\xF8 \xF9 \xFA \xFB \"",
+            "\xF8 \xF9 \xFA \xFB ",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD \\uFFFD \\uFFFD \\uFFFD \"",
+            "\\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
             "\xF8 \xF9 \xFA \xFB ",
         },
         /* 3.2.5  All 2 first bytes of 6-byte sequences, followed by space */
         {
-            "\"\xFC \xFD \"",
+            "\xFC \xFD ",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD \\uFFFD \"",
+            "\\uFFFD \\uFFFD ",
             "\xFC \xFD ",
         },
         /* 3.3  Sequences with last continuation byte missing */
         /* 3.3.1  2-byte sequence with last byte missing (U+0000) */
         {
-            "\"\xC0\"",
+            "\xC0",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xC0",
         },
         /* 3.3.2  3-byte sequence with last byte missing (U+0000) */
         {
-            "\"\xE0\x80\"",
+            "\xE0\x80",
             "\xE0\x80",           /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* 3.3.3  4-byte sequence with last byte missing (U+0000) */
         {
-            "\"\xF0\x80\x80\"",
+            "\xF0\x80\x80",
             "\xF0\x80\x80",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* 3.3.4  5-byte sequence with last byte missing (U+0000) */
         {
-            "\"\xF8\x80\x80\x80\"",
+            "\xF8\x80\x80\x80",
             NULL,                   /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xF8\x80\x80\x80",
         },
         /* 3.3.5  6-byte sequence with last byte missing (U+0000) */
         {
-            "\"\xFC\x80\x80\x80\x80\"",
+            "\xFC\x80\x80\x80\x80",
             NULL,                        /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFC\x80\x80\x80\x80",
         },
         /* 3.3.6  2-byte sequence with last byte missing (U+07FF) */
         {
-            "\"\xDF\"",
+            "\xDF",
             "\xDF",             /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* 3.3.7  3-byte sequence with last byte missing (U+FFFF) */
         {
-            "\"\xEF\xBF\"",
+            "\xEF\xBF",
             "\xEF\xBF",           /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* 3.3.8  4-byte sequence with last byte missing (U+1FFFFF) */
         {
-            "\"\xF7\xBF\xBF\"",
+            "\xF7\xBF\xBF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xF7\xBF\xBF",
         },
         /* 3.3.9  5-byte sequence with last byte missing (U+3FFFFFF) */
         {
-            "\"\xFB\xBF\xBF\xBF\"",
+            "\xFB\xBF\xBF\xBF",
             NULL,                 /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFB\xBF\xBF\xBF",
         },
         /* 3.3.10  6-byte sequence with last byte missing (U+7FFFFFFF) */
         {
-            "\"\xFD\xBF\xBF\xBF\xBF\"",
+            "\xFD\xBF\xBF\xBF\xBF",
             NULL,                        /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFD\xBF\xBF\xBF\xBF",
         },
         /* 3.4  Concatenation of incomplete sequences */
         {
-            "\"\xC0\xE0\x80\xF0\x80\x80\xF8\x80\x80\x80\xFC\x80\x80\x80\x80"
-            "\xDF\xEF\xBF\xF7\xBF\xBF\xFB\xBF\xBF\xBF\xFD\xBF\xBF\xBF\xBF\"",
+            "\xC0\xE0\x80\xF0\x80\x80\xF8\x80\x80\x80\xFC\x80\x80\x80\x80"
+            "\xDF\xEF\xBF\xF7\xBF\xBF\xFB\xBF\xBF\xBF\xFD\xBF\xBF\xBF\xBF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
-            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
             "\xC0\xE0\x80\xF0\x80\x80\xF8\x80\x80\x80\xFC\x80\x80\x80\x80"
             "\xDF\xEF\xBF\xF7\xBF\xBF\xFB\xBF\xBF\xBF\xFD\xBF\xBF\xBF\xBF",
         },
         /* 3.5  Impossible bytes */
         {
-            "\"\xFE\"",
+            "\xFE",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFE",
         },
         {
-            "\"\xFF\"",
+            "\xFF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFF",
         },
         {
-            "\"\xFE\xFE\xFF\xFF\"",
+            "\xFE\xFE\xFF\xFF",
             NULL,                 /* bug: rejected */
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
             "\xFE\xFE\xFF\xFF",
         },
         /* 4  Overlong sequences */
         /* 4.1  Overlong '/' */
         {
-            "\"\xC0\xAF\"",
+            "\xC0\xAF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xC0\xAF",
         },
         {
-            "\"\xE0\x80\xAF\"",
+            "\xE0\x80\xAF",
             "\xE0\x80\xAF",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
-            "\"\xF0\x80\x80\xAF\"",
+            "\xF0\x80\x80\xAF",
             "\xF0\x80\x80\xAF",  /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
-            "\"\xF8\x80\x80\x80\xAF\"",
+            "\xF8\x80\x80\x80\xAF",
             NULL,                        /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xF8\x80\x80\x80\xAF",
         },
         {
-            "\"\xFC\x80\x80\x80\x80\xAF\"",
+            "\xFC\x80\x80\x80\x80\xAF",
             NULL,                               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFC\x80\x80\x80\x80\xAF",
         },
         /*
@@ -558,16 +562,16 @@ static void utf8_string(void)
          */
         {
             /* \U+007F */
-            "\"\xC1\xBF\"",
+            "\xC1\xBF",
             NULL,               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xC1\xBF",
         },
         {
             /* \U+07FF */
-            "\"\xE0\x9F\xBF\"",
+            "\xE0\x9F\xBF",
             "\xE0\x9F\xBF",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /*
@@ -576,181 +580,181 @@ static void utf8_string(void)
              * noncharacter.  Testing U+FFFC seems more useful.  See
              * also 2.2.3
              */
-            "\"\xF0\x8F\xBF\xBC\"",
+            "\xF0\x8F\xBF\xBC",
             "\xF0\x8F\xBF\xBC",   /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+1FFFFF */
-            "\"\xF8\x87\xBF\xBF\xBF\"",
+            "\xF8\x87\xBF\xBF\xBF",
             NULL,                        /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xF8\x87\xBF\xBF\xBF",
         },
         {
             /* \U+3FFFFFF */
-            "\"\xFC\x83\xBF\xBF\xBF\xBF\"",
+            "\xFC\x83\xBF\xBF\xBF\xBF",
             NULL,                               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFC\x83\xBF\xBF\xBF\xBF",
         },
         /* 4.3  Overlong representation of the NUL character */
         {
             /* \U+0000 */
-            "\"\xC0\x80\"",
+            "\xC0\x80",
             NULL,               /* bug: rejected */
-            "\"\\u0000\"",
+            "\\u0000",
             "\xC0\x80",
         },
         {
             /* \U+0000 */
-            "\"\xE0\x80\x80\"",
+            "\xE0\x80\x80",
             "\xE0\x80\x80",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+0000 */
-            "\"\xF0\x80\x80\x80\"",
+            "\xF0\x80\x80\x80",
             "\xF0\x80\x80\x80",   /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+0000 */
-            "\"\xF8\x80\x80\x80\x80\"",
+            "\xF8\x80\x80\x80\x80",
             NULL,                        /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xF8\x80\x80\x80\x80",
         },
         {
             /* \U+0000 */
-            "\"\xFC\x80\x80\x80\x80\x80\"",
+            "\xFC\x80\x80\x80\x80\x80",
             NULL,                               /* bug: rejected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
             "\xFC\x80\x80\x80\x80\x80",
         },
         /* 5  Illegal code positions */
         /* 5.1  Single UTF-16 surrogates */
         {
             /* \U+D800 */
-            "\"\xED\xA0\x80\"",
+            "\xED\xA0\x80",
             "\xED\xA0\x80",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+DB7F */
-            "\"\xED\xAD\xBF\"",
+            "\xED\xAD\xBF",
             "\xED\xAD\xBF",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+DB80 */
-            "\"\xED\xAE\x80\"",
+            "\xED\xAE\x80",
             "\xED\xAE\x80",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+DBFF */
-            "\"\xED\xAF\xBF\"",
+            "\xED\xAF\xBF",
             "\xED\xAF\xBF",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+DC00 */
-            "\"\xED\xB0\x80\"",
+            "\xED\xB0\x80",
             "\xED\xB0\x80",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+DF80 */
-            "\"\xED\xBE\x80\"",
+            "\xED\xBE\x80",
             "\xED\xBE\x80",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+DFFF */
-            "\"\xED\xBF\xBF\"",
+            "\xED\xBF\xBF",
             "\xED\xBF\xBF",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* 5.2  Paired UTF-16 surrogates */
         {
             /* \U+D800\U+DC00 */
-            "\"\xED\xA0\x80\xED\xB0\x80\"",
+            "\xED\xA0\x80\xED\xB0\x80",
             "\xED\xA0\x80\xED\xB0\x80", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         {
             /* \U+D800\U+DFFF */
-            "\"\xED\xA0\x80\xED\xBF\xBF\"",
+            "\xED\xA0\x80\xED\xBF\xBF",
             "\xED\xA0\x80\xED\xBF\xBF", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB7F\U+DC00 */
-            "\"\xED\xAD\xBF\xED\xB0\x80\"",
+            "\xED\xAD\xBF\xED\xB0\x80",
             "\xED\xAD\xBF\xED\xB0\x80", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB7F\U+DFFF */
-            "\"\xED\xAD\xBF\xED\xBF\xBF\"",
+            "\xED\xAD\xBF\xED\xBF\xBF",
             "\xED\xAD\xBF\xED\xBF\xBF", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB80\U+DC00 */
-            "\"\xED\xAE\x80\xED\xB0\x80\"",
+            "\xED\xAE\x80\xED\xB0\x80",
             "\xED\xAE\x80\xED\xB0\x80", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB80\U+DFFF */
-            "\"\xED\xAE\x80\xED\xBF\xBF\"",
+            "\xED\xAE\x80\xED\xBF\xBF",
             "\xED\xAE\x80\xED\xBF\xBF", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DBFF\U+DC00 */
-            "\"\xED\xAF\xBF\xED\xB0\x80\"",
+            "\xED\xAF\xBF\xED\xB0\x80",
             "\xED\xAF\xBF\xED\xB0\x80", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DBFF\U+DFFF */
-            "\"\xED\xAF\xBF\xED\xBF\xBF\"",
+            "\xED\xAF\xBF\xED\xBF\xBF",
             "\xED\xAF\xBF\xED\xBF\xBF", /* bug: not corrected */
-            "\"\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD",
         },
         /* 5.3  Other illegal code positions */
         /* BMP noncharacters */
         {
             /* \U+FFFE */
-            "\"\xEF\xBF\xBE\"",
+            "\xEF\xBF\xBE",
             "\xEF\xBF\xBE",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* \U+FFFF */
-            "\"\xEF\xBF\xBF\"",
+            "\xEF\xBF\xBF",
             "\xEF\xBF\xBF",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* U+FDD0 */
-            "\"\xEF\xB7\x90\"",
+            "\xEF\xB7\x90",
             "\xEF\xB7\x90",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         {
             /* U+FDEF */
-            "\"\xEF\xB7\xAF\"",
+            "\xEF\xB7\xAF",
             "\xEF\xB7\xAF",     /* bug: not corrected */
-            "\"\\uFFFD\"",
+            "\\uFFFD",
         },
         /* Plane 1 .. 16 noncharacters */
         {
             /* U+1FFFE U+1FFFF U+2FFFE U+2FFFF ... U+10FFFE U+10FFFF */
-            "\"\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
+            "\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
             "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
             "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
             "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
@@ -765,7 +769,7 @@ static void utf8_string(void)
             "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
             "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
             "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
-            "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF\"",
+            "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
             /* bug: not corrected */
             "\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
             "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
@@ -783,55 +787,52 @@ static void utf8_string(void)
             "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
             "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
             "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
-            "\"\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
-            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\"",
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         {}
     };
-    int i;
-    QObject *obj;
+    int i, j;
     QString *str;
     const char *json_in, *utf8_out, *utf8_in, *json_out;
+    char *jstr;
 
     for (i = 0; test_cases[i].json_in; i++) {
-        json_in = test_cases[i].json_in;
-        utf8_out = test_cases[i].utf8_out;
-        utf8_in = test_cases[i].utf8_in ?: test_cases[i].utf8_out;
-        json_out = test_cases[i].json_out ?: test_cases[i].json_in;
+        for (j = 0; j < 2; j++) {
+            json_in = test_cases[i].json_in;
+            utf8_out = test_cases[i].utf8_out;
+            utf8_in = test_cases[i].utf8_in ?: test_cases[i].utf8_out;
+            json_out = test_cases[i].json_out ?: test_cases[i].json_in;
 
-        obj = qobject_from_json(json_in, utf8_out ? &error_abort : NULL);
-        if (utf8_out) {
-            str = qobject_to(QString, obj);
-            g_assert(str);
-            g_assert_cmpstr(qstring_get_str(str), ==, utf8_out);
-        } else {
-            g_assert(!obj);
-        }
-        qobject_unref(obj);
+            /* Parse @json_in, expect @utf8_out */
+            if (utf8_out) {
+                str = from_json_str(json_in, &error_abort, j);
+                g_assert_cmpstr(qstring_get_try_str(str), ==, utf8_out);
+                qobject_unref(str);
+            } else {
+                str = from_json_str(json_in, NULL, j);
+                g_assert(!str);
+            }
 
-        obj = QOBJECT(qstring_from_str(utf8_in));
-        str = qobject_to_json(obj);
-        if (json_out) {
-            g_assert(str);
-            g_assert_cmpstr(qstring_get_str(str), ==, json_out);
-        } else {
-            g_assert(!str);
-        }
-        qobject_unref(str);
-        qobject_unref(obj);
+            /* Unparse @utf8_in, expect @json_out */
+            str = qstring_from_str(utf8_in);
+            jstr = to_json_str(str);
+            g_assert_cmpstr(jstr, ==, json_out);
+            qobject_unref(str);
+            g_free(jstr);
 
-        /*
-         * Disabled, because qobject_from_json() is buggy, and I can't
-         * be bothered to add the expected incorrect results.
-         * FIXME Enable once these bugs have been fixed.
-         */
-        if (0 && json_out != json_in) {
-            obj = qobject_from_json(json_out, &error_abort);
-            str = qobject_to(QString, obj);
-            g_assert(str);
-            g_assert_cmpstr(qstring_get_str(str), ==, utf8_out);
+            /*
+             * Parse @json_out right back
+             * Disabled, because qobject_from_json() is buggy, and I can't
+             * be bothered to add the expected incorrect results.
+             * FIXME Enable once these bugs have been fixed.
+             */
+            if (0 && json_out != json_in) {
+                str = from_json_str(json_out, &error_abort, j);
+                g_assert_cmpstr(qstring_get_try_str(str), ==, utf8_out);
+            }
         }
     }
 }
-- 
2.17.1

* [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string()
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (10 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 14:20   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences Markus Armbruster
                   ` (44 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The previous commit made utf8_string()'s test_cases[].utf8_in
superfluous: we can use .json_in instead, except in the case testing
U+0000.  \x00 doesn't work in C strings, so that case tests \\u0000
instead.  But testing \\uXXXX is escaped_string()'s job, and it's
covered there.  Test U+0001 here, and drop .utf8_in.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 53 ++++++++-------------------------------------
 1 file changed, 9 insertions(+), 44 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 75f0a9f18a..5ba09e5ab6 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -150,8 +150,6 @@ static void utf8_string(void)
         const char *utf8_out;
         /* Expected unparse output, defaults to @json_in */
         const char *json_out;
-        /* Expected parse output for @json_out, defaults to @utf8_out */
-        const char *utf8_in;
     } test_cases[] = {
         /*
          * Bug markers used here:
@@ -160,10 +158,6 @@ static void utf8_string(void)
          * - bug: rejected
          *   JSON parser rejects invalid sequence(s)
          *   We may choose to define this as feature
-         * - bug: want "..."
-         *   JSON parser produces incorrect result, this is the
-         *   correct one, assuming replacement character U+FFFF
-         *   We may choose to reject instead of replace
          */
 
         /* 1  Some correct UTF-8 text */
@@ -184,12 +178,15 @@ static void utf8_string(void)
         },
         /* 2  Boundary condition test cases */
         /* 2.1  First possible sequence of a certain length */
-        /* 2.1.1  1 byte U+0000 */
+        /*
+         * 2.1.1  1 byte U+0001
+         * \x00 is impossible, test \x01 instead.  Other
+         * representations of U+0000 are covered under 4.3.
+         */
         {
-            "\\u0000",
-            "",                 /* bug: want overlong "\xC0\x80" */
-            "\\u0000",
-            "\xC0\x80",
+            "\x01",
+            "\x01",
+            "\\u0001",
         },
         /* 2.1.2  2 bytes U+0080 */
         {
@@ -214,14 +211,12 @@ static void utf8_string(void)
             "\xF8\x88\x80\x80\x80",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xF8\x88\x80\x80\x80",
         },
         /* 2.1.6  6 bytes U+4000000 */
         {
             "\xFC\x84\x80\x80\x80\x80",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xFC\x84\x80\x80\x80\x80",
         },
         /* 2.2  Last possible sequence of a certain length */
         /* 2.2.1  1 byte U+007F */
@@ -255,21 +250,18 @@ static void utf8_string(void)
             "\xF7\xBF\xBF\xBF",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xF7\xBF\xBF\xBF",
         },
         /* 2.2.5  5 bytes U+3FFFFFF */
         {
             "\xFB\xBF\xBF\xBF\xBF",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xFB\xBF\xBF\xBF\xBF",
         },
         /* 2.2.6  6 bytes U+7FFFFFFF */
         {
             "\xFD\xBF\xBF\xBF\xBF\xBF",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xFD\xBF\xBF\xBF\xBF\xBF",
         },
         /* 2.3  Other boundary conditions */
         {
@@ -392,10 +384,6 @@ static void utf8_string(void)
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
-            "\xC0 \xC1 \xC2 \xC3 \xC4 \xC5 \xC6 \xC7 "
-            "\xC8 \xC9 \xCA \xCB \xCC \xCD \xCE \xCF "
-            "\xD0 \xD1 \xD2 \xD3 \xD4 \xD5 \xD6 \xD7 "
-            "\xD8 \xD9 \xDA \xDB \xDC \xDD \xDE \xDF ",
         },
         /* 3.2.2  All 16 first bytes of 3-byte sequences, followed by space */
         {
@@ -412,21 +400,18 @@ static void utf8_string(void)
             "\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 ",
             NULL,               /* bug: rejected */
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
-            "\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 ",
         },
         /* 3.2.4  All 4 first bytes of 5-byte sequences, followed by space */
         {
             "\xF8 \xF9 \xFA \xFB ",
             NULL,               /* bug: rejected */
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
-            "\xF8 \xF9 \xFA \xFB ",
         },
         /* 3.2.5  All 2 first bytes of 6-byte sequences, followed by space */
         {
             "\xFC \xFD ",
             NULL,               /* bug: rejected */
             "\\uFFFD \\uFFFD ",
-            "\xFC \xFD ",
         },
         /* 3.3  Sequences with last continuation byte missing */
         /* 3.3.1  2-byte sequence with last byte missing (U+0000) */
@@ -434,7 +419,6 @@ static void utf8_string(void)
             "\xC0",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xC0",
         },
         /* 3.3.2  3-byte sequence with last byte missing (U+0000) */
         {
@@ -453,14 +437,12 @@ static void utf8_string(void)
             "\xF8\x80\x80\x80",
             NULL,                   /* bug: rejected */
             "\\uFFFD",
-            "\xF8\x80\x80\x80",
         },
         /* 3.3.5  6-byte sequence with last byte missing (U+0000) */
         {
             "\xFC\x80\x80\x80\x80",
             NULL,                        /* bug: rejected */
             "\\uFFFD",
-            "\xFC\x80\x80\x80\x80",
         },
         /* 3.3.6  2-byte sequence with last byte missing (U+07FF) */
         {
@@ -479,21 +461,18 @@ static void utf8_string(void)
             "\xF7\xBF\xBF",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xF7\xBF\xBF",
         },
         /* 3.3.9  5-byte sequence with last byte missing (U+3FFFFFF) */
         {
             "\xFB\xBF\xBF\xBF",
             NULL,                 /* bug: rejected */
             "\\uFFFD",
-            "\xFB\xBF\xBF\xBF",
         },
         /* 3.3.10  6-byte sequence with last byte missing (U+7FFFFFFF) */
         {
             "\xFD\xBF\xBF\xBF\xBF",
             NULL,                        /* bug: rejected */
             "\\uFFFD",
-            "\xFD\xBF\xBF\xBF\xBF",
         },
         /* 3.4  Concatenation of incomplete sequences */
         {
@@ -502,27 +481,22 @@ static void utf8_string(void)
             NULL,               /* bug: rejected */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
-            "\xC0\xE0\x80\xF0\x80\x80\xF8\x80\x80\x80\xFC\x80\x80\x80\x80"
-            "\xDF\xEF\xBF\xF7\xBF\xBF\xFB\xBF\xBF\xBF\xFD\xBF\xBF\xBF\xBF",
         },
         /* 3.5  Impossible bytes */
         {
             "\xFE",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xFE",
         },
         {
             "\xFF",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xFF",
         },
         {
             "\xFE\xFE\xFF\xFF",
             NULL,                 /* bug: rejected */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
-            "\xFE\xFE\xFF\xFF",
         },
         /* 4  Overlong sequences */
         /* 4.1  Overlong '/' */
@@ -530,7 +504,6 @@ static void utf8_string(void)
             "\xC0\xAF",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xC0\xAF",
         },
         {
             "\xE0\x80\xAF",
@@ -546,13 +519,11 @@ static void utf8_string(void)
             "\xF8\x80\x80\x80\xAF",
             NULL,                        /* bug: rejected */
             "\\uFFFD",
-            "\xF8\x80\x80\x80\xAF",
         },
         {
             "\xFC\x80\x80\x80\x80\xAF",
             NULL,                               /* bug: rejected */
             "\\uFFFD",
-            "\xFC\x80\x80\x80\x80\xAF",
         },
         /*
          * 4.2  Maximum overlong sequences
@@ -565,7 +536,6 @@ static void utf8_string(void)
             "\xC1\xBF",
             NULL,               /* bug: rejected */
             "\\uFFFD",
-            "\xC1\xBF",
         },
         {
             /* \U+07FF */
@@ -589,14 +559,12 @@ static void utf8_string(void)
             "\xF8\x87\xBF\xBF\xBF",
             NULL,                        /* bug: rejected */
             "\\uFFFD",
-            "\xF8\x87\xBF\xBF\xBF",
         },
         {
             /* \U+3FFFFFF */
             "\xFC\x83\xBF\xBF\xBF\xBF",
             NULL,                               /* bug: rejected */
             "\\uFFFD",
-            "\xFC\x83\xBF\xBF\xBF\xBF",
         },
         /* 4.3  Overlong representation of the NUL character */
         {
@@ -604,7 +572,6 @@ static void utf8_string(void)
             "\xC0\x80",
             NULL,               /* bug: rejected */
             "\\u0000",
-            "\xC0\x80",
         },
         {
             /* \U+0000 */
@@ -623,14 +590,12 @@ static void utf8_string(void)
             "\xF8\x80\x80\x80\x80",
             NULL,                        /* bug: rejected */
             "\\uFFFD",
-            "\xF8\x80\x80\x80\x80",
         },
         {
             /* \U+0000 */
             "\xFC\x80\x80\x80\x80\x80",
             NULL,                               /* bug: rejected */
             "\\uFFFD",
-            "\xFC\x80\x80\x80\x80\x80",
         },
         /* 5  Illegal code positions */
         /* 5.1  Single UTF-16 surrogates */
@@ -803,7 +768,7 @@ static void utf8_string(void)
         for (j = 0; j < 2; j++) {
             json_in = test_cases[i].json_in;
             utf8_out = test_cases[i].utf8_out;
-            utf8_in = test_cases[i].utf8_in ?: test_cases[i].utf8_out;
+            utf8_in = test_cases[i].utf8_out ?: test_cases[i].json_in;
             json_out = test_cases[i].json_out ?: test_cases[i].json_in;
 
             /* Parse @json_in, expect @utf8_out */
-- 
2.17.1

* [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (11 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string() Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 14:22   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly Markus Armbruster
                   ` (43 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Some of utf8_string()'s test_cases[] contain multiple invalid
sequences.  Testing that qobject_from_json() fails only shows that
we reject at least one of them.  That's incomplete.

Additionally test each non-space sequence in isolation.

This demonstrates that the JSON parser accepts invalid sequences
starting with \xC2..\xF4.  Add a FIXME comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 5ba09e5ab6..5f3334322b 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -20,6 +20,7 @@
 #include "qapi/qmp/qnull.h"
 #include "qapi/qmp/qnum.h"
 #include "qapi/qmp/qstring.h"
+#include "qemu/unicode.h"
 #include "qemu-common.h"
 
 static QString *from_json_str(const char *jstr, Error **errp, bool single)
@@ -379,7 +380,7 @@ static void utf8_string(void)
             "\xC8 \xC9 \xCA \xCB \xCC \xCD \xCE \xCF "
             "\xD0 \xD1 \xD2 \xD3 \xD4 \xD5 \xD6 \xD7 "
             "\xD8 \xD9 \xDA \xDB \xDC \xDD \xDE \xDF ",
-            NULL,               /* bug: rejected */
+            NULL,               /* bug: rejected (partly, see FIXME below) */
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
@@ -398,7 +399,7 @@ static void utf8_string(void)
         /* 3.2.3  All 8 first bytes of 4-byte sequences, followed by space */
         {
             "\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 ",
-            NULL,               /* bug: rejected */
+            NULL,               /* bug: rejected (partly, see FIXME below) */
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
         },
         /* 3.2.4  All 4 first bytes of 5-byte sequences, followed by space */
@@ -478,7 +479,7 @@ static void utf8_string(void)
         {
             "\xC0\xE0\x80\xF0\x80\x80\xF8\x80\x80\x80\xFC\x80\x80\x80\x80"
             "\xDF\xEF\xBF\xF7\xBF\xBF\xFB\xBF\xBF\xBF\xFD\xBF\xBF\xBF\xBF",
-            NULL,               /* bug: rejected */
+            NULL,               /* bug: rejected (partly, see FIXME below) */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
@@ -761,8 +762,8 @@ static void utf8_string(void)
     };
     int i, j;
     QString *str;
-    const char *json_in, *utf8_out, *utf8_in, *json_out;
-    char *jstr;
+    const char *json_in, *utf8_out, *utf8_in, *json_out, *tail;
+    char *end, *in, *jstr;
 
     for (i = 0; test_cases[i].json_in; i++) {
         for (j = 0; j < 2; j++) {
@@ -779,6 +780,28 @@ static void utf8_string(void)
             } else {
                 str = from_json_str(json_in, NULL, j);
                 g_assert(!str);
+                /*
+                 * Failure may be due to any sequence, but *all* sequences
+                 * are expected to fail.  Test each one in isolation.
+                 */
+                for (tail = json_in; *tail; tail = end) {
+                    mod_utf8_codepoint(tail, 6, &end);
+                    if (*end == ' ') {
+                        end++;
+                    }
+                    in = strndup(tail, end - tail);
+                    str = from_json_str(in, NULL, j);
+                    /*
+                     * FIXME JSON parser accepts invalid sequence
+                     * starting with \xC2..\xF4
+                     */
+                    if (*in >= '\xC2' && *in <= '\xF4') {
+                        g_free(str);
+                        str = NULL;
+                    }
+                    g_assert(!str);
+                    g_free(in);
+                }
             }
 
             /* Unparse @utf8_in, expect @json_out */
-- 
2.17.1

* [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (12 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 17:24   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation " Markus Armbruster
                   ` (42 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

RFC 7159 requires control characters in strings to be escaped.
Demonstrate that the JSON parser accepts U+0001..U+001F unescaped.
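RFC 7159 gives only five control characters a short escape (\b, \t, \n, \f, \r); the others need \uXXXX.  A minimal sketch of that mapping follows; json_escape_control() and json_escape_controls() are hypothetical names, and the sketch simply copies every byte at or above 0x20 unchanged.  Applied to U+0001..U+001F, it produces the same json_out that the new test case 0 expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the JSON escape for control character
 * @c (U+0000..U+001F).  Five of them have short forms; the rest use
 * \uXXXX, formatted into @buf, which must hold at least 7 bytes.
 */
static const char *json_escape_control(unsigned char c, char *buf)
{
    assert(c < 0x20);
    switch (c) {
    case '\b':
        return "\\b";
    case '\t':
        return "\\t";
    case '\n':
        return "\\n";
    case '\f':
        return "\\f";
    case '\r':
        return "\\r";
    default:
        snprintf(buf, 7, "\\u%04X", (unsigned)c);
        return buf;
    }
}

/*
 * Copy @in to @out (capacity @size), escaping every control character
 * and copying all other bytes unchanged.
 */
static void json_escape_controls(const char *in, char *out, size_t size)
{
    char buf[7];

    assert(size > 0);
    out[0] = '\0';
    for (; *in; in++) {
        char plain[2] = { *in, '\0' };
        const char *piece = (unsigned char)*in < 0x20
                            ? json_escape_control((unsigned char)*in, buf)
                            : plain;

        assert(strlen(out) + strlen(piece) < size);
        strcat(out, piece);
    }
}
```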

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 36 ++++++++++++++++++++++++++++++------
 tests/qmp-test.c    | 14 ++++++++++++++
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 5f3334322b..33bd5854fc 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -161,6 +161,26 @@ static void utf8_string(void)
          *   We may choose to define this as feature
          */
 
+        /* 0  Control characters */
+        {
+            /*
+             * Note: \x00 is impossible, other representations of
+             * U+0000 are covered under 4.3
+             */
+            "\x01\x02\x03\x04\x05\x06\x07"
+            "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
+            "\x10\x11\x12\x13\x14\x15\x16\x17"
+            "\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
+            /* bug: not corrected (valid UTF-8, but invalid JSON) */
+            "\x01\x02\x03\x04\x05\x06\x07"
+            "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
+            "\x10\x11\x12\x13\x14\x15\x16\x17"
+            "\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
+            "\\u0001\\u0002\\u0003\\u0004\\u0005\\u0006\\u0007"
+            "\\b\\t\\n\\u000B\\f\\r\\u000E\\u000F"
+            "\\u0010\\u0011\\u0012\\u0013\\u0014\\u0015\\u0016\\u0017"
+            "\\u0018\\u0019\\u001A\\u001B\\u001C\\u001D\\u001E\\u001F",
+        },
         /* 1  Some correct UTF-8 text */
         {
             /* a bit of German */
@@ -180,14 +200,14 @@ static void utf8_string(void)
         /* 2  Boundary condition test cases */
         /* 2.1  First possible sequence of a certain length */
         /*
-         * 2.1.1  1 byte U+0001
-         * \x00 is impossible, test \x01 instead.  Other
-         * representations of U+0000 are covered under 4.3.
+         * 2.1.1 1 byte U+0020
+         * Control characters are already covered by their own test
+         * case under 0.  Test the first 1 byte non-control character
+         * here.
          */
         {
-            "\x01",
-            "\x01",
-            "\\u0001",
+            " ",
+            " ",
         },
         /* 2.1.2  2 bytes U+0080 */
         {
@@ -1302,6 +1322,10 @@ static void junk_input(void)
     g_assert(!err);             /* BUG */
     g_assert(obj == NULL);
 
+    obj = qobject_from_json("{\x01", &err);
+    g_assert(!err);             /* BUG */
+    g_assert(obj == NULL);
+
     obj = qobject_from_json("[0\xFF]", &err);
     error_free_or_abort(&err);
     g_assert(obj == NULL);
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 5e56be105e..5117a1ab25 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -71,6 +71,13 @@ static void test_malformed(QTestState *qts)
     qobject_unref(resp);
     g_assert(recovered(qts));
 
+    /* lexical error: funny control character outside string */
+    qtest_qmp_send_raw(qts, "{\x01");
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
+    qobject_unref(resp);
+    g_assert(recovered(qts));
+
     /* lexical error: impossible byte in string */
     qtest_qmp_send_raw(qts, "{'bad \xFF");
     resp = qtest_qmp_receive(qts);
@@ -78,6 +85,13 @@ static void test_malformed(QTestState *qts)
     qobject_unref(resp);
     g_assert(recovered(qts));
 
+    /* lexical error: control character in string */
+    qtest_qmp_send_raw(qts, "{'execute': 'nonexistent', 'id':'\n'}");
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "CommandNotFound"); /* BUG */
+    qobject_unref(resp);
+    g_assert(recovered(qts));
+
     /* lexical error: interpolation */
     qtest_qmp_send_raw(qts, "%%p\n");
     resp = qtest_qmp_receive(qts);
-- 
2.17.1

* [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation more thoroughly
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (13 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 17:26   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token Markus Armbruster
                   ` (41 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 141 ++++++++++++++++++++++++--------------------
 1 file changed, 77 insertions(+), 64 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 33bd5854fc..fda2b014a3 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -845,30 +845,6 @@ static void utf8_string(void)
     }
 }
 
-static void vararg_string(void)
-{
-    int i;
-    struct {
-        const char *decoded;
-    } test_cases[] = {
-        { "hello world" },
-        { "the quick brown fox jumped over the fence" },
-        {}
-    };
-
-    for (i = 0; test_cases[i].decoded; i++) {
-        QString *str;
-
-        str = qobject_to(QString,
-                         qobject_from_jsonf_nofail("%s",
-                                                   test_cases[i].decoded));
-        g_assert(str);
-        g_assert(strcmp(qstring_get_str(str), test_cases[i].decoded) == 0);
-
-        qobject_unref(str);
-    }
-}
-
 static void simple_number(void)
 {
     int i;
@@ -986,29 +962,6 @@ static void float_number(void)
     }
 }
 
-static void vararg_number(void)
-{
-    QNum *qnum;
-    int value = 0x2342;
-    long long value_ll = 0x2342342343LL;
-    double valuef = 2.323423423;
-    int64_t val;
-
-    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%d", value));
-    g_assert(qnum_get_try_int(qnum, &val));
-    g_assert_cmpint(val, ==, value);
-    qobject_unref(qnum);
-
-    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%lld", value_ll));
-    g_assert(qnum_get_try_int(qnum, &val));
-    g_assert_cmpint(val, ==, value_ll);
-    qobject_unref(qnum);
-
-    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%f", valuef));
-    g_assert(qnum_get_double(qnum) == valuef);
-    qobject_unref(qnum);
-}
-
 static void keyword_literal(void)
 {
     QObject *obj;
@@ -1038,17 +991,6 @@ static void keyword_literal(void)
 
     qobject_unref(qbool);
 
-    qbool = qobject_to(QBool, qobject_from_jsonf_nofail("%i", false));
-    g_assert(qbool);
-    g_assert(qbool_get_bool(qbool) == false);
-    qobject_unref(qbool);
-
-    /* Test that non-zero values other than 1 get collapsed to true */
-    qbool = qobject_to(QBool, qobject_from_jsonf_nofail("%i", 2));
-    g_assert(qbool);
-    g_assert(qbool_get_bool(qbool) == true);
-    qobject_unref(qbool);
-
     obj = qobject_from_json("null", &error_abort);
     g_assert(obj != NULL);
     g_assert(qobject_type(obj) == QTYPE_QNULL);
@@ -1060,6 +1002,78 @@ static void keyword_literal(void)
     qobject_unref(null);
 }
 
+static void interpolation(void)
+{
+    long long value_lld = 0x123456789abcdefLL;
+    long value_ld = (long)value_lld;
+    int value_d = (int)value_lld;
+    unsigned long long value_llu = 0xfedcba9876543210ULL;
+    unsigned long value_lu = (unsigned long)value_llu;
+    unsigned value_u = (unsigned)value_llu;
+    double value_f = 2.323423423;
+    const char *value_s = "hello world";
+    QObject *value_p = QOBJECT(qnull());
+    QBool *qbool;
+    QNum *qnum;
+    QString *qstr;
+    QObject *qobj;
+
+    /* bool */
+
+    qbool = qobject_to(QBool, qobject_from_jsonf_nofail("%i", false));
+    g_assert(qbool);
+    g_assert(qbool_get_bool(qbool) == false);
+    qobject_unref(qbool);
+
+    /* Test that non-zero values other than 1 get collapsed to true */
+    qbool = qobject_to(QBool, qobject_from_jsonf_nofail("%i", 2));
+    g_assert(qbool);
+    g_assert(qbool_get_bool(qbool) == true);
+    qobject_unref(qbool);
+
+    /* number */
+
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%d", value_d));
+    g_assert_cmpint(qnum_get_int(qnum), ==, value_d);
+    qobject_unref(qnum);
+
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%ld", value_ld));
+    g_assert_cmpint(qnum_get_int(qnum), ==, value_ld);
+    qobject_unref(qnum);
+
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%lld", value_lld));
+    g_assert_cmpint(qnum_get_int(qnum), ==, value_lld);
+    qobject_unref(qnum);
+
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%u", value_u));
+    g_assert_cmpuint(qnum_get_uint(qnum), ==, value_u);
+    qobject_unref(qnum);
+
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%lu", value_lu));
+    g_assert_cmpuint(qnum_get_uint(qnum), ==, value_lu);
+    qobject_unref(qnum);
+
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%llu", value_llu));
+    g_assert_cmpuint(qnum_get_uint(qnum), ==, value_llu);
+    qobject_unref(qnum);
+
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%f", value_f));
+    g_assert(qnum_get_double(qnum) == value_f);
+    qobject_unref(qnum);
+
+    /* string */
+
+    qstr = qobject_to(QString,
+                     qobject_from_jsonf_nofail("%s", value_s));
+    g_assert_cmpstr(qstring_get_try_str(qstr), ==, value_s);
+    qobject_unref(qstr);
+
+    /* object */
+
+    qobj = qobject_from_jsonf_nofail("%p", value_p);
+    g_assert(qobj == value_p);
+}
+
 static void simple_dict(void)
 {
     int i;
@@ -1278,7 +1292,7 @@ static void simple_whitespace(void)
     }
 }
 
-static void simple_varargs(void)
+static void simple_interpolation(void)
 {
     QObject *embedded_obj;
     QObject *obj;
@@ -1470,22 +1484,21 @@ int main(int argc, char **argv)
 
     g_test_add_func("/literals/string/escaped", escaped_string);
     g_test_add_func("/literals/string/utf8", utf8_string);
-    g_test_add_func("/literals/string/vararg", vararg_string);
 
     g_test_add_func("/literals/number/simple", simple_number);
     g_test_add_func("/literals/number/large", large_number);
     g_test_add_func("/literals/number/float", float_number);
-    g_test_add_func("/literals/number/vararg", vararg_number);
 
     g_test_add_func("/literals/keyword", keyword_literal);
 
+    g_test_add_func("/literals/interpolation", interpolation);
+
     g_test_add_func("/dicts/simple_dict", simple_dict);
     g_test_add_func("/dicts/large_dict", large_dict);
     g_test_add_func("/lists/simple_list", simple_list);
 
-    g_test_add_func("/whitespace/simple_whitespace", simple_whitespace);
-
-    g_test_add_func("/varargs/simple_varargs", simple_varargs);
+    g_test_add_func("/mixed/simple_whitespace", simple_whitespace);
+    g_test_add_func("/mixed/interpolation", simple_interpolation);
 
     g_test_add_func("/errors/empty", empty_input);
     g_test_add_func("/errors/blank", blank_input);
-- 
2.17.1

* [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (14 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation " Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 17:42   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters Markus Armbruster
                   ` (40 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

json_lexer[] maps (lexer state, input character) to the new lexer
state.  The input character is consumed unless the new state is
terminal and the input character doesn't belong to this token,
i.e. the state transition uses look-ahead.  When this is the case,
input character '\0' would result in the same state transition.
TERMINAL_NEEDED_LOOKAHEAD() exploits this.

Except this is wrong for transitions to IN_ERROR.  There, the
offending input character is in fact consumed: case IN_ERROR returns.
It isn't added to the JSON_ERROR token, though.

Fix that by making TERMINAL_NEEDED_LOOKAHEAD() return false for
transitions to IN_ERROR.

There's a slight complication.  json_lexer_flush() passes input
character '\0' to flush an incomplete token.  If this results in
JSON_ERROR, we'd now add the '\0' to the token.  Suppress that.
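The lookahead trick and this fix can be reproduced in a toy table-driven lexer.  Everything in the sketch below is hypothetical (the states, table, struct toy_lexer and feed() are made up, and flushing is not modeled).  It recognizes only digit strings, so a non-digit after a number is lookahead, while a non-digit at the start of a token is an error.  Like case IN_ERROR, the error path emits, resets and returns, so the bad character is consumed either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy states; ST_ERROR is 0, so the table defaults to it */
enum { ST_ERROR, ST_START, ST_NUM, TK_NUM };

static unsigned char table[TK_NUM][256];

static void init_table(void)
{
    for (int c = 0; c < 256; c++) {
        /* a non-digit ends a number without being part of it */
        table[ST_NUM][c] = (c >= '0' && c <= '9') ? ST_NUM : TK_NUM;
    }
    for (int c = '0'; c <= '9'; c++) {
        table[ST_START][c] = ST_NUM;
    }
}

/* Mirrors TERMINAL_NEEDED_LOOKAHEAD() after this patch */
#define NEEDED_LOOKAHEAD(old, next) \
    ((next) != ST_ERROR && table[(old)][0] == (next))

struct toy_lexer {
    int state;
    char token[64];
    size_t len;
    char out[256];      /* emitted tokens as "KIND:text;" */
};

static void emit(struct toy_lexer *lx, const char *kind)
{
    lx->token[lx->len] = '\0';
    assert(strlen(lx->out) + strlen(kind) + lx->len + 2 <= sizeof(lx->out));
    strcat(lx->out, kind);
    strcat(lx->out, lx->token);
    strcat(lx->out, ";");
    lx->len = 0;
}

static void feed(struct toy_lexer *lx, char ch)
{
    bool consumed;

    do {
        int new_state = table[lx->state][(unsigned char)ch];

        consumed = !NEEDED_LOOKAHEAD(lx->state, new_state);
        if (consumed) {
            assert(lx->len < sizeof(lx->token) - 1);
            lx->token[lx->len++] = ch;
        }
        if (new_state == ST_ERROR) {
            /* like case IN_ERROR: emit, reset, and return */
            emit(lx, "ERR:");
            lx->state = ST_START;
            return;
        }
        if (new_state == TK_NUM) {
            emit(lx, "NUM:");
            new_state = ST_START;
        }
        lx->state = new_state;
    } while (!consumed);
}
```

Feeding "12a" emits a number token "12" and an error token "a".  Dropping the `(next) != ST_ERROR` test reproduces the old behavior: the error token comes out empty.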

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 980ba159d6..7c0875d225 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -76,7 +76,7 @@ QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START);
    from OLD_STATE required lookahead.  This happens whenever the table
    below uses the TERMINAL macro.  */
 #define TERMINAL_NEEDED_LOOKAHEAD(old_state, terminal) \
-            (json_lexer[(old_state)][0] == (terminal))
+    (terminal != IN_ERROR && json_lexer[(old_state)][0] == (terminal))
 
 static const uint8_t json_lexer[][256] =  {
     /* Relies on default initialization to IN_ERROR! */
@@ -304,7 +304,7 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         assert(lexer->state <= ARRAY_SIZE(json_lexer));
         new_state = json_lexer[lexer->state][(uint8_t)ch];
         char_consumed = !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
-        if (char_consumed) {
+        if (char_consumed && !flush) {
             g_string_append_c(lexer->token, ch);
         }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (15 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 18:26   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation Markus Armbruster
                   ` (39 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Fix the lexer to reject unescaped control characters in JSON strings,
in accordance with RFC 7159.

Bonus: we now recover more nicely from unclosed strings.  E.g.

    {"one: 1}\n{"two": 2}

now recovers cleanly after the newline, where before the lexer
remained confused until the next unpaired double quote or lexical
error.
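A standalone scanner shows why refusing bytes below 0x20 inside a string also helps recovery: the unterminated string in the example above now ends at the newline instead of swallowing the next line.  scan_dq_string() is a hypothetical illustration, not a QEMU function, and it skips escape sequences rather than validating them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: scan the double-quoted string starting at @s,
 * which must point to the opening quote.  Return a pointer just past
 * the closing quote, or NULL if an unescaped control character (this
 * includes the terminating NUL) comes first; then *@bad points to the
 * offending byte.  Escape sequences are skipped, not validated.
 */
static const char *scan_dq_string(const char *s, const char **bad)
{
    assert(*s == '"');
    for (s++; ; s++) {
        if ((unsigned char)*s < 0x20) {
            *bad = s;
            return NULL;
        }
        if (*s == '"') {
            return s + 1;
        }
        if (*s == '\\') {
            s++;                /* the escaped character */
            if ((unsigned char)*s < 0x20) {
                *bad = s;
                return NULL;
            }
        }
    }
}
```

On {"one: 1}\n{"two": 2}, the first scan stops at the newline, and scanning the string that starts two bytes later succeeds.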

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 4 ++--
 tests/check-qjson.c  | 6 +-----
 tests/qmp-test.c     | 4 ++--
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 7c0875d225..e85e9a78ff 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -115,7 +115,7 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_DQ_UCODE0,
     },
     [IN_DQ_STRING] = {
-        [1 ... 0xBF] = IN_DQ_STRING,
+        [0x20 ... 0xBF] = IN_DQ_STRING,
         [0xC2 ... 0xF4] = IN_DQ_STRING,
         ['\\'] = IN_DQ_STRING_ESCAPE,
         ['"'] = JSON_STRING,
@@ -155,7 +155,7 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_SQ_UCODE0,
     },
     [IN_SQ_STRING] = {
-        [1 ... 0xBF] = IN_SQ_STRING,
+        [0x20 ... 0xBF] = IN_SQ_STRING,
         [0xC2 ... 0xF4] = IN_SQ_STRING,
         ['\\'] = IN_SQ_STRING_ESCAPE,
         ['\''] = JSON_STRING,
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index fda2b014a3..7d8ce5c68d 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -171,11 +171,7 @@ static void utf8_string(void)
             "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
             "\x10\x11\x12\x13\x14\x15\x16\x17"
             "\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
-            /* bug: not corrected (valid UTF-8, but invalid JSON) */
-            "\x01\x02\x03\x04\x05\x06\x07"
-            "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
-            "\x10\x11\x12\x13\x14\x15\x16\x17"
-            "\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
+            NULL,
             "\\u0001\\u0002\\u0003\\u0004\\u0005\\u0006\\u0007"
             "\\b\\t\\n\\u000B\\f\\r\\u000E\\u000F"
             "\\u0010\\u0011\\u0012\\u0013\\u0014\\u0015\\u0016\\u0017"
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 5117a1ab25..b77987b644 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -86,9 +86,9 @@ static void test_malformed(QTestState *qts)
     g_assert(recovered(qts));
 
     /* lexical error: control character in string */
-    qtest_qmp_send_raw(qts, "{'execute': 'nonexistent', 'id':'\n'}");
+    qtest_qmp_send_raw(qts, "{'execute': 'nonexistent', 'id':'\n");
     resp = qtest_qmp_receive(qts);
-    g_assert_cmpstr(get_error_class(resp), ==, "CommandNotFound"); /* BUG */
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
     qobject_unref(resp);
     g_assert(recovered(qts));
 
-- 
2.17.1

* [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (16 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 18:49   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop Markus Armbruster
                   ` (38 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 80 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 71 insertions(+), 9 deletions(-)
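The RFC 7159 `number` production quoted in the new lexer comment can be checked by hand.  Below is a minimal recognizer sketch written directly from that ABNF rather than from QEMU's state table; json_number_matches() and is_digit() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* ABNF core rule DIGIT */
static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Sketch: return true if all of @s matches
 *     number = [ minus ] int [ frac ] [ exp ]
 */
static bool json_number_matches(const char *s)
{
    assert(s);
    if (*s == '-') {                            /* [ minus ] */
        s++;
    }
    if (*s == '0') {                            /* int = zero / ... */
        s++;
    } else if (*s >= '1' && *s <= '9') {        /* ... digit1-9 *DIGIT */
        while (is_digit(*++s)) {
        }
    } else {
        return false;
    }
    if (*s == '.') {                            /* frac = "." 1*DIGIT */
        if (!is_digit(*++s)) {
            return false;
        }
        while (is_digit(*++s)) {
        }
    }
    if (*s == 'e' || *s == 'E') {               /* exp = e [ - / + ] 1*DIGIT */
        s++;
        if (*s == '-' || *s == '+') {
            s++;
        }
        if (!is_digit(*s)) {
            return false;
        }
        while (is_digit(*++s)) {
        }
    }
    return *s == '\0';
}
```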

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index e85e9a78ff..109a7d8bb8 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -18,21 +18,83 @@
 #define MAX_TOKEN_SIZE (64ULL << 20)
 
 /*
- * Required by JSON (RFC 7159):
+ * From RFC 7159 "The JavaScript Object Notation (JSON) Data
+ * Interchange Format", with [comments in brackets]:
  *
- * \"([^\\\"]|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*\"
- * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
- * [{}\[\],:]
- * [a-z]+   # covers null, true, false
+ * The set of tokens includes six structural characters, strings,
+ * numbers, and three literal names.
  *
- * Extension of '' strings:
+ * These are the six structural characters:
  *
- * '([^\\']|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*'
+ *    begin-array     = ws %x5B ws  ; [ left square bracket
+ *    begin-object    = ws %x7B ws  ; { left curly bracket
+ *    end-array       = ws %x5D ws  ; ] right square bracket
+ *    end-object      = ws %x7D ws  ; } right curly bracket
+ *    name-separator  = ws %x3A ws  ; : colon
+ *    value-separator = ws %x2C ws  ; , comma
  *
- * Extension for vararg handling in JSON construction:
+ * Insignificant whitespace is allowed before or after any of the six
+ * structural characters.
+ * [This lexer accepts it before or after any token, which is actually
+ * the same, as the grammar always has structural characters between
+ * other tokens.]
  *
- * %((l|ll|I64)?d|[ipsf])
+ *    ws = *(
+ *           %x20 /              ; Space
+ *           %x09 /              ; Horizontal tab
+ *           %x0A /              ; Line feed or New line
+ *           %x0D )              ; Carriage return
  *
+ * [...] three literal names:
+ *    false null true
+ *  [This lexer accepts [a-z]+, and leaves rejecting unknown literal
+ *  names to the parser.]
+ *
+ * [Numbers:]
+ *
+ *    number = [ minus ] int [ frac ] [ exp ]
+ *    decimal-point = %x2E       ; .
+ *    digit1-9 = %x31-39         ; 1-9
+ *    e = %x65 / %x45            ; e E
+ *    exp = e [ minus / plus ] 1*DIGIT
+ *    frac = decimal-point 1*DIGIT
+ *    int = zero / ( digit1-9 *DIGIT )
+ *    minus = %x2D               ; -
+ *    plus = %x2B                ; +
+ *    zero = %x30                ; 0
+ *
+ * [Strings:]
+ *    string = quotation-mark *char quotation-mark
+ *
+ *    char = unescaped /
+ *        escape (
+ *            %x22 /          ; "    quotation mark  U+0022
+ *            %x5C /          ; \    reverse solidus U+005C
+ *            %x2F /          ; /    solidus         U+002F
+ *            %x62 /          ; b    backspace       U+0008
+ *            %x66 /          ; f    form feed       U+000C
+ *            %x6E /          ; n    line feed       U+000A
+ *            %x72 /          ; r    carriage return U+000D
+ *            %x74 /          ; t    tab             U+0009
+ *            %x75 4HEXDIG )  ; uXXXX                U+XXXX
+ *    escape = %x5C              ; \
+ *    quotation-mark = %x22      ; "
+ *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
+ *
+ *
+ * Extensions over RFC 7159:
+ * - Extra escape sequence in strings:
+ *   0x27 (apostrophe) is recognized after escape, too
+ * - Single-quoted strings:
+ *   Like double-quoted strings, except they're delimited by %x27
+ *   (apostrophe) instead of %x22 (quotation mark), and can't contain
+ *   unescaped apostrophe, but can contain unescaped quotation mark.
+ * - Interpolation:
+ *   interpolation = %((l|ll|I64)[du]|[ipsf])
+ *
+ * Note:
+ * - Input must be encoded in UTF-8.
+ * - Decoding and validating is left to the parser.
  */
 
 enum json_lexer_state {
-- 
2.17.1

* [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (17 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 18:52   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected Markus Armbruster
                   ` (37 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Simplify loop control, and assert that the string ends with the
appropriate quote (the lexer ensures it does).
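The new loop shape, reduced to a self-contained sketch: advance with *++ptr in the loop condition, let each escape consume its character through switch (*++ptr), and assert the closing quote rather than test for it, because the lexer guarantees it.  unquote() is a hypothetical name; \uXXXX handling and QString are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Decode the quoted token at @ptr (either quote character) into @out,
 * handling only single-character escapes.  Return the number of bytes
 * written, or -1 on an unknown escape.  The closing quote is assumed
 * to be present, as the lexer guarantees.
 */
static int unquote(const char *ptr, char *out, size_t size)
{
    char quote = *ptr;
    size_t n = 0;

    assert(quote == '"' || quote == '\'');
    while (*++ptr != quote) {
        char c;

        assert(*ptr);
        if (*ptr == '\\') {
            switch (*++ptr) {
            case '"':
            case '\'':
            case '\\':
            case '/':
                c = *ptr;
                break;
            case 'b':
                c = '\b';
                break;
            case 'f':
                c = '\f';
                break;
            case 'n':
                c = '\n';
                break;
            case 'r':
                c = '\r';
                break;
            case 't':
                c = '\t';
                break;
            default:
                return -1;
            }
        } else {
            c = *ptr;
        }
        assert(n + 1 < size);
        out[n++] = c;
    }
    out[n] = '\0';
    return (int)n;
}
```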

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 35 +++++++++--------------------------
 1 file changed, 9 insertions(+), 26 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index a5aa790d62..e00405745f 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -132,66 +132,50 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
 {
     const char *ptr = token->str;
     QString *str;
-    int double_quote = 1;
-
-    if (*ptr == '"') {
-        double_quote = 1;
-    } else {
-        double_quote = 0;
-    }
-    ptr++;
+    char quote;
 
+    assert(*ptr == '"' || *ptr == '\'');
+    quote = *ptr;
     str = qstring_new();
-    while (*ptr && 
-           ((double_quote && *ptr != '"') || (!double_quote && *ptr != '\''))) {
+
+    while (*++ptr != quote) {
+        assert(*ptr);
         if (*ptr == '\\') {
-            ptr++;
-
-            switch (*ptr) {
+            switch (*++ptr) {
             case '"':
                 qstring_append(str, "\"");
-                ptr++;
                 break;
             case '\'':
                 qstring_append(str, "'");
-                ptr++;
                 break;
             case '\\':
                 qstring_append(str, "\\");
-                ptr++;
                 break;
             case '/':
                 qstring_append(str, "/");
-                ptr++;
                 break;
             case 'b':
                 qstring_append(str, "\b");
-                ptr++;
                 break;
             case 'f':
                 qstring_append(str, "\f");
-                ptr++;
                 break;
             case 'n':
                 qstring_append(str, "\n");
-                ptr++;
                 break;
             case 'r':
                 qstring_append(str, "\r");
-                ptr++;
                 break;
             case 't':
                 qstring_append(str, "\t");
-                ptr++;
                 break;
             case 'u': {
                 uint16_t unicode_char = 0;
                 char utf8_char[4];
                 int i = 0;
 
-                ptr++;
-
                 for (i = 0; i < 4; i++) {
+                    ptr++;
                     if (qemu_isxdigit(*ptr)) {
                         unicode_char |= hex2decimal(*ptr) << ((3 - i) * 4);
                     } else {
@@ -199,7 +183,6 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
                                     "invalid hex escape sequence in string");
                         goto out;
                     }
-                    ptr++;
                 }
 
                 wchar_to_utf8(unicode_char, utf8_char, sizeof(utf8_char));
@@ -212,7 +195,7 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
         } else {
             char dummy[2];
 
-            dummy[0] = *ptr++;
+            dummy[0] = *ptr;
             dummy[1] = 0;
 
             qstring_append(str, dummy);
-- 
2.17.1

* [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (18 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 18:55   ` Eric Blake
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences Markus Armbruster
                   ` (36 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The JSON parser rejects some invalid sequences, but accepts others
without correcting the problem.

We should either reject all invalid sequences, or minimize overlong
sequences and replace all other invalid sequences by a suitable
replacement character.  A common choice for replacement is U+FFFD.

I'm going to implement the former.  Update the comments in
utf8_string() to expect this.
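For the "reject all invalid sequences" option, a strict well-formedness check looks roughly like the sketch below.  It follows RFC 3629, is not QEMU's decoder, and utf8_valid() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch: return true only if @str is well-formed UTF-8 as defined by
 * RFC 3629.  Overlong forms, UTF-16 surrogates (U+D800..U+DFFF), code
 * points above U+10FFFF, truncated sequences, stray continuation bytes
 * and the obsolete 5- and 6-byte forms are all rejected.
 */
static bool utf8_valid(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;

    assert(str);
    while (*s) {
        unsigned cp, min;
        int len, i;

        if (*s < 0x80) {
            s++;
            continue;
        } else if (*s >= 0xC0 && *s <= 0xDF) {
            len = 2; cp = *s & 0x1F; min = 0x80;
        } else if (*s >= 0xE0 && *s <= 0xEF) {
            len = 3; cp = *s & 0x0F; min = 0x800;
        } else if (*s >= 0xF0 && *s <= 0xF7) {
            len = 4; cp = *s & 0x07; min = 0x10000;
        } else {
            return false;       /* 0x80..0xBF or 0xF8..0xFF as lead byte */
        }
        for (i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80) {
                return false;   /* also stops at the terminating NUL */
            }
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        s += len;
    }
    return true;
}
```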

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 151 +++++++++++++++++++++-----------------------
 1 file changed, 71 insertions(+), 80 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 7d8ce5c68d..8ce047fad0 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -126,13 +126,7 @@ static void utf8_string(void)
      * They're all marked "bug:" below, and are to be replaced by
      * correct ones as the bugs get fixed.
      *
-     * The JSON parser rejects some invalid sequences, but accepts
-     * others without correcting the problem.
-     *
-     * We should either reject all invalid sequences, or minimize
-     * overlong sequences and replace all other invalid sequences by a
-     * suitable replacement character.  A common choice for
-     * replacement is U+FFFD.
+     * The JSON parser rejects some, but not all invalid sequences.
      *
      * Problem: we can't easily deal with embedded U+0000.  Parsing
      * the JSON string "this \\u0000" is fun" yields "this \0 is fun",
@@ -154,11 +148,8 @@ static void utf8_string(void)
     } test_cases[] = {
         /*
          * Bug markers used here:
-         * - bug: not corrected
-         *   JSON parser fails to correct invalid sequence(s)
-         * - bug: rejected
-         *   JSON parser rejects invalid sequence(s)
-         *   We may choose to define this as feature
+         * - bug: not rejected
+         *   JSON parser fails to reject invalid sequence(s)
          */
 
         /* 0  Control characters */
@@ -226,13 +217,13 @@ static void utf8_string(void)
         /* 2.1.5  5 bytes U+200000 */
         {
             "\xF8\x88\x80\x80\x80",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 2.1.6  6 bytes U+4000000 */
         {
             "\xFC\x84\x80\x80\x80\x80",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 2.2  Last possible sequence of a certain length */
@@ -265,19 +256,19 @@ static void utf8_string(void)
         /* 2.2.4  4 bytes U+1FFFFF */
         {
             "\xF7\xBF\xBF\xBF",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 2.2.5  5 bytes U+3FFFFFF */
         {
             "\xFB\xBF\xBF\xBF\xBF",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 2.2.6  6 bytes U+7FFFFFFF */
         {
             "\xFD\xBF\xBF\xBF\xBF\xBF",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 2.3  Other boundary conditions */
@@ -316,49 +307,49 @@ static void utf8_string(void)
         /* 3.1.1  First continuation byte */
         {
             "\x80",
-            "\x80",             /* bug: not corrected */
+            "\x80",             /* bug: not rejected */
             "\\uFFFD",
         },
         /* 3.1.2  Last continuation byte */
         {
             "\xBF",
-            "\xBF",             /* bug: not corrected */
+            "\xBF",             /* bug: not rejected */
             "\\uFFFD",
         },
         /* 3.1.3  2 continuation bytes */
         {
             "\x80\xBF",
-            "\x80\xBF",         /* bug: not corrected */
+            "\x80\xBF",         /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         /* 3.1.4  3 continuation bytes */
         {
             "\x80\xBF\x80",
-            "\x80\xBF\x80",     /* bug: not corrected */
+            "\x80\xBF\x80",     /* bug: not rejected */
             "\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.5  4 continuation bytes */
         {
             "\x80\xBF\x80\xBF",
-            "\x80\xBF\x80\xBF", /* bug: not corrected */
+            "\x80\xBF\x80\xBF", /* bug: not rejected */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.6  5 continuation bytes */
         {
             "\x80\xBF\x80\xBF\x80",
-            "\x80\xBF\x80\xBF\x80", /* bug: not corrected */
+            "\x80\xBF\x80\xBF\x80", /* bug: not rejected */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.7  6 continuation bytes */
         {
             "\x80\xBF\x80\xBF\x80\xBF",
-            "\x80\xBF\x80\xBF\x80\xBF", /* bug: not corrected */
+            "\x80\xBF\x80\xBF\x80\xBF", /* bug: not rejected */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.8  7 continuation bytes */
         {
             "\x80\xBF\x80\xBF\x80\xBF\x80",
-            "\x80\xBF\x80\xBF\x80\xBF\x80", /* bug: not corrected */
+            "\x80\xBF\x80\xBF\x80\xBF\x80", /* bug: not rejected */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.9  Sequence of all 64 possible continuation bytes */
@@ -371,7 +362,7 @@ static void utf8_string(void)
             "\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
             "\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7"
             "\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF",
-             /* bug: not corrected */
+             /* bug: not rejected */
             "\x80\x81\x82\x83\x84\x85\x86\x87"
             "\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F"
             "\x90\x91\x92\x93\x94\x95\x96\x97"
@@ -396,7 +387,7 @@ static void utf8_string(void)
             "\xC8 \xC9 \xCA \xCB \xCC \xCD \xCE \xCF "
             "\xD0 \xD1 \xD2 \xD3 \xD4 \xD5 \xD6 \xD7 "
             "\xD8 \xD9 \xDA \xDB \xDC \xDD \xDE \xDF ",
-            NULL,               /* bug: rejected (partly, see FIXME below) */
+            NULL,               /* bug: accepted partly, see FIXME below */
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
@@ -406,7 +397,7 @@ static void utf8_string(void)
         {
             "\xE0 \xE1 \xE2 \xE3 \xE4 \xE5 \xE6 \xE7 "
             "\xE8 \xE9 \xEA \xEB \xEC \xED \xEE \xEF ",
-            /* bug: not corrected */
+            /* bug: not rejected */
             "\xE0 \xE1 \xE2 \xE3 \xE4 \xE5 \xE6 \xE7 "
             "\xE8 \xE9 \xEA \xEB \xEC \xED \xEE \xEF ",
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
@@ -415,131 +406,131 @@ static void utf8_string(void)
         /* 3.2.3  All 8 first bytes of 4-byte sequences, followed by space */
         {
             "\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 ",
-            NULL,               /* bug: rejected (partly, see FIXME below) */
+            NULL,               /* bug: accepted partly, see FIXME below */
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
         },
         /* 3.2.4  All 4 first bytes of 5-byte sequences, followed by space */
         {
             "\xF8 \xF9 \xFA \xFB ",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
         },
         /* 3.2.5  All 2 first bytes of 6-byte sequences, followed by space */
         {
             "\xFC \xFD ",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD \\uFFFD ",
         },
         /* 3.3  Sequences with last continuation byte missing */
         /* 3.3.1  2-byte sequence with last byte missing (U+0000) */
         {
             "\xC0",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.2  3-byte sequence with last byte missing (U+0000) */
         {
             "\xE0\x80",
-            "\xE0\x80",           /* bug: not corrected */
+            "\xE0\x80",         /* bug: not rejected */
             "\\uFFFD",
         },
         /* 3.3.3  4-byte sequence with last byte missing (U+0000) */
         {
             "\xF0\x80\x80",
-            "\xF0\x80\x80",     /* bug: not corrected */
+            "\xF0\x80\x80",     /* bug: not rejected */
             "\\uFFFD",
         },
         /* 3.3.4  5-byte sequence with last byte missing (U+0000) */
         {
             "\xF8\x80\x80\x80",
-            NULL,                   /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.5  6-byte sequence with last byte missing (U+0000) */
         {
             "\xFC\x80\x80\x80\x80",
-            NULL,                        /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.6  2-byte sequence with last byte missing (U+07FF) */
         {
             "\xDF",
-            "\xDF",             /* bug: not corrected */
+            "\xDF",             /* bug: not rejected */
             "\\uFFFD",
         },
         /* 3.3.7  3-byte sequence with last byte missing (U+FFFF) */
         {
             "\xEF\xBF",
-            "\xEF\xBF",           /* bug: not corrected */
+            "\xEF\xBF",         /* bug: not rejected */
             "\\uFFFD",
         },
         /* 3.3.8  4-byte sequence with last byte missing (U+1FFFFF) */
         {
             "\xF7\xBF\xBF",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.9  5-byte sequence with last byte missing (U+3FFFFFF) */
         {
             "\xFB\xBF\xBF\xBF",
-            NULL,                 /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.10  6-byte sequence with last byte missing (U+7FFFFFFF) */
         {
             "\xFD\xBF\xBF\xBF\xBF",
-            NULL,                        /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.4  Concatenation of incomplete sequences */
         {
             "\xC0\xE0\x80\xF0\x80\x80\xF8\x80\x80\x80\xFC\x80\x80\x80\x80"
             "\xDF\xEF\xBF\xF7\xBF\xBF\xFB\xBF\xBF\xBF\xFD\xBF\xBF\xBF\xBF",
-            NULL,               /* bug: rejected (partly, see FIXME below) */
+            NULL,               /* bug: accepted partly, see FIXME below */
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.5  Impossible bytes */
         {
             "\xFE",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             "\xFF",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             "\xFE\xFE\xFF\xFF",
-            NULL,                 /* bug: rejected */
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 4  Overlong sequences */
         /* 4.1  Overlong '/' */
         {
             "\xC0\xAF",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             "\xE0\x80\xAF",
-            "\xE0\x80\xAF",     /* bug: not corrected */
+            "\xE0\x80\xAF",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             "\xF0\x80\x80\xAF",
-            "\xF0\x80\x80\xAF",  /* bug: not corrected */
+            "\xF0\x80\x80\xAF", /* bug: not rejected */
             "\\uFFFD",
         },
         {
             "\xF8\x80\x80\x80\xAF",
-            NULL,                        /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             "\xFC\x80\x80\x80\x80\xAF",
-            NULL,                               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /*
@@ -551,13 +542,13 @@ static void utf8_string(void)
         {
             /* \U+007F */
             "\xC1\xBF",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+07FF */
             "\xE0\x9F\xBF",
-            "\xE0\x9F\xBF",     /* bug: not corrected */
+            "\xE0\x9F\xBF",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
@@ -568,50 +559,50 @@ static void utf8_string(void)
              * also 2.2.3
              */
             "\xF0\x8F\xBF\xBC",
-            "\xF0\x8F\xBF\xBC",   /* bug: not corrected */
+            "\xF0\x8F\xBF\xBC", /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+1FFFFF */
             "\xF8\x87\xBF\xBF\xBF",
-            NULL,                        /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+3FFFFFF */
             "\xFC\x83\xBF\xBF\xBF\xBF",
-            NULL,                               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 4.3  Overlong representation of the NUL character */
         {
             /* \U+0000 */
             "\xC0\x80",
-            NULL,               /* bug: rejected */
+            NULL,
             "\\u0000",
         },
         {
             /* \U+0000 */
             "\xE0\x80\x80",
-            "\xE0\x80\x80",     /* bug: not corrected */
+            "\xE0\x80\x80",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+0000 */
             "\xF0\x80\x80\x80",
-            "\xF0\x80\x80\x80",   /* bug: not corrected */
+            "\xF0\x80\x80\x80", /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+0000 */
             "\xF8\x80\x80\x80\x80",
-            NULL,                        /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+0000 */
             "\xFC\x80\x80\x80\x80\x80",
-            NULL,                               /* bug: rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 5  Illegal code positions */
@@ -619,92 +610,92 @@ static void utf8_string(void)
         {
             /* \U+D800 */
             "\xED\xA0\x80",
-            "\xED\xA0\x80",     /* bug: not corrected */
+            "\xED\xA0\x80",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+DB7F */
             "\xED\xAD\xBF",
-            "\xED\xAD\xBF",     /* bug: not corrected */
+            "\xED\xAD\xBF",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+DB80 */
             "\xED\xAE\x80",
-            "\xED\xAE\x80",     /* bug: not corrected */
+            "\xED\xAE\x80",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+DBFF */
             "\xED\xAF\xBF",
-            "\xED\xAF\xBF",     /* bug: not corrected */
+            "\xED\xAF\xBF",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+DC00 */
             "\xED\xB0\x80",
-            "\xED\xB0\x80",     /* bug: not corrected */
+            "\xED\xB0\x80",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+DF80 */
             "\xED\xBE\x80",
-            "\xED\xBE\x80",     /* bug: not corrected */
+            "\xED\xBE\x80",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+DFFF */
             "\xED\xBF\xBF",
-            "\xED\xBF\xBF",     /* bug: not corrected */
+            "\xED\xBF\xBF",     /* bug: not rejected */
             "\\uFFFD",
         },
         /* 5.2  Paired UTF-16 surrogates */
         {
             /* \U+D800\U+DC00 */
             "\xED\xA0\x80\xED\xB0\x80",
-            "\xED\xA0\x80\xED\xB0\x80", /* bug: not corrected */
+            "\xED\xA0\x80\xED\xB0\x80", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+D800\U+DFFF */
             "\xED\xA0\x80\xED\xBF\xBF",
-            "\xED\xA0\x80\xED\xBF\xBF", /* bug: not corrected */
+            "\xED\xA0\x80\xED\xBF\xBF", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB7F\U+DC00 */
             "\xED\xAD\xBF\xED\xB0\x80",
-            "\xED\xAD\xBF\xED\xB0\x80", /* bug: not corrected */
+            "\xED\xAD\xBF\xED\xB0\x80", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB7F\U+DFFF */
             "\xED\xAD\xBF\xED\xBF\xBF",
-            "\xED\xAD\xBF\xED\xBF\xBF", /* bug: not corrected */
+            "\xED\xAD\xBF\xED\xBF\xBF", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB80\U+DC00 */
             "\xED\xAE\x80\xED\xB0\x80",
-            "\xED\xAE\x80\xED\xB0\x80", /* bug: not corrected */
+            "\xED\xAE\x80\xED\xB0\x80", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB80\U+DFFF */
             "\xED\xAE\x80\xED\xBF\xBF",
-            "\xED\xAE\x80\xED\xBF\xBF", /* bug: not corrected */
+            "\xED\xAE\x80\xED\xBF\xBF", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DBFF\U+DC00 */
             "\xED\xAF\xBF\xED\xB0\x80",
-            "\xED\xAF\xBF\xED\xB0\x80", /* bug: not corrected */
+            "\xED\xAF\xBF\xED\xB0\x80", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DBFF\U+DFFF */
             "\xED\xAF\xBF\xED\xBF\xBF",
-            "\xED\xAF\xBF\xED\xBF\xBF", /* bug: not corrected */
+            "\xED\xAF\xBF\xED\xBF\xBF", /* bug: not rejected */
             "\\uFFFD\\uFFFD",
         },
         /* 5.3  Other illegal code positions */
@@ -712,25 +703,25 @@ static void utf8_string(void)
         {
             /* \U+FFFE */
             "\xEF\xBF\xBE",
-            "\xEF\xBF\xBE",     /* bug: not corrected */
+            "\xEF\xBF\xBE",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* \U+FFFF */
             "\xEF\xBF\xBF",
-            "\xEF\xBF\xBF",     /* bug: not corrected */
+            "\xEF\xBF\xBF",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* U+FDD0 */
             "\xEF\xB7\x90",
-            "\xEF\xB7\x90",     /* bug: not corrected */
+            "\xEF\xB7\x90",     /* bug: not rejected */
             "\\uFFFD",
         },
         {
             /* U+FDEF */
             "\xEF\xB7\xAF",
-            "\xEF\xB7\xAF",     /* bug: not corrected */
+            "\xEF\xB7\xAF",     /* bug: not rejected */
             "\\uFFFD",
         },
         /* Plane 1 .. 16 noncharacters */
@@ -752,7 +743,7 @@ static void utf8_string(void)
             "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
             "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
             "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
-            /* bug: not corrected */
+            /* bug: not rejected */
             "\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
             "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
             "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
-- 
2.17.1


* [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (19 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected Markus Armbruster
@ 2018-08-08 12:02 ` Markus Armbruster
  2018-08-09 22:16   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error Markus Armbruster
                   ` (35 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:

* Malformed sequences

  - Unexpected continuation bytes

  - Missing continuation bytes after start bytes other than
    \xC0..\xC1, \xF5..\xFD.

* Overlong sequences with start bytes other than \xC0..\xC1,
  \xF5..\xFD.

* Invalid code points
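The categories above can be told apart by a strict decoder that reports
why it failed.  A minimal sketch, assuming only the first sequence in
the buffer matters; classify_utf8() and enum utf8_class are
hypothetical names, not QEMU's mod_utf8_codepoint().  Its notion of an
invalid code point (surrogate, noncharacter, beyond U+10FFFF) follows
the expectations in check-qjson's utf8_string().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the first UTF-8 sequence in a buffer */
enum utf8_class {
    UTF8_VALID,
    UTF8_IMPOSSIBLE_BYTE,          /* \xC0..\xC1, \xF5..\xFF: lexer rejects */
    UTF8_UNEXPECTED_CONTINUATION,  /* \x80..\xBF where a start byte belongs */
    UTF8_MISSING_CONTINUATION,     /* start byte without enough followers */
    UTF8_OVERLONG,                 /* longer encoding than necessary */
    UTF8_INVALID_CODEPOINT,        /* surrogate, noncharacter, > U+10FFFF */
};

static int is_valid_codepoint(long cp)
{
    if (cp > 0x10FFFF) {
        return 0;                  /* beyond Unicode range */
    }
    if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE) {
        return 0;                  /* noncharacter */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                  /* UTF-16 surrogate */
    }
    return 1;
}

static enum utf8_class classify_utf8(const char *str, size_t n)
{
    static const long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)str;
    size_t need, i;
    long cp;

    assert(n > 0);
    if (s[0] < 0x80) {
        return UTF8_VALID;
    } else if (s[0] < 0xC0) {
        return UTF8_UNEXPECTED_CONTINUATION;
    } else if (s[0] < 0xC2 || s[0] > 0xF4) {
        return UTF8_IMPOSSIBLE_BYTE;
    } else if (s[0] < 0xE0) {
        need = 2; cp = s[0] & 0x1F;
    } else if (s[0] < 0xF0) {
        need = 3; cp = s[0] & 0x0F;
    } else {
        need = 4; cp = s[0] & 0x07;
    }

    for (i = 1; i < need; i++) {
        if (i >= n || (s[i] & 0xC0) != 0x80) {
            return UTF8_MISSING_CONTINUATION;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[need]) {
        return UTF8_OVERLONG;
    }
    return is_valid_codepoint(cp) ? UTF8_VALID : UTF8_INVALID_CODEPOINT;
}
```

Because truncation is checked before the overlong test, a truncated
overlong sequence such as "\xE0\x80" is reported as missing
continuation bytes.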

Fixing this in the lexer would be bothersome.  Fixing it in the parser
is straightforward, so do that.
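In the parser, each code point is decoded, rejected if invalid, and
then re-encoded into a small buffer that is passed to qstring_append().
That function takes a C string, so the buffer must be NUL-terminated,
which is why it is declared char utf8_buf[5]: up to four bytes plus
the terminator.  A minimal sketch of such an encoder for modified
UTF-8, where U+0000 is encoded as \xC0\x80 so the result never contains
an embedded NUL.  sketch_mod_utf8_encode() is a hypothetical stand-in;
the real mod_utf8_encode() in util/unicode.c may validate differently
(the tests here also expect noncharacters such as U+FFFE to be
rejected).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>          /* ssize_t (POSIX) */

/*
 * Hypothetical sketch: encode @codepoint as modified UTF-8 into @buf,
 * NUL-terminated.  Returns the number of bytes written, excluding the
 * terminator, or -1 for negative values, surrogates, and values
 * beyond U+10FFFF.
 */
static ssize_t sketch_mod_utf8_encode(char buf[], size_t bufsz, int codepoint)
{
    assert(bufsz >= 5);         /* up to 4 bytes plus NUL */

    if (codepoint < 0 || codepoint > 0x10FFFF
        || (codepoint >= 0xD800 && codepoint <= 0xDFFF)) {
        return -1;
    }
    if (codepoint > 0 && codepoint <= 0x7F) {
        buf[0] = (char)codepoint;
        buf[1] = 0;
        return 1;
    }
    if (codepoint <= 0x7FF) {   /* also U+0000, encoded as C0 80 */
        buf[0] = (char)(0xC0 | (codepoint >> 6));
        buf[1] = (char)(0x80 | (codepoint & 0x3F));
        buf[2] = 0;
        return 2;
    }
    if (codepoint <= 0xFFFF) {
        buf[0] = (char)(0xE0 | (codepoint >> 12));
        buf[1] = (char)(0x80 | ((codepoint >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (codepoint & 0x3F));
        buf[3] = 0;
        return 3;
    }
    buf[0] = (char)(0xF0 | (codepoint >> 18));
    buf[1] = (char)(0x80 | ((codepoint >> 12) & 0x3F));
    buf[2] = (char)(0x80 | ((codepoint >> 6) & 0x3F));
    buf[3] = (char)(0x80 | (codepoint & 0x3F));
    buf[4] = 0;
    return 4;
}
```

Given 0x20AC it writes \xE2\x82\xAC plus a NUL and returns 3; given 0
it writes \xC0\x80 and returns 2.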

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qemu/unicode.h |   1 +
 qobject/json-parser.c  |  20 ++++--
 tests/check-qjson.c    | 137 ++++++++++++++---------------------------
 util/unicode.c         |  69 ++++++++++++++++++---
 4 files changed, 122 insertions(+), 105 deletions(-)

diff --git a/include/qemu/unicode.h b/include/qemu/unicode.h
index 71c72db461..7fa10b8e60 100644
--- a/include/qemu/unicode.h
+++ b/include/qemu/unicode.h
@@ -2,5 +2,6 @@
 #define QEMU_UNICODE_H
 
 int mod_utf8_codepoint(const char *s, size_t n, char **end);
+ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint);
 
 #endif
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index e00405745f..6b60b07e09 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/unicode.h"
 #include "qapi/error.h"
 #include "qemu-common.h"
 #include "qapi/qmp/qbool.h"
@@ -133,6 +134,10 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
     const char *ptr = token->str;
     QString *str;
     char quote;
+    int cp;
+    char *end;
+    ssize_t len;
+    char utf8_buf[5];
 
     assert(*ptr == '"' || *ptr == '\'');
     quote = *ptr;
@@ -193,12 +198,15 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
                 goto out;
             }
         } else {
-            char dummy[2];
-
-            dummy[0] = *ptr;
-            dummy[1] = 0;
-
-            qstring_append(str, dummy);
+            cp = mod_utf8_codepoint(ptr, 6, &end);
+            if (cp <= 0) {
+                parse_error(ctxt, token, "invalid UTF-8 sequence in string");
+                goto out;
+            }
+            ptr = end - 1;
+            len = mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp);
+            assert(len >= 0);
+            qstring_append(str, utf8_buf);
         }
     }
 
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 8ce047fad0..e00be5b023 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -121,13 +121,6 @@ static void escaped_string(void)
 static void utf8_string(void)
 {
     /*
-     * FIXME Current behavior for invalid UTF-8 sequences is
-     * incorrect.  This test expects current, incorrect results.
-     * They're all marked "bug:" below, and are to be replaced by
-     * correct ones as the bugs get fixed.
-     *
-     * The JSON parser rejects some, but not all invalid sequences.
-     *
      * Problem: we can't easily deal with embedded U+0000.  Parsing
      * the JSON string "this \\u0000" is fun" yields "this \0 is fun",
      * which gets misinterpreted as NUL-terminated "this ".  We should
@@ -146,12 +139,6 @@ static void utf8_string(void)
         /* Expected unparse output, defaults to @json_in */
         const char *json_out;
     } test_cases[] = {
-        /*
-         * Bug markers used here:
-         * - bug: not rejected
-         *   JSON parser fails to reject invalid sequence(s)
-         */
-
         /* 0  Control characters */
         {
             /*
@@ -299,7 +286,7 @@ static void utf8_string(void)
         {
             /* first one beyond Unicode range: U+110000 */
             "\xF4\x90\x80\x80",
-            "\xF4\x90\x80\x80",
+            NULL,
             "\\uFFFD",
         },
         /* 3  Malformed sequences */
@@ -307,49 +294,49 @@ static void utf8_string(void)
         /* 3.1.1  First continuation byte */
         {
             "\x80",
-            "\x80",             /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.1.2  Last continuation byte */
         {
             "\xBF",
-            "\xBF",             /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.1.3  2 continuation bytes */
         {
             "\x80\xBF",
-            "\x80\xBF",         /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         /* 3.1.4  3 continuation bytes */
         {
             "\x80\xBF\x80",
-            "\x80\xBF\x80",     /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.5  4 continuation bytes */
         {
             "\x80\xBF\x80\xBF",
-            "\x80\xBF\x80\xBF", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.6  5 continuation bytes */
         {
             "\x80\xBF\x80\xBF\x80",
-            "\x80\xBF\x80\xBF\x80", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.7  6 continuation bytes */
         {
             "\x80\xBF\x80\xBF\x80\xBF",
-            "\x80\xBF\x80\xBF\x80\xBF", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.8  7 continuation bytes */
         {
             "\x80\xBF\x80\xBF\x80\xBF\x80",
-            "\x80\xBF\x80\xBF\x80\xBF\x80", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.1.9  Sequence of all 64 possible continuation bytes */
@@ -362,16 +349,7 @@ static void utf8_string(void)
             "\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
             "\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7"
             "\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF",
-             /* bug: not rejected */
-            "\x80\x81\x82\x83\x84\x85\x86\x87"
-            "\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F"
-            "\x90\x91\x92\x93\x94\x95\x96\x97"
-            "\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F"
-            "\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7"
-            "\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
-            "\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7"
-            "\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF",
-            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
@@ -379,6 +357,7 @@ static void utf8_string(void)
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
+            "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
         /* 3.2  Lonely start characters */
         /* 3.2.1  All 32 first bytes of 2-byte sequences, followed by space */
@@ -387,7 +366,7 @@ static void utf8_string(void)
             "\xC8 \xC9 \xCA \xCB \xCC \xCD \xCE \xCF "
             "\xD0 \xD1 \xD2 \xD3 \xD4 \xD5 \xD6 \xD7 "
             "\xD8 \xD9 \xDA \xDB \xDC \xDD \xDE \xDF ",
-            NULL,               /* bug: accepted partly, see FIXME below */
+            NULL,
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
@@ -397,16 +376,14 @@ static void utf8_string(void)
         {
             "\xE0 \xE1 \xE2 \xE3 \xE4 \xE5 \xE6 \xE7 "
             "\xE8 \xE9 \xEA \xEB \xEC \xED \xEE \xEF ",
-            /* bug: not rejected */
-            "\xE0 \xE1 \xE2 \xE3 \xE4 \xE5 \xE6 \xE7 "
-            "\xE8 \xE9 \xEA \xEB \xEC \xED \xEE \xEF ",
+            NULL,
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD "
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
         },
         /* 3.2.3  All 8 first bytes of 4-byte sequences, followed by space */
         {
             "\xF0 \xF1 \xF2 \xF3 \xF4 \xF5 \xF6 \xF7 ",
-            NULL,               /* bug: accepted partly, see FIXME below */
+            NULL,
             "\\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD \\uFFFD ",
         },
         /* 3.2.4  All 4 first bytes of 5-byte sequences, followed by space */
@@ -431,13 +408,13 @@ static void utf8_string(void)
         /* 3.3.2  3-byte sequence with last byte missing (U+0000) */
         {
             "\xE0\x80",
-            "\xE0\x80",         /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.3  4-byte sequence with last byte missing (U+0000) */
         {
             "\xF0\x80\x80",
-            "\xF0\x80\x80",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.4  5-byte sequence with last byte missing (U+0000) */
@@ -455,13 +432,13 @@ static void utf8_string(void)
         /* 3.3.6  2-byte sequence with last byte missing (U+07FF) */
         {
             "\xDF",
-            "\xDF",             /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.7  3-byte sequence with last byte missing (U+FFFF) */
         {
             "\xEF\xBF",
-            "\xEF\xBF",         /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 3.3.8  4-byte sequence with last byte missing (U+1FFFFF) */
@@ -486,7 +463,7 @@ static void utf8_string(void)
         {
             "\xC0\xE0\x80\xF0\x80\x80\xF8\x80\x80\x80\xFC\x80\x80\x80\x80"
             "\xDF\xEF\xBF\xF7\xBF\xBF\xFB\xBF\xBF\xBF\xFD\xBF\xBF\xBF\xBF",
-            NULL,               /* bug: accepted partly, see FIXME below */
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD",
         },
@@ -515,12 +492,12 @@ static void utf8_string(void)
         },
         {
             "\xE0\x80\xAF",
-            "\xE0\x80\xAF",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             "\xF0\x80\x80\xAF",
-            "\xF0\x80\x80\xAF", /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
@@ -548,7 +525,7 @@ static void utf8_string(void)
         {
             /* \U+07FF */
             "\xE0\x9F\xBF",
-            "\xE0\x9F\xBF",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
@@ -559,7 +536,7 @@ static void utf8_string(void)
              * also 2.2.3
              */
             "\xF0\x8F\xBF\xBC",
-            "\xF0\x8F\xBF\xBC", /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
@@ -584,13 +561,13 @@ static void utf8_string(void)
         {
             /* \U+0000 */
             "\xE0\x80\x80",
-            "\xE0\x80\x80",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+0000 */
             "\xF0\x80\x80\x80",
-            "\xF0\x80\x80\x80", /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
@@ -610,92 +587,92 @@ static void utf8_string(void)
         {
             /* \U+D800 */
             "\xED\xA0\x80",
-            "\xED\xA0\x80",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+DB7F */
             "\xED\xAD\xBF",
-            "\xED\xAD\xBF",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+DB80 */
             "\xED\xAE\x80",
-            "\xED\xAE\x80",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+DBFF */
             "\xED\xAF\xBF",
-            "\xED\xAF\xBF",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+DC00 */
             "\xED\xB0\x80",
-            "\xED\xB0\x80",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+DF80 */
             "\xED\xBE\x80",
-            "\xED\xBE\x80",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+DFFF */
             "\xED\xBF\xBF",
-            "\xED\xBF\xBF",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* 5.2  Paired UTF-16 surrogates */
         {
             /* \U+D800\U+DC00 */
             "\xED\xA0\x80\xED\xB0\x80",
-            "\xED\xA0\x80\xED\xB0\x80", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+D800\U+DFFF */
             "\xED\xA0\x80\xED\xBF\xBF",
-            "\xED\xA0\x80\xED\xBF\xBF", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB7F\U+DC00 */
             "\xED\xAD\xBF\xED\xB0\x80",
-            "\xED\xAD\xBF\xED\xB0\x80", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB7F\U+DFFF */
             "\xED\xAD\xBF\xED\xBF\xBF",
-            "\xED\xAD\xBF\xED\xBF\xBF", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB80\U+DC00 */
             "\xED\xAE\x80\xED\xB0\x80",
-            "\xED\xAE\x80\xED\xB0\x80", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DB80\U+DFFF */
             "\xED\xAE\x80\xED\xBF\xBF",
-            "\xED\xAE\x80\xED\xBF\xBF", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DBFF\U+DC00 */
             "\xED\xAF\xBF\xED\xB0\x80",
-            "\xED\xAF\xBF\xED\xB0\x80", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         {
             /* \U+DBFF\U+DFFF */
             "\xED\xAF\xBF\xED\xBF\xBF",
-            "\xED\xAF\xBF\xED\xBF\xBF", /* bug: not rejected */
+            NULL,
             "\\uFFFD\\uFFFD",
         },
         /* 5.3  Other illegal code positions */
@@ -703,25 +680,25 @@ static void utf8_string(void)
         {
             /* \U+FFFE */
             "\xEF\xBF\xBE",
-            "\xEF\xBF\xBE",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* \U+FFFF */
             "\xEF\xBF\xBF",
-            "\xEF\xBF\xBF",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* U+FDD0 */
             "\xEF\xB7\x90",
-            "\xEF\xB7\x90",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         {
             /* U+FDEF */
             "\xEF\xB7\xAF",
-            "\xEF\xB7\xAF",     /* bug: not rejected */
+            NULL,
             "\\uFFFD",
         },
         /* Plane 1 .. 16 noncharacters */
@@ -743,23 +720,7 @@ static void utf8_string(void)
             "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
             "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
             "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
-            /* bug: not rejected */
-            "\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
-            "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
-            "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
-            "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
-            "\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
-            "\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
-            "\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
-            "\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
-            "\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
-            "\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
-            "\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
-            "\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
-            "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
-            "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
-            "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
-            "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
+            NULL,
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
             "\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD\\uFFFD"
@@ -798,14 +759,6 @@ static void utf8_string(void)
                     }
                     in = strndup(tail, end - tail);
                     str = from_json_str(in, NULL, j);
-                    /*
-                     * FIXME JSON parser accepts invalid sequence
-                     * starting with \xC2..\xF4
-                     */
-                    if (*in >= '\xC2' && *in <= '\xF4') {
-                        g_free(str);
-                        str = NULL;
-                    }
                     g_assert(!str);
                     g_free(in);
                 }
diff --git a/util/unicode.c b/util/unicode.c
index a812a35171..8580bc598b 100644
--- a/util/unicode.c
+++ b/util/unicode.c
@@ -13,6 +13,21 @@
 #include "qemu/osdep.h"
 #include "qemu/unicode.h"
 
+static bool is_valid_codepoint(int codepoint)
+{
+    if (codepoint > 0x10FFFFu) {
+        return false;            /* beyond Unicode range */
+    }
+    if ((codepoint >= 0xFDD0 && codepoint <= 0xFDEF)
+        || (codepoint & 0xFFFE) == 0xFFFE) {
+        return false;            /* noncharacter */
+    }
+    if (codepoint >= 0xD800 && codepoint <= 0xDFFF) {
+        return false;            /* surrogate code point */
+    }
+    return true;
+}
+
 /**
  * mod_utf8_codepoint:
  * @s: string encoded in modified UTF-8
@@ -83,13 +98,8 @@ int mod_utf8_codepoint(const char *s, size_t n, char **end)
             cp <<= 6;
             cp |= byte & 0x3F;
         }
-        if (cp > 0x10FFFF) {
-            cp = -1;            /* beyond Unicode range */
-        } else if ((cp >= 0xFDD0 && cp <= 0xFDEF)
-                   || (cp & 0xFFFE) == 0xFFFE) {
-            cp = -1;            /* noncharacter */
-        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
-            cp = -1;            /* surrogate code point */
+        if (!is_valid_codepoint(cp)) {
+            cp = -1;
         } else if (cp < min_cp[len - 2] && !(cp == 0 && len == 2)) {
             cp = -1;            /* overlong, not \xC0\x80 */
         }
@@ -99,3 +109,48 @@ out:
     *end = (char *)p;
     return cp;
 }
+
+/**
+ * mod_utf8_encode:
+ * @buf: Destination buffer
+ * @bufsz: size of @buf, at least 5.
+ * @codepoint: Unicode codepoint to encode
+ *
+ * Convert Unicode codepoint @codepoint to modified UTF-8.
+ *
+ * Returns: the length of the UTF-8 sequence on success, -1 when
+ * @codepoint is invalid.
+ */
+ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint)
+{
+    assert(bufsz >= 5);
+
+    if (!is_valid_codepoint(codepoint)) {
+        return -1;
+    }
+
+    if (codepoint > 0 && codepoint <= 0x7F) {
+        buf[0] = codepoint & 0x7F;
+        buf[1] = 0;
+        return 1;
+    }
+    if (codepoint <= 0x7FF) {
+        buf[0] = 0xC0 | ((codepoint >> 6) & 0x1F);
+        buf[1] = 0x80 | (codepoint & 0x3F);
+        buf[2] = 0;
+        return 2;
+    }
+    if (codepoint <= 0xFFFF) {
+        buf[0] = 0xE0 | ((codepoint >> 12) & 0x0F);
+        buf[1] = 0x80 | ((codepoint >> 6) & 0x3F);
+        buf[2] = 0x80 | (codepoint & 0x3F);
+        buf[3] = 0;
+        return 3;
+    }
+    buf[0] = 0xF0 | ((codepoint >> 18) & 0x07);
+    buf[1] = 0x80 | ((codepoint >> 12) & 0x3F);
+    buf[2] = 0x80 | ((codepoint >> 6) & 0x3F);
+    buf[3] = 0x80 | (codepoint & 0x3F);
+    buf[4] = 0;
+    return 4;
+}
-- 
2.17.1
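
To make the byte layout concrete, below is a standalone sketch of the same rules as is_valid_codepoint() and mod_utf8_encode() above, written with a loop instead of four branches. The names valid_cp() and encode_mod_utf8() are hypothetical and are not QEMU API. The expected bytes in the checks come from test vectors elsewhere in this series: U+20AC is \xE2\x82\xAC, and U+1D11E is \xF0\x9D\x84\x9E.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical copy of the checks in is_valid_codepoint() above. */
static bool valid_cp(int cp)
{
    if (cp < 0 || cp > 0x10FFFF) {
        return false;                   /* beyond Unicode range */
    }
    if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE) {
        return false;                   /* noncharacter */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return false;                   /* surrogate code point */
    }
    return true;
}

/*
 * Hypothetical loop-based variant of mod_utf8_encode(): writes the
 * modified UTF-8 encoding of @cp plus a terminating NUL to @buf and
 * returns its length, or -1 when @cp is invalid.  U+0000 takes the
 * overlong two-byte form \xC0\x80.
 */
static int encode_mod_utf8(char buf[5], int cp)
{
    static const unsigned char lead[] = { 0, 0, 0xC0, 0xE0, 0xF0 };
    int len;

    if (!valid_cp(cp)) {
        return -1;
    }
    if (cp > 0 && cp <= 0x7F) {
        buf[0] = (char)cp;
        buf[1] = '\0';
        return 1;
    }
    len = cp <= 0x7FF ? 2 : cp <= 0xFFFF ? 3 : 4;
    buf[len] = '\0';
    /* Fill continuation bytes from the end, six bits at a time. */
    for (int i = len - 1; i > 0; i--) {
        buf[i] = (char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    /* Whatever bits remain fit in the lead byte's payload. */
    buf[0] = (char)(lead[len] | cp);
    return len;
}
```

For example, encode_mod_utf8(buf, 0x20AC) stores \xE2\x82\xAC and returns 3, while encode_mod_utf8(buf, 0xD800) returns -1.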


* [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (20 preceding siblings ...)
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 15:25   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser Markus Armbruster
                   ` (34 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Quiz time!  When a parser reports multiple errors, but the user gets
to see just one, which one is (on average) the least useful one?

Yes, you're right, it's the last one!  You're clearly familiar with
compilers.

Which one does QEMU report?

Right again, the last one!  You're clearly familiar with QEMU.

Reproducer: feeding

    {"abc\xC2ijk": 1}\n

to QMP produces

    {"error": {"class": "GenericError", "desc": "JSON parse error, key is not a string in object"}}

Report the first error instead.  The reproducer now produces

    {"error": {"class": "GenericError", "desc": "JSON parse error, invalid UTF-8 sequence in string"}}
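
The "first error wins" rule can be sketched in isolation as follows. The struct parse_ctx and report_error() below are hypothetical stand-ins for JSONParserContext and parse_error(). As in the patched function, the early return comes before any formatting work.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for JSONParserContext: holds at most one error. */
struct parse_ctx {
    int has_err;
    char err[128];
};

/* Record an error message unless one is already recorded. */
static void report_error(struct parse_ctx *ctx, const char *fmt, ...)
{
    va_list ap;

    if (ctx->has_err) {
        return;                 /* keep the first error, drop later ones */
    }
    va_start(ap, fmt);
    vsnprintf(ctx->err, sizeof(ctx->err), fmt, ap);
    va_end(ap);
    ctx->has_err = 1;
}
```

Feeding it the two errors from the reproducer in order leaves "invalid UTF-8 sequence in string" as the reported message.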

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 6b60b07e09..b3a95be3c8 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -54,13 +54,13 @@ static void GCC_FMT_ATTR(3, 4) parse_error(JSONParserContext *ctxt,
 {
     va_list ap;
     char message[1024];
+
+    if (ctxt->err) {
+        return;
+    }
     va_start(ap, msg);
     vsnprintf(message, sizeof(message), msg, ap);
     va_end(ap);
-    if (ctxt->err) {
-        error_free(ctxt->err);
-        ctxt->err = NULL;
-    }
     error_setg(&ctxt->err, "JSON parse error, %s", message);
 }
 
-- 
2.17.1


* [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (21 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 15:36   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") Markus Armbruster
                   ` (33 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Both the lexer and the parser (attempt to) validate UTF-8 in JSON
strings.

The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1,
\xF5..\xFF.  This rejects some, but not all invalid UTF-8.  It also
rejects ASCII control characters \x00..\x1F, in accordance with RFC
7159 (see recent commit "json: Reject unescaped control characters").

When the lexer rejects, it ends the token right after the first bad
byte.  Good when the bad byte is a newline.  Not so good when it's
something like an overlong sequence in the middle of a string.  For
instance, input

    {"abc\xC0\xAFijk": 1}\n

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\xC0
    JSON_ERROR    \xAF
    JSON_KEYWORD  ijk
    JSON_ERROR    ": 1}\n

The parser then reports four errors

    Invalid JSON syntax
    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

Commit "json: Reject invalid UTF-8 sequences" (two commits back) made
the parser reject invalid UTF-8 sequences.  Since then, anything the
lexer rejects, the parser would reject as well.  The lexer's check is
therefore unnecessary for correctness, and harmful for error reporting.

However, we want to keep rejecting ASCII control characters in the
lexer, because that produces the behavior we want for unclosed
strings.

We also need to keep rejecting \xFF in the lexer, because we
documented that as a way to reset the JSON parser
(docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
means we can't change how we recover from this error now.  I wish we
hadn't done that.

I think we should treat \xFE the same as \xFF.

Change the lexer to accept \xC0..\xC1 and \xF5..\xFD.  It now rejects
only \x00..\x1F and \xFE..\xFF.  Error reporting for invalid UTF-8 in
strings is much improved, except for \xFE and \xFF.  For the example
above, the lexer now produces

    JSON_LCURLY   {
    JSON_STRING   "abc\xC0\xAFijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY   }

and the parser reports just

    JSON parse error, invalid UTF-8 sequence in string
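
The effect of the table change can be sketched with a cut-down transition table limited to the "inside a double-quoted string" state. The enum values and lex_string_body() are hypothetical, escapes are not modelled, and the table uses the GNU range-designator extension just as json_lexer[] does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IN_ERROR must be 0 so that every byte not listed defaults to it. */
enum { IN_ERROR = 0, IN_STRING, STRING_DONE, N_STATES };

static const uint8_t lexer[N_STATES][256] = {
    [IN_STRING] = {
        [0x20 ... 0xFD] = IN_STRING,   /* all but \x00..\x1F, \xFE, \xFF */
        ['"'] = STRING_DONE,           /* closing quote ends the token */
    },
};

/* Feed @n bytes of string body (opening quote already consumed) and
 * return the state in which the lexer stops. */
static int lex_string_body(const char *s, size_t n)
{
    int state = IN_STRING;

    for (size_t i = 0; i < n && state == IN_STRING; i++) {
        state = lexer[state][(unsigned char)s[i]];
    }
    return state;
}
```

The overlong sequence \xC0\xAF now stays inside a single string token and is left for the parser to reject, whereas \xFF, \xFE and control characters still end the token in IN_ERROR.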

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 109a7d8bb8..ca1e0e2c03 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -177,8 +177,7 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_DQ_UCODE0,
     },
     [IN_DQ_STRING] = {
-        [0x20 ... 0xBF] = IN_DQ_STRING,
-        [0xC2 ... 0xF4] = IN_DQ_STRING,
+        [0x20 ... 0xFD] = IN_DQ_STRING,
         ['\\'] = IN_DQ_STRING_ESCAPE,
         ['"'] = JSON_STRING,
     },
@@ -217,8 +216,7 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_SQ_UCODE0,
     },
     [IN_SQ_STRING] = {
-        [0x20 ... 0xBF] = IN_SQ_STRING,
-        [0xC2 ... 0xF4] = IN_SQ_STRING,
+        [0x20 ... 0xFD] = IN_SQ_STRING,
         ['\\'] = IN_SQ_STRING_ESCAPE,
         ['\''] = JSON_STRING,
     },
-- 
2.17.1


* [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (22 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 15:48   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser Markus Armbruster
                   ` (32 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

This is consistent with qobject_to_json().  See commit e2ec3f97680.
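
The point of the overlong form is that C strings end at the first NUL byte, so decoding "this \u0000 is fun" to a real NUL would truncate it to "this ". The sketch below shows the substitution. The helper nul_to_mod_utf8() is hypothetical and assumes the rest of its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy @n bytes from @src to @dst, replacing each
 * NUL byte with \xC0\x80, and NUL-terminate the result.  @dst needs room
 * for 2 * @n + 1 bytes.  Returns the length of the result.
 */
static size_t nul_to_mod_utf8(char *dst, const char *src, size_t n)
{
    size_t j = 0;

    for (size_t i = 0; i < n; i++) {
        if (src[i] == '\0') {
            dst[j++] = '\xC0';
            dst[j++] = '\x80';
        } else {
            dst[j++] = src[i];
        }
    }
    dst[j] = '\0';
    return j;
}
```

After conversion, strlen() sees the whole string, because the encoded U+0000 contains no zero byte.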

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c  | 2 +-
 qobject/json-parser.c | 2 +-
 tests/check-qjson.c   | 8 +-------
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index ca1e0e2c03..36fb665b12 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -93,7 +93,7 @@
  *   interpolation = %((l|ll|I64)[du]|[ipsf])
  *
  * Note:
- * - Input must be encoded in UTF-8.
+ * - Input must be encoded in modified UTF-8.
  * - Decoding and validating is left to the parser.
  */
 
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index b3a95be3c8..14225c3c09 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -199,7 +199,7 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
             }
         } else {
             cp = mod_utf8_codepoint(ptr, 6, &end);
-            if (cp <= 0) {
+            if (cp < 0) {
                 parse_error(ctxt, token, "invalid UTF-8 sequence in string");
                 goto out;
             }
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index e00be5b023..e5a7cb6bf6 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -121,12 +121,6 @@ static void escaped_string(void)
 static void utf8_string(void)
 {
     /*
-     * Problem: we can't easily deal with embedded U+0000.  Parsing
-     * the JSON string "this \\u0000" is fun" yields "this \0 is fun",
-     * which gets misinterpreted as NUL-terminated "this ".  We should
-     * consider using overlong encoding \xC0\x80 for U+0000 ("modified
-     * UTF-8").
-     *
      * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
      * capability and stress test at
      * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
@@ -555,7 +549,7 @@ static void utf8_string(void)
         {
             /* \U+0000 */
             "\xC0\x80",
-            NULL,
+            "\xC0\x80",
             "\\u0000",
         },
         {
-- 
2.17.1


* [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (23 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 15:56   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 26/56] json: Simplify parse_string() Markus Armbruster
                   ` (31 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Both lexer and parser reject invalid escape sequences in strings.  The
parser's check is useless, because the lexer always rejects them first.

The lexer ends the token right after the first non-well-formed byte.
This tends to lead to suboptimal error reporting.  For instance, input

    {"abc\@ijk": 1}\n

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\@
    JSON_KEYWORD  ijk
    JSON_ERROR    ": 1}\n

The parser then reports three errors

    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

Drop the lexer's escape sequence checking, and make it accept the same
characters after '\' it accepts elsewhere in strings.  It now produces

    JSON_LCURLY   {
    JSON_STRING   "abc\@ijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY   }

and the parser reports just

    JSON parse error, invalid escape sequence in string
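
With the lexer no longer filtering, the parser's switch becomes the single place where escapes are checked. The sketch below shows that mapping as a pure function. unescape_simple() is a hypothetical name, it accepts the apostrophe extension documented in the new comment, and it leaves \uXXXX to separate code.

```c
#include <assert.h>

/*
 * Hypothetical helper: map the character after a backslash to the
 * character it denotes, or return -1 if it is not a valid escape.
 * \uXXXX is handled elsewhere, so 'u' also yields -1 here.
 */
static int unescape_simple(char c)
{
    switch (c) {
    case '"':  return '"';
    case '\'': return '\'';     /* extension over RFC 7159 */
    case '\\': return '\\';
    case '/':  return '/';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    default:   return -1;
    }
}
```

For the example above, unescape_simple('@') returns -1, which corresponds to the single "invalid escape sequence in string" error.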

While there, rename qstring_from_escaped_str() to parse_string(), and
fix its inaccurate function comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c  | 72 +++----------------------------------------
 qobject/json-parser.c | 56 +++++++++++++++++++--------------
 2 files changed, 37 insertions(+), 91 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 36fb665b12..af0a7fdb8a 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -80,6 +80,8 @@
  *    escape = %x5C              ; \
  *    quotation-mark = %x22      ; "
  *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
+ *    [This lexer accepts any non-control character after escape, and
+ *    leaves rejecting invalid ones to the parser.]
  *
  *
  * Extensions over RFC 7159:
@@ -99,16 +101,8 @@
 
 enum json_lexer_state {
     IN_ERROR = 0,               /* must really be 0, see json_lexer[] */
-    IN_DQ_UCODE3,
-    IN_DQ_UCODE2,
-    IN_DQ_UCODE1,
-    IN_DQ_UCODE0,
     IN_DQ_STRING_ESCAPE,
     IN_DQ_STRING,
-    IN_SQ_UCODE3,
-    IN_SQ_UCODE2,
-    IN_SQ_UCODE1,
-    IN_SQ_UCODE0,
     IN_SQ_STRING_ESCAPE,
     IN_SQ_STRING,
     IN_ZERO,
@@ -144,37 +138,8 @@ static const uint8_t json_lexer[][256] =  {
     /* Relies on default initialization to IN_ERROR! */
 
     /* double quote string */
-    [IN_DQ_UCODE3] = {
-        ['0' ... '9'] = IN_DQ_STRING,
-        ['a' ... 'f'] = IN_DQ_STRING,
-        ['A' ... 'F'] = IN_DQ_STRING,
-    },
-    [IN_DQ_UCODE2] = {
-        ['0' ... '9'] = IN_DQ_UCODE3,
-        ['a' ... 'f'] = IN_DQ_UCODE3,
-        ['A' ... 'F'] = IN_DQ_UCODE3,
-    },
-    [IN_DQ_UCODE1] = {
-        ['0' ... '9'] = IN_DQ_UCODE2,
-        ['a' ... 'f'] = IN_DQ_UCODE2,
-        ['A' ... 'F'] = IN_DQ_UCODE2,
-    },
-    [IN_DQ_UCODE0] = {
-        ['0' ... '9'] = IN_DQ_UCODE1,
-        ['a' ... 'f'] = IN_DQ_UCODE1,
-        ['A' ... 'F'] = IN_DQ_UCODE1,
-    },
     [IN_DQ_STRING_ESCAPE] = {
-        ['b'] = IN_DQ_STRING,
-        ['f'] =  IN_DQ_STRING,
-        ['n'] =  IN_DQ_STRING,
-        ['r'] =  IN_DQ_STRING,
-        ['t'] =  IN_DQ_STRING,
-        ['/'] = IN_DQ_STRING,
-        ['\\'] = IN_DQ_STRING,
-        ['\''] = IN_DQ_STRING,
-        ['\"'] = IN_DQ_STRING,
-        ['u'] = IN_DQ_UCODE0,
+        [0x20 ... 0xFD] = IN_DQ_STRING,
     },
     [IN_DQ_STRING] = {
         [0x20 ... 0xFD] = IN_DQ_STRING,
@@ -183,37 +148,8 @@ static const uint8_t json_lexer[][256] =  {
     },
 
     /* single quote string */
-    [IN_SQ_UCODE3] = {
-        ['0' ... '9'] = IN_SQ_STRING,
-        ['a' ... 'f'] = IN_SQ_STRING,
-        ['A' ... 'F'] = IN_SQ_STRING,
-    },
-    [IN_SQ_UCODE2] = {
-        ['0' ... '9'] = IN_SQ_UCODE3,
-        ['a' ... 'f'] = IN_SQ_UCODE3,
-        ['A' ... 'F'] = IN_SQ_UCODE3,
-    },
-    [IN_SQ_UCODE1] = {
-        ['0' ... '9'] = IN_SQ_UCODE2,
-        ['a' ... 'f'] = IN_SQ_UCODE2,
-        ['A' ... 'F'] = IN_SQ_UCODE2,
-    },
-    [IN_SQ_UCODE0] = {
-        ['0' ... '9'] = IN_SQ_UCODE1,
-        ['a' ... 'f'] = IN_SQ_UCODE1,
-        ['A' ... 'F'] = IN_SQ_UCODE1,
-    },
     [IN_SQ_STRING_ESCAPE] = {
-        ['b'] = IN_SQ_STRING,
-        ['f'] =  IN_SQ_STRING,
-        ['n'] =  IN_SQ_STRING,
-        ['r'] =  IN_SQ_STRING,
-        ['t'] =  IN_SQ_STRING,
-        ['/'] = IN_SQ_STRING,
-        ['\\'] = IN_SQ_STRING,
-        ['\''] = IN_SQ_STRING,
-        ['\"'] = IN_SQ_STRING,
-        ['u'] = IN_SQ_UCODE0,
+        [0x20 ... 0xFD] = IN_SQ_STRING,
     },
     [IN_SQ_STRING] = {
         [0x20 ... 0xFD] = IN_SQ_STRING,
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 14225c3c09..d469004616 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -106,30 +106,40 @@ static int hex2decimal(char ch)
 }
 
 /**
- * parse_string(): Parse a json string and return a QObject
+ * parse_string(): Parse a JSON string
  *
- *  string
- *      ""
- *      " chars "
- *  chars
- *      char
- *      char chars
- *  char
- *      any-Unicode-character-
- *          except-"-or-\-or-
- *          control-character
- *      \"
- *      \\
- *      \/
- *      \b
- *      \f
- *      \n
- *      \r
- *      \t
- *      \u four-hex-digits 
+ * From RFC 7159 "The JavaScript Object Notation (JSON) Data
+ * Interchange Format":
+ *
+ *    char = unescaped /
+ *        escape (
+ *            %x22 /          ; "    quotation mark  U+0022
+ *            %x5C /          ; \    reverse solidus U+005C
+ *            %x2F /          ; /    solidus         U+002F
+ *            %x62 /          ; b    backspace       U+0008
+ *            %x66 /          ; f    form feed       U+000C
+ *            %x6E /          ; n    line feed       U+000A
+ *            %x72 /          ; r    carriage return U+000D
+ *            %x74 /          ; t    tab             U+0009
+ *            %x75 4HEXDIG )  ; uXXXX                U+XXXX
+ *    escape = %x5C              ; \
+ *    quotation-mark = %x22      ; "
+ *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
+ *
+ * Extensions over RFC 7159:
+ * - Extra escape sequence in strings:
+ *   0x27 (apostrophe) is recognized after escape, too
+ * - Single-quoted strings:
+ *   Like double-quoted strings, except they're delimited by %x27
+ *   (apostrophe) instead of %x22 (quotation mark), and can't contain
+ *   unescaped apostrophe, but can contain unescaped quotation mark.
+ *
+ * Note:
+ * - Encoding is modified UTF-8.
+ * - Invalid Unicode characters are rejected.
+ * - Control characters are rejected by the lexer.
  */
-static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
-                                         JSONToken *token)
+static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
 {
     const char *ptr = token->str;
     QString *str;
@@ -494,7 +504,7 @@ static QObject *parse_literal(JSONParserContext *ctxt)
 
     switch (token->type) {
     case JSON_STRING:
-        return QOBJECT(qstring_from_escaped_str(ctxt, token));
+        return QOBJECT(parse_string(ctxt, token));
     case JSON_INTEGER: {
         /*
          * Represent JSON_INTEGER as QNUM_I64 if possible, else as
-- 
2.17.1


* [Qemu-devel] [PATCH 26/56] json: Simplify parse_string()
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (24 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 15:59   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000 Markus Armbruster
                   ` (30 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 42 +++++++++++++++++++-----------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index d469004616..f26e5b7511 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -101,8 +101,7 @@ static int hex2decimal(char ch)
     } else if (ch >= 'A' && ch <= 'F') {
         return 10 + (ch - 'A');
     }
-
-    return -1;
+    abort();
 }
 
 /**
@@ -144,7 +143,7 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
     const char *ptr = token->str;
     QString *str;
     char quote;
-    int cp;
+    int cp, i;
     char *end;
     ssize_t len;
     char utf8_buf[5];
@@ -158,51 +157,48 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
         if (*ptr == '\\') {
             switch (*++ptr) {
             case '"':
-                qstring_append(str, "\"");
+                qstring_append_chr(str, '"');
                 break;
             case '\'':
-                qstring_append(str, "'");
+                qstring_append_chr(str, '\'');
                 break;
             case '\\':
-                qstring_append(str, "\\");
+                qstring_append_chr(str, '\\');
                 break;
             case '/':
-                qstring_append(str, "/");
+                qstring_append_chr(str, '/');
                 break;
             case 'b':
-                qstring_append(str, "\b");
+                qstring_append_chr(str, '\b');
                 break;
             case 'f':
-                qstring_append(str, "\f");
+                qstring_append_chr(str, '\f');
                 break;
             case 'n':
-                qstring_append(str, "\n");
+                qstring_append_chr(str, '\n');
                 break;
             case 'r':
-                qstring_append(str, "\r");
+                qstring_append_chr(str, '\r');
                 break;
             case 't':
-                qstring_append(str, "\t");
+                qstring_append_chr(str, '\t');
                 break;
-            case 'u': {
-                uint16_t unicode_char = 0;
-                char utf8_char[4];
-                int i = 0;
-
+            case 'u':
+                cp = 0;
                 for (i = 0; i < 4; i++) {
                     ptr++;
-                    if (qemu_isxdigit(*ptr)) {
-                        unicode_char |= hex2decimal(*ptr) << ((3 - i) * 4);
-                    } else {
+                    if (!qemu_isxdigit(*ptr)) {
                         parse_error(ctxt, token,
                                     "invalid hex escape sequence in string");
                         goto out;
                     }
+                    cp <<= 4;
+                    cp |= hex2decimal(*ptr);
                 }
 
-                wchar_to_utf8(unicode_char, utf8_char, sizeof(utf8_char));
-                qstring_append(str, utf8_char);
-            }   break;
+                wchar_to_utf8(cp, utf8_buf, sizeof(utf8_buf));
+                qstring_append(str, utf8_buf);
+                break;
             default:
                 parse_error(ctxt, token, "invalid escape sequence in string");
                 goto out;
-- 
2.17.1


* [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (25 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 26/56] json: Simplify parse_string() Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 16:10   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs Markus Armbruster
                   ` (29 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The JSON parser translates invalid \uXXXX to garbage instead of
rejecting it, and swallows \u0000.

Fix by using mod_utf8_encode() instead of the flawed wchar_to_utf8().

Valid surrogate pairs are now differently broken: they're rejected
instead of translated to garbage.  The next commit will fix them.
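
As a standalone illustration of the new \uXXXX handling, the sketch below accumulates four hex digits the way parse_string() does (cp <<= 4; cp |= digit) and then applies the validity checks that mod_utf8_encode() performs. parse_u_escape() and valid_cp() are hypothetical names. Like this commit, the sketch rejects every surrogate, paired or not.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical copy of the checks in is_valid_codepoint(). */
static bool valid_cp(int cp)
{
    if (cp < 0 || cp > 0x10FFFF) {
        return false;
    }
    if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE) {
        return false;                   /* noncharacter */
    }
    return !(cp >= 0xD800 && cp <= 0xDFFF);   /* no surrogates */
}

/*
 * Parse the four hex digits following "\u" at @p.  Returns the code
 * point, -1 if a hex digit is missing, or -2 if the value is not a
 * valid Unicode character.
 */
static int parse_u_escape(const char *p)
{
    int cp = 0;

    for (int i = 0; i < 4; i++) {
        unsigned char ch = (unsigned char)p[i];

        if (!isxdigit(ch)) {
            return -1;              /* also stops at the terminating NUL */
        }
        cp <<= 4;
        cp |= isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
    }
    return valid_cp(cp) ? cp : -2;
}
```

\u0000 now parses to code point 0, which the encoder then stores as \xC0\x80 rather than dropping.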

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 35 ++++++-----------------------------
 tests/check-qjson.c   | 32 +++++++++-----------------------
 2 files changed, 15 insertions(+), 52 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index f26e5b7511..bb54886809 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -64,34 +64,6 @@ static void GCC_FMT_ATTR(3, 4) parse_error(JSONParserContext *ctxt,
     error_setg(&ctxt->err, "JSON parse error, %s", message);
 }
 
-/**
- * String helpers
- *
- * These helpers are used to unescape strings.
- */
-static void wchar_to_utf8(uint16_t wchar, char *buffer, size_t buffer_length)
-{
-    if (wchar <= 0x007F) {
-        BUG_ON(buffer_length < 2);
-
-        buffer[0] = wchar & 0x7F;
-        buffer[1] = 0;
-    } else if (wchar <= 0x07FF) {
-        BUG_ON(buffer_length < 3);
-
-        buffer[0] = 0xC0 | ((wchar >> 6) & 0x1F);
-        buffer[1] = 0x80 | (wchar & 0x3F);
-        buffer[2] = 0;
-    } else {
-        BUG_ON(buffer_length < 4);
-
-        buffer[0] = 0xE0 | ((wchar >> 12) & 0x0F);
-        buffer[1] = 0x80 | ((wchar >> 6) & 0x3F);
-        buffer[2] = 0x80 | (wchar & 0x3F);
-        buffer[3] = 0;
-    }
-}
-
 static int hex2decimal(char ch)
 {
     if (ch >= '0' && ch <= '9') {
@@ -196,7 +168,12 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
                     cp |= hex2decimal(*ptr);
                 }
 
-                wchar_to_utf8(cp, utf8_buf, sizeof(utf8_buf));
+                if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
+                    parse_error(ctxt, token,
+                                "\\u%.4s is not a valid Unicode character",
+                                ptr - 3);
+                    goto out;
+                }
                 qstring_append(str, utf8_buf);
                 break;
             default:
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index e5a7cb6bf6..422697459f 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -62,7 +62,7 @@ static void escaped_string(void)
         { "triple byte utf-8 \\u20AC", "triple byte utf-8 \xe2\x82\xac" },
         { "quadruple byte utf-8 \\uD834\\uDD1E", /* U+1D11E */
           /* bug: want \xF0\x9D\x84\x9E */
-          "quadruple byte utf-8 \xED\xA0\xB4\xED\xB4\x9E", .skip = 1 },
+          NULL },
         { "\\z", NULL },
         { "\\ux", NULL },
         { "\\u1x", NULL },
@@ -70,28 +70,14 @@ static void escaped_string(void)
         { "\\u123x", NULL },
         { "\\u12345", "\341\210\2645" },
         { "\\u12345", "\341\210\2645" },
-        { "\\u0000x", "x", .skip = 1}, /* bug: want \xC0\x80x */
-        { "unpaired leading surrogate \\uD800\\uD800",
-          /* bug: not rejected */
-          "unpaired leading surrogate \355\240\200\355\240\200", .skip = 1 },
-        { "unpaired trailing surrogate \\uDC00\\uDC00",
-          /* bug: not rejected */
-          "unpaired trailing surrogate \355\260\200\355\260\200", .skip = 1},
-        { "backward surrogate pair \\uDC00\\uD800",
-          /* bug: not rejected */
-          "backward surrogate pair \355\260\200\355\240\200", .skip = 1},
-        { "noncharacter U+FDD0 \\uFDD0",
-          /* bug: not rejected */
-          "noncharacter U+FDD0 \xEF\xB7\x90", .skip = 1},
-        { "noncharacter U+FDEF \\uFDEF",
-          /* bug: not rejected */
-          "noncharacter U+FDEF \xEF\xB7\xAF", .skip = 1},
-        { "noncharacter U+1FFFE \\uD87F\\uDFFE",
-          /* bug: not rejected */
-          "noncharacter U+1FFFE \xED\xA1\xBF\xED\xBF\xBE", .skip = 1},
-        { "noncharacter U+10FFFF \\uDC3F\\uDFFF",
-          /* bug: not rejected */
-          "noncharacter U+10FFFF \xED\xB0\xBF\xED\xBF\xBF", .skip = 1},
+        { "\\u0000x", "\xC0\x80x" },
+        { "unpaired leading surrogate \\uD800\\uD800", NULL },
+        { "unpaired trailing surrogate \\uDC00\\uDC00", NULL },
+        { "backward surrogate pair \\uDC00\\uD800", NULL },
+        { "noncharacter U+FDD0 \\uFDD0", NULL },
+        { "noncharacter U+FDEF \\uFDEF", NULL },
+        { "noncharacter U+1FFFE \\uD87F\\uDFFE", NULL },
+        { "noncharacter U+10FFFF \\uDC3F\\uDFFF", NULL },
         {}
     };
     int i, j;
-- 
2.17.1

* [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (26 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000 Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 17:18   ` Eric Blake
  2018-08-12  9:52   ` Paolo Bonzini
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part Markus Armbruster
                   ` (28 subsequent siblings)
  56 siblings, 2 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The JSON parser treats each half of a surrogate pair as an unpaired
surrogate.  Fix it to recognize surrogate pairs.
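
A minimal standalone sketch of the surrogate-pair arithmetic the patch
adds, for illustration only.  combine_surrogates() and utf8_encode4()
are hypothetical helpers, not QEMU functions; the parser itself hands
the code point to mod_utf8_encode().  The sketch returns -1 unless both
halves are in their proper ranges.

```c
#include <assert.h>

/*
 * Combine a UTF-16 surrogate pair into one code point, using the same
 * arithmetic as the patch: low 10 bits of each half, leading half
 * shifted left by 10, plus 0x10000.  Returns -1 unless @leading is in
 * D800..DBFF and @trailing is in DC00..DFFF.
 */
static int combine_surrogates(int leading, int trailing)
{
    if (leading < 0xD800 || leading > 0xDBFF
        || trailing < 0xDC00 || trailing > 0xDFFF) {
        return -1;
    }
    return (((leading & 0x3FF) << 10) | (trailing & 0x3FF)) + 0x10000;
}

/*
 * Encode a supplementary-plane code point (U+10000..U+10FFFF) as four
 * UTF-8 bytes followed by a terminating NUL.
 */
static void utf8_encode4(int cp, unsigned char out[5])
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    out[0] = 0xF0 | (cp >> 18);
    out[1] = 0x80 | ((cp >> 12) & 0x3F);
    out[2] = 0x80 | ((cp >> 6) & 0x3F);
    out[3] = 0x80 | (cp & 0x3F);
    out[4] = 0;
}
```

For example, \uD834\uDD1E combines to U+1D11E, which encodes as
\xF0\x9D\x84\x9E, the result check-qjson.c now expects.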

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 16 +++++++++++++++-
 tests/check-qjson.c   |  3 +--
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index bb54886809..703065fa2b 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -115,7 +115,7 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
     const char *ptr = token->str;
     QString *str;
     char quote;
-    int cp, i;
+    int cp, i, leading_surrogate;
     char *end;
     ssize_t len;
     char utf8_buf[5];
@@ -156,6 +156,8 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
                 qstring_append_chr(str, '\t');
                 break;
             case 'u':
+                leading_surrogate = 0;
+            hex:
                 cp = 0;
                 for (i = 0; i < 4; i++) {
                     ptr++;
@@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
                     cp |= hex2decimal(*ptr);
                 }
 
+                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
+                    && ptr[1] == '\\' && ptr[2] == 'u') {
+                    ptr += 2;
+                    leading_surrogate = cp;
+                    goto hex;
+                }
+                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
+                    cp &= 0x3FF;
+                    cp |= (leading_surrogate & 0x3FF) << 10;
+                    cp += 0x010000;
+                }
+
                 if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
                     parse_error(ctxt, token,
                                 "\\u%.4s is not a valid Unicode character",
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 422697459f..3d3a3f105f 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -61,8 +61,7 @@ static void escaped_string(void)
         { "double byte utf-8 \\u00A2", "double byte utf-8 \xc2\xa2" },
         { "triple byte utf-8 \\u20AC", "triple byte utf-8 \xe2\x82\xac" },
         { "quadruple byte utf-8 \\uD834\\uDD1E", /* U+1D11E */
-          /* bug: want \xF0\x9D\x84\x9E */
-          NULL },
+          "quadruple byte utf-8 \xF0\x9D\x84\x9E" },
         { "\\z", NULL },
         { "\\ux", NULL },
         { "\\u1x", NULL },
-- 
2.17.1

* [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (27 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 17:19   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 30/56] json: remove useless return value from lexer/parser Markus Armbruster
                   ` (27 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 3d3a3f105f..c8c0ad95a6 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -750,15 +750,10 @@ static void utf8_string(void)
             qobject_unref(str);
             g_free(jstr);
 
-            /*
-             * Parse @json_out right back
-             * Disabled, because qobject_from_json() is buggy, and I can't
-             * be bothered to add the expected incorrect results.
-             * FIXME Enable once these bugs have been fixed.
-             */
-            if (0 && json_out != json_in) {
+            /* Parse @json_out right back, unless it has replacements */
+            if (!strstr(json_out, "\\uFFFD")) {
                 str = from_json_str(json_out, &error_abort, j);
-                g_assert_cmpstr(qstring_get_try_str(str), ==, utf8_out);
+                g_assert_cmpstr(qstring_get_try_str(str), ==, utf8_in);
             }
         }
     }
-- 
2.17.1

* [Qemu-devel] [PATCH 30/56] json: remove useless return value from lexer/parser
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (28 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 31/56] json-parser: simplify and avoid JSONParserContext allocation Markus Armbruster
                   ` (26 subsequent siblings)
  56 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

From: Marc-André Lureau <marcandre.lureau@redhat.com>

The lexer always returns 0 when feeding characters. Furthermore, none of
the callers care about the return value.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-lexer.h    |  4 ++--
 include/qapi/qmp/json-streamer.h |  4 ++--
 qobject/json-lexer.c             | 23 ++++++++---------------
 qobject/json-streamer.c          |  8 ++++----
 4 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/include/qapi/qmp/json-lexer.h b/include/qapi/qmp/json-lexer.h
index afee7828cd..66ccf0357c 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/include/qapi/qmp/json-lexer.h
@@ -47,9 +47,9 @@ struct JSONLexer
 
 void json_lexer_init(JSONLexer *lexer, JSONLexerEmitter func);
 
-int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size);
+void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size);
 
-int json_lexer_flush(JSONLexer *lexer);
+void json_lexer_flush(JSONLexer *lexer);
 
 void json_lexer_destroy(JSONLexer *lexer);
 
diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h
index 00d8a23af8..cb808cf27d 100644
--- a/include/qapi/qmp/json-streamer.h
+++ b/include/qapi/qmp/json-streamer.h
@@ -36,10 +36,10 @@ typedef struct JSONMessageParser
 void json_message_parser_init(JSONMessageParser *parser,
                               void (*func)(JSONMessageParser *, GQueue *));
 
-int json_message_parser_feed(JSONMessageParser *parser,
+void json_message_parser_feed(JSONMessageParser *parser,
                              const char *buffer, size_t size);
 
-int json_message_parser_flush(JSONMessageParser *parser);
+void json_message_parser_flush(JSONMessageParser *parser);
 
 void json_message_parser_destroy(JSONMessageParser *parser);
 
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index af0a7fdb8a..87cdd41f29 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -286,7 +286,7 @@ void json_lexer_init(JSONLexer *lexer, JSONLexerEmitter func)
     lexer->x = lexer->y = 0;
 }
 
-static int json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
+static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
 {
     int char_consumed, new_state;
 
@@ -340,7 +340,7 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             g_string_truncate(lexer->token, 0);
             new_state = IN_START;
             lexer->state = new_state;
-            return 0;
+            return;
         default:
             break;
         }
@@ -355,29 +355,22 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         g_string_truncate(lexer->token, 0);
         lexer->state = IN_START;
     }
-
-    return 0;
 }
 
-int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
+void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
 {
     size_t i;
 
     for (i = 0; i < size; i++) {
-        int err;
-
-        err = json_lexer_feed_char(lexer, buffer[i], false);
-        if (err < 0) {
-            return err;
-        }
+        json_lexer_feed_char(lexer, buffer[i], false);
     }
-
-    return 0;
 }
 
-int json_lexer_flush(JSONLexer *lexer)
+void json_lexer_flush(JSONLexer *lexer)
 {
-    return lexer->state == IN_START ? 0 : json_lexer_feed_char(lexer, 0, true);
+    if (lexer->state != IN_START) {
+        json_lexer_feed_char(lexer, 0, true);
+    }
 }
 
 void json_lexer_destroy(JSONLexer *lexer)
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index c51c2021f9..78dfff2aa0 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -118,15 +118,15 @@ void json_message_parser_init(JSONMessageParser *parser,
     json_lexer_init(&parser->lexer, json_message_process_token);
 }
 
-int json_message_parser_feed(JSONMessageParser *parser,
+void json_message_parser_feed(JSONMessageParser *parser,
                              const char *buffer, size_t size)
 {
-    return json_lexer_feed(&parser->lexer, buffer, size);
+    json_lexer_feed(&parser->lexer, buffer, size);
 }
 
-int json_message_parser_flush(JSONMessageParser *parser)
+void json_message_parser_flush(JSONMessageParser *parser)
 {
-    return json_lexer_flush(&parser->lexer);
+    json_lexer_flush(&parser->lexer);
 }
 
 void json_message_parser_destroy(JSONMessageParser *parser)
-- 
2.17.1

* [Qemu-devel] [PATCH 31/56] json-parser: simplify and avoid JSONParserContext allocation
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (29 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 30/56] json: remove useless return value from lexer/parser Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly Markus Armbruster
                   ` (25 subsequent siblings)
  56 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

From: Marc-André Lureau <marcandre.lureau@redhat.com>

parser_context_new/free() are only used from json_parser_parse_err(). We
can fold the code there and avoid an allocation altogether.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180719184111.5129-9-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 41 +++++++++--------------------------------
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 703065fa2b..b14336e653 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -232,33 +232,6 @@ static JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
     return g_queue_peek_head(ctxt->buf);
 }
 
-static JSONParserContext *parser_context_new(GQueue *tokens)
-{
-    JSONParserContext *ctxt;
-
-    if (!tokens) {
-        return NULL;
-    }
-
-    ctxt = g_malloc0(sizeof(JSONParserContext));
-    ctxt->buf = tokens;
-
-    return ctxt;
-}
-
-/* to support error propagation, ctxt->err must be freed separately */
-static void parser_context_free(JSONParserContext *ctxt)
-{
-    if (ctxt) {
-        while (!g_queue_is_empty(ctxt->buf)) {
-            parser_context_pop_token(ctxt);
-        }
-        g_free(ctxt->current);
-        g_queue_free(ctxt->buf);
-        g_free(ctxt);
-    }
-}
-
 /**
  * Parsing rules
  */
@@ -570,18 +543,22 @@ QObject *json_parser_parse(GQueue *tokens, va_list *ap)
 
 QObject *json_parser_parse_err(GQueue *tokens, va_list *ap, Error **errp)
 {
-    JSONParserContext *ctxt = parser_context_new(tokens);
+    JSONParserContext ctxt = { .buf = tokens };
     QObject *result;
 
-    if (!ctxt) {
+    if (!tokens) {
         return NULL;
     }
 
-    result = parse_value(ctxt, ap);
+    result = parse_value(&ctxt, ap);
 
-    error_propagate(errp, ctxt->err);
+    error_propagate(errp, ctxt.err);
 
-    parser_context_free(ctxt);
+    while (!g_queue_is_empty(ctxt.buf)) {
+        parser_context_pop_token(&ctxt);
+    }
+    g_free(ctxt.current);
+    g_queue_free(ctxt.buf);
 
     return result;
 }
-- 
2.17.1

* [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (30 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 31/56] json-parser: simplify and avoid JSONParserContext allocation Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 17:22   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values Markus Armbruster
                   ` (24 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

json_lexer_init() takes the function to process a token as an
argument.  It's always json_message_process_token().  The indirection
makes the code harder to understand for no actual gain.  Drop it.
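
With the function pointer gone, json_message_process_token() relies on
the JSONLexer being embedded in a JSONMessageParser, and recovers the
parser with container_of().  A minimal sketch of that idiom, using
hypothetical cut-down types; CONTAINER_OF below is a simplified
stand-in for QEMU's container_of(), which additionally type-checks its
pointer argument.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member to its container. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct Lexer {          /* hypothetical, cut-down JSONLexer */
    int state;
} Lexer;

typedef struct MessageParser {  /* hypothetical, cut-down JSONMessageParser */
    int brace_count;
    Lexer lexer;                /* embedded by value, not a pointer */
    int tokens_seen;
} MessageParser;

/* Stand-in for json_message_process_token(). */
static void process_token(Lexer *lexer)
{
    MessageParser *parser = CONTAINER_OF(lexer, MessageParser, lexer);

    parser->tokens_seen++;
}

/* Stand-in for the lexer finishing a token: a direct call, no emit hook. */
static void lexer_token_done(Lexer *lexer)
{
    process_token(lexer);
}
```

The trade-off is that the lexer only works when embedded in a parser,
which, as the commit message notes, is the only way it was ever used.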

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-lexer.h    | 13 +++----------
 include/qapi/qmp/json-streamer.h |  3 +++
 qobject/json-lexer.c             | 13 ++++++++-----
 qobject/json-streamer.c          |  6 +++---
 4 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/include/qapi/qmp/json-lexer.h b/include/qapi/qmp/json-lexer.h
index 66ccf0357c..44bcf2ca64 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/include/qapi/qmp/json-lexer.h
@@ -32,20 +32,13 @@ typedef enum json_token_type {
     JSON_ERROR,
 } JSONTokenType;
 
-typedef struct JSONLexer JSONLexer;
-
-typedef void (JSONLexerEmitter)(JSONLexer *, GString *,
-                                JSONTokenType, int x, int y);
-
-struct JSONLexer
-{
-    JSONLexerEmitter *emit;
+typedef struct JSONLexer {
     int state;
     GString *token;
     int x, y;
-};
+} JSONLexer;
 
-void json_lexer_init(JSONLexer *lexer, JSONLexerEmitter func);
+void json_lexer_init(JSONLexer *lexer);
 
 void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size);
 
diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h
index cb808cf27d..7922e185a5 100644
--- a/include/qapi/qmp/json-streamer.h
+++ b/include/qapi/qmp/json-streamer.h
@@ -33,6 +33,9 @@ typedef struct JSONMessageParser
     uint64_t token_size;
 } JSONMessageParser;
 
+void json_message_process_token(JSONLexer *lexer, GString *input,
+                                JSONTokenType type, int x, int y);
+
 void json_message_parser_init(JSONMessageParser *parser,
                               void (*func)(JSONMessageParser *, GQueue *));
 
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 87cdd41f29..0b54b1af56 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "qapi/qmp/json-lexer.h"
+#include "qapi/qmp/json-streamer.h"
 
 #define MAX_TOKEN_SIZE (64ULL << 20)
 
@@ -278,9 +279,8 @@ static const uint8_t json_lexer[][256] =  {
     },
 };
 
-void json_lexer_init(JSONLexer *lexer, JSONLexerEmitter func)
+void json_lexer_init(JSONLexer *lexer)
 {
-    lexer->emit = func;
     lexer->state = IN_START;
     lexer->token = g_string_sized_new(3);
     lexer->x = lexer->y = 0;
@@ -316,7 +316,8 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         case JSON_FLOAT:
         case JSON_KEYWORD:
         case JSON_STRING:
-            lexer->emit(lexer, lexer->token, new_state, lexer->x, lexer->y);
+            json_message_process_token(lexer, lexer->token, new_state,
+                                       lexer->x, lexer->y);
             /* fall through */
         case JSON_SKIP:
             g_string_truncate(lexer->token, 0);
@@ -336,7 +337,8 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
              * never a valid ASCII/UTF-8 sequence, so this should reliably
              * induce an error/flush state.
              */
-            lexer->emit(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y);
+            json_message_process_token(lexer, lexer->token, JSON_ERROR,
+                                       lexer->x, lexer->y);
             g_string_truncate(lexer->token, 0);
             new_state = IN_START;
             lexer->state = new_state;
@@ -351,7 +353,8 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
      * this is a security consideration.
      */
     if (lexer->token->len > MAX_TOKEN_SIZE) {
-        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
+        json_message_process_token(lexer, lexer->token, lexer->state,
+                                   lexer->x, lexer->y);
         g_string_truncate(lexer->token, 0);
         lexer->state = IN_START;
     }
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 78dfff2aa0..9f57ebf2bd 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -34,8 +34,8 @@ static void json_message_free_tokens(JSONMessageParser *parser)
     }
 }
 
-static void json_message_process_token(JSONLexer *lexer, GString *input,
-                                       JSONTokenType type, int x, int y)
+void json_message_process_token(JSONLexer *lexer, GString *input,
+                                JSONTokenType type, int x, int y)
 {
     JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
     JSONToken *token;
@@ -115,7 +115,7 @@ void json_message_parser_init(JSONMessageParser *parser,
     parser->tokens = g_queue_new();
     parser->token_size = 0;
 
-    json_lexer_init(&parser->lexer, json_message_process_token);
+    json_lexer_init(&parser->lexer);
 }
 
 void json_message_parser_feed(JSONMessageParser *parser,
-- 
2.17.1

* [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (31 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 15:30   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse() Markus Armbruster
                   ` (23 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The classical way to structure parser and lexer is to have the client
call the parser to get an abstract syntax tree, the parser call the
lexer to get the next token, and the lexer call some function to get
input characters.

Another way to structure them would be to have the client feed
characters to the lexer, the lexer feed tokens to the parser, and the
parser feed abstract syntax trees to some callback provided by the
client.  This way is more easily integrated into an event loop that
dispatches input characters as they arrive.
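
To illustrate the push style, here is a minimal sketch of a lexer that
is fed characters in whatever chunks arrive and pushes each complete
token to a client callback.  It only recognizes runs of decimal digits;
PushLexer, push_lexer_feed() and push_lexer_flush() are hypothetical
names, loosely modelled on json_lexer_feed() and json_lexer_flush().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef void (*TokenCallback)(void *opaque, long value);

typedef struct PushLexer {
    int in_number;          /* nonzero while inside a run of digits */
    long value;             /* value of the digits seen so far */
    TokenCallback emit;     /* where complete tokens are pushed */
    void *opaque;           /* client data passed back to @emit */
} PushLexer;

/* Feed @size characters; lexer state survives between calls. */
static void push_lexer_feed(PushLexer *lx, const char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char ch = (unsigned char)buf[i];

        if (isdigit(ch)) {
            lx->value = (lx->in_number ? lx->value * 10 : 0) + (ch - '0');
            lx->in_number = 1;
        } else if (lx->in_number) {
            lx->emit(lx->opaque, lx->value);    /* token complete */
            lx->in_number = 0;
        }
        /* any other character is a separator and is skipped */
    }
}

/* End of input: push out a token still in progress. */
static void push_lexer_flush(PushLexer *lx)
{
    if (lx->in_number) {
        lx->emit(lx->opaque, lx->value);
        lx->in_number = 0;
    }
}

/* Example client: collect tokens into an array. */
typedef struct Collected {
    long vals[16];
    size_t n;
} Collected;

static void collect(void *opaque, long value)
{
    Collected *c = opaque;

    if (c->n < sizeof(c->vals) / sizeof(c->vals[0])) {
        c->vals[c->n++] = value;
    }
}
```

Feeding "12 3" and then "4 5" yields the tokens 12, 34 and 5: the lexer
carries the partial token "3" across the chunk boundary, which is what
makes this style fit an event loop.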

Our JSON parser is kind of between the two.  The lexer feeds tokens to
a "streamer" instead of a real parser.  The streamer accumulates
tokens until it has the sequence of tokens that comprises a single JSON
value (it counts curly braces and square brackets to decide).  It
feeds those token sequences to a callback provided by the client.  The
callback passes each token sequence to the parser, and gets back an
abstract syntax tree.
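
A simplified sketch of the streamer's bookkeeping, assuming a
hypothetical token enum that only distinguishes braces, brackets and
everything else.  Tokens accumulate until the counts return to zero or
go negative; the buffered sequence is then handed to the client.  The
real streamer stores JSONToken objects in a GQueue (parser->tokens);
the fixed-size array here just keeps the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds: only what the streamer cares about. */
typedef enum {
    TOK_LCURLY, TOK_RCURLY, TOK_LSQUARE, TOK_RSQUARE, TOK_OTHER
} TokKind;

/* Client callback: receives one token sequence. */
typedef void (*SequenceCallback)(void *opaque, const TokKind *toks, size_t n);

typedef struct Streamer {
    TokKind buf[64];
    size_t len;
    int brace_count;        /* unmatched '{' so far */
    int bracket_count;      /* unmatched '[' so far */
    SequenceCallback emit;
    void *opaque;
} Streamer;

static void streamer_feed(Streamer *s, TokKind t)
{
    assert(s->len < sizeof(s->buf) / sizeof(s->buf[0]));
    s->buf[s->len++] = t;

    switch (t) {
    case TOK_LCURLY:  s->brace_count++;   break;
    case TOK_RCURLY:  s->brace_count--;   break;
    case TOK_LSQUARE: s->bracket_count++; break;
    case TOK_RSQUARE: s->bracket_count--; break;
    default:          break;
    }

    /* Still inside an object or array: keep accumulating. */
    if (s->brace_count >= 0 && s->bracket_count >= 0
        && (s->brace_count > 0 || s->bracket_count > 0)) {
        return;
    }

    /* Balanced, or an unmatched closer: hand the sequence over as is,
     * leaving any diagnosis to whoever parses it, and start afresh. */
    s->emit(s->opaque, s->buf, s->len);
    s->len = 0;
    s->brace_count = 0;
    s->bracket_count = 0;
}

/* Example client: count sequences and remember the last one's length. */
typedef struct Stats {
    int sequences;
    size_t last_len;
} Stats;

static void record_sequence(void *opaque, const TokKind *toks, size_t n)
{
    Stats *st = opaque;

    (void)toks;
    st->sequences++;
    st->last_len = n;
}
```

For instance, {"a": [1]} lexes to the seven tokens { "a" : [ 1 ] } and
is delivered as one sequence only after the final }, while a bare
scalar such as 42 is delivered immediately.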

I figure it was done that way to make a straightforward recursive
descent parser possible.  "Get next token" becomes "pop the first
token off the token sequence".  Drawback: we need to store a complete
token sequence.  Each token takes 13 bytes plus one byte per input
character, plus malloc overhead.
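
One plausible accounting of those 13 bytes, as a sketch: assume a token
header of three ints (type plus x/y position, 12 bytes on common ABIs)
followed by a flexible array member holding the token text and its
terminating NUL.  Token, token_alloc_size() and token_new() are
hypothetical stand-ins for JSONToken and the streamer's allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: fixed header, then the text in a flexible array. */
typedef struct Token {
    int type;
    int x;
    int y;
    char str[];
} Token;

/* Bytes requested from malloc: header + text + NUL.
 * The allocator's own overhead comes on top of this. */
static size_t token_alloc_size(size_t text_len)
{
    return sizeof(Token) + text_len + 1;
}

/* Copy @text into a freshly allocated token; NULL on allocation failure. */
static Token *token_new(int type, int x, int y, const char *text)
{
    size_t len = strlen(text);
    Token *t = malloc(token_alloc_size(len));

    if (!t) {
        return NULL;
    }
    t->type = type;
    t->x = x;
    t->y = y;
    memcpy(t->str, text, len + 1);
    return t;
}
```

With 4-byte ints, a five-character token such as "abc" (quotes
included) costs 12 + 5 + 1 = 18 bytes before malloc overhead.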

Observations:

1. This is not the only way to use recursive descent.  If we replaced
   "get next token" by a coroutine yield, we could do without a
   streamer.

2. The lexer reports errors by passing a JSON_ERROR token to the
   streamer.  This communicates the offending input characters and
   their location, but no more.

3. The streamer reports errors by passing a null token sequence to the
   callback.  The (already poor) lexical error information is thrown
   away.

4. Having the callback receive a token sequence duplicates the code to
   convert a token sequence to an abstract syntax tree in every callback.

5. Known bug: the streamer silently drops incomplete token sequences.

This commit rectifies 4. by lifting the call of the parser from the
callbacks into the streamer.  Later commits will address 3. and 5.

The lifting removes a bug from qjson.c's parse_json(): in certain
cases it passed a pointer to an Error * that was already non-null, as
demonstrated by check-qjson.c.

json_parser_parse() is now unused.  It's a stupid wrapper around
json_parser_parse_err().  Drop it, and rename json_parser_parse_err()
to json_parser_parse().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-parser.h   |  3 +--
 include/qapi/qmp/json-streamer.h |  8 ++++++--
 monitor.c                        | 18 ++++++++----------
 qapi/qmp-dispatch.c              |  1 -
 qga/main.c                       | 12 +++---------
 qobject/json-parser.c            |  7 +------
 qobject/json-streamer.c          | 19 +++++++++++--------
 qobject/qjson.c                  | 14 +++++---------
 tests/check-qjson.c              |  1 -
 tests/libqtest.c                 |  9 +++------
 10 files changed, 38 insertions(+), 54 deletions(-)

diff --git a/include/qapi/qmp/json-parser.h b/include/qapi/qmp/json-parser.h
index 102f5c0068..a34209db7a 100644
--- a/include/qapi/qmp/json-parser.h
+++ b/include/qapi/qmp/json-parser.h
@@ -16,7 +16,6 @@
 
 #include "qemu-common.h"
 
-QObject *json_parser_parse(GQueue *tokens, va_list *ap);
-QObject *json_parser_parse_err(GQueue *tokens, va_list *ap, Error **errp);
+QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
 
 #endif
diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h
index 7922e185a5..e162fd01da 100644
--- a/include/qapi/qmp/json-streamer.h
+++ b/include/qapi/qmp/json-streamer.h
@@ -25,7 +25,9 @@ typedef struct JSONToken {
 
 typedef struct JSONMessageParser
 {
-    void (*emit)(struct JSONMessageParser *parser, GQueue *tokens);
+    void (*emit)(void *opaque, QObject *json, Error *err);
+    void *opaque;
+    va_list *ap;
     JSONLexer lexer;
     int brace_count;
     int bracket_count;
@@ -37,7 +39,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
                                 JSONTokenType type, int x, int y);
 
 void json_message_parser_init(JSONMessageParser *parser,
-                              void (*func)(JSONMessageParser *, GQueue *));
+                              void (*emit)(void *opaque, QObject *json,
+                                           Error *err),
+                              void *opaque, va_list *ap);
 
 void json_message_parser_feed(JSONMessageParser *parser,
                              const char *buffer, size_t size);
diff --git a/monitor.c b/monitor.c
index 77861e96af..71658d9905 100644
--- a/monitor.c
+++ b/monitor.c
@@ -59,7 +59,6 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/json-streamer.h"
-#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qlist.h"
 #include "qom/object_interfaces.h"
 #include "trace-root.h"
@@ -4245,18 +4244,15 @@ static void monitor_qmp_bh_dispatcher(void *data)
 
 #define  QMP_REQ_QUEUE_LEN_MAX  (8)
 
-static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens)
+static void handle_qmp_command(void *opaque, QObject *req, Error *err)
 {
-    QObject *req, *id = NULL;
+    Monitor *mon = opaque;
+    QObject *id = NULL;
     QDict *qdict;
-    MonitorQMP *mon_qmp = container_of(parser, MonitorQMP, parser);
-    Monitor *mon = container_of(mon_qmp, Monitor, qmp);
-    Error *err = NULL;
     QMPRequest *req_obj;
 
-    req = json_parser_parse_err(tokens, NULL, &err);
     if (!req && !err) {
-        /* json_parser_parse_err() sucks: can fail without setting @err */
+        /* json_parser_parse() sucks: can fail without setting @err */
         error_setg(&err, QERR_JSON_PARSING);
     }
 
@@ -4452,7 +4448,8 @@ static void monitor_qmp_event(void *opaque, int event)
         monitor_qmp_response_flush(mon);
         monitor_qmp_cleanup_queues(mon);
         json_message_parser_destroy(&mon->qmp.parser);
-        json_message_parser_init(&mon->qmp.parser, handle_qmp_command);
+        json_message_parser_init(&mon->qmp.parser, handle_qmp_command,
+                                 mon, NULL);
         mon_refcount--;
         monitor_fdsets_cleanup();
         break;
@@ -4670,7 +4667,8 @@ void monitor_init(Chardev *chr, int flags)
 
     if (monitor_is_qmp(mon)) {
         qemu_chr_fe_set_echo(&mon->chr, true);
-        json_message_parser_init(&mon->qmp.parser, handle_qmp_command);
+        json_message_parser_init(&mon->qmp.parser, handle_qmp_command,
+                                 mon, NULL);
         if (mon->use_io_thread) {
             /*
              * Make sure the old iowatch is gone.  It's possible when
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 6f2d466596..d8da1a62de 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -14,7 +14,6 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/qmp/dispatch.h"
-#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qbool.h"
diff --git a/qga/main.c b/qga/main.c
index 87372d40ef..2fc49d00d8 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -19,7 +19,6 @@
 #include <sys/wait.h>
 #endif
 #include "qapi/qmp/json-streamer.h"
-#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qstring.h"
@@ -597,18 +596,13 @@ static void process_command(GAState *s, QDict *req)
 }
 
 /* handle requests/control events coming in over the channel */
-static void process_event(JSONMessageParser *parser, GQueue *tokens)
+static void process_event(void *opaque, QObject *obj, Error *err)
 {
-    GAState *s = container_of(parser, GAState, parser);
-    QObject *obj;
+    GAState *s = opaque;
     QDict *req, *rsp;
-    Error *err = NULL;
     int ret;
 
-    g_assert(s && parser);
-
     g_debug("process_event: called");
-    obj = json_parser_parse_err(tokens, NULL, &err);
     if (err) {
         goto err;
     }
@@ -1320,7 +1314,7 @@ static int run_agent(GAState *s, GAConfig *config, int socket_activation)
     s->command_state = ga_command_state_new();
     ga_command_state_init(s, s->command_state);
     ga_command_state_init_all(s->command_state);
-    json_message_parser_init(&s->parser, process_event);
+    json_message_parser_init(&s->parser, process_event, s, NULL);
 
 #ifndef _WIN32
     if (!register_signal_handlers()) {
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index b14336e653..0196d511c3 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -536,12 +536,7 @@ static QObject *parse_value(JSONParserContext *ctxt, va_list *ap)
     }
 }
 
-QObject *json_parser_parse(GQueue *tokens, va_list *ap)
-{
-    return json_parser_parse_err(tokens, ap, NULL);
-}
-
-QObject *json_parser_parse_err(GQueue *tokens, va_list *ap, Error **errp)
+QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
 {
     JSONParserContext ctxt = { .buf = tokens };
     QObject *result;
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 9f57ebf2bd..7fd0ff8756 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "qapi/qmp/json-lexer.h"
+#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/json-streamer.h"
 
 #define MAX_TOKEN_SIZE (64ULL << 20)
@@ -38,8 +39,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
                                 JSONTokenType type, int x, int y)
 {
     JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
+    Error *err = NULL;
     JSONToken *token;
-    GQueue *tokens;
+    QObject *json;
 
     switch (type) {
     case JSON_LCURLY:
@@ -97,19 +99,20 @@ out_emit:
     /* send current list of tokens to parser and reset tokenizer */
     parser->brace_count = 0;
     parser->bracket_count = 0;
-    /* parser->emit takes ownership of parser->tokens.  Remove our own
-     * reference to parser->tokens before handing it out to parser->emit.
-     */
-    tokens = parser->tokens;
+    json = json_parser_parse(parser->tokens, parser->ap, &err);
     parser->tokens = g_queue_new();
-    parser->emit(parser, tokens);
     parser->token_size = 0;
+    parser->emit(parser->opaque, json, err);
 }
 
 void json_message_parser_init(JSONMessageParser *parser,
-                              void (*func)(JSONMessageParser *, GQueue *))
+                              void (*emit)(void *opaque, QObject *json,
+                                           Error *err),
+                              void *opaque, va_list *ap)
 {
-    parser->emit = func;
+    parser->emit = emit;
+    parser->opaque = opaque;
+    parser->ap = ap;
     parser->brace_count = 0;
     parser->bracket_count = 0;
     parser->tokens = g_queue_new();
diff --git a/qobject/qjson.c b/qobject/qjson.c
index ab4040f235..7395556069 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -13,8 +13,6 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qapi/qmp/json-lexer.h"
-#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/json-streamer.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qbool.h"
@@ -27,16 +25,16 @@
 typedef struct JSONParsingState
 {
     JSONMessageParser parser;
-    va_list *ap;
     QObject *result;
     Error *err;
 } JSONParsingState;
 
-static void parse_json(JSONMessageParser *parser, GQueue *tokens)
+static void consume_json(void *opaque, QObject *json, Error *err)
 {
-    JSONParsingState *s = container_of(parser, JSONParsingState, parser);
+    JSONParsingState *s = opaque;
 
-    s->result = json_parser_parse_err(tokens, s->ap, &s->err);
+    s->result = json;
+    error_propagate(&s->err, err);
 }
 
 /*
@@ -54,9 +52,7 @@ static QObject *qobject_from_jsonv(const char *string, va_list *ap,
 {
     JSONParsingState state = {};
 
-    state.ap = ap;
-
-    json_message_parser_init(&state.parser, parse_json);
+    json_message_parser_init(&state.parser, consume_json, &state, ap);
     json_message_parser_feed(&state.parser, string, strlen(string));
     json_message_parser_flush(&state.parser);
     json_message_parser_destroy(&state.parser);
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index c8c0ad95a6..4c4afcf691 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1385,7 +1385,6 @@ static void multiple_values(void)
     qobject_unref(obj);
 
     /* BUG simultaneously succeeds and fails */
-    /* BUG calls json_parser_parse() with errp pointing to non-null */
     obj = qobject_from_json("} true", &err);
     g_assert(qbool_get_bool(qobject_to(QBool, obj)));
     error_free_or_abort(&err);
diff --git a/tests/libqtest.c b/tests/libqtest.c
index 9c844874e4..aa451214d9 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -23,7 +23,6 @@
 #include "libqtest.h"
 #include "qemu/cutils.h"
 #include "qapi/error.h"
-#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/json-streamer.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
@@ -428,12 +427,10 @@ typedef struct {
     QDict *response;
 } QMPResponseParser;
 
-static void qmp_response(JSONMessageParser *parser, GQueue *tokens)
+static void qmp_response(void *opaque, QObject *obj, Error *err)
 {
-    QMPResponseParser *qmp = container_of(parser, QMPResponseParser, parser);
-    QObject *obj;
+    QMPResponseParser *qmp = opaque;
 
-    obj = json_parser_parse(tokens, NULL);
     if (!obj) {
         fprintf(stderr, "QMP JSON response parsing failed\n");
         exit(1);
@@ -450,7 +447,7 @@ QDict *qmp_fd_receive(int fd)
     bool log = getenv("QTEST_LOG") != NULL;
 
     qmp.response = NULL;
-    json_message_parser_init(&qmp.parser, qmp_response);
+    json_message_parser_init(&qmp.parser, qmp_response, &qmp, NULL);
     while (!qmp.response) {
         ssize_t len;
         char c;
-- 
2.17.1

* [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse()
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (32 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 15:32   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used Markus Armbruster
                   ` (22 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

json_parser_parse() normally returns the parsed QObject on success,
except that it returns null when its @tokens argument is null.

Its only caller json_message_process_token() passes null @tokens when
emitting a lexical error.  The call is a rather opaque way to say json
= NULL then.

Simplify matters by lifting the assignment to json out of the emit
path: initialize json to null, set it to the value of
json_parser_parse() when there's no lexical error.  Drop the special
case from json_parser_parse().
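A minimal, self-contained sketch of the restructured emit path follows.
It assumes heavily simplified stand-ins: Streamer, TokType, parse() and
the string-typed json/err values are hypothetical and are not QEMU's
JSONMessageParser API.  json starts out null and is assigned only on the
path where the token stream is complete, so the parser is never handed a
null token list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's token types and QObject. */
typedef enum { TOK_LCURLY, TOK_RCURLY, TOK_VALUE, TOK_ERROR } TokType;

typedef struct {
    int brace_count;
    int ntokens;                /* stands in for the GQueue of tokens */
    void (*emit)(void *opaque, const char *json, const char *err);
    void *opaque;
} Streamer;

/* Stub parser: it can now assume it always gets real tokens. */
static const char *parse(const Streamer *s, const char **errp)
{
    assert(s->ntokens > 0);
    if (s->brace_count < 0) {
        *errp = "unbalanced '}'";
        return NULL;
    }
    return "parsed value";
}

static void process_token(Streamer *s, TokType type)
{
    const char *json = NULL;    /* default: nothing was parsed */
    const char *err = NULL;

    switch (type) {
    case TOK_LCURLY:
        s->brace_count++;
        break;
    case TOK_RCURLY:
        s->brace_count--;
        break;
    case TOK_ERROR:
        err = "lexical error";
        goto out_emit;          /* json stays NULL; parser not called */
    default:
        break;
    }
    s->ntokens++;

    if (s->brace_count <= 0) {
        json = parse(s, &err);  /* only reached with a complete value */
        goto out_emit;
    }
    return;

out_emit:
    s->brace_count = 0;
    s->ntokens = 0;             /* "free" the tokens and start over */
    s->emit(s->opaque, json, err);
}

/* Example consumer: records what it was handed. */
typedef struct { const char *json, *err; int calls; } Result;

static void record(void *opaque, const char *json, const char *err)
{
    Result *r = opaque;

    r->json = json;
    r->err = err;
    r->calls++;
}
```

Feeding "{" then "}" emits one value; feeding "{" then a lexical error
emits an error with a null json, and the streamer resets either way.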

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c   |  4 ----
 qobject/json-streamer.c | 25 ++++++++++++-------------
 2 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 0196d511c3..0e4ea564ab 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -541,10 +541,6 @@ QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
     JSONParserContext ctxt = { .buf = tokens };
     QObject *result;
 
-    if (!tokens) {
-        return NULL;
-    }
-
     result = parse_value(&ctxt, ap);
 
     error_propagate(errp, ctxt.err);
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 7fd0ff8756..0c33186e8e 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -39,9 +39,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
                                 JSONTokenType type, int x, int y)
 {
     JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
+    QObject *json = NULL;
     Error *err = NULL;
     JSONToken *token;
-    QObject *json;
 
     switch (type) {
     case JSON_LCURLY:
@@ -72,34 +72,33 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
     g_queue_push_tail(parser->tokens, token);
 
     if (type == JSON_ERROR) {
-        goto out_emit_bad;
-    } else if (parser->brace_count < 0 ||
+        goto out_emit;
+    }
+
+    if (parser->brace_count < 0 ||
         parser->bracket_count < 0 ||
         (parser->brace_count == 0 &&
          parser->bracket_count == 0)) {
+        json = json_parser_parse(parser->tokens, parser->ap, &err);
+        parser->tokens = NULL;
         goto out_emit;
-    } else if (parser->token_size > MAX_TOKEN_SIZE ||
+    }
+
+    if (parser->token_size > MAX_TOKEN_SIZE ||
                g_queue_get_length(parser->tokens) > MAX_TOKEN_COUNT ||
                parser->bracket_count + parser->brace_count > MAX_NESTING) {
         /* Security consideration, we limit total memory allocated per object
          * and the maximum recursion depth that a message can force.
          */
-        goto out_emit_bad;
+        goto out_emit;
     }
 
     return;
 
-out_emit_bad:
-    /*
-     * Clear out token list and tell the parser to emit an error
-     * indication by passing it a NULL list
-     */
-    json_message_free_tokens(parser);
 out_emit:
-    /* send current list of tokens to parser and reset tokenizer */
     parser->brace_count = 0;
     parser->bracket_count = 0;
-    json = json_parser_parse(parser->tokens, parser->ap, &err);
+    json_message_free_tokens(parser);
     parser->tokens = g_queue_new();
     parser->token_size = 0;
     parser->emit(parser->opaque, json, err);
-- 
2.17.1

* [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (33 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse() Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 15:32   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL Markus Armbruster
                   ` (21 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-streamer.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 0c33186e8e..fa595a8761 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -56,6 +56,8 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
     case JSON_RSQUARE:
         parser->bracket_count--;
         break;
+    case JSON_ERROR:
+        goto out_emit;
     default:
         break;
     }
@@ -71,10 +73,6 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
 
     g_queue_push_tail(parser->tokens, token);
 
-    if (type == JSON_ERROR) {
-        goto out_emit;
-    }
-
     if (parser->brace_count < 0 ||
         parser->bracket_count < 0 ||
         (parser->brace_count == 0 &&
-- 
2.17.1

* [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (34 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 15:34   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error Markus Armbruster
                   ` (20 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The JSON parser optionally supports interpolation.  The code calls it
"escape".  Awkward, because it uses the same term for escape sequences
within strings.  The latter usage is consistent with RFC 7159 "The
JavaScript Object Notation (JSON) Data Interchange Format" and ISO C.
Call the former "interpolation" instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-lexer.h |  2 +-
 qobject/json-lexer.c          | 64 +++++++++++++++++------------------
 qobject/json-parser.c         |  8 ++---
 3 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/include/qapi/qmp/json-lexer.h b/include/qapi/qmp/json-lexer.h
index 44bcf2ca64..ff3a6f80f0 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/include/qapi/qmp/json-lexer.h
@@ -27,7 +27,7 @@ typedef enum json_token_type {
     JSON_FLOAT,
     JSON_KEYWORD,
     JSON_STRING,
-    JSON_ESCAPE,
+    JSON_INTERPOL,
     JSON_SKIP,
     JSON_ERROR,
 } JSONTokenType;
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 0b54b1af56..5b1f720331 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -115,12 +115,12 @@ enum json_lexer_state {
     IN_NONZERO_NUMBER,
     IN_NEG_NONZERO_NUMBER,
     IN_KEYWORD,
-    IN_ESCAPE,
-    IN_ESCAPE_L,
-    IN_ESCAPE_LL,
-    IN_ESCAPE_I,
-    IN_ESCAPE_I6,
-    IN_ESCAPE_I64,
+    IN_INTERPOL,
+    IN_INTERPOL_L,
+    IN_INTERPOL_LL,
+    IN_INTERPOL_I,
+    IN_INTERPOL_I6,
+    IN_INTERPOL_I64,
     IN_WHITESPACE,
     IN_START,
 };
@@ -221,40 +221,40 @@ static const uint8_t json_lexer[][256] =  {
         ['\n'] = IN_WHITESPACE,
     },
 
-    /* escape */
-    [IN_ESCAPE_LL] = {
-        ['d'] = JSON_ESCAPE,
-        ['u'] = JSON_ESCAPE,
+    /* interpolation */
+    [IN_INTERPOL_LL] = {
+        ['d'] = JSON_INTERPOL,
+        ['u'] = JSON_INTERPOL,
     },
 
-    [IN_ESCAPE_L] = {
-        ['d'] = JSON_ESCAPE,
-        ['l'] = IN_ESCAPE_LL,
-        ['u'] = JSON_ESCAPE,
+    [IN_INTERPOL_L] = {
+        ['d'] = JSON_INTERPOL,
+        ['l'] = IN_INTERPOL_LL,
+        ['u'] = JSON_INTERPOL,
     },
 
-    [IN_ESCAPE_I64] = {
-        ['d'] = JSON_ESCAPE,
-        ['u'] = JSON_ESCAPE,
+    [IN_INTERPOL_I64] = {
+        ['d'] = JSON_INTERPOL,
+        ['u'] = JSON_INTERPOL,
     },
 
-    [IN_ESCAPE_I6] = {
-        ['4'] = IN_ESCAPE_I64,
+    [IN_INTERPOL_I6] = {
+        ['4'] = IN_INTERPOL_I64,
     },
 
-    [IN_ESCAPE_I] = {
-        ['6'] = IN_ESCAPE_I6,
+    [IN_INTERPOL_I] = {
+        ['6'] = IN_INTERPOL_I6,
     },
 
-    [IN_ESCAPE] = {
-        ['d'] = JSON_ESCAPE,
-        ['i'] = JSON_ESCAPE,
-        ['p'] = JSON_ESCAPE,
-        ['s'] = JSON_ESCAPE,
-        ['u'] = JSON_ESCAPE,
-        ['f'] = JSON_ESCAPE,
-        ['l'] = IN_ESCAPE_L,
-        ['I'] = IN_ESCAPE_I,
+    [IN_INTERPOL] = {
+        ['d'] = JSON_INTERPOL,
+        ['i'] = JSON_INTERPOL,
+        ['p'] = JSON_INTERPOL,
+        ['s'] = JSON_INTERPOL,
+        ['u'] = JSON_INTERPOL,
+        ['f'] = JSON_INTERPOL,
+        ['l'] = IN_INTERPOL_L,
+        ['I'] = IN_INTERPOL_I,
     },
 
     /* top level rule */
@@ -271,7 +271,7 @@ static const uint8_t json_lexer[][256] =  {
         [','] = JSON_COMMA,
         [':'] = JSON_COLON,
         ['a' ... 'z'] = IN_KEYWORD,
-        ['%'] = IN_ESCAPE,
+        ['%'] = IN_INTERPOL,
         [' '] = IN_WHITESPACE,
         ['\t'] = IN_WHITESPACE,
         ['\r'] = IN_WHITESPACE,
@@ -311,7 +311,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         case JSON_RSQUARE:
         case JSON_COLON:
         case JSON_COMMA:
-        case JSON_ESCAPE:
+        case JSON_INTERPOL:
         case JSON_INTEGER:
         case JSON_FLOAT:
         case JSON_KEYWORD:
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 0e4ea564ab..f1806ce0dc 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -418,7 +418,7 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
     return NULL;
 }
 
-static QObject *parse_escape(JSONParserContext *ctxt, va_list *ap)
+static QObject *parse_interpolation(JSONParserContext *ctxt, va_list *ap)
 {
     JSONToken *token;
 
@@ -427,7 +427,7 @@ static QObject *parse_escape(JSONParserContext *ctxt, va_list *ap)
     }
 
     token = parser_context_pop_token(ctxt);
-    assert(token && token->type == JSON_ESCAPE);
+    assert(token && token->type == JSON_INTERPOL);
 
     if (!strcmp(token->str, "%p")) {
         return va_arg(*ap, QObject *);
@@ -522,8 +522,8 @@ static QObject *parse_value(JSONParserContext *ctxt, va_list *ap)
         return parse_object(ctxt, ap);
     case JSON_LSQUARE:
         return parse_array(ctxt, ap);
-    case JSON_ESCAPE:
-        return parse_escape(ctxt, ap);
+    case JSON_INTERPOL:
+        return parse_interpolation(ctxt, ap);
     case JSON_INTEGER:
     case JSON_FLOAT:
     case JSON_STRING:
-- 
2.17.1

* [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (35 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 15:48   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback Markus Armbruster
                   ` (19 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The JSON parser optionally supports interpolation.  The lexer
recognizes interpolation tokens unconditionally.  The parser rejects
them when interpolation is disabled, in parse_interpolation().
However, it neglects to set an error then, which can make
json_parser_parse() fail without setting an error.

Move the check for unwanted interpolation from the parser's
parse_interpolation() into the lexer's finite state machine.  When
interpolation is disabled, '%' is now handled like any other
unexpected character.

The next commit will improve how such lexical errors are handled.
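The two-start-state idea can be sketched as a toy table-driven lexer.
Everything in it (the token set, lexer_table, Lexer, lexer_feed()) is
hypothetical and far smaller than qobject/json-lexer.c; it only shows how
IN_START_INTERPOL copies IN_START and adds '%', so that with
interpolation disabled, '%' falls through to the error entry like any
other unexpected character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced token set; 0 doubles as "no transition". */
enum { LEX_ERROR = 0, TOK_LCURLY, TOK_RCURLY, TOK_INTERPOL, TOK_SKIP };

/* The two start states index rows of the transition table. */
enum { IN_START, IN_START_INTERPOL, N_START_STATES };

static uint8_t lexer_table[N_START_STATES][256];

static void lexer_table_init(void)
{
    uint8_t *start = lexer_table[IN_START];

    memset(lexer_table, LEX_ERROR, sizeof(lexer_table));
    start['{'] = TOK_LCURLY;
    start['}'] = TOK_RCURLY;
    start[' '] = TOK_SKIP;
    start['\n'] = TOK_SKIP;

    /* IN_START_INTERPOL matches IN_START, and additionally accepts '%'. */
    memcpy(lexer_table[IN_START_INTERPOL], start, 256);
    lexer_table[IN_START_INTERPOL]['%'] = TOK_INTERPOL;
}

typedef struct {
    int start_state;
    int tokens, errors;
} Lexer;

static void lexer_init(Lexer *lx, bool enable_interpolation)
{
    lx->start_state = enable_interpolation ? IN_START_INTERPOL : IN_START;
    lx->tokens = lx->errors = 0;
}

/*
 * Every token in this toy grammar is a single character, so the lexer
 * is back in its start state after each one.
 */
static void lexer_feed(Lexer *lx, const char *buf)
{
    assert(buf);
    for (; *buf; buf++) {
        switch (lexer_table[lx->start_state][(unsigned char)*buf]) {
        case LEX_ERROR:
            lx->errors++;       /* '%' lands here when interpolation is off */
            break;
        case TOK_SKIP:
            break;
        default:
            lx->tokens++;
        }
    }
}
```

With interpolation off, "{ % }" yields two tokens and one lexical error;
with it on, three tokens and no error.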

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-lexer.h |  4 ++--
 qobject/json-lexer.c          | 42 ++++++++++++++++++++++++++---------
 qobject/json-parser.c         |  4 ----
 qobject/json-streamer.c       |  2 +-
 tests/qmp-test.c              |  4 ++++
 5 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/include/qapi/qmp/json-lexer.h b/include/qapi/qmp/json-lexer.h
index ff3a6f80f0..5586d12f26 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/include/qapi/qmp/json-lexer.h
@@ -33,12 +33,12 @@ typedef enum json_token_type {
 } JSONTokenType;
 
 typedef struct JSONLexer {
-    int state;
+    int start_state, state;
     GString *token;
     int x, y;
 } JSONLexer;
 
-void json_lexer_init(JSONLexer *lexer);
+void json_lexer_init(JSONLexer *lexer, bool enable_interpolation);
 
 void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size);
 
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 5b1f720331..0ea1eae4aa 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -122,6 +122,7 @@ enum json_lexer_state {
     IN_INTERPOL_I6,
     IN_INTERPOL_I64,
     IN_WHITESPACE,
+    IN_START_INTERPOL,
     IN_START,
 };
 
@@ -271,17 +272,38 @@ static const uint8_t json_lexer[][256] =  {
         [','] = JSON_COMMA,
         [':'] = JSON_COLON,
         ['a' ... 'z'] = IN_KEYWORD,
+        [' '] = IN_WHITESPACE,
+        ['\t'] = IN_WHITESPACE,
+        ['\r'] = IN_WHITESPACE,
+        ['\n'] = IN_WHITESPACE,
+    },
+
+    [IN_START_INTERPOL] = {
+        ['"'] = IN_DQ_STRING,
+        ['\''] = IN_SQ_STRING,
+        ['0'] = IN_ZERO,
+        ['1' ... '9'] = IN_NONZERO_NUMBER,
+        ['-'] = IN_NEG_NONZERO_NUMBER,
+        ['{'] = JSON_LCURLY,
+        ['}'] = JSON_RCURLY,
+        ['['] = JSON_LSQUARE,
+        [']'] = JSON_RSQUARE,
+        [','] = JSON_COMMA,
+        [':'] = JSON_COLON,
+        ['a' ... 'z'] = IN_KEYWORD,
+        [' '] = IN_WHITESPACE,
+        ['\t'] = IN_WHITESPACE,
+        ['\r'] = IN_WHITESPACE,
+        ['\n'] = IN_WHITESPACE,
+        /* matches IN_START up to here */
         ['%'] = IN_INTERPOL,
-        [' '] = IN_WHITESPACE,
-        ['\t'] = IN_WHITESPACE,
-        ['\r'] = IN_WHITESPACE,
-        ['\n'] = IN_WHITESPACE,
     },
 };
 
-void json_lexer_init(JSONLexer *lexer)
+void json_lexer_init(JSONLexer *lexer, bool enable_interpolation)
 {
-    lexer->state = IN_START;
+    lexer->start_state = lexer->state = enable_interpolation
+        ? IN_START_INTERPOL : IN_START;
     lexer->token = g_string_sized_new(3);
     lexer->x = lexer->y = 0;
 }
@@ -321,7 +343,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             /* fall through */
         case JSON_SKIP:
             g_string_truncate(lexer->token, 0);
-            new_state = IN_START;
+            new_state = lexer->start_state;
             break;
         case IN_ERROR:
             /* XXX: To avoid having previous bad input leaving the parser in an
@@ -340,7 +362,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             json_message_process_token(lexer, lexer->token, JSON_ERROR,
                                        lexer->x, lexer->y);
             g_string_truncate(lexer->token, 0);
-            new_state = IN_START;
+            new_state = lexer->start_state;
             lexer->state = new_state;
             return;
         default:
@@ -356,7 +378,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         json_message_process_token(lexer, lexer->token, lexer->state,
                                    lexer->x, lexer->y);
         g_string_truncate(lexer->token, 0);
-        lexer->state = IN_START;
+        lexer->state = lexer->start_state;
     }
 }
 
@@ -371,7 +393,7 @@ void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
 
 void json_lexer_flush(JSONLexer *lexer)
 {
-    if (lexer->state != IN_START) {
+    if (lexer->state != lexer->start_state) {
         json_lexer_feed_char(lexer, 0, true);
     }
 }
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index f1806ce0dc..848d469b2a 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -422,10 +422,6 @@ static QObject *parse_interpolation(JSONParserContext *ctxt, va_list *ap)
 {
     JSONToken *token;
 
-    if (ap == NULL) {
-        return NULL;
-    }
-
     token = parser_context_pop_token(ctxt);
     assert(token && token->type == JSON_INTERPOL);
 
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index fa595a8761..a373e0114a 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -115,7 +115,7 @@ void json_message_parser_init(JSONMessageParser *parser,
     parser->tokens = g_queue_new();
     parser->token_size = 0;
 
-    json_lexer_init(&parser->lexer);
+    json_lexer_init(&parser->lexer, !!ap);
 }
 
 void json_message_parser_feed(JSONMessageParser *parser,
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index b77987b644..3046567819 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -94,6 +94,10 @@ static void test_malformed(QTestState *qts)
 
     /* lexical error: interpolation */
     qtest_qmp_send_raw(qts, "%%p\n");
+    /* two errors, one for "%", one for "p" */
+    resp = qtest_qmp_receive(qts);
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
+    qobject_unref(resp);
     resp = qtest_qmp_receive(qts);
     g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
     qobject_unref(resp);
-- 
2.17.1

* [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (36 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 15:51   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser Markus Armbruster
                   ` (18 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The callback to consume JSON values takes QObject *json, Error *err.
If both are null, the callback is supposed to make up an error by
itself.  This sucks.

qjson.c's consume_json() neglects to do so, which makes
qobject_from_json() & friends return null instead of failing.  I
consider that a bug.

The culprit is json_message_process_token(): it passes two null
pointers when it runs into a lexical error or a limit violation.  Fix
it to pass a proper Error object then.  Update the callbacks:

* monitor.c's handle_qmp_command(): the code to make up an error is
  now dead, drop it.

* qga/main.c's process_event(): lumps the "both null" case together
  with the "not a JSON object" case.  The former is now gone.  The
  error message "Invalid JSON syntax" is misleading for the latter.
  Improve it to "Input must be a JSON object".

* qobject/qjson.c's consume_json(): no update; check-qjson
  demonstrates qobject_from_json() now sets an error on lexical
  errors, but still doesn't on some other errors.

* tests/libqtest.c's qmp_response(): the Error object is now reliable,
  so use it to improve the error message.
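A sketch of the two halves of the new contract, under assumed names:
check_limits(), Consumer, consume() and the SKETCH_* limit values are
hypothetical; only the three error messages come from the patch.  The
producer names the specific limit it hit, and the consumer asserts that
exactly one of json and err is non-null, the same !json != !err check
the updated callbacks use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Arbitrary illustrative limits, not QEMU's actual values. */
#define SKETCH_MAX_TOKEN_SIZE  (64u << 20)
#define SKETCH_MAX_TOKEN_COUNT 1024u
#define SKETCH_MAX_NESTING     64

/* Returns NULL when within limits, else a message naming the limit hit. */
static const char *check_limits(size_t token_size, size_t token_count,
                                int nesting)
{
    if (token_size > SKETCH_MAX_TOKEN_SIZE) {
        return "JSON token size limit exceeded";
    }
    if (token_count > SKETCH_MAX_TOKEN_COUNT) {
        return "JSON token count limit exceeded";
    }
    if (nesting > SKETCH_MAX_NESTING) {
        return "JSON nesting depth limit exceeded";
    }
    return NULL;
}

/* A consumer relying on the contract: exactly one of json/err is set. */
typedef struct {
    const char *json;
    char msg[128];
} Consumer;

static void consume(void *opaque, const char *json, const char *err)
{
    Consumer *c = opaque;

    assert(!json != !err);
    c->json = json;
    if (err) {
        /* Similar in spirit to error_prepend(): add context first. */
        snprintf(c->msg, sizeof(c->msg),
                 "QMP JSON response parsing failed: %s", err);
    } else {
        c->msg[0] = '\0';
    }
}
```

Because the error is always present on failure, the consumer no longer
needs a fallback message of its own.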

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/qerror.h |  3 ---
 monitor.c                 |  5 +----
 qga/main.c                |  3 ++-
 qobject/json-streamer.c   | 22 ++++++++++++++++------
 tests/check-qjson.c       | 14 +++++++-------
 tests/libqtest.c          |  7 +++++--
 6 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index c82360f429..145571f618 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -61,9 +61,6 @@
 #define QERR_IO_ERROR \
     "An IO error has occurred"
 
-#define QERR_JSON_PARSING \
-    "Invalid JSON syntax"
-
 #define QERR_MIGRATION_ACTIVE \
     "There's a migration process in progress"
 
diff --git a/monitor.c b/monitor.c
index 71658d9905..dc0ed8df92 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4251,10 +4251,7 @@ static void handle_qmp_command(void *opaque, QObject *req, Error *err)
     QDict *qdict;
     QMPRequest *req_obj;
 
-    if (!req && !err) {
-        /* json_parser_parse() sucks: can fail without setting @err */
-        error_setg(&err, QERR_JSON_PARSING);
-    }
+    assert(!req != !err);
 
     qdict = qobject_to(QDict, req);
     if (qdict) {
diff --git a/qga/main.c b/qga/main.c
index 2fc49d00d8..b74e1241ef 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -603,12 +603,13 @@ static void process_event(void *opaque, QObject *obj, Error *err)
     int ret;
 
     g_debug("process_event: called");
+    assert(!obj != !err);
     if (err) {
         goto err;
     }
     req = qobject_to(QDict, obj);
     if (!req) {
-        error_setg(&err, QERR_JSON_PARSING);
+        error_setg(&err, "Input must be a JSON object");
         goto err;
     }
     if (!qdict_haskey(req, "execute")) {
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index a373e0114a..e372ecc895 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qapi/error.h"
 #include "qapi/qmp/json-lexer.h"
 #include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/json-streamer.h"
@@ -57,6 +58,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
         parser->bracket_count--;
         break;
     case JSON_ERROR:
+        error_setg(&err, "JSON parse error, stray '%s'", input->str);
         goto out_emit;
     default:
         break;
@@ -82,12 +84,20 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
         goto out_emit;
     }
 
-    if (parser->token_size > MAX_TOKEN_SIZE ||
-               g_queue_get_length(parser->tokens) > MAX_TOKEN_COUNT ||
-               parser->bracket_count + parser->brace_count > MAX_NESTING) {
-        /* Security consideration, we limit total memory allocated per object
-         * and the maximum recursion depth that a message can force.
-         */
+    /*
+     * Security consideration, we limit total memory allocated per object
+     * and the maximum recursion depth that a message can force.
+     */
+    if (parser->token_size > MAX_TOKEN_SIZE) {
+        error_setg(&err, "JSON token size limit exceeded");
+        goto out_emit;
+    }
+    if (g_queue_get_length(parser->tokens) > MAX_TOKEN_COUNT) {
+        error_setg(&err, "JSON token count limit exceeded");
+        goto out_emit;
+    }
+    if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
+        error_setg(&err, "JSON nesting depth limit exceeded");
         goto out_emit;
     }
 
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 4c4afcf691..895be489b3 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1247,11 +1247,11 @@ static void junk_input(void)
     QObject *obj;
 
     obj = qobject_from_json("@", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 
     obj = qobject_from_json("{\x01", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 
     obj = qobject_from_json("[0\xFF]", &err);
@@ -1259,11 +1259,11 @@ static void junk_input(void)
     g_assert(obj == NULL);
 
     obj = qobject_from_json("00", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 
     obj = qobject_from_json("[1e", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
@@ -1271,7 +1271,7 @@ static void unterminated_string(void)
 {
     Error *err = NULL;
     QObject *obj = qobject_from_json("\"abc", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
@@ -1279,7 +1279,7 @@ static void unterminated_sq_string(void)
 {
     Error *err = NULL;
     QObject *obj = qobject_from_json("'abc", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
@@ -1287,7 +1287,7 @@ static void unterminated_escape(void)
 {
     Error *err = NULL;
     QObject *obj = qobject_from_json("\"abc\\\"", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
diff --git a/tests/libqtest.c b/tests/libqtest.c
index aa451214d9..7ef8dd621f 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -431,8 +431,11 @@ static void qmp_response(void *opaque, QObject *obj, Error *err)
 {
     QMPResponseParser *qmp = opaque;
 
-    if (!obj) {
-        fprintf(stderr, "QMP JSON response parsing failed\n");
+    assert(!obj != !err);
+
+    if (err) {
+        error_prepend(&err, "QMP JSON response parsing failed: ");
+        error_report_err(err);
         exit(1);
     }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (37 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 16:12   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64 Markus Armbruster
                   ` (17 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Both lexer and parser reject invalid interpolation specifications.
The parser's check is useless.

The lexer ends the token right after the first bad character.  This
tends to lead to suboptimal error reporting.  For instance, input

    [ %11d ]

produces the tokens

    JSON_LSQUARE  [
    JSON_ERROR    %1
    JSON_INTEGER  1
    JSON_KEYWORD  d
    JSON_RSQUARE  ]

The parser then yields an error, an object and two more errors:

    error: Invalid JSON syntax
    object: 1
    error: JSON parse error, invalid keyword
    error: JSON parse error, expecting value

Change the lexer to accept [A-Za-z0-9]*[duipsf].  It now produces

    JSON_LSQUARE  [
    JSON_INTERPOL %11d
    JSON_RSQUARE  ]

and the parser reports just

    JSON parse error, invalid interpolation '%11d'
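The division of labour can be sketched with hypothetical helpers
lex_interpolation(), interpolation_valid() and report_invalid(); this is
not QEMU's parse_interpolation().  The lexer side swallows '%' plus any
run of [A-Za-z0-9] as one token, as the patch's lexer comment describes,
and the parser side checks the token against the conversions the old
lexer grammar %((l|ll|I64)[du]|[ipsf]) allowed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Lexer side: at a '%', accept %[A-Za-z0-9]* as one token, whatever it
 * spells.  Returns the token length, or 0 if s does not start with '%'.
 */
static size_t lex_interpolation(const char *s)
{
    size_t n;

    if (s[0] != '%') {
        return 0;
    }
    for (n = 1; isalnum((unsigned char)s[n]); n++) {
        /* keep consuming letters and digits */
    }
    return n;
}

/* Parser side: accept only what %((l|ll|I64)[du]|[ipsf]) matches. */
static bool interpolation_valid(const char *tok, size_t len)
{
    static const char *const valid[] = {
        "%d", "%u", "%ld", "%lu", "%lld", "%llu", "%I64d", "%I64u",
        "%i", "%p", "%s", "%f",
    };
    size_t i;

    for (i = 0; i < sizeof(valid) / sizeof(valid[0]); i++) {
        if (strlen(valid[i]) == len && !memcmp(valid[i], tok, len)) {
            return true;
        }
    }
    return false;
}

/* Format a single error that names the whole offending token. */
static void report_invalid(char *buf, size_t size, const char *tok,
                           size_t len)
{
    snprintf(buf, size, "JSON parse error, invalid interpolation '%.*s'",
             (int)len, tok);
}
```

For the input "[ %11d ]", the whole of %11d becomes one four-character
token, so a single error names it instead of a cascade of errors for
its fragments.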

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c  | 52 +++++++++----------------------------------
 qobject/json-parser.c |  1 +
 2 files changed, 11 insertions(+), 42 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 0ea1eae4aa..7a82aab88b 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -93,7 +93,8 @@
  *   (apostrophe) instead of %x22 (quotation mark), and can't contain
  *   unescaped apostrophe, but can contain unescaped quotation mark.
  * - Interpolation:
- *   interpolation = %((l|ll|I64)[du]|[ipsf])
+ *   The lexer accepts [A-Za-z0-9]*, and leaves rejecting invalid ones
+ *   to the parser.
  *
  * Note:
  * - Input must be encoded in modified UTF-8.
@@ -116,11 +117,6 @@ enum json_lexer_state {
     IN_NEG_NONZERO_NUMBER,
     IN_KEYWORD,
     IN_INTERPOL,
-    IN_INTERPOL_L,
-    IN_INTERPOL_LL,
-    IN_INTERPOL_I,
-    IN_INTERPOL_I6,
-    IN_INTERPOL_I64,
     IN_WHITESPACE,
     IN_START_INTERPOL,
     IN_START,
@@ -222,42 +218,6 @@ static const uint8_t json_lexer[][256] =  {
         ['\n'] = IN_WHITESPACE,
     },
 
-    /* interpolation */
-    [IN_INTERPOL_LL] = {
-        ['d'] = JSON_INTERPOL,
-        ['u'] = JSON_INTERPOL,
-    },
-
-    [IN_INTERPOL_L] = {
-        ['d'] = JSON_INTERPOL,
-        ['l'] = IN_INTERPOL_LL,
-        ['u'] = JSON_INTERPOL,
-    },
-
-    [IN_INTERPOL_I64] = {
-        ['d'] = JSON_INTERPOL,
-        ['u'] = JSON_INTERPOL,
-    },
-
-    [IN_INTERPOL_I6] = {
-        ['4'] = IN_INTERPOL_I64,
-    },
-
-    [IN_INTERPOL_I] = {
-        ['6'] = IN_INTERPOL_I6,
-    },
-
-    [IN_INTERPOL] = {
-        ['d'] = JSON_INTERPOL,
-        ['i'] = JSON_INTERPOL,
-        ['p'] = JSON_INTERPOL,
-        ['s'] = JSON_INTERPOL,
-        ['u'] = JSON_INTERPOL,
-        ['f'] = JSON_INTERPOL,
-        ['l'] = IN_INTERPOL_L,
-        ['I'] = IN_INTERPOL_I,
-    },
-
     /* top level rule */
     [IN_START] = {
         ['"'] = IN_DQ_STRING,
@@ -278,6 +238,14 @@ static const uint8_t json_lexer[][256] =  {
         ['\n'] = IN_WHITESPACE,
     },
 
+    /* interpolation */
+    [IN_INTERPOL] = {
+        TERMINAL(JSON_INTERPOL),
+        ['A' ... 'Z'] = IN_INTERPOL,
+        ['a' ... 'z'] = IN_INTERPOL,
+        ['0' ... '9'] = IN_INTERPOL,
+    },
+
     [IN_START_INTERPOL] = {
         ['"'] = IN_DQ_STRING,
         ['\''] = IN_SQ_STRING,
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 848d469b2a..bd137399e5 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -448,6 +448,7 @@ static QObject *parse_interpolation(JSONParserContext *ctxt, va_list *ap)
     } else if (!strcmp(token->str, "%f")) {
         return QOBJECT(qnum_from_double(va_arg(*ap, double)));
     }
+    parse_error(ctxt, token, "invalid interpolation '%s'", token->str);
     return NULL;
 }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (38 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 16:18   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero Markus Armbruster
                   ` (16 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Support for %I64d got added in commit 2c0d4b36e7f "json: fix PRId64
on Win32".  We had to hard-code I64d because we used the lexer's
finite state machine to check interpolations.  No more, so clean this
up.

Additional conversion specifications would be easy enough to implement
when needed.
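A minimal sketch of why matching "%" PRId64 by string comparison works, assuming a hypothetical classify() helper rather than the real parse_interpolation().  "%" PRId64 is ordinary string-literal concatenation with the <inttypes.h> macro, whose expansion is platform-dependent (commonly "ld" on 64-bit Linux, "lld" on many 32-bit hosts, "I64d" with some Windows C runtimes).  Where it expands to "ld" or "lld", an earlier branch matches first; that is harmless because PRId64 names the conversion printf() itself expects for int64_t on that host.

```c
#include <assert.h>
#include <inttypes.h>
#include <string.h>

/* Hypothetical result of matching a signed integer specification */
enum interp_kind {
    INTERP_INVALID,
    INTERP_LONG,       /* "%ld"  -> va_arg(ap, long) */
    INTERP_LONG_LONG,  /* "%lld" -> va_arg(ap, long long) */
    INTERP_INT64,      /* "%" PRId64 -> va_arg(ap, int64_t) */
};

static enum interp_kind classify(const char *spec)
{
    if (!strcmp(spec, "%ld")) {
        return INTERP_LONG;
    } else if (!strcmp(spec, "%lld")) {
        return INTERP_LONG_LONG;
    } else if (!strcmp(spec, "%" PRId64)) {
        /* Only reachable where PRId64 is neither "ld" nor "lld" */
        return INTERP_INT64;
    }
    return INTERP_INVALID;
}
```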

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 10 ++++++----
 tests/check-qjson.c   | 10 ++++++++++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index bd137399e5..350a9d267b 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -433,16 +433,18 @@ static QObject *parse_interpolation(JSONParserContext *ctxt, va_list *ap)
         return QOBJECT(qnum_from_int(va_arg(*ap, int)));
     } else if (!strcmp(token->str, "%ld")) {
         return QOBJECT(qnum_from_int(va_arg(*ap, long)));
-    } else if (!strcmp(token->str, "%lld") ||
-               !strcmp(token->str, "%I64d")) {
+    } else if (!strcmp(token->str, "%lld")) {
         return QOBJECT(qnum_from_int(va_arg(*ap, long long)));
+    } else if (!strcmp(token->str, "%" PRId64)) {
+        return QOBJECT(qnum_from_int(va_arg(*ap, int64_t)));
     } else if (!strcmp(token->str, "%u")) {
         return QOBJECT(qnum_from_uint(va_arg(*ap, unsigned int)));
     } else if (!strcmp(token->str, "%lu")) {
         return QOBJECT(qnum_from_uint(va_arg(*ap, unsigned long)));
-    } else if (!strcmp(token->str, "%llu") ||
-               !strcmp(token->str, "%I64u")) {
+    } else if (!strcmp(token->str, "%llu")) {
         return QOBJECT(qnum_from_uint(va_arg(*ap, unsigned long long)));
+    } else if (!strcmp(token->str, "%" PRIu64)) {
+        return QOBJECT(qnum_from_uint(va_arg(*ap, uint64_t)));
     } else if (!strcmp(token->str, "%s")) {
         return QOBJECT(qstring_from_str(va_arg(*ap, const char *)));
     } else if (!strcmp(token->str, "%f")) {
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 895be489b3..fbb607c227 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -919,9 +919,11 @@ static void keyword_literal(void)
 static void interpolation(void)
 {
     long long value_lld = 0x123456789abcdefLL;
+    int64_t value_d64 = value_lld;
     long value_ld = (long)value_lld;
     int value_d = (int)value_lld;
     unsigned long long value_llu = 0xfedcba9876543210ULL;
+    uint64_t value_u64 = value_llu;
     unsigned long value_lu = (unsigned long)value_llu;
     unsigned value_u = (unsigned)value_llu;
     double value_f = 2.323423423;
@@ -959,6 +961,10 @@ static void interpolation(void)
     g_assert_cmpint(qnum_get_int(qnum), ==, value_lld);
     qobject_unref(qnum);
 
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%" PRId64, value_d64));
+    g_assert_cmpint(qnum_get_int(qnum), ==, value_lld);
+    qobject_unref(qnum);
+
     qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%u", value_u));
     g_assert_cmpuint(qnum_get_uint(qnum), ==, value_u);
     qobject_unref(qnum);
@@ -971,6 +977,10 @@ static void interpolation(void)
     g_assert_cmpuint(qnum_get_uint(qnum), ==, value_llu);
     qobject_unref(qnum);
 
+    qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%" PRIu64, value_u64));
+    g_assert_cmpuint(qnum_get_uint(qnum), ==, value_llu);
+    qobject_unref(qnum);
+
     qnum = qobject_to(QNum, qobject_from_jsonf_nofail("%f", value_f));
     g_assert(qnum_get_double(qnum) == value_f);
     qobject_unref(qnum);
-- 
2.17.1

* [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (39 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64 Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 16:33   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers Markus Armbruster
                   ` (15 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

For input 0123, the lexer produces the tokens

    JSON_ERROR    01
    JSON_INTEGER  23

Reporting an error is correct; 0123 is invalid according to RFC 7159.
But the error recovery isn't nice.

Make the finite state machine eat digits before going into the error
state.  The lexer now produces

    JSON_ERROR    0123
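A minimal sketch of the recovery, assuming a heavily reduced, hand-written state machine (non-negative integers only, no '-', '.' or exponent); the state names only mirror IN_ZERO and IN_BAD_ZERO and are not the table in json-lexer.c.  After a leading zero, a further digit moves to a "bad zero" state that keeps consuming digits, so the whole run becomes one error token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced states; the real lexer also handles '-', '.', 'e' */
enum num_state { ST_START, ST_ZERO, ST_BAD_ZERO, ST_DIGITS };
enum num_token { TOK_INTEGER, TOK_ERROR };

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/*
 * Lex one non-negative integer at the start of s, store the token
 * length in *len and return its type.
 */
static enum num_token lex_integer(const char *s, size_t *len)
{
    enum num_state st = ST_START;
    size_t i;

    for (i = 0; ; i++) {
        char c = s[i];

        switch (st) {
        case ST_START:
            if (c == '0') {
                st = ST_ZERO;
            } else if (is_digit(c)) {
                st = ST_DIGITS;
            } else {
                *len = 0;
                return TOK_ERROR;
            }
            break;
        case ST_ZERO:
            if (!is_digit(c)) {
                *len = i;               /* plain "0" */
                return TOK_INTEGER;
            }
            st = ST_BAD_ZERO;           /* leading zero: keep eating digits */
            break;
        case ST_BAD_ZERO:
            if (!is_digit(c)) {
                *len = i;               /* whole digit run is one error */
                return TOK_ERROR;
            }
            break;
        case ST_DIGITS:
            if (!is_digit(c)) {
                *len = i;
                return TOK_INTEGER;
            }
            break;
        }
    }
}
```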

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 7a82aab88b..f600cc732e 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -108,6 +108,7 @@ enum json_lexer_state {
     IN_SQ_STRING_ESCAPE,
     IN_SQ_STRING,
     IN_ZERO,
+    IN_BAD_ZERO,
     IN_DIGITS,
     IN_DIGIT,
     IN_EXP_E,
@@ -158,10 +159,14 @@ static const uint8_t json_lexer[][256] =  {
     /* Zero */
     [IN_ZERO] = {
         TERMINAL(JSON_INTEGER),
-        ['0' ... '9'] = IN_ERROR,
+        ['0' ... '9'] = IN_BAD_ZERO,
         ['.'] = IN_MANTISSA,
     },
 
+    [IN_BAD_ZERO] = {
+        ['0' ... '9'] = IN_BAD_ZERO,
+    },
+
     /* Float */
     [IN_DIGITS] = {
         TERMINAL(JSON_FLOAT),
-- 
2.17.1

* [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (40 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-13 16:36   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values Markus Armbruster
                   ` (14 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index f600cc732e..733ce3f5ba 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -109,13 +109,13 @@ enum json_lexer_state {
     IN_SQ_STRING,
     IN_ZERO,
     IN_BAD_ZERO,
-    IN_DIGITS,
-    IN_DIGIT,
+    IN_EXP_DIGITS,
+    IN_EXP_SIGN,
     IN_EXP_E,
     IN_MANTISSA,
     IN_MANTISSA_DIGITS,
-    IN_NONZERO_NUMBER,
-    IN_NEG_NONZERO_NUMBER,
+    IN_DIGITS,
+    IN_SIGN,
     IN_KEYWORD,
     IN_INTERPOL,
     IN_WHITESPACE,
@@ -168,19 +168,19 @@ static const uint8_t json_lexer[][256] =  {
     },
 
     /* Float */
-    [IN_DIGITS] = {
+    [IN_EXP_DIGITS] = {
         TERMINAL(JSON_FLOAT),
-        ['0' ... '9'] = IN_DIGITS,
+        ['0' ... '9'] = IN_EXP_DIGITS,
     },
 
-    [IN_DIGIT] = {
-        ['0' ... '9'] = IN_DIGITS,
+    [IN_EXP_SIGN] = {
+        ['0' ... '9'] = IN_EXP_DIGITS,
     },
 
     [IN_EXP_E] = {
-        ['-'] = IN_DIGIT,
-        ['+'] = IN_DIGIT,
-        ['0' ... '9'] = IN_DIGITS,
+        ['-'] = IN_EXP_SIGN,
+        ['+'] = IN_EXP_SIGN,
+        ['0' ... '9'] = IN_EXP_DIGITS,
     },
 
     [IN_MANTISSA_DIGITS] = {
@@ -195,17 +195,17 @@ static const uint8_t json_lexer[][256] =  {
     },
 
     /* Number */
-    [IN_NONZERO_NUMBER] = {
+    [IN_DIGITS] = {
         TERMINAL(JSON_INTEGER),
-        ['0' ... '9'] = IN_NONZERO_NUMBER,
+        ['0' ... '9'] = IN_DIGITS,
         ['e'] = IN_EXP_E,
         ['E'] = IN_EXP_E,
         ['.'] = IN_MANTISSA,
     },
 
-    [IN_NEG_NONZERO_NUMBER] = {
+    [IN_SIGN] = {
         ['0'] = IN_ZERO,
-        ['1' ... '9'] = IN_NONZERO_NUMBER,
+        ['1' ... '9'] = IN_DIGITS,
     },
 
     /* keywords */
@@ -228,8 +228,8 @@ static const uint8_t json_lexer[][256] =  {
         ['"'] = IN_DQ_STRING,
         ['\''] = IN_SQ_STRING,
         ['0'] = IN_ZERO,
-        ['1' ... '9'] = IN_NONZERO_NUMBER,
-        ['-'] = IN_NEG_NONZERO_NUMBER,
+        ['1' ... '9'] = IN_DIGITS,
+        ['-'] = IN_SIGN,
         ['{'] = JSON_LCURLY,
         ['}'] = JSON_RCURLY,
         ['['] = JSON_LSQUARE,
@@ -255,8 +255,8 @@ static const uint8_t json_lexer[][256] =  {
         ['"'] = IN_DQ_STRING,
         ['\''] = IN_SQ_STRING,
         ['0'] = IN_ZERO,
-        ['1' ... '9'] = IN_NONZERO_NUMBER,
-        ['-'] = IN_NEG_NONZERO_NUMBER,
+        ['1' ... '9'] = IN_DIGITS,
+        ['-'] = IN_SIGN,
         ['{'] = JSON_LCURLY,
         ['}'] = JSON_RCURLY,
         ['['] = JSON_LSQUARE,
-- 
2.17.1

* [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (41 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-14 13:26   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input Markus Armbruster
                   ` (13 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

qobject_from_json() & friends use the consume_json() callback to
receive either a value or an error from the parser.

When they are fed a string that yields more than one JSON value or
syntax error in total, consume_json() gets called multiple times.

When the last call receives a value, qobject_from_json() returns that
value.  Any other values are leaked.

When any call receives an error, qobject_from_json() sets the first
error received.  Any other errors are thrown away.

When values follow errors, qobject_from_json() returns both a value
and sets an error.  That's bad.  Impact:

* block.c's parse_json_protocol() ignores and leaks the value.  It's
  used to parse pseudo-filenames starting with "json:".  The
  pseudo-filenames can come from the user or from image meta-data such
  as a QCOW2 image's backing file name.

* vl.c's parse_display_qapi() ignores and leaks the error.  It's used
  to parse the argument of command line option -display.

* vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves
  it in @err.  main() will then pass a pointer to a non-null Error *
  to net_init_clients(), which is forbidden.  It can lead to assertion
  failure or other misbehavior.

* check-qjson.c's multiple_values() demonstrates the badness.

* The other callers are not affected since they only pass strings with
  exactly one JSON value or, in the case of negative tests, one
  error.

The impact on the _nofail() functions is relatively benign.  They
abort when any call receives an error.  Else they return the last
value, and leak the others, if any.

Fix consume_json() as follows.  On the first call, save value and
error as before.  On subsequent calls, if any, don't save them.  If
the first call saved a value, the next call, if any, replaces the
value by an "Expecting at most one JSON value" error.  Take care not
to leak values or errors that aren't saved.
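A minimal sketch of the fixed callback logic, assuming hypothetical stand-ins (Value, Err, ParsingState, consume()) with malloc()/free() in place of QObject reference counting and the Error API.  The first item wins; a second item after a value turns the result into an error; anything arriving after an error is freed and dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for QObject and Error */
typedef struct Value {
    int v;
} Value;

typedef struct Err {
    const char *msg;
} Err;

typedef struct ParsingState {
    Value *result;
    Err *err;
} ParsingState;

static Value *value_new(int v)
{
    Value *val = malloc(sizeof(*val));

    assert(val);
    val->v = v;
    return val;
}

static Err *err_new(const char *msg)
{
    Err *e = malloc(sizeof(*e));

    assert(e);
    e->msg = msg;
    return e;
}

/* Called once per value or error; exactly one of json, err is non-null */
static void consume(ParsingState *s, Value *json, Err *err)
{
    assert(!json != !err);
    assert(!s->result || !s->err);

    if (s->result) {
        /* Second item after a value: drop the value, record an error */
        free(s->result);
        s->result = NULL;
        s->err = err_new("Expecting at most one JSON value");
    }
    if (s->err) {
        /* Already failed: discard (and don't leak) this call's item */
        free(json);
        free(err);
        return;
    }
    s->result = json;
    s->err = err;
}
```

With "false true" the second call replaces the value by the error; with "} true" the first error is kept and the value is freed.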

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/qjson.c     | 15 ++++++++++++++-
 tests/check-qjson.c | 10 +++-------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/qobject/qjson.c b/qobject/qjson.c
index 7395556069..7f69036487 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -33,8 +33,21 @@ static void consume_json(void *opaque, QObject *json, Error *err)
 {
     JSONParsingState *s = opaque;
 
+    assert(!json != !err);
+    assert(!s->result || !s->err);
+
+    if (s->result) {
+        qobject_unref(s->result);
+        s->result = NULL;
+        error_setg(&s->err, "Expecting at most one JSON value");
+    }
+    if (s->err) {
+        qobject_unref(json);
+        error_free(err);
+        return;
+    }
     s->result = json;
-    error_propagate(&s->err, err);
+    s->err = err;
 }
 
 /*
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index fbb607c227..30b1b037d3 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1388,17 +1388,13 @@ static void multiple_values(void)
     Error *err = NULL;
     QObject *obj;
 
-    /* BUG this leaks the syntax tree for "false" */
     obj = qobject_from_json("false true", &err);
-    g_assert(qbool_get_bool(qobject_to(QBool, obj)));
-    g_assert(!err);
-    qobject_unref(obj);
+    error_free_or_abort(&err);
+    g_assert(obj == NULL);
 
-    /* BUG simultaneously succeeds and fails */
     obj = qobject_from_json("} true", &err);
-    g_assert(qbool_get_bool(qobject_to(QBool, obj)));
     error_free_or_abort(&err);
-    qobject_unref(obj);
+    g_assert(obj == NULL);
 }
 
 int main(int argc, char **argv)
-- 
2.17.1

* [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (42 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:10   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures Markus Armbruster
                   ` (12 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

json-parser.c carefully reports end of input like this:

    token = parser_context_pop_token(ctxt);
    if (token == NULL) {
	parse_error(ctxt, NULL, "premature EOI");
	goto out;
    }

Except parser_context_pop_token() can't return null; it fails its
assertion instead.  Same for parser_context_peek_token().  Broken in
commit 65c0f1e9558, and faithfully preserved in commit 95385fe9ace.
Only a latent bug, because the streamer throws away any input that
could trigger it.

Drop the assertions, so we can fix the streamer in the next commit.
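A minimal sketch of the intended contract, assuming a hypothetical array-backed TokenQueue in place of the parser's GQueue (GLib's g_queue_pop_head() and g_queue_peek_head() already return NULL on an empty queue).  Once the assertion is gone, running out of tokens surfaces as NULL, and the caller's existing "premature EOI" path handles it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical array-backed stand-in for the parser's GQueue of tokens */
typedef struct TokenQueue {
    const char **tokens;
    size_t len;
    size_t head;
} TokenQueue;

/* Return the next token, or NULL at end of input instead of asserting */
static const char *pop_token(TokenQueue *q)
{
    if (q->head == q->len) {
        return NULL;
    }
    return q->tokens[q->head++];
}

/* Caller pattern from the commit message: NULL means premature EOI */
static int expect_token(TokenQueue *q, const char **tok)
{
    *tok = pop_token(q);
    if (*tok == NULL) {
        fprintf(stderr, "premature EOI\n");
        return -1;
    }
    return 0;
}
```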

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 350a9d267b..c2974d46b3 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -221,14 +221,12 @@ out:
 static JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
 {
     g_free(ctxt->current);
-    assert(!g_queue_is_empty(ctxt->buf));
     ctxt->current = g_queue_pop_head(ctxt->buf);
     return ctxt->current;
 }
 
 static JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
 {
-    assert(!g_queue_is_empty(ctxt->buf));
     return g_queue_peek_head(ctxt->buf);
 }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (43 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:12   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success Markus Armbruster
                   ` (11 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

json_message_process_token() accumulates tokens until it has the
sequence of tokens that comprises a single JSON value (it counts curly
braces and square brackets to decide).  It feeds those token sequences
to json_parser_parse().  If a non-empty sequence of tokens remains at
the end of the parse, it's silently ignored.  check-qjson.c cases
unterminated_array(), unterminated_array_comma(), unterminated_dict(),
unterminated_dict_comma() demonstrate this bug.

Fix as follows.  Introduce a JSON_END_OF_INPUT token.  When the
streamer receives it, it feeds the accumulated tokens to
json_parser_parse().
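A minimal sketch of the streamer's bookkeeping, assuming a hypothetical Streamer that counts outcomes instead of calling json_parser_parse().  Balanced sequences are handed off as soon as the nesting depth returns to zero.  The new end-of-input token is what lets leftover tokens, such as those from "[32", reach the parser (and its error) instead of being silently dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced token types */
typedef enum {
    TOK_LSQUARE, TOK_RSQUARE, TOK_LCURLY, TOK_RCURLY,
    TOK_SCALAR, TOK_END_OF_INPUT,
} TokType;

typedef struct Streamer {
    int depth;         /* open square brackets plus curly braces */
    size_t pending;    /* tokens accumulated for the current value */
    int complete;      /* balanced sequences handed to the parser */
    int unterminated;  /* leftover sequences flushed at end of input */
} Streamer;

static void process_token(Streamer *s, TokType type)
{
    if (type == TOK_END_OF_INPUT) {
        if (s->pending == 0) {
            return;     /* nothing left over */
        }
        /* The real streamer passes these to the parser, which errors */
        s->unterminated++;
        s->pending = 0;
        s->depth = 0;
        return;
    }

    s->pending++;
    if (type == TOK_LSQUARE || type == TOK_LCURLY) {
        s->depth++;
    } else if (type == TOK_RSQUARE || type == TOK_RCURLY) {
        s->depth--;
    }
    if (s->depth <= 0) {
        /* Balanced (or stray closer): hand off and start over */
        s->complete++;
        s->pending = 0;
        s->depth = 0;
    }
}
```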

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-lexer.h | 1 +
 qobject/json-lexer.c          | 2 ++
 qobject/json-streamer.c       | 8 ++++++++
 tests/check-qjson.c           | 8 ++++----
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/qapi/qmp/json-lexer.h b/include/qapi/qmp/json-lexer.h
index 5586d12f26..8058695e40 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/include/qapi/qmp/json-lexer.h
@@ -30,6 +30,7 @@ typedef enum json_token_type {
     JSON_INTERPOL,
     JSON_SKIP,
     JSON_ERROR,
+    JSON_END_OF_INPUT
 } JSONTokenType;
 
 typedef struct JSONLexer {
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 733ce3f5ba..823db3aef8 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -369,6 +369,8 @@ void json_lexer_flush(JSONLexer *lexer)
     if (lexer->state != lexer->start_state) {
         json_lexer_feed_char(lexer, 0, true);
     }
+    json_message_process_token(lexer, lexer->token, JSON_END_OF_INPUT,
+                               lexer->x, lexer->y);
 }
 
 void json_lexer_destroy(JSONLexer *lexer)
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index e372ecc895..674dfe6e85 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -60,6 +60,13 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
     case JSON_ERROR:
         error_setg(&err, "JSON parse error, stray '%s'", input->str);
         goto out_emit;
+    case JSON_END_OF_INPUT:
+        if (g_queue_is_empty(parser->tokens)) {
+            return;
+        }
+        json = json_parser_parse(parser->tokens, parser->ap, &err);
+        parser->tokens = NULL;
+        goto out_emit;
     default:
         break;
     }
@@ -137,6 +144,7 @@ void json_message_parser_feed(JSONMessageParser *parser,
 void json_message_parser_flush(JSONMessageParser *parser)
 {
     json_lexer_flush(&parser->lexer);
+    assert(g_queue_is_empty(parser->tokens));
 }
 
 void json_message_parser_destroy(JSONMessageParser *parser)
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 30b1b037d3..833d220654 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1305,7 +1305,7 @@ static void unterminated_array(void)
 {
     Error *err = NULL;
     QObject *obj = qobject_from_json("[32", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
@@ -1313,7 +1313,7 @@ static void unterminated_array_comma(void)
 {
     Error *err = NULL;
     QObject *obj = qobject_from_json("[32,", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
@@ -1329,7 +1329,7 @@ static void unterminated_dict(void)
 {
     Error *err = NULL;
     QObject *obj = qobject_from_json("{'abc':32", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
@@ -1337,7 +1337,7 @@ static void unterminated_dict_comma(void)
 {
     Error *err = NULL;
     QObject *obj = qobject_from_json("{'abc':32,", &err);
-    g_assert(!err);             /* BUG */
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (44 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:13   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank Markus Armbruster
                   ` (10 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-parser.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index c2974d46b3..208dffc96c 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -539,6 +539,7 @@ QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
     QObject *result;
 
     result = parse_value(&ctxt, ap);
+    assert(ctxt.err || g_queue_is_empty(ctxt.buf));
 
     error_propagate(errp, ctxt.err);
 
-- 
2.17.1

* [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (45 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:20   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly Markus Armbruster
                   ` (9 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The last case where qobject_from_json() & friends return null without
setting an error is empty or blank input.  Callers:

* block.c's parse_json_protocol() reports "Could not parse the JSON
  options".  It's marked as a work-around, because it also covered
  actual bugs, but they got fixed in the previous few commits.

* qobject_input_visitor_new_str() reports "JSON parse error".  Also
  marked as work-around.  The recent fixes have made this unreachable,
  because it currently gets called only for input starting with '{'.

* check-qjson.c's empty_input() and blank_input() demonstrate the
  behavior.

* The other callers are not affected since they only pass input with
  exactly one JSON value or, in the case of negative tests, one error.

Fail with "Expecting a JSON value" instead of returning null, and
simplify callers.
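A minimal sketch of the post-condition this establishes, assuming hypothetical Value/Err stand-ins and a finish_parse() helper in place of the tail of qobject_from_jsonv().  After the parser has been flushed, "neither value nor error" can only mean empty or blank input, so it becomes an error.  Afterwards exactly one of the two is set, assuming the callback already guarantees they are never both set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QObject and Error */
typedef struct Value { int v; } Value;
typedef struct Err { const char *msg; } Err;

/* Statically allocated here only to keep the sketch short */
static Err no_value_error = { "Expecting a JSON value" };

/*
 * Run after the parser has been flushed: empty or blank input produced
 * neither a value nor an error, so turn that into an error.  Assumes the
 * callback already guarantees result and err are never both set.
 */
static void finish_parse(Value **result, Err **err)
{
    if (!*result && !*err) {
        *err = &no_value_error;
    }
    assert(!*result != !*err);
}
```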

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 block.c                      |  5 -----
 qapi/qobject-input-visitor.c |  5 -----
 qobject/qjson.c              |  4 ++++
 tests/check-qjson.c          | 12 ++++++++++--
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 39f373e035..b837684e3c 100644
--- a/block.c
+++ b/block.c
@@ -1478,11 +1478,6 @@ static QDict *parse_json_filename(const char *filename, Error **errp)
 
     options_obj = qobject_from_json(filename, errp);
     if (!options_obj) {
-        /* Work around qobject_from_json() lossage TODO fix that */
-        if (errp && !*errp) {
-            error_setg(errp, "Could not parse the JSON options");
-            return NULL;
-        }
         error_prepend(errp, "Could not parse the JSON options: ");
         return NULL;
     }
diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
index da57f4cc24..3e88b27f9e 100644
--- a/qapi/qobject-input-visitor.c
+++ b/qapi/qobject-input-visitor.c
@@ -725,11 +725,6 @@ Visitor *qobject_input_visitor_new_str(const char *str,
     if (is_json) {
         obj = qobject_from_json(str, errp);
         if (!obj) {
-            /* Work around qobject_from_json() lossage TODO fix that */
-            if (errp && !*errp) {
-                error_setg(errp, "JSON parse error");
-                return NULL;
-            }
             return NULL;
         }
         args = qobject_to(QDict, obj);
diff --git a/qobject/qjson.c b/qobject/qjson.c
index 7f69036487..b9ccae2c2a 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -70,6 +70,10 @@ static QObject *qobject_from_jsonv(const char *string, va_list *ap,
     json_message_parser_flush(&state.parser);
     json_message_parser_destroy(&state.parser);
 
+    if (!state.result && !state.err) {
+        error_setg(&state.err, "Expecting a JSON value");
+    }
+
     error_propagate(errp, state.err);
     return state.result;
 }
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 833d220654..49490f678e 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1240,13 +1240,21 @@ static void simple_interpolation(void)
 
 static void empty_input(void)
 {
-    QObject *obj = qobject_from_json("", &error_abort);
+    Error *err = NULL;
+    QObject *obj;
+
+    obj = qobject_from_json("", &err);
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
 static void blank_input(void)
 {
-    QObject *obj = qobject_from_json("\n ", &error_abort);
+    Error *err = NULL;
+    QObject *obj;
+
+    obj = qobject_from_json("\n ", &err);
+    error_free_or_abort(&err);
     g_assert(obj == NULL);
 }
 
-- 
2.17.1

* [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (46 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:22   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token() Markus Armbruster
                   ` (8 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Token count and size limits exist to guard against excessive heap
usage.  We check them only after we created the token on the heap.
That's assigning a cowboy to the barn to lasso the horse after it has
bolted.  Close the barn door instead: check before we create the
token.
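A minimal sketch of "check before allocating", assuming a hypothetical Accumulator with deliberately tiny limits (SKETCH_MAX_TOKEN_SIZE, SKETCH_MAX_TOKEN_COUNT) rather than MAX_TOKEN_SIZE and MAX_TOKEN_COUNT in json-streamer.c.  The size check includes the bytes the new token would add (len + 1), so an oversized token is rejected without ever being copied to the heap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately tiny limits for illustration */
#define SKETCH_MAX_TOKEN_SIZE 16
#define SKETCH_MAX_TOKEN_COUNT 3

typedef struct Accumulator {
    char *tokens[SKETCH_MAX_TOKEN_COUNT];
    size_t count;
    size_t total_size;  /* bytes allocated for tokens so far */
} Accumulator;

/* Returns 0 on success, -1 if adding the token would exceed a limit */
static int add_token(Accumulator *acc, const char *str, size_t len)
{
    char *tok;

    /* Check first, so that an over-limit token is never allocated */
    if (acc->total_size + len + 1 > SKETCH_MAX_TOKEN_SIZE) {
        return -1;
    }
    if (acc->count + 1 > SKETCH_MAX_TOKEN_COUNT) {
        return -1;
    }
    tok = malloc(len + 1);
    if (!tok) {
        return -1;
    }
    memcpy(tok, str, len);
    tok[len] = '\0';
    acc->tokens[acc->count++] = tok;
    acc->total_size += len + 1;
    return 0;
}

static void free_tokens(Accumulator *acc)
{
    size_t i;

    for (i = 0; i < acc->count; i++) {
        free(acc->tokens[i]);
    }
    acc->count = 0;
    acc->total_size = 0;
}
```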

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-streamer.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 674dfe6e85..810aae521f 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -20,7 +20,7 @@
 
 #define MAX_TOKEN_SIZE (64ULL << 20)
 #define MAX_TOKEN_COUNT (2ULL << 20)
-#define MAX_NESTING (1ULL << 10)
+#define MAX_NESTING (1 << 10)
 
 static void json_message_free_token(void *token, void *opaque)
 {
@@ -71,6 +71,23 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
         break;
     }
 
+    /*
+     * Security consideration, we limit total memory allocated per object
+     * and the maximum recursion depth that a message can force.
+     */
+    if (parser->token_size + input->len + 1 > MAX_TOKEN_SIZE) {
+        error_setg(&err, "JSON token size limit exceeded");
+        goto out_emit;
+    }
+    if (g_queue_get_length(parser->tokens) + 1 > MAX_TOKEN_COUNT) {
+        error_setg(&err, "JSON token count limit exceeded");
+        goto out_emit;
+    }
+    if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
+        error_setg(&err, "JSON nesting depth limit exceeded");
+        goto out_emit;
+    }
+
     token = g_malloc(sizeof(JSONToken) + input->len + 1);
     token->type = type;
     memcpy(token->str, input->str, input->len);
@@ -91,23 +108,6 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
         goto out_emit;
     }
 
-    /*
-     * Security consideration, we limit total memory allocated per object
-     * and the maximum recursion depth that a message can force.
-     */
-    if (parser->token_size > MAX_TOKEN_SIZE) {
-        error_setg(&err, "JSON token size limit exceeded");
-        goto out_emit;
-    }
-    if (g_queue_get_length(parser->tokens) > MAX_TOKEN_COUNT) {
-        error_setg(&err, "JSON token count limit exceeded");
-        goto out_emit;
-    }
-    if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
-        error_setg(&err, "JSON nesting depth limit exceeded");
-        goto out_emit;
-    }
-
     return;
 
 out_emit:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 162+ messages in thread
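A minimal sketch of the check-before-allocate ordering from patch 48, written as standalone C rather than QEMU code. The DemoParser/DemoToken types, the demo_* functions and the tiny limits are hypothetical stand-ins for JSONMessageParser, MAX_TOKEN_SIZE and MAX_TOKEN_COUNT, and a plain linked list replaces GQueue so the example builds without GLib. The size test mirrors the patch's `token_size + len + 1 > limit` form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limits, much smaller than QEMU's so they are easy to hit. */
#define DEMO_MAX_TOKEN_SIZE  16
#define DEMO_MAX_TOKEN_COUNT 4

typedef struct DemoToken {
    struct DemoToken *next;
    char str[];                 /* NUL-terminated token text */
} DemoToken;

typedef struct DemoParser {
    DemoToken *head, **tail;    /* singly linked FIFO of tokens */
    size_t token_count;
    size_t token_size;          /* bytes of token text queued so far */
} DemoParser;

static void demo_parser_init(DemoParser *p)
{
    p->head = NULL;
    p->tail = &p->head;
    p->token_count = 0;
    p->token_size = 0;
}

/* Free all queued tokens and start over, as the streamer does on error. */
static void demo_parser_reset(DemoParser *p)
{
    while (p->head) {
        DemoToken *t = p->head;

        p->head = t->next;
        free(t);
    }
    demo_parser_init(p);
}

/*
 * Queue a token of LEN bytes.  Both limits are checked *before* malloc(),
 * so a token that would exceed them is never allocated.  Returns false
 * (after discarding the partial message) when a limit would be exceeded.
 */
static bool demo_push_token(DemoParser *p, const char *text, size_t len)
{
    DemoToken *t;

    if (p->token_size + len + 1 > DEMO_MAX_TOKEN_SIZE
        || p->token_count + 1 > DEMO_MAX_TOKEN_COUNT) {
        demo_parser_reset(p);
        return false;
    }

    t = malloc(sizeof(*t) + len + 1);
    if (!t) {
        abort();
    }
    memcpy(t->str, text, len);
    t->str[len] = '\0';
    t->next = NULL;
    *p->tail = t;
    p->tail = &t->next;
    p->token_count++;
    p->token_size += len;
    return true;
}
```

Because both checks run before malloc(), the token that would trip a limit never costs an allocation; the only heap held at that point is what was already within the limits.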

* [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token()
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (47 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:40   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser Markus Armbruster
                   ` (7 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-streamer.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 810aae521f..954bf9d468 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -99,16 +99,13 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
 
     g_queue_push_tail(parser->tokens, token);
 
-    if (parser->brace_count < 0 ||
-        parser->bracket_count < 0 ||
-        (parser->brace_count == 0 &&
-         parser->bracket_count == 0)) {
-        json = json_parser_parse(parser->tokens, parser->ap, &err);
-        parser->tokens = NULL;
-        goto out_emit;
+    if ((parser->brace_count > 0 || parser->bracket_count > 0)
+        && parser->bracket_count >= 0 && parser->bracket_count >= 0) {
+        return;
     }
 
-    return;
+    json = json_parser_parse(parser->tokens, parser->ap, &err);
+    parser->tokens = NULL;
 
 out_emit:
     parser->brace_count = 0;
-- 
2.17.1

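Patch 49 inverts the test so that json_message_process_token() returns early while a message is still open and otherwise falls through to json_parser_parse(). The predicate can be sketched standalone as follows; demo_message_incomplete() and demo_count_parses() are hypothetical names, and each character of the input is treated as one token for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Keep accumulating tokens only while at least one '{' or '[' is still
 * open and neither counter has gone negative.  Anything else (a complete
 * top-level value, or a stray '}' / ']' that drives a counter below zero)
 * is handed to the parser, which builds the value or reports the error.
 */
static bool demo_message_incomplete(int brace_count, int bracket_count)
{
    return (brace_count > 0 || bracket_count > 0)
        && brace_count >= 0 && bracket_count >= 0;
}

/* Count how often the streamer would call the parser for input S. */
static int demo_count_parses(const char *s)
{
    int brace = 0, bracket = 0, parses = 0;

    for (; *s; s++) {
        switch (*s) {
        case '{': brace++; break;
        case '}': brace--; break;
        case '[': bracket++; break;
        case ']': bracket--; break;
        default: break;         /* scalars do not change nesting */
        }
        if (demo_message_incomplete(brace, bracket)) {
            continue;
        }
        parses++;
        brace = bracket = 0;    /* counters are reset after every emit */
    }
    return parses;
}
```

Both counters need the negativity test: input such as "[}" leaves bracket_count at 1 but drives brace_count to -1, and that message should go to the parser for an error report rather than stay buffered.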

* [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (48 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token() Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:42   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN Markus Armbruster
                   ` (6 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-streamer.h |  2 +-
 qobject/json-parser.c            |  1 -
 qobject/json-streamer.c          | 30 +++++++++++-------------------
 3 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h
index e162fd01da..d1d7fe2595 100644
--- a/include/qapi/qmp/json-streamer.h
+++ b/include/qapi/qmp/json-streamer.h
@@ -31,7 +31,7 @@ typedef struct JSONMessageParser
     JSONLexer lexer;
     int brace_count;
     int bracket_count;
-    GQueue *tokens;
+    GQueue tokens;
     uint64_t token_size;
 } JSONMessageParser;
 
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 208dffc96c..3623567160 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -547,7 +547,6 @@ QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
         parser_context_pop_token(&ctxt);
     }
     g_free(ctxt.current);
-    g_queue_free(ctxt.buf);
 
     return result;
 }
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 954bf9d468..9210281a65 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -22,17 +22,12 @@
 #define MAX_TOKEN_COUNT (2ULL << 20)
 #define MAX_NESTING (1 << 10)
 
-static void json_message_free_token(void *token, void *opaque)
-{
-    g_free(token);
-}
-
 static void json_message_free_tokens(JSONMessageParser *parser)
 {
-    if (parser->tokens) {
-        g_queue_foreach(parser->tokens, json_message_free_token, NULL);
-        g_queue_free(parser->tokens);
-        parser->tokens = NULL;
+    JSONToken *token;
+
+    while ((token = g_queue_pop_head(&parser->tokens))) {
+        g_free(token);
     }
 }
 
@@ -61,11 +56,10 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
         error_setg(&err, "JSON parse error, stray '%s'", input->str);
         goto out_emit;
     case JSON_END_OF_INPUT:
-        if (g_queue_is_empty(parser->tokens)) {
+        if (g_queue_is_empty(&parser->tokens)) {
             return;
         }
-        json = json_parser_parse(parser->tokens, parser->ap, &err);
-        parser->tokens = NULL;
+        json = json_parser_parse(&parser->tokens, parser->ap, &err);
         goto out_emit;
     default:
         break;
@@ -79,7 +73,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
         error_setg(&err, "JSON token size limit exceeded");
         goto out_emit;
     }
-    if (g_queue_get_length(parser->tokens) + 1 > MAX_TOKEN_COUNT) {
+    if (g_queue_get_length(&parser->tokens) + 1 > MAX_TOKEN_COUNT) {
         error_setg(&err, "JSON token count limit exceeded");
         goto out_emit;
     }
@@ -97,21 +91,19 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
 
     parser->token_size += input->len;
 
-    g_queue_push_tail(parser->tokens, token);
+    g_queue_push_tail(&parser->tokens, token);
 
     if ((parser->brace_count > 0 || parser->bracket_count > 0)
         && parser->bracket_count >= 0 && parser->bracket_count >= 0) {
         return;
     }
 
-    json = json_parser_parse(parser->tokens, parser->ap, &err);
-    parser->tokens = NULL;
+    json = json_parser_parse(&parser->tokens, parser->ap, &err);
 
 out_emit:
     parser->brace_count = 0;
     parser->bracket_count = 0;
     json_message_free_tokens(parser);
-    parser->tokens = g_queue_new();
     parser->token_size = 0;
     parser->emit(parser->opaque, json, err);
 }
@@ -126,7 +118,7 @@ void json_message_parser_init(JSONMessageParser *parser,
     parser->ap = ap;
     parser->brace_count = 0;
     parser->bracket_count = 0;
-    parser->tokens = g_queue_new();
+    g_queue_init(&parser->tokens);
     parser->token_size = 0;
 
     json_lexer_init(&parser->lexer, !!ap);
@@ -141,7 +133,7 @@ void json_message_parser_feed(JSONMessageParser *parser,
 void json_message_parser_flush(JSONMessageParser *parser)
 {
     json_lexer_flush(&parser->lexer);
-    assert(g_queue_is_empty(parser->tokens));
+    assert(g_queue_is_empty(&parser->tokens));
 }
 
 void json_message_parser_destroy(JSONMessageParser *parser)
-- 
2.17.1

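Patch 50 embeds the GQueue in JSONMessageParser instead of pointing to a heap-allocated one, so the queue is initialized once with g_queue_init() and drained with g_queue_pop_head() instead of being freed and recreated after every message. The sketch below mirrors that pattern with a hand-rolled queue so it builds without GLib; DemoQueue and the demo_* functions are assumptions, not GLib API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for GQueue: a singly linked FIFO. */
typedef struct DemoNode {
    struct DemoNode *next;
    void *data;
} DemoNode;

typedef struct DemoQueue {
    DemoNode *head, *tail;
    unsigned length;
} DemoQueue;

static void demo_queue_init(DemoQueue *q)
{
    q->head = q->tail = NULL;
    q->length = 0;
}

static void demo_queue_push_tail(DemoQueue *q, void *data)
{
    DemoNode *n = malloc(sizeof(*n));

    if (!n) {
        abort();
    }
    n->next = NULL;
    n->data = data;
    if (q->tail) {
        q->tail->next = n;
    } else {
        q->head = n;
    }
    q->tail = n;
    q->length++;
}

/* Like g_queue_pop_head(): returns NULL when the queue is empty. */
static void *demo_queue_pop_head(DemoQueue *q)
{
    DemoNode *n = q->head;
    void *data;

    if (!n) {
        return NULL;
    }
    q->head = n->next;
    if (!q->head) {
        q->tail = NULL;
    }
    data = n->data;
    free(n);
    q->length--;
    return data;
}

/* The queue lives inside the parser by value, not behind a pointer. */
typedef struct DemoParser {
    DemoQueue tokens;
} DemoParser;

/*
 * Drain and free all queued tokens.  Relies on tokens never being NULL.
 * The embedded queue stays initialized and reusable afterwards.
 */
static void demo_free_tokens(DemoParser *p)
{
    void *token;

    while ((token = demo_queue_pop_head(&p->tokens))) {
        free(token);
    }
}
```

Embedding the queue removes one allocation per message and the NULL-pointer state that the old code had to manage between json_parser_parse() and the next g_queue_new().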

* [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (49 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:45   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
                   ` (5 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-lexer.h | 10 ++++------
 qobject/json-lexer.c          | 18 ++++++++----------
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/include/qapi/qmp/json-lexer.h b/include/qapi/qmp/json-lexer.h
index 8058695e40..f3524de07a 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/include/qapi/qmp/json-lexer.h
@@ -14,10 +14,9 @@
 #ifndef QEMU_JSON_LEXER_H
 #define QEMU_JSON_LEXER_H
 
-
-typedef enum json_token_type {
-    JSON_MIN = 100,
-    JSON_LCURLY = JSON_MIN,
+typedef enum {
+    JSON_ERROR = 0,             /* must be zero */
+    JSON_LCURLY,
     JSON_RCURLY,
     JSON_LSQUARE,
     JSON_RSQUARE,
@@ -29,8 +28,7 @@ typedef enum json_token_type {
     JSON_STRING,
     JSON_INTERPOL,
     JSON_SKIP,
-    JSON_ERROR,
-    JSON_END_OF_INPUT
+    JSON_END_OF_INPUT           /* must be last */
 } JSONTokenType;
 
 typedef struct JSONLexer {
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 823db3aef8..0332f9dbe1 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -101,8 +101,9 @@
  * - Decoding and validating is left to the parser.
  */
 
-enum json_lexer_state {
-    IN_ERROR = 0,               /* must really be 0, see json_lexer[] */
+enum {
+    IN_START = JSON_END_OF_INPUT + 1,
+    IN_START_INTERPOL,
     IN_DQ_STRING_ESCAPE,
     IN_DQ_STRING,
     IN_SQ_STRING_ESCAPE,
@@ -119,11 +120,9 @@ enum json_lexer_state {
     IN_KEYWORD,
     IN_INTERPOL,
     IN_WHITESPACE,
-    IN_START_INTERPOL,
-    IN_START,
 };
 
-QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START);
+QEMU_BUILD_BUG_ON(JSON_ERROR != 0); /* json_lexer[] relies on this */
 
 #define TERMINAL(state) [0 ... 0x7F] = (state)
 
@@ -131,10 +130,10 @@ QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START);
    from OLD_STATE required lookahead.  This happens whenever the table
    below uses the TERMINAL macro.  */
 #define TERMINAL_NEEDED_LOOKAHEAD(old_state, terminal) \
-    (terminal != IN_ERROR && json_lexer[(old_state)][0] == (terminal))
+    (terminal != JSON_ERROR && json_lexer[(old_state)][0] == (terminal))
 
 static const uint8_t json_lexer[][256] =  {
-    /* Relies on default initialization to IN_ERROR! */
+    /* Relies on default initialization to JSON_ERROR */
 
     /* double quote string */
     [IN_DQ_STRING_ESCAPE] = {
@@ -318,7 +317,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             g_string_truncate(lexer->token, 0);
             new_state = lexer->start_state;
             break;
-        case IN_ERROR:
+        case JSON_ERROR:
             /* XXX: To avoid having previous bad input leaving the parser in an
              * unresponsive state where we consume unpredictable amounts of
              * subsequent "good" input, percolate this error state up to the
@@ -335,8 +334,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             json_message_process_token(lexer, lexer->token, JSON_ERROR,
                                        lexer->x, lexer->y);
             g_string_truncate(lexer->token, 0);
-            new_state = lexer->start_state;
-            lexer->state = new_state;
+            lexer->state = lexer->start_state;
             return;
         default:
             break;
-- 
2.17.1

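After patch 51, token types and lexer states share one number space: JSON_ERROR is 0, the other token types follow, and lexer states start at JSON_END_OF_INPUT + 1. A zero-filled transition table therefore maps every unlisted (state, character) pair to the error token, and any table value not greater than JSON_END_OF_INPUT is a terminal. A miniature of that scheme follows; the DEMO_* names are hypothetical, and the static (hence zero-initialized) table is filled at run time instead of with GCC's range designators.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_ERROR = 0,             /* must be zero: the default table entry */
    DEMO_LCURLY,
    DEMO_RCURLY,
    DEMO_INTEGER,
    DEMO_END_OF_INPUT           /* must be the last token type */
} DemoTokenType;

/* Lexer states are numbered after the last token type. */
enum {
    DEMO_IN_START = DEMO_END_OF_INPUT + 1,
    DEMO_IN_DIGITS,
};

_Static_assert(DEMO_ERROR == 0, "zero-filled table entries must mean error");

/* Static storage is zero-filled; rows 0 .. DEMO_END_OF_INPUT go unused. */
static uint8_t demo_table[DEMO_IN_DIGITS + 1][256];

static void demo_table_init(void)
{
    int c;

    demo_table[DEMO_IN_START]['{'] = DEMO_LCURLY;
    demo_table[DEMO_IN_START]['}'] = DEMO_RCURLY;
    for (c = '0'; c <= '9'; c++) {
        demo_table[DEMO_IN_START][c] = DEMO_IN_DIGITS;
    }
    /* In DEMO_IN_DIGITS, any non-digit terminates the integer token. */
    for (c = 0; c < 256; c++) {
        demo_table[DEMO_IN_DIGITS][c] = DEMO_INTEGER;
    }
    for (c = '0'; c <= '9'; c++) {
        demo_table[DEMO_IN_DIGITS][c] = DEMO_IN_DIGITS;
    }
}

static int demo_next(int state, unsigned char ch)
{
    assert(state >= DEMO_IN_START && state <= DEMO_IN_DIGITS);
    return demo_table[state][ch];
}

/* A table value is a finished token (or the error) iff it is not a state. */
static int demo_is_terminal(int value)
{
    return value <= DEMO_END_OF_INPUT;
}
```

The unused low rows are the cost of sharing the number space; they are the "unused elements" whose removal shrinks json_lexer[] in the following patch.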

* [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (50 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:51   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c Markus Armbruster
                   ` (4 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Bonus: static json_lexer[] loses its unused elements.  It shrinks from
8KiB to 4.75KiB for me.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-lexer.h |  1 -
 qobject/json-lexer.c          | 30 +++++++++---------------------
 2 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/include/qapi/qmp/json-lexer.h b/include/qapi/qmp/json-lexer.h
index f3524de07a..1a2dbbb717 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/include/qapi/qmp/json-lexer.h
@@ -27,7 +27,6 @@ typedef enum {
     JSON_KEYWORD,
     JSON_STRING,
     JSON_INTERPOL,
-    JSON_SKIP,
     JSON_END_OF_INPUT           /* must be last */
 } JSONTokenType;
 
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 0332f9dbe1..26ba45956d 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -119,7 +119,6 @@ enum {
     IN_SIGN,
     IN_KEYWORD,
     IN_INTERPOL,
-    IN_WHITESPACE,
 };
 
 QEMU_BUILD_BUG_ON(JSON_ERROR != 0); /* json_lexer[] relies on this */
@@ -213,15 +212,6 @@ static const uint8_t json_lexer[][256] =  {
         ['a' ... 'z'] = IN_KEYWORD,
     },
 
-    /* whitespace */
-    [IN_WHITESPACE] = {
-        TERMINAL(JSON_SKIP),
-        [' '] = IN_WHITESPACE,
-        ['\t'] = IN_WHITESPACE,
-        ['\r'] = IN_WHITESPACE,
-        ['\n'] = IN_WHITESPACE,
-    },
-
     /* top level rule */
     [IN_START] = {
         ['"'] = IN_DQ_STRING,
@@ -236,10 +226,10 @@ static const uint8_t json_lexer[][256] =  {
         [','] = JSON_COMMA,
         [':'] = JSON_COLON,
         ['a' ... 'z'] = IN_KEYWORD,
-        [' '] = IN_WHITESPACE,
-        ['\t'] = IN_WHITESPACE,
-        ['\r'] = IN_WHITESPACE,
-        ['\n'] = IN_WHITESPACE,
+        [' '] = IN_START,
+        ['\t'] = IN_START,
+        ['\r'] = IN_START,
+        ['\n'] = IN_START,
     },
 
     /* interpolation */
@@ -263,11 +253,11 @@ static const uint8_t json_lexer[][256] =  {
         [','] = JSON_COMMA,
         [':'] = JSON_COLON,
         ['a' ... 'z'] = IN_KEYWORD,
-        [' '] = IN_WHITESPACE,
-        ['\t'] = IN_WHITESPACE,
-        ['\r'] = IN_WHITESPACE,
-        ['\n'] = IN_WHITESPACE,
         /* matches IN_START up to here */
+        [' '] = IN_START_INTERPOL,
+        ['\t'] = IN_START_INTERPOL,
+        ['\r'] = IN_START_INTERPOL,
+        ['\n'] = IN_START_INTERPOL,
         ['%'] = IN_INTERPOL,
     },
 };
@@ -294,7 +284,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         assert(lexer->state <= ARRAY_SIZE(json_lexer));
         new_state = json_lexer[lexer->state][(uint8_t)ch];
         char_consumed = !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
-        if (char_consumed && !flush) {
+        if (char_consumed && new_state != lexer->start_state && !flush) {
             g_string_append_c(lexer->token, ch);
         }
 
@@ -312,8 +302,6 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         case JSON_STRING:
             json_message_process_token(lexer, lexer->token, new_state,
                                        lexer->x, lexer->y);
-            /* fall through */
-        case JSON_SKIP:
             g_string_truncate(lexer->token, 0);
             new_state = lexer->start_state;
             break;
-- 
2.17.1

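Patch 52 drops IN_WHITESPACE and JSON_SKIP: whitespace in the start state now transitions straight back to the start state, and json_lexer_feed_char() does not append a character whose new state is the start state. The sketch below is a hand-written lexer showing the same idea, not QEMU's table-driven one; demo_lex(), demo_emit() and the tiny token set (brackets, commas, unsigned integers) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_ST_START, DEMO_ST_DIGITS };

/* Append token TOK of length LEN, followed by a '|' separator, to OUT. */
static void demo_emit(char *out, size_t outsz, const char *tok, size_t len)
{
    size_t used = strlen(out);

    assert(used + len + 2 <= outsz);
    memcpy(out + used, tok, len);
    out[used + len] = '|';
    out[used + len + 1] = '\0';
}

/* Lex IN into OUT; return the token count, or -1 on a lexical error. */
static int demo_lex(const char *in, char *out, size_t outsz)
{
    char tok[32];
    size_t toklen = 0;
    int state = DEMO_ST_START, ntok = 0;

    out[0] = '\0';
    for (const char *p = in; ; ) {
        unsigned char ch = (unsigned char)*p;

        if (state == DEMO_ST_DIGITS) {
            if (isdigit(ch)) {
                assert(toklen < sizeof(tok));
                tok[toklen++] = (char)ch;
                p++;
                continue;
            }
            /* CH ends the integer (lookahead); re-examine it in START. */
            demo_emit(out, outsz, tok, toklen);
            ntok++;
            toklen = 0;
            state = DEMO_ST_START;
            continue;
        }

        /* state == DEMO_ST_START */
        if (ch == '\0') {
            return ntok;
        }
        p++;
        if (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n') {
            continue;           /* START -> START: consumed, not appended */
        }
        if (isdigit(ch)) {
            tok[toklen++] = (char)ch;
            state = DEMO_ST_DIGITS;
        } else if (ch == '[' || ch == ']' || ch == ',') {
            char c = (char)ch;

            demo_emit(out, outsz, &c, 1);
            ntok++;
        } else {
            return -1;
        }
    }
}
```

Since whitespace never enters the token buffer, there is nothing to discard afterwards, which is why the separate skip pseudo-token becomes unnecessary.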

* [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (51 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:54   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h Markus Armbruster
                   ` (3 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-parser.h   |  4 ++++
 include/qapi/qmp/json-streamer.h |  7 -------
 qobject/json-parser.c            | 19 +++++++++++++++++++
 qobject/json-streamer.c          |  8 +-------
 4 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/include/qapi/qmp/json-parser.h b/include/qapi/qmp/json-parser.h
index a34209db7a..21b23d7bec 100644
--- a/include/qapi/qmp/json-parser.h
+++ b/include/qapi/qmp/json-parser.h
@@ -15,7 +15,11 @@
 #define QEMU_JSON_PARSER_H
 
 #include "qemu-common.h"
+#include "qapi/qmp/json-lexer.h"
 
+typedef struct JSONToken JSONToken;
+
+JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
 QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
 
 #endif
diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h
index d1d7fe2595..29950ac37c 100644
--- a/include/qapi/qmp/json-streamer.h
+++ b/include/qapi/qmp/json-streamer.h
@@ -16,13 +16,6 @@
 
 #include "qapi/qmp/json-lexer.h"
 
-typedef struct JSONToken {
-    int type;
-    int x;
-    int y;
-    char str[];
-} JSONToken;
-
 typedef struct JSONMessageParser
 {
     void (*emit)(void *opaque, QObject *json, Error *err);
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 3623567160..d8f9df2fd3 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -26,6 +26,13 @@
 #include "qapi/qmp/json-lexer.h"
 #include "qapi/qmp/json-streamer.h"
 
+struct JSONToken {
+    JSONTokenType type;
+    int x;
+    int y;
+    char str[];
+};
+
 typedef struct JSONParserContext
 {
     Error *err;
@@ -533,6 +540,18 @@ static QObject *parse_value(JSONParserContext *ctxt, va_list *ap)
     }
 }
 
+JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
+{
+    JSONToken *token = g_malloc(sizeof(JSONToken) + tokstr->len + 1);
+
+    token->type = type;
+    memcpy(token->str, tokstr->str, tokstr->len);
+    token->str[tokstr->len] = 0;
+    token->x = x;
+    token->y = y;
+    return token;
+}
+
 QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
 {
     JSONParserContext ctxt = { .buf = tokens };
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 9210281a65..467bc29413 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -82,13 +82,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
         goto out_emit;
     }
 
-    token = g_malloc(sizeof(JSONToken) + input->len + 1);
-    token->type = type;
-    memcpy(token->str, input->str, input->len);
-    token->str[input->len] = 0;
-    token->x = x;
-    token->y = y;
-
+    token = json_token(type, x, y, input);
     parser->token_size += input->len;
 
     g_queue_push_tail(&parser->tokens, token);
-- 
2.17.1

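Patch 53 hides struct JSONToken behind a forward declaration and moves its construction into json_token(). The pattern can be sketched in one file as follows; the "header" part exposes only an incomplete type and functions, while the layout with its flexible array member stays in the implementation part. DemoToken and the demo_token_* accessors are hypothetical (the patch itself adds no accessors, since json-parser.c is the only code that reads token fields).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- what a public header would expose: an incomplete type ---- */
typedef struct DemoToken DemoToken;
DemoToken *demo_token_new(int type, int x, int y, const char *str, size_t len);
int demo_token_type(const DemoToken *t);
const char *demo_token_str(const DemoToken *t);

/* ---- private to the implementation file ---- */
struct DemoToken {
    int type;
    int x, y;                   /* position of the token in the input */
    char str[];                 /* flexible array member, NUL-terminated */
};

DemoToken *demo_token_new(int type, int x, int y, const char *str, size_t len)
{
    /* One allocation holds both the header fields and the text. */
    DemoToken *t = malloc(sizeof(*t) + len + 1);

    if (!t) {
        abort();
    }
    t->type = type;
    t->x = x;
    t->y = y;
    memcpy(t->str, str, len);
    t->str[len] = '\0';
    return t;
}

int demo_token_type(const DemoToken *t)
{
    return t->type;
}

const char *demo_token_str(const DemoToken *t)
{
    return t->str;
}
```

Code that sees only the forward declaration can still hold, queue and free DemoToken pointers; only the file defining the struct can allocate one or read its fields.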

* [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (52 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 13:54   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 55/56] json: Clean up headers Markus Armbruster
                   ` (2 subsequent siblings)
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-parser.h | 1 -
 qobject/json-lexer.c           | 1 -
 qobject/json-streamer.c        | 1 -
 qobject/qbool.c                | 1 -
 qobject/qlist.c                | 1 -
 qobject/qnull.c                | 1 -
 qobject/qnum.c                 | 1 -
 qobject/qobject.c              | 1 -
 qobject/qstring.c              | 1 -
 9 files changed, 9 deletions(-)

diff --git a/include/qapi/qmp/json-parser.h b/include/qapi/qmp/json-parser.h
index 21b23d7bec..55f75954c3 100644
--- a/include/qapi/qmp/json-parser.h
+++ b/include/qapi/qmp/json-parser.h
@@ -14,7 +14,6 @@
 #ifndef QEMU_JSON_PARSER_H
 #define QEMU_JSON_PARSER_H
 
-#include "qemu-common.h"
 #include "qapi/qmp/json-lexer.h"
 
 typedef struct JSONToken JSONToken;
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 26ba45956d..dc21eb52cf 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -12,7 +12,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qapi/qmp/json-lexer.h"
 #include "qapi/qmp/json-streamer.h"
 
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 467bc29413..da53e770e9 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -12,7 +12,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qapi/error.h"
 #include "qapi/qmp/json-lexer.h"
 #include "qapi/qmp/json-parser.h"
diff --git a/qobject/qbool.c b/qobject/qbool.c
index b58249925c..06dfc43498 100644
--- a/qobject/qbool.c
+++ b/qobject/qbool.c
@@ -13,7 +13,6 @@
 
 #include "qemu/osdep.h"
 #include "qapi/qmp/qbool.h"
-#include "qemu-common.h"
 
 /**
  * qbool_from_bool(): Create a new QBool from a bool
diff --git a/qobject/qlist.c b/qobject/qlist.c
index 37c1c167f1..b3274af88b 100644
--- a/qobject/qlist.c
+++ b/qobject/qlist.c
@@ -17,7 +17,6 @@
 #include "qapi/qmp/qnum.h"
 #include "qapi/qmp/qstring.h"
 #include "qemu/queue.h"
-#include "qemu-common.h"
 
 /**
  * qlist_new(): Create a new QList
diff --git a/qobject/qnull.c b/qobject/qnull.c
index f6f55f11ea..00870a1824 100644
--- a/qobject/qnull.c
+++ b/qobject/qnull.c
@@ -11,7 +11,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qapi/qmp/qnull.h"
 
 QNull qnull_ = {
diff --git a/qobject/qnum.c b/qobject/qnum.c
index 1501c82832..7012fc57f2 100644
--- a/qobject/qnum.c
+++ b/qobject/qnum.c
@@ -14,7 +14,6 @@
 
 #include "qemu/osdep.h"
 #include "qapi/qmp/qnum.h"
-#include "qemu-common.h"
 
 /**
  * qnum_from_int(): Create a new QNum from an int64_t
diff --git a/qobject/qobject.c b/qobject/qobject.c
index cf4b7e229e..878dd76e79 100644
--- a/qobject/qobject.c
+++ b/qobject/qobject.c
@@ -8,7 +8,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu-common.h"
 #include "qapi/qmp/qbool.h"
 #include "qapi/qmp/qnull.h"
 #include "qapi/qmp/qnum.h"
diff --git a/qobject/qstring.c b/qobject/qstring.c
index 0f1510e792..1c6897df00 100644
--- a/qobject/qstring.c
+++ b/qobject/qstring.c
@@ -12,7 +12,6 @@
 
 #include "qemu/osdep.h"
 #include "qapi/qmp/qstring.h"
-#include "qemu-common.h"
 
 /**
  * qstring_new(): Create a new empty QString
-- 
2.17.1


* [Qemu-devel] [PATCH 55/56] json: Clean up headers
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (53 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-16 17:50   ` Eric Blake
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state Markus Armbruster
  2018-08-08 14:03 ` [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

The JSON parser has three public headers: json-lexer.h, json-parser.h,
and json-streamer.h.  All three contain declarations that are of no
interest outside qobject/json-*.c.

Collect the public interface in include/qapi/qmp/json-parser.h, and
everything else in qobject/json-parser-int.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 include/qapi/qmp/json-parser.h                | 36 ++++++++++++---
 include/qapi/qmp/json-streamer.h              | 46 -------------------
 monitor.c                                     |  2 +-
 qga/main.c                                    |  2 +-
 qobject/json-lexer.c                          |  3 +-
 .../json-lexer.h => qobject/json-parser-int.h | 26 ++++++-----
 qobject/json-parser.c                         |  4 +-
 qobject/json-streamer.c                       |  4 +-
 qobject/qjson.c                               |  2 +-
 tests/libqtest.c                              |  2 +-
 10 files changed, 51 insertions(+), 76 deletions(-)
 delete mode 100644 include/qapi/qmp/json-streamer.h
 rename include/qapi/qmp/json-lexer.h => qobject/json-parser-int.h (62%)

diff --git a/include/qapi/qmp/json-parser.h b/include/qapi/qmp/json-parser.h
index 55f75954c3..7345a9bd5c 100644
--- a/include/qapi/qmp/json-parser.h
+++ b/include/qapi/qmp/json-parser.h
@@ -1,5 +1,5 @@
 /*
- * JSON Parser 
+ * JSON Parser
  *
  * Copyright IBM, Corp. 2009
  *
@@ -11,14 +11,36 @@
  *
  */
 
-#ifndef QEMU_JSON_PARSER_H
-#define QEMU_JSON_PARSER_H
+#ifndef QAPI_QMP_JSON_PARSER_H
+#define QAPI_QMP_JSON_PARSER_H
 
-#include "qapi/qmp/json-lexer.h"
+typedef struct JSONLexer {
+    int start_state, state;
+    GString *token;
+    int x, y;
+} JSONLexer;
 
-typedef struct JSONToken JSONToken;
+typedef struct JSONMessageParser {
+    void (*emit)(void *opaque, QObject *json, Error *err);
+    void *opaque;
+    va_list *ap;
+    JSONLexer lexer;
+    int brace_count;
+    int bracket_count;
+    GQueue tokens;
+    uint64_t token_size;
+} JSONMessageParser;
 
-JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
-QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
+void json_message_parser_init(JSONMessageParser *parser,
+                              void (*emit)(void *opaque, QObject *json,
+                                           Error *err),
+                              void *opaque, va_list *ap);
+
+void json_message_parser_feed(JSONMessageParser *parser,
+                             const char *buffer, size_t size);
+
+void json_message_parser_flush(JSONMessageParser *parser);
+
+void json_message_parser_destroy(JSONMessageParser *parser);
 
 #endif
diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h
deleted file mode 100644
index 29950ac37c..0000000000
--- a/include/qapi/qmp/json-streamer.h
+++ /dev/null
@@ -1,46 +0,0 @@
-/*
- * JSON streaming support
- *
- * Copyright IBM, Corp. 2009
- *
- * Authors:
- *  Anthony Liguori   <aliguori@us.ibm.com>
- *
- * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
- * See the COPYING.LIB file in the top-level directory.
- *
- */
-
-#ifndef QEMU_JSON_STREAMER_H
-#define QEMU_JSON_STREAMER_H
-
-#include "qapi/qmp/json-lexer.h"
-
-typedef struct JSONMessageParser
-{
-    void (*emit)(void *opaque, QObject *json, Error *err);
-    void *opaque;
-    va_list *ap;
-    JSONLexer lexer;
-    int brace_count;
-    int bracket_count;
-    GQueue tokens;
-    uint64_t token_size;
-} JSONMessageParser;
-
-void json_message_process_token(JSONLexer *lexer, GString *input,
-                                JSONTokenType type, int x, int y);
-
-void json_message_parser_init(JSONMessageParser *parser,
-                              void (*emit)(void *opaque, QObject *json,
-                                           Error *err),
-                              void *opaque, va_list *ap);
-
-void json_message_parser_feed(JSONMessageParser *parser,
-                             const char *buffer, size_t size);
-
-void json_message_parser_flush(JSONMessageParser *parser);
-
-void json_message_parser_destroy(JSONMessageParser *parser);
-
-#endif
diff --git a/monitor.c b/monitor.c
index dc0ed8df92..ff8960f857 100644
--- a/monitor.c
+++ b/monitor.c
@@ -58,7 +58,7 @@
 #include "qapi/qmp/qnum.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qjson.h"
-#include "qapi/qmp/json-streamer.h"
+#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qlist.h"
 #include "qom/object_interfaces.h"
 #include "trace-root.h"
diff --git a/qga/main.c b/qga/main.c
index b74e1241ef..6d70242d05 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -18,7 +18,7 @@
 #include <syslog.h>
 #include <sys/wait.h>
 #endif
-#include "qapi/qmp/json-streamer.h"
+#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qstring.h"
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index dc21eb52cf..4d2b840929 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -12,8 +12,7 @@
  */
 
 #include "qemu/osdep.h"
-#include "qapi/qmp/json-lexer.h"
-#include "qapi/qmp/json-streamer.h"
+#include "json-parser-int.h"
 
 #define MAX_TOKEN_SIZE (64ULL << 20)
 
diff --git a/include/qapi/qmp/json-lexer.h b/qobject/json-parser-int.h
similarity index 62%
rename from include/qapi/qmp/json-lexer.h
rename to qobject/json-parser-int.h
index 1a2dbbb717..442d17996a 100644
--- a/include/qapi/qmp/json-lexer.h
+++ b/qobject/json-parser-int.h
@@ -1,5 +1,5 @@
 /*
- * JSON lexer
+ * JSON Parser
  *
  * Copyright IBM, Corp. 2009
  *
@@ -11,8 +11,10 @@
  *
  */
 
-#ifndef QEMU_JSON_LEXER_H
-#define QEMU_JSON_LEXER_H
+#ifndef JSON_PARSER_INT_H
+#define JSON_PARSER_INT_H
+
+#include "qapi/qmp/json-parser.h"
 
 typedef enum {
     JSON_ERROR = 0,             /* must be zero */
@@ -30,18 +32,20 @@ typedef enum {
     JSON_END_OF_INPUT           /* must be last */
 } JSONTokenType;
 
-typedef struct JSONLexer {
-    int start_state, state;
-    GString *token;
-    int x, y;
-} JSONLexer;
+typedef struct JSONToken JSONToken;
 
+/* json-lexer.c */
 void json_lexer_init(JSONLexer *lexer, bool enable_interpolation);
-
 void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size);
-
 void json_lexer_flush(JSONLexer *lexer);
-
 void json_lexer_destroy(JSONLexer *lexer);
 
+/* json-streamer.c */
+void json_message_process_token(JSONLexer *lexer, GString *input,
+                                JSONTokenType type, int x, int y);
+
+/* json-parser.c */
+JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
+QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
+
 #endif
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index d8f9df2fd3..179b8a45d7 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -22,9 +22,7 @@
 #include "qapi/qmp/qnull.h"
 #include "qapi/qmp/qnum.h"
 #include "qapi/qmp/qstring.h"
-#include "qapi/qmp/json-parser.h"
-#include "qapi/qmp/json-lexer.h"
-#include "qapi/qmp/json-streamer.h"
+#include "json-parser-int.h"
 
 struct JSONToken {
     JSONTokenType type;
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index da53e770e9..47dd7ea576 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -13,9 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qapi/qmp/json-lexer.h"
-#include "qapi/qmp/json-parser.h"
-#include "qapi/qmp/json-streamer.h"
+#include "json-parser-int.h"
 
 #define MAX_TOKEN_SIZE (64ULL << 20)
 #define MAX_TOKEN_COUNT (2ULL << 20)
diff --git a/qobject/qjson.c b/qobject/qjson.c
index b9ccae2c2a..db36101f3b 100644
--- a/qobject/qjson.c
+++ b/qobject/qjson.c
@@ -13,7 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qapi/qmp/json-streamer.h"
+#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qbool.h"
 #include "qapi/qmp/qdict.h"
diff --git a/tests/libqtest.c b/tests/libqtest.c
index 7ef8dd621f..19a39091a1 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -23,7 +23,7 @@
 #include "libqtest.h"
 #include "qemu/cutils.h"
 #include "qapi/error.h"
-#include "qapi/qmp/json-streamer.h"
+#include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qlist.h"
-- 
2.17.1

* [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (54 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 55/56] json: Clean up headers Markus Armbruster
@ 2018-08-08 12:03 ` Markus Armbruster
  2018-08-10 14:30   ` Eric Blake
  2018-08-08 14:03 ` [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
  56 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 12:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Section "QGA Synchronization" specifies that sending "a raw 0xFF
sentinel byte" makes the server "reset its state and discard all
pending data prior to the sentinel."  What actually happens there is a
lexical error, which will produce one or more error responses.
Moreover, it's not specific to QGA.

Create new section "Forcing the JSON parser into known-good state" to
document the technique properly.  Rewrite section "QGA
Synchronization" to document just the other direction, i.e. command
guest-sync-delimited.

Section "Protocol Specification" mentions "synchronization bytes
(documented below)".  Delete that.

While there, fix it not to claim '"Server" is QEMU itself', but
'"Server" is either QEMU or the QEMU Guest Agent'.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/interop/qmp-spec.txt | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/docs/interop/qmp-spec.txt b/docs/interop/qmp-spec.txt
index 1566b8ae5e..d4a42fe2cc 100644
--- a/docs/interop/qmp-spec.txt
+++ b/docs/interop/qmp-spec.txt
@@ -20,9 +20,9 @@ operating system.
 2. Protocol Specification
 =========================
 
-This section details the protocol format. For the purpose of this document
-"Client" is any application which is using QMP to communicate with QEMU and
-"Server" is QEMU itself.
+This section details the protocol format. For the purpose of this
+document, "Server" is either QEMU or the QEMU Guest Agent, and
+"Client" is any application communicating with it via QMP.
 
 JSON data structures, when mentioned in this document, are always in the
 following format:
@@ -34,9 +34,8 @@ by the JSON standard:
 
 http://www.ietf.org/rfc/rfc7159.txt
 
-The protocol is always encoded in UTF-8 except for synchronization
-bytes (documented below); although thanks to json-string escape
-sequences, the server will reply using only the strict ASCII subset.
+The server expects its input to be encoded in UTF-8, and sends its
+output encoded in ASCII.
 
 For convenience, json-object members mentioned in this document will
 be in a certain order. However, in real protocol usage they can be in
@@ -215,16 +214,28 @@ Some events are rate-limited to at most one per second.  If additional
 dropped, and the last one is delayed.  "Similar" normally means same
 event type.  See qmp-events.txt for details.
 
-2.6 QGA Synchronization
+2.6 Forcing the JSON parser into known-good state
+-------------------------------------------------
+
+Incomplete or invalid input can leave the server's JSON parser in a
+state where it can't parse additional commands.  To get it back into
+known-good state, the client should provoke a lexical error.
+
+The cleanest way to do that is sending an ASCII control character
+other than '\t' (horizontal tab), '\r' (carriage return), and '\n'
+(new line).
+
+Sadly, older versions of QEMU can fail to flag this as an error.  If a
+client needs to deal with them, it should send a 0xFF byte.
+
+2.7 QGA Synchronization
 -----------------------
 
 When using QGA, an additional synchronization feature is built into
-the protocol.  If the Client sends a raw 0xFF sentinel byte (not valid
-JSON), then the Server will reset its state and discard all pending
-data prior to the sentinel.  Conversely, if the Client makes use of
-the 'guest-sync-delimited' command, the Server will send a raw 0xFF
-sentinel byte prior to its response, to aid the Client in discarding
-any data prior to the sentinel.
+the protocol. If the Client makes use of the 'guest-sync-delimited'
+command, the Server will send a raw 0xFF sentinel byte prior to its
+response, to aid the Client in discarding any data prior to the
+sentinel.
 
 
 3. QMP Examples
-- 
2.17.1
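
A client-side sketch of the recovery technique that the new section 2.6
documents may help here.  It is a minimal POSIX sketch, assuming `fd`
is an already connected QMP or QGA socket.  The helper name is
hypothetical, and ESC (0x1B) is simply one arbitrary pick among the
permitted control characters.  Reading and discarding the resulting
error response(s) is left to the caller and not shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: provoke a lexical error so the server's JSON
 * parser returns to a known-good state.  Sends ESC (0x1B), an ASCII
 * control character other than '\t', '\r' and '\n'.  Older QEMU may
 * not flag that as an error, so with @legacy a 0xFF byte is sent
 * instead.  Returns 0 on success, -1 on failure.
 */
static int qmp_force_known_good_state(int fd, bool legacy)
{
    unsigned char byte = legacy ? 0xFF : 0x1B;
    ssize_t n;

    do {
        n = write(fd, &byte, 1);   /* retry if interrupted by a signal */
    } while (n < 0 && errno == EINTR);

    return n == 1 ? 0 : -1;
}
```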

* Re: [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups
  2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
                   ` (55 preceding siblings ...)
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state Markus Armbruster
@ 2018-08-08 14:03 ` Markus Armbruster
  56 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-08 14:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: marcandre.lureau, mdroth, eblake

Forgot to mention this is based on my "[PATCH v3 00/23] tests:
Compile-time format string checking for libqtest.h".

Based-on: 20180806065344.7103-1-armbru@redhat.com

* Re: [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string Markus Armbruster
@ 2018-08-09 13:25   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:25 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> qobject_from_json() & friends misbehave when the JSON text has more
> than one JSON value.  Add test coverage to demonstrate the bugs.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input Markus Armbruster
@ 2018-08-09 13:29   ` Eric Blake
  2018-08-10 13:40     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:29 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> qobject_from_json() can return null without setting an error on
> lexical errors.  I call that a bug.  Add test coverage to demonstrate
> it.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 36 +++++++++++++++++++++++++++++++++---
>   1 file changed, 33 insertions(+), 3 deletions(-)
> 

> +static void junk_input(void)
> +{
> +    /* Note: junk within strings is covered elsewhere */
> +    Error *err = NULL;
> +    QObject *obj;
> +
> +    obj = qobject_from_json("@", &err);

Invalid token

> +    g_assert(!err);             /* BUG */
> +    g_assert(obj == NULL);
> +
> +    obj = qobject_from_json("[0\xFF]", &err);

\xff stream reset, followed by unbalanced ]

> +    error_free_or_abort(&err);
> +    g_assert(obj == NULL);
> +
> +    obj = qobject_from_json("00", &err);

Invalid as a JSON number

> +    g_assert(!err);             /* BUG */
> +    g_assert(obj == NULL);
> +
> +    obj = qobject_from_json("[1e", &err);

Incomplete as a JSON number

> +    g_assert(!err);             /* BUG */
>       g_assert(obj == NULL);
>   }

Is it also worth testing:

"t" (incomplete as a JSON literal)
"a" (not a valid JSON literal, but alphabetic and thus different from 
the "@" test above)

At any rate, with or without further tests, this is a good improvement
in coverage of the bugs dealt with in the rest of the series.

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly Markus Armbruster
@ 2018-08-09 13:36   ` Eric Blake
  2018-08-10 13:43     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:36 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 81b92d6b0c..0a9a054c7b 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -1236,7 +1236,7 @@ static void simple_whitespace(void)
>                       })),
>           },
>           {
> -            .encoded = " [ 43 , { 'h' : 'b' }, [ ], 42 ]",
> +            .encoded = "\t[ 43 , { 'h' : 'b' },\n\t[ ], 42 ]\n",

I would also test \r, since that is the final whitespace character 
mentioned in RFC 7159.
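
For reference, RFC 7159 (section 2) allows exactly four insignificant
whitespace characters: space, horizontal tab, line feed and carriage
return.  A minimal predicate (the name is made up for illustration)
also shows that this set is narrower than isspace(), which accepts
'\f' and '\v' as well:

```c
#include <stdbool.h>

/* The only whitespace RFC 7159 permits between JSON tokens. */
static bool json_is_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}
```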

* Re: [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test Markus Armbruster
@ 2018-08-09 13:38   ` Eric Blake
  2018-08-10 13:49     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:38 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> qmp-test is for QMP protocol tests.  Commit e4a426e75ef added generic,
> basic tests of query commands to it.  Move them to their own test
> program qmp-cmd-test, to keep qmp-test focused on the protocol.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   MAINTAINERS            |   1 +
>   tests/Makefile.include |   3 +
>   tests/qmp-cmd-test.c   | 213 +++++++++++++++++++++++++++++++++++++++++
>   tests/qmp-test.c       | 191 +-----------------------------------
>   4 files changed, 218 insertions(+), 190 deletions(-)
>   create mode 100644 tests/qmp-cmd-test.c
> 

> +++ b/tests/qmp-cmd-test.c
> @@ -0,0 +1,213 @@
> +/*
> + * QMP command test cases
> + *
> + * Copyright (c) 2017 Red Hat Inc.

Worth adding 2018?

> + *
> + * Authors:
> + *  Markus Armbruster <armbru@redhat.com>,
> + *

Trailing comma is odd. And these days, I'm inclined to omit Author lines 
(git history is more reliable, anyways)

Those are minor issues, where it is up to you if/how to clean them, so

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors Markus Armbruster
@ 2018-08-09 13:42   ` Eric Blake
  2018-08-10 13:52     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:42 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/libqtest.c | 17 +++++++++++++++++
>   tests/libqtest.h | 11 +++++++++++
>   tests/qmp-test.c | 39 +++++++++++++++++++++++++++++++++++++++
>   3 files changed, 67 insertions(+)
> 

> +    /* lexical error: impossible byte outside string */
> +    qtest_qmp_send_raw(qts, "{\xFF");

\xff is an impossible byte inside a string as well; plus it has special 
meaning to at least QMP for commanding a parser reset. Is a better byte 
more appropriate (maybe \x7f), either in replacement to \xff or as an 
additional test?
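
Background for the question: in well-formed UTF-8 (RFC 3629) the bytes
0xC0, 0xC1 and 0xF5..0xFF never occur anywhere, which is what makes
0xFF usable as an out-of-band sentinel.  0x7F (DEL), by contrast, is
ordinary ASCII, and the RFC 7159 grammar permits it unescaped inside
strings, so it could only serve as a lexical-error probe outside a
string.  A minimal classification sketch (hypothetical name):

```c
#include <stdbool.h>

/*
 * True for bytes that never appear in well-formed UTF-8: 0xC0 and
 * 0xC1 could only start overlong two-byte sequences, and 0xF5..0xFF
 * would start sequences beyond U+10FFFF or are no lead byte at all.
 */
static bool utf8_byte_impossible(unsigned char b)
{
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}
```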

> +    resp = qtest_qmp_receive(qts);
> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
> +    qobject_unref(resp);
> +    g_assert(recovered(qts));
> +
> +    /* lexical error: impossible byte in string */
> +    qtest_qmp_send_raw(qts, "{'bad \xFF");

Same question about \xff being special as the parser reset command, so 
should we test a different byte instead/as well?

> +    resp = qtest_qmp_receive(qts);
> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
> +    qobject_unref(resp);
> +    g_assert(recovered(qts));
> +
> +    /* lexical error: interpolation */
> +    qtest_qmp_send_raw(qts, "%%p\n");
> +    resp = qtest_qmp_receive(qts);
> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
> +    qobject_unref(resp);
> +    g_assert(recovered(qts));
> +
>       /* Not even a dictionary */
>       resp = qtest_qmp(qts, "null");
>       g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
> 

* Re: [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization Markus Armbruster
@ 2018-08-09 13:46   ` Eric Blake
  2018-08-10 13:57     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:46 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> To permit recovering from arbitrary JSON parse errors, the JSON parser
> resets itself on lexical errors.  We recommend sending a 0xff byte for
> that purpose, and test-qga covers this usage since commit 5229564b832.
> That commit had to add an ugly hack to qmp_fd_vsend() to make it capable
> of sending this byte (it's designed to send only valid JSON).
> 
> The previous commit added a way to send arbitrary text.  Put that to
> use for this purpose, and drop the hack from qmp_fd_vsend().
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/libqtest.c | 39 +++++++++++++++++++++------------------
>   tests/libqtest.h |  2 ++
>   tests/test-qga.c |  8 ++++----
>   3 files changed, 27 insertions(+), 22 deletions(-)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index c02fc91b37..9c844874e4 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -489,16 +489,6 @@ void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
>   {
>       QObject *qobj;
>   
> -    /*
> -     * qobject_from_vjsonf_nofail() chokes on leading 0xff as invalid
> -     * JSON, but tests/test-qga.c needs to send that to test QGA
> -     * synchronization
> -     */
> -    if (*fmt == '\377') {
> -        socket_send(fd, fmt, 1);
> -        fmt++;
> -    }
> -
>       /* Going through qobject ensures we escape strings properly */
>       qobj = qobject_from_vjsonf_nofail(fmt, ap);

This does JSON interpolation...

>   
> @@ -586,23 +576,36 @@ void qtest_qmp_send(QTestState *s, const char *fmt, ...)
>       va_end(ap);
>   }
>   
> -void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
> +void qmp_fd_vsend_raw(int fd, const char *fmt, va_list ap)
>   {
>       bool log = getenv("QTEST_LOG") != NULL;
> -    va_list ap;
> -    char *str;
> -
> -    va_start(ap, fmt);
> -    str = g_strdup_vprintf(fmt, ap);
> -    va_end(ap);
> +    char *str = g_strdup_vprintf(fmt, ap);

...while the new code does printf interpolation...

> +++ b/tests/test-qga.c
> @@ -147,10 +147,10 @@ static void test_qga_sync_delimited(gconstpointer fix)
>       unsigned char c;
>       QDict *ret;
>   
> -    qmp_fd_send(fixture->fd,
> -                "\xff{'execute': 'guest-sync-delimited',"
> -                " 'arguments': {'id': %u } }",
> -                r);
> +    qmp_fd_send_raw(fixture->fd,
> +                    "\xff{'execute': 'guest-sync-delimited',"
> +                    " 'arguments': {'id': %u } }",
> +                    r);

...and your test was relying on interpolation. Fortunately, your only 
thing being interpolated (%u) happens to result in valid JSON - but you 
may want to split this into:

qmp_fd_send_raw(fixture->fd, "\xff");
qmp_fd_send(fixture->fd, "{'execute ... %u}}", r);

to make it less questionable. If not, then call it out in the commit 
message and/or a comment on the code itself.
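
Outside libqtest, the suggested split boils down to writing the
sentinel byte verbatim and only then formatting the command text.  A
minimal POSIX sketch, assuming a connected descriptor `fd`; the helper
name is hypothetical, and snprintf() stands in for libqtest's
JSON-aware interpolation, which is tolerable here only because %u
always expands to a valid JSON number:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send the raw 0xFF sentinel on its own, so no
 * format string has to contain it, then the guest-sync-delimited
 * command with printf-style interpolation of @id.
 * Returns 0 on success, -1 on failure.
 */
static int send_sync_delimited(int fd, unsigned id)
{
    static const unsigned char sentinel = 0xFF;
    char buf[128];
    int len;

    if (write(fd, &sentinel, 1) != 1) {
        return -1;
    }
    len = snprintf(buf, sizeof(buf),
                   "{'execute': 'guest-sync-delimited',"
                   " 'arguments': {'id': %u}}", id);
    if (len < 0 || (size_t)len >= sizeof(buf)) {
        return -1;                 /* formatting failed or truncated */
    }
    return write(fd, buf, len) == len ? 0 : -1;
}
```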

* Re: [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1 Markus Armbruster
@ 2018-08-09 13:54   ` Eric Blake
  2018-08-10 14:03     ` Markus Armbruster
  2018-08-09 14:00   ` Eric Blake
  1 sibling, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:54 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> escaped_string() first tests double quoted strings, then repeats a few
> tests with single quotes.  Repeat all of them: store the strings to
> test without quotes, and wrap them in either kind of quote for
> testing.

Does that properly cover the fact that "'" and '"' are valid, but the 
counterparts need escaping?
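
The asymmetry in question can be stated as a small predicate: inside a
string delimited by a given quote character, only that quote, the
backslash and control characters must be escaped, so ' may appear bare
in a double-quoted string and " may appear bare in a single-quoted one.
Single-quoted strings are QEMU's extension to RFC 7159; the function
name is made up for illustration:

```c
#include <stdbool.h>

/*
 * Must character c be escaped inside a JSON string delimited by
 * quote ('"', or '\'' for QEMU's single-quote extension)?
 */
static bool json_must_escape(unsigned char c, char quote)
{
    return c == (unsigned char)quote || c == '\\' || c < 0x20;
}
```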

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 94 ++++++++++++++++++++++++++-------------------
>   1 file changed, 55 insertions(+), 39 deletions(-)
> 
> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 0a9a054c7b..1c7f24bc4d 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -22,55 +22,71 @@
>   #include "qapi/qmp/qstring.h"
>   #include "qemu-common.h"
>   
> +static QString *from_json_str(const char *jstr, Error **errp, bool single)
> +{
> +    char quote = single ? '\'' : '"';
> +    char *qjstr = g_strdup_printf("%c%s%c", quote, jstr, quote);
> +
> +    return qobject_to(QString, qobject_from_json(qjstr, errp));

Memory leak of qjstr.

> +}
> +
> +static char *to_json_str(QString *str)
> +{
> +    QString *json = qobject_to_json(QOBJECT(str));
> +    char *jstr;
> +
> +    if (!json) {
> +        return NULL;
> +    }
> +    /* peel off double quotes */
> +    jstr = g_strndup(qstring_get_str(json) + 1,
> +                     qstring_get_length(json) - 2);
> +    qobject_unref(json);
> +    return jstr;
> +}
> +
>   static void escaped_string(void)
>   {
> -    int i;
>       struct {
> -        const char *encoded;
> -        const char *decoded;
> +        /* Content of JSON string to parse with qobject_from_json() */
> +        const char *json_in;
> +        /* Expected parse output; to unparse with qobject_to_json() */
> +        const char *utf8_out;
>           int skip;
>       } test_cases[] = {

> +        { "\\\"", "\"" },

This covers the escaped version of ", but not of ', and not the 
unescaped version of either (per my comment above, the latter can only 
be done with the opposite quoting).

Otherwise looks sane.

* Re: [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings Markus Armbruster
@ 2018-08-09 13:57   ` Eric Blake
  2018-08-10 14:15     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 13:57 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Merge a few closely related test strings, and drop a few redundant
> ones.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 

>       } test_cases[] = {
> -        { "\\b", "\b" },
> -        { "\\f", "\f" },
> -        { "\\n", "\n" },
> -        { "\\r", "\r" },
> -        { "\\t", "\t" },
> -        { "/", "/" },
> -        { "\\/", "/", .skip = 1 },
> -        { "\\\\", "\\" },
> -        { "\\\"", "\"" },
> -        { "hello world \\\"embedded string\\\"",
> -          "hello world \"embedded string\"" },
> -        { "hello world\\nwith new line", "hello world\nwith new line" },
> +        { "\\b\\f\\n\\r\\t\\\\\\\"", "\b\f\n\r\t\\\"" },
> +        { "\\/\\'", "/'", .skip = 1 },

Aha - this adds coverage of the escaped ' not present in 7/56. (Still 
nothing about the unescaped versions of ' or " with correct quoting).

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1 Markus Armbruster
  2018-08-09 13:54   ` Eric Blake
@ 2018-08-09 14:00   ` Eric Blake
  2018-08-10 14:11     ` Markus Armbruster
  1 sibling, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 14:00 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> escaped_string() first tests double quoted strings, then repeats a few
> tests with single quotes.  Repeat all of them: store the strings to
> test without quotes, and wrap them in either kind of quote for
> testing.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 94 ++++++++++++++++++++++++++-------------------
>   1 file changed, 55 insertions(+), 39 deletions(-)
> 

>       struct {
> -        const char *encoded;
> -        const char *decoded;
> +        /* Content of JSON string to parse with qobject_from_json() */
> +        const char *json_in;
> +        /* Expected parse output; to unparse with qobject_to_json() */
> +        const char *utf8_out;
>           int skip;

Instead of int skip (and why is that not a bool?), would it be better to 
have an optional const char *json_out?

> +    for (i = 0; test_cases[i].json_in; i++) {
> +        for (j = 0; j < 2; j++) {
> +            cstr = from_json_str(test_cases[i].json_in, &error_abort, j);
> +            g_assert_cmpstr(qstring_get_try_str(cstr),
> +                            ==, test_cases[i].utf8_out);
> +            if (test_cases[i].skip == 0) {
> +                jstr = to_json_str(cstr);
> +                g_assert_cmpstr(jstr, ==, test_cases[i].json_in);
> +                g_free(jstr);

and here, write g_assert_cmpstr(jstr, ==, test_cases[i].json_out ?: 
test_cases[i].json_in)?  After all, the reason we're skipping is because 
there are some cases of multiple inputs that get canonicalized to 
constant output, such as " " vs. "\u0020", or "\\'" vs. "'".
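
A minimal sketch of that table layout, in plain C so it stands alone:
`json_out` is the optional field suggested above (not in the patch),
and NULL means the input is expected to round-trip unchanged.  The GNU
`?:` shorthand is spelled out as a full conditional.  The entries reuse
the canonicalization examples just mentioned.

```c
#include <stddef.h>

struct escape_case {
    const char *json_in;   /* string content fed to the parser */
    const char *utf8_out;  /* expected parse result */
    const char *json_out;  /* expected re-serialization; NULL: same as json_in */
};

/* What the unparse step should produce for this case. */
static const char *expected_json_out(const struct escape_case *tc)
{
    return tc->json_out ? tc->json_out : tc->json_in;
}

static const struct escape_case cases[] = {
    { "\\b\\f\\n\\r\\t\\\\\\\"", "\b\f\n\r\t\\\"", NULL },
    { "\\'", "'", "'" },           /* canonicalized: escape dropped */
    { "\\u0020", " ", " " },       /* canonicalized: plain space */
    { NULL, NULL, NULL }           /* terminator, as in the patch's loop */
};
```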

* Re: [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2 Markus Armbruster
@ 2018-08-09 14:03   ` Eric Blake
  2018-08-10 14:16     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 14:03 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Cover surrogates, invalid escapes, and noncharacters.  This
> demonstrates that valid surrogate pairs are misinterpreted, and
> invalid surrogates and noncharacters aren't rejected.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 53 ++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 45 insertions(+), 8 deletions(-)
> 

> +        { "\\u12x", NULL },
> +        { "\\u123x", NULL },
> +        { "\\u12345", "\341\210\2645" },
> +        { "\\u12345", "\341\210\2645" },

Why is this one duplicated?

Otherwise,
Reviewed-by: Eric Blake <eblake@redhat.com>
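
For readers decoding the expected values above: \u1234 is U+1234,
whose three-byte UTF-8 encoding is E1 88 B4 (octal \341\210\264), and
the trailing 5 is just an ordinary character, hence "\341\210\2645".
The surrogate pairs the commit message mentions combine a high
surrogate (D800..DBFF) and a low surrogate (DC00..DFFF) into one code
point above U+FFFF.  A standalone sketch of both steps; the function
names are made up, and this is not QEMU's implementation:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Combine a UTF-16 surrogate pair (as in \uXXXX\uXXXX) into one code
 * point.  Returns -1 unless hi is a high and lo a low surrogate.
 */
static int32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
        return -1;
    }
    return 0x10000 + ((int32_t)(hi - 0xD800) << 10) + (lo - 0xDC00);
}

/*
 * Encode code point cp as UTF-8 into buf (room for 4 bytes).
 * Returns the byte count, or 0 for surrogates and values beyond
 * U+10FFFF, which have no UTF-8 encoding.
 */
static size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    if (cp < 0x80) {
        buf[0] = cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = 0xC0 | (cp >> 6);
        buf[1] = 0x80 | (cp & 0x3F);
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;
    }
    if (cp < 0x10000) {
        buf[0] = 0xE0 | (cp >> 12);
        buf[1] = 0x80 | ((cp >> 6) & 0x3F);
        buf[2] = 0x80 | (cp & 0x3F);
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = 0xF0 | (cp >> 18);
        buf[1] = 0x80 | ((cp >> 12) & 0x3F);
        buf[2] = 0x80 | ((cp >> 6) & 0x3F);
        buf[3] = 0x80 | (cp & 0x3F);
        return 4;
    }
    return 0;
}
```

For instance, the pair \uDBFF\uDFFD used in the utf8_string() tests
combines to U+10FFFD, which encodes as F4 8F BF BD.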

* Re: [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests Markus Armbruster
@ 2018-08-09 14:04   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 14:04 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> simple_string() and single_quote_string() add nothing to
> escaped_string() anymore.  Drop them.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 59 ---------------------------------------------
>   1 file changed, 59 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings Markus Armbruster
@ 2018-08-09 14:17   ` Eric Blake
  2018-08-10 14:18     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 14:17 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> utf8_string() tests only double quoted strings.  Cover single quoted
> strings, too: store the strings to test without quotes, then wrap them
> in either kind of quote.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 427 ++++++++++++++++++++++----------------------
>   1 file changed, 214 insertions(+), 213 deletions(-)
> 

Pre-existing, but:

>           /* 2.2.4  4 bytes U+1FFFFF */

Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that is 
not valid Unicode, even if it IS a valid interpretation of UTF-8 encoding.
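
To make the range limit concrete: a four-byte sequence carries 21 bits,
so F7 BF BF BF decodes to 0x1FFFFF, but RFC 3629 restricts UTF-8 to
U+0000..U+10FFFF and therefore rejects lead bytes F5..FF outright, as
well as F4 followed by anything that lands above U+10FFFF.  A minimal
validating decoder sketch (name and interface are made up for
illustration, not QEMU's):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from s (len bytes available).  Returns
 * the sequence length and stores the code point in *cp, or returns 0
 * if the sequence is ill-formed per RFC 3629.
 */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c;
    size_t n, i;

    if (len == 0) {
        return 0;
    }
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        n = 2;
        c = s[0] & 0x1F;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        n = 3;
        c = s[0] & 0x0F;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        n = 4;
        c = s[0] & 0x07;
    } else {
        return 0;       /* continuation byte, or lead byte C0, C1, F5..FF */
    }
    if (len < n) {
        return 0;       /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0;   /* expected a continuation byte */
        }
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF */
    if ((n == 3 && c < 0x800) || (n == 4 && c < 0x10000) ||
        (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF) {
        return 0;
    }
    *cp = c;
    return n;
}
```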

>           {
> -            "\"\xF7\xBF\xBF\xBF\"",
> +            "\xF7\xBF\xBF\xBF",
>               NULL,               /* bug: rejected */
> -            "\"\\uFFFD\"",
> +            "\\uFFFD",
>               "\xF7\xBF\xBF\xBF",
>           },
>           /* 2.2.5  5 bytes U+3FFFFFF */

Which makes this one also questionable,

>           {
> -            "\"\xFB\xBF\xBF\xBF\xBF\"",
> +            "\xFB\xBF\xBF\xBF\xBF",
>               NULL,               /* bug: rejected */
> -            "\"\\uFFFD\"",
> +            "\\uFFFD",
>               "\xFB\xBF\xBF\xBF\xBF",
>           },
>           /* 2.2.6  6 bytes U+7FFFFFFF */

and this one.

>           {
>               /* last one in last plane: U+10FFFD */
> -            "\"\xF4\x8F\xBF\xBD\"",
>               "\xF4\x8F\xBF\xBD",
> -            "\"\\uDBFF\\uDFFD\""
> +            "\xF4\x8F\xBF\xBD",
> +            "\\uDBFF\\uDFFD"
>           },
>           {
>               /* first one beyond Unicode range: U+110000 */

while these are reasonable.

The conversion of the initializer looks sane (well, mechanical).  Ergo:

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string()
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string() Markus Armbruster
@ 2018-08-09 14:20   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 14:20 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> The previous commit made utf8_string()'s test_cases[].utf8_in
> superfluous: we can use .json_in instead.  Except for the case testing
> U+0000.  \x00 doesn't work in C strings, so it tests \\u0000 instead.
> But testing \\uXXXX is escaped_string()'s job.  It's covered there.
> Test U+0001 here, and drop .utf8_in.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 53 ++++++++-------------------------------------
>   1 file changed, 9 insertions(+), 44 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
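
Why \x00 cannot stand in for U+0000 in such a table: C string
functions treat the first NUL byte as the terminator, so anything after
an embedded \x00 is invisible to strlen() and to any parser entry point
that takes a NUL-terminated string, as qobject_from_json() does.  A
tiny demonstration (the names are only for illustration):

```c
#include <stddef.h>
#include <string.h>

/*
 * The array holds 4 bytes ('"', NUL, '"', terminator), but strlen()
 * stops at the embedded NUL, so only the opening quote is "seen".
 */
static const char embedded_nul[] = "\"\x00\"";

static size_t visible_length(void)
{
    return strlen(embedded_nul);
}
```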

* Re: [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences Markus Armbruster
@ 2018-08-09 14:22   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 14:22 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Some of utf8_string()'s test_cases[] contain multiple invalid
> sequences.  Testing that qobject_from_json() fails only tests we
> reject at least one invalid sequence.  That's incomplete.
> 
> Additionally test each non-space sequence in isolation.
> 
> This demonstrates that the JSON parser accepts invalid sequences
> starting with \xC2..\xF4.  Add a FIXME comment.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 33 ++++++++++++++++++++++++++++-----
>   1 file changed, 28 insertions(+), 5 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly Markus Armbruster
@ 2018-08-09 17:24   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 17:24 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> RFC 7159 requires control characters in strings to be escaped.
> Demonstrate the JSON parser accepts U+0001 .. U+001F unescaped.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 36 ++++++++++++++++++++++++++++++------
>   tests/qmp-test.c    | 14 ++++++++++++++
>   2 files changed, 44 insertions(+), 6 deletions(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>

Accepting it on input (as an extension to the RFC, similar to our \' 
extension) sounds reasonable, but producing it on output is wrong.
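On the output side, the rule Eric states amounts to escaping every character below 0x20. Below is a minimal sketch of such an encoder. It assumes NUL-terminated UTF-8 input, and json_escape_body() is a hypothetical name, not QEMU's qobject_to_json():

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the body of a JSON string for the NUL-terminated UTF-8 input
 * @in into @out (capacity @outsz), escaping the quotation mark, the
 * reverse solidus and every control character U+0001..U+001F, as
 * RFC 7159 requires.  (U+0000 can't occur in a C string.)  Bytes
 * >= 0x20 are copied unchanged.  Returns the number of bytes written,
 * excluding the terminator, or -1 if @out is too small.
 */
static long json_escape_body(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0) {
        return -1;
    }
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        char tmp[7];            /* room for "\u00XX" plus NUL */
        const char *rep = tmp;
        size_t len;

        switch (*p) {
        case '"':  rep = "\\\""; break;
        case '\\': rep = "\\\\"; break;
        case '\b': rep = "\\b";  break;
        case '\f': rep = "\\f";  break;
        case '\n': rep = "\\n";  break;
        case '\r': rep = "\\r";  break;
        case '\t': rep = "\\t";  break;
        default:
            if (*p < 0x20) {
                /* control characters without a short form: \u00XX */
                snprintf(tmp, sizeof(tmp), "\\u%04X", (unsigned)*p);
            } else {
                tmp[0] = (char)*p;
                tmp[1] = '\0';
            }
        }
        len = strlen(rep);
        if (n + len >= outsz) {
            return -1;          /* keep room for the terminator */
        }
        memcpy(out + n, rep, len);
        n += len;
    }
    out[n] = '\0';
    return (long)n;
}
```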

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation more thoroughly
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation " Markus Armbruster
@ 2018-08-09 17:26   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 17:26 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 141 ++++++++++++++++++++++++--------------------
>   1 file changed, 77 insertions(+), 64 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token Markus Armbruster
@ 2018-08-09 17:42   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 17:42 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> json_lexer[] maps (lexer state, input character) to the new lexer
> state.  The input character is consumed unless the new state is
> terminal and the input character doesn't belong to this token,
> i.e. the state transition uses look-ahead.  When this is the case,
> input character '\0' would result in the same state transition.
> TERMINAL_NEEDED_LOOKAHEAD() exploits this.
> 
> Except this is wrong for transitions to IN_ERROR.  There, the
> offending input character is in fact consumed: case IN_ERROR returns.
> It isn't added to the JSON_ERROR token, though.
> 
> Fix that by making TERMINAL_NEEDED_LOOKAHEAD() return false for
> transitions to IN_ERROR.
> 
> There's a slight complication.  json_lexer_flush() passes input
> character '\0' to flush an incomplete token.  If this results in
> JSON_ERROR, we'd now add the '\0' to the token.  Suppress that.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Deceptively small change, but worthwhile.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters Markus Armbruster
@ 2018-08-09 18:26   ` Eric Blake
  2018-08-10 14:26     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 18:26 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Fix the lexer to reject unescaped control characters in JSON strings,
> in accordance with RFC 7159.

Question - can this break existing QMP clients that were relying on this 
extension working?

Libvirt used to use libyajl, now it uses libjansson. So I'll check both 
of those libraries:

yajl: https://github.com/lloyd/yajl/blob/master/src/yajl_encode.c#L32

             default:
                 if ((unsigned char) str[end] < 32) {
                     CharToHex(str[end], hexBuf + 4);
                     escaped = hexBuf;

jansson: https://github.com/akheron/jansson/blob/master/src/dump.c#L101

             /* mandatory escape or control char */
             if(codepoint == '\\' || codepoint == '"' || codepoint < 0x20)

Okay, both libraries appear to always send control characters encoded, 
and thus were not relying on this accidental QMP extension.

Are we worried about other clients?

> 
> Bonus: we now recover more nicely from unclosed strings.  E.g.
> 
>      {"one: 1}\n{"two": 2}
> 
> now recovers cleanly after the newline, where before the lexer
> remained confused until the next unpaired double quote or lexical
> error.

On that grounds alone, I could live with this patch, even if we end up 
having to revert it later if some client was actually depending on 
sending raw control characters as part of a string.
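The recovery effect can be seen in a simplified string scanner. Once raw control characters are rejected, the unterminated string in {"one: 1}\n{"two": 2} fails at the newline instead of swallowing the next line. This is only a sketch: scan_json_string() is hypothetical, and QEMU's lexer is a table-driven state machine, not this loop:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan a quoted string starting at @s, which must point at the opening
 * quote (either '"' or the single-quote extension).  Returns the length
 * of the complete string token, or -1 if a raw control character
 * (U+0000..U+001F, including the NUL terminator) comes first.  Escape
 * sequences are skipped over, not validated.
 */
static long scan_json_string(const char *s)
{
    char quote = s[0];
    size_t i = 1;

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        if (c < 0x20) {
            return -1;          /* raw control character or end of input */
        }
        if (c == (unsigned char)quote) {
            return (long)(i + 1);
        }
        if (c == '\\') {
            if ((unsigned char)s[i + 1] < 0x20) {
                return -1;
            }
            i += 2;             /* skip the backslash and the next char */
            continue;
        }
        i++;
    }
}
```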

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation Markus Armbruster
@ 2018-08-09 18:49   ` Eric Blake
  2018-08-10 14:31     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 18:49 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c | 80 +++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 71 insertions(+), 9 deletions(-)
> 

> + *
> + * [Numbers:]

Worth also calling out:

[Objects:]
       object = begin-object [ member *( value-separator member ) ]
                end-object

       member = string name-separator value

[Arrays:]
       array = begin-array [ value *( value-separator value ) ] end-array

so as to completely cover the RFC grammar?

> + *
> + * Extensions over RFC 7159:
> + * - Extra escape sequence in strings:
> + *   0x27 (apostrophe) is recognized after escape, too
> + * - Single-quoted strings:
> + *   Like double-quoted strings, except they're delimited by %x27
> + *   (apostrophe) instead of %x22 (quotation mark), and can't contain
> + *   unescaped apostrophe, but can contain unescaped quotation mark.
> + * - Interpolation:
> + *   interpolation = %((l|ll|I64)[du]|[ipsf])

Not in your series, but we recently discussed adding %% (only inside
strings), coupled with enforcing that all other interpolation occurs
outside of strings.  I guess we can update this comment at that time.

> + *
> + * Note:
> + * - Input must be encoded in UTF-8.
> + * - Decoding and validating is left to the parser.
>    */
>   
>   enum json_lexer_state {
> 
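The interpolation rule in the new comment describes a small regular language. The recognizer below is an illustration only: interp_len() is a hypothetical helper, not the lexer's code. Note that a bare %d is not a directive, and neither is %%:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the interpolation directive at @s according to
 *     interpolation = %((l|ll|I64)[du]|[ipsf])
 * or 0 if @s does not start with one.
 */
static size_t interp_len(const char *s)
{
    size_t n;

    if (s[0] != '%') {
        return 0;
    }
    if (s[1] != '\0' && strchr("ipsf", s[1])) {
        return 2;                           /* %i %p %s %f */
    }
    if (s[1] == 'l' && s[2] == 'l') {
        n = 3;                              /* %ll */
    } else if (s[1] == 'l') {
        n = 2;                              /* %l */
    } else if (strncmp(s + 1, "I64", 3) == 0) {
        n = 4;                              /* %I64 */
    } else {
        return 0;
    }
    /* a length modifier must be followed by d or u */
    return (s[n] == 'd' || s[n] == 'u') ? n + 1 : 0;
}
```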

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop Markus Armbruster
@ 2018-08-09 18:52   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 18:52 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Simplify loop control, and assert that the string ends with the
> appropriate quote (the lexer ensures it does).
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c | 35 +++++++++--------------------------
>   1 file changed, 9 insertions(+), 26 deletions(-)
> 

Nice diffstat.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected Markus Armbruster
@ 2018-08-09 18:55   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-09 18:55 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> The JSON parser rejects some invalid sequences, but accepts others
> without correcting the problem.
> 
> We should either reject all invalid sequences, or minimize overlong
> sequences and replace all other invalid sequences by a suitable
> replacement character.  A common choice for replacement is U+FFFD.
> 
> I'm going to implement the former.  Update the comments in
> utf8_string() to expect this.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 151 +++++++++++++++++++++-----------------------
>   1 file changed, 71 insertions(+), 80 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences
  2018-08-08 12:02 ` [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences Markus Armbruster
@ 2018-08-09 22:16   ` Eric Blake
  2018-08-10 14:40     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-09 22:16 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
> \xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
> invalid UTF-8 not containing these bytes, as demonstrated by
> check-qjson:
> 
> * Malformed sequences
> 
>    - Unexpected continuation bytes
> 
>    - Missing continuation bytes after start bytes other than
>      \xC0..\xC1, \xF5..\xFD.
> 
> * Overlong sequences with start bytes other than \xC0..\xC1,
>    \xF5..\xFD.
> 
> * Invalid code points
> 
> Fixing this in the lexer would be bothersome.  Fixing it in the parser
> is straightforward, so do that.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

> @@ -193,12 +198,15 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
>                   goto out;
>               }
>           } else {
> -            char dummy[2];
> -
> -            dummy[0] = *ptr;
> -            dummy[1] = 0;
> -
> -            qstring_append(str, dummy);
> +            cp = mod_utf8_codepoint(ptr, 6, &end);

Why are you hard-coding 6 here, rather than computing min(6, 
strchr(ptr,0)-ptr)?  If the user passes an invalid sequence at the end 
of the string, can we end up making mod_utf8_codepoint() read beyond the 
end of our string?  Would it be better to just always pass the remaining 
string length (mod_utf8_codepoint() only cares about stopping short of 6 
bytes, but never reads beyond there even if you pass a larger number)?

> +            if (cp <= 0) {
> +                parse_error(ctxt, token, "invalid UTF-8 sequence in string");
> +                goto out;
> +            }
> +            ptr = end - 1;
> +            len = mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp);
> +            assert(len >= 0);
> +            qstring_append(str, utf8_buf);
>           }
>       }
>   

> +++ b/util/unicode.c
> @@ -13,6 +13,21 @@
>   #include "qemu/osdep.h"
>   #include "qemu/unicode.h"
>   

> +ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint)
> +{
> +    assert(bufsz >= 5);
> +
> +    if (!is_valid_codepoint(codepoint)) {
> +        return -1;
> +    }
> +
> +    if (codepoint > 0 && codepoint <= 0x7F) {
> +        buf[0] = codepoint & 0x7F;

Dead use of binary &. But acceptable for symmetry with the other code 
branches.

> +        buf[1] = 0;
> +        return 1;
> +    }
> +    if (codepoint <= 0x7FF) {
> +        buf[0] = 0xC0 | ((codepoint >> 6) & 0x1F);
> +        buf[1] = 0x80 | (codepoint & 0x3F);
> +        buf[2] = 0;
> +        return 2;
> +    }
> +    if (codepoint <= 0xFFFF) {
> +        buf[0] = 0xE0 | ((codepoint >> 12) & 0x0F);
> +        buf[1] = 0x80 | ((codepoint >> 6) & 0x3F);
> +        buf[2] = 0x80 | (codepoint & 0x3F);
> +        buf[3] = 0;
> +        return 3;
> +    }
> +    buf[0] = 0xF0 | ((codepoint >> 18) & 0x07);
> +    buf[1] = 0x80 | ((codepoint >> 12) & 0x3F);
> +    buf[2] = 0x80 | ((codepoint >> 6) & 0x3F);
> +    buf[3] = 0x80 | (codepoint & 0x3F);
> +    buf[4] = 0;
> +    return 4;
> +}
> 
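The encoder can be run stand-alone to show the modified UTF-8 mapping mentioned in the cover letter. Because of the codepoint > 0 test, U+0000 falls through to the two-byte branch and comes out as \xC0\x80. is_valid_codepoint() isn't quoted here, so the copy below substitutes an assumed check that covers range and surrogates only. Both functions carry a _sketch suffix to mark them as illustrations:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>          /* ssize_t */

/* Assumption: stands in for the real is_valid_codepoint(). */
static int is_valid_codepoint_sketch(int codepoint)
{
    return codepoint >= 0 && codepoint <= 0x10FFFF
        && !(codepoint >= 0xD800 && codepoint <= 0xDFFF);
}

/* Body copied from mod_utf8_encode() in the patch above. */
static ssize_t mod_utf8_encode_sketch(char buf[], size_t bufsz, int codepoint)
{
    assert(bufsz >= 5);

    if (!is_valid_codepoint_sketch(codepoint)) {
        return -1;
    }
    if (codepoint > 0 && codepoint <= 0x7F) {
        buf[0] = codepoint & 0x7F;
        buf[1] = 0;
        return 1;
    }
    if (codepoint <= 0x7FF) {
        /* also reached for U+0000, which yields \xC0\x80 */
        buf[0] = 0xC0 | ((codepoint >> 6) & 0x1F);
        buf[1] = 0x80 | (codepoint & 0x3F);
        buf[2] = 0;
        return 2;
    }
    if (codepoint <= 0xFFFF) {
        buf[0] = 0xE0 | ((codepoint >> 12) & 0x0F);
        buf[1] = 0x80 | ((codepoint >> 6) & 0x3F);
        buf[2] = 0x80 | (codepoint & 0x3F);
        buf[3] = 0;
        return 3;
    }
    buf[0] = 0xF0 | ((codepoint >> 18) & 0x07);
    buf[1] = 0x80 | ((codepoint >> 12) & 0x3F);
    buf[2] = 0x80 | ((codepoint >> 6) & 0x3F);
    buf[3] = 0x80 | (codepoint & 0x3F);
    buf[4] = 0;
    return 4;
}
```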

Overall, looks nice.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input
  2018-08-09 13:29   ` Eric Blake
@ 2018-08-10 13:40     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 13:40 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> qobject_from_json() can return null without setting an error on
>> lexical errors.  I call that a bug.  Add test coverage to demonstrate
>> it.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/check-qjson.c | 36 +++++++++++++++++++++++++++++++++---
>>   1 file changed, 33 insertions(+), 3 deletions(-)
>>
>
>> +static void junk_input(void)
>> +{
>> +    /* Note: junk within strings is covered elsewhere */
>> +    Error *err = NULL;
>> +    QObject *obj;
>> +
>> +    obj = qobject_from_json("@", &err);
>
> Invalid token
>
>> +    g_assert(!err);             /* BUG */
>> +    g_assert(obj == NULL);
>> +
>> +    obj = qobject_from_json("[0\xFF]", &err);
>
> \xff stream reset, followed by unbalanced ]



>> +    error_free_or_abort(&err);
>> +    g_assert(obj == NULL);
>> +
>> +    obj = qobject_from_json("00", &err);
>
> Invalid as a JSON number
>
>> +    g_assert(!err);             /* BUG */
>> +    g_assert(obj == NULL);
>> +
>> +    obj = qobject_from_json("[1e", &err);
>
> Incomplete as a JSON number
>
>> +    g_assert(!err);             /* BUG */
>>       g_assert(obj == NULL);
>>   }
>
> Is it also worth testing:
>
> "t" (incomplete as a JSON literal)
> "a" (not a valid JSON literal, but alphabetic and thus different from
> the "@" test above)

Yes, an invalid keyword is worth testing.  The way the code works,
testing one should suffice.

> At any rate, with or without further tests, this is a good
> improvement in coverage of the bugs dealt with in the rest of the series.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

* Re: [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly
  2018-08-09 13:36   ` Eric Blake
@ 2018-08-10 13:43     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 13:43 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/check-qjson.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>> index 81b92d6b0c..0a9a054c7b 100644
>> --- a/tests/check-qjson.c
>> +++ b/tests/check-qjson.c
>> @@ -1236,7 +1236,7 @@ static void simple_whitespace(void)
>>                       })),
>>           },
>>           {
>> -            .encoded = " [ 43 , { 'h' : 'b' }, [ ], 42 ]",
>> +            .encoded = "\t[ 43 , { 'h' : 'b' },\n\t[ ], 42 ]\n",
>
> I would also test \r, since that is the final whitespace character
> mentioned in RFC 7159.

Easy to do, so why not.
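For reference, RFC 7159 whitespace is exactly space, horizontal tab, line feed and carriage return (%x20, %x09, %x0A, %x0D). The hypothetical helper below is worth spelling out because isspace() would also accept \v and \f, which JSON does not:

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of leading RFC 7159 whitespace characters in @s. */
static size_t skip_json_ws(const char *s)
{
    size_t i = 0;

    while (s[i] == ' ' || s[i] == '\t' || s[i] == '\n' || s[i] == '\r') {
        i++;
    }
    return i;
}
```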

* Re: [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test
  2018-08-09 13:38   ` Eric Blake
@ 2018-08-10 13:49     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 13:49 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> qmp-test is for QMP protocol tests.  Commit e4a426e75ef added generic,
>> basic tests of query commands to it.  Move them to their own test
>> program qmp-cmd-test, to keep qmp-test focused on the protocol.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   MAINTAINERS            |   1 +
>>   tests/Makefile.include |   3 +
>>   tests/qmp-cmd-test.c   | 213 +++++++++++++++++++++++++++++++++++++++++
>>   tests/qmp-test.c       | 191 +-----------------------------------
>>   4 files changed, 218 insertions(+), 190 deletions(-)
>>   create mode 100644 tests/qmp-cmd-test.c
>>
>
>> +++ b/tests/qmp-cmd-test.c
>> @@ -0,0 +1,213 @@
>> +/*
>> + * QMP command test cases
>> + *
>> + * Copyright (c) 2017 Red Hat Inc.
>
> Worth adding 2018?

I'll do that in one of the patches that adds something new.

>> + *
>> + * Authors:
>> + *  Markus Armbruster <armbru@redhat.com>,
>> + *
>
> Trailing comma is odd. And these days, I'm inclined to omit Author
> lines (git history is more reliable, anyways)

Comma inherited from qmp-test.c.  I'm fixing both.

> Those are minor issues, where it is up to you if/how to clean them, so
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

* Re: [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors
  2018-08-09 13:42   ` Eric Blake
@ 2018-08-10 13:52     ` Markus Armbruster
  2018-08-10 14:06       ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 13:52 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/libqtest.c | 17 +++++++++++++++++
>>   tests/libqtest.h | 11 +++++++++++
>>   tests/qmp-test.c | 39 +++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 67 insertions(+)
>>
>
>> +    /* lexical error: impossible byte outside string */
>> +    qtest_qmp_send_raw(qts, "{\xFF");
>
> \xff is an impossible byte inside a string as well; plus it has
> special meaning to at least QMP for commanding a parser reset. Is a
> better byte more appropriate (maybe \x7f), either in replacement to
> \xff or as an additional test?

\xFF is documented to have special meaning for QGA, but as far as the
code's concerned, it's a lexical error like any other.  I'm fixing the
documentation in PATCH 56.  Want me to move that patch to the front of
the series?

>> +    resp = qtest_qmp_receive(qts);
>> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
>> +    qobject_unref(resp);
>> +    g_assert(recovered(qts));
>> +
>> +    /* lexical error: impossible byte in string */
>> +    qtest_qmp_send_raw(qts, "{'bad \xFF");
>
> Same question about \xff being special as the parser reset command, so
> should we test a different byte instead/as well?
>
>> +    resp = qtest_qmp_receive(qts);
>> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
>> +    qobject_unref(resp);
>> +    g_assert(recovered(qts));
>> +
>> +    /* lexical error: interpolation */
>> +    qtest_qmp_send_raw(qts, "%%p\n");
>> +    resp = qtest_qmp_receive(qts);
>> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
>> +    qobject_unref(resp);
>> +    g_assert(recovered(qts));
>> +
>>       /* Not even a dictionary */
>>       resp = qtest_qmp(qts, "null");
>>       g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
>>

* Re: [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization
  2018-08-09 13:46   ` Eric Blake
@ 2018-08-10 13:57     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 13:57 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> To permit recovering from arbitrary JSON parse errors, the JSON parser
>> resets itself on lexical errors.  We recommend sending a 0xff byte for
>> that purpose, and test-qga covers this usage since commit 5229564b832.
>> That commit had to add an ugly hack to qmp_fd_vsend() to make it
>> capable of sending this byte (it's designed to send only valid JSON).
>>
>> The previous commit added a way to send arbitrary text.  Put that to
>> use for this purpose, and drop the hack from qmp_fd_vsend().
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/libqtest.c | 39 +++++++++++++++++++++------------------
>>   tests/libqtest.h |  2 ++
>>   tests/test-qga.c |  8 ++++----
>>   3 files changed, 27 insertions(+), 22 deletions(-)
>>
>> diff --git a/tests/libqtest.c b/tests/libqtest.c
>> index c02fc91b37..9c844874e4 100644
>> --- a/tests/libqtest.c
>> +++ b/tests/libqtest.c
>> @@ -489,16 +489,6 @@ void qmp_fd_vsend(int fd, const char *fmt, va_list ap)
>>   {
>>       QObject *qobj;
>>
>> -    /*
>> -     * qobject_from_vjsonf_nofail() chokes on leading 0xff as invalid
>> -     * JSON, but tests/test-qga.c needs to send that to test QGA
>> -     * synchronization
>> -     */
>> -    if (*fmt == '\377') {
>> -        socket_send(fd, fmt, 1);
>> -        fmt++;
>> -    }
>> -
>>       /* Going through qobject ensures we escape strings properly */
>>       qobj = qobject_from_vjsonf_nofail(fmt, ap);
>
> This does JSON interpolation...
>
>> @@ -586,23 +576,36 @@ void qtest_qmp_send(QTestState *s, const char *fmt, ...)
>>       va_end(ap);
>>   }
>>
>> -void qtest_qmp_send_raw(QTestState *s, const char *fmt, ...)
>> +void qmp_fd_vsend_raw(int fd, const char *fmt, va_list ap)
>>   {
>>       bool log = getenv("QTEST_LOG") != NULL;
>> -    va_list ap;
>> -    char *str;
>> -
>> -    va_start(ap, fmt);
>> -    str = g_strdup_vprintf(fmt, ap);
>> -    va_end(ap);
>> +    char *str = g_strdup_vprintf(fmt, ap);
>
> ...while the new code does printf interpolation...
>
>> +++ b/tests/test-qga.c
>> @@ -147,10 +147,10 @@ static void test_qga_sync_delimited(gconstpointer fix)
>>       unsigned char c;
>>       QDict *ret;
>>
>> -    qmp_fd_send(fixture->fd,
>> -                "\xff{'execute': 'guest-sync-delimited',"
>> -                " 'arguments': {'id': %u } }",
>> -                r);
>> +    qmp_fd_send_raw(fixture->fd,
>> +                    "\xff{'execute': 'guest-sync-delimited',"
>> +                    " 'arguments': {'id': %u } }",
>> +                    r);
>
> ...and your test was relying on interpolation. Fortunately, your only

Yes, my patch is a bit lazy there.

> thing being interpolated (%u) happens to result in valid JSON - but
> you may want to split this into:
>
> qmp_fd_send_raw(fixture->fd, "\xff");
> qmp_fd_send(fixture->fd, "{'execute ... %u}}", r);
>
> to make it less questionable. If not, then call it out in the commit
> message and/or a comment on the code itself.

Splitting is easier than explaining, so that's what I'll do.
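Eric's distinction can be seen with plain vsnprintf(), standing in for the g_strdup_vprintf() that the new qmp_fd_vsend_raw() uses. format_raw() is a hypothetical helper. Nothing in it knows about JSON, so nothing gets escaped:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Plain printf-style formatting into @buf; no JSON escaping at all. */
static int format_raw(char *buf, size_t bufsz, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, bufsz, fmt, ap);
    va_end(ap);
    return n;
}
```

%u always expands to digits, so the result stays valid JSON. %s with an apostrophe yields {'s': 'it's'}, where the single-quoted string ends early. Sending the \xff byte raw and the command through the JSON-aware path avoids the problem.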

* Re: [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1
  2018-08-09 13:54   ` Eric Blake
@ 2018-08-10 14:03     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:03 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> escaped_string() first tests double quoted strings, then repeats a few
>> tests with single quotes.  Repeat all of them: store the strings to
>> test without quotes, and wrap them in either kind of quote for
>> testing.
>
> Does that properly cover the fact that "'" and '"' are valid, but the
> counterparts need escaping?
>
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/check-qjson.c | 94 ++++++++++++++++++++++++++-------------------
>>   1 file changed, 55 insertions(+), 39 deletions(-)
>>
>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>> index 0a9a054c7b..1c7f24bc4d 100644
>> --- a/tests/check-qjson.c
>> +++ b/tests/check-qjson.c
>> @@ -22,55 +22,71 @@
>>   #include "qapi/qmp/qstring.h"
>>   #include "qemu-common.h"
>>
>> +static QString *from_json_str(const char *jstr, Error **errp, bool single)
>> +{
>> +    char quote = single ? '\'' : '"';
>> +    char *qjstr = g_strdup_printf("%c%s%c", quote, jstr, quote);
>> +
>> +    return qobject_to(QString, qobject_from_json(qjstr, errp));
>
> Memory leak of qjstr.

Fixing.

>> +}
>> +
>> +static char *to_json_str(QString *str)
>> +{
>> +    QString *json = qobject_to_json(QOBJECT(str));
>> +    char *jstr;
>> +
>> +    if (!json) {
>> +        return NULL;
>> +    }
>> +    /* peel off double quotes */
>> +    jstr = g_strndup(qstring_get_str(json) + 1,
>> +                     qstring_get_length(json) - 2);
>> +    qobject_unref(json);
>> +    return jstr;
>> +}
>> +
>>   static void escaped_string(void)
>>   {
>> -    int i;
>>       struct {
>> -        const char *encoded;
>> -        const char *decoded;
>> +        /* Content of JSON string to parse with qobject_from_json() */
>> +        const char *json_in;
>> +        /* Expected parse output; to unparse with qobject_to_json() */
>> +        const char *utf8_out;
>>           int skip;
>>       } test_cases[] = {
>
>> +        { "\\\"", "\"" },
>
> This covers the escaped version of ", but not of ', and not the
> unescaped version of either (per my comment above, the latter can only
> be done with the opposite quoting).

escaped_string() is about testing \-escapes.  Unescaped quotes are
covered by simple_string() and single_quote_string().

However, I drop both in PATCH 10.  That's actually a bad idea.

> Otherwise looks sane.

Thanks!

* Re: [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors
  2018-08-10 13:52     ` Markus Armbruster
@ 2018-08-10 14:06       ` Eric Blake
  2018-08-16 12:44         ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 14:06 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/10/2018 08:52 AM, Markus Armbruster wrote:

>>> +    /* lexical error: impossible byte outside string */
>>> +    qtest_qmp_send_raw(qts, "{\xFF");
>>
>> \xff is an impossible byte inside a string as well; plus it has
>> special meaning to at least QMP for commanding a parser reset. Is a
>> better byte more appropriate (maybe \x7f), either in replacement to
>> \xff or as an additional test?
> 
> \xFF is documented to have special meaning for QGA, but as far as the
> code's concerned, it's a lexical error like any other.  I'm fixing the
> documentation in PATCH 56.  Want me to move that patch to the front of
> the series?

Might not hurt. We also have a potential design decision to make: for 
most lexical errors, we report the error (with QGA, the user then 
requests that the first valid command after the client's induced lexical 
error also include an 0xff reply byte so that the client can easily skip 
over all the line noise, including said error reports).  Thus, we COULD 
decide to make our parser specifically accept 0xff as a new token, 
different from the lexical error token, so that it inhibits wasted error 
messages to the client on the grounds that the client sent it on 
purpose, differently from all other ways the client can use a lexical 
error to cause a reset.
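On the client side, the delimiter Eric describes reduces resynchronization to skipping input up to and including the 0xFF byte. The sketch below assumes exactly that. skip_to_sync_delimiter() is hypothetical, and searching for the first 0xFF relies on the server emitting only valid UTF-8, where that byte cannot occur:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first byte after the 0xFF delimiter in the
 * @len bytes received so far, or -1 if it hasn't arrived yet (in which
 * case the client keeps reading and discarding).
 */
static long skip_to_sync_delimiter(const unsigned char *buf, size_t len)
{
    const unsigned char *p = memchr(buf, 0xFF, len);

    return p ? (long)(p - buf + 1) : -1;
}
```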

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1
  2018-08-09 14:00   ` Eric Blake
@ 2018-08-10 14:11     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:11 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> escaped_string() first tests double quoted strings, then repeats a few
>> tests with single quotes.  Repeat all of them: store the strings to
>> test without quotes, and wrap them in either kind of quote for
>> testing.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/check-qjson.c | 94 ++++++++++++++++++++++++++-------------------
>>   1 file changed, 55 insertions(+), 39 deletions(-)
>>
>
>>       struct {
>> -        const char *encoded;
>> -        const char *decoded;
>> +        /* Content of JSON string to parse with qobject_from_json() */
>> +        const char *json_in;
>> +        /* Expected parse output; to unparse with qobject_to_json() */
>> +        const char *utf8_out;
>>           int skip;
>
> Instead of int skip (and why is that not a bool?),

Ask Anthony ;)

>                                                    would it be better
> to have an optional const char *json_out?
>
>> +    for (i = 0; test_cases[i].json_in; i++) {
>> +        for (j = 0; j < 2; j++) {
>> +            cstr = from_json_str(test_cases[i].json_in, &error_abort, j);
>> +            g_assert_cmpstr(qstring_get_try_str(cstr),
>> +                            ==, test_cases[i].utf8_out);
>> +            if (test_cases[i].skip == 0) {
>> +                jstr = to_json_str(cstr);
>> +                g_assert_cmpstr(jstr, ==, test_cases[i].json_in);
>> +                g_free(jstr);
>
> and here, write g_assert_cmpstr(jstr, ==, test_cases[i].json_out ?:
> test_cases[i].json_in)?  After all, the reason we're skipping is
> because there are some cases of multiple inputs that get canonicalized
> to constant output, such as " " vs. "\u0020", or "\\'" vs. "'".

This would additionally test that qobject_from_json() correctly maps '/'
and ' ' to themselves.  Marginal.  If we want it, then comparing jstr to
cstr if skip would do.  Dunno.
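Eric's suggestion, written in portable C with a hypothetical struct layout: a NULL json_out falls back to json_in. The a ?: b form he uses omits the middle operand. That is a GNU C extension, which GCC and Clang accept:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct escape_test {
    const char *json_in;   /* content of JSON string to parse */
    const char *utf8_out;  /* expected parse output */
    const char *json_out;  /* expected unparse output; NULL means json_in */
};

static const char *expected_json_out(const struct escape_test *t)
{
    /* portable spelling of  t->json_out ?: t->json_in  */
    return t->json_out ? t->json_out : t->json_in;
}
```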

* Re: [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings
  2018-08-09 13:57   ` Eric Blake
@ 2018-08-10 14:15     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:15 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Merge a few closely related test strings, and drop a few redundant
>> ones.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/check-qjson.c | 14 ++------------
>>   1 file changed, 2 insertions(+), 12 deletions(-)
>>
>
>>       } test_cases[] = {
>> -        { "\\b", "\b" },
>> -        { "\\f", "\f" },
>> -        { "\\n", "\n" },
>> -        { "\\r", "\r" },
>> -        { "\\t", "\t" },
>> -        { "/", "/" },
>> -        { "\\/", "/", .skip = 1 },
>> -        { "\\\\", "\\" },
>> -        { "\\\"", "\"" },
>> -        { "hello world \\\"embedded string\\\"",
>> -          "hello world \"embedded string\"" },
>> -        { "hello world\\nwith new line", "hello world\nwith new line" },
>> +        { "\\b\\f\\n\\r\\t\\\\\\\"", "\b\f\n\r\t\\\"" },
>> +        { "\\/\\'", "/'", .skip = 1 },
>
> Aha - this adds coverage of the escaped ' not present in 7/56.

You're right.  I'll move that part there.

> (Still nothing about the unescaped versions of ' or " with correct
> quoting).
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!
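
The following sketch shows the mapping the merged test strings check, including QEMU's extra `\'` escape. `decode_simple_escapes()` is a hypothetical helper, not QEMU's parser. It rejects any other character after the backslash and leaves out `\uXXXX`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map the character after '\\' to its value; -1 if not a simple escape. */
static int simple_escape(char c)
{
    switch (c) {
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case '/':  return '/';
    case '\\': return '\\';
    case '"':  return '"';
    case '\'': return '\'';   /* QEMU extension, not in RFC 7159 */
    default:   return -1;     /* includes 'u', not handled here */
    }
}

/* Decode JSON string contents into out.  Returns 0 on success, -1 on an
 * invalid or unsupported escape, or if out is too small. */
static int decode_simple_escapes(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    assert(outsz > 0);
    for (; *in; in++) {
        int c = (unsigned char)*in;

        if (c == '\\') {
            c = simple_escape(*++in);   /* a trailing '\0' is rejected */
            if (c < 0) {
                return -1;
            }
        }
        if (n + 1 >= outsz) {
            return -1;
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return 0;
}
```

An invalid escape such as `\@` makes it fail, as does a string ending in a lone backslash.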


* Re: [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2
  2018-08-09 14:03   ` Eric Blake
@ 2018-08-10 14:16     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:16 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Cover surrogates, invalid escapes, and noncharacters.  This
>> demonstrates that valid surrogate pairs are misinterpreted, and
>> invalid surrogates and noncharacters aren't rejected.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/check-qjson.c | 53 ++++++++++++++++++++++++++++++++++++++-------
>>   1 file changed, 45 insertions(+), 8 deletions(-)
>>
>
>> +        { "\\u12x", NULL },
>> +        { "\\u123x", NULL },
>> +        { "\\u12345", "\341\210\2645" },
>> +        { "\\u12345", "\341\210\2645" },
>
> Why is this one duplicated?

Editing accident, fixing.

> Otherwise,
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!
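
The `\\u12345` case passes because a `\u` escape consumes exactly four hex digits, so the trailing `5` is ordinary text. The sketch below uses two hypothetical helpers, `parse_u_escape()` and `utf8_encode_bmp()`. It is limited to the Basic Multilingual Plane and does no surrogate handling:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parse "\\u" plus exactly four hex digits at p; return the value, or
 * -1 if fewer than four hex digits follow (stops at '\0'). */
static long parse_u_escape(const char *p)
{
    long v = 0;

    if (p[0] != '\\' || p[1] != 'u') {
        return -1;
    }
    for (int i = 2; i < 6; i++) {
        unsigned char c = (unsigned char)p[i];

        if (!isxdigit(c)) {
            return -1;
        }
        v = v * 16 + (isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    return v;
}

/* Encode a code point <= U+FFFF as UTF-8 into buf (at least 4 bytes). */
static void utf8_encode_bmp(long cp, char *buf)
{
    assert(cp >= 0 && cp <= 0xFFFF);
    if (cp < 0x80) {
        buf[0] = (char)cp;
        buf[1] = '\0';
    } else if (cp < 0x800) {
        buf[0] = (char)(0xC0 | (cp >> 6));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        buf[2] = '\0';
    } else {
        buf[0] = (char)(0xE0 | (cp >> 12));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        buf[3] = '\0';
    }
}
```

U+1234 encodes as `\341\210\264`, which is why the expected output for `\\u12345` is `"\341\210\2645"`.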


* Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
  2018-08-09 14:17   ` Eric Blake
@ 2018-08-10 14:18     ` Markus Armbruster
  2018-08-10 14:59       ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:18 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> utf8_string() tests only double quoted strings.  Cover single quoted
>> strings, too: store the strings to test without quotes, then wrap them
>> in either kind of quote.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   tests/check-qjson.c | 427 ++++++++++++++++++++++----------------------
>>   1 file changed, 214 insertions(+), 213 deletions(-)
>>
>
> Pre-existing, but:
>
>>           /* 2.2.4  4 bytes U+1FFFFF */
>
> Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that
> is not valid Unicode, even if it IS a valid interpretation of UTF-8
> encoding.

Correct.  Testing how we handle such sequences makes sense all the same.

>>           {
>> -            "\"\xF7\xBF\xBF\xBF\"",
>> +            "\xF7\xBF\xBF\xBF",
>>               NULL,               /* bug: rejected */
>> -            "\"\\uFFFD\"",
>> +            "\\uFFFD",
>>               "\xF7\xBF\xBF\xBF",
>>           },
>>           /* 2.2.5  5 bytes U+3FFFFFF */
>
> Which makes this one also questionable,
>
>>           {
>> -            "\"\xFB\xBF\xBF\xBF\xBF\"",
>> +            "\xFB\xBF\xBF\xBF\xBF",
>>               NULL,               /* bug: rejected */
>> -            "\"\\uFFFD\"",
>> +            "\\uFFFD",
>>               "\xFB\xBF\xBF\xBF\xBF",
>>           },
>>           /* 2.2.6  6 bytes U+7FFFFFFF */
>
> and this one.
>
>>           {
>>               /* last one in last plane: U+10FFFD */
>> -            "\"\xF4\x8F\xBF\xBD\"",
>>               "\xF4\x8F\xBF\xBD",
>> -            "\"\\uDBFF\\uDFFD\""
>> +            "\xF4\x8F\xBF\xBD",
>> +            "\\uDBFF\\uDFFD"
>>           },
>>           {
>>               /* first one beyond Unicode range: U+110000 */
>
> while these are reasonable.
>
> The conversion of the initializer looks sane (well, mechanical).  Ergo:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!
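
The following sketch makes the range argument concrete. U+10FFFF is the largest code point a UTF-16 surrogate pair can express, while the four-byte UTF-8 pattern can encode up to U+1FFFFF. It uses hypothetical helpers `decode_utf8_4()` (no range check) and `combine_surrogates()`, applied to byte sequences from the test table:

```c
#include <assert.h>

#define UNICODE_MAX 0x10FFFFL

/* Decode a 4-byte UTF-8 sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
 * without any range check; -1 if the bit pattern is wrong. */
static long decode_utf8_4(const unsigned char *s)
{
    assert(s);
    if ((s[0] & 0xF8) != 0xF0) {
        return -1;
    }
    for (int i = 1; i < 4; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;
        }
    }
    return ((long)(s[0] & 0x07) << 18) | ((long)(s[1] & 0x3F) << 12)
        | ((long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}

/* Combine a UTF-16 surrogate pair into a code point; -1 if invalid. */
static long combine_surrogates(long hi, long lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
        return -1;
    }
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}
```

`\xF7\xBF\xBF\xBF` decodes to U+1FFFFF, past the end of Unicode. `\xF4\x8F\xBF\xBD` and the pair `\uDBFF\uDFFD` both yield U+10FFFD.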


* Re: [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters
  2018-08-09 18:26   ` Eric Blake
@ 2018-08-10 14:26     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:26 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Fix the lexer to reject unescaped control characters in JSON strings,
>> in accordance with RFC 7159.
>
> Question - can this break existing QMP clients that were relying on
> this extension working?

In theory, yes.

The "extension" is undocumented.  That makes it a bug.

I'm not aware of clients relying on it.

> Libvirt used to use libyajl, now it uses libjansson. So I'll check
> both of those libraries:
>
> yajl: https://github.com/lloyd/yajl/blob/master/src/yajl_encode.c#L32
>
>             default:
>                 if ((unsigned char) str[end] < 32) {
>                     CharToHex(str[end], hexBuf + 4);
>                     escaped = hexBuf;
>
> jansson: https://github.com/akheron/jansson/blob/master/src/dump.c#L101
>
>             /* mandatory escape or control char */
>             if(codepoint == '\\' || codepoint == '"' || codepoint < 0x20)
>
> Okay, both libraries appear to always send control characters encoded,
> and thus were not relying on this accidental QMP extension.
>
> Are we worried about other clients?

Breakage seems unlikely to me.

>> Bonus: we now recover more nicely from unclosed strings.  E.g.
>>
>>      {"one: 1}\n{"two": 2}
>>
>> now recovers cleanly after the newline, where before the lexer
>> remained confused until the next unpaired double quote or lexical
>> error.
>
> On that grounds alone, I could live with this patch, even if we end up
> having to revert it later if some client was actually depending on
> sending raw control characters as part of a string.

Having to revert the patch to stay bug-compatible wouldn't be exactly
terrible.

> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!
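
A toy scanner can illustrate the recovery described above. This is a hypothetical sketch, not QEMU's table-driven lexer. It tracks only double-quoted strings and ignores escapes and all other tokens. When `reject_ctl` is set, it treats a control character inside a string as a lexical error that returns it to the start state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct lex_result {
    int errors;       /* lexical errors */
    int strings;      /* complete double-quoted strings */
    char last[64];    /* contents of the last complete string */
    bool in_string;   /* input ended inside an unterminated string */
};

static struct lex_result lex_strings(const char *in, bool reject_ctl)
{
    struct lex_result r = { 0 };
    char buf[64];
    size_t n = 0;
    bool in_str = false;

    assert(in);
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (!in_str) {
            if (*p == '"') {      /* other tokens are ignored here */
                in_str = true;
                n = 0;
            }
        } else if (*p == '"') {
            buf[n] = '\0';
            memcpy(r.last, buf, n + 1);
            r.strings++;
            in_str = false;
        } else if (reject_ctl && *p < 0x20) {
            r.errors++;           /* lexical error: back to start state */
            in_str = false;
        } else if (n < sizeof(buf) - 1) {
            buf[n++] = (char)*p;
        }
    }
    r.in_string = in_str;
    return r;
}
```

With `reject_ctl`, the newline ends the broken string and `"two"` lexes normally. Without it, the scanner swallows `one: 1}\n{` as one string and ends inside another unterminated one.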


* Re: [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state Markus Armbruster
@ 2018-08-10 14:30   ` Eric Blake
  2018-08-17  8:37     ` Markus Armbruster
  2018-08-17 11:16     ` Markus Armbruster
  0 siblings, 2 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-10 14:30 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Section "QGA Synchronization" specifies that sending "a raw 0xFF
> sentinel byte" makes the server "reset its state and discard all
> pending data prior to the sentinel."  What actually happens there is a
> lexical error, which will produce one or more error responses.
> Moreover, it's not specific to QGA.

Hoisting my review of this, as you may want to move it sooner in the series.

> 
> Create new section "Forcing the JSON parser into known-good state" to
> document the technique properly.  Rewrite section "QGA
> Synchronization" to document just the other direction, i.e. command
> guest-sync-delimited.
> 
> Section "Protocol Specification" mentions "synchronization bytes
> (documented below)".  Delete that.
> 
> While there, fix it not to claim '"Server" is QEMU itself', but
> '"Server" is either QEMU or the QEMU Guest Agent'.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   docs/interop/qmp-spec.txt | 37 ++++++++++++++++++++++++-------------
>   1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/docs/interop/qmp-spec.txt b/docs/interop/qmp-spec.txt
> index 1566b8ae5e..d4a42fe2cc 100644
> --- a/docs/interop/qmp-spec.txt
> +++ b/docs/interop/qmp-spec.txt
> @@ -20,9 +20,9 @@ operating system.
>   2. Protocol Specification
>   =========================
>   
> -This section details the protocol format. For the purpose of this document
> -"Client" is any application which is using QMP to communicate with QEMU and
> -"Server" is QEMU itself.
> +This section details the protocol format. For the purpose of this
> +document, "Server" is either QEMU or the QEMU Guest Agent, and
> +"Client" is any application communicating with it via QMP.
>   

Broadens the term "QMP" to mean any client speaking to a qemu 
machine-readable server (previously, we tended to treat "QMP" as the 
direct-to-qemu service, and "QGA" as the guest agent service). I can 
live with that, especially since this document was already mentioning QGA.

>   JSON data structures, when mentioned in this document, are always in the
>   following format:
> @@ -34,9 +34,8 @@ by the JSON standard:
>   
>   http://www.ietf.org/rfc/rfc7159.txt
>   
> -The protocol is always encoded in UTF-8 except for synchronization
> -bytes (documented below); although thanks to json-string escape
> -sequences, the server will reply using only the strict ASCII subset.
> +The server expects its input to be encoded in UTF-8, and sends its
> +output encoded in ASCII.
>   

Perhaps worth documenting is the range of JSON numbers produced by qemu 
(maybe as a separate patch). Libvirt just hit a bug with the jansson 
library making it extremely difficult to parse JSON containing numbers 
larger than INT64_MAX, when compared to yajl which had a way to support 
up to UINT64_MAX.

https://bugzilla.redhat.com/show_bug.cgi?id=1614569

Knowing that qemu sends numbers larger than INT64_MAX with the intent 
that they not be truncated/rounded by conversion to double can be a 
vital piece of information for implementing a client, when it comes to 
picking a particular library for JSON parsing.
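
For a client, this means integer tokens above INT64_MAX must be parsed as unsigned 64-bit values rather than converted to `double`. A minimal sketch using only standard `strtoll()`/`strtoull()` follows. `parse_json_int()` and `struct json_int` are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct json_int {
    bool is_unsigned;   /* true if the value only fits in uint64_t */
    int64_t s;
    uint64_t u;
};

/* Parse a JSON integer token; false if it fits neither int64_t nor
 * uint64_t, or is not a pure integer. */
static bool parse_json_int(const char *tok, struct json_int *out)
{
    char *end;
    long long s;
    unsigned long long u;

    assert(tok && out);
    errno = 0;
    s = strtoll(tok, &end, 10);
    if (errno == 0 && end != tok && *end == '\0') {
        out->is_unsigned = false;
        out->s = s;
        return true;
    }
    if (tok[0] == '-' || errno != ERANGE) {
        return false;   /* below INT64_MIN, empty, or trailing junk */
    }
    errno = 0;
    u = strtoull(tok, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false;   /* above UINT64_MAX */
    }
    out->is_unsigned = true;
    out->u = u;
    return true;
}
```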

>   For convenience, json-object members mentioned in this document will
>   be in a certain order. However, in real protocol usage they can be in
> @@ -215,16 +214,28 @@ Some events are rate-limited to at most one per second.  If additional
>   dropped, and the last one is delayed.  "Similar" normally means same
>   event type.  See qmp-events.txt for details.
>   
> -2.6 QGA Synchronization
> +2.6 Forcing the JSON parser into known-good state
> +-------------------------------------------------
> +
> +Incomplete or invalid input can leave the server's JSON parser in a
> +state where it can't parse additional commands.  To get it back into
> +known-good state, the client should provoke a lexical error.
> +
> +The cleanest way to do that is sending an ASCII control character
> +other than '\t' (horizontal tab), '\r' (carriage return), and '\n'

s/and/or/

> +(new line).
> +
> +Sadly, older versions of QEMU can fail to flag this as an error.  If a
> +client needs to deal with them, it should send a 0xFF byte.

Here's where we have the choice of whether to document 0xff as an
intentional parser reset, instead of a lexical error. If so, the advice
to provoke a lexical error via an ASCII control character (I would most
likely use 0x00 NUL or 0x1b ESC) vs. an intentional use of 0xff may need
different wording here.

But if you don't want to give 0xff any more special treatment than what 
it already has as a lexical error (and that ALL lexical errors result in 
a stream reset, but possibly after emitting error messages), then this 
wording seems okay.
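
Here is a client-side sketch of the technique, assuming an already connected socket `fd`. `qmp_force_parser_reset()` is a hypothetical helper. It sends ESC (0x1B), one of the control characters suggested above, or 0xFF for older QEMU. The server may still respond with one or more error messages, which the client has to read and discard:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Write one byte that provokes a lexical error in the server's JSON
 * parser.  Returns 0 on success, -1 on write failure. */
static int qmp_force_parser_reset(int fd, bool old_qemu)
{
    /* ESC is an ASCII control character other than '\t', '\r', '\n'. */
    const unsigned char byte = old_qemu ? 0xFF : 0x1B;
    ssize_t n;

    assert(fd >= 0);
    do {
        n = write(fd, &byte, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}
```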

> +
> +2.7 QGA Synchronization
>   -----------------------
>   
>   When using QGA, an additional synchronization feature is built into
> -the protocol.  If the Client sends a raw 0xFF sentinel byte (not valid
> -JSON), then the Server will reset its state and discard all pending
> -data prior to the sentinel.  Conversely, if the Client makes use of
> -the 'guest-sync-delimited' command, the Server will send a raw 0xFF
> -sentinel byte prior to its response, to aid the Client in discarding
> -any data prior to the sentinel.
> +the protocol. If the Client makes use of the 'guest-sync-delimited'
> +command, the Server will send a raw 0xFF sentinel byte prior to its
> +response, to aid the Client in discarding any data prior to the
> +sentinel.

Maybe worth mentioning "including error messages reported about any 
lexical errors received prior to the guest-sync-delimited command"

>   
>   
>   3. QMP Examples
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation
  2018-08-09 18:49   ` Eric Blake
@ 2018-08-10 14:31     ` Markus Armbruster
  2018-08-10 15:02       ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:31 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-lexer.c | 80 +++++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 71 insertions(+), 9 deletions(-)
>>
>
>> + *
>> + * [Numbers:]
>
> Worth also calling out:
>
> [Objects:]
>       object = begin-object [ member *( value-separator member ) ]
>                end-object
>
>       member = string name-separator value
> [Arrays:]
>    array = begin-array [ value *( value-separator value ) ] end-array
>
> so as to completely cover the RFC grammar?

Should this go into json-parser.c?

>> + *
>> + * Extensions over RFC 7159:
>> + * - Extra escape sequence in strings:
>> + *   0x27 (apostrophe) is recognized after escape, too
>> + * - Single-quoted strings:
>> + *   Like double-quoted strings, except they're delimited by %x27
>> + *   (apostrophe) instead of %x22 (quotation mark), and can't contain
>> + *   unescaped apostrophe, but can contain unescaped quotation mark.
>> + * - Interpolation:
>> + *   interpolation = %((l|ll|I64)[du]|[ipsf])
>
> Not in your series, but we recently discussed adding %% (only inside
> strings); coupled with enforcing that all other interpolation occurs
> outside of strings.  I guess we can update this comment at that time.

Message-ID: <87bmaoszf0.fsf@dusky.pond.sub.org>
https://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg05844.html

I meant to do that in this series, but got overwhelmed by all the other
stuff, and forgot.  Thanks for the reminder.  I may still do it in v2.
If not, we can do it on top.
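
The interpolation grammar is small enough to match by hand. The hypothetical `match_interpolation()` below returns the length of the interpolation token at a given position. It implements only the grammar in the comment above, so the proposed `%%` is not recognized:

```c
#include <assert.h>
#include <string.h>

/* Length of the token matching %((l|ll|I64)[du]|[ipsf]) at p, or 0. */
static size_t match_interpolation(const char *p)
{
    size_t n;   /* length of '%' plus the length modifier */

    assert(p);
    if (p[0] != '%') {
        return 0;
    }
    if (p[1] != '\0' && strchr("ipsf", p[1])) {
        return 2;                     /* %i %p %s %f */
    }
    if (strncmp(p + 1, "ll", 2) == 0) {
        n = 3;
    } else if (strncmp(p + 1, "I64", 3) == 0) {
        n = 4;
    } else if (p[1] == 'l') {
        n = 2;
    } else {
        return 0;
    }
    return (p[n] == 'd' || p[n] == 'u') ? n + 1 : 0;
}
```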

>> + *
>> + * Note:
>> + * - Input must be encoded in UTF-8.
>> + * - Decoding and validating is left to the parser.
>>    */
>>     enum json_lexer_state {
>>
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences
  2018-08-09 22:16   ` Eric Blake
@ 2018-08-10 14:40     ` Markus Armbruster
  2018-08-10 15:21       ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-10 14:40 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
>> \xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
>> invalid UTF-8 not containing these bytes, as demonstrated by
>> check-qjson:
>>
>> * Malformed sequences
>>
>>    - Unexpected continuation bytes
>>
>>    - Missing continuation bytes after start bytes other than
>>      \xC0..\xC1, \xF5..\xFD.
>>
>> * Overlong sequences with start bytes other than \xC0..\xC1,
>>    \xF5..\xFD.
>>
>> * Invalid code points
>>
>> Fixing this in the lexer would be bothersome.  Fixing it in the parser
>> is straightforward, so do that.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>
>> @@ -193,12 +198,15 @@ static QString *qstring_from_escaped_str(JSONParserContext *ctxt,
>>                   goto out;
>>               }
>>           } else {
>> -            char dummy[2];
>> -
>> -            dummy[0] = *ptr;
>> -            dummy[1] = 0;
>> -
>> -            qstring_append(str, dummy);
>> +            cp = mod_utf8_codepoint(ptr, 6, &end);
>
> Why are you hard-coding 6 here, rather than computing min(6,
> strchr(ptr,0)-ptr)?  If the user passes an invalid sequence at the end
> of the string, can we end up making mod_utf8_codepoint() read beyond
> the end of our string?  Would it be better to just always pass the
> remaining string length (mod_utf8_codepoint() only cares about
> stopping short of 6 bytes, but never reads beyond there even if you
> pass a larger number)?

mod_utf8_codepoint() never reads beyond '\0'.  The second parameter
exists only so you can further limit reads.  I like to provide that
capability, because it sometimes saves a silly substring copy.
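
The contract described here (examine at most `n` bytes, never read past `'\0'`) can be illustrated with a small decoder. This is a hypothetical `utf8_decode_bounded()`, not QEMU's `mod_utf8_codepoint()`. It rejects surrogates, values above U+10FFFF, and all overlong forms, including `\xC0\x80`, which patch 24 of this series accepts as U+0000:

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence from s, examining at most n bytes and never
 * reading past a '\0'.  *end is set past the sequence, or to the first
 * byte that does not fit.  Returns the code point, or -1. */
static long utf8_decode_bounded(const char *s, size_t n, const char **end)
{
    static const long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;
    size_t len, i;
    long cp;

    assert(s && end);
    *end = s;
    if (n == 0 || p[0] == '\0') {
        return -1;
    }
    if (p[0] < 0x80) {
        len = 1; cp = p[0];
    } else if ((p[0] & 0xE0) == 0xC0) {
        len = 2; cp = p[0] & 0x1F;
    } else if ((p[0] & 0xF0) == 0xE0) {
        len = 3; cp = p[0] & 0x0F;
    } else if ((p[0] & 0xF8) == 0xF0) {
        len = 4; cp = p[0] & 0x07;
    } else {
        *end = s + 1;
        return -1;   /* stray continuation byte or invalid start byte */
    }
    for (i = 1; i < len; i++) {
        /* i < n is checked before p[i] is read; '\0' is not 10xxxxxx */
        if (i >= n || (p[i] & 0xC0) != 0x80) {
            *end = s + i;
            return -1;
        }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *end = s + len;
    if (cp < min_cp[len - 1] || cp > 0x10FFFF
        || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;   /* overlong, out of range, or surrogate */
    }
    return cp;
}
```

Passing a fixed 6 is safe with this contract: a truncated sequence at the end of the string stops at the terminator whatever `n` is.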


>> +            if (cp <= 0) {
>> +                parse_error(ctxt, token, "invalid UTF-8 sequence in string");
>> +                goto out;
>> +            }
>> +            ptr = end - 1;
>> +            len = mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp);
>> +            assert(len >= 0);
>> +            qstring_append(str, utf8_buf);
>>           }
>>       }
>>   
>
>> +++ b/util/unicode.c
>> @@ -13,6 +13,21 @@
>>   #include "qemu/osdep.h"
>>   #include "qemu/unicode.h"
>>   
>
>> +ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint)
>> +{
>> +    assert(bufsz >= 5);
>> +
>> +    if (!is_valid_codepoint(codepoint)) {
>> +        return -1;
>> +    }
>> +
>> +    if (codepoint > 0 && codepoint <= 0x7F) {
>> +        buf[0] = codepoint & 0x7F;
>
> Dead use of binary &. But acceptable for symmetry with the other code
> branches.

Exactly as dead as ...

>> +        buf[1] = 0;
>> +        return 1;
>> +    }
>> +    if (codepoint <= 0x7FF) {
>> +        buf[0] = 0xC0 | ((codepoint >> 6) & 0x1F);

... this one, and ...

>> +        buf[1] = 0x80 | (codepoint & 0x3F);
>> +        buf[2] = 0;
>> +        return 2;
>> +    }
>> +    if (codepoint <= 0xFFFF) {
>> +        buf[0] = 0xE0 | ((codepoint >> 12) & 0x0F);

... this one, and ...

>> +        buf[1] = 0x80 | ((codepoint >> 6) & 0x3F);
>> +        buf[2] = 0x80 | (codepoint & 0x3F);
>> +        buf[3] = 0;
>> +        return 3;
>> +    }
>> +    buf[0] = 0xF0 | ((codepoint >> 18) & 0x07);

... even this one.

The last one only because is_valid_codepoint() rejects codepoints >
0x10FFFFu, which is admittedly a non-local argument.

I'm debating whether to keep or drop the redundant masking.  Got a
preference?

>> +    buf[1] = 0x80 | ((codepoint >> 12) & 0x3F);
>> +    buf[2] = 0x80 | ((codepoint >> 6) & 0x3F);
>> +    buf[3] = 0x80 | (codepoint & 0x3F);
>> +    buf[4] = 0;
>> +    return 4;
>> +}
>>
>
> Overall, looks nice.

Thanks!


* Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
  2018-08-10 14:18     ` Markus Armbruster
@ 2018-08-10 14:59       ` Eric Blake
  2018-08-13  6:11         ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 14:59 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/10/2018 09:18 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>>> utf8_string() tests only double quoted strings.  Cover single quoted
>>> strings, too: store the strings to test without quotes, then wrap them
>>> in either kind of quote.
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>    tests/check-qjson.c | 427 ++++++++++++++++++++++----------------------
>>>    1 file changed, 214 insertions(+), 213 deletions(-)
>>>
>>
>> Pre-existing, but:
>>
>>>            /* 2.2.4  4 bytes U+1FFFFF */
>>
>> Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that
>> is not valid Unicode, even if it IS a valid interpretation of UTF-8
>> encoding.
> 
> Correct.  Testing how we handle such sequences makes sense all the same.
> 
>>>            {
>>> -            "\"\xF7\xBF\xBF\xBF\"",
>>> +            "\xF7\xBF\xBF\xBF",
>>>                NULL,               /* bug: rejected */

So maybe all we need to do is remove the comment (as we WANT
to reject these)?

>>
>> The conversion of the initializer looks sane (well, mechanical).  Ergo:
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> Thanks!

Of course, playing games with the pre-existing comments on out-of-range 
behavior is probably better for a separate patch, and you do have some 
churn on these tests in later patches. I'll leave it up to you what to
do (or leave alone).



* Re: [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation
  2018-08-10 14:31     ` Markus Armbruster
@ 2018-08-10 15:02       ` Eric Blake
  2018-08-13  6:12         ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 15:02 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/10/2018 09:31 AM, Markus Armbruster wrote:

>>> + *
>>> + * [Numbers:]
>>
>> Worth also calling out:
>>
>> [Objects:]
>>        object = begin-object [ member *( value-separator member ) ]
>>                 end-object
>>
>>        member = string name-separator value
>> [Arrays:]
>>     array = begin-array [ value *( value-separator value ) ] end-array
>>
>> so as to completely cover the RFC grammar?
> 
> Should this go into json-parser.c?

Perhaps. After all, the lexer does nothing special for any of those 
constructs; they are where we really have moved into the parser phase.


>>> + * - Interpolation:
>>> + *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>
>> Not in your series, but we recently discussed adding %% (only inside
>> strings); coupled with enforcing that all other interpolation occurs
>> outside of strings.  I guess we can update this comment at that time.
> 
> Message-ID: <87bmaoszf0.fsf@dusky.pond.sub.org>
> https://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg05844.html
> 
> I meant to do that in this series, but got overwhelmed by all the other
> stuff, and forgot.  Thanks for the reminder.  I may still do it in v2.
> If not, we can do it on top.

Here's where I first attempted it, if it helps.

https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg00603.html



* Re: [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences
  2018-08-10 14:40     ` Markus Armbruster
@ 2018-08-10 15:21       ` Eric Blake
  2018-08-16 14:50         ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 15:21 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/10/2018 09:40 AM, Markus Armbruster wrote:

>>> +            cp = mod_utf8_codepoint(ptr, 6, &end);
>>
>> Why are you hard-coding 6 here, rather than computing min(6,
>> strchr(ptr,0)-ptr)?  If the user passes an invalid sequence at the end
>> of the string, can we end up making mod_utf8_codepoint() read beyond
>> the end of our string?  Would it be better to just always pass the
>> remaining string length (mod_utf8_codepoint() only cares about
>> stopping short of 6 bytes, but never reads beyond there even if you
>> pass a larger number)?
> 
> mod_utf8_codepoint() never reads beyond '\0'.  The second parameter
> exists only so you can further limit reads.  I like to provide that
> capability, because it sometimes saves a silly substring copy.

Okay. Perhaps the comment on mod_utf8_codepoint() could make it clearer
that the contract is not violated (I didn't spot it without a
close re-read of the code, prompted by your reply).  But that's possibly
a separate patch.


>>> +    if (codepoint > 0 && codepoint <= 0x7F) {
>>> +        buf[0] = codepoint & 0x7F;
>>
>> Dead use of binary &. But acceptable for symmetry with the other code
>> branches.
> 
> Exactly as dead as ...
> 

>>> +    buf[0] = 0xF0 | ((codepoint >> 18) & 0x07);
> 
> ... even this one.
> 
> The last one only because is_valid_codepoint() rejects codepoints >
> 0x10FFFFu, which is admittedly a non-local argument.
> 
> I'm debating whether to keep or drop the redundant masking.  Got a
> preference?

No strong preference. A compiler with good range propagation during 
optimization should be able to eliminate the dead mask from the emitted 
assembly.
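
The redundancy can also be checked mechanically. The sketch below rebuilds the branch structure of `mod_utf8_encode()` quoted above, with and without the lead-byte masks. Validity checks are omitted, and `encode_variant()` and `first_mask_difference()` are hypothetical helpers. It searches for the first code point where the two variants disagree:

```c
#include <assert.h>
#include <string.h>

/* Branches of mod_utf8_encode(), lead-byte masks optional. */
static int encode_variant(char buf[5], long cp, int masked)
{
    if (cp > 0 && cp <= 0x7F) {
        buf[0] = (char)(masked ? (cp & 0x7F) : cp);
        buf[1] = 0;
        return 1;
    }
    if (cp <= 0x7FF) {
        buf[0] = (char)(0xC0 | (masked ? ((cp >> 6) & 0x1F) : (cp >> 6)));
        buf[1] = (char)(0x80 | (cp & 0x3F));
        buf[2] = 0;
        return 2;
    }
    if (cp <= 0xFFFF) {
        buf[0] = (char)(0xE0 | (masked ? ((cp >> 12) & 0x0F) : (cp >> 12)));
        buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (cp & 0x3F));
        buf[3] = 0;
        return 3;
    }
    buf[0] = (char)(0xF0 | (masked ? ((cp >> 18) & 0x07) : (cp >> 18)));
    buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (char)(0x80 | (cp & 0x3F));
    buf[4] = 0;
    return 4;
}

/* First code point in [0, max] where the variants differ, or -1. */
static long first_mask_difference(long max)
{
    char a[5], b[5];

    assert(max >= 0);
    for (long cp = 0; cp <= max; cp++) {
        int la = encode_variant(a, cp, 1);
        int lb = encode_variant(b, cp, 0);

        if (la != lb || memcmp(a, b, (size_t)la + 1) != 0) {
            return cp;
        }
    }
    return -1;
}
```

Over 0 to U+10FFFF the variants agree. The first difference is at U+200000, so the last mask is dead only because `is_valid_codepoint()` caps the range.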



* Re: [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error Markus Armbruster
@ 2018-08-10 15:25   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-10 15:25 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Quiz time!  When a parser reports multiple errors, but the user gets
> to see just one, which one is (on average) the least useful one?

:)

> 
> Reproducer: feeding
> 
>      {"abc\xC2ijk": 1}\n
> 
> to QMP produces
> 
>      {"error": {"class": "GenericError", "desc": "JSON parse error, key is not a string in object"}}
> 
> Report the first error instead.  The reproducer now produces
> 
>      {"error": {"class": "GenericError", "desc": "JSON parse error, invalid UTF-8 sequence in string"}}

Yes, definite improvement.
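
The policy the patch adopts, where the first error wins, fits in a few lines. This is a hypothetical sketch with a fixed-size message buffer, not QEMU's `parse_error()`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct parser_ctx {
    char err[128];   /* first error message; empty if none yet */
};

/* Record msg only if no error has been recorded yet. */
static void report_error(struct parser_ctx *ctx, const char *msg)
{
    assert(ctx && msg);
    if (ctx->err[0] == '\0') {
        snprintf(ctx->err, sizeof(ctx->err), "JSON parse error, %s", msg);
    }
}
```

Feeding it the two errors from the reproducer, in order, keeps the UTF-8 message and drops the follow-on one.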

Reviewed-by: Eric Blake <eblake@redhat.com>



* Re: [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser Markus Armbruster
@ 2018-08-10 15:36   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-10 15:36 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Both the lexer and the parser (attempt to) validate UTF-8 in JSON
> strings.
> 

> 
> The commit before the previous one made the parser reject invalid UTF-8
> sequences.  Since then, anything the lexer rejects, the parser would
> reject as well.  Thus, the lexer's rejecting is unnecessary for
> correctness, and harmful for error reporting.

Nice analysis.

> 
> However, we want to keep rejecting ASCII control characters in the
> lexer, because that produces the behavior we want for unclosed
> strings.
> 
> We also need to keep rejecting \xFF in the lexer, because we
> documented that as a way to reset the JSON parser
> (docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
> means we can't change how we recover from this error now.  I wish we
> hadn't done that.

Or we could, as a design decision, give 0xff the special meaning of
resetting the lexer without also emitting an error message. (Doesn't
change this patch - that would be a change on top).

> 
> I think we should treat \xFE the same as \xFF.

Reasonable, as it would cover byte-order-marks.

> 
> Change the lexer to accept \xC0..\xC1 and \xF5..\xFD.  It now rejects
> only \x00..\x1F and \xFE..\xFF.  Error reporting for invalid UTF-8 in
> strings is much improved, except for \xFE and \xFF.  For the example
> above, the lexer now produces
> 
>      JSON_LCURLY   {
>      JSON_STRING   "abc\xC0\xAFijk"
>      JSON_COLON    :
>      JSON_INTEGER  1
>      JSON_RCURLY
> 
> and the parser reports just
> 
>      JSON parse error, invalid UTF-8 sequence in string
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>
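
The byte-level rule the commit message describes for string contents fits in one predicate. This is a hypothetical sketch (`lexer_accepts_in_string()` is not a QEMU function). Everything it accepts is handed to the parser, which does the real UTF-8 validation:

```c
#include <assert.h>
#include <stdbool.h>

/* After this patch, the lexer rejects only ASCII control characters
 * and 0xFE/0xFF inside strings; other UTF-8 checks are the parser's. */
static bool lexer_accepts_in_string(unsigned char c)
{
    return c >= 0x20 && c <= 0xFD;
}
```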



* Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") Markus Armbruster
@ 2018-08-10 15:48   ` Eric Blake
  2018-08-10 16:09     ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 15:48 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> This is consistent with qobject_to_json().  See commit e2ec3f97680.

Side note: that commit mentions that on output, ASCII DEL (0x7f) is 
always escaped. RFC 7159 does not require it to be escaped on input, but 
I wonder if any of your earlier testsuite improvements should 
specifically cover \x7f vs. \u007f on input being canonicalized to 
\u007f on round trip output.
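
Such a test could pair both input spellings of DEL with one expected output. The sketch below assumes a hypothetical `to_json_contents()` that follows the output rule cited from that commit: quote and backslash are escaped, and control characters and DEL become `\u00xx`. It is not `qobject_to_json()`, and it emits `\u00xx` even where a real encoder might use a short escape such as `\n`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Serialize string contents, escaping '"' and '\\', and writing control
 * characters and DEL (0x7F) as \u00xx.  Output is truncated if full. */
static void to_json_contents(const char *s, char *out, size_t outsz)
{
    size_t n = 0;

    assert(outsz > 0);
    for (; *s && n + 7 <= outsz; s++) {
        unsigned char c = (unsigned char)*s;

        if (c == '"' || c == '\\') {
            out[n++] = '\\';
            out[n++] = (char)c;
        } else if (c < 0x20 || c == 0x7F) {
            n += (size_t)snprintf(out + n, outsz - n, "\\u%04x", c);
        } else {
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
}
```

The parser turns both `\x7f` and `\u007f` into the same byte, so the round-trip test only needs to assert that this byte serializes as `\u007f`.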

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c  | 2 +-
>   qobject/json-parser.c | 2 +-
>   tests/check-qjson.c   | 8 +-------
>   3 files changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
> index ca1e0e2c03..36fb665b12 100644
> --- a/qobject/json-lexer.c
> +++ b/qobject/json-lexer.c
> @@ -93,7 +93,7 @@
>    *   interpolation = %((l|ll|I64)[du]|[ipsf])
>    *
>    * Note:
> - * - Input must be encoded in UTF-8.
> + * - Input must be encoded in modified UTF-8.

Worth documenting this in the QMP doc as an explicit extension? In 
general, our QMP interfaces that take binary input do so via base64 
encoding, rather than via a modified UTF-8 string - and I don't know how 
yajl or jansson would feel about an extension for producing modified 
UTF-8 for QMP to consume if we really did want to pass NUL bytes without 
the overhead of UTF-8; what's more, even if you can pass NUL, you still 
have to worry about all other byte sequences being valid (so base64 is 
still better for true binary data - it's hard to argue that we'd ever 
have an interface where we want UTF-8 including embedded NUL rather than 
true binary).  I guess it can also be argued that outputting modified 
UTF-8 is a violation of JSON, so the fact that we can round-trip NUL 
doesn't help if the client can't read it.

So having typed all that, I guess the answer is no, we don't want to 
document it; for now, the fact that we accept \xc0\x80 on input and 
produce it on output is only for the testsuite, and unlikely to matter 
to any real client of QMP.

Reviewed-by: Eric Blake <eblake@redhat.com>


* Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser Markus Armbruster
@ 2018-08-10 15:56   ` Eric Blake
  2018-08-13  7:05     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 15:56 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Both lexer and parser reject invalid escape sequences in strings.  The
> parser's check is useless.
> 

> 
> Drop the lexer's escape sequence checking, and make it accept the same
> characters after '\' it accepts elsewhere in strings.  It now produces
> 
>      JSON_LCURLY   {
>      JSON_STRING   "abc\@ijk"
>      JSON_COLON    :
>      JSON_INTEGER  1
>      JSON_RCURLY
> 
> and the parser reports just
> 
>      JSON parse error, invalid escape sequence in string
> 
> While there, fix parse_string()'s inaccurate function comment.

Worthwhile improvement.

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c  | 72 +++----------------------------------------
>   qobject/json-parser.c | 56 +++++++++++++++++++--------------
>   2 files changed, 37 insertions(+), 91 deletions(-)

and shorter!

>       [IN_DQ_STRING_ESCAPE] = {
> -        ['b'] = IN_DQ_STRING,
> -        ['f'] =  IN_DQ_STRING,
> -        ['n'] =  IN_DQ_STRING,
> -        ['r'] =  IN_DQ_STRING,
> -        ['t'] =  IN_DQ_STRING,
> -        ['/'] = IN_DQ_STRING,
> -        ['\\'] = IN_DQ_STRING,
> -        ['\''] = IN_DQ_STRING,
> -        ['\"'] = IN_DQ_STRING,
> -        ['u'] = IN_DQ_UCODE0,
> +        [0x20 ... 0xFD] = IN_DQ_STRING,

Among other things, this means the parser now has to flag "\u" as an 
incomplete escape - but your added testsuite coverage earlier in the 
series ensures that we do.
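Here is a sketch of the check the parser takes over, assuming the lexer now passes any character after '\' through unchanged. decode_escape() and hex_val() are hypothetical helpers. They return the code unit an escape denotes, or -1 for an unknown escape such as \@ or an incomplete one such as \u12. Pairing surrogates is left to the caller.

```c
#include <stdint.h>

/* Value of one hex digit, or -1. */
static int hex_val(char ch)
{
    if (ch >= '0' && ch <= '9') {
        return ch - '0';
    }
    if (ch >= 'a' && ch <= 'f') {
        return ch - 'a' + 10;
    }
    if (ch >= 'A' && ch <= 'F') {
        return ch - 'A' + 10;
    }
    return -1;
}

/*
 * p points just past the backslash.  Returns the code unit the escape
 * denotes and sets *consumed to the characters used, or returns -1.
 */
static int32_t decode_escape(const char *p, int *consumed)
{
    int i, d;
    int32_t cp = 0;

    *consumed = 1;
    switch (*p) {
    case '"':  return '"';
    case '\'': return '\'';     /* extension over RFC 7159 */
    case '\\': return '\\';
    case '/':  return '/';
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case 'u':
        for (i = 1; i <= 4; i++) {
            d = hex_val(p[i]);  /* stops at the terminating NUL */
            if (d < 0) {
                return -1;      /* "\u" with fewer than 4 hex digits */
            }
            cp = (cp << 4) | d;
        }
        *consumed = 5;
        return cp;
    default:
        return -1;              /* e.g. "\@": invalid escape */
    }
}
```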

> +++ b/qobject/json-parser.c
> @@ -106,30 +106,40 @@ static int hex2decimal(char ch)
>   }
>   
>   /**
> - * parse_string(): Parse a json string and return a QObject
> + * parse_string(): Parse a JSON string
>    *
> - *  string

> + * From RFC 7159 "The JavaScript Object Notation (JSON) Data
> + * Interchange Format":
> + *
> + *    char = unescaped /
> + *        escape (
> + *            %x22 /          ; "    quotation mark  U+0022
> + *            %x5C /          ; \    reverse solidus U+005C
> + *            %x2F /          ; /    solidus         U+002F
> + *            %x62 /          ; b    backspace       U+0008
> + *            %x66 /          ; f    form feed       U+000C
> + *            %x6E /          ; n    line feed       U+000A
> + *            %x72 /          ; r    carriage return U+000D
> + *            %x74 /          ; t    tab             U+0009
> + *            %x75 4HEXDIG )  ; uXXXX                U+XXXX
> + *    escape = %x5C              ; \
> + *    quotation-mark = %x22      ; "
> + *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
> + *
> + * Extensions over RFC 7159:
> + * - Extra escape sequence in strings:
> + *   0x27 (apostrophe) is recognized after escape, too
> + * - Single-quoted strings:
> + *   Like double-quoted strings, except they're delimited by %x27
> + *   (apostrophe) instead of %x22 (quotation mark), and can't contain
> + *   unescaped apostrophe, but can contain unescaped quotation mark.
> + *
> + * Note:
> + * - Encoding is modified UTF-8.

That is an extension over RFC 7159. But I'm okay with leaving it in the 
Notes section.

> + * - Invalid Unicode characters are rejected.
> + * - Control characters are rejected by the lexer.

Worth being explicit that this is 00-1f, fe, and ff?
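The "unescaped" production quoted above, together with the single-quote extension, fits in a few lines of C. This sketch works on decoded code points, and may_appear_unescaped() is a hypothetical name. Like the RFC grammar, it lets DEL (U+007F) through unescaped.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * May code point cp appear literally inside a string delimited by
 * quote ('"' or '\'')?  Mirrors
 *   unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
 * with the delimiter swapped for single-quoted strings.
 */
static bool may_appear_unescaped(uint32_t cp, char quote)
{
    if (cp < 0x20 || cp > 0x10FFFF) {
        return false;           /* control character or out of range */
    }
    if (cp == '\\') {
        return false;           /* backslash always starts an escape */
    }
    return cp != (uint32_t)(unsigned char)quote;
}
```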

* Re: [Qemu-devel] [PATCH 26/56] json: Simplify parse_string()
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 26/56] json: Simplify parse_string() Markus Armbruster
@ 2018-08-10 15:59   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-10 15:59 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c | 42 +++++++++++++++++++-----------------------
>   1 file changed, 19 insertions(+), 23 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  2018-08-10 15:48   ` Eric Blake
@ 2018-08-10 16:09     ` Eric Blake
  2018-08-13  7:00       ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 16:09 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/10/2018 10:48 AM, Eric Blake wrote:

>>    * Note:
>> - * - Input must be encoded in UTF-8.
>> + * - Input must be encoded in modified UTF-8.
> 
> Worth documenting this in the QMP doc as an explicit extension? In 
> general, our QMP interfaces that take binary input do so via base64 
> encoding, rather than via a modified UTF-8 string - and I don't know how 
> yajl or jansson would feel about an extension for producing modified 
> UTF-8 for QMP to consume if we really did want to pass NUL bytes without 
> the overhead of UTF-8; what's more, even if you can pass NUL, you still 
> have to worry about all other byte sequences being valid (so base64 is 
> still better for true binary data - it's hard to argue that we'd ever 
> have an interface where we want UTF-8 including embedded NUL rather than 
> true binary).  I guess it can also be argued that outputting modified 
> UTF-8 is a violation of JSON, so the fact that we can round-trip NUL 
> doesn't help if the client can't read it.
> 
> So having typed all that, I guess the answer is no, we don't want to 
> document it; for now, the fact that we accept \xc0\x80 on input and 
> produce it on output is only for the testsuite, and unlikely to matter 
> to any real client of QMP.

Actually, I guess we never output \xc0\x80; but would output the C 
string "\\u0000" (since any byte above 0x1f is passed through our UTF 
decoder back into a codepoint then output with \u). So it's really only 
a question of whether our input engine can pass "\x00" vs. "\\u0000" 
when we NEED an input NUL, and except for the testsuite, our QAPI schema 
never really needs an input NUL.

* Re: [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000 Markus Armbruster
@ 2018-08-10 16:10   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-10 16:10 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The JSON parser translates invalid \uXXXX to garbage instead of
> rejecting it, and swallows \u0000.
> 
> Fix by using mod_utf8_encode() instead of flawed wchar_to_utf8().
> 
> Valid surrogate pairs are now differently broken: they're rejected
> instead of translated to garbage.  The next commit will fix them.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c | 35 ++++++-----------------------------
>   tests/check-qjson.c   | 32 +++++++++-----------------------
>   2 files changed, 15 insertions(+), 52 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs Markus Armbruster
@ 2018-08-10 17:18   ` Eric Blake
  2018-08-13  7:07     ` Markus Armbruster
  2018-08-12  9:52   ` Paolo Bonzini
  1 sibling, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-10 17:18 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The JSON parser treats each half of a surrogate pair as unpaired
> surrogate.  Fix it to recognize surrogate pairs.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c | 16 +++++++++++++++-
>   tests/check-qjson.c   |  3 +--
>   2 files changed, 16 insertions(+), 3 deletions(-)
> 

> @@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>                       cp |= hex2decimal(*ptr);
>                   }
>   
> +                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
> +                    && ptr[1] == '\\' && ptr[2] == 'u') {
> +                    ptr += 2;
> +                    leading_surrogate = cp;
> +                    goto hex;
> +                }
> +                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
> +                    cp &= 0x3FF;
> +                    cp |= (leading_surrogate & 0x3FF) << 10;
> +                    cp += 0x010000;
> +                }
> +
>                   if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>                       parse_error(ctxt, token,
>                                   "\\u%.4s is not a valid Unicode character",

Consider "\\udbff\\udfff" - a valid surrogate pair (in terms of being in 
range), but which decodes to U+10FFFF.  Since is_valid_codepoint() (part
of mod_utf8_encode()) rejects it due to (codepoint & 0xfffe) == 0xfffe, 
it means we end up printing this error message, but only using the 
second half of the surrogate pair.  Is that okay?

Otherwise,
Reviewed-by: Eric Blake <eblake@redhat.com>
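To make the U+10FFFF case concrete, here is a sketch of the pair arithmetic from the patch next to a validity check modeled on the one described above. Both functions are hypothetical stand-ins, not QEMU's is_valid_codepoint(). The check also rejects U+FDD0..U+FDEF, the other noncharacter range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the patch: lead 0xD800..0xDBFF, trail 0xDC00..0xDFFF. */
static uint32_t combine_surrogates(uint32_t lead, uint32_t trail)
{
    return 0x10000 + (((lead & 0x3FF) << 10) | (trail & 0x3FF));
}

/* In range, not a surrogate, not a noncharacter. */
static bool codepoint_ok(uint32_t cp)
{
    if (cp > 0x10FFFF) {
        return false;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return false;
    }
    if ((cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE) {
        return false;           /* U+xxFFFE and U+xxFFFF are noncharacters */
    }
    return true;
}
```

Here \uDBFF\uDFFF combines to 0x10FFFF, which fails the noncharacter test, so an error message that quotes only the second \uXXXX misses half of the offending input.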

* Re: [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part Markus Armbruster
@ 2018-08-10 17:19   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-10 17:19 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   tests/check-qjson.c | 11 +++--------
>   1 file changed, 3 insertions(+), 8 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 3d3a3f105f..c8c0ad95a6 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -750,15 +750,10 @@ static void utf8_string(void)
>               qobject_unref(str);
>               g_free(jstr);
>   
> -            /*
> -             * Parse @json_out right back
> -             * Disabled, because qobject_from_json() is buggy, and I can't
> -             * be bothered to add the expected incorrect results.
> -             * FIXME Enable once these bugs have been fixed.
> -             */
> -            if (0 && json_out != json_in) {
> +            /* Parse @json_out right back, unless it has replacements */
> +            if (!strstr(json_out, "\\uFFFD")) {
>                   str = from_json_str(json_out, &error_abort, j);
> -                g_assert_cmpstr(qstring_get_try_str(str), ==, utf8_out);
> +                g_assert_cmpstr(qstring_get_try_str(str), ==, utf8_in);
>               }
>           }
>       }
> 

* Re: [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly Markus Armbruster
@ 2018-08-10 17:22   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-10 17:22 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> json_lexer_init() takes the function to process a token as an
> argument.  It's always json_message_process_token().  Makes the code
> harder to understand for no actual gain.  Drop the indirection.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs Markus Armbruster
  2018-08-10 17:18   ` Eric Blake
@ 2018-08-12  9:52   ` Paolo Bonzini
  2018-08-13  7:12     ` Markus Armbruster
  1 sibling, 1 reply; 162+ messages in thread
From: Paolo Bonzini @ 2018-08-12  9:52 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 14:03, Markus Armbruster wrote:
> +                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
> +                    && ptr[1] == '\\' && ptr[2] == 'u') {
> +                    ptr += 2;
> +                    leading_surrogate = cp;
> +                    goto hex;
> +                }
> +                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
> +                    cp &= 0x3FF;
> +                    cp |= (leading_surrogate & 0x3FF) << 10;
> +                    cp += 0x010000;
> +                }
> +

The leading surrogate is discarded for \uD800\uCAFE, I think.  Is this
desired?

Paolo
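One way to avoid silently discarding the leading surrogate in \uD800\uCAFE is to reject the input whenever a leading surrogate is not immediately followed by a trailing one. The sketch below assumes an input made only of \uXXXX escapes. hex4() and decode_u_escapes() are hypothetical, and rejecting outright is only one possible policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse exactly four hex digits; -1 if any is missing or bad. */
static int32_t hex4(const char *p)
{
    int32_t v = 0;
    int i, d;

    for (i = 0; i < 4; i++) {
        char c = p[i];

        if (c >= '0' && c <= '9') {
            d = c - '0';
        } else if (c >= 'a' && c <= 'f') {
            d = c - 'a' + 10;
        } else if (c >= 'A' && c <= 'F') {
            d = c - 'A' + 10;
        } else {
            return -1;          /* also stops at the terminating NUL */
        }
        v = (v << 4) | d;
    }
    return v;
}

/* Returns the number of code points stored in out, or -1 on error. */
static int decode_u_escapes(const char *s, uint32_t *out, int max)
{
    int n = 0;

    while (*s) {
        int32_t cp, trail;

        if (s[0] != '\\' || s[1] != 'u' || (cp = hex4(s + 2)) < 0) {
            return -1;
        }
        s += 6;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (s[0] != '\\' || s[1] != 'u' || (trail = hex4(s + 2)) < 0
                || trail < 0xDC00 || trail > 0xDFFF) {
                return -1;      /* unpaired leading surrogate */
            }
            cp = 0x10000 + (((cp & 0x3FF) << 10) | (trail & 0x3FF));
            s += 6;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;          /* unpaired trailing surrogate */
        }
        if (n == max) {
            return -1;
        }
        out[n++] = (uint32_t)cp;
    }
    return n;
}
```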

* Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
  2018-08-10 14:59       ` Eric Blake
@ 2018-08-13  6:11         ` Markus Armbruster
  2018-08-13 14:53           ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-13  6:11 UTC (permalink / raw)
  To: Eric Blake; +Cc: marcandre.lureau, qemu-devel, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/10/2018 09:18 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>>
>>> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>>>> utf8_string() tests only double quoted strings.  Cover single quoted
>>>> strings, too: store the strings to test without quotes, then wrap them
>>>> in either kind of quote.
>>>>
>>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>>> ---
>>>>    tests/check-qjson.c | 427 ++++++++++++++++++++++----------------------
>>>>    1 file changed, 214 insertions(+), 213 deletions(-)
>>>>
>>>
>>> Pre-existing, but:
>>>
>>>>            /* 2.2.4  4 bytes U+1FFFFF */
>>>
>>> Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that
>>> is not valid Unicode, even if it IS a valid interpretation of UTF-8
>>> encoding.
>>
>> Correct.  Testing how we handle such sequences makes sense all the same.
>>
>>>>            {
>>>> -            "\"\xF7\xBF\xBF\xBF\"",
>>>> +            "\xF7\xBF\xBF\xBF",
>>>>                NULL,               /* bug: rejected */
>
> So maybe all we need to do is remove the comment (as we WANT
> to reject these)?

Is PATCH 20 doing what you suggest?

>>>
>>> The conversion of the initializer looks sane (well, mechanical).  Ergo:
>>>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
>> Thanks!
>
> Of course, playing games with the pre-existing comments on
> out-of-range behavior is probably better for a separate patch, and you
> do have some churn on these tests in later patches. I'll leave it up
> to you what to do (or leave as is).

* Re: [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation
  2018-08-10 15:02       ` Eric Blake
@ 2018-08-13  6:12         ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-13  6:12 UTC (permalink / raw)
  To: Eric Blake; +Cc: marcandre.lureau, qemu-devel, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/10/2018 09:31 AM, Markus Armbruster wrote:
>
>>>> + *
>>>> + * [Numbers:]
>>>
>>> Worth also calling out:
>>>
>>> [Objects:]
>>>        object = begin-object [ member *( value-separator member ) ]
>>>                 end-object
>>>
>>>        member = string name-separator value
>>> [Arrays:]
>>>     array = begin-array [ value *( value-separator value ) ] end-array
>>>
>>> so as to completely cover the RFC grammar?
>>
>> Should this go into json-parser.c?
>
> Perhaps. After all, the lexer does nothing special for any of those
> constructs; they are where we really have moved into the parser phase.
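To illustrate how those rules map onto the parser rather than the lexer, here is a toy recursive-descent recognizer in C. It is a sketch only: values are limited to nested objects and arrays, runs of digits, and double-quoted strings without escapes. All function names are hypothetical, not taken from json-parser.c.

```c
#include <ctype.h>
#include <stdbool.h>

static bool parse_value(const char **p);

/* Skip the four JSON whitespace characters. */
static void skip_ws(const char **p)
{
    while (**p == ' ' || **p == '\t' || **p == '\n' || **p == '\r') {
        (*p)++;
    }
}

/* Consume character c after optional whitespace. */
static bool expect(const char **p, char c)
{
    skip_ws(p);
    if (**p != c) {
        return false;
    }
    (*p)++;
    return true;
}

/* Simplified string: no escapes, anything up to the closing quote. */
static bool parse_string_lit(const char **p)
{
    if (!expect(p, '"')) {
        return false;
    }
    while (**p && **p != '"') {
        (*p)++;
    }
    return expect(p, '"');
}

/* member = string name-separator value */
static bool parse_member(const char **p)
{
    return parse_string_lit(p) && expect(p, ':') && parse_value(p);
}

/* object = begin-object [ member *( value-separator member ) ] end-object */
static bool parse_object(const char **p)
{
    if (!expect(p, '{')) {
        return false;
    }
    if (expect(p, '}')) {
        return true;            /* empty object */
    }
    do {
        if (!parse_member(p)) {
            return false;
        }
    } while (expect(p, ','));
    return expect(p, '}');
}

/* array = begin-array [ value *( value-separator value ) ] end-array */
static bool parse_array(const char **p)
{
    if (!expect(p, '[')) {
        return false;
    }
    if (expect(p, ']')) {
        return true;            /* empty array */
    }
    do {
        if (!parse_value(p)) {
            return false;
        }
    } while (expect(p, ','));
    return expect(p, ']');
}

static bool parse_value(const char **p)
{
    skip_ws(p);
    switch (**p) {
    case '{':
        return parse_object(p);
    case '[':
        return parse_array(p);
    case '"':
        return parse_string_lit(p);
    default:
        if (!isdigit((unsigned char)**p)) {
            return false;
        }
        while (isdigit((unsigned char)**p)) {
            (*p)++;
        }
        return true;
    }
}

/* The whole input must be exactly one value plus optional whitespace. */
static bool json_recognize(const char *s)
{
    if (!parse_value(&s)) {
        return false;
    }
    skip_ws(&s);
    return *s == '\0';
}
```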
>
>
>>>> + * - Interpolation:
>>>> + *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>
>>> Not in your series, but we recently discussed adding %% (only inside
>>> strings); coupled with enforcing that all other interpolation occurs
>>> outside of strings.  I guess we can update this comment at that time.
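Here is a sketch of matching the documented interpolation rule by hand. interpolation_len() is a hypothetical name, and %% is deliberately not recognized because it has only been proposed so far.

```c
#include <stddef.h>
#include <string.h>

/*
 * Length of the directive at p per
 *   interpolation = %((l|ll|I64)[du]|[ipsf])
 * or 0 if p does not start with one.
 */
static size_t interpolation_len(const char *p)
{
    size_t n;

    if (p[0] != '%') {
        return 0;
    }
    if (p[1] != '\0' && strchr("ipsf", p[1])) {
        return 2;               /* %i %p %s %f */
    }
    if (strncmp(p + 1, "ll", 2) == 0) {
        n = 3;
    } else if (strncmp(p + 1, "I64", 3) == 0) {
        n = 4;
    } else if (p[1] == 'l') {
        n = 2;
    } else {
        return 0;
    }
    return (p[n] == 'd' || p[n] == 'u') ? n + 1 : 0;
}
```

The p[1] != '\0' guard matters because strchr() also finds the terminating NUL of "ipsf".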
>>
>> Message-ID: <87bmaoszf0.fsf@dusky.pond.sub.org>
>> https://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg05844.html
>>
>> I meant to do that in this series, but got overwhelmed by all the other
>> stuff, and forgot.  Thanks for the reminder.  I may still do it in v2.
>> If not, we can do it on top.
>
> Here's where I first attempted it, if it helps.
>
> https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg00603.html

Thanks.  I'll see what I can steal from it.

* Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  2018-08-10 16:09     ` Eric Blake
@ 2018-08-13  7:00       ` Markus Armbruster
  2018-08-13 14:57         ` Eric Blake
  2018-08-17  7:18         ` Markus Armbruster
  0 siblings, 2 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-13  7:00 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/10/2018 10:48 AM, Eric Blake wrote:
>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>> This is consistent with qobject_to_json().  See commit e2ec3f97680.
>>
>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>> always escaped. RFC 7159 does not require it to be escaped on input,

Weird, isn't it?

>> but I wonder if any of your earlier testsuite improvements should
>> specifically cover \x7f vs. \u007f on input being canonicalized to
>> \u007f on round trip output.

From utf8_string():

        /* 2.2.1  1 byte U+007F */
        {
            "\x7F",
            "\x7F",
            "\\u007F",
        },

We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
that JSON (expecting the C string again).  Sufficient?

>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>   qobject/json-lexer.c  | 2 +-
>>>   qobject/json-parser.c | 2 +-
>>>   tests/check-qjson.c   | 8 +-------
>>>   3 files changed, 3 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>>> index ca1e0e2c03..36fb665b12 100644
>>> --- a/qobject/json-lexer.c
>>> +++ b/qobject/json-lexer.c
>>> @@ -93,7 +93,7 @@
>>>    *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>    *
>>>    * Note:
>>> - * - Input must be encoded in UTF-8.
>>> + * - Input must be encoded in modified UTF-8.
>>
>> Worth documenting this in the QMP doc as an explicit extension?

qmp-spec.txt:

    The server expects its input to be encoded in UTF-8, and sends its
    output encoded in ASCII.

The obvious update would be to stick in "modified".

>>                                                                 In
>> general, our QMP interfaces that take binary input do so via base64
>> encoding, rather than via a modified UTF-8 string -

Agreed.

However, whether QMP has a use for funny characters or not, the JSON
parser has to handle them *somehow*.  "Handle" in the broadest possible
sense, including "reject".  Not including misbehavior like "crash" and
"silently ignore some input following ASCII NUL".

>>                                                     and I don't know
>> how yajl or jansson would feel about an extension for producing
>> modified UTF-8 for QMP to consume if we really did want to pass NUL
>> bytes without the overhead of UTF-8; what's more, even if you can
>> pass NUL, you still have to worry about all other byte sequences
>> being valid (so base64 is still better for true binary data - it's
>> hard to argue that we'd ever have an interface where we want UTF-8
>> including embedded NUL rather than true binary).  I guess it can
>> also be argued that outputting modified UTF-8 is a violation of
>> JSON, so the fact that we can round-trip NUL doesn't help if the
>> client can't read it.
>>
>> So having typed all that, I guess the answer is no, we don't want to
>> document it; for now, the fact that we accept \xc0\x80 on input and
>> produce it on output is only for the testsuite, and unlikely to
>> matter to any real client of QMP.
>
> Actually, I guess we never output \xc0\x80; but would output the C
> string "\\u0000" (since any byte above 0x1f is passed through our UTF
> decoder back into a codepoint then output with \u).

to_json() converts the C string sequence by sequence.  Valid sequences
in the BMP other than ASCII control characters (\x00..\x1F and \x7F) are
copied unchanged.  Everything else is escaped.
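Here is a sketch of the per-sequence output rule just described. It assumes the caller has already decoded each sequence: cp is -1 for an invalid one, which becomes \uFFFD as in the round-trip test. json_escape_seq() is hypothetical. It uses \uXXXX for every control character, whereas the real to_json() may prefer short escapes such as \n. The quotation mark and backslash are escaped because JSON requires it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the JSON form of one input sequence to out (at least 13
 * bytes).  cp is the decoded code point, or -1 if the sequence was
 * invalid; src and len are the sequence's original bytes.
 */
static void json_escape_seq(char *out, int cp, const char *src, size_t len)
{
    if (cp < 0) {
        strcpy(out, "\\uFFFD");                 /* invalid sequence */
    } else if (cp == '"' || cp == '\\') {
        sprintf(out, "\\%c", cp);               /* must be escaped */
    } else if (cp < 0x20 || cp == 0x7F) {
        sprintf(out, "\\u%04X", (unsigned)cp);  /* ASCII control */
    } else if (cp > 0xFFFF) {                   /* beyond the BMP */
        sprintf(out, "\\u%04X\\u%04X",
                (unsigned)(0xD800 + ((cp - 0x10000) >> 10)),
                (unsigned)(0xDC00 + ((cp - 0x10000) & 0x3FF)));
    } else {
        memcpy(out, src, len);                  /* valid BMP: copy as-is */
        out[len] = '\0';
    }
}
```

Under this rule the C string "\x7F" comes out as JSON "\u007F", matching the 2.2.1 test case, and U+0000 (stored as \xC0\x80) comes out as \u0000.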

>                                                     So it's really
> only a question of whether our input engine can pass "\x00"
> vs. "\\u0000" when we NEED an input NUL, and except for the testsuite,
> our QAPI schema never really needs an input NUL.

The question is how the JSON parser is to handle "\u0000" escapes in
JSON strings and NUL bytes anywhere.

The answer for NUL bytes is obvious: reject them just like any other
byte <= 0x1F, as required by the RFC.

My answer for \u0000 is to treat it as much like any other codepoint <=
0x1F as possible.  Treating it exactly like them isn't possible, because
NUL bytes in C strings aren't possible.  However, \xC0\x80 sequences
are.
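For illustration, here is a minimal encoder for that scheme. It produces ordinary UTF-8, except that U+0000 becomes \xC0\x80, so the resulting C string never contains an embedded NUL. mutf8_encode() is a hypothetical name, not QEMU's mod_utf8_encode(), and it does not check for noncharacters.

```c
#include <stdint.h>

/*
 * Encode cp into buf (at least 5 bytes), NUL-terminated.  Returns the
 * byte length, or -1 for a surrogate or a value beyond U+10FFFF.
 */
static int mutf8_encode(char *buf, uint32_t cp)
{
    unsigned char *b = (unsigned char *)buf;
    int n;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;
    }
    if (cp == 0) {
        b[0] = 0xC0;                    /* overlong form keeps NUL out */
        b[1] = 0x80;
        n = 2;
    } else if (cp < 0x80) {
        b[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        b[0] = (unsigned char)(0xC0 | (cp >> 6));
        b[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        b[0] = (unsigned char)(0xE0 | (cp >> 12));
        b[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        b[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        b[0] = (unsigned char)(0xF0 | (cp >> 18));
        b[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        b[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        b[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    b[n] = 0;
    return n;
}
```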

Reasonably simple and consistent, don't you think?

* Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser
  2018-08-10 15:56   ` Eric Blake
@ 2018-08-13  7:05     ` Markus Armbruster
  2018-08-13 14:58       ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-13  7:05 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Both lexer and parser reject invalid escape sequences in strings.  The
>> parser's check is useless.
>>
>
>>
>> Drop the lexer's escape sequence checking, and make it accept the same
>> characters after '\' it accepts elsewhere in strings.  It now produces
>>
>>      JSON_LCURLY   {
>>      JSON_STRING   "abc\@ijk"
>>      JSON_COLON    :
>>      JSON_INTEGER  1
>>      JSON_RCURLY
>>
>> and the parser reports just
>>
>>      JSON parse error, invalid escape sequence in string
>>
>> While there, fix parse_string()'s inaccurate function comment.
>
> Worthwhile improvement.
>
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-lexer.c  | 72 +++----------------------------------------
>>   qobject/json-parser.c | 56 +++++++++++++++++++--------------
>>   2 files changed, 37 insertions(+), 91 deletions(-)
>
> and shorter!
>
>>       [IN_DQ_STRING_ESCAPE] = {
>> -        ['b'] = IN_DQ_STRING,
>> -        ['f'] =  IN_DQ_STRING,
>> -        ['n'] =  IN_DQ_STRING,
>> -        ['r'] =  IN_DQ_STRING,
>> -        ['t'] =  IN_DQ_STRING,
>> -        ['/'] = IN_DQ_STRING,
>> -        ['\\'] = IN_DQ_STRING,
>> -        ['\''] = IN_DQ_STRING,
>> -        ['\"'] = IN_DQ_STRING,
>> -        ['u'] = IN_DQ_UCODE0,
>> +        [0x20 ... 0xFD] = IN_DQ_STRING,
>
> Among other things, this means the parser now has to flag "\u" as an
> incomplete escape - but your added testsuite coverage earlier in the
> series ensures that we do.

Yes.

>> +++ b/qobject/json-parser.c
>> @@ -106,30 +106,40 @@ static int hex2decimal(char ch)
>>   }
>>     /**
>> - * parse_string(): Parse a json string and return a QObject
>> + * parse_string(): Parse a JSON string
>>    *
>> - *  string
>
>> + * From RFC 7159 "The JavaScript Object Notation (JSON) Data
>> + * Interchange Format":
>> + *
>> + *    char = unescaped /
>> + *        escape (
>> + *            %x22 /          ; "    quotation mark  U+0022
>> + *            %x5C /          ; \    reverse solidus U+005C
>> + *            %x2F /          ; /    solidus         U+002F
>> + *            %x62 /          ; b    backspace       U+0008
>> + *            %x66 /          ; f    form feed       U+000C
>> + *            %x6E /          ; n    line feed       U+000A
>> + *            %x72 /          ; r    carriage return U+000D
>> + *            %x74 /          ; t    tab             U+0009
>> + *            %x75 4HEXDIG )  ; uXXXX                U+XXXX
>> + *    escape = %x5C              ; \
>> + *    quotation-mark = %x22      ; "
>> + *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
>> + *
>> + * Extensions over RFC 7159:
>> + * - Extra escape sequence in strings:
>> + *   0x27 (apostrophe) is recognized after escape, too
>> + * - Single-quoted strings:
>> + *   Like double-quoted strings, except they're delimited by %x27
>> + *   (apostrophe) instead of %x22 (quotation mark), and can't contain
>> + *   unescaped apostrophe, but can contain unescaped quotation mark.
>> + *
>> + * Note:
>> + * - Encoding is modified UTF-8.
>
> That is an extension over RFC 7159. But I'm okay with leaving it in
> the Notes section.
>
>> + * - Invalid Unicode characters are rejected.
>> + * - Control characters are rejected by the lexer.
>
> Worth being explicit that this is 00-1f, fe, and ff?

\xFE and \xFF are invalid, not control.

What about:

 * - Invalid Unicode characters are rejected.
 * - Control characters \x00..\x1F are rejected by the lexer.
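Here is a sketch of the distinction that wording draws, applied to raw input bytes. 0x00..0x1F are control characters, while 0xFE and 0xFF are not controls at all: they appear in no UTF-8 variant, not even the original six-byte form. classify_byte() and its enum are hypothetical. Anything in BYTE_OTHER still needs full UTF-8 validation later, which is consistent with the 0x20 ... 0xFD range in the lexer hunk quoted above.

```c
/* Hypothetical classification of raw input bytes. */
enum byte_class {
    BYTE_CONTROL,               /* 0x00..0x1F: rejected by the lexer */
    BYTE_INVALID_ANYWHERE,      /* 0xFE, 0xFF: never part of UTF-8 */
    BYTE_OTHER                  /* needs full sequence validation */
};

static enum byte_class classify_byte(unsigned char b)
{
    if (b <= 0x1F) {
        return BYTE_CONTROL;
    }
    if (b >= 0xFE) {
        return BYTE_INVALID_ANYWHERE;
    }
    return BYTE_OTHER;          /* includes DEL (0x7F), legal unescaped */
}
```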

* Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
  2018-08-10 17:18   ` Eric Blake
@ 2018-08-13  7:07     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-13  7:07 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as unpaired
>> surrogate.  Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-parser.c | 16 +++++++++++++++-
>>   tests/check-qjson.c   |  3 +--
>>   2 files changed, 16 insertions(+), 3 deletions(-)
>>
>
>> @@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>>                      cp |= hex2decimal(*ptr);
>>                  }
>> +                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
>> +                    && ptr[1] == '\\' && ptr[2] == 'u') {
>> +                    ptr += 2;
>> +                    leading_surrogate = cp;
>> +                    goto hex;
>> +                }
>> +                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
>> +                    cp &= 0x3FF;
>> +                    cp |= (leading_surrogate & 0x3FF) << 10;
>> +                    cp += 0x010000;
>> +                }
>> +
>>                   if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>>                       parse_error(ctxt, token,
>>                                   "\\u%.4s is not a valid Unicode character",
>
> Consider "\\udbff\\udfff" - a valid surrogate pair (in terms of being
> in range), but which decodes to U+10FFFF.  Since is_valid_codepoint()
> (part of mod_utf8_encode()) rejects it due to (codepoint & 0xfffe) ==
> 0xfffe, it means we end up printing this error message, but only using
> the second half of the surrogate pair.  Is that okay?

It's not horrible, but I wouldn't call it okay.  I'll try to improve it.

> Otherwise,
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

* Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
  2018-08-12  9:52   ` Paolo Bonzini
@ 2018-08-13  7:12     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-13  7:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 08/08/2018 14:03, Markus Armbruster wrote:
>> +                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
>> +                    && ptr[1] == '\\' && ptr[2] == 'u') {
>> +                    ptr += 2;
>> +                    leading_surrogate = cp;
>> +                    goto hex;
>> +                }
>> +                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
>> +                    cp &= 0x3FF;
>> +                    cp |= (leading_surrogate & 0x3FF) << 10;
>> +                    cp += 0x010000;
>> +                }
>> +
>
> The leading surrogate is discarded for \uD800\uCAFE, I think.  Is this
> desired?

Certainly not.  I'll fix it.  Thanks!

* Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
  2018-08-13  6:11         ` Markus Armbruster
@ 2018-08-13 14:53           ` Eric Blake
  2018-08-14  6:01             ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 14:53 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: marcandre.lureau, qemu-devel, mdroth

On 08/13/2018 01:11 AM, Markus Armbruster wrote:

>>>> Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that
>>>> is not valid Unicode, even if it IS a valid interpretation of UTF-8
>>>> encoding.
>>>
>>> Correct.  Testing how we handle such sequences makes sense all the same.
>>>
>>>>>             {
>>>>> -            "\"\xF7\xBF\xBF\xBF\"",
>>>>> +            "\xF7\xBF\xBF\xBF",
>>>>>                 NULL,               /* bug: rejected */
>>
>> So maybe all we need to do is remove the comment (as we WANT
>> to reject these)?
> 
> Is PATCH 20 doing what you suggest?

Yes, I think you get there in the end, it was more a question of churn 
in the meantime.
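You can check by hand why the \xF7\xBF\xBF\xBF case must be rejected. The four-byte UTF-8 bit layout holds 21 bits, so it can express values up to U+1FFFFF, but Unicode stops at U+10FFFF. Here is a minimal sketch; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value carried by a structurally well-formed 4-byte UTF-8 sequence. */
static uint32_t utf8_4byte_value(const unsigned char *s)
{
    return ((uint32_t)(s[0] & 0x07) << 18)
         | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6)
         | (uint32_t)(s[3] & 0x3F);
}

static bool in_unicode_range(uint32_t cp)
{
    return cp <= 0x10FFFF;
}
```

\xF4\x8F\xBF\xBF yields 0x10FFFF, the last code point. \xF7\xBF\xBF\xBF yields 0x1FFFFF, and any lead byte from 0xF5 up already exceeds the range.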

> 
>>>>
>>>> The conversion of the initializer looks sane (well, mechanical).  Ergo:
>>>>
>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>
>>> Thanks!
>>
>> Of course, playing games with the pre-existing comments on
>> out-of-range behavior is probably better for a separate patch, and you
>> do have some churn on these tests in later patches. I'll leave it up
>> to you what to do (or leave put).
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  2018-08-13  7:00       ` Markus Armbruster
@ 2018-08-13 14:57         ` Eric Blake
  2018-08-14  6:07           ` Markus Armbruster
  2018-08-17  7:18         ` Markus Armbruster
  1 sibling, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 14:57 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/13/2018 02:00 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 08/10/2018 10:48 AM, Eric Blake wrote:
>>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>>> This is consistent with qobject_to_json().  See commit e2ec3f97680.
>>>
>>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>>> always escaped. RFC 7159 does not require it to be escaped on input,
> 
> Weird, isn't it?
> 
>>> but I wonder if any of your earlier testsuite improvements should
>>> specifically cover \x7f vs. \u007f on input being canonicalized to
>>> \u007f on round trip output.
> 
> From utf8_string():
> 
>          /* 2.2.1  1 byte U+007F */
>          {
>              "\x7F",
>              "\x7F",
>              "\\u007F",
>          },
> 
> We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
> that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
> that JSON (expecting the C string again).  Sufficient?

Indeed, looks like we have coverage of DEL thanks to the bounds testing 
of various interesting UTF-8 inflection points.

>>>> +++ b/qobject/json-lexer.c
>>>> @@ -93,7 +93,7 @@
>>>>     *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>>     *
>>>>     * Note:
>>>> - * - Input must be encoded in UTF-8.
>>>> + * - Input must be encoded in modified UTF-8.
>>>
>>> Worth documenting this in the QMP doc as an explicit extension?
> 
> qmp-spec.txt:
> 
>      The server expects its input to be encoded in UTF-8, and sends its
>      output encoded in ASCII.
> 
> The obvious update would be to stick in "modified".
> 
>>>                                                                  In
>>> general, our QMP interfaces that take binary input do so via base64
>>> encoding, rather than via a modified UTF-8 string -
> 
> Agreed.
> 
> However, whether QMP has a use for funny characters or not, the JSON
> parser has to handle them *somehow*.  "Handle" in the broadest possible
> sense, including "reject".  Not including misbehavior like "crash" and
> "silently ignore some input following ASCII NUL".
> 

> 
>>                                                      So it's really
>> only a question of whether our input engine can pass "\x00"
>> vs. "\\u0000" when we NEED an input NUL, and except for the testsuite,
>> our QAPI schema never really needs an input NUL.
> 
> The question is how the JSON parser is to handle "\u0000" escapes in
> JSON strings and NUL bytes anywhere.
> 
> The answer for NUL bytes is obvious: reject them just like any other
> byte <= 0x1F, as required by the RFC.

Yes, rejecting \x00 on input is fine, which leaves only the escape 
inside strings:

> 
> My answer for \u0000 is to treat it as much like any other codepoint <=
> 0x1F as possible.  Treating it exactly like them isn't possible, because
> NUL bytes in C strings aren't possible.  However, \xC0\x80 sequences
> are.

Yes, for internal processing to use modified UTF-8 so that we can accept 
"\\u0000" on input is reasonable. And even nicer if we turn a QString 
containing \xc0\x80 back into "\\u0000" on output, so that our use of 
modified UTF-8 never even escapes to the QMP client (and then we don't 
need a documentation change).
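Here is a sketch of the round trip discussed above, restricted to BMP code points. Internally, U+0000 is stored as the overlong pair C0 80. On output, every control character, DEL and non-ASCII code point is written as \uXXXX, consistent with the ASCII-only output promised in qmp-spec.txt and with the utf8_string() DEL case quoted earlier. The client therefore never sees modified UTF-8. Both helpers are hypothetical, not QEMU's qjson.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: encode a BMP code point (no surrogates) in
 * modified UTF-8.  U+0000 takes the overlong two-byte form C0 80, so the
 * output never contains a NUL byte.  Returns the number of bytes written.
 */
size_t mod_utf8_encode(unsigned cp, unsigned char *out)
{
    if (cp != 0 && cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {               /* includes cp == 0 */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/*
 * Hypothetical helper: JSON-escape a decoded BMP code point.  Control
 * characters, DEL and non-ASCII become \uXXXX; printable ASCII is copied.
 */
void json_escape_bmp(unsigned cp, char buf[7])
{
    if (cp < 0x20 || cp >= 0x7F) {
        snprintf(buf, 7, "\\u%04X", cp);
    } else {
        buf[0] = (char)cp;
        buf[1] = '\0';
    }
}
```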

> 
> Reasonably simple and consistent, don't you think?
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser
  2018-08-13  7:05     ` Markus Armbruster
@ 2018-08-13 14:58       ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-13 14:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/13/2018 02:05 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>> Both lexer and parser reject invalid escape sequences in strings.  The
>>> parser's check is useless.
>>>

>>> + * Extensions over RFC 7159:
>>> + * - Extra escape sequence in strings:
>>> + *   0x27 (apostrophe) is recognized after escape, too
>>> + * - Single-quoted strings:
>>> + *   Like double-quoted strings, except they're delimited by %x27
>>> + *   (apostrophe) instead of %x22 (quotation mark), and can't contain
>>> + *   unescaped apostrophe, but can contain unescaped quotation mark.
>>> + *
>>> + * Note:
>>> + * - Encoding is modified UTF-8.
>>
>> That is an extension over RFC 7159. But I'm okay with leaving it in
>> the Notes section.
>>
>>> + * - Invalid Unicode characters are rejected.
>>> + * - Control characters are rejected by the lexer.
>>
>> Worth being explicit that this is 00-1f, fe, and ff?
> 
> \xFE and \xFF are invalid, not control.
> 
> What about:
> 
>   * - Invalid Unicode characters are rejected.
>   * - Control characters \x00..\x1F are rejected by the lexer.

Works for me.
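Here is a small sketch of the distinction Markus draws. Bytes \x00..\x1F are control characters, which RFC 7159 requires to be escaped. \xFE and \xFF, like every byte from \xF5 up (RFC 3629), cannot occur in well-formed UTF-8 at all. classify_string_byte() is hypothetical and looks at single bytes only. \xC0 and \xC1 can only start overlong sequences and need the decoder's context, since modified UTF-8 does allow C0 80.

```c
#include <assert.h>

/*
 * Hypothetical classifier for one raw input byte inside a JSON string.
 * Multi-byte checks (continuation bytes, overlongs, ...) are left out.
 */
enum byte_class {
    BYTE_OK,           /* may appear; full UTF-8 validation still applies */
    BYTE_CONTROL,      /* \x00..\x1F: RFC 7159 requires these escaped */
    BYTE_NEVER_UTF8,   /* \xF5..\xFF: no well-formed UTF-8 contains them */
};

enum byte_class classify_string_byte(unsigned char c)
{
    if (c <= 0x1F) {
        return BYTE_CONTROL;
    }
    if (c >= 0xF5) {
        return BYTE_NEVER_UTF8;
    }
    return BYTE_OK;
}
```

Note that DEL (0x7F) lands in BYTE_OK: RFC 7159 does not require it to be escaped on input.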

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values Markus Armbruster
@ 2018-08-13 15:30   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-13 15:30 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:

> 
> Observations:
> 
> 1. This is not the only way to use recursive descent.  If we replaced
>     "get next token" by a coroutine yield, we could do without a
>     streamer.
> 
> 2. The lexer reports errors by passing a JSON_ERROR token to the
>     streamer.  This communicates the offending input characters and
>     their location, but no more.

In fact, the offending input wasn't completely available until earlier 
in the series :)

> 
> 3. The streamer reports errors by passing a null token sequence to the
>     callback.  The (already poor) lexical error information is thrown
>     away.
> 
> 4. Having the callback receive a token sequence duplicates the code to
>     convert token sequence to abstract syntax tree in every callback.
> 
> 5. Known bug: the streamer silently drops incomplete token sequences.
> 
> This commit rectifies 4. by lifting the call of the parser from the
> callbacks into the streamer.  Later commits will address 3. and 5.
> 
> The lifting removes a bug from qjson.c's parse_json(): it passed a
> pointer to a non-null Error * in certain cases, as demonstrated by
> check-qjson.c.
> 
> json_parser_parse() is now unused.  It's a stupid wrapper around
> json_parser_parse_err().  Drop it, and rename json_parser_parse_err()
> to json_parser_parse().
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Eric Blake <eblake@redhat.com>
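Here is a toy illustration of observation 5, and of a streamer that hands complete values to its callback. It makes heavy simplifying assumptions. Single characters stand in for tokens. The callback receives the value's raw text, where the real streamer would run the parser and pass a QObject or an Error. stream_values() and count_values() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*emit_fn)(void *opaque, const char *start, size_t len);

/*
 * Toy streamer (hypothetical, not QEMU's json-streamer.c).  Each
 * non-blank character is a "token": '{' and '[' open a structure, '}'
 * and ']' close one, anything else is a scalar.  Every complete
 * top-level value is passed to @emit.  Input that ends inside an
 * unterminated structure is reported (-1) rather than silently dropped.
 */
int stream_values(const char *in, emit_fn emit, void *opaque)
{
    const char *start = NULL;   /* first token of the pending value */
    int depth = 0;

    for (const char *p = in; *p; p++) {
        if (*p == ' ') {
            continue;           /* whitespace separates tokens */
        }
        if (!start) {
            start = p;
        }
        if (*p == '{' || *p == '[') {
            depth++;
        } else if (*p == '}' || *p == ']') {
            depth--;
        }
        if (depth <= 0) {       /* value complete (or unbalanced close) */
            emit(opaque, start, (size_t)(p - start + 1));
            start = NULL;
            depth = 0;
        }
    }
    return start ? -1 : 0;
}

/* Example callback: count the values emitted. */
void count_values(void *opaque, const char *start, size_t len)
{
    (void)start;
    (void)len;
    ++*(int *)opaque;
}
```

For "1 [2", the scalar 1 is emitted and the unterminated "[2" is reported at end of input instead of vanishing.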

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse()
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse() Markus Armbruster
@ 2018-08-13 15:32   ` Eric Blake
  2018-08-14  6:17     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 15:32 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> json_parser_parse() normally returns the QObject on success.  Except
> it returns null when its @tokens argument is null.
> 
> Its only caller json_message_process_token() passes null @tokens when
> emitting a lexical error.  The call is a rather opaque way to say json
> = NULL then.
> 
> Simplify matters by lifting the assignment to json out of the emit
> path: initialize json to null, set it to the value of
> json_parser_parse() when there's no lexical error.  Drop the special
> case from json_parser_parse().
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c   |  4 ----
>   qobject/json-streamer.c | 25 ++++++++++++-------------
>   2 files changed, 12 insertions(+), 17 deletions(-)
> 

Shorter and simpler.

Reviewed-by: Eric Blake <eblake@redhat.com>

> +++ b/qobject/json-streamer.c
> @@ -39,9 +39,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
>                                   JSONTokenType type, int x, int y)
>   {
>       JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
> +    QObject *json = NULL;
>       Error *err = NULL;
>       JSONToken *token;
> -    QObject *json;

Why the churn in position?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used Markus Armbruster
@ 2018-08-13 15:32   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-13 15:32 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-streamer.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL Markus Armbruster
@ 2018-08-13 15:34   ` Eric Blake
  2018-08-14  6:28     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 15:34 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The JSON parser optionally supports interpolation.  The code calls it
> "escape".  Awkward, because it uses the same term for escape sequences
> within strings.  The latter usage is consistent with RFC 7159 "The
> JavaScript Object Notation (JSON) Data Interchange Format" and ISO C.
> Call the former "interpolation" instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   include/qapi/qmp/json-lexer.h |  2 +-
>   qobject/json-lexer.c          | 64 +++++++++++++++++------------------
>   qobject/json-parser.c         |  8 ++---
>   3 files changed, 37 insertions(+), 37 deletions(-)

Mechanical, and a worthwhile name change.

Reviewed-by: Eric Blake <eblake@redhat.com>

Bike-shedding: Would INTERP (short for interpolate) be any more legible 
than INTERPOL (which I first read as short for 'international police')?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error Markus Armbruster
@ 2018-08-13 15:48   ` Eric Blake
  2018-08-14  6:51     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 15:48 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The JSON parser optionally supports interpolation.  The lexer
> recognizes interpolation tokens unconditionally.  The parser rejects
> them when interpolation is disabled, in parse_interpolation().
> However, it neglects to set an error then, which can make
> json_parser_parse() fail without setting an error.
> 
> Move the check for unwanted interpolation from the parser's
> parse_interpolation() into the lexer's finite state machine.  When
> interpolation is disabled, '%' is now handled like any other
> unexpected character.
> 
> The next commit will improve how such lexical errors are handled.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   include/qapi/qmp/json-lexer.h |  4 ++--
>   qobject/json-lexer.c          | 42 ++++++++++++++++++++++++++---------
>   qobject/json-parser.c         |  4 ----
>   qobject/json-streamer.c       |  2 +-
>   tests/qmp-test.c              |  4 ++++
>   5 files changed, 39 insertions(+), 17 deletions(-)
> 


> @@ -271,17 +272,38 @@ static const uint8_t json_lexer[][256] =  {
>           [','] = JSON_COMMA,
>           [':'] = JSON_COLON,
>           ['a' ... 'z'] = IN_KEYWORD,
> +        [' '] = IN_WHITESPACE,
> +        ['\t'] = IN_WHITESPACE,
> +        ['\r'] = IN_WHITESPACE,
> +        ['\n'] = IN_WHITESPACE,
> +    },
> +
> +    [IN_START_INTERPOL] = {
> +        ['"'] = IN_DQ_STRING,
...

> +        ['\n'] = IN_WHITESPACE,
> +        /* matches IN_START up to here */
>           ['%'] = IN_INTERPOL,

You could compress this as:

[IN_START_INTERPOL ... IN_START] = {
    ['"'] = ...
    ['\n'] = ...
},
[IN_START_INTERPOL]['%'] = IN_INTERPOL,

rather than duplicating the common list twice. (We already exploit gcc's 
range initialization, and the fact that you can initialize a broader 
range and then re-initialize a more specific subset later.)
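Here is a self-contained sketch of the suggested layout, using made-up state names. A GNU C range designator initializes both start rows from one list. A later designator then overrides a single element of one row, and the rest of that row is kept. Compilers may warn about the override (GCC's -Woverride-init, enabled by -Wextra).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical states.  ST_ERROR is 0, so every entry not listed below
 * defaults to the error state.  The two start states are adjacent so
 * that the range designator covers exactly them.
 */
enum {
    ST_ERROR,
    ST_START_INTERPOL,
    ST_START,
    ST_STRING,
    ST_WHITESPACE,
    ST_INTERPOL,
    ST_MAX
};

static const uint8_t table[ST_MAX][256] = {
    /* shared transitions, written once for both start states */
    [ST_START_INTERPOL ... ST_START] = {
        ['"'] = ST_STRING,
        [' '] = ST_WHITESPACE,
        ['\n'] = ST_WHITESPACE,
    },
    /* later designator re-initializes one element of one row */
    [ST_START_INTERPOL]['%'] = ST_INTERPOL,
};
```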

> +++ b/tests/qmp-test.c
> @@ -94,6 +94,10 @@ static void test_malformed(QTestState *qts)
>   
>       /* lexical error: interpolation */
>       qtest_qmp_send_raw(qts, "%%p\n");
> +    /* two errors, one for "%", one for "p" */
> +    resp = qtest_qmp_receive(qts);
> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
> +    qobject_unref(resp);
>       resp = qtest_qmp_receive(qts);
>       g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
>       qobject_unref(resp);

I'm impressed at how easily you got the lexer to parse two different 
token grammars, and agree that doing it in the lexer when we don't want 
interpolation is a nicer place.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback Markus Armbruster
@ 2018-08-13 15:51   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-13 15:51 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The callback to consume JSON values takes QObject *json, Error *err.
> If both are null, the callback is supposed to make up an error by
> itself.  This sucks.
> 
> qjson.c's consume_json() neglects to do so, which makes
> qobject_from_json() & friends return null instead of failing.  I
> consider that a bug.
> 
> The culprit is json_message_process_token(): it passes two null
> pointers when it runs into a lexical error or a limit violation.  Fix
> it to pass a proper Error object then.  Update the callbacks:
> 

> +++ b/include/qapi/qmp/qerror.h
> @@ -61,9 +61,6 @@
>   #define QERR_IO_ERROR \
>       "An IO error has occurred"
>   
> -#define QERR_JSON_PARSING \
> -    "Invalid JSON syntax"
> -

Bonus - one less of these annoying defines.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser Markus Armbruster
@ 2018-08-13 16:12   ` Eric Blake
  2018-08-14  7:23     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 16:12 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Both lexer and parser reject invalid interpolation specifications.
> The parser's check is useless.
> 
> The lexer ends the token right after the first bad character.  This
> tends to lead to suboptimal error reporting.  For instance, input
> 
>      [ %11d ]

With no context of other characters on this line, it took me a while to 
notice that this was '1' (one) not 'l' (ell), even though my current 
font renders the two quite distinctly. (And now you know why base32 
avoids 0/1 in its alphabet, to minimize confusion with O/l - see RFC 
4648).  A better example might be %22d.

> 
> produces the tokens
> 
>      JSON_LSQUARE  [
>      JSON_ERROR    %1
>      JSON_INTEGER  1

And that's in spite of the context you have here, which makes it obvious 
that the parser saw an integer.

>      JSON_KEYWORD  d
>      JSON_RSQUARE  ]
> 
> The parser then yields an error, an object and two more errors:
> 
>      error: Invalid JSON syntax
>      object: 1
>      error: JSON parse error, invalid keyword
>      error: JSON parse error, expecting value
> 
> Change the lexer to accept [A-Za-z0-9]*[duipsf].  It now produces

That regex doesn't match the code.

> 
>      JSON_LSQUARE  [
>      JSON_INTERPOLATION %11d
>      JSON_RSQUARE  ]
> 
> and the parser reports just
> 
>      JSON parse error, invalid interpolation '%11d'
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c  | 52 +++++++++----------------------------------
>   qobject/json-parser.c |  1 +
>   2 files changed, 11 insertions(+), 42 deletions(-)
> 
> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
> index 0ea1eae4aa..7a82aab88b 100644
> --- a/qobject/json-lexer.c
> +++ b/qobject/json-lexer.c
> @@ -93,7 +93,8 @@
>    *   (apostrophe) instead of %x22 (quotation mark), and can't contain
>    *   unescaped apostrophe, but can contain unescaped quotation mark.
>    * - Interpolation:
> - *   interpolation = %((l|ll|I64)[du]|[ipsf])
> + *   The lexer accepts [A-Za-z0-9]*, and leaves rejecting invalid ones
> + *   to the parser.

This comment is more apropos.  But is it worth spelling "The lexer 
accepts %[A-Za-z0-9]*", and/or documenting that recognizing 
interpolation during lexing is now optional (thanks to the previous patch)?

> @@ -278,6 +238,14 @@ static const uint8_t json_lexer[][256] =  {
>           ['\n'] = IN_WHITESPACE,
>       },
>   
> +    /* interpolation */
> +    [IN_INTERPOL] = {
> +        TERMINAL(JSON_INTERPOL),
> +        ['A' ... 'Z'] = IN_INTERPOL,
> +        ['a' ... 'z'] = IN_INTERPOL,
> +        ['0' ... '9'] = IN_INTERPOL,
> +    },
> +

Note that we still treat code like "'foo': %#x" as an invalid 
interpolation request (it would be a valid printf format), while at the 
same time passing "%1" on to the parser (which is not a valid printf 
format, so -Wformat would have flagged it).  In fact, "%d1", which is
valid for printf as "%d" followed by a literal "1", gets thrown to the
parser as one chunk - but JSON doesn't allow %d adjacent to another
integer anyway, so rejecting "%d1" as an unknown sequence loses nothing.
I don't think that catering to the remaining printf metacharacters
(' ', '#', '\'', '-', '*', '+') or demanding that things end on a letter
is worth the effort, since it just makes the lexer more complicated when
your goal was to do as little as possible to get things thrown over to
the parser.

So even though your lexing is now somewhat different from printf, I 
don't see that as a serious drawback (the uses we care about are still 
-Wformat clean, and the uses we don't like are properly handled in the 
parser).

Perhaps you want to enhance the testsuite (earlier in the series) to 
cover "%22d", "%d1", "%1" as various unaccepted interpolation requests 
(if you didn't already).
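Here is a sketch of the split described in the commit message, using hypothetical helpers. The lexer swallows %[A-Za-z0-9]* as one token without judging it. The parser then checks the whole token against the conversions it supports. The list of known conversions is an illustrative subset, not QEMU's. The inputs in the usage are the cases suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True for the characters in [A-Za-z0-9], independent of locale. */
static int is_interpol_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9');
}

/*
 * Lexer side (sketch): @p points at '%'.  Swallow %[A-Za-z0-9]* as one
 * token without judging it.  Returns the token length.
 */
size_t lex_interpolation(const char *p)
{
    size_t n = 1;               /* the '%' itself */

    while (is_interpol_char(p[n])) {
        n++;
    }
    return n;
}

/*
 * Parser side (sketch): accept only conversions it knows how to fetch.
 * This list is a hypothetical subset, not QEMU's exact one.
 */
int interpolation_ok(const char *tok, size_t len)
{
    static const char *const known[] = {
        "%d", "%i", "%ld", "%lld", "%s", "%p", "%f",
    };

    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strlen(known[i]) == len && !memcmp(tok, known[i], len)) {
            return 1;
        }
    }
    return 0;
}
```

"%11d", "%22d", "%d1" and "%1" each become a single token that the parser rejects. "%#x" stops right after the '%', because '#' is not in the accepted set.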

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64 Markus Armbruster
@ 2018-08-13 16:18   ` Eric Blake
  2018-08-14  7:24     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 16:18 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Support for %I64d got addded in commit 2c0d4b36e7f "json: fix PRId64

s/addded/added/

> on Win32".  We had to hard-code I64d because we used the lexer's
> finite state machine to check interpolations.  No more, so clean this
> up.
> 
> Additional conversion specifications would be easy enough to implement
> when needed.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c | 10 ++++++----
>   tests/check-qjson.c   | 10 ++++++++++
>   2 files changed, 16 insertions(+), 4 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

>          return QOBJECT(qnum_from_int(va_arg(*ap, long)));
> -    } else if (!strcmp(token->str, "%lld") ||
> -               !strcmp(token->str, "%I64d")) {
> +    } else if (!strcmp(token->str, "%lld")) {
>          return QOBJECT(qnum_from_int(va_arg(*ap, long long)));
> +    } else if (!strcmp(token->str, "%" PRId64)) {
> +        return QOBJECT(qnum_from_int(va_arg(*ap, int64_t)));

I had a double-take to make sure this still works on mingw. The trick 
used to be that the lexer had to parse the union of all forms understood 
by any libc (making Linux understand %I64d even though only mingw would 
generate it), and then the parser had to accept all forms allowed through
by the lexer.  Now the lexer accepts all forms with no effort (because it
no longer validates), and the parser is made stricter (%I64d no
longer works on Linux, where we have two redundant 'if' clauses; but 
mingw has two distinct 'if' clauses and works as desired).
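Here is a sketch of why this works, using a hypothetical fetch_int() rather than the patch's code. "%" PRId64 is ordinary string-literal concatenation with the macro from <inttypes.h>. The branch therefore compares against whatever the host's printf uses for int64_t. Typically that is "ld" on 64-bit Linux and "lld" on 32-bit Linux, both duplicating an earlier branch. MinGW's headers may give "I64d", which gets a branch of its own.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of a parser-side dispatch (hypothetical, integers only).
 * Returns 0 and stores the value in *out, or -1 for an unknown
 * conversion.
 */
int fetch_int(int64_t *out, const char *spec, ...)
{
    va_list ap;
    int ret = 0;

    va_start(ap, spec);
    if (!strcmp(spec, "%d")) {
        *out = va_arg(ap, int);
    } else if (!strcmp(spec, "%ld")) {
        *out = va_arg(ap, long);
    } else if (!strcmp(spec, "%lld")) {
        *out = va_arg(ap, long long);
    } else if (!strcmp(spec, "%" PRId64)) {
        /* unreachable where PRId64 matches an earlier branch */
        *out = va_arg(ap, int64_t);
    } else {
        ret = -1;
    }
    va_end(ap);
    return ret;
}
```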

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero Markus Armbruster
@ 2018-08-13 16:33   ` Eric Blake
  2018-08-14  8:24     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-13 16:33 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> For input 0123, the lexer produces the tokens
> 
>      JSON_ERROR    01
>      JSON_INTEGER  23
> 
> Reporting an error is correct; 0123 is invalid according to RFC 7159.
> But the error recovery isn't nice.
> 
> Make the finite state machine eat digits before going into the error
> state.  The lexer now produces
> 
>      JSON_ERROR    0123
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 

> @@ -158,10 +159,14 @@ static const uint8_t json_lexer[][256] =  {
>       /* Zero */
>       [IN_ZERO] = {
>           TERMINAL(JSON_INTEGER),
> -        ['0' ... '9'] = IN_ERROR,
> +        ['0' ... '9'] = IN_BAD_ZERO,
>           ['.'] = IN_MANTISSA,
>       },
>   
> +    [IN_BAD_ZERO] = {
> +        ['0' ... '9'] = IN_BAD_ZERO,
> +    },
> +

Should IN_BAD_ZERO also consume '.' and/or 'e' (after all, '01e2' is a
valid C constant, but not a valid JSON literal)?  But I think your 
choice here is fine (again, add too much, and then the lexer has to 
track a lot of state; whereas this minimal addition catches the most 
obvious things with little effort).
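Here is a minimal sketch of the IN_ZERO/IN_BAD_ZERO behaviour, for integers only (lex_zero() is hypothetical). "0123" becomes one error token of length 4 instead of "01" followed by "23". "01e2" stops at the 'e', which is the case asked about above.

```c
#include <assert.h>
#include <stddef.h>

enum num_tok { NUM_INTEGER, NUM_ERROR };

/*
 * Hypothetical helper: @p points at a leading '0'.  A lone 0 is an
 * integer.  A 0 followed by digits swallows all of them into a single
 * error token.  Stores the token length in *len.
 */
enum num_tok lex_zero(const char *p, size_t *len)
{
    size_t n = 1;                  /* p[0] is '0' */

    if (p[n] < '0' || p[n] > '9') {
        *len = n;
        return NUM_INTEGER;        /* plain 0 */
    }
    while (p[n] >= '0' && p[n] <= '9') {
        n++;                       /* like IN_BAD_ZERO: keep eating digits */
    }
    *len = n;
    return NUM_ERROR;
}
```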

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers Markus Armbruster
@ 2018-08-13 16:36   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-13 16:36 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c | 38 +++++++++++++++++++-------------------
>   1 file changed, 19 insertions(+), 19 deletions(-)

> @@ -228,8 +228,8 @@ static const uint8_t json_lexer[][256] =  {
>           ['"'] = IN_DQ_STRING,
>           ['\''] = IN_SQ_STRING,
>           ['0'] = IN_ZERO,
> -        ['1' ... '9'] = IN_NONZERO_NUMBER,
> -        ['-'] = IN_NEG_NONZERO_NUMBER,
> +        ['1' ... '9'] = IN_DIGITS,
> +        ['-'] = IN_SIGN,
>           ['{'] = JSON_LCURLY,
>           ['}'] = JSON_RCURLY,
>           ['['] = JSON_LSQUARE,
> @@ -255,8 +255,8 @@ static const uint8_t json_lexer[][256] =  {
>           ['"'] = IN_DQ_STRING,
>           ['\''] = IN_SQ_STRING,
>           ['0'] = IN_ZERO,
> -        ['1' ... '9'] = IN_NONZERO_NUMBER,
> -        ['-'] = IN_NEG_NONZERO_NUMBER,
> +        ['1' ... '9'] = IN_DIGITS,
> +        ['-'] = IN_SIGN,

If you take my advice in the earlier patch about not repeating this list 
for IN_START_INTERPOL, you'll have an easy-to-resolve conflict here.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
  2018-08-13 14:53           ` Eric Blake
@ 2018-08-14  6:01             ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  6:01 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, marcandre.lureau, qemu-devel, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/13/2018 01:11 AM, Markus Armbruster wrote:
>
>>>>> Technically, Unicode ends at U+10FFFF (21 bits). Anything beyond that
>>>>> is not valid Unicode, even if it IS a valid interpretation of UTF-8
>>>>> encoding.
>>>>
>>>> Correct.  Testing how we handle such sequences makes sense all the same.
>>>>
>>>>>>             {
>>>>>> -            "\"\xF7\xBF\xBF\xBF\"",
>>>>>> +            "\xF7\xBF\xBF\xBF",
>>>>>>                 NULL,               /* bug: rejected */
>>>
>>> So, maybe all the more we need to do is remove the comment (as we WANT
>>> to reject these)?
>>
>> Is PATCH 20 doing what you suggest?
>
> Yes, I think you get there in the end, it was more a question of churn
> in the meantime.

Modest churn, I think.  PATCH 09 adds some ten bug: comments that go
away in "[PATCH 21/56] json: Reject invalid UTF-8 sequences" (some might
go a bit later, didn't check).  I put my announcement of intent "[PATCH
20/56] check-qjson: Document we expect invalid UTF-8 to be rejected"
right before its implementation in PATCH 21.  Having PATCH 20 in place
before PATCH 09 would avoid the bug: comment churn, but it would also
separate announcement of intent from implementation.  Seems doubtful to
me.

>>>>>
>>>>> The conversion of the initializer looks sane (well, mechanical).  Ergo:
>>>>>
>>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>>
>>>> Thanks!
>>>
>>> Of course, playing games with the pre-existing comments on
>>> out-of-range behavior is probably better for a separate patch, and you
>>> do have some churn on these tests in later patches. I'll leave it up
>>> to you what to do (or leave put).
>>

* Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  2018-08-13 14:57         ` Eric Blake
@ 2018-08-14  6:07           ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  6:07 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, marcandre.lureau, qemu-devel, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/13/2018 02:00 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>>
>>> On 08/10/2018 10:48 AM, Eric Blake wrote:
>>>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>>>> This is consistent with qobject_to_json().  See commit e2ec3f97680.
>>>>
>>>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>>>> always escaped. RFC 7159 does not require it to be escaped on input,
>>
>> Weird, isn't it?
>>
>>>> but I wonder if any of your earlier testsuite improvements should
>>>> specifically cover \x7f vs. \u007f on input being canonicalized to
>>>> \u007f on round trip output.
>>
>> From utf8_string():
>>
>>          /* 2.2.1  1 byte U+007F */
>>          {
>>              "\x7F",
>>              "\x7F",
>>              "\\u007F",
>>          },
>>
>> We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
>> that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
>> that JSON (expecting the C string again).  Sufficient?
>
> Indeed, looks like we have coverage of DEL thanks to the bounds
> testing of various interesting UTF-8 inflection points.
>
>>>>> +++ b/qobject/json-lexer.c
>>>>> @@ -93,7 +93,7 @@
>>>>>     *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>>>     *
>>>>>     * Note:
>>>>> - * - Input must be encoded in UTF-8.
>>>>> + * - Input must be encoded in modified UTF-8.
>>>>
>>>> Worth documenting this in the QMP doc as an explicit extension?
>>
>> qmp-spec.txt:
>>
>>      The server expects its input to be encoded in UTF-8, and sends its
>>      output encoded in ASCII.
>>
>> The obvious update would be to stick in "modified".
>>
>>>>                                                                  In
>>>> general, our QMP interfaces that take binary input do so via base64
>>>> encoding, rather than via a modified UTF-8 string -
>>
>> Agreed.
>>
>> However, whether QMP has a use for funny characters or not, the JSON
>> parser has to handle them *somehow*.  "Handle" in the broadest possible
>> sense, including "reject".  Not including misbehavior like "crash" and
>> "silently ignore some input following ASCII NUL".
>>
>
>>
>>>                                                      So it's really
>>> only a question of whether our input engine can pass "\x00"
>>> vs. "\\u0000" when we NEED an input NUL, and except for the testsuite,
>>> our QAPI schema never really needs an input NUL.
>>
>> The question is how the JSON parser is to handle "\u0000" escapes in
>> JSON strings and NUL bytes anywhere.
>>
>> The answer for NUL bytes is obvious: reject them just like any other
>> byte <= 0x1F, as required by the RFC.
>
> Yes, rejecting \x00 on input is fine, which leaves only the escape
> inside strings:
>
>>
>> My answer for \u0000 is to treat it as much like any other codepoint <=
>> 0x1F as possible.  Treating it exactly like them isn't possible, because
>> NUL bytes in C strings aren't possible.  However, \xC0\x80 sequences
>> are.
>
> Yes, for internal processing to use modified UTF-8 so that we can
> accept "\\u0000" on input is reasonable. And even nicer if we turn a
> QString containing \xc0\x80 back into "\\u0000" on output, so that our
> use of modified UTF-8 never even escapes to the QMP client (and then
> we don't need a documentation change).

We do, in to_json():

        for (; *ptr; ptr = end) {
--->        cp = mod_utf8_codepoint(ptr, 6, &end);
            switch (cp) {
            [special cases like \n...]
            default:
                if (cp < 0) {
                    cp = 0xFFFD; /* replacement character */
                }
                if (cp > 0xFFFF) {
                    /* beyond BMP; need a surrogate pair */
                    snprintf(buf, sizeof(buf), "\\u%04X\\u%04X",
                             0xD800 + ((cp - 0x10000) >> 10),
                             0xDC00 + ((cp - 0x10000) & 0x3FF));
                } else if (cp < 0x20 || cp >= 0x7F) {
--->                snprintf(buf, sizeof(buf), "\\u%04X", cp);
                } else {
                    buf[0] = cp;
                    buf[1] = 0;
                }
                qstring_append(str, buf);
            }

mod_utf8_codepoint() recognizes \xC0\x80 as codepoint 0.  That's an
ASCII control character, and those get emitted using format \\u%04X.
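To make the round trip concrete, here is a minimal standalone sketch, not QEMU code. mod_utf8_encode() and format_codepoint() are hypothetical names. It assumes, as described in this exchange, that the only departure from standard UTF-8 is encoding U+0000 as the overlong pair C0 80, and it omits the special cases such as \n that to_json() handles before its default: branch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Input side: encode codepoint cp into buf as modified UTF-8.
 * U+0000 becomes C0 80, so the result never contains a NUL byte
 * and can live in a C string.  Returns the number of bytes written.
 */
static int mod_utf8_encode(int cp, unsigned char buf[5])
{
    int n;

    assert(cp >= 0 && cp <= 0x10FFFF);
    assert(cp < 0xD800 || cp > 0xDFFF);     /* no lone surrogates */

    if (cp == 0) {
        buf[0] = 0xC0;
        buf[1] = 0x80;
        n = 2;
    } else if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    buf[n] = 0;
    return n;
}

/* Output side: mirrors the default: branch of to_json() quoted above. */
static void format_codepoint(int cp, char *buf, size_t size)
{
    if (cp < 0) {
        cp = 0xFFFD;                        /* replacement character */
    }
    if (cp > 0xFFFF) {
        /* beyond the BMP: emit a surrogate pair */
        snprintf(buf, size, "\\u%04X\\u%04X",
                 0xD800 + ((cp - 0x10000) >> 10),
                 0xDC00 + ((cp - 0x10000) & 0x3FF));
    } else if (cp < 0x20 || cp >= 0x7F) {
        /* control characters (U+0000 included), DEL, non-ASCII */
        snprintf(buf, size, "\\u%04X", cp);
    } else {
        buf[0] = (char)cp;
        buf[1] = '\0';
    }
}
```

Since mod_utf8_codepoint() decodes C0 80 back to 0, a QString holding it leaves QEMU as "\u0000", so the client never sees modified UTF-8.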

>>
>> Reasonably simple and consistent, don't you think?
>>


* Re: [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse()
  2018-08-13 15:32   ` Eric Blake
@ 2018-08-14  6:17     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  6:17 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> json_parser_parse() normally returns the QObject on success.  Except
>> it returns null when its @tokens argument is null.
>>
>> Its only caller json_message_process_token() passes null @tokens when
>> emitting a lexical error.  The call is a rather opaque way to say json
>> = NULL then.
>>
>> Simplify matters by lifting the assignment to json out of the emit
>> path: initialize json to null, set it to the value of
>> json_parser_parse() when there's no lexical error.  Drop the special
>> case from json_parser_parse().
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-parser.c   |  4 ----
>>   qobject/json-streamer.c | 25 ++++++++++++-------------
>>   2 files changed, 12 insertions(+), 17 deletions(-)
>>
>
> Shorter and simpler.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
>> +++ b/qobject/json-streamer.c
>> @@ -39,9 +39,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
>>                                   JSONTokenType type, int x, int y)
>>   {
>>       JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
>> +    QObject *json = NULL;
>>       Error *err = NULL;
>>       JSONToken *token;
>> -    QObject *json;
>
> Why the churn in position?

I like to put declarations with initializers before declarations without
initializers.  ObMovieQuote: "it's a symbol of my individuality, and my
belief in personal freedom."


* Re: [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL
  2018-08-13 15:34   ` Eric Blake
@ 2018-08-14  6:28     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  6:28 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The JSON parser optionally supports interpolation.  The code calls it
>> "escape".  Awkward, because it uses the same term for escape sequences
>> within strings.  The latter usage is consistent with RFC 7159 "The
>> JavaScript Object Notation (JSON) Data Interchange Format" and ISO C.
>> Call the former "interpolation" instead.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   include/qapi/qmp/json-lexer.h |  2 +-
>>   qobject/json-lexer.c          | 64 +++++++++++++++++------------------
>>   qobject/json-parser.c         |  8 ++---
>>   3 files changed, 37 insertions(+), 37 deletions(-)
>
> Mechanical, and a worthwhile name change.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> Bike-shedding: Would INTERP (short for interpolate) be any more
> legible than INTERPOL (which I first read as short for 'international
> police')?

Ah, where's the fun in that!

When I read INTERP, I associate "interpreter".  On the other hand, there
appears to be precedent for abbreviating "interpolate" /
"interpolation" to "interp" in numpy and MATLAB.

Another possible abbreviation would be IPOLATE.


* Re: [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error
  2018-08-13 15:48   ` Eric Blake
@ 2018-08-14  6:51     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  6:51 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The JSON parser optionally supports interpolation.  The lexer
>> recognizes interpolation tokens unconditionally.  The parser rejects
>> them when interpolation is disabled, in parse_interpolation().
>> However, it neglects to set an error then, which can make
>> json_parser_parse() fail without setting an error.
>>
>> Move the check for unwanted interpolation from the parser's
>> parse_interpolation() into the lexer's finite state machine.  When
>> interpolation is disabled, '%' is now handled like any other
>> unexpected character.
>>
>> The next commit will improve how such lexical errors are handled.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   include/qapi/qmp/json-lexer.h |  4 ++--
>>   qobject/json-lexer.c          | 42 ++++++++++++++++++++++++++---------
>>   qobject/json-parser.c         |  4 ----
>>   qobject/json-streamer.c       |  2 +-
>>   tests/qmp-test.c              |  4 ++++
>>   5 files changed, 39 insertions(+), 17 deletions(-)
>>
>
>
>> @@ -271,17 +272,38 @@ static const uint8_t json_lexer[][256] =  {
>>           [','] = JSON_COMMA,
>>           [':'] = JSON_COLON,
>>           ['a' ... 'z'] = IN_KEYWORD,
>> +        [' '] = IN_WHITESPACE,
>> +        ['\t'] = IN_WHITESPACE,
>> +        ['\r'] = IN_WHITESPACE,
>> +        ['\n'] = IN_WHITESPACE,
>> +    },
>> +
>> +    [IN_START_INTERPOL] = {
>> +        ['"'] = IN_DQ_STRING,
> ...
>
>> +        ['\n'] = IN_WHITESPACE,
>> +        /* matches IN_START up to here */
>>           ['%'] = IN_INTERPOL,
>
> You could compress this as:
>
> [IN_START_INTERPOL ... IN_START] = {
>    ['"'] = ...
>    ['\n'] = ...
> },
> [IN_START_INTERPOL]['%'] = IN_INTERPOL,
>
> rather than duplicating the common list twice. (We already exploit
> gcc's range initialization, and the fact that you can initialize a
> broader range and then re-initialize a more specific subset later)

Neat!

It'll lose some of its charm in PATCH 52, but enough remains for me to
like it.
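Eric's compression relies on a GNU C range designator ([first ... last]) plus the standard C rule that a later designator overrides only the subobject it names, keeping the rest of an element already initialized by a braced list. A minimal standalone sketch with hypothetical state names (ST_*, not the real lexer's), assuming GCC or Clang; Clang may emit an -Winitializer-overrides warning for the deliberate override.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical states; 0 doubles as the error state, like IN_ERROR. */
enum {
    ST_ERROR = 0,
    ST_DQ_STRING,
    ST_WHITESPACE,
    ST_INTERPOL,
    ST_START_INTERPOL,      /* must be ST_START - 1 for the range below */
    ST_START,
    ST_COUNT
};

static const uint8_t lexer[ST_COUNT][256] = {
    /* Transitions shared by both start states, written only once. */
    [ST_START_INTERPOL ... ST_START] = {
        ['"'] = ST_DQ_STRING,
        [' '] = ST_WHITESPACE,
        ['\t'] = ST_WHITESPACE,
        ['\r'] = ST_WHITESPACE,
        ['\n'] = ST_WHITESPACE,
    },
    /* Only the interpolating start state accepts '%'. */
    [ST_START_INTERPOL]['%'] = ST_INTERPOL,
};
```

In ST_START, '%' keeps the zero default and thus lands in the error state, which is how unwanted interpolation becomes a lexical error.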

>> +++ b/tests/qmp-test.c
>> @@ -94,6 +94,10 @@ static void test_malformed(QTestState *qts)
>>         /* lexical error: interpolation */
>>       qtest_qmp_send_raw(qts, "%%p\n");
>> +    /* two errors, one for "%", one for "p" */
>> +    resp = qtest_qmp_receive(qts);
>> +    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
>> +    qobject_unref(resp);
>>       resp = qtest_qmp_receive(qts);
>>       g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
>>       qobject_unref(resp);
>
> I'm impressed at how easily you got the lexer to parse two different
> token grammars, and agree that doing it in the lexer when we don't
> want interpolation is a nicer place.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Well, what you see here isn't my first attempt to make the lexer do it,
it's the one I finally liked :)

Thanks!


* Re: [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser
  2018-08-13 16:12   ` Eric Blake
@ 2018-08-14  7:23     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  7:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Both lexer and parser reject invalid interpolation specifications.
>> The parser's check is useless.
>>
>> The lexer ends the token right after the first bad character.  This
>> tends to lead to suboptimal error reporting.  For instance, input
>>
>>      [ %11d ]
>
> With no context of other characters on this line, it took me a while
> to notice that this was '1' (one) not 'l' (ell), even though my
> current font renders the two quite distinctly. (And now you know why
> base32 avoids 0/1 in its alphabet, to minimize confusion with O/l -
> see RFC 4648).  A better example might be %22d.

I made up an example that could plausibly escape review.  I guess the
realism is too... realistic.  Also, not really helpful.

>> produces the tokens
>>
>>      JSON_LSQUARE  [
>>      JSON_ERROR    %1
>>      JSON_INTEGER  1
>
> And that's in spite of the context you have here, which makes it
> obvious that the parser saw an integer.
>
>>      JSON_KEYWORD  d
>>      JSON_RSQUARE  ]
>>
>> The parser then yields an error, an object and two more errors:
>>
>>      error: Invalid JSON syntax
>>      object: 1
>>      error: JSON parse error, invalid keyword
>>      error: JSON parse error, expecting value
>>
>> Change the lexer to accept [A-Za-z0-9]*[duipsf].  It now produces
>
> That regex doesn't match the code.

Left over from a previous, unpublished iteration.  I'll fix it.

>>
>>      JSON_LSQUARE  [
>>      JSON_INTERPOLATION %11d
>>      JSON_RSQUARE  ]
>>
>> and the parser reports just
>>
>>      JSON parse error, invalid interpolation '%11d'
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-lexer.c  | 52 +++++++++----------------------------------
>>   qobject/json-parser.c |  1 +
>>   2 files changed, 11 insertions(+), 42 deletions(-)
>>
>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>> index 0ea1eae4aa..7a82aab88b 100644
>> --- a/qobject/json-lexer.c
>> +++ b/qobject/json-lexer.c
>> @@ -93,7 +93,8 @@
>>    *   (apostrophe) instead of %x22 (quotation mark), and can't contain
>>    *   unescaped apostrophe, but can contain unescaped quotation mark.
>>    * - Interpolation:
>> - *   interpolation = %((l|ll|I64)[du]|[ipsf])
>> + *   The lexer accepts [A-Za-z0-9]*, and leaves rejecting invalid ones
>> + *   to the parser.
>
> This comment is more apropos.  But is it worth spelling "The lexer
> accepts %[A-Za-z0-9]*",

Yes.

>                         and/or documenting that recognizing
> interpolation during lexing is now optional (thanks to the previous
> patch)?

Should be done right when I make it optional, in PATCH 37, perhaps like
this:

@@ -92,7 +92,7 @@
  *   Like double-quoted strings, except they're delimited by %x27
  *   (apostrophe) instead of %x22 (quotation mark), and can't contain
  *   unescaped apostrophe, but can contain unescaped quotation mark.
- * - Interpolation:
+ * - Interpolation, if enabled:
  *   interpolation = %((l|ll|I64)[du]|[ipsf])
  *
  * Note:

>> @@ -278,6 +238,14 @@ static const uint8_t json_lexer[][256] =  {
>>           ['\n'] = IN_WHITESPACE,
>>       },
>>
>> +    /* interpolation */
>> +    [IN_INTERPOL] = {
>> +        TERMINAL(JSON_INTERPOL),
>> +        ['A' ... 'Z'] = IN_INTERPOL,
>> +        ['a' ... 'z'] = IN_INTERPOL,
>> +        ['0' ... '9'] = IN_INTERPOL,
>> +    },
>> +
>
> Note that we still treat code like "'foo': %#x" as an invalid
> interpolation request (it would be a valid printf format), while at
> the same time passing "%1" on to the parser (which is not a valid
> printf format, so -Wformat would have flagged it).  In fact, "%d1"
> which is valid for printf as "%d" followed by a literal "1" gets
> thrown to the parser as one chunk - but it is not valid in JSON to
> have %d adjacent to another integer any more than it is to reject
> "%d1" as an unknown sequence.  I don't think that catering to the
> remaining printf metacharacters (' ', '#', '\'', '-', '*', '+') nor
> demanding that things end on a letter is worth the effort, since it
> just makes the lexer more complicated when your goal was to do as
> little as possible to get things thrown over to the parser.

Yes.

The goal is to catch all mistakes safely, and as many as practical at
compile time.

Invalid interpolation specifications get rejected at run time, by
parse_interpolation().  We can't check them at compile time.

We can't check that the variadic arguments match the interpolation
specifications at run time.  We can enlist gcc to check at compile time.
The only remaining hole is '%' in strings.  I intend to plug it.
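The split described above can be sketched as two standalone C functions: a lexer step that accepts %[A-Za-z0-9]* without judging it, and a parser step that compares the whole token against the supported conversions. Both names are hypothetical. The list of conversions is an assumption taken from the grammar comment %((l|ll|I64)[du]|[ipsf]), with I64 replaced by PRId64/PRIu64 as PATCH 40 does; in QEMU the real check is parse_interpolation().

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

static int is_interpol_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

/*
 * Lexer side: '%' followed by any run of [A-Za-z0-9].  No validation.
 * Returns the token length including '%', or 0 if p doesn't start one.
 */
static size_t lex_interpolation(const char *p)
{
    size_t n = 1;

    if (p[0] != '%') {
        return 0;
    }
    while (is_interpol_char(p[n])) {
        n++;
    }
    return n;
}

/* Parser side: the whole token must match one supported conversion. */
static int interpolation_is_valid(const char *tok, size_t len)
{
    static const char *const valid[] = {
        "%p", "%i", "%s", "%f",
        "%d", "%ld", "%lld", "%" PRId64,    /* "%I64d" on mingw */
        "%u", "%lu", "%llu", "%" PRIu64,
    };
    size_t i;

    for (i = 0; i < sizeof(valid) / sizeof(valid[0]); i++) {
        if (strlen(valid[i]) == len && !memcmp(valid[i], tok, len)) {
            return 1;
        }
    }
    return 0;
}
```

With this split, "[ %11d ]" yields a single bad token "%11d" and one error, and "%d1" reaches the parser as one chunk, as Eric notes.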

> So even though your lexing is now somewhat different from printf, I
> don't see that as a serious drawback (the uses we care about are still
> -Wformat clean, and the uses we don't like are properly handled in the
> parser).

Exactly.

> Perhaps you want to enhance the testsuite (earlier in the series) to
> cover "%22d", "%d1", "%1" as various unaccepted interpolation requests
> (if you didn't already).

We don't have negative tests so far.

From a white box point of view, one negative test certainly makes sense,
to cover parse_interpolation()'s error path.  Multiple ones not so much;
it's all the same to parse_interpolation().

> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64
  2018-08-13 16:18   ` Eric Blake
@ 2018-08-14  7:24     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  7:24 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Support for %I64d got addded in commit 2c0d4b36e7f "json: fix PRId64
>
> s/addded/added/

Fixing...

>> on Win32".  We had to hard-code I64d because we used the lexer's
>> finite state machine to check interpolations.  No more, so clean this
>> up.
>>
>> Additional conversion specifications would be easy enough to implement
>> when needed.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-parser.c | 10 ++++++----
>>   tests/check-qjson.c   | 10 ++++++++++
>>   2 files changed, 16 insertions(+), 4 deletions(-)
>>
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
>>          return QOBJECT(qnum_from_int(va_arg(*ap, long)));
>> -    } else if (!strcmp(token->str, "%lld") ||
>> -               !strcmp(token->str, "%I64d")) {
>> +    } else if (!strcmp(token->str, "%lld")) {
>>          return QOBJECT(qnum_from_int(va_arg(*ap, long long)));
>> +    } else if (!strcmp(token->str, "%" PRId64)) {
>> +        return QOBJECT(qnum_from_int(va_arg(*ap, int64_t)));
>
> I had a double-take to make sure this still works on mingw. The trick
> used to be that the lexer had to parse the union of all forms
> understood by any libc (making Linux understand %I64d even though only
> mingw would generate it) then the parser had to accept all forms
> allowed through by the lexer.  Now the lexer accepts all forms with no
> effort (because it no longer validates), and the parser is made
> stricter (%I64d no longer works on Linux, where we have two redundant
> 'if' clauses; but mingw has two distinct 'if' clauses and works as
> desired).

Exactly.  Thanks for checking!


* Re: [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero
  2018-08-13 16:33   ` Eric Blake
@ 2018-08-14  8:24     ` Markus Armbruster
  2018-08-14 13:14       ` Eric Blake
  0 siblings, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-14  8:24 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> For input 0123, the lexer produces the tokens
>>
>>      JSON_ERROR    01
>>      JSON_INTEGER  23
>>
>> Reporting an error is correct; 0123 is invalid according to RFC 7159.
>> But the error recovery isn't nice.
>>
>> Make the finite state machine eat digits before going into the error
>> state.  The lexer now produces
>>
>>      JSON_ERROR    0123
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-lexer.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>
>> @@ -158,10 +159,14 @@ static const uint8_t json_lexer[][256] =  {
>>       /* Zero */
>>       [IN_ZERO] = {
>>           TERMINAL(JSON_INTEGER),
>> -        ['0' ... '9'] = IN_ERROR,
>> +        ['0' ... '9'] = IN_BAD_ZERO,
>>           ['.'] = IN_MANTISSA,
>>       },
>>
>> +    [IN_BAD_ZERO] = {
>> +        ['0' ... '9'] = IN_BAD_ZERO,
>> +    },
>> +
>
> Should IN_BAD_ZERO also consume '.' and/or 'e' (after all, '01e2' is a
> valid C constant, but not a valid JSON literal)?  But I think your
> choice here is fine (again, add too much, and then the lexer has to
> track a lot of state; whereas this minimal addition catches the most
> obvious things with little effort).

My patch is of marginal value to begin with.  It improves error recovery
only for the "integer with redundant leading zero" case.  I guess that's
more common than "floating-point with redundant leading zero".

An obvious way to extend it to "number with redundant leading zero"
would be cloning the lexer states related to numbers.  Clean, but six
more states.  Meh.

Another way is to dumb down the lexer not to care about leading zero,
and catch it in parse_literal() instead.  Basically duplicates the lexer
state machine up to where leading zero is recognized.  Not much code,
but meh.

Yet another way is to have the lexer eat "digit salad" after redundant
leading zero:

    [IN_BAD_ZERO] = {
        ['0' ... '9'] = IN_BAD_ZERO,
        ['.'] = IN_BAD_ZERO,
        ['e'] = IN_BAD_ZERO,
        ['-'] = IN_BAD_ZERO,
        ['+'] = IN_BAD_ZERO,
    },

Eats even crap like 01e...  But then the same crap without leading zero
should also be eaten.  We'd have to add state transitions to IN_BAD_ZERO
to the six states related to numbers.  Perhaps clever use of gcc's range
initialization lets us do this compactly.  Otherwise, meh again.

Opinions?

I'd also be fine with dropping this patch.

> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero
  2018-08-14  8:24     ` Markus Armbruster
@ 2018-08-14 13:14       ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-14 13:14 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/14/2018 03:24 AM, Markus Armbruster wrote:
>> Should IN_BAD_ZERO also consume '.' and/or 'e' (after all, '01e2' is a
>> valid C constant, but not a valid JSON literal)?  But I think your
>> choice here is fine (again, add too much, and then the lexer has to
>> track a lot of state; whereas this minimal addition catches the most
>> obvious things with little effort).
> 
> My patch is of marginal value to begin with.  It improves error recovery
> only for the "integer with redundant leading zero" case.  I guess that's
> more common than "floating-point with redundant leading zero".

Yes, that was my thought.

> 
> An obvious way to extend it to "number with redundant leading zero"
> would be cloning the lexer states related to numbers.  Clean, but six
> more states.  Meh.
> 
> Another way is to dumb down the lexer not to care about leading zero,
> and catch it in parse_literal() instead.  Basically duplicates the lexer
> state machine up to where leading zero is recognized.  Not much code,
> but meh.
> 
> Yet another way is to have the lexer eat "digit salad" after redundant
> leading zero:
> 
>      [IN_BAD_ZERO] = {
>          ['0' ... '9'] = IN_BAD_ZERO,
>          ['.'] = IN_BAD_ZERO,
>          ['e'] = IN_BAD_ZERO,
>          ['-'] = IN_BAD_ZERO,
>          ['+'] = IN_BAD_ZERO,
>      },
> 
> Eats even crap like 01e...  But then the same crap without leading zero
> should also be eaten.  We'd have to add state transitions to IN_BAD_ZERO
> to the six states related to numbers.  Perhaps clever use of gcc's range
> initialization lets us do this compactly.  Otherwise, meh again.
> 
> Opinions?

Of the various options, the patch as written seems to be the most 
minimal. Maybe tweak the commit message to call out other cases that we 
discussed as not worth doing, because the likelihood of such input is 
dramatically less.

About the only other improvement that might be worth making is adding:

[IN_BAD_ZERO] = {
   ['x'] = IN_BAD_ZERO,
   ['X'] = IN_BAD_ZERO,
   ['a' ... 'f'] = IN_BAD_ZERO,
   ['A' ... 'F'] = IN_BAD_ZERO,
}

to catch people typing JSON by hand and thinking that a hex constant 
will work.  That way, '0x1a' gets parsed as a single JSON_ERROR token, 
rather than four separate tokens of JSON_INTEGER, JSON_KEYWORD, 
JSON_INTEGER, JSON_KEYWORD (where you get the semantic error due to 
JSON_KEYWORD unexpected).
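The patch's behaviour and the hex suggestion can be modelled as a hand-written scanner instead of the lexer's state table. scan_integer() and its eat_hex flag are hypothetical. Entering the error state on x/X straight after the zero is an assumption, since the table above only lists transitions inside IN_BAD_ZERO; limiting that entry to x/X keeps 0e5, which is valid JSON, from being swallowed. Fractions and exponents are not handled in this sketch.

```c
#include <assert.h>
#include <stddef.h>

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Characters a hand-typed hex constant may contain. */
static int is_hex_salad(char c)
{
    return c == 'x' || c == 'X' || is_digit(c) ||
           (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

/*
 * Scan an integer token starting at digit s[0].  Returns its length;
 * *is_error is set when it has a redundant leading zero.
 */
static size_t scan_integer(const char *s, int eat_hex, int *is_error)
{
    size_t n = 1;

    assert(is_digit(s[0]));
    *is_error = 0;

    if (s[0] != '0') {
        /* [1-9][0-9]*: an ordinary integer */
        while (is_digit(s[n])) {
            n++;
        }
        return n;
    }

    /* A lone zero (followed by '.', 'e', ',', ']' ...) is fine. */
    if (!is_digit(s[1]) && !(eat_hex && (s[1] == 'x' || s[1] == 'X'))) {
        return 1;
    }

    /* Like IN_BAD_ZERO: swallow the rest into a single error token. */
    while (is_digit(s[n]) || (eat_hex && is_hex_salad(s[n]))) {
        n++;
    }
    *is_error = 1;
    return n;
}
```

Without eat_hex, "0x1a" stops after the zero, and the remaining "x1a" turns into the separate tokens described above.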

> 
> I'd also be fine with dropping this patch.

No, I like it, especially if you also like my suggestion to make error 
reporting on hex numbers resemble error reporting on octal numbers.

> 
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> Thanks!
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values Markus Armbruster
@ 2018-08-14 13:26   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-14 13:26 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> qobject_from_json() & friends use the consume_json() callback to
> receive either a value or an error from the parser.
> 
> When they are fed a string that contains more than either one JSON
> value or one JSON syntax error, consume_json() gets called multiple
> times.
> 
> When the last call receives a value, qobject_from_json() returns that
> value.  Any other values are leaked.
> 
> When any call receives an error, qobject_from_json() sets the first
> error received.  Any other errors are thrown away.
> 
> When values follow errors, qobject_from_json() returns both a value
> and sets an error.  That's bad.  Impact:
> 
> * block.c's parse_json_protocol() ignores and leaks the value.  It's
>    used to parse pseudo-filenames starting with "json:".  The
>    pseudo-filenames can come from the user or from image meta-data such
>    as a QCOW2 image's backing file name.

Fortunately, I don't think this falls in the category of a 
denial-of-service attack worthy of a CVE (memory leaks in long-running 
processes are bad, but to be an escalation attack, you'd have to 
convince someone with more rights to repeatedly reload your malicious 
image to cause them to suffer from the leak - but why would they load 
your image more than once? You can reload it yourself, but then you are 
only killing your own qemu so there is no escalation of privilege).


> 
> * vl.c's parse_display_qapi() ignores and leaks the error.  It's used
>    to parse the argument of command line option -display.
> 
> * vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves
>    it in @err.  main() will then pass a pointer to a non-null Error *
>    to net_init_clients(), which is forbidden.  It can lead to assertion
>    failure or other misbehavior.
> 
> * check-qjson.c's multiple_values() demonstrates the badness.
> 
> * The other callers are not affected since they only pass strings with
>    exactly one JSON value or, in the case of negative tests, one
>    error.
> 
> The impact on the _nofail() functions is relatively harmless.  They
> abort when any call receives an error.  Else they return the last
> value, and leak the others, if any.
> 
> Fix consume_json() as follows.  On the first call, save value and
> error as before.  On subsequent calls, if any, don't save them.  If
> the first call saved a value, the next call, if any, replaces the
> value by an "Expecting at most one JSON value" error.  Take care not
> to leak values or errors that aren't saved.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/qjson.c     | 15 ++++++++++++++-
>   tests/check-qjson.c | 10 +++-------
>   2 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/qobject/qjson.c b/qobject/qjson.c
> index 7395556069..7f69036487 100644
> --- a/qobject/qjson.c
> +++ b/qobject/qjson.c
> @@ -33,8 +33,21 @@ static void consume_json(void *opaque, QObject *json, Error *err)
>   {
>       JSONParsingState *s = opaque;
>   
> +    assert(!json != !err);
> +    assert(!s->result || !s->err);
> +
> +    if (s->result) {

Reached whether the second item encountered was also an object (json != 
NULL) or an error (err != NULL), but seems reasonable.

> +        qobject_unref(s->result);
> +        s->result = NULL;
> +        error_setg(&s->err, "Expecting at most one JSON value");
> +    }
> +    if (s->err) {
> +        qobject_unref(json);
> +        error_free(err);
> +        return;
> +    }

Worth spelling this clause as error_propagate(&s->err, err)?  Oh, I see 
why you can't - you have to do the early return in the case that (json 
!= NULL) so that you don't assign s->result again if this was a parse of 
a second object.

>       s->result = json;
> -    error_propagate(&s->err, err);
> +    s->err = err;
>   }
>   

Took me a couple of reads to verify the logic makes sense, but it looks 
right.
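The logic takes a couple of reads, so a standalone model may help. This hypothetical sketch uses int * in place of QObject * and a heap string in place of Error *, and mirrors the fixed consume_json(). The first value or error is kept. A second item after a value turns the saved value into the "Expecting at most one JSON value" error. Anything not kept is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: int * for QObject *, char * for Error *. */
typedef struct {
    int *result;
    char *err;
} ParseState;

static char *copy_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, s, n);
    }
    return p;
}

/* Called once per value or error; takes ownership of both arguments. */
static void consume(ParseState *s, int *json, char *err)
{
    assert(!json != !err);          /* exactly one of them is set */
    assert(!s->result || !s->err);  /* never both saved */

    if (s->result) {
        /* anything after a value: replace the value by an error */
        free(s->result);
        s->result = NULL;
        s->err = copy_str("Expecting at most one JSON value");
    }
    if (s->err) {
        /* an error is already saved: drop this item without leaking */
        free(json);
        free(err);
        return;
    }
    s->result = json;
    s->err = err;
}
```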

Reviewed-by: Eric Blake <eblake@redhat.com>



* Re: [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors
  2018-08-10 14:06       ` Eric Blake
@ 2018-08-16 12:44         ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-16 12:44 UTC (permalink / raw)
  To: Eric Blake; +Cc: marcandre.lureau, qemu-devel, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/10/2018 08:52 AM, Markus Armbruster wrote:
>
>>>> +    /* lexical error: impossible byte outside string */
>>>> +    qtest_qmp_send_raw(qts, "{\xFF");
>>>
>>> \xff is an impossible byte inside a string as well; plus it has
>>> special meaning to at least QMP for commanding a parser reset. Is a
>>> better byte more appropriate (maybe \x7f), either in replacement to
>>> \xff or as an additional test?
>>
>> \xFF is documented to have special meaning for QGA, but as far as the
>> code's concerned, it's a lexical error like any other.  I'm fixing the
>> documentation in PATCH 56.  Want me to move that patch to the front of
>> the series?
>
> Might not hurt. We also have a potential design decision to make: for
> most lexical errors, we report the error (with QGA, the user then
> requests that the first valid command after the client's induced
> lexical error also include an 0xff reply byte so that the client can
> easily skip over all the line noise, including said error reports).
> Thus, we COULD decide to make our parser specifically accept 0xff as a
> new token, different from the lexical error token, so that it inhibits
> wasted error messages to the client on the grounds that the client
> sent it on purpose, differently from all other ways the client can use
> a lexical error to cause a reset.

I don't think that's worthwhile.  Let me explain.

I see one and a half use cases.

The full use case is of course QGA synchronization.  Why is that even
necessary?  I believe it's a work-around for transports that fail to
provide proper connection semantics, such as virtio-serial.

When a transport provides it (sockets do), every connection starts with
a clean slate.  The only way it can get de-synchronized is either peer
getting confused enough to send garbage, and then it's probably best to
close the session and start over.  Evidence: we don't bother with
synchronization for QMP, which commonly uses socket transports.

I guess the problem for QGA is reconnecting with virtio-serial.  The
slate isn't clean then.  If the previous connection left some of the
peer's output unread, you'll get that before your own, and it may not be
valid JSON.  Same for the other direction.

To reset the peer's parser, we provoke a lexical error by sending \xFF.

We still have to read and ignore stale peer output.  We do that with the
help of command guest-sync-delimited.  Whether lexical errors caused by
\xFF are suppressed or not makes no appreciable difference.

I therefore think suppressing it is not worth the bother.  What we have
seems good enough.
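The client side of that sequence might look roughly like this standalone sketch. build_sync_request() and skip_stale_output() are hypothetical helpers. The 0xFF sentinel ahead of the guest-sync-delimited reply is how QGA documents that command. A real client would also check that the returned id matches the one it sent, and keep reading if it doesn't.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Request: a 0xFF byte to provoke a lexical error and thereby reset the
 * agent's JSON parser, then guest-sync-delimited with a fresh id.
 */
static int build_sync_request(char *buf, size_t size, long id)
{
    return snprintf(buf, size,
                    "\xFF{\"execute\": \"guest-sync-delimited\", "
                    "\"arguments\": {\"id\": %ld}}\n",
                    id);
}

/*
 * Reply: everything before the 0xFF sentinel is stale output from an
 * earlier session and is discarded.  Returns a pointer just past the
 * sentinel, or NULL if the sentinel hasn't arrived yet.
 */
static const char *skip_stale_output(const char *buf, size_t len)
{
    const char *p = memchr(buf, 0xFF, len);

    return p ? p + 1 : NULL;
}
```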

The half use case is human interactive use.  It's fairly easy to
miscount parentheses and get the peer into some unresponsive state.
Provoking a lexical error lets me start over.  However, I'd rather not
use \xFF to provoke it, because how do you type that?  Sending a
suitable ASCII control character is easier, and will do the trick after
PATCH 17 fixes our JSON parser.


* Re: [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input Markus Armbruster
@ 2018-08-16 13:10   ` Eric Blake
  2018-08-16 15:19     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:10 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> json-parser.c carefully reports end of input like this:
> 
>      token = parser_context_pop_token(ctxt);
>      if (token == NULL) {
> 	parse_error(ctxt, NULL, "premature EOI");
> 	goto out;
>      }

Are the TABs in the commit message intentional?

> 
> Except parser_context_pop_token() can't return null, it fails its
> assertion instead.  Same for parser_context_peek_token().  Broken in
> commit 65c0f1e9558, and faithfully preserved in commit 95385fe9ace.
> Only a latent bug, because the streamer throws away any input that
> could trigger it.
> 
> Drop the assertions, so we can fix the streamer in the next commit.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c | 2 --
>   1 file changed, 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>
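The sketch below makes the intent concrete. Popping from an exhausted queue returns NULL, which makes the caller's "premature EOI" branch reachable. TokenQueue, pop_token() and expect_token() are hypothetical stand-ins, not QEMU's GQueue-based parser context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parser's token queue. */
typedef struct {
    const char **tokens;
    size_t len;
    size_t pos;
} TokenQueue;

/* Return the next token, or NULL once input is exhausted (no assertion). */
static const char *pop_token(TokenQueue *q)
{
    return q->pos < q->len ? q->tokens[q->pos++] : NULL;
}

/* Caller pattern from the commit message: NULL means premature end of
 * input.  Returns 0 on success, -1 on error. */
static int expect_token(TokenQueue *q, const char **out, const char **errmsg)
{
    const char *token = pop_token(q);

    if (token == NULL) {
        *errmsg = "premature EOI";
        return -1;
    }
    *out = token;
    return 0;
}
```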

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures Markus Armbruster
@ 2018-08-16 13:12   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:12 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> json_message_process_token() accumulates tokens until it has the
> sequence of tokens that comprise a single JSON value (it counts curly
> braces and square brackets to decide).  It feeds those token sequences
> to json_parser_parse().  If a non-empty sequence of tokens remains at
> the end of the parse, it's silently ignored.  check-qjson.c cases
> unterminated_array(), unterminated_array_comma(), unterminated_dict(),
> unterminated_dict_comma() demonstrate this bug.
> 
> Fix as follows.  Introduce a JSON_END_OF_INPUT token.  When the
> streamer receives it, it feeds the accumulated tokens to
> json_parser_parse().
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
Nice.

Reviewed-by: Eric Blake <eblake@redhat.com>
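This simplified sketch shows how the streamer's accumulation and the new end-of-input flush fit together. Tokens are reduced to hypothetical kinds, and a counter replaces the parser call. None of these names are QEMU's. Tokens keep accumulating while a structure is open and neither count has gone negative. An end-of-input token flushes whatever is left instead of dropping it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds. */
typedef enum {
    TOK_LCURLY, TOK_RCURLY, TOK_LSQUARE, TOK_RSQUARE,
    TOK_OTHER, TOK_END_OF_INPUT
} TokenKind;

typedef struct {
    int brace_count;
    int bracket_count;
    size_t pending;   /* tokens accumulated but not yet parsed */
    size_t batches;   /* token sequences handed to the "parser" */
} Streamer;

static void flush(Streamer *s)
{
    /* A real streamer would call the parser here, which reports an
     * error for an incomplete sequence. */
    s->batches++;
    s->pending = 0;
    s->brace_count = 0;
    s->bracket_count = 0;
}

static void process_token(Streamer *s, TokenKind kind)
{
    if (kind == TOK_END_OF_INPUT) {
        /* The fix: leftover tokens are flushed, not silently ignored. */
        if (s->pending > 0) {
            flush(s);
        }
        return;
    }

    switch (kind) {
    case TOK_LCURLY:  s->brace_count++;   break;
    case TOK_RCURLY:  s->brace_count--;   break;
    case TOK_LSQUARE: s->bracket_count++; break;
    case TOK_RSQUARE: s->bracket_count--; break;
    default:          break;
    }
    s->pending++;

    /* Still inside an unfinished structure: keep accumulating. */
    if ((s->brace_count > 0 || s->bracket_count > 0)
        && s->brace_count >= 0 && s->bracket_count >= 0) {
        return;
    }
    flush(s);
}
```

For example, "[1," followed by end of input now produces one batch instead of none.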

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success Markus Armbruster
@ 2018-08-16 13:13   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:13 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-parser.c | 1 +
>   1 file changed, 1 insertion(+)

Straightforward, but it took a lot of cleanup in earlier patches to get
here ;)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index c2974d46b3..208dffc96c 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -539,6 +539,7 @@ QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
>       QObject *result;
>   
>       result = parse_value(&ctxt, ap);
> +    assert(ctxt.err || g_queue_is_empty(ctxt.buf));
>   
>       error_propagate(errp, ctxt.err);
>   
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank Markus Armbruster
@ 2018-08-16 13:20   ` Eric Blake
  2018-08-16 15:40     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:20 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The last case where qobject_from_json() & friends return null without
> setting an error is empty or blank input.  Callers:
> 
> * block.c's parse_json_protocol() reports "Could not parse the JSON
>    options".  It's marked as a work-around, because it also covered
>    actual bugs, but they got fixed in the previous few commits.

How would you trigger this? I guess that would be by using the 
pseudo-JSON block specification spelled "json:" rather than the usual
"json:{...}".

> 
> * qobject_input_visitor_new_str() reports "JSON parse error".  Also
>    marked as work-around.  The recent fixes have made this unreachable,
>    because it currently gets called only for input starting with '{'.

Indeed, no triggers to this.

> 
> * check-qjson.c's empty_input() and blank_input() demonstrate the
>    behavior.
> 
> * The other callers are not affected since they only pass input with
>    exactly one JSON value or, in the case of negative tests, one error.

As long as sending back-to-back newlines to QMP does not treat the empty 
line as an error, you should be okay. (If sending two newlines in a row 
now results in a {"error":...} response from the server for the blank 
line, then you've regressed).

> 
> Fail with "Expecting a JSON value" instead of returning null, and
> simplify callers.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Yay for getting rid of the inconsistent error reporting!

Reviewed-by: Eric Blake <eblake@redhat.com>
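This sketch shows the new behavior for blank input. It assumes a front end that skips RFC 7159 whitespace (space, tab, LF, CR) before parsing. check_not_blank() is hypothetical. The error text is the one quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* RFC 7159 whitespace: space, horizontal tab, line feed, carriage return. */
static bool is_json_ws(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical check: fail with an error for empty or blank input
 * instead of returning "no value" without one. */
static bool check_not_blank(const char *s, const char **errmsg)
{
    while (is_json_ws(*s)) {
        s++;
    }
    if (*s == '\0') {
        *errmsg = "Expecting a JSON value";
        return false;
    }
    return true;
}
```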

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly Markus Armbruster
@ 2018-08-16 13:22   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:22 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Token count and size limits exist to guard against excessive heap
> usage.  We check them only after we created the token on the heap.
> That's assigning a cowboy to the barn to lasso the horse after it has
> bolted.  Close the barn door instead: check before we create the
> token.

Love the imagery.

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-streamer.c | 36 ++++++++++++++++++------------------
>   1 file changed, 18 insertions(+), 18 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>
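Here is the "close the barn door" ordering as a sketch. The limit values and names are hypothetical, and QEMU's actual constants and bookkeeping in json-streamer.c may differ. The limits are checked before the token is allocated, so oversized input is rejected without growing the heap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately small limits for illustration. */
#define MAX_TOKEN_SIZE  64
#define MAX_TOKEN_COUNT 4

typedef struct {
    size_t token_count;  /* tokens accumulated so far */
    size_t token_size;   /* total bytes of accumulated tokens */
} Limits;

/* Check the limits first; allocate only when the new token fits. */
static char *make_token(Limits *l, const char *text, size_t len)
{
    char *tok;

    if (l->token_count + 1 > MAX_TOKEN_COUNT
        || l->token_size + len > MAX_TOKEN_SIZE) {
        return NULL;            /* rejected without touching the heap */
    }
    tok = malloc(len + 1);
    if (!tok) {
        return NULL;
    }
    memcpy(tok, text, len);
    tok[len] = '\0';
    l->token_count++;
    l->token_size += len;
    return tok;
}
```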

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token()
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token() Markus Armbruster
@ 2018-08-16 13:40   ` Eric Blake
  2018-08-16 15:42     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:40 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-streamer.c | 13 +++++--------
>   1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
> index 810aae521f..954bf9d468 100644
> --- a/qobject/json-streamer.c
> +++ b/qobject/json-streamer.c
> @@ -99,16 +99,13 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
>   
>       g_queue_push_tail(parser->tokens, token);
>   
> -    if (parser->brace_count < 0 ||
> -        parser->bracket_count < 0 ||

Old: if we are unbalanced (more right tokens read than left)...

> -        (parser->brace_count == 0 &&
> -         parser->bracket_count == 0)) {

...or if we uniformly ended nesting,...

> -        json = json_parser_parse(parser->tokens, parser->ap, &err);

...then parse (to either diagnose the unbalance, or to see if the 
balanced construct is valid), with weird flow control that skips over an 
early return.

Or put another way, if we invert the condition, we find the cases where 
we want an early return instead of parsing (and can thus use that to get 
rid of an unsightly goto over a single early return).

Applying De Morgan's laws:

!(brace < 0 || bracket < 0 || (brace == 0 && bracket == 0))
!(brace < 0) && !(bracket < 0) && !(brace == 0 && bracket == 0)
brace >= 0 && bracket >= 0 && (!(brace == 0) || !(bracket == 0))
brace >= 0 && bracket >= 0 && (brace != 0 || bracket != 0)

But based on what we learned in the first two conjunctions, we can 
rewrite the third:

brace >= 0 && bracket >= 0 && (brace > 0 || bracket > 0)

and then commute the logic:

(brace > 0 || bracket > 0) && brace >= 0 && bracket >= 0

> -        parser->tokens = NULL;
> -        goto out_emit;
> +    if ((parser->brace_count > 0 || parser->bracket_count > 0)
> +        && parser->bracket_count >= 0 && parser->bracket_count >= 0) {

So the new condition is correct, and reads as:

If either struct is still awaiting closure, and both structs have not 
gone unbalanced, then early exit.

It was not intuitive, but stepping through the logic shows it is identical.
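One way to double-check the De Morgan derivation is to compare both conditions over a small grid of counts. The function names in this sketch are hypothetical.

Note that the hunk as quoted appears to test parser->bracket_count >= 0 twice. The derivation calls for brace_count >= 0 && bracket_count >= 0. The quoted form differs from the old code at brace == -1, bracket == 1 (e.g. input "[}"), so the sketch includes it for comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Old code: parse when a count went negative or both are back to zero. */
static bool old_should_parse(int brace, int bracket)
{
    return brace < 0 || bracket < 0 || (brace == 0 && bracket == 0);
}

/* Derived condition: return early while a structure is still open and
 * neither count went negative. */
static bool new_should_return_early(int brace, int bracket)
{
    return (brace > 0 || bracket > 0) && brace >= 0 && bracket >= 0;
}

/* The hunk as quoted, which checks bracket twice. */
static bool quoted_should_return_early(int brace, int bracket)
{
    return (brace > 0 || bracket > 0) && bracket >= 0 && bracket >= 0;
}

/* Count grid points where the new condition is NOT the negation of the
 * old one; zero means the rewrite is equivalent on [lo, hi]^2. */
static int count_mismatches(int lo, int hi)
{
    int mismatches = 0;

    for (int brace = lo; brace <= hi; brace++) {
        for (int bracket = lo; bracket <= hi; bracket++) {
            if (new_should_return_early(brace, bracket)
                == old_should_parse(brace, bracket)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```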

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser Markus Armbruster
@ 2018-08-16 13:42   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:42 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   include/qapi/qmp/json-streamer.h |  2 +-
>   qobject/json-parser.c            |  1 -
>   qobject/json-streamer.c          | 30 +++++++++++-------------------
>   3 files changed, 12 insertions(+), 21 deletions(-)
> 

Nice reduction as a result.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN Markus Armbruster
@ 2018-08-16 13:45   ` Eric Blake
  2018-08-16 15:48     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:45 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   include/qapi/qmp/json-lexer.h | 10 ++++------
>   qobject/json-lexer.c          | 18 ++++++++----------
>   2 files changed, 12 insertions(+), 16 deletions(-)
> 

> @@ -335,8 +334,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
>               json_message_process_token(lexer, lexer->token, JSON_ERROR,
>                                          lexer->x, lexer->y);
>               g_string_truncate(lexer->token, 0);
> -            new_state = lexer->start_state;
> -            lexer->state = new_state;
> +            lexer->state = lexer->start_state;

Does this simplification belong in an earlier patch?

Otherwise,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
@ 2018-08-16 13:51   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:51 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Bonus: static json_lexer[] loses its unused elements.  It shrinks from
> 8KiB to 4.75KiB for me.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   include/qapi/qmp/json-lexer.h |  1 -
>   qobject/json-lexer.c          | 30 +++++++++---------------------
>   2 files changed, 9 insertions(+), 22 deletions(-)
> 

Impacted when rebasing atop my earlier ideas for compressing IN_INTERPOL 
initialization, but overall:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c Markus Armbruster
@ 2018-08-16 13:54   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:54 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   include/qapi/qmp/json-parser.h   |  4 ++++
>   include/qapi/qmp/json-streamer.h |  7 -------
>   qobject/json-parser.c            | 19 +++++++++++++++++++
>   qobject/json-streamer.c          |  8 +-------
>   4 files changed, 24 insertions(+), 14 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h Markus Armbruster
@ 2018-08-16 13:54   ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-16 13:54 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   include/qapi/qmp/json-parser.h | 1 -
>   qobject/json-lexer.c           | 1 -
>   qobject/json-streamer.c        | 1 -
>   qobject/qbool.c                | 1 -
>   qobject/qlist.c                | 1 -
>   qobject/qnull.c                | 1 -
>   qobject/qnum.c                 | 1 -
>   qobject/qobject.c              | 1 -
>   qobject/qstring.c              | 1 -
>   9 files changed, 9 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences
  2018-08-10 15:21       ` Eric Blake
@ 2018-08-16 14:50         ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-16 14:50 UTC (permalink / raw)
  To: Eric Blake; +Cc: marcandre.lureau, qemu-devel, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/10/2018 09:40 AM, Markus Armbruster wrote:
>
>>>> +            cp = mod_utf8_codepoint(ptr, 6, &end);
>>>
>>> Why are you hard-coding 6 here, rather than computing min(6,
>>> strchr(ptr,0)-ptr)?  If the user passes an invalid sequence at the end
>>> of the string, can we end up making mod_utf8_codepoint() read beyond
>>> the end of our string?  Would it be better to just always pass the
>>> remaining string length (mod_utf8_codepoint() only cares about
>>> stopping short of 6 bytes, but never reads beyond there even if you
>>> pass a larger number)?
>>
>> mod_utf8_codepoint() never reads beyond '\0'.  The second parameter
>> exists only so you can further limit reads.  I like to provide that
>> capability, because it sometimes saves a silly substring copy.
>
> Okay. Perhaps the comments on mod_utf8_codepoint() could make that
> more clear that the contract is not violated (I didn't spot it without
> a close re-read of the code, prompted by your reply).  But that's
> possibly a separate patch.

Well, the contract says @s is a string, and that means no access beyond
the terminating null character is permitted.  Perhaps too subtle.  My
contracts often are.
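This sketch illustrates that contract with a hypothetical decode_bounded(), not QEMU's mod_utf8_codepoint(). It reads at most n bytes. Because '\0' is never a valid continuation byte, it stops at the terminator even when n is larger. It is simplified and does no surrogate checks. Like modified UTF-8, it accepts \xC0\x80 as U+0000 and rejects other overlong forms.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical decoder: return the code point at @s, or -1 for an
 * invalid or truncated sequence (or at end of string).  Reads at most
 * @n bytes and never past a terminating '\0'.  *end is set past the
 * bytes consumed.
 */
static long decode_bounded(const char *s, size_t n, const char **end)
{
    const unsigned char *p = (const unsigned char *)s;
    static const long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    long cp;

    if (n == 0 || p[0] == 0) {
        *end = s;
        return -1;
    }
    if (p[0] < 0x80) {
        *end = s + 1;
        return p[0];
    } else if ((p[0] & 0xE0) == 0xC0) {
        len = 2;
        cp = p[0] & 0x1F;
    } else if ((p[0] & 0xF0) == 0xE0) {
        len = 3;
        cp = p[0] & 0x0F;
    } else if ((p[0] & 0xF8) == 0xF0) {
        len = 4;
        cp = p[0] & 0x07;
    } else {
        *end = s + 1;           /* stray continuation byte or 0xF8..0xFF */
        return -1;
    }

    for (i = 1; i < len; i++) {
        /* Bound first, then content: '\0' fails the continuation test,
         * so nothing beyond the terminator is ever read. */
        if (i >= n || (p[i] & 0xC0) != 0x80) {
            *end = s + i;
            return -1;
        }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *end = s + len;

    if (cp < min_cp[len] && !(len == 2 && cp == 0)) {
        return -1;              /* overlong, except \xC0\x80 for U+0000 */
    }
    if (cp > 0x10FFFF) {
        return -1;
    }
    return cp;
}
```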

[...]


* Re: [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input
  2018-08-16 13:10   ` Eric Blake
@ 2018-08-16 15:19     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-16 15:19 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> json-parser.c carefully reports end of input like this:
>>
>>      token = parser_context_pop_token(ctxt);
>>      if (token == NULL) {
>> 	parse_error(ctxt, NULL, "premature EOI");
>> 	goto out;
>>      }
>
> Are the TABs in the commit message intentional?

No.  Suspect a paste accident.  Fixing...

>> Except parser_context_pop_token() can't return null, it fails its
>> assertion instead.  Same for parser_context_peek_token().  Broken in
>> commit 65c0f1e9558, and faithfully preserved in commit 95385fe9ace.
>> Only a latent bug, because the streamer throws away any input that
>> could trigger it.
>>
>> Drop the assertions, so we can fix the streamer in the next commit.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-parser.c | 2 --
>>   1 file changed, 2 deletions(-)
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank
  2018-08-16 13:20   ` Eric Blake
@ 2018-08-16 15:40     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-16 15:40 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The last case where qobject_from_json() & friends return null without
>> setting an error is empty or blank input.  Callers:
>>
>> * block.c's parse_json_protocol() reports "Could not parse the JSON
>>    options".  It's marked as a work-around, because it also covered
>>    actual bugs, but they got fixed in the previous few commits.
>
> How would you trigger this?

$ qemu-system-x86_64 json:{}
qemu-system-x86_64: Must specify either driver or file
$ qemu-system-x86_64 json:
qemu-system-x86_64: Could not parse the JSON options

>                             I guess that would be by using the
> pseudo-JSON block specification spelled "json:" rather than the usual
> "json:{...}".
>
>>
>> * qobject_input_visitor_new_str() reports "JSON parse error".  Also
>>    marked as work-around.  The recent fixes have made this unreachable,
>>    because it currently gets called only for input starting with '{'.
>
> Indeed, no triggers to this.
>
>>
>> * check-qjson.c's empty_input() and blank_input() demonstrate the
>>    behavior.
>>
>> * The other callers are not affected since they only pass input with
>>    exactly one JSON value or, in the case of negative tests, one error.
>
> As long as sending back-to-back newlines to QMP does not treat the
> empty line as an error, you should be okay. (If sending two newlines
> in a row now results in a {"error":...} response from the server for
> the blank line, then you've regressed).

QMP doesn't parse with qobject_from_json(), so it isn't affected.

Permit me a digression on newlines.

Newlines are like any other whitespace.  Whitespace can be necessary to
make the lexer emit a token.  For instance, sending "123" without a
newline to QMP does not produce a reply.  The lexer is in state
IN_DIGITS then.  You can make it go to JSON_INTEGER and emit the token
by sending a newline.  This produces a reply.

This doesn't match a (naive?) interactive user's mental model of newline.
When such a user hits newline, he expects a reply.  If he doesn't get
one, say because he miscounted his curlies, confusion ensues.

A better designed protocol would avoid that trap.
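The digression is easier to see with a deliberately tiny lexer sketch. This hypothetical lexer only knows unsigned integers and skips everything else, so it is far simpler than the table-driven json-lexer.c. It emits a number token only when a character arrives that cannot extend the number. That is why "123" alone produces nothing until the newline.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { LEX_START, LEX_IN_DIGITS } LexState;

typedef struct {
    LexState state;
    size_t tokens_emitted;   /* stand-in for handing tokens on */
} MiniLexer;

static void lexer_feed_char(MiniLexer *lx, char ch)
{
    int is_digit = isdigit((unsigned char)ch);

    if (lx->state == LEX_IN_DIGITS && !is_digit) {
        /* Only a character that cannot extend the number ends it, so
         * the integer token is emitted now, not at its last digit. */
        lx->tokens_emitted++;
        lx->state = LEX_START;
    }
    if (is_digit) {
        lx->state = LEX_IN_DIGITS;
    }
    /* Anything else in LEX_START (e.g. whitespace) is skipped here. */
}

static void lexer_feed(MiniLexer *lx, const char *s)
{
    for (; *s; s++) {
        lexer_feed_char(lx, *s);
    }
}
```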

>> Fail with "Expecting a JSON value" instead of returning null, and
>> simplify callers.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>
> Yay for getting rid of the inconsistent error reporting!
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token()
  2018-08-16 13:40   ` Eric Blake
@ 2018-08-16 15:42     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-16 15:42 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-streamer.c | 13 +++++--------
>>   1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
>> index 810aae521f..954bf9d468 100644
>> --- a/qobject/json-streamer.c
>> +++ b/qobject/json-streamer.c
>> @@ -99,16 +99,13 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
>>   
>>       g_queue_push_tail(parser->tokens, token);
>>   
>> -    if (parser->brace_count < 0 ||
>> -        parser->bracket_count < 0 ||
>
> Old: if we are unbalanced (more right tokens read than left)...
>
>> -        (parser->brace_count == 0 &&
>> -         parser->bracket_count == 0)) {
>
> ...or if we uniformly ended nesting,...
>
>> -        json = json_parser_parse(parser->tokens, parser->ap, &err);
>
> ...then parse (to either diagnose the unbalance, or to see if the
> balanced construct is valid), with weird flow control that skips over
> an early return.
>
> Or put another way, if we invert the condition, we find the cases
> where we want an early return instead of parsing (and can thus use
> that to get rid of an unsightly goto over a single early return).
>
> Applying De Morgan's laws:
>
> !(brace < 0 || bracket < 0 || (brace == 0 && bracket == 0))
> !(brace < 0) && !(bracket < 0) && !(brace == 0 && bracket == 0)
> brace >= 0 && bracket >= 0 && (!(brace == 0) || !(bracket == 0))
> brace >= 0 && bracket >= 0 && (brace != 0 || bracket != 0)
>
> But based on what we learned in the first two conjunctions, we can
> rewrite the third:
>
> brace >= 0 && bracket >= 0 && (brace > 0 || bracket > 0)
>
> and then commute the logic:
>
> (brace > 0 || bracket > 0) && brace >= 0 && bracket >= 0
>
>> -        parser->tokens = NULL;
>> -        goto out_emit;
>> +    if ((parser->brace_count > 0 || parser->bracket_count > 0)
>> +        && parser->bracket_count >= 0 && parser->bracket_count >= 0) {
>
> So the new condition is correct, and reads as:
>
> If either struct is still awaiting closure, and both structs have not
> gone unbalanced, then early exit.
>
> It was not intuitive, but stepping through the logic shows it is identical.

My first version had a "simpler" condition there.  My test cases proved
it wrong %-}

> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN
  2018-08-16 13:45   ` Eric Blake
@ 2018-08-16 15:48     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-16 15:48 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   include/qapi/qmp/json-lexer.h | 10 ++++------
>>   qobject/json-lexer.c          | 18 ++++++++----------
>>   2 files changed, 12 insertions(+), 16 deletions(-)
>>
>
>> @@ -335,8 +334,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
>>               json_message_process_token(lexer, lexer->token, JSON_ERROR,
>>                                          lexer->x, lexer->y);
>>               g_string_truncate(lexer->token, 0);
>> -            new_state = lexer->start_state;
>> -            lexer->state = new_state;
>> +            lexer->state = lexer->start_state;
>
> Does this simplification belong in an earlier patch?

Hmm, PATCH 37 would be a better fit indeed.

> Otherwise,
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 55/56] json: Clean up headers
  2018-08-08 12:03 ` [Qemu-devel] [PATCH 55/56] json: Clean up headers Markus Armbruster
@ 2018-08-16 17:50   ` Eric Blake
  2018-08-17  8:22     ` Markus Armbruster
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Blake @ 2018-08-16 17:50 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: marcandre.lureau, mdroth

On 08/08/2018 07:03 AM, Markus Armbruster wrote:
> The JSON parser has three public headers, json-lexer.h, json-parser.h,
> json-streamer.h.  They all contain stuff that is of no interest
> outside qobject/json-*.c.
> 
> Collect the public interface in include/qapi/qmp/json-parser.h, and
> everything else in qobject/json-parser-int.h.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Nice separation.

>   10 files changed, 51 insertions(+), 76 deletions(-)
>   delete mode 100644 include/qapi/qmp/json-streamer.h
>   rename include/qapi/qmp/json-lexer.h => qobject/json-parser-int.h (62%)


> 
> diff --git a/include/qapi/qmp/json-parser.h b/include/qapi/qmp/json-parser.h
> index 55f75954c3..7345a9bd5c 100644
> --- a/include/qapi/qmp/json-parser.h
> +++ b/include/qapi/qmp/json-parser.h
> @@ -1,5 +1,5 @@
>   /*
> - * JSON Parser
> + * JSON Parser

I'm not sure what git tried to flag here.

Otherwise, looks like a good reorganization.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
  2018-08-13  7:00       ` Markus Armbruster
  2018-08-13 14:57         ` Eric Blake
@ 2018-08-17  7:18         ` Markus Armbruster
  1 sibling, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-17  7:18 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Eric Blake, qemu-devel, marcandre.lureau, mdroth

Markus Armbruster <armbru@redhat.com> writes:

> Eric Blake <eblake@redhat.com> writes:
>
>> On 08/10/2018 10:48 AM, Eric Blake wrote:
>>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>>> This is consistent with qobject_to_json().  See commit e2ec3f97680.
>>>
>>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>>> always escaped. RFC 7159 does not require it to be escaped on input,
>
> Weird, isn't it?
>
>>> but I wonder if any of your earlier testsuite improvements should
>>> specifically cover \x7f vs. \u007f on input being canonicalized to
>>> \u007f on round trip output.
>
> From utf8_string():
>
>         /* 2.2.1  1 byte U+007F */
>         {
>             "\x7F",
>             "\x7F",
>             "\\u007F",
>         },
>
> We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
> that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
> that JSON (expecting the C string again).  Sufficient?
>
>>>>
>>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>>> ---
>>>>   qobject/json-lexer.c  | 2 +-
>>>>   qobject/json-parser.c | 2 +-
>>>>   tests/check-qjson.c   | 8 +-------
>>>>   3 files changed, 3 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>>>> index ca1e0e2c03..36fb665b12 100644
>>>> --- a/qobject/json-lexer.c
>>>> +++ b/qobject/json-lexer.c
>>>> @@ -93,7 +93,7 @@
>>>>    *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>>    *
>>>>    * Note:
>>>> - * - Input must be encoded in UTF-8.
>>>> + * - Input must be encoded in modified UTF-8.
>>>
>>> Worth documenting this in the QMP doc as an explicit extension?
>
> qmp-spec.txt:
>
>     The server expects its input to be encoded in UTF-8, and sends its
>     output encoded in ASCII.
>
> The obvious update would be to stick in "modified".

Not really necessary, because:

* Before this patch, the JSON parser rejects \0 as ASCII control
  character, and \xC0\x80 as overlong UTF-8.

  Note that PATCH 17 fixed rejection of \0 in JSON strings.  PATCH 21
  fixed rejection of invalid UTF-8, but \xC0\x80 wasn't broken.

* This patch makes \xC0\x80 pass the "invalid UTF-8" check, only to get
  rejected as ASCII control character.  The error message changes,
  that's all.

The patch's benefit is consistency with the other direction:
qobject_to_json() maps \xC0\x80 to \\u0000.  I guess my commit message
should explain this a bit better.
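This hypothetical sketch shows the modified UTF-8 convention in the encoding direction. It is not qobject_to_json()'s code. U+0000 is written as the two bytes \xC0\x80, so a string containing it has no embedded '\0' and still fits in a NUL-terminated C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical encoder: write @cp to @buf in modified UTF-8 and return
 * the number of bytes written, or 0 if @cp is out of range.  U+0000
 * becomes \xC0\x80.  Surrogate code points are not rejected here.
 */
static size_t mod_utf8_encode(unsigned long cp, char buf[4])
{
    unsigned char *p = (unsigned char *)buf;

    if (cp == 0) {
        p[0] = 0xC0;
        p[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {
        p[0] = cp;
        return 1;
    }
    if (cp < 0x800) {
        p[0] = 0xC0 | (cp >> 6);
        p[1] = 0x80 | (cp & 0x3F);
        return 2;
    }
    if (cp < 0x10000) {
        p[0] = 0xE0 | (cp >> 12);
        p[1] = 0x80 | ((cp >> 6) & 0x3F);
        p[2] = 0x80 | (cp & 0x3F);
        return 3;
    }
    if (cp <= 0x10FFFF) {
        p[0] = 0xF0 | (cp >> 18);
        p[1] = 0x80 | ((cp >> 12) & 0x3F);
        p[2] = 0x80 | ((cp >> 6) & 0x3F);
        p[3] = 0x80 | (cp & 0x3F);
        return 4;
    }
    return 0;
}
```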

[...]


* Re: [Qemu-devel] [PATCH 55/56] json: Clean up headers
  2018-08-16 17:50   ` Eric Blake
@ 2018-08-17  8:22     ` Markus Armbruster
  0 siblings, 0 replies; 162+ messages in thread
From: Markus Armbruster @ 2018-08-17  8:22 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The JSON parser has three public headers, json-lexer.h, json-parser.h,
>> json-streamer.h.  They all contain stuff that is of no interest
>> outside qobject/json-*.c.
>>
>> Collect the public interface in include/qapi/qmp/json-parser.h, and
>> everything else in qobject/json-parser-int.h.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>
> Nice separation.
>
>>   10 files changed, 51 insertions(+), 76 deletions(-)
>>   delete mode 100644 include/qapi/qmp/json-streamer.h
>>   rename include/qapi/qmp/json-lexer.h => qobject/json-parser-int.h (62%)
>
>
>>
>> diff --git a/include/qapi/qmp/json-parser.h b/include/qapi/qmp/json-parser.h
>> index 55f75954c3..7345a9bd5c 100644
>> --- a/include/qapi/qmp/json-parser.h
>> +++ b/include/qapi/qmp/json-parser.h
>> @@ -1,5 +1,5 @@
>>   /*
>> - * JSON Parser
>> + * JSON Parser
>
> I'm not sure what git tried to flag here.

Trailing whitespace cleaned up.

> Otherwise, looks like a good reorganization.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state
  2018-08-10 14:30   ` Eric Blake
@ 2018-08-17  8:37     ` Markus Armbruster
  2018-08-17 14:34       ` Eric Blake
  2018-08-17 11:16     ` Markus Armbruster
  1 sibling, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-17  8:37 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Section "QGA Synchronization" specifies that sending "a raw 0xFF
>> sentinel byte" makes the server "reset its state and discard all
>> pending data prior to the sentinel."  What actually happens there is a
>> lexical error, which will produce one or more error responses.
>> Moreover, it's not specific to QGA.
>
> Hoisting my review of this, as you may want to move it sooner in the series.
>
>>
>> Create new section "Forcing the JSON parser into known-good state" to
>> document the technique properly.  Rewrite section "QGA
>> Synchronization" to document just the other direction, i.e. command
>> guest-sync-delimited.
>>
>> Section "Protocol Specification" mentions "synchronization bytes
>> (documented below)".  Delete that.
>>
>> While there, fix it not to claim '"Server" is QEMU itself', but
>> '"Server" is either QEMU or the QEMU Guest Agent'.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   docs/interop/qmp-spec.txt | 37 ++++++++++++++++++++++++-------------
>>   1 file changed, 24 insertions(+), 13 deletions(-)
>>
>> diff --git a/docs/interop/qmp-spec.txt b/docs/interop/qmp-spec.txt
>> index 1566b8ae5e..d4a42fe2cc 100644
>> --- a/docs/interop/qmp-spec.txt
>> +++ b/docs/interop/qmp-spec.txt
>> @@ -20,9 +20,9 @@ operating system.
>>   2. Protocol Specification
>>   =========================
>>   
>> -This section details the protocol format. For the purpose of this document
>> -"Client" is any application which is using QMP to communicate with QEMU and
>> -"Server" is QEMU itself.
>> +This section details the protocol format. For the purpose of this
>> +document, "Server" is either QEMU or the QEMU Guest Agent, and
>> +"Client" is any application communicating with it via QMP.
>>   
>
> Broadens the term "QMP" to mean any client speaking to a qemu
> machine-readable server (previously, we tended to treat "QMP" as the
> direct-to-qemu service, and "QGA" as the guest agent service). I can
> live with that, especially since this document was already mentioning
> QGA.

And by that it already had QMP denote two distinct things: the protocol
and one of its two applications.  I'm not really making this worse.  I'm
not really improving it, either.

>>   JSON data structures, when mentioned in this document, are always in the
>>   following format:
>> @@ -34,9 +34,8 @@ by the JSON standard:
>>   
>>   http://www.ietf.org/rfc/rfc7159.txt
>>   
>> -The protocol is always encoded in UTF-8 except for synchronization
>> -bytes (documented below); although thanks to json-string escape
>> -sequences, the server will reply using only the strict ASCII subset.
>> +The server expects its input to be encoded in UTF-8, and sends its
>> +output encoded in ASCII.
>>   
>
> Perhaps worth documenting is the range of JSON numbers produced by
> qemu (maybe as a separate patch). Libvirt just hit a bug with the
> jansson library making it extremely difficult to parse JSON containing
> numbers larger than INT64_MAX, when compared to yajl which had a way
> to support up to UINT64_MAX.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1614569
>
> Knowing that qemu sends numbers larger than INT64_MAX with the intent
> that they not be truncated/rounded by conversion to double can be a
> vital piece of information for implementing a client, when it comes to
> picking a particular library for JSON parsing.

Good point.  Doesn't really fit into this commit, though.  Care to
propose a patch?

>>   For convenience, json-object members mentioned in this document will
>>   be in a certain order. However, in real protocol usage they can be in
>> @@ -215,16 +214,28 @@ Some events are rate-limited to at most one per second.  If additional
>>   dropped, and the last one is delayed.  "Similar" normally means same
>>   event type.  See qmp-events.txt for details.
>>   
>> -2.6 QGA Synchronization
>> +2.6 Forcing the JSON parser into known-good state
>> +-------------------------------------------------
>> +
>> +Incomplete or invalid input can leave the server's JSON parser in a
>> +state where it can't parse additional commands.  To get it back into
>> +known-good state, the client should provoke a lexical error.
>> +
>> +The cleanest way to do that is sending an ASCII control character
>> +other than '\t' (horizontal tab), '\r' (carriage return), and '\n'
>
> s/and/or/

Done.

>> +(new line).
>> +
>> +Sadly, older versions of QEMU can fail to flag this as an error.  If a
>> +client needs to deal with them, it should send a 0xFF byte.
>
> Here's where we have the choice of whether to intentionally document
> 0xff as an intentional parser reset, instead of a lexical error. If
> so, the advice to provoke a lexical error via an ASCII control (of
> which I would be most likely to use 0x00 NUL or 0x1b ESC) vs. an
> intentional use of 0xff may need different wording here.
>
> But if you don't want to give 0xff any more special treatment than
> what it already has as a lexical error (and that ALL lexical errors
> result in a stream reset, but possibly after emitting error messages),
> then this wording seems okay.
>
>> +
>> +2.7 QGA Synchronization
>>   -----------------------
>>   
>>   When using QGA, an additional synchronization feature is built into
>> -the protocol.  If the Client sends a raw 0xFF sentinel byte (not valid
>> -JSON), then the Server will reset its state and discard all pending
>> -data prior to the sentinel.  Conversely, if the Client makes use of
>> -the 'guest-sync-delimited' command, the Server will send a raw 0xFF
>> -sentinel byte prior to its response, to aid the Client in discarding
>> -any data prior to the sentinel.
>> +the protocol. If the Client makes use of the 'guest-sync-delimited'
>> +command, the Server will send a raw 0xFF sentinel byte prior to its
>> +response, to aid the Client in discarding any data prior to the
>> +sentinel.
>
> Maybe worth mentioning "including error messages reported about any
> lexical errors received prior to the guest-sync-delimited command"
>
>>   
>>   
>>   3. QMP Examples
>>

Thanks!
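A client following the proposed section 2.6 might choose and check its reset byte along these lines. This is a minimal sketch rather than QEMU or QGA code: qmp_is_resync_byte() and qmp_pick_resync_byte() are hypothetical names, "ASCII control character" is assumed to mean bytes 0x00 through 0x1F, ESC is used because it is one of the bytes suggested in the review above, and 0xFF is the fallback the patch recommends for older QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true if byte @c should provoke a lexical error
 * in the server's JSON lexer, per the proposed qmp-spec text: any ASCII
 * control character other than '\t', '\r' or '\n'.
 * Assumption: "control character" means 0x00..0x1F, as in RFC 7159.
 */
static bool qmp_is_resync_byte(unsigned char c)
{
    return c < 0x20 && c != '\t' && c != '\r' && c != '\n';
}

/*
 * Hypothetical helper: pick the byte a client sends to force the
 * server's parser back into known-good state.  Older QEMU may fail to
 * flag control characters, so a client that must cope with it falls
 * back to 0xFF, a byte that is never valid UTF-8.
 */
static unsigned char qmp_pick_resync_byte(bool need_old_qemu_compat)
{
    unsigned char c = need_old_qemu_compat ? 0xFF : 0x1B; /* 0x1B is ESC */

    /* The non-fallback choice must satisfy the section 2.6 rule. */
    assert(need_old_qemu_compat || qmp_is_resync_byte(c));
    return c;
}
```

Either way the server may answer with one or more error responses, which the client still has to read and discard.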

* Re: [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state
  2018-08-10 14:30   ` Eric Blake
  2018-08-17  8:37     ` Markus Armbruster
@ 2018-08-17 11:16     ` Markus Armbruster
  2018-08-17 14:35       ` Eric Blake
  1 sibling, 1 reply; 162+ messages in thread
From: Markus Armbruster @ 2018-08-17 11:16 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, marcandre.lureau, mdroth

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> Section "QGA Synchronization" specifies that sending "a raw 0xFF
>> sentinel byte" makes the server "reset its state and discard all
>> pending data prior to the sentinel."  What actually happens there is a
>> lexical error, which will produce one or more error responses.
>> Moreover, it's not specific to QGA.
>
> Hoisting my review of this, as you may want to move it sooner in the series.
>
>>
>> Create new section "Forcing the JSON parser into known-good state" to
>> document the technique properly.  Rewrite section "QGA
>> Synchronization" to document just the other direction, i.e. command
>> guest-sync-delimited.
>>
>> Section "Protocol Specification" mentions "synchronization bytes
>> (documented below)".  Delete that.
>>
>> While there, fix it not to claim '"Server" is QEMU itself', but
>> '"Server" is either QEMU or the QEMU Guest Agent'.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   docs/interop/qmp-spec.txt | 37 ++++++++++++++++++++++++-------------
>>   1 file changed, 24 insertions(+), 13 deletions(-)
>>
>> diff --git a/docs/interop/qmp-spec.txt b/docs/interop/qmp-spec.txt
>> index 1566b8ae5e..d4a42fe2cc 100644
>> --- a/docs/interop/qmp-spec.txt
>> +++ b/docs/interop/qmp-spec.txt
>> @@ -20,9 +20,9 @@ operating system.
>>   2. Protocol Specification
>>   =========================
>>   
>> -This section details the protocol format. For the purpose of this document
>> -"Client" is any application which is using QMP to communicate with QEMU and
>> -"Server" is QEMU itself.
>> +This section details the protocol format. For the purpose of this
>> +document, "Server" is either QEMU or the QEMU Guest Agent, and
>> +"Client" is any application communicating with it via QMP.
>>   
[...]
>>   JSON data structures, when mentioned in this document, are always in the
>>   following format:
>> @@ -34,9 +34,8 @@ by the JSON standard:
>>   
>>   http://www.ietf.org/rfc/rfc7159.txt
>>   
>> -The protocol is always encoded in UTF-8 except for synchronization
>> -bytes (documented below); although thanks to json-string escape
>> -sequences, the server will reply using only the strict ASCII subset.
>> +The server expects its input to be encoded in UTF-8, and sends its
>> +output encoded in ASCII.
>>   
[...]
>>   For convenience, json-object members mentioned in this document will
>>   be in a certain order. However, in real protocol usage they can be in
>> @@ -215,16 +214,28 @@ Some events are rate-limited to at most one per second.  If additional
>>   dropped, and the last one is delayed.  "Similar" normally means same
>>   event type.  See qmp-events.txt for details.
>>   
>> -2.6 QGA Synchronization
>> +2.6 Forcing the JSON parser into known-good state
>> +-------------------------------------------------
>> +
>> +Incomplete or invalid input can leave the server's JSON parser in a
>> +state where it can't parse additional commands.  To get it back into
>> +known-good state, the client should provoke a lexical error.
>> +
>> +The cleanest way to do that is sending an ASCII control character
>> +other than '\t' (horizontal tab), '\r' (carriage return), and '\n'
>
> s/and/or/
>
>> +(new line).
>> +
>> +Sadly, older versions of QEMU can fail to flag this as an error.  If a
>> +client needs to deal with them, it should send a 0xFF byte.
[...]
>> +
>> +2.7 QGA Synchronization
>>   -----------------------
>>   
>>   When using QGA, an additional synchronization feature is built into
>> -the protocol.  If the Client sends a raw 0xFF sentinel byte (not valid
>> -JSON), then the Server will reset its state and discard all pending
>> -data prior to the sentinel.  Conversely, if the Client makes use of
>> -the 'guest-sync-delimited' command, the Server will send a raw 0xFF
>> -sentinel byte prior to its response, to aid the Client in discarding
>> -any data prior to the sentinel.
>> +the protocol. If the Client makes use of the 'guest-sync-delimited'
>> +command, the Server will send a raw 0xFF sentinel byte prior to its
>> +response, to aid the Client in discarding any data prior to the
>> +sentinel.
>
> Maybe worth mentioning "including error messages reported about any
> lexical errors received prior to the guest-sync-delimited command"
>
>>   
>>   
>>   3. QMP Examples
>>

What about:

    2.7 QGA Synchronization
    -----------------------

    When a client connects to QGA over a transport lacking proper
    connection semantics such as virtio-serial, QGA may have read partial
    input from a previous client.  The client needs to force QGA's parser
    into known-good state using the previous section's technique.
    Moreover, the client may receive output a previous client didn't read.
    To help with skipping that output, QGA provides the
    'guest-sync-delimited' command.  Refer to its documentation for
    details.
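The output-skipping half of the proposed section 2.7 could look roughly like this on the client side. It is a sketch under stated assumptions, not QGA's or any client library's code: qga_skip_to_sentinel() is a hypothetical helper, and it relies on the server sending only ASCII (as the patch's section 2 text says), so that a raw 0xFF byte in the stream can only be a sentinel. Because a previous client may also have left an unread sentinel behind, a careful caller would presumably also check that the response after the sentinel carries the id it passed to 'guest-sync-delimited', and skip to the next sentinel otherwise.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical client-side helper: after sending guest-sync-delimited,
 * find where useful output starts in the @len bytes read so far.
 * Returns the offset just past the first raw 0xFF sentinel, or @len if
 * no sentinel has arrived yet (discard everything and keep reading).
 * Assumes the server's output is ASCII-only, so 0xFF never appears
 * inside a JSON response.
 */
static size_t qga_skip_to_sentinel(const unsigned char *buf, size_t len)
{
    const unsigned char *p;

    if (len == 0) {
        return 0;               /* nothing read yet; avoid memchr(NULL) */
    }
    p = memchr(buf, 0xFF, len);
    return p ? (size_t)(p - buf) + 1 : len;
}
```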

* Re: [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state
  2018-08-17  8:37     ` Markus Armbruster
@ 2018-08-17 14:34       ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-17 14:34 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/17/2018 03:37 AM, Markus Armbruster wrote:

>> Perhaps worth documenting is the range of JSON numbers produced by
>> qemu (maybe as a separate patch). Libvirt just hit a bug with the
>> jansson library making it extremely difficult to parse JSON containing
>> numbers larger than INT64_MAX, when compared to yajl which had a way
>> to support up to UINT64_MAX.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1614569
>>
>> Knowing that qemu sends numbers larger than INT64_MAX with the intent
>> that they not be truncated/rounded by conversion to double can be a
>> vital piece of information for implementing a client, when it comes to
>> picking a particular library for JSON parsing.
> 
> Good point.  Doesn't really fit into this commit, though.  Care to
> propose a patch?

Will do, but I'll probably wait for your v2 series to land first.
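To illustrate why the range of numbers matters when picking a JSON library: a client receiving integers above INT64_MAX has to convert the token text directly instead of going through double. The sketch below is hypothetical (json_number_to_u64() is not a jansson, yajl or libvirt API) and assumes the client's own lexer has already isolated and validated the JSON number token.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the text of an already-validated JSON
 * number token to uint64_t without a detour through double.  Returns
 * false for negative numbers, fractions, exponents, or values above
 * UINT64_MAX, leaving those to a floating-point path.
 */
static bool json_number_to_u64(const char *tok, uint64_t *out)
{
    char *end;
    unsigned long long v;

    if (*tok < '0' || *tok > '9') {
        return false;           /* '-' sign (or empty token) */
    }
    errno = 0;
    v = strtoull(tok, &end, 10);
    if (errno == ERANGE || *end != '\0' || v > UINT64_MAX) {
        return false;           /* overflow, or '.'/'e' follows digits */
    }
    *out = v;
    return true;
}
```

Going through double instead would silently lose precision above 2^53: for instance, 9007199254740993 becomes 9007199254740992.0 under the usual IEEE 754 round-to-nearest mode.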

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

* Re: [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state
  2018-08-17 11:16     ` Markus Armbruster
@ 2018-08-17 14:35       ` Eric Blake
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Blake @ 2018-08-17 14:35 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, marcandre.lureau, mdroth

On 08/17/2018 06:16 AM, Markus Armbruster wrote:

>>> +2.7 QGA Synchronization
>>>    -----------------------
>>>    
>>>    When using QGA, an additional synchronization feature is built into
>>> -the protocol.  If the Client sends a raw 0xFF sentinel byte (not valid
>>> -JSON), then the Server will reset its state and discard all pending
>>> -data prior to the sentinel.  Conversely, if the Client makes use of
>>> -the 'guest-sync-delimited' command, the Server will send a raw 0xFF
>>> -sentinel byte prior to its response, to aid the Client in discarding
>>> -any data prior to the sentinel.
>>> +the protocol. If the Client makes use of the 'guest-sync-delimited'
>>> +command, the Server will send a raw 0xFF sentinel byte prior to its
>>> +response, to aid the Client in discarding any data prior to the
>>> +sentinel.
>>
>> Maybe worth mentioning "including error messages reported about any
>> lexical errors received prior to the guest-sync-delimited command"
>>
>>>    
>>>    
>>>    3. QMP Examples
>>>
> 
> What about:
> 
>      2.7 QGA Synchronization
>      -----------------------
> 
>      When a client connects to QGA over a transport lacking proper
>      connection semantics such as virtio-serial, QGA may have read partial
>      input from a previous client.  The client needs to force QGA's parser
>      into known-good state using the previous section's technique.
>      Moreover, the client may receive output a previous client didn't read.
>      To help with skipping that output, QGA provides the
>      'guest-sync-delimited' command.  Refer to its documentation for
>      details.

That works for me.

end of thread, other threads:[~2018-08-17 14:35 UTC | newest]

Thread overview: 162+ messages
2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string Markus Armbruster
2018-08-09 13:25   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input Markus Armbruster
2018-08-09 13:29   ` Eric Blake
2018-08-10 13:40     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly Markus Armbruster
2018-08-09 13:36   ` Eric Blake
2018-08-10 13:43     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test Markus Armbruster
2018-08-09 13:38   ` Eric Blake
2018-08-10 13:49     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors Markus Armbruster
2018-08-09 13:42   ` Eric Blake
2018-08-10 13:52     ` Markus Armbruster
2018-08-10 14:06       ` Eric Blake
2018-08-16 12:44         ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization Markus Armbruster
2018-08-09 13:46   ` Eric Blake
2018-08-10 13:57     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1 Markus Armbruster
2018-08-09 13:54   ` Eric Blake
2018-08-10 14:03     ` Markus Armbruster
2018-08-09 14:00   ` Eric Blake
2018-08-10 14:11     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings Markus Armbruster
2018-08-09 13:57   ` Eric Blake
2018-08-10 14:15     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2 Markus Armbruster
2018-08-09 14:03   ` Eric Blake
2018-08-10 14:16     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests Markus Armbruster
2018-08-09 14:04   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings Markus Armbruster
2018-08-09 14:17   ` Eric Blake
2018-08-10 14:18     ` Markus Armbruster
2018-08-10 14:59       ` Eric Blake
2018-08-13  6:11         ` Markus Armbruster
2018-08-13 14:53           ` Eric Blake
2018-08-14  6:01             ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string() Markus Armbruster
2018-08-09 14:20   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences Markus Armbruster
2018-08-09 14:22   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly Markus Armbruster
2018-08-09 17:24   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation " Markus Armbruster
2018-08-09 17:26   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token Markus Armbruster
2018-08-09 17:42   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters Markus Armbruster
2018-08-09 18:26   ` Eric Blake
2018-08-10 14:26     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation Markus Armbruster
2018-08-09 18:49   ` Eric Blake
2018-08-10 14:31     ` Markus Armbruster
2018-08-10 15:02       ` Eric Blake
2018-08-13  6:12         ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop Markus Armbruster
2018-08-09 18:52   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected Markus Armbruster
2018-08-09 18:55   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences Markus Armbruster
2018-08-09 22:16   ` Eric Blake
2018-08-10 14:40     ` Markus Armbruster
2018-08-10 15:21       ` Eric Blake
2018-08-16 14:50         ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error Markus Armbruster
2018-08-10 15:25   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser Markus Armbruster
2018-08-10 15:36   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") Markus Armbruster
2018-08-10 15:48   ` Eric Blake
2018-08-10 16:09     ` Eric Blake
2018-08-13  7:00       ` Markus Armbruster
2018-08-13 14:57         ` Eric Blake
2018-08-14  6:07           ` Markus Armbruster
2018-08-17  7:18         ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser Markus Armbruster
2018-08-10 15:56   ` Eric Blake
2018-08-13  7:05     ` Markus Armbruster
2018-08-13 14:58       ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 26/56] json: Simplify parse_string() Markus Armbruster
2018-08-10 15:59   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000 Markus Armbruster
2018-08-10 16:10   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs Markus Armbruster
2018-08-10 17:18   ` Eric Blake
2018-08-13  7:07     ` Markus Armbruster
2018-08-12  9:52   ` Paolo Bonzini
2018-08-13  7:12     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part Markus Armbruster
2018-08-10 17:19   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 30/56] json: remove useless return value from lexer/parser Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 31/56] json-parser: simplify and avoid JSONParserContext allocation Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly Markus Armbruster
2018-08-10 17:22   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values Markus Armbruster
2018-08-13 15:30   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse() Markus Armbruster
2018-08-13 15:32   ` Eric Blake
2018-08-14  6:17     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used Markus Armbruster
2018-08-13 15:32   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL Markus Armbruster
2018-08-13 15:34   ` Eric Blake
2018-08-14  6:28     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error Markus Armbruster
2018-08-13 15:48   ` Eric Blake
2018-08-14  6:51     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback Markus Armbruster
2018-08-13 15:51   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser Markus Armbruster
2018-08-13 16:12   ` Eric Blake
2018-08-14  7:23     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64 Markus Armbruster
2018-08-13 16:18   ` Eric Blake
2018-08-14  7:24     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero Markus Armbruster
2018-08-13 16:33   ` Eric Blake
2018-08-14  8:24     ` Markus Armbruster
2018-08-14 13:14       ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers Markus Armbruster
2018-08-13 16:36   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values Markus Armbruster
2018-08-14 13:26   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input Markus Armbruster
2018-08-16 13:10   ` Eric Blake
2018-08-16 15:19     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures Markus Armbruster
2018-08-16 13:12   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success Markus Armbruster
2018-08-16 13:13   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank Markus Armbruster
2018-08-16 13:20   ` Eric Blake
2018-08-16 15:40     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly Markus Armbruster
2018-08-16 13:22   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token() Markus Armbruster
2018-08-16 13:40   ` Eric Blake
2018-08-16 15:42     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser Markus Armbruster
2018-08-16 13:42   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN Markus Armbruster
2018-08-16 13:45   ` Eric Blake
2018-08-16 15:48     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
2018-08-16 13:51   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c Markus Armbruster
2018-08-16 13:54   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h Markus Armbruster
2018-08-16 13:54   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 55/56] json: Clean up headers Markus Armbruster
2018-08-16 17:50   ` Eric Blake
2018-08-17  8:22     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state Markus Armbruster
2018-08-10 14:30   ` Eric Blake
2018-08-17  8:37     ` Markus Armbruster
2018-08-17 14:34       ` Eric Blake
2018-08-17 11:16     ` Markus Armbruster
2018-08-17 14:35       ` Eric Blake
2018-08-08 14:03 ` [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster