All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com,
	eblake@redhat.com
Subject: [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors
Date: Mon, 27 Aug 2018 09:00:19 +0200	[thread overview]
Message-ID: <20180827070021.11931-5-armbru@redhat.com> (raw)
In-Reply-To: <20180827070021.11931-1-armbru@redhat.com>

When the lexer chokes on an input character, it consumes the
character, emits a JSON error token, and enters its start state.  This
can lead to suboptimal error recovery.  For instance, input

    0123 ,

produces the tokens

    JSON_ERROR    01
    JSON_INTEGER  23
    JSON_COMMA    ,

Make the lexer skip characters after a lexical error until a
structural character ('[', ']', '{', '}', ':', ','), an ASCII control
character, or '\xFE', or '\xFF'.

Note that we must not skip ASCII control characters, '\xFE', '\xFF',
because those are documented to force the JSON parser into known-good
state, by docs/interop/qmp-spec.txt.

The lexer now produces

    JSON_ERROR    01
    JSON_COMMA    ,

Update qmp-test for the nicer error recovery: QMP now report just one
error for input %p instead of two.  Also drop the newline after %p; it
was needed to tease out the second error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 43 +++++++++++++++++++++++++++++--------------
 tests/qmp-test.c     |  6 +-----
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 28582e17d9..39c7ce7adc 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -101,6 +101,7 @@
 
 enum json_lexer_state {
     IN_ERROR = 0,               /* must really be 0, see json_lexer[] */
+    IN_RECOVERY,
     IN_DQ_STRING_ESCAPE,
     IN_DQ_STRING,
     IN_SQ_STRING_ESCAPE,
@@ -130,6 +131,28 @@ QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
 static const uint8_t json_lexer[][256] =  {
     /* Relies on default initialization to IN_ERROR! */
 
+    /* error recovery */
+    [IN_RECOVERY] = {
+        /*
+         * Skip characters until a structural character, an ASCII
+         * control character other than '\t', or impossible UTF-8
+         * bytes '\xFE', '\xFF'.  Structural characters and line
+         * endings are promising resynchronization points.  Clients
+         * may use the others to force the JSON parser into known-good
+         * state; see docs/interop/qmp-spec.txt.
+         */
+        [0 ... 0x1F] = IN_START | LOOKAHEAD,
+        [0x20 ... 0xFD] = IN_RECOVERY,
+        [0xFE ... 0xFF] = IN_START | LOOKAHEAD,
+        ['\t'] = IN_RECOVERY,
+        ['['] = IN_START | LOOKAHEAD,
+        [']'] = IN_START | LOOKAHEAD,
+        ['{'] = IN_START | LOOKAHEAD,
+        ['}'] = IN_START | LOOKAHEAD,
+        [':'] = IN_START | LOOKAHEAD,
+        [','] = IN_START | LOOKAHEAD,
+    },
+
     /* double quote string */
     [IN_DQ_STRING_ESCAPE] = {
         [0x20 ... 0xFD] = IN_DQ_STRING,
@@ -301,26 +324,18 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             /* fall through */
         case JSON_SKIP:
             g_string_truncate(lexer->token, 0);
+            /* fall through */
+        case IN_START:
             new_state = lexer->start_state;
             break;
         case IN_ERROR:
-            /* XXX: To avoid having previous bad input leaving the parser in an
-             * unresponsive state where we consume unpredictable amounts of
-             * subsequent "good" input, percolate this error state up to the
-             * parser by emitting a JSON_ERROR token, then reset lexer state.
-             *
-             * Also note that this handling is required for reliable channel
-             * negotiation between QMP and the guest agent, since chr(0xFF)
-             * is placed at the beginning of certain events to ensure proper
-             * delivery when the channel is in an unknown state. chr(0xFF) is
-             * never a valid ASCII/UTF-8 sequence, so this should reliably
-             * induce an error/flush state.
-             */
             json_message_process_token(lexer, lexer->token, JSON_ERROR,
                                        lexer->x, lexer->y);
+            new_state = IN_RECOVERY;
+            /* fall through */
+        case IN_RECOVERY:
             g_string_truncate(lexer->token, 0);
-            lexer->state = lexer->start_state;
-            return;
+            break;
         default:
             break;
         }
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 4ae2245484..04ad7648d2 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -93,11 +93,7 @@ static void test_malformed(QTestState *qts)
     g_assert(recovered(qts));
 
     /* lexical error: interpolation */
-    qtest_qmp_send_raw(qts, "%%p\n");
-    /* two errors, one for "%", one for "p" */
-    resp = qtest_qmp_receive(qts);
-    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
-    qobject_unref(resp);
+    qtest_qmp_send_raw(qts, "%%p");
     resp = qtest_qmp_receive(qts);
     g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
     qobject_unref(resp);
-- 
2.17.1

  parent reply	other threads:[~2018-08-27  7:00 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27  7:00 [Qemu-devel] [PATCH 0/6] json: More fixes, error reporting improvements, cleanups Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 1/6] json: Fix lexer for lookahead character beyond '\x7F' Markus Armbruster
2018-08-27 16:50   ` Eric Blake
2018-08-28  4:28     ` Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 2/6] json: Clean up how lexer consumes "end of input" Markus Armbruster
2018-08-27 16:58   ` Eric Blake
2018-08-28  4:28     ` Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 3/6] json: Make lexer's "character consumed" logic less confusing Markus Armbruster
2018-08-27 17:04   ` Eric Blake
2018-08-27  7:00 ` Markus Armbruster [this message]
2018-08-27 17:18   ` [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors Eric Blake
2018-08-28  4:35     ` Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR Markus Armbruster
2018-08-27 17:20   ` Eric Blake
2018-08-27 17:29   ` Eric Blake
2018-08-28  4:40     ` Markus Armbruster
2018-08-28 15:01       ` Eric Blake
2018-08-28 15:04         ` Eric Blake
2018-08-31  7:08           ` Markus Armbruster
2018-08-31  7:06         ` Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 6/6] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
2018-08-27 17:25   ` Eric Blake
2018-08-28  4:41     ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180827070021.11931-5-armbru@redhat.com \
    --to=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.