All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com,
	eblake@redhat.com
Subject: [Qemu-devel] [PATCH 3/6] json: Make lexer's "character consumed" logic less confusing
Date: Mon, 27 Aug 2018 09:00:18 +0200	[thread overview]
Message-ID: <20180827070021.11931-4-armbru@redhat.com> (raw)
In-Reply-To: <20180827070021.11931-1-armbru@redhat.com>

The lexer uses macro TERMINAL_NEEDED_LOOKAHEAD() to decide whether a
state transition consumes the input character.  It returns true when
the state transition is defined with the TERMINAL() macro.  To detect
that, it checks whether input '\0' would have resulted in the same
state transition, and the new state is not IN_ERROR.

Why does that even work?  For all states, the new state on input '\0'
is either IN_ERROR or defined with TERMINAL().  If the state
transition equals the one we'd get for input '\0', it goes to IN_ERROR
or to the argument of TERMINAL().  We never use TERMINAL(IN_ERROR),
because it makes no sense.  Thus, if it doesn't go to IN_ERROR, it
must be defined with TERMINAL().

Since this isn't quite confusing enough, we negate the result to get
@char_consumed, and ignore it when @flush is true.

Instead of deriving the lookahead bit from the state transition, make
it explicit.  This is easier to understand, and a bit more flexible,
too.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c      | 27 ++++++++++++++++-----------
 qobject/json-parser-int.h |  1 +
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index ec3aec726f..28582e17d9 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -121,15 +121,11 @@ enum json_lexer_state {
 };
 
 QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START_INTERP);
+QEMU_BUILD_BUG_ON(JSON_MAX >= 0x80);
 QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
 
-#define TERMINAL(state) [0 ... 0xFF] = (state)
-
-/* Return whether TERMINAL is a terminal state and the transition to it
-   from OLD_STATE required lookahead.  This happens whenever the table
-   below uses the TERMINAL macro.  */
-#define TERMINAL_NEEDED_LOOKAHEAD(old_state, terminal) \
-    (terminal != IN_ERROR && json_lexer[(old_state)][0] == (terminal))
+#define LOOKAHEAD 0x80
+#define TERMINAL(state) [0 ... 0xFF] = ((state) | LOOKAHEAD)
 
 static const uint8_t json_lexer[][256] =  {
     /* Relies on default initialization to IN_ERROR! */
@@ -251,6 +247,17 @@ static const uint8_t json_lexer[][256] =  {
     [IN_START_INTERP]['%'] = IN_INTERP,
 };
 
+static inline uint8_t next_state(JSONLexer *lexer, char ch, bool flush,
+                                 bool *char_consumed)
+{
+    uint8_t next;
+
+    assert(lexer->state <= ARRAY_SIZE(json_lexer));
+    next = json_lexer[lexer->state][(uint8_t)ch];
+    *char_consumed = !flush && !(next & LOOKAHEAD);
+    return next & ~LOOKAHEAD;
+}
+
 void json_lexer_init(JSONLexer *lexer, bool enable_interpolation)
 {
     lexer->start_state = lexer->state = enable_interpolation
@@ -271,11 +278,9 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
     }
 
     while (flush ? lexer->state != lexer->start_state : !char_consumed) {
-        assert(lexer->state <= ARRAY_SIZE(json_lexer));
-        new_state = json_lexer[lexer->state][(uint8_t)ch];
-        char_consumed = !flush
-            && !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
+        new_state = next_state(lexer, ch, flush, &char_consumed);
         if (char_consumed) {
+            assert(!flush);
             g_string_append_c(lexer->token, ch);
         }
 
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index ceaa890ec6..abeec63af5 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -33,6 +33,7 @@ typedef enum json_token_type {
     JSON_SKIP,
     JSON_ERROR,
     JSON_END_OF_INPUT,
+    JSON_MAX = JSON_END_OF_INPUT
 } JSONTokenType;
 
 typedef struct JSONToken JSONToken;
-- 
2.17.1

  parent reply	other threads:[~2018-08-27  7:00 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-27  7:00 [Qemu-devel] [PATCH 0/6] json: More fixes, error reporting improvements, cleanups Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 1/6] json: Fix lexer for lookahead character beyond '\x7F' Markus Armbruster
2018-08-27 16:50   ` Eric Blake
2018-08-28  4:28     ` Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 2/6] json: Clean up how lexer consumes "end of input" Markus Armbruster
2018-08-27 16:58   ` Eric Blake
2018-08-28  4:28     ` Markus Armbruster
2018-08-27  7:00 ` Markus Armbruster [this message]
2018-08-27 17:04   ` [Qemu-devel] [PATCH 3/6] json: Make lexer's "character consumed" logic less confusing Eric Blake
2018-08-27  7:00 ` [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors Markus Armbruster
2018-08-27 17:18   ` Eric Blake
2018-08-28  4:35     ` Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR Markus Armbruster
2018-08-27 17:20   ` Eric Blake
2018-08-27 17:29   ` Eric Blake
2018-08-28  4:40     ` Markus Armbruster
2018-08-28 15:01       ` Eric Blake
2018-08-28 15:04         ` Eric Blake
2018-08-31  7:08           ` Markus Armbruster
2018-08-31  7:06         ` Markus Armbruster
2018-08-27  7:00 ` [Qemu-devel] [PATCH 6/6] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
2018-08-27 17:25   ` Eric Blake
2018-08-28  4:41     ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180827070021.11931-4-armbru@redhat.com \
    --to=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.