All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org, marcandre.lureau@redhat.com,
	mdroth@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs
Date: Mon, 13 Aug 2018 09:07:57 +0200	[thread overview]
Message-ID: <87eff2bv76.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <67210961-8fc0-f3ac-be32-cf2f4903eed9@redhat.com> (Eric Blake's message of "Fri, 10 Aug 2018 12:18:10 -0500")

Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>> The JSON parser treats each half of a surrogate pair as unpaired
>> surrogate.  Fix it to recognize surrogate pairs.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-parser.c | 16 +++++++++++++++-
>>   tests/check-qjson.c   |  3 +--
>>   2 files changed, 16 insertions(+), 3 deletions(-)
>>
>
>> @@ -168,6 +170,18 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
>>                      cp |= hex2decimal(*ptr);
>>                  }
>> +                if (cp >= 0xD800 && cp <= 0xDBFF && !leading_surrogate
>> +                    && ptr[1] == '\\' && ptr[2] == 'u') {
>> +                    ptr += 2;
>> +                    leading_surrogate = cp;
>> +                    goto hex;
>> +                }
>> +                if (cp >= 0xDC00 && cp <= 0xDFFF && leading_surrogate) {
>> +                    cp &= 0x3FF;
>> +                    cp |= (leading_surrogate & 0x3FF) << 10;
>> +                    cp += 0x010000;
>> +                }
>> +
>>                   if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
>>                       parse_error(ctxt, token,
>>                                   "\\u%.4s is not a valid Unicode character",
>
> Consider "\\udbff\\udfff" - a valid surrogate pair (in terms of being
> in range), but which decodes to u+10ffff.  Since is_valid_codepoint()
> (part of mod_utf8_encode()) rejects it due to (codepoint & 0xfffe) ==
> 0xfffe, it means we end up printing this error message, but only using
> the second half of the surrogate pair.  Is that okay?

It's not horrible, but I wouldn't call it okay.  I'll try to improve it.

> Otherwise,
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

  reply	other threads:[~2018-08-13  7:08 UTC|newest]

Thread overview: 162+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string Markus Armbruster
2018-08-09 13:25   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input Markus Armbruster
2018-08-09 13:29   ` Eric Blake
2018-08-10 13:40     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly Markus Armbruster
2018-08-09 13:36   ` Eric Blake
2018-08-10 13:43     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test Markus Armbruster
2018-08-09 13:38   ` Eric Blake
2018-08-10 13:49     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors Markus Armbruster
2018-08-09 13:42   ` Eric Blake
2018-08-10 13:52     ` Markus Armbruster
2018-08-10 14:06       ` Eric Blake
2018-08-16 12:44         ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization Markus Armbruster
2018-08-09 13:46   ` Eric Blake
2018-08-10 13:57     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1 Markus Armbruster
2018-08-09 13:54   ` Eric Blake
2018-08-10 14:03     ` Markus Armbruster
2018-08-09 14:00   ` Eric Blake
2018-08-10 14:11     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings Markus Armbruster
2018-08-09 13:57   ` Eric Blake
2018-08-10 14:15     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2 Markus Armbruster
2018-08-09 14:03   ` Eric Blake
2018-08-10 14:16     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests Markus Armbruster
2018-08-09 14:04   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings Markus Armbruster
2018-08-09 14:17   ` Eric Blake
2018-08-10 14:18     ` Markus Armbruster
2018-08-10 14:59       ` Eric Blake
2018-08-13  6:11         ` Markus Armbruster
2018-08-13 14:53           ` Eric Blake
2018-08-14  6:01             ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string() Markus Armbruster
2018-08-09 14:20   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences Markus Armbruster
2018-08-09 14:22   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly Markus Armbruster
2018-08-09 17:24   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation " Markus Armbruster
2018-08-09 17:26   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token Markus Armbruster
2018-08-09 17:42   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters Markus Armbruster
2018-08-09 18:26   ` Eric Blake
2018-08-10 14:26     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation Markus Armbruster
2018-08-09 18:49   ` Eric Blake
2018-08-10 14:31     ` Markus Armbruster
2018-08-10 15:02       ` Eric Blake
2018-08-13  6:12         ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop Markus Armbruster
2018-08-09 18:52   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected Markus Armbruster
2018-08-09 18:55   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences Markus Armbruster
2018-08-09 22:16   ` Eric Blake
2018-08-10 14:40     ` Markus Armbruster
2018-08-10 15:21       ` Eric Blake
2018-08-16 14:50         ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error Markus Armbruster
2018-08-10 15:25   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser Markus Armbruster
2018-08-10 15:36   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") Markus Armbruster
2018-08-10 15:48   ` Eric Blake
2018-08-10 16:09     ` Eric Blake
2018-08-13  7:00       ` Markus Armbruster
2018-08-13 14:57         ` Eric Blake
2018-08-14  6:07           ` Markus Armbruster
2018-08-17  7:18         ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser Markus Armbruster
2018-08-10 15:56   ` Eric Blake
2018-08-13  7:05     ` Markus Armbruster
2018-08-13 14:58       ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 26/56] json: Simplify parse_string() Markus Armbruster
2018-08-10 15:59   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000 Markus Armbruster
2018-08-10 16:10   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs Markus Armbruster
2018-08-10 17:18   ` Eric Blake
2018-08-13  7:07     ` Markus Armbruster [this message]
2018-08-12  9:52   ` Paolo Bonzini
2018-08-13  7:12     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part Markus Armbruster
2018-08-10 17:19   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 30/56] json: remove useless return value from lexer/parser Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 31/56] json-parser: simplify and avoid JSONParserContext allocation Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly Markus Armbruster
2018-08-10 17:22   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values Markus Armbruster
2018-08-13 15:30   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse() Markus Armbruster
2018-08-13 15:32   ` Eric Blake
2018-08-14  6:17     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used Markus Armbruster
2018-08-13 15:32   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL Markus Armbruster
2018-08-13 15:34   ` Eric Blake
2018-08-14  6:28     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error Markus Armbruster
2018-08-13 15:48   ` Eric Blake
2018-08-14  6:51     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback Markus Armbruster
2018-08-13 15:51   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser Markus Armbruster
2018-08-13 16:12   ` Eric Blake
2018-08-14  7:23     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64 Markus Armbruster
2018-08-13 16:18   ` Eric Blake
2018-08-14  7:24     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero Markus Armbruster
2018-08-13 16:33   ` Eric Blake
2018-08-14  8:24     ` Markus Armbruster
2018-08-14 13:14       ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers Markus Armbruster
2018-08-13 16:36   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values Markus Armbruster
2018-08-14 13:26   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input Markus Armbruster
2018-08-16 13:10   ` Eric Blake
2018-08-16 15:19     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures Markus Armbruster
2018-08-16 13:12   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success Markus Armbruster
2018-08-16 13:13   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank Markus Armbruster
2018-08-16 13:20   ` Eric Blake
2018-08-16 15:40     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly Markus Armbruster
2018-08-16 13:22   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token() Markus Armbruster
2018-08-16 13:40   ` Eric Blake
2018-08-16 15:42     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser Markus Armbruster
2018-08-16 13:42   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN Markus Armbruster
2018-08-16 13:45   ` Eric Blake
2018-08-16 15:48     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
2018-08-16 13:51   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c Markus Armbruster
2018-08-16 13:54   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h Markus Armbruster
2018-08-16 13:54   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 55/56] json: Clean up headers Markus Armbruster
2018-08-16 17:50   ` Eric Blake
2018-08-17  8:22     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state Markus Armbruster
2018-08-10 14:30   ` Eric Blake
2018-08-17  8:37     ` Markus Armbruster
2018-08-17 14:34       ` Eric Blake
2018-08-17 11:16     ` Markus Armbruster
2018-08-17 14:35       ` Eric Blake
2018-08-08 14:03 ` [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87eff2bv76.fsf@dusky.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.