All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] RFC: libyajl for JSON
Date: Mon, 02 Nov 2015 20:10:33 +0100	[thread overview]
Message-ID: <87egg8nro5.fsf@blackfin.pond.sub.org> (raw)
In-Reply-To: <56378572.5020203@redhat.com> (Eric Blake's message of "Mon, 2 Nov 2015 08:46:58 -0700")

Eric Blake <eblake@redhat.com> writes:

> On 11/02/2015 01:40 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>> 
>>> Loaded question in response to
>>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg06988.html, but
>> 
>> Discussion of our parser's enormous appetite for wasting RAM.  Fixable,
>> but it's work, and it's not its only defect.
>> 
>
>>> On the other hand, one of the "features" of qemu's hand-rolled json
>>> parser is the ability to do qobject_from_jsonf("{'foo':%s}", "bar")
>>> (that is, we extended JSON with our notion of single-quoted strings, and
>>> with our notion of %s and similar escape sequences for piecing together
>>> multiple inputs into a single input stream without having to first
>>> g_strdup_printf our pieces into a single string).  I don't know if
>>> libyajl lets us add extensions to the language it parses.
>> 
>> Actually two separate extensions:
>> 
>> * Single-quoted strings
>> 
>>   The extension's purpose is avoiding quotes in C.  Example:
>> 
>>       "{ 'execute': 'migrate_set_speed', 'arguments': { 'value': 10 } }"
>> 
>>   is easier to read than
>> 
>>       "{ \"execute\": \"migrate_set_speed\", \"arguments\": { \"value\": 10 } }"
>> 
>>   Most actual uses are in tests.
>
> Except that we have documented that QMP clients may use it; and indeed:
>
> $ printf "{'execute':'qmp_capabilities'}\n{'execute':'quit'}" | \
>    ./x86_64-softmmu/qemu-system-x86_64 -nographic -nodefaults -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 2},
> "package": ""}, "capabilities": []}}
> {"return": {}}
> {"return": {}}
> {"timestamp": {"seconds": 1446478389, "microseconds": 265886}, "event":
> "SHUTDOWN"}

Unnecessary mistake :(

> So that is an absolute must for whatever parser we choose.  My first
> glance at libyajl is that it does NOT currently allow single quotes (not
> even with any of its existing options), so we'd have to pre-process
> input before feeding it to yajl.

I'm not sure reusing an existing parser makes much sense if we need to
preprocess anyway.  After all, parsing JSON isn't exactly a moon shot,
at least if you know what you're doing.

>>   JSON5, a proposed extension to JSON also supports single-quoted
>>   strings.  So does Python.
>
> It would be an interesting task to see if yajl would accept patches to
> parse JSON5 as additional options (for example, yajl already has options
> on whether to diagnose or ignore UTF8 errors, and on whether to allow /*
> */ javascript comments).  But then we'd be requiring an
> as-yet-unreleased version of libyajl rather than being able to rely on
> what distros already have, at least for a few years; or we'd have to
> carry new-enough yajl as a git submodule within qemu.git.
>
> I'll ask on the yajl mailing lists, to get a feel for community
> receptiveness, before attempting to actually write patches on that front.

Makes sense.

>> * Optional interpolation
>> 
>>   If you pass a va_list to the parser, it replaces certain escape
>>   sequences by values from that va_list.  The escape sequences are a
>>   subset of printf conversion specifiers, to enable compile-time
>>   checking with __attribute__((format...)).
>> 
>>   We used to rely on this for creating "rich" error objects, but those
>>   are long gone (commit df1e608).  Perhaps two dozen uses are left.
>
> The testsuite is probably the biggest user, but we still have uses
> throughout the code base.
>
> Based on my look at libyajl, I think we CAN get away with stream
> interpolation - yajl maintains state such that you do NOT have to feed
> it the entire stream in one go.  So this one is not insurmountable; we
> could rewrite our qobject_from_jsonf() to make multiple calls into yajl
> without having to pre-format a single string.

Sounds complicated...

>>   We could convert them to use "normal" interpolation, i.e. first format
>>   the arguments as JSON, then interpolate with g_strdup_printf() or
>>   similar, then parse.  Formatting only to parse it right back is
>>   inelegant.  Also inefficient, but that probably doesn't matter here.
>
> Or indeed, this is still an option.
>
>> 
>>   The current interface could be retained as convenience function to
>>   interpolate and feed the result to the JSOn parser.
>> 
>>   Alternatively, we could build the QObject manually instead.  More
>>   efficient than either kind of interpolation, but a good deal less
>>   readable.
>
> At any rate, it is certainly less of a show-stopper when compared to the
> need for supporting single-quoted strings.
>
>> Got one more, actually a pitfall, not an extension:
>> 
>> * Representation of JSON numbers
>> 
>>   RFC 7159 doesn't specify how numbers are to be represented.
>> 
>>   Many JSON implementations represent any JSON number as double.  This
>>   can represent signed integers up to 53 bits exactly.
>> 
>>   Our parser represents numbers as int64_t when possible, else as
>>   double.
>> 
>>   Unlike the extensions above, this one is essential: any parser that
>>   can't do it is useless for us.  Can yajl do it?
>
> Yes; the yajl parser has 4 modes of parse operation based on which of
> three callbacks you implement: double-only (yajl_double), long long-only
> (yajl_integer), double-and-int (both yajl_double and yajl_integer, not
> sure which one has precedence if input satisfies both formats), or
> custom number (yajl_number, which is given a string, and you turn it
> into the format of your choice).  Likewise, the yajl formatter has 2
> functions: yajl_gen_double() (formats doubles) and yajl_gen_number()
> (you provide literal strings formatted how you like).

Good.

>> More extensions or pitfalls might be lurking in our parser.  Therefore,
>> replacing our parser by a suitable library is not without risk.  I guess
>> we could do it over a full development cycle.  No way for 2.5.
>> 
>
> Absolutely not for 2.5; we've missed softfreeze, and this would count as
> a new feature.
>
>> If we decide to attempt replacing it in the next development cycle, we
>> might want to refrain from fixing its bugs except for true show
>> stoppers.
>> 

  reply	other threads:[~2015-11-02 19:10 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-30 19:45 [Qemu-devel] RFC: libyajl for JSON Eric Blake
2015-10-30 20:16 ` Peter Maydell
2015-11-02  8:40 ` Markus Armbruster
2015-11-02 11:49   ` Paolo Bonzini
2015-11-02 12:56     ` Markus Armbruster
2015-11-02 13:47       ` Paolo Bonzini
2015-11-02 14:10   ` Stefan Hajnoczi
2015-11-02 15:46   ` Eric Blake
2015-11-02 19:10     ` Markus Armbruster [this message]
2015-11-02 20:08       ` Eric Blake
2015-11-03  7:17         ` Markus Armbruster
2015-11-03 13:19           ` Luiz Capitulino
2015-11-03 13:38             ` Paolo Bonzini
2015-11-03 13:46               ` Luiz Capitulino
2015-11-03 13:53                 ` Paolo Bonzini
2015-11-03 14:26                   ` Luiz Capitulino
2015-11-03 14:53                     ` Paolo Bonzini
2015-11-03 15:08                     ` Markus Armbruster
2015-11-03 13:40             ` Eric Blake
2015-11-03 13:44               ` Daniel P. Berrange
2015-11-03 13:53               ` Luiz Capitulino
2015-11-02 13:17 ` Luiz Capitulino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87egg8nro5.fsf@blackfin.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.