On 07/28/2015 08:33 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 07/01/2015 02:22 PM, Markus Armbruster wrote:
>>> Caution, rough edges.
>>
>> No joke. It doesn't even compile without this fixup to a rebase snafu
>> (see [0] below):
> 
> Uh, how did that happen?  I compiled it a million times... (except for
> the last time, obviously).

At least you're not alone; I've done dumb things like that, too.  And at
least it was easy to figure out, so I could test it.

>>>
>>> FIXME it can generate awfully long lines
>>
>> We already have long lines in generated output, but I agree that finding
>> ways to break it up might be nice.  Actually,
> 
> This one's different in that the length is due to string literals.
> indent is happy to break lines for us, but not string literals.
> 
> Longest line is a bit over 4KiB for me.
> 

If we break up string literals, at least use some indentation to make it
obvious that multiple lines merge to a single array entry. For example
(after patch 47):

...
    "{ 'name': ':abr', 'meta-type': 'object', "
      "'members': [ "
        "{ 'name': 'device', 'type': ':acg', 'default': null }, "
        "{ 'name': 'node-name', 'type': ':acg', 'default': null }, "
        "{ 'name': 'snapshot-file', 'type': ':acg' }, "
        "{ 'name': 'snapshot-node-name', 'type': ':acg', 'default': null
}, "
        "{ 'name': 'format', 'type': ':acg', 'default': null }, "
        "{ 'name': 'mode', 'type': ':afo', 'default': null } ] }, "
    "{ 'name': ... "

Oh, and just thought of something while typing: although patch 47 masks
type names from the end user, it would be VERY worthwhile if the C code
had strategic comments that pointed back to the qapi type name:

    "{ 'name': ':abr', " /* TypeFoo */
      "'meta-type': 'object'"
...

(and then there's the '' vs. "" issue which impacts the output, as
well).  And if you do the "," between entries as a separate line, that
might impact whether you do a trailing "]" on a line by its own (see [3]
below).

>>> +
>>> +{ 'struct': 'SchemaInfoEnum',
>>> +  'data': { 'values': ['str'] } }
>>
>> Do we want to document anything about sort ordering of this list?  Is it
>> worth sorting the array by name, to allow clients to bsearch for whether
>> a particular enum value is supported, rather than having to linear search?
> 
> Haven't thought about it.  We currently emit them in definition order,
> which is not sorted.  Largest enums:
> 
>     Q_KEY_CODE_MAX = 125,
>     BLKDEBUG_EVENT_MAX = 43,
>     BLOCKDEV_DRIVER_MAX = 28,
>     CHARDEV_BACKEND_KIND_MAX = 19,
>     RUN_STATE_MAX = 15,
>     NET_CLIENT_OPTIONS_KIND_MAX = 12,
> 
> Most enums are small: median is 3.
> 
> Would libvirt prefer them sorted?

Libvirt can probably live without sorting of enum constants (searching
Q_KEY_CODE_MAX isn't going to be that noticeable of an impact on O(n)
vs. O(logn); I predict much more time will be spent searching for type
"xyz" referenced by command "abc" from the overall array, and repeating
that search for multiple feature probes).

But if we do want sorting, we need it up front (it will be easier to
decide to use bsearch down the road if we are guaranteed that output is
sorted, than it would be to try and learn whether whether we are talking
to a new vs. old qemu to learn if sorting is in effect because it was
added as an after-thought).  And if we don't want sorting, documenting
that data is NOT guaranteed to be position-dependent, in spite of being
in a JSON array, is a nice touch.

>>> +
>>> +{ 'struct': 'SchemaInfoObjectVariant',
>>> +  'data': { 'case': 'str',
>>> +            'members': [ 'SchemaInfoObjectMember' ] } }
>>
>> Would it be simpler to just have:
>>
>> 'data': { 'case': 'str', 'type': 'str' }
>>
>> and make the user refer recursively to the (possibly-implicit) type for
>> the members?
> 
> Hmmm...
> 
> QAPISchemaObjectTypeVariant has members name, type, flat, and a few more
> that don't matter here.
> 
> For a non-flat variant with name=N, type=T, my code creates
> 
>     { 'case': 'N', 'members': [ { 'name': 'data', 'type': 'T' } ] }
> 
> This means when the tag is 'N', we have a member 'data' of type 'T'.
> 
> For a flat variant, it creates
> 
>     { 'case': 'N', 'members': [ { ... the members of T ... } ] }
> 
> This means when the tag is 'N', we have all the members of T.
> 
> If I understand you correctly, you're proposing
> 
>     { 'case': 'N', 'type': 'T' }
> 
> to mean when the tag is 'N', we have all the members of T.  For the flat
> variant above, we'd create exactly that.
> 
> T must not have variants, but the schema doesn't reflect that.

That's our current restriction, but it's one we might decide to lift in
the future.  Having a type with two different discriminators could get a
bit weird to think about, but doesn't seem to be technically impossible.

> 
> For the simple variant, we'd then create
> 
>     { 'case': 'N', 'type': 'TT' }
> 
> where TT is a new implicit object type with a single member with name
> data and type T.
> 
> Correct?

Yes.

> 
>> In particular, if we ever decide to allow a flat union to have another
>> union as a branch, rather than the current restriction that all branches
>> must be structs, then referring to the type of a branch may be easier
>> than breaking out all members of a struct.
> 
> Not sure I completely get you here.  Using an object type instead of a
> member list is obviously more flexible, because for any member list we
> can make up an object type with the same meaning, but not vice versa.

Indeed, and that's the restriction that I mention we might someday want
to lift.  Or, in (modified, due to inline {} types) qapi terms, if we
ever want to allow:

{ 'union': 'Helper', 'data': { 'number': 'int', 'string': 'str' } }
{ 'enum': 'MyEnum', 'data': [ 'a', 'b' ] }
{ 'union': 'Main', 'base': { 'switch': 'MyEnum' },
  'discriminator': 'switch', 'data': { 'a': { 'boolean': 'bool' },
                                       'b': 'Helper' } }
{ 'command': 'foo', 'data': { 'data': 'Main' } }

then that would permit QMP invocations of:
{ "execute": "foo", "arguments": { "data": {
  "switch": "a", "boolean": true } } }
{ "execute": "foo", "arguments": { "data": {
  "switch": "b", "type": "number", "data": 1 } } }
{ "execute": "foo", "arguments": { "data": {
  "switch": "b", "type": "string", "data": "hello" } } }

which we can express as a list of case/types for the primary variants of
'Main' (those types in turn refer to the secondary variants of type
Helper), but which we cannot express as a [ 'SchemaInfoObjectMember' ]
list, because the type of the "data" member depends on the secondary
discriminator that is called into use on the "b" case of the primary
discriminator.

So I think we're saying the same thing, that a [
'SchemaInfoObjectMember' ] can always be written as a reference to an
object type name, but not all object type names can be broken back into
an array of SchemaInfoObjectMember (only those types that are pure
structs without variants); and that although we currently do not allow
sub-variants within a union, we should not get in the way of that being
a possible future extension.

> 
>>                                             And if that's the case, it
>> may have knock-on simplifications to your earlier patches for tracking
>> variants. See [1] below for more thoughts...
>>
>> Do we want to guarantee anything about the sort ordering in this list?
> 
> Again, haven't thought about it.
> 
> Do we expect member lists to get so large binary search is called for?

Probably not, since such a list would be unwieldy for both client and
server.  We tend to add boxing and optional sub-structs rather than
direct parameters if we have that much information to pass along (think
about how adding throttling parameters to a new block device was done
with a single top-level parameter pointing to a throttling sub-struct,
rather than adding lots of throttling parameters at top level).

But, as with enum sorting, actually documenting our choice will help
cement the expectations of clients on what they have to do when learning
if a parameter was added.

> 
>>> +
>>> +{ 'struct': 'SchemaInfoObject',
>>> +  'data': { 'members': [ 'SchemaInfoObjectMember' ],
>>> +            '*tag': 'str',
>>> +            '*variants': [ 'SchemaInfoObjectVariant' ] } }
>>
>> or these?
> 
> Same question.
> 

Here, if enums are sorted, then case branches within variants should be
sorted. If enums are unsorted, then I'm fine if case branches are also
unsorted (and possibly in a different order than the enum was), but be
sure to document that.

>>> +
>>> +{ 'struct': 'SchemaInfoAlternate',
>>> +  'data': { 'members': [ 'SchemaInfoObjectMember' ] } }
>>
>> Here's an example of what you generated:
>>     "{ 'name': 'BlockdevRef', 'meta-type': 'alternate', 'members': [ {
>> 'name': 'definition', 'type': 'BlockdevOptions' }, { 'name':
>> 'reference', 'type': 'str' } ] }, "
>>
>> I think you could get away with something simpler:
>>
>>  'data': { 'types': [ 'str' ] }
>>
>> as in:
>>  "{ 'name': 'BlockdevRef', 'meta-type': 'alternate', 'types': [
>> 'BlockdevOptions', 'str' ] }, "
> 
> I.e. have a list of types instead of a list of members.
> 
> Let's see what we'd lose, by enumerating the members of
> SchemaInfoObjectMember:
> 
> * name: not ABI, should not be examined (see commit message), thus no
>   loss.
> 
> * type: kept.
> 
> * default: never present (see commit message), thus no loss
> 
>> the only worry is whether we might want future extensions, where we'd
>> want additional information per element of that array, vs. being forced
>> to return two arrays in parallel (arrays of structs are more extensible
>> than arrays of strings).
>> Seems like this would be just a
> 
> Yes?
> 
> Choices:
> 
> * The only piece of information we need on an alternative right now is
>   the type, so make members a list of types.  Nice now, awkward if we
>   ever need more,
> 
> * To provide for future additions, make it a list of
>   SchemaInfoAlternateMember, where SchemaInfoAlternateMember has just
>   one member type now.
> 
> * Reuse existing SchemaInfoObjectMember, because that's close and I'm
>   lazy.
> 
> Preferences?

At this point, my vote is with a new SchemaInfoAlternateMember class
(SchemaInfoObjectMember may diverge in a different direction, and it
would eliminate the question of how to not expose the branch names as
ABI; but keeping things as a (one-member, for now) dictionary will allow
future extensions).

>>> @@ -265,7 +265,7 @@ class QAPISchemaGenCommandVisitor(QAPISchemaVisitor):
>>>          self.defn = None
>>>          self.regy = None
>>>          self.visited_rets = None
>>> -    def visit_begin(self):
>>> +    def visit_begin(self, schema):
>>
>> And again my python object-oriented newness is showing through; where I
>> guess all children have to update signatures to still be polymorphic to
>> a parent adding a parameter.
> 
> Maybe they need to be updated, maybe not, but updating them neatly
> sidesteps Python polymorphism questions from mere mortals like you and
> me.

see reference to [4] below

> 
>>> +''',
>>> +                          c_name=c_name(name))
>>> +        self.defn = mcgen('''
>>> +char %(c_name)s[] = "["
>>> +    "%(c_jsons)s]";
>>
>> And again. Also, I'd consider putting the "]" on its own line, like the
>> "[" was, so that you can more easily cut and paste individual lines of
>> generated output (but since JSON doesn't allow trailing comma, I guess
>> the last line is still always going to be special).
> 
> That's my reason for keepint the "]" there.
> 

[3] and that works for me, unless we want to break long lines into
shorter ones by virtue of putting "," (and thus "]") on their own line
to make it even more obvious where we are breaking between elements of
the overall array.

>>> +''',
>>> +                          c_name=c_name(name),
>>> +                          c_jsons=', "\n    "'.join(self.jsons))
>>
>> Cool syntax :)
> 
> Possibly bordering on too cool :)

Java was the first language I encountered where "".foo() was valid
syntax; sometimes, I wish C had made strings a first class type.  I'm
fine with it as it is.

>> [1] Ah, so .flat is still in use here, to avoid having to create
>> implicit types everywhere.  But if we create implicit types for simple
>> unions, and just track variants by their case name/type instead of case
>> name/[members], it will allow us to have a union as a case branch (I
>> don't know that we need that much flexibility), and not have to worry
>> about exposing .flat everywhere.  It may even result in a smaller JSON
>> string (you'd have to play with it to know for sure).
> 
> We should try hard to get the introspection schema right from the start.
> 
> But the internal representation is malleable.  I know we can implement
> non-flat unions completely in terms of flat ones (and that means no
> .flat), but I also know I failed at doing it in this series.  I'm not
> sure your idea will do the trick completely.  It's fine, we can finish
> the job later.

Yes, I've come to that conclusion myself - doesn't matter what we do on
the first round internally as long as the output is right; cleaning up
the internals can come later.


> I could shoehorn both views into a single visitor function, by passing
> both views, .base + .local_members, and .members.  All implementations
> will use only one of the views, but it's not immediately obvious which
> one.  So I chose to have two visitor functions.  Matter of taste.

I can live with it (documentation that sub-classes should override at
most only one of the two visitors might help the cause, though).

>>> +++ b/scripts/qapi.py
>>> @@ -764,7 +764,7 @@ class QAPISchemaEntity(object):
>>>          pass
>>>  
>>>  class QAPISchemaVisitor(object):
>>> -    def visit_begin(self):
>>> +    def visit_begin(self, schema):
>>
>> I don't know enough python to know if making schema optional in the
>> parent class affects what the child class is allowed to implement while
>> still overriding things.
> 
> Not sure I get you here.

It was an idle musing on whether

class QAPISchemaVisitor(object):
    def visit_begin(self, schema=None):

would permit:
class QAPIIntrospectVisitor(QAPISchemaVisitor):
    def visit_begin(self):

with proper polymorphism.  But you've already come to the conclusion
above [4] that it's easier to not mess with optional parameters (leave
that for the python gurus), and that mere mortals are better off using
something that obviously works instead of requiring knowledge about
language internals.


>>> +++ b/tests/.gitignore
>>> @@ -19,6 +19,7 @@ test-opts-visitor
>>>  test-qapi-event.[ch]
>>>  test-qapi-types.[ch]
>>>  test-qapi-visit.[ch]
>>> +test-qapi-introspect.[ch]
>>
>> [2] Ah, maybe this is the file that wasn't quite right.
> 
> Anything I need to fix?

The generated files created by 'make check' wer named
test-qmp-introspect.[ch] (either s/qapi/qmp/ here, or else fix a
Makefile rule to generate the desired name).

> 
> Thanks!  I really appreciate your review.  Must have been a big chunk of
> work.

Yes, doing a thorough review took me the better part of a week to go
through it all, so I can only imagine how much time you've invested in
it.  Hopefully v3 will be easier to review, because it will be diffs
against this version and mainly focusing on whether review comments were
addressed.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org