From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39729)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1ZyKoE-0005gU-2f
	for qemu-devel@nongnu.org; Mon, 16 Nov 2015 09:31:07 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1ZyKo9-0001lT-Sl
	for qemu-devel@nongnu.org; Mon, 16 Nov 2015 09:31:06 -0500
From: Markus Armbruster <armbru@redhat.com>
References: <1447224690-9743-1-git-send-email-eblake@redhat.com>
	<1447224690-9743-20-git-send-email-eblake@redhat.com>
	<564621E7.9040103@redhat.com> <87wptlu5sg.fsf@blackfin.pond.sub.org>
	<56465806.8030103@redhat.com>
Date: Mon, 16 Nov 2015 15:30:46 +0100
In-Reply-To: <56465806.8030103@redhat.com> (Eric Blake's message of "Fri, 13
	Nov 2015 14:37:10 -0700")
Message-ID: <87vb92dnjd.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [PATCH v11 19/28] qapi: Change munging of
	CamelCase enum values
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, "open list:Block layer core" <qemu-block@nongnu.org>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, qemu-devel@nongnu.org, Michael Roth <mdroth@linux.vnet.ibm.com>, Gerd Hoffmann <kraxel@redhat.com>, Amit Shah <amit.shah@redhat.com>, Luiz Capitulino <lcapitulino@redhat.com>, Andreas =?utf-8?Q?F=C3=A4rber?= <afaerber@suse.de>

Eric Blake <eblake@redhat.com> writes:

> On 11/13/2015 11:13 AM, Markus Armbruster wrote:
>
>> We need c_name() to protect ticklish identifiers only when its result is
>> used as identifier.  Not when it's *part* of an identifier,
>> e.g. prefixed with qapi_, or camel_to_upper(type_name) + '_'.
>> 
>> We can protect even when we don't need to, if that helps keeping things
>> simple.
>
> As far as I can tell, as soon as we reserve Q_ in addition to q_, and
> fix the bug in c_name() applying munging backwards, then we are safe to
> state that any comparison between two names for collisions will be
> accurate as long as both sides of the comparison picked the same value
> of c_name(name, protect), whether or not our later use of that name is
> done with protect=True or protect=False.

Okay.

>> The obvious simple way to check for collisions works like this:
>> 
>> 1. Every QAPI name is mangled in exactly one way, modulo case: always
>>    with c_name(), and always with the same value of protect.
>
> That part is easy.  Maybe I could even reuse guardname(name) instead of
> c_name(name).upper(), although I don't know that it buys any clarity.

Certainly not without a rename :)

>> 2. We require the mangled name to be case-insensitively unique in its
>>    name space.
>
> That part is a bit harder: we unfortunately have the existing clash
> between the command 'stop' and the event 'STOP'.  I didn't find any
> other clashes in existing clients, though (phew), so we can either
> whitelist just that one, or more likely, set up separate namespaces for
> commands vs. events when doing the case collision tests.

I think the natural split is three: commands, events, types.

> But it's a bummer that it won't be quite as easy as I had hoped (using a
> single dictionary for both lookups and case collisions is easier than
> having two dictionaries to separate namespaces, while still searching
> both dictionaries for lookups).
>
> At any rate, I'm playing with patches along these lines.

Before I dive into yet another lengthy treatise: I strongly recommend to
kick clash detection work as far back in the queue as practical.  It's
hairy, and improving it isn't a priority.  It has clogged the queue more
than enough already.


Let's take a step back and consider requirements.  I think there are
broadly two: the QAPI schema must make sense, and generation of clashing
C identifiers should be caught.


QAPI's requirements are pretty simple.  It's always clear whether a name
denotes a command, event, type, or member of a certain type.  Thus,
requiring names to be unique within their kind suffices.  In other
words, separate name spaces for commands, events, types and for each
type's members.

To avoid confusion, we can require more.  Right now, we require command,
event and type names to be unique within their combined kinds.  In other
words, a combined name space for commands, events and types.


Catching C clashes is more complicated, because we need to analyze the
generators' (less than disciplined) use of C's name spaces to find its
requirements.

QAPI names get mangled and combined with prefixes or suffixes to become
C identifiers.

Prefixes and suffixes can only reduce clashes among generated names
(they can require reserving names, but let's ignore that here).
Therefore, considering clashes between mangled QAPI names suffices.

If we mangled each name in exactly one way, this would be trivial: use
the mangled name as dictionary key, and use a separate dictionary for
each C name space.

Note it's trivial even with multiple ways to mangle, as long as each
name gets mangled in exactly one of them.

Unfortunately, this is not the case.

We mangle in the following ways:

* Standard mangling with c_name()

  Variations: protect=False, shouted, unshouted.

* camel_to_upper() mangling

Names that get mangled in more than one way, to the best of my
knowledge:

(1) Enumeration member names

  - Shouted standard mangling for the enumeration constant
  - Standard mangling for the union member when used as variant tag

(2) Enumeration type names

  - Standard mangling for the typedef
  - camel_to_upper() mangling for the enumeration constants

(3) Event names

  - Unshouted standard mangling for the function to emit the event
  - Shouted standard mangling for the enumeration constants

(4) Possibly built-in type names

  - Standard mangling with protect=False
  - If we screw up somehow, standard mangling without protect=False

General solution: put all possible manglings into the name space's
dictionary.  For example, enter a command name in standard mangling, but
enter an event name twice, once in unshouted standard mangling, and once
in shouted standard mangling.

If we peek into a few black boxes, we can simplify things a bit.

We use C's ordinary identifier name space, the struct tag name space,
and the member name spaces of structs and unions we generate.

We don't need dictionaries for the C struct, union and enum tag name
spaces, because we always use the same identifier in the ordinary name
space.

We can safely ignore protect=False since we reserve the q_ prefix in all
variations, and avoid its use outside c_name().  Standard mangling as
key suffices for (4).

We can ignore a mangling when we can prove it can clash only when
another one clashes:

* An event name's unshouted standard mangling can clash only when the
  shouted mangling clashes.  Down to one key, but it's still an unusual
  one.

* If we enforce "enumeration member names must be lower-case", then
  shouted standard mangling can clash only when standard mangling
  clashes.  Use of standard mangling as key suffices for (1).

We can cover *all* case variations of standard mangling by making the
dictionary case-insensitive.  Unfortunately, this runs into the STOP
vs. stop roadblock in the ordinary identifier dictionary.  No go.

We could split the ordinary identifier dictionary into events, commands
and types (the split that exists in QAPI) if generators ensure
identifiers from different parts can't clash, by restricting names or by
adding prefixes or siffixes.  I think that's the case, but it's a
non-trivial argument.  The resulting dictionaries could then be made
case-insensitive separately.  But this is getting too complex for my
taste.

Instead of or in addition to simplifying clash detection, we can
simplify mangling:

* If we enforce "event name must be ALL_CAPS", then shouted standard
  mangling can be trivially replaced by standard mangling.  Since we
  already showed that unshouted standard mangling can be ignored, use of
  standard mangling as key suffices for (3).

* That leaves (2).  To make standard mangling suffice there as well, we
  can simply replace camel_to_upper() mangling by standard mangling.  If
  that's inopportune, we'll need to find another solution.

To be honest, I'm *this* close to giving up on clash detection entirely.
It's 100% self-inflicted pain.