From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39729) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZyKoE-0005gU-2f for qemu-devel@nongnu.org; Mon, 16 Nov 2015 09:31:07 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZyKo9-0001lT-Sl for qemu-devel@nongnu.org; Mon, 16 Nov 2015 09:31:06 -0500 From: Markus Armbruster References: <1447224690-9743-1-git-send-email-eblake@redhat.com> <1447224690-9743-20-git-send-email-eblake@redhat.com> <564621E7.9040103@redhat.com> <87wptlu5sg.fsf@blackfin.pond.sub.org> <56465806.8030103@redhat.com> Date: Mon, 16 Nov 2015 15:30:46 +0100 In-Reply-To: <56465806.8030103@redhat.com> (Eric Blake's message of "Fri, 13 Nov 2015 14:37:10 -0700") Message-ID: <87vb92dnjd.fsf@blackfin.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [PATCH v11 19/28] qapi: Change munging of CamelCase enum values List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake Cc: Kevin Wolf , Peter Maydell , "open list:Block layer core" , "Michael S. Tsirkin" , Jason Wang , qemu-devel@nongnu.org, Michael Roth , Gerd Hoffmann , Amit Shah , Luiz Capitulino , Andreas =?utf-8?Q?F=C3=A4rber?= Eric Blake writes: > On 11/13/2015 11:13 AM, Markus Armbruster wrote: > >> We need c_name() to protect ticklish identifiers only when its result is >> used as identifier. Not when it's *part* of an identifier, >> e.g. prefixed with qapi_, or camel_to_upper(type_name) + '_'. >> >> We can protect even when we don't need to, if that helps keeping things >> simple. > > As far as I can tell, as soon as we reserve Q_ in addition to q_, and > fix the bug in c_name() applying munging backwards, then we are safe to > state that any comparison between two names for collisions will be > accurate as long as both sides of the comparison picked the same value > of c_name(name, protect), whether or not our later use of that name is > done with protect=True or protect=False. Okay. >> The obvious simple way to check for collisions works like this: >> >> 1. Every QAPI name is mangled in exactly one way, modulo case: always >> with c_name(), and always with the same value of protect. > > That part is easy. Maybe I could even reuse guardname(name) instead of > c_name(name).upper(), although I don't know that it buys any clarity. Certainly not without a rename :) >> 2. We require the mangled name to be case-insensitively unique in its >> name space. > > That part is a bit harder: we unfortunately have the existing clash > between the command 'stop' and the event 'STOP'. I didn't find any > other clashes in existing clients, though (phew), so we can either > whitelist just that one, or more likely, set up separate namespaces for > commands vs. events when doing the case collision tests. I think the natural split is three: commands, events, types. > But it's a bummer that it won't be quite as easy as I had hoped (using a > single dictionary for both lookups and case collisions is easier than > having two dictionaries to separate namespaces, while still searching > both dictionaries for lookups). > > At any rate, I'm playing with patches along these lines. Before I dive into yet another lengthy treatise: I strongly recommend to kick clash detection work as far back in the queue as practical. It's hairy, and improving it isn't a priority. It has clogged the queue more than enough already. Let's take a step back and consider requirements. I think there are broadly two: the QAPI schema must make sense, and generation of clashing C identifiers should be caught. QAPI's requirements are pretty simple. It's always clear whether a name denotes a command, event, type, or member of a certain type. Thus, requiring names to be unique within their kind suffices. In other words, separate name spaces for commands, events, types and for each type's members. To avoid confusion, we can require more. Right now, we require command, event and type names to be unique within their combined kinds. In other words, a combined name space for commands, events and types. Catching C clashes is more complicated, because we need to analyze the generators' (less than disciplined) use of C's name spaces to find its requirements. QAPI names get mangled and combined with prefixes or suffixes to become C identifiers. Prefixes and suffixes can only reduce clashes among generated names (they can require reserving names, but let's ignore that here). Therefore, considering clashes between mangled QAPI names suffices. If we mangled each name in exactly one way, this would be trivial: use the mangled name as dictionary key, and use a separate dictionary for each C name space. Note it's trivial even with multiple ways to mangle, as long as each name gets mangled in exactly one of them. Unfortunately, this is not the case. We mangle in the following ways: * Standard mangling with c_name() Variations: protect=False, shouted, unshouted. * camel_to_upper() mangling Names that get mangled in more than one way, to the best of my knowledge: (1) Enumeration member names - Shouted standard mangling for the enumeration constant - Standard mangling for the union member when used as variant tag (2) Enumeration type names - Standard mangling for the typedef - camel_to_upper() mangling for the enumeration constants (3) Event names - Unshouted standard mangling for the function to emit the event - Shouted standard mangling for the enumeration constants (4) Possibly built-in type names - Standard mangling with protect=False - If we screw up somehow, standard mangling without protect=False General solution: put all possible manglings into the name space's dictionary. For example, enter a command name in standard mangling, but enter an event name twice, once in unshouted standard mangling, and once in shouted standard mangling. If we peek into a few black boxes, we can simplify things a bit. We use C's ordinary identifier name space, the struct tag name space, and the member name spaces of structs and unions we generate. We don't need dictionaries for the C struct, union and enum tag name spaces, because we always use the same identifier in the ordinary name space. We can safely ignore protect=False since we reserve the q_ prefix in all variations, and avoid its use outside c_name(). Standard mangling as key suffices for (4). We can ignore a mangling when we can prove it can clash only when another one clashes: * An event name's unshouted standard mangling can clash only when the shouted mangling clashes. Down to one key, but it's still an unusual one. * If we enforce "enumeration member names must be lower-case", then shouted standard mangling can clash only when standard mangling clashes. Use of standard mangling as key suffices for (1). We can cover *all* case variations of standard mangling by making the dictionary case-insensitive. Unfortunately, this runs into the STOP vs. stop roadblock in the ordinary identifier dictionary. No go. We could split the ordinary identifier dictionary into events, commands and types (the split that exists in QAPI) if generators ensure identifiers from different parts can't clash, by restricting names or by adding prefixes or siffixes. I think that's the case, but it's a non-trivial argument. The resulting dictionaries could then be made case-insensitive separately. But this is getting too complex for my taste. Instead of or in addition to simplifying clash detection, we can simplify mangling: * If we enforce "event name must be ALL_CAPS", then shouted standard mangling can be trivially replaced by standard mangling. Since we already showed that unshouted standard mangling can be ignored, use of standard mangling as key suffices for (3). * That leaves (2). To make standard mangling suffice there as well, we can simply replace camel_to_upper() mangling by standard mangling. If that's inopportune, we'll need to find another solution. To be honest, I'm *this* close to giving up on clash detection entirely. It's 100% self-inflicted pain.