On Thu, 1 Jul 2021, Luca Fancellu wrote:
> > On 23 Jun 2021, at 23:03, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > 
> > On Mon, 10 May 2021, Luca Fancellu wrote:
> >> Add preprocessor called by doxygen before parsing headers,
> >> it will include in every header a doxygen_include.h file
> >> that provides missing defines and includes that are
> >> usually passed by the compiler.
> >> 
> >> Add doxy_input.list that is a text file containing the
> >> relative path to the source code file to be parsed by
> >> doxygen. The path sould be relative to the xen folder:
> >> E.g. xen/include/public/grant_table.h
> >> 
> >> Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
> >> ---
> >> docs/xen-doxygen/doxy-preprocessor.py | 110 ++++++++++++++++++++++++++
> >> docs/xen-doxygen/doxy_input.list      |   0
> >> docs/xen-doxygen/doxygen_include.h.in |  32 ++++++++
> >> 3 files changed, 142 insertions(+)
> >> create mode 100755 docs/xen-doxygen/doxy-preprocessor.py
> >> create mode 100644 docs/xen-doxygen/doxy_input.list
> >> create mode 100644 docs/xen-doxygen/doxygen_include.h.in
> >> 
> >> diff --git a/docs/xen-doxygen/doxy-preprocessor.py b/docs/xen-doxygen/doxy-preprocessor.py
> >> new file mode 100755
> >> index 0000000000..496899d8e6
> >> --- /dev/null
> >> +++ b/docs/xen-doxygen/doxy-preprocessor.py
> >> @@ -0,0 +1,110 @@
> >> +#!/usr/bin/python3
> >> +#
> >> +# Copyright (c) 2021, Arm Limited.
> >> +#
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +#
> >> +
> >> +import os, sys, re
> >> +
> >> +
> >> +# Variables that holds the preprocessed header text
> >> +output_text = ""
> >> +header_file_name = ""
> >> +
> >> +# Variables to enumerate the anonymous structs/unions
> >> +anonymous_struct_count = 0
> >> +anonymous_union_count = 0
> >> +
> >> +
> >> +def error(text):
> >> +    sys.stderr.write("{}\n".format(text))
> >> +    sys.exit(1)
> >> +
> >> +
> >> +def write_to_output(text):
> >> +    sys.stdout.write(text)
> >> +
> >> +
> >> +def insert_doxygen_header(text):
> >> +    # Here the strategy is to insert the #include <doxygen_include.h> in the
> >> +    # first line of the header
> >> +    abspath = os.path.dirname(os.path.abspath(__file__))
> >> +    text += "#include \"{}/doxygen_include.h\"\n".format(abspath)
> >> +
> >> +    return text
> >> +
> >> +
> >> +def enumerate_anonymous(match):
> >> +    global anonymous_struct_count
> >> +    global anonymous_union_count
> >> +
> >> +    if "struct" in match.group(1):
> >> +        label = "anonymous_struct_%d" % anonymous_struct_count
> >> +        anonymous_struct_count += 1
> >> +    else:
> >> +        label = "anonymous_union_%d" % anonymous_union_count
> >> +        anonymous_union_count += 1
> >> +
> >> +    return match.group(1) + " " + label + " {"
> >> +
> >> +
> >> +def manage_anonymous_structs_unions(text):
> >> +    # Match anonymous unions/structs with this pattern:
> >> +    # struct/union {
> >> +    #     [...]
> >> +    #
> >> +    # and substitute it in this way:
> >> +    #
> >> +    # struct anonymous_struct_# {
> >> +    #     [...]
> >> +    # or
> >> +    # union anonymous_union_# {
> >> +    #     [...]
> >> +    # where # is a counter starting from zero, different between structs and
> >> +    # unions
> >> +    #
> >> +    # We don't count anonymous union/struct that are part of a typedef because
> >> +    # they don't create any issue for doxygen
> >> +    text = re.sub(
> >> +        "(?<!typedef\s)(struct|union)\s+?\{",
> >> +        enumerate_anonymous,
> >> +        text,
> >> +        flags=re.S
> >> +    )
> 
> Hi Stefano,
> 
> > 
> > My python is a bit rusty but I thought this is really clever!
> > 
> > One question: given that anonymous_struct_count is local per file being
> > processed, it always starts at 0 for each header. I think that is
> > actually better from a documentation readability point of view.
> > 
> > However, is it possible that Doxygen gets confused in a case where we
> > can multiple "struct anonymous_struct_0", e.g. one for grant_table.h,
> > one for event_channel.h, etc. ?
> 
> Yes this is a very good point, I did some experiment and it can happen, however so far we didn’t notice any
> problem because all the anonymous union/struct were part of other data structure, in that case there is no
> problem at all because we have upper_data_structure::anonymous_{struct/union}_0/1/2…
> 
> So given the fact that is difficult to have clash, I would say we can handle any future case separately.
> 
> Having a global numbering can solve the issue but because the script is called separately for each header,
> Implementing it will involve some changes, there should be a file to maintain the number across invocation
> and so on.
> 
> Let me know what do you think about that and if in your opinion we can proceed the way it is now.

I think anonymous union/struct are always part of another data struct in
our headers, so we shouldn't have a problem. I think we can get away
without a global numbering system. Maybe you should add a note about
this limitation in the commit message or an in-code comment.