From: "Jose E. Marchesi" <jose.marchesi@oracle.com>
To: bpf@vger.kernel.org
Cc: david.faust@oracle.com,
	James Hilliard <james.hilliard1@gmail.com>,
	Nick Desaulniers <ndesaulniers@google.com>,
	David Malcolm <dmalcolm@redhat.com>,
	Julia Lawall <julia.lawall@inria.fr>,
	elena.zannoni@oracle.com
Subject: BTF tag support in DWARF (notes for today's BPF Office Hours)
Date: Thu, 05 Jan 2023 12:37:57 +0100
Message-ID: <87r0w9jjoq.fsf@oracle.com>


Hello all.

Find below the notes we intend to use in today's BPF office hour to
discuss possible solutions for the current limitations in the DWARF
representation of the btf_type_tag C attributes, and hopefully decide on
one so we can move forward with this.

The list of suggested solutions below is of course not closed: these are
just the ones we could think of.  Better alternatives and suggestions
are very welcome!

BTF tag support in DWARF

* Current situation: annotations as children DIEs for pointees

  DWARF information is structured as a tree of DIE nodes.  Nodes can
  have attributes associated with them, as well as zero or more child
  DIEs.
   
  clang extends DWARF with a new tag (DIE type) =DW_TAG_LLVM_annotation=.
  Nodes of this type are used to associate a tag name with a tag value that
  is also a string.

  Example:

  :  DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "user"

  At the moment, clang generates =DW_TAG_LLVM_annotation= nodes as children
  of =DW_TAG_pointer_type= nodes.  The intended semantics are that the
  annotation applies to the pointed-to type.

  For example (indentation reflects the parent-children tree structure):

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "tag1"

  The example above attaches the annotation "btf_type_tag->tag1" to the
  type pointed to by its containing pointer_type, which is "int".
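
  As a rough illustration of the convention described above, the following
  Python sketch models DIEs as plain objects and shows how a consumer would
  attribute the annotation to the pointee.  The =Die= class and the
  =pointee_tags= helper are hypothetical names made up for this example;
  they do not correspond to any real DWARF library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Die:
    """A toy DIE: a tag, a dict of attributes, and child DIEs."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: List["Die"] = field(default_factory=list)


def pointee_tags(pointer: Die) -> Tuple[Optional[Die], List[str]]:
    """Return (pointee DIE, btf_type_tag values) for a pointer DIE.

    Under the current clang convention the annotation children of a
    DW_TAG_pointer_type describe the pointed-to type, not the pointer.
    """
    tags = [
        child.attrs["DW_AT_const_value"]
        for child in pointer.children
        if child.tag == "DW_TAG_LLVM_annotation"
        and child.attrs.get("DW_AT_name") == "btf_type_tag"
    ]
    return pointer.attrs.get("DW_AT_type"), tags


int_die = Die("DW_TAG_base_type", {"DW_AT_name": "int"})
ptr_die = Die(
    "DW_TAG_pointer_type",
    {"DW_AT_type": int_die},
    [Die("DW_TAG_LLVM_annotation",
         {"DW_AT_name": "btf_type_tag", "DW_AT_const_value": "tag1"})],
)

target, tags = pointee_tags(ptr_die)
print(target.attrs["DW_AT_name"], tags)  # int ['tag1']
```

  Note that nothing in this model lets the pointer DIE itself carry a tag.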

  This approach has the advantage that, since the new
  =DW_TAG_LLVM_annotation= nodes are effectively used as attributes, they are
  safely ignored by DWARF consumers that do not understand this DIE type.

  But this approach also has a big caveat: annotations on types that are not
  pointed to by a pointer type cannot be expressed in this design.  This
  obviously impacts simple types such as =int=, but also pointer types that
  are not pointees themselves.

  For example, it is not possible to associate the tag =__tag2= with the type
  =int **= in the following declaration (note that this uses sparse/clang
  ordering):

  : int * __tag1 * __tag2 h;

  - sparse
    +  __tag1 applies to int*, __tag2 applies to int**
    : got int *[noderef] __tag1 *[addressable] [noderef] [toplevel] __tag2 h
  - clang
    + According to DWARF __tag1 applies to int*, no __tag2 (??).
    + According to BTF  __tag1 applies to int*, no __tag2 (??).
    : DWARF
    : 0x00000023:   DW_TAG_variable
    :                 DW_AT_name	("h")
    :                 DW_AT_type	(0x0000002e "int **")
    :
    : 0x0000002e:   DW_TAG_pointer_type
    :                 DW_AT_type	(0x00000037 "int *")
    :
    : 0x00000033:     DW_TAG_LLVM_annotation
    :                 DW_AT_name	("btf_type_tag")
    :                 DW_AT_const_value	("tag1")
    : BTF
    : [1] TYPE_TAG 'tag1' type_id=3
    : [2] PTR '(anon)' type_id=1
    : [3] PTR '(anon)' type_id=4
    : [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
    : [5] VAR 'h' type_id=2, linkage=global
    :
    : 'h' -> ptr -> 'tag1' -> ptr -> int
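
  The loss of =__tag2= can be reproduced with a small Python model of the
  declaration above.  This is only a sketch under the assumption that a tag
  survives exactly when the tagged type is the pointee of some pointer type;
  =CHAIN= and =split_tags= are hypothetical names used for illustration.

```python
# Toy model of:  int * __tag1 * __tag2 h;   (sparse/clang ordering)
# Each entry: (type spelling, tag written on that type, pointee spelling).
CHAIN = [
    ("int **", "tag2", "int *"),
    ("int *", "tag1", "int"),
    ("int", None, None),
]


def split_tags(chain):
    """Return (kept, lost) tags when only pointees can carry annotations."""
    pointees = {pointee for _, _, pointee in chain if pointee is not None}
    kept, lost = [], []
    for spelling, tag, _ in chain:
        if tag is None:
            continue
        bucket = kept if spelling in pointees else lost
        bucket.append((spelling, tag))
    return kept, lost


kept, lost = split_tags(CHAIN)
print("kept:", kept)  # kept: [('int *', 'tag1')]
print("lost:", lost)  # lost: [('int **', 'tag2')]
```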

* A note about `void'

  The DWARF specification recommends denoting the =void= C type by
  generating a DIE with =DW_TAG_unspecified_type= and name "void".

  However, both GCC and LLVM do _not_ follow this recommendation and instead
  denote the =void= type by the absence of a =DW_AT_type= attribute in the
  containing node.

  Example, for a pointer to =void=:

  : 3      DW_TAG_pointer_type    [no children]

  Note also that the kernel sources have sparse annotations like:

  : void __user * data;

  Using sparse ordering, this means that the annotated type is =void=.
  Therefore it is very important to be able to tag the =void= basic type in
  this design.

  GDB and other DWARF consumers understand the spec-recommended way to denote
  =void=.
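
  The difference between the two encodings of =void= matters for tagging
  because only one of them provides a DIE to attach anything to.  The Python
  sketch below shows this with DIEs modelled as dictionaries; the helper
  names =pointee_die= and =pointee_name= are hypothetical.

```python
def pointee_die(pointer):
    """Return the DIE describing the pointee, or None if there is none."""
    return pointer.get("DW_AT_type")


def pointee_name(pointer):
    """Spell the pointee type, accepting both encodings of void."""
    target = pointee_die(pointer)
    if target is None:
        # GCC/LLVM convention: a missing DW_AT_type means void.
        return "void"
    # Spec-recommended convention: DW_TAG_unspecified_type named "void",
    # handled like any other named type DIE.
    return target["DW_AT_name"]


# void * as emitted today: no DW_AT_type, hence no DIE for void at all.
implicit = {"tag": "DW_TAG_pointer_type"}

# void * following the DWARF recommendation.
void_die = {"tag": "DW_TAG_unspecified_type", "DW_AT_name": "void"}
explicit = {"tag": "DW_TAG_pointer_type", "DW_AT_type": void_die}

for ptr in (implicit, explicit):
    # Only the explicit form offers a DIE that an annotation could attach to.
    print(pointee_name(ptr), pointee_die(ptr) is not None)
# void False
# void True
```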

* Solution 1: annotations as qualifiers

  A possible solution for this is to handle =DW_TAG_LLVM_annotation= the same
  way as C type qualifiers are handled in DWARF: by including them in the
  type chain linked by =DW_AT_type= attributes.

  For example:

  : DW_TAG_pointer_type
  :   DW_AT_type ("btf_type_tag")
  :
  : DW_TAG_LLVM_annotation
  :   DW_AT_name        "btf_type_tag"
  :   DW_AT_const_value "tag1"
  :   DW_AT_type        ("int")
  :
  : DW_TAG_base_type
  :   DW_AT_name ("int")

  Note how now the =LLVM_annotation= has the annotated type linked by
  =DW_AT_type=, and acts itself as a type linked from =DW_TAG_pointer_type=.
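
  To give an idea of what a consumer would have to do under this scheme,
  here is a minimal Python sketch that peels annotation nodes off a
  =DW_AT_type= chain, much like qualifiers are peeled off.  DIEs are
  modelled as dictionaries and =strip_annotations= is a hypothetical helper,
  not an existing API.  A missing =DW_AT_type= is taken to mean =void=,
  following what GCC and LLVM emit.

```python
def strip_annotations(die):
    """Peel DW_TAG_LLVM_annotation links off a type chain.

    Returns (annotations, underlying), where annotations is a list of
    (name, value) pairs and underlying is None when the chain ends in void.
    """
    annotations = []
    while die is not None and die["tag"] == "DW_TAG_LLVM_annotation":
        annotations.append((die["DW_AT_name"], die["DW_AT_const_value"]))
        die = die.get("DW_AT_type")  # absent or None: annotated void
    return annotations, die


int_die = {"tag": "DW_TAG_base_type", "DW_AT_name": "int"}
tagged_int = {
    "tag": "DW_TAG_LLVM_annotation",
    "DW_AT_name": "btf_type_tag",
    "DW_AT_const_value": "tag1",
    "DW_AT_type": int_die,
}
pointer = {"tag": "DW_TAG_pointer_type", "DW_AT_type": tagged_int}

annotations, underlying = strip_annotations(pointer["DW_AT_type"])
print(annotations, underlying["DW_AT_name"])
# [('btf_type_tag', 'tag1')] int

# Annotated void: the annotation node simply has no DW_AT_type.
tagged_void = {
    "tag": "DW_TAG_LLVM_annotation",
    "DW_AT_name": "btf_type_tag",
    "DW_AT_const_value": "tag1",
}
print(strip_annotations(tagged_void))
# ([('btf_type_tag', 'tag1')], None)
```

  A consumer lacking the equivalent of this loop would stop at the
  annotation node and fail to find the underlying type.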

  Advantages of this approach:

  - It makes sense for annotations to be implemented as qualifiers, because
    they actually qualify a target type.

  - This approach is totally flexible and makes it possible to annotate any
    type, qualified or not, pointed-to or not.

  - The resulting DWARF looks like the BTF.

  - It can handle annotated `void', as currently generated by GCC and
    clang/LLVM:

    :   DW_TAG_LLVM_annotation
    :     DW_AT_name        "btf_type_tag"
    :     DW_AT_const_value "tag1"
    :     DW_AT_type NULL

  Disadvantages of this approach:

  - Implementing this is more elaborate, and it requires DWARF consumers to
    understand this new DIE type in order to follow the type chains in the
    tree: =DW_TAG_LLVM_annotation= should now be expected in any =DW_AT_type=
    reference.

  - This breaks DWARF, making it very difficult to implement as a compiler
    extension, and will likely require making it part of DWARF.

  - This is not backwards compatible with what clang currently generates.

* Solution 2: annotations as children DIEs

  This approach involves keeping the =DW_TAG_LLVM_annotation= DIE, with the
  same internal structure it has now, but associating it with the type DIE
  that is its parent.  (Note this is not the same as being linked by a
  =DW_AT_type= attribute like in Solution 1.)

  This means that this DWARF tree:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "tag1"

  Denotes an annotation that applies to the type =int*=, not the pointee type
  =int=.
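
  The change of meaning can be made concrete with a short Python sketch
  that interprets the same tree under both conventions.  The =spell= and
  =annotated_type= helpers and the convention labels ="current"= and
  ="solution2"= are hypothetical, chosen only for this illustration.

```python
def spell(die):
    """Spell a toy type DIE as C source; None stands for void."""
    if die is None:
        return "void"
    if die["tag"] == "DW_TAG_pointer_type":
        return spell(die.get("DW_AT_type")) + " *"
    return die["DW_AT_name"]


def annotated_type(pointer, convention):
    """Return (type spelling, tags) that the child annotations apply to."""
    tags = [
        child["DW_AT_const_value"]
        for child in pointer.get("children", [])
        if child["tag"] == "DW_TAG_LLVM_annotation"
    ]
    if convention == "current":
        # What clang emits today: children describe the pointee.
        return spell(pointer.get("DW_AT_type")), tags
    if convention == "solution2":
        # Solution 2: children describe their parent DIE.
        return spell(pointer), tags
    raise ValueError(f"unknown convention: {convention}")


pointer = {
    "tag": "DW_TAG_pointer_type",
    "DW_AT_type": {"tag": "DW_TAG_base_type", "DW_AT_name": "int"},
    "children": [{
        "tag": "DW_TAG_LLVM_annotation",
        "DW_AT_name": "btf_type_tag",
        "DW_AT_const_value": "tag1",
    }],
}

print(annotated_type(pointer, "current"))    # ('int', ['tag1'])
print(annotated_type(pointer, "solution2"))  # ('int *', ['tag1'])
```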

  Advantages of this approach:

  - This approach makes it possible to annotate any type, qualified or not,
    pointed-to or not.

  - This can easily be implemented as a compiler extension, because existing
    DWARF consumers will happily ignore the new child DIEs in case they don't
    support them;  the type chains in the tree remain the same.

  - Easy to implement in GCC.

  Disadvantages of this approach:

  - This may result in an increased number of type nodes in the tree.  For
    example, we may have a tagged =int*= and a non-tagged =int*=, which now
    will have to be implemented using two different DIEs.
   
  - This is not backwards-compatible with what clang currently generates, in
    the case of pointer types.

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_TAG_LLVM_annotation
    :     DW_AT_name        "btf_type_tag"
    :     DW_AT_const_value "tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.

* Solution 3a: annotations as set of attributes

  Another possible solution is to extend DWARF with a pair of new
  attributes, =DW_AT_annotation_tag= and =DW_AT_annotation_value=.

  Annotated types will have these attributes defined.  Example:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_AT_annotation_tag   "btf_type_tag"
  :   DW_AT_annotation_value "tag1"

  Note that in this example the tag applies to the pointer type, not the
  pointee, i.e. to =int*=.
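
  A consumer reading this encoding could look roughly like the Python sketch
  below, with DIEs modelled as dictionaries.  =read_annotation= is a
  hypothetical helper, and rejecting a DIE that carries only one attribute
  of the pair is an assumption made here; what a real consumer should do in
  that case would need to be decided.

```python
def read_annotation(die):
    """Return (tag, value) for an annotated DIE, or None if unannotated.

    Raises ValueError when only one attribute of the pair is present.
    """
    tag = die.get("DW_AT_annotation_tag")
    value = die.get("DW_AT_annotation_value")
    if tag is None and value is None:
        return None
    if tag is None or value is None:
        raise ValueError("incomplete annotation pair")
    return tag, value


int_die = {"tag": "DW_TAG_base_type", "DW_AT_name": "int"}
tagged_ptr = {
    "tag": "DW_TAG_pointer_type",
    "DW_AT_type": int_die,
    "DW_AT_annotation_tag": "btf_type_tag",
    "DW_AT_annotation_value": "tag1",
}

print(read_annotation(tagged_ptr))  # ('btf_type_tag', 'tag1')
print(read_annotation(int_die))     # None

# A consumer unaware of the new attributes still follows the type chain.
print(tagged_ptr["DW_AT_type"]["DW_AT_name"])  # int
```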

  Advantages of this approach:

  - This can easily be implemented as a compiler extension, because existing
    DWARF consumers will happily ignore the new attributes in case they don't
    support them;  the type chains in the tree remain the same.

  - This is backwards compatible with what clang currently generates.

  - Easy to implement in GCC.
   
  Disadvantages of this approach:

  - This may result in an increased number of type nodes in the tree.  For
    example, we may have a tagged =int*= and a non-tagged =int*=, which now
    will have to be implemented using two different DIEs.

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_AT_annotation_tag   "btf_type_tag"
    :   DW_AT_annotation_value "tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.
   
* Solution 3b: annotations as single "structured" attributes

  This is like 3a, but using a single attribute =DW_AT_annotation= instead of
  two, and encoding the tag name and the tag value in the string value using
  some convention.

  For example:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_AT_annotation "btf_type_tag tag1"

  Here the tag name is "btf_type_tag" and the tag value is "tag1", using
  the convention that a whitespace character separates them.
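
  Decoding such a string could be as simple as the Python sketch below.
  =parse_annotation= is a hypothetical helper, and splitting only at the
  first run of whitespace (so that the value may itself contain spaces) is
  an assumption; the actual convention would have to be specified.

```python
def parse_annotation(text):
    """Split 'NAME VALUE' at the first run of whitespace."""
    parts = text.split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"malformed annotation: {text!r}")
    return parts[0], parts[1]


print(parse_annotation("btf_type_tag tag1"))
# ('btf_type_tag', 'tag1')

# Everything after the first separator is kept as the value.
print(parse_annotation("btf_type_tag some value"))
# ('btf_type_tag', 'some value')
```

  A string with no separator, such as "btf_type_tag" alone, is rejected with
  a =ValueError= in this sketch.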

  Advantages over 3a:

  - Using a single attribute is more robust, since it eliminates the possible
    situation of a node having =DW_AT_annotation_tag= and not
    =DW_AT_annotation_value=.

  - It is easier to extend it, since the string stored in the
    =DW_AT_annotation= attribute may be made as complex as desired.  Better
    than adding more =DW_AT_annotation_FOO= attributes.

  - This is backwards compatible with what clang currently generates.

  - Easy to implement in GCC.
   
  Disadvantages over 3a:

  - This requires defining conventions specifying the structure of the string
    stored in the attribute.

  - This has the danger of overzealous design: "let's store a JSON tree in
    =DW_AT_annotation= for future extensions instead of continuing to bother
    with DWARF".

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_AT_annotation  "btf_type_tag tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.
