dwarves.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding
@ 2023-05-24  0:18 Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 1/6] dwarves.h: expose ptr_table interface Eduard Zingerman
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-24  0:18 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal, Eduard Zingerman

In recent discussion in BPF mailing list ([1], look for Solution #2)
participants agreed to add a new DWARF representation for
"btf_type_tag" annotations.

Existing representation is DW_TAG_LLVM_annotation object attached as a
child to a DW_TAG_pointer_type. It means that "btf_type_tag"
annotation is attached to a pointee type.

New representation is DW_TAG_LLVM_annotation object attached as a
child to *any* type. It means that "btf_type_tag" annotation is
attached to the parent type.

For example, for the following C code:

    int __attribute__((btf_type_tag("tag1"))) *g;

CLANG generates the following DWARF (1):

    0x0000001e:   DW_TAG_variable
                    DW_AT_name      ("g")
                    DW_AT_type      (0x00000029 "int *")

    0x00000029:   DW_TAG_pointer_type
                    DW_AT_type      (0x00000032 "int")

    0x0000002e:     DW_TAG_LLVM_annotation
                      DW_AT_name    ("btf_type_tag")
                      DW_AT_const_value     ("tag1")

    0x00000032:   DW_TAG_base_type
                    DW_AT_name      ("int")

However, using the new representation scheme DWARF looks as follows (2):

    0x0000001e:   DW_TAG_variable
                    DW_AT_name  ("g")
                    DW_AT_type  (0x00000029 "int *")
    
    0x00000029:   DW_TAG_pointer_type
                    DW_AT_type  (0x00000032 "int")
    
    0x00000032:   DW_TAG_base_type
                    DW_AT_name  ("int")
                    DW_AT_encoding  (DW_ATE_signed)
                    DW_AT_byte_size (0x04)
    
    0x00000036:     DW_TAG_LLVM_annotation
                      DW_AT_name    ("btf:type_tag")
                      DW_AT_const_value ("tag1")
    
Note that in (1) DW_TAG_LLVM_annotation is a child of
DW_TAG_pointer_type, but in (2) it is a child of DW_TAG_base_type.

This patch adds logic necessary to handle such annotations in the
pahole tool. Examples like below are supported:

  #define __tag(val) __attribute__((btf_type_tag("__" #val)))

  struct      alpha {};
  union       bravo {};
  enum        charlie { X };
  typedef int delta;

  struct echo {
    int          * __tag(a)  a;
    int            __tag(b) *b;
    int            __tag(c)  c;
    void           __tag(d) *d;
    void           __tag(e) *e;
    struct alpha   __tag(f)  f;
    union  bravo   __tag(g)  g;
    enum   charlie __tag(h)  h;
    delta          __tag(i)  i;
    int __tag(j_result) (__tag(j) *j)(int __tag(j_param));
  } g;
  
Implementation details
----------------------

Although this was not discussed in the mailing list, the proposed
implementation acts in a following way (for compatibility reasons):
- both forms could be present in the debug info;
- if any annotations corresponding to the new form are present in the
  debug info, annotations corresponding to the old form are ignored.

The v3 of this patch-set includes changes suggested in discussion [2],
which simplifies the implementation. Here is an overview for each
patch in the patch-set:

1. "dwarves.h: expose ptr_table interface"
   Makes struct ptr_table related functions accessible from
   dwarves.h header.

2. "dwarf_loader: Track unspecified types in a separate list"
   [1] suggests that new type tags encoding for the following case:

     void __attribute__((btf_type_tag("tag1"))) *g;

   Creates a DIE of kind DW_TAG_unspecified_type with name "void"
   and attaches DW_TAG_LLVM_annotation children to it.

   The later patches would rely on identity of this unspecified type
   instance for recoding, thus this patch introduces special tag type
   to represent DW_TAG_unspecified_type DIEs in the object model.

3. "dwarf_loader: handle btf_type_tag w/o special pointer type"
   Changes the way type tags are encoded in the dwarves.h object model:
   - special pointer type btf_type_tag_ptr_type is removed;
   - field struct btf_type_tag_type::node is removed, instances of
     btf_type_tag_type no longer form a list, instead links between
     types are tracked via btf_type_tag_type::tag.type fields, as it
     is done for derived types (e.g. DW_TAG_const_type).

4. "dwarf_loader: support btf:type_tag DW_TAG_LLVM_annotation"
   Adds support for new type tags encoding, this includes:
   - Changes to visit child DIEs of the following types:
     - DW_TAG_unspecified_type
     - DW_TAG_base_type
     - DW_TAG_typedef
     - DW_TAG_array_type
     - DW_TAG_subroutine_type
     - DW_TAG_enumeration_type
     - DW_TAG_structure_type
     In order to collect DW_TAG_LLVM_annotation's.
   - Changes in recode phase.

5. "dwarf_loader: move type tags before CVR qualifiers when necessary"
   
   Kernel expects type tags to precede CVR qualifiers in BTF.
   However, DWARF encoding format agreed with GCC team in [3.2]
   does not allow to attach DW_TAG_LLVM_annotation tags to qualifiers.
   
   Hence, this patch, which adds a post-processing step that converts
   type chains like CONST -> VOLATILE -> TYPE_TAG -> ...
   to TYPE_TAG -> CONST -> VOLATILE -> ...

6. "btf_encoder: skip type tags for VAR entry types"
   
   Kernel does not expect VAR entries to have types starting from
   BTF_TYPE_TAG. Before introduction of support for 'btf:type_tag'
   such situations were not possible, as TYPE_TAG entries were always
   preceded by PTR entries.
   
   This patch changes BTF VAR generation code to skip any BTF_TYPE_TAG
   entries for VAR type.

Corresponding CLANG changes are tracked in [3], refer to [3.2] for
some encoding examples.

Testing
-------

To verify the changes I used the following:
- Tools:
  - "LLVM-main"   :: LLVM at revision [4];
  - "LLVM-new"    :: LLVM at revision [4] with patches [3] applied;
  - "gcc"         :: GCC version 11.3 (no support for btf_type_tag annotations);
  - "pahole-next" :: dwarves at revision [5];
  - "pahole-new"  :: dwarves at revision [5] + this patch-set,
  - "kernel"      :: Linux Kernel bpf-next branch at revision [6] with CI patch [7].
- test cases:
  - kernel build;
  - kernel BPF test cases build, BPF tests execution
    (test_verifier, test_progs, test_progs-no_alu32, test_maps);
  - btfdiff script (suggested by Arnaldo, [8]).
- tool combinations (kernel compiler / clang for BPF tests / pahole version):
  - LLVM-main / LLVM-main / pahole-new
    - kernel build : ok
    - bpf tests    : ok
    - btfdiff      : ok (modulo diff #1, see below)
  - gcc       / LLVM-main / pahole-new
    - kernel build : ok
    - bpf tests    : ok
    - btfdiff      : ok but dwarf dump sometimes segfaults
  - LLVM-new  / LLVM-new  / pahole-next
    - kernel build : ok (modulo warn #1, see below)
    - bpf tests    : ok
    - btfdiff      : ok (modulo diff #1, see below)
  - LLVM-new  / LLVM-new  / pahole-new
    - kernel build : ok
    - bpf tests    : ok
    - btfdiff      : ok (modulo diff #1, see below)
  - gcc       / LLVM-new  / pahole-new
    - kernel build : ok
    - bpf tests    : ok
    - btfdiff      : ok

Diff #1: Difference in flexible array printing, several occurrences as below:

  @ -10531,7 +10531,7 @ struct bpf_cand_cache {
          struct {
                  const struct btf  * btf;                 /*    16     8 */
                  u32                id;                   /*    24     4 */
  -       } cands[0]; /*    16     0 */
  +       } cands[]; /*    16     0 */
   
Warn #1: pahole-next complains about unexpected child tags generated
         by clang, e.g.:

  die__create_new_tag: unspecified_type WITH children!
  die__create_new_base_type: DW_TAG_base_type WITH children!

Changelog
---------

V2 -> V3:
- Suggestion [2] from Arnaldo to represent type tags as separate
  derived types is applied. As a consequence, V3 rewrites V2 almost
  completely.
- "dwarf_loader: move type tags before CVR qualifiers when necessary"
  is added after discussion in [3.2].
- "btf_encoder: skip type tags for VAR entry types" 
  is added after additional testing (I'm not sure why this was not an
  issue for V2).

V1 -> V2:
- The patch is split in 5 parts to (hopefully) simplify the review:
  - #1, #2: two simple patches for fprintf and btf_loader to fix printing
    issue for types annotated by BTF type tags;
  - #3:  merges `struct llvm_annotation` and `struct btf_type_tag_type`
    as a preparatory step;
  - #4: introduces `struct unspecified_type` as a preparatory step;
  - #5: main logic for `btf:type_tag` support, this once can't
    be split further w/o parts losing some functionality for kernel
    build and/or bpf tests.
- `reallocarray()` in `push_btf_type_tag_mapping()` is replaced by
  `realloc()` (suggested by Alan);
- The sequence `free(dcu->hash_tags); free(dcu->hash_types);` added in
  V1 is removed from `dwarf_cu__delete()`. It was a fix for some
  valgrind errors reported for `pahole -F dwarf`, but this is
  unrelated and the fix is incomplete.

Links & revisions
-----------------

[1] Mailing list discussion regarding `btf:type_tag`
    Various approaches are discussed, Solution #2 is accepted
    https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/
[2] Suggestion to treat DW_TAG_llvm_annotation as a derived tag
    https://lore.kernel.org/bpf/ZCVygOn0+zKFEqW2@kernel.org/
[3] LLVM changes to generate btf:type_tag, revisions stack:
    [3.1] https://reviews.llvm.org/D143966
    [3.2] https://reviews.llvm.org/D143967 - this one has a number of examples
                                             in the description.
    [3.3] https://reviews.llvm.org/D145891
[4] LLVM revision
    commit ec77d1f3d9fc ("[lldb] Simplify predicates of find_if in BroadcastManager")
[5] Dwarves revision:
    commit 31bc0d741057 ("dwarf_loader: DW_TAG_subroutine_type may have a DW_AT_byte_size")
[6] Kernel revision:
    commit df21139441b0 ("tracing: fprobe: Initialize ret valiable to fix smatch error")
[7] Kernel CI patch:
    https://github.com/kernel-patches/vmtest/commit/2d732ac4e06631d11f4326989eea28d695efb7f5
[8] Suggestion to use btfdiff
    https://lore.kernel.org/dwarves/ZAKpZGSHTvsS4r8E@kernel.org/T/#mddbfe661e339485fb2b0e706b313
[V1] https://lore.kernel.org/dwarves/2232e368e55eb401bde45ce1b20fb710e379ae9c.camel@gmail.com/T/
[V2] https://lore.kernel.org/dwarves/20230314230417.1507266-1-eddyz87@gmail.com/

Eduard Zingerman (6):
  dwarves.h: expose ptr_table interface
  dwarf_loader: Track unspecified types in a separate list
  dwarf_loader: handle btf_type_tag w/o special pointer type
  dwarf_loader: support btf:type_tag DW_TAG_LLVM_annotation
  dwarf_loader: move type tags before CVR qualifiers when necessary
  btf_encoder: skip type tags for VAR entry types

 btf_encoder.c  |  30 +-
 dwarf_loader.c | 731 ++++++++++++++++++++++++++++++++++++++++---------
 dwarves.c      |   7 +-
 dwarves.h      |  34 +--
 4 files changed, 655 insertions(+), 147 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 dwarves 1/6] dwarves.h: expose ptr_table interface
  2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
@ 2023-05-24  0:18 ` Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 2/6] dwarf_loader: Track unspecified types in a separate list Eduard Zingerman
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-24  0:18 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal, Eduard Zingerman

Allow to use ptr_table__{init,exit,add} functions outside of dwarves.c.
This would be leveraged by subsequent patches.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 dwarves.c | 6 +++---
 dwarves.h | 4 ++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/dwarves.c b/dwarves.c
index 218367b..ed5c348 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -491,18 +491,18 @@ void cus__add(struct cus *cus, struct cu *cu)
 	cu__find_class_holes(cu);
 }
 
-static void ptr_table__init(struct ptr_table *pt)
+void ptr_table__init(struct ptr_table *pt)
 {
 	pt->entries = NULL;
 	pt->nr_entries = pt->allocated_entries = 0;
 }
 
-static void ptr_table__exit(struct ptr_table *pt)
+void ptr_table__exit(struct ptr_table *pt)
 {
 	zfree(&pt->entries);
 }
 
-static int ptr_table__add(struct ptr_table *pt, void *ptr, uint32_t *idxp)
+int ptr_table__add(struct ptr_table *pt, void *ptr, uint32_t *idxp)
 {
 	const uint32_t nr_entries = pt->nr_entries + 1;
 	const uint32_t rc = pt->nr_entries;
diff --git a/dwarves.h b/dwarves.h
index eb1a6df..54771d1 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -185,6 +185,10 @@ struct ptr_table {
 	uint32_t allocated_entries;
 };
 
+void ptr_table__init(struct ptr_table *pt);
+void ptr_table__exit(struct ptr_table *pt);
+int ptr_table__add(struct ptr_table *pt, void *ptr, uint32_t *idxp);
+
 struct function;
 struct tag;
 struct cu;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 dwarves 2/6] dwarf_loader: Track unspecified types in a separate list
  2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 1/6] dwarves.h: expose ptr_table interface Eduard Zingerman
@ 2023-05-24  0:18 ` Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 3/6] dwarf_loader: handle btf_type_tag w/o special pointer type Eduard Zingerman
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-24  0:18 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal, Eduard Zingerman

In recent discussion in BPF mailing list ([1]) participants agreed to
add a new DWARF representation for "btf_type_tag" annotations.
The agreed representation of void pointers uses unspecified types.
For example, consider the following C code:

    struct alpha {
      void __attribute__((btf_type_tag("__alpha_a"))) *a;
    } g;

And corresponding DWARF:

0x29:   DW_TAG_structure_type
          DW_AT_name      ("alpha")

0x2e:     DW_TAG_member
            DW_AT_name    ("a")
            DW_AT_type    (0x38 "void *")

0x38:   DW_TAG_pointer_type
          DW_AT_type      (0x41 "void")

0x41:   DW_TAG_unspecified_type
          DW_AT_name      ("void")

0x43:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__alpha_a")

This is a preparatory patch for new type tags representation support,
specifically it adds `struct unspecified_type` and a new `cu` field
`struct cu::unspecified_types`. These would be used in a subsequent
patch to recode type tags attached to DW_TAG_unspecified_type
as in the example above.

[1] Mailing list discussion regarding `btf:type_tag`
    Various approaches are discussed, Solution #2 is accepted
    https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 dwarf_loader.c | 36 ++++++++++++++++++++++++++++++++++--
 dwarves.c      |  1 +
 dwarves.h      | 17 +++++++++++++++++
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index ccf3194..dfed334 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -165,14 +165,19 @@ static struct dwarf_cu *dwarf_cu__new(struct cu *cu)
 	return dwarf_cu;
 }
 
+static void unspecified_type__delete(struct cu *cu, struct unspecified_type *utype);
+
 static void dwarf_cu__delete(struct cu *cu)
 {
 	if (cu == NULL || cu->priv == NULL)
 		return;
 
 	struct dwarf_cu *dcu = cu->priv;
+	struct list_head *pos, *n;
 
 	// dcu->hash_tags & dcu->hash_types are on cu->obstack
+	list_for_each_safe(pos, n, &cu->unspecified_types)
+		unspecified_type__delete(cu, container_of(pos, struct unspecified_type, node));
 	cu__free(cu, dcu);
 	cu->priv = NULL;
 }
@@ -1386,6 +1391,33 @@ static struct tag *die__create_new_tag(Dwarf_Die *die, struct cu *cu)
 	return tag;
 }
 
+static struct tag *die__create_new_unspecified_type(Dwarf_Die *die, struct cu *cu,
+						    struct conf_load *conf)
+{
+	struct unspecified_type *tag;
+
+	tag = tag__alloc_with_spec(cu, sizeof(struct unspecified_type));
+	if (tag == NULL)
+		return NULL;
+
+	tag__init(&tag->tag, cu, die);
+	INIT_LIST_HEAD(&tag->node);
+
+	tag->name = attr_string(die, DW_AT_name, conf);
+
+	list_add(&tag->node, &cu->unspecified_types);
+
+	return &tag->tag;
+}
+
+static void unspecified_type__delete(struct cu *cu, struct unspecified_type *utype)
+{
+	struct dwarf_tag *dtag = utype->tag.priv;
+
+	cu__free(cu, dtag);
+	cu__free(cu, utype);
+}
+
 static struct btf_type_tag_ptr_type *die__create_new_btf_type_tag_ptr_type(Dwarf_Die *die, struct cu *cu)
 {
 	struct btf_type_tag_ptr_type *tag;
@@ -2186,10 +2218,10 @@ static struct tag *__die__process_tag(Dwarf_Die *die, struct cu *cu,
 	case DW_TAG_volatile_type:
 	case DW_TAG_atomic_type:
 		tag = die__create_new_tag(die, cu);		break;
-	case DW_TAG_unspecified_type:
-		tag = die__create_new_tag(die, cu);		break;
 	case DW_TAG_pointer_type:
 		tag = die__create_new_pointer_tag(die, cu, conf);	break;
+	case DW_TAG_unspecified_type:
+		tag = die__create_new_unspecified_type(die, cu, conf); break;
 	case DW_TAG_ptr_to_member_type:
 		tag = die__create_new_ptr_to_member_type(die, cu); break;
 	case DW_TAG_enumeration_type:
diff --git a/dwarves.c b/dwarves.c
index ed5c348..f989250 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -681,6 +681,7 @@ struct cu *cu__new(const char *name, uint8_t addr_size,
 		cu->dfops	= NULL;
 		INIT_LIST_HEAD(&cu->tags);
 		INIT_LIST_HEAD(&cu->tool_list);
+		INIT_LIST_HEAD(&cu->unspecified_types);
 
 		cu->addr_size = addr_size;
 		cu->extra_dbg_info = 0;
diff --git a/dwarves.h b/dwarves.h
index 54771d1..509ffbc 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -246,6 +246,7 @@ struct cu {
 	struct list_head node;
 	struct list_head tags;
 	struct list_head tool_list;	/* To be used by tools such as ctracer */
+	struct list_head unspecified_types;
 	struct ptr_table types_table;
 	struct ptr_table functions_table;
 	struct ptr_table tags_table;
@@ -686,6 +687,22 @@ static inline struct btf_type_tag_type *tag__btf_type_tag(struct tag *tag)
 	return (struct btf_type_tag_type *)tag;
 }
 
+/** struct unspecified_type - representation of DW_TAG_unspecified_type.
+ *
+ *  @name   - DW_AT_name associated with this tag
+ *  @node   - a node for cu::unspecified_types list
+ */
+struct unspecified_type {
+	struct tag		tag;
+	const char		*name;
+	struct list_head	node;
+};
+
+static inline struct unspecified_type *tag__unspecified_type(struct tag *tag)
+{
+	return (struct unspecified_type *)tag;
+}
+
 /** struct namespace - base class for enums, structs, unions, typedefs, etc
  *
  * @tags - class_member, enumerators, etc
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 dwarves 3/6] dwarf_loader: handle btf_type_tag w/o special pointer type
  2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 1/6] dwarves.h: expose ptr_table interface Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 2/6] dwarf_loader: Track unspecified types in a separate list Eduard Zingerman
@ 2023-05-24  0:18 ` Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 4/6] dwarf_loader: support btf:type_tag DW_TAG_LLVM_annotation Eduard Zingerman
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-24  0:18 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal, Eduard Zingerman

Currently BTF type tags could only be associated with "pointee" types.
(Types pointed to by some pointer). This is an artifact of type tags
encoding in DWARF. For example, for the following C code:

  int __attribute__((type_tag("tag"))) *p;

Clang produces the following DWARF:

  0x51:   DW_TAG_pointer_type
            DW_AT_type  (0x4d "int")

  0x56:     DW_TAG_LLVM_annotation
              DW_AT_name        ("btf_type_tag")
              DW_AT_const_value ("tag")

The assumption is that such DW_TAG_LLVM_annotation entries are only
attached to DW_TAG_pointer_type entries and apply to the pointee type
(0x4d "int" in this case).

This is handled by dwarf_loader using a special pointer sub-type:
`struct btf_type_tag_ptr_type`, which adds a list of type tags to
a regular pointer and has additional processing on the recode phase.

However, recent discussion [1] agreed to introduce a new kind of type
tags encoding in DWARF, allowing to associate such tags with any
types, not only pointee types.

As a preparation step to handle such encoding this commit removes
`struct btf_type_tag_ptr_type`, instead information about a set of
type tags associated with a pointer is stored in a new
`struct dwarf_cu` field:

  struct dwarf_cu {
        ...
        struct type_tags_table type_tags;
        ...
  }

Creation of `struct btf_type_tag_ptr_type` is delayed until all CU
members are scanned. To avoid second scan the `type_tags` field is
used to accumulate information necessary for type tag creation.
For each DIE with type tag children this information includes:
- struct tag corresponding to this DIE
- list of type tag values attached to DIE

Which means that each record varies in size. To accommodate this
treat `struct dwarf_cu::type_tags` as a stream of records with the
following format:

  ... parent-pointer tag-value* NULL ...
         ^               ^       ^
         |               |       '----- terminator
      struct tag*      char*

This information is used on the recode phase to establish correct
`struct tag::type` links.

Delayed creation of type tag objects would be exploited in the
follow-up patch.

[1] Mailing list discussion regarding `btf:type_tag`
    Various approaches are discussed, Solution #2 is accepted
    https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 dwarf_loader.c | 350 ++++++++++++++++++++++++++++++++-----------------
 dwarves.h      |  19 ---
 2 files changed, 230 insertions(+), 139 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index dfed334..3670493 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -120,6 +120,8 @@ static void dwarf_tag__set_spec(struct dwarf_tag *dtag, dwarf_off_ref spec)
 struct dwarf_cu {
 	struct hlist_head *hash_tags;
 	struct hlist_head *hash_types;
+	/* See comment at add_btf_type_tag() */
+	struct ptr_table type_tags;
 	struct dwarf_tag *last_type_lookup;
 	struct cu *cu;
 	struct dwarf_cu *type_unit;
@@ -150,6 +152,7 @@ static int dwarf_cu__init(struct dwarf_cu *dcu, struct cu *cu)
 	dcu->type_unit = NULL;
 	// To avoid a per-lookup check against NULL in dwarf_cu__find_type_by_ref()
 	dcu->last_type_lookup = &sentinel_dtag;
+	ptr_table__init(&dcu->type_tags);
 	return 0;
 }
 
@@ -175,6 +178,7 @@ static void dwarf_cu__delete(struct cu *cu)
 	struct dwarf_cu *dcu = cu->priv;
 	struct list_head *pos, *n;
 
+	ptr_table__exit(&dcu->type_tags);
 	// dcu->hash_tags & dcu->hash_types are on cu->obstack
 	list_for_each_safe(pos, n, &cu->unspecified_types)
 		unspecified_type__delete(cu, container_of(pos, struct unspecified_type, node));
@@ -526,6 +530,12 @@ static void tag__init(struct tag *tag, struct cu *cu, Dwarf_Die *die)
 	INIT_LIST_HEAD(&tag->node);
 }
 
+static void tag__init_dummy(struct tag *tag, struct cu *cu, int tag_code)
+{
+	tag->tag = tag_code;
+	INIT_LIST_HEAD(&tag->node);
+}
+
 static struct tag *tag__new(Dwarf_Die *die, struct cu *cu)
 {
 	struct tag *tag = tag__alloc(cu, sizeof(*tag));
@@ -894,6 +904,11 @@ static int tag__recode_dwarf_bitfield(struct tag *tag, struct cu *cu, uint16_t b
 	return -ENOMEM;
 }
 
+static uint32_t tag__small_id(struct tag *tag)
+{
+	return ((struct dwarf_tag *)tag->priv)->small_id;
+}
+
 static int add_llvm_annotation(Dwarf_Die *die, int component_idx, struct conf_load *conf,
 			       struct list_head *head)
 {
@@ -939,6 +954,97 @@ static int add_child_llvm_annotations(Dwarf_Die *die, int component_idx,
 	return 0;
 }
 
+struct type_tags_writer {
+	struct dwarf_cu *dcu;
+	struct tag *parent;
+	bool started;
+};
+
+static void type_tags_writer__init(struct type_tags_writer *writer,
+				   struct dwarf_cu *dcu,
+				   struct tag *parent)
+{
+	writer->dcu = dcu;
+	writer->parent = parent;
+	writer->started = false;
+}
+
+/* It is necessary to delay type tag objects creation until all other
+ * CU members are scanned. To avoid a second scan the `type_tags`
+ * field is used to accumulate information necessary for type tag creation.
+ * For each DIE with type tag children this information includes:
+ * - struct tag corresponding to this DIE
+ * - list of type tag values attached to DIE
+ *
+ * Which means that each record varies in size. To accommodate this
+ * treat `writer->dcu->type_tags` as a stream of records with the
+ * following format:
+ *
+ *   ... parent-pointer tag-value* NULL ...
+ *          ^               ^       ^
+ *          |               |       '----- terminator
+ *       struct tag*      char*
+ */
+static int add_btf_type_tag(struct type_tags_writer *writer, Dwarf_Die *die,
+			    struct conf_load *conf)
+{
+	struct ptr_table *type_tags = &writer->dcu->type_tags;
+	const char *name, *value;
+	uint32_t ignored;
+	int ret = 0;
+
+	if (conf->skip_encoding_btf_type_tag)
+		return 0;
+
+	name = attr_string(die, DW_AT_name, conf);
+	if (strcmp(name, "btf_type_tag") != 0)
+		return 0;
+
+	value = attr_string(die, DW_AT_const_value, conf);
+
+	if (!writer->started) {
+		writer->started = true;
+		/* Terminate previous record */
+		if (type_tags->nr_entries != 0) {
+			ret = ptr_table__add(type_tags, NULL, &ignored);
+			if (ret)
+				return ret;
+		}
+		/* Start new record by saving parent pointer */
+		ret = ptr_table__add(type_tags, writer->parent, &ignored);
+		if (ret)
+			return ret;
+	}
+
+	return ptr_table__add(type_tags, (void *)value, &ignored);
+}
+
+/* Collect all type tag values attached to `parent` and save those in
+ * dwarf_cu::type_tags stream.
+ * See add_btf_type_tag() for stream encoding details.
+ * See dwarf_cu__recode_type_tags() for type tag processing details.
+ */
+static int add_child_btf_type_tags(Dwarf_Die *die, struct cu *cu,
+				   struct tag *parent, struct conf_load *conf)
+{
+	struct type_tags_writer ttw;
+	Dwarf_Die *cdie, child;
+	int ret;
+
+	if (!dwarf_haschildren(die) || dwarf_child(die, &child) != 0)
+		return 0;
+
+	type_tags_writer__init(&ttw, cu->priv, parent);
+	cdie = &child;
+	do {
+		ret = add_btf_type_tag(&ttw, cdie, conf);
+		if (ret)
+			return ret;
+	} while (dwarf_siblingof(cdie, cdie) == 0);
+
+	return 0;
+}
+
 int class_member__dwarf_recode_bitfield(struct class_member *member,
 					struct cu *cu)
 {
@@ -1418,89 +1524,21 @@ static void unspecified_type__delete(struct cu *cu, struct unspecified_type *uty
 	cu__free(cu, utype);
 }
 
-static struct btf_type_tag_ptr_type *die__create_new_btf_type_tag_ptr_type(Dwarf_Die *die, struct cu *cu)
+static struct tag *die__create_new_pointer_tag(Dwarf_Die *die, struct cu *cu,
+					       struct conf_load *conf)
 {
-	struct btf_type_tag_ptr_type *tag;
+	struct tag *tag;
 
-	tag  = tag__alloc_with_spec(cu, sizeof(struct btf_type_tag_ptr_type));
+	tag = tag__new(die, cu);
 	if (tag == NULL)
 		return NULL;
 
-	tag__init(&tag->tag, cu, die);
-	tag->tag.has_btf_type_tag = true;
-	INIT_LIST_HEAD(&tag->tags);
-	return tag;
-}
-
-static struct btf_type_tag_type *die__create_new_btf_type_tag_type(Dwarf_Die *die, struct cu *cu,
-								   struct conf_load *conf)
-{
-	struct btf_type_tag_type *tag;
-
-	tag  = tag__alloc_with_spec(cu, sizeof(struct btf_type_tag_type));
-	if (tag == NULL)
+	if (add_child_btf_type_tags(die, cu, tag, conf))
 		return NULL;
 
-	tag__init(&tag->tag, cu, die);
-	tag->value = attr_string(die, DW_AT_const_value, conf);
 	return tag;
 }
 
-static struct tag *die__create_new_pointer_tag(Dwarf_Die *die, struct cu *cu,
-					       struct conf_load *conf)
-{
-	struct btf_type_tag_ptr_type *tag = NULL;
-	struct btf_type_tag_type *annot;
-	Dwarf_Die *cdie, child;
-	const char *name;
-	uint32_t id;
-
-	/* If no child tags or skipping btf_type_tag encoding, just create a new tag
-	 * and return
-	 */
-	if (!dwarf_haschildren(die) || dwarf_child(die, &child) != 0 ||
-	    conf->skip_encoding_btf_type_tag)
-		return tag__new(die, cu);
-
-	/* Otherwise, check DW_TAG_LLVM_annotation child tags */
-	cdie = &child;
-	do {
-		if (dwarf_tag(cdie) != DW_TAG_LLVM_annotation)
-			continue;
-
-		/* Only check btf_type_tag annotations */
-		name = attr_string(cdie, DW_AT_name, conf);
-		if (strcmp(name, "btf_type_tag") != 0)
-			continue;
-
-		if (tag == NULL) {
-			/* Create a btf_type_tag_ptr type. */
-			tag = die__create_new_btf_type_tag_ptr_type(die, cu);
-			if (!tag)
-				return NULL;
-		}
-
-		/* Create a btf_type_tag type for this annotation. */
-		annot = die__create_new_btf_type_tag_type(cdie, cu, conf);
-		if (annot == NULL)
-			return NULL;
-
-		if (cu__table_add_tag(cu, &annot->tag, &id) < 0)
-			return NULL;
-
-		struct dwarf_tag *dtag = annot->tag.priv;
-		dtag->small_id = id;
-		cu__hash(cu, &annot->tag);
-
-		/* For a list of DW_TAG_LLVM_annotation like tag1 -> tag2 -> tag3,
-		 * the tag->tags contains tag3 -> tag2 -> tag1.
-		 */
-		list_add(&annot->node, &tag->tags);
-	} while (dwarf_siblingof(cdie, cdie) == 0);
-
-	return tag ? &tag->tag : tag__new(die, cu);
-}
-
 static struct tag *die__create_new_ptr_to_member_type(Dwarf_Die *die,
 						      struct cu *cu)
 {
@@ -2551,45 +2589,6 @@ static void lexblock__recode_dwarf_types(struct lexblock *tag, struct cu *cu)
 	}
 }
 
-static void dwarf_cu__recode_btf_type_tag_ptr(struct btf_type_tag_ptr_type *tag,
-					      uint32_t pointee_type)
-{
-	struct btf_type_tag_type *annot;
-	struct dwarf_tag *annot_dtag;
-	struct tag *prev_tag;
-
-	/* Given source like
-	 *   int tag1 tag2 tag3 *p;
-	 * the tag->tags contains tag3 -> tag2 -> tag1, the final type chain looks like:
-	 *   pointer -> tag3 -> tag2 -> tag1 -> pointee
-	 *
-	 * Basically it means
-	 *   - '*' applies to "int tag1 tag2 tag3"
-	 *   - tag3 applies to "int tag1 tag2"
-	 *   - tag2 applies to "int tag1"
-	 *   - tag1 applies to "int"
-	 *
-	 * This also makes final source code (format c) easier as we can do
-	 *   emit for "tag3 -> tag2 -> tag1 -> int"
-	 *   emit '*'
-	 *
-	 * For 'tag3 -> tag2 -> tag1 -> int":
-	 *   emit for "tag2 -> tag1 -> int"
-	 *   emit tag3
-	 *
-	 * Eventually we can get the source code like
-	 *   int tag1 tag2 tag3 *p;
-	 * and this matches the user/kernel code.
-	 */
-	prev_tag = &tag->tag;
-	list_for_each_entry(annot, &tag->tags, node) {
-		annot_dtag = annot->tag.priv;
-		prev_tag->type = annot_dtag->small_id;
-		prev_tag = &annot->tag;
-	}
-	prev_tag->type = pointee_type;
-}
-
 static int tag__recode_dwarf_type(struct tag *tag, struct cu *cu)
 {
 	struct dwarf_tag *dtag = tag->priv;
@@ -2681,6 +2680,10 @@ static int tag__recode_dwarf_type(struct tag *tag, struct cu *cu)
 		if (dtype != NULL)
 			goto out;
 		goto find_type;
+
+	case DW_TAG_LLVM_annotation:
+		return 0;
+
 	case DW_TAG_variable: {
 		struct variable *var = tag__variable(tag);
 
@@ -2699,10 +2702,7 @@ static int tag__recode_dwarf_type(struct tag *tag, struct cu *cu)
 	}
 
 	if (dtag->type.off == 0) {
-		if (tag->tag != DW_TAG_pointer_type || !tag->has_btf_type_tag)
-			tag->type = 0; /* void */
-		else
-			dwarf_cu__recode_btf_type_tag_ptr(tag__btf_type_tag_ptr(tag), 0);
+		tag->type = 0; /* void */
 		return 0;
 	}
 
@@ -2714,10 +2714,7 @@ check_type:
 		return 0;
 	}
 out:
-	if (tag->tag != DW_TAG_pointer_type || !tag->has_btf_type_tag)
-		tag->type = dtype->small_id;
-	else
-		dwarf_cu__recode_btf_type_tag_ptr(tag__btf_type_tag_ptr(tag), dtype->small_id);
+	tag->type = dtype->small_id;
 
 	return 0;
 }
@@ -2817,12 +2814,125 @@ static int cu__recode_dwarf_types_table(struct cu *cu,
 	return 0;
 }
 
+static struct btf_type_tag_type *new_btf_type_tag_type(struct cu *cu,
+						       const char *value)
+{
+	struct btf_type_tag_type *tag;
+
+	tag  = tag__alloc_with_spec(cu, sizeof(struct btf_type_tag_type));
+	if (tag == NULL)
+		return NULL;
+
+	tag__init_dummy(&tag->tag, cu, DW_TAG_LLVM_annotation);
+	tag->value = value;
+	return tag;
+}
+
+/* Consider the following C code:
+ *
+ *   int __tag1 __tag2 __tag3 *p;
+ *
+ * According to C types "parsing" rules such definitions are
+ * read from right to left:
+ *  - '*' applies to "int tag1 tag2 tag3"
+ *  - tag3 applies to "int tag1 tag2"
+ *  - tag2 applies to "int tag1"
+ *  - tag1 applies to "int"
+ *
+ * Consequently, in BTF it should be represented as follows:
+ *
+ *   '*' -> tag3 -> tag2 -> tag1 -> int
+ *
+ * We want to reflect the same structure using dwarves structures.
+ * Clang generates the following DWARF:
+ *
+ *   DW_TAG_pointer_type
+ *     DW_AT_type:            "int"
+ *
+ *     DW_TAG_LLVM_annotation
+ *       DW_AT_name:          "btf_type_tag"
+ *       DW_AT_const_value    "tag1"
+ *
+ *     DW_TAG_LLVM_annotation
+ *       DW_AT_name           "btf_type_tag"
+ *       DW_AT_const_value    "tag2"
+ *
+ *     DW_TAG_LLVM_annotation
+ *       DW_AT_name           "btf_type_tag"
+ *       DW_AT_const_value    "tag3"
+ *
+ * Here we create one instance of `struct btf_type_tag_type` for each
+ * tag and link those in reverse order:
+ *
+ *   tag3 -> tag2 -> tag1
+ *
+ * And complete the chain as follows for each type tag group:
+ *
+ *   '*' -> tag3 -> tag2 -> tag1 -> int
+ */
+static int dwarf_cu__recode_type_tags(struct cu *cu)
+{
+	struct dwarf_cu *dcu = cu->priv;
+	void **entries = dcu->type_tags.entries;
+	unsigned int N = dcu->type_tags.nr_entries;
+	unsigned int i = 0;
+
+	/* See add_btf_type_tag() for record format description */
+	while (i < N) {
+		struct tag *parent = entries[i++];
+		struct btf_type_tag_type *first = NULL;
+		struct btf_type_tag_type *last = NULL;
+		struct btf_type_tag_type *prev = NULL;
+
+		while (i < N) {
+			const char *value = entries[i++];
+			uint32_t id;
+
+			if (value == NULL)
+				break;
+
+			last = new_btf_type_tag_type(cu, value);
+			if (last == NULL)
+				return -ENOMEM;
+
+			if (first == NULL)
+				first = last;
+
+			if (cu__table_add_tag(cu, &last->tag, &id) < 0)
+				return -ENOMEM;
+
+			struct dwarf_tag *dtag = last->tag.priv;
+			dtag->small_id = id;
+			/* Link `prev` and `last` in reverse order */
+			if (prev)
+				last->tag.type = tag__small_id(&prev->tag);
+			prev = last;
+		}
+
+		if (first == NULL)
+			continue;
+
+		/* Type tags are recoded *after* main recode phase,
+		 * so parent->type is valid.
+		 *
+		 * parent   last           first  parent->type
+		 *    |      |               |       |
+		 *   '*' -> tag3 -> tag2 -> tag1 -> int
+		 */
+		first->tag.type = parent->type;
+		parent->type = tag__small_id(&last->tag);
+	}
+
+	return 0;
+}
+
 static int cu__recode_dwarf_types(struct cu *cu)
 {
 	if (cu__recode_dwarf_types_table(cu, &cu->types_table, 1) ||
 	    cu__recode_dwarf_types_table(cu, &cu->tags_table, 0) ||
 	    cu__recode_dwarf_types_table(cu, &cu->functions_table, 0))
 		return -1;
+	dwarf_cu__recode_type_tags(cu);
 	return 0;
 }
 
diff --git a/dwarves.h b/dwarves.h
index 509ffbc..22b0a40 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -446,7 +446,6 @@ struct tag {
 	uint16_t	 tag;
 	bool		 visited;
 	bool		 top_level;
-	bool		 has_btf_type_tag;
 	uint16_t	 recursivity_level;
 	void		 *priv;
 };
@@ -658,30 +657,12 @@ struct llvm_annotation {
  *
  * @tag   - DW_TAG_LLVM_annotation tag
  * @value - btf_type_tag value string
- * @node  - list_head node
  */
 struct btf_type_tag_type {
 	struct tag		tag;
 	const char		*value;
-	struct list_head	node;
 };
 
-/** The struct btf_type_tag_ptr_type - type containing both pointer type and
- *  its btf_type_tag annotations
- *
- * @tag  - pointer type tag
- * @tags - btf_type_tag annotations for the pointer type
- */
-struct btf_type_tag_ptr_type {
-	struct tag		tag;
-	struct list_head 	tags;
-};
-
-static inline struct btf_type_tag_ptr_type *tag__btf_type_tag_ptr(struct tag *tag)
-{
-	return (struct btf_type_tag_ptr_type *)tag;
-}
-
 static inline struct btf_type_tag_type *tag__btf_type_tag(struct tag *tag)
 {
 	return (struct btf_type_tag_type *)tag;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 dwarves 4/6] dwarf_loader: support btf:type_tag DW_TAG_LLVM_annotation
  2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
                   ` (2 preceding siblings ...)
  2023-05-24  0:18 ` [PATCH v3 dwarves 3/6] dwarf_loader: handle btf_type_tag w/o special pointer type Eduard Zingerman
@ 2023-05-24  0:18 ` Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 5/6] dwarf_loader: move type tags before CVR qualifiers when necessary Eduard Zingerman
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-24  0:18 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal, Eduard Zingerman

"btf:type_tag" is an DW_TAG_LLVM_annotation object that encodes
btf_type_tag attributes in DWARF. Contrary to existing "btf_type_tag"
it allows to associate such attributes with non-pointer types.
When "btf:type_tag" is attached to a type it applies to this type.

For example, the following C code:

    struct echo {
      int __attribute__((btf_type_tag("__c"))) c;
    }

Produces the following DWARF:

0x29:   DW_TAG_structure_type
          DW_AT_name      ("echo")

0x40:     DW_TAG_member
            DW_AT_name    ("c")
            DW_AT_type    (0x8c "int")

0x8c:   DW_TAG_base_type
          DW_AT_name      ("int")
          DW_AT_encoding  (DW_ATE_signed)
          DW_AT_byte_size (0x04)

0x90:     DW_TAG_LLVM_annotation
            DW_AT_name    ("btf:type_tag")
            DW_AT_const_value     ("__c")

Meaning that type 0x8c is an `int` with type tag `__c`.
Corresponding BTF looks as follows:

[1] STRUCT 'echo'
        ...
        'c' type_id=8 bits_offset=128
[4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[8] TYPE_TAG '__c' type_id=4

This commit adds support for DW_TAG_LLVM_annotation "btf:type_tag"
attached to the following entities:
- base types;
- arrays;
- pointers;
- structs
- unions;
- enums;
- typedefs.

To allow backwards compatibility and avoid additional invocation
options, implementation acts in a following way:
- both `btf_type_tag` and `btf:type_tag` could be present in the
  debug info;
- if `btf:type_tag` annotations are present in the debug info,
  `btf_type_tag` annotations are ignored.

Code changes could be summarized as follows:
- add `enum type_tag_kind` with the following values:
  - TYPE_TAG_SELF for `btf:type_tag`
  - TYPE_TAG_POINTEE for `btf_type_tag`;
- add field `struct dwarf_cu::effective_type_tag_kind` to decide which
  type tag kind should be used for CU;
- change `dwarf_cu::type_tags` records format to convey tag kind;
- add calls to add_btf_type_tag() / add_child_btf_type_tags() for
  die__create_new_*** functions corresponding to types listed above;
- change dwarf_cu__recode_type_tags() to recode TYPE_TAG_SELF type
  tags *before* main recode phase, so `struct dwarf_cu::hash_types`
  changes would be visible during main phase.

See also:
[1] Mailing list discussion regarding `btf:type_tag`
    Various approaches are discussed, Solution #2 is accepted
    https://lore.kernel.org/bpf/87r0w9jjoq.fsf@oracle.com/

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 dwarf_loader.c | 195 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 160 insertions(+), 35 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 3670493..2b50322 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -117,6 +117,29 @@ static void dwarf_tag__set_spec(struct dwarf_tag *dtag, dwarf_off_ref spec)
 	*(dwarf_off_ref *)(dtag + 1) = spec;
 }
 
+enum type_tag_kind {
+	/* Corresponds to type tags of form:
+	 *
+	 *     DW_TAG_LLVM_annotation
+	 *       DW_AT_name	("btf:type_tag")
+	 *       DW_AT_const_value	("...")
+	 *
+	 * Such entries could be subordinate to any DWARF type.
+	 * The tag applies to the parent type.
+	 */
+	TYPE_TAG_SELF = 1u,
+	/* Corresponds to type tags of form:
+	 *
+	 *     DW_TAG_LLVM_annotation
+	 *       DW_AT_name	("btf_type_tag")
+	 *       DW_AT_const_value	("...")
+	 *
+	 * Such entries are subordinate to DW_TAG_pointer_type type only.
+	 * The tag applies to DW_AT_type of the pointer.
+	 */
+	TYPE_TAG_POINTEE = 2u
+};
+
 struct dwarf_cu {
 	struct hlist_head *hash_tags;
 	struct hlist_head *hash_types;
@@ -125,6 +148,7 @@ struct dwarf_cu {
 	struct dwarf_tag *last_type_lookup;
 	struct cu *cu;
 	struct dwarf_cu *type_unit;
+	enum type_tag_kind effective_type_tag_kind;
 };
 
 static int dwarf_cu__init(struct dwarf_cu *dcu, struct cu *cu)
@@ -153,6 +177,7 @@ static int dwarf_cu__init(struct dwarf_cu *dcu, struct cu *cu)
 	// To avoid a per-lookup check against NULL in dwarf_cu__find_type_by_ref()
 	dcu->last_type_lookup = &sentinel_dtag;
 	ptr_table__init(&dcu->type_tags);
+	dcu->effective_type_tag_kind = TYPE_TAG_POINTEE;
 	return 0;
 }
 
@@ -232,6 +257,20 @@ static void cu__hash(struct cu *cu, struct tag *tag)
 	hashtags__hash(hashtable, tag->priv);
 }
 
+/* Change impostor->priv->id to match tag->priv->id and replace `tag`
+ * by `impostor` in the `cu` types/tags table.
+ */
+static void cu__hash_impersonate(struct cu *cu, struct tag *tag, struct tag *impostor)
+{
+	struct dwarf_tag *dimpostor = impostor->priv;
+	struct dwarf_tag *dtag = tag->priv;
+
+	hlist_del_init(&dimpostor->hash_node);
+	hlist_del_init(&dtag->hash_node);
+	dimpostor->id = dtag->id;
+	cu__hash(cu, impostor);
+}
+
 static struct dwarf_tag *dwarf_cu__find_tag_by_ref(const struct dwarf_cu *cu,
 						   const struct dwarf_off_ref *ref)
 {
@@ -980,16 +1019,17 @@ static void type_tags_writer__init(struct type_tags_writer *writer,
  * treat `writer->dcu->type_tags` as a stream of records with the
  * following format:
  *
- *   ... parent-pointer tag-value* NULL ...
- *          ^               ^       ^
- *          |               |       '----- terminator
- *       struct tag*      char*
+ *   ... parent-pointer (tag-value tag-kind)* NULL ...
+ *          ^               ^         ^        ^
+ *          |               |         |        '----- terminator
+ *       struct tag*      char*  enum type_tag_kind
  */
 static int add_btf_type_tag(struct type_tags_writer *writer, Dwarf_Die *die,
 			    struct conf_load *conf)
 {
 	struct ptr_table *type_tags = &writer->dcu->type_tags;
 	const char *name, *value;
+	uintptr_t kind = 0;
 	uint32_t ignored;
 	int ret = 0;
 
@@ -997,11 +1037,16 @@ static int add_btf_type_tag(struct type_tags_writer *writer, Dwarf_Die *die,
 		return 0;
 
 	name = attr_string(die, DW_AT_name, conf);
-	if (strcmp(name, "btf_type_tag") != 0)
+	kind = strcmp(name, "btf_type_tag") == 0 ? TYPE_TAG_POINTEE : kind;
+	kind = strcmp(name, "btf:type_tag") == 0 ? TYPE_TAG_SELF : kind;
+	if (!kind)
 		return 0;
 
 	value = attr_string(die, DW_AT_const_value, conf);
 
+	if (kind == TYPE_TAG_SELF)
+		writer->dcu->effective_type_tag_kind = TYPE_TAG_SELF;
+
 	if (!writer->started) {
 		writer->started = true;
 		/* Terminate previous record */
@@ -1016,7 +1061,11 @@ static int add_btf_type_tag(struct type_tags_writer *writer, Dwarf_Die *die,
 			return ret;
 	}
 
-	return ptr_table__add(type_tags, (void *)value, &ignored);
+	ret = ptr_table__add(type_tags, (void *)value, &ignored);
+	if (ret)
+		return ret;
+
+	return ptr_table__add(type_tags, (void *)kind, &ignored);
 }
 
 /* Collect all type tag values attached to `parent` and save those in
@@ -1510,6 +1559,7 @@ static struct tag *die__create_new_unspecified_type(Dwarf_Die *die, struct cu *c
 	INIT_LIST_HEAD(&tag->node);
 
 	tag->name = attr_string(die, DW_AT_name, conf);
+	add_child_btf_type_tags(die, cu, &tag->tag, conf);
 
 	list_add(&tag->node, &cu->unspecified_types);
 
@@ -1611,9 +1661,7 @@ static struct tag *die__create_new_base_type(Dwarf_Die *die, struct cu *cu, stru
 	if (base == NULL)
 		return NULL;
 
-	if (dwarf_haschildren(die))
-		fprintf(stderr, "%s: DW_TAG_base_type WITH children!\n",
-			__func__);
+	add_child_btf_type_tags(die, cu, &base->tag, conf);
 
 	return &base->tag;
 }
@@ -1628,11 +1676,15 @@ static struct tag *die__create_new_typedef(Dwarf_Die *die, struct cu *cu, struct
 	if (add_child_llvm_annotations(die, -1, conf, &tdef->namespace.annots))
 		return NULL;
 
+	if (add_child_btf_type_tags(die, cu, &tdef->namespace.tag, conf))
+		return NULL;
+
 	return &tdef->namespace.tag;
 }
 
-static struct tag *die__create_new_array(Dwarf_Die *die, struct cu *cu)
+static struct tag *die__create_new_array(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
 {
+	struct type_tags_writer ttw;
 	Dwarf_Die child;
 	/* "64 dimensions will be enough for everybody." acme, 2006 */
 	const uint8_t max_dimensions = 64;
@@ -1645,19 +1697,29 @@ static struct tag *die__create_new_array(Dwarf_Die *die, struct cu *cu)
 	if (!dwarf_haschildren(die) || dwarf_child(die, &child) != 0)
 		return &array->tag;
 
+	type_tags_writer__init(&ttw, cu->priv, &array->tag);
 	die = &child;
 	do {
-		if (dwarf_tag(die) == DW_TAG_subrange_type) {
+		switch (dwarf_tag(die)) {
+		case DW_TAG_subrange_type:
 			nr_entries[array->dimensions++] = attr_upper_bound(die);
 			if (array->dimensions == max_dimensions) {
 				fprintf(stderr, "%s: only %u dimensions are "
 						"supported!\n",
 					__FUNCTION__, max_dimensions);
-				break;
+				goto _break_loop;
 			}
-		} else
+			break;
+		case DW_TAG_LLVM_annotation:
+			if (add_btf_type_tag(&ttw, die, conf))
+				goto out_free;
+			break;
+		default:
 			cu__tag_not_handled(die);
+			break;
+		}
 	} while (dwarf_siblingof(die, die) == 0);
+_break_loop:
 
 	array->nr_entries = memdup(nr_entries,
 				   array->dimensions * sizeof(uint32_t), cu);
@@ -1754,6 +1816,7 @@ static struct tag *die__create_new_constant(Dwarf_Die *die, struct cu *cu, struc
 static struct tag *die__create_new_subroutine_type(Dwarf_Die *die,
 						   struct cu *cu, struct conf_load *conf)
 {
+	struct type_tags_writer ttw;
 	Dwarf_Die child;
 	struct ftype *ftype = ftype__new(die, cu);
 	struct tag *tag;
@@ -1764,6 +1827,7 @@ static struct tag *die__create_new_subroutine_type(Dwarf_Die *die,
 	if (!dwarf_haschildren(die) || dwarf_child(die, &child) != 0)
 		goto out;
 
+	type_tags_writer__init(&ttw, cu->priv, &ftype->tag);
 	die = &child;
 	do {
 		uint32_t id;
@@ -1778,6 +1842,10 @@ static struct tag *die__create_new_subroutine_type(Dwarf_Die *die,
 		case DW_TAG_unspecified_parameters:
 			ftype->unspec_parms = 1;
 			continue;
+		case DW_TAG_LLVM_annotation:
+			if (add_btf_type_tag(&ttw, die, conf))
+				goto out_delete;
+			continue;
 		default:
 			tag = die__process_tag(die, cu, 0, conf);
 			if (tag == NULL)
@@ -1817,6 +1885,7 @@ static struct tag *die__create_new_enumeration(Dwarf_Die *die, struct cu *cu, st
 {
 	Dwarf_Die child;
 	struct type *enumeration = type__new(die, cu, conf);
+	struct type_tags_writer ttw;
 
 	if (enumeration == NULL)
 		return NULL;
@@ -1832,19 +1901,27 @@ static struct tag *die__create_new_enumeration(Dwarf_Die *die, struct cu *cu, st
 		goto out;
 	}
 
+	type_tags_writer__init(&ttw, cu->priv, &enumeration->namespace.tag);
 	die = &child;
 	do {
 		struct enumerator *enumerator;
 
-		if (dwarf_tag(die) != DW_TAG_enumerator) {
+		switch (dwarf_tag(die)) {
+		case DW_TAG_enumerator:
+			enumerator = enumerator__new(die, cu, conf);
+			if (enumerator == NULL)
+				goto out_delete;
+
+			enumeration__add(enumeration, enumerator);
+			break;
+		case DW_TAG_LLVM_annotation:
+			if (add_btf_type_tag(&ttw, die, conf))
+				goto out_delete;
+			break;
+		default:
 			cu__tag_not_handled(die);
-			continue;
+			break;
 		}
-		enumerator = enumerator__new(die, cu, conf);
-		if (enumerator == NULL)
-			goto out_delete;
-
-		enumeration__add(enumeration, enumerator);
 	} while (dwarf_siblingof(die, die) == 0);
 out:
 	return &enumeration->namespace.tag;
@@ -1857,8 +1934,10 @@ static int die__process_class(Dwarf_Die *die, struct type *class,
 			      struct cu *cu, struct conf_load *conf)
 {
 	const bool is_union = tag__is_union(&class->namespace.tag);
+	struct type_tags_writer ttw;
 	int member_idx = 0;
 
+	type_tags_writer__init(&ttw, cu->priv, &class->namespace.tag);
 	do {
 		switch (dwarf_tag(die)) {
 		case DW_TAG_subrange_type: // XXX: ADA stuff, its a type tho, will have other entries referencing it...
@@ -1906,6 +1985,8 @@ static int die__process_class(Dwarf_Die *die, struct type *class,
 		}
 			continue;
 		case DW_TAG_LLVM_annotation:
+			if (add_btf_type_tag(&ttw, die, conf))
+				return -ENOMEM;
 			if (add_llvm_annotation(die, -1, conf, &class->namespace.annots))
 				return -ENOMEM;
 			continue;
@@ -2243,7 +2324,7 @@ static struct tag *__die__process_tag(Dwarf_Die *die, struct cu *cu,
 	case DW_TAG_imported_unit:
 		return NULL; // We don't support imported units yet, so to avoid segfaults
 	case DW_TAG_array_type:
-		tag = die__create_new_array(die, cu);		break;
+		tag = die__create_new_array(die, cu, conf);	break;
 	case DW_TAG_string_type: // FORTRAN stuff, looks like an array
 		tag = die__create_new_string_type(die, cu);	break;
 	case DW_TAG_base_type:
@@ -2866,11 +2947,20 @@ static struct btf_type_tag_type *new_btf_type_tag_type(struct cu *cu,
  *
  *   tag3 -> tag2 -> tag1
  *
- * And complete the chain as follows for each type tag group:
+ * And for kind == TYPE_TAG_POINTEE complete the chain as follows for
+ * each type tag group:
  *
  *   '*' -> tag3 -> tag2 -> tag1 -> int
+ *
+ * For kind == TYPE_TAG_SELF the tag1..3 would be applied to `int`
+ * entry itself, thus complete the chain as below:
+ *
+ *   tag3 -> tag2 -> tag1 -> int
+ *
+ * And do cu__hash_impersonate(cu, int, tag3), so that each lookup of
+ * `int` by Dwarf_Off would return `tag3`.
  */
-static int dwarf_cu__recode_type_tags(struct cu *cu)
+static int dwarf_cu__recode_type_tags(struct cu *cu, enum type_tag_kind target_kind)
 {
 	struct dwarf_cu *dcu = cu->priv;
 	void **entries = dcu->type_tags.entries;
@@ -2885,12 +2975,18 @@ static int dwarf_cu__recode_type_tags(struct cu *cu)
 		struct btf_type_tag_type *prev = NULL;
 
 		while (i < N) {
-			const char *value = entries[i++];
+			const char *value;
+			uintptr_t kind;
 			uint32_t id;
 
-			if (value == NULL)
+			value = entries[i++];
+			if (value == NULL || i >= N)
 				break;
 
+			kind = (uintptr_t)entries[i++];
+			if (kind != target_kind)
+				continue;
+
 			last = new_btf_type_tag_type(cu, value);
 			if (last == NULL)
 				return -ENOMEM;
@@ -2912,15 +3008,34 @@ static int dwarf_cu__recode_type_tags(struct cu *cu)
 		if (first == NULL)
 			continue;
 
-		/* Type tags are recoded *after* main recode phase,
-		 * so parent->type is valid.
-		 *
-		 * parent   last           first  parent->type
-		 *    |      |               |       |
-		 *   '*' -> tag3 -> tag2 -> tag1 -> int
-		 */
-		first->tag.type = parent->type;
-		parent->type = tag__small_id(&last->tag);
+		switch (target_kind) {
+		case TYPE_TAG_POINTEE:
+			/* For POINTEE type tags are recoded *after*
+			 * main recode phase, so parent->type is valid.
+			 *
+			 * parent   last           first  parent->type
+			 *    |      |               |       |
+			 *   '*' -> tag3 -> tag2 -> tag1 -> int
+			 */
+			first->tag.type = parent->type;
+			parent->type = tag__small_id(&last->tag);
+			break;
+		case TYPE_TAG_SELF:
+			/* For SELF type tags are recoded *before*
+			 * main recode phase, so dcu->hash_types
+			 * changes would be visible during that phase.
+			 *
+			 * last            first   parent
+			 *   |               |       |
+			 *  tag3 -> tag2 -> tag1 -> int
+			 *   ^                       |
+			 *   '-----------------------'
+			 *   copy parent->tag.priv->id
+			 */
+			first->tag.type = tag__small_id(parent);
+			cu__hash_impersonate(cu, parent, &last->tag);
+			break;
+		}
 	}
 
 	return 0;
@@ -2928,11 +3043,21 @@ static int dwarf_cu__recode_type_tags(struct cu *cu)
 
 static int cu__recode_dwarf_types(struct cu *cu)
 {
+	struct dwarf_cu *dcu = cu->priv;
+
+	if (dcu->effective_type_tag_kind == TYPE_TAG_SELF &&
+	    dwarf_cu__recode_type_tags(cu, TYPE_TAG_SELF) != 0)
+		return -1;
+
 	if (cu__recode_dwarf_types_table(cu, &cu->types_table, 1) ||
 	    cu__recode_dwarf_types_table(cu, &cu->tags_table, 0) ||
 	    cu__recode_dwarf_types_table(cu, &cu->functions_table, 0))
 		return -1;
-	dwarf_cu__recode_type_tags(cu);
+
+	if (dcu->effective_type_tag_kind == TYPE_TAG_POINTEE &&
+	    dwarf_cu__recode_type_tags(cu, TYPE_TAG_POINTEE) != 0)
+		return -1;
+
 	return 0;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 dwarves 5/6] dwarf_loader: move type tags before CVR qualifiers when necessary
  2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
                   ` (3 preceding siblings ...)
  2023-05-24  0:18 ` [PATCH v3 dwarves 4/6] dwarf_loader: support btf:type_tag DW_TAG_LLVM_annotation Eduard Zingerman
@ 2023-05-24  0:18 ` Eduard Zingerman
  2023-05-24  0:18 ` [PATCH v3 dwarves 6/6] btf_encoder: skip type tags for VAR entry types Eduard Zingerman
  2023-05-25  0:13 ` [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
  6 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-24  0:18 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal, Eduard Zingerman

After discussion in [1] it was agreed to attach type tag annotations
of TYPE_TAG_SELF kind only to "base" types, specifically to:
- base types;
- arrays;
- pointers;
- structs
- unions;
- enums;
- typedefs.

And to not attach such tags to const/volatile/restrict derived types.
However, current Linux Kernel BTF validation code expects that all
type tags precede CVR qualifiers.

In other words, CLANG and GCC would generate tag chains like this:

  const -> volatile -> tag1 -> int

But kernel wants tag chain to be:

  tag1 -> const -> volatile -> int

This commit moves type tags before CVR qualifiers using a simplistic
algorithm and relies on libbpf's BTF deduplication algorithm for
cleanup.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 dwarf_loader.c | 210 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 210 insertions(+)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 2b50322..e764510 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3134,12 +3134,222 @@ static int die__process(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
 	return 0;
 }
 
+/* LISP-style single-linked immutable list */
+struct string_list {
+	const char *string;
+	const struct string_list *next;
+};
+
+enum {
+	CONST = 1u << 1,
+	VOLATILE = 1u << 2,
+	RESTRICT = 1u << 3,
+};
+
+/* Create a type corresponding to @target_id wrapped in CVR qualifiers
+ * specified in @cvr.
+ *
+ * @target_id -  in: ID of a type to wrap with qualifiers
+ *              out: ID of the resulting qualified type.
+ */
+static int wrap_qualifiers(struct cu *cu, uint32_t cvr, uint32_t *target_id)
+{
+	struct {
+		uint16_t code;
+		bool present;
+	} qualifiers[3] = {
+		{ DW_TAG_restrict_type, cvr & RESTRICT },
+		{ DW_TAG_volatile_type, cvr & VOLATILE },
+		{ DW_TAG_const_type   , cvr & CONST },
+	};
+
+	for (uint32_t i = 0; i < 3; ++i) {
+		struct tag *tag;
+
+		if (!qualifiers[i].present)
+			continue;
+
+		tag = tag__alloc(cu, sizeof(*tag));
+		if (tag == NULL)
+			return -ENOMEM;
+
+		tag__init_dummy(tag, cu, qualifiers[i].code);
+		tag->type = *target_id;
+		if (cu__add_tag(cu, tag, target_id))
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/* Wrap @target_id with type tags specified by @tags, assign the
+ * @top_id as ID of the resulting type.
+ */
+static int wrap_type_tags(struct cu *cu, uint32_t top_id, uint32_t target_id,
+			  const struct string_list *tags)
+{
+	struct btf_type_tag_type *type_tag;
+	uint32_t running_id = target_id;
+	int ret = 0;
+
+	for (; tags != NULL; tags = tags->next) {
+		type_tag = new_btf_type_tag_type(cu, tags->string);
+		if (type_tag == NULL)
+			return -ENOMEM;
+
+		type_tag->tag.type = running_id;
+		if (tags->next != NULL) {
+			ret = cu__add_tag(cu, &type_tag->tag, &running_id);
+		} else {
+			struct tag *top_tag = cu__type(cu, top_id);
+
+			list_del_init(&top_tag->node);
+			cu__table_nullify_type_entry(cu, top_id);
+			cu__free(cu, top_tag);
+			ret = cu__add_tag_with_id(cu, &type_tag->tag, top_id);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/* Traverse CVR/type tag chain starting from @top_id and replace it
+ * with a chain starting from type tags.
+ *
+ * @top_id - first ID in the CVR/type tag chain, type entry
+ *           corresponding to this ID would be replaced.
+ * @id     - current ID in the CVR/type tag chain, used for recursive descend.
+ * @cvr    - CVR bits accumulated so far.
+ * @tags   - list of type tags accumulated so far.
+ */
+static int rebuild_type_tag_chain(struct cu *cu, uint32_t top_id, uint32_t id, uint32_t cvr,
+				  struct string_list *tags)
+{
+	struct string_list tag_node = {};
+	uint32_t qualified_id;
+	uint16_t dw_tag = 0;
+	struct tag *tag;
+	int ret;
+
+	tag = cu__type(cu, id);
+	if (tag != NULL)
+		dw_tag = tag->tag;
+
+	switch (dw_tag) {
+	case DW_TAG_const_type:
+		cvr |= CONST;
+		break;
+	case DW_TAG_volatile_type:
+		cvr |= VOLATILE;
+		break;
+	case DW_TAG_restrict_type:
+		cvr |= RESTRICT;
+		break;
+	case DW_TAG_LLVM_annotation:
+		tag_node.string = tag__btf_type_tag(tag)->value;
+		tag_node.next = tags;
+		tags = &tag_node;
+		break;
+	default:
+		if (cvr == 0 || tags == NULL)
+			return 0;
+
+		qualified_id = id;
+		ret = wrap_qualifiers(cu, cvr, &qualified_id);
+		if (ret)
+			return ret;
+		return wrap_type_tags(cu, top_id, qualified_id, tags);
+	}
+
+	return rebuild_type_tag_chain(cu, top_id, tag->type, cvr, tags);
+}
+
+/* After some discussion in [1] it was agreed to attach type tag
+ * annotations of TYPE_TAG_SELF kind only to "base" types,
+ * specifically to:
+ * - base types;
+ * - arrays;
+ * - pointers;
+ * - structs
+ * - unions;
+ * - enums;
+ * - typedefs.
+ *
+ * And to not attach such tags to const/volatile/restrict derived types.
+ * However, current Linux Kernel BTF validation code expects that all
+ * type tags precede CVR qualifiers.
+ *
+ * In other words, CLANG and GCC would generate tag chains like this:
+ *
+ *   const -> volatile -> tag1 -> int
+ *
+ * But kernel wants tag chain to be:
+ *
+ *   tag1 -> const -> volatile -> int
+ *
+ * The code below moves type tags before CVR qualifiers in following steps
+ * for each CVR-type:
+ * - recursively traverse CVR / TYPE_TAG chain, accumulating CVR bits
+ *   and type tags
+ * - rebuild the chain with type tags put at front
+ * - for the first entry in a chain replace original CVR type entry
+ *   with type tag entry, so that all links to this type remain valid
+ *
+ * For example, the following chain:
+ *
+ *   const -> volatile -> tag1 -> int
+ *   id: 1    id: 2       id: 3   id: 4
+ *
+ * Would be rewritten as:
+ *
+ *   tag1 -> const -> volatile -> int
+ *   id: 1   id: 5    id: 6       id: 4
+ *                                 ^
+ *   tag1 -> volatile -------------|
+ *   id: 2   id: 7                 |
+ *                                 |
+ *   tag1 -------------------------'
+ *   id: 3
+ *
+ * The unnecessary duplicate chains 6->4 and 7->4 would be removed by
+ * libbpf's BTF deduplication algorithm.
+ *
+ * [1] https://reviews.llvm.org/D143967
+ */
+static int move_type_tags_before_cvr_qualifiers(struct cu *cu)
+{
+	uint32_t N = cu->types_table.nr_entries;
+	struct tag *tag;
+	int ret = 0;
+
+	for (uint32_t id = 1; id < N; ++id) {
+		tag = cu__type(cu, id);
+		if (tag == NULL)
+			continue;
+
+		if (tag->tag != DW_TAG_const_type &&
+		    tag->tag != DW_TAG_volatile_type &&
+		    tag->tag != DW_TAG_restrict_type)
+			continue;
+
+		ret = rebuild_type_tag_chain(cu, id, id, 0, NULL);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int die__process_and_recode(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
 {
 	int ret = die__process(die, cu, conf);
 	if (ret != 0)
 		return ret;
 	ret = cu__recode_dwarf_types(cu);
+	if (ret != 0)
+		return ret;
+	ret = move_type_tags_before_cvr_qualifiers(cu);
 	if (ret != 0)
 		return ret;
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 dwarves 6/6] btf_encoder: skip type tags for VAR entry types
  2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
                   ` (4 preceding siblings ...)
  2023-05-24  0:18 ` [PATCH v3 dwarves 5/6] dwarf_loader: move type tags before CVR qualifiers when necessary Eduard Zingerman
@ 2023-05-24  0:18 ` Eduard Zingerman
  2023-05-25  0:13 ` [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
  6 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-24  0:18 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal, Eduard Zingerman

Kernel does not expect VAR entries to have types starting from
BTF_TYPE_TAG. Specifically, the code like below will be rejected:

  struct rq __percpu runqueues;
  ...
  rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, cpu);
  ... rq->cpu ...     // rq type is now PTR_TO_BTF_ID

The access to 'rq->cpu' would be checked by a call to
kernel/bpf/btf.c:btf_struct_access() which invokes btf_struct_walk(),
using rq's type as a starting point. The btf_struct_walk() wants the
first type in a chain to be STRUCT or UNION and does not skip modifiers.

Before introduction of support for 'btf:type_tag' such situations were
not possible, as TYPE_TAG entries were always preceded by PTR entries.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
 btf_encoder.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 65f6e71..300d9c2 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1498,6 +1498,19 @@ static bool ftype__has_arg_names(const struct ftype *ftype)
 	return true;
 }
 
+static type_id_t skip_btf_type_tags(struct cu *cu, type_id_t id)
+{
+	for (;;) {
+		struct tag *tag = cu__type(cu, id);
+
+		if (tag == NULL || tag->tag != DW_TAG_LLVM_annotation)
+			break;
+		id = tag->type;
+	}
+
+	return id;
+}
+
 static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
 {
 	struct cu *cu = encoder->cu;
@@ -1583,7 +1596,22 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
 			continue;
 		}
 
-		type = var->ip.tag.type + encoder->type_id_off;
+		/* Kernel does not expect VAR entries to have types starting from BTF_TYPE_TAG.
+		 * Specifically, the code like below will be rejected:
+		 *
+		 *   struct rq __percpu runqueues;
+		 *   ...
+		 *   rq = (struct rq *)bpf_per_cpu_ptr(&runqueues, cpu);
+		 *   ... rq->cpu ...     // rq type is now PTR_TO_BTF_ID
+		 *
+		 * The access to 'rq->cpu' would be checked by a call to
+		 * kernel/bpf/btf.c:btf_struct_access() which invokes btf_struct_walk(),
+		 * using rq's type as a starting point. The btf_struct_walk() wants the
+		 * first type in a chain to be STRUCT or UNION and does not skip modifiers.
+		 *
+		 * Thus, call skip_btf_type_tags() here.
+		 */
+		type = skip_btf_type_tags(cu, var->ip.tag.type) + encoder->type_id_off;
 		linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
 
 		if (encoder->verbose) {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding
  2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
                   ` (5 preceding siblings ...)
  2023-05-24  0:18 ` [PATCH v3 dwarves 6/6] btf_encoder: skip type tags for VAR entry types Eduard Zingerman
@ 2023-05-25  0:13 ` Eduard Zingerman
  2023-05-26  0:06   ` Eduard Zingerman
  6 siblings, 1 reply; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-25  0:13 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal

On Wed, 2023-05-24 at 03:18 +0300, Eduard Zingerman wrote:
[...] 
>   - gcc       / LLVM-main / pahole-new
>     - kernel build : ok
>     - bpf tests    : ok
>     - btfdiff      : ok but dwarf dump sometimes segfaults

Regarding this segfault. I can reproduce it using pahole 'next' branch
if btfdiff is executed several times. So, the proposed patch-set is
not the cause.

Specifically, it happens when the following command is executed:

  pahole -F dwarf --flat_arrays --sort ... vmlinux

With the following stack trace:

  Thread 1 "pahole" received signal SIGSEGV, Segmentation fault.
  0x00007ffff7f819ad in __rb_erase_color (node=0x7fffd4045830, parent=0x0, root=0x5555555672d8 <structures.tree>) at /home/eddy/work/dwarves-fork/rbtree.c:134
  134			if (parent->rb_left == node)
  (gdb) bt
  #0  0x00007ffff7f819ad in __rb_erase_color (node=0x7fffd4045830, parent=0x0, root=0x5555555672d8 <structures.tree>) at /home/eddy/work/dwarves-fork/rbtree.c:134
  #1  0x00007ffff7f82014 in rb_erase (node=0x7fff21ae5b80, root=0x5555555672d8 <structures.tree>) at /home/eddy/work/dwarves-fork/rbtree.c:275
  #2  0x0000555555559c3d in __structures__delete () at /home/eddy/work/dwarves-fork/pahole.c:440
  #3  0x0000555555559c70 in structures__delete () at /home/eddy/work/dwarves-fork/pahole.c:448
  #4  0x0000555555560bb6 in main (argc=13, argv=0x7fffffffdcd8) at /home/eddy/work/dwarves-fork/pahole.c:3584

I tried random testing the rb_tree implementation and can reliably
reproduce the error using the diff at the end of the email.
E.g. it happens for test sequence (using add/delete functions adapted
to work on int values instead of 'struct structure'):

  60 4 24 34 15 2 80 26 82 67 92 77 9.

Will figure out a fix for rb_erase code tomorrow.

--- testing code for rb_erase ---

diff --git a/pahole.c b/pahole.c
index 6fc4ed6..6c60c9e 100644
--- a/pahole.c
+++ b/pahole.c
@@ -8,6 +8,7 @@
 
 #include <argp.h>
 #include <assert.h>
+#include <signal.h>
 #include <stdio.h>
 #include <dwarf.h>
 #include <elfutils/version.h>
@@ -3408,10 +3409,97 @@ out_free:
 	return ret;
 }
 
+struct test_struct {
+	struct rb_node rb_node;
+	int val;
+};
+
+static struct rb_root test_tree = RB_ROOT;
+static struct test_struct structs[100] = {};
+static int free_struct_idx = 0;
+static int num_tries = 0;
+
+static void test_add(int val)
+{
+        struct rb_node **p = &test_tree.rb_node;
+        struct rb_node *parent = NULL;
+	struct test_struct *str;
+
+        while (*p != NULL) {
+		int rc;
+
+                parent = *p;
+                str = rb_entry(parent, struct test_struct, rb_node);
+		rc = str->val - val;
+
+		if (rc > 0)
+                        p = &(*p)->rb_left;
+                else if (rc < 0)
+                        p = &(*p)->rb_right;
+		else {
+			return;
+		}
+        }
+
+	str = &structs[free_struct_idx++];
+	str->val = val;
+
+	rb_link_node(&str->rb_node, parent, p);
+        rb_insert_color(&str->rb_node, &test_tree);
+
+	return;
+}
+
+static void test_delete()
+{
+	struct rb_node *next = rb_first(&test_tree);
+
+	while (next) {
+		struct test_struct *pos = rb_entry(next, struct test_struct, rb_node);
+		next = rb_next(&pos->rb_node);
+		rb_erase(&pos->rb_node, &structures__tree);
+	}
+}
+
+void test_erase_once()
+{
+	test_tree = RB_ROOT;
+	free_struct_idx = 0;
+	bzero(structs, sizeof(structs));
+	int size = rand() % 16;
+	for (int i = 0; i < size; ++i) {
+		int val = rand() % 100;
+		test_add(val);
+	}
+	test_delete();
+	++num_tries;
+}
+
+static void sigsegv_handler(int nSignum, siginfo_t* si, void* vcontext) {
+	fprintf(stderr, "SIGSEGV after %d iters: [%d]", num_tries, free_struct_idx);
+	for (int i = 0; i < free_struct_idx; ++i)
+		fprintf(stderr, " %d", structs[i].val);
+	fprintf(stderr, "\n");
+	exit(1);
+}
+
 int main(int argc, char *argv[])
 {
 	int err, remaining, rc = EXIT_FAILURE;
 
+	int seed = clock();
+	fprintf(stderr, "seed = %d\n", seed);
+	srand(seed);
+	struct sigaction action = {};
+	action.sa_flags = SA_SIGINFO;
+	action.sa_sigaction = sigsegv_handler;
+	sigaction(SIGSEGV, &action, NULL);
+	for (int i = 0; i < 1000 * 1000; ++i) {
+		test_erase_once();
+	}
+	fprintf(stderr, "done\n");
+	return 0;
+
 	if (argp_parse(&pahole__argp, argc, argv, 0, &remaining, NULL)) {
 		argp_help(&pahole__argp, stderr, ARGP_HELP_SEE, argv[0]);
 		goto out;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding
  2023-05-25  0:13 ` [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
@ 2023-05-26  0:06   ` Eduard Zingerman
  0 siblings, 0 replies; 9+ messages in thread
From: Eduard Zingerman @ 2023-05-26  0:06 UTC (permalink / raw)
  To: dwarves, arnaldo.melo
  Cc: bpf, kernel-team, ast, daniel, andrii, yhs, jemarch, david.faust,
	mykolal

On Thu, 2023-05-25 at 03:13 +0300, Eduard Zingerman wrote:
[...]
> +static void test_delete()
> +{
> +	struct rb_node *next = rb_first(&test_tree);
> +
> +	while (next) {
> +		struct test_struct *pos = rb_entry(next, struct test_struct, rb_node);
> +		next = rb_next(&pos->rb_node);
> +		rb_erase(&pos->rb_node, &structures__tree);
                                        ^^^^^^^
                                        Should be &test_tree

I made a mistake in this test code.
When correct tree is passed to rb_erase() test passes w/o segfaults.
The original issue is caused by the same 'struct structures' instance
being added to an rb_tree twice, posted a separate patch with a fix to
the mailing list.

> +	}
> +}
[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-05-26  0:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-24  0:18 [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
2023-05-24  0:18 ` [PATCH v3 dwarves 1/6] dwarves.h: expose ptr_table interface Eduard Zingerman
2023-05-24  0:18 ` [PATCH v3 dwarves 2/6] dwarf_loader: Track unspecified types in a separate list Eduard Zingerman
2023-05-24  0:18 ` [PATCH v3 dwarves 3/6] dwarf_loader: handle btf_type_tag w/o special pointer type Eduard Zingerman
2023-05-24  0:18 ` [PATCH v3 dwarves 4/6] dwarf_loader: support btf:type_tag DW_TAG_LLVM_annotation Eduard Zingerman
2023-05-24  0:18 ` [PATCH v3 dwarves 5/6] dwarf_loader: move type tags before CVR qualifiers when necessary Eduard Zingerman
2023-05-24  0:18 ` [PATCH v3 dwarves 6/6] btf_encoder: skip type tags for VAR entry types Eduard Zingerman
2023-05-25  0:13 ` [PATCH v3 dwarves 0/6] Support for new btf_type_tag encoding Eduard Zingerman
2023-05-26  0:06   ` Eduard Zingerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).