All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
@ 2021-02-07  7:17 Yonghong Song
  2021-02-07 10:32 ` Sedat Dilek
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Yonghong Song @ 2021-02-07  7:17 UTC (permalink / raw)
  To: acme, dwarves
  Cc: bpf, andriin, mark, ndesaulniers, sedat.dilek, Andrii Nakryiko

clang with dwarf5 may generate non-regular int base type,
i.e., not a signed/unsigned char/short/int/longlong/__int128.
Such base types are often used to describe
how an actual parameter or variable is generated. For example,

0x000015cf:   DW_TAG_base_type
                DW_AT_name      ("DW_ATE_unsigned_1")
                DW_AT_encoding  (DW_ATE_unsigned)
                DW_AT_byte_size (0x00)

0x00010ed9:         DW_TAG_formal_parameter
                      DW_AT_location    (DW_OP_lit0,
                                         DW_OP_not,
                                         DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
                                         DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
                                         DW_OP_stack_value)
                      DW_AT_abstract_origin     (0x00013984 "branch")

What it does is with a literal "0", did a "not" operation, and the converted to
one-bit unsigned int and then 8-bit unsigned int.

Another example,

0x000e97e4:   DW_TAG_base_type
                DW_AT_name      ("DW_ATE_unsigned_24")
                DW_AT_encoding  (DW_ATE_unsigned)
                DW_AT_byte_size (0x03)

0x000f88f8:     DW_TAG_variable
                  DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
                     [0xffffffff82808812, 0xffffffff82808817):
                         DW_OP_breg0 RAX+0,
                         DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
                         DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
                         DW_OP_stack_value,
                         DW_OP_piece 0x1,
                         DW_OP_breg0 RAX+0,
                         DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
                         DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
                         DW_OP_lit8,
                         DW_OP_shr,
                         DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
                         DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
                         DW_OP_stack_value,
                         DW_OP_piece 0x3
                     ......

At one point, a right shift by 8 happens and the result is converted to
32-bit unsigned int and then to 24-bit unsigned int.

BTF does not need any of these DW_OP_* information and such non-regular int
types will cause libbpf to emit errors.
Let us sanitize them to generate BTF acceptable to libbpf and kernel.

Cc: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/libbtf.c b/libbtf.c
index 9f76283..5843200 100644
--- a/libbtf.c
+++ b/libbtf.c
@@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
 	struct btf *btf = btfe->btf;
 	const struct btf_type *t;
 	uint8_t encoding = 0;
+	uint16_t byte_sz;
 	int32_t id;
 
 	if (bt->is_signed) {
@@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
 		return -1;
 	}
 
-	id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
+	/* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
+	 * {num} is not power of 2 and may exceed 128. Such attributes
+	 * are mostly used to record operation for an actual parameter
+	 * or variable.
+	 * For example,
+	 *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
+	 *         [0xffffffff82808812, 0xffffffff82808817):
+	 *             DW_OP_breg0 RAX+0,
+	 *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
+	 *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
+	 *             DW_OP_stack_value,
+	 *             DW_OP_piece 0x1,
+	 *             DW_OP_breg0 RAX+0,
+	 *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
+	 *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
+	 *             DW_OP_lit8,
+	 *             DW_OP_shr,
+	 *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
+	 *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
+	 *             DW_OP_stack_value, DW_OP_piece 0x3
+	 *     DW_AT_name    ("ebx")
+	 *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
+	 *
+	 * In the above example, at some point, one unsigned_32 value
+	 * is right shifted by 8 and the result is converted to unsigned_32
+	 * and then unsigned_24.
+	 *
+	 * BTF does not need such DW_OP_* information so let us sanitize
+	 * these non-regular int types to avoid libbpf/kernel complaints.
+	 */
+	byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
+	if (!byte_sz || (byte_sz & (byte_sz - 1))) {
+		name = "__SANITIZED_FAKE_INT__";
+		byte_sz = 4;
+	}
+
+	id = btf__add_int(btf, name, byte_sz, encoding);
 	if (id < 0) {
 		btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
 	} else {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-07  7:17 [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type Yonghong Song
@ 2021-02-07 10:32 ` Sedat Dilek
  2021-02-08 19:55   ` Arnaldo Carvalho de Melo
  2021-02-07 14:18 ` Mark Wielaard
  2021-02-08 19:22 ` Nick Desaulniers
  2 siblings, 1 reply; 14+ messages in thread
From: Sedat Dilek @ 2021-02-07 10:32 UTC (permalink / raw)
  To: Yonghong Song
  Cc: acme, dwarves, bpf, andriin, mark, Nick Desaulniers, Andrii Nakryiko

On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
>
> clang with dwarf5 may generate non-regular int base type,
> i.e., not a signed/unsigned char/short/int/longlong/__int128.
> Such base types are often used to describe
> how an actual parameter or variable is generated. For example,
>
> 0x000015cf:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_1")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x00)
>
> 0x00010ed9:         DW_TAG_formal_parameter
>                       DW_AT_location    (DW_OP_lit0,
>                                          DW_OP_not,
>                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>                                          DW_OP_stack_value)
>                       DW_AT_abstract_origin     (0x00013984 "branch")
>
> What it does is with a literal "0", did a "not" operation, and the converted to
> one-bit unsigned int and then 8-bit unsigned int.
>
> Another example,
>
> 0x000e97e4:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_24")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x03)
>
> 0x000f88f8:     DW_TAG_variable
>                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
>                      [0xffffffff82808812, 0xffffffff82808817):
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x1,
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_lit8,
>                          DW_OP_shr,
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x3
>                      ......
>
> At one point, a right shift by 8 happens and the result is converted to
> 32-bit unsigned int and then to 24-bit unsigned int.
>
> BTF does not need any of these DW_OP_* information and such non-regular int
> types will cause libbpf to emit errors.
> Let us sanitize them to generate BTF acceptable to libbpf and kernel.
>
> Cc: Sedat Dilek <sedat.dilek@gmail.com>

Thanks for v2.

For both v1 and v2:

   Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
   Reported-by: Sedat Dilek <sedat.dilek@gmail.com>

My development and testing environment:

1. Debian/testing AMD64
2. Linux v5.11-rc6+ with custom mostly Clang fixes
3. Debug-Info: BTF + DWARF-v5 enabled
4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)

Build and boot on bare metal.

- Sedat -

> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/libbtf.c b/libbtf.c
> index 9f76283..5843200 100644
> --- a/libbtf.c
> +++ b/libbtf.c
> @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>         struct btf *btf = btfe->btf;
>         const struct btf_type *t;
>         uint8_t encoding = 0;
> +       uint16_t byte_sz;
>         int32_t id;
>
>         if (bt->is_signed) {
> @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>                 return -1;
>         }
>
> -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> +        * {num} is not power of 2 and may exceed 128. Such attributes
> +        * are mostly used to record operation for an actual parameter
> +        * or variable.
> +        * For example,
> +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> +        *         [0xffffffff82808812, 0xffffffff82808817):
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> +        *             DW_OP_stack_value,
> +        *             DW_OP_piece 0x1,
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_lit8,
> +        *             DW_OP_shr,
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> +        *             DW_OP_stack_value, DW_OP_piece 0x3
> +        *     DW_AT_name    ("ebx")
> +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> +        *
> +        * In the above example, at some point, one unsigned_32 value
> +        * is right shifted by 8 and the result is converted to unsigned_32
> +        * and then unsigned_24.
> +        *
> +        * BTF does not need such DW_OP_* information so let us sanitize
> +        * these non-regular int types to avoid libbpf/kernel complaints.
> +        */
> +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> +               name = "__SANITIZED_FAKE_INT__";
> +               byte_sz = 4;
> +       }
> +
> +       id = btf__add_int(btf, name, byte_sz, encoding);
>         if (id < 0) {
>                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
>         } else {
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-07  7:17 [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type Yonghong Song
  2021-02-07 10:32 ` Sedat Dilek
@ 2021-02-07 14:18 ` Mark Wielaard
  2021-02-07 15:15   ` Sedat Dilek
                     ` (2 more replies)
  2021-02-08 19:22 ` Nick Desaulniers
  2 siblings, 3 replies; 14+ messages in thread
From: Mark Wielaard @ 2021-02-07 14:18 UTC (permalink / raw)
  To: Yonghong Song, acme, dwarves
  Cc: bpf, andriin, ndesaulniers, sedat.dilek, Andrii Nakryiko

Hi,

On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> clang with dwarf5 may generate non-regular int base type,
> i.e., not a signed/unsigned char/short/int/longlong/__int128.
> Such base types are often used to describe
> how an actual parameter or variable is generated. For example,
> 
> 0x000015cf:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_1")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x00)
> 
> 0x00010ed9:         DW_TAG_formal_parameter
>                       DW_AT_location    (DW_OP_lit0,
>                                          DW_OP_not,
>                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>                                          DW_OP_stack_value)
>                       DW_AT_abstract_origin     (0x00013984 "branch")
> 
> What it does is with a literal "0", did a "not" operation, and the converted to
> one-bit unsigned int and then 8-bit unsigned int.

Thanks for tracking this down. Do you have any idea why the clang
compiler emits this? You might be right that it is intended to do what
you describe it does (but then it would simply encode an unsigned
constant 1 char in a very inefficient way). But as implemented it
doesn't seem to make any sense. What would DW_OP_convert of an zero
sized base type even mean (if it is intended as a 1 bit sized typed,
then why is there no DW_AT_bit_size)?

So I do think your patch makes sense. clang clearly is emitting
something bogus. And so some fixup is needed. But maybe we should at
least give a warning about it, otherwise it might never get fixed.

BTW. If these bogus base types are only emitted as part of a location
expression and not as part of an actual function or variable type
description, then why are we even trying to encode it as a BTF type? It
might be cheaper to just skip/drop it. But maybe the code setup makes
it hard to know whether or not such a (bogus) type is actually
referenced from a function or variable description?

Cheers,

Mark

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-07 14:18 ` Mark Wielaard
@ 2021-02-07 15:15   ` Sedat Dilek
  2021-02-07 18:14   ` Yonghong Song
  2021-02-08 19:16   ` Nick Desaulniers
  2 siblings, 0 replies; 14+ messages in thread
From: Sedat Dilek @ 2021-02-07 15:15 UTC (permalink / raw)
  To: Mark Wielaard
  Cc: Yonghong Song, acme, dwarves, bpf, andriin, Nick Desaulniers,
	Andrii Nakryiko

On Sun, Feb 7, 2021 at 3:18 PM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi,
>
> On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
>
> Thanks for tracking this down. Do you have any idea why the clang
> compiler emits this? You might be right that it is intended to do what
> you describe it does (but then it would simply encode an unsigned
> constant 1 char in a very inefficient way). But as implemented it
> doesn't seem to make any sense. What would DW_OP_convert of an zero
> sized base type even mean (if it is intended as a 1 bit sized typed,
> then why is there no DW_AT_bit_size)?
>
> So I do think your patch makes sense. clang clearly is emitting
> something bogus. And so some fixup is needed. But maybe we should at
> least give a warning about it, otherwise it might never get fixed.
>
> BTW. If these bogus base types are only emitted as part of a location
> expression and not as part of an actual function or variable type
> description, then why are we even trying to encode it as a BTF type? It
> might be cheaper to just skip/drop it. But maybe the code setup makes
> it hard to know whether or not such a (bogus) type is actually
> referenced from a function or variable description?
>

As said this is with LLVM/Clang v12.0.0-rc1 and `make LLVM=1
LLVM_IAS=1` means use all LLVM tools (do not use GNU/binutils like
GNU/ld BFD) and Clang's Integrated ASsembler (means do not use GNU
AS).

When doing an indexed search for "DW_ATE_unsigned"...

...this points to changes recently done in places like
llvm/lib/CodeGen/AsmPrinter/.

I see 16 changes there touching DWARF-x area since llvmorg-12.0.0-rc1 release:

$ git log --oneline llvmorg-12.0.0-rc1.. llvm/lib/CodeGen/AsmPrinter/
e44a10094283 .gcc_except_table: Set SHF_LINK_ORDER if binutils>=2.36,
and drop unneeded unique ID for -fno-unique-section-names
853a2649160c [AsmPrinter] __patchable_function_entries: Set
SHF_LINK_ORDER for binutils 2.36 and above
e3c0b0fe0958 [WebAssembly] locals can now be indirect in DWARF
34f3249abdff [DebugInfo] Fix error from D95893, where I accidentally
used an unsigned int in a loop and it wraps around.
a740af4de970 [CodeView][DebugInfo] Update the code for removing
template arguments from the display name of a codeview function id.
56fa34ae3570 DebugInfo: Temporarily work around -gsplit-dwarf + LTO
.debug_gnu_pubnames regression after D94976
8998f5843503 Re-land D94976 after revert in e29552c5aff6
d32deaab4d53 Revert "[DWARF] Location-less inlined variables should
not have DW_TAG_variable"
ddc2f1e3fb4f [DWARF] Location-less inlined variables should not have
DW_TAG_variable
511c9a76fb98 [AsmPrinter] Use ListSeparator (NFC)
85b7b5625a00 Fix memory leak in 4318028cd2d7633a0cdeb0b5d4d2ed81fab87864
4318028cd2d7 DebugInfo: Add a DWARF FORM extension for addrx+offset
references to reduce relocations
e29552c5aff6 Revert "[DWARF] Create subprogram's DIE in DISubprogram's unit"
dd7297e1bffe DebugInfo: Fix bug in addr+offset exprloc to use DWARFv5
addrx op instead of DWARFv4 GNU extension
7e6c87ee0454 DebugInfo: Deduplicate addresses in debug_addr
ef0dcb506300 [DWARF] Create subprogram's DIE in DISubprogram's unit

What I try to say is with LLVM-13 (git) this might look different?

Here, I have a LLVM-12 ThinLTO+PGO optimized toolchain which saves 1/3
of Linux-kernel build-time.
So, I do not want to switch to Debian's or packages from <apt.llvm.org>.
These binaries take much much longer and I do not know if I get some
new issues with Linux v5.11-rc6+.

Again, I am not a LLVM toolchain expert.
Best is to ask on llvm-dev mailing-list?

Thanks.

- Sedat -



[1] https://github.com/llvm/llvm-project/search?o=desc&q=DW_ATE_unsigned&s=indexed

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-07 14:18 ` Mark Wielaard
  2021-02-07 15:15   ` Sedat Dilek
@ 2021-02-07 18:14   ` Yonghong Song
  2021-02-08 19:16   ` Nick Desaulniers
  2 siblings, 0 replies; 14+ messages in thread
From: Yonghong Song @ 2021-02-07 18:14 UTC (permalink / raw)
  To: Mark Wielaard, acme, dwarves
  Cc: bpf, andriin, ndesaulniers, sedat.dilek, Andrii Nakryiko



On 2/7/21 6:18 AM, Mark Wielaard wrote:
> Hi,
> 
> On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
>> clang with dwarf5 may generate non-regular int base type,
>> i.e., not a signed/unsigned char/short/int/longlong/__int128.
>> Such base types are often used to describe
>> how an actual parameter or variable is generated. For example,
>>
>> 0x000015cf:   DW_TAG_base_type
>>                  DW_AT_name      ("DW_ATE_unsigned_1")
>>                  DW_AT_encoding  (DW_ATE_unsigned)
>>                  DW_AT_byte_size (0x00)
>>
>> 0x00010ed9:         DW_TAG_formal_parameter
>>                        DW_AT_location    (DW_OP_lit0,
>>                                           DW_OP_not,
>>                                           DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>>                                           DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>>                                           DW_OP_stack_value)
>>                        DW_AT_abstract_origin     (0x00013984 "branch")
>>
>> What it does is with a literal "0", did a "not" operation, and the converted to
>> one-bit unsigned int and then 8-bit unsigned int.
> 
> Thanks for tracking this down. Do you have any idea why the clang
> compiler emits this? You might be right that it is intended to do what

No, I don't know.

> you describe it does (but then it would simply encode an unsigned
> constant 1 char in a very inefficient way). But as implemented it
> doesn't seem to make any sense. What would DW_OP_convert of an zero
> sized base type even mean (if it is intended as a 1 bit sized typed,
> then why is there no DW_AT_bit_size)?

We need to report back to llvm community about this instance.
DW_AT_byte_size = 0 only for DW_ATE_unsigned_1 does not sound right.
DW_AT_bit_size might be needed as you suggested.

> 
> So I do think your patch makes sense. clang clearly is emitting
> something bogus. And so some fixup is needed. But maybe we should at
> least give a warning about it, otherwise it might never get fixed.

In pahole, to deal with dwarf weirdness, we have several other places
for workaround without warning. I think it is okay not to have
warning here as users cannot do anything about it. Also we have
this special int type with name __SANITIZED_FAKE_INT__ which will
signal the issue still exists if looking at BTF or generated vmlinux.h.

> 
> BTW. If these bogus base types are only emitted as part of a location
> expression and not as part of an actual function or variable type
> description, then why are we even trying to encode it as a BTF type? It

Let us still encode it as a BTF type as an evidence that we do
changed some types. After deduplication, we will only have ONE 
__SANITIZED_FAKE_INT__ type. I think leave this one type in BTF is okay.

> might be cheaper to just skip/drop it. But maybe the code setup makes
> it hard to know whether or not such a (bogus) type is actually
> referenced from a function or variable description?

I guess it is possible that we do a preprocessing to check whether
any types BTF intends to emit (var/func definition, etc.) referencing
such types or not. This will involves overhead for every CU most of 
which does not have this issue. And this should not really happen given 
a sane dwarf.

With __SANITIZED_FAKE_INT__ as the type name, if something e.g., a 
variable definition use this type, you will have
    __SANITIZED_FAKE_INT__ a;
in skeleton, or have
    struct {
       __SANITIZED_FAKE_INT__ member;
       ...
    };
in vmlinux.h,
we should be able to catch such an error easily.

> 
> Cheers,
> 
> Mark
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-07 14:18 ` Mark Wielaard
  2021-02-07 15:15   ` Sedat Dilek
  2021-02-07 18:14   ` Yonghong Song
@ 2021-02-08 19:16   ` Nick Desaulniers
       [not found]     ` <CAMXQf9_Qy5tD05ax1vtETnzM9szLxm95JgpHgT0HzjktixMNNQ@mail.gmail.com>
  2 siblings, 1 reply; 14+ messages in thread
From: Nick Desaulniers @ 2021-02-08 19:16 UTC (permalink / raw)
  To: Mark Wielaard, David Blaikie
  Cc: Yonghong Song, Arnaldo Carvalho de Melo, dwarves, bpf,
	Andrii Nakryiko, Sedat Dilek, Andrii Nakryiko

On Sun, Feb 7, 2021 at 6:18 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi,
>
> On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
>
> Thanks for tracking this down. Do you have any idea why the clang
> compiler emits this? You might be right that it is intended to do what
> you describe it does (but then it would simply encode an unsigned
> constant 1 char in a very inefficient way). But as implemented it
> doesn't seem to make any sense. What would DW_OP_convert of an zero
> sized base type even mean (if it is intended as a 1 bit sized typed,
> then why is there no DW_AT_bit_size)?

David,
Any thoughts on the above sequence of DW_OP_ entries?  This is a part
of DWARF I'm unfamiliar with.

>
> So I do think your patch makes sense. clang clearly is emitting
> something bogus. And so some fixup is needed. But maybe we should at
> least give a warning about it, otherwise it might never get fixed.
>
> BTW. If these bogus base types are only emitted as part of a location
> expression and not as part of an actual function or variable type
> description, then why are we even trying to encode it as a BTF type? It
> might be cheaper to just skip/drop it. But maybe the code setup makes
> it hard to know whether or not such a (bogus) type is actually
> referenced from a function or variable description?
>
> Cheers,
>
> Mark



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-07  7:17 [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type Yonghong Song
  2021-02-07 10:32 ` Sedat Dilek
  2021-02-07 14:18 ` Mark Wielaard
@ 2021-02-08 19:22 ` Nick Desaulniers
  2021-02-08 19:27   ` Sedat Dilek
  2021-02-08 20:13   ` Arnaldo Carvalho de Melo
  2 siblings, 2 replies; 14+ messages in thread
From: Nick Desaulniers @ 2021-02-08 19:22 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Arnaldo Carvalho de Melo, dwarves, bpf, Andrii Nakryiko,
	Mark Wielaard, Sedat Dilek, Andrii Nakryiko

On Sat, Feb 6, 2021 at 11:17 PM Yonghong Song <yhs@fb.com> wrote:
>
> clang with dwarf5 may generate non-regular int base type,
> i.e., not a signed/unsigned char/short/int/longlong/__int128.
> Such base types are often used to describe
> how an actual parameter or variable is generated. For example,
>
> 0x000015cf:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_1")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x00)
>
> 0x00010ed9:         DW_TAG_formal_parameter
>                       DW_AT_location    (DW_OP_lit0,
>                                          DW_OP_not,
>                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
>                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
>                                          DW_OP_stack_value)
>                       DW_AT_abstract_origin     (0x00013984 "branch")
>
> What it does is with a literal "0", did a "not" operation, and the converted to
> one-bit unsigned int and then 8-bit unsigned int.
>
> Another example,
>
> 0x000e97e4:   DW_TAG_base_type
>                 DW_AT_name      ("DW_ATE_unsigned_24")
>                 DW_AT_encoding  (DW_ATE_unsigned)
>                 DW_AT_byte_size (0x03)
>
> 0x000f88f8:     DW_TAG_variable
>                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
>                      [0xffffffff82808812, 0xffffffff82808817):
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x1,
>                          DW_OP_breg0 RAX+0,
>                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_lit8,
>                          DW_OP_shr,
>                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
>                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
>                          DW_OP_stack_value,
>                          DW_OP_piece 0x3
>                      ......
>
> At one point, a right shift by 8 happens and the result is converted to
> 32-bit unsigned int and then to 24-bit unsigned int.
>
> BTF does not need any of these DW_OP_* information and such non-regular int
> types will cause libbpf to emit errors.
> Let us sanitize them to generate BTF acceptable to libbpf and kernel.
>
> Cc: Sedat Dilek <sedat.dilek@gmail.com>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Thanks for the patch!

Tested-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/libbtf.c b/libbtf.c
> index 9f76283..5843200 100644
> --- a/libbtf.c
> +++ b/libbtf.c
> @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>         struct btf *btf = btfe->btf;
>         const struct btf_type *t;
>         uint8_t encoding = 0;
> +       uint16_t byte_sz;
>         int32_t id;
>
>         if (bt->is_signed) {
> @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
>                 return -1;
>         }
>
> -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> +        * {num} is not power of 2 and may exceed 128. Such attributes
> +        * are mostly used to record operation for an actual parameter
> +        * or variable.
> +        * For example,
> +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> +        *         [0xffffffff82808812, 0xffffffff82808817):
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> +        *             DW_OP_stack_value,
> +        *             DW_OP_piece 0x1,
> +        *             DW_OP_breg0 RAX+0,
> +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_lit8,
> +        *             DW_OP_shr,
> +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> +        *             DW_OP_stack_value, DW_OP_piece 0x3
> +        *     DW_AT_name    ("ebx")
> +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> +        *
> +        * In the above example, at some point, one unsigned_32 value
> +        * is right shifted by 8 and the result is converted to unsigned_32
> +        * and then unsigned_24.
> +        *
> +        * BTF does not need such DW_OP_* information so let us sanitize
> +        * these non-regular int types to avoid libbpf/kernel complaints.
> +        */
> +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> +               name = "__SANITIZED_FAKE_INT__";
> +               byte_sz = 4;
> +       }
> +
> +       id = btf__add_int(btf, name, byte_sz, encoding);
>         if (id < 0) {
>                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
>         } else {
> --
> 2.24.1
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-08 19:22 ` Nick Desaulniers
@ 2021-02-08 19:27   ` Sedat Dilek
  2021-02-08 20:13   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 14+ messages in thread
From: Sedat Dilek @ 2021-02-08 19:27 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Yonghong Song, Arnaldo Carvalho de Melo, dwarves, bpf,
	Andrii Nakryiko, Mark Wielaard, Andrii Nakryiko

On Mon, Feb 8, 2021 at 8:23 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Sat, Feb 6, 2021 at 11:17 PM Yonghong Song <yhs@fb.com> wrote:
> >
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
> >
> > Another example,
> >
> > 0x000e97e4:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_24")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x03)
> >
> > 0x000f88f8:     DW_TAG_variable
> >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> >                      [0xffffffff82808812, 0xffffffff82808817):
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x1,
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_lit8,
> >                          DW_OP_shr,
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x3
> >                      ......
> >
> > At one point, a right shift by 8 happens and the result is converted to
> > 32-bit unsigned int and then to 24-bit unsigned int.
> >
> > BTF does not need any of these DW_OP_* information and such non-regular int
> > types will cause libbpf to emit errors.
> > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> >
> > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
>
> Thanks for the patch!
>
> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
>

Cool, thanks for testing Nick.

- Sedat -

> > ---
> >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 1 deletion(-)
> >
> > diff --git a/libbtf.c b/libbtf.c
> > index 9f76283..5843200 100644
> > --- a/libbtf.c
> > +++ b/libbtf.c
> > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >         struct btf *btf = btfe->btf;
> >         const struct btf_type *t;
> >         uint8_t encoding = 0;
> > +       uint16_t byte_sz;
> >         int32_t id;
> >
> >         if (bt->is_signed) {
> > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >                 return -1;
> >         }
> >
> > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > +        * are mostly used to record operation for an actual parameter
> > +        * or variable.
> > +        * For example,
> > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > +        *             DW_OP_stack_value,
> > +        *             DW_OP_piece 0x1,
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_lit8,
> > +        *             DW_OP_shr,
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > +        *     DW_AT_name    ("ebx")
> > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > +        *
> > +        * In the above example, at some point, one unsigned_32 value
> > +        * is right shifted by 8 and the result is converted to unsigned_32
> > +        * and then unsigned_24.
> > +        *
> > +        * BTF does not need such DW_OP_* information so let us sanitize
> > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > +        */
> > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > +               name = "__SANITIZED_FAKE_INT__";
> > +               byte_sz = 4;
> > +       }
> > +
> > +       id = btf__add_int(btf, name, byte_sz, encoding);
> >         if (id < 0) {
> >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> >         } else {
> > --
> > 2.24.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-07 10:32 ` Sedat Dilek
@ 2021-02-08 19:55   ` Arnaldo Carvalho de Melo
  2021-02-08 20:46     ` Sedat Dilek
  0 siblings, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-02-08 19:55 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Yonghong Song, dwarves, bpf, andriin, mark, Nick Desaulniers,
	Andrii Nakryiko

Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> >
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
> >
> > Another example,
> >
> > 0x000e97e4:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_24")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x03)
> >
> > 0x000f88f8:     DW_TAG_variable
> >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> >                      [0xffffffff82808812, 0xffffffff82808817):
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x1,
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_lit8,
> >                          DW_OP_shr,
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x3
> >                      ......
> >
> > At one point, a right shift by 8 happens and the result is converted to
> > 32-bit unsigned int and then to 24-bit unsigned int.
> >
> > BTF does not need any of these DW_OP_* information and such non-regular int
> > types will cause libbpf to emit errors.
> > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> >
> > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> 
> Thanks for v2.
> 
> For both v1 and v2:
> 
>    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
>    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>

Thanks, applied.

- Arnaldo
 
> My development and testing environment:
> 
> 1. Debian/testing AMD64
> 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> 3. Debug-Info: BTF + DWARF-v5 enabled
> 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> 
> Build and boot on bare metal.
> 
> - Sedat -
> 
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 38 insertions(+), 1 deletion(-)
> >
> > diff --git a/libbtf.c b/libbtf.c
> > index 9f76283..5843200 100644
> > --- a/libbtf.c
> > +++ b/libbtf.c
> > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >         struct btf *btf = btfe->btf;
> >         const struct btf_type *t;
> >         uint8_t encoding = 0;
> > +       uint16_t byte_sz;
> >         int32_t id;
> >
> >         if (bt->is_signed) {
> > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> >                 return -1;
> >         }
> >
> > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > +        * are mostly used to record operation for an actual parameter
> > +        * or variable.
> > +        * For example,
> > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > +        *             DW_OP_stack_value,
> > +        *             DW_OP_piece 0x1,
> > +        *             DW_OP_breg0 RAX+0,
> > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_lit8,
> > +        *             DW_OP_shr,
> > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > +        *     DW_AT_name    ("ebx")
> > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > +        *
> > +        * In the above example, at some point, one unsigned_32 value
> > +        * is right shifted by 8 and the result is converted to unsigned_32
> > +        * and then unsigned_24.
> > +        *
> > +        * BTF does not need such DW_OP_* information so let us sanitize
> > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > +        */
> > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > +               name = "__SANITIZED_FAKE_INT__";
> > +               byte_sz = 4;
> > +       }
> > +
> > +       id = btf__add_int(btf, name, byte_sz, encoding);
> >         if (id < 0) {
> >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> >         } else {
> > --
> > 2.24.1
> >

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-08 19:22 ` Nick Desaulniers
  2021-02-08 19:27   ` Sedat Dilek
@ 2021-02-08 20:13   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-02-08 20:13 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Yonghong Song, dwarves, bpf, Andrii Nakryiko, Mark Wielaard,
	Sedat Dilek, Andrii Nakryiko

Em Mon, Feb 08, 2021 at 11:22:48AM -0800, Nick Desaulniers escreveu:
> On Sat, Feb 6, 2021 at 11:17 PM Yonghong Song <yhs@fb.com> wrote:
> >
> > clang with dwarf5 may generate non-regular int base type,
> > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > Such base types are often used to describe
> > how an actual parameter or variable is generated. For example,
> >
> > 0x000015cf:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_1")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x00)
> >
> > 0x00010ed9:         DW_TAG_formal_parameter
> >                       DW_AT_location    (DW_OP_lit0,
> >                                          DW_OP_not,
> >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> >                                          DW_OP_stack_value)
> >                       DW_AT_abstract_origin     (0x00013984 "branch")
> >
> > What it does is with a literal "0", did a "not" operation, and the converted to
> > one-bit unsigned int and then 8-bit unsigned int.
> >
> > Another example,
> >
> > 0x000e97e4:   DW_TAG_base_type
> >                 DW_AT_name      ("DW_ATE_unsigned_24")
> >                 DW_AT_encoding  (DW_ATE_unsigned)
> >                 DW_AT_byte_size (0x03)
> >
> > 0x000f88f8:     DW_TAG_variable
> >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> >                      [0xffffffff82808812, 0xffffffff82808817):
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x1,
> >                          DW_OP_breg0 RAX+0,
> >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_lit8,
> >                          DW_OP_shr,
> >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> >                          DW_OP_stack_value,
> >                          DW_OP_piece 0x3
> >                      ......
> >
> > At one point, a right shift by 8 happens and the result is converted to
> > 32-bit unsigned int and then to 24-bit unsigned int.
> >
> > BTF does not need any of these DW_OP_* information and such non-regular int
> > types will cause libbpf to emit errors.
> > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> >
> > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> Thanks for the patch!
> 
> Tested-by: Nick Desaulniers <ndesaulniers@google.com>

Thanks for testing and documenting that you tested, added the tag to
the commit,

- Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-08 19:55   ` Arnaldo Carvalho de Melo
@ 2021-02-08 20:46     ` Sedat Dilek
  2021-02-08 21:07       ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Sedat Dilek @ 2021-02-08 20:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Yonghong Song, dwarves, bpf, andriin, mark, Nick Desaulniers,
	Andrii Nakryiko

On Mon, Feb 8, 2021 at 8:55 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> > On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> > >
> > > clang with dwarf5 may generate non-regular int base type,
> > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > Such base types are often used to describe
> > > how an actual parameter or variable is generated. For example,
> > >
> > > 0x000015cf:   DW_TAG_base_type
> > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > >                 DW_AT_byte_size (0x00)
> > >
> > > 0x00010ed9:         DW_TAG_formal_parameter
> > >                       DW_AT_location    (DW_OP_lit0,
> > >                                          DW_OP_not,
> > >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> > >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> > >                                          DW_OP_stack_value)
> > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > >
> > > What it does is with a literal "0", did a "not" operation, and the converted to
> > > one-bit unsigned int and then 8-bit unsigned int.
> > >
> > > Another example,
> > >
> > > 0x000e97e4:   DW_TAG_base_type
> > >                 DW_AT_name      ("DW_ATE_unsigned_24")
> > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > >                 DW_AT_byte_size (0x03)
> > >
> > > 0x000f88f8:     DW_TAG_variable
> > >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > >                      [0xffffffff82808812, 0xffffffff82808817):
> > >                          DW_OP_breg0 RAX+0,
> > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > >                          DW_OP_stack_value,
> > >                          DW_OP_piece 0x1,
> > >                          DW_OP_breg0 RAX+0,
> > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > >                          DW_OP_lit8,
> > >                          DW_OP_shr,
> > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > >                          DW_OP_stack_value,
> > >                          DW_OP_piece 0x3
> > >                      ......
> > >
> > > At one point, a right shift by 8 happens and the result is converted to
> > > 32-bit unsigned int and then to 24-bit unsigned int.
> > >
> > > BTF does not need any of these DW_OP_* information and such non-regular int
> > > types will cause libbpf to emit errors.
> > > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> > >
> > > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> >
> > Thanks for v2.
> >
> > For both v1 and v2:
> >
> >    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
> >    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
>
> Thanks, applied.
>

Great.
I cannot see it yet in [1] or [2].

More important to me is is this worth a pahole v1.20.1 release?
This patch is required to successfully build with BTF and DWARF-5 and Clang-12.

I have Nathan's "bpf: Hoise pahole version checks into Kconfig" patch
in my custom patchset (together with Nick's DWARF-v5 patchset) which
makes it easier to do (depends on PAHOLE_VERSION >= 1201) via Kconfig
for example.

- Sedat -

[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
[2] https://github.com/acmel/dwarves/

> - Arnaldo
>
> > My development and testing environment:
> >
> > 1. Debian/testing AMD64
> > 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> > 3. Debug-Info: BTF + DWARF-v5 enabled
> > 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> > LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> >
> > Build and boot on bare metal.
> >
> > - Sedat -
> >
> > > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > ---
> > >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/libbtf.c b/libbtf.c
> > > index 9f76283..5843200 100644
> > > --- a/libbtf.c
> > > +++ b/libbtf.c
> > > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > >         struct btf *btf = btfe->btf;
> > >         const struct btf_type *t;
> > >         uint8_t encoding = 0;
> > > +       uint16_t byte_sz;
> > >         int32_t id;
> > >
> > >         if (bt->is_signed) {
> > > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > >                 return -1;
> > >         }
> > >
> > > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > > +        * are mostly used to record operation for an actual parameter
> > > +        * or variable.
> > > +        * For example,
> > > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > > +        *             DW_OP_breg0 RAX+0,
> > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > +        *             DW_OP_stack_value,
> > > +        *             DW_OP_piece 0x1,
> > > +        *             DW_OP_breg0 RAX+0,
> > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > +        *             DW_OP_lit8,
> > > +        *             DW_OP_shr,
> > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > > +        *     DW_AT_name    ("ebx")
> > > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > > +        *
> > > +        * In the above example, at some point, one unsigned_32 value
> > > +        * is right shifted by 8 and the result is converted to unsigned_32
> > > +        * and then unsigned_24.
> > > +        *
> > > +        * BTF does not need such DW_OP_* information so let us sanitize
> > > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > > +        */
> > > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > > +               name = "__SANITIZED_FAKE_INT__";
> > > +               byte_sz = 4;
> > > +       }
> > > +
> > > +       id = btf__add_int(btf, name, byte_sz, encoding);
> > >         if (id < 0) {
> > >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> > >         } else {
> > > --
> > > 2.24.1
> > >
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-08 20:46     ` Sedat Dilek
@ 2021-02-08 21:07       ` Andrii Nakryiko
  2021-02-08 21:11         ` Sedat Dilek
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2021-02-08 21:07 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Arnaldo Carvalho de Melo, Yonghong Song, dwarves, bpf,
	Andrii Nakryiko, Mark Wielaard, Nick Desaulniers,
	Andrii Nakryiko

On Mon, Feb 8, 2021 at 12:46 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 8:55 PM Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> >
> > Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> > > On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > >
> > > > clang with dwarf5 may generate non-regular int base type,
> > > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > > Such base types are often used to describe
> > > > how an actual parameter or variable is generated. For example,
> > > >
> > > > 0x000015cf:   DW_TAG_base_type
> > > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > >                 DW_AT_byte_size (0x00)
> > > >
> > > > 0x00010ed9:         DW_TAG_formal_parameter
> > > >                       DW_AT_location    (DW_OP_lit0,
> > > >                                          DW_OP_not,
> > > >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> > > >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> > > >                                          DW_OP_stack_value)
> > > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > > >
> > > > What it does is with a literal "0", did a "not" operation, and the converted to
> > > > one-bit unsigned int and then 8-bit unsigned int.
> > > >
> > > > Another example,
> > > >
> > > > 0x000e97e4:   DW_TAG_base_type
> > > >                 DW_AT_name      ("DW_ATE_unsigned_24")
> > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > >                 DW_AT_byte_size (0x03)
> > > >
> > > > 0x000f88f8:     DW_TAG_variable
> > > >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > >                      [0xffffffff82808812, 0xffffffff82808817):
> > > >                          DW_OP_breg0 RAX+0,
> > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > >                          DW_OP_stack_value,
> > > >                          DW_OP_piece 0x1,
> > > >                          DW_OP_breg0 RAX+0,
> > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > >                          DW_OP_lit8,
> > > >                          DW_OP_shr,
> > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > >                          DW_OP_stack_value,
> > > >                          DW_OP_piece 0x3
> > > >                      ......
> > > >
> > > > At one point, a right shift by 8 happens and the result is converted to
> > > > 32-bit unsigned int and then to 24-bit unsigned int.
> > > >
> > > > BTF does not need any of these DW_OP_* information and such non-regular int
> > > > types will cause libbpf to emit errors.
> > > > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> > > >
> > > > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > >
> > > Thanks for v2.
> > >
> > > For both v1 and v2:
> > >
> > >    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
> > >    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> >
> > Thanks, applied.
> >
>
> Great.
> I cannot see it yet in [1] or [2].
>
> More important to me is is this worth a pahole v1.20.1 release?
> This patch is required to successfully build with BTF and DWARF-5 and Clang-12.
>
> I have Nathan's "bpf: Hoise pahole version checks into Kconfig" patch
> in my custom patchset (together with Nick's DWARF-v5 patchset) which
> makes it easier to do (depends on PAHOLE_VERSION >= 1201) via Kconfig

I don't think the math works out with 1201 vs 121 (in the future), so
I'd rather prefer 1.21, if necessary.

> for example.
>
> - Sedat -
>
> [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
> [2] https://github.com/acmel/dwarves/
>
> > - Arnaldo
> >
> > > My development and testing environment:
> > >
> > > 1. Debian/testing AMD64
> > > 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> > > 3. Debug-Info: BTF + DWARF-v5 enabled
> > > 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> > > LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> > >
> > > Build and boot on bare metal.
> > >
> > > - Sedat -
> > >
> > > > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > > ---
> > > >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/libbtf.c b/libbtf.c
> > > > index 9f76283..5843200 100644
> > > > --- a/libbtf.c
> > > > +++ b/libbtf.c
> > > > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > >         struct btf *btf = btfe->btf;
> > > >         const struct btf_type *t;
> > > >         uint8_t encoding = 0;
> > > > +       uint16_t byte_sz;
> > > >         int32_t id;
> > > >
> > > >         if (bt->is_signed) {
> > > > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > >                 return -1;
> > > >         }
> > > >
> > > > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > > > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > > > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > > > +        * are mostly used to record operation for an actual parameter
> > > > +        * or variable.
> > > > +        * For example,
> > > > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > > > +        *             DW_OP_breg0 RAX+0,
> > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > > +        *             DW_OP_stack_value,
> > > > +        *             DW_OP_piece 0x1,
> > > > +        *             DW_OP_breg0 RAX+0,
> > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > +        *             DW_OP_lit8,
> > > > +        *             DW_OP_shr,
> > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > > > +        *     DW_AT_name    ("ebx")
> > > > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > > > +        *
> > > > +        * In the above example, at some point, one unsigned_32 value
> > > > +        * is right shifted by 8 and the result is converted to unsigned_32
> > > > +        * and then unsigned_24.
> > > > +        *
> > > > +        * BTF does not need such DW_OP_* information so let us sanitize
> > > > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > > > +        */
> > > > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > > > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > > > +               name = "__SANITIZED_FAKE_INT__";
> > > > +               byte_sz = 4;
> > > > +       }
> > > > +
> > > > +       id = btf__add_int(btf, name, byte_sz, encoding);
> > > >         if (id < 0) {
> > > >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> > > >         } else {
> > > > --
> > > > 2.24.1
> > > >
> >
> > --
> >
> > - Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
  2021-02-08 21:07       ` Andrii Nakryiko
@ 2021-02-08 21:11         ` Sedat Dilek
  0 siblings, 0 replies; 14+ messages in thread
From: Sedat Dilek @ 2021-02-08 21:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Arnaldo Carvalho de Melo, Yonghong Song, dwarves, bpf,
	Andrii Nakryiko, Mark Wielaard, Nick Desaulniers,
	Andrii Nakryiko

On Mon, Feb 8, 2021 at 10:08 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 12:46 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > On Mon, Feb 8, 2021 at 8:55 PM Arnaldo Carvalho de Melo
> > <arnaldo.melo@gmail.com> wrote:
> > >
> > > Em Sun, Feb 07, 2021 at 11:32:00AM +0100, Sedat Dilek escreveu:
> > > > On Sun, Feb 7, 2021 at 8:17 AM Yonghong Song <yhs@fb.com> wrote:
> > > > >
> > > > > clang with dwarf5 may generate non-regular int base type,
> > > > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > > > Such base types are often used to describe
> > > > > how an actual parameter or variable is generated. For example,
> > > > >
> > > > > 0x000015cf:   DW_TAG_base_type
> > > > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > > >                 DW_AT_byte_size (0x00)
> > > > >
> > > > > 0x00010ed9:         DW_TAG_formal_parameter
> > > > >                       DW_AT_location    (DW_OP_lit0,
> > > > >                                          DW_OP_not,
> > > > >                                          DW_OP_convert (0x000015cf) "DW_ATE_unsigned_1",
> > > > >                                          DW_OP_convert (0x000015d4) "DW_ATE_unsigned_8",
> > > > >                                          DW_OP_stack_value)
> > > > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > > > >
> > > > > What it does is with a literal "0", did a "not" operation, and the converted to
> > > > > one-bit unsigned int and then 8-bit unsigned int.
> > > > >
> > > > > Another example,
> > > > >
> > > > > 0x000e97e4:   DW_TAG_base_type
> > > > >                 DW_AT_name      ("DW_ATE_unsigned_24")
> > > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > > >                 DW_AT_byte_size (0x03)
> > > > >
> > > > > 0x000f88f8:     DW_TAG_variable
> > > > >                   DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > > >                      [0xffffffff82808812, 0xffffffff82808817):
> > > > >                          DW_OP_breg0 RAX+0,
> > > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > >                          DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > > >                          DW_OP_stack_value,
> > > > >                          DW_OP_piece 0x1,
> > > > >                          DW_OP_breg0 RAX+0,
> > > > >                          DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > >                          DW_OP_lit8,
> > > > >                          DW_OP_shr,
> > > > >                          DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > >                          DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > > >                          DW_OP_stack_value,
> > > > >                          DW_OP_piece 0x3
> > > > >                      ......
> > > > >
> > > > > At one point, a right shift by 8 happens and the result is converted to
> > > > > 32-bit unsigned int and then to 24-bit unsigned int.
> > > > >
> > > > > BTF does not need any of these DW_OP_* information and such non-regular int
> > > > > types will cause libbpf to emit errors.
> > > > > Let us sanitize them to generate BTF acceptable to libbpf and kernel.
> > > > >
> > > > > Cc: Sedat Dilek <sedat.dilek@gmail.com>
> > > >
> > > > Thanks for v2.
> > > >
> > > > For both v1 and v2:
> > > >
> > > >    Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
> > > >    Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> > >
> > > Thanks, applied.
> > >
> >
> > Great.
> > I cannot see it yet in [1] or [2].
> >
> > More important to me is is this worth a pahole v1.20.1 release?
> > This patch is required to successfully build with BTF and DWARF-5 and Clang-12.
> >
> > I have Nathan's "bpf: Hoise pahole version checks into Kconfig" patch
> > in my custom patchset (together with Nick's DWARF-v5 patchset) which
> > makes it easier to do (depends on PAHOLE_VERSION >= 1201) via Kconfig
>
> I don't think the math works out with 1201 vs 121 (in the future), so
> I'd rather prefer 1.21, if necessary.
>

Aaargh, you are right.

# /opt/pahole/bin/pahole --numeric_version
120

The wish is to have a pahole v1.21.

- Sedat -

> > for example.
> >
> > - Sedat -
> >
> > [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
> > [2] https://github.com/acmel/dwarves/
> >
> > > - Arnaldo
> > >
> > > > My development and testing environment:
> > > >
> > > > 1. Debian/testing AMD64
> > > > 2. Linux v5.11-rc6+ with custom mostly Clang fixes
> > > > 3. Debug-Info: BTF + DWARF-v5 enabled
> > > > 4. Compiler and tools (incl. Clang's Integrated ASsembler IAS):
> > > > LLVM/Clang v12.0.0-rc1 (make LLVM=1 LLVM_IAS=1)
> > > >
> > > > Build and boot on bare metal.
> > > >
> > > > - Sedat -
> > > >
> > > > > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > > > ---
> > > > >  libbtf.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > > > >  1 file changed, 38 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/libbtf.c b/libbtf.c
> > > > > index 9f76283..5843200 100644
> > > > > --- a/libbtf.c
> > > > > +++ b/libbtf.c
> > > > > @@ -373,6 +373,7 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > > >         struct btf *btf = btfe->btf;
> > > > >         const struct btf_type *t;
> > > > >         uint8_t encoding = 0;
> > > > > +       uint16_t byte_sz;
> > > > >         int32_t id;
> > > > >
> > > > >         if (bt->is_signed) {
> > > > > @@ -384,7 +385,43 @@ int32_t btf_elf__add_base_type(struct btf_elf *btfe, const struct base_type *bt,
> > > > >                 return -1;
> > > > >         }
> > > > >
> > > > > -       id = btf__add_int(btf, name, BITS_ROUNDUP_BYTES(bt->bit_size), encoding);
> > > > > +       /* dwarf5 may emit DW_ATE_[un]signed_{num} base types where
> > > > > +        * {num} is not power of 2 and may exceed 128. Such attributes
> > > > > +        * are mostly used to record operation for an actual parameter
> > > > > +        * or variable.
> > > > > +        * For example,
> > > > > +        *     DW_AT_location        (indexed (0x3c) loclist = 0x00008fb0:
> > > > > +        *         [0xffffffff82808812, 0xffffffff82808817):
> > > > > +        *             DW_OP_breg0 RAX+0,
> > > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > > +        *             DW_OP_convert (0x000e97df) "DW_ATE_unsigned_8",
> > > > > +        *             DW_OP_stack_value,
> > > > > +        *             DW_OP_piece 0x1,
> > > > > +        *             DW_OP_breg0 RAX+0,
> > > > > +        *             DW_OP_convert (0x000e97d5) "DW_ATE_unsigned_64",
> > > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > > +        *             DW_OP_lit8,
> > > > > +        *             DW_OP_shr,
> > > > > +        *             DW_OP_convert (0x000e97da) "DW_ATE_unsigned_32",
> > > > > +        *             DW_OP_convert (0x000e97e4) "DW_ATE_unsigned_24",
> > > > > +        *             DW_OP_stack_value, DW_OP_piece 0x3
> > > > > +        *     DW_AT_name    ("ebx")
> > > > > +        *     DW_AT_decl_file       ("/linux/arch/x86/events/intel/core.c")
> > > > > +        *
> > > > > +        * In the above example, at some point, one unsigned_32 value
> > > > > +        * is right shifted by 8 and the result is converted to unsigned_32
> > > > > +        * and then unsigned_24.
> > > > > +        *
> > > > > +        * BTF does not need such DW_OP_* information so let us sanitize
> > > > > +        * these non-regular int types to avoid libbpf/kernel complaints.
> > > > > +        */
> > > > > +       byte_sz = BITS_ROUNDUP_BYTES(bt->bit_size);
> > > > > +       if (!byte_sz || (byte_sz & (byte_sz - 1))) {
> > > > > +               name = "__SANITIZED_FAKE_INT__";
> > > > > +               byte_sz = 4;
> > > > > +       }
> > > > > +
> > > > > +       id = btf__add_int(btf, name, byte_sz, encoding);
> > > > >         if (id < 0) {
> > > > >                 btf_elf__log_err(btfe, BTF_KIND_INT, name, true, "Error emitting BTF type");
> > > > >         } else {
> > > > > --
> > > > > 2.24.1
> > > > >
> > >
> > > --
> > >
> > > - Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type
       [not found]     ` <CAMXQf9_Qy5tD05ax1vtETnzM9szLxm95JgpHgT0HzjktixMNNQ@mail.gmail.com>
@ 2021-02-08 21:24       ` Mark Wielaard
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Wielaard @ 2021-02-08 21:24 UTC (permalink / raw)
  To: David Blaikie, Nick Desaulniers
  Cc: Yonghong Song, Arnaldo Carvalho de Melo, dwarves, bpf,
	Andrii Nakryiko, Sedat Dilek, Andrii Nakryiko

Hi David,

(Nice to have seen you just recently!)

On Mon, 2021-02-08 at 12:43 -0800, David Blaikie wrote:
> DW_OP_convert was added in DWARFv5 and it can be used for type conversions,
> these are created in the LLVM middle/backend during optimizations, not by
> the frontend. So the middle/backend doesn't have a way to create canonical
> DW_TAG_base_types for frontend language integer types - so it creates
> synthesized ones. So this is intentional/not a particularly quirky thing.
> 
> LLVM creates locations incrementally as optimizations are applied - without
> doing any particular canonicalization/reduction later on (maybe it has some
> canonicalization/reduction, but not a very wholistic/aggressive approach
> there) - so it can end up with something that is not the most compact
> representation.

But it seems to end up with a bogus representation (see below in the
original quoted message for an example). The issue isn't really that it
isn't an optimized constant representation (although it is kind of an
inefficient one). It is this DW_OP_convert applied to a
DW_TAG_base_type DIE with a DW_AT_byte_size of zero. Which doesn't
really make any sense to me. It feels like it is asking the DWARF
consumer to do a divide by zero here ("now express this value using
zero bits!"). Could you explain these syntactic "DW_ATE_unsigned_1"
base_type DIEs? What do they represent? Why do they have a zero size?
Should they really have a DW_AT_bit_size and a DW_AT_byte_size of 1?

Thanks,

Mark

> On Mon, Feb 8, 2021 at 11:17 AM Nick Desaulniers <ndesaulniers@google.com>
> wrote:
> 
> > On Sun, Feb 7, 2021 at 6:18 AM Mark Wielaard <mark@klomp.org> wrote:
> > > 
> > > Hi,
> > > 
> > > On Sat, 2021-02-06 at 23:17 -0800, Yonghong Song wrote:
> > > > clang with dwarf5 may generate non-regular int base type,
> > > > i.e., not a signed/unsigned char/short/int/longlong/__int128.
> > > > Such base types are often used to describe
> > > > how an actual parameter or variable is generated. For example,
> > > > 
> > > > 0x000015cf:   DW_TAG_base_type
> > > >                 DW_AT_name      ("DW_ATE_unsigned_1")
> > > >                 DW_AT_encoding  (DW_ATE_unsigned)
> > > >                 DW_AT_byte_size (0x00)
> > > > 
> > > > 0x00010ed9:         DW_TAG_formal_parameter
> > > >                       DW_AT_location    (DW_OP_lit0,
> > > >                                          DW_OP_not,
> > > >                                          DW_OP_convert (0x000015cf)
> > 
> > "DW_ATE_unsigned_1",
> > > >                                          DW_OP_convert (0x000015d4)
> > 
> > "DW_ATE_unsigned_8",
> > > >                                          DW_OP_stack_value)
> > > >                       DW_AT_abstract_origin     (0x00013984 "branch")
> > > > 
> > > > What it does is with a literal "0", did a "not" operation, and the
> > 
> > converted to
> > > > one-bit unsigned int and then 8-bit unsigned int.
> > > 
> > > Thanks for tracking this down. Do you have any idea why the clang
> > > compiler emits this? You might be right that it is intended to do what
> > > you describe it does (but then it would simply encode an unsigned
> > > constant 1 char in a very inefficient way). But as implemented it
> > > doesn't seem to make any sense. What would DW_OP_convert of an zero
> > > sized base type even mean (if it is intended as a 1 bit sized typed,
> > > then why is there no DW_AT_bit_size)?
> > 
> > David,
> > Any thoughts on the above sequence of DW_OP_ entries?  This is a part
> > of DWARF I'm unfamiliar with.
> > 
> > > 
> > > So I do think your patch makes sense. clang clearly is emitting
> > > something bogus. And so some fixup is needed. But maybe we should at
> > > least give a warning about it, otherwise it might never get fixed.
> > > 
> > > BTW. If these bogus base types are only emitted as part of a location
> > > expression and not as part of an actual function or variable type
> > > description, then why are we even trying to encode it as a BTF type? It
> > > might be cheaper to just skip/drop it. But maybe the code setup makes
> > > it hard to know whether or not such a (bogus) type is actually
> > > referenced from a function or variable description?
> > > 
> > > Cheers,
> > > 
> > > Mark
> > 
> > 
> > 
> > --
> > Thanks,
> > ~Nick Desaulniers
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-02-08 21:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-07  7:17 [PATCH dwarves v2] btf_encoder: sanitize non-regular int base type Yonghong Song
2021-02-07 10:32 ` Sedat Dilek
2021-02-08 19:55   ` Arnaldo Carvalho de Melo
2021-02-08 20:46     ` Sedat Dilek
2021-02-08 21:07       ` Andrii Nakryiko
2021-02-08 21:11         ` Sedat Dilek
2021-02-07 14:18 ` Mark Wielaard
2021-02-07 15:15   ` Sedat Dilek
2021-02-07 18:14   ` Yonghong Song
2021-02-08 19:16   ` Nick Desaulniers
     [not found]     ` <CAMXQf9_Qy5tD05ax1vtETnzM9szLxm95JgpHgT0HzjktixMNNQ@mail.gmail.com>
2021-02-08 21:24       ` Mark Wielaard
2021-02-08 19:22 ` Nick Desaulniers
2021-02-08 19:27   ` Sedat Dilek
2021-02-08 20:13   ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.