dwarves.vger.kernel.org archive mirror
* Per-CPU variables in modules and pahole
@ 2020-12-09 20:53 Andrii Nakryiko
  2020-12-10 16:43 ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2020-12-09 20:53 UTC (permalink / raw)
  To: bpf, dwarves, Jiri Olsa, Hao Luo, Arnaldo Carvalho de Melo

Hi,

I'm working on supporting per-CPU symbols in BPF/libbpf, and the
prerequisite for that is BTF data for the .data..percpu data section
and the variables inside it.

Turns out, pahole doesn't currently emit any BTF information for such
variables in kernel modules. And the reason why is quite confusing and
I can't figure it out myself, so was hoping someone else might be able
to help.

To repro, you can take latest bpf-next tree and add this to
bpf_testmod/bpf_testmod.c inside selftests/bpf:

$ git diff bpf_testmod/bpf_testmod.c
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 2df19d73ca49..b2086b798019 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -3,6 +3,7 @@
 #include <linux/error-injection.h>
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/percpu-defs.h>
 #include <linux/sysfs.h>
 #include <linux/tracepoint.h>
 #include "bpf_testmod.h"
@@ -10,6 +11,10 @@
 #define CREATE_TRACE_POINTS
 #include "bpf_testmod-events.h"

+DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
+DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
+DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
+
 noinline ssize_t
 bpf_testmod_test_read(struct file *file, struct kobject *kobj,
                      struct bin_attribute *bin_attr,

1. So the very first issue (that I'm going to ignore for now) is that
if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
would be ignored by the current pahole logic. So we need to fix that
for modules. Adding dummy1 and dummy2 takes care of this for now:
bpf_testmod_ksym_percpu then has offset 4.

2. Second issue is more interesting. Somehow, when pahole iterates
over DWARF variables, the address of bpf_testmod_ksym_percpu is
reported as 0x10e74, not 4. This totally confuses pahole, because
according to ELF symbols, the bpf_testmod_ksym_percpu symbol has
value 4. I tracked this down to dwarf_getlocation() returning 0x10e74
in the number field of expr.

But this seems wrong, because when looking at DWARF:

$ readelf -wi bpf_testmod.ko | grep bpf_testmod_ksym_percpu -B1 -A6
 <1><fbc5>: Abbrev Number: 97 (DW_TAG_variable)
    <fbc6>   DW_AT_name        : (indirect string, offset: 0x4afb): bpf_testmod_ksym_percpu
    <fbca>   DW_AT_decl_file   : 5
    <fbcb>   DW_AT_decl_line   : 15
    <fbcc>   DW_AT_decl_column : 1
    <fbcd>   DW_AT_type        : <0xce>
    <fbd1>   DW_AT_external    : 1
    <fbd1>   DW_AT_location    : 9 byte block: 3 4 0 0 0 0 0 0 0 (DW_OP_addr: 4)

You can see that addr is actually 4.
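
For reference, the 9-byte block in the readelf output above can be
decoded by hand. Below is a minimal sketch in C, assuming a little-endian
64-bit target and an expression that consists of nothing but DW_OP_addr
(opcode 0x03). decode_op_addr() is a made-up helper for illustration, not
a libdw function:

```c
#include <stddef.h>
#include <stdint.h>

#define DW_OP_addr 0x03 /* DWARF opcode: operand is one target-sized address */

/*
 * Decode a location expression made of a single DW_OP_addr followed by an
 * 8-byte little-endian address. Returns 0 on success, -1 if the block does
 * not have exactly that shape.
 */
static int decode_op_addr(const uint8_t *block, size_t len, uint64_t *addr)
{
	uint64_t v = 0;
	size_t i;

	if (len != 9 || block[0] != DW_OP_addr)
		return -1;

	/* Assemble the operand byte by byte, least significant byte first. */
	for (i = 0; i < 8; i++)
		v |= (uint64_t)block[1 + i] << (8 * i);

	*addr = v;
	return 0;
}
```

Feeding it the bytes 3 4 0 0 0 0 0 0 0 yields address 4, which matches
what readelf prints in parentheses.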

And ELF symbols agree:

$ readelf -a bpf_testmod.ko | grep bpf_testmod_ksym_percpu
   102: 0000000000000004     4 OBJECT  GLOBAL DEFAULT   33 bpf_testmod_ksym_percpu


I also can't seem to match 0x10e74 to anything in bpf_testmod.ko, no
section or anything like that.

So, help! Is this just a libdw bug? If yes, why don't we see it
anywhere else? If not, what am I missing and how can we make pahole
emit BTF data for variables in modules?

Thanks!


-- Andrii


* Re: Per-CPU variables in modules and pahole
  2020-12-09 20:53 Per-CPU variables in modules and pahole Andrii Nakryiko
@ 2020-12-10 16:43 ` Jiri Olsa
  2020-12-10 17:02   ` Andrii Nakryiko
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2020-12-10 16:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, dwarves, Jiri Olsa, Hao Luo, Arnaldo Carvalho de Melo

On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:
> Hi,
> 
> I'm working on supporting per-CPU symbols in BPF/libbpf, and the
> prerequisite for that is BTF data for .data..percpu data section and
> variables inside that.
> 
> Turns out, pahole doesn't currently emit any BTF information for such
> variables in kernel modules. And the reason why is quite confusing and
> I can't figure it out myself, so was hoping someone else might be able
> to help.
> 
> To repro, you can take latest bpf-next tree and add this to
> bpf_testmod/bpf_testmod.c inside selftests/bpf:
> 
> $ git diff bpf_testmod/bpf_testmod.c
>       diff --git
> a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> index 2df19d73ca49..b2086b798019 100644
> --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> @@ -3,6 +3,7 @@
>  #include <linux/error-injection.h>
>  #include <linux/init.h>
>  #include <linux/module.h>
> +#include <linux/percpu-defs.h>
>  #include <linux/sysfs.h>
>  #include <linux/tracepoint.h>
>  #include "bpf_testmod.h"
> @@ -10,6 +11,10 @@
>  #define CREATE_TRACE_POINTS
>  #include "bpf_testmod-events.h"
> 
> +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
> +DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
> +
>  noinline ssize_t
>  bpf_testmod_test_read(struct file *file, struct kobject *kobj,
>                       struct bin_attribute *bin_attr,
> 
> 1. So the very first issue (that I'm going to ignore for now) is that
> if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> would be ignored by the current pahole logic. So we need to fix that
> for modules. Adding dummy1 and dummy2 takes care of this for now,
> bpf_testmod_ksym_percpu has offset 4.

I removed that addr zero check in the modules changes, but only when
collecting functions; it's still there in collect_percpu_var

> 
> 2. Second issue is more interesting. Somehow, when pahole iterates
> over DWARF variables, the address of bpf_testmod_ksym_percpu is
> reported as 0x10e74, not 4. Which totally confuses pahole because
> according to ELF symbols, bpf_testmod_ksym_percpu symbol has value 4.
> I tracked this down to dwarf_getlocation() returning 10e74 as number
> field in expr.

in which place do you see that address? when I print the
address from collect_percpu_var it shows 4

not sure this is related, but it looks like a similar issue to the one
I had to solve for module functions, as described in the changelog:
(not merged yet)

    btf_encoder: Detect kernel module ftrace addresses

    ...
    There's one tricky point with kernel modules wrt Elf object,
    which we get from dwfl_module_getelf function. This function
    performs all possible relocations, including __mcount_loc
    section.

    So addrs array contains relocated values, which we need to take
    into account when we compare them to functions values which
    are relative to their sections.
    ...

The 0x10e74 value could be a relocated 4, but I'm only guessing,
because I'm not sure where exactly you see that address

jirka



* Re: Per-CPU variables in modules and pahole
  2020-12-10 16:43 ` Jiri Olsa
@ 2020-12-10 17:02   ` Andrii Nakryiko
  2020-12-10 18:28     ` Hao Luo
  2020-12-10 23:42     ` Jiri Olsa
  0 siblings, 2 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2020-12-10 17:02 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: bpf, dwarves, Jiri Olsa, Hao Luo, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 8:43 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:
> > Hi,
> >
> > I'm working on supporting per-CPU symbols in BPF/libbpf, and the
> > prerequisite for that is BTF data for .data..percpu data section and
> > variables inside that.
> >
> > Turns out, pahole doesn't currently emit any BTF information for such
> > variables in kernel modules. And the reason why is quite confusing and
> > I can't figure it out myself, so was hoping someone else might be able
> > to help.
> >
> > To repro, you can take latest bpf-next tree and add this to
> > bpf_testmod/bpf_testmod.c inside selftests/bpf:
> >
> > $ git diff bpf_testmod/bpf_testmod.c
> >       diff --git
> > a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > index 2df19d73ca49..b2086b798019 100644
> > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > @@ -3,6 +3,7 @@
> >  #include <linux/error-injection.h>
> >  #include <linux/init.h>
> >  #include <linux/module.h>
> > +#include <linux/percpu-defs.h>
> >  #include <linux/sysfs.h>
> >  #include <linux/tracepoint.h>
> >  #include "bpf_testmod.h"
> > @@ -10,6 +11,10 @@
> >  #define CREATE_TRACE_POINTS
> >  #include "bpf_testmod-events.h"
> >
> > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
> > +DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
> > +
> >  noinline ssize_t
> >  bpf_testmod_test_read(struct file *file, struct kobject *kobj,
> >                       struct bin_attribute *bin_attr,
> >
> > 1. So the very first issue (that I'm going to ignore for now) is that
> > if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> > would be ignored by the current pahole logic. So we need to fix that
> > for modules. Adding dummy1 and dummy2 takes care of this for now,
> > bpf_testmod_ksym_percpu has offset 4.
>
> I removed that addr zero check in the modules changes but when
> collecting functions, but it's still there in collect_percpu_var

Hao had some reason to skip per-cpu variables with offset 0, maybe he
can comment on that before we change it.

>
> >
> > 2. Second issue is more interesting. Somehow, when pahole iterates
> > over DWARF variables, the address of bpf_testmod_ksym_percpu is
> > reported as 0x10e74, not 4. Which totally confuses pahole because
> > according to ELF symbols, bpf_testmod_ksym_percpu symbol has value 4.
> > I tracked this down to dwarf_getlocation() returning 10e74 as number
> > field in expr.
>
> in which place do you see that address? when I put displayed
> address from collect_percpu_var it shows 4

yes, the ELF symbol's value is 4, but when iterating DWARF variables
(0x10e70 + 4) is returned. It does look like special handling of
modules. I missed that libdw does some special things specifically
for modules. Further debugging yesterday showed that 0x10e70 roughly
corresponds to the offset of .data..percpu if you count all the
allocatable data sections that come before it. So I think you are
right. We should probably centralize the logic of kernel module
detection so that we can handle these module vs non-module differences
properly.
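
To illustrate the "count all the allocatable sections that come before
it" observation, here is a rough sketch in C. It is only a model of the
idea, not libdw's actual layout code: struct sec, layout_base() and the
section sizes in the usage note are made up, and the real rules for
placing sections may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHF_ALLOC 0x2 /* same value as in <elf.h> */

/* Hypothetical, trimmed-down view of a section header. */
struct sec {
	const char *name;
	uint64_t flags;
	uint64_t size;
	uint64_t align;
};

static uint64_t align_up(uint64_t v, uint64_t a)
{
	return a > 1 ? (v + a - 1) / a * a : v;
}

/*
 * Lay out allocatable sections back to back, starting at address 0, and
 * report the address the named section would get. Returns 0 on success,
 * -1 if no allocatable section with that name exists.
 */
static int layout_base(const struct sec *secs, size_t n, const char *name,
		       uint64_t *base)
{
	uint64_t addr = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(secs[i].flags & SHF_ALLOC))
			continue; /* non-allocatable sections take no space */
		addr = align_up(addr, secs[i].align);
		if (strcmp(secs[i].name, name) == 0) {
			*base = addr;
			return 0;
		}
		addr += secs[i].size;
	}
	return -1;
}
```

With made-up sections .text (0x100 bytes, align 16), .comment (not
allocatable), .data (0x21 bytes, align 8) and .data..percpu (align 8),
the per-CPU section lands at 0x128, so a variable at section offset 4
would show up as 0x12c. The 0x10e70 + 4 seen here would be the same
effect with the real section sizes of bpf_testmod.ko.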

>
> not sure this is related but looks like similar issue I had to
> solve for modules functions, as described in the changelog:
> (not merged yet)
>
>     btf_encoder: Detect kernel module ftrace addresses
>
>     ...
>     There's one tricky point with kernel modules wrt Elf object,
>     which we get from dwfl_module_getelf function. This function
>     performs all possible relocations, including __mcount_loc
>     section.
>
>     So addrs array contains relocated values, which we need take
>     into account when we compare them to functions values which
>     are relative to their sections.
>     ...
>
> The 0x10e74 value could be relocated 4.. but it's me guessing,
> because not sure where you see that address exactly


It comes up in cu__encode_btf(): var->ip.addr is not 4, as we expect it to be.

>
> jirka
>


* Re: Per-CPU variables in modules and pahole
  2020-12-10 17:02   ` Andrii Nakryiko
@ 2020-12-10 18:28     ` Hao Luo
  2020-12-11  2:56       ` Andrii Nakryiko
  2020-12-10 23:42     ` Jiri Olsa
  1 sibling, 1 reply; 9+ messages in thread
From: Hao Luo @ 2020-12-10 18:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, bpf, dwarves, Jiri Olsa, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 9:02 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 8:43 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:
> > > Hi,
> > >
> > > I'm working on supporting per-CPU symbols in BPF/libbpf, and the
> > > prerequisite for that is BTF data for .data..percpu data section and
> > > variables inside that.
> > >
> > > Turns out, pahole doesn't currently emit any BTF information for such
> > > variables in kernel modules. And the reason why is quite confusing and
> > > I can't figure it out myself, so was hoping someone else might be able
> > > to help.
> > >
> > > To repro, you can take latest bpf-next tree and add this to
> > > bpf_testmod/bpf_testmod.c inside selftests/bpf:
> > >
> > > $ git diff bpf_testmod/bpf_testmod.c
> > >       diff --git
> > > a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > index 2df19d73ca49..b2086b798019 100644
> > > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > @@ -3,6 +3,7 @@
> > >  #include <linux/error-injection.h>
> > >  #include <linux/init.h>
> > >  #include <linux/module.h>
> > > +#include <linux/percpu-defs.h>
> > >  #include <linux/sysfs.h>
> > >  #include <linux/tracepoint.h>
> > >  #include "bpf_testmod.h"
> > > @@ -10,6 +11,10 @@
> > >  #define CREATE_TRACE_POINTS
> > >  #include "bpf_testmod-events.h"
> > >
> > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
> > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
> > > +
> > >  noinline ssize_t
> > >  bpf_testmod_test_read(struct file *file, struct kobject *kobj,
> > >                       struct bin_attribute *bin_attr,
> > >
> > > 1. So the very first issue (that I'm going to ignore for now) is that
> > > if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> > > would be ignored by the current pahole logic. So we need to fix that
> > > for modules. Adding dummy1 and dummy2 takes care of this for now,
> > > bpf_testmod_ksym_percpu has offset 4.
> >
> > I removed that addr zero check in the modules changes but when
> > collecting functions, but it's still there in collect_percpu_var
>
> Hao had some reason to skip per-cpu variables with offset 0, maybe he
> can comment on that before we change it.
>

When I initially wrote that check, I saw that there were multiple
symbols of the same name associated with a single variable, but only
one of them had a non-zero address. Besides, there were symbols that
aren't associated with any variable and have a zero address, for
example those defined via __ADDRESSABLE(sym) and __UNIQUE_ID(prefix).
There were quite a lot of them, as I remember. So I filtered out zero
addresses to speed up encoding. I noticed that on x86_64 the first
page of the percpu section is reserved, so I assumed that symbols of
normal interest should have positive addresses.
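
The effect of that filter on a module can be sketched as follows. This is
an illustration in C, not pahole's collect_percpu_var(); struct sym,
keep_percpu_sym() and count_kept() are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal view of an ELF symbol in the per-CPU section. */
struct sym {
	const char *name;
	uint64_t value; /* section-relative offset in a module */
};

/* Keep a symbol unless the zero-address filter is on and value is 0. */
static bool keep_percpu_sym(const struct sym *s, bool skip_zero)
{
	return !(skip_zero && s->value == 0);
}

/* Count how many symbols survive the filter. */
static size_t count_kept(const struct sym *syms, size_t n, bool skip_zero)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (keep_percpu_sym(&syms[i], skip_zero))
			kept++;
	return kept;
}
```

For the three ints from the test module at offsets 0, 4 and 8, the filter
keeps only two of them: in a module the per-CPU section starts at
offset 0, so the first variable is a legitimate symbol with value 0.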

>
> >
> > >
> > > 2. Second issue is more interesting. Somehow, when pahole iterates
> > > over DWARF variables, the address of bpf_testmod_ksym_percpu is
> > > reported as 0x10e74, not 4. Which totally confuses pahole because
> > > according to ELF symbols, bpf_testmod_ksym_percpu symbol has value 4.
> > > I tracked this down to dwarf_getlocation() returning 10e74 as number
> > > field in expr.
> >
> > in which place do you see that address? when I put displayed
> > address from collect_percpu_var it shows 4
>
> yes, ELF symbol's value is 4, but when iterating DWARF variables
> (0x10e70 + 4) is returned. It does look like a special handling of
> modules. I missed that libdw does some special things for specifically
> modules. Further debugging yesterday showed that 0x10e70 roughly
> corresponds to the offset of .data..per_cpu if you count all the
> allocatable data sections that come before it. So I think you are
> right. We should probably centralize the logic of kernel module
> detection so that we can handle these module vs non-module differences
> properly.
>
> >
> > not sure this is related but looks like similar issue I had to
> > solve for modules functions, as described in the changelog:
> > (not merged yet)
> >
> >     btf_encoder: Detect kernel module ftrace addresses
> >
> >     ...
> >     There's one tricky point with kernel modules wrt Elf object,
> >     which we get from dwfl_module_getelf function. This function
> >     performs all possible relocations, including __mcount_loc
> >     section.
> >
> >     So addrs array contains relocated values, which we need take
> >     into account when we compare them to functions values which
> >     are relative to their sections.
> >     ...
> >
> > The 0x10e74 value could be relocated 4.. but it's me guessing,
> > because not sure where you see that address exactly
>
>
> It comes up in cu__encode_btf(), var->ip.addr is not 4, as we expect it to be.
>
> >
> > jirka
> >


* Re: Per-CPU variables in modules and pahole
  2020-12-10 17:02   ` Andrii Nakryiko
  2020-12-10 18:28     ` Hao Luo
@ 2020-12-10 23:42     ` Jiri Olsa
  2020-12-10 23:49       ` Andrii Nakryiko
  1 sibling, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2020-12-10 23:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, dwarves, Jiri Olsa, Hao Luo, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 09:02:05AM -0800, Andrii Nakryiko wrote:

SNIP

> 
> yes, ELF symbol's value is 4, but when iterating DWARF variables
> (0x10e70 + 4) is returned. It does look like a special handling of
> modules. I missed that libdw does some special things for specifically
> modules. Further debugging yesterday showed that 0x10e70 roughly
> corresponds to the offset of .data..per_cpu if you count all the
> allocatable data sections that come before it. So I think you are
> right. We should probably centralize the logic of kernel module
> detection so that we can handle these module vs non-module differences
> properly.
> 
> >
> > not sure this is related but looks like similar issue I had to
> > solve for modules functions, as described in the changelog:
> > (not merged yet)
> >
> >     btf_encoder: Detect kernel module ftrace addresses
> >
> >     ...
> >     There's one tricky point with kernel modules wrt Elf object,
> >     which we get from dwfl_module_getelf function. This function
> >     performs all possible relocations, including __mcount_loc
> >     section.
> >
> >     So addrs array contains relocated values, which we need take
> >     into account when we compare them to functions values which
> >     are relative to their sections.
> >     ...
> >
> > The 0x10e74 value could be relocated 4.. but it's me guessing,
> > because not sure where you see that address exactly
> 
> 
> It comes up in cu__encode_btf(), var->ip.addr is not 4, as we expect it to be.

I'm taking the section sh_addr for each function and relocating
the addr value for kernel modules; check the setup_functions
function

I don't see a need for this to be centralized; it looks simple
enough to me to handle in each case
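
The per-case adjustment described above might look roughly like the
sketch below. It is not the actual setup_functions() code: struct func
and find_func() are invented for illustration, and the only assumption
taken from the discussion is that symbol values in a module are relative
to their section while the recorded addresses are already relocated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one function symbol. */
struct func {
	const char *name;
	uint64_t value;   /* st_value: section-relative in a module */
	uint64_t sh_addr; /* address assigned to the symbol's section */
};

/*
 * Find the function whose address matches an already relocated address.
 * For modules the section address is added to the symbol value first;
 * for vmlinux the symbol value is used as is.
 */
static const struct func *find_func(const struct func *funcs, size_t n,
				    uint64_t reloc_addr, bool is_module)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t addr = funcs[i].value;

		if (is_module)
			addr += funcs[i].sh_addr;
		if (addr == reloc_addr)
			return &funcs[i];
	}
	return NULL;
}
```

Without the is_module adjustment, a relocated address such as
0x4000 + 0x40 never matches a section-relative value of 0x40, which is
the same kind of mismatch as 0x10e74 vs 4 for the per-CPU variable.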

jirka



* Re: Per-CPU variables in modules and pahole
  2020-12-10 23:42     ` Jiri Olsa
@ 2020-12-10 23:49       ` Andrii Nakryiko
  2020-12-11  2:57         ` Andrii Nakryiko
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2020-12-10 23:49 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: bpf, dwarves, Jiri Olsa, Hao Luo, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 3:42 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Thu, Dec 10, 2020 at 09:02:05AM -0800, Andrii Nakryiko wrote:
>
> SNIP
>
> >
> > yes, ELF symbol's value is 4, but when iterating DWARF variables
> > (0x10e70 + 4) is returned. It does look like a special handling of
> > modules. I missed that libdw does some special things for specifically
> > modules. Further debugging yesterday showed that 0x10e70 roughly
> > corresponds to the offset of .data..per_cpu if you count all the
> > allocatable data sections that come before it. So I think you are
> > right. We should probably centralize the logic of kernel module
> > detection so that we can handle these module vs non-module differences
> > properly.
> >
> > >
> > > not sure this is related but looks like similar issue I had to
> > > solve for modules functions, as described in the changelog:
> > > (not merged yet)
> > >
> > >     btf_encoder: Detect kernel module ftrace addresses
> > >
> > >     ...
> > >     There's one tricky point with kernel modules wrt Elf object,
> > >     which we get from dwfl_module_getelf function. This function
> > >     performs all possible relocations, including __mcount_loc
> > >     section.
> > >
> > >     So addrs array contains relocated values, which we need take
> > >     into account when we compare them to functions values which
> > >     are relative to their sections.
> > >     ...
> > >
> > > The 0x10e74 value could be relocated 4.. but it's me guessing,
> > > because not sure where you see that address exactly
> >
> >
> > It comes up in cu__encode_btf(), var->ip.addr is not 4, as we expect it to be.
>
> I'm taking section sh_addr for each function and relocate
> the addr value for kernel modules, check setup_functions
> function
>
> I don't see this being somehow centralized, looks simple
> enough to me for each case

I meant centralized detection of whether we are working with a
module or vmlinux or something else. setup_functions() currently has
a very specific heuristic for that. So I'd like to extract that or come
up with some other way that won't be so function-specific
(__start_mcount_loc symbol vs __mcount_loc section).

>
> jirka
>


* Re: Per-CPU variables in modules and pahole
  2020-12-10 18:28     ` Hao Luo
@ 2020-12-11  2:56       ` Andrii Nakryiko
  2020-12-11  3:29         ` Andrii Nakryiko
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  2:56 UTC (permalink / raw)
  To: Hao Luo; +Cc: Jiri Olsa, bpf, dwarves, Jiri Olsa, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 10:29 AM Hao Luo <haoluo@google.com> wrote:
>
> On Thu, Dec 10, 2020 at 9:02 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Dec 10, 2020 at 8:43 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:
> > > > Hi,
> > > >
> > > > I'm working on supporting per-CPU symbols in BPF/libbpf, and the
> > > > prerequisite for that is BTF data for .data..percpu data section and
> > > > variables inside that.
> > > >
> > > > Turns out, pahole doesn't currently emit any BTF information for such
> > > > variables in kernel modules. And the reason why is quite confusing and
> > > > I can't figure it out myself, so was hoping someone else might be able
> > > > to help.
> > > >
> > > > To repro, you can take latest bpf-next tree and add this to
> > > > bpf_testmod/bpf_testmod.c inside selftests/bpf:
> > > >
> > > > $ git diff bpf_testmod/bpf_testmod.c
> > > >       diff --git
> > > > a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > index 2df19d73ca49..b2086b798019 100644
> > > > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > @@ -3,6 +3,7 @@
> > > >  #include <linux/error-injection.h>
> > > >  #include <linux/init.h>
> > > >  #include <linux/module.h>
> > > > +#include <linux/percpu-defs.h>
> > > >  #include <linux/sysfs.h>
> > > >  #include <linux/tracepoint.h>
> > > >  #include "bpf_testmod.h"
> > > > @@ -10,6 +11,10 @@
> > > >  #define CREATE_TRACE_POINTS
> > > >  #include "bpf_testmod-events.h"
> > > >
> > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
> > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
> > > > +
> > > >  noinline ssize_t
> > > >  bpf_testmod_test_read(struct file *file, struct kobject *kobj,
> > > >                       struct bin_attribute *bin_attr,
> > > >
> > > > 1. So the very first issue (that I'm going to ignore for now) is that
> > > > if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> > > > would be ignored by the current pahole logic. So we need to fix that
> > > > for modules. Adding dummy1 and dummy2 takes care of this for now,
> > > > bpf_testmod_ksym_percpu has offset 4.
> > >
> > > I removed that addr zero check in the modules changes but when
> > > collecting functions, but it's still there in collect_percpu_var
> >
> > Hao had some reason to skip per-cpu variables with offset 0, maybe he
> > can comment on that before we change it.
> >
>
> When I initially write that check, I see there are multiple symbols of
> the same name that associate with a single variable, but there is only
> one that has a non-zero address. Besides, there are symbols that don't
> associate to any variable and they have zero address. For example,
> those defined as __ADDRESSABLE(sym) and __UNIQUE_ID(prefix). They are
> quite a lot, I remember. So I filtered out the zero address for the
> purpose of accelerating encoding. I noticed that on x86_64, the first
> page of the percpu section is reserved, so I deem those symbols that
> are of normal interest should have positive addresses.

So I just checked my local vmlinux image, and it seems like the only
one with addr == 0 is fixed_percpu_data. Everything else that's
detected as belonging to the .data..percpu section looks sane and has
a non-zero offset.

So I think this might have been the case before we switched to using
ELF symbols and now it's not? I think I'll just drop this check and
post the patch, and I would really appreciate it if you could test it
in your environment. Does that sound ok?

>
> >
> > >
> > > >
> > > > 2. Second issue is more interesting. Somehow, when pahole iterates
> > > > over DWARF variables, the address of bpf_testmod_ksym_percpu is
> > > > reported as 0x10e74, not 4. Which totally confuses pahole because
> > > > according to ELF symbols, bpf_testmod_ksym_percpu symbol has value 4.
> > > > I tracked this down to dwarf_getlocation() returning 10e74 as number
> > > > field in expr.
> > >
> > > in which place do you see that address? when I put displayed
> > > address from collect_percpu_var it shows 4
> >
> > yes, ELF symbol's value is 4, but when iterating DWARF variables
> > (0x10e70 + 4) is returned. It does look like a special handling of
> > modules. I missed that libdw does some special things for specifically
> > modules. Further debugging yesterday showed that 0x10e70 roughly
> > corresponds to the offset of .data..per_cpu if you count all the
> > allocatable data sections that come before it. So I think you are
> > right. We should probably centralize the logic of kernel module
> > detection so that we can handle these module vs non-module differences
> > properly.
> >
> > >
> > > not sure this is related but looks like similar issue I had to
> > > solve for modules functions, as described in the changelog:
> > > (not merged yet)
> > >
> > >     btf_encoder: Detect kernel module ftrace addresses
> > >
> > >     ...
> > >     There's one tricky point with kernel modules wrt Elf object,
> > >     which we get from dwfl_module_getelf function. This function
> > >     performs all possible relocations, including __mcount_loc
> > >     section.
> > >
> > >     So addrs array contains relocated values, which we need take
> > >     into account when we compare them to functions values which
> > >     are relative to their sections.
> > >     ...
> > >
> > > The 0x10e74 value could be relocated 4.. but it's me guessing,
> > > because not sure where you see that address exactly
> >
> >
> > It comes up in cu__encode_btf(), var->ip.addr is not 4, as we expect it to be.
> >
> > >
> > > jirka
> > >


* Re: Per-CPU variables in modules and pahole
  2020-12-10 23:49       ` Andrii Nakryiko
@ 2020-12-11  2:57         ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  2:57 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: bpf, dwarves, Jiri Olsa, Hao Luo, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 3:49 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 3:42 PM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Thu, Dec 10, 2020 at 09:02:05AM -0800, Andrii Nakryiko wrote:
> >
> > SNIP
> >
> > >
> > > yes, ELF symbol's value is 4, but when iterating DWARF variables
> > > (0x10e70 + 4) is returned. It does look like a special handling of
> > > modules. I missed that libdw does some special things for specifically
> > > modules. Further debugging yesterday showed that 0x10e70 roughly
> > > corresponds to the offset of .data..per_cpu if you count all the
> > > allocatable data sections that come before it. So I think you are
> > > right. We should probably centralize the logic of kernel module
> > > detection so that we can handle these module vs non-module differences
> > > properly.
> > >
> > > >
> > > > not sure this is related but looks like similar issue I had to
> > > > solve for modules functions, as described in the changelog:
> > > > (not merged yet)
> > > >
> > > >     btf_encoder: Detect kernel module ftrace addresses
> > > >
> > > >     ...
> > > >     There's one tricky point with kernel modules wrt Elf object,
> > > >     which we get from dwfl_module_getelf function. This function
> > > >     performs all possible relocations, including __mcount_loc
> > > >     section.
> > > >
> > > >     So addrs array contains relocated values, which we need take
> > > >     into account when we compare them to functions values which
> > > >     are relative to their sections.
> > > >     ...
> > > >
> > > > The 0x10e74 value could be relocated 4.. but it's me guessing,
> > > > because not sure where you see that address exactly
> > >
> > >
> > > It comes up in cu__encode_btf(), var->ip.addr is not 4, as we expect it to be.
> >
> > I'm taking section sh_addr for each function and relocate
> > the addr value for kernel modules, check setup_functions
> > function
> >
> > I don't see this being somehow centralized, looks simple
> > enough to me for each case
>
> I meant centralized detection of whether we are working with the
> module or vmlinux or something else. setup_functions() currently has
> very specific heuristic for that. So I'd like to extract that or come
> up with some other way that won't be so function specific
> (__start_mcount_loc symbol vs __mcount_loc section).
>

This seems to be unnecessary, actually. We already record
btfe->percpu_base_addr, which is always zero for vmlinux and non-zero
for a module. So just subtracting this base addr before looking up
the ELF symbol solves the problem for me and still works for vmlinux.
So I'm going with that for now.
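
In code, the idea is roughly the following. This is a sketch in C, not
the actual patch: struct percpu_sym and lookup_percpu_sym() are
hypothetical, and only the percpu_base_addr name is borrowed from the
encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry built from ELF symbols in .data..percpu. */
struct percpu_sym {
	const char *name;
	uint64_t value; /* ELF symbol value */
};

/*
 * Translate an address reported for a DWARF variable back to an ELF
 * symbol by subtracting the per-CPU section base first. With a base of 0
 * (the vmlinux case) the address is looked up unchanged.
 */
static const struct percpu_sym *
lookup_percpu_sym(const struct percpu_sym *syms, size_t n,
		  uint64_t dwarf_addr, uint64_t percpu_base_addr)
{
	uint64_t off;
	size_t i;

	if (dwarf_addr < percpu_base_addr)
		return NULL; /* address is below the per-CPU section */

	off = dwarf_addr - percpu_base_addr;
	for (i = 0; i < n; i++)
		if (syms[i].value == off)
			return &syms[i];
	return NULL;
}
```

With a base of 0x10e70, the address 0x10e74 maps to the symbol with
value 4, i.e. bpf_testmod_ksym_percpu, and with a base of 0 an address
of 4 resolves to the same entry.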

> >
> > jirka
> >


* Re: Per-CPU variables in modules and pahole
  2020-12-11  2:56       ` Andrii Nakryiko
@ 2020-12-11  3:29         ` Andrii Nakryiko
  0 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2020-12-11  3:29 UTC (permalink / raw)
  To: Hao Luo; +Cc: Jiri Olsa, bpf, dwarves, Jiri Olsa, Arnaldo Carvalho de Melo

On Thu, Dec 10, 2020 at 6:56 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 10, 2020 at 10:29 AM Hao Luo <haoluo@google.com> wrote:
> >
> > On Thu, Dec 10, 2020 at 9:02 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Dec 10, 2020 at 8:43 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > > >
> > > > On Wed, Dec 09, 2020 at 12:53:44PM -0800, Andrii Nakryiko wrote:
> > > > > Hi,
> > > > >
> > > > > I'm working on supporting per-CPU symbols in BPF/libbpf, and the
> > > > > prerequisite for that is BTF data for .data..percpu data section and
> > > > > variables inside that.
> > > > >
> > > > > Turns out, pahole doesn't currently emit any BTF information for such
> > > > > variables in kernel modules. And the reason why is quite confusing and
> > > > > I can't figure it out myself, so was hoping someone else might be able
> > > > > to help.
> > > > >
> > > > > To repro, you can take latest bpf-next tree and add this to
> > > > > bpf_testmod/bpf_testmod.c inside selftests/bpf:
> > > > >
> > > > > $ git diff bpf_testmod/bpf_testmod.c
> > > > >       diff --git
> > > > > a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > > b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > > index 2df19d73ca49..b2086b798019 100644
> > > > > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
> > > > > @@ -3,6 +3,7 @@
> > > > >  #include <linux/error-injection.h>
> > > > >  #include <linux/init.h>
> > > > >  #include <linux/module.h>
> > > > > +#include <linux/percpu-defs.h>
> > > > >  #include <linux/sysfs.h>
> > > > >  #include <linux/tracepoint.h>
> > > > >  #include "bpf_testmod.h"
> > > > > @@ -10,6 +11,10 @@
> > > > >  #define CREATE_TRACE_POINTS
> > > > >  #include "bpf_testmod-events.h"
> > > > >
> > > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy1) = -1;
> > > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_percpu) = 123;
> > > > > +DEFINE_PER_CPU(int, bpf_testmod_ksym_dummy2) = -1;
> > > > > +
> > > > >  noinline ssize_t
> > > > >  bpf_testmod_test_read(struct file *file, struct kobject *kobj,
> > > > >                       struct bin_attribute *bin_attr,
> > > > >
> > > > > 1. So the very first issue (that I'm going to ignore for now) is that
> > > > > if I just added bpf_testmod_ksym_percpu, it would get addr == 0 and
> > > > > would be ignored by the current pahole logic. So we need to fix that
> > > > > for modules. Adding dummy1 and dummy2 takes care of this for now,
> > > > > bpf_testmod_ksym_percpu has offset 4.
> > > >
> > > > I removed that addr zero check in the modules changes but when
> > > > collecting functions, but it's still there in collect_percpu_var
> > >
> > > Hao had some reason to skip per-cpu variables with offset 0, maybe he
> > > can comment on that before we change it.
> > >
> >
> > When I initially write that check, I see there are multiple symbols of
> > the same name that associate with a single variable, but there is only
> > one that has a non-zero address. Besides, there are symbols that don't
> > associate to any variable and they have zero address. For example,
> > those defined as __ADDRESSABLE(sym) and __UNIQUE_ID(prefix). They are
> > quite a lot, I remember. So I filtered out the zero address for the
> > purpose of accelerating encoding. I noticed that on x86_64, the first
> > page of the percpu section is reserved, so I deem those symbols that
> > are of normal interest should have positive addresses.
>
> So I just checked my local vmlinux image, and seems like the only one
> with addr == 0 is fixed_percpu_data. Everything else that's detected
> as belonging to .data..percpu section looks sane and has non-zero
> offset.
>
> So I think this might have been the case before we switched to using
> ELF symbols and now it's not? I think I'll just drop this check, will
> post the patch, and would really appreciate if you can test it in your
> environment. Does that sound ok?

Ah, never mind. While ELF symbols look good, it's the DWARF variables
side where the problem is. There are lots of DWARF variables that map
to addr 0 and which are impossible to distinguish from the real
fixed_percpu_data, because we can't even rely on getting a DWARF
variable name.

I guess I'll leave it as is for now, but we should come up with some
solution, ideally.
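
To make the trade-off concrete, here is a rough sketch of what keeping the
zero-address check amounts to. It is not pahole's code: struct dwarf_var
and skip_percpu_candidate() are hypothetical names, and the only facts taken
from this thread are that many DWARF variables resolve to addr 0, that their
names cannot be relied upon, and that fixed_percpu_data legitimately sits at
addr 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a variable as read from DWARF. */
struct dwarf_var {
	const char *name;	/* may be NULL: the name is not always available */
	uint64_t addr;		/* address taken from the location expression */
};

/*
 * Conservative filter: anything at addr 0 is skipped, because without a
 * reliable name there is no way to tell a real per-CPU variable at offset 0
 * from the many variables that merely resolve to 0.
 */
static bool skip_percpu_candidate(const struct dwarf_var *var)
{
	assert(var != NULL);
	return var->addr == 0;
}
```

The cost is visible directly: a variable at addr 0 is skipped whether or
not it carries the name fixed_percpu_data, while anything at a non-zero
offset (such as bpf_testmod_ksym_percpu at 4) is kept.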

>
> >
> > >
> > > >
> > > > >
> > > > > 2. Second issue is more interesting. Somehow, when pahole iterates
> > > > > over DWARF variables, the address of bpf_testmod_ksym_percpu is
> > > > > reported as 0x10e74, not 4. Which totally confuses pahole because
> > > > > according to ELF symbols, bpf_testmod_ksym_percpu symbol has value 4.
> > > > > I tracked this down to dwarf_getlocation() returning 10e74 as number
> > > > > field in expr.
> > > >
> > > > in which place do you see that address? when I put displayed
> > > > address from collect_percpu_var it shows 4
> > >
> > > yes, ELF symbol's value is 4, but when iterating DWARF variables
> > > (0x10e70 + 4) is returned. It does look like a special handling of
> > > modules. I missed that libdw does some special things for specifically
> > > modules. Further debugging yesterday showed that 0x10e70 roughly
> > > corresponds to the offset of .data..per_cpu if you count all the
> > > allocatable data sections that come before it. So I think you are
> > > right. We should probably centralize the logic of kernel module
> > > detection so that we can handle these module vs non-module differences
> > > properly.
> > >
> > > >
> > > > not sure this is related but looks like similar issue I had to
> > > > solve for modules functions, as described in the changelog:
> > > > (not merged yet)
> > > >
> > > >     btf_encoder: Detect kernel module ftrace addresses
> > > >
> > > >     ...
> > > >     There's one tricky point with kernel modules wrt Elf object,
> > > >     which we get from dwfl_module_getelf function. This function
> > > >     performs all possible relocations, including __mcount_loc
> > > >     section.
> > > >
> > > >     So addrs array contains relocated values, which we need take
> > > >     into account when we compare them to functions values which
> > > >     are relative to their sections.
> > > >     ...
> > > >
> > > > The 0x10e74 value could be relocated 4.. but it's me guessing,
> > > > because not sure where you see that address exactly
> > >
> > >
> > > It comes up in cu__encode_btf(), var->ip.addr is not 4, as we expect it to be.
> > >
> > > >
> > > > jirka
> > > >


end of thread, other threads:[~2020-12-11  3:31 UTC | newest]

Thread overview: 9+ messages
2020-12-09 20:53 Per-CPU variables in modules and pahole Andrii Nakryiko
2020-12-10 16:43 ` Jiri Olsa
2020-12-10 17:02   ` Andrii Nakryiko
2020-12-10 18:28     ` Hao Luo
2020-12-11  2:56       ` Andrii Nakryiko
2020-12-11  3:29         ` Andrii Nakryiko
2020-12-10 23:42     ` Jiri Olsa
2020-12-10 23:49       ` Andrii Nakryiko
2020-12-11  2:57         ` Andrii Nakryiko
