Re: [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf

netdev.vger.kernel.org archive mirror

* Re: [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf
       [not found] ` <20200612223150.1177182-9-andriin@fb.com>
@ 2020-06-13  3:45   ` Alexei Starovoitov
  2020-06-13  5:57     ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2020-06-13  3:45 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, andrii.nakryiko, kernel-team, Hao Luo,
	Arnaldo Carvalho de Melo, Song Liu, Quentin Monnet

On Fri, Jun 12, 2020 at 03:31:50PM -0700, Andrii Nakryiko wrote:
> Add bpf_iter-based way to find all the processes that hold open FDs against
> BPF object (map, prog, link, btf). Add new flag (-o, for "ownership", given
> -p is already taken) to trigger collection and output of these PIDs.
> 
> Sample output for each of 4 BPF objects:
> 
> $ sudo ./bpftool -o prog show
> 1992: cgroup_skb  name egress_alt  tag 9ad187367cf2b9e8  gpl
>         loaded_at 2020-06-12T14:18:10-0700  uid 0
>         xlated 48B  jited 59B  memlock 4096B  map_ids 2074
>         btf_id 460
>         pids: 913709,913732,913733,913734
> 2062: cgroup_device  tag 8c42dee26e8cd4c2  gpl
>         loaded_at 2020-06-12T14:37:52-0700  uid 0
>         xlated 648B  jited 409B  memlock 4096B
>         pids: 1
> 
> $ sudo ./bpftool -o map show
> 2074: array  name test_cgr.bss  flags 0x400
>         key 4B  value 8B  max_entries 1  memlock 8192B
>         btf_id 460
>         pids: 913709,913732,913733,913734
> 
> $ sudo ./bpftool -o link show
> 82: cgroup  prog 1992
>         cgroup_id 0  attach_type egress
>         pids: 913709,913732,913733,913734
> 86: cgroup  prog 1992
>         cgroup_id 0  attach_type egress
>         pids: 913709,913732,913733,913734

This is awesome.

Why extra flag though? I think it's so useful that everyone would want to see
this by default. Also the word 'pid' has kernel meaning or user space meaning?
Looks like kernel then bpftool should say 'tid'.
Could you capture comm as well and sort it by comm, like:

$ sudo ./bpftool link show
82: cgroup  prog 1992
        cgroup_id 0  attach_type egress
        systemd(1), firewall(913709 913732), logger(913733 913734)


* Re: [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf
  2020-06-13  3:45   ` [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf Alexei Starovoitov
@ 2020-06-13  5:57     ` Andrii Nakryiko
  2020-06-13 22:14       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2020-06-13  5:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Hao Luo, Arnaldo Carvalho de Melo,
	Song Liu, Quentin Monnet

On Fri, Jun 12, 2020 at 8:45 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jun 12, 2020 at 03:31:50PM -0700, Andrii Nakryiko wrote:
> > Add bpf_iter-based way to find all the processes that hold open FDs against
> > BPF object (map, prog, link, btf). Add new flag (-o, for "ownership", given
> > -p is already taken) to trigger collection and output of these PIDs.
> >
> > Sample output for each of 4 BPF objects:
> >
> > $ sudo ./bpftool -o prog show
> > 1992: cgroup_skb  name egress_alt  tag 9ad187367cf2b9e8  gpl
> >         loaded_at 2020-06-12T14:18:10-0700  uid 0
> >         xlated 48B  jited 59B  memlock 4096B  map_ids 2074
> >         btf_id 460
> >         pids: 913709,913732,913733,913734
> > 2062: cgroup_device  tag 8c42dee26e8cd4c2  gpl
> >         loaded_at 2020-06-12T14:37:52-0700  uid 0
> >         xlated 648B  jited 409B  memlock 4096B
> >         pids: 1
> >
> > $ sudo ./bpftool -o map show
> > 2074: array  name test_cgr.bss  flags 0x400
> >         key 4B  value 8B  max_entries 1  memlock 8192B
> >         btf_id 460
> >         pids: 913709,913732,913733,913734
> >
> > $ sudo ./bpftool -o link show
> > 82: cgroup  prog 1992
> >         cgroup_id 0  attach_type egress
> >         pids: 913709,913732,913733,913734
> > 86: cgroup  prog 1992
> >         cgroup_id 0  attach_type egress
> >         pids: 913709,913732,913733,913734
>
> This is awesome.

Thanks.

>
> Why extra flag though? I think it's so useful that everyone would want to see

No good reason apart from "being safe by default". If turned on by
default, bpftool would need to probe for bpf_iter support first. I can
add probing and do this by default.

> this by default. Also the word 'pid' has kernel meaning or user space meaning?
> Looks like kernel then bpftool should say 'tid'.

No, it's the process ID in the user-space sense. See task->tgid in
pid_iter.bpf.c. I figured the thread ID isn't all that useful.
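
To illustrate the naming, here is a minimal sketch. `struct task_ids` and
both helpers are hypothetical stand-ins, not the code in pid_iter.bpf.c;
they only mirror the kernel convention that task->pid identifies a single
thread while task->tgid is what user space calls the PID.

```c
/* Hypothetical stand-in for the two ID fields of the kernel's task_struct. */
struct task_ids {
	int pid;   /* kernel "pid": unique per thread, i.e. the user-space TID */
	int tgid;  /* thread group ID: shared by all threads, i.e. the user-space PID */
};

/* What getpid() would report for this task. */
int user_space_pid(const struct task_ids *t)
{
	return t->tgid;
}

/* What gettid() would report for this task. */
int user_space_tid(const struct task_ids *t)
{
	return t->pid;
}
```

Two threads of one process differ in user_space_tid() but agree on
user_space_pid(), which is why reporting tgid gives one entry per process.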

> Could you capture comm as well and sort it by comm, like:
>
> $ sudo ./bpftool link show
> 82: cgroup  prog 1992
>         cgroup_id 0  attach_type egress
>         systemd(1), firewall(913709 913732), logger(913733 913734)

Yep, comm is useful, I'll add that. Grouping by comm is kind of a
pain, though, plus usually there will be one process only. So let me
start with doing comm (pid) for each PID independently. I think that
will be as good in practice.
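
As a rough sketch of the per-PID output described above, assuming the
"comm(pid)" shape from the sample in the quoted suggestion: format_owner()
is a hypothetical helper name, not a function from the patch.

```c
#include <stdio.h>

/*
 * Format one owner entry as "comm(pid)" into buf.
 * Returns the number of characters that the full entry needs
 * (snprintf semantics), so callers can detect truncation.
 */
int format_owner(char *buf, size_t size, const char *comm, int pid)
{
	return snprintf(buf, size, "%s(%d)", comm, pid);
}
```

Emitting each entry independently avoids having to sort and group PIDs by
comm, at the cost of repeating the comm when several processes share it.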


* Re: [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf
  2020-06-13  5:57     ` Andrii Nakryiko
@ 2020-06-13 22:14       ` Arnaldo Carvalho de Melo
  2020-06-15  9:04         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-06-13 22:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Hao Luo,
	Song Liu, Quentin Monnet

Em Fri, Jun 12, 2020 at 10:57:59PM -0700, Andrii Nakryiko escreveu:
> On Fri, Jun 12, 2020 at 8:45 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Jun 12, 2020 at 03:31:50PM -0700, Andrii Nakryiko wrote:
> > > Add bpf_iter-based way to find all the processes that hold open FDs against
> > > BPF object (map, prog, link, btf). Add new flag (-o, for "ownership", given
> > > -p is already taken) to trigger collection and output of these PIDs.
> > >
> > > Sample output for each of 4 BPF objects:
> > >
> > > $ sudo ./bpftool -o prog show
> > > 1992: cgroup_skb  name egress_alt  tag 9ad187367cf2b9e8  gpl
> > >         loaded_at 2020-06-12T14:18:10-0700  uid 0
> > >         xlated 48B  jited 59B  memlock 4096B  map_ids 2074
> > >         btf_id 460
> > >         pids: 913709,913732,913733,913734
> > > 2062: cgroup_device  tag 8c42dee26e8cd4c2  gpl
> > >         loaded_at 2020-06-12T14:37:52-0700  uid 0
> > >         xlated 648B  jited 409B  memlock 4096B
> > >         pids: 1
> > >
> > > $ sudo ./bpftool -o map show
> > > 2074: array  name test_cgr.bss  flags 0x400
> > >         key 4B  value 8B  max_entries 1  memlock 8192B
> > >         btf_id 460
> > >         pids: 913709,913732,913733,913734
> > >
> > > $ sudo ./bpftool -o link show
> > > 82: cgroup  prog 1992
> > >         cgroup_id 0  attach_type egress
> > >         pids: 913709,913732,913733,913734
> > > 86: cgroup  prog 1992
> > >         cgroup_id 0  attach_type egress
> > >         pids: 913709,913732,913733,913734
> >
> > This is awesome.

Indeed.
 
> Thanks.
> 
> >
> > Why extra flag though? I think it's so useful that everyone would want to see

Agreed.
 
> No good reason apart from "being safe by default". If turned on by
> default, bpftool would need to probe for bpf_iter support first. I can
> add probing and do this by default.

I think this is the way to go.
 
> > this by default. Also the word 'pid' has kernel meaning or user space meaning?
> > Looks like kernel then bpftool should say 'tid'.
> 
> No, it's the process ID in the user-space sense. See task->tgid in
> pid_iter.bpf.c. I figured the thread ID isn't all that useful.
> 
> > Could you capture comm as well and sort it by comm, like:
> >
> > $ sudo ./bpftool link show
> > 82: cgroup  prog 1992
> >         cgroup_id 0  attach_type egress
> >         systemd(1), firewall(913709 913732), logger(913733 913734)
> 
> Yep, comm is useful, I'll add that. Grouping by comm is kind of a
> pain, though, plus usually there will be one process only. So let me
> start with doing comm (pid) for each PID independently. I think that
> will be as good in practice.

-- 

- Arnaldo


* Re: [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf
  2020-06-13 22:14       ` Arnaldo Carvalho de Melo
@ 2020-06-15  9:04         ` Toke Høiland-Jørgensen
  2020-06-15  9:30           ` Quentin Monnet
  0 siblings, 1 reply; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-15  9:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Andrii Nakryiko
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Hao Luo,
	Song Liu, Quentin Monnet

Arnaldo Carvalho de Melo <acme@kernel.org> writes:

> Em Fri, Jun 12, 2020 at 10:57:59PM -0700, Andrii Nakryiko escreveu:
>> On Fri, Jun 12, 2020 at 8:45 PM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> >
>> > On Fri, Jun 12, 2020 at 03:31:50PM -0700, Andrii Nakryiko wrote:
>> > > Add bpf_iter-based way to find all the processes that hold open FDs against
>> > > BPF object (map, prog, link, btf). Add new flag (-o, for "ownership", given
>> > > -p is already taken) to trigger collection and output of these PIDs.
>> > >
>> > > Sample output for each of 4 BPF objects:
>> > >
>> > > $ sudo ./bpftool -o prog show
>> > > 1992: cgroup_skb  name egress_alt  tag 9ad187367cf2b9e8  gpl
>> > >         loaded_at 2020-06-12T14:18:10-0700  uid 0
>> > >         xlated 48B  jited 59B  memlock 4096B  map_ids 2074
>> > >         btf_id 460
>> > >         pids: 913709,913732,913733,913734
>> > > 2062: cgroup_device  tag 8c42dee26e8cd4c2  gpl
>> > >         loaded_at 2020-06-12T14:37:52-0700  uid 0
>> > >         xlated 648B  jited 409B  memlock 4096B
>> > >         pids: 1
>> > >
>> > > $ sudo ./bpftool -o map show
>> > > 2074: array  name test_cgr.bss  flags 0x400
>> > >         key 4B  value 8B  max_entries 1  memlock 8192B
>> > >         btf_id 460
>> > >         pids: 913709,913732,913733,913734
>> > >
>> > > $ sudo ./bpftool -o link show
>> > > 82: cgroup  prog 1992
>> > >         cgroup_id 0  attach_type egress
>> > >         pids: 913709,913732,913733,913734
>> > > 86: cgroup  prog 1992
>> > >         cgroup_id 0  attach_type egress
>> > >         pids: 913709,913732,913733,913734
>> >
>> > This is awesome.
>
> Indeed.
>  
>> Thanks.
>> 
>> >
>> > Why extra flag though? I think it's so useful that everyone would want to see
>
> Agreed.
>  
>> No good reason apart from "being safe by default". If turned on by
>> default, bpftool would need to probe for bpf_iter support first. I can
>> add probing and do this by default.
>
> I think this is the way to go.

+1

And also +1 on the awesomeness of this feature! :)

-Toke



* Re: [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf
  2020-06-15  9:04         ` Toke Høiland-Jørgensen
@ 2020-06-15  9:30           ` Quentin Monnet
  0 siblings, 0 replies; 12+ messages in thread
From: Quentin Monnet @ 2020-06-15  9:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, Arnaldo Carvalho de Melo,
	Alexei Starovoitov, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Hao Luo,
	Song Liu

2020-06-15 11:04 UTC+0200 ~ Toke Høiland-Jørgensen <toke@redhat.com>
> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
> 
>> Em Fri, Jun 12, 2020 at 10:57:59PM -0700, Andrii Nakryiko escreveu:
>>> On Fri, Jun 12, 2020 at 8:45 PM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>> On Fri, Jun 12, 2020 at 03:31:50PM -0700, Andrii Nakryiko wrote:
>>>>> Add bpf_iter-based way to find all the processes that hold open FDs against
>>>>> BPF object (map, prog, link, btf). Add new flag (-o, for "ownership", given
>>>>> -p is already taken) to trigger collection and output of these PIDs.
>>>>>
>>>>> Sample output for each of 4 BPF objects:
>>>>>
>>>>> $ sudo ./bpftool -o prog show
>>>>> 1992: cgroup_skb  name egress_alt  tag 9ad187367cf2b9e8  gpl
>>>>>         loaded_at 2020-06-12T14:18:10-0700  uid 0
>>>>>         xlated 48B  jited 59B  memlock 4096B  map_ids 2074
>>>>>         btf_id 460
>>>>>         pids: 913709,913732,913733,913734
>>>>> 2062: cgroup_device  tag 8c42dee26e8cd4c2  gpl
>>>>>         loaded_at 2020-06-12T14:37:52-0700  uid 0
>>>>>         xlated 648B  jited 409B  memlock 4096B
>>>>>         pids: 1
>>>>>
>>>>> $ sudo ./bpftool -o map show
>>>>> 2074: array  name test_cgr.bss  flags 0x400
>>>>>         key 4B  value 8B  max_entries 1  memlock 8192B
>>>>>         btf_id 460
>>>>>         pids: 913709,913732,913733,913734
>>>>>
>>>>> $ sudo ./bpftool -o link show
>>>>> 82: cgroup  prog 1992
>>>>>         cgroup_id 0  attach_type egress
>>>>>         pids: 913709,913732,913733,913734
>>>>> 86: cgroup  prog 1992
>>>>>         cgroup_id 0  attach_type egress
>>>>>         pids: 913709,913732,913733,913734
>>>>
>>>> This is awesome.
>>
>> Indeed.
>>  
>>> Thanks.
>>>
>>>>
>>>> Why extra flag though? I think it's so useful that everyone would want to see
>>
>> Agreed.
>>  
>>> No good reason apart from "being safe by default". If turned on by
>>> default, bpftool would need to probe for bpf_iter support first. I can
>>> add probing and do this by default.
>>
>> I think this is the way to go.
> 
> +1
> 
> And also +1 on the awesomeness of this feature! :)
> 
> -Toke
> 

Thanks a lot Andrii, the feature looks great indeed.

Thank you for the clean-up and refactoring in bpftool and its Makefile
as well, I am happy to confirm that test_bpftool_build.sh still passes
on my setup with your changes.

Quentin


* Re: [RFC PATCH bpf-next 1/8] libbpf: generalize libbpf externs support
       [not found]   ` <CA+khW7hAYVdoQX5-j0z1iGEVZeww4BBu4NXzy5eS5OwDRYqe2w@mail.gmail.com>
@ 2020-06-15 18:55     ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2020-06-15 18:55 UTC (permalink / raw)
  To: Hao Luo
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet

On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
>
> Andrii,
>
> Thanks for this patch, it looks very nice! Decoupling kconfig from generic externs is much needed.
>
> On Fri, Jun 12, 2020 at 3:34 PM Andrii Nakryiko <andriin@fb.com> wrote:
>>
>> Switch existing Kconfig externs to be just one of few possible kinds of more
>> generic externs. This refactoring is in preparation for ksymbol extern
>> support, added in the follow up patch. There are no functional changes
>> intended.
>>
>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>> ---
>>  tools/lib/bpf/libbpf.c | 332 ++++++++++++++++++++++++-----------------
>>  1 file changed, 199 insertions(+), 133 deletions(-)
>>

[...]

>> @@ -1443,12 +1454,12 @@ static int set_ext_value_tri(struct extern_desc *ext, void *ext_val,
>>                 else /* value == 'n' */
>>                         *(enum libbpf_tristate *)ext_val = TRI_NO;
>>                 break;
>> -       case EXT_CHAR:
>> +       case KCFG_CHAR:
>>                 *(char *)ext_val = value;
>>                 break;
>> -       case EXT_UNKNOWN:
>> -       case EXT_INT:
>> -       case EXT_CHAR_ARR:
>> +       case KCFG_UNKNOWN:
>> +       case KCFG_INT:
>> +       case KCFG_CHAR_ARR:
>>         default:
>>                 pr_warn("extern %s=%c should be bool, tristate, or char\n",
>>                         ext->name, value);
>
>
> Very minor: pr_warn("kconfig extern ..."); I noticed you have one similar message changed below.
>

yeah, good catch, I'll update

>>
>> @@ -1458,12 +1469,12 @@ static int set_ext_value_tri(struct extern_desc *ext, void *ext_val,
>>         return 0;
>>  }
>>

for the future, please cut irrelevant parts of the patch, makes it
easier to see where your replies are

[...]


* Re: [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses
       [not found]   ` <CA+khW7hFZzp_K_xydSFw0O3LYB22_fC=Z4wG7i9Si+phGHn4cQ@mail.gmail.com>
@ 2020-06-15 19:08     ` Andrii Nakryiko
  2020-06-16  8:05       ` Hao Luo
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2020-06-15 19:08 UTC (permalink / raw)
  To: Hao Luo
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet

On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
>
> Thanks, Andrii,
>
> This change looks nice! A couple of comments:
>
> 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'?

That choice is a very deliberate one. `extern const void` is the right
way in the C language to access linker-generated symbols, for instance,
which is quite similar to what the intent is here. Having void type makes
it very explicit that you don't know/care about the value pointed to by
the extern address; the only operation you can perform is to get its
address.

Once we add kernel variables support, that's when types will start to
be specified and libbpf will do extra checks (type matching) and extra
work (generating ldimm64 with BTF ID, for instance), to allow C code
to access data pointed to by extern address.

Switching type to u64 would be misleading in allowing C code to
implicitly dereference value of extern. E.g., there is a big
difference between:

extern u64 bla;

printf("%lld\n", bla); /* de-reference happens here, we get contents
of memory pointed to by "bla" symbol */

printf("%p\n", &bla); /* here we get value of linker symbol/address of
extern variable */

Currently I explicitly support only the latter and want to prevent the
former, until we have kernel variables in BTF. Using `extern void`
makes compiler enforce that only the &bla form is allowed. Everything
else is compilation error.
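
The two forms can be contrasted in a small user-space sketch. It is only
an illustration under stated assumptions: `bla` is defined locally as a
stand-in with real storage (in the actual use case the symbol lives in the
kernel), and read_contents() and take_address() are hypothetical names.

```c
#include <stdint.h>

/* Local stand-in with real storage; a kernel symbol would not be defined here. */
uint64_t bla = 42;

/* Form 1: reads the memory that the symbol names (an implicit dereference). */
uint64_t read_contents(void)
{
	return bla;
}

/* Form 2: yields the symbol's address, without touching the memory behind it. */
uintptr_t take_address(void)
{
	return (uintptr_t)&bla;
}
```

With a declaration of the `extern const void bla;` kind, only the second
function would be expected to compile, which is the restriction described
above.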

> 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size?

That's a bit of a hack on my part. Variable needs to point to some
type, which size will match the size of datasec's varinfo entry. This
is checked and enforced by kernel. I'm looking for 4-byte int, because
it's almost guaranteed that it will be present in program's BTF and I
won't have to explicitly add it (it's because all BPF programs return
int, so it must be in program's BTF already). While 8-byte long is
less likely to be there.

In the future, if we have a nicer way to extend BTF (and we will
soon), we can do this a bit better, but either way that .ksyms DATASEC
type isn't used for anything (there is no map with that DATASEC as a
value type), so it doesn't matter.

>
> Hao
>
> On Fri, Jun 12, 2020 at 3:35 PM Andrii Nakryiko <andriin@fb.com> wrote:
>>
>> Add support for another (in addition to existing Kconfig) special kind of
>> externs in BPF code, kernel symbol externs. Such externs allow BPF code to
>> "know" kernel symbol address and either use it for comparisons with kernel
>> data structures (e.g., struct file's f_op pointer, to distinguish different
>> kinds of file), or, with the help of bpf_probe_read_kernel(), to follow
>> pointers and read data from global variables. Kernel symbol addresses are
>> found through /proc/kallsyms, which should be present in the system.
>>
>> Currently, such kernel symbol variables are typeless: they have to be defined
>> as `extern const void <symbol>` and the only operation you can do (in C code)
>> with them is to take its address. Such extern should reside in a special
>> section '.ksyms'. bpf_helpers.h header provides __ksym macro for this. Strong
>> vs weak semantics stays the same as with Kconfig externs. If symbol is not
>> found in /proc/kallsyms, this will be a failure for strong (non-weak) extern,
>> but will be defaulted to 0 for weak externs.
>>
>> If the same symbol is defined multiple times in /proc/kallsyms, then it will
>> be error if any of the associated addresses differs. In that case, address is
>> ambiguous, so libbpf falls on the side of caution, rather than confusing user
>> with randomly chosen address.
>>
>> In the future, once kernel is extended with variables BTF information, such
>> ksym externs will be supported in a typed version, which will allow BPF
>> program to read variable's contents directly, similarly to how it's done for
>> fentry/fexit input arguments.
>>
>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>> ---
>>  tools/lib/bpf/bpf_helpers.h |   1 +
>>  tools/lib/bpf/btf.h         |   5 ++
>>  tools/lib/bpf/libbpf.c      | 138 ++++++++++++++++++++++++++++++++++--
>>  3 files changed, 139 insertions(+), 5 deletions(-)
>>

[...]

>>
>>  enum extern_type {
>>         EXT_UNKNOWN,
>> +       EXT_KSYM,
>>
>>         EXT_KCFG,
>>  };
>
>
> Minor, let EXT_KSYM come after EXT_KCFG.

I wanted ksym externs to go before KCFG ones, but not sure why. I'll
double check, I don't think it should matter.

>
>>
>>

[...]

>> +static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
>> +{
>> +       char sym_type, sym_name[256];
>> +       unsigned long sym_addr;
>> +       struct extern_desc *ext;
>> +       int ret, err = 0;
>> +       FILE *f;
>> +
>> +       f = fopen("/proc/kallsyms", "r");
>> +       if (!f) {
>> +               err = -errno;
>> +               pr_warn("failed to open /proc/kallsyms: %d\n", err);
>> +               return err;
>> +       }
>> +
>> +       while (true) {
>> +               ret = fscanf(f, "%lx %c %s%*[^\n]\n",
>> +                            &sym_addr, &sym_type, sym_name);
>
>
> Maybe better follow the existing pattern in kernel (scripts/kallsyms.c https://github.com/torvalds/linux/blob/master/scripts/kallsyms.c#L177)


oh, didn't know about this "%499s" trick, will change.
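
For reference, a minimal sketch of the width-limited conversion, applied to
a single line with sscanf() rather than to the file stream. The 500-byte
buffer, SYM_NAME_MAX, and parse_kallsyms_line() are assumptions made for
this example, not names from the patch.

```c
#include <stdio.h>

#define SYM_NAME_MAX 500 /* assumed buffer size: 499 characters plus the NUL */

/*
 * Parse one "<hex address> <type> <name>" line in /proc/kallsyms style.
 * The "%499s" width keeps sscanf() from writing past the end of name.
 * Returns 0 on success, -1 if the line does not contain all three fields.
 */
int parse_kallsyms_line(const char *line, unsigned long long *addr,
			char *type, char name[SYM_NAME_MAX])
{
	int n = sscanf(line, "%llx %c %499s", addr, type, name);

	return n == 3 ? 0 : -1;
}
```

Anything after the symbol name, such as a trailing module tag, is simply
left unparsed by this format string.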

>
>>
>> +               if (ret == EOF && feof(f))
>> +                       break;
>> +               if (ret != 3) {
>> +                       err = -EINVAL;
>> +                       goto out;
>> +               }
>> +

[...]


* Re: [RFC PATCH bpf-next 3/8] selftests/bpf: add __ksym extern selftest
       [not found]   ` <CA+khW7jxdS1KRpk2syVGjDqbyn3wAd3Eh_LEMAEhkPUehuXMwg@mail.gmail.com>
@ 2020-06-15 19:11     ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2020-06-15 19:11 UTC (permalink / raw)
  To: Hao Luo
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet

On Mon, Jun 15, 2020 at 9:45 AM Hao Luo <haoluo@google.com> wrote:
>
> Andrii, a couple of general comments on fixed_percpu_data.
>
> I think it would be better to check the existence of fixed_percpu_data in kallsyms first. If it's not there, just skip, or maybe warn but not fail.

fixed_percpu_data is always there, but I missed the fact that it's
x86-specific one. I'll switch to some bpf-specific symbol (e.g., like
bpf_prog_fops or something along those lines).

>
> Further, if we really want to be sure that  fixed_percpu_data is the first percpu var, we can read the value of __per_cpu_start, which marks the beginning address of the percpu section. Checking the address of fixed_percpu_data against __per_cpu_start rather than 0 should be more robust, I think, given that fixed_percpu_data exists.

There are assertions in Linux sources that fixed_percpu_data is 0, so
I don't think that is necessary. But it's a moot point, as I'll use
something less x86-specific.

>
> Hao
>
> On Fri, Jun 12, 2020 at 3:35 PM Andrii Nakryiko <andriin@fb.com> wrote:
>>
>> Validate libbpf is able to handle weak and strong kernel symbol externs in BPF
>> code correctly.
>>
>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>> ---
>>  .../testing/selftests/bpf/prog_tests/ksyms.c  | 71 +++++++++++++++++++
>>  .../testing/selftests/bpf/progs/test_ksyms.c  | 32 +++++++++
>>  2 files changed, 103 insertions(+)
>>  create mode 100644 tools/testing/selftests/bpf/prog_tests/ksyms.c
>>  create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms.c
>>

[...]


* Re: [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses
  2020-06-15 19:08     ` [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses Andrii Nakryiko
@ 2020-06-16  8:05       ` Hao Luo
  2020-06-17  1:24         ` Hao Luo
  0 siblings, 1 reply; 12+ messages in thread
From: Hao Luo @ 2020-06-16  8:05 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet

On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
> >
> > Thanks, Andrii,
> >
> > This change looks nice! A couple of comments:
> >
> > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'?
>
> That choice is a very deliberate one. `extern const void` is the right
> way in the C language to access linker-generated symbols, for instance,
> which is quite similar to what the intent is here. Having void type makes
> it very explicit that you don't know/care about the value pointed to by
> the extern address; the only operation you can perform is to get its
> address.
>
> Once we add kernel variables support, that's when types will start to
> be specified and libbpf will do extra checks (type matching) and extra
> work (generating ldimm64 with BTF ID, for instance), to allow C code
> to access data pointed to by extern address.
>
> Switching type to u64 would be misleading in allowing C code to
> implicitly dereference value of extern. E.g., there is a big
> difference between:
>
> extern u64 bla;
>
> printf("%lld\n", bla); /* de-reference happens here, we get contents
> of memory pointed to by "bla" symbol */
>
> printf("%p\n", &bla); /* here we get value of linker symbol/address of
> extern variable */
>
> Currently I explicitly support only the latter and want to prevent the
> former, until we have kernel variables in BTF. Using `extern void`
> makes compiler enforce that only the &bla form is allowed. Everything
> else is compilation error.
>

Ah, I see. I've been taking the extern variable as an actual variable
that contains the symbol's address, which is the first case. Your
approach makes sense. Thanks for explaining.

> > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size?
>
> That's a bit of a hack on my part. Variable needs to point to some
> type, which size will match the size of datasec's varinfo entry. This
> is checked and enforced by kernel. I'm looking for 4-byte int, because
> it's almost guaranteed that it will be present in program's BTF and I
> won't have to explicitly add it (it's because all BPF programs return
> int, so it must be in program's BTF already). While 8-byte long is
> less likely to be there.
>
> In the future, if we have a nicer way to extend BTF (and we will
> soon), we can do this a bit better, but either way that .ksyms DATASEC
> type isn't used for anything (there is no map with that DATASEC as a
> value type), so it doesn't matter.
>

Thanks for explaining, Andrii.

These explanations as comments in the code would be quite helpful, IMHO.


* Re: [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses
  2020-06-16  8:05       ` Hao Luo
@ 2020-06-17  1:24         ` Hao Luo
  2020-06-17  1:36           ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Hao Luo @ 2020-06-17  1:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet

Andrii,

Do you think we need to put the kernel's variables in one single
DATASEC in vmlinux BTF? It looks like all the ksyms in the program
will be under one ".ksyms" section, so we are not able to tell whether
a ksym is from a percpu section or a .rodata section. Without this
information, if the vmlinux has multiple DATASECs, the loader may need
to traverse all of them. If vmlinux BTF has only one DATASEC, it
matches the object's BTF better.

Right now, the percpu vars are in a ".data..percpu" DATASEC in my
patch and the plan seems that we will introduce more DATASECs to hold
other data.

Please let me know your insights here. Thanks.

Hao

On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote:
>
> On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
> > >
> > > Thanks, Andrii,
> > >
> > > This change looks nice! A couple of comments:
> > >
> > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'?
> >
> > That choice is a very deliberate one. `extern const void` is the right
> > way in the C language to access linker-generated symbols, for instance,
> > which is quite similar to what the intent is here. Having void type makes
> > it very explicit that you don't know/care about the value pointed to by
> > the extern address; the only operation you can perform is to get its
> > address.
> >
> > Once we add kernel variables support, that's when types will start to
> > be specified and libbpf will do extra checks (type matching) and extra
> > work (generating ldimm64 with BTF ID, for instance), to allow C code
> > to access data pointed to by extern address.
> >
> > Switching type to u64 would be misleading in allowing C code to
> > implicitly dereference value of extern. E.g., there is a big
> > difference between:
> >
> > extern u64 bla;
> >
> > printf("%lld\n", bla); /* de-reference happens here, we get contents
> > of memory pointed to by "bla" symbol */
> >
> > printf("%p\n", &bla); /* here we get value of linker symbol/address of
> > extern variable */
> >
> > Currently I explicitly support only the latter and want to prevent the
> > former, until we have kernel variables in BTF. Using `extern void`
> > makes compiler enforce that only the &bla form is allowed. Everything
> > else is compilation error.
> >
>
> Ah, I see. I've been taking the extern variable as an actual variable
> that contains the symbol's address, which is the first case. Your
> approach makes sense. Thanks for explaining.
>
> > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size?
> >
> > > That's a bit of a hack on my part. Variable needs to point to some
> > > type, whose size will match the size of datasec's varinfo entry. This
> > > is checked and enforced by kernel. I'm looking for 4-byte int, because
> > > it's almost guaranteed that it will be present in program's BTF and I
> > > won't have to explicitly add it (it's because all BPF programs return
> > > int, so it must be in program's BTF already), while 8-byte long is
> > > less likely to be there.
> >
> > In the future, if we have a nicer way to extend BTF (and we will
> > soon), we can do this a bit better, but either way that .ksyms DATASEC
> > type isn't used for anything (there is no map with that DATASEC as a
> > value type), so it doesn't matter.
> >
>
> Thanks for explaining, Andrii.
>
> These explanations as comments in the code would be quite helpful, IMHO.


* Re: [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses
  2020-06-17  1:24         ` Hao Luo
@ 2020-06-17  1:36           ` Andrii Nakryiko
  2020-06-18  7:53             ` Hao Luo
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2020-06-17  1:36 UTC (permalink / raw)
  To: Hao Luo
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet

On Tue, Jun 16, 2020 at 6:24 PM Hao Luo <haoluo@google.com> wrote:
>
> Andrii,
>
> Do you think we need to put the kernel's variables in one single
> DATASEC in vmlinux BTF? It looks like all the ksyms in the program
> will be under one ".ksyms" section, so we are not able to tell whether
> a ksym is from a percpu section or a .rodata section. Without this
> information, if the vmlinux has multiple DATASECs, the loader may need
> to traverse all of them. If vmlinux BTF has only one DATASEC, it
> matches the object's BTF better.
>
> Right now, the percpu vars are in a ".data..percpu" DATASEC in my
> patch and the plan seems that we will introduce more DATASECs to hold
> other data.
>
> Please let me know your insights here. Thanks.

I think we should keep original DATASECs in vmlinux's BTF, so that
they match ELF sections. Otherwise BTF is going to lie and will cause
confusion down the road in the longer term.

On the BPF program side, though, I think we'll limit it to just two
special sections: .ksyms and .ksyms.percpu. libbpf will have to
traverse all vmlinux DATASECs to find corresponding variables, but
that's ok, it has to do the linear scan either way.
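
To illustrate the linear scan mentioned above, here is a minimal,
self-contained sketch. It is not libbpf code: the toy_* structs and
functions are hypothetical stand-ins for BTF DATASEC contents, and the
symbol names and IDs used with it are made up for illustration. The
idea is only that the program names a symbol, the loader walks every
vmlinux DATASEC until it finds a matching variable, and the section it
was found in tells us whether it is percpu.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a variable listed in a DATASEC. */
struct toy_var {
	const char *name;
	unsigned int btf_id;	/* made-up type ID, for illustration only */
};

/* Hypothetical, simplified model of one vmlinux DATASEC. */
struct toy_datasec {
	const char *name;	/* e.g. ".rodata" or ".data..percpu" */
	const struct toy_var *vars;
	size_t nr_vars;
};

/*
 * Walk all DATASECs in order and return the first variable whose name
 * matches. If sec_out is non-NULL, it receives the containing section.
 * Returns NULL when no section has such a variable.
 */
static const struct toy_var *
toy_find_ksym(const struct toy_datasec *secs, size_t nr_secs,
	      const char *name, const struct toy_datasec **sec_out)
{
	size_t i, j;

	assert(secs != NULL && name != NULL);

	for (i = 0; i < nr_secs; i++) {
		for (j = 0; j < secs[i].nr_vars; j++) {
			if (strcmp(secs[i].vars[j].name, name) != 0)
				continue;
			if (sec_out)
				*sec_out = &secs[i];
			return &secs[i].vars[j];
		}
	}
	return NULL;
}

/* The section name found by the scan is what identifies a percpu var. */
static int toy_is_percpu(const struct toy_datasec *sec)
{
	return strcmp(sec->name, ".data..percpu") == 0;
}
```

Because the containing section falls out of the scan itself, the
vmlinux side can keep its DATASECs matching the ELF sections, and the
lookup cost stays linear in the total number of variables either way.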

>
> Hao
>
> On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote:
> >
> > On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
> > > >
> > > > Thanks, Andrii,
> > > >
> > > > This change looks nice! A couple of comments:
> > > >
> > > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'?
> > >
> > > > That choice is a very deliberate one. `extern const void` is the right
> > > > way in C language to access linker-generated symbols, for instance,
> > > > which is quite similar to what the intent is here. Having void type is
> > > > very explicit that you don't know/care about the value pointed to by
> > > > extern address, the only operation you can perform is to get its
> > > > address.
> > >
> > > Once we add kernel variables support, that's when types will start to
> > > be specified and libbpf will do extra checks (type matching) and extra
> > > work (generating ldimm64 with BTF ID, for instance), to allow C code
> > > to access data pointed to by extern address.
> > >
> > > Switching type to u64 would be misleading in allowing C code to
> > > implicitly dereference value of extern. E.g., there is a big
> > > difference between:
> > >
> > > extern u64 bla;
> > >
> > > printf("%lld\n", bla); /* de-reference happens here, we get contents
> > > of memory pointed to by "bla" symbol */
> > >
> > > printf("%p\n", &bla); /* here we get value of linker symbol/address of
> > > extern variable */
> > >
> > > > Currently I explicitly support only the latter and want to prevent the
> > > > former, until we have kernel variables in BTF. Using `extern void`
> > > > makes the compiler enforce that only the &bla form is allowed. Everything
> > > > else is a compilation error.
> > >
> >
> > Ah, I see. I've been taking the extern variable as an actual variable
> > that contains the symbol's address, which is the first case. Your
> > approach makes sense. Thanks for explaining.
> >
> > > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size?
> > >
> > > > That's a bit of a hack on my part. Variable needs to point to some
> > > > type, whose size will match the size of datasec's varinfo entry. This
> > > > is checked and enforced by kernel. I'm looking for 4-byte int, because
> > > > it's almost guaranteed that it will be present in program's BTF and I
> > > > won't have to explicitly add it (it's because all BPF programs return
> > > > int, so it must be in program's BTF already), while 8-byte long is
> > > > less likely to be there.
> > >
> > > In the future, if we have a nicer way to extend BTF (and we will
> > > soon), we can do this a bit better, but either way that .ksyms DATASEC
> > > type isn't used for anything (there is no map with that DATASEC as a
> > > value type), so it doesn't matter.
> > >
> >
> > Thanks for explaining, Andrii.
> >
> > These explanations as comments in the code would be quite helpful, IMHO.


* Re: [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses
  2020-06-17  1:36           ` Andrii Nakryiko
@ 2020-06-18  7:53             ` Hao Luo
  0 siblings, 0 replies; 12+ messages in thread
From: Hao Luo @ 2020-06-18  7:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet

Sounds good. Thanks.

On Tue, Jun 16, 2020 at 6:37 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Jun 16, 2020 at 6:24 PM Hao Luo <haoluo@google.com> wrote:
> >
> > Andrii,
> >
> > Do you think we need to put the kernel's variables in one single
> > DATASEC in vmlinux BTF? It looks like all the ksyms in the program
> > will be under one ".ksyms" section, so we are not able to tell whether
> > a ksym is from a percpu section or a .rodata section. Without this
> > information, if the vmlinux has multiple DATASECs, the loader may need
> > to traverse all of them. If vmlinux BTF has only one DATASEC, it
> > matches the object's BTF better.
> >
> > Right now, the percpu vars are in a ".data..percpu" DATASEC in my
> > patch and the plan seems that we will introduce more DATASECs to hold
> > other data.
> >
> > Please let me know your insights here. Thanks.
>
> I think we should keep original DATASECs in vmlinux's BTF, so that
> they match ELF sections. Otherwise BTF is going to lie and will cause
> confusion down the road in the longer term.
>
> On the BPF program side, though, I think we'll limit it to just two
> special sections: .ksyms and .ksyms.percpu. libbpf will have to
> traverse all vmlinux DATASECs to find corresponding variables, but
> that's ok, it has to do the linear scan either way.
>
> >
> > Hao
> >
> > On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote:
> > >
> > > On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
> > > > >
> > > > > Thanks, Andrii,
> > > > >
> > > > > This change looks nice! A couple of comments:
> > > > >
> > > > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'?
> > > >
> > > > > That choice is a very deliberate one. `extern const void` is the right
> > > > > way in C language to access linker-generated symbols, for instance,
> > > > > which is quite similar to what the intent is here. Having void type is
> > > > > very explicit that you don't know/care about the value pointed to by
> > > > > extern address, the only operation you can perform is to get its
> > > > > address.
> > > >
> > > > Once we add kernel variables support, that's when types will start to
> > > > be specified and libbpf will do extra checks (type matching) and extra
> > > > work (generating ldimm64 with BTF ID, for instance), to allow C code
> > > > to access data pointed to by extern address.
> > > >
> > > > Switching type to u64 would be misleading in allowing C code to
> > > > implicitly dereference value of extern. E.g., there is a big
> > > > difference between:
> > > >
> > > > extern u64 bla;
> > > >
> > > > printf("%lld\n", bla); /* de-reference happens here, we get contents
> > > > of memory pointed to by "bla" symbol */
> > > >
> > > > printf("%p\n", &bla); /* here we get value of linker symbol/address of
> > > > extern variable */
> > > >
> > > > > Currently I explicitly support only the latter and want to prevent the
> > > > > former, until we have kernel variables in BTF. Using `extern void`
> > > > > makes the compiler enforce that only the &bla form is allowed. Everything
> > > > > else is a compilation error.
> > > >
> > >
> > > Ah, I see. I've been taking the extern variable as an actual variable
> > > that contains the symbol's address, which is the first case. Your
> > > approach makes sense. Thanks for explaining.
> > >
> > > > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size?
> > > >
> > > > > That's a bit of a hack on my part. Variable needs to point to some
> > > > > type, whose size will match the size of datasec's varinfo entry. This
> > > > > is checked and enforced by kernel. I'm looking for 4-byte int, because
> > > > > it's almost guaranteed that it will be present in program's BTF and I
> > > > > won't have to explicitly add it (it's because all BPF programs return
> > > > > int, so it must be in program's BTF already), while 8-byte long is
> > > > > less likely to be there.
> > > >
> > > > In the future, if we have a nicer way to extend BTF (and we will
> > > > soon), we can do this a bit better, but either way that .ksyms DATASEC
> > > > type isn't used for anything (there is no map with that DATASEC as a
> > > > value type), so it doesn't matter.
> > > >
> > >
> > > Thanks for explaining, Andrii.
> > >
> > > These explanations as comments in the code would be quite helpful, IMHO.


end of thread, other threads:[~2020-06-18  7:54 UTC | newest]

Thread overview: 12+ messages
     [not found] <20200612223150.1177182-1-andriin@fb.com>
     [not found] ` <20200612223150.1177182-9-andriin@fb.com>
2020-06-13  3:45   ` [RFC PATCH bpf-next 8/8] tools/bpftool: show PIDs with FDs open against BPF map/prog/link/btf Alexei Starovoitov
2020-06-13  5:57     ` Andrii Nakryiko
2020-06-13 22:14       ` Arnaldo Carvalho de Melo
2020-06-15  9:04         ` Toke Høiland-Jørgensen
2020-06-15  9:30           ` Quentin Monnet
     [not found] ` <20200612223150.1177182-2-andriin@fb.com>
     [not found]   ` <CA+khW7hAYVdoQX5-j0z1iGEVZeww4BBu4NXzy5eS5OwDRYqe2w@mail.gmail.com>
2020-06-15 18:55     ` [RFC PATCH bpf-next 1/8] libbpf: generalize libbpf externs support Andrii Nakryiko
     [not found] ` <20200612223150.1177182-3-andriin@fb.com>
     [not found]   ` <CA+khW7hFZzp_K_xydSFw0O3LYB22_fC=Z4wG7i9Si+phGHn4cQ@mail.gmail.com>
2020-06-15 19:08     ` [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses Andrii Nakryiko
2020-06-16  8:05       ` Hao Luo
2020-06-17  1:24         ` Hao Luo
2020-06-17  1:36           ` Andrii Nakryiko
2020-06-18  7:53             ` Hao Luo
     [not found] ` <20200612223150.1177182-4-andriin@fb.com>
     [not found]   ` <CA+khW7jxdS1KRpk2syVGjDqbyn3wAd3Eh_LEMAEhkPUehuXMwg@mail.gmail.com>
2020-06-15 19:11     ` [RFC PATCH bpf-next 3/8] selftests/bpf: add __ksym extern selftest Andrii Nakryiko
