bpf.vger.kernel.org archive mirror
* accessing global and per-cpu vars
@ 2020-05-21 17:03 Alexei Starovoitov
  2020-05-21 18:58 ` Andrii Nakryiko
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2020-05-21 17:03 UTC (permalink / raw)
  To: haoluo, Andrii Nakryiko, Arnaldo Carvalho de Melo,
	Daniel Borkmann, bpf, olegrom, Martin KaFai Lau

Hi,

here are my notes from the bpf office hours today.
Does it sound like what we discussed?

Steps to add incremental support to global vars:
1. teach libbpf to replace "ld_imm64 rX, foo" with the absolute address
of var "foo" by reading that value from kallsyms.
From the verifier's point of view the ld_imm64 instruction will look like
it's assigning a large constant to a register.
The bpf prog would need to use bpf_probe_read_kernel()
to further access vars.

2. teach pahole to store ' A ' annotated kallsyms into vmlinux BTF as
BTF_KIND_VAR.
There are ~300 of them, so it should be a minimal increase in size.

3. teach libbpf to scan vmlinux BTF for vars and replace "ld_imm64 rX, foo"
with BPF_PSEUDO_BTF_ID.
From the verifier's point of view 'ld_imm64 rX, 123 // pseudo_btf_id'
will be similar to ld_imm64 with pseudo_map_fd and pseudo_map_value.
The verifier will check btf_id and replace it with the actual kernel address
at program load time. It will also know the exact type of 'rX' from there on.
That gives a big performance win since the bpf prog will be able to use
direct load instructions to access vars.

4. add bpf_per_cpu(var, cpu) helper.
It will accept 'var' in R1 and the verifier will enforce that R1 is of
PTR_TO_BTF_ID type, that it's a BTF_KIND_VAR, and that it's in the
per-cpu datasec.
The return value from that helper will be a normal PTR_TO_BTF_ID,
so subsequent load instructions can use it directly.
It would be nice to have this helper without the BTF requirement,
but I don't see how to make it safe without BTF at the moment.
Similarly, a bpf_this_cpu_ptr(var) helper could be necessary, or
we may fold it into cpu == BPF_F_CURRENT_CPU as a single helper.
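
A minimal userspace sketch of the kallsyms lookup in step 1, assuming the
usual "address type name" layout of each /proc/kallsyms line. The function
name kallsyms_lookup_text() is hypothetical and is not libbpf API; real code
would read /proc/kallsyms itself and would also have to cope with addresses
that are shown as zero to unprivileged readers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not libbpf API: scan kallsyms-formatted text
 * ("<hex addr> <type> <name>" per line) for the symbol `want`.
 * Returns 0 and fills *addr and *type on success, -1 if not found.
 */
static int kallsyms_lookup_text(const char *text, const char *want,
                                unsigned long long *addr, char *type)
{
    const char *line = text;

    while (line && *line) {
        unsigned long long a;
        char t, name[256];

        /* A trailing "[module]" field, if present, is simply ignored. */
        if (sscanf(line, "%llx %c %255s", &a, &t, name) == 3 &&
            strcmp(name, want) == 0) {
            *addr = a;
            *type = t;
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next record */
    }
    return -1;
}
```

The address found this way is what would be written into the ld_imm64
immediate; the type letter is returned as well so a caller can decide
whether the symbol is one it wants to accept.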


* Re: accessing global and per-cpu vars
  2020-05-21 17:03 accessing global and per-cpu vars Alexei Starovoitov
@ 2020-05-21 18:58 ` Andrii Nakryiko
  2020-05-22 14:28   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 6+ messages in thread
From: Andrii Nakryiko @ 2020-05-21 18:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: haoluo, Andrii Nakryiko, Arnaldo Carvalho de Melo,
	Daniel Borkmann, bpf, olegrom, Martin KaFai Lau

On Thu, May 21, 2020 at 10:07 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> Hi,
>
> here are my notes from the bpf office hours today.
> Does it sound like what we discussed?
>
> Steps to add incremental support to global vars:
> 1. teach libbpf to replace "ld_imm64 rX, foo" with absolute address
> of var "foo" by reading that value from kallsyms.
> From the verifier point of view ld_imm64 instruction will look like it's
> assigning large constant into a register.
> The bpf prog would need to use bpf_probe_read_kernel()
> to further access vars.

yep

>
> 2. teach pahole to store ' A ' annotated kallsyms into vmlinux BTF as
> BTF_KIND_VAR.
> There are ~300 of them, so should be minimal increase in size.

I thought we'd do that based on section name? Or will we actually
teach pahole to extract kallsyms from the vmlinux image?

There was also step 1.5 (or even 0.5): to see if it's feasible to add
more than just per-CPU variables.

>
> 3. teach libbpf to scan vmlinux BTF for vars and replace "ld_imm64 rX, foo"
> with BPF_PSEUDO_BTF_ID.
> From the verifier point of view 'ld_imm64 rX, 123 // pseudo_btf_id'
> will be similar to ld_imm64 with pseudo_map_fd and pseudo_map_value.
> The verifier will check btf_id and replace that with actual kernel address
> at program load time. It will also know that exact type of 'rX' from there on.
> That gives big performance win since bpf prog will be able to use
> direct load instructions to access vars.

yep
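
To make the ld_imm64 rewriting concrete, here is a userspace model. It
assumes the standard eBPF encoding, in which ld_imm64 occupies two 8-byte
instruction slots and the 64-bit immediate is split across the two imm
fields. struct insn, patch_ld_imm64() and read_ld_imm64() are illustrative
names, and the marker carried in src_reg is left as a parameter because the
thread does not fix a number for BPF_PSEUDO_BTF_ID.

```c
#include <stdint.h>

/* Mirrors the layout of the kernel's struct bpf_insn (8 bytes per slot). */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register, or a pseudo marker for ld_imm64 */
    int16_t off;
    int32_t imm;
};

#define OP_LD_IMM64 0x18 /* BPF_LD | BPF_DW | BPF_IMM */

/*
 * Store a 64-bit value into the two-slot ld_imm64 at insns[0..1].
 * pseudo == 0 models step 1 (a plain constant such as a kallsyms address);
 * a nonzero pseudo models step 3, where val would be a BTF id that the
 * verifier later swaps for the kernel address.
 * Returns 0 on success, -1 if insns[0] is not an ld_imm64.
 */
static int patch_ld_imm64(struct insn *insns, uint64_t val, uint8_t pseudo)
{
    if (insns[0].code != OP_LD_IMM64)
        return -1;
    insns[0].src_reg = pseudo & 0xf;
    insns[0].imm = (int32_t)(uint32_t)val;         /* low 32 bits */
    insns[1].imm = (int32_t)(uint32_t)(val >> 32); /* high 32 bits */
    return 0;
}

/* Reassemble the 64-bit immediate from the two slots. */
static uint64_t read_ld_imm64(const struct insn *insns)
{
    return (uint64_t)(uint32_t)insns[0].imm |
           ((uint64_t)(uint32_t)insns[1].imm << 32);
}
```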

>
> 4. add bpf_per_cpu(var, cpu) helper.
> It will accept 'var' in R1 and the verifier will enforce that R1 is
> PTR_TO_BTF_ID type
> and it's BTF_KIND_VAR and it's in per-cpu datasec.
> The return value from that helper will be normal PTR_TO_BTF_ID,
> so subsequent load instruction can use it directly.
> Would be nice to have this helper without BTF requirement,
> but I don't see how to make it safe without BTF at the moment.
> Similarly bpf_this_cpu_ptr(var) helper could be necessary or
> we may fold it into cpu == BPF_F_CURRENT_CPU as a single helper.

A separate helper sounds better to me, both from a usability standpoint
and because it avoids using an extra register unnecessarily.
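
The two helper shapes can be modelled in plain userspace C. This is only a
sketch under assumptions: NR_CPUS_MODEL, cpu_offset[], current_cpu and both
function names are made up for illustration and are not kernel or BPF helper
API. The model assumes a per-CPU variable is addressed as a base address plus
a per-CPU offset, and that an out-of-range cpu yields NULL.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4

static uintptr_t cpu_offset[NR_CPUS_MODEL]; /* stand-in for per-CPU offsets */
static unsigned int current_cpu;            /* stand-in for the running CPU */

/* Two-argument form: the caller names the CPU; an invalid CPU gives NULL. */
static void *model_per_cpu_ptr(const void *var, unsigned int cpu)
{
    if (cpu >= NR_CPUS_MODEL)
        return NULL;
    return (void *)((uintptr_t)var + cpu_offset[cpu]);
}

/* One-argument form: no cpu argument to pass and no NULL case to check. */
static void *model_this_cpu_ptr(const void *var)
{
    return (void *)((uintptr_t)var + cpu_offset[current_cpu]);
}
```

In this model the one-argument form needs neither a second argument nor a
NULL check on the result, which is the usability and register point made
above.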


* Re: accessing global and per-cpu vars
  2020-05-21 18:58 ` Andrii Nakryiko
@ 2020-05-22 14:28   ` Arnaldo Carvalho de Melo
       [not found]     ` <CA+khW7j=ejncVYgY=hKEnkrkwA=Wjwa6Y2PFWgzrV1EV_8rvpw@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-05-22 14:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, haoluo, Andrii Nakryiko, Daniel Borkmann,
	bpf, olegrom, Martin KaFai Lau

Em Thu, May 21, 2020 at 11:58:47AM -0700, Andrii Nakryiko escreveu:
> On Thu, May 21, 2020 at 10:07 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 2. teach pahole to store ' A ' annotated kallsyms into vmlinux BTF as
> > BTF_KIND_VAR.
> > There are ~300 of them, so should be minimal increase in size.
> 
> I thought we'd do that based on section name? Or we will actually
> teach pahole to extract kallsyms from vmlinux image?

No need to touch kallsyms:

  net/core/filter.c
  
  DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
  
  # grep -w bpf_redirect_info /proc/kallsyms
  000000000002a160 A bpf_redirect_info
  #
  # readelf -s ~acme/git/build/v5.7-rc2+/vmlinux | grep bpf_redirect_info
  113637: 000000000002a2e0    32 OBJECT  GLOBAL DEFAULT   34 bpf_redirect_info
  #

It's in the ELF symtab.

[root@quaco ~]# grep ' A ' /proc/kallsyms | wc -l
351
[root@quaco ~]# readelf -s ~acme/git/build/v5.7-rc2+/vmlinux | grep "OBJECT  GLOBAL" | wc -l
3221
[root@quaco ~]#

So ' A ' in kallsyms needs some extra info from the symtab in addition
to being OBJECT GLOBAL, checking...
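
For reference, the "OBJECT GLOBAL" filter applied to the readelf output
corresponds to two fields packed into the st_info byte of each ELF symbol.
The sketch below decodes that byte, assuming the standard ELF encoding
(binding in the high nibble, type in the low nibble). The macro and function
names are local to this example; the numeric values are those of STB_GLOBAL
and STT_OBJECT in the ELF specification.

```c
/* Split st_info into its two fields, as ELF64_ST_BIND/ELF64_ST_TYPE do. */
#define SYM_BIND(info) (((unsigned char)(info)) >> 4)
#define SYM_TYPE(info) ((info) & 0xf)

#define BIND_GLOBAL 1 /* STB_GLOBAL */
#define TYPE_OBJECT 1 /* STT_OBJECT: a data object such as a variable */

/* Returns 1 if the symbol is a global data object, 0 otherwise. */
static int is_global_object(unsigned char st_info)
{
    return SYM_BIND(st_info) == BIND_GLOBAL &&
           SYM_TYPE(st_info) == TYPE_OBJECT;
}
```

This filter alone says nothing about which section a symbol lives in, which
is consistent with it matching far more symbols than the ' A ' entries in
kallsyms.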
 
> There was step 1.5 (or even 0.5) to see if it's feasible to add not
> just per-CPU variables as well.

- Arnaldo


* Re: accessing global and per-cpu vars
       [not found]     ` <CA+khW7j=ejncVYgY=hKEnkrkwA=Wjwa6Y2PFWgzrV1EV_8rvpw@mail.gmail.com>
@ 2020-05-26 18:20       ` Andrii Nakryiko
       [not found]         ` <CA+khW7ha-5YSgm5kARO=+JEtf-Ahmc1N_TBJ2iLSntk12pfy3w@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Andrii Nakryiko @ 2020-05-26 18:20 UTC (permalink / raw)
  To: Hao Luo
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, bpf, Oleg Rombakh, Martin KaFai Lau

On Tue, May 26, 2020 at 12:58 AM Hao Luo <haoluo@google.com> wrote:
>
> Hi, Arnaldo and Andrii,
>
> Thanks for taking a look and checking.
>
> On Fri, May 22, 2020 at 7:28 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>
>> Em Thu, May 21, 2020 at 11:58:47AM -0700, Andrii Nakryiko escreveu:
>> > On Thu, May 21, 2020 at 10:07 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> > > 2. teach pahole to store ' A ' annotated kallsyms into vmlinux BTF as
>> > > BTF_KIND_VAR.
>> > > There are ~300 of them, so should be minimal increase in size.
>> >
>> > I thought we'd do that based on section name? Or we will actually
>> > teach pahole to extract kallsyms from vmlinux image?
>>
>> No need to touch kallsyms:
>>
>>   net/core/filter.c
>>
>>   DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
>>
>>   # grep -w bpf_redirect_info /proc/kallsyms
>>   000000000002a160 A bpf_redirect_info
>>   #
>>   # readelf -s ~acme/git/build/v5.7-rc2+/vmlinux | grep bpf_redirect_info
>>   113637: 000000000002a2e0    32 OBJECT  GLOBAL DEFAULT   34 bpf_redirect_info
>>   #
>>
>> It's in the ELF symtab.
>>
>> [root@quaco ~]# grep ' A ' /proc/kallsyms | wc -l
>> 351
>> [root@quaco ~]# readelf -s ~acme/git/build/v5.7-rc2+/vmlinux | grep "OBJECT  GLOBAL" | wc -l
>> 3221
>> [root@quaco ~]#
>>
>> So ' A ' in kallsyms needs some extra info from the symtab in addition
>>
>> to being OBJECT GLOBAL, checking...
>
>
> After playing a bit, I found 'A' symbols in kallsyms include the per_cpu variables (e.g. runqueues and sched_clock_data), either global or local. An example of the global var is 'runqueues' and the example of local one is 'sched_clock_data'.
>
> The OBJECT GLOBAL symbols in vmlinux include the global variables such as runqueues. It also includes those symbols annotated as other capital letters such as 'R' or 'B' in kallsyms. For example, __per_cpu_offset is OBJECT GLOBAL in vmlinux and it's annotated as 'R', implying a global const variable.
>
> I think either the vmlinux approach or the kallsyms approach is good enough. I will continue experimenting while working on step 1.
>

/proc/kallsyms is available at runtime (if configured, of course),
while the vmlinux image might not be available at runtime at all in some
environments. This is one of the reasons for BTF to be exposed at
runtime through /sys/kernel/btf/vmlinux, instead of just keeping it in
the vmlinux image. So I think the kallsyms approach is better and more
reliable.

As for 'A', 'R', 'B', etc.: can we please look at the source code of
whatever in the kernel defines those letters in ksyms, instead of guessing
based on a subset of symbols? Guessing like this makes me nervous :)

> Thanks,
> Hao
>
>>
>> > There was step 1.5 (or even 0.5) to see if it's feasible to add not
>> > just per-CPU variables as well.
>>
>> - Arnaldo


* Re: accessing global and per-cpu vars
       [not found]           ` <CA+khW7hqemc+xsbMQq-DW3X+mHKO+Lm64hNpWNRyZ75MkUa0Gg@mail.gmail.com>
@ 2020-05-28 20:58             ` Arnaldo Carvalho de Melo
  2020-05-28 20:58             ` Andrii Nakryiko
  1 sibling, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-05-28 20:58 UTC (permalink / raw)
  To: Hao Luo, Andrii Nakryiko
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, bpf, Oleg Rombakh, Martin KaFai Lau



On May 28, 2020 5:50:52 PM GMT-03:00, Hao Luo <haoluo@google.com> wrote:
>A quick update on this thread.
>
>I came up with a draft patch that fulfills step 1. I added a ".ksym"
>section for extern vars. The libbpf fills these vars' values by reading
>/proc/kallsyms at load time. I think I am going to upload this patch
>for review together with step 3 and 4 after I work them out.
>
>Regarding step 2, I have also worked out a patch in pahole that inserts
>the kernel's percpu vars into BTF. I realized, because step 2 happens at
>compile time, there is no kallsyms file available to extract symbols,
>so we have to read the global vars from vmlinux. Currently on v5.7-rc7,
>I was able to extract 291 percpu vars, static or global. The .BTF size
>increases from 2d2c10 to 2d4dd0. A clean build on my local workstation
>increases from 10m13s to 11m24s (wall time). Common global percpu vars
>can be found in .BTF.
>
>haoluo@haoluo:~/kernel/tip/pkgs/images/boot$ bpftool btf dump file vmlinux-5.7.0-smp-DEV | grep runqueues
>
>[14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
>
>haoluo@haoluo:~/kernel/tip/pkgs/images/boot$ bpftool btf dump file vmlinux-5.7.0-smp-DEV | grep cpu_stopper
>
>[17589] STRUCT 'cpu_stopper' size=72 vlen=5
>[17609] VAR 'cpu_stopper' type_id=17589, linkage=global-alloc
>
>Arnaldo, would you please advise on how to upload the pahole patch for
>review? I am going to polish it a bit and think I can upload it for
>review.

Cool, send it to me, CC dwarves@vger.kernel.org,

Thanks, 

- Arnaldo



* Re: accessing global and per-cpu vars
       [not found]           ` <CA+khW7hqemc+xsbMQq-DW3X+mHKO+Lm64hNpWNRyZ75MkUa0Gg@mail.gmail.com>
  2020-05-28 20:58             ` Arnaldo Carvalho de Melo
@ 2020-05-28 20:58             ` Andrii Nakryiko
  1 sibling, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2020-05-28 20:58 UTC (permalink / raw)
  To: Hao Luo
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, bpf, Oleg Rombakh, Martin KaFai Lau

On Thu, May 28, 2020 at 1:51 PM Hao Luo <haoluo@google.com> wrote:
>
> A quick update on this thread.
>
> I came up with a draft patch that fulfills step 1. I added a ".ksym" section for extern vars. The libbpf fills these vars' values by reading /proc/kallsyms at load time. I think I am going to upload this patch for review together with step 3 and 4 after I work them out.
>
> Regarding step 2, I have also worked out a patch in pahole that inserts the kernel's percpu vars into BTF. I realized, because step 2 happens at compile time, there is no kallsyms file available to extract symbols, so we have to read the global vars from vmlinux. Currently on v5.7-rc7, I was able to extract 291 percpu vars, static or global. The .BTF size increases from 2d2c10 to 2d4dd0. A clean build on my local workstation increases from 10m13s to 11m24s (wall time). Common global percpu vars can be found in .BTF.

For the humans among us, that's an increase of 8640 bytes, it seems,
which is not a big deal at all. Have you checked how much it would
increase if you included more than just per-cpu variables?

Also, I wonder what adds more than a minute to the build process. Is it
all pahole's BTF generation step? If yes, why is it so much slower now?
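
The figures can be checked mechanically. A small sketch, assuming the two
.BTF sizes are hexadecimal byte counts and the build times are in the "XmYs"
form quoted below; both function names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Difference between two sizes given as hexadecimal strings, in bytes. */
static long hex_delta(const char *before, const char *after)
{
    return strtol(after, NULL, 16) - strtol(before, NULL, 16);
}

/* Convert a wall time such as "10m13s" to seconds; -1 on a parse error. */
static int wall_seconds(const char *s)
{
    int min = 0, sec = 0;

    if (sscanf(s, "%dm%ds", &min, &sec) != 2)
        return -1;
    return min * 60 + sec;
}
```

hex_delta("2d2c10", "2d4dd0") gives 8640, and
wall_seconds("11m24s") - wall_seconds("10m13s") gives 71, i.e. the clean
build is 71 seconds slower.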

>
> haoluo@haoluo:~/kernel/tip/pkgs/images/boot$ bpftool btf dump file vmlinux-5.7.0-smp-DEV | grep runqueues
>
> [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
>
> haoluo@haoluo:~/kernel/tip/pkgs/images/boot$ bpftool btf dump file vmlinux-5.7.0-smp-DEV | grep cpu_stopper
>
> [17589] STRUCT 'cpu_stopper' size=72 vlen=5
>
> [17609] VAR 'cpu_stopper' type_id=17589, linkage=global-alloc
>
> Arnaldo, would you please advise on how to upload the pahole patch for review? I am going to polish it a bit and think I can upload it for review.
>
> Thanks,
> Hao
>
> On Tue, May 26, 2020 at 2:04 PM Hao Luo <haoluo@google.com> wrote:
>>
>> I just did some poking and found the source of the format. TL;DR: these letters have the same semantics as in 'nm' output [1]. So we can put the 'A' symbols in BTF first, as these symbols have absolute addresses at runtime and it's the safest choice to start with, I think.
>>
>> More details. So during linking for vmlinux, the intermediate obj is passed to nm and its output is used by the kallsyms to generate a .S file [2]. That .S file builds a data blob 'kallsyms_names' in vmlinux [3] which is used to generate /proc/kallsyms [4]. The types of the symbols are carried from the output of nm to the kallsyms_names, mostly untouched. The only exception is, if CONFIG_KALLSYMS_ABSOLUTE_PERCPU is configured, percpu symbols are forced to have absolute addresses.
>>
>> [1] https://linux.die.net/man/1/nm
>> [2] https://github.com/torvalds/linux/blob/master/scripts/link-vmlinux.sh#L168
>> [3] https://github.com/torvalds/linux/blob/master/scripts/kallsyms.c#L446
>> [4] https://github.com/torvalds/linux/blob/master/kernel/kallsyms.c#L115
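
As a reading aid for the nm-style letters discussed above, here is a small
sketch of the convention described in the nm man page: the letter names the
symbol class, and for the letters covered here upper case marks a global
(external) symbol while lower case marks a local one. Only a handful of
common letters are handled, and the function names are illustrative.

```c
#include <ctype.h>
#include <stddef.h>

/* Describe the class behind an nm/kallsyms type letter; NULL if unhandled. */
static const char *nm_class(char c)
{
    switch (toupper((unsigned char)c)) {
    case 'A': return "absolute";
    case 'B': return "bss (uninitialized data)";
    case 'D': return "initialized data";
    case 'R': return "read-only data";
    case 'T': return "text (code)";
    default:  return NULL;
    }
}

/* Upper case means global for the letters handled by nm_class(). */
static int nm_is_global(char c)
{
    return isupper((unsigned char)c) != 0;
}
```

Under this reading, 'R' on __per_cpu_offset means a global symbol in a
read-only data section, matching the "global const variable" guess made
earlier in the thread.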

