BPF Archive on lore.kernel.org
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Hao Luo <haoluo@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrii Nakryiko <andriin@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>, bpf <bpf@vger.kernel.org>,
	Oleg Rombakh <olegrom@google.com>,
	Martin KaFai Lau <kafai@fb.com>
Subject: Re: accessing global and per-cpu vars
Date: Thu, 28 May 2020 13:58:39 -0700
Message-ID: <CAEf4BzZo7RMQb6HzhqROLjTASXzfCi82f4-ySRBN2UshR73KEA@mail.gmail.com> (raw)
In-Reply-To: <CA+khW7hqemc+xsbMQq-DW3X+mHKO+Lm64hNpWNRyZ75MkUa0Gg@mail.gmail.com>

On Thu, May 28, 2020 at 1:51 PM Hao Luo <haoluo@google.com> wrote:
>
> A quick update on this thread.
>
> I came up with a draft patch that fulfills step 1. I added a ".ksym" section for extern vars. libbpf fills in these vars' values by reading /proc/kallsyms at load time. I plan to upload this patch for review together with steps 3 and 4 once I have worked them out.
>
> Regarding step 2, I have also worked out a patch in pahole that inserts the kernel's percpu vars into BTF. I realized, because step 2 happens at compile time, there is no kallsyms file available to extract symbols, so we have to read the global vars from vmlinux. Currently on v5.7-rc7, I was able to extract 291 percpu vars, static or global. The .BTF size increases from 2d2c10 to 2d4dd0. A clean build on my local workstation increases from 10m13s to 11m24s (wall time). Common global percpu vars can be found in .BTF.

For the humans among us, that's an increase of 8640 bytes, which
seems like no big deal at all. Have you checked how much it would
increase if you included more than just per-cpu variables?

Also, I wonder what adds more than a minute to the build process? Is it
all pahole's BTF generation step? If so, why is it so much slower now?
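
As a sanity check on the figures above, here is a minimal Python sketch of the arithmetic. It assumes the two .BTF sizes are hexadecimal byte counts (which is what makes the 8640-byte figure come out) and that the build times are plain minutes and seconds; the variable and helper names are illustrative only.

```python
# .BTF section sizes quoted in the thread, assumed to be hexadecimal byte counts.
btf_before = 0x2D2C10
btf_after = 0x2D4DD0
size_delta = btf_after - btf_before


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a wall-clock duration given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


# Clean-build wall times quoted in the thread: 10m13s before, 11m24s after.
build_before = to_seconds(10, 13)
build_after = to_seconds(11, 24)
build_delta = build_after - build_before

print(f"BTF growth: {size_delta} bytes ({size_delta / btf_before:.2%})")
print(f"Build time growth: {build_delta} s ({build_delta / build_before:.1%})")
```

The size delta is the 8640 bytes mentioned above, and the build-time delta is 71 seconds, so the relative cost in build time is far larger than the relative cost in BTF size.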

>
> haoluo@haoluo:~/kernel/tip/pkgs/images/boot$ bpftool btf dump file vmlinux-5.7.0-smp-DEV | grep runqueues
>
> [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
>
> haoluo@haoluo:~/kernel/tip/pkgs/images/boot$ bpftool btf dump file vmlinux-5.7.0-smp-DEV | grep cpu_stopper
>
> [17589] STRUCT 'cpu_stopper' size=72 vlen=5
>
> [17609] VAR 'cpu_stopper' type_id=17589, linkage=global-alloc
>
> Arnaldo, would you please advise on how to upload the pahole patch for review? I am going to polish it a bit and think I can upload it for review.
>
> Thanks,
> Hao
>
> On Tue, May 26, 2020 at 2:04 PM Hao Luo <haoluo@google.com> wrote:
>>
>> I just did some poking and found the source of the format. TL;DR: these letters have the same semantics as in 'nm' output [1]. So we can put the 'A' symbols in BTF first, as these symbols have absolute addresses at runtime, which I think makes them the safest choice to start with.
>>
>> More details. So during linking for vmlinux, the intermediate obj is passed to nm and its output is used by the kallsyms to generate a .S file [2]. That .S file builds a data blob 'kallsyms_names' in vmlinux [3] which is used to generate /proc/kallsyms [4]. The types of the symbols are carried from the output of nm to the kallsyms_names, mostly untouched. The only exception is, if CONFIG_KALLSYMS_ABSOLUTE_PERCPU is configured, percpu symbols are forced to have absolute addresses.
>>
>> [1] https://linux.die.net/man/1/nm
>> [2] https://github.com/torvalds/linux/blob/master/scripts/link-vmlinux.sh#L168
>> [3] https://github.com/torvalds/linux/blob/master/scripts/kallsyms.c#L446
>> [4] https://github.com/torvalds/linux/blob/master/kernel/kallsyms.c#L115
>>
>> On Tue, May 26, 2020 at 11:21 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>
>>> On Tue, May 26, 2020 at 12:58 AM Hao Luo <haoluo@google.com> wrote:
>>> >
>>> > Hi, Arnaldo and Andrii,
>>> >
>>> > Thanks for taking a look and checking.
>>> >
>>> > On Fri, May 22, 2020 at 7:28 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>>> >>
>>> >> Em Thu, May 21, 2020 at 11:58:47AM -0700, Andrii Nakryiko escreveu:
>>> >> > On Thu, May 21, 2020 at 10:07 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> >> > > 2. teach pahole to store ' A ' annotated kallsyms into vmlinux BTF as
>>> >> > > BTF_KIND_VAR.
>>> >> > > There are ~300 of them, so should be minimal increase in size.
>>> >> >
>>> >> > I thought we'd do that based on section name? Or we will actually
>>> >> > teach pahole to extract kallsyms from vmlinux image?
>>> >>
>>> >> No need to touch kallsyms:
>>> >>
>>> >>   net/core/filter.c
>>> >>
>>> >>   DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
>>> >>
>>> >>   # grep -w bpf_redirect_info /proc/kallsyms
>>> >>   000000000002a160 A bpf_redirect_info
>>> >>   #
>>> >>   # readelf -s ~acme/git/build/v5.7-rc2+/vmlinux | grep bpf_redirect_info
>>> >>   113637: 000000000002a2e0    32 OBJECT  GLOBAL DEFAULT   34 bpf_redirect_info
>>> >>   #
>>> >>
>>> >> It's in the ELF symtab.
>>> >>
>>> >> [root@quaco ~]# grep ' A ' /proc/kallsyms | wc -l
>>> >> 351
>>> >> [root@quaco ~]# readelf -s ~acme/git/build/v5.7-rc2+/vmlinux | grep "OBJECT  GLOBAL" | wc -l
>>> >> 3221
>>> >> [root@quaco ~]#
>>> >>
>>> >> So ' A ' in kallsyms needs some extra info from the symtab in addition
>>> >> to being OBJECT GLOBAL, checking...
>>> >
>>> >
>>> > After playing a bit, I found that the 'A' symbols in kallsyms include per-CPU variables, whether global or local: 'runqueues' is an example of a global one and 'sched_clock_data' of a local one.
>>> >
>>> > The OBJECT GLOBAL symbols in vmlinux include the global variables such as runqueues. It also includes those symbols annotated as other capital letters such as 'R' or 'B' in kallsyms. For example, __per_cpu_offset is OBJECT GLOBAL in vmlinux and it's annotated as 'R', implying a global const variable.
>>> >
>>> > I think either the vmlinux approach or the kallsyms approach is good enough. I will continue experimenting while working on step 1.
>>> >
>>>
>>> /proc/kallsyms is available at runtime (if configured, of course),
>>> while the vmlinux image might not be available at runtime at all in
>>> some environments. This is one of the reasons BTF is exposed at
>>> runtime through /sys/kernel/btf/vmlinux, instead of just being kept in
>>> the vmlinux image. So I think the kallsyms approach is better and more
>>> reliable.
>>>
>>> As for 'A', 'R', 'B', etc.: can we please look at the source code of
>>> whatever in the kernel defines those letters in ksyms, instead of
>>> guessing based on a subset of symbols? Guessing like this makes me nervous :)
>>>
>>> > Thanks,
>>> > Hao
>>> >
>>> >>
>>> >> > There was step 1.5 (or even 0.5) to see if it's feasible to add more
>>> >> > than just per-CPU variables as well.
>>> >>
>>> >> - Arnaldo
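
To make the symbol-type discussion above concrete, here is a minimal Python sketch that reads kallsyms-style lines and groups them by their nm-style type letter. Only the bpf_redirect_info line is taken from the thread. The other addresses are zero placeholders (as an unprivileged read of /proc/kallsyms would typically show), and their type letters follow what was reported earlier in the thread. KSym and parse_kallsyms are hypothetical names, and this is not how libbpf or pahole implement it.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Optional


class KSym(NamedTuple):
    addr: int               # symbol address, parsed from hex
    kind: str               # nm-style type letter, e.g. 'A', 'R', 'B'
    name: str               # symbol name
    module: Optional[str]   # owning module, or None for built-in symbols


def parse_kallsyms(lines: Iterable[str]) -> Iterator[KSym]:
    """Parse 'address type name [module]' lines, skipping malformed ones."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        addr, kind, name = fields[:3]
        module = fields[3].strip("[]") if len(fields) > 3 else None
        yield KSym(int(addr, 16), kind, name, module)


# Illustrative input: the first line is from the thread, the rest use
# placeholder (zero) addresses and are NOT real kernel addresses.
SAMPLE = """\
000000000002a160 A bpf_redirect_info
0000000000000000 A runqueues
0000000000000000 A sched_clock_data
0000000000000000 R __per_cpu_offset
"""

syms = list(parse_kallsyms(SAMPLE.splitlines()))
by_kind = Counter(s.kind for s in syms)
absolute = {s.name: s.addr for s in syms if s.kind == "A"}

print(dict(by_kind))
print(hex(absolute["bpf_redirect_info"]))
```

On a real system the same function could be fed `open("/proc/kallsyms")` instead of the sample text, which is roughly the shell `grep ' A ' /proc/kallsyms | wc -l` check from earlier in the thread expressed as code.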


Thread overview: 6+ messages
2020-05-21 17:03 Alexei Starovoitov
2020-05-21 18:58 ` Andrii Nakryiko
2020-05-22 14:28   ` Arnaldo Carvalho de Melo
     [not found]     ` <CA+khW7j=ejncVYgY=hKEnkrkwA=Wjwa6Y2PFWgzrV1EV_8rvpw@mail.gmail.com>
2020-05-26 18:20       ` Andrii Nakryiko
     [not found]         ` <CA+khW7ha-5YSgm5kARO=+JEtf-Ahmc1N_TBJ2iLSntk12pfy3w@mail.gmail.com>
     [not found]           ` <CA+khW7hqemc+xsbMQq-DW3X+mHKO+Lm64hNpWNRyZ75MkUa0Gg@mail.gmail.com>
2020-05-28 20:58             ` Arnaldo Carvalho de Melo
2020-05-28 20:58             ` Andrii Nakryiko [this message]
