Dwarves Archive on lore.kernel.org
 help / color / Atom feed
From: Andrii Nakryiko <andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Arnaldo Carvalho de Melo
	<arnaldo.melo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Alexei Starovoitov
	<alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>,
	Oleg Rombakh <olegrom-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Martin Lau <kafai-b10kYP2dOMg@public.gmane.org>,
	dwarves-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
Date: Fri, 19 Jun 2020 12:58:17 -0700
Message-ID: <CAEf4BzbE2K7uyJ-2d3N+NLrV5OKS0HMm5vKuxMz5bfZkE_ANzw@mail.gmail.com> (raw)
In-Reply-To: <20200618074853.133675-1-haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> On SMP systems, the global percpu variables are placed in a special
> '.data..percpu' section, which is stored in a segment whose initial
> address is set to 0, the addresses of per-CPU variables are relative
> positive addresses [1].
>
> This patch extracts these variables from vmlinux and places them with
> their type information in BTF. More specifically, when BTF is encoded,
> we find the index of the '.data..percpu' section and then traverse
> the symbol table to find those global objects which are in this section.
> For each of these objects, we push a BTF_KIND_VAR into the types buffer,
> and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
> CUs have finished processing, we push a BTF_KIND_DATASEC into the
> btfe->types buffer, followed by the percpu_secinfo's content.
>
> In a v5.7-rc7 linux kernel, I was able to extract 291 such variables.
> The build time overhead is small and the space overhead is also small.
>
> Testing:
>
> - vmlinux size has increased by ~12kb.
>   Before:
>    $ readelf -SW vmlinux | grep BTF
>    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
>   After:
>    $ pahole -J vmlinux
>    $ readelf -SW vmlinux  | grep BTF
>    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00
>
> - Common global percpu VARs and DATASEC are found in BTF section.
>   $ bpftool btf dump file vmlinux | grep runqueues
>   [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
>
>   $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
>   [17592] STRUCT 'cpu_stopper' size=72 vlen=5
>   [17612] VAR 'cpu_stopper' type_id=17592, linkage=static
>
>   $ bpftool btf dump file vmlinux | grep ' DATASEC '
>   [63652] DATASEC '.data..percpu' size=0 vlen=294

probably forgot to update the example, I'd imagine size wouldn't be 0 anymore?

>
> - Tested bpf selftests.
>
> References:
>  [1] https://lwn.net/Articles/531148/
> Signed-off-by: Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---

I'm quite a bit uncomfortable that pahole would just ignore
non-conformat variables. We should understand why there might even be
those variables with invalid names, how that can happen in C?


> Changelog since v3:
> - Correct the size of the DATASEC, which was previously 0 and now is
>   the last var_secinfo's offset + size.
> - Add several sanity checks on VARs and DATASEC to ensure that the
>   generated BTF will not be rejected by verifier.
> - Tested through a set of selftests and bpf_tracing programs.
>
> Changelog since v2:
> - Move finding percpu_shndx and extracting symtab into btfe creation,
>   so we don't have to allocate a new symtab for each CU.
> - More debug msg by logging the vars encoded in 'verbose' mode. We
>   probably don't want to log the symbols that are _not_ encoded,
>   since that would be too verbose.
> - Calculate var offsets using 'addr - shdr.sh_addr', so it could be
>   generalized to other sections in future.
> - Filter out the symbols that are not STT_OBJECT.
> - Sort var_secinfos in the DATASEC by their offsets.
> - Free 'persec_secinfo' buffer and 'symtab' in btfe deletion.
> - Replace the string ".data..percpu" with a constant PERCPU_SECTION.
>
> Changelog since v1:
> - Add a ".data..percpu" DATASEC that encodes the found VARs.
> - Use percpu section's shndx to find the symbols that are percpu variables.
> - Use the correct type to set VAR's linkage.
>
>  btf_encoder.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  dwarves.c     |   6 ++
>  dwarves.h     |   2 +
>  libbtf.c      | 127 ++++++++++++++++++++++++++++++++++
>  libbtf.h      |  12 ++++
>  pahole.c      |   1 +
>  6 files changed, 333 insertions(+)
>
> diff --git a/btf_encoder.c b/btf_encoder.c
> index df16ba0..c7401c3 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -16,8 +16,44 @@
>  #include "elf_symtab.h"
>  #include "btf_encoder.h"
>
> +#include <ctype.h> /* for isalpha() and isalnum() */
>  #include <inttypes.h>
>
> +/*
> + * This corresponds to the same macro defined in
> + * include/linux/kallsyms.h
> + */
> +#define KSYM_NAME_LEN 128
> +
> +static bool btf_name_char_ok(char c, bool first, bool dot_ok)
> +{
> +       if ((first ? !isalpha(c) : !isalnum(c)) &&
> +           c != '_' &&
> +           ((c == '.' && !dot_ok) || c != '.'))
> +               return false;
> +       return true;
> +}
> +
> +/* Check wether the given name is valid in vmlinux btf. */
> +static bool btf_name_valid(const char *p, bool dot_ok)

dot_ok is always true, you can simplify above function

> +{
> +       const char *limit;
> +
> +       if (!btf_name_char_ok(*p, true, dot_ok))
> +               return false;
> +
> +       /* set a limit on identifier length */
> +       limit = p + KSYM_NAME_LEN;
> +       p++;
> +       while (*p && p < limit) {
> +               if (!btf_name_char_ok(*p, false, dot_ok))
> +                       return false;
> +               p++;
> +       }
> +
> +       return !*p;
> +}
> +

[...]

> +       /* search within symtab for percpu variables */
> +       elf_symtab__for_each_symbol(btfe->symtab, core_id, sym) {
> +               uint32_t linkage, type, size, offset;
> +               int32_t btf_var_id, btf_var_secinfo_id;
> +               uint64_t addr;
> +               const char *name;
> +
> +               /* compare a symbol's shndx to determine if it's a percpu variable */
> +               if (elf_sym__section(&sym) != btfe->percpu_shndx)
> +                       continue;
> +               if (elf_sym__type(&sym) != STT_OBJECT)
> +                       continue;
> +
> +               addr = elf_sym__value(&sym);
> +               /*
> +                * Store only those symbols that have allocated space in the percpu section.
> +                * This excludes the following three types of symbols:
> +                *
> +                *  1. __ADDRESSABLE(sym), which are forcely emitted as symbols.
> +                *  2. __UNIQUE_ID(prefix), which are introduced to generate unique ids.
> +                *  3. __exitcall(fn), functions which are labeled as exit calls.
> +                *
> +                * In addition, the variables defined using DEFINE_PERCPU_FIRST are
> +                * also not included, which currently includes:
> +                *
> +                *  1. fixed_percpu_data
> +                */
> +               if (!addr)
> +                       continue;
> +               var = hashaddr__find_variable(hash_addr, addr);
> +               if (var == NULL)
> +                       continue;
> +
> +               /*
> +                * Valididate the variable's name, the same check is also
> +                * performed in bpf verifier.
> +                */
> +               name = variable__name(var, cu);
> +               if (!var->name || !btf_name_valid(name, true)) {

It might be ok to filter them out, but I'd rather try to first
understand why do we get those invalid names in the first place.
Otherwise we might be just papering over the real bug in how we
process symbols/variables in DWARF?

> +                       if (verbose)
> +                               printf("invalid variable name '%s'.\n", name);
> +                       continue;
> +               }
> +
> +               /*
> +                * Validate the variable's type, the same check is also
> +                * performed in bpf verifier.
> +                */
> +               type = var->ip.tag.type + type_id_off;
> +               if (type == 0) {
> +                       if (verbose)
> +                               printf("invalid type for variable '%s'.\n", name);
> +                       continue;

Again, there might be cases which we want to ignore, but we need to
understand what and why those cases are?

> +               }
> +
> +               /*
> +                * Validate the size of the variable's type, the same check is
> +                * also performed in bpf verifier.
> +                */
> +               size = variable__type_size(var, cu);
> +               if (size == 0) {
> +                       if (verbose)
> +                               printf("invalid type size for variable '%s'.\n", name);
> +                       continue;
> +               }
> +

The goal here is not to satisfy the verifier, but to generate BTF from
DWARF faithfully. Let's not just ignore "inconvenient" cases and
instead investigate why such cases happen.

> +               if (verbose)
> +                       printf("symbol '%s' of address 0x%lx encoded\n",
> +                              elf_sym__name(&sym, btfe->symtab), addr);
> +
> +               /* add a BTF_KIND_VAR in btfe->types */
> +               linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> +               btf_var_id = btf_elf__add_var_type(btfe, type, var->name, linkage);
> +               if (btf_var_id < 0) {
> +                       err = -1;
> +                       printf("error: failed to encode variable '%s'\n", name);
> +                       break;
> +               }
> +
> +               /*
> +                * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
> +                * btfe->types later when we add BTF_VAR_DATASEC.
> +                */
> +               type = btf_var_id;
> +               offset = addr - btfe->percpu_base_addr;
> +               btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
> +                                                             type, offset, size);
> +               if (btf_var_secinfo_id < 0) {
> +                       err = -1;
> +                       printf("error: failed to encode var secinfo '%s'\n", name);
> +                       break;
> +               }
> +       }
> +
>  out:
>         if (err)
>                 btf_elf__delete(btfe);

[...]

  parent reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-18  7:48 Hao Luo
     [not found] ` <20200618074853.133675-1-haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2020-06-19 19:58   ` Andrii Nakryiko [this message]
     [not found]     ` <CAEf4BzbE2K7uyJ-2d3N+NLrV5OKS0HMm5vKuxMz5bfZkE_ANzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-19 20:30       ` Hao Luo
     [not found]         ` <CA+khW7j3AoW8q+Q30RXG-psw4imkBxNqffObnu2dr2=K=oVBCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-29 20:25           ` Andrii Nakryiko
     [not found]             ` <CAEf4Bza20kx3d+84Fg+UKXFYb2vforj55g0zO92Hpz+cM49LQA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-29 21:13               ` Hao Luo
     [not found]                 ` <CA+khW7hqX9fJd2G8-Kze9pMR4hUKV5ROxn1vKYy82+PCzDeCDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-30  1:02                   ` Arnaldo Carvalho de Melo
2020-06-29 20:20       ` Andrii Nakryiko
     [not found]         ` <CAEf4Bzbiu8G5UK9TDBRFYGwVk4A4QWoLurZsFKhs439gBXaDCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-30 11:59           ` Arnaldo Carvalho de Melo
     [not found]             ` <20200630115927.GL29008-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2020-07-03  8:13               ` Hao Luo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzbE2K7uyJ-2d3N+NLrV5OKS0HMm5vKuxMz5bfZkE_ANzw@mail.gmail.com \
    --to=andrii.nakryiko-re5jqeeqqe8avxtiumwx3w@public.gmane.org \
    --cc=alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=arnaldo.melo-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org \
    --cc=dwarves-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=kafai-b10kYP2dOMg@public.gmane.org \
    --cc=olegrom-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Dwarves Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/dwarves/0 dwarves/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dwarves dwarves/ https://lore.kernel.org/dwarves \
		dwarves@vger.kernel.org
	public-inbox-index dwarves

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.dwarves


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git