Dwarves Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
@ 2020-06-18  7:48 Hao Luo
       [not found] ` <20200618074853.133675-1-haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Hao Luo @ 2020-06-18  7:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andrii Nakryiko, Alexei Starovoitov,
	daniel-FeC+5ew28dpmcu3hnIyYJQ, olegrom-hpIqsD4AKlfQT0dZR+AlfA,
	kafai-b10kYP2dOMg, dwarves-u79uwXL29TY76Z2rM5mHXA, Hao Luo

On SMP systems, the global percpu variables are placed in a special
'.data..percpu' section, which is stored in a segment whose initial
address is set to 0, the addresses of per-CPU variables are relative
positive addresses [1].

This patch extracts these variables from vmlinux and places them with
their type information in BTF. More specifically, when BTF is encoded,
we find the index of the '.data..percpu' section and then traverse
the symbol table to find those global objects which are in this section.
For each of these objects, we push a BTF_KIND_VAR into the types buffer,
and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
CUs have finished processing, we push a BTF_KIND_DATASEC into the
btfe->types buffer, followed by the percpu_secinfo's content.

In a v5.7-rc7 linux kernel, I was able to extract 291 such variables.
The build time overhead is small and the space overhead is also small.

Testing:

- vmlinux size has increased by ~12kb.
  Before:
   $ readelf -SW vmlinux | grep BTF
   [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
  After:
   $ pahole -J vmlinux
   $ readelf -SW vmlinux  | grep BTF
   [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00

- Common global percpu VARs and DATASEC are found in BTF section.
  $ bpftool btf dump file vmlinux | grep runqueues
  [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc

  $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
  [17592] STRUCT 'cpu_stopper' size=72 vlen=5
  [17612] VAR 'cpu_stopper' type_id=17592, linkage=static

  $ bpftool btf dump file vmlinux | grep ' DATASEC '
  [63652] DATASEC '.data..percpu' size=0 vlen=294

- Tested bpf selftests.

References:
 [1] https://lwn.net/Articles/531148/
Signed-off-by: Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
Changelog since v3:
- Correct the size of the DATASEC, which was previously 0 and now is
  the last var_secinfo's offset + size.
- Add several sanity checks on VARs and DATASEC to ensure that the
  generated BTF will not be rejected by verifier.
- Tested through a set of selftests and bpf_tracing programs.

Changelog since v2:
- Move finding percpu_shndx and extracting symtab into btfe creation,
  so we don't have to allocate a new symtab for each CU.
- More debug msg by logging the vars encoded in 'verbose' mode. We
  probably don't want to log the symbols that are _not_ encoded,
  since that would be too verbose.
- Calculate var offsets using 'addr - shdr.sh_addr', so it could be
  generalized to other sections in future.
- Filter out the symbols that are not STT_OBJECT.
- Sort var_secinfos in the DATASEC by their offsets.
- Free 'persec_secinfo' buffer and 'symtab' in btfe deletion.
- Replace the string ".data..percpu" with a constant PERCPU_SECTION.

Changelog since v1:
- Add a ".data..percpu" DATASEC that encodes the found VARs.
- Use percpu section's shndx to find the symbols that are percpu variables.
- Use the correct type to set VAR's linkage.

 btf_encoder.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++
 dwarves.c     |   6 ++
 dwarves.h     |   2 +
 libbtf.c      | 127 ++++++++++++++++++++++++++++++++++
 libbtf.h      |  12 ++++
 pahole.c      |   1 +
 6 files changed, 333 insertions(+)

diff --git a/btf_encoder.c b/btf_encoder.c
index df16ba0..c7401c3 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -16,8 +16,44 @@
 #include "elf_symtab.h"
 #include "btf_encoder.h"
 
+#include <ctype.h> /* for isalpha() and isalnum() */
 #include <inttypes.h>
 
+/*
+ * This corresponds to the same macro defined in
+ * include/linux/kallsyms.h
+ */
+#define KSYM_NAME_LEN 128
+
+static bool btf_name_char_ok(char c, bool first, bool dot_ok)
+{
+	if ((first ? !isalpha(c) : !isalnum(c)) &&
+	    c != '_' &&
+	    ((c == '.' && !dot_ok) || c != '.'))
+		return false;
+	return true;
+}
+
+/* Check wether the given name is valid in vmlinux btf. */
+static bool btf_name_valid(const char *p, bool dot_ok)
+{
+	const char *limit;
+
+	if (!btf_name_char_ok(*p, true, dot_ok))
+		return false;
+
+	/* set a limit on identifier length */
+	limit = p + KSYM_NAME_LEN;
+	p++;
+	while (*p && p < limit) {
+		if (!btf_name_char_ok(*p, false, dot_ok))
+			return false;
+		p++;
+	}
+
+	return !*p;
+}
+
 static int tag__check_id_drift(const struct tag *tag,
 			       uint32_t core_id, uint32_t btf_type_id,
 			       uint32_t type_id_off)
@@ -168,6 +204,27 @@ int btf_encoder__encode()
 	return err;
 }
 
+
+#define HASHADDR__BITS 8
+#define HASHADDR__SIZE (1UL << HASHADDR__BITS)
+#define hashaddr__fn(key) hash_64(key, HASHADDR__BITS)
+
+static struct variable *hashaddr__find_variable(const struct hlist_head hashtable[],
+						const uint64_t addr)
+{
+	struct variable *variable;
+	struct hlist_node *pos;
+	uint16_t bucket = hashaddr__fn(addr);
+	const struct hlist_head *head = &hashtable[bucket];
+
+	hlist_for_each_entry(variable, pos, head, tool_hnode) {
+		if (variable->ip.addr == addr)
+			return variable;
+	}
+
+	return NULL;
+}
+
 int cu__encode_btf(struct cu *cu, int verbose)
 {
 	bool add_index_type = false;
@@ -176,6 +233,10 @@ int cu__encode_btf(struct cu *cu, int verbose)
 	struct function *fn;
 	struct tag *pos;
 	int err = 0;
+	struct hlist_head hash_addr[HASHADDR__SIZE];
+	struct variable *var;
+	bool has_global_var = false;
+	GElf_Sym sym;
 
 	if (btfe && strcmp(btfe->filename, cu->filename)) {
 		err = btf_encoder__encode();
@@ -241,6 +302,130 @@ int cu__encode_btf(struct cu *cu, int verbose)
 		}
 	}
 
+
+	if (btfe->percpu_shndx == 0 || !btfe->symtab)
+		goto out;
+
+	if (verbose)
+		printf("search cu '%s' for percpu global variables.\n", cu->name);
+
+	/* cache variables' addresses, preparing for searching in symtab. */
+	for (core_id = 0; core_id < HASHADDR__SIZE; ++core_id)
+		INIT_HLIST_HEAD(&hash_addr[core_id]);
+
+	cu__for_each_variable(cu, core_id, pos) {
+		struct hlist_head *head;
+
+		var = tag__variable(pos);
+		if (var->declaration)
+			continue;
+		/* percpu variables are allocated in global space */
+		if (variable__scope(var) != VSCOPE_GLOBAL)
+			continue;
+		has_global_var = true;
+		head = &hash_addr[hashaddr__fn(var->ip.addr)];
+		hlist_add_head(&var->tool_hnode, head);
+	}
+	if (!has_global_var) {
+		if (verbose)
+			printf("cu has no global variable defined, skip.\n");
+		goto out;
+	}
+
+	/* search within symtab for percpu variables */
+	elf_symtab__for_each_symbol(btfe->symtab, core_id, sym) {
+		uint32_t linkage, type, size, offset;
+		int32_t btf_var_id, btf_var_secinfo_id;
+		uint64_t addr;
+		const char *name;
+
+		/* compare a symbol's shndx to determine if it's a percpu variable */
+		if (elf_sym__section(&sym) != btfe->percpu_shndx)
+			continue;
+		if (elf_sym__type(&sym) != STT_OBJECT)
+			continue;
+
+		addr = elf_sym__value(&sym);
+		/*
+		 * Store only those symbols that have allocated space in the percpu section.
+		 * This excludes the following three types of symbols:
+		 *
+		 *  1. __ADDRESSABLE(sym), which are forcely emitted as symbols.
+		 *  2. __UNIQUE_ID(prefix), which are introduced to generate unique ids.
+		 *  3. __exitcall(fn), functions which are labeled as exit calls.
+		 *
+		 * In addition, the variables defined using DEFINE_PERCPU_FIRST are
+		 * also not included, which currently includes:
+		 *
+		 *  1. fixed_percpu_data
+		 */
+		if (!addr)
+			continue;
+		var = hashaddr__find_variable(hash_addr, addr);
+		if (var == NULL)
+			continue;
+
+		/*
+		 * Valididate the variable's name, the same check is also
+		 * performed in bpf verifier.
+		 */
+		name = variable__name(var, cu);
+		if (!var->name || !btf_name_valid(name, true)) {
+			if (verbose)
+				printf("invalid variable name '%s'.\n", name);
+			continue;
+		}
+
+		/*
+		 * Validate the variable's type, the same check is also
+		 * performed in bpf verifier.
+		 */
+		type = var->ip.tag.type + type_id_off;
+		if (type == 0) {
+			if (verbose)
+				printf("invalid type for variable '%s'.\n", name);
+			continue;
+		}
+
+		/*
+		 * Validate the size of the variable's type, the same check is
+		 * also performed in bpf verifier.
+		 */
+		size = variable__type_size(var, cu);
+		if (size == 0) {
+			if (verbose)
+				printf("invalid type size for variable '%s'.\n", name);
+			continue;
+		}
+
+		if (verbose)
+			printf("symbol '%s' of address 0x%lx encoded\n",
+			       elf_sym__name(&sym, btfe->symtab), addr);
+
+		/* add a BTF_KIND_VAR in btfe->types */
+		linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
+		btf_var_id = btf_elf__add_var_type(btfe, type, var->name, linkage);
+		if (btf_var_id < 0) {
+			err = -1;
+			printf("error: failed to encode variable '%s'\n", name);
+			break;
+		}
+
+		/*
+		 * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
+		 * btfe->types later when we add BTF_VAR_DATASEC.
+		 */
+		type = btf_var_id;
+		offset = addr - btfe->percpu_base_addr;
+		btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
+							      type, offset, size);
+		if (btf_var_secinfo_id < 0) {
+			err = -1;
+			printf("error: failed to encode var secinfo '%s'\n", name);
+			break;
+		}
+	}
+
 out:
 	if (err)
 		btf_elf__delete(btfe);
diff --git a/dwarves.c b/dwarves.c
index eb7885f..141f688 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -978,6 +978,12 @@ const char *variable__type_name(const struct variable *var,
 	return tag != NULL ? tag__name(tag, cu, bf, len, NULL) : NULL;
 }
 
+uint32_t variable__type_size(const struct variable *var, const struct cu *cu)
+{
+	const struct tag *tag = cu__type(cu, var->ip.tag.type);
+	return tag != NULL ? tag__size(tag, cu) : 0;
+}
+
 void class_member__delete(struct class_member *member, struct cu *cu)
 {
 	obstack_free(&cu->obstack, member);
diff --git a/dwarves.h b/dwarves.h
index a772e57..e4b0255 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -672,6 +672,8 @@ const char *variable__name(const struct variable *var, const struct cu *cu);
 const char *variable__type_name(const struct variable *var,
 				const struct cu *cu, char *bf, size_t len);
 
+uint32_t variable__type_size(const struct variable *var, const struct cu *cu);
+
 struct lexblock {
 	struct ip_tag	 ip;
 	struct list_head tags;
diff --git a/libbtf.c b/libbtf.c
index 2fbce40..7a01ded 100644
--- a/libbtf.c
+++ b/libbtf.c
@@ -25,6 +25,7 @@
 #include "dutil.h"
 #include "gobuffer.h"
 #include "dwarves.h"
+#include "elf_symtab.h"
 
 #define BTF_INFO_ENCODE(kind, kind_flag, vlen)				\
 	((!!(kind_flag) << 31) | ((kind) << 24) | ((vlen) & BTF_MAX_VLEN))
@@ -46,8 +47,21 @@ struct btf_array_type {
 	struct btf_array array;
 };
 
+struct btf_var_type {
+	struct btf_type type;
+	struct btf_var var;
+};
+
 uint8_t btf_elf__verbose;
 
+static int btf_var_secinfo_cmp(const void *a, const void *b)
+{
+	const struct btf_var_secinfo *av = a;
+	const struct btf_var_secinfo *bv = b;
+
+	return av->offset - bv->offset;
+}
+
 uint32_t btf_elf__get32(struct btf_elf *btfe, uint32_t *p)
 {
 	uint32_t val = *p;
@@ -137,6 +151,8 @@ out:
 struct btf_elf *btf_elf__new(const char *filename, Elf *elf)
 {
 	struct btf_elf *btfe = zalloc(sizeof(*btfe));
+	GElf_Shdr shdr;
+	Elf_Scn *sec;
 
 	if (!btfe)
 		return NULL;
@@ -193,6 +209,26 @@ struct btf_elf *btf_elf__new(const char *filename, Elf *elf)
 	default:	 btfe->wordsize = 0; break;
 	}
 
+	btfe->symtab = elf_symtab__new(NULL, btfe->elf, &btfe->ehdr);
+	if (!btfe->symtab) {
+		if (btf_elf__verbose)
+			printf("%s: '%s' doesn't have symtab.\n", __func__,
+			       btfe->filename);
+		return btfe;
+	}
+
+	/* find percpu section's shndx */
+	sec = elf_section_by_name(btfe->elf, &btfe->ehdr, &shdr, PERCPU_SECTION,
+				  NULL);
+	if (!sec) {
+		if (btf_elf__verbose)
+			printf("%s: '%s' doesn't have '%s' section\n", __func__,
+			       btfe->filename, PERCPU_SECTION);
+		return btfe;
+	}
+	btfe->percpu_shndx = elf_ndxscn(sec);
+	btfe->percpu_base_addr = shdr.sh_addr;
+
 	return btfe;
 
 errout:
@@ -211,7 +247,10 @@ void btf_elf__delete(struct btf_elf *btfe)
 			elf_end(btfe->elf);
 	}
 
+	elf_symtab__delete(btfe->symtab);
+
 	__gobuffer__delete(&btfe->types);
+	__gobuffer__delete(&btfe->percpu_secinfo);
 	free(btfe->filename);
 	free(btfe->data);
 	free(btfe);
@@ -613,6 +652,90 @@ int32_t btf_elf__add_func_proto(struct btf_elf *btfe, struct ftype *ftype, uint3
 	return type_id;
 }
 
+int32_t btf_elf__add_var_type(struct btf_elf *btfe, uint32_t type, uint32_t name_off,
+			      uint32_t linkage)
+{
+	struct btf_var_type t;
+
+	t.type.name_off = name_off;
+	t.type.info = BTF_INFO_ENCODE(BTF_KIND_VAR, 0, 0);
+	t.type.type = type;
+
+	t.var.linkage = linkage;
+
+	++btfe->type_index;
+	if (gobuffer__add(&btfe->types, &t.type, sizeof(t)) < 0) {
+		btf_elf__log_type(btfe, &t.type, true, true,
+				  "type=%u name=%s Error in adding gobuffer",
+				  t.type.type, btf_elf__name_in_gobuf(btfe, t.type.name_off));
+		return -1;
+	}
+
+	btf_elf__log_type(btfe, &t.type, false, false, "type=%u name=%s",
+			  t.type.type, btf_elf__name_in_gobuf(btfe, t.type.name_off));
+
+	return btfe->type_index;
+}
+
+int32_t btf_elf__add_var_secinfo(struct gobuffer *buf, uint32_t type,
+				 uint32_t offset, uint32_t size)
+{
+	struct btf_var_secinfo si = {
+		.type = type,
+		.offset = offset,
+		.size = size,
+	};
+	return gobuffer__add(buf, &si, sizeof(si));
+}
+
+extern struct strings *strings;
+
+int32_t btf_elf__add_datasec_type(struct btf_elf *btfe, const char *section_name,
+				  struct gobuffer *var_secinfo_buf)
+{
+	struct btf_type type;
+	size_t sz = gobuffer__size(var_secinfo_buf);
+	uint16_t nr_var_secinfo = sz / sizeof(struct btf_var_secinfo);
+	uint32_t name_off;
+	struct btf_var_secinfo *last_vsi;
+
+	qsort(var_secinfo_buf->entries, nr_var_secinfo,
+	      sizeof(struct btf_var_secinfo), btf_var_secinfo_cmp);
+
+	last_vsi = (struct btf_var_secinfo *)var_secinfo_buf->entries + nr_var_secinfo - 1;
+
+	/*
+	 * dwarves doesn't store section names in its string table,
+	 * so we have to add it by ourselves.
+	 */
+	name_off = strings__add(strings, section_name);
+
+	type.name_off = name_off;
+	type.info = BTF_INFO_ENCODE(BTF_KIND_DATASEC, 0, nr_var_secinfo);
+	type.size = last_vsi->offset + last_vsi->size;
+
+	++btfe->type_index;
+	if (gobuffer__add(&btfe->types, &type, sizeof(type)) < 0) {
+		btf_elf__log_type(btfe, &type, true, true,
+				  "name=%s vlen=%u Error in adding datasec",
+				  btf_elf__name_in_gobuf(btfe, type.name_off),
+				  nr_var_secinfo);
+		return -1;
+	}
+	if (gobuffer__add(&btfe->types, var_secinfo_buf->entries, sz) < 0) {
+		btf_elf__log_type(btfe, &type, true, true,
+				  "name=%s vlen=%u Error in adding var_secinfo",
+				  btf_elf__name_in_gobuf(btfe, type.name_off),
+				  nr_var_secinfo);
+		return -1;
+	}
+
+	btf_elf__log_type(btfe, &type, false, false, "type=datasec name=%s",
+			  btf_elf__name_in_gobuf(btfe, type.name_off));
+
+	return btfe->type_index;
+}
+
 static int btf_elf__write(const char *filename, struct btf *btf)
 {
 	GElf_Shdr shdr_mem, *shdr;
@@ -727,6 +850,10 @@ int btf_elf__encode(struct btf_elf *btfe, uint8_t flags)
 	if (gobuffer__size(&btfe->types) == 0)
 		return 0;
 
+	if (gobuffer__size(&btfe->percpu_secinfo) != 0)
+		btf_elf__add_datasec_type(btfe, PERCPU_SECTION,
+					  &btfe->percpu_secinfo);
+
 	btfe->size = sizeof(*hdr) + (gobuffer__size(&btfe->types) + gobuffer__size(btfe->strings));
 	btfe->data = zalloc(btfe->size);
 
diff --git a/libbtf.h b/libbtf.h
index f3c8500..be06480 100644
--- a/libbtf.h
+++ b/libbtf.h
@@ -20,8 +20,10 @@ struct btf_elf {
 	void		  *priv;
 	Elf		  *elf;
 	GElf_Ehdr	  ehdr;
+	struct elf_symtab *symtab;
 	struct gobuffer	  types;
 	struct gobuffer   *strings;
+	struct gobuffer   percpu_secinfo;
 	char		  *filename;
 	size_t		  size;
 	int		  swapped;
@@ -30,11 +32,15 @@ struct btf_elf {
 	bool		  is_big_endian;
 	bool		  raw_btf; // "/sys/kernel/btf/vmlinux"
 	uint32_t	  type_index;
+	uint32_t	  percpu_shndx;
+	uint64_t	  percpu_base_addr;
 };
 
 extern uint8_t btf_elf__verbose;
 #define btf_elf__verbose_log(fmt, ...) { if (btf_elf__verbose) printf(fmt, __VA_ARGS__); }
 
+#define PERCPU_SECTION ".data..percpu"
+
 struct base_type;
 struct ftype;
 
@@ -55,6 +61,12 @@ int32_t btf_elf__add_enum(struct btf_elf *btf, uint32_t name, uint32_t size,
 int btf_elf__add_enum_val(struct btf_elf *btf, uint32_t name, int32_t value);
 int32_t btf_elf__add_func_proto(struct btf_elf *btf, struct ftype *ftype,
 				uint32_t type_id_off);
+int32_t btf_elf__add_var_type(struct btf_elf *btfe, uint32_t type, uint32_t name_off,
+			      uint32_t linkage);
+int32_t btf_elf__add_var_secinfo(struct gobuffer *buf, uint32_t type,
+				 uint32_t offset, uint32_t size);
+int32_t btf_elf__add_datasec_type(struct btf_elf *btfe, const char *section_name,
+				  struct gobuffer *var_secinfo_buf);
 void btf_elf__set_strings(struct btf_elf *btf, struct gobuffer *strings);
 int  btf_elf__encode(struct btf_elf *btf, uint8_t flags);
 
diff --git a/pahole.c b/pahole.c
index e2a081b..8407db9 100644
--- a/pahole.c
+++ b/pahole.c
@@ -1084,6 +1084,7 @@ static error_t pahole__options_parser(int key, char *arg,
 	case 'i': find_containers = 1;
 		  class_name = arg;			break;
 	case 'J': btf_encode = 1;
+		  conf_load.get_addr_info = true;
 		  no_bitfield_type_recode = true;	break;
 	case 'l': conf.show_first_biggest_size_base_type_member = 1;	break;
 	case 'M': conf.show_only_data_members = 1;	break;
-- 
2.27.0.290.gba653c62da-goog

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found] ` <20200618074853.133675-1-haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2020-06-19 19:58   ` Andrii Nakryiko
       [not found]     ` <CAEf4BzbE2K7uyJ-2d3N+NLrV5OKS0HMm5vKuxMz5bfZkE_ANzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2020-06-19 19:58 UTC (permalink / raw)
  To: Hao Luo
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Daniel Borkmann,
	Oleg Rombakh, Martin Lau, dwarves-u79uwXL29TY76Z2rM5mHXA

On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> On SMP systems, the global percpu variables are placed in a special
> '.data..percpu' section, which is stored in a segment whose initial
> address is set to 0, the addresses of per-CPU variables are relative
> positive addresses [1].
>
> This patch extracts these variables from vmlinux and places them with
> their type information in BTF. More specifically, when BTF is encoded,
> we find the index of the '.data..percpu' section and then traverse
> the symbol table to find those global objects which are in this section.
> For each of these objects, we push a BTF_KIND_VAR into the types buffer,
> and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
> CUs have finished processing, we push a BTF_KIND_DATASEC into the
> btfe->types buffer, followed by the percpu_secinfo's content.
>
> In a v5.7-rc7 linux kernel, I was able to extract 291 such variables.
> The build time overhead is small and the space overhead is also small.
>
> Testing:
>
> - vmlinux size has increased by ~12kb.
>   Before:
>    $ readelf -SW vmlinux | grep BTF
>    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
>   After:
>    $ pahole -J vmlinux
>    $ readelf -SW vmlinux  | grep BTF
>    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00
>
> - Common global percpu VARs and DATASEC are found in BTF section.
>   $ bpftool btf dump file vmlinux | grep runqueues
>   [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
>
>   $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
>   [17592] STRUCT 'cpu_stopper' size=72 vlen=5
>   [17612] VAR 'cpu_stopper' type_id=17592, linkage=static
>
>   $ bpftool btf dump file vmlinux | grep ' DATASEC '
>   [63652] DATASEC '.data..percpu' size=0 vlen=294

probably forgot to update the example, I'd imagine size wouldn't be 0 anymore?

>
> - Tested bpf selftests.
>
> References:
>  [1] https://lwn.net/Articles/531148/
> Signed-off-by: Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> ---

I'm quite a bit uncomfortable that pahole would just ignore
non-conformat variables. We should understand why there might even be
those variables with invalid names, how that can happen in C?


> Changelog since v3:
> - Correct the size of the DATASEC, which was previously 0 and now is
>   the last var_secinfo's offset + size.
> - Add several sanity checks on VARs and DATASEC to ensure that the
>   generated BTF will not be rejected by verifier.
> - Tested through a set of selftests and bpf_tracing programs.
>
> Changelog since v2:
> - Move finding percpu_shndx and extracting symtab into btfe creation,
>   so we don't have to allocate a new symtab for each CU.
> - More debug msg by logging the vars encoded in 'verbose' mode. We
>   probably don't want to log the symbols that are _not_ encoded,
>   since that would be too verbose.
> - Calculate var offsets using 'addr - shdr.sh_addr', so it could be
>   generalized to other sections in future.
> - Filter out the symbols that are not STT_OBJECT.
> - Sort var_secinfos in the DATASEC by their offsets.
> - Free 'persec_secinfo' buffer and 'symtab' in btfe deletion.
> - Replace the string ".data..percpu" with a constant PERCPU_SECTION.
>
> Changelog since v1:
> - Add a ".data..percpu" DATASEC that encodes the found VARs.
> - Use percpu section's shndx to find the symbols that are percpu variables.
> - Use the correct type to set VAR's linkage.
>
>  btf_encoder.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  dwarves.c     |   6 ++
>  dwarves.h     |   2 +
>  libbtf.c      | 127 ++++++++++++++++++++++++++++++++++
>  libbtf.h      |  12 ++++
>  pahole.c      |   1 +
>  6 files changed, 333 insertions(+)
>
> diff --git a/btf_encoder.c b/btf_encoder.c
> index df16ba0..c7401c3 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -16,8 +16,44 @@
>  #include "elf_symtab.h"
>  #include "btf_encoder.h"
>
> +#include <ctype.h> /* for isalpha() and isalnum() */
>  #include <inttypes.h>
>
> +/*
> + * This corresponds to the same macro defined in
> + * include/linux/kallsyms.h
> + */
> +#define KSYM_NAME_LEN 128
> +
> +static bool btf_name_char_ok(char c, bool first, bool dot_ok)
> +{
> +       if ((first ? !isalpha(c) : !isalnum(c)) &&
> +           c != '_' &&
> +           ((c == '.' && !dot_ok) || c != '.'))
> +               return false;
> +       return true;
> +}
> +
> +/* Check wether the given name is valid in vmlinux btf. */
> +static bool btf_name_valid(const char *p, bool dot_ok)

dot_ok is always true, you can simplify above function

> +{
> +       const char *limit;
> +
> +       if (!btf_name_char_ok(*p, true, dot_ok))
> +               return false;
> +
> +       /* set a limit on identifier length */
> +       limit = p + KSYM_NAME_LEN;
> +       p++;
> +       while (*p && p < limit) {
> +               if (!btf_name_char_ok(*p, false, dot_ok))
> +                       return false;
> +               p++;
> +       }
> +
> +       return !*p;
> +}
> +

[...]

> +       /* search within symtab for percpu variables */
> +       elf_symtab__for_each_symbol(btfe->symtab, core_id, sym) {
> +               uint32_t linkage, type, size, offset;
> +               int32_t btf_var_id, btf_var_secinfo_id;
> +               uint64_t addr;
> +               const char *name;
> +
> +               /* compare a symbol's shndx to determine if it's a percpu variable */
> +               if (elf_sym__section(&sym) != btfe->percpu_shndx)
> +                       continue;
> +               if (elf_sym__type(&sym) != STT_OBJECT)
> +                       continue;
> +
> +               addr = elf_sym__value(&sym);
> +               /*
> +                * Store only those symbols that have allocated space in the percpu section.
> +                * This excludes the following three types of symbols:
> +                *
> +                *  1. __ADDRESSABLE(sym), which are forcely emitted as symbols.
> +                *  2. __UNIQUE_ID(prefix), which are introduced to generate unique ids.
> +                *  3. __exitcall(fn), functions which are labeled as exit calls.
> +                *
> +                * In addition, the variables defined using DEFINE_PERCPU_FIRST are
> +                * also not included, which currently includes:
> +                *
> +                *  1. fixed_percpu_data
> +                */
> +               if (!addr)
> +                       continue;
> +               var = hashaddr__find_variable(hash_addr, addr);
> +               if (var == NULL)
> +                       continue;
> +
> +               /*
> +                * Valididate the variable's name, the same check is also
> +                * performed in bpf verifier.
> +                */
> +               name = variable__name(var, cu);
> +               if (!var->name || !btf_name_valid(name, true)) {

It might be ok to filter them out, but I'd rather try to first
understand why do we get those invalid names in the first place.
Otherwise we might be just papering over the real bug in how we
process symbols/variables in DWARF?

> +                       if (verbose)
> +                               printf("invalid variable name '%s'.\n", name);
> +                       continue;
> +               }
> +
> +               /*
> +                * Validate the variable's type, the same check is also
> +                * performed in bpf verifier.
> +                */
> +               type = var->ip.tag.type + type_id_off;
> +               if (type == 0) {
> +                       if (verbose)
> +                               printf("invalid type for variable '%s'.\n", name);
> +                       continue;

Again, there might be cases which we want to ignore, but we need to
understand what and why those cases are?

> +               }
> +
> +               /*
> +                * Validate the size of the variable's type, the same check is
> +                * also performed in bpf verifier.
> +                */
> +               size = variable__type_size(var, cu);
> +               if (size == 0) {
> +                       if (verbose)
> +                               printf("invalid type size for variable '%s'.\n", name);
> +                       continue;
> +               }
> +

The goal here is not to satisfy the verifier, but to generate BTF from
DWARF faithfully. Let's not just ignore "inconvenient" cases and
instead investigate why such cases happen.

> +               if (verbose)
> +                       printf("symbol '%s' of address 0x%lx encoded\n",
> +                              elf_sym__name(&sym, btfe->symtab), addr);
> +
> +               /* add a BTF_KIND_VAR in btfe->types */
> +               linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> +               btf_var_id = btf_elf__add_var_type(btfe, type, var->name, linkage);
> +               if (btf_var_id < 0) {
> +                       err = -1;
> +                       printf("error: failed to encode variable '%s'\n", name);
> +                       break;
> +               }
> +
> +               /*
> +                * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
> +                * btfe->types later when we add BTF_VAR_DATASEC.
> +                */
> +               type = btf_var_id;
> +               offset = addr - btfe->percpu_base_addr;
> +               btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
> +                                                             type, offset, size);
> +               if (btf_var_secinfo_id < 0) {
> +                       err = -1;
> +                       printf("error: failed to encode var secinfo '%s'\n", name);
> +                       break;
> +               }
> +       }
> +
>  out:
>         if (err)
>                 btf_elf__delete(btfe);

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found]     ` <CAEf4BzbE2K7uyJ-2d3N+NLrV5OKS0HMm5vKuxMz5bfZkE_ANzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-06-19 20:30       ` Hao Luo
       [not found]         ` <CA+khW7j3AoW8q+Q30RXG-psw4imkBxNqffObnu2dr2=K=oVBCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2020-06-29 20:20       ` Andrii Nakryiko
  1 sibling, 1 reply; 9+ messages in thread
From: Hao Luo @ 2020-06-19 20:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Daniel Borkmann,
	Oleg Rombakh, Martin Lau, dwarves-u79uwXL29TY76Z2rM5mHXA

Hi, Andrii,

I agree that we'd better put a hold on this patch until we find out
the reason for the 'unconventional' symbols. I'll _try_ to figure it
out, but not able to fully commit my time on this patch. I thought I'd
better publish this patch as the DATASEC and VARs are generated
correctly in format, so that anyone can use it to generate the vmlinux
and continue the development on libbpf based on your ksym work (i.e.
typed ksyms).

Hao

On Fri, Jun 19, 2020 at 12:58 PM Andrii Nakryiko
<andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > On SMP systems, the global percpu variables are placed in a special
> > '.data..percpu' section, which is stored in a segment whose initial
> > address is set to 0, the addresses of per-CPU variables are relative
> > positive addresses [1].
> >
> > This patch extracts these variables from vmlinux and places them with
> > their type information in BTF. More specifically, when BTF is encoded,
> > we find the index of the '.data..percpu' section and then traverse
> > the symbol table to find those global objects which are in this section.
> > For each of these objects, we push a BTF_KIND_VAR into the types buffer,
> > and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
> > CUs have finished processing, we push a BTF_KIND_DATASEC into the
> > btfe->types buffer, followed by the percpu_secinfo's content.
> >
> > In a v5.7-rc7 linux kernel, I was able to extract 291 such variables.
> > The build time overhead is small and the space overhead is also small.
> >
> > Testing:
> >
> > - vmlinux size has increased by ~12kb.
> >   Before:
> >    $ readelf -SW vmlinux | grep BTF
> >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
> >   After:
> >    $ pahole -J vmlinux
> >    $ readelf -SW vmlinux  | grep BTF
> >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00
> >
> > - Common global percpu VARs and DATASEC are found in BTF section.
> >   $ bpftool btf dump file vmlinux | grep runqueues
> >   [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
> >
> >   $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
> >   [17592] STRUCT 'cpu_stopper' size=72 vlen=5
> >   [17612] VAR 'cpu_stopper' type_id=17592, linkage=static
> >
> >   $ bpftool btf dump file vmlinux | grep ' DATASEC '
> >   [63652] DATASEC '.data..percpu' size=0 vlen=294
>
> probably forgot to update the example, I'd imagine size wouldn't be 0 anymore?
>

Ah, right, I overlooked the example.

> >
> > - Tested bpf selftests.
> >
> > References:
> >  [1] https://lwn.net/Articles/531148/
> > Signed-off-by: Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > ---
>
> I'm quite a bit uncomfortable that pahole would just ignore
> non-conformat variables. We should understand why there might even be
> those variables with invalid names, how that can happen in C?
>
>
> > Changelog since v3:
> > - Correct the size of the DATASEC, which was previously 0 and now is
> >   the last var_secinfo's offset + size.
> > - Add several sanity checks on VARs and DATASEC to ensure that the
> >   generated BTF will not be rejected by verifier.
> > - Tested through a set of selftests and bpf_tracing programs.
> >
> > Changelog since v2:
> > - Move finding percpu_shndx and extracting symtab into btfe creation,
> >   so we don't have to allocate a new symtab for each CU.
> > - More debug msg by logging the vars encoded in 'verbose' mode. We
> >   probably don't want to log the symbols that are _not_ encoded,
> >   since that would be too verbose.
> > - Calculate var offsets using 'addr - shdr.sh_addr', so it could be
> >   generalized to other sections in future.
> > - Filter out the symbols that are not STT_OBJECT.
> > - Sort var_secinfos in the DATASEC by their offsets.
> > - Free 'persec_secinfo' buffer and 'symtab' in btfe deletion.
> > - Replace the string ".data..percpu" with a constant PERCPU_SECTION.
> >
> > Changelog since v1:
> > - Add a ".data..percpu" DATASEC that encodes the found VARs.
> > - Use percpu section's shndx to find the symbols that are percpu variables.
> > - Use the correct type to set VAR's linkage.
> >
> >  btf_encoder.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  dwarves.c     |   6 ++
> >  dwarves.h     |   2 +
> >  libbtf.c      | 127 ++++++++++++++++++++++++++++++++++
> >  libbtf.h      |  12 ++++
> >  pahole.c      |   1 +
> >  6 files changed, 333 insertions(+)
> >
> > diff --git a/btf_encoder.c b/btf_encoder.c
> > index df16ba0..c7401c3 100644
> > --- a/btf_encoder.c
> > +++ b/btf_encoder.c
> > @@ -16,8 +16,44 @@
> >  #include "elf_symtab.h"
> >  #include "btf_encoder.h"
> >
> > +#include <ctype.h> /* for isalpha() and isalnum() */
> >  #include <inttypes.h>
> >
> > +/*
> > + * This corresponds to the same macro defined in
> > + * include/linux/kallsyms.h
> > + */
> > +#define KSYM_NAME_LEN 128
> > +
> > +static bool btf_name_char_ok(char c, bool first, bool dot_ok)
> > +{
> > +       if ((first ? !isalpha(c) : !isalnum(c)) &&
> > +           c != '_' &&
> > +           ((c == '.' && !dot_ok) || c != '.'))
> > +               return false;
> > +       return true;
> > +}
> > +
> > +/* Check wether the given name is valid in vmlinux btf. */
> > +static bool btf_name_valid(const char *p, bool dot_ok)
>
> dot_ok is always true, you can simplify above function
>
> > +{
> > +       const char *limit;
> > +
> > +       if (!btf_name_char_ok(*p, true, dot_ok))
> > +               return false;
> > +
> > +       /* set a limit on identifier length */
> > +       limit = p + KSYM_NAME_LEN;
> > +       p++;
> > +       while (*p && p < limit) {
> > +               if (!btf_name_char_ok(*p, false, dot_ok))
> > +                       return false;
> > +               p++;
> > +       }
> > +
> > +       return !*p;
> > +}
> > +
>
> [...]
>
> > +       /* search within symtab for percpu variables */
> > +       elf_symtab__for_each_symbol(btfe->symtab, core_id, sym) {
> > +               uint32_t linkage, type, size, offset;
> > +               int32_t btf_var_id, btf_var_secinfo_id;
> > +               uint64_t addr;
> > +               const char *name;
> > +
> > +               /* compare a symbol's shndx to determine if it's a percpu variable */
> > +               if (elf_sym__section(&sym) != btfe->percpu_shndx)
> > +                       continue;
> > +               if (elf_sym__type(&sym) != STT_OBJECT)
> > +                       continue;
> > +
> > +               addr = elf_sym__value(&sym);
> > +               /*
> > +                * Store only those symbols that have allocated space in the percpu section.
> > +                * This excludes the following three types of symbols:
> > +                *
> > +                *  1. __ADDRESSABLE(sym), which are forcely emitted as symbols.
> > +                *  2. __UNIQUE_ID(prefix), which are introduced to generate unique ids.
> > +                *  3. __exitcall(fn), functions which are labeled as exit calls.
> > +                *
> > +                * In addition, the variables defined using DEFINE_PERCPU_FIRST are
> > +                * also not included, which currently includes:
> > +                *
> > +                *  1. fixed_percpu_data
> > +                */
> > +               if (!addr)
> > +                       continue;
> > +               var = hashaddr__find_variable(hash_addr, addr);
> > +               if (var == NULL)
> > +                       continue;
> > +
> > +               /*
> > +                * Valididate the variable's name, the same check is also
> > +                * performed in bpf verifier.
> > +                */
> > +               name = variable__name(var, cu);
> > +               if (!var->name || !btf_name_valid(name, true)) {
>
> It might be ok to filter them out, but I'd rather try to first
> understand why do we get those invalid names in the first place.
> Otherwise we might be just papering over the real bug in how we
> process symbols/variables in DWARF?
>
> > +                       if (verbose)
> > +                               printf("invalid variable name '%s'.\n", name);
> > +                       continue;
> > +               }
> > +
> > +               /*
> > +                * Validate the variable's type, the same check is also
> > +                * performed in bpf verifier.
> > +                */
> > +               type = var->ip.tag.type + type_id_off;
> > +               if (type == 0) {
> > +                       if (verbose)
> > +                               printf("invalid type for variable '%s'.\n", name);
> > +                       continue;
>
> Again, there might be cases which we want to ignore, but we need to
> understand what and why those cases are?
>
> > +               }
> > +
> > +               /*
> > +                * Validate the size of the variable's type, the same check is
> > +                * also performed in bpf verifier.
> > +                */
> > +               size = variable__type_size(var, cu);
> > +               if (size == 0) {
> > +                       if (verbose)
> > +                               printf("invalid type size for variable '%s'.\n", name);
> > +                       continue;
> > +               }
> > +
>
> The goal here is not to satisfy the verifier, but to generate BTF from
> DWARF faithfully. Let's not just ignore "inconvenient" cases and
> instead investigate why such cases happen.
>
> > +               if (verbose)
> > +                       printf("symbol '%s' of address 0x%lx encoded\n",
> > +                              elf_sym__name(&sym, btfe->symtab), addr);
> > +
> > +               /* add a BTF_KIND_VAR in btfe->types */
> > +               linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> > +               btf_var_id = btf_elf__add_var_type(btfe, type, var->name, linkage);
> > +               if (btf_var_id < 0) {
> > +                       err = -1;
> > +                       printf("error: failed to encode variable '%s'\n", name);
> > +                       break;
> > +               }
> > +
> > +               /*
> > +                * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
> > +                * btfe->types later when we add BTF_VAR_DATASEC.
> > +                */
> > +               type = btf_var_id;
> > +               offset = addr - btfe->percpu_base_addr;
> > +               btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
> > +                                                             type, offset, size);
> > +               if (btf_var_secinfo_id < 0) {
> > +                       err = -1;
> > +                       printf("error: failed to encode var secinfo '%s'\n", name);
> > +                       break;
> > +               }
> > +       }
> > +
> >  out:
> >         if (err)
> >                 btf_elf__delete(btfe);
>
> [...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found]     ` <CAEf4BzbE2K7uyJ-2d3N+NLrV5OKS0HMm5vKuxMz5bfZkE_ANzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2020-06-19 20:30       ` Hao Luo
@ 2020-06-29 20:20       ` Andrii Nakryiko
       [not found]         ` <CAEf4Bzbiu8G5UK9TDBRFYGwVk4A4QWoLurZsFKhs439gBXaDCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2020-06-29 20:20 UTC (permalink / raw)
  To: Hao Luo
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Daniel Borkmann,
	Oleg Rombakh, Martin Lau, dwarves-u79uwXL29TY76Z2rM5mHXA

On Fri, Jun 19, 2020 at 12:58 PM Andrii Nakryiko
<andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > On SMP systems, the global percpu variables are placed in a special
> > '.data..percpu' section, which is stored in a segment whose initial
> > address is set to 0, the addresses of per-CPU variables are relative
> > positive addresses [1].
> >
> > This patch extracts these variables from vmlinux and places them with
> > their type information in BTF. More specifically, when BTF is encoded,
> > we find the index of the '.data..percpu' section and then traverse
> > the symbol table to find those global objects which are in this section.
> > For each of these objects, we push a BTF_KIND_VAR into the types buffer,
> > and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
> > CUs have finished processing, we push a BTF_KIND_DATASEC into the
> > btfe->types buffer, followed by the percpu_secinfo's content.
> >
> > In a v5.7-rc7 linux kernel, I was able to extract 291 such variables.
> > The build time overhead is small and the space overhead is also small.
> >
> > Testing:
> >
> > - vmlinux size has increased by ~12kb.
> >   Before:
> >    $ readelf -SW vmlinux | grep BTF
> >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
> >   After:
> >    $ pahole -J vmlinux
> >    $ readelf -SW vmlinux  | grep BTF
> >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00
> >
> > - Common global percpu VARs and DATASEC are found in BTF section.
> >   $ bpftool btf dump file vmlinux | grep runqueues
> >   [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
> >
> >   $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
> >   [17592] STRUCT 'cpu_stopper' size=72 vlen=5
> >   [17612] VAR 'cpu_stopper' type_id=17592, linkage=static
> >
> >   $ bpftool btf dump file vmlinux | grep ' DATASEC '
> >   [63652] DATASEC '.data..percpu' size=0 vlen=294
>
> probably forgot to update the example, I'd imagine size wouldn't be 0 anymore?
>
> >
> > - Tested bpf selftests.
> >
> > References:
> >  [1] https://lwn.net/Articles/531148/
> > Signed-off-by: Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> > ---
>
> I'm quite a bit uncomfortable that pahole would just ignore
> non-conformat variables. We should understand why there might even be
> those variables with invalid names, how that can happen in C?
>
>
> > Changelog since v3:
> > - Correct the size of the DATASEC, which was previously 0 and now is
> >   the last var_secinfo's offset + size.
> > - Add several sanity checks on VARs and DATASEC to ensure that the
> >   generated BTF will not be rejected by verifier.
> > - Tested through a set of selftests and bpf_tracing programs.
> >
> > Changelog since v2:
> > - Move finding percpu_shndx and extracting symtab into btfe creation,
> >   so we don't have to allocate a new symtab for each CU.
> > - More debug msg by logging the vars encoded in 'verbose' mode. We
> >   probably don't want to log the symbols that are _not_ encoded,
> >   since that would be too verbose.
> > - Calculate var offsets using 'addr - shdr.sh_addr', so it could be
> >   generalized to other sections in future.
> > - Filter out the symbols that are not STT_OBJECT.
> > - Sort var_secinfos in the DATASEC by their offsets.
> > - Free 'persec_secinfo' buffer and 'symtab' in btfe deletion.
> > - Replace the string ".data..percpu" with a constant PERCPU_SECTION.
> >
> > Changelog since v1:
> > - Add a ".data..percpu" DATASEC that encodes the found VARs.
> > - Use percpu section's shndx to find the symbols that are percpu variables.
> > - Use the correct type to set VAR's linkage.
> >
> >  btf_encoder.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  dwarves.c     |   6 ++
> >  dwarves.h     |   2 +
> >  libbtf.c      | 127 ++++++++++++++++++++++++++++++++++
> >  libbtf.h      |  12 ++++
> >  pahole.c      |   1 +
> >  6 files changed, 333 insertions(+)
> >
> > diff --git a/btf_encoder.c b/btf_encoder.c
> > index df16ba0..c7401c3 100644
> > --- a/btf_encoder.c
> > +++ b/btf_encoder.c
> > @@ -16,8 +16,44 @@
> >  #include "elf_symtab.h"
> >  #include "btf_encoder.h"
> >
> > +#include <ctype.h> /* for isalpha() and isalnum() */
> >  #include <inttypes.h>
> >
> > +/*
> > + * This corresponds to the same macro defined in
> > + * include/linux/kallsyms.h
> > + */
> > +#define KSYM_NAME_LEN 128
> > +
> > +static bool btf_name_char_ok(char c, bool first, bool dot_ok)
> > +{
> > +       if ((first ? !isalpha(c) : !isalnum(c)) &&
> > +           c != '_' &&
> > +           ((c == '.' && !dot_ok) || c != '.'))
> > +               return false;
> > +       return true;
> > +}
> > +
> > +/* Check wether the given name is valid in vmlinux btf. */
> > +static bool btf_name_valid(const char *p, bool dot_ok)
>
> dot_ok is always true, you can simplify above function
>
> > +{
> > +       const char *limit;
> > +
> > +       if (!btf_name_char_ok(*p, true, dot_ok))
> > +               return false;
> > +
> > +       /* set a limit on identifier length */
> > +       limit = p + KSYM_NAME_LEN;
> > +       p++;
> > +       while (*p && p < limit) {
> > +               if (!btf_name_char_ok(*p, false, dot_ok))
> > +                       return false;
> > +               p++;
> > +       }
> > +
> > +       return !*p;
> > +}
> > +
>
> [...]
>
> > +       /* search within symtab for percpu variables */
> > +       elf_symtab__for_each_symbol(btfe->symtab, core_id, sym) {
> > +               uint32_t linkage, type, size, offset;
> > +               int32_t btf_var_id, btf_var_secinfo_id;
> > +               uint64_t addr;
> > +               const char *name;
> > +
> > +               /* compare a symbol's shndx to determine if it's a percpu variable */
> > +               if (elf_sym__section(&sym) != btfe->percpu_shndx)
> > +                       continue;
> > +               if (elf_sym__type(&sym) != STT_OBJECT)
> > +                       continue;
> > +
> > +               addr = elf_sym__value(&sym);
> > +               /*
> > +                * Store only those symbols that have allocated space in the percpu section.
> > +                * This excludes the following three types of symbols:
> > +                *
> > +                *  1. __ADDRESSABLE(sym), which are forcely emitted as symbols.
> > +                *  2. __UNIQUE_ID(prefix), which are introduced to generate unique ids.
> > +                *  3. __exitcall(fn), functions which are labeled as exit calls.
> > +                *
> > +                * In addition, the variables defined using DEFINE_PERCPU_FIRST are
> > +                * also not included, which currently includes:
> > +                *
> > +                *  1. fixed_percpu_data
> > +                */
> > +               if (!addr)
> > +                       continue;
> > +               var = hashaddr__find_variable(hash_addr, addr);
> > +               if (var == NULL)
> > +                       continue;
> > +
> > +               /*
> > +                * Valididate the variable's name, the same check is also
> > +                * performed in bpf verifier.
> > +                */
> > +               name = variable__name(var, cu);
> > +               if (!var->name || !btf_name_valid(name, true)) {
>
> It might be ok to filter them out, but I'd rather try to first
> understand why do we get those invalid names in the first place.
> Otherwise we might be just papering over the real bug in how we
> process symbols/variables in DWARF?

Ok, so I played with it a bit locally. Those variables with empty name
or 0 size seem to be due to some bugs in variable__name() and
variable__type_size() implementations. If you stick to ELF symbol's
elf_sym__name() and elf_sym__size(), you'll get valid sizes.

Arnaldo, could you please take a look at why variable__xxx() impls
don't work for some variables? I can repro it locally, e.g.:

ELF SYM kstat == <empty>, SZ 48 == 0
invalid type size for variable '(null)'.
invalid variable name '(null)'.

kstat and 48 come from ELF symbol, <empty> and 0 are from
variabel__xxx(). And that happens for quite a few variables, actually.

With those fixes, my kernel can be successfully BTF-generated. The
number of per-cpu variables jump from 213 with all this filtering to
287 proper variables while using elf_sym__xxx() functions.

>
> > +                       if (verbose)
> > +                               printf("invalid variable name '%s'.\n", name);
> > +                       continue;
> > +               }
> > +
> > +               /*
> > +                * Validate the variable's type, the same check is also
> > +                * performed in bpf verifier.
> > +                */
> > +               type = var->ip.tag.type + type_id_off;
> > +               if (type == 0) {
> > +                       if (verbose)
> > +                               printf("invalid type for variable '%s'.\n", name);
> > +                       continue;
>
> Again, there might be cases which we want to ignore, but we need to
> understand what and why those cases are?
>
> > +               }
> > +
> > +               /*
> > +                * Validate the size of the variable's type, the same check is
> > +                * also performed in bpf verifier.
> > +                */
> > +               size = variable__type_size(var, cu);
> > +               if (size == 0) {
> > +                       if (verbose)
> > +                               printf("invalid type size for variable '%s'.\n", name);
> > +                       continue;
> > +               }
> > +
>
> The goal here is not to satisfy the verifier, but to generate BTF from
> DWARF faithfully. Let's not just ignore "inconvenient" cases and
> instead investigate why such cases happen.
>
> > +               if (verbose)
> > +                       printf("symbol '%s' of address 0x%lx encoded\n",
> > +                              elf_sym__name(&sym, btfe->symtab), addr);
> > +
> > +               /* add a BTF_KIND_VAR in btfe->types */
> > +               linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> > +               btf_var_id = btf_elf__add_var_type(btfe, type, var->name, linkage);
> > +               if (btf_var_id < 0) {
> > +                       err = -1;
> > +                       printf("error: failed to encode variable '%s'\n", name);
> > +                       break;
> > +               }
> > +
> > +               /*
> > +                * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
> > +                * btfe->types later when we add BTF_VAR_DATASEC.
> > +                */
> > +               type = btf_var_id;
> > +               offset = addr - btfe->percpu_base_addr;
> > +               btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
> > +                                                             type, offset, size);
> > +               if (btf_var_secinfo_id < 0) {
> > +                       err = -1;
> > +                       printf("error: failed to encode var secinfo '%s'\n", name);
> > +                       break;
> > +               }
> > +       }
> > +
> >  out:
> >         if (err)
> >                 btf_elf__delete(btfe);
>
> [...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found]         ` <CA+khW7j3AoW8q+Q30RXG-psw4imkBxNqffObnu2dr2=K=oVBCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-06-29 20:25           ` Andrii Nakryiko
       [not found]             ` <CAEf4Bza20kx3d+84Fg+UKXFYb2vforj55g0zO92Hpz+cM49LQA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2020-06-29 20:25 UTC (permalink / raw)
  To: Hao Luo
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Daniel Borkmann,
	Oleg Rombakh, Martin Lau, dwarves-u79uwXL29TY76Z2rM5mHXA

On Fri, Jun 19, 2020 at 1:30 PM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> Hi, Andrii,
>
> I agree that we'd better put a hold on this patch until we find out
> the reason for the 'unconventional' symbols. I'll _try_ to figure it
> out, but not able to fully commit my time on this patch. I thought I'd
> better publish this patch as the DATASEC and VARs are generated
> correctly in format, so that anyone can use it to generate the vmlinux
> and continue the development on libbpf based on your ksym work (i.e.
> typed ksyms).
>

Hey Hao,

It's a pity that you are willing to drop this while being half-step
away from doing this properly. See my other reply, there is something
fishy with variable__name() and variable__type_size() and how it
calculates/caches values. ELF itself has all the data (based ELF
symbol data) and it seems to be correct. And I was correct to be
suspicious about just filtering out such variables, because in my case
you'd filter out a good chunk of variables for no good reason: 74
variables, which is a 26% of all per-CPU variables in my kernel.

It's up to you, of course, but it would be nice to fix it up (just
switching to elf_sym__size() and elf_sym__name() would be fine, you'd
just need to make sure to add elf_sym_name() result into a string
buffer; unless Arnaldo has some better alternatives) and publish
complete v5, that would get merged into pahole. After that, building
on my .ksym work in libbpf should get you to what you need from BPF
pretty quickly (plus some kernel-side logic to recognize these per-CPU
variables).

> Hao
>
> On Fri, Jun 19, 2020 at 12:58 PM Andrii Nakryiko
> <andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >
> > On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> > >
> > > On SMP systems, the global percpu variables are placed in a special
> > > '.data..percpu' section, which is stored in a segment whose initial
> > > address is set to 0, the addresses of per-CPU variables are relative
> > > positive addresses [1].
> > >
> > > This patch extracts these variables from vmlinux and places them with
> > > their type information in BTF. More specifically, when BTF is encoded,
> > > we find the index of the '.data..percpu' section and then traverse
> > > the symbol table to find those global objects which are in this section.
> > > For each of these objects, we push a BTF_KIND_VAR into the types buffer,
> > > and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
> > > CUs have finished processing, we push a BTF_KIND_DATASEC into the
> > > btfe->types buffer, followed by the percpu_secinfo's content.
> > >
> > > In a v5.7-rc7 linux kernel, I was able to extract 291 such variables.
> > > The build time overhead is small and the space overhead is also small.
> > >
> > > Testing:
> > >
> > > - vmlinux size has increased by ~12kb.
> > >   Before:
> > >    $ readelf -SW vmlinux | grep BTF
> > >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
> > >   After:
> > >    $ pahole -J vmlinux
> > >    $ readelf -SW vmlinux  | grep BTF
> > >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00
> > >
> > > - Common global percpu VARs and DATASEC are found in BTF section.
> > >   $ bpftool btf dump file vmlinux | grep runqueues
> > >   [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
> > >
> > >   $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
> > >   [17592] STRUCT 'cpu_stopper' size=72 vlen=5
> > >   [17612] VAR 'cpu_stopper' type_id=17592, linkage=static
> > >
> > >   $ bpftool btf dump file vmlinux | grep ' DATASEC '
> > >   [63652] DATASEC '.data..percpu' size=0 vlen=294
> >
> > probably forgot to update the example, I'd imagine size wouldn't be 0 anymore?
> >

[...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found]             ` <CAEf4Bza20kx3d+84Fg+UKXFYb2vforj55g0zO92Hpz+cM49LQA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-06-29 21:13               ` Hao Luo
       [not found]                 ` <CA+khW7hqX9fJd2G8-Kze9pMR4hUKV5ROxn1vKYy82+PCzDeCDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Hao Luo @ 2020-06-29 21:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Daniel Borkmann,
	Oleg Rombakh, Martin Lau, dwarves-u79uwXL29TY76Z2rM5mHXA

Hi Andrii,

Thanks for taking a look! I can post a v5 based on your thorough and
detailed analysis this week. Some of my other work on sched consumed
most of my bandwidth last week and I very appreciate your help! I
apologize if it is felt I dropped this patch and will follow up within
this week.

Hao

On Mon, Jun 29, 2020 at 1:26 PM Andrii Nakryiko
<andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Fri, Jun 19, 2020 at 1:30 PM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > Hi, Andrii,
> >
> > I agree that we'd better put a hold on this patch until we find out
> > the reason for the 'unconventional' symbols. I'll _try_ to figure it
> > out, but not able to fully commit my time on this patch. I thought I'd
> > better publish this patch as the DATASEC and VARs are generated
> > correctly in format, so that anyone can use it to generate the vmlinux
> > and continue the development on libbpf based on your ksym work (i.e.
> > typed ksyms).
> >
>
> Hey Hao,
>
> It's a pity that you are willing to drop this while being half-step
> away from doing this properly. See my other reply, there is something
> fishy with variable__name() and variable__type_size() and how it
> calculates/caches values. ELF itself has all the data (based ELF
> symbol data) and it seems to be correct. And I was correct to be
> suspicious about just filtering out such variables, because in my case
> you'd filter out a good chunk of variables for no good reason: 74
> variables, which is a 26% of all per-CPU variables in my kernel.
>
> It's up to you, of course, but it would be nice to fix it up (just
> switching to elf_sym__size() and elf_sym__name() would be fine, you'd
> just need to make sure to add elf_sym_name() result into a string
> buffer; unless Arnaldo has some better alternatives) and publish
> complete v5, that would get merged into pahole. After that, building
> on my .ksym work in libbpf should get you to what you need from BPF
> pretty quickly (plus some kernel-side logic to recognize these per-CPU
> variables).
>
> > Hao
> >
> > On Fri, Jun 19, 2020 at 12:58 PM Andrii Nakryiko
> > <andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > >
> > > On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> > > >
> > > > On SMP systems, the global percpu variables are placed in a special
> > > > '.data..percpu' section, which is stored in a segment whose initial
> > > > address is set to 0, the addresses of per-CPU variables are relative
> > > > positive addresses [1].
> > > >
> > > > This patch extracts these variables from vmlinux and places them with
> > > > their type information in BTF. More specifically, when BTF is encoded,
> > > > we find the index of the '.data..percpu' section and then traverse
> > > > the symbol table to find those global objects which are in this section.
> > > > For each of these objects, we push a BTF_KIND_VAR into the types buffer,
> > > > and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
> > > > CUs have finished processing, we push a BTF_KIND_DATASEC into the
> > > > btfe->types buffer, followed by the percpu_secinfo's content.
> > > >
> > > > In a v5.7-rc7 linux kernel, I was able to extract 291 such variables.
> > > > The build time overhead is small and the space overhead is also small.
> > > >
> > > > Testing:
> > > >
> > > > - vmlinux size has increased by ~12kb.
> > > >   Before:
> > > >    $ readelf -SW vmlinux | grep BTF
> > > >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
> > > >   After:
> > > >    $ pahole -J vmlinux
> > > >    $ readelf -SW vmlinux  | grep BTF
> > > >    [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00
> > > >
> > > > - Common global percpu VARs and DATASEC are found in BTF section.
> > > >   $ bpftool btf dump file vmlinux | grep runqueues
> > > >   [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
> > > >
> > > >   $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
> > > >   [17592] STRUCT 'cpu_stopper' size=72 vlen=5
> > > >   [17612] VAR 'cpu_stopper' type_id=17592, linkage=static
> > > >
> > > >   $ bpftool btf dump file vmlinux | grep ' DATASEC '
> > > >   [63652] DATASEC '.data..percpu' size=0 vlen=294
> > >
> > > probably forgot to update the example, I'd imagine size wouldn't be 0 anymore?
> > >
>
> [...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found]                 ` <CA+khW7hqX9fJd2G8-Kze9pMR4hUKV5ROxn1vKYy82+PCzDeCDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-06-30  1:02                   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-06-30  1:02 UTC (permalink / raw)
  To: Hao Luo, Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Oleg Rombakh, Martin Lau,
	dwarves-u79uwXL29TY76Z2rM5mHXA



On June 29, 2020 6:13:20 PM GMT-03:00, Hao Luo <haoluo@google.com> wrote:
>Hi Andrii,
>
>Thanks for taking a look! I can post a v5 based on your thorough and
>detailed analysis this week. Some of my other work on sched consumed
>most of my bandwidth last week and I very appreciate your help! I
>apologize if it is felt I dropped this patch and will follow up within
>this week.

I'll also try and look at this tomorrow.

- Arnaldo

>
>Hao
>
>On Mon, Jun 29, 2020 at 1:26 PM Andrii Nakryiko
><andrii.nakryiko@gmail.com> wrote:
>>
>> On Fri, Jun 19, 2020 at 1:30 PM Hao Luo <haoluo@google.com> wrote:
>> >
>> > Hi, Andrii,
>> >
>> > I agree that we'd better put a hold on this patch until we find out
>> > the reason for the 'unconventional' symbols. I'll _try_ to figure
>it
>> > out, but not able to fully commit my time on this patch. I thought
>I'd
>> > better publish this patch as the DATASEC and VARs are generated
>> > correctly in format, so that anyone can use it to generate the
>vmlinux
>> > and continue the development on libbpf based on your ksym work
>(i.e.
>> > typed ksyms).
>> >
>>
>> Hey Hao,
>>
>> It's a pity that you are willing to drop this while being half-step
>> away from doing this properly. See my other reply, there is something
>> fishy with variable__name() and variable__type_size() and how it
>> calculates/caches values. ELF itself has all the data (based ELF
>> symbol data) and it seems to be correct. And I was correct to be
>> suspicious about just filtering out such variables, because in my
>case
>> you'd filter out a good chunk of variables for no good reason: 74
>> variables, which is a 26% of all per-CPU variables in my kernel.
>>
>> It's up to you, of course, but it would be nice to fix it up (just
>> switching to elf_sym__size() and elf_sym__name() would be fine, you'd
>> just need to make sure to add elf_sym_name() result into a string
>> buffer; unless Arnaldo has some better alternatives) and publish
>> complete v5, that would get merged into pahole. After that, building
>> on my .ksym work in libbpf should get you to what you need from BPF
>> pretty quickly (plus some kernel-side logic to recognize these
>per-CPU
>> variables).
>>
>> > Hao
>> >
>> > On Fri, Jun 19, 2020 at 12:58 PM Andrii Nakryiko
>> > <andrii.nakryiko@gmail.com> wrote:
>> > >
>> > > On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo@google.com>
>wrote:
>> > > >
>> > > > On SMP systems, the global percpu variables are placed in a
>special
>> > > > '.data..percpu' section, which is stored in a segment whose
>initial
>> > > > address is set to 0, the addresses of per-CPU variables are
>relative
>> > > > positive addresses [1].
>> > > >
>> > > > This patch extracts these variables from vmlinux and places
>them with
>> > > > their type information in BTF. More specifically, when BTF is
>encoded,
>> > > > we find the index of the '.data..percpu' section and then
>traverse
>> > > > the symbol table to find those global objects which are in this
>section.
>> > > > For each of these objects, we push a BTF_KIND_VAR into the
>types buffer,
>> > > > and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When
>all the
>> > > > CUs have finished processing, we push a BTF_KIND_DATASEC into
>the
>> > > > btfe->types buffer, followed by the percpu_secinfo's content.
>> > > >
>> > > > In a v5.7-rc7 linux kernel, I was able to extract 291 such
>variables.
>> > > > The build time overhead is small and the space overhead is also
>small.
>> > > >
>> > > > Testing:
>> > > >
>> > > > - vmlinux size has increased by ~12kb.
>> > > >   Before:
>> > > >    $ readelf -SW vmlinux | grep BTF
>> > > >    [25] .BTF              PROGBITS        ffffffff821a905c
>13a905c 2d2bf8 00
>> > > >   After:
>> > > >    $ pahole -J vmlinux
>> > > >    $ readelf -SW vmlinux  | grep BTF
>> > > >    [25] .BTF              PROGBITS        ffffffff821a905c
>13a905c 2d5bca 00
>> > > >
>> > > > - Common global percpu VARs and DATASEC are found in BTF
>section.
>> > > >   $ bpftool btf dump file vmlinux | grep runqueues
>> > > >   [14098] VAR 'runqueues' type_id=13725, linkage=global-alloc
>> > > >
>> > > >   $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
>> > > >   [17592] STRUCT 'cpu_stopper' size=72 vlen=5
>> > > >   [17612] VAR 'cpu_stopper' type_id=17592, linkage=static
>> > > >
>> > > >   $ bpftool btf dump file vmlinux | grep ' DATASEC '
>> > > >   [63652] DATASEC '.data..percpu' size=0 vlen=294
>> > >
>> > > probably forgot to update the example, I'd imagine size wouldn't
>be 0 anymore?
>> > >
>>
>> [...]

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found]         ` <CAEf4Bzbiu8G5UK9TDBRFYGwVk4A4QWoLurZsFKhs439gBXaDCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-06-30 11:59           ` Arnaldo Carvalho de Melo
       [not found]             ` <20200630115927.GL29008-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-06-30 11:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Hao Luo, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Oleg Rombakh, Martin Lau,
	dwarves-u79uwXL29TY76Z2rM5mHXA

Em Mon, Jun 29, 2020 at 01:20:02PM -0700, Andrii Nakryiko escreveu:
> On Fri, Jun 19, 2020 at 12:58 PM Andrii Nakryiko <andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> > On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> > > +               if (!addr)
> > > +                       continue;
> > > +               var = hashaddr__find_variable(hash_addr, addr);
> > > +               if (var == NULL)
> > > +                       continue;
> > > +
> > > +               /*
> > > +                * Valididate the variable's name, the same check is also
> > > +                * performed in bpf verifier.
> > > +                */
> > > +               name = variable__name(var, cu);
> > > +               if (!var->name || !btf_name_valid(name, true)) {
> >
> > It might be ok to filter them out, but I'd rather try to first
> > understand why do we get those invalid names in the first place.
> > Otherwise we might be just papering over the real bug in how we
> > process symbols/variables in DWARF?
> 
> Ok, so I played with it a bit locally. Those variables with empty name
> or 0 size seem to be due to some bugs in variable__name() and
> variable__type_size() implementations. If you stick to ELF symbol's
> elf_sym__name() and elf_sym__size(), you'll get valid sizes.
> 
> Arnaldo, could you please take a look at why variable__xxx() impls
> don't work for some variables? I can repro it locally, e.g.:

I'm looking at it, I think this is that maybe not all variables that are
in the elf symbol table are encoded in DWARF (maybe some declared in asm
files?) and thus we will not find its name in the .debug_str.

ctf_loader.c, that was used as the model for btf_loader.c has this:

static const char *ctf__variable_name(const struct variable *var,
                                      const struct cu *cu)
{
        struct ctf *ctf = cu->priv;

        return ctf->symtab->symstrs->d_buf + var->name;
}

struct debug_fmt_ops ctf__ops = {
        .name           = "ctf",
        .function__name = ctf__function_name,
        .load_file      = ctf__load_file,
        .variable__name = ctf__variable_name,
        .strings__ptr   = ctf__strings_ptr,
        .cu__delete     = ctf__cu_delete,
};

And then

const char *variable__name(const struct variable *var, const struct cu *cu)
{
        if (cu->dfops && cu->dfops->variable__name)
                return cu->dfops->variable__name(var, cu);
        return s(cu, var->name);
}

I.e. btf_loader.c and dwarf_loader.c are using the fallback return that
assumes all strings are in a single table, the CTF one gets variable
names from the ELF symtab, as you seem to be doing by using
elf_sym__name() directly.

I'm reading Hao's patch to see how the variables are being created as
they are loaded from DWARF, to check this hunch.

- Arnaldo
 
> ELF SYM kstat == <empty>, SZ 48 == 0
> invalid type size for variable '(null)'.
> invalid variable name '(null)'.
> 
> kstat and 48 come from ELF symbol, <empty> and 0 are from
> variabel__xxx(). And that happens for quite a few variables, actually.
> 
> With those fixes, my kernel can be successfully BTF-generated. The
> number of per-cpu variables jump from 213 with all this filtering to
> 287 proper variables while using elf_sym__xxx() functions.
> 
> >
> > > +                       if (verbose)
> > > +                               printf("invalid variable name '%s'.\n", name);
> > > +                       continue;
> > > +               }
> > > +
> > > +               /*
> > > +                * Validate the variable's type, the same check is also
> > > +                * performed in bpf verifier.
> > > +                */
> > > +               type = var->ip.tag.type + type_id_off;
> > > +               if (type == 0) {
> > > +                       if (verbose)
> > > +                               printf("invalid type for variable '%s'.\n", name);
> > > +                       continue;
> >
> > Again, there might be cases which we want to ignore, but we need to
> > understand what and why those cases are?
> >
> > > +               }
> > > +
> > > +               /*
> > > +                * Validate the size of the variable's type, the same check is
> > > +                * also performed in bpf verifier.
> > > +                */
> > > +               size = variable__type_size(var, cu);
> > > +               if (size == 0) {
> > > +                       if (verbose)
> > > +                               printf("invalid type size for variable '%s'.\n", name);
> > > +                       continue;
> > > +               }
> > > +
> >
> > The goal here is not to satisfy the verifier, but to generate BTF from
> > DWARF faithfully. Let's not just ignore "inconvenient" cases and
> > instead investigate why such cases happen.
> >
> > > +               if (verbose)
> > > +                       printf("symbol '%s' of address 0x%lx encoded\n",
> > > +                              elf_sym__name(&sym, btfe->symtab), addr);
> > > +
> > > +               /* add a BTF_KIND_VAR in btfe->types */
> > > +               linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> > > +               btf_var_id = btf_elf__add_var_type(btfe, type, var->name, linkage);
> > > +               if (btf_var_id < 0) {
> > > +                       err = -1;
> > > +                       printf("error: failed to encode variable '%s'\n", name);
> > > +                       break;
> > > +               }
> > > +
> > > +               /*
> > > +                * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
> > > +                * btfe->types later when we add BTF_VAR_DATASEC.
> > > +                */
> > > +               type = btf_var_id;
> > > +               offset = addr - btfe->percpu_base_addr;
> > > +               btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
> > > +                                                             type, offset, size);
> > > +               if (btf_var_secinfo_id < 0) {
> > > +                       err = -1;
> > > +                       printf("error: failed to encode var secinfo '%s'\n", name);
> > > +                       break;
> > > +               }
> > > +       }
> > > +
> > >  out:
> > >         if (err)
> > >                 btf_elf__delete(btfe);
> >
> > [...]

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
       [not found]             ` <20200630115927.GL29008-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2020-07-03  8:13               ` Hao Luo
  0 siblings, 0 replies; 9+ messages in thread
From: Hao Luo @ 2020-07-03  8:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andrii Nakryiko, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Oleg Rombakh, Martin Lau,
	dwarves-u79uwXL29TY76Z2rM5mHXA

Some update on my side.

I am sending a v5 based on Andrii's suggestion on using elf_sym__name
and elf_sym__size. It seems my local build tool has encoded all the
symbols into DWARF, therefore, I don't see those "(anon)" variables in
either v4 or v5, as seen by Andrii and Arnaldo. But I did come up with
a potential concern about the variable's type while writing v5. Right
now, the btf_id of the variable's type is retrieved as

type = var->ip.tag.type + type_id_off;

If I understand correctly, this is obtained from DWARF. Since symtab
doesn't store the symbol's type (please correct me if I'm wrong), even
if we can get the variable's name and size using elf_sym__name and
elf_sym__size, we have no way to get the type. I am afraid we may get
an incorrect type id using var->ip.tag.type.

So, Andrii, could you please verify whether the encoded VAR's types
are still correct for those 'anon' variables? I can't verify as my
build tool doesn't generate such variables. If the encoded type ids
are still correct, then using sym does do a better job than using
DWARF.

I'm also leaving the basic validations on variable's name and size in
v5. I think they are needed. It's better having less variables than a
malformed btf if things go awry.

Hao

On Tue, Jun 30, 2020 at 4:59 AM Arnaldo Carvalho de Melo
<acme-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> Em Mon, Jun 29, 2020 at 01:20:02PM -0700, Andrii Nakryiko escreveu:
> > On Fri, Jun 19, 2020 at 12:58 PM Andrii Nakryiko <andrii.nakryiko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> > > On Thu, Jun 18, 2020 at 12:49 AM Hao Luo <haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> > > > +               if (!addr)
> > > > +                       continue;
> > > > +               var = hashaddr__find_variable(hash_addr, addr);
> > > > +               if (var == NULL)
> > > > +                       continue;
> > > > +
> > > > +               /*
> > > > +                * Valididate the variable's name, the same check is also
> > > > +                * performed in bpf verifier.
> > > > +                */
> > > > +               name = variable__name(var, cu);
> > > > +               if (!var->name || !btf_name_valid(name, true)) {
> > >
> > > It might be ok to filter them out, but I'd rather try to first
> > > understand why do we get those invalid names in the first place.
> > > Otherwise we might be just papering over the real bug in how we
> > > process symbols/variables in DWARF?
> >
> > Ok, so I played with it a bit locally. Those variables with empty name
> > or 0 size seem to be due to some bugs in variable__name() and
> > variable__type_size() implementations. If you stick to ELF symbol's
> > elf_sym__name() and elf_sym__size(), you'll get valid sizes.
> >
> > Arnaldo, could you please take a look at why variable__xxx() impls
> > don't work for some variables? I can repro it locally, e.g.:
>
> I'm looking at it, I think this is that maybe not all variables that are
> in the elf symbol table are encoded in DWARF (maybe some declared in asm
> files?) and thus we will not find its name in the .debug_str.
>
> ctf_loader.c, that was used as the model for btf_loader.c has this:
>
> static const char *ctf__variable_name(const struct variable *var,
>                                       const struct cu *cu)
> {
>         struct ctf *ctf = cu->priv;
>
>         return ctf->symtab->symstrs->d_buf + var->name;
> }
>
> struct debug_fmt_ops ctf__ops = {
>         .name           = "ctf",
>         .function__name = ctf__function_name,
>         .load_file      = ctf__load_file,
>         .variable__name = ctf__variable_name,
>         .strings__ptr   = ctf__strings_ptr,
>         .cu__delete     = ctf__cu_delete,
> };
>
> And then
>
> const char *variable__name(const struct variable *var, const struct cu *cu)
> {
>         if (cu->dfops && cu->dfops->variable__name)
>                 return cu->dfops->variable__name(var, cu);
>         return s(cu, var->name);
> }
>
> I.e. btf_loader.c and dwarf_loader.c are using the fallback return that
> assumes all strings are in a single table, the CTF one gets variable
> names from the ELF symtab, as you seem to be doing by using
> elf_sym__name() directly.
>
> I'm reading Hao's patch to see how the variables are being created as
> they are loaded from DWARF, to check this hunch.
>
> - Arnaldo
>
> > ELF SYM kstat == <empty>, SZ 48 == 0
> > invalid type size for variable '(null)'.
> > invalid variable name '(null)'.
> >
> > kstat and 48 come from ELF symbol, <empty> and 0 are from
> > variabel__xxx(). And that happens for quite a few variables, actually.
> >
> > With those fixes, my kernel can be successfully BTF-generated. The
> > number of per-cpu variables jump from 213 with all this filtering to
> > 287 proper variables while using elf_sym__xxx() functions.
> >
> > >
> > > > +                       if (verbose)
> > > > +                               printf("invalid variable name '%s'.\n", name);
> > > > +                       continue;
> > > > +               }
> > > > +
> > > > +               /*
> > > > +                * Validate the variable's type, the same check is also
> > > > +                * performed in bpf verifier.
> > > > +                */
> > > > +               type = var->ip.tag.type + type_id_off;
> > > > +               if (type == 0) {
> > > > +                       if (verbose)
> > > > +                               printf("invalid type for variable '%s'.\n", name);
> > > > +                       continue;
> > >
> > > Again, there might be cases which we want to ignore, but we need to
> > > understand what and why those cases are?
> > >
> > > > +               }
> > > > +
> > > > +               /*
> > > > +                * Validate the size of the variable's type, the same check is
> > > > +                * also performed in bpf verifier.
> > > > +                */
> > > > +               size = variable__type_size(var, cu);
> > > > +               if (size == 0) {
> > > > +                       if (verbose)
> > > > +                               printf("invalid type size for variable '%s'.\n", name);
> > > > +                       continue;
> > > > +               }
> > > > +
> > >
> > > The goal here is not to satisfy the verifier, but to generate BTF from
> > > DWARF faithfully. Let's not just ignore "inconvenient" cases and
> > > instead investigate why such cases happen.
> > >
> > > > +               if (verbose)
> > > > +                       printf("symbol '%s' of address 0x%lx encoded\n",
> > > > +                              elf_sym__name(&sym, btfe->symtab), addr);
> > > > +
> > > > +               /* add a BTF_KIND_VAR in btfe->types */
> > > > +               linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
> > > > +               btf_var_id = btf_elf__add_var_type(btfe, type, var->name, linkage);
> > > > +               if (btf_var_id < 0) {
> > > > +                       err = -1;
> > > > +                       printf("error: failed to encode variable '%s'\n", name);
> > > > +                       break;
> > > > +               }
> > > > +
> > > > +               /*
> > > > +                * add a BTF_VAR_SECINFO in btfe->percpu_secinfo, which will be added into
> > > > +                * btfe->types later when we add BTF_VAR_DATASEC.
> > > > +                */
> > > > +               type = btf_var_id;
> > > > +               offset = addr - btfe->percpu_base_addr;
> > > > +               btf_var_secinfo_id = btf_elf__add_var_secinfo(&btfe->percpu_secinfo,
> > > > +                                                             type, offset, size);
> > > > +               if (btf_var_secinfo_id < 0) {
> > > > +                       err = -1;
> > > > +                       printf("error: failed to encode var secinfo '%s'\n", name);
> > > > +                       break;
> > > > +               }
> > > > +       }
> > > > +
> > > >  out:
> > > >         if (err)
> > > >                 btf_elf__delete(btfe);
> > >
> > > [...]
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-18  7:48 [PATCH v4] btf_encoder: Teach pahole to store percpu variables in vmlinux BTF Hao Luo
     [not found] ` <20200618074853.133675-1-haoluo-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2020-06-19 19:58   ` Andrii Nakryiko
     [not found]     ` <CAEf4BzbE2K7uyJ-2d3N+NLrV5OKS0HMm5vKuxMz5bfZkE_ANzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-19 20:30       ` Hao Luo
     [not found]         ` <CA+khW7j3AoW8q+Q30RXG-psw4imkBxNqffObnu2dr2=K=oVBCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-29 20:25           ` Andrii Nakryiko
     [not found]             ` <CAEf4Bza20kx3d+84Fg+UKXFYb2vforj55g0zO92Hpz+cM49LQA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-29 21:13               ` Hao Luo
     [not found]                 ` <CA+khW7hqX9fJd2G8-Kze9pMR4hUKV5ROxn1vKYy82+PCzDeCDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-30  1:02                   ` Arnaldo Carvalho de Melo
2020-06-29 20:20       ` Andrii Nakryiko
     [not found]         ` <CAEf4Bzbiu8G5UK9TDBRFYGwVk4A4QWoLurZsFKhs439gBXaDCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-06-30 11:59           ` Arnaldo Carvalho de Melo
     [not found]             ` <20200630115927.GL29008-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2020-07-03  8:13               ` Hao Luo

Dwarves Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/dwarves/0 dwarves/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dwarves dwarves/ https://lore.kernel.org/dwarves \
		dwarves@vger.kernel.org
	public-inbox-index dwarves

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.dwarves


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git