dwarves.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables
@ 2022-11-04 23:10 Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 1/9] dutil: return ELF section name when looked up by index Stephen Brennan
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:10 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

Hi all,

It's been a few weeks since the last update to my prior thread [1] on this
topic. My apologies on that: Alan hunted down some BTF deduplicator bugs and he
found the root cause and got it fixed [2]. We went ahead and respun the dwarves
patches on top of the latest version, and Alan also created a kernel patch to
implement the tristate for BTF variables. I tested the full series and did a
size comparison which I'll share below.

[1] https://lore.kernel.org/bpf/20220826184911.168442-1-stephen.s.brennan@oracle.com/
[2] https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com/

To remind folks what the series is all about, BTF currently contains type
information for functions and percpu variables, but not for global variables.
Debuggers would find BTF quite useful as a source of type information, but they
would need that global variable data, as users tend to look more at data types
than function types. A major advantage of BTF is that it is compact, and
built-in to the kernel. Assuming that you can find the kallsyms symbol table
(which is possible via the vmcoreinfo note since 6.0), then you can locate BTF
data. With the kallsyms table and the type info, you have enough information to
enable reasonably user-friendly debugging. There are proof-of-concept patches
for drgn (a Python-based scriptable debugger) to leverage all this
functionality to debug a core dump without any external debug info.

So, this patch series is a re-roll & rebase of the prior one which was just for
dwarves/pahole. It includes three components:

(1) Alan's libbpf fix, just for reference
(2) The dwarves patches to add support for generating variable BTF
(3) The kernel patch adding the variable generation as a tristate

The only new portion is 3, I believe. But all of this should be a complete
product. I used this complete product to build a small-ish upstream kernel
configuration, and I measured the size of the .BTF sections. Here's the result:

Vars built-in:  0x7d3ad1  8,207,057 bytes
Vars in module: 0x62af5f  6,467,423 bytes
 -> module BTF: 0x1a8e66  1,740,390 bytes   (combined size 8,207,813)
Vars disabled:  0x62e90b  6,482,187 bytes

Sorry, I don't have a combined diffstat for the files since these patches are
both for dwarves and kernel repos. But here's the listing of patches:

dwarves: Stephen Brennan
  [1/9] dutil: return ELF section name when looked up by index
  [2/9] btf_encoder: Rename percpu structures to variables
  [3/9] btf_encoder: cache all ELF section info
  [4/9] btf_encoder: make the variable array dynamic
  [5/9] btf_encoder: record ELF section for collected variables
  [6/9] btf_encoder: collect all variables
  [7/9] btf_encoder: allow encoding all variables

libbpf: Alan Maguire
  [8/9] libbpf: btf dedup identical struct test needs check for nested structs/arrays
  (** note, this is already merged, but included for completeness **)

kernel: Alan Maguire
  [9/9] bpf: add support for CONFIG_DEBUG_INFO_BTF_VARS

Thanks for your consideration!
Stephen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 dwarves 1/9] dutil: return ELF section name when looked up by index
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
@ 2022-11-04 23:10 ` Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 2/9] btf_encoder: Rename percpu structures to variables Stephen Brennan
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:10 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 dutil.c | 10 +++++++++-
 dutil.h |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/dutil.c b/dutil.c
index 97c4474..a4d55e6 100644
--- a/dutil.c
+++ b/dutil.c
@@ -2,6 +2,7 @@
   SPDX-License-Identifier: GPL-2.0-only
 
   Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
+  Copyright (c) 2022, Oracle and/or its affiliates.
 */
 
 
@@ -207,13 +208,20 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Shdr *shp, const char *name, size_t
 	return sec;
 }
 
-Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx)
+Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx, const char **name_out)
 {
 	Elf_Scn *sec;
+	size_t str_idx;
 
 	sec = elf_getscn(elf, idx);
 	if (sec)
 		gelf_getshdr(sec, shp);
+
+	if (name_out) {
+		if (elf_getshdrstrndx(elf, &str_idx))
+			return NULL;
+		*name_out = elf_strptr(elf, str_idx, shp->sh_name);
+	}
 	return sec;
 }
 
diff --git a/dutil.h b/dutil.h
index 335a17c..ff78aa6 100644
--- a/dutil.h
+++ b/dutil.h
@@ -328,7 +328,7 @@ void *zalloc(const size_t size);
 
 Elf_Scn *elf_section_by_name(Elf *elf, GElf_Shdr *shp, const char *name, size_t *index);
 
-Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx);
+Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx, const char **name_out);
 
 #ifndef SHT_GNU_ATTRIBUTES
 /* Just a way to check if we're using an old elfutils version */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 dwarves 2/9] btf_encoder: Rename percpu structures to variables
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 1/9] dutil: return ELF section name when looked up by index Stephen Brennan
@ 2022-11-04 23:10 ` Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 3/9] btf_encoder: cache all ELF section info Stephen Brennan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:10 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

The BTF encoder will now be storing symbol data for more than just
percpu variables. Rename the data structure fields so that they are
more general.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 54 ++++++++++++++++++++++++++-------------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index a5fa04a..c914647 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -2,6 +2,7 @@
   SPDX-License-Identifier: GPL-2.0-only
 
   Copyright (C) 2019 Facebook
+  Copyright (c) 2022, Oracle and/or its affiliates.
 
   Derived from ctf_encoder.c, which is:
 
@@ -36,7 +37,7 @@ struct elf_function {
 	bool		 generated;
 };
 
-#define MAX_PERCPU_VAR_CNT 4096
+#define MAX_VAR_CNT 4096
 
 struct var_info {
 	uint64_t    addr;
@@ -64,12 +65,12 @@ struct btf_encoder {
 			  is_rel;
 	uint32_t	  array_index_id;
 	struct {
-		struct var_info vars[MAX_PERCPU_VAR_CNT];
+		struct var_info vars[MAX_VAR_CNT];
 		int		var_cnt;
-		uint32_t	shndx;
-		uint64_t	base_addr;
-		uint64_t	sec_sz;
-	} percpu;
+		uint32_t	percpu_shndx;
+		uint64_t	percpu_base_addr;
+		uint64_t	percpu_sec_sz;
+	} variables;
 	struct {
 		struct elf_function *entries;
 		int		    allocated;
@@ -1175,8 +1176,8 @@ static int percpu_var_cmp(const void *_a, const void *_b)
 static bool btf_encoder__percpu_var_exists(struct btf_encoder *encoder, uint64_t addr, uint32_t *sz, const char **name)
 {
 	struct var_info key = { .addr = addr };
-	const struct var_info *p = bsearch(&key, encoder->percpu.vars, encoder->percpu.var_cnt,
-					   sizeof(encoder->percpu.vars[0]), percpu_var_cmp);
+	const struct var_info *p = bsearch(&key, encoder->variables.vars, encoder->variables.var_cnt,
+					   sizeof(encoder->variables.vars[0]), percpu_var_cmp);
 	if (!p)
 		return false;
 
@@ -1192,7 +1193,7 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	uint32_t size;
 
 	/* compare a symbol's shndx to determine if it's a percpu variable */
-	if (sym_sec_idx != encoder->percpu.shndx)
+	if (sym_sec_idx != encoder->variables.percpu_shndx)
 		return 0;
 	if (elf_sym__type(sym) != STT_OBJECT)
 		return 0;
@@ -1220,17 +1221,17 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	 * ET_EXEC file) we need to subtract the section address.
 	 */
 	if (!encoder->is_rel)
-		addr -= encoder->percpu.base_addr;
+		addr -= encoder->variables.percpu_base_addr;
 
-	if (encoder->percpu.var_cnt == MAX_PERCPU_VAR_CNT) {
+	if (encoder->variables.var_cnt == MAX_VAR_CNT) {
 		fprintf(stderr, "Reached the limit of per-CPU variables: %d\n",
-			MAX_PERCPU_VAR_CNT);
+			MAX_VAR_CNT);
 		return -1;
 	}
-	encoder->percpu.vars[encoder->percpu.var_cnt].addr = addr;
-	encoder->percpu.vars[encoder->percpu.var_cnt].sz = size;
-	encoder->percpu.vars[encoder->percpu.var_cnt].name = sym_name;
-	encoder->percpu.var_cnt++;
+	encoder->variables.vars[encoder->variables.var_cnt].addr = addr;
+	encoder->variables.vars[encoder->variables.var_cnt].sz = size;
+	encoder->variables.vars[encoder->variables.var_cnt].name = sym_name;
+	encoder->variables.var_cnt++;
 
 	return 0;
 }
@@ -1242,7 +1243,7 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 	GElf_Sym sym;
 
 	/* cache variables' addresses, preparing for searching in symtab. */
-	encoder->percpu.var_cnt = 0;
+	encoder->variables.var_cnt = 0;
 
 	/* search within symtab for percpu variables */
 	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
@@ -1253,11 +1254,11 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 	}
 
 	if (collect_percpu_vars) {
-		if (encoder->percpu.var_cnt)
-			qsort(encoder->percpu.vars, encoder->percpu.var_cnt, sizeof(encoder->percpu.vars[0]), percpu_var_cmp);
+		if (encoder->variables.var_cnt)
+			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), percpu_var_cmp);
 
 		if (encoder->verbose)
-			printf("Found %d per-CPU variables!\n", encoder->percpu.var_cnt);
+			printf("Found %d per-CPU variables!\n", encoder->variables.var_cnt);
 	}
 
 	if (encoder->functions.cnt) {
@@ -1288,7 +1289,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 	struct tag *pos;
 	int err = -1;
 
-	if (encoder->percpu.shndx == 0 || !encoder->symtab)
+	if (encoder->variables.percpu_shndx == 0 || !encoder->symtab)
 		return 0;
 
 	if (encoder->verbose)
@@ -1318,9 +1319,10 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 		 * always contains virtual symbol addresses, so subtract
 		 * the section address unconditionally.
 		 */
-		if (addr < encoder->percpu.base_addr || addr >= encoder->percpu.base_addr + encoder->percpu.sec_sz)
+		if (addr < encoder->variables.percpu_base_addr ||
+		    addr >= encoder->variables.percpu_base_addr + encoder->variables.percpu_sec_sz)
 			continue;
-		addr -= encoder->percpu.base_addr;
+		addr -= encoder->variables.percpu_base_addr;
 
 		if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name))
 			continue; /* not a per-CPU variable */
@@ -1468,9 +1470,9 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 			if (encoder->verbose)
 				printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
 		} else {
-			encoder->percpu.shndx	  = elf_ndxscn(sec);
-			encoder->percpu.base_addr = shdr.sh_addr;
-			encoder->percpu.sec_sz	  = shdr.sh_size;
+			encoder->variables.percpu_shndx     = elf_ndxscn(sec);
+			encoder->variables.percpu_base_addr = shdr.sh_addr;
+			encoder->variables.percpu_sec_sz    = shdr.sh_size;
 		}
 
 		if (btf_encoder__collect_symbols(encoder, !encoder->skip_encoding_vars))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 dwarves 3/9] btf_encoder: cache all ELF section info
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 1/9] dutil: return ELF section name when looked up by index Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 2/9] btf_encoder: Rename percpu structures to variables Stephen Brennan
@ 2022-11-04 23:10 ` Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 4/9] btf_encoder: make the variable array dynamic Stephen Brennan
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:10 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

To handle outputting all variables generally, we'll need to store more
section data. Create a table of ELF sections so we can refer to all the
cached data.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 68 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index c914647..e0e038b 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -38,6 +38,7 @@ struct elf_function {
 };
 
 #define MAX_VAR_CNT 4096
+#define MAX_ELF_SEC_CNT    128
 
 struct var_info {
 	uint64_t    addr;
@@ -45,6 +46,13 @@ struct var_info {
 	uint32_t    sz;
 };
 
+struct elf_secinfo {
+	uint64_t	addr;
+	const char	*name;
+	uint64_t	sz;
+	bool		include;
+};
+
 /*
  * cu: cu being processed.
  */
@@ -64,12 +72,12 @@ struct btf_encoder {
 			  gen_floats,
 			  is_rel;
 	uint32_t	  array_index_id;
+	struct elf_secinfo secinfo[MAX_ELF_SEC_CNT];
+	size_t		  seccnt;
 	struct {
 		struct var_info vars[MAX_VAR_CNT];
 		int		var_cnt;
 		uint32_t	percpu_shndx;
-		uint64_t	percpu_base_addr;
-		uint64_t	percpu_sec_sz;
 	} variables;
 	struct {
 		struct elf_function *entries;
@@ -1216,12 +1224,13 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	if (encoder->verbose)
 		printf("Found per-CPU symbol '%s' at address 0x%" PRIx64 "\n", sym_name, addr);
 
-	/* Make sure addr is section-relative. For kernel modules (which are
-	 * ET_REL files) this is already the case. For vmlinux (which is an
-	 * ET_EXEC file) we need to subtract the section address.
+	/* Make sure addr is absolute, so that we can compare it to DWARF
+	 * absolute addresses. We can compute to section-relative addresses
+	 * when necessary.
 	 */
-	if (!encoder->is_rel)
-		addr -= encoder->variables.percpu_base_addr;
+	if (encoder->is_rel && sym->st_shndx < encoder->seccnt) {
+		addr += encoder->secinfo[sym->st_shndx].addr;
+	}
 
 	if (encoder->variables.var_cnt == MAX_VAR_CNT) {
 		fprintf(stderr, "Reached the limit of per-CPU variables: %d\n",
@@ -1288,6 +1297,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 	uint32_t core_id;
 	struct tag *pos;
 	int err = -1;
+	struct elf_secinfo *pcpu_scn = &encoder->secinfo[encoder->variables.percpu_shndx];
 
 	if (encoder->variables.percpu_shndx == 0 || !encoder->symtab)
 		return 0;
@@ -1315,14 +1325,9 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 		addr = var->ip.addr;
 		dwarf_name = variable__name(var);
 
-		/* Make sure addr is section-relative. DWARF, unlike ELF,
-		 * always contains virtual symbol addresses, so subtract
-		 * the section address unconditionally.
-		 */
-		if (addr < encoder->variables.percpu_base_addr ||
-		    addr >= encoder->variables.percpu_base_addr + encoder->variables.percpu_sec_sz)
+		/* Make sure addr is in the percpu section */
+		if (addr < pcpu_scn->addr || addr >= pcpu_scn->addr + pcpu_scn->sz)
 			continue;
-		addr -= encoder->variables.percpu_base_addr;
 
 		if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name))
 			continue; /* not a per-CPU variable */
@@ -1397,7 +1402,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 		 * add a BTF_VAR_SECINFO in encoder->percpu_secinfo, which will be added into
 		 * encoder->types later when we add BTF_VAR_DATASEC.
 		 */
-		id = btf_encoder__add_var_secinfo(encoder, id, addr, size);
+		id = btf_encoder__add_var_secinfo(encoder, id, addr - pcpu_scn->addr, size);
 		if (id < 0) {
 			fprintf(stderr, "error: failed to encode section info for variable '%s' at addr 0x%" PRIx64 "\n",
 			        name, addr);
@@ -1461,20 +1466,35 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 			goto out;
 		}
 
-		/* find percpu section's shndx */
+		/* index the ELF sections for later lookup */
 
 		GElf_Shdr shdr;
-		Elf_Scn *sec = elf_section_by_name(cu->elf, &shdr, PERCPU_SECTION, NULL);
+		size_t shndx;
+		if (elf_getshdrnum(cu->elf, &encoder->seccnt))
+			goto out_delete;
+		if (encoder->seccnt >= MAX_ELF_SEC_CNT) {
+			fprintf(stderr, "%s: reached limit of ELF sections\n", __func__);
+			goto out_delete;
+		}
 
-		if (!sec) {
-			if (encoder->verbose)
-				printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
-		} else {
-			encoder->variables.percpu_shndx     = elf_ndxscn(sec);
-			encoder->variables.percpu_base_addr = shdr.sh_addr;
-			encoder->variables.percpu_sec_sz    = shdr.sh_size;
+		for (shndx = 0; shndx < encoder->seccnt; shndx++) {
+			const char *secname = NULL;
+			Elf_Scn *sec = elf_section_by_idx(cu->elf, &shdr, shndx, &secname);
+			if (!sec)
+				goto out_delete;
+			encoder->secinfo[shndx].addr = shdr.sh_addr;
+			encoder->secinfo[shndx].sz = shdr.sh_size;
+			encoder->secinfo[shndx].name = secname;
+
+			if (strcmp(secname, PERCPU_SECTION) == 0) {
+				encoder->variables.percpu_shndx = shndx;
+				encoder->secinfo[shndx].include = true;
+			}
 		}
 
+		if (!encoder->variables.percpu_shndx && encoder->verbose)
+			printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
+
 		if (btf_encoder__collect_symbols(encoder, !encoder->skip_encoding_vars))
 			goto out_delete;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 dwarves 4/9] btf_encoder: make the variable array dynamic
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
                   ` (2 preceding siblings ...)
  2022-11-04 23:10 ` [PATCH v2 dwarves 3/9] btf_encoder: cache all ELF section info Stephen Brennan
@ 2022-11-04 23:10 ` Stephen Brennan
  2022-11-04 23:10 ` [PATCH v2 dwarves 5/9] btf_encoder: record ELF section for collected variables Stephen Brennan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:10 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

To collect all variables, we need a dynamically allocated array to scale
with any amount.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index e0e038b..5cd247d 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -37,7 +37,6 @@ struct elf_function {
 	bool		 generated;
 };
 
-#define MAX_VAR_CNT 4096
 #define MAX_ELF_SEC_CNT    128
 
 struct var_info {
@@ -75,8 +74,9 @@ struct btf_encoder {
 	struct elf_secinfo secinfo[MAX_ELF_SEC_CNT];
 	size_t		  seccnt;
 	struct {
-		struct var_info vars[MAX_VAR_CNT];
+		struct var_info *vars;
 		int		var_cnt;
+		int		var_alloc;
 		uint32_t	percpu_shndx;
 	} variables;
 	struct {
@@ -1206,6 +1206,15 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	if (elf_sym__type(sym) != STT_OBJECT)
 		return 0;
 
+	if (encoder->variables.var_cnt == encoder->variables.var_alloc) {
+		struct var_info *new;
+		encoder->variables.var_alloc = max(1000, encoder->variables.var_alloc * 3 / 2);
+		new = realloc(encoder->variables.vars, encoder->variables.var_alloc * sizeof(*new));
+		if (!new)
+			return -1;
+		encoder->variables.vars = new;
+	}
+
 	addr = elf_sym__value(sym);
 
 	size = elf_sym__size(sym);
@@ -1232,11 +1241,6 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 		addr += encoder->secinfo[sym->st_shndx].addr;
 	}
 
-	if (encoder->variables.var_cnt == MAX_VAR_CNT) {
-		fprintf(stderr, "Reached the limit of per-CPU variables: %d\n",
-			MAX_VAR_CNT);
-		return -1;
-	}
 	encoder->variables.vars[encoder->variables.var_cnt].addr = addr;
 	encoder->variables.vars[encoder->variables.var_cnt].sz = size;
 	encoder->variables.vars[encoder->variables.var_cnt].name = sym_name;
@@ -1520,6 +1524,10 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 	encoder->btf = NULL;
 	elf_symtab__delete(encoder->symtab);
 
+	encoder->variables.var_cnt = encoder->variables.var_alloc = 0;
+	free(encoder->variables.vars);
+	encoder->variables.vars = NULL;
+
 	encoder->functions.allocated = encoder->functions.cnt = 0;
 	free(encoder->functions.entries);
 	encoder->functions.entries = NULL;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 dwarves 5/9] btf_encoder: record ELF section for collected variables
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
                   ` (3 preceding siblings ...)
  2022-11-04 23:10 ` [PATCH v2 dwarves 4/9] btf_encoder: make the variable array dynamic Stephen Brennan
@ 2022-11-04 23:10 ` Stephen Brennan
  2022-11-04 23:11 ` [PATCH v2 dwarves 6/9] btf_encoder: collect all variables Stephen Brennan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:10 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 5cd247d..d1f2f38 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -43,6 +43,7 @@ struct var_info {
 	uint64_t    addr;
 	const char *name;
 	uint32_t    sz;
+	uint32_t    shndx;
 };
 
 struct elf_secinfo {
@@ -1194,7 +1195,7 @@ static bool btf_encoder__percpu_var_exists(struct btf_encoder *encoder, uint64_t
 	return true;
 }
 
-static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym *sym, size_t sym_sec_idx)
+static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym *sym, uint32_t sym_sec_idx)
 {
 	const char *sym_name;
 	uint64_t addr;
@@ -1244,6 +1245,7 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	encoder->variables.vars[encoder->variables.var_cnt].addr = addr;
 	encoder->variables.vars[encoder->variables.var_cnt].sz = size;
 	encoder->variables.vars[encoder->variables.var_cnt].name = sym_name;
+	encoder->variables.vars[encoder->variables.var_cnt].shndx = sym_sec_idx;
 	encoder->variables.var_cnt++;
 
 	return 0;
@@ -1251,7 +1253,7 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 
 static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collect_percpu_vars)
 {
-	Elf32_Word sym_sec_idx;
+	uint32_t sym_sec_idx;
 	uint32_t core_id;
 	GElf_Sym sym;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 dwarves 6/9] btf_encoder: collect all variables
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
                   ` (4 preceding siblings ...)
  2022-11-04 23:10 ` [PATCH v2 dwarves 5/9] btf_encoder: record ELF section for collected variables Stephen Brennan
@ 2022-11-04 23:11 ` Stephen Brennan
  2022-11-04 23:11 ` [PATCH v2 dwarves 7/9] btf_encoder: allow encoding " Stephen Brennan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:11 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

Prior to this, we only collected ELF symbols which were in the percpu
section. Now, collect all ELF symbols of type STT_OBJECT, and rely on
the setting of "include" in the elf_secinfo to decide whether to include
in the final output.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 70 +++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index d1f2f38..f7acc9a 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1172,7 +1172,7 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 	return err;
 }
 
-static int percpu_var_cmp(const void *_a, const void *_b)
+static int var_cmp(const void *_a, const void *_b)
 {
 	const struct var_info *a = _a;
 	const struct var_info *b = _b;
@@ -1182,28 +1182,24 @@ static int percpu_var_cmp(const void *_a, const void *_b)
 	return a->addr < b->addr ? -1 : 1;
 }
 
-static bool btf_encoder__percpu_var_exists(struct btf_encoder *encoder, uint64_t addr, uint32_t *sz, const char **name)
+static bool btf_encoder__var_exists(struct btf_encoder *encoder, uint64_t addr, struct var_info **info)
 {
 	struct var_info key = { .addr = addr };
-	const struct var_info *p = bsearch(&key, encoder->variables.vars, encoder->variables.var_cnt,
-					   sizeof(encoder->variables.vars[0]), percpu_var_cmp);
+	struct var_info *p = bsearch(&key, encoder->variables.vars, encoder->variables.var_cnt,
+				     sizeof(encoder->variables.vars[0]), var_cmp);
 	if (!p)
 		return false;
 
-	*sz = p->sz;
-	*name = p->name;
+	*info = p;
 	return true;
 }
 
-static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym *sym, uint32_t sym_sec_idx)
+static int btf_encoder__collect_var(struct btf_encoder *encoder, GElf_Sym *sym, uint32_t sym_sec_idx)
 {
 	const char *sym_name;
 	uint64_t addr;
 	uint32_t size;
 
-	/* compare a symbol's shndx to determine if it's a percpu variable */
-	if (sym_sec_idx != encoder->variables.percpu_shndx)
-		return 0;
 	if (elf_sym__type(sym) != STT_OBJECT)
 		return 0;
 
@@ -1262,7 +1258,7 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 
 	/* search within symtab for percpu variables */
 	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
-		if (collect_percpu_vars && btf_encoder__collect_percpu_var(encoder, &sym, sym_sec_idx))
+		if (collect_percpu_vars && btf_encoder__collect_var(encoder, &sym, sym_sec_idx))
 			return -1;
 		if (btf_encoder__collect_function(encoder, &sym))
 			return -1;
@@ -1270,7 +1266,7 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 
 	if (collect_percpu_vars) {
 		if (encoder->variables.var_cnt)
-			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), percpu_var_cmp);
+			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), var_cmp);
 
 		if (encoder->verbose)
 			printf("Found %d per-CPU variables!\n", encoder->variables.var_cnt);
@@ -1313,17 +1309,19 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 
 	cu__for_each_variable(cu, core_id, pos) {
 		struct variable *var = tag__variable(pos);
-		uint32_t size, type, linkage;
-		const char *name, *dwarf_name;
+		uint32_t type, linkage;
+		const char *dwarf_name;
 		struct llvm_annotation *annot;
 		const struct tag *tag;
+		struct var_info *info;
+		struct elf_secinfo *sec = NULL;
 		uint64_t addr;
 		int id;
 
 		if (var->declaration && !var->spec)
 			continue;
 
-		/* percpu variables are allocated in global space */
+		/* we want global variables, or those with a definition */
 		if (variable__scope(var) != VSCOPE_GLOBAL && !var->spec)
 			continue;
 
@@ -1331,13 +1329,16 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 		addr = var->ip.addr;
 		dwarf_name = variable__name(var);
 
-		/* Make sure addr is in the percpu section */
-		if (addr < pcpu_scn->addr || addr >= pcpu_scn->addr + pcpu_scn->sz)
-			continue;
-
-		if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name))
+		if (!btf_encoder__var_exists(encoder, addr, &info))
 			continue; /* not a per-CPU variable */
 
+		/* Get the ELF section info */
+		if (info->shndx && info->shndx < encoder->seccnt)
+			sec = &encoder->secinfo[info->shndx];
+		/* Only continue if the section is to be included */
+		if (!sec || !sec->include)
+			continue;
+
 		/* A lot of "special" DWARF variables (e.g, __UNIQUE_ID___xxx)
 		 * have addr == 0, which is the same as, say, valid
 		 * fixed_percpu_data per-CPU variable. To distinguish between
@@ -1356,7 +1357,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 		 *  per-CPU symbols have non-zero values.
 		 */
 		if (var->ip.addr == 0) {
-			if (!dwarf_name || strcmp(dwarf_name, name))
+			if (!dwarf_name || strcmp(dwarf_name, info->name))
 				continue;
 		}
 
@@ -1365,7 +1366,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 
 		if (var->ip.tag.type == 0) {
 			fprintf(stderr, "error: found variable '%s' in CU '%s' that has void type\n",
-				name, cu->name);
+				info->name, cu->name);
 			if (encoder->force)
 				continue;
 			err = -1;
@@ -1375,7 +1376,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 		tag = cu__type(cu, var->ip.tag.type);
 		if (tag__size(tag, cu) == 0) {
 			if (encoder->verbose)
-				fprintf(stderr, "Ignoring zero-sized per-CPU variable '%s'...\n", dwarf_name ?: "<missing name>");
+				fprintf(stderr, "Ignoring zero-sized variable '%s'...\n", dwarf_name ?: "<missing name>");
 			continue;
 		}
 
@@ -1384,14 +1385,14 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 
 		if (encoder->verbose) {
 			printf("Variable '%s' from CU '%s' at address 0x%" PRIx64 " encoded\n",
-			       name, cu->name, addr);
+			       info->name, cu->name, addr);
 		}
 
 		/* add a BTF_KIND_VAR in encoder->types */
-		id = btf_encoder__add_var(encoder, type, name, linkage);
+		id = btf_encoder__add_var(encoder, type, info->name, linkage);
 		if (id < 0) {
 			fprintf(stderr, "error: failed to encode variable '%s' at addr 0x%" PRIx64 "\n",
-			        name, addr);
+			        info->name, addr);
 			goto out;
 		}
 
@@ -1399,20 +1400,23 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 			int tag_type_id = btf_encoder__add_decl_tag(encoder, annot->value, id, annot->component_idx);
 			if (tag_type_id < 0) {
 				fprintf(stderr, "error: failed to encode tag '%s' to variable '%s' with component_idx %d\n",
-					annot->value, name, annot->component_idx);
+					annot->value, info->name, annot->component_idx);
 				goto out;
 			}
 		}
 
 		/*
-		 * add a BTF_VAR_SECINFO in encoder->percpu_secinfo, which will be added into
+		 * For percpu variables, add a BTF_VAR_SECINFO in
+		 * encoder->percpu_secinfo, which will be added into
 		 * encoder->types later when we add BTF_VAR_DATASEC.
 		 */
-		id = btf_encoder__add_var_secinfo(encoder, id, addr - pcpu_scn->addr, size);
-		if (id < 0) {
-			fprintf(stderr, "error: failed to encode section info for variable '%s' at addr 0x%" PRIx64 "\n",
-			        name, addr);
-			goto out;
+		if (info->shndx == encoder->variables.percpu_shndx) {
+			id = btf_encoder__add_var_secinfo(encoder, id, addr - pcpu_scn->addr, info->sz);
+			if (id < 0) {
+				fprintf(stderr, "error: failed to encode section info for variable '%s' at addr 0x%" PRIx64 "\n",
+					info->name, addr);
+				goto out;
+			}
 		}
 	}
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 dwarves 7/9] btf_encoder: allow encoding all variables
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
                   ` (5 preceding siblings ...)
  2022-11-04 23:11 ` [PATCH v2 dwarves 6/9] btf_encoder: collect all variables Stephen Brennan
@ 2022-11-04 23:11 ` Stephen Brennan
  2022-11-04 23:11 ` [PATCH v2 bpf 8/9] libbpf: btf dedup identical struct test needs check for nested structs/arrays Stephen Brennan
  2022-11-04 23:11 ` [PATCH v2 kernel 9/9] bpf: add support for CONFIG_DEBUG_INFO_BTF_VARS Stephen Brennan
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:11 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

Add the --encode_all_btf_vars option, which conflicts with
--skip_encoding_btf_vars, and will enable encoding all variables which
have a corresponding STT_OBJECT match in the ELF symbol table. Rework
the btf_encoder_new() signature to include a single enum to specify
which kinds of variables are allowed, and add the necessary logic to
select variables.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c      | 22 +++++++++++++---------
 btf_encoder.h      |  8 +++++++-
 man-pages/pahole.1 |  6 +++++-
 pahole.c           | 30 ++++++++++++++++++++++++------
 4 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index f7acc9a..b3ede15 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -65,7 +65,6 @@ struct btf_encoder {
 	struct elf_symtab *symtab;
 	bool		  has_index_type,
 			  need_index_type,
-			  skip_encoding_vars,
 			  raw_output,
 			  verbose,
 			  force,
@@ -74,6 +73,7 @@ struct btf_encoder {
 	uint32_t	  array_index_id;
 	struct elf_secinfo secinfo[MAX_ELF_SEC_CNT];
 	size_t		  seccnt;
+	enum btf_var_option encode_vars;
 	struct {
 		struct var_info *vars;
 		int		var_cnt;
@@ -1247,24 +1247,25 @@ static int btf_encoder__collect_var(struct btf_encoder *encoder, GElf_Sym *sym,
 	return 0;
 }
 
-static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collect_percpu_vars)
+static int btf_encoder__collect_symbols(struct btf_encoder *encoder)
 {
 	uint32_t sym_sec_idx;
 	uint32_t core_id;
 	GElf_Sym sym;
+	bool collect_vars = (encoder->encode_vars != BTF_VAR_NONE);
 
 	/* cache variables' addresses, preparing for searching in symtab. */
 	encoder->variables.var_cnt = 0;
 
 	/* search within symtab for percpu variables */
 	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
-		if (collect_percpu_vars && btf_encoder__collect_var(encoder, &sym, sym_sec_idx))
+		if (collect_vars && btf_encoder__collect_var(encoder, &sym, sym_sec_idx))
 			return -1;
 		if (btf_encoder__collect_function(encoder, &sym))
 			return -1;
 	}
 
-	if (collect_percpu_vars) {
+	if (collect_vars) {
 		if (encoder->variables.var_cnt)
 			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), var_cmp);
 
@@ -1425,7 +1426,7 @@ out:
 	return err;
 }
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose)
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, enum btf_var_option vars, bool force, bool gen_floats, bool verbose)
 {
 	struct btf_encoder *encoder = zalloc(sizeof(*encoder));
 
@@ -1441,11 +1442,11 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 
 		encoder->force		 = force;
 		encoder->gen_floats	 = gen_floats;
-		encoder->skip_encoding_vars = skip_encoding_vars;
 		encoder->verbose	 = verbose;
 		encoder->has_index_type  = false;
 		encoder->need_index_type = false;
 		encoder->array_index_id  = 0;
+		encoder->encode_vars	 = vars;
 
 		GElf_Ehdr ehdr;
 
@@ -1495,17 +1496,20 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 			encoder->secinfo[shndx].addr = shdr.sh_addr;
 			encoder->secinfo[shndx].sz = shdr.sh_size;
 			encoder->secinfo[shndx].name = secname;
+			if (encoder->encode_vars == BTF_VAR_ALL)
+				encoder->secinfo[shndx].include = true;
 
 			if (strcmp(secname, PERCPU_SECTION) == 0) {
 				encoder->variables.percpu_shndx = shndx;
-				encoder->secinfo[shndx].include = true;
+				if (encoder->encode_vars != BTF_VAR_NONE)
+					encoder->secinfo[shndx].include = true;
 			}
 		}
 
 		if (!encoder->variables.percpu_shndx && encoder->verbose)
 			printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
 
-		if (btf_encoder__collect_symbols(encoder, !encoder->skip_encoding_vars))
+		if (btf_encoder__collect_symbols(encoder))
 			goto out_delete;
 
 		if (encoder->verbose)
@@ -1671,7 +1675,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 		}
 	}
 
-	if (!encoder->skip_encoding_vars)
+	if (encoder->encode_vars != BTF_VAR_NONE)
 		err = btf_encoder__encode_cu_variables(encoder, type_id_off);
 out:
 	encoder->cu = NULL;
diff --git a/btf_encoder.h b/btf_encoder.h
index a65120c..e03c7cc 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -16,7 +16,13 @@ struct btf;
 struct cu;
 struct list_head;
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose);
+enum btf_var_option {
+	BTF_VAR_NONE,
+	BTF_VAR_PERCPU,
+	BTF_VAR_ALL,
+};
+
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, enum btf_var_option vars, bool force, bool gen_floats, bool verbose);
 void btf_encoder__delete(struct btf_encoder *encoder);
 
 int btf_encoder__encode(struct btf_encoder *encoder);
diff --git a/man-pages/pahole.1 b/man-pages/pahole.1
index 7460104..e7f5ab5 100644
--- a/man-pages/pahole.1
+++ b/man-pages/pahole.1
@@ -215,7 +215,11 @@ the debugging information.
 
 .TP
 .B \-\-skip_encoding_btf_vars
-Do not encode VARs in BTF.
+TQ
+.B \-\-encode_all_btf_vars
+By default, VARs are encoded only for percpu variables. These options allow
+to skip encoding them, or to encode all variables regardless of whether they are
+percpu. These options are mutually exclusive.
 
 .TP
 .B \-\-skip_encoding_btf_decl_tag
diff --git a/pahole.c b/pahole.c
index 4ddf21f..6ff4e22 100644
--- a/pahole.c
+++ b/pahole.c
@@ -37,7 +37,7 @@ static bool ctf_encode;
 static bool sort_output;
 static bool need_resort;
 static bool first_obj_only;
-static bool skip_encoding_btf_vars;
+static enum btf_var_option encode_btf_vars = BTF_VAR_PERCPU;
 static bool btf_encode_force;
 static const char *base_btf_file;
 
@@ -1222,6 +1222,7 @@ ARGP_PROGRAM_VERSION_HOOK_DEF = dwarves_print_version;
 #define ARGP_languages_exclude	   336
 #define ARGP_skip_encoding_btf_enum64 337
 #define ARGP_skip_emitting_atomic_typedefs 338
+#define ARGP_encode_all_btf_vars   339
 
 static const struct argp_option pahole__options[] = {
 	{
@@ -1533,7 +1534,12 @@ static const struct argp_option pahole__options[] = {
 	{
 		.name = "skip_encoding_btf_vars",
 		.key  = ARGP_skip_encoding_btf_vars,
-		.doc  = "Do not encode VARs in BTF."
+		.doc  = "Do not encode any VARs in BTF (default: only percpu)."
+	},
+	{
+		.name = "encode_all_btf_vars",
+		.key  = ARGP_encode_all_btf_vars,
+		.doc  = "Encode all VARs in BTF (default: only percpu)."
 	},
 	{
 		.name = "btf_encode_force",
@@ -1763,8 +1769,6 @@ static error_t pahole__options_parser(int key, char *arg,
 		conf.range = arg;			break;
 	case ARGP_header_type:
 		conf.header_type = arg;			break;
-	case ARGP_skip_encoding_btf_vars:
-		skip_encoding_btf_vars = true;		break;
 	case ARGP_btf_encode_force:
 		btf_encode_force = true;		break;
 	case ARGP_btf_base:
@@ -1803,6 +1807,20 @@ static error_t pahole__options_parser(int key, char *arg,
 		conf_load.skip_encoding_btf_enum64 = true;	break;
 	case ARGP_skip_emitting_atomic_typedefs:
 		conf.skip_emitting_atomic_typedefs = true;	break;
+	case ARGP_skip_encoding_btf_vars:
+		if (encode_btf_vars != BTF_VAR_PERCPU) {
+			fprintf(stderr, "error: --encode_all_btf_vars and --skip_encoding_btf_vars are mutually exclusive\n");
+			return ARGP_HELP_SEE;
+		}
+		encode_btf_vars = BTF_VAR_NONE;
+		break;
+	case ARGP_encode_all_btf_vars:
+		if (encode_btf_vars != BTF_VAR_PERCPU) {
+			fprintf(stderr, "error: --encode_all_btf_vars and --skip_encoding_btf_vars are mutually exclusive\n");
+			return ARGP_HELP_SEE;
+		}
+		encode_btf_vars = BTF_VAR_ALL;
+		break;
 	default:
 		return ARGP_ERR_UNKNOWN;
 	}
@@ -3037,7 +3055,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			 * And, it is used by the thread
 			 * create it.
 			 */
-			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
+			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, encode_btf_vars,
 						       btf_encode_force, btf_gen_floats, global_verbose);
 			if (btf_encoder && thr_data) {
 				struct thread_data *thread = thr_data;
@@ -3067,7 +3085,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 				thread->encoder =
 					btf_encoder__new(cu, detached_btf_filename,
 							 NULL,
-							 skip_encoding_btf_vars,
+							 encode_btf_vars,
 							 btf_encode_force,
 							 btf_gen_floats,
 							 global_verbose);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 bpf 8/9] libbpf: btf dedup identical struct test needs check for nested structs/arrays
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
                   ` (6 preceding siblings ...)
  2022-11-04 23:11 ` [PATCH v2 dwarves 7/9] btf_encoder: allow encoding " Stephen Brennan
@ 2022-11-04 23:11 ` Stephen Brennan
  2022-11-04 23:11 ` [PATCH v2 kernel 9/9] bpf: add support for CONFIG_DEBUG_INFO_BTF_VARS Stephen Brennan
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:11 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

From: Alan Maguire <alan.maguire@oracle.com>

When examining module BTF, it is common to see core kernel structures
such as sk_buff, net_device duplicated in the module.  After adding
debug messaging to BTF it turned out that much of the problem
was down to the identical struct test failing during deduplication;
sometimes the compiler adds identical structs.  However
it turns out sometimes that type ids of identical struct members
can also differ, even when the containing structs are still identical.

To take an example, for struct sk_buff, debug messaging revealed
that the identical struct matching was failing for the anon
struct "headers"; specifically for the first field:

__u8       __pkt_type_offset[0]; /*   128     0 */

Looking at the code in BTF deduplication, we have code that guards
against the possibility of identical struct definitions, down to
type ids, and identical array definitions.  However in this case
we have a struct which is being defined twice but does not have
identical type ids since each duplicate struct has separate type
ids for the above array member.   A similar problem (though not
observed) could occur for struct-in-struct.

The solution is to make the "identical struct" test check members
not just for matching ids, but to also check if they in turn are
identical structs or arrays.

The results of doing this are quite dramatic (for some modules
at least); I see the number of type ids drop from around 10000
to just over 1000 in one module for example.

For testing use latest pahole or apply [1], otherwise dedups
can fail for the reasons described there.

Also fix return type of btf_dedup_identical_arrays() as
suggested by Andrii to match boolean return type used
elsewhere.

Fixes: efdd3eb8015e ("libbpf: Accommodate DWARF/compiler bug with duplicated structs")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>

[1] https://lore.kernel.org/bpf/1666364523-9648-1-git-send-email-alan.maguire
---
 tools/lib/bpf/btf.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index d88647d..675a0df 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -3887,14 +3887,14 @@ static inline __u16 btf_fwd_kind(struct btf_type *t)
 }
 
 /* Check if given two types are identical ARRAY definitions */
-static int btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2)
+static bool btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2)
 {
 	struct btf_type *t1, *t2;
 
 	t1 = btf_type_by_id(d->btf, id1);
 	t2 = btf_type_by_id(d->btf, id2);
 	if (!btf_is_array(t1) || !btf_is_array(t2))
-		return 0;
+		return false;
 
 	return btf_equal_array(t1, t2);
 }
@@ -3918,7 +3918,9 @@ static bool btf_dedup_identical_structs(struct btf_dedup *d, __u32 id1, __u32 id
 	m1 = btf_members(t1);
 	m2 = btf_members(t2);
 	for (i = 0, n = btf_vlen(t1); i < n; i++, m1++, m2++) {
-		if (m1->type != m2->type)
+		if (m1->type != m2->type &&
+		    !btf_dedup_identical_arrays(d, m1->type, m2->type) &&
+		    !btf_dedup_identical_structs(d, m1->type, m2->type))
 			return false;
 	}
 	return true;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 kernel 9/9] bpf: add support for CONFIG_DEBUG_INFO_BTF_VARS
  2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
                   ` (7 preceding siblings ...)
  2022-11-04 23:11 ` [PATCH v2 bpf 8/9] libbpf: btf dedup identical struct test needs check for nested structs/arrays Stephen Brennan
@ 2022-11-04 23:11 ` Stephen Brennan
  8 siblings, 0 replies; 10+ messages in thread
From: Stephen Brennan @ 2022-11-04 23:11 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, alan.maguire, Andrii Nakryiko

From: Alan Maguire <alan.maguire@oracle.com>

This patch implements a proof-of-concept of the kernel side of
global variable support discussed in [1].

Add new tristate CONFIG_DEBUG_INFO_BTF_VARS; when it is 'y'
both per-CPU and global variables are added to vmlinux BTF.
When set to 'm',  variable BTF is added to module
kernel/bpf/vmlinux_btf_extra.ko; as a result, global variables
will be available in /sys/kernel/btf/vmlinux_btf_extra using
split BTF to store variables.

To achieve this, we add a "target" option to scripts/pahole-flags.sh
which - if set to "extra" - gives us the flags to be used for
extra vmlinux BTF generation.  Module building and BTF generation
are skipped if CONFIG_DEBUG_INFO_BTF_VARS is 'y' or 'n'.

Depends on having a v1.24 of dwarves which provides support for
encoding all variables as per patch 1-7 of [1].  To put the pieces
together, apply those patches to pahole.  Eventually, like other
functionality it would likely require dependending on a newer
version of pahole like v1.25.

CMakeLists.txt before building so that the pahole version can be
a proxy for detecting the "encode all variables" feature.

Prior to fixes [2] and [3] dedup failed and was highly
redundant, but with those fixes in pahole and libbpf we see
that in the module case, vmlinux_btf_extra.ko only defines
two non-VAR kinds; one for the percpu DATASEC and one for
the 'double' type:

$ bpftool btf dump -B ~/src/bpf/vmlinux file ~/src/bpf/kernel/bpf/vmlinux_btf_extra.ko|egrep -v VAR
[115981] INT 'double' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[159904] DATASEC '.data..percpu' size=209156 vlen=378

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>

[1] TBD (requires resyncing patches with dwarves)
[2] https://lore.kernel.org/bpf/1666364523-9648-1-git-send-email-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@oracle.com/
---
 Makefile                       |  3 ++-
 kernel/bpf/Makefile            |  4 ++++
 kernel/bpf/vmlinux_btf_extra.c | 22 ++++++++++++++++++++
 lib/Kconfig.debug              |  9 ++++++++
 scripts/Makefile.modfinal      |  6 ++++++
 scripts/pahole-flags.sh        | 38 ++++++++++++++++++++++++++++++++++
 6 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/vmlinux_btf_extra.c

diff --git a/Makefile b/Makefile
index f659d3085121..7cc99a424a85 100644
--- a/Makefile
+++ b/Makefile
@@ -527,6 +527,7 @@ XZ		= xz
 ZSTD		= zstd
 
 PAHOLE_FLAGS	= $(shell PAHOLE=$(PAHOLE) $(srctree)/scripts/pahole-flags.sh)
+EXTRA_PAHOLE_FLAGS = $(shell PAHOLE=$(PAHOLE) $(srctree)/scripts/pahole-flags.sh extra)
 
 CHECKFLAGS     := -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ \
 		  -Wbitwise -Wno-return-void -Wno-unknown-attribute $(CF)
@@ -614,7 +615,7 @@ export KBUILD_RUSTFLAGS RUSTFLAGS_KERNEL RUSTFLAGS_MODULE
 export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
 export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_RUSTFLAGS_MODULE KBUILD_LDFLAGS_MODULE
 export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL KBUILD_RUSTFLAGS_KERNEL
-export PAHOLE_FLAGS
+export PAHOLE_FLAGS EXTRA_PAHOLE_FLAGS
 
 # Files to ignore in find ... statements
 
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 341c94f208f4..8e3ee5c98f23 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -43,3 +43,7 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
 	$(call if_changed_rule,cc_o_c)
+
+ifeq ($(CONFIG_DEBUG_INFO_BTF_VARS),m)
+obj-m = vmlinux_btf_extra.o
+endif
diff --git a/kernel/bpf/vmlinux_btf_extra.c b/kernel/bpf/vmlinux_btf_extra.c
new file mode 100644
index 000000000000..9aa682287a1e
--- /dev/null
+++ b/kernel/bpf/vmlinux_btf_extra.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022, Oracle and/or its affiliates. */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* Dummy module used as a container for extra vmlinux BTF information;
+ * to be used if vmlinux BTF size is a concern.
+ */
+static int __init vmlinux_btf_extra_init(void)
+{
+	return 0;
+}
+module_init(vmlinux_btf_extra_init);
+
+static void __exit vmlinux_btf_extra_exit(void)
+{
+	return;
+}
+module_exit(vmlinux_btf_extra_exit);
+
+MODULE_LICENSE("GPL v2");
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f473f7d8a0a2..3c7df82a37db 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -364,6 +364,15 @@ config DEBUG_INFO_BTF_MODULES
 	help
 	  Generate compact split BTF type information for kernel modules.
 
+config DEBUG_INFO_BTF_VARS
+	tristate "Encode kernel variables in BTF"
+	depends on DEBUG_INFO_BTF && PAHOLE_VERSION >= 124
+	help
+	   Decide whether pahole emits variable definitions for all
+	   variables.  If 'm', variables are stored in vmlinux-btf-extra
+	   module, which has BTF for variables only (it is deduplicated
+	   with vmlinux BTF).
+
 config MODULE_ALLOW_BTF_MISMATCH
 	bool "Allow loading modules with non-matching BTF type info"
 	depends on DEBUG_INFO_BTF_MODULES
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 9a1fa6aa30fe..210d7869c14f 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -43,6 +43,12 @@ quiet_cmd_btf_ko = BTF [M] $@
 		printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
 	elif [ -n "$(CONFIG_RUST)" ] && $(srctree)/scripts/is_rust_module.sh $@; then 		\
 		printf "Skipping BTF generation for %s because it's a Rust module\n" $@ 1>&2; \
+	elif [ $@ == "kernel/bpf/vmlinux_btf_extra.ko" ]; then		\
+		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J			\
+		    --btf_encode_detached=$@.btf --btf_base vmlinux	\
+		    $(EXTRA_PAHOLE_FLAGS) vmlinux;			\
+		$(OBJCOPY) --add-section .BTF=$(@).btf			\
+		    --set-section-flags .BTF=alloc,readonly $(@);	\
 	else								\
 		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \
 		$(RESOLVE_BTFIDS) -b vmlinux $@; 			\
diff --git a/scripts/pahole-flags.sh b/scripts/pahole-flags.sh
index 0d99ef17e4a5..44bc714fe926 100755
--- a/scripts/pahole-flags.sh
+++ b/scripts/pahole-flags.sh
@@ -1,6 +1,16 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 
+config_value() {
+	awk -v name=$1 -F '=' '$1 == name { print $2 }' include/config/auto.conf
+}
+
+# target is set to "extra" for vmlinux_btf_extra module encoding flags.
+# If CONFIG_DEBUG_INFO_BTF_VARS is 'm', we encode variables in
+# the module, otherwise if 'y' encode them in vmlinux BTF directly.
+
+target=$1
+
 extra_paholeopt=
 
 if ! [ -x "$(command -v ${PAHOLE})" ]; then
@@ -20,4 +30,32 @@ if [ "${pahole_ver}" -ge "122" ]; then
 	extra_paholeopt="${extra_paholeopt} -j"
 fi
 
+case $(config_value CONFIG_DEBUG_INFO_BTF_VARS) in
+y)
+	# variables are encoded in core vmlinux BTF; nothing is encoded
+	# in vmlinux_btf_extra module BTF.
+	if [[ $target == "extra" ]]; then
+		extra_paholeopt=""
+	else
+		extra_paholeopt="${extra_paholeopt} --encode_all_btf_vars"
+	fi
+	;;
+m)
+	# global variables are encoded in vmlinux_btf_extra; per-CPU
+	# variables are still found in vmlinux BTF.
+	if [[ $target == "extra" ]]; then
+		extra_paholeopt="--encode_all_btf_vars"
+	else
+		extra_paholeopt="${extra_paholeopt}"
+	fi
+	;;
+*)
+	# nothing is encoded in vmlinux_btf_extra; no global variables
+	# are encoded in core vmlinux BTF.
+	if [[ $target == "extra" ]]; then
+		extra_paholeopt=""
+	fi
+	;;
+esac
+
 echo ${extra_paholeopt}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-11-04 23:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-04 23:10 [PATCH v2 dwarves, kernel, libbpf 0/9] Add support for generating BTF for all variables Stephen Brennan
2022-11-04 23:10 ` [PATCH v2 dwarves 1/9] dutil: return ELF section name when looked up by index Stephen Brennan
2022-11-04 23:10 ` [PATCH v2 dwarves 2/9] btf_encoder: Rename percpu structures to variables Stephen Brennan
2022-11-04 23:10 ` [PATCH v2 dwarves 3/9] btf_encoder: cache all ELF section info Stephen Brennan
2022-11-04 23:10 ` [PATCH v2 dwarves 4/9] btf_encoder: make the variable array dynamic Stephen Brennan
2022-11-04 23:10 ` [PATCH v2 dwarves 5/9] btf_encoder: record ELF section for collected variables Stephen Brennan
2022-11-04 23:11 ` [PATCH v2 dwarves 6/9] btf_encoder: collect all variables Stephen Brennan
2022-11-04 23:11 ` [PATCH v2 dwarves 7/9] btf_encoder: allow encoding " Stephen Brennan
2022-11-04 23:11 ` [PATCH v2 bpf 8/9] libbpf: btf dedup identical struct test needs check for nested structs/arrays Stephen Brennan
2022-11-04 23:11 ` [PATCH v2 kernel 9/9] bpf: add support for CONFIG_DEBUG_INFO_BTF_VARS Stephen Brennan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).