dwarves.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH dwarves 0/7] Add support for generating BTF for all variables
@ 2022-08-26 18:49 Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 1/7] dutil: return ELF section name when looked up by index Stephen Brennan
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

Hello everyone,

BTF offers some exciting new possibilities beyond its original intent;
one of these is making the kernel more self-describing for debug tools.
Kallsyms contains symbol table data, and ORC (for x86_64) contains
information to help unwind stacks. Now, BTF can provide type information
for functions and variables. Taken together, this data is enough to
power the basic (read-only) functions of a postmortem or live debugger,
without falling back on the heavier debugging information formats like
DWARF. What's more, all of these data sources are contained within the
kernel image itself, and are thus available on live systems and within
crash dumps, without consulting any external debug information files.

However, currently BTF generation emits information only for percpu
variables. This patch series removes that limitation, allowing
generating BTF for all variables, thus providing complete type
information for debuggers.

Of course, generating additional BTF means that more data must be stored
in the kernel image, and that may not be okay for everyone. Thus, the
new behavior must be explicitly enabled by a flag.

Testing
-------

To verify this change and illustrate the additional space required, I
built v5.19-rc7 on x86_defconfig, with the following additionally
enabled:

enable DEBUG_INFO_DWARF4
enable BPF_SYSCALL
enable DEBUG_INFO_BTF

I then ran pahole to generate BTF from the built vmlinux in three
configurations, and recorded the size of the BTF for each:

1) using the current master branch
   size: 5505315 bytes
2) using this patched version, without enabling --encode_all_btf_vars
   size: 5505315 bytes
3) using this patched version, with --encode_all_btf_vars enabled
   size: 6811291 bytes

A total increase of 1.25 MiB, or a 23.7% increase. This is definitely
notable, but not unreasonable for many use cases such as desktop or
server applications. I also verified that the data generated by cases 1
and 2 are byte-for-byte identical: that is, there are no changes to the
generated BTF unless --encode_all_btf_vars is enabled.

I also verified that the output variables makes sense. I created an
application which parses the output BTF and dumps the
declarations (BTF_KIND_VAR and BTF_KIND_FUNC), and then diffed its
output between configuration 2 and 3. I'm happy to provide a link to
that diff (it's of course too big to include in the email).

End-to-end test
---------------

To show this is not just theory, I've created an end-to-end test which
combines BTF generated via this patch series, along with a kernel patch
necessary to expose the kallsyms data [1], and a branch of the drgn
debugger[2] which implements kallsyms and BTF parsing. Core dumps
generated on the resulting kernel can be loaded by the drgn debugger,
and the it can read out variables from the dump with full type
information without needing to consult a DWARF debuginfo file.

Future Work
-----------

If this proves acceptable, I'd like to follow-up with a kernel patch to
add a configuration option (default=n) for generating BTF with all
variables, which distributions could choose to enable or not.

There was previous discussion[3] about leveraging split BTF or building
additional kernel modules to contain the extra variables. I believe with
this patch series, it is possible to do that. However, I'd argue that
simpler is better here: the advantage for using BTF is having it all
available in the kernel/module image. Storing extra BTF on the
filesystem would break that advantage, and at that point, you'd be
better off using a debuginfo format like CTF, which is lightweight and
expected to be found on the filesystem.

[1]: https://lore.kernel.org/lkml/20220517000508.777145-3-stephen.s.brennan@oracle.com/T/
     (The above series is already in the 6.0 RC's)
[2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf
[3]: https://lore.kernel.org/bpf/586a6288-704a-f7a7-b256-e18a675927df@oracle.com/

Stephen Brennan (7):
  dutil: return ELF section name when looked up by index
  btf_encoder: Rename percpu structures to variables
  btf_encoder: cache all ELF section info
  btf_encoder: make the variable array dynamic
  btf_encoder: record ELF section for collected variables
  btf_encoder: collect all variables
  btf_encoder: allow encoding all variables

 btf_encoder.c      | 196 +++++++++++++++++++++++++++------------------
 btf_encoder.h      |   8 +-
 dutil.c            |  10 ++-
 dutil.h            |   2 +-
 man-pages/pahole.1 |   6 +-
 pahole.c           |  31 +++++--
 6 files changed, 165 insertions(+), 88 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH dwarves 1/7] dutil: return ELF section name when looked up by index
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
@ 2022-08-26 18:49 ` Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 2/7] btf_encoder: Rename percpu structures to variables Stephen Brennan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 dutil.c | 10 +++++++++-
 dutil.h |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/dutil.c b/dutil.c
index 97c4474..a4d55e6 100644
--- a/dutil.c
+++ b/dutil.c
@@ -2,6 +2,7 @@
   SPDX-License-Identifier: GPL-2.0-only
 
   Copyright (C) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
+  Copyright (c) 2022, Oracle and/or its affiliates.
 */
 
 
@@ -207,13 +208,20 @@ Elf_Scn *elf_section_by_name(Elf *elf, GElf_Shdr *shp, const char *name, size_t
 	return sec;
 }
 
-Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx)
+Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx, const char **name_out)
 {
 	Elf_Scn *sec;
+	size_t str_idx;
 
 	sec = elf_getscn(elf, idx);
 	if (sec)
 		gelf_getshdr(sec, shp);
+
+	if (name_out) {
+		if (elf_getshdrstrndx(elf, &str_idx))
+			return NULL;
+		*name_out = elf_strptr(elf, str_idx, shp->sh_name);
+	}
 	return sec;
 }
 
diff --git a/dutil.h b/dutil.h
index e45bba0..2dcf986 100644
--- a/dutil.h
+++ b/dutil.h
@@ -328,7 +328,7 @@ void *zalloc(const size_t size);
 
 Elf_Scn *elf_section_by_name(Elf *elf, GElf_Shdr *shp, const char *name, size_t *index);
 
-Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx);
+Elf_Scn *elf_section_by_idx(Elf *elf, GElf_Shdr *shp, int idx, const char **name_out);
 
 #ifndef SHT_GNU_ATTRIBUTES
 /* Just a way to check if we're using an old elfutils version */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH dwarves 2/7] btf_encoder: Rename percpu structures to variables
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 1/7] dutil: return ELF section name when looked up by index Stephen Brennan
@ 2022-08-26 18:49 ` Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 3/7] btf_encoder: cache all ELF section info Stephen Brennan
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

The BTF encoder will now be storing symbol data for more than just
percpu variables. Rename the data structure fields so that they are
more general.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 54 ++++++++++++++++++++++++++-------------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index daa8e3b..bf59962 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -2,6 +2,7 @@
   SPDX-License-Identifier: GPL-2.0-only
 
   Copyright (C) 2019 Facebook
+  Copyright (c) 2022, Oracle and/or its affiliates.
 
   Derived from ctf_encoder.c, which is:
 
@@ -36,7 +37,7 @@ struct elf_function {
 	bool		 generated;
 };
 
-#define MAX_PERCPU_VAR_CNT 4096
+#define MAX_VAR_CNT 4096
 
 struct var_info {
 	uint64_t    addr;
@@ -60,12 +61,12 @@ struct btf_encoder {
 			  is_rel;
 	uint32_t	  array_index_id;
 	struct {
-		struct var_info vars[MAX_PERCPU_VAR_CNT];
+		struct var_info vars[MAX_VAR_CNT];
 		int		var_cnt;
-		uint32_t	shndx;
-		uint64_t	base_addr;
-		uint64_t	sec_sz;
-	} percpu;
+		uint32_t	percpu_shndx;
+		uint64_t	percpu_base_addr;
+		uint64_t	percpu_sec_sz;
+	} variables;
 	struct {
 		struct elf_function *entries;
 		int		    allocated;
@@ -1126,8 +1127,8 @@ static int percpu_var_cmp(const void *_a, const void *_b)
 static bool btf_encoder__percpu_var_exists(struct btf_encoder *encoder, uint64_t addr, uint32_t *sz, const char **name)
 {
 	struct var_info key = { .addr = addr };
-	const struct var_info *p = bsearch(&key, encoder->percpu.vars, encoder->percpu.var_cnt,
-					   sizeof(encoder->percpu.vars[0]), percpu_var_cmp);
+	const struct var_info *p = bsearch(&key, encoder->variables.vars, encoder->variables.var_cnt,
+					   sizeof(encoder->variables.vars[0]), percpu_var_cmp);
 	if (!p)
 		return false;
 
@@ -1143,7 +1144,7 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	uint32_t size;
 
 	/* compare a symbol's shndx to determine if it's a percpu variable */
-	if (sym_sec_idx != encoder->percpu.shndx)
+	if (sym_sec_idx != encoder->variables.percpu_shndx)
 		return 0;
 	if (elf_sym__type(sym) != STT_OBJECT)
 		return 0;
@@ -1171,17 +1172,17 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	 * ET_EXEC file) we need to subtract the section address.
 	 */
 	if (!encoder->is_rel)
-		addr -= encoder->percpu.base_addr;
+		addr -= encoder->variables.percpu_base_addr;
 
-	if (encoder->percpu.var_cnt == MAX_PERCPU_VAR_CNT) {
+	if (encoder->variables.var_cnt == MAX_VAR_CNT) {
 		fprintf(stderr, "Reached the limit of per-CPU variables: %d\n",
-			MAX_PERCPU_VAR_CNT);
+			MAX_VAR_CNT);
 		return -1;
 	}
-	encoder->percpu.vars[encoder->percpu.var_cnt].addr = addr;
-	encoder->percpu.vars[encoder->percpu.var_cnt].sz = size;
-	encoder->percpu.vars[encoder->percpu.var_cnt].name = sym_name;
-	encoder->percpu.var_cnt++;
+	encoder->variables.vars[encoder->variables.var_cnt].addr = addr;
+	encoder->variables.vars[encoder->variables.var_cnt].sz = size;
+	encoder->variables.vars[encoder->variables.var_cnt].name = sym_name;
+	encoder->variables.var_cnt++;
 
 	return 0;
 }
@@ -1193,7 +1194,7 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 	GElf_Sym sym;
 
 	/* cache variables' addresses, preparing for searching in symtab. */
-	encoder->percpu.var_cnt = 0;
+	encoder->variables.var_cnt = 0;
 
 	/* search within symtab for percpu variables */
 	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
@@ -1204,11 +1205,11 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 	}
 
 	if (collect_percpu_vars) {
-		if (encoder->percpu.var_cnt)
-			qsort(encoder->percpu.vars, encoder->percpu.var_cnt, sizeof(encoder->percpu.vars[0]), percpu_var_cmp);
+		if (encoder->variables.var_cnt)
+			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), percpu_var_cmp);
 
 		if (encoder->verbose)
-			printf("Found %d per-CPU variables!\n", encoder->percpu.var_cnt);
+			printf("Found %d per-CPU variables!\n", encoder->variables.var_cnt);
 	}
 
 	if (encoder->functions.cnt) {
@@ -1238,7 +1239,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 	struct tag *pos;
 	int err = -1;
 
-	if (encoder->percpu.shndx == 0 || !encoder->symtab)
+	if (encoder->variables.percpu_shndx == 0 || !encoder->symtab)
 		return 0;
 
 	if (encoder->verbose)
@@ -1268,9 +1269,10 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 		 * always contains virtual symbol addresses, so subtract
 		 * the section address unconditionally.
 		 */
-		if (addr < encoder->percpu.base_addr || addr >= encoder->percpu.base_addr + encoder->percpu.sec_sz)
+		if (addr < encoder->variables.percpu_base_addr ||
+		    addr >= encoder->variables.percpu_base_addr + encoder->variables.percpu_sec_sz)
 			continue;
-		addr -= encoder->percpu.base_addr;
+		addr -= encoder->variables.percpu_base_addr;
 
 		if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name))
 			continue; /* not a per-CPU variable */
@@ -1418,9 +1420,9 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 			if (encoder->verbose)
 				printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
 		} else {
-			encoder->percpu.shndx	  = elf_ndxscn(sec);
-			encoder->percpu.base_addr = shdr.sh_addr;
-			encoder->percpu.sec_sz	  = shdr.sh_size;
+			encoder->variables.percpu_shndx     = elf_ndxscn(sec);
+			encoder->variables.percpu_base_addr = shdr.sh_addr;
+			encoder->variables.percpu_sec_sz    = shdr.sh_size;
 		}
 
 		if (btf_encoder__collect_symbols(encoder, !encoder->skip_encoding_vars))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH dwarves 3/7] btf_encoder: cache all ELF section info
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 1/7] dutil: return ELF section name when looked up by index Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 2/7] btf_encoder: Rename percpu structures to variables Stephen Brennan
@ 2022-08-26 18:49 ` Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 4/7] btf_encoder: make the variable array dynamic Stephen Brennan
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

To handle outputting all variables generally, we'll need to store more
section data. Create a table of ELF sections so we can refer to all the
cached data.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 68 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index bf59962..f67738a 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -38,6 +38,7 @@ struct elf_function {
 };
 
 #define MAX_VAR_CNT 4096
+#define MAX_ELF_SEC_CNT    128
 
 struct var_info {
 	uint64_t    addr;
@@ -45,6 +46,13 @@ struct var_info {
 	uint32_t    sz;
 };
 
+struct elf_secinfo {
+	uint64_t    addr;
+	const char *name;
+	uint64_t    sz;
+	bool        include;
+};
+
 struct btf_encoder {
 	struct list_head  node;
 	struct btf        *btf;
@@ -60,12 +68,12 @@ struct btf_encoder {
 			  gen_floats,
 			  is_rel;
 	uint32_t	  array_index_id;
+	struct elf_secinfo secinfo[MAX_ELF_SEC_CNT];
+	size_t             seccnt;
 	struct {
 		struct var_info vars[MAX_VAR_CNT];
 		int		var_cnt;
 		uint32_t	percpu_shndx;
-		uint64_t	percpu_base_addr;
-		uint64_t	percpu_sec_sz;
 	} variables;
 	struct {
 		struct elf_function *entries;
@@ -1167,12 +1175,13 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	if (encoder->verbose)
 		printf("Found per-CPU symbol '%s' at address 0x%" PRIx64 "\n", sym_name, addr);
 
-	/* Make sure addr is section-relative. For kernel modules (which are
-	 * ET_REL files) this is already the case. For vmlinux (which is an
-	 * ET_EXEC file) we need to subtract the section address.
+	/* Make sure addr is absolute, so that we can compare it to DWARF
+	 * absolute addresses. We can compute to section-relative addresses
+	 * when necessary.
 	 */
-	if (!encoder->is_rel)
-		addr -= encoder->variables.percpu_base_addr;
+	if (encoder->is_rel && sym->st_shndx < encoder->seccnt) {
+		addr += encoder->secinfo[sym->st_shndx].addr;
+	}
 
 	if (encoder->variables.var_cnt == MAX_VAR_CNT) {
 		fprintf(stderr, "Reached the limit of per-CPU variables: %d\n",
@@ -1238,6 +1247,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 	uint32_t core_id;
 	struct tag *pos;
 	int err = -1;
+	struct elf_secinfo *pcpu_scn = &encoder->secinfo[encoder->variables.percpu_shndx];
 
 	if (encoder->variables.percpu_shndx == 0 || !encoder->symtab)
 		return 0;
@@ -1265,14 +1275,9 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 		addr = var->ip.addr;
 		dwarf_name = variable__name(var);
 
-		/* Make sure addr is section-relative. DWARF, unlike ELF,
-		 * always contains virtual symbol addresses, so subtract
-		 * the section address unconditionally.
-		 */
-		if (addr < encoder->variables.percpu_base_addr ||
-		    addr >= encoder->variables.percpu_base_addr + encoder->variables.percpu_sec_sz)
+		/* Make sure addr is in the percpu section */
+		if (addr < pcpu_scn->addr || addr >= pcpu_scn->addr + pcpu_scn->sz)
 			continue;
-		addr -= encoder->variables.percpu_base_addr;
 
 		if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name))
 			continue; /* not a per-CPU variable */
@@ -1347,7 +1352,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 		 * add a BTF_VAR_SECINFO in encoder->percpu_secinfo, which will be added into
 		 * encoder->types later when we add BTF_VAR_DATASEC.
 		 */
-		id = btf_encoder__add_var_secinfo(encoder, id, addr, size);
+		id = btf_encoder__add_var_secinfo(encoder, id, addr - pcpu_scn->addr, size);
 		if (id < 0) {
 			fprintf(stderr, "error: failed to encode section info for variable '%s' at addr 0x%" PRIx64 "\n",
 			        name, addr);
@@ -1411,20 +1416,35 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 			goto out;
 		}
 
-		/* find percpu section's shndx */
+		/* index the ELF sections for later lookup */
 
 		GElf_Shdr shdr;
-		Elf_Scn *sec = elf_section_by_name(cu->elf, &shdr, PERCPU_SECTION, NULL);
+		size_t shndx;
+		if (elf_getshdrnum(cu->elf, &encoder->seccnt))
+			goto out_delete;
+		if (encoder->seccnt >= MAX_ELF_SEC_CNT) {
+			fprintf(stderr, "%s: reached limit of ELF sections\n", __func__);
+			goto out_delete;
+		}
 
-		if (!sec) {
-			if (encoder->verbose)
-				printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
-		} else {
-			encoder->variables.percpu_shndx     = elf_ndxscn(sec);
-			encoder->variables.percpu_base_addr = shdr.sh_addr;
-			encoder->variables.percpu_sec_sz    = shdr.sh_size;
+		for (shndx = 0; shndx < encoder->seccnt; shndx++) {
+			const char *secname = NULL;
+			Elf_Scn *sec = elf_section_by_idx(cu->elf, &shdr, shndx, &secname);
+			if (!sec)
+				goto out_delete;
+			encoder->secinfo[shndx].addr = shdr.sh_addr;
+			encoder->secinfo[shndx].sz = shdr.sh_size;
+			encoder->secinfo[shndx].name = secname;
+
+			if (strcmp(secname, PERCPU_SECTION) == 0) {
+				encoder->variables.percpu_shndx = shndx;
+				encoder->secinfo[shndx].include = true;
+			}
 		}
 
+		if (!encoder->variables.percpu_shndx && encoder->verbose)
+			printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
+
 		if (btf_encoder__collect_symbols(encoder, !encoder->skip_encoding_vars))
 			goto out_delete;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH dwarves 4/7] btf_encoder: make the variable array dynamic
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
                   ` (2 preceding siblings ...)
  2022-08-26 18:49 ` [PATCH dwarves 3/7] btf_encoder: cache all ELF section info Stephen Brennan
@ 2022-08-26 18:49 ` Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 5/7] btf_encoder: record ELF section for collected variables Stephen Brennan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

To collect all variables, we need a dynamically allocated array to scale
with any amount.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index f67738a..ddc9d00 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -37,7 +37,6 @@ struct elf_function {
 	bool		 generated;
 };
 
-#define MAX_VAR_CNT 4096
 #define MAX_ELF_SEC_CNT    128
 
 struct var_info {
@@ -71,8 +70,9 @@ struct btf_encoder {
 	struct elf_secinfo secinfo[MAX_ELF_SEC_CNT];
 	size_t             seccnt;
 	struct {
-		struct var_info vars[MAX_VAR_CNT];
+		struct var_info *vars;
 		int		var_cnt;
+		int		var_alloc;
 		uint32_t	percpu_shndx;
 	} variables;
 	struct {
@@ -1157,6 +1157,15 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	if (elf_sym__type(sym) != STT_OBJECT)
 		return 0;
 
+	if (encoder->variables.var_cnt == encoder->variables.var_alloc) {
+		struct var_info *new;
+		encoder->variables.var_alloc = max(1000, encoder->variables.var_alloc * 3 / 2);
+		new = realloc(encoder->variables.vars, encoder->variables.var_alloc * sizeof(*new));
+		if (!new)
+			return -1;
+		encoder->variables.vars = new;
+	}
+
 	addr = elf_sym__value(sym);
 
 	size = elf_sym__size(sym);
@@ -1183,11 +1192,6 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 		addr += encoder->secinfo[sym->st_shndx].addr;
 	}
 
-	if (encoder->variables.var_cnt == MAX_VAR_CNT) {
-		fprintf(stderr, "Reached the limit of per-CPU variables: %d\n",
-			MAX_VAR_CNT);
-		return -1;
-	}
 	encoder->variables.vars[encoder->variables.var_cnt].addr = addr;
 	encoder->variables.vars[encoder->variables.var_cnt].sz = size;
 	encoder->variables.vars[encoder->variables.var_cnt].name = sym_name;
@@ -1470,6 +1474,10 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 	encoder->btf = NULL;
 	elf_symtab__delete(encoder->symtab);
 
+	encoder->variables.var_cnt = encoder->variables.var_alloc = 0;
+	free(encoder->variables.vars);
+	encoder->variables.vars = NULL;
+
 	encoder->functions.allocated = encoder->functions.cnt = 0;
 	free(encoder->functions.entries);
 	encoder->functions.entries = NULL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH dwarves 5/7] btf_encoder: record ELF section for collected variables
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
                   ` (3 preceding siblings ...)
  2022-08-26 18:49 ` [PATCH dwarves 4/7] btf_encoder: make the variable array dynamic Stephen Brennan
@ 2022-08-26 18:49 ` Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 6/7] btf_encoder: collect all variables Stephen Brennan
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index ddc9d00..83aca61 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -43,6 +43,7 @@ struct var_info {
 	uint64_t    addr;
 	const char *name;
 	uint32_t    sz;
+	uint32_t    shndx;
 };
 
 struct elf_secinfo {
@@ -1145,7 +1146,7 @@ static bool btf_encoder__percpu_var_exists(struct btf_encoder *encoder, uint64_t
 	return true;
 }
 
-static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym *sym, size_t sym_sec_idx)
+static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym *sym, uint32_t sym_sec_idx)
 {
 	const char *sym_name;
 	uint64_t addr;
@@ -1195,6 +1196,7 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 	encoder->variables.vars[encoder->variables.var_cnt].addr = addr;
 	encoder->variables.vars[encoder->variables.var_cnt].sz = size;
 	encoder->variables.vars[encoder->variables.var_cnt].name = sym_name;
+	encoder->variables.vars[encoder->variables.var_cnt].shndx = sym_sec_idx;
 	encoder->variables.var_cnt++;
 
 	return 0;
@@ -1202,7 +1204,7 @@ static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym
 
 static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collect_percpu_vars)
 {
-	Elf32_Word sym_sec_idx;
+	uint32_t sym_sec_idx;
 	uint32_t core_id;
 	GElf_Sym sym;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH dwarves 6/7] btf_encoder: collect all variables
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
                   ` (4 preceding siblings ...)
  2022-08-26 18:49 ` [PATCH dwarves 5/7] btf_encoder: record ELF section for collected variables Stephen Brennan
@ 2022-08-26 18:49 ` Stephen Brennan
  2022-08-26 18:49 ` [PATCH dwarves 7/7] btf_encoder: allow encoding " Stephen Brennan
  2022-08-30 15:14 ` [PATCH dwarves 0/7] Add support for generating BTF for " Alexei Starovoitov
  7 siblings, 0 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

Prior to this, we only collected ELF symbols which were in the percpu
section. Now, collect all ELF symbols of type STT_OBJECT, and rely on
the setting of "include" in the elf_secinfo to decide whether to include
in the final output.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c | 70 +++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 83aca61..1804500 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1123,7 +1123,7 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 	return err;
 }
 
-static int percpu_var_cmp(const void *_a, const void *_b)
+static int var_cmp(const void *_a, const void *_b)
 {
 	const struct var_info *a = _a;
 	const struct var_info *b = _b;
@@ -1133,28 +1133,24 @@ static int percpu_var_cmp(const void *_a, const void *_b)
 	return a->addr < b->addr ? -1 : 1;
 }
 
-static bool btf_encoder__percpu_var_exists(struct btf_encoder *encoder, uint64_t addr, uint32_t *sz, const char **name)
+static bool btf_encoder__var_exists(struct btf_encoder *encoder, uint64_t addr, struct var_info **info)
 {
 	struct var_info key = { .addr = addr };
-	const struct var_info *p = bsearch(&key, encoder->variables.vars, encoder->variables.var_cnt,
-					   sizeof(encoder->variables.vars[0]), percpu_var_cmp);
+	struct var_info *p = bsearch(&key, encoder->variables.vars, encoder->variables.var_cnt,
+				     sizeof(encoder->variables.vars[0]), var_cmp);
 	if (!p)
 		return false;
 
-	*sz = p->sz;
-	*name = p->name;
+	*info = p;
 	return true;
 }
 
-static int btf_encoder__collect_percpu_var(struct btf_encoder *encoder, GElf_Sym *sym, uint32_t sym_sec_idx)
+static int btf_encoder__collect_var(struct btf_encoder *encoder, GElf_Sym *sym, uint32_t sym_sec_idx)
 {
 	const char *sym_name;
 	uint64_t addr;
 	uint32_t size;
 
-	/* compare a symbol's shndx to determine if it's a percpu variable */
-	if (sym_sec_idx != encoder->variables.percpu_shndx)
-		return 0;
 	if (elf_sym__type(sym) != STT_OBJECT)
 		return 0;
 
@@ -1213,7 +1209,7 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 
 	/* search within symtab for percpu variables */
 	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
-		if (collect_percpu_vars && btf_encoder__collect_percpu_var(encoder, &sym, sym_sec_idx))
+		if (collect_percpu_vars && btf_encoder__collect_var(encoder, &sym, sym_sec_idx))
 			return -1;
 		if (btf_encoder__collect_function(encoder, &sym))
 			return -1;
@@ -1221,7 +1217,7 @@ static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collec
 
 	if (collect_percpu_vars) {
 		if (encoder->variables.var_cnt)
-			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), percpu_var_cmp);
+			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), var_cmp);
 
 		if (encoder->verbose)
 			printf("Found %d per-CPU variables!\n", encoder->variables.var_cnt);
@@ -1263,17 +1259,19 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 
 	cu__for_each_variable(cu, core_id, pos) {
 		struct variable *var = tag__variable(pos);
-		uint32_t size, type, linkage;
-		const char *name, *dwarf_name;
+		uint32_t type, linkage;
+		const char *dwarf_name;
 		struct llvm_annotation *annot;
 		const struct tag *tag;
+		struct var_info *info;
+		struct elf_secinfo *sec = NULL;
 		uint64_t addr;
 		int id;
 
 		if (var->declaration && !var->spec)
 			continue;
 
-		/* percpu variables are allocated in global space */
+		/* we want global variables, or those with a definition */
 		if (variable__scope(var) != VSCOPE_GLOBAL && !var->spec)
 			continue;
 
@@ -1281,13 +1279,16 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 		addr = var->ip.addr;
 		dwarf_name = variable__name(var);
 
-		/* Make sure addr is in the percpu section */
-		if (addr < pcpu_scn->addr || addr >= pcpu_scn->addr + pcpu_scn->sz)
-			continue;
-
-		if (!btf_encoder__percpu_var_exists(encoder, addr, &size, &name))
+		if (!btf_encoder__var_exists(encoder, addr, &info))
 			continue; /* not a per-CPU variable */
 
+		/* Get the ELF section info */
+		if (info->shndx && info->shndx < encoder->seccnt)
+			sec = &encoder->secinfo[info->shndx];
+		/* Only continue if the section is to be included */
+		if (!sec || !sec->include)
+			continue;
+
 		/* A lot of "special" DWARF variables (e.g, __UNIQUE_ID___xxx)
 		 * have addr == 0, which is the same as, say, valid
 		 * fixed_percpu_data per-CPU variable. To distinguish between
@@ -1306,7 +1307,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 		 *  per-CPU symbols have non-zero values.
 		 */
 		if (var->ip.addr == 0) {
-			if (!dwarf_name || strcmp(dwarf_name, name))
+			if (!dwarf_name || strcmp(dwarf_name, info->name))
 				continue;
 		}
 
@@ -1315,7 +1316,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 
 		if (var->ip.tag.type == 0) {
 			fprintf(stderr, "error: found variable '%s' in CU '%s' that has void type\n",
-				name, cu->name);
+				info->name, cu->name);
 			if (encoder->force)
 				continue;
 			err = -1;
@@ -1325,7 +1326,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 		tag = cu__type(cu, var->ip.tag.type);
 		if (tag__size(tag, cu) == 0) {
 			if (encoder->verbose)
-				fprintf(stderr, "Ignoring zero-sized per-CPU variable '%s'...\n", dwarf_name ?: "<missing name>");
+				fprintf(stderr, "Ignoring zero-sized variable '%s'...\n", dwarf_name ?: "<missing name>");
 			continue;
 		}
 
@@ -1334,14 +1335,14 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 
 		if (encoder->verbose) {
 			printf("Variable '%s' from CU '%s' at address 0x%" PRIx64 " encoded\n",
-			       name, cu->name, addr);
+			       info->name, cu->name, addr);
 		}
 
 		/* add a BTF_KIND_VAR in encoder->types */
-		id = btf_encoder__add_var(encoder, type, name, linkage);
+		id = btf_encoder__add_var(encoder, type, info->name, linkage);
 		if (id < 0) {
 			fprintf(stderr, "error: failed to encode variable '%s' at addr 0x%" PRIx64 "\n",
-			        name, addr);
+			        info->name, addr);
 			goto out;
 		}
 
@@ -1349,20 +1350,23 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, struct
 			int tag_type_id = btf_encoder__add_decl_tag(encoder, annot->value, id, annot->component_idx);
 			if (tag_type_id < 0) {
 				fprintf(stderr, "error: failed to encode tag '%s' to variable '%s' with component_idx %d\n",
-					annot->value, name, annot->component_idx);
+					annot->value, info->name, annot->component_idx);
 				goto out;
 			}
 		}
 
 		/*
-		 * add a BTF_VAR_SECINFO in encoder->percpu_secinfo, which will be added into
+		 * For percpu variables, add a BTF_VAR_SECINFO in
+		 * encoder->percpu_secinfo, which will be added into
 		 * encoder->types later when we add BTF_VAR_DATASEC.
 		 */
-		id = btf_encoder__add_var_secinfo(encoder, id, addr - pcpu_scn->addr, size);
-		if (id < 0) {
-			fprintf(stderr, "error: failed to encode section info for variable '%s' at addr 0x%" PRIx64 "\n",
-			        name, addr);
-			goto out;
+		if (info->shndx == encoder->variables.percpu_shndx) {
+			id = btf_encoder__add_var_secinfo(encoder, id, addr - pcpu_scn->addr, info->sz);
+			if (id < 0) {
+				fprintf(stderr, "error: failed to encode section info for variable '%s' at addr 0x%" PRIx64 "\n",
+					info->name, addr);
+				goto out;
+			}
 		}
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH dwarves 7/7] btf_encoder: allow encoding all variables
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
                   ` (5 preceding siblings ...)
  2022-08-26 18:49 ` [PATCH dwarves 6/7] btf_encoder: collect all variables Stephen Brennan
@ 2022-08-26 18:49 ` Stephen Brennan
  2022-08-30 15:14 ` [PATCH dwarves 0/7] Add support for generating BTF for " Alexei Starovoitov
  7 siblings, 0 replies; 15+ messages in thread
From: Stephen Brennan @ 2022-08-26 18:49 UTC (permalink / raw)
  To: dwarves; +Cc: bpf, Arnaldo Carvalho de Melo, stephen.s.brennan, alan.maguire

Add the --encode_all_btf_vars option, which conflicts with
--skip_encoding_btf_vars, and will enable encoding all variables which
have a corresponding STT_OBJECT match in the ELF symbol table. Rework
the btf_encoder_new() signature to include a single enum to specify
which kinds of variables are allowed, and add the necessary logic to
select variables.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 btf_encoder.c      | 22 +++++++++++++---------
 btf_encoder.h      |  8 +++++++-
 man-pages/pahole.1 |  6 +++++-
 pahole.c           | 31 +++++++++++++++++++++++++------
 4 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 1804500..eabc8d2 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -61,7 +61,6 @@ struct btf_encoder {
 	struct elf_symtab *symtab;
 	bool		  has_index_type,
 			  need_index_type,
-			  skip_encoding_vars,
 			  raw_output,
 			  verbose,
 			  force,
@@ -70,6 +69,7 @@ struct btf_encoder {
 	uint32_t	  array_index_id;
 	struct elf_secinfo secinfo[MAX_ELF_SEC_CNT];
 	size_t             seccnt;
+	enum btf_var_option encode_vars;
 	struct {
 		struct var_info *vars;
 		int		var_cnt;
@@ -1198,24 +1198,25 @@ static int btf_encoder__collect_var(struct btf_encoder *encoder, GElf_Sym *sym,
 	return 0;
 }
 
-static int btf_encoder__collect_symbols(struct btf_encoder *encoder, bool collect_percpu_vars)
+static int btf_encoder__collect_symbols(struct btf_encoder *encoder)
 {
 	uint32_t sym_sec_idx;
 	uint32_t core_id;
 	GElf_Sym sym;
+	bool collect_vars = (encoder->encode_vars != BTF_VAR_NONE);
 
 	/* cache variables' addresses, preparing for searching in symtab. */
 	encoder->variables.var_cnt = 0;
 
 	/* search within symtab for percpu variables */
 	elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym, sym_sec_idx) {
-		if (collect_percpu_vars && btf_encoder__collect_var(encoder, &sym, sym_sec_idx))
+		if (collect_vars && btf_encoder__collect_var(encoder, &sym, sym_sec_idx))
 			return -1;
 		if (btf_encoder__collect_function(encoder, &sym))
 			return -1;
 	}
 
-	if (collect_percpu_vars) {
+	if (collect_vars) {
 		if (encoder->variables.var_cnt)
 			qsort(encoder->variables.vars, encoder->variables.var_cnt, sizeof(encoder->variables.vars[0]), var_cmp);
 
@@ -1375,7 +1376,7 @@ out:
 	return err;
 }
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose)
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, enum btf_var_option vars, bool force, bool gen_floats, bool verbose)
 {
 	struct btf_encoder *encoder = zalloc(sizeof(*encoder));
 
@@ -1391,11 +1392,11 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 
 		encoder->force		 = force;
 		encoder->gen_floats	 = gen_floats;
-		encoder->skip_encoding_vars = skip_encoding_vars;
 		encoder->verbose	 = verbose;
 		encoder->has_index_type  = false;
 		encoder->need_index_type = false;
 		encoder->array_index_id  = 0;
+		encoder->encode_vars	 = vars;
 
 		GElf_Ehdr ehdr;
 
@@ -1445,17 +1446,20 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 			encoder->secinfo[shndx].addr = shdr.sh_addr;
 			encoder->secinfo[shndx].sz = shdr.sh_size;
 			encoder->secinfo[shndx].name = secname;
+			if (encoder->encode_vars == BTF_VAR_ALL)
+				encoder->secinfo[shndx].include = true;
 
 			if (strcmp(secname, PERCPU_SECTION) == 0) {
 				encoder->variables.percpu_shndx = shndx;
-				encoder->secinfo[shndx].include = true;
+				if (encoder->encode_vars != BTF_VAR_NONE)
+					encoder->secinfo[shndx].include = true;
 			}
 		}
 
 		if (!encoder->variables.percpu_shndx && encoder->verbose)
 			printf("%s: '%s' doesn't have '%s' section\n", __func__, cu->filename, PERCPU_SECTION);
 
-		if (btf_encoder__collect_symbols(encoder, !encoder->skip_encoding_vars))
+		if (btf_encoder__collect_symbols(encoder))
 			goto out_delete;
 
 		if (encoder->verbose)
@@ -1615,7 +1619,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 		}
 	}
 
-	if (!encoder->skip_encoding_vars)
+	if (encoder->encode_vars != BTF_VAR_NONE)
 		err = btf_encoder__encode_cu_variables(encoder, cu, type_id_off);
 out:
 	return err;
diff --git a/btf_encoder.h b/btf_encoder.h
index a65120c..e03c7cc 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -16,7 +16,13 @@ struct btf;
 struct cu;
 struct list_head;
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose);
+enum btf_var_option {
+	BTF_VAR_NONE,
+	BTF_VAR_PERCPU,
+	BTF_VAR_ALL,
+};
+
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, enum btf_var_option vars, bool force, bool gen_floats, bool verbose);
 void btf_encoder__delete(struct btf_encoder *encoder);
 
 int btf_encoder__encode(struct btf_encoder *encoder);
diff --git a/man-pages/pahole.1 b/man-pages/pahole.1
index bb88e2f..d21af0a 100644
--- a/man-pages/pahole.1
+++ b/man-pages/pahole.1
@@ -207,7 +207,11 @@ the debugging information.
 
 .TP
 .B \-\-skip_encoding_btf_vars
-Do not encode VARs in BTF.
+.TQ
+.B \-\-encode_all_btf_vars
+By default, VARs are encoded only for percpu variables. These options allow
+to skip encoding them, or to encode all variables regardless of whether they are
+percpu. These options are mutually exclusive.
 
 .TP
 .B \-\-skip_encoding_btf_decl_tag
diff --git a/pahole.c b/pahole.c
index e87d9a4..1f86ffb 100644
--- a/pahole.c
+++ b/pahole.c
@@ -4,6 +4,7 @@
   Copyright (C) 2006 Mandriva Conectiva S.A.
   Copyright (C) 2006 Arnaldo Carvalho de Melo <acme@mandriva.com>
   Copyright (C) 2007- Arnaldo Carvalho de Melo <acme@redhat.com>
+  Copyright (c) 2022 Oracle and/or its affiliates.
 */
 
 #include <argp.h>
@@ -37,7 +38,7 @@ static bool ctf_encode;
 static bool sort_output;
 static bool need_resort;
 static bool first_obj_only;
-static bool skip_encoding_btf_vars;
+static enum btf_var_option encode_btf_vars = BTF_VAR_PERCPU;
 static bool btf_encode_force;
 static const char *base_btf_file;
 
@@ -1221,6 +1222,7 @@ ARGP_PROGRAM_VERSION_HOOK_DEF = dwarves_print_version;
 #define ARGP_languages		   335
 #define ARGP_languages_exclude	   336
 #define ARGP_skip_encoding_btf_enum64 337
+#define ARGP_encode_all_btf_vars   338
 
 static const struct argp_option pahole__options[] = {
 	{
@@ -1532,7 +1534,12 @@ static const struct argp_option pahole__options[] = {
 	{
 		.name = "skip_encoding_btf_vars",
 		.key  = ARGP_skip_encoding_btf_vars,
-		.doc  = "Do not encode VARs in BTF."
+		.doc  = "Do not encode any VARs in BTF (default: only percpu)."
+	},
+	{
+		.name = "encode_all_btf_vars",
+		.key  = ARGP_encode_all_btf_vars,
+		.doc  = "Encode all VARs in BTF (default: only percpu)."
 	},
 	{
 		.name = "btf_encode_force",
@@ -1757,8 +1764,6 @@ static error_t pahole__options_parser(int key, char *arg,
 		conf.range = arg;			break;
 	case ARGP_header_type:
 		conf.header_type = arg;			break;
-	case ARGP_skip_encoding_btf_vars:
-		skip_encoding_btf_vars = true;		break;
 	case ARGP_btf_encode_force:
 		btf_encode_force = true;		break;
 	case ARGP_btf_base:
@@ -1795,6 +1800,20 @@ static error_t pahole__options_parser(int key, char *arg,
 		languages.str = arg;			break;
 	case ARGP_skip_encoding_btf_enum64:
 		conf_load.skip_encoding_btf_enum64 = true;	break;
+	case ARGP_skip_encoding_btf_vars:
+		if (encode_btf_vars != BTF_VAR_PERCPU) {
+			fprintf(stderr, "error: --encode_all_btf_vars and --skip_encoding_btf_vars are mutually exclusive\n");
+			return ARGP_HELP_SEE;
+		}
+		encode_btf_vars = BTF_VAR_NONE;
+		break;
+	case ARGP_encode_all_btf_vars:
+		if (encode_btf_vars != BTF_VAR_PERCPU) {
+			fprintf(stderr, "error: --encode_all_btf_vars and --skip_encoding_btf_vars are mutually exclusive\n");
+			return ARGP_HELP_SEE;
+		}
+		encode_btf_vars = BTF_VAR_ALL;
+		break;
 	default:
 		return ARGP_ERR_UNKNOWN;
 	}
@@ -3034,7 +3053,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			 * And, it is used by the thread
 			 * create it.
 			 */
-			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
+			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, encode_btf_vars,
 						       btf_encode_force, btf_gen_floats, global_verbose);
 			if (btf_encoder && thr_data) {
 				struct thread_data *thread = thr_data;
@@ -3064,7 +3083,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 				thread->encoder =
 					btf_encoder__new(cu, detached_btf_filename,
 							 NULL,
-							 skip_encoding_btf_vars,
+							 encode_btf_vars,
 							 btf_encode_force,
 							 btf_gen_floats,
 							 global_verbose);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
  2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
                   ` (6 preceding siblings ...)
  2022-08-26 18:49 ` [PATCH dwarves 7/7] btf_encoder: allow encoding " Stephen Brennan
@ 2022-08-30 15:14 ` Alexei Starovoitov
  2022-09-07 19:06   ` Stephen Brennan
  7 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2022-08-30 15:14 UTC (permalink / raw)
  To: Stephen Brennan, Jiri Olsa, Andrii Nakryiko, Daniel Borkmann
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, Alan Maguire

On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Hello everyone,
>
> BTF offers some exciting new possibilities beyond its original intent;
> one of these is making the kernel more self-describing for debug tools.
> Kallsyms contains symbol table data, and ORC (for x86_64) contains
> information to help unwind stacks. Now, BTF can provide type information
> for functions and variables. Taken together, this data is enough to
> power the basic (read-only) functions of a postmortem or live debugger,
> without falling back on the heavier debugging information formats like
> DWARF. What's more, all of these data sources are contained within the
> kernel image itself, and are thus available on live systems and within
> crash dumps, without consulting any external debug information files.
>
> However, currently BTF generation emits information only for percpu
> variables. This patch series removes that limitation, allowing
> generating BTF for all variables, thus providing complete type
> information for debuggers.
>
> Of course, generating additional BTF means that more data must be stored
> in the kernel image, and that may not be okay for everyone. Thus, the
> new behavior must be explicitly enabled by a flag.
>
> Testing
> -------
>
> To verify this change and illustrate the additional space required, I
> built v5.19-rc7 on x86_defconfig, with the following additionally
> enabled:
>
> enable DEBUG_INFO_DWARF4
> enable BPF_SYSCALL
> enable DEBUG_INFO_BTF
>
> I then ran pahole to generate BTF from the built vmlinux in three
> configurations, and recorded the size of the BTF for each:
>
> 1) using the current master branch
>    size: 5505315 bytes
> 2) using this patched version, without enabling --encode_all_btf_vars
>    size: 5505315 bytes
> 3) using this patched version, with --encode_all_btf_vars enabled
>    size: 6811291 bytes
>
> A total increase of 1.25 MiB, or a 23.7% increase. This is definitely
> notable, but not unreasonable for many use cases such as desktop or
> server applications. I also verified that the data generated by cases 1
> and 2 are byte-for-byte identical: that is, there are no changes to the
> generated BTF unless --encode_all_btf_vars is enabled.
>
> I also verified that the output variables makes sense. I created an
> application which parses the output BTF and dumps the
> declarations (BTF_KIND_VAR and BTF_KIND_FUNC), and then diffed its
> output between configuration 2 and 3. I'm happy to provide a link to
> that diff (it's of course too big to include in the email).
>
> End-to-end test
> ---------------
>
> To show this is not just theory, I've created an end-to-end test which
> combines BTF generated via this patch series, along with a kernel patch
> necessary to expose the kallsyms data [1], and a branch of the drgn
> debugger[2] which implements kallsyms and BTF parsing. Core dumps
> generated on the resulting kernel can be loaded by the drgn debugger,
> and the it can read out variables from the dump with full type
> information without needing to consult a DWARF debuginfo file.
>
> Future Work
> -----------
>
> If this proves acceptable, I'd like to follow-up with a kernel patch to
> add a configuration option (default=n) for generating BTF with all
> variables, which distributions could choose to enable or not.
>
> There was previous discussion[3] about leveraging split BTF or building
> additional kernel modules to contain the extra variables. I believe with
> this patch series, it is possible to do that. However, I'd argue that
> simpler is better here: the advantage for using BTF is having it all
> available in the kernel/module image. Storing extra BTF on the
> filesystem would break that advantage, and at that point, you'd be
> better off using a debuginfo format like CTF, which is lightweight and
> expected to be found on the filesystem.

With all or nothing approach the distros would have a hard choice
to make whether to enable that kconfig, increase BTF and consume
extra memory without any obvious reason or just don't do it.
Majority probably is not going to enable it.
So the feature will become a single vendor only and with
inevitable bit-rot.

Whereas with split BTF and extra kernel module approach
we can enable BTF with all global vars by default.
The extra module will be shipped by all distros and tools
like bpftrace might start using it.

>
> [1]: https://lore.kernel.org/lkml/20220517000508.777145-3-stephen.s.brennan@oracle.com/T/
>      (The above series is already in the 6.0 RC's)
> [2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf
> [3]: https://lore.kernel.org/bpf/586a6288-704a-f7a7-b256-e18a675927df@oracle.com/
>
> Stephen Brennan (7):
>   dutil: return ELF section name when looked up by index
>   btf_encoder: Rename percpu structures to variables
>   btf_encoder: cache all ELF section info
>   btf_encoder: make the variable array dynamic
>   btf_encoder: record ELF section for collected variables
>   btf_encoder: collect all variables
>   btf_encoder: allow encoding all variables
>
>  btf_encoder.c      | 196 +++++++++++++++++++++++++++------------------
>  btf_encoder.h      |   8 +-
>  dutil.c            |  10 ++-
>  dutil.h            |   2 +-
>  man-pages/pahole.1 |   6 +-
>  pahole.c           |  31 +++++--
>  6 files changed, 165 insertions(+), 88 deletions(-)
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
  2022-08-30 15:14 ` [PATCH dwarves 0/7] Add support for generating BTF for " Alexei Starovoitov
@ 2022-09-07 19:06   ` Stephen Brennan
  2022-09-07 19:27     ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Brennan @ 2022-09-07 19:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Jiri Olsa, Andrii Nakryiko, Daniel Borkmann
  Cc: dwarves, bpf, Arnaldo Carvalho de Melo, Alan Maguire

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
[...]
>> Future Work
>> -----------
>>
>> If this proves acceptable, I'd like to follow-up with a kernel patch to
>> add a configuration option (default=n) for generating BTF with all
>> variables, which distributions could choose to enable or not.
>>
>> There was previous discussion[3] about leveraging split BTF or building
>> additional kernel modules to contain the extra variables. I believe with
>> this patch series, it is possible to do that. However, I'd argue that
>> simpler is better here: the advantage for using BTF is having it all
>> available in the kernel/module image. Storing extra BTF on the
>> filesystem would break that advantage, and at that point, you'd be
>> better off using a debuginfo format like CTF, which is lightweight and
>> expected to be found on the filesystem.
>
> With all or nothing approach the distros would have a hard choice
> to make whether to enable that kconfig, increase BTF and consume
> extra memory without any obvious reason or just don't do it.
> Majority probably is not going to enable it.
> So the feature will become a single vendor only and with
> inevitable bit-rot.

I'd intend to support it even if just a single distribution enabled it.
But I do see your concern.

> Whereas with split BTF and extra kernel module approach
> we can enable BTF with all global vars by default.
> The extra module will be shipped by all distros and tools
> like bpftrace might start using it.

Split BTF is currently limited to a single base BTF file. We'd need more
patches for pahole to support multiple --btf_base files: e.g.
vmlinux.btf and vmlinux-variables.btf. There's also the question of
modules: presumably we wouldn't try to have "$MODULE" and
"$MODULE-btf-extra" modules due to the added complexity. I doubt the
space savings would be worth it.

I can look into these concerns, but if possible I would like to proceed
with this series, as it is a separate concern from the exact mechanism
by which we include extra BTF into the kernel.

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
  2022-09-07 19:06   ` Stephen Brennan
@ 2022-09-07 19:27     ` Alexei Starovoitov
  2022-09-07 21:54       ` Stephen Brennan
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2022-09-07 19:27 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Jiri Olsa, Andrii Nakryiko, Daniel Borkmann, dwarves, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire

On Wed, Sep 7, 2022 at 12:07 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> [...]
> >> Future Work
> >> -----------
> >>
> >> If this proves acceptable, I'd like to follow-up with a kernel patch to
> >> add a configuration option (default=n) for generating BTF with all
> >> variables, which distributions could choose to enable or not.
> >>
> >> There was previous discussion[3] about leveraging split BTF or building
> >> additional kernel modules to contain the extra variables. I believe with
> >> this patch series, it is possible to do that. However, I'd argue that
> >> simpler is better here: the advantage for using BTF is having it all
> >> available in the kernel/module image. Storing extra BTF on the
> >> filesystem would break that advantage, and at that point, you'd be
> >> better off using a debuginfo format like CTF, which is lightweight and
> >> expected to be found on the filesystem.
> >
> > With all or nothing approach the distros would have a hard choice
> > to make whether to enable that kconfig, increase BTF and consume
> > extra memory without any obvious reason or just don't do it.
> > Majority probably is not going to enable it.
> > So the feature will become a single vendor only and with
> > inevitable bit-rot.
>
> I'd intend to support it even if just a single distribution enabled it.
> But I do see your concern.

This thread was dormant for 8 days.
That's a poor example of "intend to support".


> > Whereas with split BTF and extra kernel module approach
> > we can enable BTF with all global vars by default.
> > The extra module will be shipped by all distros and tools
> > like bpftrace might start using it.
>
> Split BTF is currently limited to a single base BTF file. We'd need more
> patches for pahole to support multiple --btf_base files: e.g.
> vmlinux.btf and vmlinux-variables.btf. There's also the question of
> modules: presumably we wouldn't try to have "$MODULE" and
> "$MODULE-btf-extra" modules due to the added complexity. I doubt the
> space savings would be worth it.
>
> I can look into these concerns, but if possible I would like to proceed
> with this series, as it is a separate concern from the exact mechanism
> by which we include extra BTF into the kernel.

Not an option. Sorry.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
  2022-09-07 19:27     ` Alexei Starovoitov
@ 2022-09-07 21:54       ` Stephen Brennan
  2022-09-08 20:35         ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Brennan @ 2022-09-07 21:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Andrii Nakryiko, Daniel Borkmann, dwarves, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Wed, Sep 7, 2022 at 12:07 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> > On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
>> > <stephen.s.brennan@oracle.com> wrote:
>> [...]
>> >> Future Work
>> >> -----------
>> >>
>> >> If this proves acceptable, I'd like to follow-up with a kernel patch to
>> >> add a configuration option (default=n) for generating BTF with all
>> >> variables, which distributions could choose to enable or not.
>> >>
>> >> There was previous discussion[3] about leveraging split BTF or building
>> >> additional kernel modules to contain the extra variables. I believe with
>> >> this patch series, it is possible to do that. However, I'd argue that
>> >> simpler is better here: the advantage for using BTF is having it all
>> >> available in the kernel/module image. Storing extra BTF on the
>> >> filesystem would break that advantage, and at that point, you'd be
>> >> better off using a debuginfo format like CTF, which is lightweight and
>> >> expected to be found on the filesystem.
>> >
>> > With all or nothing approach the distros would have a hard choice
>> > to make whether to enable that kconfig, increase BTF and consume
>> > extra memory without any obvious reason or just don't do it.
>> > Majority probably is not going to enable it.
>> > So the feature will become a single vendor only and with
>> > inevitable bit-rot.
>>
>> I'd intend to support it even if just a single distribution enabled it.
>> But I do see your concern.
>
> This thread was dormant for 8 days.
> That's a poor example of "intend to support".

You're right, I definitely could have replied sooner. I'm sorry for that.

>> > Whereas with split BTF and extra kernel module approach
>> > we can enable BTF with all global vars by default.
>> > The extra module will be shipped by all distros and tools
>> > like bpftrace might start using it.
>>
>> Split BTF is currently limited to a single base BTF file. We'd need more
>> patches for pahole to support multiple --btf_base files: e.g.
>> vmlinux.btf and vmlinux-variables.btf. There's also the question of
>> modules: presumably we wouldn't try to have "$MODULE" and
>> "$MODULE-btf-extra" modules due to the added complexity. I doubt the
>> space savings would be worth it.
>>
>> I can look into these concerns, but if possible I would like to proceed
>> with this series, as it is a separate concern from the exact mechanism
>> by which we include extra BTF into the kernel.
>
> Not an option. Sorry.

Ok, so let me describe what I understand to be the proposed design as of
the previous thread, and see if it satisfies your concerns. We can work
from there to make sure we've got a concensus design before going
further.

Option #1
---------

* A new module, "vmlinux-btf-extra" (or something roughly like that) is
  added, which contains BTF only. It is generated with
  --encode_all_btf_vars and uses --btf_base=path/to/vmlinux_btf so that
  it contains only BTF variables. The vmlinux BTF would be generated
  same as always (without the --encode_all_btf_vars).

* In the previous thread, it was proposed [1] that modules could
  include variables in their BTF in order to reduce the complexity of
  the change. Modules would have their BTF generated using
  --encode_all_btf_vars and --btf_base=path/to/vmlinux_btf. The
  resulting hierarchy would look like this:

  vmlinux_btf  [ functions and percpu vars only ]
  |- vmlinux-btf-extra [ all other vars for vmlinux ]
  |- $MODULE   [ functions and all vars ]
  ...

This option is desirable because it means that we only need 2-level
split BTF, and so we don't actually need to make changes to pahole for
multiple --btf_base files. There are two downsides I see:

(a) While we save space on vmlinux BTF, each module will have a bit of
    extra data for variable types. On my laptop (5.15 based) I have 9.8
    MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
    If we assume the same overhead of 23.7%, that would be 1.1 MB of
    extra module BTF for my particular use case.

    $ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
    9876871
    $ ls -l /sys/kernel/btf/vmlinux
    -r--r--r-- 1 root root 5174406 Sep  7 14:20 /sys/kernel/btf/vmlinux

(b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
    duplicate type definitions, wasting additional space. However, as
    far as I understand it, this was already a possibility, e.g.
    $MODULE1 and $MODULE2 could already contain duplicate types. So I
    think this downside is no more.


Option #2
---------

* The vmlinux-btf-extra module is still added as in Option #1.

* Further, each module would have its own "$MODULE-btf-extra" module to
  add in extra BTF. These would be built with a --btf_base=$MODULE.ko
  and of course that BTF is based on vmlinux, so we would have:

  vmlinux_btf              [ functions and percpu vars only ]
  |- vmlinux-btf-extras    [ all other vars for vmlinux ]
  |- $MODULE               [ functions and percpu vars only ]
     |- $MODULE-btf-extra  [ all  other vars for $MODULE ]

This is much more complex, pahole must be extended to support a
hierarchy of --btf_base files. The kernel itself may not need to
understand multi-level BTF since there's no requirement that it actually
understand $MODULE-btf-extra, so long as it exposes it via
/sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
mechanism to allow an administrator to say "please always load
$MODULE-btf-extras alongside $MODULE", but I think that would be a
userspace problem.

This resolves issue (a) from option #1, of course at implementation
cost.

Regardless of Option #1 or #2, I'd propose that we implement this as a
tristate, similar to what Alan proposed [2]. When set to "m" we use the
solutions described above, and when set to "y", we don't bother with it,
instead using --encode_all_btf_vars for all generation.

If we go with Option #1, no changes to this series should be necessary.
If we go with Option #2, I'll need to extend pahole to support at least
two BTF base files. Please let me know your thoughts.

Stephen

[1]: https://lore.kernel.org/bpf/CAEf4BzZmJKqXaJMBxhKqFNXzjO=eN5gk2xQVnmQVdK2xd3HQ=g@mail.gmail.com/
[2]: https://lore.kernel.org/bpf/alpine.LRH.2.23.451.2205032254390.10133@MyRouter/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
  2022-09-07 21:54       ` Stephen Brennan
@ 2022-09-08 20:35         ` Alexei Starovoitov
  2022-09-09 19:31           ` Stephen Brennan
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2022-09-08 20:35 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Jiri Olsa, Andrii Nakryiko, Daniel Borkmann, dwarves, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire

On Wed, Sep 7, 2022 at 2:54 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> > On Wed, Sep 7, 2022 at 12:07 PM Stephen Brennan
> > <stephen.s.brennan@oracle.com> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >> > On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
> >> > <stephen.s.brennan@oracle.com> wrote:
> >> [...]
> >> >> Future Work
> >> >> -----------
> >> >>
> >> >> If this proves acceptable, I'd like to follow-up with a kernel patch to
> >> >> add a configuration option (default=n) for generating BTF with all
> >> >> variables, which distributions could choose to enable or not.
> >> >>
> >> >> There was previous discussion[3] about leveraging split BTF or building
> >> >> additional kernel modules to contain the extra variables. I believe with
> >> >> this patch series, it is possible to do that. However, I'd argue that
> >> >> simpler is better here: the advantage for using BTF is having it all
> >> >> available in the kernel/module image. Storing extra BTF on the
> >> >> filesystem would break that advantage, and at that point, you'd be
> >> >> better off using a debuginfo format like CTF, which is lightweight and
> >> >> expected to be found on the filesystem.
> >> >
> >> > With all or nothing approach the distros would have a hard choice
> >> > to make whether to enable that kconfig, increase BTF and consume
> >> > extra memory without any obvious reason or just don't do it.
> >> > Majority probably is not going to enable it.
> >> > So the feature will become a single vendor only and with
> >> > inevitable bit-rot.
> >>
> >> I'd intend to support it even if just a single distribution enabled it.
> >> But I do see your concern.
> >
> > This thread was dormant for 8 days.
> > That's a poor example of "intend to support".
>
> You're right, I definitely could have replied sooner. I'm sorry for that.
>
> >> > Whereas with split BTF and extra kernel module approach
> >> > we can enable BTF with all global vars by default.
> >> > The extra module will be shipped by all distros and tools
> >> > like bpftrace might start using it.
> >>
> >> Split BTF is currently limited to a single base BTF file. We'd need more
> >> patches for pahole to support multiple --btf_base files: e.g.
> >> vmlinux.btf and vmlinux-variables.btf. There's also the question of
> >> modules: presumably we wouldn't try to have "$MODULE" and
> >> "$MODULE-btf-extra" modules due to the added complexity. I doubt the
> >> space savings would be worth it.
> >>
> >> I can look into these concerns, but if possible I would like to proceed
> >> with this series, as it is a separate concern from the exact mechanism
> >> by which we include extra BTF into the kernel.
> >
> > Not an option. Sorry.
>
> Ok, so let me describe what I understand to be the proposed design as of
> the previous thread, and see if it satisfies your concerns. We can work
> from there to make sure we've got a concensus design before going
> further.

I was hoping Andrii and others will provide their opinion.
Here are my .02

> Option #1
> ---------
>
> * A new module, "vmlinux-btf-extra" (or something roughly like that) is
>   added, which contains BTF only. It is generated with
>   --encode_all_btf_vars and uses --btf_base=path/to/vmlinux_btf so that
>   it contains only BTF variables. The vmlinux BTF would be generated
>   same as always (without the --encode_all_btf_vars).
>
> * In the previous thread, it was proposed [1] that modules could
>   include variables in their BTF in order to reduce the complexity of
>   the change. Modules would have their BTF generated using
>   --encode_all_btf_vars and --btf_base=path/to/vmlinux_btf. The
>   resulting hierarchy would look like this:
>
>   vmlinux_btf  [ functions and percpu vars only ]
>   |- vmlinux-btf-extra [ all other vars for vmlinux ]
>   |- $MODULE   [ functions and all vars ]
>   ...
>
> This option is desirable because it means that we only need 2-level
> split BTF, and so we don't actually need to make changes to pahole for
> multiple --btf_base files. There are two downsides I see:
>
> (a) While we save space on vmlinux BTF, each module will have a bit of
>     extra data for variable types. On my laptop (5.15 based) I have 9.8
>     MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
>     If we assume the same overhead of 23.7%, that would be 1.1 MB of
>     extra module BTF for my particular use case.
>
>     $ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
>     9876871
>     $ ls -l /sys/kernel/btf/vmlinux
>     -r--r--r-- 1 root root 5174406 Sep  7 14:20 /sys/kernel/btf/vmlinux
>
> (b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
>     duplicate type definitions, wasting additional space. However, as
>     far as I understand it, this was already a possibility, e.g.
>     $MODULE1 and $MODULE2 could already contain duplicate types. So I
>     think this downside is no more.

Both concerns are valid, but I'm a bit puzzled with (a).
At least in the networking drivers the number of global vars is very small.
I expected other drivers to be similar.
So having "functions and all vars" in ko-s should not add
that much overhead.

Maybe you're seeing this overhead because pahole is adding
all declared vars and not only the vars that are actually present?
That would explain the discrepancy.
(b) with a bunch of duplicates is a sign that something is off as well.

>
>
> Option #2
> ---------
>
> * The vmlinux-btf-extra module is still added as in Option #1.
>
> * Further, each module would have its own "$MODULE-btf-extra" module to
>   add in extra BTF. These would be built with a --btf_base=$MODULE.ko
>   and of course that BTF is based on vmlinux, so we would have:
>
>   vmlinux_btf              [ functions and percpu vars only ]
>   |- vmlinux-btf-extras    [ all other vars for vmlinux ]
>   |- $MODULE               [ functions and percpu vars only ]
>      |- $MODULE-btf-extra  [ all  other vars for $MODULE ]
>
> This is much more complex, pahole must be extended to support a
> hierarchy of --btf_base files. The kernel itself may not need to
> understand multi-level BTF since there's no requirement that it actually
> understand $MODULE-btf-extra, so long as it exposes it via
> /sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
> mechanism to allow an administrator to say "please always load
> $MODULE-btf-extras alongside $MODULE", but I think that would be a
> userspace problem.
>
> This resolves issue (a) from option #1, of course at implementation
> cost.
>
> Regardless of Option #1 or #2, I'd propose that we implement this as a
> tristate, similar to what Alan proposed [2]. When set to "m" we use the
> solutions described above, and when set to "y", we don't bother with it,
> instead using --encode_all_btf_vars for all generation.
>
> If we go with Option #1, no changes to this series should be necessary.
> If we go with Option #2, I'll need to extend pahole to support at least
> two BTF base files. Please let me know your thoughts.

Completely agree that two level btf-extra needs quite a bit more work.
Before we proceed with option 2 let's figure out
the reason for extra space in option 1.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
  2022-09-08 20:35         ` Alexei Starovoitov
@ 2022-09-09 19:31           ` Stephen Brennan
  2022-09-23 23:38             ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Brennan @ 2022-09-09 19:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Andrii Nakryiko, Daniel Borkmann, dwarves, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire

>> (a) While we save space on vmlinux BTF, each module will have a bit of
>>     extra data for variable types. On my laptop (5.15 based) I have 9.8
>>     MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
>>     If we assume the same overhead of 23.7%, that would be 1.1 MB of
>>     extra module BTF for my particular use case.
>>
>>     $ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
>>     9876871
>>     $ ls -l /sys/kernel/btf/vmlinux
>>     -r--r--r-- 1 root root 5174406 Sep  7 14:20 /sys/kernel/btf/vmlinux
>>
>> (b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
>>     duplicate type definitions, wasting additional space. However, as
>>     far as I understand it, this was already a possibility, e.g.
>>     $MODULE1 and $MODULE2 could already contain duplicate types. So I
>>     think this downside is no more.
>
> Both concerns are valid, but I'm a bit puzzled with (a).
> At least in the networking drivers the number of global vars is very small.
> I expected other drivers to be similar.
> So having "functions and all vars" in ko-s should not add
> that much overhead.
>
> Maybe you're seeing this overhead because pahole is adding
> all declared vars and not only the vars that are actually present?
> That would explain the discrepancy.
> (b) with a bunch of duplicates is a sign that something is off as well.

Sorry, I didn't actually have an analysis for module BTF, I was just
extrapolating the result I had seen for vmlinux. I went ahead and did a
proper test, generating BTF for a distribution kernel from Oracle Linux
(kernel-uek-5.15.0-1.43.4.1.el9uek.x86_64) - something that I easily had
on hand and could regenerate the BTF for quickly.

Basically, the steps were:

    pahole -J vmlinux --btf_encode_detached=vmlinux.btf
    pahole -J vmlinux --btf_encode_detached=vmlinux.btf.all \
           --encode_all_btf_vars

    # For each module
    pahole -J $MODULE --btf_encode_detached=$MODULE.btf \
           --btf_base=vmlinux.btf
    pahole -J $MODULE --btf_encode_detached=$MODULE.btf.all \
           --btf_base=vmlinux.btf --encode_all_btf_vars

    # what if we based the module BTF on the "vmlinux.btf.all" instead?
    pahole -J $MODULE --btf_encode_detached=$MODULE.btf.all.all \
           --btf_base=vmlinux.btf.all --encode_all_btf_vars

And then using ls/awk to sum up the bytes of each BTF file. Results are:

vmlinux:

-rw-r-----. 1 opc opc 4904193 Sep  9 18:58 vmlinux.btf
-rw-r-----. 1 opc opc 6534684 Sep  9 18:58 vmlinux.btf.all

In this case there's a 33% increase in BTF size.

modules:

$ ls -l *.btf | awk '{sum += $5} END {print(sum)}'
43979532
$ ls -l *.btf.all | awk '{sum += $5} END {print(sum)}'
44757792
$ ls -l *.btf.all.all | awk '{sum += $5} END {print(sum)}'
44696639

So the "*.btf.all.all" modules were just an experiment to see if the
extra data inside "vmlinux.btf.all" could reduce some duplication in
module BTF. The answer was yes, but not enough to make up for the
increase in the vmlinux BTF size.

The "*.btf.all" modules are the ones we would actually expect to use in
Option #1, where we have a vmlinux-btf-extras and the rest of the
modules include their globals in their BTF sections directly, and are
based off of the vmlinux BTF. This test shows on average, that the
module BTF size would grow by 1.6% with Option #1. Of course the exact
memory size that accounts for will vary by workload, depending on how
many modules are loaded. But I'd imagine, assuming you have around 5MB
of module BTF *actually loaded*, then the overhead would be around 85k
bytes.  I don't know about how you feel, but I think that sounds
acceptable, it's just 22 pages at 4k size :)

Let me know how it sounds to you.

Thanks,
Stephen

>>
>>
>> Option #2
>> ---------
>>
>> * The vmlinux-btf-extra module is still added as in Option #1.
>>
>> * Further, each module would have its own "$MODULE-btf-extra" module to
>>   add in extra BTF. These would be built with a --btf_base=$MODULE.ko
>>   and of course that BTF is based on vmlinux, so we would have:
>>
>>   vmlinux_btf              [ functions and percpu vars only ]
>>   |- vmlinux-btf-extras    [ all other vars for vmlinux ]
>>   |- $MODULE               [ functions and percpu vars only ]
>>      |- $MODULE-btf-extra  [ all  other vars for $MODULE ]
>>
>> This is much more complex, pahole must be extended to support a
>> hierarchy of --btf_base files. The kernel itself may not need to
>> understand multi-level BTF since there's no requirement that it actually
>> understand $MODULE-btf-extra, so long as it exposes it via
>> /sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
>> mechanism to allow an administrator to say "please always load
>> $MODULE-btf-extras alongside $MODULE", but I think that would be a
>> userspace problem.
>>
>> This resolves issue (a) from option #1, of course at implementation
>> cost.
>>
>> Regardless of Option #1 or #2, I'd propose that we implement this as a
>> tristate, similar to what Alan proposed [2]. When set to "m" we use the
>> solutions described above, and when set to "y", we don't bother with it,
>> instead using --encode_all_btf_vars for all generation.
>>
>> If we go with Option #1, no changes to this series should be necessary.
>> If we go with Option #2, I'll need to extend pahole to support at least
>> two BTF base files. Please let me know your thoughts.
>
> Completely agree that two level btf-extra needs quite a bit more work.
> Before we proceed with option 2 let's figure out
> the reason for extra space in option 1.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
  2022-09-09 19:31           ` Stephen Brennan
@ 2022-09-23 23:38             ` Andrii Nakryiko
  0 siblings, 0 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2022-09-23 23:38 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Alexei Starovoitov, Jiri Olsa, Andrii Nakryiko, Daniel Borkmann,
	dwarves, bpf, Arnaldo Carvalho de Melo, Alan Maguire

On Fri, Sep 9, 2022 at 12:31 PM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
>
> >> (a) While we save space on vmlinux BTF, each module will have a bit of
> >>     extra data for variable types. On my laptop (5.15 based) I have 9.8
> >>     MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
> >>     If we assume the same overhead of 23.7%, that would be 1.1 MB of
> >>     extra module BTF for my particular use case.
> >>
> >>     $ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
> >>     9876871
> >>     $ ls -l /sys/kernel/btf/vmlinux
> >>     -r--r--r-- 1 root root 5174406 Sep  7 14:20 /sys/kernel/btf/vmlinux
> >>
> >> (b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
> >>     duplicate type definitions, wasting additional space. However, as
> >>     far as I understand it, this was already a possibility, e.g.
> >>     $MODULE1 and $MODULE2 could already contain duplicate types. So I
> >>     think this downside is no more.
> >
> > Both concerns are valid, but I'm a bit puzzled with (a).
> > At least in the networking drivers the number of global vars is very small.
> > I expected other drivers to be similar.
> > So having "functions and all vars" in ko-s should not add
> > that much overhead.
> >
> > Maybe you're seeing this overhead because pahole is adding
> > all declared vars and not only the vars that are actually present?
> > That would explain the discrepancy.
> > (b) with a bunch of duplicates is a sign that something is off as well.
>
> Sorry, I didn't actually have an analysis for module BTF, I was just
> extrapolating the result I had seen for vmlinux. I went ahead and did a
> proper test, generating BTF for a distribution kernel from Oracle Linux
> (kernel-uek-5.15.0-1.43.4.1.el9uek.x86_64) - something that I easily had
> on hand and could regenerate the BTF for quickly.
>
> Basically, the steps were:
>
>     pahole -J vmlinux --btf_encode_detached=vmlinux.btf
>     pahole -J vmlinux --btf_encode_detached=vmlinux.btf.all \
>            --encode_all_btf_vars
>
>     # For each module
>     pahole -J $MODULE --btf_encode_detached=$MODULE.btf \
>            --btf_base=vmlinux.btf
>     pahole -J $MODULE --btf_encode_detached=$MODULE.btf.all \
>            --btf_base=vmlinux.btf --encode_all_btf_vars
>
>     # what if we based the module BTF on the "vmlinux.btf.all" instead?
>     pahole -J $MODULE --btf_encode_detached=$MODULE.btf.all.all \
>            --btf_base=vmlinux.btf.all --encode_all_btf_vars
>
> And then using ls/awk to sum up the bytes of each BTF file. Results are:
>
> vmlinux:
>
> -rw-r-----. 1 opc opc 4904193 Sep  9 18:58 vmlinux.btf
> -rw-r-----. 1 opc opc 6534684 Sep  9 18:58 vmlinux.btf.all
>
> In this case there's a 33% increase in BTF size.
>
> modules:
>
> $ ls -l *.btf | awk '{sum += $5} END {print(sum)}'
> 43979532
> $ ls -l *.btf.all | awk '{sum += $5} END {print(sum)}'
> 44757792
> $ ls -l *.btf.all.all | awk '{sum += $5} END {print(sum)}'
> 44696639
>
> So the "*.btf.all.all" modules were just an experiment to see if the
> extra data inside "vmlinux.btf.all" could reduce some duplication in
> module BTF. The answer was yes, but not enough to make up for the
> increase in the vmlinux BTF size.
>
> The "*.btf.all" modules are the ones we would actually expect to use in
> Option #1, where we have a vmlinux-btf-extras and the rest of the
> modules include their globals in their BTF sections directly, and are
> based off of the vmlinux BTF. This test shows on average, that the
> module BTF size would grow by 1.6% with Option #1. Of course the exact
> memory size that accounts for will vary by workload, depending on how
> many modules are loaded. But I'd imagine, assuming you have around 5MB
> of module BTF *actually loaded*, then the overhead would be around 85k
> bytes.  I don't know about how you feel, but I think that sounds
> acceptable, it's just 22 pages at 4k size :)
>
> Let me know how it sounds to you.
>
> Thanks,
> Stephen
>
> >>
> >>
> >> Option #2
> >> ---------
> >>
> >> * The vmlinux-btf-extra module is still added as in Option #1.
> >>
> >> * Further, each module would have its own "$MODULE-btf-extra" module to
> >>   add in extra BTF. These would be built with a --btf_base=$MODULE.ko
> >>   and of course that BTF is based on vmlinux, so we would have:
> >>
> >>   vmlinux_btf              [ functions and percpu vars only ]
> >>   |- vmlinux-btf-extras    [ all other vars for vmlinux ]
> >>   |- $MODULE               [ functions and percpu vars only ]
> >>      |- $MODULE-btf-extra  [ all  other vars for $MODULE ]
> >>
> >> This is much more complex, pahole must be extended to support a
> >> hierarchy of --btf_base files. The kernel itself may not need to
> >> understand multi-level BTF since there's no requirement that it actually
> >> understand $MODULE-btf-extra, so long as it exposes it via
> >> /sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
> >> mechanism to allow an administrator to say "please always load
> >> $MODULE-btf-extras alongside $MODULE", but I think that would be a
> >> userspace problem.
> >>
> >> This resolves issue (a) from option #1, of course at implementation
> >> cost.
> >>
> >> Regardless of Option #1 or #2, I'd propose that we implement this as a
> >> tristate, similar to what Alan proposed [2]. When set to "m" we use the
> >> solutions described above, and when set to "y", we don't bother with it,
> >> instead using --encode_all_btf_vars for all generation.
> >>
> >> If we go with Option #1, no changes to this series should be necessary.
> >> If we go with Option #2, I'll need to extend pahole to support at least
> >> two BTF base files. Please let me know your thoughts.
> >
> > Completely agree that two level btf-extra needs quite a bit more work.
> > Before we proceed with option 2 let's figure out
> > the reason for extra space in option 1.

I don't think an extra module for each module just for keeping those
all-var-BTFs is acceptable, so Option #2 doesn't even seem like an
option.

But given a very small increase in size of BTF for modules when
including variables, I think Option #1 is quite reasonable.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-09-23 23:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 1/7] dutil: return ELF section name when looked up by index Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 2/7] btf_encoder: Rename percpu structures to variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 3/7] btf_encoder: cache all ELF section info Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 4/7] btf_encoder: make the variable array dynamic Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 5/7] btf_encoder: record ELF section for collected variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 6/7] btf_encoder: collect all variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 7/7] btf_encoder: allow encoding " Stephen Brennan
2022-08-30 15:14 ` [PATCH dwarves 0/7] Add support for generating BTF for " Alexei Starovoitov
2022-09-07 19:06   ` Stephen Brennan
2022-09-07 19:27     ` Alexei Starovoitov
2022-09-07 21:54       ` Stephen Brennan
2022-09-08 20:35         ` Alexei Starovoitov
2022-09-09 19:31           ` Stephen Brennan
2022-09-23 23:38             ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).