All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions
@ 2023-01-24 13:45 Alan Maguire
  2023-01-24 13:45 ` [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters Alan Maguire
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Alan Maguire @ 2023-01-24 13:45 UTC (permalink / raw)
  To: acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf, Alan Maguire

At optimization level -O2 or higher in gcc, static functions may be
optimized such that they have suffixes like .isra.0, .constprop.0 etc.
These represent

- constant propagation (.constprop.0);
- interprocedural scalar replacement of aggregates, removal of
  unused parameters and replacement of parameters passed by
  reference by parameters passed by value (.isra.0)

See [1] for details.

Currently BTF encoding does not handle such optimized functions
that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
This is safer because such suffixes can often indicate parameters have
been optimized out.  This series addresses this by matching a
function to a suffixed version ("foo" matching "foo.isra.0") while
ensuring that the function signature does not contain optimized-out
parameters.  Note that if the function is found ("foo") it will
be preferred, only falling back to "foo.isra.0" if lookup of the
function fails.  Addition to BTF is skipped if the function has
optimized-out parameters, since the expected function signature
will not match. BTF encoding does not include the "."-suffix to
be consistent with DWARF. In addition, the kernel currently does
not allow a "." suffix in a BTF function name.

A problem with this approach however is that BTF carries out the
encoding process in parallel across multiple CUs, and sometimes
a function has optimized-out parameters in one CU but not others;
we see this for NF_HOOK.constprop.0 for example.  So in order to
determine if the function has optimized-out parameters in any
CU, its addition is not carried out until we have processed all
CUs and are about to merge BTF.  At this point we know if any
such optimizations have occurred.  Patches 1-4 handle the
optimized-out parameter identification and matching "."-suffixed
functions with the original function to facilitate BTF
encoding.

Patch 5 addresses a related problem - it is entirely possible
for a static function of the same name to exist in different
CUs with different function signatures.  Because BTF does not
currently encode any information that would help disambiguate
which BTF function specification matches which static function
(in the case of multiple different function signatures), it is
best to eliminate such functions from BTF for now.  The same
mechanism that is used to compare static "."-suffixed functions
is re-used for the static function comparison.  A superficial
comparison of number of parameters/parameter names is done to
see if such representations are consistent, and if inconsistent
prototypes are observed, the function is flagged for exclusion
from BTF.

When these methods are combined - the additive encoding of
"."-suffixed functions and the subtractive elimination of
functions with inconsistent parameters - we see a small overall
increase in the number of functions in vmlinux BTF, from
49538 to 50083.  It turns out that the inconsistent prototype
checking will actually eliminate some of the suffix matches
also, for cases where the DWARF representation of a function
differs across CUs, but not via the abstract origin late DWARF
references showing optimized-out parameters that we check
for in patch 1.

[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Alan Maguire (5):
  dwarves: help dwarf loader spot functions with optimized-out
    parameters
  btf_encoder: refactor function addition into dedicated
    btf_encoder__add_func
  btf_encoder: child encoders should have a reference to parent encoder
  btf_encoder: represent "."-suffixed optimized functions (".isra.0") in
    BTF
  btf_encoder: skip BTF encoding of static functions with inconsistent
    prototypes

 btf_encoder.c  | 357 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 btf_encoder.h  |   2 +-
 dwarf_loader.c |  76 +++++++++++-
 dwarves.h      |   5 +-
 pahole.c       |   7 +-
 5 files changed, 390 insertions(+), 57 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-24 13:45 [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Alan Maguire
@ 2023-01-24 13:45 ` Alan Maguire
  2023-01-25 16:53   ` Jiri Olsa
  2023-01-25 17:47   ` Eduard Zingerman
  2023-01-24 13:45 ` [PATCH dwarves 2/5] btf_encoder: refactor function addition into dedicated btf_encoder__add_func Alan Maguire
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Alan Maguire @ 2023-01-24 13:45 UTC (permalink / raw)
  To: acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf, Alan Maguire

Compilation generates DWARF at several stages, and often the
later DWARF representations more accurately represent optimizations
that have occurred during compilation.

In particular, parameter representations can be spotted by their
abstract origin references to the original parameter, but they
often have more accurate location information.  In most cases,
the parameter locations will match calling conventions, and be
registers for the first 6 parameters on x86_64, first 8 on ARM64
etc.  If the parameter is not a register when it should be however,
it is likely passed via the stack or the compiler has used a
constant representation instead.

This change adds a field to parameters and their associated
ftype to note if a parameter has been optimized out.  Having
this information allows us to skip such functions, as their
presence in CUs makes BTF encoding impossible.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 dwarf_loader.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 dwarves.h      |  4 +++-
 2 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 5a74035..0220f1d 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -992,13 +992,67 @@ static struct class_member *class_member__new(Dwarf_Die *die, struct cu *cu,
 	return member;
 }
 
-static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
+/* How many function parameters are passed via registers?  Used below in
+ * determining if an argument has been optimized out or if it is simply
+ * an argument > NR_REGISTER_PARAMS.  Setting NR_REGISTER_PARAMS to 0
+ * allows unsupported architectures to skip tagging optimized-out
+ * values.
+ */
+#if defined(__x86_64__)
+#define NR_REGISTER_PARAMS      6
+#elif defined(__s390__)
+#define NR_REGISTER_PARAMS	5
+#elif defined(__aarch64__)
+#define NR_REGISTER_PARAMS      8
+#elif defined(__mips__)
+#define NR_REGISTER_PARAMS	8
+#elif defined(__powerpc__)
+#define NR_REGISTER_PARAMS	8
+#elif defined(__sparc__)
+#define NR_REGISTER_PARAMS	6
+#elif defined(__riscv) && __riscv_xlen == 64
+#define NR_REGISTER_PARAMS	8
+#elif defined(__arc__)
+#define NR_REGISTER_PARAMS	8
+#else
+#define NR_REGISTER_PARAMS      0
+#endif
+
+static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu,
+					struct conf_load *conf, int param_idx)
 {
 	struct parameter *parm = tag__alloc(cu, sizeof(*parm));
 
 	if (parm != NULL) {
+		struct location loc;
+
 		tag__init(&parm->tag, cu, die);
 		parm->name = attr_string(die, DW_AT_name, conf);
+
+		/* Parameters which use DW_AT_abstract_origin to point at
+		 * the original parameter definition (with no name in the DIE)
+		 * are the result of later DWARF generation during compilation
+		 * so often better take into account if arguments were
+		 * optimized out.
+		 *
+		 * By checking that locations for parameters that are expected
+		 * to be passed as registers are actually passed as registers,
+		 * we can spot optimized-out parameters.
+		 */
+		if (param_idx < NR_REGISTER_PARAMS && !parm->name &&
+		    attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
+		    loc.exprlen != 0) {
+			Dwarf_Op *expr = loc.expr;
+
+			switch (expr->atom) {
+			case DW_OP_reg1 ... DW_OP_reg31:
+			case DW_OP_breg0 ... DW_OP_breg31:
+				break;
+			default:
+				parm->optimized = true;
+				break;
+			}
+		}
 	}
 
 	return parm;
@@ -1450,7 +1504,7 @@ static struct tag *die__create_new_parameter(Dwarf_Die *die,
 					     struct cu *cu, struct conf_load *conf,
 					     int param_idx)
 {
-	struct parameter *parm = parameter__new(die, cu, conf);
+	struct parameter *parm = parameter__new(die, cu, conf, param_idx);
 
 	if (parm == NULL)
 		return NULL;
@@ -2209,6 +2263,10 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
 			}
 			pos->name = tag__parameter(dtype->tag)->name;
 			pos->tag.type = dtype->tag->type;
+			if (pos->optimized) {
+				tag__parameter(dtype->tag)->optimized = pos->optimized;
+				type->optimized_parms = 1;
+			}
 			continue;
 		}
 
@@ -2219,6 +2277,20 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
 		}
 		pos->tag.type = dtype->small_id;
 	}
+	/* if parameters were optimized out, set flag for the ftype this
+	 * function tag referred to via abstract origin.
+	 */
+	if (type->optimized_parms) {
+		struct dwarf_tag *dtype = type->tag.priv;
+		struct dwarf_tag *dftype;
+
+		dftype = dwarf_cu__find_tag_by_ref(dcu, &dtype->abstract_origin);
+		if (dftype && dftype->tag) {
+			struct ftype *ftype = tag__ftype(dftype->tag);
+
+			ftype->optimized_parms = 1;
+		}
+	}
 }
 
 static void lexblock__recode_dwarf_types(struct lexblock *tag, struct cu *cu)
diff --git a/dwarves.h b/dwarves.h
index 589588e..1ad1b3b 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -808,6 +808,7 @@ size_t lexblock__fprintf(const struct lexblock *lexblock, const struct cu *cu,
 struct parameter {
 	struct tag tag;
 	const char *name;
+	bool optimized;
 };
 
 static inline struct parameter *tag__parameter(const struct tag *tag)
@@ -827,7 +828,8 @@ struct ftype {
 	struct tag	 tag;
 	struct list_head parms;
 	uint16_t	 nr_parms;
-	uint8_t		 unspec_parms; /* just one bit is needed */
+	uint8_t		 unspec_parms:1; /* just one bit is needed */
+	uint8_t		 optimized_parms:1;
 };
 
 static inline struct ftype *tag__ftype(const struct tag *tag)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH dwarves 2/5] btf_encoder: refactor function addition into dedicated btf_encoder__add_func
  2023-01-24 13:45 [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Alan Maguire
  2023-01-24 13:45 ` [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters Alan Maguire
@ 2023-01-24 13:45 ` Alan Maguire
  2023-01-24 13:45 ` [PATCH dwarves 3/5] btf_encoder: child encoders should have a reference to parent encoder Alan Maguire
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Alan Maguire @ 2023-01-24 13:45 UTC (permalink / raw)
  To: acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf, Alan Maguire

This will be useful for postponing local function addition later on.
As part of this, store the type id offset and unspecified type in
the encoder, as this will simplify late addition of local functions.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 btf_encoder.c | 99 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 55 insertions(+), 44 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index a5fa04a..15a042c 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -54,6 +54,8 @@ struct btf_encoder {
 	struct gobuffer   percpu_secinfo;
 	const char	  *filename;
 	struct elf_symtab *symtab;
+	uint32_t	  type_id_off;
+	uint32_t	  unspecified_type;
 	bool		  has_index_type,
 			  need_index_type,
 			  skip_encoding_vars,
@@ -593,20 +595,20 @@ static int32_t btf_encoder__add_func_param(struct btf_encoder *encoder, const ch
 	}
 }
 
-static int32_t btf_encoder__tag_type(struct btf_encoder *encoder, uint32_t type_id_off, uint32_t tag_type)
+static int32_t btf_encoder__tag_type(struct btf_encoder *encoder, uint32_t tag_type)
 {
 	if (tag_type == 0)
 		return 0;
 
-	if (encoder->cu->unspecified_type.tag && tag_type == encoder->cu->unspecified_type.type) {
+	if (tag_type == encoder->unspecified_type) {
 		// No provision for encoding this, turn it into void.
 		return 0;
 	}
 
-	return type_id_off + tag_type;
+	return encoder->type_id_off + tag_type;
 }
 
-static int32_t btf_encoder__add_func_proto(struct btf_encoder *encoder, struct ftype *ftype, uint32_t type_id_off)
+static int32_t btf_encoder__add_func_proto(struct btf_encoder *encoder, struct ftype *ftype)
 {
 	struct btf *btf = encoder->btf;
 	const struct btf_type *t;
@@ -616,7 +618,7 @@ static int32_t btf_encoder__add_func_proto(struct btf_encoder *encoder, struct f
 
 	/* add btf_type for func_proto */
 	nr_params = ftype->nr_parms + (ftype->unspec_parms ? 1 : 0);
-	type_id = btf_encoder__tag_type(encoder, type_id_off, ftype->tag.type);
+	type_id = btf_encoder__tag_type(encoder, ftype->tag.type);
 
 	id = btf__add_func_proto(btf, type_id);
 	if (id > 0) {
@@ -634,7 +636,7 @@ static int32_t btf_encoder__add_func_proto(struct btf_encoder *encoder, struct f
 	ftype__for_each_parameter(ftype, param) {
 		const char *name = parameter__name(param);
 
-		type_id = param->tag.type == 0 ? 0 : type_id_off + param->tag.type;
+		type_id = param->tag.type == 0 ? 0 : encoder->type_id_off + param->tag.type;
 		++param_idx;
 		if (btf_encoder__add_func_param(encoder, name, type_id, param_idx == nr_params))
 			return -1;
@@ -762,6 +764,30 @@ static int32_t btf_encoder__add_decl_tag(struct btf_encoder *encoder, const char
 	return id;
 }
 
+static int32_t btf_encoder__add_func(struct btf_encoder *encoder, struct function *fn)
+{
+	int btf_fnproto_id, btf_fn_id, tag_type_id;
+	struct llvm_annotation *annot;
+	const char *name;
+
+	btf_fnproto_id = btf_encoder__add_func_proto(encoder, &fn->proto);
+	name = function__name(fn);
+	btf_fn_id = btf_encoder__add_ref_type(encoder, BTF_KIND_FUNC, btf_fnproto_id, name, false);
+	if (btf_fnproto_id < 0 || btf_fn_id < 0) {
+		printf("error: failed to encode function '%s'\n", function__name(fn));
+		return -1;
+	}
+	list_for_each_entry(annot, &fn->annots, node) {
+		tag_type_id = btf_encoder__add_decl_tag(encoder, annot->value, btf_fn_id, annot->component_idx);
+		if (tag_type_id < 0) {
+			fprintf(stderr, "error: failed to encode tag '%s' to func %s with component_idx %d\n",
+				annot->value, name, annot->component_idx);
+			return -1;
+		}
+	}
+	return 0;
+}
+
 /*
  * This corresponds to the same macro defined in
  * include/linux/kallsyms.h
@@ -859,22 +885,21 @@ static void dump_invalid_symbol(const char *msg, const char *sym,
 	fprintf(stderr, "PAHOLE: Error: Use '--btf_encode_force' to ignore such symbols and force emit the btf.\n");
 }
 
-static int tag__check_id_drift(const struct tag *tag,
-			       uint32_t core_id, uint32_t btf_type_id,
-			       uint32_t type_id_off)
+static int tag__check_id_drift(struct btf_encoder *encoder, const struct tag *tag,
+			       uint32_t core_id, uint32_t btf_type_id)
 {
-	if (btf_type_id != (core_id + type_id_off)) {
+	if (btf_type_id != (core_id + encoder->type_id_off)) {
 		fprintf(stderr,
 			"%s: %s id drift, core_id: %u, btf_type_id: %u, type_id_off: %u\n",
 			__func__, dwarf_tag_name(tag->tag),
-			core_id, btf_type_id, type_id_off);
+			core_id, btf_type_id, encoder->type_id_off);
 		return -1;
 	}
 
 	return 0;
 }
 
-static int32_t btf_encoder__add_struct_type(struct btf_encoder *encoder, struct tag *tag, uint32_t type_id_off)
+static int32_t btf_encoder__add_struct_type(struct btf_encoder *encoder, struct tag *tag)
 {
 	struct type *type = tag__type(tag);
 	struct class_member *pos;
@@ -896,7 +921,7 @@ static int32_t btf_encoder__add_struct_type(struct btf_encoder *encoder, struct
 		 * is required.
 		 */
 		name = class_member__name(pos);
-		if (btf_encoder__add_field(encoder, name, type_id_off + pos->tag.type, pos->bitfield_size, pos->bit_offset))
+		if (btf_encoder__add_field(encoder, name, encoder->type_id_off + pos->tag.type, pos->bitfield_size, pos->bit_offset))
 			return -1;
 	}
 
@@ -936,11 +961,11 @@ static int32_t btf_encoder__add_enum_type(struct btf_encoder *encoder, struct ta
 	return type_id;
 }
 
-static int btf_encoder__encode_tag(struct btf_encoder *encoder, struct tag *tag, uint32_t type_id_off,
+static int btf_encoder__encode_tag(struct btf_encoder *encoder, struct tag *tag,
 				   struct conf_load *conf_load)
 {
 	/* single out type 0 as it represents special type "void" */
-	uint32_t ref_type_id = tag->type == 0 ? 0 : type_id_off + tag->type;
+	uint32_t ref_type_id = tag->type == 0 ? 0 : encoder->type_id_off + tag->type;
 	struct base_type *bt;
 	const char *name;
 
@@ -970,7 +995,7 @@ static int btf_encoder__encode_tag(struct btf_encoder *encoder, struct tag *tag,
 		if (tag__type(tag)->declaration)
 			return btf_encoder__add_ref_type(encoder, BTF_KIND_FWD, 0, name, tag->tag == DW_TAG_union_type);
 		else
-			return btf_encoder__add_struct_type(encoder, tag, type_id_off);
+			return btf_encoder__add_struct_type(encoder, tag);
 	case DW_TAG_array_type:
 		/* TODO: Encode one dimension at a time. */
 		encoder->need_index_type = true;
@@ -978,7 +1003,7 @@ static int btf_encoder__encode_tag(struct btf_encoder *encoder, struct tag *tag,
 	case DW_TAG_enumeration_type:
 		return btf_encoder__add_enum_type(encoder, tag, conf_load);
 	case DW_TAG_subroutine_type:
-		return btf_encoder__add_func_proto(encoder, tag__ftype(tag), type_id_off);
+		return btf_encoder__add_func_proto(encoder, tag__ftype(tag));
         case DW_TAG_unspecified_type:
 		/* Just don't encode this for now, converting anything with this type to void (0) instead.
 		 *
@@ -1281,7 +1306,7 @@ static bool ftype__has_arg_names(const struct ftype *ftype)
 	return true;
 }
 
-static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_t type_id_off)
+static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
 {
 	struct cu *cu = encoder->cu;
 	uint32_t core_id;
@@ -1366,7 +1391,7 @@ static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder, uint32_
 			continue;
 		}
 
-		type = var->ip.tag.type + type_id_off;
+		type = var->ip.tag.type + encoder->type_id_off;
 		linkage = var->external ? BTF_VAR_GLOBAL_ALLOCATED : BTF_VAR_STATIC;
 
 		if (encoder->verbose) {
@@ -1507,7 +1532,6 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 
 int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct conf_load *conf_load)
 {
-	uint32_t type_id_off = btf__type_cnt(encoder->btf) - 1;
 	struct llvm_annotation *annot;
 	int btf_type_id, tag_type_id, skipped_types = 0;
 	uint32_t core_id;
@@ -1516,21 +1540,24 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 	int err = 0;
 
 	encoder->cu = cu;
+	encoder->type_id_off = btf__type_cnt(encoder->btf) - 1;
+	if (encoder->cu->unspecified_type.tag)
+		encoder->unspecified_type = encoder->cu->unspecified_type.type;
 
 	if (!encoder->has_index_type) {
 		/* cu__find_base_type_by_name() takes "type_id_t *id" */
 		type_id_t id;
 		if (cu__find_base_type_by_name(cu, "int", &id)) {
 			encoder->has_index_type = true;
-			encoder->array_index_id = type_id_off + id;
+			encoder->array_index_id = encoder->type_id_off + id;
 		} else {
 			encoder->has_index_type = false;
-			encoder->array_index_id = type_id_off + cu->types_table.nr_entries;
+			encoder->array_index_id = encoder->type_id_off + cu->types_table.nr_entries;
 		}
 	}
 
 	cu__for_each_type(cu, core_id, pos) {
-		btf_type_id = btf_encoder__encode_tag(encoder, pos, type_id_off, conf_load);
+		btf_type_id = btf_encoder__encode_tag(encoder, pos, conf_load);
 
 		if (btf_type_id == 0) {
 			++skipped_types;
@@ -1538,7 +1565,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 		}
 
 		if (btf_type_id < 0 ||
-		    tag__check_id_drift(pos, core_id, btf_type_id + skipped_types, type_id_off)) {
+		    tag__check_id_drift(encoder, pos, core_id, btf_type_id + skipped_types)) {
 			err = -1;
 			goto out;
 		}
@@ -1572,7 +1599,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 			continue;
 		}
 
-		btf_type_id = type_id_off + core_id;
+		btf_type_id = encoder->type_id_off + core_id;
 		ns = tag__namespace(pos);
 		list_for_each_entry(annot, &ns->annots, node) {
 			tag_type_id = btf_encoder__add_decl_tag(encoder, annot->value, btf_type_id, annot->component_idx);
@@ -1585,8 +1612,6 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 	}
 
 	cu__for_each_function(cu, core_id, fn) {
-		int btf_fnproto_id, btf_fn_id;
-		const char *name;
 
 		/*
 		 * Skip functions that:
@@ -1616,27 +1641,13 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 				continue;
 		}
 
-		btf_fnproto_id = btf_encoder__add_func_proto(encoder, &fn->proto, type_id_off);
-		name = function__name(fn);
-		btf_fn_id = btf_encoder__add_ref_type(encoder, BTF_KIND_FUNC, btf_fnproto_id, name, false);
-		if (btf_fnproto_id < 0 || btf_fn_id < 0) {
-			err = -1;
-			printf("error: failed to encode function '%s'\n", function__name(fn));
+		err = btf_encoder__add_func(encoder, fn);
+		if (err)
 			goto out;
-		}
-
-		list_for_each_entry(annot, &fn->annots, node) {
-			tag_type_id = btf_encoder__add_decl_tag(encoder, annot->value, btf_fn_id, annot->component_idx);
-			if (tag_type_id < 0) {
-				fprintf(stderr, "error: failed to encode tag '%s' to func %s with component_idx %d\n",
-					annot->value, name, annot->component_idx);
-				goto out;
-			}
-		}
 	}
 
 	if (!encoder->skip_encoding_vars)
-		err = btf_encoder__encode_cu_variables(encoder, type_id_off);
+		err = btf_encoder__encode_cu_variables(encoder);
 out:
 	encoder->cu = NULL;
 	return err;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH dwarves 3/5] btf_encoder: child encoders should have a reference to parent encoder
  2023-01-24 13:45 [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Alan Maguire
  2023-01-24 13:45 ` [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters Alan Maguire
  2023-01-24 13:45 ` [PATCH dwarves 2/5] btf_encoder: refactor function addition into dedicated btf_encoder__add_func Alan Maguire
@ 2023-01-24 13:45 ` Alan Maguire
  2023-01-24 13:45 ` [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF Alan Maguire
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Alan Maguire @ 2023-01-24 13:45 UTC (permalink / raw)
  To: acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf, Alan Maguire

The parent encoder will be used to share details of local functions
such that we can mark those which have optimized parameters in
any CU.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 btf_encoder.c | 4 +++-
 btf_encoder.h | 2 +-
 pahole.c      | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 15a042c..449439f 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -49,6 +49,7 @@ struct var_info {
  */
 struct btf_encoder {
 	struct list_head  node;
+	struct btf_encoder *parent;
 	struct btf        *btf;
 	struct cu         *cu;
 	struct gobuffer   percpu_secinfo;
@@ -1433,11 +1434,12 @@ out:
 	return err;
 }
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose)
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, struct btf_encoder *parent, bool verbose)
 {
 	struct btf_encoder *encoder = zalloc(sizeof(*encoder));
 
 	if (encoder) {
+		encoder->parent = parent;
 		encoder->raw_output = detached_filename != NULL;
 		encoder->filename = strdup(encoder->raw_output ? detached_filename : cu->filename);
 		if (encoder->filename == NULL)
diff --git a/btf_encoder.h b/btf_encoder.h
index a65120c..fd5a97f 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -16,7 +16,7 @@ struct btf;
 struct cu;
 struct list_head;
 
-struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, bool verbose);
+struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filename, struct btf *base_btf, bool skip_encoding_vars, bool force, bool gen_floats, struct btf_encoder *parent, bool verbose);
 void btf_encoder__delete(struct btf_encoder *encoder);
 
 int btf_encoder__encode(struct btf_encoder *encoder);
diff --git a/pahole.c b/pahole.c
index 6f4f87c..844d502 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3037,7 +3037,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			 * create it.
 			 */
 			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
-						       btf_encode_force, btf_gen_floats, global_verbose);
+						       btf_encode_force, btf_gen_floats, NULL, global_verbose);
 			if (btf_encoder && thr_data) {
 				struct thread_data *thread = thr_data;
 
@@ -3069,6 +3069,7 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 							 skip_encoding_btf_vars,
 							 btf_encode_force,
 							 btf_gen_floats,
+							 btf_encoder,
 							 global_verbose);
 				thread->btf = btf_encoder__btf(thread->encoder);
 			}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF
  2023-01-24 13:45 [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Alan Maguire
                   ` (2 preceding siblings ...)
  2023-01-24 13:45 ` [PATCH dwarves 3/5] btf_encoder: child encoders should have a reference to parent encoder Alan Maguire
@ 2023-01-24 13:45 ` Alan Maguire
  2023-01-25 17:54   ` Kui-Feng Lee
  2023-01-24 13:45 ` [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes Alan Maguire
  2023-01-24 15:14 ` [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Jiri Olsa
  5 siblings, 1 reply; 27+ messages in thread
From: Alan Maguire @ 2023-01-24 13:45 UTC (permalink / raw)
  To: acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf, Alan Maguire

At gcc optimization level O2 or higher, many function optimizations
occur such as

- constant propagation (.constprop.0);
- interprocedural scalar replacement of aggregates, removal of
  unused parameters and replacement of parameters passed by
  reference by parameters passed by value (.isra.0)

See [1] for details.

Currently BTF encoding does not handle such optimized functions
that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
This is safer because such suffixes can often indicate parameters have
been optimized out.  Since we can now spot this, support matching
to a "." suffix and represent the function in BTF if it does not
have optimized-out parameters.  First an attempt to match by
exact name is made; if that fails we fall back to checking
for a "."-suffixed name.  The BTF representation will use the
original function name "foo" not "foo.isra.0" for consistency
with DWARF representation.

There is a complication however, and this arises because we process
each CU separately and merge BTF when complete.  Different CUs
may optimize differently, so in one CU, a function may have
optimized-out parameters - and thus be ineligible for BTF -
while in another it does not have optimized-out parameters -
making it eligible for BTF.  The NF_HOOK function is an
example of this.

The only workable solution I could find without disrupting
BTF generation parallelism is to create a shared representation
of such "."-suffixed functions in the main BTF encoder.  A
binary tree is used to store function representations.  Instead
of directly adding a static function with a "." suffix, we
postpone their addition until we have processed all CUs.
At that point - since we have merged any observations of
optimized-out parameters for each function - we know if
the candidate function is safe for BTF addition or not.

[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 btf_encoder.c | 195 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 pahole.c      |   4 +-
 2 files changed, 191 insertions(+), 8 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 449439f..a86b099 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -21,6 +21,8 @@
 #include <stdlib.h> /* for qsort() and bsearch() */
 #include <inttypes.h>
 #include <limits.h>
+#include <search.h> /* for tsearch(), tfind() and tdstroy() */
+#include <pthread.h>
 
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -34,6 +36,7 @@
 struct elf_function {
 	const char	*name;
 	bool		 generated;
+	size_t		prefixlen;
 };
 
 #define MAX_PERCPU_VAR_CNT 4096
@@ -57,6 +60,9 @@ struct btf_encoder {
 	struct elf_symtab *symtab;
 	uint32_t	  type_id_off;
 	uint32_t	  unspecified_type;
+	void		  *saved_func_tree;
+	int		  saved_func_cnt;
+	pthread_mutex_t	  saved_func_lock;
 	bool		  has_index_type,
 			  need_index_type,
 			  skip_encoding_vars,
@@ -77,9 +83,12 @@ struct btf_encoder {
 		struct elf_function *entries;
 		int		    allocated;
 		int		    cnt;
+		int		    suffix_cnt; /* number of .isra, .part etc */
 	} functions;
 };
 
+static void btf_encoder__add_saved_funcs(struct btf_encoder *encoder);
+
 void btf_encoders__add(struct list_head *encoders, struct btf_encoder *encoder)
 {
 	list_add_tail(&encoder->node, encoders);
@@ -698,6 +707,11 @@ int32_t btf_encoder__add_encoder(struct btf_encoder *encoder, struct btf_encoder
 			return id;
 	}
 
+	/* for multi-threaded case, saved functions are added to each encoder's
+	 * BTF, prior to it being merged with the parent encoder.
+	 */
+	btf_encoder__add_saved_funcs(encoder);
+
 	return btf__add_btf(encoder->btf, other->btf);
 }
 
@@ -765,6 +779,76 @@ static int32_t btf_encoder__add_decl_tag(struct btf_encoder *encoder, const char
 	return id;
 }
 
+static int function__compare(const void *a, const void *b)
+{
+	struct function *fa = (struct function *)a, *fb = (struct function *)b;
+
+	return strcmp(function__name(fa), function__name(fb));
+}
+
+struct btf_encoder_state {
+	struct btf_encoder *encoder;
+	uint32_t type_id_off;
+};
+
+/*
+ * static functions with suffixes are not added yet - we need to
+ * observe across all CUs to see if the static function has
+ * optimized parameters in any CU, since in such a case it should
+ * not be included in the final BTF.  NF_HOOK.constprop.0() is
+ * a case in point - it has optimized-out parameters in some CUs
+ * but not others.  In order to have consistency (since we do not
+ * know which instance the BTF-specified function signature will
+ * apply to), we simply skip adding functions which have optimized
+ * out parameters anywhere.
+ */
+static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
+{
+	struct btf_encoder *parent = encoder->parent ? encoder->parent : encoder;
+	const char *name = function__name(fn);
+	struct function **nodep;
+	int ret = 0;
+
+	pthread_mutex_lock(&parent->saved_func_lock);
+	nodep = tsearch(fn, &parent->saved_func_tree, function__compare);
+	if (nodep == NULL) {
+		fprintf(stderr, "error: out of memory adding local function '%s'\n",
+			name);
+		ret = -1;
+		goto out;
+	}
+	/* If we find an existing entry, we want to merge observations
+	 * across both functions, checking that the "seen optimized-out
+	 * parameters" status is reflected in our tree entry.
+	 * If the entry is new, record encoder state required
+	 * to add the local function later (encoder + type_id_off)
+	 * such that we can add the function later.
+	 */
+	if (*nodep != fn) {
+		(*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
+	} else {
+		struct btf_encoder_state *state = zalloc(sizeof(*state));
+
+		if (state == NULL) {
+			fprintf(stderr, "error: out of memory adding local function '%s'\n",
+				name);
+			ret = -1;
+			goto out;
+		}
+		state->encoder = encoder;
+		state->type_id_off = encoder->type_id_off;
+		fn->priv = state;
+		encoder->saved_func_cnt++;
+		if (encoder->verbose)
+			printf("added local function '%s'%s\n", name,
+			       fn->proto.optimized_parms ?
+			       ", optimized-out params" : "");
+	}
+out:
+	pthread_mutex_unlock(&parent->saved_func_lock);
+	return ret;
+}
+
 static int32_t btf_encoder__add_func(struct btf_encoder *encoder, struct function *fn)
 {
 	int btf_fnproto_id, btf_fn_id, tag_type_id;
@@ -789,6 +873,55 @@ static int32_t btf_encoder__add_func(struct btf_encoder *encoder, struct functio
 	return 0;
 }
 
+/* visit each node once, adding associated function */
+static void btf_encoder__add_saved_func(const void *nodep, const VISIT which,
+					const int depth __maybe_unused)
+{
+	struct btf_encoder_state *state;
+	struct btf_encoder *encoder;
+	struct function *fn = NULL;
+
+	switch (which) {
+	case preorder:
+	case endorder:
+		break;
+	case postorder:
+	case leaf:
+		fn = *((struct function **)nodep);
+		break;
+	}
+	if (!fn || !fn->priv)
+		return;
+	state = (struct btf_encoder_state *)fn->priv;
+	encoder = state->encoder;
+	encoder->type_id_off = state->type_id_off;
+	/* we can safely free encoder state since we visit each node once */
+	free(fn->priv);
+	fn->priv = NULL;
+	if (fn->proto.optimized_parms) {
+		if (encoder->verbose)
+			printf("skipping addition of '%s' due to optimized-out parameters\n",
+			       function__name(fn));
+	} else {
+		btf_encoder__add_func(encoder, fn);
+	}
+}
+
+void saved_func__free(void *nodep __maybe_unused)
+{
+	/* nothing to do, functions are freed when the cu is. */
+}
+
+static void btf_encoder__add_saved_funcs(struct btf_encoder *encoder)
+{
+	if (!encoder->parent && encoder->saved_func_tree) {
+		encoder->type_id_off = 0;
+		twalk(encoder->saved_func_tree, btf_encoder__add_saved_func);
+		tdestroy(encoder->saved_func_tree, saved_func__free);
+		encoder->saved_func_tree = NULL;
+	}
+}
+
 /*
  * This corresponds to the same macro defined in
  * include/linux/kallsyms.h
@@ -800,6 +933,12 @@ static int functions_cmp(const void *_a, const void *_b)
 	const struct elf_function *a = _a;
 	const struct elf_function *b = _b;
 
+	/* if search key allows prefix match, verify target has matching
+	 * prefix len and prefix matches.
+	 */
+	if (a->prefixlen && a->prefixlen == b->prefixlen)
+		return strncmp(a->name, b->name, b->prefixlen);
+
 	return strcmp(a->name, b->name);
 }
 
@@ -832,14 +971,21 @@ static int btf_encoder__collect_function(struct btf_encoder *encoder, GElf_Sym *
 	}
 
 	encoder->functions.entries[encoder->functions.cnt].name = name;
+	if (strchr(name, '.')) {
+		const char *suffix = strchr(name, '.');
+
+		encoder->functions.suffix_cnt++;
+		encoder->functions.entries[encoder->functions.cnt].prefixlen = suffix - name;
+	}
 	encoder->functions.entries[encoder->functions.cnt].generated = false;
 	encoder->functions.cnt++;
 	return 0;
 }
 
-static struct elf_function *btf_encoder__find_function(const struct btf_encoder *encoder, const char *name)
+static struct elf_function *btf_encoder__find_function(const struct btf_encoder *encoder, const char *name,
+						       size_t prefixlen)
 {
-	struct elf_function key = { .name = name };
+	struct elf_function key = { .name = name, .prefixlen = prefixlen };
 
 	return bsearch(&key, encoder->functions.entries, encoder->functions.cnt, sizeof(key), functions_cmp);
 }
@@ -1171,6 +1317,9 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 	if (gobuffer__size(&encoder->percpu_secinfo) != 0)
 		btf_encoder__add_datasec(encoder, PERCPU_SECTION);
 
+	/* for single-threaded case, saved functions are added here */
+	btf_encoder__add_saved_funcs(encoder);
+
 	/* Empty file, nothing to do, so... done! */
 	if (btf__type_cnt(encoder->btf) == 1)
 		return 0;
@@ -1456,6 +1605,7 @@ struct btf_encoder *btf_encoder__new(struct cu *cu, const char *detached_filenam
 		encoder->has_index_type  = false;
 		encoder->need_index_type = false;
 		encoder->array_index_id  = 0;
+		pthread_mutex_init(&encoder->saved_func_lock, NULL);
 
 		GElf_Ehdr ehdr;
 
@@ -1614,6 +1764,7 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 	}
 
 	cu__for_each_function(cu, core_id, fn) {
+		bool save = false;
 
 		/*
 		 * Skip functions that:
@@ -1634,22 +1785,54 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 			if (!name)
 				continue;
 
-			func = btf_encoder__find_function(encoder, name);
-			if (!func || func->generated)
+			/* prefer exact name function match... */
+			func = btf_encoder__find_function(encoder, name, 0);
+			if (func) {
+				if (func->generated)
+					continue;
+				func->generated = true;
+			} else if (encoder->functions.suffix_cnt) {
+				/* falling back to name.isra.0 match if no exact
+				 * match is found; only bother if we found any
+				 * .suffix function names.  The function
+				 * will be saved and added once we ensure
+				 * it does not have optimized-out parameters
+				 * in any cu.
+				 */
+				func = btf_encoder__find_function(encoder, name,
+								  strlen(name));
+				if (func) {
+					save = true;
+					if (encoder->verbose)
+						printf("matched function '%s' with '%s'%s\n",
+						       name, func->name,
+						       fn->proto.optimized_parms ?
+						       ", has optimized-out parameters" : "");
+				}
+			}
+			if (!func)
 				continue;
-			func->generated = true;
 		} else {
 			if (!fn->external)
 				continue;
 		}
 
-		err = btf_encoder__add_func(encoder, fn);
+		if (save)
+			err = btf_encoder__save_func(encoder, fn);
+		else
+			err = btf_encoder__add_func(encoder, fn);
 		if (err)
 			goto out;
 	}
 
 	if (!encoder->skip_encoding_vars)
 		err = btf_encoder__encode_cu_variables(encoder);
+
+	/* It is only safe to delete this CU if we have not stashed any local
+	 * functions for later addition.
+	 */
+	if (!err)
+		err = encoder->saved_func_cnt > 0 ? LSK__KEEPIT : LSK__DELETE;
 out:
 	encoder->cu = NULL;
 	return err;
diff --git a/pahole.c b/pahole.c
index 844d502..62d54ec 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3078,11 +3078,11 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			encoder = btf_encoder;
 		}
 
-		if (btf_encoder__encode_cu(encoder, cu, conf_load)) {
+		ret = btf_encoder__encode_cu(encoder, cu, conf_load);
+		if (ret < 0) {
 			fprintf(stderr, "Encountered error while encoding BTF.\n");
 			exit(1);
 		}
-		ret = LSK__DELETE;
 out_btf:
 		return ret;
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes
  2023-01-24 13:45 [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Alan Maguire
                   ` (3 preceding siblings ...)
  2023-01-24 13:45 ` [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF Alan Maguire
@ 2023-01-24 13:45 ` Alan Maguire
  2023-01-25 13:39   ` Jiri Olsa
  2023-01-25 16:53   ` Jiri Olsa
  2023-01-24 15:14 ` [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Jiri Olsa
  5 siblings, 2 replies; 27+ messages in thread
From: Alan Maguire @ 2023-01-24 13:45 UTC (permalink / raw)
  To: acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf, Alan Maguire

Two static functions in different CUs can share the same name but
have different function prototypes.  In such a case BTF has no
way currently to map which function description matches which
function site, so it is safer to eliminate such duplicates by
using the same global matching procedure used to identify
functions with optimized-out parameters.  Protoype checking
is done by comparing parameter numbers and then names.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 btf_encoder.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++----------
 dwarves.h     |  1 +
 2 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index a86b099..e0acd29 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -786,21 +786,39 @@ static int function__compare(const void *a, const void *b)
 	return strcmp(function__name(fa), function__name(fb));
 }
 
+#define BTF_ENCODER_MAX_PARAMETERS	10
+
 struct btf_encoder_state {
 	struct btf_encoder *encoder;
 	uint32_t type_id_off;
+	bool got_parameter_names;
+	const char *parameter_names[BTF_ENCODER_MAX_PARAMETERS];
 };
 
+void parameter_names__get(struct ftype *ftype, size_t nr_parameters, const char **parameter_names)
+{
+	struct parameter *parameter;
+	int i = 0;
+
+	ftype__for_each_parameter(ftype, parameter) {
+		if (i >= nr_parameters)
+			break;
+		parameter_names[i++] = parameter__name(parameter);
+	}
+}
+
 /*
- * static functions with suffixes are not added yet - we need to
- * observe across all CUs to see if the static function has
- * optimized parameters in any CU, since in such a case it should
- * not be included in the final BTF.  NF_HOOK.constprop.0() is
- * a case in point - it has optimized-out parameters in some CUs
- * but not others.  In order to have consistency (since we do not
- * know which instance the BTF-specified function signature will
- * apply to), we simply skip adding functions which have optimized
- * out parameters anywhere.
+ * static functions are not added yet - we need to observe across
+ * all CUs to see if the static function has optimized-out parameters
+ * in any CU, or inconsistent function prototypes; in either case,
+ * it should not be included in the final BTF.  For the optimized-out
+ * case, NF_HOOK.constprop.0() is a case in point - it has optimized-out
+ * parameters in some CUs but not others.  Similarly, a static function
+ * like enable_store() has inconsistent function prototypes in different
+ * CUs so should not be present.  In order to have consistency (since
+ * we do not know which instance the BTF-specified function signature
+ * will apply to), we simply skip adding functions which have
+ * optimized-out parameters/inconsistent function prototypes anywhere.
  */
 static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
 {
@@ -819,13 +837,51 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
 	}
 	/* If we find an existing entry, we want to merge observations
 	 * across both functions, checking that the "seen optimized-out
-	 * parameters" status is reflected in our tree entry.
+	 * parameters"/inconsistent proto status is reflected in tree entry.
 	 * If the entry is new, record encoder state required
 	 * to add the local function later (encoder + type_id_off)
-	 * such that we can add the function later.
+	 * such that we can add the function later.  Parameter names are
+	 * also stored in state to speed up multiple static function
+	 * comparisons.
 	 */
 	if (*nodep != fn) {
-		(*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
+		struct function *ofn = *nodep;
+
+		ofn->proto.optimized_parms |= fn->proto.optimized_parms;
+		/* compare parameters to see if signatures match */
+
+		if (ofn->proto.inconsistent_proto)
+			goto out;
+
+		if (ofn->proto.nr_parms != fn->proto.nr_parms) {
+			ofn->proto.inconsistent_proto = 1;
+			goto out;
+		}
+		if (ofn->proto.nr_parms > 0) {
+			struct btf_encoder_state *state = ofn->priv;
+			const char *parameter_names[BTF_ENCODER_MAX_PARAMETERS];
+			int i;
+
+			if (!state->got_parameter_names) {
+				parameter_names__get(&ofn->proto, BTF_ENCODER_MAX_PARAMETERS,
+						     state->parameter_names);
+				state->got_parameter_names = true;
+			}
+			parameter_names__get(&fn->proto, BTF_ENCODER_MAX_PARAMETERS,
+					     parameter_names);
+			for (i = 0; i < ofn->proto.nr_parms; i++) {
+				if (!state->parameter_names[i]) {
+					if (!parameter_names[i])
+						continue;
+				} else if (parameter_names[i]) {
+					if (strcmp(state->parameter_names[i],
+						   parameter_names[i]) == 0)
+						continue;
+				}
+				ofn->proto.inconsistent_proto = 1;
+				goto out;
+			}
+		}
 	} else {
 		struct btf_encoder_state *state = zalloc(sizeof(*state));
 
@@ -898,10 +954,12 @@ static void btf_encoder__add_saved_func(const void *nodep, const VISIT which,
 	/* we can safely free encoder state since we visit each node once */
 	free(fn->priv);
 	fn->priv = NULL;
-	if (fn->proto.optimized_parms) {
+	if (fn->proto.optimized_parms || fn->proto.inconsistent_proto) {
 		if (encoder->verbose)
-			printf("skipping addition of '%s' due to optimized-out parameters\n",
-			       function__name(fn));
+			printf("skipping addition of '%s' due to %s\n",
+			       function__name(fn),
+			       fn->proto.optimized_parms ? "optimized-out parameters" :
+							   "multiple inconsistent function prototypes");
 	} else {
 		btf_encoder__add_func(encoder, fn);
 	}
@@ -1775,6 +1833,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 		 */
 		if (fn->declaration)
 			continue;
+		if (!fn->external)
+			save = true;
 		if (!ftype__has_arg_names(&fn->proto))
 			continue;
 		if (encoder->functions.cnt) {
@@ -1790,7 +1850,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
 			if (func) {
 				if (func->generated)
 					continue;
-				func->generated = true;
+				if (!save)
+					func->generated = true;
 			} else if (encoder->functions.suffix_cnt) {
 				/* falling back to name.isra.0 match if no exact
 				 * match is found; only bother if we found any
diff --git a/dwarves.h b/dwarves.h
index 1ad1b3b..9b80262 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -830,6 +830,7 @@ struct ftype {
 	uint16_t	 nr_parms;
 	uint8_t		 unspec_parms:1; /* just one bit is needed */
 	uint8_t		 optimized_parms:1;
+	uint8_t		 inconsistent_proto:1;
 };
 
 static inline struct ftype *tag__ftype(const struct tag *tag)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions
  2023-01-24 13:45 [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Alan Maguire
                   ` (4 preceding siblings ...)
  2023-01-24 13:45 ` [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes Alan Maguire
@ 2023-01-24 15:14 ` Jiri Olsa
  2023-01-24 16:11   ` Alan Maguire
  5 siblings, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2023-01-24 15:14 UTC (permalink / raw)
  To: Alan Maguire
  Cc: acme, yhs, ast, olsajiri, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On Tue, Jan 24, 2023 at 01:45:26PM +0000, Alan Maguire wrote:
> At optimization level -O2 or higher in gcc, static functions may be
> optimized such that they have suffixes like .isra.0, .constprop.0 etc.
> These represent
> 
> - constant propagation (.constprop.0);
> - interprocedural scalar replacement of aggregates, removal of
>   unused parameters and replacement of parameters passed by
>   reference by parameters passed by value (.isra.0)
> 
> See [1] for details.
> 
> Currently BTF encoding does not handle such optimized functions
> that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
> This is safer because such suffixes can often indicate parameters have
> been optimized out.  This series addresses this by matching a
> function to a suffixed version ("foo" matching "foo.isra.0") while
> ensuring that the function signature does not contain optimized-out
> parameters.  Note that if the function is found ("foo") it will
> be preferred, only falling back to "foo.isra.0" if lookup of the
> function fails.  Addition to BTF is skipped if the function has
> optimized-out parameters, since the expected function signature
> will not match. BTF encoding does not include the "."-suffix to
> be consistent with DWARF. In addition, the kernel currently does
> not allow a "." suffix in a BTF function name.
> 
> A problem with this approach however is that BTF carries out the
> encoding process in parallel across multiple CUs, and sometimes
> a function has optimized-out parameters in one CU but not others;
> we see this for NF_HOOK.constprop.0 for example.  So in order to
> determine if the function has optimized-out parameters in any
> CU, its addition is not carried out until we have processed all
> CUs and are about to merge BTF.  At this point we know if any
> such optimizations have occurred.  Patches 1-4 handle the
> optimized-out parameter identification and matching "."-suffixed
> functions with the original function to facilitate BTF
> encoding.
> 
> Patch 5 addresses a related problem - it is entirely possible
> for a static function of the same name to exist in different
> CUs with different function signatures.  Because BTF does not
> currently encode any information that would help disambiguate
> which BTF function specification matches which static function
> (in the case of multiple different function signatures), it is
> best to eliminate such functions from BTF for now.  The same
> mechanism that is used to compare static "."-suffixed functions
> is re-used for the static function comparison.  A superficial
> comparison of number of parameters/parameter names is done to
> see if such representations are consistent, and if inconsistent
> prototypes are observed, the function is flagged for exclusion
> from BTF.

should we remove all instances of static functions with same name?

Yonghong suggested in the other thread, that user will assume that
the function he's attached to is the one he expects, while he will
be attached to any (first) match we get from kallsyms_lookup_name

thanks,
jirka

> 
> When these methods are combined - the additive encoding of
> "."-suffixed functions and the subtractive elimination of
> functions with inconsistent parameters - we see a small overall
> increase in the number of functions in vmlinux BTF, from
> 49538 to 50083.  It turns out that the inconsistent prototype
> checking will actually eliminate some of the suffix matches
> also, for cases where the DWARF representation of a function
> differs across CUs, but not via the abstract origin late DWARF
> references showing optimized-out parameters that we check
> for in patch 1.
> 
> [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> 
> Alan Maguire (5):
>   dwarves: help dwarf loader spot functions with optimized-out
>     parameters
>   btf_encoder: refactor function addition into dedicated
>     btf_encoder__add_func
>   btf_encoder: child encoders should have a reference to parent encoder
>   btf_encoder: represent "."-suffixed optimized functions (".isra.0") in
>     BTF
>   btf_encoder: skip BTF encoding of static functions with inconsistent
>     prototypes
> 
>  btf_encoder.c  | 357 +++++++++++++++++++++++++++++++++++++++++++++++++--------
>  btf_encoder.h  |   2 +-
>  dwarf_loader.c |  76 +++++++++++-
>  dwarves.h      |   5 +-
>  pahole.c       |   7 +-
>  5 files changed, 390 insertions(+), 57 deletions(-)
> 
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions
  2023-01-24 15:14 ` [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Jiri Olsa
@ 2023-01-24 16:11   ` Alan Maguire
  2023-01-25 13:59     ` Jiri Olsa
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Maguire @ 2023-01-24 16:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, yhs, ast, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On 24/01/2023 15:14, Jiri Olsa wrote:
> On Tue, Jan 24, 2023 at 01:45:26PM +0000, Alan Maguire wrote:
>> At optimization level -O2 or higher in gcc, static functions may be
>> optimized such that they have suffixes like .isra.0, .constprop.0 etc.
>> These represent
>>
>> - constant propagation (.constprop.0);
>> - interprocedural scalar replacement of aggregates, removal of
>>   unused parameters and replacement of parameters passed by
>>   reference by parameters passed by value (.isra.0)
>>
>> See [1] for details.
>>
>> Currently BTF encoding does not handle such optimized functions
>> that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
>> This is safer because such suffixes can often indicate parameters have
>> been optimized out.  This series addresses this by matching a
>> function to a suffixed version ("foo" matching "foo.isra.0") while
>> ensuring that the function signature does not contain optimized-out
>> parameters.  Note that if the function is found ("foo") it will
>> be preferred, only falling back to "foo.isra.0" if lookup of the
>> function fails.  Addition to BTF is skipped if the function has
>> optimized-out parameters, since the expected function signature
>> will not match. BTF encoding does not include the "."-suffix to
>> be consistent with DWARF. In addition, the kernel currently does
>> not allow a "." suffix in a BTF function name.
>>
>> A problem with this approach however is that BTF carries out the
>> encoding process in parallel across multiple CUs, and sometimes
>> a function has optimized-out parameters in one CU but not others;
>> we see this for NF_HOOK.constprop.0 for example.  So in order to
>> determine if the function has optimized-out parameters in any
>> CU, its addition is not carried out until we have processed all
>> CUs and are about to merge BTF.  At this point we know if any
>> such optimizations have occurred.  Patches 1-4 handle the
>> optimized-out parameter identification and matching "."-suffixed
>> functions with the original function to facilitate BTF
>> encoding.
>>
>> Patch 5 addresses a related problem - it is entirely possible
>> for a static function of the same name to exist in different
>> CUs with different function signatures.  Because BTF does not
>> currently encode any information that would help disambiguate
>> which BTF function specification matches which static function
>> (in the case of multiple different function signatures), it is
>> best to eliminate such functions from BTF for now.  The same
>> mechanism that is used to compare static "."-suffixed functions
>> is re-used for the static function comparison.  A superficial
>> comparison of number of parameters/parameter names is done to
>> see if such representations are consistent, and if inconsistent
>> prototypes are observed, the function is flagged for exclusion
>> from BTF.
> 
> should we remove all instances of static functions with same name?
> 
> Yonghong suggested in the other thread, that user will assume that
> the function he's attached to is the one he expects, while he will
> be attached to any (first) match we get from kallsyms_lookup_name
>

The problem is that many static functions that have consistent
prototypes are encountered multiple times when processing CUs,
even though some have only one System.map/kallsyms entry. I tweaked patch 5 
to count how many times a function was encountered when compiling the
tree of static functions. It turns out of the 25872 functions, 22608 
are found once, leaving 3264 that are found multiple times. This would
be a lot of functions to leave out I think, especially since many
don't actually have multiple kallsyms entries to deal with and
the prototype is consistent.

BTW there's a bug in patch 5 that can cause a segmentation fault,
apologies about this. I don't want to send a v2 yet as folks
haven't had a chance to look at it properly but the fix is below.
Thanks!

Alan

diff --git a/btf_encoder.c b/btf_encoder.c
index ee0b032..e89c1a8 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -786,7 +786,7 @@ static int function__compare(const void *a, const void *b)
        return strcmp(function__name(fa), function__name(fb));
 }
 
-#define BTF_ENCODER_MAX_PARAMETERS     10
+#define BTF_ENCODER_MAX_PARAMETERS     12
 
 struct btf_encoder_state {
        struct btf_encoder *encoder;
@@ -869,7 +869,7 @@ static int32_t btf_encoder__save_func(struct btf_encoder *en
                        }
                        parameter_names__get(&fn->proto, BTF_ENCODER_MAX_PARAMET
                                             parameter_names);
-                       for (i = 0; i < ofn->proto.nr_parms; i++) {
+                       for (i = 0; i < ofn->proto.nr_parms && i < BTF_ENCODER_M
                                if (!state->parameter_names[i]) {
                                        if (!parameter_names[i])
                                                continue;

> thanks,
> jirka
> 
>>
>> When these methods are combined - the additive encoding of
>> "."-suffixed functions and the subtractive elimination of
>> functions with inconsistent parameters - we see a small overall
>> increase in the number of functions in vmlinux BTF, from
>> 49538 to 50083.  It turns out that the inconsistent prototype
>> checking will actually eliminate some of the suffix matches
>> also, for cases where the DWARF representation of a function
>> differs across CUs, but not via the abstract origin late DWARF
>> references showing optimized-out parameters that we check
>> for in patch 1.
>>
>> [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>>
>> Alan Maguire (5):
>>   dwarves: help dwarf loader spot functions with optimized-out
>>     parameters
>>   btf_encoder: refactor function addition into dedicated
>>     btf_encoder__add_func
>>   btf_encoder: child encoders should have a reference to parent encoder
>>   btf_encoder: represent "."-suffixed optimized functions (".isra.0") in
>>     BTF
>>   btf_encoder: skip BTF encoding of static functions with inconsistent
>>     prototypes
>>
>>  btf_encoder.c  | 357 +++++++++++++++++++++++++++++++++++++++++++++++++--------
>>  btf_encoder.h  |   2 +-
>>  dwarf_loader.c |  76 +++++++++++-
>>  dwarves.h      |   5 +-
>>  pahole.c       |   7 +-
>>  5 files changed, 390 insertions(+), 57 deletions(-)
>>
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes
  2023-01-24 13:45 ` [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes Alan Maguire
@ 2023-01-25 13:39   ` Jiri Olsa
  2023-01-25 14:18     ` Alan Maguire
  2023-01-25 16:53   ` Jiri Olsa
  1 sibling, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2023-01-25 13:39 UTC (permalink / raw)
  To: Alan Maguire
  Cc: acme, yhs, ast, olsajiri, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On Tue, Jan 24, 2023 at 01:45:31PM +0000, Alan Maguire wrote:

SNIP

>  static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
>  {
> @@ -819,13 +837,51 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
>  	}
>  	/* If we find an existing entry, we want to merge observations
>  	 * across both functions, checking that the "seen optimized-out
> -	 * parameters" status is reflected in our tree entry.
> +	 * parameters"/inconsistent proto status is reflected in tree entry.
>  	 * If the entry is new, record encoder state required
>  	 * to add the local function later (encoder + type_id_off)
> -	 * such that we can add the function later.
> +	 * such that we can add the function later.  Parameter names are
> +	 * also stored in state to speed up multiple static function
> +	 * comparisons.
>  	 */
>  	if (*nodep != fn) {
> -		(*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
> +		struct function *ofn = *nodep;
> +
> +		ofn->proto.optimized_parms |= fn->proto.optimized_parms;
> +		/* compare parameters to see if signatures match */
> +
> +		if (ofn->proto.inconsistent_proto)
> +			goto out;
> +
> +		if (ofn->proto.nr_parms != fn->proto.nr_parms) {
> +			ofn->proto.inconsistent_proto = 1;
> +			goto out;
> +		}
> +		if (ofn->proto.nr_parms > 0) {
> +			struct btf_encoder_state *state = ofn->priv;
> +			const char *parameter_names[BTF_ENCODER_MAX_PARAMETERS];
> +			int i;
> +
> +			if (!state->got_parameter_names) {
> +				parameter_names__get(&ofn->proto, BTF_ENCODER_MAX_PARAMETERS,
> +						     state->parameter_names);
> +				state->got_parameter_names = true;
> +			}
> +			parameter_names__get(&fn->proto, BTF_ENCODER_MAX_PARAMETERS,
> +					     parameter_names);
> +			for (i = 0; i < ofn->proto.nr_parms; i++) {
> +				if (!state->parameter_names[i]) {
> +					if (!parameter_names[i])
> +						continue;
> +				} else if (parameter_names[i]) {
> +					if (strcmp(state->parameter_names[i],
> +						   parameter_names[i]) == 0)
> +						continue;

I guess we can't check type easily? tag has type field,
but I'm not sure if we can get reasonable type info from that

jirka

> +				}
> +				ofn->proto.inconsistent_proto = 1;
> +				goto out;
> +			}
> +		}
>  	} else {
>  		struct btf_encoder_state *state = zalloc(sizeof(*state));
>  
> @@ -898,10 +954,12 @@ static void btf_encoder__add_saved_func(const void *nodep, const VISIT which,
>  	/* we can safely free encoder state since we visit each node once */
>  	free(fn->priv);
>  	fn->priv = NULL;
> -	if (fn->proto.optimized_parms) {
> +	if (fn->proto.optimized_parms || fn->proto.inconsistent_proto) {
>  		if (encoder->verbose)
> -			printf("skipping addition of '%s' due to optimized-out parameters\n",
> -			       function__name(fn));
> +			printf("skipping addition of '%s' due to %s\n",
> +			       function__name(fn),
> +			       fn->proto.optimized_parms ? "optimized-out parameters" :
> +							   "multiple inconsistent function prototypes");
>  	} else {
>  		btf_encoder__add_func(encoder, fn);
>  	}
> @@ -1775,6 +1833,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  		 */
>  		if (fn->declaration)
>  			continue;
> +		if (!fn->external)
> +			save = true;
>  		if (!ftype__has_arg_names(&fn->proto))
>  			continue;
>  		if (encoder->functions.cnt) {
> @@ -1790,7 +1850,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  			if (func) {
>  				if (func->generated)
>  					continue;
> -				func->generated = true;
> +				if (!save)
> +					func->generated = true;
>  			} else if (encoder->functions.suffix_cnt) {
>  				/* falling back to name.isra.0 match if no exact
>  				 * match is found; only bother if we found any
> diff --git a/dwarves.h b/dwarves.h
> index 1ad1b3b..9b80262 100644
> --- a/dwarves.h
> +++ b/dwarves.h
> @@ -830,6 +830,7 @@ struct ftype {
>  	uint16_t	 nr_parms;
>  	uint8_t		 unspec_parms:1; /* just one bit is needed */
>  	uint8_t		 optimized_parms:1;
> +	uint8_t		 inconsistent_proto:1;
>  };
>  
>  static inline struct ftype *tag__ftype(const struct tag *tag)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions
  2023-01-24 16:11   ` Alan Maguire
@ 2023-01-25 13:59     ` Jiri Olsa
  0 siblings, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2023-01-25 13:59 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Jiri Olsa, acme, yhs, ast, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On Tue, Jan 24, 2023 at 04:11:56PM +0000, Alan Maguire wrote:
> On 24/01/2023 15:14, Jiri Olsa wrote:
> > On Tue, Jan 24, 2023 at 01:45:26PM +0000, Alan Maguire wrote:
> >> At optimization level -O2 or higher in gcc, static functions may be
> >> optimized such that they have suffixes like .isra.0, .constprop.0 etc.
> >> These represent
> >>
> >> - constant propagation (.constprop.0);
> >> - interprocedural scalar replacement of aggregates, removal of
> >>   unused parameters and replacement of parameters passed by
> >>   reference by parameters passed by value (.isra.0)
> >>
> >> See [1] for details.
> >>
> >> Currently BTF encoding does not handle such optimized functions
> >> that get renamed with a "." suffix such as ".isra.0", ".constprop.0".
> >> This is safer because such suffixes can often indicate parameters have
> >> been optimized out.  This series addresses this by matching a
> >> function to a suffixed version ("foo" matching "foo.isra.0") while
> >> ensuring that the function signature does not contain optimized-out
> >> parameters.  Note that if the function is found ("foo") it will
> >> be preferred, only falling back to "foo.isra.0" if lookup of the
> >> function fails.  Addition to BTF is skipped if the function has
> >> optimized-out parameters, since the expected function signature
> >> will not match. BTF encoding does not include the "."-suffix to
> >> be consistent with DWARF. In addition, the kernel currently does
> >> not allow a "." suffix in a BTF function name.
> >>
> >> A problem with this approach however is that BTF carries out the
> >> encoding process in parallel across multiple CUs, and sometimes
> >> a function has optimized-out parameters in one CU but not others;
> >> we see this for NF_HOOK.constprop.0 for example.  So in order to
> >> determine if the function has optimized-out parameters in any
> >> CU, its addition is not carried out until we have processed all
> >> CUs and are about to merge BTF.  At this point we know if any
> >> such optimizations have occurred.  Patches 1-4 handle the
> >> optimized-out parameter identification and matching "."-suffixed
> >> functions with the original function to facilitate BTF
> >> encoding.
> >>
> >> Patch 5 addresses a related problem - it is entirely possible
> >> for a static function of the same name to exist in different
> >> CUs with different function signatures.  Because BTF does not
> >> currently encode any information that would help disambiguate
> >> which BTF function specification matches which static function
> >> (in the case of multiple different function signatures), it is
> >> best to eliminate such functions from BTF for now.  The same
> >> mechanism that is used to compare static "."-suffixed functions
> >> is re-used for the static function comparison.  A superficial
> >> comparison of number of parameters/parameter names is done to
> >> see if such representations are consistent, and if inconsistent
> >> prototypes are observed, the function is flagged for exclusion
> >> from BTF.
> > 
> > should we remove all instances of static functions with same name?
> > 
> > Yonghong suggested in the other thread, that user will assume that
> > the function he's attached to is the one he expects, while he will
> > be attached to any (first) match we get from kallsyms_lookup_name
> >
> 
> The problem is that many static functions that have consistent
> prototypes are encountered multiple times when processing CUs,
> even though some have only one System.map/kallsyms entry. I tweaked patch 5 
> to count how many times a function was encountered when compiling the
> tree of static functions. It turns out of the 25872 functions, 22608 
> are found once, leaving 3264 that are found multiple times. This would
> be a lot of functions to leave out I think, especially since many
> don't actually have multiple kallsyms entries to deal with and
> the prototype is consistent.

ok, I checked and it seems like this is the case for inlined functions,
not sure what we can do here

all the instances having same prototype might be enough for now

jirka

> 
> BTW there's a bug in patch 5 that can cause a segmentation fault,
> apologies about this. I don't want to send a v2 yet as folks
> haven't had a chance to look at it properly but the fix is below.
> Thanks!
> 
> Alan
> 
> diff --git a/btf_encoder.c b/btf_encoder.c
> index ee0b032..e89c1a8 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -786,7 +786,7 @@ static int function__compare(const void *a, const void *b)
>         return strcmp(function__name(fa), function__name(fb));
>  }
>  
> -#define BTF_ENCODER_MAX_PARAMETERS     10
> +#define BTF_ENCODER_MAX_PARAMETERS     12
>  
>  struct btf_encoder_state {
>         struct btf_encoder *encoder;
> @@ -869,7 +869,7 @@ static int32_t btf_encoder__save_func(struct btf_encoder *en
>                         }
>                         parameter_names__get(&fn->proto, BTF_ENCODER_MAX_PARAMET
>                                              parameter_names);
> -                       for (i = 0; i < ofn->proto.nr_parms; i++) {
> +                       for (i = 0; i < ofn->proto.nr_parms && i < BTF_ENCODER_M
>                                 if (!state->parameter_names[i]) {
>                                         if (!parameter_names[i])
>                                                 continue;
> 
> > thanks,
> > jirka
> > 
> >>
> >> When these methods are combined - the additive encoding of
> >> "."-suffixed functions and the subtractive elimination of
> >> functions with inconsistent parameters - we see a small overall
> >> increase in the number of functions in vmlinux BTF, from
> >> 49538 to 50083.  It turns out that the inconsistent prototype
> >> checking will actually eliminate some of the suffix matches
> >> also, for cases where the DWARF representation of a function
> >> differs across CUs, but not via the abstract origin late DWARF
> >> references showing optimized-out parameters that we check
> >> for in patch 1.
> >>
> >> [1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> >>
> >> Alan Maguire (5):
> >>   dwarves: help dwarf loader spot functions with optimized-out
> >>     parameters
> >>   btf_encoder: refactor function addition into dedicated
> >>     btf_encoder__add_func
> >>   btf_encoder: child encoders should have a reference to parent encoder
> >>   btf_encoder: represent "."-suffixed optimized functions (".isra.0") in
> >>     BTF
> >>   btf_encoder: skip BTF encoding of static functions with inconsistent
> >>     prototypes
> >>
> >>  btf_encoder.c  | 357 +++++++++++++++++++++++++++++++++++++++++++++++++--------
> >>  btf_encoder.h  |   2 +-
> >>  dwarf_loader.c |  76 +++++++++++-
> >>  dwarves.h      |   5 +-
> >>  pahole.c       |   7 +-
> >>  5 files changed, 390 insertions(+), 57 deletions(-)
> >>
> >> -- 
> >> 1.8.3.1
> >>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes
  2023-01-25 13:39   ` Jiri Olsa
@ 2023-01-25 14:18     ` Alan Maguire
  0 siblings, 0 replies; 27+ messages in thread
From: Alan Maguire @ 2023-01-25 14:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, yhs, ast, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On 25/01/2023 13:39, Jiri Olsa wrote:
> On Tue, Jan 24, 2023 at 01:45:31PM +0000, Alan Maguire wrote:
> 
> SNIP
> 
>>  static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
>>  {
>> @@ -819,13 +837,51 @@ static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct functi
>>  	}
>>  	/* If we find an existing entry, we want to merge observations
>>  	 * across both functions, checking that the "seen optimized-out
>> -	 * parameters" status is reflected in our tree entry.
>> +	 * parameters"/inconsistent proto status is reflected in tree entry.
>>  	 * If the entry is new, record encoder state required
>>  	 * to add the local function later (encoder + type_id_off)
>> -	 * such that we can add the function later.
>> +	 * such that we can add the function later.  Parameter names are
>> +	 * also stored in state to speed up multiple static function
>> +	 * comparisons.
>>  	 */
>>  	if (*nodep != fn) {
>> -		(*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
>> +		struct function *ofn = *nodep;
>> +
>> +		ofn->proto.optimized_parms |= fn->proto.optimized_parms;
>> +		/* compare parameters to see if signatures match */
>> +
>> +		if (ofn->proto.inconsistent_proto)
>> +			goto out;
>> +
>> +		if (ofn->proto.nr_parms != fn->proto.nr_parms) {
>> +			ofn->proto.inconsistent_proto = 1;
>> +			goto out;
>> +		}
>> +		if (ofn->proto.nr_parms > 0) {
>> +			struct btf_encoder_state *state = ofn->priv;
>> +			const char *parameter_names[BTF_ENCODER_MAX_PARAMETERS];
>> +			int i;
>> +
>> +			if (!state->got_parameter_names) {
>> +				parameter_names__get(&ofn->proto, BTF_ENCODER_MAX_PARAMETERS,
>> +						     state->parameter_names);
>> +				state->got_parameter_names = true;
>> +			}
>> +			parameter_names__get(&fn->proto, BTF_ENCODER_MAX_PARAMETERS,
>> +					     parameter_names);
>> +			for (i = 0; i < ofn->proto.nr_parms; i++) {
>> +				if (!state->parameter_names[i]) {
>> +					if (!parameter_names[i])
>> +						continue;
>> +				} else if (parameter_names[i]) {
>> +					if (strcmp(state->parameter_names[i],
>> +						   parameter_names[i]) == 0)
>> +						continue;
> 
> I guess we can't check type easily? tag has type field,
> but I'm not sure if we can get reasonable type info from that
>

Ideally we'd do that definitely; my worry is that we'd have to
provide a buffer for each parameter type, and representing some parameter
types can be quite complex (like function pointer parameters). 
The memory and computation overheads would likely be significant to compute
the exact parameter types I suspect, and this would need to be
done for > 3000 vmlinux functions which have multiple instances.

In practice, the simplistic approach does seem to work;
I'd suggest we stick with the simple approach for now and
see if we can improve on it over time.

Thanks!

Alan 
> jirka
> 
>> +				}
>> +				ofn->proto.inconsistent_proto = 1;
>> +				goto out;
>> +			}
>> +		}
>>  	} else {
>>  		struct btf_encoder_state *state = zalloc(sizeof(*state));
>>  
>> @@ -898,10 +954,12 @@ static void btf_encoder__add_saved_func(const void *nodep, const VISIT which,
>>  	/* we can safely free encoder state since we visit each node once */
>>  	free(fn->priv);
>>  	fn->priv = NULL;
>> -	if (fn->proto.optimized_parms) {
>> +	if (fn->proto.optimized_parms || fn->proto.inconsistent_proto) {
>>  		if (encoder->verbose)
>> -			printf("skipping addition of '%s' due to optimized-out parameters\n",
>> -			       function__name(fn));
>> +			printf("skipping addition of '%s' due to %s\n",
>> +			       function__name(fn),
>> +			       fn->proto.optimized_parms ? "optimized-out parameters" :
>> +							   "multiple inconsistent function prototypes");
>>  	} else {
>>  		btf_encoder__add_func(encoder, fn);
>>  	}
>> @@ -1775,6 +1833,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>>  		 */
>>  		if (fn->declaration)
>>  			continue;
>> +		if (!fn->external)
>> +			save = true;
>>  		if (!ftype__has_arg_names(&fn->proto))
>>  			continue;
>>  		if (encoder->functions.cnt) {
>> @@ -1790,7 +1850,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>>  			if (func) {
>>  				if (func->generated)
>>  					continue;
>> -				func->generated = true;
>> +				if (!save)
>> +					func->generated = true;
>>  			} else if (encoder->functions.suffix_cnt) {
>>  				/* falling back to name.isra.0 match if no exact
>>  				 * match is found; only bother if we found any
>> diff --git a/dwarves.h b/dwarves.h
>> index 1ad1b3b..9b80262 100644
>> --- a/dwarves.h
>> +++ b/dwarves.h
>> @@ -830,6 +830,7 @@ struct ftype {
>>  	uint16_t	 nr_parms;
>>  	uint8_t		 unspec_parms:1; /* just one bit is needed */
>>  	uint8_t		 optimized_parms:1;
>> +	uint8_t		 inconsistent_proto:1;
>>  };
>>  
>>  static inline struct ftype *tag__ftype(const struct tag *tag)
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes
  2023-01-24 13:45 ` [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes Alan Maguire
  2023-01-25 13:39   ` Jiri Olsa
@ 2023-01-25 16:53   ` Jiri Olsa
  2023-01-26 14:12     ` Alan Maguire
  1 sibling, 1 reply; 27+ messages in thread
From: Jiri Olsa @ 2023-01-25 16:53 UTC (permalink / raw)
  To: Alan Maguire
  Cc: acme, yhs, ast, olsajiri, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On Tue, Jan 24, 2023 at 01:45:31PM +0000, Alan Maguire wrote:

SNIP

>  	} else {
>  		struct btf_encoder_state *state = zalloc(sizeof(*state));
>  
> @@ -898,10 +954,12 @@ static void btf_encoder__add_saved_func(const void *nodep, const VISIT which,
>  	/* we can safely free encoder state since we visit each node once */
>  	free(fn->priv);
>  	fn->priv = NULL;
> -	if (fn->proto.optimized_parms) {
> +	if (fn->proto.optimized_parms || fn->proto.inconsistent_proto) {
>  		if (encoder->verbose)
> -			printf("skipping addition of '%s' due to optimized-out parameters\n",
> -			       function__name(fn));
> +			printf("skipping addition of '%s' due to %s\n",
> +			       function__name(fn),
> +			       fn->proto.optimized_parms ? "optimized-out parameters" :
> +							   "multiple inconsistent function prototypes");
>  	} else {
>  		btf_encoder__add_func(encoder, fn);
>  	}
> @@ -1775,6 +1833,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  		 */
>  		if (fn->declaration)
>  			continue;
> +		if (!fn->external)
> +			save = true;

how about conflicts between static and global functions,
I can see still few cases like:

  void __init msg_init(void)
  static void msg_init(struct spi_message *m, struct spi_transfer *x,
                       u8 *addr, size_t count, char *tx, char *rx)

  static inline void free_pgtable_page(u64 *pt)
  void free_pgtable_page(void *vaddr)

  STATIC inline long INIT parse_header(u8 *input, long *skip, long in_len)
  static struct tb_cfg_result parse_header(const struct ctl_pkg *pkg, u32 len,
                                         enum tb_cfg_pkg_type type, u64 route)
  static void __init parse_header(char *s)


could we enable the check/save globaly?

jirka

>  		if (!ftype__has_arg_names(&fn->proto))
>  			continue;
>  		if (encoder->functions.cnt) {
> @@ -1790,7 +1850,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>  			if (func) {
>  				if (func->generated)
>  					continue;
> -				func->generated = true;
> +				if (!save)
> +					func->generated = true;
>  			} else if (encoder->functions.suffix_cnt) {
>  				/* falling back to name.isra.0 match if no exact
>  				 * match is found; only bother if we found any
> diff --git a/dwarves.h b/dwarves.h
> index 1ad1b3b..9b80262 100644
> --- a/dwarves.h
> +++ b/dwarves.h
> @@ -830,6 +830,7 @@ struct ftype {
>  	uint16_t	 nr_parms;
>  	uint8_t		 unspec_parms:1; /* just one bit is needed */
>  	uint8_t		 optimized_parms:1;
> +	uint8_t		 inconsistent_proto:1;
>  };
>  
>  static inline struct ftype *tag__ftype(const struct tag *tag)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-24 13:45 ` [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters Alan Maguire
@ 2023-01-25 16:53   ` Jiri Olsa
  2023-01-25 17:47   ` Eduard Zingerman
  1 sibling, 0 replies; 27+ messages in thread
From: Jiri Olsa @ 2023-01-25 16:53 UTC (permalink / raw)
  To: Alan Maguire
  Cc: acme, yhs, ast, olsajiri, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On Tue, Jan 24, 2023 at 01:45:27PM +0000, Alan Maguire wrote:

SNIP

>  
>  	return parm;
> @@ -1450,7 +1504,7 @@ static struct tag *die__create_new_parameter(Dwarf_Die *die,
>  					     struct cu *cu, struct conf_load *conf,
>  					     int param_idx)
>  {
> -	struct parameter *parm = parameter__new(die, cu, conf);
> +	struct parameter *parm = parameter__new(die, cu, conf, param_idx);
>  
>  	if (parm == NULL)
>  		return NULL;
> @@ -2209,6 +2263,10 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>  			}
>  			pos->name = tag__parameter(dtype->tag)->name;
>  			pos->tag.type = dtype->tag->type;
> +			if (pos->optimized) {
> +				tag__parameter(dtype->tag)->optimized = pos->optimized;
> +				type->optimized_parms = 1;
> +			}
>  			continue;
>  		}
>  
> @@ -2219,6 +2277,20 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>  		}
>  		pos->tag.type = dtype->small_id;
>  	}
> +	/* if parameters were optimized out, set flag for the ftype this
> +	 * function tag referred to via abstract origin.
> +	 */
> +	if (type->optimized_parms) {
> +		struct dwarf_tag *dtype = type->tag.priv;
> +		struct dwarf_tag *dftype;
> +
> +		dftype = dwarf_cu__find_tag_by_ref(dcu, &dtype->abstract_origin);
> +		if (dftype && dftype->tag) {
> +			struct ftype *ftype = tag__ftype(dftype->tag);
> +
> +			ftype->optimized_parms = 1;

nit, could be:
   tag__ftype(dftype->tag)->optimized_parms = 1; 

as you did above

jirka

> +		}
> +	}
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-24 13:45 ` [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters Alan Maguire
  2023-01-25 16:53   ` Jiri Olsa
@ 2023-01-25 17:47   ` Eduard Zingerman
  2023-01-25 18:28     ` Alan Maguire
  1 sibling, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2023-01-25 17:47 UTC (permalink / raw)
  To: Alan Maguire, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On Tue, 2023-01-24 at 13:45 +0000, Alan Maguire wrote:
> Compilation generates DWARF at several stages, and often the
> later DWARF representations more accurately represent optimizations
> that have occurred during compilation.
> 
> In particular, parameter representations can be spotted by their
> abstract origin references to the original parameter, but they
> often have more accurate location information.  In most cases,
> the parameter locations will match calling conventions, and be
> registers for the first 6 parameters on x86_64, first 8 on ARM64
> etc.  If the parameter is not a register when it should be however,
> it is likely passed via the stack or the compiler has used a
> constant representation instead.
> 
> This change adds a field to parameters and their associated
> ftype to note if a parameter has been optimized out.  Having
> this information allows us to skip such functions, as their
> presence in CUs makes BTF encoding impossible.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  dwarf_loader.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  dwarves.h      |  4 +++-
>  2 files changed, 77 insertions(+), 3 deletions(-)
> 
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index 5a74035..0220f1d 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -992,13 +992,67 @@ static struct class_member *class_member__new(Dwarf_Die *die, struct cu *cu,
>  	return member;
>  }
>  
> -static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
> +/* How many function parameters are passed via registers?  Used below in
> + * determining if an argument has been optimized out or if it is simply
> + * an argument > NR_REGISTER_PARAMS.  Setting NR_REGISTER_PARAMS to 0
> + * allows unsupported architectures to skip tagging optimized-out
> + * values.
> + */
> +#if defined(__x86_64__)
> +#define NR_REGISTER_PARAMS      6
> +#elif defined(__s390__)
> +#define NR_REGISTER_PARAMS	5
> +#elif defined(__aarch64__)
> +#define NR_REGISTER_PARAMS      8
> +#elif defined(__mips__)
> +#define NR_REGISTER_PARAMS	8
> +#elif defined(__powerpc__)
> +#define NR_REGISTER_PARAMS	8
> +#elif defined(__sparc__)
> +#define NR_REGISTER_PARAMS	6
> +#elif defined(__riscv) && __riscv_xlen == 64
> +#define NR_REGISTER_PARAMS	8
> +#elif defined(__arc__)
> +#define NR_REGISTER_PARAMS	8
> +#else
> +#define NR_REGISTER_PARAMS      0
> +#endif
> +
> +static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu,
> +					struct conf_load *conf, int param_idx)
>  {
>  	struct parameter *parm = tag__alloc(cu, sizeof(*parm));
>  
>  	if (parm != NULL) {
> +		struct location loc;
> +
>  		tag__init(&parm->tag, cu, die);
>  		parm->name = attr_string(die, DW_AT_name, conf);
> +
> +		/* Parameters which use DW_AT_abstract_origin to point at
> +		 * the original parameter definition (with no name in the DIE)
> +		 * are the result of later DWARF generation during compilation
> +		 * so often better take into account if arguments were
> +		 * optimized out.
> +		 *
> +		 * By checking that locations for parameters that are expected
> +		 * to be passed as registers are actually passed as registers,
> +		 * we can spot optimized-out parameters.
> +		 */
> +		if (param_idx < NR_REGISTER_PARAMS && !parm->name &&
> +		    attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
> +		    loc.exprlen != 0) {
> +			Dwarf_Op *expr = loc.expr;
> +
> +			switch (expr->atom) {
> +			case DW_OP_reg1 ... DW_OP_reg31:
> +			case DW_OP_breg0 ... DW_OP_breg31:
> +				break;
> +			default:
> +				parm->optimized = true;
> +				break;
> +			}
> +		}

Hi Alan,

I looked through the DWARF standard and found two relevant entries:

> 4.1.4
> 
> If no location attribute is present in a variable entry representing
> the definition of a variable (...), or if the location attribute is
> present but has an empty location description (...), the variable is
> assumed to exist in the source code but not in the executable program
> (but see number 10, below).

This paragraph implies that parameter name presence or absence is
irrelevant, but I don't have any examples when parameter name is
present for a removed parameter.

> 4.1.10
> 
> A DW_AT_const_value attribute for an entry describing a variable or formal
> parameter whose value is constant and not represented by an object in the
> address space of the program, or an entry describing a named constant. (Note
> that such an entry does not have a location attribute.)

For this paragraph I have an example:

    $ cat test.c
    __attribute__((noinline))
    static int f(int x, int y) {
        return x + y;
    }
    
    int main(int argc, char *argv[]) {
        return f(1, 2) + f(1, 3);
    }
    
    $ gcc --version | head -n1
    gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
    $ gcc -O2 -g -c test.c -o test.o
    
The objdump shows that constant propagation removed the first
parameter of the function `f`:

    $ llvm-objdump -d test.o 
    
    test.o:	file format elf64-x86-64
    
    Disassembly of section .text:
    
    0000000000000000 <f.constprop.0>:
           0: 8d 47 01                     	leal	0x1(%rdi), %eax
           3: c3                           	retq
    
    Disassembly of section .text.startup:
    
    0000000000000000 <main>:
           0: f3 0f 1e fa                  	endbr64
           4: bf 02 00 00 00               	movl	$0x2, %edi
           9: e8 00 00 00 00               	callq	0xe <main+0xe>
           e: bf 03 00 00 00               	movl	$0x3, %edi
          13: 89 c2                        	movl	%eax, %edx
          15: e8 00 00 00 00               	callq	0x1a <main+0x1a>
          1a: 01 d0                        	addl	%edx, %eax
          1c: c3                           	retq
    
However, the information about this parameter is still present in the DWARF:

    $ llvm-dwarfdump test.o
    ...
    0x000000c1:   DW_TAG_subprogram
                    DW_AT_name	("f")
                    DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                    DW_AT_decl_line	(2)
                    DW_AT_decl_column	(0x0c)
                    DW_AT_prototyped	(true)
                    DW_AT_type	(0x000000a9 "int")
                    DW_AT_inline	(DW_INL_inlined)
                    DW_AT_sibling	(0x000000e1)
    
    0x000000d0:     DW_TAG_formal_parameter
                      DW_AT_name	("x")
                      DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                      DW_AT_decl_line	(2)
                      DW_AT_decl_column	(0x12)
                      DW_AT_type	(0x000000a9 "int")
    
    0x000000d8:     DW_TAG_formal_parameter
                      DW_AT_name	("y")
                      DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                      DW_AT_decl_line	(2)
                      DW_AT_decl_column	(0x19)
                      DW_AT_type	(0x000000a9 "int")
    
    0x000000e0:     NULL
    
    0x000000e1:   DW_TAG_subprogram
                    DW_AT_abstract_origin	(0x000000c1 "f")
                    DW_AT_low_pc	(0x0000000000000000)
                    DW_AT_high_pc	(0x0000000000000004)
                    DW_AT_frame_base	(DW_OP_call_frame_cfa)
                    DW_AT_call_all_calls	(true)
    
    0x000000f8:     DW_TAG_formal_parameter
                      DW_AT_abstract_origin	(0x000000d8 "y")
                      DW_AT_location	(DW_OP_reg5 RDI)
    
    0x000000ff:     DW_TAG_formal_parameter
                      DW_AT_abstract_origin	(0x000000d0 "x")
                      DW_AT_const_value	(0x01)
    
    0x00000105:     NULL
    
When I ask pahole with this patch-set applied to generate BTF I see
the following output:

    $ pahole --verbose --btf_encode_detached=test.btf test.o
    btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
    Found 0 per-CPU variables!
    Found 2 functions!
    File test.o:
    [1] INT int size=4 nr_bits=32 encoding=SIGNED
    [2] PTR (anon) type_id=3
    [3] PTR (anon) type_id=4
    [4] INT char size=1 nr_bits=8 encoding=SIGNED
    [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
    [6] FUNC main type_id=5
    matched function 'f' with 'f.constprop.0'
    added local function 'f'
    matched function 'f' with 'f.constprop.0'
    [7] FUNC_PROTO (anon) return=1 args=(1 x, 1 y)
    [8] FUNC f type_id=7
    
Meaning that function `f` had not been skipped.
A trivial modification overcomes this:

		if (param_idx < NR_REGISTER_PARAMS && !parm->name) {
			if (attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
			    loc.exprlen != 0) {
				Dwarf_Op *expr = loc.expr;

				switch (expr->atom) {
				case DW_OP_reg1 ... DW_OP_reg31:
				case DW_OP_breg0 ... DW_OP_breg31:
					break;
				default:
					parm->optimized = true;
					break;
				}
			} else if (dwarf_attr(die, DW_AT_const_value, &attr) != NULL) {
					parm->optimized = true;
			}

With it pahole seem to work as intended (if I understand the intention correctly):

    $ pahole --verbose --btf_encode_detached=test.btf test.o
    btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
    Found 0 per-CPU variables!
    Found 2 functions!
    File test.o:
    [1] INT int size=4 nr_bits=32 encoding=SIGNED
    [2] PTR (anon) type_id=3
    [3] PTR (anon) type_id=4
    [4] INT char size=1 nr_bits=8 encoding=SIGNED
    [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
    [6] FUNC main type_id=5
    matched function 'f' with 'f.constprop.0', has optimized-out parameters
    added local function 'f', optimized-out params
    matched function 'f' with 'f.constprop.0', has optimized-out parameters
    skipping addition of 'f' due to optimized-out parameters

wdyt?

Thanks,
Eduard

>  
>  	return parm;
> @@ -1450,7 +1504,7 @@ static struct tag *die__create_new_parameter(Dwarf_Die *die,
>  					     struct cu *cu, struct conf_load *conf,
>  					     int param_idx)
>  {
> -	struct parameter *parm = parameter__new(die, cu, conf);
> +	struct parameter *parm = parameter__new(die, cu, conf, param_idx);
>  
>  	if (parm == NULL)
>  		return NULL;
> @@ -2209,6 +2263,10 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>  			}
>  			pos->name = tag__parameter(dtype->tag)->name;
>  			pos->tag.type = dtype->tag->type;
> +			if (pos->optimized) {
> +				tag__parameter(dtype->tag)->optimized = pos->optimized;
> +				type->optimized_parms = 1;
> +			}
>  			continue;
>  		}
>  
> @@ -2219,6 +2277,20 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>  		}
>  		pos->tag.type = dtype->small_id;
>  	}
> +	/* if parameters were optimized out, set flag for the ftype this
> +	 * function tag referred to via abstract origin.
> +	 */
> +	if (type->optimized_parms) {
> +		struct dwarf_tag *dtype = type->tag.priv;
> +		struct dwarf_tag *dftype;
> +
> +		dftype = dwarf_cu__find_tag_by_ref(dcu, &dtype->abstract_origin);
> +		if (dftype && dftype->tag) {
> +			struct ftype *ftype = tag__ftype(dftype->tag);
> +
> +			ftype->optimized_parms = 1;
> +		}
> +	}
>  }
>  
>  static void lexblock__recode_dwarf_types(struct lexblock *tag, struct cu *cu)
> diff --git a/dwarves.h b/dwarves.h
> index 589588e..1ad1b3b 100644
> --- a/dwarves.h
> +++ b/dwarves.h
> @@ -808,6 +808,7 @@ size_t lexblock__fprintf(const struct lexblock *lexblock, const struct cu *cu,
>  struct parameter {
>  	struct tag tag;
>  	const char *name;
> +	bool optimized;
>  };
>  
>  static inline struct parameter *tag__parameter(const struct tag *tag)
> @@ -827,7 +828,8 @@ struct ftype {
>  	struct tag	 tag;
>  	struct list_head parms;
>  	uint16_t	 nr_parms;
> -	uint8_t		 unspec_parms; /* just one bit is needed */
> +	uint8_t		 unspec_parms:1; /* just one bit is needed */
> +	uint8_t		 optimized_parms:1;
>  };
>  
>  static inline struct ftype *tag__ftype(const struct tag *tag)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF
  2023-01-24 13:45 ` [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF Alan Maguire
@ 2023-01-25 17:54   ` Kui-Feng Lee
  2023-01-25 18:56     ` Arnaldo Carvalho de Melo
  2023-01-25 18:59     ` Alan Maguire
  0 siblings, 2 replies; 27+ messages in thread
From: Kui-Feng Lee @ 2023-01-25 17:54 UTC (permalink / raw)
  To: Alan Maguire, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf


On 1/24/23 05:45, Alan Maguire wrote:
> +/*
> + * static functions with suffixes are not added yet - we need to
> + * observe across all CUs to see if the static function has
> + * optimized parameters in any CU, since in such a case it should
> + * not be included in the final BTF.  NF_HOOK.constprop.0() is
> + * a case in point - it has optimized-out parameters in some CUs
> + * but not others.  In order to have consistency (since we do not
> + * know which instance the BTF-specified function signature will
> + * apply to), we simply skip adding functions which have optimized
> + * out parameters anywhere.
> + */
> +static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
> +{
> +	struct btf_encoder *parent = encoder->parent ? encoder->parent : encoder;
> +	const char *name = function__name(fn);
> +	struct function **nodep;
> +	int ret = 0;
> +
> +	pthread_mutex_lock(&parent->saved_func_lock);

Do you have the number of static functions with suffices?

If the number of static functions with suffices is high, the contention 
of the lock would be an issue.

Is it possible to keep a local pool of static functions with suffices? 
The pool will be combined with its parent either at the completion of a 
CU, before ending the thread or when merging into the main thread.


> +	nodep = tsearch(fn, &parent->saved_func_tree, function__compare);
> +	if (nodep == NULL) {
> +		fprintf(stderr, "error: out of memory adding local function '%s'\n",
> +			name);
> +		ret = -1;
> +		goto out;
> +	}
> +	/* If we find an existing entry, we want to merge observations
> +	 * across both functions, checking that the "seen optimized-out
> +	 * parameters" status is reflected in our tree entry.
> +	 * If the entry is new, record encoder state required
> +	 * to add the local function later (encoder + type_id_off)
> +	 * such that we can add the function later.
> +	 */
> +	if (*nodep != fn) {
> +		(*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
> +	} else {
> +		struct btf_encoder_state *state = zalloc(sizeof(*state));
> +
> +		if (state == NULL) {
> +			fprintf(stderr, "error: out of memory adding local function '%s'\n",
> +				name);
> +			ret = -1;
> +			goto out;
> +		}
> +		state->encoder = encoder;
> +		state->type_id_off = encoder->type_id_off;
> +		fn->priv = state;
> +		encoder->saved_func_cnt++;
> +		if (encoder->verbose)
> +			printf("added local function '%s'%s\n", name,
> +			       fn->proto.optimized_parms ?
> +			       ", optimized-out params" : "");
> +	}
> +out:
> +	pthread_mutex_unlock(&parent->saved_func_lock);
> +	return ret;
> +}
> +

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-25 17:47   ` Eduard Zingerman
@ 2023-01-25 18:28     ` Alan Maguire
  2023-01-25 21:34       ` Eduard Zingerman
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Maguire @ 2023-01-25 18:28 UTC (permalink / raw)
  To: Eduard Zingerman, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On 25/01/2023 17:47, Eduard Zingerman wrote:
> On Tue, 2023-01-24 at 13:45 +0000, Alan Maguire wrote:
>> Compilation generates DWARF at several stages, and often the
>> later DWARF representations more accurately represent optimizations
>> that have occurred during compilation.
>>
>> In particular, parameter representations can be spotted by their
>> abstract origin references to the original parameter, but they
>> often have more accurate location information.  In most cases,
>> the parameter locations will match calling conventions, and be
>> registers for the first 6 parameters on x86_64, first 8 on ARM64
>> etc.  If the parameter is not a register when it should be however,
>> it is likely passed via the stack or the compiler has used a
>> constant representation instead.
>>
>> This change adds a field to parameters and their associated
>> ftype to note if a parameter has been optimized out.  Having
>> this information allows us to skip such functions, as their
>> presence in CUs makes BTF encoding impossible.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  dwarf_loader.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  dwarves.h      |  4 +++-
>>  2 files changed, 77 insertions(+), 3 deletions(-)
>>
>> diff --git a/dwarf_loader.c b/dwarf_loader.c
>> index 5a74035..0220f1d 100644
>> --- a/dwarf_loader.c
>> +++ b/dwarf_loader.c
>> @@ -992,13 +992,67 @@ static struct class_member *class_member__new(Dwarf_Die *die, struct cu *cu,
>>  	return member;
>>  }
>>  
>> -static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
>> +/* How many function parameters are passed via registers?  Used below in
>> + * determining if an argument has been optimized out or if it is simply
>> + * an argument > NR_REGISTER_PARAMS.  Setting NR_REGISTER_PARAMS to 0
>> + * allows unsupported architectures to skip tagging optimized-out
>> + * values.
>> + */
>> +#if defined(__x86_64__)
>> +#define NR_REGISTER_PARAMS      6
>> +#elif defined(__s390__)
>> +#define NR_REGISTER_PARAMS	5
>> +#elif defined(__aarch64__)
>> +#define NR_REGISTER_PARAMS      8
>> +#elif defined(__mips__)
>> +#define NR_REGISTER_PARAMS	8
>> +#elif defined(__powerpc__)
>> +#define NR_REGISTER_PARAMS	8
>> +#elif defined(__sparc__)
>> +#define NR_REGISTER_PARAMS	6
>> +#elif defined(__riscv) && __riscv_xlen == 64
>> +#define NR_REGISTER_PARAMS	8
>> +#elif defined(__arc__)
>> +#define NR_REGISTER_PARAMS	8
>> +#else
>> +#define NR_REGISTER_PARAMS      0
>> +#endif
>> +
>> +static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu,
>> +					struct conf_load *conf, int param_idx)
>>  {
>>  	struct parameter *parm = tag__alloc(cu, sizeof(*parm));
>>  
>>  	if (parm != NULL) {
>> +		struct location loc;
>> +
>>  		tag__init(&parm->tag, cu, die);
>>  		parm->name = attr_string(die, DW_AT_name, conf);
>> +
>> +		/* Parameters which use DW_AT_abstract_origin to point at
>> +		 * the original parameter definition (with no name in the DIE)
>> +		 * are the result of later DWARF generation during compilation
>> +		 * so often better take into account if arguments were
>> +		 * optimized out.
>> +		 *
>> +		 * By checking that locations for parameters that are expected
>> +		 * to be passed as registers are actually passed as registers,
>> +		 * we can spot optimized-out parameters.
>> +		 */
>> +		if (param_idx < NR_REGISTER_PARAMS && !parm->name &&
>> +		    attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
>> +		    loc.exprlen != 0) {
>> +			Dwarf_Op *expr = loc.expr;
>> +
>> +			switch (expr->atom) {
>> +			case DW_OP_reg1 ... DW_OP_reg31:
>> +			case DW_OP_breg0 ... DW_OP_breg31:
>> +				break;
>> +			default:
>> +				parm->optimized = true;
>> +				break;
>> +			}
>> +		}
> 
> Hi Alan,
> 
> I looked through the DWARF standard and found two relevant entries:
> 
>> 4.1.4
>>
>> If no location attribute is present in a variable entry representing
>> the definition of a variable (...), or if the location attribute is
>> present but has an empty location description (...), the variable is
>> assumed to exist in the source code but not in the executable program
>> (but see number 10, below).
> 
> This paragraph implies that parameter name presence or absence is
> irrelevant, but I don't have any examples when parameter name is
> present for a removed parameter.
> 
>> 4.1.10
>>
>> A DW_AT_const_value attribute for an entry describing a variable or formal
>> parameter whose value is constant and not represented by an object in the
>> address space of the program, or an entry describing a named constant. (Note
>> that such an entry does not have a location attribute.)
> 
> For this paragraph I have an example:
> 
>     $ cat test.c
>     __attribute__((noinline))
>     static int f(int x, int y) {
>         return x + y;
>     }
>     
>     int main(int argc, char *argv[]) {
>         return f(1, 2) + f(1, 3);
>     }
>     
>     $ gcc --version | head -n1
>     gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
>     $ gcc -O2 -g -c test.c -o test.o
>     
> The objdump shows that constant propagation removed the first
> parameter of the function `f`:
> 
>     $ llvm-objdump -d test.o 
>     
>     test.o:	file format elf64-x86-64
>     
>     Disassembly of section .text:
>     
>     0000000000000000 <f.constprop.0>:
>            0: 8d 47 01                     	leal	0x1(%rdi), %eax
>            3: c3                           	retq
>     
>     Disassembly of section .text.startup:
>     
>     0000000000000000 <main>:
>            0: f3 0f 1e fa                  	endbr64
>            4: bf 02 00 00 00               	movl	$0x2, %edi
>            9: e8 00 00 00 00               	callq	0xe <main+0xe>
>            e: bf 03 00 00 00               	movl	$0x3, %edi
>           13: 89 c2                        	movl	%eax, %edx
>           15: e8 00 00 00 00               	callq	0x1a <main+0x1a>
>           1a: 01 d0                        	addl	%edx, %eax
>           1c: c3                           	retq
>     
> However, the information about this parameter is still present in the DWARF:
> 
>     $ llvm-dwarfdump test.o
>     ...
>     0x000000c1:   DW_TAG_subprogram
>                     DW_AT_name	("f")
>                     DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                     DW_AT_decl_line	(2)
>                     DW_AT_decl_column	(0x0c)
>                     DW_AT_prototyped	(true)
>                     DW_AT_type	(0x000000a9 "int")
>                     DW_AT_inline	(DW_INL_inlined)
>                     DW_AT_sibling	(0x000000e1)
>     
>     0x000000d0:     DW_TAG_formal_parameter
>                       DW_AT_name	("x")
>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                       DW_AT_decl_line	(2)
>                       DW_AT_decl_column	(0x12)
>                       DW_AT_type	(0x000000a9 "int")
>     
>     0x000000d8:     DW_TAG_formal_parameter
>                       DW_AT_name	("y")
>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                       DW_AT_decl_line	(2)
>                       DW_AT_decl_column	(0x19)
>                       DW_AT_type	(0x000000a9 "int")
>     
>     0x000000e0:     NULL
>     
>     0x000000e1:   DW_TAG_subprogram
>                     DW_AT_abstract_origin	(0x000000c1 "f")
>                     DW_AT_low_pc	(0x0000000000000000)
>                     DW_AT_high_pc	(0x0000000000000004)
>                     DW_AT_frame_base	(DW_OP_call_frame_cfa)
>                     DW_AT_call_all_calls	(true)
>     
>     0x000000f8:     DW_TAG_formal_parameter
>                       DW_AT_abstract_origin	(0x000000d8 "y")
>                       DW_AT_location	(DW_OP_reg5 RDI)
>     
>     0x000000ff:     DW_TAG_formal_parameter
>                       DW_AT_abstract_origin	(0x000000d0 "x")
>                       DW_AT_const_value	(0x01)
>     
>     0x00000105:     NULL
>     
> When I ask pahole with this patch-set applied to generate BTF I see
> the following output:
> 
>     $ pahole --verbose --btf_encode_detached=test.btf test.o
>     btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
>     Found 0 per-CPU variables!
>     Found 2 functions!
>     File test.o:
>     [1] INT int size=4 nr_bits=32 encoding=SIGNED
>     [2] PTR (anon) type_id=3
>     [3] PTR (anon) type_id=4
>     [4] INT char size=1 nr_bits=8 encoding=SIGNED
>     [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
>     [6] FUNC main type_id=5
>     matched function 'f' with 'f.constprop.0'
>     added local function 'f'
>     matched function 'f' with 'f.constprop.0'
>     [7] FUNC_PROTO (anon) return=1 args=(1 x, 1 y)
>     [8] FUNC f type_id=7
>     
> Meaning that function `f` had not been skipped.
> A trivial modification overcomes this:
> 
> 		if (param_idx < NR_REGISTER_PARAMS && !parm->name) {
> 			if (attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
> 			    loc.exprlen != 0) {
> 				Dwarf_Op *expr = loc.expr;
> 
> 				switch (expr->atom) {
> 				case DW_OP_reg1 ... DW_OP_reg31:
> 				case DW_OP_breg0 ... DW_OP_breg31:
> 					break;
> 				default:
> 					parm->optimized = true;
> 					break;
> 				}
> 			} else if (dwarf_attr(die, DW_AT_const_value, &attr) != NULL) {
> 					parm->optimized = true;
> 			}
> 
> With it pahole seem to work as intended (if I understand the intention correctly):
> 
>     $ pahole --verbose --btf_encode_detached=test.btf test.o
>     btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
>     Found 0 per-CPU variables!
>     Found 2 functions!
>     File test.o:
>     [1] INT int size=4 nr_bits=32 encoding=SIGNED
>     [2] PTR (anon) type_id=3
>     [3] PTR (anon) type_id=4
>     [4] INT char size=1 nr_bits=8 encoding=SIGNED
>     [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
>     [6] FUNC main type_id=5
>     matched function 'f' with 'f.constprop.0', has optimized-out parameters
>     added local function 'f', optimized-out params
>     matched function 'f' with 'f.constprop.0', has optimized-out parameters
>     skipping addition of 'f' due to optimized-out parameters
> 
> wdyt?
> 

This is great, thanks Eduard! I can add an additional patch
for the else clause code above, attributing that to you in v2 if
you like?

Alan

> Thanks,
> Eduard
> 
>>  
>>  	return parm;
>> @@ -1450,7 +1504,7 @@ static struct tag *die__create_new_parameter(Dwarf_Die *die,
>>  					     struct cu *cu, struct conf_load *conf,
>>  					     int param_idx)
>>  {
>> -	struct parameter *parm = parameter__new(die, cu, conf);
>> +	struct parameter *parm = parameter__new(die, cu, conf, param_idx);
>>  
>>  	if (parm == NULL)
>>  		return NULL;
>> @@ -2209,6 +2263,10 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>>  			}
>>  			pos->name = tag__parameter(dtype->tag)->name;
>>  			pos->tag.type = dtype->tag->type;
>> +			if (pos->optimized) {
>> +				tag__parameter(dtype->tag)->optimized = pos->optimized;
>> +				type->optimized_parms = 1;
>> +			}
>>  			continue;
>>  		}
>>  
>> @@ -2219,6 +2277,20 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>>  		}
>>  		pos->tag.type = dtype->small_id;
>>  	}
>> +	/* if parameters were optimized out, set flag for the ftype this
>> +	 * function tag referred to via abstract origin.
>> +	 */
>> +	if (type->optimized_parms) {
>> +		struct dwarf_tag *dtype = type->tag.priv;
>> +		struct dwarf_tag *dftype;
>> +
>> +		dftype = dwarf_cu__find_tag_by_ref(dcu, &dtype->abstract_origin);
>> +		if (dftype && dftype->tag) {
>> +			struct ftype *ftype = tag__ftype(dftype->tag);
>> +
>> +			ftype->optimized_parms = 1;
>> +		}
>> +	}
>>  }
>>  
>>  static void lexblock__recode_dwarf_types(struct lexblock *tag, struct cu *cu)
>> diff --git a/dwarves.h b/dwarves.h
>> index 589588e..1ad1b3b 100644
>> --- a/dwarves.h
>> +++ b/dwarves.h
>> @@ -808,6 +808,7 @@ size_t lexblock__fprintf(const struct lexblock *lexblock, const struct cu *cu,
>>  struct parameter {
>>  	struct tag tag;
>>  	const char *name;
>> +	bool optimized;
>>  };
>>  
>>  static inline struct parameter *tag__parameter(const struct tag *tag)
>> @@ -827,7 +828,8 @@ struct ftype {
>>  	struct tag	 tag;
>>  	struct list_head parms;
>>  	uint16_t	 nr_parms;
>> -	uint8_t		 unspec_parms; /* just one bit is needed */
>> +	uint8_t		 unspec_parms:1; /* just one bit is needed */
>> +	uint8_t		 optimized_parms:1;
>>  };
>>  
>>  static inline struct ftype *tag__ftype(const struct tag *tag)
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF
  2023-01-25 17:54   ` Kui-Feng Lee
@ 2023-01-25 18:56     ` Arnaldo Carvalho de Melo
  2023-01-26 18:37       ` Kui-Feng Lee
  2023-01-25 18:59     ` Alan Maguire
  1 sibling, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-01-25 18:56 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Alan Maguire, yhs, ast, olsajiri, timo, daniel, andrii,
	songliubraving, john.fastabend, kpsingh, sdf, haoluo, martin.lau,
	bpf

Em Wed, Jan 25, 2023 at 09:54:26AM -0800, Kui-Feng Lee escreveu:
> 
> On 1/24/23 05:45, Alan Maguire wrote:
> > +/*
> > + * static functions with suffixes are not added yet - we need to
> > + * observe across all CUs to see if the static function has
> > + * optimized parameters in any CU, since in such a case it should
> > + * not be included in the final BTF.  NF_HOOK.constprop.0() is
> > + * a case in point - it has optimized-out parameters in some CUs
> > + * but not others.  In order to have consistency (since we do not
> > + * know which instance the BTF-specified function signature will
> > + * apply to), we simply skip adding functions which have optimized
> > + * out parameters anywhere.
> > + */
> > +static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
> > +{
> > +	struct btf_encoder *parent = encoder->parent ? encoder->parent : encoder;
> > +	const char *name = function__name(fn);
> > +	struct function **nodep;
> > +	int ret = 0;
> > +
> > +	pthread_mutex_lock(&parent->saved_func_lock);
> 
> Do you have the number of static functions with suffices?

⬢[acme@toolbox linux]$ readelf -sW ../build/v6.2-rc5+/vmlinux | grep -w FUNC | grep '[[:alnum:]]\.' | wc -l
7245
⬢[acme@toolbox linux]$ readelf -sW ../build/v6.2-rc5+/vmlinux | grep -w FUNC | wc -l
122336
⬢[acme@toolbox linux]$ readelf -sW ../build/v6.2-rc5+/vmlinux | grep -w FUNC | grep '[[:alnum:]]\.constprop' | wc -l
1292
⬢[acme@toolbox linux]$ readelf -sW ../build/v6.2-rc5+/vmlinux | grep -w FUNC | grep '[[:alnum:]]\.isra' | wc -l
1148
⬢[acme@toolbox linux]$ readelf -sW ../build/v6.2-rc5+/vmlinux | grep -w FUNC | grep '[[:alnum:]]\.cold' | wc -l
4066
⬢[acme@toolbox linux]$ readelf -sW ../build/v6.2-rc5+/vmlinux | grep -w FUNC | grep '[[:alnum:]]\.part' | wc -l
1013
⬢[acme@toolbox linux]$
 
> If the number of static functions with suffices is high, the contention of
> the lock would be an issue.
> 
> Is it possible to keep a local pool of static functions with suffices? The
> pool will be combined with its parent either at the completion of a CU,
> before ending the thread or when merging into the main thread.

May help, but I think maybe premature optimization is the root of... :-)

- Arnaldo
 
> > +	nodep = tsearch(fn, &parent->saved_func_tree, function__compare);
> > +	if (nodep == NULL) {
> > +		fprintf(stderr, "error: out of memory adding local function '%s'\n",
> > +			name);
> > +		ret = -1;
> > +		goto out;
> > +	}
> > +	/* If we find an existing entry, we want to merge observations
> > +	 * across both functions, checking that the "seen optimized-out
> > +	 * parameters" status is reflected in our tree entry.
> > +	 * If the entry is new, record encoder state required
> > +	 * to add the local function later (encoder + type_id_off)
> > +	 * such that we can add the function later.
> > +	 */
> > +	if (*nodep != fn) {
> > +		(*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
> > +	} else {
> > +		struct btf_encoder_state *state = zalloc(sizeof(*state));
> > +
> > +		if (state == NULL) {
> > +			fprintf(stderr, "error: out of memory adding local function '%s'\n",
> > +				name);
> > +			ret = -1;
> > +			goto out;
> > +		}
> > +		state->encoder = encoder;
> > +		state->type_id_off = encoder->type_id_off;
> > +		fn->priv = state;
> > +		encoder->saved_func_cnt++;
> > +		if (encoder->verbose)
> > +			printf("added local function '%s'%s\n", name,
> > +			       fn->proto.optimized_parms ?
> > +			       ", optimized-out params" : "");
> > +	}
> > +out:
> > +	pthread_mutex_unlock(&parent->saved_func_lock);
> > +	return ret;
> > +}
> > +

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF
  2023-01-25 17:54   ` Kui-Feng Lee
  2023-01-25 18:56     ` Arnaldo Carvalho de Melo
@ 2023-01-25 18:59     ` Alan Maguire
  2023-01-26 17:43       ` Kui-Feng Lee
  1 sibling, 1 reply; 27+ messages in thread
From: Alan Maguire @ 2023-01-25 18:59 UTC (permalink / raw)
  To: Kui-Feng Lee, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On 25/01/2023 17:54, Kui-Feng Lee wrote:
> 
> On 1/24/23 05:45, Alan Maguire wrote:
>> +/*
>> + * static functions with suffixes are not added yet - we need to
>> + * observe across all CUs to see if the static function has
>> + * optimized parameters in any CU, since in such a case it should
>> + * not be included in the final BTF.  NF_HOOK.constprop.0() is
>> + * a case in point - it has optimized-out parameters in some CUs
>> + * but not others.  In order to have consistency (since we do not
>> + * know which instance the BTF-specified function signature will
>> + * apply to), we simply skip adding functions which have optimized
>> + * out parameters anywhere.
>> + */
>> +static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
>> +{
>> +    struct btf_encoder *parent = encoder->parent ? encoder->parent : encoder;
>> +    const char *name = function__name(fn);
>> +    struct function **nodep;
>> +    int ret = 0;
>> +
>> +    pthread_mutex_lock(&parent->saved_func_lock);
> 
> Do you have the number of static functions with suffices?
> 

There are a few thousand, and around 25000 static functions
overall ("."-suffixed are all static) that will participate in
the tree representations (see patch 5).  This equates to roughly 
half of the vmlinux BTF functions.

> If the number of static functions with suffices is high, the contention of the lock would be an issue.
> 
> Is it possible to keep a local pool of static functions with suffices? The pool will be combined with its parent either at the completion of a CU, before ending the thread or when merging into the main thread.
>

It's possible alright. I'll try to lay out the possibilities so we
can figure out the best way forward.

Option 1: global tree of static functions, created during DWARF loading

Pro: Quick addition/lookup, we can flag optimizations or inconsistent prototypes as
we encounter them.
Con: Lock contention between encoder threads.

Option 2: store static functions in a per-encoder tree, traverse them all
prior to BTF merging to eliminate unwanted functions

Pro: limits contention.
Con: for each static function in each encoder, we need to look it up in all other
encoder trees. In option 1 we paid that price as the function was added, here
we pay it later on prior to merging. So processing here is 
O(number_functions * num_encoders). There may be a cleverer way to handle
this but I can't see it right now.

There may be other approaches to this of course, but these were the two I
could come up with. What do you think?

Alan

> 
>> +    nodep = tsearch(fn, &parent->saved_func_tree, function__compare);
>> +    if (nodep == NULL) {
>> +        fprintf(stderr, "error: out of memory adding local function '%s'\n",
>> +            name);
>> +        ret = -1;
>> +        goto out;
>> +    }
>> +    /* If we find an existing entry, we want to merge observations
>> +     * across both functions, checking that the "seen optimized-out
>> +     * parameters" status is reflected in our tree entry.
>> +     * If the entry is new, record encoder state required
>> +     * to add the local function later (encoder + type_id_off)
>> +     * such that we can add the function later.
>> +     */
>> +    if (*nodep != fn) {
>> +        (*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
>> +    } else {
>> +        struct btf_encoder_state *state = zalloc(sizeof(*state));
>> +
>> +        if (state == NULL) {
>> +            fprintf(stderr, "error: out of memory adding local function '%s'\n",
>> +                name);
>> +            ret = -1;
>> +            goto out;
>> +        }
>> +        state->encoder = encoder;
>> +        state->type_id_off = encoder->type_id_off;
>> +        fn->priv = state;
>> +        encoder->saved_func_cnt++;
>> +        if (encoder->verbose)
>> +            printf("added local function '%s'%s\n", name,
>> +                   fn->proto.optimized_parms ?
>> +                   ", optimized-out params" : "");
>> +    }
>> +out:
>> +    pthread_mutex_unlock(&parent->saved_func_lock);
>> +    return ret;
>> +}
>> +

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-25 18:28     ` Alan Maguire
@ 2023-01-25 21:34       ` Eduard Zingerman
  2023-01-25 22:52         ` Alan Maguire
  0 siblings, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2023-01-25 21:34 UTC (permalink / raw)
  To: Alan Maguire, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On Wed, 2023-01-25 at 18:28 +0000, Alan Maguire wrote:
> On 25/01/2023 17:47, Eduard Zingerman wrote:
> > On Tue, 2023-01-24 at 13:45 +0000, Alan Maguire wrote:
> > > Compilation generates DWARF at several stages, and often the
> > > later DWARF representations more accurately represent optimizations
> > > that have occurred during compilation.
> > > 
> > > In particular, parameter representations can be spotted by their
> > > abstract origin references to the original parameter, but they
> > > often have more accurate location information.  In most cases,
> > > the parameter locations will match calling conventions, and be
> > > registers for the first 6 parameters on x86_64, first 8 on ARM64
> > > etc.  If the parameter is not a register when it should be however,
> > > it is likely passed via the stack or the compiler has used a
> > > constant representation instead.
> > > 
> > > This change adds a field to parameters and their associated
> > > ftype to note if a parameter has been optimized out.  Having
> > > this information allows us to skip such functions, as their
> > > presence in CUs makes BTF encoding impossible.
> > > 
> > > Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > > ---
> > >  dwarf_loader.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  dwarves.h      |  4 +++-
> > >  2 files changed, 77 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/dwarf_loader.c b/dwarf_loader.c
> > > index 5a74035..0220f1d 100644
> > > --- a/dwarf_loader.c
> > > +++ b/dwarf_loader.c
> > > @@ -992,13 +992,67 @@ static struct class_member *class_member__new(Dwarf_Die *die, struct cu *cu,
> > >  	return member;
> > >  }
> > >  
> > > -static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
> > > +/* How many function parameters are passed via registers?  Used below in
> > > + * determining if an argument has been optimized out or if it is simply
> > > + * an argument > NR_REGISTER_PARAMS.  Setting NR_REGISTER_PARAMS to 0
> > > + * allows unsupported architectures to skip tagging optimized-out
> > > + * values.
> > > + */
> > > +#if defined(__x86_64__)
> > > +#define NR_REGISTER_PARAMS      6
> > > +#elif defined(__s390__)
> > > +#define NR_REGISTER_PARAMS	5
> > > +#elif defined(__aarch64__)
> > > +#define NR_REGISTER_PARAMS      8
> > > +#elif defined(__mips__)
> > > +#define NR_REGISTER_PARAMS	8
> > > +#elif defined(__powerpc__)
> > > +#define NR_REGISTER_PARAMS	8
> > > +#elif defined(__sparc__)
> > > +#define NR_REGISTER_PARAMS	6
> > > +#elif defined(__riscv) && __riscv_xlen == 64
> > > +#define NR_REGISTER_PARAMS	8
> > > +#elif defined(__arc__)
> > > +#define NR_REGISTER_PARAMS	8
> > > +#else
> > > +#define NR_REGISTER_PARAMS      0
> > > +#endif
> > > +
> > > +static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu,
> > > +					struct conf_load *conf, int param_idx)
> > >  {
> > >  	struct parameter *parm = tag__alloc(cu, sizeof(*parm));
> > >  
> > >  	if (parm != NULL) {
> > > +		struct location loc;
> > > +
> > >  		tag__init(&parm->tag, cu, die);
> > >  		parm->name = attr_string(die, DW_AT_name, conf);
> > > +
> > > +		/* Parameters which use DW_AT_abstract_origin to point at
> > > +		 * the original parameter definition (with no name in the DIE)
> > > +		 * are the result of later DWARF generation during compilation
> > > +		 * so often better take into account if arguments were
> > > +		 * optimized out.
> > > +		 *
> > > +		 * By checking that locations for parameters that are expected
> > > +		 * to be passed as registers are actually passed as registers,
> > > +		 * we can spot optimized-out parameters.
> > > +		 */
> > > +		if (param_idx < NR_REGISTER_PARAMS && !parm->name &&
> > > +		    attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
> > > +		    loc.exprlen != 0) {
> > > +			Dwarf_Op *expr = loc.expr;
> > > +
> > > +			switch (expr->atom) {
> > > +			case DW_OP_reg1 ... DW_OP_reg31:
> > > +			case DW_OP_breg0 ... DW_OP_breg31:
> > > +				break;
> > > +			default:
> > > +				parm->optimized = true;
> > > +				break;
> > > +			}
> > > +		}
> > 
> > Hi Alan,
> > 
> > I looked through the DWARF standard and found two relevant entries:
> > 
> > > 4.1.4
> > > 
> > > If no location attribute is present in a variable entry representing
> > > the definition of a variable (...), or if the location attribute is
> > > present but has an empty location description (...), the variable is
> > > assumed to exist in the source code but not in the executable program
> > > (but see number 10, below).
> > 
> > This paragraph implies that parameter name presence or absence is
> > irrelevant, but I don't have any examples when parameter name is
> > present for a removed parameter.
> > 
> > > 4.1.10
> > > 
> > > A DW_AT_const_value attribute for an entry describing a variable or formal
> > > parameter whose value is constant and not represented by an object in the
> > > address space of the program, or an entry describing a named constant. (Note
> > > that such an entry does not have a location attribute.)
> > 
> > For this paragraph I have an example:
> > 
> >     $ cat test.c
> >     __attribute__((noinline))
> >     static int f(int x, int y) {
> >         return x + y;
> >     }
> >     
> >     int main(int argc, char *argv[]) {
> >         return f(1, 2) + f(1, 3);
> >     }
> >     
> >     $ gcc --version | head -n1
> >     gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
> >     $ gcc -O2 -g -c test.c -o test.o
> >     
> > The objdump shows that constant propagation removed the first
> > parameter of the function `f`:
> > 
> >     $ llvm-objdump -d test.o 
> >     
> >     test.o:	file format elf64-x86-64
> >     
> >     Disassembly of section .text:
> >     
> >     0000000000000000 <f.constprop.0>:
> >            0: 8d 47 01                     	leal	0x1(%rdi), %eax
> >            3: c3                           	retq
> >     
> >     Disassembly of section .text.startup:
> >     
> >     0000000000000000 <main>:
> >            0: f3 0f 1e fa                  	endbr64
> >            4: bf 02 00 00 00               	movl	$0x2, %edi
> >            9: e8 00 00 00 00               	callq	0xe <main+0xe>
> >            e: bf 03 00 00 00               	movl	$0x3, %edi
> >           13: 89 c2                        	movl	%eax, %edx
> >           15: e8 00 00 00 00               	callq	0x1a <main+0x1a>
> >           1a: 01 d0                        	addl	%edx, %eax
> >           1c: c3                           	retq
> >     
> > However, the information about this parameter is still present in the DWARF:
> > 
> >     $ llvm-dwarfdump test.o
> >     ...
> >     0x000000c1:   DW_TAG_subprogram
> >                     DW_AT_name	("f")
> >                     DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
> >                     DW_AT_decl_line	(2)
> >                     DW_AT_decl_column	(0x0c)
> >                     DW_AT_prototyped	(true)
> >                     DW_AT_type	(0x000000a9 "int")
> >                     DW_AT_inline	(DW_INL_inlined)
> >                     DW_AT_sibling	(0x000000e1)
> >     
> >     0x000000d0:     DW_TAG_formal_parameter
> >                       DW_AT_name	("x")
> >                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
> >                       DW_AT_decl_line	(2)
> >                       DW_AT_decl_column	(0x12)
> >                       DW_AT_type	(0x000000a9 "int")
> >     
> >     0x000000d8:     DW_TAG_formal_parameter
> >                       DW_AT_name	("y")
> >                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
> >                       DW_AT_decl_line	(2)
> >                       DW_AT_decl_column	(0x19)
> >                       DW_AT_type	(0x000000a9 "int")
> >     
> >     0x000000e0:     NULL
> >     
> >     0x000000e1:   DW_TAG_subprogram
> >                     DW_AT_abstract_origin	(0x000000c1 "f")
> >                     DW_AT_low_pc	(0x0000000000000000)
> >                     DW_AT_high_pc	(0x0000000000000004)
> >                     DW_AT_frame_base	(DW_OP_call_frame_cfa)
> >                     DW_AT_call_all_calls	(true)
> >     
> >     0x000000f8:     DW_TAG_formal_parameter
> >                       DW_AT_abstract_origin	(0x000000d8 "y")
> >                       DW_AT_location	(DW_OP_reg5 RDI)
> >     
> >     0x000000ff:     DW_TAG_formal_parameter
> >                       DW_AT_abstract_origin	(0x000000d0 "x")
> >                       DW_AT_const_value	(0x01)
> >     
> >     0x00000105:     NULL
> >     
> > When I ask pahole with this patch-set applied to generate BTF I see
> > the following output:
> > 
> >     $ pahole --verbose --btf_encode_detached=test.btf test.o
> >     btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
> >     Found 0 per-CPU variables!
> >     Found 2 functions!
> >     File test.o:
> >     [1] INT int size=4 nr_bits=32 encoding=SIGNED
> >     [2] PTR (anon) type_id=3
> >     [3] PTR (anon) type_id=4
> >     [4] INT char size=1 nr_bits=8 encoding=SIGNED
> >     [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
> >     [6] FUNC main type_id=5
> >     matched function 'f' with 'f.constprop.0'
> >     added local function 'f'
> >     matched function 'f' with 'f.constprop.0'
> >     [7] FUNC_PROTO (anon) return=1 args=(1 x, 1 y)
> >     [8] FUNC f type_id=7
> >     
> > Meaning that function `f` had not been skipped.
> > A trivial modification overcomes this:
> > 
> > 		if (param_idx < NR_REGISTER_PARAMS && !parm->name) {
> > 			if (attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
> > 			    loc.exprlen != 0) {
> > 				Dwarf_Op *expr = loc.expr;
> > 
> > 				switch (expr->atom) {
> > 				case DW_OP_reg1 ... DW_OP_reg31:
> > 				case DW_OP_breg0 ... DW_OP_breg31:
> > 					break;
> > 				default:
> > 					parm->optimized = true;
> > 					break;
> > 				}
> > 			} else if (dwarf_attr(die, DW_AT_const_value, &attr) != NULL) {
> > 					parm->optimized = true;
> > 			}
> > 
> > With it pahole seem to work as intended (if I understand the intention correctly):
> > 
> >     $ pahole --verbose --btf_encode_detached=test.btf test.o
> >     btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
> >     Found 0 per-CPU variables!
> >     Found 2 functions!
> >     File test.o:
> >     [1] INT int size=4 nr_bits=32 encoding=SIGNED
> >     [2] PTR (anon) type_id=3
> >     [3] PTR (anon) type_id=4
> >     [4] INT char size=1 nr_bits=8 encoding=SIGNED
> >     [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
> >     [6] FUNC main type_id=5
> >     matched function 'f' with 'f.constprop.0', has optimized-out parameters
> >     added local function 'f', optimized-out params
> >     matched function 'f' with 'f.constprop.0', has optimized-out parameters
> >     skipping addition of 'f' due to optimized-out parameters
> > 
> > wdyt?
> > 
> 
> This is great, thanks Eduard! I can add an additional patch
> for the else clause code above, attributing that to you in v2 if
> you like?
> 
> Alan
> 

More on this topic. I tried the same example but with clang,
DWARF generated by clang differs significantly.

    $ cat test.c
    __attribute__((noinline))
    static int f(int x, int y) {
        return x + y;
    }
    
    int main(int argc, char *argv[]) {
        return f(1, 2) + f(1, 3);
    }
    
    $ clang --version | head -n1
    clang version 16.0.0 (https://github.com/llvm/llvm-project.git 50d4a1f70e111cd41b1a94d95fd06b5691aa2643)
    
    $ clang -O2 -g -c test.c -o test.o

llvm-objdump shows that the first parameter is still optimized out:

    $ llvm-objdump -d test.o 
    
    test.o:	file format elf64-x86-64
    
    Disassembly of section .text:
    
    0000000000000000 <main>:
           0: 53                           	pushq	%rbx
           1: bf 02 00 00 00               	movl	$0x2, %edi
           6: e8 15 00 00 00               	callq	0x20 <f>
           b: 89 c3                        	movl	%eax, %ebx
           d: bf 03 00 00 00               	movl	$0x3, %edi
          12: e8 09 00 00 00               	callq	0x20 <f>
          17: 01 d8                        	addl	%ebx, %eax
          19: 5b                           	popq	%rbx
          1a: c3                           	retq
          1b: 0f 1f 44 00 00               	nopl	(%rax,%rax)
    
    0000000000000020 <f>:
          20: 8d 47 01                     	leal	0x1(%rdi), %eax
          23: c3                           	retq

And here is the DWARF, note that formal parameter has both
`DW_AT_name` and `DW_AT_const_value` attributes:

    $ llvm-dwarfdump test.o
    ...
    0x00000061:   DW_TAG_subprogram
                    DW_AT_low_pc	(0x0000000000000020)
                    DW_AT_high_pc	(0x0000000000000024)
                    DW_AT_frame_base	(DW_OP_reg7 RSP)
                    DW_AT_call_all_calls	(true)
                    DW_AT_name	("f")
                    DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                    DW_AT_decl_line	(2)
                    DW_AT_prototyped	(true)
                    DW_AT_calling_convention	(DW_CC_nocall)
                    DW_AT_type	(0x00000085 "int")
    
    0x00000071:     DW_TAG_formal_parameter
                      DW_AT_const_value	(1)
                      DW_AT_name	("x")
                      DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                      DW_AT_decl_line	(2)
                      DW_AT_type	(0x00000085 "int")
    
    0x0000007a:     DW_TAG_formal_parameter
                      DW_AT_location	(DW_OP_reg5 RDI)
                      DW_AT_name	("y")
                      DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                      DW_AT_decl_line	(2)
                      DW_AT_type	(0x00000085 "int")
    
    0x00000084:     NULL
    ...

Given this DWARF layout pahole does not recognize `x` as optimized out:

    $ pahole --verbose --btf_encode_detached=test.btf test.o
    btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
    Found 0 per-CPU variables!
    Found 2 functions!
    File test.o:
    [1] INT int size=4 nr_bits=32 encoding=SIGNED
    [2] PTR (anon) type_id=3
    [3] PTR (anon) type_id=4
    [4] INT char size=1 nr_bits=8 encoding=SIGNED
    [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
    [6] FUNC main type_id=5
    [7] FUNC_PROTO (anon) return=1 args=(1 x, 1 y)
    [8] FUNC f type_id=7

The way I read paragraph 4.1.4 mentioned before the tag `DW_AT_name`
should not be used to identify whether parameter is optimized out.
Unfortunately trivial modification of the condition in the
`parameter__new()` to remove the `!parm->name` check is not
sufficient. For some reason parameters `x` and `y` are not visited in
`ftype__recode_dwarf_types()` and thus `optimized_parms` field is not set.

Thanks,
Eduard



> > Thanks,
> > Eduard
> > 
> > >  
> > >  	return parm;
> > > @@ -1450,7 +1504,7 @@ static struct tag *die__create_new_parameter(Dwarf_Die *die,
> > >  					     struct cu *cu, struct conf_load *conf,
> > >  					     int param_idx)
> > >  {
> > > -	struct parameter *parm = parameter__new(die, cu, conf);
> > > +	struct parameter *parm = parameter__new(die, cu, conf, param_idx);
> > >  
> > >  	if (parm == NULL)
> > >  		return NULL;
> > > @@ -2209,6 +2263,10 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
> > >  			}
> > >  			pos->name = tag__parameter(dtype->tag)->name;
> > >  			pos->tag.type = dtype->tag->type;
> > > +			if (pos->optimized) {
> > > +				tag__parameter(dtype->tag)->optimized = pos->optimized;
> > > +				type->optimized_parms = 1;
> > > +			}
> > >  			continue;
> > >  		}
> > >  
> > > @@ -2219,6 +2277,20 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
> > >  		}
> > >  		pos->tag.type = dtype->small_id;
> > >  	}
> > > +	/* if parameters were optimized out, set flag for the ftype this
> > > +	 * function tag referred to via abstract origin.
> > > +	 */
> > > +	if (type->optimized_parms) {
> > > +		struct dwarf_tag *dtype = type->tag.priv;
> > > +		struct dwarf_tag *dftype;
> > > +
> > > +		dftype = dwarf_cu__find_tag_by_ref(dcu, &dtype->abstract_origin);
> > > +		if (dftype && dftype->tag) {
> > > +			struct ftype *ftype = tag__ftype(dftype->tag);
> > > +
> > > +			ftype->optimized_parms = 1;
> > > +		}
> > > +	}
> > >  }
> > >  
> > >  static void lexblock__recode_dwarf_types(struct lexblock *tag, struct cu *cu)
> > > diff --git a/dwarves.h b/dwarves.h
> > > index 589588e..1ad1b3b 100644
> > > --- a/dwarves.h
> > > +++ b/dwarves.h
> > > @@ -808,6 +808,7 @@ size_t lexblock__fprintf(const struct lexblock *lexblock, const struct cu *cu,
> > >  struct parameter {
> > >  	struct tag tag;
> > >  	const char *name;
> > > +	bool optimized;
> > >  };
> > >  
> > >  static inline struct parameter *tag__parameter(const struct tag *tag)
> > > @@ -827,7 +828,8 @@ struct ftype {
> > >  	struct tag	 tag;
> > >  	struct list_head parms;
> > >  	uint16_t	 nr_parms;
> > > -	uint8_t		 unspec_parms; /* just one bit is needed */
> > > +	uint8_t		 unspec_parms:1; /* just one bit is needed */
> > > +	uint8_t		 optimized_parms:1;
> > >  };
> > >  
> > >  static inline struct ftype *tag__ftype(const struct tag *tag)
> > 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-25 21:34       ` Eduard Zingerman
@ 2023-01-25 22:52         ` Alan Maguire
  2023-01-25 23:42           ` Eduard Zingerman
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Maguire @ 2023-01-25 22:52 UTC (permalink / raw)
  To: Eduard Zingerman, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On 25/01/2023 21:34, Eduard Zingerman wrote:
> On Wed, 2023-01-25 at 18:28 +0000, Alan Maguire wrote:
>> On 25/01/2023 17:47, Eduard Zingerman wrote:
>>> On Tue, 2023-01-24 at 13:45 +0000, Alan Maguire wrote:
>>>> Compilation generates DWARF at several stages, and often the
>>>> later DWARF representations more accurately represent optimizations
>>>> that have occurred during compilation.
>>>>
>>>> In particular, parameter representations can be spotted by their
>>>> abstract origin references to the original parameter, but they
>>>> often have more accurate location information.  In most cases,
>>>> the parameter locations will match calling conventions, and be
>>>> registers for the first 6 parameters on x86_64, first 8 on ARM64
>>>> etc.  If the parameter is not a register when it should be however,
>>>> it is likely passed via the stack or the compiler has used a
>>>> constant representation instead.
>>>>
>>>> This change adds a field to parameters and their associated
>>>> ftype to note if a parameter has been optimized out.  Having
>>>> this information allows us to skip such functions, as their
>>>> presence in CUs makes BTF encoding impossible.
>>>>
>>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>>> ---
>>>>  dwarf_loader.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>>  dwarves.h      |  4 +++-
>>>>  2 files changed, 77 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/dwarf_loader.c b/dwarf_loader.c
>>>> index 5a74035..0220f1d 100644
>>>> --- a/dwarf_loader.c
>>>> +++ b/dwarf_loader.c
>>>> @@ -992,13 +992,67 @@ static struct class_member *class_member__new(Dwarf_Die *die, struct cu *cu,
>>>>  	return member;
>>>>  }
>>>>  
>>>> -static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu, struct conf_load *conf)
>>>> +/* How many function parameters are passed via registers?  Used below in
>>>> + * determining if an argument has been optimized out or if it is simply
>>>> + * an argument > NR_REGISTER_PARAMS.  Setting NR_REGISTER_PARAMS to 0
>>>> + * allows unsupported architectures to skip tagging optimized-out
>>>> + * values.
>>>> + */
>>>> +#if defined(__x86_64__)
>>>> +#define NR_REGISTER_PARAMS      6
>>>> +#elif defined(__s390__)
>>>> +#define NR_REGISTER_PARAMS	5
>>>> +#elif defined(__aarch64__)
>>>> +#define NR_REGISTER_PARAMS      8
>>>> +#elif defined(__mips__)
>>>> +#define NR_REGISTER_PARAMS	8
>>>> +#elif defined(__powerpc__)
>>>> +#define NR_REGISTER_PARAMS	8
>>>> +#elif defined(__sparc__)
>>>> +#define NR_REGISTER_PARAMS	6
>>>> +#elif defined(__riscv) && __riscv_xlen == 64
>>>> +#define NR_REGISTER_PARAMS	8
>>>> +#elif defined(__arc__)
>>>> +#define NR_REGISTER_PARAMS	8
>>>> +#else
>>>> +#define NR_REGISTER_PARAMS      0
>>>> +#endif
>>>> +
>>>> +static struct parameter *parameter__new(Dwarf_Die *die, struct cu *cu,
>>>> +					struct conf_load *conf, int param_idx)
>>>>  {
>>>>  	struct parameter *parm = tag__alloc(cu, sizeof(*parm));
>>>>  
>>>>  	if (parm != NULL) {
>>>> +		struct location loc;
>>>> +
>>>>  		tag__init(&parm->tag, cu, die);
>>>>  		parm->name = attr_string(die, DW_AT_name, conf);
>>>> +
>>>> +		/* Parameters which use DW_AT_abstract_origin to point at
>>>> +		 * the original parameter definition (with no name in the DIE)
>>>> +		 * are the result of later DWARF generation during compilation
>>>> +		 * so often better take into account if arguments were
>>>> +		 * optimized out.
>>>> +		 *
>>>> +		 * By checking that locations for parameters that are expected
>>>> +		 * to be passed as registers are actually passed as registers,
>>>> +		 * we can spot optimized-out parameters.
>>>> +		 */
>>>> +		if (param_idx < NR_REGISTER_PARAMS && !parm->name &&
>>>> +		    attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
>>>> +		    loc.exprlen != 0) {
>>>> +			Dwarf_Op *expr = loc.expr;
>>>> +
>>>> +			switch (expr->atom) {
>>>> +			case DW_OP_reg1 ... DW_OP_reg31:
>>>> +			case DW_OP_breg0 ... DW_OP_breg31:
>>>> +				break;
>>>> +			default:
>>>> +				parm->optimized = true;
>>>> +				break;
>>>> +			}
>>>> +		}
>>>
>>> Hi Alan,
>>>
>>> I looked through the DWARF standard and found two relevant entries:
>>>
>>>> 4.1.4
>>>>
>>>> If no location attribute is present in a variable entry representing
>>>> the definition of a variable (...), or if the location attribute is
>>>> present but has an empty location description (...), the variable is
>>>> assumed to exist in the source code but not in the executable program
>>>> (but see number 10, below).
>>>
>>> This paragraph implies that parameter name presence or absence is
>>> irrelevant, but I don't have any examples when parameter name is
>>> present for a removed parameter.
>>>
>>>> 4.1.10
>>>>
>>>> A DW_AT_const_value attribute for an entry describing a variable or formal
>>>> parameter whose value is constant and not represented by an object in the
>>>> address space of the program, or an entry describing a named constant. (Note
>>>> that such an entry does not have a location attribute.)
>>>
>>> For this paragraph I have an example:
>>>
>>>     $ cat test.c
>>>     __attribute__((noinline))
>>>     static int f(int x, int y) {
>>>         return x + y;
>>>     }
>>>     
>>>     int main(int argc, char *argv[]) {
>>>         return f(1, 2) + f(1, 3);
>>>     }
>>>     
>>>     $ gcc --version | head -n1
>>>     gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
>>>     $ gcc -O2 -g -c test.c -o test.o
>>>     
>>> The objdump shows that constant propagation removed the first
>>> parameter of the function `f`:
>>>
>>>     $ llvm-objdump -d test.o 
>>>     
>>>     test.o:	file format elf64-x86-64
>>>     
>>>     Disassembly of section .text:
>>>     
>>>     0000000000000000 <f.constprop.0>:
>>>            0: 8d 47 01                     	leal	0x1(%rdi), %eax
>>>            3: c3                           	retq
>>>     
>>>     Disassembly of section .text.startup:
>>>     
>>>     0000000000000000 <main>:
>>>            0: f3 0f 1e fa                  	endbr64
>>>            4: bf 02 00 00 00               	movl	$0x2, %edi
>>>            9: e8 00 00 00 00               	callq	0xe <main+0xe>
>>>            e: bf 03 00 00 00               	movl	$0x3, %edi
>>>           13: 89 c2                        	movl	%eax, %edx
>>>           15: e8 00 00 00 00               	callq	0x1a <main+0x1a>
>>>           1a: 01 d0                        	addl	%edx, %eax
>>>           1c: c3                           	retq
>>>     
>>> However, the information about this parameter is still present in the DWARF:
>>>
>>>     $ llvm-dwarfdump test.o
>>>     ...
>>>     0x000000c1:   DW_TAG_subprogram
>>>                     DW_AT_name	("f")
>>>                     DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>>>                     DW_AT_decl_line	(2)
>>>                     DW_AT_decl_column	(0x0c)
>>>                     DW_AT_prototyped	(true)
>>>                     DW_AT_type	(0x000000a9 "int")
>>>                     DW_AT_inline	(DW_INL_inlined)
>>>                     DW_AT_sibling	(0x000000e1)
>>>     
>>>     0x000000d0:     DW_TAG_formal_parameter
>>>                       DW_AT_name	("x")
>>>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>>>                       DW_AT_decl_line	(2)
>>>                       DW_AT_decl_column	(0x12)
>>>                       DW_AT_type	(0x000000a9 "int")
>>>     
>>>     0x000000d8:     DW_TAG_formal_parameter
>>>                       DW_AT_name	("y")
>>>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>>>                       DW_AT_decl_line	(2)
>>>                       DW_AT_decl_column	(0x19)
>>>                       DW_AT_type	(0x000000a9 "int")
>>>     
>>>     0x000000e0:     NULL
>>>     
>>>     0x000000e1:   DW_TAG_subprogram
>>>                     DW_AT_abstract_origin	(0x000000c1 "f")
>>>                     DW_AT_low_pc	(0x0000000000000000)
>>>                     DW_AT_high_pc	(0x0000000000000004)
>>>                     DW_AT_frame_base	(DW_OP_call_frame_cfa)
>>>                     DW_AT_call_all_calls	(true)
>>>     
>>>     0x000000f8:     DW_TAG_formal_parameter
>>>                       DW_AT_abstract_origin	(0x000000d8 "y")
>>>                       DW_AT_location	(DW_OP_reg5 RDI)
>>>     
>>>     0x000000ff:     DW_TAG_formal_parameter
>>>                       DW_AT_abstract_origin	(0x000000d0 "x")
>>>                       DW_AT_const_value	(0x01)
>>>     
>>>     0x00000105:     NULL
>>>     
>>> When I ask pahole with this patch-set applied to generate BTF I see
>>> the following output:
>>>
>>>     $ pahole --verbose --btf_encode_detached=test.btf test.o
>>>     btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
>>>     Found 0 per-CPU variables!
>>>     Found 2 functions!
>>>     File test.o:
>>>     [1] INT int size=4 nr_bits=32 encoding=SIGNED
>>>     [2] PTR (anon) type_id=3
>>>     [3] PTR (anon) type_id=4
>>>     [4] INT char size=1 nr_bits=8 encoding=SIGNED
>>>     [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
>>>     [6] FUNC main type_id=5
>>>     matched function 'f' with 'f.constprop.0'
>>>     added local function 'f'
>>>     matched function 'f' with 'f.constprop.0'
>>>     [7] FUNC_PROTO (anon) return=1 args=(1 x, 1 y)
>>>     [8] FUNC f type_id=7
>>>     
>>> Meaning that function `f` had not been skipped.
>>> A trivial modification overcomes this:
>>>
>>> 		if (param_idx < NR_REGISTER_PARAMS && !parm->name) {
>>> 			if (attr_location(die, &loc.expr, &loc.exprlen) == 0 &&
>>> 			    loc.exprlen != 0) {
>>> 				Dwarf_Op *expr = loc.expr;
>>>
>>> 				switch (expr->atom) {
>>> 				case DW_OP_reg1 ... DW_OP_reg31:
>>> 				case DW_OP_breg0 ... DW_OP_breg31:
>>> 					break;
>>> 				default:
>>> 					parm->optimized = true;
>>> 					break;
>>> 				}
>>> 			} else if (dwarf_attr(die, DW_AT_const_value, &attr) != NULL) {
>>> 					parm->optimized = true;
>>> 			}
>>>
>>> With it pahole seem to work as intended (if I understand the intention correctly):
>>>
>>>     $ pahole --verbose --btf_encode_detached=test.btf test.o
>>>     btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
>>>     Found 0 per-CPU variables!
>>>     Found 2 functions!
>>>     File test.o:
>>>     [1] INT int size=4 nr_bits=32 encoding=SIGNED
>>>     [2] PTR (anon) type_id=3
>>>     [3] PTR (anon) type_id=4
>>>     [4] INT char size=1 nr_bits=8 encoding=SIGNED
>>>     [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
>>>     [6] FUNC main type_id=5
>>>     matched function 'f' with 'f.constprop.0', has optimized-out parameters
>>>     added local function 'f', optimized-out params
>>>     matched function 'f' with 'f.constprop.0', has optimized-out parameters
>>>     skipping addition of 'f' due to optimized-out parameters
>>>
>>> wdyt?
>>>
>>
>> This is great, thanks Eduard! I can add an additional patch
>> for the else clause code above, attributing that to you in v2 if
>> you like?
>>
>> Alan
>>
> 
> More on this topic. I tried the same example but with clang,
> DWARF generated by clang differs significantly.
> 
>     $ cat test.c
>     __attribute__((noinline))
>     static int f(int x, int y) {
>         return x + y;
>     }
>     
>     int main(int argc, char *argv[]) {
>         return f(1, 2) + f(1, 3);
>     }
>     
>     $ clang --version | head -n1
>     clang version 16.0.0 (https://github.com/llvm/llvm-project.git 50d4a1f70e111cd41b1a94d95fd06b5691aa2643)
>     
>     $ clang -O2 -g -c test.c -o test.o
> 
> llvm-objdump shows that the first parameter is still optimized out:
> 
>     $ llvm-objdump -d test.o 
>     
>     test.o:	file format elf64-x86-64
>     
>     Disassembly of section .text:
>     
>     0000000000000000 <main>:
>            0: 53                           	pushq	%rbx
>            1: bf 02 00 00 00               	movl	$0x2, %edi
>            6: e8 15 00 00 00               	callq	0x20 <f>
>            b: 89 c3                        	movl	%eax, %ebx
>            d: bf 03 00 00 00               	movl	$0x3, %edi
>           12: e8 09 00 00 00               	callq	0x20 <f>
>           17: 01 d8                        	addl	%ebx, %eax
>           19: 5b                           	popq	%rbx
>           1a: c3                           	retq
>           1b: 0f 1f 44 00 00               	nopl	(%rax,%rax)
>     
>     0000000000000020 <f>:
>           20: 8d 47 01                     	leal	0x1(%rdi), %eax
>           23: c3                           	retq
> 
> And here is the DWARF, note that formal parameter has both
> `DW_AT_name` and `DW_AT_const_value` attributes:
> 
>     $ llvm-dwarfdump test.o
>     ...
>     0x00000061:   DW_TAG_subprogram
>                     DW_AT_low_pc	(0x0000000000000020)
>                     DW_AT_high_pc	(0x0000000000000024)
>                     DW_AT_frame_base	(DW_OP_reg7 RSP)
>                     DW_AT_call_all_calls	(true)
>                     DW_AT_name	("f")
>                     DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                     DW_AT_decl_line	(2)
>                     DW_AT_prototyped	(true)
>                     DW_AT_calling_convention	(DW_CC_nocall)
>                     DW_AT_type	(0x00000085 "int")
>     
>     0x00000071:     DW_TAG_formal_parameter
>                       DW_AT_const_value	(1)
>                       DW_AT_name	("x")
>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                       DW_AT_decl_line	(2)
>                       DW_AT_type	(0x00000085 "int")
>     
>     0x0000007a:     DW_TAG_formal_parameter
>                       DW_AT_location	(DW_OP_reg5 RDI)
>                       DW_AT_name	("y")
>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                       DW_AT_decl_line	(2)
>                       DW_AT_type	(0x00000085 "int")
>     
>     0x00000084:     NULL
>     ...
> 
> Given this DWARF layout pahole does not recognize `x` as optimized out:
> 
>     $ pahole --verbose --btf_encode_detached=test.btf test.o
>     btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
>     Found 0 per-CPU variables!
>     Found 2 functions!
>     File test.o:
>     [1] INT int size=4 nr_bits=32 encoding=SIGNED
>     [2] PTR (anon) type_id=3
>     [3] PTR (anon) type_id=4
>     [4] INT char size=1 nr_bits=8 encoding=SIGNED
>     [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
>     [6] FUNC main type_id=5
>     [7] FUNC_PROTO (anon) return=1 args=(1 x, 1 y)
>     [8] FUNC f type_id=7
> 
> The way I read paragraph 4.1.4 mentioned before the tag `DW_AT_name`
> should not be used to identify whether parameter is optimized out.
> Unfortunately trivial modification of the condition in the
> `parameter__new()` to remove the `!parm->name` check is not
> sufficient. For some reason parameters `x` and `y` are not visited in
> `ftype__recode_dwarf_types()` and thus `optimized_parms` field is not set.
> 

Thanks for this - I tried it, and we spot the optimization once we update
die__create_new_parameter() as follows:

diff --git a/dwarf_loader.c b/dwarf_loader.c
index f96b6ff..605ad45 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -1529,6 +1530,8 @@ static struct tag *die__create_new_parameter(Dwarf_Die *di
 
        if (ftype != NULL) {
                ftype__add_parameter(ftype, parm);
+               if (parm->optimized)
+                       ftype->optimized_parms = 1;
                if (param_idx >= 0) {
                        if (add_child_llvm_annotations(die, param_idx, conf, &(t
                                return NULL;


With that change, I see:

$ pahole --verbose --btf_encode_detached=test.btf test.o
btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
Found 0 per-CPU variables!
Found 2 functions!
File test.o:
[1] INT int size=4 nr_bits=32 encoding=SIGNED
[2] PTR (anon) type_id=3
[3] PTR (anon) type_id=4
[4] INT char size=1 nr_bits=8 encoding=SIGNED
[5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
[6] FUNC main type_id=5
added local function 'f', optimized-out params
skipping addition of 'f' due to optimized-out parameters

Thanks!

Alan

> Thanks,
> Eduard
> 
> 
> 
>>> Thanks,
>>> Eduard
>>>
>>>>  
>>>>  	return parm;
>>>> @@ -1450,7 +1504,7 @@ static struct tag *die__create_new_parameter(Dwarf_Die *die,
>>>>  					     struct cu *cu, struct conf_load *conf,
>>>>  					     int param_idx)
>>>>  {
>>>> -	struct parameter *parm = parameter__new(die, cu, conf);
>>>> +	struct parameter *parm = parameter__new(die, cu, conf, param_idx);
>>>>  
>>>>  	if (parm == NULL)
>>>>  		return NULL;
>>>> @@ -2209,6 +2263,10 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>>>>  			}
>>>>  			pos->name = tag__parameter(dtype->tag)->name;
>>>>  			pos->tag.type = dtype->tag->type;
>>>> +			if (pos->optimized) {
>>>> +				tag__parameter(dtype->tag)->optimized = pos->optimized;
>>>> +				type->optimized_parms = 1;
>>>> +			}
>>>>  			continue;
>>>>  		}
>>>>  
>>>> @@ -2219,6 +2277,20 @@ static void ftype__recode_dwarf_types(struct tag *tag, struct cu *cu)
>>>>  		}
>>>>  		pos->tag.type = dtype->small_id;
>>>>  	}
>>>> +	/* if parameters were optimized out, set flag for the ftype this
>>>> +	 * function tag referred to via abstract origin.
>>>> +	 */
>>>> +	if (type->optimized_parms) {
>>>> +		struct dwarf_tag *dtype = type->tag.priv;
>>>> +		struct dwarf_tag *dftype;
>>>> +
>>>> +		dftype = dwarf_cu__find_tag_by_ref(dcu, &dtype->abstract_origin);
>>>> +		if (dftype && dftype->tag) {
>>>> +			struct ftype *ftype = tag__ftype(dftype->tag);
>>>> +
>>>> +			ftype->optimized_parms = 1;
>>>> +		}
>>>> +	}
>>>>  }
>>>>  
>>>>  static void lexblock__recode_dwarf_types(struct lexblock *tag, struct cu *cu)
>>>> diff --git a/dwarves.h b/dwarves.h
>>>> index 589588e..1ad1b3b 100644
>>>> --- a/dwarves.h
>>>> +++ b/dwarves.h
>>>> @@ -808,6 +808,7 @@ size_t lexblock__fprintf(const struct lexblock *lexblock, const struct cu *cu,
>>>>  struct parameter {
>>>>  	struct tag tag;
>>>>  	const char *name;
>>>> +	bool optimized;
>>>>  };
>>>>  
>>>>  static inline struct parameter *tag__parameter(const struct tag *tag)
>>>> @@ -827,7 +828,8 @@ struct ftype {
>>>>  	struct tag	 tag;
>>>>  	struct list_head parms;
>>>>  	uint16_t	 nr_parms;
>>>> -	uint8_t		 unspec_parms; /* just one bit is needed */
>>>> +	uint8_t		 unspec_parms:1; /* just one bit is needed */
>>>> +	uint8_t		 optimized_parms:1;
>>>>  };
>>>>  
>>>>  static inline struct ftype *tag__ftype(const struct tag *tag)
>>>
> 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-25 22:52         ` Alan Maguire
@ 2023-01-25 23:42           ` Eduard Zingerman
  2023-01-26  0:20             ` Eduard Zingerman
  0 siblings, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2023-01-25 23:42 UTC (permalink / raw)
  To: Alan Maguire, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On Wed, 2023-01-25 at 22:52 +0000, Alan Maguire wrote:
[...]
> 
> Thanks for this - I tried it, and we spot the optimization once we update
> die__create_new_parameter() as follows:
> 
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index f96b6ff..605ad45 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -1529,6 +1530,8 @@ static struct tag *die__create_new_parameter(Dwarf_Die *di
>  
>         if (ftype != NULL) {
>                 ftype__add_parameter(ftype, parm);
> +               if (parm->optimized)
> +                       ftype->optimized_parms = 1;
>                 if (param_idx >= 0) {
>                         if (add_child_llvm_annotations(die, param_idx, conf, &(t
>                                 return NULL;
> 

Great, looks good.

> With that change, I see:
> 
> $ pahole --verbose --btf_encode_detached=test.btf test.o
> btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
> Found 0 per-CPU variables!
> Found 2 functions!
> File test.o:
> [1] INT int size=4 nr_bits=32 encoding=SIGNED
> [2] PTR (anon) type_id=3
> [3] PTR (anon) type_id=4
> [4] INT char size=1 nr_bits=8 encoding=SIGNED
> [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
> [6] FUNC main type_id=5
> added local function 'f', optimized-out params
> skipping addition of 'f' due to optimized-out parameters

Sorry, I have one more silly program.

I talked to Yonghong today and we discussed if compiler can change a
type of a function parameter as a result of some optimization.
Consider the following example:

    $ cat test.c
    struct st {
      int a;
      int b;
    };
    
    __attribute__((noinline))
    static int f(struct st *s) {
      return s->a + s->b;
    }
    
    int main(int argc, char *argv[]) {
      struct st s = {
        .a = (long)argv[0],
        .b = (long)argv[1]
      };
      return f(&s);
    }

When compiled by `clang` with -O3 the prototype of `f` is changed from
`int f(struct *st)` to `int f(int, int)`:

    $ clang -O3 -g -c test.c -o test.o && llvm-objdump -d test.o
    ...
    0000000000000000 <main>:
           0: 8b 3e                        	movl	(%rsi), %edi
           2: 8b 76 08                     	movl	0x8(%rsi), %esi
           5: eb 09                        	jmp	0x10 <f>
           7: 66 0f 1f 84 00 00 00 00 00   	nopw	(%rax,%rax)
    
    0000000000000010 <f>:
          10: 8d 04 37                     	leal	(%rdi,%rsi), %eax
          13: c3                           	retq
    
But generated DWARF hides this information:

    $ llvm-dwarfdump test.o
    ...
    0x0000005c:   DW_TAG_subprogram
                    DW_AT_low_pc	(0x0000000000000010)
                    DW_AT_high_pc	(0x0000000000000014)
                    DW_AT_frame_base	(DW_OP_reg7 RSP)
                    DW_AT_call_all_calls	(true)
                    DW_AT_name	("f")
                    DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                    DW_AT_decl_line	(7)
                    DW_AT_prototyped	(true)
                    DW_AT_type	(0x00000074 "int")
    
    0x0000006b:     DW_TAG_formal_parameter
                      DW_AT_name	("s")
                      DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
                      DW_AT_decl_line	(7)
                      DW_AT_type	(0x0000009e "st *")
    
    0x00000073:     NULL
    ...

Is this important?
(gcc does not do this for the particular example, but I don't know if
 it could be tricked to under some conditions).

Thanks,
Eduard

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-25 23:42           ` Eduard Zingerman
@ 2023-01-26  0:20             ` Eduard Zingerman
  2023-01-26 14:02               ` Alan Maguire
  0 siblings, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2023-01-26  0:20 UTC (permalink / raw)
  To: Alan Maguire, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On Thu, 2023-01-26 at 01:42 +0200, Eduard Zingerman wrote:
> On Wed, 2023-01-25 at 22:52 +0000, Alan Maguire wrote:
> [...]
> > 
> > Thanks for this - I tried it, and we spot the optimization once we update
> > die__create_new_parameter() as follows:
> > 
> > diff --git a/dwarf_loader.c b/dwarf_loader.c
> > index f96b6ff..605ad45 100644
> > --- a/dwarf_loader.c
> > +++ b/dwarf_loader.c
> > @@ -1529,6 +1530,8 @@ static struct tag *die__create_new_parameter(Dwarf_Die *di
> >  
> >         if (ftype != NULL) {
> >                 ftype__add_parameter(ftype, parm);
> > +               if (parm->optimized)
> > +                       ftype->optimized_parms = 1;
> >                 if (param_idx >= 0) {
> >                         if (add_child_llvm_annotations(die, param_idx, conf, &(t
> >                                 return NULL;
> > 
> 
> Great, looks good.
> 
> > With that change, I see:
> > 
> > $ pahole --verbose --btf_encode_detached=test.btf test.o
> > btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
> > Found 0 per-CPU variables!
> > Found 2 functions!
> > File test.o:
> > [1] INT int size=4 nr_bits=32 encoding=SIGNED
> > [2] PTR (anon) type_id=3
> > [3] PTR (anon) type_id=4
> > [4] INT char size=1 nr_bits=8 encoding=SIGNED
> > [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
> > [6] FUNC main type_id=5
> > added local function 'f', optimized-out params
> > skipping addition of 'f' due to optimized-out parameters
> 
> Sorry, I have one more silly program.
> 
> I talked to Yonghong today and we discussed if compiler can change a
> type of a function parameter as a result of some optimization.
> Consider the following example:
> 
>     $ cat test.c
>     struct st {
>       int a;
>       int b;
>     };
>     
>     __attribute__((noinline))
>     static int f(struct st *s) {
>       return s->a + s->b;
>     }
>     
>     int main(int argc, char *argv[]) {
>       struct st s = {
>         .a = (long)argv[0],
>         .b = (long)argv[1]
>       };
>       return f(&s);
>     }
> 
> When compiled by `clang` with -O3 the prototype of `f` is changed from
> `int f(struct *st)` to `int f(int, int)`:
> 
>     $ clang -O3 -g -c test.c -o test.o && llvm-objdump -d test.o
>     ...
>     0000000000000000 <main>:
>            0: 8b 3e                        	movl	(%rsi), %edi
>            2: 8b 76 08                     	movl	0x8(%rsi), %esi
>            5: eb 09                        	jmp	0x10 <f>
>            7: 66 0f 1f 84 00 00 00 00 00   	nopw	(%rax,%rax)
>     
>     0000000000000010 <f>:
>           10: 8d 04 37                     	leal	(%rdi,%rsi), %eax
>           13: c3                           	retq
>     
> But generated DWARF hides this information:

Actually, I'm not correct. The information is present because
`DW_AT_location` attribute is not present (just as 4.1.4 says).
So I think that the condition for optimized parameters detection has
to be adjusted one more time:

			has_location = attr_location(die, &loc.expr, &loc.exprlen) == 0;
			has_const_value = dwarf_attr(die, DW_AT_const_value, &attr) != NULL;

			if (has_location && loc.exprlen != 0) {
				Dwarf_Op *expr = loc.expr;

				switch (expr->atom) {
				case DW_OP_reg1 ... DW_OP_reg31:
				case DW_OP_breg0 ... DW_OP_breg31:
					break;
				default:
					parm->optimized = true;
					break;
				}
			} else if (!has_location || has_const_value) {
				parm->optimized = true;
			}

(But again, the parameter is marked as optimized but the function is
 not skipped in the final BTF, so either I applied our last change
 incorrectly or something additional should be done).
 
wdyt?

>     $ llvm-dwarfdump test.o
>     ...
>     0x0000005c:   DW_TAG_subprogram
>                     DW_AT_low_pc	(0x0000000000000010)
>                     DW_AT_high_pc	(0x0000000000000014)
>                     DW_AT_frame_base	(DW_OP_reg7 RSP)
>                     DW_AT_call_all_calls	(true)
>                     DW_AT_name	("f")
>                     DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                     DW_AT_decl_line	(7)
>                     DW_AT_prototyped	(true)
>                     DW_AT_type	(0x00000074 "int")
>     
>     0x0000006b:     DW_TAG_formal_parameter
>                       DW_AT_name	("s")
>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>                       DW_AT_decl_line	(7)
>                       DW_AT_type	(0x0000009e "st *")
>     
>     0x00000073:     NULL
>     ...
> 
> Is this important?
> (gcc does not do this for the particular example, but I don't know if
>  it could be tricked to under some conditions).
> 
> Thanks,
> Eduard
> 
> [...]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-26  0:20             ` Eduard Zingerman
@ 2023-01-26 14:02               ` Alan Maguire
  2023-01-26 15:02                 ` Eduard Zingerman
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Maguire @ 2023-01-26 14:02 UTC (permalink / raw)
  To: Eduard Zingerman, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On 26/01/2023 00:20, Eduard Zingerman wrote:
> On Thu, 2023-01-26 at 01:42 +0200, Eduard Zingerman wrote:
>> On Wed, 2023-01-25 at 22:52 +0000, Alan Maguire wrote:
>> [...]
>>>
>>> Thanks for this - I tried it, and we spot the optimization once we update
>>> die__create_new_parameter() as follows:
>>>
>>> diff --git a/dwarf_loader.c b/dwarf_loader.c
>>> index f96b6ff..605ad45 100644
>>> --- a/dwarf_loader.c
>>> +++ b/dwarf_loader.c
>>> @@ -1529,6 +1530,8 @@ static struct tag *die__create_new_parameter(Dwarf_Die *di
>>>  
>>>         if (ftype != NULL) {
>>>                 ftype__add_parameter(ftype, parm);
>>> +               if (parm->optimized)
>>> +                       ftype->optimized_parms = 1;
>>>                 if (param_idx >= 0) {
>>>                         if (add_child_llvm_annotations(die, param_idx, conf, &(t
>>>                                 return NULL;
>>>
>>
>> Great, looks good.
>>
>>> With that change, I see:
>>>
>>> $ pahole --verbose --btf_encode_detached=test.btf test.o
>>> btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
>>> Found 0 per-CPU variables!
>>> Found 2 functions!
>>> File test.o:
>>> [1] INT int size=4 nr_bits=32 encoding=SIGNED
>>> [2] PTR (anon) type_id=3
>>> [3] PTR (anon) type_id=4
>>> [4] INT char size=1 nr_bits=8 encoding=SIGNED
>>> [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
>>> [6] FUNC main type_id=5
>>> added local function 'f', optimized-out params
>>> skipping addition of 'f' due to optimized-out parameters
>>
>> Sorry, I have one more silly program.
>>
>> I talked to Yonghong today and we discussed if compiler can change a
>> type of a function parameter as a result of some optimization.
>> Consider the following example:
>>
>>     $ cat test.c
>>     struct st {
>>       int a;
>>       int b;
>>     };
>>     
>>     __attribute__((noinline))
>>     static int f(struct st *s) {
>>       return s->a + s->b;
>>     }
>>     
>>     int main(int argc, char *argv[]) {
>>       struct st s = {
>>         .a = (long)argv[0],
>>         .b = (long)argv[1]
>>       };
>>       return f(&s);
>>     }
>>
>> When compiled by `clang` with -O3 the prototype of `f` is changed from
>> `int f(struct *st)` to `int f(int, int)`:
>>
>>     $ clang -O3 -g -c test.c -o test.o && llvm-objdump -d test.o
>>     ...
>>     0000000000000000 <main>:
>>            0: 8b 3e                        	movl	(%rsi), %edi
>>            2: 8b 76 08                     	movl	0x8(%rsi), %esi
>>            5: eb 09                        	jmp	0x10 <f>
>>            7: 66 0f 1f 84 00 00 00 00 00   	nopw	(%rax,%rax)
>>     
>>     0000000000000010 <f>:
>>           10: 8d 04 37                     	leal	(%rdi,%rsi), %eax
>>           13: c3                           	retq
>>     
>> But generated DWARF hides this information:
> 
> Actually, I'm not correct. The information is present because
> `DW_AT_location` attribute is not present (just as 4.1.4 says).
> So I think that the condition for optimized parameters detection has
> to be adjusted one more time:
> 
> 			has_location = attr_location(die, &loc.expr, &loc.exprlen) == 0;
> 			has_const_value = dwarf_attr(die, DW_AT_const_value, &attr) != NULL;
> 
> 			if (has_location && loc.exprlen != 0) {
> 				Dwarf_Op *expr = loc.expr;
> 
> 				switch (expr->atom) {
> 				case DW_OP_reg1 ... DW_OP_reg31:
> 				case DW_OP_breg0 ... DW_OP_breg31:
> 					break;
> 				default:
> 					parm->optimized = true;
> 					break;
> 				}
> 			} else if (!has_location || has_const_value) {
> 				parm->optimized = true;
> 			}
> 
> (But again, the parameter is marked as optimized but the function is
>  not skipped in the final BTF, so either I applied our last change
>  incorrectly or something additional should be done).
>  
> wdyt?

I've been digging into this a bit, and the issue here is that for 
gcc-generated DWARF at least, location info is often in the abstract 
origin parameter references, so we have to combine observations across
abstract origin reference and original parameter to determine for sure
if the parameter really is missing location information. In the
case of this program there are no abstract origin references, so
it's a bit more straightforward, but we have to handle both cases
I think.

I'll try and polish up a v2 series that incorporates this shortly;
in testing it, it works on this case as desired I think:

LLVM_OBJCOPY=objcopy pahole --verbose -J ~/src/isra2/test2.o
btf_encoder__new: '/home/alan/src/isra2/test2.o' doesn't have '.data..percpu' section
Found 0 per-CPU variables!
Found 13 functions!
File /home/alan/src/isra2/test2.o:
[1] INT long size=8 nr_bits=64 encoding=SIGNED
[2] INT int size=4 nr_bits=32 encoding=SIGNED
[3] PTR (anon) type_id=4
[4] PTR (anon) type_id=5
[5] INT char size=1 nr_bits=8 encoding=SIGNED
[6] STRUCT st size=8
	a type_id=2 bits_offset=0
	b type_id=2 bits_offset=32
[7] PTR (anon) type_id=6
[8] FUNC_PROTO (anon) return=2 args=(2 argc, 3 argv)
[9] FUNC main type_id=8
added local function 'f', optimized-out params
skipping addition of 'f' due to optimized-out parameters

Thanks!

Alan

> 
>>     $ llvm-dwarfdump test.o
>>     ...
>>     0x0000005c:   DW_TAG_subprogram
>>                     DW_AT_low_pc	(0x0000000000000010)
>>                     DW_AT_high_pc	(0x0000000000000014)
>>                     DW_AT_frame_base	(DW_OP_reg7 RSP)
>>                     DW_AT_call_all_calls	(true)
>>                     DW_AT_name	("f")
>>                     DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>>                     DW_AT_decl_line	(7)
>>                     DW_AT_prototyped	(true)
>>                     DW_AT_type	(0x00000074 "int")
>>     
>>     0x0000006b:     DW_TAG_formal_parameter
>>                       DW_AT_name	("s")
>>                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
>>                       DW_AT_decl_line	(7)
>>                       DW_AT_type	(0x0000009e "st *")
>>     
>>     0x00000073:     NULL
>>     ...
>>
>> Is this important?
>> (gcc does not do this for the particular example, but I don't know if
>>  it could be tricked to under some conditions).
>>
>> Thanks,
>> Eduard
>>
>> [...]
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes
  2023-01-25 16:53   ` Jiri Olsa
@ 2023-01-26 14:12     ` Alan Maguire
  0 siblings, 0 replies; 27+ messages in thread
From: Alan Maguire @ 2023-01-26 14:12 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, yhs, ast, timo, daniel, andrii, songliubraving,
	john.fastabend, kpsingh, sdf, haoluo, martin.lau, bpf

On 25/01/2023 16:53, Jiri Olsa wrote:
> On Tue, Jan 24, 2023 at 01:45:31PM +0000, Alan Maguire wrote:
> 
> SNIP
> 
>>  	} else {
>>  		struct btf_encoder_state *state = zalloc(sizeof(*state));
>>  
>> @@ -898,10 +954,12 @@ static void btf_encoder__add_saved_func(const void *nodep, const VISIT which,
>>  	/* we can safely free encoder state since we visit each node once */
>>  	free(fn->priv);
>>  	fn->priv = NULL;
>> -	if (fn->proto.optimized_parms) {
>> +	if (fn->proto.optimized_parms || fn->proto.inconsistent_proto) {
>>  		if (encoder->verbose)
>> -			printf("skipping addition of '%s' due to optimized-out parameters\n",
>> -			       function__name(fn));
>> +			printf("skipping addition of '%s' due to %s\n",
>> +			       function__name(fn),
>> +			       fn->proto.optimized_parms ? "optimized-out parameters" :
>> +							   "multiple inconsistent function prototypes");
>>  	} else {
>>  		btf_encoder__add_func(encoder, fn);
>>  	}
>> @@ -1775,6 +1833,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>>  		 */
>>  		if (fn->declaration)
>>  			continue;
>> +		if (!fn->external)
>> +			save = true;
> 
> how about conflicts between static and global functions,
> I can see still few cases like:
> 
>   void __init msg_init(void)
>   static void msg_init(struct spi_message *m, struct spi_transfer *x,
>                        u8 *addr, size_t count, char *tx, char *rx)
> 
>   static inline void free_pgtable_page(u64 *pt)
>   void free_pgtable_page(void *vaddr)
> 
>   STATIC inline long INIT parse_header(u8 *input, long *skip, long in_len)
>   static struct tb_cfg_result parse_header(const struct ctl_pkg *pkg, u32 len,
>                                          enum tb_cfg_pkg_type type, u64 route)
>   static void __init parse_header(char *s)
> 
>

great catch; I hadn't thought of this at all. Looks like it is often
the case that the first function found will actually end up in BTF
currently, so sometimes we get the static, sometimes the non-static.
I'm seeing the same list as above.

> could we enable the check/save globaly?
>

I think we could, though at the additional cost of a larger tree
of functions. I can't see another way to handle it though right
now.

Alan

 
> jirka
> 
>>  		if (!ftype__has_arg_names(&fn->proto))
>>  			continue;
>>  		if (encoder->functions.cnt) {
>> @@ -1790,7 +1850,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct co
>>  			if (func) {
>>  				if (func->generated)
>>  					continue;
>> -				func->generated = true;
>> +				if (!save)
>> +					func->generated = true;
>>  			} else if (encoder->functions.suffix_cnt) {
>>  				/* falling back to name.isra.0 match if no exact
>>  				 * match is found; only bother if we found any
>> diff --git a/dwarves.h b/dwarves.h
>> index 1ad1b3b..9b80262 100644
>> --- a/dwarves.h
>> +++ b/dwarves.h
>> @@ -830,6 +830,7 @@ struct ftype {
>>  	uint16_t	 nr_parms;
>>  	uint8_t		 unspec_parms:1; /* just one bit is needed */
>>  	uint8_t		 optimized_parms:1;
>> +	uint8_t		 inconsistent_proto:1;
>>  };
>>  
>>  static inline struct ftype *tag__ftype(const struct tag *tag)
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters
  2023-01-26 14:02               ` Alan Maguire
@ 2023-01-26 15:02                 ` Eduard Zingerman
  0 siblings, 0 replies; 27+ messages in thread
From: Eduard Zingerman @ 2023-01-26 15:02 UTC (permalink / raw)
  To: Alan Maguire, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf

On Thu, 2023-01-26 at 14:02 +0000, Alan Maguire wrote:
> On 26/01/2023 00:20, Eduard Zingerman wrote:
> > On Thu, 2023-01-26 at 01:42 +0200, Eduard Zingerman wrote:
> > > On Wed, 2023-01-25 at 22:52 +0000, Alan Maguire wrote:
> > > [...]
> > > > 
> > > > Thanks for this - I tried it, and we spot the optimization once we update
> > > > die__create_new_parameter() as follows:
> > > > 
> > > > diff --git a/dwarf_loader.c b/dwarf_loader.c
> > > > index f96b6ff..605ad45 100644
> > > > --- a/dwarf_loader.c
> > > > +++ b/dwarf_loader.c
> > > > @@ -1529,6 +1530,8 @@ static struct tag *die__create_new_parameter(Dwarf_Die *di
> > > >  
> > > >         if (ftype != NULL) {
> > > >                 ftype__add_parameter(ftype, parm);
> > > > +               if (parm->optimized)
> > > > +                       ftype->optimized_parms = 1;
> > > >                 if (param_idx >= 0) {
> > > >                         if (add_child_llvm_annotations(die, param_idx, conf, &(t
> > > >                                 return NULL;
> > > > 
> > > 
> > > Great, looks good.
> > > 
> > > > With that change, I see:
> > > > 
> > > > $ pahole --verbose --btf_encode_detached=test.btf test.o
> > > > btf_encoder__new: 'test.o' doesn't have '.data..percpu' section
> > > > Found 0 per-CPU variables!
> > > > Found 2 functions!
> > > > File test.o:
> > > > [1] INT int size=4 nr_bits=32 encoding=SIGNED
> > > > [2] PTR (anon) type_id=3
> > > > [3] PTR (anon) type_id=4
> > > > [4] INT char size=1 nr_bits=8 encoding=SIGNED
> > > > [5] FUNC_PROTO (anon) return=1 args=(1 argc, 2 argv)
> > > > [6] FUNC main type_id=5
> > > > added local function 'f', optimized-out params
> > > > skipping addition of 'f' due to optimized-out parameters
> > > 
> > > Sorry, I have one more silly program.
> > > 
> > > I talked to Yonghong today and we discussed if compiler can change a
> > > type of a function parameter as a result of some optimization.
> > > Consider the following example:
> > > 
> > >     $ cat test.c
> > >     struct st {
> > >       int a;
> > >       int b;
> > >     };
> > >     
> > >     __attribute__((noinline))
> > >     static int f(struct st *s) {
> > >       return s->a + s->b;
> > >     }
> > >     
> > >     int main(int argc, char *argv[]) {
> > >       struct st s = {
> > >         .a = (long)argv[0],
> > >         .b = (long)argv[1]
> > >       };
> > >       return f(&s);
> > >     }
> > > 
> > > When compiled by `clang` with -O3 the prototype of `f` is changed from
> > > `int f(struct *st)` to `int f(int, int)`:
> > > 
> > >     $ clang -O3 -g -c test.c -o test.o && llvm-objdump -d test.o
> > >     ...
> > >     0000000000000000 <main>:
> > >            0: 8b 3e                        	movl	(%rsi), %edi
> > >            2: 8b 76 08                     	movl	0x8(%rsi), %esi
> > >            5: eb 09                        	jmp	0x10 <f>
> > >            7: 66 0f 1f 84 00 00 00 00 00   	nopw	(%rax,%rax)
> > >     
> > >     0000000000000010 <f>:
> > >           10: 8d 04 37                     	leal	(%rdi,%rsi), %eax
> > >           13: c3                           	retq
> > >     
> > > But generated DWARF hides this information:
> > 
> > Actually, I'm not correct. The information is present because
> > `DW_AT_location` attribute is not present (just as 4.1.4 says).
> > So I think that the condition for optimized parameters detection has
> > to be adjusted one more time:
> > 
> > 			has_location = attr_location(die, &loc.expr, &loc.exprlen) == 0;
> > 			has_const_value = dwarf_attr(die, DW_AT_const_value, &attr) != NULL;
> > 
> > 			if (has_location && loc.exprlen != 0) {
> > 				Dwarf_Op *expr = loc.expr;
> > 
> > 				switch (expr->atom) {
> > 				case DW_OP_reg1 ... DW_OP_reg31:
> > 				case DW_OP_breg0 ... DW_OP_breg31:
> > 					break;
> > 				default:
> > 					parm->optimized = true;
> > 					break;
> > 				}
> > 			} else if (!has_location || has_const_value) {
> > 				parm->optimized = true;
> > 			}
> > 
> > (But again, the parameter is marked as optimized but the function is
> >  not skipped in the final BTF, so either I applied our last change
> >  incorrectly or something additional should be done).
> >  
> > wdyt?
> 
> I've been digging into this a bit, and the issue here is that for 
> gcc-generated DWARF at least, location info is often in the abstract 
> origin parameter references, so we have to combine observations across
> abstract origin reference and original parameter to determine for sure
> if the parameter really is missing location information. In the
> case of this program there are no abstract origin references, so
> it's a bit more straightforward, but we have to handle both cases
> I think.

Is it safe it ignore DW_TAG_subprogram's with DW_AT_abstract_origin's
and thus avoid the combine logic?
The way I read standard it looks like DW_AT_abstract_origin is only
present for instances that undergo some optimization.

> 
> I'll try and polish up a v2 series that incorporates this shortly;
> in testing it, it works on this case as desired I think:
> 
> LLVM_OBJCOPY=objcopy pahole --verbose -J ~/src/isra2/test2.o
> btf_encoder__new: '/home/alan/src/isra2/test2.o' doesn't have '.data..percpu' section
> Found 0 per-CPU variables!
> Found 13 functions!
> File /home/alan/src/isra2/test2.o:
> [1] INT long size=8 nr_bits=64 encoding=SIGNED
> [2] INT int size=4 nr_bits=32 encoding=SIGNED
> [3] PTR (anon) type_id=4
> [4] PTR (anon) type_id=5
> [5] INT char size=1 nr_bits=8 encoding=SIGNED
> [6] STRUCT st size=8
> 	a type_id=2 bits_offset=0
> 	b type_id=2 bits_offset=32
> [7] PTR (anon) type_id=6
> [8] FUNC_PROTO (anon) return=2 args=(2 argc, 3 argv)
> [9] FUNC main type_id=8
> added local function 'f', optimized-out params
> skipping addition of 'f' due to optimized-out parameters
> 
> Thanks!
> 
> Alan
> 
> > 
> > >     $ llvm-dwarfdump test.o
> > >     ...
> > >     0x0000005c:   DW_TAG_subprogram
> > >                     DW_AT_low_pc	(0x0000000000000010)
> > >                     DW_AT_high_pc	(0x0000000000000014)
> > >                     DW_AT_frame_base	(DW_OP_reg7 RSP)
> > >                     DW_AT_call_all_calls	(true)
> > >                     DW_AT_name	("f")
> > >                     DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
> > >                     DW_AT_decl_line	(7)
> > >                     DW_AT_prototyped	(true)
> > >                     DW_AT_type	(0x00000074 "int")
> > >     
> > >     0x0000006b:     DW_TAG_formal_parameter
> > >                       DW_AT_name	("s")
> > >                       DW_AT_decl_file	("/home/eddy/work/tmp/test.c")
> > >                       DW_AT_decl_line	(7)
> > >                       DW_AT_type	(0x0000009e "st *")
> > >     
> > >     0x00000073:     NULL
> > >     ...
> > > 
> > > Is this important?
> > > (gcc does not do this for the particular example, but I don't know if
> > >  it could be tricked to under some conditions).
> > > 
> > > Thanks,
> > > Eduard
> > > 
> > > [...]
> > 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF
  2023-01-25 18:59     ` Alan Maguire
@ 2023-01-26 17:43       ` Kui-Feng Lee
  0 siblings, 0 replies; 27+ messages in thread
From: Kui-Feng Lee @ 2023-01-26 17:43 UTC (permalink / raw)
  To: Alan Maguire, acme, yhs, ast, olsajiri, timo
  Cc: daniel, andrii, songliubraving, john.fastabend, kpsingh, sdf,
	haoluo, martin.lau, bpf


On 1/25/23 10:59, Alan Maguire wrote:
> On 25/01/2023 17:54, Kui-Feng Lee wrote:
>> On 1/24/23 05:45, Alan Maguire wrote:
>>> +/*
>>> + * static functions with suffixes are not added yet - we need to
>>> + * observe across all CUs to see if the static function has
>>> + * optimized parameters in any CU, since in such a case it should
>>> + * not be included in the final BTF.  NF_HOOK.constprop.0() is
>>> + * a case in point - it has optimized-out parameters in some CUs
>>> + * but not others.  In order to have consistency (since we do not
>>> + * know which instance the BTF-specified function signature will
>>> + * apply to), we simply skip adding functions which have optimized
>>> + * out parameters anywhere.
>>> + */
>>> +static int32_t btf_encoder__save_func(struct btf_encoder *encoder, struct function *fn)
>>> +{
>>> +    struct btf_encoder *parent = encoder->parent ? encoder->parent : encoder;
>>> +    const char *name = function__name(fn);
>>> +    struct function **nodep;
>>> +    int ret = 0;
>>> +
>>> +    pthread_mutex_lock(&parent->saved_func_lock);
>> Do you have the number of static functions with suffices?
>>
> There are a few thousand, and around 25000 static functions
> overall ("."-suffixed are all static) that will participate in
> the tree representations (see patch 5).  This equates to roughly
> half of the vmlinux BTF functions.


To evaluate the effectiveness of your patchset, I conducted an 
experiment where I ran a command:

`time env LLVM_OBJCOPY=objcopy pahole -J --btf_gen_floats 
--lang_exclude=rust -j .tmp_vmlinux.btf`.


On my machine, it took about

  - 9s w/o the patchset (3s waiting for the worker threads)

  - 13s w/ the patchset (7s waiting for the worker threads)

It was about 4s difference.

If I turned multi-threading off (w/o -j), it took

  - 28s w/o the patchset.

  - 32s w/ the patchset.

It was about 4s difference as sell.


Hence, multi-threading does not benefit us in the instance of this 
patchset. Lock contention should be taken into account heavily here. 
Approximately 4% of the time is spent when executing a Linux incremental 
build (about 96s~108s) with an insignificant modification to the source 
tree for about four seconds.


Taking into consideration the previous experience that shows a reduction 
in BTF info processing time (not including loading and IO) to 13%, I am 
uncertain if it pays off to invest my time towards reducing 4s to <1s. 
Though, cutting down 3 seconds every single time I need to rebuild the 
tree for some small changes might be worth it.


>
>> If the number of static functions with suffices is high, the contention of the lock would be an issue.
>>
>> Is it possible to keep a local pool of static functions with suffices? The pool will be combined with its parent either at the completion of a CU, before ending the thread or when merging into the main thread.
>>
> It's possible alright. I'll try to lay out the possibilities so we
> can figure out the best way forward.
>
> Option 1: global tree of static functions, created during DWARF loading
>
> Pro: Quick addition/lookup, we can flag optimizations or inconsistent prototypes as
> we encounter them.
> Con: Lock contention between encoder threads.
>
> Option 2: store static functions in a per-encoder tree, traverse them all
> prior to BTF merging to eliminate unwanted functions
>
> Pro: limits contention.
> Con: for each static function in each encoder, we need to look it up in all other
> encoder trees. In option 1 we paid that price as the function was added, here
> we pay it later on prior to merging. So processing here is
> O(number_functions * num_encoders). There may be a cleverer way to handle
> this but I can't see it right now.
>
> There may be other approaches to this of course, but these were the two I
> could come up with. What do you think?


Option 2 appears to be the more convenient and effective solution, 
whereas Option 1, I guess, will require considerable effort for a 
successful outcome.


> Alan
>
>>> +    nodep = tsearch(fn, &parent->saved_func_tree, function__compare);
>>> +    if (nodep == NULL) {
>>> +        fprintf(stderr, "error: out of memory adding local function '%s'\n",
>>> +            name);
>>> +        ret = -1;
>>> +        goto out;
>>> +    }
>>> +    /* If we find an existing entry, we want to merge observations
>>> +     * across both functions, checking that the "seen optimized-out
>>> +     * parameters" status is reflected in our tree entry.
>>> +     * If the entry is new, record encoder state required
>>> +     * to add the local function later (encoder + type_id_off)
>>> +     * such that we can add the function later.
>>> +     */
>>> +    if (*nodep != fn) {
>>> +        (*nodep)->proto.optimized_parms |= fn->proto.optimized_parms;
>>> +    } else {
>>> +        struct btf_encoder_state *state = zalloc(sizeof(*state));
>>> +
>>> +        if (state == NULL) {
>>> +            fprintf(stderr, "error: out of memory adding local function '%s'\n",
>>> +                name);
>>> +            ret = -1;
>>> +            goto out;
>>> +        }
>>> +        state->encoder = encoder;
>>> +        state->type_id_off = encoder->type_id_off;
>>> +        fn->priv = state;
>>> +        encoder->saved_func_cnt++;
>>> +        if (encoder->verbose)
>>> +            printf("added local function '%s'%s\n", name,
>>> +                   fn->proto.optimized_parms ?
>>> +                   ", optimized-out params" : "");
>>> +    }
>>> +out:
>>> +    pthread_mutex_unlock(&parent->saved_func_lock);
>>> +    return ret;
>>> +}
>>> +

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF
  2023-01-25 18:56     ` Arnaldo Carvalho de Melo
@ 2023-01-26 18:37       ` Kui-Feng Lee
  0 siblings, 0 replies; 27+ messages in thread
From: Kui-Feng Lee @ 2023-01-26 18:37 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alan Maguire, yhs, ast, olsajiri, timo, daniel, andrii,
	songliubraving, john.fastabend, kpsingh, sdf, haoluo, martin.lau,
	bpf


On 1/25/23 10:56, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jan 25, 2023 at 09:54:26AM -0800, Kui-Feng Lee escreveu:
>> On 1/24/23 05:45, Alan Maguire wrote:
>>
>   ... skip ...
>> If the number of static functions with suffices is high, the contention of
>> the lock would be an issue.
>>
>> Is it possible to keep a local pool of static functions with suffices? The
>> pool will be combined with its parent either at the completion of a CU,
>> before ending the thread or when merging into the main thread.
> May help, but I think maybe premature optimization is the root of... :-)


It is true.  However, we already encountered the lock contention issue

when doing parallelization previously.  It is very likely to have the same

issue here.


FYI, I did some tests and had numbers in another thread.

https://lore.kernel.org/bpf/9d2a5966-7cef-0c35-8990-368fc6de930d@gmail.com/



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-01-26 18:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-24 13:45 [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Alan Maguire
2023-01-24 13:45 ` [PATCH dwarves 1/5] dwarves: help dwarf loader spot functions with optimized-out parameters Alan Maguire
2023-01-25 16:53   ` Jiri Olsa
2023-01-25 17:47   ` Eduard Zingerman
2023-01-25 18:28     ` Alan Maguire
2023-01-25 21:34       ` Eduard Zingerman
2023-01-25 22:52         ` Alan Maguire
2023-01-25 23:42           ` Eduard Zingerman
2023-01-26  0:20             ` Eduard Zingerman
2023-01-26 14:02               ` Alan Maguire
2023-01-26 15:02                 ` Eduard Zingerman
2023-01-24 13:45 ` [PATCH dwarves 2/5] btf_encoder: refactor function addition into dedicated btf_encoder__add_func Alan Maguire
2023-01-24 13:45 ` [PATCH dwarves 3/5] btf_encoder: child encoders should have a reference to parent encoder Alan Maguire
2023-01-24 13:45 ` [PATCH dwarves 4/5] btf_encoder: represent "."-suffixed optimized functions (".isra.0") in BTF Alan Maguire
2023-01-25 17:54   ` Kui-Feng Lee
2023-01-25 18:56     ` Arnaldo Carvalho de Melo
2023-01-26 18:37       ` Kui-Feng Lee
2023-01-25 18:59     ` Alan Maguire
2023-01-26 17:43       ` Kui-Feng Lee
2023-01-24 13:45 ` [PATCH dwarves 5/5] btf_encoder: skip BTF encoding of static functions with inconsistent prototypes Alan Maguire
2023-01-25 13:39   ` Jiri Olsa
2023-01-25 14:18     ` Alan Maguire
2023-01-25 16:53   ` Jiri Olsa
2023-01-26 14:12     ` Alan Maguire
2023-01-24 15:14 ` [PATCH dwarves 0/5] dwarves: support encoding of optimized-out parameters, removal of inconsistent static functions Jiri Olsa
2023-01-24 16:11   ` Alan Maguire
2023-01-25 13:59     ` Jiri Olsa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.