dwarves.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/12]
@ 2024-04-12 21:15 Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
                   ` (11 more replies)
  0 siblings, 12 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh, Eduard Zingerman

   	This allows us to have reproducible builds while keeping the
DWARF loading phase in parallel, achieving a noticeable speedup as
showed in the commit log messages.

- Arnaldo

v2: Don't delete the CUs after encoding BTF for them, we may still have
still references to them from the BTF encoder, as noticed by Alan
Maguire.


Arnaldo Carvalho de Melo (12):
  core: Allow asking for a reproducible build
  pahole: Disable BTF multithreaded encoded when doing reproducible
    builds
  dwarf_loader: Separate creating the cu/dcu pair from processing it
  dwarf_loader: Introduce dwarf_cus__process_cu()
  dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
  dwarf_loader: Remove unused 'thr_data' arg from
    dwarf_cus__create_and_process_cu()
  core: Add unlocked cus__add() variant
  core: Add cus__remove(), counterpart of cus__add()
  dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
  core/dwarf_loader: Add functions to set state of CU processing
  pahole: Encode BTF serially in a reproducible build
  tests: Add a BTF reproducible generation test

 dwarf_loader.c              | 73 ++++++++++++++++++++++++---------
 dwarves.c                   | 62 +++++++++++++++++++++++++++-
 dwarves.h                   | 17 ++++++++
 pahole.c                    | 81 +++++++++++++++++++++++++++++++++++--
 tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
 5 files changed, 265 insertions(+), 24 deletions(-)
 create mode 100755 tests/reproducible_build.sh

-- 
2.44.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 01/12] core: Allow asking for a reproducible build
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
@ 2024-04-12 21:15 ` Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

This is initially about BTF encoding, we want to load DWARF and encode
BTF from it in a way that is reproducible, i.e. no matter how many
threads we use for the loading/encoding process, the output will be the
same, i.e. the BTF ids produced will be the same for all builds.

This first path just adds the conf_load field and allows it to be asked
for with the '--reproducible_build' option in pahole.

At some point we'll use with --btf_features=+reproducible_build or
'--btf_features=default --btf_features=reproducible_build' to keep the
default set of BTF features and be able to use this in the Linux kernel
build system without doing an extra pahole version check for the
availability of --reproducible_build with pahole versions that already
support --btf_features and thus would ignore "reproducible_build".

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.h | 1 +
 pahole.c  | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/dwarves.h b/dwarves.h
index 2393a6c3dc836f39..4dfaa01a00f782d9 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -87,6 +87,7 @@ struct conf_load {
 	bool			skip_encoding_btf_vars;
 	bool			btf_gen_floats;
 	bool			btf_encode_force;
+	bool			reproducible_build;
 	uint8_t			hashtable_bits;
 	uint8_t			max_hashtable_bits;
 	uint16_t		kabi_prefix_len;
diff --git a/pahole.c b/pahole.c
index 0b9c2de74f146a4d..96e153432fa212a5 100644
--- a/pahole.c
+++ b/pahole.c
@@ -1235,6 +1235,7 @@ ARGP_PROGRAM_VERSION_HOOK_DEF = dwarves_print_version;
 #define ARGP_supported_btf_features 342
 #define ARGP_btf_features_strict 343
 #define ARGP_contains_enumerator 344
+#define ARGP_reproducible_build 345
 
 /* --btf_features=feature1[,feature2,..] allows us to specify
  * a list of requested BTF features or "all" to enable all features.
@@ -1819,6 +1820,11 @@ static const struct argp_option pahole__options[] = {
 		.arg = "FEATURE_LIST_STRICT",
 		.doc = "Specify supported BTF features in FEATURE_LIST_STRICT or 'all' for all supported features.  Unlike --btf_features, unrecognized features will trigger an error."
 	},
+	{
+		.name = "reproducible_build",
+		.key = ARGP_reproducible_build,
+		.doc = "Generate reproducile BTF output"
+	},
 	{
 		.name = NULL,
 	}
@@ -1997,6 +2003,8 @@ static error_t pahole__options_parser(int key, char *arg,
 		conf_load.btf_gen_optimized = true;		break;
 	case ARGP_skip_encoding_btf_inconsistent_proto:
 		conf_load.skip_encoding_btf_inconsistent_proto = true; break;
+	case ARGP_reproducible_build:
+		conf_load.reproducible_build = true;	break;
 	case ARGP_btf_features:
 		parse_btf_features(arg, false);		break;
 	case ARGP_supported_btf_features:
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
@ 2024-04-12 21:15 ` Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it Arnaldo Carvalho de Melo
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Reproducible builds need to produce BTF that have the same ids, which is
not possible at the moment to do in parallel with libbpf, so serialize
the encoding.

The next patches will also make sure that DWARF while being read in
parallel into internal representation for later BTF encoding has its CU
(Compile Units) fed to the BTF encoder in the same order as it is in the
DWARF file, this way we'll produce the same BTF output no matter how
many threads are used to read BTF.

Then we'll make sure we have tests in place that compare the output of
parallel BTF encoding (well, just the DWARF loading part, maybe the BTF
in the future), i.e. when using 'pahole -j' with the one obtained when
doing single threaded encoding.

Testing it on a:

  # grep -m1 "model name" /proc/cpuinfo
  model name	: 13th Gen Intel(R) Core(TM) i7-1365U
  ~#

I.e. 2 performance cores (4 threads) + 8 efficiency cores.

From:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):

         17,187.27 msec task-clock:u       #    6.153 CPUs utilized   ( +-  0.34% )
  <SNIP>
            2.7931 +- 0.0336 seconds time elapsed  ( +-  1.20% )

  $

To:

  $ perf stat -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux

   Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux' (5 runs):

         14,654.06 msec task-clock:u       #    3.507 CPUs utilized   ( +-  0.45% )
  <SNIP>
            4.1787 +- 0.0344 seconds time elapsed  ( +-  0.82% )

  $

Which is still a nice improvement over doing it completely serially:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):

          7,506.93 msec task-clock:u       #    1.000 CPUs utilized   ( +-  0.13% )
  <SNIP>
            7.5106 +- 0.0115 seconds time elapsed  ( +-  0.15% )

  $

  $ pahole vmlinux.btf.parallel > /tmp/parallel
  $ pahole vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build
  $ diff -u /tmp/parallel /tmp/parallel.reproducible_build | wc -l
  269920
  $ pahole --sort vmlinux.btf.parallel > /tmp/parallel.sorted
  $ pahole --sort vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build.sorted
  $ diff -u /tmp/parallel.sorted /tmp/parallel.reproducible_build.sorted | wc -l
  0
  $

The BTF ids continue to be undeterministic, as we need to process the
CUs (compile unites) in the same order that they are on vmlinux:

  $ bpftool btf dump file vmlinux.btf.serial > btfdump.serial
  $ bpftool btf dump file vmlinux.btf.parallel.reproducible_build > btfdump.parallel.reproducible_build
  $ bpftool btf dump file vmlinux.btf.parallel > btfdump.parallel
  $ diff -u btfdump.serial btfdump.parallel | wc -l
  624144
  $ diff -u btfdump.serial btfdump.parallel.reproducible_build | wc -l
  594622
  $ diff -u btfdump.parallel.reproducible_build btfdump.parallel | wc -l
  623355
  $

The BTF ids don't match, we'll get them to match at the end of this
patch series:

  $ tail -5 btfdump.serial
  	type_id=127124 offset=219200 size=40 (VAR 'rt6_uncached_list')
  	type_id=11760 offset=221184 size=64 (VAR 'vmw_steal_time')
  	type_id=13533 offset=221248 size=8 (VAR 'kvm_apic_eoi')
  	type_id=13532 offset=221312 size=64 (VAR 'steal_time')
  	type_id=13531 offset=221376 size=68 (VAR 'apf_reason')
  $ tail -5 btfdump.parallel.reproducible_build
  	type_id=113812 offset=219200 size=40 (VAR 'rt6_uncached_list')
  	type_id=87979 offset=221184 size=64 (VAR 'vmw_steal_time')
  	type_id=127391 offset=221248 size=8 (VAR 'kvm_apic_eoi')
  	type_id=127390 offset=221312 size=64 (VAR 'steal_time')
  	type_id=127389 offset=221376 size=68 (VAR 'apf_reason')
  $

Now to make it process the CUs in order, that should get everything
straight without hopefully not degrading it further too much.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 pahole.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/pahole.c b/pahole.c
index 96e153432fa212a5..fcb4360f11debeb9 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3173,6 +3173,14 @@ struct thread_data {
 	struct btf_encoder *encoder;
 };
 
+static int pahole_threads_prepare_reproducible_build(struct conf_load *conf, int nr_threads, void **thr_data)
+{
+	for (int i = 0; i < nr_threads; i++)
+		thr_data[i] = NULL;
+
+	return 0;
+}
+
 static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
 {
 	int i;
@@ -3283,7 +3291,10 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 				thread->btf = btf_encoder__btf(btf_encoder);
 			}
 		}
-		pthread_mutex_unlock(&btf_lock);
+
+		// Reproducible builds don't have multiple btf_encoders, so we need to keep the lock until we encode BTF for this CU.
+		if (thr_data)
+			pthread_mutex_unlock(&btf_lock);
 
 		if (!btf_encoder) {
 			ret = LSK__STOP_LOADING;
@@ -3319,6 +3330,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			exit(1);
 		}
 out_btf:
+		if (!thr_data) // See comment about reproducibe_build above
+			pthread_mutex_unlock(&btf_lock);
 		return ret;
 	}
 #if 0
@@ -3689,8 +3702,14 @@ int main(int argc, char *argv[])
 
 	conf_load.steal = pahole_stealer;
 	conf_load.thread_exit = pahole_thread_exit;
-	conf_load.threads_prepare = pahole_threads_prepare;
-	conf_load.threads_collect = pahole_threads_collect;
+
+	if (conf_load.reproducible_build) {
+		conf_load.threads_prepare = pahole_threads_prepare_reproducible_build;
+		conf_load.threads_collect = NULL;
+	} else {
+		conf_load.threads_prepare = pahole_threads_prepare;
+		conf_load.threads_collect = pahole_threads_collect;
+	}
 
 	// Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
 	if (conf.header_type && !class_name && prettify_input) {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
@ 2024-04-12 21:15 ` Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu() Arnaldo Carvalho de Melo
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We will need it so that we add the dcu to a list in the same order as
the CUs are in the DWARF file (vmlinux mostly).

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 1dffb3f433cb7c8e..125e361ef2bf3f7b 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3207,8 +3207,7 @@ struct dwarf_thread {
 	void			*data;
 };
 
-static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
-					    uint8_t pointer_size, void *thr_data)
+static struct dwarf_cu *dwarf_cus__create_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die, uint8_t pointer_size)
 {
 	/*
 	 * DW_AT_name in DW_TAG_compile_unit can be NULL, first seen in:
@@ -3218,17 +3217,32 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	const char *name = attr_string(cu_die, DW_AT_name, dcus->conf);
 	struct cu *cu = cu__new(name ?: "", pointer_size, dcus->build_id, dcus->build_id_len, dcus->filename, dcus->conf->use_obstack);
 	if (cu == NULL || cu__set_common(cu, dcus->conf, dcus->mod, dcus->elf) != 0)
-		return DWARF_CB_ABORT;
+		return NULL;
 
 	struct dwarf_cu *dcu = dwarf_cu__new(cu);
 
-	if (dcu == NULL)
-		return DWARF_CB_ABORT;
+	if (dcu == NULL) {
+		cu__delete(cu);
+		return NULL;
+	}
 
 	dcu->type_unit = dcus->type_dcu;
 	cu->priv = dcu;
 	cu->dfops = &dwarf__ops;
 
+	return dcu;
+}
+
+static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
+					    uint8_t pointer_size, void *thr_data)
+{
+	struct dwarf_cu *dcu = dwarf_cus__create_cu(dcus, cu_die, pointer_size);
+
+	if (dcu == NULL)
+		return DWARF_CB_ABORT;
+
+	struct cu *cu = dcu->cu;
+
 	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
 	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
 		return DWARF_CB_ABORT;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu()
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2024-04-12 21:15 ` [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it Arnaldo Carvalho de Melo
@ 2024-04-12 21:15 ` Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() Arnaldo Carvalho de Melo
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Finishing the separation of creating a dcu/cu pair from processing it,
as we'll need to add the new dcu/cu pair to the list under cus__lock(),
so that we process it in order to keep a reproducible BTF encoding.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 125e361ef2bf3f7b..5090509309446890 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3233,6 +3233,16 @@ static struct dwarf_cu *dwarf_cus__create_cu(struct dwarf_cus *dcus, Dwarf_Die *
 	return dcu;
 }
 
+static int dwarf_cus__process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
+				 struct cu *cu, void *thr_data)
+{
+	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
+	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
+		return DWARF_CB_ABORT;
+
+       return DWARF_CB_OK;
+}
+
 static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
 					    uint8_t pointer_size, void *thr_data)
 {
@@ -3241,13 +3251,7 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	if (dcu == NULL)
 		return DWARF_CB_ABORT;
 
-	struct cu *cu = dcu->cu;
-
-	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
-	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
-		return DWARF_CB_ABORT;
-
-       return DWARF_CB_OK;
+	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, thr_data);
 }
 
 static int dwarf_cus__nextcu(struct dwarf_cus *dcus, Dwarf_Die *die_mem, Dwarf_Die **cu_die, uint8_t *pointer_size, uint8_t *offset_size)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2024-04-12 21:15 ` [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu() Arnaldo Carvalho de Melo
@ 2024-04-12 21:15 ` Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

dwarf_cus__nextcu() is only used in parallel DWARF loading, and we want
to make sure that we preserve the order of the CUs in the DWARF file
being loaded, so move creating the cu/dcu to under that lock.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 5090509309446890..a7a8b2bea112ba75 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3254,7 +3254,9 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, thr_data);
 }
 
-static int dwarf_cus__nextcu(struct dwarf_cus *dcus, Dwarf_Die *die_mem, Dwarf_Die **cu_die, uint8_t *pointer_size, uint8_t *offset_size)
+static int dwarf_cus__nextcu(struct dwarf_cus *dcus, struct dwarf_cu **dcu,
+			     Dwarf_Die *die_mem, Dwarf_Die **cu_die,
+			     uint8_t *pointer_size, uint8_t *offset_size)
 {
 	Dwarf_Off noff;
 	size_t cuhl;
@@ -3274,6 +3276,15 @@ static int dwarf_cus__nextcu(struct dwarf_cus *dcus, Dwarf_Die *die_mem, Dwarf_D
 			dcus->off = noff;
 	}
 
+	if (ret == 0 && *cu_die != NULL) {
+		*dcu = dwarf_cus__create_cu(dcus, *cu_die, *pointer_size);
+		if (*dcu == NULL) {
+			dcus->error = ENOMEM;
+			ret = -1;
+			goto out_unlock;
+		}
+	}
+
 out_unlock:
 	cus__unlock(dcus->cus);
 
@@ -3286,13 +3297,13 @@ static void *dwarf_cus__process_cu_thread(void *arg)
 	struct dwarf_cus *dcus = dthr->dcus;
 	uint8_t pointer_size, offset_size;
 	Dwarf_Die die_mem, *cu_die;
+	struct dwarf_cu *dcu;
 
-	while (dwarf_cus__nextcu(dcus, &die_mem, &cu_die, &pointer_size, &offset_size) == 0) {
+	while (dwarf_cus__nextcu(dcus, &dcu, &die_mem, &cu_die, &pointer_size, &offset_size) == 0) {
 		if (cu_die == NULL)
 			break;
 
-		if (dwarf_cus__create_and_process_cu(dcus, cu_die,
-						     pointer_size, dthr->data) == DWARF_CB_ABORT)
+		if (dwarf_cus__process_cu(dcus, cu_die, dcu->cu, dthr->data) == DWARF_CB_ABORT)
 			goto out_abort;
 	}
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2024-04-12 21:15 ` [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() Arnaldo Carvalho de Melo
@ 2024-04-12 21:15 ` Arnaldo Carvalho de Melo
  2024-04-12 21:15 ` [PATCH 07/12] core: Add unlocked cus__add() variant Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

The only caller for dwarf_cus__create_and_process_cu() now is serial
loading of DWARF, so no point in passing the perf thread data, that is
always NULL, so remove that parameter.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index a7a8b2bea112ba75..a097b67a2d123b55 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3243,15 +3243,14 @@ static int dwarf_cus__process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
        return DWARF_CB_OK;
 }
 
-static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
-					    uint8_t pointer_size, void *thr_data)
+static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die, uint8_t pointer_size)
 {
 	struct dwarf_cu *dcu = dwarf_cus__create_cu(dcus, cu_die, pointer_size);
 
 	if (dcu == NULL)
 		return DWARF_CB_ABORT;
 
-	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, thr_data);
+	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, NULL);
 }
 
 static int dwarf_cus__nextcu(struct dwarf_cus *dcus, struct dwarf_cu **dcu,
@@ -3377,8 +3376,7 @@ static int __dwarf_cus__process_cus(struct dwarf_cus *dcus)
 		if (cu_die == NULL)
 			break;
 
-		if (dwarf_cus__create_and_process_cu(dcus, cu_die,
-						     pointer_size, NULL) == DWARF_CB_ABORT)
+		if (dwarf_cus__create_and_process_cu(dcus, cu_die, pointer_size) == DWARF_CB_ABORT)
 			return DWARF_CB_ABORT;
 
 		dcus->off = noff;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 07/12] core: Add unlocked cus__add() variant
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2024-04-12 21:15 ` [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() Arnaldo Carvalho de Melo
@ 2024-04-12 21:15 ` Arnaldo Carvalho de Melo
  2024-04-12 21:16 ` [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add() Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:15 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

As we'll use with the cus lock already held when getting the next CU
from vmlinux to keep the order in the original DWARF file.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.c | 9 ++++++---
 dwarves.h | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/dwarves.c b/dwarves.c
index 3b4be595aa59a856..654a8085e9252a21 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -479,13 +479,16 @@ uint32_t cus__nr_entries(const struct cus *cus)
 	return cus->nr_entries;
 }
 
-void cus__add(struct cus *cus, struct cu *cu)
+void __cus__add(struct cus *cus, struct cu *cu)
 {
-	cus__lock(cus);
-
 	cus->nr_entries++;
 	list_add_tail(&cu->node, &cus->cus);
+}
 
+void cus__add(struct cus *cus, struct cu *cu)
+{
+	cus__lock(cus);
+	__cus__add(cus, cu);
 	cus__unlock(cus);
 
 	cu__find_class_holes(cu);
diff --git a/dwarves.h b/dwarves.h
index 4dfaa01a00f782d9..42b00bc1341e66cb 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -171,6 +171,7 @@ int cus__fprintf_load_files_err(struct cus *cus, const char *tool,
 int cus__load_dir(struct cus *cus, struct conf_load *conf,
 		  const char *dirname, const char *filename_mask,
 		  const int recursive);
+void __cus__add(struct cus *cus, struct cu *cu);
 void cus__add(struct cus *cus, struct cu *cu);
 void cus__print_error_msg(const char *progname, const struct cus *cus,
 			  const char *filename, const int err);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add()
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2024-04-12 21:15 ` [PATCH 07/12] core: Add unlocked cus__add() variant Arnaldo Carvalho de Melo
@ 2024-04-12 21:16 ` Arnaldo Carvalho de Melo
  2024-04-12 21:16 ` [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:16 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We'll switch to adding the 'struct cu' instance to the 'struct cus' list
early, under the lock, to keep the order from the original DWARF file
and then LSK_KEEPIT will just leave it there while LSK_DELETE will first
remove it from the cus list, under cus lock, to then call cu__delete().

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.c | 13 +++++++++++++
 dwarves.h |  4 ++++
 2 files changed, 17 insertions(+)

diff --git a/dwarves.c b/dwarves.c
index 654a8085e9252a21..3cd300db97973ce4 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -479,6 +479,19 @@ uint32_t cus__nr_entries(const struct cus *cus)
 	return cus->nr_entries;
 }
 
+void __cus__remove(struct cus *cus, struct cu *cu)
+{
+	cus->nr_entries--;
+	list_del_init(&cu->node);
+}
+
+void cus__remove(struct cus *cus, struct cu *cu)
+{
+	cus__lock(cus);
+	__cus__remove(cus, cu);
+	cus__unlock(cus);
+}
+
 void __cus__add(struct cus *cus, struct cu *cu)
 {
 	cus->nr_entries++;
diff --git a/dwarves.h b/dwarves.h
index 42b00bc1341e66cb..9dc9cb4b074b5d33 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -173,6 +173,10 @@ int cus__load_dir(struct cus *cus, struct conf_load *conf,
 		  const int recursive);
 void __cus__add(struct cus *cus, struct cu *cu);
 void cus__add(struct cus *cus, struct cu *cu);
+
+void __cus__remove(struct cus *cus, struct cu *cu);
+void cus__remove(struct cus *cus, struct cu *cu);
+
 void cus__print_error_msg(const char *progname, const struct cus *cus,
 			  const char *filename, const int err);
 struct cu *cus__find_pair(struct cus *cus, const char *name);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
  2024-04-12 21:16 ` [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add() Arnaldo Carvalho de Melo
@ 2024-04-12 21:16 ` Arnaldo Carvalho de Melo
  2024-04-12 21:16 ` [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:16 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We want to keep it in the same order as in the original DWARF file we're
loading (e.g. vmlinux), we'll only remove it after we load and process
(e.g. convert to BTF).

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index a097b67a2d123b55..3ef22aada6f46f13 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3034,12 +3034,12 @@ static int cus__finalize(struct cus *cus, struct cu *cu, struct conf_load *conf,
 	int lsk = cu__finalize(cu, conf, thr_data);
 	switch (lsk) {
 	case LSK__DELETE:
+		cus__remove(cus, cu);
 		cu__delete(cu);
 		break;
 	case LSK__STOP_LOADING:
 		break;
 	case LSK__KEEPIT:
-		cus__add(cus, cu);
 		break;
 	}
 	return lsk;
@@ -3064,7 +3064,7 @@ static int cu__set_common(struct cu *cu, struct conf_load *conf,
 	return 0;
 }
 
-static int __cus__load_debug_types(struct conf_load *conf, Dwfl_Module *mod, Dwarf *dw, Elf *elf,
+static int __cus__load_debug_types(struct cus *cus, struct conf_load *conf, Dwfl_Module *mod, Dwarf *dw, Elf *elf,
 				   const char *filename, const unsigned char *build_id,
 				   int build_id_len, struct cu **cup, struct dwarf_cu *dcup)
 {
@@ -3098,6 +3098,7 @@ static int __cus__load_debug_types(struct conf_load *conf, Dwfl_Module *mod, Dwa
 			cu->dfops = &dwarf__ops;
 
 			*cup = cu;
+			cus__add(cus, cu);
 		}
 
 		Dwarf_Die die_mem;
@@ -3250,6 +3251,8 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	if (dcu == NULL)
 		return DWARF_CB_ABORT;
 
+	cus__add(dcus->cus, dcu->cu);
+
 	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, NULL);
 }
 
@@ -3282,6 +3285,9 @@ static int dwarf_cus__nextcu(struct dwarf_cus *dcus, struct dwarf_cu **dcu,
 			ret = -1;
 			goto out_unlock;
 		}
+		// Do it here to keep all CUs in cus->cus in the same
+		// order as in the DWARF file being loaded (e.g. vmlinux)
+		__cus__add(dcus->cus, (*dcu)->cu);
 	}
 
 out_unlock:
@@ -3496,15 +3502,15 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 	struct dwarf_cu type_dcu;
 	int type_lsk = LSK__KEEPIT;
 
-	int res = __cus__load_debug_types(conf, mod, dw, elf, filename, build_id, build_id_len, &type_cu, &type_dcu);
+	int res = __cus__load_debug_types(cus, conf, mod, dw, elf, filename, build_id, build_id_len, &type_cu, &type_dcu);
 	if (res != 0) {
 		return res;
 	}
 
 	if (type_cu != NULL) {
 		type_lsk = cu__finalize(type_cu, conf, NULL);
-		if (type_lsk == LSK__KEEPIT) {
-			cus__add(cus, type_cu);
+		if (type_lsk == LSK__DELETE) {
+			cus__remove(cus, type_cu);
 		}
 	}
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (8 preceding siblings ...)
  2024-04-12 21:16 ` [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE Arnaldo Carvalho de Melo
@ 2024-04-12 21:16 ` Arnaldo Carvalho de Melo
  2024-04-12 21:16 ` [PATCH 11/12] pahole: Encode BTF serially in a reproducible build Arnaldo Carvalho de Melo
  2024-04-12 21:16 ` [PATCH 12/12] tests: Add a BTF reproducible generation test Arnaldo Carvalho de Melo
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:16 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

When doing reproducible builds we'll process loaded CUs in
pahole_stealer in the same order as the CUs are in the DWARF file
(vmlinux), so sometimes when we may finish loading a CU out of order and
thus will have to either busy loop (to use a hot cache?) to way for its
turn or simply leave it in the ordered queue waiting its turn (may use
too much memory with a pile of CUs waiting to be processed (converted to
BTF)?), so we need to mark its state:

   unprocessed, loaded, processing

So when pahole has pahole_stealer called, it will not use necessarily
the handed cu, because it may be out of order, say if the first CU that
pahole gets in one of its threads is a big one and takes longer to
process than one of the other handed to the otherd DWARF loading
threads, then it will have to wait for the first one to be processed by
the, so far, non reproducible BTF encoder.

We will instead allow tools such as pahole to ask for the next CU do be
processed, i.e. in order, if the first one is still not loaded, then
it'll just return NULL and the encoder will return so that its DWARF
loading thread can go load one more CU and then ask pahole to encode one
more CU, rinse repeat.

We'll see if this ends up using too much memory due to unconstrained
loading of CUs that pile up waiting for BTF encoding.

At the end the tool will need to flush the queue.

When a CU is processed it gets deleted, which goes on freeing memory.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c |  8 +++++---
 dwarves.c      | 40 ++++++++++++++++++++++++++++++++++++++++
 dwarves.h      | 11 +++++++++++
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 3ef22aada6f46f13..b15cf543fa9d7471 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3016,13 +3016,15 @@ static void cu__sort_types_by_offset(struct cu *cu, struct conf_load *conf)
 	cu__for_all_tags(cu, type__sort_by_offset, conf);
 }
 
-static int cu__finalize(struct cu *cu, struct conf_load *conf, void *thr_data)
+static int cu__finalize(struct cu *cu, struct cus *cus, struct conf_load *conf, void *thr_data)
 {
 	cu__for_all_tags(cu, class_member__cache_byte_size, conf);
 
 	if (cu__language_reorders_offsets(cu))
 		cu__sort_types_by_offset(cu, conf);
 
+	cus__set_cu_state(cus, cu, CU__LOADED);
+
 	if (conf && conf->steal) {
 		return conf->steal(cu, conf, thr_data);
 	}
@@ -3031,7 +3033,7 @@ static int cu__finalize(struct cu *cu, struct conf_load *conf, void *thr_data)
 
 static int cus__finalize(struct cus *cus, struct cu *cu, struct conf_load *conf, void *thr_data)
 {
-	int lsk = cu__finalize(cu, conf, thr_data);
+	int lsk = cu__finalize(cu, cus, conf, thr_data);
 	switch (lsk) {
 	case LSK__DELETE:
 		cus__remove(cus, cu);
@@ -3508,7 +3510,7 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 	}
 
 	if (type_cu != NULL) {
-		type_lsk = cu__finalize(type_cu, conf, NULL);
+		type_lsk = cu__finalize(type_cu, cus, conf, NULL);
 		if (type_lsk == LSK__DELETE) {
 			cus__remove(cus, type_cu);
 		}
diff --git a/dwarves.c b/dwarves.c
index 3cd300db97973ce4..fbc8d8aa0060b7d0 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -469,6 +469,44 @@ void cus__unlock(struct cus *cus)
 	pthread_mutex_unlock(&cus->mutex);
 }
 
+void cus__set_cu_state(struct cus *cus, struct cu *cu, enum cu_state state)
+{
+	cus__lock(cus);
+	cu->state = state;
+	cus__unlock(cus);
+}
+
+// Used only when reproducible builds are desired
+struct cu *cus__get_next_processable_cu(struct cus *cus)
+{
+	struct cu *cu;
+
+	cus__lock(cus);
+
+	list_for_each_entry(cu, &cus->cus, node) {
+		switch (cu->state) {
+		case CU__LOADED:
+			cu->state = CU__PROCESSING;
+			goto found;
+		case CU__PROCESSING:
+			// This will only happen when we get to parallel
+			// reproducible BTF encoding, libbpf dedup work needed here.
+			continue;
+		case CU__UNPROCESSED:
+			// The first entry isn't loaded, signal the
+			// caller to return and try another day, as we
+			// need to respect the original DWARF CU ordering.
+			goto out;
+		}
+	}
+out:
+	cu = NULL;
+found:
+	cus__unlock(cus);
+
+	return cu;
+}
+
 bool cus__empty(const struct cus *cus)
 {
 	return list_empty(&cus->cus);
@@ -701,6 +739,8 @@ struct cu *cu__new(const char *name, uint8_t addr_size,
 		cu->addr_size = addr_size;
 		cu->extra_dbg_info = 0;
 
+		cu->state = CU__UNPROCESSED;
+
 		cu->nr_inline_expansions   = 0;
 		cu->size_inline_expansions = 0;
 		cu->nr_structures_changed  = 0;
diff --git a/dwarves.h b/dwarves.h
index 9dc9cb4b074b5d33..dd35a4efd6e5decb 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -43,6 +43,12 @@ enum load_steal_kind {
 	LSK__STOP_LOADING,
 };
 
+enum cu_state {
+	CU__UNPROCESSED,
+	CU__LOADED,
+	CU__PROCESSING,
+};
+
 /*
  * BTF combines all the types into one big CU using btf_dedup(), so for something
  * like a allyesconfig vmlinux kernel we can get over 65535 types.
@@ -177,6 +183,10 @@ void cus__add(struct cus *cus, struct cu *cu);
 void __cus__remove(struct cus *cus, struct cu *cu);
 void cus__remove(struct cus *cus, struct cu *cu);
 
+struct cu *cus__get_next_processable_cu(struct cus *cus);
+
+void cus__set_cu_state(struct cus *cus, struct cu *cu, enum cu_state state);
+
 void cus__print_error_msg(const char *progname, const struct cus *cus,
 			  const char *filename, const int err);
 struct cu *cus__find_pair(struct cus *cus, const char *name);
@@ -287,6 +297,7 @@ struct cu {
 	uint8_t		 little_endian:1;
 	uint8_t		 nr_register_params;
 	int		 register_params[ARCH_MAX_REGISTER_PARAMS];
+	enum cu_state	 state;
 	uint16_t	 language;
 	unsigned long	 nr_inline_expansions;
 	size_t		 size_inline_expansions;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 11/12] pahole: Encode BTF serially in a reproducible build
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (9 preceding siblings ...)
  2024-04-12 21:16 ` [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing Arnaldo Carvalho de Melo
@ 2024-04-12 21:16 ` Arnaldo Carvalho de Melo
  2024-04-12 21:16 ` [PATCH 12/12] tests: Add a BTF reproducible generation test Arnaldo Carvalho de Melo
  11 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:16 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Now we will ask the cus instance for the next processable CU, i.e. one
that is loaded and is in the same CU order as in the original DWARF
file, under the BTF lock.

With this we can go on loading the DWARF file in parallel and only
serialize the BTF encoding, keeping that order, with this the BTF ids
end up the same both for a serial encoding:

And here are some numbers with a Release build:

  $ cat buildcmd.sh
  mkdir build
  cd build
  cmake -DCMAKE_BUILD_TYPE=Release ..
  cd ..
  make -j $(getconf _NPROCESSORS_ONLN) -C build
  $ rm -rf build
  $ ./buildcmd.sh

Its an Intel Hybrid system, and migrates to/from efficiency/perfomance
cores:

  $ getconf _NPROCESSORS_ONLN
  28
  $ grep -m1 'model name' /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-14700K
  $

8 performance cores (16 threads), 12 efficiency cores.

Serial encoding:

  $ time perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):

      13,313,169,305      cpu_atom/cycles:u/     ( +- 30.61% )  (0.00%)
      27,985,776,096      cpu_core/cycles:u/     ( +-  0.17% )  (100.00%)

             5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )

  real	0m25.937s
  user	0m25.337s
  sys	0m0.533s
  $

Parallel, but non-reproducible:

  $ time perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):

      73,112,892,822      cpu_atom/cycles:u/     ( +-  0.60% )  (43.16%)
      99,124,802,605      cpu_core/cycles:u/     ( +-  0.72% )  (59.01%)

              1.9501 +- 0.0111 seconds time elapsed  ( +-  0.57% )

  real	0m9.778s
  user	1m30.700s
  sys	0m13.114s
  $

Now what we want, a reproducible build done using parallel DWARF loading
+ CUs-ordered-as-in-vmlinux serial BTF encoding:

  $ time perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible vmlinux

   Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible vmlinux' (5 runs):

      21,263,444,208      cpu_atom/cycles:u/     ( +-  1.95% )  (29.18%)
      35,881,074,547      cpu_core/cycles:u/     ( +-  0.64% )  (75.60%)

              2.5354 +- 0.0221 seconds time elapsed  ( +-  0.87% )

  real	0m12.709s
  user	0m35.001s
  sys	0m9.017s
  $

Fastest is off course the unreproducible, fully parallel DWARF loading/
BTF encoding at 1.9501 +- 0.0111 seconds, but doing a reproducible build
in 2.5354 +- 0.0221 seconds is better than completely disabling -j/full
serial at 5.18276 +- 0.00952 seconds.

Comparing the BTF generated:

  $ bpftool btf dump file vmlinux.btf.serial > output.vmlinux.btf.serial
  $ bpftool btf dump file vmlinux.btf.parallel > output.vmlinux.btf.parallel
  $ bpftool btf dump file vmlinux.btf.parallel.reproducible > output.vmlinux.btf.parallel.reproducible

  $ wc -l output.vmlinux.btf.serial output.vmlinux.btf.parallel output.vmlinux.btf.parallel.reproducible
    313404 output.vmlinux.btf.serial
    314345 output.vmlinux.btf.parallel
    313404 output.vmlinux.btf.parallel.reproducible
    941153 total
  $

Non reproducible parallel BTF encoding:

  $ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel | head
  --- output.vmlinux.btf.serial	2024-04-02 11:11:56.665027947 -0300
  +++ output.vmlinux.btf.parallel	2024-04-02 11:12:38.490895460 -0300
  @@ -1,1708 +1,2553 @@
   [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
  -[2] CONST '(anon)' type_id=1
  -[3] VOLATILE '(anon)' type_id=2
  -[4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
  -[5] PTR '(anon)' type_id=8
  -[6] CONST '(anon)' type_id=5
  -[7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  $

Reproducible:

  $ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel.reproducible
  $

And using a test script that I'll add to a nascent repository of
regression tests:

  $ time tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding: Ok

  real	1m13.844s
  user	3m3.601s
  sys	0m9.049s
  $

If the number of threads started by pahole is different than what was
requests via its -j command line option, it will fail as well as if the
output of 'bpftool btf dump' differs from the BTF encoded totally
serially to one of the detached BTF encoded using reproducible DWARF
loading/BTF encoding.

In verbose mode:

  $ time VERBOSE=1 tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding:
  serial encoding...
  1 threads encoding
  1 threads started
  diff from serial encoding:
  -----------------------------
  2 threads encoding
  2 threads started
  diff from serial encoding:
  -----------------------------
  3 threads encoding
  3 threads started
  diff from serial encoding:
  -----------------------------
  4 threads encoding
  4 threads started
  diff from serial encoding:
  -----------------------------
  5 threads encoding
  5 threads started
  diff from serial encoding:
  -----------------------------
  6 threads encoding
  6 threads started
   diff from serial encoding:
  -----------------------------
  7 threads encoding
  7 threads started
  diff from serial encoding:
  -----------------------------
  8 threads encoding
  8 threads started
  diff from serial encoding:
  -----------------------------
  9 threads encoding
  9 threads started
  diff from serial encoding:
  -----------------------------
  10 threads encoding
  10 threads started
  diff from serial encoding:
  -----------------------------
  11 threads encoding
  11 threads started
  diff from serial encoding:
  -----------------------------
  12 threads encoding
  12 threads started
  diff from serial encoding:
  -----------------------------
  13 threads encoding
  13 threads started
  diff from serial encoding:
  -----------------------------
  14 threads encoding
  14 threads started
  diff from serial encoding:
  -----------------------------
  15 threads encoding
  15 threads started
  diff from serial encoding:
  -----------------------------
  16 threads encoding
  16 threads started
  diff from serial encoding:
  -----------------------------
  17 threads encoding
  17 threads started
  diff from serial encoding:
  -----------------------------
  18 threads encoding
  18 threads started
  diff from serial encoding:
  -----------------------------
  19 threads encoding
  19 threads started
  diff from serial encoding:
  -----------------------------
  20 threads encoding
  20 threads started
  diff from serial encoding:
  -----------------------------
  21 threads encoding
  21 threads started
  diff from serial encoding:
  -----------------------------
  22 threads encoding
  22 threads started
  diff from serial encoding:
  -----------------------------
  23 threads encoding
  23 threads started
  diff from serial encoding:
  -----------------------------
  24 threads encoding
  24 threads started
  diff from serial encoding:
  -----------------------------
  25 threads encoding
  25 threads started
  diff from serial encoding:
  -----------------------------
  26 threads encoding
  26 threads started
  diff from serial encoding:
  -----------------------------
  27 threads encoding
  27 threads started
  diff from serial encoding:
  -----------------------------
  28 threads encoding
  28 threads started
  diff from serial encoding:
  -----------------------------
  Ok

  real	1m14.800s
  user	3m4.315s
  sys	0m8.977s
  $

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/all/ZhQBpAGIDU_koQnp@x1/T/#u
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.c |  8 ++++++--
 pahole.c  | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/dwarves.c b/dwarves.c
index fbc8d8aa0060b7d0..1ec259f50dbd3778 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -489,8 +489,12 @@ struct cu *cus__get_next_processable_cu(struct cus *cus)
 			cu->state = CU__PROCESSING;
 			goto found;
 		case CU__PROCESSING:
-			// This will only happen when we get to parallel
-			// reproducible BTF encoding, libbpf dedup work needed here.
+			// This will happen when we get to parallel
+			// reproducible BTF encoding, libbpf dedup work needed
+			// here. The other possibility is when we're flushing
+			// the DWARF processed CUs when the parallel DWARF
+			// loading stoped and we still have CUs to encode to
+			// BTF because of ordering requirements.
 			continue;
 		case CU__UNPROCESSED:
 			// The first entry isn't loaded, signal the
diff --git a/pahole.c b/pahole.c
index fcb4360f11debeb9..77772bb42bb443ce 100644
--- a/pahole.c
+++ b/pahole.c
@@ -31,6 +31,7 @@
 
 static struct btf_encoder *btf_encoder;
 static char *detached_btf_filename;
+struct cus *cus;
 static bool btf_encode;
 static bool ctf_encode;
 static bool sort_output;
@@ -3324,11 +3325,32 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			encoder = btf_encoder;
 		}
 
+		// Since we don't have yet a way to parallelize the BTF encoding, we
+		// need to ask the loader for the next CU that we can process, one
+		// that is loaded and is in order, if the next one isn't yet loaded,
+		// then return to let the DWARF loader thread to load the next one,
+		// eventually all will get processed, even if when all DWARF loading
+		// threads finish.
+		if (conf_load->reproducible_build) {
+			ret = LSK__KEEPIT; // we're not processing the cu passed to this
+					  // function, so keep it.
+			cu = cus__get_next_processable_cu(cus);
+			if (cu == NULL)
+				goto out_btf;
+		}
+
 		ret = btf_encoder__encode_cu(encoder, cu, conf_load);
 		if (ret < 0) {
 			fprintf(stderr, "Encountered error while encoding BTF.\n");
 			exit(1);
 		}
+
+		if (conf_load->reproducible_build) {
+			ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
+			// Kinda equivalent to LSK__DELETE since we processed this, but we can't delete it
+			// as we stash references to entries in CUs for 'struct function' in btf_encoder__add_saved_funcs()
+			// and btf_encoder__save_func(), so we can't delete them here. - Alan Maguire
+		}
 out_btf:
 		if (!thr_data) // See comment about reproducibe_build above
 			pthread_mutex_unlock(&btf_lock);
@@ -3632,6 +3654,24 @@ out_free:
 	return ret;
 }
 
+static int cus__flush_reproducible_build(struct cus *cus, struct btf_encoder *encoder, struct conf_load *conf_load)
+{
+	int err = 0;
+
+	while (true) {
+		struct cu *cu = cus__get_next_processable_cu(cus);
+
+		if (cu == NULL)
+			break;
+
+		err = btf_encoder__encode_cu(encoder, cu, conf_load);
+		if (err < 0)
+			break;
+	}
+
+	return err;
+}
+
 int main(int argc, char *argv[])
 {
 	int err, remaining, rc = EXIT_FAILURE;
@@ -3692,7 +3732,7 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	struct cus *cus = cus__new();
+	cus = cus__new();
 	if (cus == NULL) {
 		fputs("pahole: insufficient memory\n", stderr);
 		goto out_dwarves_exit;
@@ -3797,6 +3837,12 @@ try_sole_arg_as_class_names:
 	header = NULL;
 
 	if (btf_encode && btf_encoder) { // maybe all CUs were filtered out and thus we don't have an encoder?
+		if (conf_load.reproducible_build &&
+		    cus__flush_reproducible_build(cus, btf_encoder, &conf_load) < 0) {
+			fprintf(stderr, "Encountered error while encoding BTF.\n");
+			exit(1);
+		}
+
 		err = btf_encoder__encode(btf_encoder);
 		if (err) {
 			fputs("Failed to encode BTF\n", stderr);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 12/12] tests: Add a BTF reproducible generation test
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
                   ` (10 preceding siblings ...)
  2024-04-12 21:16 ` [PATCH 11/12] pahole: Encode BTF serially in a reproducible build Arnaldo Carvalho de Melo
@ 2024-04-12 21:16 ` Arnaldo Carvalho de Melo
  2024-04-15 10:26   ` Alan Maguire
  11 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:16 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

  $ time tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding: Ok

  real  1m13.844s
  user  3m3.601s
  sys   0m9.049s
  $

If the number of threads started by pahole is different than what was
requests via its -j command line option, it will fail as well as if the
output of 'bpftool btf dump' differs from the BTF encoded totally
serially to one of the detached BTF encoded using reproducible DWARF
loading/BTF encoding.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tests/reproducible_build.sh | 56 +++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100755 tests/reproducible_build.sh

diff --git a/tests/reproducible_build.sh b/tests/reproducible_build.sh
new file mode 100755
index 0000000000000000..9c72d548c2a21136
--- /dev/null
+++ b/tests/reproducible_build.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Test if BTF generated serially matches reproducible parallel DWARF loading + serial BTF encoding
+# Arnaldo Carvalho de Melo <acme@redhat.com> (C) 2024-
+
+vmlinux=$1
+outdir=$(mktemp -d /tmp/reproducible_build.sh.XXXXXX)
+
+echo -n "Parallel reproducible DWARF Loading/Serial BTF encoding: "
+
+test -n "$VERBOSE" && printf "\nserial encoding...\n"
+
+pahole --btf_encode_detached=$outdir/vmlinux.btf.serial $vmlinux
+bpftool btf dump file $outdir/vmlinux.btf.serial > $outdir/bpftool.output.vmlinux.btf.serial
+
+nr_proc=$(getconf _NPROCESSORS_ONLN)
+
+for threads in $(seq $nr_proc) ; do
+	test -n "$VERBOSE" && echo $threads threads encoding
+	pahole -j$threads --reproducible_build --btf_encode_detached=$outdir/vmlinux.btf.parallel.reproducible $vmlinux &
+	pahole=$!
+	# HACK: Wait a bit for pahole to start its threads
+	sleep 0.3s
+	# PID part to remove ps output headers
+	nr_threads_started=$(ps -L -C pahole | grep -v PID | wc -l)
+
+	if [ $threads -gt 1 ] ; then
+		((nr_threads_started -= 1))
+	fi
+
+	if [ $threads != $nr_threads_started ] ; then
+		echo "ERROR: pahole asked to start $threads encoding threads, started $nr_threads_started"
+		exit 1;
+	fi
+
+	# ps -L -C pahole | grep -v PID | nl
+	test -n "$VERBOSE" && echo $nr_threads_started threads started
+	wait $pahole
+	rm -f $outdir/bpftool.output.vmlinux.btf.parallel.reproducible
+	bpftool btf dump file $outdir/vmlinux.btf.parallel.reproducible > $outdir/bpftool.output.vmlinux.btf.parallel.reproducible
+	test -n "$VERBOSE" && echo "diff from serial encoding:"
+	diff -u $outdir/bpftool.output.vmlinux.btf.serial $outdir/bpftool.output.vmlinux.btf.parallel.reproducible > $outdir/diff
+	if [ -s $outdir/diff ] ; then
+		echo "ERROR: BTF generated from DWARF in parallel is different from the one generated in serial!"
+		exit 1
+	fi
+	test -n "$VERBOSE" && echo -----------------------------
+done
+
+rm $outdir/*
+rmdir $outdir
+
+echo "Ok"
+
+exit 0
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 12/12] tests: Add a BTF reproducible generation test
  2024-04-12 21:16 ` [PATCH 12/12] tests: Add a BTF reproducible generation test Arnaldo Carvalho de Melo
@ 2024-04-15 10:26   ` Alan Maguire
  2024-04-15 17:26     ` Arnaldo Carvalho de Melo
  2024-04-15 17:33     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 17+ messages in thread
From: Alan Maguire @ 2024-04-15 10:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Kui-Feng Lee, Thomas Weißschuh

On 12/04/2024 22:16, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
>   $ time tests/reproducible_build.sh vmlinux
>   Parallel reproducible DWARF Loading/Serial BTF encoding: Ok
> 
>   real  1m13.844s
>   user  3m3.601s
>   sys   0m9.049s
>   $
> 
> If the number of threads started by pahole is different than what was
> requests via its -j command line option, it will fail as well as if the
> output of 'bpftool btf dump' differs from the BTF encoded totally
> serially to one of the detached BTF encoded using reproducible DWARF
> loading/BTF encoding.
> 
> Cc: Alan Maguire <alan.maguire@oracle.com>
> Cc: Kui-Feng Lee <kuifeng@fb.com>
> Cc: Thomas Weißschuh <linux@weissschuh.net>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
>  create mode 100755 tests/reproducible_build.sh
> 

great to have a test for this! a few small things below but
for the series

Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>

> diff --git a/tests/reproducible_build.sh b/tests/reproducible_build.sh
> new file mode 100755
> index 0000000000000000..9c72d548c2a21136
> --- /dev/null
> +++ b/tests/reproducible_build.sh
> @@ -0,0 +1,56 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Test if BTF generated serially matches reproducible parallel DWARF loading + serial BTF encoding
> +# Arnaldo Carvalho de Melo <acme@redhat.com> (C) 2024-
> +
> +vmlinux=$1

nit: might be worth having a usage check/error for the vmlinux binary here.

> +outdir=$(mktemp -d /tmp/reproducible_build.sh.XXXXXX)
> +
> +echo -n "Parallel reproducible DWARF Loading/Serial BTF encoding: "
> +
> +test -n "$VERBOSE" && printf "\nserial encoding...\n"
> +
> +pahole --btf_encode_detached=$outdir/vmlinux.btf.serial $vmlinux

suggestion here; what about adding --btf_features=all as this would mean
we'd be encoding all the standard kernel features? we'd need to do the
same below in the thread loop. without that we're missing out on a few
features that the kernel builds use in BTF encoding, and we probably
want to ensure that we're testing as close to a real kernel BTF encoding
scenario as possible.

> +bpftool btf dump file $outdir/vmlinux.btf.serial > $outdir/bpftool.output.vmlinux.btf.serial
> +
> +nr_proc=$(getconf _NPROCESSORS_ONLN)
> +
> +for threads in $(seq $nr_proc) ; do
> +	test -n "$VERBOSE" && echo $threads threads encoding
> +	pahole -j$threads --reproducible_build --btf_encode_detached=$outdir/vmlinux.btf.parallel.reproducible $vmlinux &
> +	pahole=$!
> +	# HACK: Wait a bit for pahole to start its threads
> +	sleep 0.3s
> +	# PID part to remove ps output headers
> +	nr_threads_started=$(ps -L -C pahole | grep -v PID | wc -l)
> +
> +	if [ $threads -gt 1 ] ; then
> +		((nr_threads_started -= 1))
> +	fi
> +
> +	if [ $threads != $nr_threads_started ] ; then
> +		echo "ERROR: pahole asked to start $threads encoding threads, started $nr_threads_started"
> +		exit 1;
> +	fi
> +
> +	# ps -L -C pahole | grep -v PID | nl
> +	test -n "$VERBOSE" && echo $nr_threads_started threads started
> +	wait $pahole
> +	rm -f $outdir/bpftool.output.vmlinux.btf.parallel.reproducible
> +	bpftool btf dump file $outdir/vmlinux.btf.parallel.reproducible > $outdir/bpftool.output.vmlinux.btf.parallel.reproducible
> +	test -n "$VERBOSE" && echo "diff from serial encoding:"
> +	diff -u $outdir/bpftool.output.vmlinux.btf.serial $outdir/bpftool.output.vmlinux.btf.parallel.reproducible > $outdir/diff
> +	if [ -s $outdir/diff ] ; then
> +		echo "ERROR: BTF generated from DWARF in parallel is different from the one generated in serial!"
> +		exit 1
> +	fi
> +	test -n "$VERBOSE" && echo -----------------------------
> +done
> +
> +rm $outdir/*
> +rmdir $outdir
> +
> +echo "Ok"
> +
> +exit 0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 12/12] tests: Add a BTF reproducible generation test
  2024-04-15 10:26   ` Alan Maguire
@ 2024-04-15 17:26     ` Arnaldo Carvalho de Melo
  2024-04-15 17:33     ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-15 17:26 UTC (permalink / raw)
  To: Alan Maguire
  Cc: dwarves, Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Kui-Feng Lee, Thomas Weißschuh

On Mon, Apr 15, 2024 at 11:26:44AM +0100, Alan Maguire wrote:
> On 12/04/2024 22:16, Arnaldo Carvalho de Melo wrote:
> > From: Arnaldo Carvalho de Melo <acme@redhat.com>
> > 
> >   $ time tests/reproducible_build.sh vmlinux
> >   Parallel reproducible DWARF Loading/Serial BTF encoding: Ok
> > 
> >   real  1m13.844s
> >   user  3m3.601s
> >   sys   0m9.049s
> >   $
> > 
> > If the number of threads started by pahole is different than what was
> > requests via its -j command line option, it will fail as well as if the
> > output of 'bpftool btf dump' differs from the BTF encoded totally
> > serially to one of the detached BTF encoded using reproducible DWARF
> > loading/BTF encoding.
> > 
> > Cc: Alan Maguire <alan.maguire@oracle.com>
> > Cc: Kui-Feng Lee <kuifeng@fb.com>
> > Cc: Thomas Weißschuh <linux@weissschuh.net>
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > ---
> >  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++++++++++++++
> >  1 file changed, 56 insertions(+)
> >  create mode 100755 tests/reproducible_build.sh
> > 
> 
> great to have a test for this! a few small things below but
> for the series
> 
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Tested-by: Alan Maguire <alan.maguire@oracle.com>
> 
> > diff --git a/tests/reproducible_build.sh b/tests/reproducible_build.sh
> > new file mode 100755
> > index 0000000000000000..9c72d548c2a21136
> > --- /dev/null
> > +++ b/tests/reproducible_build.sh
> > @@ -0,0 +1,56 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# Test if BTF generated serially matches reproducible parallel DWARF loading + serial BTF encoding
> > +# Arnaldo Carvalho de Melo <acme@redhat.com> (C) 2024-
> > +
> > +vmlinux=$1
> 
> nit: might be worth having a usage check/error for the vmlinux binary here.

Right, I'm doing that now.

Suggested-by: Alan Maguire <alan.maguire@oracle.com>

⬢[acme@toolbox pahole]$ tests/reproducible_build.sh 
Please specify a vmlinux file to operate on
⬢[acme@toolbox pahole]$ tests/reproducible_build.sh vmlinux1
vmlinux1 file not available, please specify another
⬢[acme@toolbox pahole]$ tests/reproducible_build.sh vmlinux
Parallel reproducible DWARF Loading/Serial BTF encoding: ^C
⬢[acme@toolbox pahole]$

diff --git a/tests/reproducible_build.sh b/tests/reproducible_build.sh
index 9c72d548c2a21136..b821e28e7ce7bf8c 100755
--- a/tests/reproducible_build.sh
+++ b/tests/reproducible_build.sh
@@ -5,6 +5,17 @@
 # Arnaldo Carvalho de Melo <acme@redhat.com> (C) 2024-
 
 vmlinux=$1
+
+if [ -z "$vmlinux" ] ; then
+	echo "Please specify a vmlinux file to operate on"
+	exit 1
+fi
+
+if [ ! -f "$vmlinux" ] ; then
+	echo "$vmlinux file not available, please specify another"
+	exit 1
+fi
+
 outdir=$(mktemp -d /tmp/reproducible_build.sh.XXXXXX)
 
 echo -n "Parallel reproducible DWARF Loading/Serial BTF encoding: "

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 12/12] tests: Add a BTF reproducible generation test
  2024-04-15 10:26   ` Alan Maguire
  2024-04-15 17:26     ` Arnaldo Carvalho de Melo
@ 2024-04-15 17:33     ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-15 17:33 UTC (permalink / raw)
  To: Alan Maguire
  Cc: dwarves, Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Kui-Feng Lee, Thomas Weißschuh

On Mon, Apr 15, 2024 at 11:26:44AM +0100, Alan Maguire wrote:
> On 12/04/2024 22:16, Arnaldo Carvalho de Melo wrote:
> > From: Arnaldo Carvalho de Melo <acme@redhat.com>
> > 
> >   $ time tests/reproducible_build.sh vmlinux
> >   Parallel reproducible DWARF Loading/Serial BTF encoding: Ok
> > 
> >   real  1m13.844s
> >   user  3m3.601s
> >   sys   0m9.049s
> >   $
> > 
> > If the number of threads started by pahole is different than what was
> > requests via its -j command line option, it will fail as well as if the
> > output of 'bpftool btf dump' differs from the BTF encoded totally
> > serially to one of the detached BTF encoded using reproducible DWARF
> > loading/BTF encoding.
> > 
> > Cc: Alan Maguire <alan.maguire@oracle.com>
> > Cc: Kui-Feng Lee <kuifeng@fb.com>
> > Cc: Thomas Weißschuh <linux@weissschuh.net>
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > ---
> >  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++++++++++++++
> >  1 file changed, 56 insertions(+)
> >  create mode 100755 tests/reproducible_build.sh
> > 
> 
> great to have a test for this! a few small things below but
> for the series
> 
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> Tested-by: Alan Maguire <alan.maguire@oracle.com>
> 
> > diff --git a/tests/reproducible_build.sh b/tests/reproducible_build.sh
> > new file mode 100755
> > index 0000000000000000..9c72d548c2a21136
> > --- /dev/null
> > +++ b/tests/reproducible_build.sh
> > @@ -0,0 +1,56 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# Test if BTF generated serially matches reproducible parallel DWARF loading + serial BTF encoding
> > +# Arnaldo Carvalho de Melo <acme@redhat.com> (C) 2024-
> > +
> > +vmlinux=$1
> 
> nit: might be worth having a usage check/error for the vmlinux binary here.
> 
> > +outdir=$(mktemp -d /tmp/reproducible_build.sh.XXXXXX)
> > +
> > +echo -n "Parallel reproducible DWARF Loading/Serial BTF encoding: "
> > +
> > +test -n "$VERBOSE" && printf "\nserial encoding...\n"
> > +
> > +pahole --btf_encode_detached=$outdir/vmlinux.btf.serial $vmlinux
> 
> suggestion here; what about adding --btf_features=all as this would mean
> we'd be encoding all the standard kernel features? we'd need to do the
> same below in the thread loop. without that we're missing out on a few
> features that the kernel builds use in BTF encoding, and we probably
> want to ensure that we're testing as close to a real kernel BTF encoding
> scenario as possible.

⬢[acme@toolbox pahole]$ time tests/reproducible_build.sh vmlinux
Parallel reproducible DWARF Loading/Serial BTF encoding: Ok

real	1m24.903s
user	3m8.143s
sys	0m47.329s
⬢[acme@toolbox pahole]$ 
⬢[acme@toolbox pahole]$ time VERBOSE=1 tests/reproducible_build.sh vmlinux
Parallel reproducible DWARF Loading/Serial BTF encoding: 
serial encoding...
1 threads encoding
1 threads started
diff from serial encoding:
-----------------------------
2 threads encoding
2 threads started
diff from serial encoding:
-----------------------------
3 threads encoding
3 threads started
^X^C

diff --git a/tests/reproducible_build.sh b/tests/reproducible_build.sh
index b821e28e7ce7bf8c..8cc36fe4c75e8b75 100755
--- a/tests/reproducible_build.sh
+++ b/tests/reproducible_build.sh
@@ -22,14 +22,14 @@ echo -n "Parallel reproducible DWARF Loading/Serial BTF encoding: "
 
 test -n "$VERBOSE" && printf "\nserial encoding...\n"
 
-pahole --btf_encode_detached=$outdir/vmlinux.btf.serial $vmlinux
+pahole --btf_features=all --btf_encode_detached=$outdir/vmlinux.btf.serial $vmlinux
 bpftool btf dump file $outdir/vmlinux.btf.serial > $outdir/bpftool.output.vmlinux.btf.serial
 
 nr_proc=$(getconf _NPROCESSORS_ONLN)
 
 for threads in $(seq $nr_proc) ; do
 	test -n "$VERBOSE" && echo $threads threads encoding
-	pahole -j$threads --reproducible_build --btf_encode_detached=$outdir/vmlinux.btf.parallel.reproducible $vmlinux &
+	pahole -j$threads --reproducible_build --btf_features=all --btf_encode_detached=$outdir/vmlinux.btf.parallel.reproducible $vmlinux &
 	pahole=$!
 	# HACK: Wait a bit for pahole to start its threads
 	sleep 0.3s

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu()
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Finishing the separation of creating a dcu/cu pair from processing it,
as we'll need to add the new dcu/cu pair to the list under cus__lock(),
so that we process it in order to keep a reproducible BTF encoding.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 125e361ef2bf3f7b..5090509309446890 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3233,6 +3233,16 @@ static struct dwarf_cu *dwarf_cus__create_cu(struct dwarf_cus *dcus, Dwarf_Die *
 	return dcu;
 }
 
+static int dwarf_cus__process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
+				 struct cu *cu, void *thr_data)
+{
+	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
+	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
+		return DWARF_CB_ABORT;
+
+       return DWARF_CB_OK;
+}
+
 static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
 					    uint8_t pointer_size, void *thr_data)
 {
@@ -3241,13 +3251,7 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	if (dcu == NULL)
 		return DWARF_CB_ABORT;
 
-	struct cu *cu = dcu->cu;
-
-	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
-	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
-		return DWARF_CB_ABORT;
-
-       return DWARF_CB_OK;
+	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, thr_data);
 }
 
 static int dwarf_cus__nextcu(struct dwarf_cus *dcus, Dwarf_Die *die_mem, Dwarf_Die **cu_die, uint8_t *pointer_size, uint8_t *offset_size)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-04-15 17:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
2024-04-12 21:15 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
2024-04-12 21:15 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
2024-04-12 21:15 ` [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it Arnaldo Carvalho de Melo
2024-04-12 21:15 ` [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu() Arnaldo Carvalho de Melo
2024-04-12 21:15 ` [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() Arnaldo Carvalho de Melo
2024-04-12 21:15 ` [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() Arnaldo Carvalho de Melo
2024-04-12 21:15 ` [PATCH 07/12] core: Add unlocked cus__add() variant Arnaldo Carvalho de Melo
2024-04-12 21:16 ` [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add() Arnaldo Carvalho de Melo
2024-04-12 21:16 ` [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE Arnaldo Carvalho de Melo
2024-04-12 21:16 ` [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing Arnaldo Carvalho de Melo
2024-04-12 21:16 ` [PATCH 11/12] pahole: Encode BTF serially in a reproducible build Arnaldo Carvalho de Melo
2024-04-12 21:16 ` [PATCH 12/12] tests: Add a BTF reproducible generation test Arnaldo Carvalho de Melo
2024-04-15 10:26   ` Alan Maguire
2024-04-15 17:26     ` Arnaldo Carvalho de Melo
2024-04-15 17:33     ` Arnaldo Carvalho de Melo
  -- strict thread matches above, loose matches on Subject: below --
2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu() Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).