dwarves.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
@ 2024-04-02 19:39 Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
                   ` (14 more replies)
  0 siblings, 15 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

Hi,

	This allows us to have reproducible builds while keeping the
DWARF loading phase in parallel, achieving a noticeable speedup as
showed in the commit log messages:

On a:

  model name    : Intel(R) Core(TM) i7-14700K

  8 performance cores (16 threads), 12 efficiency cores.

Serial encoding:

  $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
             5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )

Parallel, but non-reproducible:

  $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
              1.8529 +- 0.0159 seconds time elapsed  ( +-  0.86% )

reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding:

  $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
              2.3632 +- 0.0164 seconds time elapsed  ( +-  0.69% )

Please take a look, its in the 'next' branch at:

  https://git.kernel.org/pub/scm/devel/pahole/pahole.git
  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next

There is a new tool to do regression testing on this feature:

  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630
  
And here a more detailed set of tests using it:

  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7

Working on libbpf to allow for parallel reproducible BTF encoding is the
next step.

Thanks a lot,

- Arnaldo

Arnaldo Carvalho de Melo (12):
  core: Allow asking for a reproducible build
  pahole: Disable BTF multithreaded encoded when doing reproducible builds
  dwarf_loader: Separate creating the cu/dcu pair from processing it
  dwarf_loader: Introduce dwarf_cus__process_cu()
  dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
  dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
  core: Add unlocked cus__add() variant
  core: Add cus__remove(), counterpart of cus__add()
  dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
  core/dwarf_loader: Add functions to set state of CU processing
  pahole: Encode BTF serially in a reproducible build
  tests: Add a BTF reproducible generation test

 dwarf_loader.c              | 73 +++++++++++++++++++++++---------
 dwarves.c                   | 58 ++++++++++++++++++++++++-
 dwarves.h                   | 17 ++++++++
 pahole.c                    | 84 +++++++++++++++++++++++++++++++++++--
 tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
 5 files changed, 264 insertions(+), 24 deletions(-)
 create mode 100755 tests/reproducible_build.sh

-- 
2.44.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/12] core: Allow asking for a reproducible build
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

This is initially about BTF encoding, we want to load DWARF and encode
BTF from it in a way that is reproducible, i.e. no matter how many
threads we use for the loading/encoding process, the output will be the
same, i.e. the BTF ids produced will be the same for all builds.

This first path just adds the conf_load field and allows it to be asked
for with the '--reproducible_build' option in pahole.

At some point we'll use with --btf_features=+reproducible_build or
'--btf_features=default --btf_features=reproducible_build' to keep the
default set of BTF features and be able to use this in the Linux kernel
build system without doing an extra pahole version check for the
availability of --reproducible_build with pahole versions that already
support --btf_features and thus would ignore "reproducible_build".

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.h | 1 +
 pahole.c  | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/dwarves.h b/dwarves.h
index 2393a6c3dc836f39..4dfaa01a00f782d9 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -87,6 +87,7 @@ struct conf_load {
 	bool			skip_encoding_btf_vars;
 	bool			btf_gen_floats;
 	bool			btf_encode_force;
+	bool			reproducible_build;
 	uint8_t			hashtable_bits;
 	uint8_t			max_hashtable_bits;
 	uint16_t		kabi_prefix_len;
diff --git a/pahole.c b/pahole.c
index 0b9c2de74f146a4d..96e153432fa212a5 100644
--- a/pahole.c
+++ b/pahole.c
@@ -1235,6 +1235,7 @@ ARGP_PROGRAM_VERSION_HOOK_DEF = dwarves_print_version;
 #define ARGP_supported_btf_features 342
 #define ARGP_btf_features_strict 343
 #define ARGP_contains_enumerator 344
+#define ARGP_reproducible_build 345
 
 /* --btf_features=feature1[,feature2,..] allows us to specify
  * a list of requested BTF features or "all" to enable all features.
@@ -1819,6 +1820,11 @@ static const struct argp_option pahole__options[] = {
 		.arg = "FEATURE_LIST_STRICT",
 		.doc = "Specify supported BTF features in FEATURE_LIST_STRICT or 'all' for all supported features.  Unlike --btf_features, unrecognized features will trigger an error."
 	},
+	{
+		.name = "reproducible_build",
+		.key = ARGP_reproducible_build,
+		.doc = "Generate reproducile BTF output"
+	},
 	{
 		.name = NULL,
 	}
@@ -1997,6 +2003,8 @@ static error_t pahole__options_parser(int key, char *arg,
 		conf_load.btf_gen_optimized = true;		break;
 	case ARGP_skip_encoding_btf_inconsistent_proto:
 		conf_load.skip_encoding_btf_inconsistent_proto = true; break;
+	case ARGP_reproducible_build:
+		conf_load.reproducible_build = true;	break;
 	case ARGP_btf_features:
 		parse_btf_features(arg, false);		break;
 	case ARGP_supported_btf_features:
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-03 18:19   ` Andrii Nakryiko
  2024-04-04  9:42   ` Jiri Olsa
  2024-04-02 19:39 ` [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it Arnaldo Carvalho de Melo
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Reproducible builds need to produce BTF that have the same ids, which is
not possible at the moment to do in parallel with libbpf, so serialize
the encoding.

The next patches will also make sure that DWARF while being read in
parallel into internal representation for later BTF encoding has its CU
(Compile Units) fed to the BTF encoder in the same order as it is in the
DWARF file, this way we'll produce the same BTF output no matter how
many threads are used to read BTF.

Then we'll make sure we have tests in place that compare the output of
parallel BTF encoding (well, just the DWARF loading part, maybe the BTF
in the future), i.e. when using 'pahole -j' with the one obtained when
doing single threaded encoding.

Testing it on a:

  # grep -m1 "model name" /proc/cpuinfo
  model name	: 13th Gen Intel(R) Core(TM) i7-1365U
  ~#

I.e. 2 performance cores (4 threads) + 8 efficiency cores.

From:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):

         17,187.27 msec task-clock:u       #    6.153 CPUs utilized   ( +-  0.34% )
  <SNIP>
            2.7931 +- 0.0336 seconds time elapsed  ( +-  1.20% )

  $

To:

  $ perf stat -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux

   Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux' (5 runs):

         14,654.06 msec task-clock:u       #    3.507 CPUs utilized   ( +-  0.45% )
  <SNIP>
            4.1787 +- 0.0344 seconds time elapsed  ( +-  0.82% )

  $

Which is still a nice improvement over doing it completely serially:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):

          7,506.93 msec task-clock:u       #    1.000 CPUs utilized   ( +-  0.13% )
  <SNIP>
            7.5106 +- 0.0115 seconds time elapsed  ( +-  0.15% )

  $

  $ pahole vmlinux.btf.parallel > /tmp/parallel
  $ pahole vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build
  $ diff -u /tmp/parallel /tmp/parallel.reproducible_build | wc -l
  269920
  $ pahole --sort vmlinux.btf.parallel > /tmp/parallel.sorted
  $ pahole --sort vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build.sorted
  $ diff -u /tmp/parallel.sorted /tmp/parallel.reproducible_build.sorted | wc -l
  0
  $

The BTF ids continue to be undeterministic, as we need to process the
CUs (compile unites) in the same order that they are on vmlinux:

  $ bpftool btf dump file vmlinux.btf.serial > btfdump.serial
  $ bpftool btf dump file vmlinux.btf.parallel.reproducible_build > btfdump.parallel.reproducible_build
  $ bpftool btf dump file vmlinux.btf.parallel > btfdump.parallel
  $ diff -u btfdump.serial btfdump.parallel | wc -l
  624144
  $ diff -u btfdump.serial btfdump.parallel.reproducible_build | wc -l
  594622
  $ diff -u btfdump.parallel.reproducible_build btfdump.parallel | wc -l
  623355
  $

The BTF ids don't match, we'll get them to match at the end of this
patch series:

  $ tail -5 btfdump.serial
  	type_id=127124 offset=219200 size=40 (VAR 'rt6_uncached_list')
  	type_id=11760 offset=221184 size=64 (VAR 'vmw_steal_time')
  	type_id=13533 offset=221248 size=8 (VAR 'kvm_apic_eoi')
  	type_id=13532 offset=221312 size=64 (VAR 'steal_time')
  	type_id=13531 offset=221376 size=68 (VAR 'apf_reason')
  $ tail -5 btfdump.parallel.reproducible_build
  	type_id=113812 offset=219200 size=40 (VAR 'rt6_uncached_list')
  	type_id=87979 offset=221184 size=64 (VAR 'vmw_steal_time')
  	type_id=127391 offset=221248 size=8 (VAR 'kvm_apic_eoi')
  	type_id=127390 offset=221312 size=64 (VAR 'steal_time')
  	type_id=127389 offset=221376 size=68 (VAR 'apf_reason')
  $

Now to make it process the CUs in order, that should get everything
straight without hopefully not degrading it further too much.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 pahole.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/pahole.c b/pahole.c
index 96e153432fa212a5..fcb4360f11debeb9 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3173,6 +3173,14 @@ struct thread_data {
 	struct btf_encoder *encoder;
 };
 
+static int pahole_threads_prepare_reproducible_build(struct conf_load *conf, int nr_threads, void **thr_data)
+{
+	for (int i = 0; i < nr_threads; i++)
+		thr_data[i] = NULL;
+
+	return 0;
+}
+
 static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
 {
 	int i;
@@ -3283,7 +3291,10 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 				thread->btf = btf_encoder__btf(btf_encoder);
 			}
 		}
-		pthread_mutex_unlock(&btf_lock);
+
+		// Reproducible builds don't have multiple btf_encoders, so we need to keep the lock until we encode BTF for this CU.
+		if (thr_data)
+			pthread_mutex_unlock(&btf_lock);
 
 		if (!btf_encoder) {
 			ret = LSK__STOP_LOADING;
@@ -3319,6 +3330,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			exit(1);
 		}
 out_btf:
+		if (!thr_data) // See comment about reproducibe_build above
+			pthread_mutex_unlock(&btf_lock);
 		return ret;
 	}
 #if 0
@@ -3689,8 +3702,14 @@ int main(int argc, char *argv[])
 
 	conf_load.steal = pahole_stealer;
 	conf_load.thread_exit = pahole_thread_exit;
-	conf_load.threads_prepare = pahole_threads_prepare;
-	conf_load.threads_collect = pahole_threads_collect;
+
+	if (conf_load.reproducible_build) {
+		conf_load.threads_prepare = pahole_threads_prepare_reproducible_build;
+		conf_load.threads_collect = NULL;
+	} else {
+		conf_load.threads_prepare = pahole_threads_prepare;
+		conf_load.threads_collect = pahole_threads_collect;
+	}
 
 	// Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
 	if (conf.header_type && !class_name && prettify_input) {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-04  9:42   ` Jiri Olsa
  2024-04-02 19:39 ` [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu() Arnaldo Carvalho de Melo
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We will need it so that we add the dcu to a list in the same order as
the CUs are in the DWARF file (vmlinux mostly).

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 1dffb3f433cb7c8e..125e361ef2bf3f7b 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3207,8 +3207,7 @@ struct dwarf_thread {
 	void			*data;
 };
 
-static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
-					    uint8_t pointer_size, void *thr_data)
+static struct dwarf_cu *dwarf_cus__create_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die, uint8_t pointer_size)
 {
 	/*
 	 * DW_AT_name in DW_TAG_compile_unit can be NULL, first seen in:
@@ -3218,17 +3217,32 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	const char *name = attr_string(cu_die, DW_AT_name, dcus->conf);
 	struct cu *cu = cu__new(name ?: "", pointer_size, dcus->build_id, dcus->build_id_len, dcus->filename, dcus->conf->use_obstack);
 	if (cu == NULL || cu__set_common(cu, dcus->conf, dcus->mod, dcus->elf) != 0)
-		return DWARF_CB_ABORT;
+		return NULL;
 
 	struct dwarf_cu *dcu = dwarf_cu__new(cu);
 
-	if (dcu == NULL)
-		return DWARF_CB_ABORT;
+	if (dcu == NULL) {
+		cu__delete(cu);
+		return NULL;
+	}
 
 	dcu->type_unit = dcus->type_dcu;
 	cu->priv = dcu;
 	cu->dfops = &dwarf__ops;
 
+	return dcu;
+}
+
+static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
+					    uint8_t pointer_size, void *thr_data)
+{
+	struct dwarf_cu *dcu = dwarf_cus__create_cu(dcus, cu_die, pointer_size);
+
+	if (dcu == NULL)
+		return DWARF_CB_ABORT;
+
+	struct cu *cu = dcu->cu;
+
 	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
 	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
 		return DWARF_CB_ABORT;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu()
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() Arnaldo Carvalho de Melo
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Finishing the separation of creating a dcu/cu pair from processing it,
as we'll need to add the new dcu/cu pair to the list under cus__lock(),
so that we process it in order to keep a reproducible BTF encoding.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 125e361ef2bf3f7b..5090509309446890 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3233,6 +3233,16 @@ static struct dwarf_cu *dwarf_cus__create_cu(struct dwarf_cus *dcus, Dwarf_Die *
 	return dcu;
 }
 
+static int dwarf_cus__process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
+				 struct cu *cu, void *thr_data)
+{
+	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
+	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
+		return DWARF_CB_ABORT;
+
+       return DWARF_CB_OK;
+}
+
 static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
 					    uint8_t pointer_size, void *thr_data)
 {
@@ -3241,13 +3251,7 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	if (dcu == NULL)
 		return DWARF_CB_ABORT;
 
-	struct cu *cu = dcu->cu;
-
-	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
-	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
-		return DWARF_CB_ABORT;
-
-       return DWARF_CB_OK;
+	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, thr_data);
 }
 
 static int dwarf_cus__nextcu(struct dwarf_cus *dcus, Dwarf_Die *die_mem, Dwarf_Die **cu_die, uint8_t *pointer_size, uint8_t *offset_size)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu() Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() Arnaldo Carvalho de Melo
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

dwarf_cus__nextcu() is only used in parallel DWARF loading, and we want
to make sure that we preserve the order of the CUs in the DWARF file
being loaded, so move creating the cu/dcu to under that lock.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 5090509309446890..a7a8b2bea112ba75 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3254,7 +3254,9 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, thr_data);
 }
 
-static int dwarf_cus__nextcu(struct dwarf_cus *dcus, Dwarf_Die *die_mem, Dwarf_Die **cu_die, uint8_t *pointer_size, uint8_t *offset_size)
+static int dwarf_cus__nextcu(struct dwarf_cus *dcus, struct dwarf_cu **dcu,
+			     Dwarf_Die *die_mem, Dwarf_Die **cu_die,
+			     uint8_t *pointer_size, uint8_t *offset_size)
 {
 	Dwarf_Off noff;
 	size_t cuhl;
@@ -3274,6 +3276,15 @@ static int dwarf_cus__nextcu(struct dwarf_cus *dcus, Dwarf_Die *die_mem, Dwarf_D
 			dcus->off = noff;
 	}
 
+	if (ret == 0 && *cu_die != NULL) {
+		*dcu = dwarf_cus__create_cu(dcus, *cu_die, *pointer_size);
+		if (*dcu == NULL) {
+			dcus->error = ENOMEM;
+			ret = -1;
+			goto out_unlock;
+		}
+	}
+
 out_unlock:
 	cus__unlock(dcus->cus);
 
@@ -3286,13 +3297,13 @@ static void *dwarf_cus__process_cu_thread(void *arg)
 	struct dwarf_cus *dcus = dthr->dcus;
 	uint8_t pointer_size, offset_size;
 	Dwarf_Die die_mem, *cu_die;
+	struct dwarf_cu *dcu;
 
-	while (dwarf_cus__nextcu(dcus, &die_mem, &cu_die, &pointer_size, &offset_size) == 0) {
+	while (dwarf_cus__nextcu(dcus, &dcu, &die_mem, &cu_die, &pointer_size, &offset_size) == 0) {
 		if (cu_die == NULL)
 			break;
 
-		if (dwarf_cus__create_and_process_cu(dcus, cu_die,
-						     pointer_size, dthr->data) == DWARF_CB_ABORT)
+		if (dwarf_cus__process_cu(dcus, cu_die, dcu->cu, dthr->data) == DWARF_CB_ABORT)
 			goto out_abort;
 	}
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 07/12] core: Add unlocked cus__add() variant Arnaldo Carvalho de Melo
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

The only caller for dwarf_cus__create_and_process_cu() now is serial
loading of DWARF, so no point in passing the perf thread data, that is
always NULL, so remove that parameter.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index a7a8b2bea112ba75..a097b67a2d123b55 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3243,15 +3243,14 @@ static int dwarf_cus__process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
        return DWARF_CB_OK;
 }
 
-static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
-					    uint8_t pointer_size, void *thr_data)
+static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die, uint8_t pointer_size)
 {
 	struct dwarf_cu *dcu = dwarf_cus__create_cu(dcus, cu_die, pointer_size);
 
 	if (dcu == NULL)
 		return DWARF_CB_ABORT;
 
-	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, thr_data);
+	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, NULL);
 }
 
 static int dwarf_cus__nextcu(struct dwarf_cus *dcus, struct dwarf_cu **dcu,
@@ -3377,8 +3376,7 @@ static int __dwarf_cus__process_cus(struct dwarf_cus *dcus)
 		if (cu_die == NULL)
 			break;
 
-		if (dwarf_cus__create_and_process_cu(dcus, cu_die,
-						     pointer_size, NULL) == DWARF_CB_ABORT)
+		if (dwarf_cus__create_and_process_cu(dcus, cu_die, pointer_size) == DWARF_CB_ABORT)
 			return DWARF_CB_ABORT;
 
 		dcus->off = noff;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/12] core: Add unlocked cus__add() variant
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add() Arnaldo Carvalho de Melo
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

As we'll use with the cus lock already held when getting the next CU
from vmlinux to keep the order in the original DWARF file.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.c | 9 ++++++---
 dwarves.h | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/dwarves.c b/dwarves.c
index 3b4be595aa59a856..654a8085e9252a21 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -479,13 +479,16 @@ uint32_t cus__nr_entries(const struct cus *cus)
 	return cus->nr_entries;
 }
 
-void cus__add(struct cus *cus, struct cu *cu)
+void __cus__add(struct cus *cus, struct cu *cu)
 {
-	cus__lock(cus);
-
 	cus->nr_entries++;
 	list_add_tail(&cu->node, &cus->cus);
+}
 
+void cus__add(struct cus *cus, struct cu *cu)
+{
+	cus__lock(cus);
+	__cus__add(cus, cu);
 	cus__unlock(cus);
 
 	cu__find_class_holes(cu);
diff --git a/dwarves.h b/dwarves.h
index 4dfaa01a00f782d9..42b00bc1341e66cb 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -171,6 +171,7 @@ int cus__fprintf_load_files_err(struct cus *cus, const char *tool,
 int cus__load_dir(struct cus *cus, struct conf_load *conf,
 		  const char *dirname, const char *filename_mask,
 		  const int recursive);
+void __cus__add(struct cus *cus, struct cu *cu);
 void cus__add(struct cus *cus, struct cu *cu);
 void cus__print_error_msg(const char *progname, const struct cus *cus,
 			  const char *filename, const int err);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add()
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 07/12] core: Add unlocked cus__add() variant Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We'll switch to adding the 'struct cu' instance to the 'struct cus' list
early, under the lock, to keep the order from the original DWARF file
and then LSK_KEEPIT will just leave it there while LSK_DELETE will first
remove it from the cus list, under cus lock, to then call cu__delete().

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.c | 13 +++++++++++++
 dwarves.h |  4 ++++
 2 files changed, 17 insertions(+)

diff --git a/dwarves.c b/dwarves.c
index 654a8085e9252a21..3cd300db97973ce4 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -479,6 +479,19 @@ uint32_t cus__nr_entries(const struct cus *cus)
 	return cus->nr_entries;
 }
 
+void __cus__remove(struct cus *cus, struct cu *cu)
+{
+	cus->nr_entries--;
+	list_del_init(&cu->node);
+}
+
+void cus__remove(struct cus *cus, struct cu *cu)
+{
+	cus__lock(cus);
+	__cus__remove(cus, cu);
+	cus__unlock(cus);
+}
+
 void __cus__add(struct cus *cus, struct cu *cu)
 {
 	cus->nr_entries++;
diff --git a/dwarves.h b/dwarves.h
index 42b00bc1341e66cb..9dc9cb4b074b5d33 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -173,6 +173,10 @@ int cus__load_dir(struct cus *cus, struct conf_load *conf,
 		  const int recursive);
 void __cus__add(struct cus *cus, struct cu *cu);
 void cus__add(struct cus *cus, struct cu *cu);
+
+void __cus__remove(struct cus *cus, struct cu *cu);
+void cus__remove(struct cus *cus, struct cu *cu);
+
 void cus__print_error_msg(const char *progname, const struct cus *cus,
 			  const char *filename, const int err);
 struct cu *cus__find_pair(struct cus *cus, const char *name);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add() Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We want to keep it in the same order as in the original DWARF file we're
loading (e.g. vmlinux), we'll only remove it after we load and process
(e.g. convert to BTF).

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index a097b67a2d123b55..3ef22aada6f46f13 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3034,12 +3034,12 @@ static int cus__finalize(struct cus *cus, struct cu *cu, struct conf_load *conf,
 	int lsk = cu__finalize(cu, conf, thr_data);
 	switch (lsk) {
 	case LSK__DELETE:
+		cus__remove(cus, cu);
 		cu__delete(cu);
 		break;
 	case LSK__STOP_LOADING:
 		break;
 	case LSK__KEEPIT:
-		cus__add(cus, cu);
 		break;
 	}
 	return lsk;
@@ -3064,7 +3064,7 @@ static int cu__set_common(struct cu *cu, struct conf_load *conf,
 	return 0;
 }
 
-static int __cus__load_debug_types(struct conf_load *conf, Dwfl_Module *mod, Dwarf *dw, Elf *elf,
+static int __cus__load_debug_types(struct cus *cus, struct conf_load *conf, Dwfl_Module *mod, Dwarf *dw, Elf *elf,
 				   const char *filename, const unsigned char *build_id,
 				   int build_id_len, struct cu **cup, struct dwarf_cu *dcup)
 {
@@ -3098,6 +3098,7 @@ static int __cus__load_debug_types(struct conf_load *conf, Dwfl_Module *mod, Dwa
 			cu->dfops = &dwarf__ops;
 
 			*cup = cu;
+			cus__add(cus, cu);
 		}
 
 		Dwarf_Die die_mem;
@@ -3250,6 +3251,8 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	if (dcu == NULL)
 		return DWARF_CB_ABORT;
 
+	cus__add(dcus->cus, dcu->cu);
+
 	return dwarf_cus__process_cu(dcus, cu_die, dcu->cu, NULL);
 }
 
@@ -3282,6 +3285,9 @@ static int dwarf_cus__nextcu(struct dwarf_cus *dcus, struct dwarf_cu **dcu,
 			ret = -1;
 			goto out_unlock;
 		}
+		// Do it here to keep all CUs in cus->cus in the same
+		// order as in the DWARF file being loaded (e.g. vmlinux)
+		__cus__add(dcus->cus, (*dcu)->cu);
 	}
 
 out_unlock:
@@ -3496,15 +3502,15 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 	struct dwarf_cu type_dcu;
 	int type_lsk = LSK__KEEPIT;
 
-	int res = __cus__load_debug_types(conf, mod, dw, elf, filename, build_id, build_id_len, &type_cu, &type_dcu);
+	int res = __cus__load_debug_types(cus, conf, mod, dw, elf, filename, build_id, build_id_len, &type_cu, &type_dcu);
 	if (res != 0) {
 		return res;
 	}
 
 	if (type_cu != NULL) {
 		type_lsk = cu__finalize(type_cu, conf, NULL);
-		if (type_lsk == LSK__KEEPIT) {
-			cus__add(cus, type_cu);
+		if (type_lsk == LSK__DELETE) {
+			cus__remove(cus, type_cu);
 		}
 	}
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (8 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 11/12] pahole: Encode BTF serially in a reproducible build Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

When doing reproducible builds we'll process loaded CUs in
pahole_stealer in the same order as the CUs are in the DWARF file
(vmlinux), so sometimes when we may finish loading a CU out of order and
thus will have to either busy loop (to use a hot cache?) to way for its
turn or simply leave it in the ordered queue waiting its turn (may use
too much memory with a pile of CUs waiting to be processed (converted to
BTF)?), so we need to mark its state:

   unprocessed, loaded, processing

So when pahole has pahole_stealer called, it will not use necessarily
the handed cu, because it may be out of order, say if the first CU that
pahole gets in one of its threads is a big one and takes longer to
process than one of the other handed to the otherd DWARF loading
threads, then it will have to wait for the first one to be processed by
the, so far, non reproducible BTF encoder.

We will instead allow tools such as pahole to ask for the next CU do be
processed, i.e. in order, if the first one is still not loaded, then
it'll just return NULL and the encoder will return so that its DWARF
loading thread can go load one more CU and then ask pahole to encode one
more CU, rinse repeat.

We'll see if this ends up using too much memory due to unconstrained
loading of CUs that pile up waiting for BTF encoding.

At the end the tool will need to flush the queue.

When a CU is processed it gets deleted, which goes on freeing memory.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarf_loader.c |  8 +++++---
 dwarves.c      | 40 ++++++++++++++++++++++++++++++++++++++++
 dwarves.h      | 11 +++++++++++
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index 3ef22aada6f46f13..b15cf543fa9d7471 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -3016,13 +3016,15 @@ static void cu__sort_types_by_offset(struct cu *cu, struct conf_load *conf)
 	cu__for_all_tags(cu, type__sort_by_offset, conf);
 }
 
-static int cu__finalize(struct cu *cu, struct conf_load *conf, void *thr_data)
+static int cu__finalize(struct cu *cu, struct cus *cus, struct conf_load *conf, void *thr_data)
 {
 	cu__for_all_tags(cu, class_member__cache_byte_size, conf);
 
 	if (cu__language_reorders_offsets(cu))
 		cu__sort_types_by_offset(cu, conf);
 
+	cus__set_cu_state(cus, cu, CU__LOADED);
+
 	if (conf && conf->steal) {
 		return conf->steal(cu, conf, thr_data);
 	}
@@ -3031,7 +3033,7 @@ static int cu__finalize(struct cu *cu, struct conf_load *conf, void *thr_data)
 
 static int cus__finalize(struct cus *cus, struct cu *cu, struct conf_load *conf, void *thr_data)
 {
-	int lsk = cu__finalize(cu, conf, thr_data);
+	int lsk = cu__finalize(cu, cus, conf, thr_data);
 	switch (lsk) {
 	case LSK__DELETE:
 		cus__remove(cus, cu);
@@ -3508,7 +3510,7 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 	}
 
 	if (type_cu != NULL) {
-		type_lsk = cu__finalize(type_cu, conf, NULL);
+		type_lsk = cu__finalize(type_cu, cus, conf, NULL);
 		if (type_lsk == LSK__DELETE) {
 			cus__remove(cus, type_cu);
 		}
diff --git a/dwarves.c b/dwarves.c
index 3cd300db97973ce4..fbc8d8aa0060b7d0 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -469,6 +469,44 @@ void cus__unlock(struct cus *cus)
 	pthread_mutex_unlock(&cus->mutex);
 }
 
+void cus__set_cu_state(struct cus *cus, struct cu *cu, enum cu_state state)
+{
+	cus__lock(cus);
+	cu->state = state;
+	cus__unlock(cus);
+}
+
+// Used only when reproducible builds are desired
+struct cu *cus__get_next_processable_cu(struct cus *cus)
+{
+	struct cu *cu;
+
+	cus__lock(cus);
+
+	list_for_each_entry(cu, &cus->cus, node) {
+		switch (cu->state) {
+		case CU__LOADED:
+			cu->state = CU__PROCESSING;
+			goto found;
+		case CU__PROCESSING:
+			// This will only happen when we get to parallel
+			// reproducible BTF encoding, libbpf dedup work needed here.
+			continue;
+		case CU__UNPROCESSED:
+			// The first entry isn't loaded, signal the
+			// caller to return and try another day, as we
+			// need to respect the original DWARF CU ordering.
+			goto out;
+		}
+	}
+out:
+	cu = NULL;
+found:
+	cus__unlock(cus);
+
+	return cu;
+}
+
 bool cus__empty(const struct cus *cus)
 {
 	return list_empty(&cus->cus);
@@ -701,6 +739,8 @@ struct cu *cu__new(const char *name, uint8_t addr_size,
 		cu->addr_size = addr_size;
 		cu->extra_dbg_info = 0;
 
+		cu->state = CU__UNPROCESSED;
+
 		cu->nr_inline_expansions   = 0;
 		cu->size_inline_expansions = 0;
 		cu->nr_structures_changed  = 0;
diff --git a/dwarves.h b/dwarves.h
index 9dc9cb4b074b5d33..dd35a4efd6e5decb 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -43,6 +43,12 @@ enum load_steal_kind {
 	LSK__STOP_LOADING,
 };
 
+enum cu_state {
+	CU__UNPROCESSED,
+	CU__LOADED,
+	CU__PROCESSING,
+};
+
 /*
  * BTF combines all the types into one big CU using btf_dedup(), so for something
  * like a allyesconfig vmlinux kernel we can get over 65535 types.
@@ -177,6 +183,10 @@ void cus__add(struct cus *cus, struct cu *cu);
 void __cus__remove(struct cus *cus, struct cu *cu);
 void cus__remove(struct cus *cus, struct cu *cu);
 
+struct cu *cus__get_next_processable_cu(struct cus *cus);
+
+void cus__set_cu_state(struct cus *cus, struct cu *cu, enum cu_state state);
+
 void cus__print_error_msg(const char *progname, const struct cus *cus,
 			  const char *filename, const int err);
 struct cu *cus__find_pair(struct cus *cus, const char *name);
@@ -287,6 +297,7 @@ struct cu {
 	uint8_t		 little_endian:1;
 	uint8_t		 nr_register_params;
 	int		 register_params[ARCH_MAX_REGISTER_PARAMS];
+	enum cu_state	 state;
 	uint16_t	 language;
 	unsigned long	 nr_inline_expansions;
 	size_t		 size_inline_expansions;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 11/12] pahole: Encode BTF serially in a reproducible build
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (9 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-02 19:39 ` [PATCH 12/12] tests: Add a BTF reproducible generation test Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Now we will ask the cus instance for the next processable CU, i.e. one
that is loaded and is in the same CU order as in the original DWARF
file, under the BTF lock.

With this we can go on loading the DWARF file in parallel and only
serialize the BTF encoding, keeping that order, with this the BTF ids
end up the same both for a serial encoding:

And here are some numbers with a Release build:

  $ cat buildcmd.sh
  mkdir build
  cd build
  cmake -DCMAKE_BUILD_TYPE=Release ..
  cd ..
  make -j $(getconf _NPROCESSORS_ONLN) -C build
  $ rm -rf build
  $ ./buildcmd.sh

Its an Intel Hybrid system, and migrates to/from efficiency/perfomance
cores:

  $ getconf _NPROCESSORS_ONLN
  28
  $ grep -m1 'model name' /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-14700K
  $

8 performance cores (16 threads), 12 efficiency cores.

Serial encoding:

  $ time perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):

      13,313,169,305      cpu_atom/cycles:u/      ( +- 30.61% )  (0.00%)
      27,985,776,096      cpu_core/cycles:u/      ( +-  0.17% )  (100.00%)

             5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )

  real	0m25.937s
  user	0m25.337s
  sys	0m0.533s
  $

Parallel, but non-reproducible:

  $ time perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):

      65,781,092,442      cpu_atom/cycles:u/      ( +-  0.99% )  (42.99%)
      88,578,827,055      cpu_core/cycles:u/      ( +-  0.90% )  (60.93%)

              1.8529 +- 0.0159 seconds time elapsed  ( +-  0.86% )

  real	0m9.293s
  user	1m21.599s
  sys	0m11.348s
  $

Now what we want, a reproducible build done using parallel DWARF loading
+ CUs-ordered-as-in-vmlinux serial BTF encoding:

  $ time perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux

   Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux' (5 runs):

      21,255,687,225      cpu_atom/cycles:u/      ( +-  0.76% )  (35.06%)
      33,852,263,760      cpu_core/cycles:u/      ( +-  0.24% )  (72.70%)

              2.3632 +- 0.0164 seconds time elapsed  ( +-  0.69% )

  real	0m11.840s
  user	0m35.952s
  sys	0m1.534s
  $

Fastest is off course the unreproducible, fully parallel DWARF loading/
BTF encoding at 1.8529 +- 0.0159 seconds, but doing a reproducible build
in 2.3632 +- 0.0164 seconds is better than completely disabling -j/full
serial at 5.18276 +- 0.00952 seconds.

Comparing the BTF generated:

  $ bpftool btf dump file vmlinux.btf.serial > output.vmlinux.btf.serial
  $ bpftool btf dump file vmlinux.btf.parallel > output.vmlinux.btf.parallel
  $ bpftool btf dump file vmlinux.btf.parallel.reproducible > output.vmlinux.btf.parallel.reproducible

  $ wc -l output.vmlinux.btf.serial output.vmlinux.btf.parallel output.vmlinux.btf.parallel.reproducible
    313404 output.vmlinux.btf.serial
    314345 output.vmlinux.btf.parallel
    313404 output.vmlinux.btf.parallel.reproducible
    941153 total
  $

Non reproducible parallel BTF encoding:

  $ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel | head
  --- output.vmlinux.btf.serial	2024-04-02 11:11:56.665027947 -0300
  +++ output.vmlinux.btf.parallel	2024-04-02 11:12:38.490895460 -0300
  @@ -1,1708 +1,2553 @@
   [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
  -[2] CONST '(anon)' type_id=1
  -[3] VOLATILE '(anon)' type_id=2
  -[4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
  -[5] PTR '(anon)' type_id=8
  -[6] CONST '(anon)' type_id=5
  -[7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  $

Reproducible:

  $ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel.reproducible
  $

And using a test script that I'll add to a nascent repository of
regression tests:

  $ time tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding: Ok

  real	1m13.844s
  user	3m3.601s
  sys	0m9.049s
  $

If the number of threads started by pahole is different than what was
requests via its -j command line option, it will fail as well as if the
output of 'bpftool btf dump' differs from the BTF encoded totally
serially to one of the detached BTF encoded using reproducible DWARF
loading/BTF encoding.

In verbose mode:

  $ time VERBOSE=1 tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding:
  serial encoding...
  1 threads encoding
  1 threads started
  diff from serial encoding:
  -----------------------------
  2 threads encoding
  2 threads started
  diff from serial encoding:
  -----------------------------
  3 threads encoding
  3 threads started
  diff from serial encoding:
  -----------------------------
  4 threads encoding
  4 threads started
  diff from serial encoding:
  -----------------------------
  5 threads encoding
  5 threads started
  diff from serial encoding:
  -----------------------------
  6 threads encoding
  6 threads started
   diff from serial encoding:
  -----------------------------
  7 threads encoding
  7 threads started
  diff from serial encoding:
  -----------------------------
  8 threads encoding
  8 threads started
  diff from serial encoding:
  -----------------------------
  9 threads encoding
  9 threads started
  diff from serial encoding:
  -----------------------------
  10 threads encoding
  10 threads started
  diff from serial encoding:
  -----------------------------
  11 threads encoding
  11 threads started
  diff from serial encoding:
  -----------------------------
  12 threads encoding
  12 threads started
  diff from serial encoding:
  -----------------------------
  13 threads encoding
  13 threads started
  diff from serial encoding:
  -----------------------------
  14 threads encoding
  14 threads started
  diff from serial encoding:
  -----------------------------
  15 threads encoding
  15 threads started
  diff from serial encoding:
  -----------------------------
  16 threads encoding
  16 threads started
  diff from serial encoding:
  -----------------------------
  17 threads encoding
  17 threads started
  diff from serial encoding:
  -----------------------------
  18 threads encoding
  18 threads started
  diff from serial encoding:
  -----------------------------
  19 threads encoding
  19 threads started
  diff from serial encoding:
  -----------------------------
  20 threads encoding
  20 threads started
  diff from serial encoding:
  -----------------------------
  21 threads encoding
  21 threads started
  diff from serial encoding:
  -----------------------------
  22 threads encoding
  22 threads started
  diff from serial encoding:
  -----------------------------
  23 threads encoding
  23 threads started
  diff from serial encoding:
  -----------------------------
  24 threads encoding
  24 threads started
  diff from serial encoding:
  -----------------------------
  25 threads encoding
  25 threads started
  diff from serial encoding:
  -----------------------------
  26 threads encoding
  26 threads started
  diff from serial encoding:
  -----------------------------
  27 threads encoding
  27 threads started
  diff from serial encoding:
  -----------------------------
  28 threads encoding
  28 threads started
  diff from serial encoding:
  -----------------------------
  Ok

  real	1m14.800s
  user	3m4.315s
  sys	0m8.977s
  $

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 pahole.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/pahole.c b/pahole.c
index fcb4360f11debeb9..6c7e73835b3e9139 100644
--- a/pahole.c
+++ b/pahole.c
@@ -31,6 +31,7 @@
 
 static struct btf_encoder *btf_encoder;
 static char *detached_btf_filename;
+struct cus *cus;
 static bool btf_encode;
 static bool ctf_encode;
 static bool sort_output;
@@ -3324,11 +3325,32 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			encoder = btf_encoder;
 		}
 
+		// Since we don't have yet a way to parallelize the BTF encoding, we
+		// need to ask the loader for the next CU that we can process, one
+		// that is loaded and is in order, if the next one isn't yet loaded,
+		// then return to let the DWARF loader thread to load the next one,
+		// eventually all will get processed, even if when all DWARF loading
+		// threads finish.
+		if (conf_load->reproducible_build) {
+			ret = LSK__KEEPIT; // we're not processing the cu passed to this
+					  // function, so keep it.
+			cu = cus__get_next_processable_cu(cus);
+			if (cu == NULL)
+				goto out_btf;
+		}
+
 		ret = btf_encoder__encode_cu(encoder, cu, conf_load);
 		if (ret < 0) {
 			fprintf(stderr, "Encountered error while encoding BTF.\n");
 			exit(1);
 		}
+
+		if (conf_load->reproducible_build) {
+			ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
+			// Equivalent to LSK__DELETE since we processed this
+			cus__remove(cus, cu);
+			cu__delete(cu);
+		}
 out_btf:
 		if (!thr_data) // See comment about reproducibe_build above
 			pthread_mutex_unlock(&btf_lock);
@@ -3632,6 +3654,27 @@ out_free:
 	return ret;
 }
 
+static int cus__flush_reproducible_build(struct cus *cus, struct btf_encoder *encoder, struct conf_load *conf_load)
+{
+	int err = 0;
+
+	while (true) {
+		struct cu *cu = cus__get_next_processable_cu(cus);
+
+		if (cu == NULL)
+			break;
+
+		err = btf_encoder__encode_cu(encoder, cu, conf_load);
+		if (err < 0)
+			break;
+
+		cus__remove(cus, cu);
+		cu__delete(cu);
+	}
+
+	return err;
+}
+
 int main(int argc, char *argv[])
 {
 	int err, remaining, rc = EXIT_FAILURE;
@@ -3692,7 +3735,7 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	struct cus *cus = cus__new();
+	cus = cus__new();
 	if (cus == NULL) {
 		fputs("pahole: insufficient memory\n", stderr);
 		goto out_dwarves_exit;
@@ -3797,6 +3840,12 @@ try_sole_arg_as_class_names:
 	header = NULL;
 
 	if (btf_encode && btf_encoder) { // maybe all CUs were filtered out and thus we don't have an encoder?
+		if (conf_load.reproducible_build &&
+		    cus__flush_reproducible_build(cus, btf_encoder, &conf_load) < 0) {
+			fprintf(stderr, "Encountered error while encoding BTF.\n");
+			exit(1);
+		}
+
 		err = btf_encoder__encode(btf_encoder);
 		if (err) {
 			fputs("Failed to encode BTF\n", stderr);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 12/12] tests: Add a BTF reproducible generation test
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (10 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 11/12] pahole: Encode BTF serially in a reproducible build Arnaldo Carvalho de Melo
@ 2024-04-02 19:39 ` Arnaldo Carvalho de Melo
  2024-04-04  0:08 ` [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Eduard Zingerman
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-02 19:39 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

  $ time tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding: Ok

  real  1m13.844s
  user  3m3.601s
  sys   0m9.049s
  $

If the number of threads started by pahole is different than what was
requests via its -j command line option, it will fail as well as if the
output of 'bpftool btf dump' differs from the BTF encoded totally
serially to one of the detached BTF encoded using reproducible DWARF
loading/BTF encoding.

Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tests/reproducible_build.sh | 56 +++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100755 tests/reproducible_build.sh

diff --git a/tests/reproducible_build.sh b/tests/reproducible_build.sh
new file mode 100755
index 0000000000000000..9c72d548c2a21136
--- /dev/null
+++ b/tests/reproducible_build.sh
@@ -0,0 +1,56 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Test if BTF generated serially matches reproducible parallel DWARF loading + serial BTF encoding
+# Arnaldo Carvalho de Melo <acme@redhat.com> (C) 2024-
+
+vmlinux=$1
+outdir=$(mktemp -d /tmp/reproducible_build.sh.XXXXXX)
+
+echo -n "Parallel reproducible DWARF Loading/Serial BTF encoding: "
+
+test -n "$VERBOSE" && printf "\nserial encoding...\n"
+
+pahole --btf_encode_detached=$outdir/vmlinux.btf.serial $vmlinux
+bpftool btf dump file $outdir/vmlinux.btf.serial > $outdir/bpftool.output.vmlinux.btf.serial
+
+nr_proc=$(getconf _NPROCESSORS_ONLN)
+
+for threads in $(seq $nr_proc) ; do
+	test -n "$VERBOSE" && echo $threads threads encoding
+	pahole -j$threads --reproducible_build --btf_encode_detached=$outdir/vmlinux.btf.parallel.reproducible $vmlinux &
+	pahole=$!
+	# HACK: Wait a bit for pahole to start its threads
+	sleep 0.3s
+	# PID part to remove ps output headers
+	nr_threads_started=$(ps -L -C pahole | grep -v PID | wc -l)
+
+	if [ $threads -gt 1 ] ; then
+		((nr_threads_started -= 1))
+	fi
+
+	if [ $threads != $nr_threads_started ] ; then
+		echo "ERROR: pahole asked to start $threads encoding threads, started $nr_threads_started"
+		exit 1;
+	fi
+
+	# ps -L -C pahole | grep -v PID | nl
+	test -n "$VERBOSE" && echo $nr_threads_started threads started
+	wait $pahole
+	rm -f $outdir/bpftool.output.vmlinux.btf.parallel.reproducible
+	bpftool btf dump file $outdir/vmlinux.btf.parallel.reproducible > $outdir/bpftool.output.vmlinux.btf.parallel.reproducible
+	test -n "$VERBOSE" && echo "diff from serial encoding:"
+	diff -u $outdir/bpftool.output.vmlinux.btf.serial $outdir/bpftool.output.vmlinux.btf.parallel.reproducible > $outdir/diff
+	if [ -s $outdir/diff ] ; then
+		echo "ERROR: BTF generated from DWARF in parallel is different from the one generated in serial!"
+		exit 1
+	fi
+	test -n "$VERBOSE" && echo -----------------------------
+done
+
+rm $outdir/*
+rmdir $outdir
+
+echo "Ok"
+
+exit 0
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds
  2024-04-02 19:39 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
@ 2024-04-03 18:19   ` Andrii Nakryiko
  2024-04-03 21:38     ` Arnaldo Carvalho de Melo
  2024-04-04  9:42   ` Jiri Olsa
  1 sibling, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2024-04-03 18:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: dwarves, Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, Apr 2, 2024 at 12:40 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
>
> Reproducible builds need to produce BTF that have the same ids, which is
> not possible at the moment to do in parallel with libbpf, so serialize
> the encoding.
>
> The next patches will also make sure that DWARF while being read in
> parallel into internal representation for later BTF encoding has its CU
> (Compile Units) fed to the BTF encoder in the same order as it is in the
> DWARF file, this way we'll produce the same BTF output no matter how
> many threads are used to read BTF.
>
> Then we'll make sure we have tests in place that compare the output of
> parallel BTF encoding (well, just the DWARF loading part, maybe the BTF
> in the future), i.e. when using 'pahole -j' with the one obtained when
> doing single threaded encoding.
>
> Testing it on a:
>
>   # grep -m1 "model name" /proc/cpuinfo
>   model name    : 13th Gen Intel(R) Core(TM) i7-1365U
>   ~#
>
> I.e. 2 performance cores (4 threads) + 8 efficiency cores.
>
> From:
>
>   $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
>
>    Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):
>
>          17,187.27 msec task-clock:u       #    6.153 CPUs utilized   ( +-  0.34% )
>   <SNIP>
>             2.7931 +- 0.0336 seconds time elapsed  ( +-  1.20% )
>
>   $
>
> To:
>
>   $ perf stat -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
>
>    Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux' (5 runs):
>
>          14,654.06 msec task-clock:u       #    3.507 CPUs utilized   ( +-  0.45% )
>   <SNIP>
>             4.1787 +- 0.0344 seconds time elapsed  ( +-  0.82% )
>
>   $
>
> Which is still a nice improvement over doing it completely serially:
>
>   $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
>
>    Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):
>
>           7,506.93 msec task-clock:u       #    1.000 CPUs utilized   ( +-  0.13% )
>   <SNIP>
>             7.5106 +- 0.0115 seconds time elapsed  ( +-  0.15% )
>
>   $
>
>   $ pahole vmlinux.btf.parallel > /tmp/parallel
>   $ pahole vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build
>   $ diff -u /tmp/parallel /tmp/parallel.reproducible_build | wc -l
>   269920
>   $ pahole --sort vmlinux.btf.parallel > /tmp/parallel.sorted
>   $ pahole --sort vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build.sorted
>   $ diff -u /tmp/parallel.sorted /tmp/parallel.reproducible_build.sorted | wc -l
>   0
>   $
>
> The BTF ids continue to be undeterministic, as we need to process the
> CUs (compile unites) in the same order that they are on vmlinux:
>
>   $ bpftool btf dump file vmlinux.btf.serial > btfdump.serial
>   $ bpftool btf dump file vmlinux.btf.parallel.reproducible_build > btfdump.parallel.reproducible_build
>   $ bpftool btf dump file vmlinux.btf.parallel > btfdump.parallel
>   $ diff -u btfdump.serial btfdump.parallel | wc -l
>   624144
>   $ diff -u btfdump.serial btfdump.parallel.reproducible_build | wc -l
>   594622
>   $ diff -u btfdump.parallel.reproducible_build btfdump.parallel | wc -l
>   623355
>   $
>
> The BTF ids don't match, we'll get them to match at the end of this
> patch series:
>
>   $ tail -5 btfdump.serial
>         type_id=127124 offset=219200 size=40 (VAR 'rt6_uncached_list')
>         type_id=11760 offset=221184 size=64 (VAR 'vmw_steal_time')
>         type_id=13533 offset=221248 size=8 (VAR 'kvm_apic_eoi')
>         type_id=13532 offset=221312 size=64 (VAR 'steal_time')
>         type_id=13531 offset=221376 size=68 (VAR 'apf_reason')
>   $ tail -5 btfdump.parallel.reproducible_build
>         type_id=113812 offset=219200 size=40 (VAR 'rt6_uncached_list')
>         type_id=87979 offset=221184 size=64 (VAR 'vmw_steal_time')
>         type_id=127391 offset=221248 size=8 (VAR 'kvm_apic_eoi')
>         type_id=127390 offset=221312 size=64 (VAR 'steal_time')
>         type_id=127389 offset=221376 size=68 (VAR 'apf_reason')
>   $
>
> Now to make it process the CUs in order, that should get everything
> straight without hopefully not degrading it further too much.
>
> Cc: Alan Maguire <alan.maguire@oracle.com>
> Cc: Kui-Feng Lee <kuifeng@fb.com>
> Cc: Thomas Weißschuh <linux@weissschuh.net>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  pahole.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>

We can still produce per-thread smaller BTFs in parallel, that won't
hurt reproducibility. You only need to concatenate them in
reproducible order at the very end.

Or maybe it's already working like this, not sure, I'm a bit rusty in
pahole internals nowadays, sorry :)

> diff --git a/pahole.c b/pahole.c
> index 96e153432fa212a5..fcb4360f11debeb9 100644
> --- a/pahole.c
> +++ b/pahole.c
> @@ -3173,6 +3173,14 @@ struct thread_data {
>         struct btf_encoder *encoder;
>  };

[...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds
  2024-04-03 18:19   ` Andrii Nakryiko
@ 2024-04-03 21:38     ` Arnaldo Carvalho de Melo
  2024-04-03 21:43       ` Andrii Nakryiko
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-03 21:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: dwarves, Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

On Wed, Apr 03, 2024 at 11:19:33AM -0700, Andrii Nakryiko wrote:
> We can still produce per-thread smaller BTFs in parallel, that won't
> hurt reproducibility. You only need to concatenate them in
> reproducible order at the very end.
> 
> Or maybe it's already working like this, not sure, I'm a bit rusty in
> pahole internals nowadays, sorry :)

Yeah, its just that I didn't get to that point yet, I just stopped here
to have the parallel reproducible feature working, will continue with
the parallel BTF part.

- Arnaldo
 
> > diff --git a/pahole.c b/pahole.c
> > index 96e153432fa212a5..fcb4360f11debeb9 100644
> > --- a/pahole.c
> > +++ b/pahole.c
> > @@ -3173,6 +3173,14 @@ struct thread_data {
> >         struct btf_encoder *encoder;
> >  };
> 
> [...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds
  2024-04-03 21:38     ` Arnaldo Carvalho de Melo
@ 2024-04-03 21:43       ` Andrii Nakryiko
  0 siblings, 0 replies; 37+ messages in thread
From: Andrii Nakryiko @ 2024-04-03 21:43 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: dwarves, Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

On Wed, Apr 3, 2024 at 2:38 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> On Wed, Apr 03, 2024 at 11:19:33AM -0700, Andrii Nakryiko wrote:
> > We can still produce per-thread smaller BTFs in parallel, that won't
> > hurt reproducibility. You only need to concatenate them in
> > reproducible order at the very end.
> >
> > Or maybe it's already working like this, not sure, I'm a bit rusty in
> > pahole internals nowadays, sorry :)
>
> Yeah, its just that I didn't get to that point yet, I just stopped here
> to have the parallel reproducible feature working, will continue with
> the parallel BTF part.

Great to hear, thanks, Arnaldo!

>
> - Arnaldo
>
> > > diff --git a/pahole.c b/pahole.c
> > > index 96e153432fa212a5..fcb4360f11debeb9 100644
> > > --- a/pahole.c
> > > +++ b/pahole.c
> > > @@ -3173,6 +3173,14 @@ struct thread_data {
> > >         struct btf_encoder *encoder;
> > >  };
> >
> > [...]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (11 preceding siblings ...)
  2024-04-02 19:39 ` [PATCH 12/12] tests: Add a BTF reproducible generation test Arnaldo Carvalho de Melo
@ 2024-04-04  0:08 ` Eduard Zingerman
  2024-04-04  8:05   ` Alan Maguire
  2024-04-04  8:58 ` Alan Maguire
  2024-04-04  9:42 ` Jiri Olsa
  14 siblings, 1 reply; 37+ messages in thread
From: Eduard Zingerman @ 2024-04-04  0:08 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf, Alan Maguire,
	Kui-Feng Lee, Thomas Weißschuh

On Tue, 2024-04-02 at 16:39 -0300, Arnaldo Carvalho de Melo wrote:
> Hi,
> 
> 	This allows us to have reproducible builds while keeping the
> DWARF loading phase in parallel, achieving a noticeable speedup as
> showed in the commit log messages:

[...]

> Working on libbpf to allow for parallel reproducible BTF encoding is the
> next step.

Another option would be to apply some sort of canonical ordering to
BTF itself. E.g. put all PTR before STRUCT, sort same kinds by name,
sort same kinds by vlen, etc. Something akin to [1], however this
experiment has several flaws:
- slowdown is much worse than with your patch-set;
- I see a small number of functions with identical names appearing and
  disappearing from final BTF.
  
I'll try to figure out the reason for slowdown tomorrow.

[1] https://github.com/eddyz87/dwarves/tree/sort-btf

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-04  0:08 ` [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Eduard Zingerman
@ 2024-04-04  8:05   ` Alan Maguire
  2024-04-09 14:34     ` Eduard Zingerman
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Maguire @ 2024-04-04  8:05 UTC (permalink / raw)
  To: Eduard Zingerman, Arnaldo Carvalho de Melo, dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On 04/04/2024 01:08, Eduard Zingerman wrote:
> On Tue, 2024-04-02 at 16:39 -0300, Arnaldo Carvalho de Melo wrote:
>> Hi,
>>
>> 	This allows us to have reproducible builds while keeping the
>> DWARF loading phase in parallel, achieving a noticeable speedup as
>> showed in the commit log messages:
> 
> [...]
> 
>> Working on libbpf to allow for parallel reproducible BTF encoding is the
>> next step.
> 
> Another option would be to apply some sort of canonical ordering to
> BTF itself. E.g. put all PTR before STRUCT, sort same kinds by name,
> sort same kinds by vlen, etc. Something akin to [1], however this
> experiment has several flaws:
> - slowdown is much worse than with your patch-set;
> - I see a small number of functions with identical names appearing and
>   disappearing from final BTF.
>   

Could that be the handling of functions with same name, inconsistent
prototypes? We leave them out deliberately (see
btf_encoder__add_saved_funcs().

> I'll try to figure out the reason for slowdown tomorrow.
> 
> [1] https://github.com/eddyz87/dwarves/tree/sort-btf
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (12 preceding siblings ...)
  2024-04-04  0:08 ` [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Eduard Zingerman
@ 2024-04-04  8:58 ` Alan Maguire
  2024-04-08 12:00   ` Alan Maguire
  2024-04-04  9:42 ` Jiri Olsa
  14 siblings, 1 reply; 37+ messages in thread
From: Alan Maguire @ 2024-04-04  8:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On 02/04/2024 20:39, Arnaldo Carvalho de Melo wrote:
> Hi,
> 
> 	This allows us to have reproducible builds while keeping the
> DWARF loading phase in parallel, achieving a noticeable speedup as
> showed in the commit log messages:
> 
> On a:
> 
>   model name    : Intel(R) Core(TM) i7-14700K
> 
>   8 performance cores (16 threads), 12 efficiency cores.
> 
> Serial encoding:
> 
>   $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
>              5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )
> 
> Parallel, but non-reproducible:
> 
>   $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
>               1.8529 +- 0.0159 seconds time elapsed  ( +-  0.86% )
> 
> reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding:
> 
>   $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
>               2.3632 +- 0.0164 seconds time elapsed  ( +-  0.69% )
> 
> Please take a look, its in the 'next' branch at:
> 
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next
> 
> There is a new tool to do regression testing on this feature:
> 
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630
>   
> And here a more detailed set of tests using it:
> 
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7
> 
> Working on libbpf to allow for parallel reproducible BTF encoding is the
> next step.
> 
> Thanks a lot,
> 

Hey Arnaldo

In testing this series I've hit a segmentation fault:

Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `pahole -J --btf_features=all --reproducible_build
-j vmlinux'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
    at /home/almagui/src/dwarves/dwarves.c:612
612		return id >= pt->nr_entries ? NULL : pt->entries[id];
[Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
(gdb) bt
#0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
    at /home/almagui/src/dwarves/dwarves.c:612
#1  0x00007f8c8260ada2 in cu__type (cu=0x7f8c60001e40, id=77)
    at /home/almagui/src/dwarves/dwarves.c:806
#2  0x00007f8c8261342c in ftype__fprintf (ftype=0x7f8c60272f30,
    cu=0x7f8c60001e40, name=0x0, inlined=0, is_pointer=0, type_spacing=0,
    is_prototype=true, conf=0x7f8c653ff930, fp=0x7f8c3804bc90)
    at /home/almagui/src/dwarves/dwarves_fprintf.c:1388
#3  0x00007f8c8261289d in function__prototype_conf (func=0x7f8c60272f30,
    cu=0x7f8c60001e40, conf=0x7f8c653ff930, bf=0x7f8c27225dad "", len=512)
    at /home/almagui/src/dwarves/dwarves_fprintf.c:1183
#4  0x00007f8c8261b52b in proto__get (func=0x7f8c60272f30,
    proto=0x7f8c27225dad "", len=512)
    at /home/almagui/src/dwarves/btf_encoder.c:811
#5  0x00007f8c8261b665 in funcs__match (encoder=0x7f8c28023220,
    func=0x7f8c27225d88, f2=0x7f8c5805c560)
    at /home/almagui/src/dwarves/btf_encoder.c:839
#6  0x00007f8c8261b7fc in btf_encoder__save_func (encoder=0x7f8c28023220,
    fn=0x7f8c5805c560, func=0x7f8c27225d88)
    at /home/almagui/src/dwarves/btf_encoder.c:871
#7  0x00007f8c8261e361 in btf_encoder__encode_cu (encoder=0x7f8c28023220,
    cu=0x7f8c58001e20, conf_load=0x412400 <conf_load>)
    at /home/almagui/src/dwarves/btf_encoder.c:1888
#8  0x000000000040a36c in pahole_stealer (cu=0x7f8c58001e20,
    conf_load=0x412400 <conf_load>, thr_data=0x0)
    at /home/almagui/src/dwarves/pahole.c:3342
#9  0x00007f8c8262672c in cu__finalize (cu=0x7f8c38001e20, cus=0x21412a0,
    conf=0x412400 <conf_load>, thr_data=0x0)
    at /home/almagui/src/dwarves/dwarf_loader.c:3029
#10 0x00007f8c82626765 in cus__finalize (cus=0x21412a0, cu=0x7f8c38001e20,
    conf=0x412400 <conf_load>, thr_data=0x0)
    at /home/almagui/src/dwarves/dwarf_loader.c:3036
#11 0x00007f8c82626e9b in dwarf_cus__process_cu (dcus=0x7ffd71eaf0d0,
    cu_die=0x7f8c653ffeb0, cu=0x7f8c38001e20, thr_data=0x0)
    at /home/almagui/src/dwarves/dwarf_loader.c:3243
#12 0x00007f8c826270d2 in dwarf_cus__process_cu_thread (arg=0x7ffd71eaef50)
    at /home/almagui/src/dwarves/dwarf_loader.c:3313
#13 0x00007f8c816081da in start_thread () from /usr/lib64/libpthread.so.0
#14 0x00007f8c81239e73 in clone () from /usr/lib64/libc.so.6

So for conf_load->skip_encoding_btf_inconsistent_proto (enabled as part
of "all" and enabled for vmlinux/module BTF), we use dwarves_fprintf()
to write prototypes to check for inconsistent definitions.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
    at /home/almagui/src/dwarves/dwarves.c:612
612		return id >= pt->nr_entries ? NULL : pt->entries[id];
[Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
(gdb) print *(struct ptr_table *)0x7f8c60001e70
$1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
(gdb)

So it looks like the ptr_table has 2979 entries but entries is NULL;
could there be an issue where CU initialization is not yet complete
for some threads (it also happens very early in processing)? Can you
reproduce this failure at your end? Thanks!

Alan

> - Arnaldo
> 
> Arnaldo Carvalho de Melo (12):
>   core: Allow asking for a reproducible build
>   pahole: Disable BTF multithreaded encoded when doing reproducible builds
>   dwarf_loader: Separate creating the cu/dcu pair from processing it
>   dwarf_loader: Introduce dwarf_cus__process_cu()
>   dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
>   dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
>   core: Add unlocked cus__add() variant
>   core: Add cus__remove(), counterpart of cus__add()
>   dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
>   core/dwarf_loader: Add functions to set state of CU processing
>   pahole: Encode BTF serially in a reproducible build
>   tests: Add a BTF reproducible generation test
> 
>  dwarf_loader.c              | 73 +++++++++++++++++++++++---------
>  dwarves.c                   | 58 ++++++++++++++++++++++++-
>  dwarves.h                   | 17 ++++++++
>  pahole.c                    | 84 +++++++++++++++++++++++++++++++++++--
>  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
>  5 files changed, 264 insertions(+), 24 deletions(-)
>  create mode 100755 tests/reproducible_build.sh
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
                   ` (13 preceding siblings ...)
  2024-04-04  8:58 ` Alan Maguire
@ 2024-04-04  9:42 ` Jiri Olsa
  14 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2024-04-04  9:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: dwarves, Clark Williams, Kate Carcia, bpf, Alan Maguire,
	Kui-Feng Lee, Thomas Weißschuh

On Tue, Apr 02, 2024 at 04:39:33PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi,
> 
> 	This allows us to have reproducible builds while keeping the
> DWARF loading phase in parallel, achieving a noticeable speedup as
> showed in the commit log messages:
> 
> On a:
> 
>   model name    : Intel(R) Core(TM) i7-14700K
> 
>   8 performance cores (16 threads), 12 efficiency cores.
> 
> Serial encoding:
> 
>   $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
>              5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )
> 
> Parallel, but non-reproducible:
> 
>   $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
>               1.8529 +- 0.0159 seconds time elapsed  ( +-  0.86% )
> 
> reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding:
> 
>   $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
>               2.3632 +- 0.0164 seconds time elapsed  ( +-  0.69% )

hm, got it even faster than paralel build ;-) but it's within the
1 second deviation, I guess it shows better on bigger kernels

reproducible_build:

	[root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole -j --reproducible_build --btf_encode_detached=btf.2 ./vmlinux

	 Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole -j --reproducible_build --btf_encode_detached=btf.2 ./vmlinux' (3 runs):

	   295,519,117,258      cycles                                                                  ( +- 19.48% )

		      9.43 +- 1.02 seconds time elapsed  ( +- 10.84% )

paralel build:

	[root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole -j  --btf_encode_detached=btf.2 ./vmlinux

	 Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole -j --btf_encode_detached=btf.2 ./vmlinux' (3 runs):

	   391,320,973,331      cycles                                                                  ( +- 19.19% )

		     9.851 +- 0.695 seconds time elapsed  ( +-  7.06% )

1 cpu:

	[root@krava linux-qemu]# perf stat -e cycles -r 3 -- /home/jolsa/kernel/bpf/pahole/build/pahole --btf_encode_detached=btf.2 ./vmlinux

	 Performance counter stats for '/home/jolsa/kernel/bpf/pahole/build/pahole --btf_encode_detached=btf.2 ./vmlinux' (3 runs):

	   208,492,342,135      cycles                                                                  ( +- 19.43% )

		    16.761 +- 0.644 seconds time elapsed  ( +-  3.84% )

jirka

> 
> Please take a look, its in the 'next' branch at:
> 
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next
> 
> There is a new tool to do regression testing on this feature:
> 
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630
>   
> And here a more detailed set of tests using it:
> 
>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7
> 
> Working on libbpf to allow for parallel reproducible BTF encoding is the
> next step.
> 
> Thanks a lot,
> 
> - Arnaldo
> 
> Arnaldo Carvalho de Melo (12):
>   core: Allow asking for a reproducible build
>   pahole: Disable BTF multithreaded encoded when doing reproducible builds
>   dwarf_loader: Separate creating the cu/dcu pair from processing it
>   dwarf_loader: Introduce dwarf_cus__process_cu()
>   dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
>   dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
>   core: Add unlocked cus__add() variant
>   core: Add cus__remove(), counterpart of cus__add()
>   dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
>   core/dwarf_loader: Add functions to set state of CU processing
>   pahole: Encode BTF serially in a reproducible build
>   tests: Add a BTF reproducible generation test
> 
>  dwarf_loader.c              | 73 +++++++++++++++++++++++---------
>  dwarves.c                   | 58 ++++++++++++++++++++++++-
>  dwarves.h                   | 17 ++++++++
>  pahole.c                    | 84 +++++++++++++++++++++++++++++++++++--
>  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
>  5 files changed, 264 insertions(+), 24 deletions(-)
>  create mode 100755 tests/reproducible_build.sh
> 
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it
  2024-04-02 19:39 ` [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it Arnaldo Carvalho de Melo
@ 2024-04-04  9:42   ` Jiri Olsa
  0 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2024-04-04  9:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: dwarves, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, Apr 02, 2024 at 04:39:36PM -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> We will need it so that we add the dcu to a list in the same order as
> the CUs are in the DWARF file (vmlinux mostly).
> 
> Cc: Alan Maguire <alan.maguire@oracle.com>
> Cc: Kui-Feng Lee <kuifeng@fb.com>
> Cc: Thomas Weißschuh <linux@weissschuh.net>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  dwarf_loader.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/dwarf_loader.c b/dwarf_loader.c
> index 1dffb3f433cb7c8e..125e361ef2bf3f7b 100644
> --- a/dwarf_loader.c
> +++ b/dwarf_loader.c
> @@ -3207,8 +3207,7 @@ struct dwarf_thread {
>  	void			*data;
>  };
>  
> -static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
> -					    uint8_t pointer_size, void *thr_data)
> +static struct dwarf_cu *dwarf_cus__create_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die, uint8_t pointer_size)
>  {
>  	/*
>  	 * DW_AT_name in DW_TAG_compile_unit can be NULL, first seen in:
> @@ -3218,17 +3217,32 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
>  	const char *name = attr_string(cu_die, DW_AT_name, dcus->conf);
>  	struct cu *cu = cu__new(name ?: "", pointer_size, dcus->build_id, dcus->build_id_len, dcus->filename, dcus->conf->use_obstack);
>  	if (cu == NULL || cu__set_common(cu, dcus->conf, dcus->mod, dcus->elf) != 0)
> -		return DWARF_CB_ABORT;
> +		return NULL;
>  
>  	struct dwarf_cu *dcu = dwarf_cu__new(cu);
>  
> -	if (dcu == NULL)
> -		return DWARF_CB_ABORT;
> +	if (dcu == NULL) {
> +		cu__delete(cu);

hm, I dont see cu__delete being called before, why do we need that?

jirka

> +		return NULL;
> +	}
>  
>  	dcu->type_unit = dcus->type_dcu;
>  	cu->priv = dcu;
>  	cu->dfops = &dwarf__ops;
>  
> +	return dcu;
> +}
> +
> +static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
> +					    uint8_t pointer_size, void *thr_data)
> +{
> +	struct dwarf_cu *dcu = dwarf_cus__create_cu(dcus, cu_die, pointer_size);
> +
> +	if (dcu == NULL)
> +		return DWARF_CB_ABORT;
> +
> +	struct cu *cu = dcu->cu;
> +
>  	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
>  	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
>  		return DWARF_CB_ABORT;
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds
  2024-04-02 19:39 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
  2024-04-03 18:19   ` Andrii Nakryiko
@ 2024-04-04  9:42   ` Jiri Olsa
  1 sibling, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2024-04-04  9:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: dwarves, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, Apr 02, 2024 at 04:39:35PM -0300, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> Reproducible builds need to produce BTF that have the same ids, which is
> not possible at the moment to do in parallel with libbpf, so serialize
> the encoding.
> 
> The next patches will also make sure that DWARF while being read in
> parallel into internal representation for later BTF encoding has its CU
> (Compile Units) fed to the BTF encoder in the same order as it is in the
> DWARF file, this way we'll produce the same BTF output no matter how
> many threads are used to read BTF.
> 
> Then we'll make sure we have tests in place that compare the output of
> parallel BTF encoding (well, just the DWARF loading part, maybe the BTF
> in the future), i.e. when using 'pahole -j' with the one obtained when
> doing single threaded encoding.
> 
> Testing it on a:
> 
>   # grep -m1 "model name" /proc/cpuinfo
>   model name	: 13th Gen Intel(R) Core(TM) i7-1365U
>   ~#
> 
> I.e. 2 performance cores (4 threads) + 8 efficiency cores.
> 
> From:
> 
>   $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
> 
>    Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):
> 
>          17,187.27 msec task-clock:u       #    6.153 CPUs utilized   ( +-  0.34% )
>   <SNIP>
>             2.7931 +- 0.0336 seconds time elapsed  ( +-  1.20% )
> 
>   $
> 
> To:
> 
>   $ perf stat -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
> 
>    Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux' (5 runs):
> 
>          14,654.06 msec task-clock:u       #    3.507 CPUs utilized   ( +-  0.45% )
>   <SNIP>
>             4.1787 +- 0.0344 seconds time elapsed  ( +-  0.82% )
> 
>   $
> 
> Which is still a nice improvement over doing it completely serially:
> 
>   $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
> 
>    Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):
> 
>           7,506.93 msec task-clock:u       #    1.000 CPUs utilized   ( +-  0.13% )
>   <SNIP>
>             7.5106 +- 0.0115 seconds time elapsed  ( +-  0.15% )
> 
>   $
> 
>   $ pahole vmlinux.btf.parallel > /tmp/parallel
>   $ pahole vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build
>   $ diff -u /tmp/parallel /tmp/parallel.reproducible_build | wc -l
>   269920
>   $ pahole --sort vmlinux.btf.parallel > /tmp/parallel.sorted
>   $ pahole --sort vmlinux.btf.parallel.reproducible_build > /tmp/parallel.reproducible_build.sorted
>   $ diff -u /tmp/parallel.sorted /tmp/parallel.reproducible_build.sorted | wc -l
>   0
>   $
> 
> The BTF ids continue to be undeterministic, as we need to process the
> CUs (compile unites) in the same order that they are on vmlinux:
> 
>   $ bpftool btf dump file vmlinux.btf.serial > btfdump.serial
>   $ bpftool btf dump file vmlinux.btf.parallel.reproducible_build > btfdump.parallel.reproducible_build
>   $ bpftool btf dump file vmlinux.btf.parallel > btfdump.parallel
>   $ diff -u btfdump.serial btfdump.parallel | wc -l
>   624144
>   $ diff -u btfdump.serial btfdump.parallel.reproducible_build | wc -l
>   594622
>   $ diff -u btfdump.parallel.reproducible_build btfdump.parallel | wc -l
>   623355
>   $
> 
> The BTF ids don't match, we'll get them to match at the end of this
> patch series:
> 
>   $ tail -5 btfdump.serial
>   	type_id=127124 offset=219200 size=40 (VAR 'rt6_uncached_list')
>   	type_id=11760 offset=221184 size=64 (VAR 'vmw_steal_time')
>   	type_id=13533 offset=221248 size=8 (VAR 'kvm_apic_eoi')
>   	type_id=13532 offset=221312 size=64 (VAR 'steal_time')
>   	type_id=13531 offset=221376 size=68 (VAR 'apf_reason')
>   $ tail -5 btfdump.parallel.reproducible_build
>   	type_id=113812 offset=219200 size=40 (VAR 'rt6_uncached_list')
>   	type_id=87979 offset=221184 size=64 (VAR 'vmw_steal_time')
>   	type_id=127391 offset=221248 size=8 (VAR 'kvm_apic_eoi')
>   	type_id=127390 offset=221312 size=64 (VAR 'steal_time')
>   	type_id=127389 offset=221376 size=68 (VAR 'apf_reason')
>   $
> 
> Now to make it process the CUs in order, that should get everything
> straight without hopefully not degrading it further too much.
> 
> Cc: Alan Maguire <alan.maguire@oracle.com>
> Cc: Kui-Feng Lee <kuifeng@fb.com>
> Cc: Thomas Weißschuh <linux@weissschuh.net>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  pahole.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/pahole.c b/pahole.c
> index 96e153432fa212a5..fcb4360f11debeb9 100644
> --- a/pahole.c
> +++ b/pahole.c
> @@ -3173,6 +3173,14 @@ struct thread_data {
>  	struct btf_encoder *encoder;
>  };
>  
> +static int pahole_threads_prepare_reproducible_build(struct conf_load *conf, int nr_threads, void **thr_data)
> +{
> +	for (int i = 0; i < nr_threads; i++)
> +		thr_data[i] = NULL;
> +
> +	return 0;
> +}
> +
>  static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
>  {
>  	int i;
> @@ -3283,7 +3291,10 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
>  				thread->btf = btf_encoder__btf(btf_encoder);
>  			}
>  		}
> -		pthread_mutex_unlock(&btf_lock);
> +
> +		// Reproducible builds don't have multiple btf_encoders, so we need to keep the lock until we encode BTF for this CU.
> +		if (thr_data)
> +			pthread_mutex_unlock(&btf_lock);

so the idea is that this code is executed in threads but with
NULL in thr_data , right?

>  
>  		if (!btf_encoder) {
>  			ret = LSK__STOP_LOADING;
> @@ -3319,6 +3330,8 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
>  			exit(1);
>  		}
>  out_btf:
> +		if (!thr_data) // See comment about reproducibe_build above
> +			pthread_mutex_unlock(&btf_lock);
>  		return ret;
>  	}
>  #if 0
> @@ -3689,8 +3702,14 @@ int main(int argc, char *argv[])
>  
>  	conf_load.steal = pahole_stealer;
>  	conf_load.thread_exit = pahole_thread_exit;
> -	conf_load.threads_prepare = pahole_threads_prepare;
> -	conf_load.threads_collect = pahole_threads_collect;
> +
> +	if (conf_load.reproducible_build) {
> +		conf_load.threads_prepare = pahole_threads_prepare_reproducible_build;

would it be enough just to set conf_load.threads_prepare to NULL? 

there's memset in dwarf_cus__threaded_process_cus doing the same
thing as pahole_threads_prepare_reproducible_build

jirka

> +		conf_load.threads_collect = NULL;
> +	} else {
> +		conf_load.threads_prepare = pahole_threads_prepare;
> +		conf_load.threads_collect = pahole_threads_collect;
> +	}
>  
>  	// Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
>  	if (conf.header_type && !class_name && prettify_input) {
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-04  8:58 ` Alan Maguire
@ 2024-04-08 12:00   ` Alan Maguire
  2024-04-08 14:39     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 37+ messages in thread
From: Alan Maguire @ 2024-04-08 12:00 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On 04/04/2024 09:58, Alan Maguire wrote:
> On 02/04/2024 20:39, Arnaldo Carvalho de Melo wrote:
>> Hi,
>>
>> 	This allows us to have reproducible builds while keeping the
>> DWARF loading phase in parallel, achieving a noticeable speedup as
>> showed in the commit log messages:
>>
>> On a:
>>
>>   model name    : Intel(R) Core(TM) i7-14700K
>>
>>   8 performance cores (16 threads), 12 efficiency cores.
>>
>> Serial encoding:
>>
>>   $ perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux
>>              5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )
>>
>> Parallel, but non-reproducible:
>>
>>   $ perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux
>>               1.8529 +- 0.0159 seconds time elapsed  ( +-  0.86% )
>>
>> reproducible build done using parallel DWARF loading + CUs-ordered-as-in-vmlinux serial BTF encoding:
>>
>>   $ perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible_build vmlinux
>>               2.3632 +- 0.0164 seconds time elapsed  ( +-  0.69% )
>>
>> Please take a look, its in the 'next' branch at:
>>
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/log/?h=next
>>
>> There is a new tool to do regression testing on this feature:
>>
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=c751214c19bf8591bf8e4abdc677cbadee08f630
>>   
>> And here a more detailed set of tests using it:
>>
>>   https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=next&id=4451467ca16a6e31834f6f98661c63587ce556f7
>>
>> Working on libbpf to allow for parallel reproducible BTF encoding is the
>> next step.
>>
>> Thanks a lot,
>>
> 
> Hey Arnaldo
> 
> In testing this series I've hit a segmentation fault:
> 
> Using host libthread_db library "/usr/lib64/libthread_db.so.1".
> Core was generated by `pahole -J --btf_features=all --reproducible_build
> -j vmlinux'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> (gdb) bt
> #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> #1  0x00007f8c8260ada2 in cu__type (cu=0x7f8c60001e40, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:806
> #2  0x00007f8c8261342c in ftype__fprintf (ftype=0x7f8c60272f30,
>     cu=0x7f8c60001e40, name=0x0, inlined=0, is_pointer=0, type_spacing=0,
>     is_prototype=true, conf=0x7f8c653ff930, fp=0x7f8c3804bc90)
>     at /home/almagui/src/dwarves/dwarves_fprintf.c:1388
> #3  0x00007f8c8261289d in function__prototype_conf (func=0x7f8c60272f30,
>     cu=0x7f8c60001e40, conf=0x7f8c653ff930, bf=0x7f8c27225dad "", len=512)
>     at /home/almagui/src/dwarves/dwarves_fprintf.c:1183
> #4  0x00007f8c8261b52b in proto__get (func=0x7f8c60272f30,
>     proto=0x7f8c27225dad "", len=512)
>     at /home/almagui/src/dwarves/btf_encoder.c:811
> #5  0x00007f8c8261b665 in funcs__match (encoder=0x7f8c28023220,
>     func=0x7f8c27225d88, f2=0x7f8c5805c560)
>     at /home/almagui/src/dwarves/btf_encoder.c:839
> #6  0x00007f8c8261b7fc in btf_encoder__save_func (encoder=0x7f8c28023220,
>     fn=0x7f8c5805c560, func=0x7f8c27225d88)
>     at /home/almagui/src/dwarves/btf_encoder.c:871
> #7  0x00007f8c8261e361 in btf_encoder__encode_cu (encoder=0x7f8c28023220,
>     cu=0x7f8c58001e20, conf_load=0x412400 <conf_load>)
>     at /home/almagui/src/dwarves/btf_encoder.c:1888
> #8  0x000000000040a36c in pahole_stealer (cu=0x7f8c58001e20,
>     conf_load=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/pahole.c:3342
> #9  0x00007f8c8262672c in cu__finalize (cu=0x7f8c38001e20, cus=0x21412a0,
>     conf=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3029
> #10 0x00007f8c82626765 in cus__finalize (cus=0x21412a0, cu=0x7f8c38001e20,
>     conf=0x412400 <conf_load>, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3036
> #11 0x00007f8c82626e9b in dwarf_cus__process_cu (dcus=0x7ffd71eaf0d0,
>     cu_die=0x7f8c653ffeb0, cu=0x7f8c38001e20, thr_data=0x0)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3243
> #12 0x00007f8c826270d2 in dwarf_cus__process_cu_thread (arg=0x7ffd71eaef50)
>     at /home/almagui/src/dwarves/dwarf_loader.c:3313
> #13 0x00007f8c816081da in start_thread () from /usr/lib64/libpthread.so.0
> #14 0x00007f8c81239e73 in clone () from /usr/lib64/libc.so.6
> 
> So for conf_load->skip_encoding_btf_inconsistent_proto (enabled as part
> of "all" and enabled for vmlinux/module BTF), we use dwarves_fprintf()
> to write prototypes to check for inconsistent definitions.
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
>     at /home/almagui/src/dwarves/dwarves.c:612
> 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> (gdb) print *(struct ptr_table *)0x7f8c60001e70
> $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
> (gdb)
> 
> So it looks like the ptr_table has 2979 entries but entries is NULL;
> could there be an issue where CU initialization is not yet complete
> for some threads (it also happens very early in processing)? Can you
> reproduce this failure at your end? Thanks!
>

the following (when applied on top of the series) resolves the
segmentation fault for me:

diff --git a/pahole.c b/pahole.c
index 6c7e738..5ff0eaf 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct
cu *cu,
                if (conf_load->reproducible_build) {
                        ret = LSK__KEEPIT; // we're not processing the
cu passed to this function, so keep it.
-                        // Equivalent to LSK__DELETE since we processed
this
-                       cus__remove(cus, cu);
-                       cu__delete(cu);
                }
 out_btf:
                if (!thr_data) // See comment about reproducibe_build above


In other words, the problem is we remove/delete CUs when finished with
them in each thread (when BTF is generated).  However because the
save/add_saved_funcs stashes CU references in the associated struct
function * (to allow prototype comparison for the same function in
different CUs), we end up with stale CU references and in this case the
freed/nulled ptr_table caused an issue. As far as I can see we need to
retain CUs until all BTF has been merged from threads.

With the fix in place, I'm seeing less then 100msec difference between
reproducible/non-reproducible vmlinux BTF generation; that's great!

Alan

> Alan
> 
>> - Arnaldo
>>
>> Arnaldo Carvalho de Melo (12):
>>   core: Allow asking for a reproducible build
>>   pahole: Disable BTF multithreaded encoded when doing reproducible builds
>>   dwarf_loader: Separate creating the cu/dcu pair from processing it
>>   dwarf_loader: Introduce dwarf_cus__process_cu()
>>   dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu()
>>   dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu()
>>   core: Add unlocked cus__add() variant
>>   core: Add cus__remove(), counterpart of cus__add()
>>   dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE
>>   core/dwarf_loader: Add functions to set state of CU processing
>>   pahole: Encode BTF serially in a reproducible build
>>   tests: Add a BTF reproducible generation test
>>
>>  dwarf_loader.c              | 73 +++++++++++++++++++++++---------
>>  dwarves.c                   | 58 ++++++++++++++++++++++++-
>>  dwarves.h                   | 17 ++++++++
>>  pahole.c                    | 84 +++++++++++++++++++++++++++++++++++--
>>  tests/reproducible_build.sh | 56 +++++++++++++++++++++++++
>>  5 files changed, 264 insertions(+), 24 deletions(-)
>>  create mode 100755 tests/reproducible_build.sh
>>
> 

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-08 12:00   ` Alan Maguire
@ 2024-04-08 14:39     ` Arnaldo Carvalho de Melo
  2024-04-12 20:36       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-08 14:39 UTC (permalink / raw)
  To: Alan Maguire
  Cc: dwarves, Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Kui-Feng Lee, Thomas Weißschuh

On Mon, Apr 08, 2024 at 01:00:59PM +0100, Alan Maguire wrote:
> On 04/04/2024 09:58, Alan Maguire wrote:
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
> >     at /home/almagui/src/dwarves/dwarves.c:612
> > 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> > (gdb) print *(struct ptr_table *)0x7f8c60001e70
> > $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
> > (gdb)

> > So it looks like the ptr_table has 2979 entries but entries is NULL;
> > could there be an issue where CU initialization is not yet complete
> > for some threads (it also happens very early in processing)? Can you
> > reproduce this failure at your end? Thanks!
 
> the following (when applied on top of the series) resolves the
> segmentation fault for me:
 
> diff --git a/pahole.c b/pahole.c
> index 6c7e738..5ff0eaf 100644
> --- a/pahole.c
> +++ b/pahole.c
> @@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct
> cu *cu,
>                 if (conf_load->reproducible_build) {
>                         ret = LSK__KEEPIT; // we're not processing the
> cu passed to this function, so keep it.
> -                        // Equivalent to LSK__DELETE since we processed
> this
> -                       cus__remove(cus, cu);
> -                       cu__delete(cu);
>                 }
>  out_btf:
>                 if (!thr_data) // See comment about reproducibe_build above
> 

Yeah, Jiri also pointed out this call to cu__delete() was new, I was
trying to avoid having unprocessed 'struct cu' using too much memory, so
after processing it, delete them, but as you found out there are
references to that memory...

> In other words, the problem is we remove/delete CUs when finished with
> them in each thread (when BTF is generated).  However because the
> save/add_saved_funcs stashes CU references in the associated struct
> function * (to allow prototype comparison for the same function in
> different CUs), we end up with stale CU references and in this case the
> freed/nulled ptr_table caused an issue. As far as I can see we need to
> retain CUs until all BTF has been merged from threads.
 
> With the fix in place, I'm seeing less then 100msec difference between
> reproducible/non-reproducible vmlinux BTF generation; that's great!

Excellent!

I'll remove it and add a note crediting you with the removal and having
the explanation about why its not possibe to delete it at that point
(references to the associated 'struct function').

Perhaps we can save this info in some other way that allows us to free
the CU after having it processed, I'll think about it.

But its good to see that the difference is small, great!

Thanks a lot!

- Arnaldo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-04  8:05   ` Alan Maguire
@ 2024-04-09 14:34     ` Eduard Zingerman
  2024-04-09 14:56       ` Alexei Starovoitov
  2024-04-12 20:37       ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 37+ messages in thread
From: Eduard Zingerman @ 2024-04-09 14:34 UTC (permalink / raw)
  To: Alan Maguire, Arnaldo Carvalho de Melo, dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On Thu, 2024-04-04 at 09:05 +0100, Alan Maguire wrote:
[...]

> Could that be the handling of functions with same name, inconsistent
> prototypes? We leave them out deliberately (see
> btf_encoder__add_saved_funcs().
> 
> > I'll try to figure out the reason for slowdown tomorrow.
> > 
> > [1] https://github.com/eddyz87/dwarves/tree/sort-btf
> > 

Fwiw, the best I can do is here:
https://github.com/eddyz87/dwarves/tree/sort-btf

It adds total ordering to BTF types using kind, kflag, vlen, name etc properties,
and rebuilds final BTF to follow this order. Here are the measurements:

$ sudo cpupower frequency-set --min 3Ghz --max 3Ghz
$ nice -n18 perf stat -r50 pahole --reproducible_build -j --btf_encode_detached=/dev/null vmlinux
           ...
           3.08648 +- 0.00813 seconds time elapsed  ( +-  0.26% )

$ nice -n18 perf stat -r50 pahole -j --btf_encode_detached=/dev/null vmlinux
           ...
           3.00785 +- 0.00878 seconds time elapsed  ( +-  0.29% )

Which gives 2.6% total time penalty when reproducible build option is used.
./test_progs tests are passing.

Still, there are a few discrepancies in generated BTFs: some function
prototypes are included twice at random (about 30 IDs added/deleted).
This might be connected to Alan's suggestion and requires
further investigation.

All in all, Arnaldo's approach with CU ordering looks simpler.

Best regards,
Eduard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-09 14:34     ` Eduard Zingerman
@ 2024-04-09 14:56       ` Alexei Starovoitov
  2024-04-09 15:01         ` Eduard Zingerman
  2024-04-12 20:37       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 37+ messages in thread
From: Alexei Starovoitov @ 2024-04-09 14:56 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alan Maguire, Arnaldo Carvalho de Melo, dwarves, Jiri Olsa,
	Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, Apr 9, 2024 at 7:34 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-04-04 at 09:05 +0100, Alan Maguire wrote:
> [...]
>
> > Could that be the handling of functions with same name, inconsistent
> > prototypes? We leave them out deliberately (see
> > btf_encoder__add_saved_funcs().
> >
> > > I'll try to figure out the reason for slowdown tomorrow.
> > >
> > > [1] https://github.com/eddyz87/dwarves/tree/sort-btf
> > >
>
> Fwiw, the best I can do is here:
> https://github.com/eddyz87/dwarves/tree/sort-btf
>
> It adds total ordering to BTF types using kind, kflag, vlen, name etc properties,
> and rebuilds final BTF to follow this order. Here are the measurements:
>
> $ sudo cpupower frequency-set --min 3Ghz --max 3Ghz
> $ nice -n18 perf stat -r50 pahole --reproducible_build -j --btf_encode_detached=/dev/null vmlinux
>            ...
>            3.08648 +- 0.00813 seconds time elapsed  ( +-  0.26% )
>
> $ nice -n18 perf stat -r50 pahole -j --btf_encode_detached=/dev/null vmlinux
>            ...
>            3.00785 +- 0.00878 seconds time elapsed  ( +-  0.29% )
>
> Which gives 2.6% total time penalty when reproducible build option is used.
> ./test_progs tests are passing.
>
> Still, there are a few discrepancies in generated BTFs: some function
> prototypes are included twice at random (about 30 IDs added/deleted).
> This might be connected to Alan's suggestion and requires
> further investigation.
>
> All in all, Arnaldo's approach with CU ordering looks simpler.

I would actually go with sorted BTF, since it will probably
make diff-ing of BTFs practical. Will be easier to track changes
from one kernel version to another. vmlinux.h will become
a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h
will be possible.
Or am I misunderstanding the sorting concept?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-09 14:56       ` Alexei Starovoitov
@ 2024-04-09 15:01         ` Eduard Zingerman
  2024-04-09 18:45           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 37+ messages in thread
From: Eduard Zingerman @ 2024-04-09 15:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alan Maguire, Arnaldo Carvalho de Melo, dwarves, Jiri Olsa,
	Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote:
[...]

> I would actually go with sorted BTF, since it will probably
> make diff-ing of BTFs practical. Will be easier to track changes
> from one kernel version to another. vmlinux.h will become
> a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h
> will be possible.
> Or am I misunderstanding the sorting concept?

You understand the concept correctly, here is a sample:

  [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL
  [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED
  [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none)
  [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
  [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
  ...
  [15085] STRUCT 'arch_elf_state' size=0 vlen=0
  [15086] STRUCT 'arch_vdso_data' size=0 vlen=0
  [15087] STRUCT 'bpf_run_ctx' size=0 vlen=0
  [15088] STRUCT 'dev_archdata' size=0 vlen=0
  [15089] STRUCT 'dyn_arch_ftrace' size=0 vlen=0
  [15090] STRUCT 'fscrypt_dummy_policy' size=0 vlen=0
  ...
  
(Sort by kind, than by vlen, than by name because sorting by name is a
 bit costly, then by member properties)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-09 15:01         ` Eduard Zingerman
@ 2024-04-09 18:45           ` Arnaldo Carvalho de Melo
  2024-04-09 19:29             ` Eduard Zingerman
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-09 18:45 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Alan Maguire, dwarves, Jiri Olsa,
	Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote:
> On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote:
> [...]
 
> > I would actually go with sorted BTF, since it will probably
> > make diff-ing of BTFs practical. Will be easier to track changes

What kind of diff-ing of BTFs from different kernels are you interested
in?

in pahole's repository we have btfdiff, that will, given a vmlinux with
both DWARF and BTF use pahole to pretty print all types, expanded, and
then compare the two outputs, which should produce the same results from
BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF
file.

And also now we have another regression test script that will produce
the output from 'btftool btf dump' for the BTF generated from DWARF in
serial mode, and then compare that with the output from 'bpftool btf
dump' for reproducible encodings done using -j 1 ...
number-of-processors-on-the-machine. All have to match, all types, all
BTF ids.

We can as well use something like btfdiff to compare the output from
'pahole --expand_types --sort' for two BTFs for two different kernels,
to see what are the new types and the changes to types in both.

What else do you want to compare? To be able to match we would have to
somehow have ranges for each DWARF CU so that when encoding and then
deduplicating we would have space in the ID space for new types to fill
in while keeping the old types IDs matching the same types in the new
vmlinux.

While ordering all types we would have to have ID space available from
each of the BTF kinds, no?

I haven't looked at Eduard's patches, is that what it is done?

> > from one kernel version to another. vmlinux.h will become
> > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h
> > will be possible.
> > Or am I misunderstanding the sorting concept?
 
> You understand the concept correctly, here is a sample:
 
>   [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL
>   [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED
>   [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none)
>   [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
>   [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>   [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
>   [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED

The above: so far so good, probably there will not be something that
will push what is now BTF id 6 to become 7 in a new vmlinux, but can we
say the same for the more dynamic parts, like the list of structs?

A struct can vanish, that abstraction not being used anymore in the
kernel, so its BTF id will vacate and all of the next struct IDs will
"fall down" and gets its IDs decremented, no?

If these difficulties are present as I mentioned, then rebuilding from
the BTF data with something like the existing 'pahole --expand_types
--sort' from the BTF from kernel N to compare with the same output for
kernel N + 1 should be enough to see what changed from one kernel to the
next one?

- Arnaldo

>   ...
>   [15085] STRUCT 'arch_elf_state' size=0 vlen=0
>   [15086] STRUCT 'arch_vdso_data' size=0 vlen=0
>   [15087] STRUCT 'bpf_run_ctx' size=0 vlen=0
>   [15088] STRUCT 'dev_archdata' size=0 vlen=0
>   [15089] STRUCT 'dyn_arch_ftrace' size=0 vlen=0
>   [15090] STRUCT 'fscrypt_dummy_policy' size=0 vlen=0
>   ...
>   
> (Sort by kind, than by vlen, than by name because sorting by name is a
>  bit costly, then by member properties)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-09 18:45           ` Arnaldo Carvalho de Melo
@ 2024-04-09 19:29             ` Eduard Zingerman
  2024-04-09 19:34               ` Alexei Starovoitov
  2024-04-09 19:57               ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 37+ messages in thread
From: Eduard Zingerman @ 2024-04-09 19:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexei Starovoitov, Alan Maguire, dwarves, Jiri Olsa,
	Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, 2024-04-09 at 15:45 -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote:
> > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote:
> > [...]
>  
> > > I would actually go with sorted BTF, since it will probably
> > > make diff-ing of BTFs practical. Will be easier to track changes
> 
> What kind of diff-ing of BTFs from different kernels are you interested
> in?
> 
> in pahole's repository we have btfdiff, that will, given a vmlinux with
> both DWARF and BTF use pahole to pretty print all types, expanded, and
> then compare the two outputs, which should produce the same results from
> BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF
> file.
> 
> And also now we have another regression test script that will produce
> the output from 'btftool btf dump' for the BTF generated from DWARF in
> serial mode, and then compare that with the output from 'bpftool btf
> dump' for reproducible encodings done using -j 1 ...
> number-of-processors-on-the-machine. All have to match, all types, all
> BTF ids.
> 
> We can as well use something like btfdiff to compare the output from
> 'pahole --expand_types --sort' for two BTFs for two different kernels,
> to see what are the new types and the changes to types in both.
> 
> What else do you want to compare? To be able to match we would have to
> somehow have ranges for each DWARF CU so that when encoding and then
> deduplicating we would have space in the ID space for new types to fill
> in while keeping the old types IDs matching the same types in the new
> vmlinux.

As far as I understand Alexei, he means diffing two vmlinux.h files
generated for different kernel versions. The vmlinux.h is generated by
bpftool using command `bpftool btf dump file <binary-file> format c`.
The output is topologically sorted to satisfy C compiler, but ordering
is not total, so vmlinux.h content may vary from build to build if BTF
type order differs.

Thus, any kind of stable BTF type ordering would make vmlinux.h stable.
On the other hand, topological ordering used by bpftool
(the algorithm is in the libbpf, actually) might be extended with
additional rules to make the ordering total.

> While ordering all types we would have to have ID space available from
> each of the BTF kinds, no?
> 
> I haven't looked at Eduard's patches, is that what it is done?

No, I don't reserve any ID space, the output of 
`bpftool btf dump file <binary-file> format raw` is not suitable for
diffing w/o post-processing if some types are added or removed in the
middle.

I simply add a function to compare two BTF types and a pass that sorts
all BTF types before finalizing BTF generation.

> > > from one kernel version to another. vmlinux.h will become
> > > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h
> > > will be possible.
> > > Or am I misunderstanding the sorting concept?
>  
> > You understand the concept correctly, here is a sample:
>  
> >   [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL
> >   [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED
> >   [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none)
> >   [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> >   [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> >   [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
> >   [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
> 
> The above: so far so good, probably there will not be something that
> will push what is now BTF id 6 to become 7 in a new vmlinux, but can we
> say the same for the more dynamic parts, like the list of structs?
> 
> A struct can vanish, that abstraction not being used anymore in the
> kernel, so its BTF id will vacate and all of the next struct IDs will
> "fall down" and gets its IDs decremented, no?

Yes, this would happen.

> If these difficulties are present as I mentioned, then rebuilding from
> the BTF data with something like the existing 'pahole --expand_types
> --sort' from the BTF from kernel N to compare with the same output for
> kernel N + 1 should be enough to see what changed from one kernel to the
> next one?

Yes, this is an option.

Thanks,
Eduard

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-09 19:29             ` Eduard Zingerman
@ 2024-04-09 19:34               ` Alexei Starovoitov
  2024-04-09 19:57               ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 37+ messages in thread
From: Alexei Starovoitov @ 2024-04-09 19:34 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Arnaldo Carvalho de Melo, Alan Maguire, dwarves, Jiri Olsa,
	Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, Apr 9, 2024 at 12:29 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-04-09 at 15:45 -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote:
> > > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote:
> > > [...]
> >
> > > > I would actually go with sorted BTF, since it will probably
> > > > make diff-ing of BTFs practical. Will be easier to track changes
> >
> > What kind of diff-ing of BTFs from different kernels are you interested
> > in?
> >
> > in pahole's repository we have btfdiff, that will, given a vmlinux with
> > both DWARF and BTF use pahole to pretty print all types, expanded, and
> > then compare the two outputs, which should produce the same results from
> > BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF
> > file.
> >
> > And also now we have another regression test script that will produce
> > the output from 'btftool btf dump' for the BTF generated from DWARF in
> > serial mode, and then compare that with the output from 'bpftool btf
> > dump' for reproducible encodings done using -j 1 ...
> > number-of-processors-on-the-machine. All have to match, all types, all
> > BTF ids.
> >
> > We can as well use something like btfdiff to compare the output from
> > 'pahole --expand_types --sort' for two BTFs for two different kernels,
> > to see what are the new types and the changes to types in both.
> >
> > What else do you want to compare? To be able to match we would have to
> > somehow have ranges for each DWARF CU so that when encoding and then
> > deduplicating we would have space in the ID space for new types to fill
> > in while keeping the old types IDs matching the same types in the new
> > vmlinux.
>
> As far as I understand Alexei, he means diffing two vmlinux.h files
> generated for different kernel versions. The vmlinux.h is generated by
> bpftool using command `bpftool btf dump file <binary-file> format c`.
> The output is topologically sorted to satisfy C compiler, but ordering
> is not total, so vmlinux.h content may vary from build to build if BTF
> type order differs.
>
> Thus, any kind of stable BTF type ordering would make vmlinux.h stable.
> On the other hand, topological ordering used by bpftool
> (the algorithm is in the libbpf, actually) might be extended with
> additional rules to make the ordering total.

Not looking for perfect ordering.
People check-in vmlinux.h into their repos.
If it's more or less sorted the git diff of updated vmlinux.h will be
a bit more human readable. Hopefully.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-09 19:29             ` Eduard Zingerman
  2024-04-09 19:34               ` Alexei Starovoitov
@ 2024-04-09 19:57               ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-09 19:57 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alexei Starovoitov, Alan Maguire, dwarves, Jiri Olsa,
	Clark Williams, Kate Carcia, bpf, Kui-Feng Lee,
	Thomas Weißschuh

On Tue, Apr 09, 2024 at 10:29:18PM +0300, Eduard Zingerman wrote:
> On Tue, 2024-04-09 at 15:45 -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Apr 09, 2024 at 06:01:08PM +0300, Eduard Zingerman wrote:
> > > On Tue, 2024-04-09 at 07:56 -0700, Alexei Starovoitov wrote:
> > > [...]
> >  
> > > > I would actually go with sorted BTF, since it will probably
> > > > make diff-ing of BTFs practical. Will be easier to track changes
> > 
> > What kind of diff-ing of BTFs from different kernels are you interested
> > in?
> > 
> > in pahole's repository we have btfdiff, that will, given a vmlinux with
> > both DWARF and BTF use pahole to pretty print all types, expanded, and
> > then compare the two outputs, which should produce the same results from
> > BTF and DWARF. Ditto for DWARF from a vmlinux compared to a detached BTF
> > file.
> > 
> > And also now we have another regression test script that will produce
> > the output from 'btftool btf dump' for the BTF generated from DWARF in
> > serial mode, and then compare that with the output from 'bpftool btf
> > dump' for reproducible encodings done using -j 1 ...
> > number-of-processors-on-the-machine. All have to match, all types, all
> > BTF ids.
> > 
> > We can as well use something like btfdiff to compare the output from
> > 'pahole --expand_types --sort' for two BTFs for two different kernels,
> > to see what are the new types and the changes to types in both.
> > 
> > What else do you want to compare? To be able to match we would have to
> > somehow have ranges for each DWARF CU so that when encoding and then
> > deduplicating we would have space in the ID space for new types to fill
> > in while keeping the old types IDs matching the same types in the new
> > vmlinux.
> 
> As far as I understand Alexei, he means diffing two vmlinux.h files
> generated for different kernel versions. The vmlinux.h is generated by
> bpftool using command `bpftool btf dump file <binary-file> format c`.
> The output is topologically sorted to satisfy C compiler, but ordering
> is not total, so vmlinux.h content may vary from build to build if BTF
> type order differs.
> 
> Thus, any kind of stable BTF type ordering would make vmlinux.h stable.
> On the other hand, topological ordering used by bpftool
> (the algorithm is in the libbpf, actually) might be extended with
> additional rules to make the ordering total.

Interesting, the other tool that is in the pahole repo is 'fullcircle',
that given a .o file will generate a compileable file (a vmlinux.h say)
and then build it again to generate DWARF and then compare the original
DWARF with the new onbe.
 
> > While ordering all types we would have to have ID space available from
> > each of the BTF kinds, no?
> > 
> > I haven't looked at Eduard's patches, is that what it is done?
> 
> No, I don't reserve any ID space, the output of 
> `bpftool btf dump file <binary-file> format raw` is not suitable for
> diffing w/o post-processing if some types are added or removed in the
> middle.
 
> I simply add a function to compare two BTF types and a pass that sorts
> all BTF types before finalizing BTF generation.

Ok, so I see that the BTF ids for the types will change, its the
vmlinux.h that is to be compared.

root@x1:~# pahole -F btf --compile | tail -12
struct ncsi_aen_handler {
	unsigned char              type;                 /*     0     1 */

	/* XXX 3 bytes hole, try to pack */

	int                        payload;              /*     4     4 */
	int                        (*handler)(struct ncsi_dev_priv *, struct ncsi_aen_pkt_hdr *); /*     8     8 */

	/* size: 16, cachelines: 1, members: 3 */
	/* sum members: 13, holes: 1, sum holes: 3 */
	/* last cacheline: 16 bytes */
};
root@x1:~# pahole -F btf --compile > a.c ; echo 'int main(void) { struct ncsi_aen_handler b = { 1, } ; return b.type ; } ' >> a.c ; gcc -g -o bla -c a.c
root@x1:~# pahole --expand_types ncsi_aen_handler > from_kernel_btf
root@x1:~# pahole --expand_types -C ncsi_aen_handler bla > from_bla_dwarf
root@x1:~# diff -u from_kernel_btf from_bla_dwarf
root@x1:~#

The above is for a super simple struct, no expansions even, now for:

root@x1:~# pahole -F btf --compile > a.c ; echo 'int main(void) { struct task_struct b = { .prio = 12345, } ; return b.prio ; } ' >> a.c ; gcc -g -o bla -c a.c
root@x1:~# pahole --suppress_aligned_attribute --expand_types -C task_struct bla > from_bla_dwarf
root@x1:~# pahole --suppress_aligned_attribute --expand_types task_struct > from_kernel_btf
root@x1:~# diff -u from_kernel_btf from_bla_dwarf
root@x1:~#

I suppressed the align attribute as right now the output from pahole
when it finds the __attribute__ alignment present in DWARF is slightly
different, but equivalent (barring bugs) to when it infers the alignment
and adds it to BTF data, that has no alignment info other than the
member offsets (DWARF has both the member offsets to infer the alignment
_and_ attributes when they are present in the source code, sometimes
even duplicated, which probably is the reason for the difference in
output (albeit the end result should be equivalent)).

root@x1:~# pahole --expand_types task_struct | wc -l
1254
root@x1:~# pahole --expand_types task_struct | tail
	/* XXX last struct has 1 hole, 1 bit hole */

	/* size: 13696, cachelines: 214, members: 265 */
	/* sum members: 13522, holes: 20, sum holes: 158 */
	/* sum bitfield members: 83 bits, bit holes: 2, sum bit holes: 45 bits */
	/* member types with holes: 4, total: 6, bit holes: 2, total: 2 */
	/* paddings: 6, sum paddings: 49 */
	/* forced alignments: 2, forced holes: 2, sum forced holes: 88 */
};

root@x1:~#

I.e. the original BTF doesn't have to be sorted (well, it will keep the
order DWARF does, which, in turn, is another desire of reproducible
builds, it will not have the same output for two kernel releases, but
should be as close as possible) pahole (--sort or --compile) or bpftool
can do it either by plain sorting the types (pahole --sort, used by
btfdiff to compara output from DWARF to output from BTF) or by
generating a compilable source code (pahole --compile, aka
"topologically sorted to satisfy C compiler").
 
> > > > from one kernel version to another. vmlinux.h will become
> > > > a bit more sorted too and normal diff vmlinux_6_1.h vmlinux_6_2.h
> > > > will be possible.
> > > > Or am I misunderstanding the sorting concept?
  
> > > You understand the concept correctly, here is a sample:
  
> > >   [1] INT '_Bool' size=1 bits_offset=0 nr_bits=8 encoding=BOOL
> > >   [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED
> > >   [3] INT '__int128 unsigned' size=16 bits_offset=0 nr_bits=128 encoding=(none)
> > >   [4] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
> > >   [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> > >   [6] INT 'long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED
> > >   [7] INT 'long long int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED

> > The above: so far so good, probably there will not be something that
> > will push what is now BTF id 6 to become 7 in a new vmlinux, but can we
> > say the same for the more dynamic parts, like the list of structs?
 
> > A struct can vanish, that abstraction not being used anymore in the
> > kernel, so its BTF id will vacate and all of the next struct IDs will
> > "fall down" and gets its IDs decremented, no?
 
> Yes, this would happen.

We're on the same page.

> > If these difficulties are present as I mentioned, then rebuilding from
> > the BTF data with something like the existing 'pahole --expand_types
> > --sort' from the BTF from kernel N to compare with the same output for
> > kernel N + 1 should be enough to see what changed from one kernel to the
> > next one?
 
> Yes, this is an option.

Agreed. What I tried in my series was to do as little as possible to
make the serial output be the same as whatever level of paralelism we
have while making the whole process to cost as close to the
unconstrained parallelism that we had in place, i.e. to get a
reproducible build at the lowest cost in terms of code churn (the more
code we touch, the more chances we have of new bugs to be introduced)
and of CPU cycles/memory use, etc.

- Arnaldo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-08 14:39     ` Arnaldo Carvalho de Melo
@ 2024-04-12 20:36       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 20:36 UTC (permalink / raw)
  To: Alan Maguire
  Cc: dwarves, Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Kui-Feng Lee, Thomas Weißschuh

On Mon, Apr 08, 2024 at 11:39:32AM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Apr 08, 2024 at 01:00:59PM +0100, Alan Maguire wrote:
> > On 04/04/2024 09:58, Alan Maguire wrote:
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  0x00007f8c8260a58c in ptr_table__entry (pt=0x7f8c60001e70, id=77)
> > >     at /home/almagui/src/dwarves/dwarves.c:612
> > > 612		return id >= pt->nr_entries ? NULL : pt->entries[id];
> > > [Current thread is 1 (Thread 0x7f8c65400700 (LWP 624441))]
> > > (gdb) print *(struct ptr_table *)0x7f8c60001e70
> > > $1 = {entries = 0x0, nr_entries = 2979, allocated_entries = 4096}
> > > (gdb)
> 
> > > So it looks like the ptr_table has 2979 entries but entries is NULL;
> > > could there be an issue where CU initialization is not yet complete
> > > for some threads (it also happens very early in processing)? Can you
> > > reproduce this failure at your end? Thanks!
>  
> > the following (when applied on top of the series) resolves the
> > segmentation fault for me:
>  
> > diff --git a/pahole.c b/pahole.c
> > index 6c7e738..5ff0eaf 100644
> > --- a/pahole.c
> > +++ b/pahole.c
> > @@ -3348,8 +3348,8 @@ static enum load_steal_kind pahole_stealer(struct
> > cu *cu,
> >                 if (conf_load->reproducible_build) {
> >                         ret = LSK__KEEPIT; // we're not processing the
> > cu passed to this function, so keep it.
> > -                        // Equivalent to LSK__DELETE since we processed
> > this
> > -                       cus__remove(cus, cu);
> > -                       cu__delete(cu);
> >                 }
> >  out_btf:
> >                 if (!thr_data) // See comment about reproducibe_build above
> > 
> 
> Yeah, Jiri also pointed out this call to cu__delete() was new, I was
> trying to avoid having unprocessed 'struct cu' using too much memory, so
> after processing it, delete them, but as you found out there are
> references to that memory...
> 
> > In other words, the problem is we remove/delete CUs when finished with
> > them in each thread (when BTF is generated).  However because the
> > save/add_saved_funcs stashes CU references in the associated struct
> > function * (to allow prototype comparison for the same function in
> > different CUs), we end up with stale CU references and in this case the
> > freed/nulled ptr_table caused an issue. As far as I can see we need to
> > retain CUs until all BTF has been merged from threads.
>  
> > With the fix in place, I'm seeing less then 100msec difference between
> > reproducible/non-reproducible vmlinux BTF generation; that's great!
> 
> Excellent!
> 
> I'll remove it and add a note crediting you with the removal and having
> the explanation about why its not possibe to delete it at that point
> (references to the associated 'struct function').

So I removed that cus__remove + cu__delete and also the other one at the
flush operation, leaving all cleaning up to cus__delete() time:

⬢[acme@toolbox pahole]$ git diff
diff --git a/dwarves.c b/dwarves.c
index fbc8d8aa0060b7d0..1ec259f50dbd3778 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -489,8 +489,12 @@ struct cu *cus__get_next_processable_cu(struct cus *cus)
                        cu->state = CU__PROCESSING;
                        goto found;
                case CU__PROCESSING:
-                       // This will only happen when we get to parallel
-                       // reproducible BTF encoding, libbpf dedup work needed here.
+                       // This will happen when we get to parallel
+                       // reproducible BTF encoding, libbpf dedup work needed
+                       // here. The other possibility is when we're flushing
+                       // the DWARF processed CUs when the parallel DWARF
+                       // loading stoped and we still have CUs to encode to
+                       // BTF because of ordering requirements.
                        continue;
                case CU__UNPROCESSED:
                        // The first entry isn't loaded, signal the
diff --git a/pahole.c b/pahole.c
index 6c7e73835b3e9139..77772bb42bb443ce 100644
--- a/pahole.c
+++ b/pahole.c
@@ -3347,9 +3347,9 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 
                if (conf_load->reproducible_build) {
                        ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
-                       // Equivalent to LSK__DELETE since we processed this
-                       cus__remove(cus, cu);
-                       cu__delete(cu);
+                       // Kinda equivalent to LSK__DELETE since we processed this, but we can't delete it
+                       // as we stash references to entries in CUs for 'struct function' in btf_encoder__add_saved_funcs()
+                       // and btf_encoder__save_func(), so we can't delete them here. - Alan Maguire
                }
 out_btf:
                if (!thr_data) // See comment about reproducibe_build above
@@ -3667,9 +3667,6 @@ static int cus__flush_reproducible_build(struct cus *cus, struct btf_encoder *en
                err = btf_encoder__encode_cu(encoder, cu, conf_load);
                if (err < 0)
                        break;
-
-               cus__remove(cus, cu);
-               cu__delete(cu);
⬢[acme@toolbox pahole]$


It ends up taking a bit more time on this 14700K with 32Gb, I'll later
try to remove that need to keep everything in memory and also double
check this hunch that this is due to keeping everyuthing in memory.

Can I take this (with the above patch, that is a bit bigger than yours)
as a Tested-by + Reviewed-by you?

- Arnald

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-09 14:34     ` Eduard Zingerman
  2024-04-09 14:56       ` Alexei Starovoitov
@ 2024-04-12 20:37       ` Arnaldo Carvalho de Melo
  2024-04-12 20:40         ` Eduard Zingerman
  1 sibling, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 20:37 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alan Maguire, dwarves, Jiri Olsa, Clark Williams, Kate Carcia,
	bpf, Kui-Feng Lee, Thomas Weißschuh

On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote:
> Still, there are a few discrepancies in generated BTFs: some function
> prototypes are included twice at random (about 30 IDs added/deleted).
> This might be connected to Alan's suggestion and requires
> further investigation.
> 
> All in all, Arnaldo's approach with CU ordering looks simpler.

I'm going, for now, with the simple approach, can I take your comments
as a Reviewed-by: you?

- Arnaldo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-12 20:37       ` Arnaldo Carvalho de Melo
@ 2024-04-12 20:40         ` Eduard Zingerman
  2024-04-12 21:09           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 37+ messages in thread
From: Eduard Zingerman @ 2024-04-12 20:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alan Maguire, dwarves, Jiri Olsa, Clark Williams, Kate Carcia,
	bpf, Kui-Feng Lee, Thomas Weißschuh

On Fri, 2024-04-12 at 17:37 -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote:
> > Still, there are a few discrepancies in generated BTFs: some function
> > prototypes are included twice at random (about 30 IDs added/deleted).
> > This might be connected to Alan's suggestion and requires
> > further investigation.
> > 
> > All in all, Arnaldo's approach with CU ordering looks simpler.
> 
> I'm going, for now, with the simple approach, can I take your comments
> as a Reviewed-by: you?

If you are going to post next version I'll go through the new series
and ack the patches (I understand the main idea but did not read the
series in detail).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-12 20:40         ` Eduard Zingerman
@ 2024-04-12 21:09           ` Arnaldo Carvalho de Melo
  2024-04-12 21:10             ` Eduard Zingerman
  0 siblings, 1 reply; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:09 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alan Maguire, dwarves, Jiri Olsa, Clark Williams, Kate Carcia,
	bpf, Kui-Feng Lee, Thomas Weißschuh

On Fri, Apr 12, 2024 at 11:40:44PM +0300, Eduard Zingerman wrote:
> On Fri, 2024-04-12 at 17:37 -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote:
> > > Still, there are a few discrepancies in generated BTFs: some function
> > > prototypes are included twice at random (about 30 IDs added/deleted).
> > > This might be connected to Alan's suggestion and requires
> > > further investigation.
> > > 
> > > All in all, Arnaldo's approach with CU ordering looks simpler.
> > 
> > I'm going, for now, with the simple approach, can I take your comments
> > as a Reviewed-by: you?
> 
> If you are going to post next version I'll go through the new series
> and ack the patches (I understand the main idea but did not read the
> series in detail).

Ok, its now in the next branch, I'll repost here as well.

- Arnaldo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding
  2024-04-12 21:09           ` Arnaldo Carvalho de Melo
@ 2024-04-12 21:10             ` Eduard Zingerman
  0 siblings, 0 replies; 37+ messages in thread
From: Eduard Zingerman @ 2024-04-12 21:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alan Maguire, dwarves, Jiri Olsa, Clark Williams, Kate Carcia,
	bpf, Kui-Feng Lee, Thomas Weißschuh

On Fri, 2024-04-12 at 18:09 -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, Apr 12, 2024 at 11:40:44PM +0300, Eduard Zingerman wrote:
> > On Fri, 2024-04-12 at 17:37 -0300, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Apr 09, 2024 at 05:34:46PM +0300, Eduard Zingerman wrote:
> > > > Still, there are a few discrepancies in generated BTFs: some function
> > > > prototypes are included twice at random (about 30 IDs added/deleted).
> > > > This might be connected to Alan's suggestion and requires
> > > > further investigation.
> > > > 
> > > > All in all, Arnaldo's approach with CU ordering looks simpler.
> > > 
> > > I'm going, for now, with the simple approach, can I take your comments
> > > as a Reviewed-by: you?
> > 
> > If you are going to post next version I'll go through the new series
> > and ack the patches (I understand the main idea but did not read the
> > series in detail).
> 
> Ok, its now in the next branch, I'll repost here as well.

Great, thank you, I'll go through it over the weekend.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 11/12] pahole: Encode BTF serially in a reproducible build
  2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
@ 2024-04-12 21:16 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 37+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-04-12 21:16 UTC (permalink / raw)
  To: dwarves
  Cc: Jiri Olsa, Clark Williams, Kate Carcia, bpf,
	Arnaldo Carvalho de Melo, Alan Maguire, Kui-Feng Lee,
	Thomas Weißschuh

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Now we will ask the cus instance for the next processable CU, i.e. one
that is loaded and is in the same CU order as in the original DWARF
file, under the BTF lock.

With this we can go on loading the DWARF file in parallel and only
serialize the BTF encoding, keeping that order, with this the BTF ids
end up the same both for a serial encoding:

And here are some numbers with a Release build:

  $ cat buildcmd.sh
  mkdir build
  cd build
  cmake -DCMAKE_BUILD_TYPE=Release ..
  cd ..
  make -j $(getconf _NPROCESSORS_ONLN) -C build
  $ rm -rf build
  $ ./buildcmd.sh

Its an Intel Hybrid system, and migrates to/from efficiency/perfomance
cores:

  $ getconf _NPROCESSORS_ONLN
  28
  $ grep -m1 'model name' /proc/cpuinfo
  model name	: Intel(R) Core(TM) i7-14700K
  $

8 performance cores (16 threads), 12 efficiency cores.

Serial encoding:

  $ time perf stat -e cycles -r5 pahole --btf_encode_detached=vmlinux.btf.serial vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf.serial vmlinux' (5 runs):

      13,313,169,305      cpu_atom/cycles:u/     ( +- 30.61% )  (0.00%)
      27,985,776,096      cpu_core/cycles:u/     ( +-  0.17% )  (100.00%)

             5.18276 +- 0.00952 seconds time elapsed  ( +-  0.18% )

  real	0m25.937s
  user	0m25.337s
  sys	0m0.533s
  $

Parallel, but non-reproducible:

  $ time perf stat -e cycles -r5 pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux.btf.parallel vmlinux' (5 runs):

      73,112,892,822      cpu_atom/cycles:u/     ( +-  0.60% )  (43.16%)
      99,124,802,605      cpu_core/cycles:u/     ( +-  0.72% )  (59.01%)

              1.9501 +- 0.0111 seconds time elapsed  ( +-  0.57% )

  real	0m9.778s
  user	1m30.700s
  sys	0m13.114s
  $

Now what we want, a reproducible build done using parallel DWARF loading
+ CUs-ordered-as-in-vmlinux serial BTF encoding:

  $ time perf stat -e cycles -r5 pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible vmlinux

   Performance counter stats for 'pahole -j --reproducible_build --btf_encode_detached=vmlinux.btf.parallel.reproducible vmlinux' (5 runs):

      21,263,444,208      cpu_atom/cycles:u/     ( +-  1.95% )  (29.18%)
      35,881,074,547      cpu_core/cycles:u/     ( +-  0.64% )  (75.60%)

              2.5354 +- 0.0221 seconds time elapsed  ( +-  0.87% )

  real	0m12.709s
  user	0m35.001s
  sys	0m9.017s
  $

Fastest is off course the unreproducible, fully parallel DWARF loading/
BTF encoding at 1.9501 +- 0.0111 seconds, but doing a reproducible build
in 2.5354 +- 0.0221 seconds is better than completely disabling -j/full
serial at 5.18276 +- 0.00952 seconds.

Comparing the BTF generated:

  $ bpftool btf dump file vmlinux.btf.serial > output.vmlinux.btf.serial
  $ bpftool btf dump file vmlinux.btf.parallel > output.vmlinux.btf.parallel
  $ bpftool btf dump file vmlinux.btf.parallel.reproducible > output.vmlinux.btf.parallel.reproducible

  $ wc -l output.vmlinux.btf.serial output.vmlinux.btf.parallel output.vmlinux.btf.parallel.reproducible
    313404 output.vmlinux.btf.serial
    314345 output.vmlinux.btf.parallel
    313404 output.vmlinux.btf.parallel.reproducible
    941153 total
  $

Non reproducible parallel BTF encoding:

  $ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel | head
  --- output.vmlinux.btf.serial	2024-04-02 11:11:56.665027947 -0300
  +++ output.vmlinux.btf.parallel	2024-04-02 11:12:38.490895460 -0300
  @@ -1,1708 +1,2553 @@
   [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
  -[2] CONST '(anon)' type_id=1
  -[3] VOLATILE '(anon)' type_id=2
  -[4] ARRAY '(anon)' type_id=1 index_type_id=21 nr_elems=2
  -[5] PTR '(anon)' type_id=8
  -[6] CONST '(anon)' type_id=5
  -[7] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=(none)
  $

Reproducible:

  $ diff -u output.vmlinux.btf.serial output.vmlinux.btf.parallel.reproducible
  $

And using a test script that I'll add to a nascent repository of
regression tests:

  $ time tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding: Ok

  real	1m13.844s
  user	3m3.601s
  sys	0m9.049s
  $

If the number of threads started by pahole is different than what was
requests via its -j command line option, it will fail as well as if the
output of 'bpftool btf dump' differs from the BTF encoded totally
serially to one of the detached BTF encoded using reproducible DWARF
loading/BTF encoding.

In verbose mode:

  $ time VERBOSE=1 tests/reproducible_build.sh vmlinux
  Parallel reproducible DWARF Loading/Serial BTF encoding:
  serial encoding...
  1 threads encoding
  1 threads started
  diff from serial encoding:
  -----------------------------
  2 threads encoding
  2 threads started
  diff from serial encoding:
  -----------------------------
  3 threads encoding
  3 threads started
  diff from serial encoding:
  -----------------------------
  4 threads encoding
  4 threads started
  diff from serial encoding:
  -----------------------------
  5 threads encoding
  5 threads started
  diff from serial encoding:
  -----------------------------
  6 threads encoding
  6 threads started
   diff from serial encoding:
  -----------------------------
  7 threads encoding
  7 threads started
  diff from serial encoding:
  -----------------------------
  8 threads encoding
  8 threads started
  diff from serial encoding:
  -----------------------------
  9 threads encoding
  9 threads started
  diff from serial encoding:
  -----------------------------
  10 threads encoding
  10 threads started
  diff from serial encoding:
  -----------------------------
  11 threads encoding
  11 threads started
  diff from serial encoding:
  -----------------------------
  12 threads encoding
  12 threads started
  diff from serial encoding:
  -----------------------------
  13 threads encoding
  13 threads started
  diff from serial encoding:
  -----------------------------
  14 threads encoding
  14 threads started
  diff from serial encoding:
  -----------------------------
  15 threads encoding
  15 threads started
  diff from serial encoding:
  -----------------------------
  16 threads encoding
  16 threads started
  diff from serial encoding:
  -----------------------------
  17 threads encoding
  17 threads started
  diff from serial encoding:
  -----------------------------
  18 threads encoding
  18 threads started
  diff from serial encoding:
  -----------------------------
  19 threads encoding
  19 threads started
  diff from serial encoding:
  -----------------------------
  20 threads encoding
  20 threads started
  diff from serial encoding:
  -----------------------------
  21 threads encoding
  21 threads started
  diff from serial encoding:
  -----------------------------
  22 threads encoding
  22 threads started
  diff from serial encoding:
  -----------------------------
  23 threads encoding
  23 threads started
  diff from serial encoding:
  -----------------------------
  24 threads encoding
  24 threads started
  diff from serial encoding:
  -----------------------------
  25 threads encoding
  25 threads started
  diff from serial encoding:
  -----------------------------
  26 threads encoding
  26 threads started
  diff from serial encoding:
  -----------------------------
  27 threads encoding
  27 threads started
  diff from serial encoding:
  -----------------------------
  28 threads encoding
  28 threads started
  diff from serial encoding:
  -----------------------------
  Ok

  real	1m14.800s
  user	3m4.315s
  sys	0m8.977s
  $

Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kui-Feng Lee <kuifeng@fb.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/all/ZhQBpAGIDU_koQnp@x1/T/#u
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 dwarves.c |  8 ++++++--
 pahole.c  | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/dwarves.c b/dwarves.c
index fbc8d8aa0060b7d0..1ec259f50dbd3778 100644
--- a/dwarves.c
+++ b/dwarves.c
@@ -489,8 +489,12 @@ struct cu *cus__get_next_processable_cu(struct cus *cus)
 			cu->state = CU__PROCESSING;
 			goto found;
 		case CU__PROCESSING:
-			// This will only happen when we get to parallel
-			// reproducible BTF encoding, libbpf dedup work needed here.
+			// This will happen when we get to parallel
+			// reproducible BTF encoding, libbpf dedup work needed
+			// here. The other possibility is when we're flushing
+			// the DWARF processed CUs when the parallel DWARF
+			// loading stoped and we still have CUs to encode to
+			// BTF because of ordering requirements.
 			continue;
 		case CU__UNPROCESSED:
 			// The first entry isn't loaded, signal the
diff --git a/pahole.c b/pahole.c
index fcb4360f11debeb9..77772bb42bb443ce 100644
--- a/pahole.c
+++ b/pahole.c
@@ -31,6 +31,7 @@
 
 static struct btf_encoder *btf_encoder;
 static char *detached_btf_filename;
+struct cus *cus;
 static bool btf_encode;
 static bool ctf_encode;
 static bool sort_output;
@@ -3324,11 +3325,32 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 			encoder = btf_encoder;
 		}
 
+		// Since we don't have yet a way to parallelize the BTF encoding, we
+		// need to ask the loader for the next CU that we can process, one
+		// that is loaded and is in order, if the next one isn't yet loaded,
+		// then return to let the DWARF loader thread to load the next one,
+		// eventually all will get processed, even if when all DWARF loading
+		// threads finish.
+		if (conf_load->reproducible_build) {
+			ret = LSK__KEEPIT; // we're not processing the cu passed to this
+					  // function, so keep it.
+			cu = cus__get_next_processable_cu(cus);
+			if (cu == NULL)
+				goto out_btf;
+		}
+
 		ret = btf_encoder__encode_cu(encoder, cu, conf_load);
 		if (ret < 0) {
 			fprintf(stderr, "Encountered error while encoding BTF.\n");
 			exit(1);
 		}
+
+		if (conf_load->reproducible_build) {
+			ret = LSK__KEEPIT; // we're not processing the cu passed to this function, so keep it.
+			// Kinda equivalent to LSK__DELETE since we processed this, but we can't delete it
+			// as we stash references to entries in CUs for 'struct function' in btf_encoder__add_saved_funcs()
+			// and btf_encoder__save_func(), so we can't delete them here. - Alan Maguire
+		}
 out_btf:
 		if (!thr_data) // See comment about reproducibe_build above
 			pthread_mutex_unlock(&btf_lock);
@@ -3632,6 +3654,24 @@ out_free:
 	return ret;
 }
 
+static int cus__flush_reproducible_build(struct cus *cus, struct btf_encoder *encoder, struct conf_load *conf_load)
+{
+	int err = 0;
+
+	while (true) {
+		struct cu *cu = cus__get_next_processable_cu(cus);
+
+		if (cu == NULL)
+			break;
+
+		err = btf_encoder__encode_cu(encoder, cu, conf_load);
+		if (err < 0)
+			break;
+	}
+
+	return err;
+}
+
 int main(int argc, char *argv[])
 {
 	int err, remaining, rc = EXIT_FAILURE;
@@ -3692,7 +3732,7 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	struct cus *cus = cus__new();
+	cus = cus__new();
 	if (cus == NULL) {
 		fputs("pahole: insufficient memory\n", stderr);
 		goto out_dwarves_exit;
@@ -3797,6 +3837,12 @@ try_sole_arg_as_class_names:
 	header = NULL;
 
 	if (btf_encode && btf_encoder) { // maybe all CUs were filtered out and thus we don't have an encoder?
+		if (conf_load.reproducible_build &&
+		    cus__flush_reproducible_build(cus, btf_encoder, &conf_load) < 0) {
+			fprintf(stderr, "Encountered error while encoding BTF.\n");
+			exit(1);
+		}
+
 		err = btf_encoder__encode(btf_encoder);
 		if (err) {
 			fputs("Failed to encode BTF\n", stderr);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2024-04-12 21:16 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-02 19:39 [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 01/12] core: Allow asking for a reproducible build Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 02/12] pahole: Disable BTF multithreaded encoded when doing reproducible builds Arnaldo Carvalho de Melo
2024-04-03 18:19   ` Andrii Nakryiko
2024-04-03 21:38     ` Arnaldo Carvalho de Melo
2024-04-03 21:43       ` Andrii Nakryiko
2024-04-04  9:42   ` Jiri Olsa
2024-04-02 19:39 ` [PATCH 03/12] dwarf_loader: Separate creating the cu/dcu pair from processing it Arnaldo Carvalho de Melo
2024-04-04  9:42   ` Jiri Olsa
2024-04-02 19:39 ` [PATCH 04/12] dwarf_loader: Introduce dwarf_cus__process_cu() Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 05/12] dwarf_loader: Create the cu/dcu pair in dwarf_cus__nextcu() Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 06/12] dwarf_loader: Remove unused 'thr_data' arg from dwarf_cus__create_and_process_cu() Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 07/12] core: Add unlocked cus__add() variant Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 08/12] core: Add cus__remove(), counterpart of cus__add() Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 09/12] dwarf_loader: Add the cu to the cus list early, remove on LSK_DELETE Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 10/12] core/dwarf_loader: Add functions to set state of CU processing Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 11/12] pahole: Encode BTF serially in a reproducible build Arnaldo Carvalho de Melo
2024-04-02 19:39 ` [PATCH 12/12] tests: Add a BTF reproducible generation test Arnaldo Carvalho de Melo
2024-04-04  0:08 ` [RFC/PATCHES 00/12] pahole: Reproducible parallel DWARF loading/serial BTF encoding Eduard Zingerman
2024-04-04  8:05   ` Alan Maguire
2024-04-09 14:34     ` Eduard Zingerman
2024-04-09 14:56       ` Alexei Starovoitov
2024-04-09 15:01         ` Eduard Zingerman
2024-04-09 18:45           ` Arnaldo Carvalho de Melo
2024-04-09 19:29             ` Eduard Zingerman
2024-04-09 19:34               ` Alexei Starovoitov
2024-04-09 19:57               ` Arnaldo Carvalho de Melo
2024-04-12 20:37       ` Arnaldo Carvalho de Melo
2024-04-12 20:40         ` Eduard Zingerman
2024-04-12 21:09           ` Arnaldo Carvalho de Melo
2024-04-12 21:10             ` Eduard Zingerman
2024-04-04  8:58 ` Alan Maguire
2024-04-08 12:00   ` Alan Maguire
2024-04-08 14:39     ` Arnaldo Carvalho de Melo
2024-04-12 20:36       ` Arnaldo Carvalho de Melo
2024-04-04  9:42 ` Jiri Olsa
2024-04-12 21:15 [PATCH 00/12] Arnaldo Carvalho de Melo
2024-04-12 21:16 ` [PATCH 11/12] pahole: Encode BTF serially in a reproducible build Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).