* [PATCH dwarves v3 0/4] Parallelize BTF type info generating of pahole
@ 2022-01-26  4:05 Kui-Feng Lee
  2022-01-26  4:05 ` [PATCH dwarves v3 1/4] dwarf_loader: Receive per-thread data on worker threads Kui-Feng Lee
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Kui-Feng Lee @ 2022-01-26  4:05 UTC (permalink / raw)
  To: dwarves, arnaldo.melo; +Cc: ast, daniel, andrii, bpf, Kui-Feng Lee

Creating a btf instance for each worker thread allows the steal
function provided by pahole to add type info on multiple threads
without taking a lock.  The main thread then merges the results of the
worker threads into the primary instance.

Copying data from the per-thread btf instances to the primary instance
is currently expensive.  However, a patch addressing this has landed in
the bpf-next repository. [1]  With that bpf-next patch and this series,
the total runtime to generate BTF for Linux drops from 6.0s to 5.4s
with "-j4" on my device.
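
For reference, the merge step amounts to something like the sketch
below.  This is illustrative only, not the pahole code: the function
name is made up, but btf__add_btf() is the libbpf API the series uses
to merge the per-thread instances.

#include <bpf/btf.h>
#include <stdio.h>

/* Fold each worker's private BTF into the primary instance. */
static int merge_thread_btfs(struct btf *primary, struct btf **per_thread,
			     int nr_threads)
{
	int i, err;

	for (i = 0; i < nr_threads; i++) {
		if (per_thread[i] == NULL)
			continue;
		/* Appends all types from per_thread[i], remapping type IDs. */
		err = btf__add_btf(primary, per_thread[i]);
		if (err < 0) {
			fprintf(stderr, "btf__add_btf failed: %d\n", err);
			return err;
		}
		btf__free(per_thread[i]);
		per_thread[i] = NULL;
	}
	return 0;
}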

V3 includes the following changes.

 - Merge types collected by worker threads at thread_exit and
   simplify the code that creates btf_encoder.

 - Update libbpf and fix uses of deprecated APIs.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=d81283d27266
[v1] https://lore.kernel.org/dwarves/20220120010817.2803482-1-kuifeng@fb.com/
[v2] https://lore.kernel.org/dwarves/20220124191858.1601255-1-kuifeng@fb.com/

Kui-Feng Lee (4):
  dwarf_loader: Receive per-thread data on worker threads.
  dwarf_loader: Prepare and pass per-thread data to worker threads.
  pahole: Use per-thread btf instances to avoid mutex locking.
  libbpf: Update libbpf to a new revision.

 btf_encoder.c  |  25 ++++++----
 btf_encoder.h  |   2 +
 btf_loader.c   |   4 +-
 ctf_loader.c   |   2 +-
 dwarf_loader.c |  58 +++++++++++++++++------
 dwarves.h      |   9 +++-
 lib/bpf        |   2 +-
 pahole.c       | 124 +++++++++++++++++++++++++++++++++++++++++++++----
 pdwtags.c      |   3 +-
 pfunct.c       |   4 +-
 10 files changed, 193 insertions(+), 40 deletions(-)

-- 
2.30.2



* [PATCH dwarves v3 1/4] dwarf_loader: Receive per-thread data on worker threads.
  2022-01-26  4:05 [PATCH dwarves v3 0/4] Parallelize BTF type info generating of pahole Kui-Feng Lee
@ 2022-01-26  4:05 ` Kui-Feng Lee
  2022-01-26  6:10   ` Andrii Nakryiko
  2022-01-26  4:05 ` [PATCH dwarves v3 2/4] dwarf_loader: Prepare and pass per-thread data to " Kui-Feng Lee
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Kui-Feng Lee @ 2022-01-26  4:05 UTC (permalink / raw)
  To: dwarves, arnaldo.melo; +Cc: ast, daniel, andrii, bpf, Kui-Feng Lee

Add arguments to steal and thread_exit callbacks of conf_load to
receive per-thread data.
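
For illustration, a steal callback under the new signature could look
like the sketch below.  This is hypothetical tool code, not part of the
patch: only the callback prototypes come from dwarves.h, and thr_data
is still always NULL until a later patch wires it up.

#include "dwarves.h"

/* Hypothetical per-thread state; dwarf_loader only passes the pointer through. */
struct my_thread_state {
	unsigned int cus_seen;
};

static enum load_steal_kind my_stealer(struct cu *cu, struct conf_load *conf,
				       void *thr_data)
{
	struct my_thread_state *state = thr_data;

	if (state)	/* NULL when the loader runs single-threaded */
		state->cus_seen++;

	return LSK__DELETE;
}

static int my_thread_exit(struct conf_load *conf, void *thr_data)
{
	/* Per-thread results would be folded into global state here. */
	return 0;
}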

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 btf_loader.c   | 2 +-
 ctf_loader.c   | 2 +-
 dwarf_loader.c | 4 ++--
 dwarves.h      | 5 +++--
 pahole.c       | 3 ++-
 pdwtags.c      | 3 ++-
 pfunct.c       | 4 +++-
 7 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/btf_loader.c b/btf_loader.c
index 7a5b16ff393e..b61cadd55127 100644
--- a/btf_loader.c
+++ b/btf_loader.c
@@ -624,7 +624,7 @@ static int cus__load_btf(struct cus *cus, struct conf_load *conf, const char *fi
 	 * The app stole this cu, possibly deleting it,
 	 * so forget about it
 	 */
-	if (conf && conf->steal && conf->steal(cu, conf))
+	if (conf && conf->steal && conf->steal(cu, conf, NULL))
 		return 0;
 
 	cus__add(cus, cu);
diff --git a/ctf_loader.c b/ctf_loader.c
index 7c34739afdce..de6d4dbfce97 100644
--- a/ctf_loader.c
+++ b/ctf_loader.c
@@ -722,7 +722,7 @@ int ctf__load_file(struct cus *cus, struct conf_load *conf,
 	 * The app stole this cu, possibly deleting it,
 	 * so forget about it
 	 */
-	if (conf && conf->steal && conf->steal(cu, conf))
+	if (conf && conf->steal && conf->steal(cu, conf, NULL))
 		return 0;
 
 	cus__add(cus, cu);
diff --git a/dwarf_loader.c b/dwarf_loader.c
index e30b03c1c541..bf9ea3765419 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -2686,7 +2686,7 @@ static int cu__finalize(struct cu *cu, struct conf_load *conf)
 {
 	cu__for_all_tags(cu, class_member__cache_byte_size, conf);
 	if (conf && conf->steal) {
-		return conf->steal(cu, conf);
+		return conf->steal(cu, conf, NULL);
 	}
 	return LSK__KEEPIT;
 }
@@ -2930,7 +2930,7 @@ static void *dwarf_cus__process_cu_thread(void *arg)
 			goto out_abort;
 	}
 
-	if (dcus->conf->thread_exit && dcus->conf->thread_exit() != 0)
+	if (dcus->conf->thread_exit && dcus->conf->thread_exit(dcus->conf, NULL) != 0)
 		goto out_abort;
 
 	return (void *)DWARF_CB_OK;
diff --git a/dwarves.h b/dwarves.h
index 52d162d67456..9a8e4de8843a 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -48,8 +48,9 @@ struct conf_fprintf;
  */
 struct conf_load {
 	enum load_steal_kind	(*steal)(struct cu *cu,
-					 struct conf_load *conf);
-	int			(*thread_exit)(void);
+					 struct conf_load *conf,
+					 void *thr_data);
+	int			(*thread_exit)(struct conf_load *conf, void *thr_data);
 	void			*cookie;
 	char			*format_path;
 	int			nr_jobs;
diff --git a/pahole.c b/pahole.c
index f3a51cb2fe74..f3eeaaca4cdf 100644
--- a/pahole.c
+++ b/pahole.c
@@ -2799,7 +2799,8 @@ out:
 static struct type_instance *header;
 
 static enum load_steal_kind pahole_stealer(struct cu *cu,
-					   struct conf_load *conf_load)
+					   struct conf_load *conf_load,
+					   void *thr_data)
 {
 	int ret = LSK__DELETE;
 
diff --git a/pdwtags.c b/pdwtags.c
index 2b5ba1bf6745..8b1d6f1c96cb 100644
--- a/pdwtags.c
+++ b/pdwtags.c
@@ -72,7 +72,8 @@ static int cu__emit_tags(struct cu *cu)
 }
 
 static enum load_steal_kind pdwtags_stealer(struct cu *cu,
-					    struct conf_load *conf_load __maybe_unused)
+					    struct conf_load *conf_load __maybe_unused,
+					    void *thr_data __maybe_unused)
 {
 	cu__emit_tags(cu);
 	return LSK__DELETE;
diff --git a/pfunct.c b/pfunct.c
index 5485622e639b..314915b774f4 100644
--- a/pfunct.c
+++ b/pfunct.c
@@ -489,7 +489,9 @@ int elf_symtabs__show(char *filenames[])
 	return EXIT_SUCCESS;
 }
 
-static enum load_steal_kind pfunct_stealer(struct cu *cu, struct conf_load *conf_load __maybe_unused)
+static enum load_steal_kind pfunct_stealer(struct cu *cu,
+					   struct conf_load *conf_load __maybe_unused,
+					   void *thr_data __maybe_unused)
 {
 
 	if (function_name) {
-- 
2.30.2



* [PATCH dwarves v3 2/4] dwarf_loader: Prepare and pass per-thread data to worker threads.
  2022-01-26  4:05 [PATCH dwarves v3 0/4] Parallelize BTF type info generating of pahole Kui-Feng Lee
  2022-01-26  4:05 ` [PATCH dwarves v3 1/4] dwarf_loader: Receive per-thread data on worker threads Kui-Feng Lee
@ 2022-01-26  4:05 ` Kui-Feng Lee
  2022-01-26  6:13   ` Andrii Nakryiko
  2022-01-26  4:05 ` [PATCH dwarves v3 3/4] pahole: Use per-thread btf instances to avoid mutex locking Kui-Feng Lee
  2022-01-26  4:05 ` [PATCH dwarves v3 4/4] libbpf: Update libbpf to a new revision Kui-Feng Lee
  3 siblings, 1 reply; 11+ messages in thread
From: Kui-Feng Lee @ 2022-01-26  4:05 UTC (permalink / raw)
  To: dwarves, arnaldo.melo; +Cc: ast, daniel, andrii, bpf, Kui-Feng Lee

Add interfaces to allow users of dwarf_loader to prepare and pass
per-thread data to steal-functions running on worker threads.
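
As an example of how a tool might use the new hooks, here is a rough
sketch.  It is hypothetical user code: only the threads_prepare and
threads_collect prototypes come from this patch, everything else is
made up for illustration.

#include <errno.h>
#include <stdlib.h>

#include "dwarves.h"

/* Hypothetical per-thread accumulator. */
struct my_state {
	long n_tags;
};

static int my_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
{
	struct my_state *states = calloc(nr_threads, sizeof(*states));
	int i;

	if (states == NULL)
		return -ENOMEM;

	for (i = 0; i < nr_threads; i++)
		thr_data[i] = &states[i];

	return 0;
}

static int my_threads_collect(struct conf_load *conf, int nr_threads,
			      void **thr_data, int error)
{
	long total = 0;
	int i;

	for (i = 0; i < nr_threads; i++)
		total += ((struct my_state *)thr_data[i])->n_tags;

	/* The states were allocated as a single block in my_threads_prepare(). */
	free(thr_data[0]);

	return 0;
}

The hooks would then be assigned to conf_load.threads_prepare and
conf_load.threads_collect before loading, as pahole does later in this
series.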

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 dwarf_loader.c | 58 +++++++++++++++++++++++++++++++++++++++-----------
 dwarves.h      |  4 ++++
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/dwarf_loader.c b/dwarf_loader.c
index bf9ea3765419..ed415d2db11f 100644
--- a/dwarf_loader.c
+++ b/dwarf_loader.c
@@ -2682,18 +2682,18 @@ static int class_member__cache_byte_size(struct tag *tag, struct cu *cu,
 	return 0;
 }
 
-static int cu__finalize(struct cu *cu, struct conf_load *conf)
+static int cu__finalize(struct cu *cu, struct conf_load *conf, void *thr_data)
 {
 	cu__for_all_tags(cu, class_member__cache_byte_size, conf);
 	if (conf && conf->steal) {
-		return conf->steal(cu, conf, NULL);
+		return conf->steal(cu, conf, thr_data);
 	}
 	return LSK__KEEPIT;
 }
 
-static int cus__finalize(struct cus *cus, struct cu *cu, struct conf_load *conf)
+static int cus__finalize(struct cus *cus, struct cu *cu, struct conf_load *conf, void *thr_data)
 {
-	int lsk = cu__finalize(cu, conf);
+	int lsk = cu__finalize(cu, conf, thr_data);
 	switch (lsk) {
 	case LSK__DELETE:
 		cu__delete(cu);
@@ -2862,7 +2862,13 @@ struct dwarf_cus {
 	struct dwarf_cu	    *type_dcu;
 };
 
-static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die, uint8_t pointer_size)
+struct dwarf_thread {
+	struct dwarf_cus	*dcus;
+	void			*data;
+};
+
+static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *cu_die,
+					    uint8_t pointer_size, void *thr_data)
 {
 	/*
 	 * DW_AT_name in DW_TAG_compile_unit can be NULL, first seen in:
@@ -2884,7 +2890,7 @@ static int dwarf_cus__create_and_process_cu(struct dwarf_cus *dcus, Dwarf_Die *c
 	cu->dfops = &dwarf__ops;
 
 	if (die__process_and_recode(cu_die, cu, dcus->conf) != 0 ||
-	    cus__finalize(dcus->cus, cu, dcus->conf) == LSK__STOP_LOADING)
+	    cus__finalize(dcus->cus, cu, dcus->conf, thr_data) == LSK__STOP_LOADING)
 		return DWARF_CB_ABORT;
 
        return DWARF_CB_OK;
@@ -2918,7 +2924,8 @@ out_unlock:
 
 static void *dwarf_cus__process_cu_thread(void *arg)
 {
-	struct dwarf_cus *dcus = arg;
+	struct dwarf_thread *dthr = arg;
+	struct dwarf_cus *dcus = dthr->dcus;
 	uint8_t pointer_size, offset_size;
 	Dwarf_Die die_mem, *cu_die;
 
@@ -2926,11 +2933,13 @@ static void *dwarf_cus__process_cu_thread(void *arg)
 		if (cu_die == NULL)
 			break;
 
-		if (dwarf_cus__create_and_process_cu(dcus, cu_die, pointer_size) == DWARF_CB_ABORT)
+		if (dwarf_cus__create_and_process_cu(dcus, cu_die,
+						     pointer_size, dthr->data) == DWARF_CB_ABORT)
 			goto out_abort;
 	}
 
-	if (dcus->conf->thread_exit && dcus->conf->thread_exit(dcus->conf, NULL) != 0)
+	if (dcus->conf->thread_exit &&
+	    dcus->conf->thread_exit(dcus->conf, dthr->data) != 0)
 		goto out_abort;
 
 	return (void *)DWARF_CB_OK;
@@ -2941,10 +2950,25 @@ out_abort:
 static int dwarf_cus__threaded_process_cus(struct dwarf_cus *dcus)
 {
 	pthread_t threads[dcus->conf->nr_jobs];
+	struct dwarf_thread dthr[dcus->conf->nr_jobs];
+	void *thread_data[dcus->conf->nr_jobs];
+	int res;
 	int i;
 
+	if (dcus->conf->threads_prepare) {
+		res = dcus->conf->threads_prepare(dcus->conf, dcus->conf->nr_jobs, thread_data);
+		if (res != 0)
+			return res;
+	} else
+		memset(thread_data, 0, sizeof(void *) * dcus->conf->nr_jobs);
+
 	for (i = 0; i < dcus->conf->nr_jobs; ++i) {
-		dcus->error = pthread_create(&threads[i], NULL, dwarf_cus__process_cu_thread, dcus);
+		dthr[i].dcus = dcus;
+		dthr[i].data = thread_data[i];
+
+		dcus->error = pthread_create(&threads[i], NULL,
+					     dwarf_cus__process_cu_thread,
+					     &dthr[i]);
 		if (dcus->error)
 			goto out_join;
 	}
@@ -2960,6 +2984,13 @@ out_join:
 			dcus->error = (long)res;
 	}
 
+	if (dcus->conf->threads_collect) {
+		res = dcus->conf->threads_collect(dcus->conf, dcus->conf->nr_jobs,
+						  thread_data, dcus->error);
+		if (dcus->error == 0)
+			dcus->error = res;
+	}
+
 	return dcus->error;
 }
 
@@ -2976,7 +3007,8 @@ static int __dwarf_cus__process_cus(struct dwarf_cus *dcus)
 		if (cu_die == NULL)
 			break;
 
-		if (dwarf_cus__create_and_process_cu(dcus, cu_die, pointer_size) == DWARF_CB_ABORT)
+		if (dwarf_cus__create_and_process_cu(dcus, cu_die,
+						     pointer_size, NULL) == DWARF_CB_ABORT)
 			return DWARF_CB_ABORT;
 
 		dcus->off = noff;
@@ -3070,7 +3102,7 @@ static int cus__merge_and_process_cu(struct cus *cus, struct conf_load *conf,
 	if (cu__resolve_func_ret_types(cu) != LSK__KEEPIT)
 		goto out_abort;
 
-	if (cus__finalize(cus, cu, conf) == LSK__STOP_LOADING)
+	if (cus__finalize(cus, cu, conf, NULL) == LSK__STOP_LOADING)
 		goto out_abort;
 
 	return 0;
@@ -3102,7 +3134,7 @@ static int cus__load_module(struct cus *cus, struct conf_load *conf,
 	}
 
 	if (type_cu != NULL) {
-		type_lsk = cu__finalize(type_cu, conf);
+		type_lsk = cu__finalize(type_cu, conf, NULL);
 		if (type_lsk == LSK__KEEPIT) {
 			cus__add(cus, type_cu);
 		}
diff --git a/dwarves.h b/dwarves.h
index 9a8e4de8843a..de152f9b64cf 100644
--- a/dwarves.h
+++ b/dwarves.h
@@ -71,6 +71,10 @@ struct conf_load {
 	const char		*kabi_prefix;
 	struct btf		*base_btf;
 	struct conf_fprintf	*conf_fprintf;
+	int			(*threads_prepare)(struct conf_load *conf, int nr_threads,
+						   void **thr_data);
+	int			(*threads_collect)(struct conf_load *conf, int nr_threads,
+						   void **thr_data, int error);
 };
 
 /** struct conf_fprintf - hints to the __fprintf routines
-- 
2.30.2



* [PATCH dwarves v3 3/4] pahole: Use per-thread btf instances to avoid mutex locking.
  2022-01-26  4:05 [PATCH dwarves v3 0/4] Parallelize BTF type info generating of pahole Kui-Feng Lee
  2022-01-26  4:05 ` [PATCH dwarves v3 1/4] dwarf_loader: Receive per-thread data on worker threads Kui-Feng Lee
  2022-01-26  4:05 ` [PATCH dwarves v3 2/4] dwarf_loader: Prepare and pass per-thread data to " Kui-Feng Lee
@ 2022-01-26  4:05 ` Kui-Feng Lee
  2022-01-26  6:07   ` Andrii Nakryiko
  2022-01-26  4:05 ` [PATCH dwarves v3 4/4] libbpf: Update libbpf to a new revision Kui-Feng Lee
  3 siblings, 1 reply; 11+ messages in thread
From: Kui-Feng Lee @ 2022-01-26  4:05 UTC (permalink / raw)
  To: dwarves, arnaldo.melo; +Cc: ast, daniel, andrii, bpf, Kui-Feng Lee

Create a btf instance for each worker thread, and add type info to
the local btf instance in the steal function of pahole without
acquiring a mutex.  Once all worker threads have finished, merge the
per-thread btf instances into the primary btf instance.
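
In simplified form, the merge performed when a worker exits is the
sketch below; the real code in this patch additionally reuses the
primary encoder for the first worker and goes through btf_encoder
accessors.

#include <pthread.h>

#include <bpf/btf.h>

/* Simplified shape of the per-thread merge done at thread exit time. */
static int merge_on_thread_exit(struct btf *primary, pthread_mutex_t *lock,
				struct btf *thread_btf)
{
	int err;

	pthread_mutex_lock(lock);
	err = btf__add_btf(primary, thread_btf);	/* append this thread's types */
	pthread_mutex_unlock(lock);

	return err < 0 ? err : 0;
}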

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
---
 btf_encoder.c |   5 +++
 btf_encoder.h |   2 +
 pahole.c      | 121 ++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 120 insertions(+), 8 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 9d015f304e92..56a76f5d7275 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -1529,3 +1529,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu)
 out:
 	return err;
 }
+
+struct btf *btf_encoder__btf(struct btf_encoder *encoder)
+{
+	return encoder->btf;
+}
diff --git a/btf_encoder.h b/btf_encoder.h
index f133b0d7674d..0f0eee84df74 100644
--- a/btf_encoder.h
+++ b/btf_encoder.h
@@ -29,4 +29,6 @@ struct btf_encoder *btf_encoders__first(struct list_head *encoders);
 
 struct btf_encoder *btf_encoders__next(struct btf_encoder *encoder);
 
+struct btf *btf_encoder__btf(struct btf_encoder *encoder);
+
 #endif /* _BTF_ENCODER_H_ */
diff --git a/pahole.c b/pahole.c
index f3eeaaca4cdf..525eb4d90b08 100644
--- a/pahole.c
+++ b/pahole.c
@@ -29,6 +29,7 @@
 #include "btf_encoder.h"
 
 static struct btf_encoder *btf_encoder;
+static pthread_mutex_t btf_encoder_lock = PTHREAD_MUTEX_INITIALIZER;
 static char *detached_btf_filename;
 static bool btf_encode;
 static bool btf_gen_floats;
@@ -2798,6 +2799,65 @@ out:
 
 static struct type_instance *header;
 
+struct thread_data {
+	struct btf *btf;
+	struct btf_encoder *encoder;
+};
+
+static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
+{
+	int i;
+	struct thread_data *threads = calloc(sizeof(struct thread_data), nr_threads);
+
+	for (i = 0; i < nr_threads; i++)
+		thr_data[i] = threads + i;
+
+	return 0;
+}
+
+static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
+{
+	struct thread_data *thread = thr_data;
+
+	if (thread == NULL)
+		return 0;
+
+	/*
+	 * Here we will call btf__dedup() once we extend
+	 * btf__dedup().
+	 */
+
+	if (thread->encoder == btf_encoder) {
+		/* Release the lock acuqired when created btf_encoder */
+		pthread_mutex_unlock(&btf_encoder_lock);
+		return 0;
+	}
+
+	pthread_mutex_lock(&btf_encoder_lock);
+	btf__add_btf(btf_encoder__btf(btf_encoder), thread->btf);
+	pthread_mutex_unlock(&btf_encoder_lock);
+
+	btf_encoder__delete(thread->encoder);
+	thread->encoder = NULL;
+
+	return 0;
+}
+
+static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void **thr_data,
+				  int error)
+{
+	struct thread_data **threads = (struct thread_data **)thr_data;
+	int i;
+
+	for (i = 0; i < nr_threads; i++) {
+		if (threads[i]->encoder && threads[i]->encoder != btf_encoder)
+			btf_encoder__delete(threads[i]->encoder);
+	}
+	free(threads[0]);
+
+	return 0;
+}
+
 static enum load_steal_kind pahole_stealer(struct cu *cu,
 					   struct conf_load *conf_load,
 					   void *thr_data)
@@ -2819,30 +2879,72 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
 
 	if (btf_encode) {
 		static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;
+		struct btf_encoder *encoder;
 
-		pthread_mutex_lock(&btf_lock);
 		/*
 		 * FIXME:
 		 *
 		 * This should be really done at main(), but since in the current codebase only at this
 		 * point we'll have cu->elf setup...
 		 */
+		pthread_mutex_lock(&btf_lock);
 		if (!btf_encoder) {
-			btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
-						       btf_encode_force, btf_gen_floats, global_verbose);
-			if (btf_encoder == NULL) {
-				ret = LSK__STOP_LOADING;
-				goto out_btf;
+			/*
+			 * btf_encoder is the primary encoder,
+			 * and it is used by the thread that
+			 * creates it.
+			 */
+			btf_encoder = btf_encoder__new(cu, detached_btf_filename,
+						       conf_load->base_btf,
+						       skip_encoding_btf_vars,
+						       btf_encode_force,
+						       btf_gen_floats, global_verbose);
+			if (btf_encoder && thr_data) {
+				struct thread_data *thread = (struct thread_data *)thr_data;
+
+				thread->encoder = btf_encoder;
+				thread->btf = btf_encoder__btf(btf_encoder);
+				/* Will be relased by pahole_thread_exit() */
+				pthread_mutex_lock(&btf_encoder_lock);
 			}
 		}
+		pthread_mutex_unlock(&btf_lock);
+
+		if (btf_encoder == NULL) {
+			ret = LSK__STOP_LOADING;
+			goto out_btf;
+		}
 
-		if (btf_encoder__encode_cu(btf_encoder, cu)) {
+		/*
+		 * thr_data keeps per-thread data for worker threads.  Each worker thread
+		 * has an encoder.  The main thread will merge the data collected by all
+		 * these encoders into btf_encoder.  However, the first thread reaching this
+		 * function creates btf_encoder and reuses it as its local encoder.  This
+		 * avoids copying the data collected by the first thread.
+		 */
+		if (thr_data) {
+			struct thread_data *thread = (struct thread_data *)thr_data;
+
+			if (thread->encoder == NULL) {
+				thread->encoder =
+					btf_encoder__new(cu, detached_btf_filename,
+							 NULL,
+							 skip_encoding_btf_vars,
+							 btf_encode_force,
+							 btf_gen_floats,
+							 global_verbose);
+				thread->btf = btf_encoder__btf(thread->encoder);
+			}
+			encoder = thread->encoder;
+		} else
+			encoder = btf_encoder;
+
+		if (btf_encoder__encode_cu(encoder, cu)) {
 			fprintf(stderr, "Encountered error while encoding BTF.\n");
 			exit(1);
 		}
 		ret = LSK__DELETE;
 out_btf:
-		pthread_mutex_unlock(&btf_lock);
 		return ret;
 	}
 #if 0
@@ -3207,6 +3309,9 @@ int main(int argc, char *argv[])
 	memset(tab, ' ', sizeof(tab) - 1);
 
 	conf_load.steal = pahole_stealer;
+	conf_load.thread_exit = pahole_thread_exit;
+	conf_load.threads_prepare = pahole_threads_prepare;
+	conf_load.threads_collect = pahole_threads_collect;
 
 	// Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
 	if (conf.header_type && !class_name && prettify_input) {
-- 
2.30.2



* [PATCH dwarves v3 4/4] libbpf: Update libbpf to a new revision.
  2022-01-26  4:05 [PATCH dwarves v3 0/4] Parallelize BTF type info generating of pahole Kui-Feng Lee
                   ` (2 preceding siblings ...)
  2022-01-26  4:05 ` [PATCH dwarves v3 3/4] pahole: Use per-thread btf instances to avoid mutex locking Kui-Feng Lee
@ 2022-01-26  4:05 ` Kui-Feng Lee
  2022-01-26  6:08   ` Andrii Nakryiko
  3 siblings, 1 reply; 11+ messages in thread
From: Kui-Feng Lee @ 2022-01-26  4:05 UTC (permalink / raw)
  To: dwarves, arnaldo.melo; +Cc: ast, daniel, andrii, bpf, Kui-Feng Lee

Replace deprecated APIs with new ones.
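
As a quick orientation (summarized here, not taken from libbpf
documentation), the mapping applied below is the following; the key
difference is that btf__type_cnt() counts the implicit id-0 "void"
type, hence the "- 1" adjustments.

#include <bpf/btf.h>
#include <stdio.h>

/* btf__get_nr_types(btf)        -> btf__type_cnt(btf) - 1
 * btf__get_raw_data(btf, &size) -> btf__raw_data(btf, &size)
 * btf__dedup(btf, NULL, NULL)   -> btf__dedup(btf, NULL)
 */
static void show_type_count(const struct btf *btf)
{
	__u32 cnt = btf__type_cnt(btf);	/* includes the implicit id-0 "void" type */

	printf("BTF carries %u real types\n", cnt - 1);
}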
---
 btf_encoder.c | 20 ++++++++++----------
 btf_loader.c  |  2 +-
 lib/bpf       |  2 +-
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/btf_encoder.c b/btf_encoder.c
index 56a76f5d7275..a31dbe3edc93 100644
--- a/btf_encoder.c
+++ b/btf_encoder.c
@@ -172,7 +172,7 @@ __attribute ((format (printf, 5, 6)))
 static void btf__log_err(const struct btf *btf, int kind, const char *name,
 			 bool output_cr, const char *fmt, ...)
 {
-	fprintf(stderr, "[%u] %s %s", btf__get_nr_types(btf) + 1,
+	fprintf(stderr, "[%u] %s %s", btf__type_cnt(btf),
 		btf_kind_str[kind], name ?: "(anon)");
 
 	if (fmt && *fmt) {
@@ -203,7 +203,7 @@ static void btf_encoder__log_type(const struct btf_encoder *encoder, const struc
 	out = err ? stderr : stdout;
 
 	fprintf(out, "[%u] %s %s",
-		btf__get_nr_types(btf), btf_kind_str[kind],
+		btf__type_cnt(btf) - 1, btf_kind_str[kind],
 		btf__printable_name(btf, t->name_off));
 
 	if (fmt && *fmt) {
@@ -449,10 +449,10 @@ static int btf_encoder__add_field(struct btf_encoder *encoder, const char *name,
 	int err;
 
 	err = btf__add_field(btf, name, type, offset, bitfield_size);
-	t = btf__type_by_id(btf, btf__get_nr_types(btf));
+	t = btf__type_by_id(btf, btf__type_cnt(btf) - 1);
 	if (err) {
 		fprintf(stderr, "[%u] %s %s's field '%s' offset=%u bit_size=%u type=%u Error emitting field\n",
-			btf__get_nr_types(btf), btf_kind_str[btf_kind(t)],
+			btf__type_cnt(btf) - 1, btf_kind_str[btf_kind(t)],
 			btf__printable_name(btf, t->name_off),
 			name, offset, bitfield_size, type);
 	} else {
@@ -899,9 +899,9 @@ static int btf_encoder__write_raw_file(struct btf_encoder *encoder)
 	const void *raw_btf_data;
 	int fd, err;
 
-	raw_btf_data = btf__get_raw_data(encoder->btf, &raw_btf_size);
+	raw_btf_data = btf__raw_data(encoder->btf, &raw_btf_size);
 	if (raw_btf_data == NULL) {
-		fprintf(stderr, "%s: btf__get_raw_data failed!\n", __func__);
+		fprintf(stderr, "%s: btf__raw_data failed!\n", __func__);
 		return -1;
 	}
 
@@ -976,7 +976,7 @@ static int btf_encoder__write_elf(struct btf_encoder *encoder)
 		}
 	}
 
-	raw_btf_data = btf__get_raw_data(btf, &raw_btf_size);
+	raw_btf_data = btf__raw_data(btf, &raw_btf_size);
 
 	if (btf_data) {
 		/* Existing .BTF section found */
@@ -1043,10 +1043,10 @@ int btf_encoder__encode(struct btf_encoder *encoder)
 		btf_encoder__add_datasec(encoder, PERCPU_SECTION);
 
 	/* Empty file, nothing to do, so... done! */
-	if (btf__get_nr_types(encoder->btf) == 0)
+	if (btf__type_cnt(encoder->btf) - 1 == 0)
 		return 0;
 
-	if (btf__dedup(encoder->btf, NULL, NULL)) {
+	if (btf__dedup(encoder->btf, NULL)) {
 		fprintf(stderr, "%s: btf__dedup failed!\n", __func__);
 		return -1;
 	}
@@ -1403,7 +1403,7 @@ void btf_encoder__delete(struct btf_encoder *encoder)
 
 int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu)
 {
-	uint32_t type_id_off = btf__get_nr_types(encoder->btf);
+	uint32_t type_id_off = btf__type_cnt(encoder->btf) - 1;
 	struct llvm_annotation *annot;
 	int btf_type_id, tag_type_id;
 	uint32_t core_id;
diff --git a/btf_loader.c b/btf_loader.c
index b61cadd55127..d9d59aa889a0 100644
--- a/btf_loader.c
+++ b/btf_loader.c
@@ -399,7 +399,7 @@ static int btf__load_types(struct btf *btf, struct cu *cu)
 	uint32_t type_index;
 	int err;
 
-	for (type_index = 1; type_index <= btf__get_nr_types(btf); type_index++) {
+	for (type_index = 1; type_index <= btf__type_cnt(btf) - 1; type_index++) {
 		const struct btf_type *type_ptr = btf__type_by_id(btf, type_index);
 		uint32_t type = btf_kind(type_ptr);
 
diff --git a/lib/bpf b/lib/bpf
index 94a49850c5ee..393a058d061d 160000
--- a/lib/bpf
+++ b/lib/bpf
@@ -1 +1 @@
-Subproject commit 94a49850c5ee61ea02dfcbabf48013391e8cecdf
+Subproject commit 393a058d061d49d5c3055fa9eefafb4c0c31ccc3
-- 
2.30.2



* Re: [PATCH dwarves v3 3/4] pahole: Use per-thread btf instances to avoid mutex locking.
  2022-01-26  4:05 ` [PATCH dwarves v3 3/4] pahole: Use per-thread btf instances to avoid mutex locking Kui-Feng Lee
@ 2022-01-26  6:07   ` Andrii Nakryiko
  2022-01-26 18:39     ` Kui-Feng Lee
  0 siblings, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2022-01-26  6:07 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: dwarves, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, bpf

On Tue, Jan 25, 2022 at 8:07 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Create a btf instance for each worker thread, and add type info to
> the local btf instance in the steal function of pahole without
> acquiring a mutex.  Once all worker threads have finished, merge the
> per-thread btf instances into the primary btf instance.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---
>  btf_encoder.c |   5 +++
>  btf_encoder.h |   2 +
>  pahole.c      | 121 ++++++++++++++++++++++++++++++++++++++++++++++----
>  3 files changed, 120 insertions(+), 8 deletions(-)
>
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 9d015f304e92..56a76f5d7275 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -1529,3 +1529,8 @@ int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu)
>  out:
>         return err;
>  }
> +
> +struct btf *btf_encoder__btf(struct btf_encoder *encoder)
> +{
> +       return encoder->btf;
> +}
> diff --git a/btf_encoder.h b/btf_encoder.h
> index f133b0d7674d..0f0eee84df74 100644
> --- a/btf_encoder.h
> +++ b/btf_encoder.h
> @@ -29,4 +29,6 @@ struct btf_encoder *btf_encoders__first(struct list_head *encoders);
>
>  struct btf_encoder *btf_encoders__next(struct btf_encoder *encoder);
>
> +struct btf *btf_encoder__btf(struct btf_encoder *encoder);
> +
>  #endif /* _BTF_ENCODER_H_ */
> diff --git a/pahole.c b/pahole.c
> index f3eeaaca4cdf..525eb4d90b08 100644
> --- a/pahole.c
> +++ b/pahole.c
> @@ -29,6 +29,7 @@
>  #include "btf_encoder.h"
>
>  static struct btf_encoder *btf_encoder;
> +static pthread_mutex_t btf_encoder_lock = PTHREAD_MUTEX_INITIALIZER;
>  static char *detached_btf_filename;
>  static bool btf_encode;
>  static bool btf_gen_floats;
> @@ -2798,6 +2799,65 @@ out:
>
>  static struct type_instance *header;
>
> +struct thread_data {
> +       struct btf *btf;
> +       struct btf_encoder *encoder;
> +};
> +
> +static int pahole_threads_prepare(struct conf_load *conf, int nr_threads, void **thr_data)
> +{
> +       int i;
> +       struct thread_data *threads = calloc(sizeof(struct thread_data), nr_threads);
> +
> +       for (i = 0; i < nr_threads; i++)
> +               thr_data[i] = threads + i;
> +
> +       return 0;
> +}
> +
> +static int pahole_thread_exit(struct conf_load *conf, void *thr_data)
> +{
> +       struct thread_data *thread = thr_data;
> +
> +       if (thread == NULL)
> +               return 0;
> +
> +       /*
> +        * Here we will call btf__dedup() once we extend
> +        * btf__dedup().
> +        */
> +
> +       if (thread->encoder == btf_encoder) {
> +               /* Release the lock acuqired when created btf_encoder */

typo: acquired

> +               pthread_mutex_unlock(&btf_encoder_lock);

Splitting pthread_mutex lock/unlock like this is extremely dangerous
and error prone. If that's the price for reusing global btf_encoder
for first worker, then I'd rather not reuse btf_encoder or revert back
to doing btf__add_btf() and doing btf_encoder__delete() in the main
thread.

Please question and push back on the approach and code review feedback
if something doesn't feel natural or is causing more problems than it
solves.

I think the cleanest solution would be to not reuse global btf_encoder
for the first worker. I suspect the time difference isn't big, so I'd
optimize for simplicity and clean separation. But if it is causing
unnecessary slowdown, then as I said, let's just revert back to your
previous approach with doing btf__add_btf() in the main thread.

> +               return 0;
> +       }
> +
> +       pthread_mutex_lock(&btf_encoder_lock);
> +       btf__add_btf(btf_encoder__btf(btf_encoder), thread->btf);
> +       pthread_mutex_unlock(&btf_encoder_lock);
> +
> +       btf_encoder__delete(thread->encoder);
> +       thread->encoder = NULL;
> +
> +       return 0;
> +}
> +
> +static int pahole_threads_collect(struct conf_load *conf, int nr_threads, void **thr_data,
> +                                 int error)
> +{
> +       struct thread_data **threads = (struct thread_data **)thr_data;
> +       int i;
> +
> +       for (i = 0; i < nr_threads; i++) {
> +               if (threads[i]->encoder && threads[i]->encoder != btf_encoder)
> +                       btf_encoder__delete(threads[i]->encoder);
> +       }
> +       free(threads[0]);
> +
> +       return 0;
> +}
> +
>  static enum load_steal_kind pahole_stealer(struct cu *cu,
>                                            struct conf_load *conf_load,
>                                            void *thr_data)
> @@ -2819,30 +2879,72 @@ static enum load_steal_kind pahole_stealer(struct cu *cu,
>
>         if (btf_encode) {
>                 static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;
> +               struct btf_encoder *encoder;
>
> -               pthread_mutex_lock(&btf_lock);
>                 /*
>                  * FIXME:
>                  *
>                  * This should be really done at main(), but since in the current codebase only at this
>                  * point we'll have cu->elf setup...
>                  */
> +               pthread_mutex_lock(&btf_lock);
>                 if (!btf_encoder) {
> -                       btf_encoder = btf_encoder__new(cu, detached_btf_filename, conf_load->base_btf, skip_encoding_btf_vars,
> -                                                      btf_encode_force, btf_gen_floats, global_verbose);
> -                       if (btf_encoder == NULL) {
> -                               ret = LSK__STOP_LOADING;
> -                               goto out_btf;
> +                       /*
> +                        * btf_encoder is the primary encoder,
> +                        * and it is used by the thread that
> +                        * creates it.
> +                        */
> +                       btf_encoder = btf_encoder__new(cu, detached_btf_filename,
> +                                                      conf_load->base_btf,
> +                                                      skip_encoding_btf_vars,
> +                                                      btf_encode_force,
> +                                                      btf_gen_floats, global_verbose);
> +                       if (btf_encoder && thr_data) {
> +                               struct thread_data *thread = (struct thread_data *)thr_data;

nit: no need for the cast

> +
> +                               thread->encoder = btf_encoder;
> +                               thread->btf = btf_encoder__btf(btf_encoder);
> +                               /* Will be relased by pahole_thread_exit() */

typo: released

> +                               pthread_mutex_lock(&btf_encoder_lock);
>                         }
>                 }
> +               pthread_mutex_unlock(&btf_lock);
> +

[...]

> @@ -3207,6 +3309,9 @@ int main(int argc, char *argv[])
>         memset(tab, ' ', sizeof(tab) - 1);
>
>         conf_load.steal = pahole_stealer;
> +       conf_load.thread_exit = pahole_thread_exit;
> +       conf_load.threads_prepare = pahole_threads_prepare;
> +       conf_load.threads_collect = pahole_threads_collect;
>
>         // Make 'pahole --header type < file' a shorter form of 'pahole -C type --count 1 < file'
>         if (conf.header_type && !class_name && prettify_input) {
> --
> 2.30.2
>


* Re: [PATCH dwarves v3 4/4] libbpf: Update libbpf to a new revision.
  2022-01-26  4:05 ` [PATCH dwarves v3 4/4] libbpf: Update libbpf to a new revision Kui-Feng Lee
@ 2022-01-26  6:08   ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2022-01-26  6:08 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: dwarves, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, bpf

On Tue, Jan 25, 2022 at 8:07 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Replace deprecated APIs with new ones.
> ---

LGTM with few nits below.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  btf_encoder.c | 20 ++++++++++----------
>  btf_loader.c  |  2 +-
>  lib/bpf       |  2 +-
>  3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 56a76f5d7275..a31dbe3edc93 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -172,7 +172,7 @@ __attribute ((format (printf, 5, 6)))
>  static void btf__log_err(const struct btf *btf, int kind, const char *name,
>                          bool output_cr, const char *fmt, ...)
>  {
> -       fprintf(stderr, "[%u] %s %s", btf__get_nr_types(btf) + 1,
> +       fprintf(stderr, "[%u] %s %s", btf__type_cnt(btf),
>                 btf_kind_str[kind], name ?: "(anon)");
>
>         if (fmt && *fmt) {
> @@ -203,7 +203,7 @@ static void btf_encoder__log_type(const struct btf_encoder *encoder, const struc
>         out = err ? stderr : stdout;
>
>         fprintf(out, "[%u] %s %s",
> -               btf__get_nr_types(btf), btf_kind_str[kind],
> +               btf__type_cnt(btf) - 1, btf_kind_str[kind],
>                 btf__printable_name(btf, t->name_off));
>
>         if (fmt && *fmt) {
> @@ -449,10 +449,10 @@ static int btf_encoder__add_field(struct btf_encoder *encoder, const char *name,
>         int err;
>
>         err = btf__add_field(btf, name, type, offset, bitfield_size);
> -       t = btf__type_by_id(btf, btf__get_nr_types(btf));
> +       t = btf__type_by_id(btf, btf__type_cnt(btf) - 1);
>         if (err) {
>                 fprintf(stderr, "[%u] %s %s's field '%s' offset=%u bit_size=%u type=%u Error emitting field\n",
> -                       btf__get_nr_types(btf), btf_kind_str[btf_kind(t)],
> +                       btf__type_cnt(btf) - 1, btf_kind_str[btf_kind(t)],
>                         btf__printable_name(btf, t->name_off),
>                         name, offset, bitfield_size, type);
>         } else {
> @@ -899,9 +899,9 @@ static int btf_encoder__write_raw_file(struct btf_encoder *encoder)
>         const void *raw_btf_data;
>         int fd, err;
>
> -       raw_btf_data = btf__get_raw_data(encoder->btf, &raw_btf_size);
> +       raw_btf_data = btf__raw_data(encoder->btf, &raw_btf_size);
>         if (raw_btf_data == NULL) {
> -               fprintf(stderr, "%s: btf__get_raw_data failed!\n", __func__);
> +               fprintf(stderr, "%s: btf__raw_data failed!\n", __func__);
>                 return -1;
>         }
>
> @@ -976,7 +976,7 @@ static int btf_encoder__write_elf(struct btf_encoder *encoder)
>                 }
>         }
>
> -       raw_btf_data = btf__get_raw_data(btf, &raw_btf_size);
> +       raw_btf_data = btf__raw_data(btf, &raw_btf_size);
>
>         if (btf_data) {
>                 /* Existing .BTF section found */
> @@ -1043,10 +1043,10 @@ int btf_encoder__encode(struct btf_encoder *encoder)
>                 btf_encoder__add_datasec(encoder, PERCPU_SECTION);
>
>         /* Empty file, nothing to do, so... done! */
> -       if (btf__get_nr_types(encoder->btf) == 0)
> +       if (btf__type_cnt(encoder->btf) - 1 == 0)

btf__type_cnt() == 1 ?

>                 return 0;
>
> -       if (btf__dedup(encoder->btf, NULL, NULL)) {
> +       if (btf__dedup(encoder->btf, NULL)) {
>                 fprintf(stderr, "%s: btf__dedup failed!\n", __func__);
>                 return -1;
>         }
> @@ -1403,7 +1403,7 @@ void btf_encoder__delete(struct btf_encoder *encoder)
>
>  int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu)
>  {
> -       uint32_t type_id_off = btf__get_nr_types(encoder->btf);
> +       uint32_t type_id_off = btf__type_cnt(encoder->btf) - 1;
>         struct llvm_annotation *annot;
>         int btf_type_id, tag_type_id;
>         uint32_t core_id;
> diff --git a/btf_loader.c b/btf_loader.c
> index b61cadd55127..d9d59aa889a0 100644
> --- a/btf_loader.c
> +++ b/btf_loader.c
> @@ -399,7 +399,7 @@ static int btf__load_types(struct btf *btf, struct cu *cu)
>         uint32_t type_index;
>         int err;
>
> -       for (type_index = 1; type_index <= btf__get_nr_types(btf); type_index++) {
> +       for (type_index = 1; type_index <= btf__type_cnt(btf) - 1; type_index++) {

nit: type_index < btf__type_cnt(btf) would be more natural


>                 const struct btf_type *type_ptr = btf__type_by_id(btf, type_index);
>                 uint32_t type = btf_kind(type_ptr);
>
> diff --git a/lib/bpf b/lib/bpf
> index 94a49850c5ee..393a058d061d 160000
> --- a/lib/bpf
> +++ b/lib/bpf
> @@ -1 +1 @@
> -Subproject commit 94a49850c5ee61ea02dfcbabf48013391e8cecdf
> +Subproject commit 393a058d061d49d5c3055fa9eefafb4c0c31ccc3
> --
> 2.30.2
>


* Re: [PATCH dwarves v3 1/4] dwarf_loader: Receive per-thread data on worker threads.
  2022-01-26  4:05 ` [PATCH dwarves v3 1/4] dwarf_loader: Receive per-thread data on worker threads Kui-Feng Lee
@ 2022-01-26  6:10   ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2022-01-26  6:10 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: dwarves, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, bpf

On Tue, Jan 25, 2022 at 8:06 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Add arguments to steal and thread_exit callbacks of conf_load to
> receive per-thread data.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  btf_loader.c   | 2 +-
>  ctf_loader.c   | 2 +-
>  dwarf_loader.c | 4 ++--
>  dwarves.h      | 5 +++--
>  pahole.c       | 3 ++-
>  pdwtags.c      | 3 ++-
>  pfunct.c       | 4 +++-
>  7 files changed, 14 insertions(+), 9 deletions(-)
>

[...]


* Re: [PATCH dwarves v3 2/4] dwarf_loader: Prepare and pass per-thread data to worker threads.
  2022-01-26  4:05 ` [PATCH dwarves v3 2/4] dwarf_loader: Prepare and pass per-thread data to " Kui-Feng Lee
@ 2022-01-26  6:13   ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2022-01-26  6:13 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: dwarves, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, bpf

On Tue, Jan 25, 2022 at 8:07 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> Add interfaces to allow users of dwarf_loader to prepare and pass
> per-thread data to steal-functions running on worker threads.
>
> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  dwarf_loader.c | 58 +++++++++++++++++++++++++++++++++++++++-----------
>  dwarves.h      |  4 ++++
>  2 files changed, 49 insertions(+), 13 deletions(-)
>

[...]

>  static int dwarf_cus__threaded_process_cus(struct dwarf_cus *dcus)
>  {
>         pthread_t threads[dcus->conf->nr_jobs];
> +       struct dwarf_thread dthr[dcus->conf->nr_jobs];
> +       void *thread_data[dcus->conf->nr_jobs];
> +       int res;
>         int i;
>
> +       if (dcus->conf->threads_prepare) {
> +               res = dcus->conf->threads_prepare(dcus->conf, dcus->conf->nr_jobs, thread_data);
> +               if (res != 0)
> +                       return res;
> +       } else
> +               memset(thread_data, 0, sizeof(void *) * dcus->conf->nr_jobs);

at least in kernel code style, if one branch of if/else has {}, the
other has to have {} as well, even if it's a single-line statement

> +
>         for (i = 0; i < dcus->conf->nr_jobs; ++i) {
> -               dcus->error = pthread_create(&threads[i], NULL, dwarf_cus__process_cu_thread, dcus);
> +               dthr[i].dcus = dcus;
> +               dthr[i].data = thread_data[i];
> +
> +               dcus->error = pthread_create(&threads[i], NULL,
> +                                            dwarf_cus__process_cu_thread,
> +                                            &dthr[i]);
>                 if (dcus->error)
>                         goto out_join;
>         }

[...]


* Re: [PATCH dwarves v3 3/4] pahole: Use per-thread btf instances to avoid mutex locking.
  2022-01-26  6:07   ` Andrii Nakryiko
@ 2022-01-26 18:39     ` Kui-Feng Lee
  2022-01-26 19:54       ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Kui-Feng Lee @ 2022-01-26 18:39 UTC (permalink / raw)
  To: andrii.nakryiko; +Cc: daniel, ast, arnaldo.melo, andrii, dwarves, bpf

On Tue, 2022-01-25 at 22:07 -0800, Andrii Nakryiko wrote:
> On Tue, Jan 25, 2022 at 8:07 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > 
> > Create a btf instance for each worker thread, and add type info to
> > the local btf instance in the steal function of pahole without
> > acquiring a mutex.  Once all worker threads have finished, merge
> > the per-thread btf instances into the primary btf instance.
> > 
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > 
............ cut ...........
> > +       struct thread_data *thread = thr_data;
> > +
> > +       if (thread == NULL)
> > +               return 0;
> > +
> > +       /*
> > +        * Here we will call btf__dedup() once we extend
> > +        * btf__dedup().
> > +        */
> > +
> > +       if (thread->encoder == btf_encoder) {
> > +               /* Release the lock acuqired when created
> > btf_encoder */
> 
> typo: acquired
> 
> > +               pthread_mutex_unlock(&btf_encoder_lock);
> 
> Splitting pthread_mutex lock/unlock like this is extremely dangerous
> and error prone. If that's the price for reusing global btf_encoder
> for first worker, then I'd rather not reuse btf_encoder or revert
> back
> to doing btf__add_btf() and doing btf_encoder__delete() in the main
> thread.
> 
> Please question and push back on the approach and code review
> feedback if something doesn't feel natural or is causing more
> problems than it solves.
> 
> I think the cleanest solution would be to not reuse global btf_encoder
> for the first worker. I suspect the time difference isn't big, so I'd
> optimize for simplicity and clean separation. But if it is causing
> unnecessary slowdown, then as I said, let's just revert back to your
> previous approach with doing btf__add_btf() in the main thread.
> 

Your concerns make sense.
I tried the solutions w/ & w/o reusing btf_encoder.  The differences
are obvious.  So, I will roll back to calling btf__add_btf() in the
main thread.

w/o reusing: AVG 5.78467 P50 5.88 P70 6.03 P90 6.10
w/  reusing: AVG 5.304 P50 5.12 P70 5.17 P90 5.46





* Re: [PATCH dwarves v3 3/4] pahole: Use per-thread btf instances to avoid mutex locking.
  2022-01-26 18:39     ` Kui-Feng Lee
@ 2022-01-26 19:54       ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2022-01-26 19:54 UTC (permalink / raw)
  To: Kui-Feng Lee; +Cc: daniel, ast, arnaldo.melo, andrii, dwarves, bpf

On Wed, Jan 26, 2022 at 10:39 AM Kui-Feng Lee <kuifeng@fb.com> wrote:
>
> On Tue, 2022-01-25 at 22:07 -0800, Andrii Nakryiko wrote:
> > On Tue, Jan 25, 2022 at 8:07 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> > >
> > > Create a btf instance for each worker thread, and add type info to
> > > the local btf instance in the steal function of pahole without
> > > acquiring a mutex.  Once all worker threads have finished, merge
> > > the per-thread btf instances into the primary btf instance.
> > >
> > > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > >
> ............ cut ...........
> > > +       struct thread_data *thread = thr_data;
> > > +
> > > +       if (thread == NULL)
> > > +               return 0;
> > > +
> > > +       /*
> > > +        * Here we will call btf__dedup() once we extend
> > > +        * btf__dedup().
> > > +        */
> > > +
> > > +       if (thread->encoder == btf_encoder) {
> > > +               /* Release the lock acuqired when created
> > > btf_encoder */
> >
> > typo: acquired
> >
> > > +               pthread_mutex_unlock(&btf_encoder_lock);
> >
> > Splitting pthread_mutex lock/unlock like this is extremely dangerous
> > and error prone. If that's the price for reusing global btf_encoder
> > for first worker, then I'd rather not reuse btf_encoder or revert
> > back
> > to doing btf__add_btf() and doing btf_encoder__delete() in the main
> > thread.
> >
> > Please question and push back on the approach and code review
> > feedback if something doesn't feel natural or is causing more
> > problems than it solves.
> >
> > I think the cleanest solution would be to not reuse global btf_encoder
> > for the first worker. I suspect the time difference isn't big, so I'd
> > optimize for simplicity and clean separation. But if it is causing
> > unnecessary slowdown, then as I said, let's just revert back to your
> > previous approach with doing btf__add_btf() in the main thread.
> >
>
> Your concerns make sense.
> I tried the solutions w/ & w/o reusing btf_encoder.  The differencies
> are obviously.  So, I will rollback to calling btf__add_btf() at the
> main thread.
>
> w/o reusing: AVG 5.78467 P50 5.88 P70 6.03 P90 6.10
> w/  reusing: AVG 5.304 P50 5.12 P70 5.17 P90 5.46
>

Half a second, wow. Ok, yeah, makes sense.

>
>

