bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/13] Improvements to memory use
@ 2023-10-12  6:23 Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 01/13] perf machine: Avoid out of bounds LBR memory read Ian Rogers
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Fix memory leaks detected by address/leak sanitizer affecting LBR
call-graphs, perf mem and BPF offcpu.

Make branch_type_stat in callchain_list optional as it is large and
not always necessary - in particular it isn't used by perf top.

Make the allocations of zstd streams, kernel symbols and event copies
lazier in order to save memory in cases like perf record.

Handle the thread exit event and have it remove the thread from the
threads set in machine. Don't do this for perf report as it causes a
regression for task lists, which assume threads are never removed from
the machine's set, and offcpu events, that may sythensize samples for
threads that have exited.

The overall effect is to reduce memory consumption significantly for
perf top - with call graphs enabled running longer before 1GB of
memory is consumed. For a perf record of 'true', the memory
consumption goes from 39912kb max resident to 20820kb max resident -
nearly halved.

v2: Add additional memory fixes on top of initial LBR and rc check
    fixes.

Ian Rogers (13):
  perf machine: Avoid out of bounds LBR memory read
  libperf rc_check: Make implicit enabling work for GCC
  perf hist: Add missing puts to hist__account_cycles
  perf threads: Remove unused dead thread list
  perf offcpu: Add missed btf_free
  perf callchain: Make display use of branch_type_stat const
  perf callchain: Make brtype_stat in callchain_list optional
  perf callchain: Minor layout changes to callchain_list
  perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
  perf record: Lazy load kernel symbols
  libperf: Lazily allocate mmap event copy
  perf mmap: Lazily initialize zstd streams
  perf machine thread: Remove exited threads by default

 tools/lib/perf/include/internal/mmap.h     |  2 +-
 tools/lib/perf/include/internal/rc_check.h |  6 ++-
 tools/lib/perf/mmap.c                      |  9 ++++
 tools/perf/builtin-inject.c                |  4 ++
 tools/perf/builtin-record.c                |  2 +
 tools/perf/builtin-report.c                |  7 +++
 tools/perf/util/Build                      |  1 +
 tools/perf/util/bpf_off_cpu.c              | 10 ++--
 tools/perf/util/branch.c                   |  4 +-
 tools/perf/util/branch.h                   |  4 +-
 tools/perf/util/callchain.c                | 62 ++++++++++++++--------
 tools/perf/util/callchain.h                | 18 +++----
 tools/perf/util/compress.h                 |  1 +
 tools/perf/util/event.c                    |  4 +-
 tools/perf/util/hist.c                     | 16 +++---
 tools/perf/util/machine.c                  | 39 +++++++-------
 tools/perf/util/machine.h                  |  1 -
 tools/perf/util/map_symbol.c               | 15 ++++++
 tools/perf/util/map_symbol.h               |  4 ++
 tools/perf/util/mmap.c                     |  5 +-
 tools/perf/util/mmap.h                     |  1 -
 tools/perf/util/symbol.c                   |  5 +-
 tools/perf/util/symbol_conf.h              |  4 +-
 tools/perf/util/thread.h                   | 14 +++++
 tools/perf/util/zstd.c                     | 61 +++++++++++----------
 25 files changed, 196 insertions(+), 103 deletions(-)
 create mode 100644 tools/perf/util/map_symbol.c

-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 01/13] perf machine: Avoid out of bounds LBR memory read
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 02/13] libperf rc_check: Make implicit enabling work for GCC Ian Rogers
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Running perf top with address sanitizer and "--call-graph=lbr" fails
due to reading sample 0 when no samples exist. Add a guard to prevent
this.

Fixes: e2b23483eb1d ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index addfae2f63ef..e0e2c4a943e4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2622,16 +2622,18 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 		save_lbr_cursor_node(thread, cursor, i);
 	}
 
-	/* Add LBR ip from first entries.to */
-	ip = entries[0].to;
-	flags = &entries[0].flags;
-	*branch_from = entries[0].from;
-	err = add_callchain_ip(thread, cursor, parent,
-			       root_al, &cpumode, ip,
-			       true, flags, NULL,
-			       *branch_from);
-	if (err)
-		return err;
+	if (lbr_nr > 0) {
+		/* Add LBR ip from first entries.to */
+		ip = entries[0].to;
+		flags = &entries[0].flags;
+		*branch_from = entries[0].from;
+		err = add_callchain_ip(thread, cursor, parent,
+				root_al, &cpumode, ip,
+				true, flags, NULL,
+				*branch_from);
+		if (err)
+			return err;
+	}
 
 	return 0;
 }
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 02/13] libperf rc_check: Make implicit enabling work for GCC
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 01/13] perf machine: Avoid out of bounds LBR memory read Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 03/13] perf hist: Add missing puts to hist__account_cycles Ian Rogers
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Make the implicit REFCOUNT_CHECKING robust to when building with GCC.

Fixes: 9be6ab181b7b ("libperf rc_check: Enable implicitly with sanitizers")
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/include/internal/rc_check.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/rc_check.h b/tools/lib/perf/include/internal/rc_check.h
index d5d771ccdc7b..e88a6d8a0b0f 100644
--- a/tools/lib/perf/include/internal/rc_check.h
+++ b/tools/lib/perf/include/internal/rc_check.h
@@ -9,8 +9,12 @@
  * Enable reference count checking implicitly with leak checking, which is
  * integrated into address sanitizer.
  */
-#if defined(LEAK_SANITIZER) || defined(ADDRESS_SANITIZER)
+#if defined(__SANITIZE_ADDRESS__) || defined(LEAK_SANITIZER) || defined(ADDRESS_SANITIZER)
 #define REFCNT_CHECKING 1
+#elif defined(__has_feature)
+#if __has_feature(address_sanitizer) || __has_feature(leak_sanitizer)
+#define REFCNT_CHECKING 1
+#endif
 #endif
 
 /*
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 03/13] perf hist: Add missing puts to hist__account_cycles
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 01/13] perf machine: Avoid out of bounds LBR memory read Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 02/13] libperf rc_check: Make implicit enabling work for GCC Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-18 23:16   ` Namhyung Kim
  2023-10-12  6:23 ` [PATCH v2 04/13] perf threads: Remove unused dead thread list Ian Rogers
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Caught using reference count checking on perf top with
"--call-graph=lbr". After this no memory leaks were detected.

Fixes: 57849998e2cd ("perf report: Add processing for cycle histograms")
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/hist.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 3dc8a4968beb..ac8c0ef48a7f 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2676,8 +2676,6 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 
 	/* If we have branch cycles always annotate them. */
 	if (bs && bs->nr && entries[0].flags.cycles) {
-		int i;
-
 		bi = sample__resolve_bstack(sample, al);
 		if (bi) {
 			struct addr_map_symbol *prev = NULL;
@@ -2692,7 +2690,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 			 * Note that perf stores branches reversed from
 			 * program order!
 			 */
-			for (i = bs->nr - 1; i >= 0; i--) {
+			for (int i = bs->nr - 1; i >= 0; i--) {
 				addr_map_symbol__account_cycles(&bi[i].from,
 					nonany_branch_mode ? NULL : prev,
 					bi[i].flags.cycles);
@@ -2701,6 +2699,12 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 				if (total_cycles)
 					*total_cycles += bi[i].flags.cycles;
 			}
+			for (unsigned int i = 0; i < bs->nr; i++) {
+				map__put(bi[i].to.ms.map);
+				maps__put(bi[i].to.ms.maps);
+				map__put(bi[i].from.ms.map);
+				maps__put(bi[i].from.ms.maps);
+			}
 			free(bi);
 		}
 	}
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 04/13] perf threads: Remove unused dead thread list
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (2 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 03/13] perf hist: Add missing puts to hist__account_cycles Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 05/13] perf offcpu: Add missed btf_free Ian Rogers
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Commit 40826c45eb0b ("perf thread: Remove notion of dead threads")
removed dead threads but the list head wasn't removed. Remove it here.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 1 -
 tools/perf/util/machine.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e0e2c4a943e4..8e5085b77c7b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -67,7 +67,6 @@ static void machine__threads_init(struct machine *machine)
 		threads->entries = RB_ROOT_CACHED;
 		init_rwsem(&threads->lock);
 		threads->nr = 0;
-		INIT_LIST_HEAD(&threads->dead);
 		threads->last_match = NULL;
 	}
 }
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d034ecaf89c1..1279acda6a8a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -35,7 +35,6 @@ struct threads {
 	struct rb_root_cached  entries;
 	struct rw_semaphore    lock;
 	unsigned int	       nr;
-	struct list_head       dead;
 	struct thread	       *last_match;
 };
 
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 05/13] perf offcpu: Add missed btf_free
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (3 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 04/13] perf threads: Remove unused dead thread list Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 06/13] perf callchain: Make display use of branch_type_stat const Ian Rogers
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Caught by address/leak sanitizer.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/bpf_off_cpu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 21f4d9ba023d..6af36142dc5a 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -98,22 +98,22 @@ static void off_cpu_finish(void *arg __maybe_unused)
 /* v5.18 kernel added prev_state arg, so it needs to check the signature */
 static void check_sched_switch_args(void)
 {
-	const struct btf *btf = btf__load_vmlinux_btf();
+	struct btf *btf = btf__load_vmlinux_btf();
 	const struct btf_type *t1, *t2, *t3;
 	u32 type_id;
 
 	type_id = btf__find_by_name_kind(btf, "btf_trace_sched_switch",
 					 BTF_KIND_TYPEDEF);
 	if ((s32)type_id < 0)
-		return;
+		goto cleanup;
 
 	t1 = btf__type_by_id(btf, type_id);
 	if (t1 == NULL)
-		return;
+		goto cleanup;
 
 	t2 = btf__type_by_id(btf, t1->type);
 	if (t2 == NULL || !btf_is_ptr(t2))
-		return;
+		goto cleanup;
 
 	t3 = btf__type_by_id(btf, t2->type);
 	/* btf_trace func proto has one more argument for the context */
@@ -121,6 +121,8 @@ static void check_sched_switch_args(void)
 		/* new format: pass prev_state as 4th arg */
 		skel->rodata->has_prev_state = true;
 	}
+cleanup:
+	btf__free(btf);
 }
 
 int off_cpu_prepare(struct evlist *evlist, struct target *target,
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 06/13] perf callchain: Make display use of branch_type_stat const
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (4 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 05/13] perf offcpu: Add missed btf_free Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 07/13] perf callchain: Make brtype_stat in callchain_list optional Ian Rogers
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Display code doesn't modify the branch_type_stat so switch uses to
const. This is done to aid refactoring struct callchain_list where
current the branch_type_stat is embedded even if not used.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/branch.c    | 4 ++--
 tools/perf/util/branch.h    | 4 ++--
 tools/perf/util/callchain.c | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/branch.c b/tools/perf/util/branch.c
index 378f16a24751..ab760e267d41 100644
--- a/tools/perf/util/branch.c
+++ b/tools/perf/util/branch.c
@@ -109,7 +109,7 @@ const char *get_branch_type(struct branch_entry *e)
 	return branch_type_name(e->flags.type);
 }
 
-void branch_type_stat_display(FILE *fp, struct branch_type_stat *st)
+void branch_type_stat_display(FILE *fp, const struct branch_type_stat *st)
 {
 	u64 total = 0;
 	int i;
@@ -171,7 +171,7 @@ static int count_str_scnprintf(int idx, const char *str, char *bf, int size)
 	return scnprintf(bf, size, "%s%s", (idx) ? " " : " (", str);
 }
 
-int branch_type_str(struct branch_type_stat *st, char *bf, int size)
+int branch_type_str(const struct branch_type_stat *st, char *bf, int size)
 {
 	int i, j = 0, printed = 0;
 	u64 total = 0;
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index e41bfffe2217..87704d713ff6 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -86,8 +86,8 @@ void branch_type_count(struct branch_type_stat *st, struct branch_flags *flags,
 const char *branch_type_name(int type);
 const char *branch_new_type_name(int new_type);
 const char *get_branch_type(struct branch_entry *e);
-void branch_type_stat_display(FILE *fp, struct branch_type_stat *st);
-int branch_type_str(struct branch_type_stat *st, char *bf, int bfsize);
+void branch_type_stat_display(FILE *fp, const struct branch_type_stat *st);
+int branch_type_str(const struct branch_type_stat *st, char *bf, int bfsize);
 
 const char *branch_spec_desc(int spec);
 
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index aee937d14fbb..229cedee1e68 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1339,7 +1339,7 @@ static int count_float_printf(int idx, const char *str, float value,
 static int branch_to_str(char *bf, int bfsize,
 			 u64 branch_count, u64 predicted_count,
 			 u64 abort_count,
-			 struct branch_type_stat *brtype_stat)
+			 const struct branch_type_stat *brtype_stat)
 {
 	int printed, i = 0;
 
@@ -1403,7 +1403,7 @@ static int counts_str_build(char *bf, int bfsize,
 			     u64 abort_count, u64 cycles_count,
 			     u64 iter_count, u64 iter_cycles,
 			     u64 from_count,
-			     struct branch_type_stat *brtype_stat)
+			     const struct branch_type_stat *brtype_stat)
 {
 	int printed;
 
@@ -1430,7 +1430,7 @@ static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
 				   u64 abort_count, u64 cycles_count,
 				   u64 iter_count, u64 iter_cycles,
 				   u64 from_count,
-				   struct branch_type_stat *brtype_stat)
+				   const struct branch_type_stat *brtype_stat)
 {
 	char str[256];
 
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 07/13] perf callchain: Make brtype_stat in callchain_list optional
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (5 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 06/13] perf callchain: Make display use of branch_type_stat const Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 08/13] perf callchain: Minor layout changes to callchain_list Ian Rogers
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

struct callchain_list is 352bytes in size, 232 of which are
brtype_stat. brtype_stat is only used for certain callchain_list
items so make it optional, allocating when necessary. So that
printing doesn't need to deal with an optional brtype_stat, pass
an empty/zero version.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat    brtype_stat;          /*    96   232 */
        /* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
        const char  *              srcline;              /*   328     8 */
        struct list_head           list;                 /*   336    16 */

        /* size: 352, cachelines: 6, members: 13 */
        /* sum members: 346, holes: 1, sum holes: 6 */
        /* last cacheline: 32 bytes */
};
```

After:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/callchain.c | 41 +++++++++++++++++++++++++++++--------
 tools/perf/util/callchain.h |  2 +-
 2 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 229cedee1e68..0a7919c2af91 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -586,7 +586,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 		call = zalloc(sizeof(*call));
 		if (!call) {
 			perror("not enough memory for the code path tree");
-			return -1;
+			return -ENOMEM;
 		}
 		call->ip = cursor_node->ip;
 		call->ms = cursor_node->ms;
@@ -602,7 +602,15 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 				 * branch_from is set with value somewhere else
 				 * to imply it's "to" of a branch.
 				 */
-				call->brtype_stat.branch_to = true;
+				if (!call->brtype_stat) {
+					call->brtype_stat = zalloc(sizeof(*call->brtype_stat));
+					if (!call->brtype_stat) {
+						perror("not enough memory for the code path branch statisitcs");
+						free(call->brtype_stat);
+						return -ENOMEM;
+					}
+				}
+				call->brtype_stat->branch_to = true;
 
 				if (cursor_node->branch_flags.predicted)
 					call->predicted_count = 1;
@@ -610,7 +618,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 				if (cursor_node->branch_flags.abort)
 					call->abort_count = 1;
 
-				branch_type_count(&call->brtype_stat,
+				branch_type_count(call->brtype_stat,
 						  &cursor_node->branch_flags,
 						  cursor_node->branch_from,
 						  cursor_node->ip);
@@ -618,7 +626,8 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 				/*
 				 * It's "from" of a branch
 				 */
-				call->brtype_stat.branch_to = false;
+				if (call->brtype_stat && call->brtype_stat->branch_to)
+					call->brtype_stat->branch_to = false;
 				call->cycles_count =
 					cursor_node->branch_flags.cycles;
 				call->iter_count = cursor_node->nr_loop_iter;
@@ -652,6 +661,7 @@ add_child(struct callchain_node *parent,
 			list_del_init(&call->list);
 			map__zput(call->ms.map);
 			maps__zput(call->ms.maps);
+			zfree(&call->brtype_stat);
 			free(call);
 		}
 		free(new);
@@ -762,7 +772,14 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 			/*
 			 * It's "to" of a branch
 			 */
-			cnode->brtype_stat.branch_to = true;
+			if (!cnode->brtype_stat) {
+				cnode->brtype_stat = zalloc(sizeof(*cnode->brtype_stat));
+				if (!cnode->brtype_stat) {
+					perror("not enough memory for the code path branch statisitcs");
+					return MATCH_ERROR;
+				}
+			}
+			cnode->brtype_stat->branch_to = true;
 
 			if (node->branch_flags.predicted)
 				cnode->predicted_count++;
@@ -770,7 +787,7 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 			if (node->branch_flags.abort)
 				cnode->abort_count++;
 
-			branch_type_count(&cnode->brtype_stat,
+			branch_type_count(cnode->brtype_stat,
 					  &node->branch_flags,
 					  node->branch_from,
 					  node->ip);
@@ -778,7 +795,8 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 			/*
 			 * It's "from" of a branch
 			 */
-			cnode->brtype_stat.branch_to = false;
+			if (cnode->brtype_stat && cnode->brtype_stat->branch_to)
+				cnode->brtype_stat->branch_to = false;
 			cnode->cycles_count += node->branch_flags.cycles;
 			cnode->iter_count += node->nr_loop_iter;
 			cnode->iter_cycles += node->iter_cycles;
@@ -1026,6 +1044,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
 		maps__zput(ms.maps);
 		map__zput(list->ms.map);
 		maps__zput(list->ms.maps);
+		zfree(&list->brtype_stat);
 		free(list);
 	}
 
@@ -1447,11 +1466,14 @@ static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
 int callchain_list_counts__printf_value(struct callchain_list *clist,
 					FILE *fp, char *bf, int bfsize)
 {
+	static const struct branch_type_stat empty_brtype_stat = {};
+	const struct branch_type_stat *brtype_stat;
 	u64 branch_count, predicted_count;
 	u64 abort_count, cycles_count;
 	u64 iter_count, iter_cycles;
 	u64 from_count;
 
+	brtype_stat = clist->brtype_stat ?: &empty_brtype_stat;
 	branch_count = clist->branch_count;
 	predicted_count = clist->predicted_count;
 	abort_count = clist->abort_count;
@@ -1463,7 +1485,7 @@ int callchain_list_counts__printf_value(struct callchain_list *clist,
 	return callchain_counts_printf(fp, bf, bfsize, branch_count,
 				       predicted_count, abort_count,
 				       cycles_count, iter_count, iter_cycles,
-				       from_count, &clist->brtype_stat);
+				       from_count, brtype_stat);
 }
 
 static void free_callchain_node(struct callchain_node *node)
@@ -1476,6 +1498,7 @@ static void free_callchain_node(struct callchain_node *node)
 		list_del_init(&list->list);
 		map__zput(list->ms.map);
 		maps__zput(list->ms.maps);
+		zfree(&list->brtype_stat);
 		free(list);
 	}
 
@@ -1483,6 +1506,7 @@ static void free_callchain_node(struct callchain_node *node)
 		list_del_init(&list->list);
 		map__zput(list->ms.map);
 		maps__zput(list->ms.maps);
+		zfree(&list->brtype_stat);
 		free(list);
 	}
 
@@ -1569,6 +1593,7 @@ int callchain_node__make_parent_list(struct callchain_node *node)
 		list_del_init(&chain->list);
 		map__zput(chain->ms.map);
 		maps__zput(chain->ms.maps);
+		zfree(&chain->brtype_stat);
 		free(chain);
 	}
 	return -ENOMEM;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index d2618a47deca..86e8a9e81456 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -129,7 +129,7 @@ struct callchain_list {
 	u64			cycles_count;
 	u64			iter_count;
 	u64			iter_cycles;
-	struct branch_type_stat brtype_stat;
+	struct branch_type_stat *brtype_stat;
 	const char		*srcline;
 	struct list_head	list;
 };
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 08/13] perf callchain: Minor layout changes to callchain_list
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (6 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 07/13] perf callchain: Make brtype_stat in callchain_list optional Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 09/13] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit Ian Rogers
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Avoid 6 byte hole for padding. Place more frequently used fields
first in an attempt to use just 1 cacheline in the common case.

Before:
```
struct callchain_list {
        u64                        ip;                   /*     0     8 */
        struct map_symbol          ms;                   /*     8    24 */
        struct {
                _Bool              unfolded;             /*    32     1 */
                _Bool              has_children;         /*    33     1 */
        };                                               /*    32     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                        branch_count;         /*    40     8 */
        u64                        from_count;           /*    48     8 */
        u64                        predicted_count;      /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        abort_count;          /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        const char  *              srcline;              /*   104     8 */
        struct list_head           list;                 /*   112    16 */

        /* size: 128, cachelines: 2, members: 13 */
        /* sum members: 122, holes: 1, sum holes: 6 */
};
```

After:
```
struct callchain_list {
        struct list_head           list;                 /*     0    16 */
        u64                        ip;                   /*    16     8 */
        struct map_symbol          ms;                   /*    24    24 */
        const char  *              srcline;              /*    48     8 */
        u64                        branch_count;         /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u64                        from_count;           /*    64     8 */
        u64                        cycles_count;         /*    72     8 */
        u64                        iter_count;           /*    80     8 */
        u64                        iter_cycles;          /*    88     8 */
        struct branch_type_stat *  brtype_stat;          /*    96     8 */
        u64                        predicted_count;      /*   104     8 */
        u64                        abort_count;          /*   112     8 */
        struct {
                _Bool              unfolded;             /*   120     1 */
                _Bool              has_children;         /*   121     1 */
        };                                               /*   120     2 */

        /* size: 128, cachelines: 2, members: 13 */
        /* padding: 6 */
};
```

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/callchain.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 86e8a9e81456..d5c66345ae31 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -116,22 +116,22 @@ extern struct callchain_param callchain_param;
 extern struct callchain_param callchain_param_default;
 
 struct callchain_list {
+	struct list_head	list;
 	u64			ip;
 	struct map_symbol	ms;
-	struct /* for TUI */ {
-		bool		unfolded;
-		bool		has_children;
-	};
+	const char		*srcline;
 	u64			branch_count;
 	u64			from_count;
-	u64			predicted_count;
-	u64			abort_count;
 	u64			cycles_count;
 	u64			iter_count;
 	u64			iter_cycles;
 	struct branch_type_stat *brtype_stat;
-	const char		*srcline;
-	struct list_head	list;
+	u64			predicted_count;
+	u64			abort_count;
+	struct /* for TUI */ {
+		bool		unfolded;
+		bool		has_children;
+	};
 };
 
 /*
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 09/13] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (7 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 08/13] perf callchain: Minor layout changes to callchain_list Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-18 23:20   ` Namhyung Kim
  2023-10-12  6:23 ` [PATCH v2 10/13] perf record: Lazy load kernel symbols Ian Rogers
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Fix leak where mem_info__put wouldn't release the maps/map as used by
perf mem. Add exit functions and use elsewhere that the maps and map
are released.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/Build        |  1 +
 tools/perf/util/callchain.c  | 15 +++++----------
 tools/perf/util/hist.c       |  6 ++----
 tools/perf/util/machine.c    |  6 ++----
 tools/perf/util/map_symbol.c | 15 +++++++++++++++
 tools/perf/util/map_symbol.h |  4 ++++
 tools/perf/util/symbol.c     |  5 ++++-
 7 files changed, 33 insertions(+), 19 deletions(-)
 create mode 100644 tools/perf/util/map_symbol.c

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 0ea5a9d368d4..96058f949ec9 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -49,6 +49,7 @@ perf-y += dso.o
 perf-y += dsos.o
 perf-y += symbol.o
 perf-y += symbol_fprintf.o
+perf-y += map_symbol.o
 perf-y += color.o
 perf-y += color_config.o
 perf-y += metricgroup.o
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0a7919c2af91..02881d5b822c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1496,16 +1496,14 @@ static void free_callchain_node(struct callchain_node *node)
 
 	list_for_each_entry_safe(list, tmp, &node->parent_val, list) {
 		list_del_init(&list->list);
-		map__zput(list->ms.map);
-		maps__zput(list->ms.maps);
+		map_symbol__exit(&list->ms);
 		zfree(&list->brtype_stat);
 		free(list);
 	}
 
 	list_for_each_entry_safe(list, tmp, &node->val, list) {
 		list_del_init(&list->list);
-		map__zput(list->ms.map);
-		maps__zput(list->ms.maps);
+		map_symbol__exit(&list->ms);
 		zfree(&list->brtype_stat);
 		free(list);
 	}
@@ -1591,8 +1589,7 @@ int callchain_node__make_parent_list(struct callchain_node *node)
 out:
 	list_for_each_entry_safe(chain, new, &head, list) {
 		list_del_init(&chain->list);
-		map__zput(chain->ms.map);
-		maps__zput(chain->ms.maps);
+		map_symbol__exit(&chain->ms);
 		zfree(&chain->brtype_stat);
 		free(chain);
 	}
@@ -1676,10 +1673,8 @@ void callchain_cursor_reset(struct callchain_cursor *cursor)
 	cursor->nr = 0;
 	cursor->last = &cursor->first;
 
-	for (node = cursor->first; node != NULL; node = node->next) {
-		map__zput(node->ms.map);
-		maps__zput(node->ms.maps);
-	}
+	for (node = cursor->first; node != NULL; node = node->next)
+		map_symbol__exit(&node->ms);
 }
 
 void callchain_param_setup(u64 sample_type, const char *arch)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index ac8c0ef48a7f..d62693b8fad8 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -524,8 +524,7 @@ static int hist_entry__init(struct hist_entry *he,
 		map__put(he->mem_info->daddr.ms.map);
 	}
 err:
-	maps__zput(he->ms.maps);
-	map__zput(he->ms.map);
+	map_symbol__exit(&he->ms);
 	zfree(&he->stat_acc);
 	return -ENOMEM;
 }
@@ -1317,8 +1316,7 @@ void hist_entry__delete(struct hist_entry *he)
 	struct hist_entry_ops *ops = he->ops;
 
 	thread__zput(he->thread);
-	maps__zput(he->ms.maps);
-	map__zput(he->ms.map);
+	map_symbol__exit(&he->ms);
 
 	if (he->branch_info) {
 		map__zput(he->branch_info->from.ms.map);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 8e5085b77c7b..6ca7500e2cf4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2389,8 +2389,7 @@ static int add_callchain_ip(struct thread *thread,
 				      iter_cycles, branch_from, srcline);
 out:
 	addr_location__exit(&al);
-	maps__put(ms.maps);
-	map__put(ms.map);
+	map_symbol__exit(&ms);
 	return err;
 }
 
@@ -3116,8 +3115,7 @@ static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms
 		if (ret != 0)
 			return ret;
 	}
-	map__put(ilist_ms.map);
-	maps__put(ilist_ms.maps);
+	map_symbol__exit(&ilist_ms);
 
 	return ret;
 }
diff --git a/tools/perf/util/map_symbol.c b/tools/perf/util/map_symbol.c
new file mode 100644
index 000000000000..bef5079f2403
--- /dev/null
+++ b/tools/perf/util/map_symbol.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "map_symbol.h"
+#include "maps.h"
+#include "map.h"
+
+void map_symbol__exit(struct map_symbol *ms)
+{
+	maps__zput(ms->maps);
+	map__zput(ms->map);
+}
+
+void addr_map_symbol__exit(struct addr_map_symbol *ams)
+{
+	map_symbol__exit(&ams->ms);
+}
diff --git a/tools/perf/util/map_symbol.h b/tools/perf/util/map_symbol.h
index e08817b0c30f..72d5ed938ed6 100644
--- a/tools/perf/util/map_symbol.h
+++ b/tools/perf/util/map_symbol.h
@@ -22,4 +22,8 @@ struct addr_map_symbol {
 	u64	      phys_addr;
 	u64	      data_page_size;
 };
+
+void map_symbol__exit(struct map_symbol *ms);
+void addr_map_symbol__exit(struct addr_map_symbol *ams);
+
 #endif // __PERF_MAP_SYMBOL
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 2740d4457c13..d67a87072eec 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2790,8 +2790,11 @@ struct mem_info *mem_info__get(struct mem_info *mi)
 
 void mem_info__put(struct mem_info *mi)
 {
-	if (mi && refcount_dec_and_test(&mi->refcnt))
+	if (mi && refcount_dec_and_test(&mi->refcnt)) {
+		addr_map_symbol__exit(&mi->iaddr);
+		addr_map_symbol__exit(&mi->daddr);
 		free(mi);
+	}
 }
 
 struct mem_info *mem_info__new(void)
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 10/13] perf record: Lazy load kernel symbols
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (8 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 09/13] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-19 11:02   ` Adrian Hunter
  2023-10-12  6:23 ` [PATCH v2 11/13] libperf: Lazily allocate mmap event copy Ian Rogers
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
changed it so that loading a kernel dso would cause the symbols for
the dso to be eagerly loaded. For perf record this is overhead as the
symbols won't be used. Add a symbol_conf to control the behavior and
disable it for perf record and perf inject.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-inject.c   | 4 ++++
 tools/perf/builtin-record.c   | 2 ++
 tools/perf/util/event.c       | 4 ++--
 tools/perf/util/symbol_conf.h | 3 ++-
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index c8cf2fdd9cff..1539fb18c749 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -2265,6 +2265,10 @@ int cmd_inject(int argc, const char **argv)
 		"perf inject [<options>]",
 		NULL
 	};
+
+	/* Disable eager loading of kernel symbols that adds overhead to perf inject. */
+	symbol_conf.lazy_load_kernel_maps = true;
+
 #ifndef HAVE_JITDUMP
 	set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
 #endif
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index dcf288a4fb9a..8ec818568662 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
 # undef set_nobuild
 #endif
 
+	/* Disable eager loading of kernel symbols that adds overhead to perf record. */
+	symbol_conf.lazy_load_kernel_maps = true;
 	rec->opts.affinity = PERF_AFFINITY_SYS;
 
 	rec->evlist = evlist__new();
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 923c0fb15122..68f45e9e63b6 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
 		maps = machine__kernel_maps(machine);
-		load_map = true;
+		load_map = !symbol_conf.lazy_load_kernel_maps;
 	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
 		al->level = '.';
 	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
 		al->level = 'g';
 		maps = machine__kernel_maps(machine);
-		load_map = true;
+		load_map = !symbol_conf.lazy_load_kernel_maps;
 	} else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
 		al->level = 'u';
 	} else {
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 0b589570d1d0..2b2fb9e224b0 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -42,7 +42,8 @@ struct symbol_conf {
 			inline_name,
 			disable_add2line_warn,
 			buildid_mmap2,
-			guest_code;
+			guest_code,
+			lazy_load_kernel_maps;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 11/13] libperf: Lazily allocate mmap event copy
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (9 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 10/13] perf record: Lazy load kernel symbols Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 12/13] perf mmap: Lazily initialize zstd streams Ian Rogers
  2023-10-12  6:23 ` [PATCH v2 13/13] perf machine thread: Remove exited threads by default Ian Rogers
  12 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

The event copy in the mmap is used to have storage to a read
event. Not all users of mmaps read the events, such as perf record, so
switch the allocation to being on first read rather than being
embedded within the perf_mmap.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/include/internal/mmap.h | 2 +-
 tools/lib/perf/mmap.c                  | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
index 5a062af8e9d8..b11aaf5ed645 100644
--- a/tools/lib/perf/include/internal/mmap.h
+++ b/tools/lib/perf/include/internal/mmap.h
@@ -33,7 +33,7 @@ struct perf_mmap {
 	bool			 overwrite;
 	u64			 flush;
 	libperf_unmap_cb_t	 unmap_cb;
-	char			 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+	void			*event_copy;
 	struct perf_mmap	*next;
 };
 
diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index 2184814b37dd..91ae46aac378 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
 
 void perf_mmap__munmap(struct perf_mmap *map)
 {
+	free(map->event_copy);
+	map->event_copy = NULL;
 	if (map && map->base != NULL) {
 		munmap(map->base, perf_mmap__mmap_len(map));
 		map->base = NULL;
@@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
 			unsigned int len = min(sizeof(*event), size), cpy;
 			void *dst = map->event_copy;
 
+			if (!dst) {
+				dst = malloc(PERF_SAMPLE_MAX_SIZE);
+				if (!dst)
+					return NULL;
+				map->event_copy = dst;
+			}
+
 			do {
 				cpy = min(map->mask + 1 - (offset & map->mask), len);
 				memcpy(dst, &data[offset & map->mask], cpy);
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 12/13] perf mmap: Lazily initialize zstd streams
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (10 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 11/13] libperf: Lazily allocate mmap event copy Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-18 23:21   ` Namhyung Kim
  2023-10-12  6:23 ` [PATCH v2 13/13] perf machine thread: Remove exited threads by default Ian Rogers
  12 siblings, 1 reply; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Zstd streams create dictionaries that can require significant RAM,
especially when there is one per-CPU. Tools like perf record won't use
the streams without the -z option, and so the creation of the streams
is pure overhead. Switch to creating the streams on first use.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/compress.h |  1 +
 tools/perf/util/mmap.c     |  5 ++--
 tools/perf/util/mmap.h     |  1 -
 tools/perf/util/zstd.c     | 61 ++++++++++++++++++++------------------
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h
index 0cd3369af2a4..9391850f1a7e 100644
--- a/tools/perf/util/compress.h
+++ b/tools/perf/util/compress.h
@@ -21,6 +21,7 @@ struct zstd_data {
 #ifdef HAVE_ZSTD_SUPPORT
 	ZSTD_CStream	*cstream;
 	ZSTD_DStream	*dstream;
+	int comp_level;
 #endif
 };
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 49093b21ee2d..122ee198a86e 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -295,15 +295,14 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu
 
 	map->core.flush = mp->flush;
 
-	map->comp_level = mp->comp_level;
 #ifndef PYTHON_PERF
-	if (zstd_init(&map->zstd_data, map->comp_level)) {
+	if (zstd_init(&map->zstd_data, mp->comp_level)) {
 		pr_debug2("failed to init mmap compressor, error %d\n", errno);
 		return -1;
 	}
 #endif
 
-	if (map->comp_level && !perf_mmap__aio_enabled(map)) {
+	if (mp->comp_level && !perf_mmap__aio_enabled(map)) {
 		map->data = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE,
 				 MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
 		if (map->data == MAP_FAILED) {
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index f944c3cd5efa..0df6e1621c7e 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -39,7 +39,6 @@ struct mmap {
 #endif
 	struct mmap_cpu_mask	affinity_mask;
 	void		*data;
-	int		comp_level;
 	struct perf_data_file *file;
 	struct zstd_data      zstd_data;
 };
diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
index 48dd2b018c47..60f2d749b1c0 100644
--- a/tools/perf/util/zstd.c
+++ b/tools/perf/util/zstd.c
@@ -7,35 +7,9 @@
 
 int zstd_init(struct zstd_data *data, int level)
 {
-	size_t ret;
-
-	data->dstream = ZSTD_createDStream();
-	if (data->dstream == NULL) {
-		pr_err("Couldn't create decompression stream.\n");
-		return -1;
-	}
-
-	ret = ZSTD_initDStream(data->dstream);
-	if (ZSTD_isError(ret)) {
-		pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
-		return -1;
-	}
-
-	if (!level)
-		return 0;
-
-	data->cstream = ZSTD_createCStream();
-	if (data->cstream == NULL) {
-		pr_err("Couldn't create compression stream.\n");
-		return -1;
-	}
-
-	ret = ZSTD_initCStream(data->cstream, level);
-	if (ZSTD_isError(ret)) {
-		pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
-		return -1;
-	}
-
+	data->comp_level = level;
+	data->dstream = NULL;
+	data->cstream = NULL;
 	return 0;
 }
 
@@ -63,6 +37,21 @@ size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t
 	ZSTD_outBuffer output;
 	void *record;
 
+	if (!data->cstream) {
+		data->cstream = ZSTD_createCStream();
+		if (data->cstream == NULL) {
+			pr_err("Couldn't create compression stream.\n");
+			return -1;
+		}
+
+		ret = ZSTD_initCStream(data->cstream, data->comp_level);
+		if (ZSTD_isError(ret)) {
+			pr_err("Failed to initialize compression stream: %s\n",
+				ZSTD_getErrorName(ret));
+			return -1;
+		}
+	}
+
 	while (input.pos < input.size) {
 		record = dst;
 		size = process_header(record, 0);
@@ -96,6 +85,20 @@ size_t zstd_decompress_stream(struct zstd_data *data, void *src, size_t src_size
 	ZSTD_inBuffer input = { src, src_size, 0 };
 	ZSTD_outBuffer output = { dst, dst_size, 0 };
 
+	if (!data->dstream) {
+		data->dstream = ZSTD_createDStream();
+		if (data->dstream == NULL) {
+			pr_err("Couldn't create decompression stream.\n");
+			return -1;
+		}
+
+		ret = ZSTD_initDStream(data->dstream);
+		if (ZSTD_isError(ret)) {
+			pr_err("Failed to initialize decompression stream: %s\n",
+				ZSTD_getErrorName(ret));
+			return -1;
+		}
+	}
 	while (input.pos < input.size) {
 		ret = ZSTD_decompressStream(data->dstream, &output, &input);
 		if (ZSTD_isError(ret)) {
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 13/13] perf machine thread: Remove exited threads by default
  2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
                   ` (11 preceding siblings ...)
  2023-10-12  6:23 ` [PATCH v2 12/13] perf mmap: Lazily initialize zstd streams Ian Rogers
@ 2023-10-12  6:23 ` Ian Rogers
  2023-10-18 23:30   ` Namhyung Kim
  2023-10-23 14:24   ` Adrian Hunter
  12 siblings, 2 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-12  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

struct thread values hold onto references to mmaps, dsos, etc. When a
thread exits it is necessary to clean all of this memory up by
removing the thread from the machine's threads. Some tools require
this doesn't happen, such as perf report if offcpu events exist or if
a task list is being generated, so add a symbol_conf value to make the
behavior optional. When an exited thread is left in the machine's
threads, mark it as exited.

This change relates to commit 40826c45eb0b ("perf thread: Remove
notion of dead threads"). Dead threads were removed as they had a
reference count of 0 and were difficult to reason about with the
reference count checker. Here a thread is removed from threads when it
exits, unless via symbol_conf the exited thread isn't remove and is
marked as exited. Reference counting behaves as it normally does.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-report.c   |  7 +++++++
 tools/perf/util/machine.c     | 10 +++++++---
 tools/perf/util/symbol_conf.h |  3 ++-
 tools/perf/util/thread.h      | 14 ++++++++++++++
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dcedfe00f04d..749246817aed 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1411,6 +1411,13 @@ int cmd_report(int argc, const char **argv)
 	if (ret < 0)
 		goto exit;
 
+	/*
+	 * tasks_mode require access to exited threads to list those that are in
+	 * the data file. Off-cpu events are synthesized after other events and
+	 * reference exited threads.
+	 */
+	symbol_conf.keep_exited_threads = true;
+
 	annotation_options__init(&report.annotation_opts);
 
 	ret = perf_config(report__config, &report);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6ca7500e2cf4..5cda47eb337d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
 	if (dump_trace)
 		perf_event__fprintf_task(event, stdout);
 
-	if (thread != NULL)
-		thread__put(thread);
-
+	if (thread != NULL) {
+		if (symbol_conf.keep_exited_threads)
+			thread__set_exited(thread, /*exited=*/true);
+		else
+			machine__remove_thread(machine, thread);
+	}
+	thread__put(thread);
 	return 0;
 }
 
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 2b2fb9e224b0..6040286e07a6 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -43,7 +43,8 @@ struct symbol_conf {
 			disable_add2line_warn,
 			buildid_mmap2,
 			guest_code,
-			lazy_load_kernel_maps;
+			lazy_load_kernel_maps,
+			keep_exited_threads;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index e79225a0ea46..0df775b5c110 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -36,13 +36,22 @@ struct thread_rb_node {
 };
 
 DECLARE_RC_STRUCT(thread) {
+	/** @maps: mmaps associated with this thread. */
 	struct maps		*maps;
 	pid_t			pid_; /* Not all tools update this */
+	/** @tid: thread ID number unique to a machine. */
 	pid_t			tid;
+	/** @ppid: parent process of the process this thread belongs to. */
 	pid_t			ppid;
 	int			cpu;
 	int			guest_cpu; /* For QEMU thread */
 	refcount_t		refcnt;
+	/**
+	 * @exited: Has the thread had an exit event. Such threads are usually
+	 * removed from the machine's threads but some events/tools require
+	 * access to dead threads.
+	 */
+	bool			exited;
 	bool			comm_set;
 	int			comm_len;
 	struct list_head	namespaces_list;
@@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
 	return &RC_CHK_ACCESS(thread)->refcnt;
 }
 
+static inline void thread__set_exited(struct thread *thread, bool exited)
+{
+	RC_CHK_ACCESS(thread)->exited = exited;
+}
+
 static inline bool thread__comm_set(const struct thread *thread)
 {
 	return RC_CHK_ACCESS(thread)->comm_set;
-- 
2.42.0.609.gbb76f46606-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 03/13] perf hist: Add missing puts to hist__account_cycles
  2023-10-12  6:23 ` [PATCH v2 03/13] perf hist: Add missing puts to hist__account_cycles Ian Rogers
@ 2023-10-18 23:16   ` Namhyung Kim
  2023-10-24 16:12     ` Ian Rogers
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2023-10-18 23:16 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

Hi Ian,

On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
>
> Caught using reference count checking on perf top with
> "--call-graph=lbr". After this no memory leaks were detected.
>
> Fixes: 57849998e2cd ("perf report: Add processing for cycle histograms")
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/hist.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 3dc8a4968beb..ac8c0ef48a7f 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -2676,8 +2676,6 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
>
>         /* If we have branch cycles always annotate them. */
>         if (bs && bs->nr && entries[0].flags.cycles) {
> -               int i;
> -

Seems not necessary.

>                 bi = sample__resolve_bstack(sample, al);

It looks like this increases the refcount for each bi entry and
it didn't put the refcounts.


>                 if (bi) {
>                         struct addr_map_symbol *prev = NULL;
> @@ -2692,7 +2690,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
>                          * Note that perf stores branches reversed from
>                          * program order!
>                          */
> -                       for (i = bs->nr - 1; i >= 0; i--) {
> +                       for (int i = bs->nr - 1; i >= 0; i--) {
>                                 addr_map_symbol__account_cycles(&bi[i].from,
>                                         nonany_branch_mode ? NULL : prev,
>                                         bi[i].flags.cycles);
> @@ -2701,6 +2699,12 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
>                                 if (total_cycles)
>                                         *total_cycles += bi[i].flags.cycles;
>                         }
> +                       for (unsigned int i = 0; i < bs->nr; i++) {

Can we just reuse the int i above?

Thanks,
Namhyung


> +                               map__put(bi[i].to.ms.map);
> +                               maps__put(bi[i].to.ms.maps);
> +                               map__put(bi[i].from.ms.map);
> +                               maps__put(bi[i].from.ms.maps);
> +                       }
>                         free(bi);
>                 }
>         }
> --
> 2.42.0.609.gbb76f46606-goog
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 09/13] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
  2023-10-12  6:23 ` [PATCH v2 09/13] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit Ian Rogers
@ 2023-10-18 23:20   ` Namhyung Kim
  2023-10-24 20:52     ` Ian Rogers
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2023-10-18 23:20 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
>
> Fix leak where mem_info__put wouldn't release the maps/map as used by
> perf mem. Add exit functions and use elsewhere that the maps and map
> are released.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/Build        |  1 +
>  tools/perf/util/callchain.c  | 15 +++++----------
>  tools/perf/util/hist.c       |  6 ++----
>  tools/perf/util/machine.c    |  6 ++----
>  tools/perf/util/map_symbol.c | 15 +++++++++++++++
>  tools/perf/util/map_symbol.h |  4 ++++
>  tools/perf/util/symbol.c     |  5 ++++-
>  7 files changed, 33 insertions(+), 19 deletions(-)
>  create mode 100644 tools/perf/util/map_symbol.c
>
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 0ea5a9d368d4..96058f949ec9 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -49,6 +49,7 @@ perf-y += dso.o
>  perf-y += dsos.o
>  perf-y += symbol.o
>  perf-y += symbol_fprintf.o
> +perf-y += map_symbol.o
>  perf-y += color.o
>  perf-y += color_config.o
>  perf-y += metricgroup.o
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 0a7919c2af91..02881d5b822c 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -1496,16 +1496,14 @@ static void free_callchain_node(struct callchain_node *node)
>
>         list_for_each_entry_safe(list, tmp, &node->parent_val, list) {
>                 list_del_init(&list->list);
> -               map__zput(list->ms.map);
> -               maps__zput(list->ms.maps);
> +               map_symbol__exit(&list->ms);
>                 zfree(&list->brtype_stat);
>                 free(list);
>         }
>
>         list_for_each_entry_safe(list, tmp, &node->val, list) {
>                 list_del_init(&list->list);
> -               map__zput(list->ms.map);
> -               maps__zput(list->ms.maps);
> +               map_symbol__exit(&list->ms);
>                 zfree(&list->brtype_stat);
>                 free(list);
>         }
> @@ -1591,8 +1589,7 @@ int callchain_node__make_parent_list(struct callchain_node *node)
>  out:
>         list_for_each_entry_safe(chain, new, &head, list) {
>                 list_del_init(&chain->list);
> -               map__zput(chain->ms.map);
> -               maps__zput(chain->ms.maps);
> +               map_symbol__exit(&chain->ms);
>                 zfree(&chain->brtype_stat);
>                 free(chain);
>         }
> @@ -1676,10 +1673,8 @@ void callchain_cursor_reset(struct callchain_cursor *cursor)
>         cursor->nr = 0;
>         cursor->last = &cursor->first;
>
> -       for (node = cursor->first; node != NULL; node = node->next) {
> -               map__zput(node->ms.map);
> -               maps__zput(node->ms.maps);
> -       }
> +       for (node = cursor->first; node != NULL; node = node->next)
> +               map_symbol__exit(&node->ms);
>  }
>
>  void callchain_param_setup(u64 sample_type, const char *arch)
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index ac8c0ef48a7f..d62693b8fad8 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -524,8 +524,7 @@ static int hist_entry__init(struct hist_entry *he,
>                 map__put(he->mem_info->daddr.ms.map);
>         }
>  err:
> -       maps__zput(he->ms.maps);
> -       map__zput(he->ms.map);
> +       map_symbol__exit(&he->ms);
>         zfree(&he->stat_acc);
>         return -ENOMEM;
>  }
> @@ -1317,8 +1316,7 @@ void hist_entry__delete(struct hist_entry *he)
>         struct hist_entry_ops *ops = he->ops;
>
>         thread__zput(he->thread);
> -       maps__zput(he->ms.maps);
> -       map__zput(he->ms.map);
> +       map_symbol__exit(&he->ms);
>
>         if (he->branch_info) {
>                 map__zput(he->branch_info->from.ms.map);

What about he->branch_info and he->mem_info ?

Also I think we can use it in hists__account_cycles() too.

Thanks,
Namhyung


> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 8e5085b77c7b..6ca7500e2cf4 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2389,8 +2389,7 @@ static int add_callchain_ip(struct thread *thread,
>                                       iter_cycles, branch_from, srcline);
>  out:
>         addr_location__exit(&al);
> -       maps__put(ms.maps);
> -       map__put(ms.map);
> +       map_symbol__exit(&ms);
>         return err;
>  }
>
> @@ -3116,8 +3115,7 @@ static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms
>                 if (ret != 0)
>                         return ret;
>         }
> -       map__put(ilist_ms.map);
> -       maps__put(ilist_ms.maps);
> +       map_symbol__exit(&ilist_ms);
>
>         return ret;
>  }
> diff --git a/tools/perf/util/map_symbol.c b/tools/perf/util/map_symbol.c
> new file mode 100644
> index 000000000000..bef5079f2403
> --- /dev/null
> +++ b/tools/perf/util/map_symbol.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "map_symbol.h"
> +#include "maps.h"
> +#include "map.h"
> +
> +void map_symbol__exit(struct map_symbol *ms)
> +{
> +       maps__zput(ms->maps);
> +       map__zput(ms->map);
> +}
> +
> +void addr_map_symbol__exit(struct addr_map_symbol *ams)
> +{
> +       map_symbol__exit(&ams->ms);
> +}
> diff --git a/tools/perf/util/map_symbol.h b/tools/perf/util/map_symbol.h
> index e08817b0c30f..72d5ed938ed6 100644
> --- a/tools/perf/util/map_symbol.h
> +++ b/tools/perf/util/map_symbol.h
> @@ -22,4 +22,8 @@ struct addr_map_symbol {
>         u64           phys_addr;
>         u64           data_page_size;
>  };
> +
> +void map_symbol__exit(struct map_symbol *ms);
> +void addr_map_symbol__exit(struct addr_map_symbol *ams);
> +
>  #endif // __PERF_MAP_SYMBOL
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 2740d4457c13..d67a87072eec 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -2790,8 +2790,11 @@ struct mem_info *mem_info__get(struct mem_info *mi)
>
>  void mem_info__put(struct mem_info *mi)
>  {
> -       if (mi && refcount_dec_and_test(&mi->refcnt))
> +       if (mi && refcount_dec_and_test(&mi->refcnt)) {
> +               addr_map_symbol__exit(&mi->iaddr);
> +               addr_map_symbol__exit(&mi->daddr);
>                 free(mi);
> +       }
>  }
>
>  struct mem_info *mem_info__new(void)
> --
> 2.42.0.609.gbb76f46606-goog
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 12/13] perf mmap: Lazily initialize zstd streams
  2023-10-12  6:23 ` [PATCH v2 12/13] perf mmap: Lazily initialize zstd streams Ian Rogers
@ 2023-10-18 23:21   ` Namhyung Kim
  2023-10-24 19:21     ` Ian Rogers
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2023-10-18 23:21 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
>
> Zstd streams create dictionaries that can require significant RAM,
> especially when there is one per-CPU. Tools like perf record won't use
> the streams without the -z option, and so the creation of the streams
> is pure overhead. Switch to creating the streams on first use.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/compress.h |  1 +
>  tools/perf/util/mmap.c     |  5 ++--
>  tools/perf/util/mmap.h     |  1 -
>  tools/perf/util/zstd.c     | 61 ++++++++++++++++++++------------------
>  4 files changed, 35 insertions(+), 33 deletions(-)
>
> diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h
> index 0cd3369af2a4..9391850f1a7e 100644
> --- a/tools/perf/util/compress.h
> +++ b/tools/perf/util/compress.h
> @@ -21,6 +21,7 @@ struct zstd_data {
>  #ifdef HAVE_ZSTD_SUPPORT
>         ZSTD_CStream    *cstream;
>         ZSTD_DStream    *dstream;
> +       int comp_level;
>  #endif
>  };
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 49093b21ee2d..122ee198a86e 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -295,15 +295,14 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu
>
>         map->core.flush = mp->flush;
>
> -       map->comp_level = mp->comp_level;
>  #ifndef PYTHON_PERF
> -       if (zstd_init(&map->zstd_data, map->comp_level)) {
> +       if (zstd_init(&map->zstd_data, mp->comp_level)) {
>                 pr_debug2("failed to init mmap compressor, error %d\n", errno);
>                 return -1;
>         }
>  #endif
>
> -       if (map->comp_level && !perf_mmap__aio_enabled(map)) {
> +       if (mp->comp_level && !perf_mmap__aio_enabled(map)) {
>                 map->data = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE,
>                                  MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
>                 if (map->data == MAP_FAILED) {
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index f944c3cd5efa..0df6e1621c7e 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -39,7 +39,6 @@ struct mmap {
>  #endif
>         struct mmap_cpu_mask    affinity_mask;
>         void            *data;
> -       int             comp_level;
>         struct perf_data_file *file;
>         struct zstd_data      zstd_data;
>  };
> diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
> index 48dd2b018c47..60f2d749b1c0 100644
> --- a/tools/perf/util/zstd.c
> +++ b/tools/perf/util/zstd.c
> @@ -7,35 +7,9 @@
>
>  int zstd_init(struct zstd_data *data, int level)
>  {
> -       size_t ret;
> -
> -       data->dstream = ZSTD_createDStream();
> -       if (data->dstream == NULL) {
> -               pr_err("Couldn't create decompression stream.\n");
> -               return -1;
> -       }
> -
> -       ret = ZSTD_initDStream(data->dstream);
> -       if (ZSTD_isError(ret)) {
> -               pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
> -               return -1;
> -       }
> -
> -       if (!level)
> -               return 0;
> -
> -       data->cstream = ZSTD_createCStream();
> -       if (data->cstream == NULL) {
> -               pr_err("Couldn't create compression stream.\n");
> -               return -1;
> -       }
> -
> -       ret = ZSTD_initCStream(data->cstream, level);
> -       if (ZSTD_isError(ret)) {
> -               pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
> -               return -1;
> -       }
> -
> +       data->comp_level = level;
> +       data->dstream = NULL;
> +       data->cstream = NULL;
>         return 0;
>  }
>
> @@ -63,6 +37,21 @@ size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t
>         ZSTD_outBuffer output;
>         void *record;
>
> +       if (!data->cstream) {
> +               data->cstream = ZSTD_createCStream();
> +               if (data->cstream == NULL) {
> +                       pr_err("Couldn't create compression stream.\n");
> +                       return -1;
> +               }
> +
> +               ret = ZSTD_initCStream(data->cstream, data->comp_level);
> +               if (ZSTD_isError(ret)) {
> +                       pr_err("Failed to initialize compression stream: %s\n",
> +                               ZSTD_getErrorName(ret));
> +                       return -1;

I'm not sure if the callers are ready to handle the failure.

Thanks,
Namhyung


> +               }
> +       }
> +
>         while (input.pos < input.size) {
>                 record = dst;
>                 size = process_header(record, 0);
> @@ -96,6 +85,20 @@ size_t zstd_decompress_stream(struct zstd_data *data, void *src, size_t src_size
>         ZSTD_inBuffer input = { src, src_size, 0 };
>         ZSTD_outBuffer output = { dst, dst_size, 0 };
>
> +       if (!data->dstream) {
> +               data->dstream = ZSTD_createDStream();
> +               if (data->dstream == NULL) {
> +                       pr_err("Couldn't create decompression stream.\n");
> +                       return -1;
> +               }
> +
> +               ret = ZSTD_initDStream(data->dstream);
> +               if (ZSTD_isError(ret)) {
> +                       pr_err("Failed to initialize decompression stream: %s\n",
> +                               ZSTD_getErrorName(ret));
> +                       return -1;
> +               }
> +       }
>         while (input.pos < input.size) {
>                 ret = ZSTD_decompressStream(data->dstream, &output, &input);
>                 if (ZSTD_isError(ret)) {
> --
> 2.42.0.609.gbb76f46606-goog
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 13/13] perf machine thread: Remove exited threads by default
  2023-10-12  6:23 ` [PATCH v2 13/13] perf machine thread: Remove exited threads by default Ian Rogers
@ 2023-10-18 23:30   ` Namhyung Kim
  2023-10-19  0:39     ` Ian Rogers
  2023-10-23 14:24   ` Adrian Hunter
  1 sibling, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2023-10-18 23:30 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
>
> struct thread values hold onto references to mmaps, dsos, etc. When a
> thread exits it is necessary to clean all of this memory up by
> removing the thread from the machine's threads. Some tools require
> this doesn't happen, such as perf report if offcpu events exist or if
> a task list is being generated, so add a symbol_conf value to make the
> behavior optional. When an exited thread is left in the machine's
> threads, mark it as exited.
>
> This change relates to commit 40826c45eb0b ("perf thread: Remove
> notion of dead threads"). Dead threads were removed as they had a
> reference count of 0 and were difficult to reason about with the
> reference count checker. Here a thread is removed from threads when it
> exits, unless via symbol_conf the exited thread isn't remove and is
> marked as exited. Reference counting behaves as it normally does.

Maybe we can do it the other way around.  IOW tools can access
dead threads for whatever reason if they are dealing with a data
file.  And I guess the main concern is perf top to reduce memory
footprint, right?  Then we can declare to remove the dead threads
for perf top case only IMHO.

Thanks,
Namhyung

>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/builtin-report.c   |  7 +++++++
>  tools/perf/util/machine.c     | 10 +++++++---
>  tools/perf/util/symbol_conf.h |  3 ++-
>  tools/perf/util/thread.h      | 14 ++++++++++++++
>  4 files changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index dcedfe00f04d..749246817aed 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1411,6 +1411,13 @@ int cmd_report(int argc, const char **argv)
>         if (ret < 0)
>                 goto exit;
>
> +       /*
> +        * tasks_mode require access to exited threads to list those that are in
> +        * the data file. Off-cpu events are synthesized after other events and
> +        * reference exited threads.
> +        */
> +       symbol_conf.keep_exited_threads = true;
> +
>         annotation_options__init(&report.annotation_opts);
>
>         ret = perf_config(report__config, &report);
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 6ca7500e2cf4..5cda47eb337d 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
>         if (dump_trace)
>                 perf_event__fprintf_task(event, stdout);
>
> -       if (thread != NULL)
> -               thread__put(thread);
> -
> +       if (thread != NULL) {
> +               if (symbol_conf.keep_exited_threads)
> +                       thread__set_exited(thread, /*exited=*/true);
> +               else
> +                       machine__remove_thread(machine, thread);
> +       }
> +       thread__put(thread);
>         return 0;
>  }
>
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index 2b2fb9e224b0..6040286e07a6 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -43,7 +43,8 @@ struct symbol_conf {
>                         disable_add2line_warn,
>                         buildid_mmap2,
>                         guest_code,
> -                       lazy_load_kernel_maps;
> +                       lazy_load_kernel_maps,
> +                       keep_exited_threads;
>         const char      *vmlinux_name,
>                         *kallsyms_name,
>                         *source_prefix,
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index e79225a0ea46..0df775b5c110 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -36,13 +36,22 @@ struct thread_rb_node {
>  };
>
>  DECLARE_RC_STRUCT(thread) {
> +       /** @maps: mmaps associated with this thread. */
>         struct maps             *maps;
>         pid_t                   pid_; /* Not all tools update this */
> +       /** @tid: thread ID number unique to a machine. */
>         pid_t                   tid;
> +       /** @ppid: parent process of the process this thread belongs to. */
>         pid_t                   ppid;
>         int                     cpu;
>         int                     guest_cpu; /* For QEMU thread */
>         refcount_t              refcnt;
> +       /**
> +        * @exited: Has the thread had an exit event. Such threads are usually
> +        * removed from the machine's threads but some events/tools require
> +        * access to dead threads.
> +        */
> +       bool                    exited;
>         bool                    comm_set;
>         int                     comm_len;
>         struct list_head        namespaces_list;
> @@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
>         return &RC_CHK_ACCESS(thread)->refcnt;
>  }
>
> +static inline void thread__set_exited(struct thread *thread, bool exited)
> +{
> +       RC_CHK_ACCESS(thread)->exited = exited;
> +}
> +
>  static inline bool thread__comm_set(const struct thread *thread)
>  {
>         return RC_CHK_ACCESS(thread)->comm_set;
> --
> 2.42.0.609.gbb76f46606-goog
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 13/13] perf machine thread: Remove exited threads by default
  2023-10-18 23:30   ` Namhyung Kim
@ 2023-10-19  0:39     ` Ian Rogers
  2023-10-20 21:08       ` Namhyung Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Rogers @ 2023-10-19  0:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 18, 2023 at 4:30 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
> >
> > struct thread values hold onto references to mmaps, dsos, etc. When a
> > thread exits it is necessary to clean all of this memory up by
> > removing the thread from the machine's threads. Some tools require
> > this doesn't happen, such as perf report if offcpu events exist or if
> > a task list is being generated, so add a symbol_conf value to make the
> > behavior optional. When an exited thread is left in the machine's
> > threads, mark it as exited.
> >
> > This change relates to commit 40826c45eb0b ("perf thread: Remove
> > notion of dead threads"). Dead threads were removed as they had a
> > reference count of 0 and were difficult to reason about with the
> > reference count checker. Here a thread is removed from threads when it
> > exits, unless via symbol_conf the exited thread isn't remove and is
> > marked as exited. Reference counting behaves as it normally does.
>
> Maybe we can do it the other way around.  IOW tools can access
> dead threads for whatever reason if they are dealing with a data
> file.  And I guess the main concern is perf top to reduce memory
> footprint, right?  Then we can declare to remove the dead threads
> for perf top case only IMHO.

Thanks I did consider this but I think this order makes most sense.
The only tool relying on access to dead threads is perf report, but
its uses are questionable:

 - task printing - the tools wants to show all threads from a perf run
and assumes they are in threads. By replacing tool.exit it would be
easy to record dead threads for this, as the maps aren't required the
memory overhead could be improved by just recording the necessary
task's data.

 - offcpu - it would be very useful to have offcpu samples be real
samples, rather than an aggregated sample at the end of the data file
with a timestamp greater-than when the thread exited. These samples
would happen before the exit event is processed and so the reason to
keep dead threads around would be removed. I think we could do some
kind of sample aggregation using BPF, but I think we need to think it
through with exit events. Perhaps we can fix things in the short-term
by generating BPF samples with aggregation when threads exit in the
offcpu BPF code, but then again, if you have this going it seems as
easy just to keep to record all offcpu events as distinct.

Other commands like perf top and perf script don't currently have
dependencies on dead threads and I'm kind of glad, for
understandability, memory footprint, etc. Having the default be that
dead threads linger on will just encourage the kind of issues we see
currently in perf report and having to have every tool, perf script
declare it doesn't want dead threads seems burdensome.

Thanks,
Ian

> Thanks,
> Namhyung
>
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/builtin-report.c   |  7 +++++++
> >  tools/perf/util/machine.c     | 10 +++++++---
> >  tools/perf/util/symbol_conf.h |  3 ++-
> >  tools/perf/util/thread.h      | 14 ++++++++++++++
> >  4 files changed, 30 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> > index dcedfe00f04d..749246817aed 100644
> > --- a/tools/perf/builtin-report.c
> > +++ b/tools/perf/builtin-report.c
> > @@ -1411,6 +1411,13 @@ int cmd_report(int argc, const char **argv)
> >         if (ret < 0)
> >                 goto exit;
> >
> > +       /*
> > +        * tasks_mode require access to exited threads to list those that are in
> > +        * the data file. Off-cpu events are synthesized after other events and
> > +        * reference exited threads.
> > +        */
> > +       symbol_conf.keep_exited_threads = true;
> > +
> >         annotation_options__init(&report.annotation_opts);
> >
> >         ret = perf_config(report__config, &report);
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 6ca7500e2cf4..5cda47eb337d 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
> >         if (dump_trace)
> >                 perf_event__fprintf_task(event, stdout);
> >
> > -       if (thread != NULL)
> > -               thread__put(thread);
> > -
> > +       if (thread != NULL) {
> > +               if (symbol_conf.keep_exited_threads)
> > +                       thread__set_exited(thread, /*exited=*/true);
> > +               else
> > +                       machine__remove_thread(machine, thread);
> > +       }
> > +       thread__put(thread);
> >         return 0;
> >  }
> >
> > diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> > index 2b2fb9e224b0..6040286e07a6 100644
> > --- a/tools/perf/util/symbol_conf.h
> > +++ b/tools/perf/util/symbol_conf.h
> > @@ -43,7 +43,8 @@ struct symbol_conf {
> >                         disable_add2line_warn,
> >                         buildid_mmap2,
> >                         guest_code,
> > -                       lazy_load_kernel_maps;
> > +                       lazy_load_kernel_maps,
> > +                       keep_exited_threads;
> >         const char      *vmlinux_name,
> >                         *kallsyms_name,
> >                         *source_prefix,
> > diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> > index e79225a0ea46..0df775b5c110 100644
> > --- a/tools/perf/util/thread.h
> > +++ b/tools/perf/util/thread.h
> > @@ -36,13 +36,22 @@ struct thread_rb_node {
> >  };
> >
> >  DECLARE_RC_STRUCT(thread) {
> > +       /** @maps: mmaps associated with this thread. */
> >         struct maps             *maps;
> >         pid_t                   pid_; /* Not all tools update this */
> > +       /** @tid: thread ID number unique to a machine. */
> >         pid_t                   tid;
> > +       /** @ppid: parent process of the process this thread belongs to. */
> >         pid_t                   ppid;
> >         int                     cpu;
> >         int                     guest_cpu; /* For QEMU thread */
> >         refcount_t              refcnt;
> > +       /**
> > +        * @exited: Has the thread had an exit event. Such threads are usually
> > +        * removed from the machine's threads but some events/tools require
> > +        * access to dead threads.
> > +        */
> > +       bool                    exited;
> >         bool                    comm_set;
> >         int                     comm_len;
> >         struct list_head        namespaces_list;
> > @@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
> >         return &RC_CHK_ACCESS(thread)->refcnt;
> >  }
> >
> > +static inline void thread__set_exited(struct thread *thread, bool exited)
> > +{
> > +       RC_CHK_ACCESS(thread)->exited = exited;
> > +}
> > +
> >  static inline bool thread__comm_set(const struct thread *thread)
> >  {
> >         return RC_CHK_ACCESS(thread)->comm_set;
> > --
> > 2.42.0.609.gbb76f46606-goog
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/13] perf record: Lazy load kernel symbols
  2023-10-12  6:23 ` [PATCH v2 10/13] perf record: Lazy load kernel symbols Ian Rogers
@ 2023-10-19 11:02   ` Adrian Hunter
  2023-10-24 18:42     ` Ian Rogers
  0 siblings, 1 reply; 28+ messages in thread
From: Adrian Hunter @ 2023-10-19 11:02 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On 12/10/23 09:23, Ian Rogers wrote:
> Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
> changed it so that loading a kernel dso would cause the symbols for
> the dso to be eagerly loaded. For perf record this is overhead as the
> symbols won't be used. Add a symbol_conf to control the behavior and
> disable it for perf record and perf inject.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/builtin-inject.c   | 4 ++++
>  tools/perf/builtin-record.c   | 2 ++
>  tools/perf/util/event.c       | 4 ++--
>  tools/perf/util/symbol_conf.h | 3 ++-
>  4 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index c8cf2fdd9cff..1539fb18c749 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -2265,6 +2265,10 @@ int cmd_inject(int argc, const char **argv)
>  		"perf inject [<options>]",
>  		NULL
>  	};
> +
> +	/* Disable eager loading of kernel symbols that adds overhead to perf inject. */
> +	symbol_conf.lazy_load_kernel_maps = true;

Possibly not for itrace kernel decoding, so:

	if (!inject->itrace_synth_opts.set)
		symbol_conf.lazy_load_kernel_maps = true;

> +
>  #ifndef HAVE_JITDUMP
>  	set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
>  #endif
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index dcf288a4fb9a..8ec818568662 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
>  # undef set_nobuild
>  #endif
>  
> +	/* Disable eager loading of kernel symbols that adds overhead to perf record. */
> +	symbol_conf.lazy_load_kernel_maps = true;
>  	rec->opts.affinity = PERF_AFFINITY_SYS;
>  
>  	rec->evlist = evlist__new();
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 923c0fb15122..68f45e9e63b6 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
>  	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
>  		al->level = 'k';
>  		maps = machine__kernel_maps(machine);
> -		load_map = true;
> +		load_map = !symbol_conf.lazy_load_kernel_maps;
>  	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
>  		al->level = '.';
>  	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
>  		al->level = 'g';
>  		maps = machine__kernel_maps(machine);
> -		load_map = true;
> +		load_map = !symbol_conf.lazy_load_kernel_maps;
>  	} else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
>  		al->level = 'u';
>  	} else {
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index 0b589570d1d0..2b2fb9e224b0 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -42,7 +42,8 @@ struct symbol_conf {
>  			inline_name,
>  			disable_add2line_warn,
>  			buildid_mmap2,
> -			guest_code;
> +			guest_code,
> +			lazy_load_kernel_maps;
>  	const char	*vmlinux_name,
>  			*kallsyms_name,
>  			*source_prefix,


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 13/13] perf machine thread: Remove exited threads by default
  2023-10-19  0:39     ` Ian Rogers
@ 2023-10-20 21:08       ` Namhyung Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2023-10-20 21:08 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 18, 2023 at 5:39 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Oct 18, 2023 at 4:30 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > struct thread values hold onto references to mmaps, dsos, etc. When a
> > > thread exits it is necessary to clean all of this memory up by
> > > removing the thread from the machine's threads. Some tools require
> > > this doesn't happen, such as perf report if offcpu events exist or if
> > > a task list is being generated, so add a symbol_conf value to make the
> > > behavior optional. When an exited thread is left in the machine's
> > > threads, mark it as exited.
> > >
> > > This change relates to commit 40826c45eb0b ("perf thread: Remove
> > > notion of dead threads"). Dead threads were removed as they had a
> > > reference count of 0 and were difficult to reason about with the
> > > reference count checker. Here a thread is removed from threads when it
> > > exits, unless via symbol_conf the exited thread isn't remove and is
> > > marked as exited. Reference counting behaves as it normally does.
> >
> > Maybe we can do it the other way around.  IOW tools can access
> > dead threads for whatever reason if they are dealing with a data
> > file.  And I guess the main concern is perf top to reduce memory
> > footprint, right?  Then we can declare to remove the dead threads
> > for perf top case only IMHO.
>
> Thanks I did consider this but I think this order makes most sense.
> The only tool relying on access to dead threads is perf report, but
> its uses are questionable:
>
>  - task printing - the tools wants to show all threads from a perf run
> and assumes they are in threads. By replacing tool.exit it would be
> easy to record dead threads for this, as the maps aren't required the
> memory overhead could be improved by just recording the necessary
> task's data.
>
>  - offcpu - it would be very useful to have offcpu samples be real
> samples, rather than an aggregated sample at the end of the data file
> with a timestamp greater-than when the thread exited. These samples
> would happen before the exit event is processed and so the reason to
> keep dead threads around would be removed. I think we could do some
> kind of sample aggregation using BPF, but I think we need to think it
> through with exit events. Perhaps we can fix things in the short-term
> by generating BPF samples with aggregation when threads exit in the
> offcpu BPF code, but then again, if you have this going it seems as
> easy just to keep to record all offcpu events as distinct.

Saving off-cpu event on every context switch might cause event loss
due to its frequency.  Generating aggregated samples on EXIT sounds
interesting, but then it'd iterated all possible keys for that thread
including stack trace ID and few more.  So I'm not 100% sure if it's ok
doing it on a task exit path.

>
> Other commands like perf top and perf script don't currently have
> dependencies on dead threads and I'm kind of glad, for
> understandability, memory footprint, etc. Having the default be that
> dead threads linger on will just encourage the kind of issues we see
> currently in perf report and having to have every tool, perf script
> declare it doesn't want dead threads seems burdensome.

Fair enough.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 13/13] perf machine thread: Remove exited threads by default
  2023-10-12  6:23 ` [PATCH v2 13/13] perf machine thread: Remove exited threads by default Ian Rogers
  2023-10-18 23:30   ` Namhyung Kim
@ 2023-10-23 14:24   ` Adrian Hunter
  2023-10-23 18:49     ` Ian Rogers
  1 sibling, 1 reply; 28+ messages in thread
From: Adrian Hunter @ 2023-10-23 14:24 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Nick Terrell, Kan Liang, Song Liu,
	Sandipan Das, Anshuman Khandual, James Clark, Liam Howlett,
	Miguel Ojeda, Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On 12/10/23 09:23, Ian Rogers wrote:
> struct thread values hold onto references to mmaps, dsos, etc. When a
> thread exits it is necessary to clean all of this memory up by
> removing the thread from the machine's threads. Some tools require
> this doesn't happen, such as perf report if offcpu events exist or if
> a task list is being generated, so add a symbol_conf value to make the
> behavior optional. When an exited thread is left in the machine's
> threads, mark it as exited.
> 
> This change relates to commit 40826c45eb0b ("perf thread: Remove
> notion of dead threads"). Dead threads were removed as they had a
> reference count of 0 and were difficult to reason about with the
> reference count checker. Here a thread is removed from threads when it
> exits, unless via symbol_conf the exited thread isn't remove and is
> marked as exited. Reference counting behaves as it normally does.

Can we exclude AUX area tracing?

Essentially, the EXIT event happens when the task is still running
in kernel mode, so the thread has not in fact fully exited.

Example:

# perf record -a --kcore -e intel_pt// uname

Before:

# perf script --itrace=b --show-task-events -C6 | grep -C10 EXIT
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63124ee __perf_event_header__init_id+0x5e ([kernel.kallsyms]) => ffffffffb63124f7 __perf_event_header__init_id+0x67 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312501 __perf_event_header__init_id+0x71 ([kernel.kallsyms]) => ffffffffb6312512 __perf_event_header__init_id+0x82 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312531 __perf_event_header__init_id+0xa1 ([kernel.kallsyms]) => ffffffffb6316b3a perf_event_task_output+0x26a ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316b40 perf_event_task_output+0x270 ([kernel.kallsyms]) => ffffffffb6316959 perf_event_task_output+0x89 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316966 perf_event_task_output+0x96 ([kernel.kallsyms]) => ffffffffb6322040 perf_output_begin+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322080 perf_output_begin+0x40 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb6322085 perf_output_begin+0x45 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220ce perf_output_begin+0x8e ([kernel.kallsyms]) => ffffffffb611d280 preempt_count_add+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb611d2bf preempt_count_add+0x3f ([kernel.kallsyms]) => ffffffffb63220d3 perf_output_begin+0x93 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220e8 perf_output_begin+0xa8 ([kernel.kallsyms]) => ffffffffb63220ff perf_output_begin+0xbf ([kernel.kallsyms])
           uname   14740 [006] 26795.092638: PERF_RECORD_EXIT(14740:14740):(14739:14739)
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322119 perf_output_begin+0xd9 ([kernel.kallsyms]) => ffffffffb6322128 perf_output_begin+0xe8 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322146 perf_output_begin+0x106 ([kernel.kallsyms]) => ffffffffb63220ea perf_output_begin+0xaa ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220f9 perf_output_begin+0xb9 ([kernel.kallsyms]) => ffffffffb63221ab perf_output_begin+0x16b ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63221ae perf_output_begin+0x16e ([kernel.kallsyms]) => ffffffffb63221b6 perf_output_begin+0x176 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322202 perf_output_begin+0x1c2 ([kernel.kallsyms]) => ffffffffb6322167 perf_output_begin+0x127 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb632218c perf_output_begin+0x14c ([kernel.kallsyms]) => ffffffffb631696b perf_event_task_output+0x9b ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316990 perf_event_task_output+0xc0 ([kernel.kallsyms]) => ffffffffb61034a0 __task_pid_nr_ns+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb61034b7 __task_pid_nr_ns+0x17 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb61034bc __task_pid_nr_ns+0x1c ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6103503 __task_pid_nr_ns+0x63 ([kernel.kallsyms]) => ffffffffb610353b __task_pid_nr_ns+0x9b ([kernel.kallsyms])

After:

$ perf script --itrace=b --show-task-events -C6 | grep -C10 EXIT
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63124ee __perf_event_header__init_id+0x5e ([kernel.kallsyms]) => ffffffffb63124f7 __perf_event_header__init_id+0x67 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312501 __perf_event_header__init_id+0x71 ([kernel.kallsyms]) => ffffffffb6312512 __perf_event_header__init_id+0x82 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312531 __perf_event_header__init_id+0xa1 ([kernel.kallsyms]) => ffffffffb6316b3a perf_event_task_output+0x26a ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316b40 perf_event_task_output+0x270 ([kernel.kallsyms]) => ffffffffb6316959 perf_event_task_output+0x89 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316966 perf_event_task_output+0x96 ([kernel.kallsyms]) => ffffffffb6322040 perf_output_begin+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322080 perf_output_begin+0x40 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb6322085 perf_output_begin+0x45 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220ce perf_output_begin+0x8e ([kernel.kallsyms]) => ffffffffb611d280 preempt_count_add+0x0 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb611d2bf preempt_count_add+0x3f ([kernel.kallsyms]) => ffffffffb63220d3 perf_output_begin+0x93 ([kernel.kallsyms])
           uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220e8 perf_output_begin+0xa8 ([kernel.kallsyms]) => ffffffffb63220ff perf_output_begin+0xbf ([kernel.kallsyms])
           uname   14740 [006] 26795.092638: PERF_RECORD_EXIT(14740:14740):(14739:14739)
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322119 perf_output_begin+0xd9 ([kernel.kallsyms]) => ffffffffb6322128 perf_output_begin+0xe8 ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322146 perf_output_begin+0x106 ([kernel.kallsyms]) => ffffffffb63220ea perf_output_begin+0xaa ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb63220f9 perf_output_begin+0xb9 ([kernel.kallsyms]) => ffffffffb63221ab perf_output_begin+0x16b ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb63221ae perf_output_begin+0x16e ([kernel.kallsyms]) => ffffffffb63221b6 perf_output_begin+0x176 ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322202 perf_output_begin+0x1c2 ([kernel.kallsyms]) => ffffffffb6322167 perf_output_begin+0x127 ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb632218c perf_output_begin+0x14c ([kernel.kallsyms]) => ffffffffb631696b perf_event_task_output+0x9b ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6316990 perf_event_task_output+0xc0 ([kernel.kallsyms]) => ffffffffb61034a0 __task_pid_nr_ns+0x0 ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb61034b7 __task_pid_nr_ns+0x17 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb61034bc __task_pid_nr_ns+0x1c ([kernel.kallsyms])
          :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6103503 __task_pid_nr_ns+0x63 ([kernel.kallsyms]) => ffffffffb610353b __task_pid_nr_ns+0x9b ([kernel.kallsyms])

This will also affect samples made after PERF_RECORD_EXIT but before
the task finishes exiting.

> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/builtin-report.c   |  7 +++++++
>  tools/perf/util/machine.c     | 10 +++++++---
>  tools/perf/util/symbol_conf.h |  3 ++-
>  tools/perf/util/thread.h      | 14 ++++++++++++++
>  4 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index dcedfe00f04d..749246817aed 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1411,6 +1411,13 @@ int cmd_report(int argc, const char **argv)
>  	if (ret < 0)
>  		goto exit;
>  
> +	/*
> +	 * tasks_mode require access to exited threads to list those that are in
> +	 * the data file. Off-cpu events are synthesized after other events and
> +	 * reference exited threads.
> +	 */
> +	symbol_conf.keep_exited_threads = true;
> +
>  	annotation_options__init(&report.annotation_opts);
>  
>  	ret = perf_config(report__config, &report);
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 6ca7500e2cf4..5cda47eb337d 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
>  	if (dump_trace)
>  		perf_event__fprintf_task(event, stdout);
>  
> -	if (thread != NULL)
> -		thread__put(thread);
> -
> +	if (thread != NULL) {
> +		if (symbol_conf.keep_exited_threads)
> +			thread__set_exited(thread, /*exited=*/true);
> +		else
> +			machine__remove_thread(machine, thread);
> +	}
> +	thread__put(thread);
>  	return 0;
>  }
>  
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index 2b2fb9e224b0..6040286e07a6 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -43,7 +43,8 @@ struct symbol_conf {
>  			disable_add2line_warn,
>  			buildid_mmap2,
>  			guest_code,
> -			lazy_load_kernel_maps;
> +			lazy_load_kernel_maps,
> +			keep_exited_threads;
>  	const char	*vmlinux_name,
>  			*kallsyms_name,
>  			*source_prefix,
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index e79225a0ea46..0df775b5c110 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -36,13 +36,22 @@ struct thread_rb_node {
>  };
>  
>  DECLARE_RC_STRUCT(thread) {
> +	/** @maps: mmaps associated with this thread. */
>  	struct maps		*maps;
>  	pid_t			pid_; /* Not all tools update this */
> +	/** @tid: thread ID number unique to a machine. */
>  	pid_t			tid;
> +	/** @ppid: parent process of the process this thread belongs to. */
>  	pid_t			ppid;
>  	int			cpu;
>  	int			guest_cpu; /* For QEMU thread */
>  	refcount_t		refcnt;
> +	/**
> +	 * @exited: Has the thread had an exit event. Such threads are usually
> +	 * removed from the machine's threads but some events/tools require
> +	 * access to dead threads.
> +	 */
> +	bool			exited;
>  	bool			comm_set;
>  	int			comm_len;
>  	struct list_head	namespaces_list;
> @@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
>  	return &RC_CHK_ACCESS(thread)->refcnt;
>  }
>  
> +static inline void thread__set_exited(struct thread *thread, bool exited)
> +{
> +	RC_CHK_ACCESS(thread)->exited = exited;
> +}
> +
>  static inline bool thread__comm_set(const struct thread *thread)
>  {
>  	return RC_CHK_ACCESS(thread)->comm_set;


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 13/13] perf machine thread: Remove exited threads by default
  2023-10-23 14:24   ` Adrian Hunter
@ 2023-10-23 18:49     ` Ian Rogers
  2023-10-24  5:19       ` Adrian Hunter
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Rogers @ 2023-10-23 18:49 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Mon, Oct 23, 2023 at 7:24 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 12/10/23 09:23, Ian Rogers wrote:
> > struct thread values hold onto references to mmaps, dsos, etc. When a
> > thread exits it is necessary to clean all of this memory up by
> > removing the thread from the machine's threads. Some tools require
> > this doesn't happen, such as perf report if offcpu events exist or if
> > a task list is being generated, so add a symbol_conf value to make the
> > behavior optional. When an exited thread is left in the machine's
> > threads, mark it as exited.
> >
> > This change relates to commit 40826c45eb0b ("perf thread: Remove
> > notion of dead threads"). Dead threads were removed as they had a
> > reference count of 0 and were difficult to reason about with the
> > reference count checker. Here a thread is removed from threads when it
> > exits, unless via symbol_conf the exited thread isn't remove and is
> > marked as exited. Reference counting behaves as it normally does.
>
> Can we exclude AUX area tracing?
>
> Essentially, the EXIT event happens when the task is still running
> in kernel mode, so the thread has not in fact fully exited.
>
> Example:
>
> # perf record -a --kcore -e intel_pt// uname
>
> Before:
>
> # perf script --itrace=b --show-task-events -C6 | grep -C10 EXIT
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63124ee __perf_event_header__init_id+0x5e ([kernel.kallsyms]) => ffffffffb63124f7 __perf_event_header__init_id+0x67 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312501 __perf_event_header__init_id+0x71 ([kernel.kallsyms]) => ffffffffb6312512 __perf_event_header__init_id+0x82 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312531 __perf_event_header__init_id+0xa1 ([kernel.kallsyms]) => ffffffffb6316b3a perf_event_task_output+0x26a ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316b40 perf_event_task_output+0x270 ([kernel.kallsyms]) => ffffffffb6316959 perf_event_task_output+0x89 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316966 perf_event_task_output+0x96 ([kernel.kallsyms]) => ffffffffb6322040 perf_output_begin+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322080 perf_output_begin+0x40 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb6322085 perf_output_begin+0x45 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220ce perf_output_begin+0x8e ([kernel.kallsyms]) => ffffffffb611d280 preempt_count_add+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb611d2bf preempt_count_add+0x3f ([kernel.kallsyms]) => ffffffffb63220d3 perf_output_begin+0x93 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220e8 perf_output_begin+0xa8 ([kernel.kallsyms]) => ffffffffb63220ff perf_output_begin+0xbf ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638: PERF_RECORD_EXIT(14740:14740):(14739:14739)
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322119 perf_output_begin+0xd9 ([kernel.kallsyms]) => ffffffffb6322128 perf_output_begin+0xe8 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322146 perf_output_begin+0x106 ([kernel.kallsyms]) => ffffffffb63220ea perf_output_begin+0xaa ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220f9 perf_output_begin+0xb9 ([kernel.kallsyms]) => ffffffffb63221ab perf_output_begin+0x16b ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63221ae perf_output_begin+0x16e ([kernel.kallsyms]) => ffffffffb63221b6 perf_output_begin+0x176 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322202 perf_output_begin+0x1c2 ([kernel.kallsyms]) => ffffffffb6322167 perf_output_begin+0x127 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb632218c perf_output_begin+0x14c ([kernel.kallsyms]) => ffffffffb631696b perf_event_task_output+0x9b ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316990 perf_event_task_output+0xc0 ([kernel.kallsyms]) => ffffffffb61034a0 __task_pid_nr_ns+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb61034b7 __task_pid_nr_ns+0x17 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb61034bc __task_pid_nr_ns+0x1c ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6103503 __task_pid_nr_ns+0x63 ([kernel.kallsyms]) => ffffffffb610353b __task_pid_nr_ns+0x9b ([kernel.kallsyms])
>
> After:
>
> $ perf script --itrace=b --show-task-events -C6 | grep -C10 EXIT
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63124ee __perf_event_header__init_id+0x5e ([kernel.kallsyms]) => ffffffffb63124f7 __perf_event_header__init_id+0x67 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312501 __perf_event_header__init_id+0x71 ([kernel.kallsyms]) => ffffffffb6312512 __perf_event_header__init_id+0x82 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312531 __perf_event_header__init_id+0xa1 ([kernel.kallsyms]) => ffffffffb6316b3a perf_event_task_output+0x26a ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316b40 perf_event_task_output+0x270 ([kernel.kallsyms]) => ffffffffb6316959 perf_event_task_output+0x89 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316966 perf_event_task_output+0x96 ([kernel.kallsyms]) => ffffffffb6322040 perf_output_begin+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322080 perf_output_begin+0x40 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb6322085 perf_output_begin+0x45 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220ce perf_output_begin+0x8e ([kernel.kallsyms]) => ffffffffb611d280 preempt_count_add+0x0 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb611d2bf preempt_count_add+0x3f ([kernel.kallsyms]) => ffffffffb63220d3 perf_output_begin+0x93 ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220e8 perf_output_begin+0xa8 ([kernel.kallsyms]) => ffffffffb63220ff perf_output_begin+0xbf ([kernel.kallsyms])
>            uname   14740 [006] 26795.092638: PERF_RECORD_EXIT(14740:14740):(14739:14739)
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322119 perf_output_begin+0xd9 ([kernel.kallsyms]) => ffffffffb6322128 perf_output_begin+0xe8 ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322146 perf_output_begin+0x106 ([kernel.kallsyms]) => ffffffffb63220ea perf_output_begin+0xaa ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb63220f9 perf_output_begin+0xb9 ([kernel.kallsyms]) => ffffffffb63221ab perf_output_begin+0x16b ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb63221ae perf_output_begin+0x16e ([kernel.kallsyms]) => ffffffffb63221b6 perf_output_begin+0x176 ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322202 perf_output_begin+0x1c2 ([kernel.kallsyms]) => ffffffffb6322167 perf_output_begin+0x127 ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb632218c perf_output_begin+0x14c ([kernel.kallsyms]) => ffffffffb631696b perf_event_task_output+0x9b ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6316990 perf_event_task_output+0xc0 ([kernel.kallsyms]) => ffffffffb61034a0 __task_pid_nr_ns+0x0 ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb61034b7 __task_pid_nr_ns+0x17 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb61034bc __task_pid_nr_ns+0x1c ([kernel.kallsyms])
>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6103503 __task_pid_nr_ns+0x63 ([kernel.kallsyms]) => ffffffffb610353b __task_pid_nr_ns+0x9b ([kernel.kallsyms])
>
> This will also affect samples made after PERF_RECORD_EXIT but before
> the task finishes exiting.

Makes sense. Would an appropriate fix be in perf_session__open to set:
symbol_conf.keep_exited_threads = true;

when:
perf_header__has_feat(&session->header, HEADER_AUXTRACE)

It is kind of hacky to be changing global state this way, but
symbol_conf is like that in general.

Thanks,
Ian

> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/builtin-report.c   |  7 +++++++
> >  tools/perf/util/machine.c     | 10 +++++++---
> >  tools/perf/util/symbol_conf.h |  3 ++-
> >  tools/perf/util/thread.h      | 14 ++++++++++++++
> >  4 files changed, 30 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> > index dcedfe00f04d..749246817aed 100644
> > --- a/tools/perf/builtin-report.c
> > +++ b/tools/perf/builtin-report.c
> > @@ -1411,6 +1411,13 @@ int cmd_report(int argc, const char **argv)
> >       if (ret < 0)
> >               goto exit;
> >
> > +     /*
> > +      * tasks_mode require access to exited threads to list those that are in
> > +      * the data file. Off-cpu events are synthesized after other events and
> > +      * reference exited threads.
> > +      */
> > +     symbol_conf.keep_exited_threads = true;
> > +
> >       annotation_options__init(&report.annotation_opts);
> >
> >       ret = perf_config(report__config, &report);
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 6ca7500e2cf4..5cda47eb337d 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
> >       if (dump_trace)
> >               perf_event__fprintf_task(event, stdout);
> >
> > -     if (thread != NULL)
> > -             thread__put(thread);
> > -
> > +     if (thread != NULL) {
> > +             if (symbol_conf.keep_exited_threads)
> > +                     thread__set_exited(thread, /*exited=*/true);
> > +             else
> > +                     machine__remove_thread(machine, thread);
> > +     }
> > +     thread__put(thread);
> >       return 0;
> >  }
> >
> > diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> > index 2b2fb9e224b0..6040286e07a6 100644
> > --- a/tools/perf/util/symbol_conf.h
> > +++ b/tools/perf/util/symbol_conf.h
> > @@ -43,7 +43,8 @@ struct symbol_conf {
> >                       disable_add2line_warn,
> >                       buildid_mmap2,
> >                       guest_code,
> > -                     lazy_load_kernel_maps;
> > +                     lazy_load_kernel_maps,
> > +                     keep_exited_threads;
> >       const char      *vmlinux_name,
> >                       *kallsyms_name,
> >                       *source_prefix,
> > diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> > index e79225a0ea46..0df775b5c110 100644
> > --- a/tools/perf/util/thread.h
> > +++ b/tools/perf/util/thread.h
> > @@ -36,13 +36,22 @@ struct thread_rb_node {
> >  };
> >
> >  DECLARE_RC_STRUCT(thread) {
> > +     /** @maps: mmaps associated with this thread. */
> >       struct maps             *maps;
> >       pid_t                   pid_; /* Not all tools update this */
> > +     /** @tid: thread ID number unique to a machine. */
> >       pid_t                   tid;
> > +     /** @ppid: parent process of the process this thread belongs to. */
> >       pid_t                   ppid;
> >       int                     cpu;
> >       int                     guest_cpu; /* For QEMU thread */
> >       refcount_t              refcnt;
> > +     /**
> > +      * @exited: Has the thread had an exit event. Such threads are usually
> > +      * removed from the machine's threads but some events/tools require
> > +      * access to dead threads.
> > +      */
> > +     bool                    exited;
> >       bool                    comm_set;
> >       int                     comm_len;
> >       struct list_head        namespaces_list;
> > @@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
> >       return &RC_CHK_ACCESS(thread)->refcnt;
> >  }
> >
> > +static inline void thread__set_exited(struct thread *thread, bool exited)
> > +{
> > +     RC_CHK_ACCESS(thread)->exited = exited;
> > +}
> > +
> >  static inline bool thread__comm_set(const struct thread *thread)
> >  {
> >       return RC_CHK_ACCESS(thread)->comm_set;
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 13/13] perf machine thread: Remove exited threads by default
  2023-10-23 18:49     ` Ian Rogers
@ 2023-10-24  5:19       ` Adrian Hunter
  0 siblings, 0 replies; 28+ messages in thread
From: Adrian Hunter @ 2023-10-24  5:19 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On 23/10/23 21:49, Ian Rogers wrote:
> On Mon, Oct 23, 2023 at 7:24 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 12/10/23 09:23, Ian Rogers wrote:
>>> struct thread values hold onto references to mmaps, dsos, etc. When a
>>> thread exits it is necessary to clean all of this memory up by
>>> removing the thread from the machine's threads. Some tools require
>>> this doesn't happen, such as perf report if offcpu events exist or if
>>> a task list is being generated, so add a symbol_conf value to make the
>>> behavior optional. When an exited thread is left in the machine's
>>> threads, mark it as exited.
>>>
>>> This change relates to commit 40826c45eb0b ("perf thread: Remove
>>> notion of dead threads"). Dead threads were removed as they had a
>>> reference count of 0 and were difficult to reason about with the
>>> reference count checker. Here a thread is removed from threads when it
>>> exits, unless via symbol_conf the exited thread isn't remove and is
>>> marked as exited. Reference counting behaves as it normally does.
>>
>> Can we exclude AUX area tracing?
>>
>> Essentially, the EXIT event happens when the task is still running
>> in kernel mode, so the thread has not in fact fully exited.
>>
>> Example:
>>
>> # perf record -a --kcore -e intel_pt// uname
>>
>> Before:
>>
>> # perf script --itrace=b --show-task-events -C6 | grep -C10 EXIT
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63124ee __perf_event_header__init_id+0x5e ([kernel.kallsyms]) => ffffffffb63124f7 __perf_event_header__init_id+0x67 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312501 __perf_event_header__init_id+0x71 ([kernel.kallsyms]) => ffffffffb6312512 __perf_event_header__init_id+0x82 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312531 __perf_event_header__init_id+0xa1 ([kernel.kallsyms]) => ffffffffb6316b3a perf_event_task_output+0x26a ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316b40 perf_event_task_output+0x270 ([kernel.kallsyms]) => ffffffffb6316959 perf_event_task_output+0x89 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316966 perf_event_task_output+0x96 ([kernel.kallsyms]) => ffffffffb6322040 perf_output_begin+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322080 perf_output_begin+0x40 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb6322085 perf_output_begin+0x45 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220ce perf_output_begin+0x8e ([kernel.kallsyms]) => ffffffffb611d280 preempt_count_add+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb611d2bf preempt_count_add+0x3f ([kernel.kallsyms]) => ffffffffb63220d3 perf_output_begin+0x93 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220e8 perf_output_begin+0xa8 ([kernel.kallsyms]) => ffffffffb63220ff perf_output_begin+0xbf ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638: PERF_RECORD_EXIT(14740:14740):(14739:14739)
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322119 perf_output_begin+0xd9 ([kernel.kallsyms]) => ffffffffb6322128 perf_output_begin+0xe8 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322146 perf_output_begin+0x106 ([kernel.kallsyms]) => ffffffffb63220ea perf_output_begin+0xaa ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220f9 perf_output_begin+0xb9 ([kernel.kallsyms]) => ffffffffb63221ab perf_output_begin+0x16b ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63221ae perf_output_begin+0x16e ([kernel.kallsyms]) => ffffffffb63221b6 perf_output_begin+0x176 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322202 perf_output_begin+0x1c2 ([kernel.kallsyms]) => ffffffffb6322167 perf_output_begin+0x127 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb632218c perf_output_begin+0x14c ([kernel.kallsyms]) => ffffffffb631696b perf_event_task_output+0x9b ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316990 perf_event_task_output+0xc0 ([kernel.kallsyms]) => ffffffffb61034a0 __task_pid_nr_ns+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb61034b7 __task_pid_nr_ns+0x17 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb61034bc __task_pid_nr_ns+0x1c ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6103503 __task_pid_nr_ns+0x63 ([kernel.kallsyms]) => ffffffffb610353b __task_pid_nr_ns+0x9b ([kernel.kallsyms])
>>
>> After:
>>
>> $ perf script --itrace=b --show-task-events -C6 | grep -C10 EXIT
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63124ee __perf_event_header__init_id+0x5e ([kernel.kallsyms]) => ffffffffb63124f7 __perf_event_header__init_id+0x67 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312501 __perf_event_header__init_id+0x71 ([kernel.kallsyms]) => ffffffffb6312512 __perf_event_header__init_id+0x82 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6312531 __perf_event_header__init_id+0xa1 ([kernel.kallsyms]) => ffffffffb6316b3a perf_event_task_output+0x26a ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316b40 perf_event_task_output+0x270 ([kernel.kallsyms]) => ffffffffb6316959 perf_event_task_output+0x89 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6316966 perf_event_task_output+0x96 ([kernel.kallsyms]) => ffffffffb6322040 perf_output_begin+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6322080 perf_output_begin+0x40 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb6322085 perf_output_begin+0x45 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220ce perf_output_begin+0x8e ([kernel.kallsyms]) => ffffffffb611d280 preempt_count_add+0x0 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb611d2bf preempt_count_add+0x3f ([kernel.kallsyms]) => ffffffffb63220d3 perf_output_begin+0x93 ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638:          1   branches:  ffffffffb63220e8 perf_output_begin+0xa8 ([kernel.kallsyms]) => ffffffffb63220ff perf_output_begin+0xbf ([kernel.kallsyms])
>>            uname   14740 [006] 26795.092638: PERF_RECORD_EXIT(14740:14740):(14739:14739)
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322119 perf_output_begin+0xd9 ([kernel.kallsyms]) => ffffffffb6322128 perf_output_begin+0xe8 ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322146 perf_output_begin+0x106 ([kernel.kallsyms]) => ffffffffb63220ea perf_output_begin+0xaa ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb63220f9 perf_output_begin+0xb9 ([kernel.kallsyms]) => ffffffffb63221ab perf_output_begin+0x16b ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb63221ae perf_output_begin+0x16e ([kernel.kallsyms]) => ffffffffb63221b6 perf_output_begin+0x176 ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6322202 perf_output_begin+0x1c2 ([kernel.kallsyms]) => ffffffffb6322167 perf_output_begin+0x127 ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb632218c perf_output_begin+0x14c ([kernel.kallsyms]) => ffffffffb631696b perf_event_task_output+0x9b ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6316990 perf_event_task_output+0xc0 ([kernel.kallsyms]) => ffffffffb61034a0 __task_pid_nr_ns+0x0 ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb61034b7 __task_pid_nr_ns+0x17 ([kernel.kallsyms]) => ffffffffb6194dc0 __rcu_read_lock+0x0 ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6194de1 __rcu_read_lock+0x21 ([kernel.kallsyms]) => ffffffffb61034bc __task_pid_nr_ns+0x1c ([kernel.kallsyms])
>>           :14740   14740 [006] 26795.092638:          1   branches:  ffffffffb6103503 __task_pid_nr_ns+0x63 ([kernel.kallsyms]) => ffffffffb610353b __task_pid_nr_ns+0x9b ([kernel.kallsyms])
>>
>> This will also affect samples made after PERF_RECORD_EXIT but before
>> the task finishes exiting.
> 
> Makes sense. Would an appropriate fix be in perf_session__open to set:
> symbol_conf.keep_exited_threads = true;
> 
> when:
> perf_header__has_feat(&session->header, HEADER_AUXTRACE)
> 
> It is kind of hacky to be changing global state this way, but
> symbol_conf is like that in general.

That should work.  Alternatively, could be added to
perf_event__process_auxtrace_info() which would tie it
more directly to auxtrace, and wouldn't have to check
HEADER_AUXTRACE.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 03/13] perf hist: Add missing puts to hist__account_cycles
  2023-10-18 23:16   ` Namhyung Kim
@ 2023-10-24 16:12     ` Ian Rogers
  0 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-24 16:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 18, 2023 at 4:16 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Caught using reference count checking on perf top with
> > "--call-graph=lbr". After this no memory leaks were detected.
> >
> > Fixes: 57849998e2cd ("perf report: Add processing for cycle histograms")
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/util/hist.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > index 3dc8a4968beb..ac8c0ef48a7f 100644
> > --- a/tools/perf/util/hist.c
> > +++ b/tools/perf/util/hist.c
> > @@ -2676,8 +2676,6 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
> >
> >         /* If we have branch cycles always annotate them. */
> >         if (bs && bs->nr && entries[0].flags.cycles) {
> > -               int i;
> > -
>
> Seems not necessary.
>
> >                 bi = sample__resolve_bstack(sample, al);
>
> It looks like this increases the refcount for each bi entry and
> it didn't put the refcounts.

Right, this is why the loop doing the puts is added.

>
> >                 if (bi) {
> >                         struct addr_map_symbol *prev = NULL;
> > @@ -2692,7 +2690,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
> >                          * Note that perf stores branches reversed from
> >                          * program order!
> >                          */
> > -                       for (i = bs->nr - 1; i >= 0; i--) {
> > +                       for (int i = bs->nr - 1; i >= 0; i--) {
> >                                 addr_map_symbol__account_cycles(&bi[i].from,
> >                                         nonany_branch_mode ? NULL : prev,
> >                                         bi[i].flags.cycles);
> > @@ -2701,6 +2699,12 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
> >                                 if (total_cycles)
> >                                         *total_cycles += bi[i].flags.cycles;
> >                         }
> > +                       for (unsigned int i = 0; i < bs->nr; i++) {
>
> Can we just reuse the int i above?

I wanted to move to unsigned for consistency with the rest of the
branch_stack code, nr is a u64, but when iterating down the sign
matters - so this fixes up where possible.

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > +                               map__put(bi[i].to.ms.map);
> > +                               maps__put(bi[i].to.ms.maps);
> > +                               map__put(bi[i].from.ms.map);
> > +                               maps__put(bi[i].from.ms.maps);
> > +                       }
> >                         free(bi);
> >                 }
> >         }
> > --
> > 2.42.0.609.gbb76f46606-goog
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/13] perf record: Lazy load kernel symbols
  2023-10-19 11:02   ` Adrian Hunter
@ 2023-10-24 18:42     ` Ian Rogers
  0 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-24 18:42 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Thu, Oct 19, 2023 at 4:02 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 12/10/23 09:23, Ian Rogers wrote:
> > Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
> > changed it so that loading a kernel dso would cause the symbols for
> > the dso to be eagerly loaded. For perf record this is overhead as the
> > symbols won't be used. Add a symbol_conf to control the behavior and
> > disable it for perf record and perf inject.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/builtin-inject.c   | 4 ++++
> >  tools/perf/builtin-record.c   | 2 ++
> >  tools/perf/util/event.c       | 4 ++--
> >  tools/perf/util/symbol_conf.h | 3 ++-
> >  4 files changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index c8cf2fdd9cff..1539fb18c749 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -2265,6 +2265,10 @@ int cmd_inject(int argc, const char **argv)
> >               "perf inject [<options>]",
> >               NULL
> >       };
> > +
> > +     /* Disable eager loading of kernel symbols that adds overhead to perf inject. */
> > +     symbol_conf.lazy_load_kernel_maps = true;
>
> Possibly not for itrace kernel decoding, so:
>
>         if (!inject->itrace_synth_opts.set)
>                 symbol_conf.lazy_load_kernel_maps = true;

Thanks, added to v3.

Ian

> > +
> >  #ifndef HAVE_JITDUMP
> >       set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
> >  #endif
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index dcf288a4fb9a..8ec818568662 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
> >  # undef set_nobuild
> >  #endif
> >
> > +     /* Disable eager loading of kernel symbols that adds overhead to perf record. */
> > +     symbol_conf.lazy_load_kernel_maps = true;
> >       rec->opts.affinity = PERF_AFFINITY_SYS;
> >
> >       rec->evlist = evlist__new();
> > diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> > index 923c0fb15122..68f45e9e63b6 100644
> > --- a/tools/perf/util/event.c
> > +++ b/tools/perf/util/event.c
> > @@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
> >       if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
> >               al->level = 'k';
> >               maps = machine__kernel_maps(machine);
> > -             load_map = true;
> > +             load_map = !symbol_conf.lazy_load_kernel_maps;
> >       } else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
> >               al->level = '.';
> >       } else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
> >               al->level = 'g';
> >               maps = machine__kernel_maps(machine);
> > -             load_map = true;
> > +             load_map = !symbol_conf.lazy_load_kernel_maps;
> >       } else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
> >               al->level = 'u';
> >       } else {
> > diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> > index 0b589570d1d0..2b2fb9e224b0 100644
> > --- a/tools/perf/util/symbol_conf.h
> > +++ b/tools/perf/util/symbol_conf.h
> > @@ -42,7 +42,8 @@ struct symbol_conf {
> >                       inline_name,
> >                       disable_add2line_warn,
> >                       buildid_mmap2,
> > -                     guest_code;
> > +                     guest_code,
> > +                     lazy_load_kernel_maps;
> >       const char      *vmlinux_name,
> >                       *kallsyms_name,
> >                       *source_prefix,
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 12/13] perf mmap: Lazily initialize zstd streams
  2023-10-18 23:21   ` Namhyung Kim
@ 2023-10-24 19:21     ` Ian Rogers
  0 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-24 19:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 18, 2023 at 4:21 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Zstd streams create dictionaries that can require significant RAM,
> > especially when there is one per-CPU. Tools like perf record won't use
> > the streams without the -z option, and so the creation of the streams
> > is pure overhead. Switch to creating the streams on first use.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/util/compress.h |  1 +
> >  tools/perf/util/mmap.c     |  5 ++--
> >  tools/perf/util/mmap.h     |  1 -
> >  tools/perf/util/zstd.c     | 61 ++++++++++++++++++++------------------
> >  4 files changed, 35 insertions(+), 33 deletions(-)
> >
> > diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h
> > index 0cd3369af2a4..9391850f1a7e 100644
> > --- a/tools/perf/util/compress.h
> > +++ b/tools/perf/util/compress.h
> > @@ -21,6 +21,7 @@ struct zstd_data {
> >  #ifdef HAVE_ZSTD_SUPPORT
> >         ZSTD_CStream    *cstream;
> >         ZSTD_DStream    *dstream;
> > +       int comp_level;
> >  #endif
> >  };
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index 49093b21ee2d..122ee198a86e 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -295,15 +295,14 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu
> >
> >         map->core.flush = mp->flush;
> >
> > -       map->comp_level = mp->comp_level;
> >  #ifndef PYTHON_PERF
> > -       if (zstd_init(&map->zstd_data, map->comp_level)) {
> > +       if (zstd_init(&map->zstd_data, mp->comp_level)) {
> >                 pr_debug2("failed to init mmap compressor, error %d\n", errno);
> >                 return -1;
> >         }
> >  #endif
> >
> > -       if (map->comp_level && !perf_mmap__aio_enabled(map)) {
> > +       if (mp->comp_level && !perf_mmap__aio_enabled(map)) {
> >                 map->data = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE,
> >                                  MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
> >                 if (map->data == MAP_FAILED) {
> > diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> > index f944c3cd5efa..0df6e1621c7e 100644
> > --- a/tools/perf/util/mmap.h
> > +++ b/tools/perf/util/mmap.h
> > @@ -39,7 +39,6 @@ struct mmap {
> >  #endif
> >         struct mmap_cpu_mask    affinity_mask;
> >         void            *data;
> > -       int             comp_level;
> >         struct perf_data_file *file;
> >         struct zstd_data      zstd_data;
> >  };
> > diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
> > index 48dd2b018c47..60f2d749b1c0 100644
> > --- a/tools/perf/util/zstd.c
> > +++ b/tools/perf/util/zstd.c
> > @@ -7,35 +7,9 @@
> >
> >  int zstd_init(struct zstd_data *data, int level)
> >  {
> > -       size_t ret;
> > -
> > -       data->dstream = ZSTD_createDStream();
> > -       if (data->dstream == NULL) {
> > -               pr_err("Couldn't create decompression stream.\n");
> > -               return -1;
> > -       }
> > -
> > -       ret = ZSTD_initDStream(data->dstream);
> > -       if (ZSTD_isError(ret)) {
> > -               pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
> > -               return -1;
> > -       }
> > -
> > -       if (!level)
> > -               return 0;
> > -
> > -       data->cstream = ZSTD_createCStream();
> > -       if (data->cstream == NULL) {
> > -               pr_err("Couldn't create compression stream.\n");
> > -               return -1;
> > -       }
> > -
> > -       ret = ZSTD_initCStream(data->cstream, level);
> > -       if (ZSTD_isError(ret)) {
> > -               pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
> > -               return -1;
> > -       }
> > -
> > +       data->comp_level = level;
> > +       data->dstream = NULL;
> > +       data->cstream = NULL;
> >         return 0;
> >  }
> >
> > @@ -63,6 +37,21 @@ size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t
> >         ZSTD_outBuffer output;
> >         void *record;
> >
> > +       if (!data->cstream) {
> > +               data->cstream = ZSTD_createCStream();
> > +               if (data->cstream == NULL) {
> > +                       pr_err("Couldn't create compression stream.\n");
> > +                       return -1;
> > +               }
> > +
> > +               ret = ZSTD_initCStream(data->cstream, data->comp_level);
> > +               if (ZSTD_isError(ret)) {
> > +                       pr_err("Failed to initialize compression stream: %s\n",
> > +                               ZSTD_getErrorName(ret));
> > +                       return -1;
>
> I'm not sure if the callers are ready to handle the failure.

Thanks, fixed in v3.

Ian

> Thanks,
> Namhyung
>
>
> > +               }
> > +       }
> > +
> >         while (input.pos < input.size) {
> >                 record = dst;
> >                 size = process_header(record, 0);
> > @@ -96,6 +85,20 @@ size_t zstd_decompress_stream(struct zstd_data *data, void *src, size_t src_size
> >         ZSTD_inBuffer input = { src, src_size, 0 };
> >         ZSTD_outBuffer output = { dst, dst_size, 0 };
> >
> > +       if (!data->dstream) {
> > +               data->dstream = ZSTD_createDStream();
> > +               if (data->dstream == NULL) {
> > +                       pr_err("Couldn't create decompression stream.\n");
> > +                       return -1;
> > +               }
> > +
> > +               ret = ZSTD_initDStream(data->dstream);
> > +               if (ZSTD_isError(ret)) {
> > +                       pr_err("Failed to initialize decompression stream: %s\n",
> > +                               ZSTD_getErrorName(ret));
> > +                       return -1;
> > +               }
> > +       }
> >         while (input.pos < input.size) {
> >                 ret = ZSTD_decompressStream(data->dstream, &output, &input);
> >                 if (ZSTD_isError(ret)) {
> > --
> > 2.42.0.609.gbb76f46606-goog
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 09/13] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
  2023-10-18 23:20   ` Namhyung Kim
@ 2023-10-24 20:52     ` Ian Rogers
  0 siblings, 0 replies; 28+ messages in thread
From: Ian Rogers @ 2023-10-24 20:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Song Liu, Sandipan Das,
	Anshuman Khandual, James Clark, Liam Howlett, Miguel Ojeda,
	Leo Yan, German Gomez, Ravi Bangoria, Artem Savkov,
	Athira Rajeev, Andi Kleen, linux-kernel, linux-perf-users, bpf

On Wed, Oct 18, 2023 at 4:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 11, 2023 at 11:24 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Fix leak where mem_info__put wouldn't release the maps/map as used by
> > perf mem. Add exit functions and use elsewhere that the maps and map
> > are released.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/util/Build        |  1 +
> >  tools/perf/util/callchain.c  | 15 +++++----------
> >  tools/perf/util/hist.c       |  6 ++----
> >  tools/perf/util/machine.c    |  6 ++----
> >  tools/perf/util/map_symbol.c | 15 +++++++++++++++
> >  tools/perf/util/map_symbol.h |  4 ++++
> >  tools/perf/util/symbol.c     |  5 ++++-
> >  7 files changed, 33 insertions(+), 19 deletions(-)
> >  create mode 100644 tools/perf/util/map_symbol.c
> >
> > diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> > index 0ea5a9d368d4..96058f949ec9 100644
> > --- a/tools/perf/util/Build
> > +++ b/tools/perf/util/Build
> > @@ -49,6 +49,7 @@ perf-y += dso.o
> >  perf-y += dsos.o
> >  perf-y += symbol.o
> >  perf-y += symbol_fprintf.o
> > +perf-y += map_symbol.o
> >  perf-y += color.o
> >  perf-y += color_config.o
> >  perf-y += metricgroup.o
> > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > index 0a7919c2af91..02881d5b822c 100644
> > --- a/tools/perf/util/callchain.c
> > +++ b/tools/perf/util/callchain.c
> > @@ -1496,16 +1496,14 @@ static void free_callchain_node(struct callchain_node *node)
> >
> >         list_for_each_entry_safe(list, tmp, &node->parent_val, list) {
> >                 list_del_init(&list->list);
> > -               map__zput(list->ms.map);
> > -               maps__zput(list->ms.maps);
> > +               map_symbol__exit(&list->ms);
> >                 zfree(&list->brtype_stat);
> >                 free(list);
> >         }
> >
> >         list_for_each_entry_safe(list, tmp, &node->val, list) {
> >                 list_del_init(&list->list);
> > -               map__zput(list->ms.map);
> > -               maps__zput(list->ms.maps);
> > +               map_symbol__exit(&list->ms);
> >                 zfree(&list->brtype_stat);
> >                 free(list);
> >         }
> > @@ -1591,8 +1589,7 @@ int callchain_node__make_parent_list(struct callchain_node *node)
> >  out:
> >         list_for_each_entry_safe(chain, new, &head, list) {
> >                 list_del_init(&chain->list);
> > -               map__zput(chain->ms.map);
> > -               maps__zput(chain->ms.maps);
> > +               map_symbol__exit(&chain->ms);
> >                 zfree(&chain->brtype_stat);
> >                 free(chain);
> >         }
> > @@ -1676,10 +1673,8 @@ void callchain_cursor_reset(struct callchain_cursor *cursor)
> >         cursor->nr = 0;
> >         cursor->last = &cursor->first;
> >
> > -       for (node = cursor->first; node != NULL; node = node->next) {
> > -               map__zput(node->ms.map);
> > -               maps__zput(node->ms.maps);
> > -       }
> > +       for (node = cursor->first; node != NULL; node = node->next)
> > +               map_symbol__exit(&node->ms);
> >  }
> >
> >  void callchain_param_setup(u64 sample_type, const char *arch)
> > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > index ac8c0ef48a7f..d62693b8fad8 100644
> > --- a/tools/perf/util/hist.c
> > +++ b/tools/perf/util/hist.c
> > @@ -524,8 +524,7 @@ static int hist_entry__init(struct hist_entry *he,
> >                 map__put(he->mem_info->daddr.ms.map);
> >         }
> >  err:
> > -       maps__zput(he->ms.maps);
> > -       map__zput(he->ms.map);
> > +       map_symbol__exit(&he->ms);
> >         zfree(&he->stat_acc);
> >         return -ENOMEM;
> >  }
> > @@ -1317,8 +1316,7 @@ void hist_entry__delete(struct hist_entry *he)
> >         struct hist_entry_ops *ops = he->ops;
> >
> >         thread__zput(he->thread);
> > -       maps__zput(he->ms.maps);
> > -       map__zput(he->ms.map);
> > +       map_symbol__exit(&he->ms);
> >
> >         if (he->branch_info) {
> >                 map__zput(he->branch_info->from.ms.map);
>
> What about he->branch_info and he->mem_info ?
>
> Also I think we can use it in hists__account_cycles() too.

Thanks, I addressed the ones I could find. Running perf mem report I still see:

==2948587==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1296 byte(s) in 162 object(s) allocated from:
    #0 0x7f2a018d85bf in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x556104500e47 in map__get util/map.h:180
    #2 0x55610450cd6d in ip__resolve_data util/machine.c:2260
    #3 0x55610450d022 in sample__resolve_mem util/machine.c:2275
    #4 0x55610459dc73 in iter_prepare_mem_entry util/hist.c:824
    #5 0x5561045a0175 in hist_entry_iter__add util/hist.c:1238
    #6 0x5561042d8e8e in process_sample_event tools/perf/builtin-report.c:332
    #7 0x556104526a37 in evlist__deliver_sample util/session.c:1518
    #8 0x556104526d18 in machines__deliver_event util/session.c:1557
    #9 0x556104527942 in perf_session__deliver_event util/session.c:1639
    #10 0x55610451d576 in ordered_events__deliver_event util/session.c:188
    #11 0x556104536b6a in do_flush util/ordered-events.c:245
    #12 0x5561045372a8 in __ordered_events__flush util/ordered-events.c:324
    #13 0x556104537544 in ordered_events__flush util/ordered-events.c:342
    #14 0x5561045240b0 in perf_event__process_finished_round util/session.c:1080
    #15 0x556104527fd5 in perf_session__process_user_event util/session.c:1692
    #16 0x556104529396 in perf_session__process_event util/session.c:1861
    #17 0x55610452cf8c in process_simple util/session.c:2436
    #18 0x55610452c69b in reader__read_event util/session.c:2365
    #19 0x55610452cb36 in reader__process_events util/session.c:2414
    #20 0x55610452d467 in __perf_session__process_events util/session.c:2461
    #21 0x55610452ec52 in perf_session__process_events util/session.c:2627
    #22 0x5561042dd613 in __cmd_report tools/perf/builtin-report.c:992
    #23 0x5561042e7ac7 in cmd_report tools/perf/builtin-report.c:1711
    #24 0x556104343967 in report_events tools/perf/builtin-mem.c:374
    #25 0x556104345278 in cmd_mem tools/perf/builtin-mem.c:516
    #26 0x55610441848d in run_builtin tools/perf/perf.c:322
    #27 0x5561044189f9 in handle_internal_command tools/perf/perf.c:375
    #28 0x556104418dc1 in run_argv tools/perf/perf.c:419
    #29 0x556104419329 in main tools/perf/perf.c:535

Direct leak of 1296 byte(s) in 162 object(s) allocated from:
    #0 0x7f2a018d85bf in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x556104500e47 in map__get util/map.h:180
    #2 0x55610450c97e in ip__resolve_ams util/machine.c:2239
    #3 0x55610450cf1b in sample__resolve_mem util/machine.c:2274
    #4 0x55610459dc73 in iter_prepare_mem_entry util/hist.c:824
    #5 0x5561045a0175 in hist_entry_iter__add util/hist.c:1238
    #6 0x5561042d8e8e in process_sample_event tools/perf/builtin-report.c:332
    #7 0x556104526a37 in evlist__deliver_sample util/session.c:1518
    #8 0x556104526d18 in machines__deliver_event util/session.c:1557
    #9 0x556104527942 in perf_session__deliver_event util/session.c:1639
    #10 0x55610451d576 in ordered_events__deliver_event util/session.c:188
    #11 0x556104536b6a in do_flush util/ordered-events.c:245
    #12 0x5561045372a8 in __ordered_events__flush util/ordered-events.c:324
    #13 0x556104537544 in ordered_events__flush util/ordered-events.c:342
    #14 0x5561045240b0 in perf_event__process_finished_round util/session.c:1080
    #15 0x556104527fd5 in perf_session__process_user_event util/session.c:1692
    #16 0x556104529396 in perf_session__process_event util/session.c:1861
    #17 0x55610452cf8c in process_simple util/session.c:2436
    #18 0x55610452c69b in reader__read_event util/session.c:2365
    #19 0x55610452cb36 in reader__process_events util/session.c:2414
    #20 0x55610452d467 in __perf_session__process_events util/session.c:2461
    #21 0x55610452ec52 in perf_session__process_events util/session.c:2627
    #22 0x5561042dd613 in __cmd_report tools/perf/builtin-report.c:992
    #23 0x5561042e7ac7 in cmd_report tools/perf/builtin-report.c:1711
    #24 0x556104343967 in report_events tools/perf/builtin-mem.c:374
    #25 0x556104345278 in cmd_mem tools/perf/builtin-mem.c:516
    #26 0x55610441848d in run_builtin tools/perf/perf.c:322
    #27 0x5561044189f9 in handle_internal_command tools/perf/perf.c:375
    #28 0x556104418dc1 in run_argv tools/perf/perf.c:419
    #29 0x556104419329 in main tools/perf/perf.c:535

Which looks like iter->priv was freed without puts. The use of void*s
and the general layout of the histogram code I find confusing. It
would be nice to get this resolved but I wasn't able to do it in v3.

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 8e5085b77c7b..6ca7500e2cf4 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -2389,8 +2389,7 @@ static int add_callchain_ip(struct thread *thread,
> >                                       iter_cycles, branch_from, srcline);
> >  out:
> >         addr_location__exit(&al);
> > -       maps__put(ms.maps);
> > -       map__put(ms.map);
> > +       map_symbol__exit(&ms);
> >         return err;
> >  }
> >
> > @@ -3116,8 +3115,7 @@ static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms
> >                 if (ret != 0)
> >                         return ret;
> >         }
> > -       map__put(ilist_ms.map);
> > -       maps__put(ilist_ms.maps);
> > +       map_symbol__exit(&ilist_ms);
> >
> >         return ret;
> >  }
> > diff --git a/tools/perf/util/map_symbol.c b/tools/perf/util/map_symbol.c
> > new file mode 100644
> > index 000000000000..bef5079f2403
> > --- /dev/null
> > +++ b/tools/perf/util/map_symbol.c
> > @@ -0,0 +1,15 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include "map_symbol.h"
> > +#include "maps.h"
> > +#include "map.h"
> > +
> > +void map_symbol__exit(struct map_symbol *ms)
> > +{
> > +       maps__zput(ms->maps);
> > +       map__zput(ms->map);
> > +}
> > +
> > +void addr_map_symbol__exit(struct addr_map_symbol *ams)
> > +{
> > +       map_symbol__exit(&ams->ms);
> > +}
> > diff --git a/tools/perf/util/map_symbol.h b/tools/perf/util/map_symbol.h
> > index e08817b0c30f..72d5ed938ed6 100644
> > --- a/tools/perf/util/map_symbol.h
> > +++ b/tools/perf/util/map_symbol.h
> > @@ -22,4 +22,8 @@ struct addr_map_symbol {
> >         u64           phys_addr;
> >         u64           data_page_size;
> >  };
> > +
> > +void map_symbol__exit(struct map_symbol *ms);
> > +void addr_map_symbol__exit(struct addr_map_symbol *ams);
> > +
> >  #endif // __PERF_MAP_SYMBOL
> > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > index 2740d4457c13..d67a87072eec 100644
> > --- a/tools/perf/util/symbol.c
> > +++ b/tools/perf/util/symbol.c
> > @@ -2790,8 +2790,11 @@ struct mem_info *mem_info__get(struct mem_info *mi)
> >
> >  void mem_info__put(struct mem_info *mi)
> >  {
> > -       if (mi && refcount_dec_and_test(&mi->refcnt))
> > +       if (mi && refcount_dec_and_test(&mi->refcnt)) {
> > +               addr_map_symbol__exit(&mi->iaddr);
> > +               addr_map_symbol__exit(&mi->daddr);
> >                 free(mi);
> > +       }
> >  }
> >
> >  struct mem_info *mem_info__new(void)
> > --
> > 2.42.0.609.gbb76f46606-goog
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-10-24 20:52 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-12  6:23 [PATCH v2 00/13] Improvements to memory use Ian Rogers
2023-10-12  6:23 ` [PATCH v2 01/13] perf machine: Avoid out of bounds LBR memory read Ian Rogers
2023-10-12  6:23 ` [PATCH v2 02/13] libperf rc_check: Make implicit enabling work for GCC Ian Rogers
2023-10-12  6:23 ` [PATCH v2 03/13] perf hist: Add missing puts to hist__account_cycles Ian Rogers
2023-10-18 23:16   ` Namhyung Kim
2023-10-24 16:12     ` Ian Rogers
2023-10-12  6:23 ` [PATCH v2 04/13] perf threads: Remove unused dead thread list Ian Rogers
2023-10-12  6:23 ` [PATCH v2 05/13] perf offcpu: Add missed btf_free Ian Rogers
2023-10-12  6:23 ` [PATCH v2 06/13] perf callchain: Make display use of branch_type_stat const Ian Rogers
2023-10-12  6:23 ` [PATCH v2 07/13] perf callchain: Make brtype_stat in callchain_list optional Ian Rogers
2023-10-12  6:23 ` [PATCH v2 08/13] perf callchain: Minor layout changes to callchain_list Ian Rogers
2023-10-12  6:23 ` [PATCH v2 09/13] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit Ian Rogers
2023-10-18 23:20   ` Namhyung Kim
2023-10-24 20:52     ` Ian Rogers
2023-10-12  6:23 ` [PATCH v2 10/13] perf record: Lazy load kernel symbols Ian Rogers
2023-10-19 11:02   ` Adrian Hunter
2023-10-24 18:42     ` Ian Rogers
2023-10-12  6:23 ` [PATCH v2 11/13] libperf: Lazily allocate mmap event copy Ian Rogers
2023-10-12  6:23 ` [PATCH v2 12/13] perf mmap: Lazily initialize zstd streams Ian Rogers
2023-10-18 23:21   ` Namhyung Kim
2023-10-24 19:21     ` Ian Rogers
2023-10-12  6:23 ` [PATCH v2 13/13] perf machine thread: Remove exited threads by default Ian Rogers
2023-10-18 23:30   ` Namhyung Kim
2023-10-19  0:39     ` Ian Rogers
2023-10-20 21:08       ` Namhyung Kim
2023-10-23 14:24   ` Adrian Hunter
2023-10-23 18:49     ` Ian Rogers
2023-10-24  5:19       ` Adrian Hunter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).