* [PATCH v4 00/53] Improvements to memory use
@ 2023-11-02 17:56 Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 01/53] perf comm: Use regular mutex Ian Rogers
                   ` (52 more replies)
  0 siblings, 53 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Fix memory leaks detected by address/leak sanitizer affecting LBR
call-graphs, perf mem and BPF offcpu.

Make branch_type_stat in callchain_list optional as it is large and
not always necessary - in particular it isn't used by perf top.

Make the allocations of zstd streams, kernel symbols and event copies
lazier in order to save memory in cases like perf record.

Handle the thread exit event and have it remove the thread from the
machine's threads set. Don't do this for perf report, as it causes a
regression for task lists, which assume threads are never removed from
the machine's set, and for offcpu events, which may synthesize samples
for threads that have exited.

Avoid using 8kb buffers in filename__read_str, which is excessive for
reading CPU maps. Add io_dir as an allocation-free readdir
replacement; opendir allocates 32kb by default and the code uses it
recursively.

Shrink perf map by using a byte holding two values to replace two
function pointers. Modify the implementation of maps to use a sorted
array rather than an rbtree as the container for maps. Fix locking and
reference counting issues.

Similar to maps, separate out and reimplement threads to use a
hashmap for lower memory consumption and faster lookup. This fixes a
regression in memory usage from when reference count checking switched
to using non-invasive tree nodes. Reduce the table's default size by
32 times and improve locking discipline. Also, fix regressions where
tids had become unordered, to make `perf report --tasks` and
`perf trace --summary` output easier to read.

Better encapsulate the dsos abstraction. Replace the linked list and
rbtree, used for fast iteration and log(n) lookup, with a sorted array
that gives similar performance but half the memory usage per dso.
Improve reference counting and locking discipline, adding reference
count checking to dso. Experimented with, but abandoned, a hashmap
implementation due to the need for extra storage and the keys not
being stable.

The overall effect is to reduce memory consumption significantly for
perf top: with call graphs enabled it runs longer before 1GB of memory
is consumed. For a perf record of 'true', max resident memory
consumption goes from 39912kb to 20096kb, nearly halved. perf inject
-b on a system-wide perf record of 'true' reduces the max resident by
roughly 4.5% (3.4% in v4 due to branch_type_stat changes being
merged). This is all while improving correctness with locking
discipline and reference count checking.

Patch organization (v4):
 - 53 patches is a lot; the patches aren't split into separate series as they
   would merge conflict, and later patches, for example in dsos, rely on the
   changes and fixes to maps.
 - the dso reference count checking patch is larger due to switching use of
   dso to accessors, in order to encapsulate the reference count checker
   macros. The reference count checking changes within this largely mechanical
   change amount to a few lines and so weren't separated.
 - the first patch contains a build fix, missed from v3, for when the rwsem
   error checking is enabled.
 - the next patches are an assortment of memory size fixes.
 - the next patches are the refactoring of maps.
 - the next patches are the refactoring of threads.
 - the next patches are the refactoring of dsos.
 - finally, reference count checking is added to dso and some lock/reference
   count issues are resolved. This is done after changing the data structures
   as, for example, the single pointer to an array is easier to add reference
   count checking to than the previous 5 pointers.

v4: Rebased as 11 changes moved to perf-tools-next. Address comments
    from v3 such as error checking on zstd streams. Improve the
    dsos/dso in ways similar to threads and maps, with the addition of
    reference count checking on dso.
v3: Additional memory/speed improvements, in particular for maps and
    threads. Address review comments from namhyung@kernel.org and
    adrian.hunter@intel.com.
v2: Add additional memory fixes on top of initial LBR and rc check
    fixes.

Ian Rogers (53):
  perf comm: Use regular mutex
  perf record: Lazy load kernel symbols
  libperf: Lazily allocate mmap event copy
  perf mmap: Lazily initialize zstd streams
  perf machine thread: Remove exited threads by default
  tools api fs: Switch filename__read_str to use io.h
  tools api fs: Avoid reading whole file for a 1 byte bool
  tools lib api: Add io_dir an allocation free readdir alternative
  perf maps: Switch modules tree walk to io_dir__readdir
  perf record: Be lazier in allocating lost samples buffer
  perf pmu: Switch to io_dir__readdir
  perf bpf: Don't synthesize BPF events when disabled
  perf header: Switch mem topology to io_dir__readdir
  perf events: Remove scandir in thread synthesis
  perf map: Simplify map_ip/unmap_ip and make map size smaller
  perf maps: Move symbol maps functions to maps.c
  perf thread: Add missing RC_CHK_EQUAL
  perf maps: Add maps__for_each_map to call a function on each entry
  perf maps: Add remove maps function to remove a map based on callback
  perf debug: Expose debug file
  perf maps: Refactor maps__fixup_overlappings
  perf maps: Do simple merge if given map doesn't overlap
  perf maps: Rename clone to copy from
  perf maps: Add maps__load_first
  perf maps: Add find next entry to give entry after the given map
  perf maps: Reduce scope of map_rb_node and maps internals
  perf maps: Fix up overlaps during fixup_end
  perf maps: Switch from rbtree to lazily sorted array for addresses
  perf maps: Get map before returning in maps__find
  perf maps: Get map before returning in maps__find_by_name
  perf maps: Get map before returning in maps__find_next_entry
  perf maps: Hide maps internals
  perf maps: Locking tidy up of nr_maps
  perf dso: Reorder variables to save space in struct dso
  perf report: Sort child tasks by tid
  perf trace: Ignore thread hashing in summary
  perf machine: Move fprintf to for_each loop and a callback
  perf threads: Move threads to its own files
  perf threads: Switch from rbtree to hashmap
  perf threads: Reduce table size from 256 to 8
  perf dsos: Attempt to better abstract dsos internals
  perf dsos: Tidy reference counting and locking
  perf dsos: Add dsos__for_each_dso
  perf dso: Move dso functions out of dsos
  perf dsos: Switch more loops to dsos__for_each_dso
  perf dsos: Switch backing storage to array from rbtree/list
  perf dsos: Remove __dsos__addnew
  perf dsos: Remove __dsos__findnew_link_by_longname_id
  perf dsos: Switch hand code to bsearch
  perf dso: Add reference count checking and accessor functions
  perf dso: Reference counting related fixes
  perf dso: Use container_of to avoid a pointer in dso_data
  perf env: Avoid recursively taking env->bpf_progs.lock

 tools/lib/api/Makefile                        |    2 +-
 tools/lib/api/fs/fs.c                         |   74 +-
 tools/lib/api/io.h                            |    9 +-
 tools/lib/api/io_dir.h                        |   75 +
 tools/lib/perf/include/internal/mmap.h        |    2 +-
 tools/lib/perf/mmap.c                         |    9 +
 tools/perf/arch/x86/tests/dwarf-unwind.c      |    1 +
 tools/perf/arch/x86/util/event.c              |  103 +-
 tools/perf/builtin-annotate.c                 |    6 +-
 tools/perf/builtin-buildid-cache.c            |    2 +-
 tools/perf/builtin-buildid-list.c             |   18 +-
 tools/perf/builtin-inject.c                   |  102 +-
 tools/perf/builtin-kallsyms.c                 |    2 +-
 tools/perf/builtin-mem.c                      |    4 +-
 tools/perf/builtin-record.c                   |   59 +-
 tools/perf/builtin-report.c                   |  250 ++--
 tools/perf/builtin-script.c                   |    8 +-
 tools/perf/builtin-top.c                      |    4 +-
 tools/perf/builtin-trace.c                    |   41 +-
 tools/perf/tests/code-reading.c               |    8 +-
 tools/perf/tests/dso-data.c                   |   63 +-
 tools/perf/tests/hists_common.c               |    6 +-
 tools/perf/tests/hists_cumulate.c             |    4 +-
 tools/perf/tests/hists_output.c               |    2 +-
 tools/perf/tests/maps.c                       |   64 +-
 tools/perf/tests/symbols.c                    |    2 +-
 tools/perf/tests/thread-maps-share.c          |    8 +-
 tools/perf/tests/vmlinux-kallsyms.c           |  181 +--
 tools/perf/ui/browsers/annotate.c             |    6 +-
 tools/perf/ui/browsers/hists.c                |    8 +-
 tools/perf/ui/browsers/map.c                  |    4 +-
 tools/perf/util/Build                         |    1 +
 tools/perf/util/annotate.c                    |   44 +-
 tools/perf/util/auxtrace.c                    |    2 +-
 tools/perf/util/block-info.c                  |    2 +-
 tools/perf/util/bpf-event.c                   |   20 +-
 tools/perf/util/bpf-event.h                   |   12 +-
 tools/perf/util/bpf_lock_contention.c         |   10 +-
 tools/perf/util/build-id.c                    |  136 +-
 tools/perf/util/build-id.h                    |    2 -
 tools/perf/util/callchain.c                   |    4 +-
 tools/perf/util/comm.c                        |   10 +-
 tools/perf/util/compress.h                    |    6 +-
 tools/perf/util/data-convert-json.c           |    2 +-
 tools/perf/util/db-export.c                   |    6 +-
 tools/perf/util/debug.c                       |   22 +-
 tools/perf/util/debug.h                       |    1 +
 tools/perf/util/dlfilter.c                    |   12 +-
 tools/perf/util/dso.c                         |  468 ++++---
 tools/perf/util/dso.h                         |  544 ++++++--
 tools/perf/util/dsos.c                        |  529 ++++---
 tools/perf/util/dsos.h                        |   40 +-
 tools/perf/util/env.c                         |   53 +-
 tools/perf/util/env.h                         |    4 +
 tools/perf/util/event.c                       |   16 +-
 tools/perf/util/header.c                      |   47 +-
 tools/perf/util/hist.c                        |    4 +-
 tools/perf/util/intel-pt.c                    |   22 +-
 tools/perf/util/machine.c                     |  662 +++------
 tools/perf/util/machine.h                     |   32 +-
 tools/perf/util/map.c                         |   93 +-
 tools/perf/util/map.h                         |   83 +-
 tools/perf/util/maps.c                        | 1239 +++++++++++++----
 tools/perf/util/maps.h                        |   95 +-
 tools/perf/util/mmap.c                        |    5 +-
 tools/perf/util/mmap.h                        |    1 -
 tools/perf/util/pmu.c                         |   48 +-
 tools/perf/util/pmus.c                        |   30 +-
 tools/perf/util/probe-event.c                 |   62 +-
 tools/perf/util/rb_resort.h                   |    5 -
 .../scripting-engines/trace-event-python.c    |   21 +-
 tools/perf/util/session.c                     |   26 +
 tools/perf/util/session.h                     |    2 +
 tools/perf/util/sort.c                        |   19 +-
 tools/perf/util/srcline.c                     |   65 +-
 tools/perf/util/symbol-elf.c                  |  138 +-
 tools/perf/util/symbol.c                      |  521 ++-----
 tools/perf/util/symbol.h                      |    1 -
 tools/perf/util/symbol_conf.h                 |    4 +-
 tools/perf/util/symbol_fprintf.c              |    4 +-
 tools/perf/util/synthetic-events.c            |  156 ++-
 tools/perf/util/thread.c                      |   48 +-
 tools/perf/util/thread.h                      |   20 +-
 tools/perf/util/threads.c                     |  186 +++
 tools/perf/util/threads.h                     |   35 +
 tools/perf/util/unwind-libunwind-local.c      |   50 +-
 tools/perf/util/unwind-libunwind.c            |    9 +-
 tools/perf/util/vdso.c                        |   89 +-
 tools/perf/util/zstd.c                        |   63 +-
 89 files changed, 4129 insertions(+), 2829 deletions(-)
 create mode 100644 tools/lib/api/io_dir.h
 create mode 100644 tools/perf/util/threads.c
 create mode 100644 tools/perf/util/threads.h

-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 01/53] perf comm: Use regular mutex
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-05 17:31   ` Namhyung Kim
  2023-11-02 17:56 ` [PATCH v4 02/53] perf record: Lazy load kernel symbols Ian Rogers
                   ` (51 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

The rwsem is only ever used for writing, so switch to a mutex that
has better error checking.
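
As background, a minimal sketch of what the error-checking attribute
buys, using plain pthreads rather than perf's mutex wrappers (the demo
function is illustrative, not part of this patch):

	#define _GNU_SOURCE
	#include <assert.h>
	#include <errno.h>
	#include <pthread.h>

	static pthread_mutex_t m = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;

	static void demo(void)
	{
		pthread_mutex_lock(&m);
		/* Relocking from the owning thread fails with EDEADLK rather
		 * than deadlocking; unlocking a mutex the thread doesn't own
		 * fails with EPERM. */
		assert(pthread_mutex_lock(&m) == EDEADLK);
		pthread_mutex_unlock(&m);
	}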

Fixes: 7a8f349e9d14 ("perf rwsem: Add debug mode that uses a mutex")
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/comm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
index afb8d4fd2644..4ae7bc2aa9a6 100644
--- a/tools/perf/util/comm.c
+++ b/tools/perf/util/comm.c
@@ -17,7 +17,7 @@ struct comm_str {
 
 /* Should perhaps be moved to struct machine */
 static struct rb_root comm_str_root;
-static struct rw_semaphore comm_str_lock = {.lock = PTHREAD_RWLOCK_INITIALIZER,};
+static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
 
 static struct comm_str *comm_str__get(struct comm_str *cs)
 {
@@ -30,9 +30,9 @@ static struct comm_str *comm_str__get(struct comm_str *cs)
 static void comm_str__put(struct comm_str *cs)
 {
 	if (cs && refcount_dec_and_test(&cs->refcnt)) {
-		down_write(&comm_str_lock);
+		mutex_lock(&comm_str_lock);
 		rb_erase(&cs->rb_node, &comm_str_root);
-		up_write(&comm_str_lock);
+		mutex_unlock(&comm_str_lock);
 		zfree(&cs->str);
 		free(cs);
 	}
@@ -98,9 +98,9 @@ static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
 {
 	struct comm_str *cs;
 
-	down_write(&comm_str_lock);
+	mutex_lock(&comm_str_lock);
 	cs = __comm_str__findnew(str, root);
-	up_write(&comm_str_lock);
+	mutex_unlock(&comm_str_lock);
 
 	return cs;
 }
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 02/53] perf record: Lazy load kernel symbols
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 01/53] perf comm: Use regular mutex Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-05 17:34   ` Namhyung Kim
  2023-11-06 11:00   ` Adrian Hunter
  2023-11-02 17:56 ` [PATCH v4 03/53] libperf: Lazily allocate mmap event copy Ian Rogers
                   ` (50 subsequent siblings)
  52 siblings, 2 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
changed it so that loading a kernel dso would cause the symbols for
the dso to be eagerly loaded. For perf record this is overhead as the
symbols won't be used. Add a symbol_conf to control the behavior and
disable it for perf record and perf inject.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-inject.c   | 6 ++++++
 tools/perf/builtin-record.c   | 2 ++
 tools/perf/util/event.c       | 4 ++--
 tools/perf/util/symbol_conf.h | 3 ++-
 4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index c8cf2fdd9cff..eb3ef5c24b66 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -2265,6 +2265,12 @@ int cmd_inject(int argc, const char **argv)
 		"perf inject [<options>]",
 		NULL
 	};
+
+	if (!inject.itrace_synth_opts.set) {
+		/* Disable eager loading of kernel symbols that adds overhead to perf inject. */
+		symbol_conf.lazy_load_kernel_maps = true;
+	}
+
 #ifndef HAVE_JITDUMP
 	set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
 #endif
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index dcf288a4fb9a..8ec818568662 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
 # undef set_nobuild
 #endif
 
+	/* Disable eager loading of kernel symbols that adds overhead to perf record. */
+	symbol_conf.lazy_load_kernel_maps = true;
 	rec->opts.affinity = PERF_AFFINITY_SYS;
 
 	rec->evlist = evlist__new();
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 923c0fb15122..68f45e9e63b6 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
 		maps = machine__kernel_maps(machine);
-		load_map = true;
+		load_map = !symbol_conf.lazy_load_kernel_maps;
 	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
 		al->level = '.';
 	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
 		al->level = 'g';
 		maps = machine__kernel_maps(machine);
-		load_map = true;
+		load_map = !symbol_conf.lazy_load_kernel_maps;
 	} else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
 		al->level = 'u';
 	} else {
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 0b589570d1d0..2b2fb9e224b0 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -42,7 +42,8 @@ struct symbol_conf {
 			inline_name,
 			disable_add2line_warn,
 			buildid_mmap2,
-			guest_code;
+			guest_code,
+			lazy_load_kernel_maps;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 03/53] libperf: Lazily allocate mmap event copy
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 01/53] perf comm: Use regular mutex Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 02/53] perf record: Lazy load kernel symbols Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-03  8:32   ` Guilherme Amadio
  2023-11-02 17:56 ` [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams Ian Rogers
                   ` (49 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

The event copy in the mmap is used to provide storage for reading an
event. Not all users of mmaps read the events, perf record for
example, so switch the allocation to happen on first read rather than
embedding the buffer within the perf_mmap.
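
For reference, the allocation now happens on libperf's usual read
path; a rough sketch of that loop (an already mmap'd map is assumed,
error handling elided):

	union perf_event *event;

	if (perf_mmap__read_init(map) == 0) {
		while ((event = perf_mmap__read_event(map)) != NULL) {
			/* event may point into the lazily allocated
			 * event_copy when a record wraps the ring buffer. */
			perf_mmap__consume(map);
		}
		perf_mmap__read_done(map);
	}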

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/include/internal/mmap.h | 2 +-
 tools/lib/perf/mmap.c                  | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
index 5a062af8e9d8..b11aaf5ed645 100644
--- a/tools/lib/perf/include/internal/mmap.h
+++ b/tools/lib/perf/include/internal/mmap.h
@@ -33,7 +33,7 @@ struct perf_mmap {
 	bool			 overwrite;
 	u64			 flush;
 	libperf_unmap_cb_t	 unmap_cb;
-	char			 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+	void			*event_copy;
 	struct perf_mmap	*next;
 };
 
diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index 2184814b37dd..91ae46aac378 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
 
 void perf_mmap__munmap(struct perf_mmap *map)
 {
 	if (map && map->base != NULL) {
+		free(map->event_copy);
+		map->event_copy = NULL;
 		munmap(map->base, perf_mmap__mmap_len(map));
 		map->base = NULL;
@@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
 			unsigned int len = min(sizeof(*event), size), cpy;
 			void *dst = map->event_copy;
 
+			if (!dst) {
+				dst = malloc(PERF_SAMPLE_MAX_SIZE);
+				if (!dst)
+					return NULL;
+				map->event_copy = dst;
+			}
+
 			do {
 				cpy = min(map->mask + 1 - (offset & map->mask), len);
 				memcpy(dst, &data[offset & map->mask], cpy);
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (2 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 03/53] libperf: Lazily allocate mmap event copy Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-27 22:00   ` Arnaldo Carvalho de Melo
  2023-11-02 17:56 ` [PATCH v4 05/53] perf machine thread: Remove exited threads by default Ian Rogers
                   ` (48 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Zstd streams create dictionaries that can require significant RAM,
especially when there is one per CPU. Tools like perf record won't use
the streams without the -z option, so the creation of the streams is
pure overhead. Switch to creating the streams on first use.
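
A sketch of the resulting lifecycle (arguments abbreviated;
process_comp_header is the callback used in builtin-record.c):

	struct zstd_data data;

	/* Now only records the level, no ZSTD_* streams are created. */
	zstd_init(&data, /*level=*/1);

	/* The cstream is created and initialized on the first call. */
	if (zstd_compress_stream_to_records(&data, dst, dst_size, src,
					    src_size, max_record_size,
					    process_comp_header) < 0)
		/* handle error */;

	zstd_fini(&data);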

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-record.c | 26 ++++++++++-----
 tools/perf/util/compress.h  |  6 ++--
 tools/perf/util/mmap.c      |  5 ++-
 tools/perf/util/mmap.h      |  1 -
 tools/perf/util/zstd.c      | 63 +++++++++++++++++++------------------
 5 files changed, 58 insertions(+), 43 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ec818568662..9b4f3805ca92 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -270,7 +270,7 @@ static int record__write(struct record *rec, struct mmap *map __maybe_unused,
 
 static int record__aio_enabled(struct record *rec);
 static int record__comp_enabled(struct record *rec);
-static size_t zstd_compress(struct perf_session *session, struct mmap *map,
+static ssize_t zstd_compress(struct perf_session *session, struct mmap *map,
 			    void *dst, size_t dst_size, void *src, size_t src_size);
 
 #ifdef HAVE_AIO_SUPPORT
@@ -405,9 +405,13 @@ static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size
 	 */
 
 	if (record__comp_enabled(aio->rec)) {
-		size = zstd_compress(aio->rec->session, NULL, aio->data + aio->size,
-				     mmap__mmap_len(map) - aio->size,
-				     buf, size);
+		ssize_t compressed = zstd_compress(aio->rec->session, NULL, aio->data + aio->size,
+						   mmap__mmap_len(map) - aio->size,
+						   buf, size);
+		if (compressed < 0)
+			return (int)compressed;
+
+		size = compressed;
 	} else {
 		memcpy(aio->data + aio->size, buf, size);
 	}
@@ -633,7 +637,13 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size)
 	struct record *rec = to;
 
 	if (record__comp_enabled(rec)) {
-		size = zstd_compress(rec->session, map, map->data, mmap__mmap_len(map), bf, size);
+		ssize_t compressed = zstd_compress(rec->session, map, map->data,
+						   mmap__mmap_len(map), bf, size);
+
+		if (compressed < 0)
+			return (int)compressed;
+
+		size = compressed;
 		bf   = map->data;
 	}
 
@@ -1527,10 +1537,10 @@ static size_t process_comp_header(void *record, size_t increment)
 	return size;
 }
 
-static size_t zstd_compress(struct perf_session *session, struct mmap *map,
+static ssize_t zstd_compress(struct perf_session *session, struct mmap *map,
 			    void *dst, size_t dst_size, void *src, size_t src_size)
 {
-	size_t compressed;
+	ssize_t compressed;
 	size_t max_record_size = PERF_SAMPLE_MAX_SIZE - sizeof(struct perf_record_compressed) - 1;
 	struct zstd_data *zstd_data = &session->zstd_data;
 
@@ -1539,6 +1549,8 @@ static size_t zstd_compress(struct perf_session *session, struct mmap *map,
 
 	compressed = zstd_compress_stream_to_records(zstd_data, dst, dst_size, src, src_size,
 						     max_record_size, process_comp_header);
+	if (compressed < 0)
+		return compressed;
 
 	if (map && map->file) {
 		thread->bytes_transferred += src_size;
diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h
index 0cd3369af2a4..9eb6eb5bf038 100644
--- a/tools/perf/util/compress.h
+++ b/tools/perf/util/compress.h
@@ -3,6 +3,7 @@
 #define PERF_COMPRESS_H
 
 #include <stdbool.h>
+#include <stdlib.h>
 #ifdef HAVE_ZSTD_SUPPORT
 #include <zstd.h>
 #endif
@@ -21,6 +22,7 @@ struct zstd_data {
 #ifdef HAVE_ZSTD_SUPPORT
 	ZSTD_CStream	*cstream;
 	ZSTD_DStream	*dstream;
+	int comp_level;
 #endif
 };
 
@@ -29,7 +31,7 @@ struct zstd_data {
 int zstd_init(struct zstd_data *data, int level);
 int zstd_fini(struct zstd_data *data);
 
-size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
+ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
 				       void *src, size_t src_size, size_t max_record_size,
 				       size_t process_header(void *record, size_t increment));
 
@@ -48,7 +50,7 @@ static inline int zstd_fini(struct zstd_data *data __maybe_unused)
 }
 
 static inline
-size_t zstd_compress_stream_to_records(struct zstd_data *data __maybe_unused,
+ssize_t zstd_compress_stream_to_records(struct zstd_data *data __maybe_unused,
 				       void *dst __maybe_unused, size_t dst_size __maybe_unused,
 				       void *src __maybe_unused, size_t src_size __maybe_unused,
 				       size_t max_record_size __maybe_unused,
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 49093b21ee2d..122ee198a86e 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -295,15 +295,14 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu
 
 	map->core.flush = mp->flush;
 
-	map->comp_level = mp->comp_level;
 #ifndef PYTHON_PERF
-	if (zstd_init(&map->zstd_data, map->comp_level)) {
+	if (zstd_init(&map->zstd_data, mp->comp_level)) {
 		pr_debug2("failed to init mmap compressor, error %d\n", errno);
 		return -1;
 	}
 #endif
 
-	if (map->comp_level && !perf_mmap__aio_enabled(map)) {
+	if (mp->comp_level && !perf_mmap__aio_enabled(map)) {
 		map->data = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE,
 				 MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
 		if (map->data == MAP_FAILED) {
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index f944c3cd5efa..0df6e1621c7e 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -39,7 +39,6 @@ struct mmap {
 #endif
 	struct mmap_cpu_mask	affinity_mask;
 	void		*data;
-	int		comp_level;
 	struct perf_data_file *file;
 	struct zstd_data      zstd_data;
 };
diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
index 48dd2b018c47..57027e0ac7b6 100644
--- a/tools/perf/util/zstd.c
+++ b/tools/perf/util/zstd.c
@@ -7,35 +7,9 @@
 
 int zstd_init(struct zstd_data *data, int level)
 {
-	size_t ret;
-
-	data->dstream = ZSTD_createDStream();
-	if (data->dstream == NULL) {
-		pr_err("Couldn't create decompression stream.\n");
-		return -1;
-	}
-
-	ret = ZSTD_initDStream(data->dstream);
-	if (ZSTD_isError(ret)) {
-		pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
-		return -1;
-	}
-
-	if (!level)
-		return 0;
-
-	data->cstream = ZSTD_createCStream();
-	if (data->cstream == NULL) {
-		pr_err("Couldn't create compression stream.\n");
-		return -1;
-	}
-
-	ret = ZSTD_initCStream(data->cstream, level);
-	if (ZSTD_isError(ret)) {
-		pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
-		return -1;
-	}
-
+	data->comp_level = level;
+	data->dstream = NULL;
+	data->cstream = NULL;
 	return 0;
 }
 
@@ -54,7 +28,7 @@ int zstd_fini(struct zstd_data *data)
 	return 0;
 }
 
-size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
+ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
 				       void *src, size_t src_size, size_t max_record_size,
 				       size_t process_header(void *record, size_t increment))
 {
@@ -63,6 +37,21 @@ size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t
 	ZSTD_outBuffer output;
 	void *record;
 
+	if (!data->cstream) {
+		data->cstream = ZSTD_createCStream();
+		if (data->cstream == NULL) {
+			pr_err("Couldn't create compression stream.\n");
+			return -1;
+		}
+
+		ret = ZSTD_initCStream(data->cstream, data->comp_level);
+		if (ZSTD_isError(ret)) {
+			pr_err("Failed to initialize compression stream: %s\n",
+				ZSTD_getErrorName(ret));
+			return -1;
+		}
+	}
+
 	while (input.pos < input.size) {
 		record = dst;
 		size = process_header(record, 0);
@@ -96,6 +85,20 @@ size_t zstd_decompress_stream(struct zstd_data *data, void *src, size_t src_size
 	ZSTD_inBuffer input = { src, src_size, 0 };
 	ZSTD_outBuffer output = { dst, dst_size, 0 };
 
+	if (!data->dstream) {
+		data->dstream = ZSTD_createDStream();
+		if (data->dstream == NULL) {
+			pr_err("Couldn't create decompression stream.\n");
+			return 0;
+		}
+
+		ret = ZSTD_initDStream(data->dstream);
+		if (ZSTD_isError(ret)) {
+			pr_err("Failed to initialize decompression stream: %s\n",
+				ZSTD_getErrorName(ret));
+			return 0;
+		}
+	}
 	while (input.pos < input.size) {
 		ret = ZSTD_decompressStream(data->dstream, &output, &input);
 		if (ZSTD_isError(ret)) {
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 05/53] perf machine thread: Remove exited threads by default
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (3 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-06 11:28   ` Adrian Hunter
  2023-11-02 17:56 ` [PATCH v4 06/53] tools api fs: Switch filename__read_str to use io.h Ian Rogers
                   ` (47 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

struct thread values hold onto references to mmaps, dsos, etc. When a
thread exits it is necessary to clean all of this memory up by
removing the thread from the machine's threads. Some tools require
that this not happen, such as auxtrace events, perf report if offcpu
events exist or if a task list is being generated, so add a
symbol_conf value to make the behavior optional. When an exited thread
is left in the machine's threads, mark it as exited.

This change relates to commit 40826c45eb0b ("perf thread: Remove
notion of dead threads"). Dead threads were removed as they had a
reference count of 0 and were difficult to reason about with the
reference count checker. Here a thread is removed from threads when it
exits, unless symbol_conf says to keep it, in which case the thread
isn't removed and is instead marked as exited. Reference counting
behaves as it normally does.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-report.c   |  7 +++++++
 tools/perf/util/machine.c     | 10 +++++++---
 tools/perf/util/session.c     |  5 +++++
 tools/perf/util/symbol_conf.h |  3 ++-
 tools/perf/util/thread.h      | 14 ++++++++++++++
 5 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 9cb1da2dc0c0..121a2781323c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1426,6 +1426,13 @@ int cmd_report(int argc, const char **argv)
 	if (ret < 0)
 		goto exit;
 
+	/*
+	 * tasks_mode requires access to exited threads to list those that are in
+	 * the data file. Off-cpu events are synthesized after other events and
+	 * reference exited threads.
+	 */
+	symbol_conf.keep_exited_threads = true;
+
 	annotation_options__init(&report.annotation_opts);
 
 	ret = perf_config(report__config, &report);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 90c750150b19..a985d004aa8d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
 	if (dump_trace)
 		perf_event__fprintf_task(event, stdout);
 
-	if (thread != NULL)
-		thread__put(thread);
-
+	if (thread != NULL) {
+		if (symbol_conf.keep_exited_threads)
+			thread__set_exited(thread, /*exited=*/true);
+		else
+			machine__remove_thread(machine, thread);
+	}
+	thread__put(thread);
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1e9aa8ed15b6..c6afba7ab1a5 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -115,6 +115,11 @@ static int perf_session__open(struct perf_session *session, int repipe_fd)
 		return -1;
 	}
 
+	if (perf_header__has_feat(&session->header, HEADER_AUXTRACE)) {
+		/* Auxiliary events may reference exited threads, so hold onto dead ones. */
+		symbol_conf.keep_exited_threads = true;
+	}
+
 	if (perf_data__is_pipe(data))
 		return 0;
 
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 2b2fb9e224b0..6040286e07a6 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -43,7 +43,8 @@ struct symbol_conf {
 			disable_add2line_warn,
 			buildid_mmap2,
 			guest_code,
-			lazy_load_kernel_maps;
+			lazy_load_kernel_maps,
+			keep_exited_threads;
 	const char	*vmlinux_name,
 			*kallsyms_name,
 			*source_prefix,
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index e79225a0ea46..0df775b5c110 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -36,13 +36,22 @@ struct thread_rb_node {
 };
 
 DECLARE_RC_STRUCT(thread) {
+	/** @maps: mmaps associated with this thread. */
 	struct maps		*maps;
 	pid_t			pid_; /* Not all tools update this */
+	/** @tid: thread ID number unique to a machine. */
 	pid_t			tid;
+	/** @ppid: parent process of the process this thread belongs to. */
 	pid_t			ppid;
 	int			cpu;
 	int			guest_cpu; /* For QEMU thread */
 	refcount_t		refcnt;
+	/**
+	 * @exited: Has the thread had an exit event. Such threads are usually
+	 * removed from the machine's threads but some events/tools require
+	 * access to dead threads.
+	 */
+	bool			exited;
 	bool			comm_set;
 	int			comm_len;
 	struct list_head	namespaces_list;
@@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
 	return &RC_CHK_ACCESS(thread)->refcnt;
 }
 
+static inline void thread__set_exited(struct thread *thread, bool exited)
+{
+	RC_CHK_ACCESS(thread)->exited = exited;
+}
+
 static inline bool thread__comm_set(const struct thread *thread)
 {
 	return RC_CHK_ACCESS(thread)->comm_set;
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 06/53] tools api fs: Switch filename__read_str to use io.h
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (4 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 05/53] perf machine thread: Remove exited threads by default Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-06  3:53   ` Namhyung Kim
  2023-11-02 17:56 ` [PATCH v4 07/53] tools api fs: Avoid reading whole file for a 1 byte bool Ian Rogers
                   ` (46 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

filename__read_str has its own string reading code that allocates
memory before reading into it. The memory allocated is sized at
BUFSIZ, which is 8kb. Most strings are short, so most of this 8kb is
wasted.

Refactor io__getline so that the newline character is configurable and
can be ignored in the case of filename__read_str.

Code like build_caches_for_cpu in perf's header.c will read many
strings and hold them in a data structure, in this case multiple
strings per cache level per CPU. Using io.h's io__getline avoids the
wasted memory as strings are temporarily read into a buffer on the
stack before being copied to a buffer that grows 128 bytes at a time
and is never sized larger than the string.

For a 16 hyperthread system the memory consumption of "perf record
true" is reduced by 180kb, primarily through saving memory when
reading the cache information.
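
The caller-visible behavior is unchanged; a usage sketch for reference
(the path is just an example):

	char *buf = NULL;
	size_t len = 0;

	if (filename__read_str("/sys/devices/system/cpu/online", &buf, &len) == 0) {
		printf("read %zu bytes: %.*s", len, (int)len, buf);
		free(buf);
	}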

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/api/fs/fs.c | 56 +++++++++++--------------------------------
 tools/lib/api/io.h    |  9 +++++--
 2 files changed, 21 insertions(+), 44 deletions(-)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 5cb0eeec2c8a..496812b5f1d2 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -16,6 +16,7 @@
 #include <sys/mount.h>
 
 #include "fs.h"
+#include "../io.h"
 #include "debug-internal.h"
 
 #define _STR(x) #x
@@ -344,53 +345,24 @@ int filename__read_ull(const char *filename, unsigned long long *value)
 	return filename__read_ull_base(filename, value, 0);
 }
 
-#define STRERR_BUFSIZE  128     /* For the buffer size of strerror_r */
-
 int filename__read_str(const char *filename, char **buf, size_t *sizep)
 {
-	size_t size = 0, alloc_size = 0;
-	void *bf = NULL, *nbf;
-	int fd, n, err = 0;
-	char sbuf[STRERR_BUFSIZE];
+	struct io io;
+	char bf[128];
+	int err;
 
-	fd = open(filename, O_RDONLY);
-	if (fd < 0)
+	io.fd = open(filename, O_RDONLY);
+	if (io.fd < 0)
 		return -errno;
-
-	do {
-		if (size == alloc_size) {
-			alloc_size += BUFSIZ;
-			nbf = realloc(bf, alloc_size);
-			if (!nbf) {
-				err = -ENOMEM;
-				break;
-			}
-
-			bf = nbf;
-		}
-
-		n = read(fd, bf + size, alloc_size - size);
-		if (n < 0) {
-			if (size) {
-				pr_warn("read failed %d: %s\n", errno,
-					strerror_r(errno, sbuf, sizeof(sbuf)));
-				err = 0;
-			} else
-				err = -errno;
-
-			break;
-		}
-
-		size += n;
-	} while (n > 0);
-
-	if (!err) {
-		*sizep = size;
-		*buf   = bf;
+	io__init(&io, io.fd, bf, sizeof(bf));
+	*buf = NULL;
+	err = io__getline_nl(&io, buf, sizep, /*nl=*/-1);
+	if (err < 0) {
+		free(*buf);
+		*buf = NULL;
 	} else
-		free(bf);
-
-	close(fd);
+		err = 0;
+	close(io.fd);
 	return err;
 }
 
diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
index a77b74c5fb65..50d33e14fb56 100644
--- a/tools/lib/api/io.h
+++ b/tools/lib/api/io.h
@@ -141,7 +141,7 @@ static inline int io__get_dec(struct io *io, __u64 *dec)
 }
 
 /* Read up to and including the first newline following the pattern of getline. */
-static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
+static inline ssize_t io__getline_nl(struct io *io, char **line_out, size_t *line_len_out, int nl)
 {
 	char buf[128];
 	int buf_pos = 0;
@@ -151,7 +151,7 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
 
 	/* TODO: reuse previously allocated memory. */
 	free(*line_out);
-	while (ch != '\n') {
+	while (ch != nl) {
 		ch = io__get_char(io);
 
 		if (ch < 0)
@@ -184,4 +184,9 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
 	return -ENOMEM;
 }
 
+static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
+{
+	return io__getline_nl(io, line_out, line_len_out, /*nl=*/'\n');
+}
+
 #endif /* __API_IO__ */
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 07/53] tools api fs: Avoid reading whole file for a 1 byte bool
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (5 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 06/53] tools api fs: Switch filename__read_str to use io.h Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-06  3:55   ` Namhyung Kim
  2023-11-02 17:56 ` [PATCH v4 08/53] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
                   ` (45 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

sysfs__read_bool read a whole file into a string and then looked only
at the first byte's value. Avoid doing this and just read the first
byte.
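
A usage sketch for reference (the entry is just an example, given
relative to the sysfs mount point):

	bool active;

	if (sysfs__read_bool("devices/system/cpu/smt/active", &active) == 0)
		printf("SMT is %s\n", active ? "on" : "off");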

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/api/fs/fs.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 496812b5f1d2..4c35a689d1fc 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -447,15 +447,22 @@ int sysfs__read_str(const char *entry, char **buf, size_t *sizep)
 
 int sysfs__read_bool(const char *entry, bool *value)
 {
-	char *buf;
-	size_t size;
-	int ret;
+	struct io io;
+	char bf[16];
+	int ret = 0;
+	char path[PATH_MAX];
+	const char *sysfs = sysfs__mountpoint();
 
-	ret = sysfs__read_str(entry, &buf, &size);
-	if (ret < 0)
-		return ret;
+	if (!sysfs)
+		return -1;
+
+	snprintf(path, sizeof(path), "%s/%s", sysfs, entry);
+	io.fd = open(path, O_RDONLY);
+	if (io.fd < 0)
+		return -errno;
 
-	switch (buf[0]) {
+	io__init(&io, io.fd, bf, sizeof(bf));
+	switch (io__get_char(&io)) {
 	case '1':
 	case 'y':
 	case 'Y':
@@ -469,8 +470,7 @@ int sysfs__read_bool(const char *entry, bool *value)
 	default:
 		ret = -1;
 	}
-
-	free(buf);
+	close(io.fd);
 
 	return ret;
 }
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 08/53] tools lib api: Add io_dir an allocation free readdir alternative
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (6 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 07/53] tools api fs: Avoid reading whole file for a 1 byte bool Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 09/53] perf maps: Switch modules tree walk to io_dir__readdir Ian Rogers
                   ` (44 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

glibc's opendir allocates a minimum of 32kb; when called recursively
for a directory tree the memory consumption can add up, nearly 300kb
during perf start-up when processing modules. Add a stack-allocated
variant of readdir sized at a little more than 1kb.
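
A minimal usage sketch of the new interface (the directory name is
just an example):

	struct io_dir iod;
	struct io_dirent64 *dent;

	io_dir__init(&iod, open("/sys/module", O_CLOEXEC | O_DIRECTORY | O_RDONLY));
	if (iod.dirfd < 0)
		/* handle open failure */;
	while ((dent = io_dir__readdir(&iod)) != NULL) {
		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
			continue;
		puts(dent->d_name);
	}
	close(iod.dirfd);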

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/api/Makefile |  2 +-
 tools/lib/api/io_dir.h | 75 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/api/io_dir.h

diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
index 044860ac1ed1..186aa407de8c 100644
--- a/tools/lib/api/Makefile
+++ b/tools/lib/api/Makefile
@@ -99,7 +99,7 @@ install_lib: $(LIBFILE)
 		$(call do_install_mkdir,$(libdir_SQ)); \
 		cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)
 
-HDRS := cpu.h debug.h io.h
+HDRS := cpu.h debug.h io.h io_dir.h
 FD_HDRS := fd/array.h
 FS_HDRS := fs/fs.h fs/tracing_path.h
 INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
new file mode 100644
index 000000000000..f3479006edb6
--- /dev/null
+++ b/tools/lib/api/io_dir.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/*
+ * Lightweight directory reading library.
+ */
+#ifndef __API_IO_DIR__
+#define __API_IO_DIR__
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/stat.h>
+
+struct io_dirent64 {
+	ino64_t        d_ino;    /* 64-bit inode number */
+	off64_t        d_off;    /* 64-bit offset to next structure */
+	unsigned short d_reclen; /* Size of this dirent */
+	unsigned char  d_type;   /* File type */
+	char           d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
+};
+
+struct io_dir {
+	int dirfd;
+	ssize_t available_bytes;
+	struct io_dirent64 *next;
+	struct io_dirent64 buff[4];
+};
+
+static inline void io_dir__init(struct io_dir *iod, int dirfd)
+{
+	iod->dirfd = dirfd;
+	iod->available_bytes = 0;
+}
+
+static inline void io_dir__rewinddir(struct io_dir *iod)
+{
+	lseek(iod->dirfd, 0, SEEK_SET);
+	iod->available_bytes = 0;
+}
+
+static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
+{
+	struct io_dirent64 *entry;
+
+	if (iod->available_bytes <= 0) {
+		ssize_t rc = getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
+
+		if (rc <= 0)
+			return NULL;
+		iod->available_bytes = rc;
+		iod->next = iod->buff;
+	}
+	entry = iod->next;
+	iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
+	iod->available_bytes -= entry->d_reclen;
+	return entry;
+}
+
+static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
+{
+	if (dent->d_type == DT_UNKNOWN) {
+		struct stat st;
+
+		if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
+			return false;
+
+		if (S_ISDIR(st.st_mode)) {
+			dent->d_type = DT_DIR;
+			return true;
+		}
+	}
+	return dent->d_type == DT_DIR;
+}
+
+#endif
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 09/53] perf maps: Switch modules tree walk to io_dir__readdir
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (7 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 08/53] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 10/53] perf record: Be lazier in allocating lost samples buffer Ian Rogers
                   ` (43 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Compared to glibc's opendir/readdir, this lowers the max RSS of perf
record by 1.8MB on a Debian machine.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a985d004aa8d..be3dab9d5253 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -36,6 +36,7 @@
 #include <internal/lib.h> // page_size
 #include "cgroup.h"
 #include "arm64-frame-pointer-unwind-support.h"
+#include <api/io_dir.h>
 
 #include <linux/ctype.h>
 #include <symbol/kallsyms.h>
@@ -1552,25 +1553,21 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo
 
 static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, int depth)
 {
-	struct dirent *dent;
-	DIR *dir = opendir(dir_name);
+	struct io_dirent64 *dent;
+	struct io_dir iod;
 	int ret = 0;
 
-	if (!dir) {
+	io_dir__init(&iod, open(dir_name, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+	if (iod.dirfd < 0) {
 		pr_debug("%s: cannot open %s dir\n", __func__, dir_name);
 		return -1;
 	}
 
-	while ((dent = readdir(dir)) != NULL) {
+	while ((dent = io_dir__readdir(&iod)) != NULL) {
 		char path[PATH_MAX];
-		struct stat st;
 
-		/*sshfs might return bad dent->d_type, so we have to stat*/
 		path__join(path, sizeof(path), dir_name, dent->d_name);
-		if (stat(path, &st))
-			continue;
-
-		if (S_ISDIR(st.st_mode)) {
+		if (io_dir__is_dir(&iod, dent)) {
 			if (!strcmp(dent->d_name, ".") ||
 			    !strcmp(dent->d_name, ".."))
 				continue;
@@ -1603,7 +1600,7 @@ static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, i
 	}
 
 out:
-	closedir(dir);
+	close(iod.dirfd);
 	return ret;
 }
 
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 10/53] perf record: Be lazier in allocating lost samples buffer
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (8 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 09/53] perf maps: Switch modules tree walk to io_dir__readdir Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-27 22:03   ` Arnaldo Carvalho de Melo
  2023-11-02 17:56 ` [PATCH v4 11/53] perf pmu: Switch to io_dir__readdir Ian Rogers
                   ` (42 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Wait until a lost sample occurs to allocate the lost samples buffer;
often the buffer isn't necessary at all. This saves a 64kb allocation
and 5.3kb of peak memory consumption.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9b4f3805ca92..b6c8c1371b39 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
 static void record__read_lost_samples(struct record *rec)
 {
 	struct perf_session *session = rec->session;
-	struct perf_record_lost_samples *lost;
+	struct perf_record_lost_samples *lost = NULL;
 	struct evsel *evsel;
 
 	/* there was an error during record__open */
 	if (session->evlist == NULL)
 		return;
 
-	lost = zalloc(PERF_SAMPLE_MAX_SIZE);
-	if (lost == NULL) {
-		pr_debug("Memory allocation failed\n");
-		return;
-	}
-
-	lost->header.type = PERF_RECORD_LOST_SAMPLES;
-
 	evlist__for_each_entry(session->evlist, evsel) {
 		struct xyarray *xy = evsel->core.sample_id;
 		u64 lost_count;
@@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
 				}
 
 				if (count.lost) {
+					if (!lost) {
+						lost = zalloc(PERF_SAMPLE_MAX_SIZE);
+						if (!lost) {
+							pr_debug("Memory allocation failed\n");
+							return;
+						}
+						lost->header.type = PERF_RECORD_LOST_SAMPLES;
+					}
 					__record__save_lost_samples(rec, evsel, lost,
 								    x, y, count.lost, 0);
 				}
@@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
 		}
 
 		lost_count = perf_bpf_filter__lost_count(evsel);
-		if (lost_count)
+		if (lost_count) {
+			if (!lost) {
+				lost = zalloc(PERF_SAMPLE_MAX_SIZE);
+				if (!lost) {
+					pr_debug("Memory allocation failed\n");
+					return;
+				}
+				lost->header.type = PERF_RECORD_LOST_SAMPLES;
+			}
 			__record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
 						    PERF_RECORD_MISC_LOST_SAMPLES_BPF);
+		}
 	}
 out:
 	free(lost);
-- 
2.42.0.869.gea05f2083d-goog



* [PATCH v4 11/53] perf pmu: Switch to io_dir__readdir
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (9 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 10/53] perf record: Be lazier in allocating lost samples buffer Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled Ian Rogers
                   ` (41 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Avoid DIR allocations when scanning sysfs by using io_dir as the
readdir implementation, which allocates about 1kb on the stack.
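
For reference, a simplified model of the io_dir idea; the real API
lives in tools/lib/api/io_dir.h and may differ in detail. A small
fixed buffer, typically on the caller's stack, is refilled via the
getdents64 syscall, so no heap-allocated DIR object is needed. Note
that "." and ".." are themselves directories, which is why an
io_dir__is_dir() check can subsume the old strcmp tests in the hunks
below.

```
struct io_dirent64 {		/* mirrors the kernel's linux_dirent64 */
	ino64_t        d_ino;
	off64_t        d_off;
	unsigned short d_reclen;
	unsigned char  d_type;
	char           d_name[NAME_MAX + 1];
};

struct io_dir {
	int dirfd;
	ssize_t available_bytes;
	struct io_dirent64 *next;
	char buff[1024];	/* the ~1kb, typically stack-resident, buffer */
};

static inline void io_dir__init(struct io_dir *iod, int dirfd)
{
	iod->dirfd = dirfd;
	iod->available_bytes = 0;
}

static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
{
	struct io_dirent64 *entry;

	if (iod->available_bytes <= 0) {
		/* getdents64(2): in dirent.h with _GNU_SOURCE on recent glibc */
		ssize_t n = getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));

		if (n <= 0)
			return NULL;
		iod->next = (struct io_dirent64 *)iod->buff;
		iod->available_bytes = n;
	}
	entry = iod->next;
	iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
	iod->available_bytes -= entry->d_reclen;
	return entry;
}
```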

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/pmu.c  | 48 +++++++++++++++++-------------------------
 tools/perf/util/pmus.c | 30 ++++++++++----------------
 2 files changed, 30 insertions(+), 48 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index d3c9aa4326be..35548bbf7d17 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -12,6 +12,7 @@
 #include <stdbool.h>
 #include <dirent.h>
 #include <api/fs/fs.h>
+#include <api/io_dir.h>
 #include <locale.h>
 #include <fnmatch.h>
 #include <math.h>
@@ -184,19 +185,17 @@ static void perf_pmu_format__load(const struct perf_pmu *pmu, struct perf_pmu_fo
  */
 int perf_pmu__format_parse(struct perf_pmu *pmu, int dirfd, bool eager_load)
 {
-	struct dirent *evt_ent;
-	DIR *format_dir;
+	struct io_dirent64 *evt_ent;
+	struct io_dir format_dir;
 	int ret = 0;
 
-	format_dir = fdopendir(dirfd);
-	if (!format_dir)
-		return -EINVAL;
+	io_dir__init(&format_dir, dirfd);
 
-	while ((evt_ent = readdir(format_dir)) != NULL) {
+	while ((evt_ent = io_dir__readdir(&format_dir)) != NULL) {
 		struct perf_pmu_format *format;
 		char *name = evt_ent->d_name;
 
-		if (!strcmp(name, ".") || !strcmp(name, ".."))
+		if (io_dir__is_dir(&format_dir, evt_ent))
 			continue;
 
 		format = perf_pmu__new_format(&pmu->format, name);
@@ -223,7 +222,7 @@ int perf_pmu__format_parse(struct perf_pmu *pmu, int dirfd, bool eager_load)
 		}
 	}
 
-	closedir(format_dir);
+	close(format_dir.dirfd);
 	return ret;
 }
 
@@ -599,8 +598,8 @@ static inline bool pmu_alias_info_file(const char *name)
 static int pmu_aliases_parse(struct perf_pmu *pmu)
 {
 	char path[PATH_MAX];
-	struct dirent *evt_ent;
-	DIR *event_dir;
+	struct io_dirent64 *evt_ent;
+	struct io_dir event_dir;
 	size_t len;
 	int fd, dir_fd;
 
@@ -615,13 +614,9 @@ static int pmu_aliases_parse(struct perf_pmu *pmu)
 		return 0;
 	}
 
-	event_dir = fdopendir(dir_fd);
-	if (!event_dir){
-		close (dir_fd);
-		return -EINVAL;
-	}
+	io_dir__init(&event_dir, dir_fd);
 
-	while ((evt_ent = readdir(event_dir))) {
+	while ((evt_ent = io_dir__readdir(&event_dir))) {
 		char *name = evt_ent->d_name;
 		FILE *file;
 
@@ -651,7 +646,6 @@ static int pmu_aliases_parse(struct perf_pmu *pmu)
 		fclose(file);
 	}
 
-	closedir(event_dir);
 	close (dir_fd);
 	pmu->sysfs_aliases_loaded = true;
 	return 0;
@@ -1879,10 +1873,9 @@ static void perf_pmu__del_caps(struct perf_pmu *pmu)
  */
 int perf_pmu__caps_parse(struct perf_pmu *pmu)
 {
-	struct stat st;
 	char caps_path[PATH_MAX];
-	DIR *caps_dir;
-	struct dirent *evt_ent;
+	struct io_dir caps_dir;
+	struct io_dirent64 *evt_ent;
 	int caps_fd;
 
 	if (pmu->caps_initialized)
@@ -1893,24 +1886,21 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 	if (!perf_pmu__pathname_scnprintf(caps_path, sizeof(caps_path), pmu->name, "caps"))
 		return -1;
 
-	if (stat(caps_path, &st) < 0) {
+	caps_fd = open(caps_path, O_CLOEXEC | O_DIRECTORY | O_RDONLY);
+	if (caps_fd == -1) {
 		pmu->caps_initialized = true;
 		return 0;	/* no error if caps does not exist */
 	}
 
-	caps_dir = opendir(caps_path);
-	if (!caps_dir)
-		return -EINVAL;
-
-	caps_fd = dirfd(caps_dir);
+	io_dir__init(&caps_dir, caps_fd);
 
-	while ((evt_ent = readdir(caps_dir)) != NULL) {
+	while ((evt_ent = io_dir__readdir(&caps_dir)) != NULL) {
 		char *name = evt_ent->d_name;
 		char value[128];
 		FILE *file;
 		int fd;
 
-		if (!strcmp(name, ".") || !strcmp(name, ".."))
+		if (io_dir__is_dir(&caps_dir, evt_ent))
 			continue;
 
 		fd = openat(caps_fd, name, O_RDONLY);
@@ -1932,7 +1922,7 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
 		fclose(file);
 	}
 
-	closedir(caps_dir);
+	close(caps_fd);
 
 	pmu->caps_initialized = true;
 	return pmu->nr_caps;
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index ce4931461741..65b23b98666b 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -3,10 +3,10 @@
 #include <linux/list_sort.h>
 #include <linux/string.h>
 #include <linux/zalloc.h>
+#include <api/io_dir.h>
 #include <subcmd/pager.h>
 #include <sys/types.h>
 #include <ctype.h>
-#include <dirent.h>
 #include <pthread.h>
 #include <string.h>
 #include <unistd.h>
@@ -184,8 +184,8 @@ static int pmus_cmp(void *priv __maybe_unused,
 static void pmu_read_sysfs(bool core_only)
 {
 	int fd;
-	DIR *dir;
-	struct dirent *dent;
+	struct io_dir dir;
+	struct io_dirent64 *dent;
 
 	if (read_sysfs_all_pmus || (core_only && read_sysfs_core_pmus))
 		return;
@@ -194,13 +194,9 @@ static void pmu_read_sysfs(bool core_only)
 	if (fd < 0)
 		return;
 
-	dir = fdopendir(fd);
-	if (!dir) {
-		close(fd);
-		return;
-	}
+	io_dir__init(&dir, fd);
 
-	while ((dent = readdir(dir))) {
+	while ((dent = io_dir__readdir(&dir)) != NULL) {
 		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
 			continue;
 		if (core_only && !is_pmu_core(dent->d_name))
@@ -209,7 +205,7 @@ static void pmu_read_sysfs(bool core_only)
 		perf_pmu__find2(fd, dent->d_name);
 	}
 
-	closedir(dir);
+	close(fd);
 	if (list_empty(&core_pmus)) {
 		if (!perf_pmu__create_placeholder_core_pmu(&core_pmus))
 			pr_err("Failure to set up any core PMUs\n");
@@ -563,8 +559,8 @@ bool perf_pmus__supports_extended_type(void)
 char *perf_pmus__default_pmu_name(void)
 {
 	int fd;
-	DIR *dir;
-	struct dirent *dent;
+	struct io_dir dir;
+	struct io_dirent64 *dent;
 	char *result = NULL;
 
 	if (!list_empty(&core_pmus))
@@ -574,13 +570,9 @@ char *perf_pmus__default_pmu_name(void)
 	if (fd < 0)
 		return strdup("cpu");
 
-	dir = fdopendir(fd);
-	if (!dir) {
-		close(fd);
-		return strdup("cpu");
-	}
+	io_dir__init(&dir, fd);
 
-	while ((dent = readdir(dir))) {
+	while ((dent = io_dir__readdir(&dir)) != NULL) {
 		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
 			continue;
 		if (is_pmu_core(dent->d_name)) {
@@ -589,7 +581,7 @@ char *perf_pmus__default_pmu_name(void)
 		}
 	}
 
-	closedir(dir);
+	close(fd);
 	return result ?: strdup("cpu");
 }
 
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (10 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 11/53] perf pmu: Switch to io_dir__readdir Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-08 16:14   ` Arnaldo Carvalho de Melo
  2023-11-02 17:56 ` [PATCH v4 13/53] perf header: Switch mem topology to io_dir__readdir Ian Rogers
                   ` (40 subsequent siblings)
  52 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

If BPF sideband events are disabled on the command line, don't
synthesize BPF events either.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/bpf-event.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index 38fcf3ba5749..830711cae30d 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -386,6 +386,9 @@ int perf_event__synthesize_bpf_events(struct perf_session *session,
 	int err;
 	int fd;
 
+	if (opts->no_bpf_event)
+		return 0;
+
 	event = malloc(sizeof(event->bpf) + KSYM_NAME_LEN + machine->id_hdr_size);
 	if (!event)
 		return -1;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 13/53] perf header: Switch mem topology to io_dir__readdir
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (11 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 14/53] perf events: Remove scandir in thread synthesis Ian Rogers
                   ` (39 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Switch memory_node__read and build_mem_topology from opendir/readdir
to io_dir__readdir, which uses smaller, stack-based allocations. This
reduces the peak memory consumption of perf record by 10kb.
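
memory_node__read scans the directory twice, once to size the bitmap
and once to fill it, so the conversion also needs a rewind. A sketch
of what io_dir__rewinddir presumably amounts to, given the io_dir
layout sketched earlier: seek the directory fd back to the start and
drop any buffered entries.

```
static inline void io_dir__rewinddir(struct io_dir *iod)
{
	lseek(iod->dirfd, 0, SEEK_SET);	/* restart the directory stream */
	iod->available_bytes = 0;	/* discard buffered entries */
}
```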

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/header.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index e86b9439ffee..55f63d2ee232 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -44,6 +44,7 @@
 #include "build-id.h"
 #include "data.h"
 #include <api/fs/fs.h>
+#include <api/io_dir.h>
 #include "asm/bug.h"
 #include "tool.h"
 #include "time-utils.h"
@@ -1341,11 +1342,11 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
 {
 	unsigned int phys, size = 0;
 	char path[PATH_MAX];
-	struct dirent *ent;
-	DIR *dir;
+	struct io_dirent64 *ent;
+	struct io_dir dir;
 
 #define for_each_memory(mem, dir)					\
-	while ((ent = readdir(dir)))					\
+	while ((ent = io_dir__readdir(&dir)) != NULL)			\
 		if (strcmp(ent->d_name, ".") &&				\
 		    strcmp(ent->d_name, "..") &&			\
 		    sscanf(ent->d_name, "memory%u", &mem) == 1)
@@ -1354,9 +1355,9 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
 		  "%s/devices/system/node/node%lu",
 		  sysfs__mountpoint(), idx);
 
-	dir = opendir(path);
-	if (!dir) {
-		pr_warning("failed: can't open memory sysfs data\n");
+	io_dir__init(&dir, open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+	if (dir.dirfd < 0) {
+		pr_warning("failed: can't open memory sysfs data '%s'\n", path);
 		return -1;
 	}
 
@@ -1368,20 +1369,20 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
 
 	n->set = bitmap_zalloc(size);
 	if (!n->set) {
-		closedir(dir);
+		close(dir.dirfd);
 		return -ENOMEM;
 	}
 
 	n->node = idx;
 	n->size = size;
 
-	rewinddir(dir);
+	io_dir__rewinddir(&dir);
 
 	for_each_memory(phys, dir) {
 		__set_bit(phys, n->set);
 	}
 
-	closedir(dir);
+	close(dir.dirfd);
 	return 0;
 }
 
@@ -1404,8 +1405,8 @@ static int memory_node__sort(const void *a, const void *b)
 static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
 {
 	char path[PATH_MAX];
-	struct dirent *ent;
-	DIR *dir;
+	struct io_dirent64 *ent;
+	struct io_dir dir;
 	int ret = 0;
 	size_t cnt = 0, size = 0;
 	struct memory_node *nodes = NULL;
@@ -1413,14 +1414,14 @@ static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
 	scnprintf(path, PATH_MAX, "%s/devices/system/node/",
 		  sysfs__mountpoint());
 
-	dir = opendir(path);
-	if (!dir) {
+	io_dir__init(&dir, open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+	if (dir.dirfd < 0) {
 		pr_debug2("%s: couldn't read %s, does this arch have topology information?\n",
 			  __func__, path);
 		return -1;
 	}
 
-	while (!ret && (ent = readdir(dir))) {
+	while (!ret && (ent = io_dir__readdir(&dir))) {
 		unsigned int idx;
 		int r;
 
@@ -1447,7 +1448,7 @@ static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
 		ret = memory_node__read(&nodes[cnt++], idx);
 	}
 out:
-	closedir(dir);
+	close(dir.dirfd);
 	if (!ret) {
 		*cntp = cnt;
 		*nodesp = nodes;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 14/53] perf events: Remove scandir in thread synthesis
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (12 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 13/53] perf header: Switch mem topology to io_dir__readdir Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 15/53] perf map: Simplify map_ip/unmap_ip and make map size smaller Ian Rogers
                   ` (38 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

This avoids scandir reading the directory entries into heap-allocated
memory; instead the entries are iterated using a stack allocation.
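
Because scandir's filtering callback disappears with it, the digit
check moves inline into the read loop below. For contrast, the filter
the old code passed to scandir was a one-line predicate along these
lines (a sketch; the actual filter_task helper may differ):

```
/* Sketch of the scandir filter the inline isdigit() check replaces. */
static int filter_task(const struct dirent *dirent)
{
	/* /proc/<pid>/task entries are numeric tids */
	return isdigit(dirent->d_name[0]);
}
```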

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/synthetic-events.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index a0579c7d7b9e..7cc38f2a0e9e 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -38,6 +38,7 @@
 #include <uapi/linux/mman.h> /* To get things like MAP_HUGETLB even on older libc headers */
 #include <api/fs/fs.h>
 #include <api/io.h>
+#include <api/io_dir.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
@@ -751,10 +752,10 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 				      bool needs_mmap, bool mmap_data)
 {
 	char filename[PATH_MAX];
-	struct dirent **dirent;
+	struct io_dir iod;
+	struct io_dirent64 *dent;
 	pid_t tgid, ppid;
 	int rc = 0;
-	int i, n;
 
 	/* special case: only send one comm event using passed in pid */
 	if (!full) {
@@ -786,16 +787,19 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 	snprintf(filename, sizeof(filename), "%s/proc/%d/task",
 		 machine->root_dir, pid);
 
-	n = scandir(filename, &dirent, filter_task, NULL);
-	if (n < 0)
-		return n;
+	io_dir__init(&iod, open(filename, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+	if (iod.dirfd < 0)
+		return -1;
 
-	for (i = 0; i < n; i++) {
+	while ((dent = io_dir__readdir(&iod)) != NULL) {
 		char *end;
 		pid_t _pid;
 		bool kernel_thread = false;
 
-		_pid = strtol(dirent[i]->d_name, &end, 10);
+		if (!isdigit(dent->d_name[0]))
+			continue;
+
+		_pid = strtol(dent->d_name, &end, 10);
 		if (*end)
 			continue;
 
@@ -829,9 +833,7 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		}
 	}
 
-	for (i = 0; i < n; i++)
-		zfree(&dirent[i]);
-	free(dirent);
+	close(iod.dirfd);
 
 	return rc;
 }
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 15/53] perf map: Simplify map_ip/unmap_ip and make map size smaller
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (13 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 14/53] perf events: Remove scandir in thread synthesis Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 16/53] perf maps: Move symbol maps functions to maps.c Ian Rogers
                   ` (37 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

When mapping an IP it is either an identity mapping or a DSO-relative
mapping, so a single bit is required in the struct to identify
this. The current code uses function pointers, adding 2 pointers per
map and also pushing the size of a map beyond 1 cache line. Switch to
using a byte to identify the mapping type (and full bytes for priv
and erange_warned) to avoid any bitfield masking. Change struct map's
layout to avoid holes.

Before:
```
struct map {
        u64                        start;                /*     0     8 */
        u64                        end;                  /*     8     8 */
        _Bool                      erange_warned:1;      /*    16: 0  1 */
        _Bool                      priv:1;               /*    16: 1  1 */

        /* XXX 6 bits hole, try to pack */
        /* XXX 3 bytes hole, try to pack */

        u32                        prot;                 /*    20     4 */
        u64                        pgoff;                /*    24     8 */
        u64                        reloc;                /*    32     8 */
        u64                        (*map_ip)(const struct map  *, u64); /*    40     8 */
        u64                        (*unmap_ip)(const struct map  *, u64); /*    48     8 */
        struct dso *               dso;                  /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        refcount_t                 refcnt;               /*    64     4 */
        u32                        flags;                /*    68     4 */

        /* size: 72, cachelines: 2, members: 12 */
        /* sum members: 68, holes: 1, sum holes: 3 */
        /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
        /* last cacheline: 8 bytes */
};
```

After:
```
struct map {
        u64                        start;                /*     0     8 */
        u64                        end;                  /*     8     8 */
        u64                        pgoff;                /*    16     8 */
        u64                        reloc;                /*    24     8 */
        struct dso *               dso;                  /*    32     8 */
        refcount_t                 refcnt;               /*    40     4 */
        u32                        prot;                 /*    44     4 */
        u32                        flags;                /*    48     4 */
        enum mapping_type          mapping_type:8;       /*    52: 0  4 */

        /* Bitfield combined with next fields */

        _Bool                      erange_warned;        /*    53     1 */
        _Bool                      priv;                 /*    54     1 */

        /* size: 56, cachelines: 1, members: 11 */
        /* padding: 1 */
        /* last cacheline: 56 bytes */
};
```
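
The arithmetic on the two layouts above: the saving is exactly the two
dropped 8-byte function pointers, and the repacked struct now fits in
a single 64-byte cache line.

```
saved per map: 2 function pointers x 8 bytes = 16 bytes
size:          72 bytes (spills into a 2nd cache line) -> 56 bytes (1 line)
```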

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c    |  3 +-
 tools/perf/util/map.c        | 20 +--------
 tools/perf/util/map.h        | 83 +++++++++++++++++++-----------------
 tools/perf/util/symbol-elf.c |  6 +--
 tools/perf/util/symbol.c     |  6 +--
 5 files changed, 50 insertions(+), 68 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index be3dab9d5253..b6831a1f909d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1360,8 +1360,7 @@ __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
 	if (machine->vmlinux_map == NULL)
 		return -ENOMEM;
 
-	map__set_map_ip(machine->vmlinux_map, identity__map_ip);
-	map__set_unmap_ip(machine->vmlinux_map, identity__map_ip);
+	map__set_mapping_type(machine->vmlinux_map, MAPPING_TYPE__IDENTITY);
 	return maps__insert(machine__kernel_maps(machine), machine->vmlinux_map);
 }
 
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index f64b83004421..54c67cb7ecef 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -109,8 +109,7 @@ void map__init(struct map *map, u64 start, u64 end, u64 pgoff, struct dso *dso)
 	map__set_pgoff(map, pgoff);
 	map__set_reloc(map, 0);
 	map__set_dso(map, dso__get(dso));
-	map__set_map_ip(map, map__dso_map_ip);
-	map__set_unmap_ip(map, map__dso_unmap_ip);
+	map__set_mapping_type(map, MAPPING_TYPE__DSO);
 	map__set_erange_warned(map, false);
 	refcount_set(map__refcnt(map), 1);
 }
@@ -172,7 +171,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 		map__init(result, start, start + len, pgoff, dso);
 
 		if (anon || no_dso) {
-			map->map_ip = map->unmap_ip = identity__map_ip;
+			map->mapping_type = MAPPING_TYPE__IDENTITY;
 
 			/*
 			 * Set memory without DSO as loaded. All map__find_*
@@ -630,18 +629,3 @@ struct maps *map__kmaps(struct map *map)
 	}
 	return kmap->kmaps;
 }
-
-u64 map__dso_map_ip(const struct map *map, u64 ip)
-{
-	return ip - map__start(map) + map__pgoff(map);
-}
-
-u64 map__dso_unmap_ip(const struct map *map, u64 ip)
-{
-	return ip + map__start(map) - map__pgoff(map);
-}
-
-u64 identity__map_ip(const struct map *map __maybe_unused, u64 ip)
-{
-	return ip;
-}
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 1b53d53adc86..3a3b7757da5f 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -16,23 +16,25 @@ struct dso;
 struct maps;
 struct machine;
 
+enum mapping_type {
+	/* map__map_ip/map__unmap_ip are given as offsets in the DSO. */
+	MAPPING_TYPE__DSO,
+	/* map__map_ip/map__unmap_ip are just the given ip value. */
+	MAPPING_TYPE__IDENTITY,
+};
+
 DECLARE_RC_STRUCT(map) {
 	u64			start;
 	u64			end;
-	bool			erange_warned:1;
-	bool			priv:1;
-	u32			prot;
 	u64			pgoff;
 	u64			reloc;
-
-	/* ip -> dso rip */
-	u64			(*map_ip)(const struct map *, u64);
-	/* dso rip -> ip */
-	u64			(*unmap_ip)(const struct map *, u64);
-
 	struct dso		*dso;
 	refcount_t		refcnt;
+	u32			prot;
 	u32			flags;
+	enum mapping_type	mapping_type:8;
+	bool			erange_warned;
+	bool			priv;
 };
 
 struct kmap;
@@ -41,38 +43,11 @@ struct kmap *__map__kmap(struct map *map);
 struct kmap *map__kmap(struct map *map);
 struct maps *map__kmaps(struct map *map);
 
-/* ip -> dso rip */
-u64 map__dso_map_ip(const struct map *map, u64 ip);
-/* dso rip -> ip */
-u64 map__dso_unmap_ip(const struct map *map, u64 ip);
-/* Returns ip */
-u64 identity__map_ip(const struct map *map __maybe_unused, u64 ip);
-
 static inline struct dso *map__dso(const struct map *map)
 {
 	return RC_CHK_ACCESS(map)->dso;
 }
 
-static inline u64 map__map_ip(const struct map *map, u64 ip)
-{
-	return RC_CHK_ACCESS(map)->map_ip(map, ip);
-}
-
-static inline u64 map__unmap_ip(const struct map *map, u64 ip)
-{
-	return RC_CHK_ACCESS(map)->unmap_ip(map, ip);
-}
-
-static inline void *map__map_ip_ptr(struct map *map)
-{
-	return RC_CHK_ACCESS(map)->map_ip;
-}
-
-static inline void* map__unmap_ip_ptr(struct map *map)
-{
-	return RC_CHK_ACCESS(map)->unmap_ip;
-}
-
 static inline u64 map__start(const struct map *map)
 {
 	return RC_CHK_ACCESS(map)->start;
@@ -123,6 +98,34 @@ static inline size_t map__size(const struct map *map)
 	return map__end(map) - map__start(map);
 }
 
+/* ip -> dso rip */
+static inline u64 map__dso_map_ip(const struct map *map, u64 ip)
+{
+	return ip - map__start(map) + map__pgoff(map);
+}
+
+/* dso rip -> ip */
+static inline u64 map__dso_unmap_ip(const struct map *map, u64 ip)
+{
+	return ip + map__start(map) - map__pgoff(map);
+}
+
+static inline u64 map__map_ip(const struct map *map, u64 ip)
+{
+	if ((RC_CHK_ACCESS(map)->mapping_type) == MAPPING_TYPE__DSO)
+		return map__dso_map_ip(map, ip);
+	else
+		return ip;
+}
+
+static inline u64 map__unmap_ip(const struct map *map, u64 ip)
+{
+	if ((RC_CHK_ACCESS(map)->mapping_type) == MAPPING_TYPE__DSO)
+		return map__dso_unmap_ip(map, ip);
+	else
+		return ip;
+}
+
 /* rip/ip <-> addr suitable for passing to `objdump --start-address=` */
 u64 map__rip_2objdump(struct map *map, u64 rip);
 
@@ -294,13 +297,13 @@ static inline void map__set_dso(struct map *map, struct dso *dso)
 	RC_CHK_ACCESS(map)->dso = dso;
 }
 
-static inline void map__set_map_ip(struct map *map, u64 (*map_ip)(const struct map *map, u64 ip))
+static inline void map__set_mapping_type(struct map *map, enum mapping_type type)
 {
-	RC_CHK_ACCESS(map)->map_ip = map_ip;
+	RC_CHK_ACCESS(map)->mapping_type = type;
 }
 
-static inline void map__set_unmap_ip(struct map *map, u64 (*unmap_ip)(const struct map *map, u64 rip))
+static inline enum mapping_type map__mapping_type(struct map *map)
 {
-	RC_CHK_ACCESS(map)->unmap_ip = unmap_ip;
+	return RC_CHK_ACCESS(map)->mapping_type;
 }
 #endif /* __PERF_MAP_H */
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 9e7eeaf616b8..4b934ed3bfd1 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1392,8 +1392,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 			map__set_start(map, shdr->sh_addr + ref_reloc(kmap));
 			map__set_end(map, map__start(map) + shdr->sh_size);
 			map__set_pgoff(map, shdr->sh_offset);
-			map__set_map_ip(map, map__dso_map_ip);
-			map__set_unmap_ip(map, map__dso_unmap_ip);
+			map__set_mapping_type(map, MAPPING_TYPE__DSO);
 			/* Ensure maps are correctly ordered */
 			if (kmaps) {
 				int err;
@@ -1455,8 +1454,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 			map__set_end(curr_map, map__start(curr_map) + shdr->sh_size);
 			map__set_pgoff(curr_map, shdr->sh_offset);
 		} else {
-			map__set_map_ip(curr_map, identity__map_ip);
-			map__set_unmap_ip(curr_map, identity__map_ip);
+			map__set_mapping_type(curr_map, MAPPING_TYPE__IDENTITY);
 		}
 		curr_dso->symtab_type = dso->symtab_type;
 		if (maps__insert(kmaps, curr_map))
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 82cc74b9358e..314c0263bf3c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -956,8 +956,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 				return -1;
 			}
 
-			map__set_map_ip(curr_map, identity__map_ip);
-			map__set_unmap_ip(curr_map, identity__map_ip);
+			map__set_mapping_type(curr_map, MAPPING_TYPE__IDENTITY);
 			if (maps__insert(kmaps, curr_map)) {
 				dso__put(ndso);
 				return -1;
@@ -1475,8 +1474,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 			map__set_start(map, map__start(new_map));
 			map__set_end(map, map__end(new_map));
 			map__set_pgoff(map, map__pgoff(new_map));
-			map__set_map_ip(map, map__map_ip_ptr(new_map));
-			map__set_unmap_ip(map, map__unmap_ip_ptr(new_map));
+			map__set_mapping_type(map, map__mapping_type(new_map));
 			/* Ensure maps are correctly ordered */
 			map_ref = map__get(map);
 			maps__remove(kmaps, map_ref);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 16/53] perf maps: Move symbol maps functions to maps.c
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (14 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 15/53] perf map: Simplify map_ip/unmap_ip and make map size smaller Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:56 ` [PATCH v4 17/53] perf thread: Add missing RC_CHK_EQUAL Ian Rogers
                   ` (36 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Move the find and certain other symbol-related maps__* functions out
of symbol.c and into maps.c for better abstraction.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/maps.c   | 238 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/maps.h   |  12 ++
 tools/perf/util/symbol.c | 248 ---------------------------------------
 tools/perf/util/symbol.h |   1 -
 4 files changed, 250 insertions(+), 249 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 233438c95b53..9a011aed4b75 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -475,3 +475,241 @@ struct map_rb_node *map_rb_node__next(struct map_rb_node *node)
 
 	return rb_entry(next, struct map_rb_node, rb_node);
 }
+
+static int map__strcmp(const void *a, const void *b)
+{
+	const struct map *map_a = *(const struct map **)a;
+	const struct map *map_b = *(const struct map **)b;
+	const struct dso *dso_a = map__dso(map_a);
+	const struct dso *dso_b = map__dso(map_b);
+	int ret = strcmp(dso_a->short_name, dso_b->short_name);
+
+	if (ret == 0 && map_a != map_b) {
+		/*
+		 * Ensure distinct but name equal maps have an order in part to
+		 * aid reference counting.
+		 */
+		ret = (int)map__start(map_a) - (int)map__start(map_b);
+		if (ret == 0)
+			ret = (int)((intptr_t)map_a - (intptr_t)map_b);
+	}
+
+	return ret;
+}
+
+static int map__strcmp_name(const void *name, const void *b)
+{
+	const struct dso *dso = map__dso(*(const struct map **)b);
+
+	return strcmp(name, dso->short_name);
+}
+
+void __maps__sort_by_name(struct maps *maps)
+{
+	qsort(maps__maps_by_name(maps), maps__nr_maps(maps), sizeof(struct map *), map__strcmp);
+}
+
+static int map__groups__sort_by_name_from_rbtree(struct maps *maps)
+{
+	struct map_rb_node *rb_node;
+	struct map **maps_by_name = realloc(maps__maps_by_name(maps),
+					    maps__nr_maps(maps) * sizeof(struct map *));
+	int i = 0;
+
+	if (maps_by_name == NULL)
+		return -1;
+
+	up_read(maps__lock(maps));
+	down_write(maps__lock(maps));
+
+	RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
+	RC_CHK_ACCESS(maps)->nr_maps_allocated = maps__nr_maps(maps);
+
+	maps__for_each_entry(maps, rb_node)
+		maps_by_name[i++] = map__get(rb_node->map);
+
+	__maps__sort_by_name(maps);
+
+	up_write(maps__lock(maps));
+	down_read(maps__lock(maps));
+
+	return 0;
+}
+
+static struct map *__maps__find_by_name(struct maps *maps, const char *name)
+{
+	struct map **mapp;
+
+	if (maps__maps_by_name(maps) == NULL &&
+	    map__groups__sort_by_name_from_rbtree(maps))
+		return NULL;
+
+	mapp = bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
+		       sizeof(*mapp), map__strcmp_name);
+	if (mapp)
+		return *mapp;
+	return NULL;
+}
+
+struct map *maps__find_by_name(struct maps *maps, const char *name)
+{
+	struct map_rb_node *rb_node;
+	struct map *map;
+
+	down_read(maps__lock(maps));
+
+
+	if (RC_CHK_ACCESS(maps)->last_search_by_name) {
+		const struct dso *dso = map__dso(RC_CHK_ACCESS(maps)->last_search_by_name);
+
+		if (strcmp(dso->short_name, name) == 0) {
+			map = RC_CHK_ACCESS(maps)->last_search_by_name;
+			goto out_unlock;
+		}
+	}
+	/*
+	 * If we have maps->maps_by_name, then the name isn't in the rbtree,
+	 * as maps->maps_by_name mirrors the rbtree when lookups by name are
+	 * made.
+	 */
+	map = __maps__find_by_name(maps, name);
+	if (map || maps__maps_by_name(maps) != NULL)
+		goto out_unlock;
+
+	/* Fallback to traversing the rbtree... */
+	maps__for_each_entry(maps, rb_node) {
+		struct dso *dso;
+
+		map = rb_node->map;
+		dso = map__dso(map);
+		if (strcmp(dso->short_name, name) == 0) {
+			RC_CHK_ACCESS(maps)->last_search_by_name = map;
+			goto out_unlock;
+		}
+	}
+	map = NULL;
+
+out_unlock:
+	up_read(maps__lock(maps));
+	return map;
+}
+
+void maps__fixup_end(struct maps *maps)
+{
+	struct map_rb_node *prev = NULL, *curr;
+
+	down_write(maps__lock(maps));
+
+	maps__for_each_entry(maps, curr) {
+		if (prev != NULL && !map__end(prev->map))
+			map__set_end(prev->map, map__start(curr->map));
+
+		prev = curr;
+	}
+
+	/*
+	 * We still haven't the actual symbols, so guess the
+	 * last map final address.
+	 */
+	if (curr && !map__end(curr->map))
+		map__set_end(curr->map, ~0ULL);
+
+	up_write(maps__lock(maps));
+}
+
+/*
+ * Merges map into maps by splitting the new map within the existing map
+ * regions.
+ */
+int maps__merge_in(struct maps *kmaps, struct map *new_map)
+{
+	struct map_rb_node *rb_node;
+	LIST_HEAD(merged);
+	int err = 0;
+
+	maps__for_each_entry(kmaps, rb_node) {
+		struct map *old_map = rb_node->map;
+
+		/* no overload with this one */
+		if (map__end(new_map) < map__start(old_map) ||
+		    map__start(new_map) >= map__end(old_map))
+			continue;
+
+		if (map__start(new_map) < map__start(old_map)) {
+			/*
+			 * |new......
+			 *       |old....
+			 */
+			if (map__end(new_map) < map__end(old_map)) {
+				/*
+				 * |new......|     -> |new..|
+				 *       |old....| ->       |old....|
+				 */
+				map__set_end(new_map, map__start(old_map));
+			} else {
+				/*
+				 * |new.............| -> |new..|       |new..|
+				 *       |old....|    ->       |old....|
+				 */
+				struct map_list_node *m = map_list_node__new();
+
+				if (!m) {
+					err = -ENOMEM;
+					goto out;
+				}
+
+				m->map = map__clone(new_map);
+				if (!m->map) {
+					free(m);
+					err = -ENOMEM;
+					goto out;
+				}
+
+				map__set_end(m->map, map__start(old_map));
+				list_add_tail(&m->node, &merged);
+				map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
+				map__set_start(new_map, map__end(old_map));
+			}
+		} else {
+			/*
+			 *      |new......
+			 * |old....
+			 */
+			if (map__end(new_map) < map__end(old_map)) {
+				/*
+				 *      |new..|   -> x
+				 * |old.........| -> |old.........|
+				 */
+				map__put(new_map);
+				new_map = NULL;
+				break;
+			} else {
+				/*
+				 *      |new......| ->         |new...|
+				 * |old....|        -> |old....|
+				 */
+				map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
+				map__set_start(new_map, map__end(old_map));
+			}
+		}
+	}
+
+out:
+	while (!list_empty(&merged)) {
+		struct map_list_node *old_node;
+
+		old_node = list_entry(merged.next, struct map_list_node, node);
+		list_del_init(&old_node->node);
+		if (!err)
+			err = maps__insert(kmaps, old_node->map);
+		map__put(old_node->map);
+		free(old_node);
+	}
+
+	if (new_map) {
+		if (!err)
+			err = maps__insert(kmaps, new_map);
+		map__put(new_map);
+	}
+	return err;
+}
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 83144e0645ed..a689149be8c4 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -21,6 +21,16 @@ struct map_rb_node {
 	struct map *map;
 };
 
+struct map_list_node {
+	struct list_head node;
+	struct map *map;
+};
+
+static inline struct map_list_node *map_list_node__new(void)
+{
+	return malloc(sizeof(struct map_list_node));
+}
+
 struct map_rb_node *maps__first(struct maps *maps);
 struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
 struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
@@ -133,4 +143,6 @@ int maps__merge_in(struct maps *kmaps, struct map *new_map);
 
 void __maps__sort_by_name(struct maps *maps);
 
+void maps__fixup_end(struct maps *maps);
+
 #endif // __PERF_MAPS_H
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 314c0263bf3c..1cc42b8d8afb 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -48,11 +48,6 @@ static bool symbol__is_idle(const char *name);
 int vmlinux_path__nr_entries;
 char **vmlinux_path;
 
-struct map_list_node {
-	struct list_head node;
-	struct map *map;
-};
-
 struct symbol_conf symbol_conf = {
 	.nanosecs		= false,
 	.use_modules		= true,
@@ -90,11 +85,6 @@ static enum dso_binary_type binary_type_symtab[] = {
 
 #define DSO_BINARY_TYPE__SYMTAB_CNT ARRAY_SIZE(binary_type_symtab)
 
-static struct map_list_node *map_list_node__new(void)
-{
-	return malloc(sizeof(struct map_list_node));
-}
-
 static bool symbol_type__filter(char symbol_type)
 {
 	symbol_type = toupper(symbol_type);
@@ -270,29 +260,6 @@ void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms)
 		curr->end = roundup(curr->start, 4096) + 4096;
 }
 
-void maps__fixup_end(struct maps *maps)
-{
-	struct map_rb_node *prev = NULL, *curr;
-
-	down_write(maps__lock(maps));
-
-	maps__for_each_entry(maps, curr) {
-		if (prev != NULL && !map__end(prev->map))
-			map__set_end(prev->map, map__start(curr->map));
-
-		prev = curr;
-	}
-
-	/*
-	 * We still haven't the actual symbols, so guess the
-	 * last map final address.
-	 */
-	if (curr && !map__end(curr->map))
-		map__set_end(curr->map, ~0ULL);
-
-	up_write(maps__lock(maps));
-}
-
 struct symbol *symbol__new(u64 start, u64 len, u8 binding, u8 type, const char *name)
 {
 	size_t namelen = strlen(name) + 1;
@@ -1270,103 +1237,6 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
 	return 0;
 }
 
-/*
- * Merges map into maps by splitting the new map within the existing map
- * regions.
- */
-int maps__merge_in(struct maps *kmaps, struct map *new_map)
-{
-	struct map_rb_node *rb_node;
-	LIST_HEAD(merged);
-	int err = 0;
-
-	maps__for_each_entry(kmaps, rb_node) {
-		struct map *old_map = rb_node->map;
-
-		/* no overload with this one */
-		if (map__end(new_map) < map__start(old_map) ||
-		    map__start(new_map) >= map__end(old_map))
-			continue;
-
-		if (map__start(new_map) < map__start(old_map)) {
-			/*
-			 * |new......
-			 *       |old....
-			 */
-			if (map__end(new_map) < map__end(old_map)) {
-				/*
-				 * |new......|     -> |new..|
-				 *       |old....| ->       |old....|
-				 */
-				map__set_end(new_map, map__start(old_map));
-			} else {
-				/*
-				 * |new.............| -> |new..|       |new..|
-				 *       |old....|    ->       |old....|
-				 */
-				struct map_list_node *m = map_list_node__new();
-
-				if (!m) {
-					err = -ENOMEM;
-					goto out;
-				}
-
-				m->map = map__clone(new_map);
-				if (!m->map) {
-					free(m);
-					err = -ENOMEM;
-					goto out;
-				}
-
-				map__set_end(m->map, map__start(old_map));
-				list_add_tail(&m->node, &merged);
-				map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
-				map__set_start(new_map, map__end(old_map));
-			}
-		} else {
-			/*
-			 *      |new......
-			 * |old....
-			 */
-			if (map__end(new_map) < map__end(old_map)) {
-				/*
-				 *      |new..|   -> x
-				 * |old.........| -> |old.........|
-				 */
-				map__put(new_map);
-				new_map = NULL;
-				break;
-			} else {
-				/*
-				 *      |new......| ->         |new...|
-				 * |old....|        -> |old....|
-				 */
-				map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
-				map__set_start(new_map, map__end(old_map));
-			}
-		}
-	}
-
-out:
-	while (!list_empty(&merged)) {
-		struct map_list_node *old_node;
-
-		old_node = list_entry(merged.next, struct map_list_node, node);
-		list_del_init(&old_node->node);
-		if (!err)
-			err = maps__insert(kmaps, old_node->map);
-		map__put(old_node->map);
-		free(old_node);
-	}
-
-	if (new_map) {
-		if (!err)
-			err = maps__insert(kmaps, new_map);
-		map__put(new_map);
-	}
-	return err;
-}
-
 static int dso__load_kcore(struct dso *dso, struct map *map,
 			   const char *kallsyms_filename)
 {
@@ -2065,124 +1935,6 @@ int dso__load(struct dso *dso, struct map *map)
 	return ret;
 }
 
-static int map__strcmp(const void *a, const void *b)
-{
-	const struct map *map_a = *(const struct map **)a;
-	const struct map *map_b = *(const struct map **)b;
-	const struct dso *dso_a = map__dso(map_a);
-	const struct dso *dso_b = map__dso(map_b);
-	int ret = strcmp(dso_a->short_name, dso_b->short_name);
-
-	if (ret == 0 && map_a != map_b) {
-		/*
-		 * Ensure distinct but name equal maps have an order in part to
-		 * aid reference counting.
-		 */
-		ret = (int)map__start(map_a) - (int)map__start(map_b);
-		if (ret == 0)
-			ret = (int)((intptr_t)map_a - (intptr_t)map_b);
-	}
-
-	return ret;
-}
-
-static int map__strcmp_name(const void *name, const void *b)
-{
-	const struct dso *dso = map__dso(*(const struct map **)b);
-
-	return strcmp(name, dso->short_name);
-}
-
-void __maps__sort_by_name(struct maps *maps)
-{
-	qsort(maps__maps_by_name(maps), maps__nr_maps(maps), sizeof(struct map *), map__strcmp);
-}
-
-static int map__groups__sort_by_name_from_rbtree(struct maps *maps)
-{
-	struct map_rb_node *rb_node;
-	struct map **maps_by_name = realloc(maps__maps_by_name(maps),
-					    maps__nr_maps(maps) * sizeof(struct map *));
-	int i = 0;
-
-	if (maps_by_name == NULL)
-		return -1;
-
-	up_read(maps__lock(maps));
-	down_write(maps__lock(maps));
-
-	RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
-	RC_CHK_ACCESS(maps)->nr_maps_allocated = maps__nr_maps(maps);
-
-	maps__for_each_entry(maps, rb_node)
-		maps_by_name[i++] = map__get(rb_node->map);
-
-	__maps__sort_by_name(maps);
-
-	up_write(maps__lock(maps));
-	down_read(maps__lock(maps));
-
-	return 0;
-}
-
-static struct map *__maps__find_by_name(struct maps *maps, const char *name)
-{
-	struct map **mapp;
-
-	if (maps__maps_by_name(maps) == NULL &&
-	    map__groups__sort_by_name_from_rbtree(maps))
-		return NULL;
-
-	mapp = bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
-		       sizeof(*mapp), map__strcmp_name);
-	if (mapp)
-		return *mapp;
-	return NULL;
-}
-
-struct map *maps__find_by_name(struct maps *maps, const char *name)
-{
-	struct map_rb_node *rb_node;
-	struct map *map;
-
-	down_read(maps__lock(maps));
-
-
-	if (RC_CHK_ACCESS(maps)->last_search_by_name) {
-		const struct dso *dso = map__dso(RC_CHK_ACCESS(maps)->last_search_by_name);
-
-		if (strcmp(dso->short_name, name) == 0) {
-			map = RC_CHK_ACCESS(maps)->last_search_by_name;
-			goto out_unlock;
-		}
-	}
-	/*
-	 * If we have maps->maps_by_name, then the name isn't in the rbtree,
-	 * as maps->maps_by_name mirrors the rbtree when lookups by name are
-	 * made.
-	 */
-	map = __maps__find_by_name(maps, name);
-	if (map || maps__maps_by_name(maps) != NULL)
-		goto out_unlock;
-
-	/* Fallback to traversing the rbtree... */
-	maps__for_each_entry(maps, rb_node) {
-		struct dso *dso;
-
-		map = rb_node->map;
-		dso = map__dso(map);
-		if (strcmp(dso->short_name, name) == 0) {
-			RC_CHK_ACCESS(maps)->last_search_by_name = map;
-			goto out_unlock;
-		}
-	}
-	map = NULL;
-
-out_unlock:
-	up_read(maps__lock(maps));
-	return map;
-}
-
 int dso__load_vmlinux(struct dso *dso, struct map *map,
 		      const char *vmlinux, bool vmlinux_allocated)
 {
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index af87c46b3f89..071837ddce2a 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -189,7 +189,6 @@ void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
 void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
 void symbols__fixup_duplicate(struct rb_root_cached *symbols);
 void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
-void maps__fixup_end(struct maps *maps);
 
 typedef int (*mapfn_t)(u64 start, u64 len, u64 pgoff, void *data);
 int file__read_maps(int fd, bool exe, mapfn_t mapfn, void *data,
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 17/53] perf thread: Add missing RC_CHK_EQUAL
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (15 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 16/53] perf maps: Move symbol maps functions to maps.c Ian Rogers
@ 2023-11-02 17:56 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 18/53] perf maps: Add maps__for_each_map to call a function on each entry Ian Rogers
                   ` (35 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:56 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Comparing pointers without RC_CHK_ACCESS means the indirect wrapper
object will be compared rather than the underlying maps when
REFCNT_CHECKING is enabled. Fix by adding the missing RC_CHK_EQUAL.
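
A simplified model of the macros involved; the real definitions live
in tools/lib/perf/include/internal/rc_check.h and may differ in
detail. With REFCNT_CHECKING, each get returns a small wrapper
pointing at the real object, so a plain == compares wrappers that can
differ even when the underlying maps are the same.

```
#ifdef REFCNT_CHECKING
/* each maps__get() hands out a distinct wrapper around the real object */
struct maps { struct original_maps *orig; };

#define RC_CHK_ACCESS(x)   ((x)->orig)
#define RC_CHK_EQUAL(x, y) ((x) == (y) || \
			    ((x) && (y) && RC_CHK_ACCESS(x) == RC_CHK_ACCESS(y)))
#else
#define RC_CHK_ACCESS(x)   (x)
#define RC_CHK_EQUAL(x, y) ((x) == (y))
#endif
```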

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/thread.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index fe5e6991ae4b..b9c2039c4230 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -385,7 +385,7 @@ static int thread__clone_maps(struct thread *thread, struct thread *parent, bool
 	if (thread__pid(thread) == thread__pid(parent))
 		return thread__prepare_access(thread);
 
-	if (thread__maps(thread) == thread__maps(parent)) {
+	if (RC_CHK_EQUAL(thread__maps(thread), thread__maps(parent))) {
 		pr_debug("broken map groups on thread %d/%d parent %d/%d\n",
 			 thread__pid(thread), thread__tid(thread),
 			 thread__pid(parent), thread__tid(parent));
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 18/53] perf maps: Add maps__for_each_map to call a function on each entry
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (16 preceding siblings ...)
  2023-11-02 17:56 ` [PATCH v4 17/53] perf thread: Add missing RC_CHK_EQUAL Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 19/53] perf maps: Add remove maps function to remove a map based on callback Ian Rogers
                   ` (34 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Most current uses of maps don't take the rwsem, introducing a risk
that the maps will change during iteration. Introduce
maps__for_each_map, which iterates over the entries under the read
lock of the rwsem. This replaces the maps__for_each_entry macro,
which is moved into maps.c. maps__for_each_entry_safe will be
replaced in a later change.
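
A sketch of the iterator's shape, assuming the rbtree-based storage
maps still uses at this point in the series (details simplified):
take the read lock, call the callback on each entry, and stop early
on the first non-zero return, propagating it to the caller.

```
int maps__for_each_map(struct maps *maps,
		       int (*cb)(struct map *map, void *data), void *data)
{
	struct map_rb_node *pos;
	int ret = 0;

	down_read(maps__lock(maps));
	maps__for_each_entry(maps, pos) {
		ret = cb(pos->map, data);
		if (ret)	/* non-zero stops the walk and is returned */
			break;
	}
	up_read(maps__lock(maps));
	return ret;
}
```

The non-zero-to-stop convention is what lets callbacks such as
check_maps_cb in tests/maps.c report a mismatch by returning 1.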

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/x86/util/event.c         | 103 +++++++------
 tools/perf/builtin-report.c              |  54 ++++---
 tools/perf/tests/maps.c                  |  61 +++++---
 tools/perf/tests/vmlinux-kallsyms.c      | 181 ++++++++++++-----------
 tools/perf/util/machine.c                |  53 ++++---
 tools/perf/util/maps.c                   | 102 ++++++++-----
 tools/perf/util/maps.h                   |   6 +-
 tools/perf/util/probe-event.c            |  40 +++--
 tools/perf/util/symbol.c                 |  36 ++---
 tools/perf/util/synthetic-events.c       | 118 ++++++++-------
 tools/perf/util/thread.c                 |  35 +++--
 tools/perf/util/unwind-libunwind-local.c |  34 +++--
 tools/perf/util/vdso.c                   |  35 +++--
 13 files changed, 508 insertions(+), 350 deletions(-)

diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
index 5741ffe47312..e65b7dbe27fb 100644
--- a/tools/perf/arch/x86/util/event.c
+++ b/tools/perf/arch/x86/util/event.c
@@ -14,66 +14,79 @@
 
 #if defined(__x86_64__)
 
-int perf_event__synthesize_extra_kmaps(struct perf_tool *tool,
-				       perf_event__handler_t process,
-				       struct machine *machine)
+struct perf_event__synthesize_extra_kmaps_cb_args {
+	struct perf_tool *tool;
+	perf_event__handler_t process;
+	struct machine *machine;
+	union perf_event *event;
+};
+
+static int perf_event__synthesize_extra_kmaps_cb(struct map *map, void *data)
 {
-	int rc = 0;
-	struct map_rb_node *pos;
-	struct maps *kmaps = machine__kernel_maps(machine);
-	union perf_event *event = zalloc(sizeof(event->mmap) +
-					 machine->id_hdr_size);
+	struct perf_event__synthesize_extra_kmaps_cb_args *args = data;
+	union perf_event *event = args->event;
+	struct kmap *kmap;
+	size_t size;
 
-	if (!event) {
-		pr_debug("Not enough memory synthesizing mmap event "
-			 "for extra kernel maps\n");
-		return -1;
-	}
+	if (!__map__is_extra_kernel_map(map))
+		return 0;
 
-	maps__for_each_entry(kmaps, pos) {
-		struct kmap *kmap;
-		size_t size;
-		struct map *map = pos->map;
+	kmap = map__kmap(map);
 
-		if (!__map__is_extra_kernel_map(map))
-			continue;
+	size = sizeof(event->mmap) - sizeof(event->mmap.filename) +
+		      PERF_ALIGN(strlen(kmap->name) + 1, sizeof(u64)) +
+		      args->machine->id_hdr_size;
 
-		kmap = map__kmap(map);
+	memset(event, 0, size);
 
-		size = sizeof(event->mmap) - sizeof(event->mmap.filename) +
-		       PERF_ALIGN(strlen(kmap->name) + 1, sizeof(u64)) +
-		       machine->id_hdr_size;
+	event->mmap.header.type = PERF_RECORD_MMAP;
 
-		memset(event, 0, size);
+	/*
+	 * kernel uses 0 for user space maps, see kernel/perf_event.c
+	 * __perf_event_mmap
+	 */
+	if (machine__is_host(args->machine))
+		event->header.misc = PERF_RECORD_MISC_KERNEL;
+	else
+		event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
 
-		event->mmap.header.type = PERF_RECORD_MMAP;
+	event->mmap.header.size = size;
 
-		/*
-		 * kernel uses 0 for user space maps, see kernel/perf_event.c
-		 * __perf_event_mmap
-		 */
-		if (machine__is_host(machine))
-			event->header.misc = PERF_RECORD_MISC_KERNEL;
-		else
-			event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
+	event->mmap.start = map__start(map);
+	event->mmap.len   = map__size(map);
+	event->mmap.pgoff = map__pgoff(map);
+	event->mmap.pid   = args->machine->pid;
 
-		event->mmap.header.size = size;
+	strlcpy(event->mmap.filename, kmap->name, PATH_MAX);
 
-		event->mmap.start = map__start(map);
-		event->mmap.len   = map__size(map);
-		event->mmap.pgoff = map__pgoff(map);
-		event->mmap.pid   = machine->pid;
+	if (perf_tool__process_synth_event(args->tool, event, args->machine, args->process) != 0)
+		return -1;
 
-		strlcpy(event->mmap.filename, kmap->name, PATH_MAX);
+	return 0;
+}
 
-		if (perf_tool__process_synth_event(tool, event, machine,
-						   process) != 0) {
-			rc = -1;
-			break;
-		}
+int perf_event__synthesize_extra_kmaps(struct perf_tool *tool,
+				       perf_event__handler_t process,
+				       struct machine *machine)
+{
+	int rc;
+	struct maps *kmaps = machine__kernel_maps(machine);
+	struct perf_event__synthesize_extra_kmaps_cb_args args = {
+		.tool = tool,
+		.process = process,
+		.machine = machine,
+		.event = zalloc(sizeof(args.event->mmap) + machine->id_hdr_size),
+	};
+
+	if (!args.event) {
+		pr_debug("Not enough memory synthesizing mmap event "
+			 "for extra kernel maps\n");
+		return -1;
 	}
 
-	free(event);
+	rc = maps__for_each_map(kmaps, perf_event__synthesize_extra_kmaps_cb, &args);
+
+	free(args.event);
 	return rc;
 }
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 121a2781323c..a5d7bc5b843f 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -859,27 +859,47 @@ static struct task *tasks_list(struct task *task, struct machine *machine)
 	return tasks_list(parent_task, machine);
 }
 
-static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp)
+struct maps__fprintf_task_args {
+	int indent;
+	FILE *fp;
+	size_t printed;
+};
+
+static int maps__fprintf_task_cb(struct map *map, void *data)
 {
-	size_t printed = 0;
-	struct map_rb_node *rb_node;
+	struct maps__fprintf_task_args *args = data;
+	const struct dso *dso = map__dso(map);
+	u32 prot = map__prot(map);
+	int ret;
 
-	maps__for_each_entry(maps, rb_node) {
-		struct map *map = rb_node->map;
-		const struct dso *dso = map__dso(map);
-		u32 prot = map__prot(map);
+	ret = fprintf(args->fp,
+		"%*s  %" PRIx64 "-%" PRIx64 " %c%c%c%c %08" PRIx64 " %" PRIu64 " %s\n",
+		args->indent, "", map__start(map), map__end(map),
+		prot & PROT_READ ? 'r' : '-',
+		prot & PROT_WRITE ? 'w' : '-',
+		prot & PROT_EXEC ? 'x' : '-',
+		map__flags(map) ? 's' : 'p',
+		map__pgoff(map),
+		dso->id.ino, dso->name);
 
-		printed += fprintf(fp, "%*s  %" PRIx64 "-%" PRIx64 " %c%c%c%c %08" PRIx64 " %" PRIu64 " %s\n",
-				   indent, "", map__start(map), map__end(map),
-				   prot & PROT_READ ? 'r' : '-',
-				   prot & PROT_WRITE ? 'w' : '-',
-				   prot & PROT_EXEC ? 'x' : '-',
-				   map__flags(map) ? 's' : 'p',
-				   map__pgoff(map),
-				   dso->id.ino, dso->name);
-	}
+	if (ret < 0)
+		return ret;
+
+	args->printed += ret;
+	return 0;
+}
+
+static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp)
+{
+	struct maps__fprintf_task_args args = {
+		.indent = indent,
+		.fp = fp,
+		.printed = 0,
+	};
+
+	maps__for_each_map(maps, maps__fprintf_task_cb, &args);
 
-	return printed;
+	return args.printed;
 }
 
 static void task__print_level(struct task *task, FILE *fp, int level)
diff --git a/tools/perf/tests/maps.c b/tools/perf/tests/maps.c
index 5bb1123a91a7..bb3fbfe5a73e 100644
--- a/tools/perf/tests/maps.c
+++ b/tools/perf/tests/maps.c
@@ -14,44 +14,59 @@ struct map_def {
 	u64 end;
 };
 
+struct check_maps_cb_args {
+	struct map_def *merged;
+	unsigned int i;
+};
+
+static int check_maps_cb(struct map *map, void *data)
+{
+	struct check_maps_cb_args *args = data;
+	struct map_def *merged = &args->merged[args->i];
+
+	if (map__start(map) != merged->start ||
+	    map__end(map) != merged->end ||
+	    strcmp(map__dso(map)->name, merged->name) ||
+	    refcount_read(map__refcnt(map)) != 1) {
+		return 1;
+	}
+	args->i++;
+	return 0;
+}
+
+static int failed_cb(struct map *map, void *data __maybe_unused)
+{
+	pr_debug("\tstart: %" PRIu64 " end: %" PRIu64 " name: '%s' refcnt: %d\n",
+		map__start(map),
+		map__end(map),
+		map__dso(map)->name,
+		refcount_read(map__refcnt(map)));
+
+	return 0;
+}
+
 static int check_maps(struct map_def *merged, unsigned int size, struct maps *maps)
 {
-	struct map_rb_node *rb_node;
-	unsigned int i = 0;
 	bool failed = false;
 
 	if (maps__nr_maps(maps) != size) {
 		pr_debug("Expected %d maps, got %d", size, maps__nr_maps(maps));
 		failed = true;
 	} else {
-		maps__for_each_entry(maps, rb_node) {
-			struct map *map = rb_node->map;
-
-			if (map__start(map) != merged[i].start ||
-			    map__end(map) != merged[i].end ||
-			    strcmp(map__dso(map)->name, merged[i].name) ||
-			    refcount_read(map__refcnt(map)) != 1) {
-				failed = true;
-			}
-			i++;
-		}
+		struct check_maps_cb_args args = {
+			.merged = merged,
+			.i = 0,
+		};
+		failed = maps__for_each_map(maps, check_maps_cb, &args);
 	}
 	if (failed) {
 		pr_debug("Expected:\n");
-		for (i = 0; i < size; i++) {
+		for (unsigned int i = 0; i < size; i++) {
 			pr_debug("\tstart: %" PRIu64 " end: %" PRIu64 " name: '%s' refcnt: 1\n",
 				merged[i].start, merged[i].end, merged[i].name);
 		}
 		pr_debug("Got:\n");
-		maps__for_each_entry(maps, rb_node) {
-			struct map *map = rb_node->map;
-
-			pr_debug("\tstart: %" PRIu64 " end: %" PRIu64 " name: '%s' refcnt: %d\n",
-				map__start(map),
-				map__end(map),
-				map__dso(map)->name,
-				refcount_read(map__refcnt(map)));
-		}
+		maps__for_each_map(maps, failed_cb, NULL);
 	}
 	return failed ? TEST_FAIL : TEST_OK;
 }
diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 1078a93b01aa..822f893e67d5 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -112,18 +112,92 @@ static bool is_ignored_symbol(const char *name, char type)
 	return false;
 }
 
+struct test__vmlinux_matches_kallsyms_cb_args {
+	struct machine kallsyms;
+	struct map *vmlinux_map;
+	bool header_printed;
+};
+
+static int test__vmlinux_matches_kallsyms_cb1(struct map *map, void *data)
+{
+	struct test__vmlinux_matches_kallsyms_cb_args *args = data;
+	struct dso *dso = map__dso(map);
+	/*
+	 * If it is the kernel, kallsyms is always "[kernel.kallsyms]", while
+	 * the kernel will have the path for the vmlinux file being used, so use
+	 * the short name, less descriptive but the same ("[kernel]" in both
+	 * cases).
+	 */
+	struct map *pair = maps__find_by_name(args->kallsyms.kmaps,
+					(dso->kernel ? dso->short_name : dso->name));
+
+	if (pair)
+		map__set_priv(pair, 1);
+	else {
+		if (!args->header_printed) {
+			pr_info("WARN: Maps only in vmlinux:\n");
+			args->header_printed = true;
+		}
+		map__fprintf(map, stderr);
+	}
+	return 0;
+}
+
+static int test__vmlinux_matches_kallsyms_cb2(struct map *map, void *data)
+{
+	struct test__vmlinux_matches_kallsyms_cb_args *args = data;
+	struct map *pair;
+	u64 mem_start = map__unmap_ip(args->vmlinux_map, map__start(map));
+	u64 mem_end = map__unmap_ip(args->vmlinux_map, map__end(map));
+
+	pair = maps__find(args->kallsyms.kmaps, mem_start);
+	if (pair == NULL || map__priv(pair))
+		return 0;
+
+	if (map__start(pair) == mem_start) {
+		struct dso *dso = map__dso(map);
+
+		if (!args->header_printed) {
+			pr_info("WARN: Maps in vmlinux with a different name in kallsyms:\n");
+			args->header_printed = true;
+		}
+
+		pr_info("WARN: %" PRIx64 "-%" PRIx64 " %" PRIx64 " %s in kallsyms as",
+			map__start(map), map__end(map), map__pgoff(map), dso->name);
+		if (mem_end != map__end(pair))
+			pr_info(":\nWARN: *%" PRIx64 "-%" PRIx64 " %" PRIx64,
+				map__start(pair), map__end(pair), map__pgoff(pair));
+		pr_info(" %s\n", dso->name);
+		map__set_priv(pair, 1);
+	}
+	return 0;
+}
+
+static int test__vmlinux_matches_kallsyms_cb3(struct map *map, void *data)
+{
+	struct test__vmlinux_matches_kallsyms_cb_args *args = data;
+
+	if (!map__priv(map)) {
+		if (!args->header_printed) {
+			pr_info("WARN: Maps only in kallsyms:\n");
+			args->header_printed = true;
+		}
+		map__fprintf(map, stderr);
+	}
+	return 0;
+}
+
 static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused,
 					int subtest __maybe_unused)
 {
 	int err = TEST_FAIL;
 	struct rb_node *nd;
 	struct symbol *sym;
-	struct map *kallsyms_map, *vmlinux_map;
-	struct map_rb_node *rb_node;
-	struct machine kallsyms, vmlinux;
+	struct map *kallsyms_map;
+	struct machine vmlinux;
 	struct maps *maps;
 	u64 mem_start, mem_end;
-	bool header_printed;
+	struct test__vmlinux_matches_kallsyms_cb_args args;
 
 	/*
 	 * Step 1:
@@ -131,7 +205,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 	 * Init the machines that will hold kernel, modules obtained from
 	 * both vmlinux + .ko files and from /proc/kallsyms split by modules.
 	 */
-	machine__init(&kallsyms, "", HOST_KERNEL_ID);
+	machine__init(&args.kallsyms, "", HOST_KERNEL_ID);
 	machine__init(&vmlinux, "", HOST_KERNEL_ID);
 
 	maps = machine__kernel_maps(&vmlinux);
@@ -143,7 +217,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 	 * load /proc/kallsyms. Also create the modules maps from /proc/modules
 	 * and find the .ko files that match them in /lib/modules/`uname -r`/.
 	 */
-	if (machine__create_kernel_maps(&kallsyms) < 0) {
+	if (machine__create_kernel_maps(&args.kallsyms) < 0) {
 		pr_debug("machine__create_kernel_maps failed");
 		err = TEST_SKIP;
 		goto out;
@@ -160,7 +234,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 	 * be compacted against the list of modules found in the "vmlinux"
 	 * code and with the one got from /proc/modules from the "kallsyms" code.
 	 */
-	if (machine__load_kallsyms(&kallsyms, "/proc/kallsyms") <= 0) {
+	if (machine__load_kallsyms(&args.kallsyms, "/proc/kallsyms") <= 0) {
 		pr_debug("machine__load_kallsyms failed");
 		err = TEST_SKIP;
 		goto out;
@@ -174,7 +248,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 	 * to see if the running kernel was relocated by checking if it has the
 	 * same value in the vmlinux file we load.
 	 */
-	kallsyms_map = machine__kernel_map(&kallsyms);
+	kallsyms_map = machine__kernel_map(&args.kallsyms);
 
 	/*
 	 * Step 5:
@@ -186,7 +260,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 		goto out;
 	}
 
-	vmlinux_map = machine__kernel_map(&vmlinux);
+	args.vmlinux_map = machine__kernel_map(&vmlinux);
 
 	/*
 	 * Step 6:
@@ -213,7 +287,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 	 * in the kallsyms dso. For the ones that are in both, check its names and
 	 * end addresses too.
 	 */
-	map__for_each_symbol(vmlinux_map, sym, nd) {
+	map__for_each_symbol(args.vmlinux_map, sym, nd) {
 		struct symbol *pair, *first_pair;
 
 		sym  = rb_entry(nd, struct symbol, rb_node);
@@ -221,10 +295,10 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 		if (sym->start == sym->end)
 			continue;
 
-		mem_start = map__unmap_ip(vmlinux_map, sym->start);
-		mem_end = map__unmap_ip(vmlinux_map, sym->end);
+		mem_start = map__unmap_ip(args.vmlinux_map, sym->start);
+		mem_end = map__unmap_ip(args.vmlinux_map, sym->end);
 
-		first_pair = machine__find_kernel_symbol(&kallsyms, mem_start, NULL);
+		first_pair = machine__find_kernel_symbol(&args.kallsyms, mem_start, NULL);
 		pair = first_pair;
 
 		if (pair && UM(pair->start) == mem_start) {
@@ -253,7 +327,8 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 				 */
 				continue;
 			} else {
-				pair = machine__find_kernel_symbol_by_name(&kallsyms, sym->name, NULL);
+				pair = machine__find_kernel_symbol_by_name(&args.kallsyms,
+									   sym->name, NULL);
 				if (pair) {
 					if (UM(pair->start) == mem_start)
 						goto next_pair;
@@ -267,7 +342,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 
 				continue;
 			}
-		} else if (mem_start == map__end(kallsyms.vmlinux_map)) {
+		} else if (mem_start == map__end(args.kallsyms.vmlinux_map)) {
 			/*
 			 * Ignore aliases to _etext, i.e. to the end of the kernel text area,
 			 * such as __indirect_thunk_end.
@@ -289,78 +364,18 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
 	if (verbose <= 0)
 		goto out;
 
-	header_printed = false;
-
-	maps__for_each_entry(maps, rb_node) {
-		struct map *map = rb_node->map;
-		struct dso *dso = map__dso(map);
-		/*
-		 * If it is the kernel, kallsyms is always "[kernel.kallsyms]", while
-		 * the kernel will have the path for the vmlinux file being used,
-		 * so use the short name, less descriptive but the same ("[kernel]" in
-		 * both cases.
-		 */
-		struct map *pair = maps__find_by_name(kallsyms.kmaps, (dso->kernel ?
-								dso->short_name :
-								dso->name));
-		if (pair) {
-			map__set_priv(pair, 1);
-		} else {
-			if (!header_printed) {
-				pr_info("WARN: Maps only in vmlinux:\n");
-				header_printed = true;
-			}
-			map__fprintf(map, stderr);
-		}
-	}
-
-	header_printed = false;
-
-	maps__for_each_entry(maps, rb_node) {
-		struct map *pair, *map = rb_node->map;
-
-		mem_start = map__unmap_ip(vmlinux_map, map__start(map));
-		mem_end = map__unmap_ip(vmlinux_map, map__end(map));
+	args.header_printed = false;
+	maps__for_each_map(maps, test__vmlinux_matches_kallsyms_cb1, &args);
 
-		pair = maps__find(kallsyms.kmaps, mem_start);
-		if (pair == NULL || map__priv(pair))
-			continue;
-
-		if (map__start(pair) == mem_start) {
-			struct dso *dso = map__dso(map);
-
-			if (!header_printed) {
-				pr_info("WARN: Maps in vmlinux with a different name in kallsyms:\n");
-				header_printed = true;
-			}
-
-			pr_info("WARN: %" PRIx64 "-%" PRIx64 " %" PRIx64 " %s in kallsyms as",
-				map__start(map), map__end(map), map__pgoff(map), dso->name);
-			if (mem_end != map__end(pair))
-				pr_info(":\nWARN: *%" PRIx64 "-%" PRIx64 " %" PRIx64,
-					map__start(pair), map__end(pair), map__pgoff(pair));
-			pr_info(" %s\n", dso->name);
-			map__set_priv(pair, 1);
-		}
-	}
-
-	header_printed = false;
-
-	maps = machine__kernel_maps(&kallsyms);
+	args.header_printed = false;
+	maps__for_each_map(maps, test__vmlinux_matches_kallsyms_cb2, &args);
 
-	maps__for_each_entry(maps, rb_node) {
-		struct map *map = rb_node->map;
+	args.header_printed = false;
+	maps = machine__kernel_maps(&args.kallsyms);
+	maps__for_each_map(maps, test__vmlinux_matches_kallsyms_cb3, &args);
 
-		if (!map__priv(map)) {
-			if (!header_printed) {
-				pr_info("WARN: Maps only in kallsyms:\n");
-				header_printed = true;
-			}
-			map__fprintf(map, stderr);
-		}
-	}
 out:
-	machine__exit(&kallsyms);
+	machine__exit(&args.kallsyms);
 	machine__exit(&vmlinux);
 	return err;
 }
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b6831a1f909d..3c967295c9a3 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1286,33 +1286,46 @@ static u64 find_entry_trampoline(struct dso *dso)
 #define X86_64_CPU_ENTRY_AREA_SIZE	0x2c000
 #define X86_64_ENTRY_TRAMPOLINE		0x6000
 
+struct machine__map_x86_64_entry_trampolines_args {
+	struct maps *kmaps;
+	bool found;
+};
+
+static int machine__map_x86_64_entry_trampolines_cb(struct map *map, void *data)
+{
+	struct machine__map_x86_64_entry_trampolines_args *args = data;
+	struct map *dest_map;
+	struct kmap *kmap = __map__kmap(map);
+
+	if (!kmap || !is_entry_trampoline(kmap->name))
+		return 0;
+
+	dest_map = maps__find(args->kmaps, map__pgoff(map));
+	if (dest_map != map)
+		map__set_pgoff(map, map__map_ip(dest_map, map__pgoff(map)));
+
+	args->found = true;
+	return 0;
+}
+
 /* Map x86_64 PTI entry trampolines */
 int machine__map_x86_64_entry_trampolines(struct machine *machine,
 					  struct dso *kernel)
 {
-	struct maps *kmaps = machine__kernel_maps(machine);
+	struct machine__map_x86_64_entry_trampolines_args args = {
+		.kmaps = machine__kernel_maps(machine),
+		.found = false,
+	};
 	int nr_cpus_avail, cpu;
-	bool found = false;
-	struct map_rb_node *rb_node;
 	u64 pgoff;
 
 	/*
 	 * In the vmlinux case, pgoff is a virtual address which must now be
 	 * mapped to a vmlinux offset.
 	 */
-	maps__for_each_entry(kmaps, rb_node) {
-		struct map *dest_map, *map = rb_node->map;
-		struct kmap *kmap = __map__kmap(map);
-
-		if (!kmap || !is_entry_trampoline(kmap->name))
-			continue;
+	maps__for_each_map(args.kmaps, machine__map_x86_64_entry_trampolines_cb, &args);
 
-		dest_map = maps__find(kmaps, map__pgoff(map));
-		if (dest_map != map)
-			map__set_pgoff(map, map__map_ip(dest_map, map__pgoff(map)));
-		found = true;
-	}
-	if (found || machine->trampolines_mapped)
+	if (args.found || machine->trampolines_mapped)
 		return 0;
 
 	pgoff = find_entry_trampoline(kernel);
@@ -3395,16 +3408,8 @@ int machine__for_each_dso(struct machine *machine, machine__dso_t fn, void *priv
 int machine__for_each_kernel_map(struct machine *machine, machine__map_t fn, void *priv)
 {
 	struct maps *maps = machine__kernel_maps(machine);
-	struct map_rb_node *pos;
-	int err = 0;
 
-	maps__for_each_entry(maps, pos) {
-		err = fn(pos->map, priv);
-		if (err != 0) {
-			break;
-		}
-	}
-	return err;
+	return maps__for_each_map(maps, fn, priv);
 }
 
 bool machine__is_lock_function(struct machine *machine, u64 addr)
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 9a011aed4b75..00e6589bba10 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -10,6 +10,9 @@
 #include "ui/ui.h"
 #include "unwind.h"
 
+#define maps__for_each_entry(maps, map) \
+	for (map = maps__first(maps); map; map = map_rb_node__next(map))
+
 static void maps__init(struct maps *maps, struct machine *machine)
 {
 	refcount_set(maps__refcnt(maps), 1);
@@ -196,6 +199,21 @@ void maps__put(struct maps *maps)
 		RC_CHK_PUT(maps);
 }
 
+int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
+{
+	struct map_rb_node *pos;
+	int ret = 0;
+
+	down_read(maps__lock(maps));
+	maps__for_each_entry(maps, pos)	{
+		ret = cb(pos->map, data);
+		if (ret)
+			break;
+	}
+	up_read(maps__lock(maps));
+	return ret;
+}
+
 struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
 {
 	struct map *map = maps__find(maps, addr);
@@ -210,31 +228,40 @@ struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
 	return NULL;
 }
 
-struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp)
-{
+struct maps__find_symbol_by_name_args {
+	struct map **mapp;
+	const char *name;
 	struct symbol *sym;
-	struct map_rb_node *pos;
+};
 
-	down_read(maps__lock(maps));
+static int maps__find_symbol_by_name_cb(struct map *map, void *data)
+{
+	struct maps__find_symbol_by_name_args *args = data;
 
-	maps__for_each_entry(maps, pos) {
-		sym = map__find_symbol_by_name(pos->map, name);
+	args->sym = map__find_symbol_by_name(map, args->name);
+	if (!args->sym)
+		return 0;
 
-		if (sym == NULL)
-			continue;
-		if (!map__contains_symbol(pos->map, sym)) {
-			sym = NULL;
-			continue;
-		}
-		if (mapp != NULL)
-			*mapp = pos->map;
-		goto out;
+	if (!map__contains_symbol(map, args->sym)) {
+		args->sym = NULL;
+		return 0;
 	}
 
-	sym = NULL;
-out:
-	up_read(maps__lock(maps));
-	return sym;
+	if (args->mapp != NULL)
+		*args->mapp = map__get(map);
+	return 1;
+}
+
+struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp)
+{
+	struct maps__find_symbol_by_name_args args = {
+		.mapp = mapp,
+		.name = name,
+		.sym = NULL,
+	};
+
+	maps__for_each_map(maps, maps__find_symbol_by_name_cb, &args);
+	return args.sym;
 }
 
 int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
@@ -253,25 +280,34 @@ int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
 	return ams->ms.sym ? 0 : -1;
 }
 
-size_t maps__fprintf(struct maps *maps, FILE *fp)
-{
-	size_t printed = 0;
-	struct map_rb_node *pos;
+struct maps__fprintf_args {
+	FILE *fp;
+	size_t printed;
+};
 
-	down_read(maps__lock(maps));
+static int maps__fprintf_cb(struct map *map, void *data)
+{
+	struct maps__fprintf_args *args = data;
 
-	maps__for_each_entry(maps, pos) {
-		printed += fprintf(fp, "Map:");
-		printed += map__fprintf(pos->map, fp);
-		if (verbose > 2) {
-			printed += dso__fprintf(map__dso(pos->map), fp);
-			printed += fprintf(fp, "--\n");
-		}
+	args->printed += fprintf(args->fp, "Map:");
+	args->printed += map__fprintf(map, args->fp);
+	if (verbose > 2) {
+		args->printed += dso__fprintf(map__dso(map), args->fp);
+		args->printed += fprintf(args->fp, "--\n");
 	}
+	return 0;
+}
 
-	up_read(maps__lock(maps));
+size_t maps__fprintf(struct maps *maps, FILE *fp)
+{
+	struct maps__fprintf_args args = {
+		.fp = fp,
+		.printed = 0,
+	};
+
+	maps__for_each_map(maps, maps__fprintf_cb, &args);
 
-	return printed;
+	return args.printed;
 }
 
 int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index a689149be8c4..8ac30cdaf5bd 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -36,9 +36,6 @@ struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
 struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
 struct map *maps__find(struct maps *maps, u64 addr);
 
-#define maps__for_each_entry(maps, map) \
-	for (map = maps__first(maps); map; map = map_rb_node__next(map))
-
 #define maps__for_each_entry_safe(maps, map, next) \
 	for (map = maps__first(maps), next = map_rb_node__next(map); map; \
 	     map = next, next = map_rb_node__next(map))
@@ -81,6 +78,9 @@ static inline void __maps__zput(struct maps **map)
 
 #define maps__zput(map) __maps__zput(&map)
 
+/* Iterate over map calling cb for each entry. */
+int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
+
 static inline struct rb_root *maps__entries(struct maps *maps)
 {
 	return &RC_CHK_ACCESS(maps)->entries;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 1a5b7fa459b2..a1a796043691 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -149,10 +149,32 @@ static int kernel_get_symbol_address_by_name(const char *name, u64 *addr,
 	return 0;
 }
 
+struct kernel_get_module_map_cb_args {
+	const char *module;
+	struct map *result;
+};
+
+static int kernel_get_module_map_cb(struct map *map, void *data)
+{
+	struct kernel_get_module_map_cb_args *args = data;
+	struct dso *dso = map__dso(map);
+	const char *short_name = dso->short_name; /* short_name is "[module]" */
+	u16 short_name_len =  dso->short_name_len;
+
+	if (strncmp(short_name + 1, args->module, short_name_len - 2) == 0 &&
+	    args->module[short_name_len - 2] == '\0') {
+		args->result = map__get(map);
+		return 1;
+	}
+	return 0;
+}
+
 static struct map *kernel_get_module_map(const char *module)
 {
-	struct maps *maps = machine__kernel_maps(host_machine);
-	struct map_rb_node *pos;
+	struct kernel_get_module_map_cb_args args = {
+		.module = module,
+		.result = NULL,
+	};
 
 	/* A file path -- this is an offline module */
 	if (module && strchr(module, '/'))
@@ -164,19 +186,9 @@ static struct map *kernel_get_module_map(const char *module)
 		return map__get(map);
 	}
 
-	maps__for_each_entry(maps, pos) {
-		/* short_name is "[module]" */
-		struct dso *dso = map__dso(pos->map);
-		const char *short_name = dso->short_name;
-		u16 short_name_len =  dso->short_name_len;
+	maps__for_each_map(machine__kernel_maps(host_machine), kernel_get_module_map_cb, &args);
 
-		if (strncmp(short_name + 1, module,
-			    short_name_len - 2) == 0 &&
-		    module[short_name_len - 2] == '\0') {
-			return map__get(pos->map);
-		}
-	}
-	return NULL;
+	return args.result;
 }
 
 struct map *get_target_map(const char *target, struct nsinfo *nsi, bool user)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1cc42b8d8afb..72f03b875478 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1114,33 +1114,35 @@ int compare_proc_modules(const char *from, const char *to)
 	return ret;
 }
 
+static int do_validate_kcore_modules_cb(struct map *old_map, void *data)
+{
+	struct rb_root *modules = data;
+	struct module_info *mi;
+	struct dso *dso;
+
+	if (!__map__is_kmodule(old_map))
+		return 0;
+
+	dso = map__dso(old_map);
+	/* Module must be in memory at the same address */
+	mi = find_module(dso->short_name, modules);
+	if (!mi || mi->start != map__start(old_map))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int do_validate_kcore_modules(const char *filename, struct maps *kmaps)
 {
 	struct rb_root modules = RB_ROOT;
-	struct map_rb_node *old_node;
 	int err;
 
 	err = read_proc_modules(filename, &modules);
 	if (err)
 		return err;
 
-	maps__for_each_entry(kmaps, old_node) {
-		struct map *old_map = old_node->map;
-		struct module_info *mi;
-		struct dso *dso;
+	err = maps__for_each_map(kmaps, do_validate_kcore_modules_cb, &modules);
 
-		if (!__map__is_kmodule(old_map)) {
-			continue;
-		}
-		dso = map__dso(old_map);
-		/* Module must be in memory at the same address */
-		mi = find_module(dso->short_name, &modules);
-		if (!mi || mi->start != map__start(old_map)) {
-			err = -EINVAL;
-			goto out;
-		}
-	}
-out:
 	delete_modules(&modules);
 	return err;
 }
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 7cc38f2a0e9e..cdab6aa04917 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -666,18 +666,74 @@ int perf_event__synthesize_cgroups(struct perf_tool *tool __maybe_unused,
 }
 #endif
 
+struct perf_event__synthesize_modules_maps_cb_args {
+	struct perf_tool *tool;
+	perf_event__handler_t process;
+	struct machine *machine;
+	union perf_event *event;
+};
+
+static int perf_event__synthesize_modules_maps_cb(struct map *map, void *data)
+{
+	struct perf_event__synthesize_modules_maps_cb_args *args = data;
+	union perf_event *event = args->event;
+	struct dso *dso;
+	size_t size;
+
+	if (!__map__is_kmodule(map))
+		return 0;
+
+	dso = map__dso(map);
+	if (symbol_conf.buildid_mmap2) {
+		size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
+		event->mmap2.header.type = PERF_RECORD_MMAP2;
+		event->mmap2.header.size = (sizeof(event->mmap2) -
+					(sizeof(event->mmap2.filename) - size));
+		memset(event->mmap2.filename + size, 0, args->machine->id_hdr_size);
+		event->mmap2.header.size += args->machine->id_hdr_size;
+		event->mmap2.start = map__start(map);
+		event->mmap2.len   = map__size(map);
+		event->mmap2.pid   = args->machine->pid;
+
+		memcpy(event->mmap2.filename, dso->long_name, dso->long_name_len + 1);
+
+		perf_record_mmap2__read_build_id(&event->mmap2, args->machine, false);
+	} else {
+		size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
+		event->mmap.header.type = PERF_RECORD_MMAP;
+		event->mmap.header.size = (sizeof(event->mmap) -
+					(sizeof(event->mmap.filename) - size));
+		memset(event->mmap.filename + size, 0, args->machine->id_hdr_size);
+		event->mmap.header.size += args->machine->id_hdr_size;
+		event->mmap.start = map__start(map);
+		event->mmap.len   = map__size(map);
+		event->mmap.pid   = args->machine->pid;
+
+		memcpy(event->mmap.filename, dso->long_name, dso->long_name_len + 1);
+	}
+
+	if (perf_tool__process_synth_event(args->tool, event, args->machine, args->process) != 0)
+		return -1;
+
+	return 0;
+}
+
 int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
 				   struct machine *machine)
 {
-	int rc = 0;
-	struct map_rb_node *pos;
+	int rc;
 	struct maps *maps = machine__kernel_maps(machine);
-	union perf_event *event;
-	size_t size = symbol_conf.buildid_mmap2 ?
-			sizeof(event->mmap2) : sizeof(event->mmap);
+	struct perf_event__synthesize_modules_maps_cb_args args = {
+		.tool = tool,
+		.process = process,
+		.machine = machine,
+	};
+	size_t size = symbol_conf.buildid_mmap2
+		? sizeof(args.event->mmap2)
+		: sizeof(args.event->mmap);
 
-	event = zalloc(size + machine->id_hdr_size);
-	if (event == NULL) {
+	args.event = zalloc(size + machine->id_hdr_size);
+	if (args.event == NULL) {
 		pr_debug("Not enough memory synthesizing mmap event "
 			 "for kernel modules\n");
 		return -1;
@@ -688,53 +744,13 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
 	 * __perf_event_mmap
 	 */
 	if (machine__is_host(machine))
-		event->header.misc = PERF_RECORD_MISC_KERNEL;
+		args.event->header.misc = PERF_RECORD_MISC_KERNEL;
 	else
-		event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
-
-	maps__for_each_entry(maps, pos) {
-		struct map *map = pos->map;
-		struct dso *dso;
+		args.event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
 
-		if (!__map__is_kmodule(map))
-			continue;
+	rc = maps__for_each_map(maps, perf_event__synthesize_modules_maps_cb, &args);
 
-		dso = map__dso(map);
-		if (symbol_conf.buildid_mmap2) {
-			size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
-			event->mmap2.header.type = PERF_RECORD_MMAP2;
-			event->mmap2.header.size = (sizeof(event->mmap2) -
-						(sizeof(event->mmap2.filename) - size));
-			memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
-			event->mmap2.header.size += machine->id_hdr_size;
-			event->mmap2.start = map__start(map);
-			event->mmap2.len   = map__size(map);
-			event->mmap2.pid   = machine->pid;
-
-			memcpy(event->mmap2.filename, dso->long_name, dso->long_name_len + 1);
-
-			perf_record_mmap2__read_build_id(&event->mmap2, machine, false);
-		} else {
-			size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
-			event->mmap.header.type = PERF_RECORD_MMAP;
-			event->mmap.header.size = (sizeof(event->mmap) -
-						(sizeof(event->mmap.filename) - size));
-			memset(event->mmap.filename + size, 0, machine->id_hdr_size);
-			event->mmap.header.size += machine->id_hdr_size;
-			event->mmap.start = map__start(map);
-			event->mmap.len   = map__size(map);
-			event->mmap.pid   = machine->pid;
-
-			memcpy(event->mmap.filename, dso->long_name, dso->long_name_len + 1);
-		}
-
-		if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
-			rc = -1;
-			break;
-		}
-	}
-
-	free(event);
+	free(args.event);
 	return rc;
 }
 
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index b9c2039c4230..b6986a81aa6d 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -349,34 +349,33 @@ int thread__insert_map(struct thread *thread, struct map *map)
 	return maps__insert(thread__maps(thread), map);
 }
 
-static int __thread__prepare_access(struct thread *thread)
+struct thread__prepare_access_maps_cb_args {
+	int err;
+	struct maps *maps;
+};
+
+static int thread__prepare_access_maps_cb(struct map *map, void *data)
 {
 	bool initialized = false;
-	int err = 0;
-	struct maps *maps = thread__maps(thread);
-	struct map_rb_node *rb_node;
-
-	down_read(maps__lock(maps));
-
-	maps__for_each_entry(maps, rb_node) {
-		err = unwind__prepare_access(thread__maps(thread), rb_node->map, &initialized);
-		if (err || initialized)
-			break;
-	}
+	struct thread__prepare_access_maps_cb_args *args = data;
 
-	up_read(maps__lock(maps));
+	args->err = unwind__prepare_access(args->maps, map, &initialized);
 
-	return err;
+	return (args->err || initialized) ? 1 : 0;
 }
 
 static int thread__prepare_access(struct thread *thread)
 {
-	int err = 0;
+	struct thread__prepare_access_maps_cb_args args = {
+		.err = 0,
+	};
 
-	if (dwarf_callchain_users)
-		err = __thread__prepare_access(thread);
+	if (dwarf_callchain_users) {
+		args.maps = thread__maps(thread);
+		maps__for_each_map(thread__maps(thread), thread__prepare_access_maps_cb, &args);
+	}
 
-	return err;
+	return args.err;
 }
 
 static int thread__clone_maps(struct thread *thread, struct thread *parent, bool do_maps_clone)
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index c0641882fd2f..228f1565bd0b 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -302,12 +302,31 @@ static int unwind_spec_ehframe(struct dso *dso, struct machine *machine,
 	return 0;
 }
 
+struct read_unwind_spec_eh_frame_maps_cb_args {
+	struct dso *dso;
+	u64 base_addr;
+};
+
+static int read_unwind_spec_eh_frame_maps_cb(struct map *map, void *data)
+{
+
+	struct read_unwind_spec_eh_frame_maps_cb_args *args = data;
+
+	if (map__dso(map) == args->dso && map__start(map) < args->base_addr)
+		args->base_addr = map__start(map);
+
+	return 0;
+}
+
+
 static int read_unwind_spec_eh_frame(struct dso *dso, struct unwind_info *ui,
 				     u64 *table_data, u64 *segbase,
 				     u64 *fde_count)
 {
-	struct map_rb_node *map_node;
-	u64 base_addr = UINT64_MAX;
+	struct read_unwind_spec_eh_frame_maps_cb_args args = {
+		.dso = dso,
+		.base_addr = UINT64_MAX,
+	};
 	int ret, fd;
 
 	if (dso->data.eh_frame_hdr_offset == 0) {
@@ -325,16 +344,11 @@ static int read_unwind_spec_eh_frame(struct dso *dso, struct unwind_info *ui,
 			return -EINVAL;
 	}
 
-	maps__for_each_entry(thread__maps(ui->thread), map_node) {
-		struct map *map = map_node->map;
-		u64 start = map__start(map);
+	maps__for_each_map(thread__maps(ui->thread), read_unwind_spec_eh_frame_maps_cb, &args);
 
-		if (map__dso(map) == dso && start < base_addr)
-			base_addr = start;
-	}
-	base_addr -= dso->data.elf_base_addr;
+	args.base_addr -= dso->data.elf_base_addr;
 	/* Address of .eh_frame_hdr */
-	*segbase = base_addr + dso->data.eh_frame_hdr_addr;
+	*segbase = args.base_addr + dso->data.eh_frame_hdr_addr;
 	ret = unwind_spec_ehframe(dso, ui->machine, dso->data.eh_frame_hdr_offset,
 				   table_data, fde_count);
 	if (ret)
diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index ae3eee69b659..df8963796187 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -140,23 +140,34 @@ static struct dso *__machine__addnew_vdso(struct machine *machine, const char *s
 	return dso;
 }
 
+struct machine__thread_dso_type_maps_cb_args {
+	struct machine *machine;
+	enum dso_type dso_type;
+};
+
+static int machine__thread_dso_type_maps_cb(struct map *map, void *data)
+{
+	struct machine__thread_dso_type_maps_cb_args *args = data;
+	struct dso *dso = map__dso(map);
+
+	if (!dso || dso->long_name[0] != '/')
+		return 0;
+
+	args->dso_type = dso__type(dso, args->machine);
+	return (args->dso_type != DSO__TYPE_UNKNOWN) ? 1 : 0;
+}
+
 static enum dso_type machine__thread_dso_type(struct machine *machine,
 					      struct thread *thread)
 {
-	enum dso_type dso_type = DSO__TYPE_UNKNOWN;
-	struct map_rb_node *rb_node;
-
-	maps__for_each_entry(thread__maps(thread), rb_node) {
-		struct dso *dso = map__dso(rb_node->map);
+	struct machine__thread_dso_type_maps_cb_args args = {
+		.machine = machine,
+		.dso_type = DSO__TYPE_UNKNOWN,
+	};
 
-		if (!dso || dso->long_name[0] != '/')
-			continue;
-		dso_type = dso__type(dso, machine);
-		if (dso_type != DSO__TYPE_UNKNOWN)
-			break;
-	}
+	maps__for_each_map(thread__maps(thread), machine__thread_dso_type_maps_cb, &args);
 
-	return dso_type;
+	return args.dso_type;
 }
 
 #if BITS_PER_LONG == 64
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 19/53] perf maps: Add remove maps function to remove a map based on callback
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (17 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 18/53] perf maps: Add maps__for_each_map to call a function on each entry Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 20/53] perf debug: Expose debug file Ian Rogers
                   ` (33 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Removing maps wasn't being done under the write lock. Similar to
maps__for_each_map, iterate the entries, but in this case remove an
entry when the callback returns true. If any entry was removed then
maps_by_name also needs updating, so add the previously missed flush.

In dso__load_kcore, the test of the map to save would always be false
with REFCNT_CHECKING because of a missing RC_CHK_ACCESS.
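
As an illustrative sketch of the intended callback style (the predicate
below is hypothetical, not part of this patch), a caller supplies a
function returning true for entries to remove:

  static bool remove_anon_maps_cb(struct map *map, void *data __maybe_unused)
  {
  	/* Hypothetical criterion: drop maps with no backing dso. */
  	return map__dso(map) == NULL;
  }

  	/* Removes matching entries while holding the write lock. */
  	maps__remove_maps(maps, remove_anon_maps_cb, NULL);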

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/maps.c   | 24 ++++++++++++++++++++++++
 tools/perf/util/maps.h   |  6 ++----
 tools/perf/util/symbol.c | 24 ++++++++++++------------
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 00e6589bba10..f13fd3a9686b 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -13,6 +13,10 @@
 #define maps__for_each_entry(maps, map) \
 	for (map = maps__first(maps); map; map = map_rb_node__next(map))
 
+#define maps__for_each_entry_safe(maps, map, next) \
+	for (map = maps__first(maps), next = map_rb_node__next(map); map; \
+	     map = next, next = map_rb_node__next(map))
+
 static void maps__init(struct maps *maps, struct machine *machine)
 {
 	refcount_set(maps__refcnt(maps), 1);
@@ -214,6 +218,26 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
 	return ret;
 }
 
+void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data)
+{
+	struct map_rb_node *pos, *next;
+	unsigned int start_nr_maps;
+
+	down_write(maps__lock(maps));
+
+	start_nr_maps = maps__nr_maps(maps);
+	maps__for_each_entry_safe(maps, pos, next)	{
+		if (cb(pos->map, data)) {
+			__maps__remove(maps, pos);
+			--RC_CHK_ACCESS(maps)->nr_maps;
+		}
+	}
+	if (maps__maps_by_name(maps) && start_nr_maps != maps__nr_maps(maps))
+		__maps__free_maps_by_name(maps);
+
+	up_write(maps__lock(maps));
+}
+
 struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
 {
 	struct map *map = maps__find(maps, addr);
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 8ac30cdaf5bd..b94ad5c8fea7 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -36,10 +36,6 @@ struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
 struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
 struct map *maps__find(struct maps *maps, u64 addr);
 
-#define maps__for_each_entry_safe(maps, map, next) \
-	for (map = maps__first(maps), next = map_rb_node__next(map); map; \
-	     map = next, next = map_rb_node__next(map))
-
 DECLARE_RC_STRUCT(maps) {
 	struct rb_root      entries;
 	struct rw_semaphore lock;
@@ -80,6 +76,8 @@ static inline void __maps__zput(struct maps **map)
 
 /* Iterate over map calling cb for each entry. */
 int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
+/* Iterate over map removing an entry if cb returns true. */
+void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);
 
 static inline struct rb_root *maps__entries(struct maps *maps)
 {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 72f03b875478..30da8a405d11 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1239,13 +1239,23 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
 	return 0;
 }
 
+static bool remove_old_maps(struct map *map, void *data)
+{
+	const struct map *map_to_save = data;
+
+	/*
+	 * We need to preserve eBPF maps even if they are covered by kcore,
+	 * because we need to access eBPF dso for source data.
+	 */
+	return RC_CHK_ACCESS(map) != RC_CHK_ACCESS(map_to_save) && !__map__is_bpf_prog(map);
+}
+
 static int dso__load_kcore(struct dso *dso, struct map *map,
 			   const char *kallsyms_filename)
 {
 	struct maps *kmaps = map__kmaps(map);
 	struct kcore_mapfn_data md;
 	struct map *replacement_map = NULL;
-	struct map_rb_node *old_node, *next;
 	struct machine *machine;
 	bool is_64_bit;
 	int err, fd;
@@ -1292,17 +1302,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 	}
 
 	/* Remove old maps */
-	maps__for_each_entry_safe(kmaps, old_node, next) {
-		struct map *old_map = old_node->map;
-
-		/*
-		 * We need to preserve eBPF maps even if they are
-		 * covered by kcore, because we need to access
-		 * eBPF dso for source data.
-		 */
-		if (old_map != map && !__map__is_bpf_prog(old_map))
-			maps__remove(kmaps, old_map);
-	}
+	maps__remove_maps(kmaps, remove_old_maps, map);
 	machine->trampolines_mapped = false;
 
 	/* Find the kernel map using the '_stext' symbol */
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 20/53] perf debug: Expose debug file
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (18 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 19/53] perf maps: Add remove maps function to remove a map based on callback Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 21/53] perf maps: Refactor maps__fixup_overlappings Ian Rogers
                   ` (32 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Some dumping callbacks need to be passed a FILE*. Expose the debug file
via an accessor API to provide a consistent way to do this. Catch the
unlikely failure of it not being set. Switch two cases that were using
stderr instead of debug_file.
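
A minimal usage sketch (the helper below is hypothetical): a dumping
callback can fetch the stream from the accessor rather than having a
FILE* threaded through to it:

  /* Hypothetical dump helper writing to the shared debug stream. */
  static void dump_state(void)
  {
  	fprintf(debug_file(), "dumping state\n");
  }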

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/debug.c | 22 +++++++++++++++-------
 tools/perf/util/debug.h |  1 +
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 88378c4c5dd9..e282b4ceb4d2 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -38,12 +38,21 @@ bool dump_trace = false, quiet = false;
 int debug_ordered_events;
 static int redirect_to_stderr;
 int debug_data_convert;
-static FILE *debug_file;
+static FILE *_debug_file;
 bool debug_display_time;
 
+FILE *debug_file(void)
+{
+	if (!_debug_file) {
+		pr_warning_once("debug_file not set");
+		debug_set_file(stderr);
+	}
+	return _debug_file;
+}
+
 void debug_set_file(FILE *file)
 {
-	debug_file = file;
+	_debug_file = file;
 }
 
 void debug_set_display_time(bool set)
@@ -78,8 +87,8 @@ int veprintf(int level, int var, const char *fmt, va_list args)
 		if (use_browser >= 1 && !redirect_to_stderr) {
 			ui_helpline__vshow(fmt, args);
 		} else {
-			ret = fprintf_time(debug_file);
-			ret += vfprintf(debug_file, fmt, args);
+			ret = fprintf_time(debug_file());
+			ret += vfprintf(debug_file(), fmt, args);
 		}
 	}
 
@@ -107,9 +116,8 @@ static int veprintf_time(u64 t, const char *fmt, va_list args)
 	nsecs -= secs  * NSEC_PER_SEC;
 	usecs  = nsecs / NSEC_PER_USEC;
 
-	ret = fprintf(stderr, "[%13" PRIu64 ".%06" PRIu64 "] ",
-		      secs, usecs);
-	ret += vfprintf(stderr, fmt, args);
+	ret = fprintf(debug_file(), "[%13" PRIu64 ".%06" PRIu64 "] ", secs, usecs);
+	ret += vfprintf(debug_file(), fmt, args);
 	return ret;
 }
 
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index f99468a7f681..de8870980d44 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -77,6 +77,7 @@ int eprintf_time(int level, int var, u64 t, const char *fmt, ...) __printf(4, 5)
 int veprintf(int level, int var, const char *fmt, va_list args);
 
 int perf_debug_option(const char *str);
+FILE *debug_file(void);
 void debug_set_file(FILE *file);
 void debug_set_display_time(bool set);
 void perf_debug_setup(void);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 21/53] perf maps: Refactor maps__fixup_overlappings
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (19 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 20/53] perf debug: Expose debug file Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 22/53] perf maps: Do simple merge if given map doesn't overlap Ian Rogers
                   ` (31 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Rename maps__fixup_overlappings to maps__fixup_overlap_and_insert, as
the given mapping is always inserted. Factor out first_ending_after as
a utility function. Make minor variable name changes. Switch to using
debug_file() rather than passing a debug FILE*.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/maps.c   | 62 ++++++++++++++++++++++++----------------
 tools/perf/util/maps.h   |  2 +-
 tools/perf/util/thread.c |  3 +-
 3 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index f13fd3a9686b..40df08dd9bf3 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -334,20 +334,16 @@ size_t maps__fprintf(struct maps *maps, FILE *fp)
 	return args.printed;
 }
 
-int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
+/*
+ * Find first map where end > new->start.
+ * Same as find_vma() in kernel.
+ */
+static struct rb_node *first_ending_after(struct maps *maps, const struct map *map)
 {
 	struct rb_root *root;
 	struct rb_node *next, *first;
-	int err = 0;
-
-	down_write(maps__lock(maps));
 
 	root = maps__entries(maps);
-
-	/*
-	 * Find first map where end > map->start.
-	 * Same as find_vma() in kernel.
-	 */
 	next = root->rb_node;
 	first = NULL;
 	while (next) {
@@ -361,8 +357,22 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
 		} else
 			next = next->rb_right;
 	}
+	return first;
+}
+
+/*
+ * Adds new to maps, if new overlaps existing entries then the existing maps are
+ * adjusted or removed so that new fits without overlapping any entries.
+ */
+int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
+{
+
+	struct rb_node *next;
+	int err = 0;
+
+	down_write(maps__lock(maps));
 
-	next = first;
+	next = first_ending_after(maps, new);
 	while (next && !err) {
 		struct map_rb_node *pos = rb_entry(next, struct map_rb_node, rb_node);
 		next = rb_next(&pos->rb_node);
@@ -371,27 +381,27 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
 		 * Stop if current map starts after map->end.
 		 * Maps are ordered by start: next will not overlap for sure.
 		 */
-		if (map__start(pos->map) >= map__end(map))
+		if (map__start(pos->map) >= map__end(new))
 			break;
 
 		if (verbose >= 2) {
 
 			if (use_browser) {
 				pr_debug("overlapping maps in %s (disable tui for more info)\n",
-					 map__dso(map)->name);
+					 map__dso(new)->name);
 			} else {
-				fputs("overlapping maps:\n", fp);
-				map__fprintf(map, fp);
-				map__fprintf(pos->map, fp);
+				pr_debug("overlapping maps:\n");
+				map__fprintf(new, debug_file());
+				map__fprintf(pos->map, debug_file());
 			}
 		}
 
-		rb_erase_init(&pos->rb_node, root);
+		rb_erase_init(&pos->rb_node, maps__entries(maps));
 		/*
 		 * Now check if we need to create new maps for areas not
 		 * overlapped by the new map:
 		 */
-		if (map__start(map) > map__start(pos->map)) {
+		if (map__start(new) > map__start(pos->map)) {
 			struct map *before = map__clone(pos->map);
 
 			if (before == NULL) {
@@ -399,7 +409,7 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
 				goto put_map;
 			}
 
-			map__set_end(before, map__start(map));
+			map__set_end(before, map__start(new));
 			err = __maps__insert(maps, before);
 			if (err) {
 				map__put(before);
@@ -407,11 +417,11 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
 			}
 
 			if (verbose >= 2 && !use_browser)
-				map__fprintf(before, fp);
+				map__fprintf(before, debug_file());
 			map__put(before);
 		}
 
-		if (map__end(map) < map__end(pos->map)) {
+		if (map__end(new) < map__end(pos->map)) {
 			struct map *after = map__clone(pos->map);
 
 			if (after == NULL) {
@@ -419,23 +429,25 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
 				goto put_map;
 			}
 
-			map__set_start(after, map__end(map));
-			map__add_pgoff(after, map__end(map) - map__start(pos->map));
-			assert(map__map_ip(pos->map, map__end(map)) ==
-				map__map_ip(after, map__end(map)));
+			map__set_start(after, map__end(new));
+			map__add_pgoff(after, map__end(new) - map__start(pos->map));
+			assert(map__map_ip(pos->map, map__end(new)) ==
+				map__map_ip(after, map__end(new)));
 			err = __maps__insert(maps, after);
 			if (err) {
 				map__put(after);
 				goto put_map;
 			}
 			if (verbose >= 2 && !use_browser)
-				map__fprintf(after, fp);
+				map__fprintf(after, debug_file());
 			map__put(after);
 		}
 put_map:
 		map__put(pos->map);
 		free(pos);
 	}
+	/* Add the map. */
+	err = __maps__insert(maps, new);
 	up_write(maps__lock(maps));
 	return err;
 }
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index b94ad5c8fea7..62e94d443c02 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -133,7 +133,7 @@ struct addr_map_symbol;
 
 int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams);
 
-int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp);
+int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new);
 
 struct map *maps__find_by_name(struct maps *maps, const char *name);
 
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index b6986a81aa6d..3d47b5c5528b 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -345,8 +345,7 @@ int thread__insert_map(struct thread *thread, struct map *map)
 	if (ret)
 		return ret;
 
-	maps__fixup_overlappings(thread__maps(thread), map, stderr);
-	return maps__insert(thread__maps(thread), map);
+	return maps__fixup_overlap_and_insert(thread__maps(thread), map);
 }
 
 struct thread__prepare_access_maps_cb_args {
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 22/53] perf maps: Do simple merge if given map doesn't overlap
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (20 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 21/53] perf maps: Refactor maps__fixup_overlappings Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 23/53] perf maps: Rename clone to copy from Ian Rogers
                   ` (30 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Simplify maps__merge_in for the simple case of a non-overlapping map.
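
For example (addresses invented for illustration): if kmaps holds only
an entry covering [0x1000, 0x2000) and new_map covers [0x3000, 0x4000),
no existing entry ends after the start of new_map, so first_ending_after
returns NULL and the new path falls through to a plain maps__insert
without walking the existing entries.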

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/maps.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 40df08dd9bf3..14e1a169433d 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -696,9 +696,20 @@ void maps__fixup_end(struct maps *maps)
 int maps__merge_in(struct maps *kmaps, struct map *new_map)
 {
 	struct map_rb_node *rb_node;
+	struct rb_node *first;
+	bool overlaps;
 	LIST_HEAD(merged);
 	int err = 0;
 
+	down_read(maps__lock(kmaps));
+	first = first_ending_after(kmaps, new_map);
+	overlaps = first &&
+		map__start(rb_entry(first, struct map_rb_node, rb_node)->map) < map__end(new_map);
+	up_read(maps__lock(kmaps));
+
+	if (!overlaps)
+		return maps__insert(kmaps, new_map);
+
 	maps__for_each_entry(kmaps, rb_node) {
 		struct map *old_map = rb_node->map;
 
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 23/53] perf maps: Rename clone to copy from
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (21 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 22/53] perf maps: Do simple merge if given map doesn't overlap Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 24/53] perf maps: Add maps__load_first Ian Rogers
                   ` (29 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Rename maps__clone to maps__copy_from to better reveal the intent of
its behavior. Pass the underlying maps rather than the thread.
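
Call sites change accordingly; for example, in thread__clone_maps below,
maps__clone(thread, thread__maps(parent)) becomes
maps__copy_from(thread__maps(thread), thread__maps(parent)).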

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 2 +-
 tools/perf/util/maps.c    | 6 +-----
 tools/perf/util/maps.h    | 3 +--
 tools/perf/util/thread.c  | 2 +-
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3c967295c9a3..191e492539e5 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -454,7 +454,7 @@ static struct thread *findnew_guest_code(struct machine *machine,
 	 * Guest code can be found in hypervisor process at the same address
 	 * so copy host maps.
 	 */
-	err = maps__clone(thread, thread__maps(host_thread));
+	err = maps__copy_from(thread__maps(thread), thread__maps(host_thread));
 	thread__put(host_thread);
 	if (err)
 		goto out_err;
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 14e1a169433d..85bea2a6dca9 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -452,12 +452,8 @@ int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
 	return err;
 }
 
-/*
- * XXX This should not really _copy_ te maps, but refcount them.
- */
-int maps__clone(struct thread *thread, struct maps *parent)
+int maps__copy_from(struct maps *maps, struct maps *parent)
 {
-	struct maps *maps = thread__maps(thread);
 	int err;
 	struct map_rb_node *rb_node;
 
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 62e94d443c02..e4a49d6ff5cf 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -14,7 +14,6 @@ struct ref_reloc_sym;
 struct machine;
 struct map;
 struct maps;
-struct thread;
 
 struct map_rb_node {
 	struct rb_node rb_node;
@@ -61,7 +60,7 @@ struct kmap {
 
 struct maps *maps__new(struct machine *machine);
 bool maps__empty(struct maps *maps);
-int maps__clone(struct thread *thread, struct maps *parent);
+int maps__copy_from(struct maps *maps, struct maps *parent);
 
 struct maps *maps__get(struct maps *maps);
 void maps__put(struct maps *maps);
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 3d47b5c5528b..89c47a5098e2 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -390,7 +390,7 @@ static int thread__clone_maps(struct thread *thread, struct thread *parent, bool
 		return 0;
 	}
 	/* But this one is new process, copy maps. */
-	return do_maps_clone ? maps__clone(thread, thread__maps(parent)) : 0;
+	return do_maps_clone ? maps__copy_from(thread__maps(thread), thread__maps(parent)) : 0;
 }
 
 int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp, bool do_maps_clone)
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 24/53] perf maps: Add maps__load_first
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (22 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 23/53] perf maps: Rename clone to copy from Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 25/53] perf maps: Add find next entry to give entry after the given map Ian Rogers
                   ` (28 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Avoid bpf_lock_contention touching the internal maps data structure by
adding a helper function. As access is done directly on the first map
in maps, hold the read lock to stop it being removed.
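
The bpf_lock_contention call site below then reads
maps__load_first(machine->kmaps) rather than reaching through
maps__first(machine->kmaps)->map, keeping the rb-tree access behind the
maps API and under its read lock.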

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/bpf_lock_contention.c |  2 +-
 tools/perf/util/maps.c                | 13 +++++++++++++
 tools/perf/util/maps.h                |  2 ++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index e105245eb905..d9720a910330 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -317,7 +317,7 @@ int lock_contention_read(struct lock_contention *con)
 	}
 
 	/* make sure it loads the kernel map */
-	map__load(maps__first(machine->kmaps)->map);
+	maps__load_first(machine->kmaps);
 
 	prev_key = NULL;
 	while (!bpf_map_get_next_key(fd, prev_key, &key)) {
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 85bea2a6dca9..9a84d26328a7 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -792,3 +792,16 @@ int maps__merge_in(struct maps *kmaps, struct map *new_map)
 	}
 	return err;
 }
+
+void maps__load_first(struct maps *maps)
+{
+	struct map_rb_node *first;
+
+	down_read(maps__lock(maps));
+
+	first = maps__first(maps);
+	if (first)
+		map__load(first->map);
+
+	up_read(maps__lock(maps));
+}
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index e4a49d6ff5cf..b7ab3ec61b7c 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -142,4 +142,6 @@ void __maps__sort_by_name(struct maps *maps);
 
 void maps__fixup_end(struct maps *maps);
 
+void maps__load_first(struct maps *maps);
+
 #endif // __PERF_MAPS_H
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 25/53] perf maps: Add find next entry to give entry after the given map
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (23 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 24/53] perf maps: Add maps__load_first Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 26/53] perf maps: Reduce scope of map_rb_node and maps internals Ian Rogers
                   ` (27 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Use it to remove map_rb_node usage from machine.c.
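
In machine__create_kernel_maps below, this turns the two-step
maps__find_node() and map_rb_node__next() lookup into a single
maps__find_next_entry() call when extending the kernel map's end to the
start of the adjacent module map.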

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c |  7 +++----
 tools/perf/util/maps.c    | 11 +++++++++++
 tools/perf/util/maps.h    |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 191e492539e5..ab345604f274 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1759,12 +1759,11 @@ int machine__create_kernel_maps(struct machine *machine)
 
 	if (end == ~0ULL) {
 		/* update end address of the kernel map using adjacent module address */
-		struct map_rb_node *rb_node = maps__find_node(machine__kernel_maps(machine),
-							machine__kernel_map(machine));
-		struct map_rb_node *next = map_rb_node__next(rb_node);
+		struct map *next = maps__find_next_entry(machine__kernel_maps(machine),
+							 machine__kernel_map(machine));
 
 		if (next)
-			machine__set_kernel_mmap(machine, start, map__start(next->map));
+			machine__set_kernel_mmap(machine, start, map__start(next));
 	}
 
 out_put:
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 9a84d26328a7..38d56709bd5e 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -662,6 +662,17 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
 	return map;
 }
 
+struct map *maps__find_next_entry(struct maps *maps, struct map *map)
+{
+	struct map_rb_node *rb_node = maps__find_node(maps, map);
+	struct map_rb_node *next = map_rb_node__next(rb_node);
+
+	if (next)
+		return next->map;
+
+	return NULL;
+}
+
 void maps__fixup_end(struct maps *maps)
 {
 	struct map_rb_node *prev = NULL, *curr;
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index b7ab3ec61b7c..84b42c8456e8 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -136,6 +136,8 @@ int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new);
 
 struct map *maps__find_by_name(struct maps *maps, const char *name);
 
+struct map *maps__find_next_entry(struct maps *maps, struct map *map);
+
 int maps__merge_in(struct maps *kmaps, struct map *new_map);
 
 void __maps__sort_by_name(struct maps *maps);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 26/53] perf maps: Reduce scope of map_rb_node and maps internals
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (24 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 25/53] perf maps: Add find next entry to give entry after the given map Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 27/53] perf maps: Fix up overlaps during fixup_end Ian Rogers
                   ` (26 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Avoid exposing the implementation of maps so that the internals can be
refactored.
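
The pattern is plain C encapsulation: accessors that were static inline
in maps.h move into maps.c as static functions, so only maps.c sees the
struct layout. A generic sketch of the idea (hypothetical names, not
this patch's code):

	/* foo.h: the type is opaque to users. */
	struct foo;
	void foo__frob(struct foo *f);

	/* foo.c: layout and accessors are private and free to change. */
	struct foo { int value; };

	static int foo__value(struct foo *f)
	{
		return f->value;
	}

	void foo__frob(struct foo *f)
	{
		f->value = foo__value(f) + 1;
	}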

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/maps.c | 90 ++++++++++++++++++++++++++----------------
 tools/perf/util/maps.h | 23 -----------
 2 files changed, 55 insertions(+), 58 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 38d56709bd5e..01c15d0b300a 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -10,6 +10,11 @@
 #include "ui/ui.h"
 #include "unwind.h"
 
+struct map_rb_node {
+	struct rb_node rb_node;
+	struct map *map;
+};
+
 #define maps__for_each_entry(maps, map) \
 	for (map = maps__first(maps); map; map = map_rb_node__next(map))
 
@@ -17,6 +22,56 @@
 	for (map = maps__first(maps), next = map_rb_node__next(map); map; \
 	     map = next, next = map_rb_node__next(map))
 
+static struct rb_root *maps__entries(struct maps *maps)
+{
+	return &RC_CHK_ACCESS(maps)->entries;
+}
+
+static struct rw_semaphore *maps__lock(struct maps *maps)
+{
+	return &RC_CHK_ACCESS(maps)->lock;
+}
+
+static struct map **maps__maps_by_name(struct maps *maps)
+{
+	return RC_CHK_ACCESS(maps)->maps_by_name;
+}
+
+static struct map_rb_node *maps__first(struct maps *maps)
+{
+	struct rb_node *first = rb_first(maps__entries(maps));
+
+	if (first)
+		return rb_entry(first, struct map_rb_node, rb_node);
+	return NULL;
+}
+
+static struct map_rb_node *map_rb_node__next(struct map_rb_node *node)
+{
+	struct rb_node *next;
+
+	if (!node)
+		return NULL;
+
+	next = rb_next(&node->rb_node);
+
+	if (!next)
+		return NULL;
+
+	return rb_entry(next, struct map_rb_node, rb_node);
+}
+
+static struct map_rb_node *maps__find_node(struct maps *maps, struct map *map)
+{
+	struct map_rb_node *rb_node;
+
+	maps__for_each_entry(maps, rb_node) {
+		if (rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map))
+			return rb_node;
+	}
+	return NULL;
+}
+
 static void maps__init(struct maps *maps, struct machine *machine)
 {
 	refcount_set(maps__refcnt(maps), 1);
@@ -484,17 +539,6 @@ int maps__copy_from(struct maps *maps, struct maps *parent)
 	return err;
 }
 
-struct map_rb_node *maps__find_node(struct maps *maps, struct map *map)
-{
-	struct map_rb_node *rb_node;
-
-	maps__for_each_entry(maps, rb_node) {
-		if (rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map))
-			return rb_node;
-	}
-	return NULL;
-}
-
 struct map *maps__find(struct maps *maps, u64 ip)
 {
 	struct rb_node *p;
@@ -520,30 +564,6 @@ struct map *maps__find(struct maps *maps, u64 ip)
 	return m ? m->map : NULL;
 }
 
-struct map_rb_node *maps__first(struct maps *maps)
-{
-	struct rb_node *first = rb_first(maps__entries(maps));
-
-	if (first)
-		return rb_entry(first, struct map_rb_node, rb_node);
-	return NULL;
-}
-
-struct map_rb_node *map_rb_node__next(struct map_rb_node *node)
-{
-	struct rb_node *next;
-
-	if (!node)
-		return NULL;
-
-	next = rb_next(&node->rb_node);
-
-	if (!next)
-		return NULL;
-
-	return rb_entry(next, struct map_rb_node, rb_node);
-}
-
 static int map__strcmp(const void *a, const void *b)
 {
 	const struct map *map_a = *(const struct map **)a;
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 84b42c8456e8..d836d04c9402 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -15,11 +15,6 @@ struct machine;
 struct map;
 struct maps;
 
-struct map_rb_node {
-	struct rb_node rb_node;
-	struct map *map;
-};
-
 struct map_list_node {
 	struct list_head node;
 	struct map *map;
@@ -30,9 +25,6 @@ static inline struct map_list_node *map_list_node__new(void)
 	return malloc(sizeof(struct map_list_node));
 }
 
-struct map_rb_node *maps__first(struct maps *maps);
-struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
-struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
 struct map *maps__find(struct maps *maps, u64 addr);
 
 DECLARE_RC_STRUCT(maps) {
@@ -78,26 +70,11 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
 /* Iterate over map removing an entry if cb returns true. */
 void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);
 
-static inline struct rb_root *maps__entries(struct maps *maps)
-{
-	return &RC_CHK_ACCESS(maps)->entries;
-}
-
 static inline struct machine *maps__machine(struct maps *maps)
 {
 	return RC_CHK_ACCESS(maps)->machine;
 }
 
-static inline struct rw_semaphore *maps__lock(struct maps *maps)
-{
-	return &RC_CHK_ACCESS(maps)->lock;
-}
-
-static inline struct map **maps__maps_by_name(struct maps *maps)
-{
-	return RC_CHK_ACCESS(maps)->maps_by_name;
-}
-
 static inline unsigned int maps__nr_maps(const struct maps *maps)
 {
 	return RC_CHK_ACCESS(maps)->nr_maps;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 27/53] perf maps: Fix up overlaps during fixup_end
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (25 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 26/53] perf maps: Reduce scope of map_rb_node and maps internals Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 28/53] perf maps: Switch from rbtree to lazily sorted array for addresses Ian Rogers
                   ` (25 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Maps can end up overlapping, kernel maps in particular. If the end of a
map overlaps the start of the next one, shorten the earlier map. This
removes a source of non-determinism in maps__find, i.e. when looking up
maps by address.
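
A small worked example (addresses invented): given maps A [0x100-0x300)
and B [0x200-0x400), A's end overlaps B's start, so fixup_end now
shortens A to [0x100-0x200). Previously only maps with an unset (zero)
end were fixed up, so a maps__find() of 0x250 could land in either A or
B depending on lookup order.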

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/maps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 01c15d0b300a..fba95a00ecdf 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -700,7 +700,7 @@ void maps__fixup_end(struct maps *maps)
 	down_write(maps__lock(maps));
 
 	maps__for_each_entry(maps, curr) {
-		if (prev != NULL && !map__end(prev->map))
+		if (prev && (!map__end(prev->map) || map__end(prev->map) > map__start(curr->map)))
 			map__set_end(prev->map, map__start(curr->map));
 
 		prev = curr;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 28/53] perf maps: Switch from rbtree to lazily sorted array for addresses
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (26 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 27/53] perf maps: Fix up overlaps during fixup_end Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 29/53] perf maps: Get map before returning in maps__find Ian Rogers
                   ` (24 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Maps is a collection of maps primarily sorted by the starting address
of the map. Prior to this change the maps were held in an rbtree
requiring 4 pointers per node. Prior to reference count checking, the
rbnode was embedded in the map so 3 pointers per node were
necessary. This change switches the rbtree to an array lazily sorted
by address, much like the existing array of maps sorted by name. Only
1 pointer is needed per node, but to avoid excessive resizing the
backing array may be twice the number of used elements, meaning the
memory overhead is roughly half that of the rbtree. For a perf record
with "--no-bpf-event -g -a" of true, the memory overhead of perf
inject is reduced from 3.3MB to 3MB, so roughly 10% or 300KB is saved.

Map inserts always happen at the end of the array. The code tracks
whether the insertion violates the sorting property. O(log n) rb-tree
complexity is switched to O(1).

Removal slides later entries down the array, so the rb-tree's O(log n)
complexity is degraded to O(n).

A find may need to sort the array using qsort, which is O(n*log n),
but in general the maps should already be sorted, so average
performance should be O(log n) as with the rbtree.
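
As a rough, self-contained sketch of the lazily sorted array scheme
(the names and types here are invented for illustration; the patch's
real code is below):

	#include <errno.h>
	#include <stdbool.h>
	#include <stdlib.h>

	struct lazy_array {
		void **entries;     /* backing array, up to 2x the used size */
		unsigned int nr, allocated;
		bool sorted;        /* true while inserts preserved order */
		int (*cmp)(const void *, const void *);
	};

	/* Amortized O(1): append at the end, note if ordering broke. */
	static int lazy_array__insert(struct lazy_array *la, void *entry)
	{
		if (la->nr == la->allocated) {
			unsigned int n = la->allocated ? la->allocated * 2 : 32;
			void **tmp = realloc(la->entries, n * sizeof(*tmp));

			if (!tmp)
				return -ENOMEM;
			la->entries = tmp;
			la->allocated = n;
		}
		if (la->nr && la->cmp(&la->entries[la->nr - 1], &entry) > 0)
			la->sorted = false;
		la->entries[la->nr++] = entry;
		return 0;
	}

	/* Sort on demand: O(n log n) once, then bsearch is O(log n). */
	static void *lazy_array__find(struct lazy_array *la, void *key)
	{
		void **found;

		if (!la->sorted) {
			qsort(la->entries, la->nr, sizeof(void *), la->cmp);
			la->sorted = true;
		}
		found = bsearch(&key, la->entries, la->nr, sizeof(void *),
				la->cmp);
		return found ? *found : NULL;
	}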

An rbtree node consumes a cache line, but with the array 4 nodes fit
on a cache line. Iteration is simplified to scanning an array rather
than pointer chasing.

Overall it is expected the performance after the change should be
comparable to before, but with half of the memory consumed.

To avoid a list and repeated logic around splitting maps,
maps__merge_in is rewritten in terms of
maps__fixup_overlap_and_insert. maps__merge_in splits the given
mapping, inserting pieces of it into the remaining gaps.
maps__fixup_overlap_and_insert splits the existing mappings, then adds
the incoming mapping. By adding the new mapping first and then
re-inserting the existing mappings, the splitting behavior matches.
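
A hedged illustration of the equivalence (addresses invented):

	kmaps before:    [0x100-0x200 old]
	new_map:         [0x000-0x300 new]

	Insert new_map whole, then re-insert the old map with
	maps__fixup_overlap_and_insert(), which splits new_map around it:

	kmaps after:     [0x000-0x100 new][0x100-0x200 old][0x200-0x300 new]

	The old mapping survives intact and new_map fills the gaps,
	matching what the previous list-based splitting produced.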

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/maps.c |    3 +
 tools/perf/util/map.c   |    1 +
 tools/perf/util/maps.c  | 1183 +++++++++++++++++++++++----------------
 tools/perf/util/maps.h  |   54 +-
 4 files changed, 757 insertions(+), 484 deletions(-)

diff --git a/tools/perf/tests/maps.c b/tools/perf/tests/maps.c
index bb3fbfe5a73e..b15417a0d617 100644
--- a/tools/perf/tests/maps.c
+++ b/tools/perf/tests/maps.c
@@ -156,6 +156,9 @@ static int test__maps__merge_in(struct test_suite *t __maybe_unused, int subtest
 	TEST_ASSERT_VAL("merge check failed", !ret);
 
 	maps__zput(maps);
+	map__zput(map_kcore1);
+	map__zput(map_kcore2);
+	map__zput(map_kcore3);
 	return TEST_OK;
 }
 
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 54c67cb7ecef..cf5a15db3a1f 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -168,6 +168,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 		if (dso == NULL)
 			goto out_delete;
 
+		assert(!dso->kernel);
 		map__init(result, start, start + len, pgoff, dso);
 
 		if (anon || no_dso) {
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index fba95a00ecdf..06fdd8a7c2a2 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -10,286 +10,477 @@
 #include "ui/ui.h"
 #include "unwind.h"
 
-struct map_rb_node {
-	struct rb_node rb_node;
-	struct map *map;
-};
-
-#define maps__for_each_entry(maps, map) \
-	for (map = maps__first(maps); map; map = map_rb_node__next(map))
+static void check_invariants(const struct maps *maps __maybe_unused)
+{
+#ifndef NDEBUG
+	assert(RC_CHK_ACCESS(maps)->nr_maps <= RC_CHK_ACCESS(maps)->nr_maps_allocated);
+	for (unsigned int i = 0; i < RC_CHK_ACCESS(maps)->nr_maps; i++) {
+		struct map *map = RC_CHK_ACCESS(maps)->maps_by_address[i];
+
+		/* Check map is well-formed. */
+		assert(map__end(map) == 0 || map__start(map) <= map__end(map));
+		/* Expect at least 1 reference count. */
+		assert(refcount_read(map__refcnt(map)) > 0);
+
+		if (map__dso(map) && map__dso(map)->kernel)
+			assert(RC_CHK_EQUAL(map__kmap(map)->kmaps, maps));
+
+		if (i > 0) {
+			struct map *prev = RC_CHK_ACCESS(maps)->maps_by_address[i - 1];
+
+			/* If addresses are sorted... */
+			if (RC_CHK_ACCESS(maps)->maps_by_address_sorted) {
+				/* Maps should be in start address order. */
+				assert(map__start(prev) <= map__start(map));
+				/*
+				 * If the ends of maps aren't broken (during
+				 * construction) then they should be ordered
+				 * too.
+				 */
+				if (!RC_CHK_ACCESS(maps)->ends_broken) {
+					assert(map__end(prev) <= map__end(map));
+					assert(map__end(prev) <= map__start(map) ||
+					       map__start(prev) == map__start(map));
+				}
+			}
+		}
+	}
+	if (RC_CHK_ACCESS(maps)->maps_by_name) {
+		for (unsigned int i = 0; i < RC_CHK_ACCESS(maps)->nr_maps; i++) {
+			struct map *map = RC_CHK_ACCESS(maps)->maps_by_name[i];
 
-#define maps__for_each_entry_safe(maps, map, next) \
-	for (map = maps__first(maps), next = map_rb_node__next(map); map; \
-	     map = next, next = map_rb_node__next(map))
+			/*
+			 * Maps by name maps should be in maps_by_address, so
+			 * the reference count should be higher.
+			 */
+			assert(refcount_read(map__refcnt(map)) > 1);
+		}
+	}
+#endif
+}
 
-static struct rb_root *maps__entries(struct maps *maps)
+static struct map **maps__maps_by_address(const struct maps *maps)
 {
-	return &RC_CHK_ACCESS(maps)->entries;
+	return RC_CHK_ACCESS(maps)->maps_by_address;
 }
 
-static struct rw_semaphore *maps__lock(struct maps *maps)
+static void maps__set_maps_by_address(struct maps *maps, struct map **new)
 {
-	return &RC_CHK_ACCESS(maps)->lock;
+	RC_CHK_ACCESS(maps)->maps_by_address = new;
+
 }
 
-static struct map **maps__maps_by_name(struct maps *maps)
+/* Not in the header, to aid reference counting. */
+static struct map **maps__maps_by_name(const struct maps *maps)
 {
 	return RC_CHK_ACCESS(maps)->maps_by_name;
+
 }
 
-static struct map_rb_node *maps__first(struct maps *maps)
+static void maps__set_maps_by_name(struct maps *maps, struct map **new)
 {
-	struct rb_node *first = rb_first(maps__entries(maps));
+	RC_CHK_ACCESS(maps)->maps_by_name = new;
 
-	if (first)
-		return rb_entry(first, struct map_rb_node, rb_node);
-	return NULL;
 }
 
-static struct map_rb_node *map_rb_node__next(struct map_rb_node *node)
+static bool maps__maps_by_address_sorted(const struct maps *maps)
 {
-	struct rb_node *next;
-
-	if (!node)
-		return NULL;
-
-	next = rb_next(&node->rb_node);
+	return RC_CHK_ACCESS(maps)->maps_by_address_sorted;
+}
 
-	if (!next)
-		return NULL;
+static void maps__set_maps_by_address_sorted(struct maps *maps, bool value)
+{
+	RC_CHK_ACCESS(maps)->maps_by_address_sorted = value;
+}
 
-	return rb_entry(next, struct map_rb_node, rb_node);
+static bool maps__maps_by_name_sorted(const struct maps *maps)
+{
+	return RC_CHK_ACCESS(maps)->maps_by_name_sorted;
 }
 
-static struct map_rb_node *maps__find_node(struct maps *maps, struct map *map)
+static void maps__set_maps_by_name_sorted(struct maps *maps, bool value)
 {
-	struct map_rb_node *rb_node;
+	RC_CHK_ACCESS(maps)->maps_by_name_sorted = value;
+}
 
-	maps__for_each_entry(maps, rb_node) {
-		if (rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map))
-			return rb_node;
-	}
-	return NULL;
+static struct rw_semaphore *maps__lock(struct maps *maps)
+{
+	/*
+	 * When the lock is acquired or released the maps invariants should
+	 * hold.
+	 */
+	check_invariants(maps);
+	return &RC_CHK_ACCESS(maps)->lock;
 }
 
 static void maps__init(struct maps *maps, struct machine *machine)
 {
-	refcount_set(maps__refcnt(maps), 1);
 	init_rwsem(maps__lock(maps));
-	RC_CHK_ACCESS(maps)->entries = RB_ROOT;
+	RC_CHK_ACCESS(maps)->maps_by_address = NULL;
+	RC_CHK_ACCESS(maps)->maps_by_name = NULL;
 	RC_CHK_ACCESS(maps)->machine = machine;
-	RC_CHK_ACCESS(maps)->last_search_by_name = NULL;
+#ifdef HAVE_LIBUNWIND_SUPPORT
+	RC_CHK_ACCESS(maps)->addr_space = NULL;
+	RC_CHK_ACCESS(maps)->unwind_libunwind_ops = NULL;
+#endif
+	refcount_set(maps__refcnt(maps), 1);
 	RC_CHK_ACCESS(maps)->nr_maps = 0;
-	RC_CHK_ACCESS(maps)->maps_by_name = NULL;
+	RC_CHK_ACCESS(maps)->nr_maps_allocated = 0;
+	RC_CHK_ACCESS(maps)->last_search_by_name_idx = 0;
+	RC_CHK_ACCESS(maps)->maps_by_address_sorted = true;
+	RC_CHK_ACCESS(maps)->maps_by_name_sorted = false;
 }
 
-static void __maps__free_maps_by_name(struct maps *maps)
+static void maps__exit(struct maps *maps)
 {
-	/*
-	 * Free everything to try to do it from the rbtree in the next search
-	 */
-	for (unsigned int i = 0; i < maps__nr_maps(maps); i++)
-		map__put(maps__maps_by_name(maps)[i]);
+	struct map **maps_by_address = maps__maps_by_address(maps);
+	struct map **maps_by_name = maps__maps_by_name(maps);
 
-	zfree(&RC_CHK_ACCESS(maps)->maps_by_name);
-	RC_CHK_ACCESS(maps)->nr_maps_allocated = 0;
+	for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
+		map__zput(maps_by_address[i]);
+		if (maps_by_name)
+			map__zput(maps_by_name[i]);
+	}
+	zfree(&maps_by_address);
+	zfree(&maps_by_name);
+	unwind__finish_access(maps);
 }
 
-static int __maps__insert(struct maps *maps, struct map *map)
+struct maps *maps__new(struct machine *machine)
 {
-	struct rb_node **p = &maps__entries(maps)->rb_node;
-	struct rb_node *parent = NULL;
-	const u64 ip = map__start(map);
-	struct map_rb_node *m, *new_rb_node;
-
-	new_rb_node = malloc(sizeof(*new_rb_node));
-	if (!new_rb_node)
-		return -ENOMEM;
-
-	RB_CLEAR_NODE(&new_rb_node->rb_node);
-	new_rb_node->map = map__get(map);
+	struct maps *result;
+	RC_STRUCT(maps) *maps = zalloc(sizeof(*maps));
 
-	while (*p != NULL) {
-		parent = *p;
-		m = rb_entry(parent, struct map_rb_node, rb_node);
-		if (ip < map__start(m->map))
-			p = &(*p)->rb_left;
-		else
-			p = &(*p)->rb_right;
-	}
+	if (ADD_RC_CHK(result, maps))
+		maps__init(result, machine);
 
-	rb_link_node(&new_rb_node->rb_node, parent, p);
-	rb_insert_color(&new_rb_node->rb_node, maps__entries(maps));
-	return 0;
+	return result;
 }
 
-int maps__insert(struct maps *maps, struct map *map)
+static void maps__delete(struct maps *maps)
 {
-	int err;
-	const struct dso *dso = map__dso(map);
-
-	down_write(maps__lock(maps));
-	err = __maps__insert(maps, map);
-	if (err)
-		goto out;
+	maps__exit(maps);
+	RC_CHK_FREE(maps);
+}
 
-	++RC_CHK_ACCESS(maps)->nr_maps;
+struct maps *maps__get(struct maps *maps)
+{
+	struct maps *result;
 
-	if (dso && dso->kernel) {
-		struct kmap *kmap = map__kmap(map);
+	if (RC_CHK_GET(result, maps))
+		refcount_inc(maps__refcnt(maps));
 
-		if (kmap)
-			kmap->kmaps = maps;
-		else
-			pr_err("Internal error: kernel dso with non kernel map\n");
-	}
+	return result;
+}
 
+void maps__put(struct maps *maps)
+{
+	if (maps && refcount_dec_and_test(maps__refcnt(maps)))
+		maps__delete(maps);
+	else
+		RC_CHK_PUT(maps);
+}
 
+static void __maps__free_maps_by_name(struct maps *maps)
+{
 	/*
-	 * If we already performed some search by name, then we need to add the just
-	 * inserted map and resort.
+	 * Free everything to try to do it from the rbtree in the next search
 	 */
-	if (maps__maps_by_name(maps)) {
-		if (maps__nr_maps(maps) > RC_CHK_ACCESS(maps)->nr_maps_allocated) {
-			int nr_allocate = maps__nr_maps(maps) * 2;
-			struct map **maps_by_name = realloc(maps__maps_by_name(maps),
-							    nr_allocate * sizeof(map));
+	for (unsigned int i = 0; i < maps__nr_maps(maps); i++)
+		map__put(maps__maps_by_name(maps)[i]);
 
-			if (maps_by_name == NULL) {
-				__maps__free_maps_by_name(maps);
-				err = -ENOMEM;
-				goto out;
-			}
+	zfree(&RC_CHK_ACCESS(maps)->maps_by_name);
+}
 
-			RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
-			RC_CHK_ACCESS(maps)->nr_maps_allocated = nr_allocate;
+static int map__start_cmp(const void *a, const void *b)
+{
+	const struct map *map_a = *(const struct map * const *)a;
+	const struct map *map_b = *(const struct map * const *)b;
+	u64 map_a_start = map__start(map_a);
+	u64 map_b_start = map__start(map_b);
+
+	if (map_a_start == map_b_start) {
+		u64 map_a_end = map__end(map_a);
+		u64 map_b_end = map__end(map_b);
+
+		if  (map_a_end == map_b_end) {
+			/* Ensure maps with the same addresses have a fixed order. */
+			if (RC_CHK_ACCESS(map_a) == RC_CHK_ACCESS(map_b))
+				return 0;
+			return (intptr_t)RC_CHK_ACCESS(map_a) > (intptr_t)RC_CHK_ACCESS(map_b)
+				? 1 : -1;
 		}
-		maps__maps_by_name(maps)[maps__nr_maps(maps) - 1] = map__get(map);
-		__maps__sort_by_name(maps);
+		return map_a_end > map_b_end ? 1 : -1;
 	}
- out:
-	up_write(maps__lock(maps));
-	return err;
+	return map_a_start > map_b_start ? 1 : -1;
 }
 
-static void __maps__remove(struct maps *maps, struct map_rb_node *rb_node)
+static void __maps__sort_by_address(struct maps *maps)
 {
-	rb_erase_init(&rb_node->rb_node, maps__entries(maps));
-	map__put(rb_node->map);
-	free(rb_node);
+	if (maps__maps_by_address_sorted(maps))
+		return;
+
+	qsort(maps__maps_by_address(maps),
+		maps__nr_maps(maps),
+		sizeof(struct map *),
+		map__start_cmp);
+	maps__set_maps_by_address_sorted(maps, true);
 }
 
-void maps__remove(struct maps *maps, struct map *map)
+static void maps__sort_by_address(struct maps *maps)
 {
-	struct map_rb_node *rb_node;
-
 	down_write(maps__lock(maps));
-	if (RC_CHK_ACCESS(maps)->last_search_by_name == map)
-		RC_CHK_ACCESS(maps)->last_search_by_name = NULL;
-
-	rb_node = maps__find_node(maps, map);
-	assert(rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map));
-	__maps__remove(maps, rb_node);
-	if (maps__maps_by_name(maps))
-		__maps__free_maps_by_name(maps);
-	--RC_CHK_ACCESS(maps)->nr_maps;
+	__maps__sort_by_address(maps);
 	up_write(maps__lock(maps));
 }
 
-static void __maps__purge(struct maps *maps)
+static int map__strcmp(const void *a, const void *b)
 {
-	struct map_rb_node *pos, *next;
-
-	if (maps__maps_by_name(maps))
-		__maps__free_maps_by_name(maps);
+	const struct map *map_a = *(const struct map * const *)a;
+	const struct map *map_b = *(const struct map * const *)b;
+	const struct dso *dso_a = map__dso(map_a);
+	const struct dso *dso_b = map__dso(map_b);
+	int ret = strcmp(dso_a->short_name, dso_b->short_name);
 
-	maps__for_each_entry_safe(maps, pos, next) {
-		rb_erase_init(&pos->rb_node,  maps__entries(maps));
-		map__put(pos->map);
-		free(pos);
+	if (ret == 0 && RC_CHK_ACCESS(map_a) != RC_CHK_ACCESS(map_b)) {
+		/* Ensure distinct but name equal maps have an order. */
+		return map__start_cmp(a, b);
 	}
+	return ret;
 }
 
-static void maps__exit(struct maps *maps)
+static int maps__sort_by_name(struct maps *maps)
 {
+	int err = 0;
 	down_write(maps__lock(maps));
-	__maps__purge(maps);
+	if (!maps__maps_by_name_sorted(maps)) {
+		struct map **maps_by_name = maps__maps_by_name(maps);
+
+		if (!maps_by_name) {
+			maps_by_name = malloc(RC_CHK_ACCESS(maps)->nr_maps_allocated *
+					sizeof(*maps_by_name));
+			if (!maps_by_name)
+				err = -ENOMEM;
+			else {
+				struct map **maps_by_address = maps__maps_by_address(maps);
+				unsigned int n = maps__nr_maps(maps);
+
+				maps__set_maps_by_name(maps, maps_by_name);
+				for (unsigned int i = 0; i < n; i++)
+					maps_by_name[i] = map__get(maps_by_address[i]);
+			}
+		}
+		if (!err) {
+			qsort(maps_by_name,
+				maps__nr_maps(maps),
+				sizeof(struct map *),
+				map__strcmp);
+			maps__set_maps_by_name_sorted(maps, true);
+		}
+	}
 	up_write(maps__lock(maps));
+	return err;
 }
 
-bool maps__empty(struct maps *maps)
+static unsigned int maps__by_address_index(const struct maps *maps, const struct map *map)
 {
-	return !maps__first(maps);
+	struct map **maps_by_address = maps__maps_by_address(maps);
+
+	if (maps__maps_by_address_sorted(maps)) {
+		struct map **mapp =
+			bsearch(&map, maps__maps_by_address(maps), maps__nr_maps(maps),
+				sizeof(*mapp), map__start_cmp);
+
+		if (mapp)
+			return mapp - maps_by_address;
+	} else {
+		for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
+			if (RC_CHK_ACCESS(maps_by_address[i]) == RC_CHK_ACCESS(map))
+				return i;
+		}
+	}
+	pr_err("Map missing from maps\n");
+	return -1;
 }
 
-struct maps *maps__new(struct machine *machine)
+static unsigned int maps__by_name_index(const struct maps *maps, const struct map *map)
 {
-	struct maps *result;
-	RC_STRUCT(maps) *maps = zalloc(sizeof(*maps));
+	struct map **maps_by_name = maps__maps_by_name(maps);
+
+	if (maps__maps_by_name_sorted(maps)) {
+		struct map **mapp =
+			bsearch(&map, maps_by_name, maps__nr_maps(maps),
+				sizeof(*mapp), map__strcmp);
+
+		if (mapp)
+			return mapp - maps_by_name;
+	} else {
+		for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
+			if (RC_CHK_ACCESS(maps_by_name[i]) == RC_CHK_ACCESS(map))
+				return i;
+		}
+	}
+	pr_err("Map missing from maps\n");
+	return -1;
+}
 
-	if (ADD_RC_CHK(result, maps))
-		maps__init(result, machine);
+static int __maps__insert(struct maps *maps, struct map *new)
+{
+	struct map **maps_by_address = maps__maps_by_address(maps);
+	struct map **maps_by_name = maps__maps_by_name(maps);
+	const struct dso *dso = map__dso(new);
+	unsigned int nr_maps = maps__nr_maps(maps);
+	unsigned int nr_allocate = RC_CHK_ACCESS(maps)->nr_maps_allocated;
+
+	if (nr_maps + 1 > nr_allocate) {
+		nr_allocate = !nr_allocate ? 32 : nr_allocate * 2;
+
+		maps_by_address = realloc(maps_by_address, nr_allocate * sizeof(new));
+		if (!maps_by_address)
+			return -ENOMEM;
+
+		maps__set_maps_by_address(maps, maps_by_address);
+		if (maps_by_name) {
+			maps_by_name = realloc(maps_by_name, nr_allocate * sizeof(new));
+			if (!maps_by_name) {
+				/*
+				 * If by name fails, just disable by name and it will
+				 * recompute next time it is required.
+				 */
+				__maps__free_maps_by_name(maps);
+			}
+			maps__set_maps_by_name(maps, maps_by_name);
+		}
+		RC_CHK_ACCESS(maps)->nr_maps_allocated = nr_allocate;
+	}
+	/* Insert the value at the end. */
+	maps_by_address[nr_maps] = map__get(new);
+	if (maps_by_name)
+		maps_by_name[nr_maps] = map__get(new);
 
-	return result;
+	nr_maps++;
+	RC_CHK_ACCESS(maps)->nr_maps = nr_maps;
+
+	/*
+	 * Recompute if things are sorted. If things are inserted in a sorted
+	 * manner, for example by processing /proc/pid/maps, then no
+	 * sorting/resorting will be necessary.
+	 */
+	if (nr_maps == 1) {
+		/* If there's just 1 entry then maps are sorted. */
+		maps__set_maps_by_address_sorted(maps, true);
+		maps__set_maps_by_name_sorted(maps, maps_by_name != NULL);
+	} else {
+		/* Sorted if maps were already sorted and this map starts after the last one. */
+		maps__set_maps_by_address_sorted(maps,
+			maps__maps_by_address_sorted(maps) &&
+			map__end(maps_by_address[nr_maps - 2]) <= map__start(new));
+		maps__set_maps_by_name_sorted(maps, false);
+	}
+	if (map__end(new) < map__start(new))
+		RC_CHK_ACCESS(maps)->ends_broken = true;
+	if (dso && dso->kernel) {
+		struct kmap *kmap = map__kmap(new);
+
+		if (kmap)
+			kmap->kmaps = maps;
+		else
+			pr_err("Internal error: kernel dso with non kernel map\n");
+	}
+	check_invariants(maps);
+	return 0;
 }
 
-static void maps__delete(struct maps *maps)
+int maps__insert(struct maps *maps, struct map *map)
 {
-	maps__exit(maps);
-	unwind__finish_access(maps);
-	RC_CHK_FREE(maps);
+	int ret;
+
+	down_write(maps__lock(maps));
+	ret = __maps__insert(maps, map);
+	up_write(maps__lock(maps));
+	return ret;
 }
 
-struct maps *maps__get(struct maps *maps)
+static void __maps__remove(struct maps *maps, struct map *map)
 {
-	struct maps *result;
+	struct map **maps_by_address = maps__maps_by_address(maps);
+	struct map **maps_by_name = maps__maps_by_name(maps);
+	unsigned int nr_maps = maps__nr_maps(maps);
+	unsigned int address_idx;
+
+	/* Slide later mappings over the one to remove */
+	address_idx = maps__by_address_index(maps, map);
+	map__put(maps_by_address[address_idx]);
+	memmove(&maps_by_address[address_idx],
+		&maps_by_address[address_idx + 1],
+		(nr_maps - address_idx - 1) * sizeof(*maps_by_address));
+
+	if (maps_by_name) {
+		unsigned int name_idx = maps__by_name_index(maps, map);
+
+		map__put(maps_by_name[name_idx]);
+		memmove(&maps_by_name[name_idx],
+			&maps_by_name[name_idx + 1],
+			(nr_maps - name_idx - 1) *  sizeof(*maps_by_name));
+	}
 
-	if (RC_CHK_GET(result, maps))
-		refcount_inc(maps__refcnt(maps));
+	--RC_CHK_ACCESS(maps)->nr_maps;
+	check_invariants(maps);
+}
 
-	return result;
+void maps__remove(struct maps *maps, struct map *map)
+{
+	down_write(maps__lock(maps));
+	__maps__remove(maps, map);
+	up_write(maps__lock(maps));
 }
 
-void maps__put(struct maps *maps)
+bool maps__empty(struct maps *maps)
 {
-	if (maps && refcount_dec_and_test(maps__refcnt(maps)))
-		maps__delete(maps);
-	else
-		RC_CHK_PUT(maps);
+	return maps__nr_maps(maps) == 0;
 }
 
 int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
 {
-	struct map_rb_node *pos;
+	bool done = false;
 	int ret = 0;
 
-	down_read(maps__lock(maps));
-	maps__for_each_entry(maps, pos)	{
-		ret = cb(pos->map, data);
-		if (ret)
-			break;
+	/* See locking/sorting note. */
+	while (!done) {
+		down_read(maps__lock(maps));
+		if (maps__maps_by_address_sorted(maps)) {
+			struct map **maps_by_address = maps__maps_by_address(maps);
+			unsigned int n = maps__nr_maps(maps);
+
+			for (unsigned int i = 0; i < n; i++) {
+				struct map *map = maps_by_address[i];
+
+				ret = cb(map, data);
+				if (ret)
+					break;
+			}
+			done = true;
+		}
+		up_read(maps__lock(maps));
+		if (!done)
+			maps__sort_by_address(maps);
 	}
-	up_read(maps__lock(maps));
 	return ret;
 }
 
 void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data)
 {
-	struct map_rb_node *pos, *next;
-	unsigned int start_nr_maps;
+	struct map **maps_by_address;
 
 	down_write(maps__lock(maps));
 
-	start_nr_maps = maps__nr_maps(maps);
-	maps__for_each_entry_safe(maps, pos, next)	{
-		if (cb(pos->map, data)) {
-			__maps__remove(maps, pos);
-			--RC_CHK_ACCESS(maps)->nr_maps;
-		}
+	maps_by_address = maps__maps_by_address(maps);
+	for (unsigned int i = 0; i < maps__nr_maps(maps);) {
+		if (cb(maps_by_address[i], data))
+			__maps__remove(maps, maps_by_address[i]);
+		else
+			i++;
 	}
-	if (maps__maps_by_name(maps) && start_nr_maps != maps__nr_maps(maps))
-		__maps__free_maps_by_name(maps);
-
 	up_write(maps__lock(maps));
 }
 
@@ -300,7 +491,7 @@ struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
 	/* Ensure map is loaded before using map->map_ip */
 	if (map != NULL && map__load(map) >= 0) {
 		if (mapp != NULL)
-			*mapp = map;
+			*mapp = map; // TODO: map_put on else path when find returns a get.
 		return map__find_symbol(map, map__map_ip(map, addr));
 	}
 
@@ -348,7 +539,7 @@ int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
 	if (ams->addr < map__start(ams->ms.map) || ams->addr >= map__end(ams->ms.map)) {
 		if (maps == NULL)
 			return -1;
-		ams->ms.map = maps__find(maps, ams->addr);
+		ams->ms.map = maps__find(maps, ams->addr);  // TODO: map_get
 		if (ams->ms.map == NULL)
 			return -1;
 	}
@@ -393,24 +584,28 @@ size_t maps__fprintf(struct maps *maps, FILE *fp)
  * Find first map where end > new->start.
  * Same as find_vma() in kernel.
  */
-static struct rb_node *first_ending_after(struct maps *maps, const struct map *map)
+static unsigned int first_ending_after(struct maps *maps, const struct map *map)
 {
-	struct rb_root *root;
-	struct rb_node *next, *first;
+	struct map **maps_by_address = maps__maps_by_address(maps);
+	int low = 0, high = (int)maps__nr_maps(maps) - 1, first = high + 1;
+
+	assert(maps__maps_by_address_sorted(maps));
+	if (low <= high && map__end(maps_by_address[0]) > map__start(map))
+		return 0;
 
-	root = maps__entries(maps);
-	next = root->rb_node;
-	first = NULL;
-	while (next) {
-		struct map_rb_node *pos = rb_entry(next, struct map_rb_node, rb_node);
+	while (low <= high) {
+		int mid = (low + high) / 2;
+		struct map *pos = maps_by_address[mid];
 
-		if (map__end(pos->map) > map__start(map)) {
-			first = next;
-			if (map__start(pos->map) <= map__start(map))
+		if (map__end(pos) > map__start(map)) {
+			first = mid;
+			if (map__start(pos) <= map__start(map)) {
+				/* Entry overlaps map. */
 				break;
-			next = next->rb_left;
+			}
+			high = mid - 1;
 		} else
-			next = next->rb_right;
+			low = mid + 1;
 	}
 	return first;
 }
@@ -419,170 +614,248 @@ static struct rb_node *first_ending_after(struct maps *maps, const struct map *m
  * Adds new to maps, if new overlaps existing entries then the existing maps are
  * adjusted or removed so that new fits without overlapping any entries.
  */
-int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
+static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
 {
-
-	struct rb_node *next;
+	struct map **maps_by_address;
 	int err = 0;
 
-	down_write(maps__lock(maps));
+sort_again:
+	if (!maps__maps_by_address_sorted(maps))
+		__maps__sort_by_address(maps);
 
-	next = first_ending_after(maps, new);
-	while (next && !err) {
-		struct map_rb_node *pos = rb_entry(next, struct map_rb_node, rb_node);
-		next = rb_next(&pos->rb_node);
+	maps_by_address = maps__maps_by_address(maps);
+	/*
+	 * Iterate through entries where the end of the existing entry is
+	 * greater-than the new map's start.
+	 */
+	for (unsigned int i = first_ending_after(maps, new); i < maps__nr_maps(maps); ) {
+		struct map *pos = maps_by_address[i];
+		struct map *before = NULL, *after = NULL;
 
 		/*
 		 * Stop if current map starts after map->end.
 		 * Maps are ordered by start: next will not overlap for sure.
 		 */
-		if (map__start(pos->map) >= map__end(new))
+		if (map__start(pos) >= map__end(new))
 			break;
 
-		if (verbose >= 2) {
-
-			if (use_browser) {
-				pr_debug("overlapping maps in %s (disable tui for more info)\n",
-					 map__dso(new)->name);
-			} else {
-				pr_debug("overlapping maps:\n");
-				map__fprintf(new, debug_file());
-				map__fprintf(pos->map, debug_file());
-			}
+		if (use_browser) {
+			pr_debug("overlapping maps in %s (disable tui for more info)\n",
+				map__dso(new)->name);
+		} else if (verbose >= 2) {
+			pr_debug("overlapping maps:\n");
+			map__fprintf(new, debug_file());
+			map__fprintf(pos, debug_file());
 		}
 
-		rb_erase_init(&pos->rb_node, maps__entries(maps));
 		/*
 		 * Now check if we need to create new maps for areas not
 		 * overlapped by the new map:
 		 */
-		if (map__start(new) > map__start(pos->map)) {
-			struct map *before = map__clone(pos->map);
+		if (map__start(new) > map__start(pos)) {
+			/* Map starts within existing map. Need to shorten the existing map. */
+			before = map__clone(pos);
 
 			if (before == NULL) {
 				err = -ENOMEM;
-				goto put_map;
+				goto out_err;
 			}
-
 			map__set_end(before, map__start(new));
-			err = __maps__insert(maps, before);
-			if (err) {
-				map__put(before);
-				goto put_map;
-			}
 
 			if (verbose >= 2 && !use_browser)
 				map__fprintf(before, debug_file());
-			map__put(before);
 		}
-
-		if (map__end(new) < map__end(pos->map)) {
-			struct map *after = map__clone(pos->map);
+		if (map__end(new) < map__end(pos)) {
+			/* The new map isn't as long as the existing map. */
+			after = map__clone(pos);
 
 			if (after == NULL) {
+				map__zput(before);
 				err = -ENOMEM;
-				goto put_map;
+				goto out_err;
 			}
 
 			map__set_start(after, map__end(new));
-			map__add_pgoff(after, map__end(new) - map__start(pos->map));
-			assert(map__map_ip(pos->map, map__end(new)) ==
-				map__map_ip(after, map__end(new)));
-			err = __maps__insert(maps, after);
-			if (err) {
-				map__put(after);
-				goto put_map;
-			}
+			map__add_pgoff(after, map__end(new) - map__start(pos));
+			assert(map__map_ip(pos, map__end(new)) ==
+			       map__map_ip(after, map__end(new)));
+
 			if (verbose >= 2 && !use_browser)
 				map__fprintf(after, debug_file());
-			map__put(after);
 		}
-put_map:
-		map__put(pos->map);
-		free(pos);
+		/*
+		 * If adding one entry, for `before` or `after`, we can replace
+		 * the existing entry. If both `before` and `after` are
+		 * necessary then an insert is needed. If the new entry
+		 * entirely overlaps the existing entry it can just be removed.
+		 */
+		if (before) {
+			map__put(maps_by_address[i]);
+			maps_by_address[i] = before;
+			/* Maps are still ordered, go to next one. */
+			i++;
+			if (after) {
+				__maps__insert(maps, after);
+				map__put(after);
+				if (!maps__maps_by_address_sorted(maps)) {
+					/*
+					 * Sorting broken so invariants don't
+					 * hold, sort and go again.
+					 */
+					goto sort_again;
+				}
+				/*
+				 * Maps are still ordered, skip after and go to
+				 * next one (terminate loop).
+				 */
+				i++;
+			}
+		} else if (after) {
+			map__put(maps_by_address[i]);
+			maps_by_address[i] = after;
+			/* Maps are ordered, go to next one. */
+			i++;
+		} else {
+			__maps__remove(maps, pos);
+			/*
+			 * Maps are ordered but no need to increase `i` as the
+			 * later maps were moved down.
+			 */
+		}
+		check_invariants(maps);
 	}
 	/* Add the map. */
-	err = __maps__insert(maps, new);
-	up_write(maps__lock(maps));
+	__maps__insert(maps, new);
+out_err:
 	return err;
 }
 
-int maps__copy_from(struct maps *maps, struct maps *parent)
+int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
 {
 	int err;
-	struct map_rb_node *rb_node;
 
+	down_write(maps__lock(maps));
+	err =  __maps__fixup_overlap_and_insert(maps, new);
+	up_write(maps__lock(maps));
+	return err;
+}
+
+int maps__copy_from(struct maps *dest, struct maps *parent)
+{
+	/* Note, if struct map were immutable then cloning could use ref counts. */
+	struct map **parent_maps_by_address;
+	int err = 0;
+	unsigned int n;
+
+	down_write(maps__lock(dest));
 	down_read(maps__lock(parent));
 
-	maps__for_each_entry(parent, rb_node) {
-		struct map *new = map__clone(rb_node->map);
+	parent_maps_by_address = maps__maps_by_address(parent);
+	n = maps__nr_maps(parent);
+	if (maps__empty(dest)) {
+		/* No existing mappings so just copy from parent to avoid reallocs in insert. */
+		unsigned int nr_maps_allocated = RC_CHK_ACCESS(parent)->nr_maps_allocated;
+		struct map **dest_maps_by_address =
+			malloc(nr_maps_allocated * sizeof(struct map *));
+		struct map **dest_maps_by_name = NULL;
 
-		if (new == NULL) {
+		if (!dest_maps_by_address)
 			err = -ENOMEM;
-			goto out_unlock;
+		else {
+			if (maps__maps_by_name(parent)) {
+				dest_maps_by_name =
+					malloc(nr_maps_allocated * sizeof(struct map *));
+			}
+
+			RC_CHK_ACCESS(dest)->maps_by_address = dest_maps_by_address;
+			RC_CHK_ACCESS(dest)->maps_by_name = dest_maps_by_name;
+			RC_CHK_ACCESS(dest)->nr_maps_allocated = nr_maps_allocated;
 		}
 
-		err = unwind__prepare_access(maps, new, NULL);
-		if (err)
-			goto out_unlock;
+		for (unsigned int i = 0; !err && i < n; i++) {
+			struct map *pos = parent_maps_by_address[i];
+			struct map *new = map__clone(pos);
 
-		err = maps__insert(maps, new);
-		if (err)
-			goto out_unlock;
+			if (!new)
+				err = -ENOMEM;
+			else {
+				err = unwind__prepare_access(dest, new, NULL);
+				if (!err) {
+					dest_maps_by_address[i] = new;
+					if (dest_maps_by_name)
+						dest_maps_by_name[i] = map__get(new);
+					RC_CHK_ACCESS(dest)->nr_maps = i + 1;
+				}
+			}
+			if (err)
+				map__put(new);
+		}
+		maps__set_maps_by_address_sorted(dest, maps__maps_by_address_sorted(parent));
+		if (!err) {
+			RC_CHK_ACCESS(dest)->last_search_by_name_idx =
+				RC_CHK_ACCESS(parent)->last_search_by_name_idx;
+			maps__set_maps_by_name_sorted(dest,
+						dest_maps_by_name &&
+						maps__maps_by_name_sorted(parent));
+		} else {
+			RC_CHK_ACCESS(dest)->last_search_by_name_idx = 0;
+			maps__set_maps_by_name_sorted(dest, false);
+		}
+	} else {
+		/* Unexpected copying to a maps containing entries. */
+		for (unsigned int i = 0; !err && i < n; i++) {
+			struct map *pos = parent_maps_by_address[i];
+			struct map *new = map__clone(pos);
 
-		map__put(new);
+			if (!new)
+				err = -ENOMEM;
+			else {
+				err = unwind__prepare_access(dest, new, NULL);
+				if (!err)
+					err = maps__insert(dest, new);
+			}
+			map__put(new);
+		}
 	}
-
-	err = 0;
-out_unlock:
 	up_read(maps__lock(parent));
+	up_write(maps__lock(dest));
 	return err;
 }
 
-struct map *maps__find(struct maps *maps, u64 ip)
+static int map__addr_cmp(const void *key, const void *entry)
 {
-	struct rb_node *p;
-	struct map_rb_node *m;
-
+	const u64 ip = *(const u64 *)key;
+	const struct map *map = *(const struct map * const *)entry;
 
-	down_read(maps__lock(maps));
-
-	p = maps__entries(maps)->rb_node;
-	while (p != NULL) {
-		m = rb_entry(p, struct map_rb_node, rb_node);
-		if (ip < map__start(m->map))
-			p = p->rb_left;
-		else if (ip >= map__end(m->map))
-			p = p->rb_right;
-		else
-			goto out;
-	}
-
-	m = NULL;
-out:
-	up_read(maps__lock(maps));
-	return m ? m->map : NULL;
+	if (ip < map__start(map))
+		return -1;
+	if (ip >= map__end(map))
+		return 1;
+	return 0;
 }
 
-static int map__strcmp(const void *a, const void *b)
+struct map *maps__find(struct maps *maps, u64 ip)
 {
-	const struct map *map_a = *(const struct map **)a;
-	const struct map *map_b = *(const struct map **)b;
-	const struct dso *dso_a = map__dso(map_a);
-	const struct dso *dso_b = map__dso(map_b);
-	int ret = strcmp(dso_a->short_name, dso_b->short_name);
-
-	if (ret == 0 && map_a != map_b) {
-		/*
-		 * Ensure distinct but name equal maps have an order in part to
-		 * aid reference counting.
-		 */
-		ret = (int)map__start(map_a) - (int)map__start(map_b);
-		if (ret == 0)
-			ret = (int)((intptr_t)map_a - (intptr_t)map_b);
+	struct map *result = NULL;
+	bool done = false;
+
+	/* See locking/sorting note. */
+	while (!done) {
+		down_read(maps__lock(maps));
+		if (maps__maps_by_address_sorted(maps)) {
+			struct map **mapp =
+				bsearch(&ip, maps__maps_by_address(maps), maps__nr_maps(maps),
+					sizeof(*mapp), map__addr_cmp);
+
+			if (mapp)
+				result = *mapp; // map__get(*mapp);
+			done = true;
+		}
+		up_read(maps__lock(maps));
+		if (!done)
+			maps__sort_by_address(maps);
 	}
-
-	return ret;
+	return result;
 }
 
 static int map__strcmp_name(const void *name, const void *b)
@@ -592,126 +865,113 @@ static int map__strcmp_name(const void *name, const void *b)
 	return strcmp(name, dso->short_name);
 }
 
-void __maps__sort_by_name(struct maps *maps)
-{
-	qsort(maps__maps_by_name(maps), maps__nr_maps(maps), sizeof(struct map *), map__strcmp);
-}
-
-static int map__groups__sort_by_name_from_rbtree(struct maps *maps)
-{
-	struct map_rb_node *rb_node;
-	struct map **maps_by_name = realloc(maps__maps_by_name(maps),
-					    maps__nr_maps(maps) * sizeof(struct map *));
-	int i = 0;
-
-	if (maps_by_name == NULL)
-		return -1;
-
-	up_read(maps__lock(maps));
-	down_write(maps__lock(maps));
-
-	RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
-	RC_CHK_ACCESS(maps)->nr_maps_allocated = maps__nr_maps(maps);
-
-	maps__for_each_entry(maps, rb_node)
-		maps_by_name[i++] = map__get(rb_node->map);
-
-	__maps__sort_by_name(maps);
-
-	up_write(maps__lock(maps));
-	down_read(maps__lock(maps));
-
-	return 0;
-}
-
-static struct map *__maps__find_by_name(struct maps *maps, const char *name)
+struct map *maps__find_by_name(struct maps *maps, const char *name)
 {
-	struct map **mapp;
+	struct map *result = NULL;
+	bool done = false;
 
-	if (maps__maps_by_name(maps) == NULL &&
-	    map__groups__sort_by_name_from_rbtree(maps))
-		return NULL;
+	/* See locking/sorting note. */
+	while (!done) {
+		unsigned int i;
 
-	mapp = bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
-		       sizeof(*mapp), map__strcmp_name);
-	if (mapp)
-		return *mapp;
-	return NULL;
-}
+		down_read(maps__lock(maps));
 
-struct map *maps__find_by_name(struct maps *maps, const char *name)
-{
-	struct map_rb_node *rb_node;
-	struct map *map;
-
-	down_read(maps__lock(maps));
+		/* First check last found entry. */
+		i = RC_CHK_ACCESS(maps)->last_search_by_name_idx;
+		if (i < maps__nr_maps(maps) && maps__maps_by_name(maps)) {
+			struct dso *dso = map__dso(maps__maps_by_name(maps)[i]);
 
+			if (dso && strcmp(dso->short_name, name) == 0) {
+				result = maps__maps_by_name(maps)[i]; // TODO: map__get
+				done = true;
+			}
+		}
 
-	if (RC_CHK_ACCESS(maps)->last_search_by_name) {
-		const struct dso *dso = map__dso(RC_CHK_ACCESS(maps)->last_search_by_name);
+		/* Second search sorted array. */
+		if (!done && maps__maps_by_name_sorted(maps)) {
+			struct map **mapp =
+				bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
+					sizeof(*mapp), map__strcmp_name);
 
-		if (strcmp(dso->short_name, name) == 0) {
-			map = RC_CHK_ACCESS(maps)->last_search_by_name;
-			goto out_unlock;
+			if (mapp) {
+				result = *mapp; // TODO: map__get
+				i = mapp - maps__maps_by_name(maps);
+				RC_CHK_ACCESS(maps)->last_search_by_name_idx = i;
+			}
+			done = true;
 		}
-	}
-	/*
-	 * If we have maps->maps_by_name, then the name isn't in the rbtree,
-	 * as maps->maps_by_name mirrors the rbtree when lookups by name are
-	 * made.
-	 */
-	map = __maps__find_by_name(maps, name);
-	if (map || maps__maps_by_name(maps) != NULL)
-		goto out_unlock;
-
-	/* Fallback to traversing the rbtree... */
-	maps__for_each_entry(maps, rb_node) {
-		struct dso *dso;
-
-		map = rb_node->map;
-		dso = map__dso(map);
-		if (strcmp(dso->short_name, name) == 0) {
-			RC_CHK_ACCESS(maps)->last_search_by_name = map;
-			goto out_unlock;
+		up_read(maps__lock(maps));
+		if (!done) {
+			/* Sort and retry binary search. */
+			if (maps__sort_by_name(maps)) {
+				/*
+				 * Memory allocation failed do linear search
+				 * through address sorted maps.
+				 */
+				struct map **maps_by_address;
+				unsigned int n;
+
+				down_read(maps__lock(maps));
+				maps_by_address =  maps__maps_by_address(maps);
+				n = maps__nr_maps(maps);
+				for (i = 0; i < n; i++) {
+					struct map *pos = maps_by_address[i];
+					struct dso *dso = map__dso(pos);
+
+					if (dso && strcmp(dso->short_name, name) == 0) {
+						result = pos; // TODO: map__get
+						break;
+					}
+				}
+				up_read(maps__lock(maps));
+				done = true;
+			}
 		}
 	}
-	map = NULL;
-
-out_unlock:
-	up_read(maps__lock(maps));
-	return map;
+	return result;
 }
 
 struct map *maps__find_next_entry(struct maps *maps, struct map *map)
 {
-	struct map_rb_node *rb_node = maps__find_node(maps, map);
-	struct map_rb_node *next = map_rb_node__next(rb_node);
+	unsigned int i;
+	struct map *result = NULL;
 
-	if (next)
-		return next->map;
+	down_read(maps__lock(maps));
+	i = maps__by_address_index(maps, map);
+	if (i < maps__nr_maps(maps))
+		result = maps__maps_by_address(maps)[i]; // TODO: map__get
 
-	return NULL;
+	up_read(maps__lock(maps));
+	return result;
 }
 
 void maps__fixup_end(struct maps *maps)
 {
-	struct map_rb_node *prev = NULL, *curr;
+	struct map **maps_by_address;
+	unsigned int n;
 
 	down_write(maps__lock(maps));
+	if (!maps__maps_by_address_sorted(maps))
+		__maps__sort_by_address(maps);
 
-	maps__for_each_entry(maps, curr) {
-		if (prev && (!map__end(prev->map) || map__end(prev->map) > map__start(curr->map)))
-			map__set_end(prev->map, map__start(curr->map));
+	maps_by_address = maps__maps_by_address(maps);
+	n = maps__nr_maps(maps);
+	for (unsigned int i = 1; i < n; i++) {
+		struct map *prev = maps_by_address[i - 1];
+		struct map *curr = maps_by_address[i];
 
-		prev = curr;
+		if (!map__end(prev) || map__end(prev) > map__start(curr))
+			map__set_end(prev, map__start(curr));
 	}
 
 	/*
 	 * We still haven't the actual symbols, so guess the
 	 * last map final address.
 	 */
-	if (curr && !map__end(curr->map))
-		map__set_end(curr->map, ~0ULL);
+	if (n > 0 && !map__end(maps_by_address[n - 1]))
+		map__set_end(maps_by_address[n - 1], ~0ULL);
+
+	RC_CHK_ACCESS(maps)->ends_broken = false;
 
 	up_write(maps__lock(maps));
 }
@@ -722,117 +982,92 @@ void maps__fixup_end(struct maps *maps)
  */
 int maps__merge_in(struct maps *kmaps, struct map *new_map)
 {
-	struct map_rb_node *rb_node;
-	struct rb_node *first;
-	bool overlaps;
-	LIST_HEAD(merged);
-	int err = 0;
+	unsigned int first_after_, kmaps__nr_maps;
+	struct map **kmaps_maps_by_address;
+	struct map **merged_maps_by_address;
+	unsigned int merged_nr_maps_allocated;
+
+	/* First try under a read lock. */
+	while (true) {
+		down_read(maps__lock(kmaps));
+		if (maps__maps_by_address_sorted(kmaps))
+			break;
 
-	down_read(maps__lock(kmaps));
-	first = first_ending_after(kmaps, new_map);
-	overlaps = first &&
-		map__start(rb_entry(first, struct map_rb_node, rb_node)->map) < map__end(new_map);
-	up_read(maps__lock(kmaps));
+		up_read(maps__lock(kmaps));
+
+		/* First after binary search requires sorted maps. Sort and try again. */
+		maps__sort_by_address(kmaps);
+	}
+	first_after_ = first_ending_after(kmaps, new_map);
+	kmaps_maps_by_address = maps__maps_by_address(kmaps);
 
-	if (!overlaps)
+	if (first_after_ >= maps__nr_maps(kmaps) ||
+	    map__start(kmaps_maps_by_address[first_after_]) >= map__end(new_map)) {
+		/* No overlap so regular insert suffices. */
+		up_read(maps__lock(kmaps));
 		return maps__insert(kmaps, new_map);
+	}
+	up_read(maps__lock(kmaps));
 
-	maps__for_each_entry(kmaps, rb_node) {
-		struct map *old_map = rb_node->map;
+	/* Plain insert with a read-lock failed, try again now with the write lock. */
+	down_write(maps__lock(kmaps));
+	if (!maps__maps_by_address_sorted(kmaps))
+		__maps__sort_by_address(kmaps);
 
-		/* no overload with this one */
-		if (map__end(new_map) < map__start(old_map) ||
-		    map__start(new_map) >= map__end(old_map))
-			continue;
+	first_after_ = first_ending_after(kmaps, new_map);
+	kmaps_maps_by_address = maps__maps_by_address(kmaps);
+	kmaps__nr_maps = maps__nr_maps(kmaps);
 
-		if (map__start(new_map) < map__start(old_map)) {
-			/*
-			 * |new......
-			 *       |old....
-			 */
-			if (map__end(new_map) < map__end(old_map)) {
-				/*
-				 * |new......|     -> |new..|
-				 *       |old....| ->       |old....|
-				 */
-				map__set_end(new_map, map__start(old_map));
-			} else {
-				/*
-				 * |new.............| -> |new..|       |new..|
-				 *       |old....|    ->       |old....|
-				 */
-				struct map_list_node *m = map_list_node__new();
+	if (first_after_ >= kmaps__nr_maps ||
+	    map__start(kmaps_maps_by_address[first_after_]) >= map__end(new_map)) {
+		/* No overlap so regular insert suffices. */
+		up_write(maps__lock(kmaps));
+		return maps__insert(kmaps, new_map);
+	}
+	/* Array to merge into, possibly 1 more for the sake of new_map. */
+	merged_nr_maps_allocated = RC_CHK_ACCESS(kmaps)->nr_maps_allocated;
+	if (kmaps__nr_maps + 1 == merged_nr_maps_allocated)
+		merged_nr_maps_allocated++;
+
+	merged_maps_by_address = malloc(merged_nr_maps_allocated * sizeof(*merged_maps_by_address));
+	if (!merged_maps_by_address) {
+		up_write(maps__lock(kmaps));
+		return -ENOMEM;
+	}
+	RC_CHK_ACCESS(kmaps)->maps_by_address = merged_maps_by_address;
+	RC_CHK_ACCESS(kmaps)->maps_by_address_sorted = true;
+	zfree(&RC_CHK_ACCESS(kmaps)->maps_by_name);
+	RC_CHK_ACCESS(kmaps)->maps_by_name_sorted = false;
+	RC_CHK_ACCESS(kmaps)->nr_maps_allocated = merged_nr_maps_allocated;
 
-				if (!m) {
-					err = -ENOMEM;
-					goto out;
-				}
+	/* Copy entries before the new_map that can't overlap. */
+	for (unsigned int i = 0; i < first_after_; i++)
+		merged_maps_by_address[i] = map__get(kmaps_maps_by_address[i]);
 
-				m->map = map__clone(new_map);
-				if (!m->map) {
-					free(m);
-					err = -ENOMEM;
-					goto out;
-				}
+	RC_CHK_ACCESS(kmaps)->nr_maps = first_after_;
 
-				map__set_end(m->map, map__start(old_map));
-				list_add_tail(&m->node, &merged);
-				map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
-				map__set_start(new_map, map__end(old_map));
-			}
-		} else {
-			/*
-			 *      |new......
-			 * |old....
-			 */
-			if (map__end(new_map) < map__end(old_map)) {
-				/*
-				 *      |new..|   -> x
-				 * |old.........| -> |old.........|
-				 */
-				map__put(new_map);
-				new_map = NULL;
-				break;
-			} else {
-				/*
-				 *      |new......| ->         |new...|
-				 * |old....|        -> |old....|
-				 */
-				map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
-				map__set_start(new_map, map__end(old_map));
-			}
-		}
-	}
+	/* Add the new map, it will be split when the later overlapping mappings are added. */
+	__maps__insert(kmaps, new_map);
 
-out:
-	while (!list_empty(&merged)) {
-		struct map_list_node *old_node;
+	/* Insert mappings after new_map, splitting new_map in the process. */
+	for (unsigned int i = first_after_; i < kmaps__nr_maps; i++)
+		__maps__fixup_overlap_and_insert(kmaps, kmaps_maps_by_address[i]);
 
-		old_node = list_entry(merged.next, struct map_list_node, node);
-		list_del_init(&old_node->node);
-		if (!err)
-			err = maps__insert(kmaps, old_node->map);
-		map__put(old_node->map);
-		free(old_node);
-	}
+	/* Copy the maps from merged into kmaps. */
+	for (unsigned int i = 0; i < kmaps__nr_maps; i++)
+		map__zput(kmaps_maps_by_address[i]);
 
-	if (new_map) {
-		if (!err)
-			err = maps__insert(kmaps, new_map);
-		map__put(new_map);
-	}
-	return err;
+	free(kmaps_maps_by_address);
+	up_write(maps__lock(kmaps));
+	return 0;
 }
 
 void maps__load_first(struct maps *maps)
 {
-	struct map_rb_node *first;
-
 	down_read(maps__lock(maps));
 
-	first = maps__first(maps);
-	if (first)
-		map__load(first->map);
+	if (maps__nr_maps(maps) > 0)
+		map__load(maps__maps_by_address(maps)[0]);
 
 	up_read(maps__lock(maps));
 }
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index d836d04c9402..df9dd5a0e3c0 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -25,21 +25,56 @@ static inline struct map_list_node *map_list_node__new(void)
 	return malloc(sizeof(struct map_list_node));
 }
 
-struct map *maps__find(struct maps *maps, u64 addr);
+/*
+ * Locking/sorting note:
+ *
+ * Sorting is done with the write lock, iteration and binary searching happens
+ * under the read lock requiring being sorted. There is a race between sorting
+ * releasing the write lock and acquiring the read lock for iteration/searching
+ * where another thread could insert and break the sorting of the maps. In
+ * practice inserting maps should be rare meaning that the race shouldn't lead
+ * to live lock. Removal of maps doesn't break being sorted.
+ */
 
 DECLARE_RC_STRUCT(maps) {
-	struct rb_root      entries;
 	struct rw_semaphore lock;
-	struct machine	 *machine;
-	struct map	 *last_search_by_name;
+	/**
+	 * @maps_by_address: array of maps sorted by their starting address if
+	 * maps_by_address_sorted is true.
+	 */
+	struct map	 **maps_by_address;
+	/**
+	 * @maps_by_name: optional array of maps sorted by their dso name if
+	 * maps_by_name_sorted is true.
+	 */
 	struct map	 **maps_by_name;
-	refcount_t	 refcnt;
-	unsigned int	 nr_maps;
-	unsigned int	 nr_maps_allocated;
+	struct machine	 *machine;
 #ifdef HAVE_LIBUNWIND_SUPPORT
-	void				*addr_space;
+	void		*addr_space;
 	const struct unwind_libunwind_ops *unwind_libunwind_ops;
 #endif
+	refcount_t	 refcnt;
+	/**
+	 * @nr_maps: number of maps_by_address, and possibly maps_by_name,
+	 * entries that contain maps.
+	 */
+	unsigned int	 nr_maps;
+	/**
+	 * @nr_maps_allocated: number of entries in maps_by_address and possibly
+	 * maps_by_name.
+	 */
+	unsigned int	 nr_maps_allocated;
+	/**
+	 * @last_search_by_name_idx: cache of last found by name entry's index
+	 * as frequent searches for the same dso name are common.
+	 */
+	unsigned int	 last_search_by_name_idx;
+	/** @maps_by_address_sorted: is maps_by_address sorted. */
+	bool		 maps_by_address_sorted;
+	/** @maps_by_name_sorted: is maps_by_name sorted. */
+	bool		 maps_by_name_sorted;
+	/** @ends_broken: does the map contain a map where end values are unset/unsorted? */
+	bool		 ends_broken;
 };
 
 #define KMAP_NAME_LEN 256
@@ -102,6 +137,7 @@ size_t maps__fprintf(struct maps *maps, FILE *fp);
 int maps__insert(struct maps *maps, struct map *map);
 void maps__remove(struct maps *maps, struct map *map);
 
+struct map *maps__find(struct maps *maps, u64 addr);
 struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp);
 struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp);
 
@@ -117,8 +153,6 @@ struct map *maps__find_next_entry(struct maps *maps, struct map *map);
 
 int maps__merge_in(struct maps *kmaps, struct map *new_map);
 
-void __maps__sort_by_name(struct maps *maps);
-
 void maps__fixup_end(struct maps *maps);
 
 void maps__load_first(struct maps *maps);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 29/53] perf maps: Get map before returning in maps__find
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (27 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 28/53] perf maps: Switch from rbtree to lazily sorted array for addresses Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 30/53] perf maps: Get map before returning in maps__find_by_name Ian Rogers
                   ` (23 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Finding a map is done under a lock, but returning the map without
holding a reference means it can be removed without notice once the
lock is dropped, causing use-after-free bugs. Grab a reference to the
map within the locked region and return that reference. Fix up the
locations that now need a map__put.
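
To illustrate, a minimal sketch of the new contract (simplified from
the maps.c hunk below; the sorted/unsorted handling is elided, and
example_caller is a hypothetical user, not part of the patch):
```
struct map *maps__find(struct maps *maps, u64 ip)
{
	struct map **mapp, *result = NULL;

	down_read(maps__lock(maps));
	mapp = bsearch(&ip, maps__maps_by_address(maps), maps__nr_maps(maps),
		       sizeof(*mapp), map__addr_cmp);
	if (mapp)
		result = map__get(*mapp); /* reference taken inside the lock */
	up_read(maps__lock(maps));
	return result;
}

/* Callers now own the returned reference and must drop it. */
static void example_caller(struct maps *maps, u64 addr)
{
	struct map *map = maps__find(maps, addr);

	if (map != NULL) {
		/* ... use map ... */
		map__put(map);
	}
}
```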

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/arch/x86/tests/dwarf-unwind.c |  1 +
 tools/perf/tests/vmlinux-kallsyms.c      |  5 ++---
 tools/perf/util/bpf-event.c              |  1 +
 tools/perf/util/event.c                  |  4 ++--
 tools/perf/util/machine.c                | 22 ++++++++--------------
 tools/perf/util/maps.c                   | 17 ++++++++++-------
 tools/perf/util/symbol.c                 |  3 ++-
 7 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/tools/perf/arch/x86/tests/dwarf-unwind.c b/tools/perf/arch/x86/tests/dwarf-unwind.c
index 5bfec3345d59..c05c0a85dad4 100644
--- a/tools/perf/arch/x86/tests/dwarf-unwind.c
+++ b/tools/perf/arch/x86/tests/dwarf-unwind.c
@@ -34,6 +34,7 @@ static int sample_ustack(struct perf_sample *sample,
 	}
 
 	stack_size = map__end(map) - sp;
+	map__put(map);
 	stack_size = stack_size > STACK_SIZE ? STACK_SIZE : stack_size;
 
 	memcpy(buf, (void *) sp, stack_size);
diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 822f893e67d5..e808e6fc8f76 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -151,10 +151,8 @@ static int test__vmlinux_matches_kallsyms_cb2(struct map *map, void *data)
 	u64 mem_end = map__unmap_ip(args->vmlinux_map, map__end(map));
 
 	pair = maps__find(args->kallsyms.kmaps, mem_start);
-	if (pair == NULL || map__priv(pair))
-		return 0;
 
-	if (map__start(pair) == mem_start) {
+	if (pair != NULL && !map__priv(pair) && map__start(pair) == mem_start) {
 		struct dso *dso = map__dso(map);
 
 		if (!args->header_printed) {
@@ -170,6 +168,7 @@ static int test__vmlinux_matches_kallsyms_cb2(struct map *map, void *data)
 		pr_info(" %s\n", dso->name);
 		map__set_priv(pair, 1);
 	}
+	map__put(pair);
 	return 0;
 }
 
diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index 830711cae30d..d07fd5ffa823 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -63,6 +63,7 @@ static int machine__process_bpf_event_load(struct machine *machine,
 			dso->bpf_prog.id = id;
 			dso->bpf_prog.sub_id = i;
 			dso->bpf_prog.env = env;
+			map__put(map);
 		}
 	}
 	return 0;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 68f45e9e63b6..198903157f9e 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -511,7 +511,7 @@ size_t perf_event__fprintf_text_poke(union perf_event *event, struct machine *ma
 		struct addr_location al;
 
 		addr_location__init(&al);
-		al.map = map__get(maps__find(machine__kernel_maps(machine), tp->addr));
+		al.map = maps__find(machine__kernel_maps(machine), tp->addr);
 		if (al.map && map__load(al.map) >= 0) {
 			al.addr = map__map_ip(al.map, tp->addr);
 			al.sym = map__find_symbol(al.map, al.addr);
@@ -641,7 +641,7 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 		return NULL;
 	}
 	al->maps = maps__get(maps);
-	al->map = map__get(maps__find(maps, al->addr));
+	al->map = maps__find(maps, al->addr);
 	if (al->map != NULL) {
 		/*
 		 * Kernel maps might be changed when loading symbols so loading
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ab345604f274..1112a9dbb21a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -897,7 +897,6 @@ static int machine__process_ksymbol_register(struct machine *machine,
 	struct symbol *sym;
 	struct dso *dso;
 	struct map *map = maps__find(machine__kernel_maps(machine), event->ksymbol.addr);
-	bool put_map = false;
 	int err = 0;
 
 	if (!map) {
@@ -914,12 +913,6 @@ static int machine__process_ksymbol_register(struct machine *machine,
 			err = -ENOMEM;
 			goto out;
 		}
-		/*
-		 * The inserted map has a get on it, we need to put to release
-		 * the reference count here, but do it after all accesses are
-		 * done.
-		 */
-		put_map = true;
 		if (event->ksymbol.ksym_type == PERF_RECORD_KSYMBOL_TYPE_OOL) {
 			dso->binary_type = DSO_BINARY_TYPE__OOL;
 			dso->data.file_size = event->ksymbol.len;
@@ -953,8 +946,7 @@ static int machine__process_ksymbol_register(struct machine *machine,
 	}
 	dso__insert_symbol(dso, sym);
 out:
-	if (put_map)
-		map__put(map);
+	map__put(map);
 	return err;
 }
 
@@ -978,7 +970,7 @@ static int machine__process_ksymbol_unregister(struct machine *machine,
 		if (sym)
 			dso__delete_symbol(dso, sym);
 	}
-
+	map__put(map);
 	return 0;
 }
 
@@ -1006,11 +998,11 @@ int machine__process_text_poke(struct machine *machine, union perf_event *event,
 		perf_event__fprintf_text_poke(event, machine, stdout);
 
 	if (!event->text_poke.new_len)
-		return 0;
+		goto out;
 
 	if (cpumode != PERF_RECORD_MISC_KERNEL) {
 		pr_debug("%s: unsupported cpumode - ignoring\n", __func__);
-		return 0;
+		goto out;
 	}
 
 	if (dso) {
@@ -1033,7 +1025,8 @@ int machine__process_text_poke(struct machine *machine, union perf_event *event,
 		pr_debug("Failed to find kernel text poke address map for %#" PRI_lx64 "\n",
 			 event->text_poke.addr);
 	}
-
+out:
+	map__put(map);
 	return 0;
 }
 
@@ -1301,9 +1294,10 @@ static int machine__map_x86_64_entry_trampolines_cb(struct map *map, void *data)
 		return 0;
 
 	dest_map = maps__find(args->kmaps, map__pgoff(map));
-	if (dest_map != map)
+	if (RC_CHK_ACCESS(dest_map) != RC_CHK_ACCESS(map))
 		map__set_pgoff(map, map__map_ip(dest_map, map__pgoff(map)));
 
+	map__put(dest_map);
 	args->found = true;
 	return 0;
 }
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 06fdd8a7c2a2..28facfdac1d7 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -487,15 +487,18 @@ void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data
 struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
 {
 	struct map *map = maps__find(maps, addr);
+	struct symbol *result = NULL;
 
 	/* Ensure map is loaded before using map->map_ip */
 	if (map != NULL && map__load(map) >= 0) {
-		if (mapp != NULL)
-			*mapp = map; // TODO: map_put on else path when find returns a get.
-		return map__find_symbol(map, map__map_ip(map, addr));
-	}
+		if (mapp)
+			*mapp = map;
 
-	return NULL;
+		result = map__find_symbol(map, map__map_ip(map, addr));
+		if (!mapp)
+			map__put(map);
+	}
+	return result;
 }
 
 struct maps__find_symbol_by_name_args {
@@ -539,7 +542,7 @@ int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
 	if (ams->addr < map__start(ams->ms.map) || ams->addr >= map__end(ams->ms.map)) {
 		if (maps == NULL)
 			return -1;
-		ams->ms.map = maps__find(maps, ams->addr);  // TODO: map_get
+		ams->ms.map = maps__find(maps, ams->addr);
 		if (ams->ms.map == NULL)
 			return -1;
 	}
@@ -848,7 +851,7 @@ struct map *maps__find(struct maps *maps, u64 ip)
 					sizeof(*mapp), map__addr_cmp);
 
 			if (mapp)
-				result = *mapp; // map__get(*mapp);
+				result = map__get(*mapp);
 			done = true;
 		}
 		up_read(maps__lock(maps));
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 30da8a405d11..ad4819a24320 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -757,7 +757,6 @@ static int dso__load_all_kallsyms(struct dso *dso, const char *filename)
 
 static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 {
-	struct map *curr_map;
 	struct symbol *pos;
 	int count = 0;
 	struct rb_root_cached old_root = dso->symbols;
@@ -770,6 +769,7 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 	*root = RB_ROOT_CACHED;
 
 	while (next) {
+		struct map *curr_map;
 		struct dso *curr_map_dso;
 		char *module;
 
@@ -796,6 +796,7 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 			pos->end -= map__start(curr_map) - map__pgoff(curr_map);
 		symbols__insert(&curr_map_dso->symbols, pos);
 		++count;
+		map__put(curr_map);
 	}
 
 	/* Symbols have been adjusted */
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 30/53] perf maps: Get map before returning in maps__find_by_name
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (28 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 29/53] perf maps: Get map before returning in maps__find Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 31/53] perf maps: Get map before returning in maps__find_next_entry Ian Rogers
                   ` (22 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Finding a map is done under a lock, but returning the map without
holding a reference means it can be removed without notice once the
lock is dropped, causing use-after-free bugs. Grab a reference to the
map within the locked region and return that reference. Fix up the
locations that now need a map__put. Also fix some reference-counted
pointer comparisons.
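
To illustrate the comparison fixes, a minimal sketch (same_map is a
hypothetical helper, not part of the patch):
```
static bool same_map(const struct map *a, const struct map *b)
{
	/*
	 * With reference count checking enabled, 'a' and 'b' may be
	 * distinct wrapper allocations around the same underlying map,
	 * so plain pointer equality can report false negatives.
	 * RC_CHK_EQUAL() compares the wrapped objects instead.
	 */
	return RC_CHK_EQUAL(a, b);
}
```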

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/vmlinux-kallsyms.c |  5 +++--
 tools/perf/util/machine.c           |  6 ++++--
 tools/perf/util/maps.c              |  6 +++---
 tools/perf/util/probe-event.c       |  1 +
 tools/perf/util/symbol-elf.c        |  4 +++-
 tools/perf/util/symbol.c            | 20 ++++++++++++--------
 6 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index e808e6fc8f76..fecbf851bb2e 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -131,9 +131,10 @@ static int test__vmlinux_matches_kallsyms_cb1(struct map *map, void *data)
 	struct map *pair = maps__find_by_name(args->kallsyms.kmaps,
 					(dso->kernel ? dso->short_name : dso->name));
 
-	if (pair)
+	if (pair) {
 		map__set_priv(pair, 1);
-	else {
+		map__put(pair);
+	} else {
 		if (!args->header_printed) {
 			pr_info("WARN: Maps only in vmlinux:\n");
 			args->header_printed = true;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 1112a9dbb21a..d6b3f84cb935 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1538,8 +1538,10 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo
 		return 0;
 
 	long_name = strdup(path);
-	if (long_name == NULL)
+	if (long_name == NULL) {
+		map__put(map);
 		return -ENOMEM;
+	}
 
 	dso = map__dso(map);
 	dso__set_long_name(dso, long_name, true);
@@ -1553,7 +1555,7 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo
 		dso->symtab_type++;
 		dso->comp = m->comp;
 	}
-
+	map__put(map);
 	return 0;
 }
 
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 28facfdac1d7..8a8c1f216b86 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -885,7 +885,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
 			struct dso *dso = map__dso(maps__maps_by_name(maps)[i]);
 
 			if (dso && strcmp(dso->short_name, name) == 0) {
-				result = maps__maps_by_name(maps)[i]; // TODO: map__get
+				result = map__get(maps__maps_by_name(maps)[i]);
 				done = true;
 			}
 		}
@@ -897,7 +897,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
 					sizeof(*mapp), map__strcmp_name);
 
 			if (mapp) {
-				result = *mapp; // TODO: map__get
+				result = map__get(*mapp);
 				i = mapp - maps__maps_by_name(maps);
 				RC_CHK_ACCESS(maps)->last_search_by_name_idx = i;
 			}
@@ -922,7 +922,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
 					struct dso *dso = map__dso(pos);
 
 					if (dso && strcmp(dso->short_name, name) == 0) {
-						result = pos; // TODO: map__get
+						result = map__get(pos);
 						break;
 					}
 				}
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index a1a796043691..be71abe8b9b0 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -358,6 +358,7 @@ static int kernel_get_module_dso(const char *module, struct dso **pdso)
 		map = maps__find_by_name(machine__kernel_maps(host_machine), module_name);
 		if (map) {
 			dso = map__dso(map);
+			map__put(map);
 			goto found;
 		}
 		pr_debug("Failed to find module %s.\n", module);
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 4b934ed3bfd1..5990e3fabdb5 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1470,8 +1470,10 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 		dso__set_loaded(curr_dso);
 		*curr_mapp = curr_map;
 		*curr_dsop = curr_dso;
-	} else
+	} else {
 		*curr_dsop = map__dso(curr_map);
+		map__put(curr_map);
+	}
 
 	return 0;
 }
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ad4819a24320..0785a54e832e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -814,7 +814,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 				struct map *initial_map)
 {
 	struct machine *machine;
-	struct map *curr_map = initial_map;
+	struct map *curr_map = map__get(initial_map);
 	struct symbol *pos;
 	int count = 0, moved = 0;
 	struct rb_root_cached *root = &dso->symbols;
@@ -858,13 +858,14 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 					dso__set_loaded(curr_map_dso);
 				}
 
+				map__zput(curr_map);
 				curr_map = maps__find_by_name(kmaps, module);
 				if (curr_map == NULL) {
 					pr_debug("%s/proc/{kallsyms,modules} "
 					         "inconsistency while looking "
 						 "for \"%s\" module!\n",
 						 machine->root_dir, module);
-					curr_map = initial_map;
+					curr_map = map__get(initial_map);
 					goto discard_symbol;
 				}
 				curr_map_dso = map__dso(curr_map);
@@ -888,7 +889,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			 * symbols at this point.
 			 */
 			goto discard_symbol;
-		} else if (curr_map != initial_map) {
+		} else if (!RC_CHK_EQUAL(curr_map, initial_map)) {
 			char dso_name[PATH_MAX];
 			struct dso *ndso;
 
@@ -899,7 +900,8 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			}
 
 			if (count == 0) {
-				curr_map = initial_map;
+				map__zput(curr_map);
+				curr_map = map__get(initial_map);
 				goto add_symbol;
 			}
 
@@ -913,6 +915,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 					kernel_range++);
 
 			ndso = dso__new(dso_name);
+			map__zput(curr_map);
 			if (ndso == NULL)
 				return -1;
 
@@ -926,6 +929,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 
 			map__set_mapping_type(curr_map, MAPPING_TYPE__IDENTITY);
 			if (maps__insert(kmaps, curr_map)) {
+				map__zput(curr_map);
 				dso__put(ndso);
 				return -1;
 			}
@@ -936,7 +940,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			pos->end -= delta;
 		}
 add_symbol:
-		if (curr_map != initial_map) {
+		if (!RC_CHK_EQUAL(curr_map, initial_map)) {
 			struct dso *curr_map_dso = map__dso(curr_map);
 
 			rb_erase_cached(&pos->rb_node, root);
@@ -951,12 +955,12 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 		symbol__delete(pos);
 	}
 
-	if (curr_map != initial_map &&
+	if (!RC_CHK_EQUAL(curr_map, initial_map) &&
 	    dso->kernel == DSO_SPACE__KERNEL_GUEST &&
 	    machine__is_default_guest(maps__machine(kmaps))) {
 		dso__set_loaded(map__dso(curr_map));
 	}
-
+	map__put(curr_map);
 	return count + moved;
 }
 
@@ -1248,7 +1252,7 @@ static bool remove_old_maps(struct map *map, void *data)
 	 * We need to preserve eBPF maps even if they are covered by kcore,
 	 * because we need to access eBPF dso for source data.
 	 */
-	return RC_CHK_ACCESS(map) != RC_CHK_ACCESS(map_to_save) && !__map__is_bpf_prog(map);
+	return !RC_CHK_EQUAL(map, map_to_save) && !__map__is_bpf_prog(map);
 }
 
 static int dso__load_kcore(struct dso *dso, struct map *map,
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 31/53] perf maps: Get map before returning in maps__find_next_entry
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (29 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 30/53] perf maps: Get map before returning in maps__find_by_name Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 32/53] perf maps: Hide maps internals Ian Rogers
                   ` (21 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Finding a map is done under a lock, but returning the map without
holding a reference means it can be removed without notice once the
lock is dropped, causing use-after-free bugs. Grab a reference to the
map within the locked region and return that reference. Fix up the
locations that now need a map__put.
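
As with the previous patches, callers own the result; a sketch of the
updated usage, mirroring the machine.c hunk below:
```
	struct map *next = maps__find_next_entry(machine__kernel_maps(machine),
						 machine__kernel_map(machine));

	if (next) {
		machine__set_kernel_mmap(machine, start, map__start(next));
		map__put(next); /* drop the reference from the lookup */
	}
```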

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 4 +++-
 tools/perf/util/maps.c    | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d6b3f84cb935..42d73f00f9c1 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1758,8 +1758,10 @@ int machine__create_kernel_maps(struct machine *machine)
 		struct map *next = maps__find_next_entry(machine__kernel_maps(machine),
 							 machine__kernel_map(machine));
 
-		if (next)
+		if (next) {
 			machine__set_kernel_mmap(machine, start, map__start(next));
+			map__put(next);
+		}
 	}
 
 out_put:
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 8a8c1f216b86..b3937e734cbf 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -942,7 +942,7 @@ struct map *maps__find_next_entry(struct maps *maps, struct map *map)
 	down_read(maps__lock(maps));
 	i = maps__by_address_index(maps, map);
 	if (i < maps__nr_maps(maps))
-		result = maps__maps_by_address(maps)[i]; // TODO: map__get
+		result = map__get(maps__maps_by_address(maps)[i]);
 
 	up_read(maps__lock(maps));
 	return result;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 32/53] perf maps: Hide maps internals
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (30 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 31/53] perf maps: Get map before returning in maps__find_next_entry Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 33/53] perf maps: Locking tidy up of nr_maps Ian Rogers
                   ` (20 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Move the struct maps definition into the C file. Add maps__equal so
that reference count checking no longer requires exposing the struct.
Add accessors for the unwind_libunwind_ops. Move map_list_node to its
only user, symbol.c.
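
A sketch of the caller-side effect (sample_is_kernel is a hypothetical
caller, not part of the patch):
```
static bool sample_is_kernel(struct addr_location *al, struct machine *machine)
{
	/*
	 * maps__equal() wraps RC_CHK_EQUAL() inside maps.c, so callers no
	 * longer need the struct maps definition or the RC_CHK_* macros.
	 */
	return maps__equal(al->maps, machine__kernel_maps(machine));
}
```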

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/thread-maps-share.c     |  8 +-
 tools/perf/util/callchain.c              |  2 +-
 tools/perf/util/maps.c                   | 96 +++++++++++++++++++++++
 tools/perf/util/maps.h                   | 97 +++---------------------
 tools/perf/util/symbol.c                 | 10 +++
 tools/perf/util/thread.c                 |  2 +-
 tools/perf/util/unwind-libunwind-local.c |  2 +-
 tools/perf/util/unwind-libunwind.c       |  7 +-
 8 files changed, 123 insertions(+), 101 deletions(-)

diff --git a/tools/perf/tests/thread-maps-share.c b/tools/perf/tests/thread-maps-share.c
index 7fa6f7c568e2..e9ecd30a5c05 100644
--- a/tools/perf/tests/thread-maps-share.c
+++ b/tools/perf/tests/thread-maps-share.c
@@ -46,9 +46,9 @@ static int test__thread_maps_share(struct test_suite *test __maybe_unused, int s
 	TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(maps__refcnt(maps)), 4);
 
 	/* test the maps pointer is shared */
-	TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t1)));
-	TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t2)));
-	TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t3)));
+	TEST_ASSERT_VAL("maps don't match", maps__equal(maps, thread__maps(t1)));
+	TEST_ASSERT_VAL("maps don't match", maps__equal(maps, thread__maps(t2)));
+	TEST_ASSERT_VAL("maps don't match", maps__equal(maps, thread__maps(t3)));
 
 	/*
 	 * Verify the other leader was created by previous call.
@@ -73,7 +73,7 @@ static int test__thread_maps_share(struct test_suite *test __maybe_unused, int s
 	other_maps = thread__maps(other);
 	TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(maps__refcnt(other_maps)), 2);
 
-	TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(other_maps, thread__maps(other_leader)));
+	TEST_ASSERT_VAL("maps don't match", maps__equal(other_maps, thread__maps(other_leader)));
 
 	/* release thread group */
 	thread__put(t3);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 8262f69118db..7517d16c02ec 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1157,7 +1157,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
 		if (al->map == NULL)
 			goto out;
 	}
-	if (RC_CHK_EQUAL(al->maps, machine__kernel_maps(machine))) {
+	if (maps__equal(al->maps, machine__kernel_maps(machine))) {
 		if (machine__is_host(machine)) {
 			al->cpumode = PERF_RECORD_MISC_KERNEL;
 			al->level = 'k';
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index b3937e734cbf..41e9e39b1b4c 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -6,9 +6,63 @@
 #include "dso.h"
 #include "map.h"
 #include "maps.h"
+#include "rwsem.h"
 #include "thread.h"
 #include "ui/ui.h"
 #include "unwind.h"
+#include <internal/rc_check.h>
+
+/*
+ * Locking/sorting note:
+ *
+ * Sorting is done with the write lock, iteration and binary searching happens
+ * under the read lock requiring being sorted. There is a race between sorting
+ * releasing the write lock and acquiring the read lock for iteration/searching
+ * where another thread could insert and break the sorting of the maps. In
+ * practice inserting maps should be rare meaning that the race shouldn't lead
+ * to live lock. Removal of maps doesn't break being sorted.
+ */
+
+DECLARE_RC_STRUCT(maps) {
+	struct rw_semaphore lock;
+	/**
+	 * @maps_by_address: array of maps sorted by their starting address if
+	 * maps_by_address_sorted is true.
+	 */
+	struct map	 **maps_by_address;
+	/**
+	 * @maps_by_name: optional array of maps sorted by their dso name if
+	 * maps_by_name_sorted is true.
+	 */
+	struct map	 **maps_by_name;
+	struct machine	 *machine;
+#ifdef HAVE_LIBUNWIND_SUPPORT
+	void		*addr_space;
+	const struct unwind_libunwind_ops *unwind_libunwind_ops;
+#endif
+	refcount_t	 refcnt;
+	/**
+	 * @nr_maps: number of maps_by_address, and possibly maps_by_name,
+	 * entries that contain maps.
+	 */
+	unsigned int	 nr_maps;
+	/**
+	 * @nr_maps_allocated: number of entries in maps_by_address and possibly
+	 * maps_by_name.
+	 */
+	unsigned int	 nr_maps_allocated;
+	/**
+	 * @last_search_by_name_idx: cache of last found by name entry's index
+	 * as frequent searches for the same dso name are common.
+	 */
+	unsigned int	 last_search_by_name_idx;
+	/** @maps_by_address_sorted: is maps_by_address sorted. */
+	bool		 maps_by_address_sorted;
+	/** @maps_by_name_sorted: is maps_by_name sorted. */
+	bool		 maps_by_name_sorted;
+	/** @ends_broken: does the map contain a map where end values are unset/unsorted? */
+	bool		 ends_broken;
+};
 
 static void check_invariants(const struct maps *maps __maybe_unused)
 {
@@ -103,6 +157,43 @@ static void maps__set_maps_by_name_sorted(struct maps *maps, bool value)
 	RC_CHK_ACCESS(maps)->maps_by_name_sorted = value;
 }
 
+struct machine *maps__machine(const struct maps *maps)
+{
+	return RC_CHK_ACCESS(maps)->machine;
+}
+
+unsigned int maps__nr_maps(const struct maps *maps)
+{
+	return RC_CHK_ACCESS(maps)->nr_maps;
+}
+
+refcount_t *maps__refcnt(struct maps *maps)
+{
+	return &RC_CHK_ACCESS(maps)->refcnt;
+}
+
+#ifdef HAVE_LIBUNWIND_SUPPORT
+void *maps__addr_space(const struct maps *maps)
+{
+	return RC_CHK_ACCESS(maps)->addr_space;
+}
+
+void maps__set_addr_space(struct maps *maps, void *addr_space)
+{
+	RC_CHK_ACCESS(maps)->addr_space = addr_space;
+}
+
+const struct unwind_libunwind_ops *maps__unwind_libunwind_ops(const struct maps *maps)
+{
+	return RC_CHK_ACCESS(maps)->unwind_libunwind_ops;
+}
+
+void maps__set_unwind_libunwind_ops(struct maps *maps, const struct unwind_libunwind_ops *ops)
+{
+	RC_CHK_ACCESS(maps)->unwind_libunwind_ops = ops;
+}
+#endif
+
 static struct rw_semaphore *maps__lock(struct maps *maps)
 {
 	/*
@@ -440,6 +531,11 @@ bool maps__empty(struct maps *maps)
 	return maps__nr_maps(maps) == 0;
 }
 
+bool maps__equal(struct maps *a, struct maps *b)
+{
+	return RC_CHK_EQUAL(a, b);
+}
+
 int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
 {
 	bool done = false;
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index df9dd5a0e3c0..4bcba136ffe5 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -3,80 +3,15 @@
 #define __PERF_MAPS_H
 
 #include <linux/refcount.h>
-#include <linux/rbtree.h>
 #include <stdio.h>
 #include <stdbool.h>
 #include <linux/types.h>
-#include "rwsem.h"
-#include <internal/rc_check.h>
 
 struct ref_reloc_sym;
 struct machine;
 struct map;
 struct maps;
 
-struct map_list_node {
-	struct list_head node;
-	struct map *map;
-};
-
-static inline struct map_list_node *map_list_node__new(void)
-{
-	return malloc(sizeof(struct map_list_node));
-}
-
-/*
- * Locking/sorting note:
- *
- * Sorting is done with the write lock, iteration and binary searching happens
- * under the read lock requiring being sorted. There is a race between sorting
- * releasing the write lock and acquiring the read lock for iteration/searching
- * where another thread could insert and break the sorting of the maps. In
- * practice inserting maps should be rare meaning that the race shouldn't lead
- * to live lock. Removal of maps doesn't break being sorted.
- */
-
-DECLARE_RC_STRUCT(maps) {
-	struct rw_semaphore lock;
-	/**
-	 * @maps_by_address: array of maps sorted by their starting address if
-	 * maps_by_address_sorted is true.
-	 */
-	struct map	 **maps_by_address;
-	/**
-	 * @maps_by_name: optional array of maps sorted by their dso name if
-	 * maps_by_name_sorted is true.
-	 */
-	struct map	 **maps_by_name;
-	struct machine	 *machine;
-#ifdef HAVE_LIBUNWIND_SUPPORT
-	void		*addr_space;
-	const struct unwind_libunwind_ops *unwind_libunwind_ops;
-#endif
-	refcount_t	 refcnt;
-	/**
-	 * @nr_maps: number of maps_by_address, and possibly maps_by_name,
-	 * entries that contain maps.
-	 */
-	unsigned int	 nr_maps;
-	/**
-	 * @nr_maps_allocated: number of entries in maps_by_address and possibly
-	 * maps_by_name.
-	 */
-	unsigned int	 nr_maps_allocated;
-	/**
-	 * @last_search_by_name_idx: cache of last found by name entry's index
-	 * as frequent searches for the same dso name are common.
-	 */
-	unsigned int	 last_search_by_name_idx;
-	/** @maps_by_address_sorted: is maps_by_address sorted. */
-	bool		 maps_by_address_sorted;
-	/** @maps_by_name_sorted: is maps_by_name sorted. */
-	bool		 maps_by_name_sorted;
-	/** @ends_broken: does the map contain a map where end values are unset/unsorted? */
-	bool		 ends_broken;
-};
-
 #define KMAP_NAME_LEN 256
 
 struct kmap {
@@ -100,36 +35,22 @@ static inline void __maps__zput(struct maps **map)
 
 #define maps__zput(map) __maps__zput(&map)
 
+bool maps__equal(struct maps *a, struct maps *b);
+
 /* Iterate over map calling cb for each entry. */
 int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
 /* Iterate over map removing an entry if cb returns true. */
 void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);
 
-static inline struct machine *maps__machine(struct maps *maps)
-{
-	return RC_CHK_ACCESS(maps)->machine;
-}
-
-static inline unsigned int maps__nr_maps(const struct maps *maps)
-{
-	return RC_CHK_ACCESS(maps)->nr_maps;
-}
-
-static inline refcount_t *maps__refcnt(struct maps *maps)
-{
-	return &RC_CHK_ACCESS(maps)->refcnt;
-}
+struct machine *maps__machine(const struct maps *maps);
+unsigned int maps__nr_maps(const struct maps *maps);
+refcount_t *maps__refcnt(struct maps *maps);
 
 #ifdef HAVE_LIBUNWIND_SUPPORT
-static inline void *maps__addr_space(struct maps *maps)
-{
-	return RC_CHK_ACCESS(maps)->addr_space;
-}
-
-static inline const struct unwind_libunwind_ops *maps__unwind_libunwind_ops(const struct maps *maps)
-{
-	return RC_CHK_ACCESS(maps)->unwind_libunwind_ops;
-}
+void *maps__addr_space(const struct maps *maps);
+void maps__set_addr_space(struct maps *maps, void *addr_space);
+const struct unwind_libunwind_ops *maps__unwind_libunwind_ops(const struct maps *maps);
+void maps__set_unwind_libunwind_ops(struct maps *maps, const struct unwind_libunwind_ops *ops);
 #endif
 
 size_t maps__fprintf(struct maps *maps, FILE *fp);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 0785a54e832e..35975189999b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -63,6 +63,16 @@ struct symbol_conf symbol_conf = {
 	.res_sample		= 0,
 };
 
+struct map_list_node {
+	struct list_head node;
+	struct map *map;
+};
+
+static struct map_list_node *map_list_node__new(void)
+{
+	return malloc(sizeof(struct map_list_node));
+}
+
 static enum dso_binary_type binary_type_symtab[] = {
 	DSO_BINARY_TYPE__KALLSYMS,
 	DSO_BINARY_TYPE__GUEST_KALLSYMS,
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 89c47a5098e2..c59ab4d79163 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -383,7 +383,7 @@ static int thread__clone_maps(struct thread *thread, struct thread *parent, bool
 	if (thread__pid(thread) == thread__pid(parent))
 		return thread__prepare_access(thread);
 
-	if (RC_CHK_EQUAL(thread__maps(thread), thread__maps(parent))) {
+	if (maps__equal(thread__maps(thread), thread__maps(parent))) {
 		pr_debug("broken map groups on thread %d/%d parent %d/%d\n",
 			 thread__pid(thread), thread__tid(thread),
 			 thread__pid(parent), thread__tid(parent));
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index 228f1565bd0b..b69dc3a447db 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -706,7 +706,7 @@ static int _unwind__prepare_access(struct maps *maps)
 {
 	void *addr_space = unw_create_addr_space(&accessors, 0);
 
-	RC_CHK_ACCESS(maps)->addr_space = addr_space;
+	maps__set_addr_space(maps, addr_space);
 	if (!addr_space) {
 		pr_err("unwind: Can't create unwind address space.\n");
 		return -ENOMEM;
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 76cd63de80a8..2728eb4f13ea 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -12,11 +12,6 @@ struct unwind_libunwind_ops __weak *local_unwind_libunwind_ops;
 struct unwind_libunwind_ops __weak *x86_32_unwind_libunwind_ops;
 struct unwind_libunwind_ops __weak *arm64_unwind_libunwind_ops;
 
-static void unwind__register_ops(struct maps *maps, struct unwind_libunwind_ops *ops)
-{
-	RC_CHK_ACCESS(maps)->unwind_libunwind_ops = ops;
-}
-
 int unwind__prepare_access(struct maps *maps, struct map *map, bool *initialized)
 {
 	const char *arch;
@@ -60,7 +55,7 @@ int unwind__prepare_access(struct maps *maps, struct map *map, bool *initialized
 		return 0;
 	}
 out_register:
-	unwind__register_ops(maps, ops);
+	maps__set_unwind_libunwind_ops(maps, ops);
 
 	err = maps__unwind_libunwind_ops(maps)->prepare_access(maps);
 	if (initialized)
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 33/53] perf maps: Locking tidy up of nr_maps
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (31 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 32/53] perf maps: Hide maps internals Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 34/53] perf dso: Reorder variables to save space in struct dso Ian Rogers
                   ` (19 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

After this change maps__nr_maps is only used by tests; existing users
are migrated to maps__empty. Compute maps__empty under the read lock.
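
A sketch of the migration (thread_has_maps is a hypothetical wrapper
illustrating the machine.c change below):
```
static bool thread_has_maps(struct thread *thread)
{
	/* maps__empty() now takes the maps read lock internally. */
	return !maps__empty(thread__maps(thread));
}
```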

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c |  2 +-
 tools/perf/util/maps.c    | 10 ++++++++--
 tools/perf/util/maps.h    |  4 ++--
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 42d73f00f9c1..f9c77119af22 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -441,7 +441,7 @@ static struct thread *findnew_guest_code(struct machine *machine,
 		return NULL;
 
 	/* Assume maps are set up if there are any */
-	if (maps__nr_maps(thread__maps(thread)))
+	if (!maps__empty(thread__maps(thread)))
 		return thread;
 
 	host_thread = machine__find_thread(host_machine, -1, pid);
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 41e9e39b1b4c..725f5d73e93a 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -528,7 +528,13 @@ void maps__remove(struct maps *maps, struct map *map)
 
 bool maps__empty(struct maps *maps)
 {
-	return maps__nr_maps(maps) == 0;
+	bool res;
+
+	down_read(maps__lock(maps));
+	res = maps__nr_maps(maps) == 0;
+	up_read(maps__lock(maps));
+
+	return res;
 }
 
 bool maps__equal(struct maps *a, struct maps *b)
@@ -851,7 +857,7 @@ int maps__copy_from(struct maps *dest, struct maps *parent)
 
 	parent_maps_by_address = maps__maps_by_address(parent);
 	n = maps__nr_maps(parent);
-	if (maps__empty(dest)) {
+	if (maps__nr_maps(dest) == 0) {
 		/* No existing mappings so just copy from parent to avoid reallocs in insert. */
 		unsigned int nr_maps_allocated = RC_CHK_ACCESS(parent)->nr_maps_allocated;
 		struct map **dest_maps_by_address =
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 4bcba136ffe5..d9aa62ed968a 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -43,8 +43,8 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
 void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);
 
 struct machine *maps__machine(const struct maps *maps);
-unsigned int maps__nr_maps(const struct maps *maps);
-refcount_t *maps__refcnt(struct maps *maps);
+unsigned int maps__nr_maps(const struct maps *maps); /* Test only. */
+refcount_t *maps__refcnt(struct maps *maps); /* Test only. */
 
 #ifdef HAVE_LIBUNWIND_SUPPORT
 void *maps__addr_space(const struct maps *maps);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 34/53] perf dso: Reorder variables to save space in struct dso
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (32 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 33/53] perf maps: Locking tidy up of nr_maps Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 35/53] perf report: Sort child tasks by tid Ian Rogers
                   ` (18 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Save 40 bytes, shrinking struct dso from 8 cache lines to 7. Make the
dwfl variable conditional on being a powerpc build. Squeeze int/enum
types into bitfields where appropriate. Remove holes/padding by
reordering variables. A toy example of the techniques follows the
pahole output below.

Before:
```
struct dso {
        struct mutex               lock;                 /*     0    40 */
        struct list_head           node;                 /*    40    16 */
        struct rb_node             rb_node __attribute__((__aligned__(8))); /*    56    24 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        struct rb_root *           root;                 /*    80     8 */
        struct rb_root_cached      symbols;              /*    88    16 */
        struct symbol * *          symbol_names;         /*   104     8 */
        size_t                     symbol_names_len;     /*   112     8 */
        struct rb_root_cached      inlined_nodes;        /*   120    16 */
        /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
        struct rb_root_cached      srclines;             /*   136    16 */
        struct {
                u64                addr;                 /*   152     8 */
                struct symbol *    symbol;               /*   160     8 */
        } last_find_result;                              /*   152    16 */
        void *                     a2l;                  /*   168     8 */
        char *                     symsrc_filename;      /*   176     8 */
        unsigned int               a2l_fails;            /*   184     4 */
        enum dso_space_type        kernel;               /*   188     4 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        _Bool                      is_kmod;              /*   192     1 */

        /* XXX 3 bytes hole, try to pack */

        enum dso_swap_type         needs_swap;           /*   196     4 */
        enum dso_binary_type       symtab_type;          /*   200     4 */
        enum dso_binary_type       binary_type;          /*   204     4 */
        enum dso_load_errno        load_errno;           /*   208     4 */
        u8                         adjust_symbols:1;     /*   212: 0  1 */
        u8                         has_build_id:1;       /*   212: 1  1 */
        u8                         header_build_id:1;    /*   212: 2  1 */
        u8                         has_srcline:1;        /*   212: 3  1 */
        u8                         hit:1;                /*   212: 4  1 */
        u8                         annotate_warned:1;    /*   212: 5  1 */
        u8                         auxtrace_warned:1;    /*   212: 6  1 */
        u8                         short_name_allocated:1; /*   212: 7  1 */
        u8                         long_name_allocated:1; /*   213: 0  1 */
        u8                         is_64_bit:1;          /*   213: 1  1 */

        /* XXX 6 bits hole, try to pack */

        _Bool                      sorted_by_name;       /*   214     1 */
        _Bool                      loaded;               /*   215     1 */
        u8                         rel;                  /*   216     1 */

        /* XXX 7 bytes hole, try to pack */

        struct build_id            bid;                  /*   224    32 */
        /* --- cacheline 4 boundary (256 bytes) --- */
        u64                        text_offset;          /*   256     8 */
        u64                        text_end;             /*   264     8 */
        const char  *              short_name;           /*   272     8 */
        const char  *              long_name;            /*   280     8 */
        u16                        long_name_len;        /*   288     2 */
        u16                        short_name_len;       /*   290     2 */

        /* XXX 4 bytes hole, try to pack */

        void *                     dwfl;                 /*   296     8 */
        struct auxtrace_cache *    auxtrace_cache;       /*   304     8 */
        int                        comp;                 /*   312     4 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 5 boundary (320 bytes) --- */
        struct {
                struct rb_root     cache;                /*   320     8 */
                int                fd;                   /*   328     4 */
                int                status;               /*   332     4 */
                u32                status_seen;          /*   336     4 */

                /* XXX 4 bytes hole, try to pack */

                u64                file_size;            /*   344     8 */
                struct list_head   open_entry;           /*   352    16 */
                u64                elf_base_addr;        /*   368     8 */
                u64                debug_frame_offset;   /*   376     8 */
                /* --- cacheline 6 boundary (384 bytes) --- */
                u64                eh_frame_hdr_addr;    /*   384     8 */
                u64                eh_frame_hdr_offset;  /*   392     8 */
        } data;                                          /*   320    80 */
        struct {
                u32                id;                   /*   400     4 */
                u32                sub_id;               /*   404     4 */
                struct perf_env *  env;                  /*   408     8 */
        } bpf_prog;                                      /*   400    16 */
        union {
                void *             priv;                 /*   416     8 */
                u64                db_id;                /*   416     8 */
        };                                               /*   416     8 */
        struct nsinfo *            nsinfo;               /*   424     8 */
        struct dso_id              id;                   /*   432    24 */
        /* --- cacheline 7 boundary (448 bytes) was 8 bytes ago --- */
        refcount_t                 refcnt;               /*   456     4 */
        char                       name[];               /*   460     0 */

        /* size: 464, cachelines: 8, members: 49 */
        /* sum members: 440, holes: 4, sum holes: 18 */
        /* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 6 bits */
        /* padding: 4 */
        /* forced alignments: 1 */
        /* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));
```

After:
```
struct dso {
        struct mutex               lock;                 /*     0    40 */
        struct list_head           node;                 /*    40    16 */
        struct rb_node             rb_node __attribute__((__aligned__(8))); /*    56    24 */
        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
        struct rb_root *           root;                 /*    80     8 */
        struct rb_root_cached      symbols;              /*    88    16 */
        struct symbol * *          symbol_names;         /*   104     8 */
        size_t                     symbol_names_len;     /*   112     8 */
        struct rb_root_cached      inlined_nodes;        /*   120    16 */
        /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
        struct rb_root_cached      srclines;             /*   136    16 */
        struct {
                u64                addr;                 /*   152     8 */
                struct symbol *    symbol;               /*   160     8 */
        } last_find_result;                              /*   152    16 */
        struct build_id            bid;                  /*   168    32 */
        /* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
        u64                        text_offset;          /*   200     8 */
        u64                        text_end;             /*   208     8 */
        const char  *              short_name;           /*   216     8 */
        const char  *              long_name;            /*   224     8 */
        void *                     a2l;                  /*   232     8 */
        char *                     symsrc_filename;      /*   240     8 */
        struct nsinfo *            nsinfo;               /*   248     8 */
        /* --- cacheline 4 boundary (256 bytes) --- */
        struct auxtrace_cache *    auxtrace_cache;       /*   256     8 */
        union {
                void *             priv;                 /*   264     8 */
                u64                db_id;                /*   264     8 */
        };                                               /*   264     8 */
        struct {
                struct perf_env *  env;                  /*   272     8 */
                u32                id;                   /*   280     4 */
                u32                sub_id;               /*   284     4 */
        } bpf_prog;                                      /*   272    16 */
        struct {
                struct rb_root     cache;                /*   288     8 */
                struct list_head   open_entry;           /*   296    16 */
                u64                file_size;            /*   312     8 */
                /* --- cacheline 5 boundary (320 bytes) --- */
                u64                elf_base_addr;        /*   320     8 */
                u64                debug_frame_offset;   /*   328     8 */
                u64                eh_frame_hdr_addr;    /*   336     8 */
                u64                eh_frame_hdr_offset;  /*   344     8 */
                int                fd;                   /*   352     4 */
                int                status;               /*   356     4 */
                u32                status_seen;          /*   360     4 */
        } data;                                          /*   288    80 */

        /* XXX last struct has 4 bytes of padding */

        struct dso_id              id;                   /*   368    24 */
        /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
        unsigned int               a2l_fails;            /*   392     4 */
        int                        comp;                 /*   396     4 */
        refcount_t                 refcnt;               /*   400     4 */
        enum dso_load_errno        load_errno;           /*   404     4 */
        u16                        long_name_len;        /*   408     2 */
        u16                        short_name_len;       /*   410     2 */
        enum dso_binary_type       symtab_type:8;        /*   412: 0  4 */
        enum dso_binary_type       binary_type:8;        /*   412: 8  4 */
        enum dso_space_type        kernel:2;             /*   412:16  4 */
        enum dso_swap_type         needs_swap:2;         /*   412:18  4 */

        /* Bitfield combined with next fields */

        _Bool                      is_kmod:1;            /*   414: 4  1 */
        u8                         adjust_symbols:1;     /*   414: 5  1 */
        u8                         has_build_id:1;       /*   414: 6  1 */
        u8                         header_build_id:1;    /*   414: 7  1 */
        u8                         has_srcline:1;        /*   415: 0  1 */
        u8                         hit:1;                /*   415: 1  1 */
        u8                         annotate_warned:1;    /*   415: 2  1 */
        u8                         auxtrace_warned:1;    /*   415: 3  1 */
        u8                         short_name_allocated:1; /*   415: 4  1 */
        u8                         long_name_allocated:1; /*   415: 5  1 */
        u8                         is_64_bit:1;          /*   415: 6  1 */

        /* XXX 1 bit hole, try to pack */

        _Bool                      sorted_by_name;       /*   416     1 */
        _Bool                      loaded;               /*   417     1 */
        u8                         rel;                  /*   418     1 */
        char                       name[];               /*   419     0 */

        /* size: 424, cachelines: 7, members: 48 */
        /* sum members: 415 */
        /* sum bitfield members: 31 bits, bit holes: 1, sum bit holes: 1 bits */
        /* padding: 5 */
        /* paddings: 1, sum paddings: 4 */
        /* forced alignments: 1 */
        /* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
```
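
A toy illustration of the two techniques (standalone example, not perf
code; enum bitfields are a gcc/clang extension, as already used by the
patch itself):
```
enum kind { KIND_A, KIND_B, KIND_C };

struct before {
	char  flag;	/* 1 byte + 7 bytes of padding before ptr */
	void *ptr;	/* 8-byte aligned */
	enum kind k;	/* 4 bytes + 4 bytes of tail padding */
};			/* sizeof == 24 on LP64 */

struct after {
	void *ptr;	/* largest, most-aligned members first */
	enum kind k:8;	/* enum squeezed into a single byte */
	char  flag;	/* packs into the former padding */
};			/* sizeof == 16 on LP64 */
```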

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dso.h | 84 +++++++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 3759de8c2267..8bdc17d78b02 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -158,66 +158,66 @@ struct dso {
 		u64		addr;
 		struct symbol	*symbol;
 	} last_find_result;
-	void		 *a2l;
-	char		 *symsrc_filename;
-	unsigned int	 a2l_fails;
-	enum dso_space_type	kernel;
-	bool			is_kmod;
-	enum dso_swap_type	needs_swap;
-	enum dso_binary_type	symtab_type;
-	enum dso_binary_type	binary_type;
-	enum dso_load_errno	load_errno;
-	u8		 adjust_symbols:1;
-	u8		 has_build_id:1;
-	u8		 header_build_id:1;
-	u8		 has_srcline:1;
-	u8		 hit:1;
-	u8		 annotate_warned:1;
-	u8		 auxtrace_warned:1;
-	u8		 short_name_allocated:1;
-	u8		 long_name_allocated:1;
-	u8		 is_64_bit:1;
-	bool		 sorted_by_name;
-	bool		 loaded;
-	u8		 rel;
 	struct build_id	 bid;
 	u64		 text_offset;
 	u64		 text_end;
 	const char	 *short_name;
 	const char	 *long_name;
-	u16		 long_name_len;
-	u16		 short_name_len;
+	void		 *a2l;
+	char		 *symsrc_filename;
+#if defined(__powerpc__)
 	void		*dwfl;			/* DWARF debug info */
+#endif
+	struct nsinfo	*nsinfo;
 	struct auxtrace_cache *auxtrace_cache;
-	int		 comp;
-
+	union { /* Tool specific area */
+		void	 *priv;
+		u64	 db_id;
+	};
+	/* bpf prog information */
+	struct {
+		struct perf_env	*env;
+		u32		id;
+		u32		sub_id;
+	} bpf_prog;
 	/* dso data file */
 	struct {
 		struct rb_root	 cache;
-		int		 fd;
-		int		 status;
-		u32		 status_seen;
-		u64		 file_size;
 		struct list_head open_entry;
+		u64		 file_size;
 		u64		 elf_base_addr;
 		u64		 debug_frame_offset;
 		u64		 eh_frame_hdr_addr;
 		u64		 eh_frame_hdr_offset;
+		int		 fd;
+		int		 status;
+		u32		 status_seen;
 	} data;
-	/* bpf prog information */
-	struct {
-		u32		id;
-		u32		sub_id;
-		struct perf_env	*env;
-	} bpf_prog;
-
-	union { /* Tool specific area */
-		void	 *priv;
-		u64	 db_id;
-	};
-	struct nsinfo	*nsinfo;
 	struct dso_id	 id;
+	unsigned int	 a2l_fails;
+	int		 comp;
 	refcount_t	 refcnt;
+	enum dso_load_errno	load_errno;
+	u16		 long_name_len;
+	u16		 short_name_len;
+	enum dso_binary_type	symtab_type:8;
+	enum dso_binary_type	binary_type:8;
+	enum dso_space_type	kernel:2;
+	enum dso_swap_type	needs_swap:2;
+	bool			is_kmod:1;
+	u8		 adjust_symbols:1;
+	u8		 has_build_id:1;
+	u8		 header_build_id:1;
+	u8		 has_srcline:1;
+	u8		 hit:1;
+	u8		 annotate_warned:1;
+	u8		 auxtrace_warned:1;
+	u8		 short_name_allocated:1;
+	u8		 long_name_allocated:1;
+	u8		 is_64_bit:1;
+	bool		 sorted_by_name;
+	bool		 loaded;
+	u8		 rel;
 	char		 name[];
 };
 
-- 
2.42.0.869.gea05f2083d-goog

* [PATCH v4 35/53] perf report: Sort child tasks by tid
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (33 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 34/53] perf dso: Reorder variables to save space in struct dso Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 36/53] perf trace: Ignore thread hashing in summary Ian Rogers
                   ` (17 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Commit 91e467bc568f ("perf machine: Use hashtable for machine
threads") made the iteration of thread tids unordered. The perf report
--tasks output now shows child threads in an order determined by the
hashing. For example, in this snippet tid 3 appears after tid 256 even
though they have the same ppid 2:

```
$ perf report --tasks
%      pid      tid     ppid  comm
         0        0       -1 |swapper
         2        2        0 | kthreadd
       256      256        2 |  kworker/12:1H-k
    693761   693761        2 |  kworker/10:1-mm
   1301762  1301762        2 |  kworker/1:1-mm_
   1302530  1302530        2 |  kworker/u32:0-k
         3        3        2 |  rcu_gp
...
```

The output is easier to read if threads appear in numerically
increasing order. To allow for this, read all threads into a list,
then sort with a comparator that orders by the child tasks of the
first common parent; a self-contained sketch of the comparator
follows the example output below. List creation and deletion are
added as utilities on machine. Indentation is derived by counting
the number of parents a child has.

With this change the output for the same data file is now like:
```
$ perf report --tasks
%      pid      tid     ppid  comm
         0        0       -1 |swapper
         1        1        0 | systemd
       823      823        1 |  systemd-journal
       853      853        1 |  systemd-udevd
      3230     3230        1 |  systemd-timesyn
      3236     3236        1 |  auditd
      3239     3239     3236 |   audisp-syslog
      3321     3321        1 |  accounts-daemon
...
```
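
At its core the new task_list_cmp() walks the two ancestor chains
until they meet. The same idea in a minimal, self-contained form
(struct node and depth() are stand-ins for struct thread and
thread_level(), not perf APIs, and both nodes are assumed to lie in
the same tree):

```c
/* Order two tree nodes: ancestors before descendants, siblings by id. */
struct node {
	int id;
	struct node *parent;	/* NULL for the root */
};

static int depth(const struct node *n)
{
	int d = 0;

	while (n->parent) {
		n = n->parent;
		d++;
	}
	return d;
}

static int node_cmp(const struct node *a, const struct node *b)
{
	int da = depth(a), db = depth(b);

	if (a->id == b->id)
		return 0;
	/* Equalize depths, remembering them to break ancestor ties. */
	for (int i = da; i > db; i--)
		a = a->parent;
	for (int i = db; i > da; i--)
		b = b->parent;
	/* Climb in lock-step to the first common parent. */
	while (a->parent != b->parent) {
		a = a->parent;
		b = b->parent;
	}
	if (a->id == b->id)	/* one was an ancestor of the other */
		return da < db ? -1 : 1;
	return a->id < b->id ? -1 : 1;	/* children of the common parent */
}
```

The real comparator additionally juggles thread reference counts and
reports threads whose parent is missing, which is why it is
considerably longer.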

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-report.c | 203 ++++++++++++++++++++----------------
 tools/perf/util/machine.c   |  30 ++++++
 tools/perf/util/machine.h   |  10 ++
 3 files changed, 155 insertions(+), 88 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a5d7bc5b843f..f5b95d45f6da 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -59,6 +59,7 @@
 #include <linux/ctype.h>
 #include <signal.h>
 #include <linux/bitmap.h>
+#include <linux/list_sort.h>
 #include <linux/string.h>
 #include <linux/stringify.h>
 #include <linux/time64.h>
@@ -830,35 +831,6 @@ static void tasks_setup(struct report *rep)
 	rep->tool.no_warn = true;
 }
 
-struct task {
-	struct thread		*thread;
-	struct list_head	 list;
-	struct list_head	 children;
-};
-
-static struct task *tasks_list(struct task *task, struct machine *machine)
-{
-	struct thread *parent_thread, *thread = task->thread;
-	struct task   *parent_task;
-
-	/* Already listed. */
-	if (!list_empty(&task->list))
-		return NULL;
-
-	/* Last one in the chain. */
-	if (thread__ppid(thread) == -1)
-		return task;
-
-	parent_thread = machine__find_thread(machine, -1, thread__ppid(thread));
-	if (!parent_thread)
-		return ERR_PTR(-ENOENT);
-
-	parent_task = thread__priv(parent_thread);
-	thread__put(parent_thread);
-	list_add_tail(&task->list, &parent_task->children);
-	return tasks_list(parent_task, machine);
-}
-
 struct maps__fprintf_task_args {
 	int indent;
 	FILE *fp;
@@ -902,89 +874,144 @@ static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp)
 	return args.printed;
 }
 
-static void task__print_level(struct task *task, FILE *fp, int level)
+static int thread_level(struct machine *machine, const struct thread *thread)
 {
-	struct thread *thread = task->thread;
-	struct task *child;
-	int comm_indent = fprintf(fp, "  %8d %8d %8d |%*s",
-				  thread__pid(thread), thread__tid(thread),
-				  thread__ppid(thread), level, "");
+	struct thread *parent_thread;
+	int res;
 
-	fprintf(fp, "%s\n", thread__comm_str(thread));
+	if (thread__tid(thread) <= 0)
+		return 0;
 
-	maps__fprintf_task(thread__maps(thread), comm_indent, fp);
+	if (thread__ppid(thread) <= 0)
+		return 1;
 
-	if (!list_empty(&task->children)) {
-		list_for_each_entry(child, &task->children, list)
-			task__print_level(child, fp, level + 1);
+	parent_thread = machine__find_thread(machine, -1, thread__ppid(thread));
+	if (!parent_thread) {
+		pr_err("Missing parent thread of %d\n", thread__tid(thread));
+		return 0;
 	}
+	res = 1 + thread_level(machine, parent_thread);
+	thread__put(parent_thread);
+	return res;
 }
 
-static int tasks_print(struct report *rep, FILE *fp)
+static void task__print_level(struct machine *machine, struct thread *thread, FILE *fp)
 {
-	struct perf_session *session = rep->session;
-	struct machine      *machine = &session->machines.host;
-	struct task *tasks, *task;
-	unsigned int nr = 0, itask = 0, i;
-	struct rb_node *nd;
-	LIST_HEAD(list);
+	int level = thread_level(machine, thread);
+	int comm_indent = fprintf(fp, "  %8d %8d %8d |%*s",
+				  thread__pid(thread), thread__tid(thread),
+				  thread__ppid(thread), level, "");
 
-	/*
-	 * No locking needed while accessing machine->threads,
-	 * because --tasks is single threaded command.
-	 */
+	fprintf(fp, "%s\n", thread__comm_str(thread));
 
-	/* Count all the threads. */
-	for (i = 0; i < THREADS__TABLE_SIZE; i++)
-		nr += machine->threads[i].nr;
+	maps__fprintf_task(thread__maps(thread), comm_indent, fp);
+}
 
-	tasks = malloc(sizeof(*tasks) * nr);
-	if (!tasks)
-		return -ENOMEM;
+static int task_list_cmp(void *priv, const struct list_head *la, const struct list_head *lb)
+{
+	struct machine *machine = priv;
+	struct thread_list *task_a = list_entry(la, struct thread_list, list);
+	struct thread_list *task_b = list_entry(lb, struct thread_list, list);
+	struct thread *a = task_a->thread;
+	struct thread *b = task_b->thread;
+	int level_a, level_b, res;
+
+	/* Compare a and b to root. */
+	if (thread__tid(a) == thread__tid(b))
+		return 0;
 
-	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
-		struct threads *threads = &machine->threads[i];
+	if (thread__tid(a) == 0)
+		return -1;
 
-		for (nd = rb_first_cached(&threads->entries); nd;
-		     nd = rb_next(nd)) {
-			task = tasks + itask++;
+	if (thread__tid(b) == 0)
+		return 1;
 
-			task->thread = rb_entry(nd, struct thread_rb_node, rb_node)->thread;
-			INIT_LIST_HEAD(&task->children);
-			INIT_LIST_HEAD(&task->list);
-			thread__set_priv(task->thread, task);
-		}
+	/* If parents match sort by tid. */
+	if (thread__ppid(a) == thread__ppid(b)) {
+		return thread__tid(a) < thread__tid(b)
+			? -1
+			: (thread__tid(a) > thread__tid(b) ? 1 : 0);
 	}
 
 	/*
-	 * Iterate every task down to the unprocessed parent
-	 * and link all in task children list. Task with no
-	 * parent is added into 'list'.
+	 * Find a and b such that if they are a child of each other a and b's
+	 * tid's match, otherwise a and b have a common parent and distinct
+	 * tid's to sort by. First make the depths of the threads match.
 	 */
-	for (itask = 0; itask < nr; itask++) {
-		task = tasks + itask;
-
-		if (!list_empty(&task->list))
-			continue;
-
-		task = tasks_list(task, machine);
-		if (IS_ERR(task)) {
-			pr_err("Error: failed to process tasks\n");
-			free(tasks);
-			return PTR_ERR(task);
+	level_a = thread_level(machine, a);
+	level_b = thread_level(machine, b);
+	a = thread__get(a);
+	b = thread__get(b);
+	for (int i = level_a; i > level_b; i--) {
+		struct thread *parent = machine__find_thread(machine, -1, thread__ppid(a));
+
+		thread__put(a);
+		if (!parent) {
+			pr_err("Missing parent thread of %d\n", thread__tid(a));
+			thread__put(b);
+			return -1;
 		}
+		a = parent;
+	}
+	for (int i = level_b; i > level_a; i--) {
+		struct thread *parent = machine__find_thread(machine, -1, thread__ppid(b));
 
-		if (task)
-			list_add_tail(&task->list, &list);
+		thread__put(b);
+		if (!parent) {
+			pr_err("Missing parent thread of %d\n", thread__tid(b));
+			thread__put(a);
+			return 1;
+		}
+		b = parent;
+	}
+	/* Search up to a common parent. */
+	while (thread__ppid(a) != thread__ppid(b)) {
+		struct thread *parent;
+
+		parent = machine__find_thread(machine, -1, thread__ppid(a));
+		thread__put(a);
+		if (!parent)
+			pr_err("Missing parent thread of %d\n", thread__tid(a));
+		a = parent;
+		parent = machine__find_thread(machine, -1, thread__ppid(b));
+		thread__put(b);
+		if (!parent)
+			pr_err("Missing parent thread of %d\n", thread__tid(b));
+		b = parent;
+		if (!a || !b)
+			return !a && !b ? 0 : (!a ? -1 : 1);
+	}
+	if (thread__tid(a) == thread__tid(b)) {
+		/* a is a child of b or vice-versa, deeper levels appear later. */
+		res = level_a < level_b ? -1 : (level_a > level_b ? 1 : 0);
+	} else {
+		/* Sort by tid now the parent is the same. */
+		res = thread__tid(a) < thread__tid(b) ? -1 : 1;
 	}
+	thread__put(a);
+	thread__put(b);
+	return res;
+}
+
+static int tasks_print(struct report *rep, FILE *fp)
+{
+	struct machine *machine = &rep->session->machines.host;
+	LIST_HEAD(tasks);
+	int ret;
 
-	fprintf(fp, "# %8s %8s %8s  %s\n", "pid", "tid", "ppid", "comm");
+	ret = machine__thread_list(machine, &tasks);
+	if (!ret) {
+		struct thread_list *task;
 
-	list_for_each_entry(task, &list, list)
-		task__print_level(task, fp, 0);
+		list_sort(machine, &tasks, task_list_cmp);
 
-	free(tasks);
-	return 0;
+		fprintf(fp, "# %8s %8s %8s  %s\n", "pid", "tid", "ppid", "comm");
+
+		list_for_each_entry(task, &tasks, list)
+			task__print_level(machine, task->thread, fp);
+	}
+	thread_list__delete(&tasks);
+	return ret;
 }
 
 static int __cmd_report(struct report *rep)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f9c77119af22..6d7a505850c8 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -3258,6 +3258,36 @@ int machines__for_each_thread(struct machines *machines,
 	return rc;
 }
 
+
+static int thread_list_cb(struct thread *thread, void *data)
+{
+	struct list_head *list = data;
+	struct thread_list *entry = malloc(sizeof(*entry));
+
+	if (!entry)
+		return -ENOMEM;
+
+	entry->thread = thread__get(thread);
+	list_add_tail(&entry->list, list);
+	return 0;
+}
+
+int machine__thread_list(struct machine *machine, struct list_head *list)
+{
+	return machine__for_each_thread(machine, thread_list_cb, list);
+}
+
+void thread_list__delete(struct list_head *list)
+{
+	struct thread_list *pos, *next;
+
+	list_for_each_entry_safe(pos, next, list, list) {
+		thread__zput(pos->thread);
+		list_del(&pos->list);
+		free(pos);
+	}
+}
+
 pid_t machine__get_current_tid(struct machine *machine, int cpu)
 {
 	if (cpu < 0 || (size_t)cpu >= machine->current_tid_sz)
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 1279acda6a8a..b738ce84817b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -280,6 +280,16 @@ int machines__for_each_thread(struct machines *machines,
 			      int (*fn)(struct thread *thread, void *p),
 			      void *priv);
 
+struct thread_list {
+	struct list_head	 list;
+	struct thread		*thread;
+};
+
+/* Make a list of struct thread_list based on threads in the machine. */
+int machine__thread_list(struct machine *machine, struct list_head *list);
+/* Free up the nodes within the thread_list list. */
+void thread_list__delete(struct list_head *list);
+
 pid_t machine__get_current_tid(struct machine *machine, int cpu);
 int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
 			     pid_t tid);
-- 
2.42.0.869.gea05f2083d-goog

* [PATCH v4 36/53] perf trace: Ignore thread hashing in summary
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (34 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 35/53] perf report: Sort child tasks by tid Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 37/53] perf machine: Move fprintf to for_each loop and a callback Ian Rogers
                   ` (16 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Commit 91e467bc568f ("perf machine: Use hashtable for machine
threads") made the iteration of thread tids unordered. The perf trace
--summary output sorts and prints each hash bucket, rather than all
threads globally. Change this behavior by turning all threads into a
list, sorting the list by number of trace events and then by tid, and
finally printing the list. This also means the rbtree in threads is
no longer accessed outside of machine.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-trace.c  | 41 +++++++++++++++++++++----------------
 tools/perf/util/rb_resort.h |  5 -----
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e541d0e2777a..e9ff78b331fe 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -74,6 +74,7 @@
 #include <linux/err.h>
 #include <linux/filter.h>
 #include <linux/kernel.h>
+#include <linux/list_sort.h>
 #include <linux/random.h>
 #include <linux/stringify.h>
 #include <linux/time64.h>
@@ -4314,34 +4315,38 @@ static unsigned long thread__nr_events(struct thread_trace *ttrace)
 	return ttrace ? ttrace->nr_events : 0;
 }
 
-DEFINE_RESORT_RB(threads,
-		(thread__nr_events(thread__priv(a->thread)) <
-		 thread__nr_events(thread__priv(b->thread))),
-	struct thread *thread;
-)
+static int trace_nr_events_cmp(void *priv __maybe_unused,
+			       const struct list_head *la,
+			       const struct list_head *lb)
 {
-	entry->thread = rb_entry(nd, struct thread_rb_node, rb_node)->thread;
+	struct thread_list *a = list_entry(la, struct thread_list, list);
+	struct thread_list *b = list_entry(lb, struct thread_list, list);
+	unsigned long a_nr_events = thread__nr_events(thread__priv(a->thread));
+	unsigned long b_nr_events = thread__nr_events(thread__priv(b->thread));
+
+	if (a_nr_events != b_nr_events)
+		return a_nr_events < b_nr_events ? -1 : 1;
+
+	/* Identical number of events, place smaller tids first. */
+	return thread__tid(a->thread) < thread__tid(b->thread)
+		? -1
+		: (thread__tid(a->thread) > thread__tid(b->thread) ? 1 : 0);
 }
 
 static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp)
 {
 	size_t printed = trace__fprintf_threads_header(fp);
-	struct rb_node *nd;
-	int i;
-
-	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
-		DECLARE_RESORT_RB_MACHINE_THREADS(threads, trace->host, i);
+	LIST_HEAD(threads);
 
-		if (threads == NULL) {
-			fprintf(fp, "%s", "Error sorting output by nr_events!\n");
-			return 0;
-		}
+	if (machine__thread_list(trace->host, &threads) == 0) {
+		struct thread_list *pos;
 
-		resort_rb__for_each_entry(nd, threads)
-			printed += trace__fprintf_thread(fp, threads_entry->thread, trace);
+		list_sort(NULL, &threads, trace_nr_events_cmp);
 
-		resort_rb__delete(threads);
+		list_for_each_entry(pos, &threads, list)
+			printed += trace__fprintf_thread(fp, pos->thread, trace);
 	}
+	thread_list__delete(&threads);
 	return printed;
 }
 
diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
index 376e86cb4c3c..d927a0d25052 100644
--- a/tools/perf/util/rb_resort.h
+++ b/tools/perf/util/rb_resort.h
@@ -143,9 +143,4 @@ struct __name##_sorted *__name = __name##_sorted__new
 	DECLARE_RESORT_RB(__name)(&__ilist->rblist.entries.rb_root,		\
 				  __ilist->rblist.nr_entries)
 
-/* For 'struct machine->threads' */
-#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, hash_bucket)    \
- DECLARE_RESORT_RB(__name)(&__machine->threads[hash_bucket].entries.rb_root, \
-			   __machine->threads[hash_bucket].nr)
-
 #endif /* _PERF_RESORT_RB_H_ */
-- 
2.42.0.869.gea05f2083d-goog

* [PATCH v4 37/53] perf machine: Move fprintf to for_each loop and a callback
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (35 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 36/53] perf trace: Ignore thread hashing in summary Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 38/53] perf threads: Move threads to its own files Ian Rogers
                   ` (15 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Avoid exposing the threads data structure by switching to the callback
machine__for_each_thread approach. machine__fprintf is only used in
tests and with verbose >3 output, so there is no need to gather the
threads into a list and sort them. Add machine__threads_nr, to be
refactored later.

Note, all existing *_fprintf routines ignore fprintf errors.
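
The callback-plus-context shape adopted here is the classic visitor
pattern; a minimal, self-contained sketch (illustrative names, not
the perf API):

```c
#include <stdio.h>

struct print_args {
	FILE *fp;
	size_t printed;
};

/* Visit one element; returning non-zero stops the walk early. */
static int print_cb(int value, void *data)
{
	struct print_args *args = data;

	args->printed += fprintf(args->fp, "%d\n", value);
	return 0;
}

/* The container walks itself; callers never see its internal layout. */
static int for_each_value(const int *values, int nr,
			  int (*fn)(int value, void *data), void *data)
{
	for (int i = 0; i < nr; i++) {
		int rc = fn(values[i], data);

		if (rc != 0)
			return rc;
	}
	return 0;
}
```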

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c | 43 ++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6d7a505850c8..7e19303d1aa6 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1114,29 +1114,40 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp)
 	return printed;
 }
 
-size_t machine__fprintf(struct machine *machine, FILE *fp)
+struct machine_fprintf_cb_args {
+	FILE *fp;
+	size_t printed;
+};
+
+static int machine_fprintf_cb(struct thread *thread, void *data)
 {
-	struct rb_node *nd;
-	size_t ret;
-	int i;
+	struct machine_fprintf_cb_args *args = data;
 
-	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
-		struct threads *threads = &machine->threads[i];
+	/* TODO: handle fprintf errors. */
+	args->printed += thread__fprintf(thread, args->fp);
+	return 0;
+}
 
-		down_read(&threads->lock);
+static size_t machine__threads_nr(const struct machine *machine)
+{
+	size_t nr = 0;
 
-		ret = fprintf(fp, "Threads: %u\n", threads->nr);
+	for (int i = 0; i < THREADS__TABLE_SIZE; i++)
+		nr += machine->threads[i].nr;
 
-		for (nd = rb_first_cached(&threads->entries); nd;
-		     nd = rb_next(nd)) {
-			struct thread *pos = rb_entry(nd, struct thread_rb_node, rb_node)->thread;
+	return nr;
+}
 
-			ret += thread__fprintf(pos, fp);
-		}
+size_t machine__fprintf(struct machine *machine, FILE *fp)
+{
+	struct machine_fprintf_cb_args args = {
+		.fp = fp,
+		.printed = 0,
+	};
+	size_t ret = fprintf(fp, "Threads: %zu\n", machine__threads_nr(machine));
 
-		up_read(&threads->lock);
-	}
-	return ret;
+	machine__for_each_thread(machine, machine_fprintf_cb, &args);
+	return ret + args.printed;
 }
 
 static struct dso *machine__get_kernel(struct machine *machine)
-- 
2.42.0.869.gea05f2083d-goog

* [PATCH v4 38/53] perf threads: Move threads to its own files
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (36 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 37/53] perf machine: Move fprintf to for_each loop and a callback Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 39/53] perf threads: Switch from rbtree to hashmap Ian Rogers
                   ` (14 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Move threads out of machine and move thread_rb_node into the C
file. This hides the implementation of threads from the rest of the
code, allowing it to be refactored.

Locking discipline is tightened up in this change.
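
Based only on the calls this patch introduces, the expected lifecycle
of the new interface looks roughly like this (a sketch, with error
handling trimmed):

```c
#include <stdbool.h>
#include <sys/types.h>
#include "threads.h"
#include "thread.h"

static void threads_lifecycle_sketch(struct threads *threads,
				     pid_t pid, pid_t tid)
{
	bool created;
	struct thread *th;

	threads__init(threads);
	th = threads__findnew(threads, pid, tid, &created);
	if (th) {
		/* find/findnew return a reference; drop it when done. */
		thread__put(th);
	}
	/* Drops every thread and tears down the per-bucket locks. */
	threads__exit(threads);
}
```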

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/Build                 |   1 +
 tools/perf/util/bpf_lock_contention.c |   8 +-
 tools/perf/util/machine.c             | 287 ++++----------------------
 tools/perf/util/machine.h             |  20 +-
 tools/perf/util/thread.c              |   2 +-
 tools/perf/util/thread.h              |   6 -
 tools/perf/util/threads.c             | 244 ++++++++++++++++++++++
 tools/perf/util/threads.h             |  35 ++++
 8 files changed, 325 insertions(+), 278 deletions(-)
 create mode 100644 tools/perf/util/threads.c
 create mode 100644 tools/perf/util/threads.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 96058f949ec9..2cb39c3cf46d 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -71,6 +71,7 @@ perf-y += ordered-events.o
 perf-y += namespaces.o
 perf-y += comm.o
 perf-y += thread.o
+perf-y += threads.o
 perf-y += thread_map.o
 perf-y += parse-events-flex.o
 perf-y += parse-events-bison.o
diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index d9720a910330..52bbd6db2831 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -209,7 +209,7 @@ static const char *lock_contention_get_name(struct lock_contention *con,
 
 		/* do not update idle comm which contains CPU number */
 		if (pid) {
-			struct thread *t = __machine__findnew_thread(machine, /*pid=*/-1, pid);
+			struct thread *t = machine__findnew_thread(machine, /*pid=*/-1, pid);
 
 			if (t == NULL)
 				return name;
@@ -301,9 +301,9 @@ int lock_contention_read(struct lock_contention *con)
 		return -1;
 
 	if (con->aggr_mode == LOCK_AGGR_TASK) {
-		struct thread *idle = __machine__findnew_thread(machine,
-								/*pid=*/0,
-								/*tid=*/0);
+		struct thread *idle = machine__findnew_thread(machine,
+							      /*pid=*/0,
+							      /*tid=*/0);
 		thread__set_comm(idle, "swapper", /*timestamp=*/0);
 	}
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7e19303d1aa6..36231b5a86aa 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -44,9 +44,6 @@
 #include <linux/string.h>
 #include <linux/zalloc.h>
 
-static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
-				     struct thread *th, bool lock);
-
 static struct dso *machine__kernel_dso(struct machine *machine)
 {
 	return map__dso(machine->vmlinux_map);
@@ -59,35 +56,6 @@ static void dsos__init(struct dsos *dsos)
 	init_rwsem(&dsos->lock);
 }
 
-static void machine__threads_init(struct machine *machine)
-{
-	int i;
-
-	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
-		struct threads *threads = &machine->threads[i];
-		threads->entries = RB_ROOT_CACHED;
-		init_rwsem(&threads->lock);
-		threads->nr = 0;
-		threads->last_match = NULL;
-	}
-}
-
-static int thread_rb_node__cmp_tid(const void *key, const struct rb_node *nd)
-{
-	int to_find = (int) *((pid_t *)key);
-
-	return to_find - (int)thread__tid(rb_entry(nd, struct thread_rb_node, rb_node)->thread);
-}
-
-static struct thread_rb_node *thread_rb_node__find(const struct thread *th,
-						   struct rb_root *tree)
-{
-	pid_t to_find = thread__tid(th);
-	struct rb_node *nd = rb_find(&to_find, tree, thread_rb_node__cmp_tid);
-
-	return rb_entry(nd, struct thread_rb_node, rb_node);
-}
-
 static int machine__set_mmap_name(struct machine *machine)
 {
 	if (machine__is_host(machine))
@@ -121,7 +89,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	RB_CLEAR_NODE(&machine->rb_node);
 	dsos__init(&machine->dsos);
 
-	machine__threads_init(machine);
+	threads__init(&machine->threads);
 
 	machine->vdso_info = NULL;
 	machine->env = NULL;
@@ -222,27 +190,11 @@ static void dsos__exit(struct dsos *dsos)
 
 void machine__delete_threads(struct machine *machine)
 {
-	struct rb_node *nd;
-	int i;
-
-	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
-		struct threads *threads = &machine->threads[i];
-		down_write(&threads->lock);
-		nd = rb_first_cached(&threads->entries);
-		while (nd) {
-			struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
-
-			nd = rb_next(nd);
-			__machine__remove_thread(machine, trb, trb->thread, false);
-		}
-		up_write(&threads->lock);
-	}
+	threads__remove_all_threads(&machine->threads);
 }
 
 void machine__exit(struct machine *machine)
 {
-	int i;
-
 	if (machine == NULL)
 		return;
 
@@ -255,12 +207,7 @@ void machine__exit(struct machine *machine)
 	zfree(&machine->current_tid);
 	zfree(&machine->kallsyms_filename);
 
-	machine__delete_threads(machine);
-	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
-		struct threads *threads = &machine->threads[i];
-
-		exit_rwsem(&threads->lock);
-	}
+	threads__exit(&machine->threads);
 }
 
 void machine__delete(struct machine *machine)
@@ -527,7 +474,7 @@ static void machine__update_thread_pid(struct machine *machine,
 	if (thread__pid(th) == thread__tid(th))
 		return;
 
-	leader = __machine__findnew_thread(machine, thread__pid(th), thread__pid(th));
+	leader = machine__findnew_thread(machine, thread__pid(th), thread__pid(th));
 	if (!leader)
 		goto out_err;
 
@@ -561,160 +508,55 @@ static void machine__update_thread_pid(struct machine *machine,
 	goto out_put;
 }
 
-/*
- * Front-end cache - TID lookups come in blocks,
- * so most of the time we dont have to look up
- * the full rbtree:
- */
-static struct thread*
-__threads__get_last_match(struct threads *threads, struct machine *machine,
-			  int pid, int tid)
-{
-	struct thread *th;
-
-	th = threads->last_match;
-	if (th != NULL) {
-		if (thread__tid(th) == tid) {
-			machine__update_thread_pid(machine, th, pid);
-			return thread__get(th);
-		}
-		thread__put(threads->last_match);
-		threads->last_match = NULL;
-	}
-
-	return NULL;
-}
-
-static struct thread*
-threads__get_last_match(struct threads *threads, struct machine *machine,
-			int pid, int tid)
-{
-	struct thread *th = NULL;
-
-	if (perf_singlethreaded)
-		th = __threads__get_last_match(threads, machine, pid, tid);
-
-	return th;
-}
-
-static void
-__threads__set_last_match(struct threads *threads, struct thread *th)
-{
-	thread__put(threads->last_match);
-	threads->last_match = thread__get(th);
-}
-
-static void
-threads__set_last_match(struct threads *threads, struct thread *th)
-{
-	if (perf_singlethreaded)
-		__threads__set_last_match(threads, th);
-}
-
 /*
  * Caller must eventually drop thread->refcnt returned with a successful
  * lookup/new thread inserted.
  */
-static struct thread *____machine__findnew_thread(struct machine *machine,
-						  struct threads *threads,
-						  pid_t pid, pid_t tid,
-						  bool create)
+static struct thread *__machine__findnew_thread(struct machine *machine,
+						pid_t pid,
+						pid_t tid,
+						bool create)
 {
-	struct rb_node **p = &threads->entries.rb_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct thread *th;
-	struct thread_rb_node *nd;
-	bool leftmost = true;
+	struct thread *th = threads__find(&machine->threads, tid);
+	bool created;
 
-	th = threads__get_last_match(threads, machine, pid, tid);
-	if (th)
+	if (th) {
+		machine__update_thread_pid(machine, th, pid);
 		return th;
-
-	while (*p != NULL) {
-		parent = *p;
-		th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
-
-		if (thread__tid(th) == tid) {
-			threads__set_last_match(threads, th);
-			machine__update_thread_pid(machine, th, pid);
-			return thread__get(th);
-		}
-
-		if (tid < thread__tid(th))
-			p = &(*p)->rb_left;
-		else {
-			p = &(*p)->rb_right;
-			leftmost = false;
-		}
 	}
-
 	if (!create)
 		return NULL;
 
-	th = thread__new(pid, tid);
-	if (th == NULL)
-		return NULL;
-
-	nd = malloc(sizeof(*nd));
-	if (nd == NULL) {
-		thread__put(th);
-		return NULL;
-	}
-	nd->thread = th;
-
-	rb_link_node(&nd->rb_node, parent, p);
-	rb_insert_color_cached(&nd->rb_node, &threads->entries, leftmost);
-	/*
-	 * We have to initialize maps separately after rb tree is updated.
-	 *
-	 * The reason is that we call machine__findnew_thread within
-	 * thread__init_maps to find the thread leader and that would screwed
-	 * the rb tree.
-	 */
-	if (thread__init_maps(th, machine)) {
-		pr_err("Thread init failed thread %d\n", pid);
-		rb_erase_cached(&nd->rb_node, &threads->entries);
-		RB_CLEAR_NODE(&nd->rb_node);
-		free(nd);
-		thread__put(th);
-		return NULL;
-	}
-	/*
-	 * It is now in the rbtree, get a ref
-	 */
-	threads__set_last_match(threads, th);
-	++threads->nr;
-
-	return thread__get(th);
-}
+	th = threads__findnew(&machine->threads, pid, tid, &created);
+	if (created) {
+		/*
+		 * We have to initialize maps separately after rb tree is
+		 * updated.
+		 *
+		 * The reason is that we call machine__findnew_thread within
+		 * thread__init_maps to find the thread leader and that would
+		 * screw up the rb tree.
+		 */
+		if (thread__init_maps(th, machine)) {
+			pr_err("Thread init failed thread %d\n", pid);
+			threads__remove(&machine->threads, th);
+			thread__put(th);
+			return NULL;
+		}
+	} else
+		machine__update_thread_pid(machine, th, pid);
 
-struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid)
-{
-	return ____machine__findnew_thread(machine, machine__threads(machine, tid), pid, tid, true);
+	return th;
 }
 
-struct thread *machine__findnew_thread(struct machine *machine, pid_t pid,
-				       pid_t tid)
+struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid)
 {
-	struct threads *threads = machine__threads(machine, tid);
-	struct thread *th;
-
-	down_write(&threads->lock);
-	th = __machine__findnew_thread(machine, pid, tid);
-	up_write(&threads->lock);
-	return th;
+	return __machine__findnew_thread(machine, pid, tid, /*create=*/true);
 }
 
-struct thread *machine__find_thread(struct machine *machine, pid_t pid,
-				    pid_t tid)
+struct thread *machine__find_thread(struct machine *machine, pid_t pid, pid_t tid)
 {
-	struct threads *threads = machine__threads(machine, tid);
-	struct thread *th;
-
-	down_read(&threads->lock);
-	th =  ____machine__findnew_thread(machine, threads, pid, tid, false);
-	up_read(&threads->lock);
-	return th;
+	return __machine__findnew_thread(machine, pid, tid, /*create=*/false);
 }
 
 /*
@@ -1128,23 +970,13 @@ static int machine_fprintf_cb(struct thread *thread, void *data)
 	return 0;
 }
 
-static size_t machine__threads_nr(const struct machine *machine)
-{
-	size_t nr = 0;
-
-	for (int i = 0; i < THREADS__TABLE_SIZE; i++)
-		nr += machine->threads[i].nr;
-
-	return nr;
-}
-
 size_t machine__fprintf(struct machine *machine, FILE *fp)
 {
 	struct machine_fprintf_cb_args args = {
 		.fp = fp,
 		.printed = 0,
 	};
-	size_t ret = fprintf(fp, "Threads: %zu\n", machine__threads_nr(machine));
+	size_t ret = fprintf(fp, "Threads: %zu\n", threads__nr(&machine->threads));
 
 	machine__for_each_thread(machine, machine_fprintf_cb, &args);
 	return ret + args.printed;
@@ -2066,36 +1898,9 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 	return 0;
 }
 
-static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
-				     struct thread *th, bool lock)
-{
-	struct threads *threads = machine__threads(machine, thread__tid(th));
-
-	if (!nd)
-		nd = thread_rb_node__find(th, &threads->entries.rb_root);
-
-	if (threads->last_match && RC_CHK_EQUAL(threads->last_match, th))
-		threads__set_last_match(threads, NULL);
-
-	if (lock)
-		down_write(&threads->lock);
-
-	BUG_ON(refcount_read(thread__refcnt(th)) == 0);
-
-	thread__put(nd->thread);
-	rb_erase_cached(&nd->rb_node, &threads->entries);
-	RB_CLEAR_NODE(&nd->rb_node);
-	--threads->nr;
-
-	free(nd);
-
-	if (lock)
-		up_write(&threads->lock);
-}
-
 void machine__remove_thread(struct machine *machine, struct thread *th)
 {
-	return __machine__remove_thread(machine, NULL, th, true);
+	return threads__remove(&machine->threads, th);
 }
 
 int machine__process_fork_event(struct machine *machine, union perf_event *event,
@@ -3229,23 +3034,7 @@ int machine__for_each_thread(struct machine *machine,
 			     int (*fn)(struct thread *thread, void *p),
 			     void *priv)
 {
-	struct threads *threads;
-	struct rb_node *nd;
-	int rc = 0;
-	int i;
-
-	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
-		threads = &machine->threads[i];
-		for (nd = rb_first_cached(&threads->entries); nd;
-		     nd = rb_next(nd)) {
-			struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
-
-			rc = fn(trb->thread, priv);
-			if (rc != 0)
-				return rc;
-		}
-	}
-	return rc;
+	return threads__for_each_thread(&machine->threads, fn, priv);
 }
 
 int machines__for_each_thread(struct machines *machines,
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index b738ce84817b..e28c787616fe 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -7,6 +7,7 @@
 #include "maps.h"
 #include "dsos.h"
 #include "rwsem.h"
+#include "threads.h"
 
 struct addr_location;
 struct branch_stack;
@@ -28,16 +29,6 @@ extern const char *ref_reloc_sym_names[];
 
 struct vdso_info;
 
-#define THREADS__TABLE_BITS	8
-#define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)
-
-struct threads {
-	struct rb_root_cached  entries;
-	struct rw_semaphore    lock;
-	unsigned int	       nr;
-	struct thread	       *last_match;
-};
-
 struct machine {
 	struct rb_node	  rb_node;
 	pid_t		  pid;
@@ -48,7 +39,7 @@ struct machine {
 	char		  *root_dir;
 	char		  *mmap_name;
 	char		  *kallsyms_filename;
-	struct threads    threads[THREADS__TABLE_SIZE];
+	struct threads    threads;
 	struct vdso_info  *vdso_info;
 	struct perf_env   *env;
 	struct dsos	  dsos;
@@ -69,12 +60,6 @@ struct machine {
 	bool		  trampolines_mapped;
 };
 
-static inline struct threads *machine__threads(struct machine *machine, pid_t tid)
-{
-	/* Cast it to handle tid == -1 */
-	return &machine->threads[(unsigned int)tid % THREADS__TABLE_SIZE];
-}
-
 /*
  * The main kernel (vmlinux) map
  */
@@ -220,7 +205,6 @@ bool machine__is(struct machine *machine, const char *arch);
 bool machine__normalized_is(struct machine *machine, const char *arch);
 int machine__nr_cpus_avail(struct machine *machine);
 
-struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
 
 struct dso *machine__findnew_dso_id(struct machine *machine, const char *filename, struct dso_id *id);
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index c59ab4d79163..1aa8962dcf52 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -26,7 +26,7 @@ int thread__init_maps(struct thread *thread, struct machine *machine)
 	if (pid == thread__tid(thread) || pid == -1) {
 		thread__set_maps(thread, maps__new(machine));
 	} else {
-		struct thread *leader = __machine__findnew_thread(machine, pid, pid);
+		struct thread *leader = machine__findnew_thread(machine, pid, pid);
 
 		if (leader) {
 			thread__set_maps(thread, maps__get(thread__maps(leader)));
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 0df775b5c110..4b8f3e9e513b 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -3,7 +3,6 @@
 #define __PERF_THREAD_H
 
 #include <linux/refcount.h>
-#include <linux/rbtree.h>
 #include <linux/list.h>
 #include <stdio.h>
 #include <unistd.h>
@@ -30,11 +29,6 @@ struct lbr_stitch {
 	struct callchain_cursor_node	*prev_lbr_cursor;
 };
 
-struct thread_rb_node {
-	struct rb_node rb_node;
-	struct thread *thread;
-};
-
 DECLARE_RC_STRUCT(thread) {
 	/** @maps: mmaps associated with this thread. */
 	struct maps		*maps;
diff --git a/tools/perf/util/threads.c b/tools/perf/util/threads.c
new file mode 100644
index 000000000000..d984ec939c7b
--- /dev/null
+++ b/tools/perf/util/threads.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "threads.h"
+#include "machine.h"
+#include "thread.h"
+
+struct thread_rb_node {
+	struct rb_node rb_node;
+	struct thread *thread;
+};
+
+static struct threads_table_entry *threads__table(struct threads *threads, pid_t tid)
+{
+	/* Cast it to handle tid == -1 */
+	return &threads->table[(unsigned int)tid % THREADS__TABLE_SIZE];
+}
+
+void threads__init(struct threads *threads)
+{
+	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+		struct threads_table_entry *table = &threads->table[i];
+
+		table->entries = RB_ROOT_CACHED;
+		init_rwsem(&table->lock);
+		table->nr = 0;
+		table->last_match = NULL;
+	}
+}
+
+void threads__exit(struct threads *threads)
+{
+	threads__remove_all_threads(threads);
+	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+		struct threads_table_entry *table = &threads->table[i];
+
+		exit_rwsem(&table->lock);
+	}
+}
+
+size_t threads__nr(struct threads *threads)
+{
+	size_t nr = 0;
+
+	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+		struct threads_table_entry *table = &threads->table[i];
+
+		down_read(&table->lock);
+		nr += table->nr;
+		up_read(&table->lock);
+	}
+	return nr;
+}
+
+/*
+ * Front-end cache - TID lookups come in blocks,
+ * so most of the time we don't have to look up
+ * the full rbtree:
+ */
+static struct thread *__threads_table_entry__get_last_match(struct threads_table_entry *table,
+							    pid_t tid)
+{
+	struct thread *th, *res = NULL;
+
+	th = table->last_match;
+	if (th != NULL) {
+		if (thread__tid(th) == tid)
+			res = thread__get(th);
+	}
+	return res;
+}
+
+static void __threads_table_entry__set_last_match(struct threads_table_entry *table,
+						  struct thread *th)
+{
+	thread__put(table->last_match);
+	table->last_match = thread__get(th);
+}
+
+static void threads_table_entry__set_last_match(struct threads_table_entry *table,
+						struct thread *th)
+{
+	down_write(&table->lock);
+	__threads_table_entry__set_last_match(table, th);
+	up_write(&table->lock);
+}
+
+struct thread *threads__find(struct threads *threads, pid_t tid)
+{
+	struct threads_table_entry *table  = threads__table(threads, tid);
+	struct rb_node **p;
+	struct thread *res = NULL;
+
+	down_read(&table->lock);
+	res = __threads_table_entry__get_last_match(table, tid);
+	if (res)
+		return res;
+
+	p = &table->entries.rb_root.rb_node;
+	while (*p != NULL) {
+		struct rb_node *parent = *p;
+		struct thread *th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
+
+		if (thread__tid(th) == tid) {
+			res = thread__get(th);
+			break;
+		}
+
+		if (tid < thread__tid(th))
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	up_read(&table->lock);
+	if (res)
+		threads_table_entry__set_last_match(table, res);
+	return res;
+}
+
+struct thread *threads__findnew(struct threads *threads, pid_t pid, pid_t tid, bool *created)
+{
+	struct threads_table_entry *table  = threads__table(threads, tid);
+	struct rb_node **p;
+	struct rb_node *parent = NULL;
+	struct thread *res = NULL;
+	struct thread_rb_node *nd;
+	bool leftmost = true;
+
+	*created = false;
+	down_write(&table->lock);
+	p = &table->entries.rb_root.rb_node;
+	while (*p != NULL) {
+		struct thread *th;
+
+		parent = *p;
+		th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
+
+		if (thread__tid(th) == tid) {
+			__threads_table_entry__set_last_match(table, th);
+			res = thread__get(th);
+			goto out_unlock;
+		}
+
+		if (tid < thread__tid(th))
+			p = &(*p)->rb_left;
+		else {
+			leftmost = false;
+			p = &(*p)->rb_right;
+		}
+	}
+	nd = malloc(sizeof(*nd));
+	if (nd == NULL)
+		goto out_unlock;
+	res = thread__new(pid, tid);
+	if (!res)
+		free(nd);
+	else {
+		*created = true;
+		nd->thread = thread__get(res);
+		rb_link_node(&nd->rb_node, parent, p);
+		rb_insert_color_cached(&nd->rb_node, &table->entries, leftmost);
+		++table->nr;
+		__threads_table_entry__set_last_match(table, res);
+	}
+out_unlock:
+	up_write(&table->lock);
+	return res;
+}
+
+void threads__remove_all_threads(struct threads *threads)
+{
+	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+		struct threads_table_entry *table = &threads->table[i];
+		struct rb_node *nd;
+
+		down_write(&table->lock);
+		__threads_table_entry__set_last_match(table, NULL);
+		nd = rb_first_cached(&table->entries);
+		while (nd) {
+			struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
+
+			nd = rb_next(nd);
+			thread__put(trb->thread);
+			rb_erase_cached(&trb->rb_node, &table->entries);
+			RB_CLEAR_NODE(&trb->rb_node);
+			--table->nr;
+
+			free(trb);
+		}
+		assert(table->nr == 0);
+		up_write(&table->lock);
+	}
+}
+
+void threads__remove(struct threads *threads, struct thread *thread)
+{
+	struct rb_node **p;
+	struct threads_table_entry *table  = threads__table(threads, thread__tid(thread));
+	pid_t tid = thread__tid(thread);
+
+	down_write(&table->lock);
+	if (table->last_match && RC_CHK_EQUAL(table->last_match, thread))
+		__threads_table_entry__set_last_match(table, NULL);
+
+	p = &table->entries.rb_root.rb_node;
+	while (*p != NULL) {
+		struct rb_node *parent = *p;
+		struct thread_rb_node *nd = rb_entry(parent, struct thread_rb_node, rb_node);
+		struct thread *th = nd->thread;
+
+		if (RC_CHK_EQUAL(th, thread)) {
+			thread__put(nd->thread);
+			rb_erase_cached(&nd->rb_node, &table->entries);
+			RB_CLEAR_NODE(&nd->rb_node);
+			--table->nr;
+			free(nd);
+			break;
+		}
+
+		if (tid < thread__tid(th))
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	up_write(&table->lock);
+}
+
+int threads__for_each_thread(struct threads *threads,
+			     int (*fn)(struct thread *thread, void *data),
+			     void *data)
+{
+	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+		struct threads_table_entry *table = &threads->table[i];
+		struct rb_node *nd;
+
+		for (nd = rb_first_cached(&table->entries); nd; nd = rb_next(nd)) {
+			struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
+			int rc = fn(trb->thread, data);
+
+			if (rc != 0)
+				return rc;
+		}
+	}
+	return 0;
+
+}
diff --git a/tools/perf/util/threads.h b/tools/perf/util/threads.h
new file mode 100644
index 000000000000..ed67de627578
--- /dev/null
+++ b/tools/perf/util/threads.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_THREADS_H
+#define __PERF_THREADS_H
+
+#include <linux/rbtree.h>
+#include "rwsem.h"
+
+struct thread;
+
+#define THREADS__TABLE_BITS	8
+#define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)
+
+struct threads_table_entry {
+	struct rb_root_cached  entries;
+	struct rw_semaphore    lock;
+	unsigned int	       nr;
+	struct thread	       *last_match;
+};
+
+struct threads {
+	struct threads_table_entry table[THREADS__TABLE_SIZE];
+};
+
+void threads__init(struct threads *threads);
+void threads__exit(struct threads *threads);
+size_t threads__nr(struct threads *threads);
+struct thread *threads__find(struct threads *threads, pid_t tid);
+struct thread *threads__findnew(struct threads *threads, pid_t pid, pid_t tid, bool *created);
+void threads__remove_all_threads(struct threads *threads);
+void threads__remove(struct threads *threads, struct thread *thread);
+int threads__for_each_thread(struct threads *threads,
+			     int (*fn)(struct thread *thread, void *data),
+			     void *data);
+
+#endif	/* __PERF_THREADS_H */
-- 
2.42.0.869.gea05f2083d-goog

* [PATCH v4 39/53] perf threads: Switch from rbtree to hashmap
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (37 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 38/53] perf threads: Move threads to its own files Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 40/53] perf threads: Reduce table size from 256 to 8 Ian Rogers
                   ` (13 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

The rbtree keeps its entries sorted, but this ordering is unused.
Switch to using a hashmap for O(1) rather than O(log n)
find/insert/remove complexity.
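
The hashmap is the tools/lib-style one vendored under
tools/perf/util. Its contract, as exercised by this patch, in a
minimal sketch (the string value and function names are
illustrative):

```c
#include "hashmap.h"

static size_t key_hash(long key, void *ctx)
{
	(void)ctx;
	return (size_t)key;	/* tids are distributed well enough as-is */
}

static bool key_equal(long k1, long k2, void *ctx)
{
	(void)ctx;
	return k1 == k2;
}

static void hashmap_usage_sketch(void)
{
	struct hashmap map;
	void *value;

	hashmap__init(&map, key_hash, key_equal, /*ctx=*/NULL);
	/* add returns 0 on success; find fills *value and returns true. */
	if (!hashmap__add(&map, /*key=*/42, /*value=*/"forty-two") &&
	    hashmap__find(&map, 42, &value)) {
		/* value now points at "forty-two". */
	}
	hashmap__clear(&map);
}
```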

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/threads.c | 146 ++++++++++++--------------------------
 tools/perf/util/threads.h |   6 +-
 2 files changed, 47 insertions(+), 105 deletions(-)

diff --git a/tools/perf/util/threads.c b/tools/perf/util/threads.c
index d984ec939c7b..55923be53180 100644
--- a/tools/perf/util/threads.c
+++ b/tools/perf/util/threads.c
@@ -3,25 +3,30 @@
 #include "machine.h"
 #include "thread.h"
 
-struct thread_rb_node {
-	struct rb_node rb_node;
-	struct thread *thread;
-};
-
 static struct threads_table_entry *threads__table(struct threads *threads, pid_t tid)
 {
 	/* Cast it to handle tid == -1 */
 	return &threads->table[(unsigned int)tid % THREADS__TABLE_SIZE];
 }
 
+static size_t key_hash(long key, void *ctx __maybe_unused)
+{
+	/* The table lookup removes low bit entropy, but this is just ignored here. */
+	return key;
+}
+
+static bool key_equal(long key1, long key2, void *ctx __maybe_unused)
+{
+	return key1 == key2;
+}
+
 void threads__init(struct threads *threads)
 {
 	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads_table_entry *table = &threads->table[i];
 
-		table->entries = RB_ROOT_CACHED;
+		hashmap__init(&table->shard, key_hash, key_equal, NULL);
 		init_rwsem(&table->lock);
-		table->nr = 0;
 		table->last_match = NULL;
 	}
 }
@@ -32,6 +37,7 @@ void threads__exit(struct threads *threads)
 	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads_table_entry *table = &threads->table[i];
 
+		hashmap__clear(&table->shard);
 		exit_rwsem(&table->lock);
 	}
 }
@@ -44,7 +50,7 @@ size_t threads__nr(struct threads *threads)
 		struct threads_table_entry *table = &threads->table[i];
 
 		down_read(&table->lock);
-		nr += table->nr;
+		nr += hashmap__size(&table->shard);
 		up_read(&table->lock);
 	}
 	return nr;
@@ -86,28 +92,13 @@ static void threads_table_entry__set_last_match(struct threads_table_entry *tabl
 struct thread *threads__find(struct threads *threads, pid_t tid)
 {
 	struct threads_table_entry *table  = threads__table(threads, tid);
-	struct rb_node **p;
-	struct thread *res = NULL;
+	struct thread *res;
 
 	down_read(&table->lock);
 	res = __threads_table_entry__get_last_match(table, tid);
-	if (res)
-		return res;
-
-	p = &table->entries.rb_root.rb_node;
-	while (*p != NULL) {
-		struct rb_node *parent = *p;
-		struct thread *th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
-
-		if (thread__tid(th) == tid) {
-			res = thread__get(th);
-			break;
-		}
-
-		if (tid < thread__tid(th))
-			p = &(*p)->rb_left;
-		else
-			p = &(*p)->rb_right;
+	if (!res) {
+		if (hashmap__find(&table->shard, tid, &res))
+			res = thread__get(res);
 	}
 	up_read(&table->lock);
 	if (res)
@@ -118,49 +109,25 @@ struct thread *threads__find(struct threads *threads, pid_t tid)
 struct thread *threads__findnew(struct threads *threads, pid_t pid, pid_t tid, bool *created)
 {
 	struct threads_table_entry *table  = threads__table(threads, tid);
-	struct rb_node **p;
-	struct rb_node *parent = NULL;
 	struct thread *res = NULL;
-	struct thread_rb_node *nd;
-	bool leftmost = true;
 
 	*created = false;
 	down_write(&table->lock);
-	p = &table->entries.rb_root.rb_node;
-	while (*p != NULL) {
-		struct thread *th;
-
-		parent = *p;
-		th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
-
-		if (thread__tid(th) == tid) {
-			__threads_table_entry__set_last_match(table, th);
-			res = thread__get(th);
-			goto out_unlock;
-		}
-
-		if (tid < thread__tid(th))
-			p = &(*p)->rb_left;
-		else {
-			leftmost = false;
-			p = &(*p)->rb_right;
-		}
-	}
-	nd = malloc(sizeof(*nd));
-	if (nd == NULL)
-		goto out_unlock;
 	res = thread__new(pid, tid);
-	if (!res)
-		free(nd);
-	else {
-		*created = true;
-		nd->thread = thread__get(res);
-		rb_link_node(&nd->rb_node, parent, p);
-		rb_insert_color_cached(&nd->rb_node, &table->entries, leftmost);
-		++table->nr;
-		__threads_table_entry__set_last_match(table, res);
+	if (res) {
+		if (hashmap__add(&table->shard, tid, res)) {
+			/* Add failed. Assume a race so find other entry. */
+			thread__put(res);
+			res = NULL;
+			if (hashmap__find(&table->shard, tid, &res))
+				res = thread__get(res);
+		} else {
+			res = thread__get(res);
+			*created = true;
+		}
+		if (res)
+			__threads_table_entry__set_last_match(table, res);
 	}
-out_unlock:
 	up_write(&table->lock);
 	return res;
 }
@@ -169,57 +136,32 @@ void threads__remove_all_threads(struct threads *threads)
 {
 	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads_table_entry *table = &threads->table[i];
-		struct rb_node *nd;
+		struct hashmap_entry *cur, *tmp;
+		size_t bkt;
 
 		down_write(&table->lock);
 		__threads_table_entry__set_last_match(table, NULL);
-		nd = rb_first_cached(&table->entries);
-		while (nd) {
-			struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
-
-			nd = rb_next(nd);
-			thread__put(trb->thread);
-			rb_erase_cached(&trb->rb_node, &table->entries);
-			RB_CLEAR_NODE(&trb->rb_node);
-			--table->nr;
+		hashmap__for_each_entry_safe((&table->shard), cur, tmp, bkt) {
+			struct thread *old_value;
 
-			free(trb);
+			hashmap__delete(&table->shard, cur->key, /*old_key=*/NULL, &old_value);
+			thread__put(old_value);
 		}
-		assert(table->nr == 0);
 		up_write(&table->lock);
 	}
 }
 
 void threads__remove(struct threads *threads, struct thread *thread)
 {
-	struct rb_node **p;
 	struct threads_table_entry *table  = threads__table(threads, thread__tid(thread));
-	pid_t tid = thread__tid(thread);
+	struct thread *old_value;
 
 	down_write(&table->lock);
 	if (table->last_match && RC_CHK_EQUAL(table->last_match, thread))
 		__threads_table_entry__set_last_match(table, NULL);
 
-	p = &table->entries.rb_root.rb_node;
-	while (*p != NULL) {
-		struct rb_node *parent = *p;
-		struct thread_rb_node *nd = rb_entry(parent, struct thread_rb_node, rb_node);
-		struct thread *th = nd->thread;
-
-		if (RC_CHK_EQUAL(th, thread)) {
-			thread__put(nd->thread);
-			rb_erase_cached(&nd->rb_node, &table->entries);
-			RB_CLEAR_NODE(&nd->rb_node);
-			--table->nr;
-			free(nd);
-			break;
-		}
-
-		if (tid < thread__tid(th))
-			p = &(*p)->rb_left;
-		else
-			p = &(*p)->rb_right;
-	}
+	hashmap__delete(&table->shard, thread__tid(thread), /*old_key=*/NULL, &old_value);
+	thread__put(old_value);
 	up_write(&table->lock);
 }
 
@@ -229,11 +171,11 @@ int threads__for_each_thread(struct threads *threads,
 {
 	for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads_table_entry *table = &threads->table[i];
-		struct rb_node *nd;
+		struct hashmap_entry *cur;
+		size_t bkt;
 
-		for (nd = rb_first_cached(&table->entries); nd; nd = rb_next(nd)) {
-			struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
-			int rc = fn(trb->thread, data);
+		hashmap__for_each_entry((&table->shard), cur, bkt) {
+			int rc = fn((struct thread *)cur->pvalue, data);
 
 			if (rc != 0)
 				return rc;
diff --git a/tools/perf/util/threads.h b/tools/perf/util/threads.h
index ed67de627578..d03bd91a7769 100644
--- a/tools/perf/util/threads.h
+++ b/tools/perf/util/threads.h
@@ -2,7 +2,7 @@
 #ifndef __PERF_THREADS_H
 #define __PERF_THREADS_H
 
-#include <linux/rbtree.h>
+#include "hashmap.h"
 #include "rwsem.h"
 
 struct thread;
@@ -11,9 +11,9 @@ struct thread;
 #define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)
 
 struct threads_table_entry {
-	struct rb_root_cached  entries;
+	/* Key is tid, value is struct thread. */
+	struct hashmap	       shard;
 	struct rw_semaphore    lock;
-	unsigned int	       nr;
 	struct thread	       *last_match;
 };
 
-- 
2.42.0.869.gea05f2083d-goog

* [PATCH v4 40/53] perf threads: Reduce table size from 256 to 8
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (38 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 39/53] perf threads: Switch from rbtree to hashmap Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 41/53] perf dsos: Attempt to better abstract dsos internals Ian Rogers
                   ` (12 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

The threads data structure is an array of hashmaps, previously
rbtrees. The two levels allow for a fixed outer array where access is
guarded by rw_semaphores. Commit 91e467bc568f ("perf machine: Use
hashtable for machine threads") sized the outer table at 256 entries
to avoid future scalability problems; however, this means the threads
struct is sized at 30,720 bytes. As the hashmaps allow O(1) access for
the common find/insert/remove operations, lower the number of entries
to 8. This reduces the size overhead to 960 bytes.
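
As a back-of-envelope check of the quoted numbers (the implied ~120
bytes per struct threads_table_entry — hashmap, rw_semaphore and
last_match pointer — is an assumption; exact sizes vary by arch and
libc):

```c
/* 30,720 / 256 = 120 bytes per bucket, so shrinking the table gives: */
_Static_assert(256 * 120 == 30720, "old size, THREADS__TABLE_BITS == 8");
_Static_assert(  8 * 120 ==   960, "new size, THREADS__TABLE_BITS == 3");
```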

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/threads.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/threads.h b/tools/perf/util/threads.h
index d03bd91a7769..da68d2223f18 100644
--- a/tools/perf/util/threads.h
+++ b/tools/perf/util/threads.h
@@ -7,7 +7,7 @@
 
 struct thread;
 
-#define THREADS__TABLE_BITS	8
+#define THREADS__TABLE_BITS	3
 #define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)
 
 struct threads_table_entry {
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 41/53] perf dsos: Attempt to better abstract dsos internals
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (39 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 40/53] perf threads: Reduce table size from 256 to 8 Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 42/53] perf dsos: Tidy reference counting and locking Ian Rogers
                   ` (11 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Move functions from machine.c and build-id.c to dsos.c. Pass the dsos
struct rather than its internal state. Rename some functions to better
reflect which data structure they operate on.
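
For example, the hit-all path is now layered so that each function is
named for the struct it receives (signatures as added by this patch):

  /* session.c: walks the host machine and every guest machine. */
  int perf_session__dsos_hit_all(struct perf_session *session);

  /* machine.c: forwards to the machine's dsos. */
  int machine__hit_all_dsos(struct machine *machine);

  /* dsos.c: marks every dso in the set as hit. */
  int __dsos__hit_all(struct dsos *dsos);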

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-inject.c |  2 +-
 tools/perf/builtin-record.c |  2 +-
 tools/perf/util/build-id.c  | 38 +---------------------------
 tools/perf/util/build-id.h  |  2 --
 tools/perf/util/dso.h       |  6 -----
 tools/perf/util/dsos.c      | 49 ++++++++++++++++++++++++++++++++++---
 tools/perf/util/dsos.h      | 19 +++++++++++---
 tools/perf/util/machine.c   | 40 ++++++------------------------
 tools/perf/util/machine.h   |  2 ++
 tools/perf/util/session.c   | 21 ++++++++++++++++
 tools/perf/util/session.h   |  2 ++
 11 files changed, 97 insertions(+), 86 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index eb3ef5c24b66..ef73317e6ae7 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -2122,7 +2122,7 @@ static int __cmd_inject(struct perf_inject *inject)
 		 */
 		if (perf_header__has_feat(&session->header, HEADER_BUILD_ID) &&
 		    inject->have_auxtrace && !inject->itrace_synth_opts.set)
-			dsos__hit_all(session);
+			perf_session__dsos_hit_all(session);
 		/*
 		 * The AUX areas have been removed and replaced with
 		 * synthesized hardware events, so clear the feature flag.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b6c8c1371b39..53653f21b52c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1787,7 +1787,7 @@ record__finish_output(struct record *rec)
 		process_buildids(rec);
 
 		if (rec->buildid_all)
-			dsos__hit_all(rec->session);
+			perf_session__dsos_hit_all(rec->session);
 	}
 	perf_session__write_header(rec->session, rec->evlist, fd, true);
 
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 03c64b85383b..a617b1917e6b 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -390,42 +390,6 @@ int perf_session__write_buildid_table(struct perf_session *session,
 	return err;
 }
 
-static int __dsos__hit_all(struct list_head *head)
-{
-	struct dso *pos;
-
-	list_for_each_entry(pos, head, node)
-		pos->hit = true;
-
-	return 0;
-}
-
-static int machine__hit_all_dsos(struct machine *machine)
-{
-	return __dsos__hit_all(&machine->dsos.head);
-}
-
-int dsos__hit_all(struct perf_session *session)
-{
-	struct rb_node *nd;
-	int err;
-
-	err = machine__hit_all_dsos(&session->machines.host);
-	if (err)
-		return err;
-
-	for (nd = rb_first_cached(&session->machines.guests); nd;
-	     nd = rb_next(nd)) {
-		struct machine *pos = rb_entry(nd, struct machine, rb_node);
-
-		err = machine__hit_all_dsos(pos);
-		if (err)
-			return err;
-	}
-
-	return 0;
-}
-
 void disable_buildid_cache(void)
 {
 	no_buildid_cache = true;
@@ -992,7 +956,7 @@ int perf_session__cache_build_ids(struct perf_session *session)
 
 static bool machine__read_build_ids(struct machine *machine, bool with_hits)
 {
-	return __dsos__read_build_ids(&machine->dsos.head, with_hits);
+	return __dsos__read_build_ids(&machine->dsos, with_hits);
 }
 
 bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index 4e3a1169379b..3fa8bffb07ca 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -39,8 +39,6 @@ int build_id__mark_dso_hit(struct perf_tool *tool, union perf_event *event,
 			   struct perf_sample *sample, struct evsel *evsel,
 			   struct machine *machine);
 
-int dsos__hit_all(struct perf_session *session);
-
 int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event,
 			       struct perf_sample *sample, struct evsel *evsel,
 			       struct machine *machine);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 8bdc17d78b02..8b45dbdae776 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -230,12 +230,6 @@ struct dso {
 #define dso__for_each_symbol(dso, pos, n)	\
 	symbols__for_each_entry(&(dso)->symbols, pos, n)
 
-#define dsos__for_each_with_build_id(pos, head)	\
-	list_for_each_entry(pos, head, node)	\
-		if (!pos->has_build_id)		\
-			continue;		\
-		else
-
 static inline void dso__set_loaded(struct dso *dso)
 {
 	dso->loaded = true;
diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index cf80aa42dd07..e65ef6762bed 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -12,6 +12,35 @@
 #include <symbol.h> // filename__read_build_id
 #include <unistd.h>
 
+void dsos__init(struct dsos *dsos)
+{
+	INIT_LIST_HEAD(&dsos->head);
+	dsos->root = RB_ROOT;
+	init_rwsem(&dsos->lock);
+}
+
+static void dsos__purge(struct dsos *dsos)
+{
+	struct dso *pos, *n;
+
+	down_write(&dsos->lock);
+
+	list_for_each_entry_safe(pos, n, &dsos->head, node) {
+		RB_CLEAR_NODE(&pos->rb_node);
+		pos->root = NULL;
+		list_del_init(&pos->node);
+		dso__put(pos);
+	}
+
+	up_write(&dsos->lock);
+}
+
+void dsos__exit(struct dsos *dsos)
+{
+	dsos__purge(dsos);
+	exit_rwsem(&dsos->lock);
+}
+
 static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
 {
 	if (a->maj > b->maj) return -1;
@@ -73,8 +102,9 @@ int dso__cmp_id(struct dso *a, struct dso *b)
 	return __dso_id__cmp(&a->id, &b->id);
 }
 
-bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
+bool __dsos__read_build_ids(struct dsos *dsos, bool with_hits)
 {
+	struct list_head *head = &dsos->head;
 	bool have_build_id = false;
 	struct dso *pos;
 	struct nscookie nsc;
@@ -303,9 +333,10 @@ struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id
 	return dso;
 }
 
-size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
+size_t __dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
 			       bool (skip)(struct dso *dso, int parm), int parm)
 {
+	struct list_head *head = &dsos->head;
 	struct dso *pos;
 	size_t ret = 0;
 
@@ -320,8 +351,9 @@ size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
 	return ret;
 }
 
-size_t __dsos__fprintf(struct list_head *head, FILE *fp)
+size_t __dsos__fprintf(struct dsos *dsos, FILE *fp)
 {
+	struct list_head *head = &dsos->head;
 	struct dso *pos;
 	size_t ret = 0;
 
@@ -331,3 +363,14 @@ size_t __dsos__fprintf(struct list_head *head, FILE *fp)
 
 	return ret;
 }
+
+int __dsos__hit_all(struct dsos *dsos)
+{
+	struct list_head *head = &dsos->head;
+	struct dso *pos;
+
+	list_for_each_entry(pos, head, node)
+		pos->hit = true;
+
+	return 0;
+}
diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h
index 5dbec2bc6966..1c81ddf07f8f 100644
--- a/tools/perf/util/dsos.h
+++ b/tools/perf/util/dsos.h
@@ -21,6 +21,15 @@ struct dsos {
 	struct rw_semaphore lock;
 };
 
+#define dsos__for_each_with_build_id(pos, head)	\
+	list_for_each_entry(pos, head, node)	\
+		if (!pos->has_build_id)		\
+			continue;		\
+		else
+
+void dsos__init(struct dsos *dsos);
+void dsos__exit(struct dsos *dsos);
+
 void __dsos__add(struct dsos *dsos, struct dso *dso);
 void dsos__add(struct dsos *dsos, struct dso *dso);
 struct dso *__dsos__addnew(struct dsos *dsos, const char *name);
@@ -28,13 +37,15 @@ struct dso *__dsos__find(struct dsos *dsos, const char *name, bool cmp_short);
 
 struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id);
  
+bool __dsos__read_build_ids(struct dsos *dsos, bool with_hits);
+
 struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso *dso,
 						const char *name, struct dso_id *id);
 
-bool __dsos__read_build_ids(struct list_head *head, bool with_hits);
-
-size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
+size_t __dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
 			       bool (skip)(struct dso *dso, int parm), int parm);
-size_t __dsos__fprintf(struct list_head *head, FILE *fp);
+size_t __dsos__fprintf(struct dsos *dsos, FILE *fp);
+
+int __dsos__hit_all(struct dsos *dsos);
 
 #endif /* __PERF_DSOS */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 36231b5a86aa..a0abfba90962 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -49,13 +49,6 @@ static struct dso *machine__kernel_dso(struct machine *machine)
 	return map__dso(machine->vmlinux_map);
 }
 
-static void dsos__init(struct dsos *dsos)
-{
-	INIT_LIST_HEAD(&dsos->head);
-	dsos->root = RB_ROOT;
-	init_rwsem(&dsos->lock);
-}
-
 static int machine__set_mmap_name(struct machine *machine)
 {
 	if (machine__is_host(machine))
@@ -166,28 +159,6 @@ struct machine *machine__new_kallsyms(void)
 	return machine;
 }
 
-static void dsos__purge(struct dsos *dsos)
-{
-	struct dso *pos, *n;
-
-	down_write(&dsos->lock);
-
-	list_for_each_entry_safe(pos, n, &dsos->head, node) {
-		RB_CLEAR_NODE(&pos->rb_node);
-		pos->root = NULL;
-		list_del_init(&pos->node);
-		dso__put(pos);
-	}
-
-	up_write(&dsos->lock);
-}
-
-static void dsos__exit(struct dsos *dsos)
-{
-	dsos__purge(dsos);
-	exit_rwsem(&dsos->lock);
-}
-
 void machine__delete_threads(struct machine *machine)
 {
 	threads__remove_all_threads(&machine->threads);
@@ -907,11 +878,11 @@ static struct map *machine__addnew_module_map(struct machine *machine, u64 start
 size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
 {
 	struct rb_node *nd;
-	size_t ret = __dsos__fprintf(&machines->host.dsos.head, fp);
+	size_t ret = __dsos__fprintf(&machines->host.dsos, fp);
 
 	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
-		ret += __dsos__fprintf(&pos->dsos.head, fp);
+		ret += __dsos__fprintf(&pos->dsos, fp);
 	}
 
 	return ret;
@@ -920,7 +891,7 @@ size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
 size_t machine__fprintf_dsos_buildid(struct machine *m, FILE *fp,
 				     bool (skip)(struct dso *dso, int parm), int parm)
 {
-	return __dsos__fprintf_buildid(&m->dsos.head, fp, skip, parm);
+	return __dsos__fprintf_buildid(&m->dsos, fp, skip, parm);
 }
 
 size_t machines__fprintf_dsos_buildid(struct machines *machines, FILE *fp,
@@ -3278,3 +3249,8 @@ bool machine__is_lock_function(struct machine *machine, u64 addr)
 
 	return false;
 }
+
+int machine__hit_all_dsos(struct machine *machine)
+{
+	return __dsos__hit_all(&machine->dsos);
+}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e28c787616fe..05927aa3e813 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -306,4 +306,6 @@ int machine__map_x86_64_entry_trampolines(struct machine *machine,
 int machine__resolve(struct machine *machine, struct addr_location *al,
 		     struct perf_sample *sample);
 
+int machine__hit_all_dsos(struct machine *machine);
+
 #endif /* __PERF_MACHINE_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index c6afba7ab1a5..63754775931d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -2895,3 +2895,24 @@ int perf_event__process_id_index(struct perf_session *session,
 	}
 	return 0;
 }
+
+int perf_session__dsos_hit_all(struct perf_session *session)
+{
+	struct rb_node *nd;
+	int err;
+
+	err = machine__hit_all_dsos(&session->machines.host);
+	if (err)
+		return err;
+
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
+		struct machine *pos = rb_entry(nd, struct machine, rb_node);
+
+		err = machine__hit_all_dsos(pos);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index ee3715e8563b..25c0d6c9cac9 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -154,6 +154,8 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 				      union perf_event *event,
 				      struct perf_sample *sample);
 
+int perf_session__dsos_hit_all(struct perf_session *session);
+
 int perf_event__process_id_index(struct perf_session *session,
 				 union perf_event *event);
 
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 42/53] perf dsos: Tidy reference counting and locking
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (40 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 41/53] perf dsos: Attempt to better abstract dsos internals Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 43/53] perf dsos: Add dsos__for_each_dso Ian Rogers
                   ` (10 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Move more functionality into dsos.c, generally from machine.c,
renaming functions to match their new usage. The find function is made
to always "get" before returning a dso, so the caller owns a
reference. Reduce the scope of locks in vdso.c to match this locking
paradigm.
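
A sketch of the resulting caller contract; the consuming code here is
illustrative only:

  struct dso *dso = dsos__find(&machine->dsos, filename, /*cmp_short=*/false);

  if (dso) {
  	/* dsos__find() did a dso__get(), so this reference is owned. */
  	pr_debug("found %s\n", dso->long_name);
  	dso__put(dso);
  }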

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dsos.c    | 73 +++++++++++++++++++++++++++++++++++----
 tools/perf/util/dsos.h    |  9 ++++-
 tools/perf/util/machine.c | 62 ++-------------------------------
 tools/perf/util/map.c     |  4 +--
 tools/perf/util/vdso.c    | 54 ++++++++++++----------------
 5 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index e65ef6762bed..d269e09005a7 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -181,7 +181,7 @@ struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso
 			 * at the end of the list of duplicates.
 			 */
 			if (!dso || (dso == this))
-				return this;	/* Find matching dso */
+				return dso__get(this);	/* Find matching dso */
 			/*
 			 * The core kernel DSOs may have duplicated long name.
 			 * In this case, the short name should be different.
@@ -253,15 +253,20 @@ static struct dso *__dsos__find_id(struct dsos *dsos, const char *name, struct d
 	if (cmp_short) {
 		list_for_each_entry(pos, &dsos->head, node)
 			if (__dso__cmp_short_name(name, id, pos) == 0)
-				return pos;
+				return dso__get(pos);
 		return NULL;
 	}
 	return __dsos__findnew_by_longname_id(&dsos->root, name, id);
 }
 
-struct dso *__dsos__find(struct dsos *dsos, const char *name, bool cmp_short)
+struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short)
 {
-	return __dsos__find_id(dsos, name, NULL, cmp_short);
+	struct dso *res;
+
+	down_read(&dsos->lock);
+	res = __dsos__find_id(dsos, name, NULL, cmp_short);
+	up_read(&dsos->lock);
+	return res;
 }
 
 static void dso__set_basename(struct dso *dso)
@@ -303,8 +308,6 @@ static struct dso *__dsos__addnew_id(struct dsos *dsos, const char *name, struct
 	if (dso != NULL) {
 		__dsos__add(dsos, dso);
 		dso__set_basename(dso);
-		/* Put dso here because __dsos_add already got it */
-		dso__put(dso);
 	}
 	return dso;
 }
@@ -328,7 +331,7 @@ struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id
 {
 	struct dso *dso;
 	down_write(&dsos->lock);
-	dso = dso__get(__dsos__findnew_id(dsos, name, id));
+	dso = __dsos__findnew_id(dsos, name, id);
 	up_write(&dsos->lock);
 	return dso;
 }
@@ -374,3 +377,59 @@ int __dsos__hit_all(struct dsos *dsos)
 
 	return 0;
 }
+
+struct dso *dsos__findnew_module_dso(struct dsos *dsos,
+				     struct machine *machine,
+				     struct kmod_path *m,
+				     const char *filename)
+{
+	struct dso *dso;
+
+	down_write(&dsos->lock);
+
+	dso = __dsos__find_id(dsos, m->name, NULL, /*cmp_short=*/true);
+	if (!dso) {
+		dso = __dsos__addnew(dsos, m->name);
+		if (dso == NULL)
+			goto out_unlock;
+
+		dso__set_module_info(dso, m, machine);
+		dso__set_long_name(dso, strdup(filename), true);
+		dso->kernel = DSO_SPACE__KERNEL;
+	}
+
+out_unlock:
+	up_write(&dsos->lock);
+	return dso;
+}
+
+struct dso *dsos__find_kernel_dso(struct dsos *dsos)
+{
+	struct dso *dso, *res = NULL;
+
+	down_read(&dsos->lock);
+	list_for_each_entry(dso, &dsos->head, node) {
+		/*
+		 * The cpumode passed to is_kernel_module is not the cpumode of
+		 * *this* event. If we insist on passing correct cpumode to
+		 * is_kernel_module, we should record the cpumode when we adding
+		 * this dso to the linked list.
+		 *
+		 * However we don't really need passing correct cpumode.  We
+		 * know the correct cpumode must be kernel mode (if not, we
+		 * should not link it onto kernel_dsos list).
+		 *
+		 * Therefore, we pass PERF_RECORD_MISC_CPUMODE_UNKNOWN.
+		 * is_kernel_module() treats it as a kernel cpumode.
+		 */
+		if (!dso->kernel ||
+		    is_kernel_module(dso->long_name,
+				     PERF_RECORD_MISC_CPUMODE_UNKNOWN))
+			continue;
+
+		res = dso__get(dso);
+		break;
+	}
+	up_read(&dsos->lock);
+	return res;
+}
diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h
index 1c81ddf07f8f..a7c7f723c5ff 100644
--- a/tools/perf/util/dsos.h
+++ b/tools/perf/util/dsos.h
@@ -10,6 +10,8 @@
 
 struct dso;
 struct dso_id;
+struct kmod_path;
+struct machine;
 
 /*
  * DSOs are put into both a list for fast iteration and rbtree for fast
@@ -33,7 +35,7 @@ void dsos__exit(struct dsos *dsos);
 void __dsos__add(struct dsos *dsos, struct dso *dso);
 void dsos__add(struct dsos *dsos, struct dso *dso);
 struct dso *__dsos__addnew(struct dsos *dsos, const char *name);
-struct dso *__dsos__find(struct dsos *dsos, const char *name, bool cmp_short);
+struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short);
 
 struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id);
  
@@ -48,4 +50,9 @@ size_t __dsos__fprintf(struct dsos *dsos, FILE *fp);
 
 int __dsos__hit_all(struct dsos *dsos);
 
+struct dso *dsos__findnew_module_dso(struct dsos *dsos, struct machine *machine,
+				     struct kmod_path *m, const char *filename);
+
+struct dso *dsos__find_kernel_dso(struct dsos *dsos);
+
 #endif /* __PERF_DSOS */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a0abfba90962..06026a1b2d1a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -646,31 +646,6 @@ int machine__process_lost_samples_event(struct machine *machine __maybe_unused,
 	return 0;
 }
 
-static struct dso *machine__findnew_module_dso(struct machine *machine,
-					       struct kmod_path *m,
-					       const char *filename)
-{
-	struct dso *dso;
-
-	down_write(&machine->dsos.lock);
-
-	dso = __dsos__find(&machine->dsos, m->name, true);
-	if (!dso) {
-		dso = __dsos__addnew(&machine->dsos, m->name);
-		if (dso == NULL)
-			goto out_unlock;
-
-		dso__set_module_info(dso, m, machine);
-		dso__set_long_name(dso, strdup(filename), true);
-		dso->kernel = DSO_SPACE__KERNEL;
-	}
-
-	dso__get(dso);
-out_unlock:
-	up_write(&machine->dsos.lock);
-	return dso;
-}
-
 int machine__process_aux_event(struct machine *machine __maybe_unused,
 			       union perf_event *event)
 {
@@ -854,7 +829,7 @@ static struct map *machine__addnew_module_map(struct machine *machine, u64 start
 	if (kmod_path__parse_name(&m, filename))
 		return NULL;
 
-	dso = machine__findnew_module_dso(machine, &m, filename);
+	dso = dsos__findnew_module_dso(&machine->dsos, machine, &m, filename);
 	if (dso == NULL)
 		goto out;
 
@@ -1659,40 +1634,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 		 * Should be there already, from the build-id table in
 		 * the header.
 		 */
-		struct dso *kernel = NULL;
-		struct dso *dso;
-
-		down_read(&machine->dsos.lock);
-
-		list_for_each_entry(dso, &machine->dsos.head, node) {
-
-			/*
-			 * The cpumode passed to is_kernel_module is not the
-			 * cpumode of *this* event. If we insist on passing
-			 * correct cpumode to is_kernel_module, we should
-			 * record the cpumode when we adding this dso to the
-			 * linked list.
-			 *
-			 * However we don't really need passing correct
-			 * cpumode.  We know the correct cpumode must be kernel
-			 * mode (if not, we should not link it onto kernel_dsos
-			 * list).
-			 *
-			 * Therefore, we pass PERF_RECORD_MISC_CPUMODE_UNKNOWN.
-			 * is_kernel_module() treats it as a kernel cpumode.
-			 */
-
-			if (!dso->kernel ||
-			    is_kernel_module(dso->long_name,
-					     PERF_RECORD_MISC_CPUMODE_UNKNOWN))
-				continue;
-
-
-			kernel = dso__get(dso);
-			break;
-		}
-
-		up_read(&machine->dsos.lock);
+		struct dso *kernel = dsos__find_kernel_dso(&machine->dsos);
 
 		if (kernel == NULL)
 			kernel = machine__findnew_dso(machine, machine->mmap_name);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index cf5a15db3a1f..7c1fff9e413d 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -196,9 +196,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 			 * reading the header will have the build ID set and all future mmaps will
 			 * have it missing.
 			 */
-			down_read(&machine->dsos.lock);
-			header_bid_dso = __dsos__find(&machine->dsos, filename, false);
-			up_read(&machine->dsos.lock);
+			header_bid_dso = dsos__find(&machine->dsos, filename, false);
 			if (header_bid_dso && header_bid_dso->header_build_id) {
 				dso__set_build_id(dso, &header_bid_dso->bid);
 				dso->header_build_id = 1;
diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index df8963796187..35532dcbff74 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -133,8 +133,6 @@ static struct dso *__machine__addnew_vdso(struct machine *machine, const char *s
 	if (dso != NULL) {
 		__dsos__add(&machine->dsos, dso);
 		dso__set_long_name(dso, long_name, false);
-		/* Put dso here because __dsos_add already got it */
-		dso__put(dso);
 	}
 
 	return dso;
@@ -252,17 +250,15 @@ static struct dso *__machine__findnew_compat(struct machine *machine,
 	const char *file_name;
 	struct dso *dso;
 
-	dso = __dsos__find(&machine->dsos, vdso_file->dso_name, true);
+	dso = dsos__find(&machine->dsos, vdso_file->dso_name, true);
 	if (dso)
-		goto out;
+		return dso;
 
 	file_name = vdso__get_compat_file(vdso_file);
 	if (!file_name)
-		goto out;
+		return NULL;
 
-	dso = __machine__addnew_vdso(machine, vdso_file->dso_name, file_name);
-out:
-	return dso;
+	return __machine__addnew_vdso(machine, vdso_file->dso_name, file_name);
 }
 
 static int __machine__findnew_vdso_compat(struct machine *machine,
@@ -308,21 +304,23 @@ static struct dso *machine__find_vdso(struct machine *machine,
 	dso_type = machine__thread_dso_type(machine, thread);
 	switch (dso_type) {
 	case DSO__TYPE_32BIT:
-		dso = __dsos__find(&machine->dsos, DSO__NAME_VDSO32, true);
+		dso = dsos__find(&machine->dsos, DSO__NAME_VDSO32, true);
 		if (!dso) {
-			dso = __dsos__find(&machine->dsos, DSO__NAME_VDSO,
-					   true);
+			dso = dsos__find(&machine->dsos, DSO__NAME_VDSO,
+					 true);
-			if (dso && dso_type != dso__type(dso, machine))
-				dso = NULL;
+			if (dso && dso_type != dso__type(dso, machine)) {
+				dso__put(dso);
+				dso = NULL;
+			}
 		}
 		break;
 	case DSO__TYPE_X32BIT:
-		dso = __dsos__find(&machine->dsos, DSO__NAME_VDSOX32, true);
+		dso = dsos__find(&machine->dsos, DSO__NAME_VDSOX32, true);
 		break;
 	case DSO__TYPE_64BIT:
 	case DSO__TYPE_UNKNOWN:
 	default:
-		dso = __dsos__find(&machine->dsos, DSO__NAME_VDSO, true);
+		dso = dsos__find(&machine->dsos, DSO__NAME_VDSO, true);
 		break;
 	}
 
@@ -334,37 +332,33 @@ struct dso *machine__findnew_vdso(struct machine *machine,
 {
 	struct vdso_info *vdso_info;
 	struct dso *dso = NULL;
+	char *file;
 
-	down_write(&machine->dsos.lock);
 	if (!machine->vdso_info)
 		machine->vdso_info = vdso_info__new();
 
 	vdso_info = machine->vdso_info;
 	if (!vdso_info)
-		goto out_unlock;
+		return NULL;
 
 	dso = machine__find_vdso(machine, thread);
 	if (dso)
-		goto out_unlock;
+		return dso;
 
 #if BITS_PER_LONG == 64
 	if (__machine__findnew_vdso_compat(machine, thread, vdso_info, &dso))
-		goto out_unlock;
+		return dso;
 #endif
 
-	dso = __dsos__find(&machine->dsos, DSO__NAME_VDSO, true);
-	if (!dso) {
-		char *file;
+	dso = dsos__find(&machine->dsos, DSO__NAME_VDSO, true);
+	if (dso)
+		return dso;
 
-		file = get_file(&vdso_info->vdso);
-		if (file)
-			dso = __machine__addnew_vdso(machine, DSO__NAME_VDSO, file);
-	}
+	file = get_file(&vdso_info->vdso);
+	if (!file)
+		return NULL;
 
-out_unlock:
-	dso__get(dso);
-	up_write(&machine->dsos.lock);
-	return dso;
+	return __machine__addnew_vdso(machine, DSO__NAME_VDSO, file);
 }
 
 bool dso__is_vdso(struct dso *dso)
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 43/53] perf dsos: Add dsos__for_each_dso
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (41 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 42/53] perf dsos: Tidy reference counting and locking Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 44/53] perf dso: Move dso functions out of dsos Ian Rogers
                   ` (9 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

To better abstract the dsos internals, add dsos__for_each_dso, which
invokes a callback on each dso. This also means the read lock can be
held correctly for the whole iteration.
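
A sketch of the intended usage, with a hypothetical callback:

  /* Count the dsos; the callback runs with the read lock held. */
  static int count_dsos_cb(struct dso *dso __maybe_unused, void *data)
  {
  	(*(unsigned int *)data)++;
  	return 0;	/* a non-zero return stops the iteration early */
  }

  unsigned int nr = 0;

  dsos__for_each_dso(&machine->dsos, count_dsos_cb, &nr);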

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-inject.c | 25 +++++++-----
 tools/perf/util/build-id.c  | 76 ++++++++++++++++++++-----------------
 tools/perf/util/dsos.c      | 16 ++++++++
 tools/perf/util/dsos.h      |  8 +---
 tools/perf/util/machine.c   | 40 +++++++++++--------
 5 files changed, 100 insertions(+), 65 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index ef73317e6ae7..ce5e28eaad90 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -1187,23 +1187,28 @@ static int synthesize_build_id(struct perf_inject *inject, struct dso *dso, pid_
 					       process_build_id, machine);
 }
 
+static int guest_session__add_build_ids_cb(struct dso *dso, void *data)
+{
+	struct guest_session *gs = data;
+	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
+
+	if (!dso->has_build_id)
+		return 0;
+
+	return synthesize_build_id(inject, dso, gs->machine_pid);
+
+}
+
 static int guest_session__add_build_ids(struct guest_session *gs)
 {
 	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
-	struct machine *machine = &gs->session->machines.host;
-	struct dso *dso;
-	int ret;
 
 	/* Build IDs will be put in the Build ID feature section */
 	perf_header__set_feat(&inject->session->header, HEADER_BUILD_ID);
 
-	dsos__for_each_with_build_id(dso, &machine->dsos.head) {
-		ret = synthesize_build_id(inject, dso, gs->machine_pid);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
+	return dsos__for_each_dso(&gs->session->machines.host.dsos,
+				  guest_session__add_build_ids_cb,
+				  gs);
 }
 
 static int guest_session__ksymbol_event(struct perf_tool *tool,
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index a617b1917e6b..a6d3c253f19f 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -327,48 +327,56 @@ static int write_buildid(const char *name, size_t name_len, struct build_id *bid
 	return write_padded(fd, name, name_len + 1, len);
 }
 
-static int machine__write_buildid_table(struct machine *machine,
-					struct feat_fd *fd)
+struct machine__write_buildid_table_cb_args {
+	struct machine *machine;
+	struct feat_fd *fd;
+	u16 kmisc, umisc;
+};
+
+static int machine__write_buildid_table_cb(struct dso *dso, void *data)
 {
-	int err = 0;
-	struct dso *pos;
-	u16 kmisc = PERF_RECORD_MISC_KERNEL,
-	    umisc = PERF_RECORD_MISC_USER;
+	struct machine__write_buildid_table_cb_args *args = data;
+	const char *name;
+	size_t name_len;
+	bool in_kernel = false;
 
-	if (!machine__is_host(machine)) {
-		kmisc = PERF_RECORD_MISC_GUEST_KERNEL;
-		umisc = PERF_RECORD_MISC_GUEST_USER;
-	}
+	if (!dso->has_build_id)
+		return 0;
 
-	dsos__for_each_with_build_id(pos, &machine->dsos.head) {
-		const char *name;
-		size_t name_len;
-		bool in_kernel = false;
+	if (!dso->hit && !dso__is_vdso(dso))
+		return 0;
 
-		if (!pos->hit && !dso__is_vdso(pos))
-			continue;
+	if (dso__is_vdso(dso)) {
+		name = dso->short_name;
+		name_len = dso->short_name_len;
+	} else if (dso__is_kcore(dso)) {
+		name = args->machine->mmap_name;
+		name_len = strlen(name);
+	} else {
+		name = dso->long_name;
+		name_len = dso->long_name_len;
+	}
 
-		if (dso__is_vdso(pos)) {
-			name = pos->short_name;
-			name_len = pos->short_name_len;
-		} else if (dso__is_kcore(pos)) {
-			name = machine->mmap_name;
-			name_len = strlen(name);
-		} else {
-			name = pos->long_name;
-			name_len = pos->long_name_len;
-		}
+	in_kernel = dso->kernel || is_kernel_module(name, PERF_RECORD_MISC_CPUMODE_UNKNOWN);
+	return write_buildid(name, name_len, &dso->bid, args->machine->pid,
+			     in_kernel ? args->kmisc : args->umisc, args->fd);
+}
 
-		in_kernel = pos->kernel ||
-				is_kernel_module(name,
-					PERF_RECORD_MISC_CPUMODE_UNKNOWN);
-		err = write_buildid(name, name_len, &pos->bid, machine->pid,
-				    in_kernel ? kmisc : umisc, fd);
-		if (err)
-			break;
+static int machine__write_buildid_table(struct machine *machine, struct feat_fd *fd)
+{
+	struct machine__write_buildid_table_cb_args args = {
+		.machine = machine,
+		.fd = fd,
+		.kmisc = PERF_RECORD_MISC_KERNEL,
+		.umisc = PERF_RECORD_MISC_USER,
+	};
+
+	if (!machine__is_host(machine)) {
+		args.kmisc = PERF_RECORD_MISC_GUEST_KERNEL;
+		args.umisc = PERF_RECORD_MISC_GUEST_USER;
 	}
 
-	return err;
+	return dsos__for_each_dso(&machine->dsos, machine__write_buildid_table_cb, &args);
 }
 
 int perf_session__write_buildid_table(struct perf_session *session,
diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index d269e09005a7..d43f64939b12 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -433,3 +433,19 @@ struct dso *dsos__find_kernel_dso(struct dsos *dsos)
 	up_read(&dsos->lock);
 	return res;
 }
+
+int dsos__for_each_dso(struct dsos *dsos, int (*cb)(struct dso *dso, void *data), void *data)
+{
+	struct dso *dso;
+	int err = 0;
+
+	down_read(&dsos->lock);
+	list_for_each_entry(dso, &dsos->head, node) {
+		err = cb(dso, data);
+
+		if (err)
+			break;
+	}
+	up_read(&dsos->lock);
+	return err;
+}
diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h
index a7c7f723c5ff..317a263f0e37 100644
--- a/tools/perf/util/dsos.h
+++ b/tools/perf/util/dsos.h
@@ -23,12 +23,6 @@ struct dsos {
 	struct rw_semaphore lock;
 };
 
-#define dsos__for_each_with_build_id(pos, head)	\
-	list_for_each_entry(pos, head, node)	\
-		if (!pos->has_build_id)		\
-			continue;		\
-		else
-
 void dsos__init(struct dsos *dsos);
 void dsos__exit(struct dsos *dsos);
 
@@ -55,4 +49,6 @@ struct dso *dsos__findnew_module_dso(struct dsos *dsos, struct machine *machine,
 
 struct dso *dsos__find_kernel_dso(struct dsos *dsos);
 
+int dsos__for_each_dso(struct dsos *dsos, int (*cb)(struct dso *dso, void *data), void *data);
+
 #endif /* __PERF_DSOS */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 06026a1b2d1a..6505f8c2cecc 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1558,16 +1558,14 @@ int machine__create_kernel_maps(struct machine *machine)
 	return ret;
 }
 
-static bool machine__uses_kcore(struct machine *machine)
+static int machine__uses_kcore_cb(struct dso *dso, void *data __maybe_unused)
 {
-	struct dso *dso;
-
-	list_for_each_entry(dso, &machine->dsos.head, node) {
-		if (dso__is_kcore(dso))
-			return true;
-	}
+	return dso__is_kcore(dso) ? 1 : 0;
+}
 
-	return false;
+static bool machine__uses_kcore(struct machine *machine)
+{
+	return dsos__for_each_dso(&machine->dsos, machine__uses_kcore_cb, NULL) != 0;
 }
 
 static bool perf_event__is_extra_kernel_mmap(struct machine *machine,
@@ -3133,16 +3131,28 @@ char *machine__resolve_kernel_addr(void *vmachine, unsigned long long *addrp, ch
 	return sym->name;
 }
 
+struct machine__for_each_dso_cb_args {
+	struct machine *machine;
+	machine__dso_t fn;
+	void *priv;
+};
+
+static int machine__for_each_dso_cb(struct dso *dso, void *data)
+{
+	struct machine__for_each_dso_cb_args *args = data;
+
+	return args->fn(dso, args->machine, args->priv);
+}
+
 int machine__for_each_dso(struct machine *machine, machine__dso_t fn, void *priv)
 {
-	struct dso *pos;
-	int err = 0;
+	struct machine__for_each_dso_cb_args args = {
+		.machine = machine,
+		.fn = fn,
+		.priv = priv,
+	};
 
-	list_for_each_entry(pos, &machine->dsos.head, node) {
-		if (fn(pos, machine, priv))
-			err = -1;
-	}
-	return err;
+	return dsos__for_each_dso(&machine->dsos, machine__for_each_dso_cb, &args);
 }
 
 int machine__for_each_kernel_map(struct machine *machine, machine__map_t fn, void *priv)
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 44/53] perf dso: Move dso functions out of dsos
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (42 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 43/53] perf dsos: Add dsos__for_each_dso Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 45/53] perf dsos: Switch more loops to dsos__for_each_dso Ian Rogers
                   ` (8 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Move dso and dso_id functions to dso.c to match the struct
declarations.
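
As the moved comment explains, the comparator treats an empty id as a
wildcard. A minimal usage sketch, with made-up field values:

  struct dso_id zero = { 0 };	/* e.g. from a synthesized MMAP event */
  struct dso_id real = { .maj = 8, .min = 1, .ino = 42, .ino_generation = 2 };

  dso_id__cmp(&zero, &real);	/* 0: an empty id matches anything */
  dso_id__cmp(&real, &real);	/* 0: identical ids match */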

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dso.c  | 61 ++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/dso.h  |  4 +++
 tools/perf/util/dsos.c | 61 ------------------------------------------
 3 files changed, 65 insertions(+), 61 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 1f629b6fb7cf..1b0990507a42 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1268,6 +1268,67 @@ static void dso__set_long_name_id(struct dso *dso, const char *name, struct dso_
 		__dsos__findnew_link_by_longname_id(root, dso, NULL, id);
 }
 
+static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
+{
+	if (a->maj > b->maj) return -1;
+	if (a->maj < b->maj) return 1;
+
+	if (a->min > b->min) return -1;
+	if (a->min < b->min) return 1;
+
+	if (a->ino > b->ino) return -1;
+	if (a->ino < b->ino) return 1;
+
+	/*
+	 * Synthesized MMAP events have zero ino_generation, avoid comparing
+	 * them with MMAP events with actual ino_generation.
+	 *
+	 * I found it harmful because the mismatch resulted in a new
+	 * dso that did not have a build ID whereas the original dso did have a
+	 * build ID. The build ID was essential because the object was not found
+	 * otherwise. - Adrian
+	 */
+	if (a->ino_generation && b->ino_generation) {
+		if (a->ino_generation > b->ino_generation) return -1;
+		if (a->ino_generation < b->ino_generation) return 1;
+	}
+
+	return 0;
+}
+
+bool dso_id__empty(struct dso_id *id)
+{
+	if (!id)
+		return true;
+
+	return !id->maj && !id->min && !id->ino && !id->ino_generation;
+}
+
+void dso__inject_id(struct dso *dso, struct dso_id *id)
+{
+	dso->id.maj = id->maj;
+	dso->id.min = id->min;
+	dso->id.ino = id->ino;
+	dso->id.ino_generation = id->ino_generation;
+}
+
+int dso_id__cmp(struct dso_id *a, struct dso_id *b)
+{
+	/*
+	 * The second is always dso->id, so zeroes if not set, assume passing
+	 * NULL for a means a zeroed id
+	 */
+	if (dso_id__empty(a) || dso_id__empty(b))
+		return 0;
+
+	return __dso_id__cmp(a, b);
+}
+
+int dso__cmp_id(struct dso *a, struct dso *b)
+{
+	return __dso_id__cmp(&a->id, &b->id);
+}
+
 void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
 {
 	dso__set_long_name_id(dso, name, NULL, name_allocated);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 8b45dbdae776..1b247eeaa81e 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -235,6 +235,9 @@ static inline void dso__set_loaded(struct dso *dso)
 	dso->loaded = true;
 }
 
+int dso_id__cmp(struct dso_id *a, struct dso_id *b);
+bool dso_id__empty(struct dso_id *id);
+
 struct dso *dso__new_id(const char *name, struct dso_id *id);
 struct dso *dso__new(const char *name);
 void dso__delete(struct dso *dso);
@@ -242,6 +245,7 @@ void dso__delete(struct dso *dso);
 int dso__cmp_id(struct dso *a, struct dso *b);
 void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated);
 void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated);
+void dso__inject_id(struct dso *dso, struct dso_id *id);
 
 int dso__name_len(const struct dso *dso);
 
diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index d43f64939b12..f816927a21ff 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -41,67 +41,6 @@ void dsos__exit(struct dsos *dsos)
 	exit_rwsem(&dsos->lock);
 }
 
-static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
-{
-	if (a->maj > b->maj) return -1;
-	if (a->maj < b->maj) return 1;
-
-	if (a->min > b->min) return -1;
-	if (a->min < b->min) return 1;
-
-	if (a->ino > b->ino) return -1;
-	if (a->ino < b->ino) return 1;
-
-	/*
-	 * Synthesized MMAP events have zero ino_generation, avoid comparing
-	 * them with MMAP events with actual ino_generation.
-	 *
-	 * I found it harmful because the mismatch resulted in a new
-	 * dso that did not have a build ID whereas the original dso did have a
-	 * build ID. The build ID was essential because the object was not found
-	 * otherwise. - Adrian
-	 */
-	if (a->ino_generation && b->ino_generation) {
-		if (a->ino_generation > b->ino_generation) return -1;
-		if (a->ino_generation < b->ino_generation) return 1;
-	}
-
-	return 0;
-}
-
-static bool dso_id__empty(struct dso_id *id)
-{
-	if (!id)
-		return true;
-
-	return !id->maj && !id->min && !id->ino && !id->ino_generation;
-}
-
-static void dso__inject_id(struct dso *dso, struct dso_id *id)
-{
-	dso->id.maj = id->maj;
-	dso->id.min = id->min;
-	dso->id.ino = id->ino;
-	dso->id.ino_generation = id->ino_generation;
-}
-
-static int dso_id__cmp(struct dso_id *a, struct dso_id *b)
-{
-	/*
-	 * The second is always dso->id, so zeroes if not set, assume passing
-	 * NULL for a means a zeroed id
-	 */
-	if (dso_id__empty(a) || dso_id__empty(b))
-		return 0;
-
-	return __dso_id__cmp(a, b);
-}
-
-int dso__cmp_id(struct dso *a, struct dso *b)
-{
-	return __dso_id__cmp(&a->id, &b->id);
-}
-
 bool __dsos__read_build_ids(struct dsos *dsos, bool with_hits)
 {
 	struct list_head *head = &dsos->head;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 45/53] perf dsos: Switch more loops to dsos__for_each_dso
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (43 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 44/53] perf dso: Move dso functions out of dsos Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 46/53] perf dsos: Switch backing storage to array from rbtree/list Ian Rogers
                   ` (7 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Switch the remaining loops within dsos.c to dsos__for_each_dso, adding
a variant that doesn't take the lock for callers that already hold it.
Switch some previously unlocked loops to hold the read lock.
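
The conversions all follow the same shape: loop-local state moves into
an args struct that is threaded through the callback's void *data
pointer. A minimal sketch, with hypothetical names:

  struct count_hits_cb_args {
  	unsigned int hits;	/* was a loop-local counter */
  };

  static int count_hits_cb(struct dso *dso, void *data)
  {
  	struct count_hits_cb_args *args = data;

  	if (dso->hit)
  		args->hits++;
  	return 0;
  }

  static unsigned int dsos__count_hits(struct dsos *dsos)
  {
  	struct count_hits_cb_args args = { .hits = 0 };

  	dsos__for_each_dso(dsos, count_hits_cb, &args);
  	return args.hits;
  }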

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/build-id.c |   2 +-
 tools/perf/util/dsos.c     | 256 ++++++++++++++++++++++++-------------
 tools/perf/util/dsos.h     |   8 +-
 tools/perf/util/machine.c  |   8 +-
 4 files changed, 173 insertions(+), 101 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index a6d3c253f19f..864bc26b6b46 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -964,7 +964,7 @@ int perf_session__cache_build_ids(struct perf_session *session)
 
 static bool machine__read_build_ids(struct machine *machine, bool with_hits)
 {
-	return __dsos__read_build_ids(&machine->dsos, with_hits);
+	return dsos__read_build_ids(&machine->dsos, with_hits);
 }
 
 bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index f816927a21ff..b7fbfb877ae3 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -41,38 +41,65 @@ void dsos__exit(struct dsos *dsos)
 	exit_rwsem(&dsos->lock);
 }
 
-bool __dsos__read_build_ids(struct dsos *dsos, bool with_hits)
+
+static int __dsos__for_each_dso(struct dsos *dsos,
+				int (*cb)(struct dso *dso, void *data),
+				void *data)
+{
+	struct dso *dso;
+
+	list_for_each_entry(dso, &dsos->head, node) {
+		int err;
+
+		err = cb(dso, data);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+struct dsos__read_build_ids_cb_args {
+	bool with_hits;
+	bool have_build_id;
+};
+
+static int dsos__read_build_ids_cb(struct dso *dso, void *data)
 {
-	struct list_head *head = &dsos->head;
-	bool have_build_id = false;
-	struct dso *pos;
+	struct dsos__read_build_ids_cb_args *args = data;
 	struct nscookie nsc;
 
-	list_for_each_entry(pos, head, node) {
-		if (with_hits && !pos->hit && !dso__is_vdso(pos))
-			continue;
-		if (pos->has_build_id) {
-			have_build_id = true;
-			continue;
-		}
-		nsinfo__mountns_enter(pos->nsinfo, &nsc);
-		if (filename__read_build_id(pos->long_name, &pos->bid) > 0) {
-			have_build_id	  = true;
-			pos->has_build_id = true;
-		} else if (errno == ENOENT && pos->nsinfo) {
-			char *new_name = dso__filename_with_chroot(pos, pos->long_name);
-
-			if (new_name && filename__read_build_id(new_name,
-								&pos->bid) > 0) {
-				have_build_id = true;
-				pos->has_build_id = true;
-			}
-			free(new_name);
+	if (args->with_hits && !dso->hit && !dso__is_vdso(dso))
+		return 0;
+	if (dso->has_build_id) {
+		args->have_build_id = true;
+		return 0;
+	}
+	nsinfo__mountns_enter(dso->nsinfo, &nsc);
+	if (filename__read_build_id(dso->long_name, &dso->bid) > 0) {
+		args->have_build_id = true;
+		dso->has_build_id = true;
+	} else if (errno == ENOENT && dso->nsinfo) {
+		char *new_name = dso__filename_with_chroot(dso, dso->long_name);
+
+		if (new_name && filename__read_build_id(new_name, &dso->bid) > 0) {
+			args->have_build_id = true;
+			dso->has_build_id = true;
 		}
-		nsinfo__mountns_exit(&nsc);
+		free(new_name);
 	}
+	nsinfo__mountns_exit(&nsc);
+	return 0;
+}
 
-	return have_build_id;
+bool dsos__read_build_ids(struct dsos *dsos, bool with_hits)
+{
+	struct dsos__read_build_ids_cb_args args = {
+		.with_hits = with_hits,
+		.have_build_id = false,
+	};
+
+	dsos__for_each_dso(dsos, dsos__read_build_ids_cb, &args);
+	return args.have_build_id;
 }
 
 static int __dso__cmp_long_name(const char *long_name, struct dso_id *id, struct dso *b)
@@ -105,6 +132,7 @@ struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso
 
 	if (!name)
 		name = dso->long_name;
+
 	/*
 	 * Find node with the matching name
 	 */
@@ -185,17 +213,40 @@ static struct dso *__dsos__findnew_by_longname_id(struct rb_root *root, const ch
 	return __dsos__findnew_link_by_longname_id(root, NULL, name, id);
 }
 
+struct dsos__find_id_cb_args {
+	const char *name;
+	struct dso_id *id;
+	struct dso *res;
+};
+
+static int dsos__find_id_cb(struct dso *dso, void *data)
+{
+	struct dsos__find_id_cb_args *args = data;
+
+	if (__dso__cmp_short_name(args->name, args->id, dso) == 0) {
+		args->res = dso__get(dso);
+		return 1;
+	}
+	return 0;
+
+}
+
 static struct dso *__dsos__find_id(struct dsos *dsos, const char *name, struct dso_id *id, bool cmp_short)
 {
-	struct dso *pos;
+	struct dso *res;
 
 	if (cmp_short) {
-		list_for_each_entry(pos, &dsos->head, node)
-			if (__dso__cmp_short_name(name, id, pos) == 0)
-				return dso__get(pos);
-		return NULL;
+		struct dsos__find_id_cb_args args = {
+			.name = name,
+			.id = id,
+			.res = NULL,
+		};
+
+		__dsos__for_each_dso(dsos, dsos__find_id_cb, &args);
+		return args.res;
 	}
-	return __dsos__findnew_by_longname_id(&dsos->root, name, id);
+	res = __dsos__findnew_by_longname_id(&dsos->root, name, id);
+	return res;
 }
 
 struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short)
@@ -275,48 +326,74 @@ struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id
 	return dso;
 }
 
-size_t __dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
-			       bool (skip)(struct dso *dso, int parm), int parm)
-{
-	struct list_head *head = &dsos->head;
-	struct dso *pos;
-	size_t ret = 0;
+struct dsos__fprintf_buildid_cb_args {
+	FILE *fp;
+	bool (*skip)(struct dso *dso, int parm);
+	int parm;
+	size_t ret;
+};
 
-	list_for_each_entry(pos, head, node) {
-		char sbuild_id[SBUILD_ID_SIZE];
+static int dsos__fprintf_buildid_cb(struct dso *dso, void *data)
+{
+	struct dsos__fprintf_buildid_cb_args *args = data;
+	char sbuild_id[SBUILD_ID_SIZE];
 
-		if (skip && skip(pos, parm))
-			continue;
-		build_id__sprintf(&pos->bid, sbuild_id);
-		ret += fprintf(fp, "%-40s %s\n", sbuild_id, pos->long_name);
-	}
-	return ret;
+	if (args->skip && args->skip(dso, args->parm))
+		return 0;
+	build_id__sprintf(&dso->bid, sbuild_id);
+	args->ret += fprintf(args->fp, "%-40s %s\n", sbuild_id, dso->long_name);
+	return 0;
 }
 
-size_t __dsos__fprintf(struct dsos *dsos, FILE *fp)
+size_t dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
+			       bool (*skip)(struct dso *dso, int parm), int parm)
 {
-	struct list_head *head = &dsos->head;
-	struct dso *pos;
-	size_t ret = 0;
+	struct dsos__fprintf_buildid_cb_args args = {
+		.fp = fp,
+		.skip = skip,
+		.parm = parm,
+		.ret = 0,
+	};
+
+	dsos__for_each_dso(dsos, dsos__fprintf_buildid_cb, &args);
+	return args.ret;
+}
 
-	list_for_each_entry(pos, head, node) {
-		ret += dso__fprintf(pos, fp);
-	}
+struct dsos__fprintf_cb_args {
+	FILE *fp;
+	size_t ret;
+};
 
-	return ret;
+static int dsos__fprintf_cb(struct dso *dso, void *data)
+{
+	struct dsos__fprintf_cb_args *args = data;
+
+	args->ret += dso__fprintf(dso, args->fp);
+	return 0;
 }
 
-int __dsos__hit_all(struct dsos *dsos)
+size_t dsos__fprintf(struct dsos *dsos, FILE *fp)
 {
-	struct list_head *head = &dsos->head;
-	struct dso *pos;
+	struct dsos__fprintf_cb_args args = {
+		.fp = fp,
+		.ret = 0,
+	};
 
-	list_for_each_entry(pos, head, node)
-		pos->hit = true;
+	dsos__for_each_dso(dsos, dsos__fprintf_cb, &args);
+	return args.ret;
+}
 
+static int dsos__hit_all_cb(struct dso *dso, void *data __maybe_unused)
+{
+	dso->hit = true;
 	return 0;
 }
 
+int dsos__hit_all(struct dsos *dsos)
+{
+	return dsos__for_each_dso(dsos, dsos__hit_all_cb, NULL);
+}
+
 struct dso *dsos__findnew_module_dso(struct dsos *dsos,
 				     struct machine *machine,
 				     struct kmod_path *m,
@@ -342,49 +419,44 @@ struct dso *dsos__findnew_module_dso(struct dsos *dsos,
 	return dso;
 }
 
-struct dso *dsos__find_kernel_dso(struct dsos *dsos)
+static int dsos__find_kernel_dso_cb(struct dso *dso, void *data)
 {
-	struct dso *dso, *res = NULL;
+	struct dso **res = data;
+	/*
+	 * The cpumode passed to is_kernel_module is not the cpumode of *this*
+	 * event. If we insist on passing correct cpumode to is_kernel_module,
+	 * we should record the cpumode when we adding this dso to the linked
+	 * list.
+	 *
+	 * However we don't really need passing correct cpumode.  We know the
+	 * correct cpumode must be kernel mode (if not, we should not link it
+	 * onto kernel_dsos list).
+	 *
+	 * Therefore, we pass PERF_RECORD_MISC_CPUMODE_UNKNOWN.
+	 * is_kernel_module() treats it as a kernel cpumode.
+	 */
+	if (!dso->kernel ||
+	    is_kernel_module(dso->long_name, PERF_RECORD_MISC_CPUMODE_UNKNOWN))
+		return 0;
 
-	down_read(&dsos->lock);
-	list_for_each_entry(dso, &dsos->head, node) {
-		/*
-		 * The cpumode passed to is_kernel_module is not the cpumode of
-		 * *this* event. If we insist on passing correct cpumode to
-		 * is_kernel_module, we should record the cpumode when we adding
-		 * this dso to the linked list.
-		 *
-		 * However we don't really need passing correct cpumode.  We
-		 * know the correct cpumode must be kernel mode (if not, we
-		 * should not link it onto kernel_dsos list).
-		 *
-		 * Therefore, we pass PERF_RECORD_MISC_CPUMODE_UNKNOWN.
-		 * is_kernel_module() treats it as a kernel cpumode.
-		 */
-		if (!dso->kernel ||
-		    is_kernel_module(dso->long_name,
-				     PERF_RECORD_MISC_CPUMODE_UNKNOWN))
-			continue;
+	*res = dso__get(dso);
+	return 1;
+}
 
-		res = dso__get(dso);
-		break;
-	}
-	up_read(&dsos->lock);
+struct dso *dsos__find_kernel_dso(struct dsos *dsos)
+{
+	struct dso *res = NULL;
+
+	dsos__for_each_dso(dsos, dsos__find_kernel_dso_cb, &res);
 	return res;
 }
 
 int dsos__for_each_dso(struct dsos *dsos, int (*cb)(struct dso *dso, void *data), void *data)
 {
-	struct dso *dso;
-	int err = 0;
+	int err;
 
 	down_read(&dsos->lock);
-	list_for_each_entry(dso, &dsos->head, node) {
-		err = cb(dso, data);
-
-		if (err)
-			break;
-	}
+	err = __dsos__for_each_dso(dsos, cb, data);
 	up_read(&dsos->lock);
 	return err;
 }
diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h
index 317a263f0e37..50bd51523475 100644
--- a/tools/perf/util/dsos.h
+++ b/tools/perf/util/dsos.h
@@ -33,16 +33,16 @@ struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short);
 
 struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id);
  
-bool __dsos__read_build_ids(struct dsos *dsos, bool with_hits);
+bool dsos__read_build_ids(struct dsos *dsos, bool with_hits);
 
 struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso *dso,
 						const char *name, struct dso_id *id);
 
-size_t __dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
+size_t dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
 			       bool (skip)(struct dso *dso, int parm), int parm);
-size_t __dsos__fprintf(struct dsos *dsos, FILE *fp);
+size_t dsos__fprintf(struct dsos *dsos, FILE *fp);
 
-int __dsos__hit_all(struct dsos *dsos);
+int dsos__hit_all(struct dsos *dsos);
 
 struct dso *dsos__findnew_module_dso(struct dsos *dsos, struct machine *machine,
 				     struct kmod_path *m, const char *filename);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6505f8c2cecc..3646d4593502 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -853,11 +853,11 @@ static struct map *machine__addnew_module_map(struct machine *machine, u64 start
 size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
 {
 	struct rb_node *nd;
-	size_t ret = __dsos__fprintf(&machines->host.dsos, fp);
+	size_t ret = dsos__fprintf(&machines->host.dsos, fp);
 
 	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
-		ret += __dsos__fprintf(&pos->dsos, fp);
+		ret += dsos__fprintf(&pos->dsos, fp);
 	}
 
 	return ret;
@@ -866,7 +866,7 @@ size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
 size_t machine__fprintf_dsos_buildid(struct machine *m, FILE *fp,
 				     bool (skip)(struct dso *dso, int parm), int parm)
 {
-	return __dsos__fprintf_buildid(&m->dsos, fp, skip, parm);
+	return dsos__fprintf_buildid(&m->dsos, fp, skip, parm);
 }
 
 size_t machines__fprintf_dsos_buildid(struct machines *machines, FILE *fp,
@@ -3204,5 +3204,5 @@ bool machine__is_lock_function(struct machine *machine, u64 addr)
 
 int machine__hit_all_dsos(struct machine *machine)
 {
-	return __dsos__hit_all(&machine->dsos);
+	return dsos__hit_all(&machine->dsos);
 }
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 46/53] perf dsos: Switch backing storage to array from rbtree/list
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (44 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 45/53] perf dsos: Switch more loops to dsos__for_each_dso Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 47/53] perf dsos: Remove __dsos__addnew Ian Rogers
                   ` (6 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

DSOs were held on a list for fast iteration and in an rbtree for fast
finds. Switch to using a lazily sorted array where iteration is just
iterating through the array and binary searches are the same
complexity as searching the rbtree. The find may need to sort the
array first which does increase the complexity, but add operations
have lower complexity and overall the complexity should remain about
the same.

The set name operations on the dso now just record that the array is no
longer sorted, avoiding the complexity of rebalancing the rbtree.
Tighter locking discipline is enforced so that the array cannot be
re-sorted while long and short names or ids are being changed.

The array is smaller in size, replacing 6 pointers with 2, so even
though the over-allocated array may be up to 50% unoccupied, the memory
saving should be at least 2x.
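
For readers unfamiliar with the pattern, a minimal standalone C sketch
of a lazily sorted array follows (illustrative names only, not the
patch's code; the real implementation guards the sort and renames with
the dsos rw_semaphore and compares long name, id and short name):

  #include <stdbool.h>
  #include <stdlib.h>
  #include <string.h>

  struct entry { const char *name; };

  struct lazy_array {
  	struct entry **entries;
  	unsigned int cnt;
  	bool sorted;	/* cleared by renames and out-of-order adds */
  };

  static int cmp_entry(const void *va, const void *vb)
  {
  	const struct entry *a = *(const struct entry **)va;
  	const struct entry *b = *(const struct entry **)vb;

  	return strcmp(a->name, b->name);
  }

  static int cmp_key(const void *vkey, const void *ventry)
  {
  	const struct entry *e = *(const struct entry **)ventry;

  	return strcmp(vkey, e->name);
  }

  /* Renaming just marks the array unsorted, no rebalancing needed. */
  static void entry__rename(struct lazy_array *arr, struct entry *e,
  			  const char *name)
  {
  	e->name = name;
  	arr->sorted = false;
  }

  /* Finds sort on demand, then binary search in O(log n). */
  static struct entry *lazy_array__find(struct lazy_array *arr,
  				      const char *name)
  {
  	struct entry **res;

  	if (!arr->sorted) {
  		qsort(arr->entries, arr->cnt, sizeof(*arr->entries),
  		      cmp_entry);
  		arr->sorted = true;
  	}
  	res = bsearch(name, arr->entries, arr->cnt,
  		      sizeof(*arr->entries), cmp_key);
  	return res ? *res : NULL;
  }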

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dso.c  |  67 +++++++++------
 tools/perf/util/dso.h  |  10 +--
 tools/perf/util/dsos.c | 188 ++++++++++++++++++++++++++---------------
 tools/perf/util/dsos.h |  21 +++--
 4 files changed, 177 insertions(+), 109 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 1b0990507a42..66dc929443ba 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1240,35 +1240,35 @@ struct dso *machine__findnew_kernel(struct machine *machine, const char *name,
 	return dso;
 }
 
-static void dso__set_long_name_id(struct dso *dso, const char *name, struct dso_id *id, bool name_allocated)
+static void dso__set_long_name_id(struct dso *dso, const char *name, bool name_allocated)
 {
-	struct rb_root *root = dso->root;
+	struct dsos *dsos = dso->dsos;
 
 	if (name == NULL)
 		return;
 
-	if (dso->long_name_allocated)
-		free((char *)dso->long_name);
-
-	if (root) {
-		rb_erase(&dso->rb_node, root);
+	if (dsos) {
 		/*
-		 * __dsos__findnew_link_by_longname_id() isn't guaranteed to
-		 * add it back, so a clean removal is required here.
+		 * Need to avoid the dsos array being re-sorted while the
+		 * dso is non-atomically renamed.
 		 */
-		RB_CLEAR_NODE(&dso->rb_node);
-		dso->root = NULL;
+		down_write(&dsos->lock);
 	}
 
+	if (dso->long_name_allocated)
+		free((char *)dso->long_name);
+
 	dso->long_name		 = name;
 	dso->long_name_len	 = strlen(name);
 	dso->long_name_allocated = name_allocated;
 
-	if (root)
-		__dsos__findnew_link_by_longname_id(root, dso, NULL, id);
+	if (dsos) {
+		dsos->sorted = false;
+		up_write(&dsos->lock);
+	}
 }
 
-static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
+static int __dso_id__cmp(const struct dso_id *a, const struct dso_id *b)
 {
 	if (a->maj > b->maj) return -1;
 	if (a->maj < b->maj) return 1;
@@ -1296,7 +1296,7 @@ static int __dso_id__cmp(struct dso_id *a, struct dso_id *b)
 	return 0;
 }
 
-bool dso_id__empty(struct dso_id *id)
+bool dso_id__empty(const struct dso_id *id)
 {
 	if (!id)
 		return true;
@@ -1304,15 +1304,22 @@ bool dso_id__empty(struct dso_id *id)
 	return !id->maj && !id->min && !id->ino && !id->ino_generation;
 }
 
-void dso__inject_id(struct dso *dso, struct dso_id *id)
+void __dso__inject_id(struct dso *dso, struct dso_id *id)
 {
+	struct dsos *dsos = dso->dsos;
+
+	/* dsos write lock held by caller. */
+
 	dso->id.maj = id->maj;
 	dso->id.min = id->min;
 	dso->id.ino = id->ino;
 	dso->id.ino_generation = id->ino_generation;
+
+	if (dsos)
+		dsos->sorted = false;
 }
 
-int dso_id__cmp(struct dso_id *a, struct dso_id *b)
+int dso_id__cmp(const struct dso_id *a, const struct dso_id *b)
 {
 	/*
 	 * The second is always dso->id, so zeroes if not set, assume passing
@@ -1331,20 +1338,34 @@ int dso__cmp_id(struct dso *a, struct dso *b)
 
 void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
 {
-	dso__set_long_name_id(dso, name, NULL, name_allocated);
+	dso__set_long_name_id(dso, name, name_allocated);
 }
 
 void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated)
 {
+	struct dsos *dsos = dso->dsos;
+
 	if (name == NULL)
 		return;
 
+	if (dsos) {
+		/*
+		 * Need to avoid the dsos array being re-sorted while the
+		 * dso is non-atomically renamed.
+		 */
+		down_write(&dsos->lock);
+	}
 	if (dso->short_name_allocated)
 		free((char *)dso->short_name);
 
 	dso->short_name		  = name;
 	dso->short_name_len	  = strlen(name);
 	dso->short_name_allocated = name_allocated;
+
+	if (dsos) {
+		dsos->sorted = false;
+		up_write(&dsos->lock);
+	}
 }
 
 int dso__name_len(const struct dso *dso)
@@ -1380,7 +1401,7 @@ struct dso *dso__new_id(const char *name, struct dso_id *id)
 		strcpy(dso->name, name);
 		if (id)
 			dso->id = *id;
-		dso__set_long_name_id(dso, dso->name, id, false);
+		dso__set_long_name_id(dso, dso->name, false);
 		dso__set_short_name(dso, dso->name, false);
 		dso->symbols = RB_ROOT_CACHED;
 		dso->symbol_names = NULL;
@@ -1403,9 +1424,6 @@ struct dso *dso__new_id(const char *name, struct dso_id *id)
 		dso->is_kmod = 0;
 		dso->needs_swap = DSO_SWAP__UNSET;
 		dso->comp = COMP_ID__NONE;
-		RB_CLEAR_NODE(&dso->rb_node);
-		dso->root = NULL;
-		INIT_LIST_HEAD(&dso->node);
 		INIT_LIST_HEAD(&dso->data.open_entry);
 		mutex_init(&dso->lock);
 		refcount_set(&dso->refcnt, 1);
@@ -1421,9 +1439,8 @@ struct dso *dso__new(const char *name)
 
 void dso__delete(struct dso *dso)
 {
-	if (!RB_EMPTY_NODE(&dso->rb_node))
-		pr_err("DSO %s is still in rbtree when being deleted!\n",
-		       dso->long_name);
+	if (dso->dsos)
+		pr_err("DSO %s is still in dsos when being deleted!\n", dso->long_name);
 
 	/* free inlines first, as they reference symbols */
 	inlines__tree_delete(&dso->inlined_nodes);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 1b247eeaa81e..fd500583cd2e 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -146,9 +146,7 @@ struct auxtrace_cache;
 
 struct dso {
 	struct mutex	 lock;
-	struct list_head node;
-	struct rb_node	 rb_node;	/* rbtree node sorted by long name */
-	struct rb_root	 *root;		/* root of rbtree that rb_node is in */
+	struct dsos	 *dsos;
 	struct rb_root_cached symbols;
 	struct symbol	 **symbol_names;
 	size_t		 symbol_names_len;
@@ -235,8 +233,8 @@ static inline void dso__set_loaded(struct dso *dso)
 	dso->loaded = true;
 }
 
-int dso_id__cmp(struct dso_id *a, struct dso_id *b);
-bool dso_id__empty(struct dso_id *id);
+int dso_id__cmp(const struct dso_id *a, const struct dso_id *b);
+bool dso_id__empty(const struct dso_id *id);
 
 struct dso *dso__new_id(const char *name, struct dso_id *id);
 struct dso *dso__new(const char *name);
@@ -245,7 +243,7 @@ void dso__delete(struct dso *dso);
 int dso__cmp_id(struct dso *a, struct dso *b);
 void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated);
 void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated);
-void dso__inject_id(struct dso *dso, struct dso_id *id);
+void __dso__inject_id(struct dso *dso, struct dso_id *id);
 
 int dso__name_len(const struct dso *dso);
 
diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index b7fbfb877ae3..cfc10e1a6802 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -14,24 +14,30 @@
 
 void dsos__init(struct dsos *dsos)
 {
-	INIT_LIST_HEAD(&dsos->head);
-	dsos->root = RB_ROOT;
 	init_rwsem(&dsos->lock);
+
+	dsos->cnt = 0;
+	dsos->allocated = 0;
+	dsos->dsos = NULL;
+	dsos->sorted = true;
 }
 
 static void dsos__purge(struct dsos *dsos)
 {
-	struct dso *pos, *n;
-
 	down_write(&dsos->lock);
 
-	list_for_each_entry_safe(pos, n, &dsos->head, node) {
-		RB_CLEAR_NODE(&pos->rb_node);
-		pos->root = NULL;
-		list_del_init(&pos->node);
-		dso__put(pos);
+	for (unsigned int i = 0; i < dsos->cnt; i++) {
+		struct dso *dso = dsos->dsos[i];
+
+		dso->dsos = NULL;
+		dso__put(dso);
 	}
 
+	zfree(&dsos->dsos);
+	dsos->cnt = 0;
+	dsos->allocated = 0;
+	dsos->sorted = true;
+
 	up_write(&dsos->lock);
 }
 
@@ -46,9 +52,8 @@ static int __dsos__for_each_dso(struct dsos *dsos,
 				int (*cb)(struct dso *dso, void *data),
 				void *data)
 {
-	struct dso *dso;
-
-	list_for_each_entry(dso, &dsos->head, node) {
+	for (unsigned int i = 0; i < dsos->cnt; i++) {
+		struct dso *dso = dsos->dsos[i];
 		int err;
 
 		err = cb(dso, data);
@@ -119,16 +124,47 @@ static int dso__cmp_short_name(struct dso *a, struct dso *b)
 	return __dso__cmp_short_name(a->short_name, &a->id, b);
 }
 
+static int dsos__cmp_long_name_id_short_name(const void *va, const void *vb)
+{
+	const struct dso *a = *((const struct dso **)va);
+	const struct dso *b = *((const struct dso **)vb);
+	int rc = strcmp(a->long_name, b->long_name);
+
+	if (!rc) {
+		rc = dso_id__cmp(&a->id, &b->id);
+		if (!rc)
+			rc = strcmp(a->short_name, b->short_name);
+	}
+	return rc;
+}
+
 /*
  * Find a matching entry and/or link current entry to RB tree.
  * Either one of the dso or name parameter must be non-NULL or the
  * function will not work.
  */
-struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso *dso,
-						const char *name, struct dso_id *id)
+struct dso *__dsos__findnew_link_by_longname_id(struct dsos *dsos,
+						struct dso *dso,
+						const char *name,
+						struct dso_id *id,
+						bool write_locked)
 {
-	struct rb_node **p = &root->rb_node;
-	struct rb_node  *parent = NULL;
+	int low = 0, high = dsos->cnt - 1;
+
+	if (!dsos->sorted) {
+		if (!write_locked) {
+			up_read(&dsos->lock);
+			down_write(&dsos->lock);
+			dso = __dsos__findnew_link_by_longname_id(dsos, dso, name, id,
+								  /*write_locked=*/true);
+			up_write(&dsos->lock);
+			down_read(&dsos->lock);
+			return dso;
+		}
+		qsort(dsos->dsos, dsos->cnt, sizeof(struct dso *),
+		      dsos__cmp_long_name_id_short_name);
+		dsos->sorted = true;
+	}
 
 	if (!name)
 		name = dso->long_name;
@@ -136,11 +172,11 @@ struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso
 	/*
 	 * Find node with the matching name
 	 */
-	while (*p) {
-		struct dso *this = rb_entry(*p, struct dso, rb_node);
+	while (low <= high) {
+		int mid = (low + high) / 2;
+		struct dso *this = dsos->dsos[mid];
 		int rc = __dso__cmp_long_name(name, id, this);
 
-		parent = *p;
 		if (rc == 0) {
 			/*
 			 * In case the new DSO is a duplicate of an existing
@@ -161,56 +197,53 @@ struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso
 			}
 		}
 		if (rc < 0)
-			p = &parent->rb_left;
+			high = mid - 1;
 		else
-			p = &parent->rb_right;
-	}
-	if (dso) {
-		/* Add new node and rebalance tree */
-		rb_link_node(&dso->rb_node, parent, p);
-		rb_insert_color(&dso->rb_node, root);
-		dso->root = root;
+			low = mid + 1;
 	}
+	if (dso)
+		__dsos__add(dsos, dso);
 	return NULL;
 }
 
-void __dsos__add(struct dsos *dsos, struct dso *dso)
+int __dsos__add(struct dsos *dsos, struct dso *dso)
 {
-	list_add_tail(&dso->node, &dsos->head);
-	__dsos__findnew_link_by_longname_id(&dsos->root, dso, NULL, &dso->id);
-	/*
-	 * It is now in the linked list, grab a reference, then garbage collect
-	 * this when needing memory, by looking at LRU dso instances in the
-	 * list with atomic_read(&dso->refcnt) == 1, i.e. no references
-	 * anywhere besides the one for the list, do, under a lock for the
-	 * list: remove it from the list, then a dso__put(), that probably will
-	 * be the last and will then call dso__delete(), end of life.
-	 *
-	 * That, or at the end of the 'struct machine' lifetime, when all
-	 * 'struct dso' instances will be removed from the list, in
-	 * dsos__exit(), if they have no other reference from some other data
-	 * structure.
-	 *
-	 * E.g.: after processing a 'perf.data' file and storing references
-	 * to objects instantiated while processing events, we will have
-	 * references to the 'thread', 'map', 'dso' structs all from 'struct
-	 * hist_entry' instances, but we may not need anything not referenced,
-	 * so we might as well call machines__exit()/machines__delete() and
-	 * garbage collect it.
-	 */
-	dso__get(dso);
+	if (dsos->cnt == dsos->allocated) {
+		unsigned int to_allocate = 2;
+		struct dso **temp;
+
+		if (dsos->allocated > 0)
+			to_allocate = dsos->allocated * 2;
+		temp = realloc(dsos->dsos, sizeof(struct dso *) * to_allocate);
+		if (!temp)
+			return -ENOMEM;
+		dsos->dsos = temp;
+		dsos->allocated = to_allocate;
+	}
+	dsos->dsos[dsos->cnt++] = dso__get(dso);
+	if (dsos->cnt >= 2 && dsos->sorted) {
+		dsos->sorted = dsos__cmp_long_name_id_short_name(&dsos->dsos[dsos->cnt - 2],
+								 &dsos->dsos[dsos->cnt - 1])
+			<= 0;
+	}
+	dso->dsos = dsos;
+	return 0;
 }
 
-void dsos__add(struct dsos *dsos, struct dso *dso)
+int dsos__add(struct dsos *dsos, struct dso *dso)
 {
+	int ret;
+
 	down_write(&dsos->lock);
-	__dsos__add(dsos, dso);
+	ret = __dsos__add(dsos, dso);
 	up_write(&dsos->lock);
+	return ret;
 }
 
-static struct dso *__dsos__findnew_by_longname_id(struct rb_root *root, const char *name, struct dso_id *id)
+static struct dso *__dsos__findnew_by_longname_id(struct dsos *dsos, const char *name,
+						struct dso_id *id, bool write_locked)
 {
-	return __dsos__findnew_link_by_longname_id(root, NULL, name, id);
+	return __dsos__findnew_link_by_longname_id(dsos, NULL, name, id, write_locked);
 }
 
 struct dsos__find_id_cb_args {
@@ -231,7 +264,8 @@ static int dsos__find_id_cb(struct dso *dso, void *data)
 
 }
 
-static struct dso *__dsos__find_id(struct dsos *dsos, const char *name, struct dso_id *id, bool cmp_short)
+static struct dso *__dsos__find_id(struct dsos *dsos, const char *name, struct dso_id *id,
+				   bool cmp_short, bool write_locked)
 {
 	struct dso *res;
 
@@ -245,7 +279,7 @@ static struct dso *__dsos__find_id(struct dsos *dsos, const char *name, struct d
 		__dsos__for_each_dso(dsos, dsos__find_id_cb, &args);
 		return args.res;
 	}
-	res = __dsos__findnew_by_longname_id(&dsos->root, name, id);
+	res = __dsos__findnew_by_longname_id(dsos, name, id, write_locked);
 	return res;
 }
 
@@ -254,7 +288,7 @@ struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short)
 	struct dso *res;
 
 	down_read(&dsos->lock);
-	res = __dsos__find_id(dsos, name, NULL, cmp_short);
+	res = __dsos__find_id(dsos, name, NULL, cmp_short, /*write_locked=*/false);
 	up_read(&dsos->lock);
 	return res;
 }
@@ -296,8 +330,13 @@ static struct dso *__dsos__addnew_id(struct dsos *dsos, const char *name, struct
 	struct dso *dso = dso__new_id(name, id);
 
 	if (dso != NULL) {
-		__dsos__add(dsos, dso);
+		/*
+		 * The dsos lock is held on entry, so rename the dso before
+		 * adding it, to avoid needing to take the dsos lock again to
+		 * mark the array as unsorted.
+		 */
 		dso__set_basename(dso);
+		__dsos__add(dsos, dso);
 	}
 	return dso;
 }
@@ -309,10 +348,10 @@ struct dso *__dsos__addnew(struct dsos *dsos, const char *name)
 
 static struct dso *__dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id)
 {
-	struct dso *dso = __dsos__find_id(dsos, name, id, false);
+	struct dso *dso = __dsos__find_id(dsos, name, id, false, /*write_locked=*/true);
 
 	if (dso && dso_id__empty(&dso->id) && !dso_id__empty(id))
-		dso__inject_id(dso, id);
+		__dso__inject_id(dso, id);
 
 	return dso ? dso : __dsos__addnew_id(dsos, name, id);
 }
@@ -403,18 +442,27 @@ struct dso *dsos__findnew_module_dso(struct dsos *dsos,
 
 	down_write(&dsos->lock);
 
-	dso = __dsos__find_id(dsos, m->name, NULL, /*cmp_short=*/true);
+	dso = __dsos__find_id(dsos, m->name, NULL, /*cmp_short=*/true, /*write_locked=*/true);
+	if (dso) {
+		up_write(&dsos->lock);
+		return dso;
+	}
+	/*
+	 * Failed to find the dso so create it. Change the name before adding it
+	 * to the array, to avoid unnecessary sorts and potential locking
+	 * issues.
+	 */
+	dso = dso__new_id(m->name, /*id=*/NULL);
 	if (!dso) {
-		dso = __dsos__addnew(dsos, m->name);
-		if (dso == NULL)
-			goto out_unlock;
-
-		dso__set_module_info(dso, m, machine);
-		dso__set_long_name(dso, strdup(filename), true);
-		dso->kernel = DSO_SPACE__KERNEL;
+		up_write(&dsos->lock);
+		return NULL;
 	}
+	dso__set_basename(dso);
+	dso__set_module_info(dso, m, machine);
+	dso__set_long_name(dso, strdup(filename), true);
+	dso->kernel = DSO_SPACE__KERNEL;
+	__dsos__add(dsos, dso);
 
-out_unlock:
 	up_write(&dsos->lock);
 	return dso;
 }
diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h
index 50bd51523475..c1b3979ad4bd 100644
--- a/tools/perf/util/dsos.h
+++ b/tools/perf/util/dsos.h
@@ -14,20 +14,22 @@ struct kmod_path;
 struct machine;
 
 /*
- * DSOs are put into both a list for fast iteration and rbtree for fast
- * long name lookup.
+ * Collection of DSOs as an array for iteration speed, but sorted for
+ * O(log n) lookup.
  */
 struct dsos {
-	struct list_head    head;
-	struct rb_root	    root;	/* rbtree root sorted by long name */
 	struct rw_semaphore lock;
+	struct dso **dsos;
+	unsigned int cnt;
+	unsigned int allocated;
+	bool sorted;
 };
 
 void dsos__init(struct dsos *dsos);
 void dsos__exit(struct dsos *dsos);
 
-void __dsos__add(struct dsos *dsos, struct dso *dso);
-void dsos__add(struct dsos *dsos, struct dso *dso);
+int __dsos__add(struct dsos *dsos, struct dso *dso);
+int dsos__add(struct dsos *dsos, struct dso *dso);
 struct dso *__dsos__addnew(struct dsos *dsos, const char *name);
 struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short);
 
@@ -35,8 +37,11 @@ struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id
  
 bool dsos__read_build_ids(struct dsos *dsos, bool with_hits);
 
-struct dso *__dsos__findnew_link_by_longname_id(struct rb_root *root, struct dso *dso,
-						const char *name, struct dso_id *id);
+struct dso *__dsos__findnew_link_by_longname_id(struct dsos *dsos,
+						struct dso *dso,
+						const char *name,
+						struct dso_id *id,
+						bool write_locked);
 
 size_t dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
 			       bool (skip)(struct dso *dso, int parm), int parm);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 47/53] perf dsos: Remove __dsos__addnew
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (45 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 46/53] perf dsos: Switch backing storage to array from rbtree/list Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 48/53] perf dsos: Remove __dsos__findnew_link_by_longname_id Ian Rogers
                   ` (5 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

The function is no longer used, so remove it.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dsos.c | 5 -----
 tools/perf/util/dsos.h | 1 -
 2 files changed, 6 deletions(-)

diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index cfc10e1a6802..1495ab1cd7a0 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -341,11 +341,6 @@ static struct dso *__dsos__addnew_id(struct dsos *dsos, const char *name, struct
 	return dso;
 }
 
-struct dso *__dsos__addnew(struct dsos *dsos, const char *name)
-{
-	return __dsos__addnew_id(dsos, name, NULL);
-}
-
 static struct dso *__dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id)
 {
 	struct dso *dso = __dsos__find_id(dsos, name, id, false, /*write_locked=*/true);
diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h
index c1b3979ad4bd..d1497b11d64c 100644
--- a/tools/perf/util/dsos.h
+++ b/tools/perf/util/dsos.h
@@ -30,7 +30,6 @@ void dsos__exit(struct dsos *dsos);
 
 int __dsos__add(struct dsos *dsos, struct dso *dso);
 int dsos__add(struct dsos *dsos, struct dso *dso);
-struct dso *__dsos__addnew(struct dsos *dsos, const char *name);
 struct dso *dsos__find(struct dsos *dsos, const char *name, bool cmp_short);
 
 struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id *id);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 48/53] perf dsos: Remove __dsos__findnew_link_by_longname_id
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (46 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 47/53] perf dsos: Remove __dsos__addnew Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 49/53] perf dsos: Switch hand code to bsearch Ian Rogers
                   ` (4 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

The function was only called in dsos.c with the dso parameter as
NULL. Remove the function and specialize it for the dso-is-NULL case,
removing other unused functions along the way.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dsos.c | 51 +++++++++---------------------------------
 tools/perf/util/dsos.h |  6 -----
 2 files changed, 10 insertions(+), 47 deletions(-)

diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index 1495ab1cd7a0..e4110438841b 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -119,11 +119,6 @@ static int __dso__cmp_short_name(const char *short_name, struct dso_id *id, stru
 	return rc ?: dso_id__cmp(id, &b->id);
 }
 
-static int dso__cmp_short_name(struct dso *a, struct dso *b)
-{
-	return __dso__cmp_short_name(a->short_name, &a->id, b);
-}
-
 static int dsos__cmp_long_name_id_short_name(const void *va, const void *vb)
 {
 	const struct dso *a = *((const struct dso **)va);
@@ -143,20 +138,21 @@ static int dsos__cmp_long_name_id_short_name(const void *va, const void *vb)
  * Either one of the dso or name parameter must be non-NULL or the
  * function will not work.
  */
-struct dso *__dsos__findnew_link_by_longname_id(struct dsos *dsos,
-						struct dso *dso,
-						const char *name,
-						struct dso_id *id,
-						bool write_locked)
+static struct dso *__dsos__find_by_longname_id(struct dsos *dsos,
+					       const char *name,
+					       struct dso_id *id,
+					       bool write_locked)
 {
 	int low = 0, high = dsos->cnt - 1;
 
 	if (!dsos->sorted) {
 		if (!write_locked) {
+			struct dso *dso;
+
 			up_read(&dsos->lock);
 			down_write(&dsos->lock);
-			dso = __dsos__findnew_link_by_longname_id(dsos, dso, name, id,
-								  /*write_locked=*/true);
+			dso = __dsos__find_by_longname_id(dsos, name, id,
+							  /*write_locked=*/true);
 			up_write(&dsos->lock);
 			down_read(&dsos->lock);
 			return dso;
@@ -166,9 +162,6 @@ struct dso *__dsos__findnew_link_by_longname_id(struct dsos *dsos,
 		dsos->sorted = true;
 	}
 
-	if (!name)
-		name = dso->long_name;
-
 	/*
 	 * Find node with the matching name
 	 */
@@ -178,31 +171,13 @@ struct dso *__dsos__findnew_link_by_longname_id(struct dsos *dsos,
 		int rc = __dso__cmp_long_name(name, id, this);
 
 		if (rc == 0) {
-			/*
-			 * In case the new DSO is a duplicate of an existing
-			 * one, print a one-time warning & put the new entry
-			 * at the end of the list of duplicates.
-			 */
-			if (!dso || (dso == this))
-				return dso__get(this);	/* Find matching dso */
-			/*
-			 * The core kernel DSOs may have duplicated long name.
-			 * In this case, the short name should be different.
-			 * Comparing the short names to differentiate the DSOs.
-			 */
-			rc = dso__cmp_short_name(dso, this);
-			if (rc == 0) {
-				pr_err("Duplicated dso name: %s\n", name);
-				return NULL;
-			}
+			return dso__get(this);	/* Find matching dso */
 		}
 		if (rc < 0)
 			high = mid - 1;
 		else
 			low = mid + 1;
 	}
-	if (dso)
-		__dsos__add(dsos, dso);
 	return NULL;
 }
 
@@ -240,12 +215,6 @@ int dsos__add(struct dsos *dsos, struct dso *dso)
 	return ret;
 }
 
-static struct dso *__dsos__findnew_by_longname_id(struct dsos *dsos, const char *name,
-						struct dso_id *id, bool write_locked)
-{
-	return __dsos__findnew_link_by_longname_id(dsos, NULL, name, id, write_locked);
-}
-
 struct dsos__find_id_cb_args {
 	const char *name;
 	struct dso_id *id;
@@ -279,7 +248,7 @@ static struct dso *__dsos__find_id(struct dsos *dsos, const char *name, struct d
 		__dsos__for_each_dso(dsos, dsos__find_id_cb, &args);
 		return args.res;
 	}
-	res = __dsos__findnew_by_longname_id(dsos, name, id, write_locked);
+	res = __dsos__find_by_longname_id(dsos, name, id, write_locked);
 	return res;
 }
 
diff --git a/tools/perf/util/dsos.h b/tools/perf/util/dsos.h
index d1497b11d64c..6c13b65648bc 100644
--- a/tools/perf/util/dsos.h
+++ b/tools/perf/util/dsos.h
@@ -36,12 +36,6 @@ struct dso *dsos__findnew_id(struct dsos *dsos, const char *name, struct dso_id
  
 bool dsos__read_build_ids(struct dsos *dsos, bool with_hits);
 
-struct dso *__dsos__findnew_link_by_longname_id(struct dsos *dsos,
-						struct dso *dso,
-						const char *name,
-						struct dso_id *id,
-						bool write_locked);
-
 size_t dsos__fprintf_buildid(struct dsos *dsos, FILE *fp,
 			       bool (skip)(struct dso *dso, int parm), int parm);
 size_t dsos__fprintf(struct dsos *dsos, FILE *fp);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 49/53] perf dsos: Switch hand code to bsearch
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (47 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 48/53] perf dsos: Remove __dsos__findnew_link_by_longname_id Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 50/53] perf dso: Add reference count checking and accessor functions Ian Rogers
                   ` (3 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Switch to using the bsearch library function rather than a hand-written
binary search. Const-ify some static functions to avoid compiler
warnings.
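
As a standalone illustration of the technique (hypothetical names, not
the patch's code): bsearch() allows the key to have a different type
from the array elements, so a small key struct can carry the lookup
fields while each element stays a pointer:

  #include <stdlib.h>
  #include <string.h>

  struct item { const char *name; int id; };

  struct item_key { const char *name; int id; };

  static int cmp_key_item(const void *vkey, const void *vitem)
  {
  	const struct item_key *key = vkey;
  	const struct item *item = *(const struct item **)vitem;
  	int rc = strcmp(key->name, item->name);

  	if (rc)
  		return rc;
  	return (key->id > item->id) - (key->id < item->id);
  }

  /* items must already be sorted by an ordering that agrees with
   * cmp_key_item, as qsort and bsearch only meet through the order. */
  static struct item *lookup(struct item **items, size_t cnt,
  			   const char *name, int id)
  {
  	struct item_key key = { .name = name, .id = id };
  	struct item **res = bsearch(&key, items, cnt, sizeof(*items),
  				    cmp_key_item);

  	return res ? *res : NULL;
  }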

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/dsos.c | 46 +++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index e4110438841b..23c3fe4f2abb 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -107,13 +107,15 @@ bool dsos__read_build_ids(struct dsos *dsos, bool with_hits)
 	return args.have_build_id;
 }
 
-static int __dso__cmp_long_name(const char *long_name, struct dso_id *id, struct dso *b)
+static int __dso__cmp_long_name(const char *long_name, const struct dso_id *id,
+				const struct dso *b)
 {
 	int rc = strcmp(long_name, b->long_name);
 	return rc ?: dso_id__cmp(id, &b->id);
 }
 
-static int __dso__cmp_short_name(const char *short_name, struct dso_id *id, struct dso *b)
+static int __dso__cmp_short_name(const char *short_name, const struct dso_id *id,
+				 const struct dso *b)
 {
 	int rc = strcmp(short_name, b->short_name);
 	return rc ?: dso_id__cmp(id, &b->id);
@@ -133,6 +135,19 @@ static int dsos__cmp_long_name_id_short_name(const void *va, const void *vb)
 	return rc;
 }
 
+struct dsos__key {
+	const char *long_name;
+	const struct dso_id *id;
+};
+
+static int dsos__cmp_key_long_name_id(const void *vkey, const void *vdso)
+{
+	const struct dsos__key *key = vkey;
+	const struct dso *dso = *((const struct dso **)vdso);
+
+	return __dso__cmp_long_name(key->long_name, key->id, dso);
+}
+
 /*
  * Find a matching entry and/or link current entry to RB tree.
  * Either one of the dso or name parameter must be non-NULL or the
@@ -143,7 +158,11 @@ static struct dso *__dsos__find_by_longname_id(struct dsos *dsos,
 					       struct dso_id *id,
 					       bool write_locked)
 {
-	int low = 0, high = dsos->cnt - 1;
+	struct dsos__key key = {
+		.long_name = name,
+		.id = id,
+	};
+	struct dso **res;
 
 	if (!dsos->sorted) {
 		if (!write_locked) {
@@ -162,23 +181,12 @@ static struct dso *__dsos__find_by_longname_id(struct dsos *dsos,
 		dsos->sorted = true;
 	}
 
-	/*
-	 * Find node with the matching name
-	 */
-	while (low <= high) {
-		int mid = (low + high) / 2;
-		struct dso *this = dsos->dsos[mid];
-		int rc = __dso__cmp_long_name(name, id, this);
+	res = bsearch(&key, dsos->dsos, dsos->cnt, sizeof(struct dso *),
+		      dsos__cmp_key_long_name_id);
+	if (!res)
+		return NULL;
 
-		if (rc == 0) {
-			return dso__get(this);	/* Find matching dso */
-		}
-		if (rc < 0)
-			high = mid - 1;
-		else
-			low = mid + 1;
-	}
-	return NULL;
+	return dso__get(*res);
 }
 
 int __dsos__add(struct dsos *dsos, struct dso *dso)
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 50/53] perf dso: Add reference count checking and accessor functions
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (48 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 49/53] perf dsos: Switch hand code to bsearch Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 51/53] perf dso: Reference counting related fixes Ian Rogers
                   ` (2 subsequent siblings)
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Add reference count checking to struct dso; this can help with
implementing a correct reference counting discipline. To avoid
RC_CHK_ACCESS everywhere, add accessor functions for the variables in
struct dso.

The majority of the change is mechanical in nature and not easy to
split up.
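
Roughly, the accessor pattern being applied looks like the following
standalone sketch (hypothetical names; the actual RC_CHK_ACCESS in perf
unwraps an indirection object when reference count checking is built
in, so that leak checking can flag mismatched get/put pairs):

  #include <stdbool.h>

  /* With checking disabled the macro is an identity; with checking
   * enabled it would dereference a wrapper around the real object. */
  #define RC_CHK_ACCESS(x) (x)

  struct obj {
  	const char *long_name;
  	bool hit;
  };

  /* Accessors confine RC_CHK_ACCESS to one place per field, so the
   * many call sites stay unchanged if the wrapping ever changes. */
  static inline const char *obj__long_name(const struct obj *o)
  {
  	return RC_CHK_ACCESS(o)->long_name;
  }

  static inline void obj__set_hit(struct obj *o)
  {
  	RC_CHK_ACCESS(o)->hit = true;
  }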

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-annotate.c                 |   6 +-
 tools/perf/builtin-buildid-cache.c            |   2 +-
 tools/perf/builtin-buildid-list.c             |  18 +-
 tools/perf/builtin-inject.c                   |  71 ++-
 tools/perf/builtin-kallsyms.c                 |   2 +-
 tools/perf/builtin-mem.c                      |   4 +-
 tools/perf/builtin-report.c                   |   6 +-
 tools/perf/builtin-script.c                   |   8 +-
 tools/perf/builtin-top.c                      |   4 +-
 tools/perf/tests/code-reading.c               |   8 +-
 tools/perf/tests/dso-data.c                   |  11 +-
 tools/perf/tests/hists_common.c               |   6 +-
 tools/perf/tests/hists_cumulate.c             |   4 +-
 tools/perf/tests/hists_output.c               |   2 +-
 tools/perf/tests/maps.c                       |   4 +-
 tools/perf/tests/symbols.c                    |   2 +-
 tools/perf/tests/vmlinux-kallsyms.c           |   6 +-
 tools/perf/ui/browsers/annotate.c             |   6 +-
 tools/perf/ui/browsers/hists.c                |   8 +-
 tools/perf/ui/browsers/map.c                  |   4 +-
 tools/perf/util/annotate.c                    |  44 +-
 tools/perf/util/auxtrace.c                    |   2 +-
 tools/perf/util/block-info.c                  |   2 +-
 tools/perf/util/bpf-event.c                   |   8 +-
 tools/perf/util/build-id.c                    |  38 +-
 tools/perf/util/callchain.c                   |   2 +-
 tools/perf/util/data-convert-json.c           |   2 +-
 tools/perf/util/db-export.c                   |   6 +-
 tools/perf/util/dlfilter.c                    |  12 +-
 tools/perf/util/dso.c                         | 364 ++++++-------
 tools/perf/util/dso.h                         | 478 ++++++++++++++++--
 tools/perf/util/dsos.c                        |  54 +-
 tools/perf/util/event.c                       |   8 +-
 tools/perf/util/header.c                      |   8 +-
 tools/perf/util/hist.c                        |   4 +-
 tools/perf/util/intel-pt.c                    |  22 +-
 tools/perf/util/machine.c                     |  46 +-
 tools/perf/util/map.c                         |  69 ++-
 tools/perf/util/maps.c                        |  14 +-
 tools/perf/util/probe-event.c                 |  25 +-
 .../scripting-engines/trace-event-python.c    |  21 +-
 tools/perf/util/sort.c                        |  19 +-
 tools/perf/util/srcline.c                     |  65 +--
 tools/perf/util/symbol-elf.c                  |  92 ++--
 tools/perf/util/symbol.c                      | 186 +++----
 tools/perf/util/symbol_fprintf.c              |   4 +-
 tools/perf/util/synthetic-events.c            |  24 +-
 tools/perf/util/thread.c                      |   4 +-
 tools/perf/util/unwind-libunwind-local.c      |  18 +-
 tools/perf/util/unwind-libunwind.c            |   2 +-
 tools/perf/util/vdso.c                        |   8 +-
 51 files changed, 1126 insertions(+), 707 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index aeeb801f1ed7..9e9c90222f0d 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -210,7 +210,7 @@ static int process_branch_callback(struct evsel *evsel,
 	}
 
 	if (a.map != NULL)
-		map__dso(a.map)->hit = 1;
+		dso__set_hit(map__dso(a.map));
 
 	hist__account_cycles(sample->branch_stack, al, sample, false, NULL);
 
@@ -245,7 +245,7 @@ static int evsel__add_sample(struct evsel *evsel, struct perf_sample *sample,
 		if (al->sym != NULL) {
 			struct dso *dso = map__dso(al->map);
 
-			rb_erase_cached(&al->sym->rb_node, &dso->symbols);
+			rb_erase_cached(&al->sym->rb_node, dso__symbols(dso));
 			symbol__delete(al->sym);
 			dso__reset_find_symbol_cache(dso);
 		}
@@ -331,7 +331,7 @@ static void hists__find_annotations(struct hists *hists,
 		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
 		struct annotation *notes;
 
-		if (he->ms.sym == NULL || map__dso(he->ms.map)->annotate_warned)
+		if (he->ms.sym == NULL || dso__annotate_warned(map__dso(he->ms.map)))
 			goto find_next;
 
 		if (ann->sym_hist_filter &&
diff --git a/tools/perf/builtin-buildid-cache.c b/tools/perf/builtin-buildid-cache.c
index e2a40f1d9225..b0511d16aeb6 100644
--- a/tools/perf/builtin-buildid-cache.c
+++ b/tools/perf/builtin-buildid-cache.c
@@ -286,7 +286,7 @@ static bool dso__missing_buildid_cache(struct dso *dso, int parm __maybe_unused)
 
 		pr_warning("Problems with %s file, consider removing it from the cache\n",
 			   filename);
-	} else if (memcmp(dso->bid.data, bid.data, bid.size)) {
+	} else if (memcmp(dso__bid(dso)->data, bid.data, bid.size)) {
 		pr_warning("Problems with %s file, consider removing it from the cache\n",
 			   filename);
 	}
diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index c9037477865a..383d5de36ce4 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -26,16 +26,18 @@ static int buildid__map_cb(struct map *map, void *arg __maybe_unused)
 {
 	const struct dso *dso = map__dso(map);
 	char bid_buf[SBUILD_ID_SIZE];
+	const char *dso_long_name = dso__long_name(dso);
+	const char *dso_short_name = dso__short_name(dso);
 
 	memset(bid_buf, 0, sizeof(bid_buf));
-	if (dso->has_build_id)
-		build_id__sprintf(&dso->bid, bid_buf);
+	if (dso__has_build_id(dso))
+		build_id__sprintf(dso__bid_const(dso), bid_buf);
 	printf("%s %16" PRIx64 " %16" PRIx64, bid_buf, map__start(map), map__end(map));
-	if (dso->long_name != NULL) {
-		printf(" %s", dso->long_name);
-	} else if (dso->short_name != NULL) {
-		printf(" %s", dso->short_name);
-	}
+	if (dso_long_name != NULL)
+		printf(" %s", dso_long_name);
+	else if (dso_short_name != NULL)
+		printf(" %s", dso_short_name);
+
 	printf("\n");
 
 	return 0;
@@ -76,7 +78,7 @@ static int filename__fprintf_build_id(const char *name, FILE *fp)
 
 static bool dso__skip_buildid(struct dso *dso, int with_hits)
 {
-	return with_hits && !dso->hit;
+	return with_hits && !dso__hit(dso);
 }
 
 static int perf_session__list_build_ids(bool force, bool with_hits)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index ce5e28eaad90..a212678d47be 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -445,10 +445,9 @@ static struct dso *findnew_dso(int pid, int tid, const char *filename,
 	}
 
 	if (dso) {
-		mutex_lock(&dso->lock);
-		nsinfo__put(dso->nsinfo);
-		dso->nsinfo = nsi;
-		mutex_unlock(&dso->lock);
+		mutex_lock(dso__lock(dso));
+		dso__set_nsinfo(dso, nsi);
+		mutex_unlock(dso__lock(dso));
 	} else
 		nsinfo__put(nsi);
 
@@ -466,8 +465,8 @@ static int perf_event__repipe_buildid_mmap(struct perf_tool *tool,
 	dso = findnew_dso(event->mmap.pid, event->mmap.tid,
 			  event->mmap.filename, NULL, machine);
 
-	if (dso && !dso->hit) {
-		dso->hit = 1;
+	if (dso && !dso__hit(dso)) {
+		dso__set_hit(dso);
 		dso__inject_build_id(dso, tool, machine, sample->cpumode, 0);
 	}
 	dso__put(dso);
@@ -492,7 +491,7 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool,
 				  event->mmap2.filename, NULL, machine);
 		if (dso) {
 			/* mark it not to inject build-id */
-			dso->hit = 1;
+			dso__set_hit(dso);
 		}
 		dso__put(dso);
 	}
@@ -544,7 +543,7 @@ static int perf_event__repipe_buildid_mmap2(struct perf_tool *tool,
 				  event->mmap2.filename, NULL, machine);
 		if (dso) {
 			/* mark it not to inject build-id */
-			dso->hit = 1;
+			dso__set_hit(dso);
 		}
 		dso__put(dso);
 		perf_event__repipe(tool, event, sample, machine);
@@ -554,8 +553,8 @@ static int perf_event__repipe_buildid_mmap2(struct perf_tool *tool,
 	dso = findnew_dso(event->mmap2.pid, event->mmap2.tid,
 			  event->mmap2.filename, &dso_id, machine);
 
-	if (dso && !dso->hit) {
-		dso->hit = 1;
+	if (dso && !dso__hit(dso)) {
+		dso__set_hit(dso);
 		dso__inject_build_id(dso, tool, machine, sample->cpumode,
 				     event->mmap2.flags);
 	}
@@ -631,24 +630,24 @@ static int dso__read_build_id(struct dso *dso)
 {
 	struct nscookie nsc;
 
-	if (dso->has_build_id)
+	if (dso__has_build_id(dso))
 		return 0;
 
-	mutex_lock(&dso->lock);
-	nsinfo__mountns_enter(dso->nsinfo, &nsc);
-	if (filename__read_build_id(dso->long_name, &dso->bid) > 0)
-		dso->has_build_id = true;
-	else if (dso->nsinfo) {
-		char *new_name = dso__filename_with_chroot(dso, dso->long_name);
+	mutex_lock(dso__lock(dso));
+	nsinfo__mountns_enter(dso__nsinfo(dso), &nsc);
+	if (filename__read_build_id(dso__long_name(dso), dso__bid(dso)) > 0)
+		dso__set_has_build_id(dso);
+	else if (dso__nsinfo(dso)) {
+		char *new_name = dso__filename_with_chroot(dso, dso__long_name(dso));
 
-		if (new_name && filename__read_build_id(new_name, &dso->bid) > 0)
-			dso->has_build_id = true;
+		if (new_name && filename__read_build_id(new_name, dso__bid(dso)) > 0)
+			dso__set_has_build_id(dso);
 		free(new_name);
 	}
 	nsinfo__mountns_exit(&nsc);
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 
-	return dso->has_build_id ? 0 : -1;
+	return dso__has_build_id(dso) ? 0 : -1;
 }
 
 static struct strlist *perf_inject__parse_known_build_ids(
@@ -700,14 +699,14 @@ static bool perf_inject__lookup_known_build_id(struct perf_inject *inject,
 		dso_name = strchr(build_id, ' ');
 		bid_len = dso_name - pos->s;
 		dso_name = skip_spaces(dso_name);
-		if (strcmp(dso->long_name, dso_name))
+		if (strcmp(dso__long_name(dso), dso_name))
 			continue;
 		for (int ix = 0; 2 * ix + 1 < bid_len; ++ix) {
-			dso->bid.data[ix] = (hex(build_id[2 * ix]) << 4 |
-					     hex(build_id[2 * ix + 1]));
+			dso__bid(dso)->data[ix] = (hex(build_id[2 * ix]) << 4 |
+						  hex(build_id[2 * ix + 1]));
 		}
-		dso->bid.size = bid_len / 2;
-		dso->has_build_id = 1;
+		dso__bid(dso)->size = bid_len / 2;
+		dso__set_has_build_id(dso);
 		return true;
 	}
 	return false;
@@ -720,9 +719,9 @@ static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
 						  tool);
 	int err;
 
-	if (is_anon_memory(dso->long_name) || flags & MAP_HUGETLB)
+	if (is_anon_memory(dso__long_name(dso)) || flags & MAP_HUGETLB)
 		return 0;
-	if (is_no_dso_memory(dso->long_name))
+	if (is_no_dso_memory(dso__long_name(dso)))
 		return 0;
 
 	if (inject->known_build_ids != NULL &&
@@ -730,14 +729,14 @@ static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
 		return 1;
 
 	if (dso__read_build_id(dso) < 0) {
-		pr_debug("no build_id found for %s\n", dso->long_name);
+		pr_debug("no build_id found for %s\n", dso__long_name(dso));
 		return -1;
 	}
 
 	err = perf_event__synthesize_build_id(tool, dso, cpumode,
 					      perf_event__repipe, machine);
 	if (err) {
-		pr_err("Can't synthesize build_id event for %s\n", dso->long_name);
+		pr_err("Can't synthesize build_id event for %s\n", dso__long_name(dso));
 		return -1;
 	}
 
@@ -763,8 +762,8 @@ int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event,
 	if (thread__find_map(thread, sample->cpumode, sample->ip, &al)) {
 		struct dso *dso = map__dso(al.map);
 
-		if (!dso->hit) {
-			dso->hit = 1;
+		if (!dso__hit(dso)) {
+			dso__set_hit(dso);
 			dso__inject_build_id(dso, tool, machine,
 					     sample->cpumode, map__flags(al.map));
 		}
@@ -1146,8 +1145,8 @@ static bool dso__is_in_kernel_space(struct dso *dso)
 		return false;
 
 	return dso__is_kcore(dso) ||
-	       dso->kernel ||
-	       is_kernel_module(dso->long_name, PERF_RECORD_MISC_CPUMODE_UNKNOWN);
+	       dso__kernel(dso) ||
+	       is_kernel_module(dso__long_name(dso), PERF_RECORD_MISC_CPUMODE_UNKNOWN);
 }
 
 static u64 evlist__first_id(struct evlist *evlist)
@@ -1181,7 +1180,7 @@ static int synthesize_build_id(struct perf_inject *inject, struct dso *dso, pid_
 	if (!machine)
 		return -ENOMEM;
 
-	dso->hit = 1;
+	dso__set_hit(dso);
 
 	return perf_event__synthesize_build_id(&inject->tool, dso, cpumode,
 					       process_build_id, machine);
@@ -1192,7 +1191,7 @@ static int guest_session__add_build_ids_cb(struct dso *dso, void *data)
 	struct guest_session *gs = data;
 	struct perf_inject *inject = container_of(gs, struct perf_inject, guest_session);
 
-	if (!dso->has_build_id)
+	if (!dso__has_build_id(dso))
 		return 0;
 
 	return synthesize_build_id(inject, dso, gs->machine_pid);
diff --git a/tools/perf/builtin-kallsyms.c b/tools/perf/builtin-kallsyms.c
index 7f75c5b73f26..a3c2ffdc1af8 100644
--- a/tools/perf/builtin-kallsyms.c
+++ b/tools/perf/builtin-kallsyms.c
@@ -38,7 +38,7 @@ static int __cmd_kallsyms(int argc, const char **argv)
 
 		dso = map__dso(map);
 		printf("%s: %s %s %#" PRIx64 "-%#" PRIx64 " (%#" PRIx64 "-%#" PRIx64")\n",
-			symbol->name, dso->short_name, dso->long_name,
+			symbol->name, dso__short_name(dso), dso__long_name(dso),
 			map__unmap_ip(map, symbol->start), map__unmap_ip(map, symbol->end),
 			symbol->start, symbol->end);
 	}
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 51499c20da01..7c2f16d25a71 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -213,7 +213,7 @@ dump_raw_samples(struct perf_tool *tool,
 	if (al.map != NULL) {
 		dso = map__dso(al.map);
 		if (dso)
-			dso->hit = 1;
+			dso__set_hit(dso);
 	}
 
 	field_sep = symbol_conf.field_sep;
@@ -255,7 +255,7 @@ dump_raw_samples(struct perf_tool *tool,
 		symbol_conf.field_sep,
 		sample->data_src,
 		symbol_conf.field_sep,
-		dso ? dso->long_name : "???",
+		dso ? dso__long_name(dso) : "???",
 		al.sym ? al.sym->name : "???");
 out_put:
 	addr_location__exit(&al);
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f5b95d45f6da..2a2355275379 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -322,7 +322,7 @@ static int process_sample_event(struct perf_tool *tool,
 	}
 
 	if (al.map != NULL)
-		map__dso(al.map)->hit = 1;
+		dso__set_hit(map__dso(al.map));
 
 	if (ui__has_annotation() || rep->symbol_ipc || rep->total_cycles_mode) {
 		hist__account_cycles(sample->branch_stack, &al, sample,
@@ -611,7 +611,7 @@ static void report__warn_kptr_restrict(const struct report *rep)
 		return;
 
 	if (kernel_map == NULL ||
-	     (map__dso(kernel_map)->hit &&
+	    (dso__hit(map__dso(kernel_map)) &&
 	     (kernel_kmap->ref_reloc_sym == NULL ||
 	      kernel_kmap->ref_reloc_sym->addr == 0))) {
 		const char *desc =
@@ -852,7 +852,7 @@ static int maps__fprintf_task_cb(struct map *map, void *data)
 		prot & PROT_EXEC ? 'x' : '-',
 		map__flags(map) ? 's' : 'p',
 		map__pgoff(map),
-		dso->id.ino, dso->name);
+		dso__id_const(dso)->ino, dso__name(dso));
 
 	if (ret < 0)
 		return ret;
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index b1f57401ff23..e31333c5ebd2 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1011,11 +1011,11 @@ static int perf_sample__fprintf_brstackoff(struct perf_sample *sample,
 		to   = entries[i].to;
 
 		if (thread__find_map_fb(thread, sample->cpumode, from, &alf) &&
-		    !map__dso(alf.map)->adjust_symbols)
+		    !dso__adjust_symbols(map__dso(alf.map)))
 			from = map__dso_map_ip(alf.map, from);
 
 		if (thread__find_map_fb(thread, sample->cpumode, to, &alt) &&
-		    !map__dso(alt.map)->adjust_symbols)
+		    !dso__adjust_symbols(map__dso(alt.map)))
 			to = map__dso_map_ip(alt.map, to);
 
 		printed += fprintf(fp, " 0x%"PRIx64, from);
@@ -1076,7 +1076,7 @@ static int grab_bb(u8 *buffer, u64 start, u64 end,
 		pr_debug("\tcannot resolve %" PRIx64 "-%" PRIx64 "\n", start, end);
 		goto out;
 	}
-	if (dso->data.status == DSO_DATA_STATUS_ERROR) {
+	if (dso__data(dso)->status == DSO_DATA_STATUS_ERROR) {
 		pr_debug("\tcannot resolve %" PRIx64 "-%" PRIx64 "\n", start, end);
 		goto out;
 	}
@@ -1088,7 +1088,7 @@ static int grab_bb(u8 *buffer, u64 start, u64 end,
 	len = dso__data_read_offset(dso, machine, offset, (u8 *)buffer,
 				    end - start + MAXINSN);
 
-	*is64bit = dso->is_64_bit;
+	*is64bit = dso__is_64_bit(dso);
 	if (len <= 0)
 		pr_debug("\tcannot fetch code for block at %" PRIx64 "-%" PRIx64 "\n",
 			start, end);
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ea8c7eca5eee..71c48cf22789 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -129,7 +129,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
 	/*
 	 * We can't annotate with just /proc/kallsyms
 	 */
-	if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS && !dso__is_kcore(dso)) {
+	if (dso__symtab_type(dso) == DSO_BINARY_TYPE__KALLSYMS && !dso__is_kcore(dso)) {
 		pr_err("Can't annotate %s: No vmlinux file was found in the "
 		       "path\n", sym->name);
 		sleep(1);
@@ -182,7 +182,7 @@ static void ui__warn_map_erange(struct map *map, struct symbol *sym, u64 ip)
 		    "Tools:  %s\n\n"
 		    "Not all samples will be on the annotation output.\n\n"
 		    "Please report to linux-kernel@vger.kernel.org\n",
-		    ip, dso->long_name, dso__symtab_origin(dso),
+		    ip, dso__long_name(dso), dso__symtab_origin(dso),
 		    map__start(map), map__end(map), sym->start, sym->end,
 		    sym->binding == STB_GLOBAL ? 'g' :
 		    sym->binding == STB_LOCAL  ? 'l' : 'w', sym->name,
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 3af81012014e..2bb8430b8d26 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -253,9 +253,9 @@ static int read_object_code(u64 addr, size_t len, u8 cpumode,
 		goto out;
 	}
 	dso = map__dso(al.map);
-	pr_debug("File is: %s\n", dso->long_name);
+	pr_debug("File is: %s\n", dso__long_name(dso));
 
-	if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS && !dso__is_kcore(dso)) {
+	if (dso__symtab_type(dso) == DSO_BINARY_TYPE__KALLSYMS && !dso__is_kcore(dso)) {
 		pr_debug("Unexpected kernel address - skipping\n");
 		goto out;
 	}
@@ -274,7 +274,7 @@ static int read_object_code(u64 addr, size_t len, u8 cpumode,
 	 * modules to manage long jumps. Check if the ip offset falls in stubs
 	 * sections for kernel modules. And skip module address after text end
 	 */
-	if (dso->is_kmod && al.addr > dso->text_end) {
+	if (dso__is_kmod(dso) && al.addr > dso__text_end(dso)) {
 		pr_debug("skipping the module address %#"PRIx64" after text end\n", al.addr);
 		goto out;
 	}
@@ -315,7 +315,7 @@ static int read_object_code(u64 addr, size_t len, u8 cpumode,
 		state->done[state->done_cnt++] = map__start(al.map);
 	}
 
-	objdump_name = dso->long_name;
+	objdump_name = dso__long_name(dso);
 	if (dso__needs_decompress(dso)) {
 		if (dso__decompress_kmodule_path(dso, objdump_name,
 						 decomp_name,
diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c
index 3419a4ab5590..625dbb2ffe8a 100644
--- a/tools/perf/tests/dso-data.c
+++ b/tools/perf/tests/dso-data.c
@@ -228,7 +228,8 @@ static void dsos__delete(int cnt)
 	for (i = 0; i < cnt; i++) {
 		struct dso *dso = dsos[i];
 
-		unlink(dso->name);
+		dso__data_close(dso);
+		unlink(dso__name(dso));
 		dso__put(dso);
 	}
 
@@ -289,14 +290,14 @@ static int test__dso_data_cache(struct test_suite *test __maybe_unused, int subt
 	}
 
 	/* verify the first one is already open */
-	TEST_ASSERT_VAL("dsos[0] is not open", dsos[0]->data.fd != -1);
+	TEST_ASSERT_VAL("dsos[0] is not open", dso__data(dsos[0])->fd != -1);
 
 	/* open +1 dso to reach the allowed limit */
 	fd = dso__data_fd(dsos[i], &machine);
 	TEST_ASSERT_VAL("failed to get fd", fd > 0);
 
 	/* should force the first one to be closed */
-	TEST_ASSERT_VAL("failed to close dsos[0]", dsos[0]->data.fd == -1);
+	TEST_ASSERT_VAL("failed to close dsos[0]", dso__data(dsos[0])->fd == -1);
 
 	/* cleanup everything */
 	dsos__delete(dso_cnt);
@@ -371,7 +372,7 @@ static int test__dso_data_reopen(struct test_suite *test __maybe_unused, int sub
 	 * dso_0 should get closed, because we reached
 	 * the file descriptor limit
 	 */
-	TEST_ASSERT_VAL("failed to close dso_0", dso_0->data.fd == -1);
+	TEST_ASSERT_VAL("failed to close dso_0", dso__data(dso_0)->fd == -1);
 
 	/* open dso_0 */
 	fd = dso__data_fd(dso_0, &machine);
@@ -381,7 +382,7 @@ static int test__dso_data_reopen(struct test_suite *test __maybe_unused, int sub
 	 * dso_1 should get closed, because we reached
 	 * the file descriptor limit
 	 */
-	TEST_ASSERT_VAL("failed to close dso_1", dso_1->data.fd == -1);
+	TEST_ASSERT_VAL("failed to close dso_1", dso__data(dso_1)->fd == -1);
 
 	/* cleanup everything */
 	close(fd_extra);
diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index d08add0f4da6..187f12f5bc21 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -146,7 +146,7 @@ struct machine *setup_fake_machine(struct machines *machines)
 				goto out;
 			}
 
-			symbols__insert(&dso->symbols, sym);
+			symbols__insert(dso__symbols(dso), sym);
 		}
 
 		dso__put(dso);
@@ -183,7 +183,7 @@ void print_hists_in(struct hists *hists)
 
 			pr_info("%2d: entry: %-8s [%-8s] %20s: period = %"PRIu64"\n",
 				i, thread__comm_str(he->thread),
-				dso->short_name,
+				dso__short_name(dso),
 				he->ms.sym->name, he->stat.period);
 		}
 
@@ -212,7 +212,7 @@ void print_hists_out(struct hists *hists)
 
 			pr_info("%2d: entry: %8s:%5d [%-8s] %20s: period = %"PRIu64"/%"PRIu64"\n",
 				i, thread__comm_str(he->thread), thread__tid(he->thread),
-				dso->short_name,
+				dso__short_name(dso),
 				he->ms.sym->name, he->stat.period,
 				he->stat_acc ? he->stat_acc->period : 0);
 		}
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 71dacb0fec4d..1e0f5a310fd5 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -164,11 +164,11 @@ static void put_fake_samples(void)
 typedef int (*test_fn_t)(struct evsel *, struct machine *);
 
 #define COMM(he)  (thread__comm_str(he->thread))
-#define DSO(he)   (map__dso(he->ms.map)->short_name)
+#define DSO(he)   (dso__short_name(map__dso(he->ms.map)))
 #define SYM(he)   (he->ms.sym->name)
 #define CPU(he)   (he->cpu)
 #define DEPTH(he) (he->callchain->max_depth)
-#define CDSO(cl)  (map__dso(cl->ms.map)->short_name)
+#define CDSO(cl)  (dso__short_name(map__dso(cl->ms.map)))
 #define CSYM(cl)  (cl->ms.sym->name)
 
 struct result {
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index ba1cccf57049..33b5cc8352a7 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -129,7 +129,7 @@ static void put_fake_samples(void)
 typedef int (*test_fn_t)(struct evsel *, struct machine *);
 
 #define COMM(he)  (thread__comm_str(he->thread))
-#define DSO(he)   (map__dso(he->ms.map)->short_name)
+#define DSO(he)   (dso__short_name(map__dso(he->ms.map)))
 #define SYM(he)   (he->ms.sym->name)
 #define CPU(he)   (he->cpu)
 #define PID(he)   (thread__tid(he->thread))
diff --git a/tools/perf/tests/maps.c b/tools/perf/tests/maps.c
index b15417a0d617..4f1f9385ea9c 100644
--- a/tools/perf/tests/maps.c
+++ b/tools/perf/tests/maps.c
@@ -26,7 +26,7 @@ static int check_maps_cb(struct map *map, void *data)
 
 	if (map__start(map) != merged->start ||
 	    map__end(map) != merged->end ||
-	    strcmp(map__dso(map)->name, merged->name) ||
+	    strcmp(dso__name(map__dso(map)), merged->name) ||
 	    refcount_read(map__refcnt(map)) != 1) {
 		return 1;
 	}
@@ -39,7 +39,7 @@ static int failed_cb(struct map *map, void *data __maybe_unused)
 	pr_debug("\tstart: %" PRIu64 " end: %" PRIu64 " name: '%s' refcnt: %d\n",
 		map__start(map),
 		map__end(map),
-		map__dso(map)->name,
+		dso__name(map__dso(map)),
 		refcount_read(map__refcnt(map)));
 
 	return 0;
diff --git a/tools/perf/tests/symbols.c b/tools/perf/tests/symbols.c
index 16e1c5502b09..4bcb277b0cac 100644
--- a/tools/perf/tests/symbols.c
+++ b/tools/perf/tests/symbols.c
@@ -72,7 +72,7 @@ static int test_dso(struct dso *dso)
 	if (verbose > 1)
 		dso__fprintf(dso, stderr);
 
-	for (nd = rb_first_cached(&dso->symbols); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(dso__symbols(dso)); nd; nd = rb_next(nd)) {
 		struct symbol *sym = rb_entry(nd, struct symbol, rb_node);
 
 		if (sym->type != STT_FUNC && sym->type != STT_GNU_IFUNC)
diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index fecbf851bb2e..e30fd55f8e51 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -129,7 +129,7 @@ static int test__vmlinux_matches_kallsyms_cb1(struct map *map, void *data)
 	 * cases.
 	 */
 	struct map *pair = maps__find_by_name(args->kallsyms.kmaps,
-					(dso->kernel ? dso->short_name : dso->name));
+					(dso__kernel(dso) ? dso__short_name(dso) : dso__name(dso)));
 
 	if (pair) {
 		map__set_priv(pair, 1);
@@ -162,11 +162,11 @@ static int test__vmlinux_matches_kallsyms_cb2(struct map *map, void *data)
 		}
 
 		pr_info("WARN: %" PRIx64 "-%" PRIx64 " %" PRIx64 " %s in kallsyms as",
-			map__start(map), map__end(map), map__pgoff(map), dso->name);
+			map__start(map), map__end(map), map__pgoff(map), dso__name(dso));
 		if (mem_end != map__end(pair))
 			pr_info(":\nWARN: *%" PRIx64 "-%" PRIx64 " %" PRIx64,
 				map__start(pair), map__end(pair), map__pgoff(pair));
-		pr_info(" %s\n", dso->name);
+		pr_info(" %s\n", dso__name(dso));
 		map__set_priv(pair, 1);
 	}
 	map__put(pair);
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index ccdb2cd11fbf..d5c7912dcce4 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -442,7 +442,7 @@ static int sym_title(struct symbol *sym, struct map *map, char *title,
 		     size_t sz, int percent_type)
 {
 	return snprintf(title, sz, "%s  %s [Percent: %s]", sym->name,
-			map__dso(map)->long_name,
+			dso__long_name(map__dso(map)),
 			percent_type_str(percent_type));
 }
 
@@ -975,14 +975,14 @@ int symbol__tui_annotate(struct map_symbol *ms, struct evsel *evsel,
 		return -1;
 
 	dso = map__dso(ms->map);
-	if (dso->annotate_warned)
+	if (dso__annotate_warned(dso))
 		return -1;
 
 	if (not_annotated) {
 		err = symbol__annotate2(ms, evsel, opts, &browser.arch);
 		if (err) {
 			char msg[BUFSIZ];
-			dso->annotate_warned = true;
+			dso__set_annotate_warned(dso);
 			symbol__strerror_disassemble(ms, err, msg, sizeof(msg));
 			ui__error("Couldn't annotate %s:\n%s", sym->name, msg);
 			goto out_free_offsets;
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index f4812b226818..95d901b788c2 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2491,7 +2491,7 @@ add_annotate_opt(struct hist_browser *browser __maybe_unused,
 {
 	struct dso *dso;
 
-	if (!ms->map || (dso = map__dso(ms->map)) == NULL || dso->annotate_warned)
+	if (!ms->map || (dso = map__dso(ms->map)) == NULL || dso__annotate_warned(dso))
 		return 0;
 
 	if (!ms->sym)
@@ -2584,7 +2584,7 @@ static int hists_browser__zoom_map(struct hist_browser *browser, struct map *map
 	} else {
 		struct dso *dso = map__dso(map);
 		ui_helpline__fpush("To zoom out press ESC or ENTER + \"Zoom out of %s DSO\"",
-				   __map__is_kernel(map) ? "the Kernel" : dso->short_name);
+				   __map__is_kernel(map) ? "the Kernel" : dso__short_name(dso));
 		browser->hists->dso_filter = dso;
 		perf_hpp__set_elide(HISTC_DSO, true);
 		pstack__push(browser->pstack, &browser->hists->dso_filter);
@@ -2610,7 +2610,7 @@ add_dso_opt(struct hist_browser *browser, struct popup_action *act,
 
 	if (asprintf(optstr, "Zoom %s %s DSO (use the 'k' hotkey to zoom directly into the kernel)",
 		     browser->hists->dso_filter ? "out of" : "into",
-		     __map__is_kernel(map) ? "the Kernel" : map__dso(map)->short_name) < 0)
+		     __map__is_kernel(map) ? "the Kernel" : dso__short_name(map__dso(map))) < 0)
 		return 0;
 
 	act->ms.map = map;
@@ -3086,7 +3086,7 @@ static int evsel__hists_browse(struct evsel *evsel, int nr_events, const char *h
 			if (!browser->selection ||
 			    !browser->selection->map ||
 			    !map__dso(browser->selection->map) ||
-			    map__dso(browser->selection->map)->annotate_warned) {
+			    dso__annotate_warned(map__dso(browser->selection->map))) {
 				continue;
 			}
 
diff --git a/tools/perf/ui/browsers/map.c b/tools/perf/ui/browsers/map.c
index 3d1b958d8832..fba55175a935 100644
--- a/tools/perf/ui/browsers/map.c
+++ b/tools/perf/ui/browsers/map.c
@@ -76,7 +76,7 @@ static int map_browser__run(struct map_browser *browser)
 {
 	int key;
 
-	if (ui_browser__show(&browser->b, map__dso(browser->map)->long_name,
+	if (ui_browser__show(&browser->b, dso__long_name(map__dso(browser->map)),
 			     "Press ESC to exit, %s / to search",
 			     verbose > 0 ? "" : "restart with -v to use") < 0)
 		return -1;
@@ -106,7 +106,7 @@ int map__browse(struct map *map)
 {
 	struct map_browser mb = {
 		.b = {
-			.entries = &map__dso(map)->symbols,
+			.entries = dso__symbols(map__dso(map)),
 			.refresh = ui_browser__rb_tree_refresh,
 			.seek	 = ui_browser__rb_tree_seek,
 			.write	 = map_browser__write,
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 82956adf9963..4a6fc15c8278 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1695,8 +1695,8 @@ int symbol__strerror_disassemble(struct map_symbol *ms, int errnum, char *buf, s
 		char bf[SBUILD_ID_SIZE + 15] = " with build id ";
 		char *build_id_msg = NULL;
 
-		if (dso->has_build_id) {
-			build_id__sprintf(&dso->bid, bf + 15);
+		if (dso__has_build_id(dso)) {
+			build_id__sprintf(dso__bid(dso), bf + 15);
 			build_id_msg = bf;
 		}
 		scnprintf(buf, buflen,
@@ -1718,11 +1718,11 @@ int symbol__strerror_disassemble(struct map_symbol *ms, int errnum, char *buf, s
 		scnprintf(buf, buflen, "Problems while parsing the CPUID in the arch specific initialization.");
 		break;
 	case SYMBOL_ANNOTATE_ERRNO__BPF_INVALID_FILE:
-		scnprintf(buf, buflen, "Invalid BPF file: %s.", dso->long_name);
+		scnprintf(buf, buflen, "Invalid BPF file: %s.", dso__long_name(dso));
 		break;
 	case SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF:
 		scnprintf(buf, buflen, "The %s BPF file has no BTF section, compile with -g or use pahole -J.",
-			  dso->long_name);
+			  dso__long_name(dso));
 		break;
 	default:
 		scnprintf(buf, buflen, "Internal error: Invalid %d error code\n", errnum);
@@ -1740,7 +1740,7 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
 	char *pos;
 	int len;
 
-	if (dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS &&
+	if (dso__symtab_type(dso) == DSO_BINARY_TYPE__KALLSYMS &&
 	    !dso__is_kcore(dso))
 		return SYMBOL_ANNOTATE_ERRNO__NO_VMLINUX;
 
@@ -1749,7 +1749,7 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
 		__symbol__join_symfs(filename, filename_size, build_id_filename);
 		free(build_id_filename);
 	} else {
-		if (dso->has_build_id)
+		if (dso__has_build_id(dso))
 			return ENOMEM;
 		goto fallback;
 	}
@@ -1783,20 +1783,20 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
 		 * cache, or is just a kallsyms file, well, lets hope that this
 		 * DSO is the same as when 'perf record' ran.
 		 */
-		if (dso->kernel && dso->long_name[0] == '/')
-			snprintf(filename, filename_size, "%s", dso->long_name);
+		if (dso__kernel(dso) && dso__long_name(dso)[0] == '/')
+			snprintf(filename, filename_size, "%s", dso__long_name(dso));
 		else
-			__symbol__join_symfs(filename, filename_size, dso->long_name);
+			__symbol__join_symfs(filename, filename_size, dso__long_name(dso));
 
-		mutex_lock(&dso->lock);
-		if (access(filename, R_OK) && errno == ENOENT && dso->nsinfo) {
+		mutex_lock(dso__lock(dso));
+		if (access(filename, R_OK) && errno == ENOENT && dso__nsinfo(dso)) {
 			char *new_name = dso__filename_with_chroot(dso, filename);
 			if (new_name) {
 				strlcpy(filename, new_name, filename_size);
 				free(new_name);
 			}
 		}
-		mutex_unlock(&dso->lock);
+		mutex_unlock(dso__lock(dso));
 	}
 
 	free(build_id_path);
@@ -2083,11 +2083,11 @@ static int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
 		 map__unmap_ip(map, sym->end));
 
 	pr_debug("annotating [%p] %30s : [%p] %30s\n",
-		 dso, dso->long_name, sym, sym->name);
+		 dso, dso__long_name(dso), sym, sym->name);
 
-	if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO) {
+	if (dso__binary_type(dso) == DSO_BINARY_TYPE__BPF_PROG_INFO) {
 		return symbol__disassemble_bpf(sym, args);
-	} else if (dso->binary_type == DSO_BINARY_TYPE__BPF_IMAGE) {
+	} else if (dso__binary_type(dso) == DSO_BINARY_TYPE__BPF_IMAGE) {
 		return symbol__disassemble_bpf_image(sym, args);
 	} else if (dso__is_kcore(dso)) {
 		kce.kcore_filename = symfs_filename;
@@ -2514,7 +2514,7 @@ int symbol__annotate_printf(struct map_symbol *ms, struct evsel *evsel,
 	int graph_dotted_len;
 	char buf[512];
 
-	filename = strdup(dso->long_name);
+	filename = strdup(dso__long_name(dso));
 	if (!filename)
 		return -ENOMEM;
 
@@ -2681,7 +2681,7 @@ int map_symbol__annotation_dump(struct map_symbol *ms, struct evsel *evsel,
 	}
 
 	fprintf(fp, "%s() %s\nEvent: %s\n\n",
-		ms->sym->name, map__dso(ms->map)->long_name, ev_name);
+		ms->sym->name, dso__long_name(map__dso(ms->map)), ev_name);
 	symbol__annotate_fprintf2(ms->sym, fp, opts);
 
 	fclose(fp);
@@ -2937,7 +2937,7 @@ int symbol__tty_annotate2(struct map_symbol *ms, struct evsel *evsel,
 	if (err) {
 		char msg[BUFSIZ];
 
-		dso->annotate_warned = true;
+		dso__set_annotate_warned(dso);
 		symbol__strerror_disassemble(ms, err, msg, sizeof(msg));
 		ui__error("Couldn't annotate %s:\n%s", sym->name, msg);
 		return -1;
@@ -2946,12 +2946,12 @@ int symbol__tty_annotate2(struct map_symbol *ms, struct evsel *evsel,
 	if (opts->print_lines) {
 		srcline_full_filename = opts->full_path;
 		symbol__calc_lines(ms, &source_line, opts);
-		print_summary(&source_line, dso->long_name);
+		print_summary(&source_line, dso__long_name(dso));
 	}
 
 	hists__scnprintf_title(hists, buf, sizeof(buf));
 	fprintf(stdout, "%s, [percent: %s]\n%s() %s\n",
-		buf, percent_type_str(opts->percent_type), sym->name, dso->long_name);
+		buf, percent_type_str(opts->percent_type), sym->name, dso__long_name(dso));
 	symbol__annotate_fprintf2(sym, stdout, opts);
 
 	annotated_source__purge(symbol__annotation(sym)->src);
@@ -2971,7 +2971,7 @@ int symbol__tty_annotate(struct map_symbol *ms, struct evsel *evsel,
 	if (err) {
 		char msg[BUFSIZ];
 
-		dso->annotate_warned = true;
+		dso__set_annotate_warned(dso);
 		symbol__strerror_disassemble(ms, err, msg, sizeof(msg));
 		ui__error("Couldn't annotate %s:\n%s", sym->name, msg);
 		return -1;
@@ -2982,7 +2982,7 @@ int symbol__tty_annotate(struct map_symbol *ms, struct evsel *evsel,
 	if (opts->print_lines) {
 		srcline_full_filename = opts->full_path;
 		symbol__calc_lines(ms, &source_line, opts);
-		print_summary(&source_line, dso->long_name);
+		print_summary(&source_line, dso__long_name(dso));
 	}
 
 	symbol__annotate_printf(ms, evsel, opts);
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index a0368202a746..b654fa3f4a7c 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -2649,7 +2649,7 @@ static int addr_filter__entire_dso(struct addr_filter *filt, struct dso *dso)
 	}
 
 	filt->addr = 0;
-	filt->size = dso->data.file_size;
+	filt->size = dso__data(dso)->file_size;
 
 	return 0;
 }
diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 591fc1edd385..472def2c392d 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -319,7 +319,7 @@ static int block_dso_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 
 	if (map && map__dso(map)) {
 		return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
-				 map__dso(map)->short_name);
+				 dso__short_name(map__dso(map)));
 	}
 
 	return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index d07fd5ffa823..b564d6fd078a 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -59,10 +59,10 @@ static int machine__process_bpf_event_load(struct machine *machine,
 		if (map) {
 			struct dso *dso = map__dso(map);
 
-			dso->binary_type = DSO_BINARY_TYPE__BPF_PROG_INFO;
-			dso->bpf_prog.id = id;
-			dso->bpf_prog.sub_id = i;
-			dso->bpf_prog.env = env;
+			dso__set_binary_type(dso, DSO_BINARY_TYPE__BPF_PROG_INFO);
+			dso__bpf_prog(dso)->id = id;
+			dso__bpf_prog(dso)->sub_id = i;
+			dso__bpf_prog(dso)->env = env;
 			map__put(map);
 		}
 	}
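
Note the pattern used for grouped state: rather than one setter per
field, dso__bpf_prog() (and likewise dso__data() below) returns a
pointer to the embedded struct, so multi-field updates keep their
original shape and the pointer can be cached, as bpf_read() and
bpf_size() do further down. The hunk above is therefore equivalent to:

  struct dso_bpf_prog *bpf_prog = dso__bpf_prog(dso);

  dso__set_binary_type(dso, DSO_BINARY_TYPE__BPF_PROG_INFO);
  bpf_prog->id = id;
  bpf_prog->sub_id = i;
  bpf_prog->env = env;
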
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 864bc26b6b46..83a1581e8cf1 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -60,7 +60,7 @@ int build_id__mark_dso_hit(struct perf_tool *tool __maybe_unused,
 
 	addr_location__init(&al);
 	if (thread__find_map(thread, sample->cpumode, sample->ip, &al))
-		map__dso(al.map)->hit = 1;
+		dso__set_hit(map__dso(al.map));
 
 	addr_location__exit(&al);
 	thread__put(thread);
@@ -272,10 +272,10 @@ char *__dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
 	bool alloc = (bf == NULL);
 	int ret;
 
-	if (!dso->has_build_id)
+	if (!dso__has_build_id(dso))
 		return NULL;
 
-	build_id__sprintf(&dso->bid, sbuild_id);
+	build_id__sprintf(dso__bid_const(dso), sbuild_id);
 	linkname = build_id_cache__linkname(sbuild_id, NULL, 0);
 	if (!linkname)
 		return NULL;
@@ -340,25 +340,25 @@ static int machine__write_buildid_table_cb(struct dso *dso, void *data)
 	size_t name_len;
 	bool in_kernel = false;
 
-	if (!dso->has_build_id)
+	if (!dso__has_build_id(dso))
 		return 0;
 
-	if (!dso->hit && !dso__is_vdso(dso))
+	if (!dso__hit(dso) && !dso__is_vdso(dso))
 		return 0;
 
 	if (dso__is_vdso(dso)) {
-		name = dso->short_name;
-		name_len = dso->short_name_len;
+		name = dso__short_name(dso);
+		name_len = dso__short_name_len(dso);
 	} else if (dso__is_kcore(dso)) {
 		name = args->machine->mmap_name;
 		name_len = strlen(name);
 	} else {
-		name = dso->long_name;
-		name_len = dso->long_name_len;
+		name = dso__long_name(dso);
+		name_len = dso__long_name_len(dso);
 	}
 
-	in_kernel = dso->kernel || is_kernel_module(name, PERF_RECORD_MISC_CPUMODE_UNKNOWN);
-	return write_buildid(name, name_len, &dso->bid, args->machine->pid,
+	in_kernel = dso__kernel(dso) || is_kernel_module(name, PERF_RECORD_MISC_CPUMODE_UNKNOWN);
+	return write_buildid(name, name_len, dso__bid(dso), args->machine->pid,
 			     in_kernel ? args->kmisc : args->umisc, args->fd);
 }
 
@@ -876,11 +876,11 @@ static bool dso__build_id_mismatch(struct dso *dso, const char *name)
 	struct build_id bid;
 	bool ret = false;
 
-	mutex_lock(&dso->lock);
-	if (filename__read_build_id_ns(name, &bid, dso->nsinfo) >= 0)
+	mutex_lock(dso__lock(dso));
+	if (filename__read_build_id_ns(name, &bid, dso__nsinfo(dso)) >= 0)
 		ret = !dso__build_id_equal(dso, &bid);
 
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 
 	return ret;
 }
@@ -890,13 +890,13 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
 {
 	bool is_kallsyms = dso__is_kallsyms(dso);
 	bool is_vdso = dso__is_vdso(dso);
-	const char *name = dso->long_name;
+	const char *name = dso__long_name(dso);
 	const char *proper_name = NULL;
 	const char *root_dir = NULL;
 	char *allocated_name = NULL;
 	int ret = 0;
 
-	if (!dso->has_build_id)
+	if (!dso__has_build_id(dso))
 		return 0;
 
 	if (dso__is_kcore(dso)) {
@@ -921,10 +921,10 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine,
 	if (!is_kallsyms && dso__build_id_mismatch(dso, name))
 		goto out_free;
 
-	mutex_lock(&dso->lock);
-	ret = build_id_cache__add_b(&dso->bid, name, dso->nsinfo,
+	mutex_lock(dso__lock(dso));
+	ret = build_id_cache__add_b(dso__bid(dso), name, dso__nsinfo(dso),
 				    is_kallsyms, is_vdso, proper_name, root_dir);
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 out_free:
 	free(allocated_name);
 	return ret;
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 7517d16c02ec..68feed871809 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1205,7 +1205,7 @@ char *callchain_list__sym_name(struct callchain_list *cl,
 	if (show_dso)
 		scnprintf(bf + printed, bfsize - printed, " %s",
 			  cl->ms.map ?
-			  map__dso(cl->ms.map)->short_name :
+			  dso__short_name(map__dso(cl->ms.map)) :
 			  "unknown");
 
 	return bf;
diff --git a/tools/perf/util/data-convert-json.c b/tools/perf/util/data-convert-json.c
index 5bb3c2ba95ca..86ef936e2e04 100644
--- a/tools/perf/util/data-convert-json.c
+++ b/tools/perf/util/data-convert-json.c
@@ -134,7 +134,7 @@ static void output_sample_callchain_entry(struct perf_tool *tool,
 		output_json_key_string(out, false, 5, "symbol", al->sym->name);
 
 		if (dso) {
-			const char *dso_name = dso->short_name;
+			const char *dso_name = dso__short_name(dso);
 
 			if (dso_name && strlen(dso_name) > 0) {
 				fputc(',', out);
diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c
index b9fb71ab7a73..2fe3143e6689 100644
--- a/tools/perf/util/db-export.c
+++ b/tools/perf/util/db-export.c
@@ -146,10 +146,10 @@ int db_export__comm_thread(struct db_export *dbe, struct comm *comm,
 int db_export__dso(struct db_export *dbe, struct dso *dso,
 		   struct machine *machine)
 {
-	if (dso->db_id)
+	if (dso__db_id(dso))
 		return 0;
 
-	dso->db_id = ++dbe->dso_last_db_id;
+	dso__set_db_id(dso, ++dbe->dso_last_db_id);
 
 	if (dbe->export_dso)
 		return dbe->export_dso(dbe, dso, machine);
@@ -184,7 +184,7 @@ static int db_ids_from_al(struct db_export *dbe, struct addr_location *al,
 		err = db_export__dso(dbe, dso, maps__machine(al->maps));
 		if (err)
 			return err;
-		*dso_db_id = dso->db_id;
+		*dso_db_id = dso__db_id(dso);
 
 		if (!al->sym) {
 			al->sym = symbol__new(al->addr, 0, 0, 0, "unknown");
diff --git a/tools/perf/util/dlfilter.c b/tools/perf/util/dlfilter.c
index 908e16813722..7d180bdaedbc 100644
--- a/tools/perf/util/dlfilter.c
+++ b/tools/perf/util/dlfilter.c
@@ -33,13 +33,13 @@ static void al_to_d_al(struct addr_location *al, struct perf_dlfilter_al *d_al)
 	if (al->map) {
 		struct dso *dso = map__dso(al->map);
 
-		if (symbol_conf.show_kernel_path && dso->long_name)
-			d_al->dso = dso->long_name;
+		if (symbol_conf.show_kernel_path && dso__long_name(dso))
+			d_al->dso = dso__long_name(dso);
 		else
-			d_al->dso = dso->name;
-		d_al->is_64_bit = dso->is_64_bit;
-		d_al->buildid_size = dso->bid.size;
-		d_al->buildid = dso->bid.data;
+			d_al->dso = dso__name(dso);
+		d_al->is_64_bit = dso__is_64_bit(dso);
+		d_al->buildid_size = dso__bid(dso)->size;
+		d_al->buildid = dso__bid(dso)->data;
 	} else {
 		d_al->dso = NULL;
 		d_al->is_64_bit = 0;
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 66dc929443ba..0fef597725c7 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -39,6 +39,12 @@ static const char * const debuglink_paths[] = {
 	"/usr/lib/debug%s/%s"
 };
 
+void dso__set_nsinfo(struct dso *dso, struct nsinfo *nsi)
+{
+	nsinfo__put(RC_CHK_ACCESS(dso)->nsinfo);
+	RC_CHK_ACCESS(dso)->nsinfo = nsi;
+}
+
 char dso__symtab_origin(const struct dso *dso)
 {
 	static const char origin[] = {
@@ -62,14 +68,14 @@ char dso__symtab_origin(const struct dso *dso)
 		[DSO_BINARY_TYPE__GUEST_VMLINUX]		= 'V',
 	};
 
-	if (dso == NULL || dso->symtab_type == DSO_BINARY_TYPE__NOT_FOUND)
+	if (dso == NULL || dso__symtab_type(dso) == DSO_BINARY_TYPE__NOT_FOUND)
 		return '!';
-	return origin[dso->symtab_type];
+	return origin[dso__symtab_type(dso)];
 }
 
 bool dso__is_object_file(const struct dso *dso)
 {
-	switch (dso->binary_type) {
+	switch (dso__binary_type(dso)) {
 	case DSO_BINARY_TYPE__KALLSYMS:
 	case DSO_BINARY_TYPE__GUEST_KALLSYMS:
 	case DSO_BINARY_TYPE__JAVA_JIT:
@@ -116,7 +122,7 @@ int dso__read_binary_type_filename(const struct dso *dso,
 		char symfile[PATH_MAX];
 		unsigned int i;
 
-		len = __symbol__join_symfs(filename, size, dso->long_name);
+		len = __symbol__join_symfs(filename, size, dso__long_name(dso));
 		last_slash = filename + len;
 		while (last_slash != filename && *last_slash != '/')
 			last_slash--;
@@ -158,12 +164,12 @@ int dso__read_binary_type_filename(const struct dso *dso,
 
 	case DSO_BINARY_TYPE__FEDORA_DEBUGINFO:
 		len = __symbol__join_symfs(filename, size, "/usr/lib/debug");
-		snprintf(filename + len, size - len, "%s.debug", dso->long_name);
+		snprintf(filename + len, size - len, "%s.debug", dso__long_name(dso));
 		break;
 
 	case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO:
 		len = __symbol__join_symfs(filename, size, "/usr/lib/debug");
-		snprintf(filename + len, size - len, "%s", dso->long_name);
+		snprintf(filename + len, size - len, "%s", dso__long_name(dso));
 		break;
 
 	case DSO_BINARY_TYPE__MIXEDUP_UBUNTU_DEBUGINFO:
@@ -172,13 +178,13 @@ int dso__read_binary_type_filename(const struct dso *dso,
 		 * /usr/lib/debug/lib when it is expected to be in
 		 * /usr/lib/debug/usr/lib
 		 */
-		if (strlen(dso->long_name) < 9 ||
-		    strncmp(dso->long_name, "/usr/lib/", 9)) {
+		if (strlen(dso__long_name(dso)) < 9 ||
+		    strncmp(dso__long_name(dso), "/usr/lib/", 9)) {
 			ret = -1;
 			break;
 		}
 		len = __symbol__join_symfs(filename, size, "/usr/lib/debug");
-		snprintf(filename + len, size - len, "%s", dso->long_name + 4);
+		snprintf(filename + len, size - len, "%s", dso__long_name(dso) + 4);
 		break;
 
 	case DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO:
@@ -186,29 +192,29 @@ int dso__read_binary_type_filename(const struct dso *dso,
 		const char *last_slash;
 		size_t dir_size;
 
-		last_slash = dso->long_name + dso->long_name_len;
-		while (last_slash != dso->long_name && *last_slash != '/')
+		last_slash = dso__long_name(dso) + dso__long_name_len(dso);
+		while (last_slash != dso__long_name(dso) && *last_slash != '/')
 			last_slash--;
 
 		len = __symbol__join_symfs(filename, size, "");
-		dir_size = last_slash - dso->long_name + 2;
+		dir_size = last_slash - dso__long_name(dso) + 2;
 		if (dir_size > (size - len)) {
 			ret = -1;
 			break;
 		}
-		len += scnprintf(filename + len, dir_size, "%s",  dso->long_name);
+		len += scnprintf(filename + len, dir_size, "%s",  dso__long_name(dso));
 		len += scnprintf(filename + len , size - len, ".debug%s",
 								last_slash);
 		break;
 	}
 
 	case DSO_BINARY_TYPE__BUILDID_DEBUGINFO:
-		if (!dso->has_build_id) {
+		if (!dso__has_build_id(dso)) {
 			ret = -1;
 			break;
 		}
 
-		build_id__sprintf(&dso->bid, build_id_hex);
+		build_id__sprintf(dso__bid_const(dso), build_id_hex);
 		len = __symbol__join_symfs(filename, size, "/usr/lib/debug/.build-id/");
 		snprintf(filename + len, size - len, "%.2s/%s.debug",
 			 build_id_hex, build_id_hex + 2);
@@ -217,23 +223,23 @@ int dso__read_binary_type_filename(const struct dso *dso,
 	case DSO_BINARY_TYPE__VMLINUX:
 	case DSO_BINARY_TYPE__GUEST_VMLINUX:
 	case DSO_BINARY_TYPE__SYSTEM_PATH_DSO:
-		__symbol__join_symfs(filename, size, dso->long_name);
+		__symbol__join_symfs(filename, size, dso__long_name(dso));
 		break;
 
 	case DSO_BINARY_TYPE__GUEST_KMODULE:
 	case DSO_BINARY_TYPE__GUEST_KMODULE_COMP:
 		path__join3(filename, size, symbol_conf.symfs,
-			    root_dir, dso->long_name);
+			    root_dir, dso__long_name(dso));
 		break;
 
 	case DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE:
 	case DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP:
-		__symbol__join_symfs(filename, size, dso->long_name);
+		__symbol__join_symfs(filename, size, dso__long_name(dso));
 		break;
 
 	case DSO_BINARY_TYPE__KCORE:
 	case DSO_BINARY_TYPE__GUEST_KCORE:
-		snprintf(filename, size, "%s", dso->long_name);
+		snprintf(filename, size, "%s", dso__long_name(dso));
 		break;
 
 	default:
@@ -309,8 +315,8 @@ bool is_kernel_module(const char *pathname, int cpumode)
 
 bool dso__needs_decompress(struct dso *dso)
 {
-	return dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
-		dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
+	return dso__symtab_type(dso) == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
+		dso__symtab_type(dso) == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
 }
 
 int filename__decompress(const char *name, char *pathname,
@@ -362,11 +368,10 @@ static int decompress_kmodule(struct dso *dso, const char *name,
 	if (!dso__needs_decompress(dso))
 		return -1;
 
-	if (dso->comp == COMP_ID__NONE)
+	if (dso__comp(dso) == COMP_ID__NONE)
 		return -1;
 
-	return filename__decompress(name, pathname, len, dso->comp,
-				    &dso->load_errno);
+	return filename__decompress(name, pathname, len, dso__comp(dso), dso__load_errno(dso));
 }
 
 int dso__decompress_kmodule_fd(struct dso *dso, const char *name)
@@ -467,17 +472,17 @@ void dso__set_module_info(struct dso *dso, struct kmod_path *m,
 			  struct machine *machine)
 {
 	if (machine__is_host(machine))
-		dso->symtab_type = DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE;
+		dso__set_symtab_type(dso, DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE);
 	else
-		dso->symtab_type = DSO_BINARY_TYPE__GUEST_KMODULE;
+		dso__set_symtab_type(dso, DSO_BINARY_TYPE__GUEST_KMODULE);
 
 	/* _KMODULE_COMP should be next to _KMODULE */
 	if (m->kmod && m->comp) {
-		dso->symtab_type++;
-		dso->comp = m->comp;
+		dso__set_symtab_type(dso, dso__symtab_type(dso) + 1);
+		dso__set_comp(dso, m->comp);
 	}
 
-	dso->is_kmod = 1;
+	dso__set_is_kmod(dso);
 	dso__set_short_name(dso, strdup(m->name), true);
 }
 
@@ -490,13 +495,15 @@ static pthread_mutex_t dso__data_open_lock = PTHREAD_MUTEX_INITIALIZER;
 
 static void dso__list_add(struct dso *dso)
 {
-	list_add_tail(&dso->data.open_entry, &dso__data_open);
+	list_add_tail(&dso__data(dso)->open_entry, &dso__data_open);
+	dso__data(dso)->dso = dso__get(dso);
 	dso__data_open_cnt++;
 }
 
 static void dso__list_del(struct dso *dso)
 {
-	list_del_init(&dso->data.open_entry);
+	list_del_init(&dso__data(dso)->open_entry);
+	dso__put(dso__data(dso)->dso);
 	WARN_ONCE(dso__data_open_cnt <= 0,
 		  "DSO data fd counter out of bounds.");
 	dso__data_open_cnt--;
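
The new dso member of struct dso_data exists because, once struct dso
is wrapped for reference count checking, list_entry() on open_entry can
only recover the raw dso_data, not a counted dso handle. dso__list_add()
therefore pins a reference for as long as the file descriptor sits on
the open-file LRU, and close_first_dso() below consumes it. A sketch of
the resulting invariant (hypothetical helper, not part of this patch):

  /* Any dso_data found on dso__data_open carries a valid owning-dso
   * reference, taken in dso__list_add() and dropped in dso__list_del(). */
  static struct dso *first_open_dso(void)
  {
  	struct dso_data *dso_data = list_first_entry(&dso__data_open,
  						     struct dso_data, open_entry);

  	return dso_data->dso;
  }
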
@@ -527,7 +534,7 @@ static int do_open(char *name)
 
 char *dso__filename_with_chroot(const struct dso *dso, const char *filename)
 {
-	return filename_with_chroot(nsinfo__pid(dso->nsinfo), filename);
+	return filename_with_chroot(nsinfo__pid(dso__nsinfo_const(dso)), filename);
 }
 
 static int __open_dso(struct dso *dso, struct machine *machine)
@@ -540,18 +547,18 @@ static int __open_dso(struct dso *dso, struct machine *machine)
 	if (!name)
 		return -ENOMEM;
 
-	mutex_lock(&dso->lock);
+	mutex_lock(dso__lock(dso));
 	if (machine)
 		root_dir = machine->root_dir;
 
-	if (dso__read_binary_type_filename(dso, dso->binary_type,
+	if (dso__read_binary_type_filename(dso, dso__binary_type(dso),
 					    root_dir, name, PATH_MAX))
 		goto out;
 
 	if (!is_regular_file(name)) {
 		char *new_name;
 
-		if (errno != ENOENT || dso->nsinfo == NULL)
+		if (errno != ENOENT || dso__nsinfo(dso) == NULL)
 			goto out;
 
 		new_name = dso__filename_with_chroot(dso, name);
@@ -567,7 +574,7 @@ static int __open_dso(struct dso *dso, struct machine *machine)
 		size_t len = sizeof(newpath);
 
 		if (dso__decompress_kmodule_path(dso, name, newpath, len) < 0) {
-			fd = -dso->load_errno;
+			fd = -(*dso__load_errno(dso));
 			goto out;
 		}
 
@@ -581,7 +588,7 @@ static int __open_dso(struct dso *dso, struct machine *machine)
 		unlink(name);
 
 out:
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 	free(name);
 	return fd;
 }
@@ -600,13 +607,13 @@ static int open_dso(struct dso *dso, struct machine *machine)
 	int fd;
 	struct nscookie nsc;
 
-	if (dso->binary_type != DSO_BINARY_TYPE__BUILD_ID_CACHE) {
-		mutex_lock(&dso->lock);
-		nsinfo__mountns_enter(dso->nsinfo, &nsc);
-		mutex_unlock(&dso->lock);
+	if (dso__binary_type(dso) != DSO_BINARY_TYPE__BUILD_ID_CACHE) {
+		mutex_lock(dso__lock(dso));
+		nsinfo__mountns_enter(dso__nsinfo(dso), &nsc);
+		mutex_unlock(dso__lock(dso));
 	}
 	fd = __open_dso(dso, machine);
-	if (dso->binary_type != DSO_BINARY_TYPE__BUILD_ID_CACHE)
+	if (dso__binary_type(dso) != DSO_BINARY_TYPE__BUILD_ID_CACHE)
 		nsinfo__mountns_exit(&nsc);
 
 	if (fd >= 0) {
@@ -623,10 +630,10 @@ static int open_dso(struct dso *dso, struct machine *machine)
 
 static void close_data_fd(struct dso *dso)
 {
-	if (dso->data.fd >= 0) {
-		close(dso->data.fd);
-		dso->data.fd = -1;
-		dso->data.file_size = 0;
+	if (dso__data(dso)->fd >= 0) {
+		close(dso__data(dso)->fd);
+		dso__data(dso)->fd = -1;
+		dso__data(dso)->file_size = 0;
 		dso__list_del(dso);
 	}
 }
@@ -645,10 +652,10 @@ static void close_dso(struct dso *dso)
 
 static void close_first_dso(void)
 {
-	struct dso *dso;
+	struct dso_data *dso_data;
 
-	dso = list_first_entry(&dso__data_open, struct dso, data.open_entry);
-	close_dso(dso);
+	dso_data = list_first_entry(&dso__data_open, struct dso_data, open_entry);
+	close_dso(dso_data->dso);
 }
 
 static rlim_t get_fd_limit(void)
@@ -727,28 +734,29 @@ static void try_to_open_dso(struct dso *dso, struct machine *machine)
 		DSO_BINARY_TYPE__NOT_FOUND,
 	};
 	int i = 0;
+	struct dso_data *dso_data = dso__data(dso);
 
-	if (dso->data.fd >= 0)
+	if (dso_data->fd >= 0)
 		return;
 
-	if (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND) {
-		dso->data.fd = open_dso(dso, machine);
+	if (dso__binary_type(dso) != DSO_BINARY_TYPE__NOT_FOUND) {
+		dso_data->fd = open_dso(dso, machine);
 		goto out;
 	}
 
 	do {
-		dso->binary_type = binary_type_data[i++];
+		dso__set_binary_type(dso, binary_type_data[i++]);
 
-		dso->data.fd = open_dso(dso, machine);
-		if (dso->data.fd >= 0)
+		dso_data->fd = open_dso(dso, machine);
+		if (dso_data->fd >= 0)
 			goto out;
 
-	} while (dso->binary_type != DSO_BINARY_TYPE__NOT_FOUND);
+	} while (dso__binary_type(dso) != DSO_BINARY_TYPE__NOT_FOUND);
 out:
-	if (dso->data.fd >= 0)
-		dso->data.status = DSO_DATA_STATUS_OK;
+	if (dso_data->fd >= 0)
+		dso_data->status = DSO_DATA_STATUS_OK;
 	else
-		dso->data.status = DSO_DATA_STATUS_ERROR;
+		dso_data->status = DSO_DATA_STATUS_ERROR;
 }
 
 /**
@@ -762,7 +770,7 @@ static void try_to_open_dso(struct dso *dso, struct machine *machine)
  */
 int dso__data_get_fd(struct dso *dso, struct machine *machine)
 {
-	if (dso->data.status == DSO_DATA_STATUS_ERROR)
+	if (dso__data(dso)->status == DSO_DATA_STATUS_ERROR)
 		return -1;
 
 	if (pthread_mutex_lock(&dso__data_open_lock) < 0)
@@ -770,10 +778,10 @@ int dso__data_get_fd(struct dso *dso, struct machine *machine)
 
 	try_to_open_dso(dso, machine);
 
-	if (dso->data.fd < 0)
+	if (dso__data(dso)->fd < 0)
 		pthread_mutex_unlock(&dso__data_open_lock);
 
-	return dso->data.fd;
+	return dso__data(dso)->fd;
 }
 
 void dso__data_put_fd(struct dso *dso __maybe_unused)
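
As before this patch, dso__data_get_fd() returns with the global
dso__data_open_lock held on success - it only drops the lock when no fd
could be opened - which is what stops another thread from closing the
returned fd when it runs into RLIMIT_NOFILE. Callers must pair it with
dso__data_put_fd() to release the lock. A sketch of the intended usage
(buffer and offset are illustrative):

  char buf[64];
  int fd = dso__data_get_fd(dso, machine);

  if (fd >= 0) {
  	/* fd stays valid until the matching put releases the lock. */
  	ssize_t n = pread(fd, buf, sizeof(buf), /*offset=*/0);

  	dso__data_put_fd(dso);
  	/* ... consume the n bytes read ... */
  }
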
@@ -785,10 +793,10 @@ bool dso__data_status_seen(struct dso *dso, enum dso_data_status_seen by)
 {
 	u32 flag = 1 << by;
 
-	if (dso->data.status_seen & flag)
+	if (dso__data(dso)->status_seen & flag)
 		return true;
 
-	dso->data.status_seen |= flag;
+	dso__data(dso)->status_seen |= flag;
 
 	return false;
 }
@@ -798,12 +806,13 @@ static ssize_t bpf_read(struct dso *dso, u64 offset, char *data)
 {
 	struct bpf_prog_info_node *node;
 	ssize_t size = DSO__DATA_CACHE_SIZE;
+	struct dso_bpf_prog *dso_bpf_prog = dso__bpf_prog(dso);
 	u64 len;
 	u8 *buf;
 
-	node = perf_env__find_bpf_prog_info(dso->bpf_prog.env, dso->bpf_prog.id);
+	node = perf_env__find_bpf_prog_info(dso_bpf_prog->env, dso_bpf_prog->id);
 	if (!node || !node->info_linear) {
-		dso->data.status = DSO_DATA_STATUS_ERROR;
+		dso__data(dso)->status = DSO_DATA_STATUS_ERROR;
 		return -1;
 	}
 
@@ -821,14 +830,15 @@ static ssize_t bpf_read(struct dso *dso, u64 offset, char *data)
 static int bpf_size(struct dso *dso)
 {
 	struct bpf_prog_info_node *node;
+	struct dso_bpf_prog *dso_bpf_prog = dso__bpf_prog(dso);
 
-	node = perf_env__find_bpf_prog_info(dso->bpf_prog.env, dso->bpf_prog.id);
+	node = perf_env__find_bpf_prog_info(dso_bpf_prog->env, dso_bpf_prog->id);
 	if (!node || !node->info_linear) {
-		dso->data.status = DSO_DATA_STATUS_ERROR;
+		dso__data(dso)->status = DSO_DATA_STATUS_ERROR;
 		return -1;
 	}
 
-	dso->data.file_size = node->info_linear->info.jited_prog_len;
+	dso__data(dso)->file_size = node->info_linear->info.jited_prog_len;
 	return 0;
 }
 #endif // HAVE_LIBBPF_SUPPORT
@@ -836,10 +846,10 @@ static int bpf_size(struct dso *dso)
 static void
 dso_cache__free(struct dso *dso)
 {
-	struct rb_root *root = &dso->data.cache;
+	struct rb_root *root = &dso__data(dso)->cache;
 	struct rb_node *next = rb_first(root);
 
-	mutex_lock(&dso->lock);
+	mutex_lock(dso__lock(dso));
 	while (next) {
 		struct dso_cache *cache;
 
@@ -848,12 +858,12 @@ dso_cache__free(struct dso *dso)
 		rb_erase(&cache->rb_node, root);
 		free(cache);
 	}
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 }
 
 static struct dso_cache *__dso_cache__find(struct dso *dso, u64 offset)
 {
-	const struct rb_root *root = &dso->data.cache;
+	const struct rb_root *root = &dso__data(dso)->cache;
 	struct rb_node * const *p = &root->rb_node;
 	const struct rb_node *parent = NULL;
 	struct dso_cache *cache;
@@ -879,13 +889,13 @@ static struct dso_cache *__dso_cache__find(struct dso *dso, u64 offset)
 static struct dso_cache *
 dso_cache__insert(struct dso *dso, struct dso_cache *new)
 {
-	struct rb_root *root = &dso->data.cache;
+	struct rb_root *root = &dso__data(dso)->cache;
 	struct rb_node **p = &root->rb_node;
 	struct rb_node *parent = NULL;
 	struct dso_cache *cache;
 	u64 offset = new->offset;
 
-	mutex_lock(&dso->lock);
+	mutex_lock(dso__lock(dso));
 	while (*p != NULL) {
 		u64 end;
 
@@ -906,7 +916,7 @@ dso_cache__insert(struct dso *dso, struct dso_cache *new)
 
 	cache = NULL;
 out:
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 	return cache;
 }
 
@@ -931,18 +941,18 @@ static ssize_t file_read(struct dso *dso, struct machine *machine,
 	pthread_mutex_lock(&dso__data_open_lock);
 
 	/*
-	 * dso->data.fd might be closed if other thread opened another
+	 * dso__data(dso)->fd might be closed if another thread opened another
 	 * file (dso) due to open file limit (RLIMIT_NOFILE).
 	 */
 	try_to_open_dso(dso, machine);
 
-	if (dso->data.fd < 0) {
-		dso->data.status = DSO_DATA_STATUS_ERROR;
+	if (dso__data(dso)->fd < 0) {
+		dso__data(dso)->status = DSO_DATA_STATUS_ERROR;
 		ret = -errno;
 		goto out;
 	}
 
-	ret = pread(dso->data.fd, data, DSO__DATA_CACHE_SIZE, offset);
+	ret = pread(dso__data(dso)->fd, data, DSO__DATA_CACHE_SIZE, offset);
 out:
 	pthread_mutex_unlock(&dso__data_open_lock);
 	return ret;
@@ -962,11 +972,11 @@ static struct dso_cache *dso_cache__populate(struct dso *dso,
 		return NULL;
 	}
 #ifdef HAVE_LIBBPF_SUPPORT
-	if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO)
+	if (dso__binary_type(dso) == DSO_BINARY_TYPE__BPF_PROG_INFO)
 		*ret = bpf_read(dso, cache_offset, cache->data);
 	else
 #endif
-	if (dso->binary_type == DSO_BINARY_TYPE__OOL)
+	if (dso__binary_type(dso) == DSO_BINARY_TYPE__OOL)
 		*ret = DSO__DATA_CACHE_SIZE;
 	else
 		*ret = file_read(dso, machine, cache_offset, cache->data);
@@ -1055,25 +1065,25 @@ static int file_size(struct dso *dso, struct machine *machine)
 	pthread_mutex_lock(&dso__data_open_lock);
 
 	/*
-	 * dso->data.fd might be closed if other thread opened another
+	 * dso__data(dso)->fd might be closed if another thread opened another
 	 * file (dso) due to open file limit (RLIMIT_NOFILE).
 	 */
 	try_to_open_dso(dso, machine);
 
-	if (dso->data.fd < 0) {
+	if (dso__data(dso)->fd < 0) {
 		ret = -errno;
-		dso->data.status = DSO_DATA_STATUS_ERROR;
+		dso__data(dso)->status = DSO_DATA_STATUS_ERROR;
 		goto out;
 	}
 
-	if (fstat(dso->data.fd, &st) < 0) {
+	if (fstat(dso__data(dso)->fd, &st) < 0) {
 		ret = -errno;
 		pr_err("dso cache fstat failed: %s\n",
 		       str_error_r(errno, sbuf, sizeof(sbuf)));
-		dso->data.status = DSO_DATA_STATUS_ERROR;
+		dso__data(dso)->status = DSO_DATA_STATUS_ERROR;
 		goto out;
 	}
-	dso->data.file_size = st.st_size;
+	dso__data(dso)->file_size = st.st_size;
 
 out:
 	pthread_mutex_unlock(&dso__data_open_lock);
@@ -1082,13 +1092,13 @@ static int file_size(struct dso *dso, struct machine *machine)
 
 int dso__data_file_size(struct dso *dso, struct machine *machine)
 {
-	if (dso->data.file_size)
+	if (dso__data(dso)->file_size)
 		return 0;
 
-	if (dso->data.status == DSO_DATA_STATUS_ERROR)
+	if (dso__data(dso)->status == DSO_DATA_STATUS_ERROR)
 		return -1;
 #ifdef HAVE_LIBBPF_SUPPORT
-	if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO)
+	if (dso__binary_type(dso) == DSO_BINARY_TYPE__BPF_PROG_INFO)
 		return bpf_size(dso);
 #endif
 	return file_size(dso, machine);
@@ -1107,7 +1117,7 @@ off_t dso__data_size(struct dso *dso, struct machine *machine)
 		return -1;
 
 	/* For now just estimate dso data size is close to file size */
-	return dso->data.file_size;
+	return dso__data(dso)->file_size;
 }
 
 static ssize_t data_read_write_offset(struct dso *dso, struct machine *machine,
@@ -1118,7 +1128,7 @@ static ssize_t data_read_write_offset(struct dso *dso, struct machine *machine,
 		return -1;
 
 	/* Check the offset sanity. */
-	if (offset > dso->data.file_size)
+	if (offset > dso__data(dso)->file_size)
 		return -1;
 
 	if (offset + size < offset)
@@ -1141,7 +1151,7 @@ static ssize_t data_read_write_offset(struct dso *dso, struct machine *machine,
 ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 			      u64 offset, u8 *data, ssize_t size)
 {
-	if (dso->data.status == DSO_DATA_STATUS_ERROR)
+	if (dso__data(dso)->status == DSO_DATA_STATUS_ERROR)
 		return -1;
 
 	return data_read_write_offset(dso, machine, offset, data, size, true);
@@ -1181,7 +1191,7 @@ ssize_t dso__data_write_cache_offs(struct dso *dso, struct machine *machine,
 {
 	u8 *data = (u8 *)data_in; /* cast away const to use same fns for r/w */
 
-	if (dso->data.status == DSO_DATA_STATUS_ERROR)
+	if (dso__data(dso)->status == DSO_DATA_STATUS_ERROR)
 		return -1;
 
 	return data_read_write_offset(dso, machine, offset, data, size, false);
@@ -1234,7 +1244,7 @@ struct dso *machine__findnew_kernel(struct machine *machine, const char *name,
 	 */
 	if (dso != NULL) {
 		dso__set_short_name(dso, short_name, false);
-		dso->kernel = dso_type;
+		dso__set_kernel(dso, dso_type);
 	}
 
 	return dso;
@@ -1242,7 +1252,7 @@ struct dso *machine__findnew_kernel(struct machine *machine, const char *name,
 
 static void dso__set_long_name_id(struct dso *dso, const char *name, bool name_allocated)
 {
-	struct dsos *dsos = dso->dsos;
+	struct dsos *dsos = dso__dsos(dso);
 
 	if (name == NULL)
 		return;
@@ -1255,12 +1265,12 @@ static void dso__set_long_name_id(struct dso *dso, const char *name, bool name_a
 		down_write(&dsos->lock);
 	}
 
-	if (dso->long_name_allocated)
-		free((char *)dso->long_name);
+	if (dso__long_name_allocated(dso))
+		free((char *)dso__long_name(dso));
 
-	dso->long_name		 = name;
-	dso->long_name_len	 = strlen(name);
-	dso->long_name_allocated = name_allocated;
+	RC_CHK_ACCESS(dso)->long_name = name;
+	RC_CHK_ACCESS(dso)->long_name_len = strlen(name);
+	dso__set_long_name_allocated(dso, name_allocated);
 
 	if (dsos) {
 		dsos->sorted = false;
@@ -1306,14 +1316,15 @@ bool dso_id__empty(const struct dso_id *id)
 
 void __dso__inject_id(struct dso *dso, struct dso_id *id)
 {
-	struct dsos *dsos = dso->dsos;
+	struct dsos *dsos = dso__dsos(dso);
+	struct dso_id *dso_id = dso__id(dso);
 
 	/* dsos write lock held by caller. */
 
-	dso->id.maj = id->maj;
-	dso->id.min = id->min;
-	dso->id.ino = id->ino;
-	dso->id.ino_generation = id->ino_generation;
+	dso_id->maj = id->maj;
+	dso_id->min = id->min;
+	dso_id->ino = id->ino;
+	dso_id->ino_generation = id->ino_generation;
 
 	if (dsos)
 		dsos->sorted = false;
@@ -1333,7 +1344,7 @@ int dso_id__cmp(const struct dso_id *a, const struct dso_id *b)
 
 int dso__cmp_id(struct dso *a, struct dso *b)
 {
-	return __dso_id__cmp(&a->id, &b->id);
+	return __dso_id__cmp(dso__id(a), dso__id(b));
 }
 
 void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
@@ -1343,7 +1354,7 @@ void dso__set_long_name(struct dso *dso, const char *name, bool name_allocated)
 
 void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated)
 {
-	struct dsos *dsos = dso->dsos;
+	struct dsos *dsos = dso__dsos(dso);
 
 	if (name == NULL)
 		return;
@@ -1355,12 +1366,12 @@ void dso__set_short_name(struct dso *dso, const char *name, bool name_allocated)
 		 */
 		down_write(&dsos->lock);
 	}
-	if (dso->short_name_allocated)
-		free((char *)dso->short_name);
+	if (dso__short_name_allocated(dso))
+		free((char *)dso__short_name(dso));
 
-	dso->short_name		  = name;
-	dso->short_name_len	  = strlen(name);
-	dso->short_name_allocated = name_allocated;
+	RC_CHK_ACCESS(dso)->short_name		  = name;
+	RC_CHK_ACCESS(dso)->short_name_len	  = strlen(name);
+	dso__set_short_name_allocated(dso, name_allocated);
 
 	if (dsos) {
 		dsos->sorted = false;
@@ -1373,44 +1384,46 @@ int dso__name_len(const struct dso *dso)
 	if (!dso)
 		return strlen("[unknown]");
 	if (verbose > 0)
-		return dso->long_name_len;
+		return dso__long_name_len(dso);
 
-	return dso->short_name_len;
+	return dso__short_name_len(dso);
 }
 
 bool dso__loaded(const struct dso *dso)
 {
-	return dso->loaded;
+	return RC_CHK_ACCESS(dso)->loaded;
 }
 
 bool dso__sorted_by_name(const struct dso *dso)
 {
-	return dso->sorted_by_name;
+	return RC_CHK_ACCESS(dso)->sorted_by_name;
 }
 
 void dso__set_sorted_by_name(struct dso *dso)
 {
-	dso->sorted_by_name = true;
+	RC_CHK_ACCESS(dso)->sorted_by_name = true;
 }
 
 struct dso *dso__new_id(const char *name, struct dso_id *id)
 {
-	struct dso *dso = calloc(1, sizeof(*dso) + strlen(name) + 1);
+	RC_STRUCT(dso) *dso = zalloc(sizeof(*dso) + strlen(name) + 1);
+	struct dso *res;
+	struct dso_data *data;
 
-	if (dso != NULL) {
+	if (!dso)
+		return NULL;
+
+	if (ADD_RC_CHK(res, dso)) {
 		strcpy(dso->name, name);
 		if (id)
 			dso->id = *id;
-		dso__set_long_name_id(dso, dso->name, false);
-		dso__set_short_name(dso, dso->name, false);
+		dso__set_long_name_id(res, dso->name, false);
+		dso__set_short_name(res, dso->name, false);
 		dso->symbols = RB_ROOT_CACHED;
 		dso->symbol_names = NULL;
 		dso->symbol_names_len = 0;
-		dso->data.cache = RB_ROOT;
 		dso->inlined_nodes = RB_ROOT_CACHED;
 		dso->srclines = RB_ROOT_CACHED;
-		dso->data.fd = -1;
-		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->binary_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->is_64_bit = (sizeof(void *) == 8);
@@ -1424,12 +1437,16 @@ struct dso *dso__new_id(const char *name, struct dso_id *id)
 		dso->is_kmod = 0;
 		dso->needs_swap = DSO_SWAP__UNSET;
 		dso->comp = COMP_ID__NONE;
-		INIT_LIST_HEAD(&dso->data.open_entry);
 		mutex_init(&dso->lock);
 		refcount_set(&dso->refcnt, 1);
+		data = &dso->data;
+		data->cache = RB_ROOT;
+		data->fd = -1;
+		data->status = DSO_DATA_STATUS_UNKNOWN;
+		INIT_LIST_HEAD(&data->open_entry);
+		data->dso = NULL; /* Set when on the open_entry list. */
 	}
-
-	return dso;
+	return res;
 }
 
 struct dso *dso__new(const char *name)
@@ -1439,68 +1456,75 @@ struct dso *dso__new(const char *name)
 
 void dso__delete(struct dso *dso)
 {
-	if (dso->dsos)
-		pr_err("DSO %s is still in rbtree when being deleted!\n", dso->long_name);
+	if (dso__dsos(dso))
+		pr_err("DSO %s is still in rbtree when being deleted!\n", dso__long_name(dso));
 
 	/* free inlines first, as they reference symbols */
-	inlines__tree_delete(&dso->inlined_nodes);
-	srcline__tree_delete(&dso->srclines);
-	symbols__delete(&dso->symbols);
-	dso->symbol_names_len = 0;
-	zfree(&dso->symbol_names);
-	if (dso->short_name_allocated) {
-		zfree((char **)&dso->short_name);
-		dso->short_name_allocated = false;
+	inlines__tree_delete(&RC_CHK_ACCESS(dso)->inlined_nodes);
+	srcline__tree_delete(&RC_CHK_ACCESS(dso)->srclines);
+	symbols__delete(&RC_CHK_ACCESS(dso)->symbols);
+	RC_CHK_ACCESS(dso)->symbol_names_len = 0;
+	zfree(&RC_CHK_ACCESS(dso)->symbol_names);
+	if (RC_CHK_ACCESS(dso)->short_name_allocated) {
+		zfree((char **)&RC_CHK_ACCESS(dso)->short_name);
+		RC_CHK_ACCESS(dso)->short_name_allocated = false;
 	}
 
-	if (dso->long_name_allocated) {
-		zfree((char **)&dso->long_name);
-		dso->long_name_allocated = false;
+	if (RC_CHK_ACCESS(dso)->long_name_allocated) {
+		zfree((char **)&RC_CHK_ACCESS(dso)->long_name);
+		RC_CHK_ACCESS(dso)->long_name_allocated = false;
 	}
 
 	dso__data_close(dso);
-	auxtrace_cache__free(dso->auxtrace_cache);
+	auxtrace_cache__free(RC_CHK_ACCESS(dso)->auxtrace_cache);
 	dso_cache__free(dso);
 	dso__free_a2l(dso);
-	zfree(&dso->symsrc_filename);
-	nsinfo__zput(dso->nsinfo);
-	mutex_destroy(&dso->lock);
-	free(dso);
+	zfree(&RC_CHK_ACCESS(dso)->symsrc_filename);
+	nsinfo__zput(RC_CHK_ACCESS(dso)->nsinfo);
+	mutex_destroy(dso__lock(dso));
+	RC_CHK_FREE(dso);
 }
 
 struct dso *dso__get(struct dso *dso)
 {
-	if (dso)
-		refcount_inc(&dso->refcnt);
-	return dso;
+	struct dso *result;
+
+	if (RC_CHK_GET(result, dso))
+		refcount_inc(&RC_CHK_ACCESS(dso)->refcnt);
+
+	return result;
 }
 
 void dso__put(struct dso *dso)
 {
-	if (dso && refcount_dec_and_test(&dso->refcnt))
+	if (dso && refcount_dec_and_test(&RC_CHK_ACCESS(dso)->refcnt))
 		dso__delete(dso);
+	else
+		RC_CHK_PUT(dso);
 }
 
 void dso__set_build_id(struct dso *dso, struct build_id *bid)
 {
-	dso->bid = *bid;
-	dso->has_build_id = 1;
+	RC_CHK_ACCESS(dso)->bid = *bid;
+	RC_CHK_ACCESS(dso)->has_build_id = 1;
 }
 
 bool dso__build_id_equal(const struct dso *dso, struct build_id *bid)
 {
-	if (dso->bid.size > bid->size && dso->bid.size == BUILD_ID_SIZE) {
+	const struct build_id *dso_bid = dso__bid_const(dso);
+
+	if (dso_bid->size > bid->size && dso_bid->size == BUILD_ID_SIZE) {
 		/*
 		 * For the backward compatibility, it allows a build-id has
 		 * trailing zeros.
 		 */
-		return !memcmp(dso->bid.data, bid->data, bid->size) &&
-			!memchr_inv(&dso->bid.data[bid->size], 0,
-				    dso->bid.size - bid->size);
+		return !memcmp(dso_bid->data, bid->data, bid->size) &&
+			!memchr_inv(&dso_bid->data[bid->size], 0,
+				    dso_bid->size - bid->size);
 	}
 
-	return dso->bid.size == bid->size &&
-	       memcmp(dso->bid.data, bid->data, dso->bid.size) == 0;
+	return dso_bid->size == bid->size &&
+	       memcmp(dso_bid->data, bid->data, dso_bid->size) == 0;
 }
 
 void dso__read_running_kernel_build_id(struct dso *dso, struct machine *machine)
@@ -1510,8 +1534,8 @@ void dso__read_running_kernel_build_id(struct dso *dso, struct machine *machine)
 	if (machine__is_default_guest(machine))
 		return;
 	sprintf(path, "%s/sys/kernel/notes", machine->root_dir);
-	if (sysfs__read_build_id(path, &dso->bid) == 0)
-		dso->has_build_id = true;
+	if (sysfs__read_build_id(path, dso__bid(dso)) == 0)
+		dso__set_has_build_id(dso);
 }
 
 int dso__kernel_module_get_build_id(struct dso *dso,
@@ -1522,14 +1546,14 @@ int dso__kernel_module_get_build_id(struct dso *dso,
 	 * kernel module short names are of the form "[module]" and
 	 * we need just "module" here.
 	 */
-	const char *name = dso->short_name + 1;
+	const char *name = dso__short_name(dso) + 1;
 
 	snprintf(filename, sizeof(filename),
 		 "%s/sys/module/%.*s/notes/.note.gnu.build-id",
 		 root_dir, (int)strlen(name) - 1, name);
 
-	if (sysfs__read_build_id(filename, &dso->bid) == 0)
-		dso->has_build_id = true;
+	if (sysfs__read_build_id(filename, dso__bid(dso)) == 0)
+		dso__set_has_build_id(dso);
 
 	return 0;
 }
@@ -1538,21 +1562,21 @@ static size_t dso__fprintf_buildid(struct dso *dso, FILE *fp)
 {
 	char sbuild_id[SBUILD_ID_SIZE];
 
-	build_id__sprintf(&dso->bid, sbuild_id);
+	build_id__sprintf(dso__bid(dso), sbuild_id);
 	return fprintf(fp, "%s", sbuild_id);
 }
 
 size_t dso__fprintf(struct dso *dso, FILE *fp)
 {
 	struct rb_node *nd;
-	size_t ret = fprintf(fp, "dso: %s (", dso->short_name);
+	size_t ret = fprintf(fp, "dso: %s (", dso__short_name(dso));
 
-	if (dso->short_name != dso->long_name)
-		ret += fprintf(fp, "%s, ", dso->long_name);
+	if (dso__short_name(dso) != dso__long_name(dso))
+		ret += fprintf(fp, "%s, ", dso__long_name(dso));
 	ret += fprintf(fp, "%sloaded, ", dso__loaded(dso) ? "" : "NOT ");
 	ret += dso__fprintf_buildid(dso, fp);
 	ret += fprintf(fp, ")\n");
-	for (nd = rb_first_cached(&dso->symbols); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(dso__symbols(dso)); nd; nd = rb_next(nd)) {
 		struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
 		ret += symbol__fprintf(pos, fp);
 	}
@@ -1576,7 +1600,7 @@ enum dso_type dso__type(struct dso *dso, struct machine *machine)
 
 int dso__strerror_load(struct dso *dso, char *buf, size_t buflen)
 {
-	int idx, errnum = dso->load_errno;
+	int idx, errnum = *dso__load_errno(dso);
 	/*
 	 * This must have a same ordering as the enum dso_load_errno.
 	 */
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index fd500583cd2e..fa311ffd2538 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -11,6 +11,7 @@
 #include <linux/bitops.h>
 #include "build-id.h"
 #include "mutex.h"
+#include <internal/rc_check.h>
 
 struct machine;
 struct map;
@@ -100,26 +101,27 @@ enum dso_load_errno {
 	__DSO_LOAD_ERRNO__END,
 };
 
-#define DSO__SWAP(dso, type, val)			\
-({							\
-	type ____r = val;				\
-	BUG_ON(dso->needs_swap == DSO_SWAP__UNSET);	\
-	if (dso->needs_swap == DSO_SWAP__YES) {		\
-		switch (sizeof(____r)) {		\
-		case 2:					\
-			____r = bswap_16(val);		\
-			break;				\
-		case 4:					\
-			____r = bswap_32(val);		\
-			break;				\
-		case 8:					\
-			____r = bswap_64(val);		\
-			break;				\
-		default:				\
-			BUG_ON(1);			\
-		}					\
-	}						\
-	____r;						\
+#define DSO__SWAP(dso, type, val)				\
+({								\
+	type ____r = val;					\
+	enum dso_swap_type ___dst = dso__needs_swap(dso);	\
+	BUG_ON(___dst == DSO_SWAP__UNSET);			\
+	if (___dst == DSO_SWAP__YES) {				\
+		switch (sizeof(____r)) {			\
+		case 2:						\
+			____r = bswap_16(val);			\
+			break;					\
+		case 4:						\
+			____r = bswap_32(val);			\
+			break;					\
+		case 8:						\
+			____r = bswap_64(val);			\
+			break;					\
+		default:					\
+			BUG_ON(1);				\
+		}						\
+	}							\
+	____r;							\
 })
 
 #define DSO__DATA_CACHE_SIZE 4096
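
DSO__SWAP() is only reworked to read the swap mode once through
dso__needs_swap() instead of dereferencing dso->needs_swap directly;
behaviour is unchanged. Illustrative use (hypothetical helper, not part
of this patch):

  static u32 dso__read_u32(struct dso *dso, u32 raw)
  {
  	/* BUGs if the swap mode was never determined (DSO_SWAP__UNSET),
  	 * and byte-swaps only when dso__needs_swap() is DSO_SWAP__YES. */
  	return DSO__SWAP(dso, u32, raw);
  }
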
@@ -142,9 +144,29 @@ struct dso_cache {
 	char data[];
 };
 
+struct dso_data {
+	struct rb_root	 cache;
+	struct list_head open_entry;
+	struct dso	 *dso;
+	int		 fd;
+	int		 status;
+	u32		 status_seen;
+	u64		 file_size;
+	u64		 elf_base_addr;
+	u64		 debug_frame_offset;
+	u64		 eh_frame_hdr_addr;
+	u64		 eh_frame_hdr_offset;
+};
+
+struct dso_bpf_prog {
+	u32		id;
+	u32		sub_id;
+	struct perf_env	*env;
+};
+
 struct auxtrace_cache;
 
-struct dso {
+DECLARE_RC_STRUCT(dso) {
 	struct mutex	 lock;
 	struct dsos	 *dsos;
 	struct rb_root_cached symbols;
@@ -173,24 +195,9 @@ struct dso {
 		u64	 db_id;
 	};
 	/* bpf prog information */
-	struct {
-		struct perf_env	*env;
-		u32		id;
-		u32		sub_id;
-	} bpf_prog;
+	struct dso_bpf_prog bpf_prog;
 	/* dso data file */
-	struct {
-		struct rb_root	 cache;
-		struct list_head open_entry;
-		u64		 file_size;
-		u64		 elf_base_addr;
-		u64		 debug_frame_offset;
-		u64		 eh_frame_hdr_addr;
-		u64		 eh_frame_hdr_offset;
-		int		 fd;
-		int		 status;
-		u32		 status_seen;
-	} data;
+	struct dso_data	 data;
 	struct dso_id	 id;
 	unsigned int	 a2l_fails;
 	int		 comp;
@@ -226,11 +233,378 @@ struct dso {
  * @n: the 'struct rb_node *' to use as a temporary storage
  */
 #define dso__for_each_symbol(dso, pos, n)	\
-	symbols__for_each_entry(&(dso)->symbols, pos, n)
+	symbols__for_each_entry(dso__symbols(dso), pos, n)
+
+static inline void *dso__a2l(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->a2l;
+}
+
+static inline void dso__set_a2l(struct dso *dso, void *val)
+{
+	RC_CHK_ACCESS(dso)->a2l = val;
+}
+
+static inline unsigned int dso__a2l_fails(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->a2l_fails;
+}
+
+static inline void dso__set_a2l_fails(struct dso *dso, unsigned int val)
+{
+	RC_CHK_ACCESS(dso)->a2l_fails = val;
+}
+
+static inline bool dso__adjust_symbols(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->adjust_symbols;
+}
+
+static inline void dso__set_adjust_symbols(struct dso *dso, bool val)
+{
+	RC_CHK_ACCESS(dso)->adjust_symbols = val;
+}
+
+static inline bool dso__annotate_warned(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->annotate_warned;
+}
+
+static inline void dso__set_annotate_warned(struct dso *dso)
+{
+	RC_CHK_ACCESS(dso)->annotate_warned = 1;
+}
+
+static inline struct auxtrace_cache *dso__auxtrace_cache(struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->auxtrace_cache;
+}
+
+static inline void dso__set_auxtrace_cache(struct dso *dso, struct auxtrace_cache *cache)
+{
+	RC_CHK_ACCESS(dso)->auxtrace_cache = cache;
+}
+
+static inline struct build_id *dso__bid(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->bid;
+}
+
+static inline const struct build_id *dso__bid_const(const struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->bid;
+}
+
+static inline struct dso_bpf_prog *dso__bpf_prog(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->bpf_prog;
+}
+
+static inline bool dso__has_build_id(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->has_build_id;
+}
+
+static inline void dso__set_has_build_id(struct dso *dso)
+{
+	RC_CHK_ACCESS(dso)->has_build_id = true;
+}
+
+static inline bool dso__has_srcline(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->has_srcline;
+}
+
+static inline void dso__set_has_srcline(struct dso *dso, bool val)
+{
+	RC_CHK_ACCESS(dso)->has_srcline = val;
+}
+
+static inline int dso__comp(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->comp;
+}
+
+static inline void dso__set_comp(struct dso *dso, int comp)
+{
+	RC_CHK_ACCESS(dso)->comp = comp;
+}
+
+static inline struct dso_data *dso__data(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->data;
+}
+
+static inline u64 dso__db_id(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->db_id;
+}
+
+static inline void dso__set_db_id(struct dso *dso, u64 db_id)
+{
+	RC_CHK_ACCESS(dso)->db_id = db_id;
+}
+
+static inline struct dsos *dso__dsos(struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->dsos;
+}
+
+static inline void dso__set_dsos(struct dso *dso, struct dsos *dsos)
+{
+	RC_CHK_ACCESS(dso)->dsos = dsos;
+}
+
+static inline bool dso__header_build_id(struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->header_build_id;
+}
+
+static inline void dso__set_header_build_id(struct dso *dso, bool val)
+{
+	RC_CHK_ACCESS(dso)->header_build_id = val;
+}
+
+static inline bool dso__hit(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->hit;
+}
+
+static inline void dso__set_hit(struct dso *dso)
+{
+	RC_CHK_ACCESS(dso)->hit = 1;
+}
+
+static inline struct dso_id *dso__id(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->id;
+}
+
+static inline const struct dso_id *dso__id_const(const struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->id;
+}
+
+static inline struct rb_root_cached *dso__inlined_nodes(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->inlined_nodes;
+}
+
+static inline bool dso__is_64_bit(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->is_64_bit;
+}
+
+static inline void dso__set_is_64_bit(struct dso *dso, bool is)
+{
+	RC_CHK_ACCESS(dso)->is_64_bit = is;
+}
+
+static inline bool dso__is_kmod(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->is_kmod;
+}
+
+static inline void dso__set_is_kmod(struct dso *dso)
+{
+	RC_CHK_ACCESS(dso)->is_kmod = 1;
+}
+
+static inline enum dso_space_type dso__kernel(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->kernel;
+}
+
+static inline void dso__set_kernel(struct dso *dso, enum dso_space_type kernel)
+{
+	RC_CHK_ACCESS(dso)->kernel = kernel;
+}
+
+static inline u64 dso__last_find_result_addr(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->last_find_result.addr;
+}
+
+static inline void dso__set_last_find_result_addr(struct dso *dso, u64 addr)
+{
+	RC_CHK_ACCESS(dso)->last_find_result.addr = addr;
+}
+
+static inline struct symbol *dso__last_find_result_symbol(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->last_find_result.symbol;
+}
+
+static inline void dso__set_last_find_result_symbol(struct dso *dso, struct symbol *symbol)
+{
+	RC_CHK_ACCESS(dso)->last_find_result.symbol = symbol;
+}
+
+static inline enum dso_load_errno *dso__load_errno(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->load_errno;
+}
 
 static inline void dso__set_loaded(struct dso *dso)
 {
-	dso->loaded = true;
+	RC_CHK_ACCESS(dso)->loaded = true;
+}
+
+static inline struct mutex *dso__lock(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->lock;
+}
+
+static inline const char *dso__long_name(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->long_name;
+}
+
+static inline bool dso__long_name_allocated(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->long_name_allocated;
+}
+
+static inline void dso__set_long_name_allocated(struct dso *dso, bool allocated)
+{
+	RC_CHK_ACCESS(dso)->long_name_allocated = allocated;
+}
+
+static inline u16 dso__long_name_len(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->long_name_len;
+}
+
+static inline const char *dso__name(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->name;
+}
+
+static inline enum dso_swap_type dso__needs_swap(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->needs_swap;
+}
+
+static inline void dso__set_needs_swap(struct dso *dso, enum dso_swap_type type)
+{
+	RC_CHK_ACCESS(dso)->needs_swap = type;
+}
+
+static inline struct nsinfo *dso__nsinfo(struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->nsinfo;
+}
+
+static inline const struct nsinfo *dso__nsinfo_const(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->nsinfo;
+}
+
+static inline struct nsinfo **dso__nsinfo_ptr(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->nsinfo;
+}
+
+void dso__set_nsinfo(struct dso *dso, struct nsinfo *nsi);
+
+static inline u8 dso__rel(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->rel;
+}
+
+static inline void dso__set_rel(struct dso *dso, u8 rel)
+{
+	RC_CHK_ACCESS(dso)->rel = rel;
+}
+
+static inline const char *dso__short_name(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->short_name;
+}
+
+static inline bool dso__short_name_allocated(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->short_name_allocated;
+}
+
+static inline void dso__set_short_name_allocated(struct dso *dso, bool allocated)
+{
+	RC_CHK_ACCESS(dso)->short_name_allocated = allocated;
+}
+
+static inline u16 dso__short_name_len(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->short_name_len;
+}
+
+static inline struct rb_root_cached *dso__srclines(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->srclines;
+}
+
+static inline struct rb_root_cached *dso__symbols(struct dso *dso)
+{
+	return &RC_CHK_ACCESS(dso)->symbols;
+}
+
+static inline struct symbol **dso__symbol_names(struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->symbol_names;
+}
+
+static inline void dso__set_symbol_names(struct dso *dso, struct symbol **names)
+{
+	RC_CHK_ACCESS(dso)->symbol_names = names;
+}
+
+static inline size_t dso__symbol_names_len(struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->symbol_names_len;
+}
+
+static inline void dso__set_symbol_names_len(struct dso *dso, size_t len)
+{
+	RC_CHK_ACCESS(dso)->symbol_names_len = len;
+}
+
+static inline const char *dso__symsrc_filename(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->symsrc_filename;
+}
+
+static inline void dso__set_symsrc_filename(struct dso *dso, char *val)
+{
+	RC_CHK_ACCESS(dso)->symsrc_filename = val;
+}
+
+static inline enum dso_binary_type dso__symtab_type(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->symtab_type;
+}
+
+static inline void dso__set_symtab_type(struct dso *dso, enum dso_binary_type bt)
+{
+	RC_CHK_ACCESS(dso)->symtab_type = bt;
+}
+
+static inline u64 dso__text_end(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->text_end;
+}
+
+static inline void dso__set_text_end(struct dso *dso, u64 val)
+{
+	RC_CHK_ACCESS(dso)->text_end = val;
+}
+
+static inline u64 dso__text_offset(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->text_offset;
+}
+
+static inline void dso__set_text_offset(struct dso *dso, u64 val)
+{
+	RC_CHK_ACCESS(dso)->text_offset = val;
 }
 
 int dso_id__cmp(const struct dso_id *a, const struct dso_id *b);
@@ -262,7 +636,7 @@ bool dso__loaded(const struct dso *dso);
 
 static inline bool dso__has_symbols(const struct dso *dso)
 {
-	return !RB_EMPTY_ROOT(&dso->symbols.rb_root);
+	return !RB_EMPTY_ROOT(&RC_CHK_ACCESS(dso)->symbols.rb_root);
 }
 
 char *dso__filename_with_chroot(const struct dso *dso, const char *filename);
@@ -378,21 +752,33 @@ void dso__reset_find_symbol_cache(struct dso *dso);
 size_t dso__fprintf_symbols_by_name(struct dso *dso, FILE *fp);
 size_t dso__fprintf(struct dso *dso, FILE *fp);
 
+static inline enum dso_binary_type dso__binary_type(const struct dso *dso)
+{
+	return RC_CHK_ACCESS(dso)->binary_type;
+}
+
+static inline void dso__set_binary_type(struct dso *dso, enum dso_binary_type bt)
+{
+	RC_CHK_ACCESS(dso)->binary_type = bt;
+}
+
 static inline bool dso__is_vmlinux(const struct dso *dso)
 {
-	return dso->binary_type == DSO_BINARY_TYPE__VMLINUX ||
-	       dso->binary_type == DSO_BINARY_TYPE__GUEST_VMLINUX;
+	enum dso_binary_type bt = dso__binary_type(dso);
+
+	return bt == DSO_BINARY_TYPE__VMLINUX || bt == DSO_BINARY_TYPE__GUEST_VMLINUX;
 }
 
 static inline bool dso__is_kcore(const struct dso *dso)
 {
-	return dso->binary_type == DSO_BINARY_TYPE__KCORE ||
-	       dso->binary_type == DSO_BINARY_TYPE__GUEST_KCORE;
+	enum dso_binary_type bt = dso__binary_type(dso);
+
+	return bt == DSO_BINARY_TYPE__KCORE || bt == DSO_BINARY_TYPE__GUEST_KCORE;
 }
 
 static inline bool dso__is_kallsyms(const struct dso *dso)
 {
-	return dso->kernel && dso->long_name[0] != '/';
+	return RC_CHK_ACCESS(dso)->kernel && RC_CHK_ACCESS(dso)->long_name[0] != '/';
 }
 
 bool dso__is_object_file(const struct dso *dso);
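
All of the accessors above funnel through RC_CHK_ACCESS so that, when
reference count checking is compiled in, every field access goes via
the checked wrapper. A minimal sketch of the pattern being assumed
(simplified from rc_check.h; illustrative only, not part of the patch):

  #ifndef REFCNT_CHECKING
  #define RC_CHK_ACCESS(object) (object)         /* plain pointer */
  #else
  #define RC_CHK_ACCESS(object) ((object)->orig) /* checked wrapper */
  #endif

  static inline bool dso__hit(const struct dso *dso)
  {
          /* compiles down to a plain field read when checking is off */
          return RC_CHK_ACCESS(dso)->hit;
  }
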
diff --git a/tools/perf/util/dsos.c b/tools/perf/util/dsos.c
index 23c3fe4f2abb..ab3d0c01dd63 100644
--- a/tools/perf/util/dsos.c
+++ b/tools/perf/util/dsos.c
@@ -29,8 +29,8 @@ static void dsos__purge(struct dsos *dsos)
 	for (unsigned int i = 0; i < dsos->cnt; i++) {
 		struct dso *dso = dsos->dsos[i];
 
+		dso__set_dsos(dso, NULL);
 		dso__put(dso);
-		dso->dsos = NULL;
 	}
 
 	zfree(&dsos->dsos);
@@ -73,22 +73,22 @@ static int dsos__read_build_ids_cb(struct dso *dso, void *data)
 	struct dsos__read_build_ids_cb_args *args = data;
 	struct nscookie nsc;
 
-	if (args->with_hits && !dso->hit && !dso__is_vdso(dso))
+	if (args->with_hits && !dso__hit(dso) && !dso__is_vdso(dso))
 		return 0;
-	if (dso->has_build_id) {
+	if (dso__has_build_id(dso)) {
 		args->have_build_id = true;
 		return 0;
 	}
-	nsinfo__mountns_enter(dso->nsinfo, &nsc);
-	if (filename__read_build_id(dso->long_name, &dso->bid) > 0) {
+	nsinfo__mountns_enter(dso__nsinfo(dso), &nsc);
+	if (filename__read_build_id(dso__long_name(dso), dso__bid(dso)) > 0) {
 		args->have_build_id = true;
-		dso->has_build_id = true;
-	} else if (errno == ENOENT && dso->nsinfo) {
-		char *new_name = dso__filename_with_chroot(dso, dso->long_name);
+		dso__set_has_build_id(dso);
+	} else if (errno == ENOENT && dso__nsinfo(dso)) {
+		char *new_name = dso__filename_with_chroot(dso, dso__long_name(dso));
 
-		if (new_name && filename__read_build_id(new_name, &dso->bid) > 0) {
+		if (new_name && filename__read_build_id(new_name, dso__bid(dso)) > 0) {
 			args->have_build_id = true;
-			dso->has_build_id = true;
+			dso__set_has_build_id(dso);
 		}
 		free(new_name);
 	}
@@ -110,27 +110,27 @@ bool dsos__read_build_ids(struct dsos *dsos, bool with_hits)
 static int __dso__cmp_long_name(const char *long_name, const struct dso_id *id,
 				const struct dso *b)
 {
-	int rc = strcmp(long_name, b->long_name);
-	return rc ?: dso_id__cmp(id, &b->id);
+	int rc = strcmp(long_name, dso__long_name(b));
+	return rc ?: dso_id__cmp(id, dso__id_const(b));
 }
 
 static int __dso__cmp_short_name(const char *short_name, const struct dso_id *id,
 				 const struct dso *b)
 {
-	int rc = strcmp(short_name, b->short_name);
-	return rc ?: dso_id__cmp(id, &b->id);
+	int rc = strcmp(short_name, dso__short_name(b));
+	return rc ?: dso_id__cmp(id, dso__id_const(b));
 }
 
 static int dsos__cmp_long_name_id_short_name(const void *va, const void *vb)
 {
 	const struct dso *a = *((const struct dso **)va);
 	const struct dso *b = *((const struct dso **)vb);
-	int rc = strcmp(a->long_name, b->long_name);
+	int rc = strcmp(dso__long_name(a), dso__long_name(b));
 
 	if (!rc) {
-		rc = dso_id__cmp(&a->id, &b->id);
+		rc = dso_id__cmp(dso__id_const(a), dso__id_const(b));
 		if (!rc)
-			rc = strcmp(a->short_name, b->short_name);
+			rc = strcmp(dso__short_name(a), dso__short_name(b));
 	}
 	return rc;
 }
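
The `rc ?: dso_id__cmp(...)` in the comparators above is the GNU binary
conditional: the id comparison only runs when the names tie. An
equivalent portable sketch, using only helpers from this patch:

  int rc = strcmp(long_name, dso__long_name(b));

  if (rc == 0)                    /* names equal, fall back to ids */
          rc = dso_id__cmp(id, dso__id_const(b));
  return rc;
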
@@ -209,7 +209,7 @@ int __dsos__add(struct dsos *dsos, struct dso *dso)
 								 &dsos->dsos[dsos->cnt - 1])
 			<= 0;
 	}
-	dso->dsos = dsos;
+	dso__set_dsos(dso, dsos);
 	return 0;
 }
 
@@ -275,7 +275,7 @@ static void dso__set_basename(struct dso *dso)
 	char *base, *lname;
 	int tid;
 
-	if (sscanf(dso->long_name, "/tmp/perf-%d.map", &tid) == 1) {
+	if (sscanf(dso__long_name(dso), "/tmp/perf-%d.map", &tid) == 1) {
 		if (asprintf(&base, "[JIT] tid %d", tid) < 0)
 			return;
 	} else {
@@ -283,7 +283,7 @@ static void dso__set_basename(struct dso *dso)
 	       * basename() may modify path buffer, so we must pass
                * a copy.
                */
-		lname = strdup(dso->long_name);
+		lname = strdup(dso__long_name(dso));
 		if (!lname)
 			return;
 
@@ -322,7 +322,7 @@ static struct dso *__dsos__findnew_id(struct dsos *dsos, const char *name, struc
 {
 	struct dso *dso = __dsos__find_id(dsos, name, id, false, /*write_locked=*/true);
 
-	if (dso && dso_id__empty(&dso->id) && !dso_id__empty(id))
+	if (dso && dso_id__empty(dso__id(dso)) && !dso_id__empty(id))
 		__dso__inject_id(dso, id);
 
 	return dso ? dso : __dsos__addnew_id(dsos, name, id);
@@ -351,8 +351,8 @@ static int dsos__fprintf_buildid_cb(struct dso *dso, void *data)
 
 	if (args->skip && args->skip(dso, args->parm))
 		return 0;
-	build_id__sprintf(&dso->bid, sbuild_id);
-	args->ret += fprintf(args->fp, "%-40s %s\n", sbuild_id, dso->long_name);
+	build_id__sprintf(dso__bid(dso), sbuild_id);
+	args->ret += fprintf(args->fp, "%-40s %s\n", sbuild_id, dso__long_name(dso));
 	return 0;
 }
 
@@ -396,7 +396,7 @@ size_t dsos__fprintf(struct dsos *dsos, FILE *fp)
 
 static int dsos__hit_all_cb(struct dso *dso, void *data __maybe_unused)
 {
-	dso->hit = true;
+	dso__set_hit(dso);
 	return 0;
 }
 
@@ -432,7 +432,7 @@ struct dso *dsos__findnew_module_dso(struct dsos *dsos,
 	dso__set_basename(dso);
 	dso__set_module_info(dso, m, machine);
 	dso__set_long_name(dso,	strdup(filename), true);
-	dso->kernel = DSO_SPACE__KERNEL;
+	dso__set_kernel(dso, DSO_SPACE__KERNEL);
 	__dsos__add(dsos, dso);
 
 	up_write(&dsos->lock);
@@ -455,8 +455,8 @@ static int dsos__find_kernel_dso_cb(struct dso *dso, void *data)
 	 * Therefore, we pass PERF_RECORD_MISC_CPUMODE_UNKNOWN.
 	 * is_kernel_module() treats it as a kernel cpumode.
 	 */
-	if (!dso->kernel ||
-	    is_kernel_module(dso->long_name, PERF_RECORD_MISC_CPUMODE_UNKNOWN))
+	if (!dso__kernel(dso) ||
+	    is_kernel_module(dso__long_name(dso), PERF_RECORD_MISC_CPUMODE_UNKNOWN))
 		return 0;
 
 	*res = dso__get(dso);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 198903157f9e..f32f9abf6344 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -726,7 +726,7 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 	dso = al->map ? map__dso(al->map) : NULL;
 	dump_printf(" ...... dso: %s\n",
 		dso
-		? dso->long_name
+		? dso__long_name(dso)
 		: (al->level == 'H' ? "[hypervisor]" : "<not found>"));
 
 	if (thread__is_filtered(thread))
@@ -750,10 +750,10 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 	if (al->map) {
 		if (symbol_conf.dso_list &&
 		    (!dso || !(strlist__has_entry(symbol_conf.dso_list,
-						  dso->short_name) ||
-			       (dso->short_name != dso->long_name &&
+						  dso__short_name(dso)) ||
+			       (dso__short_name(dso) != dso__long_name(dso) &&
 				strlist__has_entry(symbol_conf.dso_list,
-						   dso->long_name))))) {
+						   dso__long_name(dso)))))) {
 			al->filtered |= (1 << HIST_FILTER__DSO);
 		}
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 55f63d2ee232..13e23c386601 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2299,7 +2299,7 @@ static int __event_process_build_id(struct perf_record_header_build_id *bev,
 
 		build_id__init(&bid, bev->data, size);
 		dso__set_build_id(dso, &bid);
-		dso->header_build_id = 1;
+		dso__set_header_build_id(dso, true);
 
 		if (dso_space != DSO_SPACE__USER) {
 			struct kmod_path m = { .name = NULL, };
@@ -2307,13 +2307,13 @@ static int __event_process_build_id(struct perf_record_header_build_id *bev,
 			if (!kmod_path__parse_name(&m, filename) && m.kmod)
 				dso__set_module_info(dso, &m, machine);
 
-			dso->kernel = dso_space;
+			dso__set_kernel(dso, dso_space);
 			free(m.name);
 		}
 
-		build_id__sprintf(&dso->bid, sbuild_id);
+		build_id__sprintf(dso__bid(dso), sbuild_id);
 		pr_debug("build id event received for %s: %s [%zu]\n",
-			 dso->long_name, sbuild_id, size);
+			 dso__long_name(dso), sbuild_id, size);
 		dso__put(dso);
 	}
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0888b7163b7c..a9eef8b5aff0 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2128,7 +2128,7 @@ static bool hists__filter_entry_by_dso(struct hists *hists,
 				       struct hist_entry *he)
 {
 	if (hists->dso_filter != NULL &&
-	    (he->ms.map == NULL || map__dso(he->ms.map) != hists->dso_filter)) {
+	    (he->ms.map == NULL || !RC_CHK_EQUAL(map__dso(he->ms.map), hists->dso_filter))) {
 		he->filtered |= (1 << HIST_FILTER__DSO);
 		return true;
 	}
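
The switch from `==` to RC_CHK_EQUAL above matters because, with
checking enabled, two live references to the same dso are distinct
wrapper allocations, so comparing the raw pointers would spuriously
fail. A sketch of the assumed macro shape (illustrative only):

  #ifndef REFCNT_CHECKING
  #define RC_CHK_EQUAL(a, b) ((a) == (b))
  #else /* compare the wrapped objects, not the wrappers */
  #define RC_CHK_EQUAL(a, b) \
          ((a) == (b) || ((a) && (b) && (a)->orig == (b)->orig))
  #endif
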
@@ -2808,7 +2808,7 @@ int __hists__scnprintf_title(struct hists *hists, char *bf, size_t size, bool sh
 	}
 	if (dso)
 		printed += scnprintf(bf + printed, size - printed,
-				    ", DSO: %s", dso->short_name);
+				     ", DSO: %s", dso__short_name(dso));
 	if (socket_id > -1)
 		printed += scnprintf(bf + printed, size - printed,
 				    ", Processor Socket: %d", socket_id);
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index f38893e0b036..04a291562b14 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -598,15 +598,15 @@ static struct auxtrace_cache *intel_pt_cache(struct dso *dso,
 	struct auxtrace_cache *c;
 	unsigned int bits;
 
-	if (dso->auxtrace_cache)
-		return dso->auxtrace_cache;
+	if (dso__auxtrace_cache(dso))
+		return dso__auxtrace_cache(dso);
 
 	bits = intel_pt_cache_size(dso, machine);
 
 	/* Ignoring cache creation failure */
 	c = auxtrace_cache__new(bits, sizeof(struct intel_pt_cache_entry), 200);
 
-	dso->auxtrace_cache = c;
+	dso__set_auxtrace_cache(dso, c);
 
 	return c;
 }
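
intel_pt_cache() above is a create-on-first-use getter: the per-dso
auxtrace cache is built lazily and a creation failure is tolerated by
storing NULL, leaving callers to decode without the cache. Sketch of
the shape, assuming the accessors above:

  struct auxtrace_cache *c = dso__auxtrace_cache(dso);

  if (!c) {
          c = auxtrace_cache__new(bits, entry_size, limit_pct);
          dso__set_auxtrace_cache(dso, c); /* may store NULL */
  }
  return c;                                /* callers handle NULL */
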
@@ -650,7 +650,7 @@ intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
 	if (!c)
 		return NULL;
 
-	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
+	return auxtrace_cache__lookup(dso__auxtrace_cache(dso), offset);
 }
 
 static void intel_pt_cache_invalidate(struct dso *dso, struct machine *machine,
@@ -661,7 +661,7 @@ static void intel_pt_cache_invalidate(struct dso *dso, struct machine *machine,
 	if (!c)
 		return;
 
-	auxtrace_cache__remove(dso->auxtrace_cache, offset);
+	auxtrace_cache__remove(dso__auxtrace_cache(dso), offset);
 }
 
 static inline bool intel_pt_guest_kernel_ip(uint64_t ip)
@@ -820,8 +820,8 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 		}
 		dso = map__dso(al.map);
 
-		if (dso->data.status == DSO_DATA_STATUS_ERROR &&
-			dso__data_status_seen(dso, DSO_DATA_STATUS_SEEN_ITRACE)) {
+		if (dso__data(dso)->status == DSO_DATA_STATUS_ERROR &&
+		    dso__data_status_seen(dso, DSO_DATA_STATUS_SEEN_ITRACE)) {
 			ret = -ENOENT;
 			goto out_ret;
 		}
@@ -854,7 +854,7 @@ static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
 		/* Load maps to ensure dso->is_64_bit has been updated */
 		map__load(al.map);
 
-		x86_64 = dso->is_64_bit;
+		x86_64 = dso__is_64_bit(dso);
 
 		while (1) {
 			len = dso__data_read_offset(dso, machine,
@@ -1008,7 +1008,7 @@ static int __intel_pt_pgd_ip(uint64_t ip, void *data)
 
 	offset = map__map_ip(al.map, ip);
 
-	res = intel_pt_match_pgd_ip(ptq->pt, ip, offset, map__dso(al.map)->long_name);
+	res = intel_pt_match_pgd_ip(ptq->pt, ip, offset, dso__long_name(map__dso(al.map)));
 	addr_location__exit(&al);
 	return res;
 }
@@ -3416,7 +3416,7 @@ static int intel_pt_text_poke(struct intel_pt *pt, union perf_event *event)
 		}
 
 		dso = map__dso(al.map);
-		if (!dso || !dso->auxtrace_cache)
+		if (!dso || !dso__auxtrace_cache(dso))
 			continue;
 
 		offset = map__map_ip(al.map, addr);
@@ -3436,7 +3436,7 @@ static int intel_pt_text_poke(struct intel_pt *pt, union perf_event *event)
 		} else {
 			intel_pt_cache_invalidate(dso, machine, offset);
 			intel_pt_log("Invalidated instruction cache for %s at %#"PRIx64"\n",
-				     dso->long_name, addr);
+				     dso__long_name(dso), addr);
 		}
 	}
 out:
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3646d4593502..d4534fbc7098 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -694,7 +694,7 @@ static int machine__process_ksymbol_register(struct machine *machine,
 			err = -ENOMEM;
 			goto out;
 		}
-		dso->kernel = DSO_SPACE__KERNEL;
+		dso__set_kernel(dso, DSO_SPACE__KERNEL);
 		map = map__new2(0, dso);
 		dso__put(dso);
 		if (!map) {
@@ -702,8 +702,8 @@ static int machine__process_ksymbol_register(struct machine *machine,
 			goto out;
 		}
 		if (event->ksymbol.ksym_type == PERF_RECORD_KSYMBOL_TYPE_OOL) {
-			dso->binary_type = DSO_BINARY_TYPE__OOL;
-			dso->data.file_size = event->ksymbol.len;
+			dso__set_binary_type(dso, DSO_BINARY_TYPE__OOL);
+			dso__data(dso)->file_size = event->ksymbol.len;
 			dso__set_loaded(dso);
 		}
 
@@ -718,7 +718,7 @@ static int machine__process_ksymbol_register(struct machine *machine,
 		dso__set_loaded(dso);
 
 		if (is_bpf_image(event->ksymbol.name)) {
-			dso->binary_type = DSO_BINARY_TYPE__BPF_IMAGE;
+			dso__set_binary_type(dso, DSO_BINARY_TYPE__BPF_IMAGE);
 			dso__set_long_name(dso, "", false);
 		}
 	} else {
@@ -888,17 +888,17 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp)
 	size_t printed = 0;
 	struct dso *kdso = machine__kernel_dso(machine);
 
-	if (kdso->has_build_id) {
+	if (dso__has_build_id(kdso)) {
 		char filename[PATH_MAX];
-		if (dso__build_id_filename(kdso, filename, sizeof(filename),
-					   false))
+
+		if (dso__build_id_filename(kdso, filename, sizeof(filename), false))
 			printed += fprintf(fp, "[0] %s\n", filename);
 	}
 
-	for (i = 0; i < vmlinux_path__nr_entries; ++i)
-		printed += fprintf(fp, "[%d] %s\n",
-				   i + kdso->has_build_id, vmlinux_path[i]);
-
+	for (i = 0; i < vmlinux_path__nr_entries; ++i) {
+		printed += fprintf(fp, "[%d] %s\n", i + dso__has_build_id(kdso),
+				   vmlinux_path[i]);
+	}
 	return printed;
 }
 
@@ -948,7 +948,7 @@ static struct dso *machine__get_kernel(struct machine *machine)
 						 DSO_SPACE__KERNEL_GUEST);
 	}
 
-	if (kernel != NULL && (!kernel->has_build_id))
+	if (kernel != NULL && (!dso__has_build_id(kernel)))
 		dso__read_running_kernel_build_id(kernel, machine);
 
 	return kernel;
@@ -1313,8 +1313,8 @@ static char *get_kernel_version(const char *root_dir)
 
 static bool is_kmod_dso(struct dso *dso)
 {
-	return dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
-	       dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE;
+	return dso__symtab_type(dso) == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
+	       dso__symtab_type(dso) == DSO_BINARY_TYPE__GUEST_KMODULE;
 }
 
 static int maps__set_module_path(struct maps *maps, const char *path, struct kmod_path *m)
@@ -1341,8 +1341,8 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo
 	 * we need to update the symtab_type if needed.
 	 */
 	if (m->comp && is_kmod_dso(dso)) {
-		dso->symtab_type++;
-		dso->comp = m->comp;
+		dso__set_symtab_type(dso, dso__symtab_type(dso) + 1);
+		dso__set_comp(dso, m->comp);
 	}
 	map__put(map);
 	return 0;
@@ -1639,13 +1639,13 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 		if (kernel == NULL)
 			goto out_problem;
 
-		kernel->kernel = dso_space;
+		dso__set_kernel(kernel, dso_space);
 		if (__machine__create_kernel_maps(machine, kernel) < 0) {
 			dso__put(kernel);
 			goto out_problem;
 		}
 
-		if (strstr(kernel->long_name, "vmlinux"))
+		if (strstr(dso__long_name(kernel), "vmlinux"))
 			dso__set_short_name(kernel, "[kernel.vmlinux]", false);
 
 		if (machine__update_kernel_mmap(machine, xm->start, xm->end) < 0) {
@@ -2027,14 +2027,14 @@ static char *callchain_srcline(struct map_symbol *ms, u64 ip)
 		return srcline;
 
 	dso = map__dso(map);
-	srcline = srcline__tree_find(&dso->srclines, ip);
+	srcline = srcline__tree_find(dso__srclines(dso), ip);
 	if (!srcline) {
 		bool show_sym = false;
 		bool show_addr = callchain_param.key == CCKEY_ADDRESS;
 
 		srcline = get_srcline(dso, map__rip_2objdump(map, ip),
 				      ms->sym, show_sym, show_addr, ip);
-		srcline__tree_insert(&dso->srclines, ip, srcline);
+		srcline__tree_insert(dso__srclines(dso), ip, srcline);
 	}
 
 	return srcline;
@@ -2832,12 +2832,12 @@ static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms
 	addr = map__rip_2objdump(map, addr);
 	dso = map__dso(map);
 
-	inline_node = inlines__tree_find(&dso->inlined_nodes, addr);
+	inline_node = inlines__tree_find(dso__inlined_nodes(dso), addr);
 	if (!inline_node) {
 		inline_node = dso__parse_addr_inlines(dso, addr, sym);
 		if (!inline_node)
 			return ret;
-		inlines__tree_insert(&dso->inlined_nodes, inline_node);
+		inlines__tree_insert(dso__inlined_nodes(dso), inline_node);
 	}
 
 	ilist_ms = (struct map_symbol) {
@@ -3126,7 +3126,7 @@ char *machine__resolve_kernel_addr(void *vmachine, unsigned long long *addrp, ch
 	if (sym == NULL)
 		return NULL;
 
-	*modp = __map__is_kmodule(map) ? (char *)map__dso(map)->short_name : NULL;
+	*modp = __map__is_kmodule(map) ? (char *)dso__short_name(map__dso(map)) : NULL;
 	*addrp = map__unmap_ip(map, sym->start);
 	return sym->name;
 }
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 7c1fff9e413d..14fb8cf65b13 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -168,7 +168,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 		if (dso == NULL)
 			goto out_delete;
 
-		assert(!dso->kernel);
+		assert(!dso__kernel(dso));
 		map__init(result, start, start + len, pgoff, dso);
 
 		if (anon || no_dso) {
@@ -182,10 +182,9 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 			if (!(prot & PROT_EXEC))
 				dso__set_loaded(dso);
 		}
-		mutex_lock(&dso->lock);
-		nsinfo__put(dso->nsinfo);
-		dso->nsinfo = nsi;
-		mutex_unlock(&dso->lock);
+		mutex_lock(dso__lock(dso));
+		dso__set_nsinfo(dso, nsi);
+		mutex_unlock(dso__lock(dso));
 
 		if (build_id__is_defined(bid)) {
 			dso__set_build_id(dso, bid);
@@ -197,9 +196,9 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 			 * have it missing.
 			 */
 			header_bid_dso = dsos__find(&machine->dsos, filename, false);
-			if (header_bid_dso && header_bid_dso->header_build_id) {
-				dso__set_build_id(dso, &header_bid_dso->bid);
-				dso->header_build_id = 1;
+			if (header_bid_dso && dso__header_build_id(header_bid_dso)) {
+				dso__set_build_id(dso, dso__bid(header_bid_dso));
+				dso__set_header_build_id(dso, true);
 			}
 		}
 		dso__put(dso);
@@ -221,7 +220,7 @@ struct map *map__new2(u64 start, struct dso *dso)
 	struct map *result;
 	RC_STRUCT(map) *map;
 
-	map = calloc(1, sizeof(*map) + (dso->kernel ? sizeof(struct kmap) : 0));
+	map = calloc(1, sizeof(*map) + (dso__kernel(dso) ? sizeof(struct kmap) : 0));
 	if (ADD_RC_CHK(result, map)) {
 		/*
 		 * ->end will be filled after we load all the symbols
@@ -234,7 +233,7 @@ struct map *map__new2(u64 start, struct dso *dso)
 
 bool __map__is_kernel(const struct map *map)
 {
-	if (!map__dso(map)->kernel)
+	if (!dso__kernel(map__dso(map)))
 		return false;
 	return machine__kernel_map(maps__machine(map__kmaps((struct map *)map))) == map;
 }
@@ -251,7 +250,7 @@ bool __map__is_bpf_prog(const struct map *map)
 	const char *name;
 	struct dso *dso = map__dso(map);
 
-	if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO)
+	if (dso__binary_type(dso) == DSO_BINARY_TYPE__BPF_PROG_INFO)
 		return true;
 
 	/*
@@ -259,7 +258,7 @@ bool __map__is_bpf_prog(const struct map *map)
 	 * type of DSO_BINARY_TYPE__BPF_PROG_INFO. In such cases, we can
 	 * guess the type based on name.
 	 */
-	name = dso->short_name;
+	name = dso__short_name(dso);
 	return name && (strstr(name, "bpf_prog_") == name);
 }
 
@@ -268,7 +267,7 @@ bool __map__is_bpf_image(const struct map *map)
 	const char *name;
 	struct dso *dso = map__dso(map);
 
-	if (dso->binary_type == DSO_BINARY_TYPE__BPF_IMAGE)
+	if (dso__binary_type(dso) == DSO_BINARY_TYPE__BPF_IMAGE)
 		return true;
 
 	/*
@@ -276,7 +275,7 @@ bool __map__is_bpf_image(const struct map *map)
 	 * type of DSO_BINARY_TYPE__BPF_IMAGE. In such cases, we can
 	 * guess the type based on name.
 	 */
-	name = dso->short_name;
+	name = dso__short_name(dso);
 	return name && is_bpf_image(name);
 }
 
@@ -284,7 +283,7 @@ bool __map__is_ool(const struct map *map)
 {
 	const struct dso *dso = map__dso(map);
 
-	return dso && dso->binary_type == DSO_BINARY_TYPE__OOL;
+	return dso && dso__binary_type(dso) == DSO_BINARY_TYPE__OOL;
 }
 
 bool map__has_symbols(const struct map *map)
@@ -315,7 +314,7 @@ void map__put(struct map *map)
 void map__fixup_start(struct map *map)
 {
 	struct dso *dso = map__dso(map);
-	struct rb_root_cached *symbols = &dso->symbols;
+	struct rb_root_cached *symbols = dso__symbols(dso);
 	struct rb_node *nd = rb_first_cached(symbols);
 
 	if (nd != NULL) {
@@ -328,7 +327,7 @@ void map__fixup_start(struct map *map)
 void map__fixup_end(struct map *map)
 {
 	struct dso *dso = map__dso(map);
-	struct rb_root_cached *symbols = &dso->symbols;
+	struct rb_root_cached *symbols = dso__symbols(dso);
 	struct rb_node *nd = rb_last(&symbols->rb_root);
 
 	if (nd != NULL) {
@@ -342,7 +341,7 @@ void map__fixup_end(struct map *map)
 int map__load(struct map *map)
 {
 	struct dso *dso = map__dso(map);
-	const char *name = dso->long_name;
+	const char *name = dso__long_name(dso);
 	int nr;
 
 	if (dso__loaded(dso))
@@ -350,10 +349,10 @@ int map__load(struct map *map)
 
 	nr = dso__load(dso, map);
 	if (nr < 0) {
-		if (dso->has_build_id) {
+		if (dso__has_build_id(dso)) {
 			char sbuild_id[SBUILD_ID_SIZE];
 
-			build_id__sprintf(&dso->bid, sbuild_id);
+			build_id__sprintf(dso__bid(dso), sbuild_id);
 			pr_debug("%s with build id %s not found", name, sbuild_id);
 		} else
 			pr_debug("Failed to open %s", name);
@@ -415,7 +414,7 @@ struct map *map__clone(struct map *from)
 	size_t size = sizeof(RC_STRUCT(map));
 	struct dso *dso = map__dso(from);
 
-	if (dso && dso->kernel)
+	if (dso && dso__kernel(dso))
 		size += sizeof(struct kmap);
 
 	map = memdup(RC_CHK_ACCESS(from), size);
@@ -432,14 +431,14 @@ size_t map__fprintf(struct map *map, FILE *fp)
 	const struct dso *dso = map__dso(map);
 
 	return fprintf(fp, " %" PRIx64 "-%" PRIx64 " %" PRIx64 " %s\n",
-		       map__start(map), map__end(map), map__pgoff(map), dso->name);
+		       map__start(map), map__end(map), map__pgoff(map), dso__name(dso));
 }
 
 static bool prefer_dso_long_name(const struct dso *dso, bool print_off)
 {
-	return dso->long_name &&
+	return dso__long_name(dso) &&
 	       (symbol_conf.show_kernel_path ||
-		(print_off && (dso->name[0] == '[' || dso__is_kcore(dso))));
+		(print_off && (dso__name(dso)[0] == '[' || dso__is_kcore(dso))));
 }
 
 static size_t __map__fprintf_dsoname(struct map *map, bool print_off, FILE *fp)
@@ -450,9 +449,9 @@ static size_t __map__fprintf_dsoname(struct map *map, bool print_off, FILE *fp)
 
 	if (dso) {
 		if (prefer_dso_long_name(dso, print_off))
-			dsoname = dso->long_name;
+			dsoname = dso__long_name(dso);
 		else
-			dsoname = dso->name;
+			dsoname = dso__name(dso);
 	}
 
 	if (symbol_conf.pad_output_len_dso) {
@@ -545,18 +544,18 @@ u64 map__rip_2objdump(struct map *map, u64 rip)
 		}
 	}
 
-	if (!dso->adjust_symbols)
+	if (!dso__adjust_symbols(dso))
 		return rip;
 
-	if (dso->rel)
+	if (dso__rel(dso))
 		return rip - map__pgoff(map);
 
 	/*
 	 * kernel modules also have DSO_TYPE_USER in dso->kernel,
 	 * but all kernel modules are ET_REL, so won't get here.
 	 */
-	if (dso->kernel == DSO_SPACE__USER)
-		return rip + dso->text_offset;
+	if (dso__kernel(dso) == DSO_SPACE__USER)
+		return rip + dso__text_offset(dso);
 
 	return map__unmap_ip(map, rip) - map__reloc(map);
 }
@@ -577,18 +576,18 @@ u64 map__objdump_2mem(struct map *map, u64 ip)
 {
 	const struct dso *dso = map__dso(map);
 
-	if (!dso->adjust_symbols)
+	if (!dso__adjust_symbols(dso))
 		return map__unmap_ip(map, ip);
 
-	if (dso->rel)
+	if (dso__rel(dso))
 		return map__unmap_ip(map, ip + map__pgoff(map));
 
 	/*
 	 * kernel modules also have DSO_TYPE_USER in dso->kernel,
 	 * but all kernel modules are ET_REL, so won't get here.
 	 */
-	if (dso->kernel == DSO_SPACE__USER)
-		return map__unmap_ip(map, ip - dso->text_offset);
+	if (dso__kernel(dso) == DSO_SPACE__USER)
+		return map__unmap_ip(map, ip - dso__text_offset(dso));
 
 	return ip + map__reloc(map);
 }
@@ -604,7 +603,7 @@ struct kmap *__map__kmap(struct map *map)
 {
 	const struct dso *dso = map__dso(map);
 
-	if (!dso || !dso->kernel)
+	if (!dso || !dso__kernel(dso))
 		return NULL;
 	return (struct kmap *)(&RC_CHK_ACCESS(map)[1]);
 }
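
map__new2() and __map__kmap() above pair up around a trailing
allocation: kernel maps are over-allocated by sizeof(struct kmap) and
the kmap lives immediately past the map. A sketch of the idiom, under
the assumption that nothing else follows the map in the allocation:

  /* allocate the map plus, for kernel dsos, an embedded kmap */
  map = calloc(1, sizeof(*map) +
               (dso__kernel(dso) ? sizeof(struct kmap) : 0));

  /* &RC_CHK_ACCESS(map)[1] is the first byte after the map proper,
   * i.e. the embedded kmap when one was allocated */
  kmap = (struct kmap *)(&RC_CHK_ACCESS(map)[1]);
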
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 725f5d73e93a..a4a73a5d857a 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -76,7 +76,7 @@ static void check_invariants(const struct maps *maps __maybe_unused)
 		/* Expect at least 1 reference count. */
 		assert(refcount_read(map__refcnt(map)) > 0);
 
-		if (map__dso(map) && map__dso(map)->kernel)
+		if (map__dso(map) && dso__kernel(map__dso(map)))
 			assert(RC_CHK_EQUAL(map__kmap(map)->kmaps, maps));
 
 		if (i > 0) {
@@ -331,7 +331,7 @@ static int map__strcmp(const void *a, const void *b)
 	const struct map *map_b = *(const struct map * const *)b;
 	const struct dso *dso_a = map__dso(map_a);
 	const struct dso *dso_b = map__dso(map_b);
-	int ret = strcmp(dso_a->short_name, dso_b->short_name);
+	int ret = strcmp(dso__short_name(dso_a), dso__short_name(dso_b));
 
 	if (ret == 0 && RC_CHK_ACCESS(map_a) != RC_CHK_ACCESS(map_b)) {
 		/* Ensure distinct but name equal maps have an order. */
@@ -470,7 +470,7 @@ static int __maps__insert(struct maps *maps, struct map *new)
 	}
 	if (map__end(new) < map__start(new))
 		RC_CHK_ACCESS(maps)->ends_broken = true;
-	if (dso && dso->kernel) {
+	if (dso && dso__kernel(dso)) {
 		struct kmap *kmap = map__kmap(new);
 
 		if (kmap)
@@ -746,7 +746,7 @@ static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
 
 		if (use_browser) {
 			pr_debug("overlapping maps in %s (disable tui for more info)\n",
-				map__dso(new)->name);
+				dso__name(map__dso(new)));
 		} else if (verbose >= 2) {
 			pr_debug("overlapping maps:\n");
 			map__fprintf(new, debug_file());
@@ -967,7 +967,7 @@ static int map__strcmp_name(const void *name, const void *b)
 {
 	const struct dso *dso = map__dso(*(const struct map **)b);
 
-	return strcmp(name, dso->short_name);
+	return strcmp(name, dso__short_name(dso));
 }
 
 struct map *maps__find_by_name(struct maps *maps, const char *name)
@@ -986,7 +986,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
 		if (i < maps__nr_maps(maps) && maps__maps_by_name(maps)) {
 			struct dso *dso = map__dso(maps__maps_by_name(maps)[i]);
 
-			if (dso && strcmp(dso->short_name, name) == 0) {
+			if (dso && strcmp(dso__short_name(dso), name) == 0) {
 				result = map__get(maps__maps_by_name(maps)[i]);
 				done = true;
 			}
@@ -1023,7 +1023,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
 					struct map *pos = maps_by_address[i];
 					struct dso *dso = map__dso(pos);
 
-					if (dso && strcmp(dso->short_name, name) == 0) {
+					if (dso && strcmp(dso__short_name(dso), name) == 0) {
 						result = map__get(pos);
 						break;
 					}
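
maps__find_by_name() keeps the maps_by_name array sorted under
map__strcmp_name(), whose key-first signature is the one bsearch()
expects. A hedged usage sketch, assuming maps__maps_by_name() and
maps__nr_maps() as used above:

  struct map **mapp = bsearch(name, maps__maps_by_name(maps),
                              maps__nr_maps(maps), sizeof(struct map *),
                              map__strcmp_name);

  result = mapp ? map__get(*mapp) : NULL; /* take a reference on hit */
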
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index be71abe8b9b0..26c084b4a4a6 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -158,8 +158,8 @@ static int kernel_get_module_map_cb(struct map *map, void *data)
 {
 	struct kernel_get_module_map_cb_args *args = data;
 	struct dso *dso = map__dso(map);
-	const char *short_name = dso->short_name; /* short_name is "[module]" */
-	u16 short_name_len =  dso->short_name_len;
+	const char *short_name = dso__short_name(dso); /* short_name is "[module]" */
+	u16 short_name_len = dso__short_name_len(dso);
 
 	if (strncmp(short_name + 1, args->module, short_name_len - 2) == 0 &&
 	    args->module[short_name_len - 2] == '\0') {
@@ -201,10 +201,9 @@ struct map *get_target_map(const char *target, struct nsinfo *nsi, bool user)
 		map = dso__new_map(target);
 		dso = map ? map__dso(map) : NULL;
 		if (dso) {
-			mutex_lock(&dso->lock);
-			nsinfo__put(dso->nsinfo);
-			dso->nsinfo = nsinfo__get(nsi);
-			mutex_unlock(&dso->lock);
+			mutex_lock(dso__lock(dso));
+			dso__set_nsinfo(dso, nsinfo__get(nsi));
+			mutex_unlock(dso__lock(dso));
 		}
 		return map;
 	} else {
@@ -367,11 +366,11 @@ static int kernel_get_module_dso(const char *module, struct dso **pdso)
 
 	map = machine__kernel_map(host_machine);
 	dso = map__dso(map);
-	if (!dso->has_build_id)
+	if (!dso__has_build_id(dso))
 		dso__read_running_kernel_build_id(dso, host_machine);
 
 	vmlinux_name = symbol_conf.vmlinux_name;
-	dso->load_errno = 0;
+	*dso__load_errno(dso) = 0;
 	if (vmlinux_name)
 		ret = dso__load_vmlinux(dso, map, vmlinux_name, false);
 	else
@@ -498,7 +497,7 @@ static struct debuginfo *open_from_debuginfod(struct dso *dso, struct nsinfo *ns
 	if (!c)
 		return NULL;
 
-	build_id__sprintf(&dso->bid, sbuild_id);
+	build_id__sprintf(dso__bid(dso), sbuild_id);
 	fd = debuginfod_find_debuginfo(c, (const unsigned char *)sbuild_id,
 					0, &path);
 	if (fd >= 0)
@@ -541,7 +540,7 @@ static struct debuginfo *open_debuginfo(const char *module, struct nsinfo *nsi,
 	if (!module || !strchr(module, '/')) {
 		err = kernel_get_module_dso(module, &dso);
 		if (err < 0) {
-			if (!dso || dso->load_errno == 0) {
+			if (!dso || *dso__load_errno(dso) == 0) {
 				if (!str_error_r(-err, reason, STRERR_BUFSIZE))
 					strcpy(reason, "(unknown)");
 			} else
@@ -558,7 +557,7 @@ static struct debuginfo *open_debuginfo(const char *module, struct nsinfo *nsi,
 			}
 			return NULL;
 		}
-		path = dso->long_name;
+		path = dso__long_name(dso);
 	}
 	nsinfo__mountns_enter(nsi, &nsc);
 	ret = debuginfo__new(path);
@@ -3796,8 +3795,8 @@ int show_available_funcs(const char *target, struct nsinfo *nsi,
 	/* Show all (filtered) symbols */
 	setup_pager();
 
-	for (size_t i = 0; i < dso->symbol_names_len; i++) {
-		struct symbol *pos = dso->symbol_names[i];
+	for (size_t i = 0; i < dso__symbol_names_len(dso); i++) {
+		struct symbol *pos = dso__symbol_names(dso)[i];
 
 		if (strfilter__compare(_filter, pos->name))
 			printf("%s\n", pos->name);
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 94312741443a..0a93f8f88415 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -391,10 +391,10 @@ static const char *get_dsoname(struct map *map)
 	struct dso *dso = map ? map__dso(map) : NULL;
 
 	if (dso) {
-		if (symbol_conf.show_kernel_path && dso->long_name)
-			dsoname = dso->long_name;
+		if (symbol_conf.show_kernel_path && dso__long_name(dso))
+			dsoname = dso__long_name(dso);
 		else
-			dsoname = dso->name;
+			dsoname = dso__name(dso);
 	}
 
 	return dsoname;
@@ -793,8 +793,9 @@ static void set_sym_in_dict(PyObject *dict, struct addr_location *al,
 	if (al->map) {
 		struct dso *dso = map__dso(al->map);
 
-		pydict_set_item_string_decref(dict, dso_field, _PyUnicode_FromString(dso->name));
-		build_id__sprintf(&dso->bid, sbuild_id);
+		pydict_set_item_string_decref(dict, dso_field,
+					      _PyUnicode_FromString(dso__name(dso)));
+		build_id__sprintf(dso__bid(dso), sbuild_id);
 		pydict_set_item_string_decref(dict, dso_bid_field,
 			_PyUnicode_FromString(sbuild_id));
 		pydict_set_item_string_decref(dict, dso_map_start,
@@ -1235,14 +1236,14 @@ static int python_export_dso(struct db_export *dbe, struct dso *dso,
 	char sbuild_id[SBUILD_ID_SIZE];
 	PyObject *t;
 
-	build_id__sprintf(&dso->bid, sbuild_id);
+	build_id__sprintf(dso__bid(dso), sbuild_id);
 
 	t = tuple_new(5);
 
-	tuple_set_d64(t, 0, dso->db_id);
+	tuple_set_d64(t, 0, dso__db_id(dso));
 	tuple_set_d64(t, 1, machine->db_id);
-	tuple_set_string(t, 2, dso->short_name);
-	tuple_set_string(t, 3, dso->long_name);
+	tuple_set_string(t, 2, dso__short_name(dso));
+	tuple_set_string(t, 3, dso__long_name(dso));
 	tuple_set_string(t, 4, sbuild_id);
 
 	call_object(tables->dso_handler, t, "dso_table");
@@ -1262,7 +1263,7 @@ static int python_export_symbol(struct db_export *dbe, struct symbol *sym,
 	t = tuple_new(6);
 
 	tuple_set_d64(t, 0, *sym_db_id);
-	tuple_set_d64(t, 1, dso->db_id);
+	tuple_set_d64(t, 1, dso__db_id(dso));
 	tuple_set_d64(t, 2, sym->start);
 	tuple_set_d64(t, 3, sym->end);
 	tuple_set_s32(t, 4, sym->binding);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 80e4f6132740..a6e66e5f9f25 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -238,11 +238,11 @@ static int64_t _sort__dso_cmp(struct map *map_l, struct map *map_r)
 		return cmp_null(dso_r, dso_l);
 
 	if (verbose > 0) {
-		dso_name_l = dso_l->long_name;
-		dso_name_r = dso_r->long_name;
+		dso_name_l = dso__long_name(dso_l);
+		dso_name_r = dso__long_name(dso_r);
 	} else {
-		dso_name_l = dso_l->short_name;
-		dso_name_r = dso_r->short_name;
+		dso_name_l = dso__short_name(dso_l);
+		dso_name_r = dso__short_name(dso_r);
 	}
 
 	return strcmp(dso_name_l, dso_name_r);
@@ -261,7 +261,7 @@ static int _hist_entry__dso_snprintf(struct map *map, char *bf,
 	const char *dso_name = "[unknown]";
 
 	if (dso)
-		dso_name = verbose > 0 ? dso->long_name : dso->short_name;
+		dso_name = verbose > 0 ? dso__long_name(dso) : dso__short_name(dso);
 
 	return repsep_snprintf(bf, size, "%-*.*s", width, width, dso_name);
 }
@@ -363,7 +363,7 @@ static int _hist_entry__sym_snprintf(struct map_symbol *ms,
 		char o = dso ? dso__symtab_origin(dso) : '!';
 		u64 rip = ip;
 
-		if (dso && dso->kernel && dso->adjust_symbols)
+		if (dso && dso__kernel(dso) && dso__adjust_symbols(dso))
 			rip = map__unmap_ip(map, ip);
 
 		ret += repsep_snprintf(bf, size, "%-#*llx %c ",
@@ -1539,8 +1539,8 @@ sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right)
 	 */
 
 	if ((left->cpumode != PERF_RECORD_MISC_KERNEL) &&
-	    (!(map__flags(l_map) & MAP_SHARED)) && !l_dso->id.maj && !l_dso->id.min &&
-	    !l_dso->id.ino && !l_dso->id.ino_generation) {
+	    (!(map__flags(l_map) & MAP_SHARED)) && !dso__id(l_dso)->maj && !dso__id(l_dso)->min &&
+	     !dso__id(l_dso)->ino && !dso__id(l_dso)->ino_generation) {
 		/* userspace anonymous */
 
 		if (thread__pid(left->thread) > thread__pid(right->thread))
@@ -1579,7 +1579,8 @@ static int hist_entry__dcacheline_snprintf(struct hist_entry *he, char *bf,
 		if ((he->cpumode != PERF_RECORD_MISC_KERNEL) &&
 		     map && !(map__prot(map) & PROT_EXEC) &&
 		     (map__flags(map) & MAP_SHARED) &&
-		    (dso->id.maj || dso->id.min || dso->id.ino || dso->id.ino_generation))
+		     (dso__id(dso)->maj || dso__id(dso)->min || dso__id(dso)->ino ||
+		      dso__id(dso)->ino_generation))
 			level = 's';
 		else if (!map)
 			level = 'X';
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 034b496df297..7a56b8b0792a 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -27,14 +27,14 @@ bool srcline_full_filename;
 
 char *srcline__unknown = (char *)"??:0";
 
-static const char *dso__name(struct dso *dso)
+static const char *srcline_dso_name(struct dso *dso)
 {
 	const char *dso_name;
 
-	if (dso->symsrc_filename)
-		dso_name = dso->symsrc_filename;
+	if (dso__symsrc_filename(dso))
+		dso_name = dso__symsrc_filename(dso);
 	else
-		dso_name = dso->long_name;
+		dso_name = dso__long_name(dso);
 
 	if (dso_name[0] == '[')
 		return NULL;
@@ -636,7 +636,7 @@ static int addr2line(const char *dso_name, u64 addr,
 		     struct inline_node *node,
 		     struct symbol *sym __maybe_unused)
 {
-	struct child_process *a2l = dso->a2l;
+	struct child_process *a2l = dso__a2l(dso);
 	char *record_function = NULL;
 	char *record_filename = NULL;
 	unsigned int record_line_nr = 0;
@@ -653,8 +653,9 @@ static int addr2line(const char *dso_name, u64 addr,
 		if (!filename__has_section(dso_name, ".debug_line"))
 			goto out;
 
-		dso->a2l = addr2line_subprocess_init(symbol_conf.addr2line_path, dso_name);
-		a2l = dso->a2l;
+		dso__set_a2l(dso,
+			     addr2line_subprocess_init(symbol_conf.addr2line_path, dso_name));
+		a2l = dso__a2l(dso);
 	}
 
 	if (a2l == NULL) {
@@ -768,7 +769,7 @@ static int addr2line(const char *dso_name, u64 addr,
 	free(record_function);
 	free(record_filename);
 	if (io.eof) {
-		dso->a2l = NULL;
+		dso__set_a2l(dso, NULL);
 		addr2line_subprocess_cleanup(a2l);
 	}
 	return ret;
@@ -776,14 +777,14 @@ static int addr2line(const char *dso_name, u64 addr,
 
 void dso__free_a2l(struct dso *dso)
 {
-	struct child_process *a2l = dso->a2l;
+	struct child_process *a2l = dso__a2l(dso);
 
 	if (!a2l)
 		return;
 
 	addr2line_subprocess_cleanup(a2l);
 
-	dso->a2l = NULL;
+	dso__set_a2l(dso, NULL);
 }
 
 #endif /* HAVE_LIBBFD_SUPPORT */
@@ -821,33 +822,34 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 	char *srcline;
 	const char *dso_name;
 
-	if (!dso->has_srcline)
+	if (!dso__has_srcline(dso))
 		goto out;
 
-	dso_name = dso__name(dso);
+	dso_name = srcline_dso_name(dso);
 	if (dso_name == NULL)
-		goto out;
+		goto out_err;
 
 	if (!addr2line(dso_name, addr, &file, &line, dso,
 		       unwind_inlines, NULL, sym))
-		goto out;
+		goto out_err;
 
 	srcline = srcline_from_fileline(file, line);
 	free(file);
 
 	if (!srcline)
-		goto out;
+		goto out_err;
 
-	dso->a2l_fails = 0;
+	dso__set_a2l_fails(dso, 0);
 
 	return srcline;
 
-out:
-	if (dso->a2l_fails && ++dso->a2l_fails > A2L_FAIL_LIMIT) {
-		dso->has_srcline = 0;
+out_err:
+	dso__set_a2l_fails(dso, dso__a2l_fails(dso) + 1);
+	if (dso__a2l_fails(dso) > A2L_FAIL_LIMIT) {
+		dso__set_has_srcline(dso, false);
 		dso__free_a2l(dso);
 	}
-
+out:
 	if (!show_addr)
 		return (show_sym && sym) ?
 			    strndup(sym->name, sym->namelen) : SRCLINE_UNKNOWN;
@@ -856,7 +858,7 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 		if (asprintf(&srcline, "%s+%" PRIu64, show_sym ? sym->name : "",
 					ip - sym->start) < 0)
 			return SRCLINE_UNKNOWN;
-	} else if (asprintf(&srcline, "%s[%" PRIx64 "]", dso->short_name, addr) < 0)
+	} else if (asprintf(&srcline, "%s[%" PRIx64 "]", dso__short_name(dso), addr) < 0)
 		return SRCLINE_UNKNOWN;
 	return srcline;
 }
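
Both srcline lookup paths now share the out_err accounting: every
addr2line miss bumps a2l_fails and, once the count passes
A2L_FAIL_LIMIT, the dso gives up on srclines and reaps its addr2line
child. A sketch of the counter's lifecycle, assuming the accessors
above:

  /* success path: reset so occasional misses don't accumulate */
  dso__set_a2l_fails(dso, 0);

  /* failure path: count, then disable past the limit */
  dso__set_a2l_fails(dso, dso__a2l_fails(dso) + 1);
  if (dso__a2l_fails(dso) > A2L_FAIL_LIMIT) {
          dso__set_has_srcline(dso, false);
          dso__free_a2l(dso);
  }
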
@@ -867,22 +869,23 @@ char *get_srcline_split(struct dso *dso, u64 addr, unsigned *line)
 	char *file = NULL;
 	const char *dso_name;
 
-	if (!dso->has_srcline)
-		goto out;
+	if (!dso__has_srcline(dso))
+		return NULL;
 
-	dso_name = dso__name(dso);
+	dso_name = srcline_dso_name(dso);
 	if (dso_name == NULL)
-		goto out;
+		goto out_err;
 
 	if (!addr2line(dso_name, addr, &file, line, dso, true, NULL, NULL))
-		goto out;
+		goto out_err;
 
-	dso->a2l_fails = 0;
+	dso__set_a2l_fails(dso, 0);
 	return file;
 
-out:
-	if (dso->a2l_fails && ++dso->a2l_fails > A2L_FAIL_LIMIT) {
-		dso->has_srcline = 0;
+out_err:
+	dso__set_a2l_fails(dso, dso__a2l_fails(dso) + 1);
+	if (dso__a2l_fails(dso) > A2L_FAIL_LIMIT) {
+		dso__set_has_srcline(dso, false);
 		dso__free_a2l(dso);
 	}
 
@@ -980,7 +983,7 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 {
 	const char *dso_name;
 
-	dso_name = dso__name(dso);
+	dso_name = srcline_dso_name(dso);
 	if (dso_name == NULL)
 		return NULL;
 
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 5990e3fabdb5..de73f9fb3fe4 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -311,8 +311,8 @@ static char *demangle_sym(struct dso *dso, int kmodule, const char *elf_name)
 	 * DWARF DW_compile_unit has this, but we don't always have access
 	 * to it...
 	 */
-	if (!want_demangle(dso->kernel || kmodule))
-	    return demangled;
+	if (!want_demangle(dso__kernel(dso) || kmodule))
+		return demangled;
 
 	demangled = cxx_demangle_sym(elf_name, verbose > 0, verbose > 0);
 	if (demangled == NULL) {
@@ -469,7 +469,7 @@ static bool get_plt_sizes(struct dso *dso, GElf_Ehdr *ehdr, GElf_Shdr *shdr_plt,
 	}
 	if (*plt_entry_size)
 		return true;
-	pr_debug("Missing PLT entry size for %s\n", dso->long_name);
+	pr_debug("Missing PLT entry size for %s\n", dso__long_name(dso));
 	return false;
 }
 
@@ -653,7 +653,7 @@ static int dso__synthesize_plt_got_symbols(struct dso *dso, Elf *elf,
 		sym = symbol__new(shdr.sh_offset + i, shdr.sh_entsize, STB_GLOBAL, STT_FUNC, buf);
 		if (!sym)
 			goto out;
-		symbols__insert(&dso->symbols, sym);
+		symbols__insert(dso__symbols(dso), sym);
 	}
 	err = 0;
 out:
@@ -707,7 +707,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss)
 	plt_sym = symbol__new(shdr_plt.sh_offset, plt_header_size, STB_GLOBAL, STT_FUNC, ".plt");
 	if (!plt_sym)
 		goto out_elf_end;
-	symbols__insert(&dso->symbols, plt_sym);
+	symbols__insert(dso__symbols(dso), plt_sym);
 
 	/* Only x86 has .plt.got */
 	if (machine_is_x86(ehdr.e_machine) &&
@@ -829,7 +829,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss)
 			goto out_elf_end;
 
 		plt_offset += plt_entry_size;
-		symbols__insert(&dso->symbols, f);
+		symbols__insert(dso__symbols(dso), f);
 		++nr;
 	}
 
@@ -839,7 +839,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss)
 	if (err == 0)
 		return nr;
 	pr_debug("%s: problems reading %s PLT info.\n",
-		 __func__, dso->long_name);
+		 __func__, dso__long_name(dso));
 	return 0;
 }
 
@@ -1174,19 +1174,19 @@ static int dso__swap_init(struct dso *dso, unsigned char eidata)
 {
 	static unsigned int const endian = 1;
 
-	dso->needs_swap = DSO_SWAP__NO;
+	dso__set_needs_swap(dso, DSO_SWAP__NO);
 
 	switch (eidata) {
 	case ELFDATA2LSB:
 		/* We are big endian, DSO is little endian. */
 		if (*(unsigned char const *)&endian != 1)
-			dso->needs_swap = DSO_SWAP__YES;
+			dso__set_needs_swap(dso, DSO_SWAP__YES);
 		break;
 
 	case ELFDATA2MSB:
 		/* We are little endian, DSO is big endian. */
 		if (*(unsigned char const *)&endian != 0)
-			dso->needs_swap = DSO_SWAP__YES;
+			dso__set_needs_swap(dso, DSO_SWAP__YES);
 		break;
 
 	default:
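
dso__swap_init() above derives DSO_SWAP__YES by comparing the host's
byte order against the ELF EI_DATA field; the host side is probed with
the classic first-byte trick. A self-contained sketch:

  #include <stdbool.h>

  /* the first byte of a native unsigned int is 1 only on a
   * little-endian host */
  static bool host_is_little_endian(void)
  {
          static const unsigned int endian = 1;

          return *(const unsigned char *)&endian == 1;
  }
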
@@ -1237,11 +1237,11 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 		if (fd < 0)
 			return -1;
 
-		type = dso->symtab_type;
+		type = dso__symtab_type(dso);
 	} else {
 		fd = open(name, O_RDONLY);
 		if (fd < 0) {
-			dso->load_errno = errno;
+			*dso__load_errno(dso) = errno;
 			return -1;
 		}
 	}
@@ -1249,37 +1249,37 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 	elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
 	if (elf == NULL) {
 		pr_debug("%s: cannot read %s ELF file.\n", __func__, name);
-		dso->load_errno = DSO_LOAD_ERRNO__INVALID_ELF;
+		*dso__load_errno(dso) = DSO_LOAD_ERRNO__INVALID_ELF;
 		goto out_close;
 	}
 
 	if (gelf_getehdr(elf, &ehdr) == NULL) {
-		dso->load_errno = DSO_LOAD_ERRNO__INVALID_ELF;
+		*dso__load_errno(dso) = DSO_LOAD_ERRNO__INVALID_ELF;
 		pr_debug("%s: cannot get elf header.\n", __func__);
 		goto out_elf_end;
 	}
 
 	if (dso__swap_init(dso, ehdr.e_ident[EI_DATA])) {
-		dso->load_errno = DSO_LOAD_ERRNO__INTERNAL_ERROR;
+		*dso__load_errno(dso) = DSO_LOAD_ERRNO__INTERNAL_ERROR;
 		goto out_elf_end;
 	}
 
 	/* Always reject images with a mismatched build-id: */
-	if (dso->has_build_id && !symbol_conf.ignore_vmlinux_buildid) {
+	if (dso__has_build_id(dso) && !symbol_conf.ignore_vmlinux_buildid) {
 		u8 build_id[BUILD_ID_SIZE];
 		struct build_id bid;
 		int size;
 
 		size = elf_read_build_id(elf, build_id, BUILD_ID_SIZE);
 		if (size <= 0) {
-			dso->load_errno = DSO_LOAD_ERRNO__CANNOT_READ_BUILDID;
+			*dso__load_errno(dso) = DSO_LOAD_ERRNO__CANNOT_READ_BUILDID;
 			goto out_elf_end;
 		}
 
 		build_id__init(&bid, build_id, size);
 		if (!dso__build_id_equal(dso, &bid)) {
 			pr_debug("%s: build id mismatch for %s.\n", __func__, name);
-			dso->load_errno = DSO_LOAD_ERRNO__MISMATCHING_BUILDID;
+			*dso__load_errno(dso) = DSO_LOAD_ERRNO__MISMATCHING_BUILDID;
 			goto out_elf_end;
 		}
 	}
@@ -1304,14 +1304,14 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 	if (ss->opdshdr.sh_type != SHT_PROGBITS)
 		ss->opdsec = NULL;
 
-	if (dso->kernel == DSO_SPACE__USER)
+	if (dso__kernel(dso) == DSO_SPACE__USER)
 		ss->adjust_symbols = true;
 	else
 		ss->adjust_symbols = elf__needs_adjust_symbols(ehdr);
 
 	ss->name   = strdup(name);
 	if (!ss->name) {
-		dso->load_errno = errno;
+		*dso__load_errno(dso) = errno;
 		goto out_elf_end;
 	}
 
@@ -1378,7 +1378,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 	if (adjust_kernel_syms)
 		sym->st_value -= shdr->sh_addr - shdr->sh_offset;
 
-	if (strcmp(section_name, (curr_dso->short_name + dso->short_name_len)) == 0)
+	if (strcmp(section_name, (dso__short_name(curr_dso) + dso__short_name_len(dso))) == 0)
 		return 0;
 
 	if (strcmp(section_name, ".text") == 0) {
@@ -1387,7 +1387,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 		 * kallsyms and identity maps.  Overwrite it to
 		 * map to the kernel dso.
 		 */
-		if (*remap_kernel && dso->kernel && !kmodule) {
+		if (*remap_kernel && dso__kernel(dso) && !kmodule) {
 			*remap_kernel = false;
 			map__set_start(map, shdr->sh_addr + ref_reloc(kmap));
 			map__set_end(map, map__start(map) + shdr->sh_size);
@@ -1424,7 +1424,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 	if (!kmap)
 		return 0;
 
-	snprintf(dso_name, sizeof(dso_name), "%s%s", dso->short_name, section_name);
+	snprintf(dso_name, sizeof(dso_name), "%s%s", dso__short_name(dso), section_name);
 
 	curr_map = maps__find_by_name(kmaps, dso_name);
 	if (curr_map == NULL) {
@@ -1436,17 +1436,17 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 		curr_dso = dso__new(dso_name);
 		if (curr_dso == NULL)
 			return -1;
-		curr_dso->kernel = dso->kernel;
-		curr_dso->long_name = dso->long_name;
-		curr_dso->long_name_len = dso->long_name_len;
-		curr_dso->binary_type = dso->binary_type;
-		curr_dso->adjust_symbols = dso->adjust_symbols;
+		dso__set_kernel(curr_dso, dso__kernel(dso));
+		RC_CHK_ACCESS(curr_dso)->long_name = dso__long_name(dso);
+		RC_CHK_ACCESS(curr_dso)->long_name_len = dso__long_name_len(dso);
+		dso__set_binary_type(curr_dso, dso__binary_type(dso));
+		dso__set_adjust_symbols(curr_dso, dso__adjust_symbols(dso));
 		curr_map = map__new2(start, curr_dso);
 		dso__put(curr_dso);
 		if (curr_map == NULL)
 			return -1;
 
-		if (curr_dso->kernel)
+		if (dso__kernel(curr_dso))
 			map__kmap(curr_map)->kmaps = kmaps;
 
 		if (adjust_kernel_syms) {
@@ -1456,7 +1456,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 		} else {
 			map__set_mapping_type(curr_map, MAPPING_TYPE__IDENTITY);
 		}
-		curr_dso->symtab_type = dso->symtab_type;
+		dso__set_symtab_type(curr_dso, dso__symtab_type(dso));
 		if (maps__insert(kmaps, curr_map))
 			return -1;
 		/*
@@ -1482,7 +1482,7 @@ static int
 dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		       struct symsrc *runtime_ss, int kmodule, int dynsym)
 {
-	struct kmap *kmap = dso->kernel ? map__kmap(map) : NULL;
+	struct kmap *kmap = dso__kernel(dso) ? map__kmap(map) : NULL;
 	struct maps *kmaps = kmap ? map__kmaps(map) : NULL;
 	struct map *curr_map = map;
 	struct dso *curr_dso = dso;
@@ -1515,8 +1515,8 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 	if (elf_section_by_name(runtime_ss->elf, &runtime_ss->ehdr, &tshdr,
 				".text", NULL)) {
-		dso->text_offset = tshdr.sh_addr - tshdr.sh_offset;
-		dso->text_end = tshdr.sh_offset + tshdr.sh_size;
+		dso__set_text_offset(dso, tshdr.sh_addr - tshdr.sh_offset);
+		dso__set_text_end(dso, tshdr.sh_offset + tshdr.sh_size);
 	}
 
 	if (runtime_ss->opdsec)
@@ -1575,16 +1575,16 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 	 * attempted to prelink vdso to its virtual address.
 	 */
 	if (dso__is_vdso(dso))
-		map__set_reloc(map, map__start(map) - dso->text_offset);
+		map__set_reloc(map, map__start(map) - dso__text_offset(dso));
 
-	dso->adjust_symbols = runtime_ss->adjust_symbols || ref_reloc(kmap);
+	dso__set_adjust_symbols(dso, runtime_ss->adjust_symbols || ref_reloc(kmap));
 	/*
 	 * Initial kernel and module mappings do not map to the dso.
 	 * Flag the fixups.
 	 */
-	if (dso->kernel) {
+	if (dso__kernel(dso)) {
 		remap_kernel = true;
-		adjust_kernel_syms = dso->adjust_symbols;
+		adjust_kernel_syms = dso__adjust_symbols(dso);
 	}
 	elf_symtab__for_each_symbol(syms, nr_syms, idx, sym) {
 		struct symbol *f;
@@ -1673,7 +1673,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		    (sym.st_value & 1))
 			--sym.st_value;
 
-		if (dso->kernel) {
+		if (dso__kernel(dso)) {
 			if (dso__process_kernel_symbol(dso, map, &sym, &shdr, kmaps, kmap, &curr_dso, &curr_map,
 						       section_name, adjust_kernel_syms, kmodule, &remap_kernel))
 				goto out_elf_end;
@@ -1721,7 +1721,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 
 		arch__sym_update(f, &sym);
 
-		__symbols__insert(&curr_dso->symbols, f, dso->kernel);
+		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
 		nr++;
 	}
 
@@ -1729,8 +1729,8 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 	 * For misannotated, zeroed, ASM function sizes.
 	 */
 	if (nr > 0) {
-		symbols__fixup_end(&dso->symbols, false);
-		symbols__fixup_duplicate(&dso->symbols);
+		symbols__fixup_end(dso__symbols(dso), false);
+		symbols__fixup_duplicate(dso__symbols(dso));
 		if (kmap) {
 			/*
 			 * We need to fixup this here too because we create new
@@ -1750,16 +1750,16 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 	int nr = 0;
 	int err = -1;
 
-	dso->symtab_type = syms_ss->type;
-	dso->is_64_bit = syms_ss->is_64_bit;
-	dso->rel = syms_ss->ehdr.e_type == ET_REL;
+	dso__set_symtab_type(dso, syms_ss->type);
+	dso__set_is_64_bit(dso, syms_ss->is_64_bit);
+	dso__set_rel(dso, syms_ss->ehdr.e_type == ET_REL);
 
 	/*
 	 * Modules may already have symbols from kallsyms, but those symbols
 	 * have the wrong values for the dso maps, so remove them.
 	 */
 	if (kmodule && syms_ss->symtab)
-		symbols__delete(&dso->symbols);
+		symbols__delete(dso__symbols(dso));
 
 	if (!syms_ss->symtab) {
 		/*
@@ -1767,7 +1767,7 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		 * to using kallsyms. The vmlinux runtime symbols aren't
 		 * of much use.
 		 */
-		if (dso->kernel)
+		if (dso__kernel(dso))
 			return err;
 	} else  {
 		err = dso__load_sym_internal(dso, map, syms_ss, runtime_ss,
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 35975189999b..7a065a075a32 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -515,52 +515,52 @@ static struct symbol *symbols__find_by_name(struct symbol *symbols[],
 
 void dso__reset_find_symbol_cache(struct dso *dso)
 {
-	dso->last_find_result.addr   = 0;
-	dso->last_find_result.symbol = NULL;
+	dso__set_last_find_result_addr(dso, 0);
+	dso__set_last_find_result_symbol(dso, NULL);
 }
 
 void dso__insert_symbol(struct dso *dso, struct symbol *sym)
 {
-	__symbols__insert(&dso->symbols, sym, dso->kernel);
+	__symbols__insert(dso__symbols(dso), sym, dso__kernel(dso));
 
 	/* update the symbol cache if necessary */
-	if (dso->last_find_result.addr >= sym->start &&
-	    (dso->last_find_result.addr < sym->end ||
+	if (dso__last_find_result_addr(dso) >= sym->start &&
+	    (dso__last_find_result_addr(dso) < sym->end ||
 	    sym->start == sym->end)) {
-		dso->last_find_result.symbol = sym;
+		dso__set_last_find_result_symbol(dso, sym);
 	}
 }
 
 void dso__delete_symbol(struct dso *dso, struct symbol *sym)
 {
-	rb_erase_cached(&sym->rb_node, &dso->symbols);
+	rb_erase_cached(&sym->rb_node, dso__symbols(dso));
 	symbol__delete(sym);
 	dso__reset_find_symbol_cache(dso);
 }
 
 struct symbol *dso__find_symbol(struct dso *dso, u64 addr)
 {
-	if (dso->last_find_result.addr != addr || dso->last_find_result.symbol == NULL) {
-		dso->last_find_result.addr   = addr;
-		dso->last_find_result.symbol = symbols__find(&dso->symbols, addr);
+	if (dso__last_find_result_addr(dso) != addr || dso__last_find_result_symbol(dso) == NULL) {
+		dso__set_last_find_result_addr(dso, addr);
+		dso__set_last_find_result_symbol(dso, symbols__find(dso__symbols(dso), addr));
 	}
 
-	return dso->last_find_result.symbol;
+	return dso__last_find_result_symbol(dso);
 }
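
dso__find_symbol() above is a one-entry cache in front of the symbol
rbtree: a repeated lookup at the same address skips symbols__find()
entirely, which is why dso__insert_symbol() and dso__delete_symbol()
must refresh or reset the cached pair. Illustrative usage:

  struct symbol *a = dso__find_symbol(dso, 0x1234); /* miss: tree walk */
  struct symbol *b = dso__find_symbol(dso, 0x1234); /* hit: cached */
  struct symbol *c = dso__find_symbol_nocache(dso, 0x1234); /* always walks */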
 
 struct symbol *dso__find_symbol_nocache(struct dso *dso, u64 addr)
 {
-	return symbols__find(&dso->symbols, addr);
+	return symbols__find(dso__symbols(dso), addr);
 }
 
 struct symbol *dso__first_symbol(struct dso *dso)
 {
-	return symbols__first(&dso->symbols);
+	return symbols__first(dso__symbols(dso));
 }
 
 struct symbol *dso__last_symbol(struct dso *dso)
 {
-	return symbols__last(&dso->symbols);
+	return symbols__last(dso__symbols(dso));
 }
 
 struct symbol *dso__next_symbol(struct symbol *sym)
@@ -570,11 +570,11 @@ struct symbol *dso__next_symbol(struct symbol *sym)
 
 struct symbol *dso__next_symbol_by_name(struct dso *dso, size_t *idx)
 {
-	if (*idx + 1 >= dso->symbol_names_len)
+	if (*idx + 1 >= dso__symbol_names_len(dso))
 		return NULL;
 
 	++*idx;
-	return dso->symbol_names[*idx];
+	return dso__symbol_names(dso)[*idx];
 }
 
  /*
@@ -582,27 +582,29 @@ struct symbol *dso__next_symbol_by_name(struct dso *dso, size_t *idx)
   */
 struct symbol *dso__find_symbol_by_name(struct dso *dso, const char *name, size_t *idx)
 {
-	struct symbol *s = symbols__find_by_name(dso->symbol_names, dso->symbol_names_len,
-						name, SYMBOL_TAG_INCLUDE__NONE, idx);
-	if (!s)
-		s = symbols__find_by_name(dso->symbol_names, dso->symbol_names_len,
-					name, SYMBOL_TAG_INCLUDE__DEFAULT_ONLY, idx);
+	struct symbol *s = symbols__find_by_name(dso__symbol_names(dso),
+						 dso__symbol_names_len(dso),
+						 name, SYMBOL_TAG_INCLUDE__NONE, idx);
+	if (!s) {
+		s = symbols__find_by_name(dso__symbol_names(dso), dso__symbol_names_len(dso),
+					  name, SYMBOL_TAG_INCLUDE__DEFAULT_ONLY, idx);
+	}
 	return s;
 }
 
 void dso__sort_by_name(struct dso *dso)
 {
-	mutex_lock(&dso->lock);
+	mutex_lock(dso__lock(dso));
 	if (!dso__sorted_by_name(dso)) {
 		size_t len;
 
-		dso->symbol_names = symbols__sort_by_name(&dso->symbols, &len);
-		if (dso->symbol_names) {
-			dso->symbol_names_len = len;
+		dso__set_symbol_names(dso, symbols__sort_by_name(dso__symbols(dso), &len));
+		if (dso__symbol_names(dso)) {
+			dso__set_symbol_names_len(dso, len);
 			dso__set_sorted_by_name(dso);
 		}
 	}
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 }
 
 /*
@@ -729,7 +731,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 {
 	struct symbol *sym;
 	struct dso *dso = arg;
-	struct rb_root_cached *root = &dso->symbols;
+	struct rb_root_cached *root = dso__symbols(dso);
 
 	if (!symbol_type__filter(type))
 		return 0;
@@ -769,8 +771,8 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 {
 	struct symbol *pos;
 	int count = 0;
-	struct rb_root_cached old_root = dso->symbols;
-	struct rb_root_cached *root = &dso->symbols;
+	struct rb_root_cached *root = dso__symbols(dso);
+	struct rb_root_cached old_root = *root;
 	struct rb_node *next = rb_first_cached(root);
 
 	if (!kmaps)
@@ -804,13 +806,13 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
 			pos->end = map__end(curr_map);
 		if (pos->end)
 			pos->end -= map__start(curr_map) - map__pgoff(curr_map);
-		symbols__insert(&curr_map_dso->symbols, pos);
+		symbols__insert(dso__symbols(curr_map_dso), pos);
 		++count;
 		map__put(curr_map);
 	}
 
 	/* Symbols have been adjusted */
-	dso->adjust_symbols = 1;
+	dso__set_adjust_symbols(dso, true);
 
 	return count;
 }
@@ -827,7 +829,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 	struct map *curr_map = map__get(initial_map);
 	struct symbol *pos;
 	int count = 0, moved = 0;
-	struct rb_root_cached *root = &dso->symbols;
+	struct rb_root_cached *root = dso__symbols(dso);
 	struct rb_node *next = rb_first_cached(root);
 	int kernel_range = 0;
 	bool x86_64;
@@ -854,9 +856,9 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 
 			*module++ = '\0';
 			curr_map_dso = map__dso(curr_map);
-			if (strcmp(curr_map_dso->short_name, module)) {
+			if (strcmp(dso__short_name(curr_map_dso), module)) {
 				if (!RC_CHK_EQUAL(curr_map, initial_map) &&
-				    dso->kernel == DSO_SPACE__KERNEL_GUEST &&
+				    dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST &&
 				    machine__is_default_guest(machine)) {
 					/*
 					 * We assume all symbols of a module are
@@ -879,7 +881,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 					goto discard_symbol;
 				}
 				curr_map_dso = map__dso(curr_map);
-				if (curr_map_dso->loaded &&
+				if (dso__loaded(curr_map_dso) &&
 				    !machine__is_default_guest(machine))
 					goto discard_symbol;
 			}
@@ -915,7 +917,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 				goto add_symbol;
 			}
 
-			if (dso->kernel == DSO_SPACE__KERNEL_GUEST)
+			if (dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST)
 				snprintf(dso_name, sizeof(dso_name),
 					"[guest.kernel].%d",
 					kernel_range++);
@@ -929,7 +931,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			if (ndso == NULL)
 				return -1;
 
-			ndso->kernel = dso->kernel;
+			dso__set_kernel(ndso, dso__kernel(dso));
 
 			curr_map = map__new2(pos->start, ndso);
 			if (curr_map == NULL) {
@@ -954,7 +956,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 			struct dso *curr_map_dso = map__dso(curr_map);
 
 			rb_erase_cached(&pos->rb_node, root);
-			symbols__insert(&curr_map_dso->symbols, pos);
+			symbols__insert(dso__symbols(curr_map_dso), pos);
 			++moved;
 		} else
 			++count;
@@ -966,7 +968,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
 	}
 
 	if (!RC_CHK_EQUAL(curr_map, initial_map) &&
-	    dso->kernel == DSO_SPACE__KERNEL_GUEST &&
+	    dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST &&
 	    machine__is_default_guest(maps__machine(kmaps))) {
 		dso__set_loaded(map__dso(curr_map));
 	}
@@ -1140,7 +1142,7 @@ static int do_validate_kcore_modules_cb(struct map *old_map, void *data)
 
 	dso = map__dso(old_map);
 	/* Module must be in memory at the same address */
-	mi = find_module(dso->short_name, modules);
+	mi = find_module(dso__short_name(dso), modules);
 	if (!mi || mi->start != map__start(old_map))
 		return -EINVAL;
 
@@ -1309,7 +1311,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 			      &is_64_bit);
 	if (err)
 		goto out_err;
-	dso->is_64_bit = is_64_bit;
+	dso__set_is_64_bit(dso, is_64_bit);
 
 	if (list_empty(&md.maps)) {
 		err = -EINVAL;
@@ -1401,10 +1403,10 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 	 * Set the data type and long name so that kcore can be read via
 	 * dso__data_read_addr().
 	 */
-	if (dso->kernel == DSO_SPACE__KERNEL_GUEST)
-		dso->binary_type = DSO_BINARY_TYPE__GUEST_KCORE;
+	if (dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST)
+		dso__set_binary_type(dso, DSO_BINARY_TYPE__GUEST_KCORE);
 	else
-		dso->binary_type = DSO_BINARY_TYPE__KCORE;
+		dso__set_binary_type(dso, DSO_BINARY_TYPE__KCORE);
 	dso__set_long_name(dso, strdup(kcore_filename), true);
 
 	close(fd);
@@ -1465,13 +1467,13 @@ int __dso__load_kallsyms(struct dso *dso, const char *filename,
 	if (kallsyms__delta(kmap, filename, &delta))
 		return -1;
 
-	symbols__fixup_end(&dso->symbols, true);
-	symbols__fixup_duplicate(&dso->symbols);
+	symbols__fixup_end(dso__symbols(dso), true);
+	symbols__fixup_duplicate(dso__symbols(dso));
 
-	if (dso->kernel == DSO_SPACE__KERNEL_GUEST)
-		dso->symtab_type = DSO_BINARY_TYPE__GUEST_KALLSYMS;
+	if (dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST)
+		dso__set_symtab_type(dso, DSO_BINARY_TYPE__GUEST_KALLSYMS);
 	else
-		dso->symtab_type = DSO_BINARY_TYPE__KALLSYMS;
+		dso__set_symtab_type(dso, DSO_BINARY_TYPE__KALLSYMS);
 
 	if (!no_kcore && !dso__load_kcore(dso, map, filename))
 		return maps__split_kallsyms_for_kcore(kmap->kmaps, dso);
@@ -1527,7 +1529,7 @@ static int dso__load_perf_map(const char *map_path, struct dso *dso)
 		if (sym == NULL)
 			goto out_delete_line;
 
-		symbols__insert(&dso->symbols, sym);
+		symbols__insert(dso__symbols(dso), sym);
 		nr_syms++;
 	}
 
@@ -1653,15 +1655,15 @@ int dso__load_bfd_symbols(struct dso *dso, const char *debugfile)
 		if (!symbol)
 			goto out_free;
 
-		symbols__insert(&dso->symbols, symbol);
+		symbols__insert(dso__symbols(dso), symbol);
 	}
 #ifdef bfd_get_section
 #undef bfd_asymbol_section
 #endif
 
-	symbols__fixup_end(&dso->symbols, false);
-	symbols__fixup_duplicate(&dso->symbols);
-	dso->adjust_symbols = 1;
+	symbols__fixup_end(dso__symbols(dso), false);
+	symbols__fixup_duplicate(dso__symbols(dso));
+	dso__set_adjust_symbols(dso, true);
 
 	err = 0;
 out_free:
@@ -1684,17 +1686,17 @@ static bool dso__is_compatible_symtab_type(struct dso *dso, bool kmod,
 	case DSO_BINARY_TYPE__MIXEDUP_UBUNTU_DEBUGINFO:
 	case DSO_BINARY_TYPE__BUILDID_DEBUGINFO:
 	case DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO:
-		return !kmod && dso->kernel == DSO_SPACE__USER;
+		return !kmod && dso__kernel(dso) == DSO_SPACE__USER;
 
 	case DSO_BINARY_TYPE__KALLSYMS:
 	case DSO_BINARY_TYPE__VMLINUX:
 	case DSO_BINARY_TYPE__KCORE:
-		return dso->kernel == DSO_SPACE__KERNEL;
+		return dso__kernel(dso) == DSO_SPACE__KERNEL;
 
 	case DSO_BINARY_TYPE__GUEST_KALLSYMS:
 	case DSO_BINARY_TYPE__GUEST_VMLINUX:
 	case DSO_BINARY_TYPE__GUEST_KCORE:
-		return dso->kernel == DSO_SPACE__KERNEL_GUEST;
+		return dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST;
 
 	case DSO_BINARY_TYPE__GUEST_KMODULE:
 	case DSO_BINARY_TYPE__GUEST_KMODULE_COMP:
@@ -1704,7 +1706,7 @@ static bool dso__is_compatible_symtab_type(struct dso *dso, bool kmod,
 		 * kernel modules know their symtab type - it's set when
 		 * creating a module dso in machine__addnew_module_map().
 		 */
-		return kmod && dso->symtab_type == type;
+		return kmod && dso__symtab_type(dso) == type;
 
 	case DSO_BINARY_TYPE__BUILD_ID_CACHE:
 	case DSO_BINARY_TYPE__BUILD_ID_CACHE_DEBUGINFO:
@@ -1772,18 +1774,19 @@ int dso__load(struct dso *dso, struct map *map)
 	struct build_id bid;
 	struct nscookie nsc;
 	char newmapname[PATH_MAX];
-	const char *map_path = dso->long_name;
+	const char *map_path = dso__long_name(dso);
 
-	mutex_lock(&dso->lock);
-	perfmap = strncmp(dso->name, "/tmp/perf-", 10) == 0;
+	mutex_lock(dso__lock(dso));
+	perfmap = strncmp(dso__name(dso), "/tmp/perf-", 10) == 0;
 	if (perfmap) {
-		if (dso->nsinfo && (dso__find_perf_map(newmapname,
-		    sizeof(newmapname), &dso->nsinfo) == 0)) {
+		if (dso__nsinfo(dso) &&
+		    (dso__find_perf_map(newmapname, sizeof(newmapname),
+					dso__nsinfo_ptr(dso)) == 0)) {
 			map_path = newmapname;
 		}
 	}
 
-	nsinfo__mountns_enter(dso->nsinfo, &nsc);
+	nsinfo__mountns_enter(dso__nsinfo(dso), &nsc);
 
 	/* check again under the dso->lock */
 	if (dso__loaded(dso)) {
@@ -1791,15 +1794,15 @@ int dso__load(struct dso *dso, struct map *map)
 		goto out;
 	}
 
-	kmod = dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
-		dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
-		dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE ||
-		dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
+	kmod = dso__symtab_type(dso) == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
+		dso__symtab_type(dso) == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
+		dso__symtab_type(dso) == DSO_BINARY_TYPE__GUEST_KMODULE ||
+		dso__symtab_type(dso) == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
 
-	if (dso->kernel && !kmod) {
-		if (dso->kernel == DSO_SPACE__KERNEL)
+	if (dso__kernel(dso) && !kmod) {
+		if (dso__kernel(dso) == DSO_SPACE__KERNEL)
 			ret = dso__load_kernel_sym(dso, map);
-		else if (dso->kernel == DSO_SPACE__KERNEL_GUEST)
+		else if (dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST)
 			ret = dso__load_guest_kernel_sym(dso, map);
 
 		machine = maps__machine(map__kmaps(map));
@@ -1808,12 +1811,13 @@ int dso__load(struct dso *dso, struct map *map)
 		goto out;
 	}
 
-	dso->adjust_symbols = 0;
+	dso__set_adjust_symbols(dso, false);
 
 	if (perfmap) {
 		ret = dso__load_perf_map(map_path, dso);
-		dso->symtab_type = ret > 0 ? DSO_BINARY_TYPE__JAVA_JIT :
-					     DSO_BINARY_TYPE__NOT_FOUND;
+		dso__set_symtab_type(dso, ret > 0
+				? DSO_BINARY_TYPE__JAVA_JIT
+				: DSO_BINARY_TYPE__NOT_FOUND);
 		goto out;
 	}
 
@@ -1828,9 +1832,9 @@ int dso__load(struct dso *dso, struct map *map)
 	 * Read the build id if possible. This is required for
 	 * DSO_BINARY_TYPE__BUILDID_DEBUGINFO to work
 	 */
-	if (!dso->has_build_id &&
-	    is_regular_file(dso->long_name)) {
-	    __symbol__join_symfs(name, PATH_MAX, dso->long_name);
+	if (!dso__has_build_id(dso) &&
+	    is_regular_file(dso__long_name(dso))) {
+		__symbol__join_symfs(name, PATH_MAX, dso__long_name(dso));
 		if (filename__read_build_id(name, &bid) > 0)
 			dso__set_build_id(dso, &bid);
 	}
@@ -1864,7 +1868,7 @@ int dso__load(struct dso *dso, struct map *map)
 			nsinfo__mountns_exit(&nsc);
 
 		is_reg = is_regular_file(name);
-		if (!is_reg && errno == ENOENT && dso->nsinfo) {
+		if (!is_reg && errno == ENOENT && dso__nsinfo(dso)) {
 			char *new_name = dso__filename_with_chroot(dso, name);
 			if (new_name) {
 				is_reg = is_regular_file(new_name);
@@ -1881,7 +1885,7 @@ int dso__load(struct dso *dso, struct map *map)
 			sirc = symsrc__init(ss, dso, name, symtab_type);
 
 		if (nsexit)
-			nsinfo__mountns_enter(dso->nsinfo, &nsc);
+			nsinfo__mountns_enter(dso__nsinfo(dso), &nsc);
 
 		if (bfdrc == 0) {
 			ret = 0;
@@ -1894,8 +1898,8 @@ int dso__load(struct dso *dso, struct map *map)
 		if (!syms_ss && symsrc__has_symtab(ss)) {
 			syms_ss = ss;
 			next_slot = true;
-			if (!dso->symsrc_filename)
-				dso->symsrc_filename = strdup(name);
+			if (!dso__symsrc_filename(dso))
+				dso__set_symsrc_filename(dso, strdup(name));
 		}
 
 		if (!runtime_ss && symsrc__possibly_runtime(ss)) {
@@ -1942,11 +1946,11 @@ int dso__load(struct dso *dso, struct map *map)
 		symsrc__destroy(&ss_[ss_pos - 1]);
 out_free:
 	free(name);
-	if (ret < 0 && strstr(dso->name, " (deleted)") != NULL)
+	if (ret < 0 && strstr(dso__name(dso), " (deleted)") != NULL)
 		ret = 0;
 out:
 	dso__set_loaded(dso);
-	mutex_unlock(&dso->lock);
+	mutex_unlock(dso__lock(dso));
 	nsinfo__mountns_exit(&nsc);
 
 	return ret;
@@ -1965,7 +1969,7 @@ int dso__load_vmlinux(struct dso *dso, struct map *map,
 	else
 		symbol__join_symfs(symfs_vmlinux, vmlinux);
 
-	if (dso->kernel == DSO_SPACE__KERNEL_GUEST)
+	if (dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST)
 		symtab_type = DSO_BINARY_TYPE__GUEST_VMLINUX;
 	else
 		symtab_type = DSO_BINARY_TYPE__VMLINUX;
@@ -1978,10 +1982,10 @@ int dso__load_vmlinux(struct dso *dso, struct map *map,
 	 * an incorrect long name unless we set it here first.
 	 */
 	dso__set_long_name(dso, vmlinux, vmlinux_allocated);
-	if (dso->kernel == DSO_SPACE__KERNEL_GUEST)
-		dso->binary_type = DSO_BINARY_TYPE__GUEST_VMLINUX;
+	if (dso__kernel(dso) == DSO_SPACE__KERNEL_GUEST)
+		dso__set_binary_type(dso, DSO_BINARY_TYPE__GUEST_VMLINUX);
 	else
-		dso->binary_type = DSO_BINARY_TYPE__VMLINUX;
+		dso__set_binary_type(dso, DSO_BINARY_TYPE__VMLINUX);
 
 	err = dso__load_sym(dso, map, &ss, &ss, 0);
 	symsrc__destroy(&ss);
@@ -2074,7 +2078,7 @@ static char *dso__find_kallsyms(struct dso *dso, struct map *map)
 	bool is_host = false;
 	char path[PATH_MAX];
 
-	if (!dso->has_build_id) {
+	if (!dso__has_build_id(dso)) {
 		/*
 		 * Last resort, if we don't have a build-id and couldn't find
 		 * any vmlinux file, try the running kernel kallsyms table.
@@ -2099,7 +2103,7 @@ static char *dso__find_kallsyms(struct dso *dso, struct map *map)
 			goto proc_kallsyms;
 	}
 
-	build_id__sprintf(&dso->bid, sbuild_id);
+	build_id__sprintf(dso__bid(dso), sbuild_id);
 
 	/* Find kallsyms in build-id cache with kcore */
 	scnprintf(path, sizeof(path), "%s/%s/%s",
@@ -2192,7 +2196,7 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map)
 	free(kallsyms_allocated_filename);
 
 	if (err > 0 && !dso__is_kcore(dso)) {
-		dso->binary_type = DSO_BINARY_TYPE__KALLSYMS;
+		dso__set_binary_type(dso, DSO_BINARY_TYPE__KALLSYMS);
 		dso__set_long_name(dso, DSO__NAME_KALLSYMS, false);
 		map__fixup_start(map);
 		map__fixup_end(map);
@@ -2235,7 +2239,7 @@ static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map)
 	if (err > 0)
 		pr_debug("Using %s for symbols\n", kallsyms_filename);
 	if (err > 0 && !dso__is_kcore(dso)) {
-		dso->binary_type = DSO_BINARY_TYPE__GUEST_KALLSYMS;
+		dso__set_binary_type(dso, DSO_BINARY_TYPE__GUEST_KALLSYMS);
 		dso__set_long_name(dso, machine->mmap_name, false);
 		map__fixup_start(map);
 		map__fixup_end(map);
diff --git a/tools/perf/util/symbol_fprintf.c b/tools/perf/util/symbol_fprintf.c
index 088f4abf230f..53e1af4ed9ac 100644
--- a/tools/perf/util/symbol_fprintf.c
+++ b/tools/perf/util/symbol_fprintf.c
@@ -64,8 +64,8 @@ size_t dso__fprintf_symbols_by_name(struct dso *dso,
 {
 	size_t ret = 0;
 
-	for (size_t i = 0; i < dso->symbol_names_len; i++) {
-		struct symbol *pos = dso->symbol_names[i];
+	for (size_t i = 0; i < dso__symbol_names_len(dso); i++) {
+		struct symbol *pos = dso__symbol_names(dso)[i];
 
 		ret += fprintf(fp, "%s\n", pos->name);
 	}
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index cdab6aa04917..5190566676a7 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -386,8 +386,8 @@ static void perf_record_mmap2__read_build_id(struct perf_record_mmap2 *event,
 	id.ino_generation = event->ino_generation;
 
 	dso = dsos__findnew_id(&machine->dsos, event->filename, &id);
-	if (dso && dso->has_build_id) {
-		bid = dso->bid;
+	if (dso && dso__has_build_id(dso)) {
+		bid = *dso__bid(dso);
 		rc = 0;
 		goto out;
 	}
@@ -408,7 +408,7 @@ static void perf_record_mmap2__read_build_id(struct perf_record_mmap2 *event,
 		event->__reserved_1 = 0;
 		event->__reserved_2 = 0;
 
-		if (dso && !dso->has_build_id)
+		if (dso && !dso__has_build_id(dso))
 			dso__set_build_id(dso, &bid);
 	} else {
 		if (event->filename[0] == '/') {
@@ -685,7 +685,7 @@ static int perf_event__synthesize_modules_maps_cb(struct map *map, void *data)
 
 	dso = map__dso(map);
 	if (symbol_conf.buildid_mmap2) {
-		size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
+		size = PERF_ALIGN(dso__long_name_len(dso) + 1, sizeof(u64));
 		event->mmap2.header.type = PERF_RECORD_MMAP2;
 		event->mmap2.header.size = (sizeof(event->mmap2) -
 					(sizeof(event->mmap2.filename) - size));
@@ -695,11 +695,11 @@ static int perf_event__synthesize_modules_maps_cb(struct map *map, void *data)
 		event->mmap2.len   = map__size(map);
 		event->mmap2.pid   = args->machine->pid;
 
-		memcpy(event->mmap2.filename, dso->long_name, dso->long_name_len + 1);
+		memcpy(event->mmap2.filename, dso__long_name(dso), dso__long_name_len(dso) + 1);
 
 		perf_record_mmap2__read_build_id(&event->mmap2, args->machine, false);
 	} else {
-		size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
+		size = PERF_ALIGN(dso__long_name_len(dso) + 1, sizeof(u64));
 		event->mmap.header.type = PERF_RECORD_MMAP;
 		event->mmap.header.size = (sizeof(event->mmap) -
 					(sizeof(event->mmap.filename) - size));
@@ -709,7 +709,7 @@ static int perf_event__synthesize_modules_maps_cb(struct map *map, void *data)
 		event->mmap.len   = map__size(map);
 		event->mmap.pid   = args->machine->pid;
 
-		memcpy(event->mmap.filename, dso->long_name, dso->long_name_len + 1);
+		memcpy(event->mmap.filename, dso__long_name(dso), dso__long_name_len(dso) + 1);
 	}
 
 	if (perf_tool__process_synth_event(args->tool, event, args->machine, args->process) != 0)
@@ -2233,20 +2233,20 @@ int perf_event__synthesize_build_id(struct perf_tool *tool, struct dso *pos, u16
 	union perf_event ev;
 	size_t len;
 
-	if (!pos->hit)
+	if (!dso__hit(pos))
 		return 0;
 
 	memset(&ev, 0, sizeof(ev));
 
-	len = pos->long_name_len + 1;
+	len = dso__long_name_len(pos) + 1;
 	len = PERF_ALIGN(len, NAME_ALIGN);
-	ev.build_id.size = min(pos->bid.size, sizeof(pos->bid.data));
-	memcpy(&ev.build_id.build_id, pos->bid.data, ev.build_id.size);
+	ev.build_id.size = min(dso__bid(pos)->size, sizeof(dso__bid(pos)->data));
+	memcpy(&ev.build_id.build_id, dso__bid(pos)->data, ev.build_id.size);
 	ev.build_id.header.type = PERF_RECORD_HEADER_BUILD_ID;
 	ev.build_id.header.misc = misc | PERF_RECORD_MISC_BUILD_ID_SIZE;
 	ev.build_id.pid = machine->pid;
 	ev.build_id.header.size = sizeof(ev.build_id) + len;
-	memcpy(&ev.build_id.filename, pos->long_name, pos->long_name_len);
+	memcpy(&ev.build_id.filename, dso__long_name(pos), dso__long_name_len(pos));
 
 	return process(tool, &ev, NULL, machine);
 }
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 1aa8962dcf52..0a473112f881 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -457,14 +457,14 @@ int thread__memcpy(struct thread *thread, struct machine *machine,
 
 	dso = map__dso(al.map);
 
-	if (!dso || dso->data.status == DSO_DATA_STATUS_ERROR || map__load(al.map) < 0) {
+	if (!dso || dso__data(dso)->status == DSO_DATA_STATUS_ERROR || map__load(al.map) < 0) {
 		addr_location__exit(&al);
 		return -1;
 	}
 
 	offset = map__map_ip(al.map, ip);
 	if (is64bit)
-		*is64bit = dso->is_64_bit;
+		*is64bit = dso__is_64_bit(dso);
 
 	addr_location__exit(&al);
 
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index b69dc3a447db..2e7fc5c987f8 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -329,27 +329,27 @@ static int read_unwind_spec_eh_frame(struct dso *dso, struct unwind_info *ui,
 	};
 	int ret, fd;
 
-	if (dso->data.eh_frame_hdr_offset == 0) {
+	if (dso__data(dso)->eh_frame_hdr_offset == 0) {
 		fd = dso__data_get_fd(dso, ui->machine);
 		if (fd < 0)
 			return -EINVAL;
 
 		/* Check the .eh_frame section for unwinding info */
 		ret = elf_section_address_and_offset(fd, ".eh_frame_hdr",
-						     &dso->data.eh_frame_hdr_addr,
-						     &dso->data.eh_frame_hdr_offset);
-		dso->data.elf_base_addr = elf_base_address(fd);
+						     &dso__data(dso)->eh_frame_hdr_addr,
+						     &dso__data(dso)->eh_frame_hdr_offset);
+		dso__data(dso)->elf_base_addr = elf_base_address(fd);
 		dso__data_put_fd(dso);
-		if (ret || dso->data.eh_frame_hdr_offset == 0)
+		if (ret || dso__data(dso)->eh_frame_hdr_offset == 0)
 			return -EINVAL;
 	}
 
 	maps__for_each_map(thread__maps(ui->thread), read_unwind_spec_eh_frame_maps_cb, &args);
 
-	args.base_addr -= dso->data.elf_base_addr;
+	args.base_addr -= dso__data(dso)->elf_base_addr;
 	/* Address of .eh_frame_hdr */
-	*segbase = args.base_addr + dso->data.eh_frame_hdr_addr;
-	ret = unwind_spec_ehframe(dso, ui->machine, dso->data.eh_frame_hdr_offset,
+	*segbase = args.base_addr + dso__data(dso)->eh_frame_hdr_addr;
+	ret = unwind_spec_ehframe(dso, ui->machine, dso__data(dso)->eh_frame_hdr_offset,
 				   table_data, fde_count);
 	if (ret)
 		return ret;
@@ -460,7 +460,7 @@ find_proc_info(unw_addr_space_t as, unw_word_t ip, unw_proc_info_t *pi,
 		return -EINVAL;
 	}
 
-	pr_debug("unwind: find_proc_info dso %s\n", dso->name);
+	pr_debug("unwind: find_proc_info dso %s\n", dso__name(dso));
 
 	/* Check the .eh_frame section for unwinding info */
 	if (!read_unwind_spec_eh_frame(dso, ui, &table_data, &segbase, &fde_count)) {
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 2728eb4f13ea..cb8be6acfb6f 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -25,7 +25,7 @@ int unwind__prepare_access(struct maps *maps, struct map *map, bool *initialized
 		return 0;
 
 	if (maps__addr_space(maps)) {
-		pr_debug("unwind: thread map already set, dso=%s\n", dso->name);
+		pr_debug("unwind: thread map already set, dso=%s\n", dso__name(dso));
 		if (initialized)
 			*initialized = true;
 		return 0;
diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index 35532dcbff74..1b6f8f6db7aa 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -148,7 +148,7 @@ static int machine__thread_dso_type_maps_cb(struct map *map, void *data)
 	struct machine__thread_dso_type_maps_cb_args *args = data;
 	struct dso *dso = map__dso(map);
 
-	if (!dso || dso->long_name[0] != '/')
+	if (!dso || dso__long_name(dso)[0] != '/')
 		return 0;
 
 	args->dso_type = dso__type(dso, args->machine);
@@ -361,7 +361,7 @@ struct dso *machine__findnew_vdso(struct machine *machine,
 
 bool dso__is_vdso(struct dso *dso)
 {
-	return !strcmp(dso->short_name, DSO__NAME_VDSO) ||
-	       !strcmp(dso->short_name, DSO__NAME_VDSO32) ||
-	       !strcmp(dso->short_name, DSO__NAME_VDSOX32);
+	return !strcmp(dso__short_name(dso), DSO__NAME_VDSO) ||
+	       !strcmp(dso__short_name(dso), DSO__NAME_VDSO32) ||
+	       !strcmp(dso__short_name(dso), DSO__NAME_VDSOX32);
 }
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 51/53] perf dso: Reference counting related fixes
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (49 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 50/53] perf dso: Add reference count checking and accessor functions Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 52/53] perf dso: Use container_of to avoid a pointer in dso_data Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 53/53] perf env: Avoid recursively taking env->bpf_progs.lock Ian Rogers
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Ensure gets and puts are better aligned, fixing reference count
checking problems.
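
A minimal, self-contained sketch of the ownership convention being
enforced (toy names, not perf's code): every stored pointer is an
owned reference, so taking a new reference and dropping the old one
are always paired, on every path.

    #include <stdlib.h>

    /* Toy stand-ins for perf's dso__get()/dso__put(). */
    struct obj { int refcnt; };

    static struct obj *obj_get(struct obj *o)
    {
        if (o)
            o->refcnt++;
        return o;
    }

    static void obj_put(struct obj *o)
    {
        if (o && --o->refcnt == 0)
            free(o);
    }

    /* Replace an owned reference held in *currp. */
    static void update_current(struct obj **currp, struct obj *o)
    {
        struct obj *old = *currp;

        *currp = obj_get(o); /* take the new reference first... */
        obj_put(old);        /* ...so this is safe even if old == o */
    }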

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/machine.c    |  4 ++--
 tools/perf/util/map.c        |  1 +
 tools/perf/util/symbol-elf.c | 38 +++++++++++++++++-------------------
 3 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d4534fbc7098..bd1e2da5bb6d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -683,7 +683,7 @@ static int machine__process_ksymbol_register(struct machine *machine,
 					     struct perf_sample *sample __maybe_unused)
 {
 	struct symbol *sym;
-	struct dso *dso;
+	struct dso *dso = NULL;
 	struct map *map = maps__find(machine__kernel_maps(machine), event->ksymbol.addr);
 	int err = 0;
 
@@ -696,7 +696,6 @@ static int machine__process_ksymbol_register(struct machine *machine,
 		}
 		dso__set_kernel(dso, DSO_SPACE__KERNEL);
 		map = map__new2(0, dso);
-		dso__put(dso);
 		if (!map) {
 			err = -ENOMEM;
 			goto out;
@@ -735,6 +734,7 @@ static int machine__process_ksymbol_register(struct machine *machine,
 	dso__insert_symbol(dso, sym);
 out:
 	map__put(map);
+	dso__put(dso);
 	return err;
 }
 
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 14fb8cf65b13..4480134ef4ea 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -200,6 +200,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 				dso__set_build_id(dso, dso__bid(header_bid_dso));
 				dso__set_header_build_id(dso, 1);
 			}
+			dso__put(header_bid_dso);
 		}
 		dso__put(dso);
 	}
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index de73f9fb3fe4..4c00463abb7e 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1366,7 +1366,7 @@ void __weak arch__sym_update(struct symbol *s __maybe_unused,
 static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 				      GElf_Sym *sym, GElf_Shdr *shdr,
 				      struct maps *kmaps, struct kmap *kmap,
-				      struct dso **curr_dsop, struct map **curr_mapp,
+				      struct dso **curr_dsop,
 				      const char *section_name,
 				      bool adjust_kernel_syms, bool kmodule, bool *remap_kernel)
 {
@@ -1416,8 +1416,8 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 			map__set_pgoff(map, shdr->sh_offset);
 		}
 
-		*curr_mapp = map;
-		*curr_dsop = dso;
+		dso__put(*curr_dsop);
+		*curr_dsop = dso__get(dso);
 		return 0;
 	}
 
@@ -1442,10 +1442,10 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 		dso__set_binary_type(curr_dso, dso__binary_type(dso));
 		dso__set_adjust_symbols(curr_dso, dso__adjust_symbols(dso));
 		curr_map = map__new2(start, curr_dso);
-		dso__put(curr_dso);
-		if (curr_map == NULL)
+		if (curr_map == NULL) {
+			dso__put(curr_dso);
 			return -1;
-
+		}
 		if (dso__kernel(curr_dso))
 			map__kmap(curr_map)->kmaps = kmaps;
 
@@ -1459,21 +1459,15 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 		dso__set_symtab_type(curr_dso, dso__symtab_type(dso));
 		if (maps__insert(kmaps, curr_map))
 			return -1;
-		/*
-		 * Add it before we drop the reference to curr_map, i.e. while
-		 * we still are sure to have a reference to this DSO via
-		 * *curr_map->dso.
-		 */
 		dsos__add(&maps__machine(kmaps)->dsos, curr_dso);
-		/* kmaps already got it */
-		map__put(curr_map);
 		dso__set_loaded(curr_dso);
-		*curr_mapp = curr_map;
+		dso__put(*curr_dsop);
 		*curr_dsop = curr_dso;
 	} else {
-		*curr_dsop = map__dso(curr_map);
-		map__put(curr_map);
+		dso__put(*curr_dsop);
+		*curr_dsop = dso__get(map__dso(curr_map));
 	}
+	map__put(curr_map);
 
 	return 0;
 }
@@ -1484,8 +1478,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 {
 	struct kmap *kmap = dso__kernel(dso) ? map__kmap(map) : NULL;
 	struct maps *kmaps = kmap ? map__kmaps(map) : NULL;
-	struct map *curr_map = map;
-	struct dso *curr_dso = dso;
+	struct dso *curr_dso;
 	Elf_Data *symstrs, *secstrs, *secstrs_run, *secstrs_sym;
 	uint32_t nr_syms;
 	int err = -1;
@@ -1586,6 +1579,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		remap_kernel = true;
 		adjust_kernel_syms = dso__adjust_symbols(dso);
 	}
+	curr_dso = dso__get(dso);
 	elf_symtab__for_each_symbol(syms, nr_syms, idx, sym) {
 		struct symbol *f;
 		const char *elf_name = elf_sym__name(&sym, symstrs);
@@ -1674,8 +1668,11 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 			--sym.st_value;
 
 		if (dso__kernel(dso)) {
-			if (dso__process_kernel_symbol(dso, map, &sym, &shdr, kmaps, kmap, &curr_dso, &curr_map,
-						       section_name, adjust_kernel_syms, kmodule, &remap_kernel))
+			if (dso__process_kernel_symbol(dso, map, &sym, &shdr,
+						       kmaps, kmap, &curr_dso,
+						       section_name,
+						       adjust_kernel_syms,
+						       kmodule, &remap_kernel))
 				goto out_elf_end;
 		} else if ((used_opd && runtime_ss->adjust_symbols) ||
 			   (!used_opd && syms_ss->adjust_symbols)) {
@@ -1724,6 +1721,7 @@ dso__load_sym_internal(struct dso *dso, struct map *map, struct symsrc *syms_ss,
 		__symbols__insert(dso__symbols(curr_dso), f, dso__kernel(dso));
 		nr++;
 	}
+	dso__put(curr_dso);
 
 	/*
 	 * For misannotated, zeroed, ASM function sizes.
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 52/53] perf dso: Use container_of to avoid a pointer in dso_data
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (50 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 51/53] perf dso: Reference counting related fixes Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  2023-11-02 17:57 ` [PATCH v4 53/53] perf env: Avoid recursively taking env->bpf_progs.lock Ian Rogers
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

The dso pointer in dso_data is necessary for reference count checking:
it accounts for the dso_data forming a global list of open dsos that
hold references to their dso. The dso pointer also provides the
indirection that reference count checking needs. Outside of reference
count checking the indirection isn't needed, and container_of is more
efficient and saves space.

The reference count won't be increased by placing items onto the
global list, matching the behavior before the reference count checking
change, but we assert the dso is in a dsos list that holds it live
(and that the set of open dsos is a subset of all dsos for the
machine). Update the DSO data tests so that they use a dsos struct to
make the invariant hold.
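
A minimal sketch of the container_of pattern relied upon here (toy
struct names, not perf's): given a pointer to an embedded member, the
enclosing object is recovered by subtracting the member's offset, so
no back pointer has to be stored.

    #include <stddef.h>

    /* Simplified form of the kernel's container_of(). */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct data { int fd; };

    struct owner {
        long id;
        struct data data; /* embedded by value, not a pointer */
    };

    /* Recover the enclosing object from its embedded member. */
    static struct owner *owner_of(struct data *d)
    {
        return container_of(d, struct owner, data);
    }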

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/tests/dso-data.c | 56 +++++++++++++++++--------------------
 tools/perf/util/dso.c       | 16 ++++++++++-
 tools/perf/util/dso.h       |  2 ++
 3 files changed, 42 insertions(+), 32 deletions(-)

diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c
index 625dbb2ffe8a..e5018d390763 100644
--- a/tools/perf/tests/dso-data.c
+++ b/tools/perf/tests/dso-data.c
@@ -10,6 +10,7 @@
 #include <sys/resource.h>
 #include <api/fs/fs.h>
 #include "dso.h"
+#include "dsos.h"
 #include "machine.h"
 #include "symbol.h"
 #include "tests.h"
@@ -123,9 +124,10 @@ static int test__dso_data(struct test_suite *test __maybe_unused, int subtest __
 	TEST_ASSERT_VAL("No test file", file);
 
 	memset(&machine, 0, sizeof(machine));
+	dsos__init(&machine.dsos);
 
-	dso = dso__new((const char *)file);
-
+	dso = dso__new(file);
+	TEST_ASSERT_VAL("Failed to add dso", !dsos__add(&machine.dsos, dso));
 	TEST_ASSERT_VAL("Failed to access to dso",
 			dso__data_fd(dso, &machine) >= 0);
 
@@ -170,6 +172,7 @@ static int test__dso_data(struct test_suite *test __maybe_unused, int subtest __
 	}
 
 	dso__put(dso);
+	dsos__exit(&machine.dsos);
 	unlink(file);
 	return 0;
 }
@@ -199,41 +202,31 @@ static long open_files_cnt(void)
 	return nr - 1;
 }
 
-static struct dso **dsos;
-
-static int dsos__create(int cnt, int size)
+static int dsos__create(int cnt, int size, struct dsos *dsos)
 {
 	int i;
 
-	dsos = malloc(sizeof(*dsos) * cnt);
-	TEST_ASSERT_VAL("failed to alloc dsos array", dsos);
+	dsos__init(dsos);
 
 	for (i = 0; i < cnt; i++) {
-		char *file;
+		char *file = test_file(size);
 
-		file = test_file(size);
 		TEST_ASSERT_VAL("failed to get dso file", file);
-
-		dsos[i] = dso__new(file);
-		TEST_ASSERT_VAL("failed to get dso", dsos[i]);
+		TEST_ASSERT_VAL("failed to get dso", !dsos__add(dsos, dso__new(file)));
 	}
 
 	return 0;
 }
 
-static void dsos__delete(int cnt)
+static void dsos__delete(struct dsos *dsos)
 {
-	int i;
-
-	for (i = 0; i < cnt; i++) {
-		struct dso *dso = dsos[i];
+	for (unsigned int i = 0; i < dsos->cnt; i++) {
+		struct dso *dso = dsos->dsos[i];
 
 		dso__data_close(dso);
 		unlink(dso__name(dso));
-		dso__put(dso);
 	}
-
-	free(dsos);
+	dsos__exit(dsos);
 }
 
 static int set_fd_limit(int n)
@@ -267,10 +260,10 @@ static int test__dso_data_cache(struct test_suite *test __maybe_unused, int subt
 	/* and this is now our dso open FDs limit */
 	dso_cnt = limit / 2;
 	TEST_ASSERT_VAL("failed to create dsos\n",
-		!dsos__create(dso_cnt, TEST_FILE_SIZE));
+			!dsos__create(dso_cnt, TEST_FILE_SIZE, &machine.dsos));
 
 	for (i = 0; i < (dso_cnt - 1); i++) {
-		struct dso *dso = dsos[i];
+		struct dso *dso = machine.dsos.dsos[i];
 
 		/*
 		 * Open dsos via dso__data_fd(), it opens the data
@@ -290,17 +283,17 @@ static int test__dso_data_cache(struct test_suite *test __maybe_unused, int subt
 	}
 
 	/* verify the first one is already open */
-	TEST_ASSERT_VAL("dsos[0] is not open", dso__data(dsos[0])->fd != -1);
+	TEST_ASSERT_VAL("dsos[0] is not open", dso__data(machine.dsos.dsos[0])->fd != -1);
 
 	/* open +1 dso to reach the allowed limit */
-	fd = dso__data_fd(dsos[i], &machine);
+	fd = dso__data_fd(machine.dsos.dsos[i], &machine);
 	TEST_ASSERT_VAL("failed to get fd", fd > 0);
 
 	/* should force the first one to be closed */
-	TEST_ASSERT_VAL("failed to close dsos[0]", dso__data(dsos[0])->fd == -1);
+	TEST_ASSERT_VAL("failed to close dsos[0]", dso__data(machine.dsos.dsos[0])->fd == -1);
 
 	/* cleanup everything */
-	dsos__delete(dso_cnt);
+	dsos__delete(&machine.dsos);
 
 	/* Make sure we did not leak any file descriptor. */
 	nr_end = open_files_cnt();
@@ -325,9 +318,9 @@ static int test__dso_data_reopen(struct test_suite *test __maybe_unused, int sub
 	long nr_end, nr = open_files_cnt(), lim = new_limit(3);
 	int fd, fd_extra;
 
-#define dso_0 (dsos[0])
-#define dso_1 (dsos[1])
-#define dso_2 (dsos[2])
+#define dso_0 (machine.dsos.dsos[0])
+#define dso_1 (machine.dsos.dsos[1])
+#define dso_2 (machine.dsos.dsos[2])
 
 	/* Rest the internal dso open counter limit. */
 	reset_fd_limit();
@@ -347,7 +340,8 @@ static int test__dso_data_reopen(struct test_suite *test __maybe_unused, int sub
 	TEST_ASSERT_VAL("failed to set file limit",
 			!set_fd_limit((lim)));
 
-	TEST_ASSERT_VAL("failed to create dsos\n", !dsos__create(3, TEST_FILE_SIZE));
+	TEST_ASSERT_VAL("failed to create dsos\n",
+			!dsos__create(3, TEST_FILE_SIZE, &machine.dsos));
 
 	/* open dso_0 */
 	fd = dso__data_fd(dso_0, &machine);
@@ -386,7 +380,7 @@ static int test__dso_data_reopen(struct test_suite *test __maybe_unused, int sub
 
 	/* cleanup everything */
 	close(fd_extra);
-	dsos__delete(3);
+	dsos__delete(&machine.dsos);
 
 	/* Make sure we did not leak any file descriptor. */
 	nr_end = open_files_cnt();
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 0fef597725c7..4f20dac89b77 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -496,14 +496,20 @@ static pthread_mutex_t dso__data_open_lock = PTHREAD_MUTEX_INITIALIZER;
 static void dso__list_add(struct dso *dso)
 {
 	list_add_tail(&dso__data(dso)->open_entry, &dso__data_open);
+#ifdef REFCNT_CHECKING
 	dso__data(dso)->dso = dso__get(dso);
+#endif
+	/* Assume the dso is part of dsos, hence the optional reference count above. */
+	assert(dso__dsos(dso));
 	dso__data_open_cnt++;
 }
 
 static void dso__list_del(struct dso *dso)
 {
 	list_del_init(&dso__data(dso)->open_entry);
+#ifdef REFCNT_CHECKING
 	dso__put(dso__data(dso)->dso);
+#endif
 	WARN_ONCE(dso__data_open_cnt <= 0,
 		  "DSO data fd counter out of bounds.");
 	dso__data_open_cnt--;
@@ -653,9 +659,15 @@ static void close_dso(struct dso *dso)
 static void close_first_dso(void)
 {
 	struct dso_data *dso_data;
+	struct dso *dso;
 
 	dso_data = list_first_entry(&dso__data_open, struct dso_data, open_entry);
-	close_dso(dso_data->dso);
+#ifdef REFCNT_CHECKING
+	dso = dso_data->dso;
+#else
+	dso = container_of(dso_data, struct dso, data);
+#endif
+	close_dso(dso);
 }
 
 static rlim_t get_fd_limit(void)
@@ -1444,7 +1456,9 @@ struct dso *dso__new_id(const char *name, struct dso_id *id)
 		data->fd = -1;
 		data->status = DSO_DATA_STATUS_UNKNOWN;
 		INIT_LIST_HEAD(&data->open_entry);
+#ifdef REFCNT_CHECKING
 		data->dso = NULL; /* Set when on the open_entry list. */
+#endif
 	}
 	return res;
 }
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index fa311ffd2538..e02a4718f1f8 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -147,7 +147,9 @@ struct dso_cache {
 struct dso_data {
 	struct rb_root	 cache;
 	struct list_head open_entry;
+#ifdef REFCNT_CHECKING
 	struct dso	 *dso;
+#endif
 	int		 fd;
 	int		 status;
 	u32		 status_seen;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v4 53/53] perf env: Avoid recursively taking env->bpf_progs.lock
  2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
                   ` (51 preceding siblings ...)
  2023-11-02 17:57 ` [PATCH v4 52/53] perf dso: Use container_of to avoid a pointer in dso_data Ian Rogers
@ 2023-11-02 17:57 ` Ian Rogers
  52 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-02 17:57 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Ian Rogers, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Add variants of perf_env__insert_bpf_prog_info, perf_env__insert_btf
and perf_env__find_btf prefixed with __ to indicate the
env->bpf_progs.lock is assumed held. Call these variants when the lock
is already held, to avoid taking it recursively and having a thread
deadlock against itself.
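
A minimal, self-contained sketch of the pattern (illustrative names,
not perf's code): the double-underscore variant assumes the lock is
held, the public variant takes the lock and delegates, and code that
already holds the lock calls the __ variant directly instead of
deadlocking on a second acquisition.

    #include <pthread.h>

    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
    static int table[16];

    /* Lock assumed held by the caller. */
    static int __find(int key)
    {
        return table[key & 15];
    }

    /* Public variant: takes the lock, then delegates. */
    static int find(int key)
    {
        int ret;

        pthread_rwlock_rdlock(&lock);
        ret = __find(key);
        pthread_rwlock_unlock(&lock);
        return ret;
    }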

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/bpf-event.c |  8 +++---
 tools/perf/util/bpf-event.h | 12 ++++-----
 tools/perf/util/env.c       | 53 +++++++++++++++++++++++--------------
 tools/perf/util/env.h       |  4 +++
 tools/perf/util/header.c    |  8 +++---
 5 files changed, 51 insertions(+), 34 deletions(-)

diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index b564d6fd078a..827695cd0408 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -546,9 +546,9 @@ int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env)
 	return evlist__add_sb_event(evlist, &attr, bpf_event__sb_cb, env);
 }
 
-void bpf_event__print_bpf_prog_info(struct bpf_prog_info *info,
-				    struct perf_env *env,
-				    FILE *fp)
+void __bpf_event__print_bpf_prog_info(struct bpf_prog_info *info,
+				      struct perf_env *env,
+				      FILE *fp)
 {
 	__u32 *prog_lens = (__u32 *)(uintptr_t)(info->jited_func_lens);
 	__u64 *prog_addrs = (__u64 *)(uintptr_t)(info->jited_ksyms);
@@ -564,7 +564,7 @@ void bpf_event__print_bpf_prog_info(struct bpf_prog_info *info,
 	if (info->btf_id) {
 		struct btf_node *node;
 
-		node = perf_env__find_btf(env, info->btf_id);
+		node = __perf_env__find_btf(env, info->btf_id);
 		if (node)
 			btf = btf__new((__u8 *)(node->data),
 				       node->data_size);
diff --git a/tools/perf/util/bpf-event.h b/tools/perf/util/bpf-event.h
index 1bcbd4fb6c66..e2f0420905f5 100644
--- a/tools/perf/util/bpf-event.h
+++ b/tools/perf/util/bpf-event.h
@@ -33,9 +33,9 @@ struct btf_node {
 int machine__process_bpf(struct machine *machine, union perf_event *event,
 			 struct perf_sample *sample);
 int evlist__add_bpf_sb_event(struct evlist *evlist, struct perf_env *env);
-void bpf_event__print_bpf_prog_info(struct bpf_prog_info *info,
-				    struct perf_env *env,
-				    FILE *fp);
+void __bpf_event__print_bpf_prog_info(struct bpf_prog_info *info,
+				      struct perf_env *env,
+				      FILE *fp);
 #else
 static inline int machine__process_bpf(struct machine *machine __maybe_unused,
 				       union perf_event *event __maybe_unused,
@@ -50,9 +50,9 @@ static inline int evlist__add_bpf_sb_event(struct evlist *evlist __maybe_unused,
 	return 0;
 }
 
-static inline void bpf_event__print_bpf_prog_info(struct bpf_prog_info *info __maybe_unused,
-						  struct perf_env *env __maybe_unused,
-						  FILE *fp __maybe_unused)
+static inline void __bpf_event__print_bpf_prog_info(struct bpf_prog_info *info __maybe_unused,
+						    struct perf_env *env __maybe_unused,
+						    FILE *fp __maybe_unused)
 {
 
 }
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 44140b7f596a..66ef87176b11 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -20,15 +20,20 @@ struct perf_env perf_env;
 #include "bpf-utils.h"
 #include <bpf/libbpf.h>
 
-void perf_env__insert_bpf_prog_info(struct perf_env *env,
-				    struct bpf_prog_info_node *info_node)
+void perf_env__insert_bpf_prog_info(struct perf_env *env, struct bpf_prog_info_node *info_node)
+{
+	down_write(&env->bpf_progs.lock);
+	__perf_env__insert_bpf_prog_info(env, info_node);
+	up_write(&env->bpf_progs.lock);
+}
+
+void __perf_env__insert_bpf_prog_info(struct perf_env *env, struct bpf_prog_info_node *info_node)
 {
 	__u32 prog_id = info_node->info_linear->info.id;
 	struct bpf_prog_info_node *node;
 	struct rb_node *parent = NULL;
 	struct rb_node **p;
 
-	down_write(&env->bpf_progs.lock);
 	p = &env->bpf_progs.infos.rb_node;
 
 	while (*p != NULL) {
@@ -40,15 +45,13 @@ void perf_env__insert_bpf_prog_info(struct perf_env *env,
 			p = &(*p)->rb_right;
 		} else {
 			pr_debug("duplicated bpf prog info %u\n", prog_id);
-			goto out;
+			return;
 		}
 	}
 
 	rb_link_node(&info_node->rb_node, parent, p);
 	rb_insert_color(&info_node->rb_node, &env->bpf_progs.infos);
 	env->bpf_progs.infos_cnt++;
-out:
-	up_write(&env->bpf_progs.lock);
 }
 
 struct bpf_prog_info_node *perf_env__find_bpf_prog_info(struct perf_env *env,
@@ -77,14 +80,22 @@ struct bpf_prog_info_node *perf_env__find_bpf_prog_info(struct perf_env *env,
 }
 
 bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node)
+{
+	bool ret;
+
+	down_write(&env->bpf_progs.lock);
+	ret = __perf_env__insert_btf(env, btf_node);
+	up_write(&env->bpf_progs.lock);
+	return ret;
+}
+
+bool __perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node)
 {
 	struct rb_node *parent = NULL;
 	__u32 btf_id = btf_node->id;
 	struct btf_node *node;
 	struct rb_node **p;
-	bool ret = true;
 
-	down_write(&env->bpf_progs.lock);
 	p = &env->bpf_progs.btfs.rb_node;
 
 	while (*p != NULL) {
@@ -96,25 +107,31 @@ bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node)
 			p = &(*p)->rb_right;
 		} else {
 			pr_debug("duplicated btf %u\n", btf_id);
-			ret = false;
-			goto out;
+			return false;
 		}
 	}
 
 	rb_link_node(&btf_node->rb_node, parent, p);
 	rb_insert_color(&btf_node->rb_node, &env->bpf_progs.btfs);
 	env->bpf_progs.btfs_cnt++;
-out:
-	up_write(&env->bpf_progs.lock);
-	return ret;
+	return true;
 }
 
 struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id)
+{
+	struct btf_node *res;
+
+	down_read(&env->bpf_progs.lock);
+	res = __perf_env__find_btf(env, btf_id);
+	up_read(&env->bpf_progs.lock);
+	return res;
+}
+
+struct btf_node *__perf_env__find_btf(struct perf_env *env, __u32 btf_id)
 {
 	struct btf_node *node = NULL;
 	struct rb_node *n;
 
-	down_read(&env->bpf_progs.lock);
 	n = env->bpf_progs.btfs.rb_node;
 
 	while (n) {
@@ -124,13 +141,9 @@ struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id)
 		else if (btf_id > node->id)
 			n = n->rb_right;
 		else
-			goto out;
+			return node;
 	}
-	node = NULL;
-
-out:
-	up_read(&env->bpf_progs.lock);
-	return node;
+	return NULL;
 }
 
 /* purge data in bpf_progs.infos tree */
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 4566c51f2fd9..359eff51cb85 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -164,12 +164,16 @@ const char *perf_env__raw_arch(struct perf_env *env);
 int perf_env__nr_cpus_avail(struct perf_env *env);
 
 void perf_env__init(struct perf_env *env);
+void __perf_env__insert_bpf_prog_info(struct perf_env *env,
+				      struct bpf_prog_info_node *info_node);
 void perf_env__insert_bpf_prog_info(struct perf_env *env,
 				    struct bpf_prog_info_node *info_node);
 struct bpf_prog_info_node *perf_env__find_bpf_prog_info(struct perf_env *env,
 							__u32 prog_id);
 bool perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node);
+bool __perf_env__insert_btf(struct perf_env *env, struct btf_node *btf_node);
 struct btf_node *perf_env__find_btf(struct perf_env *env, __u32 btf_id);
+struct btf_node *__perf_env__find_btf(struct perf_env *env, __u32 btf_id);
 
 int perf_env__numa_node(struct perf_env *env, struct perf_cpu cpu);
 char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 13e23c386601..310a86c16d88 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1848,8 +1848,8 @@ static void print_bpf_prog_info(struct feat_fd *ff, FILE *fp)
 		node = rb_entry(next, struct bpf_prog_info_node, rb_node);
 		next = rb_next(&node->rb_node);
 
-		bpf_event__print_bpf_prog_info(&node->info_linear->info,
-					       env, fp);
+		__bpf_event__print_bpf_prog_info(&node->info_linear->info,
+						 env, fp);
 	}
 
 	up_read(&env->bpf_progs.lock);
@@ -3179,7 +3179,7 @@ static int process_bpf_prog_info(struct feat_fd *ff, void *data __maybe_unused)
 		/* after reading from file, translate offset to address */
 		bpil_offs_to_addr(info_linear);
 		info_node->info_linear = info_linear;
-		perf_env__insert_bpf_prog_info(env, info_node);
+		__perf_env__insert_bpf_prog_info(env, info_node);
 	}
 
 	up_write(&env->bpf_progs.lock);
@@ -3226,7 +3226,7 @@ static int process_bpf_btf(struct feat_fd *ff, void *data __maybe_unused)
 		if (__do_read(ff, node->data, data_size))
 			goto out;
 
-		perf_env__insert_btf(env, node);
+		__perf_env__insert_btf(env, node);
 		node = NULL;
 	}
 
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 03/53] libperf: Lazily allocate mmap event copy
  2023-11-02 17:56 ` [PATCH v4 03/53] libperf: Lazily allocate mmap event copy Ian Rogers
@ 2023-11-03  8:32   ` Guilherme Amadio
  2023-11-03 15:48     ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Guilherme Amadio @ 2023-11-03  8:32 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain,
	Athira Rajeev, Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

Hi, 

On Thu, Nov 02, 2023 at 10:56:45AM -0700, Ian Rogers wrote:
> The event copy in the mmap is used to provide storage for a read
> event. Not all users of mmaps read the events, such as perf record, so
> switch the allocation to being on first read rather than being
> embedded within the perf_mmap.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/include/internal/mmap.h | 2 +-
>  tools/lib/perf/mmap.c                  | 9 +++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
> index 5a062af8e9d8..b11aaf5ed645 100644
> --- a/tools/lib/perf/include/internal/mmap.h
> +++ b/tools/lib/perf/include/internal/mmap.h
> @@ -33,7 +33,7 @@ struct perf_mmap {
>  	bool			 overwrite;
>  	u64			 flush;
>  	libperf_unmap_cb_t	 unmap_cb;
> -	char			 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> +	void			*event_copy;
>  	struct perf_mmap	*next;
>  };
>  
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index 2184814b37dd..91ae46aac378 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
>  
>  void perf_mmap__munmap(struct perf_mmap *map)
>  {
> +	free(map->event_copy);
> +	map->event_copy = NULL;
>  	if (map && map->base != NULL) {

If map can be NULL, as the if statement above suggests, then there is a
potential null pointer dereference bug here: map is dereferenced before
the NULL check. Suggestion:

    if (!map)
        return;

    free(map->event_copy);
    map->event_copy = NULL;
    if (map->base != NULL) {

    ...

Cheers,
-Guilherme

>  		munmap(map->base, perf_mmap__mmap_len(map));
>  		map->base = NULL;
> @@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
>  			unsigned int len = min(sizeof(*event), size), cpy;
>  			void *dst = map->event_copy;
>  
> +			if (!dst) {
> +				dst = malloc(PERF_SAMPLE_MAX_SIZE);
> +				if (!dst)
> +					return NULL;
> +				map->event_copy = dst;
> +			}
> +
>  			do {
>  				cpy = min(map->mask + 1 - (offset & map->mask), len);
>  				memcpy(dst, &data[offset & map->mask], cpy);
> -- 
> 2.42.0.869.gea05f2083d-goog
> 
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 03/53] libperf: Lazily allocate mmap event copy
  2023-11-03  8:32   ` Guilherme Amadio
@ 2023-11-03 15:48     ` Ian Rogers
  2023-11-05 18:12       ` Namhyung Kim
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-03 15:48 UTC (permalink / raw)
  To: Guilherme Amadio
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain,
	Athira Rajeev, Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Fri, Nov 3, 2023 at 1:33 AM Guilherme Amadio <amadio@gentoo.org> wrote:
>
> Hi,
>
> On Thu, Nov 02, 2023 at 10:56:45AM -0700, Ian Rogers wrote:
> > The event copy in the mmap is used to provide storage for a read
> > event. Not all users of mmaps read the events, such as perf record, so
> > switch the allocation to being on first read rather than being
> > embedded within the perf_mmap.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/include/internal/mmap.h | 2 +-
> >  tools/lib/perf/mmap.c                  | 9 +++++++++
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
> > index 5a062af8e9d8..b11aaf5ed645 100644
> > --- a/tools/lib/perf/include/internal/mmap.h
> > +++ b/tools/lib/perf/include/internal/mmap.h
> > @@ -33,7 +33,7 @@ struct perf_mmap {
> >       bool                     overwrite;
> >       u64                      flush;
> >       libperf_unmap_cb_t       unmap_cb;
> > -     char                     event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > +     void                    *event_copy;
> >       struct perf_mmap        *next;
> >  };
> >
> > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > index 2184814b37dd..91ae46aac378 100644
> > --- a/tools/lib/perf/mmap.c
> > +++ b/tools/lib/perf/mmap.c
> > @@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
> >
> >  void perf_mmap__munmap(struct perf_mmap *map)
> >  {
> > +     free(map->event_copy);
> > +     map->event_copy = NULL;
> >       if (map && map->base != NULL) {
>
> If map can be NULL as the if statement above suggests, then there is a
> potential a null pointer dereference bug here. Suggestion:
>
>     if (!map)
>         return;
>
>     free(map->event_copy);
>     map->event_copy = NULL;
>     if (map->base != NULL) {
>
>     ...

Makes sense, will fix in v5. Waiting to get additional feedback to
avoid too much email.

Thanks,
Ian

> Cheers,
> -Guilherme
>
> >               munmap(map->base, perf_mmap__mmap_len(map));
> >               map->base = NULL;
> > @@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> >                       unsigned int len = min(sizeof(*event), size), cpy;
> >                       void *dst = map->event_copy;
> >
> > +                     if (!dst) {
> > +                             dst = malloc(PERF_SAMPLE_MAX_SIZE);
> > +                             if (!dst)
> > +                                     return NULL;
> > +                             map->event_copy = dst;
> > +                     }
> > +
> >                       do {
> >                               cpy = min(map->mask + 1 - (offset & map->mask), len);
> >                               memcpy(dst, &data[offset & map->mask], cpy);
> > --
> > 2.42.0.869.gea05f2083d-goog
> >
> >

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 01/53] perf comm: Use regular mutex
  2023-11-02 17:56 ` [PATCH v4 01/53] perf comm: Use regular mutex Ian Rogers
@ 2023-11-05 17:31   ` Namhyung Kim
  2023-11-05 21:35     ` Ian Rogers
  2023-11-27 21:53     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 83+ messages in thread
From: Namhyung Kim @ 2023-11-05 17:31 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

Hi Ian,

On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
>
> The rwsem is only ever used for writing, so switch to a mutex that has
> better error checking.

Hmm.. ok.  It doesn't make sense to use rwsem without readers.

>
> Fixes: 7a8f349e9d14 ("perf rwsem: Add debug mode that uses a mutex")

But I'm not sure this is a fix.  Other than that,

> Signed-off-by: Ian Rogers <irogers@google.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
>  tools/perf/util/comm.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
> index afb8d4fd2644..4ae7bc2aa9a6 100644
> --- a/tools/perf/util/comm.c
> +++ b/tools/perf/util/comm.c
> @@ -17,7 +17,7 @@ struct comm_str {
>
>  /* Should perhaps be moved to struct machine */
>  static struct rb_root comm_str_root;
> -static struct rw_semaphore comm_str_lock = {.lock = PTHREAD_RWLOCK_INITIALIZER,};
> +static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
>
>  static struct comm_str *comm_str__get(struct comm_str *cs)
>  {
> @@ -30,9 +30,9 @@ static struct comm_str *comm_str__get(struct comm_str *cs)
>  static void comm_str__put(struct comm_str *cs)
>  {
>         if (cs && refcount_dec_and_test(&cs->refcnt)) {
> -               down_write(&comm_str_lock);
> +               mutex_lock(&comm_str_lock);
>                 rb_erase(&cs->rb_node, &comm_str_root);
> -               up_write(&comm_str_lock);
> +               mutex_unlock(&comm_str_lock);
>                 zfree(&cs->str);
>                 free(cs);
>         }
> @@ -98,9 +98,9 @@ static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
>  {
>         struct comm_str *cs;
>
> -       down_write(&comm_str_lock);
> +       mutex_lock(&comm_str_lock);
>         cs = __comm_str__findnew(str, root);
> -       up_write(&comm_str_lock);
> +       mutex_unlock(&comm_str_lock);
>
>         return cs;
>  }
> --
> 2.42.0.869.gea05f2083d-goog
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 02/53] perf record: Lazy load kernel symbols
  2023-11-02 17:56 ` [PATCH v4 02/53] perf record: Lazy load kernel symbols Ian Rogers
@ 2023-11-05 17:34   ` Namhyung Kim
  2023-11-06 11:00   ` Adrian Hunter
  1 sibling, 0 replies; 83+ messages in thread
From: Namhyung Kim @ 2023-11-05 17:34 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
>
> Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
> changed it so that loading a kernel dso would cause the symbols for
> the dso to be eagerly loaded. For perf record this is overhead as the
> symbols won't be used. Add a symbol_conf to control the behavior and
> disable it for perf record and perf inject.
>
> Signed-off-by: Ian Rogers <irogers@google.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

> ---
>  tools/perf/builtin-inject.c   | 6 ++++++
>  tools/perf/builtin-record.c   | 2 ++
>  tools/perf/util/event.c       | 4 ++--
>  tools/perf/util/symbol_conf.h | 3 ++-
>  4 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index c8cf2fdd9cff..eb3ef5c24b66 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -2265,6 +2265,12 @@ int cmd_inject(int argc, const char **argv)
>                 "perf inject [<options>]",
>                 NULL
>         };
> +
> +       if (!inject.itrace_synth_opts.set) {
> +               /* Disable eager loading of kernel symbols that adds overhead to perf inject. */
> +               symbol_conf.lazy_load_kernel_maps = true;
> +       }
> +
>  #ifndef HAVE_JITDUMP
>         set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
>  #endif
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index dcf288a4fb9a..8ec818568662 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
>  # undef set_nobuild
>  #endif
>
> +       /* Disable eager loading of kernel symbols that adds overhead to perf record. */
> +       symbol_conf.lazy_load_kernel_maps = true;
>         rec->opts.affinity = PERF_AFFINITY_SYS;
>
>         rec->evlist = evlist__new();
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 923c0fb15122..68f45e9e63b6 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
>         if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
>                 al->level = 'k';
>                 maps = machine__kernel_maps(machine);
> -               load_map = true;
> +               load_map = !symbol_conf.lazy_load_kernel_maps;
>         } else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
>                 al->level = '.';
>         } else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
>                 al->level = 'g';
>                 maps = machine__kernel_maps(machine);
> -               load_map = true;
> +               load_map = !symbol_conf.lazy_load_kernel_maps;
>         } else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
>                 al->level = 'u';
>         } else {
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index 0b589570d1d0..2b2fb9e224b0 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -42,7 +42,8 @@ struct symbol_conf {
>                         inline_name,
>                         disable_add2line_warn,
>                         buildid_mmap2,
> -                       guest_code;
> +                       guest_code,
> +                       lazy_load_kernel_maps;
>         const char      *vmlinux_name,
>                         *kallsyms_name,
>                         *source_prefix,
> --
> 2.42.0.869.gea05f2083d-goog
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 03/53] libperf: Lazily allocate mmap event copy
  2023-11-03 15:48     ` Ian Rogers
@ 2023-11-05 18:12       ` Namhyung Kim
  2023-11-27 19:28         ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Namhyung Kim @ 2023-11-05 18:12 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Guilherme Amadio, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

On Fri, Nov 3, 2023 at 8:49 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Nov 3, 2023 at 1:33 AM Guilherme Amadio <amadio@gentoo.org> wrote:
> >
> > Hi,
> >
> > On Thu, Nov 02, 2023 at 10:56:45AM -0700, Ian Rogers wrote:
> > > The event copy in the mmap is used to have storage to a read
> > > event. Not all users of mmaps read the events, such as perf record, so
> > > switch the allocation to being on first read rather than being
> > > embedded within the perf_mmap.
> > >
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > ---
> > >  tools/lib/perf/include/internal/mmap.h | 2 +-
> > >  tools/lib/perf/mmap.c                  | 9 +++++++++
> > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
> > > index 5a062af8e9d8..b11aaf5ed645 100644
> > > --- a/tools/lib/perf/include/internal/mmap.h
> > > +++ b/tools/lib/perf/include/internal/mmap.h
> > > @@ -33,7 +33,7 @@ struct perf_mmap {
> > >       bool                     overwrite;
> > >       u64                      flush;
> > >       libperf_unmap_cb_t       unmap_cb;
> > > -     char                     event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > > +     void                    *event_copy;
> > >       struct perf_mmap        *next;
> > >  };
> > >
> > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > index 2184814b37dd..91ae46aac378 100644
> > > --- a/tools/lib/perf/mmap.c
> > > +++ b/tools/lib/perf/mmap.c
> > > @@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
> > >
> > >  void perf_mmap__munmap(struct perf_mmap *map)
> > >  {
> > > +     free(map->event_copy);
> > > +     map->event_copy = NULL;
> > >       if (map && map->base != NULL) {
> >
> > If map can be NULL as the if statement above suggests, then there is a
> > potential null pointer dereference bug here. Suggestion:
> >
> >     if (!map)
> >         return;
> >
> >     free(map->event_copy);
> >     map->event_copy = NULL;
> >     if (map->base != NULL) {
> >
> >     ...
>
> Makes sense, will fix in v5. Waiting to get additional feedback to
> avoid too much email.

Acked-by: Namhyung Kim <namhyung@kernel.org>


But I have another concern (not related to this change).

> >
> > >               munmap(map->base, perf_mmap__mmap_len(map));
> > >               map->base = NULL;
> > > @@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > >                       unsigned int len = min(sizeof(*event), size), cpy;

I'm not sure if it's ok to read less than the actual size, IOW
it seems to assume 'size' is smaller than sizeof(*event).
I guess it's true for most cases as union perf_event has
perf_record_mmap2 (among others) which contains a
filename array of size PATH_MAX.

But the SAMPLE record can be larger than that when it has
PERF_SAMPLE_AUX IIRC.  It'd happen only if it crossed the mmap
boundary and I'm afraid it'd corrupt the data.

Thanks,
Namhyung


> > >                       void *dst = map->event_copy;
> > >
> > > +                     if (!dst) {
> > > +                             dst = malloc(PERF_SAMPLE_MAX_SIZE);
> > > +                             if (!dst)
> > > +                                     return NULL;
> > > +                             map->event_copy = dst;
> > > +                     }
> > > +
> > >                       do {
> > >                               cpy = min(map->mask + 1 - (offset & map->mask), len);
> > >                               memcpy(dst, &data[offset & map->mask], cpy);
> > > --
> > > 2.42.0.869.gea05f2083d-goog
> > >
> > >

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 01/53] perf comm: Use regular mutex
  2023-11-05 17:31   ` Namhyung Kim
@ 2023-11-05 21:35     ` Ian Rogers
  2023-11-06  3:58       ` Namhyung Kim
  2023-11-27 21:53     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-05 21:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Sun, Nov 5, 2023 at 9:32 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
> >
> > The rwsem is only ever used for writing so switch to a mutex that has
> > better error checking.
>
> Hmm.. ok.  It doesn't make sense to use rwsem without readers.
>
> >
> > Fixes: 7a8f349e9d14 ("perf rwsem: Add debug mode that uses a mutex")
>
> But I'm not sure this is a fix.  Other than that,

Thanks Namhyung, it fixes the case where you enable RWS_ERRORCHECK in
rwsem.h, as the rwsem static initialization is wrong for a mutex.

Ian

> > Signed-off-by: Ian Rogers <irogers@google.com>
>
> Acked-by: Namhyung Kim <namhyung@kernel.org>
>
> Thanks,
> Namhyung
>
>
> > ---
> >  tools/perf/util/comm.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/util/comm.c b/tools/perf/util/comm.c
> > index afb8d4fd2644..4ae7bc2aa9a6 100644
> > --- a/tools/perf/util/comm.c
> > +++ b/tools/perf/util/comm.c
> > @@ -17,7 +17,7 @@ struct comm_str {
> >
> >  /* Should perhaps be moved to struct machine */
> >  static struct rb_root comm_str_root;
> > -static struct rw_semaphore comm_str_lock = {.lock = PTHREAD_RWLOCK_INITIALIZER,};
> > +static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
> >
> >  static struct comm_str *comm_str__get(struct comm_str *cs)
> >  {
> > @@ -30,9 +30,9 @@ static struct comm_str *comm_str__get(struct comm_str *cs)
> >  static void comm_str__put(struct comm_str *cs)
> >  {
> >         if (cs && refcount_dec_and_test(&cs->refcnt)) {
> > -               down_write(&comm_str_lock);
> > +               mutex_lock(&comm_str_lock);
> >                 rb_erase(&cs->rb_node, &comm_str_root);
> > -               up_write(&comm_str_lock);
> > +               mutex_unlock(&comm_str_lock);
> >                 zfree(&cs->str);
> >                 free(cs);
> >         }
> > @@ -98,9 +98,9 @@ static struct comm_str *comm_str__findnew(const char *str, struct rb_root *root)
> >  {
> >         struct comm_str *cs;
> >
> > -       down_write(&comm_str_lock);
> > +       mutex_lock(&comm_str_lock);
> >         cs = __comm_str__findnew(str, root);
> > -       up_write(&comm_str_lock);
> > +       mutex_unlock(&comm_str_lock);
> >
> >         return cs;
> >  }
> > --
> > 2.42.0.869.gea05f2083d-goog
> >

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 06/53] tools api fs: Switch filename__read_str to use io.h
  2023-11-02 17:56 ` [PATCH v4 06/53] tools api fs: Switch filename__read_str to use io.h Ian Rogers
@ 2023-11-06  3:53   ` Namhyung Kim
  2023-11-27 20:26     ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Namhyung Kim @ 2023-11-06  3:53 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
>
> filename__read_str has its own string reading code that allocates
> memory before reading into it. The memory allocated is sized at BUFSIZ,
> which is 8kb. Most strings are short and so most of this 8kb is
> wasted.
>
> Refactor io__getline so that the newline character can be configured
> and ignored in the case of filename__read_str.
>
> Code like build_caches_for_cpu in perf's header.c will read many
> strings and hold them in a data structure, in this case multiple
> strings per cache level per CPU. Using io.h's io__getline avoids the
> wasted memory as strings are temporarily read into a buffer on the
> stack before being copied to a buffer that grows 128 bytes at a time
> and is never sized larger than the string.
>
> For a 16 hyperthread system the memory consumption of "perf record
> true" is reduced by 180kb, primarily through saving memory when
> reading the cache information.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---

[SNIP]
> diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> index a77b74c5fb65..50d33e14fb56 100644
> --- a/tools/lib/api/io.h
> +++ b/tools/lib/api/io.h
> @@ -141,7 +141,7 @@ static inline int io__get_dec(struct io *io, __u64 *dec)
>  }
>
>  /* Read up to and including the first newline following the pattern of getline. */

You may want to update the comment as well.

> -static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
> +static inline ssize_t io__getline_nl(struct io *io, char **line_out, size_t *line_len_out, int nl)

How about io__getdelim() similar to POSIX?
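
i.e. keep io__getline() as a thin wrapper around the renamed function,
something like (sketch):

    /* Read up to and including the first delimiter, following getdelim. */
    static inline ssize_t io__getdelim(struct io *io, char **line_out,
                                       size_t *line_len_out, int delim);

    static inline ssize_t io__getline(struct io *io, char **line_out,
                                      size_t *line_len_out)
    {
            return io__getdelim(io, line_out, line_len_out, /*delim=*/'\n');
    }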

Thanks,
Namhyung


>  {
>         char buf[128];
>         int buf_pos = 0;
> @@ -151,7 +151,7 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
>
>         /* TODO: reuse previously allocated memory. */
>         free(*line_out);
> -       while (ch != '\n') {
> +       while (ch != nl) {
>                 ch = io__get_char(io);
>
>                 if (ch < 0)
> @@ -184,4 +184,9 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
>         return -ENOMEM;
>  }
>
> +static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
> +{
> +       return io__getline_nl(io, line_out, line_len_out, /*nl=*/'\n');
> +}
> +
>  #endif /* __API_IO__ */
> --
> 2.42.0.869.gea05f2083d-goog
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 07/53] tools api fs: Avoid reading whole file for a 1 byte bool
  2023-11-02 17:56 ` [PATCH v4 07/53] tools api fs: Avoid reading whole file for a 1 byte bool Ian Rogers
@ 2023-11-06  3:55   ` Namhyung Kim
  2023-11-27 20:41     ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Namhyung Kim @ 2023-11-06  3:55 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
>
> sysfs__read_bool read a whole file into a string, then looked only at
> the first byte's value. Avoid doing this and just read the first byte.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/api/fs/fs.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
> index 496812b5f1d2..4c35a689d1fc 100644
> --- a/tools/lib/api/fs/fs.c
> +++ b/tools/lib/api/fs/fs.c
> @@ -447,15 +447,16 @@ int sysfs__read_str(const char *entry, char **buf, size_t *sizep)
>
>  int sysfs__read_bool(const char *entry, bool *value)
>  {
> -       char *buf;
> -       size_t size;
> -       int ret;
> +       struct io io;
> +       char bf[16];
> +       int ret = 0;
>
> -       ret = sysfs__read_str(entry, &buf, &size);
> -       if (ret < 0)
> -               return ret;
> +       io.fd = open(entry, O_RDONLY);

The entry is a name in sysfs, so you need to get the full name.
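
i.e. something like (sketch, using the existing sysfs__mountpoint()
helper from api/fs/fs.h):

    const char *sysfs = sysfs__mountpoint();
    char path[PATH_MAX];

    if (!sysfs)
            return -ENOENT;

    snprintf(path, sizeof(path), "%s/%s", sysfs, entry);
    io.fd = open(path, O_RDONLY);
    if (io.fd < 0)
            return -errno;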

Thanks,
Namhyung


> +       if (io.fd < 0)
> +               return -errno;
>
> -       switch (buf[0]) {
> +       io__init(&io, io.fd, bf, sizeof(bf));
> +       switch (io__get_char(&io)) {
>         case '1':
>         case 'y':
>         case 'Y':
> @@ -469,8 +470,7 @@ int sysfs__read_bool(const char *entry, bool *value)
>         default:
>                 ret = -1;
>         }
> -
> -       free(buf);
> +       close(io.fd);
>
>         return ret;
>  }
> --
> 2.42.0.869.gea05f2083d-goog
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 01/53] perf comm: Use regular mutex
  2023-11-05 21:35     ` Ian Rogers
@ 2023-11-06  3:58       ` Namhyung Kim
  2023-11-27 18:59         ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Namhyung Kim @ 2023-11-06  3:58 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Sun, Nov 5, 2023 at 1:35 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Nov 5, 2023 at 9:32 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Ian,
> >
> > On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
> > >
> > > The rwsem is only ever used for writing so switch to a mutex that has
> > > better error checking.
> >
> > Hmm.. ok.  It doesn't make sense to use rwsem without readers.
> >
> > >
> > > Fixes: 7a8f349e9d14 ("perf rwsem: Add debug mode that uses a mutex")
> >
> > But I'm not sure this is a fix.  Other than that,
>
> Thanks Namhyung, it fixes the case where you enable RWS_ERRORCHECK in
> rwsem.h, as the rwsem static initialization is wrong for a mutex.

Sounds like we need a separate fix.  Maybe you need to
add a static initializer macro depending on the config.
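
Something along these lines, perhaps (untested sketch; the macro name
and the mutex field name are made up, the real layout is whatever
rwsem.h uses under RWS_ERRORCHECK):

    #ifdef RWS_ERRORCHECK
    #define RW_SEMAPHORE_INITIALIZER \
            { .mtx = { .lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP, }, }
    #else
    #define RW_SEMAPHORE_INITIALIZER \
            { .lock = PTHREAD_RWLOCK_INITIALIZER, }
    #endif

so comm_str_lock could stay a struct rw_semaphore and still be
statically initialized correctly in either mode.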

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 02/53] perf record: Lazy load kernel symbols
  2023-11-02 17:56 ` [PATCH v4 02/53] perf record: Lazy load kernel symbols Ian Rogers
  2023-11-05 17:34   ` Namhyung Kim
@ 2023-11-06 11:00   ` Adrian Hunter
  2023-11-08 16:01     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2023-11-06 11:00 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

On 2/11/23 19:56, Ian Rogers wrote:
> Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
> changed it so that loading a kernel dso would cause the symbols for
> the dso to be eagerly loaded. For perf record this is overhead as the
> symbols won't be used. Add a symbol_conf to control the behavior and
> disable it for perf record and perf inject.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  tools/perf/builtin-inject.c   | 6 ++++++
>  tools/perf/builtin-record.c   | 2 ++
>  tools/perf/util/event.c       | 4 ++--
>  tools/perf/util/symbol_conf.h | 3 ++-
>  4 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index c8cf2fdd9cff..eb3ef5c24b66 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -2265,6 +2265,12 @@ int cmd_inject(int argc, const char **argv)
>  		"perf inject [<options>]",
>  		NULL
>  	};
> +
> +	if (!inject.itrace_synth_opts.set) {
> +		/* Disable eager loading of kernel symbols that adds overhead to perf inject. */
> +		symbol_conf.lazy_load_kernel_maps = true;
> +	}
> +
>  #ifndef HAVE_JITDUMP
>  	set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
>  #endif
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index dcf288a4fb9a..8ec818568662 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
>  # undef set_nobuild
>  #endif
>  
> +	/* Disable eager loading of kernel symbols that adds overhead to perf record. */
> +	symbol_conf.lazy_load_kernel_maps = true;
>  	rec->opts.affinity = PERF_AFFINITY_SYS;
>  
>  	rec->evlist = evlist__new();
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 923c0fb15122..68f45e9e63b6 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
>  	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
>  		al->level = 'k';
>  		maps = machine__kernel_maps(machine);
> -		load_map = true;
> +		load_map = !symbol_conf.lazy_load_kernel_maps;
>  	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
>  		al->level = '.';
>  	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
>  		al->level = 'g';
>  		maps = machine__kernel_maps(machine);
> -		load_map = true;
> +		load_map = !symbol_conf.lazy_load_kernel_maps;
>  	} else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
>  		al->level = 'u';
>  	} else {
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index 0b589570d1d0..2b2fb9e224b0 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -42,7 +42,8 @@ struct symbol_conf {
>  			inline_name,
>  			disable_add2line_warn,
>  			buildid_mmap2,
> -			guest_code;
> +			guest_code,
> +			lazy_load_kernel_maps;
>  	const char	*vmlinux_name,
>  			*kallsyms_name,
>  			*source_prefix,


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 05/53] perf machine thread: Remove exited threads by default
  2023-11-02 17:56 ` [PATCH v4 05/53] perf machine thread: Remove exited threads by default Ian Rogers
@ 2023-11-06 11:28   ` Adrian Hunter
  2023-11-08 16:04     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Adrian Hunter @ 2023-11-06 11:28 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

On 2/11/23 19:56, Ian Rogers wrote:
> struct thread values hold onto references to mmaps, dsos, etc. When a
> thread exits it is necessary to clean all of this memory up by
> removing the thread from the machine's threads. Some tools require
> that this doesn't happen, such as auxtrace events, or perf report when
> offcpu events exist or a task list is being generated, so add a
> symbol_conf value to make the behavior optional. When an exited thread
> is left in the machine's threads, mark it as exited.
> 
> This change relates to commit 40826c45eb0b ("perf thread: Remove
> notion of dead threads"). Dead threads were removed as they had a
> reference count of 0 and were difficult to reason about with the
> reference count checker. Here a thread is removed from threads when it
> exits, unless via symbol_conf the exited thread isn't removed and is
> marked as exited. Reference counting behaves as it normally does.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>

For auxtrace:

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  tools/perf/builtin-report.c   |  7 +++++++
>  tools/perf/util/machine.c     | 10 +++++++---
>  tools/perf/util/session.c     |  5 +++++
>  tools/perf/util/symbol_conf.h |  3 ++-
>  tools/perf/util/thread.h      | 14 ++++++++++++++
>  5 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 9cb1da2dc0c0..121a2781323c 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1426,6 +1426,13 @@ int cmd_report(int argc, const char **argv)
>  	if (ret < 0)
>  		goto exit;
>  
> +	/*
> +	 * tasks_mode require access to exited threads to list those that are in
> +	 * the data file. Off-cpu events are synthesized after other events and
> +	 * reference exited threads.
> +	 */
> +	symbol_conf.keep_exited_threads = true;
> +
>  	annotation_options__init(&report.annotation_opts);
>  
>  	ret = perf_config(report__config, &report);
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 90c750150b19..a985d004aa8d 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
>  	if (dump_trace)
>  		perf_event__fprintf_task(event, stdout);
>  
> -	if (thread != NULL)
> -		thread__put(thread);
> -
> +	if (thread != NULL) {
> +		if (symbol_conf.keep_exited_threads)
> +			thread__set_exited(thread, /*exited=*/true);
> +		else
> +			machine__remove_thread(machine, thread);
> +	}
> +	thread__put(thread);
>  	return 0;
>  }
>  
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 1e9aa8ed15b6..c6afba7ab1a5 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -115,6 +115,11 @@ static int perf_session__open(struct perf_session *session, int repipe_fd)
>  		return -1;
>  	}
>  
> +	if (perf_header__has_feat(&session->header, HEADER_AUXTRACE)) {
> +		/* Auxiliary events may reference exited threads, hold onto dead ones. */
> +		symbol_conf.keep_exited_threads = true;
> +	}
> +
>  	if (perf_data__is_pipe(data))
>  		return 0;
>  
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index 2b2fb9e224b0..6040286e07a6 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -43,7 +43,8 @@ struct symbol_conf {
>  			disable_add2line_warn,
>  			buildid_mmap2,
>  			guest_code,
> -			lazy_load_kernel_maps;
> +			lazy_load_kernel_maps,
> +			keep_exited_threads;
>  	const char	*vmlinux_name,
>  			*kallsyms_name,
>  			*source_prefix,
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index e79225a0ea46..0df775b5c110 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -36,13 +36,22 @@ struct thread_rb_node {
>  };
>  
>  DECLARE_RC_STRUCT(thread) {
> +	/** @maps: mmaps associated with this thread. */
>  	struct maps		*maps;
>  	pid_t			pid_; /* Not all tools update this */
> +	/** @tid: thread ID number unique to a machine. */
>  	pid_t			tid;
> +	/** @ppid: parent process of the process this thread belongs to. */
>  	pid_t			ppid;
>  	int			cpu;
>  	int			guest_cpu; /* For QEMU thread */
>  	refcount_t		refcnt;
> +	/**
> +	 * @exited: Has the thread had an exit event. Such threads are usually
> +	 * removed from the machine's threads but some events/tools require
> +	 * access to dead threads.
> +	 */
> +	bool			exited;
>  	bool			comm_set;
>  	int			comm_len;
>  	struct list_head	namespaces_list;
> @@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
>  	return &RC_CHK_ACCESS(thread)->refcnt;
>  }
>  
> +static inline void thread__set_exited(struct thread *thread, bool exited)
> +{
> +	RC_CHK_ACCESS(thread)->exited = exited;
> +}
> +
>  static inline bool thread__comm_set(const struct thread *thread)
>  {
>  	return RC_CHK_ACCESS(thread)->comm_set;


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 02/53] perf record: Lazy load kernel symbols
  2023-11-06 11:00   ` Adrian Hunter
@ 2023-11-08 16:01     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-08 16:01 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Nick Terrell,
	Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Mon, Nov 06, 2023 at 01:00:14PM +0200, Adrian Hunter escreveu:
> On 2/11/23 19:56, Ian Rogers wrote:
> > Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
> > changed it so that loading a kernel dso would cause the symbols for
> > the dso to be eagerly loaded. For perf record this is overhead as the
> > symbols won't be used. Add a symbol_conf to control the behavior and
> > disable it for perf record and perf inject.

> > Signed-off-by: Ian Rogers <irogers@google.com>
 
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

Thanks, applied to perf-tools-next.

- Arnaldo


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 05/53] perf machine thread: Remove exited threads by default
  2023-11-06 11:28   ` Adrian Hunter
@ 2023-11-08 16:04     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-08 16:04 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Nick Terrell,
	Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Mon, Nov 06, 2023 at 01:28:43PM +0200, Adrian Hunter escreveu:
> On 2/11/23 19:56, Ian Rogers wrote:
> > struct thread values hold onto references to mmaps, dsos, etc. When a
> > thread exits it is necessary to clean all of this memory up by
> > removing the thread from the machine's threads. Some tools require
> > that this doesn't happen, such as auxtrace events, or perf report when
> > offcpu events exist or a task list is being generated, so add a
> > symbol_conf value to make the behavior optional. When an exited thread
> > is left in the machine's threads, mark it as exited.
> > 
> > This change relates to commit 40826c45eb0b ("perf thread: Remove
> > notion of dead threads"). Dead threads were removed as they had a
> > reference count of 0 and were difficult to reason about with the
> > reference count checker. Here a thread is removed from threads when it
> > exits, unless via symbol_conf the exited thread isn't removed and is
> > marked as exited. Reference counting behaves as it normally does.
> > 
> > Signed-off-by: Ian Rogers <irogers@google.com>
> 
> For auxtrace:
> 
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

Thanks, applied to perf-tools-next.

- Arnaldo


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled
  2023-11-02 17:56 ` [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled Ian Rogers
@ 2023-11-08 16:14   ` Arnaldo Carvalho de Melo
  2023-11-08 23:03     ` Song Liu
  0 siblings, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-08 16:14 UTC (permalink / raw)
  To: Ian Rogers, Song Liu
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Thu, Nov 02, 2023 at 10:56:54AM -0700, Ian Rogers escreveu:
> If BPF sideband events are disabled on the command line, don't
> synthesize BPF events either.


Interesting, in 71184c6ab7e60fd5 ("perf record: Replace option
--bpf-event with --no-bpf-event") we checked that, but only down at
perf_event__synthesize_one_bpf_prog(), where we have:

        if (!opts->no_bpf_event) {
                /* Synthesize PERF_RECORD_BPF_EVENT */
                *bpf_event = (struct perf_record_bpf_event)


So we'd better remove that now-redundant check? I'll apply your patch as
is and then we can remove the other check.

Song, can I have your Acked-by or Reviewed-by, please?

- Arnaldo


 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/bpf-event.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
> index 38fcf3ba5749..830711cae30d 100644
> --- a/tools/perf/util/bpf-event.c
> +++ b/tools/perf/util/bpf-event.c
> @@ -386,6 +386,9 @@ int perf_event__synthesize_bpf_events(struct perf_session *session,
>  	int err;
>  	int fd;
>  
> +	if (opts->no_bpf_event)
> +		return 0;
> +
>  	event = malloc(sizeof(event->bpf) + KSYM_NAME_LEN + machine->id_hdr_size);
>  	if (!event)
>  		return -1;
> -- 
> 2.42.0.869.gea05f2083d-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled
  2023-11-08 16:14   ` Arnaldo Carvalho de Melo
@ 2023-11-08 23:03     ` Song Liu
  2023-11-09 16:10       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Song Liu @ 2023-11-08 23:03 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Song Liu, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Wed, Nov 8, 2023 at 8:15 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Thu, Nov 02, 2023 at 10:56:54AM -0700, Ian Rogers escreveu:
> > If BPF sideband events are disabled on the command line, don't
> > synthesize BPF events either.
>
>
> Interesting, in 71184c6ab7e60fd5 ("perf record: Replace option
> --bpf-event with --no-bpf-event") we checked that, but only down at
> perf_event__synthesize_one_bpf_prog(), where we have:
>
>         if (!opts->no_bpf_event) {
>                 /* Synthesize PERF_RECORD_BPF_EVENT */
>                 *bpf_event = (struct perf_record_bpf_event)
>
>
> So we better remove that, now redundant check? I'll apply your patch as
> is and then we can remove that other check.
>
> Song, can I have your Acked-by or Reviewed-by, please?
>
> - Arnaldo
>
> > Signed-off-by: Ian Rogers <irogers@google.com>

Good catch!

Acked-by: Song Liu <song@kernel.org>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled
  2023-11-08 23:03     ` Song Liu
@ 2023-11-09 16:10       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-09 16:10 UTC (permalink / raw)
  To: Song Liu
  Cc: Ian Rogers, Song Liu, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

Em Wed, Nov 08, 2023 at 03:03:15PM -0800, Song Liu escreveu:
> On Wed, Nov 8, 2023 at 8:15 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >
> > Em Thu, Nov 02, 2023 at 10:56:54AM -0700, Ian Rogers escreveu:
> > > If BPF sideband events are disabled on the command line, don't
> > > synthesize BPF events either.
> >
> >
> > Interesting, in 71184c6ab7e60fd5 ("perf record: Replace option
> > --bpf-event with --no-bpf-event") we checked that, but only down at
> > perf_event__synthesize_one_bpf_prog(), where we have:
> >
> >         if (!opts->no_bpf_event) {
> >                 /* Synthesize PERF_RECORD_BPF_EVENT */
> >                 *bpf_event = (struct perf_record_bpf_event)
> >
> >
> > So we better remove that, now redundant check? I'll apply your patch as
> > is and then we can remove that other check.
> >
> > Song, can I have your Acked-by or Reviewed-by, please?
> >
> > - Arnaldo
> >
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> 
> Good catch!
> 
> Acked-by: Song Liu <song@kernel.org>

Thanks, applied the patch with your Acked-by, will revisit this after
this gets published.

- Arnaldo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 01/53] perf comm: Use regular mutex
  2023-11-06  3:58       ` Namhyung Kim
@ 2023-11-27 18:59         ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-27 18:59 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Sun, Nov 5, 2023 at 7:59 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Sun, Nov 5, 2023 at 1:35 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Sun, Nov 5, 2023 at 9:32 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > The rwsem is only ever used for writing so switch to a mutex that has
> > > > better error checking.
> > >
> > > Hmm.. ok.  It doesn't make sense to use rwsem without readers.
> > >
> > > >
> > > > Fixes: 7a8f349e9d14 ("perf rwsem: Add debug mode that uses a mutex")
> > >
> > > But I'm not sure this is a fix.  Other than that,
> >
> > Thanks Namhyung, it fixes the case where you enable RWS_ERRORCHECK in
> > rwsem.h, as the rwsem static initialization is wrong for a mutex.
>
> Sounds like we need a separate fix.  Maybe you need to
> add a static initializer macro depending on the config.

Agreed, but the only use would be here, and switching this case to a
mutex gives extra error checking, such as detecting when the mutex is
taken recursively. Given that, I prefer the existing change; the static
initializer for rwsem can be a follow-up when needed.

Thanks,
Ian

> Thanks,
> Namhyung

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 03/53] libperf: Lazily allocate mmap event copy
  2023-11-05 18:12       ` Namhyung Kim
@ 2023-11-27 19:28         ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-27 19:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Guilherme Amadio, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

On Sun, Nov 5, 2023 at 10:12 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Nov 3, 2023 at 8:49 AM Ian Rogers <irogers@google.com> wrote:
> >
> > On Fri, Nov 3, 2023 at 1:33 AM Guilherme Amadio <amadio@gentoo.org> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Nov 02, 2023 at 10:56:45AM -0700, Ian Rogers wrote:
> > > > The event copy in the mmap is used to have storage to a read
> > > > event. Not all users of mmaps read the events, such as perf record, so
> > > > switch the allocation to being on first read rather than being
> > > > embedded within the perf_mmap.
> > > >
> > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > ---
> > > >  tools/lib/perf/include/internal/mmap.h | 2 +-
> > > >  tools/lib/perf/mmap.c                  | 9 +++++++++
> > > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
> > > > index 5a062af8e9d8..b11aaf5ed645 100644
> > > > --- a/tools/lib/perf/include/internal/mmap.h
> > > > +++ b/tools/lib/perf/include/internal/mmap.h
> > > > @@ -33,7 +33,7 @@ struct perf_mmap {
> > > >       bool                     overwrite;
> > > >       u64                      flush;
> > > >       libperf_unmap_cb_t       unmap_cb;
> > > > -     char                     event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > > > +     void                    *event_copy;
> > > >       struct perf_mmap        *next;
> > > >  };
> > > >
> > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > > index 2184814b37dd..91ae46aac378 100644
> > > > --- a/tools/lib/perf/mmap.c
> > > > +++ b/tools/lib/perf/mmap.c
> > > > @@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
> > > >
> > > >  void perf_mmap__munmap(struct perf_mmap *map)
> > > >  {
> > > > +     free(map->event_copy);
> > > > +     map->event_copy = NULL;
> > > >       if (map && map->base != NULL) {
> > >
> > > If map can be NULL as the if statement above suggests, then there is a
> > > potential null pointer dereference bug here. Suggestion:
> > >
> > >     if (!map)
> > >         return;
> > >
> > >     free(map->event_copy);
> > >     map->event_copy = NULL;
> > >     if (map->base != NULL) {
> > >
> > >     ...
> >
> > Makes sense, will fix in v5. Waiting to get additional feedback to
> > avoid too much email.
>
> Acked-by: Namhyung Kim <namhyung@kernel.org>
>
>
> But I have another concern (not related to this change).
>
> > >
> > > >               munmap(map->base, perf_mmap__mmap_len(map));
> > > >               map->base = NULL;
> > > > @@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > > >                       unsigned int len = min(sizeof(*event), size), cpy;
>
> I'm not sure if it's ok to read less than the actual size, IOW
> it seems to assume 'size' is smaller than sizeof(*event).
> I guess it's true for most cases as union perf_event has
> perf_record_mmap2 (among others) which contains a
> filename array of size PATH_MAX.
>
> But the SAMPLE record can be larger than that when it has
> PERF_SAMPLE_AUX IIRC.  It'd happen only if it crossed the mmap
> boundary and I'm afraid it'd corrupt the data.

Thanks, I was thinking this would just be a drop-in change, but given
this feedback it would be better to switch from allocating a
PERF_SAMPLE_MAX_SIZE buffer once to allocating or reallocating one
based on size. This saves memory when size is less than
PERF_SAMPLE_MAX_SIZE, and by removing the min calculation for the
amount copied (len) we can exceed PERF_SAMPLE_MAX_SIZE and fix the
potential corruption. I'll add this in v5.
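
Roughly (untested sketch; event_copy_sz is a new size field that would
be added to struct perf_mmap next to event_copy):

    /* Grow the copy buffer to the full record size, no truncation. */
    if (map->event_copy_sz < size) {
            void *tmp = realloc(map->event_copy, size);

            if (!tmp)
                    return NULL;
            map->event_copy = tmp;
            map->event_copy_sz = size;
    }
    dst = map->event_copy;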

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > > >                       void *dst = map->event_copy;
> > > >
> > > > +                     if (!dst) {
> > > > +                             dst = malloc(PERF_SAMPLE_MAX_SIZE);
> > > > +                             if (!dst)
> > > > +                                     return NULL;
> > > > +                             map->event_copy = dst;
> > > > +                     }
> > > > +
> > > >                       do {
> > > >                               cpy = min(map->mask + 1 - (offset & map->mask), len);
> > > >                               memcpy(dst, &data[offset & map->mask], cpy);
> > > > --
> > > > 2.42.0.869.gea05f2083d-goog
> > > >
> > > >

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 06/53] tools api fs: Switch filename__read_str to use io.h
  2023-11-06  3:53   ` Namhyung Kim
@ 2023-11-27 20:26     ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-27 20:26 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Sun, Nov 5, 2023 at 7:53 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
> >
> > filename__read_str has its own string reading code that allocates
> > memory before reading into it. The memory allocated is sized at BUFSIZ,
> > which is 8kb. Most strings are short and so most of this 8kb is
> > wasted.
> >
> > Refactor io__getline so that the newline character can be configured
> > and ignored in the case of filename__read_str.
> >
> > Code like build_caches_for_cpu in perf's header.c will read many
> > strings and hold them in a data structure, in this case multiple
> > strings per cache level per CPU. Using io.h's io__getline avoids the
> > wasted memory as strings are temporarily read into a buffer on the
> > stack before being copied to a buffer that grows 128 bytes at a time
> > and is never sized larger than the string.
> >
> > For a 16 hyperthread system the memory consumption of "perf record
> > true" is reduced by 180kb, primarily through saving memory when
> > reading the cache information.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
>
> [SNIP]
> > diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> > index a77b74c5fb65..50d33e14fb56 100644
> > --- a/tools/lib/api/io.h
> > +++ b/tools/lib/api/io.h
> > @@ -141,7 +141,7 @@ static inline int io__get_dec(struct io *io, __u64 *dec)
> >  }
> >
> >  /* Read up to and including the first newline following the pattern of getline. */
>
> You may want to update the comment as well.
>
> > -static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
> > +static inline ssize_t io__getline_nl(struct io *io, char **line_out, size_t *line_len_out, int nl)
>
> How about io__getdelim() similar to POSIX?

Thanks done for v5.

Ian

> Thanks,
> Namhyung
>
>
> >  {
> >         char buf[128];
> >         int buf_pos = 0;
> > @@ -151,7 +151,7 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
> >
> >         /* TODO: reuse previously allocated memory. */
> >         free(*line_out);
> > -       while (ch != '\n') {
> > +       while (ch != nl) {
> >                 ch = io__get_char(io);
> >
> >                 if (ch < 0)
> > @@ -184,4 +184,9 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
> >         return -ENOMEM;
> >  }
> >
> > +static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
> > +{
> > +       return io__getline_nl(io, line_out, line_len_out, /*nl=*/'\n');
> > +}
> > +
> >  #endif /* __API_IO__ */
> > --
> > 2.42.0.869.gea05f2083d-goog
> >
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 07/53] tools api fs: Avoid reading whole file for a 1 byte bool
  2023-11-06  3:55   ` Namhyung Kim
@ 2023-11-27 20:41     ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-27 20:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Nick Terrell, Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev,
	Huacai Chen, Masami Hiramatsu, Vincent Whitchurch,
	Steinar H. Gunderson, Liam Howlett, Miguel Ojeda, Colin Ian King,
	Dmitrii Dolgov, Yang Jihong, Ming Wang, James Clark,
	K Prateek Nayak, Sean Christopherson, Leo Yan, Ravi Bangoria,
	German Gomez, Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das,
	liuwenyu, linux-kernel, linux-perf-users

On Sun, Nov 5, 2023 at 7:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
> >
> > sysfs__read_bool read a whole file into a string, then looked only at
> > the first byte's value. Avoid doing this and just read the first byte.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/api/fs/fs.c | 18 +++++++++---------
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
> > index 496812b5f1d2..4c35a689d1fc 100644
> > --- a/tools/lib/api/fs/fs.c
> > +++ b/tools/lib/api/fs/fs.c
> > @@ -447,15 +447,16 @@ int sysfs__read_str(const char *entry, char **buf, size_t *sizep)
> >
> >  int sysfs__read_bool(const char *entry, bool *value)
> >  {
> > -       char *buf;
> > -       size_t size;
> > -       int ret;
> > +       struct io io;
> > +       char bf[16];
> > +       int ret = 0;
> >
> > -       ret = sysfs__read_str(entry, &buf, &size);
> > -       if (ret < 0)
> > -               return ret;
> > +       io.fd = open(entry, O_RDONLY);
>
> The entry is a name in sysfs, so you need to get the full name.
>
> Thanks,
> Namhyung

Thanks, added in v5.

Ian

>
> > +       if (io.fd < 0)
> > +               return -errno;
> >
> > -       switch (buf[0]) {
> > +       io__init(&io, io.fd, bf, sizeof(bf));
> > +       switch (io__get_char(&io)) {
> >         case '1':
> >         case 'y':
> >         case 'Y':
> > @@ -469,8 +470,7 @@ int sysfs__read_bool(const char *entry, bool *value)
> >         default:
> >                 ret = -1;
> >         }
> > -
> > -       free(buf);
> > +       close(io.fd);
> >
> >         return ret;
> >  }
> > --
> > 2.42.0.869.gea05f2083d-goog
> >

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 01/53] perf comm: Use regular mutex
  2023-11-05 17:31   ` Namhyung Kim
  2023-11-05 21:35     ` Ian Rogers
@ 2023-11-27 21:53     ` Arnaldo Carvalho de Melo
  2023-11-28  0:48       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-27 21:53 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Nick Terrell,
	Kan Liang, Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Sun, Nov 05, 2023 at 09:31:47AM -0800, Namhyung Kim escreveu:
> Hi Ian,
> 
> On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
> >
> > The rwsem is only ever used for writing so switch to a mutex that has
> > better error checking.
> 
> Hmm.. ok.  It doesn't make sense to use rwsem without readers.

Well, the only reader is a findnew method, which will primarily read
but possibly write if it doesn't find the entry, so converting to a
regular mutex seems sensible.

- Arnaldo
 
> > Fixes: 7a8f349e9d14 ("perf rwsem: Add debug mode that uses a mutex")
> 
> But I'm not sure this is a fix.  Other than that,

Yeah, agreed, will remove the fixes.
 
> > Signed-off-by: Ian Rogers <irogers@google.com>
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams
  2023-11-02 17:56 ` [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams Ian Rogers
@ 2023-11-27 22:00   ` Arnaldo Carvalho de Melo
  2023-11-28 17:14     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-27 22:00 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Thu, Nov 02, 2023 at 10:56:46AM -0700, Ian Rogers escreveu:
> Zstd streams create dictionaries that can require significant RAM,
> especially when there is one per CPU. Tools like perf record won't use
> the streams without the -z option, and so the creation of the streams
> is pure overhead. Switch to creating the streams on first use.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>

Thanks, applied to perf-tools-next.

- Arnaldo
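
To make the lazy pattern concrete, a minimal sketch using the libzstd
streaming API — the struct and field names mirror
tools/perf/util/compress.h, but this is not the committed patch:

/* Create the compression stream on first use rather than at startup. */
static int zstd_cstream_init(struct zstd_data *data, int level)
{
	if (data->cstream)	/* already created by an earlier call */
		return 0;

	data->cstream = ZSTD_createCStream();
	if (data->cstream == NULL)
		return -1;

	if (ZSTD_isError(ZSTD_initCStream(data->cstream, level))) {
		ZSTD_freeCStream(data->cstream);
		data->cstream = NULL;
		return -1;
	}

	return 0;
}

Each compress call would invoke this first, so tools that never pass -z
never pay for the per-CPU stream dictionaries.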


* Re: [PATCH v4 10/53] perf record: Be lazier in allocating lost samples buffer
  2023-11-02 17:56 ` [PATCH v4 10/53] perf record: Be lazier in allocating lost samples buffer Ian Rogers
@ 2023-11-27 22:03   ` Arnaldo Carvalho de Melo
  2023-11-27 22:23     ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-27 22:03 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Thu, Nov 02, 2023 at 10:56:52AM -0700, Ian Rogers escreveu:
> Wait until a lost sample occurs to allocate the lost samples buffer;
> often the buffer isn't necessary. This saves a 64kb allocation and
> 5.3kb of peak memory consumption.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 9b4f3805ca92..b6c8c1371b39 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
>  static void record__read_lost_samples(struct record *rec)
>  {
>  	struct perf_session *session = rec->session;
> -	struct perf_record_lost_samples *lost;
> +	struct perf_record_lost_samples *lost = NULL;
>  	struct evsel *evsel;
>  
>  	/* there was an error during record__open */
>  	if (session->evlist == NULL)
>  		return;
>  
> -	lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> -	if (lost == NULL) {
> -		pr_debug("Memory allocation failed\n");
> -		return;
> -	}

Shouldn't we take the time here and instead improve this error message
and then propagate the error?

For instance, we may want to still get some perf.data file without these
records but inform the user at this point how many records were lost
(count.lost)?

- Arnaldo

> -
> -	lost->header.type = PERF_RECORD_LOST_SAMPLES;
> -
>  	evlist__for_each_entry(session->evlist, evsel) {
>  		struct xyarray *xy = evsel->core.sample_id;
>  		u64 lost_count;
> @@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
>  				}
>  
>  				if (count.lost) {
> +					if (!lost) {
> +						lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> +						if (!lost) {
> +							pr_debug("Memory allocation failed\n");
> +							return;
> +						}
> +						lost->header.type = PERF_RECORD_LOST_SAMPLES;
> +					}
>  					__record__save_lost_samples(rec, evsel, lost,
>  								    x, y, count.lost, 0);
>  				}
> @@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
>  		}
>  
>  		lost_count = perf_bpf_filter__lost_count(evsel);
> -		if (lost_count)
> +		if (lost_count) {
> +			if (!lost) {
> +				lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> +				if (!lost) {
> +					pr_debug("Memory allocation failed\n");
> +					return;
> +				}
> +				lost->header.type = PERF_RECORD_LOST_SAMPLES;
> +			}
>  			__record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
>  						    PERF_RECORD_MISC_LOST_SAMPLES_BPF);
> +		}
>  	}
>  out:
>  	free(lost);
> -- 
> 2.42.0.869.gea05f2083d-goog
> 

-- 

- Arnaldo

* Re: [PATCH v4 10/53] perf record: Be lazier in allocating lost samples buffer
  2023-11-27 22:03   ` Arnaldo Carvalho de Melo
@ 2023-11-27 22:23     ` Ian Rogers
  0 siblings, 0 replies; 83+ messages in thread
From: Ian Rogers @ 2023-11-27 22:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

On Mon, Nov 27, 2023 at 2:03 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Thu, Nov 02, 2023 at 10:56:52AM -0700, Ian Rogers escreveu:
> > Wait until a lost sample occurs to allocate the lost samples buffer;
> > often the buffer isn't necessary. This saves a 64kb allocation and
> > 5.3kb of peak memory consumption.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
> >  1 file changed, 19 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 9b4f3805ca92..b6c8c1371b39 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
> >  static void record__read_lost_samples(struct record *rec)
> >  {
> >       struct perf_session *session = rec->session;
> > -     struct perf_record_lost_samples *lost;
> > +     struct perf_record_lost_samples *lost = NULL;
> >       struct evsel *evsel;
> >
> >       /* there was an error during record__open */
> >       if (session->evlist == NULL)
> >               return;
> >
> > -     lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > -     if (lost == NULL) {
> > -             pr_debug("Memory allocation failed\n");
> > -             return;
> > -     }
>
> Shouldn't we take the time here and instead improve this error message
> and then propagate the error?
>
> For instance, we may want to still get some perf.data file without these
> records but inform the user at this point how many records were lost
> (count.lost)?

Sounds like a follow-up; the messages here just move the existing
one, and the point of the patch is to postpone/avoid a memory
allocation when possible.

Thanks,
Ian

> - Arnaldo
>
> > -
> > -     lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > -
> >       evlist__for_each_entry(session->evlist, evsel) {
> >               struct xyarray *xy = evsel->core.sample_id;
> >               u64 lost_count;
> > @@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
> >                               }
> >
> >                               if (count.lost) {
> > +                                     if (!lost) {
> > +                                             lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > +                                             if (!lost) {
> > +                                                     pr_debug("Memory allocation failed\n");
> > +                                                     return;
> > +                                             }
> > +                                             lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > +                                     }
> >                                       __record__save_lost_samples(rec, evsel, lost,
> >                                                                   x, y, count.lost, 0);
> >                               }
> > @@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
> >               }
> >
> >               lost_count = perf_bpf_filter__lost_count(evsel);
> > -             if (lost_count)
> > +             if (lost_count) {
> > +                     if (!lost) {
> > +                             lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > +                             if (!lost) {
> > +                                     pr_debug("Memory allocation failed\n");
> > +                                     return;
> > +                             }
> > +                             lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > +                     }
> >                       __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
> >                                                   PERF_RECORD_MISC_LOST_SAMPLES_BPF);
> > +             }
> >       }
> >  out:
> >       free(lost);
> > --
> > 2.42.0.869.gea05f2083d-goog
> >
>
> --
>
> - Arnaldo
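
For what it's worth, the two duplicated lazy-allocation blocks above
could later be factored into a small helper — a sketch only, with a
made-up helper name, not part of the posted patch:

/* Hypothetical helper: allocate the lost-samples buffer on first use. */
static struct perf_record_lost_samples *lost_samples__get(struct perf_record_lost_samples **lost)
{
	if (*lost == NULL) {
		*lost = zalloc(PERF_SAMPLE_MAX_SIZE);
		if (*lost == NULL) {
			pr_debug("Memory allocation failed\n");
			return NULL;
		}
		(*lost)->header.type = PERF_RECORD_LOST_SAMPLES;
	}
	return *lost;
}

so that each call site reduces to something like:

	if (count.lost && lost_samples__get(&lost))
		__record__save_lost_samples(rec, evsel, lost, x, y, count.lost, 0);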

* Re: [PATCH v4 01/53] perf comm: Use regular mutex
  2023-11-27 21:53     ` Arnaldo Carvalho de Melo
@ 2023-11-28  0:48       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-28  0:48 UTC (permalink / raw)
  To: Ian Rogers, Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Nick Terrell, Kan Liang, Andi Kleen,
	Kajol Jain, Athira Rajeev, Huacai Chen, Masami Hiramatsu,
	Vincent Whitchurch, Steinar H. Gunderson, Liam Howlett,
	Miguel Ojeda, Colin Ian King, Dmitrii Dolgov, Yang Jihong,
	Ming Wang, James Clark, K Prateek Nayak, Sean Christopherson,
	Leo Yan, Ravi Bangoria, German Gomez, Changbin Du, Paolo Bonzini,
	Li Dong, Sandipan Das, liuwenyu, linux-kernel, linux-perf-users

Em Mon, Nov 27, 2023 at 06:53:19PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Sun, Nov 05, 2023 at 09:31:47AM -0800, Namhyung Kim escreveu:
> > Hi Ian,
> > 
> > On Thu, Nov 2, 2023 at 10:58 AM Ian Rogers <irogers@google.com> wrote:
> > >
> > > The rwsem is only ever used for writing, so switch to a mutex that has
> > > better error checking.
> > 
> > Hmm.. ok.  It doesn't make sense to use an rwsem without readers.
> 
> Well, the only reader is a findnew method, which will primarily read
> but possibly write if it doesn't find the entry, so converting to a
> regular mutex seems sensible.

To be fixed tomorrow:

   3    32.71 alpine:3.15                   : FAIL gcc version 10.3.1 20211027 (Alpine 10.3.1_git20211027)
    util/comm.c:20:46: error: 'PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP' undeclared here (not in a function)
       20 | static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
          |                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
   4    32.17 alpine:3.16                   : FAIL gcc version 11.2.1 20220219 (Alpine 11.2.1_git20220219)
    util/comm.c:20:46: error: 'PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP' undeclared here (not in a function)
       20 | static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
          |                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
   5    25.82 alpine:3.17                   : FAIL gcc version 12.2.1 20220924 (Alpine 12.2.1_git20220924-r4)
    util/comm.c:20:46: error: 'PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP' undeclared here (not in a function)
       20 | static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
          |                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
   6    26.64 alpine:3.18                   : FAIL gcc version 12.2.1 20220924 (Alpine 12.2.1_git20220924-r10)
    util/comm.c:20:46: error: 'PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP' undeclared here (not in a function)
       20 | static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
          |                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
   7    29.66 alpine:edge                   : FAIL gcc version 13.1.1 20230722 (Alpine 13.1.1_git20230722)
    util/comm.c:20:46: error: 'PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP' undeclared here (not in a function)
       20 | static struct mutex comm_str_lock = {.lock = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP,};
          |                                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2


 I.e., this doesn't play well with musl libc.

 - Arnaldo
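
One portable shape for the fix — a sketch, assuming nothing about what
actually lands in perf — is to fall back to runtime initialization when
the glibc-only static initializer is unavailable:

#include <pthread.h>

/*
 * PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP is a glibc extension that musl
 * does not provide, so set the error-checking attribute at runtime;
 * pthread_once() keeps the one-time init thread safe.
 */
static pthread_mutex_t comm_str_lock;
static pthread_once_t comm_str_lock_once = PTHREAD_ONCE_INIT;

static void comm_str_lock_init(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&comm_str_lock, &attr);
	pthread_mutexattr_destroy(&attr);
}

static void comm_str_lock_acquire(void)
{
	pthread_once(&comm_str_lock_once, comm_str_lock_init);
	pthread_mutex_lock(&comm_str_lock);
}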

* Re: [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams
  2023-11-27 22:00   ` Arnaldo Carvalho de Melo
@ 2023-11-28 17:14     ` Arnaldo Carvalho de Melo
  2023-11-28 17:38       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-28 17:14 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Mon, Nov 27, 2023 at 07:00:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Nov 02, 2023 at 10:56:46AM -0700, Ian Rogers escreveu:
> > Zstd streams create dictionaries that can require significant RAM,
> > especially when there is one per CPU. Tools like perf record won't use
> > the streams without the -z option, and so the creation of the streams
> > is pure overhead. Switch to creating the streams on first use.
> > 
> > Signed-off-by: Ian Rogers <irogers@google.com>
> 
> Thanks, applied to perf-tools-next.

Trying to fix this now:

  6    20.59 alpine:3.18                   : FAIL gcc version 12.2.1 20220924 (Alpine 12.2.1_git20220924-r10)
    In file included from util/zstd.c:5:
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          | ^~~~~~~
          | size_t
    util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
      CC      /tmp/build/perf/util/zstd.o
      CC      /tmp/build/perf/util/cap.o
      CXX     /tmp/build/perf/util/demangle-cxx.o
      CC      /tmp/build/perf/util/demangle-ocaml.o
    In file included from util/zstd.c:5:
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          | ^~~~~~~
          | size_t
      CC      /tmp/build/perf/util/demangle-java.o
    util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
   7    21.14 alpine:edge                   : FAIL gcc version 13.1.1 20230722 (Alpine 13.1.1_git20230722)
    In file included from util/zstd.c:5:
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          | ^~~~~~~
          | size_t
    util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
      CC      /tmp/build/perf/util/cap.o
      CXX     /tmp/build/perf/util/demangle-cxx.o
      CC      /tmp/build/perf/util/demangle-ocaml.o
      CC      /tmp/build/perf/util/demangle-java.o
    In file included from util/zstd.c:5:
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          | ^~~~~~~
          | size_t
    util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
       34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


* Re: [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams
  2023-11-28 17:14     ` Arnaldo Carvalho de Melo
@ 2023-11-28 17:38       ` Arnaldo Carvalho de Melo
  2023-11-28 17:55         ` Ian Rogers
  0 siblings, 1 reply; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-28 17:38 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Tue, Nov 28, 2023 at 02:14:33PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Nov 27, 2023 at 07:00:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Thu, Nov 02, 2023 at 10:56:46AM -0700, Ian Rogers escreveu:
> > > Zstd streams create dictionaries that can require significant RAM,
> > > especially when there is one per CPU. Tools like perf record won't use
> > > the streams without the -z option, and so the creation of the streams
> > > is pure overhead. Switch to creating the streams on first use.

> > > Signed-off-by: Ian Rogers <irogers@google.com>

> > Thanks, applied to perf-tools-next.

> Trying to fix this now:
> 
>   6    20.59 alpine:3.18                   : FAIL gcc version 12.2.1 20220924 (Alpine 12.2.1_git20220924-r10)
>     In file included from util/zstd.c:5:
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?

So the problem was really the one above; it got fixed with the patch
below, which matches what 'man size_t' documents on my fedora:38 system.

- Arnaldo

diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h
index 9eb6eb5bf038ce54..b29109cd36095c4f 100644
--- a/tools/perf/util/compress.h
+++ b/tools/perf/util/compress.h
@@ -3,7 +3,8 @@
 #define PERF_COMPRESS_H
 
 #include <stdbool.h>
-#include <stdlib.h>
+#include <stddef.h>
+#include <sys/types.h>
 #ifdef HAVE_ZSTD_SUPPORT
 #include <zstd.h>
 #endif

>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           | ^~~~~~~
>           | size_t
>     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
>       CC      /tmp/build/perf/util/zstd.o
>       CC      /tmp/build/perf/util/cap.o
>       CXX     /tmp/build/perf/util/demangle-cxx.o
>       CC      /tmp/build/perf/util/demangle-ocaml.o
>     In file included from util/zstd.c:5:
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           | ^~~~~~~
>           | size_t
>       CC      /tmp/build/perf/util/demangle-java.o
>     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>    7    21.14 alpine:edge                   : FAIL gcc version 13.1.1 20230722 (Alpine 13.1.1_git20230722)
>     In file included from util/zstd.c:5:
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           | ^~~~~~~
>           | size_t
>     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
>       CC      /tmp/build/perf/util/cap.o
>       CXX     /tmp/build/perf/util/demangle-cxx.o
>       CC      /tmp/build/perf/util/demangle-ocaml.o
>       CC      /tmp/build/perf/util/demangle-java.o
>     In file included from util/zstd.c:5:
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           | ^~~~~~~
>           | size_t
>     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
>        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
>           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

-- 
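
An aside on why the one-line include change is the right fix: POSIX
declares ssize_t in <sys/types.h>; glibc's <stdlib.h> happens to expose
it transitively while musl's does not, so a header that names ssize_t
has to include <sys/types.h> itself, e.g.:

/* ssize_t is declared by <sys/types.h> per POSIX; don't rely on
 * <stdlib.h> dragging it in (glibc does, musl does not). */
#include <sys/types.h>

ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
					void *src, size_t src_size, size_t max_record_size,
					size_t process_header(void *record, size_t increment));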

* Re: [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams
  2023-11-28 17:38       ` Arnaldo Carvalho de Melo
@ 2023-11-28 17:55         ` Ian Rogers
  2023-11-28 20:29           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 83+ messages in thread
From: Ian Rogers @ 2023-11-28 17:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

On Tue, Nov 28, 2023 at 9:38 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Tue, Nov 28, 2023 at 02:14:33PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Nov 27, 2023 at 07:00:06PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Thu, Nov 02, 2023 at 10:56:46AM -0700, Ian Rogers escreveu:
> > > > Zstd streams create dictionaries that can require significant RAM,
> > > > especially when there is one per CPU. Tools like perf record won't use
> > > > the streams without the -z option, and so the creation of the streams
> > > > is pure overhead. Switch to creating the streams on first use.
>
> > > > Signed-off-by: Ian Rogers <irogers@google.com>
>
> > > Thanks, applied to perf-tools-next.
>
> > Trying to fix this now:
> >
> >   6    20.59 alpine:3.18                   : FAIL gcc version 12.2.1 20220924 (Alpine 12.2.1_git20220924-r10)
> >     In file included from util/zstd.c:5:
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
>
> So the problem was really the one above; it got fixed with the patch
> below, which matches what 'man size_t' documents on my fedora:38 system.


Thanks, perhaps this is something clang-tidy, clang-format or similar
could help with in the future. There was even an IWYU discussion at LPC:
https://lpc.events/event/17/contributions/1620/attachments/1228/2520/Linux%20Kernel%20Header%20Optimization.pdf

Thanks,
Ian

> - Arnaldo
>
> diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h
> index 9eb6eb5bf038ce54..b29109cd36095c4f 100644
> --- a/tools/perf/util/compress.h
> +++ b/tools/perf/util/compress.h
> @@ -3,7 +3,8 @@
>  #define PERF_COMPRESS_H
>
>  #include <stdbool.h>
> -#include <stdlib.h>
> +#include <stddef.h>
> +#include <sys/types.h>
>  #ifdef HAVE_ZSTD_SUPPORT
>  #include <zstd.h>
>  #endif
>
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           | ^~~~~~~
> >           | size_t
> >     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
> >       CC      /tmp/build/perf/util/zstd.o
> >       CC      /tmp/build/perf/util/cap.o
> >       CXX     /tmp/build/perf/util/demangle-cxx.o
> >       CC      /tmp/build/perf/util/demangle-ocaml.o
> >     In file included from util/zstd.c:5:
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           | ^~~~~~~
> >           | size_t
> >       CC      /tmp/build/perf/util/demangle-java.o
> >     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >    7    21.14 alpine:edge                   : FAIL gcc version 13.1.1 20230722 (Alpine 13.1.1_git20230722)
> >     In file included from util/zstd.c:5:
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           | ^~~~~~~
> >           | size_t
> >     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     make[3]: *** [/git/perf-6.6.0-rc1/tools/build/Makefile.build:158: util] Error 2
> >       CC      /tmp/build/perf/util/cap.o
> >       CXX     /tmp/build/perf/util/demangle-cxx.o
> >       CC      /tmp/build/perf/util/demangle-ocaml.o
> >       CC      /tmp/build/perf/util/demangle-java.o
> >     In file included from util/zstd.c:5:
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           | ^~~~~~~
> >           | size_t
> >     util/zstd.c:31:9: error: conflicting types for 'zstd_compress_stream_to_records'; have 'ssize_t(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'long int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        31 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:9: note: previous declaration of 'zstd_compress_stream_to_records' with type 'int(struct zstd_data *, void *, size_t,  void *, size_t,  size_t,  size_t (*)(void *, size_t))' {aka 'int(struct zstd_data *, void *, long unsigned int,  void *, long unsigned int,  long unsigned int,  long unsigned int (*)(void *, long unsigned int))'}
> >        34 | ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
> >           |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
>
> --

* Re: [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams
  2023-11-28 17:55         ` Ian Rogers
@ 2023-11-28 20:29           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 83+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-11-28 20:29 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Nick Terrell, Kan Liang,
	Andi Kleen, Kajol Jain, Athira Rajeev, Huacai Chen,
	Masami Hiramatsu, Vincent Whitchurch, Steinar H. Gunderson,
	Liam Howlett, Miguel Ojeda, Colin Ian King, Dmitrii Dolgov,
	Yang Jihong, Ming Wang, James Clark, K Prateek Nayak,
	Sean Christopherson, Leo Yan, Ravi Bangoria, German Gomez,
	Changbin Du, Paolo Bonzini, Li Dong, Sandipan Das, liuwenyu,
	linux-kernel, linux-perf-users

Em Tue, Nov 28, 2023 at 09:55:22AM -0800, Ian Rogers escreveu:
> On Tue, Nov 28, 2023 at 9:38 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > Em Tue, Nov 28, 2023 at 02:14:33PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Trying to fix this now:
> > >
> > >   6    20.59 alpine:3.18                   : FAIL gcc version 12.2.1 20220924 (Alpine 12.2.1_git20220924-r10)
> > >     In file included from util/zstd.c:5:
> > >     /git/perf-6.6.0-rc1/tools/perf/util/compress.h:34:1: error: unknown type name 'ssize_t'; did you mean 'size_t'?

> > So the problem was really the one above; it got fixed with the patch
> > below, which matches what 'man size_t' documents on my fedora:38 system.
 
> Thanks, perhaps this is something clang-tidy, clang-format or similar
> could help with in the future. There was even an IWYU discussion at LPC:
> https://lpc.events/event/17/contributions/1620/attachments/1228/2520/Linux%20Kernel%20Header%20Optimization.pdf

Yeah, that is interesting; I took a quick look and it looks promising.

I've done this manually in various areas of the kernel and in tools/perf
from time to time, to speed up the build process, etc.

- Arnaldo

Thread overview: 83+ messages
2023-11-02 17:56 [PATCH v4 00/53] Improvements to memory use Ian Rogers
2023-11-02 17:56 ` [PATCH v4 01/53] perf comm: Use regular mutex Ian Rogers
2023-11-05 17:31   ` Namhyung Kim
2023-11-05 21:35     ` Ian Rogers
2023-11-06  3:58       ` Namhyung Kim
2023-11-27 18:59         ` Ian Rogers
2023-11-27 21:53     ` Arnaldo Carvalho de Melo
2023-11-28  0:48       ` Arnaldo Carvalho de Melo
2023-11-02 17:56 ` [PATCH v4 02/53] perf record: Lazy load kernel symbols Ian Rogers
2023-11-05 17:34   ` Namhyung Kim
2023-11-06 11:00   ` Adrian Hunter
2023-11-08 16:01     ` Arnaldo Carvalho de Melo
2023-11-02 17:56 ` [PATCH v4 03/53] libperf: Lazily allocate mmap event copy Ian Rogers
2023-11-03  8:32   ` Guilherme Amadio
2023-11-03 15:48     ` Ian Rogers
2023-11-05 18:12       ` Namhyung Kim
2023-11-27 19:28         ` Ian Rogers
2023-11-02 17:56 ` [PATCH v4 04/53] perf mmap: Lazily initialize zstd streams Ian Rogers
2023-11-27 22:00   ` Arnaldo Carvalho de Melo
2023-11-28 17:14     ` Arnaldo Carvalho de Melo
2023-11-28 17:38       ` Arnaldo Carvalho de Melo
2023-11-28 17:55         ` Ian Rogers
2023-11-28 20:29           ` Arnaldo Carvalho de Melo
2023-11-02 17:56 ` [PATCH v4 05/53] perf machine thread: Remove exited threads by default Ian Rogers
2023-11-06 11:28   ` Adrian Hunter
2023-11-08 16:04     ` Arnaldo Carvalho de Melo
2023-11-02 17:56 ` [PATCH v4 06/53] tools api fs: Switch filename__read_str to use io.h Ian Rogers
2023-11-06  3:53   ` Namhyung Kim
2023-11-27 20:26     ` Ian Rogers
2023-11-02 17:56 ` [PATCH v4 07/53] tools api fs: Avoid reading whole file for a 1 byte bool Ian Rogers
2023-11-06  3:55   ` Namhyung Kim
2023-11-27 20:41     ` Ian Rogers
2023-11-02 17:56 ` [PATCH v4 08/53] tools lib api: Add io_dir an allocation free readdir alternative Ian Rogers
2023-11-02 17:56 ` [PATCH v4 09/53] perf maps: Switch modules tree walk to io_dir__readdir Ian Rogers
2023-11-02 17:56 ` [PATCH v4 10/53] perf record: Be lazier in allocating lost samples buffer Ian Rogers
2023-11-27 22:03   ` Arnaldo Carvalho de Melo
2023-11-27 22:23     ` Ian Rogers
2023-11-02 17:56 ` [PATCH v4 11/53] perf pmu: Switch to io_dir__readdir Ian Rogers
2023-11-02 17:56 ` [PATCH v4 12/53] perf bpf: Don't synthesize BPF events when disabled Ian Rogers
2023-11-08 16:14   ` Arnaldo Carvalho de Melo
2023-11-08 23:03     ` Song Liu
2023-11-09 16:10       ` Arnaldo Carvalho de Melo
2023-11-02 17:56 ` [PATCH v4 13/53] perf header: Switch mem topology to io_dir__readdir Ian Rogers
2023-11-02 17:56 ` [PATCH v4 14/53] perf events: Remove scandir in thread synthesis Ian Rogers
2023-11-02 17:56 ` [PATCH v4 15/53] perf map: Simplify map_ip/unmap_ip and make map size smaller Ian Rogers
2023-11-02 17:56 ` [PATCH v4 16/53] perf maps: Move symbol maps functions to maps.c Ian Rogers
2023-11-02 17:56 ` [PATCH v4 17/53] perf thread: Add missing RC_CHK_EQUAL Ian Rogers
2023-11-02 17:57 ` [PATCH v4 18/53] perf maps: Add maps__for_each_map to call a function on each entry Ian Rogers
2023-11-02 17:57 ` [PATCH v4 19/53] perf maps: Add remove maps function to remove a map based on callback Ian Rogers
2023-11-02 17:57 ` [PATCH v4 20/53] perf debug: Expose debug file Ian Rogers
2023-11-02 17:57 ` [PATCH v4 21/53] perf maps: Refactor maps__fixup_overlappings Ian Rogers
2023-11-02 17:57 ` [PATCH v4 22/53] perf maps: Do simple merge if given map doesn't overlap Ian Rogers
2023-11-02 17:57 ` [PATCH v4 23/53] perf maps: Rename clone to copy from Ian Rogers
2023-11-02 17:57 ` [PATCH v4 24/53] perf maps: Add maps__load_first Ian Rogers
2023-11-02 17:57 ` [PATCH v4 25/53] perf maps: Add find next entry to give entry after the given map Ian Rogers
2023-11-02 17:57 ` [PATCH v4 26/53] perf maps: Reduce scope of map_rb_node and maps internals Ian Rogers
2023-11-02 17:57 ` [PATCH v4 27/53] perf maps: Fix up overlaps during fixup_end Ian Rogers
2023-11-02 17:57 ` [PATCH v4 28/53] perf maps: Switch from rbtree to lazily sorted array for addresses Ian Rogers
2023-11-02 17:57 ` [PATCH v4 29/53] perf maps: Get map before returning in maps__find Ian Rogers
2023-11-02 17:57 ` [PATCH v4 30/53] perf maps: Get map before returning in maps__find_by_name Ian Rogers
2023-11-02 17:57 ` [PATCH v4 31/53] perf maps: Get map before returning in maps__find_next_entry Ian Rogers
2023-11-02 17:57 ` [PATCH v4 32/53] perf maps: Hide maps internals Ian Rogers
2023-11-02 17:57 ` [PATCH v4 33/53] perf maps: Locking tidy up of nr_maps Ian Rogers
2023-11-02 17:57 ` [PATCH v4 34/53] perf dso: Reorder variables to save space in struct dso Ian Rogers
2023-11-02 17:57 ` [PATCH v4 35/53] perf report: Sort child tasks by tid Ian Rogers
2023-11-02 17:57 ` [PATCH v4 36/53] perf trace: Ignore thread hashing in summary Ian Rogers
2023-11-02 17:57 ` [PATCH v4 37/53] perf machine: Move fprintf to for_each loop and a callback Ian Rogers
2023-11-02 17:57 ` [PATCH v4 38/53] perf threads: Move threads to its own files Ian Rogers
2023-11-02 17:57 ` [PATCH v4 39/53] perf threads: Switch from rbtree to hashmap Ian Rogers
2023-11-02 17:57 ` [PATCH v4 40/53] perf threads: Reduce table size from 256 to 8 Ian Rogers
2023-11-02 17:57 ` [PATCH v4 41/53] perf dsos: Attempt to better abstract dsos internals Ian Rogers
2023-11-02 17:57 ` [PATCH v4 42/53] perf dsos: Tidy reference counting and locking Ian Rogers
2023-11-02 17:57 ` [PATCH v4 43/53] perf dsos: Add dsos__for_each_dso Ian Rogers
2023-11-02 17:57 ` [PATCH v4 44/53] perf dso: Move dso functions out of dsos Ian Rogers
2023-11-02 17:57 ` [PATCH v4 45/53] perf dsos: Switch more loops to dsos__for_each_dso Ian Rogers
2023-11-02 17:57 ` [PATCH v4 46/53] perf dsos: Switch backing storage to array from rbtree/list Ian Rogers
2023-11-02 17:57 ` [PATCH v4 47/53] perf dsos: Remove __dsos__addnew Ian Rogers
2023-11-02 17:57 ` [PATCH v4 48/53] perf dsos: Remove __dsos__findnew_link_by_longname_id Ian Rogers
2023-11-02 17:57 ` [PATCH v4 49/53] perf dsos: Switch hand code to bsearch Ian Rogers
2023-11-02 17:57 ` [PATCH v4 50/53] perf dso: Add reference count checking and accessor functions Ian Rogers
2023-11-02 17:57 ` [PATCH v4 51/53] perf dso: Reference counting related fixes Ian Rogers
2023-11-02 17:57 ` [PATCH v4 52/53] perf dso: Use container_of to avoid a pointer in dso_data Ian Rogers
2023-11-02 17:57 ` [PATCH v4 53/53] perf env: Avoid recursively taking env->bpf_progs.lock Ian Rogers
