LKML Archive on lore.kernel.org
* [RFCv2 00/48] perf tools: Add threads to record command
@ 2018-09-13 12:54 Jiri Olsa
  2018-09-13 12:54 ` [PATCH 01/48] perf tools: Remove perf_tool from event_op2 Jiri Olsa
                   ` (49 more replies)
  0 siblings, 50 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

hi,
sending an *RFC* for threads support in the perf record command.

In the big picture, this patchset adds a perf record --threads
option that allows creating threads in the following modes:

1) single thread mode (current)

  $ perf record ...
  $ perf record --threads=1 ...

  - all maps are read and stored by the main process thread

2) mode with specific (X) number of threads

  $ perf record --threads=X ...

  - maps are spread equally among the threads

3) mode that creates a thread for every monitored memory map

  $ perf record --threads ...

  - which in perf record is equal to the number of CPUs, and
    each thread is pinned to its map's CPU

4) TODO - NUMA-aware threads/maps separation
   ...

The perf.data output remains a single file.

v2 changes:
  - rebased onto Arnaldo's current perf/core
    (also based on a few fixes from my perf/core; see the branch details below)

This patchset contains a lot of preparatory changes to make
threaded record possible:

  - Namhyung's changes to create multiple data streams in
    the perf data file, which allow each thread's data to
    be stored in a separate file and merged into a single
    perf.data afterwards

  - Namhyung's changes to create track mmaps for auxiliary
    events

  - Namhyung's changes to search for threads/mmaps/comms
    by time. This is needed because we now have multiple
    data streams that are processed separately, but they
    all need access to the complete auxiliary event data
    (threads/mmaps/comms). That's also the reason why the
    auxiliary events are stored in a separate data stream,
    which is processed before the real data.

  - the rest of the code adds a threads abstraction to the
    record command, which allows creating the threads and
    distributing maps among them

  - other preparatory changes

Threaded monitoring currently can't monitor backward maps, and
there are probably more limitations that I haven't spotted
yet.

So far I have tested on a laptop:
  http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt

and on one bigger server:
  http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt

I can see a decrease in recorded LOST events, but both the benchmark
and the monitoring must be carefully configured with respect to:
  - number of events (frequency)
  - size of the memory maps
  - size of events (callchains)
  - final perf.data size

It's also available in:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/record_threads

thoughts? ;-) thanks
jirka


---
Jiri Olsa (30):
      perf tools: Remove perf_tool from event_op2
      perf tools: Remove perf_tool from event_op3
      perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions
      perf tools: Add struct perf_mmap arg into record__write
      perf tools: Create separate mmap for dummy tracking event
      perf tools: Make copyfile_offset global
      perf tools: Add perf_data__create_index function
      perf record: Add --index option for building index table
      perf tools: Convert dead thread list into rbtree
      perf tools: Add thread::exited flag
      perf callchain: Maintain libunwind's address space in map_groups
      perf tools: Rename perf_evlist__munmap_filtered to perf_mmap__put_filtered
      tools lib fd array: Introduce fdarray__add_clone function
      tools lib subcmd: Add OPT_INTEGER_OPTARG|_SET options
      perf tools: Move __perf_session__process_events args into struct
      perf ui progress: Fix index progress display
      perf tools: Add threads debug variable
      perf tools: Add perf_mmap__read_tail function
      perf record: Introduce struct record_thread
      perf record: Read record thread's mmaps
      perf record: Move waking into struct record
      perf record: Move samples into struct record_thread
      perf record: Move bytes_written into struct record_thread
      perf record: Add record_thread start/stop/process functions
      perf record: Wait for all threads being started
      perf record: Add --threads option
      perf record: Add --thread-stats option support
      perf record: Add maps to --thread-stats output
      perf record: Spread maps for --threads option
      perf record: Spread maps for --threads=X option

Namhyung Kim (18):
      perf tools: Use a software dummy event to track task/mmap events
      perf tools: Extend perf_evlist__mmap_ex() to use track mmap
      perf report: Skip dummy tracking event
      perf tools: Add HEADER_DATA_INDEX feature
      perf tools: Handle indexed data file properly
      perf tools: Introduce thread__comm(_str)_by_time() helpers
      perf tools: Add a test case for thread comm handling
      perf tools: Use thread__comm_by_time() when adding hist entries
      perf tools: Introduce machine__find*_thread_by_time()
      perf tools: Add a test case for timed thread handling
      perf tools: Maintain map groups list in a leader thread
      perf tools: Introduce thread__find_symbol_by_time() and friends
      perf callchain: Use thread__find_addr_location_by_time() and friends
      perf tools: Add a test case for timed map groups handling
      perf tools: Save timestamp of a map creation
      perf tools: Introduce map_groups__{insert,find}_by_time()
      perf tools: Use map_groups__find_addr_by_time()
      perf tools: Add testcase for managing maps with time

 tools/lib/api/fd/array.c                 |  17 +
 tools/lib/api/fd/array.h                 |   1 +
 tools/lib/subcmd/parse-options.c         |   2 +
 tools/lib/subcmd/parse-options.h         |   9 +
 tools/perf/Documentation/perf-record.txt |   4 +
 tools/perf/Documentation/perf.txt        |   1 +
 tools/perf/builtin-annotate.c            |   7 +-
 tools/perf/builtin-inject.c              |  32 +-
 tools/perf/builtin-record.c              | 899 +++++++++++++++++++++++++++++--
 tools/perf/builtin-report.c              |  12 +-
 tools/perf/builtin-script.c              |  38 +-
 tools/perf/builtin-stat.c                |  23 +-
 tools/perf/perf.c                        |   1 +
 tools/perf/perf.h                        |   3 +
 tools/perf/tests/Build                   |   4 +
 tools/perf/tests/builtin-test.c          |  16 +
 tools/perf/tests/dwarf-unwind.c          |   4 +-
 tools/perf/tests/hists_common.c          |   2 +-
 tools/perf/tests/hists_link.c            |   2 +-
 tools/perf/tests/tests.h                 |   4 +
 tools/perf/tests/thread-comm.c           |  48 ++
 tools/perf/tests/thread-lookup-time.c    | 181 +++++++
 tools/perf/tests/thread-map-time.c       |  90 ++++
 tools/perf/tests/thread-mg-share.c       |   7 +-
 tools/perf/tests/thread-mg-time.c        |  94 ++++
 tools/perf/ui/browsers/hists.c           |  30 +-
 tools/perf/ui/gtk/hists.c                |   3 +
 tools/perf/util/auxtrace.c               |  30 +-
 tools/perf/util/auxtrace.h               |  21 +-
 tools/perf/util/data.c                   |  64 +++
 tools/perf/util/data.h                   |   5 +
 tools/perf/util/debug.c                  |   2 +
 tools/perf/util/debug.h                  |   1 +
 tools/perf/util/dso.c                    |   2 +-
 tools/perf/util/event.c                  | 135 ++++-
 tools/perf/util/evlist.c                 |  96 +++-
 tools/perf/util/evlist.h                 |   7 +-
 tools/perf/util/evsel.h                  |  15 +
 tools/perf/util/header.c                 |  93 +++-
 tools/perf/util/header.h                 |  18 +-
 tools/perf/util/hist.c                   |   4 +-
 tools/perf/util/intel-pt.c               |   2 +-
 tools/perf/util/machine.c                | 293 ++++++++--
 tools/perf/util/machine.h                |  22 +-
 tools/perf/util/map.c                    |  79 ++-
 tools/perf/util/map.h                    |  40 +-
 tools/perf/util/mmap.c                   |   6 +-
 tools/perf/util/mmap.h                   |  33 +-
 tools/perf/util/session.c                | 178 +++---
 tools/perf/util/session.h                |   5 +-
 tools/perf/util/stat.c                   |   5 +-
 tools/perf/util/stat.h                   |   5 +-
 tools/perf/util/symbol-elf.c             |   2 +-
 tools/perf/util/symbol.c                 |   4 +-
 tools/perf/util/thread.c                 | 200 ++++++-
 tools/perf/util/thread.h                 |  27 +-
 tools/perf/util/tool.h                   |   7 +-
 tools/perf/util/unwind-libdw.c           |   6 +-
 tools/perf/util/unwind-libunwind-local.c |  39 +-
 tools/perf/util/unwind-libunwind.c       |   9 +-
 tools/perf/util/unwind.h                 |   7 +-
 tools/perf/util/util.c                   |   2 +-
 tools/perf/util/util.h                   |   2 +
 63 files changed, 2608 insertions(+), 392 deletions(-)
 create mode 100644 tools/perf/tests/thread-comm.c
 create mode 100644 tools/perf/tests/thread-lookup-time.c
 create mode 100644 tools/perf/tests/thread-map-time.c
 create mode 100644 tools/perf/tests/thread-mg-time.c


* [PATCH 01/48] perf tools: Remove perf_tool from event_op2
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-25  9:31   ` [tip:perf/core] " tip-bot for Jiri Olsa
  2018-09-13 12:54 ` [PATCH 02/48] perf tools: Remove perf_tool from event_op3 Jiri Olsa
                   ` (48 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Now that we keep a perf_tool pointer inside perf_session,
there's no need to pass a perf_tool argument to the
event_op2 callback. Remove it.

Link: http://lkml.kernel.org/n/tip-dc99gim2w2919gdnzrl3tegh@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-annotate.c |  7 ++--
 tools/perf/builtin-inject.c   | 26 ++++++--------
 tools/perf/builtin-report.c   |  9 +++--
 tools/perf/builtin-script.c   | 38 ++++++++++----------
 tools/perf/builtin-stat.c     | 23 ++++++------
 tools/perf/util/auxtrace.c    | 10 +++---
 tools/perf/util/auxtrace.h    | 10 +++---
 tools/perf/util/header.c      | 16 ++++-----
 tools/perf/util/header.h      | 15 ++++----
 tools/perf/util/session.c     | 67 +++++++++++++++--------------------
 tools/perf/util/session.h     |  5 ++-
 tools/perf/util/stat.c        |  5 ++-
 tools/perf/util/stat.h        |  5 ++-
 tools/perf/util/tool.h        |  3 +-
 14 files changed, 103 insertions(+), 136 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 830481b8db26..93d679eaf1f4 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -283,12 +283,11 @@ static int process_sample_event(struct perf_tool *tool,
 	return ret;
 }
 
-static int process_feature_event(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session)
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
 {
 	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(tool, event, session);
+		return perf_event__process_feature(session, event);
 	return 0;
 }
 
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index a3b346359ba0..d77ed2aea95a 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -86,12 +86,10 @@ static int perf_event__drop_oe(struct perf_tool *tool __maybe_unused,
 }
 #endif
 
-static int perf_event__repipe_op2_synth(struct perf_tool *tool,
-					union perf_event *event,
-					struct perf_session *session
-					__maybe_unused)
+static int perf_event__repipe_op2_synth(struct perf_session *session,
+					union perf_event *event)
 {
-	return perf_event__repipe_synth(tool, event);
+	return perf_event__repipe_synth(session->tool, event);
 }
 
 static int perf_event__repipe_attr(struct perf_tool *tool,
@@ -362,26 +360,24 @@ static int perf_event__repipe_exit(struct perf_tool *tool,
 	return err;
 }
 
-static int perf_event__repipe_tracing_data(struct perf_tool *tool,
-					   union perf_event *event,
-					   struct perf_session *session)
+static int perf_event__repipe_tracing_data(struct perf_session *session,
+					   union perf_event *event)
 {
 	int err;
 
-	perf_event__repipe_synth(tool, event);
-	err = perf_event__process_tracing_data(tool, event, session);
+	perf_event__repipe_synth(session->tool, event);
+	err = perf_event__process_tracing_data(session, event);
 
 	return err;
 }
 
-static int perf_event__repipe_id_index(struct perf_tool *tool,
-				       union perf_event *event,
-				       struct perf_session *session)
+static int perf_event__repipe_id_index(struct perf_session *session,
+				       union perf_event *event)
 {
 	int err;
 
-	perf_event__repipe_synth(tool, event);
-	err = perf_event__process_id_index(tool, event, session);
+	perf_event__repipe_synth(session->tool, event);
+	err = perf_event__process_id_index(session, event);
 
 	return err;
 }
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 76e12bcd1765..7507e4d6dce1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -201,14 +201,13 @@ static void setup_forced_leader(struct report *report,
 		perf_evlist__force_leader(evlist);
 }
 
-static int process_feature_event(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session __maybe_unused)
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
 {
-	struct report *rep = container_of(tool, struct report, tool);
+	struct report *rep = container_of(session->tool, struct report, tool);
 
 	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(tool, event, session);
+		return perf_event__process_feature(session, event);
 
 	if (event->feat.feat_id != HEADER_LAST_FEATURE) {
 		pr_err("failed: wrong feature ID: %" PRIu64 "\n",
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6176bae177c2..765391b6c88c 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2965,9 +2965,8 @@ static void script__setup_sample_type(struct perf_script *script)
 	}
 }
 
-static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
-				    union perf_event *event,
-				    struct perf_session *session)
+static int process_stat_round_event(struct perf_session *session,
+				    union perf_event *event)
 {
 	struct stat_round_event *round = &event->stat_round;
 	struct perf_evsel *counter;
@@ -2981,9 +2980,8 @@ static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int process_stat_config_event(struct perf_tool *tool __maybe_unused,
-				     union perf_event *event,
-				     struct perf_session *session __maybe_unused)
+static int process_stat_config_event(struct perf_session *session __maybe_unused,
+				     union perf_event *event)
 {
 	perf_event__read_stat_config(&stat_config, &event->stat_config);
 	return 0;
@@ -3009,10 +3007,10 @@ static int set_maps(struct perf_script *script)
 }
 
 static
-int process_thread_map_event(struct perf_tool *tool,
-			     union perf_event *event,
-			     struct perf_session *session __maybe_unused)
+int process_thread_map_event(struct perf_session *session,
+			     union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 
 	if (script->threads) {
@@ -3028,10 +3026,10 @@ int process_thread_map_event(struct perf_tool *tool,
 }
 
 static
-int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
-			  union perf_event *event,
-			  struct perf_session *session __maybe_unused)
+int process_cpu_map_event(struct perf_session *session,
+			  union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 
 	if (script->cpus) {
@@ -3046,21 +3044,21 @@ int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
 	return set_maps(script);
 }
 
-static int process_feature_event(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session)
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
 {
 	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(tool, event, session);
+		return perf_event__process_feature(session, event);
 	return 0;
 }
 
 #ifdef HAVE_AUXTRACE_SUPPORT
-static int perf_script__process_auxtrace_info(struct perf_tool *tool,
-					      union perf_event *event,
-					      struct perf_session *session)
+static int perf_script__process_auxtrace_info(struct perf_session *session,
+					      union perf_event *event)
 {
-	int ret = perf_event__process_auxtrace_info(tool, event, session);
+	struct perf_tool *tool = session->tool;
+
+	int ret = perf_event__process_auxtrace_info(session, event);
 
 	if (ret == 0) {
 		struct perf_script *script = container_of(tool, struct perf_script, tool);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 0b0e3961d511..b86aba1c8028 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1354,9 +1354,8 @@ static int __cmd_record(int argc, const char **argv)
 	return argc;
 }
 
-static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
-				    union perf_event *event,
-				    struct perf_session *session)
+static int process_stat_round_event(struct perf_session *session,
+				    union perf_event *event)
 {
 	struct stat_round_event *stat_round = &event->stat_round;
 	struct perf_evsel *counter;
@@ -1381,10 +1380,10 @@ static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
 }
 
 static
-int process_stat_config_event(struct perf_tool *tool,
-			      union perf_event *event,
-			      struct perf_session *session __maybe_unused)
+int process_stat_config_event(struct perf_session *session,
+			      union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_stat *st = container_of(tool, struct perf_stat, tool);
 
 	perf_event__read_stat_config(&stat_config, &event->stat_config);
@@ -1424,10 +1423,10 @@ static int set_maps(struct perf_stat *st)
 }
 
 static
-int process_thread_map_event(struct perf_tool *tool,
-			     union perf_event *event,
-			     struct perf_session *session __maybe_unused)
+int process_thread_map_event(struct perf_session *session,
+			     union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_stat *st = container_of(tool, struct perf_stat, tool);
 
 	if (st->threads) {
@@ -1443,10 +1442,10 @@ int process_thread_map_event(struct perf_tool *tool,
 }
 
 static
-int process_cpu_map_event(struct perf_tool *tool,
-			  union perf_event *event,
-			  struct perf_session *session __maybe_unused)
+int process_cpu_map_event(struct perf_session *session,
+			  union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_stat *st = container_of(tool, struct perf_stat, tool);
 	struct cpu_map *cpus;
 
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index db1511359c5e..86f0bc445f93 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -906,9 +906,8 @@ int perf_event__synthesize_auxtrace_info(struct auxtrace_record *itr,
 	return err;
 }
 
-int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
-				      union perf_event *event,
-				      struct perf_session *session)
+int perf_event__process_auxtrace_info(struct perf_session *session,
+				      union perf_event *event)
 {
 	enum auxtrace_type type = event->auxtrace_info.type;
 
@@ -1185,9 +1184,8 @@ void events_stats__auxtrace_error_warn(const struct events_stats *stats)
 	}
 }
 
-int perf_event__process_auxtrace_error(struct perf_tool *tool __maybe_unused,
-				       union perf_event *event,
-				       struct perf_session *session)
+int perf_event__process_auxtrace_error(struct perf_session *session,
+				       union perf_event *event)
 {
 	if (auxtrace__dont_decode(session))
 		return 0;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 71fc3bd74299..97776470a52e 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -517,15 +517,13 @@ int perf_event__synthesize_auxtrace_info(struct auxtrace_record *itr,
 					 struct perf_tool *tool,
 					 struct perf_session *session,
 					 perf_event__handler_t process);
-int perf_event__process_auxtrace_info(struct perf_tool *tool,
-				      union perf_event *event,
-				      struct perf_session *session);
+int perf_event__process_auxtrace_info(struct perf_session *session,
+				      union perf_event *event);
 s64 perf_event__process_auxtrace(struct perf_tool *tool,
 				 union perf_event *event,
 				 struct perf_session *session);
-int perf_event__process_auxtrace_error(struct perf_tool *tool,
-				       union perf_event *event,
-				       struct perf_session *session);
+int perf_event__process_auxtrace_error(struct perf_session *session,
+				       union perf_event *event);
 int itrace_parse_synth_opts(const struct option *opt, const char *str,
 			    int unset);
 void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 91e6d9cfd906..c78051ad1fcc 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3448,10 +3448,10 @@ int perf_event__synthesize_features(struct perf_tool *tool,
 	return ret;
 }
 
-int perf_event__process_feature(struct perf_tool *tool,
-				union perf_event *event,
-				struct perf_session *session __maybe_unused)
+int perf_event__process_feature(struct perf_session *session,
+				union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct feat_fd ff = { .fd = 0 };
 	struct feature_event *fe = (struct feature_event *)event;
 	int type = fe->header.type;
@@ -3856,9 +3856,8 @@ int perf_event__synthesize_tracing_data(struct perf_tool *tool, int fd,
 	return aligned_size;
 }
 
-int perf_event__process_tracing_data(struct perf_tool *tool __maybe_unused,
-				     union perf_event *event,
-				     struct perf_session *session)
+int perf_event__process_tracing_data(struct perf_session *session,
+				     union perf_event *event)
 {
 	ssize_t size_read, padding, size = event->tracing_data.size;
 	int fd = perf_data__fd(session->data);
@@ -3924,9 +3923,8 @@ int perf_event__synthesize_build_id(struct perf_tool *tool,
 	return err;
 }
 
-int perf_event__process_build_id(struct perf_tool *tool __maybe_unused,
-				 union perf_event *event,
-				 struct perf_session *session)
+int perf_event__process_build_id(struct perf_session *session,
+				 union perf_event *event)
 {
 	__event_process_build_id(&event->build_id,
 				 event->build_id.filename,
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index ff2a1263fb9b..e17903caa71d 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -116,9 +116,8 @@ int perf_event__synthesize_extra_attr(struct perf_tool *tool,
 				      perf_event__handler_t process,
 				      bool is_pipe);
 
-int perf_event__process_feature(struct perf_tool *tool,
-				union perf_event *event,
-				struct perf_session *session);
+int perf_event__process_feature(struct perf_session *session,
+				union perf_event *event);
 
 int perf_event__synthesize_attr(struct perf_tool *tool,
 				struct perf_event_attr *attr, u32 ids, u64 *id,
@@ -148,17 +147,15 @@ size_t perf_event__fprintf_event_update(union perf_event *event, FILE *fp);
 int perf_event__synthesize_tracing_data(struct perf_tool *tool,
 					int fd, struct perf_evlist *evlist,
 					perf_event__handler_t process);
-int perf_event__process_tracing_data(struct perf_tool *tool,
-				     union perf_event *event,
-				     struct perf_session *session);
+int perf_event__process_tracing_data(struct perf_session *session,
+				     union perf_event *event);
 
 int perf_event__synthesize_build_id(struct perf_tool *tool,
 				    struct dso *pos, u16 misc,
 				    perf_event__handler_t process,
 				    struct machine *machine);
-int perf_event__process_build_id(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session);
+int perf_event__process_build_id(struct perf_session *session,
+				 union perf_event *event);
 bool is_perf_magic(u64 magic);
 
 #define NAME_ALIGN 64
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 8b9369303561..e781cdba845c 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -199,12 +199,10 @@ void perf_session__delete(struct perf_session *session)
 	free(session);
 }
 
-static int process_event_synth_tracing_data_stub(struct perf_tool *tool
+static int process_event_synth_tracing_data_stub(struct perf_session *session
 						 __maybe_unused,
 						 union perf_event *event
-						 __maybe_unused,
-						 struct perf_session *session
-						__maybe_unused)
+						 __maybe_unused)
 {
 	dump_printf(": unhandled!\n");
 	return 0;
@@ -288,9 +286,8 @@ static s64 process_event_auxtrace_stub(struct perf_tool *tool __maybe_unused,
 	return event->auxtrace.size;
 }
 
-static int process_event_op2_stub(struct perf_tool *tool __maybe_unused,
-				  union perf_event *event __maybe_unused,
-				  struct perf_session *session __maybe_unused)
+static int process_event_op2_stub(struct perf_session *session __maybe_unused,
+				  union perf_event *event __maybe_unused)
 {
 	dump_printf(": unhandled!\n");
 	return 0;
@@ -298,9 +295,8 @@ static int process_event_op2_stub(struct perf_tool *tool __maybe_unused,
 
 
 static
-int process_event_thread_map_stub(struct perf_tool *tool __maybe_unused,
-				  union perf_event *event __maybe_unused,
-				  struct perf_session *session __maybe_unused)
+int process_event_thread_map_stub(struct perf_session *session __maybe_unused,
+				  union perf_event *event __maybe_unused)
 {
 	if (dump_trace)
 		perf_event__fprintf_thread_map(event, stdout);
@@ -310,9 +306,8 @@ int process_event_thread_map_stub(struct perf_tool *tool __maybe_unused,
 }
 
 static
-int process_event_cpu_map_stub(struct perf_tool *tool __maybe_unused,
-			       union perf_event *event __maybe_unused,
-			       struct perf_session *session __maybe_unused)
+int process_event_cpu_map_stub(struct perf_session *session __maybe_unused,
+			       union perf_event *event __maybe_unused)
 {
 	if (dump_trace)
 		perf_event__fprintf_cpu_map(event, stdout);
@@ -322,9 +317,8 @@ int process_event_cpu_map_stub(struct perf_tool *tool __maybe_unused,
 }
 
 static
-int process_event_stat_config_stub(struct perf_tool *tool __maybe_unused,
-				   union perf_event *event __maybe_unused,
-				   struct perf_session *session __maybe_unused)
+int process_event_stat_config_stub(struct perf_session *session __maybe_unused,
+				   union perf_event *event __maybe_unused)
 {
 	if (dump_trace)
 		perf_event__fprintf_stat_config(event, stdout);
@@ -333,10 +327,8 @@ int process_event_stat_config_stub(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int process_stat_stub(struct perf_tool *tool __maybe_unused,
-			     union perf_event *event __maybe_unused,
-			     struct perf_session *perf_session
-			     __maybe_unused)
+static int process_stat_stub(struct perf_session *perf_session __maybe_unused,
+			     union perf_event *event)
 {
 	if (dump_trace)
 		perf_event__fprintf_stat(event, stdout);
@@ -345,10 +337,8 @@ static int process_stat_stub(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int process_stat_round_stub(struct perf_tool *tool __maybe_unused,
-				   union perf_event *event __maybe_unused,
-				   struct perf_session *perf_session
-				   __maybe_unused)
+static int process_stat_round_stub(struct perf_session *perf_session __maybe_unused,
+				   union perf_event *event)
 {
 	if (dump_trace)
 		perf_event__fprintf_stat_round(event, stdout);
@@ -1374,37 +1364,37 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	case PERF_RECORD_HEADER_TRACING_DATA:
 		/* setup for reading amidst mmap */
 		lseek(fd, file_offset, SEEK_SET);
-		return tool->tracing_data(tool, event, session);
+		return tool->tracing_data(session, event);
 	case PERF_RECORD_HEADER_BUILD_ID:
-		return tool->build_id(tool, event, session);
+		return tool->build_id(session, event);
 	case PERF_RECORD_FINISHED_ROUND:
 		return tool->finished_round(tool, event, oe);
 	case PERF_RECORD_ID_INDEX:
-		return tool->id_index(tool, event, session);
+		return tool->id_index(session, event);
 	case PERF_RECORD_AUXTRACE_INFO:
-		return tool->auxtrace_info(tool, event, session);
+		return tool->auxtrace_info(session, event);
 	case PERF_RECORD_AUXTRACE:
 		/* setup for reading amidst mmap */
 		lseek(fd, file_offset + event->header.size, SEEK_SET);
 		return tool->auxtrace(tool, event, session);
 	case PERF_RECORD_AUXTRACE_ERROR:
 		perf_session__auxtrace_error_inc(session, event);
-		return tool->auxtrace_error(tool, event, session);
+		return tool->auxtrace_error(session, event);
 	case PERF_RECORD_THREAD_MAP:
-		return tool->thread_map(tool, event, session);
+		return tool->thread_map(session, event);
 	case PERF_RECORD_CPU_MAP:
-		return tool->cpu_map(tool, event, session);
+		return tool->cpu_map(session, event);
 	case PERF_RECORD_STAT_CONFIG:
-		return tool->stat_config(tool, event, session);
+		return tool->stat_config(session, event);
 	case PERF_RECORD_STAT:
-		return tool->stat(tool, event, session);
+		return tool->stat(session, event);
 	case PERF_RECORD_STAT_ROUND:
-		return tool->stat_round(tool, event, session);
+		return tool->stat_round(session, event);
 	case PERF_RECORD_TIME_CONV:
 		session->time_conv = event->time_conv;
-		return tool->time_conv(tool, event, session);
+		return tool->time_conv(session, event);
 	case PERF_RECORD_HEADER_FEATURE:
-		return tool->feature(tool, event, session);
+		return tool->feature(session, event);
 	default:
 		return -EINVAL;
 	}
@@ -2133,9 +2123,8 @@ int __perf_session__set_tracepoints_handlers(struct perf_session *session,
 	return err;
 }
 
-int perf_event__process_id_index(struct perf_tool *tool __maybe_unused,
-				 union perf_event *event,
-				 struct perf_session *session)
+int perf_event__process_id_index(struct perf_session *session,
+				 union perf_event *event)
 {
 	struct perf_evlist *evlist = session->evlist;
 	struct id_index_event *ie = &event->id_index;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index da40b4b380ca..d96eccd7d27f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -120,9 +120,8 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 				      union perf_event *event,
 				      struct perf_sample *sample);
 
-int perf_event__process_id_index(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session);
+int perf_event__process_id_index(struct perf_session *session,
+				 union perf_event *event);
 
 int perf_event__synthesize_id_index(struct perf_tool *tool,
 				    perf_event__handler_t process,
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 5d3172bcc4ae..4d40515307b8 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -374,9 +374,8 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	return 0;
 }
 
-int perf_event__process_stat_event(struct perf_tool *tool __maybe_unused,
-				   union perf_event *event,
-				   struct perf_session *session)
+int perf_event__process_stat_event(struct perf_session *session,
+				   union perf_event *event)
 {
 	struct perf_counts_values count;
 	struct stat_event *st = &event->stat;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 3a13a6dc5a62..2f9c9159a364 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -199,9 +199,8 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 struct perf_tool;
 union perf_event;
 struct perf_session;
-int perf_event__process_stat_event(struct perf_tool *tool,
-				   union perf_event *event,
-				   struct perf_session *session);
+int perf_event__process_stat_event(struct perf_session *session,
+				   union perf_event *event);
 
 size_t perf_event__fprintf_stat(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_stat_round(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 183c91453522..9c7f78d76275 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -26,8 +26,7 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_evlist **pevlist);
 
-typedef int (*event_op2)(struct perf_tool *tool, union perf_event *event,
-			 struct perf_session *session);
+typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
 
 typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
 			struct ordered_events *oe);
-- 
2.17.1


* [PATCH 02/48] perf tools: Remove perf_tool from event_op3
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
  2018-09-13 12:54 ` [PATCH 01/48] perf tools: Remove perf_tool from event_op2 Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-18 20:56   ` Arnaldo Carvalho de Melo
  2018-09-25  9:31   ` [tip:perf/core] " tip-bot for Jiri Olsa
  2018-09-13 12:54 ` [PATCH 03/48] perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions Jiri Olsa
                   ` (47 subsequent siblings)
  49 siblings, 2 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Now that perf_session keeps a pointer to perf_tool, there's no
need to pass a perf_tool argument to the event_op3 callback.
Remove it.
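A minimal sketch of the resulting callback shape, assuming heavily simplified stand-in types (inject_sketch, repipe_auxtrace_sketch and event_op3_sketch are hypothetical names, not perf code). The callback gets only (session, event) and recovers the embedding command struct through session->tool and container_of, the same way perf_event__repipe_auxtrace() does after this change:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the real perf types. */
struct perf_tool { int flags; };
struct perf_session { struct perf_tool *tool; };
union perf_event { long long size; };

/* Simplified container_of (the tools/ version also uses typeof to check ptr). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A command embedding its perf_tool, the way struct perf_inject does. */
struct inject_sketch {
	long long bytes_written;
	struct perf_tool tool;
};

/* New-style event_op3: only (session, event); the tool comes from session. */
typedef long long (*event_op3_sketch)(struct perf_session *session,
				      union perf_event *event);

long long repipe_auxtrace_sketch(struct perf_session *session,
				 union perf_event *event)
{
	struct perf_tool *tool = session->tool;
	struct inject_sketch *inject =
		container_of(tool, struct inject_sketch, tool);

	/* account the event as if it had been written out */
	inject->bytes_written += event->size;
	return event->size;
}
```

The design point is that the tool pointer is stored once in the session instead of being threaded through every callback signature.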

Link: http://lkml.kernel.org/n/tip-78u9m0jbre3bn16l6guqfyrf@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-inject.c | 6 +++---
 tools/perf/util/auxtrace.c  | 7 +++----
 tools/perf/util/auxtrace.h  | 5 ++---
 tools/perf/util/session.c   | 8 +++-----
 tools/perf/util/tool.h      | 4 +---
 5 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index d77ed2aea95a..03fc65da0657 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -131,10 +131,10 @@ static int copy_bytes(struct perf_inject *inject, int fd, off_t size)
 	return 0;
 }
 
-static s64 perf_event__repipe_auxtrace(struct perf_tool *tool,
-				       union perf_event *event,
-				       struct perf_session *session)
+static s64 perf_event__repipe_auxtrace(struct perf_session *session,
+				       union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_inject *inject = container_of(tool, struct perf_inject,
 						  tool);
 	int ret;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 86f0bc445f93..3017b205a157 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -931,9 +931,8 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
 	}
 }
 
-s64 perf_event__process_auxtrace(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session)
+s64 perf_event__process_auxtrace(struct perf_session *session,
+				 union perf_event *event)
 {
 	s64 err;
 
@@ -949,7 +948,7 @@ s64 perf_event__process_auxtrace(struct perf_tool *tool,
 	if (!session->auxtrace || event->header.type != PERF_RECORD_AUXTRACE)
 		return -EINVAL;
 
-	err = session->auxtrace->process_auxtrace_event(session, event, tool);
+	err = session->auxtrace->process_auxtrace_event(session, event, session->tool);
 	if (err < 0)
 		return err;
 
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 97776470a52e..6be89776358c 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -519,9 +519,8 @@ int perf_event__synthesize_auxtrace_info(struct auxtrace_record *itr,
 					 perf_event__handler_t process);
 int perf_event__process_auxtrace_info(struct perf_session *session,
 				      union perf_event *event);
-s64 perf_event__process_auxtrace(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session);
+s64 perf_event__process_auxtrace(struct perf_session *session,
+				 union perf_event *event);
 int perf_event__process_auxtrace_error(struct perf_session *session,
 				       union perf_event *event);
 int itrace_parse_synth_opts(const struct option *opt, const char *str,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index e781cdba845c..7d2c8ce6cfad 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -275,10 +275,8 @@ static int skipn(int fd, off_t n)
 	return 0;
 }
 
-static s64 process_event_auxtrace_stub(struct perf_tool *tool __maybe_unused,
-				       union perf_event *event,
-				       struct perf_session *session
-				       __maybe_unused)
+static s64 process_event_auxtrace_stub(struct perf_session *session __maybe_unused,
+				       union perf_event *event)
 {
 	dump_printf(": unhandled!\n");
 	if (perf_data__is_pipe(session->data))
@@ -1376,7 +1374,7 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	case PERF_RECORD_AUXTRACE:
 		/* setup for reading amidst mmap */
 		lseek(fd, file_offset + event->header.size, SEEK_SET);
-		return tool->auxtrace(tool, event, session);
+		return tool->auxtrace(session, event);
 	case PERF_RECORD_AUXTRACE_ERROR:
 		perf_session__auxtrace_error_inc(session, event);
 		return tool->auxtrace_error(session, event);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 9c7f78d76275..56e4ca54020a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -27,13 +27,11 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 			     struct perf_evlist **pevlist);
 
 typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
+typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);
 
 typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
 			struct ordered_events *oe);
 
-typedef s64 (*event_op3)(struct perf_tool *tool, union perf_event *event,
-			 struct perf_session *session);
-
 enum show_feature_header {
 	SHOW_FEAT_NO_HEADER = 0,
 	SHOW_FEAT_HEADER,
-- 
2.17.1


* [PATCH 03/48] perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
  2018-09-13 12:54 ` [PATCH 01/48] perf tools: Remove perf_tool from event_op2 Jiri Olsa
  2018-09-13 12:54 ` [PATCH 02/48] perf tools: Remove perf_tool from event_op3 Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-25  9:32   ` [tip:perf/core] perf auxtrace: Pass struct perf_mmap into mmap__read* functions tip-bot for Jiri Olsa
  2018-09-13 12:54 ` [PATCH 04/48] perf tools: Add struct perf_mmap arg into record__write Jiri Olsa
                   ` (46 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

The perf_mmap struct will hold the output file pointer, so we need to
pass it down the stack to the record__write callers instead of its
auxtrace_mmap member.
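A minimal sketch of why passing the outer struct helps, assuming simplified mirror structs (the fd field and aux_read_sketch are hypothetical stand-ins for the future file pointer and the read helpers). The callee derives the auxtrace_mmap member itself, as __auxtrace_mmap__read() now does, and can still reach the containing map's other fields:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified mirrors of the perf structs. */
struct auxtrace_mmap {
	void *base;
	unsigned long long prev;
};

struct perf_mmap {
	void *base;
	int fd;                            /* stand-in for the future file pointer */
	struct auxtrace_mmap auxtrace_mmap;
};

/* The callee receives the whole map ... */
int aux_read_sketch(struct perf_mmap *map)
{
	/* ... derives the member it used to be handed directly ... */
	struct auxtrace_mmap *mm = &map->auxtrace_mmap;

	if (!mm->base)
		return 0;       /* no AUX area mapped: nothing to read */

	/* ... and can still reach outer fields such as the output target. */
	return map->fd;
}
```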

Link: http://lkml.kernel.org/n/tip-h6tbp0huyptieote47dk1znz@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 27 +++++++++++++--------------
 tools/perf/util/auxtrace.c  | 11 ++++++-----
 tools/perf/util/auxtrace.h  |  5 +++--
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9853552bcf16..fd8b12c5f4ae 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -207,11 +207,11 @@ static int record__process_auxtrace(struct perf_tool *tool,
 }
 
 static int record__auxtrace_mmap_read(struct record *rec,
-				      struct auxtrace_mmap *mm)
+				      struct perf_mmap *map)
 {
 	int ret;
 
-	ret = auxtrace_mmap__read(mm, rec->itr, &rec->tool,
+	ret = auxtrace_mmap__read(map, rec->itr, &rec->tool,
 				  record__process_auxtrace);
 	if (ret < 0)
 		return ret;
@@ -223,11 +223,11 @@ static int record__auxtrace_mmap_read(struct record *rec,
 }
 
 static int record__auxtrace_mmap_read_snapshot(struct record *rec,
-					       struct auxtrace_mmap *mm)
+					       struct perf_mmap *map)
 {
 	int ret;
 
-	ret = auxtrace_mmap__read_snapshot(mm, rec->itr, &rec->tool,
+	ret = auxtrace_mmap__read_snapshot(map, rec->itr, &rec->tool,
 					   record__process_auxtrace,
 					   rec->opts.auxtrace_snapshot_size);
 	if (ret < 0)
@@ -245,13 +245,12 @@ static int record__auxtrace_read_snapshot_all(struct record *rec)
 	int rc = 0;
 
 	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
-		struct auxtrace_mmap *mm =
-				&rec->evlist->mmap[i].auxtrace_mmap;
+		struct perf_mmap *map = &rec->evlist->mmap[i];
 
-		if (!mm->base)
+		if (!map->auxtrace_mmap.base)
 			continue;
 
-		if (record__auxtrace_mmap_read_snapshot(rec, mm) != 0) {
+		if (record__auxtrace_mmap_read_snapshot(rec, map) != 0) {
 			rc = -1;
 			goto out;
 		}
@@ -295,7 +294,7 @@ static int record__auxtrace_init(struct record *rec)
 
 static inline
 int record__auxtrace_mmap_read(struct record *rec __maybe_unused,
-			       struct auxtrace_mmap *mm __maybe_unused)
+			       struct perf_mmap *map __maybe_unused)
 {
 	return 0;
 }
@@ -529,17 +528,17 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 		return 0;
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
-		struct auxtrace_mmap *mm = &maps[i].auxtrace_mmap;
+		struct perf_mmap *map = &maps[i];
 
-		if (maps[i].base) {
-			if (perf_mmap__push(&maps[i], rec, record__pushfn) != 0) {
+		if (map->base) {
+			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
 				rc = -1;
 				goto out;
 			}
 		}
 
-		if (mm->base && !rec->opts.auxtrace_snapshot_mode &&
-		    record__auxtrace_mmap_read(rec, mm) != 0) {
+		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
+		    record__auxtrace_mmap_read(rec, map) != 0) {
 			rc = -1;
 			goto out;
 		}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 3017b205a157..2fecee57f555 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1193,11 +1193,12 @@ int perf_event__process_auxtrace_error(struct perf_session *session,
 	return 0;
 }
 
-static int __auxtrace_mmap__read(struct auxtrace_mmap *mm,
+static int __auxtrace_mmap__read(struct perf_mmap *map,
 				 struct auxtrace_record *itr,
 				 struct perf_tool *tool, process_auxtrace_t fn,
 				 bool snapshot, size_t snapshot_size)
 {
+	struct auxtrace_mmap *mm = &map->auxtrace_mmap;
 	u64 head, old = mm->prev, offset, ref;
 	unsigned char *data = mm->base;
 	size_t size, head_off, old_off, len1, len2, padding;
@@ -1303,18 +1304,18 @@ static int __auxtrace_mmap__read(struct auxtrace_mmap *mm,
 	return 1;
 }
 
-int auxtrace_mmap__read(struct auxtrace_mmap *mm, struct auxtrace_record *itr,
+int auxtrace_mmap__read(struct perf_mmap *map, struct auxtrace_record *itr,
 			struct perf_tool *tool, process_auxtrace_t fn)
 {
-	return __auxtrace_mmap__read(mm, itr, tool, fn, false, 0);
+	return __auxtrace_mmap__read(map, itr, tool, fn, false, 0);
 }
 
-int auxtrace_mmap__read_snapshot(struct auxtrace_mmap *mm,
+int auxtrace_mmap__read_snapshot(struct perf_mmap *map,
 				 struct auxtrace_record *itr,
 				 struct perf_tool *tool, process_auxtrace_t fn,
 				 size_t snapshot_size)
 {
-	return __auxtrace_mmap__read(mm, itr, tool, fn, true, snapshot_size);
+	return __auxtrace_mmap__read(map, itr, tool, fn, true, snapshot_size);
 }
 
 /**
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 6be89776358c..7eeb141361b9 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -33,6 +33,7 @@ union perf_event;
 struct perf_session;
 struct perf_evlist;
 struct perf_tool;
+struct perf_mmap;
 struct option;
 struct record_opts;
 struct auxtrace_info_event;
@@ -437,10 +438,10 @@ typedef int (*process_auxtrace_t)(struct perf_tool *tool,
 				  union perf_event *event, void *data1,
 				  size_t len1, void *data2, size_t len2);
 
-int auxtrace_mmap__read(struct auxtrace_mmap *mm, struct auxtrace_record *itr,
+int auxtrace_mmap__read(struct perf_mmap *map, struct auxtrace_record *itr,
 			struct perf_tool *tool, process_auxtrace_t fn);
 
-int auxtrace_mmap__read_snapshot(struct auxtrace_mmap *mm,
+int auxtrace_mmap__read_snapshot(struct perf_mmap *map,
 				 struct auxtrace_record *itr,
 				 struct perf_tool *tool, process_auxtrace_t fn,
 				 size_t snapshot_size);
-- 
2.17.1


* [PATCH 04/48] perf tools: Add struct perf_mmap arg into record__write
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (2 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 03/48] perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-25  9:32   ` [tip:perf/core] perf tools: Add 'struct perf_mmap' arg to record__write() tip-bot for Jiri Olsa
  2018-09-13 12:54 ` [PATCH 05/48] perf tools: Use a software dummy event to track task/mmap events Jiri Olsa
                   ` (45 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

The map argument will hold the file pointer
to write the data to.
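In this patch the map argument is still __maybe_unused and every write goes to the session's single perf.data file. A sketch of where the argument is presumably heading, assuming a hypothetical per-map output sink (struct sink, perf_mmap_sketch and record_write_sketch are illustrative names only): writes with map == NULL, as for synthesized events and finished-round markers, fall back to the session-wide file:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output sink standing in for struct perf_data_file. */
struct sink {
	char buf[64];
	size_t len;
};

/* Hypothetical map carrying its own output file (assumed future state). */
struct perf_mmap_sketch {
	struct sink *file;
};

static int sink_write(struct sink *s, const void *bf, size_t size)
{
	if (s->len + size > sizeof(s->buf))
		return -1;
	memcpy(s->buf + s->len, bf, size);
	s->len += size;
	return 0;
}

/*
 * map == NULL (synthesized events, finished-round markers) falls back to
 * the session-wide file; otherwise the data goes to the map's own file.
 */
int record_write_sketch(struct sink *session_file,
			struct perf_mmap_sketch *map,
			const void *bf, size_t size)
{
	struct sink *out = (map && map->file) ? map->file : session_file;

	return sink_write(out, bf, size) < 0 ? -1 : 0;
}
```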

Link: http://lkml.kernel.org/n/tip-hnoi427zudjnm86awuejhnnt@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 24 ++++++++++++++----------
 tools/perf/util/auxtrace.c  |  2 +-
 tools/perf/util/auxtrace.h  |  1 +
 tools/perf/util/mmap.c      |  6 +++---
 tools/perf/util/mmap.h      |  2 +-
 5 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fd8b12c5f4ae..0980dfe3396b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -106,9 +106,12 @@ static bool switch_output_time(struct record *rec)
 	       trigger_is_ready(&switch_output_trigger);
 }
 
-static int record__write(struct record *rec, void *bf, size_t size)
+static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
+			 void *bf, size_t size)
 {
-	if (perf_data__write(rec->session->data, bf, size) < 0) {
+	struct perf_data_file *file = &rec->session->data->file;
+
+	if (perf_data_file__write(file, bf, size) < 0) {
 		pr_err("failed to write perf data, error: %m\n");
 		return -1;
 	}
@@ -127,15 +130,15 @@ static int process_synthesized_event(struct perf_tool *tool,
 				     struct machine *machine __maybe_unused)
 {
 	struct record *rec = container_of(tool, struct record, tool);
-	return record__write(rec, event, event->header.size);
+	return record__write(rec, NULL, event, event->header.size);
 }
 
-static int record__pushfn(void *to, void *bf, size_t size)
+static int record__pushfn(struct perf_mmap *map, void *to, void *bf, size_t size)
 {
 	struct record *rec = to;
 
 	rec->samples++;
-	return record__write(rec, bf, size);
+	return record__write(rec, map, bf, size);
 }
 
 static volatile int done;
@@ -170,6 +173,7 @@ static void record__sig_exit(void)
 #ifdef HAVE_AUXTRACE_SUPPORT
 
 static int record__process_auxtrace(struct perf_tool *tool,
+				    struct perf_mmap *map,
 				    union perf_event *event, void *data1,
 				    size_t len1, void *data2, size_t len2)
 {
@@ -197,11 +201,11 @@ static int record__process_auxtrace(struct perf_tool *tool,
 	if (padding)
 		padding = 8 - padding;
 
-	record__write(rec, event, event->header.size);
-	record__write(rec, data1, len1);
+	record__write(rec, map, event, event->header.size);
+	record__write(rec, map, data1, len1);
 	if (len2)
-		record__write(rec, data2, len2);
-	record__write(rec, &pad, padding);
+		record__write(rec, map, data2, len2);
+	record__write(rec, map, &pad, padding);
 
 	return 0;
 }
@@ -549,7 +553,7 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	 * at least one event.
 	 */
 	if (bytes_written != rec->bytes_written)
-		rc = record__write(rec, &finished_round_event, sizeof(finished_round_event));
+		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
 
 	if (overwrite)
 		perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 2fecee57f555..c4617bcfd521 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1285,7 +1285,7 @@ static int __auxtrace_mmap__read(struct perf_mmap *map,
 	ev.auxtrace.tid = mm->tid;
 	ev.auxtrace.cpu = mm->cpu;
 
-	if (fn(tool, &ev, data1, len1, data2, len2))
+	if (fn(tool, map, &ev, data1, len1, data2, len2))
 		return -1;
 
 	mm->prev = head;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 7eeb141361b9..a86b7eab6673 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -435,6 +435,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 				   bool per_cpu);
 
 typedef int (*process_auxtrace_t)(struct perf_tool *tool,
+				  struct perf_mmap *map,
 				  union perf_event *event, void *data1,
 				  size_t len1, void *data2, size_t len2);
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 215f69f41672..cdb95b3a1213 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -281,7 +281,7 @@ int perf_mmap__read_init(struct perf_mmap *map)
 }
 
 int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size))
+		    int push(struct perf_mmap *map, void *to, void *buf, size_t size))
 {
 	u64 head = perf_mmap__read_head(md);
 	unsigned char *data = md->base + page_size;
@@ -300,7 +300,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 		size = md->mask + 1 - (md->start & md->mask);
 		md->start += size;
 
-		if (push(to, buf, size) < 0) {
+		if (push(md, to, buf, size) < 0) {
 			rc = -1;
 			goto out;
 		}
@@ -310,7 +310,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 	size = md->end - md->start;
 	md->start += size;
 
-	if (push(to, buf, size) < 0) {
+	if (push(md, to, buf, size) < 0) {
 		rc = -1;
 		goto out;
 	}
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 05a6d47c7956..e603314dc792 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -93,7 +93,7 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
 union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 
 int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size));
+		    int push(struct perf_mmap *map, void *to, void *buf, size_t size));
 
 size_t perf_mmap__mmap_len(struct perf_mmap *map);
 
-- 
2.17.1


* [PATCH 05/48] perf tools: Use a software dummy event to track task/mmap events
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (3 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 04/48] perf tools: Add struct perf_mmap arg into record__write Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 06/48] perf tools: Create separate mmap for dummy tracking event Jiri Olsa
                   ` (44 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

Add APIs for a software dummy event that tracks task/comm/mmap events
separately.  perf record will use them to save such events in a
separate mmap buffer, which makes them easy to index.  This is just
preparation for the multi-thread support that will come later.
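A sketch of how the new attribute and predicate fit together, assuming a tiny mirror of struct perf_event_attr (attr_sketch, dummy_attr and mark_tracking are hypothetical; the two constants match <linux/perf_event.h>). perf_evlist__add_dummy_tracking() itself does not set mmap/task. The assumption here is that these bits come from the evsel being marked as the tracking event and configured later, so the predicate only matches after that step:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in <linux/perf_event.h>. */
enum { SKETCH_PERF_TYPE_SOFTWARE = 1 };
enum { SKETCH_PERF_COUNT_SW_DUMMY = 9 };

/* Tiny hypothetical subset of struct perf_event_attr. */
struct attr_sketch {
	uint32_t type;
	uint64_t config;
	unsigned exclude_kernel : 1;
	unsigned mmap : 1;
	unsigned comm : 1;
	unsigned task : 1;
};

/* Same test as perf_evsel__is_dummy_tracking(). */
bool is_dummy_tracking(const struct attr_sketch *a)
{
	return a->type == SKETCH_PERF_TYPE_SOFTWARE &&
	       a->config == SKETCH_PERF_COUNT_SW_DUMMY &&
	       a->task == 1 && a->mmap == 1;
}

/* Attribute as built by perf_evlist__add_dummy_tracking(). */
struct attr_sketch dummy_attr(void)
{
	struct attr_sketch a = {
		.type = SKETCH_PERF_TYPE_SOFTWARE,
		.config = SKETCH_PERF_COUNT_SW_DUMMY,
		.exclude_kernel = 1,
	};
	return a;
}

/* Assumed effect of configuring the tracking evsel for recording. */
void mark_tracking(struct attr_sketch *a)
{
	a->mmap = 1;
	a->comm = 1;
	a->task = 1;
}
```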

Cc: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-qo7opc5kb3ueuicyjdyqupkh@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/evlist.c | 30 ++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h |  1 +
 tools/perf/util/evsel.h  | 15 +++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index be440df29615..7428d65650c9 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -267,6 +267,36 @@ int perf_evlist__add_dummy(struct perf_evlist *evlist)
 	return 0;
 }
 
+int perf_evlist__add_dummy_tracking(struct perf_evlist *evlist)
+{
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_SOFTWARE,
+		.config = PERF_COUNT_SW_DUMMY,
+		.exclude_kernel = 1,
+	};
+	struct perf_evsel *evsel;
+
+	event_attr_init(&attr);
+
+	evsel = perf_evsel__new(&attr);
+	if (evsel == NULL)
+		goto error;
+
+	/* use strdup() because free(evsel) assumes name is allocated */
+	evsel->name = strdup("dummy");
+	if (!evsel->name)
+		goto error_free;
+
+	perf_evlist__add(evlist, evsel);
+	perf_evlist__set_tracking_event(evlist, evsel);
+
+	return 0;
+error_free:
+	perf_evsel__delete(evsel);
+error:
+	return -ENOMEM;
+}
+
 static int perf_evlist__add_attrs(struct perf_evlist *evlist,
 				  struct perf_event_attr *attrs, size_t nr_attrs)
 {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index dc66436add98..c11cb80e7847 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -76,6 +76,7 @@ static inline int perf_evlist__add_default(struct perf_evlist *evlist)
 	return __perf_evlist__add_default(evlist, true);
 }
 
+int perf_evlist__add_dummy_tracking(struct perf_evlist *evlist);
 int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 4f8430a85531..6e18b9ff997d 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -411,6 +411,21 @@ static inline bool perf_evsel__is_clock(struct perf_evsel *evsel)
 	       perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK);
 }
 
+/**
+ * perf_evsel__is_dummy_tracking - Return whether given evsel is a dummy
+ * event for tracking meta events only
+ *
+ * @evsel - evsel selector to be tested
+ *
+ * Return %true if event is a dummy tracking event
+ */
+static inline bool perf_evsel__is_dummy_tracking(struct perf_evsel *evsel)
+{
+	return evsel->attr.type == PERF_TYPE_SOFTWARE &&
+		evsel->attr.config == PERF_COUNT_SW_DUMMY &&
+		evsel->attr.task == 1 && evsel->attr.mmap == 1;
+}
+
 struct perf_attr_details {
 	bool freq;
 	bool verbose;
-- 
2.17.1


* [PATCH 06/48] perf tools: Create separate mmap for dummy tracking event
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (4 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 05/48] perf tools: Use a software dummy event to track task/mmap events Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 07/48] perf tools: Extend perf_evlist__mmap_ex() to use track mmap Jiri Olsa
                   ` (43 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

When indexed data file support is enabled, a dummy tracking event will
be used to track metadata (like task, comm and mmap events) for a
session, while the actual samples will be recorded in separate
(intermediate) files and then merged (with an index table).

Provide a separate mmap for the dummy tracking event.  Its size is fixed
to 128KiB (+ 1 page), since the rate of these events will be lower than
that of samples.  I originally wanted to use a single mmap for this, but
cross-cpu sharing is prohibited, so it's per-cpu (or per-task) like the
normal mmaps.
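The size arithmetic from TRACK_MMAP_SIZE and mp_track.mask, redone as plain functions with page_size passed in explicitly (the function names are hypothetical). On 4 KiB pages this gives 32 + 1 = 33 pages = 135168 bytes, of which 131072 (128 KiB) are ring-buffer data after the header page, so the mask is 131071. The sketch assumes page_size is a power of two no larger than 128 KiB:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Same formula as TRACK_MMAP_SIZE: 128 KiB of data pages plus one header page. */
size_t track_mmap_size(size_t page_size)
{
	return ((128 * 1024 / page_size) + 1) * page_size;
}

/* Same formula as mp_track.mask: data area (size minus header page) - 1. */
size_t track_mmap_mask(size_t page_size)
{
	size_t data = track_mmap_size(page_size) - page_size;

	/* masking offsets only wraps correctly if the data area is a power of two */
	assert(data != 0 && (data & (data - 1)) == 0);
	return data - 1;
}

/* Runtime page size, as perf's page_size global would hold it. */
size_t system_page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	return ps > 0 ? (size_t)ps : 4096;
}
```

With 64 KiB pages the same formula yields 3 pages (196608 bytes) but still a 128 KiB data area, so the mask is unchanged.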

Cc: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-8vw9ocqkwqa7rsoolc25ezdt@git.kernel.org
Original-patch-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c |  9 ++++++
 tools/perf/util/evlist.c    | 55 ++++++++++++++++++++++++++++++-------
 tools/perf/util/evlist.h    |  3 ++
 tools/perf/util/mmap.h      |  1 +
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0980dfe3396b..cb5a605679a1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -533,6 +533,8 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
 		struct perf_mmap *map = &maps[i];
+		struct perf_mmap *track_map =  evlist->track_mmap ?
+					      &evlist->track_mmap[i] : NULL;
 
 		if (map->base) {
 			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
@@ -546,6 +548,13 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 			rc = -1;
 			goto out;
 		}
+
+		if (track_map && track_map->base) {
+			if (perf_mmap__push(track_map, rec, record__pushfn) != 0) {
+				rc = -1;
+				goto out;
+			}
+		}
 	}
 
 	/*
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7428d65650c9..6cbfc5ceab75 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -736,9 +736,15 @@ static void perf_evlist__munmap_nofree(struct perf_evlist *evlist)
 {
 	int i;
 
-	if (evlist->mmap)
+	if (evlist->mmap) {
 		for (i = 0; i < evlist->nr_mmaps; i++)
 			perf_mmap__munmap(&evlist->mmap[i]);
+	}
+
+	if (evlist->track_mmap) {
+		for (i = 0; i < evlist->nr_mmaps; i++)
+			perf_mmap__munmap(&evlist->track_mmap[i]);
+	}
 
 	if (evlist->overwrite_mmap)
 		for (i = 0; i < evlist->nr_mmaps; i++)
@@ -792,14 +798,20 @@ perf_evlist__should_poll(struct perf_evlist *evlist __maybe_unused,
 }
 
 static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
-				       struct mmap_params *mp, int cpu_idx,
-				       int thread, int *_output, int *_output_overwrite)
+				       struct mmap_params *_mp, int cpu_idx,
+				       int thread, int *_output, int *_output_overwrite,
+				       int *_output_track)
 {
+	struct mmap_params mp_track = {
+		.prot = PROT_READ | PROT_WRITE,
+		.mask = TRACK_MMAP_SIZE - page_size - 1,
+	};
 	struct perf_evsel *evsel;
 	int revent;
 	int evlist_cpu = cpu_map__cpu(evlist->cpus, cpu_idx);
 
 	evlist__for_each_entry(evlist, evsel) {
+		struct mmap_params *mp = _mp;
 		struct perf_mmap *maps = evlist->mmap;
 		int *output = _output;
 		int fd;
@@ -821,6 +833,12 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 			mp->prot &= ~PROT_WRITE;
 		}
 
+		if (mp->track_mmap && perf_evsel__is_dummy_tracking(evsel)) {
+			output = _output_track;
+			maps   = evlist->track_mmap;
+			mp     = &mp_track;
+		}
+
 		if (evsel->system_wide && thread)
 			continue;
 
@@ -880,13 +898,15 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist,
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
 		int output = -1;
 		int output_overwrite = -1;
+		int output_track = -1;
 
 		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, cpu,
 					      true);
 
 		for (thread = 0; thread < nr_threads; thread++) {
 			if (perf_evlist__mmap_per_evsel(evlist, cpu, mp, cpu,
-							thread, &output, &output_overwrite))
+							thread, &output, &output_overwrite,
+							&output_track))
 				goto out_unmap;
 		}
 	}
@@ -908,12 +928,13 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
 	for (thread = 0; thread < nr_threads; thread++) {
 		int output = -1;
 		int output_overwrite = -1;
+		int output_track = -1;
 
 		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, thread,
 					      false);
 
 		if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
-						&output, &output_overwrite))
+						&output, &output_overwrite, &output_track))
 			goto out_unmap;
 	}
 
@@ -1058,12 +1079,26 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * Its value is decided by evsel's write_backward.
 	 * So &mp should not be passed through const pointer.
 	 */
-	struct mmap_params mp;
+	struct mmap_params mp = {
+		.track_mmap = false,
+	};
 
-	if (!evlist->mmap)
-		evlist->mmap = perf_evlist__alloc_mmap(evlist, false);
-	if (!evlist->mmap)
-		return -ENOMEM;
+	if (!evlist->mmap) {
+		struct perf_mmap *map;
+
+		map = perf_evlist__alloc_mmap(evlist, false);
+		if (!map)
+			return -ENOMEM;
+
+		evlist->mmap = map;
+		if (mp.track_mmap) {
+			map = perf_evlist__alloc_mmap(evlist, false);
+			if (!map)
+				return -ENOMEM;
+
+			evlist->track_mmap = map;
+		}
+	}
 
 	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
 		return -ENOMEM;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index c11cb80e7847..4ef1d355e811 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -24,6 +24,8 @@ struct record_opts;
 #define PERF_EVLIST__HLIST_BITS 8
 #define PERF_EVLIST__HLIST_SIZE (1 << PERF_EVLIST__HLIST_BITS)
 
+#define TRACK_MMAP_SIZE  (((128 * 1024 / page_size) + 1) * page_size)
+
 struct perf_evlist {
 	struct list_head entries;
 	struct hlist_head heads[PERF_EVLIST__HLIST_SIZE];
@@ -44,6 +46,7 @@ struct perf_evlist {
 	struct fdarray	 pollfd;
 	struct perf_mmap *mmap;
 	struct perf_mmap *overwrite_mmap;
+	struct perf_mmap *track_mmap;
 	struct thread_map *threads;
 	struct cpu_map	  *cpus;
 	struct perf_evsel *selected;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index e603314dc792..9d2672d8f131 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -59,6 +59,7 @@ enum bkw_mmap_state {
 struct mmap_params {
 	int			    prot, mask;
 	struct auxtrace_mmap_params auxtrace_mp;
+	bool track_mmap;
 };
 
 int perf_mmap__mmap(struct perf_mmap *map, struct mmap_params *mp, int fd, int cpu);
-- 
2.17.1


* [PATCH 07/48] perf tools: Extend perf_evlist__mmap_ex() to use track mmap
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (5 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 06/48] perf tools: Create separate mmap for dummy tracking event Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 08/48] perf report: Skip dummy tracking event Jiri Olsa
                   ` (42 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

The perf_evlist__mmap_ex() function now creates data and auxtrace mmaps
and, optionally, tracking mmaps for events.  perf record will use it to
save events in separate files and build an index table.  Checking for
the dummy tracking event in perf_evlist__mmap() alone is not enough, as
users can specify a dummy event (like in the keep tracking testcase)
without the index option.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-hche6reuz4unv1614kj7lihz@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 2 +-
 tools/perf/util/evlist.c    | 7 ++++---
 tools/perf/util/evlist.h    | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cb5a605679a1..5d1433f92454 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -329,7 +329,7 @@ static int record__mmap_evlist(struct record *rec,
 
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode) < 0) {
+				 opts->auxtrace_snapshot_mode, false) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 6cbfc5ceab75..2f094f3bf446 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1057,6 +1057,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  * @overwrite: overwrite older events?
  * @auxtrace_pages - auxtrace map length in pages
  * @auxtrace_overwrite - overwrite older auxtrace data?
+ * @use_track_mmap: use separate mmaps to track meta events
  *
  * If @overwrite is %false the user needs to signal event consumption using
  * perf_mmap__write_tail().  Using perf_evlist__mmap_read() does this
@@ -1069,7 +1070,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  */
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite)
+			 bool auxtrace_overwrite, bool use_track_mmap)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -1080,7 +1081,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	 * So &mp should not be passed through const pointer.
 	 */
 	struct mmap_params mp = {
-		.track_mmap = false,
+		.track_mmap = use_track_mmap,
 	};
 
 	if (!evlist->mmap) {
@@ -1125,7 +1126,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 {
-	return perf_evlist__mmap_ex(evlist, pages, 0, false);
+	return perf_evlist__mmap_ex(evlist, pages, 0, false, false);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 4ef1d355e811..df5162c4292b 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -166,7 +166,7 @@ unsigned long perf_event_mlock_kb_in_pages(void);
 
 int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 unsigned int auxtrace_pages,
-			 bool auxtrace_overwrite);
+			 bool auxtrace_overwrite, bool use_track_mmap);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages);
 void perf_evlist__munmap(struct perf_evlist *evlist);
 
-- 
2.17.1



* [PATCH 08/48] perf report: Skip dummy tracking event
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (6 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 07/48] perf tools: Extend perf_evlist__mmap_ex() to use track mmap Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 09/48] perf tools: Make copyfile_offset global Jiri Olsa
                   ` (41 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

The dummy tracking event is only for tracking task/comm/mmap events
and has no sample data of its own, so there is no need to report it;
just skip it.
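Here is a minimal sketch of the selection logic that the TUI hunk below implements. It assumes each event carries a flag equivalent to perf_evsel__is_dummy_tracking(). The code counts only the reportable events and, if exactly one is left, picks it for the single-entry view. struct evsel_stub, count_reportable() and first_reportable() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_evsel: only the fields needed here. */
struct evsel_stub {
	const char *name;
	bool is_dummy_tracking;	/* models perf_evsel__is_dummy_tracking() */
};

/* Count the events that should appear in the report. */
static size_t count_reportable(const struct evsel_stub *evs, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (!evs[i].is_dummy_tracking)
			n++;
	return n;
}

/* Return the first reportable event, or NULL if every event is a dummy. */
static const struct evsel_stub *first_reportable(const struct evsel_stub *evs,
						 size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!evs[i].is_dummy_tracking)
			return &evs[i];
	return NULL;
}
```

For a dummy tracking event plus "cycles", the count is 1. The browser can then go straight to the single-entry view for "cycles", which corresponds to the goto single_entry path in hists.c.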

Link: http://lkml.kernel.org/n/tip-l2hq7g8c7lt73aqmqpo6dywj@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-report.c    |  3 +++
 tools/perf/ui/browsers/hists.c | 30 ++++++++++++++++++++++++------
 tools/perf/ui/gtk/hists.c      |  3 +++
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 7507e4d6dce1..3666a6b82ff1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -471,6 +471,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 		struct hists *hists = evsel__hists(pos);
 		const char *evname = perf_evsel__name(pos);
 
+		if (perf_evsel__is_dummy_tracking(pos))
+			continue;
+
 		if (symbol_conf.event_group &&
 		    !perf_evsel__is_group_leader(pos))
 			continue;
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index a96f62ca984a..a517becda28a 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -3214,14 +3214,17 @@ static int perf_evsel_menu__run(struct perf_evsel_menu *menu,
 	return key;
 }
 
-static bool filter_group_entries(struct ui_browser *browser __maybe_unused,
-				 void *entry)
+static bool filter_entries(struct ui_browser *browser __maybe_unused,
+			   void *entry)
 {
 	struct perf_evsel *evsel = list_entry(entry, struct perf_evsel, node);
 
 	if (symbol_conf.event_group && !perf_evsel__is_group_leader(evsel))
 		return true;
 
+	if (perf_evsel__is_dummy_tracking(evsel))
+		return true;
+
 	return false;
 }
 
@@ -3240,7 +3243,7 @@ static int __perf_evlist__tui_browse_hists(struct perf_evlist *evlist,
 			.refresh    = ui_browser__list_head_refresh,
 			.seek	    = ui_browser__list_head_seek,
 			.write	    = perf_evsel_menu__write,
-			.filter	    = filter_group_entries,
+			.filter	    = filter_entries,
 			.nr_entries = nr_entries,
 			.priv	    = evlist,
 		},
@@ -3271,11 +3274,11 @@ int perf_evlist__tui_browse_hists(struct perf_evlist *evlist, const char *help,
 				  struct annotation_options *annotation_opts)
 {
 	int nr_entries = evlist->nr_entries;
+	struct perf_evsel *first = perf_evlist__first(evlist);
+	struct perf_evsel *pos;
 
 single_entry:
 	if (nr_entries == 1) {
-		struct perf_evsel *first = perf_evlist__first(evlist);
-
 		return perf_evsel__hists_browse(first, nr_entries, help,
 						false, hbt, min_pcnt,
 						env, warn_lost_event,
@@ -3283,10 +3286,11 @@ int perf_evlist__tui_browse_hists(struct perf_evlist *evlist, const char *help,
 	}
 
 	if (symbol_conf.event_group) {
-		struct perf_evsel *pos;
 
 		nr_entries = 0;
 		evlist__for_each_entry(evlist, pos) {
+			if (perf_evsel__is_dummy_tracking(pos))
+				continue;
 			if (perf_evsel__is_group_leader(pos))
 				nr_entries++;
 		}
@@ -3295,6 +3299,20 @@ int perf_evlist__tui_browse_hists(struct perf_evlist *evlist, const char *help,
 			goto single_entry;
 	}
 
+	evlist__for_each_entry(evlist, pos) {
+		if (perf_evsel__is_dummy_tracking(pos))
+			nr_entries--;
+	}
+
+	if (nr_entries == 1) {
+		evlist__for_each_entry(evlist, pos) {
+			if (!perf_evsel__is_dummy_tracking(pos)) {
+				first = pos;
+				goto single_entry;
+			}
+		}
+	}
+
 	return __perf_evlist__tui_browse_hists(evlist, nr_entries, help,
 					       hbt, min_pcnt, env,
 					       warn_lost_event,
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4ab663ec3e5e..adbece4e0071 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -639,6 +639,9 @@ int perf_evlist__gtk_browse_hists(struct perf_evlist *evlist,
 		char buf[512];
 		size_t size = sizeof(buf);
 
+		if (perf_evsel__is_dummy_tracking(pos))
+			continue;
+
 		if (symbol_conf.event_group) {
 			if (!perf_evsel__is_group_leader(pos))
 				continue;
-- 
2.17.1



* [PATCH 09/48] perf tools: Make copyfile_offset global
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (7 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 08/48] perf report: Skip dummy tracking event Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-18 20:54   ` Arnaldo Carvalho de Melo
  2018-09-25  9:33   ` [tip:perf/core] perf util: Make copyfile_offset() global tip-bot for Jiri Olsa
  2018-09-13 12:54 ` [PATCH 10/48] perf tools: Add HEADER_DATA_INDEX feature Jiri Olsa
                   ` (40 subsequent siblings)
  49 siblings, 2 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

It will be used outside of the util object in the following patches.

Link: http://lkml.kernel.org/n/tip-xgiypvcrmc12u7czcrc27en2@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/util.c | 2 +-
 tools/perf/util/util.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index eac5b858a371..093352e93d50 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -221,7 +221,7 @@ static int slow_copyfile(const char *from, const char *to, struct nsinfo *nsi)
 	return err;
 }
 
-static int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size)
+int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size)
 {
 	void *ptr;
 	loff_t pgoff;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index dc58254a2b69..7fc171b20671 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -80,4 +80,6 @@ void perf_set_multithreaded(void);
 #endif
 #endif
 
+int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size);
+
 #endif /* GIT_COMPAT_UTIL_H */
-- 
2.17.1



* [PATCH 10/48] perf tools: Add HEADER_DATA_INDEX feature
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (8 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 09/48] perf tools: Make copyfile_offset global Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 11/48] perf tools: Handle indexed data file properly Jiri Olsa
                   ` (39 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

The HEADER_DATA_INDEX feature records an index table for the sample
data so that it can be processed by multiple threads concurrently.
Each item is a struct perf_file_section, which consists of an offset
and a size.
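On disk, the feature section written by write_data_index() is a u64 entry count followed by that many (offset, size) pairs. process_data_index() byte-swaps each field when the file was written with the other endianness. The sketch below round-trips that layout through a memory buffer. struct file_section, swap64(), index_write() and index_read() are hypothetical stand-ins. Unlike the patch, the reader also rejects a count larger than the buffer can hold.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors struct perf_file_section: an offset and a size, both u64. */
struct file_section {
	uint64_t offset;
	uint64_t size;
};

/* Portable 64-bit byte swap. */
static uint64_t swap64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Serialize a u64 count followed by nr (offset, size) pairs.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t index_write(unsigned char *buf, size_t len,
			  const struct file_section *idx, uint64_t nr)
{
	size_t need = sizeof(nr) + nr * sizeof(*idx);

	if (len < need)
		return 0;
	memcpy(buf, &nr, sizeof(nr));
	memcpy(buf + sizeof(nr), idx, nr * sizeof(*idx));
	return need;
}

/* Parse the same layout. needs_swap models a file of the other endianness.
 * Returns a calloc'ed array (the caller frees it) or NULL on error. */
static struct file_section *index_read(const unsigned char *buf, size_t len,
				       int needs_swap, uint64_t *nr_out)
{
	struct file_section *idx;
	uint64_t nr, i;

	if (len < sizeof(nr))
		return NULL;
	memcpy(&nr, buf, sizeof(nr));
	if (needs_swap)
		nr = swap64(nr);
	if (nr > (len - sizeof(nr)) / sizeof(*idx))
		return NULL;	/* truncated data or corrupt count */

	idx = calloc(nr ? nr : 1, sizeof(*idx));
	if (!idx)
		return NULL;
	memcpy(idx, buf + sizeof(nr), nr * sizeof(*idx));
	for (i = 0; needs_swap && i < nr; i++) {
		idx[i].offset = swap64(idx[i].offset);
		idx[i].size   = swap64(idx[i].size);
	}
	*nr_out = nr;
	return idx;
}
```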

Link: http://lkml.kernel.org/n/tip-q5rh23fs1qlenc2s9fj0ug7u@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c |  1 +
 tools/perf/util/header.c    | 76 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/header.h    |  3 ++
 3 files changed, 80 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5d1433f92454..b5deda2e890c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -602,6 +602,7 @@ static void record__init_features(struct record *rec)
 		perf_header__clear_feat(&session->header, HEADER_AUXTRACE);
 
 	perf_header__clear_feat(&session->header, HEADER_STAT);
+	perf_header__clear_feat(&session->header, HEADER_DATA_INDEX);
 }
 
 static void
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index c78051ad1fcc..b097bcc35d34 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -34,6 +34,7 @@
 #include "strbuf.h"
 #include "build-id.h"
 #include "data.h"
+#include "units.h"
 #include <api/fs/fs.h>
 #include "asm/bug.h"
 #include "tool.h"
@@ -1417,6 +1418,25 @@ static int write_mem_topology(struct feat_fd *ff __maybe_unused,
 	return ret;
 }
 
+static int write_data_index(struct feat_fd *ff,
+			    struct perf_evlist *evlist __maybe_unused)
+{
+	struct perf_header *ph = ff->ph;
+	int ret;
+	unsigned int i;
+
+	ret = do_write(ff, &ph->nr_index, sizeof(ph->nr_index));
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < ph->nr_index; i++) {
+		ret = do_write(ff, &ph->index[i], sizeof(*ph->index));
+		if (ret < 0)
+			return ret;
+	}
+	return 0;
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
 	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1809,6 +1829,23 @@ static void print_mem_topology(struct feat_fd *ff, FILE *fp)
 	}
 }
 
+static void print_data_index(struct feat_fd *ff, FILE *fp)
+{
+	struct perf_header *ph = ff->ph;
+	unsigned int i;
+
+	fprintf(fp, "# contains data index (%" PRIu64 ") for parallel processing\n",
+		ph->nr_index);
+
+	for (i = 0; i < ph->nr_index; i++) {
+		struct perf_file_section *s = &ph->index[i];
+		char buf[20];
+
+		unit_number__scnprintf(buf, sizeof(buf), s->size);
+		fprintf(fp, "#   %u: %s @ %" PRIu64 "\n", i, buf, s->offset);
+	}
+}
+
 static int __event_process_build_id(struct build_id_event *bev,
 				    char *filename,
 				    struct perf_session *session)
@@ -2531,6 +2568,44 @@ static int process_mem_topology(struct feat_fd *ff,
 	return ret;
 }
 
+static int process_data_index(struct feat_fd *ff, void *data __maybe_unused)
+{
+	struct perf_header *ph = ff->ph;
+	int fd = ff->fd;
+	ssize_t ret;
+	u64 nr_idx;
+	unsigned int i;
+	struct perf_file_section *idx;
+
+	ret = readn(fd, &nr_idx, sizeof(nr_idx));
+	if (ret != sizeof(nr_idx))
+		return -1;
+
+	if (ph->needs_swap)
+		nr_idx = bswap_64(nr_idx);
+
+	idx = calloc(nr_idx, sizeof(*idx));
+	if (idx == NULL)
+		return -1;
+
+	for (i = 0; i < nr_idx; i++) {
+		ret = readn(fd, &idx[i], sizeof(*idx));
+		if (ret != sizeof(*idx)) {
+			free(idx);
+			return -1;
+		}
+
+		if (ph->needs_swap) {
+			idx[i].offset = bswap_64(idx[i].offset);
+			idx[i].size   = bswap_64(idx[i].size);
+		}
+	}
+
+	ph->index = idx;
+	ph->nr_index = nr_idx;
+	return 0;
+}
+
 struct feature_ops {
 	int (*write)(struct feat_fd *ff, struct perf_evlist *evlist);
 	void (*print)(struct feat_fd *ff, FILE *fp);
@@ -2590,6 +2665,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPN(CACHE,		cache,		true),
 	FEAT_OPR(SAMPLE_TIME,	sample_time,	false),
 	FEAT_OPR(MEM_TOPOLOGY,	mem_topology,	true),
+	FEAT_OPN(DATA_INDEX,	data_index,	true),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index e17903caa71d..542a62167ecf 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -38,6 +38,7 @@ enum {
 	HEADER_CACHE,
 	HEADER_SAMPLE_TIME,
 	HEADER_MEM_TOPOLOGY,
+	HEADER_DATA_INDEX,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
@@ -78,6 +79,8 @@ struct perf_header {
 	bool				needs_swap;
 	u64				data_offset;
 	u64				data_size;
+	struct perf_file_section	*index;
+	u64				nr_index;
 	u64				feat_offset;
 	DECLARE_BITMAP(adds_features, HEADER_FEAT_BITS);
 	struct perf_env 	env;
-- 
2.17.1



* [PATCH 11/48] perf tools: Handle indexed data file properly
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (9 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 10/48] perf tools: Add HEADER_DATA_INDEX feature Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 12/48] perf tools: Add perf_data__create_index function Jiri Olsa
                   ` (38 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

When perf detects that the data file has an index table, it processes
the header part first and then the rest of the data sections in a row.
Since the indexed sample data is recorded separately for each
cpu/thread, each section is already ordered with respect to itself, so
there is no need to use the ordered event queue interface.
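The control flow of __perf_session__process_indexed_events() can be sketched independently of perf. The loop walks the sections in index order and skips empty ones. It passes each section to the event processor with the section's own end as the file size, and it stops at the first error. process_fn is a hypothetical callback type standing in for __perf_session__process_events(). process_indexed() and sum_bytes() are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct file_section {
	uint64_t offset;
	uint64_t size;
};

/* Hypothetical callback: processes [offset, offset + size) and
 * returns < 0 on error. */
typedef int (*process_fn)(uint64_t offset, uint64_t size,
			  uint64_t file_size, void *priv);

/* Section 0 holds the meta events and the rest hold per-cpu/thread
 * sample data, each already time-ordered on its own. */
static int process_indexed(const struct file_section *idx, size_t nr,
			   process_fn fn, void *priv)
{
	size_t i;
	int err = 0;

	for (i = 0; i < nr; i++) {
		if (!idx[i].size)
			continue;	/* nothing was recorded for this section */
		err = fn(idx[i].offset, idx[i].size,
			 idx[i].offset + idx[i].size, priv);
		if (err < 0)
			break;
	}
	return err;
}

/* Example callback: sums the processed bytes into *priv. */
static int sum_bytes(uint64_t offset, uint64_t size, uint64_t file_size,
		     void *priv)
{
	(void)offset;
	(void)file_size;
	*(uint64_t *)priv += size;
	return 0;
}
```

Passing offset + size as the file size makes each call treat the end of its own section as the end of the data. This is how the patch keeps one section's processing from running into the next.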

Link: http://lkml.kernel.org/n/tip-pqgpc6m32s2cwprxpom9871g@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c |  2 ++
 tools/perf/perf.c           |  1 +
 tools/perf/perf.h           |  2 ++
 tools/perf/util/header.c    |  1 +
 tools/perf/util/session.c   | 52 ++++++++++++++++++++++++++++++-------
 5 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b5deda2e890c..9690c1f3e666 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -53,6 +53,8 @@
 #include <sys/mman.h>
 #include <sys/wait.h>
 #include <linux/time64.h>
+#include <sys/types.h>
+#include <sys/stat.h>
 
 struct switch_output {
 	bool		 enabled;
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index a11cb006f968..989ea9799c88 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -38,6 +38,7 @@ const char perf_more_info_string[] =
 
 static int use_pager = -1;
 const char *input_name;
+bool perf_has_index;
 
 struct cmd_struct {
 	const char *cmd;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 21bf7f5a3cf5..fba61cc5291f 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -37,6 +37,8 @@ void pthread__unblock_sigwinch(void);
 
 #include "util/target.h"
 
+extern bool perf_has_index;
+
 struct record_opts {
 	struct target target;
 	bool	     group;
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index b097bcc35d34..08ccd38e8eca 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2603,6 +2603,7 @@ static int process_data_index(struct feat_fd *ff, void *data __maybe_unused)
 
 	ph->index = idx;
 	ph->nr_index = nr_idx;
+	perf_has_index = true;
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7d2c8ce6cfad..15314052084d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1840,7 +1840,9 @@ static int __perf_session__process_events(struct perf_session *session,
 	mmap_size = MMAP_SIZE;
 	if (mmap_size > file_size) {
 		mmap_size = file_size;
-		session->one_mmap = true;
+
+		if (!perf_has_index)
+			session->one_mmap = true;
 	}
 
 	memset(mmaps, 0, sizeof(mmaps));
@@ -1918,8 +1920,6 @@ static int __perf_session__process_events(struct perf_session *session,
 	err = perf_session__flush_thread_stacks(session);
 out_err:
 	ui_progress__finish();
-	if (!tool->no_warn)
-		perf_session__warn_about_errors(session);
 	/*
 	 * We may switching perf.data output, make ordered_events
 	 * reusable.
@@ -1930,20 +1930,52 @@ static int __perf_session__process_events(struct perf_session *session,
 	return err;
 }
 
+static int __perf_session__process_indexed_events(struct perf_session *session)
+{
+	struct perf_tool *tool = session->tool;
+	int err = 0, i;
+
+	for (i = 0; i < (int)session->header.nr_index; i++) {
+		struct perf_file_section *idx = &session->header.index[i];
+
+		if (!idx->size)
+			continue;
+
+		err = __perf_session__process_events(session, idx->offset,
+						     idx->size,
+						     idx->offset + idx->size);
+		if (err < 0)
+			break;
+	}
+
+	if (!tool->no_warn)
+		perf_session__warn_about_errors(session);
+
+	return err;
+}
+
 int perf_session__process_events(struct perf_session *session)
 {
-	u64 size = perf_data__size(session->data);
+	struct perf_tool *tool = session->tool;
+	struct perf_data *data = session->data;
+	u64 size = perf_data__size(data);
 	int err;
 
 	if (perf_session__register_idle_thread(session) < 0)
 		return -ENOMEM;
 
-	if (!perf_data__is_pipe(session->data))
-		err = __perf_session__process_events(session,
-						     session->header.data_offset,
-						     session->header.data_size, size);
-	else
-		err = __perf_session__process_pipe_events(session);
+	if (perf_data__is_pipe(data))
+		return __perf_session__process_pipe_events(session);
+	if (perf_has_index)
+		return __perf_session__process_indexed_events(session);
+
+	err = __perf_session__process_events(session,
+					     session->header.data_offset,
+					     session->header.data_size,
+					     size);
+
+	if (!tool->no_warn)
+		perf_session__warn_about_errors(session);
 
 	return err;
 }
-- 
2.17.1



* [PATCH 12/48] perf tools: Add perf_data__create_index function
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (10 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 11/48] perf tools: Handle indexed data file properly Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 13/48] perf record: Add --index option for building index table Jiri Olsa
                   ` (37 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add the perf_data__create_index function to create
and open index files within the perf_data struct.
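perf_data__create_index() creates one file per mmap in a directory derived from the output path. Each file is named "<output>.dir/perf.data.<i>". The hypothetical helper below builds that path with snprintf() and reports truncation instead of silently cutting the name short. It covers only the naming, not the rm_rf()/mkdir()/open() steps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<output>.dir/perf.data.<i>" into buf.
 * Returns 0 on success, or -1 if the result did not fit. */
static int index_file_path(char *buf, size_t len, const char *output, int i)
{
	/* snprintf returns the length it wanted to write */
	int n = snprintf(buf, len, "%s.dir/perf.data.%d", output, i);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the output "perf.data" and i = 3, it produces "perf.data.dir/perf.data.3".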

Link: http://lkml.kernel.org/n/tip-kl4s1f13cg6wycrg367p85qm@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/data.c | 64 ++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/data.h |  5 ++++
 2 files changed, 69 insertions(+)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index d8cfc19ddb10..b856accfdace 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -187,3 +187,67 @@ int perf_data__switch(struct perf_data *data,
 	free(new_filepath);
 	return ret;
 }
+
+static void free_index(struct perf_data_file *index, int nr)
+{
+	while (--nr >= 0) {
+		close(index[nr].fd);
+		free((char *) index[nr].path);
+	}
+	free(index);
+}
+
+static void clean_index(struct perf_data *data,
+			struct perf_data_file *index,
+			int index_nr)
+{
+	char path[PATH_MAX];
+
+	scnprintf(path, sizeof(path), "%s.dir", data->file.path);
+	rm_rf(path);
+
+	free_index(index, index_nr);
+}
+
+void perf_data__clean_index(struct perf_data *data)
+{
+	clean_index(data, data->index, data->index_nr);
+}
+
+int perf_data__create_index(struct perf_data *data, int nr)
+{
+	struct perf_data_file *index;
+	char path[PATH_MAX];
+	int ret = -1, i = 0;
+
+	index = malloc(nr * sizeof(*index));
+	if (!index)
+		return -ENOMEM;
+
+	data->index    = index;
+	data->index_nr = nr;
+
+	scnprintf(path, sizeof(path), "%s.dir", data->file.path);
+	if (rm_rf(path) < 0 || mkdir(path, S_IRWXU) < 0)
+		goto out_err;
+
+	for (; i < nr; i++) {
+		struct perf_data_file *file = &index[i];
+
+		if (asprintf((char **) &file->path, "%s.dir/perf.data.%d",
+			     data->file.path, i) < 0)
+			goto out_err;
+
+		ret = open(file->path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
+		if (ret < 0)
+			goto out_err;
+
+		file->fd = ret;
+	}
+
+	return 0;
+
+out_err:
+	clean_index(data, index, i);
+	return ret;
+}
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 4828f7feea89..33b62c30b053 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -20,6 +20,8 @@ struct perf_data {
 	bool			 force;
 	unsigned long		 size;
 	enum perf_data_mode	 mode;
+	struct perf_data_file	*index;
+	int			 index_nr;
 };
 
 static inline bool perf_data__is_read(struct perf_data *data)
@@ -63,4 +65,7 @@ ssize_t perf_data_file__write(struct perf_data_file *file,
 int perf_data__switch(struct perf_data *data,
 			   const char *postfix,
 			   size_t pos, bool at_exit);
+int perf_data__create_index(struct perf_data *data,
+			    int nr);
+void perf_data__clean_index(struct perf_data *data);
 #endif /* __PERF_DATA_H */
-- 
2.17.1



* [PATCH 13/48] perf record: Add --index option for building index table
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (11 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 12/48] perf tools: Add perf_data__create_index function Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 14/48] perf tools: Introduce thread__comm(_str)_by_time() helpers Jiri Olsa
                   ` (36 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

The new --index option creates an indexed data file that can be
processed by multiple threads in parallel.  It saves meta events and
sample data in separate files and merges them with an index table.

If the data file has an index table, the HEADER_DATA_INDEX feature bit
is set, session->header.index[0] points to the meta event area, and the
remaining entries point to sample data.  The layout looks like this:

        +---------------------+
        |     file header     |
        |---------------------|
        |                     |
        |    meta events[0] <-+--+
        |                     |  |
        |---------------------|  |
        |                     |  |
        |    sample data[1] <-+--+
        |                     |  |
        |---------------------|  |
        |                     |  |
        |    sample data[2] <-+--+
        |                     |  |
        |---------------------|  |
        |         ...         | ...
        |---------------------|  |
        |     feature data    |  |
        |   (contains index) -+--+
        +---------------------+
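The offsets in the diagram follow from how record__merge_index() appends the per-mmap files. Entry 0 spans from the header's data_offset to the current end of the output file, and each later entry starts where the previous one ended. The sketch below covers only that arithmetic. build_index() and struct file_section are hypothetical names, and the file sizes are passed in rather than obtained with fstat().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct file_section {
	uint64_t offset;
	uint64_t size;
};

/* idx[0] covers the meta events already written to the output file
 * (data_offset .. meta_end). idx[1..nr_files] are the per-mmap files,
 * appended back to back after them. idx must hold nr_files + 1 entries. */
static void build_index(struct file_section *idx, uint64_t data_offset,
			uint64_t meta_end, const uint64_t *file_sizes,
			size_t nr_files)
{
	uint64_t offset = meta_end;
	size_t i;

	idx[0].offset = data_offset;
	idx[0].size   = meta_end - data_offset;

	for (i = 1; i <= nr_files; i++) {
		idx[i].offset = offset;
		idx[i].size   = file_sizes[i - 1];
		offset += file_sizes[i - 1];	/* empty files take no space */
	}
}
```

An empty file still gets an entry with size 0 at the current offset. This matches the patch, which records such entries but skips copying them.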

Link: http://lkml.kernel.org/n/tip-x7uwxr8o0p54xf1vfqiqxav7@git.kernel.org
Original-patch-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Documentation/perf-record.txt |   4 +
 tools/perf/builtin-record.c              | 134 ++++++++++++++++++++++-
 tools/perf/perf.h                        |   1 +
 tools/perf/util/mmap.h                   |  23 ++--
 4 files changed, 146 insertions(+), 16 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 246dee081efd..ae19f6424e5f 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -500,6 +500,10 @@ config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
 
 Implies --tail-synthesize.
 
+--index::
+Build an index table for sample data.  This speeds up perf report by
+letting it process the data in parallel.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9690c1f3e666..1b01cb4d06b8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -108,10 +108,15 @@ static bool switch_output_time(struct record *rec)
 	       trigger_is_ready(&switch_output_trigger);
 }
 
-static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
+static int record__write(struct record *rec, struct perf_mmap *map,
 			 void *bf, size_t size)
 {
-	struct perf_data_file *file = &rec->session->data->file;
+	struct perf_data_file *file;
+
+	if (rec->opts.index && map)
+		file = map->file;
+	else
+		file = &rec->session->data->file;
 
 	if (perf_data_file__write(file, bf, size) < 0) {
 		pr_err("failed to write perf data, error: %m\n");
@@ -132,6 +137,7 @@ static int process_synthesized_event(struct perf_tool *tool,
 				     struct machine *machine __maybe_unused)
 {
 	struct record *rec = container_of(tool, struct record, tool);
+
 	return record__write(rec, NULL, event, event->header.size);
 }
 
@@ -331,7 +337,7 @@ static int record__mmap_evlist(struct record *rec,
 
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages,
 				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode, false) < 0) {
+				 opts->auxtrace_snapshot_mode, opts->index) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -352,6 +358,31 @@ static int record__mmap_evlist(struct record *rec,
 	return 0;
 }
 
+static int record__mmap_index(struct record *rec)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_data *data = &rec->data;
+	int i, ret, nr = evlist->nr_mmaps;
+
+	ret = perf_data__create_index(data, nr);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < nr; i++) {
+		struct perf_mmap *map = &evlist->mmap[i];
+
+		map->file = &data->index[i];
+	}
+
+	for (i = 0; i < nr; i++) {
+		struct perf_mmap *map = &evlist->track_mmap[i];
+
+		map->file = &data->file;
+	}
+
+	return 0;
+}
+
 static int record__mmap(struct record *rec)
 {
 	return record__mmap_evlist(rec, rec->evlist);
@@ -604,7 +635,73 @@ static void record__init_features(struct record *rec)
 		perf_header__clear_feat(&session->header, HEADER_AUXTRACE);
 
 	perf_header__clear_feat(&session->header, HEADER_STAT);
-	perf_header__clear_feat(&session->header, HEADER_DATA_INDEX);
+
+	if (!rec->opts.index)
+		perf_header__clear_feat(&session->header, HEADER_DATA_INDEX);
+}
+
+static int record__merge_index(struct record *rec)
+{
+	struct perf_file_section *idx;
+	struct perf_data *data = &rec->data;
+	struct perf_session *session = rec->session;
+	int output_fd = perf_data__fd(data);
+	int i, nr_index, ret = -ENOMEM;
+	u64 offset;
+
+	/* +1 for header file itself */
+	nr_index = data->index_nr + 1;
+
+	idx = calloc(nr_index, sizeof(*idx));
+	if (idx == NULL)
+		goto out_close;
+
+	offset = lseek(output_fd, 0, SEEK_END);
+
+	idx[0].offset = session->header.data_offset;
+	idx[0].size   = offset - idx[0].offset;
+
+	for (i = 1; i < nr_index; i++) {
+		struct stat stbuf;
+		int fd = data->index[i - 1].fd;
+		char buf[20];
+
+		ret = fstat(fd, &stbuf);
+		if (ret < 0)
+			goto out_close;
+
+		idx[i].offset = offset;
+		idx[i].size   = stbuf.st_size;
+
+		offset += stbuf.st_size;
+
+		if (idx[i].size == 0)
+			continue;
+
+		unit_number__scnprintf(buf, sizeof(buf), idx[i].size);
+		pr_debug("storing index %d, size %s ...", i, buf);
+
+		ret = copyfile_offset(fd, 0, output_fd, idx[i].offset,
+				      idx[i].size);
+		if (ret < 0)
+			goto out_close;
+
+		pr_debug(" ok\n");
+	}
+
+	session->header.index = idx;
+	session->header.nr_index = nr_index;
+
+	perf_has_index = true;
+
+	ret = 0;
+
+out_close:
+	if (ret < 0)
+		pr_err("failed to merge index files: %d\n", ret);
+
+	perf_data__clean_index(data);
+	return ret;
 }
 
 static void
@@ -617,7 +714,11 @@ record__finish_output(struct record *rec)
 		return;
 
 	rec->session->header.data_size += rec->bytes_written;
-	data->size = lseek(perf_data__fd(data), 0, SEEK_CUR);
+
+	if (rec->opts.index)
+		record__merge_index(rec);
+
+	data->size = lseek(perf_data__fd(data), 0, SEEK_END);
 
 	if (!rec->no_buildid) {
 		process_buildids(rec);
@@ -929,11 +1030,22 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (data->is_pipe && rec->evlist->nr_entries == 1)
 		rec->opts.sample_id = true;
 
+	if (data->is_pipe && opts->index) {
+		pr_warning("Indexing is disabled for pipe output\n");
+		opts->index = false;
+	}
+
 	if (record__open(rec) != 0) {
 		err = -1;
 		goto out_child;
 	}
 
+	if (opts->index) {
+		err = record__mmap_index(rec);
+		if (err)
+			goto out_child;
+	}
+
 	err = bpf__apply_obj_config();
 	if (err) {
 		char errbuf[BUFSIZ];
@@ -1693,6 +1805,8 @@ static struct option __record_options[] = {
 			  "signal"),
 	OPT_BOOLEAN(0, "dry-run", &dry_run,
 		    "Parse options then exit"),
+	OPT_BOOLEAN(0, "index", &record.opts.index,
+		    "make index for sample data to speed-up processing"),
 	OPT_END()
 };
 
@@ -1841,6 +1955,16 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+	if (rec->opts.index) {
+		if (!rec->opts.sample_time) {
+			pr_err("Sample timestamp is required for indexing\n");
+			goto out;
+		}
+
+		perf_evlist__add_dummy_tracking(rec->evlist);
+	}
+
+
 	if (rec->opts.target.tid && !rec->opts.no_inherit_set)
 		rec->opts.no_inherit = true;
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index fba61cc5291f..4fd26c05a2d8 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -68,6 +68,7 @@ struct record_opts {
 	bool	     ignore_missing_thread;
 	bool	     strict_freq;
 	bool	     sample_id;
+	bool	     index;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 9d2672d8f131..bad05b12b9df 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -15,17 +15,18 @@
  * @refcnt - e.g. code using PERF_EVENT_IOC_SET_OUTPUT to share this
  */
 struct perf_mmap {
-	void		 *base;
-	int		 mask;
-	int		 fd;
-	int		 cpu;
-	refcount_t	 refcnt;
-	u64		 prev;
-	u64		 start;
-	u64		 end;
-	bool		 overwrite;
-	struct auxtrace_mmap auxtrace_mmap;
-	char		 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+	void			 *base;
+	int			 mask;
+	int			 fd;
+	int			 cpu;
+	refcount_t		 refcnt;
+	u64			 prev;
+	u64			 start;
+	u64			 end;
+	bool			 overwrite;
+	struct auxtrace_mmap	 auxtrace_mmap;
+	struct perf_data_file	*file;
+	char			 event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
 };
 
 /*
-- 
2.17.1
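The index layout that record__merge_index() builds can be sketched as follows. This is a minimal, self-contained illustration, not perf code. `struct index_entry` and `build_index()` are hypothetical names, and the per-thread file sizes are passed in directly instead of being read with fstat(). It only shows how offsets and sizes are derived. The data copy done by copyfile_offset() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of the index table. */
struct index_entry {
	uint64_t offset;
	uint64_t size;
};

/*
 * Fill idx[0..nr_files] following the layout used by record__merge_index():
 * idx[0] covers the main data section, then each per-thread file is
 * appended back to back.  A zero-sized file still gets an entry (size 0,
 * offset = current end), it just has nothing to copy.
 * Returns the end-of-file offset after all files are appended.
 */
static uint64_t build_index(struct index_entry *idx, uint64_t data_offset,
			    uint64_t main_end, const uint64_t *file_sizes,
			    size_t nr_files)
{
	uint64_t offset = main_end;
	size_t i;

	/* The main data section cannot end before it starts. */
	assert(main_end >= data_offset);

	idx[0].offset = data_offset;
	idx[0].size   = main_end - data_offset;

	for (i = 0; i < nr_files; i++) {
		idx[i + 1].offset = offset;
		idx[i + 1].size   = file_sizes[i];
		offset += file_sizes[i];
	}
	return offset;
}
```

With a data section from offset 100 to 1100 and per-thread files of 200, 0 and 300 bytes, the entries come out as {100, 1000}, {1100, 200}, {1300, 0} and {1300, 300}, and the merged file ends at 1600.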



* [PATCH 14/48] perf tools: Introduce thread__comm(_str)_by_time() helpers
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (12 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 13/48] perf record: Add --index option for building index table Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 15/48] perf tools: Add a test case for thread comm handling Jiri Olsa
                   ` (35 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

When data file indexing is enabled, perf processes all task, comm and
mmap events first and only then the sample events.  As a result, a
sample only sees the last comm of its thread, even though the comm that
was valid at the sample time has been recorded.

Keep a thread's comms sorted by time so that the appropriate comm can
be found for a given sample time.  thread__comm_by_time() still mostly
works when the PERF_SAMPLE_TIME bit is off, since sample->time is then
-1 (the largest u64 value) and the most recent comm is returned anyway.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-g0q0o4prp0tj1zkb86in0dqu@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/thread.c | 33 ++++++++++++++++++++++++++++++++-
 tools/perf/util/thread.h |  2 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 2048d393ece6..a61683157760 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -192,6 +192,21 @@ struct comm *thread__exec_comm(const struct thread *thread)
 	return last;
 }
 
+struct comm *thread__comm_by_time(const struct thread *thread, u64 timestamp)
+{
+	struct comm *comm;
+
+	list_for_each_entry(comm, &thread->comm_list, list) {
+		if (timestamp >= comm->start)
+			return comm;
+	}
+
+	if (list_empty(&thread->comm_list))
+		return NULL;
+
+	return list_last_entry(&thread->comm_list, struct comm, list);
+}
+
 static int ____thread__set_comm(struct thread *thread, const char *str,
 				u64 timestamp, bool exec)
 {
@@ -206,7 +221,13 @@ static int ____thread__set_comm(struct thread *thread, const char *str,
 		new = comm__new(str, timestamp, exec);
 		if (!new)
 			return -ENOMEM;
-		list_add(&new->list, &thread->comm_list);
+
+		/* sort by time */
+		list_for_each_entry(curr, &thread->comm_list, list) {
+			if (timestamp >= curr->start)
+				break;
+		}
+		list_add_tail(&new->list, &curr->list);
 
 		if (exec)
 			unwind__flush_access(thread);
@@ -266,6 +287,16 @@ const char *thread__comm_str(const struct thread *thread)
 	return str;
 }
 
+const char *thread__comm_str_by_time(const struct thread *thread, u64 timestamp)
+{
+	const struct comm *comm = thread__comm_by_time(thread, timestamp);
+
+	if (!comm)
+		return NULL;
+
+	return comm__str(comm);
+}
+
 /* CHECKME: it should probably better return the max comm len from its comm list */
 int thread__comm_len(struct thread *thread)
 {
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 07606aa6998d..64eaa68bb112 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -85,8 +85,10 @@ int thread__set_comm_from_proc(struct thread *thread);
 int thread__comm_len(struct thread *thread);
 struct comm *thread__comm(const struct thread *thread);
 struct comm *thread__exec_comm(const struct thread *thread);
+struct comm *thread__comm_by_time(const struct thread *thread, u64 timestamp);
 const char *thread__comm_str(const struct thread *thread);
 int thread__insert_map(struct thread *thread, struct map *map);
+const char *thread__comm_str_by_time(const struct thread *thread, u64 timestamp);
 int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp);
 size_t thread__fprintf(struct thread *thread, FILE *fp);
 
-- 
2.17.1
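The ordering kept by ____thread__set_comm() and the lookup in thread__comm_by_time() can be illustrated with a plain singly linked list. This is a sketch under simplifying assumptions: `struct comm_entry`, `comm_insert()`, `comm_str_by_time()` and `comm_free()` are hypothetical names, there is no locking, and the override of the initial ":tid" comm is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for perf's struct comm. */
struct comm_entry {
	const char *str;
	uint64_t start;
	struct comm_entry *next;	/* next (older) entry */
};

/*
 * Keep the list sorted newest-first: the new entry goes in front of the
 * first entry whose start time is <= the new one, like the walk followed
 * by list_add_tail(&new->list, &curr->list) in ____thread__set_comm().
 */
static int comm_insert(struct comm_entry **head, const char *str, uint64_t start)
{
	struct comm_entry *new = malloc(sizeof(*new));
	struct comm_entry **pp = head;

	if (!new)
		return -1;
	new->str = str;
	new->start = start;

	while (*pp && start < (*pp)->start)
		pp = &(*pp)->next;
	new->next = *pp;
	*pp = new;
	return 0;
}

/*
 * Return the newest comm that started at or before @timestamp, falling
 * back to the oldest one.  (uint64_t)-1 always matches the newest entry.
 */
static const char *comm_str_by_time(const struct comm_entry *head, uint64_t timestamp)
{
	const struct comm_entry *c, *last = NULL;

	for (c = head; c; c = c->next) {
		if (timestamp >= c->start)
			return c->str;
		last = c;
	}
	return last ? last->str : NULL;
}

static void comm_free(struct comm_entry *head)
{
	while (head) {
		struct comm_entry *next = head->next;

		free(head);
		head = next;
	}
}
```

Inserting comms at 10000, 20000, 30000, 40000 and then 15000 (the sequence used by the test in patch 15) yields the order 40000, 30000, 20000, 15000, 10000. A lookup at 35000 returns "perf-test3", and a lookup at 5000 falls back to the oldest entry, "perf-test1".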



* [PATCH 15/48] perf tools: Add a test case for thread comm handling
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (13 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 14/48] perf tools: Introduce thread__comm(_str)_by_time() helpers Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 16/48] perf tools: Use thread__comm_by_time() when adding hist entries Jiri Olsa
                   ` (34 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

The new test case checks the thread comm handling APIs, such as
overriding the initial comm and sorting comms by time.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-b31f0pktgl0k2dba3lqp159w@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/tests/Build          |  1 +
 tools/perf/tests/builtin-test.c |  4 +++
 tools/perf/tests/tests.h        |  1 +
 tools/perf/tests/thread-comm.c  | 48 +++++++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+)
 create mode 100644 tools/perf/tests/thread-comm.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 6c108fa79ae3..713fc29871e2 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -24,6 +24,7 @@ perf-y += bp_account.o
 perf-y += task-exit.o
 perf-y += sw-clock.o
 perf-y += mmap-thread-lookup.o
+perf-y += thread-comm.o
 perf-y += thread-mg-share.o
 perf-y += switch-tracking.o
 perf-y += keep-tracking.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index d7a5e1b9aa6f..982e5f64df62 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -279,6 +279,10 @@ static struct test generic_tests[] = {
 		.desc = "mem2node",
 		.func = test__mem2node,
 	},
+	{
+		.desc = "Test thread comm handling",
+		.func = test__thread_comm,
+	},
 	{
 		.func = NULL,
 	},
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index a9760e790563..5d16a56f262f 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -104,6 +104,7 @@ const char *test__clang_subtest_get_desc(int subtest);
 int test__clang_subtest_get_nr(void);
 int test__unit_number__scnprint(struct test *test, int subtest);
 int test__mem2node(struct test *t, int subtest);
+int test__thread_comm(struct test *test, int subtest);
 
 bool test__bp_signal_is_supported(void);
 
diff --git a/tools/perf/tests/thread-comm.c b/tools/perf/tests/thread-comm.c
new file mode 100644
index 000000000000..9fcfd2c43488
--- /dev/null
+++ b/tools/perf/tests/thread-comm.c
@@ -0,0 +1,48 @@
+#include <linux/compiler.h>
+#include "tests.h"
+#include "machine.h"
+#include "thread.h"
+#include "debug.h"
+
+int test__thread_comm(struct test *test __maybe_unused, int subtest __maybe_unused)
+{
+	struct machines machines;
+	struct machine *machine;
+	struct thread *t;
+
+	/*
+	 * This test is to check whether it can retrieve a correct
+	 * comm for a given time.  When multi-file data storage is
+	 * enabled, those task/comm events are processed first so the
+	 * later sample should find a matching comm properly.
+	 */
+	machines__init(&machines);
+	machine = &machines.host;
+
+	t = machine__findnew_thread(machine, 100, 100);
+	TEST_ASSERT_VAL("wrong init thread comm",
+			!strcmp(thread__comm_str(t), ":100"));
+
+	thread__set_comm(t, "perf-test1", 10000);
+	TEST_ASSERT_VAL("failed to override thread comm",
+			!strcmp(thread__comm_str(t), "perf-test1"));
+
+	thread__set_comm(t, "perf-test2", 20000);
+	thread__set_comm(t, "perf-test3", 30000);
+	thread__set_comm(t, "perf-test4", 40000);
+
+	TEST_ASSERT_VAL("failed to find timed comm",
+			!strcmp(thread__comm_str_by_time(t, 20000), "perf-test2"));
+	TEST_ASSERT_VAL("failed to find timed comm",
+			!strcmp(thread__comm_str_by_time(t, 35000), "perf-test3"));
+	TEST_ASSERT_VAL("failed to find timed comm",
+			!strcmp(thread__comm_str_by_time(t, 50000), "perf-test4"));
+
+	thread__set_comm(t, "perf-test1.5", 15000);
+	TEST_ASSERT_VAL("failed to sort timed comm",
+			!strcmp(thread__comm_str_by_time(t, 15000), "perf-test1.5"));
+
+	machine__delete_threads(machine);
+	machines__exit(&machines);
+	return 0;
+}
-- 
2.17.1



* [PATCH 16/48] perf tools: Use thread__comm_by_time() when adding hist entries
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (14 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 15/48] perf tools: Add a test case for thread comm handling Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 17/48] perf tools: Convert dead thread list into rbtree Jiri Olsa
                   ` (33 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

Now that a thread's comm can be looked up by time, use it to find the
comm that was valid at the sample time when adding hist entries.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-lwjol849j2jy4hmwwjxu99r7@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/hist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 828cb9794c76..7c47e5749809 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -592,7 +592,7 @@ __hists__add_entry(struct hists *hists,
 	struct namespaces *ns = thread__namespaces(al->thread);
 	struct hist_entry entry = {
 		.thread	= al->thread,
-		.comm = thread__comm(al->thread),
+		.comm = thread__comm_by_time(al->thread, sample->time),
 		.cgroup_id = {
 			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
 			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
@@ -952,7 +952,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 		.hists = evsel__hists(evsel),
 		.cpu = al->cpu,
 		.thread = al->thread,
-		.comm = thread__comm(al->thread),
+		.comm = thread__comm_by_time(al->thread, sample->time),
 		.ip = al->addr,
 		.ms = {
 			.map = al->map,
-- 
2.17.1



* [PATCH 17/48] perf tools: Convert dead thread list into rbtree
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (15 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 16/48] perf tools: Use thread__comm_by_time() when adding hist entries Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 18/48] perf tools: Introduce machine__find*_thread_by_time() Jiri Olsa
                   ` (32 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

Currently perf keeps dead threads in a linked list, which becomes a
problem when it has to be searched, especially in a large session with
many dead threads.  Convert it to an rbtree, like the one used for live
threads; it will be used later by the multi-thread changes.

The list node is now used to chain dead threads with the same tid, since
that makes it easier to handle such threads in time order.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-86w2k0bjyi98p0kvyb6frcu5@git.kernel.org
Original-patch-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/intel-pt.c |  2 +-
 tools/perf/util/machine.c  | 82 ++++++++++++++++++++++++++++++++++----
 tools/perf/util/machine.h  | 10 ++---
 tools/perf/util/thread.c   | 12 +++++-
 tools/perf/util/thread.h   |  6 +--
 5 files changed, 93 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index aec68908d604..8c3537f980ed 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2517,7 +2517,7 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	 * current thread lifetime assuption is kept and we don't segfault
 	 * at list_del_init().
 	 */
-	INIT_LIST_HEAD(&pt->unknown_thread->node);
+	INIT_LIST_HEAD(&pt->unknown_thread->tid_list);
 
 	err = thread__set_comm(pt->unknown_thread, "unknown", 0);
 	if (err)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c4acd2001db0..c36c27429866 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -41,10 +41,11 @@ static void machine__threads_init(struct machine *machine)
 
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
+
 		threads->entries = RB_ROOT;
+		threads->dead    = RB_ROOT;
 		init_rwsem(&threads->lock);
 		threads->nr = 0;
-		INIT_LIST_HEAD(&threads->dead);
 		threads->last_match = NULL;
 	}
 }
@@ -171,6 +172,28 @@ static void dsos__exit(struct dsos *dsos)
 	exit_rwsem(&dsos->lock);
 }
 
+static void threads__delete_dead(struct threads *threads)
+{
+	struct rb_node *nd = rb_first(&threads->dead);
+
+	while (nd) {
+		struct thread *t = rb_entry(nd, struct thread, rb_node);
+		struct thread *pos;
+
+		nd = rb_next(nd);
+		rb_erase_init(&t->rb_node, &threads->dead);
+
+		while (!list_empty(&t->tid_list)) {
+			pos = list_first_entry(&t->tid_list,
+					       struct thread, tid_list);
+			list_del_init(&pos->tid_list);
+			thread__delete(pos);
+		}
+
+		thread__delete(t);
+	}
+}
+
 void machine__delete_threads(struct machine *machine)
 {
 	struct rb_node *nd;
@@ -178,6 +201,7 @@ void machine__delete_threads(struct machine *machine)
 
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
+
 		down_write(&threads->lock);
 		nd = rb_first(&threads->entries);
 		while (nd) {
@@ -186,6 +210,9 @@ void machine__delete_threads(struct machine *machine)
 			nd = rb_next(nd);
 			__machine__remove_thread(machine, t, false);
 		}
+
+		threads__delete_dead(threads);
+
 		up_write(&threads->lock);
 	}
 }
@@ -1673,6 +1700,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock)
 {
 	struct threads *threads = machine__threads(machine, th->tid);
+	struct rb_node **p, *parent;
+	struct thread *pos;
 
 	if (threads->last_match == th)
 		threads__set_last_match(threads, NULL);
@@ -1684,14 +1713,44 @@ static void __machine__remove_thread(struct machine *machine, struct thread *th,
 	RB_CLEAR_NODE(&th->rb_node);
 	--threads->nr;
 	/*
-	 * Move it first to the dead_threads list, then drop the reference,
-	 * if this is the last reference, then the thread__delete destructor
-	 * will be called and we will remove it from the dead_threads list.
+	 * No need to have an additional reference for non-index file
+	 * as they can be released when reference holders died and
+	 * there will be no more new references.
+	 */
+	if (!perf_has_index) {
+		thread__put(th);
+		goto out;
+	}
+
+	p = &threads->dead.rb_node;
+	parent = NULL;
+
+	/*
+	 * For indexed files, we may have references to this (dead)
+	 * thread, as samples are processed after fork/exit events.
+	 * Just move them to a separate rbtree and keep a reference.
 	 */
-	list_add_tail(&th->node, &threads->dead);
+	while (*p != NULL) {
+		parent = *p;
+		pos = rb_entry(parent, struct thread, rb_node);
+
+		if (pos->tid == th->tid) {
+			list_add_tail(&th->tid_list, &pos->tid_list);
+			goto out;
+		}
+
+		if (th->tid < pos->tid)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	rb_link_node(&th->rb_node, parent, p);
+	rb_insert_color(&th->rb_node, &threads->dead);
+
+out:
 	if (lock)
 		up_write(&threads->lock);
-	thread__put(th);
 }
 
 void machine__remove_thread(struct machine *machine, struct thread *th)
@@ -2395,7 +2454,7 @@ int machine__for_each_thread(struct machine *machine,
 {
 	struct threads *threads;
 	struct rb_node *nd;
-	struct thread *thread;
+	struct thread *thread, *pos;
 	int rc = 0;
 	int i;
 
@@ -2408,10 +2467,17 @@ int machine__for_each_thread(struct machine *machine,
 				return rc;
 		}
 
-		list_for_each_entry(thread, &threads->dead, node) {
+		for (nd = rb_first(&threads->dead); nd; nd = rb_next(nd)) {
+			thread = rb_entry(nd, struct thread, rb_node);
 			rc = fn(thread, priv);
 			if (rc != 0)
 				return rc;
+
+			list_for_each_entry(pos, &thread->tid_list, tid_list) {
+				rc = fn(pos, priv);
+				if (rc != 0)
+					return rc;
+			}
 		}
 	}
 	return rc;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d856b85862e2..d91a3567d2cd 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -29,11 +29,11 @@ struct vdso_info;
 #define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)
 
 struct threads {
-	struct rb_root	  entries;
-	struct rw_semaphore lock;
-	unsigned int	  nr;
-	struct list_head  dead;
-	struct thread	  *last_match;
+	struct rb_root		 entries;
+	struct rb_root		 dead;
+	struct rw_semaphore	 lock;
+	unsigned int		 nr;
+	struct thread		*last_match;
 };
 
 struct machine {
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index a61683157760..47c03001d578 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -61,6 +61,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 
 		list_add(&comm->list, &thread->comm_list);
 		refcount_set(&thread->refcnt, 1);
+		INIT_LIST_HEAD(&thread->tid_list);
 		RB_CLEAR_NODE(&thread->rb_node);
 		/* Thread holds first ref to nsdata. */
 		thread->nsinfo = nsinfo__new(pid);
@@ -79,6 +80,7 @@ void thread__delete(struct thread *thread)
 	struct comm *comm, *tmp_comm;
 
 	BUG_ON(!RB_EMPTY_NODE(&thread->rb_node));
+	BUG_ON(!list_empty(&thread->tid_list));
 
 	thread_stack__free(thread);
 
@@ -123,7 +125,15 @@ void thread__put(struct thread *thread)
 		 * Remove it from the dead_threads list, as last reference
 		 * is gone.
 		 */
-		list_del_init(&thread->node);
+		if (!RB_EMPTY_NODE(&thread->rb_node)) {
+			struct machine *machine = thread->mg->machine;
+			struct threads *threads = machine__threads(machine, thread->tid);
+
+			rb_erase(&thread->rb_node, &threads->dead);
+			RB_CLEAR_NODE(&thread->rb_node);
+		}
+
+		list_del_init(&thread->tid_list);
 		thread__delete(thread);
 	}
 }
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 64eaa68bb112..d573f3715fec 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -16,10 +16,8 @@ struct thread_stack;
 struct unwind_libunwind_ops;
 
 struct thread {
-	union {
-		struct rb_node	 rb_node;
-		struct list_head node;
-	};
+	struct rb_node		rb_node;
+	struct list_head	tid_list;
 	struct map_groups	*mg;
 	pid_t			pid_; /* Not all tools update this */
 	pid_t			tid;
-- 
2.17.1
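The dead-thread layout (one tree node per tid, with later dead threads of the same tid chained off that node) can be sketched as below. Assumptions: `struct dead_thread` and the `dead_*()` helpers are hypothetical, the tree is an unbalanced binary search tree rather than perf's rbtree, and reference counting is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified thread record; not perf's struct thread. */
struct dead_thread {
	pid_t tid;
	struct dead_thread *left, *right;	/* tree links, keyed by tid */
	struct dead_thread *same_tid;		/* later dead threads with this tid */
};

/*
 * Insert @th into the tree rooted at *root.  If a node with the same tid
 * already exists, @th is appended to that node's chain instead of getting
 * its own tree node, as __machine__remove_thread() does with tid_list.
 */
static void dead_insert(struct dead_thread **root, struct dead_thread *th)
{
	struct dead_thread **p = root;

	th->left = th->right = th->same_tid = NULL;
	while (*p) {
		if ((*p)->tid == th->tid) {
			struct dead_thread *tail = *p;

			while (tail->same_tid)
				tail = tail->same_tid;
			tail->same_tid = th;
			return;
		}
		p = th->tid < (*p)->tid ? &(*p)->left : &(*p)->right;
	}
	*p = th;
}

/* Lookup walks a single root-to-leaf path instead of the whole list. */
static struct dead_thread *dead_find(struct dead_thread *root, pid_t tid)
{
	while (root && root->tid != tid)
		root = tid < root->tid ? root->left : root->right;
	return root;
}

/* Count every dead thread: each tree node plus its same-tid chain. */
static int dead_count(const struct dead_thread *node)
{
	const struct dead_thread *c;
	int n = 0;

	if (!node)
		return 0;
	for (c = node; c; c = c->same_tid)
		n++;
	return n + dead_count(node->left) + dead_count(node->right);
}
```

Inserting tids 5, 3, 5 and 8 creates three tree nodes. The second tid 5 ends up on the chain of the first, and a full walk still visits all four threads, which is what machine__for_each_thread() does with the rb_node/tid_list pair.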



* [PATCH 18/48] perf tools: Introduce machine__find*_thread_by_time()
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (16 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 17/48] perf tools: Convert dead thread list into rbtree Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 19/48] perf tools: Add thread::exited flag Jiri Olsa
                   ` (31 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

When data file indexing is enabled, threads have to be looked up by
sample time, since samples are processed after all the other (task,
comm and mmap) events.  This is a problem if a session is very long and
a pid gets recycled: in that case only the last thread with that pid is
seen.

So keep each thread's start time, and look threads up based on time.
This patch introduces the machine__find{,new}_thread_by_time() functions
for this.  They first search the current (i.e. recent) thread rbtree and
then the dead thread tree (and its tid lists).  If no thread is found,
the findnew variant creates a new (missing) thread.

A sample timestamp of 0 means the lookup comes from a synthesized
event, so only the current rbtree is used.  The timestamp is -1 if the
sample didn't record one, so current threads are found automatically.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-fxl42zknqoke9d9jix6fvu8w@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/tests/dwarf-unwind.c |   4 +-
 tools/perf/tests/hists_common.c |   2 +-
 tools/perf/tests/hists_link.c   |   2 +-
 tools/perf/util/event.c         |   6 +-
 tools/perf/util/machine.c       | 128 +++++++++++++++++++++++++++++++-
 tools/perf/util/machine.h       |  10 ++-
 tools/perf/util/thread.c        |   5 ++
 tools/perf/util/thread.h        |   1 +
 8 files changed, 148 insertions(+), 10 deletions(-)

diff --git a/tools/perf/tests/dwarf-unwind.c b/tools/perf/tests/dwarf-unwind.c
index 2f008067d989..e55a45c4da5b 100644
--- a/tools/perf/tests/dwarf-unwind.c
+++ b/tools/perf/tests/dwarf-unwind.c
@@ -92,12 +92,10 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 
 noinline int test_dwarf_unwind__thread(struct thread *thread)
 {
-	struct perf_sample sample;
+	struct perf_sample sample = { .time = -1ULL, };
 	unsigned long cnt = 0;
 	int err = -1;
 
-	memset(&sample, 0, sizeof(sample));
-
 	if (test__arch_unwind_sample(&sample, thread)) {
 		pr_debug("failed to get unwind sample\n");
 		goto out;
diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index b889a28fd80b..7499ac340883 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -104,7 +104,7 @@ struct machine *setup_fake_machine(struct machines *machines)
 
 	for (i = 0; i < ARRAY_SIZE(fake_mmap_info); i++) {
 		struct perf_sample sample = {
-			.cpumode = PERF_RECORD_MISC_USER,
+			.cpumode = PERF_RECORD_MISC_USER, .time = -1ULL,
 		};
 		union perf_event fake_mmap_event = {
 			.mmap = {
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 9a9d06cb0222..3e07928da53c 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -67,7 +67,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 	struct perf_evsel *evsel;
 	struct addr_location al;
 	struct hist_entry *he;
-	struct perf_sample sample = { .period = 1, .weight = 1, };
+	struct perf_sample sample = { .period = 1, .weight = 1, .time = -1ULL, };
 	size_t i = 0, k;
 
 	/*
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 0cd42150f712..8a19f751d095 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -21,6 +21,7 @@
 #include "thread.h"
 #include "thread_map.h"
 #include "sane_ctype.h"
+#include "session.h"
 #include "symbol/kallsyms.h"
 #include "asm/bug.h"
 #include "stat.h"
@@ -1608,9 +1609,10 @@ struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
 int machine__resolve(struct machine *machine, struct addr_location *al,
 		     struct perf_sample *sample)
 {
-	struct thread *thread = machine__findnew_thread(machine, sample->pid,
-							sample->tid);
+	struct thread *thread;
 
+	thread = machine__findnew_thread_by_time(machine, sample->pid,
+						 sample->tid, sample->time);
 	if (thread == NULL)
 		return -1;
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c36c27429866..999f200f24e7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -578,6 +578,122 @@ struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 	return th;
 }
 
+static struct thread *
+__machine__findnew_thread_by_time(struct machine *machine, struct threads *threads,
+				  pid_t pid, pid_t tid, u64 timestamp, bool create)
+{
+	struct thread *curr, *pos, *new;
+	struct thread *th = NULL;
+	struct rb_node **p;
+	struct rb_node *parent = NULL;
+
+	if (!perf_has_index)
+		return ____machine__findnew_thread(machine, threads, pid, tid, create);
+
+	/* lookup current thread first */
+	curr = ____machine__findnew_thread(machine, threads, pid, tid, false);
+	if (curr && timestamp >= curr->start_time)
+		return curr;
+
+	/* and then check dead threads tree & list */
+	p = &threads->dead.rb_node;
+	while (*p != NULL) {
+		parent = *p;
+		th = rb_entry(parent, struct thread, rb_node);
+
+		if (th->tid == tid) {
+			list_for_each_entry(pos, &th->tid_list, tid_list) {
+				if (timestamp >= pos->start_time &&
+				    pos->start_time > th->start_time) {
+					th = pos;
+					break;
+				}
+			}
+
+			if (timestamp >= th->start_time) {
+				machine__update_thread_pid(machine, th, pid);
+				return th;
+			}
+			break;
+		}
+
+		if (tid < th->tid)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	if (!create)
+		return NULL;
+
+	if (!curr && !*p) {
+		/* found no thread.  create one as current thread */
+		return __machine__findnew_thread(machine, pid, tid);
+	}
+
+	new = thread__new(pid, tid);
+	if (new == NULL)
+		return NULL;
+
+	new->dead = true;
+	new->start_time = timestamp;
+
+	if (*p) {
+		list_for_each_entry(pos, &th->tid_list, tid_list) {
+			/* sort by time */
+			if (timestamp >= pos->start_time) {
+				th = pos;
+				break;
+			}
+		}
+		list_add_tail(&new->tid_list, &th->tid_list);
+	} else {
+		rb_link_node(&new->rb_node, parent, p);
+		rb_insert_color(&new->rb_node, &threads->dead);
+	}
+
+	thread__get(new);
+
+	/*
+	 * We have to initialize map_groups separately
+	 * after rb tree is updated.
+	 *
+	 * The reason is that we call machine__findnew_thread
+	 * within thread__init_map_groups to find the thread
+	 * leader and that would screw up the rb tree.
+	 */
+	if (thread__init_map_groups(new, machine))
+		thread__zput(new);
+
+	return new;
+}
+
+struct thread *machine__find_thread_by_time(struct machine *machine, pid_t pid,
+					    pid_t tid, u64 timestamp)
+{
+	struct threads *threads = machine__threads(machine, tid);
+	struct thread *th;
+
+	down_write(&threads->lock);
+	th = thread__get(__machine__findnew_thread_by_time(machine, threads, pid, tid,
+							   timestamp, false));
+	up_write(&threads->lock);
+	return th;
+}
+
+struct thread *machine__findnew_thread_by_time(struct machine *machine, pid_t pid,
+					       pid_t tid, u64 timestamp)
+{
+	struct threads *threads = machine__threads(machine, tid);
+	struct thread *th;
+
+	down_write(&threads->lock);
+	th = thread__get(__machine__findnew_thread_by_time(machine, threads, pid, tid,
+							   timestamp, true));
+	up_write(&threads->lock);
+	return th;
+}
+
 struct comm *machine__thread_exec_comm(struct machine *machine,
 				       struct thread *thread)
 {
@@ -1611,7 +1727,7 @@ int machine__process_mmap2_event(struct machine *machine,
 	}
 
 	thread = machine__findnew_thread(machine, event->mmap2.pid,
-					event->mmap2.tid);
+					 event->mmap2.tid);
 	if (thread == NULL)
 		goto out_problem;
 
@@ -1735,6 +1851,16 @@ static void __machine__remove_thread(struct machine *machine, struct thread *th,
 		pos = rb_entry(parent, struct thread, rb_node);
 
 		if (pos->tid == th->tid) {
+			struct thread *old;
+
+			/* sort by time */
+			list_for_each_entry(old, &pos->tid_list, tid_list) {
+				if (th->start_time >= old->start_time) {
+					pos = old;
+					break;
+				}
+			}
+
 			list_add_tail(&th->tid_list, &pos->tid_list);
 			goto out;
 		}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d91a3567d2cd..9aed55d9facc 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -99,8 +99,6 @@ static inline bool machine__kernel_ip(struct machine *machine, u64 ip)
 	return ip >= kernel_start;
 }
 
-struct thread *machine__find_thread(struct machine *machine, pid_t pid,
-				    pid_t tid);
 struct comm *machine__thread_exec_comm(struct machine *machine,
 				       struct thread *thread);
 
@@ -194,6 +192,14 @@ int machine__nr_cpus_avail(struct machine *machine);
 
 struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
+struct thread *machine__find_thread(struct machine *machine, pid_t pid,
+				    pid_t tid);
+struct thread *machine__findnew_thread_by_time(struct machine *machine,
+					       pid_t pid, pid_t tid,
+					       u64 timestamp);
+struct thread *machine__find_thread_by_time(struct machine *machine,
+					    pid_t pid, pid_t tid,
+					    u64 timestamp);
 
 struct dso *machine__findnew_dso(struct machine *machine, const char *filename);
 
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 47c03001d578..109fa3bc23c4 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -225,6 +225,10 @@ static int ____thread__set_comm(struct thread *thread, const char *str,
 	/* Override the default :tid entry */
 	if (!thread->comm_set) {
 		int err = comm__override(curr, str, timestamp, exec);
+
+		if (!thread->start_time)
+			thread->start_time = timestamp;
+
 		if (err)
 			return err;
 	} else {
@@ -403,6 +407,7 @@ int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp)
 	}
 
 	thread->ppid = parent->tid;
+	thread->start_time = timestamp;
 	return thread__clone_map_groups(thread, parent);
 }
 
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index d573f3715fec..e8f779e83347 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -32,6 +32,7 @@ struct thread {
 	struct list_head	comm_list;
 	struct rw_semaphore	comm_lock;
 	u64			db_id;
+	u64			start_time;
 
 	void			*priv;
 	struct thread_stack	*ts;
-- 
2.17.1


* [PATCH 19/48] perf tools: Add thread::exited flag
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (17 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 18/48] perf tools: Introduce machine__find*_thread_by_time() Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 20/48] perf tools: Add a test case for timed thread handling Jiri Olsa
                   ` (30 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add a thread::exited flag to indicate that the thread has exited,
and keep the thread::dead flag to indicate that the thread is on
the dead list.
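
The split between the two flags can be illustrated with a minimal sketch. It assumes a hypothetical, trimmed-down struct sketch_thread that models only the two flags. The state precedence follows thread__print_cb() in the thread-lookup-time test. Exiting and being moved to the dead tree are independent events, so a thread can be "exited" while still in the live tree.

```c
#include <assert.h>	/* for the usage checks */
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct thread: only the two flags from
 * this patch are modelled. */
struct sketch_thread {
	int  tid;
	bool exited;	/* the task has exited */
	bool dead;	/* the thread has been moved to the dead tree */
};

/* Mirrors thread__exited() after this patch: only records the exit. */
static void sketch_thread_exited(struct sketch_thread *t)
{
	t->exited = true;
}

/* Mirrors the flag update added to __machine__remove_thread(). */
static void sketch_remove_thread(struct sketch_thread *t)
{
	t->dead = true;
}

/* Same precedence as thread__print_cb(): dead wins over exited. */
static const char *sketch_thread_state(const struct sketch_thread *t)
{
	return t->dead ? "(dead)" : t->exited ? "(exited)" : "";
}
```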

Link: http://lkml.kernel.org/n/tip-iak1784r9z2vhcnw6getmbdk@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/machine.c | 3 +++
 tools/perf/util/thread.h  | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 999f200f24e7..5ae2baba27ca 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1828,6 +1828,9 @@ static void __machine__remove_thread(struct machine *machine, struct thread *th,
 	rb_erase_init(&th->rb_node, &threads->entries);
 	RB_CLEAR_NODE(&th->rb_node);
 	--threads->nr;
+
+	th->dead = true;
+
 	/*
 	 * No need to have an additional reference for non-index file
 	 * as they can be released when reference holders died and
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index e8f779e83347..8a1114c2f43a 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -26,7 +26,8 @@ struct thread {
 	refcount_t		refcnt;
 	bool			comm_set;
 	int			comm_len;
-	bool			dead; /* if set thread has exited */
+	bool			exited;	/* if set thread has exited */
+	bool			dead;	/* if set thread is in dead tree */
 	struct list_head	namespaces_list;
 	struct rw_semaphore	namespaces_lock;
 	struct list_head	comm_list;
@@ -64,7 +65,7 @@ static inline void __thread__zput(struct thread **thread)
 
 static inline void thread__exited(struct thread *thread)
 {
-	thread->dead = true;
+	thread->exited = true;
 }
 
 struct namespaces *thread__namespaces(const struct thread *thread);
-- 
2.17.1


* [PATCH 20/48] perf tools: Add a test case for timed thread handling
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (18 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 19/48] perf tools: Add thread::exited flag Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 21/48] perf tools: Maintain map groups list in a leader thread Jiri Olsa
                   ` (29 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

Add a test case that verifies live and dead thread tree management
across time changes, as well as the new
machine__find{,new}_thread_by_time() functions.
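
The rule the test exercises can be sketched as follows, under stated assumptions. The function picks the newest incarnation of a tid whose start_time is not after the requested timestamp. The sketch assumes the incarnations sit in a hypothetical array sorted newest first, instead of perf's live and dead rbtrees. It also returns NULL where machine__findnew_thread_by_time() would create a new thread. A timestamp of -1ULL (no time information) selects the newest incarnation.

```c
#include <assert.h>	/* for the usage checks */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one entry per incarnation of the same pid/tid.
 * Index 0 is the live thread; the rest stand for dead-tree entries. */
struct sketch_thread {
	int      tid;
	uint64_t start_time;
};

/*
 * Return the incarnation alive at @timestamp: the newest one whose
 * start_time is <= @timestamp, or NULL if @timestamp predates all of
 * them. UINT64_MAX ("no time") always selects the newest entry.
 */
static struct sketch_thread *
sketch_find_thread_by_time(struct sketch_thread *incarnations, size_t nr,
			   uint64_t timestamp)
{
	for (size_t i = 0; i < nr; i++) {
		if (incarnations[i].start_time <= timestamp)
			return &incarnations[i];
	}
	return NULL;
}
```

The start times below mirror t1/t2/t3 in the test (20000, 50000, 60000).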

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-owolpbhyg7e46jp1egd3zhyp@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/tests/Build                |   1 +
 tools/perf/tests/builtin-test.c       |   4 +
 tools/perf/tests/tests.h              |   1 +
 tools/perf/tests/thread-lookup-time.c | 181 ++++++++++++++++++++++++++
 4 files changed, 187 insertions(+)
 create mode 100644 tools/perf/tests/thread-lookup-time.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 713fc29871e2..715eb17d8047 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -26,6 +26,7 @@ perf-y += sw-clock.o
 perf-y += mmap-thread-lookup.o
 perf-y += thread-comm.o
 perf-y += thread-mg-share.o
+perf-y += thread-lookup-time.o
 perf-y += switch-tracking.o
 perf-y += keep-tracking.o
 perf-y += code-reading.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 982e5f64df62..c985ece3fab8 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -283,6 +283,10 @@ static struct test generic_tests[] = {
 		.desc = "Test thread comm handling",
 		.func = test__thread_comm,
 	},
+	{
+		.desc = "Test thread lookup with time",
+		.func = test__thread_lookup_time,
+	},
 	{
 		.func = NULL,
 	},
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 5d16a56f262f..60fdc7bea1d8 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -105,6 +105,7 @@ int test__clang_subtest_get_nr(void);
 int test__unit_number__scnprint(struct test *test, int subtest);
 int test__mem2node(struct test *t, int subtest);
 int test__thread_comm(struct test *test, int subtest);
+int test__thread_lookup_time(struct test *test, int subtest);
 
 bool test__bp_signal_is_supported(void);
 
diff --git a/tools/perf/tests/thread-lookup-time.c b/tools/perf/tests/thread-lookup-time.c
new file mode 100644
index 000000000000..88e5bd5a7432
--- /dev/null
+++ b/tools/perf/tests/thread-lookup-time.c
@@ -0,0 +1,181 @@
+#include <linux/compiler.h>
+#include <inttypes.h>
+#include "tests.h"
+#include "machine.h"
+#include "thread.h"
+#include "map.h"
+#include "debug.h"
+
+static int thread__print_cb(struct thread *th, void *arg __maybe_unused)
+{
+	printf("thread: %d, start time: %"PRIu64" %s\n",
+	       th->tid, th->start_time,
+	       th->dead ? "(dead)" : th->exited ? "(exited)" : "");
+	return 0;
+}
+
+static int lookup_with_timestamp(struct machine *machine)
+{
+	struct thread *t1, *t2, *t3;
+	union perf_event fork_event = {
+		.fork = {
+			.pid = 0,
+			.tid = 0,
+			.ppid = 1,
+			.ptid = 1,
+		},
+	};
+	struct perf_sample sample = {
+		.time = 50000,
+	};
+
+	/* this is needed to keep dead threads in rbtree */
+	perf_has_index = true;
+
+	/* start_time is set to 0 */
+	t1 = machine__findnew_thread(machine, 0, 0);
+
+	if (verbose > 1) {
+		printf("========= after t1 created ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	TEST_ASSERT_VAL("wrong start time of old thread", t1->start_time == 0);
+
+	TEST_ASSERT_VAL("cannot find current thread",
+			machine__find_thread(machine, 0, 0) == t1);
+
+	TEST_ASSERT_VAL("cannot find current thread with time",
+			machine__findnew_thread_by_time(machine, 0, 0, 10000) == t1);
+
+	/* start_time is overwritten to new value */
+	thread__set_comm(t1, "/usr/bin/perf", 20000);
+
+	if (verbose > 1) {
+		printf("========= after t1 set comm ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	TEST_ASSERT_VAL("failed to update start time", t1->start_time == 20000);
+
+	TEST_ASSERT_VAL("should not find passed thread",
+			/* this will create yet another dead thread */
+			machine__findnew_thread_by_time(machine, 0, 0, 10000) != t1);
+
+	TEST_ASSERT_VAL("cannot find overwritten thread with time",
+			machine__find_thread_by_time(machine, 0, 0, 20000) == t1);
+
+	/* now t1 goes to dead thread tree, and create t2 */
+	machine__process_fork_event(machine, &fork_event, &sample);
+
+	if (verbose > 1) {
+		printf("========= after t2 forked ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	t2 = machine__find_thread(machine, 0, 0);
+
+	TEST_ASSERT_VAL("cannot find current thread", t2 != NULL);
+
+	TEST_ASSERT_VAL("wrong start time of new thread", t2->start_time == 50000);
+
+	TEST_ASSERT_VAL("dead thread cannot be found",
+			machine__find_thread_by_time(machine, 0, 0, 10000) != t1);
+
+	TEST_ASSERT_VAL("cannot find dead thread after new thread",
+			machine__find_thread_by_time(machine, 0, 0, 30000) == t1);
+
+	TEST_ASSERT_VAL("cannot find current thread after new thread",
+			machine__find_thread_by_time(machine, 0, 0, 50000) == t2);
+
+	/* now t2 goes to dead thread tree, and create t3 */
+	sample.time = 60000;
+	machine__process_fork_event(machine, &fork_event, &sample);
+
+	if (verbose > 1) {
+		printf("========= after t3 forked ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	t3 = machine__find_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t3 != NULL);
+
+	TEST_ASSERT_VAL("wrong start time of new thread", t3->start_time == 60000);
+
+	TEST_ASSERT_VAL("cannot find dead thread after new thread",
+			machine__findnew_thread_by_time(machine, 0, 0, 30000) == t1);
+
+	TEST_ASSERT_VAL("cannot find dead thread after new thread",
+			machine__findnew_thread_by_time(machine, 0, 0, 50000) == t2);
+
+	TEST_ASSERT_VAL("cannot find current thread after new thread",
+			machine__findnew_thread_by_time(machine, 0, 0, 70000) == t3);
+
+	machine__delete_threads(machine);
+	return 0;
+}
+
+static int lookup_without_timestamp(struct machine *machine)
+{
+	struct thread *t1, *t2, *t3;
+	union perf_event fork_event = {
+		.fork = {
+			.pid = 0,
+			.tid = 0,
+			.ppid = 1,
+			.ptid = 1,
+		},
+	};
+	struct perf_sample sample = {
+		.time = -1ULL,
+	};
+
+	t1 = machine__findnew_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t1 != NULL);
+
+	TEST_ASSERT_VAL("cannot find new thread with time",
+			machine__findnew_thread_by_time(machine, 0, 0, -1ULL) == t1);
+
+	machine__process_fork_event(machine, &fork_event, &sample);
+
+	t2 = machine__find_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t2 != NULL);
+
+	TEST_ASSERT_VAL("cannot find new thread with time",
+			machine__find_thread_by_time(machine, 0, 0, -1ULL) == t2);
+
+	machine__process_fork_event(machine, &fork_event, &sample);
+
+	t3 = machine__find_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t3 != NULL);
+
+	TEST_ASSERT_VAL("cannot find new thread with time",
+			machine__findnew_thread_by_time(machine, 0, 0, -1ULL) == t3);
+
+	machine__delete_threads(machine);
+	return 0;
+}
+
+int test__thread_lookup_time(struct test *test __maybe_unused, int subtest __maybe_unused)
+{
+	struct machines machines;
+	struct machine *machine;
+
+	/*
+	 * This test is to check whether it can retrieve a correct
+	 * thread for a given time.  When multi-file data storage is
+	 * enabled, those task/comm/mmap events are processed first so
+	 * the later sample should find a matching thread properly.
+	 */
+	machines__init(&machines);
+	machine = &machines.host;
+
+	if (lookup_with_timestamp(machine) < 0)
+		return -1;
+
+	if (lookup_without_timestamp(machine) < 0)
+		return -1;
+
+	machines__exit(&machines);
+	return 0;
+}
-- 
2.17.1


* [PATCH 21/48] perf tools: Maintain map groups list in a leader thread
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (19 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 20/48] perf tools: Add a test case for timed thread handling Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 22/48] perf tools: Introduce thread__find_symbol_by_time() and friends Jiri Olsa
                   ` (28 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

To support multi-threaded perf report, we need to maintain time-sorted
map groups.  Add an ->mg_list member to struct thread and keep the list
sorted by time.  Leader threads now hold one extra refcnt on their
current map groups, so update the thread-mg-share test case accordingly.

Currently, a new map groups is added only when an exec (comm) event is
received.
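
The list handling can be sketched with a hypothetical singly linked struct sketch_mg in place of struct map_groups and <linux/list.h>. Refcounting is left out. Insertion follows thread__set_map_groups(): walk until the first entry older than the new timestamp and insert before it, so the list stays newest first. Lookup follows thread__get_map_groups(): return the first entry that started at or before the timestamp, falling back to the current map groups otherwise.

```c
#include <assert.h>	/* for the usage checks */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct map_groups (no maps, no refcnt). */
struct sketch_mg {
	uint64_t          timestamp;	/* when this address space began */
	struct sketch_mg *next;		/* next (older) entry */
};

/* Keep the list sorted newest first: stop at the first entry that is
 * older than @timestamp and insert in front of it. */
static void sketch_set_map_groups(struct sketch_mg **head,
				  struct sketch_mg *mg, uint64_t timestamp)
{
	struct sketch_mg **pos = head;

	while (*pos && !(timestamp > (*pos)->timestamp))
		pos = &(*pos)->next;

	mg->timestamp = timestamp;
	mg->next = *pos;
	*pos = mg;
}

/* Return the newest entry that started at or before @timestamp, or
 * @current if none did. */
static struct sketch_mg *sketch_get_map_groups(struct sketch_mg *head,
					       struct sketch_mg *current,
					       uint64_t timestamp)
{
	for (struct sketch_mg *mg = head; mg; mg = mg->next)
		if (timestamp >= mg->timestamp)
			return mg;
	return current;
}
```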

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-xj6w1y6nwhkatvo1dcd5t6sk@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/tests/thread-mg-share.c |   7 +-
 tools/perf/util/event.c            |   2 +
 tools/perf/util/machine.c          |  15 +++-
 tools/perf/util/map.c              |   3 +
 tools/perf/util/map.h              |   2 +
 tools/perf/util/thread.c           | 117 ++++++++++++++++++++++++++++-
 tools/perf/util/thread.h           |   3 +
 7 files changed, 142 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/thread-mg-share.c b/tools/perf/tests/thread-mg-share.c
index b1d1bbafe7ae..44a63d7961c8 100644
--- a/tools/perf/tests/thread-mg-share.c
+++ b/tools/perf/tests/thread-mg-share.c
@@ -24,6 +24,9 @@ int test__thread_mg_share(struct test *test __maybe_unused, int subtest __maybe_
 	 * with several threads and checks they properly share and
 	 * maintain map groups info (struct map_groups).
 	 *
+	 * Note that a leader thread has one more refcnt for its
+	 * (current) map groups.
+	 *
 	 * thread group (pid: 0, tids: 0, 1, 2, 3)
 	 * other  group (pid: 4, tids: 4, 5)
 	*/
@@ -44,7 +47,7 @@ int test__thread_mg_share(struct test *test __maybe_unused, int subtest __maybe_
 			leader && t1 && t2 && t3 && other);
 
 	mg = leader->mg;
-	TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(&mg->refcnt), 4);
+	TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(&mg->refcnt), 5);
 
 	/* test the map groups pointer is shared */
 	TEST_ASSERT_VAL("map groups don't match", mg == t1->mg);
@@ -72,7 +75,7 @@ int test__thread_mg_share(struct test *test __maybe_unused, int subtest __maybe_
 	machine__remove_thread(machine, other_leader);
 
 	other_mg = other->mg;
-	TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(&other_mg->refcnt), 2);
+	TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(&other_mg->refcnt), 3);
 
 	TEST_ASSERT_VAL("map groups don't match", other_mg == other_leader->mg);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 8a19f751d095..29438cda8aa2 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1534,6 +1534,8 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 		return NULL;
 	}
 
+	BUG_ON(mg == NULL);
+
 	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
 		mg = &machine->kmaps;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 5ae2baba27ca..e2ebe471cdbc 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -404,8 +404,19 @@ static void machine__update_thread_pid(struct machine *machine,
 	if (!leader)
 		goto out_err;
 
-	if (!leader->mg)
-		leader->mg = map_groups__new(machine);
+	if (!leader->mg) {
+		struct map_groups *mg = map_groups__new(machine);
+
+		if (mg == NULL) {
+			pr_err("Not enough memory for map groups\n");
+			return;
+		}
+
+		if (thread__set_map_groups(leader, mg, 0) < 0) {
+			map_groups__put(mg);
+			goto out_err;
+		}
+	}
 
 	if (!leader->mg)
 		goto out_err;
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 3f07a587c8e6..6d6a0f65a9a0 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -504,6 +504,8 @@ void map_groups__init(struct map_groups *mg, struct machine *machine)
 	maps__init(&mg->maps);
 	mg->machine = machine;
 	refcount_set(&mg->refcnt, 1);
+	mg->timestamp = 0;
+	INIT_LIST_HEAD(&mg->list);
 }
 
 static void __maps__purge(struct maps *maps)
@@ -550,6 +552,7 @@ struct map_groups *map_groups__new(struct machine *machine)
 void map_groups__delete(struct map_groups *mg)
 {
 	map_groups__exit(mg);
+	list_del(&mg->list);
 	free(mg);
 }
 
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index e0f327b51e66..fb5f40fea2e3 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -64,6 +64,8 @@ struct map_groups {
 	struct maps	 maps;
 	struct machine	 *machine;
 	refcount_t	 refcnt;
+	u64		 timestamp;
+	struct list_head list;
 };
 
 struct map_groups *map_groups__new(struct machine *machine);
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 109fa3bc23c4..fbda4b6d2ec5 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -16,12 +16,78 @@
 
 #include <api/fs/fs.h>
 
+struct map_groups *thread__get_map_groups(struct thread *thread, u64 timestamp)
+{
+	struct map_groups *mg;
+	struct thread *leader = thread;
+
+	BUG_ON(thread->mg == NULL);
+
+	if (thread->tid != thread->pid_) {
+		leader = machine__find_thread_by_time(thread->mg->machine,
+						      thread->pid_, thread->pid_,
+						      timestamp);
+		if (leader == NULL)
+			goto out;
+	}
+
+	list_for_each_entry(mg, &leader->mg_list, list)
+		if (timestamp >= mg->timestamp)
+			return mg;
+
+out:
+	return thread->mg;
+}
+
+int thread__set_map_groups(struct thread *thread, struct map_groups *mg,
+			   u64 timestamp)
+{
+	struct list_head *pos;
+	struct map_groups *old;
+
+	if (mg == NULL)
+		return -ENOMEM;
+
+	/*
+	 * Only a leader thread can have map groups list - others
+	 * reference it through map_groups__get.  This means the
+	 * leader thread will have one more refcnt than others.
+	 */
+	if (thread->tid != thread->pid_)
+		return -EINVAL;
+
+	if (thread->mg) {
+		BUG_ON(refcount_read(&thread->mg->refcnt) <= 1);
+		map_groups__put(thread->mg);
+	}
+
+	/* sort by time */
+	list_for_each(pos, &thread->mg_list) {
+		old = list_entry(pos, struct map_groups, list);
+		if (timestamp > old->timestamp)
+			break;
+	}
+
+	list_add_tail(&mg->list, pos);
+	mg->timestamp = timestamp;
+
+	/* set current ->mg to most recent one */
+	thread->mg = list_first_entry(&thread->mg_list, struct map_groups, list);
+	/* increase one more refcnt for current */
+	map_groups__get(thread->mg);
+
+	return 0;
+}
+
 int thread__init_map_groups(struct thread *thread, struct machine *machine)
 {
 	pid_t pid = thread->pid_;
 
 	if (pid == thread->tid || pid == -1) {
-		thread->mg = map_groups__new(machine);
+		struct map_groups *mg = map_groups__new(machine);
+
+		if (thread__set_map_groups(thread, mg, 0) < 0)
+			map_groups__put(mg);
 	} else {
 		struct thread *leader = __machine__findnew_thread(machine, pid, pid);
 		if (leader) {
@@ -48,6 +114,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		INIT_LIST_HEAD(&thread->comm_list);
 		init_rwsem(&thread->namespaces_lock);
 		init_rwsem(&thread->comm_lock);
+		INIT_LIST_HEAD(&thread->mg_list);
 
 		comm_str = malloc(32);
 		if (!comm_str)
@@ -77,7 +144,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 void thread__delete(struct thread *thread)
 {
 	struct namespaces *namespaces, *tmp_namespaces;
-	struct comm *comm, *tmp_comm;
+	struct comm *comm, *tmp;
+	struct map_groups *mg, *tmp_mg;
 
 	BUG_ON(!RB_EMPTY_NODE(&thread->rb_node));
 	BUG_ON(!list_empty(&thread->tid_list));
@@ -89,6 +157,7 @@ void thread__delete(struct thread *thread)
 		thread->mg = NULL;
 	}
 	down_write(&thread->namespaces_lock);
+
 	list_for_each_entry_safe(namespaces, tmp_namespaces,
 				 &thread->namespaces_list, list) {
 		list_del(&namespaces->list);
@@ -97,7 +166,12 @@ void thread__delete(struct thread *thread)
 	up_write(&thread->namespaces_lock);
 
 	down_write(&thread->comm_lock);
-	list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
+
+	/* only leader threads have mg list */
+	list_for_each_entry_safe(mg, tmp_mg, &thread->mg_list, list)
+		map_groups__put(mg);
+
+	list_for_each_entry_safe(comm, tmp, &thread->comm_list, list) {
 		list_del(&comm->list);
 		comm__free(comm);
 	}
@@ -217,6 +291,9 @@ struct comm *thread__comm_by_time(const struct thread *thread, u64 timestamp)
 	return list_last_entry(&thread->comm_list, struct comm, list);
 }
 
+static int thread__clone_map_groups(struct thread *thread,
+				    struct thread *parent);
+
 static int ____thread__set_comm(struct thread *thread, const char *str,
 				u64 timestamp, bool exec)
 {
@@ -247,6 +324,40 @@ static int ____thread__set_comm(struct thread *thread, const char *str,
 			unwind__flush_access(thread);
 	}
 
+	if (exec) {
+		struct machine *machine;
+
+		BUG_ON(thread->mg == NULL || thread->mg->machine == NULL);
+
+		machine = thread->mg->machine;
+
+		if (thread->tid != thread->pid_) {
+			struct map_groups *old = thread->mg;
+			struct thread *leader;
+
+			leader = machine__findnew_thread(machine, thread->pid_,
+							 thread->pid_);
+
+			/* now it'll be a new leader */
+			thread->pid_ = thread->tid;
+
+			thread->mg = map_groups__new(old->machine);
+			if (thread->mg == NULL)
+				return -ENOMEM;
+
+			/* save current mg in the new leader */
+			thread__clone_map_groups(thread, leader);
+
+			/* current mg of leader thread needs one more refcnt */
+			map_groups__get(thread->mg);
+
+			thread__set_map_groups(thread, thread->mg, old->timestamp);
+		}
+
+		/* create a new mg for newly executed binary */
+		thread__set_map_groups(thread, map_groups__new(machine), timestamp);
+	}
+
 	thread->comm_set = true;
 
 	return 0;
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 8a1114c2f43a..e7eaf32a0cf1 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -19,6 +19,7 @@ struct thread {
 	struct rb_node		rb_node;
 	struct list_head	tid_list;
 	struct map_groups	*mg;
+	struct list_head	mg_list;
 	pid_t			pid_; /* Not all tools update this */
 	pid_t			tid;
 	pid_t			ppid;
@@ -89,6 +90,8 @@ struct comm *thread__comm_by_time(const struct thread *thread, u64 timestamp);
 const char *thread__comm_str(const struct thread *thread);
 int thread__insert_map(struct thread *thread, struct map *map);
 const char *thread__comm_str_by_time(const struct thread *thread, u64 timestamp);
+struct map_groups *thread__get_map_groups(struct thread *thread, u64 timestamp);
+int thread__set_map_groups(struct thread *thread, struct map_groups *mg, u64 timestamp);
 int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp);
 size_t thread__fprintf(struct thread *thread, FILE *fp);
 
-- 
2.17.1


* [PATCH 22/48] perf tools: Introduce thread__find_symbol_by_time() and friends
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (20 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 21/48] perf tools: Maintain map groups list in a leader thread Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 23/48] perf callchain: Use thread__find_addr_location_by_time() " Jiri Olsa
                   ` (27 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

These new functions find the appropriate map (and symbol) at a given
time when used with an indexed data file.  They rely on the map_groups
list being sorted by time, as done in the previous patch.
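
The two-step lookup in thread__find_map_by_time() can be sketched with hypothetical types. struct sketch_map is an address range; struct sketch_mg is an address space valid from a timestamp on, kept in an array sorted newest first. With an indexed file (has_index standing in for perf_has_index), the sketch first picks the address space current at the sample time. Otherwise it uses the thread's current one, then resolves the address inside it. The example shows the same address resolving to different binaries before and after an exec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range, standing in for struct map. */
struct sketch_map {
	uint64_t    start, end;		/* [start, end) */
	const char *name;
};

/* Hypothetical address space, valid from @timestamp onwards. */
struct sketch_mg {
	uint64_t                 timestamp;
	const struct sketch_map *maps;
	size_t                   nr_maps;
};

static const struct sketch_map *
sketch_mg_find_map(const struct sketch_mg *mg, uint64_t addr)
{
	for (size_t i = 0; i < mg->nr_maps; i++)
		if (addr >= mg->maps[i].start && addr < mg->maps[i].end)
			return &mg->maps[i];
	return NULL;
}

/* @mgs is sorted newest first; mgs[0] plays the role of thread->mg. */
static const struct sketch_map *
sketch_find_map_by_time(const struct sketch_mg *mgs, size_t nr_mgs,
			bool has_index, uint64_t addr, uint64_t timestamp)
{
	const struct sketch_mg *mg = &mgs[0];

	assert(nr_mgs > 0);
	if (has_index) {
		for (size_t i = 0; i < nr_mgs; i++) {
			if (timestamp >= mgs[i].timestamp) {
				mg = &mgs[i];
				break;
			}
		}
	}
	return sketch_mg_find_map(mg, addr);
}
```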

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-dg807z4umbjfq0yk2e3vixaj@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/event.c   | 52 ++++++++++++++++++++++++++++++++++-----
 tools/perf/util/machine.c | 20 +++++++++------
 tools/perf/util/thread.c  | 25 +++++++++++++++++++
 tools/perf/util/thread.h  |  9 +++++++
 4 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 29438cda8aa2..74d20056b860 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1516,15 +1516,14 @@ int perf_event__process(struct perf_tool *tool __maybe_unused,
 	return machine__process_event(machine, event, sample);
 }
 
-struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
-			     struct addr_location *al)
+static
+struct map *map_groups__find_map(struct map_groups *mg, u8 cpumode,
+				 u64 addr, struct addr_location *al)
 {
-	struct map_groups *mg = thread->mg;
 	struct machine *machine = mg->machine;
 	bool load_map = false;
 
 	al->machine = machine;
-	al->thread = thread;
 	al->addr = addr;
 	al->cpumode = cpumode;
 	al->filtered = 0;
@@ -1595,6 +1594,28 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 	return al->map;
 }
 
+struct map *thread__find_map(struct thread *thread, u8 cpumode,
+			     u64 addr, struct addr_location *al)
+{
+	al->thread = thread;
+	return map_groups__find_map(thread->mg, cpumode, addr, al);
+}
+
+struct map *thread__find_map_by_time(struct thread *thread, u8 cpumode,
+				     u64 addr, struct addr_location *al,
+				     u64 timestamp)
+{
+	struct map_groups *mg;
+
+	if (perf_has_index)
+		mg = thread__get_map_groups(thread, timestamp);
+	else
+		mg = thread->mg;
+
+	al->thread = thread;
+	return map_groups__find_map(mg, cpumode, addr, al);
+}
+
 struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
 				   u64 addr, struct addr_location *al)
 {
@@ -1604,6 +1625,22 @@ struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
 	return al->sym;
 }
 
+struct symbol *thread__find_symbol_by_time(struct thread *thread, u8 cpumode,
+					   u64 addr, struct addr_location *al,
+					   u64 timestamp)
+{
+	if (perf_has_index)
+		thread__find_map_by_time(thread, cpumode, addr, al, timestamp);
+	else
+		thread__find_map(thread, cpumode, addr, al);
+
+	if (al->map != NULL)
+		al->sym = map__find_symbol(al->map, al->addr);
+	else
+		al->sym = NULL;
+	return al->sym;
+}
+
 /*
  * Callers need to drop the reference to al->thread, obtained in
  * machine__findnew_thread()
@@ -1619,7 +1656,9 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 		return -1;
 
 	dump_printf(" ... thread: %s:%d\n", thread__comm_str(thread), thread->tid);
-	thread__find_map(thread, sample->cpumode, sample->ip, al);
+	thread__find_map_by_time(thread, sample->cpumode,
+				 sample->ip, al, sample->time);
+
 	dump_printf(" ...... dso: %s\n",
 		    al->map ? al->map->dso->long_name :
 			al->level == 'H' ? "[hypervisor]" : "<not found>");
@@ -1698,7 +1737,8 @@ bool sample_addr_correlates_sym(struct perf_event_attr *attr)
 void thread__resolve(struct thread *thread, struct addr_location *al,
 		     struct perf_sample *sample)
 {
-	thread__find_map(thread, sample->cpumode, sample->addr, al);
+	thread__find_map_by_time(thread, sample->cpumode, sample->addr,
+				 al, sample->time);
 
 	al->cpu = sample->cpu;
 	al->sym = NULL;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e2ebe471cdbc..0c576a01697e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2011,7 +2011,7 @@ static bool symbol__match_regex(struct symbol *sym, regex_t *regex)
 
 static void ip__resolve_ams(struct thread *thread,
 			    struct addr_map_symbol *ams,
-			    u64 ip)
+			    u64 ip, u64 timestamp)
 {
 	struct addr_location al;
 
@@ -2023,7 +2023,8 @@ static void ip__resolve_ams(struct thread *thread,
 	 * Thus, we have to try consecutively until we find a match
 	 * or else, the symbol is unknown
 	 */
-	thread__find_cpumode_addr_location(thread, ip, &al);
+	thread__find_cpumode_addr_location_by_time(thread, ip,
+						   &al, timestamp);
 
 	ams->addr = ip;
 	ams->al_addr = al.addr;
@@ -2034,13 +2035,14 @@ static void ip__resolve_ams(struct thread *thread,
 
 static void ip__resolve_data(struct thread *thread,
 			     u8 m, struct addr_map_symbol *ams,
-			     u64 addr, u64 phys_addr)
+			     u64 addr, u64 phys_addr, u64 timestamp)
 {
 	struct addr_location al;
 
 	memset(&al, 0, sizeof(al));
 
-	thread__find_symbol(thread, m, addr, &al);
+	thread__find_symbol_by_time(thread, m, addr,
+				    &al, timestamp);
 
 	ams->addr = addr;
 	ams->al_addr = al.addr;
@@ -2057,9 +2059,9 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 	if (!mi)
 		return NULL;
 
-	ip__resolve_ams(al->thread, &mi->iaddr, sample->ip);
+	ip__resolve_ams(al->thread, &mi->iaddr, sample->ip, sample->time);
 	ip__resolve_data(al->thread, al->cpumode, &mi->daddr,
-			 sample->addr, sample->phys_addr);
+			 sample->addr, sample->phys_addr, sample->time);
 	mi->data_src.val = sample->data_src;
 
 	return mi;
@@ -2175,8 +2177,10 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 		return NULL;
 
 	for (i = 0; i < bs->nr; i++) {
-		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
-		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
+		ip__resolve_ams(al->thread, &bi[i].to,
+				bs->entries[i].to, sample->time);
+		ip__resolve_ams(al->thread, &bi[i].from,
+				bs->entries[i].from, sample->time);
 		bi[i].flags = bs->entries[i].flags;
 	}
 	return bi;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index fbda4b6d2ec5..8a0b27202ab7 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -550,3 +550,28 @@ struct thread *thread__main_thread(struct machine *machine, struct thread *threa
 
 	return machine__find_thread(machine, thread->pid_, thread->pid_);
 }
+
+void thread__find_cpumode_addr_location_by_time(struct thread *thread,
+						u64 addr, struct addr_location *al,
+						u64 timestamp)
+{
+	size_t i;
+	const u8 cpumodes[] = {
+		PERF_RECORD_MISC_USER,
+		PERF_RECORD_MISC_KERNEL,
+		PERF_RECORD_MISC_GUEST_USER,
+		PERF_RECORD_MISC_GUEST_KERNEL
+	};
+
+	if (!perf_has_index) {
+		thread__find_cpumode_addr_location(thread, addr, al);
+		return;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(cpumodes); i++) {
+		thread__find_symbol_by_time(thread, cpumodes[i],
+					    addr, al, timestamp);
+		if (al->map)
+			break;
+	}
+}
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index e7eaf32a0cf1..86186a0773a0 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -99,12 +99,21 @@ struct thread *thread__main_thread(struct machine *machine, struct thread *threa
 
 struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
 			     struct addr_location *al);
+struct map *thread__find_map_by_time(struct thread *thread, u8 cpumode,
+				     u64 addr, struct addr_location *al,
+				     u64 timestamp);
 
 struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
 				   u64 addr, struct addr_location *al);
+struct symbol *thread__find_symbol_by_time(struct thread *thread, u8 cpumode,
+					   u64 addr, struct addr_location *al,
+					   u64 timestamp);
 
 void thread__find_cpumode_addr_location(struct thread *thread, u64 addr,
 					struct addr_location *al);
+void thread__find_cpumode_addr_location_by_time(struct thread *thread,
+						u64 addr, struct addr_location *al,
+						u64 timestamp);
 
 static inline void *thread__priv(struct thread *thread)
 {
-- 
2.17.1


* [PATCH 23/48] perf callchain: Use thread__find_addr_location_by_time() and friends
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (21 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 22/48] perf tools: Introduce thread__find_symbol_by_time() and friends Jiri Olsa
@ 2018-09-13 12:54 ` " Jiri Olsa
  2018-09-13 12:54 ` [PATCH 24/48] perf tools: Add a test case for timed map groups handling Jiri Olsa
                   ` (26 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

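Pass the sample timestamp down the callchain and unwind code and use the
time-aware lookup functions (thread__find_symbol_by_time() and friends),
so that each callchain entry resolves to the map/symbol that was valid
when the sample was taken.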
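
A minimal sketch of the idea, under simplified assumptions: each map is reduced to a hypothetical `struct cc_map` holding an address range, a creation time and an id, and every callchain entry of one sample is resolved against that sample's own timestamp, as add_callchain_ip() now does with sample->time. The names `cc_map`, `cc_lookup()` and `cc_resolve()` are illustrative only and are not perf APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map: covers [start, end), created at timestamp. */
struct cc_map {
	uint64_t start, end;
	uint64_t timestamp;
	int id;
};

/* Return the id of the newest map covering ip that existed at time t, or -1. */
static int cc_lookup(const struct cc_map *maps, size_t nr,
		     uint64_t ip, uint64_t t)
{
	const struct cc_map *best = NULL;

	for (size_t i = 0; i < nr; i++) {
		const struct cc_map *m = &maps[i];

		if (ip < m->start || ip >= m->end || m->timestamp > t)
			continue;
		if (!best || m->timestamp > best->timestamp)
			best = m;
	}
	return best ? best->id : -1;
}

/*
 * Resolve every ip of one callchain with the same sample time, so all
 * frames are matched against the maps that existed when it was taken.
 */
static void cc_resolve(const struct cc_map *maps, size_t nr,
		       const uint64_t *ips, size_t nr_ips,
		       uint64_t sample_time, int *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < nr_ips; i++)
		out[i] = cc_lookup(maps, nr, ips[i], sample_time);
}
```

For example, if a mapping at 0x1000 is replaced at time 500 by another one at the same address, a sample taken at time 100 still resolves its frames to the first mapping, while a sample taken at time 600 resolves them to the second.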

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-k02mlrexo1h3bsezgrr7ydn8@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/machine.c                | 18 +++++++++++-------
 tools/perf/util/unwind-libdw.c           |  6 ++++--
 tools/perf/util/unwind-libunwind-local.c | 11 +++++++----
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0c576a01697e..dc46a7967e10 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2101,7 +2101,8 @@ static int add_callchain_ip(struct thread *thread,
 			    bool branch,
 			    struct branch_flags *flags,
 			    struct iterations *iter,
-			    u64 branch_from)
+			    u64 branch_from,
+			    u64 timestamp)
 {
 	struct addr_location al;
 	int nr_loop_iter = 0;
@@ -2111,7 +2112,8 @@ static int add_callchain_ip(struct thread *thread,
 	al.filtered = 0;
 	al.sym = NULL;
 	if (!cpumode) {
-		thread__find_cpumode_addr_location(thread, ip, &al);
+		thread__find_cpumode_addr_location_by_time(thread, ip,
+							   &al, timestamp);
 	} else {
 		if (ip >= PERF_CONTEXT_MAX) {
 			switch (ip) {
@@ -2136,7 +2138,8 @@ static int add_callchain_ip(struct thread *thread,
 			}
 			return 0;
 		}
-		thread__find_symbol(thread, *cpumode, ip, &al);
+		thread__find_symbol_by_time(thread, *cpumode, ip, &al,
+					    timestamp);
 	}
 
 	if (al.sym != NULL) {
@@ -2333,7 +2336,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 			err = add_callchain_ip(thread, cursor, parent,
 					       root_al, &cpumode, ip,
 					       branch, flags, NULL,
-					       branch_from);
+					       branch_from, sample->time);
 			if (err)
 				return (err < 0) ? err : 0;
 		}
@@ -2356,6 +2359,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 	int chain_nr = 0;
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	int i, j, err, nr_entries;
+	u64 timestamp = sample->time;
 	int skip_idx = -1;
 	int first_call = 0;
 
@@ -2429,13 +2433,13 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 					       root_al,
 					       NULL, be[i].to,
 					       true, &be[i].flags,
-					       NULL, be[i].from);
+					       NULL, be[i].from, timestamp);
 
 			if (!err)
 				err = add_callchain_ip(thread, cursor, parent, root_al,
 						       NULL, be[i].from,
 						       true, &be[i].flags,
-						       &iter[i], 0);
+						       &iter[i], 0, timestamp);
 			if (err == -EINVAL)
 				break;
 			if (err)
@@ -2469,7 +2473,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
 		err = add_callchain_ip(thread, cursor, parent,
 				       root_al, &cpumode, ip,
-				       false, NULL, NULL, 0);
+				       false, NULL, NULL, 0, timestamp);
 
 		if (err)
 			return (err < 0) ? err : 0;
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 6f318b15950e..c22c1030a8ad 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -32,7 +32,8 @@ static int __report_module(struct addr_location *al, u64 ip,
 	 * Some callers will use al->sym, so we can't just use the
 	 * cheaper thread__find_map() here.
 	 */
-	thread__find_symbol(ui->thread, PERF_RECORD_MISC_USER, ip, al);
+	thread__find_symbol_by_time(ui->thread, PERF_RECORD_MISC_USER, ip,
+				    al, ui->sample->time);
 
 	if (al->map)
 		dso = al->map->dso;
@@ -104,7 +105,8 @@ static int access_dso_mem(struct unwind_info *ui, Dwarf_Addr addr,
 	struct addr_location al;
 	ssize_t size;
 
-	if (!thread__find_map(ui->thread, PERF_RECORD_MISC_USER, addr, &al)) {
+	if (!thread__find_map_by_time(ui->thread, PERF_RECORD_MISC_USER,
+				      addr, &al, ui->sample->time)) {
 		pr_debug("unwind: no map for %lx\n", (unsigned long)addr);
 		return -1;
 	}
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index 79f521a552cf..da6f39315b47 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -366,7 +366,9 @@ static int read_unwind_spec_debug_frame(struct dso *dso,
 static struct map *find_map(unw_word_t ip, struct unwind_info *ui)
 {
 	struct addr_location al;
-	return thread__find_map(ui->thread, PERF_RECORD_MISC_USER, ip, &al);
+
+	return thread__find_map_by_time(ui->thread, PERF_RECORD_MISC_USER, ip,
+					&al, ui->sample->time);
 }
 
 static int
@@ -568,13 +570,14 @@ static void put_unwind_info(unw_addr_space_t __maybe_unused as,
 	pr_debug("unwind: put_unwind_info called\n");
 }
 
-static int entry(u64 ip, struct thread *thread,
+static int entry(u64 ip, struct thread *thread, u64 timestamp,
 		 unwind_entry_cb_t cb, void *arg)
 {
 	struct unwind_entry e;
 	struct addr_location al;
 
-	e.sym = thread__find_symbol(thread, PERF_RECORD_MISC_USER, ip, &al);
+	e.sym = thread__find_symbol_by_time(thread, PERF_RECORD_MISC_USER, ip,
+					    &al, timestamp);
 	e.ip  = ip;
 	e.map = al.map;
 
@@ -700,7 +703,7 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
 
 		if (callchain_param.order == ORDER_CALLER)
 			j = max_stack - i - 1;
-		ret = ips[j] ? entry(ips[j], ui->thread, cb, arg) : 0;
+		ret = ips[j] ? entry(ips[j], ui->thread, ui->sample->time, cb, arg) : 0;
 	}
 
 	return ret;
-- 
2.17.1


* [PATCH 24/48] perf tools: Add a test case for timed map groups handling
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

Add a test case verifying thread->mg and ->mg_list handling across a
time change (a simulated EXEC), as well as the new
thread__find_addr_map_by_time() and friends.
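
The scenario the test exercises can be modelled as follows. This is a simplified sketch under an explicit assumption: each EXEC (comm change) at time t makes a new map group current while older groups stay on a per-thread history, so a lookup at time t picks the newest group that became current at or before t. `struct mg_hist` and its helpers are hypothetical and only approximate how thread->mg and ->mg_list are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MG_HIST_MAX 8

/* Hypothetical model of a thread's map-group history, oldest first. */
struct mg_hist {
	uint64_t start_time[MG_HIST_MAX]; /* when each group became current */
	int id[MG_HIST_MAX];              /* identifier of each group */
	size_t nr;                        /* number of valid entries */
};

/* Simulate an EXEC at time t: a new group becomes current, older ones stay. */
static void mg_hist_exec(struct mg_hist *h, int id, uint64_t t)
{
	assert(h->nr < MG_HIST_MAX);
	assert(h->nr == 0 || h->start_time[h->nr - 1] <= t);
	h->start_time[h->nr] = t;
	h->id[h->nr] = id;
	h->nr++;
}

/* Return the id of the group that was current at time t, or -1 if none. */
static int mg_hist_find_by_time(const struct mg_hist *h, uint64_t t)
{
	for (size_t i = h->nr; i > 0; i--) {
		if (h->start_time[i - 1] <= t)
			return h->id[i - 1];
	}
	return -1;
}
```

With group 0 created at time 0 and group 1 at time 10000, as in the test's simulated EXEC, a lookup at time 5000 returns group 0 and a lookup at -1ULL returns group 1.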

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-20jkz8hzs9njsvmrseo2o3s8@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/tests/Build            |  1 +
 tools/perf/tests/builtin-test.c   |  4 ++
 tools/perf/tests/tests.h          |  1 +
 tools/perf/tests/thread-mg-time.c | 94 +++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+)
 create mode 100644 tools/perf/tests/thread-mg-time.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 715eb17d8047..d14a532f87f1 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -27,6 +27,7 @@ perf-y += mmap-thread-lookup.o
 perf-y += thread-comm.o
 perf-y += thread-mg-share.o
 perf-y += thread-lookup-time.o
+perf-y += thread-mg-time.o
 perf-y += switch-tracking.o
 perf-y += keep-tracking.o
 perf-y += code-reading.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index c985ece3fab8..ee58cb40ebc2 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -287,6 +287,10 @@ static struct test generic_tests[] = {
 		.desc = "Test thread lookup with time",
 		.func = test__thread_lookup_time,
 	},
+	{
+		.desc = "Test thread map group handling with time",
+		.func = test__thread_mg_time,
+	},
 	{
 		.func = NULL,
 	},
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 60fdc7bea1d8..cd7cca258398 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -106,6 +106,7 @@ int test__unit_number__scnprint(struct test *test, int subtest);
 int test__mem2node(struct test *t, int subtest);
 int test__thread_comm(struct test *test, int subtest);
 int test__thread_lookup_time(struct test *test, int subtest);
+int test__thread_mg_time(struct test *test, int subtest);
 
 bool test__bp_signal_is_supported(void);
 
diff --git a/tools/perf/tests/thread-mg-time.c b/tools/perf/tests/thread-mg-time.c
new file mode 100644
index 000000000000..6a735f59c097
--- /dev/null
+++ b/tools/perf/tests/thread-mg-time.c
@@ -0,0 +1,94 @@
+#include <linux/compiler.h>
+#include "tests.h"
+#include "machine.h"
+#include "thread.h"
+#include "map.h"
+#include "debug.h"
+
+#define PERF_MAP_START  0x40000
+
+int test__thread_mg_time(struct test *test __maybe_unused, int subtest __maybe_unused)
+{
+	struct machines machines;
+	struct machine *machine;
+	struct thread *t;
+	struct map_groups *mg;
+	struct map *map, *old_map;
+	struct addr_location al = { .map = NULL, };
+
+	/*
+	 * This test checks whether the correct map can be retrieved
+	 * for a given time.  When multi-file data storage is enabled,
+	 * task/comm/mmap events are processed first, so a later
+	 * sample should still find the matching map.
+	 */
+	machines__init(&machines);
+	machine = &machines.host;
+
+	/* this is needed to add/find map by time */
+	perf_has_index = true;
+
+	t = machine__findnew_thread(machine, 0, 0);
+	mg = t->mg;
+
+	map = dso__new_map("/usr/bin/perf");
+	map->start = PERF_MAP_START;
+	map->end = PERF_MAP_START + 0x1000;
+
+	thread__insert_map(t, map);
+
+	if (verbose > 1)
+		map_groups__fprintf(t->mg, stderr);
+
+	thread__find_addr_map(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
+			      PERF_MAP_START, &al);
+
+	TEST_ASSERT_VAL("cannot find mapping for perf", al.map != NULL);
+	TEST_ASSERT_VAL("non matched mapping found", al.map == map);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
+
+	thread__find_addr_map_by_time(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
+				      PERF_MAP_START, &al, -1ULL);
+
+	TEST_ASSERT_VAL("cannot find timed mapping for perf", al.map != NULL);
+	TEST_ASSERT_VAL("non matched timed mapping", al.map == map);
+	TEST_ASSERT_VAL("incorrect timed map groups", al.map->groups == mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
+
+
+	pr_debug("simulate EXEC event (generate new mg)\n");
+	__thread__set_comm(t, "perf-test", 10000, true);
+
+	old_map = map;
+
+	map = dso__new_map("/usr/bin/perf-test");
+	map->start = PERF_MAP_START;
+	map->end = PERF_MAP_START + 0x2000;
+
+	thread__insert_map(t, map);
+
+	if (verbose > 1)
+		map_groups__fprintf(t->mg, stderr);
+
+	thread__find_addr_map(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
+			      PERF_MAP_START + 4, &al);
+
+	TEST_ASSERT_VAL("cannot find mapping for perf-test", al.map != NULL);
+	TEST_ASSERT_VAL("invalid mapping found", al.map == map);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups != mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
+
+	pr_debug("searching map in the old mag groups\n");
+	thread__find_addr_map_by_time(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
+				      PERF_MAP_START, &al, 5000);
+
+	TEST_ASSERT_VAL("cannot find timed mapping for perf-test", al.map != NULL);
+	TEST_ASSERT_VAL("non matched timed mapping", al.map == old_map);
+	TEST_ASSERT_VAL("incorrect timed map groups", al.map->groups == mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups != t->mg);
+
+	machine__delete_threads(machine);
+	machines__exit(&machines);
+	return 0;
+}
-- 
2.17.1


* [PATCH 25/48] perf tools: Save timestamp of a map creation
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, Frederic Weisbecker, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen,
	Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

It will be used to support multiple maps at the same address, as in
the dlopen() and JIT-compilation cases.
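
A minimal sketch of why the creation time matters, under simplified assumptions (`struct ts_map`, `ts_map_init()` and `ts_map_visible_at()` are hypothetical, not perf APIs): once a map carries a timestamp, a time-aware lookup can accept it only for samples taken at or after that time. Maps created with timestamp 0, as this patch does for the vmlinux, extra kernel and kcore maps, therefore remain visible to every sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced map: address range plus the time it was created. */
struct ts_map {
	uint64_t start, end;
	uint64_t timestamp;
};

static void ts_map_init(struct ts_map *map, uint64_t start, uint64_t end,
			uint64_t timestamp)
{
	assert(start <= end);
	map->start = start;
	map->end = end;
	map->timestamp = timestamp;
}

/* A map can serve a sample only if it covers ip and already existed. */
static bool ts_map_visible_at(const struct ts_map *map, uint64_t ip,
			      uint64_t sample_time)
{
	return ip >= map->start && ip < map->end &&
	       map->timestamp <= sample_time;
}
```

Two libraries loaded with dlopen() at the same address at different times then become distinguishable: each sample sees only the mappings that already existed when it was taken, and a separate rule (for example, the newest such map wins) chooses among them.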

Cc: Stephane Eranian <eranian@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-p39jo653jdpz7uyzz7ih5xk9@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/tests/thread-mg-time.c | 16 ++++++++--------
 tools/perf/util/dso.c             |  2 +-
 tools/perf/util/machine.c         | 27 ++++++++++++++++-----------
 tools/perf/util/machine.h         |  2 +-
 tools/perf/util/map.c             | 12 +++++++-----
 tools/perf/util/map.h             |  9 ++++++---
 tools/perf/util/symbol-elf.c      |  2 +-
 tools/perf/util/symbol.c          |  4 ++--
 8 files changed, 42 insertions(+), 32 deletions(-)

diff --git a/tools/perf/tests/thread-mg-time.c b/tools/perf/tests/thread-mg-time.c
index 6a735f59c097..19dc298756c8 100644
--- a/tools/perf/tests/thread-mg-time.c
+++ b/tools/perf/tests/thread-mg-time.c
@@ -40,16 +40,16 @@ int test__thread_mg_time(struct test *test __maybe_unused, int subtest __maybe_u
 	if (verbose > 1)
 		map_groups__fprintf(t->mg, stderr);
 
-	thread__find_addr_map(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
-			      PERF_MAP_START, &al);
+	thread__find_map(t, PERF_RECORD_MISC_USER,
+			 PERF_MAP_START, &al);
 
 	TEST_ASSERT_VAL("cannot find mapping for perf", al.map != NULL);
 	TEST_ASSERT_VAL("non matched mapping found", al.map == map);
 	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == mg);
 	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
 
-	thread__find_addr_map_by_time(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
-				      PERF_MAP_START, &al, -1ULL);
+	thread__find_map_by_time(t, PERF_RECORD_MISC_USER,
+				 PERF_MAP_START, &al, -1ULL);
 
 	TEST_ASSERT_VAL("cannot find timed mapping for perf", al.map != NULL);
 	TEST_ASSERT_VAL("non matched timed mapping", al.map == map);
@@ -71,8 +71,8 @@ int test__thread_mg_time(struct test *test __maybe_unused, int subtest __maybe_u
 	if (verbose > 1)
 		map_groups__fprintf(t->mg, stderr);
 
-	thread__find_addr_map(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
-			      PERF_MAP_START + 4, &al);
+	thread__find_map(t, PERF_RECORD_MISC_USER,
+			 PERF_MAP_START + 4, &al);
 
 	TEST_ASSERT_VAL("cannot find mapping for perf-test", al.map != NULL);
 	TEST_ASSERT_VAL("invalid mapping found", al.map == map);
@@ -80,8 +80,8 @@ int test__thread_mg_time(struct test *test __maybe_unused, int subtest __maybe_u
 	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
 
 	pr_debug("searching map in the old mag groups\n");
-	thread__find_addr_map_by_time(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
-				      PERF_MAP_START, &al, 5000);
+	thread__find_map_by_time(t, PERF_RECORD_MISC_USER,
+				 PERF_MAP_START, &al, 5000);
 
 	TEST_ASSERT_VAL("cannot find timed mapping for perf-test", al.map != NULL);
 	TEST_ASSERT_VAL("non matched timed mapping", al.map == old_map);
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index bbed90e5d9bb..2729239daf0b 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1010,7 +1010,7 @@ struct map *dso__new_map(const char *name)
 	struct dso *dso = dso__new(name);
 
 	if (dso)
-		map = map__new2(0, dso);
+		map = map__new2(0, dso, 0);
 
 	return map;
 }
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index dc46a7967e10..abf66c7dd26a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -854,7 +854,7 @@ static void dso__adjust_kmod_long_name(struct dso *dso, const char *filename)
 }
 
 struct map *machine__findnew_module_map(struct machine *machine, u64 start,
-					const char *filename)
+					const char *filename, u64 timestamp)
 {
 	struct map *map = NULL;
 	struct dso *dso = NULL;
@@ -878,7 +878,7 @@ struct map *machine__findnew_module_map(struct machine *machine, u64 start,
 	if (dso == NULL)
 		goto out;
 
-	map = map__new2(start, dso);
+	map = map__new2(start, dso, timestamp);
 	if (map == NULL)
 		goto out;
 
@@ -1050,7 +1050,7 @@ int machine__create_extra_kernel_map(struct machine *machine,
 	struct kmap *kmap;
 	struct map *map;
 
-	map = map__new2(xm->start, kernel);
+	map = map__new2(xm->start, kernel, 0);
 	if (!map)
 		return -1;
 
@@ -1176,7 +1176,7 @@ __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
 	/* In case of renewal the kernel map, destroy previous one */
 	machine__destroy_kernel_maps(machine);
 
-	machine->vmlinux_map = map__new2(0, kernel);
+	machine->vmlinux_map = map__new2(0, kernel, 0);
 	if (machine->vmlinux_map == NULL)
 		return -1;
 
@@ -1463,7 +1463,7 @@ static int machine__create_module(void *arg, const char *name, u64 start,
 	if (arch__fix_module_text_start(&start, name) < 0)
 		return -1;
 
-	map = machine__findnew_module_map(machine, start, name);
+	map = machine__findnew_module_map(machine, start, name, 0);
 	if (map == NULL)
 		return -1;
 	map->end = start + size;
@@ -1608,7 +1608,8 @@ static int machine__process_extra_kernel_map(struct machine *machine,
 }
 
 static int machine__process_kernel_mmap_event(struct machine *machine,
-					      union perf_event *event)
+					      union perf_event *event,
+					      u64 timestamp)
 {
 	struct map *map;
 	enum dso_kernel_type kernel_type;
@@ -1629,7 +1630,8 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 	if (event->mmap.filename[0] == '/' ||
 	    (!is_kernel_mmap && event->mmap.filename[0] == '[')) {
 		map = machine__findnew_module_map(machine, event->mmap.start,
-						  event->mmap.filename);
+						  event->mmap.filename,
+						  timestamp);
 		if (map == NULL)
 			goto out_problem;
 
@@ -1731,7 +1733,8 @@ int machine__process_mmap2_event(struct machine *machine,
 
 	if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
 	    sample->cpumode == PERF_RECORD_MISC_KERNEL) {
-		ret = machine__process_kernel_mmap_event(machine, event);
+		ret = machine__process_kernel_mmap_event(machine, event,
+							 sample->time);
 		if (ret < 0)
 			goto out_problem;
 		return 0;
@@ -1749,7 +1752,8 @@ int machine__process_mmap2_event(struct machine *machine,
 			event->mmap2.ino_generation,
 			event->mmap2.prot,
 			event->mmap2.flags,
-			event->mmap2.filename, thread);
+			event->mmap2.filename, thread,
+			sample->time);
 
 	if (map == NULL)
 		goto out_problem_map;
@@ -1784,7 +1788,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 
 	if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
 	    sample->cpumode == PERF_RECORD_MISC_KERNEL) {
-		ret = machine__process_kernel_mmap_event(machine, event);
+		ret = machine__process_kernel_mmap_event(machine, event,
+							 sample->time);
 		if (ret < 0)
 			goto out_problem;
 		return 0;
@@ -1802,7 +1807,7 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 			event->mmap.len, event->mmap.pgoff,
 			0, 0, 0, 0, prot, 0,
 			event->mmap.filename,
-			thread);
+			thread, sample->time);
 
 	if (map == NULL)
 		goto out_problem_map;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 9aed55d9facc..733d5814542b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -221,7 +221,7 @@ struct symbol *machine__find_kernel_symbol_by_name(struct machine *machine,
 }
 
 struct map *machine__findnew_module_map(struct machine *machine, u64 start,
-					const char *filename);
+					const char *filename, u64 timestamp);
 int arch__fix_module_text_start(u64 *start, const char *name);
 
 int machine__load_kallsyms(struct machine *machine, const char *filename);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 6d6a0f65a9a0..2821919156c9 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -124,7 +124,8 @@ static inline bool replace_android_lib(const char *filename, char *newfilename)
 	return false;
 }
 
-void map__init(struct map *map, u64 start, u64 end, u64 pgoff, struct dso *dso)
+void map__init(struct map *map, u64 start, u64 end, u64 pgoff,
+	       struct dso *dso, u64 timestamp)
 {
 	map->start    = start;
 	map->end      = end;
@@ -137,12 +138,13 @@ void map__init(struct map *map, u64 start, u64 end, u64 pgoff, struct dso *dso)
 	map->groups   = NULL;
 	map->erange_warned = false;
 	refcount_set(&map->refcnt, 1);
+	map->timestamp = timestamp;
 }
 
 struct map *map__new(struct machine *machine, u64 start, u64 len,
 		     u64 pgoff, u32 d_maj, u32 d_min, u64 ino,
 		     u64 ino_gen, u32 prot, u32 flags, char *filename,
-		     struct thread *thread)
+		     struct thread *thread, u64 timestamp)
 {
 	struct map *map = malloc(sizeof(*map));
 	struct nsinfo *nsi = NULL;
@@ -196,7 +198,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 		if (dso == NULL)
 			goto out_delete;
 
-		map__init(map, start, start + len, pgoff, dso);
+		map__init(map, start, start + len, pgoff, dso, timestamp);
 
 		if (anon || no_dso) {
 			map->map_ip = map->unmap_ip = identity__map_ip;
@@ -224,7 +226,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
  * they are loaded) and for vmlinux, where only after we load all the
  * symbols we'll know where it starts and ends.
  */
-struct map *map__new2(u64 start, struct dso *dso)
+struct map *map__new2(u64 start, struct dso *dso, u64 timestamp)
 {
 	struct map *map = calloc(1, (sizeof(*map) +
 				     (dso->kernel ? sizeof(struct kmap) : 0)));
@@ -232,7 +234,7 @@ struct map *map__new2(u64 start, struct dso *dso)
 		/*
 		 * ->end will be filled after we load all the symbols
 		 */
-		map__init(map, start, 0, 0, dso);
+		map__init(map, start, 0, 0, dso, timestamp);
 	}
 
 	return map;
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index fb5f40fea2e3..0d35064cf813 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -36,6 +36,7 @@ struct map {
 	u32			maj, min; /* only valid for MMAP2 record */
 	u64			ino;      /* only valid for MMAP2 record */
 	u64			ino_generation;/* only valid for MMAP2 record */
+	u64			timestamp;
 
 	/* ip -> dso rip */
 	u64			(*map_ip)(struct map *, u64);
@@ -142,12 +143,14 @@ struct thread;
 	__map__for_each_symbol_by_name(map, sym_name, (pos))
 
 void map__init(struct map *map,
-	       u64 start, u64 end, u64 pgoff, struct dso *dso);
+	       u64 start, u64 end, u64 pgoff, struct dso *dso,
+	       u64 timestamp);
 struct map *map__new(struct machine *machine, u64 start, u64 len,
 		     u64 pgoff, u32 d_maj, u32 d_min, u64 ino,
 		     u64 ino_gen, u32 prot, u32 flags,
-		     char *filename, struct thread *thread);
-struct map *map__new2(u64 start, struct dso *dso);
+		     char *filename, struct thread *thread,
+		     u64 timestamp);
+struct map *map__new2(u64 start, struct dso *dso, u64 timestamp);
 void map__delete(struct map *map);
 struct map *map__clone(struct map *map);
 
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 29770ea61768..7214942e1ddf 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -870,7 +870,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
 		curr_dso->kernel = dso->kernel;
 		curr_dso->long_name = dso->long_name;
 		curr_dso->long_name_len = dso->long_name_len;
-		curr_map = map__new2(start, curr_dso);
+		curr_map = map__new2(start, curr_dso, map->timestamp);
 		dso__put(curr_dso);
 		if (curr_map == NULL)
 			return -1;
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index d188b7588152..91252507b3ab 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -832,7 +832,7 @@ static int map_groups__split_kallsyms(struct map_groups *kmaps, struct dso *dso,
 
 			ndso->kernel = dso->kernel;
 
-			curr_map = map__new2(pos->start, ndso);
+			curr_map = map__new2(pos->start, ndso, 0);
 			if (curr_map == NULL) {
 				dso__put(ndso);
 				return -1;
@@ -1139,7 +1139,7 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
 	struct kcore_mapfn_data *md = data;
 	struct map *map;
 
-	map = map__new2(start, md->dso);
+	map = map__new2(start, md->dso, 0);
 	if (map == NULL)
 		return -ENOMEM;
 
-- 
2.17.1


* [PATCH 26/48] perf tools: Introduce map_groups__{insert,find}_by_time()
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

These functions manage maps by timestamp so that the correct map/symbol
can be found for a sample taken at a given time.  With this API, a
map_groups can hold overlapping maps.
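
The ordering and lookup rules can be sketched with a sorted array standing in for the rb-tree (`struct tmap`, `struct tmaps`, `tmaps_insert()` and `tmaps_find_by_time()` are hypothetical names). Entries are ordered by start address and, for equal starts, newest first, mirroring __maps__insert_by_time(); the lookup returns the newest map that covers the address and was created at or before the sample time, the rule maps__find_by_time() applies. For brevity this sketch scans linearly instead of walking a tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct map: [start, end) plus creation time. */
struct tmap {
	uint64_t start, end;
	uint64_t timestamp;
	const char *name;
};

#define TMAPS_MAX 16

/* Hypothetical container; perf keeps maps in an rb-tree instead. */
struct tmaps {
	struct tmap entries[TMAPS_MAX];
	size_t nr;
};

/*
 * Keep entries sorted by start address; for equal starts, a newer map
 * goes before older ones and after maps with an equal or newer time.
 */
static void tmaps_insert(struct tmaps *maps, struct tmap m)
{
	size_t i = 0;

	assert(maps->nr < TMAPS_MAX);
	while (i < maps->nr &&
	       (maps->entries[i].start < m.start ||
		(maps->entries[i].start == m.start &&
		 maps->entries[i].timestamp >= m.timestamp)))
		i++;
	memmove(&maps->entries[i + 1], &maps->entries[i],
		(maps->nr - i) * sizeof(m));
	maps->entries[i] = m;
	maps->nr++;
}

/* Newest map covering ip with timestamp <= ts, or NULL if none. */
static const struct tmap *tmaps_find_by_time(const struct tmaps *maps,
					     uint64_t ip, uint64_t ts)
{
	const struct tmap *best = NULL;

	for (size_t i = 0; i < maps->nr; i++) {
		const struct tmap *m = &maps->entries[i];

		if (ip < m->start || ip >= m->end || m->timestamp > ts)
			continue;
		if (!best || m->timestamp > best->timestamp)
			best = m;
	}
	return best;
}
```

With the two maps from the thread-mg-time test (perf created at time 0 and perf-test at time 10000, both starting at 0x40000), a lookup at time 5000 returns the first map and a lookup at time 20000 returns the second.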

Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-3bzcl4dzqh6qiqaddo5gco4y@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/map.c | 64 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/map.h | 24 ++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 2821919156c9..4135a22091fe 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -825,6 +825,41 @@ void maps__insert(struct maps *maps, struct map *map)
 	up_write(&maps->lock);
 }
 
+static void __maps__insert_by_time(struct maps *maps, struct map *map)
+{
+	struct rb_node **p = &maps->entries.rb_node;
+	struct rb_node *parent = NULL;
+	const u64 ip = map->start;
+	const u64 timestamp = map->timestamp;
+	struct map *m;
+
+	while (*p != NULL) {
+		parent = *p;
+		m = rb_entry(parent, struct map, rb_node);
+		if (ip < m->start)
+			p = &(*p)->rb_left;
+		else if (ip > m->start)
+			p = &(*p)->rb_right;
+		else if (timestamp > m->timestamp)
+			p = &(*p)->rb_left;
+		else if (timestamp <= m->timestamp)
+			p = &(*p)->rb_right;
+		else
+			BUG_ON(1);
+	}
+
+	rb_link_node(&map->rb_node, parent, p);
+	rb_insert_color(&map->rb_node, &maps->entries);
+	map__get(map);
+}
+
+void maps__insert_by_time(struct maps *maps, struct map *map)
+{
+	down_write(&maps->lock);
+	__maps__insert_by_time(maps, map);
+	up_write(&maps->lock);
+}
+
 static void __maps__remove(struct maps *maps, struct map *map)
 {
 	rb_erase_init(&map->rb_node, &maps->entries);
@@ -863,6 +898,35 @@ struct map *maps__find(struct maps *maps, u64 ip)
 	return m;
 }
 
+struct map *maps__find_by_time(struct maps *maps, u64 ip, u64 timestamp)
+{
+	struct rb_node **p;
+	struct rb_node *parent = NULL;
+	struct map *m;
+	struct map *best = NULL;
+
+	down_read(&maps->lock);
+
+	p = &maps->entries.rb_node;
+	while (*p != NULL) {
+		parent = *p;
+		m = rb_entry(parent, struct map, rb_node);
+		if (ip < m->start)
+			p = &(*p)->rb_left;
+		else if (ip >= m->end)
+			p = &(*p)->rb_right;
+		else if (timestamp >= m->timestamp) {
+			if (!best || best->timestamp < m->timestamp)
+				best = m;
+			p = &(*p)->rb_left;
+		} else
+			p = &(*p)->rb_right;
+	}
+
+	up_read(&maps->lock);
+	return best;
+}
+
 struct map *maps__first(struct maps *maps)
 {
 	struct rb_node *first = rb_first(&maps->entries);
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 0d35064cf813..02c6f6962eb1 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -13,6 +13,8 @@
 #include <linux/types.h>
 #include "rwsem.h"
 
+#include "perf.h"  /* for perf_has_index */
+
 struct dso;
 struct ip_callchain;
 struct ref_reloc_sym;
@@ -186,8 +188,10 @@ void map__fixup_end(struct map *map);
 void map__reloc_vmlinux(struct map *map);
 
 void maps__insert(struct maps *maps, struct map *map);
+void maps__insert_by_time(struct maps *maps, struct map *map);
 void maps__remove(struct maps *maps, struct map *map);
 struct map *maps__find(struct maps *maps, u64 addr);
+struct map *maps__find_by_time(struct maps *maps, u64 addr, u64 timestamp);
 struct map *maps__first(struct maps *maps);
 struct map *map__next(struct map *map);
 struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name,
@@ -207,6 +211,17 @@ static inline void map_groups__insert(struct map_groups *mg, struct map *map)
 	map->groups = mg;
 }
 
+static inline void map_groups__insert_by_time(struct map_groups *mg,
+					      struct map *map)
+{
+	if (perf_has_index)
+		maps__insert_by_time(&mg->maps, map);
+	else
+		maps__insert(&mg->maps, map);
+
+	map->groups = mg;
+}
+
 static inline void map_groups__remove(struct map_groups *mg, struct map *map)
 {
 	maps__remove(&mg->maps, map);
@@ -219,6 +234,15 @@ static inline struct map *map_groups__find(struct map_groups *mg, u64 addr)
 
 struct map *map_groups__first(struct map_groups *mg);
 
+static inline struct map *map_groups__find_by_time(struct map_groups *mg,
+						   u64 addr, u64 timestamp)
+{
+	if (!perf_has_index)
+		return maps__find(&mg->maps, addr);
+
+	return maps__find_by_time(&mg->maps, addr, timestamp);
+}
+
 static inline struct map *map_groups__next(struct map *map)
 {
 	return map__next(map);
-- 
2.17.1


* [PATCH 27/48] perf tools: Use map_groups__find_addr_by_time()
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

Use the sample timestamp to find the corresponding map, so that the
matching symbol can eventually be found.

Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-v4i4q9u41ab4hcjispe9qjzk@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/event.c  | 83 +++++++++++++++++++++++++++++++++++-----
 tools/perf/util/thread.c |  8 +++-
 2 files changed, 79 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 74d20056b860..64ba909aeecb 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1516,12 +1516,11 @@ int perf_event__process(struct perf_tool *tool __maybe_unused,
 	return machine__process_event(machine, event, sample);
 }
 
-static
-struct map *map_groups__find_map(struct map_groups *mg, u8 cpumode,
-				 u64 addr, struct addr_location *al)
+static bool map_groups__set_addr_location(struct map_groups *mg,
+					  struct addr_location *al,
+					  u8 cpumode, u64 addr)
 {
 	struct machine *machine = mg->machine;
-	bool load_map = false;
 
 	al->machine = machine;
 	al->addr = addr;
@@ -1530,21 +1529,17 @@ struct map *map_groups__find_map(struct map_groups *mg, u8 cpumode,
 
 	if (machine == NULL) {
 		al->map = NULL;
-		return NULL;
+		return true;
 	}
 
 	BUG_ON(mg == NULL);
 
 	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
-		mg = &machine->kmaps;
-		load_map = true;
 	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
 		al->level = '.';
 	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
 		al->level = 'g';
-		mg = &machine->kmaps;
-		load_map = true;
 	} else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
 		al->level = 'u';
 	} else {
@@ -1560,8 +1555,27 @@ struct map *map_groups__find_map(struct map_groups *mg, u8 cpumode,
 			!perf_host)
 			al->filtered |= (1 << HIST_FILTER__HOST);
 
+		return true;
+	}
+	return false;
+}
+
+static
+struct map *map_groups__find_map(struct map_groups *mg, u8 cpumode,
+				 u64 addr, struct addr_location *al)
+{
+	struct machine *machine = mg->machine;
+	bool load_map = false;
+
+	if (map_groups__set_addr_location(mg, al, cpumode, addr))
 		return NULL;
+
+	if ((cpumode == PERF_RECORD_MISC_KERNEL && perf_host) ||
+	    (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest)) {
+		mg = &machine->kmaps;
+		load_map = true;
 	}
+
 try_again:
 	al->map = map_groups__find(mg, al->addr);
 	if (al->map == NULL) {
@@ -1594,6 +1608,55 @@ struct map *map_groups__find_map(struct map_groups *mg, u8 cpumode,
 	return al->map;
 }
 
+static
+struct map *map_groups__find_map_by_time(struct map_groups *mg, u8 cpumode,
+					 u64 addr, struct addr_location *al,
+					 u64 timestamp)
+{
+	struct machine *machine = mg->machine;
+	bool load_map = false;
+
+	if (map_groups__set_addr_location(mg, al, cpumode, addr))
+		return NULL;
+
+	if ((cpumode == PERF_RECORD_MISC_KERNEL && perf_host) ||
+	    (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest)) {
+		mg = &machine->kmaps;
+		load_map = true;
+	}
+
+try_again:
+	al->map = map_groups__find_by_time(mg, al->addr, timestamp);
+	if (al->map == NULL) {
+		/*
+		 * If this is outside of all known maps, and is a negative
+		 * address, try to look it up in the kernel dso, as it might be
+		 * a vsyscall or vdso (which executes in user-mode).
+		 *
+		 * XXX This is nasty, we should have a symbol list in the
+		 * "[vdso]" dso, but for now lets use the old trick of looking
+		 * in the whole kernel symbol list.
+		 */
+		if (cpumode == PERF_RECORD_MISC_USER && machine &&
+		    mg != &machine->kmaps &&
+		    machine__kernel_ip(machine, al->addr)) {
+			mg = &machine->kmaps;
+			load_map = true;
+			goto try_again;
+		}
+	} else {
+		/*
+		 * Kernel maps might be changed when loading symbols so loading
+		 * must be done prior to using kernel maps.
+		 */
+		if (load_map)
+			map__load(al->map);
+		al->addr = al->map->map_ip(al->map, al->addr);
+	}
+
+	return al->map;
+}
+
 struct map *thread__find_map(struct thread *thread, u8 cpumode,
 			     u64 addr, struct addr_location *al)
 {
@@ -1613,7 +1676,7 @@ struct map *thread__find_map_by_time(struct thread *thread, u8 cpumode,
 		mg = thread->mg;
 
 	al->thread = thread;
-	return map_groups__find_map(mg, cpumode, addr, al);
+	return map_groups__find_map_by_time(mg, cpumode, addr, al, timestamp);
 }
 
 struct symbol *thread__find_symbol(struct thread *thread, u8 cpumode,
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 8a0b27202ab7..491761752ac6 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -449,8 +449,12 @@ int thread__insert_map(struct thread *thread, struct map *map)
 	if (ret)
 		return ret;
 
-	map_groups__fixup_overlappings(thread->mg, map, stderr);
-	map_groups__insert(thread->mg, map);
+	if (perf_has_index) {
+		map_groups__insert_by_time(thread->mg, map);
+	} else {
+		map_groups__fixup_overlappings(thread->mg, map, stderr);
+		map_groups__insert(thread->mg, map);
+	}
 
 	return 0;
 }
-- 
2.17.1



* [PATCH 28/48] perf tools: Add testcase for managing maps with time
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (26 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 27/48] perf tools: Use map_groups__find_addr_by_time() Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 29/48] perf callchain: Maintain libunwind's address space in map_groups Jiri Olsa
                   ` (21 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

From: Namhyung Kim <namhyung@kernel.org>

Test that the new map_groups__{insert,find}_by_time() API works
correctly, using 3 * 100 maps.

Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-h0fna591s3uf1zyaftvos9hj@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/tests/Build             |  1 +
 tools/perf/tests/builtin-test.c    |  4 ++
 tools/perf/tests/tests.h           |  1 +
 tools/perf/tests/thread-map-time.c | 90 ++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+)
 create mode 100644 tools/perf/tests/thread-map-time.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index d14a532f87f1..edaec95c8812 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -28,6 +28,7 @@ perf-y += thread-comm.o
 perf-y += thread-mg-share.o
 perf-y += thread-lookup-time.o
 perf-y += thread-mg-time.o
+perf-y += thread-map-time.o
 perf-y += switch-tracking.o
 perf-y += keep-tracking.o
 perf-y += code-reading.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index ee58cb40ebc2..fe8ef107e61b 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -291,6 +291,10 @@ static struct test generic_tests[] = {
 		.desc = "Test thread map group handling with time",
 		.func = test__thread_mg_time,
 	},
+	{
+		.desc = "Test thread map lookup with time",
+		.func = test__thread_map_lookup_time,
+	},
 	{
 		.func = NULL,
 	},
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index cd7cca258398..3c831a08f37a 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -107,6 +107,7 @@ int test__mem2node(struct test *t, int subtest);
 int test__thread_comm(struct test *test, int subtest);
 int test__thread_lookup_time(struct test *test, int subtest);
 int test__thread_mg_time(struct test *test, int subtest);
+int test__thread_map_lookup_time(struct test *test, int subtest);
 
 bool test__bp_signal_is_supported(void);
 
diff --git a/tools/perf/tests/thread-map-time.c b/tools/perf/tests/thread-map-time.c
new file mode 100644
index 000000000000..f1e59944268c
--- /dev/null
+++ b/tools/perf/tests/thread-map-time.c
@@ -0,0 +1,90 @@
+#include <linux/compiler.h>
+#include "debug.h"
+#include "tests.h"
+#include "machine.h"
+#include "thread.h"
+#include "map.h"
+
+#define PERF_MAP_START  0x40000
+#define LIBC_MAP_START  0x80000
+#define VDSO_MAP_START  0x7F000
+
+#define NR_MAPS  100
+
+static int lookup_maps(struct map_groups *mg)
+{
+	struct map *map;
+	int i, ret = -1;
+	size_t n;
+	struct {
+		const char *path;
+		u64 start;
+	} maps[] = {
+		{ "/usr/bin/perf",	PERF_MAP_START },
+		{ "/usr/lib/libc.so",	LIBC_MAP_START },
+		{ "[vdso]",		VDSO_MAP_START },
+	};
+
+	/* this is needed to insert/find map by time */
+	perf_has_index = true;
+
+	for (n = 0; n < ARRAY_SIZE(maps); n++) {
+		for (i = 0; i < NR_MAPS; i++) {
+			map = map__new2(maps[n].start, dso__new(maps[n].path),
+					i * 10000);
+			if (map == NULL) {
+				pr_debug("memory allocation failed\n");
+				goto out;
+			}
+
+			map->end = map->start + 0x1000;
+			map_groups__insert_by_time(mg, map);
+		}
+	}
+
+	if (verbose > 1)
+		map_groups__fprintf(mg, stderr);
+
+	for (n = 0; n < ARRAY_SIZE(maps); n++) {
+		for (i = 0; i < NR_MAPS; i++) {
+			u64 timestamp = i * 10000;
+
+			map = map_groups__find_by_time(mg, maps[n].start,
+						       timestamp);
+
+			TEST_ASSERT_VAL("cannot find map", map);
+			TEST_ASSERT_VAL("addr not matched",
+					map->start == maps[n].start);
+			TEST_ASSERT_VAL("pathname not matched",
+					!strcmp(map->dso->name, maps[n].path));
+			TEST_ASSERT_VAL("timestamp not matched",
+					map->timestamp == timestamp);
+		}
+	}
+
+	ret = 0;
+out:
+	return ret;
+}
+
+/*
+ * This test creates a large number of overlapping maps with increasing
+ * timestamps and finds each map based on its timestamp.
+ */
+int test__thread_map_lookup_time(struct test *test __maybe_unused, int subtest __maybe_unused)
+{
+	struct machines machines;
+	struct machine *machine;
+	struct thread *t;
+	int ret;
+
+	machines__init(&machines);
+	machine = &machines.host;
+
+	t = machine__findnew_thread(machine, 0, 0);
+
+	ret = lookup_maps(t->mg);
+
+	machine__delete_threads(machine);
+	return ret;
+}
-- 
2.17.1



* [PATCH 29/48] perf callchain: Maintain libunwind's address space in map_groups
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (27 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 28/48] perf tools: Add testcase for managing maps with time Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-14 18:15   ` Arnaldo Carvalho de Melo
  2018-09-13 12:54 ` [PATCH 30/48] perf tools: Rename perf_evlist__munmap_filtered to perf_mmap__put_filtered Jiri Olsa
                   ` (20 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

Currently the address_space is kept in the thread struct, but it is
more appropriate to keep it in map_groups, since map_groups are
maintained across exec()s with timestamps.  Also, we should not flush
the address space after exec, since it can still be accessed when used
with an indexed data file.
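The reasoning can be illustrated with a hypothetical model: each exec gives a thread a new map_groups with its own timestamp and its own unwind address space, and the unwinder picks the map_groups that was current at the sample's time, as the get_entries() change in this patch does via thread__get_map_groups(). Because the old address space is kept rather than flushed at exec, a sample recorded before the exec but processed later from an indexed data file still finds a valid address space. The structs and `mg_for_time()` are illustrative assumptions, not perf code, and `addr_space` is a plain integer tag here rather than a libunwind `unw_addr_space_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one map_groups generation owning its unwind state. */
struct mg_model {
	uint64_t timestamp;  /* time this map_groups became current (exec) */
	int addr_space;      /* stand-in for a libunwind address space handle */
};

/* Hypothetical model of a thread: its map_groups, kept newest first. */
struct thread_model {
	struct mg_model *mgs;
	size_t nr_mgs;
};

/*
 * Return the map_groups that was current at @time: the first (newest)
 * entry whose timestamp is not later than @time, or NULL if the thread
 * had no map_groups yet at that time.
 */
static struct mg_model *mg_for_time(struct thread_model *t, uint64_t time)
{
	size_t i;

	assert(t->mgs != NULL || t->nr_mgs == 0);

	for (i = 0; i < t->nr_mgs; i++) {
		if (t->mgs[i].timestamp <= time)
			return &t->mgs[i];
	}
	return NULL;
}
```

In this model, a sample at time 100 taken before an exec at time 500 resolves to the pre-exec address space, even if it is processed after the post-exec samples.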

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-hjryh6x2yfnrz8g0djhez24z@git.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/map.h                    |  5 ++++-
 tools/perf/util/thread.h                 |  1 -
 tools/perf/util/unwind-libunwind-local.c | 28 ++++++++++++++----------
 tools/perf/util/unwind-libunwind.c       |  9 ++++----
 tools/perf/util/unwind.h                 |  7 +++---
 5 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 02c6f6962eb1..b1efe57b8563 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -65,10 +65,13 @@ struct maps {
 
 struct map_groups {
 	struct maps	 maps;
-	struct machine	 *machine;
+	struct machine	*machine;
 	refcount_t	 refcnt;
 	u64		 timestamp;
 	struct list_head list;
+#ifdef HAVE_LIBUNWIND_SUPPORT
+	void		*addr_space;
+#endif
 };
 
 struct map_groups *map_groups__new(struct machine *machine);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 86186a0773a0..637775f622b3 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -40,7 +40,6 @@ struct thread {
 	struct thread_stack	*ts;
 	struct nsinfo		*nsinfo;
 #ifdef HAVE_LIBUNWIND_SUPPORT
-	void				*addr_space;
 	struct unwind_libunwind_ops	*unwind_libunwind_ops;
 #endif
 };
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index da6f39315b47..f7c921f87bcf 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -617,32 +617,35 @@ static unw_accessors_t accessors = {
 	.get_proc_name		= get_proc_name,
 };
 
-static int _unwind__prepare_access(struct thread *thread)
+static int _unwind__prepare_access(struct map_groups *mg)
 {
 	if (!dwarf_callchain_users)
 		return 0;
-	thread->addr_space = unw_create_addr_space(&accessors, 0);
-	if (!thread->addr_space) {
+
+	mg->addr_space = unw_create_addr_space(&accessors, 0);
+	if (!mg->addr_space) {
 		pr_err("unwind: Can't create unwind address space.\n");
 		return -ENOMEM;
 	}
 
-	unw_set_caching_policy(thread->addr_space, UNW_CACHE_GLOBAL);
+	unw_set_caching_policy(mg->addr_space, UNW_CACHE_GLOBAL);
 	return 0;
 }
 
-static void _unwind__flush_access(struct thread *thread)
+static void _unwind__flush_access(struct map_groups *mg)
 {
 	if (!dwarf_callchain_users)
 		return;
-	unw_flush_cache(thread->addr_space, 0, 0);
+
+	unw_flush_cache(mg->addr_space, 0, 0);
 }
 
-static void _unwind__finish_access(struct thread *thread)
+static void _unwind__finish_access(struct map_groups *mg)
 {
 	if (!dwarf_callchain_users)
 		return;
-	unw_destroy_addr_space(thread->addr_space);
+
+	unw_destroy_addr_space(mg->addr_space);
 }
 
 static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
@@ -650,7 +653,6 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
 {
 	u64 val;
 	unw_word_t ips[max_stack];
-	unw_addr_space_t addr_space;
 	unw_cursor_t c;
 	int ret, i = 0;
 
@@ -666,13 +668,15 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
 	 * unwind itself.
 	 */
 	if (max_stack - 1 > 0) {
+		struct map_groups *mg;
+
 		WARN_ONCE(!ui->thread, "WARNING: ui->thread is NULL");
-		addr_space = ui->thread->addr_space;
 
-		if (addr_space == NULL)
+		mg = thread__get_map_groups(ui->thread, ui->sample->time);
+		if (mg == NULL || mg->addr_space == NULL)
 			return -1;
 
-		ret = unw_init_remote(&c, addr_space, ui);
+		ret = unw_init_remote(&c, mg->addr_space, ui);
 		if (ret)
 			display_error(ret);
 
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index b029a5e9ae49..ce8408e460f2 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -18,12 +18,13 @@ static void unwind__register_ops(struct thread *thread,
 int unwind__prepare_access(struct thread *thread, struct map *map,
 			   bool *initialized)
 {
+	struct map_groups *mg = thread->mg;
 	const char *arch;
 	enum dso_type dso_type;
 	struct unwind_libunwind_ops *ops = local_unwind_libunwind_ops;
 	int err;
 
-	if (thread->addr_space) {
+	if (mg->addr_space) {
 		pr_debug("unwind: thread map already set, dso=%s\n",
 			 map->dso->name);
 		if (initialized)
@@ -56,7 +57,7 @@ int unwind__prepare_access(struct thread *thread, struct map *map,
 out_register:
 	unwind__register_ops(thread, ops);
 
-	err = thread->unwind_libunwind_ops->prepare_access(thread);
+	err = thread->unwind_libunwind_ops->prepare_access(thread->mg);
 	if (initialized)
 		*initialized = err ? false : true;
 	return err;
@@ -65,13 +66,13 @@ int unwind__prepare_access(struct thread *thread, struct map *map,
 void unwind__flush_access(struct thread *thread)
 {
 	if (thread->unwind_libunwind_ops)
-		thread->unwind_libunwind_ops->flush_access(thread);
+		thread->unwind_libunwind_ops->flush_access(thread->mg);
 }
 
 void unwind__finish_access(struct thread *thread)
 {
 	if (thread->unwind_libunwind_ops)
-		thread->unwind_libunwind_ops->finish_access(thread);
+		thread->unwind_libunwind_ops->finish_access(thread->mg);
 }
 
 int unwind__get_entries(unwind_entry_cb_t cb, void *arg,
diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h
index 8a44a1569a21..0f18a0858904 100644
--- a/tools/perf/util/unwind.h
+++ b/tools/perf/util/unwind.h
@@ -9,6 +9,7 @@ struct map;
 struct perf_sample;
 struct symbol;
 struct thread;
+struct map_groups;
 
 struct unwind_entry {
 	struct map	*map;
@@ -19,9 +20,9 @@ struct unwind_entry {
 typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry, void *arg);
 
 struct unwind_libunwind_ops {
-	int (*prepare_access)(struct thread *thread);
-	void (*flush_access)(struct thread *thread);
-	void (*finish_access)(struct thread *thread);
+	int (*prepare_access)(struct map_groups *mg);
+	void (*flush_access)(struct map_groups *mg);
+	void (*finish_access)(struct map_groups *mg);
 	int (*get_entries)(unwind_entry_cb_t cb, void *arg,
 			   struct thread *thread,
 			   struct perf_sample *data, int max_stack);
-- 
2.17.1



* [PATCH 30/48] perf tools: Rename perf_evlist__munmap_filtered to perf_mmap__put_filtered
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (28 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 29/48] perf callchain: Maintain libunwind's address space in map_groups Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 31/48] tools lib fd array: Introduce fdarray__add_clone function Jiri Olsa
                   ` (19 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Rename the function and make it non-static, since it will be used
outside the evlist scope.

Link: http://lkml.kernel.org/n/tip-0r9rtn1bii1iaggumlgkyxqk@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/evlist.c | 6 +++---
 tools/perf/util/evlist.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 2f094f3bf446..05f57814e9e5 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -503,8 +503,8 @@ int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
 	return __perf_evlist__add_pollfd(evlist, fd, NULL, POLLIN);
 }
 
-static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd,
-					 void *arg __maybe_unused)
+void perf_mmap__put_filtered(struct fdarray *fda, int fd,
+			     void *arg __maybe_unused)
 {
 	struct perf_mmap *map = fda->priv[fd].ptr;
 
@@ -515,7 +515,7 @@ static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd,
 int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask)
 {
 	return fdarray__filter(&evlist->pollfd, revents_and_mask,
-			       perf_evlist__munmap_filtered, NULL);
+			       perf_mmap__put_filtered, NULL);
 }
 
 int perf_evlist__poll(struct perf_evlist *evlist, int timeout)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index df5162c4292b..523bd68a78a3 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -316,4 +316,5 @@ bool perf_evlist__exclude_kernel(struct perf_evlist *evlist);
 
 void perf_evlist__force_leader(struct perf_evlist *evlist);
 
+void perf_mmap__put_filtered(struct fdarray *fda, int fd, void *arg);
 #endif /* __PERF_EVLIST_H */
-- 
2.17.1



* [PATCH 31/48] tools lib fd array: Introduce fdarray__add_clone function
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (29 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 30/48] perf tools: Rename perf_evlist__munmap_filtered to perf_mmap__put_filtered Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 32/48] tools lib subcmd: Add OPT_INTEGER_OPTARG|_SET options Jiri Olsa
                   ` (18 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add fdarray__add_clone() to copy (clone) a specific entry
from one fdarray struct into another.

It will be useful when separating event maps for
specific threads.

Link: http://lkml.kernel.org/n/tip-0r9rtn1bii1iaggumlgkyxqk@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/api/fd/array.c | 17 +++++++++++++++++
 tools/lib/api/fd/array.h |  1 +
 2 files changed, 18 insertions(+)

diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
index b0a035fc87b3..cd458b46f61b 100644
--- a/tools/lib/api/fd/array.c
+++ b/tools/lib/api/fd/array.c
@@ -84,6 +84,23 @@ int fdarray__add(struct fdarray *fda, int fd, short revents)
 	return pos;
 }
 
+int fdarray__add_clone(struct fdarray *fda, int pos, struct fdarray *base)
+{
+	struct pollfd *entry;
+	int npos;
+
+	if (pos >= base->nr)
+		return -EINVAL;
+
+	entry = &base->entries[pos];
+
+	npos = fdarray__add(fda, entry->fd, entry->events);
+	if (npos >= 0)
+		fda->priv[npos] = base->priv[pos];
+
+	return npos;
+}
+
 int fdarray__filter(struct fdarray *fda, short revents,
 		    void (*entry_destructor)(struct fdarray *fda, int fd, void *arg),
 		    void *arg)
diff --git a/tools/lib/api/fd/array.h b/tools/lib/api/fd/array.h
index b39557d1a88f..06e89d099b1e 100644
--- a/tools/lib/api/fd/array.h
+++ b/tools/lib/api/fd/array.h
@@ -34,6 +34,7 @@ struct fdarray *fdarray__new(int nr_alloc, int nr_autogrow);
 void fdarray__delete(struct fdarray *fda);
 
 int fdarray__add(struct fdarray *fda, int fd, short revents);
+int fdarray__add_clone(struct fdarray *fda, int pos, struct fdarray *base);
 int fdarray__poll(struct fdarray *fda, int timeout);
 int fdarray__filter(struct fdarray *fda, short revents,
 		    void (*entry_destructor)(struct fdarray *fda, int fd, void *arg),
-- 
2.17.1



* [PATCH 32/48] tools lib subcmd: Add OPT_INTEGER_OPTARG|_SET options
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (30 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 31/48] tools lib fd array: Introduce fdarray__add_clone function Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 33/48] perf tools: Move __perf_session__process_events args into struct Jiri Olsa
                   ` (17 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add the OPT_INTEGER_OPTARG and OPT_INTEGER_OPTARG_SET option
macros, which accept an optional integer argument; the _SET
variant also sets a bool flag when the option is given.
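A minimal sketch of the intended semantics, using the --threads option from the cover letter as the example: `--threads` alone stores a default value, `--threads=X` stores X, and in both cases the set flag records that the option was given. The parser below is a hypothetical standalone helper, not the subcmd parse-options code; the default value is passed in by the caller and is not meant to reflect what perf record uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for "--<name>" or "--<name>=<int>".
 * Returns 0 if @arg matched (updating *value and *set), 1 if it is a
 * different option, and -1 if the argument is not a valid integer.
 * No range checking is done in this sketch.
 */
static int parse_int_optarg_set(const char *arg, const char *name,
				int *value, bool *set, int defval)
{
	size_t len;
	const char *p;
	char *end;
	long v;

	assert(arg && name && value && set);
	len = strlen(name);

	if (strncmp(arg, "--", 2) != 0 || strncmp(arg + 2, name, len) != 0)
		return 1;

	p = arg + 2 + len;
	if (*p == '\0') {           /* bare option: use the default */
		*value = defval;
		*set = true;
		return 0;
	}
	if (*p != '=')
		return 1;           /* e.g. "--threads-foo" is another option */

	v = strtol(p + 1, &end, 10);
	if (end == p + 1 || *end != '\0')
		return -1;          /* empty value or trailing garbage */

	*value = (int)v;
	*set = true;
	return 0;
}
```

The set flag is what lets a caller distinguish "option not given" from "option given without a value", which a plain default value alone cannot express.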

Link: http://lkml.kernel.org/n/tip-d01dcbwqvrv5762salo84105@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/subcmd/parse-options.c | 2 ++
 tools/lib/subcmd/parse-options.h | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/tools/lib/subcmd/parse-options.c b/tools/lib/subcmd/parse-options.c
index cb7154eccbdc..e2f68edd76d2 100644
--- a/tools/lib/subcmd/parse-options.c
+++ b/tools/lib/subcmd/parse-options.c
@@ -250,6 +250,8 @@ static int get_value(struct parse_opt_ctx_t *p,
 			*(int *)opt->value = 0;
 			return 0;
 		}
+		if (opt->set)
+			*(bool *)opt->set = true;
 		if (opt->flags & PARSE_OPT_OPTARG && !p->opt) {
 			*(int *)opt->value = opt->defval;
 			return 0;
diff --git a/tools/lib/subcmd/parse-options.h b/tools/lib/subcmd/parse-options.h
index 92fdbe1519f6..e2edd6aada53 100644
--- a/tools/lib/subcmd/parse-options.h
+++ b/tools/lib/subcmd/parse-options.h
@@ -144,6 +144,15 @@ struct option {
 	  .value = check_vtype(v, const char **), .argh = (a), .help = (h), \
 	  .flags = PARSE_OPT_OPTARG, .defval = (intptr_t)(d), \
 	  .set = check_vtype(os, bool *)}
+#define OPT_INTEGER_OPTARG(s, l, v, a, h, d) \
+	{ .type = OPTION_INTEGER,  .short_name = (s), .long_name = (l), \
+	  .value = check_vtype(v, int *), .argh = (a), .help = (h), \
+	  .flags = PARSE_OPT_OPTARG, .defval = (intptr_t)(d) }
+#define OPT_INTEGER_OPTARG_SET(s, l, v, os, a, h, d) \
+	{ .type = OPTION_INTEGER, .short_name = (s), .long_name = (l), \
+	  .value = check_vtype(v, int *), .argh = (a), .help = (h), \
+	  .flags = PARSE_OPT_OPTARG, .defval = (intptr_t)(d), \
+	  .set = check_vtype(os, bool *)}
 #define OPT_STRING_NOEMPTY(s, l, v, a, h)   { .type = OPTION_STRING,  .short_name = (s), .long_name = (l), .value = check_vtype(v, const char **), .argh = (a), .help = (h), .flags = PARSE_OPT_NOEMPTY}
 #define OPT_DATE(s, l, v, h) \
 	{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), .value = (v), .argh = "time", .help = (h), .callback = parse_opt_approxidate_cb }
-- 
2.17.1



* [PATCH 33/48] perf tools: Move __perf_session__process_events args into struct
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (31 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 32/48] tools lib subcmd: Add OPT_INTEGER_OPTARG|_SET options Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 34/48] perf ui progress: Fix index progress display Jiri Olsa
                   ` (16 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

We will add yet another argument here, so put all of them
into a struct to make the code easier to read.

Link: http://lkml.kernel.org/n/tip-lt92qp4bs3igrh6patio0euy@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/session.c | 40 ++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 15314052084d..f21c209aeef1 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1796,6 +1796,12 @@ fetch_mmaped_event(struct perf_session *session,
 	return event;
 }
 
+struct process_args {
+	u64	data_offset;
+	u64	data_size;
+	u64	file_size;
+};
+
 /*
  * On 64bit we can mmap the data file in one go. No need for tiny mmap
  * slices. On 32bit we use 32MB.
@@ -1809,13 +1815,13 @@ fetch_mmaped_event(struct perf_session *session,
 #endif
 
 static int __perf_session__process_events(struct perf_session *session,
-					  u64 data_offset, u64 data_size,
-					  u64 file_size)
+					  struct process_args *args)
 {
 	struct ordered_events *oe = &session->ordered_events;
 	struct perf_tool *tool = session->tool;
 	int fd = perf_data__fd(session->data);
 	u64 head, page_offset, file_offset, file_pos, size;
+	u64 file_size = args->file_size;
 	int err, mmap_prot, mmap_flags, map_idx = 0;
 	size_t	mmap_size;
 	char *buf, *mmaps[NUM_MMAPS];
@@ -1825,15 +1831,15 @@ static int __perf_session__process_events(struct perf_session *session,
 
 	perf_tool__fill_defaults(tool);
 
-	page_offset = page_size * (data_offset / page_size);
+	page_offset = page_size * (args->data_offset / page_size);
 	file_offset = page_offset;
-	head = data_offset - page_offset;
+	head = args->data_offset - page_offset;
 
-	if (data_size == 0)
+	if (args->data_size == 0)
 		goto out;
 
-	if (data_offset + data_size < file_size)
-		file_size = data_offset + data_size;
+	if (args->data_offset + args->data_size < file_size)
+		file_size = args->data_offset + args->data_size;
 
 	ui_progress__init_size(&prog, file_size, "Processing events...");
 
@@ -1932,6 +1938,7 @@ static int __perf_session__process_events(struct perf_session *session,
 
 static int __perf_session__process_indexed_events(struct perf_session *session)
 {
+	struct process_args args;
 	struct perf_tool *tool = session->tool;
 	int err = 0, i;
 
@@ -1941,9 +1948,11 @@ static int __perf_session__process_indexed_events(struct perf_session *session)
 		if (!idx->size)
 			continue;
 
-		err = __perf_session__process_events(session, idx->offset,
-						     idx->size,
-						     idx->offset + idx->size);
+		args.data_offset = idx->offset;
+		args.data_size   = idx->size;
+		args.file_size   = idx->offset + idx->size;
+
+		err = __perf_session__process_events(session, &args);
 		if (err < 0)
 			break;
 	}
@@ -1956,9 +1965,9 @@ static int __perf_session__process_indexed_events(struct perf_session *session)
 
 int perf_session__process_events(struct perf_session *session)
 {
+	struct process_args args;
 	struct perf_tool *tool = session->tool;
 	struct perf_data *data = session->data;
-	u64 size = perf_data__size(data);
 	int err;
 
 	if (perf_session__register_idle_thread(session) < 0)
@@ -1969,10 +1978,11 @@ int perf_session__process_events(struct perf_session *session)
 	if (perf_has_index)
 		return __perf_session__process_indexed_events(session);
 
-	err = __perf_session__process_events(session,
-					     session->header.data_offset,
-					     session->header.data_size,
-					     size);
+	args.data_offset = session->header.data_offset;
+	args.data_size   = session->header.data_size;
+	args.file_size   = perf_data__size(data);
+
+	err = __perf_session__process_events(session, &args);
 
 	if (!tool->no_warn)
 		perf_session__warn_about_errors(session);
-- 
2.17.1



* [PATCH 34/48] perf ui progress: Fix index progress display
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (32 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 33/48] perf tools: Move __perf_session__process_events args into struct Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 35/48] perf tools: Add threads debug variable Jiri Olsa
                   ` (15 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Display a single progress bar sized by the total of all index
sections, instead of one progress bar per index.
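The idea can be sketched as: size one progress counter by the sum of all index section sizes (as get_index_size() does in this patch), then advance that same counter while each section is processed in turn, so the display reaches 100% only once every index is done. The `section` and `progress` types and the helpers are hypothetical simplifications of struct perf_file_section and struct ui_progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for perf_file_section and ui_progress. */
struct section {
	uint64_t offset;
	uint64_t size;
};

struct progress {
	uint64_t curr;
	uint64_t total;
};

/* Sum of all section sizes: the total for the single progress bar. */
static uint64_t total_size(const struct section *idx, size_t nr)
{
	uint64_t size = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		size += idx[i].size;
	return size;
}

/* Percentage completed, 0..100; an empty total counts as done. */
static unsigned int progress_percent(const struct progress *p)
{
	if (p->total == 0)
		return 100;
	return (unsigned int)(p->curr * 100 / p->total);
}

/* Process every section, advancing one shared progress counter. */
static void process_all(const struct section *idx, size_t nr,
			struct progress *p)
{
	size_t i;

	p->curr = 0;
	p->total = total_size(idx, nr);
	for (i = 0; i < nr; i++) {
		/* ... process the events stored in idx[i] here ... */
		p->curr += idx[i].size;
		assert(p->curr <= p->total);
	}
}
```

With sections of 300 and 100 bytes, the bar shows 75% after the first section instead of restarting from 0% for the second one.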

Link: http://lkml.kernel.org/n/tip-b1alb4i6urd623bcbdqni8xp@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/session.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f21c209aeef1..e00a5d7e521e 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1797,6 +1797,8 @@ fetch_mmaped_event(struct perf_session *session,
 }
 
 struct process_args {
+	struct ui_progress	prog;
+
 	u64	data_offset;
 	u64	data_size;
 	u64	file_size;
@@ -1826,7 +1828,6 @@ static int __perf_session__process_events(struct perf_session *session,
 	size_t	mmap_size;
 	char *buf, *mmaps[NUM_MMAPS];
 	union perf_event *event;
-	struct ui_progress prog;
 	s64 skip;
 
 	perf_tool__fill_defaults(tool);
@@ -1841,8 +1842,6 @@ static int __perf_session__process_events(struct perf_session *session,
 	if (args->data_offset + args->data_size < file_size)
 		file_size = args->data_offset + args->data_size;
 
-	ui_progress__init_size(&prog, file_size, "Processing events...");
-
 	mmap_size = MMAP_SIZE;
 	if (mmap_size > file_size) {
 		mmap_size = file_size;
@@ -1907,7 +1906,7 @@ static int __perf_session__process_events(struct perf_session *session,
 	head += size;
 	file_pos += size;
 
-	ui_progress__update(&prog, size);
+	ui_progress__update(&args->prog, size);
 
 	if (session_done())
 		goto out;
@@ -1936,12 +1935,29 @@ static int __perf_session__process_events(struct perf_session *session,
 	return err;
 }
 
+static u64 get_index_size(struct perf_session *session)
+{
+	u64 size = 0;
+	int i;
+
+	for (i = 0; i < (int)session->header.nr_index; i++) {
+		struct perf_file_section *idx = &session->header.index[i];
+
+		size += idx->size;
+	}
+
+	return size;
+}
+
 static int __perf_session__process_indexed_events(struct perf_session *session)
 {
 	struct process_args args;
 	struct perf_tool *tool = session->tool;
 	int err = 0, i;
 
+	ui_progress__init_size(&args.prog, get_index_size(session),
+			       "Processing events");
+
 	for (i = 0; i < (int)session->header.nr_index; i++) {
 		struct perf_file_section *idx = &session->header.index[i];
 
@@ -1982,6 +1998,9 @@ int perf_session__process_events(struct perf_session *session)
 	args.data_size   = session->header.data_size;
 	args.file_size   = perf_data__size(data);
 
+	ui_progress__init_size(&args.prog, args.file_size,
+			       "Processing events");
+
 	err = __perf_session__process_events(session, &args);
 
 	if (!tool->no_warn)
-- 
2.17.1
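
Before this patch, each indexed section went through __perf_session__process_events(), which re-initialized its own progress object, so the bar restarted for every section. The patch initializes one shared progress object, sized to the sum of all index sizes. The sketch below shows that idea in isolation. struct file_section and the progress_* helpers are hypothetical stand-ins for perf_file_section and the ui_progress API, not perf code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's struct perf_file_section. */
struct file_section {
	uint64_t offset;
	uint64_t size;
};

/* Sum of all indexed section sizes: this is the progress bar's 100% mark. */
static uint64_t index_total_size(const struct file_section *idx, size_t nr)
{
	uint64_t size = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		size += idx[i].size;
	return size;
}

/* Hypothetical progress counter shared by all sections. */
struct progress {
	uint64_t curr;
	uint64_t total;
};

static void progress_init(struct progress *p, uint64_t total)
{
	p->curr = 0;
	p->total = total;
}

static void progress_update(struct progress *p, uint64_t adv)
{
	p->curr += adv;
	assert(p->curr <= p->total);	/* one init covers every section */
}

/* Percentage done, 0..100; an empty file counts as complete. */
static unsigned progress_percent(const struct progress *p)
{
	return p->total ? (unsigned)(p->curr * 100 / p->total) : 100;
}
```

Initializing once, outside the per-section loop, is what keeps the percentage monotonic across sections.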


* [PATCH 35/48] perf tools: Add threads debug variable
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (33 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 34/48] perf ui progress: Fix index progress display Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 36/48] perf tools: Add perf_mmap__read_tail function Jiri Olsa
                   ` (14 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add a 'threads' debug variable to separate thread debug
messages from the global verbose output. To enable it, type:

  $ perf --debug threads=X record ...

where X is the debug level.

Link: http://lkml.kernel.org/n/tip-tgpv7s1pjnxgfw7c90mg5tpl@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Documentation/perf.txt | 1 +
 tools/perf/util/debug.c           | 2 ++
 tools/perf/util/debug.h           | 1 +
 3 files changed, 4 insertions(+)

diff --git a/tools/perf/Documentation/perf.txt b/tools/perf/Documentation/perf.txt
index 864e37597252..5ae38692af64 100644
--- a/tools/perf/Documentation/perf.txt
+++ b/tools/perf/Documentation/perf.txt
@@ -22,6 +22,7 @@ OPTIONS
 	  verbose          - general debug messages
 	  ordered-events   - ordered events object debug messages
 	  data-convert     - data convert command debug messages
+	  threads          - threads debug/stats messages
 
 --buildid-dir::
 	Setup buildid cache directory. It has higher priority than
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 3d6459626c2a..10425630cb71 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -28,6 +28,7 @@ bool dump_trace = false, quiet = false;
 int debug_ordered_events;
 static int redirect_to_stderr;
 int debug_data_convert;
+int debug_threads;
 
 int veprintf(int level, int var, const char *fmt, va_list args)
 {
@@ -180,6 +181,7 @@ static struct debug_variable {
 	{ .name = "ordered-events",	.ptr = &debug_ordered_events},
 	{ .name = "stderr",		.ptr = &redirect_to_stderr},
 	{ .name = "data-convert",	.ptr = &debug_data_convert },
+	{ .name = "threads",		.ptr = &debug_threads },
 	{ .name = NULL, }
 };
 
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index 77445dfc5c7d..96665c66057b 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -15,6 +15,7 @@ extern int verbose;
 extern bool quiet, dump_trace;
 extern int debug_ordered_events;
 extern int debug_data_convert;
+extern int debug_threads;
 
 #ifndef pr_fmt
 #define pr_fmt(fmt) fmt
-- 
2.17.1
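
The new variable only needs a table entry because the --debug option looks names up in the debug_variables[] table. The sketch below is a minimal, hypothetical parser for a single "name[=level]" token against a table of the same shape. set_debug_variable() is illustrative only, and the rule that a bare name means level 1 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int debug_ordered_events;
static int debug_threads;

/* Same shape as the name -> variable table in util/debug.c. */
static struct debug_variable {
	const char *name;
	int *ptr;
} debug_variables[] = {
	{ .name = "ordered-events", .ptr = &debug_ordered_events },
	{ .name = "threads",        .ptr = &debug_threads },
	{ .name = NULL, }
};

/*
 * Hypothetical parser for one "name[=level]" token; a bare name sets
 * level 1. Returns 0 on success, -1 for an unknown name.
 */
static int set_debug_variable(const char *str)
{
	const char *eq;
	size_t len;
	int i;

	assert(str != NULL);
	eq = strchr(str, '=');
	len = eq ? (size_t)(eq - str) : strlen(str);

	for (i = 0; debug_variables[i].name; i++) {
		const char *name = debug_variables[i].name;

		/* Match the whole name, not just a prefix of it. */
		if (strlen(name) == len && !strncmp(name, str, len)) {
			*debug_variables[i].ptr = eq ? atoi(eq + 1) : 1;
			return 0;
		}
	}
	return -1;
}
```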


* [PATCH 36/48] perf tools: Add perf_mmap__read_tail function
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (34 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 35/48] perf tools: Add threads debug variable Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 37/48] perf record: Introduce struct record_thread Jiri Olsa
                   ` (13 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

It will be used in the following patches.

Link: http://lkml.kernel.org/n/tip-4id1fjlu5ypfqnu9kvpo7l3z@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/mmap.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index bad05b12b9df..eb39d3f85b93 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -79,6 +79,13 @@ static inline u64 perf_mmap__read_head(struct perf_mmap *mm)
 	return head;
 }
 
+static inline u64 perf_mmap__read_tail(struct perf_mmap *md)
+{
+	struct perf_event_mmap_page *pc = md->base;
+
+	return pc->data_tail;
+}
+
 static inline void perf_mmap__write_tail(struct perf_mmap *md, u64 tail)
 {
 	struct perf_event_mmap_page *pc = md->base;
-- 
2.17.1
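
perf_mmap__read_tail() returns data_tail, the consumer's position in the ring buffer. The kernel advances data_head and user space advances data_tail, so head minus tail is the number of unread bytes. The sketch below models that relationship. struct mmap_page is a hypothetical subset of struct perf_event_mmap_page. perf uses its own barrier macros here (perf_mmap__write_tail() issues mb() before storing the tail), so the GCC/Clang __atomic builtins are an assumption of this sketch, not perf's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the kernel's struct perf_event_mmap_page. */
struct mmap_page {
	uint64_t data_head;	/* advanced by the kernel (producer) */
	uint64_t data_tail;	/* advanced by user space (consumer) */
};

static uint64_t read_head(struct mmap_page *pc)
{
	/* Acquire: record data is visible once we see the new head. */
	return __atomic_load_n(&pc->data_head, __ATOMIC_ACQUIRE);
}

static uint64_t read_tail(struct mmap_page *pc)
{
	/* Only the consumer writes the tail, so a relaxed load suffices. */
	return __atomic_load_n(&pc->data_tail, __ATOMIC_RELAXED);
}

static void write_tail(struct mmap_page *pc, uint64_t tail)
{
	/* Release: our reads of the data finish before the kernel may reuse it. */
	__atomic_store_n(&pc->data_tail, tail, __ATOMIC_RELEASE);
}

/* Bytes produced but not yet consumed; head and tail only ever grow. */
static uint64_t bytes_pending(struct mmap_page *pc)
{
	return read_head(pc) - read_tail(pc);
}
```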


* [PATCH 37/48] perf record: Introduce struct record_thread
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (35 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 36/48] perf tools: Add perf_mmap__read_tail function Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-17 11:26   ` Namhyung Kim
  2018-09-13 12:54 ` [PATCH 38/48] perf record: Read record thread's mmaps Jiri Olsa
                   ` (12 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add struct record_thread to carry a single thread's maps.

Link: http://lkml.kernel.org/n/tip-dsyi97xdc7ullvsisqmha0ca@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 179 ++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1b01cb4d06b8..5c6b56f164a9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -65,6 +65,15 @@ struct switch_output {
 	bool		 set;
 };
 
+struct record_thread {
+	struct perf_mmap	**mmap;
+	int			  mmap_nr;
+	struct perf_mmap	**ovw_mmap;
+	int			  ovw_mmap_nr;
+	struct fdarray		  pollfd;
+	struct record		 *rec;
+};
+
 struct record {
 	struct perf_tool	tool;
 	struct record_opts	opts;
@@ -83,6 +92,8 @@ struct record {
 	bool			timestamp_boundary;
 	struct switch_output	switch_output;
 	unsigned long long	samples;
+	struct record_thread	*threads;
+	int			threads_cnt;
 };
 
 static volatile int auxtrace_record__snapshot_started;
@@ -967,6 +978,166 @@ static int record__synthesize(struct record *rec, bool tail)
 	return err;
 }
 
+static void
+record_thread__clean(struct record_thread *th)
+{
+	free(th->mmap);
+	free(th->ovw_mmap);
+}
+
+static void
+record__threads_clean(struct record *rec)
+{
+	struct record_thread *threads = rec->threads;
+	int i;
+
+	if (threads) {
+		for (i = 0; i < rec->threads_cnt; i++)
+			record_thread__clean(threads + i);
+	}
+}
+
+static void record_thread__init(struct record_thread *th, struct record *rec)
+{
+	memset(th, 0, sizeof(*th));
+	fdarray__init(&th->pollfd, 64);
+	th->rec = rec;
+}
+
+static int
+record_thread__mmap(struct record_thread *th, int nr, int nr_ovw)
+{
+	struct perf_mmap **mmap;
+
+	mmap = zalloc(sizeof(*mmap) * nr);
+	if (!mmap)
+		return -ENOMEM;
+
+	th->mmap    = mmap;
+	th->mmap_nr = nr;
+
+	if (nr_ovw) {
+		mmap = zalloc(sizeof(*mmap) * nr_ovw);
+		if (!mmap)
+			return -ENOMEM;
+
+		th->ovw_mmap    = mmap;
+		th->ovw_mmap_nr = nr_ovw;
+	}
+
+	return 0;
+}
+
+static int
+record__threads_assign(struct record *rec)
+{
+	struct record_thread *threads = rec->threads;
+	struct record_thread *thread0 = threads;
+	struct perf_evlist *evlist = rec->evlist;
+	int i, j, nr, nr0, nr_ovw, nr_trk;
+	int ret = -ENOMEM;
+
+	nr     = evlist->mmap           ? evlist->nr_mmaps : 0;
+	nr_trk = evlist->track_mmap     ? evlist->nr_mmaps : 0;
+	nr_ovw = evlist->overwrite_mmap ? evlist->nr_mmaps : 0;
+
+	nr0  = nr_trk;
+	nr0 += nr;
+
+	if (record_thread__mmap(thread0, nr0, nr_ovw))
+		goto out_error;
+
+	for (i = 0; i < nr_ovw; i++)
+		thread0->ovw_mmap[i] = &evlist->overwrite_mmap[i];
+
+	for (i = 0; i < nr_trk; i++)
+		thread0->mmap[i] = &evlist->track_mmap[i];
+
+	for (j = 0; i < nr0 && j < nr; i++, j++)
+		thread0->mmap[i] = &evlist->mmap[j];
+
+	ret = 0;
+
+out_error:
+	return ret;
+}
+
+static int
+record_thread__create_poll(struct record_thread *th,
+			   struct perf_evlist *evlist)
+{
+	struct fdarray *fda = &evlist->pollfd;
+	struct perf_mmap *mmap;
+	int i, j;
+
+	for (i = 0; i < th->mmap_nr; i++) {
+		mmap = th->mmap[i];
+
+		for (j = 0; j < fda->nr; j++) {
+			if (mmap != fda->priv[j].ptr)
+				continue;
+
+			if (fdarray__add_clone(&th->pollfd, j, fda) < 0)
+				return -ENOMEM;
+
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static int
+record__threads_create_poll(struct record *rec)
+{
+	struct record_thread *threads = rec->threads;
+	int ret = 0, i;
+
+	for (i = 0; !ret && (i < rec->threads_cnt); i++)
+		ret = record_thread__create_poll(threads + i, rec->evlist);
+
+	return ret;
+}
+
+static int
+record__threads_create(struct record *rec)
+{
+	struct record_thread *threads;
+	int i, cnt = rec->threads_cnt;
+
+	threads = zalloc(sizeof(*threads) * cnt);
+	if (threads) {
+		for (i = 0; i < cnt; i++)
+			record_thread__init(threads + i, rec);
+
+		rec->threads = threads;
+	}
+
+	return threads ? 0 : -ENOMEM;
+}
+
+static int
+record__threads_config(struct record *rec)
+{
+	int ret;
+
+	ret = record__threads_create(rec);
+	if (ret)
+		goto out;
+
+	ret = record__threads_assign(rec);
+	if (ret)
+		goto out;
+
+	ret = record__threads_create_poll(rec);
+
+out:
+	if (ret)
+		record__threads_clean(rec);
+
+	return ret;
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
@@ -1040,6 +1211,11 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		goto out_child;
 	}
 
+	if (record__threads_config(rec)) {
+		err = -1;
+		goto out_child;
+	}
+
 	if (opts->index) {
 		err = record__mmap_index(rec);
 		if (err)
@@ -1316,6 +1492,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	perf_hooks__invoke_record_end();
 
+	record__threads_clean(rec);
+
 	if (!err && !quiet) {
 		char samples[128];
 		const char *postfix = rec->timestamp_filename ?
@@ -1657,6 +1835,7 @@ static struct record record = {
 		.mmap2		= perf_event__process_mmap2,
 		.ordered_events	= true,
 	},
+	.threads_cnt = 1,
 };
 
 const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
-- 
2.17.1
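
In this patch every map still goes to thread 0: track maps first, then the data maps, each stored as a pointer into the evlist's arrays. The sketch below reproduces that layout with hypothetical types (struct map, struct rthread). It records the overwrite count from nr_ovw, and like record_thread__init() it expects the caller to zero-initialize the thread before allocating.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_mmap. */
struct map { int cpu; };

/* Mirrors the map-pointer fields of struct record_thread. */
struct rthread {
	struct map **mmap;
	int mmap_nr;
	struct map **ovw_mmap;
	int ovw_mmap_nr;
};

/* Caller must zero *th first (record_thread__init() uses memset). */
static int rthread_alloc(struct rthread *th, int nr, int nr_ovw)
{
	assert(nr > 0);
	th->mmap = calloc(nr, sizeof(*th->mmap));
	if (!th->mmap)
		return -ENOMEM;
	th->mmap_nr = nr;

	if (nr_ovw) {
		th->ovw_mmap = calloc(nr_ovw, sizeof(*th->ovw_mmap));
		if (!th->ovw_mmap)
			return -ENOMEM;
		th->ovw_mmap_nr = nr_ovw;	/* count of overwrite maps */
	}
	return 0;
}

/* Thread 0 gets all track maps first, then all data maps. */
static int assign_thread0(struct rthread *th, struct map *trk, int nr_trk,
			  struct map *data, int nr)
{
	int i, j;

	if (rthread_alloc(th, nr_trk + nr, 0))
		return -ENOMEM;
	for (i = 0; i < nr_trk; i++)
		th->mmap[i] = &trk[i];
	for (j = 0; j < nr; i++, j++)
		th->mmap[i] = &data[j];
	return 0;
}

static void rthread_free(struct rthread *th)
{
	free(th->mmap);
	free(th->ovw_mmap);
}
```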


* [PATCH 38/48] perf record: Read record thread's mmaps
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (36 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 37/48] perf record: Introduce struct record_thread Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-17 11:28   ` Namhyung Kim
  2018-09-13 12:54 ` [PATCH 39/48] perf record: Move waking into struct record Jiri Olsa
                   ` (11 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Switch the source of the maps from the evlist to the thread data.

Link: http://lkml.kernel.org/n/tip-2r6hn6shl185j66b4vl1k4pr@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5c6b56f164a9..d6fef646b67f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -96,6 +96,8 @@ struct record {
 	int			threads_cnt;
 };
 
+static __thread struct record_thread *thread;
+
 static volatile int auxtrace_record__snapshot_started;
 static DEFINE_TRIGGER(auxtrace_snapshot_trigger);
 static DEFINE_TRIGGER(switch_output_trigger);
@@ -561,24 +563,24 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 				    bool overwrite)
 {
 	u64 bytes_written = rec->bytes_written;
-	int i;
+	int i, nr;
 	int rc = 0;
-	struct perf_mmap *maps;
+	struct perf_mmap **maps;
 
 	if (!evlist)
 		return 0;
 
-	maps = overwrite ? evlist->overwrite_mmap : evlist->mmap;
+	maps = overwrite ? thread->ovw_mmap : thread->mmap;
 	if (!maps)
 		return 0;
 
 	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
 		return 0;
 
-	for (i = 0; i < evlist->nr_mmaps; i++) {
-		struct perf_mmap *map = &maps[i];
-		struct perf_mmap *track_map =  evlist->track_mmap ?
-					      &evlist->track_mmap[i] : NULL;
+	nr = overwrite ? thread->ovw_mmap_nr : thread->mmap_nr;
+
+	for (i = 0; i < nr; i++) {
+		struct perf_mmap *map = maps[i];
 
 		if (map->base) {
 			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
@@ -592,21 +594,20 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 			rc = -1;
 			goto out;
 		}
-
-		if (track_map && track_map->base) {
-			if (perf_mmap__push(track_map, rec, record__pushfn) != 0) {
-				rc = -1;
-				goto out;
-			}
-		}
 	}
 
 	/*
 	 * Mark the round finished in case we wrote
 	 * at least one event.
 	 */
-	if (bytes_written != rec->bytes_written)
-		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
+	if (bytes_written != rec->bytes_written) {
+		/*
+		 * All maps of the threads point to a single file,
+		 * so we can just pick the first one.
+		 */
+		rc = record__write(rec, thread->mmap[0], &finished_round_event,
+				   sizeof(finished_round_event));
+	}
 
 	if (overwrite)
 		perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
@@ -1222,6 +1223,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			goto out_child;
 	}
 
+	thread = &rec->threads[0];
+
 	err = bpf__apply_obj_config();
 	if (err) {
 		char errbuf[BUFSIZ];
@@ -1415,7 +1418,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (hits == rec->samples) {
 			if (done || draining)
 				break;
-			err = perf_evlist__poll(rec->evlist, -1);
+			err = fdarray__poll(&thread->pollfd, -1);
 			/*
 			 * Propagate error, only if there's any. Ignore positive
 			 * number of returned events and interrupt error.
-- 
2.17.1
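
The patch adds `static __thread struct record_thread *thread;`, so code deep in the call chain (record__pushfn() and friends) can reach the current thread's data without an extra argument. The sketch below shows the same thread-local pattern with plain pthreads. struct rthread, push_sample() and run_workers() are hypothetical, and __thread is a GCC/Clang extension (C11 spells it _Thread_local). Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>

struct rthread {
	unsigned long long samples;
};

/* Each OS thread sees its own copy of this pointer. */
static __thread struct rthread *thread;

/* Deep in the call chain: no need to pass the thread around. */
static void push_sample(void)
{
	thread->samples++;
}

static void *worker(void *arg)
{
	int i;

	thread = arg;	/* bind this OS thread to its own slot */
	for (i = 0; i < 1000; i++)
		push_sample();
	return NULL;
}

/* Runs n (<= 8) workers, each on its own slot; returns 0 on success. */
static int run_workers(struct rthread *slots, int n)
{
	pthread_t pt[8];
	int i, created = 0, err = 0;

	assert(n <= 8);
	for (i = 0; i < n && !err; i++) {
		err = pthread_create(&pt[i], NULL, worker, &slots[i]);
		if (!err)
			created++;
	}
	/* Join only the threads that were actually started. */
	for (i = 0; i < created; i++)
		pthread_join(pt[i], NULL);
	return err ? -1 : 0;
}
```

Because each worker writes only to its own slot, no locking is needed for the per-thread counters.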


* [PATCH 39/48] perf record: Move waking into struct record
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (37 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 38/48] perf record: Read record thread's mmaps Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-17 11:31   ` Namhyung Kim
  2018-09-13 12:54 ` [PATCH 40/48] perf record: Move samples into struct record_thread Jiri Olsa
                   ` (10 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Move the 'waking' counter from a local variable in __cmd_record()
into struct record, since we now need to keep a global count of it.

TODO: make this multiple threads safe.

Link: http://lkml.kernel.org/n/tip-veetgk62aisdt1cxaa6fbgox@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d6fef646b67f..62ff4411ce39 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -94,6 +94,7 @@ struct record {
 	unsigned long long	samples;
 	struct record_thread	*threads;
 	int			threads_cnt;
+	unsigned long		waking;
 };
 
 static __thread struct record_thread *thread;
@@ -1143,7 +1144,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
 	int status = 0;
-	unsigned long waking = 0;
 	const bool forks = argc > 0;
 	struct perf_tool *tool = &rec->tool;
 	struct record_opts *opts = &rec->opts;
@@ -1400,8 +1400,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 			if (!quiet)
 				fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
-					waking);
-			waking = 0;
+					rec->waking);
+			rec->waking = 0;
 			fd = record__switch_output(rec, false);
 			if (fd < 0) {
 				pr_err("Failed to switch to new file\n");
@@ -1425,7 +1425,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			 */
 			if (err > 0 || (err < 0 && errno == EINTR))
 				err = 0;
-			waking++;
+			rec->waking++;
 
 			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
 				draining = true;
@@ -1454,7 +1454,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	}
 
 	if (!quiet)
-		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
+		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", rec->waking);
 
 	if (target__none(&rec->opts.target))
 		record__synthesize_workload(rec, true);
-- 
2.17.1
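
The TODO above notes that rec->waking is not yet safe to update from several threads. One possible approach, sketched here with C11 atomics, is to make the counter atomic and to read-and-reset it in a single exchange, so that no wakeup is lost between printing the count and zeroing it. struct rec and take_waking() are hypothetical; this is not what the series implements. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state; only the waking counter is modeled. */
struct rec {
	atomic_ulong waking;
};

/* Each poller thread bumps the shared counter after a wakeup. */
static void *poller(void *arg)
{
	struct rec *rec = arg;
	int i;

	for (i = 0; i < 10000; i++)
		atomic_fetch_add_explicit(&rec->waking, 1, memory_order_relaxed);
	return NULL;
}

/* Read and reset atomically, as the switch-output path needs. */
static unsigned long take_waking(struct rec *rec)
{
	return atomic_exchange(&rec->waking, 0);
}
```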


* [PATCH 40/48] perf record: Move samples into struct record_thread
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (38 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 39/48] perf record: Move waking into struct record Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 41/48] perf record: Move bytes_written " Jiri Olsa
                   ` (9 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Move the samples value into struct record_thread, because
we need this value per thread to check whether there is
new data.

Link: http://lkml.kernel.org/n/tip-yhah10r7jikirxhbprxs3wlm@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 62ff4411ce39..b17445f332a8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -72,6 +72,7 @@ struct record_thread {
 	int			  ovw_mmap_nr;
 	struct fdarray		  pollfd;
 	struct record		 *rec;
+	unsigned long long	  samples;
 };
 
 struct record {
@@ -159,7 +160,7 @@ static int record__pushfn(struct perf_mmap *map, void *to, void *bf, size_t size
 {
 	struct record *rec = to;
 
-	rec->samples++;
+	thread->samples++;
 	return record__write(rec, map, bf, size);
 }
 
@@ -243,7 +244,7 @@ static int record__auxtrace_mmap_read(struct record *rec,
 		return ret;
 
 	if (ret)
-		rec->samples++;
+		thread->samples++;
 
 	return 0;
 }
@@ -260,7 +261,7 @@ static int record__auxtrace_mmap_read_snapshot(struct record *rec,
 		return ret;
 
 	if (ret)
-		rec->samples++;
+		thread->samples++;
 
 	return 0;
 }
@@ -1346,7 +1347,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	trigger_ready(&switch_output_trigger);
 	perf_hooks__invoke_record_start();
 	for (;;) {
-		unsigned long long hits = rec->samples;
+		unsigned long long hits = thread->samples;
 
 		/*
 		 * rec->evlist->bkw_mmap_state is possible to be
@@ -1415,7 +1416,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 				alarm(rec->switch_output.time);
 		}
 
-		if (hits == rec->samples) {
+		if (hits == thread->samples) {
 			if (done || draining)
 				break;
 			err = fdarray__poll(&thread->pollfd, -1);
-- 
2.17.1
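
The main loop snapshots the sample count, drains the maps, and blocks in poll only when the drain found nothing. If that count were shared, samples read by another thread could make this thread skip polling and spin. The sketch below models one loop iteration with a per-thread counter. read_all() is a hypothetical stand-in that pretends data is available for the first few reads, and the polls counter stands in for fdarray__poll().

```c
#include <assert.h>

struct rthread {
	unsigned long long samples;
	int polls;
};

/* Hypothetical: the maps have data for the first *avail reads. */
static void read_all(struct rthread *th, int *avail)
{
	if (*avail > 0) {
		(*avail)--;
		th->samples++;
	}
}

/*
 * One iteration in the style of __cmd_record(): snapshot this
 * thread's own count, drain, and poll only if nothing new arrived.
 * Returns 1 if it would have blocked in poll.
 */
static int record_iteration(struct rthread *th, int *avail)
{
	unsigned long long hits = th->samples;

	read_all(th, avail);
	if (hits == th->samples) {
		th->polls++;	/* stands in for fdarray__poll(&th->pollfd, -1) */
		return 1;
	}
	return 0;
}
```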


* [PATCH 41/48] perf record: Move bytes_written into struct record_thread
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (39 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 40/48] perf record: Move samples into struct record_thread Jiri Olsa
@ 2018-09-13 12:54 ` " Jiri Olsa
  2018-09-13 12:54 ` [PATCH 42/48] perf record: Add record_thread start/stop/process functions Jiri Olsa
                   ` (8 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Move bytes_written into struct record_thread.

Link: http://lkml.kernel.org/n/tip-q1xlsqksw6i8my0wbunv066d@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b17445f332a8..253bafd4dbe7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -73,12 +73,12 @@ struct record_thread {
 	struct fdarray		  pollfd;
 	struct record		 *rec;
 	unsigned long long	  samples;
+	u64			  bytes_written;
 };
 
 struct record {
 	struct perf_tool	tool;
 	struct record_opts	opts;
-	u64			bytes_written;
 	struct perf_data	data;
 	struct auxtrace_record	*itr;
 	struct perf_evlist	*evlist;
@@ -114,7 +114,7 @@ static bool switch_output_size(struct record *rec)
 {
 	return rec->switch_output.size &&
 	       trigger_is_ready(&switch_output_trigger) &&
-	       (rec->bytes_written >= rec->switch_output.size);
+	       (thread->bytes_written >= rec->switch_output.size);
 }
 
 static bool switch_output_time(struct record *rec)
@@ -138,7 +138,7 @@ static int record__write(struct record *rec, struct perf_mmap *map,
 		return -1;
 	}
 
-	rec->bytes_written += size;
+	thread->bytes_written += size;
 
 	if (switch_output_size(rec))
 		trigger_hit(&switch_output_trigger);
@@ -564,7 +564,7 @@ static struct perf_event_header finished_round_event = {
 static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist,
 				    bool overwrite)
 {
-	u64 bytes_written = rec->bytes_written;
+	u64 bytes_written = thread->bytes_written;
 	int i, nr;
 	int rc = 0;
 	struct perf_mmap **maps;
@@ -602,7 +602,7 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	 * Mark the round finished in case we wrote
 	 * at least one event.
 	 */
-	if (bytes_written != rec->bytes_written) {
+	if (bytes_written != thread->bytes_written) {
 		/*
 		 * All maps of the threads point to a single file,
 		 * so we can just pick first one.
@@ -701,8 +701,12 @@ static int record__merge_index(struct record *rec)
 			goto out_close;
 
 		pr_debug(" ok\n");
+		rec->session->header.data_size += idx[i].size;
 	}
 
+	offset = lseek(output_fd, 0, SEEK_END);
+	rec->session->header.data_size = offset - session->header.data_offset;
+
 	session->header.index = idx;
 	session->header.nr_index = nr_index;
 
@@ -727,10 +731,11 @@ record__finish_output(struct record *rec)
 	if (data->is_pipe)
 		return;
 
-	rec->session->header.data_size += rec->bytes_written;
-
 	if (rec->opts.index)
 		record__merge_index(rec);
+	else
+		rec->session->header.data_size += thread->bytes_written;
+
 
 	data->size = lseek(perf_data__fd(data), 0, SEEK_END);
 
@@ -793,7 +798,7 @@ record__switch_output(struct record *rec, bool at_exit)
 				    rec->session->header.data_offset,
 				    at_exit);
 	if (fd >= 0 && !at_exit) {
-		rec->bytes_written = 0;
+		thread->bytes_written = 0;
 		rec->session->header.data_size = 0;
 	}
 
@@ -917,7 +922,6 @@ static int record__synthesize(struct record *rec, bool tail)
 				pr_err("Couldn't record tracing data.\n");
 				goto out;
 			}
-			rec->bytes_written += err;
 		}
 	}
 
-- 
2.17.1
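
With bytes_written kept per thread, record__mmap_read_evlist() can compare the thread's own counter before and after a pass and append the finished-round event only if that pass wrote something. The sketch below models that check and the sum across threads. struct rthread, read_pass() and total_written() are hypothetical; the 8-byte marker in the usage lines matches sizeof(struct perf_event_header).

```c
#include <assert.h>
#include <stdint.h>

struct rthread {
	uint64_t bytes_written;
};

/* Per-thread counter, so no lock is needed on the write path. */
static void write_bytes(struct rthread *th, uint64_t size)
{
	th->bytes_written += size;
}

/* Total data written across all threads. */
static uint64_t total_written(const struct rthread *th, int cnt)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < cnt; i++)
		sum += th[i].bytes_written;
	return sum;
}

/* Write n chunks; append a round marker only if this pass wrote data. */
static int read_pass(struct rthread *th, const uint64_t *chunks, int n,
		     uint64_t marker)
{
	uint64_t before = th->bytes_written;
	int i;

	for (i = 0; i < n; i++)
		write_bytes(th, chunks[i]);
	if (before != th->bytes_written) {
		write_bytes(th, marker);
		return 1;
	}
	return 0;
}
```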


* [PATCH 42/48] perf record: Add record_thread start/stop/process functions
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (40 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 41/48] perf record: Move bytes_written " Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 43/48] perf record: Wait for all threads being started Jiri Olsa
                   ` (7 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add the thread processing function and an API to start/stop the threads.

Link: http://lkml.kernel.org/n/tip-bwa3w7lt63ffe78w4ggjc9dw@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 92 +++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 253bafd4dbe7..6ad57ba6657e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -65,6 +65,11 @@ struct switch_output {
 	bool		 set;
 };
 
+enum {
+	RECORD_THREAD__RUNNING	= 0,
+	RECORD_THREAD__STOP	= 1,
+};
+
 struct record_thread {
 	struct perf_mmap	**mmap;
 	int			  mmap_nr;
@@ -74,6 +79,8 @@ struct record_thread {
 	struct record		 *rec;
 	unsigned long long	  samples;
 	u64			  bytes_written;
+	pthread_t		  pt;
+	int			  state;
 };
 
 struct record {
@@ -1145,6 +1152,80 @@ record__threads_config(struct record *rec)
 	return ret;
 }
 
+static void*
+record_thread__process(struct record *rec)
+{
+	while (thread->state != RECORD_THREAD__STOP) {
+		unsigned long long hits = thread->samples;
+		int err;
+
+		if (record__mmap_read_all(thread->rec) < 0)
+			break;
+
+		if (hits == thread->samples) {
+			err = fdarray__poll(&thread->pollfd, 500);
+			/*
+			 * Propagate error, only if there's any. Ignore positive
+			 * number of returned events and interrupt error.
+			 */
+			if (err > 0 || (err < 0 && errno == EINTR))
+				err = 0;
+			rec->waking++;
+
+			if (fdarray__filter(&thread->pollfd, POLLERR|POLLHUP,
+					    perf_mmap__put_filtered, NULL) == 0)
+				break;
+		}
+	}
+
+	return NULL;
+}
+
+static void *worker(void *arg)
+{
+	struct record_thread *th = arg;
+	struct record *rec = th->rec;
+
+	thread        = th;
+	thread->state = RECORD_THREAD__RUNNING;
+
+	return record_thread__process(rec);
+}
+
+static int record__threads_start(struct record *rec)
+{
+	struct record_thread *threads = rec->threads;
+	int i, err = 0;
+
+	for (i = 1; !err && i < rec->threads_cnt; i++) {
+		struct record_thread *th = threads + i;
+
+		err = pthread_create(&th->pt, NULL, worker, th);
+	}
+
+	return err;
+}
+
+static int record__threads_stop(struct record *rec)
+{
+	struct record_thread *threads = rec->threads;
+	int i, err = 0;
+
+	for (i = 1; i < rec->threads_cnt; i++) {
+		struct record_thread *th = threads + i;
+
+		th->state = RECORD_THREAD__STOP;
+	}
+
+	for (i = 1; !err && i < rec->threads_cnt; i++) {
+		struct record_thread *th = threads + i;
+
+		err = pthread_join(th->pt, NULL);
+	}
+
+	return err;
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
@@ -1270,6 +1351,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		goto out_child;
 	}
 
+	/*
+	 * We need to call this before record__synthesize, so that when we
+	 * sample system wide, the perf threads get synthesized as well.
+	 */
+	err = record__threads_start(rec);
+	if (err < 0)
+		goto out_child;
+
 	err = record__synthesize(rec, false);
 	if (err < 0)
 		goto out_child;
@@ -1450,6 +1539,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	trigger_off(&auxtrace_snapshot_trigger);
 	trigger_off(&switch_output_trigger);
 
+	if (record__threads_stop(rec))
+		pr_err("failed to stop threads\n");
+
 	if (forks && workload_exec_errno) {
 		char msg[STRERR_BUFSIZE];
 		const char *emsg = str_error_r(workload_exec_errno, msg, sizeof(msg));
-- 
2.17.1
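
The start/stop API spawns threads 1..cnt-1 (slot 0 is the main thread), then stops them by setting a state flag and joining. The workers poll with a 500 ms timeout, so they notice the flag. The sketch below follows that shape. As assumptions of the sketch, it uses a C11 atomic_int for the flag (the patch uses a plain int), a 1 ms nanosleep() in place of the bounded poll, and it tracks how many threads actually started. Note that pthread_create() and pthread_join() return a positive error number on failure, not a negative value. Build with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

enum { THREAD_RUNNING = 0, THREAD_STOP = 1 };

struct rthread {
	pthread_t pt;
	atomic_int state;
	unsigned long loops;
};

static void *worker(void *arg)
{
	struct rthread *th = arg;
	struct timespec ts = { 0, 1000000 };	/* 1 ms, stands in for a bounded poll */

	while (atomic_load(&th->state) != THREAD_STOP) {
		th->loops++;
		nanosleep(&ts, NULL);
	}
	return NULL;
}

/* Starts slots 1..cnt-1; returns how many slots run, including slot 0. */
static int threads_start(struct rthread *th, int cnt)
{
	int i;

	for (i = 1; i < cnt; i++) {
		atomic_init(&th[i].state, THREAD_RUNNING);
		th[i].loops = 0;
		if (pthread_create(&th[i].pt, NULL, worker, &th[i]))
			break;
	}
	return i;
}

/* Flag every thread first, then join, so they wind down in parallel. */
static int threads_stop(struct rthread *th, int running)
{
	int i, err = 0;

	for (i = 1; i < running; i++)
		atomic_store(&th[i].state, THREAD_STOP);
	for (i = 1; i < running; i++)
		if (pthread_join(th[i].pt, NULL))
			err = -1;
	return err;
}
```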


* [PATCH 43/48] perf record: Wait for all threads being started
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (41 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 42/48] perf record: Add record_thread start/stop/process functions Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 44/48] perf record: Add --threads option Jiri Olsa
                   ` (6 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Ensure all threads are started and ready before
we enable events, using pthread_cond_t signaling
logic.

Link: http://lkml.kernel.org/n/tip-z4x5ikp8v7e3flpxoo4bv1un@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6ad57ba6657e..fbca1d15b90d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -102,6 +102,9 @@ struct record {
 	unsigned long long	samples;
 	struct record_thread	*threads;
 	int			threads_cnt;
+	int			threads_signal_cnt;
+	pthread_mutex_t		threads_signal_mutex;
+	pthread_cond_t		threads_signal_cond;
 	unsigned long		waking;
 };
 
@@ -1145,6 +1148,9 @@ record__threads_config(struct record *rec)
 
 	ret = record__threads_create_poll(rec);
 
+	pthread_mutex_init(&rec->threads_signal_mutex, NULL);
+	pthread_cond_init(&rec->threads_signal_cond, NULL);
+
 out:
 	if (ret)
 		record__threads_clean(rec);
@@ -1181,6 +1187,26 @@ record_thread__process(struct record *rec)
 	return NULL;
 }
 
+static void signal_main(struct record *rec)
+{
+	pthread_mutex_lock(&rec->threads_signal_mutex);
+	rec->threads_signal_cnt++;
+	pthread_cond_signal(&rec->threads_signal_cond);
+	pthread_mutex_unlock(&rec->threads_signal_mutex);
+}
+
+static void wait_for_signal(struct record *rec)
+{
+	pthread_mutex_lock(&rec->threads_signal_mutex);
+
+	while (rec->threads_signal_cnt < rec->threads_cnt) {
+		pthread_cond_wait(&rec->threads_signal_cond,
+				  &rec->threads_signal_mutex);
+	}
+
+	pthread_mutex_unlock(&rec->threads_signal_mutex);
+}
+
 static void *worker(void *arg)
 {
 	struct record_thread *th = arg;
@@ -1189,6 +1215,8 @@ static void *worker(void *arg)
 	thread        = th;
 	thread->state = RECORD_THREAD__RUNNING;
 
+	signal_main(rec);
+
 	return record_thread__process(rec);
 }
 
@@ -1197,12 +1225,17 @@ static int record__threads_start(struct record *rec)
 	struct record_thread *threads = rec->threads;
 	int i, err = 0;
 
+	rec->threads_signal_cnt = 1;
+
 	for (i = 1; !err && i < rec->threads_cnt; i++) {
 		struct record_thread *th = threads + i;
 
 		err = pthread_create(&th->pt, NULL, worker, th);
 	}
 
+	if (rec->threads_cnt > 1)
+		wait_for_signal(rec);
+
 	return err;
 }
 
-- 
2.17.1
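
signal_main() and wait_for_signal() form a startup barrier. The count starts at 1 for the main thread, each worker increments it under the mutex, and the main thread waits in a loop until it reaches threads_cnt. The loop also guards against spurious wakeups. The sketch below isolates that pattern; struct startup and startup_init() are hypothetical. With the code as posted, if a pthread_create() call fails, the count can never reach threads_cnt and wait_for_signal() would block, so waiting only for the threads actually created is one option. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>

struct startup {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	int cnt;	/* threads that have signaled; main counts as 1 */
	int total;
};

static void startup_init(struct startup *s, int total)
{
	pthread_mutex_init(&s->mutex, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->cnt = 1;
	s->total = total;
}

static void signal_main(struct startup *s)
{
	pthread_mutex_lock(&s->mutex);
	s->cnt++;
	pthread_cond_signal(&s->cond);
	pthread_mutex_unlock(&s->mutex);
}

static void wait_for_signal(struct startup *s)
{
	pthread_mutex_lock(&s->mutex);
	while (s->cnt < s->total)	/* re-check after every wakeup */
		pthread_cond_wait(&s->cond, &s->mutex);
	pthread_mutex_unlock(&s->mutex);
}

static void *worker(void *arg)
{
	/* Per-thread setup would happen here, before signaling. */
	signal_main(arg);
	return NULL;
}
```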


* [PATCH 44/48] perf record: Add --threads option
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (42 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 43/48] perf record: Wait for all threads being started Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-17 11:37   ` Namhyung Kim
  2018-09-13 12:54 ` [PATCH 45/48] perf record: Add --thread-stats option support Jiri Olsa
                   ` (5 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Allow assigning a number to record::threads_cnt and thus
creating multiple threads. At this point we don't allow
specifying the number of threads; instead we set it to the
number of the evlist's mmaps, so there is a single thread
for each.
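
The selection rule in record__threads_cnt() can be read as a small pure function: without --threads there is one thread; with a bare --threads (which, as we read the option definition, stores its default of 0) there is one thread per mmap; an explicit count or overwrite mmaps are rejected for now. The sketch below restates that rule with hypothetical parameter names standing in for the struct record fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of record__threads_cnt():
 * threads_set   - --threads was given
 * requested     - value stored by the option (0 for a bare --threads)
 * has_overwrite - evlist has overwrite mmaps
 */
static int threads_cnt(bool threads_set, int requested, bool has_overwrite,
		       int nr_mmaps, int *cnt)
{
	if (!threads_set) {
		*cnt = 1;		/* current single-thread mode */
		return 0;
	}
	if (requested)
		return -EINVAL;		/* explicit counts not supported yet */
	if (has_overwrite)
		return -EINVAL;		/* overwrite mmaps not supported yet */
	*cnt = nr_mmaps;		/* one thread per mmap */
	return 0;
}
```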

Link: http://lkml.kernel.org/n/tip-ijl786fsk46q6g01is378a5t@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fbca1d15b90d..ada6f795d492 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -102,6 +102,7 @@ struct record {
 	unsigned long long	samples;
 	struct record_thread	*threads;
 	int			threads_cnt;
+	bool			threads_set;
 	int			threads_signal_cnt;
 	pthread_mutex_t		threads_signal_mutex;
 	pthread_cond_t		threads_signal_cond;
@@ -1133,11 +1134,38 @@ record__threads_create(struct record *rec)
 	return threads ? 0 : -ENOMEM;
 }
 
+static int record__threads_cnt(struct record *rec)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	int cnt;
+
+	if (rec->threads_set) {
+		if (rec->threads_cnt) {
+			pr_err("failed: Can't specify number of threads yet.\n");
+			return -EINVAL;
+		}
+		if (evlist->overwrite_mmap) {
+			pr_err("failed: Can't use multiple threads with overwrite mmaps yet.\n");
+			return -EINVAL;
+		}
+		cnt = evlist->nr_mmaps;
+	} else {
+		cnt = 1;
+	}
+
+	rec->threads_cnt = cnt;
+	return 0;
+}
+
 static int
 record__threads_config(struct record *rec)
 {
 	int ret;
 
+	ret = record__threads_cnt(rec);
+	if (ret)
+		goto out;
+
 	ret = record__threads_create(rec);
 	if (ret)
 		goto out;
@@ -2119,6 +2147,8 @@ static struct option __record_options[] = {
 		    "Parse options then exit"),
 	OPT_BOOLEAN(0, "index", &record.opts.index,
 		    "make index for sample data to speed-up processing"),
+	OPT_INTEGER_OPTARG_SET(0, "threads", &record.threads_cnt, &record.threads_set,
+			       "count", "Enabled threads (count)", 0),
 	OPT_END()
 };
 
@@ -2267,6 +2297,12 @@ int cmd_record(int argc, const char **argv)
 		goto out;
 	}
 
+	/*
+	 * Threads need index data file.
+	 */
+	if (record.threads_set)
+		record.opts.index = true;
+
 	if (rec->opts.index) {
 		if (!rec->opts.sample_time) {
 			pr_err("Sample timestamp is required for indexing\n");
-- 
2.17.1



* [PATCH 45/48] perf record: Add --thread-stats option support
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (43 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 44/48] perf record: Add --threads option Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 46/48] perf record: Add maps to --thread-stats output Jiri Olsa
                   ` (4 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Add per-thread stats to give an idea of what's happening
in the main reading loop.

  $ perf --debug threads=2 record ...
  SNIP
            pid      write       poll       skip
      1s   8914       136B          1          0
      2s   8914       512K         43         79
      3s   8914         3M        214        385
      4s   8914         3M        121        291

  $ perf --debug threads=2 record --threads ...
  SNIP
            pid      write       poll       skip
     1s   9770       144B          1          0
          9772         0B          1          0
          9773         0B          1          0
          9774         0B          1          0
     2s   9770       290K         35         37
          9772       272K         36         34
          9773       274K         35         35
          9774       304K         39         39
     3s   9770      1120K        140        140
          9772      1088K        138        138
          9773      1120K        140        140
          9774      1123K        140        140
     4s   9770      1161K        146        146
          9772      1121K        142        142
          9773      1135K        142        142
          9774      1159K        145        145
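The poll/skip columns come from two counters per thread. If an iteration of the read loop finds no new samples, it goes to sleep in poll; otherwise it skips the poll. All counters are reset after each once-per-second report. The sketch below models that bookkeeping. account_iteration() and take_interval() are hypothetical helpers, and "hits" stands for the number of samples read in one iteration. The patch instead compares the running sample count before and after reading.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same fields as struct thread_stat in the patch. */
struct thread_stat {
	uint64_t bytes_written;
	uint64_t poll;		/* iterations that found no new data and polled */
	uint64_t poll_skip;	/* iterations that found data and skipped the poll */
};

/* Account one iteration of a reader loop. */
static void account_iteration(struct thread_stat *st, uint64_t hits, uint64_t bytes)
{
	st->bytes_written += bytes;
	if (hits == 0)
		st->poll++;		/* the thread would block in fdarray__poll() */
	else
		st->poll_skip++;
}

/* Snapshot the interval's counters, then reset them as the patch does with memset(). */
static struct thread_stat take_interval(struct thread_stat *st)
{
	struct thread_stat snap = *st;

	memset(st, 0, sizeof(*st));
	return snap;
}
```

In the patch, the main thread's poll timeout also changes from -1 to 1000 ms. This is presumably so that the report is still printed when no data arrives.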

Link: http://lkml.kernel.org/n/tip-z9un5mjzsh47u9m12ijn7pfq@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 79 +++++++++++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ada6f795d492..ec487d1f2b0b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -70,7 +70,14 @@ enum {
 	RECORD_THREAD__STOP	= 1,
 };
 
+struct thread_stat {
+	u64	bytes_written;
+	u64	poll;
+	u64	poll_skip;
+};
+
 struct record_thread {
+	int			  pid;
 	struct perf_mmap	**mmap;
 	int			  mmap_nr;
 	struct perf_mmap	**ovw_mmap;
@@ -81,6 +88,7 @@ struct record_thread {
 	u64			  bytes_written;
 	pthread_t		  pt;
 	int			  state;
+	struct thread_stat	  stats;
 };
 
 struct record {
@@ -149,7 +157,8 @@ static int record__write(struct record *rec, struct perf_mmap *map,
 		return -1;
 	}
 
-	thread->bytes_written += size;
+	thread->bytes_written       += size;
+	thread->stats.bytes_written += size;
 
 	if (switch_output_size(rec))
 		trigger_hit(&switch_output_trigger);
@@ -1186,6 +1195,11 @@ record__threads_config(struct record *rec)
 	return ret;
 }
 
+static inline pid_t gettid(void)
+{
+	return (pid_t) syscall(__NR_gettid);
+}
+
 static void*
 record_thread__process(struct record *rec)
 {
@@ -1197,6 +1211,8 @@ record_thread__process(struct record *rec)
 			break;
 
 		if (hits == thread->samples) {
+			thread->stats.poll++;
+
 			err = fdarray__poll(&thread->pollfd, 500);
 			/*
 			 * Propagate error, only if there's any. Ignore positive
@@ -1209,6 +1225,8 @@ record_thread__process(struct record *rec)
 			if (fdarray__filter(&thread->pollfd, POLLERR|POLLHUP,
 					    perf_mmap__put_filtered, NULL) == 0)
 				break;
+		} else {
+			thread->stats.poll_skip++;
 		}
 	}
 
@@ -1241,6 +1259,7 @@ static void *worker(void *arg)
 	struct record *rec = th->rec;
 
 	thread        = th;
+	thread->pid   = gettid();
 	thread->state = RECORD_THREAD__RUNNING;
 
 	signal_main(rec);
@@ -1287,6 +1306,50 @@ static int record__threads_stop(struct record *rec)
 	return err;
 }
 
+static void record_thread__display(struct record_thread *t, unsigned long s)
+{
+	char buf_size[20];
+	char buf_time[20];
+
+	unit_number__scnprintf(buf_size, sizeof(buf_size), t->stats.bytes_written);
+
+	if (s)
+		scnprintf(buf_time, sizeof(buf_time), "%5lus", s);
+	else
+		buf_time[0] = 0;
+
+	fprintf(stderr, "%6s %6d %10s %10" PRIu64" %10" PRIu64"\n",
+		buf_time, t->pid, buf_size, t->stats.poll, t->stats.poll_skip);
+}
+
+static void record__threads_stats(struct record *rec)
+{
+	struct record_thread *threads = rec->threads;
+	static time_t last, last_header, start;
+	time_t current = time(NULL);
+	int i;
+
+	if (last == current)
+		return;
+
+	if (!start)
+		start = current - 1;
+
+	last = current;
+
+	if (!last_header || (last_header + 10 < current)) {
+		fprintf(stderr, "%6s %6s %10s %10s %10s\n", " ", "pid", "write", "poll", "skip");
+		last_header = current;
+	}
+
+	for (i = 0; i < rec->threads_cnt; i++) {
+		struct record_thread *t = threads + i;
+
+		record_thread__display(t, !i ? current - start : 0);
+		memset(&t->stats, 0, sizeof(t->stats));
+	}
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
@@ -1371,6 +1434,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	}
 
 	thread = &rec->threads[0];
+	thread->pid = gettid();
 
 	err = bpf__apply_obj_config();
 	if (err) {
@@ -1573,7 +1637,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (hits == thread->samples) {
 			if (done || draining)
 				break;
-			err = fdarray__poll(&thread->pollfd, -1);
+
+			err = fdarray__poll(&thread->pollfd, 1000);
+			thread->stats.poll++;
+
 			/*
 			 * Propagate error, only if there's any. Ignore positive
 			 * number of returned events and interrupt error.
@@ -1582,10 +1649,16 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 				err = 0;
 			rec->waking++;
 
-			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
+			if (fdarray__filter(&thread->pollfd, POLLERR|POLLHUP,
+					    perf_mmap__put_filtered, NULL) == 0)
 				draining = true;
+		} else {
+			thread->stats.poll_skip++;
 		}
 
+		if (debug_threads)
+			record__threads_stats(rec);
+
 		/*
 		 * When perf is starting the traced process, at the end events
 		 * die with the process and we wait for that. Thus no need to
-- 
2.17.1



* [PATCH 46/48] perf record: Add maps to --thread-stats output
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (44 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 45/48] perf record: Add --thread-stats option support Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 12:54 ` [PATCH 47/48] perf record: Spread maps for --threads option Jiri Olsa
                   ` (3 subsequent siblings)
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Display the free space of each thread's memory maps as part
of the --thread-stats output.

  $ perf --debug threads=2 record ...
  ...
            pid      write       poll       skip  maps (size 20K)
      1s   8914       136B          1          0   19K   19K   19K   19K
      2s   8914       512K         43         79   19K   19K   17K   19K
      3s   8914         3M        214        385   17K   16K   16K   17K
      4s   8914         3M        121        291   17K   17K   18K   18K
   ...
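The free-space figure relies on the kernel's circular-buffer macros. Here head is the producer position, written by the kernel, and tail is the consumer position, written by perf. Both are free-running byte counters, so masking with size - 1 maps them into the buffer. The sketch below shows the arithmetic with a hypothetical ring_free() helper. Note that the macros only give meaningful results when size is a power of two, so the sketch passes the size of the data area alone. The patch passes evlist->mmap_len instead. If that length also covers the ring buffer's control page, as the 20K header over a power-of-two data area may suggest, the mask would not be a power-of-two mask. This is an assumption worth checking.

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as the kernel's include/linux/circ_buf.h. */
#define CIRC_CNT(head, tail, size)	(((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size)	CIRC_CNT((tail), ((head) + 1), (size))

/*
 * Free bytes between the kernel's write position (head) and perf's read
 * position (tail). size must be a power of two: the data area only.
 * One byte is always left unused to tell "full" from "empty".
 */
static uint64_t ring_free(uint64_t head, uint64_t tail, uint64_t size)
{
	assert(size && (size & (size - 1)) == 0);
	return CIRC_SPACE(head, tail, size);
}
```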

Link: http://lkml.kernel.org/n/tip-4id1fjlu5ypfqnu9kvpo7l3z@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ec487d1f2b0b..92ba4d83b18c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1306,10 +1306,25 @@ static int record__threads_stop(struct record *rec)
 	return err;
 }
 
-static void record_thread__display(struct record_thread *t, unsigned long s)
+/* Stolen from kernel. */
+#define CIRC_CNT(head, tail, size)    (((head) - (tail)) & ((size) - 1))
+#define CIRC_SPACE(head, tail, size)  CIRC_CNT((tail), ((head)+1), (size))
+
+static u64 mmap_free_size(struct perf_mmap *map, struct perf_evlist *evlist)
+{
+	u64 head = perf_mmap__read_head(map);
+	u64 tail = perf_mmap__read_tail(map);
+
+	return CIRC_SPACE(head, tail, evlist->mmap_len);
+}
+
+static void
+record_thread__display(struct record_thread *t, struct perf_evlist *evlist,
+		       unsigned long s)
 {
 	char buf_size[20];
 	char buf_time[20];
+	int i;
 
 	unit_number__scnprintf(buf_size, sizeof(buf_size), t->stats.bytes_written);
 
@@ -1318,15 +1333,26 @@ static void record_thread__display(struct record_thread *t, unsigned long s)
 	else
 		buf_time[0] = 0;
 
-	fprintf(stderr, "%6s %6d %10s %10" PRIu64" %10" PRIu64"\n",
+	fprintf(stderr, "%6s %6d %10s %10" PRIu64" %10" PRIu64 " ",
 		buf_time, t->pid, buf_size, t->stats.poll, t->stats.poll_skip);
+
+	for (i = 0; i < t->mmap_nr; i++) {
+		u64 size = mmap_free_size(t->mmap[i], evlist);
+
+		unit_number__scnprintf(buf_size, sizeof(buf_size), size);
+		fprintf(stderr, "%5s ", buf_size);
+	}
+
+	fprintf(stderr, "\n");
 }
 
 static void record__threads_stats(struct record *rec)
 {
 	struct record_thread *threads = rec->threads;
+	struct perf_evlist *evlist = rec->evlist;
 	static time_t last, last_header, start;
 	time_t current = time(NULL);
+	char buf_size[20];
 	int i;
 
 	if (last == current)
@@ -1337,15 +1363,18 @@ static void record__threads_stats(struct record *rec)
 
 	last = current;
 
+	unit_number__scnprintf(buf_size, sizeof(buf_size), evlist->mmap_len);
+
 	if (!last_header || (last_header + 10 < current)) {
-		fprintf(stderr, "%6s %6s %10s %10s %10s\n", " ", "pid", "write", "poll", "skip");
+		fprintf(stderr, "%6s %6s %10s %10s %10s %5s (size %s)\n",
+			" ", "pid", "write", "poll", "skip", "maps", buf_size);
 		last_header = current;
 	}
 
 	for (i = 0; i < rec->threads_cnt; i++) {
 		struct record_thread *t = threads + i;
 
-		record_thread__display(t, !i ? current - start : 0);
+		record_thread__display(t, evlist, !i ? current - start : 0);
 		memset(&t->stats, 0, sizeof(t->stats));
 	}
 }
-- 
2.17.1



* [PATCH 47/48] perf record: Spread maps for --threads option
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (45 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 46/48] perf record: Add maps to --thread-stats output Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-17 11:40   ` Namhyung Kim
  2018-09-13 12:54 ` [PATCH 48/48] perf record: Spread maps for --threads=X option Jiri Olsa
                   ` (2 subsequent siblings)
  49 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Currently we assign all maps to the main thread. Add
code that spreads the maps for the --threads option.

For the --threads option we create as many threads as there
are memory maps in the evlist, which is the number of CPUs
in the system or the number of CPUs we monitor. Each thread
gets a single data mmap to read.

In addition, there is the same number of tracking mmaps
for auxiliary events, for which we don't create dedicated
threads. Instead we assign them to the main thread, because
not much traffic is expected there.

The assignment is visible in the --thread-stats output:

          pid      write       poll       skip  maps (size 20K)
    1s   9770       144B          1          0   19K   19K   19K   18K   19K
         9772         0B          1          0   18K
         9773         0B          1          0   19K
         9774         0B          1          0   19K

There are 5 maps for thread 9770 (1 data map and 4 auxiliary)
and one data map for every other thread. Each thread writes
data to its own data file.
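The --threads assignment can be modelled with plain index arrays. Data map i and tracking map i both belong to CPU i. Thread i reads data map i, and thread 0 also reads every tracking map. assign_all() and maps_of() are hypothetical helpers used for illustration; in the patch, the equivalent work is done by record__threads_assign_all() on struct perf_mmap pointers.

```c
#include <assert.h>

/*
 * Fill owner_data[] and owner_trk[] with the reader thread of each data
 * map and each tracking map, and return the number of threads.
 */
static int assign_all(int nr_mmaps, int owner_data[], int owner_trk[])
{
	int i;

	for (i = 0; i < nr_mmaps; i++) {
		owner_data[i] = i;	/* thread i reads data map i */
		owner_trk[i]  = 0;	/* tracking maps all go to thread 0 */
	}
	return nr_mmaps;		/* one thread per data map */
}

/* Count how many maps (data and tracking) a given thread reads. */
static int maps_of(int thread, int nr_mmaps,
		   const int owner_data[], const int owner_trk[])
{
	int i, n = 0;

	for (i = 0; i < nr_mmaps; i++) {
		n += owner_data[i] == thread;
		n += owner_trk[i]  == thread;
	}
	return n;
}
```

With 4 CPUs, thread 0 ends up with 5 maps and every other thread with 1, matching the output above.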

In addition, we pin every thread to the CPU that its data
map belongs to, in order to keep both the writer (kernel) and
the reader (perf tool thread) on the same CPU.
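Pinning the calling thread works the same way as the patch's set_affinity(). On Linux, passing pid 0 to sched_setaffinity() applies the mask to the calling thread, not the whole process. The sketch below adds first_allowed_cpu(), a hypothetical helper that picks a CPU the thread is already allowed to run on, so the example also works inside restricted cpusets.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to one CPU, as set_affinity() in the patch does. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	return sched_setaffinity(0, sizeof(mask), &mask);
}

/* Hypothetical helper: first CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t mask;
	int cpu;

	if (sched_getaffinity(0, sizeof(mask), &mask))
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &mask))
			return cpu;
	return -1;
}
```

pthread_setaffinity_np() would be the alternative for pinning a thread from outside. The patch instead lets each thread pin itself right after signalling the main thread.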

Link: http://lkml.kernel.org/n/tip-ghcsnp3b73innq2gkl1lkfbz@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 133 +++++++++++++++++++++++++++++++++---
 1 file changed, 125 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 92ba4d83b18c..4cc728174c79 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -109,6 +109,8 @@ struct record {
 	struct switch_output	switch_output;
 	unsigned long long	samples;
 	struct record_thread	*threads;
+	bool			threads_all;
+	bool			threads_one;
 	int			threads_cnt;
 	bool			threads_set;
 	int			threads_signal_cnt;
@@ -393,15 +395,11 @@ static int record__mmap_evlist(struct record *rec,
 	return 0;
 }
 
-static int record__mmap_index(struct record *rec)
+static void record__mmap_index_single(struct record *rec)
 {
 	struct perf_evlist *evlist = rec->evlist;
 	struct perf_data *data = &rec->data;
-	int i, ret, nr = evlist->nr_mmaps;
-
-	ret = perf_data__create_index(data, nr);
-	if (ret)
-		return ret;
+	int i, nr = evlist->nr_mmaps;
 
 	for (i = 0; i < nr; i++) {
 		struct perf_mmap *map = &evlist->mmap[i];
@@ -414,6 +412,50 @@ static int record__mmap_index(struct record *rec)
 
 		map->file = &data->file;
 	}
+}
+
+static void record__mmap_index_all(struct record *rec)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_data     *data = &rec->data;
+	struct record_thread *threads = rec->threads;
+	struct record_thread *thread0 = threads;
+	int i, t;
+
+	BUG_ON(data->index_nr != rec->threads_cnt);
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct perf_mmap *map = &evlist->track_mmap[i];
+
+		map->file = &data->file;
+	}
+
+	thread0->mmap[0]->file = &data->index[0];
+
+	for (t = 1; t < rec->threads_cnt; t++) {
+		struct record_thread *th = threads + t;
+
+		for (i = 0; i < th->mmap_nr; i++) {
+			struct perf_mmap *map = th->mmap[i];
+
+			map->file = &data->index[t];
+		}
+	}
+}
+
+static int record__mmap_index(struct record *rec)
+{
+	struct perf_data *data = &rec->data;
+	int ret;
+
+	ret = perf_data__create_index(data, rec->threads_cnt);
+	if (ret)
+		return ret;
+
+	if (rec->threads_all)
+		record__mmap_index_all(rec);
+	else if (rec->threads_one)
+		record__mmap_index_single(rec);
 
 	return 0;
 }
@@ -1056,7 +1098,7 @@ record_thread__mmap(struct record_thread *th, int nr, int nr_ovw)
 }
 
 static int
-record__threads_assign(struct record *rec)
+record__threads_assign_single(struct record *rec)
 {
 	struct record_thread *threads = rec->threads;
 	struct record_thread *thread0 = threads;
@@ -1089,6 +1131,55 @@ record__threads_assign(struct record *rec)
 	return ret;
 }
 
+static int
+record__threads_assign_all(struct record *rec)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	struct record_thread *threads = rec->threads;
+	struct record_thread *thread0 = threads;
+	int cnt = rec->threads_cnt;
+	int i, t, nr, nr0, nr_trk;
+	int nr_cpus = cpu__max_present_cpu();
+
+	nr     = evlist->mmap       ? evlist->nr_mmaps : 0;
+	nr_trk = evlist->track_mmap ? evlist->nr_mmaps : 0;
+
+	BUG_ON(evlist->overwrite_mmap);
+	BUG_ON(nr_cpus != nr);
+
+	nr0 = 1 + nr_trk;
+
+	if (record_thread__mmap(thread0, nr0, 0))
+		return -ENOMEM;
+
+	thread0->mmap[0] = &evlist->mmap[0];
+
+	for (i = 0; i < nr_trk; i++)
+		thread0->mmap[i + 1] = &evlist->track_mmap[i];
+
+	for (t = 1; t < cnt; t++) {
+		struct record_thread *th = threads + t;
+
+		if (record_thread__mmap(th, 1, 0))
+			return -ENOMEM;
+
+		th->mmap[0] = &evlist->mmap[t];
+	}
+
+	return 0;
+}
+
+static int
+record__threads_assign(struct record *rec)
+{
+	if (rec->threads_all)
+		return record__threads_assign_all(rec);
+	else if (rec->threads_one)
+		return record__threads_assign_single(rec);
+	else
+		return -EINVAL;
+}
+
 static int
 record_thread__create_poll(struct record_thread *th,
 			   struct perf_evlist *evlist)
@@ -1146,7 +1237,8 @@ record__threads_create(struct record *rec)
 static int record__threads_cnt(struct record *rec)
 {
 	struct perf_evlist *evlist = rec->evlist;
-	int cnt;
+	bool all = false, one = false;
+	int cnt = 0;
 
 	if (rec->threads_set) {
 		if (rec->threads_cnt) {
@@ -1158,11 +1250,15 @@ static int record__threads_cnt(struct record *rec)
 			return -EINVAL;
 		}
 		cnt = evlist->nr_mmaps;
+		all = true;
 	} else {
+		one = true;
 		cnt = 1;
 	}
 
 	rec->threads_cnt = cnt;
+	rec->threads_all = all;
+	rec->threads_one = one;
 	return 0;
 }
 
@@ -1200,6 +1296,25 @@ static inline pid_t gettid(void)
 	return (pid_t) syscall(__NR_gettid);
 }
 
+static int set_affinity(int cpu)
+{
+	cpu_set_t mask;
+
+	CPU_ZERO(&mask);
+	CPU_SET(cpu, &mask);
+	return sched_setaffinity(0, sizeof(mask), &mask);
+}
+
+static void set_thread_affinity(struct record *rec)
+{
+	if (rec->threads_all) {
+		struct perf_mmap *m0 = thread->mmap[0];
+
+		if (set_affinity(m0->cpu))
+			pr_err("failed to set affinity for cpu %d\n", m0->cpu);
+	}
+}
+
 static void*
 record_thread__process(struct record *rec)
 {
@@ -1263,6 +1378,7 @@ static void *worker(void *arg)
 	thread->state = RECORD_THREAD__RUNNING;
 
 	signal_main(rec);
+	set_thread_affinity(rec);
 
 	return record_thread__process(rec);
 }
@@ -1283,6 +1399,7 @@ static int record__threads_start(struct record *rec)
 	if (rec->threads_cnt > 1)
 		wait_for_signal(rec);
 
+	set_thread_affinity(rec);
 	return err;
 }
 
-- 
2.17.1



* [PATCH 48/48] perf record: Spread maps for --threads=X option
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (46 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 47/48] perf record: Spread maps for --threads option Jiri Olsa
@ 2018-09-13 12:54 ` Jiri Olsa
  2018-09-13 16:10 ` [RFCv2 00/48] perf tools: Add threads to record command Alexey Budankov
  2018-09-14 17:02 ` Andi Kleen
  49 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-13 12:54 UTC
  To: Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

This code allows creating an arbitrary number of threads
and assigning maps to them.

Each thread gets a fair share of the data maps, and the
main thread also reads the tracking maps.

The assignment is visible in the --thread-stats output
(for --threads=2 on a 4-CPU system):

        pid      write       poll       skip  maps (size 20K)
  1s   9318       144B          1          0   18K   19K   17K   19K   19K   19K
       9320        16K          3          3   19K   18K

There are 6 maps for thread 9318 (2 data maps and 4 auxiliary)
and 2 data maps for thread 9320. Each thread writes data to its
own data file.
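The --threads=X split works in two steps. Each thread first gets nr / cnt data maps, and the first nr % cnt threads get one extra. The maps are then dealt out round-robin, one slot per thread at a time. The sketch below models that on indices. assign_cnt() is a hypothetical helper, not the patch's record__threads_assign_cnt(), and the tracking maps, which all go to thread 0, are left out.

```c
#include <assert.h>

/*
 * Spread nr data maps over cnt threads: per_thread[t] receives the number
 * of data maps thread t reads, owner[j] the thread that reads data map j.
 */
static void assign_cnt(int nr, int cnt, int per_thread[], int owner[])
{
	int t, i, j = 0;

	assert(cnt > 0);

	for (t = 0; t < cnt; t++)
		per_thread[t] = nr / cnt + (t < nr % cnt ? 1 : 0);

	/* Fill slot i of every thread that still has room, then slot i + 1. */
	for (i = 0; j < nr; i++)
		for (t = 0; t < cnt && j < nr; t++)
			if (i < per_thread[t])
				owner[j++] = t;
}
```

For 4 data maps and 2 threads, each thread reads 2 data maps (maps 0 and 2 for thread 0, maps 1 and 3 for thread 1). Adding the 4 tracking maps to thread 0 gives the 6 + 2 split shown above.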

Link: http://lkml.kernel.org/n/tip-5myhg112a5mqv0ij0q7hky98@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c | 103 ++++++++++++++++++++++++++++++++++--
 1 file changed, 98 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4cc728174c79..6d3ce2ca8fbe 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -443,6 +443,33 @@ static void record__mmap_index_all(struct record *rec)
 	}
 }
 
+static void record__mmap_index_cnt(struct record *rec)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_data     *data = &rec->data;
+	int i, t;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct perf_mmap *map = &evlist->track_mmap[i];
+
+		map->file = &data->file;
+	}
+
+	for (t = 0; t < rec->threads_cnt; t++) {
+		struct record_thread *th = rec->threads + t;
+		int nr = th->mmap_nr;
+
+		if (!t)
+			nr -= evlist->nr_mmaps;
+
+		for (i = 0; i < nr; i++) {
+			struct perf_mmap *map = th->mmap[i];
+
+			map->file = &data->index[t];
+		}
+	}
+}
+
 static int record__mmap_index(struct record *rec)
 {
 	struct perf_data *data = &rec->data;
@@ -456,6 +483,8 @@ static int record__mmap_index(struct record *rec)
 		record__mmap_index_all(rec);
 	else if (rec->threads_one)
 		record__mmap_index_single(rec);
+	else
+		record__mmap_index_cnt(rec);
 
 	return 0;
 }
@@ -1169,6 +1198,70 @@ record__threads_assign_all(struct record *rec)
 	return 0;
 }
 
+static int
+record__threads_assign_cnt(struct record *rec)
+{
+	struct record_thread *threads = rec->threads;
+	struct record_thread *thread0 = threads;
+	struct perf_evlist *evlist = rec->evlist;
+	int cnt = rec->threads_cnt;
+	int i, j, t, nr, nr_trk, nr_thr, nr_mod, n0 = 0;
+
+	nr     = evlist->mmap       ? evlist->nr_mmaps : 0;
+	nr_trk = evlist->track_mmap ? evlist->nr_mmaps : 0;
+
+	nr_thr = nr / cnt;
+	nr_mod = nr % cnt;
+
+	/*
+	 * Create threads' mmaps first..
+	 */
+	for (t = 0; t < cnt; t++) {
+		struct record_thread *th = threads + t;
+		int n = nr_thr;
+
+		/* evenly spread the remainder of the maps */
+		n += (nr_mod-- > 0) ? 1 : 0;
+
+		/* first thread deals with track mmaps */
+		if (!t) {
+			n0 = n;
+			n += nr_trk;
+		}
+
+		if (record_thread__mmap(th, n, 0))
+			return -ENOMEM;
+	}
+
+	/*
+	 *  ... and assign mmaps separately.
+	 */
+
+	j = 0; /* mmap in evlist->mmap */
+	i = 0; /* mmap in thread    */
+
+	while (1) {
+		for (t = 0; t < cnt; t++) {
+			struct record_thread *th = threads + t;
+
+			if (i == th->mmap_nr)
+				continue;
+
+			th->mmap[i] = &evlist->mmap[j++];
+
+			if (j == nr)
+				goto out;
+		}
+		i++;
+	}
+out:
+	/* Assign track maps to thread 0. */
+	for (i = n0, j = 0; j < nr_trk && i < thread0->mmap_nr; i++, j++)
+		thread0->mmap[i] = &evlist->track_mmap[j];
+
+	return 0;
+}
+
 static int
 record__threads_assign(struct record *rec)
 {
@@ -1177,7 +1270,7 @@ record__threads_assign(struct record *rec)
 	else if (rec->threads_one)
 		return record__threads_assign_single(rec);
 	else
-		return -EINVAL;
+		return record__threads_assign_cnt(rec);
 }
 
 static int
@@ -1242,15 +1335,15 @@ static int record__threads_cnt(struct record *rec)
 
 	if (rec->threads_set) {
 		if (rec->threads_cnt) {
-			pr_err("failed: Can't specify number of threads yet.\n");
-			return -EINVAL;
+			cnt = rec->threads_cnt;
+		} else {
+			cnt = evlist->nr_mmaps;
+			all = true;
 		}
 		if (evlist->overwrite_mmap) {
 			pr_err("failed: Can't use multiple threads with overwrite mmaps yet.\n");
 			return -EINVAL;
 		}
-		cnt = evlist->nr_mmaps;
-		all = true;
 	} else {
 		one = true;
 		cnt = 1;
-- 
2.17.1



* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (47 preceding siblings ...)
  2018-09-13 12:54 ` [PATCH 48/48] perf record: Spread maps for --threads=X option Jiri Olsa
@ 2018-09-13 16:10 ` Alexey Budankov
  2018-09-14  2:29   ` Namhyung Kim
  2018-09-14  8:26   ` Jiri Olsa
  2018-09-14 17:02 ` Andi Kleen
  49 siblings, 2 replies; 101+ messages in thread
From: Alexey Budankov @ 2018-09-13 16:10 UTC
  To: Jiri Olsa, Arnaldo Carvalho de Melo
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen

Hi,

On 13.09.2018 15:54, Jiri Olsa wrote:
> hi,
> sending *RFC* for threads support in perf record command.
> 
> In big picture this patchset adds perf record --threads
> option that allows to create threads in following modes:
> 
> 1) single thread mode (current)
> 
>   $ perf record ...
>   $ perf record --threads=1 ...
> 
>   - all maps are read/stored under process thread
> 
> 2) mode with specific (X) number of threads
> 
>   $ perf record --threads=X ...
> 
>   - maps are spread equally among threads
> 
> 3) mode that creates thread for every monitored memory map
> 
>   $ perf record --threads ...
> 
>   - which in perf record is equal to number of CPUs, and
>     it pins each thread to its map's cpu.
> 
> 4) TODO - NUMA aware threads/maps separation
>    ...
> 
> The perf.data stays as a single file.
> 
> v2 changes:
>   - rebased to current Arnaldo's perf/core
>     (also based on few fixes from my perf/core, see the branch details below)
> 
> This patchset contains a lot of preparation changes to make
> threaded record possible:
> 
>   - Namhyung's changes to create multiple data streams in
>     perf data file, which allows having each thread data
>     being stored in separate files and merged into single
>     perf data after
> 
>   - Namhyung's changes to create track mmaps for auxiliary
>     events
> 
>   - Namhyung's changes to search for threads/mmaps/comms
>     using the time. This is needed because we have now
>     multiple data streams which are processed separately,
>     but they all need access to complete auxiliary events
>     data (threads/mmaps/comms). That's also a reason why
>     the auxiliary events are stored into separate data
>     stream, which is processed before real data.
> 
>   - the rest of the code that adds threads abstraction into
>     record command allows to create them and distribute maps
>     among them
> 
>   - other preparatory changes
> 
> The threaded monitoring currently can't monitor backward maps
> and there are probably more limitations which I haven't spotted
> yet.
> 
> So far I tested on laptop:
>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
> 
> and a one bigger server:
>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
> 
> I can see decrease in recorded LOST events, but both the benchmark
> and the monitoring must be carefully configured wrt:
>   - number of events (frequency)
>   - size of the memory maps
>   - size of events (callchains)
>   - final perf.data size
> 
> It's also available in:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/record_threads
> 
> thoughts? ;-) thanks
> jirka

It would be preferable to split the series into smaller pieces,
each bringing an improvement proven by metrics and ready for
merging upstream. Do we have any metrics other than the data
loss numbers from the trace AIO patches?

The series uses the POSIX threading API but contains no
implementation of its own, which would avoid a dependency
on externally coded designs in the core of the tool.

> 
> 
> ---
> Jiri Olsa (30):
>       perf tools: Remove perf_tool from event_op2
>       perf tools: Remove perf_tool from event_op3
>       perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions
>       perf tools: Add struct perf_mmap arg into record__write
>       perf tools: Create separate mmap for dummy tracking event
>       perf tools: Make copyfile_offset global
>       perf tools: Add perf_data__create_index function
>       perf record: Add --index option for building index table
>       perf tools: Convert dead thread list into rbtree
>       perf tools: Add thread::exited flag
>       perf callchain: Maintain libunwind's address space in map_groups
>       perf tools: Rename perf_evlist__munmap_filtered to perf_mmap__put_filtered
>       tools lib fd array: Introduce fdarray__add_clone function
>       tools lib subcmd: Add OPT_INTEGER_OPTARG|_SET options
>       perf tools: Move __perf_session__process_events args into struct
>       perf ui progress: Fix index progress display
>       perf tools: Add threads debug variable
>       perf tools: Add perf_mmap__read_tail function
>       perf record: Introduce struct record_thread
>       perf record: Read record thread's mmaps
>       perf record: Move waking into struct record
>       perf record: Move samples into struct record_thread
>       perf record: Move bytes_written into struct record_thread
>       perf record: Add record_thread start/stop/process functions
>       perf record: Wait for all threads being started
>       perf record: Add --threads option
>       perf record: Add --thread-stats option support
>       perf record: Add maps to --thread-stats output
>       perf record: Spread maps for --threads option
>       perf record: Spread maps for --threads=X option
> 
> Namhyung Kim (18):
>       perf tools: Use a software dummy event to track task/mmap events
>       perf tools: Extend perf_evlist__mmap_ex() to use track mmap
>       perf report: Skip dummy tracking event
>       perf tools: Add HEADER_DATA_INDEX feature
>       perf tools: Handle indexed data file properly
>       perf tools: Introduce thread__comm(_str)_by_time() helpers
>       perf tools: Add a test case for thread comm handling
>       perf tools: Use thread__comm_by_time() when adding hist entries
>       perf tools: Introduce machine__find*_thread_by_time()
>       perf tools: Add a test case for timed thread handling
>       perf tools: Maintain map groups list in a leader thread
>       perf tools: Introduce thread__find_symbol_by_time() and friends
>       perf callchain: Use thread__find_addr_location_by_time() and friends
>       perf tools: Add a test case for timed map groups handling
>       perf tools: Save timestamp of a map creation
>       perf tools: Introduce map_groups__{insert,find}_by_time()
>       perf tools: Use map_groups__find_addr_by_time()
>       perf tools: Add testcase for managing maps with time
> 
>  tools/lib/api/fd/array.c                 |  17 +
>  tools/lib/api/fd/array.h                 |   1 +
>  tools/lib/subcmd/parse-options.c         |   2 +
>  tools/lib/subcmd/parse-options.h         |   9 +
>  tools/perf/Documentation/perf-record.txt |   4 +
>  tools/perf/Documentation/perf.txt        |   1 +
>  tools/perf/builtin-annotate.c            |   7 +-
>  tools/perf/builtin-inject.c              |  32 +-
>  tools/perf/builtin-record.c              | 899 +++++++++++++++++++++++++++++--
>  tools/perf/builtin-report.c              |  12 +-
>  tools/perf/builtin-script.c              |  38 +-
>  tools/perf/builtin-stat.c                |  23 +-
>  tools/perf/perf.c                        |   1 +
>  tools/perf/perf.h                        |   3 +
>  tools/perf/tests/Build                   |   4 +
>  tools/perf/tests/builtin-test.c          |  16 +
>  tools/perf/tests/dwarf-unwind.c          |   4 +-
>  tools/perf/tests/hists_common.c          |   2 +-
>  tools/perf/tests/hists_link.c            |   2 +-
>  tools/perf/tests/tests.h                 |   4 +
>  tools/perf/tests/thread-comm.c           |  48 ++
>  tools/perf/tests/thread-lookup-time.c    | 181 +++++++
>  tools/perf/tests/thread-map-time.c       |  90 ++++
>  tools/perf/tests/thread-mg-share.c       |   7 +-
>  tools/perf/tests/thread-mg-time.c        |  94 ++++
>  tools/perf/ui/browsers/hists.c           |  30 +-
>  tools/perf/ui/gtk/hists.c                |   3 +
>  tools/perf/util/auxtrace.c               |  30 +-
>  tools/perf/util/auxtrace.h               |  21 +-
>  tools/perf/util/data.c                   |  64 +++
>  tools/perf/util/data.h                   |   5 +
>  tools/perf/util/debug.c                  |   2 +
>  tools/perf/util/debug.h                  |   1 +
>  tools/perf/util/dso.c                    |   2 +-
>  tools/perf/util/event.c                  | 135 ++++-
>  tools/perf/util/evlist.c                 |  96 +++-
>  tools/perf/util/evlist.h                 |   7 +-
>  tools/perf/util/evsel.h                  |  15 +
>  tools/perf/util/header.c                 |  93 +++-
>  tools/perf/util/header.h                 |  18 +-
>  tools/perf/util/hist.c                   |   4 +-
>  tools/perf/util/intel-pt.c               |   2 +-
>  tools/perf/util/machine.c                | 293 ++++++++--
>  tools/perf/util/machine.h                |  22 +-
>  tools/perf/util/map.c                    |  79 ++-
>  tools/perf/util/map.h                    |  40 +-
>  tools/perf/util/mmap.c                   |   6 +-
>  tools/perf/util/mmap.h                   |  33 +-
>  tools/perf/util/session.c                | 178 +++---
>  tools/perf/util/session.h                |   5 +-
>  tools/perf/util/stat.c                   |   5 +-
>  tools/perf/util/stat.h                   |   5 +-
>  tools/perf/util/symbol-elf.c             |   2 +-
>  tools/perf/util/symbol.c                 |   4 +-
>  tools/perf/util/thread.c                 | 200 ++++++-
>  tools/perf/util/thread.h                 |  27 +-
>  tools/perf/util/tool.h                   |   7 +-
>  tools/perf/util/unwind-libdw.c           |   6 +-
>  tools/perf/util/unwind-libunwind-local.c |  39 +-
>  tools/perf/util/unwind-libunwind.c       |   9 +-
>  tools/perf/util/unwind.h                 |   7 +-
>  tools/perf/util/util.c                   |   2 +-
>  tools/perf/util/util.h                   |   2 +
>  63 files changed, 2608 insertions(+), 392 deletions(-)
>  create mode 100644 tools/perf/tests/thread-comm.c
>  create mode 100644 tools/perf/tests/thread-lookup-time.c
>  create mode 100644 tools/perf/tests/thread-map-time.c
>  create mode 100644 tools/perf/tests/thread-mg-time.c
> 

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-13 16:10 ` [RFCv2 00/48] perf tools: Add threads to record command Alexey Budankov
@ 2018-09-14  2:29   ` Namhyung Kim
  2018-09-14  7:15     ` Alexey Budankov
                       ` (2 more replies)
  2018-09-14  8:26   ` Jiri Olsa
  1 sibling, 3 replies; 101+ messages in thread
From: Namhyung Kim @ 2018-09-14  2:29 UTC
  To: Alexey Budankov
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, kernel-team

On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
> Hi,

Hello,

> 
> On 13.09.2018 15:54, Jiri Olsa wrote:
> > hi,
> > sending *RFC* for threads support in perf record command.
> > 
> > In big picture this patchset adds perf record --threads
> > option that allows to create threads in following modes:
> > 
> > 1) single thread mode (current)
> > 
> >   $ perf record ...
> >   $ perf record --threads=1 ...
> > 
> >   - all maps are read/stored under process thread
> > 
> > 2) mode with specific (X) number of threads
> > 
> >   $ perf record --threads=X ...
> > 
> >   - maps are spread equaly among threads
> > 
> > 3) mode that creates thread for every monitored memory map
> > 
> >   $ perf record --threads ...
> > 
> >   - which in perf record is equal to number of CPUs, and
> >     it pins each thread to its map's cpu:
> > 
> > 4) TODO - NUMA aware threads/maps separation
> >    ...
> > 
> > The perf.data stays as a single file.

I'm not sure we really need to keep it as a single file.  As this is a
fairly big change, we might consider breaking compatibility and using
a directory structure instead.


> > 
> > v2 changes:
> >   - rebased to current Arnaldo's perf/core
> >     (also based on few fixes from my perf/core, see the branch details below)
> > 
> > This patchset contains lot of preparation changes to make
> > threaded record possible:
> > 
> >   - Namhyung's changes to create multiple data streams in
> >     perf data file, which allows having each thread data
> >     being stored in separate files and merged into single
> >     perf data after
> > 
> >   - Namhyung's changes to create track mmaps for auxiliary
> >     events
> > 
> >   - Namhyung's changes to search for threads/mmaps/comms
> >     using the time. This is needed because we have now
> >     multiple data streams which are processed separately,
> >     but they all need access to complete auxiliary events
> >     data (threads/mmaps/comms). That's also a reason why
> >     the auxiliary events are stored into separate data
> >     stream, which is processed before real data.
> > 
> >   - the rest of the code that adds threads abstraction into
> >     record command allows to create them and distribute maps
> >     among them
> > 
> >   - other preparational changes
> > 
> > The threaded monitoring currently can't monitor backward maps
> > and there are probably more limitations which I haven't spotted
> > yet.
> > 
> > So far I tested on laptop:
> >   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
> > 
> > and a one bigger server:
> >   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
> > 
> > I can see decrease in recorded LOST events, but both the benchmark
> > and the monitoring must be carefully configured wrt:
> >   - number of events (frequency)
> >   - size of the memory maps
> >   - size of events (callchains)
> >   - final perf.data size
> > 
> > It's also available in:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> >   perf/record_threads
> > 
> > thoughts? ;-) thanks
> > jirka
> 
> It is preferable to split into smaller pieces that bring 
> some improvement proved by metrics numbers and ready for 
> merging and upstream. Do we have more metrics than the 
> data loss from trace AIO patches?

Well, this change is to enable parallel access to perf data, so the
preparatory work doesn't affect single-thread processing (hopefully),
or may even make it worse.  I'm not sure it could be split up in a way
that shows performance benefits.


> 
> There is usage of Posix threading API but there is no 
> its implementation in the patch series, to avoid dependency 
> on externally coded designs in the core of the tool.

Do you mean it needs to implement its own threading?  I don't think
that's what Ingo wanted.

Thanks,
Namhyung



* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  2:29   ` Namhyung Kim
@ 2018-09-14  7:15     ` Alexey Budankov
  2018-09-14  8:23     ` Jiri Olsa
  2018-09-14  9:33     ` Ingo Molnar
  2 siblings, 0 replies; 101+ messages in thread
From: Alexey Budankov @ 2018-09-14  7:15 UTC
  To: Namhyung Kim
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, kernel-team

Hi Namhyung,

On 14.09.2018 5:29, Namhyung Kim wrote:
> On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
>> Hi,
> 
> Hello,
> 
>>
>> On 13.09.2018 15:54, Jiri Olsa wrote:
>>> hi,
>>> sending *RFC* for threads support in perf record command.
>>>
>>> In big picture this patchset adds perf record --threads
>>> option that allows to create threads in following modes:
>>>
>>> 1) single thread mode (current)
>>>
>>>   $ perf record ...
>>>   $ perf record --threads=1 ...
>>>
>>>   - all maps are read/stored under process thread
>>>
>>> 2) mode with specific (X) number of threads
>>>
>>>   $ perf record --threads=X ...
>>>
>>>   - maps are spread equaly among threads
>>>
>>> 3) mode that creates thread for every monitored memory map
>>>
>>>   $ perf record --threads ...
>>>
>>>   - which in perf record is equal to number of CPUs, and
>>>     it pins each thread to its map's cpu:
>>>
>>> 4) TODO - NUMA aware threads/maps separation
>>>    ...
>>>
>>> The perf.data stays as a single file.
> 
> I'm not sure we really need to keep it as a single file.  As it's a
> kind of big changes, we might consider breaking compatibility and use
> a directory structure.
> 
> 
>>>
>>> v2 changes:
>>>   - rebased to current Arnaldo's perf/core
>>>     (also based on few fixes from my perf/core, see the branch details below)
>>>
>>> This patchset contains lot of preparation changes to make
>>> threaded record possible:
>>>
>>>   - Namhyung's changes to create multiple data streams in
>>>     perf data file, which allows having each thread data
>>>     being stored in separate files and merged into single
>>>     perf data after
>>>
>>>   - Namhyung's changes to create track mmaps for auxiliary
>>>     events
>>>
>>>   - Namhyung's changes to search for threads/mmaps/comms
>>>     using the time. This is needed because we have now
>>>     multiple data streams which are processed separately,
>>>     but they all need access to complete auxiliary events
>>>     data (threads/mmaps/comms). That's also a reason why
>>>     the auxiliary events are stored into separate data
>>>     stream, which is processed before real data.
>>>
>>>   - the rest of the code that adds threads abstraction into
>>>     record command allows to create them and distribute maps
>>>     among them
>>>
>>>   - other preparational changes
>>>
>>> The threaded monitoring currently can't monitor backward maps
>>> and there are probably more limitations which I haven't spotted
>>> yet.
>>>
>>> So far I tested on laptop:
>>>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
>>>
>>> and a one bigger server:
>>>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
>>>
>>> I can see decrease in recorded LOST events, but both the benchmark
>>> and the monitoring must be carefully configured wrt:
>>>   - number of events (frequency)
>>>   - size of the memory maps
>>>   - size of events (callchains)
>>>   - final perf.data size
>>>
>>> It's also available in:
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>>>   perf/record_threads
>>>
>>> thoughts? ;-) thanks
>>> jirka
>>
>> It is preferable to split into smaller pieces that bring 
>> some improvement proved by metrics numbers and ready for 
>> merging and upstream. Do we have more metrics than the 
>> data loss from trace AIO patches?
> 
> Well, this change is to enable parallel access of perf data so the
> preparation works don't affect single thread processing (hopefully) or
> make it even worse.  I'm not sure if could be split for performance
> benefits.
> 
> 
>>
>> There is usage of Posix threading API but there is no 
>> its implementation in the patch series, to avoid dependency 
>> on externally coded designs in the core of the tool.
> 
> Do you mean it needs to implement its own threading?  I don't think
> that's what Ingo wanted to.

The Pthreads implementation is still part of libc, and in the case of
glibc it even lives in a separate binary, libpthread.so.

I guess it is probably acceptable for Pthreads to reside separately,
because both glibc and bionic provide a reasonable implementation.

But which other libc libraries does perf target? It would be better
to clarify that in advance in order to provide a complete implementation.

Thanks,
Alexey

> 
> Thanks,
> Namhyung
> 
> 

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  2:29   ` Namhyung Kim
  2018-09-14  7:15     ` Alexey Budankov
@ 2018-09-14  8:23     ` Jiri Olsa
  2018-09-14  9:40       ` Ingo Molnar
  2018-09-14  9:33     ` Ingo Molnar
  2 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-14  8:23 UTC
  To: Namhyung Kim
  Cc: Alexey Budankov, Jiri Olsa, Arnaldo Carvalho de Melo, lkml,
	Ingo Molnar, Alexander Shishkin, Peter Zijlstra, Andi Kleen,
	kernel-team

On Fri, Sep 14, 2018 at 11:29:10AM +0900, Namhyung Kim wrote:
> On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
> > Hi,
> 
> Hello,
> 
> > 
> > On 13.09.2018 15:54, Jiri Olsa wrote:
> > > hi,
> > > sending *RFC* for threads support in perf record command.
> > > 
> > > In big picture this patchset adds perf record --threads
> > > option that allows to create threads in following modes:
> > > 
> > > 1) single thread mode (current)
> > > 
> > >   $ perf record ...
> > >   $ perf record --threads=1 ...
> > > 
> > >   - all maps are read/stored under process thread
> > > 
> > > 2) mode with specific (X) number of threads
> > > 
> > >   $ perf record --threads=X ...
> > > 
> > >   - maps are spread equaly among threads
> > > 
> > > 3) mode that creates thread for every monitored memory map
> > > 
> > >   $ perf record --threads ...
> > > 
> > >   - which in perf record is equal to number of CPUs, and
> > >     it pins each thread to its map's cpu:
> > > 
> > > 4) TODO - NUMA aware threads/maps separation
> > >    ...
> > > 
> > > The perf.data stays as a single file.
> 
> I'm not sure we really need to keep it as a single file.  As it's a
> kind of big changes, we might consider breaking compatibility and use
> a directory structure.

moving the files into the perf.data at the end is actually
not a lot of code.. and I think it's one of the 'small' things
that make this feature more user friendly

jirka
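
A minimal sketch of why folding the per-thread files back into one perf.data needs little code, assuming each thread writes its own temporary file and the final file stores the streams back to back, with one (offset, size) index entry per stream. `struct data_index` and `append_stream()` are hypothetical names; the patchset's real helpers (copyfile_offset, perf_data__create_index) may work differently.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical index entry: where one thread's stream lives in the output. */
struct data_index {
	uint64_t offset;
	uint64_t size;
};

/*
 * Append the whole content of 'src' to the end of 'dst' and record the
 * copied range in 'idx'. Returns 0 on success, -1 on I/O error.
 */
static int append_stream(FILE *dst, FILE *src, struct data_index *idx)
{
	char buf[4096];
	size_t n;
	long start;

	/* Streams are appended at the current end of the output file. */
	if (fseek(dst, 0, SEEK_END) || (start = ftell(dst)) < 0)
		return -1;
	rewind(src);

	idx->offset = (uint64_t)start;
	idx->size = 0;

	while ((n = fread(buf, 1, sizeof(buf), src)) > 0) {
		if (fwrite(buf, 1, n, dst) != n)
			return -1;
		idx->size += n;
	}
	return ferror(src) ? -1 : 0;
}
```

Calling append_stream() once per thread file, after the header has been written, yields the index table a reader could later use to process each stream separately.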

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-13 16:10 ` [RFCv2 00/48] perf tools: Add threads to record command Alexey Budankov
  2018-09-14  2:29   ` Namhyung Kim
@ 2018-09-14  8:26   ` Jiri Olsa
  2018-09-14  8:28     ` Jiri Olsa
  1 sibling, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-14  8:26 UTC
  To: Alexey Budankov
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 13.09.2018 15:54, Jiri Olsa wrote:
> > hi,
> > sending *RFC* for threads support in perf record command.
> > 
> > In big picture this patchset adds perf record --threads
> > option that allows to create threads in following modes:
> > 
> > 1) single thread mode (current)
> > 
> >   $ perf record ...
> >   $ perf record --threads=1 ...
> > 
> >   - all maps are read/stored under process thread
> > 
> > 2) mode with specific (X) number of threads
> > 
> >   $ perf record --threads=X ...
> > 
> >   - maps are spread equaly among threads
> > 
> > 3) mode that creates thread for every monitored memory map
> > 
> >   $ perf record --threads ...
> > 
> >   - which in perf record is equal to number of CPUs, and
> >     it pins each thread to its map's cpu:
> > 
> > 4) TODO - NUMA aware threads/maps separation
> >    ...
> > 
> > The perf.data stays as a single file.
> > 
> > v2 changes:
> >   - rebased to current Arnaldo's perf/core
> >     (also based on few fixes from my perf/core, see the branch details below)
> > 
> > This patchset contains lot of preparation changes to make
> > threaded record possible:
> > 
> >   - Namhyung's changes to create multiple data streams in
> >     perf data file, which allows having each thread data
> >     being stored in separate files and merged into single
> >     perf data after
> > 
> >   - Namhyung's changes to create track mmaps for auxiliary
> >     events
> > 
> >   - Namhyung's changes to search for threads/mmaps/comms
> >     using the time. This is needed because we have now
> >     multiple data streams which are processed separately,
> >     but they all need access to complete auxiliary events
> >     data (threads/mmaps/comms). That's also a reason why
> >     the auxiliary events are stored into separate data
> >     stream, which is processed before real data.
> > 
> >   - the rest of the code that adds threads abstraction into
> >     record command allows to create them and distribute maps
> >     among them
> > 
> >   - other preparational changes
> > 
> > The threaded monitoring currently can't monitor backward maps
> > and there are probably more limitations which I haven't spotted
> > yet.
> > 
> > So far I tested on laptop:
> >   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
> > 
> > and a one bigger server:
> >   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
> > 
> > I can see decrease in recorded LOST events, but both the benchmark
> > and the monitoring must be carefully configured wrt:
> >   - number of events (frequency)
> >   - size of the memory maps
> >   - size of events (callchains)
> >   - final perf.data size
> > 
> > It's also available in:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> >   perf/record_threads
> > 
> > thoughts? ;-) thanks
> > jirka
> 
> It would be preferable to split this into smaller pieces that each bring
> some improvement, proven by metric numbers, and are ready for
> merging upstream. Do we have more metrics than the
> data loss from the trace AIO patches?

well the primary focus is to get more events in,
so the LOST metric is the main one

> 
> The series uses the POSIX threading API but does not provide its own
> implementation of it, which would be needed to avoid depending
> on externally coded designs in the core of the tool.

well, we use pthreads in here, but it's really not that
much code.. we could make that generic in the future if needed

jirka
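
To illustrate why the pthread side stays small, here is a minimal sketch of the --threads=X mode, where maps are spread equally among X threads. It is not code from the patchset: it assumes a round-robin split, and struct fake_map, struct worker, owner_of() and run_workers() are hypothetical names. The real series gives each thread actual struct perf_mmap pointers plus its own pollfd array (see struct record_thread in patch 37). Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for one mmap'ed ring buffer: only an event count. */
struct fake_map {
	int events;
};

/* Hypothetical per-thread state: the subset of maps this thread drains. */
struct worker {
	pthread_t	  tid;
	struct fake_map	**maps;
	int		  nr_maps;
	long		  drained;
};

/* Round-robin split: map i belongs to thread i % nr_threads, so the
 * per-thread map counts differ by at most one. */
static int owner_of(int map_idx, int nr_threads)
{
	return map_idx % nr_threads;
}

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	/* Each thread touches only its own maps, so no locking is needed. */
	for (int i = 0; i < w->nr_maps; i++)
		w->drained += w->maps[i]->events;
	return NULL;
}

/*
 * Spread nr_maps maps over nr_threads threads, run them and return the
 * total number of drained events, or -1 on error. counts[t] receives the
 * number of maps assigned to thread t.
 */
static long run_workers(struct fake_map *maps, int nr_maps,
			int nr_threads, int *counts)
{
	struct worker *w;
	long total = 0;
	int per_thread, started = 0, t, i;

	assert(nr_threads > 0 && nr_maps >= 0);
	per_thread = nr_maps / nr_threads + 1;	/* upper bound per thread */
	w = calloc(nr_threads, sizeof(*w));
	if (!w)
		return -1;

	for (t = 0; t < nr_threads; t++) {
		w[t].maps = calloc(per_thread, sizeof(*w[t].maps));
		if (!w[t].maps) {
			total = -1;
			goto out_free;
		}
	}

	for (i = 0; i < nr_maps; i++) {
		struct worker *o = &w[owner_of(i, nr_threads)];

		o->maps[o->nr_maps++] = &maps[i];
	}

	for (t = 0; t < nr_threads; t++) {
		if (pthread_create(&w[t].tid, NULL, worker_fn, &w[t]))
			break;
		started++;
	}

	for (t = 0; t < started; t++) {
		pthread_join(w[t].tid, NULL);
		counts[t] = w[t].nr_maps;
		total += w[t].drained;
	}
	if (started != nr_threads)
		total = -1;

out_free:
	for (t = 0; t < nr_threads; t++)
		free(w[t].maps);
	free(w);
	return total;
}
```

For 10 maps over 4 threads the round-robin split gives 3, 3, 2 and 2 maps per thread.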

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  8:26   ` Jiri Olsa
@ 2018-09-14  8:28     ` Jiri Olsa
  2018-09-14  9:37       ` Alexey Budankov
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-14  8:28 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:

SNIP

> > > The threaded monitoring currently can't monitor backward maps
> > > and there are probably more limitations which I haven't spotted
> > > yet.
> > > 
> > > So far I tested on a laptop:
> > >   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
> > > 
> > > and on one bigger server:
> > >   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
> > > 
> > > I can see a decrease in recorded LOST events, but both the benchmark
> > > and the monitoring must be carefully configured wrt:
> > >   - number of events (frequency)
> > >   - size of the memory maps
> > >   - size of events (callchains)
> > >   - final perf.data size
> > > 
> > > It's also available in:
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > >   perf/record_threads
> > > 
> > > thoughts? ;-) thanks
> > > jirka
> > 
> > It would be preferable to split this into smaller pieces that each bring
> > some improvement, proven by metric numbers, and are ready for
> > merging upstream. Do we have more metrics than the
> > data loss from the trace AIO patches?
> 
> well the primary focus is to get more events in,
> so the LOST metric is the main one

actually, I was hoping you could run it through the same
tests you do for the AIO code on some huge server?

thanks,
jirka

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  2:29   ` Namhyung Kim
  2018-09-14  7:15     ` Alexey Budankov
  2018-09-14  8:23     ` Jiri Olsa
@ 2018-09-14  9:33     ` Ingo Molnar
  2 siblings, 0 replies; 101+ messages in thread
From: Ingo Molnar @ 2018-09-14  9:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexey Budankov, Jiri Olsa, Arnaldo Carvalho de Melo, lkml,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, kernel-team


* Namhyung Kim <namhyung@kernel.org> wrote:

> > > The perf.data stays as a single file.
> 
> I'm not sure we really need to keep it as a single file.  As it's a
> big change anyway, we might consider breaking compatibility and using
> a directory structure.

Agreed - and to make use of the highly scalable Linux VFS implementation we should
attempt to use per-CPU file resources as well.

Any cross-CPU contention should stick out like a sore thumb.

> > The series uses the POSIX threading API but does not provide its own
> > implementation of it, which would be needed to avoid depending
> > on externally coded designs in the core of the tool.
> 
> Do you mean it needs to implement its own threading?  I don't think
> that's what Ingo meant.

Yeah, I didn't mean that: every libc hoping to work on Linux implements a pthread API, plus the 
pthread APIs we are using are really just narrow wrappers on top of system calls that were 
written with libc pthread APIs in mind. So it's not a problem to rely on pthreads.h. (And if we 
have trouble with any particular pthread detail we can single out specific functionality and 
not use it or use our own implementation.)

The AIO library is another matter: it's a family of interfaces with complex libc specific 
design choices that cannot be influenced.

I.e. my suggestion was to keep using pthreads APIs like we do today, but not use the libc AIO 
library. Not because there's any problem with glibc AIO: but because basic event flow is a core 
competency of perf that we want to implement ourselves.

Is this clearer?

Thanks,

	Ingo

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  8:28     ` Jiri Olsa
@ 2018-09-14  9:37       ` Alexey Budankov
  2018-09-21  6:13         ` Alexey Budankov
  0 siblings, 1 reply; 101+ messages in thread
From: Alexey Budankov @ 2018-09-14  9:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

On 14.09.2018 11:28, Jiri Olsa wrote:
> On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:
> 
> SNIP
> 
>>>> The threaded monitoring currently can't monitor backward maps
>>>> and there are probably more limitations which I haven't spotted
>>>> yet.
>>>>
>>>> So far I tested on a laptop:
>>>>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
>>>>
>>>> and on one bigger server:
>>>>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
>>>>
>>>> I can see a decrease in recorded LOST events, but both the benchmark
>>>> and the monitoring must be carefully configured wrt:
>>>>   - number of events (frequency)
>>>>   - size of the memory maps
>>>>   - size of events (callchains)
>>>>   - final perf.data size
>>>>
>>>> It's also available in:
>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>>>>   perf/record_threads
>>>>
>>>> thoughts? ;-) thanks
>>>> jirka
>>>
>>> It would be preferable to split this into smaller pieces that each bring
>>> some improvement, proven by metric numbers, and are ready for
>>> merging upstream. Do we have more metrics than the
>>> data loss from the trace AIO patches?
>>
>> well the primary focus is to get more events in,
>> so the LOST metric is the main one
> 
> actually, I was hoping you could run it through the same
> tests you do for the AIO code on some huge server?

Yeah, I will, but it takes some time.

> 
> thanks,
> jirka
> 

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  8:23     ` Jiri Olsa
@ 2018-09-14  9:40       ` Ingo Molnar
  2018-09-14 11:15         ` Peter Zijlstra
  0 siblings, 1 reply; 101+ messages in thread
From: Ingo Molnar @ 2018-09-14  9:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Alexey Budankov, Jiri Olsa,
	Arnaldo Carvalho de Melo, lkml, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, kernel-team


* Jiri Olsa <jolsa@redhat.com> wrote:

> On Fri, Sep 14, 2018 at 11:29:10AM +0900, Namhyung Kim wrote:
> > On Thu, Sep 13, 2018 at 07:10:35PM +0300, Alexey Budankov wrote:
> > > Hi,
> > 
> > Hello,
> > 
> > > 
> > > On 13.09.2018 15:54, Jiri Olsa wrote:
> > > > hi,
> > > > sending *RFC* for threads support in perf record command.
> > > > 
> > > > In the big picture this patchset adds perf record --threads
> > > > option that allows creating threads in the following modes:
> > > > 
> > > > 1) single thread mode (current)
> > > > 
> > > >   $ perf record ...
> > > >   $ perf record --threads=1 ...
> > > > 
> > > >   - all maps are read/stored under process thread
> > > > 
> > > > 2) mode with specific (X) number of threads
> > > > 
> > > >   $ perf record --threads=X ...
> > > > 
> > > >   - maps are spread equally among threads
> > > > 
> > > > 3) mode that creates a thread for every monitored memory map
> > > > 
> > > >   $ perf record --threads ...
> > > > 
> > > >   - which in perf record is equal to the number of CPUs, and
> > > >     it pins each thread to its map's CPU.
> > > > 
> > > > 4) TODO - NUMA aware threads/maps separation
> > > >    ...
> > > > 
> > > > The perf.data stays as a single file.
> > 
> > I'm not sure we really need to keep it as a single file.  As it's a
> > big change anyway, we might consider breaking compatibility and using
> > a directory structure.
> 
> moving the files into perf.data at the end is actually
> not a lot of code.. and I think it's one of the 'small' things
> that make this feature more user friendly

The user shouldn't really have to care about the structure of the file for most uses of perf 
tooling, and 'single file' versus 'single directory' has similar usability IMHO.

When moving across machines it's recommended to use 'perf archive' anyway, which already 
creates a tarball that includes debuginfo and other context.

In fact keeping the files separate has scalability advantages for 'perf report' and similar 
parsing tools: they could read all the streams in a per-CPU fashion already, from the very 
beginning.
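
Reading the streams in a per-CPU fashion can be sketched as one report thread per stream, each building a private histogram, followed by a cheap serial reduction. This is an illustration under simplifying assumptions: samples are already resolved to small integer symbol ids, and struct stream, process_stream() and report_streams() are hypothetical names. It deliberately ignores cross-stream ordering; per the cover letter, the series handles that by looking up threads/mmaps/comms by time and by processing the auxiliary-event stream first.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NR_SYMS 8	/* hypothetical: every sample resolves to one of 8 symbols */

/* One per-CPU stream as read back at report time; here just resolved
 * symbol ids. hist[] is private to the thread processing the stream. */
struct stream {
	const int	*sym_ids;
	int		 nr;
	long		 hist[NR_SYMS];
	pthread_t	 tid;
};

static void *process_stream(void *arg)
{
	struct stream *s = arg;

	memset(s->hist, 0, sizeof(s->hist));
	for (int i = 0; i < s->nr; i++) {
		int id = s->sym_ids[i];

		if (id >= 0 && id < NR_SYMS)
			s->hist[id]++;
	}
	return NULL;
}

/* Process every stream in its own thread, then fold the private
 * histograms into total[] in one short serial pass. Returns 0 on success. */
static int report_streams(struct stream *streams, int nr_streams,
			  long total[NR_SYMS])
{
	int started = 0, i, k;

	assert(streams != NULL || nr_streams == 0);
	for (i = 0; i < nr_streams; i++) {
		if (pthread_create(&streams[i].tid, NULL, process_stream,
				   &streams[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(streams[i].tid, NULL);
	if (started != nr_streams)
		return -1;

	memset(total, 0, sizeof(long) * NR_SYMS);
	for (i = 0; i < nr_streams; i++)
		for (k = 0; k < NR_SYMS; k++)
			total[k] += streams[i].hist[k];
	return 0;
}
```

No locks are needed while the streams are processed; the only shared step is the final reduction over NR_SYMS counters per stream.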


BTW., random annoyance bug report: for me 'perf archive' is spewing a ton of these messages:

  $ perf archive
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  ...
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported

  Now please run:

  $ tar xvf perf.data.tar.bz2 -C ~/.debug

  wherever you need to run 'perf report' on.

  $ perf version
  perf version 4.19.rc2.gcb48b6

That message is repeated 7,200 times (!) and immediately nuked my terminal history. :-/

Even if we want to emit that warning (we really shouldn't unless it's important for the user to 
know), there's no reason to print thousands of messages to stderr.

Thanks,

	Ingo
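
If such a warning is worth keeping at all, one way to avoid flooding stderr is to print it once and report the number of repeats at the end. A minimal sketch; struct warn_state, warn_once() and warn_summary() are hypothetical helpers, not perf's existing debug API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Per-message state: printed once, then only counted. */
struct warn_state {
	bool	printed;
	long	suppressed;
};

/* Print the message the first time; afterwards just count it.
 * Returns true if the message was actually printed. */
static bool warn_once(struct warn_state *st, FILE *out, const char *fmt, ...)
{
	va_list ap;

	assert(st && out && fmt);
	if (st->printed) {
		st->suppressed++;
		return false;
	}
	st->printed = true;
	va_start(ap, fmt);
	vfprintf(out, fmt, ap);
	va_end(ap);
	return true;
}

/* Emit a single summary line at the end instead of the repeats. */
static void warn_summary(const struct warn_state *st, FILE *out,
			 const char *what)
{
	if (st->suppressed)
		fprintf(out, "%s: %ld similar messages suppressed\n",
			what, st->suppressed);
}
```

With 7,200 hits this prints the warning once plus one summary line, leaving the terminal history intact.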

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  9:40       ` Ingo Molnar
@ 2018-09-14 11:15         ` Peter Zijlstra
  2018-09-14 11:47           ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Peter Zijlstra @ 2018-09-14 11:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Olsa, Namhyung Kim, Alexey Budankov, Jiri Olsa,
	Arnaldo Carvalho de Melo, lkml, Alexander Shishkin, Andi Kleen,
	kernel-team

On Fri, Sep 14, 2018 at 11:40:22AM +0200, Ingo Molnar wrote:
> In fact keeping the files separate has scalability advantages for 'perf report' and similar 
> parsing tools: they could read all the streams in a per-CPU fashion already, from the very 
> beginning.

Also, writing to different files from different CPUs is good for record:
less contention on the inode state (which includes the pagecache).
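
The per-file approach can be made concrete: with one file per CPU, each writer thread appends to its own inode, so the write path shares no file state between threads. A minimal sketch that writes dummy data; the <dir>/data.<cpu> naming, struct cpu_writer, writer_fn() and write_per_cpu() are all hypothetical. Real record threads would copy data out of their mmaps (and, in the one-thread-per-map mode, be pinned to that map's CPU) rather than write a fixed pattern.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One writer thread per CPU, each with its own output file. */
struct cpu_writer {
	const char	*dir;
	int		 cpu;
	size_t		 bytes;		/* amount of dummy data to write */
	long		 written;	/* result, -1 on error */
	pthread_t	 tid;
};

static void *writer_fn(void *arg)
{
	struct cpu_writer *w = arg;
	char path[4096], buf[4096];
	size_t left = w->bytes;
	int fd;

	/* Hypothetical naming: <dir>/data.<cpu>. Separate files mean the
	 * threads never share inode state while writing. */
	snprintf(path, sizeof(path), "%s/data.%d", w->dir, w->cpu);
	w->written = -1;
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return NULL;

	memset(buf, 'a' + w->cpu % 26, sizeof(buf));
	w->written = 0;
	while (left) {
		size_t n = left < sizeof(buf) ? left : sizeof(buf);
		ssize_t ret = write(fd, buf, n);

		if (ret < 0) {
			w->written = -1;
			break;
		}
		w->written += ret;
		left -= (size_t)ret;
	}
	close(fd);
	return NULL;
}

/* Start one writer per CPU; return the total bytes written or -1. */
static long write_per_cpu(const char *dir, int nr_cpus, size_t bytes)
{
	struct cpu_writer *w;
	long total = 0;
	int started = 0, i;

	assert(dir && nr_cpus > 0);
	w = calloc(nr_cpus, sizeof(*w));
	if (!w)
		return -1;

	for (i = 0; i < nr_cpus; i++) {
		w[i].dir = dir;
		w[i].cpu = i;
		w[i].bytes = bytes;
		if (pthread_create(&w[i].tid, NULL, writer_fn, &w[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++) {
		pthread_join(w[i].tid, NULL);
		if (w[i].written < 0 || total < 0)
			total = -1;
		else
			total += w[i].written;
	}
	if (started != nr_cpus)
		total = -1;
	free(w);
	return total;
}
```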

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14 11:15         ` Peter Zijlstra
@ 2018-09-14 11:47           ` Jiri Olsa
  2018-09-14 12:01             ` Peter Zijlstra
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-14 11:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Namhyung Kim, Alexey Budankov, Jiri Olsa,
	Arnaldo Carvalho de Melo, lkml, Alexander Shishkin, Andi Kleen,
	kernel-team

On Fri, Sep 14, 2018 at 01:15:28PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 14, 2018 at 11:40:22AM +0200, Ingo Molnar wrote:
> > In fact keeping the files separate has scalability advantages for 'perf report' and similar 
> > parsing tools: they could read all the streams in a per-CPU fashion already, from the very 
> > beginning.
> 
> Also, writing to different files from different CPUs is good for record:
> less contention on the inode state (which includes the pagecache).

maybe I should explain a little bit more on this

we write to different (per-cpu) files during the record,
and at the end of the session, we take them and store
them inside perf.data

I don't mind having the directory instead, however we are
talking about a small amount of code allowing us to keep the
data in a single file.. we can always leave it to a special
command line option ;-)

jirka
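
The merge step described above boils down to copying every per-CPU file into the single output once more and remembering where each stream starts. A minimal sketch; struct stream_index and merge_streams() are hypothetical, and the on-disk index the series actually writes into perf.data may look different. The copy loop makes the end-of-session cost visible: it is proportional to the total recorded size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical index entry: where stream i ended up in the single file. */
struct stream_index {
	uint64_t offset;
	uint64_t size;
};

/*
 * Append each per-CPU file to 'out' and record its offset and size in
 * idx[]. Every recorded byte is read and written once more, so the cost
 * grows linearly with the amount of recorded data. Returns 0 or -1.
 */
static int merge_streams(FILE *out, const char *const *paths, int nr,
			 struct stream_index *idx)
{
	char buf[65536];
	int i;

	assert(out && (paths || nr == 0));
	for (i = 0; i < nr; i++) {
		FILE *in = fopen(paths[i], "rb");
		long start = ftell(out);
		size_t n;
		int err;

		if (!in)
			return -1;
		if (start < 0) {
			fclose(in);
			return -1;
		}
		idx[i].offset = (uint64_t)start;
		idx[i].size = 0;
		while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
			if (fwrite(buf, 1, n, out) != n)
				break;	/* short write, reported via ferror() */
			idx[i].size += n;
		}
		err = ferror(in) || ferror(out);
		fclose(in);
		if (err)
			return -1;
	}
	return 0;
}
```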

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14 11:47           ` Jiri Olsa
@ 2018-09-14 12:01             ` Peter Zijlstra
  2018-09-14 12:13               ` Ingo Molnar
  0 siblings, 1 reply; 101+ messages in thread
From: Peter Zijlstra @ 2018-09-14 12:01 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ingo Molnar, Namhyung Kim, Alexey Budankov, Jiri Olsa,
	Arnaldo Carvalho de Melo, lkml, Alexander Shishkin, Andi Kleen,
	kernel-team

On Fri, Sep 14, 2018 at 01:47:25PM +0200, Jiri Olsa wrote:
> On Fri, Sep 14, 2018 at 01:15:28PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 14, 2018 at 11:40:22AM +0200, Ingo Molnar wrote:
> > > In fact keeping the files separate has scalability advantages for 'perf report' and similar 
> > > parsing tools: they could read all the streams in a per-CPU fashion already, from the very 
> > > beginning.
> > 
> > Also, writing to different files from different CPUs is good for record:
> > less contention on the inode state (which includes the pagecache).
> 
> maybe I should explain a little bit more on this
> 
> we write to different (per-cpu) files during the record,
> and at the end of the session, we take them and store
> them inside perf.data

How long does it take to combine that? If we generated a lot of data,
that could take a fair amount of time, no?

I feel that record should not mysteriously 'hang' when it is done. It
used to do that at some point because of that stupid .debug crap, but
acme fixed that I think.

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14 12:01             ` Peter Zijlstra
@ 2018-09-14 12:13               ` Ingo Molnar
  2018-09-14 12:19                 ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Ingo Molnar @ 2018-09-14 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Namhyung Kim, Alexey Budankov, Jiri Olsa,
	Arnaldo Carvalho de Melo, lkml, Alexander Shishkin, Andi Kleen,
	kernel-team


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Sep 14, 2018 at 01:47:25PM +0200, Jiri Olsa wrote:
> > On Fri, Sep 14, 2018 at 01:15:28PM +0200, Peter Zijlstra wrote:
> > > On Fri, Sep 14, 2018 at 11:40:22AM +0200, Ingo Molnar wrote:
> > > > In fact keeping the files separate has scalability advantages for 'perf report' and similar 
> > > > parsing tools: they could read all the streams in a per-CPU fashion already, from the very 
> > > > beginning.
> > > 
> > > Also, writing to different files from different CPUs is good for record:
> > > less contention on the inode state (which includes the pagecache).
> > 
> > maybe I should explain a little bit more on this
> > 
> > we write to different (per-cpu) files during the record,
> > and at the end of the session, we take them and store
> > them inside perf.data
> 
> How long does it take to combine that? If we generated a lot of data,
> that could take a fair amount of time, no?
> 
> I feel that record should not mysteriously 'hang' when it is done. It
> used to do that at some point because of that stupid .debug crap, but
> acme fixed that I think.

Agreed - plus at the report stage it would be advantageous to be able to *read* per-cpu files 
as well.

If we do things smartly then report will create similar NUMA affinity to what the record session 
used.

Thanks,

	Ingo

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14 12:13               ` Ingo Molnar
@ 2018-09-14 12:19                 ` Jiri Olsa
  2018-09-14 12:45                   ` Ingo Molnar
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-14 12:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Namhyung Kim, Alexey Budankov, Jiri Olsa,
	Arnaldo Carvalho de Melo, lkml, Alexander Shishkin, Andi Kleen,
	kernel-team

On Fri, Sep 14, 2018 at 02:13:07PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Sep 14, 2018 at 01:47:25PM +0200, Jiri Olsa wrote:
> > > On Fri, Sep 14, 2018 at 01:15:28PM +0200, Peter Zijlstra wrote:
> > > > On Fri, Sep 14, 2018 at 11:40:22AM +0200, Ingo Molnar wrote:
> > > > > In fact keeping the files separate has scalability advantages for 'perf report' and similar 
> > > > > parsing tools: they could read all the streams in a per-CPU fashion already, from the very 
> > > > > beginning.
> > > > 
> > > > Also, writing to different files from different CPUs is good for record:
> > > > less contention on the inode state (which includes the pagecache).
> > > 
> > > maybe I should explain a little bit more on this
> > > 
> > > we write to different (per-cpu) files during the record,
> > > and at the end of the session, we take them and store
> > > them inside perf.data
> > 
> > How long does it take to combine that? If we generated a lot of data,
> > that could take a fair amount of time, no?

yep.. fair amount ;-) wasn't that bad in my tests,
but could be evil on some really big server

> > I feel that record should not mysteriously 'hang' when it is done. It
> > used to do that at some point because of that stupid .debug crap, but
> > acme fixed that I think.
> 
> Agreed - plus at the report stage it would be advantageous to be able to *read* per-cpu files 
> as well.
> 
> If we do things smartly then report will create similar NUMA affinity to what the record session 
> used.

ok, separate files it is

jirka

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14 12:19                 ` Jiri Olsa
@ 2018-09-14 12:45                   ` Ingo Molnar
  0 siblings, 0 replies; 101+ messages in thread
From: Ingo Molnar @ 2018-09-14 12:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Namhyung Kim, Alexey Budankov, Jiri Olsa,
	Arnaldo Carvalho de Melo, lkml, Alexander Shishkin, Andi Kleen,
	kernel-team


* Jiri Olsa <jolsa@redhat.com> wrote:

> On Fri, Sep 14, 2018 at 02:13:07PM +0200, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Fri, Sep 14, 2018 at 01:47:25PM +0200, Jiri Olsa wrote:
> > > > On Fri, Sep 14, 2018 at 01:15:28PM +0200, Peter Zijlstra wrote:
> > > > > On Fri, Sep 14, 2018 at 11:40:22AM +0200, Ingo Molnar wrote:
> > > > > > In fact keeping the files separate has scalability advantages for 'perf report' and similar 
> > > > > > parsing tools: they could read all the streams in a per-CPU fashion already, from the very 
> > > > > > beginning.
> > > > > 
> > > > > Also, writing to different files from different CPUs is good for record:
> > > > > less contention on the inode state (which includes the pagecache).
> > > > 
> > > > maybe I should explain a little bit more on this
> > > > 
> > > > we write to different (per-cpu) files during the record,
> > > > and at the end of the session, we take them and store
> > > > them inside perf.data
> > > 
> > > How long does it take to combine that? If we generated a lot of data,
> > > that could take a fair amount of time, no?
> 
> yep.. fair amount ;-) wasn't that bad in my tests,
> but could be evil on some really big server

Also, adding any sort of 'global' processing to the end of a session sucks as a workflow 
principle: perf record should ideally be as lightweight as ftrace. It should trace and that's 
it - the processing should be done at the report phase. Shuffling hundreds of megs or gigs 
around at the end of the session is really bad.

> > Agreed - plus at the report stage it would be advantageous to be able to *read* per-cpu files 
> > as well.
> > 
> > If we do things smartly then report will create similar NUMA affinity to what the record session 
> > used.
> 
> ok, separate files it is

Thanks!!

	Ingo

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
                   ` (48 preceding siblings ...)
  2018-09-13 16:10 ` [RFCv2 00/48] perf tools: Add threads to record command Alexey Budankov
@ 2018-09-14 17:02 ` Andi Kleen
  49 siblings, 0 replies; 101+ messages in thread
From: Andi Kleen @ 2018-09-14 17:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

> In the big picture this patchset adds perf record --threads
> option that allows creating threads in the following modes:
> 
> 1) single thread mode (current)
> 
>   $ perf record ...
>   $ perf record --threads=1 ...
> 
>   - all maps are read/stored under process thread
> 
> 2) mode with specific (X) number of threads
> 
>   $ perf record --threads=X ...
> 
>   - maps are spread equally among threads
> 
> 3) mode that creates a thread for every monitored memory map
> 
>   $ perf record --threads ...
> 
>   - which in perf record is equal to the number of CPUs, and
>     it pins each thread to its map's CPU.

We need some way to flush data on a different thread than
the one processing the perf data, so essentially more threads than
CPUs. Something like this is needed for PT where even a single CPU can
generate so much data that it may take longer and longer to flush. You end up
with long gaps in the PT collection then because PT stays disabled
while the flushing happens on the same thread. If it was done
in the background in another thread these gaps would be much smaller.


Of course if Linux would finally get real file system AIO that wouldn't
be needed ...


-Andi
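
The suggestion above decouples draining the ring buffer from writing its contents out. A minimal sketch of that split, assuming one bounded queue between a reader and a background flusher thread built on pthread mutexes and condition variables; struct flush_queue, the fq_*() helpers and flusher() are hypothetical names, chunks are represented only by their length, and the actual write() is left as a comment. The reader stalls only when the flusher falls a full queue behind; if the disk cannot keep up on average, the queue moves the stall rather than removing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QLEN 16	/* hypothetical queue depth */

/* Bounded FIFO handing chunks from the reader to a background flusher.
 * Chunks are represented only by their length in this sketch. */
struct flush_queue {
	size_t		chunks[QLEN];
	int		head, tail, count;
	bool		done;
	pthread_mutex_t	lock;
	pthread_cond_t	not_empty, not_full;
	size_t		flushed;	/* touched only by the flusher thread */
};

static void fq_init(struct flush_queue *q)
{
	assert(q != NULL);
	memset(q, 0, sizeof(*q));
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_empty, NULL);
	pthread_cond_init(&q->not_full, NULL);
}

static void fq_destroy(struct flush_queue *q)
{
	pthread_cond_destroy(&q->not_full);
	pthread_cond_destroy(&q->not_empty);
	pthread_mutex_destroy(&q->lock);
}

/* Reader side: hand off a chunk and go back to draining the ring buffer.
 * Blocks only when the flusher is a full queue (QLEN chunks) behind. */
static void fq_push(struct flush_queue *q, size_t len)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == QLEN)
		pthread_cond_wait(&q->not_full, &q->lock);
	q->chunks[q->tail] = len;
	q->tail = (q->tail + 1) % QLEN;
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Reader side: no more chunks will come. */
static void fq_finish(struct flush_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->done = true;
	pthread_cond_broadcast(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Flusher thread: drain until the reader is done and the queue is empty. */
static void *flusher(void *arg)
{
	struct flush_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		size_t len;

		while (q->count == 0 && !q->done)
			pthread_cond_wait(&q->not_empty, &q->lock);
		if (q->count == 0)
			break;		/* done and fully drained */
		len = q->chunks[q->head];
		q->head = (q->head + 1) % QLEN;
		q->count--;
		pthread_cond_signal(&q->not_full);

		/* The slow part (a real write() of the chunk) runs unlocked,
		 * so the reader can keep queueing meanwhile. */
		pthread_mutex_unlock(&q->lock);
		q->flushed += len;
		pthread_mutex_lock(&q->lock);
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}
```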

* Re: [PATCH 29/48] perf callchain: Maintain libunwind's address space in map_groups
  2018-09-13 12:54 ` [PATCH 29/48] perf callchain: Maintain libunwind's address space in map_groups Jiri Olsa
@ 2018-09-14 18:15   ` Arnaldo Carvalho de Melo
  2018-09-14 19:00     ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-09-14 18:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

Em Thu, Sep 13, 2018 at 02:54:31PM +0200, Jiri Olsa escreveu:
> Currently the address_space is kept in the thread struct, but it's more
> appropriate to keep it in map_groups, as those are maintained across
> execs with timestamps.  Also we should not flush the address space
> after exec since it still can be accessed when used with an indexed
> data file.
> 
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Link: http://lkml.kernel.org/n/tip-hjryh6x2yfnrz8g0djhez24z@git.kernel.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/util/map.h                    |  5 ++++-
>  tools/perf/util/thread.h                 |  1 -
>  tools/perf/util/unwind-libunwind-local.c | 28 ++++++++++++++----------
>  tools/perf/util/unwind-libunwind.c       |  9 ++++----
>  tools/perf/util/unwind.h                 |  7 +++---
>  5 files changed, 29 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
> index 02c6f6962eb1..b1efe57b8563 100644
> --- a/tools/perf/util/map.h
> +++ b/tools/perf/util/map.h
> @@ -65,10 +65,13 @@ struct maps {
>  
>  struct map_groups {
>  	struct maps	 maps;
> -	struct machine	 *machine;
> +	struct machine	*machine;
>  	refcount_t	 refcnt;
>  	u64		 timestamp;
>  	struct list_head list;

Hey, avoid these distractions, this doesn't change anything and besides
having the * aligned with the names of non-pointers is what is common
practice, see for instance 'struct task_struct', 'struct inode', to name
just two widely used structs in the kernel source :-)

I'll get these two fixed up, i.e. remove the above hunk, align the one
below :-)

- Arnaldo

> +#ifdef HAVE_LIBUNWIND_SUPPORT
> +	void		*addr_space;
> +#endif
>  };
>  
>  struct map_groups *map_groups__new(struct machine *machine);
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 86186a0773a0..637775f622b3 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -40,7 +40,6 @@ struct thread {
>  	struct thread_stack	*ts;
>  	struct nsinfo		*nsinfo;
>  #ifdef HAVE_LIBUNWIND_SUPPORT
> -	void				*addr_space;
>  	struct unwind_libunwind_ops	*unwind_libunwind_ops;
>  #endif
>  };
> diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
> index da6f39315b47..f7c921f87bcf 100644
> --- a/tools/perf/util/unwind-libunwind-local.c
> +++ b/tools/perf/util/unwind-libunwind-local.c
> @@ -617,32 +617,35 @@ static unw_accessors_t accessors = {
>  	.get_proc_name		= get_proc_name,
>  };
>  
> -static int _unwind__prepare_access(struct thread *thread)
> +static int _unwind__prepare_access(struct map_groups *mg)
>  {
>  	if (!dwarf_callchain_users)
>  		return 0;
> -	thread->addr_space = unw_create_addr_space(&accessors, 0);
> -	if (!thread->addr_space) {
> +
> +	mg->addr_space = unw_create_addr_space(&accessors, 0);
> +	if (!mg->addr_space) {
>  		pr_err("unwind: Can't create unwind address space.\n");
>  		return -ENOMEM;
>  	}
>  
> -	unw_set_caching_policy(thread->addr_space, UNW_CACHE_GLOBAL);
> +	unw_set_caching_policy(mg->addr_space, UNW_CACHE_GLOBAL);
>  	return 0;
>  }
>  
> -static void _unwind__flush_access(struct thread *thread)
> +static void _unwind__flush_access(struct map_groups *mg)
>  {
>  	if (!dwarf_callchain_users)
>  		return;
> -	unw_flush_cache(thread->addr_space, 0, 0);
> +
> +	unw_flush_cache(mg->addr_space, 0, 0);
>  }
>  
> -static void _unwind__finish_access(struct thread *thread)
> +static void _unwind__finish_access(struct map_groups *mg)
>  {
>  	if (!dwarf_callchain_users)
>  		return;
> -	unw_destroy_addr_space(thread->addr_space);
> +
> +	unw_destroy_addr_space(mg->addr_space);
>  }
>  
>  static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
> @@ -650,7 +653,6 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
>  {
>  	u64 val;
>  	unw_word_t ips[max_stack];
> -	unw_addr_space_t addr_space;
>  	unw_cursor_t c;
>  	int ret, i = 0;
>  
> @@ -666,13 +668,15 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
>  	 * unwind itself.
>  	 */
>  	if (max_stack - 1 > 0) {
> +		struct map_groups *mg;
> +
>  		WARN_ONCE(!ui->thread, "WARNING: ui->thread is NULL");
> -		addr_space = ui->thread->addr_space;
>  
> -		if (addr_space == NULL)
> +		mg = thread__get_map_groups(ui->thread, ui->sample->time);
> +		if (mg == NULL || mg->addr_space == NULL)
>  			return -1;
>  
> -		ret = unw_init_remote(&c, addr_space, ui);
> +		ret = unw_init_remote(&c, mg->addr_space, ui);
>  		if (ret)
>  			display_error(ret);
>  
> diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
> index b029a5e9ae49..ce8408e460f2 100644
> --- a/tools/perf/util/unwind-libunwind.c
> +++ b/tools/perf/util/unwind-libunwind.c
> @@ -18,12 +18,13 @@ static void unwind__register_ops(struct thread *thread,
>  int unwind__prepare_access(struct thread *thread, struct map *map,
>  			   bool *initialized)
>  {
> +	struct map_groups *mg = thread->mg;
>  	const char *arch;
>  	enum dso_type dso_type;
>  	struct unwind_libunwind_ops *ops = local_unwind_libunwind_ops;
>  	int err;
>  
> -	if (thread->addr_space) {
> +	if (mg->addr_space) {
>  		pr_debug("unwind: thread map already set, dso=%s\n",
>  			 map->dso->name);
>  		if (initialized)
> @@ -56,7 +57,7 @@ int unwind__prepare_access(struct thread *thread, struct map *map,
>  out_register:
>  	unwind__register_ops(thread, ops);
>  
> -	err = thread->unwind_libunwind_ops->prepare_access(thread);
> +	err = thread->unwind_libunwind_ops->prepare_access(thread->mg);
>  	if (initialized)
>  		*initialized = err ? false : true;
>  	return err;
> @@ -65,13 +66,13 @@ int unwind__prepare_access(struct thread *thread, struct map *map,
>  void unwind__flush_access(struct thread *thread)
>  {
>  	if (thread->unwind_libunwind_ops)
> -		thread->unwind_libunwind_ops->flush_access(thread);
> +		thread->unwind_libunwind_ops->flush_access(thread->mg);
>  }
>  
>  void unwind__finish_access(struct thread *thread)
>  {
>  	if (thread->unwind_libunwind_ops)
> -		thread->unwind_libunwind_ops->finish_access(thread);
> +		thread->unwind_libunwind_ops->finish_access(thread->mg);
>  }
>  
>  int unwind__get_entries(unwind_entry_cb_t cb, void *arg,
> diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h
> index 8a44a1569a21..0f18a0858904 100644
> --- a/tools/perf/util/unwind.h
> +++ b/tools/perf/util/unwind.h
> @@ -9,6 +9,7 @@ struct map;
>  struct perf_sample;
>  struct symbol;
>  struct thread;
> +struct map_groups;
>  
>  struct unwind_entry {
>  	struct map	*map;
> @@ -19,9 +20,9 @@ struct unwind_entry {
>  typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry, void *arg);
>  
>  struct unwind_libunwind_ops {
> -	int (*prepare_access)(struct thread *thread);
> -	void (*flush_access)(struct thread *thread);
> -	void (*finish_access)(struct thread *thread);
> +	int (*prepare_access)(struct map_groups *mg);
> +	void (*flush_access)(struct map_groups *mg);
> +	void (*finish_access)(struct map_groups *mg);
>  	int (*get_entries)(unwind_entry_cb_t cb, void *arg,
>  			   struct thread *thread,
>  			   struct perf_sample *data, int max_stack);
> -- 
> 2.17.1
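
The key change in this patch is that the unwinder no longer uses one per-thread address space: it looks up the map_groups in effect at the sample's time (thread__get_map_groups(ui->thread, ui->sample->time)) and uses that group's addr_space. A minimal sketch of such a time-based lookup, assuming a thread's groups are kept on a list ordered newest first; struct mg_sketch and mg_lookup() are hypothetical simplifications, not perf's actual structures. It also shows why the old address space must not be flushed at exec: a sample processed later from another stream can carry an earlier timestamp and still needs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct map_groups: only what the lookup needs. */
struct mg_sketch {
	uint64_t	  timestamp;	/* when this group became current (e.g. exec) */
	void		 *addr_space;	/* per-group unwind address space */
	struct mg_sketch *next;		/* previous (older) group */
};

/*
 * Return the group in effect at 'time': the newest one whose timestamp
 * is not after the sample, or NULL if the sample predates all of them.
 * Older groups stay on the list (and keep their addr_space) because a
 * sample from another stream may still refer to them.
 */
static struct mg_sketch *mg_lookup(struct mg_sketch *newest, uint64_t time)
{
	struct mg_sketch *mg;

	for (mg = newest; mg; mg = mg->next) {
		/* The list must be ordered newest first. */
		assert(!mg->next || mg->next->timestamp <= mg->timestamp);
		if (mg->timestamp <= time)
			return mg;
	}
	return NULL;
}
```
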

* Re: [PATCH 29/48] perf callchain: Maintain libunwind's address space in map_groups
  2018-09-14 18:15   ` Arnaldo Carvalho de Melo
@ 2018-09-14 19:00     ` Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-14 19:00 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Frederic Weisbecker, lkml, Ingo Molnar, Namhyung Kim,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov

On Fri, Sep 14, 2018 at 03:15:47PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Sep 13, 2018 at 02:54:31PM +0200, Jiri Olsa escreveu:
> > Currently the address_space is kept in the thread struct, but it's more
> > appropriate to keep it in map_groups, as those are maintained across
> > execs with timestamps.  Also we should not flush the address space
> > after exec since it still can be accessed when used with an indexed
> > data file.
> > 
> > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > Link: http://lkml.kernel.org/n/tip-hjryh6x2yfnrz8g0djhez24z@git.kernel.org
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/perf/util/map.h                    |  5 ++++-
> >  tools/perf/util/thread.h                 |  1 -
> >  tools/perf/util/unwind-libunwind-local.c | 28 ++++++++++++++----------
> >  tools/perf/util/unwind-libunwind.c       |  9 ++++----
> >  tools/perf/util/unwind.h                 |  7 +++---
> >  5 files changed, 29 insertions(+), 21 deletions(-)
> > 
> > diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
> > index 02c6f6962eb1..b1efe57b8563 100644
> > --- a/tools/perf/util/map.h
> > +++ b/tools/perf/util/map.h
> > @@ -65,10 +65,13 @@ struct maps {
> >  
> >  struct map_groups {
> >  	struct maps	 maps;
> > -	struct machine	 *machine;
> > +	struct machine	*machine;
> >  	refcount_t	 refcnt;
> >  	u64		 timestamp;
> >  	struct list_head list;
> 
> Hey, avoid these distractions, this doesn't change anything and besides

it bothers me.. ;-)

jirka

* Re: [PATCH 37/48] perf record: Introduce struct record_thread
  2018-09-13 12:54 ` [PATCH 37/48] perf record: Introduce struct record_thread Jiri Olsa
@ 2018-09-17 11:26   ` Namhyung Kim
  2018-09-23 19:31     ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Namhyung Kim @ 2018-09-17 11:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

Hi Jiri,

On Thu, Sep 13, 2018 at 02:54:39PM +0200, Jiri Olsa wrote:
> Adding struct record_thread to carry the single thread's maps.
> 
> Link: http://lkml.kernel.org/n/tip-dsyi97xdc7ullvsisqmha0ca@git.kernel.org
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/builtin-record.c | 179 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 179 insertions(+)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 1b01cb4d06b8..5c6b56f164a9 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -65,6 +65,15 @@ struct switch_output {
>  	bool		 set;
>  };
>  
> +struct record_thread {
> +	struct perf_mmap	**mmap;
> +	int			  mmap_nr;
> +	struct perf_mmap	**ovw_mmap;
> +	int			  ovw_mmap_nr;
> +	struct fdarray		  pollfd;
> +	struct record		 *rec;
> +};
> +
>  struct record {
>  	struct perf_tool	tool;
>  	struct record_opts	opts;
> @@ -83,6 +92,8 @@ struct record {
>  	bool			timestamp_boundary;
>  	struct switch_output	switch_output;
>  	unsigned long long	samples;
> +	struct record_thread	*threads;
> +	int			threads_cnt;
>  };
>  
>  static volatile int auxtrace_record__snapshot_started;
> @@ -967,6 +978,166 @@ static int record__synthesize(struct record *rec, bool tail)
>  	return err;
>  }
>  
> +static void
> +record_thread__clean(struct record_thread *th)
> +{
> +	free(th->mmap);
> +	free(th->ovw_mmap);
> +}
> +
> +static void
> +record__threads_clean(struct record *rec)
> +{
> +	struct record_thread *threads = rec->threads;
> +	int i;
> +
> +	if (threads) {
> +		for (i = 0; i < rec->threads_cnt; i++)
> +			record_thread__clean(threads + i);
> +	}
> +}
> +
> +static void record_thread__init(struct record_thread *th, struct record *rec)
> +{
> +	memset(th, 0, sizeof(*th));
> +	fdarray__init(&th->pollfd, 64);
> +	th->rec = rec;
> +}
> +
> +static int
> +record_thread__mmap(struct record_thread *th, int nr, int nr_ovw)
> +{
> +	struct perf_mmap **mmap;
> +
> +	mmap = zalloc(sizeof(*mmap) * nr);
> +	if (!mmap)
> +		return -ENOMEM;
> +
> +	th->mmap    = mmap;
> +	th->mmap_nr = nr;
> +
> +	if (nr_ovw) {
> +		mmap = zalloc(sizeof(*mmap) * nr_ovw);
> +		if (!mmap)
> +			return -ENOMEM;
> +
> +		th->ovw_mmap    = mmap;
> +		th->ovw_mmap_nr = nr;

s/nr;/nr_ovw;/ ?


> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +record__threads_assign(struct record *rec)
> +{
> +	struct record_thread *threads = rec->threads;
> +	struct record_thread *thread0 = threads;
> +	struct perf_evlist *evlist = rec->evlist;
> +	int i, j, nr, nr0, nr_ovw, nr_trk;
> +	int ret = -ENOMEM;
> +
> +	nr     = evlist->mmap           ? evlist->nr_mmaps : 0;
> +	nr_trk = evlist->track_mmap     ? evlist->nr_mmaps : 0;
> +	nr_ovw = evlist->overwrite_mmap ? evlist->nr_mmaps : 0;
> +
> +	nr0  = nr_trk;
> +	nr0 += nr;
> +
> +	if (record_thread__mmap(thread0, nr0, nr_ovw))
> +		goto out_error;
> +
> +	for (i = 0; i < nr_ovw; i++)
> +		thread0->ovw_mmap[i] = &evlist->overwrite_mmap[i];
> +
> +	for (i = 0; i < nr_trk; i++)
> +		thread0->mmap[i] = &evlist->track_mmap[i];
> +
> +	for (j = 0; i < nr0 && j < nr; i++, j++)
> +		thread0->mmap[i] = &evlist->mmap[j];

I'm not sure this will work well with the overwrite mmaps.

Thanks,
Namhyung


> +
> +	ret = 0;
> +
> +out_error:
> +	return ret;
> +}
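With Namhyung's s/nr;/nr_ovw;/ fix applied, record_thread__mmap() could look roughly like this. This is a self-contained sketch, not the patch itself. The struct definitions are simplified stand-ins for perf's types, and calloc() replaces perf's zalloc(). As a design choice of the sketch, it also rolls back the first array if the second allocation fails, so a failed call leaves the thread unchanged.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-ins for perf's types (assumption, not the real headers). */
struct perf_mmap { void *base; };

struct record_thread {
	struct perf_mmap **mmap;
	int                mmap_nr;
	struct perf_mmap **ovw_mmap;
	int                ovw_mmap_nr;
};

/*
 * Allocate the per-thread pointer arrays: nr regular maps and
 * nr_ovw overwrite maps. The overwrite count must be taken from
 * nr_ovw, not nr (the bug pointed out in review).
 */
static int record_thread__mmap(struct record_thread *th, int nr, int nr_ovw)
{
	struct perf_mmap **mmap;

	mmap = calloc(nr, sizeof(*mmap));
	if (!mmap)
		return -ENOMEM;

	th->mmap    = mmap;
	th->mmap_nr = nr;

	if (nr_ovw) {
		mmap = calloc(nr_ovw, sizeof(*mmap));
		if (!mmap) {
			/* Roll back so the thread is left without half-set state. */
			free(th->mmap);
			th->mmap    = NULL;
			th->mmap_nr = 0;
			return -ENOMEM;
		}

		th->ovw_mmap    = mmap;
		th->ovw_mmap_nr = nr_ovw;	/* was: nr */
	}

	return 0;
}
```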

* Re: [PATCH 38/48] perf record: Read record thread's mmaps
  2018-09-13 12:54 ` [PATCH 38/48] perf record: Read record thread's mmaps Jiri Olsa
@ 2018-09-17 11:28   ` Namhyung Kim
  2018-09-23 19:35     ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Namhyung Kim @ 2018-09-17 11:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

On Thu, Sep 13, 2018 at 02:54:40PM +0200, Jiri Olsa wrote:
> Switch the maps source from evlist into thread data.
> 
> Link: http://lkml.kernel.org/n/tip-2r6hn6shl185j66b4vl1k4pr@git.kernel.org
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/builtin-record.c | 37 ++++++++++++++++++++-----------------
>  1 file changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 5c6b56f164a9..d6fef646b67f 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -96,6 +96,8 @@ struct record {
>  	int			threads_cnt;
>  };
>  
> +static __thread struct record_thread *thread;
> +
>  static volatile int auxtrace_record__snapshot_started;
>  static DEFINE_TRIGGER(auxtrace_snapshot_trigger);
>  static DEFINE_TRIGGER(switch_output_trigger);
> @@ -561,24 +563,24 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  				    bool overwrite)
>  {
>  	u64 bytes_written = rec->bytes_written;
> -	int i;
> +	int i, nr;
>  	int rc = 0;
> -	struct perf_mmap *maps;
> +	struct perf_mmap **maps;
>  
>  	if (!evlist)
>  		return 0;
>  
> -	maps = overwrite ? evlist->overwrite_mmap : evlist->mmap;
> +	maps = overwrite ? thread->ovw_mmap : thread->mmap;
>  	if (!maps)
>  		return 0;
>  
>  	if (overwrite && evlist->bkw_mmap_state != BKW_MMAP_DATA_PENDING)
>  		return 0;
>  
> -	for (i = 0; i < evlist->nr_mmaps; i++) {
> -		struct perf_mmap *map = &maps[i];
> -		struct perf_mmap *track_map =  evlist->track_mmap ?
> -					      &evlist->track_mmap[i] : NULL;
> +	nr = overwrite ? thread->ovw_mmap_nr : thread->mmap_nr;
> +
> +	for (i = 0; i < nr; i++) {
> +		struct perf_mmap *map = maps[i];
>  
>  		if (map->base) {
>  			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
> @@ -592,21 +594,20 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
>  			rc = -1;
>  			goto out;
>  		}
> -
> -		if (track_map && track_map->base) {
> -			if (perf_mmap__push(track_map, rec, record__pushfn) != 0) {
> -				rc = -1;
> -				goto out;
> -			}
> -		}
>  	}
>  
>  	/*
>  	 * Mark the round finished in case we wrote
>  	 * at least one event.
>  	 */
> -	if (bytes_written != rec->bytes_written)
> -		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
> +	if (bytes_written != rec->bytes_written) {
> +		/*
> +		 * All maps of the threads point to a single file,
> +		 * so we can just pick first one.
> +		 */
> +		rc = record__write(rec, thread->mmap[0], &finished_round_event,

Shouldn't it be maps[0]?

Thanks,
Namhyung


> +				   sizeof(finished_round_event));
> +	}
>  
>  	if (overwrite)
>  		perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
> @@ -1222,6 +1223,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  			goto out_child;
>  	}
>  
> +	thread = &rec->threads[0];
> +
>  	err = bpf__apply_obj_config();
>  	if (err) {
>  		char errbuf[BUFSIZ];
> @@ -1415,7 +1418,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  		if (hits == rec->samples) {
>  			if (done || draining)
>  				break;
> -			err = perf_evlist__poll(rec->evlist, -1);
> +			err = fdarray__poll(&thread->pollfd, -1);
>  			/*
>  			 * Propagate error, only if there's any. Ignore positive
>  			 * number of returned events and interrupt error.
> -- 
> 2.17.1
> 
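Namhyung's question is about which map's output file receives the FINISHED_ROUND record. Here is a minimal sketch of that selection, assuming (as the patch comment states) that all maps of one thread point to a single file. The struct layouts and round_map() are hypothetical stand-ins, not perf code. Indexing the array that was just flushed keeps overwrite mode on the overwrite maps, and the bounds check avoids touching an empty regular-map array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: each map knows the output file it feeds. */
struct perf_mmap { int out_fd; };

struct record_thread {
	struct perf_mmap **mmap;
	int                mmap_nr;
	struct perf_mmap **ovw_mmap;
	int                ovw_mmap_nr;
};

/*
 * Pick the map whose file receives the finished-round event:
 * the first entry of the array actually being read (maps[0]),
 * rather than always thread->mmap[0].
 */
static struct perf_mmap *round_map(struct record_thread *th, bool overwrite)
{
	struct perf_mmap **maps = overwrite ? th->ovw_mmap : th->mmap;
	int nr = overwrite ? th->ovw_mmap_nr : th->mmap_nr;

	return (maps && nr > 0) ? maps[0] : NULL;
}
```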

* Re: [PATCH 39/48] perf record: Move waking into struct record
  2018-09-13 12:54 ` [PATCH 39/48] perf record: Move waking into struct record Jiri Olsa
@ 2018-09-17 11:31   ` Namhyung Kim
  2018-09-23 19:36     ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Namhyung Kim @ 2018-09-17 11:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

On Thu, Sep 13, 2018 at 02:54:41PM +0200, Jiri Olsa wrote:
> We need to keep a global 'waking' count now.
> 
> TODO: make this multiple threads safe.

Why not use the atomic APIs?

Thanks,
Namhyung


> 
> Link: http://lkml.kernel.org/n/tip-veetgk62aisdt1cxaa6fbgox@git.kernel.org
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/builtin-record.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index d6fef646b67f..62ff4411ce39 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -94,6 +94,7 @@ struct record {
>  	unsigned long long	samples;
>  	struct record_thread	*threads;
>  	int			threads_cnt;
> +	unsigned long		waking;
>  };
>  
>  static __thread struct record_thread *thread;
> @@ -1143,7 +1144,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  {
>  	int err;
>  	int status = 0;
> -	unsigned long waking = 0;
>  	const bool forks = argc > 0;
>  	struct perf_tool *tool = &rec->tool;
>  	struct record_opts *opts = &rec->opts;
> @@ -1400,8 +1400,8 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  
>  			if (!quiet)
>  				fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
> -					waking);
> -			waking = 0;
> +					rec->waking);
> +			rec->waking = 0;
>  			fd = record__switch_output(rec, false);
>  			if (fd < 0) {
>  				pr_err("Failed to switch to new file\n");
> @@ -1425,7 +1425,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  			 */
>  			if (err > 0 || (err < 0 && errno == EINTR))
>  				err = 0;
> -			waking++;
> +			rec->waking++;
>  
>  			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
>  				draining = true;
> @@ -1454,7 +1454,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  	}
>  
>  	if (!quiet)
> -		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
> +		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", rec->waking);
>  
>  	if (target__none(&rec->opts.target))
>  		record__synthesize_workload(rec, true);
> -- 
> 2.17.1
> 
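Namhyung's suggestion to use atomics for the shared rec->waking counter could look roughly like the following. This sketch uses C11 <stdatomic.h>. perf's tools/ tree has its own atomic helpers, and this does not try to reproduce them. The reader loop and thread limit are illustrative. Reading and resetting the counter with a single atomic_exchange() also avoids losing wakeups that land between the fprintf() and the `rec->waking = 0` store in the patch.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared wakeup counter, updated by every reader thread. */
static atomic_ulong waking;

/* Hypothetical reader loop: count one wakeup per poll() return. */
static void *reader(void *arg)
{
	int polls = *(int *)arg;

	for (int i = 0; i < polls; i++)
		atomic_fetch_add_explicit(&waking, 1, memory_order_relaxed);
	return NULL;
}

/* Report and reset in one step, e.g. on switch-output. */
static unsigned long waking_take(void)
{
	return atomic_exchange(&waking, 0);
}

/* Start nthreads readers, wait for all of them, return 0 on success. */
static int run_readers(int nthreads, int polls)
{
	pthread_t tid[64];
	int created = 0, err = 0;

	if (nthreads > 64)
		return -1;
	for (; created < nthreads; created++) {
		if (pthread_create(&tid[created], NULL, reader, &polls)) {
			err = -1;
			break;
		}
	}
	/* Join whatever was started, even on partial failure. */
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	return err;
}
```

The main thread would then print `waking_take()` in the "Woken up %lu times" messages instead of reading and zeroing the plain field.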

* Re: [PATCH 44/48] perf record: Add --threads option
  2018-09-13 12:54 ` [PATCH 44/48] perf record: Add --threads option Jiri Olsa
@ 2018-09-17 11:37   ` Namhyung Kim
  0 siblings, 0 replies; 101+ messages in thread
From: Namhyung Kim @ 2018-09-17 11:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

On Thu, Sep 13, 2018 at 02:54:46PM +0200, Jiri Olsa wrote:
> Allows assigning a number to record::threads_cnt and thus
> creating multiple threads. At this point we don't allow
> specifying the number of threads; instead we set it to the
> number of evlist's mmaps, so there is a single thread for each.
> 
> Link: http://lkml.kernel.org/n/tip-ijl786fsk46q6g01is378a5t@git.kernel.org
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/builtin-record.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index fbca1d15b90d..ada6f795d492 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -102,6 +102,7 @@ struct record {
>  	unsigned long long	samples;
>  	struct record_thread	*threads;
>  	int			threads_cnt;
> +	bool			threads_set;
>  	int			threads_signal_cnt;
>  	pthread_mutex_t		threads_signal_mutex;
>  	pthread_cond_t		threads_signal_cond;
> @@ -1133,11 +1134,38 @@ record__threads_create(struct record *rec)
>  	return threads ? 0 : -ENOMEM;
>  }
>  
> +static int record__threads_cnt(struct record *rec)
> +{
> +	struct perf_evlist *evlist = rec->evlist;
> +	int cnt;
> +
> +	if (rec->threads_set) {
> +		if (rec->threads_cnt) {
> +			pr_err("failed: Can't specify number of threads yet.\n");
> +			return -EINVAL;
> +		}
> +		if (evlist->overwrite_mmap) {
> +			pr_err("failed: Can't use multiple threads with overwrite mmaps yet.\n");
> +			return -EINVAL;

Ah, ok. You made it incompatible with the overwrite mode.

Thanks,
Namhyung


> +		}
> +		cnt = evlist->nr_mmaps;
> +	} else {
> +		cnt = 1;
> +	}
> +
> +	rec->threads_cnt = cnt;
> +	return 0;
> +}
> +
>  static int
>  record__threads_config(struct record *rec)
>  {
>  	int ret;
>  
> +	ret = record__threads_cnt(rec);
> +	if (ret)
> +		goto out;
> +
>  	ret = record__threads_create(rec);
>  	if (ret)
>  		goto out;
> @@ -2119,6 +2147,8 @@ static struct option __record_options[] = {
>  		    "Parse options then exit"),
>  	OPT_BOOLEAN(0, "index", &record.opts.index,
>  		    "make index for sample data to speed-up processing"),
> +	OPT_INTEGER_OPTARG_SET(0, "threads", &record.threads_cnt, &record.threads_set,
> +			       "count", "Enabled threads (count)", 0),
>  	OPT_END()
>  };
>  
> @@ -2267,6 +2297,12 @@ int cmd_record(int argc, const char **argv)
>  		goto out;
>  	}
>  
> +	/*
> +	 * Threads need index data file.
> +	 */
> +	if (record.threads_set)
> +		record.opts.index = true;
> +
>  	if (rec->opts.index) {
>  		if (!rec->opts.sample_time) {
>  			pr_err("Sample timestamp is required for indexing\n");
> -- 
> 2.17.1
> 

* Re: [PATCH 47/48] perf record: Spread maps for --threads option
  2018-09-13 12:54 ` [PATCH 47/48] perf record: Spread maps for --threads option Jiri Olsa
@ 2018-09-17 11:40   ` Namhyung Kim
  2018-09-23 19:44     ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Namhyung Kim @ 2018-09-17 11:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

On Thu, Sep 13, 2018 at 02:54:49PM +0200, Jiri Olsa wrote:
> Currently we assign all maps to the main thread. Adding
> code that spreads maps for the --threads option.
> 
> For the --threads option we create as many threads as there
> are memory maps in evlist, which is the number of CPUs
> in the system or the CPUs we monitor. Each thread gets a
> single data mmap to read.
> 
> In addition we have the same number of tracking mmaps
> for auxiliary events, for which we don't create a special
> thread. Instead we assign them to the main thread, because
> not much traffic is expected there.
> 
> The assignment is visible from --thread-stats output:
> 
>           pid      write       poll       skip  maps (size 20K)
>     1s   9770       144B          1          0   19K   19K   19K   18K   19K
>          9772         0B          1          0   18K
>          9773         0B          1          0   19K
>          9774         0B          1          0   19K
> 
> There are 5 maps for thread 9770 (1 data map and 4 auxiliary)
> and one data map for every other thread. Each thread writes
> data to its own data file.

Hmm.. not sure it'll work well for large machines with 1000+ cpus.
What about giving each thread a data mmap and a tracking mmap?

Thanks,
Namhyung


> 
> We also pin every thread to the CPU that its data map
> belongs to, in order to keep both the writer (kernel) and
> the reader (perf tool thread) on the same CPU.
> 
> Link: http://lkml.kernel.org/n/tip-ghcsnp3b73innq2gkl1lkfbz@git.kernel.org
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
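To make the two layouts in this exchange concrete, here is a sketch that counts how many maps each thread reads. It covers the posted scheme (thread 0 also takes every tracking map) and Namhyung's alternative (one data map plus one tracking map per thread). It only counts maps. The function name and the assumption that thread t reads data map t are illustrative, not taken from the patch. With 4 data maps it reproduces the --thread-stats example above: 5 maps for the first thread and 1 for each of the others.

```c
#include <stdbool.h>

/*
 * Number of mmaps thread t reads, given nr_mmaps data maps, the same
 * number of tracking maps, and one reader thread per data map.
 *
 * per_thread_track == false: posted scheme, thread 0 also reads
 *                            every tracking map.
 * per_thread_track == true:  suggested scheme, thread t reads
 *                            data map t and tracking map t.
 */
static int maps_for_thread(int t, int nr_mmaps, bool per_thread_track)
{
	if (t < 0 || t >= nr_mmaps)
		return 0;
	if (per_thread_track)
		return 2;
	return t == 0 ? 1 + nr_mmaps : 1;
}
```

On a machine with 1024 CPUs, the posted scheme leaves thread 0 polling 1025 maps, while the per-thread split keeps every thread at 2. That imbalance is the concern raised for large machines.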

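The CPU pinning described in the commit message can be sketched with glibc's pthread_setaffinity_np() and sched_getcpu(). The pin_arg struct and the helper names are hypothetical. In perf the CPU would come from the data map. Here it is taken from the process's current affinity mask, only to pick a CPU that is guaranteed to be allowed.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

struct pin_arg {
	int cpu;	/* CPU the data map belongs to */
	int ran_on;	/* CPU observed after pinning, or -1 on error */
};

/* Reader thread body: pin itself first, then (in perf) read its map. */
static void *pinned_reader(void *p)
{
	struct pin_arg *arg = p;
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(arg->cpu, &set);
	if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set)) {
		arg->ran_on = -1;
		return NULL;
	}
	/* The kernel migrates the thread before the call returns. */
	arg->ran_on = sched_getcpu();
	return NULL;
}

/* Return the lowest CPU this process may run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set))
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}
```

Pinning inside the thread itself, rather than from the creator, keeps each reader responsible for its own placement before it starts consuming the ring buffer.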

* Re: [PATCH 09/48] perf tools: Make copyfile_offset global
  2018-09-13 12:54 ` [PATCH 09/48] perf tools: Make copyfile_offset global Jiri Olsa
@ 2018-09-18 20:54   ` Arnaldo Carvalho de Melo
  2018-09-23 19:44     ` Jiri Olsa
  2018-09-25  9:33   ` [tip:perf/core] perf util: Make copyfile_offset() global tip-bot for Jiri Olsa
  1 sibling, 1 reply; 101+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-09-18 20:54 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Em Thu, Sep 13, 2018 at 02:54:11PM +0200, Jiri Olsa escreveu:
> It will be used outside of the util object in the following patches.

I had to include fcntl.h for loff_t to fix the build on some systems,
and moved the prototype closer to the other copyfile_-prefixed functions
in util.h.

- Arnaldo
 
> Link: http://lkml.kernel.org/n/tip-xgiypvcrmc12u7czcrc27en2@git.kernel.org
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/util/util.c | 2 +-
>  tools/perf/util/util.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index eac5b858a371..093352e93d50 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -221,7 +221,7 @@ static int slow_copyfile(const char *from, const char *to, struct nsinfo *nsi)
>  	return err;
>  }
>  
> -static int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size)
> +int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size)
>  {
>  	void *ptr;
>  	loff_t pgoff;
> diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
> index dc58254a2b69..7fc171b20671 100644
> --- a/tools/perf/util/util.h
> +++ b/tools/perf/util/util.h
> @@ -80,4 +80,6 @@ void perf_set_multithreaded(void);
>  #endif
>  #endif
>  
> +int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size);
> +
>  #endif /* GIT_COMPAT_UTIL_H */
> -- 
> 2.17.1

* Re: [PATCH 02/48] perf tools: Remove perf_tool from event_op3
  2018-09-13 12:54 ` [PATCH 02/48] perf tools: Remove perf_tool from event_op3 Jiri Olsa
@ 2018-09-18 20:56   ` Arnaldo Carvalho de Melo
  2018-09-23 19:45     ` Jiri Olsa
  2018-09-25  9:31   ` [tip:perf/core] " tip-bot for Jiri Olsa
  1 sibling, 1 reply; 101+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-09-18 20:56 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

Em Thu, Sep 13, 2018 at 02:54:04PM +0200, Jiri Olsa escreveu:
> Now that we keep the perf_tool pointer inside perf_session,
> there's no need to have a perf_tool argument in the
> event_op3 callback. Removing it.
> 
> Link: http://lkml.kernel.org/n/tip-78u9m0jbre3bn16l6guqfyrf@git.kernel.org
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/perf/builtin-inject.c | 6 +++---
>  tools/perf/util/auxtrace.c  | 7 +++----
>  tools/perf/util/auxtrace.h  | 5 ++---
>  tools/perf/util/session.c   | 8 +++-----
>  tools/perf/util/tool.h      | 4 +---
>  5 files changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index d77ed2aea95a..03fc65da0657 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -131,10 +131,10 @@ static int copy_bytes(struct perf_inject *inject, int fd, off_t size)
>  	return 0;
>  }
>  
> -static s64 perf_event__repipe_auxtrace(struct perf_tool *tool,
> -				       union perf_event *event,
> -				       struct perf_session *session)
> +static s64 perf_event__repipe_auxtrace(struct perf_session *session,
> +				       union perf_event *event)
>  {
> +	struct perf_tool *tool = session->tool;
>  	struct perf_inject *inject = container_of(tool, struct perf_inject,
>  						  tool);
>  	int ret;

You forgot the !HAVE_AUXTRACE_SUPPORT case, fixed with:


diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 03fc65da0657..b4a29f435b06 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -172,9 +172,8 @@ static s64 perf_event__repipe_auxtrace(struct perf_session *session,
 #else
 
 static s64
-perf_event__repipe_auxtrace(struct perf_tool *tool __maybe_unused,
-			    union perf_event *event __maybe_unused,
-			    struct perf_session *session __maybe_unused)
+perf_event__repipe_auxtrace(struct perf_session *session __maybe_unused,
+			    union perf_event *event __maybe_unused)
 {
 	pr_err("AUX area tracing not supported\n");
 	return -EINVAL;

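The refactoring relies on perf_session carrying the perf_tool pointer. A callback can then recover its tool, and the structure embedding it, from the session alone. Below is a reduced sketch of that pattern. The types are simplified stand-ins for perf's, and container_of() is defined locally (assumptions, not the real headers). Because both the real and the !HAVE_AUXTRACE_SUPPORT variants are stored in the same event_op3 slot, they must share the new two-argument prototype. That is why the stub needed the fix above.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

typedef long long s64;

union perf_event { int type; };
struct perf_session;

/* New-style callback: no separate perf_tool argument. */
typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);

struct perf_tool    { event_op3 auxtrace; };
struct perf_session { struct perf_tool *tool; };

/* A tool embedded in a larger, command-specific struct. */
struct perf_inject {
	struct perf_tool tool;
	s64              bytes;
};

static s64 repipe_auxtrace(struct perf_session *session, union perf_event *event)
{
	/* Recover the embedding struct from the session's tool pointer. */
	struct perf_inject *inject =
		container_of(session->tool, struct perf_inject, tool);

	(void)event;
	return ++inject->bytes;
}
```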

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-14  9:37       ` Alexey Budankov
@ 2018-09-21  6:13         ` Alexey Budankov
  2018-09-21 12:15           ` Alexey Budankov
  2018-09-23 19:30           ` Jiri Olsa
  0 siblings, 2 replies; 101+ messages in thread
From: Alexey Budankov @ 2018-09-21  6:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hello Jiri,

On 14.09.2018 12:37, Alexey Budankov wrote:
> On 14.09.2018 11:28, Jiri Olsa wrote:
>> On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:
>>
>> SNIP
>>
>>>>> The threaded monitoring currently can't monitor backward maps
>>>>> and there are probably more limitations which I haven't spotted
>>>>> yet.
>>>>>
>>>>> So far I tested on laptop:
>>>>>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
>>>>>
>>>>> and on one bigger server:
>>>>>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
>>>>>
>>>>> I can see a decrease in recorded LOST events, but both the benchmark
>>>>> and the monitoring must be carefully configured wrt:
>>>>>   - number of events (frequency)
>>>>>   - size of the memory maps
>>>>>   - size of events (callchains)
>>>>>   - final perf.data size
>>>>>
>>>>> It's also available in:
>>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>>>>>   perf/record_threads
>>>>>
>>>>> thoughts? ;-) thanks
>>>>> jirka
>>>>
>>>> It is preferable to split this into smaller pieces, each
>>>> bringing an improvement proven by metric numbers and ready
>>>> for merging upstream. Do we have any metrics other than
>>>> the data loss metric from the trace AIO patches?
>>>
>>> well the primary focus is to get more events in,
>>> so the LOST metric is the main one
>>
>> actually, could you please run it through the same tests
>> you use for the AIO code on some huge server?
> 
> Yeah, I will, but it takes some time.

Here it is:

Hardware:
cat /proc/cpuinfo
processor	: 271
vendor_id	: GenuineIntel
cpu family	: 6
model		: 133
model name	: Intel(R) Xeon Phi(TM) CPU 7285 @ 1.30GHz
stepping	: 0
microcode	: 0xe
cpu MHz		: 1064.235
cache size	: 1024 KB
physical id	: 0
siblings	: 272
core id		: 73
cpu cores	: 68
apicid		: 295
initial apicid	: 295
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts avx512_vpopcntdq avx512_4vnniw avx512_4fmaps
bugs		: cpu_meltdown spectre_v1 spectre_v2
bogomips	: 2594.07
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

uname -a
Linux nntpat98-196 4.18.0-rc7+ #2 SMP Thu Sep 6 13:24:37 MSK 2018 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/sys/kernel/perf_event_paranoid
0

cat /proc/sys/kernel/perf_event_mlock_kb 
516

cat /proc/sys/kernel/perf_event_max_sample_rate 
3000

cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Metrics:
runtime overhead (x) : elapsed_time_under_profiling / elapsed_time
data loss (%)        : paused_time / elapsed_time_under_profiling
LOST events          : stat from perf report --stats
SAMPLE events        : stat from perf report --stats
perf.data size (GiB) : size of trace file on disk
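The two ratio metrics above are simple quotients. As a worked check, the first 256-thread result (T=272, P=0.1) reports 323.54 s under profiling against 7.12 s without, i.e. about 45x. A small sketch of both computations follows. The function names are illustrative, and the guards simply return 0 for a zero denominator.

```c
/* Slowdown factor: how many times longer the profiled run took. */
static double runtime_overhead(double elapsed_profiled, double elapsed_plain)
{
	return elapsed_plain > 0.0 ? elapsed_profiled / elapsed_plain : 0.0;
}

/* Share of the profiled run spent with recording paused, in percent. */
static double data_loss_pct(double paused, double elapsed_profiled)
{
	return elapsed_profiled > 0.0 ? 100.0 * paused / elapsed_profiled : 0.0;
}
```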

Events:
cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
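Each raw event above packs an event select code and a unit mask into the PMU config value, followed by modifiers after the final slash. On Intel core PMUs the event code occupies config bits 0-7 and the umask bits 8-15. In perf's modifier syntax, `u` and `k` select user and kernel mode, and `D` pins the event to the PMU. The sketch below computes the config value and decodes those three modifiers into a hypothetical struct. It does not use <linux/perf_event.h> and is not perf's event parser.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of cpu/period=P,event=E,umask=U/<mods>. */
struct raw_event {
	uint64_t config;	/* (umask << 8) | event */
	uint64_t period;	/* sample period */
	bool     user, kernel, pinned;
};

static struct raw_event raw_event_make(uint8_t event, uint8_t umask,
				       uint64_t period, const char *mods)
{
	struct raw_event ev = {
		.config = ((uint64_t)umask << 8) | event,
		.period = period,
	};

	for (; mods && *mods; mods++) {
		switch (*mods) {
		case 'u': ev.user = true;   break;
		case 'k': ev.kernel = true; break;
		case 'D': ev.pinned = true; break;
		default:  break;	/* other modifiers ignored in this sketch */
		}
	}
	return ev;
}
```

For example, UOPS_RETIRED.ALL (event=0xc2, umask=0x10) yields config 0x10c2, and CPU_CLK_UNHALTED.REF_TSC (umask=0x3, no event field) yields 0x300.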

=================================================

Command:
/usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
        -e cpu/period=P,event=0x3c/Duk,\
           cpu/period=P,umask=0x3/Duk,\
           cpu/period=P,event=0xc0/Duk,\
           cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
         --clockid=monotonic_raw -- ./matrix.(icc|gcc)

Workload: matrix multiplication in 256 threads

/usr/bin/time ./matrix.icc
Addr of buf1 = 0x7ff9faa73010
Offs of buf1 = 0x7ff9faa73180
Addr of buf2 = 0x7ff9f8a72010
Offs of buf2 = 0x7ff9f8a721c0
Addr of buf3 = 0x7ff9f6a71010
Offs of buf3 = 0x7ff9f6a71100
Addr of buf4 = 0x7ff9f4a70010
Offs of buf4 = 0x7ff9f4a70140
Threads #: 256 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Freq = 0.997720 GHz
Execution time = 9.061 seconds
1639.55user 6.59system 0:07.12elapsed 23094%CPU (0avgtext+0avgdata 100448maxresident)k
96inputs+0outputs (1major+33839minor)pagefaults 0swaps

T    P (period, ms)  runtime overhead (x)  data loss (%)  LOST events  SAMPLE events  perf.data size (GiB)
272  0.1             45x ~ 323.54 / 7.12   96             323662       31885479       42
272  0.25            25x ~ 180.76 / 7.12   69             10636        18692998       23.5
272  0.35            16x ~ 119.49 / 7.12   1              6            11178524       14
128  0.35            15x ~ 111.98 / 7.12   62             2825         11267247       15
64   0.35            14x ~ 101.55 / 7.12   67             5155         10966297       13.7

Workload: matrix multiplication in 128 threads

/usr/bin/time ./matrix.gcc
Addr of buf1 = 0x7f072e630010
Offs of buf1 = 0x7f072e630180
Addr of buf2 = 0x7f072c62f010
Offs of buf2 = 0x7f072c62f1c0
Addr of buf3 = 0x7f072a62e010
Offs of buf3 = 0x7f072a62e100
Addr of buf4 = 0x7f072862d010
Offs of buf4 = 0x7f072862d140
Threads #: 128 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 6.639 seconds
767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
88inputs+0outputs (0major+139898minor)pagefaults 0swaps

T    P (period, ms)  runtime overhead (x)  data loss (%)  LOST events  SAMPLE events  perf.data size (GiB)
272  0.1             29x ~ 198.81 / 6.81   21             2502         22481062       27.6
272  0.25            13x ~ 88.47 / 6.81    0              0            9572787        11.3
272  0.35            10x ~ 67.11 / 6.81    1              137          6985930        8
128  0.35            9.5x ~ 64.33 / 6.81   1              3            6666903        7.8
64   0.25            17x ~ 114.27 / 6.81   2              52           12643645       15.5
64   0.35            10x ~ 68.60 / 6.81    1              93           7164368        8.5

Thanks,
Alexey

> 
>>
>> thanks,
>> jirka
>>
> 

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-21  6:13         ` Alexey Budankov
@ 2018-09-21 12:15           ` Alexey Budankov
  2018-09-24 19:23             ` Alexey Budankov
  2018-09-23 19:30           ` Jiri Olsa
  1 sibling, 1 reply; 101+ messages in thread
From: Alexey Budankov @ 2018-09-21 12:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hello Jiri,

On 21.09.2018 9:13, Alexey Budankov wrote:
> Hello Jiri,
> 
> On 14.09.2018 12:37, Alexey Budankov wrote:
>> On 14.09.2018 11:28, Jiri Olsa wrote:
>>> On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:
>>>
>>> SNIP
>>>
>>>>>> The threaded monitoring currently can't monitor backward maps
>>>>>> and there are probably more limitations which I haven't spotted
>>>>>> yet.
>>>>>>
>>>>>> So far I tested on laptop:
>>>>>>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
>>>>>>
>>>>>> and a one bigger server:
>>>>>>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
>>>>>>
>>>>>> I can see decrease in recorded LOST events, but both the benchmark
>>>>>> and the monitoring must be carefully configured wrt:
>>>>>>   - number of events (frequency)
>>>>>>   - size of the memory maps
>>>>>>   - size of events (callchains)
>>>>>>   - final perf.data size
>>>>>>
>>>>>> It's also available in:
>>>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>>>>>>   perf/record_threads
>>>>>>
>>>>>> thoughts? ;-) thanks
>>>>>> jirka
>>>>>
>>>>> It is preferable to split into smaller pieces that bring 
>>>>> some improvement proved by metrics numbers and ready for 
>>>>> merging and upstream. Do we have more metrics than the 
>>>>> data loss from trace AIO patches?
>>>>
>>>> well the primary focus is to get more events in,
>>>> so the LOST metric is the main one
>>>
>>> actualy I was hoping, could you please run it through the same
>>> tests as you do for AIO code on some huge server? 
>>
>> Yeah, I will, but it takes some time.
> 
> Here it is:
> 
> Hardware:
> cat /proc/cpuinfo
> processor	: 271
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 133
> model name	: Intel(R) Xeon Phi(TM) CPU 7285 @ 1.30GHz
> stepping	: 0
> microcode	: 0xe
> cpu MHz		: 1064.235
> cache size	: 1024 KB
> physical id	: 0
> siblings	: 272
> core id		: 73
> cpu cores	: 68
> apicid		: 295
> initial apicid	: 295
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 13
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts avx512_vpopcntdq avx512_4vnniw avx512_4fmaps
> bugs		: cpu_meltdown spectre_v1 spectre_v2
> bogomips	: 2594.07
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 46 bits physical, 48 bits virtual
> power management:
> 
> uname -a
> Linux nntpat98-196 4.18.0-rc7+ #2 SMP Thu Sep 6 13:24:37 MSK 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> cat /proc/sys/kernel/perf_event_paranoid
> 0
> 
> cat /proc/sys/kernel/perf_event_mlock_kb 
> 516
> 
> cat /proc/sys/kernel/perf_event_max_sample_rate 
> 3000
> 
> cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.5 (Maipo)
> 
> Metrics:
> runtime overhead (%) : elapsed_time_under_profiling / elapsed_time
> data loss (%)        : paused_time / elapsed_time_under_profiling
> LOST events          : stat from perf report --stats
> SAMPLE events        : stat from perf report --stats
> perf.data size (B)   : size of trace file on disk
> 
> Events:
> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
> 
> =================================================
> 
> Command:
> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>         -e cpu/period=P,event=0x3c/Duk,\
>            cpu/period=P,umask=0x3/Duk,\
>            cpu/period=P,event=0xc0/Duk,\
>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
> 
> Workload: matrix multiplication in 256 threads
> 
> /usr/bin/time ./matrix.icc
> Addr of buf1 = 0x7ff9faa73010
> Offs of buf1 = 0x7ff9faa73180
> Addr of buf2 = 0x7ff9f8a72010
> Offs of buf2 = 0x7ff9f8a721c0
> Addr of buf3 = 0x7ff9f6a71010
> Offs of buf3 = 0x7ff9f6a71100
> Addr of buf4 = 0x7ff9f4a70010
> Offs of buf4 = 0x7ff9f4a70140
> Threads #: 256 Pthreads
> Matrix size: 2048
> Using multiply kernel: multiply1
> Freq = 0.997720 GHz
> Execution time = 9.061 seconds
> 1639.55user 6.59system 0:07.12elapsed 23094%CPU (0avgtext+0avgdata 100448maxresident)k
> 96inputs+0outputs (1major+33839minor)pagefaults 0swaps
> 
> T : 272
>         P (period, ms)       : 0.1
> 	runtime overhead (%) : 45x ~ 323.54 / 7.12
> 	data loss (%)        : 96
> 	LOST events          : 323662
> 	SAMPLE events        : 31885479
>         perf.data size (GiB) : 42
> 
> 	P (period, ms)       : 0.25
> 	runtime overhead (%) : 25x ~ 180.76 / 7.12
> 	data loss (%)        : 69 
> 	LOST events          : 10636
> 	SAMPLE events        : 18692998
>         perf.data size (GiB) : 23.5
> 
> 	P (period, ms)       : 0.35 
> 	runtime overhead (%) : 16x ~ 119.49 / 7.12
> 	data loss (%)        : 1
> 	LOST events          : 6
> 	SAMPLE events        : 11178524
>         perf.data size (GiB) : 14
> 
> T : 128
> 	P (period, ms)       : 0.35 
> 	runtime overhead (%) : 15x ~ 111.98 / 7.12
> 	data loss (%)        : 62
> 	LOST events          : 2825
> 	SAMPLE events        : 11267247
>         perf.data size (GiB) : 15
> 
> T : 64
> 	P (period, ms)       : 0.35 
> 	runtime overhead (%) : 14x ~ 101.55 / 7.12
> 	data loss (%)        : 67
> 	LOST events          : 5155
> 	SAMPLE events        : 10966297
>         perf.data size (GiB) : 13.7
> 
> Workload: matrix multiplication in 128 threads
> 
> /usr/bin/time ./matrix.gcc
> Addr of buf1 = 0x7f072e630010
> Offs of buf1 = 0x7f072e630180
> Addr of buf2 = 0x7f072c62f010
> Offs of buf2 = 0x7f072c62f1c0
> Addr of buf3 = 0x7f072a62e010
> Offs of buf3 = 0x7f072a62e100
> Addr of buf4 = 0x7f072862d010
> Offs of buf4 = 0x7f072862d140
> Threads #: 128 Pthreads
> Matrix size: 2048
> Using multiply kernel: multiply1
> Execution time = 6.639 seconds
> 767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
> 88inputs+0outputs (0major+139898minor)pagefaults 0swaps
> 
> T : 272
>         P (period, ms)       : 0.1
> 	runtime overhead (%) : 29x ~ 198.81 / 6.81
> 	data loss (%)        : 21
> 	LOST events          : 2502
> 	SAMPLE events        : 22481062
>         perf.data size (GiB) : 27.6
> 
> 	P (period, ms)       : 0.25
> 	runtime overhead (%) : 13x ~ 88.47 / 6.81
> 	data loss (%)        : 0
> 	LOST events          : 0
> 	SAMPLE events        : 9572787
>         perf.data size (GiB) : 11.3
> 
> 	P (period, ms)       : 0.35 
> 	runtime overhead (%) : 10x ~ 67.11 / 6.81
> 	data loss (%)        : 1
> 	LOST events          : 137
> 	SAMPLE events        : 6985930
>         perf.data size (GiB) : 8
> 
> T : 128
> 	P (period, ms)       : 0.35 
> 	runtime overhead (%) : 9.5x ~ 64.33 / 6.81
> 	data loss (%)        : 1
> 	LOST events          : 3
> 	SAMPLE events        : 6666903
>         perf.data size (GiB) : 7.8
> 
> T : 64
> 	P (period, ms)       : 0.25
> 	runtime overhead (%) : 17x ~ 114.27 / 6.81
> 	data loss (%)        : 2
> 	LOST events          : 52
> 	SAMPLE events        : 12643645
>         perf.data size (GiB) : 15.5
> 
> 	P (period, ms)       : 0.35 
> 	runtime overhead (%) : 10x ~ 68.60 / 6.81
> 	data loss (%)        : 1
> 	LOST events          : 93
> 	SAMPLE events        : 7164368
>         perf.data size (GiB) : 8.5

And these are the results for AIO and serial:

Command:
/usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.aio record --aio=N \
	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
        -e cpu/period=P,event=0x3c/Duk,\
           cpu/period=P,umask=0x3/Duk,\
           cpu/period=P,event=0xc0/Duk,\
           cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
         --clockid=monotonic_raw -- ./matrix.(icc|gcc)
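The event terms in the command above are raw PMU encodings. A minimal sketch of how event and umask are packed, assuming the usual x86 core PMU format exported in /sys/bus/event_source/devices/cpu/format/ (event in config bits 0-7, umask in bits 8-15); raw_config() is a hypothetical helper, not a perf function:

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack an x86 core PMU event select and unit mask
 * into perf_event_attr.config, assuming the sysfs format
 * "event=config:0-7" and "umask=config:8-15".
 */
static uint64_t raw_config(uint8_t event, uint8_t umask)
{
	return (uint64_t)event | ((uint64_t)umask << 8);
}
```

For example, UOPS_RETIRED.ALL (event=0xc2,umask=0x10) packs to config 0x10c2, and the periods 0x30d40 and 0x4e20 are 200000 and 20000 events respectively.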

Workload: matrix multiplication in 256 threads

 N : 512
        P (period, ms)       : 2.5
 	runtime overhead (%) : 2.7x ~ 19.21 / 7.12
 	data loss (%)        : 42
 	LOST events          : 1600
 	SAMPLE events        : 1235928
        perf.data size (GiB) : 1.5
 
 N : 272
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 2.5x ~ 18.09 / 7.12
 	data loss (%)        : 89
 	LOST events          : 3457
 	SAMPLE events        : 1222143
        perf.data size (GiB) : 1.5

 	P (period, ms)       : 2
 	runtime overhead (%) : 2.5x ~ 17.93 / 7.12
 	data loss (%)        : 65
 	LOST events          : 2496
 	SAMPLE events        : 1240754
        perf.data size (GiB) : 1.5

 	P (period, ms)       : 2.5
 	runtime overhead (%) : 2.5x ~ 17.87 / 7.12
 	data loss (%)        : 44
 	LOST events          : 1621
 	SAMPLE events        : 1221949
        perf.data size (GiB) : 1.5

 	P (period, ms)       : 3
 	runtime overhead (%) : 2.5x ~ 18.43 / 7.12
 	data loss (%)        : 12
 	LOST events          : 350
 	SAMPLE events        : 1117972
        perf.data size (GiB) : 1.3
 
 N : 128
 	P (period, ms)       : 3
 	runtime overhead (%) : 2.4x ~ 17.08 / 7.12
 	data loss (%)        : 11
 	LOST events          : 335
 	SAMPLE events        : 1116832
        perf.data size (GiB) : 1.3
 
 N : 64
 	P (period, ms)       : 3
 	runtime overhead (%) : 2.2x ~ 16.03 / 7.12
 	data loss (%)        : 11
 	LOST events          : 329
 	SAMPLE events        : 1108205
        perf.data size (GiB) : 1.3
 
Workload: matrix multiplication in 128 threads

 N : 512
        P (period, ms)       : 1
 	runtime overhead (%) : 3.5x ~ 23.72 / 6.81
 	data loss (%)        : 18
 	LOST events          : 1043
 	SAMPLE events        : 2015306
        perf.data size (GiB) : 2.3

 N : 272
        P (period, ms)       : 0.5
 	runtime overhead (%) : 3x ~ 22.72 / 6.81
 	data loss (%)        : 90
 	LOST events          : 5842
 	SAMPLE events        : 2205937
        perf.data size (GiB) : 2.5

        P (period, ms)       : 1
 	runtime overhead (%) : 3x ~ 22.79 / 6.81
 	data loss (%)        : 11
 	LOST events          : 481
 	SAMPLE events        : 2017099
        perf.data size (GiB) : 2.5
 
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 3x ~ 19.93 / 6.81
 	data loss (%)        : 5
 	LOST events          : 190
 	SAMPLE events        : 1308692
        perf.data size (GiB) : 1.5
 
 	P (period, ms)       : 2
 	runtime overhead (%) : 3x ~ 18.95 / 6.81
 	data loss (%)        : 0
 	LOST events          : 0
 	SAMPLE events        : 1010769
        perf.data size (GiB) : 1.2
 
 N : 128
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 3x ~ 19.08 / 6.81
 	data loss (%)        : 6
 	LOST events          : 220
 	SAMPLE events        : 1322240
        perf.data size (GiB) : 1.5
 
 N : 64
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 3x ~ 19.43 / 6.81
 	data loss (%)        : 3
 	LOST events          : 130
 	SAMPLE events        : 1386521
        perf.data size (GiB) : 1.6

=================================================

Command:
/usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf record \
	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
        -e cpu/period=P,event=0x3c/Duk,\
           cpu/period=P,umask=0x3/Duk,\
           cpu/period=P,event=0xc0/Duk,\
           cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
           cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
         --clockid=monotonic_raw -- ./matrix.(icc|gcc)

Workload: matrix multiplication in 256 threads

	P (period, ms)       : 7.5
 	runtime overhead (%) : 1.6x ~ 11.6 / 7.12
 	data loss (%)        : 1
 	LOST events          : 1
 	SAMPLE events        : 451062
        perf.data size (GiB) : 0.5

Workload: matrix multiplication in 128 threads

	P (period, ms)       : 3
 	runtime overhead (%) : 1.8x ~ 12.58 / 6.81
 	data loss (%)        : 9
 	LOST events          : 147
 	SAMPLE events        : 673299
        perf.data size (GiB) : 0.8
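For reading the tables: the runtime overhead column is a ratio (323.54 / 7.12 is about 45x), not a percentage, despite its "(%)" label. A sketch of both metrics exactly as defined earlier in this report; the function names are illustrative only:

```c
/* runtime overhead = elapsed_time_under_profiling / elapsed_time (ratio) */
static double runtime_overhead(double elapsed_profiled, double elapsed)
{
	return elapsed_profiled / elapsed;
}

/* data loss (%) = paused_time / elapsed_time_under_profiling * 100 */
static double data_loss_pct(double paused, double elapsed_profiled)
{
	return 100.0 * paused / elapsed_profiled;
}
```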

Thanks,
Alexey

> 
> Thanks,
> Alexey
> 
>>
>>>
>>> thanks,
>>> jirka
>>>
>>
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-21  6:13         ` Alexey Budankov
  2018-09-21 12:15           ` Alexey Budankov
@ 2018-09-23 19:30           ` Jiri Olsa
  2018-09-24  7:02             ` Alexey Budankov
  1 sibling, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-23 19:30 UTC (permalink / raw)
  To: Alexey Budankov
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:

SNIP

> Events:
> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
> 
> =================================================
> 
> Command:
> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>         -e cpu/period=P,event=0x3c/Duk,\
>            cpu/period=P,umask=0x3/Duk,\
>            cpu/period=P,event=0xc0/Duk,\
>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)

hum, so I guess the results suck because of the -a option,
getting extra samples for all the perf record threads

could you try without the -a? you monitor only user events,
so you're interested only in ./matrix.* samples, right?

thanks,
jirka

* Re: [PATCH 37/48] perf record: Introduce struct record_thread
  2018-09-17 11:26   ` Namhyung Kim
@ 2018-09-23 19:31     ` Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-23 19:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov,
	kernel-team

On Mon, Sep 17, 2018 at 08:26:15PM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> On Thu, Sep 13, 2018 at 02:54:39PM +0200, Jiri Olsa wrote:
> > Adding struct record_thread to carry the single thread's maps.
> > 
> > Link: http://lkml.kernel.org/n/tip-dsyi97xdc7ullvsisqmha0ca@git.kernel.org
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/perf/builtin-record.c | 179 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 179 insertions(+)
> > 
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 1b01cb4d06b8..5c6b56f164a9 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -65,6 +65,15 @@ struct switch_output {
> >  	bool		 set;
> >  };
> >  
> > +struct record_thread {
> > +	struct perf_mmap	**mmap;
> > +	int			  mmap_nr;
> > +	struct perf_mmap	**ovw_mmap;
> > +	int			  ovw_mmap_nr;
> > +	struct fdarray		  pollfd;
> > +	struct record		 *rec;
> > +};
> > +
> >  struct record {
> >  	struct perf_tool	tool;
> >  	struct record_opts	opts;
> > @@ -83,6 +92,8 @@ struct record {
> >  	bool			timestamp_boundary;
> >  	struct switch_output	switch_output;
> >  	unsigned long long	samples;
> > +	struct record_thread	*threads;
> > +	int			threads_cnt;
> >  };
> >  
> >  static volatile int auxtrace_record__snapshot_started;
> > @@ -967,6 +978,166 @@ static int record__synthesize(struct record *rec, bool tail)
> >  	return err;
> >  }
> >  
> > +static void
> > +record_thread__clean(struct record_thread *th)
> > +{
> > +	free(th->mmap);
> > +	free(th->ovw_mmap);
> > +}
> > +
> > +static void
> > +record__threads_clean(struct record *rec)
> > +{
> > +	struct record_thread *threads = rec->threads;
> > +	int i;
> > +
> > +	if (threads) {
> > +		for (i = 0; i < rec->threads_cnt; i++)
> > +			record_thread__clean(threads + i);
> > +	}
> > +}
> > +
> > +static void record_thread__init(struct record_thread *th, struct record *rec)
> > +{
> > +	memset(th, 0, sizeof(*th));
> > +	fdarray__init(&th->pollfd, 64);
> > +	th->rec = rec;
> > +}
> > +
> > +static int
> > +record_thread__mmap(struct record_thread *th, int nr, int nr_ovw)
> > +{
> > +	struct perf_mmap **mmap;
> > +
> > +	mmap = zalloc(sizeof(*mmap) * nr);
> > +	if (!mmap)
> > +		return -ENOMEM;
> > +
> > +	th->mmap    = mmap;
> > +	th->mmap_nr = nr;
> > +
> > +	if (nr_ovw) {
> > +		mmap = zalloc(sizeof(*mmap) * nr_ovw);
> > +		if (!mmap)
> > +			return -ENOMEM;
> > +
> > +		th->ovw_mmap    = mmap;
> > +		th->ovw_mmap_nr = nr;
> 
> s/nr;/nr_ovw;/ ?

right, thanks
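With the fix applied, the overwrite array length is stored as nr_ovw. A self-contained sketch of the corrected function, where struct perf_mmap is an opaque stub, struct record_thread is reduced to the fields used here, and zalloc() is replaced by calloc():

```c
#include <errno.h>
#include <stdlib.h>

struct perf_mmap;			/* opaque stub for this sketch */

struct record_thread {			/* reduced to the fields used here */
	struct perf_mmap	**mmap;
	int			  mmap_nr;
	struct perf_mmap	**ovw_mmap;
	int			  ovw_mmap_nr;
};

static int
record_thread__mmap(struct record_thread *th, int nr, int nr_ovw)
{
	struct perf_mmap **mmap;

	mmap = calloc(nr, sizeof(*mmap));
	if (!mmap)
		return -ENOMEM;

	th->mmap    = mmap;
	th->mmap_nr = nr;

	if (nr_ovw) {
		mmap = calloc(nr_ovw, sizeof(*mmap));
		if (!mmap)
			return -ENOMEM;	/* th->mmap assumed freed by caller cleanup */

		th->ovw_mmap    = mmap;
		th->ovw_mmap_nr = nr_ovw;	/* was 'nr' in the patch */
	}

	return 0;
}
```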

> 
> 
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +record__threads_assign(struct record *rec)
> > +{
> > +	struct record_thread *threads = rec->threads;
> > +	struct record_thread *thread0 = threads;
> > +	struct perf_evlist *evlist = rec->evlist;
> > +	int i, j, nr, nr0, nr_ovw, nr_trk;
> > +	int ret = -ENOMEM;
> > +
> > +	nr     = evlist->mmap           ? evlist->nr_mmaps : 0;
> > +	nr_trk = evlist->track_mmap     ? evlist->nr_mmaps : 0;
> > +	nr_ovw = evlist->overwrite_mmap ? evlist->nr_mmaps : 0;
> > +
> > +	nr0  = nr_trk;
> > +	nr0 += nr;
> > +
> > +	if (record_thread__mmap(thread0, nr0, nr_ovw))
> > +		goto out_error;
> > +
> > +	for (i = 0; i < nr_ovw; i++)
> > +		thread0->ovw_mmap[i] = &evlist->overwrite_mmap[i];
> > +
> > +	for (i = 0; i < nr_trk; i++)
> > +		thread0->mmap[i] = &evlist->track_mmap[i];
> > +
> > +	for (j = 0; i < nr0 && j < nr; i++, j++)
> > +		thread0->mmap[i] = &evlist->mmap[j];
> 
> I'm not sure it'll work with the overwrite mmap well..

as you said in the later email, there's no support
for threads and overwrite mode

thanks,
jirka

* Re: [PATCH 38/48] perf record: Read record thread's mmaps
  2018-09-17 11:28   ` Namhyung Kim
@ 2018-09-23 19:35     ` Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-23 19:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov,
	kernel-team

On Mon, Sep 17, 2018 at 08:28:37PM +0900, Namhyung Kim wrote:

SNIP

> > -
> > -		if (track_map && track_map->base) {
> > -			if (perf_mmap__push(track_map, rec, record__pushfn) != 0) {
> > -				rc = -1;
> > -				goto out;
> > -			}
> > -		}
> >  	}
> >  
> >  	/*
> >  	 * Mark the round finished in case we wrote
> >  	 * at least one event.
> >  	 */
> > -	if (bytes_written != rec->bytes_written)
> > -		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
> > +	if (bytes_written != rec->bytes_written) {
> > +		/*
> > +		 * All maps of the threads point to a single file,
> > +		 * so we can just pick first one.
> > +		 */
> > +		rc = record__write(rec, thread->mmap[0], &finished_round_event,
> 
> Shouldn't it be maps[0] ?

yep, overwrite wouldn't work..

thanks,
jirka

* Re: [PATCH 39/48] perf record: Move waking into struct record
  2018-09-17 11:31   ` Namhyung Kim
@ 2018-09-23 19:36     ` Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-23 19:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov,
	kernel-team

On Mon, Sep 17, 2018 at 08:31:31PM +0900, Namhyung Kim wrote:
> On Thu, Sep 13, 2018 at 02:54:41PM +0200, Jiri Olsa wrote:
> > We need to keep global number of 'waking' now.
> > 
> > TODO: make this multiple threads safe.
> 
> Why not use atomic APIs?

that would solve it, will check
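A minimal sketch of the atomic approach, using C11 stdatomic.h and POSIX threads; the waking counter, reader_thread() and run_readers() are hypothetical stand-ins for the shared counter and the record threads' poll loops:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Global count of wakeups shared by all reader threads (hypothetical). */
static atomic_ulong waking;

/* Stand-in for a record thread: count 'n' wakeups from poll(). */
static void *reader_thread(void *arg)
{
	unsigned long n = *(unsigned long *)arg;

	for (unsigned long i = 0; i < n; i++)
		atomic_fetch_add_explicit(&waking, 1, memory_order_relaxed);
	return NULL;
}

/* Run up to 64 readers, each counting 'n' wakeups; return the total. */
static unsigned long run_readers(int nr, unsigned long n)
{
	pthread_t tids[64];
	int started = 0;

	if (nr > 64)
		nr = 64;
	atomic_store(&waking, 0);
	for (int i = 0; i < nr; i++) {
		if (pthread_create(&tids[started], NULL, reader_thread, &n) == 0)
			started++;
	}
	/* pthread_join() orders the relaxed increments before the load */
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&waking);
}
```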

thanks,
jirka

* Re: [PATCH 47/48] perf record: Spread maps for --threads option
  2018-09-17 11:40   ` Namhyung Kim
@ 2018-09-23 19:44     ` Jiri Olsa
  2018-09-24 14:22       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-23 19:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov,
	kernel-team

On Mon, Sep 17, 2018 at 08:40:48PM +0900, Namhyung Kim wrote:
> On Thu, Sep 13, 2018 at 02:54:49PM +0200, Jiri Olsa wrote:
> > Currently we assign all maps to the main thread. Adding
> > code that spreads maps for the --threads option.
> > 
> > For the --threads option we create as many threads as there
> > are memory maps in evlist, which is the number of CPUs
> > in the system or the CPUs we monitor. Each thread gets a
> > single data mmap to read.
> > 
> > In addition, we have the same number of tracking mmaps
> > for auxiliary events, for which we don't create a special
> > thread. Instead we assign them to the main thread, because
> > not much traffic is expected there.
> > 
> > The assignment is visible from --thread-stats output:
> > 
> >           pid      write       poll       skip  maps (size 20K)
> >     1s   9770       144B          1          0   19K   19K   19K   18K   19K
> >          9772         0B          1          0   18K
> >          9773         0B          1          0   19K
> >          9774         0B          1          0   19K
> > 
> > There are 5 maps for thread 9770 (1 data map and 4 auxiliary)
> > and one data map for every other thread. Each thread writes
> > data to a separate data file.
> 
> Hmm.. not sure it'll work well for large machines with 1000+ cpus.
> What about giving each thread a data mmap and a tracking mmap?

well, currently we store the tracking data in a single file;
that's why we need just one thread to write it down

with the *_time API, we should be able to properly read the
tracking data separately for each cpu
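To make the two layouts concrete, a sketch of which maps each thread would read when one thread is created per CPU: with shared tracking, thread 0 takes every tracking mmap (as in the patch); otherwise each thread also reads its own CPU's tracking mmap (Namhyung's suggestion). struct map_ref and assign_maps() are hypothetical, not code from the series:

```c
#include <stdbool.h>

enum map_kind { MAP_DATA, MAP_TRACK };

struct map_ref {		/* hypothetical: which mmap a thread reads */
	enum map_kind	kind;
	int		cpu;
};

/*
 * Fill 'out' with the maps read by thread 'tid' (one thread per CPU).
 * 'out' must hold nr_cpus + 1 entries. Returns the number written.
 */
static int assign_maps(int tid, int nr_cpus, bool share_tracking,
		       struct map_ref *out)
{
	int n = 0;

	out[n++] = (struct map_ref){ MAP_DATA, tid };

	if (!share_tracking) {
		/* per-thread tracking: own CPU's tracking map only */
		out[n++] = (struct map_ref){ MAP_TRACK, tid };
	} else if (tid == 0) {
		/* shared tracking: main thread reads all tracking maps */
		for (int cpu = 0; cpu < nr_cpus; cpu++)
			out[n++] = (struct map_ref){ MAP_TRACK, cpu };
	}
	return n;
}
```

With 4 CPUs and shared tracking, thread 0 gets 5 maps and threads 1-3 one each, matching the --thread-stats output quoted above.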

jirka

* Re: [PATCH 09/48] perf tools: Make copyfile_offset global
  2018-09-18 20:54   ` Arnaldo Carvalho de Melo
@ 2018-09-23 19:44     ` Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-23 19:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

On Tue, Sep 18, 2018 at 05:54:50PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Sep 13, 2018 at 02:54:11PM +0200, Jiri Olsa escreveu:
> > It will be used outside of util object in following patches.
> 
> Had to add fcntl.h to have loff_t to fix the build on some systems,
> moved the prototype closer to the other copyfile_ prefixed functions in
> util.h.

ok, thanks

jirka

* Re: [PATCH 02/48] perf tools: Remove perf_tool from event_op3
  2018-09-18 20:56   ` Arnaldo Carvalho de Melo
@ 2018-09-23 19:45     ` Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: Jiri Olsa @ 2018-09-23 19:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, lkml, Ingo Molnar, Namhyung Kim, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov

On Tue, Sep 18, 2018 at 05:56:09PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Sep 13, 2018 at 02:54:04PM +0200, Jiri Olsa escreveu:
> > Now when we keep perf_tool pointer inside perf_session,
> > there's no need to have perf_tool argument in the
> > event_op3 callback. Removing it.
> > 
> > Link: http://lkml.kernel.org/n/tip-78u9m0jbre3bn16l6guqfyrf@git.kernel.org
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/perf/builtin-inject.c | 6 +++---
> >  tools/perf/util/auxtrace.c  | 7 +++----
> >  tools/perf/util/auxtrace.h  | 5 ++---
> >  tools/perf/util/session.c   | 8 +++-----
> >  tools/perf/util/tool.h      | 4 +---
> >  5 files changed, 12 insertions(+), 18 deletions(-)
> > 
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index d77ed2aea95a..03fc65da0657 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -131,10 +131,10 @@ static int copy_bytes(struct perf_inject *inject, int fd, off_t size)
> >  	return 0;
> >  }
> >  
> > -static s64 perf_event__repipe_auxtrace(struct perf_tool *tool,
> > -				       union perf_event *event,
> > -				       struct perf_session *session)
> > +static s64 perf_event__repipe_auxtrace(struct perf_session *session,
> > +				       union perf_event *event)
> >  {
> > +	struct perf_tool *tool = session->tool;
> >  	struct perf_inject *inject = container_of(tool, struct perf_inject,
> >  						  tool);
> >  	int ret;
> 
> You forgot the !HAVE_AUXTRACE_SUPPORT case, fixed with:

oops, thanks ;-)
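The new callback signature relies on recovering the enclosing tool from the session. A minimal sketch of that container_of pattern, with struct perf_session, struct perf_tool and struct perf_inject reduced to hypothetical stubs and container_of defined locally via offsetof():

```c
#include <stddef.h>

/* Local equivalent of the container_of() macro used in perf. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct perf_tool { int dummy; };	/* stub */

struct perf_session {			/* stub: only the tool pointer */
	struct perf_tool *tool;
};

struct perf_inject {			/* stub: embedded tool plus private state */
	int		 output_fd;
	struct perf_tool tool;
};

/* New-style callback body: no perf_tool argument, derive it from session. */
static struct perf_inject *inject_from_session(struct perf_session *session)
{
	struct perf_tool *tool = session->tool;

	return container_of(tool, struct perf_inject, tool);
}
```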

jirka

> 
> 
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 03fc65da0657..b4a29f435b06 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -172,9 +172,8 @@ static s64 perf_event__repipe_auxtrace(struct perf_session *session,
>  #else
>  
>  static s64
> -perf_event__repipe_auxtrace(struct perf_tool *tool __maybe_unused,
> -			    union perf_event *event __maybe_unused,
> -			    struct perf_session *session __maybe_unused)
> +perf_event__repipe_auxtrace(struct perf_session *session __maybe_unused,
> +			    union perf_event *event __maybe_unused)
>  {
>  	pr_err("AUX area tracing not supported\n");
>  	return -EINVAL;

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-23 19:30           ` Jiri Olsa
@ 2018-09-24  7:02             ` Alexey Budankov
  2018-09-24 13:09               ` Alexey Budankov
  0 siblings, 1 reply; 101+ messages in thread
From: Alexey Budankov @ 2018-09-24  7:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hi,

On 23.09.2018 22:30, Jiri Olsa wrote:
> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
> 
> SNIP
> 
>> Events:
>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>
>> =================================================
>>
>> Command:
>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>         -e cpu/period=P,event=0x3c/Duk,\
>>            cpu/period=P,umask=0x3/Duk,\
>>            cpu/period=P,event=0xc0/Duk,\
>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
> 
> hum, so I guess the results suck because of the -a option,
> getting extra samples for all the perf record threads
> 
> could you try without the -a? you monitor only user events,
> so you're interested only in ./matrix.* samples, right?

Ok, trying without -a, in per-process mode. 
VTune collects both user and kernel mode samples, using the /uk modifiers.
The modifier set can be extended to collect in the VM host and guests as well.

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-24  7:02             ` Alexey Budankov
@ 2018-09-24 13:09               ` Alexey Budankov
  2018-09-24 14:29                 ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Alexey Budankov @ 2018-09-24 13:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hi,

On 24.09.2018 10:02, Alexey Budankov wrote:
> Hi,
> 
> On 23.09.2018 22:30, Jiri Olsa wrote:
>> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
>>
>> SNIP
>>
>>> Events:
>>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>>
>>> =================================================
>>>
>>> Command:
>>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>         -e cpu/period=P,event=0x3c/Duk,\
>>>            cpu/period=P,umask=0x3/Duk,\
>>>            cpu/period=P,event=0xc0/Duk,\
>>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>
>> hum, so I guess the results suck because of the -a option,
>> getting extra samples for all the perf record threads
>>
>> could you try without the -a? you monitor only user events,
>> so you're interested only in ./matrix.* samples, right?
> 
> Ok, trying without -a, in per-process mode. 

Command:

/usr/bin/time ./perf.thr record --threads=T \
	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
	-e cpu/period=P,event=0x3c/Duk,\
	   cpu/period=P,umask=0x3/Duk,\
	   cpu/period=P,event=0xc0/Duk,\
	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
	--clockid=monotonic_raw -- ./matrix.gcc

Workload: matrix multiplication in 128 threads

T : 272
	P (period, ms)       : 0.35 
	runtime overhead (%) : 13x ~ 87.73 / 6.81
	data loss (%)        : 0
	LOST events          : 36
	SAMPLE events        : 8048542
        perf.data size (GiB) : 10

T : 128
	P (period, ms)       : 0.35 
	runtime overhead (%) : 10x ~ 71.12 / 6.81
	data loss (%)        : 0
	LOST events          : 2
	SAMPLE events        : 6524363
        perf.data size (GiB) : 8

T : 64
	P (period, ms)       : 0.35 
	runtime overhead (%) : 10x ~ 71.89 / 6.81
	data loss (%)        : 0
	LOST events          : 2
	SAMPLE events        : 7160623
        perf.data size (GiB) : 9

=================================================

Command:

/usr/bin/time ./perf.aio record --aio=N \
	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
	-e cpu/period=P,event=0x3c/Duk,\
	   cpu/period=P,umask=0x3/Duk,\
           cpu/period=P,event=0xc0/Duk,\
           cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
           cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
           cpu/period=0x11171,event=0xc2,umask=0x40/uk \
        --clockid=monotonic_raw ./matrix.gcc

Workload: matrix multiplication in 128 threads

N : 512
        P (period, ms)       : 1.5
 	runtime overhead (%) : 2.8x ~ 19.20 / 6.81
 	data loss (%)        : 0
 	LOST events          : 0
 	SAMPLE events        : 1094976
        perf.data size (GiB) : 1.3

N : 272
  	P (period, ms)       : 1.5
 	runtime overhead (%) : 3.3x ~ 22.34 / 6.81
 	data loss (%)        : 0
 	LOST events          : 0
 	SAMPLE events        : 1089252
        perf.data size (GiB) : 1.3
  
N : 128
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 2.6x ~ 15.15 / 6.81
 	data loss (%)        : 1
 	LOST events          : 1
 	SAMPLE events        : 1094102
        perf.data size (GiB) : 1.3
 
N : 64
 	P (period, ms)       : 1.5
 	runtime overhead (%) : 2.4x ~ 16.23 / 6.81
 	data loss (%)        : 2
 	LOST events          : 18
 	SAMPLE events        : 1105986
        perf.data size (GiB) : 1.3

Thanks,
Alexey

> VTune collects both user and kernel mode samples, using the /uk modifiers.
> The modifier set can be extended to collect in the VM host and guests as well.
> 
> Thanks,
> Alexey
> 
>>
>> thanks,
>> jirka
>>
> 

* Re: [PATCH 47/48] perf record: Spread maps for --threads option
  2018-09-23 19:44     ` Jiri Olsa
@ 2018-09-24 14:22       ` Arnaldo Carvalho de Melo
  2018-09-26  6:23         ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-09-24 14:22 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Jiri Olsa, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

Em Sun, Sep 23, 2018 at 09:44:32PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 17, 2018 at 08:40:48PM +0900, Namhyung Kim wrote:
> > On Thu, Sep 13, 2018 at 02:54:49PM +0200, Jiri Olsa wrote:
> > > Currently we assign all maps to the main thread. Adding
> > > code that spreads maps for the --threads option.
> > > 
> > > For the --threads option we create as many threads as there
> > > are memory maps in evlist, which is the number of CPUs
> > > in the system or the CPUs we monitor. Each thread gets a
> > > single data mmap to read.
> > > 
> > > In addition, we have the same number of tracking mmaps
> > > for auxiliary events, for which we don't create a special
> > > thread. Instead we assign them to the main thread, because
> > > not much traffic is expected there.
> > > 
> > > The assignment is visible from --thread-stats output:
> > > 
> > >           pid      write       poll       skip  maps (size 20K)
> > >     1s   9770       144B          1          0   19K   19K   19K   18K   19K
> > >          9772         0B          1          0   18K
> > >          9773         0B          1          0   19K
> > >          9774         0B          1          0   19K
> > > 
> > > There are 5 maps for thread 9770 (1 data map and 4 auxiliary)
> > > and one data map for every other thread. Each thread writes
> > > data to a separate data file.
> > 
> > Hmm.. not sure it'll work well for large machines with 1000+ cpus.
> > What about giving each thread a data mmap and a tracking mmap?
> 
> well, currently we store the tracking data in a single file;
> that's why we need just one thread to write it down

I agree with Namhyung, with a slight difference: perhaps we should set
perf_event_attr.mmap on one of the events of the per-cpu mmap, that way
we don't need that dummy event, right?
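In perf_event_attr terms, the suggestion is to put the side-band bits on one regular event per CPU instead of a separate dummy event, so its MMAP/COMM/TASK records land in that CPU's data ring buffer. A sketch that only fills the attributes (no perf_event_open() call); the chosen event, period and the setup_attrs() name are illustrative assumptions:

```c
#include <linux/perf_event.h>
#include <string.h>

/* Fill 'nr' (>= 1) sampling attrs for one CPU; only attrs[0] tracks. */
static void setup_attrs(struct perf_event_attr *attrs, int nr)
{
	for (int i = 0; i < nr; i++) {
		memset(&attrs[i], 0, sizeof(attrs[i]));
		attrs[i].size          = sizeof(attrs[i]);
		attrs[i].type          = PERF_TYPE_HARDWARE;
		attrs[i].config        = PERF_COUNT_HW_CPU_CYCLES;
		attrs[i].sample_period = 100000;
		attrs[i].sample_type   = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
					 PERF_SAMPLE_TIME;
		attrs[i].sample_id_all = 1;	/* side-band records carry time too */
	}

	/* Only the first event of the per-CPU set emits side-band records. */
	attrs[0].mmap = 1;
	attrs[0].comm = 1;
	attrs[0].task = 1;
}
```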
 
> with the *_time API, we should be able to properly read the
> tracking data separately for each cpu

That may end up making the *_time API unnecessary (assuming the kernel
keeps the per-cpu mmap events in order, barring that, using the
ordered_events in batches, prior to consuming the events) and would help
with things like 'perf top' and 'perf trace', that want to consume
events right away.
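A minimal sketch of the alternative described above: if each per-CPU mmap delivers its events already in time order (the stated assumption), a consumer can merge the per-CPU streams by timestamp instead of looking auxiliary data up by time. The struct cpu_stream type and merge_by_time() are hypothetical illustrations, not perf code; events are reduced to bare timestamps, and the ordered_events batching fallback is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-CPU mmap, reduced to the timestamps
 * of its events. The kernel is assumed to write them in time order. */
struct cpu_stream {
	const uint64_t *ts;	/* event timestamps, ascending */
	size_t nr;		/* number of events in this stream */
	size_t pos;		/* index of the next unread event */
};

/*
 * Merge nr_streams time-ordered per-CPU streams into out[], which must be
 * large enough for all remaining events. Returns the number of events
 * written. Each step scans every stream head for the smallest timestamp,
 * so this is O(total * nr_streams); a heap would scale better.
 */
static size_t merge_by_time(struct cpu_stream *streams, size_t nr_streams,
			    uint64_t *out)
{
	size_t n = 0;

	assert(out != NULL);
	for (;;) {
		struct cpu_stream *next = NULL;

		for (size_t i = 0; i < nr_streams; i++) {
			struct cpu_stream *s = &streams[i];

			if (s->pos == s->nr)
				continue;	/* this CPU is drained */
			if (!next || s->ts[s->pos] < next->ts[next->pos])
				next = s;
		}
		if (!next)
			return n;		/* every stream is drained */
		out[n++] = next->ts[next->pos++];
	}
}
```

Events can be consumed as soon as every non-drained CPU has a head event, which is what makes this attractive for 'perf top' and 'perf trace'.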

- Arnaldo
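The map assignment in the quoted patch description (data mmaps spread across threads, every tracking mmap kept on the main thread) can be sketched as follows. Round-robin spreading for --threads=X is an assumption for illustration; assign_maps() and maps_for_thread() are hypothetical helpers, not the patchset's code.

```c
#include <assert.h>

/* Hypothetical sketch: pick an owner thread index for every mmap.
 * There are nr_maps data mmaps (one per monitored CPU) and the same
 * number of tracking mmaps for auxiliary events. */
static void assign_maps(int nr_maps, int nr_threads,
			int *data_owner, int *track_owner)
{
	assert(nr_threads > 0);
	for (int i = 0; i < nr_maps; i++) {
		/* data maps are spread round-robin across threads */
		data_owner[i] = i % nr_threads;
		/* tracking maps all stay on the main thread (index 0) */
		track_owner[i] = 0;
	}
}

/* Count how many maps (data + tracking) a given thread reads. */
static int maps_for_thread(int nr_maps, const int *data_owner,
			   const int *track_owner, int thread)
{
	int n = 0;

	for (int i = 0; i < nr_maps; i++) {
		n += data_owner[i] == thread;
		n += track_owner[i] == thread;
	}
	return n;
}
```

With 4 CPUs and 4 threads the main thread reads 5 maps (1 data + 4 tracking) and every other thread reads 1, matching the quoted --thread-stats output. It also shows Namhyung's concern: on a 1000+ CPU machine the main thread would own 1000+ tracking maps.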

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-24 13:09               ` Alexey Budankov
@ 2018-09-24 14:29                 ` Jiri Olsa
  2018-09-24 18:32                   ` Alexey Budankov
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-24 14:29 UTC
  To: Alexey Budankov
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

On Mon, Sep 24, 2018 at 04:09:09PM +0300, Alexey Budankov wrote:
> Hi,
> 
> On 24.09.2018 10:02, Alexey Budankov wrote:
> > Hi,
> > 
> > On 23.09.2018 22:30, Jiri Olsa wrote:
> >> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
> >>
> >> SNIP
> >>
> >>> Events:
> >>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
> >>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
> >>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
> >>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
> >>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
> >>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
> >>>
> >>> =================================================
> >>>
> >>> Command:
> >>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
> >>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
> >>>         -e cpu/period=P,event=0x3c/Duk,\
> >>>            cpu/period=P,umask=0x3/Duk,\
> >>>            cpu/period=P,event=0xc0/Duk,\
> >>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
> >>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
> >>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
> >>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
> >>
> >> hum, so I guess the results suck because of the -a option,
> >> getting extra samples for all the perf record threads
> >>
> >> could you try without the -a? you monitor only user events,
> >> so you're interested only in ./matrix.* samples, right?
> > 
> > Ok, trying without -a, in per-process mode. 
> 
> Command:
> 
> /usr/bin/time ./perf.thr record --threads=T \
> 	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
> 	-e cpu/period=P,event=0x3c/Duk,\
> 	   cpu/period=P,umask=0x3/Duk,\
> 	   cpu/period=P,event=0xc0/Duk,\
> 	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
> 	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
> 	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
> 	--clockid=monotonic_raw -- ./matrix.gcc
> 
> Workload: matrix multiplication in 128 threads
> 
> T : 272
> 	P (period, ms)       : 0.35 
> 	runtime overhead (%) : 13x ~ 87.73 / 6.81

how do you measure this?

> 	data loss (%)        : 0
> 	LOST events          : 36
> 	SAMPLE events        : 8048542
>         perf.data size (GiB) : 10

any idea why it has so many more samples?

thanks,
jirka

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-24 14:29                 ` Jiri Olsa
@ 2018-09-24 18:32                   ` Alexey Budankov
  2018-09-24 19:12                     ` Alexey Budankov
  2018-10-05  6:14                     ` Namhyung Kim
  0 siblings, 2 replies; 101+ messages in thread
From: Alexey Budankov @ 2018-09-24 18:32 UTC
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hi,

On 24.09.2018 17:29, Jiri Olsa wrote:
> On Mon, Sep 24, 2018 at 04:09:09PM +0300, Alexey Budankov wrote:
>> Hi,
>>
>> On 24.09.2018 10:02, Alexey Budankov wrote:
>>> Hi,
>>>
>>> On 23.09.2018 22:30, Jiri Olsa wrote:
>>>> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
>>>>
>>>> SNIP
>>>>
>>>>> Events:
>>>>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>>>>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>>>>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>>>>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>>>>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>>>>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>>>>
>>>>> =================================================
>>>>>
>>>>> Command:
>>>>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>>>>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>>>         -e cpu/period=P,event=0x3c/Duk,\
>>>>>            cpu/period=P,umask=0x3/Duk,\
>>>>>            cpu/period=P,event=0xc0/Duk,\
>>>>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>>>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>>>
>>>> hum, so I guess the results suck because of the -a option,
>>>> getting extra samples for all the perf record threads
>>>>
>>>> could you try without the -a? you monitor only user events,
>>>> so you're interested only in ./matrix.* samples, right?
>>>
>>> Ok, trying without -a, in per-process mode. 
>>
>> Command:
>>
>> /usr/bin/time ./perf.thr record --threads=T \
>> 	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>> 	-e cpu/period=P,event=0x3c/Duk,\
>> 	   cpu/period=P,umask=0x3/Duk,\
>> 	   cpu/period=P,event=0xc0/Duk,\
>> 	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
>> 	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
>> 	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
>> 	--clockid=monotonic_raw -- ./matrix.gcc
>>
>> Workload: matrix multiplication in 128 threads
>>
>> T : 272
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 13x ~ 87.73 / 6.81
> 
> how do you measure this?

This is the ratio of elapsed times:
runtime overhead (%) : elapsed_time_under_profiling / elapsed_time
i.e.

/usr/bin/time ./matrix.gcc
...
767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
88inputs+0outputs (0major+139898minor)pagefaults 0swaps

so elapsed_time = 6.81 sec

elapsed_time_under_profiling is the elapsed value from the output of

/usr/bin/time ./perf.thr record --threads=T ...
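The metric can be reproduced mechanically from two default-format /usr/bin/time lines. This is a minimal sketch, assuming GNU time's default output, where the elapsed field is [h:]m:ss.ss immediately followed by "elapsed"; parse_elapsed() and runtime_overhead() are hypothetical helpers. Note that the result is a ratio (the "13x" above) even though the tables label it with (%).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse GNU time's elapsed field ("[h:]m:ss.ss") out of a default-format
 * line such as "767.03user 11.17system 0:06.81elapsed 11424%CPU ...".
 * Returns seconds, or -1.0 if no elapsed field is found. */
static double parse_elapsed(const char *line)
{
	const char *end = strstr(line, "elapsed");
	const char *tok;
	char buf[32];
	size_t len;
	int h, m;
	double s;

	if (!end)
		return -1.0;
	/* walk back to the start of the whitespace-delimited token */
	tok = end;
	while (tok > line && tok[-1] != ' ')
		tok--;
	/* copy just the number so "elapsed" is not scanned as an exponent */
	len = (size_t)(end - tok);
	if (len == 0 || len >= sizeof(buf))
		return -1.0;
	memcpy(buf, tok, len);
	buf[len] = '\0';
	if (sscanf(buf, "%d:%d:%lf", &h, &m, &s) == 3)
		return h * 3600.0 + m * 60.0 + s;
	if (sscanf(buf, "%d:%lf", &m, &s) == 2)
		return m * 60.0 + s;
	return -1.0;
}

/* Runtime overhead as defined in the thread: a ratio of elapsed times. */
static double runtime_overhead(double elapsed_profiled, double elapsed_plain)
{
	assert(elapsed_plain > 0.0);
	return elapsed_profiled / elapsed_plain;
}
```

For example, 87.73 s under profiling (printed by time as 1:27.73elapsed) against the 0:06.81 baseline gives 87.73 / 6.81 ≈ 12.9, reported above as 13x.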

> 
>> 	data loss (%)        : 0
>> 	LOST events          : 36
>> 	SAMPLE events        : 8048542
>>         perf.data size (GiB) : 10
> 
> any idea why it has so many more samples?

Presumably this is because the period is 350us, the smallest one at
which perf.thr captures data without data loss (=0) when T=272.
However, during collection I get a message that the max sampling
frequency is lowered to 3KHz.
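A back-of-the-envelope check, treating P as a time period the way the tables do: a period of P ms implies roughly 1000 / P samples per second per event. 0.35 ms gives about 2857 Hz, just under the 3000 Hz limit from the kernel message, while 0.25 ms (4000 Hz) and 0.1 ms (10000 Hz) would exceed it. The helper names are hypothetical, and the sketch only does the arithmetic; it does not model how the kernel actually throttles or lowers perf_event_max_sample_rate.

```c
#include <assert.h>
#include <stdbool.h>

/* Implied per-event sampling frequency (Hz) for a period given in ms. */
static double period_ms_to_hz(double period_ms)
{
	assert(period_ms > 0.0);
	return 1000.0 / period_ms;
}

/* True if that frequency is above max_sample_rate, e.g. the value read
 * from /proc/sys/kernel/perf_event_max_sample_rate (3000 here). */
static bool exceeds_max_rate(double period_ms, int max_sample_rate)
{
	return period_ms_to_hz(period_ms) > (double)max_sample_rate;
}
```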

Thanks,
Alexey

> 
> thanks,
> jirka
> 

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-24 18:32                   ` Alexey Budankov
@ 2018-09-24 19:12                     ` Alexey Budankov
  2018-10-05  6:14                     ` Namhyung Kim
  1 sibling, 0 replies; 101+ messages in thread
From: Alexey Budankov @ 2018-09-24 19:12 UTC
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hi,

On 24.09.2018 21:32, Alexey Budankov wrote:
> Hi,
> 
> On 24.09.2018 17:29, Jiri Olsa wrote:
>> On Mon, Sep 24, 2018 at 04:09:09PM +0300, Alexey Budankov wrote:
>>> Hi,
>>>
>>> On 24.09.2018 10:02, Alexey Budankov wrote:
>>>> Hi,
>>>>
>>>> On 23.09.2018 22:30, Jiri Olsa wrote:
>>>>> On Fri, Sep 21, 2018 at 09:13:08AM +0300, Alexey Budankov wrote:
>>>>>
>>>>> SNIP
>>>>>
>>>>>> Events:
>>>>>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>>>>>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>>>>>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>>>>>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>>>>>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>>>>>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>>>>>
>>>>>> =================================================
>>>>>>
>>>>>> Command:
>>>>>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>>>>>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>>>>>         -e cpu/period=P,event=0x3c/Duk,\
>>>>>>            cpu/period=P,umask=0x3/Duk,\
>>>>>>            cpu/period=P,event=0xc0/Duk,\
>>>>>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>>>>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>>>>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>>>>
>>>>> hum, so I guess the results suck because of the -a option,
>>>>> getting extra samples for all the perf record threads
>>>>>
>>>>> could you try without the -a? you monitor only user events,
>>>>> so you're interested only in ./matrix.* samples, right?
>>>>
>>>> Ok, trying without -a, in per-process mode. 
>>>
>>> Command:
>>>
>>> /usr/bin/time ./perf.thr record --threads=T \
>>> 	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>> 	-e cpu/period=P,event=0x3c/Duk,\
>>> 	   cpu/period=P,umask=0x3/Duk,\
>>> 	   cpu/period=P,event=0xc0/Duk,\
>>> 	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
>>> 	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
>>> 	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
>>> 	--clockid=monotonic_raw -- ./matrix.gcc
>>>
>>> Workload: matrix multiplication in 128 threads
>>>
>>> T : 272
>>> 	P (period, ms)       : 0.35 
>>> 	runtime overhead (%) : 13x ~ 87.73 / 6.81
>>
>> how do you measure this?
> 
> This is the ratio of elapsed times:
> runtime overhead (%) : elapsed_time_under_profiling / elapsed_time
> i.e.
> 
> /usr/bin/time ./matrix.gcc
> ...
> 767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
> 88inputs+0outputs (0major+139898minor)pagefaults 0swaps
> 
> so elapsed_time = 6.81 sec
> 
> elapsed_time_under_profiling is the elapsed value from the output of
> 
> /usr/bin/time ./perf.thr record --threads=T ...
> 
>>
>>> 	data loss (%)        : 0
>>> 	LOST events          : 36
>>> 	SAMPLE events        : 8048542
>>>         perf.data size (GiB) : 10
>>
>> any idea why it has so many more samples?
> 
> Presumably this is because the period is 350us, the smallest one at
> which perf.thr captures data without data loss (=0) when T=272.
> However, during collection I get a message that the max sampling
> frequency is lowered to 3KHz.

Lowering default frequency rate to 3000.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.

Thanks,
Alexey

> 
> Thanks,
> Alexey
> 
>>
>> thanks,
>> jirka
>>
> 

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-21 12:15           ` Alexey Budankov
@ 2018-09-24 19:23             ` Alexey Budankov
  2018-10-02 21:41               ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Alexey Budankov @ 2018-09-24 19:23 UTC
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hi,

On 21.09.2018 15:15, Alexey Budankov wrote:
> Hello Jiri,
> 
> On 21.09.2018 9:13, Alexey Budankov wrote:
>> Hello Jiri,
>>
>> On 14.09.2018 12:37, Alexey Budankov wrote:
>>> On 14.09.2018 11:28, Jiri Olsa wrote:
>>>> On Fri, Sep 14, 2018 at 10:26:53AM +0200, Jiri Olsa wrote:
>>>>
>>>> SNIP
>>>>
>>>>>>> The threaded monitoring currently can't monitor backward maps
>>>>>>> and there are probably more limitations which I haven't spotted
>>>>>>> yet.
>>>>>>>
>>>>>>> So far I tested on laptop:
>>>>>>>   http://people.redhat.com/~jolsa/record_threads/test-4CPU.txt
>>>>>>>
>>>>>>> and a one bigger server:
>>>>>>>   http://people.redhat.com/~jolsa/record_threads/test-208CPU.txt
>>>>>>>
>>>>>>> I can see a decrease in recorded LOST events, but both the benchmark
>>>>>>> and the monitoring must be carefully configured wrt:
>>>>>>>   - number of events (frequency)
>>>>>>>   - size of the memory maps
>>>>>>>   - size of events (callchains)
>>>>>>>   - final perf.data size
>>>>>>>
>>>>>>> It's also available in:
>>>>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>>>>>>>   perf/record_threads
>>>>>>>
>>>>>>> thoughts? ;-) thanks
>>>>>>> jirka
>>>>>>
>>>>>> It is preferable to split this into smaller pieces that bring
>>>>>> some improvement, proven by metric numbers, and are ready for
>>>>>> merging upstream. Do we have more metrics than the
>>>>>> data loss from the trace AIO patches?
>>>>>
>>>>> well the primary focus is to get more events in,
>>>>> so the LOST metric is the main one
>>>>
>>>> actually I was hoping, could you please run it through the same
>>>> tests as you do for AIO code on some huge server? 
>>>
>>> Yeah, I will, but it takes some time.
>>
>> Here it is:
>>
>> Hardware:
>> cat /proc/cpuinfo
>> processor	: 271
>> vendor_id	: GenuineIntel
>> cpu family	: 6
>> model		: 133
>> model name	: Intel(R) Xeon Phi(TM) CPU 7285 @ 1.30GHz
>> stepping	: 0
>> microcode	: 0xe
>> cpu MHz		: 1064.235
>> cache size	: 1024 KB
>> physical id	: 0
>> siblings	: 272
>> core id		: 73
>> cpu cores	: 68
>> apicid		: 295
>> initial apicid	: 295
>> fpu		: yes
>> fpu_exception	: yes
>> cpuid level	: 13
>> wp		: yes
>> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts avx512_vpopcntdq avx512_4vnniw avx512_4fmaps
>> bugs		: cpu_meltdown spectre_v1 spectre_v2
>> bogomips	: 2594.07
>> clflush size	: 64
>> cache_alignment	: 64
>> address sizes	: 46 bits physical, 48 bits virtual
>> power management:
>>
>> uname -a
>> Linux nntpat98-196 4.18.0-rc7+ #2 SMP Thu Sep 6 13:24:37 MSK 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>> cat /proc/sys/kernel/perf_event_paranoid
>> 0
>>
>> cat /proc/sys/kernel/perf_event_mlock_kb 
>> 516
>>
>> cat /proc/sys/kernel/perf_event_max_sample_rate 
>> 3000
>>
>> cat /etc/redhat-release 
>> Red Hat Enterprise Linux Server release 7.5 (Maipo)
>>
>> Metrics:
>> runtime overhead (%) : elapsed_time_under_profiling / elapsed_time
>> data loss (%)        : paused_time / elapsed_time_under_profiling
>> LOST events          : stat from perf report --stats
>> SAMPLE events        : stat from perf report --stats
>> perf.data size (B)   : size of trace file on disk
>>
>> Events:
>> cpu/period=P,event=0x3c/Duk;CPU_CLK_UNHALTED.THREAD
>> cpu/period=P,umask=0x3/Duk;CPU_CLK_UNHALTED.REF_TSC
>> cpu/period=P,event=0xc0/Duk;INST_RETIRED.ANY
>> cpu/period=0xaae61,event=0xc2,umask=0x10/uk;UOPS_RETIRED.ALL
>> cpu/period=0x11171,event=0xc2,umask=0x20/uk;UOPS_RETIRED.SCALAR_SIMD
>> cpu/period=0x11171,event=0xc2,umask=0x40/uk;UOPS_RETIRED.PACKED_SIMD
>>
>> =================================================
>>
>> Command:
>> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.thr record --threads=T \
>> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>>         -e cpu/period=P,event=0x3c/Duk,\
>>            cpu/period=P,umask=0x3/Duk,\
>>            cpu/period=P,event=0xc0/Duk,\
>>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
>>
>> Workload: matrix multiplication in 256 threads
>>
>> /usr/bin/time ./matrix.icc
>> Addr of buf1 = 0x7ff9faa73010
>> Offs of buf1 = 0x7ff9faa73180
>> Addr of buf2 = 0x7ff9f8a72010
>> Offs of buf2 = 0x7ff9f8a721c0
>> Addr of buf3 = 0x7ff9f6a71010
>> Offs of buf3 = 0x7ff9f6a71100
>> Addr of buf4 = 0x7ff9f4a70010
>> Offs of buf4 = 0x7ff9f4a70140
>> Threads #: 256 Pthreads
>> Matrix size: 2048
>> Using multiply kernel: multiply1
>> Freq = 0.997720 GHz
>> Execution time = 9.061 seconds
>> 1639.55user 6.59system 0:07.12elapsed 23094%CPU (0avgtext+0avgdata 100448maxresident)k
>> 96inputs+0outputs (1major+33839minor)pagefaults 0swaps
>>
>> T : 272
>>         P (period, ms)       : 0.1
>> 	runtime overhead (%) : 45x ~ 323.54 / 7.12
>> 	data loss (%)        : 96
>> 	LOST events          : 323662
>> 	SAMPLE events        : 31885479
>>         perf.data size (GiB) : 42
>>
>> 	P (period, ms)       : 0.25
>> 	runtime overhead (%) : 25x ~ 180.76 / 7.12
>> 	data loss (%)        : 69 
>> 	LOST events          : 10636
>> 	SAMPLE events        : 18692998
>>         perf.data size (GiB) : 23.5
>>
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 16x ~ 119.49 / 7.12
>> 	data loss (%)        : 1
>> 	LOST events          : 6
>> 	SAMPLE events        : 11178524
>>         perf.data size (GiB) : 14
>>
>> T : 128
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 15x ~ 111.98 / 7.12
>> 	data loss (%)        : 62
>> 	LOST events          : 2825
>> 	SAMPLE events        : 11267247
>>         perf.data size (GiB) : 15
>>
>> T : 64
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 14x ~ 101.55 / 7.12
>> 	data loss (%)        : 67
>> 	LOST events          : 5155
>> 	SAMPLE events        : 10966297
>>         perf.data size (GiB) : 13.7
>>
>> Workload: matrix multiplication in 128 threads
>>
>> /usr/bin/time ./matrix.gcc
>> Addr of buf1 = 0x7f072e630010
>> Offs of buf1 = 0x7f072e630180
>> Addr of buf2 = 0x7f072c62f010
>> Offs of buf2 = 0x7f072c62f1c0
>> Addr of buf3 = 0x7f072a62e010
>> Offs of buf3 = 0x7f072a62e100
>> Addr of buf4 = 0x7f072862d010
>> Offs of buf4 = 0x7f072862d140
>> Threads #: 128 Pthreads
>> Matrix size: 2048
>> Using multiply kernel: multiply1
>> Execution time = 6.639 seconds
>> 767.03user 11.17system 0:06.81elapsed 11424%CPU (0avgtext+0avgdata 100756maxresident)k
>> 88inputs+0outputs (0major+139898minor)pagefaults 0swaps
>>
>> T : 272
>>         P (period, ms)       : 0.1
>> 	runtime overhead (%) : 29x ~ 198.81 / 6.81
>> 	data loss (%)        : 21
>> 	LOST events          : 2502
>> 	SAMPLE events        : 22481062
>>         perf.data size (GiB) : 27.6
>>
>> 	P (period, ms)       : 0.25
>> 	runtime overhead (%) : 13x ~ 88.47 / 6.81
>> 	data loss (%)        : 0
>> 	LOST events          : 0
>> 	SAMPLE events        : 9572787
>>         perf.data size (GiB) : 11.3
>>
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 10x ~ 67.11 / 6.81
>> 	data loss (%)        : 1
>> 	LOST events          : 137
>> 	SAMPLE events        : 6985930
>>         perf.data size (GiB) : 8
>>
>> T : 128
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 9.5x ~ 64.33 / 6.81
>> 	data loss (%)        : 1
>> 	LOST events          : 3
>> 	SAMPLE events        : 6666903
>>         perf.data size (GiB) : 7.8
>>
>> T : 64
>> 	P (period, ms)       : 0.25
>> 	runtime overhead (%) : 17x ~ 114.27 / 6.81
>> 	data loss (%)        : 2
>> 	LOST events          : 52
>> 	SAMPLE events        : 12643645
>>         perf.data size (GiB) : 15.5
>>
>> 	P (period, ms)       : 0.35 
>> 	runtime overhead (%) : 10x ~ 68.60 / 6.81
>> 	data loss (%)        : 1
>> 	LOST events          : 93
>> 	SAMPLE events        : 7164368
>>         perf.data size (GiB) : 8.5
> 
> and this is for AIO and serial:
> 
> Command:
> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf.aio record --aio=N \
> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>         -e cpu/period=P,event=0x3c/Duk,\
>            cpu/period=P,umask=0x3/Duk,\
>            cpu/period=P,event=0xc0/Duk,\
>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
> 
> Workload: matrix multiplication in 256 threads
> 
>  N : 512
>         P (period, ms)       : 2.5
>  	runtime overhead (%) : 2.7x ~ 19.21 / 7.12
>  	data loss (%)        : 42
>  	LOST events          : 1600
>  	SAMPLE events        : 1235928
>         perf.data size (GiB) : 1.5
>  
>  N : 272
>  	P (period, ms)       : 1.5
>  	runtime overhead (%) : 2.5x ~ 18.09 / 7.12
>  	data loss (%)        : 89
>  	LOST events          : 3457
>  	SAMPLE events        : 1222143
>         perf.data size (GiB) : 1.5
> 
>  	P (period, ms)       : 2
>  	runtime overhead (%) : 2.5x ~ 17.93 / 7.12
>  	data loss (%)        : 65
>  	LOST events          : 2496
>  	SAMPLE events        : 1240754
>         perf.data size (GiB) : 1.5
> 
>  	P (period, ms)       : 2.5
>  	runtime overhead (%) : 2.5x ~ 17.87 / 7.12
>  	data loss (%)        : 44
>  	LOST events          : 1621
>  	SAMPLE events        : 1221949
>         perf.data size (GiB) : 1.5
> 
>  	P (period, ms)       : 3
>  	runtime overhead (%) : 2.5x ~ 18.43 / 7.12
>  	data loss (%)        : 12
>  	LOST events          : 350
>  	SAMPLE events        : 1117972
>         perf.data size (GiB) : 1.3
>  
>  N : 128
>  	P (period, ms)       : 3
>  	runtime overhead (%) : 2.4x ~ 17.08 / 7.12
>  	data loss (%)        : 11
>  	LOST events          : 335
>  	SAMPLE events        : 1116832
>         perf.data size (GiB) : 1.3
>  
>  N : 64
>  	P (period, ms)       : 3
>  	runtime overhead (%) : 2.2x ~ 16.03 / 7.12
>  	data loss (%)        : 11
>  	LOST events          : 329
>  	SAMPLE events        : 1108205
>         perf.data size (GiB) : 1.3
>  
> Workload: matrix multiplication in 128 threads
> 
>  N : 512
>         P (period, ms)       : 1
>  	runtime overhead (%) : 3.5x ~ 23.72 / 6.81
>  	data loss (%)        : 18
>  	LOST events          : 1043
>  	SAMPLE events        : 2015306
>         perf.data size (GiB) : 2.3
> 
>  N : 272
>         P (period, ms)       : 0.5
>  	runtime overhead (%) : 3x ~ 22.72 / 6.81
>  	data loss (%)        : 90
>  	LOST events          : 5842
>  	SAMPLE events        : 2205937
>         perf.data size (GiB) : 2.5
> 
>         P (period, ms)       : 1
>  	runtime overhead (%) : 3x ~ 22.79 / 6.81
>  	data loss (%)        : 11
>  	LOST events          : 481
>  	SAMPLE events        : 2017099
>         perf.data size (GiB) : 2.5
>  
>  	P (period, ms)       : 1.5
>  	runtime overhead (%) : 3x ~ 19.93 / 6.81
>  	data loss (%)        : 5
>  	LOST events          : 190
>  	SAMPLE events        : 1308692
>         perf.data size (GiB) : 1.5
>  
>  	P (period, ms)       : 2
>  	runtime overhead (%) : 3x ~ 18.95 / 6.81
>  	data loss (%)        : 0
>  	LOST events          : 0
>  	SAMPLE events        : 1010769
>         perf.data size (GiB) : 1.2
>  
>  N : 128
>  	P (period, ms)       : 1.5
>  	runtime overhead (%) : 3x ~ 19.08 / 6.81
>  	data loss (%)        : 6
>  	LOST events          : 220
>  	SAMPLE events        : 1322240
>         perf.data size (GiB) : 1.5
>  
>  N : 64
>  	P (period, ms)       : 1.5
>  	runtime overhead (%) : 3x ~ 19.43 / 6.81
>  	data loss (%)        : 3
>  	LOST events          : 130
>  	SAMPLE events        : 1386521
>         perf.data size (GiB) : 1.6
> 
> =================================================
> 
> Command:
> /usr/bin/time /tmp/vtune_amplifier_2019.574715/bin64/perf record \
> 	-a -N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
>         -e cpu/period=P,event=0x3c/Duk,\
>            cpu/period=P,umask=0x3/Duk,\
>            cpu/period=P,event=0xc0/Duk,\
>            cpu/period=0x30d40,event=0xc2,umask=0x10/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x20/uk,\
>            cpu/period=0x4e20,event=0xc2,umask=0x40/uk \
>          --clockid=monotonic_raw -- ./matrix.(icc|gcc)
> 
> Workload: matrix multiplication in 256 threads
> 
> 	P (period, ms)       : 7.5
>  	runtime overhead (%) : 1.6x ~ 11.6 / 7.12
>  	data loss (%)        : 1
>  	LOST events          : 1
>  	SAMPLE events        : 451062
>         perf.data size (GiB) : 0.5
> 
> Workload: matrix multiplication in 128 threads
> 
> 	P (period, ms)       : 3
>  	runtime overhead (%) : 1.8x ~ 12.58 / 6.81
>  	data loss (%)        : 9
>  	LOST events          : 147
>  	SAMPLE events        : 673299
>         perf.data size (GiB) : 0.8

Please see below more comparable data, with the P (period, ms),
runtime overhead and data loss metrics considered together.

It starts from the serial implementation as the baseline and
then demonstrates the possible improvement from applying the
configurable --aio(=N) and --threads(=T) implementations.

Smaller P values, with data loss and runtime overhead values
equal to or in the small vicinity of the serial implementation's,
might indicate a possible gain.

Workload: matrix multiplication in 128 threads

Serial:
 	P (period, ms)       : 3
  	runtime overhead (%) : 1.8x ~ 12.58 / 6.81
  	data loss (%)        : 9
  	LOST events          : 147
  	SAMPLE events        : 673299
        perf.data size (GiB) : 0.8

AIO:
    N : 1
 	P (period, ms)       : 3
  	runtime overhead (%) : 1.8x ~ 12.42 / 6.81
  	data loss (%)        : 2
  	LOST events          : 19
  	SAMPLE events        : 664749
        perf.data size (GiB) : 0.75

    N : 4
 	P (period, ms)       : 1.8
  	runtime overhead (%) : 1.8x ~ 12.74 / 6.81
  	data loss (%)        : 10
  	LOST events          : 257
  	SAMPLE events        : 1079250
        perf.data size (GiB) : 1.25

Threads:
    T : 1
 	P (period, ms)       : 3
  	runtime overhead (%) : 2.6x ~ 17.73 / 6.81
  	data loss (%)        : 6
  	LOST events          : 95
  	SAMPLE events        : 665844
        perf.data size (GiB) : 0.78

    T : 2
 	P (period, ms)       : 3
  	runtime overhead (%) : 2.6x ~ 18.04 / 6.81
  	data loss (%)        : 0
  	LOST events          : 0
  	SAMPLE events        : 662075
        perf.data size (GiB) : 0.8

 	P (period, ms)       : 1.8
  	runtime overhead (%) : 3x ~ 20.83 / 6.81
  	data loss (%)        : 4
  	LOST events          : 76
  	SAMPLE events        : 1085826
        perf.data size (GiB) : 1.25

    T : 4
 	P (period, ms)       : 3
  	runtime overhead (%) : 2.6x ~ 17.85 / 6.81 
  	data loss (%)        : 0
  	LOST events          : 0
  	SAMPLE events        : 665262
        perf.data size (GiB) : 0.78

 	P (period, ms)       : 1.8
  	runtime overhead (%) : 3x ~ 21.15 / 6.81 
  	data loss (%)        : 0
  	LOST events          : 0
  	SAMPLE events        : 1126563
        perf.data size (GiB) : 1.3

 	P (period, ms)       : 1
  	runtime overhead (%) : 4.35x ~ 29.6 / 6.81
  	data loss (%)        : 0
  	LOST events          : 6
  	SAMPLE events        : 2124837
        perf.data size (GiB) : 2.5

 	P (period, ms)       : 0.8
  	runtime overhead (%) : 4.8x ~ 32.62 / 6.81
  	data loss (%)        : 12
  	LOST events          : 536
  	SAMPLE events        : 2620345
        perf.data size (GiB) : 3
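The selection rule stated at the top of this mail (the smallest P whose data loss and runtime overhead stay close to the serial baseline) can be mechanized as below, using the 128-thread numbers from this mail. The tolerances are hypothetical knobs, since the mail only says "small vicinity"; struct run and best_run() are illustrative names, not part of any tool.

```c
#include <assert.h>
#include <stddef.h>

struct run {
	const char *mode;	/* e.g. "serial", "aio N=4", "threads T=4" */
	double period_ms;	/* P */
	double overhead;	/* elapsed_time_under_profiling / elapsed_time */
	double loss_pct;	/* data loss (%) */
};

/* Index of the run with the smallest period whose overhead is within a
 * relative tolerance of the baseline and whose data loss is within an
 * absolute tolerance (percentage points), or -1 if none qualifies.
 * Ties keep the earliest run. */
static int best_run(const struct run *runs, size_t nr, const struct run *base,
		    double overhead_tol, double loss_tol)
{
	int best = -1;

	assert(runs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		const struct run *r = &runs[i];

		if (r->overhead > base->overhead * (1.0 + overhead_tol))
			continue;	/* too slow compared to serial */
		if (r->loss_pct > base->loss_pct + loss_tol)
			continue;	/* loses too much data */
		if (best < 0 || r->period_ms < runs[best].period_ms)
			best = (int)i;
	}
	return best;
}
```

With 5% overhead slack and 1 point of data-loss slack the pick is AIO N=4 at P=1.8; loosening the overhead slack to 150% selects threads T=4 at P=1.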

Thanks,
Alexey

> 
> Thanks,
> Alexey
> 
>>
>> Thanks,
>> Alexey
>>
>>>
>>>>
>>>> thanks,
>>>> jirka
>>>>
>>>
>>
> 

* [tip:perf/core] perf tools: Remove perf_tool from event_op2
  2018-09-13 12:54 ` [PATCH 01/48] perf tools: Remove perf_tool from event_op2 Jiri Olsa
@ 2018-09-25  9:31   ` " tip-bot for Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: tip-bot for Jiri Olsa @ 2018-09-25  9:31 UTC
  To: linux-tip-commits
  Cc: alexey.budankov, acme, alexander.shishkin, andi, hpa, jolsa,
	linux-kernel, mingo, tglx, peterz, namhyung

Commit-ID:  89f1688a57a8f0b685fccd648e601a1f830fa744
Gitweb:     https://git.kernel.org/tip/89f1688a57a8f0b685fccd648e601a1f830fa744
Author:     Jiri Olsa <jolsa@kernel.org>
AuthorDate: Thu, 13 Sep 2018 14:54:03 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Sep 2018 10:25:10 -0300

perf tools: Remove perf_tool from event_op2

Now that we keep a perf_tool pointer inside perf_session, there's no
need to have a perf_tool argument in the event_op2 callback. Remove it.
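A reduced illustration of the new callback shape, assuming simplified stand-in types rather than the real perf headers. With the tool reachable as session->tool, a command that embeds its struct perf_tool (as builtin-report.c's struct report does) recovers its own state with container_of(), the same pattern the builtin-report.c hunk uses. The nr_features field is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, not the real perf definitions. */
union perf_event { int type; };
struct perf_tool { int dummy; };
struct perf_session { struct perf_tool *tool; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Before: int (*)(struct perf_tool *, union perf_event *,
 *                 struct perf_session *)
 * After:  the tool is no longer passed, it comes from session->tool. */
typedef int (*event_op2)(struct perf_session *session,
			 union perf_event *event);

/* A command embeds its perf_tool inside its own state. */
struct report {
	int nr_features;	/* hypothetical per-command state */
	struct perf_tool tool;
};

static int process_feature_event(struct perf_session *session,
				 union perf_event *event)
{
	struct report *rep;

	assert(session->tool != NULL);
	rep = container_of(session->tool, struct report, tool);
	(void)event;		/* unused in this sketch */
	rep->nr_features++;
	return 0;
}
```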

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180913125450.21342-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-annotate.c |  7 ++---
 tools/perf/builtin-inject.c   | 26 +++++++----------
 tools/perf/builtin-report.c   |  9 +++---
 tools/perf/builtin-script.c   | 38 ++++++++++++------------
 tools/perf/builtin-stat.c     | 23 +++++++--------
 tools/perf/util/auxtrace.c    | 10 +++----
 tools/perf/util/auxtrace.h    | 10 +++----
 tools/perf/util/header.c      | 16 +++++------
 tools/perf/util/header.h      | 15 ++++------
 tools/perf/util/session.c     | 67 ++++++++++++++++++-------------------------
 tools/perf/util/session.h     |  5 ++--
 tools/perf/util/stat.c        |  5 ++--
 tools/perf/util/stat.h        |  5 ++--
 tools/perf/util/tool.h        |  3 +-
 14 files changed, 103 insertions(+), 136 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 830481b8db26..93d679eaf1f4 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -283,12 +283,11 @@ out_put:
 	return ret;
 }
 
-static int process_feature_event(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session)
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
 {
 	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(tool, event, session);
+		return perf_event__process_feature(session, event);
 	return 0;
 }
 
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index a3b346359ba0..d77ed2aea95a 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -86,12 +86,10 @@ static int perf_event__drop_oe(struct perf_tool *tool __maybe_unused,
 }
 #endif
 
-static int perf_event__repipe_op2_synth(struct perf_tool *tool,
-					union perf_event *event,
-					struct perf_session *session
-					__maybe_unused)
+static int perf_event__repipe_op2_synth(struct perf_session *session,
+					union perf_event *event)
 {
-	return perf_event__repipe_synth(tool, event);
+	return perf_event__repipe_synth(session->tool, event);
 }
 
 static int perf_event__repipe_attr(struct perf_tool *tool,
@@ -362,26 +360,24 @@ static int perf_event__repipe_exit(struct perf_tool *tool,
 	return err;
 }
 
-static int perf_event__repipe_tracing_data(struct perf_tool *tool,
-					   union perf_event *event,
-					   struct perf_session *session)
+static int perf_event__repipe_tracing_data(struct perf_session *session,
+					   union perf_event *event)
 {
 	int err;
 
-	perf_event__repipe_synth(tool, event);
-	err = perf_event__process_tracing_data(tool, event, session);
+	perf_event__repipe_synth(session->tool, event);
+	err = perf_event__process_tracing_data(session, event);
 
 	return err;
 }
 
-static int perf_event__repipe_id_index(struct perf_tool *tool,
-				       union perf_event *event,
-				       struct perf_session *session)
+static int perf_event__repipe_id_index(struct perf_session *session,
+				       union perf_event *event)
 {
 	int err;
 
-	perf_event__repipe_synth(tool, event);
-	err = perf_event__process_id_index(tool, event, session);
+	perf_event__repipe_synth(session->tool, event);
+	err = perf_event__process_id_index(session, event);
 
 	return err;
 }
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 76e12bcd1765..7507e4d6dce1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -201,14 +201,13 @@ static void setup_forced_leader(struct report *report,
 		perf_evlist__force_leader(evlist);
 }
 
-static int process_feature_event(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session __maybe_unused)
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
 {
-	struct report *rep = container_of(tool, struct report, tool);
+	struct report *rep = container_of(session->tool, struct report, tool);
 
 	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(tool, event, session);
+		return perf_event__process_feature(session, event);
 
 	if (event->feat.feat_id != HEADER_LAST_FEATURE) {
 		pr_err("failed: wrong feature ID: %" PRIu64 "\n",
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 6176bae177c2..765391b6c88c 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2965,9 +2965,8 @@ static void script__setup_sample_type(struct perf_script *script)
 	}
 }
 
-static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
-				    union perf_event *event,
-				    struct perf_session *session)
+static int process_stat_round_event(struct perf_session *session,
+				    union perf_event *event)
 {
 	struct stat_round_event *round = &event->stat_round;
 	struct perf_evsel *counter;
@@ -2981,9 +2980,8 @@ static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int process_stat_config_event(struct perf_tool *tool __maybe_unused,
-				     union perf_event *event,
-				     struct perf_session *session __maybe_unused)
+static int process_stat_config_event(struct perf_session *session __maybe_unused,
+				     union perf_event *event)
 {
 	perf_event__read_stat_config(&stat_config, &event->stat_config);
 	return 0;
@@ -3009,10 +3007,10 @@ static int set_maps(struct perf_script *script)
 }
 
 static
-int process_thread_map_event(struct perf_tool *tool,
-			     union perf_event *event,
-			     struct perf_session *session __maybe_unused)
+int process_thread_map_event(struct perf_session *session,
+			     union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 
 	if (script->threads) {
@@ -3028,10 +3026,10 @@ int process_thread_map_event(struct perf_tool *tool,
 }
 
 static
-int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
-			  union perf_event *event,
-			  struct perf_session *session __maybe_unused)
+int process_cpu_map_event(struct perf_session *session,
+			  union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 
 	if (script->cpus) {
@@ -3046,21 +3044,21 @@ int process_cpu_map_event(struct perf_tool *tool __maybe_unused,
 	return set_maps(script);
 }
 
-static int process_feature_event(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session)
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
 {
 	if (event->feat.feat_id < HEADER_LAST_FEATURE)
-		return perf_event__process_feature(tool, event, session);
+		return perf_event__process_feature(session, event);
 	return 0;
 }
 
 #ifdef HAVE_AUXTRACE_SUPPORT
-static int perf_script__process_auxtrace_info(struct perf_tool *tool,
-					      union perf_event *event,
-					      struct perf_session *session)
+static int perf_script__process_auxtrace_info(struct perf_session *session,
+					      union perf_event *event)
 {
-	int ret = perf_event__process_auxtrace_info(tool, event, session);
+	struct perf_tool *tool = session->tool;
+
+	int ret = perf_event__process_auxtrace_info(session, event);
 
 	if (ret == 0) {
 		struct perf_script *script = container_of(tool, struct perf_script, tool);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 0b0e3961d511..b86aba1c8028 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1354,9 +1354,8 @@ static int __cmd_record(int argc, const char **argv)
 	return argc;
 }
 
-static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
-				    union perf_event *event,
-				    struct perf_session *session)
+static int process_stat_round_event(struct perf_session *session,
+				    union perf_event *event)
 {
 	struct stat_round_event *stat_round = &event->stat_round;
 	struct perf_evsel *counter;
@@ -1381,10 +1380,10 @@ static int process_stat_round_event(struct perf_tool *tool __maybe_unused,
 }
 
 static
-int process_stat_config_event(struct perf_tool *tool,
-			      union perf_event *event,
-			      struct perf_session *session __maybe_unused)
+int process_stat_config_event(struct perf_session *session,
+			      union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_stat *st = container_of(tool, struct perf_stat, tool);
 
 	perf_event__read_stat_config(&stat_config, &event->stat_config);
@@ -1424,10 +1423,10 @@ static int set_maps(struct perf_stat *st)
 }
 
 static
-int process_thread_map_event(struct perf_tool *tool,
-			     union perf_event *event,
-			     struct perf_session *session __maybe_unused)
+int process_thread_map_event(struct perf_session *session,
+			     union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_stat *st = container_of(tool, struct perf_stat, tool);
 
 	if (st->threads) {
@@ -1443,10 +1442,10 @@ int process_thread_map_event(struct perf_tool *tool,
 }
 
 static
-int process_cpu_map_event(struct perf_tool *tool,
-			  union perf_event *event,
-			  struct perf_session *session __maybe_unused)
+int process_cpu_map_event(struct perf_session *session,
+			  union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_stat *st = container_of(tool, struct perf_stat, tool);
 	struct cpu_map *cpus;
 
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index db1511359c5e..86f0bc445f93 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -906,9 +906,8 @@ out_free:
 	return err;
 }
 
-int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
-				      union perf_event *event,
-				      struct perf_session *session)
+int perf_event__process_auxtrace_info(struct perf_session *session,
+				      union perf_event *event)
 {
 	enum auxtrace_type type = event->auxtrace_info.type;
 
@@ -1185,9 +1184,8 @@ void events_stats__auxtrace_error_warn(const struct events_stats *stats)
 	}
 }
 
-int perf_event__process_auxtrace_error(struct perf_tool *tool __maybe_unused,
-				       union perf_event *event,
-				       struct perf_session *session)
+int perf_event__process_auxtrace_error(struct perf_session *session,
+				       union perf_event *event)
 {
 	if (auxtrace__dont_decode(session))
 		return 0;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 71fc3bd74299..97776470a52e 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -517,15 +517,13 @@ int perf_event__synthesize_auxtrace_info(struct auxtrace_record *itr,
 					 struct perf_tool *tool,
 					 struct perf_session *session,
 					 perf_event__handler_t process);
-int perf_event__process_auxtrace_info(struct perf_tool *tool,
-				      union perf_event *event,
-				      struct perf_session *session);
+int perf_event__process_auxtrace_info(struct perf_session *session,
+				      union perf_event *event);
 s64 perf_event__process_auxtrace(struct perf_tool *tool,
 				 union perf_event *event,
 				 struct perf_session *session);
-int perf_event__process_auxtrace_error(struct perf_tool *tool,
-				       union perf_event *event,
-				       struct perf_session *session);
+int perf_event__process_auxtrace_error(struct perf_session *session,
+				       union perf_event *event);
 int itrace_parse_synth_opts(const struct option *opt, const char *str,
 			    int unset);
 void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 91e6d9cfd906..c78051ad1fcc 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -3448,10 +3448,10 @@ int perf_event__synthesize_features(struct perf_tool *tool,
 	return ret;
 }
 
-int perf_event__process_feature(struct perf_tool *tool,
-				union perf_event *event,
-				struct perf_session *session __maybe_unused)
+int perf_event__process_feature(struct perf_session *session,
+				union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct feat_fd ff = { .fd = 0 };
 	struct feature_event *fe = (struct feature_event *)event;
 	int type = fe->header.type;
@@ -3856,9 +3856,8 @@ int perf_event__synthesize_tracing_data(struct perf_tool *tool, int fd,
 	return aligned_size;
 }
 
-int perf_event__process_tracing_data(struct perf_tool *tool __maybe_unused,
-				     union perf_event *event,
-				     struct perf_session *session)
+int perf_event__process_tracing_data(struct perf_session *session,
+				     union perf_event *event)
 {
 	ssize_t size_read, padding, size = event->tracing_data.size;
 	int fd = perf_data__fd(session->data);
@@ -3924,9 +3923,8 @@ int perf_event__synthesize_build_id(struct perf_tool *tool,
 	return err;
 }
 
-int perf_event__process_build_id(struct perf_tool *tool __maybe_unused,
-				 union perf_event *event,
-				 struct perf_session *session)
+int perf_event__process_build_id(struct perf_session *session,
+				 union perf_event *event)
 {
 	__event_process_build_id(&event->build_id,
 				 event->build_id.filename,
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index ff2a1263fb9b..e17903caa71d 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -116,9 +116,8 @@ int perf_event__synthesize_extra_attr(struct perf_tool *tool,
 				      perf_event__handler_t process,
 				      bool is_pipe);
 
-int perf_event__process_feature(struct perf_tool *tool,
-				union perf_event *event,
-				struct perf_session *session);
+int perf_event__process_feature(struct perf_session *session,
+				union perf_event *event);
 
 int perf_event__synthesize_attr(struct perf_tool *tool,
 				struct perf_event_attr *attr, u32 ids, u64 *id,
@@ -148,17 +147,15 @@ size_t perf_event__fprintf_event_update(union perf_event *event, FILE *fp);
 int perf_event__synthesize_tracing_data(struct perf_tool *tool,
 					int fd, struct perf_evlist *evlist,
 					perf_event__handler_t process);
-int perf_event__process_tracing_data(struct perf_tool *tool,
-				     union perf_event *event,
-				     struct perf_session *session);
+int perf_event__process_tracing_data(struct perf_session *session,
+				     union perf_event *event);
 
 int perf_event__synthesize_build_id(struct perf_tool *tool,
 				    struct dso *pos, u16 misc,
 				    perf_event__handler_t process,
 				    struct machine *machine);
-int perf_event__process_build_id(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session);
+int perf_event__process_build_id(struct perf_session *session,
+				 union perf_event *event);
 bool is_perf_magic(u64 magic);
 
 #define NAME_ALIGN 64
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 8b9369303561..e781cdba845c 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -199,12 +199,10 @@ void perf_session__delete(struct perf_session *session)
 	free(session);
 }
 
-static int process_event_synth_tracing_data_stub(struct perf_tool *tool
+static int process_event_synth_tracing_data_stub(struct perf_session *session
 						 __maybe_unused,
 						 union perf_event *event
-						 __maybe_unused,
-						 struct perf_session *session
-						__maybe_unused)
+						 __maybe_unused)
 {
 	dump_printf(": unhandled!\n");
 	return 0;
@@ -288,9 +286,8 @@ static s64 process_event_auxtrace_stub(struct perf_tool *tool __maybe_unused,
 	return event->auxtrace.size;
 }
 
-static int process_event_op2_stub(struct perf_tool *tool __maybe_unused,
-				  union perf_event *event __maybe_unused,
-				  struct perf_session *session __maybe_unused)
+static int process_event_op2_stub(struct perf_session *session __maybe_unused,
+				  union perf_event *event __maybe_unused)
 {
 	dump_printf(": unhandled!\n");
 	return 0;
@@ -298,9 +295,8 @@ static int process_event_op2_stub(struct perf_tool *tool __maybe_unused,
 
 
 static
-int process_event_thread_map_stub(struct perf_tool *tool __maybe_unused,
-				  union perf_event *event __maybe_unused,
-				  struct perf_session *session __maybe_unused)
+int process_event_thread_map_stub(struct perf_session *session __maybe_unused,
+				  union perf_event *event __maybe_unused)
 {
 	if (dump_trace)
 		perf_event__fprintf_thread_map(event, stdout);
@@ -310,9 +306,8 @@ int process_event_thread_map_stub(struct perf_tool *tool __maybe_unused,
 }
 
 static
-int process_event_cpu_map_stub(struct perf_tool *tool __maybe_unused,
-			       union perf_event *event __maybe_unused,
-			       struct perf_session *session __maybe_unused)
+int process_event_cpu_map_stub(struct perf_session *session __maybe_unused,
+			       union perf_event *event __maybe_unused)
 {
 	if (dump_trace)
 		perf_event__fprintf_cpu_map(event, stdout);
@@ -322,9 +317,8 @@ int process_event_cpu_map_stub(struct perf_tool *tool __maybe_unused,
 }
 
 static
-int process_event_stat_config_stub(struct perf_tool *tool __maybe_unused,
-				   union perf_event *event __maybe_unused,
-				   struct perf_session *session __maybe_unused)
+int process_event_stat_config_stub(struct perf_session *session __maybe_unused,
+				   union perf_event *event __maybe_unused)
 {
 	if (dump_trace)
 		perf_event__fprintf_stat_config(event, stdout);
@@ -333,10 +327,8 @@ int process_event_stat_config_stub(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int process_stat_stub(struct perf_tool *tool __maybe_unused,
-			     union perf_event *event __maybe_unused,
-			     struct perf_session *perf_session
-			     __maybe_unused)
+static int process_stat_stub(struct perf_session *perf_session __maybe_unused,
+			     union perf_event *event)
 {
 	if (dump_trace)
 		perf_event__fprintf_stat(event, stdout);
@@ -345,10 +337,8 @@ static int process_stat_stub(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int process_stat_round_stub(struct perf_tool *tool __maybe_unused,
-				   union perf_event *event __maybe_unused,
-				   struct perf_session *perf_session
-				   __maybe_unused)
+static int process_stat_round_stub(struct perf_session *perf_session __maybe_unused,
+				   union perf_event *event)
 {
 	if (dump_trace)
 		perf_event__fprintf_stat_round(event, stdout);
@@ -1374,37 +1364,37 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	case PERF_RECORD_HEADER_TRACING_DATA:
 		/* setup for reading amidst mmap */
 		lseek(fd, file_offset, SEEK_SET);
-		return tool->tracing_data(tool, event, session);
+		return tool->tracing_data(session, event);
 	case PERF_RECORD_HEADER_BUILD_ID:
-		return tool->build_id(tool, event, session);
+		return tool->build_id(session, event);
 	case PERF_RECORD_FINISHED_ROUND:
 		return tool->finished_round(tool, event, oe);
 	case PERF_RECORD_ID_INDEX:
-		return tool->id_index(tool, event, session);
+		return tool->id_index(session, event);
 	case PERF_RECORD_AUXTRACE_INFO:
-		return tool->auxtrace_info(tool, event, session);
+		return tool->auxtrace_info(session, event);
 	case PERF_RECORD_AUXTRACE:
 		/* setup for reading amidst mmap */
 		lseek(fd, file_offset + event->header.size, SEEK_SET);
 		return tool->auxtrace(tool, event, session);
 	case PERF_RECORD_AUXTRACE_ERROR:
 		perf_session__auxtrace_error_inc(session, event);
-		return tool->auxtrace_error(tool, event, session);
+		return tool->auxtrace_error(session, event);
 	case PERF_RECORD_THREAD_MAP:
-		return tool->thread_map(tool, event, session);
+		return tool->thread_map(session, event);
 	case PERF_RECORD_CPU_MAP:
-		return tool->cpu_map(tool, event, session);
+		return tool->cpu_map(session, event);
 	case PERF_RECORD_STAT_CONFIG:
-		return tool->stat_config(tool, event, session);
+		return tool->stat_config(session, event);
 	case PERF_RECORD_STAT:
-		return tool->stat(tool, event, session);
+		return tool->stat(session, event);
 	case PERF_RECORD_STAT_ROUND:
-		return tool->stat_round(tool, event, session);
+		return tool->stat_round(session, event);
 	case PERF_RECORD_TIME_CONV:
 		session->time_conv = event->time_conv;
-		return tool->time_conv(tool, event, session);
+		return tool->time_conv(session, event);
 	case PERF_RECORD_HEADER_FEATURE:
-		return tool->feature(tool, event, session);
+		return tool->feature(session, event);
 	default:
 		return -EINVAL;
 	}
@@ -2133,9 +2123,8 @@ out:
 	return err;
 }
 
-int perf_event__process_id_index(struct perf_tool *tool __maybe_unused,
-				 union perf_event *event,
-				 struct perf_session *session)
+int perf_event__process_id_index(struct perf_session *session,
+				 union perf_event *event)
 {
 	struct perf_evlist *evlist = session->evlist;
 	struct id_index_event *ie = &event->id_index;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index da40b4b380ca..d96eccd7d27f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -120,9 +120,8 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 				      union perf_event *event,
 				      struct perf_sample *sample);
 
-int perf_event__process_id_index(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session);
+int perf_event__process_id_index(struct perf_session *session,
+				 union perf_event *event);
 
 int perf_event__synthesize_id_index(struct perf_tool *tool,
 				    perf_event__handler_t process,
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 5d3172bcc4ae..4d40515307b8 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -374,9 +374,8 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	return 0;
 }
 
-int perf_event__process_stat_event(struct perf_tool *tool __maybe_unused,
-				   union perf_event *event,
-				   struct perf_session *session)
+int perf_event__process_stat_event(struct perf_session *session,
+				   union perf_event *event)
 {
 	struct perf_counts_values count;
 	struct stat_event *st = &event->stat;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 3a13a6dc5a62..2f9c9159a364 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -199,9 +199,8 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 struct perf_tool;
 union perf_event;
 struct perf_session;
-int perf_event__process_stat_event(struct perf_tool *tool,
-				   union perf_event *event,
-				   struct perf_session *session);
+int perf_event__process_stat_event(struct perf_session *session,
+				   union perf_event *event);
 
 size_t perf_event__fprintf_stat(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_stat_round(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 183c91453522..9c7f78d76275 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -26,8 +26,7 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_evlist **pevlist);
 
-typedef int (*event_op2)(struct perf_tool *tool, union perf_event *event,
-			 struct perf_session *session);
+typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
 
 typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
 			struct ordered_events *oe);


* [tip:perf/core] perf tools: Remove perf_tool from event_op3
  2018-09-13 12:54 ` [PATCH 02/48] perf tools: Remove perf_tool from event_op3 Jiri Olsa
  2018-09-18 20:56   ` Arnaldo Carvalho de Melo
@ 2018-09-25  9:31   ` " tip-bot for Jiri Olsa
  1 sibling, 0 replies; 101+ messages in thread
From: tip-bot for Jiri Olsa @ 2018-09-25  9:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, alexey.budankov, acme, jolsa, tglx, mingo,
	alexander.shishkin, namhyung, hpa, andi, linux-kernel

Commit-ID:  7336555a682c09fd9a3fdf38724493e52653be50
Gitweb:     https://git.kernel.org/tip/7336555a682c09fd9a3fdf38724493e52653be50
Author:     Jiri Olsa <jolsa@kernel.org>
AuthorDate: Thu, 13 Sep 2018 14:54:04 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Sep 2018 10:25:10 -0300

perf tools: Remove perf_tool from event_op3

Now that we keep a perf_tool pointer inside perf_session, there's no need
to have a perf_tool argument in the event_op3 callback. Remove it.
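The pattern this change relies on can be sketched roughly as follows: the session stores a pointer to the tool, and a callback that needs its command's private state recovers it with container_of() on session->tool, as builtin-report.c and builtin-script.c do in the diffs. This is a minimal sketch under stated assumptions: the struct definitions are heavily simplified stand-ins rather than the real tools/perf types, and the nr_feature_events counter and demo_dispatch() driver are hypothetical, added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* container_of(): recover the enclosing struct from a pointer to one of
 * its members (same idea as the kernel/perf helper). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct perf_session;

/* Simplified stand-in; the real union perf_event has many more members. */
union perf_event {
	struct {
		unsigned int type;
	} header;
};

/* Two-argument callback: the tool is reached through the session. */
typedef int (*event_op2)(struct perf_session *session, union perf_event *event);

struct perf_tool {
	event_op2 feature;
};

struct perf_session {
	struct perf_tool *tool;	/* set once when the session is created */
};

/* A command embeds its perf_tool, as struct report does. */
struct report {
	int nr_feature_events;	/* hypothetical counter for this sketch */
	struct perf_tool tool;
};

static int process_feature_event(struct perf_session *session,
				 union perf_event *event)
{
	struct report *rep;

	assert(session->tool != NULL);
	/* No tool argument needed: derive the command from session->tool. */
	rep = container_of(session->tool, struct report, tool);
	(void)event;
	rep->nr_feature_events++;
	return 0;
}

/* Hypothetical driver: dispatch one event the way the session code
 * calls tool->feature(session, event). */
static int demo_dispatch(struct report *rep)
{
	struct perf_session session = { .tool = &rep->tool };
	union perf_event event = { .header = { .type = 0 } };

	return session.tool->feature(&session, &event);
}
```

Because the tool pointer lives in the session, each callback needs one argument fewer, and callbacks that never touched the tool no longer have to carry a `__maybe_unused` parameter.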

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180913125450.21342-3-jolsa@kernel.org
[ Fix the builtin-inject.c build for !HAVE_AUXTRACE_SUPPORT ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-inject.c | 11 +++++------
 tools/perf/util/auxtrace.c  |  7 +++----
 tools/perf/util/auxtrace.h  |  5 ++---
 tools/perf/util/session.c   |  8 +++-----
 tools/perf/util/tool.h      |  4 +---
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index d77ed2aea95a..b4a29f435b06 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -131,10 +131,10 @@ static int copy_bytes(struct perf_inject *inject, int fd, off_t size)
 	return 0;
 }
 
-static s64 perf_event__repipe_auxtrace(struct perf_tool *tool,
-				       union perf_event *event,
-				       struct perf_session *session)
+static s64 perf_event__repipe_auxtrace(struct perf_session *session,
+				       union perf_event *event)
 {
+	struct perf_tool *tool = session->tool;
 	struct perf_inject *inject = container_of(tool, struct perf_inject,
 						  tool);
 	int ret;
@@ -172,9 +172,8 @@ static s64 perf_event__repipe_auxtrace(struct perf_tool *tool,
 #else
 
 static s64
-perf_event__repipe_auxtrace(struct perf_tool *tool __maybe_unused,
-			    union perf_event *event __maybe_unused,
-			    struct perf_session *session __maybe_unused)
+perf_event__repipe_auxtrace(struct perf_session *session __maybe_unused,
+			    union perf_event *event __maybe_unused)
 {
 	pr_err("AUX area tracing not supported\n");
 	return -EINVAL;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 86f0bc445f93..3017b205a157 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -931,9 +931,8 @@ int perf_event__process_auxtrace_info(struct perf_session *session,
 	}
 }
 
-s64 perf_event__process_auxtrace(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session)
+s64 perf_event__process_auxtrace(struct perf_session *session,
+				 union perf_event *event)
 {
 	s64 err;
 
@@ -949,7 +948,7 @@ s64 perf_event__process_auxtrace(struct perf_tool *tool,
 	if (!session->auxtrace || event->header.type != PERF_RECORD_AUXTRACE)
 		return -EINVAL;
 
-	err = session->auxtrace->process_auxtrace_event(session, event, tool);
+	err = session->auxtrace->process_auxtrace_event(session, event, session->tool);
 	if (err < 0)
 		return err;
 
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 97776470a52e..6be89776358c 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -519,9 +519,8 @@ int perf_event__synthesize_auxtrace_info(struct auxtrace_record *itr,
 					 perf_event__handler_t process);
 int perf_event__process_auxtrace_info(struct perf_session *session,
 				      union perf_event *event);
-s64 perf_event__process_auxtrace(struct perf_tool *tool,
-				 union perf_event *event,
-				 struct perf_session *session);
+s64 perf_event__process_auxtrace(struct perf_session *session,
+				 union perf_event *event);
 int perf_event__process_auxtrace_error(struct perf_session *session,
 				       union perf_event *event);
 int itrace_parse_synth_opts(const struct option *opt, const char *str,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index e781cdba845c..7d2c8ce6cfad 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -275,10 +275,8 @@ static int skipn(int fd, off_t n)
 	return 0;
 }
 
-static s64 process_event_auxtrace_stub(struct perf_tool *tool __maybe_unused,
-				       union perf_event *event,
-				       struct perf_session *session
-				       __maybe_unused)
+static s64 process_event_auxtrace_stub(struct perf_session *session __maybe_unused,
+				       union perf_event *event)
 {
 	dump_printf(": unhandled!\n");
 	if (perf_data__is_pipe(session->data))
@@ -1376,7 +1374,7 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	case PERF_RECORD_AUXTRACE:
 		/* setup for reading amidst mmap */
 		lseek(fd, file_offset + event->header.size, SEEK_SET);
-		return tool->auxtrace(tool, event, session);
+		return tool->auxtrace(session, event);
 	case PERF_RECORD_AUXTRACE_ERROR:
 		perf_session__auxtrace_error_inc(session, event);
 		return tool->auxtrace_error(session, event);
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 9c7f78d76275..56e4ca54020a 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -27,13 +27,11 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 			     struct perf_evlist **pevlist);
 
 typedef int (*event_op2)(struct perf_session *session, union perf_event *event);
+typedef s64 (*event_op3)(struct perf_session *session, union perf_event *event);
 
 typedef int (*event_oe)(struct perf_tool *tool, union perf_event *event,
 			struct ordered_events *oe);
 
-typedef s64 (*event_op3)(struct perf_tool *tool, union perf_event *event,
-			 struct perf_session *session);
-
 enum show_feature_header {
 	SHOW_FEAT_NO_HEADER = 0,
 	SHOW_FEAT_HEADER,


* [tip:perf/core] perf auxtrace: Pass struct perf_mmap into mmap__read* functions
  2018-09-13 12:54 ` [PATCH 03/48] perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions Jiri Olsa
@ 2018-09-25  9:32   ` tip-bot for Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: tip-bot for Jiri Olsa @ 2018-09-25  9:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, andi, mingo, linux-kernel, acme, hpa, alexey.budankov,
	alexander.shishkin, namhyung, jolsa, tglx

Commit-ID:  e035f4ca2ac97c30842fb03101198a86730de3ad
Gitweb:     https://git.kernel.org/tip/e035f4ca2ac97c30842fb03101198a86730de3ad
Author:     Jiri Olsa <jolsa@kernel.org>
AuthorDate: Thu, 13 Sep 2018 14:54:05 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Sep 2018 10:25:11 -0300

perf auxtrace: Pass struct perf_mmap into mmap__read* functions

The perf_mmap struct will hold a file pointer used to write the mmap's
contents, so we need to propagate it down the stack to the record__write
callers instead of its auxtrace_mmap member.
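The reasoning can be illustrated with a small sketch: a reader that receives only the embedded auxtrace_mmap cannot reach per-map state stored in the outer perf_mmap, whereas a reader that receives the perf_mmap can derive `&map->auxtrace_mmap` itself and still pass the whole map to the writer. This is a minimal sketch under assumptions: the structs are simplified stand-ins, and the `out_buf` destination, `record_write_sketch()` and `auxtrace_read_sketch()` are hypothetical names standing in for the per-map file and the real record__write()/auxtrace_mmap__read() paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for perf's auxtrace_mmap (the real one has more fields). */
struct auxtrace_mmap {
	void *base;
	size_t len;		/* hypothetical: bytes currently available */
};

/* Hypothetical per-map destination standing in for a per-map output file. */
struct out_buf {
	char data[64];
	size_t used;
};

struct perf_mmap {
	void *base;
	struct auxtrace_mmap auxtrace_mmap;	/* embedded member */
	struct out_buf *out;			/* hypothetical per-map output */
};

/* The writer needs the outer perf_mmap to know where the data goes. */
static int record_write_sketch(struct perf_mmap *map, const void *buf, size_t size)
{
	struct out_buf *out = map->out;

	assert(out != NULL);
	if (size > sizeof(out->data) - out->used)
		return -1;	/* destination full */
	memcpy(out->data + out->used, buf, size);
	out->used += size;
	return 0;
}

/* The reader takes perf_mmap and derives the auxtrace part itself. */
static int auxtrace_read_sketch(struct perf_mmap *map)
{
	struct auxtrace_mmap *mm = &map->auxtrace_mmap;

	if (!mm->base)
		return 0;	/* no AUX area mapped: nothing to read */
	if (record_write_sketch(map, mm->base, mm->len) != 0)
		return -1;
	return 1;		/* data was written */
}
```

This matches the shape of the change in the diff below, where `__auxtrace_mmap__read()` now starts with `struct auxtrace_mmap *mm = &map->auxtrace_mmap;`.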

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180913125450.21342-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 27 +++++++++++++--------------
 tools/perf/util/auxtrace.c  | 11 ++++++-----
 tools/perf/util/auxtrace.h  |  5 +++--
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9853552bcf16..fd8b12c5f4ae 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -207,11 +207,11 @@ static int record__process_auxtrace(struct perf_tool *tool,
 }
 
 static int record__auxtrace_mmap_read(struct record *rec,
-				      struct auxtrace_mmap *mm)
+				      struct perf_mmap *map)
 {
 	int ret;
 
-	ret = auxtrace_mmap__read(mm, rec->itr, &rec->tool,
+	ret = auxtrace_mmap__read(map, rec->itr, &rec->tool,
 				  record__process_auxtrace);
 	if (ret < 0)
 		return ret;
@@ -223,11 +223,11 @@ static int record__auxtrace_mmap_read(struct record *rec,
 }
 
 static int record__auxtrace_mmap_read_snapshot(struct record *rec,
-					       struct auxtrace_mmap *mm)
+					       struct perf_mmap *map)
 {
 	int ret;
 
-	ret = auxtrace_mmap__read_snapshot(mm, rec->itr, &rec->tool,
+	ret = auxtrace_mmap__read_snapshot(map, rec->itr, &rec->tool,
 					   record__process_auxtrace,
 					   rec->opts.auxtrace_snapshot_size);
 	if (ret < 0)
@@ -245,13 +245,12 @@ static int record__auxtrace_read_snapshot_all(struct record *rec)
 	int rc = 0;
 
 	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
-		struct auxtrace_mmap *mm =
-				&rec->evlist->mmap[i].auxtrace_mmap;
+		struct perf_mmap *map = &rec->evlist->mmap[i];
 
-		if (!mm->base)
+		if (!map->auxtrace_mmap.base)
 			continue;
 
-		if (record__auxtrace_mmap_read_snapshot(rec, mm) != 0) {
+		if (record__auxtrace_mmap_read_snapshot(rec, map) != 0) {
 			rc = -1;
 			goto out;
 		}
@@ -295,7 +294,7 @@ static int record__auxtrace_init(struct record *rec)
 
 static inline
 int record__auxtrace_mmap_read(struct record *rec __maybe_unused,
-			       struct auxtrace_mmap *mm __maybe_unused)
+			       struct perf_mmap *map __maybe_unused)
 {
 	return 0;
 }
@@ -529,17 +528,17 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 		return 0;
 
 	for (i = 0; i < evlist->nr_mmaps; i++) {
-		struct auxtrace_mmap *mm = &maps[i].auxtrace_mmap;
+		struct perf_mmap *map = &maps[i];
 
-		if (maps[i].base) {
-			if (perf_mmap__push(&maps[i], rec, record__pushfn) != 0) {
+		if (map->base) {
+			if (perf_mmap__push(map, rec, record__pushfn) != 0) {
 				rc = -1;
 				goto out;
 			}
 		}
 
-		if (mm->base && !rec->opts.auxtrace_snapshot_mode &&
-		    record__auxtrace_mmap_read(rec, mm) != 0) {
+		if (map->auxtrace_mmap.base && !rec->opts.auxtrace_snapshot_mode &&
+		    record__auxtrace_mmap_read(rec, map) != 0) {
 			rc = -1;
 			goto out;
 		}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 3017b205a157..2fecee57f555 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1193,11 +1193,12 @@ int perf_event__process_auxtrace_error(struct perf_session *session,
 	return 0;
 }
 
-static int __auxtrace_mmap__read(struct auxtrace_mmap *mm,
+static int __auxtrace_mmap__read(struct perf_mmap *map,
 				 struct auxtrace_record *itr,
 				 struct perf_tool *tool, process_auxtrace_t fn,
 				 bool snapshot, size_t snapshot_size)
 {
+	struct auxtrace_mmap *mm = &map->auxtrace_mmap;
 	u64 head, old = mm->prev, offset, ref;
 	unsigned char *data = mm->base;
 	size_t size, head_off, old_off, len1, len2, padding;
@@ -1303,18 +1304,18 @@ static int __auxtrace_mmap__read(struct auxtrace_mmap *mm,
 	return 1;
 }
 
-int auxtrace_mmap__read(struct auxtrace_mmap *mm, struct auxtrace_record *itr,
+int auxtrace_mmap__read(struct perf_mmap *map, struct auxtrace_record *itr,
 			struct perf_tool *tool, process_auxtrace_t fn)
 {
-	return __auxtrace_mmap__read(mm, itr, tool, fn, false, 0);
+	return __auxtrace_mmap__read(map, itr, tool, fn, false, 0);
 }
 
-int auxtrace_mmap__read_snapshot(struct auxtrace_mmap *mm,
+int auxtrace_mmap__read_snapshot(struct perf_mmap *map,
 				 struct auxtrace_record *itr,
 				 struct perf_tool *tool, process_auxtrace_t fn,
 				 size_t snapshot_size)
 {
-	return __auxtrace_mmap__read(mm, itr, tool, fn, true, snapshot_size);
+	return __auxtrace_mmap__read(map, itr, tool, fn, true, snapshot_size);
 }
 
 /**
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 6be89776358c..7eeb141361b9 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -33,6 +33,7 @@ union perf_event;
 struct perf_session;
 struct perf_evlist;
 struct perf_tool;
+struct perf_mmap;
 struct option;
 struct record_opts;
 struct auxtrace_info_event;
@@ -437,10 +438,10 @@ typedef int (*process_auxtrace_t)(struct perf_tool *tool,
 				  union perf_event *event, void *data1,
 				  size_t len1, void *data2, size_t len2);
 
-int auxtrace_mmap__read(struct auxtrace_mmap *mm, struct auxtrace_record *itr,
+int auxtrace_mmap__read(struct perf_mmap *map, struct auxtrace_record *itr,
 			struct perf_tool *tool, process_auxtrace_t fn);
 
-int auxtrace_mmap__read_snapshot(struct auxtrace_mmap *mm,
+int auxtrace_mmap__read_snapshot(struct perf_mmap *map,
 				 struct auxtrace_record *itr,
 				 struct perf_tool *tool, process_auxtrace_t fn,
 				 size_t snapshot_size);
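
The patch above changes the auxtrace read helpers to take the outer `struct perf_mmap` and derive `&map->auxtrace_mmap` internally. Below is a minimal sketch of that pattern. The structs are simplified stand-ins, not perf's real definitions, and the `out_fd` field is hypothetical. It represents the kind of per-map state, such as an output file, that later patches in the series want to reach from inside the read path.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for perf's AUX area bookkeeping. */
struct auxtrace_mmap {
	void *base;                       /* NULL when no AUX area is mapped */
	unsigned long long prev;          /* last position consumed */
};

/* Simplified stand-in for perf's per-ring-buffer struct. */
struct perf_mmap {
	void *base;                       /* regular event ring buffer */
	int out_fd;                       /* hypothetical per-map output fd */
	struct auxtrace_mmap auxtrace_mmap;  /* embedded, as in perf */
};

/*
 * New-style signature: take the outer map and derive the embedded AUX
 * state locally, so per-map data such as 'out_fd' stays reachable.
 * Returns the fd the AUX data would be written to, or -1 when there is
 * no AUX area to read.
 */
static int aux_read_target(struct perf_mmap *map)
{
	struct auxtrace_mmap *mm;

	assert(map != NULL);
	mm = &map->auxtrace_mmap;
	if (!mm->base)
		return -1;
	return map->out_fd;
}
```

With the old `struct auxtrace_mmap *` parameter, a callee would have needed `container_of()` to get back to the enclosing map. Passing the outer struct makes that dependency explicit in the signature.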

* [tip:perf/core] perf tools: Add 'struct perf_mmap' arg to record__write()
  2018-09-13 12:54 ` [PATCH 04/48] perf tools: Add struct perf_mmap arg into record__write Jiri Olsa
@ 2018-09-25  9:32   ` tip-bot for Jiri Olsa
  0 siblings, 0 replies; 101+ messages in thread
From: tip-bot for Jiri Olsa @ 2018-09-25  9:32 UTC
  To: linux-tip-commits
  Cc: andi, hpa, jolsa, peterz, alexander.shishkin, tglx,
	alexey.budankov, mingo, linux-kernel, acme, namhyung

Commit-ID:  ded2b8fe2e431d8029ab50238744fcce06a2f0c6
Gitweb:     https://git.kernel.org/tip/ded2b8fe2e431d8029ab50238744fcce06a2f0c6
Author:     Jiri Olsa <jolsa@kernel.org>
AuthorDate: Thu, 13 Sep 2018 14:54:06 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Sep 2018 10:25:11 -0300

perf tools: Add 'struct perf_mmap' arg to record__write()

The struct perf_mmap map argument will hold the file pointer to write
the data to.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180913125450.21342-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 24 ++++++++++++++----------
 tools/perf/util/auxtrace.c  |  2 +-
 tools/perf/util/auxtrace.h  |  1 +
 tools/perf/util/mmap.c      |  6 +++---
 tools/perf/util/mmap.h      |  2 +-
 5 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fd8b12c5f4ae..0980dfe3396b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -106,9 +106,12 @@ static bool switch_output_time(struct record *rec)
 	       trigger_is_ready(&switch_output_trigger);
 }
 
-static int record__write(struct record *rec, void *bf, size_t size)
+static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
+			 void *bf, size_t size)
 {
-	if (perf_data__write(rec->session->data, bf, size) < 0) {
+	struct perf_data_file *file = &rec->session->data->file;
+
+	if (perf_data_file__write(file, bf, size) < 0) {
 		pr_err("failed to write perf data, error: %m\n");
 		return -1;
 	}
@@ -127,15 +130,15 @@ static int process_synthesized_event(struct perf_tool *tool,
 				     struct machine *machine __maybe_unused)
 {
 	struct record *rec = container_of(tool, struct record, tool);
-	return record__write(rec, event, event->header.size);
+	return record__write(rec, NULL, event, event->header.size);
 }
 
-static int record__pushfn(void *to, void *bf, size_t size)
+static int record__pushfn(struct perf_mmap *map, void *to, void *bf, size_t size)
 {
 	struct record *rec = to;
 
 	rec->samples++;
-	return record__write(rec, bf, size);
+	return record__write(rec, map, bf, size);
 }
 
 static volatile int done;
@@ -170,6 +173,7 @@ static void record__sig_exit(void)
 #ifdef HAVE_AUXTRACE_SUPPORT
 
 static int record__process_auxtrace(struct perf_tool *tool,
+				    struct perf_mmap *map,
 				    union perf_event *event, void *data1,
 				    size_t len1, void *data2, size_t len2)
 {
@@ -197,11 +201,11 @@ static int record__process_auxtrace(struct perf_tool *tool,
 	if (padding)
 		padding = 8 - padding;
 
-	record__write(rec, event, event->header.size);
-	record__write(rec, data1, len1);
+	record__write(rec, map, event, event->header.size);
+	record__write(rec, map, data1, len1);
 	if (len2)
-		record__write(rec, data2, len2);
-	record__write(rec, &pad, padding);
+		record__write(rec, map, data2, len2);
+	record__write(rec, map, &pad, padding);
 
 	return 0;
 }
@@ -549,7 +553,7 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
 	 * at least one event.
 	 */
 	if (bytes_written != rec->bytes_written)
-		rc = record__write(rec, &finished_round_event, sizeof(finished_round_event));
+		rc = record__write(rec, NULL, &finished_round_event, sizeof(finished_round_event));
 
 	if (overwrite)
 		perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 2fecee57f555..c4617bcfd521 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1285,7 +1285,7 @@ static int __auxtrace_mmap__read(struct perf_mmap *map,
 	ev.auxtrace.tid = mm->tid;
 	ev.auxtrace.cpu = mm->cpu;
 
-	if (fn(tool, &ev, data1, len1, data2, len2))
+	if (fn(tool, map, &ev, data1, len1, data2, len2))
 		return -1;
 
 	mm->prev = head;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 7eeb141361b9..a86b7eab6673 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -435,6 +435,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 				   bool per_cpu);
 
 typedef int (*process_auxtrace_t)(struct perf_tool *tool,
+				  struct perf_mmap *map,
 				  union perf_event *event, void *data1,
 				  size_t len1, void *data2, size_t len2);
 
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 215f69f41672..cdb95b3a1213 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -281,7 +281,7 @@ int perf_mmap__read_init(struct perf_mmap *map)
 }
 
 int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size))
+		    int push(struct perf_mmap *map, void *to, void *buf, size_t size))
 {
 	u64 head = perf_mmap__read_head(md);
 	unsigned char *data = md->base + page_size;
@@ -300,7 +300,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 		size = md->mask + 1 - (md->start & md->mask);
 		md->start += size;
 
-		if (push(to, buf, size) < 0) {
+		if (push(md, to, buf, size) < 0) {
 			rc = -1;
 			goto out;
 		}
@@ -310,7 +310,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
 	size = md->end - md->start;
 	md->start += size;
 
-	if (push(to, buf, size) < 0) {
+	if (push(md, to, buf, size) < 0) {
 		rc = -1;
 		goto out;
 	}
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 05a6d47c7956..e603314dc792 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -93,7 +93,7 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
 union perf_event *perf_mmap__read_event(struct perf_mmap *map);
 
 int perf_mmap__push(struct perf_mmap *md, void *to,
-		    int push(void *to, void *buf, size_t size));
+		    int push(struct perf_mmap *map, void *to, void *buf, size_t size));
 
 size_t perf_mmap__mmap_len(struct perf_mmap *map);
 
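
`perf_mmap__push()` hands the readable part of the ring buffer to the `push` callback. When the data wraps past the physical end of the buffer, it calls the callback twice. After this patch, each call also receives the map. Below is a simplified sketch of that flow. The names `struct ring`, `ring_push`, `sink_push` and `out_fd` are hypothetical and are not perf's API. Unlike perf's code, the sketch detects the wrap with an explicit `off + size > buffer size` check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified ring buffer: 'mask + 1' bytes, a power of two. */
struct ring {
	unsigned char *data;
	size_t mask;
	size_t start;   /* first unread position (free-running counter) */
	size_t end;     /* one past the last readable position */
	int out_fd;     /* hypothetical per-map output target */
};

/* After the patch, the callback also receives the map it reads from. */
typedef int (*push_fn)(struct ring *map, void *to, void *buf, size_t size);

static int ring_push(struct ring *md, void *to, push_fn push)
{
	size_t size = md->end - md->start;
	size_t off = md->start & md->mask;

	if (size == 0)
		return 0;
	assert(size <= md->mask + 1);

	if (off + size > md->mask + 1) {
		/* Data wraps: first push the tail of the buffer... */
		size_t first = md->mask + 1 - off;

		if (push(md, to, md->data + off, first) < 0)
			return -1;
		md->start += first;
		size -= first;
		off = 0;  /* ...then the rest from the buffer's start. */
	}
	if (push(md, to, md->data + off, size) < 0)
		return -1;
	md->start += size;
	return 0;
}

/* Example callback: collects chunks into a buffer passed via 'to'. */
struct sink {
	unsigned char bytes[64];
	size_t len;
	int calls;
	int last_fd;  /* records which map's output the chunk belongs to */
};

static int sink_push(struct ring *map, void *to, void *buf, size_t size)
{
	struct sink *s = to;

	if (s->len + size > sizeof(s->bytes))
		return -1;
	memcpy(s->bytes + s->len, buf, size);
	s->len += size;
	s->calls++;
	s->last_fd = map->out_fd;
	return 0;
}
```

Because the callback gets the map, a recorder can choose the output per map (here `out_fd`) instead of writing everything to one global perf.data file. The threaded record mode depends on this.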

* [tip:perf/core] perf util: Make copyfile_offset() global
  2018-09-13 12:54 ` [PATCH 09/48] perf tools: Make copyfile_offset global Jiri Olsa
  2018-09-18 20:54   ` Arnaldo Carvalho de Melo
@ 2018-09-25  9:33   ` tip-bot for Jiri Olsa
  1 sibling, 0 replies; 101+ messages in thread
From: tip-bot for Jiri Olsa @ 2018-09-25  9:33 UTC
  To: linux-tip-commits
  Cc: jolsa, acme, peterz, linux-kernel, alexey.budankov, tglx,
	namhyung, alexander.shishkin, mingo, andi, hpa

Commit-ID:  ed93d0a26012a4a12231c16b18628a324079dc45
Gitweb:     https://git.kernel.org/tip/ed93d0a26012a4a12231c16b18628a324079dc45
Author:     Jiri Olsa <jolsa@kernel.org>
AuthorDate: Thu, 13 Sep 2018 14:54:11 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 19 Sep 2018 10:25:12 -0300

perf util: Make copyfile_offset() global

It will be used outside of the util object in the following patches.

Committer note:

We need to have the header with the definition for loff_t in util.h
since we now use it in the copyfile_offset() signature.

Also move that prototype closer to the other copyfile_ prefixed
functions.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180913125450.21342-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/util.c | 2 +-
 tools/perf/util/util.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index eac5b858a371..093352e93d50 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -221,7 +221,7 @@ out:
 	return err;
 }
 
-static int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size)
+int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size)
 {
 	void *ptr;
 	loff_t pgoff;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index dc58254a2b69..14508ee7707a 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -6,6 +6,7 @@
 /* glibc 2.20 deprecates _BSD_SOURCE in favour of _DEFAULT_SOURCE */
 #define _DEFAULT_SOURCE 1
 
+#include <fcntl.h>
 #include <stdbool.h>
 #include <stddef.h>
 #include <stdlib.h>
@@ -35,6 +36,7 @@ bool lsdir_no_dot_filter(const char *name, struct dirent *d);
 int copyfile(const char *from, const char *to);
 int copyfile_mode(const char *from, const char *to, mode_t mode);
 int copyfile_ns(const char *from, const char *to, struct nsinfo *nsi);
+int copyfile_offset(int ifd, loff_t off_in, int ofd, loff_t off_out, u64 size);
 
 ssize_t readn(int fd, void *buf, size_t n);
 ssize_t writen(int fd, const void *buf, size_t n);
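
`copyfile_offset()` copies `size` bytes from offset `off_in` of one file descriptor to offset `off_out` of another. Below is a minimal sketch of that contract using `pread()`/`pwrite()`, which leave both descriptors' file offsets untouched. This is an illustration, not perf's implementation; perf's own version maps the input file instead. `copy_range` is a hypothetical name, and the sketch uses the portable `off_t` rather than `loff_t`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy 'size' bytes from 'ifd' at 'off_in' to 'ofd' at 'off_out'.
 * Returns 0 on success, -1 on error or if the input ends early.
 */
static int copy_range(int ifd, off_t off_in, int ofd, off_t off_out, size_t size)
{
	char buf[4096];

	while (size > 0) {
		size_t want = size < sizeof(buf) ? size : sizeof(buf);
		ssize_t n = pread(ifd, buf, want, off_in);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return -1;  /* input ended before 'size' bytes were copied */

		/* pwrite() may write less than asked; loop until done. */
		for (ssize_t done = 0; done < n; ) {
			ssize_t w = pwrite(ofd, buf + done, n - done, off_out + done);

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			done += w;
		}
		off_in += n;
		off_out += n;
		size -= n;
	}
	return 0;
}
```

Positional I/O is what makes a helper like this usable for merging per-thread data files into one perf.data. Each copy states its own offsets and does not disturb a shared file position.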

* Re: [PATCH 47/48] perf record: Spread maps for --threads option
  2018-09-24 14:22       ` Arnaldo Carvalho de Melo
@ 2018-09-26  6:23         ` Jiri Olsa
  2018-09-27 16:01           ` Jiri Olsa
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-26  6:23 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Jiri Olsa, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

On Mon, Sep 24, 2018 at 11:22:54AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sun, Sep 23, 2018 at 09:44:32PM +0200, Jiri Olsa escreveu:
> > On Mon, Sep 17, 2018 at 08:40:48PM +0900, Namhyung Kim wrote:
> > > On Thu, Sep 13, 2018 at 02:54:49PM +0200, Jiri Olsa wrote:
> > > > Currently we assign all maps to main thread. Adding
> > > > code that spreads maps for --threads option.
> > > > 
> > > > For the --threads option we create as many threads as there
> > > > are memory maps in evlist, which is the number of CPUs
> > > > in the system or CPUs we monitor. Each thread gets a
> > > > single data mmap to read.
> > > > 
> > > > In addition, there is the same number of tracking mmaps
> > > > for auxiliary events, for which we don't create a special
> > > > thread. Instead we assign them to the main thread, because
> > > > not much traffic is expected there.
> > > > 
> > > > The assignment is visible from --thread-stats output:
> > > > 
> > > >           pid      write       poll       skip  maps (size 20K)
> > > >     1s   9770       144B          1          0   19K   19K   19K   18K   19K
> > > >          9772         0B          1          0   18K
> > > >          9773         0B          1          0   19K
> > > >          9774         0B          1          0   19K
> > > > 
> > > > There are 5 maps for thread 9770 (1 data map and 4 auxiliary)
> > > > and one data map for every other thread. Each thread writes
> > > > data to its own data file.
> > > 
> > > Hmm.. not sure it'll work well for large machines with 1000+ cpus.
> > > What about giving each thread a data mmap and a tracking mmap?
> > 
> > Well, currently we store the tracking data in a single file;
> > that's why we need just one thread to write it down.
> 
> I agree with Namhyung, with a slight difference: perhaps we should set
> perf_event_attr.mmap on one of the events of the per-cpu mmap, that way
> we don't need that dummy event, right?

Currently it's all based on having the tracking data separated
in a single file, which is read/processed first. So when we
read the sample data files, we can read them separately,
because the tracking data is already available.

>  
> > with the *_time API, we should be able to properly read the
> > tracking data separately for each cpu
> 
> That may end up making the *_time API not needed (assuming the kernel
> keeps the per-cpu mmap events in order, barring that, using the
> ordered_events in batches, prior to consuming the events) and would help
> with things like 'perf top' and 'perf trace', that want to consume
> events right away.

If we don't want to use the *_by_time API, we need to find a way
to sort everything out before we start processing, and that
seems too costly to me.

jirka
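
The assignment described in the quoted patch gives each thread one data mmap and puts all tracking mmaps on the main thread. It can be sketched as below. The round-robin rule for `--threads=X` and the names `spread_maps`/`maps_for_thread` are assumptions for illustration, not the code from patches 47/48. With 4 maps and 4 threads, the sketch reproduces the 5/1/1/1 split shown in the `--thread-stats` output. It also shows Namhyung's concern: the main thread's share grows with the number of CPUs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical assignment: data map i goes to thread i % nr_threads,
 * and every tracking map goes to thread 0 (the main thread).
 * Fills data_owner[] and track_owner[] with thread indexes.
 */
static void spread_maps(int nr_maps, int nr_threads,
			int *data_owner, int *track_owner)
{
	assert(nr_threads > 0);
	for (int i = 0; i < nr_maps; i++) {
		data_owner[i] = i % nr_threads;
		track_owner[i] = 0;
	}
}

/* Number of maps (data + tracking) that thread 'tid' would poll. */
static int maps_for_thread(int tid, int nr_maps,
			   const int *data_owner, const int *track_owner)
{
	int n = 0;

	for (int i = 0; i < nr_maps; i++) {
		n += data_owner[i] == tid;
		n += track_owner[i] == tid;
	}
	return n;
}
```

On a 1000-CPU machine, the same rule would leave the main thread polling one data map plus 1000 tracking maps. That is the scaling problem raised above.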

* Re: [PATCH 47/48] perf record: Spread maps for --threads option
  2018-09-26  6:23         ` Jiri Olsa
@ 2018-09-27 16:01           ` Jiri Olsa
  2018-09-28  6:25             ` Namhyung Kim
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-09-27 16:01 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Jiri Olsa, lkml, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Andi Kleen, Alexey Budankov, kernel-team

On Wed, Sep 26, 2018 at 08:23:17AM +0200, Jiri Olsa wrote:

SNIP

> > I agree with Namhyung, with a slight difference: perhaps we should set
> > perf_event_attr.mmap on one of the events of the per-cpu mmap, that way
> > we don't need that dummy event, right?
> 
> Currently it's all based on having the tracking data separated
> in a single file, which is read/processed first. So when we
> read the sample data files, we can read them separately,
> because the tracking data is already available.
> 
> >  
> > > with the *_time API, we should be able to properly read the
> > > tracking data separately for each cpu
> > 
> > That may end up making the *_time API not needed (assuming the kernel
> > keeps the per-cpu mmap events in order, barring that, using the
> > ordered_events in batches, prior to consuming the events) and would help
> > with things like 'perf top' and 'perf trace', that want to consume
> > events right away.
> 
> If we don't want to use the *_by_time API, we need to find a way
> to sort everything out before we start processing, and that
> seems too costly to me.

Actually, we might try to read all the streams simultaneously
and sort the samples on the fly within a reasonable sorting
time window. This way we could have just a single file per
thread and skip the *_by_time API, hopefully :-\

I'll try to prepare something

jirka
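
The on-the-fly sorting idea amounts to merging several time-ordered streams. Below is a minimal k-way merge sketch. It assumes each per-CPU stream is already sorted by timestamp, which is the same assumption Arnaldo mentions above. If that does not hold, a bounded reorder window, in the spirit of perf's ordered_events, would have to sit in front of the merge. The names `struct stream` and `merge_streams` are hypothetical, and bare timestamps stand in for events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct stream {
	const uint64_t *ts;  /* event timestamps, ascending within the stream */
	size_t len;
	size_t pos;          /* next unread event */
};

/*
 * Merge 'k' time-ordered streams into 'out' (capacity 'cap') by always
 * taking the smallest head timestamp. Returns the number of events
 * written. The linear scan over the heads costs O(k) per event; a
 * binary heap would make it O(log k) for large k.
 */
static size_t merge_streams(struct stream *s, size_t k,
			    uint64_t *out, size_t cap)
{
	size_t n = 0;

	while (n < cap) {
		size_t best = k;  /* k means "no stream has data left" */

		for (size_t i = 0; i < k; i++) {
			if (s[i].pos < s[i].len &&
			    (best == k ||
			     s[i].ts[s[i].pos] < s[best].ts[s[best].pos]))
				best = i;
		}
		if (best == k)
			break;
		out[n++] = s[best].ts[s[best].pos++];
	}
	return n;
}
```

Only the head of each stream needs to be in memory at a time. This is what would allow one file per thread without a global pre-sort.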

* Re: [PATCH 47/48] perf record: Spread maps for --threads option
  2018-09-27 16:01           ` Jiri Olsa
@ 2018-09-28  6:25             ` Namhyung Kim
  0 siblings, 0 replies; 101+ messages in thread
From: Namhyung Kim @ 2018-09-28  6:25 UTC
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, lkml, Ingo Molnar,
	Alexander Shishkin, Peter Zijlstra, Andi Kleen, Alexey Budankov,
	kernel-team

Hi Arnaldo and Jiri,

On Thu, Sep 27, 2018 at 06:01:58PM +0200, Jiri Olsa wrote:
> On Wed, Sep 26, 2018 at 08:23:17AM +0200, Jiri Olsa wrote:
> 
> SNIP
> 
> > > I agree with Namhyung, with a slight difference: perhaps we should set
> > > perf_event_attr.mmap on one of the events of the per-cpu mmap, that way
> > > we don't need that dummy event, right?
> > 
> > Currently it's all based on having the tracking data separated
> > in a single file, which is read/processed first. So when we
> > read the sample data files, we can read them separately,
> > because the tracking data is already available.
> > 
> > >  
> > > > with the *_time API, we should be able to properly read the
> > > > tracking data separately for each cpu
> > > 
> > > That may end up making the *_time API not needed (assuming the kernel
> > > keeps the per-cpu mmap events in order, barring that, using the
> > > ordered_events in batches, prior to consuming the events) and would help
> > > with things like 'perf top' and 'perf trace', that want to consume
> > > events right away.
> > 
> > If we don't want to use the *_by_time API, we need to find a way
> > to sort everything out before we start processing, and that
> > seems too costly to me.
> 
> Actually, we might try to read all the streams simultaneously
> and sort the samples on the fly within a reasonable sorting
> time window. This way we could have just a single file per
> thread and skip the *_by_time API, hopefully :-\

Note that without the *_by_time API, it can only see the last state.
For example, if a task calls exec() in the middle of the window, all
of its samples will be attributed to the new one.  It might or might not
matter depending on the length of the window.

Thanks,
Namhyung
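
To make the exec() case concrete: a time-based lookup keeps the history of a thread's comm values and returns the one in effect at the sample's timestamp. A "last state" lookup always returns the newest value instead. Below is a minimal sketch with hypothetical names (`struct comm_entry`, `comm_at`, `comm_last`). It illustrates the idea behind this series' thread__comm_by_time() helpers but does not reproduce their implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per comm change of a thread, sorted by 'start' ascending. */
struct comm_entry {
	uint64_t start;    /* timestamp at which the comm took effect */
	const char *name;
};

/*
 * Time-based lookup: the comm in effect at 'ts' is the last entry whose
 * start <= ts. Returns NULL if 'ts' precedes every entry.
 */
static const char *comm_at(const struct comm_entry *h, size_t n, uint64_t ts)
{
	size_t lo = 0, hi = n;  /* find the first entry with start > ts */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (h[mid].start <= ts)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? h[lo - 1].name : NULL;
}

/* "Last state" lookup: what is visible without per-time history. */
static const char *comm_last(const struct comm_entry *h, size_t n)
{
	return n ? h[n - 1].name : NULL;
}
```

Suppose a sample at time 200 comes from a task that exec()s at time 250. `comm_at()` still reports the pre-exec comm, while `comm_last()` reports the post-exec one. That is the misattribution described above.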

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-24 19:23             ` Alexey Budankov
@ 2018-10-02 21:41               ` Jiri Olsa
  2018-10-03  7:01                 ` Alexey Budankov
  0 siblings, 1 reply; 101+ messages in thread
From: Jiri Olsa @ 2018-10-02 21:41 UTC
  To: Alexey Budankov
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

On Mon, Sep 24, 2018 at 10:23:37PM +0300, Alexey Budankov wrote:

SNIP

> > Workload: matrix multiplication in 128 threads
> > 
> > 	P (period, ms)       : 3
> > 	runtime overhead (x) : 1.8x ~ 12.58 / 6.81
> > 	data loss (%)        : 9
> > 	LOST events          : 147
> > 	SAMPLE events        : 673299
> > 	perf.data size (GiB) : 0.8
> 
> Please see more comparable data, showing the P (period, ms),
> runtime overhead and data loss metrics side by side.
>
> It starts from the serial implementation as the baseline and
> then demonstrates the possible improvement from applying the
> configurable --aio(=N) and --threads(=T) implementations.
>
> Smaller P values, with data loss and runtime overhead values
> equal or close to those of the serial implementation,
> might indicate a possible gain.

Sorry for the delay. OK, so it's not so bad after all ;-)
Thanks a lot for running the test.

I need to rewrite some parts of it for the next post,
but I'd hate to lose your aio implementation and the
possibility to easily compare it against the threaded
implementation.

I think we can keep it under an --aio option, alongside
the current (sync) write implementation and the future
threads implementation. Could you make it available
only under the --aio option (or similar) and repost?

thanks,
jirka

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-10-02 21:41               ` Jiri Olsa
@ 2018-10-03  7:01                 ` Alexey Budankov
  0 siblings, 0 replies; 101+ messages in thread
From: Alexey Budankov @ 2018-10-03  7:01 UTC
  To: Jiri Olsa
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, lkml, Ingo Molnar,
	Namhyung Kim, Alexander Shishkin, Peter Zijlstra, Andi Kleen

Hi,

On 03.10.2018 0:41, Jiri Olsa wrote:

<SNIP>

> I think we can keep it under an --aio option, alongside
> the current (sync) write implementation and the future
> threads implementation. Could you make it available
> only under the --aio option (or similar) and repost?

That's possible. Let me look into it.

Thanks,
Alexey

* Re: [RFCv2 00/48] perf tools: Add threads to record command
  2018-09-24 18:32                   ` Alexey Budankov
  2018-09-24 19:12                     ` Alexey Budankov
@ 2018-10-05  6:14                     ` Namhyung Kim
  1 sibling, 0 replies; 101+ messages in thread
From: Namhyung Kim @ 2018-10-05  6:14 UTC
  To: Alexey Budankov
  Cc: Jiri Olsa, Jiri Olsa, Arnaldo Carvalho de Melo, lkml,
	Ingo Molnar, Alexander Shishkin, Peter Zijlstra, Andi Kleen,
	kernel-team

Hi,

Sorry for the late reply.

On Mon, Sep 24, 2018 at 09:32:11PM +0300, Alexey Budankov wrote:
> On 24.09.2018 17:29, Jiri Olsa wrote:
> > On Mon, Sep 24, 2018 at 04:09:09PM +0300, Alexey Budankov wrote:
> >> Command:
> >>
> >> /usr/bin/time ./perf.thr record --threads=T \
> >> 	-N -B -T -R --call-graph dwarf,1024 --user-regs=ip,bp,sp \
> >> 	-e cpu/period=P,event=0x3c/Duk,\
> >> 	   cpu/period=P,umask=0x3/Duk,\
> >> 	   cpu/period=P,event=0xc0/Duk,\
> >> 	   cpu/period=0xaae61,event=0xc2,umask=0x10/uk,\
> >> 	   cpu/period=0x11171,event=0xc2,umask=0x20/uk,\
> >> 	   cpu/period=0x11171,event=0xc2,umask=0x40/uk \
> >> 	--clockid=monotonic_raw -- ./matrix.gcc
> >>
> >> Workload: matrix multiplication in 128 threads
> >>
> >> T : 272
> >> 	P (period, ms)       : 0.35
> >> 	runtime overhead (x) : 13x ~ 87.73 / 6.81
> >> 	data loss (%)        : 0
> >> 	LOST events          : 36
> >> 	SAMPLE events        : 8048542
> >> 	perf.data size (GiB) : 10
> > 
> > Any idea why it has so many more samples?
> 
> Presumably this is because the period is 350us, the smallest
> one at which perf.thr manages to capture data without data loss (=0) when T=272.
> However, during collection I get a message that the max sampling frequency
> is lowered to 3KHz.

And it took much longer than AIO:  87.73 vs 22.34  (N=272)

Thanks,
Namhyung

end of thread

Thread overview: 101+ messages
2018-09-13 12:54 [RFCv2 00/48] perf tools: Add threads to record command Jiri Olsa
2018-09-13 12:54 ` [PATCH 01/48] perf tools: Remove perf_tool from event_op2 Jiri Olsa
2018-09-25  9:31   ` [tip:perf/core] " tip-bot for Jiri Olsa
2018-09-13 12:54 ` [PATCH 02/48] perf tools: Remove perf_tool from event_op3 Jiri Olsa
2018-09-18 20:56   ` Arnaldo Carvalho de Melo
2018-09-23 19:45     ` Jiri Olsa
2018-09-25  9:31   ` [tip:perf/core] " tip-bot for Jiri Olsa
2018-09-13 12:54 ` [PATCH 03/48] perf tools: Pass struct perf_mmap into auxtrace_mmap__read* functions Jiri Olsa
2018-09-25  9:32   ` [tip:perf/core] perf auxtrace: Pass struct perf_mmap into mmap__read* functions tip-bot for Jiri Olsa
2018-09-13 12:54 ` [PATCH 04/48] perf tools: Add struct perf_mmap arg into record__write Jiri Olsa
2018-09-25  9:32   ` [tip:perf/core] perf tools: Add 'struct perf_mmap' arg to record__write() tip-bot for Jiri Olsa
2018-09-13 12:54 ` [PATCH 05/48] perf tools: Use a software dummy event to track task/mmap events Jiri Olsa
2018-09-13 12:54 ` [PATCH 06/48] perf tools: Create separate mmap for dummy tracking event Jiri Olsa
2018-09-13 12:54 ` [PATCH 07/48] perf tools: Extend perf_evlist__mmap_ex() to use track mmap Jiri Olsa
2018-09-13 12:54 ` [PATCH 08/48] perf report: Skip dummy tracking event Jiri Olsa
2018-09-13 12:54 ` [PATCH 09/48] perf tools: Make copyfile_offset global Jiri Olsa
2018-09-18 20:54   ` Arnaldo Carvalho de Melo
2018-09-23 19:44     ` Jiri Olsa
2018-09-25  9:33   ` [tip:perf/core] perf util: Make copyfile_offset() global tip-bot for Jiri Olsa
2018-09-13 12:54 ` [PATCH 10/48] perf tools: Add HEADER_DATA_INDEX feature Jiri Olsa
2018-09-13 12:54 ` [PATCH 11/48] perf tools: Handle indexed data file properly Jiri Olsa
2018-09-13 12:54 ` [PATCH 12/48] perf tools: Add perf_data__create_index function Jiri Olsa
2018-09-13 12:54 ` [PATCH 13/48] perf record: Add --index option for building index table Jiri Olsa
2018-09-13 12:54 ` [PATCH 14/48] perf tools: Introduce thread__comm(_str)_by_time() helpers Jiri Olsa
2018-09-13 12:54 ` [PATCH 15/48] perf tools: Add a test case for thread comm handling Jiri Olsa
2018-09-13 12:54 ` [PATCH 16/48] perf tools: Use thread__comm_by_time() when adding hist entries Jiri Olsa
2018-09-13 12:54 ` [PATCH 17/48] perf tools: Convert dead thread list into rbtree Jiri Olsa
2018-09-13 12:54 ` [PATCH 18/48] perf tools: Introduce machine__find*_thread_by_time() Jiri Olsa
2018-09-13 12:54 ` [PATCH 19/48] perf tools: Add thread::exited flag Jiri Olsa
2018-09-13 12:54 ` [PATCH 20/48] perf tools: Add a test case for timed thread handling Jiri Olsa
2018-09-13 12:54 ` [PATCH 21/48] perf tools: Maintain map groups list in a leader thread Jiri Olsa
2018-09-13 12:54 ` [PATCH 22/48] perf tools: Introduce thread__find_symbol_by_time() and friends Jiri Olsa
2018-09-13 12:54 ` [PATCH 23/48] perf callchain: Use thread__find_addr_location_by_time() " Jiri Olsa
2018-09-13 12:54 ` [PATCH 24/48] perf tools: Add a test case for timed map groups handling Jiri Olsa
2018-09-13 12:54 ` [PATCH 25/48] perf tools: Save timestamp of a map creation Jiri Olsa
2018-09-13 12:54 ` [PATCH 26/48] perf tools: Introduce map_groups__{insert,find}_by_time() Jiri Olsa
2018-09-13 12:54 ` [PATCH 27/48] perf tools: Use map_groups__find_addr_by_time() Jiri Olsa
2018-09-13 12:54 ` [PATCH 28/48] perf tools: Add testcase for managing maps with time Jiri Olsa
2018-09-13 12:54 ` [PATCH 29/48] perf callchain: Maintain libunwind's address space in map_groups Jiri Olsa
2018-09-14 18:15   ` Arnaldo Carvalho de Melo
2018-09-14 19:00     ` Jiri Olsa
2018-09-13 12:54 ` [PATCH 30/48] perf tools: Rename perf_evlist__munmap_filtered to perf_mmap__put_filtered Jiri Olsa
2018-09-13 12:54 ` [PATCH 31/48] tools lib fd array: Introduce fdarray__add_clone function Jiri Olsa
2018-09-13 12:54 ` [PATCH 32/48] tools lib subcmd: Add OPT_INTEGER_OPTARG|_SET options Jiri Olsa
2018-09-13 12:54 ` [PATCH 33/48] perf tools: Move __perf_session__process_events args into struct Jiri Olsa
2018-09-13 12:54 ` [PATCH 34/48] perf ui progress: Fix index progress display Jiri Olsa
2018-09-13 12:54 ` [PATCH 35/48] perf tools: Add threads debug variable Jiri Olsa
2018-09-13 12:54 ` [PATCH 36/48] perf tools: Add perf_mmap__read_tail function Jiri Olsa
2018-09-13 12:54 ` [PATCH 37/48] perf record: Introduce struct record_thread Jiri Olsa
2018-09-17 11:26   ` Namhyung Kim
2018-09-23 19:31     ` Jiri Olsa
2018-09-13 12:54 ` [PATCH 38/48] perf record: Read record thread's mmaps Jiri Olsa
2018-09-17 11:28   ` Namhyung Kim
2018-09-23 19:35     ` Jiri Olsa
2018-09-13 12:54 ` [PATCH 39/48] perf record: Move waking into struct record Jiri Olsa
2018-09-17 11:31   ` Namhyung Kim
2018-09-23 19:36     ` Jiri Olsa
2018-09-13 12:54 ` [PATCH 40/48] perf record: Move samples into struct record_thread Jiri Olsa
2018-09-13 12:54 ` [PATCH 41/48] perf record: Move bytes_written " Jiri Olsa
2018-09-13 12:54 ` [PATCH 42/48] perf record: Add record_thread start/stop/process functions Jiri Olsa
2018-09-13 12:54 ` [PATCH 43/48] perf record: Wait for all threads being started Jiri Olsa
2018-09-13 12:54 ` [PATCH 44/48] perf record: Add --threads option Jiri Olsa
2018-09-17 11:37   ` Namhyung Kim
2018-09-13 12:54 ` [PATCH 45/48] perf record: Add --thread-stats option support Jiri Olsa
2018-09-13 12:54 ` [PATCH 46/48] perf record: Add maps to --thread-stats output Jiri Olsa
2018-09-13 12:54 ` [PATCH 47/48] perf record: Spread maps for --threads option Jiri Olsa
2018-09-17 11:40   ` Namhyung Kim
2018-09-23 19:44     ` Jiri Olsa
2018-09-24 14:22       ` Arnaldo Carvalho de Melo
2018-09-26  6:23         ` Jiri Olsa
2018-09-27 16:01           ` Jiri Olsa
2018-09-28  6:25             ` Namhyung Kim
2018-09-13 12:54 ` [PATCH 48/48] perf record: Spread maps for --threads=X option Jiri Olsa
2018-09-13 16:10 ` [RFCv2 00/48] perf tools: Add threads to record command Alexey Budankov
2018-09-14  2:29   ` Namhyung Kim
2018-09-14  7:15     ` Alexey Budankov
2018-09-14  8:23     ` Jiri Olsa
2018-09-14  9:40       ` Ingo Molnar
2018-09-14 11:15         ` Peter Zijlstra
2018-09-14 11:47           ` Jiri Olsa
2018-09-14 12:01             ` Peter Zijlstra
2018-09-14 12:13               ` Ingo Molnar
2018-09-14 12:19                 ` Jiri Olsa
2018-09-14 12:45                   ` Ingo Molnar
2018-09-14  9:33     ` Ingo Molnar
2018-09-14  8:26   ` Jiri Olsa
2018-09-14  8:28     ` Jiri Olsa
2018-09-14  9:37       ` Alexey Budankov
2018-09-21  6:13         ` Alexey Budankov
2018-09-21 12:15           ` Alexey Budankov
2018-09-24 19:23             ` Alexey Budankov
2018-10-02 21:41               ` Jiri Olsa
2018-10-03  7:01                 ` Alexey Budankov
2018-09-23 19:30           ` Jiri Olsa
2018-09-24  7:02             ` Alexey Budankov
2018-09-24 13:09               ` Alexey Budankov
2018-09-24 14:29                 ` Jiri Olsa
2018-09-24 18:32                   ` Alexey Budankov
2018-09-24 19:12                     ` Alexey Budankov
2018-10-05  6:14                     ` Namhyung Kim
2018-09-14 17:02 ` Andi Kleen
