* [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
@ 2014-12-24  7:14 Namhyung Kim
  2014-12-24  7:14 ` [PATCH 01/37] perf tools: Set attr.task bit for a tracking event Namhyung Kim
                   ` (38 more replies)
  0 siblings, 39 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:14 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Hello,

This patchset converts perf report to use multiple threads in order to
speed up processing of large data files.  I see a speedup of at least
40% with this change.  The code is still experimental, a little bit
outdated and has many rough edges, but I'd like to share it and get
some feedback.

perf report processes (sample) events as follows:

  1. preprocess a sample to get the matching thread/dso/symbol info
  2. insert it into the hists rbtree (with callchain tree) based on the info
  3. optionally collapse hist entries that match the given sort key(s)
  4. re-sort hist entries (by overhead) for output
  5. display the hist entries

Stage 1 is preprocessing and mostly acts as a read-only operation
during sample processing.  However, meta events like fork, comm and
mmap can change the machine/thread state, and symbols can be loaded
during the processing (stage 2).

Stage 2 consumes most of the time, especially when callchains and the
--children option are enabled.  This work can easily be partitioned
since each sample is independent of the others, but the resulting
hists must be combined/collapsed into a single global hists before
going on to the next steps.

Stage 3 is optional and only needed for certain sort keys, but with
stage 2 parallelized it must always be done to combine the per-thread
hists.

Stages 4 and 5 work on the whole hists, so they must be done serially.

So my approach is as follows:

First, partially do stage 1, but only for meta events that change the
machine state.  To do this, I added a dummy tracking event to perf
record and made it collect only such meta events.  They are saved in a
separate file (perf.header) and processed before sample events at perf
report time.

This also requires handling multiple files and finding the
corresponding machine state when processing samples.  In a large
profiling session, many tasks are created and exit, so a pid might be
recycled (even more than once!).  To deal with this, I keep threads,
map_groups and comms sorted by time.  The only remaining issue is
symbol loading, since it's done lazily when a sample requires it.

With that done, stage 2 can be performed by multiple threads.  I also
save the sample data (per-cpu or per-thread) in separate files during
record.  At perf report time, each file is processed by a separate
thread, and symbol loading is protected by a mutex.

For DWARF post-unwinding, the dso cache data also needs to be
protected by a lock, and this causes heavy contention.  I added a
front cache that can be accessed without the lock, but this should be
improved IMHO.

Patches 1-10 add support for multi-file data recording.  With the
-M/--multi option, perf record creates a directory (named
'perf.data.dir' by default, though it may be renamed to 'perf.data'
for transparent conversion later) and saves meta events to the
perf.header file and sample events to perf.data.<n> files.  It'd be
better to take into account the file format change Jiri suggested [1].

Patches 11-20 manage machine and thread state using timestamps so that
it can be looked up when processing samples.  Patches 21-35 implement
the parallel report.  Finally, I implemented the 'perf data split'
command to convert a single data file into the multi-file format.

This patchset doesn't change perf record to use multiple threads, but
I think that can easily be done later if needed.

Note that the output differs slightly from the original version when
compared using a split data file, but the differences are mostly
unresolved symbols in callchains.

Here are the results:

These are just elapsed (real) times measured by the shell's 'time'
command.

The data file was recorded during a kernel build with frame-pointer
(fp) callchains and is 2.1GB in size.  The machine has 6 cores with
hyper-threading enabled, and I got similar results on my laptop too.

 time perf report    --children   --no-children   + --call-graph none
                     ----------   -------------   -------------------
 current              4m43.260s       1m32.779s             0m35.866s
 patched              4m43.710s       1m29.695s             0m33.995s
 --multi-thread       2m46.265s       0m45.486s              0m7.570s


This result is with a 7.7GB data file using libunwind for callchain
unwinding.

 time perf report    --children   --no-children   + --call-graph none
                     ----------   -------------   -------------------
 current              3m51.762s       3m10.451s              0m4.695s
 patched              2m26.030s       1m49.846s              0m4.105s
 --multi-thread       0m49.217s       0m35.106s              0m1.457s

Note that the single-thread performance improvement in the patched
version is due to the changes in patches 33-35.


This result is with the same file but using libdw for callchain
unwinding.

 time perf report    --children   --no-children   + --call-graph none
                     ----------   -------------   -------------------
 current             10m22.472s      11m42.290s              0m4.758s
 patched             10m10.625s      11m45.480s              0m4.162s
 --multi-thread       3m47.332s       3m35.235s              0m1.755s

On my Arch Linux system, callchain unwinding with libdw is much slower
than with libunwind.  I'm using elfutils version 0.160.  Also, I don't
know why --children takes less time than --no-children in the
single-threaded libdw runs.  Anyway, we can see that --multi-thread
performs much better in every case.


You can get it from 'perf/threaded-v1' branch on my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Please take a look and play with it.  Any comments are welcome! :)

Thanks,
Namhyung


[1] https://lkml.org/lkml/2013/9/1/20


Jiri Olsa (1):
  perf tools: Add new perf data command

Namhyung Kim (36):
  perf tools: Set attr.task bit for a tracking event
  perf record: Use a software dummy event to track task/mmap events
  perf tools: Use perf_data_file__fd() consistently
  perf tools: Add multi file interface to perf_data_file
  perf tools: Create separate mmap for dummy tracking event
  perf tools: Introduce perf_evlist__mmap_multi()
  perf tools: Do not use __perf_session__process_events() directly
  perf tools: Handle multi-file session properly
  perf record: Add -M/--multi option for multi file recording
  perf report: Skip dummy tracking event
  perf tools: Introduce thread__comm_time() helpers
  perf tools: Add a test case for thread comm handling
  perf tools: Use thread__comm_time() when adding hist entries
  perf tools: Convert dead thread list into rbtree
  perf tools: Introduce machine__find*_thread_time()
  perf tools: Add a test case for timed thread handling
  perf tools: Maintain map groups list in a leader thread
  perf tools: Remove thread when map groups initialization failed
  perf tools: Introduce thread__find_addr_location_time() and friends
  perf tools: Add a test case for timed map groups handling
  perf tools: Protect dso symbol loading using a mutex
  perf tools: Protect dso cache tree using dso->lock
  perf tools: Protect dso cache fd with a mutex
  perf session: Pass struct events stats to event processing functions
  perf hists: Pass hists struct to hist_entry_iter functions
  perf tools: Move BUILD_ID_SIZE definition to perf.h
  perf report: Parallelize perf report using multi-thread
  perf tools: Add missing_threads rb tree
  perf top: Always creates thread in the current task tree.
  perf tools: Fix progress ui to support multi thread
  perf record: Show total size of multi file data
  perf report: Add --multi-thread option and config item
  perf tools: Add front cache for dso data access
  perf tools: Convert lseek + read to pread
  perf callchain: Save eh/debug frame offset for dwarf unwind
  perf data: Implement 'split' subcommand

 tools/perf/Documentation/perf-data.txt   |  43 ++++
 tools/perf/Documentation/perf-record.txt |   5 +
 tools/perf/Documentation/perf-report.txt |   3 +
 tools/perf/Makefile.perf                 |   4 +
 tools/perf/builtin-annotate.c            |   5 +-
 tools/perf/builtin-data.c                | 298 ++++++++++++++++++++++++++
 tools/perf/builtin-diff.c                |   8 +-
 tools/perf/builtin-inject.c              |   9 +-
 tools/perf/builtin-record.c              |  65 ++++--
 tools/perf/builtin-report.c              | 107 ++++++++--
 tools/perf/builtin-script.c              |   5 +-
 tools/perf/builtin-top.c                 |   9 +-
 tools/perf/builtin.h                     |   1 +
 tools/perf/command-list.txt              |   1 +
 tools/perf/perf.c                        |   1 +
 tools/perf/perf.h                        |   2 +
 tools/perf/tests/builtin-test.c          |  12 ++
 tools/perf/tests/dso-data.c              |   5 +
 tools/perf/tests/dwarf-unwind.c          |  10 +-
 tools/perf/tests/hists_common.c          |   3 +-
 tools/perf/tests/hists_cumulate.c        |   4 +-
 tools/perf/tests/hists_filter.c          |   3 +-
 tools/perf/tests/hists_link.c            |   6 +-
 tools/perf/tests/hists_output.c          |   4 +-
 tools/perf/tests/tests.h                 |   3 +
 tools/perf/tests/thread-comm.c           |  47 +++++
 tools/perf/tests/thread-lookup-time.c    | 180 ++++++++++++++++
 tools/perf/tests/thread-mg-share.c       |   7 +-
 tools/perf/tests/thread-mg-time.c        |  88 ++++++++
 tools/perf/ui/browsers/hists.c           |  10 +-
 tools/perf/ui/gtk/hists.c                |   3 +
 tools/perf/util/build-id.c               |   9 +-
 tools/perf/util/build-id.h               |   2 -
 tools/perf/util/data.c                   | 188 ++++++++++++++++-
 tools/perf/util/data.h                   |  17 ++
 tools/perf/util/dso.c                    | 192 ++++++++++++-----
 tools/perf/util/dso.h                    |   5 +
 tools/perf/util/event.c                  |  85 ++++++--
 tools/perf/util/event.h                  |   6 +-
 tools/perf/util/evlist.c                 | 151 ++++++++++++--
 tools/perf/util/evlist.h                 |  22 +-
 tools/perf/util/evsel.c                  |   1 +
 tools/perf/util/evsel.h                  |  15 ++
 tools/perf/util/hist.c                   | 121 +++++++----
 tools/perf/util/hist.h                   |  12 +-
 tools/perf/util/machine.c                | 251 +++++++++++++++++++---
 tools/perf/util/machine.h                |  12 +-
 tools/perf/util/map.c                    |   1 +
 tools/perf/util/map.h                    |   2 +
 tools/perf/util/ordered-events.c         |   4 +-
 tools/perf/util/session.c                | 347 ++++++++++++++++++++++++++-----
 tools/perf/util/session.h                |   8 +-
 tools/perf/util/symbol.c                 |  34 ++-
 tools/perf/util/thread.c                 | 140 ++++++++++++-
 tools/perf/util/thread.h                 |  28 ++-
 tools/perf/util/tool.h                   |  17 ++
 tools/perf/util/unwind-libdw.c           |  11 +-
 tools/perf/util/unwind-libunwind.c       |  49 +++--
 tools/perf/util/util.c                   |  43 ++++
 tools/perf/util/util.h                   |   1 +
 60 files changed, 2381 insertions(+), 344 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-data.txt
 create mode 100644 tools/perf/builtin-data.c
 create mode 100644 tools/perf/tests/thread-comm.c
 create mode 100644 tools/perf/tests/thread-lookup-time.c
 create mode 100644 tools/perf/tests/thread-mg-time.c

-- 
2.1.3



* [PATCH 01/37] perf tools: Set attr.task bit for a tracking event
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
@ 2014-12-24  7:14 ` Namhyung Kim
  2014-12-31 11:25   ` Jiri Olsa
  2014-12-24  7:14 ` [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events Namhyung Kim
                   ` (37 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:14 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The perf_event_attr.task bit is for tracking task (fork and exit)
events, but perf_evsel__config() never set it.  This was not a problem
in practice since setting the other bits (comm/mmap) has the same
effect, but it'd be better to set it explicitly anyway.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/evsel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1e90c8557ede..e17d2b1624bc 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -709,6 +709,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 	if (opts->sample_weight)
 		perf_evsel__set_sample_bit(evsel, WEIGHT);
 
+	attr->task  = track;
 	attr->mmap  = track;
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
-- 
2.1.3



* [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
  2014-12-24  7:14 ` [PATCH 01/37] perf tools: Set attr.task bit for a tracking event Namhyung Kim
@ 2014-12-24  7:14 ` Namhyung Kim
  2014-12-26 16:27   ` David Ahern
  2014-12-24  7:14 ` [PATCH 03/37] perf tools: Use perf_data_file__fd() consistently Namhyung Kim
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:14 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Prepend a software dummy event to the evlist to track task/comm/mmap
events separately.  This is a preparation for multi-file/thread
support, which will come later.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c |  3 +++
 tools/perf/perf.h           |  1 +
 tools/perf/util/evlist.c    | 38 ++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h    |  1 +
 tools/perf/util/evsel.h     | 15 +++++++++++++++
 5 files changed, 58 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8648c6d3003d..aa5fa6aabb31 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -862,6 +862,9 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 		goto out_symbol_exit;
 	}
 
+	if (rec->opts.multi_file)
+		perf_evlist__prepend_dummy(rec->evlist);
+
 	if (rec->opts.target.tid && !rec->opts.no_inherit_set)
 		rec->opts.no_inherit = true;
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 1dabb8553499..37284eb47b56 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -53,6 +53,7 @@ struct record_opts {
 	bool	     sample_time;
 	bool	     period;
 	bool	     sample_intr_regs;
+	bool	     multi_file;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int user_freq;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index cfbe2b99b9aa..72dff295237e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -193,6 +193,44 @@ int perf_evlist__add_default(struct perf_evlist *evlist)
 	return -ENOMEM;
 }
 
+int perf_evlist__prepend_dummy(struct perf_evlist *evlist)
+{
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_SOFTWARE,
+		.config = PERF_COUNT_SW_DUMMY,
+	};
+	struct perf_evsel *evsel, *pos;
+
+	event_attr_init(&attr);
+
+	evsel = perf_evsel__new(&attr);
+	if (evsel == NULL)
+		goto error;
+
+	/* use strdup() because free(evsel) assumes name is allocated */
+	evsel->name = strdup("dummy");
+	if (!evsel->name)
+		goto error_free;
+
+	list_for_each_entry(pos, &evlist->entries, node) {
+		pos->idx += 1;
+		pos->tracking = false;
+	}
+
+	list_add(&evsel->node, &evlist->entries);
+	evsel->idx = 0;
+	evsel->tracking = true;
+
+	if (!evlist->nr_entries++)
+		perf_evlist__set_id_pos(evlist);
+
+	return 0;
+error_free:
+	perf_evsel__delete(evsel);
+error:
+	return -ENOMEM;
+}
+
 static int perf_evlist__add_attrs(struct perf_evlist *evlist,
 				  struct perf_event_attr *attrs, size_t nr_attrs)
 {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 649b0c597283..b974bddf6b8b 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -67,6 +67,7 @@ void perf_evlist__delete(struct perf_evlist *evlist);
 
 void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
 int perf_evlist__add_default(struct perf_evlist *evlist);
+int perf_evlist__prepend_dummy(struct perf_evlist *evlist);
 int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
 				     struct perf_event_attr *attrs, size_t nr_attrs);
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 38622747d130..5b45eea63043 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -331,6 +331,21 @@ static inline bool perf_evsel__is_function_event(struct perf_evsel *evsel)
 #undef FUNCTION_EVENT
 }
 
+/**
+ * perf_evsel__is_dummy_tracking - Return whether given evsel is a dummy
+ * event for tracking meta events only
+ *
+ * @evsel - evsel selector to be tested
+ *
+ * Return %true if event is a dummy tracking event
+ */
+static inline bool perf_evsel__is_dummy_tracking(struct perf_evsel *evsel)
+{
+	return evsel->attr.type == PERF_TYPE_SOFTWARE &&
+		evsel->attr.config == PERF_COUNT_SW_DUMMY &&
+		evsel->attr.task == 1 && evsel->idx == 0;
+}
+
 struct perf_attr_details {
 	bool freq;
 	bool verbose;
-- 
2.1.3



* [PATCH 03/37] perf tools: Use perf_data_file__fd() consistently
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
  2014-12-24  7:14 ` [PATCH 01/37] perf tools: Set attr.task bit for a tracking event Namhyung Kim
  2014-12-24  7:14 ` [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events Namhyung Kim
@ 2014-12-24  7:14 ` Namhyung Kim
  2014-12-26 16:30   ` David Ahern
  2014-12-24  7:15 ` [PATCH 04/37] perf tools: Add multi file interface to perf_data_file Namhyung Kim
                   ` (35 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:14 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Do not reference file->fd directly since we want to hide the
implementation details from outside to support multi-file storage.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-inject.c |  9 ++++++---
 tools/perf/builtin-record.c | 14 ++++++++------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 84df2deed988..d8b13407594d 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -375,8 +375,10 @@ static int __cmd_inject(struct perf_inject *inject)
 		}
 	}
 
-	if (!file_out->is_pipe)
-		lseek(file_out->fd, session->header.data_offset, SEEK_SET);
+	if (!file_out->is_pipe) {
+		lseek(perf_data_file__fd(file_out), session->header.data_offset,
+		      SEEK_SET);
+	}
 
 	ret = perf_session__process_events(session, &inject->tool);
 
@@ -385,7 +387,8 @@ static int __cmd_inject(struct perf_inject *inject)
 			perf_header__set_feat(&session->header,
 					      HEADER_BUILD_ID);
 		session->header.data_size = inject->bytes_written;
-		perf_session__write_header(session, session->evlist, file_out->fd, true);
+		perf_session__write_header(session, session->evlist,
+					   perf_data_file__fd(file_out), true);
 	}
 
 	return ret;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index aa5fa6aabb31..054c6e57d3b9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -196,7 +196,7 @@ static int process_buildids(struct record *rec)
 	struct perf_session *session = rec->session;
 	u64 start = session->header.data_offset;
 
-	u64 size = lseek(file->fd, 0, SEEK_CUR);
+	u64 size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
 	if (size == 0)
 		return 0;
 
@@ -360,12 +360,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		perf_header__clear_feat(&session->header, HEADER_GROUP_DESC);
 
 	if (file->is_pipe) {
-		err = perf_header__write_pipe(file->fd);
+		err = perf_header__write_pipe(perf_data_file__fd(file));
 		if (err < 0)
 			goto out_child;
 	} else {
 		err = perf_session__write_header(session, rec->evlist,
-						 file->fd, false);
+						 perf_data_file__fd(file), false);
 		if (err < 0)
 			goto out_child;
 	}
@@ -397,8 +397,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			 * return this more properly and also
 			 * propagate errors that now are calling die()
 			 */
-			err = perf_event__synthesize_tracing_data(tool, file->fd, rec->evlist,
-								  process_synthesized_event);
+			err = perf_event__synthesize_tracing_data(tool,
+						perf_data_file__fd(file),
+						rec->evlist,
+						process_synthesized_event);
 			if (err <= 0) {
 				pr_err("Couldn't record tracing data.\n");
 				goto out_child;
@@ -541,7 +543,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (!rec->no_buildid)
 			process_buildids(rec);
 		perf_session__write_header(rec->session, rec->evlist,
-					   file->fd, true);
+					   perf_data_file__fd(&rec->file), true);
 	}
 
 out_delete_session:
-- 
2.1.3



* [PATCH 04/37] perf tools: Add multi file interface to perf_data_file
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2014-12-24  7:14 ` [PATCH 03/37] perf tools: Use perf_data_file__fd() consistently Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-25 22:08   ` Jiri Olsa
  2014-12-31 11:26   ` Jiri Olsa
  2014-12-24  7:15 ` [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
                   ` (34 subsequent siblings)
  38 siblings, 2 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When multi-file storage is enabled, the perf data files are saved in a
directory (default: perf.data.dir) containing a single header file for
metadata (task/comm/mmap events and the file header) and multiple data
files (sample events), like below:

  $ tree perf.data.dir
  perf.data.dir
  |-- perf.data.0
  |-- perf.data.1
  |-- perf.data.2
  |-- perf.data.3
  `-- perf.header

  0 directories, 5 files

Make the existing data file interface support multiple files
internally, and add new perf_data_file__prepare_write() and
perf_data_file__write_multi() functions to support multi-file
recording.  Note that a multi-file read interface is not needed since
the data files are accessed via mmap.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/data.c | 149 +++++++++++++++++++++++++++++++++++++++++++++++--
 tools/perf/util/data.h |  14 +++++
 tools/perf/util/util.c |  43 ++++++++++++++
 tools/perf/util/util.h |   1 +
 4 files changed, 201 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 1921942fc2e0..8dacd34659cc 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -4,6 +4,7 @@
 #include <sys/stat.h>
 #include <unistd.h>
 #include <string.h>
+#include <dirent.h>
 
 #include "data.h"
 #include "util.h"
@@ -39,26 +40,97 @@ static int check_backup(struct perf_data_file *file)
 		char oldname[PATH_MAX];
 		snprintf(oldname, sizeof(oldname), "%s.old",
 			 file->path);
-		unlink(oldname);
+
+		if (S_ISDIR(st.st_mode))
+			rm_rf(oldname);
+		else
+			unlink(oldname);
+
 		rename(file->path, oldname);
 	}
 
 	return 0;
 }
 
+static int scandir_filter(const struct dirent *d)
+{
+	return !prefixcmp(d->d_name, "perf.data.");
+}
+
+static int open_file_read_multi(struct perf_data_file *file)
+{
+	int i, n;
+	int ret;
+	struct dirent **list;
+
+	n = scandir(file->path, &list, scandir_filter, versionsort);
+	if (n <= 0) {
+		ret = -errno;
+		pr_err("cannot find multi-data file\n");
+		return ret;
+	}
+
+	file->multi_fd = malloc(n * sizeof(int));
+	if (file->multi_fd == NULL) {
+		free(list);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < n; i++) {
+		char path[PATH_MAX];
+
+		scnprintf(path, sizeof(path), "%s/%s",
+			  file->path, list[i]->d_name);
+
+		ret = open(path, O_RDONLY);
+		if (ret < 0)
+			goto out_err;
+
+		file->multi_fd[i] = ret;
+	}
+
+	file->nr_multi = n;
+
+	free(list);
+	return 0;
+
+out_err:
+	while (--i >= 0)
+		close(file->multi_fd[i]);
+
+	zfree(&file->multi_fd);
+	free(list);
+	return ret;
+}
+
+static const char *default_data_path(struct perf_data_file *file)
+{
+	return file->is_multi ? "perf.data.dir" : "perf.data";
+}
+
 static int open_file_read(struct perf_data_file *file)
 {
 	struct stat st;
+	char path[PATH_MAX];
 	int fd;
 	char sbuf[STRERR_BUFSIZE];
 
-	fd = open(file->path, O_RDONLY);
+	strcpy(path, file->path);
+	if (file->is_multi) {
+		if (open_file_read_multi(file) < 0)
+			return -1;
+
+		if (path__join(path, sizeof(path), file->path, "perf.header") < 0)
+			return -1;
+	}
+
+	fd = open(path, O_RDONLY);
 	if (fd < 0) {
 		int err = errno;
 
-		pr_err("failed to open %s: %s", file->path,
+		pr_err("failed to open %s: %s", path,
 			strerror_r(err, sbuf, sizeof(sbuf)));
-		if (err == ENOENT && !strcmp(file->path, "perf.data"))
+		if (err == ENOENT && !strcmp(path, default_data_path(file)))
 			pr_err("  (try 'perf record' first)");
 		pr_err("\n");
 		return -err;
@@ -90,12 +162,26 @@ static int open_file_read(struct perf_data_file *file)
 static int open_file_write(struct perf_data_file *file)
 {
 	int fd;
+	char path[PATH_MAX];
 	char sbuf[STRERR_BUFSIZE];
 
 	if (check_backup(file))
 		return -1;
 
-	fd = open(file->path, O_CREAT|O_RDWR|O_TRUNC, S_IRUSR|S_IWUSR);
+	strcpy(path, file->path);
+
+	if (file->is_multi) {
+		if (mkdir(file->path, S_IRWXU) < 0) {
+			pr_err("cannot create data directory `%s': %s\n",
+			       file->path, strerror_r(errno, sbuf, sizeof(sbuf)));
+			return -1;
+		}
+
+		if (path__join(path, sizeof(path), file->path, "perf.header") < 0)
+			return -1;
+	}
+
+	fd = open(path, O_CREAT|O_RDWR|O_TRUNC, S_IRUSR|S_IWUSR);
 
 	if (fd < 0)
 		pr_err("failed to open %s : %s\n", file->path,
@@ -121,18 +207,69 @@ int perf_data_file__open(struct perf_data_file *file)
 		return 0;
 
 	if (!file->path)
-		file->path = "perf.data";
+		file->path = default_data_path(file);
 
 	return open_file(file);
 }
 
 void perf_data_file__close(struct perf_data_file *file)
 {
+	if (file->is_multi) {
+		int i;
+
+		for (i = 0; i < file->nr_multi; i++)
+			close(file->multi_fd[i]);
+
+		zfree(&file->multi_fd);
+	}
+
 	close(file->fd);
 }
 
+int perf_data_file__prepare_write(struct perf_data_file *file, int nr)
+{
+	int i;
+	int ret;
+	char path[PATH_MAX];
+
+	if (!file->is_multi)
+		return 0;
+
+	file->multi_fd = malloc(nr * sizeof(int));
+	if (file->multi_fd == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < nr; i++) {
+		scnprintf(path, sizeof(path), "%s/perf.data.%d", file->path, i);
+		ret = open(path, O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);
+		if (ret < 0)
+			goto out_err;
+
+		file->multi_fd[i] = ret;
+	}
+
+	file->nr_multi = nr;
+	return 0;
+
+out_err:
+	while (--i >= 0)
+		close(file->multi_fd[i]);
+
+	zfree(&file->multi_fd);
+	return ret;
+}
+
 ssize_t perf_data_file__write(struct perf_data_file *file,
 			      void *buf, size_t size)
 {
 	return writen(file->fd, buf, size);
 }
+
+ssize_t perf_data_file__write_multi(struct perf_data_file *file,
+				    void *buf, size_t size, int idx)
+{
+	if (!file->is_multi)
+		return -1;
+
+	return writen(file->multi_fd[idx], buf, size);
+}
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index 2b15d0c95c7f..f5c229166614 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -11,6 +11,9 @@ enum perf_data_mode {
 struct perf_data_file {
 	const char		*path;
 	int			 fd;
+	int			 nr_multi;
+	int			*multi_fd;
+	bool			 is_multi;
 	bool			 is_pipe;
 	bool			 force;
 	unsigned long		 size;
@@ -37,6 +40,14 @@ static inline int perf_data_file__fd(struct perf_data_file *file)
 	return file->fd;
 }
 
+static inline int perf_data_file__multi_fd(struct perf_data_file *file, int idx)
+{
+	if (!file->is_multi || idx >= file->nr_multi)
+		return -1;
+
+	return file->multi_fd[idx];
+}
+
 static inline unsigned long perf_data_file__size(struct perf_data_file *file)
 {
 	return file->size;
@@ -46,5 +57,8 @@ int perf_data_file__open(struct perf_data_file *file);
 void perf_data_file__close(struct perf_data_file *file);
 ssize_t perf_data_file__write(struct perf_data_file *file,
 			      void *buf, size_t size);
+int perf_data_file__prepare_write(struct perf_data_file *file, int nr);
+ssize_t perf_data_file__write_multi(struct perf_data_file *file,
+				    void *buf, size_t size, int idx);
 
 #endif /* __PERF_DATA_H */
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index d5eab3f3323f..a5046d52e311 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -72,6 +72,49 @@ int mkdir_p(char *path, mode_t mode)
 	return (stat(path, &st) && mkdir(path, mode)) ? -1 : 0;
 }
 
+int rm_rf(char *path)
+{
+	DIR *dir;
+	int ret = 0;
+	struct dirent *d;
+	char namebuf[PATH_MAX];
+
+	dir = opendir(path);
+	if (dir == NULL)
+		return 0;
+
+	while ((d = readdir(dir)) != NULL && !ret) {
+		struct stat statbuf;
+
+		if (d->d_name[0] == '.')
+			continue;
+
+		scnprintf(namebuf, sizeof(namebuf), "%s/%s",
+			  path, d->d_name);
+
+		ret = stat(namebuf, &statbuf);
+		if (ret < 0) {
+			pr_debug("stat failed: %s\n", namebuf);
+			break;
+		}
+
+		if (S_ISREG(statbuf.st_mode))
+			ret = unlink(namebuf);
+		else if (S_ISDIR(statbuf.st_mode))
+			ret = rm_rf(namebuf);
+		else {
+			pr_debug("unknown file: %s\n", namebuf);
+			ret = -1;
+		}
+	}
+	closedir(dir);
+
+	if (ret < 0)
+		return ret;
+
+	return rmdir(path);
+}
+
 static int slow_copyfile(const char *from, const char *to, mode_t mode)
 {
 	int err = -1;
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index abc445ee4f60..d75975a71d2c 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -248,6 +248,7 @@ static inline int sane_case(int x, int high)
 }
 
 int mkdir_p(char *path, mode_t mode);
+int rm_rf(char *path);
 int copyfile(const char *from, const char *to);
 int copyfile_mode(const char *from, const char *to, mode_t mode);
 
-- 
2.1.3



* [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 04/37] perf tools: Add multi file interface to perf_data_file Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-25 22:08   ` Jiri Olsa
                     ` (3 more replies)
  2014-12-24  7:15 ` [PATCH 06/37] perf tools: Introduce perf_evlist__mmap_multi() Namhyung Kim
                   ` (33 subsequent siblings)
  38 siblings, 4 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When multi-file support is enabled, a dummy tracking event will be
used to track metadata (like task, comm and mmap events) for a session,
and the actual samples will be recorded in separate files.

Provide a separate mmap for the dummy tracking event.  Its size is
fixed to 128KiB (+ 1 page), as the event rate is expected to be lower
than that of samples.  I originally wanted to use a single mmap for
this, but cross-CPU sharing is not allowed, so it's per-CPU (or
per-task) like normal mmaps.
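As a rough illustration of the sizing above, the sketch below reproduces the arithmetic of the TRACK_MMAP_SIZE macro, the ring-buffer mask computed in __perf_evlist__mmap(), and the negative-index encoding done by track_mmap_idx(). It is a standalone sketch, not perf code: track_mmap_size() and ring_mask() are hypothetical helper names, and the page size is passed in rather than read from perf's global page_size.

```c
#include <assert.h>
#include <stddef.h>

/* Same arithmetic as TRACK_MMAP_SIZE: 128KiB rounded down to whole
 * pages, plus one extra page for the ring buffer's control page. */
static size_t track_mmap_size(size_t page_size)
{
	assert(page_size > 0);
	return ((128 * 1024 / page_size) + 1) * page_size;
}

/* Same mask as __perf_evlist__mmap(): the first page of the mapping is
 * the control page and the rest is the data ring. */
static size_t ring_mask(size_t len, size_t page_size)
{
	return len - page_size - 1;
}

/* Same encoding as track_mmap_idx(): 0, 1, 2, ... <-> -1, -2, -3, ...
 * Applying it twice gives back the original index. */
static int track_mmap_idx(int idx)
{
	return -idx - 1;
}
```

With 4KiB pages, track_mmap_size(4096) is 135168 bytes (33 pages) and the mask is 0x1ffff; with 64KiB pages the mapping grows to 196608 bytes, but the data area, and therefore the mask, stays at 128KiB.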

Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c |   9 +++-
 tools/perf/util/evlist.c    | 104 +++++++++++++++++++++++++++++++++++---------
 tools/perf/util/evlist.h    |  11 ++++-
 3 files changed, 102 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 054c6e57d3b9..129fab35fdc5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -69,7 +69,7 @@ static int process_synthesized_event(struct perf_tool *tool,
 
 static int record__mmap_read(struct record *rec, int idx)
 {
-	struct perf_mmap *md = &rec->evlist->mmap[idx];
+	struct perf_mmap *md = perf_evlist__mmap_desc(rec->evlist, idx);
 	unsigned int head = perf_mmap__read_head(md);
 	unsigned int old = md->prev;
 	unsigned char *data = md->base + page_size;
@@ -105,6 +105,7 @@ static int record__mmap_read(struct record *rec, int idx)
 	}
 
 	md->prev = old;
+
 	perf_evlist__mmap_consume(rec->evlist, idx);
 out:
 	return rc;
@@ -263,6 +264,12 @@ static int record__mmap_read_all(struct record *rec)
 				goto out;
 			}
 		}
+		if (rec->evlist->track_mmap) {
+			if (record__mmap_read(rec, track_mmap_idx(i)) != 0) {
+				rc = -1;
+				goto out;
+			}
+		}
 	}
 
 	/*
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 72dff295237e..d99343b988fe 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -27,6 +27,7 @@
 
 static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
 static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx);
+static void __perf_evlist__munmap_track(struct perf_evlist *evlist, int idx);
 
 #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
 #define SID(e, x, y) xyarray__entry(e->sample_id, x, y)
@@ -735,22 +736,39 @@ static bool perf_mmap__empty(struct perf_mmap *md)
 	return perf_mmap__read_head(md) != md->prev;
 }
 
+struct perf_mmap *perf_evlist__mmap_desc(struct perf_evlist *evlist, int idx)
+{
+	if (idx >= 0)
+		return &evlist->mmap[idx];
+	else
+		return &evlist->track_mmap[track_mmap_idx(idx)];
+}
+
 static void perf_evlist__mmap_get(struct perf_evlist *evlist, int idx)
 {
-	++evlist->mmap[idx].refcnt;
+	struct perf_mmap *md = perf_evlist__mmap_desc(evlist, idx);
+
+	++md->refcnt;
 }
 
 static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
 {
-	BUG_ON(evlist->mmap[idx].refcnt == 0);
+	struct perf_mmap *md = perf_evlist__mmap_desc(evlist, idx);
+
+	BUG_ON(md->refcnt == 0);
+
+	if (--md->refcnt != 0)
+		return;
 
-	if (--evlist->mmap[idx].refcnt == 0)
+	if (idx >= 0)
 		__perf_evlist__munmap(evlist, idx);
+	else
+		__perf_evlist__munmap_track(evlist, track_mmap_idx(idx));
 }
 
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
 {
-	struct perf_mmap *md = &evlist->mmap[idx];
+	struct perf_mmap *md = perf_evlist__mmap_desc(evlist, idx);
 
 	if (!evlist->overwrite) {
 		unsigned int old = md->prev;
@@ -771,6 +789,15 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
 	}
 }
 
+static void __perf_evlist__munmap_track(struct perf_evlist *evlist, int idx)
+{
+	if (evlist->track_mmap[idx].base != NULL) {
+		munmap(evlist->track_mmap[idx].base, TRACK_MMAP_SIZE);
+		evlist->track_mmap[idx].base = NULL;
+		evlist->track_mmap[idx].refcnt = 0;
+	}
+}
+
 void perf_evlist__munmap(struct perf_evlist *evlist)
 {
 	int i;
@@ -782,23 +809,43 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
 		__perf_evlist__munmap(evlist, i);
 
 	zfree(&evlist->mmap);
+
+	if (evlist->track_mmap == NULL)
+		return;
+
+	for (i = 0; i < evlist->nr_mmaps; i++)
+		__perf_evlist__munmap_track(evlist, i);
+
+	zfree(&evlist->track_mmap);
 }
 
-static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
+static int perf_evlist__alloc_mmap(struct perf_evlist *evlist, bool track_mmap)
 {
 	evlist->nr_mmaps = cpu_map__nr(evlist->cpus);
 	if (cpu_map__empty(evlist->cpus))
 		evlist->nr_mmaps = thread_map__nr(evlist->threads);
 	evlist->mmap = zalloc(evlist->nr_mmaps * sizeof(struct perf_mmap));
-	return evlist->mmap != NULL ? 0 : -ENOMEM;
+	if (evlist->mmap == NULL)
+		return -ENOMEM;
+
+	if (track_mmap) {
+		evlist->track_mmap = calloc(evlist->nr_mmaps,
+					    sizeof(struct perf_mmap));
+		if (evlist->track_mmap == NULL) {
+			zfree(&evlist->mmap);
+			return -ENOMEM;
+		}
+	}
+	return 0;
 }
 
 struct mmap_params {
-	int prot;
-	int mask;
+	int	prot;
+	size_t	len;
 };
 
-static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
+static int __perf_evlist__mmap(struct perf_evlist *evlist __maybe_unused,
+			       struct perf_mmap *pmmap,
 			       struct mmap_params *mp, int fd)
 {
 	/*
@@ -814,15 +861,14 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 	 * evlist layer can't just drop it when filtering events in
 	 * perf_evlist__filter_pollfd().
 	 */
-	evlist->mmap[idx].refcnt = 2;
-	evlist->mmap[idx].prev = 0;
-	evlist->mmap[idx].mask = mp->mask;
-	evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, mp->prot,
-				      MAP_SHARED, fd, 0);
-	if (evlist->mmap[idx].base == MAP_FAILED) {
+	pmmap->refcnt = 2;
+	pmmap->prev = 0;
+	pmmap->mask = mp->len - page_size - 1;
+	pmmap->base = mmap(NULL, mp->len, mp->prot, MAP_SHARED, fd, 0);
+	if (pmmap->base == MAP_FAILED) {
 		pr_debug2("failed to mmap perf event ring buffer, error %d\n",
 			  errno);
-		evlist->mmap[idx].base = NULL;
+		pmmap->base = NULL;
 		return -1;
 	}
 
@@ -843,9 +889,22 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 
 		fd = FD(evsel, cpu, thread);
 
-		if (*output == -1) {
+		if (perf_evsel__is_dummy_tracking(evsel)) {
+			struct mmap_params track_mp = {
+				.prot	= mp->prot,
+				.len	= TRACK_MMAP_SIZE,
+			};
+
+			if (__perf_evlist__mmap(evlist, &evlist->track_mmap[idx],
+						&track_mp, fd) < 0)
+				return -1;
+
+			/* mark idx as track mmap idx (negative) */
+			idx = track_mmap_idx(idx);
+		} else if (*output == -1) {
 			*output = fd;
-			if (__perf_evlist__mmap(evlist, idx, mp, *output) < 0)
+			if (__perf_evlist__mmap(evlist, &evlist->mmap[idx],
+						mp, *output) < 0)
 				return -1;
 		} else {
 			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
@@ -874,6 +933,11 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 			perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
 						 thread);
 		}
+
+		if (mp->track && perf_evsel__is_dummy_tracking(evsel)) {
+			/* restore idx as normal idx (positive) */
+			idx = track_mmap_idx(idx);
+		}
 	}
 
 	return 0;
@@ -1025,7 +1089,7 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
 	};
 
-	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
+	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist, use_track_mmap) < 0)
 		return -ENOMEM;
 
 	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
@@ -1034,7 +1098,7 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	evlist->overwrite = overwrite;
 	evlist->mmap_len = perf_evlist__mmap_size(pages);
 	pr_debug("mmap size %zuB\n", evlist->mmap_len);
-	mp.mask = evlist->mmap_len - page_size - 1;
+	mp.len = evlist->mmap_len;
 
 	evlist__for_each(evlist, evsel) {
 		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b974bddf6b8b..b7f54b8577f7 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -48,11 +48,14 @@ struct perf_evlist {
 	bool		 overwrite;
 	struct fdarray	 pollfd;
 	struct perf_mmap *mmap;
+	struct perf_mmap *track_mmap;
 	struct thread_map *threads;
 	struct cpu_map	  *cpus;
 	struct perf_evsel *selected;
 };
 
+#define TRACK_MMAP_SIZE  (((128 * 1024 / page_size) + 1) * page_size)
+
 struct perf_evsel_str_handler {
 	const char *name;
 	void	   *handler;
@@ -100,8 +103,8 @@ struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
 struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
 
 union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
-
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
+struct perf_mmap *perf_evlist__mmap_desc(struct perf_evlist *evlist, int idx);
 
 int perf_evlist__open(struct perf_evlist *evlist);
 void perf_evlist__close(struct perf_evlist *evlist);
@@ -211,6 +214,12 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
 void perf_evlist__to_front(struct perf_evlist *evlist,
 			   struct perf_evsel *move_evsel);
 
+/* convert from/to negative idx for track mmaps */
+static inline int track_mmap_idx(int idx)
+{
+	return -idx - 1;
+}
+
 /**
  * __evlist__for_each - iterate thru all the evsels
  * @list: list_head instance to iterate
-- 
2.1.3


* [PATCH 06/37] perf tools: Introduce perf_evlist__mmap_multi()
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (4 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 07/37] perf tools: Do not use __perf_session__process_events() directly Namhyung Kim
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The perf_evlist__mmap_multi() function creates data mmaps and,
optionally, tracking mmaps for events.  It'll be used by perf record to
save tracking events in separate files.  Checking for the dummy
tracking event in perf_evlist__mmap() alone is not enough, as users can
specify the dummy event first (as in the keep-tracking test case)
without the multi-file option.
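The reason for the extra parameter can be sketched as a simplified decision function: an event gets a track mmap only when the caller asked for one *and* the event is the dummy tracking event, while the old entry point keeps its signature by passing false, as the inline perf_evlist__mmap() wrapper in this patch does. struct evsel and both helpers are hypothetical stand-ins, not perf's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for struct perf_evsel. */
struct evsel {
	bool is_dummy_tracking;
};

/* Mirrors the "mp->track && perf_evsel__is_dummy_tracking(evsel)" test:
 * true means the event's fd goes to a separate track mmap. */
static bool uses_track_mmap(const struct evsel *evsel, bool use_track_mmap)
{
	assert(evsel != NULL);
	return use_track_mmap && evsel->is_dummy_tracking;
}

/* Old-style entry point: behaves as before the patch, so the dummy
 * event shares the normal data mmap. */
static bool uses_track_mmap_compat(const struct evsel *evsel)
{
	return uses_track_mmap(evsel, false);
}
```

So a dummy event placed first by the keep-tracking test still lands in the normal data mmap, and only perf record with the multi-file option routes it elsewhere.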

Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c |  3 ++-
 tools/perf/util/evlist.c    | 11 +++++++----
 tools/perf/util/evlist.h    | 10 ++++++++--
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 129fab35fdc5..8c91f25b81f6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -169,7 +169,8 @@ static int record__open(struct record *rec)
 		goto out;
 	}
 
-	if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
+	if (perf_evlist__mmap_multi(evlist, opts->mmap_pages, false,
+				    opts->multi_file) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index d99343b988fe..010188939104 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -841,6 +841,7 @@ static int perf_evlist__alloc_mmap(struct perf_evlist *evlist, bool track_mmap)
 
 struct mmap_params {
 	int	prot;
+	bool	track;
 	size_t	len;
 };
 
@@ -889,7 +890,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 
 		fd = FD(evsel, cpu, thread);
 
-		if (perf_evsel__is_dummy_tracking(evsel)) {
+		if (mp->track && perf_evsel__is_dummy_tracking(evsel)) {
 			struct mmap_params track_mp = {
 				.prot	= mp->prot,
 				.len	= TRACK_MMAP_SIZE,
@@ -1068,10 +1069,11 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 }
 
 /**
- * perf_evlist__mmap - Create mmaps to receive events.
+ * perf_evlist__mmap_multi - Create mmaps to receive events.
  * @evlist: list of events
  * @pages: map length in pages
  * @overwrite: overwrite older events?
+ * @use_track_mmap: use separate mmaps to track meta events
  *
  * If @overwrite is %false the user needs to signal event consumption using
  * perf_mmap__write_tail().  Using perf_evlist__mmap_read() does this
@@ -1079,14 +1081,15 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
  *
  * Return: %0 on success, negative error code otherwise.
  */
-int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
-		      bool overwrite)
+int perf_evlist__mmap_multi(struct perf_evlist *evlist, unsigned int pages,
+			    bool overwrite, bool use_track_mmap)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
 	const struct thread_map *threads = evlist->threads;
 	struct mmap_params mp = {
 		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
+		.track = use_track_mmap,
 	};
 
 	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist, use_track_mmap) < 0)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b7f54b8577f7..65c1aea6a3a4 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -127,10 +127,16 @@ int perf_evlist__parse_mmap_pages(const struct option *opt,
 				  const char *str,
 				  int unset);
 
-int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
-		      bool overwrite);
+int perf_evlist__mmap_multi(struct perf_evlist *evlist, unsigned int pages,
+			    bool overwrite, bool use_track_mmap);
 void perf_evlist__munmap(struct perf_evlist *evlist);
 
+static inline int perf_evlist__mmap(struct perf_evlist *evlist,
+				    unsigned int pages, bool overwrite)
+{
+	return perf_evlist__mmap_multi(evlist, pages, overwrite, false);
+}
+
 void perf_evlist__disable(struct perf_evlist *evlist);
 void perf_evlist__enable(struct perf_evlist *evlist);
 
-- 
2.1.3


* [PATCH 07/37] perf tools: Do not use __perf_session__process_events() directly
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (5 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 06/37] perf tools: Introduce perf_evlist__mmap_multi() Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-31 11:33   ` Jiri Olsa
  2014-12-24  7:15 ` [PATCH 08/37] perf tools: Handle multi-file session properly Namhyung Kim
                   ` (31 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

It's only used by perf record to process build-ids, because the file
size is not fixed at that point due to the remaining header features.
However, the data offset and size are available, so we can use
perf_session__process_events() once we set the file size to the
current write offset.

It turns out that we can make the function static again, since perf
record was its only external user, and add multi-file support in a
single place.
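Using the current write offset as the file size can be illustrated with plain POSIX calls: while perf record is still writing, the header features have not been appended yet, so lseek(fd, 0, SEEK_CUR) marks the end of the event data. data_size_from_offset() is a hypothetical helper for illustration only; in the patch the value is simply stored in file->size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the number of event bytes written after @data_offset, using the
 * current write position as the end of the data: 0 if nothing was
 * written past the header, -1 if the offset can't be queried. */
static off_t data_size_from_offset(int fd, off_t data_offset)
{
	off_t size = lseek(fd, 0, SEEK_CUR);	/* does not move the offset */

	assert(data_offset >= 0);
	if (size < 0) {
		perror("lseek");
		return -1;
	}
	if (size <= data_offset)
		return 0;
	return size - data_offset;
}
```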

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c | 7 +++----
 tools/perf/util/session.c   | 6 +++---
 tools/perf/util/session.h   | 3 ---
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8c91f25b81f6..4f97657f14e7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -196,12 +196,13 @@ static int process_buildids(struct record *rec)
 {
 	struct perf_data_file *file  = &rec->file;
 	struct perf_session *session = rec->session;
-	u64 start = session->header.data_offset;
 
 	u64 size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
 	if (size == 0)
 		return 0;
 
+	file->size = size;
+
 	/*
 	 * During this process, it'll load kernel map and replace the
 	 * dso->long_name to a real pathname it found.  In this case
@@ -213,9 +214,7 @@ static int process_buildids(struct record *rec)
 	 */
 	symbol_conf.ignore_vmlinux_buildid = true;
 
-	return __perf_session__process_events(session, start,
-					      size - start,
-					      size, &build_id__mark_dso_hit_ops);
+	return perf_session__process_events(session, &build_id__mark_dso_hit_ops);
 }
 
 static void perf_event__synthesize_guest_os(struct machine *machine, void *data)
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6ac62ae6b8fa..88aa2f09df93 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1252,9 +1252,9 @@ fetch_mmaped_event(struct perf_session *session,
 #define NUM_MMAPS 128
 #endif
 
-int __perf_session__process_events(struct perf_session *session,
-				   u64 data_offset, u64 data_size,
-				   u64 file_size, struct perf_tool *tool)
+static int __perf_session__process_events(struct perf_session *session,
+					  u64 data_offset, u64 data_size,
+					  u64 file_size, struct perf_tool *tool)
 {
 	int fd = perf_data_file__fd(session->file);
 	u64 head, page_offset, file_offset, file_pos, size;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index dc26ebf60fe4..6d663dc76404 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -49,9 +49,6 @@ int perf_session__peek_event(struct perf_session *session, off_t file_offset,
 			     union perf_event **event_ptr,
 			     struct perf_sample *sample);
 
-int __perf_session__process_events(struct perf_session *session,
-				   u64 data_offset, u64 data_size, u64 size,
-				   struct perf_tool *tool);
 int perf_session__process_events(struct perf_session *session,
 				 struct perf_tool *tool);
 
-- 
2.1.3


* [PATCH 08/37] perf tools: Handle multi-file session properly
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (6 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 07/37] perf tools: Do not use __perf_session__process_events() directly Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-31 12:01   ` Jiri Olsa
  2014-12-24  7:15 ` [PATCH 09/37] perf record: Add -M/--multi option for multi file recording Namhyung Kim
                   ` (30 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When perf detects a multi-file data directory, it processes the header
file first and then the remaining data files one after another.  Since
the multi-file data is recorded separately for each cpu/thread, each
file is already ordered with respect to itself, so there's no need to
use the ordered event queue interface.
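The processing order can be modelled without any perf types: the header file (meta events) is handled first, then each per-cpu/per-thread data file in turn, skipping empty ones, with no reordering across files. struct data_file and both helpers are hypothetical; the check inside process_file() only makes explicit the assumption that each file is already time-ordered on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one file: event timestamps in file order. */
struct data_file {
	const unsigned long long *ts;
	size_t nr;
};

/* Deliver every event of one file in file order.  A single ring buffer
 * writes sequentially, so timestamps should not go backwards here. */
static int process_file(const struct data_file *f, size_t *nr_events)
{
	for (size_t i = 0; i < f->nr; i++) {
		if (i > 0 && f->ts[i] < f->ts[i - 1])
			return -1;
		(*nr_events)++;
	}
	return 0;
}

/* Header first, then the data files one after another. */
static int process_session(const struct data_file *header,
			   const struct data_file *files, size_t nr_files,
			   size_t *nr_events)
{
	int err;

	assert(nr_events != NULL);
	err = process_file(header, nr_events);
	if (err)
		return err;

	for (size_t i = 0; i < nr_files; i++) {
		if (files[i].nr == 0)	/* like the "size == 0" skip */
			continue;
		err = process_file(&files[i], nr_events);
		if (err < 0)
			break;
	}
	return err;
}
```

Note that timestamps may jump backwards between files (one CPU's file ends later than the next one starts); that is exactly what the model accepts and why the per-thread comm lookup needs to be time-aware.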

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/data.c    | 17 +++++++++++++++++
 tools/perf/util/session.c | 41 +++++++++++++++++++++++++++++++----------
 2 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index 8dacd34659cc..b6f7cdc4a39f 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -52,6 +52,21 @@ static int check_backup(struct perf_data_file *file)
 	return 0;
 }
 
+static void check_multi(struct perf_data_file *file)
+{
+	struct stat st;
+
+	/*
+	 * For write, it'll be determined by user (perf record -M)
+	 * whether to enable multi file data storage.
+	 */
+	if (perf_data_file__is_write(file))
+		return;
+
+	if (!stat(file->path, &st) && S_ISDIR(st.st_mode))
+		file->is_multi = true;
+}
+
 static int scandir_filter(const struct dirent *d)
 {
 	return !prefixcmp(d->d_name, "perf.data.");
@@ -206,6 +221,8 @@ int perf_data_file__open(struct perf_data_file *file)
 	if (check_pipe(file))
 		return 0;
 
+	check_multi(file);
+
 	if (!file->path)
 		file->path = default_data_path(file);
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 88aa2f09df93..4f0fcd2d3901 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1252,11 +1252,10 @@ fetch_mmaped_event(struct perf_session *session,
 #define NUM_MMAPS 128
 #endif
 
-static int __perf_session__process_events(struct perf_session *session,
+static int __perf_session__process_events(struct perf_session *session, int fd,
 					  u64 data_offset, u64 data_size,
 					  u64 file_size, struct perf_tool *tool)
 {
-	int fd = perf_data_file__fd(session->file);
 	u64 head, page_offset, file_offset, file_pos, size;
 	int err, mmap_prot, mmap_flags, map_idx = 0;
 	size_t	mmap_size;
@@ -1362,18 +1361,40 @@ int perf_session__process_events(struct perf_session *session,
 				 struct perf_tool *tool)
 {
 	u64 size = perf_data_file__size(session->file);
-	int err;
+	int err, i;
 
 	if (perf_session__register_idle_thread(session) == NULL)
 		return -ENOMEM;
 
-	if (!perf_data_file__is_pipe(session->file))
-		err = __perf_session__process_events(session,
-						     session->header.data_offset,
-						     session->header.data_size,
-						     size, tool);
-	else
-		err = __perf_session__process_pipe_events(session, tool);
+	if (perf_data_file__is_pipe(session->file))
+		return __perf_session__process_pipe_events(session, tool);
+
+	err = __perf_session__process_events(session,
+					     perf_data_file__fd(session->file),
+					     session->header.data_offset,
+					     session->header.data_size,
+					     size, tool);
+	if (!session->file->is_multi || err)
+		return err;
+
+	/*
+	 * For multi-file data storage, events are processed for each
+	 * cpu/thread so it's already ordered.
+	 */
+	tool->ordered_events = false;
+
+	for (i = 0; i < session->file->nr_multi; i++) {
+		int fd = perf_data_file__multi_fd(session->file, i);
+
+		size = lseek(fd, 0, SEEK_END);
+		if (size == 0)
+			continue;
+
+		err = __perf_session__process_events(session, fd,
+						     0, size, size, tool);
+		if (err < 0)
+			break;
+	}
 
 	return err;
 }
-- 
2.1.3


* [PATCH 09/37] perf record: Add -M/--multi option for multi file recording
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (7 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 08/37] perf tools: Handle multi-file session properly Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 10/37] perf report: Skip dummy tracking event Namhyung Kim
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The new -M/--multi option enables multi-file storage recording.  Sample
data from each mmap are now saved in separate files, and other events
are recorded in the perf.header file.
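The routing rule applied by record__write_multi() can be modelled with byte counters in place of real files: in multi-file mode a non-negative mmap index goes to that mmap's own data file, while a negative index (a track mmap, see track_mmap_idx()) and every write in single-file mode go to the main file. struct outputs, NR_MMAPS and route_write() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MMAPS 4	/* hypothetical number of per-cpu mmaps */

/* Hypothetical byte counters standing in for the real output files. */
struct outputs {
	size_t main_bytes;		/* header file (or plain perf.data) */
	size_t data_bytes[NR_MMAPS];	/* one data file per mmap */
};

/* Non-negative idx is a sample mmap and, in multi-file mode, gets its
 * own file; negative idx (track mmap) and single-file mode go to the
 * main file. */
static void route_write(struct outputs *out, bool is_multi, int idx,
			size_t size)
{
	if (is_multi && idx >= 0) {
		assert(idx < NR_MMAPS);
		out->data_bytes[idx] += size;
		return;
	}
	out->main_bytes += size;
}
```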

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-record.txt |  5 +++++
 tools/perf/builtin-record.c              | 27 ++++++++++++++++++++++++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index af9a54ece024..14247ccd7965 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -220,6 +220,11 @@ Capture machine state (registers) at interrupt, i.e., on counter overflows for
 each sample. List of captured registers depends on the architecture. This option
 is off by default.
 
+-M::
+--multi::
+Record data in multi-file storage instead of a single data file (perf.data).
+This will speed up perf report by parallel processing.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4f97657f14e7..7f7a4725d080 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -58,6 +58,19 @@ static int record__write(struct record *rec, void *bf, size_t size)
 	return 0;
 }
 
+static int record__write_multi(struct record *rec, void *bf, size_t size, int idx)
+{
+	if (rec->file.is_multi && idx >= 0) {
+		int ret = perf_data_file__write_multi(rec->session->file,
+						      bf, size, idx);
+		if (ret < 0)
+			pr_err("failed to write perf data, error: %m\n");
+
+		return ret;
+	}
+	return record__write(rec, bf, size);
+}
+
 static int process_synthesized_event(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample __maybe_unused,
@@ -89,7 +102,7 @@ static int record__mmap_read(struct record *rec, int idx)
 		size = md->mask + 1 - (old & md->mask);
 		old += size;
 
-		if (record__write(rec, buf, size) < 0) {
+		if (record__write_multi(rec, buf, size, idx) < 0) {
 			rc = -1;
 			goto out;
 		}
@@ -99,7 +112,7 @@ static int record__mmap_read(struct record *rec, int idx)
 	size = head - old;
 	old += size;
 
-	if (record__write(rec, buf, size) < 0) {
+	if (record__write_multi(rec, buf, size, idx) < 0) {
 		rc = -1;
 		goto out;
 	}
@@ -186,6 +199,10 @@ static int record__open(struct record *rec)
 		goto out;
 	}
 
+	rc = perf_data_file__prepare_write(session->file, evlist->nr_mmaps);
+	if (rc < 0)
+		goto out;
+
 	session->evlist = evlist;
 	perf_session__set_id_hdr_size(session);
 out:
@@ -822,6 +839,8 @@ struct option __record_options[] = {
 		    "use per-thread mmaps"),
 	OPT_BOOLEAN('I', "intr-regs", &record.opts.sample_intr_regs,
 		    "Sample machine registers on interrupt"),
+	OPT_BOOLEAN('M', "multi", &record.opts.multi_file,
+		    "use multi-file storage"),
 	OPT_END()
 };
 
@@ -871,8 +890,10 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 		goto out_symbol_exit;
 	}
 
-	if (rec->opts.multi_file)
+	if (rec->opts.multi_file) {
+		rec->file.is_multi = true;
 		perf_evlist__prepend_dummy(rec->evlist);
+	}
 
 	if (rec->opts.target.tid && !rec->opts.no_inherit_set)
 		rec->opts.no_inherit = true;
-- 
2.1.3


* [PATCH 10/37] perf report: Skip dummy tracking event
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (8 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 09/37] perf record: Add -M/--multi option for multi file recording Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 11/37] perf tools: Introduce thread__comm_time() helpers Namhyung Kim
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The dummy tracking event is only for tracking task/comm/mmap events
and has no sample data of its own, so there is no need to report it;
just skip it.
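The effect on the number of entries offered to the user can be sketched as a filter over a simplified event list: the dummy tracking event is skipped before the usual group-leader check, as this patch does in the stdio, TUI and GTK paths. struct evsel and nr_report_entries() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for struct perf_evsel. */
struct evsel {
	bool is_dummy_tracking;
	bool is_group_leader;
};

/* Count the entries to offer: the dummy tracking event has no samples,
 * so it is skipped before the usual group-leader check. */
static size_t nr_report_entries(const struct evsel *evsels, size_t nr,
				bool event_group)
{
	size_t n = 0;

	assert(evsels != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (evsels[i].is_dummy_tracking)
			continue;
		if (event_group && !evsels[i].is_group_leader)
			continue;
		n++;
	}
	return n;
}
```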

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c    |  3 +++
 tools/perf/ui/browsers/hists.c | 10 ++++++++--
 tools/perf/ui/gtk/hists.c      |  3 +++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2f91094e228b..4cac79ad3085 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -318,6 +318,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 		struct hists *hists = evsel__hists(pos);
 		const char *evname = perf_evsel__name(pos);
 
+		if (perf_evsel__is_dummy_tracking(pos))
+			continue;
+
 		if (symbol_conf.event_group &&
 		    !perf_evsel__is_group_leader(pos))
 			continue;
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e6bb04b5b09b..be594e1d6a99 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1998,11 +1998,15 @@ int perf_evlist__tui_browse_hists(struct perf_evlist *evlist, const char *help,
 				  struct perf_session_env *env)
 {
 	int nr_entries = evlist->nr_entries;
+	struct perf_evsel *first = perf_evlist__first(evlist);
+
+	if (perf_evsel__is_dummy_tracking(first)) {
+		first = perf_evsel__next(first);
+		nr_entries--;
+	}
 
 single_entry:
 	if (nr_entries == 1) {
-		struct perf_evsel *first = perf_evlist__first(evlist);
-
 		return perf_evsel__hists_browse(first, nr_entries, help,
 						false, hbt, min_pcnt,
 						env);
@@ -2013,6 +2017,8 @@ int perf_evlist__tui_browse_hists(struct perf_evlist *evlist, const char *help,
 
 		nr_entries = 0;
 		evlist__for_each(evlist, pos) {
+			if (perf_evsel__is_dummy_tracking(pos))
+				continue;
 			if (perf_evsel__is_group_leader(pos))
 				nr_entries++;
 		}
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4b3585eed1e8..83a7ecd5cda8 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -317,6 +317,9 @@ int perf_evlist__gtk_browse_hists(struct perf_evlist *evlist,
 		char buf[512];
 		size_t size = sizeof(buf);
 
+		if (perf_evsel__is_dummy_tracking(pos))
+			continue;
+
 		if (symbol_conf.event_group) {
 			if (!perf_evsel__is_group_leader(pos))
 				continue;
-- 
2.1.3


* [PATCH 11/37] perf tools: Introduce thread__comm_time() helpers
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (9 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 10/37] perf report: Skip dummy tracking event Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-26 17:00   ` David Ahern
  2014-12-24  7:15 ` [PATCH 12/37] perf tools: Add a test case for thread comm handling Namhyung Kim
                   ` (27 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When multi-file data storage is enabled, perf processes all task, comm
and mmap events first and only then goes to the sample events.  So all
it sees is the last comm of a thread, even though a sample may have
been taken while an earlier comm was in effect.

Sort a thread's comms by time so that it can find the appropriate comm
at the sample time.  thread__comm_time() will mostly work even if the
PERF_SAMPLE_TIME bit is off since, in that case, sample->time will be
-1 (the maximum u64 value), so it'll take the latest comm anyway.
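A minimal model of the time-sorted comm list, using a hypothetical singly linked list instead of perf's list_head: comms are kept newest-first, insertion keeps that order regardless of processing order, and a lookup returns the newest comm that started at or before the timestamp, falling back to the oldest one. With a timestamp of (uint64_t)-1, which is what an unset sample time looks like, the newest comm is returned. struct comm, comm_insert() and comm_at() here are illustrative, not perf's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified comm entry; the list is kept newest-first. */
struct comm {
	const char *str;
	uint64_t start;
	struct comm *next;
};

/* Insert @new before the first entry that started at or before it, so
 * the list stays sorted by descending start time (where the patch's
 * list_add_tail() places the new entry).  Returns the new head. */
static struct comm *comm_insert(struct comm *head, struct comm *new)
{
	struct comm **pos = &head;

	assert(new != NULL);
	while (*pos && new->start < (*pos)->start)
		pos = &(*pos)->next;
	new->next = *pos;
	*pos = new;
	return head;
}

/* Return the comm in effect at @timestamp: the newest entry that started
 * at or before it, else the oldest entry, or NULL for an empty list. */
static const struct comm *comm_at(const struct comm *head, uint64_t timestamp)
{
	const struct comm *c;

	if (head == NULL)
		return NULL;
	for (c = head; c != NULL; c = c->next) {
		if (timestamp >= c->start)
			return c;
	}
	/* older than every comm: fall back to the oldest (last) entry */
	for (c = head; c->next != NULL; c = c->next)
		;
	return c;
}
```

For example, after inserting "perf" (start 200) and then "bash" (start 100), a lookup at 150 returns the bash entry and a lookup at (uint64_t)-1 returns the perf entry.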

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/thread.c | 34 +++++++++++++++++++++++++++++++++-
 tools/perf/util/thread.h |  2 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 9ebc8b1f9be5..083fa0fcf316 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -103,6 +103,22 @@ struct comm *thread__exec_comm(const struct thread *thread)
 	return last;
 }
 
+struct comm *thread__comm_time(const struct thread *thread, u64 timestamp)
+{
+	struct comm *comm;
+
+	list_for_each_entry(comm, &thread->comm_list, list) {
+		if (timestamp >= comm->start)
+			return comm;
+	}
+
+	if (list_empty(&thread->comm_list))
+		return NULL;
+
+	return list_last_entry(&thread->comm_list, struct comm, list);
+}
+
+/* CHECKME: time should always be 0 if event aren't ordered */
 int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
 		       bool exec)
 {
@@ -118,7 +134,13 @@ int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
 		new = comm__new(str, timestamp, exec);
 		if (!new)
 			return -ENOMEM;
-		list_add(&new->list, &thread->comm_list);
+
+		/* sort by time */
+		list_for_each_entry(curr, &thread->comm_list, list) {
+			if (timestamp >= curr->start)
+				break;
+		}
+		list_add_tail(&new->list, &curr->list);
 
 		if (exec)
 			unwind__flush_access(thread);
@@ -139,6 +161,16 @@ const char *thread__comm_str(const struct thread *thread)
 	return comm__str(comm);
 }
 
+const char *thread__comm_time_str(const struct thread *thread, u64 timestamp)
+{
+	const struct comm *comm = thread__comm_time(thread, timestamp);
+
+	if (!comm)
+		return NULL;
+
+	return comm__str(comm);
+}
+
 /* CHECKME: it should probably better return the max comm len from its comm list */
 int thread__comm_len(struct thread *thread)
 {
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 160fd066a7d1..0b6dcd70bc8b 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -53,7 +53,9 @@ static inline int thread__set_comm(struct thread *thread, const char *comm,
 int thread__comm_len(struct thread *thread);
 struct comm *thread__comm(const struct thread *thread);
 struct comm *thread__exec_comm(const struct thread *thread);
+struct comm *thread__comm_time(const struct thread *thread, u64 timestamp);
 const char *thread__comm_str(const struct thread *thread);
+const char *thread__comm_time_str(const struct thread *thread, u64 timestamp);
 void thread__insert_map(struct thread *thread, struct map *map);
 int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp);
 size_t thread__fprintf(struct thread *thread, FILE *fp);
-- 
2.1.3



* [PATCH 12/37] perf tools: Add a test case for thread comm handling
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (10 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 11/37] perf tools: Introduce thread__comm_time() helpers Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 13/37] perf tools: Use thread__comm_time() when adding hist entries Namhyung Kim
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The new test case checks various aspects of thread comm handling, such
as overriding the initial comm and sorting comms by time.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Makefile.perf        |  1 +
 tools/perf/tests/builtin-test.c |  4 ++++
 tools/perf/tests/tests.h        |  1 +
 tools/perf/tests/thread-comm.c  | 47 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+)
 create mode 100644 tools/perf/tests/thread-comm.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 763e68fb5767..e4528a4a3a8c 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -447,6 +447,7 @@ endif
 LIB_OBJS += $(OUTPUT)tests/mmap-thread-lookup.o
 LIB_OBJS += $(OUTPUT)tests/thread-mg-share.o
 LIB_OBJS += $(OUTPUT)tests/switch-tracking.o
+LIB_OBJS += $(OUTPUT)tests/thread-comm.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 4b7d9ab0f049..1b463d82a71a 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -167,6 +167,10 @@ static struct test {
 		.func = test__fdarray__add,
 	},
 	{
+		.desc = "Test thread comm handling",
+		.func = test__thread_comm,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 00e776a87a9c..43ac17780629 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -51,6 +51,7 @@ int test__hists_cumulate(void);
 int test__switch_tracking(void);
 int test__fdarray__filter(void);
 int test__fdarray__add(void);
+int test__thread_comm(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
diff --git a/tools/perf/tests/thread-comm.c b/tools/perf/tests/thread-comm.c
new file mode 100644
index 000000000000..b81a429a6305
--- /dev/null
+++ b/tools/perf/tests/thread-comm.c
@@ -0,0 +1,47 @@
+#include "tests.h"
+#include "machine.h"
+#include "thread.h"
+#include "debug.h"
+
+int test__thread_comm(void)
+{
+	struct machines machines;
+	struct machine *machine;
+	struct thread *t;
+
+	/*
+	 * This test is to check whether it can retrieve a correct
+	 * comm for a given time.  When multi-file data storage is
+	 * enabled, those task/comm events are processed first so the
+	 * later sample should find a matching comm properly.
+	 */
+	machines__init(&machines);
+	machine = &machines.host;
+
+	t = machine__findnew_thread(machine, 100, 100);
+	TEST_ASSERT_VAL("wrong init thread comm",
+			!strcmp(thread__comm_str(t), ":100"));
+
+	thread__set_comm(t, "perf-test1", 10000);
+	TEST_ASSERT_VAL("failed to override thread comm",
+			!strcmp(thread__comm_str(t), "perf-test1"));
+
+	thread__set_comm(t, "perf-test2", 20000);
+	thread__set_comm(t, "perf-test3", 30000);
+	thread__set_comm(t, "perf-test4", 40000);
+
+	TEST_ASSERT_VAL("failed to find timed comm",
+			!strcmp(thread__comm_time_str(t, 20000), "perf-test2"));
+	TEST_ASSERT_VAL("failed to find timed comm",
+			!strcmp(thread__comm_time_str(t, 35000), "perf-test3"));
+	TEST_ASSERT_VAL("failed to find timed comm",
+			!strcmp(thread__comm_time_str(t, 50000), "perf-test4"));
+
+	thread__set_comm(t, "perf-test1.5", 15000);
+	TEST_ASSERT_VAL("failed to sort timed comm",
+			!strcmp(thread__comm_time_str(t, 15000), "perf-test1.5"));
+
+	machine__delete_threads(machine);
+	machines__exit(&machines);
+	return 0;
+}
-- 
2.1.3



* [PATCH 13/37] perf tools: Use thread__comm_time() when adding hist entries
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (11 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 12/37] perf tools: Add a test case for thread comm handling Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-25 22:53   ` Jiri Olsa
  2014-12-24  7:15 ` [PATCH 14/37] perf tools: Convert dead thread list into rbtree Namhyung Kim
                   ` (25 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Now that thread->comm can be handled properly with respect to time,
use it to find the correct comm when adding hist entries.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-annotate.c |  5 +++--
 tools/perf/builtin-diff.c     |  8 ++++----
 tools/perf/tests/hists_link.c |  4 ++--
 tools/perf/util/hist.c        | 19 ++++++++++---------
 tools/perf/util/hist.h        |  2 +-
 5 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 747f86103599..a3b6d9d14925 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -47,7 +47,7 @@ struct perf_annotate {
 };
 
 static int perf_evsel__add_sample(struct perf_evsel *evsel,
-				  struct perf_sample *sample __maybe_unused,
+				  struct perf_sample *sample,
 				  struct addr_location *al,
 				  struct perf_annotate *ann)
 {
@@ -67,7 +67,8 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(hists, al, NULL, NULL, NULL, 1, 1, 0, true);
+	he = __hists__add_entry(hists, al, NULL, NULL, NULL, 1, 1, 0,
+				sample->time, true);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 260b10c19ad6..2b86607a8571 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -312,10 +312,10 @@ static int formula_fprintf(struct hist_entry *he, struct hist_entry *pair,
 
 static int hists__add_entry(struct hists *hists,
 			    struct addr_location *al, u64 period,
-			    u64 weight, u64 transaction)
+			    u64 weight, u64 transaction, u64 timestamp)
 {
 	if (__hists__add_entry(hists, al, NULL, NULL, NULL, period, weight,
-			       transaction, true) != NULL)
+			       transaction, timestamp, true) != NULL)
 		return 0;
 	return -ENOMEM;
 }
@@ -335,8 +335,8 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused,
 		return -1;
 	}
 
-	if (hists__add_entry(hists, &al, sample->period,
-			     sample->weight, sample->transaction)) {
+	if (hists__add_entry(hists, &al, sample->period, sample->weight,
+			     sample->transaction, sample->time)) {
 		pr_warning("problem incrementing symbol period, skipping event\n");
 		return -1;
 	}
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 278ba8344c23..fe7cb886c23e 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -90,7 +90,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(hists, &al, NULL,
-						NULL, NULL, 1, 1, 0, true);
+						NULL, NULL, 1, 1, 0, -1, true);
 			if (he == NULL)
 				goto out;
 
@@ -114,7 +114,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 				goto out;
 
 			he = __hists__add_entry(hists, &al, NULL,
-						NULL, NULL, 1, 1, 0, true);
+						NULL, NULL, 1, 1, 0, -1, true);
 			if (he == NULL)
 				goto out;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 9314286ed25c..d322264bac22 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -451,11 +451,11 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct branch_info *bi,
 				      struct mem_info *mi,
 				      u64 period, u64 weight, u64 transaction,
-				      bool sample_self)
+				      u64 timestamp, bool sample_self)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
-		.comm = thread__comm(al->thread),
+		.comm = thread__comm_time(al->thread, timestamp),
 		.ms = {
 			.map	= al->map,
 			.sym	= al->sym,
@@ -513,13 +513,14 @@ iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al
 {
 	u64 cost;
 	struct mem_info *mi = iter->priv;
+	struct perf_sample *sample = iter->sample;
 	struct hists *hists = evsel__hists(iter->evsel);
 	struct hist_entry *he;
 
 	if (mi == NULL)
 		return -EINVAL;
 
-	cost = iter->sample->weight;
+	cost = sample->weight;
 	if (!cost)
 		cost = 1;
 
@@ -531,7 +532,7 @@ iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al
 	 * and the he_stat__add_period() function.
 	 */
 	he = __hists__add_entry(hists, al, iter->parent, NULL, mi,
-				cost, cost, 0, true);
+				cost, cost, 0, sample->time, true);
 	if (!he)
 		return -ENOMEM;
 
@@ -632,7 +633,7 @@ iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *a
 	 * and not events sampled. Thus we use a pseudo period of 1.
 	 */
 	he = __hists__add_entry(hists, al, iter->parent, &bi[i], NULL,
-				1, 1, 0, true);
+				1, 1, 0, iter->sample->time, true);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -670,7 +671,7 @@ iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location
 
 	he = __hists__add_entry(evsel__hists(evsel), al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction, true);
+				sample->transaction, sample->time, true);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -732,7 +733,7 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
 
 	he = __hists__add_entry(hists, al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction, true);
+				sample->transaction, sample->time, true);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -776,7 +777,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	struct hist_entry he_tmp = {
 		.cpu = al->cpu,
 		.thread = al->thread,
-		.comm = thread__comm(al->thread),
+		.comm = thread__comm_time(al->thread, sample->time),
 		.ip = al->addr,
 		.ms = {
 			.map = al->map,
@@ -805,7 +806,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 
 	he = __hists__add_entry(evsel__hists(evsel), al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
-				sample->transaction, false);
+				sample->transaction, sample->time, false);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 46bd50344f85..b86966206ba8 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -109,7 +109,7 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct branch_info *bi,
 				      struct mem_info *mi, u64 period,
 				      u64 weight, u64 transaction,
-				      bool sample_self);
+				      u64 timestamp, bool sample_self);
 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
 			 struct perf_evsel *evsel, struct perf_sample *sample,
 			 int max_stack_depth, void *arg);
-- 
2.1.3



* [PATCH 14/37] perf tools: Convert dead thread list into rbtree
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (12 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 13/37] perf tools: Use thread__comm_time() when adding hist entries Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-25 23:05   ` Jiri Olsa
  2014-12-27 15:31   ` David Ahern
  2014-12-24  7:15 ` [PATCH 15/37] perf tools: Introduce machine__find*_thread_time() Namhyung Kim
                   ` (24 subsequent siblings)
  38 siblings, 2 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Currently perf maintains dead threads in a linked list, but this is a
problem if someone needs to search it, since every lookup has to walk
the list linearly.  Convert it to an rbtree like the one used for live
threads; it will be used later by the multi-file changes.
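The layout this patch ends up with (one tree node per tid, with any
further dead threads of the same tid chained off that node) can be
modeled with the self-contained sketch below.  It is only an
approximation under stated assumptions: it uses a plain, unbalanced
binary search tree instead of the kernel's red-black tree, and struct
dthread and the dead_*() helpers are hypothetical names, not perf
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified dead thread record. */
struct dthread {
	pid_t tid;
	struct dthread *left, *right;	/* tree links, ordered by tid */
	struct dthread *chain;		/* other dead threads with the same tid */
};

/*
 * Add @th to the dead thread tree: an unseen tid gets its own tree node,
 * a tid that is already present is appended to that node's chain.
 */
static void dead_insert(struct dthread **root, struct dthread *th)
{
	struct dthread **p = root;

	assert(th != NULL);
	th->left = th->right = th->chain = NULL;

	while (*p) {
		struct dthread *pos = *p;

		if (pos->tid == th->tid) {
			struct dthread **c = &pos->chain;

			while (*c)		/* append, like list_add_tail() */
				c = &(*c)->chain;
			*c = th;
			return;
		}
		p = th->tid < pos->tid ? &pos->left : &pos->right;
	}
	*p = th;
}

/* Find the tree node for @tid by descending the tree, or NULL. */
static struct dthread *dead_find(struct dthread *root, pid_t tid)
{
	while (root && root->tid != tid)
		root = tid < root->tid ? root->left : root->right;
	return root;
}

/* Count all dead threads, visiting chained entries as well. */
static size_t dead_count(const struct dthread *root)
{
	const struct dthread *c;
	size_t n = 0;

	if (!root)
		return 0;
	for (c = root; c; c = c->chain)
		n++;
	return n + dead_count(root->left) + dead_count(root->right);
}
```

dead_count() mirrors how machine__for_each_thread() visits both the
tree nodes and their chained entries.  A real implementation would keep
the tree balanced, which is what the rbtree in the patch provides.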

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/machine.c | 59 +++++++++++++++++++++++++++++++++++++++--------
 tools/perf/util/machine.h |  2 +-
 tools/perf/util/thread.c  |  1 +
 tools/perf/util/thread.h  | 11 ++++-----
 4 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 15dd0a9691ce..582e011adc92 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -28,7 +28,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	dsos__init(&machine->kernel_dsos);
 
 	machine->threads = RB_ROOT;
-	INIT_LIST_HEAD(&machine->dead_threads);
+	machine->dead_threads = RB_ROOT;
 	machine->last_match = NULL;
 
 	machine->vdso_info = NULL;
@@ -91,10 +91,21 @@ static void dsos__delete(struct dsos *dsos)
 
 void machine__delete_dead_threads(struct machine *machine)
 {
-	struct thread *n, *t;
+	struct rb_node *nd = rb_first(&machine->dead_threads);
+
+	while (nd) {
+		struct thread *t = rb_entry(nd, struct thread, rb_node);
+		struct thread *pos;
+
+		nd = rb_next(nd);
+		rb_erase(&t->rb_node, &machine->dead_threads);
+
+		while (!list_empty(&t->node)) {
+			pos = list_first_entry(&t->node, struct thread, node);
+			list_del(&pos->node);
+			thread__delete(pos);
+		}
 
-	list_for_each_entry_safe(t, n, &machine->dead_threads, node) {
-		list_del(&t->node);
 		thread__delete(t);
 	}
 }
@@ -106,8 +117,8 @@ void machine__delete_threads(struct machine *machine)
 	while (nd) {
 		struct thread *t = rb_entry(nd, struct thread, rb_node);
 
-		rb_erase(&t->rb_node, &machine->threads);
 		nd = rb_next(nd);
+		rb_erase(&t->rb_node, &machine->threads);
 		thread__delete(t);
 	}
 }
@@ -1236,13 +1247,36 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 
 static void machine__remove_thread(struct machine *machine, struct thread *th)
 {
+	struct rb_node **p = &machine->dead_threads.rb_node;
+	struct rb_node *parent = NULL;
+	struct thread *pos;
+
 	machine->last_match = NULL;
 	rb_erase(&th->rb_node, &machine->threads);
+
+	th->dead = true;
+
 	/*
 	 * We may have references to this thread, for instance in some hist_entry
-	 * instances, so just move them to a separate list.
+	 * instances, so just move them to a separate list in rbtree.
 	 */
-	list_add_tail(&th->node, &machine->dead_threads);
+	while (*p != NULL) {
+		parent = *p;
+		pos = rb_entry(parent, struct thread, rb_node);
+
+		if (pos->tid == th->tid) {
+			list_add_tail(&th->node, &pos->node);
+			return;
+		}
+
+		if (th->tid < pos->tid)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	rb_link_node(&th->rb_node, parent, p);
+	rb_insert_color(&th->rb_node, &machine->dead_threads);
 }
 
 int machine__process_fork_event(struct machine *machine, union perf_event *event,
@@ -1649,7 +1683,7 @@ int machine__for_each_thread(struct machine *machine,
 			     void *priv)
 {
 	struct rb_node *nd;
-	struct thread *thread;
+	struct thread *thread, *pos;
 	int rc = 0;
 
 	for (nd = rb_first(&machine->threads); nd; nd = rb_next(nd)) {
@@ -1659,10 +1693,17 @@ int machine__for_each_thread(struct machine *machine,
 			return rc;
 	}
 
-	list_for_each_entry(thread, &machine->dead_threads, node) {
+	for (nd = rb_first(&machine->dead_threads); nd; nd = rb_next(nd)) {
+		thread = rb_entry(nd, struct thread, rb_node);
 		rc = fn(thread, priv);
 		if (rc != 0)
 			return rc;
+
+		list_for_each_entry(pos, &thread->node, node) {
+			rc = fn(pos, priv);
+			if (rc != 0)
+				return rc;
+		}
 	}
 	return rc;
 }
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index e8b7779a0a3f..4349946a38ff 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -30,7 +30,7 @@ struct machine {
 	bool		  comm_exec;
 	char		  *root_dir;
 	struct rb_root	  threads;
-	struct list_head  dead_threads;
+	struct rb_root	  dead_threads;
 	struct thread	  *last_match;
 	struct vdso_info  *vdso_info;
 	struct dsos	  user_dsos;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 083fa0fcf316..b9c5c5d5e718 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -38,6 +38,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->ppid = -1;
 		thread->cpu = -1;
 		INIT_LIST_HEAD(&thread->comm_list);
+		INIT_LIST_HEAD(&thread->node);
 
 		if (unwind__prepare_access(thread) < 0)
 			goto err_thread;
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 0b6dcd70bc8b..413f28cf689b 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -11,10 +11,8 @@
 struct thread_stack;
 
 struct thread {
-	union {
-		struct rb_node	 rb_node;
-		struct list_head node;
-	};
+	struct rb_node	 	rb_node;
+	struct list_head 	node;
 	struct map_groups	*mg;
 	pid_t			pid_; /* Not all tools update this */
 	pid_t			tid;
@@ -22,7 +20,8 @@ struct thread {
 	int			cpu;
 	char			shortname[3];
 	bool			comm_set;
-	bool			dead; /* if set thread has exited */
+	bool			exited; /* if set thread has exited */
+	bool			dead; /* thread is in dead_threads list */
 	struct list_head	comm_list;
 	int			comm_len;
 	u64			db_id;
@@ -39,7 +38,7 @@ int thread__init_map_groups(struct thread *thread, struct machine *machine);
 void thread__delete(struct thread *thread);
 static inline void thread__exited(struct thread *thread)
 {
-	thread->dead = true;
+	thread->exited = true;
 }
 
 int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
-- 
2.1.3



* [PATCH 15/37] perf tools: Introduce machine__find*_thread_time()
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (13 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 14/37] perf tools: Convert dead thread list into rbtree Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-27 16:33   ` David Ahern
  2014-12-24  7:15 ` [PATCH 16/37] perf tools: Add a test case for timed thread handling Namhyung Kim
                   ` (23 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When multi-file data storage is enabled, threads need to be looked up
based on the sample time, since sample processing is done after the
other (task, comm and mmap) events have been processed.  This can be a
problem if a session is very long and a pid is recycled: in that case
only the last thread with that pid would be seen.

So keep the thread start time in struct thread, and search for threads
based on that time.  This patch introduces the
machine__find{,new}_thread_time() functions for this.  They first search
the current thread rbtree and then the dead thread tree and its lists.
If no matching thread is found, machine__findnew_thread_time() creates a
new one, while machine__find_thread_time() returns NULL.

A sample timestamp of 0 means the function is called for a synthesized
event, so just use the current rbtree.  The timestamp will be -1 if the
sample didn't record a timestamp, so current threads will be found
automatically.

Dead threads are managed in an rbtree, and if more than one thread has
the same tid, the extra ones are kept in a list.
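The time-based choice among threads that reused a tid can be sketched
as follows.  This is a simplified model, not the patch's code: struct
tthread and thread_at() are hypothetical, a linear scan over an array
replaces the walk over the time-sorted dead thread list, and the special
handling of timestamp 0 for synthesized events is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical, simplified thread record: only what the lookup needs. */
struct tthread {
	pid_t tid;
	uint64_t start_time;	/* fork or first comm time */
};

/*
 * @cand holds every thread seen with one tid over the session (pid reuse).
 * Return the one that was current at @timestamp, i.e. the latest
 * start_time that is not after @timestamp, or NULL if all of them started
 * later.  With timestamp == (uint64_t)-1 (no PERF_SAMPLE_TIME) the newest
 * one wins.
 */
static const struct tthread *thread_at(const struct tthread *cand, size_t n,
				       uint64_t timestamp)
{
	const struct tthread *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(cand[i].tid == cand[0].tid);	/* same tid only */

		if (cand[i].start_time > timestamp)
			continue;	/* not started yet at @timestamp */
		if (!best || cand[i].start_time > best->start_time)
			best = &cand[i];
	}
	return best;
}
```

A NULL result here roughly corresponds to the case where
machine__findnew_thread_time() creates a new thread and
machine__find_thread_time() returns NULL.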

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-script.c     |   5 +-
 tools/perf/tests/dwarf-unwind.c |  10 ++--
 tools/perf/tests/hists_common.c |   3 +-
 tools/perf/tests/hists_link.c   |   2 +-
 tools/perf/util/event.c         |   4 +-
 tools/perf/util/machine.c       | 109 +++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/machine.h       |   8 ++-
 tools/perf/util/thread.c        |   4 ++
 tools/perf/util/thread.h        |   1 +
 9 files changed, 132 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ce304dfd962a..85122b388d8e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -549,8 +549,9 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
 				struct machine *machine)
 {
 	struct addr_location al;
-	struct thread *thread = machine__findnew_thread(machine, sample->pid,
-							sample->tid);
+	struct thread *thread = machine__findnew_thread_time(machine, sample->pid,
+							     sample->tid,
+							     sample->time);
 
 	if (thread == NULL) {
 		pr_debug("problem processing %d event, skipping it.\n",
diff --git a/tools/perf/tests/dwarf-unwind.c b/tools/perf/tests/dwarf-unwind.c
index ab28cca2cb97..dfecd49fef90 100644
--- a/tools/perf/tests/dwarf-unwind.c
+++ b/tools/perf/tests/dwarf-unwind.c
@@ -13,10 +13,10 @@
 
 static int mmap_handler(struct perf_tool *tool __maybe_unused,
 			union perf_event *event,
-			struct perf_sample *sample __maybe_unused,
+			struct perf_sample *sample,
 			struct machine *machine)
 {
-	return machine__process_mmap2_event(machine, event, NULL);
+	return machine__process_mmap2_event(machine, event, sample);
 }
 
 static int init_live_machine(struct machine *machine)
@@ -61,12 +61,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 __attribute__ ((noinline))
 static int unwind_thread(struct thread *thread)
 {
-	struct perf_sample sample;
+	struct perf_sample sample = {
+		.time = -1ULL,
+	};
 	unsigned long cnt = 0;
 	int err = -1;
 
-	memset(&sample, 0, sizeof(sample));
-
 	if (test__arch_unwind_sample(&sample, thread)) {
 		pr_debug("failed to get unwind sample\n");
 		goto out;
diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index a62c09134516..86a8fdb41804 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -80,6 +80,7 @@ static struct {
 struct machine *setup_fake_machine(struct machines *machines)
 {
 	struct machine *machine = machines__find(machines, HOST_KERNEL_ID);
+	struct perf_sample sample = { .time = -1ULL, };
 	size_t i;
 
 	if (machine == NULL) {
@@ -113,7 +114,7 @@ struct machine *setup_fake_machine(struct machines *machines)
 		strcpy(fake_mmap_event.mmap.filename,
 		       fake_mmap_info[i].filename);
 
-		machine__process_mmap_event(machine, &fake_mmap_event, NULL);
+		machine__process_mmap_event(machine, &fake_mmap_event, &sample);
 	}
 
 	for (i = 0; i < ARRAY_SIZE(fake_symbols); i++) {
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index fe7cb886c23e..07f1d19b88b5 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -64,7 +64,7 @@ static int add_hist_entries(struct perf_evlist *evlist, struct machine *machine)
 	struct perf_evsel *evsel;
 	struct addr_location al;
 	struct hist_entry *he;
-	struct perf_sample sample = { .period = 1, };
+	struct perf_sample sample = { .period = 1, .time = -1ULL, };
 	size_t i = 0, k;
 
 	/*
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 6c6d044e959a..ff7594a27c73 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -825,8 +825,8 @@ int perf_event__preprocess_sample(const union perf_event *event,
 				  struct perf_sample *sample)
 {
 	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
-	struct thread *thread = machine__findnew_thread(machine, sample->pid,
-							sample->tid);
+	struct thread *thread = machine__findnew_thread_time(machine, sample->pid,
+							     sample->tid, sample->time);
 
 	if (thread == NULL)
 		return -1;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 582e011adc92..2cc088d71922 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -431,6 +431,103 @@ struct thread *machine__find_thread(struct machine *machine, pid_t pid,
 	return __machine__findnew_thread(machine, pid, tid, false);
 }
 
+static void machine__remove_thread(struct machine *machine, struct thread *th);
+
+static struct thread *__machine__findnew_thread_time(struct machine *machine,
+						     pid_t pid, pid_t tid,
+						     u64 timestamp, bool create)
+{
+	struct thread *curr, *pos, *new;
+	struct thread *th = NULL;
+	struct rb_node **p;
+	struct rb_node *parent = NULL;
+	bool initial = timestamp == (u64)0;
+
+	curr = __machine__findnew_thread(machine, pid, tid, initial);
+	if (curr && timestamp >= curr->start_time)
+		return curr;
+
+	p = &machine->dead_threads.rb_node;
+	while (*p != NULL) {
+		parent = *p;
+		th = rb_entry(parent, struct thread, rb_node);
+
+		if (th->tid == tid) {
+			list_for_each_entry(pos, &th->node, node) {
+				if (timestamp >= pos->start_time &&
+				    pos->start_time > th->start_time) {
+					th = pos;
+					break;
+				}
+			}
+
+			if (timestamp >= th->start_time) {
+				machine__update_thread_pid(machine, th, pid);
+				return th;
+			}
+			break;
+		}
+
+		if (tid < th->tid)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	if (!create)
+		return NULL;
+
+	if (!curr)
+		return __machine__findnew_thread(machine, pid, tid, true);
+
+	new = thread__new(pid, tid);
+	if (new == NULL)
+		return NULL;
+
+	new->start_time = timestamp;
+
+	if (*p) {
+		list_for_each_entry(pos, &th->node, node) {
+			/* sort by time */
+			if (timestamp >= pos->start_time) {
+				th = pos;
+				break;
+			}
+		}
+		list_add_tail(&new->node, &th->node);
+	} else {
+		rb_link_node(&new->rb_node, parent, p);
+		rb_insert_color(&new->rb_node, &machine->dead_threads);
+	}
+
+	/*
+	 * We have to initialize map_groups separately
+	 * after rb tree is updated.
+	 *
+	 * The reason is that we call machine__findnew_thread
+	 * within thread__init_map_groups to find the thread
+	 * leader and that would screwed the rb tree.
+	 */
+	if (thread__init_map_groups(new, machine)) {
+		thread__delete(new);
+		return NULL;
+	}
+
+	return new;
+}
+
+struct thread *machine__find_thread_time(struct machine *machine, pid_t pid,
+					 pid_t tid, u64 timestamp)
+{
+	return __machine__findnew_thread_time(machine, pid, tid, timestamp, false);
+}
+
+struct thread *machine__findnew_thread_time(struct machine *machine, pid_t pid,
+					    pid_t tid, u64 timestamp)
+{
+	return __machine__findnew_thread_time(machine, pid, tid, timestamp, true);
+}
+
 struct comm *machine__thread_exec_comm(struct machine *machine,
 				       struct thread *thread)
 {
@@ -1169,7 +1266,7 @@ int machine__process_mmap2_event(struct machine *machine,
 	}
 
 	thread = machine__findnew_thread(machine, event->mmap2.pid,
-					event->mmap2.tid);
+					 event->mmap2.tid);
 	if (thread == NULL)
 		goto out_problem;
 
@@ -1265,6 +1362,16 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
 		pos = rb_entry(parent, struct thread, rb_node);
 
 		if (pos->tid == th->tid) {
+			struct thread *old;
+
+			/* sort by time */
+			list_for_each_entry(old, &pos->node, node) {
+				if (th->start_time >= old->start_time) {
+					pos = old;
+					break;
+				}
+			}
+
 			list_add_tail(&th->node, &pos->node);
 			return;
 		}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 4349946a38ff..9571b6b1c5b5 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -68,8 +68,6 @@ static inline bool machine__kernel_ip(struct machine *machine, u64 ip)
 	return ip >= kernel_start;
 }
 
-struct thread *machine__find_thread(struct machine *machine, pid_t pid,
-				    pid_t tid);
 struct comm *machine__thread_exec_comm(struct machine *machine,
 				       struct thread *thread);
 
@@ -149,6 +147,12 @@ static inline bool machine__is_host(struct machine *machine)
 
 struct thread *machine__findnew_thread(struct machine *machine, pid_t pid,
 				       pid_t tid);
+struct thread *machine__find_thread(struct machine *machine, pid_t pid,
+				    pid_t tid);
+struct thread *machine__findnew_thread_time(struct machine *machine, pid_t pid,
+					    pid_t tid, u64 timestamp);
+struct thread *machine__find_thread_time(struct machine *machine, pid_t pid,
+					 pid_t tid, u64 timestamp);
 
 size_t machine__fprintf(struct machine *machine, FILE *fp);
 
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index b9c5c5d5e718..f2465f17cf16 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -128,6 +128,9 @@ int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
 
 	/* Override the default :tid entry */
 	if (!thread->comm_set) {
+		if (!thread->start_time)
+			thread->start_time = timestamp;
+
 		err = comm__override(curr, str, timestamp, exec);
 		if (err)
 			return err;
@@ -229,6 +232,7 @@ int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp)
 	}
 
 	thread->ppid = parent->tid;
+	thread->start_time = timestamp;
 	return thread__clone_map_groups(thread, parent);
 }
 
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 413f28cf689b..5c07cee3b64e 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -25,6 +25,7 @@ struct thread {
 	struct list_head	comm_list;
 	int			comm_len;
 	u64			db_id;
+	u64			start_time;
 
 	void			*priv;
 	struct thread_stack	*ts;
-- 
2.1.3



* [PATCH 16/37] perf tools: Add a test case for timed thread handling
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (14 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 15/37] perf tools: Introduce machine__find*_thread_time() Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-31 14:17   ` Jiri Olsa
  2014-12-24  7:15 ` [PATCH 17/37] perf tools: Maintain map groups list in a leader thread Namhyung Kim
                   ` (22 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

A test case that verifies management of the live and dead thread trees
as time changes, and the new machine__find{,new}_thread_time() functions.
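The lookup rule the test exercises can be modelled in isolation.  This
is a toy sketch under stated assumptions, not perf code: struct
toy_thread and toy_find_thread_time() are hypothetical, and a flat array
stands in for the live/dead rb-trees.  It assumes a thread covers a
timestamp if it started at or before it, and that the most recently
started such thread wins, which is what the t1/t2/t3 assertions above
imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct thread: only what the lookup needs. */
struct toy_thread {
	int tid;
	uint64_t start_time;
};

/*
 * Return the thread for @tid that was alive at @timestamp: the entry
 * with the largest start_time that is still <= @timestamp, or NULL if
 * every candidate started later.
 */
static const struct toy_thread *
toy_find_thread_time(const struct toy_thread *threads, size_t n,
		     int tid, uint64_t timestamp)
{
	const struct toy_thread *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct toy_thread *t = &threads[i];

		if (t->tid != tid || t->start_time > timestamp)
			continue;
		if (best == NULL || t->start_time > best->start_time)
			best = t;
	}
	return best;
}
```

With start times 20000, 50000 and 60000 this returns NULL at 10000 and
the first, second and third thread at 30000, 50000 and 70000.  In the
patch, machine__findnew_thread_time() at 10000 creates yet another
thread instead of returning NULL, which is why the test only asserts
that the result is not t1.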

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Makefile.perf              |   1 +
 tools/perf/tests/builtin-test.c       |   4 +
 tools/perf/tests/tests.h              |   1 +
 tools/perf/tests/thread-lookup-time.c | 174 ++++++++++++++++++++++++++++++++++
 4 files changed, 180 insertions(+)
 create mode 100644 tools/perf/tests/thread-lookup-time.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index e4528a4a3a8c..6094f0a10d8b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -448,6 +448,7 @@ LIB_OBJS += $(OUTPUT)tests/mmap-thread-lookup.o
 LIB_OBJS += $(OUTPUT)tests/thread-mg-share.o
 LIB_OBJS += $(OUTPUT)tests/switch-tracking.o
 LIB_OBJS += $(OUTPUT)tests/thread-comm.o
+LIB_OBJS += $(OUTPUT)tests/thread-lookup-time.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 1b463d82a71a..e4d335de19ea 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -171,6 +171,10 @@ static struct test {
 		.func = test__thread_comm,
 	},
 	{
+		.desc = "Test thread lookup with time",
+		.func = test__thread_lookup_time,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 43ac17780629..1090337f63e5 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -52,6 +52,7 @@ int test__switch_tracking(void);
 int test__fdarray__filter(void);
 int test__fdarray__add(void);
 int test__thread_comm(void);
+int test__thread_lookup_time(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
diff --git a/tools/perf/tests/thread-lookup-time.c b/tools/perf/tests/thread-lookup-time.c
new file mode 100644
index 000000000000..6237ecf8caae
--- /dev/null
+++ b/tools/perf/tests/thread-lookup-time.c
@@ -0,0 +1,174 @@
+#include "tests.h"
+#include "machine.h"
+#include "thread.h"
+#include "map.h"
+#include "debug.h"
+
+static int thread__print_cb(struct thread *th, void *arg __maybe_unused)
+{
+	printf("thread: %d, start time: %"PRIu64" %s\n",
+	       th->tid, th->start_time, th->dead ? "(dead)" : "");
+	return 0;
+}
+
+static int lookup_with_timestamp(struct machine *machine)
+{
+	struct thread *t1, *t2, *t3;
+	union perf_event fork = {
+		.fork = {
+			.pid = 0,
+			.tid = 0,
+			.ppid = 1,
+			.ptid = 1,
+		},
+	};
+	struct perf_sample sample = {
+		.time = 50000,
+	};
+
+	/* start_time is set to 0 */
+	t1 = machine__findnew_thread(machine, 0, 0);
+
+	if (verbose > 1) {
+		printf("========= after t1 created ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	TEST_ASSERT_VAL("wrong start time of old thread", t1->start_time == 0);
+
+	TEST_ASSERT_VAL("cannot find current thread",
+			machine__find_thread(machine, 0, 0) == t1);
+
+	TEST_ASSERT_VAL("cannot find current thread with time",
+			machine__findnew_thread_time(machine, 0, 0, 10000) == t1);
+
+	/* start_time is overwritten to new value */
+	thread__set_comm(t1, "/usr/bin/perf", 20000);
+
+	if (verbose > 1) {
+		printf("========= after t1 set comm ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	TEST_ASSERT_VAL("failed to update start time", t1->start_time == 20000);
+
+	TEST_ASSERT_VAL("should not find passed thread",
+			/* this will create yet another dead thread */
+			machine__findnew_thread_time(machine, 0, 0, 10000) != t1);
+
+	TEST_ASSERT_VAL("cannot find overwritten thread with time",
+			machine__find_thread_time(machine, 0, 0, 20000) == t1);
+
+	/* now t1 goes to dead thread tree, and create t2 */
+	machine__process_fork_event(machine, &fork, &sample);
+
+	if (verbose > 1) {
+		printf("========= after t2 forked ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	t2 = machine__find_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t2 != NULL);
+
+	TEST_ASSERT_VAL("wrong start time of new thread", t2->start_time == 50000);
+
+	TEST_ASSERT_VAL("dead thread cannot be found",
+			machine__find_thread_time(machine, 0, 0, 10000) != t1);
+
+	TEST_ASSERT_VAL("cannot find dead thread after new thread",
+			machine__find_thread_time(machine, 0, 0, 30000) == t1);
+
+	TEST_ASSERT_VAL("cannot find current thread after new thread",
+			machine__find_thread_time(machine, 0, 0, 50000) == t2);
+
+	/* now t2 goes to dead thread tree, and create t3 */
+	sample.time = 60000;
+	machine__process_fork_event(machine, &fork, &sample);
+
+	if (verbose > 1) {
+		printf("========= after t3 forked ==========\n");
+		machine__for_each_thread(machine, thread__print_cb, NULL);
+	}
+
+	t3 = machine__find_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t3 != NULL);
+
+	TEST_ASSERT_VAL("wrong start time of new thread", t3->start_time == 60000);
+
+	TEST_ASSERT_VAL("cannot find dead thread after new thread",
+			machine__findnew_thread_time(machine, 0, 0, 30000) == t1);
+
+	TEST_ASSERT_VAL("cannot find dead thread after new thread",
+			machine__findnew_thread_time(machine, 0, 0, 50000) == t2);
+
+	TEST_ASSERT_VAL("cannot find current thread after new thread",
+			machine__findnew_thread_time(machine, 0, 0, 70000) == t3);
+
+	machine__delete_threads(machine);
+	return 0;
+}
+
+static int lookup_without_timestamp(struct machine *machine)
+{
+	struct thread *t1, *t2, *t3;
+	union perf_event fork = {
+		.fork = {
+			.pid = 0,
+			.tid = 0,
+			.ppid = 1,
+			.ptid = 1,
+		},
+	};
+	struct perf_sample sample = {
+		.time = -1ULL,
+	};
+
+	t1 = machine__findnew_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t1 != NULL);
+
+	TEST_ASSERT_VAL("cannot find new thread with time",
+			machine__findnew_thread_time(machine, 0, 0, -1ULL) == t1);
+
+	machine__process_fork_event(machine, &fork, &sample);
+
+	t2 = machine__find_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t2 != NULL);
+
+	TEST_ASSERT_VAL("cannot find new thread with time",
+			machine__find_thread_time(machine, 0, 0, -1ULL) == t2);
+
+	machine__process_fork_event(machine, &fork, &sample);
+
+	t3 = machine__find_thread(machine, 0, 0);
+	TEST_ASSERT_VAL("cannot find current thread", t3 != NULL);
+
+	TEST_ASSERT_VAL("cannot find new thread with time",
+			machine__findnew_thread_time(machine, 0, 0, -1ULL) == t3);
+
+	machine__delete_threads(machine);
+	return 0;
+}
+
+int test__thread_lookup_time(void)
+{
+	struct machines machines;
+	struct machine *machine;
+
+	/*
+	 * This test is to check whether it can retrieve a correct
+	 * thread for a given time.  When multi-file data storage is
+	 * enabled, those task/comm/mmap events are processed first so
+	 * the later sample should find a matching thread properly.
+	 */
+	machines__init(&machines);
+	machine = &machines.host;
+
+	if (lookup_with_timestamp(machine) < 0)
+		return -1;
+
+	if (lookup_without_timestamp(machine) < 0)
+		return -1;
+
+	machines__exit(&machines);
+	return 0;
+}
-- 
2.1.3



* [PATCH 17/37] perf tools: Maintain map groups list in a leader thread
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (15 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 16/37] perf tools: Add a test case for timed thread handling Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 18/37] perf tools: Remove thread when map groups initialization failed Namhyung Kim
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

To support multi-threaded perf report, we need to maintain time-sorted
map groups.  Add ->mg_list member to struct thread and sort the list
by time.  Now leader threads have one more refcnt for map groups in
the list so also update the thread-mg-share test case.
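Where the extra reference comes from can be checked with a small model
of the counts asserted in thread-mg-share (pid 0 with four threads, pid
4 with two).  The names below are hypothetical; the model assumes, as
thread__set_map_groups() does in this patch, that the refcnt of 1 from
map_groups__new() is owned by the list and that the leader's ->mg takes
one more.

```c
#include <assert.h>

/* Hypothetical stand-in for the refcnt field of struct map_groups. */
struct toy_mg_ref {
	int refcnt;
};

static void toy_mg_ref_get(struct toy_mg_ref *mg)
{
	mg->refcnt++;
}

/*
 * References to a thread group's current map groups: one owned by the
 * leader's mg_list, one by the leader's ->mg, one per non-leader ->mg.
 */
static int toy_group_refcnt(int nr_threads)
{
	struct toy_mg_ref mg = { .refcnt = 1 };	/* owned by mg_list */
	int i;

	toy_mg_ref_get(&mg);			/* leader->mg (current) */
	for (i = 1; i < nr_threads; i++)
		toy_mg_ref_get(&mg);		/* each non-leader ->mg */
	return mg.refcnt;
}
```

This gives 5 for the four-thread group and 3 for the two-thread group,
matching the updated assertions (previously 4 and 2).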

For now, a new map groups entry is added only when an exec (comm) event
is received.
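The ordering kept by thread__set_map_groups() and the lookup in
thread__get_map_groups() can be sketched with a plain singly linked
list.  This is an illustrative model, not the list.h-based code: struct
toy_mg and both helpers are hypothetical.  As in the patch, the list is
kept newest-first (a new entry goes in front of the first strictly older
one), and a lookup returns the first entry whose timestamp is not after
the requested time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a struct map_groups on a leader's mg_list. */
struct toy_mg {
	uint64_t timestamp;	/* when these maps became active (e.g. exec) */
	const char *name;	/* label for illustration only */
	struct toy_mg *next;
};

/* Keep the list newest-first: insert before the first strictly older entry. */
static void toy_mg_insert(struct toy_mg **head, struct toy_mg *mg)
{
	struct toy_mg **pos = head;

	while (*pos != NULL && mg->timestamp <= (*pos)->timestamp)
		pos = &(*pos)->next;
	mg->next = *pos;
	*pos = mg;
}

/* Return the newest map groups already active at @timestamp, or NULL. */
static struct toy_mg *toy_mg_lookup(struct toy_mg *head, uint64_t timestamp)
{
	for (; head != NULL; head = head->next)
		if (timestamp >= head->timestamp)
			return head;
	return NULL;
}
```

Inserting an exec-time entry at 1000 and the initial entry at 0 in
either order yields the same list; a sample at 500 then resolves to the
initial maps and a sample at 1000 or later to the post-exec maps.  The
patch falls back to thread->mg where this sketch returns NULL.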

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/thread-mg-share.c |  7 +++-
 tools/perf/util/event.c            |  2 +
 tools/perf/util/machine.c          |  2 +-
 tools/perf/util/map.c              |  1 +
 tools/perf/util/map.h              |  2 +
 tools/perf/util/thread.c           | 80 +++++++++++++++++++++++++++++++++++++-
 tools/perf/util/thread.h           |  3 ++
 7 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/thread-mg-share.c b/tools/perf/tests/thread-mg-share.c
index b028499dd3cf..8933e01d0549 100644
--- a/tools/perf/tests/thread-mg-share.c
+++ b/tools/perf/tests/thread-mg-share.c
@@ -23,6 +23,9 @@ int test__thread_mg_share(void)
 	 * with several threads and checks they properly share and
 	 * maintain map groups info (struct map_groups).
 	 *
+	 * Note that a leader thread has one more refcnt for its
+	 * (current) map groups.
+	 *
 	 * thread group (pid: 0, tids: 0, 1, 2, 3)
 	 * other  group (pid: 4, tids: 4, 5)
 	*/
@@ -43,7 +46,7 @@ int test__thread_mg_share(void)
 			leader && t1 && t2 && t3 && other);
 
 	mg = leader->mg;
-	TEST_ASSERT_VAL("wrong refcnt", mg->refcnt == 4);
+	TEST_ASSERT_VAL("wrong refcnt", mg->refcnt == 5);
 
 	/* test the map groups pointer is shared */
 	TEST_ASSERT_VAL("map groups don't match", mg == t1->mg);
@@ -59,7 +62,7 @@ int test__thread_mg_share(void)
 	TEST_ASSERT_VAL("failed to find other leader", other_leader);
 
 	other_mg = other->mg;
-	TEST_ASSERT_VAL("wrong refcnt", other_mg->refcnt == 2);
+	TEST_ASSERT_VAL("wrong refcnt", other_mg->refcnt == 3);
 
 	TEST_ASSERT_VAL("map groups don't match", other_mg == other_leader->mg);
 
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index ff7594a27c73..2d04949bdc7d 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -750,6 +750,8 @@ void thread__find_addr_map(struct thread *thread, u8 cpumode,
 		return;
 	}
 
+	BUG_ON(mg == NULL);
+
 	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
 		mg = &machine->kmaps;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2cc088d71922..031bace39fdc 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -330,7 +330,7 @@ static void machine__update_thread_pid(struct machine *machine,
 		goto out_err;
 
 	if (!leader->mg)
-		leader->mg = map_groups__new(machine);
+		thread__set_map_groups(leader, map_groups__new(machine), 0);
 
 	if (!leader->mg)
 		goto out_err;
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 62ca9f2607d5..f0c1e2a24fee 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -422,6 +422,7 @@ void map_groups__init(struct map_groups *mg, struct machine *machine)
 	}
 	mg->machine = machine;
 	mg->refcnt = 1;
+	mg->timestamp = 0;
 }
 
 static void maps__delete(struct rb_root *maps)
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 6951a9d42339..dd92510b03d1 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -61,7 +61,9 @@ struct map_groups {
 	struct rb_root	 maps[MAP__NR_TYPES];
 	struct list_head removed_maps[MAP__NR_TYPES];
 	struct machine	 *machine;
+	u64		 timestamp;
 	int		 refcnt;
+	struct list_head list;
 };
 
 struct map_groups *map_groups__new(struct machine *machine);
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index f2465f17cf16..109ceb5e2a85 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -10,13 +10,64 @@
 #include "comm.h"
 #include "unwind.h"
 
+struct map_groups *thread__get_map_groups(struct thread *thread, u64 timestamp)
+{
+	struct map_groups *mg;
+
+	list_for_each_entry(mg, &thread->mg_list, list)
+		if (timestamp >= mg->timestamp)
+			return mg;
+
+	return thread->mg;
+}
+
+int thread__set_map_groups(struct thread *thread, struct map_groups *mg,
+			   u64 timestamp)
+{
+	struct list_head *pos;
+	struct map_groups *old;
+
+	if (mg == NULL)
+		return -ENOMEM;
+
+	/*
+	 * Only a leader thread can have map groups list - others
+	 * reference it through map_groups__get.  This means the
+	 * leader thread will have one more refcnt than others.
+	 */
+	if (thread->tid != thread->pid_)
+		return -EINVAL;
+
+	if (thread->mg) {
+		BUG_ON(thread->mg->refcnt <= 1);
+		map_groups__put(thread->mg);
+	}
+
+	/* sort by time */
+	list_for_each(pos, &thread->mg_list) {
+		old = list_entry(pos, struct map_groups, list);
+		if (timestamp > old->timestamp)
+			break;
+	}
+
+	list_add_tail(&mg->list, pos);
+	mg->timestamp = timestamp;
+
+	/* set current ->mg to most recent one */
+	thread->mg = list_first_entry(&thread->mg_list, struct map_groups, list);
+	/* increase one more refcnt for current */
+	map_groups__get(thread->mg);
+
+	return 0;
+}
+
 int thread__init_map_groups(struct thread *thread, struct machine *machine)
 {
 	struct thread *leader;
 	pid_t pid = thread->pid_;
 
 	if (pid == thread->tid || pid == -1) {
-		thread->mg = map_groups__new(machine);
+		thread__set_map_groups(thread, map_groups__new(machine), 0);
 	} else {
 		leader = machine__findnew_thread(machine, pid, pid);
 		if (leader)
@@ -39,6 +90,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->cpu = -1;
 		INIT_LIST_HEAD(&thread->comm_list);
 		INIT_LIST_HEAD(&thread->node);
+		INIT_LIST_HEAD(&thread->mg_list);
 
 		if (unwind__prepare_access(thread) < 0)
 			goto err_thread;
@@ -67,6 +119,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 void thread__delete(struct thread *thread)
 {
 	struct comm *comm, *tmp;
+	struct map_groups *mg, *tmp_mg;
 
 	thread_stack__free(thread);
 
@@ -74,6 +127,11 @@ void thread__delete(struct thread *thread)
 		map_groups__put(thread->mg);
 		thread->mg = NULL;
 	}
+	/* only leader threads have mg list */
+	list_for_each_entry_safe(mg, tmp_mg, &thread->mg_list, list) {
+		list_del(&mg->list);
+		map_groups__put(mg);
+	}
 	list_for_each_entry_safe(comm, tmp, &thread->comm_list, list) {
 		list_del(&comm->list);
 		comm__free(comm);
@@ -150,6 +208,26 @@ int __thread__set_comm(struct thread *thread, const char *str, u64 timestamp,
 			unwind__flush_access(thread);
 	}
 
+	if (exec) {
+		struct machine *machine;
+
+		BUG_ON(thread->mg == NULL || thread->mg->machine == NULL);
+
+		if (thread->tid != thread->pid_) {
+			/* now it'll be a new leader */
+			thread->pid_ = thread->tid;
+
+			/* current mg of leader thread needs one more refcnt */
+			map_groups__get(thread->mg);
+
+			thread__set_map_groups(thread, thread->mg,
+					       thread->mg->timestamp);
+		}
+
+		machine = thread->mg->machine;
+		thread__set_map_groups(thread, map_groups__new(machine), timestamp);
+	}
+
 	thread->comm_set = true;
 
 	return 0;
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 5c07cee3b64e..8b9a67764613 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -14,6 +14,7 @@ struct thread {
 	struct rb_node	 	rb_node;
 	struct list_head 	node;
 	struct map_groups	*mg;
+	struct list_head	mg_list;
 	pid_t			pid_; /* Not all tools update this */
 	pid_t			tid;
 	pid_t			ppid;
@@ -56,6 +57,8 @@ struct comm *thread__exec_comm(const struct thread *thread);
 struct comm *thread__comm_time(const struct thread *thread, u64 timestamp);
 const char *thread__comm_str(const struct thread *thread);
 const char *thread__comm_time_str(const struct thread *thread, u64 timestamp);
+struct map_groups *thread__get_map_groups(struct thread *thread, u64 timestamp);
+int thread__set_map_groups(struct thread *thread, struct map_groups *mg, u64 timestamp);
 void thread__insert_map(struct thread *thread, struct map *map);
 int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp);
 size_t thread__fprintf(struct thread *thread, FILE *fp);
-- 
2.1.3



* [PATCH 18/37] perf tools: Remove thread when map groups initialization failed
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (16 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 17/37] perf tools: Maintain map groups list in a leader thread Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-28  0:45   ` David Ahern
  2014-12-24  7:15 ` [PATCH 19/37] perf tools: Introduce thread__find_addr_location_time() and friends Namhyung Kim
                   ` (20 subsequent siblings)
  38 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Otherwise the deleted thread stays linked in the machine->threads
rbtree and breaks it.
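The underlying rule (a node already linked into a container must be
unlinked before it is freed) can be shown without an rbtree.  In the
sketch below a singly linked list stands in for machine->threads, and
toy_findnew() with its init callbacks are hypothetical stand-ins for
__machine__findnew_thread() and thread__init_map_groups(); the node is
linked before the fallible init step, as the comment in the hunk says
the real code must do.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_node {
	int tid;
	struct toy_node *next;
};

/* Hypothetical init steps standing in for thread__init_map_groups(). */
static int toy_init_ok(struct toy_node *node) { (void)node; return 0; }
static int toy_init_fail(struct toy_node *node) { (void)node; return -1; }

static struct toy_node *toy_findnew(struct toy_node **head, int tid,
				    int (*init)(struct toy_node *))
{
	struct toy_node *node = calloc(1, sizeof(*node));
	struct toy_node **pp;

	if (node == NULL)
		return NULL;
	node->tid = tid;
	/* Link first, then run the step that may fail. */
	node->next = *head;
	*head = node;

	if (init(node) != 0) {
		/* Unlink before freeing, or the list points at freed memory. */
		for (pp = head; *pp != node; pp = &(*pp)->next)
			;
		*pp = node->next;
		free(node);
		return NULL;
	}
	return node;
}
```

Dropping the unlink loop leaves *head pointing at freed memory after a
failed init, which is the list analogue of the corruption the
rb_erase() call avoids.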

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/machine.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 031bace39fdc..beae6e8fe789 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -411,6 +411,7 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
 		 * leader and that would screwed the rb tree.
 		 */
 		if (thread__init_map_groups(th, machine)) {
+			rb_erase(&th->rb_node, &machine->threads);
 			thread__delete(th);
 			return NULL;
 		}
-- 
2.1.3



* [PATCH 19/37] perf tools: Introduce thread__find_addr_location_time() and friends
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (17 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 18/37] perf tools: Remove thread when map groups initialization failed Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 20/37] perf tools: Add a test case for timed map groups handling Namhyung Kim
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The *_time() variants find the appropriate map (and symbol) at the
given time.  They rely on the map_groups list having been sorted by
time in the previous patch.
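The two-step resolution (pick the map groups active at the sample time,
then look the address up only within it) can be sketched as follows.
This is a simplified model under stated assumptions, not the patch's
code: struct toy_map, struct toy_map_groups and the DSO names are
hypothetical, the snapshots are passed newest-first as an array, and no
fallback to thread->mg is attempted.  Because sample->time is -1ULL
without PERF_SAMPLE_TIME, that value is not below any timestamp and
therefore selects the newest snapshot, as the comment in
perf_event__preprocess_sample() notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range standing in for struct map. */
struct toy_map {
	uint64_t start, end;	/* [start, end) */
	const char *dso;
};

/* Hypothetical map groups snapshot; callers pass these newest-first. */
struct toy_map_groups {
	uint64_t timestamp;
	const struct toy_map *maps;
	size_t nr_maps;
};

static const struct toy_map *
toy_find_addr_map_time(const struct toy_map_groups *mgs, size_t nr_mgs,
		       uint64_t addr, uint64_t timestamp)
{
	size_t i, j;

	/* Step 1: pick the newest map groups active at @timestamp. */
	for (i = 0; i < nr_mgs; i++) {
		if (timestamp < mgs[i].timestamp)
			continue;
		/* Step 2: resolve @addr within that snapshot only. */
		for (j = 0; j < mgs[i].nr_maps; j++) {
			const struct toy_map *m = &mgs[i].maps[j];

			if (addr >= m->start && addr < m->end)
				return m;
		}
		return NULL;
	}
	return NULL;
}
```

With an exec at time 2000 replacing a small mapping by a larger one, the
same address resolves to the old DSO for a sample at 1000 and to the new
one at 3000 or at UINT64_MAX, while an address only covered by the new
mapping is unresolved before the exec.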

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/event.c            | 59 ++++++++++++++++++++++++++++++++++----
 tools/perf/util/machine.c          | 51 ++++++++++++++++++--------------
 tools/perf/util/thread.c           | 21 ++++++++++++++
 tools/perf/util/thread.h           | 10 +++++++
 tools/perf/util/unwind-libdw.c     | 11 +++----
 tools/perf/util/unwind-libunwind.c | 18 ++++++------
 6 files changed, 129 insertions(+), 41 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 2d04949bdc7d..3bb186a26314 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -731,16 +731,14 @@ int perf_event__process(struct perf_tool *tool __maybe_unused,
 	return machine__process_event(machine, event, sample);
 }
 
-void thread__find_addr_map(struct thread *thread, u8 cpumode,
-			   enum map_type type, u64 addr,
-			   struct addr_location *al)
+static void map_groups__find_addr_map(struct map_groups *mg, u8 cpumode,
+				      enum map_type type, u64 addr,
+				      struct addr_location *al)
 {
-	struct map_groups *mg = thread->mg;
 	struct machine *machine = mg->machine;
 	bool load_map = false;
 
 	al->machine = machine;
-	al->thread = thread;
 	al->addr = addr;
 	al->cpumode = cpumode;
 	al->filtered = 0;
@@ -809,6 +807,35 @@ void thread__find_addr_map(struct thread *thread, u8 cpumode,
 	}
 }
 
+void thread__find_addr_map(struct thread *thread, u8 cpumode,
+			   enum map_type type, u64 addr,
+			   struct addr_location *al)
+{
+	al->thread = thread;
+	map_groups__find_addr_map(thread->mg, cpumode, type, addr, al);
+}
+
+void thread__find_addr_map_time(struct thread *thread, u8 cpumode,
+				enum map_type type, u64 addr,
+				struct addr_location *al, u64 timestamp)
+{
+	struct map_groups *mg;
+	struct thread *leader;
+
+	if (thread->tid == thread->pid_)
+		leader = thread;
+	else
+		leader = machine__find_thread(thread->mg->machine,
+					      thread->pid_, thread->pid_);
+
+	BUG_ON(leader == NULL);
+
+	mg = thread__get_map_groups(leader, timestamp);
+
+	al->thread = thread;
+	map_groups__find_addr_map(mg, cpumode, type, addr, al);
+}
+
 void thread__find_addr_location(struct thread *thread,
 				u8 cpumode, enum map_type type, u64 addr,
 				struct addr_location *al)
@@ -821,6 +848,21 @@ void thread__find_addr_location(struct thread *thread,
 		al->sym = NULL;
 }
 
+void thread__find_addr_location_time(struct thread *thread, u8 cpumode,
+				     enum map_type type, u64 addr,
+				     struct addr_location *al, u64 timestamp)
+{
+	struct map_groups *mg;
+
+	mg = thread__get_map_groups(thread, timestamp);
+	map_groups__find_addr_map(mg, cpumode, type, addr, al);
+	if (al->map != NULL)
+		al->sym = map__find_symbol(al->map, al->addr,
+					   mg->machine->symbol_filter);
+	else
+		al->sym = NULL;
+}
+
 int perf_event__preprocess_sample(const union perf_event *event,
 				  struct machine *machine,
 				  struct addr_location *al,
@@ -845,7 +887,12 @@ int perf_event__preprocess_sample(const union perf_event *event,
 	    machine->vmlinux_maps[MAP__FUNCTION] == NULL)
 		machine__create_kernel_maps(machine);
 
-	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, sample->ip, al);
+	/*
+	 * sample->time is -1ULL if !PERF_SAMPLE_TIME which ends up
+	 * with using most recent map_groups (same as default behavior).
+	 */
+	thread__find_addr_map_time(thread, cpumode, MAP__FUNCTION, sample->ip,
+				   al, sample->time);
 	dump_printf(" ...... dso: %s\n",
 		    al->map ? al->map->dso->long_name :
 			al->level == 'H' ? "[hypervisor]" : "<not found>");
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index beae6e8fe789..ffce0bcd2d9a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1466,7 +1466,7 @@ static bool symbol__match_regex(struct symbol *sym, regex_t *regex)
 
 static void ip__resolve_ams(struct thread *thread,
 			    struct addr_map_symbol *ams,
-			    u64 ip)
+			    u64 ip, u64 timestamp)
 {
 	struct addr_location al;
 
@@ -1478,7 +1478,8 @@ static void ip__resolve_ams(struct thread *thread,
 	 * Thus, we have to try consecutively until we find a match
 	 * or else, the symbol is unknown
 	 */
-	thread__find_cpumode_addr_location(thread, MAP__FUNCTION, ip, &al);
+	thread__find_cpumode_addr_location_time(thread, MAP__FUNCTION, ip, &al,
+						timestamp);
 
 	ams->addr = ip;
 	ams->al_addr = al.addr;
@@ -1486,21 +1487,24 @@ static void ip__resolve_ams(struct thread *thread,
 	ams->map = al.map;
 }
 
-static void ip__resolve_data(struct thread *thread,
-			     u8 m, struct addr_map_symbol *ams, u64 addr)
+static void ip__resolve_data(struct thread *thread, u8 m,
+			     struct addr_map_symbol *ams,
+			     u64 addr, u64 timestamp)
 {
 	struct addr_location al;
 
 	memset(&al, 0, sizeof(al));
 
-	thread__find_addr_location(thread, m, MAP__VARIABLE, addr, &al);
+	thread__find_addr_location_time(thread, m, MAP__VARIABLE, addr,
+					&al, timestamp);
 	if (al.map == NULL) {
 		/*
 		 * some shared data regions have execute bit set which puts
 		 * their mapping in the MAP__FUNCTION type array.
 		 * Check there as a fallback option before dropping the sample.
 		 */
-		thread__find_addr_location(thread, m, MAP__FUNCTION, addr, &al);
+		thread__find_addr_location_time(thread, m, MAP__FUNCTION, addr,
+						&al, timestamp);
 	}
 
 	ams->addr = addr;
@@ -1517,8 +1521,9 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 	if (!mi)
 		return NULL;
 
-	ip__resolve_ams(al->thread, &mi->iaddr, sample->ip);
-	ip__resolve_data(al->thread, al->cpumode, &mi->daddr, sample->addr);
+	ip__resolve_ams(al->thread, &mi->iaddr, sample->ip, sample->time);
+	ip__resolve_data(al->thread, al->cpumode, &mi->daddr, sample->addr,
+			 sample->time);
 	mi->data_src.val = sample->data_src;
 
 	return mi;
@@ -1527,19 +1532,20 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
 static int add_callchain_ip(struct thread *thread,
 			    struct symbol **parent,
 			    struct addr_location *root_al,
-			    int cpumode,
-			    u64 ip)
+			    int cpumode, u64 ip, u64 timestamp)
 {
 	struct addr_location al;
 
 	al.filtered = 0;
 	al.sym = NULL;
+
 	if (cpumode == -1)
-		thread__find_cpumode_addr_location(thread, MAP__FUNCTION,
-						   ip, &al);
+		thread__find_cpumode_addr_location_time(thread, MAP__FUNCTION,
+							ip, &al, timestamp);
 	else
-		thread__find_addr_location(thread, cpumode, MAP__FUNCTION,
-				   ip, &al);
+		thread__find_addr_location_time(thread, cpumode, MAP__FUNCTION,
+						ip, &al, timestamp);
+
 	if (al.sym != NULL) {
 		if (sort__has_parent && !*parent &&
 		    symbol__match_regex(al.sym, &parent_regex))
@@ -1567,8 +1573,10 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 		return NULL;
 
 	for (i = 0; i < bs->nr; i++) {
-		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
-		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
+		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to,
+				sample->time);
+		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from,
+				sample->time);
 		bi[i].flags = bs->entries[i].flags;
 	}
 	return bi;
@@ -1620,7 +1628,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 					     struct branch_stack *branch,
 					     struct symbol **parent,
 					     struct addr_location *root_al,
-					     int max_stack)
+					     int max_stack, u64 timestamp)
 {
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	int chain_nr = min(max_stack, (int)chain->nr);
@@ -1684,10 +1692,10 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
 		for (i = 0; i < nr; i++) {
 			err = add_callchain_ip(thread, parent, root_al,
-					       -1, be[i].to);
+					       -1, be[i].to, timestamp);
 			if (!err)
 				err = add_callchain_ip(thread, parent, root_al,
-						       -1, be[i].from);
+						       -1, be[i].from, timestamp);
 			if (err == -EINVAL)
 				break;
 			if (err)
@@ -1741,7 +1749,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		}
 
 		err = add_callchain_ip(thread, parent, root_al,
-				       cpumode, ip);
+				       cpumode, ip, timestamp);
 		if (err == -EINVAL)
 			break;
 		if (err)
@@ -1767,7 +1775,8 @@ int thread__resolve_callchain(struct thread *thread,
 {
 	int ret = thread__resolve_callchain_sample(thread, sample->callchain,
 						   sample->branch_stack,
-						   parent, root_al, max_stack);
+						   parent, root_al, max_stack,
+						   sample->time);
 	if (ret)
 		return ret;
 
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 109ceb5e2a85..87c20308b91f 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -332,3 +332,24 @@ void thread__find_cpumode_addr_location(struct thread *thread,
 			break;
 	}
 }
+
+void thread__find_cpumode_addr_location_time(struct thread *thread,
+					     enum map_type type, u64 addr,
+					     struct addr_location *al,
+					     u64 timestamp)
+{
+	size_t i;
+	const u8 const cpumodes[] = {
+		PERF_RECORD_MISC_USER,
+		PERF_RECORD_MISC_KERNEL,
+		PERF_RECORD_MISC_GUEST_USER,
+		PERF_RECORD_MISC_GUEST_KERNEL
+	};
+
+	for (i = 0; i < ARRAY_SIZE(cpumodes); i++) {
+		thread__find_addr_location_time(thread, cpumodes[i], type,
+						addr, al, timestamp);
+		if (al->map)
+			break;
+	}
+}
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 8b9a67764613..0b88ca22bc3d 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -66,14 +66,24 @@ size_t thread__fprintf(struct thread *thread, FILE *fp);
 void thread__find_addr_map(struct thread *thread,
 			   u8 cpumode, enum map_type type, u64 addr,
 			   struct addr_location *al);
+void thread__find_addr_map_time(struct thread *thread, u8 cpumode,
+				enum map_type type, u64 addr,
+				struct addr_location *al, u64 timestamp);
 
 void thread__find_addr_location(struct thread *thread,
 				u8 cpumode, enum map_type type, u64 addr,
 				struct addr_location *al);
+void thread__find_addr_location_time(struct thread *thread, u8 cpumode,
+				     enum map_type type, u64 addr,
+				     struct addr_location *al, u64 timestamp);
 
 void thread__find_cpumode_addr_location(struct thread *thread,
 					enum map_type type, u64 addr,
 					struct addr_location *al);
+void thread__find_cpumode_addr_location_time(struct thread *thread,
+					     enum map_type type, u64 addr,
+					     struct addr_location *al,
+					     u64 timestamp);
 
 static inline void *thread__priv(struct thread *thread)
 {
diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index 2dcfe9a7c8d0..ba8d8e41d680 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -26,9 +26,10 @@ static int __report_module(struct addr_location *al, u64 ip,
 	Dwfl_Module *mod;
 	struct dso *dso = NULL;
 
-	thread__find_addr_location(ui->thread,
-				   PERF_RECORD_MISC_USER,
-				   MAP__FUNCTION, ip, al);
+	thread__find_addr_location_time(ui->thread,
+					PERF_RECORD_MISC_USER,
+					MAP__FUNCTION, ip, al,
+					ui->sample->time);
 
 	if (al->map)
 		dso = al->map->dso;
@@ -89,8 +90,8 @@ static int access_dso_mem(struct unwind_info *ui, Dwarf_Addr addr,
 	struct addr_location al;
 	ssize_t size;
 
-	thread__find_addr_map(ui->thread, PERF_RECORD_MISC_USER,
-			      MAP__FUNCTION, addr, &al);
+	thread__find_addr_map_time(ui->thread, PERF_RECORD_MISC_USER,
+				   MAP__FUNCTION, addr, &al, ui->sample->time);
 	if (!al.map) {
 		pr_debug("unwind: no map for %lx\n", (unsigned long)addr);
 		return -1;
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 371219a6daf1..94929ec491f3 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -284,8 +284,8 @@ static struct map *find_map(unw_word_t ip, struct unwind_info *ui)
 {
 	struct addr_location al;
 
-	thread__find_addr_map(ui->thread, PERF_RECORD_MISC_USER,
-			      MAP__FUNCTION, ip, &al);
+	thread__find_addr_map_time(ui->thread, PERF_RECORD_MISC_USER,
+				   MAP__FUNCTION, ip, &al, ui->sample->time);
 	return al.map;
 }
 
@@ -374,8 +374,8 @@ static int access_dso_mem(struct unwind_info *ui, unw_word_t addr,
 	struct addr_location al;
 	ssize_t size;
 
-	thread__find_addr_map(ui->thread, PERF_RECORD_MISC_USER,
-			      MAP__FUNCTION, addr, &al);
+	thread__find_addr_map_time(ui->thread, PERF_RECORD_MISC_USER,
+				   MAP__FUNCTION, addr, &al, ui->sample->time);
 	if (!al.map) {
 		pr_debug("unwind: no map for %lx\n", (unsigned long)addr);
 		return -1;
@@ -476,14 +476,14 @@ static void put_unwind_info(unw_addr_space_t __maybe_unused as,
 	pr_debug("unwind: put_unwind_info called\n");
 }
 
-static int entry(u64 ip, struct thread *thread,
+static int entry(u64 ip, struct thread *thread, u64 timestamp,
 		 unwind_entry_cb_t cb, void *arg)
 {
 	struct unwind_entry e;
 	struct addr_location al;
 
-	thread__find_addr_location(thread, PERF_RECORD_MISC_USER,
-				   MAP__FUNCTION, ip, &al);
+	thread__find_addr_location_time(thread, PERF_RECORD_MISC_USER,
+					MAP__FUNCTION, ip, &al, timestamp);
 
 	e.ip = ip;
 	e.map = al.map;
@@ -585,7 +585,7 @@ static int get_entries(struct unwind_info *ui, unwind_entry_cb_t cb,
 		unw_word_t ip;
 
 		unw_get_reg(&c, UNW_REG_IP, &ip);
-		ret = ip ? entry(ip, ui->thread, cb, arg) : 0;
+		ret = ip ? entry(ip, ui->thread, ui->sample->time, cb, arg) : 0;
 	}
 
 	return ret;
@@ -610,7 +610,7 @@ int unwind__get_entries(unwind_entry_cb_t cb, void *arg,
 	if (ret)
 		return ret;
 
-	ret = entry(ip, thread, cb, arg);
+	ret = entry(ip, thread, data->time, cb, arg);
 	if (ret)
 		return -ENOMEM;
 
-- 
2.1.3


* [PATCH 20/37] perf tools: Add a test case for timed map groups handling
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Add a test case verifying the handling of thread->mg and ->mg_list
across a time change (a simulated EXEC), and the new
thread__find_addr_map_time() and friends.
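
The time-based lookup this test exercises can be pictured with the
small sketch below.  It is not the code from this patchset: struct
mg_entry and the mg_list_* helpers are hypothetical, and the sketch
assumes a thread keeps its map groups newest first, each tagged with
the time it became current (e.g. at an EXEC).  A lookup at time t
returns the newest group that started at or before t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one of a thread's map groups. */
struct mg_entry {
	uint64_t start_time;	/* time this group became current */
	const char *name;	/* stand-in for the real map contents */
	struct mg_entry *next;	/* next older group, or NULL */
};

/* Make @mg the current group as of @time (list is kept newest first). */
static void mg_list_push(struct mg_entry **head, struct mg_entry *mg,
			 uint64_t time)
{
	mg->start_time = time;
	mg->next = *head;
	*head = mg;
}

/*
 * Return the group that was current at @time: the newest entry whose
 * start_time is not after @time.  Assumption of this sketch: if @time
 * predates every entry, fall back to the oldest one.
 */
static struct mg_entry *mg_list_find(struct mg_entry *head, uint64_t time)
{
	struct mg_entry *mg = head;

	if (mg == NULL)
		return NULL;
	while (mg->start_time > time && mg->next != NULL)
		mg = mg->next;
	return mg;
}
```

In this patch's test, the lookup at time 5000, after the simulated
EXEC at time 10000, is the case that must return the old map groups.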

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Makefile.perf          |  1 +
 tools/perf/tests/builtin-test.c   |  4 ++
 tools/perf/tests/tests.h          |  1 +
 tools/perf/tests/thread-mg-time.c | 88 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+)
 create mode 100644 tools/perf/tests/thread-mg-time.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 6094f0a10d8b..47d933454492 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -449,6 +449,7 @@ LIB_OBJS += $(OUTPUT)tests/thread-mg-share.o
 LIB_OBJS += $(OUTPUT)tests/switch-tracking.o
 LIB_OBJS += $(OUTPUT)tests/thread-comm.o
 LIB_OBJS += $(OUTPUT)tests/thread-lookup-time.o
+LIB_OBJS += $(OUTPUT)tests/thread-mg-time.o
 
 BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
 BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index e4d335de19ea..8f61a7e291ee 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -175,6 +175,10 @@ static struct test {
 		.func = test__thread_lookup_time,
 	},
 	{
+		.desc = "Test thread map group handling with time",
+		.func = test__thread_mg_time,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 1090337f63e5..03557563f31d 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -53,6 +53,7 @@ int test__fdarray__filter(void);
 int test__fdarray__add(void);
 int test__thread_comm(void);
 int test__thread_lookup_time(void);
+int test__thread_mg_time(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
diff --git a/tools/perf/tests/thread-mg-time.c b/tools/perf/tests/thread-mg-time.c
new file mode 100644
index 000000000000..69fd13752c1d
--- /dev/null
+++ b/tools/perf/tests/thread-mg-time.c
@@ -0,0 +1,88 @@
+#include "tests.h"
+#include "machine.h"
+#include "thread.h"
+#include "map.h"
+#include "debug.h"
+
+#define PERF_MAP_START  0x40000
+
+int test__thread_mg_time(void)
+{
+	struct machines machines;
+	struct machine *machine;
+	struct thread *t;
+	struct map_groups *mg;
+	struct map *map;
+	struct addr_location al = { .map = NULL, };
+
+	/*
+	 * This test checks whether the correct map can be retrieved
+	 * for a given time.  When multi-file data storage is enabled,
+	 * task/comm/mmap events are processed first, so a later
+	 * sample should still find the matching map.
+	 */
+	machines__init(&machines);
+	machine = &machines.host;
+
+	t = machine__findnew_thread(machine, 0, 0);
+	mg = t->mg;
+
+	map = dso__new_map("/usr/bin/perf");
+	map->start = PERF_MAP_START;
+	map->end = PERF_MAP_START + 0x1000;
+
+	thread__insert_map(t, map);
+
+	if (verbose > 1)
+		map_groups__fprintf(t->mg, stderr);
+
+	thread__find_addr_map(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
+			      PERF_MAP_START, &al);
+
+	TEST_ASSERT_VAL("cannot find mapping for perf", al.map != NULL);
+	TEST_ASSERT_VAL("non matched mapping found", al.map == map);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
+
+	thread__find_addr_map_time(t, PERF_RECORD_MISC_USER,
+				   MAP__FUNCTION, PERF_MAP_START, &al, -1ULL);
+
+	TEST_ASSERT_VAL("cannot find timed mapping for perf", al.map != NULL);
+	TEST_ASSERT_VAL("non matched timed mapping", al.map == map);
+	TEST_ASSERT_VAL("incorrect timed map groups", al.map->groups == mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
+
+
+	pr_debug("simulate EXEC event (generate new mg)\n");
+	__thread__set_comm(t, "perf-test", 10000, true);
+
+	map = dso__new_map("/usr/bin/perf-test");
+	map->start = PERF_MAP_START;
+	map->end = PERF_MAP_START + 0x2000;
+
+	thread__insert_map(t, map);
+
+	if (verbose > 1)
+		map_groups__fprintf(t->mg, stderr);
+
+	thread__find_addr_map(t, PERF_RECORD_MISC_USER, MAP__FUNCTION,
+			      PERF_MAP_START + 4, &al);
+
+	TEST_ASSERT_VAL("cannot find mapping for perf-test", al.map != NULL);
+	TEST_ASSERT_VAL("invalid mapping found", al.map == map);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups != mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups == t->mg);
+
+	pr_debug("searching map in the old map groups\n");
+	thread__find_addr_map_time(t, PERF_RECORD_MISC_USER,
+				   MAP__FUNCTION, PERF_MAP_START, &al, 5000);
+
+	TEST_ASSERT_VAL("cannot find timed mapping for perf-test", al.map != NULL);
+	TEST_ASSERT_VAL("non matched timed mapping", al.map != map);
+	TEST_ASSERT_VAL("incorrect timed map groups", al.map->groups == mg);
+	TEST_ASSERT_VAL("incorrect map groups", al.map->groups != t->mg);
+
+	machine__delete_threads(machine);
+	machines__exit(&machines);
+	return 0;
+}
-- 
2.1.3


* [PATCH 21/37] perf tools: Protect dso symbol loading using a mutex
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When multi-thread support for perf report is enabled, a dso can be
accessed concurrently.  Add a new pthread mutex (dso->lock) to protect
it from concurrent dso__load() calls.
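
The locking in dso__load() below follows the usual check, lock,
re-check pattern: the comment "check again under the dso->lock"
implies that callers test dso__loaded() before calling in.  The sketch
below shows the pattern in isolation.  struct lazy_obj and its helpers
are hypothetical, and the C11 atomic used for the unlocked fast-path
check is an assumption of the sketch, not what perf does.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object standing in for a dso with lazily loaded symbols. */
struct lazy_obj {
	pthread_mutex_t lock;
	atomic_bool loaded;	/* set only after loading has finished */
	int load_count;		/* how many times the loader actually ran */
};

static void lazy_obj__init(struct lazy_obj *obj)
{
	pthread_mutex_init(&obj->lock, NULL);
	atomic_init(&obj->loaded, false);
	obj->load_count = 0;
}

/* Stand-in for the expensive symbol loading. */
static void do_expensive_load(struct lazy_obj *obj)
{
	obj->load_count++;
}

static void lazy_obj__load(struct lazy_obj *obj)
{
	/* Fast path: already loaded, no lock needed. */
	if (atomic_load_explicit(&obj->loaded, memory_order_acquire))
		return;

	pthread_mutex_lock(&obj->lock);
	/* Check again under the lock: another thread may have loaded it. */
	if (!atomic_load_explicit(&obj->loaded, memory_order_relaxed)) {
		do_expensive_load(obj);
		/* Publish "loaded" only after the data is complete. */
		atomic_store_explicit(&obj->loaded, true, memory_order_release);
	}
	pthread_mutex_unlock(&obj->lock);
}

static void *loader_thread(void *arg)
{
	lazy_obj__load(arg);
	return NULL;
}

/* Start @n threads (at most 16) that all try to load @obj. */
static int load_concurrently(struct lazy_obj *obj, int n)
{
	pthread_t tids[16];
	int i, created = 0, err = 0;

	if (n > 16)
		return -1;
	for (i = 0; i < n; i++) {
		if (pthread_create(&tids[i], NULL, loader_thread, obj) != 0) {
			err = -1;
			break;
		}
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return err;
}
```

The patch also moves dso__set_loaded() from the start of dso__load()
to the common exit path.  With an unlocked fast path, marking the dso
loaded before its symbols are inserted could let another thread skip
loading and walk a partially populated symbol tree.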

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/dso.c    |  2 ++
 tools/perf/util/dso.h    |  1 +
 tools/perf/util/symbol.c | 34 ++++++++++++++++++++++++----------
 3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 45be944d450a..3da75816b8f8 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -888,6 +888,7 @@ struct dso *dso__new(const char *name)
 		RB_CLEAR_NODE(&dso->rb_node);
 		INIT_LIST_HEAD(&dso->node);
 		INIT_LIST_HEAD(&dso->data.open_entry);
+		pthread_mutex_init(&dso->lock, NULL);
 	}
 
 	return dso;
@@ -917,6 +918,7 @@ void dso__delete(struct dso *dso)
 	dso_cache__free(&dso->data.cache);
 	dso__free_a2l(dso);
 	zfree(&dso->symsrc_filename);
+	pthread_mutex_destroy(&dso->lock);
 	free(dso);
 }
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 3782c82c6e44..ac753594a469 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -102,6 +102,7 @@ struct dsos {
 };
 
 struct dso {
+	pthread_mutex_t	 lock;
 	struct list_head node;
 	struct rb_node	 rb_node;	/* rbtree node sorted by long name */
 	struct rb_root	 symbols[MAP__NR_TYPES];
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index c24c5b83156c..6c764a5b3464 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1332,12 +1332,22 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 	struct symsrc *syms_ss = NULL, *runtime_ss = NULL;
 	bool kmod;
 
-	dso__set_loaded(dso, map->type);
+	pthread_mutex_lock(&dso->lock);
+
+	/* check again under the dso->lock */
+	if (dso__loaded(dso, map->type)) {
+		ret = 1;
+		goto out;
+	}
+
+	if (dso->kernel) {
+		if (dso->kernel == DSO_TYPE_KERNEL)
+			ret = dso__load_kernel_sym(dso, map, filter);
+		else if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
+			ret = dso__load_guest_kernel_sym(dso, map, filter);
 
-	if (dso->kernel == DSO_TYPE_KERNEL)
-		return dso__load_kernel_sym(dso, map, filter);
-	else if (dso->kernel == DSO_TYPE_GUEST_KERNEL)
-		return dso__load_guest_kernel_sym(dso, map, filter);
+		goto out;
+	}
 
 	if (map->groups && map->groups->machine)
 		machine = map->groups->machine;
@@ -1350,18 +1360,18 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 		struct stat st;
 
 		if (lstat(dso->name, &st) < 0)
-			return -1;
+			goto out;
 
 		if (st.st_uid && (st.st_uid != geteuid())) {
 			pr_warning("File %s not owned by current user or root, "
 				"ignoring it.\n", dso->name);
-			return -1;
+			goto out;
 		}
 
 		ret = dso__load_perf_map(dso, map, filter);
 		dso->symtab_type = ret > 0 ? DSO_BINARY_TYPE__JAVA_JIT :
 					     DSO_BINARY_TYPE__NOT_FOUND;
-		return ret;
+		goto out;
 	}
 
 	if (machine)
@@ -1369,7 +1379,7 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 
 	name = malloc(PATH_MAX);
 	if (!name)
-		return -1;
+		goto out;
 
 	kmod = dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
 		dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
@@ -1450,7 +1460,11 @@ int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
 out_free:
 	free(name);
 	if (ret < 0 && strstr(dso->name, " (deleted)") != NULL)
-		return 0;
+		ret = 0;
+out:
+	dso__set_loaded(dso, map->type);
+	pthread_mutex_unlock(&dso->lock);
+
 	return ret;
 }
 
-- 
2.1.3


* [PATCH 22/37] perf tools: Protect dso cache tree using dso->lock
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The dso cache is accessed during DWARF callchain unwinding, and it can
be accessed concurrently when multi-threaded report is enabled.
Protect it with dso->lock.
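
With this change dso_cache__insert() returns the entry that already
covers the offset when another thread got there first, and
dso_cache__read() frees its own copy and uses that entry instead.  A
minimal sketch of this insert-or-get pattern follows, with a sorted
linked list standing in for the rbtree.  All names (struct chunk,
chunk_cache__*, CHUNK_SIZE) are hypothetical, and chunks are assumed
to be aligned, so equal offsets mean the same chunk.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096u	/* hypothetical stand-in for DSO__DATA_CACHE_SIZE */

struct chunk {
	uint64_t offset;	/* CHUNK_SIZE-aligned file offset */
	struct chunk *next;
};

struct chunk_cache {
	pthread_mutex_t lock;
	struct chunk *head;	/* sorted by offset */
};

/*
 * Link @new in unless a chunk with the same offset exists.  Returns
 * NULL if @new was inserted, or the existing chunk if another thread
 * won the race.
 */
static struct chunk *chunk_cache__insert(struct chunk_cache *c,
					 struct chunk *new)
{
	struct chunk **p, *found = NULL;

	pthread_mutex_lock(&c->lock);
	for (p = &c->head; *p != NULL && (*p)->offset < new->offset;
	     p = &(*p)->next)
		;
	if (*p != NULL && (*p)->offset == new->offset) {
		found = *p;
	} else {
		new->next = *p;
		*p = new;
	}
	pthread_mutex_unlock(&c->lock);
	return found;
}

/* Allocate a chunk for @offset and keep whichever copy ends up cached. */
static struct chunk *chunk_cache__get(struct chunk_cache *c, uint64_t offset)
{
	struct chunk *new = calloc(1, sizeof(*new)), *old;

	if (new == NULL)
		return NULL;
	new->offset = offset & ~(uint64_t)(CHUNK_SIZE - 1);
	/* (real code would read the file data into the chunk here) */
	old = chunk_cache__insert(c, new);
	if (old != NULL) {
		free(new);	/* lost the race: use the cached chunk */
		return old;
	}
	return new;
}

static void chunk_cache__free(struct chunk_cache *c)
{
	struct chunk *next;

	pthread_mutex_lock(&c->lock);
	while (c->head != NULL) {
		next = c->head->next;
		free(c->head);
		c->head = next;
	}
	pthread_mutex_unlock(&c->lock);
}
```

As in dso_cache__find(), the returned pointer is used after the lock
is dropped.  That is only safe because, here and in the patch, chunks
are freed only when the whole cache is torn down.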

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/dso.c | 41 ++++++++++++++++++++++++++++++++---------
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 3da75816b8f8..9555b1772fb5 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -443,10 +443,12 @@ bool dso__data_status_seen(struct dso *dso, enum dso_data_status_seen by)
 }
 
 static void
-dso_cache__free(struct rb_root *root)
+dso_cache__free(struct dso *dso)
 {
+	struct rb_root *root = &dso->data.cache;
 	struct rb_node *next = rb_first(root);
 
+	pthread_mutex_lock(&dso->lock);
 	while (next) {
 		struct dso_cache *cache;
 
@@ -455,14 +457,17 @@ dso_cache__free(struct rb_root *root)
 		rb_erase(&cache->rb_node, root);
 		free(cache);
 	}
+	pthread_mutex_unlock(&dso->lock);
 }
 
-static struct dso_cache *dso_cache__find(const struct rb_root *root, u64 offset)
+static struct dso_cache *dso_cache__find(struct dso *dso, u64 offset)
 {
+	const struct rb_root *root = &dso->data.cache;
 	struct rb_node * const *p = &root->rb_node;
 	const struct rb_node *parent = NULL;
 	struct dso_cache *cache;
 
+	pthread_mutex_lock(&dso->lock);
 	while (*p != NULL) {
 		u64 end;
 
@@ -475,19 +480,24 @@ static struct dso_cache *dso_cache__find(const struct rb_root *root, u64 offset)
 		else if (offset >= end)
 			p = &(*p)->rb_right;
 		else
-			return cache;
+			goto out;
 	}
-	return NULL;
+	cache = NULL;
+out:
+	pthread_mutex_unlock(&dso->lock);
+	return cache;
 }
 
-static void
-dso_cache__insert(struct rb_root *root, struct dso_cache *new)
+static struct dso_cache *
+dso_cache__insert(struct dso *dso, struct dso_cache *new)
 {
+	struct rb_root *root = &dso->data.cache;
 	struct rb_node **p = &root->rb_node;
 	struct rb_node *parent = NULL;
 	struct dso_cache *cache;
 	u64 offset = new->offset;
 
+	pthread_mutex_lock(&dso->lock);
 	while (*p != NULL) {
 		u64 end;
 
@@ -499,10 +509,17 @@ dso_cache__insert(struct rb_root *root, struct dso_cache *new)
 			p = &(*p)->rb_left;
 		else if (offset >= end)
 			p = &(*p)->rb_right;
+		else
+			goto out;
 	}
 
 	rb_link_node(&new->rb_node, parent, p);
 	rb_insert_color(&new->rb_node, root);
+
+	cache = NULL;
+out:
+	pthread_mutex_unlock(&dso->lock);
+	return cache;
 }
 
 static ssize_t
@@ -520,6 +537,7 @@ static ssize_t
 dso_cache__read(struct dso *dso, u64 offset, u8 *data, ssize_t size)
 {
 	struct dso_cache *cache;
+	struct dso_cache *old;
 	ssize_t ret;
 
 	do {
@@ -543,7 +561,12 @@ dso_cache__read(struct dso *dso, u64 offset, u8 *data, ssize_t size)
 
 		cache->offset = cache_offset;
 		cache->size   = ret;
-		dso_cache__insert(&dso->data.cache, cache);
+		old = dso_cache__insert(dso, cache);
+		if (old) {
+			/* we lose the race */
+			free(cache);
+			cache = old;
+		}
 
 		ret = dso_cache__memcpy(cache, offset, data, size);
 
@@ -560,7 +583,7 @@ static ssize_t dso_cache_read(struct dso *dso, u64 offset,
 {
 	struct dso_cache *cache;
 
-	cache = dso_cache__find(&dso->data.cache, offset);
+	cache = dso_cache__find(dso, offset);
 	if (cache)
 		return dso_cache__memcpy(cache, offset, data, size);
 	else
@@ -915,7 +938,7 @@ void dso__delete(struct dso *dso)
 	}
 
 	dso__data_close(dso);
-	dso_cache__free(&dso->data.cache);
+	dso_cache__free(dso);
 	dso__free_a2l(dso);
 	zfree(&dso->symsrc_filename);
 	pthread_mutex_destroy(&dso->lock);
-- 
2.1.3


* [PATCH 23/37] perf tools: Protect dso cache fd with a mutex
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When the dso cache is accessed in a multi-threaded environment, one
thread may close another dso's data.fd in the middle of an operation
because of the open file limit.  Protect the file descriptors using a
separate mutex.
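
The rule this patch introduces is that every use of dso->data.fd
happens under the global dso__data_open_lock, and an fd that another
thread closed is reopened before use.  The sketch below applies that
rule to a hypothetical struct file_obj.  It uses pread() rather than
the patch's lseek() plus read(), and it frees the buffer on every
error path; the err_unlock path in dso_cache__read() in this patch
appears to return without freeing cache.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for pread() and mkstemp() */
#endif
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical file-backed object standing in for a dso. */
struct file_obj {
	const char *path;
	int fd;			/* -1 while closed */
};

/* One lock for every fd, in the spirit of dso__data_open_lock. */
static pthread_mutex_t open_lock = PTHREAD_MUTEX_INITIALIZER;

/* Close @obj's fd, as another thread might do when it hits the fd limit. */
static void file_obj__evict(struct file_obj *obj)
{
	pthread_mutex_lock(&open_lock);
	if (obj->fd >= 0) {
		close(obj->fd);
		obj->fd = -1;
	}
	pthread_mutex_unlock(&open_lock);
}

/*
 * Read up to @size bytes at @offset into a new buffer.  Returns the
 * buffer and stores the byte count in *@nread, or returns NULL.
 */
static char *file_obj__read_chunk(struct file_obj *obj, off_t offset,
				  size_t size, ssize_t *nread)
{
	char *buf = malloc(size);
	ssize_t ret;

	if (buf == NULL)
		return NULL;

	pthread_mutex_lock(&open_lock);
	/* The fd may have been closed by another thread: reopen it. */
	if (obj->fd < 0)
		obj->fd = open(obj->path, O_RDONLY);
	ret = obj->fd < 0 ? -1 : pread(obj->fd, buf, size, offset);
	pthread_mutex_unlock(&open_lock);

	if (ret <= 0) {
		free(buf);	/* no leak on the error/EOF paths */
		return NULL;
	}
	*nread = ret;
	return buf;
}
```

Because all fds share one lock, reads of different files are
serialized; that keeps eviction simple at some cost in parallelism.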

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/dso-data.c |   5 ++
 tools/perf/util/dso.c       | 136 +++++++++++++++++++++++++++++---------------
 2 files changed, 94 insertions(+), 47 deletions(-)

diff --git a/tools/perf/tests/dso-data.c b/tools/perf/tests/dso-data.c
index caaf37f079b1..0276e7d2d41b 100644
--- a/tools/perf/tests/dso-data.c
+++ b/tools/perf/tests/dso-data.c
@@ -111,6 +111,9 @@ int test__dso_data(void)
 	memset(&machine, 0, sizeof(machine));
 
 	dso = dso__new((const char *)file);
+	TEST_ASSERT_VAL("failed to get dso", dso);
+
+	dso->binary_type = DSO_BINARY_TYPE__SYSTEM_PATH_DSO;
 
 	/* Basic 10 bytes tests. */
 	for (i = 0; i < ARRAY_SIZE(offsets); i++) {
@@ -199,6 +202,8 @@ static int dsos__create(int cnt, int size)
 
 		dsos[i] = dso__new(file);
 		TEST_ASSERT_VAL("failed to get dso", dsos[i]);
+
+		dsos[i]->binary_type = DSO_BINARY_TYPE__SYSTEM_PATH_DSO;
 	}
 
 	return 0;
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 9555b1772fb5..6c1f5619f423 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -213,6 +213,7 @@ bool dso__needs_decompress(struct dso *dso)
  */
 static LIST_HEAD(dso__data_open);
 static long dso__data_open_cnt;
+static pthread_mutex_t dso__data_open_lock = PTHREAD_MUTEX_INITIALIZER;
 
 static void dso__list_add(struct dso *dso)
 {
@@ -240,7 +241,7 @@ static int do_open(char *name)
 		if (fd >= 0)
 			return fd;
 
-		pr_debug("dso open failed, mmap: %s\n",
+		pr_debug("dso open failed: %s\n",
 			 strerror_r(errno, sbuf, sizeof(sbuf)));
 		if (!dso__data_open_cnt || errno != EMFILE)
 			break;
@@ -382,7 +383,9 @@ static void check_data_close(void)
  */
 void dso__data_close(struct dso *dso)
 {
+	pthread_mutex_lock(&dso__data_open_lock);
 	close_dso(dso);
+	pthread_mutex_unlock(&dso__data_open_lock);
 }
 
 /**
@@ -405,6 +408,8 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
 	if (dso->data.status == DSO_DATA_STATUS_ERROR)
 		return -1;
 
+	pthread_mutex_lock(&dso__data_open_lock);
+
 	if (dso->data.fd >= 0)
 		goto out;
 
@@ -427,6 +432,7 @@ int dso__data_fd(struct dso *dso, struct machine *machine)
 	else
 		dso->data.status = DSO_DATA_STATUS_ERROR;
 
+	pthread_mutex_unlock(&dso__data_open_lock);
 	return dso->data.fd;
 }
 
@@ -534,52 +540,66 @@ dso_cache__memcpy(struct dso_cache *cache, u64 offset,
 }
 
 static ssize_t
-dso_cache__read(struct dso *dso, u64 offset, u8 *data, ssize_t size)
+dso_cache__read(struct dso *dso, struct machine *machine,
+		u64 offset, u8 *data, ssize_t size)
 {
 	struct dso_cache *cache;
 	struct dso_cache *old;
-	ssize_t ret;
-
-	do {
-		u64 cache_offset;
+	ssize_t ret = -EINVAL;
+	u64 cache_offset;
 
-		ret = -ENOMEM;
+	cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE);
+	if (!cache)
+		return -ENOMEM;
 
-		cache = zalloc(sizeof(*cache) + DSO__DATA_CACHE_SIZE);
-		if (!cache)
-			break;
+	cache_offset = offset & DSO__DATA_CACHE_MASK;
 
-		cache_offset = offset & DSO__DATA_CACHE_MASK;
-		ret = -EINVAL;
+	pthread_mutex_lock(&dso__data_open_lock);
 
-		if (-1 == lseek(dso->data.fd, cache_offset, SEEK_SET))
-			break;
+	/*
+	 * dso->data.fd might be closed if other thread opened another
+	 * file (dso) due to open file limit (RLIMIT_NOFILE).
+	 */
+	if (dso->data.fd < 0) {
+		dso->data.fd = open_dso(dso, machine);
+		if (dso->data.fd < 0) {
+			ret = -errno;
+			dso->data.status = DSO_DATA_STATUS_ERROR;
+			goto err_unlock;
+		}
+	}
 
-		ret = read(dso->data.fd, cache->data, DSO__DATA_CACHE_SIZE);
-		if (ret <= 0)
-			break;
+	if (-1 == lseek(dso->data.fd, cache_offset, SEEK_SET))
+		goto err_unlock;
 
-		cache->offset = cache_offset;
-		cache->size   = ret;
-		old = dso_cache__insert(dso, cache);
-		if (old) {
-			/* we lose the race */
-			free(cache);
-			cache = old;
-		}
+	ret = read(dso->data.fd, cache->data, DSO__DATA_CACHE_SIZE);
+	if (ret <= 0)
+		goto err_unlock;
 
-		ret = dso_cache__memcpy(cache, offset, data, size);
+	pthread_mutex_unlock(&dso__data_open_lock);
 
-	} while (0);
+	cache->offset = cache_offset;
+	cache->size   = ret;
+	old = dso_cache__insert(dso, cache);
+	if (old) {
+		/* we lose the race */
+		free(cache);
+		cache = old;
+	}
 
+	ret = dso_cache__memcpy(cache, offset, data, size);
 	if (ret <= 0)
 		free(cache);
 
 	return ret;
+
+err_unlock:
+	pthread_mutex_unlock(&dso__data_open_lock);
+	return ret;
 }
 
-static ssize_t dso_cache_read(struct dso *dso, u64 offset,
-			      u8 *data, ssize_t size)
+static ssize_t dso_cache_read(struct dso *dso, struct machine *machine,
+			      u64 offset, u8 *data, ssize_t size)
 {
 	struct dso_cache *cache;
 
@@ -587,7 +607,7 @@ static ssize_t dso_cache_read(struct dso *dso, u64 offset,
 	if (cache)
 		return dso_cache__memcpy(cache, offset, data, size);
 	else
-		return dso_cache__read(dso, offset, data, size);
+		return dso_cache__read(dso, machine, offset, data, size);
 }
 
 /*
@@ -595,7 +615,8 @@ static ssize_t dso_cache_read(struct dso *dso, u64 offset,
  * in the rb_tree. Any read to already cached data is served
  * by cached data.
  */
-static ssize_t cached_read(struct dso *dso, u64 offset, u8 *data, ssize_t size)
+static ssize_t cached_read(struct dso *dso, struct machine *machine,
+			   u64 offset, u8 *data, ssize_t size)
 {
 	ssize_t r = 0;
 	u8 *p = data;
@@ -603,7 +624,7 @@ static ssize_t cached_read(struct dso *dso, u64 offset, u8 *data, ssize_t size)
 	do {
 		ssize_t ret;
 
-		ret = dso_cache_read(dso, offset, p, size);
+		ret = dso_cache_read(dso, machine, offset, p, size);
 		if (ret < 0)
 			return ret;
 
@@ -623,21 +644,42 @@ static ssize_t cached_read(struct dso *dso, u64 offset, u8 *data, ssize_t size)
 	return r;
 }
 
-static int data_file_size(struct dso *dso)
+static int data_file_size(struct dso *dso, struct machine *machine)
 {
+	int ret = 0;
 	struct stat st;
 	char sbuf[STRERR_BUFSIZE];
 
-	if (!dso->data.file_size) {
-		if (fstat(dso->data.fd, &st)) {
-			pr_err("dso mmap failed, fstat: %s\n",
-				strerror_r(errno, sbuf, sizeof(sbuf)));
-			return -1;
+	if (dso->data.file_size)
+		return 0;
+
+	pthread_mutex_lock(&dso__data_open_lock);
+
+	/*
+	 * dso->data.fd might be closed if other thread opened another
+	 * file (dso) due to open file limit (RLIMIT_NOFILE).
+	 */
+	if (dso->data.fd < 0) {
+		dso->data.fd = open_dso(dso, machine);
+		if (dso->data.fd < 0) {
+			ret = -errno;
+			dso->data.status = DSO_DATA_STATUS_ERROR;
+			goto out;
 		}
-		dso->data.file_size = st.st_size;
 	}
 
-	return 0;
+	if (fstat(dso->data.fd, &st) < 0) {
+		ret = -errno;
+		pr_err("dso cache fstat failed: %s\n",
+		       strerror_r(errno, sbuf, sizeof(sbuf)));
+		dso->data.status = DSO_DATA_STATUS_ERROR;
+		goto out;
+	}
+	dso->data.file_size = st.st_size;
+
+out:
+	pthread_mutex_unlock(&dso__data_open_lock);
+	return ret;
 }
 
 /**
@@ -655,17 +697,17 @@ off_t dso__data_size(struct dso *dso, struct machine *machine)
 	if (fd < 0)
 		return fd;
 
-	if (data_file_size(dso))
+	if (data_file_size(dso, machine))
 		return -1;
 
 	/* For now just estimate dso data size is close to file size */
 	return dso->data.file_size;
 }
 
-static ssize_t data_read_offset(struct dso *dso, u64 offset,
-				u8 *data, ssize_t size)
+static ssize_t data_read_offset(struct dso *dso, struct machine *machine,
+				u64 offset, u8 *data, ssize_t size)
 {
-	if (data_file_size(dso))
+	if (data_file_size(dso, machine))
 		return -1;
 
 	/* Check the offset sanity. */
@@ -675,7 +717,7 @@ static ssize_t data_read_offset(struct dso *dso, u64 offset,
 	if (offset + size < offset)
 		return -1;
 
-	return cached_read(dso, offset, data, size);
+	return cached_read(dso, machine, offset, data, size);
 }
 
 /**
@@ -692,10 +734,10 @@ static ssize_t data_read_offset(struct dso *dso, u64 offset,
 ssize_t dso__data_read_offset(struct dso *dso, struct machine *machine,
 			      u64 offset, u8 *data, ssize_t size)
 {
-	if (dso__data_fd(dso, machine) < 0)
+	if (dso->data.status == DSO_DATA_STATUS_ERROR)
 		return -1;
 
-	return data_read_offset(dso, offset, data, size);
+	return data_read_offset(dso, machine, offset, data, size);
 }
 
 /**
-- 
2.1.3


* [PATCH 24/37] perf session: Pass struct events stats to event processing functions
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Pass the stats structure explicitly so that it can point to a separate
object when used in a multi-threaded environment.
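
Passing the stats pointer down is what allows per-thread accounting:
each worker can count into its own object without locking, and the
counts are summed once the workers finish.  The callers shown in this
patch still pass &session->stats, so the sketch below only illustrates
the intended use; struct stats, struct worker and the helpers are
hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, trimmed-down counterpart of struct events_stats. */
struct stats {
	uint64_t nr_events;
	uint32_t nr_unknown_id;
};

/* Hypothetical event handler: it only touches the stats it is given. */
static void process_event(struct stats *stats, int id_known)
{
	stats->nr_events++;
	if (!id_known)
		stats->nr_unknown_id++;
}

struct worker {
	pthread_t tid;
	struct stats stats;	/* private to this worker: no locking */
	int nr;			/* number of events to process */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;
	int i;

	/* Pretend every 10th event has an unknown id. */
	for (i = 0; i < w->nr; i++)
		process_event(&w->stats, i % 10 != 0);
	return NULL;
}

/* Run @n workers, then fold their private stats into @total. */
static int run_workers(struct worker *ws, int n, struct stats *total)
{
	int i, created;

	for (created = 0; created < n; created++)
		if (pthread_create(&ws[created].tid, NULL, worker_fn,
				   &ws[created]) != 0)
			break;
	for (i = 0; i < created; i++) {
		pthread_join(ws[i].tid, NULL);
		total->nr_events += ws[i].stats.nr_events;
		total->nr_unknown_id += ws[i].stats.nr_unknown_id;
	}
	return created == n ? 0 : -1;
}
```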

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/ordered-events.c |  4 +-
 tools/perf/util/session.c        | 87 +++++++++++++++++++++++-----------------
 tools/perf/util/session.h        |  1 +
 3 files changed, 54 insertions(+), 38 deletions(-)

diff --git a/tools/perf/util/ordered-events.c b/tools/perf/util/ordered-events.c
index fd4be94125fb..e933c51d7090 100644
--- a/tools/perf/util/ordered-events.c
+++ b/tools/perf/util/ordered-events.c
@@ -183,7 +183,9 @@ static int __ordered_events__flush(struct perf_session *s,
 		if (ret)
 			pr_err("Can't parse sample, err = %d\n", ret);
 		else {
-			ret = perf_session__deliver_event(s, iter->event, &sample, tool,
+			ret = perf_session__deliver_event(s, &s->stats,
+							  iter->event,
+							  &sample, tool,
 							  iter->file_offset);
 			if (ret)
 				return ret;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4f0fcd2d3901..af2608e782ae 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -779,6 +779,7 @@ static struct machine *
 }
 
 static int deliver_sample_value(struct perf_session *session,
+				struct events_stats *stats,
 				struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -795,7 +796,7 @@ static int deliver_sample_value(struct perf_session *session,
 	}
 
 	if (!sid || sid->evsel == NULL) {
-		++session->stats.nr_unknown_id;
+		++stats->nr_unknown_id;
 		return 0;
 	}
 
@@ -803,6 +804,7 @@ static int deliver_sample_value(struct perf_session *session,
 }
 
 static int deliver_sample_group(struct perf_session *session,
+				struct events_stats *stats,
 				struct perf_tool *tool,
 				union  perf_event *event,
 				struct perf_sample *sample,
@@ -812,7 +814,7 @@ static int deliver_sample_group(struct perf_session *session,
 	u64 i;
 
 	for (i = 0; i < sample->read.group.nr; i++) {
-		ret = deliver_sample_value(session, tool, event, sample,
+		ret = deliver_sample_value(session, stats, tool, event, sample,
 					   &sample->read.group.values[i],
 					   machine);
 		if (ret)
@@ -824,6 +826,7 @@ static int deliver_sample_group(struct perf_session *session,
 
 static int
 perf_session__deliver_sample(struct perf_session *session,
+			     struct events_stats *stats,
 			     struct perf_tool *tool,
 			     union  perf_event *event,
 			     struct perf_sample *sample,
@@ -840,14 +843,15 @@ perf_session__deliver_sample(struct perf_session *session,
 
 	/* For PERF_SAMPLE_READ we have either single or group mode. */
 	if (read_format & PERF_FORMAT_GROUP)
-		return deliver_sample_group(session, tool, event, sample,
+		return deliver_sample_group(session, stats, tool, event, sample,
 					    machine);
 	else
-		return deliver_sample_value(session, tool, event, sample,
+		return deliver_sample_value(session, stats, tool, event, sample,
 					    &sample->read.one, machine);
 }
 
 int perf_session__deliver_event(struct perf_session *session,
+				struct events_stats *stats,
 				union perf_event *event,
 				struct perf_sample *sample,
 				struct perf_tool *tool, u64 file_offset)
@@ -866,14 +870,14 @@ int perf_session__deliver_event(struct perf_session *session,
 	case PERF_RECORD_SAMPLE:
 		dump_sample(evsel, event, sample);
 		if (evsel == NULL) {
-			++session->stats.nr_unknown_id;
+			++stats->nr_unknown_id;
 			return 0;
 		}
 		if (machine == NULL) {
-			++session->stats.nr_unprocessable_samples;
+			++stats->nr_unprocessable_samples;
 			return 0;
 		}
-		return perf_session__deliver_sample(session, tool, event,
+		return perf_session__deliver_sample(session, stats, tool, event,
 						    sample, evsel, machine);
 	case PERF_RECORD_MMAP:
 		return tool->mmap(tool, event, sample, machine);
@@ -887,7 +891,7 @@ int perf_session__deliver_event(struct perf_session *session,
 		return tool->exit(tool, event, sample, machine);
 	case PERF_RECORD_LOST:
 		if (tool->lost == perf_event__process_lost)
-			session->stats.total_lost += event->lost.lost;
+			stats->total_lost += event->lost.lost;
 		return tool->lost(tool, event, sample, machine);
 	case PERF_RECORD_READ:
 		return tool->read(tool, event, sample, evsel, machine);
@@ -896,7 +900,7 @@ int perf_session__deliver_event(struct perf_session *session,
 	case PERF_RECORD_UNTHROTTLE:
 		return tool->unthrottle(tool, event, sample, machine);
 	default:
-		++session->stats.nr_unknown_events;
+		++stats->nr_unknown_events;
 		return -1;
 	}
 }
@@ -951,7 +955,8 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
 		return perf_session__process_user_event(session, event, tool, 0);
 
-	return perf_session__deliver_event(session, event, sample, tool, 0);
+	return perf_session__deliver_event(session, &session->stats,
+					   event, sample, tool, 0);
 }
 
 static void event_swap(union perf_event *event, bool sample_id_all)
@@ -1019,6 +1024,7 @@ int perf_session__peek_event(struct perf_session *session, off_t file_offset,
 }
 
 static s64 perf_session__process_event(struct perf_session *session,
+				       struct events_stats *stats,
 				       union perf_event *event,
 				       struct perf_tool *tool,
 				       u64 file_offset)
@@ -1032,7 +1038,7 @@ static s64 perf_session__process_event(struct perf_session *session,
 	if (event->header.type >= PERF_RECORD_HEADER_MAX)
 		return -EINVAL;
 
-	events_stats__inc(&session->stats, event->header.type);
+	events_stats__inc(stats, event->header.type);
 
 	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
 		return perf_session__process_user_event(session, event, tool, file_offset);
@@ -1051,8 +1057,8 @@ static s64 perf_session__process_event(struct perf_session *session,
 			return ret;
 	}
 
-	return perf_session__deliver_event(session, event, &sample, tool,
-					   file_offset);
+	return perf_session__deliver_event(session, stats, event, &sample,
+					   tool, file_offset);
 }
 
 void perf_event_header__bswap(struct perf_event_header *hdr)
@@ -1080,43 +1086,43 @@ static struct thread *perf_session__register_idle_thread(struct perf_session *se
 	return thread;
 }
 
-static void perf_session__warn_about_errors(const struct perf_session *session,
+static void events_stats__warn_about_errors(const struct events_stats *stats,
 					    const struct perf_tool *tool)
 {
 	if (tool->lost == perf_event__process_lost &&
-	    session->stats.nr_events[PERF_RECORD_LOST] != 0) {
+	    stats->nr_events[PERF_RECORD_LOST] != 0) {
 		ui__warning("Processed %d events and lost %d chunks!\n\n"
 			    "Check IO/CPU overload!\n\n",
-			    session->stats.nr_events[0],
-			    session->stats.nr_events[PERF_RECORD_LOST]);
+			    stats->nr_events[0],
+			    stats->nr_events[PERF_RECORD_LOST]);
 	}
 
-	if (session->stats.nr_unknown_events != 0) {
+	if (stats->nr_unknown_events != 0) {
 		ui__warning("Found %u unknown events!\n\n"
 			    "Is this an older tool processing a perf.data "
 			    "file generated by a more recent tool?\n\n"
 			    "If that is not the case, consider "
 			    "reporting to linux-kernel@vger.kernel.org.\n\n",
-			    session->stats.nr_unknown_events);
+			    stats->nr_unknown_events);
 	}
 
-	if (session->stats.nr_unknown_id != 0) {
+	if (stats->nr_unknown_id != 0) {
 		ui__warning("%u samples with id not present in the header\n",
-			    session->stats.nr_unknown_id);
+			    stats->nr_unknown_id);
 	}
 
- 	if (session->stats.nr_invalid_chains != 0) {
+	if (stats->nr_invalid_chains != 0) {
  		ui__warning("Found invalid callchains!\n\n"
  			    "%u out of %u events were discarded for this reason.\n\n"
  			    "Consider reporting to linux-kernel@vger.kernel.org.\n\n",
- 			    session->stats.nr_invalid_chains,
- 			    session->stats.nr_events[PERF_RECORD_SAMPLE]);
+			    stats->nr_invalid_chains,
+			    stats->nr_events[PERF_RECORD_SAMPLE]);
  	}
 
-	if (session->stats.nr_unprocessable_samples != 0) {
+	if (stats->nr_unprocessable_samples != 0) {
 		ui__warning("%u unprocessable samples recorded.\n"
 			    "Do you have a KVM guest running and not using 'perf kvm'?\n",
-			    session->stats.nr_unprocessable_samples);
+			    stats->nr_unprocessable_samples);
 	}
 }
 
@@ -1188,7 +1194,8 @@ static int __perf_session__process_pipe_events(struct perf_session *session,
 		}
 	}
 
-	if ((skip = perf_session__process_event(session, event, tool, head)) < 0) {
+	if ((skip = perf_session__process_event(session, &session->stats,
+						event, tool, head)) < 0) {
 		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
 		       head, event->header.size, event->header.type);
 		err = -EINVAL;
@@ -1207,7 +1214,7 @@ static int __perf_session__process_pipe_events(struct perf_session *session,
 	err = ordered_events__flush(session, tool, OE_FLUSH__FINAL);
 out_err:
 	free(buf);
-	perf_session__warn_about_errors(session, tool);
+	events_stats__warn_about_errors(&session->stats, tool);
 	ordered_events__free(&session->ordered_events);
 	return err;
 }
@@ -1252,7 +1259,8 @@ fetch_mmaped_event(struct perf_session *session,
 #define NUM_MMAPS 128
 #endif
 
-static int __perf_session__process_events(struct perf_session *session, int fd,
+static int __perf_session__process_events(struct perf_session *session,
+					  struct events_stats *stats, int fd,
 					  u64 data_offset, u64 data_size,
 					  u64 file_size, struct perf_tool *tool)
 {
@@ -1278,7 +1286,9 @@ static int __perf_session__process_events(struct perf_session *session, int fd,
 	mmap_size = MMAP_SIZE;
 	if (mmap_size > file_size) {
 		mmap_size = file_size;
-		session->one_mmap = true;
+
+		if (!session->file->is_multi)
+			session->one_mmap = true;
 	}
 
 	memset(mmaps, 0, sizeof(mmaps));
@@ -1323,8 +1333,8 @@ static int __perf_session__process_events(struct perf_session *session, int fd,
 	size = event->header.size;
 
 	if (size < sizeof(struct perf_event_header) ||
-	    (skip = perf_session__process_event(session, event, tool, file_pos))
-									< 0) {
+	    (skip = perf_session__process_event(session, stats, event,
+						tool, file_pos)) < 0) {
 		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
 		       file_offset + head, event->header.size,
 		       event->header.type);
@@ -1351,7 +1361,6 @@ static int __perf_session__process_events(struct perf_session *session, int fd,
 	err = ordered_events__flush(session, tool, OE_FLUSH__FINAL);
 out_err:
 	ui_progress__finish();
-	perf_session__warn_about_errors(session, tool);
 	ordered_events__free(&session->ordered_events);
 	session->one_mmap = false;
 	return err;
@@ -1369,13 +1378,16 @@ int perf_session__process_events(struct perf_session *session,
 	if (perf_data_file__is_pipe(session->file))
 		return __perf_session__process_pipe_events(session, tool);
 
-	err = __perf_session__process_events(session,
+	err = __perf_session__process_events(session, &session->stats,
 					     perf_data_file__fd(session->file),
 					     session->header.data_offset,
 					     session->header.data_size,
 					     size, tool);
-	if (!session->file->is_multi || err)
+
+	if (!session->file->is_multi || err) {
+		events_stats__warn_about_errors(&session->stats, tool);
 		return err;
+	}
 
 	/*
 	 * For multi-file data storage, events are processed for each
@@ -1390,12 +1402,13 @@ int perf_session__process_events(struct perf_session *session,
 		if (size == 0)
 			continue;
 
-		err = __perf_session__process_events(session, fd,
-						     0, size, size, tool);
+		err = __perf_session__process_events(session, &session->stats,
+						     fd, 0, size, size, tool);
 		if (err < 0)
 			break;
 	}
 
+	events_stats__warn_about_errors(&session->stats, tool);
 	return err;
 }
 
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 6d663dc76404..8fc067d931cd 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -59,6 +59,7 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
 void perf_tool__fill_defaults(struct perf_tool *tool);
 
 int perf_session__deliver_event(struct perf_session *session,
+				struct events_stats *stats,
 				union perf_event *event,
 				struct perf_sample *sample,
 				struct perf_tool *tool, u64 file_offset);
-- 
2.1.3


* [PATCH 25/37] perf hists: Pass hists struct to hist_entry_iter functions
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (23 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 24/37] perf session: Pass struct events stats to event processing functions Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 26/37] perf tools: Move BUILD_ID_SIZE definition to perf.h Namhyung Kim
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

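This is a preparation for multi-threaded perf report support.  When
multi-threading is enabled, each thread will have its own hists during
sample processing, so hist_entry_iter__add() now takes the target hists
explicitly instead of deriving it from the evsel.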
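A minimal sketch of the idea, assuming heavily simplified stand-ins:
struct hists and struct entry_iter below are hypothetical and only track
a sample count and a total period.  The point is that the caller, not
the evsel, picks the destination hists, so a worker thread can pass its
private copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct hists. */
struct hists {
	unsigned long nr_samples;
	unsigned long long total_period;
};

/* Hypothetical stand-in for struct hist_entry_iter. */
struct entry_iter {
	struct hists *hists;	/* chosen by the caller, not derived from an evsel */
	unsigned long long period;
};

/*
 * Mirrors the idea of hist_entry_iter__add() taking a hists argument:
 * the sample is accounted in whatever hists the caller passes in.
 */
static int entry_iter__add(struct entry_iter *iter, struct hists *hists,
			   unsigned long long period)
{
	assert(hists != NULL);
	iter->hists = hists;
	iter->period = period;

	iter->hists->nr_samples++;
	iter->hists->total_period += period;
	return 0;
}
```

A worker would pass its own per-thread hists here, leaving the global
hists untouched until a later merge step.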

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c       |  4 ++--
 tools/perf/builtin-top.c          |  4 ++--
 tools/perf/tests/hists_cumulate.c |  4 ++--
 tools/perf/tests/hists_filter.c   |  3 ++-
 tools/perf/tests/hists_output.c   |  4 ++--
 tools/perf/util/hist.c            | 26 +++++++++++---------------
 tools/perf/util/hist.h            |  6 ++++--
 7 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 4cac79ad3085..aabcfc24afd1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -166,8 +166,8 @@ static int process_sample_event(struct perf_tool *tool,
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
 
-	ret = hist_entry_iter__add(&iter, &al, evsel, sample, rep->max_stack,
-				   rep);
+	ret = hist_entry_iter__add(&iter, evsel__hists(evsel), evsel, &al,
+				   sample, rep->max_stack, rep);
 	if (ret < 0)
 		pr_debug("problem adding hist entry, skipping event\n");
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 961cea183a83..818ae35cbd7b 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -784,8 +784,8 @@ static void perf_event__process_sample(struct perf_tool *tool,
 
 		pthread_mutex_lock(&hists->lock);
 
-		err = hist_entry_iter__add(&iter, &al, evsel, sample,
-					   top->max_stack, top);
+		err = hist_entry_iter__add(&iter, evsel__hists(evsel), evsel,
+					   &al, sample, top->max_stack, top);
 		if (err < 0)
 			pr_err("Problem incrementing symbol period, skipping event\n");
 
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 4b8226e19a91..b1a2bed721a8 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -104,8 +104,8 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 						  &sample) < 0)
 			goto out;
 
-		if (hist_entry_iter__add(&iter, &al, evsel, &sample,
-					 PERF_MAX_STACK_DEPTH, NULL) < 0)
+		if (hist_entry_iter__add(&iter, evsel__hists(evsel), evsel, &al,
+					 &sample, PERF_MAX_STACK_DEPTH, NULL) < 0)
 			goto out;
 
 		fake_samples[i].thread = al.thread;
diff --git a/tools/perf/tests/hists_filter.c b/tools/perf/tests/hists_filter.c
index 59e53db7914c..3c54264ec58d 100644
--- a/tools/perf/tests/hists_filter.c
+++ b/tools/perf/tests/hists_filter.c
@@ -81,7 +81,8 @@ static int add_hist_entries(struct perf_evlist *evlist,
 							  &sample) < 0)
 				goto out;
 
-			if (hist_entry_iter__add(&iter, &al, evsel, &sample,
+			if (hist_entry_iter__add(&iter, evsel__hists(evsel),
+						 evsel, &al, &sample,
 						 PERF_MAX_STACK_DEPTH, NULL) < 0)
 				goto out;
 
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index f5547610da02..c705bc7a5e78 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -70,8 +70,8 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 						  &sample) < 0)
 			goto out;
 
-		if (hist_entry_iter__add(&iter, &al, evsel, &sample,
-					 PERF_MAX_STACK_DEPTH, NULL) < 0)
+		if (hist_entry_iter__add(&iter, evsel__hists(evsel), evsel, &al,
+					 &sample, PERF_MAX_STACK_DEPTH, NULL) < 0)
 			goto out;
 
 		fake_samples[i].thread = al.thread;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index d322264bac22..d7cee7165bcd 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -514,7 +514,7 @@ iter_add_single_mem_entry(struct hist_entry_iter *iter, struct addr_location *al
 	u64 cost;
 	struct mem_info *mi = iter->priv;
 	struct perf_sample *sample = iter->sample;
-	struct hists *hists = evsel__hists(iter->evsel);
+	struct hists *hists = iter->hists;
 	struct hist_entry *he;
 
 	if (mi == NULL)
@@ -544,8 +544,7 @@ static int
 iter_finish_mem_entry(struct hist_entry_iter *iter,
 		      struct addr_location *al __maybe_unused)
 {
-	struct perf_evsel *evsel = iter->evsel;
-	struct hists *hists = evsel__hists(evsel);
+	struct hists *hists = iter->hists;
 	struct hist_entry *he = iter->he;
 	int err = -EINVAL;
 
@@ -617,8 +616,7 @@ static int
 iter_add_next_branch_entry(struct hist_entry_iter *iter, struct addr_location *al)
 {
 	struct branch_info *bi;
-	struct perf_evsel *evsel = iter->evsel;
-	struct hists *hists = evsel__hists(evsel);
+	struct hists *hists = iter->hists;
 	struct hist_entry *he = NULL;
 	int i = iter->curr;
 	int err = 0;
@@ -665,11 +663,10 @@ iter_prepare_normal_entry(struct hist_entry_iter *iter __maybe_unused,
 static int
 iter_add_single_normal_entry(struct hist_entry_iter *iter, struct addr_location *al)
 {
-	struct perf_evsel *evsel = iter->evsel;
 	struct perf_sample *sample = iter->sample;
 	struct hist_entry *he;
 
-	he = __hists__add_entry(evsel__hists(evsel), al, iter->parent, NULL, NULL,
+	he = __hists__add_entry(iter->hists, al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
 				sample->transaction, sample->time, true);
 	if (he == NULL)
@@ -684,7 +681,6 @@ iter_finish_normal_entry(struct hist_entry_iter *iter,
 			 struct addr_location *al __maybe_unused)
 {
 	struct hist_entry *he = iter->he;
-	struct perf_evsel *evsel = iter->evsel;
 	struct perf_sample *sample = iter->sample;
 
 	if (he == NULL)
@@ -692,7 +688,7 @@ iter_finish_normal_entry(struct hist_entry_iter *iter,
 
 	iter->he = NULL;
 
-	hists__inc_nr_samples(evsel__hists(evsel), he->filtered);
+	hists__inc_nr_samples(iter->hists, he->filtered);
 
 	return hist_entry__append_callchain(he, sample);
 }
@@ -724,8 +720,7 @@ static int
 iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
 				 struct addr_location *al)
 {
-	struct perf_evsel *evsel = iter->evsel;
-	struct hists *hists = evsel__hists(evsel);
+	struct hists *hists = iter->hists;
 	struct perf_sample *sample = iter->sample;
 	struct hist_entry **he_cache = iter->priv;
 	struct hist_entry *he;
@@ -770,7 +765,6 @@ static int
 iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			       struct addr_location *al)
 {
-	struct perf_evsel *evsel = iter->evsel;
 	struct perf_sample *sample = iter->sample;
 	struct hist_entry **he_cache = iter->priv;
 	struct hist_entry *he;
@@ -804,7 +798,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 		}
 	}
 
-	he = __hists__add_entry(evsel__hists(evsel), al, iter->parent, NULL, NULL,
+	he = __hists__add_entry(iter->hists, al, iter->parent, NULL, NULL,
 				sample->period, sample->weight,
 				sample->transaction, sample->time, false);
 	if (he == NULL)
@@ -860,8 +854,9 @@ const struct hist_iter_ops hist_iter_cumulative = {
 	.finish_entry 		= iter_finish_cumulative_entry,
 };
 
-int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
-			 struct perf_evsel *evsel, struct perf_sample *sample,
+int hist_entry_iter__add(struct hist_entry_iter *iter, struct hists *hists,
+			 struct perf_evsel *evsel, struct addr_location *al,
+			 struct perf_sample *sample,
 			 int max_stack_depth, void *arg)
 {
 	int err, err2;
@@ -871,6 +866,7 @@ int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
 	if (err)
 		return err;
 
+	iter->hists = hists;
 	iter->evsel = evsel;
 	iter->sample = sample;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index b86966206ba8..2ee0e40cf44c 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -86,6 +86,7 @@ struct hist_entry_iter {
 
 	bool hide_unresolved;
 
+	struct hists *hists;
 	struct perf_evsel *evsel;
 	struct perf_sample *sample;
 	struct hist_entry *he;
@@ -110,8 +111,9 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
 				      struct mem_info *mi, u64 period,
 				      u64 weight, u64 transaction,
 				      u64 timestamp, bool sample_self);
-int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
-			 struct perf_evsel *evsel, struct perf_sample *sample,
+int hist_entry_iter__add(struct hist_entry_iter *iter, struct hists *hists,
+			 struct perf_evsel *evsel, struct addr_location *al,
+			 struct perf_sample *sample,
 			 int max_stack_depth, void *arg);
 
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
-- 
2.1.3


* [PATCH 26/37] perf tools: Move BUILD_ID_SIZE definition to perf.h
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (24 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 25/37] perf hists: Pass hists struct to hist_entry_iter functions Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 27/37] perf report: Parallelize perf report using multi-thread Namhyung Kim
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

util/event.h includes util/build-id.h only for BUILD_ID_SIZE.  This
becomes a problem when util/tool.h includes util/event.h: since
util/build-id.h itself includes util/tool.h, this creates a circular
dependency that results in an incomplete type error.  Move the
definition to perf.h so that util/event.h no longer needs build-id.h.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/perf.h          | 1 +
 tools/perf/util/build-id.h | 2 --
 tools/perf/util/dso.h      | 1 +
 tools/perf/util/event.h    | 1 -
 4 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 37284eb47b56..56ce258314fc 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -30,6 +30,7 @@ static inline unsigned long long rdclock(void)
 }
 
 #define MAX_NR_CPUS			256
+#define BUILD_ID_SIZE			20
 
 extern const char *input_name;
 extern bool perf_host, perf_guest;
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index 8236319514d5..8f31545edc5b 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -1,8 +1,6 @@
 #ifndef PERF_BUILD_ID_H_
 #define PERF_BUILD_ID_H_ 1
 
-#define BUILD_ID_SIZE 20
-
 #include "tool.h"
 #include <linux/types.h>
 
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index ac753594a469..c18fcc0e8081 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -7,6 +7,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 #include "map.h"
+#include "perf.h"
 #include "build-id.h"
 
 enum dso_binary_type {
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 09b9e8d3fcf7..5f66abfa61ca 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -6,7 +6,6 @@
 
 #include "../perf.h"
 #include "map.h"
-#include "build-id.h"
 #include "perf_regs.h"
 
 struct mmap_event {
-- 
2.1.3


* [PATCH 27/37] perf report: Parallelize perf report using multi-thread
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (25 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 26/37] perf tools: Move BUILD_ID_SIZE definition to perf.h Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 28/37] perf tools: Add missing_threads rb tree Namhyung Kim
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Introduce perf_session__process_events_mt() to enable multi-threaded
sample processing.  It allocates a struct perf_tool_mt for each thread
and fills in the needed info.  The init and fini callbacks are provided
so that callers can pass an additional data structure if needed.
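A rough sketch of the per-thread tool pattern with init and fini
callbacks, using plain pthreads.  All names here (struct tool, struct
tool_mt, process_mt(), demo_init(), demo_fini()) are hypothetical
simplifications of struct perf_tool, struct perf_tool_mt and
perf_session__process_events_mt(); each worker sums integers instead of
processing a data file, and results are merged only after
pthread_join().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct tool {
	int (*sample)(struct tool *tool, int value);
};

struct tool_mt {
	struct tool tool;	/* per-thread copy of the caller's tool */
	void *priv;		/* set by init_cb, cleared by fini_cb */
	long sum;		/* per-thread result, so no locking is needed */
	int idx;
};

typedef int (*mt_callback_t)(struct tool_mt *mt, void *arg);

static int sample_cb(struct tool *tool, int value)
{
	struct tool_mt *mt = container_of(tool, struct tool_mt, tool);

	mt->sum += value;
	return 0;
}

static int demo_init(struct tool_mt *mt, void *arg) { mt->priv = arg; return 0; }
static int demo_fini(struct tool_mt *mt, void *arg) { (void)arg; mt->priv = NULL; return 0; }

static void *worker(void *arg)
{
	struct tool_mt *mt = arg;

	/* Stand-in for processing one per-thread data file. */
	for (int i = 1; i <= 10; i++)
		mt->tool.sample(&mt->tool, i * (mt->idx + 1));
	return mt;
}

static int process_mt(struct tool *tool, int nr, mt_callback_t init_cb,
		      mt_callback_t fini_cb, void *arg, long *total)
{
	struct tool_mt *mts = calloc(nr, sizeof(*mts));
	pthread_t *ids = calloc(nr, sizeof(*ids));
	int err = 0, started = 0;

	assert(nr > 0);
	if (mts == NULL || ids == NULL) {
		err = -1;
		goto out;
	}

	for (int i = 0; i < nr; i++) {
		mts[i].tool = *tool;
		mts[i].idx = i;
		if (init_cb(&mts[i], arg) < 0 ||
		    pthread_create(&ids[i], NULL, worker, &mts[i]) != 0) {
			err = -1;
			break;
		}
		started++;
	}

	/* Merge serially, only after each worker has finished. */
	*total = 0;
	for (int i = 0; i < started; i++) {
		pthread_join(ids[i], NULL);
		*total += mts[i].sum;
		if (fini_cb(&mts[i], arg) < 0 && err == 0)
			err = -1;
	}
out:
	free(ids);
	free(mts);
	return err;
}
```

Copying the tool into each per-thread struct lets the sample callback
recover its private state with container_of() without any shared
mutable data.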

The session and hists event stats are counted for each thread and
summed after the processing finishes.  Similarly, hist entries are
first added to per-thread hists and then moved to the original hists
using hists__multi_resort().  This function reuses the
hists__collapse_resort() code, so it forces sort__need_collapse to
true, and the separate collapsing step (report__collapse_hists()) is
skipped.
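The merge step can be sketched as below.  This is an illustration under
simplifying assumptions, not the perf code: struct stats, struct hist
and hist_merge() are hypothetical, and a linear search by symbol name
stands in for the rbtree insertion done by __hists__collapse_resort().

```c
#include <assert.h>
#include <string.h>

#define NR_TYPES	4	/* hypothetical number of event types */
#define MAX_ENTRIES	16

struct stats {
	unsigned long long total_period;
	unsigned int nr_events[NR_TYPES];
};

struct entry {
	const char *sym;	/* sort key */
	unsigned long long period;
};

struct hist {
	struct stats stats;
	struct entry entries[MAX_ENTRIES];
	int nr_entries;
};

/* Field-by-field sum, in the spirit of events_stats__add(). */
static void stats_add(struct stats *dst, const struct stats *src)
{
#define ADD(_f) (dst->_f += src->_f)
	ADD(total_period);
	for (int i = 0; i < NR_TYPES; i++)
		ADD(nr_events[i]);
#undef ADD
}

/* Move every entry of src into dst, collapsing entries with equal keys. */
static void hist_merge(struct hist *dst, const struct hist *src)
{
	stats_add(&dst->stats, &src->stats);

	for (int i = 0; i < src->nr_entries; i++) {
		const struct entry *e = &src->entries[i];
		int j;

		for (j = 0; j < dst->nr_entries; j++) {
			if (strcmp(dst->entries[j].sym, e->sym) == 0)
				break;
		}

		if (j < dst->nr_entries) {
			dst->entries[j].period += e->period;	/* collapse */
		} else {
			assert(dst->nr_entries < MAX_ENTRIES);
			dst->entries[dst->nr_entries++] = *e;	/* new entry */
		}
	}
}
```

Because each thread's hists is merged only after that thread has been
joined, the merge needs no locking.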

Note that most of the preprocessing stage is already done by first
processing the meta events in the dummy tracking evsel.  The
corresponding thread and map can be found based on the sample time,
and symbol loading and dso cache access are protected by a pthread
mutex.
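A minimal sketch of mutex-protected lazy loading, assuming a
hypothetical struct dso with a loaded flag (not perf's struct dso):
several threads look up symbols concurrently, but the expensive load
runs only once because the check and the load happen under the same
lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dso whose symbols are loaded on first use. */
struct dso {
	pthread_mutex_t lock;
	bool loaded;
	int nr_loads;		/* how often the expensive load really ran */
	int nr_symbols;
};

/* Caller must hold dso->lock. */
static void dso__load_locked(struct dso *dso)
{
	/* The expensive part (reading the symbol table, etc.) would go here. */
	dso->nr_symbols = 42;
	dso->nr_loads++;
	dso->loaded = true;
}

/* Safe to call from many threads: only the first caller loads. */
static int dso__find_symbols(struct dso *dso)
{
	int nr;

	pthread_mutex_lock(&dso->lock);
	if (!dso->loaded)
		dso__load_locked(dso);
	nr = dso->nr_symbols;
	pthread_mutex_unlock(&dso->lock);
	return nr;
}

static void *lookup_thread(void *arg)
{
	struct dso *dso = arg;
	int nr = 0;

	for (int i = 0; i < 1000; i++)
		nr = dso__find_symbols(dso);
	return nr == 42 ? arg : NULL;	/* NULL signals a bad lookup */
}

static int run_lookups(struct dso *dso, int nr_threads)
{
	pthread_t ids[8];
	int started = 0, err = 0;

	if (nr_threads > 8)
		return -1;

	for (int i = 0; i < nr_threads; i++) {
		if (pthread_create(&ids[i], NULL, lookup_thread, dso) != 0) {
			err = -1;
			break;
		}
		started++;
	}

	for (int i = 0; i < started; i++) {
		void *ret;

		pthread_join(ids[i], &ret);
		if (ret == NULL)
			err = -1;
	}
	return err;
}
```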

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-report.c |  90 +++++++++++++++++++++------
 tools/perf/util/hist.c      |  75 +++++++++++++++++++----
 tools/perf/util/hist.h      |   3 +
 tools/perf/util/session.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/session.h   |   4 ++
 tools/perf/util/tool.h      |  14 +++++
 6 files changed, 302 insertions(+), 30 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index aabcfc24afd1..796db514db31 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -128,18 +128,16 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
 	return err;
 }
 
-static int process_sample_event(struct perf_tool *tool,
-				union perf_event *event,
-				struct perf_sample *sample,
-				struct perf_evsel *evsel,
-				struct machine *machine)
+static int __process_sample_event(struct perf_tool *tool __maybe_unused,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct perf_evsel *evsel,
+				  struct machine *machine,
+				  struct hist_entry_iter *iter,
+				  struct hists *hists,
+				  struct report *rep)
 {
-	struct report *rep = container_of(tool, struct report, tool);
 	struct addr_location al;
-	struct hist_entry_iter iter = {
-		.hide_unresolved = rep->hide_unresolved,
-		.add_entry_cb = hist_iter__report_callback,
-	};
 	int ret;
 
 	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0) {
@@ -155,18 +153,18 @@ static int process_sample_event(struct perf_tool *tool,
 		return 0;
 
 	if (sort__mode == SORT_MODE__BRANCH)
-		iter.ops = &hist_iter_branch;
+		iter->ops = &hist_iter_branch;
 	else if (rep->mem_mode)
-		iter.ops = &hist_iter_mem;
+		iter->ops = &hist_iter_mem;
 	else if (symbol_conf.cumulate_callchain)
-		iter.ops = &hist_iter_cumulative;
+		iter->ops = &hist_iter_cumulative;
 	else
-		iter.ops = &hist_iter_normal;
+		iter->ops = &hist_iter_normal;
 
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
 
-	ret = hist_entry_iter__add(&iter, evsel__hists(evsel), evsel, &al,
+	ret = hist_entry_iter__add(iter, hists, evsel, &al,
 				   sample, rep->max_stack, rep);
 	if (ret < 0)
 		pr_debug("problem adding hist entry, skipping event\n");
@@ -174,6 +172,52 @@ static int process_sample_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_sample_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct perf_evsel *evsel,
+				struct machine *machine)
+{
+	struct report *rep = container_of(tool, struct report, tool);
+	struct hist_entry_iter iter = {
+		.hide_unresolved = rep->hide_unresolved,
+		.add_entry_cb = hist_iter__report_callback,
+	};
+
+	return __process_sample_event(tool, event, sample, evsel, machine,
+				      &iter, evsel__hists(evsel), rep);
+}
+
+static int process_sample_event_multi(struct perf_tool *tool,
+				      union perf_event *event,
+				      struct perf_sample *sample,
+				      struct perf_evsel *evsel,
+				      struct machine *machine)
+{
+	struct perf_tool_mt *mt = container_of(tool, struct perf_tool_mt, tool);
+	struct report *rep = mt->priv;
+	struct hist_entry_iter iter = {
+		.hide_unresolved = rep->hide_unresolved,
+	};
+
+	return __process_sample_event(tool, event, sample, evsel, machine,
+				      &iter, &mt->hists[evsel->idx], rep);
+}
+
+static int multi_report_init(struct perf_tool_mt *mt, void *arg)
+{
+	struct report *rep = arg;
+
+	mt->priv = rep;
+	return 0;
+}
+
+static int multi_report_fini(struct perf_tool_mt *mt, void *arg __maybe_unused)
+{
+	mt->priv = NULL;
+	return 0;
+}
+
 static int process_read_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample __maybe_unused,
@@ -483,7 +527,14 @@ static int __cmd_report(struct report *rep)
 	if (ret)
 		return ret;
 
-	ret = perf_session__process_events(session, &rep->tool);
+	if (file->is_multi) {
+		rep->tool.sample = process_sample_event_multi;
+		ret = perf_session__process_events_mt(session, &rep->tool,
+						      multi_report_init,
+						      multi_report_fini, rep);
+	} else {
+		ret = perf_session__process_events(session, &rep->tool);
+	}
 	if (ret)
 		return ret;
 
@@ -506,7 +557,12 @@ static int __cmd_report(struct report *rep)
 		}
 	}
 
-	report__collapse_hists(rep);
+	/*
+	 * For a multi-file report, hists__multi_resort() has already
+	 * been called, so there is no need to collapse here.
+	 */
+	if (!file->is_multi)
+		report__collapse_hists(rep);
 
 	if (session_done())
 		return 0;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index d7cee7165bcd..f3b39b45f2ec 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -953,7 +953,7 @@ void hist_entry__free(struct hist_entry *he)
  * collapse the histogram
  */
 
-static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
+static bool hists__collapse_insert_entry(struct hists *hists,
 					 struct rb_root *root,
 					 struct hist_entry *he)
 {
@@ -990,6 +990,13 @@ static bool hists__collapse_insert_entry(struct hists *hists __maybe_unused,
 	}
 	hists->nr_entries++;
 
+	/*
+	 * For multi-threaded report, he->hists points to a dummy
+	 * hists in the struct perf_tool_mt.  Please see
+	 * perf_session__process_events_mt().
+	 */
+	he->hists = hists;
+
 	rb_link_node(&he->rb_node_in, parent, p);
 	rb_insert_color(&he->rb_node_in, root);
 	return true;
@@ -1017,19 +1024,12 @@ static void hists__apply_filters(struct hists *hists, struct hist_entry *he)
 	hists__filter_entry_by_symbol(hists, he);
 }
 
-void hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
+static void __hists__collapse_resort(struct hists *hists, struct rb_root *root,
+				     struct ui_progress *prog)
 {
-	struct rb_root *root;
 	struct rb_node *next;
 	struct hist_entry *n;
 
-	if (!sort__need_collapse)
-		return;
-
-	hists->nr_entries = 0;
-
-	root = hists__get_rotate_entries_in(hists);
-
 	next = rb_first(root);
 
 	while (next) {
@@ -1052,6 +1052,27 @@ void hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 	}
 }
 
+void hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
+{
+	struct rb_root *root;
+
+	if (!sort__need_collapse)
+		return;
+
+	hists->nr_entries = 0;
+
+	root = hists__get_rotate_entries_in(hists);
+	__hists__collapse_resort(hists, root, prog);
+}
+
+void hists__multi_resort(struct hists *dst, struct hists *src)
+{
+	struct rb_root *root = src->entries_in;
+
+	sort__need_collapse = 1;
+	__hists__collapse_resort(dst, root, NULL);
+}
+
 static int hist_entry__sort(struct hist_entry *a, struct hist_entry *b)
 {
 	struct perf_hpp_fmt *fmt;
@@ -1280,6 +1301,29 @@ void events_stats__inc(struct events_stats *stats, u32 type)
 	++stats->nr_events[type];
 }
 
+void events_stats__add(struct events_stats *dst, struct events_stats *src)
+{
+	int i;
+
+#define ADD(_field)  dst->_field += src->_field
+
+	ADD(total_period);
+	ADD(total_non_filtered_period);
+	ADD(total_lost);
+	ADD(total_invalid_chains);
+	ADD(nr_non_filtered_samples);
+	ADD(nr_lost_warned);
+	ADD(nr_unknown_events);
+	ADD(nr_invalid_chains);
+	ADD(nr_unknown_id);
+	ADD(nr_unprocessable_samples);
+
+	for (i = 0; i < PERF_RECORD_HEADER_MAX; i++)
+		ADD(nr_events[i]);
+
+#undef ADD
+}
+
 void hists__inc_nr_events(struct hists *hists, u32 type)
 {
 	events_stats__inc(&hists->stats, type);
@@ -1456,16 +1500,21 @@ int perf_hist_config(const char *var, const char *value)
 	return 0;
 }
 
-static int hists_evsel__init(struct perf_evsel *evsel)
+void __hists__init(struct hists *hists)
 {
-	struct hists *hists = evsel__hists(evsel);
-
 	memset(hists, 0, sizeof(*hists));
 	hists->entries_in_array[0] = hists->entries_in_array[1] = RB_ROOT;
 	hists->entries_in = &hists->entries_in_array[0];
 	hists->entries_collapsed = RB_ROOT;
 	hists->entries = RB_ROOT;
 	pthread_mutex_init(&hists->lock, NULL);
+}
+
+static int hists_evsel__init(struct perf_evsel *evsel)
+{
+	struct hists *hists = evsel__hists(evsel);
+
+	__hists__init(hists);
 	return 0;
 }
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 2ee0e40cf44c..4d975f5501ed 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -124,6 +124,7 @@ int hist_entry__sort_snprintf(struct hist_entry *he, char *bf, size_t size,
 void hist_entry__free(struct hist_entry *);
 
 void hists__output_resort(struct hists *hists, struct ui_progress *prog);
+void hists__multi_resort(struct hists *dst, struct hists *src);
 void hists__collapse_resort(struct hists *hists, struct ui_progress *prog);
 
 void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel);
@@ -136,6 +137,7 @@ void hists__inc_stats(struct hists *hists, struct hist_entry *h);
 void hists__inc_nr_events(struct hists *hists, u32 type);
 void hists__inc_nr_samples(struct hists *hists, bool filtered);
 void events_stats__inc(struct events_stats *stats, u32 type);
+void events_stats__add(struct events_stats *dst, struct events_stats *src);
 size_t events_stats__fprintf(struct events_stats *stats, FILE *fp);
 
 size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
@@ -179,6 +181,7 @@ static inline struct hists *evsel__hists(struct perf_evsel *evsel)
 }
 
 int hists__init(void);
+void __hists__init(struct hists *hists);
 
 struct perf_hpp {
 	char *buf;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index af2608e782ae..c1a17110ec6a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1412,6 +1412,152 @@ int perf_session__process_events(struct perf_session *session,
 	return err;
 }
 
+static void *processing_thread(void *arg)
+{
+	struct perf_tool_mt *mt_tool = arg;
+	int fd = perf_data_file__multi_fd(mt_tool->session->file, mt_tool->idx);
+	u64 size;
+
+	size = lseek(fd, 0, SEEK_END);
+	if (size == 0)
+		return arg;
+
+	pr_debug("processing samples using thread [%d]\n", mt_tool->idx);
+	if (__perf_session__process_events(mt_tool->session, &mt_tool->stats,
+					   fd, 0, size, size, &mt_tool->tool) < 0) {
+		pr_err("processing samples failed (thread [%d])\n", mt_tool->idx);
+		/* mt_tool is an element of mt_tools[]; the caller frees it
+		 * and its hists, so freeing them here would double free. */
+		return NULL;
+	}
+
+	pr_debug("processing samples done for thread [%d]\n", mt_tool->idx);
+	return arg;
+}
+
+int perf_session__process_events_mt(struct perf_session *session,
+				    struct perf_tool *tool,
+				    mt_tool_callback_t init_cb,
+				    mt_tool_callback_t fini_cb, void *arg)
+{
+	struct perf_data_file *file = session->file;
+	struct perf_evlist *evlist = session->evlist;
+	u64 size = perf_data_file__size(file);
+	struct perf_tool_mt *mt_tools = NULL;
+	struct perf_tool_mt *mt;
+	pthread_t *mt_id;
+	int err, i, k;
+
+	if (perf_session__register_idle_thread(session) == NULL)
+		return -ENOMEM;
+
+	if (perf_data_file__is_pipe(file) || !file->is_multi) {
+		pr_err("multi thread processing should be called with multi-file\n");
+		return -EINVAL;
+	}
+
+	err = __perf_session__process_events(session, &session->stats,
+					     perf_data_file__fd(file),
+					     session->header.data_offset,
+					     session->header.data_size,
+					     size, tool);
+	if (err)
+		return err;
+
+	mt_id = calloc(file->nr_multi, sizeof(*mt_id));
+	if (mt_id == NULL)
+		goto out;
+
+	mt_tools = calloc(file->nr_multi, sizeof(*mt_tools));
+	if (mt_tools == NULL)
+		goto out;
+
+	for (i = 0; i < file->nr_multi; i++) {
+		mt = &mt_tools[i];
+
+		memcpy(&mt->tool, tool, sizeof(*tool));
+		memset(&mt->stats, 0, sizeof(mt->stats));
+
+		mt->hists = calloc(evlist->nr_entries,
+					sizeof(*mt->hists));
+		if (mt->hists == NULL)
+			goto err;
+
+		for (k = 0; k < evlist->nr_entries; k++)
+			__hists__init(&mt->hists[k]);
+
+		mt->session = session;
+		mt->tool.ordered_events = false;
+		mt->idx = i;
+
+		err = init_cb(mt, arg);
+		if (err < 0)
+			goto err;
+
+		pthread_create(&mt_id[i], NULL, processing_thread, mt);
+	}
+
+	for (i = 0; i < file->nr_multi; i++) {
+		struct perf_evsel *evsel;
+		int err2;
+
+		pthread_join(mt_id[i], (void **)&mt);
+		if (mt == NULL) {
+			err = -EINVAL;
+			continue;
+		}
+
+		events_stats__add(&session->stats, &mt->stats);
+
+		evlist__for_each(evlist, evsel) {
+			struct hists *hists;
+
+			if (perf_evsel__is_dummy_tracking(evsel))
+				continue;
+
+			hists = evsel__hists(evsel);
+			events_stats__add(&hists->stats,
+					  &mt->hists[evsel->idx].stats);
+
+			hists__multi_resort(hists, &mt->hists[evsel->idx]);
+
+			/* Non-group events are considered as leader */
+			if (symbol_conf.event_group &&
+			    !perf_evsel__is_group_leader(evsel)) {
+				struct hists *leader_hists;
+
+				leader_hists = evsel__hists(evsel->leader);
+				hists__match(leader_hists, hists);
+				hists__link(leader_hists, hists);
+			}
+		}
+
+		err2 = fini_cb(mt, arg);
+		if (!err)
+			err = err2;
+	}
+
+out:
+	events_stats__warn_about_errors(&session->stats, tool);
+
+	if (mt_tools) {
+		for (i = 0; i < file->nr_multi; i++)
+			free(mt_tools[i].hists);
+		free(mt_tools);
+	}
+
+	free(mt_id);
+	return err;
+
+err:
+	while (i-- > 0) {
+		pthread_cancel(mt_id[i]);
+		pthread_join(mt_id[i], NULL);
+	}
+
+	goto out;
+}
+
 bool perf_session__has_traces(struct perf_session *session, const char *msg)
 {
 	struct perf_evsel *evsel;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 8fc067d931cd..1d0750d891ba 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -51,6 +51,10 @@ int perf_session__peek_event(struct perf_session *session, off_t file_offset,
 
 int perf_session__process_events(struct perf_session *session,
 				 struct perf_tool *tool);
+int perf_session__process_events_mt(struct perf_session *session,
+				    struct perf_tool *tool,
+				    mt_tool_callback_t init_cb,
+				    mt_tool_callback_t fini_cb, void *arg);
 
 int perf_session_queue_event(struct perf_session *s, union perf_event *event,
 			     struct perf_tool *tool, struct perf_sample *sample,
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index bb2708bbfaca..222ebd4df6c1 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -2,6 +2,7 @@
 #define __PERF_TOOL_H
 
 #include <stdbool.h>
+#include "util/event.h"
 
 struct perf_session;
 union perf_event;
@@ -10,6 +11,7 @@ struct perf_evsel;
 struct perf_sample;
 struct perf_tool;
 struct machine;
+struct hists;
 
 typedef int (*event_sample)(struct perf_tool *tool, union perf_event *event,
 			    struct perf_sample *sample,
@@ -45,4 +47,16 @@ struct perf_tool {
 	bool		ordering_requires_timestamps;
 };
 
+struct perf_tool_mt {
+	struct perf_tool	tool;
+	struct events_stats	stats;
+	struct hists		*hists;
+	struct perf_session	*session;
+	int			idx;
+
+	void			*priv;
+};
+
+typedef int (*mt_tool_callback_t)(struct perf_tool_mt *, void *);
+
 #endif /* __PERF_TOOL_H */
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 28/37] perf tools: Add missing_threads rb tree
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (26 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 27/37] perf report: Parallelize perf report using multi-thread Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 29/37] perf top: Always creates thread in the current task tree Namhyung Kim
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Sometimes certain meta events like fork/exit are missed, and in that
case the thread cannot be found in the machine's rbtree.  Adding such a
thread to the main tree is dangerous now that samples are processed in
a multi-threaded environment; protecting the tree instead would add the
overhead of grabbing a lock on every search.  So add a separate
missing_threads tree and protect it with a mutex.  It is expected to be
accessed only when a thread is not found in the normal tree.
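A minimal sketch of the lookup order described above, using simplified, hypothetical types (toy_machine, toy_thread) rather than perf's rb-trees. The main table is only read while samples are processed, so it is searched without a lock; only a miss takes the mutex guarding the separate table of missing threads. A sorted array and a linked list stand in for the two rb-trees purely to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct thread. */
struct toy_thread {
	int tid;
	int missing;             /* 1 if created on the fallback path */
	struct toy_thread *next; /* link in the missing list */
};

struct toy_machine {
	struct toy_thread *threads; /* sorted by tid, read-only while samples are processed */
	size_t nr_threads;
	struct toy_thread *missing; /* fallback list, guarded by missing_lock */
	pthread_mutex_t missing_lock;
};

static void toy_machine_init(struct toy_machine *m, struct toy_thread *sorted, size_t n)
{
	assert(n == 0 || sorted != NULL);
	m->threads = sorted;
	m->nr_threads = n;
	m->missing = NULL;
	pthread_mutex_init(&m->missing_lock, NULL);
}

/* Binary search in the main table; safe without a lock since nobody writes it. */
static struct toy_thread *find_main(struct toy_machine *m, int tid)
{
	size_t lo = 0, hi = m->nr_threads;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (m->threads[mid].tid == tid)
			return &m->threads[mid];
		if (m->threads[mid].tid < tid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/* Fast path is lock-free; only a miss takes the mutex. */
static struct toy_thread *findnew_thread(struct toy_machine *m, int tid)
{
	struct toy_thread *t = find_main(m, tid);

	if (t)
		return t;

	pthread_mutex_lock(&m->missing_lock);
	for (t = m->missing; t; t = t->next) {
		if (t->tid == tid)
			goto out;
	}

	/* Not seen before: create it in the fallback table only. */
	t = calloc(1, sizeof(*t));
	if (t) {
		t->tid = tid;
		t->missing = 1;
		t->next = m->missing;
		m->missing = t;
	}
out:
	pthread_mutex_unlock(&m->missing_lock);
	return t;
}
```

The lock-free fast path is only correct if the main table is no longer modified once the worker threads start, which is why the meta events have to be processed before the samples.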

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/thread-lookup-time.c |  8 ++-
 tools/perf/util/build-id.c            |  9 +++-
 tools/perf/util/machine.c             | 91 +++++++++++++++++++++++------------
 tools/perf/util/machine.h             |  2 +
 tools/perf/util/session.c             |  8 +--
 tools/perf/util/thread.h              |  1 +
 6 files changed, 80 insertions(+), 39 deletions(-)

diff --git a/tools/perf/tests/thread-lookup-time.c b/tools/perf/tests/thread-lookup-time.c
index 6237ecf8caae..04cdde9329d6 100644
--- a/tools/perf/tests/thread-lookup-time.c
+++ b/tools/perf/tests/thread-lookup-time.c
@@ -7,7 +7,9 @@
 static int thread__print_cb(struct thread *th, void *arg __maybe_unused)
 {
 	printf("thread: %d, start time: %"PRIu64" %s\n",
-	       th->tid, th->start_time, th->dead ? "(dead)" : "");
+	       th->tid, th->start_time,
+	       th->dead ? "(dead)" : th->exited ? "(exited)" :
+	       th->missing ? "(missing)" : "");
 	return 0;
 }
 
@@ -105,6 +107,8 @@ static int lookup_with_timestamp(struct machine *machine)
 			machine__findnew_thread_time(machine, 0, 0, 70000) == t3);
 
 	machine__delete_threads(machine);
+	machine__delete_dead_threads(machine);
+	machine__delete_missing_threads(machine);
 	return 0;
 }
 
@@ -146,6 +150,8 @@ static int lookup_without_timestamp(struct machine *machine)
 			machine__findnew_thread_time(machine, 0, 0, -1ULL) == t3);
 
 	machine__delete_threads(machine);
+	machine__delete_dead_threads(machine);
+	machine__delete_missing_threads(machine);
 	return 0;
 }
 
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 0c72680a977f..98446d089b08 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -60,7 +60,14 @@ static int perf_event__exit_del_thread(struct perf_tool *tool __maybe_unused,
 		    event->fork.ppid, event->fork.ptid);
 
 	if (thread) {
-		rb_erase(&thread->rb_node, &machine->threads);
+		if (thread->dead)
+			rb_erase(&thread->rb_node, &machine->dead_threads);
+		else if (thread->missing)
+			rb_erase(&thread->rb_node, &machine->missing_threads);
+		else
+			rb_erase(&thread->rb_node, &machine->threads);
+
+		list_del(&thread->node);
 		machine->last_match = NULL;
 		thread__delete(thread);
 	}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ffce0bcd2d9a..c7492d4fde29 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -29,6 +29,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 
 	machine->threads = RB_ROOT;
 	machine->dead_threads = RB_ROOT;
+	machine->missing_threads = RB_ROOT;
 	machine->last_match = NULL;
 
 	machine->vdso_info = NULL;
@@ -89,6 +90,19 @@ static void dsos__delete(struct dsos *dsos)
 	}
 }
 
+void machine__delete_missing_threads(struct machine *machine)
+{
+	struct rb_node *nd = rb_first(&machine->missing_threads);
+
+	while (nd) {
+		struct thread *t = rb_entry(nd, struct thread, rb_node);
+
+		nd = rb_next(nd);
+		rb_erase(&t->rb_node, &machine->missing_threads);
+		thread__delete(t);
+	}
+}
+
 void machine__delete_dead_threads(struct machine *machine)
 {
 	struct rb_node *nd = rb_first(&machine->dead_threads);
@@ -438,11 +452,12 @@ static struct thread *__machine__findnew_thread_time(struct machine *machine,
 						     pid_t pid, pid_t tid,
 						     u64 timestamp, bool create)
 {
-	struct thread *curr, *pos, *new;
+	struct thread *curr, *pos, *new = NULL;
 	struct thread *th = NULL;
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	bool initial = timestamp == (u64)0;
+	static pthread_mutex_t missing_thread_lock = PTHREAD_MUTEX_INITIALIZER;
 
 	curr = __machine__findnew_thread(machine, pid, tid, initial);
 	if (curr && timestamp >= curr->start_time)
@@ -475,44 +490,49 @@ static struct thread *__machine__findnew_thread_time(struct machine *machine,
 			p = &(*p)->rb_right;
 	}
 
-	if (!create)
-		return NULL;
-
-	if (!curr)
-		return __machine__findnew_thread(machine, pid, tid, true);
+	pthread_mutex_lock(&missing_thread_lock);
 
-	new = thread__new(pid, tid);
-	if (new == NULL)
-		return NULL;
+	p = &machine->missing_threads.rb_node;
+	parent = NULL;
 
-	new->start_time = timestamp;
+	while (*p != NULL) {
+		parent = *p;
+		th = rb_entry(parent, struct thread, rb_node);
 
-	if (*p) {
-		list_for_each_entry(pos, &th->node, node) {
-			/* sort by time */
-			if (timestamp >= pos->start_time) {
-				th = pos;
-				break;
-			}
+		if (th->tid == tid) {
+			pthread_mutex_unlock(&missing_thread_lock);
+			return th;
 		}
-		list_add_tail(&new->node, &th->node);
-	} else {
-		rb_link_node(&new->rb_node, parent, p);
-		rb_insert_color(&new->rb_node, &machine->dead_threads);
+
+		if (tid < th->tid)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
 	}
 
+	if (!create)
+		goto out;
+
+	new = thread__new(pid, tid);
+	if (new == NULL)
+		goto out;
+
+	/* missing threads are not bothered with timestamp */
+	new->start_time = 0;
+	new->missing = true;
+
 	/*
-	 * We have to initialize map_groups separately
-	 * after rb tree is updated.
-	 *
-	 * The reason is that we call machine__findnew_thread
-	 * within thread__init_map_groups to find the thread
-	 * leader and that would screwed the rb tree.
+	 * missing threads have their own map groups regardless of
+	 * leader for the sake of simplicity.  it's okay since the map
+	 * groups has no map in it anyway.
 	 */
-	if (thread__init_map_groups(new, machine)) {
-		thread__delete(new);
-		return NULL;
-	}
+	new->mg = map_groups__new(machine);
+
+	rb_link_node(&new->rb_node, parent, p);
+	rb_insert_color(&new->rb_node, &machine->missing_threads);
+
+out:
+	pthread_mutex_unlock(&missing_thread_lock);
 
 	return new;
 }
@@ -1351,6 +1371,7 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
 
 	machine->last_match = NULL;
 	rb_erase(&th->rb_node, &machine->threads);
+	RB_CLEAR_NODE(&th->rb_node);
 
 	th->dead = true;
 
@@ -1822,6 +1843,14 @@ int machine__for_each_thread(struct machine *machine,
 				return rc;
 		}
 	}
+
+	for (nd = rb_first(&machine->missing_threads); nd; nd = rb_next(nd)) {
+		thread = rb_entry(nd, struct thread, rb_node);
+		rc = fn(thread, priv);
+		if (rc != 0)
+			return rc;
+	}
+
 	return rc;
 }
 
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 9571b6b1c5b5..40af1f59e360 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -31,6 +31,7 @@ struct machine {
 	char		  *root_dir;
 	struct rb_root	  threads;
 	struct rb_root	  dead_threads;
+	struct rb_root	  missing_threads;
 	struct thread	  *last_match;
 	struct vdso_info  *vdso_info;
 	struct dsos	  user_dsos;
@@ -116,6 +117,7 @@ void machines__set_comm_exec(struct machines *machines, bool comm_exec);
 struct machine *machine__new_host(void);
 int machine__init(struct machine *machine, const char *root_dir, pid_t pid);
 void machine__exit(struct machine *machine);
+void machine__delete_missing_threads(struct machine *machine);
 void machine__delete_dead_threads(struct machine *machine);
 void machine__delete_threads(struct machine *machine);
 void machine__delete(struct machine *machine);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index c1a17110ec6a..34956983ae8e 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -138,14 +138,11 @@ struct perf_session *perf_session__new(struct perf_data_file *file,
 	return NULL;
 }
 
-static void perf_session__delete_dead_threads(struct perf_session *session)
-{
-	machine__delete_dead_threads(&session->machines.host);
-}
-
 static void perf_session__delete_threads(struct perf_session *session)
 {
 	machine__delete_threads(&session->machines.host);
+	machine__delete_dead_threads(&session->machines.host);
+	machine__delete_missing_threads(&session->machines.host);
 }
 
 static void perf_session_env__delete(struct perf_session_env *env)
@@ -167,7 +164,6 @@ static void perf_session_env__delete(struct perf_session_env *env)
 void perf_session__delete(struct perf_session *session)
 {
 	perf_session__destroy_kernel_maps(session);
-	perf_session__delete_dead_threads(session);
 	perf_session__delete_threads(session);
 	perf_session_env__delete(&session->header.env);
 	machines__exit(&session->machines);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 0b88ca22bc3d..87188ba9465b 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -23,6 +23,7 @@ struct thread {
 	bool			comm_set;
 	bool			exited; /* if set thread has exited */
 	bool			dead; /* thread is in dead_threads list */
+	bool			missing; /* thread is in missing_threads list */
 	struct list_head	comm_list;
 	int			comm_len;
 	u64			db_id;
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 29/37] perf top: Always creates thread in the current task tree.
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (27 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 28/37] perf tools: Add missing_threads rb tree Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 30/37] perf tools: Fix progress ui to support multi thread Namhyung Kim
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When machine__findnew_thread_time() creates a new thread, it puts the
thread in the missing_threads tree, assuming that missing the related
task/mmap events in the recorded header file is a rare case.

However, this assumption does not hold for live profiling, so it should
be treated differently.  This patch fixes a NULL thread->mg dereference
for missing threads in live profiling.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-top.c |  5 ++++-
 tools/perf/util/event.c  | 24 ++++++++++++++++++------
 tools/perf/util/event.h  |  5 +++++
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 818ae35cbd7b..2d75a7fab470 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -699,6 +699,7 @@ static void perf_event__process_sample(struct perf_tool *tool,
 {
 	struct perf_top *top = container_of(tool, struct perf_top, tool);
 	struct addr_location al;
+	struct thread *thread;
 	int err;
 
 	if (!machine && perf_guest) {
@@ -724,7 +725,9 @@ static void perf_event__process_sample(struct perf_tool *tool,
 	if (event->header.misc & PERF_RECORD_MISC_EXACT_IP)
 		top->exact_samples++;
 
-	if (perf_event__preprocess_sample(event, machine, &al, sample) < 0)
+	/* Always use current thread tree for live profiling */
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (__perf_event__preprocess_sample(event, machine, thread, &al, sample) < 0)
 		return;
 
 	if (!top->kptr_restrict_warned &&
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 3bb186a26314..452af8f4b0f3 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -863,14 +863,13 @@ void thread__find_addr_location_time(struct thread *thread, u8 cpumode,
 		al->sym = NULL;
 }
 
-int perf_event__preprocess_sample(const union perf_event *event,
-				  struct machine *machine,
-				  struct addr_location *al,
-				  struct perf_sample *sample)
+int __perf_event__preprocess_sample(const union perf_event *event,
+				    struct machine *machine,
+				    struct thread *thread,
+				    struct addr_location *al,
+				    struct perf_sample *sample)
 {
 	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
-	struct thread *thread = machine__findnew_thread_time(machine, sample->pid,
-							     sample->tid, sample->time);
 
 	if (thread == NULL)
 		return -1;
@@ -928,6 +927,19 @@ int perf_event__preprocess_sample(const union perf_event *event,
 	return 0;
 }
 
+int perf_event__preprocess_sample(const union perf_event *event,
+				  struct machine *machine,
+				  struct addr_location *al,
+				  struct perf_sample *sample)
+{
+	struct thread *thread;
+
+	thread = machine__findnew_thread_time(machine, sample->pid,
+					      sample->tid, sample->time);
+	return __perf_event__preprocess_sample(event, machine, thread,
+					       al, sample);
+}
+
 bool is_bts_event(struct perf_event_attr *attr)
 {
 	return attr->type == PERF_TYPE_HARDWARE &&
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 5f66abfa61ca..1fbd37864241 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -352,6 +352,11 @@ int perf_event__process(struct perf_tool *tool,
 
 struct addr_location;
 
+int __perf_event__preprocess_sample(const union perf_event *event,
+				    struct machine *machine,
+				    struct thread *thread,
+				    struct addr_location *al,
+				    struct perf_sample *sample);
 int perf_event__preprocess_sample(const union perf_event *event,
 				  struct machine *machine,
 				  struct addr_location *al,
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 30/37] perf tools: Fix progress ui to support multi thread
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (28 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 29/37] perf top: Always creates thread in the current task tree Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 31/37] perf record: Show total size of multi file data Namhyung Kim
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Split the ui_progress struct into a global and a local one.  Each
thread updates its local struct without a lock and only updates the
global one (with the lock held) when meaningful progress has been made.

To do that, pass struct ui_progress to __perf_session__process_events()
and initialize it with the total size of the multi-file storage.
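The split can be modelled roughly as follows. This is a simplified sketch with hypothetical names (global_progress, local_progress), not perf's struct ui_progress: each worker accumulates progress privately and takes the shared lock only after a full local step, and the shared counter triggers a redraw (here merely counted) when it crosses its own threshold.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical global progress bar shared by all worker threads. */
struct global_progress {
	uint64_t curr;    /* bytes processed so far by all threads */
	uint64_t total;
	uint64_t step;    /* redraw granularity */
	uint64_t next;    /* redraw when curr reaches this */
	unsigned redraws; /* how often the bar would have been redrawn */
	pthread_mutex_t lock;
};

/* Per-thread progress: updated without any locking. */
struct local_progress {
	uint64_t pending; /* bytes processed but not yet published */
	uint64_t step;    /* publish once this much has accumulated */
	struct global_progress *global;
};

static void global_progress_init(struct global_progress *g, uint64_t total, uint64_t step)
{
	assert(g != NULL);
	g->curr = 0;
	g->total = total;
	g->step = step ? step : 1;
	g->next = g->step;
	g->redraws = 0;
	pthread_mutex_init(&g->lock, NULL);
}

/* Move the locally accumulated bytes into the shared counter. */
static void publish(struct local_progress *l)
{
	struct global_progress *g = l->global;

	pthread_mutex_lock(&g->lock);
	g->curr += l->pending;
	/* Redraw only when the global counter crosses the next threshold. */
	if (g->curr >= g->next) {
		g->redraws++;
		while (g->next <= g->curr)
			g->next += g->step;
	}
	pthread_mutex_unlock(&g->lock);
	l->pending = 0;
}

/* Called for every processed event; takes the lock only every l->step bytes. */
static void local_progress_update(struct local_progress *l, uint64_t size)
{
	l->pending += size;
	if (l->pending >= l->step)
		publish(l);
}

/* Flush whatever is left when a worker finishes its file. */
static void local_progress_finish(struct local_progress *l)
{
	if (l->pending)
		publish(l);
}
```

The lock is contended at most once per local step per thread instead of once per event, which is the point of keeping a private counter in each worker.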

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/data.c    |  22 +++++++++
 tools/perf/util/data.h    |   3 ++
 tools/perf/util/hist.c    |   5 +-
 tools/perf/util/hist.h    |   3 +-
 tools/perf/util/session.c | 117 ++++++++++++++++++++++++++++++++++------------
 tools/perf/util/tool.h    |   3 ++
 6 files changed, 121 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index b6f7cdc4a39f..37f75f231c4d 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -290,3 +290,25 @@ ssize_t perf_data_file__write_multi(struct perf_data_file *file,
 
 	return writen(file->multi_fd[idx], buf, size);
 }
+
+s64 perf_data_file__multi_size(struct perf_data_file *file)
+{
+	int i;
+	s64 total_size = perf_data_file__size(file);
+
+	if (!file->is_multi)
+		return total_size;
+
+	for (i = 0; i < file->nr_multi; i++) {
+		int fd = perf_data_file__multi_fd(file, i);
+		long size;
+
+		size = lseek(fd, 0, SEEK_END);
+		if (size < 0)
+			return (s64) -1;
+
+		total_size += size;
+	}
+
+	return total_size;
+}
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index f5c229166614..0f0013ac9e30 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -2,6 +2,7 @@
 #define __PERF_DATA_H
 
 #include <stdbool.h>
+#include "perf.h"
 
 enum perf_data_mode {
 	PERF_DATA_MODE_WRITE,
@@ -61,4 +62,6 @@ int perf_data_file__prepare_write(struct perf_data_file *file, int nr);
 ssize_t perf_data_file__write_multi(struct perf_data_file *file,
 				    void *buf, size_t size, int idx);
 
+s64 perf_data_file__multi_size(struct perf_data_file *file);
+
 #endif /* __PERF_DATA_H */
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index f3b39b45f2ec..60b55a92f23e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1065,12 +1065,13 @@ void hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 	__hists__collapse_resort(hists, root, prog);
 }
 
-void hists__multi_resort(struct hists *dst, struct hists *src)
+void hists__multi_resort(struct hists *dst, struct hists *src,
+			 struct ui_progress *prog)
 {
 	struct rb_root *root = src->entries_in;
 
 	sort__need_collapse = 1;
-	__hists__collapse_resort(dst, root, NULL);
+	__hists__collapse_resort(dst, root, prog);
 }
 
 static int hist_entry__sort(struct hist_entry *a, struct hist_entry *b)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4d975f5501ed..e2abc4c75158 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -124,7 +124,8 @@ int hist_entry__sort_snprintf(struct hist_entry *he, char *bf, size_t size,
 void hist_entry__free(struct hist_entry *);
 
 void hists__output_resort(struct hists *hists, struct ui_progress *prog);
-void hists__multi_resort(struct hists *dst, struct hists *src);
+void hists__multi_resort(struct hists *dst, struct hists *src,
+			 struct ui_progress *prog);
 void hists__collapse_resort(struct hists *hists, struct ui_progress *prog);
 
 void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 34956983ae8e..6d1dfbc650ba 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1258,14 +1258,14 @@ fetch_mmaped_event(struct perf_session *session,
 static int __perf_session__process_events(struct perf_session *session,
 					  struct events_stats *stats, int fd,
 					  u64 data_offset, u64 data_size,
-					  u64 file_size, struct perf_tool *tool)
+					  u64 file_size, struct perf_tool *tool,
+					  struct ui_progress *prog)
 {
 	u64 head, page_offset, file_offset, file_pos, size;
 	int err, mmap_prot, mmap_flags, map_idx = 0;
 	size_t	mmap_size;
 	char *buf, *mmaps[NUM_MMAPS];
 	union perf_event *event;
-	struct ui_progress prog;
 	s64 skip;
 
 	perf_tool__fill_defaults(tool);
@@ -1277,8 +1277,6 @@ static int __perf_session__process_events(struct perf_session *session,
 	if (data_size && (data_offset + data_size < file_size))
 		file_size = data_offset + data_size;
 
-	ui_progress__init(&prog, file_size, "Processing events...");
-
 	mmap_size = MMAP_SIZE;
 	if (mmap_size > file_size) {
 		mmap_size = file_size;
@@ -1344,7 +1342,7 @@ static int __perf_session__process_events(struct perf_session *session,
 	head += size;
 	file_pos += size;
 
-	ui_progress__update(&prog, size);
+	ui_progress__update(prog, size);
 
 	if (session_done())
 		goto out;
@@ -1356,7 +1354,6 @@ static int __perf_session__process_events(struct perf_session *session,
 	/* do the final flush for ordered samples */
 	err = ordered_events__flush(session, tool, OE_FLUSH__FINAL);
 out_err:
-	ui_progress__finish();
 	ordered_events__free(&session->ordered_events);
 	session->one_mmap = false;
 	return err;
@@ -1365,25 +1362,32 @@ static int __perf_session__process_events(struct perf_session *session,
 int perf_session__process_events(struct perf_session *session,
 				 struct perf_tool *tool)
 {
-	u64 size = perf_data_file__size(session->file);
+	u64 size;
+	struct ui_progress prog;
+	struct perf_data_file *file = session->file;
 	int err, i;
 
 	if (perf_session__register_idle_thread(session) == NULL)
 		return -ENOMEM;
 
-	if (perf_data_file__is_pipe(session->file))
+	if (perf_data_file__is_pipe(file))
 		return __perf_session__process_pipe_events(session, tool);
 
+	size = perf_data_file__multi_size(file);
+	if ((s64)size < 0)
+		return -EINVAL;
+
+	ui_progress__init(&prog, size, "Processing events...");
+
 	err = __perf_session__process_events(session, &session->stats,
-					     perf_data_file__fd(session->file),
+					     perf_data_file__fd(file),
 					     session->header.data_offset,
 					     session->header.data_size,
-					     size, tool);
+					     perf_data_file__size(file),
+					     tool, &prog);
 
-	if (!session->file->is_multi || err) {
-		events_stats__warn_about_errors(&session->stats, tool);
-		return err;
-	}
+	if (!file->is_multi || err)
+		goto out;
 
 	/*
 	 * For multi-file data storage, events are processed for each
@@ -1391,23 +1395,49 @@ int perf_session__process_events(struct perf_session *session,
 	 */
 	tool->ordered_events = false;
 
-	for (i = 0; i < session->file->nr_multi; i++) {
-		int fd = perf_data_file__multi_fd(session->file, i);
+	for (i = 0; i < file->nr_multi; i++) {
+		int fd = perf_data_file__multi_fd(file, i);
 
 		size = lseek(fd, 0, SEEK_END);
 		if (size == 0)
 			continue;
 
 		err = __perf_session__process_events(session, &session->stats,
-						     fd, 0, size, size, tool);
+						     fd, 0, size, size, tool,
+						     &prog);
 		if (err < 0)
 			break;
 	}
 
+out:
+	ui_progress__finish();
 	events_stats__warn_about_errors(&session->stats, tool);
 	return err;
 }
 
+struct ui_progress_ops *orig_progress__ops;
+
+static void mt_progress__update(struct ui_progress *p)
+{
+	struct perf_tool_mt *mt_tool = container_of(p, struct perf_tool_mt, prog);
+	struct ui_progress *gprog = mt_tool->global_prog;
+	static pthread_mutex_t prog_lock = PTHREAD_MUTEX_INITIALIZER;
+
+	pthread_mutex_lock(&prog_lock);
+
+	gprog->curr += p->step;
+	if (gprog->curr >= gprog->next) {
+		gprog->next += gprog->step;
+		orig_progress__ops->update(gprog);
+	}
+
+	pthread_mutex_unlock(&prog_lock);
+}
+
+static struct ui_progress_ops mt_progress__ops = {
+	.update = mt_progress__update,
+};
+
 static void *processing_thread(void *arg)
 {
 	struct perf_tool_mt *mt_tool = arg;
@@ -1418,9 +1448,12 @@ static void *processing_thread(void *arg)
 	if (size == 0)
 		return arg;
 
+	ui_progress__init(&mt_tool->prog, size, "");
+
 	pr_debug("processing samples using thread [%d]\n", mt_tool->idx);
 	if (__perf_session__process_events(mt_tool->session, &mt_tool->stats,
-					   fd, 0, size, size, &mt_tool->tool) < 0) {
+					   fd, 0, size, size, &mt_tool->tool,
+					   &mt_tool->prog) < 0) {
 		pr_err("processing samples failed (thread [%d)\n", mt_tool->idx);
 		free(mt_tool->hists);
 		free(mt_tool);
@@ -1438,9 +1471,11 @@ int perf_session__process_events_mt(struct perf_session *session,
 {
 	struct perf_data_file *file = session->file;
 	struct perf_evlist *evlist = session->evlist;
-	u64 size = perf_data_file__size(file);
+	struct perf_evsel *evsel;
+	u64 size, nr_entries = 0;
 	struct perf_tool_mt *mt_tools = NULL;
 	struct perf_tool_mt *mt;
+	struct ui_progress prog;
 	pthread_t *mt_id;
 	int err, i, k;
 
@@ -1452,14 +1487,25 @@ int perf_session__process_events_mt(struct perf_session *session,
 		return -EINVAL;
 	}
 
+	size = perf_data_file__multi_size(file);
+	if ((s64)size < 0)
+		return -EINVAL;
+
+	ui_progress__init(&prog, size, "Processing events...");
+
 	err = __perf_session__process_events(session, &session->stats,
 					     perf_data_file__fd(file),
 					     session->header.data_offset,
 					     session->header.data_size,
-					     size, tool);
+					     perf_data_file__size(file),
+					     tool, &prog);
 	if (err)
 		return err;
 
+	orig_progress__ops = ui_progress__ops;
+	ui_progress__ops = &mt_progress__ops;
+	ui_progress__ops->finish = orig_progress__ops->finish;
+
 	mt_id = calloc(file->nr_multi, sizeof(*mt_id));
 	if (mt_id == NULL)
 		goto out;
@@ -1485,6 +1531,7 @@ int perf_session__process_events_mt(struct perf_session *session,
 		mt->session = session;
 		mt->tool.ordered_events = false;
 		mt->idx = i;
+		mt->global_prog = &prog;
 
 		err = init_cb(mt, arg);
 		if (err < 0)
@@ -1494,9 +1541,6 @@ int perf_session__process_events_mt(struct perf_session *session,
 	}
 
 	for (i = 0; i < file->nr_multi; i++) {
-		struct perf_evsel *evsel;
-		int err2;
-
 		pthread_join(mt_id[i], (void **)&mt);
 		if (mt == NULL) {
 			err = -EINVAL;
@@ -1506,16 +1550,30 @@ int perf_session__process_events_mt(struct perf_session *session,
 		events_stats__add(&session->stats, &mt->stats);
 
 		evlist__for_each(evlist, evsel) {
-			struct hists *hists;
+			struct hists *hists = evsel__hists(evsel);
 
-			if (perf_evsel__is_dummy_tracking(evsel))
-				continue;
-
-			hists = evsel__hists(evsel);
 			events_stats__add(&hists->stats,
 					  &mt->hists[evsel->idx].stats);
 
-			hists__multi_resort(hists, &mt->hists[evsel->idx]);
+			nr_entries += mt->hists[evsel->idx].nr_entries;
+		}
+	}
+
+	ui_progress__ops = orig_progress__ops;
+	ui_progress__init(&prog, nr_entries, "Merging related events...");
+
+	for (i = 0; i < file->nr_multi; i++) {
+		int err2;
+
+		mt = &mt_tools[i];
+
+		evlist__for_each(evlist, evsel) {
+			struct hists *hists = evsel__hists(evsel);
+
+			if (perf_evsel__is_dummy_tracking(evsel))
+				continue;
+
+			hists__multi_resort(hists, &mt->hists[evsel->idx], &prog);
 
 			/* Non-group events are considered as leader */
 			if (symbol_conf.event_group &&
@@ -1534,6 +1592,7 @@ int perf_session__process_events_mt(struct perf_session *session,
 	}
 
 out:
+	ui_progress__finish();
 	events_stats__warn_about_errors(&session->stats, tool);
 
 	if (mt_tools) {
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 222ebd4df6c1..bd273c85e3cb 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -3,6 +3,7 @@
 
 #include <stdbool.h>
 #include "util/event.h"
+#include "ui/progress.h"
 
 struct perf_session;
 union perf_event;
@@ -52,6 +53,8 @@ struct perf_tool_mt {
 	struct events_stats	stats;
 	struct hists		*hists;
 	struct perf_session	*session;
+	struct ui_progress	prog;
+	struct ui_progress	*global_prog;
 	int			idx;
 
 	void			*priv;
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 31/37] perf record: Show total size of multi file data
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (29 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 30/37] perf tools: Fix progress ui to support multi thread Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 32/37] perf report: Add --multi-thread option and config item Namhyung Kim
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Currently perf record shows only the size of the header file.  Extend
it to show the total size of the multi-file data storage.
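The reported total is simply the header file plus every per-thread data file. A minimal sketch under that assumption, using stat() on paths where the patch uses lseek(fd, 0, SEEK_END) on already-open descriptors; total_data_size() and print_summary() are hypothetical helpers, not perf functions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Sum the sizes of the header file and every per-thread data file.
 * Returns -1 if any of the files cannot be stat'ed.
 */
static int64_t total_data_size(const char *header, const char *const *parts, int nr_parts)
{
	struct stat st;
	int64_t total;
	int i;

	assert(parts != NULL || nr_parts == 0);

	if (stat(header, &st) < 0)
		return -1;
	total = st.st_size;

	for (i = 0; i < nr_parts; i++) {
		if (stat(parts[i], &st) < 0)
			return -1;
		total += st.st_size;
	}
	return total;
}

/* Print the total in MB, in the spirit of the perf record summary line. */
static void print_summary(const char *path, int64_t bytes)
{
	fprintf(stderr, "[ wrote %.3f MB %s ]\n", (double)bytes / 1024.0 / 1024.0, path);
}
```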

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-record.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7f7a4725d080..eb485f1ee66e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -538,9 +538,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		 */
 		fprintf(stderr,
 			"[ perf record: Captured and wrote %.3f MB %s (~%" PRIu64 " samples) ]\n",
-			(double)rec->bytes_written / 1024.0 / 1024.0,
+			(double)(u64)perf_data_file__multi_size(file) / 1024.0 / 1024.0,
 			file->path,
-			rec->bytes_written / 24);
+			perf_data_file__multi_size(file) / 24);
 	}
 
 out_child:
-- 
2.1.3


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH 32/37] perf report: Add --multi-thread option and config item
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (30 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 31/37] perf record: Show total size of multi file data Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 33/37] perf tools: Add front cache for dso data access Namhyung Kim
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

The --multi-thread option enables parallel processing explicitly, so
users can still force serial processing even for multi-file data.  It
defaults to false, but users can change this by setting the
"report.multi-thread" config option in the ~/.perfconfig file.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-report.txt |  3 +++
 tools/perf/builtin-report.c              | 18 +++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index dd7cccdde498..e00077a658c1 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -318,6 +318,9 @@ OPTIONS
 --header-only::
 	Show only perf.data header (forces --stdio).
 
+--multi-thread::
+	Speed up report by processing samples in parallel using multiple threads.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-annotate[1]
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 796db514db31..6e260eaf3b1a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -51,6 +51,7 @@ struct report {
 	bool			mem_mode;
 	bool			header;
 	bool			header_only;
+	bool			multi_thread;
 	int			max_stack;
 	struct perf_read_values	show_threads_values;
 	const char		*pretty_printing_style;
@@ -82,6 +83,10 @@ static int report__config(const char *var, const char *value, void *cb)
 		rep->queue_size = perf_config_u64(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "report.multi-thread")) {
+		rep->multi_thread = perf_config_bool(var, value);
+		return 0;
+	}
 
 	return perf_default_config(var, value, cb);
 }
@@ -527,7 +532,7 @@ static int __cmd_report(struct report *rep)
 	if (ret)
 		return ret;
 
-	if (file->is_multi) {
+	if (rep->multi_thread) {
 		rep->tool.sample = process_sample_event_multi;
 		ret = perf_session__process_events_mt(session, &rep->tool,
 						      multi_report_init,
@@ -558,10 +563,10 @@ static int __cmd_report(struct report *rep)
 	}
 
 	/*
-	 * For multi-file report, it already calls hists__multi_resort()
+	 * For multi-thread report, it already calls hists__multi_resort()
 	 * so no need to collapse here.
 	 */
-	if (!file->is_multi)
+	if (!rep->multi_thread)
 		report__collapse_hists(rep);
 
 	if (session_done())
@@ -770,6 +775,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		     "Don't show entries under that percent", parse_percent_limit),
 	OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
 		     "how to display percentage of filtered entries", parse_filter_percentage),
+	OPT_BOOLEAN(0, "multi-thread", &report.multi_thread,
+		    "Speed up sample processing using multiple threads"),
 	OPT_END()
 	};
 	struct perf_data_file file = {
@@ -814,6 +821,11 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 					       report.queue_size);
 	}
 
+	if (report.multi_thread && !file.is_multi) {
+		pr_debug("fallback to single thread for single data file.\n");
+		report.multi_thread = false;
+	}
+
 	report.session = session;
 
 	has_br_stack = perf_header__has_feat(&session->header,
-- 
2.1.3
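The precedence patch 32 sets up (config file first, then the command line, then the single-file fallback in cmd_report()) can be sketched as follows, assuming the config file is read before the options are parsed so that report__config() only supplies a default. struct report_opts, apply_config() and resolve_multi_thread() are hypothetical simplifications; the real code goes through perf_config_bool() and OPT_BOOLEAN().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the relevant field of struct report. */
struct report_opts {
	bool multi_thread;
};

/*
 * Step 1: the config file sets the default.  Only "true" is treated as
 * true here; the real code uses perf_config_bool(), which accepts more
 * spellings.
 */
static void apply_config(struct report_opts *opts, const char *var,
			 const char *value)
{
	if (strcmp(var, "report.multi-thread") == 0)
		opts->multi_thread = strcmp(value, "true") == 0;
}

/*
 * Step 2: an explicit command-line value overrides the config value.
 * Step 3: fall back to serial processing for a single data file.
 */
static bool resolve_multi_thread(struct report_opts *opts, bool cmdline_set,
				 bool cmdline_val, bool is_multi_file)
{
	assert(opts != NULL);
	if (cmdline_set)
		opts->multi_thread = cmdline_val;
	if (opts->multi_thread && !is_multi_file)
		opts->multi_thread = false;
	return opts->multi_thread;
}
```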



* [PATCH 33/37] perf tools: Add front cache for dso data access
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (31 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 32/37] perf report: Add --multi-thread option and config item Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 34/37] perf tools: Convert lseek + read to pread Namhyung Kim
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

There is high contention on dso->lock in dso_cache__find() when DWARF
unwinding is done with libunwind.  Add an array of the last accessed
dso_cache pointers for lockless lookup.  It falls back to the normal
tree search when the lookup misses the last-cache array.

The size 16 is arbitrary; it worked best in my setup.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/dso.c | 28 ++++++++++++++++++++++++++--
 tools/perf/util/dso.h |  2 ++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 6c1f5619f423..d8ee1fd826e7 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -466,12 +466,31 @@ dso_cache__free(struct dso *dso)
 	pthread_mutex_unlock(&dso->lock);
 }
 
+static void update_last_cache(struct dso *dso, struct dso_cache *cache)
+{
+	int i;
+
+	for (i = DSO_LAST_CACHE_NR - 1; i > 0; i--)
+		dso->data.last[i] = dso->data.last[i-1];
+
+	dso->data.last[0] = cache;
+}
+
 static struct dso_cache *dso_cache__find(struct dso *dso, u64 offset)
 {
 	const struct rb_root *root = &dso->data.cache;
 	struct rb_node * const *p = &root->rb_node;
 	const struct rb_node *parent = NULL;
 	struct dso_cache *cache;
+	int i;
+
+	for (i = 0; i < DSO_LAST_CACHE_NR; i++) {
+		cache = dso->data.last[i];
+
+		if (cache && cache->offset <= offset &&
+		    offset < cache->offset + DSO__DATA_CACHE_SIZE)
+			return cache;
+	}
 
 	pthread_mutex_lock(&dso->lock);
 	while (*p != NULL) {
@@ -485,8 +504,10 @@ static struct dso_cache *dso_cache__find(struct dso *dso, u64 offset)
 			p = &(*p)->rb_left;
 		else if (offset >= end)
 			p = &(*p)->rb_right;
-		else
+		else {
+			update_last_cache(dso, cache);
 			goto out;
+		}
 	}
 	cache = NULL;
 out:
@@ -515,13 +536,16 @@ dso_cache__insert(struct dso *dso, struct dso_cache *new)
 			p = &(*p)->rb_left;
 		else if (offset >= end)
 			p = &(*p)->rb_right;
-		else
+		else {
+			update_last_cache(dso, cache);
 			goto out;
+		}
 	}
 
 	rb_link_node(&new->rb_node, parent, p);
 	rb_insert_color(&new->rb_node, root);
 
+	update_last_cache(dso, new);
 	cache = NULL;
 out:
 	pthread_mutex_unlock(&dso->lock);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index c18fcc0e8081..28e8bb320495 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -136,6 +136,8 @@ struct dso {
 	/* dso data file */
 	struct {
 		struct rb_root	 cache;
+#define DSO_LAST_CACHE_NR  16
+		struct dso_cache *last[DSO_LAST_CACHE_NR];
 		int		 fd;
 		int		 status;
 		u32		 status_seen;
-- 
2.1.3
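The front cache in patch 33 is a small array of recently used entries that is checked before the rbtree. A single-threaded sketch of that idea follows. struct chunk, struct front_cache, LAST_NR and CHUNK_SIZE are hypothetical stand-ins for struct dso_cache, dso->data.last[], DSO_LAST_CACHE_NR and DSO__DATA_CACHE_SIZE, and the sketch does not model the lockless concurrent access the patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAST_NR    4	/* the patch uses 16 (DSO_LAST_CACHE_NR) */
#define CHUNK_SIZE 4096	/* stand-in for DSO__DATA_CACHE_SIZE */

/* Simplified stand-in for struct dso_cache: one cached chunk of file data. */
struct chunk {
	uint64_t offset;	/* start of the chunk in the file */
};

struct front_cache {
	struct chunk *last[LAST_NR];	/* most recently inserted first */
};

/* Shift every entry down by one and put the newest chunk in front. */
static void front_update(struct front_cache *fc, struct chunk *c)
{
	int i;

	assert(c != NULL);
	for (i = LAST_NR - 1; i > 0; i--)
		fc->last[i] = fc->last[i - 1];
	fc->last[0] = c;
}

/* Linear scan; returns NULL on a miss so the caller falls back to the tree. */
static struct chunk *front_find(const struct front_cache *fc, uint64_t offset)
{
	int i;

	for (i = 0; i < LAST_NR; i++) {
		struct chunk *c = fc->last[i];

		if (c && c->offset <= offset && offset < c->offset + CHUNK_SIZE)
			return c;
	}
	return NULL;
}
```

As in the patch, a hit in the front array returns without reordering it, so the order reflects when chunks were pushed to the front rather than strict recency of use.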



* [PATCH 34/37] perf tools: Convert lseek + read to pread
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (32 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 33/37] perf tools: Add front cache for dso data access Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 35/37] perf callchain: Save eh/debug frame offset for dwarf unwind Namhyung Kim
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

dso_cache__read() reads data at the given offset using lseek followed
by a normal read syscall.  These can be combined into a single pread
syscall, which also leaves the file position untouched.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/dso.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index d8ee1fd826e7..95c8d5a2b934 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -593,10 +593,7 @@ dso_cache__read(struct dso *dso, struct machine *machine,
 		}
 	}
 
-	if (-1 == lseek(dso->data.fd, cache_offset, SEEK_SET))
-		goto err_unlock;
-
-	ret = read(dso->data.fd, cache->data, DSO__DATA_CACHE_SIZE);
+	ret = pread(dso->data.fd, cache->data, DSO__DATA_CACHE_SIZE, cache_offset);
 	if (ret <= 0)
 		goto err_unlock;
 
-- 
2.1.3
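Patch 34 replaces lseek() + read() with pread(). The code below is a hypothetical demonstration rather than perf code: it reads at an offset in a temporary file and checks that the file position is unchanged afterwards, which is what makes pread() safe on a file descriptor shared between threads.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes at offset without moving the fd's file position. */
static ssize_t read_at(int fd, void *buf, size_t len, off_t offset)
{
	return pread(fd, buf, len, offset);
}

/*
 * Demo: write "hello world" to a temporary file, read 5 bytes at
 * offset 6 into out and check the file position did not move.
 * Returns 0 on success, -1 on failure.
 */
static int pread_demo(char out[6])
{
	FILE *fp = tmpfile();
	int fd, ret = -1;

	assert(out != NULL);
	if (fp == NULL)
		return -1;
	fd = fileno(fp);
	if (write(fd, "hello world", 11) != 11)
		goto out;
	memset(out, 0, 6);
	if (read_at(fd, out, 5, 6) != 5)
		goto out;
	/* still right after the write: pread() did not change the position */
	if (lseek(fd, 0, SEEK_CUR) != 11)
		goto out;
	ret = 0;
out:
	fclose(fp);
	return ret;
}
```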



* [PATCH 35/37] perf callchain: Save eh/debug frame offset for dwarf unwind
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (33 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 34/37] perf tools: Convert lseek + read to pread Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 36/37] perf tools: Add new perf data command Namhyung Kim
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

When libunwind resolves callchains, it needs to know the offset of
.eh_frame_hdr or .debug_frame to access the dso.  Since finding it calls
dso__data_fd(), it grabs dso->lock every time just to get the same
information.  So save the offset in the dso->data struct and reuse it.

Note that there's a window between dso__data_fd() and the actual use of
the fd: other threads could close the fd to stay within the open file
limit in the dso cache code.  I think that's ok, since in that case
elf_section_offset() returns 0 and the lookup is retried on the next
access.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/dso.h              |  1 +
 tools/perf/util/unwind-libunwind.c | 31 ++++++++++++++++++++-----------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 28e8bb320495..5cf8dfe04eac 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -143,6 +143,7 @@ struct dso {
 		u32		 status_seen;
 		size_t		 file_size;
 		struct list_head open_entry;
+		u64		 frame_offset;
 	} data;
 
 	union { /* Tool specific area */
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 94929ec491f3..3cb78c36faad 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -244,14 +244,17 @@ static int read_unwind_spec_eh_frame(struct dso *dso, struct machine *machine,
 				     u64 *fde_count)
 {
 	int ret = -EINVAL, fd;
-	u64 offset;
+	u64 offset = dso->data.frame_offset;
 
-	fd = dso__data_fd(dso, machine);
-	if (fd < 0)
-		return -EINVAL;
+	if (offset == 0) {
+		fd = dso__data_fd(dso, machine);
+		if (fd < 0)
+			return -EINVAL;
 
-	/* Check the .eh_frame section for unwinding info */
-	offset = elf_section_offset(fd, ".eh_frame_hdr");
+		/* Check the .eh_frame section for unwinding info */
+		offset = elf_section_offset(fd, ".eh_frame_hdr");
+		dso->data.frame_offset = offset;
+	}
 
 	if (offset)
 		ret = unwind_spec_ehframe(dso, machine, offset,
@@ -265,14 +268,20 @@ static int read_unwind_spec_eh_frame(struct dso *dso, struct machine *machine,
 static int read_unwind_spec_debug_frame(struct dso *dso,
 					struct machine *machine, u64 *offset)
 {
-	int fd = dso__data_fd(dso, machine);
+	int fd;
+	u64 ofs = dso->data.frame_offset;
 
-	if (fd < 0)
-		return -EINVAL;
+	if (ofs == 0) {
+		fd = dso__data_fd(dso, machine);
+		if (fd < 0)
+			return -EINVAL;
 
-	/* Check the .debug_frame section for unwinding info */
-	*offset = elf_section_offset(fd, ".debug_frame");
+		/* Check the .debug_frame section for unwinding info */
+		ofs = elf_section_offset(fd, ".debug_frame");
+		dso->data.frame_offset = ofs;
+	}
 
+	*offset = ofs;
 	if (*offset)
 		return 0;
 
-- 
2.1.3



* [PATCH 36/37] perf tools: Add new perf data command
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (34 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 35/37] perf callchain: Save eh/debug frame offset for dwarf unwind Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24  7:15 ` [PATCH 37/37] perf data: Implement 'split' subcommand Namhyung Kim
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker,
	Jiri Olsa, Sebastian Andrzej Siewior, Jiri Olsa

From: Jiri Olsa <namhyung@kernel.org>

Add a new 'perf data' command to provide operations on
data files.

The 'perf data convert' subcommand comes in a following
patch, but there is room for other useful commands
like 'perf data ls' (to display the perf data files in a
directory in ls style).

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-data.txt | 15 +++++++
 tools/perf/Makefile.perf               |  1 +
 tools/perf/builtin-data.c              | 75 ++++++++++++++++++++++++++++++++++
 tools/perf/builtin.h                   |  1 +
 tools/perf/command-list.txt            |  1 +
 tools/perf/perf.c                      |  1 +
 6 files changed, 94 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-data.txt
 create mode 100644 tools/perf/builtin-data.c

diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt
new file mode 100644
index 000000000000..b8c83947715c
--- /dev/null
+++ b/tools/perf/Documentation/perf-data.txt
@@ -0,0 +1,15 @@
+perf-data(1)
+==============
+
+NAME
+----
+perf-data - Data file related processing
+
+SYNOPSIS
+--------
+[verse]
+'perf data' [<common options>] <command> [<options>]
+
+DESCRIPTION
+-----------
+Data file related processing.
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 47d933454492..f22085cb0b24 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -485,6 +485,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
 BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
 BUILTIN_OBJS += $(OUTPUT)tests/builtin-test.o
 BUILTIN_OBJS += $(OUTPUT)builtin-mem.o
+BUILTIN_OBJS += $(OUTPUT)builtin-data.o
 
 PERFLIBS = $(LIB_FILE) $(LIBAPIKFS) $(LIBTRACEEVENT)
 
diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
new file mode 100644
index 000000000000..1eee97d020fa
--- /dev/null
+++ b/tools/perf/builtin-data.c
@@ -0,0 +1,75 @@
+#include <linux/compiler.h>
+#include "builtin.h"
+#include "perf.h"
+#include "debug.h"
+#include "parse-options.h"
+
+typedef int (*data_cmd_fn_t)(int argc, const char **argv, const char *prefix);
+
+struct data_cmd {
+	const char	*name;
+	const char	*summary;
+	data_cmd_fn_t	fn;
+};
+
+static struct data_cmd data_cmds[];
+
+#define for_each_cmd(cmd) \
+	for (cmd = data_cmds; cmd && cmd->name; cmd++)
+
+static const struct option data_options[] = {
+	OPT_END()
+};
+
+static const char * const data_usage[] = {
+	"perf data [<common options>] <command> [<options>]",
+	NULL
+};
+
+static void print_usage(void)
+{
+	struct data_cmd *cmd;
+
+	printf("Usage:\n");
+	printf("\t%s\n\n", data_usage[0]);
+	printf("\tAvailable commands:\n");
+
+	for_each_cmd(cmd) {
+		printf("\t %s\t- %s\n", cmd->name, cmd->summary);
+	}
+
+	printf("\n");
+}
+
+static struct data_cmd data_cmds[] = {
+	{ NULL },
+};
+
+int cmd_data(int argc, const char **argv, const char *prefix)
+{
+	struct data_cmd *cmd;
+	const char *cmdstr;
+
+	/* No command specified. */
+	if (argc < 2)
+		goto usage;
+
+	argc = parse_options(argc, argv, data_options, data_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	if (argc < 1)
+		goto usage;
+
+	cmdstr = argv[0];
+
+	for_each_cmd(cmd) {
+		if (strcmp(cmd->name, cmdstr))
+			continue;
+
+		return cmd->fn(argc, argv, prefix);
+	}
+
+	pr_err("Unknown command: %s\n", cmdstr);
+usage:
+	print_usage();
+	return -1;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index b210d62907e4..3688ad29085f 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -37,6 +37,7 @@ extern int cmd_test(int argc, const char **argv, const char *prefix);
 extern int cmd_trace(int argc, const char **argv, const char *prefix);
 extern int cmd_inject(int argc, const char **argv, const char *prefix);
 extern int cmd_mem(int argc, const char **argv, const char *prefix);
+extern int cmd_data(int argc, const char **argv, const char *prefix);
 
 extern int find_scripts(char **scripts_array, char **scripts_path_array);
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 0906fc401c52..00fcaf8a5b8d 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -7,6 +7,7 @@ perf-archive			mainporcelain common
 perf-bench			mainporcelain common
 perf-buildid-cache		mainporcelain common
 perf-buildid-list		mainporcelain common
+perf-data			mainporcelain common
 perf-diff			mainporcelain common
 perf-evlist			mainporcelain common
 perf-inject			mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 3700a7faca6c..f3c66b81c6be 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -62,6 +62,7 @@ static struct cmd_struct commands[] = {
 #endif
 	{ "inject",	cmd_inject,	0 },
 	{ "mem",	cmd_mem,	0 },
+	{ "data",	cmd_data,	0 },
 };
 
 struct pager_config {
-- 
2.1.3
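cmd_data() in patch 36 dispatches on argv[0] through a NULL-terminated table. A self-contained sketch of the same pattern follows. cmd_hello() and dispatch() are hypothetical, and perf's parse_options() step is omitted: it is assumed to have already left the subcommand name in argv[0].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmd_fn_t)(int argc, const char **argv);

struct sub_cmd {
	const char *name;
	cmd_fn_t    fn;
};

/* Hypothetical subcommand used only for this sketch. */
static int cmd_hello(int argc, const char **argv)
{
	(void)argv;
	return argc;	/* returns the argument count it was given */
}

/* Table terminated by a NULL name, like data_cmds[] in the patch. */
static const struct sub_cmd cmds[] = {
	{ "hello", cmd_hello },
	{ NULL, NULL },
};

/* Returns the subcommand's result, or -1 for a missing/unknown command. */
static int dispatch(int argc, const char **argv)
{
	const struct sub_cmd *cmd;

	assert(argc == 0 || argv != NULL);
	if (argc < 1)
		return -1;
	for (cmd = cmds; cmd->name; cmd++) {
		if (strcmp(cmd->name, argv[0]) == 0)
			return cmd->fn(argc, argv);
	}
	return -1;
}
```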



* [PATCH 37/37] perf data: Implement 'split' subcommand
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (35 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 36/37] perf tools: Add new perf data command Namhyung Kim
@ 2014-12-24  7:15 ` Namhyung Kim
  2014-12-24 13:51   ` Arnaldo Carvalho de Melo
  2014-12-26 13:59   ` Jiri Olsa
  2014-12-26 14:02 ` [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Jiri Olsa
  2015-01-05 18:48 ` Andi Kleen
  38 siblings, 2 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24  7:15 UTC
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

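The perf data split command splits a (large) single data file into
multiple files under a directory (perf.data.dir by default) so that it
can be processed and reported using multiple threads.  Samples go to
per-CPU files if PERF_SAMPLE_CPU is set and to per-thread files
otherwise, while meta events (fork, comm, exit, mmap, etc.) are written
to a separate perf.header file.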

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-data.txt |  28 +++++
 tools/perf/builtin-data.c              | 223 +++++++++++++++++++++++++++++++++
 2 files changed, 251 insertions(+)

diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt
index b8c83947715c..42708702f10c 100644
--- a/tools/perf/Documentation/perf-data.txt
+++ b/tools/perf/Documentation/perf-data.txt
@@ -13,3 +13,31 @@ SYNOPSIS
 DESCRIPTION
 -----------
 Data file related processing.
+
+COMMANDS
+--------
+split::
+	Split a single data file (perf.data) into multiple files under a directory
+	so that it can be reported using multiple threads.
+
+OPTIONS for 'split'
+---------------------
+-i::
+--input::
+	Specify input perf data file path.
+
+-o::
+--output::
+	Specify output perf data directory path (default: <input>.dir).
+
+-v::
+--verbose::
+	Be more verbose (show counter open errors, etc).
+
+-f::
+--force::
+	Don't complain, do it.
+
+SEE ALSO
+--------
+linkperf:perf[1], linkperf:perf-report[1]
diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
index 1eee97d020fa..5f3173826850 100644
--- a/tools/perf/builtin-data.c
+++ b/tools/perf/builtin-data.c
@@ -2,10 +2,14 @@
 #include "builtin.h"
 #include "perf.h"
 #include "debug.h"
+#include "session.h"
+#include "evlist.h"
 #include "parse-options.h"
 
 typedef int (*data_cmd_fn_t)(int argc, const char **argv, const char *prefix);
 
+static const char *output_name;
+
 struct data_cmd {
 	const char	*name;
 	const char	*summary;
@@ -41,10 +45,229 @@ static void print_usage(void)
 	printf("\n");
 }
 
+static int data_cmd_split(int argc, const char **argv, const char *prefix);
+
 static struct data_cmd data_cmds[] = {
+	{ "split", "split single data file into multi-file", data_cmd_split },
 	{ NULL },
 };
 
+#define FD_HASH_BITS  7
+#define FD_HASH_SIZE  (1 << FD_HASH_BITS)
+#define FD_HASH_MASK  (FD_HASH_SIZE - 1)
+
+struct data_split {
+	struct perf_tool	tool;
+	struct perf_session	*session;
+	enum {
+		PER_CPU,
+		PER_THREAD,
+	} mode;
+	int 			header_fd;
+	u64			header_written;
+	struct hlist_head	fd_hash[FD_HASH_SIZE];
+	int			fd_hash_nr;
+};
+
+struct fdhash_node {
+	int			id;
+	int			fd;
+	struct hlist_node	list;
+};
+
+static struct hlist_head *get_hash(struct data_split *split, int id)
+{
+	return &split->fd_hash[id & FD_HASH_MASK];
+}
+
+static int perf_event__rewrite_header(struct perf_tool *tool,
+				      union perf_event *event)
+{
+	struct data_split *split = container_of(tool, struct data_split, tool);
+	ssize_t size;
+
+	size = writen(split->header_fd, event, event->header.size);
+	if (size < 0)
+		return -errno;
+
+	split->header_written += size;
+	return 0;
+}
+
+static int split_other_events(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample __maybe_unused,
+				struct machine *machine __maybe_unused)
+{
+	return perf_event__rewrite_header(tool, event);
+}
+
+static int split_sample_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct perf_evsel *evsel __maybe_unused,
+				struct machine *machine __maybe_unused)
+{
+	struct data_split *split = container_of(tool, struct data_split, tool);
+	int id = split->mode == PER_CPU ? sample->cpu : sample->tid;
+	int fd = -1;
+	char buf[PATH_MAX];
+	struct hlist_head *head;
+	struct fdhash_node *node;
+
+	head = get_hash(split, id);
+	hlist_for_each_entry(node, head, list) {
+		if (node->id == id) {
+			fd = node->fd;
+			break;
+		}
+	}
+
+	if (fd == -1) {
+		scnprintf(buf, sizeof(buf), "%s/perf.data.%d",
+			  output_name, split->fd_hash_nr++);
+
+		fd = open(buf, O_RDWR|O_CREAT|O_TRUNC, 0600);
+		if (fd < 0) {
+			pr_err("cannot open data file: %s: %m\n", buf);
+			return -1;
+		}
+
+		node = malloc(sizeof(*node));
+		if (node == NULL) {
+			pr_err("memory allocation failed\n");
+			return -1;
+		}
+
+		node->id = id;
+		node->fd = fd;
+
+		hlist_add_head(&node->list, head);
+	}
+
+	return writen(fd, event, event->header.size) > 0 ? 0 : -errno;
+}
+
+static int __data_cmd_split(struct data_split *split)
+{
+	struct perf_session *session = split->session;
+	char *output = NULL;
+	char buf[PATH_MAX];
+	u64 sample_type;
+	int header_fd;
+	int ret = -1;
+	int i;
+
+	if (!output_name) {
+		if (asprintf(&output, "%s.dir", input_name) < 0) {
+			pr_err("memory allocation failed\n");
+			return -1;
+		}
+		output_name = output;
+	}
+
+	mkdir(output_name, 0700);
+
+	/*
+	 * This is necessary to write (copy) build-id table.  After
+	 * processing header, dsos list will contain dso which was on
+	 * the original build-id table.
+	 */
+	dsos__hit_all(session);
+
+	scnprintf(buf, sizeof(buf), "%s/perf.header", output_name);
+	header_fd = open(buf, O_RDWR|O_CREAT|O_TRUNC, 0600);
+	if (header_fd < 0) {
+		pr_err("cannot open header file: %s: %m\n", buf);
+		goto out;
+	}
+
+	lseek(header_fd, session->header.data_offset, SEEK_SET);
+
+	sample_type = perf_evlist__combined_sample_type(session->evlist);
+	if (sample_type & PERF_SAMPLE_CPU)
+		split->mode = PER_CPU;
+	else
+		split->mode = PER_THREAD;
+
+	pr_debug("splitting data file for %s\n",
+		 split->mode == PER_CPU ? "CPUs" : "threads");
+
+	split->header_fd = header_fd;
+	perf_session__process_events(session, &split->tool);
+
+	for (i = 0; i < FD_HASH_SIZE; i++) {
+		struct fdhash_node *pos;
+		struct hlist_node *tmp;
+
+		hlist_for_each_entry_safe(pos, tmp, &split->fd_hash[i], list) {
+			hlist_del(&pos->list);
+			close(pos->fd);
+			free(pos);
+		}
+	}
+
+	session->header.data_size = split->header_written;
+	perf_session__write_header(session, session->evlist, header_fd, true);
+
+	close(header_fd);
+	ret = 0;
+out:
+	free(output);
+	return ret;
+}
+
+int data_cmd_split(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+	bool force = false;
+	struct perf_session *session;
+	struct perf_data_file file = {
+		.mode  = PERF_DATA_MODE_READ,
+	};
+	struct data_split split = {
+		.tool = {
+			.sample		= split_sample_event,
+			.fork		= split_other_events,
+			.comm		= split_other_events,
+			.exit		= split_other_events,
+			.mmap		= split_other_events,
+			.mmap2		= split_other_events,
+			.lost		= split_other_events,
+			.throttle	= split_other_events,
+			.unthrottle	= split_other_events,
+		},
+	};
+	const char * const split_usage[] = {
+		"perf data split [<options>]",
+		NULL
+	};
+	const struct option split_options[] = {
+	OPT_STRING('i', "input", &input_name, "file", "input file name"),
+	OPT_STRING('o', "output", &output_name, "file", "output directory name"),
+	OPT_BOOLEAN('f', "force", &force, "don't complain, do it"),
+	OPT_INCR('v', "verbose", &verbose, "be more verbose"),
+	OPT_END()
+	};
+
+	argc = parse_options(argc, argv, split_options, split_usage, 0);
+	if (argc)
+		usage_with_options(split_usage, split_options);
+
+	file.path = input_name;
+	file.force = force;
+	session = perf_session__new(&file, false, &split.tool);
+	if (session == NULL)
+		return -1;
+
+	split.session = session;
+	symbol__init(&session->header.env);
+
+	__data_cmd_split(&split);
+
+	perf_session__delete(session);
+	return 0;
+}
+
 int cmd_data(int argc, const char **argv, const char *prefix)
 {
 	struct data_cmd *cmd;
-- 
2.1.3
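split_sample_event() in patch 37 keys output files by CPU (when PERF_SAMPLE_CPU is in the combined sample type) or by thread ID, using a small chained hash table. The sketch below shows that lookup-or-create step with plain singly linked chains instead of the kernel-style hlist; struct id_map, id_map_get() and the slot numbers (standing in for file descriptors) are hypothetical. Masking with & HASH_MASK keeps the bucket index in range even for a negative id, whereas % on a signed int can yield a negative result.

```c
#include <assert.h>
#include <stdlib.h>

#define HASH_BITS 7
#define HASH_SIZE (1 << HASH_BITS)
#define HASH_MASK (HASH_SIZE - 1)

/* Hypothetical simplified node: maps a CPU or TID to an output slot. */
struct id_node {
	int id;
	int slot;		/* stands in for the output fd */
	struct id_node *next;
};

struct id_map {
	struct id_node *bucket[HASH_SIZE];
	int nr;			/* next slot number, like fd_hash_nr */
};

/* Masking keeps the index in [0, HASH_SIZE) even for negative ids. */
static unsigned int id_hash(int id)
{
	return (unsigned int)id & HASH_MASK;
}

/* Return the slot for id, creating a new one on first sight; -1 on ENOMEM. */
static int id_map_get(struct id_map *map, int id)
{
	struct id_node **head;
	struct id_node *node;

	assert(map != NULL);
	head = &map->bucket[id_hash(id)];
	for (node = *head; node; node = node->next)
		if (node->id == id)
			return node->slot;

	node = malloc(sizeof(*node));
	if (node == NULL)
		return -1;
	node->id = id;
	node->slot = map->nr++;
	node->next = *head;
	*head = node;
	return node->slot;
}

/* Free every node, like the hlist_for_each_entry_safe() loop in the patch. */
static void id_map_free(struct id_map *map)
{
	int i;

	for (i = 0; i < HASH_SIZE; i++) {
		struct id_node *node = map->bucket[i], *next;

		for (; node; node = next) {
			next = node->next;
			free(node);
		}
		map->bucket[i] = NULL;
	}
}
```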



* Re: [PATCH 37/37] perf data: Implement 'split' subcommand
  2014-12-24  7:15 ` [PATCH 37/37] perf data: Implement 'split' subcommand Namhyung Kim
@ 2014-12-24 13:51   ` Arnaldo Carvalho de Melo
  2014-12-24 14:14     ` Namhyung Kim
  2014-12-26 13:59   ` Jiri Olsa
  1 sibling, 1 reply; 91+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-24 13:51 UTC
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:33PM +0900, Namhyung Kim wrote:
> The perf data split command is for splitting a (large) single data
> file into multiple files under a directory (perf.data.dir by default)
> so that it can be processed and reported using multiple threads.

How is it split? By CPU?
Will the metadata stay in a different file?
Please be as verbose on the description as possible :-)

- Arnaldo
 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-data.txt |  28 +++++
>  tools/perf/builtin-data.c              | 223 +++++++++++++++++++++++++++++++++
>  2 files changed, 251 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt
> index b8c83947715c..42708702f10c 100644
> --- a/tools/perf/Documentation/perf-data.txt
> +++ b/tools/perf/Documentation/perf-data.txt
> @@ -13,3 +13,31 @@ SYNOPSIS
>  DESCRIPTION
>  -----------
>  Data file related processing.
> +
> +COMMANDS
> +--------
> +split::
> +	Split single data file (perf.data) into multiple files under a directory
> +	in order to be reported by multiple threads.
> +
> +OPTIONS for 'split'
> +---------------------
> +-i::
> +--input::
> +	Specify input perf data file path.
> +
> +-o::
> +--output::
> +	Specify output perf data directory path.
> +
> +-v::
> +--verbose::
> +        Be more verbose (show counter open errors, etc).
> +
> +-f::
> +--force::
> +        Don't complain, do it.
> +
> +SEE ALSO
> +--------
> +linkperf:perf[1], linkperf:perf-report[1]
> diff --git a/tools/perf/builtin-data.c b/tools/perf/builtin-data.c
> index 1eee97d020fa..5f3173826850 100644
> --- a/tools/perf/builtin-data.c
> +++ b/tools/perf/builtin-data.c
> @@ -2,10 +2,14 @@
>  #include "builtin.h"
>  #include "perf.h"
>  #include "debug.h"
> +#include "session.h"
> +#include "evlist.h"
>  #include "parse-options.h"
>  
>  typedef int (*data_cmd_fn_t)(int argc, const char **argv, const char *prefix);
>  
> +static const char *output_name;
> +
>  struct data_cmd {
>  	const char	*name;
>  	const char	*summary;
> @@ -41,10 +45,229 @@ static void print_usage(void)
>  	printf("\n");
>  }
>  
> +static int data_cmd_split(int argc, const char **argv, const char *prefix);
> +
>  static struct data_cmd data_cmds[] = {
> +	{ "split", "split single data file into multi-file", data_cmd_split },
>  	{ NULL },
>  };
>  
> +#define FD_HASH_BITS  7
> +#define FD_HASH_SIZE  (1 << FD_HASH_BITS)
> +#define FD_HASH_MASK  (FD_HASH_SIZE - 1)
> +
> +struct data_split {
> +	struct perf_tool	tool;
> +	struct perf_session	*session;
> +	enum {
> +		PER_CPU,
> +		PER_THREAD,
> +	} mode;
> +	int 			header_fd;
> +	u64			header_written;
> +	struct hlist_head	fd_hash[FD_HASH_SIZE];
> +	int			fd_hash_nr;
> +};
> +
> +struct fdhash_node {
> +	int			id;
> +	int			fd;
> +	struct hlist_node	list;
> +};
> +
> +static struct hlist_head *get_hash(struct data_split *split, int id)
> +{
> +	return &split->fd_hash[id % FD_HASH_MASK];
> +}
> +
> +static int perf_event__rewrite_header(struct perf_tool *tool,
> +				      union perf_event *event)
> +{
> +	struct data_split *split = container_of(tool, struct data_split, tool);
> +	ssize_t size;
> +
> +	size = writen(split->header_fd, event, event->header.size);
> +	if (size < 0)
> +		return -errno;
> +
> +	split->header_written += size;
> +	return 0;
> +}
> +
> +static int split_other_events(struct perf_tool *tool,
> +				union perf_event *event,
> +				struct perf_sample *sample __maybe_unused,
> +				struct machine *machine __maybe_unused)
> +{
> +	return perf_event__rewrite_header(tool, event);
> +}
> +
> +static int split_sample_event(struct perf_tool *tool,
> +				union perf_event *event,
> +				struct perf_sample *sample,
> +				struct perf_evsel *evsel __maybe_unused,
> +				struct machine *machine __maybe_unused)
> +{
> +	struct data_split *split = container_of(tool, struct data_split, tool);
> +	int id = split->mode == PER_CPU ? sample->cpu : sample->tid;
> +	int fd = -1;
> +	char buf[PATH_MAX];
> +	struct hlist_head *head;
> +	struct fdhash_node *node;
> +
> +	head = get_hash(split, id);
> +	hlist_for_each_entry(node, head, list) {
> +		if (node->id == id) {
> +			fd = node->fd;
> +			break;
> +		}
> +	}
> +
> +	if (fd == -1) {
> +		scnprintf(buf, sizeof(buf), "%s/perf.data.%d",
> +			  output_name, split->fd_hash_nr++);
> +
> +		fd = open(buf, O_RDWR|O_CREAT|O_TRUNC, 0600);
> +		if (fd < 0) {
> +			pr_err("cannot open data file: %s: %m\n", buf);
> +			return -1;
> +		}
> +
> +		node = malloc(sizeof(*node));
> +		if (node == NULL) {
> +			pr_err("memory allocation failed\n");
> +			return -1;
> +		}
> +
> +		node->id = id;
> +		node->fd = fd;
> +
> +		hlist_add_head(&node->list, head);
> +	}
> +
> +	return writen(fd, event, event->header.size) > 0 ? 0 : -errno;
> +}
> +
> +static int __data_cmd_split(struct data_split *split)
> +{
> +	struct perf_session *session = split->session;
> +	char *output = NULL;
> +	char buf[PATH_MAX];
> +	u64 sample_type;
> +	int header_fd;
> +	int ret = -1;
> +	int i;
> +
> +	if (!output_name) {
> +		if (asprintf(&output, "%s.dir", input_name) < 0) {
> +			pr_err("memory allocation failed\n");
> +			return -1;
> +		}
> +		output_name = output;
> +	}
> +
> +	mkdir(output_name, 0700);
> +
> +	/*
> +	 * This is necessary to write (copy) build-id table.  After
> +	 * processing header, dsos list will contain dso which was on
> +	 * the original build-id table.
> +	 */
> +	dsos__hit_all(session);
> +
> +	scnprintf(buf, sizeof(buf), "%s/perf.header", output_name);
> +	header_fd = open(buf, O_RDWR|O_CREAT|O_TRUNC, 0600);
> +	if (header_fd < 0) {
> +		pr_err("cannot open header file: %s: %m\n", buf);
> +		goto out;
> +	}
> +
> +	lseek(header_fd, session->header.data_offset, SEEK_SET);
> +
> +	sample_type = perf_evlist__combined_sample_type(session->evlist);
> +	if (sample_type & PERF_SAMPLE_CPU)
> +		split->mode = PER_CPU;
> +	else
> +		split->mode = PER_THREAD;
> +
> +	pr_debug("splitting data file for %s\n",
> +		 split->mode == PER_CPU ? "CPUs" : "threads");
> +
> +	split->header_fd = header_fd;
> +	perf_session__process_events(session, &split->tool);
> +
> +	for (i = 0; i < FD_HASH_SIZE; i++) {
> +		struct fdhash_node *pos;
> +		struct hlist_node *tmp;
> +
> +		hlist_for_each_entry_safe(pos, tmp, &split->fd_hash[i], list) {
> +			hlist_del(&pos->list);
> +			close(pos->fd);
> +			free(pos);
> +		}
> +	}
> +
> +	session->header.data_size = split->header_written;
> +	perf_session__write_header(session, session->evlist, header_fd, true);
> +
> +	close(header_fd);
> +	ret = 0;
> +out:
> +	free(output);
> +	return ret;
> +}
> +
> +int data_cmd_split(int argc, const char **argv, const char *prefix __maybe_unused)
> +{
> +	bool force = false;
> +	struct perf_session *session;
> +	struct perf_data_file file = {
> +		.mode  = PERF_DATA_MODE_READ,
> +	};
> +	struct data_split split = {
> +		.tool = {
> +			.sample		= split_sample_event,
> +			.fork		= split_other_events,
> +			.comm		= split_other_events,
> +			.exit		= split_other_events,
> +			.mmap		= split_other_events,
> +			.mmap2		= split_other_events,
> +			.lost		= split_other_events,
> +			.throttle	= split_other_events,
> +			.unthrottle	= split_other_events,
> +		},
> +	};
> +	const char * const split_usage[] = {
> +		"perf data split [<options>]",
> +		NULL
> +	};
> +	const struct option split_options[] = {
> +	OPT_STRING('i', "input", &input_name, "file", "input file name"),
> +	OPT_STRING('o', "output", &output_name, "file", "output directory name"),
> +	OPT_BOOLEAN('f', "force", &force, "don't complain, do it"),
> +	OPT_INCR('v', "verbose", &verbose, "be more verbose"),
> +	OPT_END()
> +	};
> +
> +	argc = parse_options(argc, argv, split_options, split_usage, 0);
> +	if (argc)
> +		usage_with_options(split_usage, split_options);
> +
> +	file.path = input_name;
> +	file.force = force;
> +	session = perf_session__new(&file, false, &split.tool);
> +	if (session == NULL)
> +		return -1;
> +
> +	split.session = session;
> +	symbol__init(&session->header.env);
> +
> +	__data_cmd_split(&split);
> +
> +	perf_session__delete(session);
> +	return 0;
> +}
> +
>  int cmd_data(int argc, const char **argv, const char *prefix)
>  {
>  	struct data_cmd *cmd;
> -- 
> 2.1.3

* Re: [PATCH 37/37] perf data: Implement 'split' subcommand
  2014-12-24 13:51   ` Arnaldo Carvalho de Melo
@ 2014-12-24 14:14     ` Namhyung Kim
  2014-12-24 14:45       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-24 14:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

Hi Arnaldo,

On Wed, Dec 24, 2014 at 10:51 PM, Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
> On Wed, Dec 24, 2014 at 04:15:33PM +0900, Namhyung Kim wrote:
>> The perf data split command is for splitting a (large) single data
>> file into multiple files under a directory (perf.data.dir by default)
>> so that it can be processed and reported using multiple threads.
>
> How is it split? By CPU?
> Will the metadata stay in a different file?
> Please be as verbose on the description as possible :-)

It depends on the data file: if it was recorded system-wide, the split
will be done by CPU, otherwise by thread.  This is determined by checking
whether the sample type has PERF_SAMPLE_CPU.  The metadata is saved in a
separate perf.header file, like other multi-file data.  I will add this
to the change log.
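
The splitting rule described above can be sketched as follows. This is a
minimal in-memory model, not code from the series: pick_mode(),
file_for_sample(), struct splitter and struct id_node are hypothetical
names, and each output file is represented only by its index N (as in
perf.data.N) instead of an open fd.  SKETCH_SAMPLE_CPU is defined locally
with the same bit value as PERF_SAMPLE_CPU (1U << 7) so no kernel headers
are needed.  One deliberate difference from the patch's get_hash(): the
bucket is computed as `id & FD_HASH_MASK` rather than `id % FD_HASH_MASK`,
so all FD_HASH_SIZE buckets are used and a negative id cannot index
outside the table.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same bit as PERF_SAMPLE_CPU in linux/perf_event.h, redefined locally. */
#define SKETCH_SAMPLE_CPU	(1ULL << 7)

#define FD_HASH_BITS	7
#define FD_HASH_SIZE	(1 << FD_HASH_BITS)
#define FD_HASH_MASK	(FD_HASH_SIZE - 1)

enum split_mode { PER_CPU, PER_THREAD };

struct id_node {
	int id;			/* CPU number or TID */
	int file_idx;		/* N in perf.data.N */
	struct id_node *next;
};

struct splitter {
	enum split_mode mode;
	struct id_node *hash[FD_HASH_SIZE];
	int nr_files;
};

/* System-wide recordings carry PERF_SAMPLE_CPU: split by CPU, else by TID. */
static enum split_mode pick_mode(uint64_t sample_type)
{
	return (sample_type & SKETCH_SAMPLE_CPU) ? PER_CPU : PER_THREAD;
}

/* Masking keeps the bucket in [0, FD_HASH_SIZE) even for negative ids. */
static unsigned int id_bucket(int id)
{
	return (unsigned int)id & FD_HASH_MASK;
}

/* Return the output file index for a sample, assigning the next free
 * index the first time a CPU/TID is seen; -1 on allocation failure. */
static int file_for_sample(struct splitter *s, int cpu, int tid)
{
	int id = s->mode == PER_CPU ? cpu : tid;
	struct id_node **head = &s->hash[id_bucket(id)];
	struct id_node *n;

	for (n = *head; n != NULL; n = n->next)
		if (n->id == id)
			return n->file_idx;

	n = malloc(sizeof(*n));
	if (n == NULL)
		return -1;
	n->id = id;
	n->file_idx = s->nr_files++;
	n->next = *head;
	*head = n;
	return n->file_idx;
}

/* Free every node in every bucket. */
static void splitter_free(struct splitter *s)
{
	for (int i = 0; i < FD_HASH_SIZE; i++) {
		while (s->hash[i] != NULL) {
			struct id_node *n = s->hash[i];

			s->hash[i] = n->next;
			free(n);
		}
	}
}
```

With a system-wide recording, all samples from CPU 3 go to the same file
regardless of which thread was running; without PERF_SAMPLE_CPU the TID
decides instead.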

Thanks,
Namhyung

* Re: [PATCH 37/37] perf data: Implement 'split' subcommand
  2014-12-24 14:14     ` Namhyung Kim
@ 2014-12-24 14:45       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 91+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-12-24 14:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

On Wed, Dec 24, 2014 at 11:14:47PM +0900, Namhyung Kim wrote:
> Hi Arnaldo,
> 
> On Wed, Dec 24, 2014 at 10:51 PM, Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> > On Wed, Dec 24, 2014 at 04:15:33PM +0900, Namhyung Kim wrote:
> >> The perf data split command is for splitting a (large) single data
> >> file into multiple files under a directory (perf.data.dir by default)
> >> so that it can be processed and reported using multiple threads.
> >
> > How is it split? By CPU?
> > Will the metadata stay in a different file?
> > Please be as verbose on the description as possible :-)
> 
> It depends on the data file: if it was recorded system-wide, the split
> will be done by CPU, otherwise by thread.  This is determined by checking
> whether the sample type has PERF_SAMPLE_CPU.  The metadata is saved in a
> separate perf.header file, like other multi-file data.  I will add this
> to the change log.

Thanks for the explanation!

Please try to put this kind of detailed information in the changelogs:
the review process starts at the idea/design, and that is what we read
first, in the changelog.  Only after understanding what you want to do
should we bother checking whether you did what you said you would do :-)

- Arnaldo

* Re: [PATCH 04/37] perf tools: Add multi file interface to perf_data_file
  2014-12-24  7:15 ` [PATCH 04/37] perf tools: Add multi file interface to perf_data_file Namhyung Kim
@ 2014-12-25 22:08   ` Jiri Olsa
  2014-12-26  1:19     ` Namhyung Kim
  2014-12-31 11:26   ` Jiri Olsa
  1 sibling, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-25 22:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:00PM +0900, Namhyung Kim wrote:

SNIP

>  #endif /* __PERF_DATA_H */
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index d5eab3f3323f..a5046d52e311 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -72,6 +72,49 @@ int mkdir_p(char *path, mode_t mode)
>  	return (stat(path, &st) && mkdir(path, mode)) ? -1 : 0;
>  }
>  
> +int rm_rf(char *path)
> +{
> +	DIR *dir;
> +	int ret = 0;
> +	struct dirent *d;
> +	char namebuf[PATH_MAX];
> +
> +	dir = opendir(path);
> +	if (dir == NULL)
> +		return 0;
> +
> +	while ((d = readdir(dir)) != NULL && !ret) {
> +		struct stat statbuf;
> +
> +		if (d->d_name[0] == '.')
> +			continue;

Could you check for '.' and '..' explicitly, so that '.*' files get
removed too?

I know that we will probably not have any '.*' files in perf.data.dir,
but this function could be used later, e.g. for a total cache clean ;-)

Also please submit 'rm_rf' as a separate patch.

thanks,
jirka

* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-24  7:15 ` [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
@ 2014-12-25 22:08   ` Jiri Olsa
  2014-12-26  1:45     ` Namhyung Kim
  2014-12-25 22:09   ` Jiri Olsa
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-25 22:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:01PM +0900, Namhyung Kim wrote:
> When multi file support is enabled, a dummy tracking event will be
> used to track metadata (like task, comm and mmap events) for a session
> and actual samples will be recorded in separate files.
> 
> Provide a separate mmap for the dummy tracking event.  The size is
> fixed to 128KiB (+ 1 page) as its event rate will be lower than that
> of samples.  I originally wanted to use a single mmap for this, but
> cross-cpu sharing is prohibited, so it's per-cpu (or per-task) like
> normal mmaps.
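
For reference, the "+ 1 page" is the ring buffer's control page (struct
perf_event_mmap_page), which precedes the data pages, and the kernel
expects the data area to be a power-of-two number of pages.  A small
sketch of the resulting mmap length, assuming the 128KiB figure from the
description and a page size that divides it; track_mmap_len() is a
hypothetical helper, not a function from the patch.

```c
#include <stddef.h>

/* 128KiB data area, as stated in the patch description. */
#define TRACK_MMAP_DATA_BYTES	(128 * 1024)

/* Bytes to mmap for one tracking ring buffer: one control page followed
 * by the data pages.  Assumes page_size divides 128KiB (true for 4KiB,
 * 16KiB and 64KiB pages), which keeps the data page count a power of 2. */
static size_t track_mmap_len(size_t page_size)
{
	size_t data_pages = TRACK_MMAP_DATA_BYTES / page_size;

	return (1 + data_pages) * page_size;
}
```

With 4KiB pages this is 33 pages (135168 bytes); real code would pass the
runtime page size, e.g. from sysconf(_SC_PAGESIZE).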

Maybe this needs to be applied after the next patch?
  perf tools: Introduce perf_evlist__mmap_multi

I'm getting a compile error:

[jolsa@krava perf]$ make JOBS=1
  BUILD:   Doing 'make -j1' parallel build
  CC       util/evlist.o
util/evlist.c: In function ‘perf_evlist__mmap_per_evsel’:
util/evlist.c:937:9: error: ‘struct mmap_params’ has no member named ‘track’
   if (mp->track && perf_evsel__is_dummy_tracking(evsel)) {
         ^
util/evlist.c: In function ‘perf_evlist__mmap’:
util/evlist.c:1092:62: error: ‘use_track_mmap’ undeclared (first use in this function)
  if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist, use_track_mmap) < 0)
                                                              ^
util/evlist.c:1092:62: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [util/evlist.o] Error 1
make: *** [all] Error 2

jirka

* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-24  7:15 ` [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
  2014-12-25 22:08   ` Jiri Olsa
@ 2014-12-25 22:09   ` Jiri Olsa
  2014-12-26  1:55     ` Namhyung Kim
  2014-12-26 16:51   ` David Ahern
  2014-12-29 13:44   ` Adrian Hunter
  3 siblings, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-25 22:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:01PM +0900, Namhyung Kim wrote:

SNIP

>  
>  union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
> -
>  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
> +struct perf_mmap *perf_evlist__mmap_desc(struct perf_evlist *evlist, int idx);
>  
>  int perf_evlist__open(struct perf_evlist *evlist);
>  void perf_evlist__close(struct perf_evlist *evlist);
> @@ -211,6 +214,12 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
>  void perf_evlist__to_front(struct perf_evlist *evlist,
>  			   struct perf_evsel *move_evsel);
>  
> +/* convert from/to negative idx for track mmaps */
> +static inline int track_mmap_idx(int idx)
> +{
> +	return -idx - 1;
> +}

Hmm, what's the logic with negative numbers here?
You still access the track_mmap array with this index, no?
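
One possible reading of the helper, offered as an assumption rather than
as the author's answer: because track_mmap_idx() is its own inverse, a
single int idx could name either a normal mmap (idx >= 0) or a track
mmap (idx < 0) without changing the perf_evlist__mmap_read() and
perf_evlist__mmap_consume() signatures, with the callee converting a
negative idx back into a non-negative slot before touching the
track_mmap array.  decode_mmap_idx() and enum mmap_kind are hypothetical.

```c
/* Same conversion as the quoted helper: 0, 1, 2, ... <-> -1, -2, -3, ...
 * Applying it twice returns the original value: -(-idx - 1) - 1 == idx. */
static inline int track_mmap_idx(int idx)
{
	return -idx - 1;
}

enum mmap_kind { MMAP_NORMAL, MMAP_TRACK };

/* Hypothetical decoder: idx >= 0 names a normal mmap, idx < 0 a track
 * mmap; *slot receives the index into the matching array. */
static enum mmap_kind decode_mmap_idx(int idx, int *slot)
{
	if (idx < 0) {
		*slot = track_mmap_idx(idx);
		return MMAP_TRACK;
	}
	*slot = idx;
	return MMAP_NORMAL;
}
```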

thanks,
jirka

* Re: [PATCH 13/37] perf tools: Use thread__comm_time() when adding hist entries
  2014-12-24  7:15 ` [PATCH 13/37] perf tools: Use thread__comm_time() when adding hist entries Namhyung Kim
@ 2014-12-25 22:53   ` Jiri Olsa
  2014-12-26  2:10     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-25 22:53 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:09PM +0900, Namhyung Kim wrote:

SNIP

>  
>  			he = __hists__add_entry(hists, &al, NULL,
> -						NULL, NULL, 1, 1, 0, true);
> +						NULL, NULL, 1, 1, 0, -1, true);
>  			if (he == NULL)
>  				goto out;
>  
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 9314286ed25c..d322264bac22 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -451,11 +451,11 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
>  				      struct branch_info *bi,
>  				      struct mem_info *mi,
>  				      u64 period, u64 weight, u64 transaction,
> -				      bool sample_self)
> +				      u64 timestamp, bool sample_self)
>  {
>  	struct hist_entry entry = {
>  		.thread	= al->thread,
> -		.comm = thread__comm(al->thread),
> +		.comm = thread__comm_time(al->thread, timestamp),

with the thread object having multiple comm entries, could this hurt
the single-threaded performance?

The thread__comm_time() function iterates comm_list each time;
maybe you could add some 'last_comm found' check logic to it?
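
A minimal sketch of that suggestion, assuming hypothetical simplified
structs (struct comm, struct thread and thread_comm_at() below are
illustrative, not perf's actual definitions) and a newest-first comm
list. The previous hit is cached together with the end of its validity
window, so consecutive samples that map to the same comm skip the walk:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not perf's real structs. */
struct comm {
	uint64_t start;		/* time this comm became active */
	const char *str;
	struct comm *next;	/* next OLDER comm: the list is newest-first */
};

struct thread {
	struct comm *comm_list;	/* newest first */
	struct comm *last_comm;	/* result of the previous lookup */
	uint64_t last_end;	/* last_comm is valid for [start, last_end) */
	unsigned long walks;	/* full list walks, counted for illustration */
	/* NOTE: whoever adds a comm must reset last_comm to NULL. */
};

/* Return the comm active at 'timestamp', or NULL if none was active yet. */
struct comm *thread_comm_at(struct thread *th, uint64_t timestamp)
{
	struct comm *c;
	uint64_t end = UINT64_MAX;	/* the newest comm never ends */

	/* Fast path: the sample falls inside the previously found window. */
	if (th->last_comm && th->last_comm->start <= timestamp &&
	    timestamp < th->last_end)
		return th->last_comm;

	th->walks++;
	for (c = th->comm_list; c != NULL; c = c->next) {
		if (c->start <= timestamp) {
			th->last_comm = c;
			th->last_end = end;
			return c;
		}
		/* the next (older) comm ends where this one starts */
		end = c->start;
	}
	return NULL;
}
```

The cache is only safe if every path that adds a comm to the thread also
clears last_comm; otherwise a stale window could be returned.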

jirka

* Re: [PATCH 14/37] perf tools: Convert dead thread list into rbtree
  2014-12-24  7:15 ` [PATCH 14/37] perf tools: Convert dead thread list into rbtree Namhyung Kim
@ 2014-12-25 23:05   ` Jiri Olsa
  2014-12-26  2:26     ` Namhyung Kim
  2014-12-27 15:31   ` David Ahern
  1 sibling, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-25 23:05 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:10PM +0900, Namhyung Kim wrote:

SNIP

>  
>  static void machine__remove_thread(struct machine *machine, struct thread *th)
>  {
> +	struct rb_node **p = &machine->dead_threads.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct thread *pos;
> +
>  	machine->last_match = NULL;
>  	rb_erase(&th->rb_node, &machine->threads);
> +
> +	th->dead = true;
> +
>  	/*
>  	 * We may have references to this thread, for instance in some hist_entry
> -	 * instances, so just move them to a separate list.
> +	 * instances, so just move them to a separate list in rbtree.
>  	 */
> -	list_add_tail(&th->node, &machine->dead_threads);
> +	while (*p != NULL) {
> +		parent = *p;
> +		pos = rb_entry(parent, struct thread, rb_node);
> +
> +		if (pos->tid == th->tid) {
> +			list_add_tail(&th->node, &pos->node);
> +			return;
> +		}

hum, why is this 'new list' in the thread object necessary? Why not
store them all in the tree?


> +
> +		if (th->tid < pos->tid)
> +			p = &(*p)->rb_left;
> +		else
> +			p = &(*p)->rb_right;
> +	}
> +

SNIP

> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 0b6dcd70bc8b..413f28cf689b 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -11,10 +11,8 @@
>  struct thread_stack;
>  
>  struct thread {
> -	union {
> -		struct rb_node	 rb_node;
> -		struct list_head node;
> -	};
> +	struct rb_node	 	rb_node;
> +	struct list_head 	node;
>  	struct map_groups	*mg;
>  	pid_t			pid_; /* Not all tools update this */
>  	pid_t			tid;
> @@ -22,7 +20,8 @@ struct thread {
>  	int			cpu;
>  	char			shortname[3];
>  	bool			comm_set;
> -	bool			dead; /* if set thread has exited */
> +	bool			exited; /* if set thread has exited */
> +	bool			dead; /* thread is in dead_threads list */

looks like this also changes the logic (new exited flag),
not just the dead threads storage wheel

jirka

* Re: [PATCH 04/37] perf tools: Add multi file interface to perf_data_file
  2014-12-25 22:08   ` Jiri Olsa
@ 2014-12-26  1:19     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-26  1:19 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

Hi Jiri,

On Fri, Dec 26, 2014 at 7:08 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:00PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>>  #endif /* __PERF_DATA_H */
>> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
>> index d5eab3f3323f..a5046d52e311 100644
>> --- a/tools/perf/util/util.c
>> +++ b/tools/perf/util/util.c
>> @@ -72,6 +72,49 @@ int mkdir_p(char *path, mode_t mode)
>>       return (stat(path, &st) && mkdir(path, mode)) ? -1 : 0;
>>  }
>>
>> +int rm_rf(char *path)
>> +{
>> +     DIR *dir;
>> +     int ret = 0;
>> +     struct dirent *d;
>> +     char namebuf[PATH_MAX];
>> +
>> +     dir = opendir(path);
>> +     if (dir == NULL)
>> +             return 0;
>> +
>> +     while ((d = readdir(dir)) != NULL && !ret) {
>> +             struct stat statbuf;
>> +
>> +             if (d->d_name[0] == '.')
>> +                     continue;
>
> Could you check for '.' and for '..' to support '.*' removal?
>
> I know that we will probably not have any '.*' files in perf.data.dir,
> but this function could be used later, e.g. for a total cache clean ;-)

Ah, okay.
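
For reference, the check being asked for is to skip exactly "." and ".."
rather than every name that starts with '.'.  A minimal sketch; the
helper name is hypothetical:

```c
#include <stdbool.h>

/* True only for the "." and ".." entries returned by readdir(), which a
 * recursive remove must skip. Other names beginning with '.' (for example
 * ".cache") are ordinary entries and should be removed like any other. */
bool is_dot_or_dotdot(const char *name)
{
	return name[0] == '.' &&
	       (name[1] == '\0' || (name[1] == '.' && name[2] == '\0'));
}
```

In the quoted loop this would replace the d->d_name[0] == '.' test with
"if (is_dot_or_dotdot(d->d_name)) continue;".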


> also please submit 'rm_rf' in a separate patch

Will do.


Thanks,
Namhyung

* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-25 22:08   ` Jiri Olsa
@ 2014-12-26  1:45     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-26  1:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Fri, Dec 26, 2014 at 7:08 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:01PM +0900, Namhyung Kim wrote:
>> When multi file support is enabled, a dummy tracking event will be
>> used to track metadata (like task, comm and mmap events) for a session
>> and actual samples will be recorded in separate files.
>>
>> Provide separate mmap to the dummy tracking event.  The size is fixed
>> to 128KiB (+ 1 page) as the event rate will be lower than samples.  I
>> originally wanted to use a single mmap for this but cross-cpu sharing
>> is prohibited so it's per-cpu (or per-task) like normal mmaps.
>
> maybe this needs to be applied after the next patch?
>   perf tools: Introduce perf_evlist__mmap_multi

Oops, sorry.  It seems the code got mixed up after a huge number of
rebases.  The commit order is right, but this patch has a wrong hunk.


>
> I'm getting compile error:
>
> [jolsa@krava perf]$ make JOBS=1
>   BUILD:   Doing 'make -j1' parallel build
>   CC       util/evlist.o
> util/evlist.c: In function ‘perf_evlist__mmap_per_evsel’:
> util/evlist.c:937:9: error: ‘struct mmap_params’ has no member named ‘track’
>    if (mp->track && perf_evsel__is_dummy_tracking(evsel)) {
>          ^
> util/evlist.c: In function ‘perf_evlist__mmap’:
> util/evlist.c:1092:62: error: ‘use_track_mmap’ undeclared (first use in this function)
>   if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist, use_track_mmap) < 0)
>                                                               ^
> util/evlist.c:1092:62: note: each undeclared identifier is reported only once for each function it appears in
> make[1]: *** [util/evlist.o] Error 1
> make: *** [all] Error 2

Will fix, thanks
Namhyung

* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-25 22:09   ` Jiri Olsa
@ 2014-12-26  1:55     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-26  1:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Fri, Dec 26, 2014 at 7:09 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:01PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>>
>>  union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
>> -
>>  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
>> +struct perf_mmap *perf_evlist__mmap_desc(struct perf_evlist *evlist, int idx);
>>
>>  int perf_evlist__open(struct perf_evlist *evlist);
>>  void perf_evlist__close(struct perf_evlist *evlist);
>> @@ -211,6 +214,12 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
>>  void perf_evlist__to_front(struct perf_evlist *evlist,
>>                          struct perf_evsel *move_evsel);
>>
>> +/* convert from/to negative idx for track mmaps */
>> +static inline int track_mmap_idx(int idx)
>> +{
>> +     return -idx - 1;
>> +}
>
> hum, what's the logic with the negative numbers here?
> you still access the track_mmap array with this index, no?

For each index (per-cpu or per-task, depending on user input), there are
now two mmaps: one for normal (sampling) events and another for tracking
(metadata) events.  So I wanted to distinguish them by using positive
and negative numbers.  But simply changing the sign did not work because
of index 0, while negating and then subtracting 1 works nicely in both
directions.

0 => -1 => 0
1 => -2 => 1
2 => -3 => 2
3 => -4 => 3

So when it sees a negative index, it'll convert it using
track_mmap_idx() to get a matching (track) mmap.
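
The mapping can be checked in a few lines.  This sketch reuses the
quoted track_mmap_idx() body; slot_index() is a hypothetical dispatcher,
not part of the patch.  In two's complement, -idx - 1 is the same value
as ~idx, which is why the function is its own inverse:

```c
#include <stdbool.h>

/* Same body as the quoted helper: maps 0,1,2,... to -1,-2,-3,... and back. */
static inline int track_mmap_idx(int idx)
{
	return -idx - 1;
}

/* Hypothetical dispatcher: a non-negative idx names a sampling mmap, a
 * negative idx names the tracking mmap of the same cpu/task slot. Returns
 * the array slot and reports which kind of mmap it refers to. */
int slot_index(int idx, bool *is_track)
{
	*is_track = idx < 0;
	return *is_track ? track_mmap_idx(idx) : idx;
}
```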

Thanks,
Namhyung

* Re: [PATCH 13/37] perf tools: Use thread__comm_time() when adding hist entries
  2014-12-25 22:53   ` Jiri Olsa
@ 2014-12-26  2:10     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-26  2:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Fri, Dec 26, 2014 at 7:53 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:09PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>>
>>                       he = __hists__add_entry(hists, &al, NULL,
>> -                                             NULL, NULL, 1, 1, 0, true);
>> +                                             NULL, NULL, 1, 1, 0, -1, true);
>>                       if (he == NULL)
>>                               goto out;
>>
>> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
>> index 9314286ed25c..d322264bac22 100644
>> --- a/tools/perf/util/hist.c
>> +++ b/tools/perf/util/hist.c
>> @@ -451,11 +451,11 @@ struct hist_entry *__hists__add_entry(struct hists *hists,
>>                                     struct branch_info *bi,
>>                                     struct mem_info *mi,
>>                                     u64 period, u64 weight, u64 transaction,
>> -                                   bool sample_self)
>> +                                   u64 timestamp, bool sample_self)
>>  {
>>       struct hist_entry entry = {
>>               .thread = al->thread,
>> -             .comm = thread__comm(al->thread),
>> +             .comm = thread__comm_time(al->thread, timestamp),
>
> with the thread object having multiple comm entries, could this hurt
> the single-threaded performance?

Probably.  But in my test, the single-threaded performance on a single
data file is slightly better or almost the same.  I haven't yet
investigated where the performance gain comes from, but anyway, I think
this has almost no effect on performance since most threads will have
just one or two comms, I guess.

And JFYI, the single-threaded performance on multi-file data is
better (I tested the same data using the 'perf data split' command and no
--multi-thread option to perf report), as IMHO it doesn't need to use
the ordered event queue layer.

>
> The thread__comm_time() function iterates comm_list each time;
> maybe you could add some 'last_comm found' check logic to it?

I think it can be easily added later if it really affects the performance.

Thanks,
Namhyung

* Re: [PATCH 14/37] perf tools: Convert dead thread list into rbtree
  2014-12-25 23:05   ` Jiri Olsa
@ 2014-12-26  2:26     ` Namhyung Kim
  2014-12-26 17:14       ` David Ahern
  0 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-26  2:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Fri, Dec 26, 2014 at 8:05 AM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:10PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>>
>>  static void machine__remove_thread(struct machine *machine, struct thread *th)
>>  {
>> +     struct rb_node **p = &machine->dead_threads.rb_node;
>> +     struct rb_node *parent = NULL;
>> +     struct thread *pos;
>> +
>>       machine->last_match = NULL;
>>       rb_erase(&th->rb_node, &machine->threads);
>> +
>> +     th->dead = true;
>> +
>>       /*
>>        * We may have references to this thread, for instance in some hist_entry
>> -      * instances, so just move them to a separate list.
>> +      * instances, so just move them to a separate list in rbtree.
>>        */
>> -     list_add_tail(&th->node, &machine->dead_threads);
>> +     while (*p != NULL) {
>> +             parent = *p;
>> +             pos = rb_entry(parent, struct thread, rb_node);
>> +
>> +             if (pos->tid == th->tid) {
>> +                     list_add_tail(&th->node, &pos->node);
>> +                     return;
>> +             }
>
> hum, why is this 'new list' in the thread object necessary? Why not
> store them all in the tree?

No absolute reason, but I just wanted to keep them in a single place
and to easily see how many of them exist.  We could compare the pid and
then the timestamp so that they can all be in a single rbtree.
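
A minimal sketch of such an ordering, assuming a hypothetical key made
of the thread id (the tree in the quoted patch is already keyed on tid)
plus a per-thread timestamp such as its start time.  A thread whose tid
was recycled then becomes just another tree node instead of an entry on
a side list:

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical key for a dead thread; not a perf structure. */
struct dead_key {
	pid_t    tid;
	uint64_t start_time;	/* e.g. time the thread was created */
};

/* Three-way compare for an rbtree walk: tid first, then timestamp. */
int dead_key_cmp(const struct dead_key *a, const struct dead_key *b)
{
	if (a->tid != b->tid)
		return a->tid < b->tid ? -1 : 1;
	if (a->start_time != b->start_time)
		return a->start_time < b->start_time ? -1 : 1;
	return 0;
}
```

In the quoted insertion loop, a negative result would select rb_left and
anything else rb_right; a lookup for "tid T at time ts" would then pick
the node for T with the largest start_time not after ts.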

>
>
>> +
>> +             if (th->tid < pos->tid)
>> +                     p = &(*p)->rb_left;
>> +             else
>> +                     p = &(*p)->rb_right;
>> +     }
>> +
>
> SNIP
>
>> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
>> index 0b6dcd70bc8b..413f28cf689b 100644
>> --- a/tools/perf/util/thread.h
>> +++ b/tools/perf/util/thread.h
>> @@ -11,10 +11,8 @@
>>  struct thread_stack;
>>
>>  struct thread {
>> -     union {
>> -             struct rb_node   rb_node;
>> -             struct list_head node;
>> -     };
>> +     struct rb_node          rb_node;
>> +     struct list_head        node;
>>       struct map_groups       *mg;
>>       pid_t                   pid_; /* Not all tools update this */
>>       pid_t                   tid;
>> @@ -22,7 +20,8 @@ struct thread {
>>       int                     cpu;
>>       char                    shortname[3];
>>       bool                    comm_set;
>> -     bool                    dead; /* if set thread has exited */
>> +     bool                    exited; /* if set thread has exited */
>> +     bool                    dead; /* thread is in dead_threads list */
>
> looks like this also changes the logic (new exited flag),
> not just the dead threads storage wheel

AFAICS the 'dead' flag is not used anywhere other than thread__exited().
And it confused me that a dead thread might not be in the dead_threads
tree (or list).  So I changed the name; no logical change is intended.

Thanks,
Namhyung

* Re: [PATCH 37/37] perf data: Implement 'split' subcommand
  2014-12-24  7:15 ` [PATCH 37/37] perf data: Implement 'split' subcommand Namhyung Kim
  2014-12-24 13:51   ` Arnaldo Carvalho de Melo
@ 2014-12-26 13:59   ` Jiri Olsa
  2014-12-27  5:21     ` Namhyung Kim
  1 sibling, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-26 13:59 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:33PM +0900, Namhyung Kim wrote:
> The perf data split command is for splitting a (large) single data
> file into multiple files under a directory (perf.data.dir by default)
> so that it can be processed and reported using multiple threads.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-data.txt |  28 +++++
>  tools/perf/builtin-data.c              | 223 +++++++++++++++++++++++++++++++++
>  2 files changed, 251 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt
> index b8c83947715c..42708702f10c 100644
> --- a/tools/perf/Documentation/perf-data.txt
> +++ b/tools/perf/Documentation/perf-data.txt
> @@ -13,3 +13,31 @@ SYNOPSIS
>  DESCRIPTION
>  -----------
>  Data file related processing.
> +
> +COMMANDS
> +--------
> +split::
> +	Split single data file (perf.data) into multiple files under a directory
> +	in order to be reported by multiple threads.
> +
> +OPTIONS for 'split'
> +---------------------
> +-i::
> +--input::
> +	Specify input perf data file path.
> +
> +-o::
> +--output::
> +	Specify output perf data directory path.

should the -o have 'perf.data.dir' as default?

[jolsa@krava perf]$ ./perf record ls > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB perf.data (~0 samples) ]
[jolsa@krava perf]$ ./perf data split
[jolsa@krava perf]$ ll perf.data*
-rw------- 1 jolsa jolsa 16172 Dec 26 14:58 perf.data
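
One way to get that default, sketched with a hypothetical helper; the
directory name comes from the changelog ("perf.data.dir by default"),
everything else is illustrative:

```c
#include <stddef.h>

/* Default from the changelog; the variable name is hypothetical. */
static const char default_split_dir[] = "perf.data.dir";

/* Return the -o/--output argument if one was given, else the default,
 * so that a bare 'perf data split' still writes its files somewhere. */
const char *split_output_dir(const char *opt_output)
{
	if (opt_output == NULL || opt_output[0] == '\0')
		return default_split_dir;
	return opt_output;
}
```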

jirka

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (36 preceding siblings ...)
  2014-12-24  7:15 ` [PATCH 37/37] perf data: Implement 'split' subcommand Namhyung Kim
@ 2014-12-26 14:02 ` Jiri Olsa
  2014-12-27  5:23   ` Namhyung Kim
  2015-01-05 18:48 ` Andi Kleen
  38 siblings, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-26 14:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:14:56PM +0900, Namhyung Kim wrote:

SNIP

> 
> 
> You can get it from 'perf/threaded-v1' branch on my tree at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Please take a look and play with it.  Any comments are welcome! :)

very nice at first round check ;-)

I'll do detailed review next week

thanks,
jirka

* Re: [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events
  2014-12-24  7:14 ` [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events Namhyung Kim
@ 2014-12-26 16:27   ` David Ahern
  2014-12-27  5:28     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-26 16:27 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/24/14 12:14 AM, Namhyung Kim wrote:
> Prepend a software dummy event into evlist to track task/comm/mmap
> events separately.  This is a preparation of multi-file/thread support
> which will come later.

Are you making this the first event because of how perf internals 
are coded -- that the first event tracks task events? With the tracking 
bit in evsel you should not need to do that. Is there another reason?

---8<---

> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index cfbe2b99b9aa..72dff295237e 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -193,6 +193,44 @@ int perf_evlist__add_default(struct perf_evlist *evlist)
>   	return -ENOMEM;
>   }
>
> +int perf_evlist__prepend_dummy(struct perf_evlist *evlist)
> +{
> +	struct perf_event_attr attr = {
> +		.type = PERF_TYPE_SOFTWARE,
> +		.config = PERF_COUNT_SW_DUMMY,
> +	};
> +	struct perf_evsel *evsel, *pos;
> +
> +	event_attr_init(&attr);
> +
> +	evsel = perf_evsel__new(&attr);
> +	if (evsel == NULL)
> +		goto error;
> +
> +	/* use strdup() because free(evsel) assumes name is allocated */
> +	evsel->name = strdup("dummy");
> +	if (!evsel->name)
> +		goto error_free;
> +
> +	list_for_each_entry(pos, &evlist->entries, node) {
> +		pos->idx += 1;
> +		pos->tracking = false;
> +	}
> +
> +	list_add(&evsel->node, &evlist->entries);
> +	evsel->idx = 0;
> +	evsel->tracking = true;

perf_evlist__set_tracking_event()?
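
For reference, a helper of that kind only has to make the chosen event
the single tracking event.  A simplified sketch over a plain array;
struct evsel and set_tracking_event() here are hypothetical stand-ins,
not perf's API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified evsel: only the field that matters here. */
struct evsel {
	bool tracking;
};

/* Make 'track' the only event with the tracking flag set. */
void set_tracking_event(struct evsel *evsels, size_t n, struct evsel *track)
{
	for (size_t i = 0; i < n; i++)
		evsels[i].tracking = (&evsels[i] == track);
}
```

With such a helper, the quoted loop would only need to bump pos->idx,
and the two tracking assignments would collapse into one call.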

David


* Re: [PATCH 03/37] perf tools: Use perf_data_file__fd() consistently
  2014-12-24  7:14 ` [PATCH 03/37] perf tools: Use perf_data_file__fd() consistently Namhyung Kim
@ 2014-12-26 16:30   ` David Ahern
  2014-12-27  5:30     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-26 16:30 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/24/14 12:14 AM, Namhyung Kim wrote:

> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 84df2deed988..d8b13407594d 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -375,8 +375,10 @@ static int __cmd_inject(struct perf_inject *inject)
>   		}
>   	}
>

How about a local variable to make this more readable?
int fd = perf_data_file__fd(file_out);

> -	if (!file_out->is_pipe)
> -		lseek(file_out->fd, session->header.data_offset, SEEK_SET);
> +	if (!file_out->is_pipe) {
> +		lseek(perf_data_file__fd(file_out), session->header.data_offset,
> +		      SEEK_SET);
> +	}
>
>   	ret = perf_session__process_events(session, &inject->tool);
>
> @@ -385,7 +387,8 @@ static int __cmd_inject(struct perf_inject *inject)
>   			perf_header__set_feat(&session->header,
>   					      HEADER_BUILD_ID);
>   		session->header.data_size = inject->bytes_written;
> -		perf_session__write_header(session, session->evlist, file_out->fd, true);
> +		perf_session__write_header(session, session->evlist,
> +					   perf_data_file__fd(file_out), true);
>   	}
>
>   	return ret;
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index aa5fa6aabb31..054c6e57d3b9 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -196,7 +196,7 @@ static int process_buildids(struct record *rec)
>   	struct perf_session *session = rec->session;
>   	u64 start = session->header.data_offset;
>
> -	u64 size = lseek(file->fd, 0, SEEK_CUR);
> +	u64 size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
>   	if (size == 0)
>   		return 0;
>
> @@ -360,12 +360,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>   		perf_header__clear_feat(&session->header, HEADER_GROUP_DESC);

Similarly in this function.

>
>   	if (file->is_pipe) {
> -		err = perf_header__write_pipe(file->fd);
> +		err = perf_header__write_pipe(perf_data_file__fd(file));
>   		if (err < 0)
>   			goto out_child;
>   	} else {
>   		err = perf_session__write_header(session, rec->evlist,
> -						 file->fd, false);
> +						 perf_data_file__fd(file), false);
>   		if (err < 0)
>   			goto out_child;
>   	}
> @@ -397,8 +397,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>   			 * return this more properly and also
>   			 * propagate errors that now are calling die()
>   			 */
> -			err = perf_event__synthesize_tracing_data(tool, file->fd, rec->evlist,
> -								  process_synthesized_event);
> +			err = perf_event__synthesize_tracing_data(tool,
> +						perf_data_file__fd(file),
> +						rec->evlist,
> +						process_synthesized_event);
>   			if (err <= 0) {
>   				pr_err("Couldn't record tracing data.\n");
>   				goto out_child;
> @@ -541,7 +543,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>   		if (!rec->no_buildid)
>   			process_buildids(rec);
>   		perf_session__write_header(rec->session, rec->evlist,
> -					   file->fd, true);
> +					   perf_data_file__fd(&rec->file), true);
>   	}
>
>   out_delete_session:

David


* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-24  7:15 ` [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
  2014-12-25 22:08   ` Jiri Olsa
  2014-12-25 22:09   ` Jiri Olsa
@ 2014-12-26 16:51   ` David Ahern
  2014-12-27  5:32     ` Namhyung Kim
  2014-12-29 13:44   ` Adrian Hunter
  3 siblings, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-26 16:51 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/24/14 12:15 AM, Namhyung Kim wrote:
> When multi file support is enabled, a dummy tracking event will be
> used to track metadata (like task, comm and mmap events) for a session
> and actual samples will be recorded in separate files.
>
> Provide separate mmap to the dummy tracking event.  The size is fixed
> to 128KiB (+ 1 page) as the event rate will be lower than samples.  I

Have you tried stress tests like a kernel compile with -j Ncpus (Ncpus = 
16 or higher)?
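
For scale, the fixed size in the quoted changelog works out as follows,
assuming 4 KiB pages.  A perf ring buffer is mapped as one control page
plus a data area whose page count must be a power of two, so 128 KiB is
32 data pages and 33 pages (135168 bytes) in total.  A small sketch of
that arithmetic; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

static bool is_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Total mmap length for 'data_bytes' of ring buffer (1 control page plus
 * data pages), or 0 if the data page count is not a power of two and the
 * kernel would reject the mapping. */
size_t ring_mmap_len(size_t data_bytes, size_t page_size)
{
	size_t pages = data_bytes / page_size;

	if (!is_power_of_2(pages))
		return 0;
	return (pages + 1) * page_size;
}
```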

David

* Re: [PATCH 11/37] perf tools: Introduce thread__comm_time() helpers
  2014-12-24  7:15 ` [PATCH 11/37] perf tools: Introduce thread__comm_time() helpers Namhyung Kim
@ 2014-12-26 17:00   ` David Ahern
  2014-12-27  5:36     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-26 17:00 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/24/14 12:15 AM, Namhyung Kim wrote:
> @@ -139,6 +161,16 @@ const char *thread__comm_str(const struct thread *thread)
>   	return comm__str(comm);
>   }
>
> +const char *thread__comm_time_str(const struct thread *thread, u64 timestamp)
> +{
> +	const struct comm *comm = thread__comm_time(thread, timestamp);
> +
> +	if (!comm)
> +		return NULL;
> +
> +	return comm__str(comm);
> +}
> +

thread__comm_str_time()? time_str suggests a time-based string as 
opposed to a comm_str based on time.

David

* Re: [PATCH 14/37] perf tools: Convert dead thread list into rbtree
  2014-12-26  2:26     ` Namhyung Kim
@ 2014-12-26 17:14       ` David Ahern
  2014-12-27  5:42         ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-26 17:14 UTC (permalink / raw)
  To: Namhyung Kim, Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Stephane Eranian, Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/25/14 7:26 PM, Namhyung Kim wrote:
>>> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
>>> index 0b6dcd70bc8b..413f28cf689b 100644
>>> --- a/tools/perf/util/thread.h
>>> +++ b/tools/perf/util/thread.h
>>> @@ -11,10 +11,8 @@
>>>   struct thread_stack;
>>>
>>>   struct thread {
>>> -     union {
>>> -             struct rb_node   rb_node;
>>> -             struct list_head node;
>>> -     };
>>> +     struct rb_node          rb_node;
>>> +     struct list_head        node;
>>>        struct map_groups       *mg;
>>>        pid_t                   pid_; /* Not all tools update this */
>>>        pid_t                   tid;
>>> @@ -22,7 +20,8 @@ struct thread {
>>>        int                     cpu;
>>>        char                    shortname[3];
>>>        bool                    comm_set;
>>> -     bool                    dead; /* if set thread has exited */
>>> +     bool                    exited; /* if set thread has exited */
>>> +     bool                    dead; /* thread is in dead_threads list */
>>
>> looks like this also changes the logic (new exited flag),
>> not just the dead threads storage wheel
>
> AFAICS the 'dead' flag is not used anywhere other than thread__exited().
> And it confused me that a dead thread might not be in the dead_threads
> tree (or list).  So I changed the name; no logical change is intended.

git show 236a3bbd5cb51

David


* Re: [PATCH 37/37] perf data: Implement 'split' subcommand
  2014-12-26 13:59   ` Jiri Olsa
@ 2014-12-27  5:21     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-27  5:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

Hi Jiri,

On Fri, Dec 26, 2014 at 10:59 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:33PM +0900, Namhyung Kim wrote:
>> The perf data split command is for splitting a (large) single data
>> file into multiple files under a directory (perf.data.dir by default)
>> so that it can be processed and reported using multiple threads.
>>
>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>> ---
>>  tools/perf/Documentation/perf-data.txt |  28 +++++
>>  tools/perf/builtin-data.c              | 223 +++++++++++++++++++++++++++++++++
>>  2 files changed, 251 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-data.txt b/tools/perf/Documentation/perf-data.txt
>> index b8c83947715c..42708702f10c 100644
>> --- a/tools/perf/Documentation/perf-data.txt
>> +++ b/tools/perf/Documentation/perf-data.txt
>> @@ -13,3 +13,31 @@ SYNOPSIS
>>  DESCRIPTION
>>  -----------
>>  Data file related processing.
>> +
>> +COMMANDS
>> +--------
>> +split::
>> +     Split single data file (perf.data) into multiple files under a directory
>> +     in order to be reported by multiple threads.
>> +
>> +OPTIONS for 'split'
>> +---------------------
>> +-i::
>> +--input::
>> +     Specify input perf data file path.
>> +
>> +-o::
>> +--output::
>> +     Specify output perf data directory path.
>
> should the -o have 'perf.data.dir' as default?
>
> [jolsa@krava perf]$ ./perf record ls > /dev/null
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.000 MB perf.data (~0 samples) ]
> [jolsa@krava perf]$ ./perf data split
> [jolsa@krava perf]$ ll perf.data*
> -rw------- 1 jolsa jolsa 16172 Dec 26 14:58 perf.data

You're right! :)

Thanks,
Namhyung

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2014-12-26 14:02 ` [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Jiri Olsa
@ 2014-12-27  5:23   ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-27  5:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Fri, Dec 26, 2014 at 11:02 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:14:56PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>>
>>
>> You can get it from 'perf/threaded-v1' branch on my tree at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>>
>> Please take a look and play with it.  Any comments are welcome! :)
>
> very nice at first round check ;-)
>
> I'll do detailed review next week

Thank you very much!
Namhyung

* Re: [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events
  2014-12-26 16:27   ` David Ahern
@ 2014-12-27  5:28     ` Namhyung Kim
  2014-12-29 12:58       ` Adrian Hunter
  0 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-27  5:28 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

Hi David,

On Sat, Dec 27, 2014 at 1:27 AM, David Ahern <dsahern@gmail.com> wrote:
> On 12/24/14 12:14 AM, Namhyung Kim wrote:
>>
>> Prepend a software dummy event into evlist to track task/comm/mmap
>> events separately.  This is a preparation of multi-file/thread support
>> which will come later.
>
>
> Are you making this the first event because of how perf internals are
> coded -- that the first event tracks tasks events? With the tracking bit in
> evsel you should not need to do that. Is there another reason?

Yeah, I know the tracking bit can be set to any evsel in the evlist.
But I'd like to keep it at a fixed index so that it can be easily
identified at later stages (like perf report) too.  Ideally, it'd be
great if we had a way to distinguish this auto-added dummy tracking
event from other (user-added) (dummy?) tracking events if any.

>
> ---8<---
>
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index cfbe2b99b9aa..72dff295237e 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -193,6 +193,44 @@ int perf_evlist__add_default(struct perf_evlist *evlist)
>>         return -ENOMEM;
>>   }
>>
>> +int perf_evlist__prepend_dummy(struct perf_evlist *evlist)
>> +{
>> +       struct perf_event_attr attr = {
>> +               .type = PERF_TYPE_SOFTWARE,
>> +               .config = PERF_COUNT_SW_DUMMY,
>> +       };
>> +       struct perf_evsel *evsel, *pos;
>> +
>> +       event_attr_init(&attr);
>> +
>> +       evsel = perf_evsel__new(&attr);
>> +       if (evsel == NULL)
>> +               goto error;
>> +
>> +       /* use strdup() because free(evsel) assumes name is allocated */
>> +       evsel->name = strdup("dummy");
>> +       if (!evsel->name)
>> +               goto error_free;
>> +
>> +       list_for_each_entry(pos, &evlist->entries, node) {
>> +               pos->idx += 1;
>> +               pos->tracking = false;
>> +       }
>> +
>> +       list_add(&evsel->node, &evlist->entries);
>> +       evsel->idx = 0;
>> +       evsel->tracking = true;
>
>
> perf_evlist__set_tracking_event()?

I found that after I wrote this, so yes, it can use the function
instead of the open-coded version.  But the loop traversal is needed anyway to
fix up the evsel->idx.

Thanks,
Namhyung


* Re: [PATCH 03/37] perf tools: Use perf_data_file__fd() consistently
  2014-12-26 16:30   ` David Ahern
@ 2014-12-27  5:30     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-27  5:30 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Sat, Dec 27, 2014 at 1:30 AM, David Ahern <dsahern@gmail.com> wrote:
> On 12/24/14 12:14 AM, Namhyung Kim wrote:
>
>> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
>> index 84df2deed988..d8b13407594d 100644
>> --- a/tools/perf/builtin-inject.c
>> +++ b/tools/perf/builtin-inject.c
>> @@ -375,8 +375,10 @@ static int __cmd_inject(struct perf_inject *inject)
>>                 }
>>         }
>>
>
> How about a local variable to make this more readable?
> fd = perf_data_file__fd(file_out)

Will do.


>
>> -       if (!file_out->is_pipe)
>> -               lseek(file_out->fd, session->header.data_offset, SEEK_SET);
>> +       if (!file_out->is_pipe) {
>> +               lseek(perf_data_file__fd(file_out), session->header.data_offset,
>> +                     SEEK_SET);
>> +       }
>>
>>         ret = perf_session__process_events(session, &inject->tool);
>>
>> @@ -385,7 +387,8 @@ static int __cmd_inject(struct perf_inject *inject)
>>                         perf_header__set_feat(&session->header,
>>                                               HEADER_BUILD_ID);
>>                 session->header.data_size = inject->bytes_written;
>> -               perf_session__write_header(session, session->evlist, file_out->fd, true);
>> +               perf_session__write_header(session, session->evlist,
>> +                                          perf_data_file__fd(file_out), true);
>>         }
>>
>>         return ret;
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index aa5fa6aabb31..054c6e57d3b9 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -196,7 +196,7 @@ static int process_buildids(struct record *rec)
>>         struct perf_session *session = rec->session;
>>         u64 start = session->header.data_offset;
>>
>> -       u64 size = lseek(file->fd, 0, SEEK_CUR);
>> +       u64 size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
>>         if (size == 0)
>>                 return 0;
>>
>> @@ -360,12 +360,12 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>                 perf_header__clear_feat(&session->header, HEADER_GROUP_DESC);
>
>
> Similarly in this function.

No problem.

Thanks,
Namhyung


* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-26 16:51   ` David Ahern
@ 2014-12-27  5:32     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-27  5:32 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Sat, Dec 27, 2014 at 1:51 AM, David Ahern <dsahern@gmail.com> wrote:
> On 12/24/14 12:15 AM, Namhyung Kim wrote:
>>
>> When multi file support is enabled, a dummy tracking event will be
>> used to track metadata (like task, comm and mmap events) for a session
>> and actual samples will be recorded in separate files.
>>
>> Provide separate mmap to the dummy tracking event.  The size is fixed
>> to 128KiB (+ 1 page) as the event rate will be lower than samples.  I
>
>
> Have you tried stress tests like a kernel compile with -j Ncpus (Ncpus = 16
> or higher)?

Yep, the test results in the cover letter were recorded from a kernel build
with -j 20 on my development machine.

Thanks,
Namhyung


* Re: [PATCH 11/37] perf tools: Introduce thread__comm_time() helpers
  2014-12-26 17:00   ` David Ahern
@ 2014-12-27  5:36     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-27  5:36 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Sat, Dec 27, 2014 at 2:00 AM, David Ahern <dsahern@gmail.com> wrote:
> On 12/24/14 12:15 AM, Namhyung Kim wrote:
>>
>> @@ -139,6 +161,16 @@ const char *thread__comm_str(const struct thread *thread)
>>         return comm__str(comm);
>>   }
>>
>> +const char *thread__comm_time_str(const struct thread *thread, u64 timestamp)
>> +{
>> +       const struct comm *comm = thread__comm_time(thread, timestamp);
>> +
>> +       if (!comm)
>> +               return NULL;
>> +
>> +       return comm__str(comm);
>> +}
>> +
>
>
> thread__comm_str_time()? time_str suggests a time-based string as opposed to
> a comm_str based on time.

Will change - my naming sense is always awful.  I'd be happy to hear
any naming suggestion. ;-)

Thanks,
Namhyung


* Re: [PATCH 14/37] perf tools: Convert dead thread list into rbtree
  2014-12-26 17:14       ` David Ahern
@ 2014-12-27  5:42         ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-27  5:42 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Olsa, Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Sat, Dec 27, 2014 at 2:14 AM, David Ahern <dsahern@gmail.com> wrote:
> On 12/25/14 7:26 PM, Namhyung Kim wrote:
>>>>
>>>> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
>>>> index 0b6dcd70bc8b..413f28cf689b 100644
>>>> --- a/tools/perf/util/thread.h
>>>> +++ b/tools/perf/util/thread.h
>>>> @@ -11,10 +11,8 @@
>>>>   struct thread_stack;
>>>>
>>>>   struct thread {
>>>> -     union {
>>>> -             struct rb_node   rb_node;
>>>> -             struct list_head node;
>>>> -     };
>>>> +     struct rb_node          rb_node;
>>>> +     struct list_head        node;
>>>>        struct map_groups       *mg;
>>>>        pid_t                   pid_; /* Not all tools update this */
>>>>        pid_t                   tid;
>>>> @@ -22,7 +20,8 @@ struct thread {
>>>>        int                     cpu;
>>>>        char                    shortname[3];
>>>>        bool                    comm_set;
>>>> -     bool                    dead; /* if set thread has exited */
>>>> +     bool                    exited; /* if set thread has exited */
>>>> +     bool                    dead; /* thread is in dead_threads list */
>>>
>>>
>>> looks like this also changes the logic (new exited flag),
>>> not just the dead threads storage wheel
>>
>>
>> AFAICS the 'dead' flag is not used anywhere other than thread__exited().
>> It confused me that a dead thread might not be in the dead_threads tree
>> (or list), so I changed the name; no logical change is intended.
>
>
> git show 236a3bbd5cb51

Thanks for the pointer.  I understand the need to delay the move to
the dead_threads list, but it still confuses me.  So I renamed the flag
to keep the current behavior and to tie the name 'dead' to the list/tree
management, to avoid further confusion.

Thanks,
Namhyung


* Re: [PATCH 14/37] perf tools: Convert dead thread list into rbtree
  2014-12-24  7:15 ` [PATCH 14/37] perf tools: Convert dead thread list into rbtree Namhyung Kim
  2014-12-25 23:05   ` Jiri Olsa
@ 2014-12-27 15:31   ` David Ahern
  2014-12-28 13:24     ` Namhyung Kim
  1 sibling, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-27 15:31 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/24/14 12:15 AM, Namhyung Kim wrote:
> @@ -106,8 +117,8 @@ void machine__delete_threads(struct machine *machine)
>   	while (nd) {
>   		struct thread *t = rb_entry(nd, struct thread, rb_node);
>
> -		rb_erase(&t->rb_node, &machine->threads);
>   		nd = rb_next(nd);
> +		rb_erase(&t->rb_node, &machine->threads);
>   		thread__delete(t);
>   	}
>   }

unrelated to dead threads. Bug fix? separate patch?


> @@ -1236,13 +1247,36 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
>
>   static void machine__remove_thread(struct machine *machine, struct thread *th)
>   {
> +	struct rb_node **p = &machine->dead_threads.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct thread *pos;
> +
>   	machine->last_match = NULL;
>   	rb_erase(&th->rb_node, &machine->threads);
> +
> +	th->dead = true;
> +
>   	/*
>   	 * We may have references to this thread, for instance in some hist_entry
> -	 * instances, so just move them to a separate list.
> +	 * instances, so just move them to a separate list in rbtree.
>   	 */
> -	list_add_tail(&th->node, &machine->dead_threads);
> +	while (*p != NULL) {
> +		parent = *p;
> +		pos = rb_entry(parent, struct thread, rb_node);
> +
> +		if (pos->tid == th->tid) {
> +			list_add_tail(&th->node, &pos->node);
> +			return;
> +		}

So you have a linked list for tid collisions (not mentioned in the git log).

> +
> +		if (th->tid < pos->tid)
> +			p = &(*p)->rb_left;
> +		else
> +			p = &(*p)->rb_right;
> +	}
> +
> +	rb_link_node(&th->rb_node, parent, p);
> +	rb_insert_color(&th->rb_node, &machine->dead_threads);
>   }
>
>   int machine__process_fork_event(struct machine *machine, union perf_event *event,
> @@ -1649,7 +1683,7 @@ int machine__for_each_thread(struct machine *machine,

---8<---

> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 0b6dcd70bc8b..413f28cf689b 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -11,10 +11,8 @@
>   struct thread_stack;
>
>   struct thread {
> -	union {
> -		struct rb_node	 rb_node;
> -		struct list_head node;
> -	};
> +	struct rb_node	 	rb_node;
> +	struct list_head 	node;
>   	struct map_groups	*mg;
>   	pid_t			pid_; /* Not all tools update this */
>   	pid_t			tid;

could use better names for rb_node and node. rb_node is the entry in the 
dead_threads tree - dead_node?; node is the linked list for tid 
collisions - tid_node?

David


* Re: [PATCH 15/37] perf tools: Introduce machine__find*_thread_time()
  2014-12-24  7:15 ` [PATCH 15/37] perf tools: Introduce machine__find*_thread_time() Namhyung Kim
@ 2014-12-27 16:33   ` David Ahern
  2014-12-28 14:50     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-27 16:33 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/24/14 12:15 AM, Namhyung Kim wrote:
> @@ -61,12 +61,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
>   __attribute__ ((noinline))
>   static int unwind_thread(struct thread *thread)
>   {
> -	struct perf_sample sample;
> +	struct perf_sample sample = {
> +		.time = -1ULL,
> +	};

1-liner like the others?



> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 582e011adc92..2cc088d71922 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -431,6 +431,103 @@ struct thread *machine__find_thread(struct machine *machine, pid_t pid,
>   	return __machine__findnew_thread(machine, pid, tid, false);
>   }
>
> +static void machine__remove_thread(struct machine *machine, struct thread *th);

Why is this declaration needed?

> +
> +static struct thread *__machine__findnew_thread_time(struct machine *machine,
> +						     pid_t pid, pid_t tid,
> +						     u64 timestamp, bool create)
> +{
> +	struct thread *curr, *pos, *new;
> +	struct thread *th = NULL;
> +	struct rb_node **p;
> +	struct rb_node *parent = NULL;
> +	bool initial = timestamp == (u64)0;
> +
> +	curr = __machine__findnew_thread(machine, pid, tid, initial);

What if create arg is false? Should initial arg also be false?

> +	if (curr && timestamp >= curr->start_time)
> +		return curr;
> +
> +	p = &machine->dead_threads.rb_node;
> +	while (*p != NULL) {
> +		parent = *p;
> +		th = rb_entry(parent, struct thread, rb_node);
> +
> +		if (th->tid == tid) {
> +			list_for_each_entry(pos, &th->node, node) {
> +				if (timestamp >= pos->start_time &&
> +				    pos->start_time > th->start_time) {
> +					th = pos;
> +					break;
> +				}
> +			}
> +
> +			if (timestamp >= th->start_time) {
> +				machine__update_thread_pid(machine, th, pid);
> +				return th;
> +			}
> +			break;
> +		}
> +
> +		if (tid < th->tid)
> +			p = &(*p)->rb_left;
> +		else
> +			p = &(*p)->rb_right;
> +	}

Can the dead_threads search be put in a separate function -- 
machine__find_dead_thread?

> +
> +	if (!create)
> +		return NULL;
> +
> +	if (!curr)
> +		return __machine__findnew_thread(machine, pid, tid, true);
> +
> +	new = thread__new(pid, tid);
> +	if (new == NULL)
> +		return NULL;
> +
> +	new->start_time = timestamp;
> +
> +	if (*p) {
> +		list_for_each_entry(pos, &th->node, node) {
> +			/* sort by time */
> +			if (timestamp >= pos->start_time) {
> +				th = pos;
> +				break;
> +			}
> +		}
> +		list_add_tail(&new->node, &th->node);
> +	} else {
> +		rb_link_node(&new->rb_node, parent, p);
> +		rb_insert_color(&new->rb_node, &machine->dead_threads);
> +	}

Why insert this unknown tid on the dead_threads list?

> +
> +	/*
> +	 * We have to initialize map_groups separately
> +	 * after rb tree is updated.
> +	 *
> +	 * The reason is that we call machine__findnew_thread
> +	 * within thread__init_map_groups to find the thread
> +	 * leader and that would screwed the rb tree.
> +	 */
> +	if (thread__init_map_groups(new, machine)) {
> +		thread__delete(new);
> +		return NULL;
> +	}
> +
> +	return new;
> +}

---8<---

> @@ -1265,6 +1362,16 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
>   		pos = rb_entry(parent, struct thread, rb_node);
>
>   		if (pos->tid == th->tid) {
> +			struct thread *old;
> +
> +			/* sort by time */
> +			list_for_each_entry(old, &pos->node, node) {
> +				if (th->start_time >= old->start_time) {
> +					pos = old;
> +					break;
> +				}
> +			}
> +

this change seems unrelated to the patch subject -- 
machine__find*_thread_time.

David


* Re: [PATCH 18/37] perf tools: Remove thread when map groups initialization failed
  2014-12-24  7:15 ` [PATCH 18/37] perf tools: Remove thread when map groups initialization failed Namhyung Kim
@ 2014-12-28  0:45   ` David Ahern
  2014-12-29  7:08     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: David Ahern @ 2014-12-28  0:45 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian,
	Adrian Hunter, Andi Kleen, Frederic Weisbecker

On 12/24/14 12:15 AM, Namhyung Kim wrote:
> Otherwise it'll break the machine->threads tree.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>   tools/perf/util/machine.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 031bace39fdc..beae6e8fe789 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -411,6 +411,7 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
>   		 * leader and that would screwed the rb tree.
>   		 */
>   		if (thread__init_map_groups(th, machine)) {
> +			rb_erase(&th->rb_node, &machine->threads);
>   			thread__delete(th);
>   			return NULL;
>   		}
>

Can you move the thread__init_map_groups() before the thread is added to 
the rbtree? If not, you need to delay setting 'machine->last_match = th';
otherwise it references a deleted thread.

David


* Re: [PATCH 14/37] perf tools: Convert dead thread list into rbtree
  2014-12-27 15:31   ` David Ahern
@ 2014-12-28 13:24     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-28 13:24 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

Hi David,

On Sun, Dec 28, 2014 at 12:31 AM, David Ahern <dsahern@gmail.com> wrote:
> On 12/24/14 12:15 AM, Namhyung Kim wrote:
>>
>> @@ -106,8 +117,8 @@ void machine__delete_threads(struct machine *machine)
>>         while (nd) {
>>                 struct thread *t = rb_entry(nd, struct thread, rb_node);
>>
>> -               rb_erase(&t->rb_node, &machine->threads);
>>                 nd = rb_next(nd);
>> +               rb_erase(&t->rb_node, &machine->threads);
>>                 thread__delete(t);
>>         }
>>   }
>
>
> unrelated to dead threads. Bug fix? separate patch?

Yes, I'll make it a separate bug-fix patch.
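The reordering in the hunk matters because, once a node has been erased, its own links no longer describe a valid position in the tree, so rb_next() has to be called before rb_erase(). A minimal sketch of the same advance-before-erase pattern, assuming a plain doubly linked list as a stand-in for the rbtree (the `node`, `erase()` and `delete_all()` names are hypothetical, not perf code):

```c
#include <stdlib.h>

/* Hypothetical doubly linked list standing in for the threads rbtree. */
struct node {
	int key;
	struct node *prev, *next;
};

/* Unlink n and clear its links, so n's own pointers must not be
 * followed after this call (the erased rbtree node is similarly unreliable). */
static void erase(struct node **head, struct node *n)
{
	if (n->prev)
		n->prev->next = n->next;
	else
		*head = n->next;
	if (n->next)
		n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* Delete every node, returning how many were freed. */
static int delete_all(struct node **head)
{
	int freed = 0;
	struct node *nd = *head;

	while (nd) {
		struct node *t = nd;

		nd = nd->next;		/* advance while t is still linked */
		erase(head, t);
		free(t);
		freed++;
	}
	return freed;
}
```

Swapping the first two statements in the loop body would read `t->next` after `erase()` has cleared it, so the loop would stop after one node and leak the rest. The kernel list API provides list_for_each_entry_safe() for the same reason.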


>
>
>> @@ -1236,13 +1247,36 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
>>
>>   static void machine__remove_thread(struct machine *machine, struct thread *th)
>>   {
>> +       struct rb_node **p = &machine->dead_threads.rb_node;
>> +       struct rb_node *parent = NULL;
>> +       struct thread *pos;
>> +
>>         machine->last_match = NULL;
>>         rb_erase(&th->rb_node, &machine->threads);
>> +
>> +       th->dead = true;
>> +
>>         /*
>>          * We may have references to this thread, for instance in some hist_entry
>> -        * instances, so just move them to a separate list.
>> +        * instances, so just move them to a separate list in rbtree.
>>          */
>> -       list_add_tail(&th->node, &machine->dead_threads);
>> +       while (*p != NULL) {
>> +               parent = *p;
>> +               pos = rb_entry(parent, struct thread, rb_node);
>> +
>> +               if (pos->tid == th->tid) {
>> +                       list_add_tail(&th->node, &pos->node);
>> +                       return;
>> +               }
>
>
> So you have a linked list for tid collisions (not mentioned in the git log).

Right, will add a description.


>
>> +
>> +               if (th->tid < pos->tid)
>> +                       p = &(*p)->rb_left;
>> +               else
>> +                       p = &(*p)->rb_right;
>> +       }
>> +
>> +       rb_link_node(&th->rb_node, parent, p);
>> +       rb_insert_color(&th->rb_node, &machine->dead_threads);
>>   }
>>
>>   int machine__process_fork_event(struct machine *machine, union perf_event *event,
>> @@ -1649,7 +1683,7 @@ int machine__for_each_thread(struct machine *machine,
>
>
> ---8<---
>
>> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
>> index 0b6dcd70bc8b..413f28cf689b 100644
>> --- a/tools/perf/util/thread.h
>> +++ b/tools/perf/util/thread.h
>> @@ -11,10 +11,8 @@
>>   struct thread_stack;
>>
>>   struct thread {
>> -       union {
>> -               struct rb_node   rb_node;
>> -               struct list_head node;
>> -       };
>> +       struct rb_node          rb_node;
>> +       struct list_head        node;
>>         struct map_groups       *mg;
>>         pid_t                   pid_; /* Not all tools update this */
>>         pid_t                   tid;
>
>
> could use better names for rb_node and node. rb_node is the entry in the
> dead_threads tree - dead_node?; node is the linked list for tid collisions -
> tid_node?

But rb_node serves three different purposes depending on the thread's
state: a thread can be in the (normal) threads tree, the dead threads
tree or the missing threads tree (to be introduced later).

Thanks,
Namhyung


* Re: [PATCH 15/37] perf tools: Introduce machine__find*_thread_time()
  2014-12-27 16:33   ` David Ahern
@ 2014-12-28 14:50     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-28 14:50 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Sun, Dec 28, 2014 at 1:33 AM, David Ahern <dsahern@gmail.com> wrote:
> On 12/24/14 12:15 AM, Namhyung Kim wrote:
>>
>> @@ -61,12 +61,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
>>   __attribute__ ((noinline))
>>   static int unwind_thread(struct thread *thread)
>>   {
>> -       struct perf_sample sample;
>> +       struct perf_sample sample = {
>> +               .time = -1ULL,
>> +       };
>
>
> 1-liner like the others?

Did you mean this?

        struct perf_sample sample = { .time = -1ULL, };

>
>
>
>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>> index 582e011adc92..2cc088d71922 100644
>> --- a/tools/perf/util/machine.c
>> +++ b/tools/perf/util/machine.c
>> @@ -431,6 +431,103 @@ struct thread *machine__find_thread(struct machine *machine, pid_t pid,
>>         return __machine__findnew_thread(machine, pid, tid, false);
>>   }
>>
>> +static void machine__remove_thread(struct machine *machine, struct thread *th);
>
>
> Why is this declaration needed?

It seems like a leftover from previous changes, will remove.


>
>> +
>> +static struct thread *__machine__findnew_thread_time(struct machine *machine,
>> +                                                    pid_t pid, pid_t tid,
>> +                                                    u64 timestamp, bool create)
>> +{
>> +       struct thread *curr, *pos, *new;
>> +       struct thread *th = NULL;
>> +       struct rb_node **p;
>> +       struct rb_node *parent = NULL;
>> +       bool initial = timestamp == (u64)0;
>> +
>> +       curr = __machine__findnew_thread(machine, pid, tid, initial);
>
>
> What if create arg is false? Should initial arg also be false?

Nope, the create arg adds the thread to the dead threads tree (or the
missing threads tree later), while the initial flag adds it to the
normal threads tree, as in the case of synthesized events.  This is
because I first used *_findnew_thread_time() for meta event
processing, but then I realized that using *_findnew_thread() makes
the processing much easier since meta events are processed
sequentially.  So *_findnew_thread_time() is now used only for
sample processing, and maybe we can simply pass 'false' instead of
'initial'.


>
>> +       if (curr && timestamp >= curr->start_time)
>> +               return curr;
>> +
>> +       p = &machine->dead_threads.rb_node;
>> +       while (*p != NULL) {
>> +               parent = *p;
>> +               th = rb_entry(parent, struct thread, rb_node);
>> +
>> +               if (th->tid == tid) {
>> +                       list_for_each_entry(pos, &th->node, node) {
>> +                               if (timestamp >= pos->start_time &&
>> +                                   pos->start_time > th->start_time) {
>> +                                       th = pos;
>> +                                       break;
>> +                               }
>> +                       }
>> +
>> +                       if (timestamp >= th->start_time) {
>> +                               machine__update_thread_pid(machine, th, pid);
>> +                               return th;
>> +                       }
>> +                       break;
>> +               }
>> +
>> +               if (tid < th->tid)
>> +                       p = &(*p)->rb_left;
>> +               else
>> +                       p = &(*p)->rb_right;
>> +       }
>
>
> Can the dead_threads search be put in a separate function --
> machine__find_dead_thread?

Okay, I can do it.


>
>> +
>> +       if (!create)
>> +               return NULL;
>> +
>> +       if (!curr)
>> +               return __machine__findnew_thread(machine, pid, tid, true);
>> +
>> +       new = thread__new(pid, tid);
>> +       if (new == NULL)
>> +               return NULL;
>> +
>> +       new->start_time = timestamp;
>> +
>> +       if (*p) {
>> +               list_for_each_entry(pos, &th->node, node) {
>> +                       /* sort by time */
>> +                       if (timestamp >= pos->start_time) {
>> +                               th = pos;
>> +                               break;
>> +                       }
>> +               }
>> +               list_add_tail(&new->node, &th->node);
>> +       } else {
>> +               rb_link_node(&new->rb_node, parent, p);
>> +               rb_insert_color(&new->rb_node, &machine->dead_threads);
>> +       }
>
>
> Why insert this unknown tid on the dead_threads list?

Well, the search will mostly succeed.  If it fails, either the thread
is newly created, so curr is NULL and it is added to the normal tree,
or (rarely) a fork event is missing and the sample belongs to an older
thread whose timestamp doesn't match any existing (current or dead)
thread.  In that case I add it to the dead threads tree.  But with
multi-thread support enabled, it'll be added to the new missing
threads tree, which is protected by a mutex.
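The lookup order described above can be summarized, in a hedged and simplified form, as a pure function over the outcomes of the quoted __machine__findnew_thread_time() hunk. The enum and `pick_thread_source()` are hypothetical; they model only the branch order, not the tree updates or map_groups initialization.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the lookup paths (not perf code). */
enum thread_source {
	FROM_CURRENT,	/* current thread was already alive at 'ts' */
	FROM_DEAD,	/* an exited thread with a matching start time */
	NOT_FOUND,	/* no match and the caller asked not to create */
	NEW_CURRENT,	/* no current thread: a newly created one */
	NEW_MISSING,	/* older thread whose fork event was probably lost */
};

static enum thread_source pick_thread_source(bool have_curr, uint64_t curr_start,
					     bool dead_match, uint64_t ts,
					     bool create)
{
	if (have_curr && ts >= curr_start)
		return FROM_CURRENT;
	if (dead_match)
		return FROM_DEAD;
	if (!create)
		return NOT_FOUND;
	if (!have_curr)
		return NEW_CURRENT;
	/* dead threads tree in this series, missing threads tree later */
	return NEW_MISSING;
}
```

The last branch is the rare case discussed above: a current thread exists but started after the sample, and no dead thread covers the timestamp.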


>
>> +
>> +       /*
>> +        * We have to initialize map_groups separately
>> +        * after rb tree is updated.
>> +        *
>> +        * The reason is that we call machine__findnew_thread
>> +        * within thread__init_map_groups to find the thread
>> +        * leader and that would screwed the rb tree.
>> +        */
>> +       if (thread__init_map_groups(new, machine)) {
>> +               thread__delete(new);
>> +               return NULL;
>> +       }
>> +
>> +       return new;
>> +}
>
>
> ---8<---
>
>> @@ -1265,6 +1362,16 @@ static void machine__remove_thread(struct machine *machine, struct thread *th)
>>                 pos = rb_entry(parent, struct thread, rb_node);
>>
>>                 if (pos->tid == th->tid) {
>> +                       struct thread *old;
>> +
>> +                       /* sort by time */
>> +                       list_for_each_entry(old, &pos->node, node) {
>> +                               if (th->start_time >= old->start_time) {
>> +                                       pos = old;
>> +                                       break;
>> +                               }
>> +                       }
>> +
>
>
> this change seems unrelated to the patch subject --
> machine__find*_thread_time.

It searches for a thread in the dead threads tree based on the timestamp,
so the list should be sorted in time order.  But yes, I agree that
this change should be moved to the dead thread conversion patch.
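A simplified, self-contained sketch of why time ordering matters for a timestamp-based lookup. It assumes a plain unbalanced binary search tree keyed by tid instead of the kernel rbtree, with one tree node per tid holding a same-tid chain sorted newest first. All type and function names (including `find_dead_thread()`, echoing the suggested machine__find_dead_thread() helper) are hypothetical, and the layout differs from the patch, where the first dead thread of a tid stays in the tree and later ones hang off its `node` list.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model; not perf's struct thread. */
struct thread {
	int tid;
	uint64_t start_time;
	struct thread *next;	/* same-tid chain, newest start_time first */
};

struct tid_node {
	int tid;
	struct thread *chain;
	struct tid_node *left, *right;
};

/* Insert a dead thread: one tree node per tid, colliding tids chained by time. */
static int dead_insert(struct tid_node **root, struct thread *th)
{
	struct tid_node **p = root;

	while (*p && (*p)->tid != th->tid)
		p = th->tid < (*p)->tid ? &(*p)->left : &(*p)->right;

	if (!*p) {
		*p = calloc(1, sizeof(**p));
		if (!*p)
			return -1;
		(*p)->tid = th->tid;
	}

	/* keep the chain sorted by start_time, newest first */
	struct thread **link = &(*p)->chain;
	while (*link && (*link)->start_time > th->start_time)
		link = &(*link)->next;
	th->next = *link;
	*link = th;
	return 0;
}

/* Return the thread with this tid alive at 'ts': newest start_time <= ts. */
static struct thread *find_dead_thread(struct tid_node *n, int tid, uint64_t ts)
{
	while (n && n->tid != tid)
		n = tid < n->tid ? n->left : n->right;
	if (!n)
		return NULL;

	for (struct thread *t = n->chain; t; t = t->next)
		if (ts >= t->start_time)
			return t;
	return NULL;
}
```

Because the chain is kept newest first, the first entry whose start time is not after the sample is the right one, so the lookup can stop early instead of scanning every same-tid thread.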

Thanks,
Namhyung


* Re: [PATCH 18/37] perf tools: Remove thread when map groups initialization failed
  2014-12-28  0:45   ` David Ahern
@ 2014-12-29  7:08     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-29  7:08 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Sat, Dec 27, 2014 at 05:45:56PM -0700, David Ahern wrote:
> On 12/24/14 12:15 AM, Namhyung Kim wrote:
> >Otherwise it'll break the machine->threads tree.
> >
> >Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> >---
> >  tools/perf/util/machine.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> >diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> >index 031bace39fdc..beae6e8fe789 100644
> >--- a/tools/perf/util/machine.c
> >+++ b/tools/perf/util/machine.c
> >@@ -411,6 +411,7 @@ static struct thread *__machine__findnew_thread(struct machine *machine,
> >  		 * leader and that would screwed the rb tree.
> >  		 */
> >  		if (thread__init_map_groups(th, machine)) {
> >+			rb_erase(&th->rb_node, &machine->threads);
> >  			thread__delete(th);
> >  			return NULL;
> >  		}
> >
> 
> Can you move the thread__init_map_groups() before the thread is added to the
> rbtree? If not, you need to delay setting 'machine->last_match = th';
> otherwise it references a deleted thread.

You're right - I'll move the setting after the thread__init_map_groups().
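A minimal sketch of the agreed ordering, assuming simplified stand-in types: the cached pointer is assigned only once initialization has succeeded, so the failure path can free the thread without leaving `last_match` pointing at freed memory. The `init_map_groups()` and `findnew_thread()` functions here are hypothetical; the tree insertion and removal are only indicated by comments.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not perf's real structures. */
struct thread {
	int tid;
};

struct machine {
	struct thread *last_match;	/* one-entry lookup cache */
};

/* Stand-in for thread__init_map_groups(); 'fail' simulates an error. */
static int init_map_groups(struct thread *th, int fail)
{
	(void)th;
	return fail ? -1 : 0;
}

static struct thread *findnew_thread(struct machine *machine, int tid, int fail)
{
	struct thread *th = calloc(1, sizeof(*th));

	if (th == NULL)
		return NULL;
	th->tid = tid;
	/* ... insert th into the threads tree here ... */

	if (init_map_groups(th, fail)) {
		/* ... rb_erase() th from the threads tree here ... */
		free(th);
		return NULL;	/* the cache never saw th, so nothing dangles */
	}

	machine->last_match = th;	/* publish only after init succeeded */
	return th;
}
```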

Thanks,
Namhyung


* Re: [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events
  2014-12-27  5:28     ` Namhyung Kim
@ 2014-12-29 12:58       ` Adrian Hunter
  2014-12-30  5:51         ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: Adrian Hunter @ 2014-12-29 12:58 UTC (permalink / raw)
  To: Namhyung Kim, David Ahern
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, Stephane Eranian, Andi Kleen, Frederic Weisbecker

On 27/12/14 07:28, Namhyung Kim wrote:
> Hi David,
> 
> On Sat, Dec 27, 2014 at 1:27 AM, David Ahern <dsahern@gmail.com> wrote:
>> On 12/24/14 12:14 AM, Namhyung Kim wrote:
>>>
>>> Prepend a software dummy event into evlist to track task/comm/mmap
>>> events separately.  This is a preparation of multi-file/thread support
>>> which will come later.
>>
>>
>> Are you making this the first event because of how perf internals are
>> coded -- that the first event tracks tasks events? With the tracking bit in
>> evsel you should not need to do that. Is there another reason?
> 
> Yeah, I know the tracking bit can be set to any evsel in the evlist.
> But I'd like to keep it at a fixed index so that it can be easily
> identified at later stages (like perf report) too.  Ideally, it'd be
> great if we have a way to distinguish this auto-added dummy tracking
> event from other (user-added) (dummy?) tracking events if any.
> 
>>
>> ---8<---
>>
>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>>> index cfbe2b99b9aa..72dff295237e 100644
>>> --- a/tools/perf/util/evlist.c
>>> +++ b/tools/perf/util/evlist.c
>>> @@ -193,6 +193,44 @@ int perf_evlist__add_default(struct perf_evlist
>>> *evlist)
>>>         return -ENOMEM;
>>>   }
>>>
>>> +int perf_evlist__prepend_dummy(struct perf_evlist *evlist)
>>> +{
>>> +       struct perf_event_attr attr = {
>>> +               .type = PERF_TYPE_SOFTWARE,
>>> +               .config = PERF_COUNT_SW_DUMMY,

Probably need .exclude_kernel = 1, here

>>> +       };
>>> +       struct perf_evsel *evsel, *pos;
>>> +
>>> +       event_attr_init(&attr);
>>> +
>>> +       evsel = perf_evsel__new(&attr);
>>> +       if (evsel == NULL)
>>> +               goto error;
>>> +
>>> +       /* use strdup() because free(evsel) assumes name is allocated */
>>> +       evsel->name = strdup("dummy");
>>> +       if (!evsel->name)
>>> +               goto error_free;
>>> +
>>> +       list_for_each_entry(pos, &evlist->entries, node) {
>>> +               pos->idx += 1;
>>> +               pos->tracking = false;
>>> +       }
>>> +
>>> +       list_add(&evsel->node, &evlist->entries);
>>> +       evsel->idx = 0;
>>> +       evsel->tracking = true;
>>
>>
>> perf_evlist__set_tracking_event()?
> 
> I found that after I wrote this, so yes, it can use the function
> instead of the open-coded version.  But the loop traversal is still needed
> to fix up evsel->idx.

perf_evlist__set_tracking_event() also ensures there is only one tracking
event so it is easy to identify. It is the only event with attr->mmap etc
set to 1. Then you can use perf_evlist__add().
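Here is a minimal sketch of the "single tracking event" property. It uses a hypothetical array of simplified event descriptors rather than perf's real `struct perf_evlist`/`struct perf_evsel`. Selecting one event as the tracking event clears the flag on all the others. The tracking event can then be found by scanning for the flag, wherever it sits in the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event descriptor. */
struct evsel {
	int idx;
	bool tracking;
};

/* Mark 'chosen' as the only tracking event in the list. */
static void set_tracking_event(struct evsel *list, size_t n,
			       struct evsel *chosen)
{
	for (size_t i = 0; i < n; i++)
		list[i].tracking = (&list[i] == chosen);
}

/* Find the tracking event wherever it sits; NULL if there is none. */
static struct evsel *find_tracking_event(struct evsel *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].tracking)
			return &list[i];
	return NULL;
}
```

With this invariant, position no longer matters for identifying the tracking event. Keeping it at a fixed index becomes a convenience rather than a requirement.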


* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-24  7:15 ` [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
                     ` (2 preceding siblings ...)
  2014-12-26 16:51   ` David Ahern
@ 2014-12-29 13:44   ` Adrian Hunter
  2014-12-30  5:57     ` Namhyung Kim
  3 siblings, 1 reply; 91+ messages in thread
From: Adrian Hunter @ 2014-12-29 13:44 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Jiri Olsa, LKML, David Ahern,
	Stephane Eranian, Andi Kleen, Frederic Weisbecker

On 24/12/14 09:15, Namhyung Kim wrote:
> When multi file support is enabled, a dummy tracking event will be
> used to track metadata (like task, comm and mmap events) for a session
> and actual samples will be recorded in separate files.
> 
> Provide separate mmap to the dummy tracking event.  The size is fixed
> to 128KiB (+ 1 page) as the event rate will be lower than the sample rate.  I
> originally wanted to use a single mmap for this but cross-cpu sharing
> is prohibited so it's per-cpu (or per-task) like normal mmaps.
> 
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-record.c |   9 +++-
>  tools/perf/util/evlist.c    | 104 +++++++++++++++++++++++++++++++++++---------
>  tools/perf/util/evlist.h    |  11 ++++-
>  3 files changed, 102 insertions(+), 22 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 054c6e57d3b9..129fab35fdc5 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -69,7 +69,7 @@ static int process_synthesized_event(struct perf_tool *tool,
>  
>  static int record__mmap_read(struct record *rec, int idx)
>  {
> -	struct perf_mmap *md = &rec->evlist->mmap[idx];
> +	struct perf_mmap *md = perf_evlist__mmap_desc(rec->evlist, idx);
>  	unsigned int head = perf_mmap__read_head(md);
>  	unsigned int old = md->prev;
>  	unsigned char *data = md->base + page_size;
> @@ -105,6 +105,7 @@ static int record__mmap_read(struct record *rec, int idx)
>  	}
>  
>  	md->prev = old;
> +
>  	perf_evlist__mmap_consume(rec->evlist, idx);
>  out:
>  	return rc;
> @@ -263,6 +264,12 @@ static int record__mmap_read_all(struct record *rec)
>  				goto out;
>  			}
>  		}
> +		if (rec->evlist->track_mmap) {
> +			if (record__mmap_read(rec, track_mmap_idx(i)) != 0) {
> +				rc = -1;
> +				goto out;
> +			}
> +		}
>  	}
>  
>  	/*
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 72dff295237e..d99343b988fe 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -27,6 +27,7 @@
>  
>  static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
>  static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx);
> +static void __perf_evlist__munmap_track(struct perf_evlist *evlist, int idx);
>  
>  #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
>  #define SID(e, x, y) xyarray__entry(e->sample_id, x, y)
> @@ -735,22 +736,39 @@ static bool perf_mmap__empty(struct perf_mmap *md)
>  	return perf_mmap__read_head(md) != md->prev;
>  }
>  
> +struct perf_mmap *perf_evlist__mmap_desc(struct perf_evlist *evlist, int idx)
> +{
> +	if (idx >= 0)
> +		return &evlist->mmap[idx];
> +	else
> +		return &evlist->track_mmap[track_mmap_idx(idx)];
> +}
> +
>  static void perf_evlist__mmap_get(struct perf_evlist *evlist, int idx)
>  {
> -	++evlist->mmap[idx].refcnt;
> +	struct perf_mmap *md = perf_evlist__mmap_desc(evlist, idx);
> +
> +	++md->refcnt;
>  }
>  
>  static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
>  {
> -	BUG_ON(evlist->mmap[idx].refcnt == 0);
> +	struct perf_mmap *md = perf_evlist__mmap_desc(evlist, idx);
> +
> +	BUG_ON(md->refcnt == 0);
> +
> +	if (--md->refcnt != 0)
> +		return;
>  
> -	if (--evlist->mmap[idx].refcnt == 0)
> +	if (idx >= 0)
>  		__perf_evlist__munmap(evlist, idx);
> +	else
> +		__perf_evlist__munmap_track(evlist, track_mmap_idx(idx));
>  }
>  
>  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
>  {
> -	struct perf_mmap *md = &evlist->mmap[idx];
> +	struct perf_mmap *md = perf_evlist__mmap_desc(evlist, idx);
>  
>  	if (!evlist->overwrite) {
>  		unsigned int old = md->prev;
> @@ -771,6 +789,15 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
>  	}
>  }
>  
> +static void __perf_evlist__munmap_track(struct perf_evlist *evlist, int idx)
> +{
> +	if (evlist->track_mmap[idx].base != NULL) {
> +		munmap(evlist->track_mmap[idx].base, TRACK_MMAP_SIZE);
> +		evlist->track_mmap[idx].base = NULL;
> +		evlist->track_mmap[idx].refcnt = 0;
> +	}
> +}
> +
>  void perf_evlist__munmap(struct perf_evlist *evlist)
>  {
>  	int i;
> @@ -782,23 +809,43 @@ void perf_evlist__munmap(struct perf_evlist *evlist)
>  		__perf_evlist__munmap(evlist, i);
>  
>  	zfree(&evlist->mmap);
> +
> +	if (evlist->track_mmap == NULL)
> +		return;
> +
> +	for (i = 0; i < evlist->nr_mmaps; i++)
> +		__perf_evlist__munmap_track(evlist, i);
> +
> +	zfree(&evlist->track_mmap);
>  }
>  
> -static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
> +static int perf_evlist__alloc_mmap(struct perf_evlist *evlist, bool track_mmap)
>  {
>  	evlist->nr_mmaps = cpu_map__nr(evlist->cpus);
>  	if (cpu_map__empty(evlist->cpus))
>  		evlist->nr_mmaps = thread_map__nr(evlist->threads);
>  	evlist->mmap = zalloc(evlist->nr_mmaps * sizeof(struct perf_mmap));
> -	return evlist->mmap != NULL ? 0 : -ENOMEM;
> +	if (evlist->mmap == NULL)
> +		return -ENOMEM;
> +
> +	if (track_mmap) {
> +		evlist->track_mmap = calloc(evlist->nr_mmaps,
> +					    sizeof(struct perf_mmap));
> +		if (evlist->track_mmap == NULL) {
> +			zfree(&evlist->mmap);
> +			return -ENOMEM;
> +		}
> +	}
> +	return 0;
>  }
>  
>  struct mmap_params {
> -	int prot;
> -	int mask;
> +	int	prot;
> +	size_t	len;
>  };
>  
> -static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
> +static int __perf_evlist__mmap(struct perf_evlist *evlist __maybe_unused,
> +			       struct perf_mmap *pmmap,
>  			       struct mmap_params *mp, int fd)
>  {
>  	/*
> @@ -814,15 +861,14 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
>  	 * evlist layer can't just drop it when filtering events in
>  	 * perf_evlist__filter_pollfd().
>  	 */
> -	evlist->mmap[idx].refcnt = 2;
> -	evlist->mmap[idx].prev = 0;
> -	evlist->mmap[idx].mask = mp->mask;
> -	evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, mp->prot,
> -				      MAP_SHARED, fd, 0);
> -	if (evlist->mmap[idx].base == MAP_FAILED) {
> +	pmmap->refcnt = 2;
> +	pmmap->prev = 0;
> +	pmmap->mask = mp->len - page_size - 1;
> +	pmmap->base = mmap(NULL, mp->len, mp->prot, MAP_SHARED, fd, 0);
> +	if (pmmap->base == MAP_FAILED) {
>  		pr_debug2("failed to mmap perf event ring buffer, error %d\n",
>  			  errno);
> -		evlist->mmap[idx].base = NULL;
> +		pmmap->base = NULL;
>  		return -1;
>  	}
>  
> @@ -843,9 +889,22 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
>  
>  		fd = FD(evsel, cpu, thread);
>  
> -		if (*output == -1) {
> +		if (perf_evsel__is_dummy_tracking(evsel)) {
> +			struct mmap_params track_mp = {
> +				.prot	= mp->prot,
> +				.len	= TRACK_MMAP_SIZE,
> +			};
> +
> +			if (__perf_evlist__mmap(evlist, &evlist->track_mmap[idx],
> +						&track_mp, fd) < 0)
> +				return -1;
> +
> +			/* mark idx as track mmap idx (negative) */
> +			idx = track_mmap_idx(idx);

Do you not still need to do SET_OUTPUT when there are multiple cpus and
multiple pids?


> +		} else if (*output == -1) {
>  			*output = fd;
> -			if (__perf_evlist__mmap(evlist, idx, mp, *output) < 0)
> +			if (__perf_evlist__mmap(evlist, &evlist->mmap[idx],
> +						mp, *output) < 0)
>  				return -1;
>  		} else {
>  			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
> @@ -874,6 +933,11 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
>  			perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
>  						 thread);
>  		}
> +
> +		if (mp->track && perf_evsel__is_dummy_tracking(evsel)) {
> +			/* restore idx as normal idx (positive) */
> +			idx = track_mmap_idx(idx);
> +		}
>  	}
>  
>  	return 0;
> @@ -1025,7 +1089,7 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
>  		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
>  	};
>  
> -	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
> +	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist, use_track_mmap) < 0)
>  		return -ENOMEM;
>  
>  	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
> @@ -1034,7 +1098,7 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
>  	evlist->overwrite = overwrite;
>  	evlist->mmap_len = perf_evlist__mmap_size(pages);
>  	pr_debug("mmap size %zuB\n", evlist->mmap_len);
> -	mp.mask = evlist->mmap_len - page_size - 1;
> +	mp.len = evlist->mmap_len;
>  
>  	evlist__for_each(evlist, evsel) {
>  		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index b974bddf6b8b..b7f54b8577f7 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -48,11 +48,14 @@ struct perf_evlist {
>  	bool		 overwrite;
>  	struct fdarray	 pollfd;
>  	struct perf_mmap *mmap;
> +	struct perf_mmap *track_mmap;
>  	struct thread_map *threads;
>  	struct cpu_map	  *cpus;
>  	struct perf_evsel *selected;
>  };
>  
> +#define TRACK_MMAP_SIZE  (((128 * 1024 / page_size) + 1) * page_size)
> +
>  struct perf_evsel_str_handler {
>  	const char *name;
>  	void	   *handler;
> @@ -100,8 +103,8 @@ struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
>  struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
>  
>  union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
> -
>  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
> +struct perf_mmap *perf_evlist__mmap_desc(struct perf_evlist *evlist, int idx);
>  
>  int perf_evlist__open(struct perf_evlist *evlist);
>  void perf_evlist__close(struct perf_evlist *evlist);
> @@ -211,6 +214,12 @@ bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
>  void perf_evlist__to_front(struct perf_evlist *evlist,
>  			   struct perf_evsel *move_evsel);
>  
> +/* convert from/to negative idx for track mmaps */
> +static inline int track_mmap_idx(int idx)
> +{
> +	return -idx - 1;
> +}
> +
>  /**
>   * __evlist__for_each - iterate thru all the evsels
>   * @list: list_head instance to iterate
> 
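The index encoding and buffer sizes used in this patch can be checked in isolation. This small sketch assumes `page_size` is passed in explicitly instead of being perf's global. The helper names `track_mmap_size()` and `ring_mask()` are hypothetical. `track_mmap_idx()` is copied from the patch.

```c
#include <stddef.h>

/* Same encoding as the patch: 0,1,2,... <-> -1,-2,-3,...; self-inverse. */
static inline int track_mmap_idx(int idx)
{
	return -idx - 1;
}

/* TRACK_MMAP_SIZE from the patch: 128KiB of data plus one control page. */
static size_t track_mmap_size(size_t page_size)
{
	return ((128 * 1024 / page_size) + 1) * page_size;
}

/* Mask as computed in __perf_evlist__mmap(): data area size minus one. */
static size_t ring_mask(size_t len, size_t page_size)
{
	return len - page_size - 1;
}
```

With 4KiB pages, TRACK_MMAP_SIZE is 33 pages (135168 bytes). The resulting mask of 0x1ffff works because the data area is a power of two. Because track_mmap_idx() is its own inverse, applying it twice restores the original positive index. The restore step at the end of the per-evsel loop relies on this.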


* Re: [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events
  2014-12-29 12:58       ` Adrian Hunter
@ 2014-12-30  5:51         ` Namhyung Kim
  2014-12-30  9:04           ` Adrian Hunter
  0 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2014-12-30  5:51 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: David Ahern, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian, Andi Kleen,
	Frederic Weisbecker

Hi Adrian,

On Mon, Dec 29, 2014 at 02:58:12PM +0200, Adrian Hunter wrote:
> On 27/12/14 07:28, Namhyung Kim wrote:
> > Hi David,
> > 
> > On Sat, Dec 27, 2014 at 1:27 AM, David Ahern <dsahern@gmail.com> wrote:
> >> On 12/24/14 12:14 AM, Namhyung Kim wrote:
> >>>
> >>> Prepend a software dummy event into evlist to track task/comm/mmap
> >>> events separately.  This is preparation for multi-file/thread support
> >>> which will come later.
> >>
> >>
> >> Are you making this the first event because of how perf internals are
> >> coded -- that the first event tracks tasks events? With the tracking bit in
> >> evsel you should not need to do that. Is there another reason?
> > 
> > Yeah, I know the tracking bit can be set to any evsel in the evlist.
> > But I'd like to keep it at a fixed index so that it can be easily
> > identified at later stages (like perf report) too.  Ideally, it'd be
> > great if we have a way to distinguish this auto-added dummy tracking
> > event from other (user-added) (dummy?) tracking events if any.
> > 
> >>
> >> ---8<---
> >>
> >>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> >>> index cfbe2b99b9aa..72dff295237e 100644
> >>> --- a/tools/perf/util/evlist.c
> >>> +++ b/tools/perf/util/evlist.c
> >>> @@ -193,6 +193,44 @@ int perf_evlist__add_default(struct perf_evlist
> >>> *evlist)
> >>>         return -ENOMEM;
> >>>   }
> >>>
> >>> +int perf_evlist__prepend_dummy(struct perf_evlist *evlist)
> >>> +{
> >>> +       struct perf_event_attr attr = {
> >>> +               .type = PERF_TYPE_SOFTWARE,
> >>> +               .config = PERF_COUNT_SW_DUMMY,
> 
> Probably need .exclude_kernel = 1, here

Ah, right.

> 
> >>> +       };
> >>> +       struct perf_evsel *evsel, *pos;
> >>> +
> >>> +       event_attr_init(&attr);
> >>> +
> >>> +       evsel = perf_evsel__new(&attr);
> >>> +       if (evsel == NULL)
> >>> +               goto error;
> >>> +
> >>> +       /* use strdup() because free(evsel) assumes name is allocated */
> >>> +       evsel->name = strdup("dummy");
> >>> +       if (!evsel->name)
> >>> +               goto error_free;
> >>> +
> >>> +       list_for_each_entry(pos, &evlist->entries, node) {
> >>> +               pos->idx += 1;
> >>> +               pos->tracking = false;
> >>> +       }
> >>> +
> >>> +       list_add(&evsel->node, &evlist->entries);
> >>> +       evsel->idx = 0;
> >>> +       evsel->tracking = true;
> >>
> >>
> >> perf_evlist__set_tracking_event()?
> > 
> > I found that after I wrote this, so yes, it can use the function
> > instead of the open-coded version.  But the loop traversal is still needed
> > to fix up evsel->idx.
> 
> perf_evlist__set_tracking_event() also ensures there is only one tracking
> event so it is easy to identify. It is the only event with attr->mmap etc
> set to 1. Then you can use perf_evlist__add().

Well, yes, I think we can put the dummy tracking event anywhere in the
evlist with this, but I still slightly prefer putting it at a fixed
location for a possible code simplification.

Thanks,
Namhyung

* Re: [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event
  2014-12-29 13:44   ` Adrian Hunter
@ 2014-12-30  5:57     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-30  5:57 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Stephane Eranian, Andi Kleen,
	Frederic Weisbecker

On Mon, Dec 29, 2014 at 03:44:21PM +0200, Adrian Hunter wrote:
> > @@ -843,9 +889,22 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
> >  
> >  		fd = FD(evsel, cpu, thread);
> >  
> > -		if (*output == -1) {
> > +		if (perf_evsel__is_dummy_tracking(evsel)) {
> > +			struct mmap_params track_mp = {
> > +				.prot	= mp->prot,
> > +				.len	= TRACK_MMAP_SIZE,
> > +			};
> > +
> > +			if (__perf_evlist__mmap(evlist, &evlist->track_mmap[idx],
> > +						&track_mp, fd) < 0)
> > +				return -1;
> > +
> > +			/* mark idx as track mmap idx (negative) */
> > +			idx = track_mmap_idx(idx);
> 
> Do you not still need to do SET_OUTPUT when there are multiple cpus and
> multiple pids?

You're right.  I just considered simple cases, will fix.
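For reference, the existing `*output` handling in perf_evlist__mmap_per_evsel() follows a simple pattern. Within one mmap index, the first fd gets the ring buffer, and every later fd is redirected to it with the PERF_EVENT_IOC_SET_OUTPUT ioctl. The sketch below models only that bookkeeping. It uses hypothetical names and a fixed-size fd table, and makes no real perf_event_open()/mmap()/ioctl() calls.

```c
#define PLAN_CPUS	2
#define PLAN_THREADS	3

/*
 * For each cpu (one mmap index), decide which fd owns the ring buffer
 * and which fds would be redirected to it via PERF_EVENT_IOC_SET_OUTPUT.
 * redirect[cpu][thread] is -1 for the fd that is mmapped itself.
 */
static void plan_outputs(int fds[PLAN_CPUS][PLAN_THREADS],
			 int mmap_fd[PLAN_CPUS],
			 int redirect[PLAN_CPUS][PLAN_THREADS])
{
	for (int cpu = 0; cpu < PLAN_CPUS; cpu++) {
		int output = -1;	/* reset per mmap index */

		for (int thread = 0; thread < PLAN_THREADS; thread++) {
			int fd = fds[cpu][thread];

			if (output == -1) {
				output = fd;		/* would be mmap()ed */
				mmap_fd[cpu] = fd;
				redirect[cpu][thread] = -1;
			} else {
				/* would be ioctl(fd, SET_OUTPUT, output) */
				redirect[cpu][thread] = output;
			}
		}
	}
}
```

Presumably the dummy tracking events need the same treatment, with a second, independent output fd per mmap index. Their records would then land in track_mmap[idx] rather than in the sample buffer.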

Thanks,
Namhyung


> 
> 
> > +		} else if (*output == -1) {
> >  			*output = fd;
> > -			if (__perf_evlist__mmap(evlist, idx, mp, *output) < 0)
> > +			if (__perf_evlist__mmap(evlist, &evlist->mmap[idx],
> > +						mp, *output) < 0)
> >  				return -1;
> >  		} else {
> >  			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
> > @@ -874,6 +933,11 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
> >  			perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
> >  						 thread);
> >  		}
> > +
> > +		if (mp->track && perf_evsel__is_dummy_tracking(evsel)) {
> > +			/* restore idx as normal idx (positive) */
> > +			idx = track_mmap_idx(idx);
> > +		}
> >  	}
> >  
> >  	return 0;

* Re: [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events
  2014-12-30  5:51         ` Namhyung Kim
@ 2014-12-30  9:04           ` Adrian Hunter
  0 siblings, 0 replies; 91+ messages in thread
From: Adrian Hunter @ 2014-12-30  9:04 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: David Ahern, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, Jiri Olsa, LKML, Stephane Eranian, Andi Kleen,
	Frederic Weisbecker

On 30/12/14 07:51, Namhyung Kim wrote:
> Hi Adrian,
> 
> On Mon, Dec 29, 2014 at 02:58:12PM +0200, Adrian Hunter wrote:
>> On 27/12/14 07:28, Namhyung Kim wrote:
>>> Hi David,
>>>
>>> On Sat, Dec 27, 2014 at 1:27 AM, David Ahern <dsahern@gmail.com> wrote:
>>>> On 12/24/14 12:14 AM, Namhyung Kim wrote:
>>>>>
>>>>> Prepend a software dummy event into evlist to track task/comm/mmap
>>>>> events separately.  This is preparation for multi-file/thread support
>>>>> which will come later.
>>>>
>>>>
>>>> Are you making this the first event because of how perf internals are
>>>> coded -- that the first event tracks tasks events? With the tracking bit in
>>>> evsel you should not need to do that. Is there another reason?
>>>
>>> Yeah, I know the tracking bit can be set to any evsel in the evlist.
>>> But I'd like to keep it at a fixed index so that it can be easily
>>> identified at later stages (like perf report) too.  Ideally, it'd be
>>> great if we have a way to distinguish this auto-added dummy tracking
>>> event from other (user-added) (dummy?) tracking events if any.
>>>
>>>>
>>>> ---8<---
>>>>
>>>>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>>>>> index cfbe2b99b9aa..72dff295237e 100644
>>>>> --- a/tools/perf/util/evlist.c
>>>>> +++ b/tools/perf/util/evlist.c
>>>>> @@ -193,6 +193,44 @@ int perf_evlist__add_default(struct perf_evlist
>>>>> *evlist)
>>>>>         return -ENOMEM;
>>>>>   }
>>>>>
>>>>> +int perf_evlist__prepend_dummy(struct perf_evlist *evlist)
>>>>> +{
>>>>> +       struct perf_event_attr attr = {
>>>>> +               .type = PERF_TYPE_SOFTWARE,
>>>>> +               .config = PERF_COUNT_SW_DUMMY,
>>
>> Probably need .exclude_kernel = 1, here
> 
> Ah, right.
> 
>>
>>>>> +       };
>>>>> +       struct perf_evsel *evsel, *pos;
>>>>> +
>>>>> +       event_attr_init(&attr);
>>>>> +
>>>>> +       evsel = perf_evsel__new(&attr);
>>>>> +       if (evsel == NULL)
>>>>> +               goto error;
>>>>> +
>>>>> +       /* use strdup() because free(evsel) assumes name is allocated */
>>>>> +       evsel->name = strdup("dummy");
>>>>> +       if (!evsel->name)
>>>>> +               goto error_free;
>>>>> +
>>>>> +       list_for_each_entry(pos, &evlist->entries, node) {
>>>>> +               pos->idx += 1;
>>>>> +               pos->tracking = false;
>>>>> +       }
>>>>> +
>>>>> +       list_add(&evsel->node, &evlist->entries);
>>>>> +       evsel->idx = 0;
>>>>> +       evsel->tracking = true;
>>>>
>>>>
>>>> perf_evlist__set_tracking_event()?
>>>
>>> I found that after I wrote this, so yes, it can use the function
>>> instead of the open-coded version.  But the loop traversal is still needed
>>> to fix up evsel->idx.
>>
>> perf_evlist__set_tracking_event() also ensures there is only one tracking
>> event so it is easy to identify. It is the only event with attr->mmap etc
>> set to 1. Then you can use perf_evlist__add().
> 
> Well, yes, I think we can put the dummy tracking event anywhere in the
> evlist with this, but I still slightly prefer putting it at a fixed
> location for a possible code simplification.

Probably you should not be checking for the "tracking event" at all on the
processing side and instead just have perf report ignore all dummy events.
That generalizes things even more.

What is the code simplification?
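This sketch shows what ignoring dummy events on the processing side could look like. It uses only the UAPI `struct perf_event_attr` from <linux/perf_event.h>, and the helper names are hypothetical. `init_dummy_tracking_attr()` also applies the `.exclude_kernel = 1` suggestion made earlier in the thread.

```c
#include <stdbool.h>
#include <string.h>
#include <linux/perf_event.h>

/* True for the software "dummy" event, which never produces samples. */
static bool is_dummy_event(const struct perf_event_attr *attr)
{
	return attr->type == PERF_TYPE_SOFTWARE &&
	       attr->config == PERF_COUNT_SW_DUMMY;
}

/* Hypothetical helper: a dummy event carrying only side-band records. */
static void init_dummy_tracking_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_DUMMY;
	attr->exclude_kernel = 1;
	attr->task = 1;		/* fork/exit */
	attr->comm = 1;
	attr->mmap = 1;
}
```

A report tool could then skip any evsel for which is_dummy_event() returns true when building hists. This holds regardless of where the event sits in the evlist.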


* Re: [PATCH 01/37] perf tools: Set attr.task bit for a tracking event
  2014-12-24  7:14 ` [PATCH 01/37] perf tools: Set attr.task bit for a tracking event Namhyung Kim
@ 2014-12-31 11:25   ` Jiri Olsa
  0 siblings, 0 replies; 91+ messages in thread
From: Jiri Olsa @ 2014-12-31 11:25 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:14:57PM +0900, Namhyung Kim wrote:
> The perf_event_attr.task bit is to track task (fork and exit) events
> but perf_evsel__config() did not set it.  While this was not a
> problem in practice, since setting the other bits (comm/mmap) had the
> same effect, it is better to set it explicitly anyway.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Jiri Olsa <jolsa@kernel.org>

> ---
>  tools/perf/util/evsel.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 1e90c8557ede..e17d2b1624bc 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -709,6 +709,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
>  	if (opts->sample_weight)
>  		perf_evsel__set_sample_bit(evsel, WEIGHT);
>  
> +	attr->task  = track;
>  	attr->mmap  = track;
>  	attr->mmap2 = track && !perf_missing_features.mmap2;
>  	attr->comm  = track;
> -- 
> 2.1.3
> 
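The patched assignments can be read as the following sketch, written against the UAPI `struct perf_event_attr`. `set_tracking_bits()` is a hypothetical helper, and its `have_mmap2` parameter stands in for `!perf_missing_features.mmap2`. The `task` bit requests PERF_RECORD_FORK/PERF_RECORD_EXIT records. The `comm` and `mmap`/`mmap2` bits request comm and mmap records.

```c
#include <stdbool.h>
#include <linux/perf_event.h>

/* Mirrors the tracking-bit assignments in perf_evsel__config(). */
static void set_tracking_bits(struct perf_event_attr *attr, bool track,
			      bool have_mmap2)
{
	attr->task  = track;			/* fork/exit events */
	attr->mmap  = track;
	attr->mmap2 = track && have_mmap2;	/* only if kernel supports it */
	attr->comm  = track;
}
```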

* Re: [PATCH 04/37] perf tools: Add multi file interface to perf_data_file
  2014-12-24  7:15 ` [PATCH 04/37] perf tools: Add multi file interface to perf_data_file Namhyung Kim
  2014-12-25 22:08   ` Jiri Olsa
@ 2014-12-31 11:26   ` Jiri Olsa
  2014-12-31 14:55     ` Namhyung Kim
  1 sibling, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-31 11:26 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:00PM +0900, Namhyung Kim wrote:

SNIP

> +
>  static int open_file_read(struct perf_data_file *file)
>  {
>  	struct stat st;
> +	char path[PATH_MAX];
>  	int fd;
>  	char sbuf[STRERR_BUFSIZE];
>  
> -	fd = open(file->path, O_RDONLY);
> +	strcpy(path, file->path);

strncpy ?

jirka
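Note that strncpy() does not NUL-terminate the destination when the source fills the buffer. A bounded copy therefore usually needs an explicit terminator or a truncation check. The sketch below shows one alternative using snprintf(). The helper name `copy_path()` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy 'src' into 'dst' (of 'size' bytes). snprintf() always
 * NUL-terminates when size > 0; return -1 if the path was truncated.
 */
static int copy_path(char *dst, size_t size, const char *src)
{
	int n = snprintf(dst, size, "%s", src);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

In open_file_read() this could be used as `copy_path(path, sizeof(path), file->path)`, with the open failing when the path does not fit.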

* Re: [PATCH 07/37] perf tools: Do not use __perf_session__process_events() directly
  2014-12-24  7:15 ` [PATCH 07/37] perf tools: Do not use __perf_session__process_events() directly Namhyung Kim
@ 2014-12-31 11:33   ` Jiri Olsa
  0 siblings, 0 replies; 91+ messages in thread
From: Jiri Olsa @ 2014-12-31 11:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:03PM +0900, Namhyung Kim wrote:
> It's only used by perf record to process build-ids, because the file
> size is not yet fixed at that point due to the remaining header features.
> However, the data offset and size are available, so we can use
> perf_session__process_events() once we set the file size to the
> current offset.
> 
> With that, perf_session__process_events() is the only remaining caller,
> so the function can be made static again and multi-file support can be
> added in a single place.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Jiri Olsa <jolsa@kernel.org>

> ---
>  tools/perf/builtin-record.c | 7 +++----
>  tools/perf/util/session.c   | 6 +++---
>  tools/perf/util/session.h   | 3 ---
>  3 files changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 8c91f25b81f6..4f97657f14e7 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -196,12 +196,13 @@ static int process_buildids(struct record *rec)
>  {
>  	struct perf_data_file *file  = &rec->file;
>  	struct perf_session *session = rec->session;
> -	u64 start = session->header.data_offset;
>  
>  	u64 size = lseek(perf_data_file__fd(file), 0, SEEK_CUR);
>  	if (size == 0)
>  		return 0;
>  
> +	file->size = size;
> +
>  	/*
>  	 * During this process, it'll load kernel map and replace the
>  	 * dso->long_name to a real pathname it found.  In this case
> @@ -213,9 +214,7 @@ static int process_buildids(struct record *rec)
>  	 */
>  	symbol_conf.ignore_vmlinux_buildid = true;
>  
> -	return __perf_session__process_events(session, start,
> -					      size - start,
> -					      size, &build_id__mark_dso_hit_ops);
> +	return perf_session__process_events(session, &build_id__mark_dso_hit_ops);
>  }
>  
>  static void perf_event__synthesize_guest_os(struct machine *machine, void *data)
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 6ac62ae6b8fa..88aa2f09df93 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1252,9 +1252,9 @@ fetch_mmaped_event(struct perf_session *session,
>  #define NUM_MMAPS 128
>  #endif
>  
> -int __perf_session__process_events(struct perf_session *session,
> -				   u64 data_offset, u64 data_size,
> -				   u64 file_size, struct perf_tool *tool)
> +static int __perf_session__process_events(struct perf_session *session,
> +					  u64 data_offset, u64 data_size,
> +					  u64 file_size, struct perf_tool *tool)
>  {
>  	int fd = perf_data_file__fd(session->file);
>  	u64 head, page_offset, file_offset, file_pos, size;
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index dc26ebf60fe4..6d663dc76404 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -49,9 +49,6 @@ int perf_session__peek_event(struct perf_session *session, off_t file_offset,
>  			     union perf_event **event_ptr,
>  			     struct perf_sample *sample);
>  
> -int __perf_session__process_events(struct perf_session *session,
> -				   u64 data_offset, u64 data_size, u64 size,
> -				   struct perf_tool *tool);
>  int perf_session__process_events(struct perf_session *session,
>  				 struct perf_tool *tool);
>  
> -- 
> 2.1.3
> 

* Re: [PATCH 08/37] perf tools: Handle multi-file session properly
  2014-12-24  7:15 ` [PATCH 08/37] perf tools: Handle multi-file session properly Namhyung Kim
@ 2014-12-31 12:01   ` Jiri Olsa
  2014-12-31 14:53     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-31 12:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:04PM +0900, Namhyung Kim wrote:
> When perf detects a multi-file data directory, it processes the header
> file first and then the remaining data files one by one.  Since
> multi-file data is recorded separately for each cpu/thread, each file is
> already ordered by itself, so there is no need to use the ordered event
> queue interface.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/data.c    | 17 +++++++++++++++++
>  tools/perf/util/session.c | 41 +++++++++++++++++++++++++++++++----------
>  2 files changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index 8dacd34659cc..b6f7cdc4a39f 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -52,6 +52,21 @@ static int check_backup(struct perf_data_file *file)
>  	return 0;
>  }
>  
> +static void check_multi(struct perf_data_file *file)
> +{
> +	struct stat st;
> +
> +	/*
> +	 * For write, it'll be determined by user (perf record -M)
> +	 * whether to enable multi file data storage.
> +	 */
> +	if (perf_data_file__is_write(file))
> +		return;
> +
> +	if (!stat(file->path, &st) && S_ISDIR(st.st_mode))
> +		file->is_multi = true;
> +}
> +
>  static int scandir_filter(const struct dirent *d)
>  {
>  	return !prefixcmp(d->d_name, "perf.data.");
> @@ -206,6 +221,8 @@ int perf_data_file__open(struct perf_data_file *file)
>  	if (check_pipe(file))
>  		return 0;
>  
> +	check_multi(file);
> +
>  	if (!file->path)
>  		file->path = default_data_path(file);
>  
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 88aa2f09df93..4f0fcd2d3901 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1252,11 +1252,10 @@ fetch_mmaped_event(struct perf_session *session,
>  #define NUM_MMAPS 128
>  #endif
>  
> -static int __perf_session__process_events(struct perf_session *session,
> +static int __perf_session__process_events(struct perf_session *session, int fd,
>  					  u64 data_offset, u64 data_size,
>  					  u64 file_size, struct perf_tool *tool)
>  {
> -	int fd = perf_data_file__fd(session->file);
>  	u64 head, page_offset, file_offset, file_pos, size;
>  	int err, mmap_prot, mmap_flags, map_idx = 0;
>  	size_t	mmap_size;
> @@ -1362,18 +1361,40 @@ int perf_session__process_events(struct perf_session *session,
>  				 struct perf_tool *tool)
>  {
>  	u64 size = perf_data_file__size(session->file);
> -	int err;
> +	int err, i;
>  
>  	if (perf_session__register_idle_thread(session) == NULL)
>  		return -ENOMEM;
>  
> -	if (!perf_data_file__is_pipe(session->file))
> -		err = __perf_session__process_events(session,
> -						     session->header.data_offset,
> -						     session->header.data_size,
> -						     size, tool);
> -	else
> -		err = __perf_session__process_pipe_events(session, tool);
> +	if (perf_data_file__is_pipe(session->file))
> +		return __perf_session__process_pipe_events(session, tool);
> +
> +	err = __perf_session__process_events(session,
> +					     perf_data_file__fd(session->file),
> +					     session->header.data_offset,
> +					     session->header.data_size,
> +					     size, tool);
> +	if (!session->file->is_multi || err)
> +		return err;

if file->is_multi is true, perf_data_file__fd(session->file)
is the perf.header file, right? So presumably there's no data in it,
and we don't need the __perf_session__process_events() call above?

jirka

> +
> +	/*
> +	 * For multi-file data storage, events are processed for each
> +	 * cpu/thread so it's already ordered.
> +	 */
> +	tool->ordered_events = false;
> +
> +	for (i = 0; i < session->file->nr_multi; i++) {
> +		int fd = perf_data_file__multi_fd(session->file, i);
> +
> +		size = lseek(fd, 0, SEEK_END);
> +		if (size == 0)
> +			continue;
> +
> +		err = __perf_session__process_events(session, fd,
> +						     0, size, size, tool);
> +		if (err < 0)
> +			break;
> +	}
>  
>  	return err;
>  }
> -- 
> 2.1.3
> 

* Re: [PATCH 16/37] perf tools: Add a test case for timed thread handling
  2014-12-24  7:15 ` [PATCH 16/37] perf tools: Add a test case for timed thread handling Namhyung Kim
@ 2014-12-31 14:17   ` Jiri Olsa
  2014-12-31 15:32     ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: Jiri Olsa @ 2014-12-31 14:17 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 24, 2014 at 04:15:12PM +0900, Namhyung Kim wrote:

SNIP

>  		.func = NULL,
>  	},
>  };
> diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
> index 43ac17780629..1090337f63e5 100644
> --- a/tools/perf/tests/tests.h
> +++ b/tools/perf/tests/tests.h
> @@ -52,6 +52,7 @@ int test__switch_tracking(void);
>  int test__fdarray__filter(void);
>  int test__fdarray__add(void);
>  int test__thread_comm(void);
> +int test__thread_lookup_time(void);
>  
>  #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
>  #ifdef HAVE_DWARF_UNWIND_SUPPORT
> diff --git a/tools/perf/tests/thread-lookup-time.c b/tools/perf/tests/thread-lookup-time.c
> new file mode 100644
> index 000000000000..6237ecf8caae
> --- /dev/null
> +++ b/tools/perf/tests/thread-lookup-time.c
> @@ -0,0 +1,174 @@
> +#include "tests.h"
> +#include "machine.h"
> +#include "thread.h"
> +#include "map.h"
> +#include "debug.h"
> +
> +static int thread__print_cb(struct thread *th, void *arg __maybe_unused)
> +{
> +	printf("thread: %d, start time: %"PRIu64" %s\n",
> +	       th->tid, th->start_time, th->dead ? "(dead)" : "");
> +	return 0;
> +}
> +
> +static int lookup_with_timestamp(struct machine *machine)
> +{
> +	struct thread *t1, *t2, *t3;
> +	union perf_event fork = {
> +		.fork = {
> +			.pid = 0,
> +			.tid = 0,
> +			.ppid = 1,
> +			.ptid = 1,
> +		},

I've got following output from the test:

test child forked, pid 18483
========= after t1 created ==========
thread: 0, start time: 0 
========= after t1 set comm ==========
thread: 0, start time: 20000 
========= after t2 forked ==========
thread: 0, start time: 50000 
thread: 1, start time: 0 
thread: 0, start time: 10000 
thread: 0, start time: 20000 (dead)
========= after t3 forked ==========
thread: 0, start time: 60000 
thread: 1, start time: 0 
thread: 0, start time: 10000 
thread: 0, start time: 50000 (dead)
thread: 0, start time: 20000 (dead)
test child finished with 0

'after t2 forked' data shows 'thread 0 with time 50000' and a
newly added parent 'thread: 1, start time: 0'

this makes me wonder if you wanted to switch 0 and 1 for pid and ppid
in the sample init above and follow with forked pid 1 ... but I'm not
sure, because you're using the same sample for fork 3 ;-)

my question is whether that was intentional, because I got
confused here

> +	};
> +	struct perf_sample sample = {
> +		.time = 50000,
> +	};
> +
> +	/* start_time is set to 0 */
> +	t1 = machine__findnew_thread(machine, 0, 0);
> +
> +	if (verbose > 1) {
> +		printf("========= after t1 created ==========\n");
> +		machine__for_each_thread(machine, thread__print_cb, NULL);
> +	}
> +
> +	TEST_ASSERT_VAL("wrong start time of old thread", t1->start_time == 0);
> +
> +	TEST_ASSERT_VAL("cannot find current thread",
> +			machine__find_thread(machine, 0, 0) == t1);
> +
> +	TEST_ASSERT_VAL("cannot find current thread with time",
> +			machine__findnew_thread_time(machine, 0, 0, 10000) == t1);
> +
> +	/* start_time is overwritten to new value */
> +	thread__set_comm(t1, "/usr/bin/perf", 20000);
> +
> +	if (verbose > 1) {
> +		printf("========= after t1 set comm ==========\n");
> +		machine__for_each_thread(machine, thread__print_cb, NULL);
> +	}
> +
> +	TEST_ASSERT_VAL("failed to update start time", t1->start_time == 20000);
> +
> +	TEST_ASSERT_VAL("should not find passed thread",
> +			/* this will create yet another dead thread */
> +			machine__findnew_thread_time(machine, 0, 0, 10000) != t1);

also this comment says that calling machine__findnew_thread_time will
create another dead thread, which did not actually happen based on the
test output above

thanks,
jirka

* Re: [PATCH 08/37] perf tools: Handle multi-file session properly
  2014-12-31 12:01   ` Jiri Olsa
@ 2014-12-31 14:53     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-31 14:53 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

Hi Jiri,

On Wed, Dec 31, 2014 at 9:01 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:04PM +0900, Namhyung Kim wrote:
>> When perf detects multi-file data directory, process header file first
>> and then rest data files in a row.  Note that the multi-file data is
>> recorded for each cpu/thread separately, it's already ordered with
>> respect to themselves so no need to use the ordered event queue
>> interface.

[SNIP]
>> @@ -1362,18 +1361,40 @@ int perf_session__process_events(struct perf_session *session,
>>                                struct perf_tool *tool)
>>  {
>>       u64 size = perf_data_file__size(session->file);
>> -     int err;
>> +     int err, i;
>>
>>       if (perf_session__register_idle_thread(session) == NULL)
>>               return -ENOMEM;
>>
>> -     if (!perf_data_file__is_pipe(session->file))
>> -             err = __perf_session__process_events(session,
>> -                                                  session->header.data_offset,
>> -                                                  session->header.data_size,
>> -                                                  size, tool);
>> -     else
>> -             err = __perf_session__process_pipe_events(session, tool);
>> +     if (perf_data_file__is_pipe(session->file))
>> +             return __perf_session__process_pipe_events(session, tool);
>> +
>> +     err = __perf_session__process_events(session,
>> +                                          perf_data_file__fd(session->file),
>> +                                          session->header.data_offset,
>> +                                          session->header.data_size,
>> +                                          size, tool);
>> +     if (!session->file->is_multi || err)
>> +             return err;
>
> if we have file->is_multi true, the perf_data_file__fd(session->file)
> is the perf.header file right? So presumably, there's no data in it,
> and we dont need to call above __perf_session__process_events function?

Right, it's for the perf.header file. It has no samples, but it has meta
events like fork, comm and mmap, so it needs to be processed (before
processing samples).

Thanks,
Namhyung


>
>> +
>> +     /*
>> +      * For multi-file data storage, events are processed for each
>> +      * cpu/thread so it's already ordered.
>> +      */
>> +     tool->ordered_events = false;
>> +
>> +     for (i = 0; i < session->file->nr_multi; i++) {
>> +             int fd = perf_data_file__multi_fd(session->file, i);
>> +
>> +             size = lseek(fd, 0, SEEK_END);
>> +             if (size == 0)
>> +                     continue;
>> +
>> +             err = __perf_session__process_events(session, fd,
>> +                                                  0, size, size, tool);
>> +             if (err < 0)
>> +                     break;
>> +     }
>>
>>       return err;
>>  }
>> --
>> 2.1.3
>>

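Namhyung's answer implies a fixed order: the header file is always processed first (for its fork/comm/mmap meta events), then each per-cpu/per-thread data file, with no reordering because each file is already time-ordered on its own. A minimal C sketch of that control flow outside perf follows; `process_multi()`, `process_fn` and `tally_cb()` are hypothetical names, with the callback standing in for `__perf_session__process_events()`. One deliberate difference from the patch: the size is kept in a signed `off_t`, so an `lseek()` failure (-1) is caught rather than being stored in the patch's `u64 size` as a huge value.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for __perf_session__process_events(). */
typedef int (*process_fn)(int fd, off_t offset, off_t size, void *arg);

/*
 * Process the header file first (meta events only, which must be applied
 * before any sample), then each per-cpu/per-thread data file.  Each data
 * file is assumed to be time-ordered already, so no reordering is done.
 */
static int process_multi(int hdr_fd, off_t hdr_off, off_t hdr_size,
			 const int *data_fds, int nr,
			 process_fn fn, void *arg)
{
	int i, err;

	assert(fn != NULL && (nr == 0 || data_fds != NULL));

	err = fn(hdr_fd, hdr_off, hdr_size, arg);
	if (err < 0)
		return err;

	for (i = 0; i < nr; i++) {
		/* off_t is signed, so an lseek() error stays visible. */
		off_t size = lseek(data_fds[i], 0, SEEK_END);

		if (size < 0) {
			perror("lseek");
			return -1;
		}
		if (size == 0)
			continue;	/* nothing was recorded in this file */

		err = fn(data_fds[i], 0, size, arg);
		if (err < 0)
			return err;
	}
	return 0;
}

/* Example callback: only tallies how many files and bytes were visited. */
struct tally {
	int calls;
	long long bytes;
};

static int tally_cb(int fd, off_t offset, off_t size, void *arg)
{
	struct tally *t = arg;

	(void)fd;
	(void)offset;
	t->calls++;
	t->bytes += size;
	return 0;
}
```

With a 4-byte header, one 7-byte data file and one empty data file, the callback runs twice (header plus the non-empty file) and sees 11 bytes in total.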

* Re: [PATCH 04/37] perf tools: Add multi file interface to perf_data_file
  2014-12-31 11:26   ` Jiri Olsa
@ 2014-12-31 14:55     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-31 14:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 31, 2014 at 8:26 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:00PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>> +
>>  static int open_file_read(struct perf_data_file *file)
>>  {
>>       struct stat st;
>> +     char path[PATH_MAX];
>>       int fd;
>>       char sbuf[STRERR_BUFSIZE];
>>
>> -     fd = open(file->path, O_RDONLY);
>> +     strcpy(path, file->path);
>
> strncpy ?

Will change. :)

Thanks,
Namhyung

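One caveat on the strncpy() suggestion: strncpy() does not NUL-terminate the destination when the source is at least as long as the buffer, so on its own it is not a complete replacement for strcpy(). A minimal sketch using snprintf(), which always terminates and returns the length it needed; `copy_path()` is a hypothetical helper, not an existing perf function.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Copy src into dst (len bytes), always NUL-terminating.  Returns 0 on
 * success, or -ENAMETOOLONG if src did not fit, in which case dst holds
 * a truncated but still terminated copy.
 */
static int copy_path(char *dst, size_t len, const char *src)
{
	int n;

	assert(dst != NULL && src != NULL && len > 0);

	n = snprintf(dst, len, "%s", src);
	if (n < 0)
		return -EINVAL;		/* output error; unlikely with "%s" */
	if ((size_t)n >= len)
		return -ENAMETOOLONG;	/* snprintf() wanted more room */
	return 0;
}
```

In open_file_read() this would become something like `if (copy_path(path, sizeof(path), file->path) < 0)` followed by whatever error path perf prefers there.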

* Re: [PATCH 16/37] perf tools: Add a test case for timed thread handling
  2014-12-31 14:17   ` Jiri Olsa
@ 2014-12-31 15:32     ` Namhyung Kim
  0 siblings, 0 replies; 91+ messages in thread
From: Namhyung Kim @ 2014-12-31 15:32 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker

On Wed, Dec 31, 2014 at 11:17 PM, Jiri Olsa <jolsa@redhat.com> wrote:
> On Wed, Dec 24, 2014 at 04:15:12PM +0900, Namhyung Kim wrote:
>
> SNIP
>
>>               .func = NULL,
>>       },
>>  };
>> diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
>> index 43ac17780629..1090337f63e5 100644
>> --- a/tools/perf/tests/tests.h
>> +++ b/tools/perf/tests/tests.h
>> @@ -52,6 +52,7 @@ int test__switch_tracking(void);
>>  int test__fdarray__filter(void);
>>  int test__fdarray__add(void);
>>  int test__thread_comm(void);
>> +int test__thread_lookup_time(void);
>>
>>  #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
>>  #ifdef HAVE_DWARF_UNWIND_SUPPORT
>> diff --git a/tools/perf/tests/thread-lookup-time.c b/tools/perf/tests/thread-lookup-time.c
>> new file mode 100644
>> index 000000000000..6237ecf8caae
>> --- /dev/null
>> +++ b/tools/perf/tests/thread-lookup-time.c
>> @@ -0,0 +1,174 @@
>> +#include "tests.h"
>> +#include "machine.h"
>> +#include "thread.h"
>> +#include "map.h"
>> +#include "debug.h"
>> +
>> +static int thread__print_cb(struct thread *th, void *arg __maybe_unused)
>> +{
>> +     printf("thread: %d, start time: %"PRIu64" %s\n",
>> +            th->tid, th->start_time, th->dead ? "(dead)" : "");
>> +     return 0;
>> +}
>> +
>> +static int lookup_with_timestamp(struct machine *machine)
>> +{
>> +     struct thread *t1, *t2, *t3;
>> +     union perf_event fork = {
>> +             .fork = {
>> +                     .pid = 0,
>> +                     .tid = 0,
>> +                     .ppid = 1,
>> +                     .ptid = 1,
>> +             },
>
> I've got following output from the test:
>
> test child forked, pid 18483
> ========= after t1 created ==========
> thread: 0, start time: 0
> ========= after t1 set comm ==========
> thread: 0, start time: 20000
> ========= after t2 forked ==========
> thread: 0, start time: 50000
> thread: 1, start time: 0
> thread: 0, start time: 10000
> thread: 0, start time: 20000 (dead)
> ========= after t3 forked ==========
> thread: 0, start time: 60000
> thread: 1, start time: 0
> thread: 0, start time: 10000
> thread: 0, start time: 50000 (dead)
> thread: 0, start time: 20000 (dead)
> test child finished with 0
>
> 'after t2 forked' data shows 'thread 0 with time 50000' and
> newly added parent 'thread: 1, start time: 0'
>
> this makes me wonder if you wanted switch 0 and 1 for pid and ppid
> in above sample init and follow with forked pid 1 ... but not sure
> because you're using the same sample for fork 3 ;-)
>
> my question is if that was intentional, because I've got
> confused in here

Yeah, it's intentional.  I'm testing that machine__findnew_thread_time()
and machine__process_fork_event() can generate threads properly.  The
former creates a dead thread if the timestamp is earlier than all the
existing threads with the same pid.  The latter can create two
threads - one for tid and another for ptid (only if it doesn't exist).


>
>> +     };
>> +     struct perf_sample sample = {
>> +             .time = 50000,
>> +     };
>> +
>> +     /* start_time is set to 0 */
>> +     t1 = machine__findnew_thread(machine, 0, 0);
>> +
>> +     if (verbose > 1) {
>> +             printf("========= after t1 created ==========\n");
>> +             machine__for_each_thread(machine, thread__print_cb, NULL);
>> +     }
>> +
>> +     TEST_ASSERT_VAL("wrong start time of old thread", t1->start_time == 0);
>> +
>> +     TEST_ASSERT_VAL("cannot find current thread",
>> +                     machine__find_thread(machine, 0, 0) == t1);
>> +
>> +     TEST_ASSERT_VAL("cannot find current thread with time",
>> +                     machine__findnew_thread_time(machine, 0, 0, 10000) == t1);
>> +
>> +     /* start_time is overwritten to new value */
>> +     thread__set_comm(t1, "/usr/bin/perf", 20000);
>> +
>> +     if (verbose > 1) {
>> +             printf("========= after t1 set comm ==========\n");
>> +             machine__for_each_thread(machine, thread__print_cb, NULL);
>> +     }
>> +
>> +     TEST_ASSERT_VAL("failed to update start time", t1->start_time == 20000);
>> +
>> +     TEST_ASSERT_VAL("should not find passed thread",
>> +                     /* this will create yet another dead thread */
>> +                     machine__findnew_thread_time(machine, 0, 0, 10000) != t1);
>
> also this comment say that calling machine__findnew_thread_time will
> create another dead thread, which actually did not happened based on
> above test output

Oh, it actually is a dead thread - it's in the dead threads tree - but
I just forgot to set the dead flag. :)

Thanks,
Namhyung

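The lookup rule described above can be modelled in a few lines: keep every thread that used a pid, newest first; a lookup at time ts returns the newest thread whose start time is not after ts; a lookup earlier than all of them creates a new entry that must be flagged dead (the step the patch missed). This is a simplified, hypothetical model (`struct tthread`, `findnew_at()`, `new_current()`), not perf's `struct thread` or its trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified thread: one of possibly many that used the same pid. */
struct tthread {
	uint64_t start_time;
	bool dead;
	struct tthread *older;	/* next-older thread with this pid */
};

/*
 * Return the thread alive at ts: the newest one whose start_time <= ts.
 * If ts predates every known thread, append a new, older entry and mark
 * it dead, since it must have exited before the pid was reused.  With no
 * threads at all, the new entry is simply the current thread.
 */
static struct tthread *findnew_at(struct tthread **head, uint64_t ts)
{
	struct tthread **pos = head;
	struct tthread *t;

	assert(head != NULL);
	for (t = *head; t != NULL; pos = &t->older, t = t->older)
		if (t->start_time <= ts)
			return t;

	t = calloc(1, sizeof(*t));
	if (t == NULL)
		return NULL;
	t->start_time = ts;
	t->dead = (*head != NULL);	/* the flag missing in the test output */
	*pos = t;			/* append at the oldest end */
	return t;
}

/* A fork at ts starts a new current thread; the previous one is dead. */
static struct tthread *new_current(struct tthread **head, uint64_t ts)
{
	struct tthread *t;

	assert(head != NULL);
	t = calloc(1, sizeof(*t));
	if (t == NULL)
		return NULL;
	t->start_time = ts;
	t->older = *head;
	if (*head != NULL)
		(*head)->dead = true;
	*head = t;
	return t;
}
```

A linked list keeps earlier pointers valid when entries are added, which matters because the test compares thread pointers across lookups.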

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
                   ` (37 preceding siblings ...)
  2014-12-26 14:02 ` [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Jiri Olsa
@ 2015-01-05 18:48 ` Andi Kleen
  2015-01-06 15:50   ` Stephane Eranian
  2015-01-07  6:58   ` Namhyung Kim
  38 siblings, 2 replies; 91+ messages in thread
From: Andi Kleen @ 2015-01-05 18:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Stephane Eranian, Adrian Hunter, Andi Kleen,
	Frederic Weisbecker


Thanks for working on this. Haven't read any code, just
some high level comments on the design.
> 
> So my approach is like this:
> 
> Partially do stage 1 first - but only for meta events that changes
> machine state.  To do this I add a dummy tracking event to perf record
> and make it collect such meta events only.  They are saved in a
> separate file (perf.header) and processed before sample events at perf
> report time.

Can't you just use seek to put the offset into the perf.data header,
like it's already done for other sections? Managing another file would be
a big change for users and is especially a problem if the data
is moved between different systems.

Also I thought Adrian's meta data index already addressed this
at least partially.

> 
> This also requires to handle multiple files and to find a
> corresponding machine state when processing samples.  On a large
> profiling session, many tasks were created and exited so pid might be
> recycled (even more than once!).  To deal with it, I managed to have
> thread, map_groups and comm in time sorted.  The only remaining thing
> is symbol loading as it's done lazily when sample requires it.

FWIW there's often a lot of unnecessary information in this
(e.g. mmaps that are not used). The Quipper page
claims large savings in data files by avoiding redundancies.

It would probably be better if perf record avoided writing redundant
information in the first place (I realize that's not easy)
> 
> With that being done, the stage 2 can be done by multiple threads.  I
> also save each sample data (per-cpu or per-thread) in separate files
> during record.  On perf report time, each file will be processed by
> each thread.  And symbol loading is protected by a mutex lock.

I really don't like the multiple files. See above. Also it could easily
cause additional seeking on spinning disks.

Isn't it fast enough to have a single thread that pre-scans
the events (perhaps with some single-thread optimizations
like vectorization), and then load-balances the work across
a thread pool?

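One way to read this suggestion: a single thread makes a cheap pass over the one data file that only finds record boundaries, and a pool of workers then claims batches of records from that list, each accumulating into private results that are merged at the end (much like the per-thread hists that must be collapsed afterwards). A minimal pthreads sketch; the record layout (loosely modelled on `struct perf_event_header`), `BATCH`, and the placeholder "work" of summing record types are assumptions for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumed record header, loosely modelled on struct perf_event_header. */
struct rec_hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* whole record, header included */
};

#define BATCH		64	/* records claimed per grab (arbitrary) */
#define MAX_WORKERS	16

struct job {
	const unsigned char *buf;
	const size_t *offs;	/* record offsets found by prescan() */
	size_t nr;
	atomic_size_t next;	/* index of the next unclaimed record */
	atomic_ulong total;	/* merged result of all workers */
};

/* Single-threaded pass: find record boundaries without decoding them. */
static size_t prescan(const unsigned char *buf, size_t len,
		      size_t *offs, size_t max)
{
	size_t pos = 0, nr = 0;
	struct rec_hdr h;

	while (nr < max && pos + sizeof(h) <= len) {
		memcpy(&h, buf + pos, sizeof(h));	/* avoid unaligned access */
		if (h.size < sizeof(h) || h.size > len - pos)
			break;				/* truncated or corrupt */
		offs[nr++] = pos;
		pos += h.size;
	}
	return nr;
}

static void *worker(void *arg)
{
	struct job *j = arg;
	unsigned long local = 0;	/* private result, no locking needed */
	size_t start, end, i;
	struct rec_hdr h;

	/* Claim BATCH records at a time until the list is exhausted. */
	while ((start = atomic_fetch_add(&j->next, BATCH)) < j->nr) {
		end = start + BATCH < j->nr ? start + BATCH : j->nr;
		for (i = start; i < end; i++) {
			memcpy(&h, j->buf + j->offs[i], sizeof(h));
			local += h.type;	/* placeholder for real work */
		}
	}
	atomic_fetch_add(&j->total, local);	/* merge step */
	return NULL;
}

/* Run up to nthreads extra workers plus the calling thread. */
static unsigned long run_pool(const unsigned char *buf, const size_t *offs,
			      size_t nr, int nthreads)
{
	pthread_t tid[MAX_WORKERS];
	struct job j;
	int i, started = 0;

	assert(nthreads >= 0 && nthreads <= MAX_WORKERS);
	j.buf = buf;
	j.offs = offs;
	j.nr = nr;
	atomic_init(&j.next, 0);
	atomic_init(&j.total, 0);

	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[started], NULL, worker, &j) != 0)
			break;	/* fewer threads is fine: the caller helps */
		started++;
	}
	worker(&j);
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&j.total);
}
```

Claiming small batches through one atomic counter keeps fast workers busy when some records (for example, long callchains) cost more than others, which a fixed one-file-per-thread split does not do. Whether the single pre-scan pass can keep up with the workers is the open question here.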
BTW I suspect that using Cilk Plus or a similar library
would make the code much simpler.

> Here is the result:
> 
> This is just elapsed (real) time measured by shell 'time' function.
> 
> The data file was recorded during kernel build with fp callchain and
> size is 2.1GB.  The machine has 6 core with hyper-threading enabled
> and I got a similar result on my laptop too.
> 
>  time perf report  --children  --no-children  + --call-graph none
>                    ----------  -------------  -------------------
>  current            4m43.260s      1m32.779s            0m35.866s
>  patched            4m43.710s      1m29.695s            0m33.995s
>  --multi-thread     2m46.265s      0m45.486s             0m7.570s
> 
> 
> This result is with 7.7GB data file using libunwind for callchain.

Nice results!

-Andi

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2015-01-05 18:48 ` Andi Kleen
@ 2015-01-06 15:50   ` Stephane Eranian
  2015-01-07  7:13     ` Namhyung Kim
  2015-01-07  6:58   ` Namhyung Kim
  1 sibling, 1 reply; 91+ messages in thread
From: Stephane Eranian @ 2015-01-06 15:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, Jiri Olsa, LKML, David Ahern, Adrian Hunter,
	Frederic Weisbecker

On Mon, Jan 5, 2015 at 1:48 PM, Andi Kleen <andi@firstfloor.org> wrote:
>
>
> Thanks for working on this. Haven't read any code, just
> some high level comments on the design.
> >
> > So my approach is like this:
> >
> > Partially do stage 1 first - but only for meta events that changes
> > machine state.  To do this I add a dummy tracking event to perf record
> > and make it collect such meta events only.  They are saved in a
> > separate file (perf.header) and processed before sample events at perf
> > report time.
>
> Can't you just use seek to put the offset into the perf.data header
> like it's already done for other sections? Managing another file would be
> a big change for users and especially is a problem if the data
> is moved between different systems.
>
> Also I thought Adrian's meta data index already addressed this
> at least partially.
>
> >
> > This also requires to handle multiple files and to find a
> > corresponding machine state when processing samples.  On a large
> > profiling session, many tasks were created and exited so pid might be
> > recycled (even more than once!).  To deal with it, I managed to have
> > thread, map_groups and comm in time sorted.  The only remaining thing
> > is symbol loading as it's done lazily when sample requires it.
>
> FWIW there's often a lot of unnecessary information in this
> (e.g. mmaps that are not used). The Quipper page
> claims large saving in data files by avoided redundancies.
>
> It would be probably better if perf record avoided writing redundant
> information better (I realize that's not easy)
> >
> > With that being done, the stage 2 can be done by multiple threads.  I
> > also save each sample data (per-cpu or per-thread) in separate files
> > during record.  On perf report time, each file will be processed by
> > each thread.  And symbol loading is protected by a mutex lock.
>
> I really don't like the multiple files. See above. Also it could easily
> cause additional seeking on spinning disks.
>
Having to manage two separate files is a major change which I don't
particularly like. It will cause problems. I don't see why this cannot
be appended to the perf.data file with an index at the beginning. There
is already an index for sections in file mode.

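The single-file alternative can be sketched as an index of {offset, size} entries at the front (loosely modelled on the section descriptors perf's file header already carries) followed by the concatenated per-cpu data. A minimal sketch of writing and reading such a layout; the format and the names `write_indexed()`, `read_section()` and `pwrite_all()` are hypothetical. Using pwrite()/pread() lets several threads read their own sections through one file descriptor without sharing a file offset.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical index entry: where one per-cpu section lives in the file. */
struct sec_idx {
	uint64_t offset;
	uint64_t size;
};

/* pwrite() all of buf at off, retrying short writes. */
static int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
		off += n;
	}
	return 0;
}

/*
 * Layout: [u64 nr][nr x struct sec_idx][blob 0][blob 1]...
 * The index sits at the front, so a reader can jump to any section.
 */
static int write_indexed(int fd, const void *const *blobs,
			 const size_t *sizes, uint64_t nr)
{
	off_t data = (off_t)(sizeof(nr) + nr * sizeof(struct sec_idx));
	uint64_t i;

	if (pwrite_all(fd, &nr, sizeof(nr), 0))
		return -1;
	for (i = 0; i < nr; i++) {
		struct sec_idx e = { (uint64_t)data, sizes[i] };
		off_t slot = (off_t)(sizeof(nr) + i * sizeof(e));

		if (pwrite_all(fd, &e, sizeof(e), slot) ||
		    pwrite_all(fd, blobs[i], sizes[i], data))
			return -1;
		data += (off_t)sizes[i];
	}
	return 0;
}

/* Fetch index entry i; pread() does not move the shared file offset. */
static int read_section(int fd, uint64_t i, struct sec_idx *e)
{
	uint64_t nr;
	off_t slot;

	if (pread(fd, &nr, sizeof(nr), 0) != (ssize_t)sizeof(nr) || i >= nr)
		return -1;
	slot = (off_t)(sizeof(nr) + i * sizeof(*e));
	if (pread(fd, e, sizeof(*e), slot) != (ssize_t)sizeof(*e))
		return -1;
	return 0;
}
```

For two sections of 4 and 9 bytes, the second one starts at 8 + 2 * 16 + 4 = 44. Note that this layout depends on random access, which is exactly what a pipe cannot provide.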
We use the pipe mode a lot and this would not work there. So no,
I don't like the 2 files solution. But I like the idea of using multiple
threads to speed up processing.

>
> Isn't it fast enough to have a single thread that pre scans
> the events (perhaps with some single-thread optimizations
> like vectorization), and then load balances the work to
> a thread pool?
>
> BTW I suspect if you used cilk plus or a similar library that
> would make the code much simpler.
>
> > Here is the result:
> >
> > This is just elapsed (real) time measured by shell 'time' function.
> >
> > The data file was recorded during kernel build with fp callchain and
> > size is 2.1GB.  The machine has 6 core with hyper-threading enabled
> > and I got a similar result on my laptop too.
> >
> >  time perf report  --children  --no-children  + --call-graph none
> >                    ----------  -------------  -------------------
> >  current            4m43.260s      1m32.779s            0m35.866s
> >  patched            4m43.710s      1m29.695s            0m33.995s
> >  --multi-thread     2m46.265s      0m45.486s             0m7.570s
> >
> >
> > This result is with 7.7GB data file using libunwind for callchain.
>
> Nice results!
>
> -Andi

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2015-01-05 18:48 ` Andi Kleen
  2015-01-06 15:50   ` Stephane Eranian
@ 2015-01-07  6:58   ` Namhyung Kim
  2015-01-08 14:52     ` Andi Kleen
  1 sibling, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2015-01-07  6:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, Jiri Olsa,
	LKML, David Ahern, Stephane Eranian, Adrian Hunter,
	Frederic Weisbecker

Hi Andi,

On Mon, Jan 05, 2015 at 07:48:11PM +0100, Andi Kleen wrote:
> 
> Thanks for working on this. Haven't read any code, just
> some high level comments on the design.

Really appreciate it!


> > 
> > So my approach is like this:
> > 
> > Partially do stage 1 first - but only for meta events that changes
> > machine state.  To do this I add a dummy tracking event to perf record
> > and make it collect such meta events only.  They are saved in a
> > separate file (perf.header) and processed before sample events at perf
> > report time.
> 
> Can't you just use seek to put the offset into the perf.data header
> like it's already done for other sections? Managing another file would be
> a big change for users and especially is a problem if the data
> is moved between different systems.

The files are located in a directory and users only deal with the
directory, so I don't think it's a big problem.  In addition, moving
data between different systems requires archiving the related debuginfo,
and I think we can extend perf-archive to put that debuginfo in the
data directory so that the symbols can be found more easily.


> 
> Also I thought Adrian's meta data index already addressed this
> at least partially.

I know Adrian's work might have some parts in common, but I haven't looked
at it in depth, sorry!  It'd be great if we could discuss how to
coordinate the future direction.


> 
> > 
> > This also requires to handle multiple files and to find a
> > corresponding machine state when processing samples.  On a large
> > profiling session, many tasks were created and exited so pid might be
> > recycled (even more than once!).  To deal with it, I managed to have
> > thread, map_groups and comm in time sorted.  The only remaining thing
> > is symbol loading as it's done lazily when sample requires it.
> 
> FWIW there's often a lot of unnecessary information in this
> (e.g. mmaps that are not used). The Quipper page
> claims large saving in data files by avoided redundancies.
> 
> It would be probably better if perf record avoided writing redundant
> information better (I realize that's not easy)

Right, many mmap events won't be used, but we cannot predict which ones
will be used and which won't.


> > 
> > With that being done, the stage 2 can be done by multiple threads.  I
> > also save each sample data (per-cpu or per-thread) in separate files
> > during record.  On perf report time, each file will be processed by
> > each thread.  And symbol loading is protected by a mutex lock.
> 
> I really don't like the multiple files. See above. Also it could easily
> cause additional seeking on spinning disks.

Right, I admit that my results were measured on an SSD.


> 
> Isn't it fast enough to have a single thread that pre scans
> the events (perhaps with some single-thread optimizations
> like vectorization), and then load balances the work to
> a thread pool?

I don't understand it.  Could you please elaborate on it?


> 
> BTW I suspect if you used cilk plus or a similar library that
> would make the code much simpler.

I'm not sure how much code I could make simpler with the help of such a
library.  I think most changes in this patchset are preparations for
concurrent access in libperf, and they would still be needed even if
such a library were used.

Thanks,
Namhyung


> 
> > Here is the result:
> > 
> > This is just elapsed (real) time measured by shell 'time' function.
> > 
> > The data file was recorded during kernel build with fp callchain and
> > size is 2.1GB.  The machine has 6 core with hyper-threading enabled
> > and I got a similar result on my laptop too.
> > 
> >  time perf report  --children  --no-children  + --call-graph none
> >                    ----------  -------------  -------------------
> >  current            4m43.260s      1m32.779s            0m35.866s
> >  patched            4m43.710s      1m29.695s            0m33.995s
> >  --multi-thread     2m46.265s      0m45.486s             0m7.570s
> > 
> > 
> > This result is with 7.7GB data file using libunwind for callchain.
> 
> Nice results!
> 
> -Andi

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2015-01-06 15:50   ` Stephane Eranian
@ 2015-01-07  7:13     ` Namhyung Kim
  2015-01-07 15:14       ` Stephane Eranian
  0 siblings, 1 reply; 91+ messages in thread
From: Namhyung Kim @ 2015-01-07  7:13 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, Jiri Olsa, LKML, David Ahern, Adrian Hunter,
	Frederic Weisbecker

Hi Stephane,

On Tue, Jan 06, 2015 at 10:50:44AM -0500, Stephane Eranian wrote:
> On Mon, Jan 5, 2015 at 1:48 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > > This also requires to handle multiple files and to find a
> > > corresponding machine state when processing samples.  On a large
> > > profiling session, many tasks were created and exited so pid might be
> > > recycled (even more than once!).  To deal with it, I managed to have
> > > thread, map_groups and comm in time sorted.  The only remaining thing
> > > is symbol loading as it's done lazily when sample requires it.
> >
> > FWIW there's often a lot of unnecessary information in this
> > (e.g. mmaps that are not used). The Quipper page
> > claims large saving in data files by avoided redundancies.
> >
> > It would be probably better if perf record avoided writing redundant
> > information better (I realize that's not easy)
> > >
> > > With that being done, the stage 2 can be done by multiple threads.  I
> > > also save each sample data (per-cpu or per-thread) in separate files
> > > during record.  On perf report time, each file will be processed by
> > > each thread.  And symbol loading is protected by a mutex lock.
> >
> > I really don't like the multiple files. See above. Also it could easily
> > cause additional seeking on spinning disks.
> >
> having to manage two separate files is a major change which I don't
> particularly like. It will cause problems. I don't see why this cannot
> be appended to the perf.data file with a index at the beginning. There
> is already an index for sections in file mode.

I just thought it would be easier to handle with multiple threads.  Maybe
we can concatenate the files after recording.


> 
> We use the pipe mode a lot and this would not work there. So no,
> I don't like the 2 files solution. But I like the idea of using multiple
> threads to speed up processing.

Actually it's not 2 files, it's 1 + N files. :)  Anyway, I think the
single file + index approach requires seeking to process the data.  Is
that OK for pipe mode?

Thanks,
Namhyung

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2015-01-07  7:13     ` Namhyung Kim
@ 2015-01-07 15:14       ` Stephane Eranian
  2015-01-08  5:19         ` Namhyung Kim
  0 siblings, 1 reply; 91+ messages in thread
From: Stephane Eranian @ 2015-01-07 15:14 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, Jiri Olsa, LKML, David Ahern, Adrian Hunter,
	Frederic Weisbecker

On Wed, Jan 7, 2015 at 2:13 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Stephane,
>
> On Tue, Jan 06, 2015 at 10:50:44AM -0500, Stephane Eranian wrote:
>> On Mon, Jan 5, 2015 at 1:48 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> > > This also requires to handle multiple files and to find a
>> > > corresponding machine state when processing samples.  On a large
>> > > profiling session, many tasks were created and exited so pid might be
>> > > recycled (even more than once!).  To deal with it, I managed to have
>> > > thread, map_groups and comm in time sorted.  The only remaining thing
>> > > is symbol loading as it's done lazily when sample requires it.
>> >
>> > FWIW there's often a lot of unnecessary information in this
>> > (e.g. mmaps that are not used). The Quipper page
>> > claims large saving in data files by avoided redundancies.
>> >
>> > It would be probably better if perf record avoided writing redundant
>> > information better (I realize that's not easy)
>> > >
>> > > With that being done, the stage 2 can be done by multiple threads.  I
>> > > also save each sample data (per-cpu or per-thread) in separate files
>> > > during record.  On perf report time, each file will be processed by
>> > > each thread.  And symbol loading is protected by a mutex lock.
>> >
>> > I really don't like the multiple files. See above. Also it could easily
>> > cause additional seeking on spinning disks.
>> >
>> having to manage two separate files is a major change which I don't
>> particularly like. It will cause problems. I don't see why this cannot
>> be appended to the perf.data file with a index at the beginning. There
>> is already an index for sections in file mode.
>
> I just thought it's easier to handle with multiple thread.  Maybe we
> can concatenate the files after recording.
>
>
>>
>> We use the pipe mode a lot and this would not work there. So no,
>> I don't like the 2 files solution. But I like the idea of using multiple
>> threads to speed up processing.
>
> Actually it's not 2 files, it's 1 + N files. :)  Anyway, I think the
> single file + index approach requires seeking to process them; is that
> OK for pipe mode?
>
Seeking is not possible in pipe mode.

The way this is done (as I remember) is by creating pseudo-record types
and injecting them into the stream.
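
As a rough illustration of the pseudo-record idea, the sketch below writes records into a pipe and reads them back strictly in order. The record layout, the `REC_ROUND_END` marker and all function names are hypothetical and deliberately simpler than perf's real `struct perf_event_header` and record types. It only shows that a reader can find batch boundaries in a non-seekable stream when the writer injects marker records, and that `lseek()` on a pipe fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record header; NOT perf's real struct perf_event_header. */
struct rec_hdr {
	uint32_t type;
	uint32_t size;			/* header + payload, in bytes */
};

enum {
	REC_SAMPLE    = 1,		/* payload: one uint64_t value */
	REC_ROUND_END = 64,		/* hypothetical pseudo-record, no payload */
};

#define MAX_BATCH 64

/* Append one record to the stream; returns 0 on success. */
static int write_rec(int fd, uint32_t type, const void *payload, uint32_t len)
{
	struct rec_hdr h = { type, (uint32_t)sizeof(struct rec_hdr) + len };

	assert(len == 0 || payload != NULL);
	if (write(fd, &h, sizeof(h)) != (ssize_t)sizeof(h))
		return -1;
	if (len && write(fd, payload, len) != (ssize_t)len)
		return -1;
	return 0;
}

/* Read exactly len bytes (pipes may return short reads); 0 on success. */
static int read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len) {
		ssize_t n = read(fd, p, len);

		if (n <= 0)
			return -1;		/* EOF or error */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Consume the stream strictly front to back, never seeking.  Samples are
 * buffered until a REC_ROUND_END pseudo-record marks a batch boundary, then
 * the batch is flushed (here it is simply summed).  EOF flushes whatever is
 * left.  Returns the number of REC_ROUND_END markers seen, or -1 on error.
 */
static int consume(int fd, uint64_t *total)
{
	uint64_t batch[MAX_BATCH];
	size_t n = 0;
	int rounds = 0;
	struct rec_hdr h;

	*total = 0;
	while (read_full(fd, &h, sizeof(h)) == 0) {
		if (h.size < sizeof(h))
			return -1;
		uint32_t len = h.size - (uint32_t)sizeof(h);

		if (h.type == REC_SAMPLE) {
			if (len != sizeof(uint64_t) || n == MAX_BATCH ||
			    read_full(fd, &batch[n], len))
				return -1;
			n++;
		} else if (h.type == REC_ROUND_END) {
			for (size_t i = 0; i < n; i++)
				*total += batch[i];	/* a real tool would hand the batch to workers */
			n = 0;
			rounds++;
		} else {
			/* Unknown type: a pipe cannot seek past it, so read and discard. */
			char tmp[256];

			while (len) {
				uint32_t chunk = len < sizeof(tmp) ? len : (uint32_t)sizeof(tmp);

				if (read_full(fd, tmp, chunk))
					return -1;
				len -= chunk;
			}
		}
	}
	for (size_t i = 0; i < n; i++)
		*total += batch[i];
	return rounds;
}
```

In this sketch a completed batch is merely summed; a multi-threaded reader would instead pass each batch delimited by a marker record to worker threads, which is the part that still has to be designed for pipe mode.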

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2015-01-07 15:14       ` Stephane Eranian
@ 2015-01-08  5:19         ` Namhyung Kim
From: Namhyung Kim @ 2015-01-08  5:19 UTC
  To: Stephane Eranian
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, Jiri Olsa, LKML, David Ahern, Adrian Hunter,
	Frederic Weisbecker

On Wed, Jan 07, 2015 at 10:14:56AM -0500, Stephane Eranian wrote:
> On Wed, Jan 7, 2015 at 2:13 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> > Hi Stephane,
> >
> > On Tue, Jan 06, 2015 at 10:50:44AM -0500, Stephane Eranian wrote:
> >> We use the pipe mode a lot and this would not work there. So no,
> >> I don't like the 2-file solution. But I like the idea of using multiple
> >> threads to speed up processing.
> >
> > Actually it's not 2 files, it's 1 + N files. :)  Anyway, I think the
> > single file + index approach requires seeking to process them; is that
> > OK for pipe mode?
> >
> Seeking is not possible in pipe mode.
>
> The way this is done (as I remember) is by creating pseudo-record types
> and injecting them into the stream.

Right.  But I have no idea how I can index a live stream and process
it using multiple threads.

Thanks,
Namhyung

* Re: [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1)
  2015-01-07  6:58   ` Namhyung Kim
@ 2015-01-08 14:52     ` Andi Kleen
From: Andi Kleen @ 2015-01-08 14:52 UTC
  To: Namhyung Kim
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, Jiri Olsa, LKML, David Ahern, Stephane Eranian,
	Adrian Hunter, Frederic Weisbecker

> > Isn't it fast enough to have a single thread that pre-scans
> > the events (perhaps with some single-thread optimizations
> > like vectorization), and then load balances the work to
> > a thread pool?
> 
> I don't understand it.  Could you please elaborate on it?

Have a thread pool. Then there is a single thread which pre-processes
events and puts them into queues, which are then processed in parallel
by the threads in the thread pool.
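
A minimal sketch of that arrangement in C with POSIX threads, assuming events have already been pre-processed into a hypothetical `struct event` carrying a tid and a period. The calling thread acts as the single dispatcher and routes each event to a per-worker bounded queue by tid; each worker accumulates a private partial result, and the results are merged serially after the workers are joined, mirroring how per-thread hists would still need to be collapsed. Queue size, worker count, the routing rule and all names are illustrative assumptions, not perf code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_WORKERS 4
#define QCAP       128

struct event { uint32_t tid; uint64_t period; };	/* hypothetical pre-processed sample */

struct queue {			/* bounded: the dispatcher pushes, one worker pops */
	struct event buf[QCAP];
	size_t head, tail, count;
	bool done;			/* dispatcher finished: drain, then exit */
	pthread_mutex_t lock;
	pthread_cond_t not_empty, not_full;
};

struct worker {
	pthread_t th;
	struct queue q;
	uint64_t sum;			/* private partial result, merged at the end */
};

static void queue_push(struct queue *q, struct event ev)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == QCAP)	/* back-pressure: wait for a slow worker */
		pthread_cond_wait(&q->not_full, &q->lock);
	q->buf[q->tail] = ev;
	q->tail = (q->tail + 1) % QCAP;
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Returns false once the queue is drained and the dispatcher is done. */
static bool queue_pop(struct queue *q, struct event *ev)
{
	bool ok = false;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0 && !q->done)
		pthread_cond_wait(&q->not_empty, &q->lock);
	if (q->count > 0) {
		*ev = q->buf[q->head];
		q->head = (q->head + 1) % QCAP;
		q->count--;
		ok = true;
		pthread_cond_signal(&q->not_full);
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

static void *worker_fn(void *arg)
{
	struct worker *w = arg;
	struct event ev;

	while (queue_pop(&w->q, &ev))
		w->sum += ev.period;	/* stand-in for adding to this worker's own hists */
	return NULL;
}

/* The caller is the single dispatcher; routing by tid keeps a task on one worker. */
static uint64_t process_events(const struct event *evs, size_t n)
{
	struct worker *w = calloc(NR_WORKERS, sizeof(*w));
	uint64_t total = 0;

	if (!w)
		abort();
	for (int i = 0; i < NR_WORKERS; i++) {
		pthread_mutex_init(&w[i].q.lock, NULL);
		pthread_cond_init(&w[i].q.not_empty, NULL);
		pthread_cond_init(&w[i].q.not_full, NULL);
		if (pthread_create(&w[i].th, NULL, worker_fn, &w[i]) != 0)
			abort();
	}
	for (size_t i = 0; i < n; i++)
		queue_push(&w[evs[i].tid % NR_WORKERS].q, evs[i]);
	for (int i = 0; i < NR_WORKERS; i++) {
		pthread_mutex_lock(&w[i].q.lock);
		w[i].q.done = true;
		pthread_cond_signal(&w[i].q.not_empty);
		pthread_mutex_unlock(&w[i].q.lock);
		pthread_join(w[i].th, NULL);
		assert(w[i].q.count == 0);	/* every queued event was consumed */
		total += w[i].sum;		/* serial merge, like collapsing hists */
		pthread_mutex_destroy(&w[i].q.lock);
		pthread_cond_destroy(&w[i].q.not_empty);
		pthread_cond_destroy(&w[i].q.not_full);
	}
	free(w);
	return total;
}
```

Routing by tid keeps all samples of a task on one worker but can leave workers unevenly loaded; round-robin dispatch balances better at the cost of spreading a task's samples across workers.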


-Andi

Thread overview: 91+ messages
2014-12-24  7:14 [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Namhyung Kim
2014-12-24  7:14 ` [PATCH 01/37] perf tools: Set attr.task bit for a tracking event Namhyung Kim
2014-12-31 11:25   ` Jiri Olsa
2014-12-24  7:14 ` [PATCH 02/37] perf record: Use a software dummy event to track task/mmap events Namhyung Kim
2014-12-26 16:27   ` David Ahern
2014-12-27  5:28     ` Namhyung Kim
2014-12-29 12:58       ` Adrian Hunter
2014-12-30  5:51         ` Namhyung Kim
2014-12-30  9:04           ` Adrian Hunter
2014-12-24  7:14 ` [PATCH 03/37] perf tools: Use perf_data_file__fd() consistently Namhyung Kim
2014-12-26 16:30   ` David Ahern
2014-12-27  5:30     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 04/37] perf tools: Add multi file interface to perf_data_file Namhyung Kim
2014-12-25 22:08   ` Jiri Olsa
2014-12-26  1:19     ` Namhyung Kim
2014-12-31 11:26   ` Jiri Olsa
2014-12-31 14:55     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 05/37] perf tools: Create separate mmap for dummy tracking event Namhyung Kim
2014-12-25 22:08   ` Jiri Olsa
2014-12-26  1:45     ` Namhyung Kim
2014-12-25 22:09   ` Jiri Olsa
2014-12-26  1:55     ` Namhyung Kim
2014-12-26 16:51   ` David Ahern
2014-12-27  5:32     ` Namhyung Kim
2014-12-29 13:44   ` Adrian Hunter
2014-12-30  5:57     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 06/37] perf tools: Introduce perf_evlist__mmap_multi() Namhyung Kim
2014-12-24  7:15 ` [PATCH 07/37] perf tools: Do not use __perf_session__process_events() directly Namhyung Kim
2014-12-31 11:33   ` Jiri Olsa
2014-12-24  7:15 ` [PATCH 08/37] perf tools: Handle multi-file session properly Namhyung Kim
2014-12-31 12:01   ` Jiri Olsa
2014-12-31 14:53     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 09/37] perf record: Add -M/--multi option for multi file recording Namhyung Kim
2014-12-24  7:15 ` [PATCH 10/37] perf report: Skip dummy tracking event Namhyung Kim
2014-12-24  7:15 ` [PATCH 11/37] perf tools: Introduce thread__comm_time() helpers Namhyung Kim
2014-12-26 17:00   ` David Ahern
2014-12-27  5:36     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 12/37] perf tools: Add a test case for thread comm handling Namhyung Kim
2014-12-24  7:15 ` [PATCH 13/37] perf tools: Use thread__comm_time() when adding hist entries Namhyung Kim
2014-12-25 22:53   ` Jiri Olsa
2014-12-26  2:10     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 14/37] perf tools: Convert dead thread list into rbtree Namhyung Kim
2014-12-25 23:05   ` Jiri Olsa
2014-12-26  2:26     ` Namhyung Kim
2014-12-26 17:14       ` David Ahern
2014-12-27  5:42         ` Namhyung Kim
2014-12-27 15:31   ` David Ahern
2014-12-28 13:24     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 15/37] perf tools: Introduce machine__find*_thread_time() Namhyung Kim
2014-12-27 16:33   ` David Ahern
2014-12-28 14:50     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 16/37] perf tools: Add a test case for timed thread handling Namhyung Kim
2014-12-31 14:17   ` Jiri Olsa
2014-12-31 15:32     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 17/37] perf tools: Maintain map groups list in a leader thread Namhyung Kim
2014-12-24  7:15 ` [PATCH 18/37] perf tools: Remove thread when map groups initialization failed Namhyung Kim
2014-12-28  0:45   ` David Ahern
2014-12-29  7:08     ` Namhyung Kim
2014-12-24  7:15 ` [PATCH 19/37] perf tools: Introduce thread__find_addr_location_time() and friends Namhyung Kim
2014-12-24  7:15 ` [PATCH 20/37] perf tools: Add a test case for timed map groups handling Namhyung Kim
2014-12-24  7:15 ` [PATCH 21/37] perf tools: Protect dso symbol loading using a mutex Namhyung Kim
2014-12-24  7:15 ` [PATCH 22/37] perf tools: Protect dso cache tree using dso->lock Namhyung Kim
2014-12-24  7:15 ` [PATCH 23/37] perf tools: Protect dso cache fd with a mutex Namhyung Kim
2014-12-24  7:15 ` [PATCH 24/37] perf session: Pass struct events stats to event processing functions Namhyung Kim
2014-12-24  7:15 ` [PATCH 25/37] perf hists: Pass hists struct to hist_entry_iter functions Namhyung Kim
2014-12-24  7:15 ` [PATCH 26/37] perf tools: Move BUILD_ID_SIZE definition to perf.h Namhyung Kim
2014-12-24  7:15 ` [PATCH 27/37] perf report: Parallelize perf report using multi-thread Namhyung Kim
2014-12-24  7:15 ` [PATCH 28/37] perf tools: Add missing_threads rb tree Namhyung Kim
2014-12-24  7:15 ` [PATCH 29/37] perf top: Always creates thread in the current task tree Namhyung Kim
2014-12-24  7:15 ` [PATCH 30/37] perf tools: Fix progress ui to support multi thread Namhyung Kim
2014-12-24  7:15 ` [PATCH 31/37] perf record: Show total size of multi file data Namhyung Kim
2014-12-24  7:15 ` [PATCH 32/37] perf report: Add --multi-thread option and config item Namhyung Kim
2014-12-24  7:15 ` [PATCH 33/37] perf tools: Add front cache for dso data access Namhyung Kim
2014-12-24  7:15 ` [PATCH 34/37] perf tools: Convert lseek + read to pread Namhyung Kim
2014-12-24  7:15 ` [PATCH 35/37] perf callchain: Save eh/debug frame offset for dwarf unwind Namhyung Kim
2014-12-24  7:15 ` [PATCH 36/37] perf tools: Add new perf data command Namhyung Kim
2014-12-24  7:15 ` [PATCH 37/37] perf data: Implement 'split' subcommand Namhyung Kim
2014-12-24 13:51   ` Arnaldo Carvalho de Melo
2014-12-24 14:14     ` Namhyung Kim
2014-12-24 14:45       ` Arnaldo Carvalho de Melo
2014-12-26 13:59   ` Jiri Olsa
2014-12-27  5:21     ` Namhyung Kim
2014-12-26 14:02 ` [RFC/PATCHSET 00/37] perf tools: Speed-up perf report by using multi thread (v1) Jiri Olsa
2014-12-27  5:23   ` Namhyung Kim
2015-01-05 18:48 ` Andi Kleen
2015-01-06 15:50   ` Stephane Eranian
2015-01-07  7:13     ` Namhyung Kim
2015-01-07 15:14       ` Stephane Eranian
2015-01-08  5:19         ` Namhyung Kim
2015-01-07  6:58   ` Namhyung Kim
2015-01-08 14:52     ` Andi Kleen
