* [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function
@ 2018-01-24 11:51 Jiri Olsa
  2018-01-24 11:51 ` [PATCH 01/21] " Jiri Olsa
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

hi,
this RFC contains a change that delays the retrieval of a
sample's user space data into task work, as originally
described and discussed by Peter and Ingo in [1].

This patchset tries to follow the original patch with
some kernel changes (described below) and perf tool
support included.

Basically we allow the NMI event code to skip the user data
retrieval and schedule task work to do it before the task
resumes.

Using task work limits the window where we can do this.
We can trigger the delayed task work only if it gets
executed before the process executes again after the NMI,
because we need its stack as it was at the time of the NMI.

That leaves us with a window during the slow syscall path
(check task_struct::perf_user_data_allowed in the patches).

The slow syscall processing is forced for a task when
the user data event is enabled, which makes the task
slower.

On the other hand I noticed a drop of roughly 100us in NMI
processing times, which I plotted in [2].

I'm not sure it's worth introducing this processing, which
adds more processing time and does not show much improvement.
On the other hand, IIRC Peter mentioned it'd be nice to get
user space data retrieval out of NMI.

Also, you guys might think of some other better/faster way ;-)

NOTE: I also implemented putting the user stack data into
delayed processing, which showed nicer numbers. But it's
a little more tricky and brings more changes into this
already big patchset. The logic stays the same, so I did
not include it to keep the patchset simple.

Also available in:
  https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/user_data

thanks for comments,
jirka

[1] https://marc.info/?l=linux-kernel&m=150098372819938&w=2
[2] http://people.redhat.com/~jolsa/ud-bench.png

---
Jiri Olsa (21):
      perf tools: Add perf_evsel__is_sample_bit function
      perf tools: Add perf_sample__process function
      perf tools: Add callchain__printf for pure callchain dump
      perf tools: Add perf_sample__copy|free functions
      perf: Add TIF_PERF_USER_DATA bit
      perf: Add PERF_RECORD_USER_DATA event processing
      perf: Add PERF_SAMPLE_USER_DATA_ID sample type
      perf: Add PERF_SAMPLE_CALLCHAIN to user data event
      perf: Export running sample length values through debugfs
      perf tools: Sync perf_event.h uapi header
      perf tools: Add perf_sample__parse function
      perf tools: Add struct parse_args arg to perf_sample__parse
      perf tools: Add support to parse user data event
      perf tools: Add support to dump user data event info
      perf report: Add delayed user data event processing
      perf record: Enable delayed user data events
      perf script: Add support to display user data events
      perf script: Add support to display user data ID
      perf script: Display USER_DATA misc char for sample
      perf report: Add user data processing stats
      perf report: Add --stats=ud option to display user data debug info

 arch/x86/entry/common.c                  |   6 +++
 arch/x86/events/core.c                   |  18 ++++++++
 arch/x86/events/intel/ds.c               |   4 +-
 arch/x86/include/asm/thread_info.h       |   4 +-
 include/linux/init_task.h                |   4 +-
 include/linux/perf_event.h               |   3 ++
 include/linux/sched.h                    |  20 ++++++++
 include/uapi/linux/perf_event.h          |  34 +++++++++++++-
 kernel/events/core.c                     | 283 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/perf_event.h    |  34 +++++++++++++-
 tools/perf/Documentation/perf-script.txt |   3 +-
 tools/perf/builtin-record.c              |   2 +
 tools/perf/builtin-report.c              | 301 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 tools/perf/builtin-script.c              |  98 +++++++++++++++++++++++++++++++++++++++
 tools/perf/perf.h                        |   1 +
 tools/perf/util/event.c                  |   1 +
 tools/perf/util/event.h                  |   9 ++++
 tools/perf/util/evsel.c                  | 118 +++++++++++++++++++++++++++++++++++++----------
 tools/perf/util/evsel.h                  |   5 ++
 tools/perf/util/session.c                |  60 +++++++++++++++++++-----
 tools/perf/util/thread.c                 |   1 +
 tools/perf/util/thread.h                 |  16 +++++++
 tools/perf/util/tool.h                   |   1 +
 23 files changed, 954 insertions(+), 72 deletions(-)

* [PATCH 01/21] perf tools: Add perf_evsel__is_sample_bit function
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 02/21] perf tools: Add perf_sample__process function Jiri Olsa
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding the perf_evsel__is_sample_bit function to check whether
a given bit is set in the evsel's sample_type. It will be used
later in the patchset.
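
A minimal standalone sketch of the helper/macro pairing, for
readers who want to see the token pasting in isolation. The
mock_* names and the reduced struct are hypothetical stand-ins
for struct perf_evsel, not the perf tool code itself; only the
PERF_SAMPLE_* bit values follow the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of the sample format bits, same values as the uapi header. */
enum mock_sample_format {
	PERF_SAMPLE_IP        = 1U << 0,
	PERF_SAMPLE_TID       = 1U << 1,
	PERF_SAMPLE_CALLCHAIN = 1U << 5,
};

/* Hypothetical stand-in for struct perf_evsel; only sample_type is kept. */
struct mock_evsel {
	uint64_t sample_type;
};

/* Returns true when the given bit is set in the evsel's sample_type. */
static bool mock_evsel__test_sample_bit(const struct mock_evsel *evsel,
					enum mock_sample_format bit)
{
	return (evsel->sample_type & bit) != 0;
}

/* Callers pass the short name; the macro pastes on the PERF_SAMPLE_ prefix. */
#define mock_evsel__is_sample_bit(evsel, bit) \
	mock_evsel__test_sample_bit(evsel, PERF_SAMPLE_##bit)
```

With this, mock_evsel__is_sample_bit(&evsel, CALLCHAIN) expands
to a check of PERF_SAMPLE_CALLCHAIN, which mirrors how the
existing set/reset macros are used.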

Link: http://lkml.kernel.org/n/tip-woz5sp8qxvnhz8h14ne0657g@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/evsel.c | 6 ++++++
 tools/perf/util/evsel.h | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 66fa45198a11..0f62de48594e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -183,6 +183,12 @@ void perf_evsel__calc_id_pos(struct perf_evsel *evsel)
 	evsel->is_pos = __perf_evsel__calc_is_pos(evsel->attr.sample_type);
 }
 
+bool __perf_evsel__is_sample_bit(struct perf_evsel *evsel,
+				 enum perf_event_sample_format bit)
+{
+	return evsel->attr.sample_type & bit;
+}
+
 void __perf_evsel__set_sample_bit(struct perf_evsel *evsel,
 				  enum perf_event_sample_format bit)
 {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 846e41644525..e54ea37469b3 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -234,11 +234,16 @@ int perf_evsel__group_desc(struct perf_evsel *evsel, char *buf, size_t size);
 int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads);
 void perf_evsel__close_fd(struct perf_evsel *evsel);
 
+bool __perf_evsel__is_sample_bit(struct perf_evsel *evsel,
+				 enum perf_event_sample_format bit);
 void __perf_evsel__set_sample_bit(struct perf_evsel *evsel,
 				  enum perf_event_sample_format bit);
 void __perf_evsel__reset_sample_bit(struct perf_evsel *evsel,
 				    enum perf_event_sample_format bit);
 
+#define perf_evsel__is_sample_bit(evsel, bit) \
+	__perf_evsel__is_sample_bit(evsel, PERF_SAMPLE_##bit)
+
 #define perf_evsel__set_sample_bit(evsel, bit) \
 	__perf_evsel__set_sample_bit(evsel, PERF_SAMPLE_##bit)
 
-- 
2.13.6

* [PATCH 02/21] perf tools: Add perf_sample__process function
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
  2018-01-24 11:51 ` [PATCH 01/21] " Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 03/21] perf tools: Add callchain__printf for pure callchain dump Jiri Olsa
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Later we are going to process samples from another place
in the code. Factor out the perf_sample__process function
for that purpose; it prepares the iterator and calls the
hist_entry_iter__add function.

Link: http://lkml.kernel.org/n/tip-bh115ggpykl0wt8e7e3rolwn@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-report.c | 42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 697ccd2c68ca..9bae7f11691c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -193,6 +193,31 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter,
 	return err;
 }
 
+static int
+perf_sample__process(struct perf_sample *sample, struct addr_location *al,
+		     struct perf_evsel *evsel, struct report *rep)
+{
+	struct hist_entry_iter iter = {
+		.evsel			= evsel,
+		.sample			= sample,
+		.hide_unresolved	= symbol_conf.hide_unresolved,
+		.add_entry_cb		= hist_iter__report_callback,
+	};
+
+	if (sort__mode == SORT_MODE__BRANCH) {
+		iter.add_entry_cb = hist_iter__branch_callback;
+		iter.ops = &hist_iter_branch;
+	} else if (rep->mem_mode) {
+		iter.ops = &hist_iter_mem;
+	} else if (symbol_conf.cumulate_callchain) {
+		iter.ops = &hist_iter_cumulative;
+	} else {
+		iter.ops = &hist_iter_normal;
+	}
+
+	return hist_entry_iter__add(&iter, al, rep->max_stack, rep);
+}
+
 static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -201,12 +226,6 @@ static int process_sample_event(struct perf_tool *tool,
 {
 	struct report *rep = container_of(tool, struct report, tool);
 	struct addr_location al;
-	struct hist_entry_iter iter = {
-		.evsel 			= evsel,
-		.sample 		= sample,
-		.hide_unresolved 	= symbol_conf.hide_unresolved,
-		.add_entry_cb 		= hist_iter__report_callback,
-	};
 	int ret = 0;
 
 	if (perf_time__ranges_skip_sample(rep->ptime_range, rep->range_num,
@@ -233,21 +252,12 @@ static int process_sample_event(struct perf_tool *tool,
 		 */
 		if (!sample->branch_stack)
 			goto out_put;
-
-		iter.add_entry_cb = hist_iter__branch_callback;
-		iter.ops = &hist_iter_branch;
-	} else if (rep->mem_mode) {
-		iter.ops = &hist_iter_mem;
-	} else if (symbol_conf.cumulate_callchain) {
-		iter.ops = &hist_iter_cumulative;
-	} else {
-		iter.ops = &hist_iter_normal;
 	}
 
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
 
-	ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
+	ret = perf_sample__process(sample, &al, evsel, rep);
 	if (ret < 0)
 		pr_debug("problem adding hist entry, skipping event\n");
 out_put:
-- 
2.13.6

* [PATCH 03/21] perf tools: Add callchain__printf for pure callchain dump
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
  2018-01-24 11:51 ` [PATCH 01/21] " Jiri Olsa
  2018-01-24 11:51 ` [PATCH 02/21] perf tools: Add perf_sample__process function Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 04/21] perf tools: Add perf_sample__copy|free functions Jiri Olsa
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding callchain__printf for the pure callchain dump
without the LBR portion. It will be used later in this
patchset. The LBR dump is now included in the new
perf_evsel__callchain__printf function.

Link: http://lkml.kernel.org/n/tip-xg1o9sr5p4yfxe61lu13m802@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/session.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index c71ced7db152..da0635e2f100 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -917,15 +917,11 @@ static void callchain__lbr_callstack_printf(struct perf_sample *sample)
 	}
 }
 
-static void callchain__printf(struct perf_evsel *evsel,
-			      struct perf_sample *sample)
+static void callchain__printf(struct perf_sample *sample)
 {
 	unsigned int i;
 	struct ip_callchain *callchain = sample->callchain;
 
-	if (perf_evsel__has_branch_callstack(evsel))
-		callchain__lbr_callstack_printf(sample);
-
 	printf("... FP chain: nr:%" PRIu64 "\n", callchain->nr);
 
 	for (i = 0; i < callchain->nr; i++)
@@ -933,6 +929,16 @@ static void callchain__printf(struct perf_evsel *evsel,
 		       i, callchain->ips[i]);
 }
 
+static void
+perf_evsel__callchain__printf(struct perf_evsel *evsel,
+			      struct perf_sample *sample)
+{
+	if (perf_evsel__has_branch_callstack(evsel))
+		callchain__lbr_callstack_printf(sample);
+
+	callchain__printf(sample);
+}
+
 static void branch_stack__printf(struct perf_sample *sample)
 {
 	uint64_t i;
@@ -1095,7 +1101,7 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 	sample_type = evsel->attr.sample_type;
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN)
-		callchain__printf(evsel, sample);
+		perf_evsel__callchain__printf(evsel, sample);
 
 	if ((sample_type & PERF_SAMPLE_BRANCH_STACK) && !perf_evsel__has_branch_callstack(evsel))
 		branch_stack__printf(sample);
-- 
2.13.6

* [PATCH 04/21] perf tools: Add perf_sample__copy|free functions
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (2 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 03/21] perf tools: Add callchain__printf for pure callchain dump Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 05/21] perf: Add TIF_PERF_USER_DATA bit Jiri Olsa
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Later in the patchset we are going to queue some samples
for later processing. To be able to do that I'm adding
perf_sample__copy|free functions that duplicate|free
perf_sample data.
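
The copy/free idea can be sketched in isolation as below. This
is a reduced model under stated assumptions: the mock_* types
and functions are hypothetical, only the callchain buffer is
modelled, and the size rule (count word plus nr entries) is
taken from the patch. It is not the perf tool implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout idea as a callchain: a count followed by nr entries. */
struct mock_callchain {
	uint64_t nr;
	uint64_t ips[];
};

/* Hypothetical reduced sample; 'copy' marks buffers owned by the sample. */
struct mock_sample {
	int copy;
	struct mock_callchain *callchain;
};

/* Local memdup stand-in: allocate len bytes and copy src into them. */
static void *mock_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Frees the buffers only when the sample owns them. */
static void mock_sample__free(struct mock_sample *sample)
{
	if (sample->copy) {
		free(sample->callchain);
		sample->callchain = NULL;
		sample->copy = 0;
	}
}

/* Deep copy: scalar fields by assignment, pointed-to data by duplication. */
static int mock_sample__copy(struct mock_sample *dst,
			     const struct mock_sample *src)
{
	*dst = *src;
	dst->copy = 0;
	dst->callchain = NULL;

	if (src->callchain) {
		/* One u64 for nr plus nr entries. */
		size_t size = (src->callchain->nr + 1) * sizeof(uint64_t);

		dst->callchain = mock_memdup(src->callchain, size);
		if (!dst->callchain)
			return -1;
	}

	dst->copy = 1;
	return 0;
}
```

In this sketch the copy flag is set only after every
duplication succeeded, so a sample that merely points into the
event buffer is never passed to free().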

Link: http://lkml.kernel.org/n/tip-lcwyac9i125cq89m9msvgu9f@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-report.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/event.h     |  1 +
 2 files changed, 76 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 9bae7f11691c..a08e2c88070a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -193,6 +193,81 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter,
 	return err;
 }
 
+static void perf_sample__free(struct perf_sample *sample)
+{
+	if (sample->copy) {
+		free(sample->callchain);
+		free(sample->raw_data);
+		free(sample->branch_stack);
+		free(sample->user_regs.regs);
+		free(sample->intr_regs.regs);
+		free(sample->user_stack.data);
+	}
+}
+
+static __maybe_unused int
+perf_sample__copy(struct perf_sample *dst, struct perf_sample *src)
+{
+	int ret = -1;
+	u64 size;
+
+	*dst = *src;
+
+	dst->callchain       = NULL;
+	dst->raw_data        = NULL;
+	dst->branch_stack    = NULL;
+	dst->user_regs.regs  = NULL;
+	dst->intr_regs.regs  = NULL;
+	dst->user_stack.data = NULL;
+
+#define DUP(__field, __size)					\
+	do {							\
+		dst->__field = memdup(src->__field, __size);	\
+		if (!dst->__field)				\
+			goto error;				\
+	} while (0)
+
+	if (src->callchain) {
+		size = (src->callchain->nr + 1) * sizeof(u64);
+		DUP(callchain, size);
+	}
+
+	if (src->raw_data)
+		DUP(raw_data, src->raw_size);
+
+	if (src->branch_stack) {
+		size  = sizeof(u64);
+		size += src->branch_stack->nr * sizeof(struct branch_entry);
+		DUP(branch_stack, size);
+	}
+
+	if (src->user_regs.regs) {
+		u64 mask = src->user_regs.mask;
+
+		size = hweight_long(mask) * sizeof(u64);
+		DUP(user_regs.regs, size);
+	}
+
+	if (src->intr_regs.regs) {
+		u64 mask = src->intr_regs.mask;
+
+		size = hweight_long(mask) * sizeof(u64);
+		DUP(intr_regs.regs, size);
+	}
+
+	if (src->user_stack.data)
+		DUP(user_stack.data, src->user_stack.size);
+
+#undef DUP
+	dst->copy = true;
+	ret = 0;
+
+error:
+	if (ret)
+		perf_sample__free(dst);
+	return ret;
+}
+
 static int
 perf_sample__process(struct perf_sample *sample, struct addr_location *al,
 		     struct perf_evsel *evsel, struct report *rep)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 0f794744919c..546539da1592 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -189,6 +189,7 @@ enum {
 #define MAX_INSN 16
 
 struct perf_sample {
+	bool copy;
 	u64 ip;
 	u32 pid, tid;
 	u64 time;
-- 
2.13.6

* [PATCH 05/21] perf: Add TIF_PERF_USER_DATA bit
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (3 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 04/21] perf tools: Add perf_sample__copy|free functions Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 06/21] perf: Add PERF_RECORD_USER_DATA event processing Jiri Olsa
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding the TIF_PERF_USER_DATA bit to be able to limit the code
area where perf's delayed user data retrieval is possible.

A task marked with the TIF_PERF_USER_DATA bit takes the slow
syscall path and marks the 'safe area', in which the user data
harvest can be triggered through task_work, by setting its
perf_user_data_allowed bit.
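
As a rough userspace model of the 'safe area', assuming a
reduced task structure: the mock_* names are hypothetical and
only mirror the two hooks this patch adds on syscall entry and
on exit to user mode. It is a sketch of the idea, not kernel
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit number 5 mirrors TIF_PERF_USER_DATA from the patch. */
#define MOCK_TIF_PERF_USER_DATA	(1U << 5)

/* Hypothetical reduced task: thread flags plus the 'allowed' bit. */
struct mock_task {
	unsigned int flags;
	bool perf_user_data_allowed;
};

/* Models the hook on the slow syscall entry path: open the window. */
static void mock_syscall_enter(struct mock_task *task)
{
	if (task->flags & MOCK_TIF_PERF_USER_DATA)
		task->perf_user_data_allowed = true;
}

/* Models the hook on exit to user mode: close the window. */
static void mock_exit_to_usermode(struct mock_task *task)
{
	if (task->flags & MOCK_TIF_PERF_USER_DATA)
		task->perf_user_data_allowed = false;
}
```

A sample taken between the two calls sees the bit set and may
defer its user data; a sample taken anywhere else does not.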

Link: http://lkml.kernel.org/n/tip-by32kqjb8rhjc0gdnzftur83@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/entry/common.c            | 6 ++++++
 arch/x86/include/asm/thread_info.h | 4 +++-
 include/linux/sched.h              | 3 +++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index d7d3cc24baf4..29d9e5ef0c75 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -77,6 +77,9 @@ static long syscall_trace_enter(struct pt_regs *regs)
 
 	work = READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY;
 
+	if (unlikely(work & _TIF_PERF_USER_DATA))
+		current->perf_user_data_allowed = 1;
+
 	if (unlikely(work & _TIF_SYSCALL_EMU))
 		emulated = true;
 
@@ -191,6 +194,9 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
 
 	cached_flags = READ_ONCE(ti->flags);
 
+	if (unlikely(cached_flags & _TIF_PERF_USER_DATA))
+		current->perf_user_data_allowed = 0;
+
 	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
 		exit_to_usermode_loop(regs, cached_flags);
 
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 00223333821a..f664f4bd25ac 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -80,6 +80,7 @@ struct thread_info {
 #define TIF_SIGPENDING		2	/* signal pending */
 #define TIF_NEED_RESCHED	3	/* rescheduling necessary */
 #define TIF_SINGLESTEP		4	/* reenable singlestep on user return*/
+#define TIF_PERF_USER_DATA	5	/* enable safe area for perf user data retrieval */
 #define TIF_SYSCALL_EMU		6	/* syscall emulation active */
 #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
 #define TIF_SECCOMP		8	/* secure computing */
@@ -106,6 +107,7 @@ struct thread_info {
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_SINGLESTEP		(1 << TIF_SINGLESTEP)
+#define _TIF_PERF_USER_DATA	(1 << TIF_PERF_USER_DATA)
 #define _TIF_SYSCALL_EMU	(1 << TIF_SYSCALL_EMU)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_SECCOMP		(1 << TIF_SECCOMP)
@@ -133,7 +135,7 @@ struct thread_info {
 #define _TIF_WORK_SYSCALL_ENTRY	\
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |	\
 	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT |	\
-	 _TIF_NOHZ)
+	 _TIF_NOHZ | _TIF_PERF_USER_DATA)
 
 /* work to do on any return to user space */
 #define _TIF_ALLWORK_MASK						\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d2588263a989..6e8079524010 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -653,6 +653,9 @@ struct task_struct {
 	/* disallow userland-initiated cgroup migration */
 	unsigned			no_cgroup_migration:1;
 #endif
+#ifdef CONFIG_PERF_EVENTS
+	unsigned			perf_user_data_allowed:1;
+#endif
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
-- 
2.13.6

* [PATCH 06/21] perf: Add PERF_RECORD_USER_DATA event processing
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (4 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 05/21] perf: Add TIF_PERF_USER_DATA bit Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 07/21] perf: Add PERF_SAMPLE_USER_DATA_ID sample type Jiri Olsa
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding support to skip user data retrieval in the event's
NMI processing and delay it to the time when the task is
jumping back to user space.

Task work is used to retrieve the needed user data and
to store a user data event, which is linked by ID to the
original sample.

We can trigger the delayed task work only if the task work
gets executed before the process executes again after the NMI,
because we need its stack as it was at the time of the NMI.

That leaves us with a window during the slow syscall path,
delimited by task_struct::perf_user_data_allowed.

Note that this change only adds a skeleton of this code;
the data retrieval is coming in a following patch.

This patch also adds the PERF_RECORD_USER_DATA event, which
is designed in a similar way to a sample. Having the data area
governed by a 'sample' type will allow us to add multiple
kinds of user data, such as callchain or user stack.
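
The OFF/ENABLE/ON state handling can be modelled in userspace
roughly as follows. All mock_* names are hypothetical, and the
work_queued counter stands in for task_work_add(). In this patch
the deferred type is still always zero, so the sketch shows the
flow as it would look once a later patch supplies a non-zero
type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mock_state { MOCK_OFF = 0, MOCK_ENABLE = 1, MOCK_ON = 2 };

/* Hypothetical reduced per-task state. */
struct mock_user_data {
	enum mock_state state;
	uint64_t type;		/* accumulated deferred sample types */
	int work_queued;	/* how many times work was scheduled */
};

/* Sample preparation: record the deferred type, arm on the first one. */
static void mock_prepare_sample(struct mock_user_data *ud, bool allowed,
				uint64_t type)
{
	if (allowed && type) {
		ud->type |= type;
		if (ud->state == MOCK_OFF)
			ud->state = MOCK_ENABLE;
	}
}

/* Output could not be started: undo an arming that was never used. */
static void mock_prepare_sample_fallback(struct mock_user_data *ud)
{
	if (ud->state == MOCK_ENABLE)
		ud->state = MOCK_OFF;
}

/* Sample written: schedule the work once per window. */
static void mock_output_sample(struct mock_user_data *ud)
{
	if (ud->state == MOCK_ENABLE) {
		ud->state = MOCK_ON;
		ud->work_queued++;
	}
}

/* Task work: report the accumulated type and reset for the next window. */
static uint64_t mock_task_work(struct mock_user_data *ud)
{
	uint64_t type = ud->type;

	ud->type = 0;
	ud->state = MOCK_OFF;
	return type;
}
```

Several samples hitting the same window therefore share a single
queued work item, and the user data event carries the union of
their deferred types.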

Link: http://lkml.kernel.org/n/tip-j8azyyhfgipc6mn8amfpy8hm@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/events/core.c          |  18 +++++
 arch/x86/events/intel/ds.c      |   4 +-
 include/linux/init_task.h       |   4 +-
 include/linux/perf_event.h      |   1 +
 include/linux/sched.h           |  15 +++++
 include/uapi/linux/perf_event.h |  17 ++++-
 kernel/events/core.c            | 145 +++++++++++++++++++++++++++++++++++++++-
 7 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 140d33288e78..8e7fe39a33b8 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2567,3 +2567,21 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	cap->events_mask_len	= x86_pmu.events_mask_len;
 }
 EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
+
+int arch_perf_set_user_data(struct task_struct *task, bool set)
+{
+	struct perf_user_data *ud = &task->perf_user_data;
+
+	mutex_lock(&ud->enabled_mutex);
+
+	ud->enabled_count += set ? 1 : -1;
+	WARN_ON_ONCE(ud->enabled_count < 0);
+
+	if (ud->enabled_count == 1)
+		set_tsk_thread_flag(task, TIF_PERF_USER_DATA);
+	else if (ud->enabled_count == 0)
+		clear_tsk_thread_flag(task, TIF_PERF_USER_DATA);
+
+	mutex_unlock(&ud->enabled_mutex);
+	return 0;
+}
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 8156e47da7ba..a4329c59b195 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -637,8 +637,10 @@ int intel_pmu_drain_bts_buffer(void)
 	perf_prepare_sample(&header, &data, event, &regs);
 
 	if (perf_output_begin(&handle, event, header.size *
-			      (top - base - skip)))
+			      (top - base - skip))) {
+		perf_prepare_sample_fallback(event);
 		goto unlock;
+	}
 
 	for (at = base; at < top; at++) {
 		/* Filter out any records that contain kernel addresses. */
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6a532629c983..55fa53ab9d91 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -157,7 +157,9 @@ extern struct cred init_cred;
 # define INIT_PERF_EVENTS(tsk)						\
 	.perf_event_mutex = 						\
 		 __MUTEX_INITIALIZER(tsk.perf_event_mutex),		\
-	.perf_event_list = LIST_HEAD_INIT(tsk.perf_event_list),
+	.perf_event_list = LIST_HEAD_INIT(tsk.perf_event_list),		\
+	.perf_user_data.enabled_mutex =					\
+		 __MUTEX_INITIALIZER(tsk.perf_user_data.enabled_mutex),
 #else
 # define INIT_PERF_EVENTS(tsk)
 #endif
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 7546822a1d74..b716bbca6f87 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -948,6 +948,7 @@ extern void perf_prepare_sample(struct perf_event_header *header,
 				struct perf_sample_data *data,
 				struct perf_event *event,
 				struct pt_regs *regs);
+void perf_prepare_sample_fallback(struct perf_event *event);
 
 extern int perf_event_overflow(struct perf_event *event,
 				 struct perf_sample_data *data,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6e8079524010..101c49cdde09 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -506,6 +506,20 @@ union rcu_special {
 	u32 s; /* Set of bits. */
 };
 
+enum perf_user_data_state {
+	PERF_USER_DATA_STATE_OFF	= 0,
+	PERF_USER_DATA_STATE_ENABLE	= 1,
+	PERF_USER_DATA_STATE_ON		= 2,
+};
+
+struct perf_user_data {
+	struct callback_head		 work;
+	enum perf_user_data_state	 state;
+	u64				 type;
+	int				 enabled_count;
+	struct mutex			 enabled_mutex;
+};
+
 enum perf_event_task_context {
 	perf_invalid_context = -1,
 	perf_hw_context = 0,
@@ -917,6 +931,7 @@ struct task_struct {
 	struct perf_event_context	*perf_event_ctxp[perf_nr_task_contexts];
 	struct mutex			perf_event_mutex;
 	struct list_head		perf_event_list;
+	struct perf_user_data		perf_user_data;
 #endif
 #ifdef CONFIG_DEBUG_PREEMPT
 	unsigned long			preempt_disable_ip;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c77c9a2ebbbb..f7b152a2f004 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -370,7 +370,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				user_data      :  1, /* generate user data */
+				__reserved_1   : 34;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -618,10 +619,12 @@ struct perf_event_mmap_page {
  *   PERF_RECORD_MISC_MMAP_DATA  - PERF_RECORD_MMAP* events
  *   PERF_RECORD_MISC_COMM_EXEC  - PERF_RECORD_COMM event
  *   PERF_RECORD_MISC_SWITCH_OUT - PERF_RECORD_SWITCH* events
+ *   PERF_RECORD_MISC_USER_DATA  - PERF_RECORD_SAMPLE event
  */
 #define PERF_RECORD_MISC_MMAP_DATA		(1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC		(1 << 13)
 #define PERF_RECORD_MISC_SWITCH_OUT		(1 << 13)
+#define PERF_RECORD_MISC_USER_DATA		(1 << 13)
 /*
  * Indicates that the content of PERF_SAMPLE_IP points to
  * the actual instruction that triggered the event. See also
@@ -922,6 +925,18 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_NAMESPACES			= 16,
 
+	/*
+	 * Records the user space data for previous
+	 * kernel samples.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				sample_type;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_USER_DATA			= 17,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4e1a1bf8d867..8162cadb6736 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -50,6 +50,7 @@
 #include <linux/sched/mm.h>
 #include <linux/proc_ns.h>
 #include <linux/mount.h>
+#include <linux/task_work.h>
 
 #include "internal.h"
 
@@ -4850,6 +4851,11 @@ void __weak arch_perf_update_userpage(
 {
 }
 
+int __weak arch_perf_set_user_data(struct task_struct *task, bool set)
+{
+	return -EINVAL;
+}
+
 /*
  * Callers need to ensure there can be no nesting of this function, otherwise
  * the seqlock logic goes bad. We can not serialize this because the arch
@@ -5938,6 +5944,23 @@ void perf_output_sample(struct perf_output_handle *handle,
 			}
 		}
 	}
+
+	if (event->attr.user_data) {
+		struct perf_user_data *user_data = &current->perf_user_data;
+
+		if (user_data->state == PERF_USER_DATA_STATE_ENABLE) {
+			user_data->state = PERF_USER_DATA_STATE_ON;
+
+			/*
+			 * We cannot do set_notify_resume() from NMI context,
+			 * also, knowing we are already in an interrupted
+			 * context and will pass return to userspace, we can
+			 * simply set TIF_NOTIFY_RESUME.
+			 */
+			task_work_add(current, &user_data->work, false);
+			set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+		}
+	}
 }
 
 static u64 perf_virt_to_phys(u64 virt)
@@ -5972,6 +5995,20 @@ static u64 perf_virt_to_phys(u64 virt)
 	return phys_addr;
 }
 
+struct user_data {
+	u64	type;
+	bool	allow;
+};
+
+static void user_data(struct user_data *ud, struct perf_event *event)
+{
+	ud->allow = event->attr.user_data &&		/* is user data event	*/
+		    current->perf_user_data_allowed &&	/* is in allowed area	*/
+		    current->mm &&			/* is normal task	*/
+		    !(current->flags & PF_EXITING);	/* is not exiting task	*/
+	ud->type  = 0;
+}
+
 static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 
 static struct perf_callchain_entry *
@@ -5998,6 +6035,9 @@ void perf_prepare_sample(struct perf_event_header *header,
 			 struct pt_regs *regs)
 {
 	u64 sample_type = event->attr.sample_type;
+	struct user_data ud;
+
+	user_data(&ud, event);
 
 	header->type = PERF_RECORD_SAMPLE;
 	header->size = sizeof(*header) + event->header_size;
@@ -6111,6 +6151,27 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
+
+	if (ud.allow && ud.type) {
+		struct perf_user_data *user_data = &current->perf_user_data;
+
+		header->misc |= PERF_RECORD_MISC_USER_DATA;
+		user_data->type |= ud.type;
+
+		if (!user_data->state)
+			user_data->state = PERF_USER_DATA_STATE_ENABLE;
+	}
+}
+
+void perf_prepare_sample_fallback(struct perf_event *event)
+{
+	struct perf_user_data *user_data = &current->perf_user_data;
+
+	if (!event->attr.user_data)
+		return;
+
+	if (user_data->state == PERF_USER_DATA_STATE_ENABLE)
+		user_data->state = PERF_USER_DATA_STATE_OFF;
 }
 
 static void __always_inline
@@ -6129,8 +6190,10 @@ __perf_event_output(struct perf_event *event,
 
 	perf_prepare_sample(&header, data, event, regs);
 
-	if (output_begin(&handle, event, header.size))
+	if (output_begin(&handle, event, header.size)) {
+		perf_prepare_sample_fallback(event);
 		goto exit;
+	}
 
 	perf_output_sample(&handle, &header, data, event);
 
@@ -6285,6 +6348,67 @@ perf_iterate_sb(perf_iterate_f output, void *data,
 	rcu_read_unlock();
 }
 
+struct perf_user_data_event {
+	struct {
+		struct perf_event_header	header;
+		u64				type;
+	} event_id;
+};
+
+static void perf_user_data_output(struct perf_event *event, void *data)
+{
+	struct perf_user_data *user_data = &current->perf_user_data;
+	struct perf_user_data_event *user = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	u16 header_size = user->event_id.header.size;
+
+	if (!event->attr.user_data)
+		return;
+
+	user->event_id.type  = event->attr.sample_type & user_data->type;
+
+	perf_event_header__init_id(&user->event_id.header, &sample, event);
+
+	if (perf_output_begin(&handle, event, user->event_id.header.size))
+		goto out;
+
+	perf_output_put(&handle, user->event_id);
+	perf_event__output_id_sample(event, &handle, &sample);
+	perf_output_end(&handle);
+out:
+	user->event_id.header.size = header_size;
+}
+
+static void perf_user_data_event(struct perf_user_data *user_data)
+{
+	struct perf_user_data_event event;
+
+	event = (struct perf_user_data_event) {
+		.event_id = {
+			.header	= {
+				.type = PERF_RECORD_USER_DATA,
+				.misc = 0,
+				.size = sizeof(event.event_id),
+			},
+		},
+	};
+
+	perf_iterate_sb(perf_user_data_output, &event, NULL);
+
+	/*
+	 * User data events are disabled (perf_user_data_allowed),
+	 * so there's no race and we can set new id and zero type.
+	 */
+	user_data->type  = 0;
+	user_data->state = PERF_USER_DATA_STATE_OFF;
+}
+
+static void perf_user_data_work(struct callback_head *work)
+{
+	perf_user_data_event(&current->perf_user_data);
+}
+
 /*
  * Clear all file-based filters at exec, they'll have to be
  * re-instated when/if these objects are mmapped again.
@@ -9919,16 +10043,26 @@ SYSCALL_DEFINE5(perf_event_open,
 		}
 	}
 
+	if (attr.user_data) {
+		if (!task) {
+			err = -EINVAL;
+			goto err_group_fd;
+		}
+		err = arch_perf_set_user_data(task, true);
+		if (err)
+			goto err_task;
+	}
+
 	if (task && group_leader &&
 	    group_leader->attr.inherit != attr.inherit) {
 		err = -EINVAL;
-		goto err_task;
+		goto err_user_data;
 	}
 
 	if (task) {
 		err = mutex_lock_interruptible(&task->signal->cred_guard_mutex);
 		if (err)
-			goto err_task;
+			goto err_user_data;
 
 		/*
 		 * Reuse ptrace permission checks for now.
@@ -10252,6 +10386,9 @@ SYSCALL_DEFINE5(perf_event_open,
 err_cred:
 	if (task)
 		mutex_unlock(&task->signal->cred_guard_mutex);
+err_user_data:
+	if (attr.user_data && task)
+		arch_perf_set_user_data(task, false);
 err_task:
 	if (task)
 		put_task_struct(task);
@@ -10985,6 +11122,8 @@ int perf_event_init_task(struct task_struct *child)
 	memset(child->perf_event_ctxp, 0, sizeof(child->perf_event_ctxp));
 	mutex_init(&child->perf_event_mutex);
 	INIT_LIST_HEAD(&child->perf_event_list);
+	init_task_work(&child->perf_user_data.work, perf_user_data_work);
+	mutex_init(&child->perf_user_data.enabled_mutex);
 
 	for_each_task_context_nr(ctxn) {
 		ret = perf_event_init_context(child, ctxn);
-- 
2.13.6


* [PATCH 07/21] perf: Add PERF_SAMPLE_USER_DATA_ID sample type
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (5 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 06/21] perf: Add PERF_RECORD_USER_DATA event processing Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 08/21] perf: Add PERF_SAMPLE_CALLCHAIN to user data event Jiri Olsa
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

We need an id to connect a sample with its user data
event in case one of them gets lost. Adding new
PERF_SAMPLE_USER_DATA_ID sample type for that purpose.

Samples marked with PERF_RECORD_MISC_USER_DATA carry the
current user data ID, otherwise that field is 0.

This will be used in user space to connect samples
with their USER DATA event and to keep the links
correct when events are lost.
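
A rough sketch of how a tool might use the ID, under a few
assumptions that are not part of this patch: the struct and
function names are made up, all queued samples belong to one
task (the ID lives in the task's perf_user_data), and an ID
lower than the one just received means the matching USER DATA
event was lost, because the kernel only ever increments the ID.
Note that an unmarked sample also carries ID 0, so the misc bit
has to be checked before comparing IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors PERF_RECORD_MISC_USER_DATA (1 << 13) from the uapi header. */
#define MISC_USER_DATA	(1u << 13)

/* Hypothetical tool-side record of a sample waiting for user data. */
struct pending_sample {
	uint16_t misc;		/* header.misc of the sample      */
	uint64_t user_data_id;	/* PERF_SAMPLE_USER_DATA_ID field */
};

/* A sample links to an event only if it is marked AND the IDs agree. */
static bool sample_matches(const struct pending_sample *s, uint64_t event_id)
{
	if (!(s->misc & MISC_USER_DATA))
		return false;
	return s->user_data_id == event_id;
}

/*
 * Walk the pending queue for one task when a USER DATA event with
 * 'event_id' arrives. Returns the number of samples it resolves and
 * stores in *orphans the number of marked samples with an older ID,
 * i.e. samples whose own USER DATA event never showed up.
 */
static size_t resolve_pending(const struct pending_sample *q, size_t n,
			      uint64_t event_id, size_t *orphans)
{
	size_t matched = 0;

	assert(orphans != NULL);
	*orphans = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(q[i].misc & MISC_USER_DATA))
			continue;
		if (q[i].user_data_id == event_id)
			matched++;
		else if (q[i].user_data_id < event_id)
			(*orphans)++;
	}
	return matched;
}
```

Samples with a higher ID than the event are left alone, they
wait for a later USER DATA event.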

Link: http://lkml.kernel.org/n/tip-j8azyyhfgipc6mn8amfpy8hm@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/perf_event.h            |  2 ++
 include/linux/sched.h                 |  1 +
 include/uapi/linux/perf_event.h       | 14 +++++++++++++-
 kernel/events/core.c                  | 21 +++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h |  3 ++-
 5 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b716bbca6f87..225f0787d886 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -918,6 +918,8 @@ struct perf_sample_data {
 	u64				stack_user_size;
 
 	u64				phys_addr;
+
+	u64				user_data_id;
 } ____cacheline_aligned;
 
 /* default value for data source */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 101c49cdde09..a2e041acfc4e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -518,6 +518,7 @@ struct perf_user_data {
 	u64				 type;
 	int				 enabled_count;
 	struct mutex			 enabled_mutex;
+	u64				 id;
 };
 
 enum perf_event_task_context {
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f7b152a2f004..3df8024f54f1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -141,8 +141,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
+	PERF_SAMPLE_USER_DATA_ID		= 1U << 20,
 
-	PERF_SAMPLE_MAX = 1U << 20,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
 };
 
 /*
@@ -823,6 +824,7 @@ enum perf_event_type {
 	 *	{ u64			abi; # enum perf_sample_regs_abi
 	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
 	 *	{ u64			phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+	 *	{ u64			user_data_id;} && PERF_SAMPLE_USER_DATA_ID
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
@@ -932,6 +934,16 @@ enum perf_event_type {
 	 * struct {
 	 *	struct perf_event_header	header;
 	 *	u64				sample_type;
+	 *
+	 *	# The sample_type value could contain following
+	 *	# PERF_SAMPLE_* bits:
+	 *	#
+	 *	#   PERF_SAMPLE_USER_DATA_ID
+	 *	#
+	 *	# and governs the data portion:
+	 *
+	 *	{ u64		user_data_id;} && PERF_SAMPLE_USER_DATA_ID
+	 *
 	 *	struct sample_id		sample_id;
 	 * };
 	 */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8162cadb6736..1edf02dcd6e8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1566,6 +1566,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		size += sizeof(data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_USER_DATA_ID)
+		size += sizeof(data->user_data_id);
+
 	event->header_size = size;
 }
 
@@ -5931,6 +5934,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		perf_output_put(handle, data->phys_addr);
 
+	if (sample_type & PERF_SAMPLE_USER_DATA_ID)
+		perf_output_put(handle, data->user_data_id);
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
@@ -6152,6 +6158,9 @@ void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_PHYS_ADDR)
 		data->phys_addr = perf_virt_to_phys(data->addr);
 
+	if (sample_type & PERF_SAMPLE_USER_DATA_ID)
+		data->user_data_id = 0;
+
 	if (ud.allow && ud.type) {
 		struct perf_user_data *user_data = &current->perf_user_data;
 
@@ -6160,6 +6169,8 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 		if (!user_data->state)
 			user_data->state = PERF_USER_DATA_STATE_ENABLE;
+
+		data->user_data_id = user_data->id;
 	}
 }
 
@@ -6367,13 +6378,21 @@ static void perf_user_data_output(struct perf_event *event, void *data)
 		return;
 
 	user->event_id.type  = event->attr.sample_type & user_data->type;
+	user->event_id.type |= event->attr.sample_type & PERF_SAMPLE_USER_DATA_ID;
 
 	perf_event_header__init_id(&user->event_id.header, &sample, event);
 
+	if (user->event_id.type & PERF_SAMPLE_USER_DATA_ID)
+		user->event_id.header.size += sizeof(u64);
+
 	if (perf_output_begin(&handle, event, user->event_id.header.size))
 		goto out;
 
 	perf_output_put(&handle, user->event_id);
+
+	if (user->event_id.type & PERF_SAMPLE_USER_DATA_ID)
+		perf_output_put(&handle, user_data->id);
+
 	perf_event__output_id_sample(event, &handle, &sample);
 	perf_output_end(&handle);
 out:
@@ -6402,6 +6421,7 @@ static void perf_user_data_event(struct perf_user_data *user_data)
 	 */
 	user_data->type  = 0;
 	user_data->state = PERF_USER_DATA_STATE_OFF;
+	user_data->id++;
 }
 
 static void perf_user_data_work(struct callback_head *work)
@@ -11124,6 +11144,7 @@ int perf_event_init_task(struct task_struct *child)
 	INIT_LIST_HEAD(&child->perf_event_list);
 	init_task_work(&child->perf_user_data.work, perf_user_data_work);
 	mutex_init(&child->perf_user_data.enabled_mutex);
+	child->perf_user_data.id = 0;
 
 	for_each_task_context_nr(ctxn) {
 		ret = perf_event_init_context(child, ctxn);
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c77c9a2ebbbb..dea5e9c32e8a 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -141,8 +141,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
 	PERF_SAMPLE_PHYS_ADDR			= 1U << 19,
+	PERF_SAMPLE_USER_DATA_ID		= 1U << 20,
 
-	PERF_SAMPLE_MAX = 1U << 20,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 21,		/* non-ABI */
 };
 
 /*
-- 
2.13.6


* [PATCH 08/21] perf: Add PERF_SAMPLE_CALLCHAIN to user data event
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (6 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 07/21] perf: Add PERF_SAMPLE_USER_DATA_ID sample type Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 09/21] perf: Export running sample length values through debugfs Jiri Olsa
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding PERF_SAMPLE_CALLCHAIN to the user data event
and allowing user callchain retrieval to be deferred
to the user data task work.

Callchain data is stored in the same way as for
sample events. Also using the same sample type
bits for the USER DATA event 'type' value.
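
For illustration, a minimal sketch of walking the data portion
of a USER DATA event in the order the kernel writes it
(callchain first, then the ID). The struct and function names
are hypothetical and 'array' is assumed to point at the u64
words that follow the event's 'type' field; the trailing
sample_id is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_CALLCHAIN	(1ULL << 5)	/* PERF_SAMPLE_CALLCHAIN    */
#define SAMPLE_USER_DATA_ID	(1ULL << 20)	/* PERF_SAMPLE_USER_DATA_ID */

/* Hypothetical holder for the decoded data portion. */
struct user_data_payload {
	uint64_t	nr;		/* number of callchain entries  */
	const uint64_t *ips;		/* points into the event buffer */
	uint64_t	user_data_id;
};

/*
 * Decode 'words' u64 values according to the 'type' bits.
 * Returns the number of words consumed, or -1 if the buffer
 * is too short for what 'type' and 'nr' announce.
 */
static long parse_user_data(uint64_t type, const uint64_t *array,
			    size_t words, struct user_data_payload *out)
{
	size_t pos = 0;

	assert(out != NULL);
	out->nr = 0;
	out->ips = NULL;
	out->user_data_id = 0;

	if (type & SAMPLE_CALLCHAIN) {
		if (pos >= words)
			return -1;
		out->nr = array[pos++];
		if (out->nr > words - pos)
			return -1;
		out->ips = &array[pos];
		pos += out->nr;
	}

	if (type & SAMPLE_USER_DATA_ID) {
		if (pos >= words)
			return -1;
		out->user_data_id = array[pos++];
	}

	return (long)pos;
}
```

The bounds checks matter because 'nr' comes straight from the
event data and should not be trusted blindly.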

Link: http://lkml.kernel.org/n/tip-drrmdnu591ix4rul0kktud4f@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/sched.h           |  1 +
 include/uapi/linux/perf_event.h |  3 +++
 kernel/events/core.c            | 50 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a2e041acfc4e..97d30eabb266 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -519,6 +519,7 @@ struct perf_user_data {
 	int				 enabled_count;
 	struct mutex			 enabled_mutex;
 	u64				 id;
+	u16				 max_stack;
 };
 
 enum perf_event_task_context {
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 3df8024f54f1..d30583411f97 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -939,9 +939,12 @@ enum perf_event_type {
 	 *	# PERF_SAMPLE_* bits:
 	 *	#
 	 *	#   PERF_SAMPLE_USER_DATA_ID
+	 *	#   PERF_SAMPLE_CALLCHAIN
 	 *	#
 	 *	# and governs the data portion:
 	 *
+	 *	{ u64		nr,
+	 *	  u64		ips[nr];}      && PERF_SAMPLE_CALLCHAIN
 	 *	{ u64		user_data_id;} && PERF_SAMPLE_USER_DATA_ID
 	 *
 	 *	struct sample_id		sample_id;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1edf02dcd6e8..4676fbf681c7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6018,7 +6018,8 @@ static void user_data(struct user_data *ud, struct perf_event *event)
 static struct perf_callchain_entry __empty_callchain = { .nr = 0, };
 
 static struct perf_callchain_entry *
-perf_callchain(struct perf_event *event, struct pt_regs *regs)
+perf_callchain(struct perf_event *event, struct pt_regs *regs,
+	       struct user_data *ud)
 {
 	bool kernel = !event->attr.exclude_callchain_kernel;
 	bool user   = !event->attr.exclude_callchain_user;
@@ -6027,6 +6028,11 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
 	const u32 max_stack = event->attr.sample_max_stack;
 	struct perf_callchain_entry *callchain;
 
+	if (ud->allow && user && !crosstask) {
+		ud->type |= PERF_SAMPLE_CALLCHAIN;
+		user = false;
+	}
+
 	if (!kernel && !user)
 		return &__empty_callchain;
 
@@ -6059,7 +6065,7 @@ void perf_prepare_sample(struct perf_event_header *header,
 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 		int size = 1;
 
-		data->callchain = perf_callchain(event, regs);
+		data->callchain = perf_callchain(event, regs, &ud);
 		size += data->callchain->nr;
 
 		header->size += size * sizeof(u64);
@@ -6166,6 +6172,8 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 		header->misc |= PERF_RECORD_MISC_USER_DATA;
 		user_data->type |= ud.type;
+		user_data->max_stack = max(user_data->max_stack,
+					   event->attr.sample_max_stack);
 
 		if (!user_data->state)
 			user_data->state = PERF_USER_DATA_STATE_ENABLE;
@@ -6360,12 +6368,29 @@ perf_iterate_sb(perf_iterate_f output, void *data,
 }
 
 struct perf_user_data_event {
+	struct perf_callchain_entry	*callchain;
+
 	struct {
 		struct perf_event_header	header;
 		u64				type;
 	} event_id;
 };
 
+static struct perf_callchain_entry *perf_user_callchain(u16 max_stack)
+{
+	struct perf_callchain_entry *callchain;
+
+	callchain = get_perf_callchain(task_pt_regs(current),
+					/* init_nr   */ 0,
+					/* kernel    */ false,
+					/* user      */ true,
+					max_stack,
+					/* crosstask */ false,
+					/* add_mark  */ true);
+
+	return callchain ?: &__empty_callchain;
+}
+
 static void perf_user_data_output(struct perf_event *event, void *data)
 {
 	struct perf_user_data *user_data = &current->perf_user_data;
@@ -6373,6 +6398,7 @@ static void perf_user_data_output(struct perf_event *event, void *data)
 	struct perf_output_handle handle;
 	struct perf_sample_data sample;
 	u16 header_size = user->event_id.header.size;
+	u64 nr;
 
 	if (!event->attr.user_data)
 		return;
@@ -6382,6 +6408,18 @@ static void perf_user_data_output(struct perf_event *event, void *data)
 
 	perf_event_header__init_id(&user->event_id.header, &sample, event);
 
+	if (user->event_id.type & PERF_SAMPLE_CALLCHAIN) {
+		int size = 1;
+
+		nr = user->callchain->nr;
+		nr = min((__u16) nr, event->attr.sample_max_stack);
+
+		size += nr;
+		size *= sizeof(u64);
+
+		user->event_id.header.size += size;
+	}
+
 	if (user->event_id.type & PERF_SAMPLE_USER_DATA_ID)
 		user->event_id.header.size += sizeof(u64);
 
@@ -6390,6 +6428,11 @@ static void perf_user_data_output(struct perf_event *event, void *data)
 
 	perf_output_put(&handle, user->event_id);
 
+	if (user->event_id.type & PERF_SAMPLE_CALLCHAIN) {
+		perf_output_put(&handle, nr);
+		__output_copy(&handle, user->callchain->ip, nr * sizeof(u64));
+	}
+
 	if (user->event_id.type & PERF_SAMPLE_USER_DATA_ID)
 		perf_output_put(&handle, user_data->id);
 
@@ -6413,6 +6456,9 @@ static void perf_user_data_event(struct perf_user_data *user_data)
 		},
 	};
 
+	if (user_data->type & PERF_SAMPLE_CALLCHAIN)
+		event.callchain = perf_user_callchain(user_data->max_stack);
+
 	perf_iterate_sb(perf_user_data_output, &event, NULL);
 
 	/*
-- 
2.13.6


* [PATCH 09/21] perf: Export running sample length values through debugfs
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (7 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 08/21] perf: Add PERF_SAMPLE_CALLCHAIN to user data event Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 10/21] perf tools: Sync perf_event.h uapi header Jiri Olsa
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Exporting the running_sample_length value through debugfs,
via per cpu files:
  /sys/kernel/debug/perf/irq/cpuX/sample_length

and a reset file to zero it:
  /sys/kernel/debug/perf/irq/reset

to allow some basic measurements of the NMI time length.
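
A small sketch of reading one of these files from user space.
The helper names are made up, debugfs is assumed to be mounted
at /sys/kernel/debug, and the path follows init_perf_debugfs()
in the diff below, which creates the 'irq' directory under a
top-level 'perf' directory. The file content is a single
decimal number followed by a newline ("%llu\n").

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the per-cpu file name; returns what snprintf() returns. */
static int sample_length_path(char *buf, size_t size, int cpu)
{
	return snprintf(buf, size,
			"/sys/kernel/debug/perf/irq/cpu%d/sample_length", cpu);
}

/* Parse the "%llu\n" text exported by the kernel. */
static int parse_sample_length(const char *text, uint64_t *val)
{
	unsigned long long v;
	char *end;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno || end == text)
		return -1;

	*val = v;
	return 0;
}

/* Read the current value for one cpu; -1 if the file is not there. */
static int read_sample_length(int cpu, uint64_t *val)
{
	char path[128], line[32];
	int ret = -1;
	FILE *f;

	sample_length_path(path, sizeof(path), cpu);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fgets(line, sizeof(line), f))
		ret = parse_sample_length(line, val);

	fclose(f);
	return ret;
}
```

Writing 0 to the 'reset' file before a measurement gives all
cpus the same starting point.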

Link: http://lkml.kernel.org/n/tip-uodlhfk3zc55fyajtlczr5wd@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/core.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4676fbf681c7..582913b7aba9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -51,6 +51,7 @@
 #include <linux/proc_ns.h>
 #include <linux/mount.h>
 #include <linux/task_work.h>
+#include <linux/debugfs.h>
 
 #include "internal.h"
 
@@ -554,6 +555,72 @@ void perf_sample_event_took(u64 sample_len_ns)
 	}
 }
 
+static int get_sample_length(void *data, u64 *val)
+{
+	unsigned long cpu = (unsigned long) data;
+
+	*val = per_cpu(running_sample_length, cpu);
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(sample_length_fops, get_sample_length, NULL, "%llu\n");
+
+
+static int reset_sample_length(void *data, u64 val)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		per_cpu(running_sample_length, cpu) = val;
+	}
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(reset_fops, NULL, reset_sample_length, "%llu\n");
+
+static __init int init_perf_debugfs(void)
+{
+	struct dentry *root, *irq, *icpu, *file;
+	int cpu, ret = 0;
+
+	root = debugfs_create_dir("perf", NULL);
+	if (!root)
+		return -1;
+
+	irq = debugfs_create_dir("irq", root);
+	if (!irq)
+		return -1;
+
+	for_each_possible_cpu(cpu) {
+		char buf[50];
+
+		snprintf(buf, sizeof(buf), "cpu%d", cpu);
+
+		icpu = debugfs_create_dir(buf, irq);
+		if (!icpu)
+			return -1;
+
+		file = debugfs_create_file("sample_length", 0444, icpu,
+					   (void *)(unsigned long) cpu,
+					   &sample_length_fops);
+		if (!file) {
+			ret = -1;
+			break;
+		}
+	}
+
+	if (!ret) {
+		file = debugfs_create_file("reset", S_IWUSR, irq, NULL, &reset_fops);
+		if (!file)
+			ret = -1;
+	}
+
+	return ret;
+}
+
+late_initcall(init_perf_debugfs);
+
 static atomic64_t perf_event_id;
 
 static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
-- 
2.13.6


* [PATCH 10/21] perf tools: Sync perf_event.h uapi header
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (8 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 09/21] perf: Export running sample length values through debugfs Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 11/21] perf tools: Add perf_sample__parse function Jiri Olsa
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Syncing perf_event.h uapi header with user data changes.

Link: http://lkml.kernel.org/n/tip-hoayrwj8xl8a8oy54m0fhdyo@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/include/uapi/linux/perf_event.h | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index dea5e9c32e8a..d30583411f97 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -371,7 +371,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				user_data      :  1, /* generate user data */
+				__reserved_1   : 34;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -619,10 +620,12 @@ struct perf_event_mmap_page {
  *   PERF_RECORD_MISC_MMAP_DATA  - PERF_RECORD_MMAP* events
  *   PERF_RECORD_MISC_COMM_EXEC  - PERF_RECORD_COMM event
  *   PERF_RECORD_MISC_SWITCH_OUT - PERF_RECORD_SWITCH* events
+ *   PERF_RECORD_MISC_USER_DATA  - PERF_RECORD_SAMPLE event
  */
 #define PERF_RECORD_MISC_MMAP_DATA		(1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC		(1 << 13)
 #define PERF_RECORD_MISC_SWITCH_OUT		(1 << 13)
+#define PERF_RECORD_MISC_USER_DATA		(1 << 13)
 /*
  * Indicates that the content of PERF_SAMPLE_IP points to
  * the actual instruction that triggered the event. See also
@@ -821,6 +824,7 @@ enum perf_event_type {
 	 *	{ u64			abi; # enum perf_sample_regs_abi
 	 *	  u64			regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
 	 *	{ u64			phys_addr;} && PERF_SAMPLE_PHYS_ADDR
+	 *	{ u64			user_data_id;} && PERF_SAMPLE_USER_DATA_ID
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
@@ -923,6 +927,31 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_NAMESPACES			= 16,
 
+	/*
+	 * Records the user space data for previous
+	 * kernel samples.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u64				sample_type;
+	 *
+	 *	# The sample_type value could contain following
+	 *	# PERF_SAMPLE_* bits:
+	 *	#
+	 *	#   PERF_SAMPLE_USER_DATA_ID
+	 *	#   PERF_SAMPLE_CALLCHAIN
+	 *	#
+	 *	# and governs the data portion:
+	 *
+	 *	{ u64		nr,
+	 *	  u64		ips[nr];}      && PERF_SAMPLE_CALLCHAIN
+	 *	{ u64		user_data_id;} && PERF_SAMPLE_USER_DATA_ID
+	 *
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_USER_DATA			= 17,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
-- 
2.13.6


* [PATCH 11/21] perf tools: Add perf_sample__parse function
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (9 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 10/21] perf tools: Sync perf_event.h uapi header Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 12/21] perf tools: Add struct parse_args arg to perf_sample__parse Jiri Olsa
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding perf_sample__parse function to separate the
sample parsing from the rest of perf_evsel__parse_sample.
It will be used later in the patchset.

Link: http://lkml.kernel.org/n/tip-q4u4tp1jra0x5x3vly7wwvir@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/evsel.c | 47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0f62de48594e..5a95839994a1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2045,8 +2045,9 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
 	return 0;
 }
 
-int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
-			     struct perf_sample *data)
+static int
+perf_sample__parse(struct perf_sample *data, struct perf_evsel *evsel,
+		   union perf_event *event)
 {
 	u64 type = evsel->attr.sample_type;
 	bool swapped = evsel->needs_swap;
@@ -2061,26 +2062,8 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 	 */
 	union u64_swap u;
 
-	memset(data, 0, sizeof(*data));
-	data->cpu = data->pid = data->tid = -1;
-	data->stream_id = data->id = data->time = -1ULL;
-	data->period = evsel->attr.sample_period;
-	data->cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
-	data->misc    = event->header.misc;
-	data->id = -1ULL;
-	data->data_src = PERF_MEM_DATA_SRC_NONE;
-
-	if (event->header.type != PERF_RECORD_SAMPLE) {
-		if (!evsel->attr.sample_id_all)
-			return 0;
-		return perf_evsel__parse_id_sample(evsel, event, data);
-	}
-
 	array = event->sample.array;
 
-	if (perf_event__check_size(event, evsel->sample_size))
-		return -EFAULT;
-
 	if (type & PERF_SAMPLE_IDENTIFIER) {
 		data->id = *array;
 		array++;
@@ -2324,6 +2307,30 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 	return 0;
 }
 
+int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
+			     struct perf_sample *data)
+{
+	memset(data, 0, sizeof(*data));
+	data->cpu = data->pid = data->tid = -1;
+	data->stream_id = data->id = data->time = -1ULL;
+	data->period = evsel->attr.sample_period;
+	data->cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	data->misc    = event->header.misc;
+	data->id = -1ULL;
+	data->data_src = PERF_MEM_DATA_SRC_NONE;
+
+	if (event->header.type != PERF_RECORD_SAMPLE) {
+		if (!evsel->attr.sample_id_all)
+			return 0;
+		return perf_evsel__parse_id_sample(evsel, event, data);
+	}
+
+	if (perf_event__check_size(event, evsel->sample_size))
+		return -EFAULT;
+
+	return perf_sample__parse(data, evsel, event);
+}
+
 int perf_evsel__parse_sample_timestamp(struct perf_evsel *evsel,
 				       union perf_event *event,
 				       u64 *timestamp)
-- 
2.13.6


* [PATCH 12/21] perf tools: Add struct parse_args arg to perf_sample__parse
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (10 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 11/21] perf tools: Add perf_sample__parse function Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 13/21] perf tools: Add support to parse user data event Jiri Olsa
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Moving perf_sample__parse function arguments into
struct parse_args, to be able to pass a sample type
other than the one in perf_evsel.
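
To show why the type and the array start are passed separately,
here is a simplified stand-alone sketch. 'struct record' and
parse_args_init() are hypothetical stand-ins for perf's event
and evsel types; the point is only that a regular sample takes
its type from the attr, while a USER DATA event carries its own
type as the first u64 after the header.

```c
#include <assert.h>
#include <stdint.h>

enum {
	REC_SAMPLE	= 9,	/* PERF_RECORD_SAMPLE    */
	REC_USER_DATA	= 17,	/* PERF_RECORD_USER_DATA */
};

/* Hypothetical flattened view of one event plus its evsel. */
struct record {
	uint32_t	type;			/* header.type            */
	uint64_t	attr_sample_type;	/* evsel attr.sample_type */
	const uint64_t *words;			/* u64s after the header  */
};

/* Reduced version of the parse_args idea: type + where to start. */
struct parse_args {
	uint64_t	type;
	const uint64_t *array;
};

static void parse_args_init(struct parse_args *arg, const struct record *rec)
{
	if (rec->type == REC_USER_DATA) {
		/* The event states which sample bits it contains. */
		arg->type  = rec->words[0];
		arg->array = rec->words + 1;
	} else {
		/* Regular sample, layout is given by the attr. */
		arg->type  = rec->attr_sample_type;
		arg->array = rec->words;
	}
}
```

With the arguments set up this way a single parsing routine can
serve both event kinds.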

Link: http://lkml.kernel.org/n/tip-1e1vrg29sr8kbjrmi244yft8@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/evsel.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 5a95839994a1..035da5d1fdd3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2045,13 +2045,21 @@ perf_event__check_size(union perf_event *event, unsigned int sample_size)
 	return 0;
 }
 
+struct parse_args {
+	struct perf_evsel	*evsel;
+	union perf_event	*event;
+	const u64		*array;
+	u64			 type;
+};
+
 static int
-perf_sample__parse(struct perf_sample *data, struct perf_evsel *evsel,
-		   union perf_event *event)
+perf_sample__parse(struct perf_sample *data, struct parse_args *arg)
 {
-	u64 type = evsel->attr.sample_type;
+	struct perf_evsel *evsel = arg->evsel;
+	union  perf_event *event = arg->event;
+	u64 type                 = arg->type;
+	const u64 *array         = arg->array;
 	bool swapped = evsel->needs_swap;
-	const u64 *array;
 	u16 max_size = event->header.size;
 	const void *endp = (void *)event + max_size;
 	u64 sz;
@@ -2062,8 +2070,6 @@ perf_sample__parse(struct perf_sample *data, struct perf_evsel *evsel,
 	 */
 	union u64_swap u;
 
-	array = event->sample.array;
-
 	if (type & PERF_SAMPLE_IDENTIFIER) {
 		data->id = *array;
 		array++;
@@ -2310,6 +2316,13 @@ perf_sample__parse(struct perf_sample *data, struct perf_evsel *evsel,
 int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 			     struct perf_sample *data)
 {
+	struct parse_args arg = {
+		.evsel = evsel,
+		.event = event,
+		.array = event->sample.array,
+		.type  = evsel->attr.sample_type,
+	};
+
 	memset(data, 0, sizeof(*data));
 	data->cpu = data->pid = data->tid = -1;
 	data->stream_id = data->id = data->time = -1ULL;
@@ -2328,7 +2341,7 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 	if (perf_event__check_size(event, evsel->sample_size))
 		return -EFAULT;
 
-	return perf_sample__parse(data, evsel, event);
+	return perf_sample__parse(data, &arg);
 }
 
 int perf_evsel__parse_sample_timestamp(struct perf_evsel *evsel,
-- 
2.13.6


* [PATCH 13/21] perf tools: Add support to parse user data event
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (11 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 12/21] perf tools: Add struct parse_args arg to perf_sample__parse Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 14/21] perf tools: Add support to dump user data event info Jiri Olsa
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding support to parse user data event and prepare
it for later processing.

Link: http://lkml.kernel.org/n/tip-j1pw90h5a9mhecpk949p68gs@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/event.c   |  1 +
 tools/perf/util/event.h   |  8 ++++++++
 tools/perf/util/evsel.c   | 20 ++++++++++++++++++--
 tools/perf/util/session.c | 23 +++++++++++++++++------
 tools/perf/util/tool.h    |  1 +
 5 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 44e603c27944..89f20ae9d949 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -43,6 +43,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_SWITCH]			= "SWITCH",
 	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
 	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
+	[PERF_RECORD_USER_DATA]			= "USER_DATA",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 546539da1592..5ac657aebb67 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -99,6 +99,12 @@ struct sample_event {
 	u64 array[];
 };
 
+struct user_data_event {
+	struct perf_event_header        header;
+	u64 type;
+	u64 array[];
+};
+
 struct regs_dump {
 	u64 abi;
 	u64 mask;
@@ -203,6 +209,7 @@ struct perf_sample {
 	u32 raw_size;
 	u64 data_src;
 	u64 phys_addr;
+	u64 user_data_id;
 	u32 flags;
 	u16 insn_len;
 	u8  cpumode;
@@ -633,6 +640,7 @@ union perf_event {
 	struct read_event		read;
 	struct throttle_event		throttle;
 	struct sample_event		sample;
+	struct user_data_event		user_data;
 	struct attr_event		attr;
 	struct event_update_event	event_update;
 	struct event_type_event		event_type;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 035da5d1fdd3..6f6eab6bc108 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1508,7 +1508,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
 		bit_name(PERIOD), bit_name(STREAM_ID), bit_name(RAW),
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
 		bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
-		bit_name(WEIGHT), bit_name(PHYS_ADDR),
+		bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(USER_DATA_ID),
 		{ .name = NULL, }
 	};
 #undef bit_name
@@ -1602,6 +1602,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(context_switch, p_unsigned);
 	PRINT_ATTRf(write_backward, p_unsigned);
 	PRINT_ATTRf(namespaces, p_unsigned);
+	PRINT_ATTRf(user_data, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
@@ -2310,6 +2311,10 @@ perf_sample__parse(struct perf_sample *data, struct parse_args *arg)
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_USER_DATA_ID) {
+		data->user_data_id = *array;
+		array++;
+	}
 	return 0;
 }
 
@@ -2335,7 +2340,15 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 	if (event->header.type != PERF_RECORD_SAMPLE) {
 		if (!evsel->attr.sample_id_all)
 			return 0;
-		return perf_evsel__parse_id_sample(evsel, event, data);
+
+		perf_evsel__parse_id_sample(evsel, event, data);
+
+		if (event->header.type != PERF_RECORD_USER_DATA)
+			return 0;
+
+		arg.type  = event->user_data.type;
+		arg.array = event->user_data.array;
+		return perf_sample__parse(data, &arg);
 	}
 
 	if (perf_event__check_size(event, evsel->sample_size))
@@ -2493,6 +2506,9 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_PHYS_ADDR)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_USER_DATA_ID)
+		result += sizeof(u64);
+
 	return result;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index da0635e2f100..cb910ea6f0a0 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -361,6 +361,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 {
 	if (tool->sample == NULL)
 		tool->sample = process_event_sample_stub;
+	if (tool->user_data == NULL)
+		tool->user_data = process_event_sample_stub;
 	if (tool->mmap == NULL)
 		tool->mmap = process_event_stub;
 	if (tool->mmap2 == NULL)
@@ -1127,6 +1129,13 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 	if (sample_type & PERF_SAMPLE_TRANSACTION)
 		printf("... transaction: %" PRIx64 "\n", sample->transaction);
 
+	if (sample_type & PERF_SAMPLE_USER_DATA_ID) {
+		if (sample->misc & PERF_RECORD_MISC_USER_DATA)
+			printf("... user data ID: %" PRIu64 "\n", sample->user_data_id);
+		else
+			printf("... user data ID: N/A\n");
+	}
+
 	if (sample_type & PERF_SAMPLE_READ)
 		sample_read__printf(sample, evsel->attr.read_format);
 }
@@ -1225,12 +1234,12 @@ static int deliver_sample_group(struct perf_evlist *evlist,
 }
 
 static int
- perf_evlist__deliver_sample(struct perf_evlist *evlist,
-			     struct perf_tool *tool,
-			     union  perf_event *event,
-			     struct perf_sample *sample,
-			     struct perf_evsel *evsel,
-			     struct machine *machine)
+perf_evlist__deliver_sample(struct perf_evlist *evlist,
+			    struct perf_tool *tool,
+			    union  perf_event *event,
+			    struct perf_sample *sample,
+			    struct perf_evsel *evsel,
+			    struct machine *machine)
 {
 	/* We know evsel != NULL. */
 	u64 sample_type = evsel->attr.sample_type;
@@ -1276,6 +1285,8 @@ static int machines__deliver_event(struct machines *machines,
 			return 0;
 		}
 		return perf_evlist__deliver_sample(evlist, tool, event, sample, evsel, machine);
+	case PERF_RECORD_USER_DATA:
+		return tool->user_data(tool, event, sample, evsel, machine);
 	case PERF_RECORD_MMAP:
 		return tool->mmap(tool, event, sample, machine);
 	case PERF_RECORD_MMAP2:
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 183c91453522..9ae190d4d0aa 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -43,6 +43,7 @@ enum show_feature_header {
 
 struct perf_tool {
 	event_sample	sample,
+			user_data,
 			read;
 	event_op	mmap,
 			mmap2,
-- 
2.13.6

* [PATCH 14/21] perf tools: Add support to dump user data event info
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (12 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 13/21] perf tools: Add support to parse user data event Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 15/21] perf report: Add delayed user data event processing Jiri Olsa
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Dump the user data event in perf report -D output. It looks like this:

  0x25de3e [0x58]: event: 17
  .
  . ... raw event: size 88 bytes
  .  0000:  11 00 00 00 00 00 58 00 20 00 10 00 00 00 00 00  ......X. .......

  SNIP

  1214276524830 0x25de3e [0x58]: PERF_RECORD_USER_DATA 29014/29014: type: 0x100020, id 5
  ... FP chain: nr:5
  .....  0: fffffffffffffe00
  .....  1: 00007fadc451cd77
  .....  2: 00007fadc4504eb1
  .....  3: 00007fadc451af0f
  .....  4: 0000000000000040
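
The raw bytes in the dump above can be decoded by hand. The
sketch below is only an illustration, not part of the patch: it
assumes a little-endian record and the usual perf_event_header
layout (u32 type, u16 misc, u16 size), followed by the u64 type
field of struct user_data_event. The names hdr_sketch, raw_event,
read_le and decode_user_data are hypothetical.

```c
#include <stdint.h>

/* Same field order and widths as struct perf_event_header. */
struct hdr_sketch {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* First 16 bytes of the raw event from the dump above. */
static const unsigned char raw_event[16] = {
	0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x58, 0x00,
	0x20, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Read an n-byte little-endian value starting at p. */
static uint64_t read_le(const unsigned char *p, int n)
{
	uint64_t v = 0;

	for (int i = n - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Split the buffer into the header and the user data type mask. */
static void decode_user_data(const unsigned char *buf,
			     struct hdr_sketch *h, uint64_t *type)
{
	h->type = (uint32_t)read_le(buf, 4);
	h->misc = (uint16_t)read_le(buf + 4, 2);
	h->size = (uint16_t)read_le(buf + 6, 2);
	*type   = read_le(buf + 8, 8);
}
```

Decoding raw_event this way gives header type 17 (0x11), size 88
(0x58) and a type mask of 0x100020, which agrees with the
"event: 17", "size 88 bytes" and "type: 0x100020" fields in the
dump. In that mask 0x20 is PERF_SAMPLE_CALLCHAIN; the remaining
bit 0x100000 is presumably the PERF_SAMPLE_USER_DATA_ID bit added
by this series, which is an assumption here.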

Link: http://lkml.kernel.org/n/tip-d3qh6jythy3cjg3askwn8fd1@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/session.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index cb910ea6f0a0..218b36e76d4d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1164,6 +1164,24 @@ static void dump_read(struct perf_evsel *evsel, union perf_event *event)
 		printf("... id           : %" PRIu64 "\n", read_event->id);
 }
 
+static void
+dump_user_data(union perf_event *event, struct perf_sample *sample)
+{
+	u64 type;
+
+	if (!dump_trace)
+		return;
+
+	printf(" %d/%d: type: %#" PRIx64 ", id %" PRIu64 "\n",
+	       sample->pid, sample->tid, event->user_data.type,
+	       sample->user_data_id);
+
+	type = event->user_data.type;
+
+	if (type & PERF_SAMPLE_CALLCHAIN)
+		callchain__printf(sample);
+}
+
 static struct machine *machines__find_for_cpumode(struct machines *machines,
 					       union perf_event *event,
 					       struct perf_sample *sample)
@@ -1286,6 +1304,7 @@ static int machines__deliver_event(struct machines *machines,
 		}
 		return perf_evlist__deliver_sample(evlist, tool, event, sample, evsel, machine);
 	case PERF_RECORD_USER_DATA:
+		dump_user_data(event, sample);
 		return tool->user_data(tool, event, sample, evsel, machine);
 	case PERF_RECORD_MMAP:
 		return tool->mmap(tool, event, sample, machine);
-- 
2.13.6

* [PATCH 15/21] perf report: Add delayed user data event processing
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (13 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 14/21] perf tools: Add support to dump user data event info Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 16/21] perf record: Enable delayed user data events Jiri Olsa
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding support to process user data events and attach
their data to samples.

Every sample that has delayed user data
(PERF_RECORD_MISC_USER_DATA bit set) is copied and
stored on a per-thread list.

When a USER DATA event arrives for that thread, we
walk the list and attach the user data to the matching
samples (using the USER_DATA_ID value).

The data is already sorted at this point, so we don't
need to worry about wrongly skipping samples because of
an out-of-order USER DATA event. We also attach data
only to samples of the same event.

However, event loss is still possible, which is why we
match USER_DATA_ID between the sample and the USER DATA
event. Samples that do not match are removed.
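
The matching logic described above can be sketched in isolation.
This is a simplified model under stated assumptions, not the code
from the patch: it uses a plain singly linked list instead of
list_head, keeps only the two IDs of each sample, and counts
results instead of adding hist entries. All names (pending,
thread_sketch, queue_sample, flush_user_data) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* One delayed sample waiting for its USER DATA event. */
struct pending {
	uint64_t	id;		/* event (evsel) id */
	uint64_t	user_data_id;	/* USER_DATA_ID from the sample */
	struct pending	*next;
};

struct thread_sketch {
	struct pending	*head;
	unsigned int	matched;
	unsigned int	dropped;
};

/* Append a delayed sample to the tail of the per-thread list. */
static int queue_sample(struct thread_sketch *t, uint64_t id,
			uint64_t user_data_id)
{
	struct pending *p = calloc(1, sizeof(*p));
	struct pending **pp;

	if (!p)
		return -1;

	p->id = id;
	p->user_data_id = user_data_id;

	for (pp = &t->head; *pp; pp = &(*pp)->next)
		;
	*pp = p;
	return 0;
}

/*
 * A USER DATA event arrived: samples of other events stay queued,
 * samples of this event are released, either matched or dropped.
 */
static void flush_user_data(struct thread_sketch *t, uint64_t id,
			    uint64_t user_data_id)
{
	struct pending **pp = &t->head;

	while (*pp) {
		struct pending *p = *pp;

		if (p->id != id) {
			pp = &p->next;
			continue;
		}

		if (p->user_data_id == user_data_id)
			t->matched++;
		else
			t->dropped++;

		*pp = p->next;
		free(p);
	}
}
```

Because the input is time ordered, any queued sample of the same
event with a different USER_DATA_ID can only belong to a USER DATA
event that was lost, so it is safe to drop it at this point.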

Link: http://lkml.kernel.org/n/tip-hacxbv4wzvpt50nf8wj6lk2w@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-report.c | 127 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/thread.c    |   1 +
 tools/perf/util/thread.h    |   8 +++
 3 files changed, 136 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a08e2c88070a..a1cd5fd793fc 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -293,6 +293,31 @@ perf_sample__process(struct perf_sample *sample, struct addr_location *al,
 	return hist_entry_iter__add(&iter, al, rep->max_stack, rep);
 }
 
+static int
+perf_thread__add_user_data(struct thread *thread,
+			   struct perf_sample *sample,
+			   struct addr_location *al,
+			   struct perf_evsel *evsel)
+{
+	struct user_data *entry;
+
+	entry = zalloc(sizeof(*entry));
+	if (!entry)
+		return -ENOMEM;
+
+	entry->al    = *al;
+	entry->evsel = evsel;
+	INIT_LIST_HEAD(&entry->list);
+
+	if (perf_sample__copy(&entry->sample, sample)) {
+		free(entry);
+		return -ENOMEM;
+	}
+
+	list_add_tail(&entry->list, &thread->user_data_list);
+	return 0;
+}
+
 static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -332,6 +357,9 @@ static int process_sample_event(struct perf_tool *tool,
 	if (al.map != NULL)
 		al.map->dso->hit = 1;
 
+	if (event->header.misc & PERF_RECORD_MISC_USER_DATA)
+		return perf_thread__add_user_data(al.thread, sample, &al, evsel);
+
 	ret = perf_sample__process(sample, &al, evsel, rep);
 	if (ret < 0)
 		pr_debug("problem adding hist entry, skipping event\n");
@@ -340,6 +368,104 @@ static int process_sample_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int
+perf_sample__add_user_callchain(struct perf_sample *sample,
+				struct perf_sample *user)
+{
+	struct ip_callchain *sc = sample->callchain;
+	struct ip_callchain *uc = user->callchain;
+	struct ip_callchain *new;
+	u64 nr = 1 + sc->nr + uc->nr;
+
+	new = zalloc(nr * sizeof(u64));
+	if (!new)
+		return -ENOMEM;
+
+	new->nr = sc->nr + uc->nr;
+	memcpy(new->ips,          sc->ips, sc->nr * sizeof(u64));
+	memcpy(new->ips + sc->nr, uc->ips, uc->nr * sizeof(u64));
+
+	free(sample->callchain);
+	sample->callchain = new;
+	return 0;
+}
+
+static int
+perf_sample__add_user_data(struct perf_sample *sample,
+			   struct perf_sample *user,
+			   u64 type)
+{
+	int ret = 0;
+
+	if (type & PERF_SAMPLE_CALLCHAIN)
+		ret = perf_sample__add_user_callchain(sample, user);
+
+	return ret;
+}
+
+static int
+user_data__process(struct user_data *entry, struct perf_sample *sample,
+		   struct user_data_event *event, struct report *rep)
+{
+	int ret;
+
+	ret = perf_sample__add_user_data(&entry->sample, sample, event->type);
+	if (ret)
+		return ret;
+
+	return perf_sample__process(&entry->sample, &entry->al, entry->evsel, rep);
+}
+
+static int
+thread__flush_user_data(struct thread *thread,
+			struct user_data_event *event,
+			struct perf_sample *sample,
+			struct report *rep)
+{
+	struct user_data *entry, *p;
+	int ret = 0;
+
+	list_for_each_entry_safe(entry, p, &thread->user_data_list, list) {
+		/* different event, skip it */
+		if (entry->sample.id != sample->id)
+			continue;
+
+		/*
+		 * We process only matching IDs, if we don't match in here
+		 * it means we've lost master sample, remove user data event
+		 * without any action.
+		 */
+		if (entry->sample.user_data_id == sample->user_data_id) {
+			ret = user_data__process(entry, sample, event, rep);
+			if (ret)
+				pr_debug("problem adding hist entry, skipping event\n");
+		}
+
+		list_del(&entry->list);
+		perf_sample__free(&entry->sample);
+		free(entry);
+	}
+
+	return ret;
+}
+
+static int
+process_user_data_event(struct perf_tool *tool,
+			union perf_event *event,
+			struct perf_sample *sample,
+			struct perf_evsel *evsel __maybe_unused,
+			struct machine *machine)
+{
+	struct report *rep = container_of(tool, struct report, tool);
+	struct thread *thread = machine__findnew_thread(machine, sample->pid,
+							sample->tid);
+
+	if (thread == NULL)
+		return -1;
+
+	return thread__flush_user_data(thread, &event->user_data, sample, rep);
+}
+
 static int process_read_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample __maybe_unused,
@@ -1025,6 +1151,7 @@ int cmd_report(int argc, const char **argv)
 	struct report report = {
 		.tool = {
 			.sample		 = process_sample_event,
+			.user_data	 = process_user_data_event,
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 68b65b10579b..e9008245a421 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -48,6 +48,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		INIT_LIST_HEAD(&thread->comm_list);
 		init_rwsem(&thread->namespaces_lock);
 		init_rwsem(&thread->comm_lock);
+		INIT_LIST_HEAD(&thread->user_data_list);
 
 		comm_str = malloc(32);
 		if (!comm_str)
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 40cfa36c022a..e78b295bbcdb 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -15,6 +15,13 @@
 struct thread_stack;
 struct unwind_libunwind_ops;
 
+struct user_data {
+	struct perf_sample	 sample;
+	struct addr_location	 al;
+	struct list_head	 list;
+	struct perf_evsel	*evsel;
+};
+
 struct thread {
 	union {
 		struct rb_node	 rb_node;
@@ -34,6 +41,7 @@ struct thread {
 	struct rw_semaphore	namespaces_lock;
 	struct list_head	comm_list;
 	struct rw_semaphore	comm_lock;
+	struct list_head	user_data_list;
 	u64			db_id;
 
 	void			*priv;
-- 
2.13.6

* [PATCH 16/21] perf record: Enable delayed user data events
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (14 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 15/21] perf report: Add delayed user data event processing Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 17/21] perf script: Add support to display " Jiri Olsa
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Enabling user data events when the right combination
of options is detected:

  - user space callchains are enabled
  - kernel space is not omitted (it's the only time
    USER DATA events can be generated)

Adding --no-user-data option to prevent this.
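
The conditions above boil down to a single predicate. The sketch
below is a standalone illustration with a hypothetical
attr_sketch structure standing in for the relevant
perf_event_attr bits; want_user_data is not a function from the
patch.

```c
#include <stdbool.h>

/* Only the bits that take part in the decision. */
struct attr_sketch {
	bool exclude_kernel;		/* kernel space omitted */
	bool exclude_callchain_user;	/* user callchains omitted */
	bool sample_callchain;		/* CALLCHAIN sample bit set */
};

/* Should delayed user data be requested for this event? */
static bool want_user_data(const struct attr_sketch *attr,
			   bool no_user_data)
{
	if (no_user_data)
		return false;

	return !attr->exclude_kernel &&
	       attr->sample_callchain &&
	       !attr->exclude_callchain_user;
}
```

With this shape, --no-user-data always wins, and an event that
excludes the kernel never asks for delayed user data.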

Link: http://lkml.kernel.org/n/tip-x2q1z2v9njw3f0nondg779i5@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-record.c |  2 ++
 tools/perf/perf.h           |  1 +
 tools/perf/util/evsel.c     | 26 ++++++++++++++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f251e824edac..fa87389ebfac 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1633,6 +1633,8 @@ static struct option __record_options[] = {
 		    "append timestamp to output filename"),
 	OPT_BOOLEAN(0, "timestamp-boundary", &record.timestamp_boundary,
 		    "Record timestamp boundary (time of first/last samples)"),
+	OPT_BOOLEAN(0, "no-user-data", &record.opts.no_user_data,
+		    "disable user data events"),
 	OPT_STRING_OPTARG_SET(0, "switch-output", &record.switch_output.str,
 			  &record.switch_output.set, "signal,size,time",
 			  "Switch output when receive SIGUSR2 or cross size,time threshold",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 2357f4ccc9c7..23fb0ffac73c 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -42,6 +42,7 @@ struct record_opts {
 	bool	     no_inherit;
 	bool	     no_inherit_set;
 	bool	     no_samples;
+	bool	     no_user_data;
 	bool	     raw_samples;
 	bool	     sample_address;
 	bool	     sample_phys_addr;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 6f6eab6bc108..1d3c9fb91881 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -51,6 +51,7 @@ static struct {
 	bool lbr_flags;
 	bool write_backward;
 	bool group_read;
+	bool user_data;
 } perf_missing_features;
 
 static clockid_t clockid;
@@ -1079,6 +1080,23 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	apply_config_terms(evsel, opts, track);
 
 	evsel->ignore_missing_thread = opts->ignore_missing_thread;
+
+	/*
+	 * Enable delayed user data processing,
+	 * if it's allowed and if there's any.
+	 */
+	if (!opts->no_user_data) {
+		bool has_callchain = perf_evsel__is_sample_bit(evsel, CALLCHAIN);
+		bool user_data;
+
+		user_data  = !attr->exclude_kernel;
+		user_data &= (has_callchain && !attr->exclude_callchain_user);
+
+		if (user_data) {
+			attr->user_data = 1;
+			perf_evsel__set_sample_bit(evsel, USER_DATA_ID);
+		}
+	}
 }
 
 static int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
@@ -1762,6 +1780,10 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 				     PERF_SAMPLE_BRANCH_NO_CYCLES);
 	if (perf_missing_features.group_read && evsel->attr.inherit)
 		evsel->attr.read_format &= ~(PERF_FORMAT_GROUP|PERF_FORMAT_ID);
+	if (perf_missing_features.user_data) {
+		evsel->attr.user_data    = 0;
+		evsel->attr.sample_type &= ~PERF_SAMPLE_USER_DATA_ID;
+	}
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->attr.sample_id_all = 0;
@@ -1923,6 +1945,10 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 		perf_missing_features.group_read = true;
 		pr_debug2("switching off group read\n");
 		goto fallback_missing_features;
+	} else if (!perf_missing_features.user_data) {
+		perf_missing_features.user_data = true;
+		pr_debug2("switching off user data events\n");
+		goto fallback_missing_features;
 	}
 out_close:
 	do {
-- 
2.13.6

* [PATCH 17/21] perf script: Add support to display user data events
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (15 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 16/21] perf record: Enable delayed user data events Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 18/21] perf script: Add support to display user data ID Jiri Olsa
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding option to display user data events:

  $ perf script -F +misc,+tid --show-userdata-events ...

   sched-messaging  1410 KD    28690.634993:       2164 cycles:ppp:  ffffffff8985bb70 nmi_res ...
   sched-messaging  1410 KD    28690.635003:       8920 cycles:ppp:  ffffffff891b5e70 __perf_ ...
   sched-messaging  1410       28690.635043: PERF_RECORD_USER_DATA type 0x20 (callchain)
   sched-messaging  1407 KD    28690.635117:     319262 cycles:ppp:  ffffffff89206c3c copy_pa ...
   sched-messaging  1407       28690.635226: PERF_RECORD_USER_DATA type 0x20 (callchain)
   sched-messaging  1411 KD    28690.635318:          1 cycles:ppp:  ffffffff8905aa54 native_ ...

Link: http://lkml.kernel.org/n/tip-ozg2qt7ofv69e1hddxwuf0hp@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-script.c | 85 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ab19a6ee4093..2b8231292fe2 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1489,6 +1489,7 @@ struct perf_script {
 	bool			show_switch_events;
 	bool			show_namespace_events;
 	bool			show_lost_events;
+	bool			show_userdata_events;
 	bool			allocated;
 	bool			per_event_dump;
 	struct cpu_map		*cpus;
@@ -2104,6 +2105,85 @@ process_lost_event(struct perf_tool *tool,
 	return 0;
 }
 
+static size_t
+perf_event__fprintf_user_data(union perf_event *event,
+			      struct perf_sample *sample,
+			      struct perf_evsel *evsel,
+			      struct perf_script *script,
+			      FILE *fp)
+{
+	const char *evname = perf_evsel__name(evsel);
+	u64 type = event->user_data.type;
+	bool single = true;
+	size_t printed = 0;
+
+	printed += fprintf(fp, "PERF_RECORD_%s ",
+			   perf_event__name(event->header.type));
+
+	if (!script->name_width)
+		script->name_width = perf_evlist__max_name_len(script->session->evlist);
+
+	printed += fprintf(fp, "%*s: ", script->name_width, evname ?: "[unknown]");
+
+	printed += fprintf(fp, " id %lu, type 0x%lx (",
+			   sample->user_data_id, event->user_data.type);
+
+	if (type & PERF_SAMPLE_CALLCHAIN) {
+		printed += fprintf(fp, "callchain");
+		single = false;
+	}
+
+	if (type & PERF_SAMPLE_STACK_USER) {
+		printed += fprintf(fp, "%sstack", !single ? "," : "");
+		single = false;
+	}
+
+	if (type & PERF_SAMPLE_USER_DATA_ID)
+		printed += fprintf(fp, "%sid", !single ? "," : "");
+
+	printed += fprintf(fp, ")\n");
+	return printed;
+}
+
+static int
+process_user_data_event(struct perf_tool *tool __maybe_unused,
+			union perf_event *event __maybe_unused,
+			struct perf_sample *sample,
+			struct perf_evsel *evsel,
+			struct machine *machine)
+{
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_event_attr *attr = &evsel->attr;
+	unsigned int type = output_type(attr->type);
+	struct thread *thread;
+
+	thread = machine__findnew_thread(machine, sample->pid,
+					 sample->tid);
+	if (thread == NULL)
+		return -1;
+
+	perf_sample__fprintf_start(sample, thread, evsel,
+				   PERF_RECORD_SAMPLE, stdout);
+	perf_event__fprintf_user_data(event, sample, evsel, script, stdout);
+
+	if (PRINT_FIELD(IP)) {
+		struct callchain_cursor *cursor = NULL;
+
+		if (symbol_conf.use_callchain && sample->callchain &&
+		    thread__resolve_callchain(thread, &callchain_cursor, evsel,
+					      sample, NULL, NULL, scripting_max_stack) == 0)
+			cursor = &callchain_cursor;
+
+		if (cursor) {
+			sample__fprintf_callchain(sample, 0, output[type].print_ip_opts, cursor, stdout);
+			fprintf(stdout, "\n");
+		}
+	}
+
+	thread__put(thread);
+	return 0;
+}
+
 static void sig_handler(int sig __maybe_unused)
 {
 	session_done = 1;
@@ -2200,6 +2280,9 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.namespaces = process_namespaces_event;
 	if (script->show_lost_events)
 		script->tool.lost = process_lost_event;
+	if (script->show_userdata_events)
+		script->tool.user_data = process_user_data_event;
+
 
 	if (perf_script__setup_per_event_dump(script)) {
 		pr_err("Couldn't create the per event dump files\n");
@@ -3139,6 +3222,8 @@ int cmd_script(int argc, const char **argv)
 		    "Show namespace events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-lost-events", &script.show_lost_events,
 		    "Show lost events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-userdata-events", &script.show_userdata_events,
+		    "Show userdata events (if recorded)"),
 	OPT_BOOLEAN('\0', "per-event-dump", &script.per_event_dump,
 		    "Dump trace output to files named by the monitored events"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
-- 
2.13.6

* [PATCH 18/21] perf script: Add support to display user data ID
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (16 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 17/21] perf script: Add support to display " Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 19/21] perf script: Display USER_DATA misc char for sample Jiri Olsa
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding new user_data_id field argument for the -F option
to display a column with the user data ID for each sample.

The new column displays the USER_DATA_ID if the sample
has user data attached (PERF_RECORD_MISC_USER_DATA bit set),
or N/A otherwise.

  $ perf script -F+user_data_id ...
              perf 29014  ... cycles:ppp:              N/A ffffffff870d013d __upda...
   sched-messaging 29014  ... cycles:ppp:                8 ffffffff8741d15a flex_a...
                                   new column _________/
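
The column is a fixed 16 characters wide in both cases, so the
output stays aligned. A minimal sketch of that formatting,
assuming a hypothetical helper format_user_data_id that writes
into a caller-supplied buffer instead of a FILE:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the user data ID column: the ID when the sample carries
 * delayed user data, "N/A" otherwise. Both are right-aligned in
 * a 16 character field. Returns the snprintf() result.
 */
static int format_user_data_id(char *buf, size_t len,
			       bool has_user_data, uint64_t id)
{
	if (has_user_data)
		return snprintf(buf, len, "%16" PRIu64, id);

	return snprintf(buf, len, "%16s", "N/A");
}
```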

Link: http://lkml.kernel.org/n/tip-ujjq7auvaw49slr57sarh0cu@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Documentation/perf-script.txt | 2 +-
 tools/perf/builtin-script.c              | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 7730c1d2b5d3..0d1db8c7d2d1 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -117,7 +117,7 @@ OPTIONS
         Comma separated list of fields to print. Options are:
         comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
         srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output, brstackinsn,
-        brstackoff, callindent, insn, insnlen, synth, phys_addr, metric, misc.
+        brstackoff, callindent, insn, insnlen, synth, phys_addr, metric, misc, user_data_id.
         Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
         e.g., -F sw:comm,tid,time,ip,sym  and -F trace:time,cpu,trace
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 2b8231292fe2..7baa2e5d7f9e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -95,6 +95,7 @@ enum perf_output_field {
 	PERF_OUTPUT_UREGS	    = 1U << 27,
 	PERF_OUTPUT_METRIC	    = 1U << 28,
 	PERF_OUTPUT_MISC            = 1U << 29,
+	PERF_OUTPUT_USER_DATA_ID    = 1U << 30,
 };
 
 struct output_option {
@@ -131,6 +132,7 @@ struct output_option {
 	{.str = "phys_addr", .field = PERF_OUTPUT_PHYS_ADDR},
 	{.str = "metric", .field = PERF_OUTPUT_METRIC},
 	{.str = "misc", .field = PERF_OUTPUT_MISC},
+	{.str = "user_data_id", .field = PERF_OUTPUT_USER_DATA_ID},
 };
 
 enum {
@@ -1670,6 +1672,13 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(WEIGHT))
 		fprintf(fp, "%16" PRIu64, sample->weight);
 
+	if (PRINT_FIELD(USER_DATA_ID)) {
+		if (sample->misc & PERF_RECORD_MISC_USER_DATA)
+			fprintf(fp, "%16" PRIu64, sample->user_data_id);
+		else
+			fprintf(fp, "%16s", "N/A");
+	}
+
 	if (PRINT_FIELD(IP)) {
 		struct callchain_cursor *cursor = NULL;
 
-- 
2.13.6

* [PATCH 19/21] perf script: Display USER_DATA misc char for sample
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (17 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 18/21] perf script: Add support to display user data ID Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 20/21] perf report: Add user data processing stats Jiri Olsa
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding support to display the USER_DATA misc char
for sample events:

  # perf script -F +misc ...
   sched-messaging  1414 KD    28690.636582:       4590 cycles ...
  new misc field   ______/

Assigning the letter 'D' for PERF_RECORD_MISC_USER_DATA.

Link: http://lkml.kernel.org/n/tip-8wczn7vnmgd98cxpemq9g832@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/Documentation/perf-script.txt | 1 +
 tools/perf/builtin-script.c              | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 0d1db8c7d2d1..d1425f067ad5 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -236,6 +236,7 @@ OPTIONS
 	  PERF_RECORD_MISC_MMAP_DATA*    M
 	  PERF_RECORD_MISC_COMM_EXEC     E
 	  PERF_RECORD_MISC_SWITCH_OUT    S
+	  PERF_RECORD_MISC_USER_DATA     D
 
 	  $ perf script -F +misc ...
 	   sched-messaging  1414 K     28690.636582:       4590 cycles ...
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7baa2e5d7f9e..45473c2d6e25 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -648,6 +648,10 @@ static int perf_sample__fprintf_start(struct perf_sample *sample,
 			ret += fprintf(fp, "g");
 
 		switch (type) {
+		case PERF_RECORD_SAMPLE:
+			if (has(USER_DATA))
+				ret += fprintf(fp, "D");
+			break;
 		case PERF_RECORD_MMAP:
 		case PERF_RECORD_MMAP2:
 			if (has(MMAP_DATA))
-- 
2.13.6

* [PATCH 20/21] perf report: Add user data processing stats
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (18 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 19/21] perf script: Display USER_DATA misc char for sample Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 11:51 ` [PATCH 21/21] perf report: Add --stats=ud option to display user data debug info Jiri Olsa
  2018-01-24 12:11 ` [RFC 00/21] perf tools: Add user data delayed processing Jiri Olsa
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding user data processing stats to review the
processing. Adding the following per-thread counters:

  user_data_sample - nb of samples with
		     PERF_RECORD_MISC_USER_DATA set
  user_data_event  - nb of user data events
  user_data_match  - nb of samples that matched user
                     data event
  user_data_drop   - nb of samples that did not match
                     any user data event and were dropped

Link: http://lkml.kernel.org/n/tip-j4zh9fpntjbtjo993cjdn92b@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-report.c | 6 ++++++
 tools/perf/util/thread.h    | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a1cd5fd793fc..82b2368d208a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -314,6 +314,7 @@ perf_thread__add_user_data(struct thread *thread,
 		return -ENOMEM;
 	}
 
+	thread->stats.user_data_sample++;
 	list_add_tail(&entry->list, &thread->user_data_list);
 	return 0;
 }
@@ -425,6 +426,8 @@ thread__flush_user_data(struct thread *thread,
 	struct user_data *entry, *p;
 	int ret = 0;
 
+	thread->stats.user_data_event++;
+
 	list_for_each_entry_safe(entry, p, &thread->user_data_list, list) {
 		/* different event, skip it */
 		if (entry->sample.id != sample->id)
@@ -439,6 +442,9 @@ thread__flush_user_data(struct thread *thread,
 			ret = user_data__process(entry, sample, event, rep);
 			if (ret)
 				pr_debug("problem adding hist entry, skipping event\n");
+			thread->stats.user_data_match++;
+		} else {
+			thread->stats.user_data_drop++;
 		}
 
 		list_del(&entry->list);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index e78b295bbcdb..742bbf8cb285 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -22,6 +22,13 @@ struct user_data {
 	struct perf_evsel	*evsel;
 };
 
+struct thread_stats {
+	u64	user_data_sample;
+	u64	user_data_event;
+	u64	user_data_match;
+	u64	user_data_drop;
+};
+
 struct thread {
 	union {
 		struct rb_node	 rb_node;
@@ -51,6 +58,7 @@ struct thread {
 	void				*addr_space;
 	struct unwind_libunwind_ops	*unwind_libunwind_ops;
 #endif
+	struct thread_stats	stats;
 };
 
 struct machine;
-- 
2.13.6


* [PATCH 21/21] perf report: Add --stats=ud option to display user data debug info
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (19 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 20/21] perf report: Add user data processing stats Jiri Olsa
@ 2018-01-24 11:51 ` Jiri Olsa
  2018-01-24 12:11 ` [RFC 00/21] perf tools: Add user data delayed processing Jiri Olsa
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 11:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: lkml, Namhyung Kim, David Ahern, Andi Kleen, Alexander Shishkin,
	Andy Lutomirski, Arnaldo Carvalho de Melo

Adding --tasks=ud option to display user data
debug info, like:

  $ perf report --no-children --tasks=ud
  #  sample    event    match     drop     pid      tid     ppid  comm
        0        0        0        0       0        0       -1 |swapper
      503      437      503        0   23985    23985       -1 |sched-messaging
      594      520      594        0   24064    24064    23985 | sched-messaging
      624      345      622        2   24320    24320    23985 | sched-messaging
      567      514      566        1   24065    24065    23985 | sched-messaging
      470      298      467        3   24321    24321    23985 | sched-messaging
      388      266      387        1   24066    24066    23985 | sched-messaging
      ...
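
The row layout above can be reproduced with a single snprintf()
call. The sketch below is stand-alone and only assumes the field
widths used by task__print_level() in this patch; struct task_row
and task_row__snprintf() are made-up names for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flat copy of the values printed for one task. */
struct task_row {
	uint64_t	sample, event, match, drop;
	int		pid, tid, ppid;
	const char	*comm;
};

/*
 * Format one row: a leading space, four 8-wide counters, three
 * 8-wide ids, then the comm indented by 'level' spaces after the
 * '|' separator. Returns whatever snprintf() returns.
 */
static int task_row__snprintf(char *buf, size_t size,
			      const struct task_row *row, int level)
{
	return snprintf(buf, size,
			" %8" PRIu64 " %8" PRIu64 " %8" PRIu64 " %8" PRIu64
			"%8d %8d %8d |%*s%s",
			row->sample, row->event, row->match, row->drop,
			row->pid, row->tid, row->ppid,
			level, "", row->comm);
}
```

The level argument drives the "%*s" padding, which is why child
tasks show up one column further right than their parent.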

The --tasks output is useful for displaying thread
related stats. More stats can be added by adding
a new argument to the --tasks=<arg> option.
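
One way to keep the option extensible is a small lookup table
from argument string to stat mode. The sketch below is only an
illustration under that assumption: the patch itself compares
against "ud" directly and names its zero value
TASKS_STAT__USER_NONE, while tasks_stat__parse() and the table
are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

enum tasks_stat {
	TASKS_STAT__NONE = 0,
	TASKS_STAT__USER_DATA,
};

/* Hypothetical table: one entry per supported --tasks=<arg> value. */
static const struct {
	const char	*arg;
	enum tasks_stat	 stat;
} tasks_stat_names[] = {
	{ "ud", TASKS_STAT__USER_DATA },
};

/* Map the option argument to a stat mode, NONE when unknown or unset. */
static enum tasks_stat tasks_stat__parse(const char *arg)
{
	size_t i;

	if (arg == NULL)
		return TASKS_STAT__NONE;

	for (i = 0; i < sizeof(tasks_stat_names) / sizeof(tasks_stat_names[0]); i++) {
		if (!strcmp(arg, tasks_stat_names[i].arg))
			return tasks_stat_names[i].stat;
	}

	return TASKS_STAT__NONE;
}
```

With a table like this, a new stat needs one enum value and one
table entry, and the NULL check covers the case where the option
was never given.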

Link: http://lkml.kernel.org/n/tip-3wcrngoibk5l96nqyhp0nbkm@git.kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/builtin-report.c | 51 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 82b2368d208a..2cd00055d517 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -54,6 +54,11 @@
 #include <unistd.h>
 #include <linux/mman.h>
 
+enum {
+	TASKS_STAT__USER_NONE = 0,
+	TASKS_STAT__USER_DATA,
+};
+
 struct report {
 	struct perf_tool	tool;
 	struct perf_session	*session;
@@ -83,6 +88,7 @@ struct report {
 	int			socket_filter;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 	struct branch_type_stat	brtype_stat;
+	int			tasks_stat;
 };
 
 static int report__config(const char *var, const char *value, void *cb)
@@ -839,6 +845,11 @@ static void tasks_setup(struct report *rep)
 	rep->tool.exit = perf_event__process_exit;
 	rep->tool.fork = perf_event__process_fork;
 	rep->tool.no_warn = true;
+
+	if (rep->tasks_stat == TASKS_STAT__USER_DATA) {
+		rep->tool.sample    = process_sample_event;
+		rep->tool.user_data = process_user_data_event;
+	}
 }
 
 struct task {
@@ -898,11 +909,24 @@ static int map_groups__fprintf_task(struct map_groups *mg, int indent, FILE *fp)
 	return printed;
 }
 
-static void task__print_level(struct task *task, FILE *fp, int level)
+static void task__print_level(struct task *task, FILE *fp, int level,
+			      struct report *rep)
 {
 	struct thread *thread = task->thread;
 	struct task *child;
-	int comm_indent = fprintf(fp, "  %8d %8d %8d |%*s",
+	int comm_indent;
+
+	comm_indent = fprintf(fp, " ");
+
+	if (rep->tasks_stat == TASKS_STAT__USER_DATA) {
+		comm_indent += fprintf(fp, "%8" PRIu64 " %8" PRIu64 " %8" PRIu64 " %8" PRIu64,
+					thread->stats.user_data_sample,
+					thread->stats.user_data_event,
+					thread->stats.user_data_match,
+					thread->stats.user_data_drop);
+	}
+
+	comm_indent += fprintf(fp, "%8d %8d %8d |%*s",
 				  thread->pid_, thread->tid, thread->ppid,
 				  level, "");
 
@@ -912,7 +936,7 @@ static void task__print_level(struct task *task, FILE *fp, int level)
 
 	if (!list_empty(&task->children)) {
 		list_for_each_entry(child, &task->children, list)
-			task__print_level(child, fp, level + 1);
+			task__print_level(child, fp, level + 1, rep);
 	}
 }
 
@@ -973,10 +997,15 @@ static int tasks_print(struct report *rep, FILE *fp)
 			list_add_tail(&task->list, &list);
 	}
 
-	fprintf(fp, "# %8s %8s %8s  %s\n", "pid", "tid", "ppid", "comm");
+	fprintf(fp, "#");
+
+	if (rep->tasks_stat == TASKS_STAT__USER_DATA)
+		fprintf(fp, "%8s %8s %8s %8s", "sample", "event", "match", "drop");
+
+	fprintf(fp, "%8s %8s %8s  %s\n", "pid", "tid", "ppid", "comm");
 
 	list_for_each_entry(task, &list, list)
-		task__print_level(task, fp, 0);
+		task__print_level(task, fp, 0, rep);
 
 	free(tasks);
 	return 0;
@@ -1154,6 +1183,7 @@ int cmd_report(int argc, const char **argv)
 		"perf report [<options>]",
 		NULL
 	};
+	const char *tasks_stat = "(none)";
 	struct report report = {
 		.tool = {
 			.sample		 = process_sample_event,
@@ -1189,7 +1219,8 @@ int cmd_report(int argc, const char **argv)
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
 	OPT_BOOLEAN(0, "stats", &report.stats_mode, "Display event stats"),
-	OPT_BOOLEAN(0, "tasks", &report.tasks_mode, "Display recorded tasks"),
+	OPT_STRING_OPTARG_SET(0, "tasks", &tasks_stat, &report.tasks_mode,
+			      "ud", "Display recorded tasks", "(none)"),
 	OPT_BOOLEAN(0, "mmaps", &report.mmaps_mode, "Display recorded tasks memory maps"),
 	OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
 		   "file", "vmlinux pathname"),
@@ -1341,6 +1372,14 @@ int cmd_report(int argc, const char **argv)
 	if (report.mmaps_mode)
 		report.tasks_mode = true;
 
+	if (report.tasks_mode) {
+		if (!strcmp(tasks_stat, "ud")) {
+			report.tasks_stat = TASKS_STAT__USER_DATA;
+			/* force --no-children */
+			symbol_conf.cumulate_callchain = false;
+		}
+	}
+
 	if (quiet)
 		perf_quiet_option();
 
-- 
2.13.6


* Re: [RFC 00/21] perf tools: Add user data delayed processing
  2018-01-24 11:51 [RFC 00/21] perf tools: Add perf_evsel__is_sample_bit function Jiri Olsa
                   ` (20 preceding siblings ...)
  2018-01-24 11:51 ` [PATCH 21/21] perf report: Add --stats=ud option to display user data debug info Jiri Olsa
@ 2018-01-24 12:11 ` Jiri Olsa
  21 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2018-01-24 12:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, lkml, Namhyung Kim, David Ahern,
	Andi Kleen, Alexander Shishkin, Andy Lutomirski,
	Arnaldo Carvalho de Melo

changing wrong subject :-\

On Wed, Jan 24, 2018 at 12:51:22PM +0100, Jiri Olsa wrote:
> hi,
> this RFC contains a change to delay sample's user space
> data retrieval into task work, originally described and
> discussed by Peter and Ingo in here [1].
> 
> This patchset tries to follow the original patch with
> some kernel changes (described below) and perf tool
> support included.
> 
> Basically we allow the NMI event code to skip user data
> retrieval and schedule task work to do it, before the
> task resumes.
> 
> Using the task work limits the window where we can do
> this. We can trigger the delayed task work only if the
> taskwork gets executed before the process executes again
> after NMI, because we need its stack as it was in NMI.
> 
> That leaves us with a window during the slow syscall path
> (check task_struct::perf_user_data_allowed in patches).
> 
> The slow syscall processing is forced for task when
> the user data event is enabled, which makes the task
> slower.
> 
> On the other hand I noticed roughly 100us drop in NMI
> processing times, which I plotted in here [2].
> 
> Not sure it's worth introducing this processing, which adds
> more processing time and does not show much improvement. On
> the other hand IIRC Peter mentioned it'd be nice to get user
> space data retrieval out of NMI.
> 
> Also you guys could think of some other better/faster way ;-)
> 
> NOTE I also implemented putting the user stack data into
> delayed processing, which showed nicer numbers. But it's a
> little more tricky and brings more changes into this already
> big patchset. The logic stays, so I did not include it to
> keep the patchset simple.
> 
> Also available in:
>   https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/user_data
> 
> thanks for comments,
> jirka
> 
> [1] https://marc.info/?l=linux-kernel&m=150098372819938&w=2
> [2] http://people.redhat.com/~jolsa/ud-bench.png
> 
> ---
> Jiri Olsa (21):
>       perf tools: Add perf_evsel__is_sample_bit function
>       perf tools: Add perf_sample__process function
>       perf tools: Add callchain__printf for pure callchain dump
>       perf tools: Add perf_sample__copy|free functions
>       perf: Add TIF_PERF_USER_DATA bit
>       perf: Add PERF_RECORD_USER_DATA event processing
>       perf: Add PERF_SAMPLE_USER_DATA_ID sample type
>       perf: Add PERF_SAMPLE_CALLCHAIN to user data event
>       perf: Export running sample length values through debugfs
>       perf tools: Sync perf_event.h uapi header
>       perf tools: Add perf_sample__parse function
>       perf tools: Add struct parse_args arg to perf_sample__parse
>       perf tools: Add support to parse user data event
>       perf tools: Add support to dump user data event info
>       perf report: Add delayed user data event processing
>       perf record: Enable delayed user data events
>       perf script: Add support to display user data events
>       perf script: Add support to display user data ID
>       perf script: Display USER_DATA misc char for sample
>       perf report: Add user data processing stats
>       perf report: Add --stats=ud option to display user data debug info
> 
>  arch/x86/entry/common.c                  |   6 +++
>  arch/x86/events/core.c                   |  18 ++++++++
>  arch/x86/events/intel/ds.c               |   4 +-
>  arch/x86/include/asm/thread_info.h       |   4 +-
>  include/linux/init_task.h                |   4 +-
>  include/linux/perf_event.h               |   3 ++
>  include/linux/sched.h                    |  20 ++++++++
>  include/uapi/linux/perf_event.h          |  34 +++++++++++++-
>  kernel/events/core.c                     | 283 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  tools/include/uapi/linux/perf_event.h    |  34 +++++++++++++-
>  tools/perf/Documentation/perf-script.txt |   3 +-
>  tools/perf/builtin-record.c              |   2 +
>  tools/perf/builtin-report.c              | 301 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  tools/perf/builtin-script.c              |  98 +++++++++++++++++++++++++++++++++++++++
>  tools/perf/perf.h                        |   1 +
>  tools/perf/util/event.c                  |   1 +
>  tools/perf/util/event.h                  |   9 ++++
>  tools/perf/util/evsel.c                  | 118 +++++++++++++++++++++++++++++++++++++----------
>  tools/perf/util/evsel.h                  |   5 ++
>  tools/perf/util/session.c                |  60 +++++++++++++++++++-----
>  tools/perf/util/thread.c                 |   1 +
>  tools/perf/util/thread.h                 |  16 +++++++
>  tools/perf/util/tool.h                   |   1 +
>  23 files changed, 954 insertions(+), 72 deletions(-)
