linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4 0/5] perf: Add PERF_RECORD_SWITCH to indicate context switches
@ 2015-07-21  9:44 Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 1/5] " Adrian Hunter
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Adrian Hunter @ 2015-07-21  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Hi

Here is V4 of PERF_RECORD_SWITCH.

Changes in V4:

    perf: Add PERF_RECORD_SWITCH to indicate context switches
        Use a separate record format PERF_RECORD_SWITCH_CPU_WIDE
        for the CPU-wide case that includes the next/previous pid/tid

    perf tools: Add new PERF_RECORD_SWITCH event
        Amended to handle PERF_RECORD_SWITCH_CPU_WIDE

    perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
        Add PERF_RECORD_SWITCH_CPU_WIDE to documentation

    perf script: Don't assume evsel position of tracking events
        New patch

    perf script: Add option --show-switch-events
        Add PERF_RECORD_SWITCH_CPU_WIDE to documentation

Changes in V3:

Rename 'perf_event_switch()' parameter 'out' to be 'sched_in'
and invert meaning.

Changes in V2 (RFC):

The event now shows scheduling in and out,
and I added next/prev pid/tid for cpu-wide
contexts, plus there are some tools patches
that apply to Arnaldo's perf/core branch.


Adrian Hunter (5):
      perf: Add PERF_RECORD_SWITCH to indicate context switches
      perf tools: Add new PERF_RECORD_SWITCH event
      perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
      perf script: Don't assume evsel position of tracking events
      perf script: Add option --show-switch-events

 include/uapi/linux/perf_event.h          |  31 +++++++++-
 kernel/events/core.c                     | 103 +++++++++++++++++++++++++++++++
 tools/perf/Documentation/perf-record.txt |   4 ++
 tools/perf/Documentation/perf-script.txt |   4 ++
 tools/perf/builtin-inject.c              |   1 +
 tools/perf/builtin-record.c              |   7 +++
 tools/perf/builtin-script.c              |  41 ++++++++++--
 tools/perf/perf.h                        |   1 +
 tools/perf/util/event.c                  |  28 +++++++++
 tools/perf/util/event.h                  |  12 ++++
 tools/perf/util/evlist.h                 |   1 +
 tools/perf/util/evsel.c                  |   4 ++
 tools/perf/util/machine.c                |  11 ++++
 tools/perf/util/machine.h                |   2 +
 tools/perf/util/record.c                 |  10 +++
 tools/perf/util/session.c                |  20 ++++++
 tools/perf/util/tool.h                   |   1 +
 17 files changed, 275 insertions(+), 6 deletions(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH V4 1/5] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-21  9:44 [PATCH V4 0/5] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
@ 2015-07-21  9:44 ` Adrian Hunter
  2015-07-21 10:29   ` Peter Zijlstra
  2015-07-29  8:14   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 2/5] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Adrian Hunter @ 2015-07-21  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

There are already two events for context switches, namely
the tracepoint sched:sched_switch and the software event
context_switches. Unfortunately neither are suitable for
use by non-privileged users for the purpose of synchronizing
hardware trace data (e.g. Intel PT) to the context switch.

Tracepoints are no good at all for non-privileged users
because they need either CAP_SYS_ADMIN or
/proc/sys/kernel/perf_event_paranoid <= -1.

On the other hand, kernel software events need either
CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= 1.

Now many distributions do default perf_event_paranoid to 1
making context_switches a contender, except it has another
problem (which is also shared with sched:sched_switch)
which is that it happens before perf schedules events out
instead of after perf schedules events in. Whereas a
privileged user can see all the events anyway, a
non-privileged user only sees events for their own processes,
in other words they see when their process was scheduled out
not when it was scheduled in. That presents two problems to
use the event: 1. the information comes too late, so tools
have to look ahead in the event stream to find out what the
current state is 2. if they are unlucky tracing might have
stopped before the context-switches event is recorded.

This new PERF_RECORD_SWITCH event does not have those problems
and it also has a couple of other small advantages. It is
easier to use because it is an auxiliary event (like mmap,
comm and task events) which can be enabled by setting a single
bit. It is smaller than sched:sched_switch and easier to parse.

To make the event useful for privileged users also, if the
context is cpu-wide then the event record will be
PERF_RECORD_SWITCH_CPU_WIDE which is the same as
PERF_RECORD_SWITCH except it also provides the next or
previous pid/tid.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/uapi/linux/perf_event.h |  31 +++++++++++-
 kernel/events/core.c            | 103 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d97f84c080da..022d0acf7df0 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -330,7 +330,8 @@ struct perf_event_attr {
 				mmap2          :  1, /* include mmap with inode data     */
 				comm_exec      :  1, /* flag comm events that are due to an exec */
 				use_clockid    :  1, /* use @clockid for time fields */
-				__reserved_1   : 38;
+				context_switch :  1, /* context switch data */
+				__reserved_1   : 37;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -572,9 +573,11 @@ struct perf_event_mmap_page {
 /*
  * PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on
  * different events so can reuse the same bit position.
+ * Ditto PERF_RECORD_MISC_SWITCH_OUT.
  */
 #define PERF_RECORD_MISC_MMAP_DATA		(1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC		(1 << 13)
+#define PERF_RECORD_MISC_SWITCH_OUT		(1 << 13)
 /*
  * Indicates that the content of PERF_SAMPLE_IP points to
  * the actual instruction that triggered the event. See also
@@ -818,6 +821,32 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_LOST_SAMPLES		= 13,
 
+	/*
+	 * Records a context switch in or out (flagged by
+	 * PERF_RECORD_MISC_SWITCH_OUT). See also
+	 * PERF_RECORD_SWITCH_CPU_WIDE.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_SWITCH			= 14,
+
+	/*
+	 * CPU-wide version of PERF_RECORD_SWITCH with next_prev_pid and
+	 * next_prev_tid that are the next (switching out) or previous
+	 * (switching in) pid/tid.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				next_prev_pid;
+	 *	u32				next_prev_tid;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae3419b99..ce21143c0d9e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -163,6 +163,7 @@ static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
+static atomic_t nr_switch_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -2619,6 +2620,9 @@ static void perf_pmu_sched_task(struct task_struct *prev,
 	local_irq_restore(flags);
 }
 
+static void perf_event_switch(struct task_struct *task,
+			      struct task_struct *next_prev, bool sched_in);
+
 #define for_each_task_context_nr(ctxn)					\
 	for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
 
@@ -2641,6 +2645,9 @@ void __perf_event_task_sched_out(struct task_struct *task,
 	if (__this_cpu_read(perf_sched_cb_usages))
 		perf_pmu_sched_task(task, next, false);
 
+	if (atomic_read(&nr_switch_events))
+		perf_event_switch(task, next, false);
+
 	for_each_task_context_nr(ctxn)
 		perf_event_context_sched_out(task, ctxn, next);
 
@@ -2831,6 +2838,9 @@ void __perf_event_task_sched_in(struct task_struct *prev,
 	if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
 		perf_cgroup_sched_in(prev, task);
 
+	if (atomic_read(&nr_switch_events))
+		perf_event_switch(task, prev, true);
+
 	if (__this_cpu_read(perf_sched_cb_usages))
 		perf_pmu_sched_task(prev, task, true);
 }
@@ -3454,6 +3464,10 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
 		atomic_dec(&nr_freq_events);
+	if (event->attr.context_switch) {
+		static_key_slow_dec_deferred(&perf_sched_events);
+		atomic_dec(&nr_switch_events);
+	}
 	if (is_cgroup_event(event))
 		static_key_slow_dec_deferred(&perf_sched_events);
 	if (has_branch_stack(event))
@@ -5982,6 +5996,91 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost)
 }
 
 /*
+ * context_switch tracking
+ */
+
+struct perf_switch_event {
+	struct task_struct	*task;
+	struct task_struct	*next_prev;
+
+	struct {
+		struct perf_event_header	header;
+		u32				next_prev_pid;
+		u32				next_prev_tid;
+	} event_id;
+};
+
+static int perf_event_switch_match(struct perf_event *event)
+{
+	return event->attr.context_switch;
+}
+
+static void perf_event_switch_output(struct perf_event *event, void *data)
+{
+	struct perf_switch_event *se = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int ret;
+
+	if (!perf_event_switch_match(event))
+		return;
+
+	/* Only CPU-wide events are allowed to see next/prev pid/tid */
+	if (event->ctx->task) {
+		se->event_id.header.type = PERF_RECORD_SWITCH;
+		se->event_id.header.size = sizeof(se->event_id.header);
+	} else {
+		se->event_id.header.type = PERF_RECORD_SWITCH_CPU_WIDE;
+		se->event_id.header.size = sizeof(se->event_id);
+		se->event_id.next_prev_pid =
+					perf_event_pid(event, se->next_prev);
+		se->event_id.next_prev_tid =
+					perf_event_tid(event, se->next_prev);
+	}
+
+	perf_event_header__init_id(&se->event_id.header, &sample, event);
+
+	ret = perf_output_begin(&handle, event, se->event_id.header.size);
+	if (ret)
+		return;
+
+	if (event->ctx->task)
+		perf_output_put(&handle, se->event_id.header);
+	else
+		perf_output_put(&handle, se->event_id);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+static void perf_event_switch(struct task_struct *task,
+			      struct task_struct *next_prev, bool sched_in)
+{
+	struct perf_switch_event switch_event;
+
+	/* N.B. caller checks nr_switch_events != 0 */
+
+	switch_event = (struct perf_switch_event){
+		.task		= task,
+		.next_prev	= next_prev,
+		.event_id	= {
+			.header = {
+				/* .type */
+				.misc = sched_in ? 0 : PERF_RECORD_MISC_SWITCH_OUT,
+				/* .size */
+			},
+			/* .next_prev_pid */
+			/* .next_prev_tid */
+		},
+	};
+
+	perf_event_aux(perf_event_switch_output,
+		       &switch_event,
+		       NULL);
+}
+
+/*
  * IRQ throttle logging
  */
 
@@ -7479,6 +7578,10 @@ static void account_event(struct perf_event *event)
 		if (atomic_inc_return(&nr_freq_events) == 1)
 			tick_nohz_full_kick_all();
 	}
+	if (event->attr.context_switch) {
+		atomic_inc(&nr_switch_events);
+		static_key_slow_inc(&perf_sched_events.key);
+	}
 	if (has_branch_stack(event))
 		static_key_slow_inc(&perf_sched_events.key);
 	if (is_cgroup_event(event))
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V4 2/5] perf tools: Add new PERF_RECORD_SWITCH event
  2015-07-21  9:44 [PATCH V4 0/5] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 1/5] " Adrian Hunter
@ 2015-07-21  9:44 ` Adrian Hunter
  2015-07-29  8:14   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 3/5] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2015-07-21  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Support processing of PERF_RECORD_SWITCH events and
PERF_RECORD_SWITCH_CPU_WIDE events. There is a single
tools callback for them both so that the tool must
check the event type before using the extra members
in PERF_RECORD_SWITCH_CPU_WIDE.

There is still no way to select the events, though.
That is added in a subsequest patch.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c |  1 +
 tools/perf/util/event.c     | 28 ++++++++++++++++++++++++++++
 tools/perf/util/event.h     | 12 ++++++++++++
 tools/perf/util/evsel.c     |  1 +
 tools/perf/util/machine.c   | 11 +++++++++++
 tools/perf/util/machine.h   |  2 ++
 tools/perf/util/session.c   | 20 ++++++++++++++++++++
 tools/perf/util/tool.h      |  1 +
 8 files changed, 76 insertions(+)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 01b06492bd6a..f62c49b35be0 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -561,6 +561,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			.lost		= perf_event__repipe,
 			.aux		= perf_event__repipe,
 			.itrace_start	= perf_event__repipe,
+			.context_switch	= perf_event__repipe,
 			.read		= perf_event__repipe_sample,
 			.throttle	= perf_event__repipe,
 			.unthrottle	= perf_event__repipe,
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 67a977e5d0ab..7ff61274ed57 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -26,6 +26,8 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_AUX]			= "AUX",
 	[PERF_RECORD_ITRACE_START]		= "ITRACE_START",
 	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
+	[PERF_RECORD_SWITCH]			= "SWITCH",
+	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -749,6 +751,14 @@ int perf_event__process_lost_samples(struct perf_tool *tool __maybe_unused,
 	return machine__process_lost_samples_event(machine, event, sample);
 }
 
+int perf_event__process_switch(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine)
+{
+	return machine__process_switch_event(machine, event);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
@@ -827,6 +837,20 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp)
 		       event->itrace_start.pid, event->itrace_start.tid);
 }
 
+size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp)
+{
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+	const char *in_out = out ? "OUT" : "IN ";
+
+	if (event->header.type == PERF_RECORD_SWITCH)
+		return fprintf(fp, " %s\n", in_out);
+
+	return fprintf(fp, " %s  %s pid/tid: %5u/%-5u\n",
+		       in_out, out ? "next" : "prev",
+		       event->context_switch.next_prev_pid,
+		       event->context_switch.next_prev_tid);
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -852,6 +876,10 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_ITRACE_START:
 		ret += perf_event__fprintf_itrace_start(event, fp);
 		break;
+	case PERF_RECORD_SWITCH:
+	case PERF_RECORD_SWITCH_CPU_WIDE:
+		ret += perf_event__fprintf_switch(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c53f36384b64..4bb2ae894c78 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -348,6 +348,12 @@ struct itrace_start_event {
 	u32 pid, tid;
 };
 
+struct context_switch_event {
+	struct perf_event_header header;
+	u32 next_prev_pid;
+	u32 next_prev_tid;
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -369,6 +375,7 @@ union perf_event {
 	struct auxtrace_error_event	auxtrace_error;
 	struct aux_event		aux;
 	struct itrace_start_event	itrace_start;
+	struct context_switch_event	context_switch;
 };
 
 void perf_event__print_totals(void);
@@ -418,6 +425,10 @@ int perf_event__process_itrace_start(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample,
 				     struct machine *machine);
+int perf_event__process_switch(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -480,6 +491,7 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 u64 kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 49fb7b5feb09..9e6e6f40b787 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1127,6 +1127,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(mmap2, p_unsigned);
 	PRINT_ATTRf(comm_exec, p_unsigned);
 	PRINT_ATTRf(use_clockid, p_unsigned);
+	PRINT_ATTRf(context_switch, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d0bf1e590479..7add28d8ce97 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -550,6 +550,14 @@ int machine__process_itrace_start_event(struct machine *machine __maybe_unused,
 	return 0;
 }
 
+int machine__process_switch_event(struct machine *machine __maybe_unused,
+				  union perf_event *event)
+{
+	if (dump_trace)
+		perf_event__fprintf_switch(event, stdout);
+	return 0;
+}
+
 struct map *machine__findnew_module_map(struct machine *machine, u64 start,
 					const char *filename)
 {
@@ -1451,6 +1459,9 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_itrace_start_event(machine, event); break;
 	case PERF_RECORD_LOST_SAMPLES:
 		ret = machine__process_lost_samples_event(machine, event, sample); break;
+	case PERF_RECORD_SWITCH:
+	case PERF_RECORD_SWITCH_CPU_WIDE:
+		ret = machine__process_switch_event(machine, event); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 887798e511e9..f890697a4c9c 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -87,6 +87,8 @@ int machine__process_aux_event(struct machine *machine,
 			       union perf_event *event);
 int machine__process_itrace_start_event(struct machine *machine,
 					union perf_event *event);
+int machine__process_switch_event(struct machine *machine __maybe_unused,
+				  union perf_event *event);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ed9dc2555ec7..3d359bbf6e95 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -332,6 +332,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->aux = perf_event__process_aux;
 	if (tool->itrace_start == NULL)
 		tool->itrace_start = perf_event__process_itrace_start;
+	if (tool->context_switch == NULL)
+		tool->context_switch = perf_event__process_switch;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -470,6 +472,19 @@ static void perf_event__itrace_start_swap(union perf_event *event,
 		swap_sample_id_all(event, &event->itrace_start + 1);
 }
 
+static void perf_event__switch_swap(union perf_event *event, bool sample_id_all)
+{
+	if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) {
+		event->context_switch.next_prev_pid =
+				bswap_32(event->context_switch.next_prev_pid);
+		event->context_switch.next_prev_tid =
+				bswap_32(event->context_switch.next_prev_tid);
+	}
+
+	if (sample_id_all)
+		swap_sample_id_all(event, &event->context_switch + 1);
+}
+
 static void perf_event__throttle_swap(union perf_event *event,
 				      bool sample_id_all)
 {
@@ -632,6 +647,8 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_AUX]		  = perf_event__aux_swap,
 	[PERF_RECORD_ITRACE_START]	  = perf_event__itrace_start_swap,
 	[PERF_RECORD_LOST_SAMPLES]	  = perf_event__all64_swap,
+	[PERF_RECORD_SWITCH]		  = perf_event__switch_swap,
+	[PERF_RECORD_SWITCH_CPU_WIDE]	  = perf_event__switch_swap,
 	[PERF_RECORD_HEADER_ATTR]	  = perf_event__hdr_attr_swap,
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1093,6 +1110,9 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->aux(tool, event, sample, machine);
 	case PERF_RECORD_ITRACE_START:
 		return tool->itrace_start(tool, event, sample, machine);
+	case PERF_RECORD_SWITCH:
+	case PERF_RECORD_SWITCH_CPU_WIDE:
+		return tool->context_switch(tool, event, sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index c307dd438286..cab8cc24831b 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			lost_samples,
 			aux,
 			itrace_start,
+			context_switch,
 			throttle,
 			unthrottle;
 	event_attr_op	attr;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V4 3/5] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
  2015-07-21  9:44 [PATCH V4 0/5] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 1/5] " Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 2/5] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
@ 2015-07-21  9:44 ` Adrian Hunter
  2015-07-29  8:15   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 4/5] perf script: Don't assume evsel position of tracking events Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 5/5] perf script: Add option --show-switch-events Adrian Hunter
  4 siblings, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2015-07-21  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Add an option to select PERF_RECORD_SWITCH events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt |  4 ++++
 tools/perf/builtin-record.c              |  7 +++++++
 tools/perf/perf.h                        |  1 +
 tools/perf/util/evlist.h                 |  1 +
 tools/perf/util/evsel.c                  |  3 +++
 tools/perf/util/record.c                 | 10 ++++++++++
 6 files changed, 26 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 29e5307945bf..63ee0408761d 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -293,6 +293,10 @@ When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
 because the file may be huge. A time out is needed in such cases.
 This option sets the time out limit. The default value is 500 ms.
 
+--switch-events::
+Record context switch events i.e. events of type PERF_RECORD_SWITCH or
+PERF_RECORD_SWITCH_CPU_WIDE.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1932e27c00d8..445a64d19625 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1075,6 +1075,8 @@ struct option __record_options[] = {
 			  "opts", "AUX area tracing Snapshot Mode", ""),
 	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
+		    "Record context switch events"),
 	OPT_END()
 };
 
@@ -1102,6 +1104,11 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 			  " system-wide mode\n");
 		usage_with_options(record_usage, record_options);
 	}
+	if (rec->opts.record_switch_events &&
+	    !perf_can_record_switch_events()) {
+		ui__error("kernel does not support recording context switch events (--switch-events option)\n");
+		usage_with_options(record_usage, record_options);
+	}
 
 	if (!rec->itr) {
 		rec->itr = auxtrace_record__init(rec->evlist, &err);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 937b16aa0300..cf459f89fc9b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -57,6 +57,7 @@ struct record_opts {
 	bool	     running_time;
 	bool	     full_auxtrace;
 	bool	     auxtrace_snapshot_mode;
+	bool	     record_switch_events;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 037633c1da9d..c2fb15768d79 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -114,6 +114,7 @@ void perf_evlist__close(struct perf_evlist *evlist);
 
 void perf_evlist__set_id_pos(struct perf_evlist *evlist);
 bool perf_can_sample_identifier(void);
+bool perf_can_record_switch_events(void);
 void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts);
 int record_opts__config(struct record_opts *opts);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9e6e6f40b787..71f6905c7cb9 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -738,6 +738,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
 
+	if (opts->record_switch_events)
+		attr->context_switch = track;
+
 	if (opts->sample_transaction)
 		perf_evsel__set_sample_bit(evsel, TRANSACTION);
 
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 1f7becbe5e18..0d228a29526d 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -85,6 +85,11 @@ static void perf_probe_comm_exec(struct perf_evsel *evsel)
 	evsel->attr.comm_exec = 1;
 }
 
+static void perf_probe_context_switch(struct perf_evsel *evsel)
+{
+	evsel->attr.context_switch = 1;
+}
+
 bool perf_can_sample_identifier(void)
 {
 	return perf_probe_api(perf_probe_sample_identifier);
@@ -95,6 +100,11 @@ static bool perf_can_comm_exec(void)
 	return perf_probe_api(perf_probe_comm_exec);
 }
 
+bool perf_can_record_switch_events(void)
+{
+	return perf_probe_api(perf_probe_context_switch);
+}
+
 void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts)
 {
 	struct perf_evsel *evsel;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V4 4/5] perf script: Don't assume evsel position of tracking events
  2015-07-21  9:44 [PATCH V4 0/5] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
                   ` (2 preceding siblings ...)
  2015-07-21  9:44 ` [PATCH V4 3/5] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
@ 2015-07-21  9:44 ` Adrian Hunter
  2015-07-29  8:15   ` [tip:perf/core] perf script: Don' t " tip-bot for Adrian Hunter
  2015-07-21  9:44 ` [PATCH V4 5/5] perf script: Add option --show-switch-events Adrian Hunter
  4 siblings, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2015-07-21  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

The tracking event does not have to be the first event
so replace perf_evlist__first() with perf_evlist__id2evsel()
which uses the event ID to find the correct evsel.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 24809787369f..9b531ac63a24 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -661,7 +661,7 @@ static int process_comm_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 	int ret = -1;
 
 	thread = machine__findnew_thread(machine, event->comm.pid, event->comm.tid);
@@ -695,7 +695,7 @@ static int process_fork_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	if (perf_event__process_fork(tool, event, sample, machine) < 0)
 		return -1;
@@ -727,7 +727,7 @@ static int process_exit_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	thread = machine__findnew_thread(machine, event->fork.pid, event->fork.tid);
 	if (thread == NULL) {
@@ -759,7 +759,7 @@ static int process_mmap_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	if (perf_event__process_mmap(tool, event, sample, machine) < 0)
 		return -1;
@@ -790,7 +790,7 @@ static int process_mmap2_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	if (perf_event__process_mmap2(tool, event, sample, machine) < 0)
 		return -1;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V4 5/5] perf script: Add option --show-switch-events
  2015-07-21  9:44 [PATCH V4 0/5] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
                   ` (3 preceding siblings ...)
  2015-07-21  9:44 ` [PATCH V4 4/5] perf script: Don't assume evsel position of tracking events Adrian Hunter
@ 2015-07-21  9:44 ` Adrian Hunter
  2015-07-29  8:15   ` [tip:perf/core] " tip-bot for Adrian Hunter
  4 siblings, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2015-07-21  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Add option --show-switch-events to show switch events
in a similar fashion to --show-task-events and
--show-mmap-events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt |  4 ++++
 tools/perf/builtin-script.c              | 31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index c82df572fac2..e2fec5fc21e7 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -222,6 +222,10 @@ OPTIONS
 --show-mmap-events
 	Display mmap related events (e.g. MMAP, MMAP2).
 
+--show-switch-events
+	Display context switch events i.e. events of type PERF_RECORD_SWITCH or
+	PERF_RECORD_SWITCH_CPU_WIDE.
+
 --header
 	Show perf.data header.
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 9b531ac63a24..3f8325dbc30e 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -623,6 +623,7 @@ struct perf_script {
 	struct perf_session	*session;
 	bool			show_task_events;
 	bool			show_mmap_events;
+	bool			show_switch_events;
 };
 
 static int process_attr(struct perf_tool *tool, union perf_event *event,
@@ -813,6 +814,32 @@ static int process_mmap2_event(struct perf_tool *tool,
 	return 0;
 }
 
+static int process_switch_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+
+	if (perf_event__process_switch(tool, event, sample, machine) < 0)
+		return -1;
+
+	thread = machine__findnew_thread(machine, sample->pid,
+					 sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing SWITCH event, skipping it.\n");
+		return -1;
+	}
+
+	print_sample_start(sample, thread, evsel);
+	perf_event__fprintf(event, stdout);
+	thread__put(thread);
+	return 0;
+}
+
 static void sig_handler(int sig __maybe_unused)
 {
 	session_done = 1;
@@ -834,6 +861,8 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.mmap = process_mmap_event;
 		script->tool.mmap2 = process_mmap2_event;
 	}
+	if (script->show_switch_events)
+		script->tool.context_switch = process_switch_event;
 
 	ret = perf_session__process_events(script->session);
 
@@ -1618,6 +1647,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show the fork/comm/exit events"),
 	OPT_BOOLEAN('\0', "show-mmap-events", &script.show_mmap_events,
 		    "Show the mmap events"),
+	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
+		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options",
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V4 1/5] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-21  9:44 ` [PATCH V4 1/5] " Adrian Hunter
@ 2015-07-21 10:29   ` Peter Zijlstra
  2015-07-21 17:54     ` Arnaldo Carvalho de Melo
  2015-07-29  8:14   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2015-07-21 10:29 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

On Tue, Jul 21, 2015 at 12:44:02PM +0300, Adrian Hunter wrote:
> There are already two events for context switches, namely
> the tracepoint sched:sched_switch and the software event
> context_switches. Unfortunately neither are suitable for
> use by non-privileged users for the purpose of synchronizing
> hardware trace data (e.g. Intel PT) to the context switch.
> 
> Tracepoints are no good at all for non-privileged users
> because they need either CAP_SYS_ADMIN or
> /proc/sys/kernel/perf_event_paranoid <= -1.
> 
> On the other hand, kernel software events need either
> CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= 1.
> 
> Now many distributions do default perf_event_paranoid to 1
> making context_switches a contender, except it has another
> problem (which is also shared with sched:sched_switch)
> which is that it happens before perf schedules events out
> instead of after perf schedules events in. Whereas a
> privileged user can see all the events anyway, a
> non-privileged user only sees events for their own processes,
> in other words they see when their process was scheduled out
> not when it was scheduled in. That presents two problems to
> use the event: 1. the information comes too late, so tools
> have to look ahead in the event stream to find out what the
> current state is 2. if they are unlucky tracing might have
> stopped before the context-switches event is recorded.
> 
> This new PERF_RECORD_SWITCH event does not have those problems
> and it also has a couple of other small advantages. It is
> easier to use because it is an auxiliary event (like mmap,
> comm and task events) which can be enabled by setting a single
> bit. It is smaller than sched:sched_switch and easier to parse.
> 
> To make the event useful for privileged users also, if the
> context is cpu-wide then the event record will be
> PERF_RECORD_SWITCH_CPU_WIDE which is the same as
> PERF_RECORD_SWITCH except it also provides the next or
> previous pid/tid.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acme, if you agree with the tools part, feel free to take the lot.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4 1/5] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-21 10:29   ` Peter Zijlstra
@ 2015-07-21 17:54     ` Arnaldo Carvalho de Melo
  2015-07-21 18:15       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-21 17:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 21, 2015 at 12:29:55PM +0200, Peter Zijlstra escreveu:
> On Tue, Jul 21, 2015 at 12:44:02PM +0300, Adrian Hunter wrote:
> > There are already two events for context switches, namely
> > the tracepoint sched:sched_switch and the software event
> > context_switches. Unfortunately neither are suitable for
> > use by non-privileged users for the purpose of synchronizing
> > hardware trace data (e.g. Intel PT) to the context switch.
> > 
> > Tracepoints are no good at all for non-privileged users
> > because they need either CAP_SYS_ADMIN or
> > /proc/sys/kernel/perf_event_paranoid <= -1.
> > 
> > On the other hand, kernel software events need either
> > CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= 1.
> > 
> > Now many distributions do default perf_event_paranoid to 1
> > making context_switches a contender, except it has another
> > problem (which is also shared with sched:sched_switch)
> > which is that it happens before perf schedules events out
> > instead of after perf schedules events in. Whereas a
> > privileged user can see all the events anyway, a
> > non-privileged user only sees events for their own processes,
> > in other words they see when their process was scheduled out
> > not when it was scheduled in. That presents two problems to
> > use the event: 1. the information comes too late, so tools
> > have to look ahead in the event stream to find out what the
> > current state is 2. if they are unlucky tracing might have
> > stopped before the context-switches event is recorded.
> > 
> > This new PERF_RECORD_SWITCH event does not have those problems
> > and it also has a couple of other small advantages. It is
> > easier to use because it is an auxiliary event (like mmap,
> > comm and task events) which can be enabled by setting a single
> > bit. It is smaller than sched:sched_switch and easier to parse.
> > 
> > To make the event useful for privileged users also, if the
> > context is cpu-wide then the event record will be
> > PERF_RECORD_SWITCH_CPU_WIDE which is the same as
> > PERF_RECORD_SWITCH except it also provides the next or
> > previous pid/tid.
> > 
> > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> Acme, if you agree with the tools part, feel free to take the lot.

Looking at it, if all is well, I'll do it
 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4 1/5] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-21 17:54     ` Arnaldo Carvalho de Melo
@ 2015-07-21 18:15       ` Arnaldo Carvalho de Melo
  2015-07-22  9:53         ` Jiri Olsa
  0 siblings, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-21 18:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 21, 2015 at 02:54:54PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jul 21, 2015 at 12:29:55PM +0200, Peter Zijlstra escreveu:

> > Acme, if you agree with the tools part, feel free to take the lot.
 
> Looking at it, if all is well, I'll do it
  
> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

All seems ok, building a kernel to test it all, patch by patch,
will be on my next pull req to Ingo for perf/core.

Thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V4 1/5] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-21 18:15       ` Arnaldo Carvalho de Melo
@ 2015-07-22  9:53         ` Jiri Olsa
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2015-07-22  9:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Adrian Hunter, Andi Kleen, Ingo Molnar,
	linux-kernel, Stephane Eranian, mathieu.poirier, Pawel Moll

On Tue, Jul 21, 2015 at 03:15:14PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jul 21, 2015 at 02:54:54PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Tue, Jul 21, 2015 at 12:29:55PM +0200, Peter Zijlstra escreveu:
> 
> > > Acme, if you agree with the tools part, feel free to take the lot.
>  
> > Looking at it, if all is well, I'll do it
>   
> > > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> All seems ok, building a kernel to test it all, patch by patch,
> will be on my next pull req to Ingo for perf/core.

tested, looks ok

jirka

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip:perf/core] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-21  9:44 ` [PATCH V4 1/5] " Adrian Hunter
  2015-07-21 10:29   ` Peter Zijlstra
@ 2015-07-29  8:14   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-07-29  8:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, hpa, pawel.moll, tglx, linux-kernel, jolsa, adrian.hunter,
	mingo, acme, peterz, eranian, mathieu.poirier

Commit-ID:  45ac1403f564f411c6a383a2448688ba8dd705a4
Gitweb:     http://git.kernel.org/tip/45ac1403f564f411c6a383a2448688ba8dd705a4
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Tue, 21 Jul 2015 12:44:02 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 23 Jul 2015 22:51:12 -0300

perf: Add PERF_RECORD_SWITCH to indicate context switches

There are already two events for context switches, namely the tracepoint
sched:sched_switch and the software event context_switches.
Unfortunately neither are suitable for use by non-privileged users for
the purpose of synchronizing hardware trace data (e.g. Intel PT) to the
context switch.

Tracepoints are no good at all for non-privileged users because they
need either CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= -1.

On the other hand, kernel software events need either CAP_SYS_ADMIN or
/proc/sys/kernel/perf_event_paranoid <= 1.

Now many distributions do default perf_event_paranoid to 1 making
context_switches a contender, except it has another problem (which is
also shared with sched:sched_switch) which is that it happens before
perf schedules events out instead of after perf schedules events in.
Whereas a privileged user can see all the events anyway, a
non-privileged user only sees events for their own processes, in other
words they see when their process was scheduled out not when it was
scheduled in. That presents two problems to use the event:

1. the information comes too late, so tools have to look ahead in the
   event stream to find out what the current state is

2. if they are unlucky tracing might have stopped before the
   context-switches event is recorded.

This new PERF_RECORD_SWITCH event does not have those problems
and it also has a couple of other small advantages.

It is easier to use because it is an auxiliary event (like mmap, comm
and task events) which can be enabled by setting a single bit. It is
smaller than sched:sched_switch and easier to parse.

To make the event useful for privileged users also, if the
context is cpu-wide then the event record will be
PERF_RECORD_SWITCH_CPU_WIDE which is the same as
PERF_RECORD_SWITCH except it also provides the next or
previous pid/tid.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 include/uapi/linux/perf_event.h |  31 +++++++++++-
 kernel/events/core.c            | 103 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d97f84c..022d0ac 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -330,7 +330,8 @@ struct perf_event_attr {
 				mmap2          :  1, /* include mmap with inode data     */
 				comm_exec      :  1, /* flag comm events that are due to an exec */
 				use_clockid    :  1, /* use @clockid for time fields */
-				__reserved_1   : 38;
+				context_switch :  1, /* context switch data */
+				__reserved_1   : 37;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -572,9 +573,11 @@ struct perf_event_mmap_page {
 /*
  * PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on
  * different events so can reuse the same bit position.
+ * Ditto PERF_RECORD_MISC_SWITCH_OUT.
  */
 #define PERF_RECORD_MISC_MMAP_DATA		(1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC		(1 << 13)
+#define PERF_RECORD_MISC_SWITCH_OUT		(1 << 13)
 /*
  * Indicates that the content of PERF_SAMPLE_IP points to
  * the actual instruction that triggered the event. See also
@@ -818,6 +821,32 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_LOST_SAMPLES		= 13,
 
+	/*
+	 * Records a context switch in or out (flagged by
+	 * PERF_RECORD_MISC_SWITCH_OUT). See also
+	 * PERF_RECORD_SWITCH_CPU_WIDE.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_SWITCH			= 14,
+
+	/*
+	 * CPU-wide version of PERF_RECORD_SWITCH with next_prev_pid and
+	 * next_prev_tid that are the next (switching out) or previous
+	 * (switching in) pid/tid.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				next_prev_pid;
+	 *	u32				next_prev_tid;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae34..ce21143 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -163,6 +163,7 @@ static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
+static atomic_t nr_switch_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -2619,6 +2620,9 @@ static void perf_pmu_sched_task(struct task_struct *prev,
 	local_irq_restore(flags);
 }
 
+static void perf_event_switch(struct task_struct *task,
+			      struct task_struct *next_prev, bool sched_in);
+
 #define for_each_task_context_nr(ctxn)					\
 	for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
 
@@ -2641,6 +2645,9 @@ void __perf_event_task_sched_out(struct task_struct *task,
 	if (__this_cpu_read(perf_sched_cb_usages))
 		perf_pmu_sched_task(task, next, false);
 
+	if (atomic_read(&nr_switch_events))
+		perf_event_switch(task, next, false);
+
 	for_each_task_context_nr(ctxn)
 		perf_event_context_sched_out(task, ctxn, next);
 
@@ -2831,6 +2838,9 @@ void __perf_event_task_sched_in(struct task_struct *prev,
 	if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
 		perf_cgroup_sched_in(prev, task);
 
+	if (atomic_read(&nr_switch_events))
+		perf_event_switch(task, prev, true);
+
 	if (__this_cpu_read(perf_sched_cb_usages))
 		perf_pmu_sched_task(prev, task, true);
 }
@@ -3454,6 +3464,10 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
 		atomic_dec(&nr_freq_events);
+	if (event->attr.context_switch) {
+		static_key_slow_dec_deferred(&perf_sched_events);
+		atomic_dec(&nr_switch_events);
+	}
 	if (is_cgroup_event(event))
 		static_key_slow_dec_deferred(&perf_sched_events);
 	if (has_branch_stack(event))
@@ -5982,6 +5996,91 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost)
 }
 
 /*
+ * context_switch tracking
+ */
+
+struct perf_switch_event {
+	struct task_struct	*task;
+	struct task_struct	*next_prev;
+
+	struct {
+		struct perf_event_header	header;
+		u32				next_prev_pid;
+		u32				next_prev_tid;
+	} event_id;
+};
+
+static int perf_event_switch_match(struct perf_event *event)
+{
+	return event->attr.context_switch;
+}
+
+static void perf_event_switch_output(struct perf_event *event, void *data)
+{
+	struct perf_switch_event *se = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int ret;
+
+	if (!perf_event_switch_match(event))
+		return;
+
+	/* Only CPU-wide events are allowed to see next/prev pid/tid */
+	if (event->ctx->task) {
+		se->event_id.header.type = PERF_RECORD_SWITCH;
+		se->event_id.header.size = sizeof(se->event_id.header);
+	} else {
+		se->event_id.header.type = PERF_RECORD_SWITCH_CPU_WIDE;
+		se->event_id.header.size = sizeof(se->event_id);
+		se->event_id.next_prev_pid =
+					perf_event_pid(event, se->next_prev);
+		se->event_id.next_prev_tid =
+					perf_event_tid(event, se->next_prev);
+	}
+
+	perf_event_header__init_id(&se->event_id.header, &sample, event);
+
+	ret = perf_output_begin(&handle, event, se->event_id.header.size);
+	if (ret)
+		return;
+
+	if (event->ctx->task)
+		perf_output_put(&handle, se->event_id.header);
+	else
+		perf_output_put(&handle, se->event_id);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+static void perf_event_switch(struct task_struct *task,
+			      struct task_struct *next_prev, bool sched_in)
+{
+	struct perf_switch_event switch_event;
+
+	/* N.B. caller checks nr_switch_events != 0 */
+
+	switch_event = (struct perf_switch_event){
+		.task		= task,
+		.next_prev	= next_prev,
+		.event_id	= {
+			.header = {
+				/* .type */
+				.misc = sched_in ? 0 : PERF_RECORD_MISC_SWITCH_OUT,
+				/* .size */
+			},
+			/* .next_prev_pid */
+			/* .next_prev_tid */
+		},
+	};
+
+	perf_event_aux(perf_event_switch_output,
+		       &switch_event,
+		       NULL);
+}
+
+/*
  * IRQ throttle logging
  */
 
@@ -7479,6 +7578,10 @@ static void account_event(struct perf_event *event)
 		if (atomic_inc_return(&nr_freq_events) == 1)
 			tick_nohz_full_kick_all();
 	}
+	if (event->attr.context_switch) {
+		atomic_inc(&nr_switch_events);
+		static_key_slow_inc(&perf_sched_events.key);
+	}
 	if (has_branch_stack(event))
 		static_key_slow_inc(&perf_sched_events.key);
 	if (is_cgroup_event(event))

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:perf/core] perf tools: Add new PERF_RECORD_SWITCH event
  2015-07-21  9:44 ` [PATCH V4 2/5] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
@ 2015-07-29  8:14   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-07-29  8:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, mathieu.poirier, eranian, linux-kernel, acme,
	adrian.hunter, ak, tglx, peterz, pawel.moll, jolsa

Commit-ID:  0286039f777ec0b6684868c34f7b16f97a069d6e
Gitweb:     http://git.kernel.org/tip/0286039f777ec0b6684868c34f7b16f97a069d6e
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Tue, 21 Jul 2015 12:44:03 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 23 Jul 2015 22:51:13 -0300

perf tools: Add new PERF_RECORD_SWITCH event

Support processing of PERF_RECORD_SWITCH events and
PERF_RECORD_SWITCH_CPU_WIDE events. There is a single
tools callback for them both so that the tool must
check the event type before using the extra members
in PERF_RECORD_SWITCH_CPU_WIDE.

There is still no way to select the events, though.
That is added in a subsequest patch.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-inject.c |  1 +
 tools/perf/util/event.c     | 28 ++++++++++++++++++++++++++++
 tools/perf/util/event.h     | 12 ++++++++++++
 tools/perf/util/evsel.c     |  1 +
 tools/perf/util/machine.c   | 11 +++++++++++
 tools/perf/util/machine.h   |  2 ++
 tools/perf/util/session.c   | 20 ++++++++++++++++++++
 tools/perf/util/tool.h      |  1 +
 8 files changed, 76 insertions(+)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 01b0649..f62c49b 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -561,6 +561,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			.lost		= perf_event__repipe,
 			.aux		= perf_event__repipe,
 			.itrace_start	= perf_event__repipe,
+			.context_switch	= perf_event__repipe,
 			.read		= perf_event__repipe_sample,
 			.throttle	= perf_event__repipe,
 			.unthrottle	= perf_event__repipe,
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 67a977e..7ff6127 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -26,6 +26,8 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_AUX]			= "AUX",
 	[PERF_RECORD_ITRACE_START]		= "ITRACE_START",
 	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
+	[PERF_RECORD_SWITCH]			= "SWITCH",
+	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -749,6 +751,14 @@ int perf_event__process_lost_samples(struct perf_tool *tool __maybe_unused,
 	return machine__process_lost_samples_event(machine, event, sample);
 }
 
+int perf_event__process_switch(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine)
+{
+	return machine__process_switch_event(machine, event);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
@@ -827,6 +837,20 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp)
 		       event->itrace_start.pid, event->itrace_start.tid);
 }
 
+size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp)
+{
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+	const char *in_out = out ? "OUT" : "IN ";
+
+	if (event->header.type == PERF_RECORD_SWITCH)
+		return fprintf(fp, " %s\n", in_out);
+
+	return fprintf(fp, " %s  %s pid/tid: %5u/%-5u\n",
+		       in_out, out ? "next" : "prev",
+		       event->context_switch.next_prev_pid,
+		       event->context_switch.next_prev_tid);
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -852,6 +876,10 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_ITRACE_START:
 		ret += perf_event__fprintf_itrace_start(event, fp);
 		break;
+	case PERF_RECORD_SWITCH:
+	case PERF_RECORD_SWITCH_CPU_WIDE:
+		ret += perf_event__fprintf_switch(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c53f363..4bb2ae8 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -348,6 +348,12 @@ struct itrace_start_event {
 	u32 pid, tid;
 };
 
+struct context_switch_event {
+	struct perf_event_header header;
+	u32 next_prev_pid;
+	u32 next_prev_tid;
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -369,6 +375,7 @@ union perf_event {
 	struct auxtrace_error_event	auxtrace_error;
 	struct aux_event		aux;
 	struct itrace_start_event	itrace_start;
+	struct context_switch_event	context_switch;
 };
 
 void perf_event__print_totals(void);
@@ -418,6 +425,10 @@ int perf_event__process_itrace_start(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample,
 				     struct machine *machine);
+int perf_event__process_switch(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -480,6 +491,7 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 u64 kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 49fb7b5..9e6e6f4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1127,6 +1127,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(mmap2, p_unsigned);
 	PRINT_ATTRf(comm_exec, p_unsigned);
 	PRINT_ATTRf(use_clockid, p_unsigned);
+	PRINT_ATTRf(context_switch, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 22006c1..be3e008 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -550,6 +550,14 @@ int machine__process_itrace_start_event(struct machine *machine __maybe_unused,
 	return 0;
 }
 
+int machine__process_switch_event(struct machine *machine __maybe_unused,
+				  union perf_event *event)
+{
+	if (dump_trace)
+		perf_event__fprintf_switch(event, stdout);
+	return 0;
+}
+
 struct map *machine__findnew_module_map(struct machine *machine, u64 start,
 					const char *filename)
 {
@@ -1451,6 +1459,9 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_itrace_start_event(machine, event); break;
 	case PERF_RECORD_LOST_SAMPLES:
 		ret = machine__process_lost_samples_event(machine, event, sample); break;
+	case PERF_RECORD_SWITCH:
+	case PERF_RECORD_SWITCH_CPU_WIDE:
+		ret = machine__process_switch_event(machine, event); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index ff5f78c..ea5cb4a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -87,6 +87,8 @@ int machine__process_aux_event(struct machine *machine,
 			       union perf_event *event);
 int machine__process_itrace_start_event(struct machine *machine,
 					union perf_event *event);
+int machine__process_switch_event(struct machine *machine __maybe_unused,
+				  union perf_event *event);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fb1d525..2d95747 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -333,6 +333,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->aux = perf_event__process_aux;
 	if (tool->itrace_start == NULL)
 		tool->itrace_start = perf_event__process_itrace_start;
+	if (tool->context_switch == NULL)
+		tool->context_switch = perf_event__process_switch;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -471,6 +473,19 @@ static void perf_event__itrace_start_swap(union perf_event *event,
 		swap_sample_id_all(event, &event->itrace_start + 1);
 }
 
+static void perf_event__switch_swap(union perf_event *event, bool sample_id_all)
+{
+	if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE) {
+		event->context_switch.next_prev_pid =
+				bswap_32(event->context_switch.next_prev_pid);
+		event->context_switch.next_prev_tid =
+				bswap_32(event->context_switch.next_prev_tid);
+	}
+
+	if (sample_id_all)
+		swap_sample_id_all(event, &event->context_switch + 1);
+}
+
 static void perf_event__throttle_swap(union perf_event *event,
 				      bool sample_id_all)
 {
@@ -633,6 +648,8 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_AUX]		  = perf_event__aux_swap,
 	[PERF_RECORD_ITRACE_START]	  = perf_event__itrace_start_swap,
 	[PERF_RECORD_LOST_SAMPLES]	  = perf_event__all64_swap,
+	[PERF_RECORD_SWITCH]		  = perf_event__switch_swap,
+	[PERF_RECORD_SWITCH_CPU_WIDE]	  = perf_event__switch_swap,
 	[PERF_RECORD_HEADER_ATTR]	  = perf_event__hdr_attr_swap,
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1094,6 +1111,9 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->aux(tool, event, sample, machine);
 	case PERF_RECORD_ITRACE_START:
 		return tool->itrace_start(tool, event, sample, machine);
+	case PERF_RECORD_SWITCH:
+	case PERF_RECORD_SWITCH_CPU_WIDE:
+		return tool->context_switch(tool, event, sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index c307dd4..cab8cc2 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			lost_samples,
 			aux,
 			itrace_start,
+			context_switch,
 			throttle,
 			unthrottle;
 	event_attr_op	attr;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:perf/core] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
  2015-07-21  9:44 ` [PATCH V4 3/5] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
@ 2015-07-29  8:15   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-07-29  8:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, acme, jolsa, mingo, peterz, pawel.moll, linux-kernel,
	eranian, hpa, tglx, adrian.hunter, mathieu.poirier

Commit-ID:  b757bb09134f479a087ece08d2cd2a6ba31c9210
Gitweb:     http://git.kernel.org/tip/b757bb09134f479a087ece08d2cd2a6ba31c9210
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Tue, 21 Jul 2015 12:44:04 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 23 Jul 2015 22:51:13 -0300

perf record: Add option --switch-events to select PERF_RECORD_SWITCH events

Add an option to select PERF_RECORD_SWITCH events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-record.txt |  4 ++++
 tools/perf/builtin-record.c              |  7 +++++++
 tools/perf/perf.h                        |  1 +
 tools/perf/util/evlist.h                 |  1 +
 tools/perf/util/evsel.c                  |  3 +++
 tools/perf/util/record.c                 | 10 ++++++++++
 6 files changed, 26 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 29e5307..63ee040 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -293,6 +293,10 @@ When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
 because the file may be huge. A time out is needed in such cases.
 This option sets the time out limit. The default value is 500 ms.
 
+--switch-events::
+Record context switch events i.e. events of type PERF_RECORD_SWITCH or
+PERF_RECORD_SWITCH_CPU_WIDE.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1932e27..445a64d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1075,6 +1075,8 @@ struct option __record_options[] = {
 			  "opts", "AUX area tracing Snapshot Mode", ""),
 	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
+		    "Record context switch events"),
 	OPT_END()
 };
 
@@ -1102,6 +1104,11 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 			  " system-wide mode\n");
 		usage_with_options(record_usage, record_options);
 	}
+	if (rec->opts.record_switch_events &&
+	    !perf_can_record_switch_events()) {
+		ui__error("kernel does not support recording context switch events (--switch-events option)\n");
+		usage_with_options(record_usage, record_options);
+	}
 
 	if (!rec->itr) {
 		rec->itr = auxtrace_record__init(rec->evlist, &err);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 937b16a..cf459f8 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -57,6 +57,7 @@ struct record_opts {
 	bool	     running_time;
 	bool	     full_auxtrace;
 	bool	     auxtrace_snapshot_mode;
+	bool	     record_switch_events;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 406a821..a8930b6 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -114,6 +114,7 @@ void perf_evlist__close(struct perf_evlist *evlist);
 
 void perf_evlist__set_id_pos(struct perf_evlist *evlist);
 bool perf_can_sample_identifier(void);
+bool perf_can_record_switch_events(void);
 void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts);
 int record_opts__config(struct record_opts *opts);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9e6e6f4..71f6905 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -738,6 +738,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
 
+	if (opts->record_switch_events)
+		attr->context_switch = track;
+
 	if (opts->sample_transaction)
 		perf_evsel__set_sample_bit(evsel, TRANSACTION);
 
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 1f7becb..0d228a2 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -85,6 +85,11 @@ static void perf_probe_comm_exec(struct perf_evsel *evsel)
 	evsel->attr.comm_exec = 1;
 }
 
+static void perf_probe_context_switch(struct perf_evsel *evsel)
+{
+	evsel->attr.context_switch = 1;
+}
+
 bool perf_can_sample_identifier(void)
 {
 	return perf_probe_api(perf_probe_sample_identifier);
@@ -95,6 +100,11 @@ static bool perf_can_comm_exec(void)
 	return perf_probe_api(perf_probe_comm_exec);
 }
 
+bool perf_can_record_switch_events(void)
+{
+	return perf_probe_api(perf_probe_context_switch);
+}
+
 void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts)
 {
 	struct perf_evsel *evsel;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:perf/core] perf script: Don' t assume evsel position of tracking events
  2015-07-21  9:44 ` [PATCH V4 4/5] perf script: Don't assume evsel position of tracking events Adrian Hunter
@ 2015-07-29  8:15   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-07-29  8:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mathieu.poirier, mingo, pawel.moll, acme, linux-kernel, eranian,
	jolsa, ak, peterz, hpa, adrian.hunter, tglx

Commit-ID:  06b234ec26fde8d0fce54030fe0858e218636410
Gitweb:     http://git.kernel.org/tip/06b234ec26fde8d0fce54030fe0858e218636410
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Tue, 21 Jul 2015 12:44:05 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 23 Jul 2015 22:51:14 -0300

perf script: Don't assume evsel position of tracking events

The tracking event does not have to be the first event so replace
perf_evlist__first() with perf_evlist__id2evsel() which uses the event
ID to find the correct evsel.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-script.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 2777c64..8a8e2ea 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -661,7 +661,7 @@ static int process_comm_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 	int ret = -1;
 
 	thread = machine__findnew_thread(machine, event->comm.pid, event->comm.tid);
@@ -695,7 +695,7 @@ static int process_fork_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	if (perf_event__process_fork(tool, event, sample, machine) < 0)
 		return -1;
@@ -727,7 +727,7 @@ static int process_exit_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	thread = machine__findnew_thread(machine, event->fork.pid, event->fork.tid);
 	if (thread == NULL) {
@@ -759,7 +759,7 @@ static int process_mmap_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	if (perf_event__process_mmap(tool, event, sample, machine) < 0)
 		return -1;
@@ -790,7 +790,7 @@ static int process_mmap2_event(struct perf_tool *tool,
 	struct thread *thread;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 	struct perf_session *session = script->session;
-	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
 
 	if (perf_event__process_mmap2(tool, event, sample, machine) < 0)
 		return -1;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:perf/core] perf script: Add option --show-switch-events
  2015-07-21  9:44 ` [PATCH V4 5/5] perf script: Add option --show-switch-events Adrian Hunter
@ 2015-07-29  8:15   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-07-29  8:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, jolsa, mingo, acme, mathieu.poirier, hpa, eranian,
	pawel.moll, tglx, linux-kernel, peterz, adrian.hunter

Commit-ID:  7c14898ba9386ee5c939bb418643ac6baff52840
Gitweb:     http://git.kernel.org/tip/7c14898ba9386ee5c939bb418643ac6baff52840
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Tue, 21 Jul 2015 12:44:06 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 23 Jul 2015 22:51:14 -0300

perf script: Add option --show-switch-events

Add option --show-switch-events to show switch events in a similar
fashion to --show-task-events and --show-mmap-events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1437471846-26995-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt |  4 ++++
 tools/perf/builtin-script.c              | 31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index c82df57..e2fec5f 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -222,6 +222,10 @@ OPTIONS
 --show-mmap-events
 	Display mmap related events (e.g. MMAP, MMAP2).
 
+--show-switch-events
+	Display context switch events i.e. events of type PERF_RECORD_SWITCH or
+	PERF_RECORD_SWITCH_CPU_WIDE.
+
 --header
 	Show perf.data header.
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8a8e2ea..bd31380 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -623,6 +623,7 @@ struct perf_script {
 	struct perf_session	*session;
 	bool			show_task_events;
 	bool			show_mmap_events;
+	bool			show_switch_events;
 };
 
 static int process_attr(struct perf_tool *tool, union perf_event *event,
@@ -813,6 +814,32 @@ static int process_mmap2_event(struct perf_tool *tool,
 	return 0;
 }
 
+static int process_switch_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+
+	if (perf_event__process_switch(tool, event, sample, machine) < 0)
+		return -1;
+
+	thread = machine__findnew_thread(machine, sample->pid,
+					 sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing SWITCH event, skipping it.\n");
+		return -1;
+	}
+
+	print_sample_start(sample, thread, evsel);
+	perf_event__fprintf(event, stdout);
+	thread__put(thread);
+	return 0;
+}
+
 static void sig_handler(int sig __maybe_unused)
 {
 	session_done = 1;
@@ -834,6 +861,8 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.mmap = process_mmap_event;
 		script->tool.mmap2 = process_mmap2_event;
 	}
+	if (script->show_switch_events)
+		script->tool.context_switch = process_switch_event;
 
 	ret = perf_session__process_events(script->session);
 
@@ -1618,6 +1647,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show the fork/comm/exit events"),
 	OPT_BOOLEAN('\0', "show-mmap-events", &script.show_mmap_events,
 		    "Show the mmap events"),
+	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
+		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options",

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-07-29  8:16 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-21  9:44 [PATCH V4 0/5] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
2015-07-21  9:44 ` [PATCH V4 1/5] " Adrian Hunter
2015-07-21 10:29   ` Peter Zijlstra
2015-07-21 17:54     ` Arnaldo Carvalho de Melo
2015-07-21 18:15       ` Arnaldo Carvalho de Melo
2015-07-22  9:53         ` Jiri Olsa
2015-07-29  8:14   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-21  9:44 ` [PATCH V4 2/5] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
2015-07-29  8:14   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-21  9:44 ` [PATCH V4 3/5] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
2015-07-29  8:15   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-21  9:44 ` [PATCH V4 4/5] perf script: Don't assume evsel position of tracking events Adrian Hunter
2015-07-29  8:15   ` [tip:perf/core] perf script: Don' t " tip-bot for Adrian Hunter
2015-07-21  9:44 ` [PATCH V4 5/5] perf script: Add option --show-switch-events Adrian Hunter
2015-07-29  8:15   ` [tip:perf/core] " tip-bot for Adrian Hunter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).