linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
@ 2015-07-07  8:36 Adrian Hunter
  2015-07-07  8:36 ` [PATCH V3 1/4] " Adrian Hunter
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Adrian Hunter @ 2015-07-07  8:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Hi

Here is V3 of PERF_RECORD_SWITCH.

Changes in V3:

Rename 'perf_event_switch()' parameter 'out' to be 'sched_in'
and invert meaning.

Changes in V2 (RFC):

The event now shows scheduling in and out,
and I added next/prev pid/tid for cpu-wide
contexts, plus there are some tools patches
that apply to Arnaldo's perf/core branch.


Adrian Hunter (4):
      perf: Add PERF_RECORD_SWITCH to indicate context switches
      perf tools: Add new PERF_RECORD_SWITCH event
      perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
      perf script: Add option --show-switch-events

 include/uapi/linux/perf_event.h          |  20 +++++-
 kernel/events/core.c                     | 102 +++++++++++++++++++++++++++++++
 tools/perf/Documentation/perf-record.txt |   3 +
 tools/perf/Documentation/perf-script.txt |   3 +
 tools/perf/builtin-inject.c              |   1 +
 tools/perf/builtin-record.c              |   7 +++
 tools/perf/builtin-script.c              |  31 ++++++++++
 tools/perf/perf.h                        |   1 +
 tools/perf/util/event.c                  |  26 ++++++++
 tools/perf/util/event.h                  |  12 ++++
 tools/perf/util/evlist.h                 |   1 +
 tools/perf/util/evsel.c                  |   4 ++
 tools/perf/util/machine.c                |  10 +++
 tools/perf/util/machine.h                |   2 +
 tools/perf/util/record.c                 |  10 +++
 tools/perf/util/session.c                |  16 +++++
 tools/perf/util/tool.h                   |   1 +
 17 files changed, 249 insertions(+), 1 deletion(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07  8:36 [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
@ 2015-07-07  8:36 ` Adrian Hunter
  2015-07-07 13:25   ` Arnaldo Carvalho de Melo
  2015-07-07  8:36 ` [PATCH V3 2/4] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2015-07-07  8:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

There are already two events for context switches, namely
the tracepoint sched:sched_switch and the software event
context_switches. Unfortunately neither are suitable for
use by non-privileged users for the purpose of synchronizing
hardware trace data (e.g. Intel PT) to the context switch.

Tracepoints are no good at all for non-privileged users
because they need either CAP_SYS_ADMIN or
/proc/sys/kernel/perf_event_paranoid <= -1.

On the other hand, kernel software events need either
CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= 1.

Now many distributions do default perf_event_paranoid to 1
making context_switches a contender, except it has another
problem (which is also shared with sched:sched_switch)
which is that it happens before perf schedules events out
instead of after perf schedules events in. Whereas a
privileged user can see all the events anyway, a
non-privileged user only sees events for their own processes,
in other words they see when their process was scheduled out
not when it was scheduled in. That presents two problems to
use the event: 1. the information comes too late, so tools
have to look ahead in the event stream to find out what the
current state is 2. if they are unlucky tracing might have
stopped before the context-switches event is recorded.

This new PERF_RECORD_SWITCH event does not have those problems
and it also has a couple of other small advantages. It is
easier to use because it is an auxiliary event (like mmap,
comm and task events) which can be enabled by setting a single
bit. It is smaller than sched:sched_switch and easier to parse.

To make the event useful for privileged users also, if the
context is cpu-wide then the event will also provide the
next or previous pid/tid.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 include/uapi/linux/perf_event.h |  20 +++++++-
 kernel/events/core.c            | 102 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d97f84c080da..7f1664b818c0 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -330,7 +330,8 @@ struct perf_event_attr {
 				mmap2          :  1, /* include mmap with inode data     */
 				comm_exec      :  1, /* flag comm events that are due to an exec */
 				use_clockid    :  1, /* use @clockid for time fields */
-				__reserved_1   : 38;
+				context_switch :  1, /* context switch data */
+				__reserved_1   : 37;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -572,9 +573,11 @@ struct perf_event_mmap_page {
 /*
  * PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on
  * different events so can reuse the same bit position.
+ * Ditto PERF_RECORD_MISC_SWITCH_OUT.
  */
 #define PERF_RECORD_MISC_MMAP_DATA		(1 << 13)
 #define PERF_RECORD_MISC_COMM_EXEC		(1 << 13)
+#define PERF_RECORD_MISC_SWITCH_OUT		(1 << 13)
 /*
  * Indicates that the content of PERF_SAMPLE_IP points to
  * the actual instruction that triggered the event. See also
@@ -818,6 +821,21 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_LOST_SAMPLES		= 13,
 
+	/*
+	 * Records a context switch in or out (flagged by
+	 * PERF_RECORD_MISC_SWITCH_OUT).  next_prev_pid and next_prev_tid are
+	 * (u32)-1 unless the context is cpu-wide, in which case they are the
+	 * next (switching out) or previous (switching in) pid/tid.
+	 *
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				next_prev_pid;
+	 *	u32				next_prev_tid;
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_SWITCH			= 14,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae3419b99..4e8755c5948e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -163,6 +163,7 @@ static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
+static atomic_t nr_switch_events __read_mostly;
 
 static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
@@ -2619,6 +2620,9 @@ static void perf_pmu_sched_task(struct task_struct *prev,
 	local_irq_restore(flags);
 }
 
+static void perf_event_switch(struct task_struct *task,
+			      struct task_struct *next_prev, bool sched_in);
+
 #define for_each_task_context_nr(ctxn)					\
 	for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
 
@@ -2641,6 +2645,9 @@ void __perf_event_task_sched_out(struct task_struct *task,
 	if (__this_cpu_read(perf_sched_cb_usages))
 		perf_pmu_sched_task(task, next, false);
 
+	if (atomic_read(&nr_switch_events))
+		perf_event_switch(task, next, false);
+
 	for_each_task_context_nr(ctxn)
 		perf_event_context_sched_out(task, ctxn, next);
 
@@ -2831,6 +2838,9 @@ void __perf_event_task_sched_in(struct task_struct *prev,
 	if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
 		perf_cgroup_sched_in(prev, task);
 
+	if (atomic_read(&nr_switch_events))
+		perf_event_switch(task, prev, true);
+
 	if (__this_cpu_read(perf_sched_cb_usages))
 		perf_pmu_sched_task(prev, task, true);
 }
@@ -3454,6 +3464,10 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
 		atomic_dec(&nr_freq_events);
+	if (event->attr.context_switch) {
+		static_key_slow_dec_deferred(&perf_sched_events);
+		atomic_dec(&nr_switch_events);
+	}
 	if (is_cgroup_event(event))
 		static_key_slow_dec_deferred(&perf_sched_events);
 	if (has_branch_stack(event))
@@ -5982,6 +5996,90 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost)
 }
 
 /*
+ * context_switch tracking
+ */
+
+struct perf_switch_event {
+	struct task_struct	*task;
+	struct task_struct	*next_prev;
+
+	struct {
+		struct perf_event_header	header;
+		u32				next_prev_pid;
+		u32				next_prev_tid;
+	} event_id;
+};
+
+static int perf_event_switch_match(struct perf_event *event)
+{
+	return event->attr.context_switch;
+}
+
+static void perf_event_switch_output(struct perf_event *event, void *data)
+{
+	struct perf_switch_event *switch_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int size = switch_event->event_id.header.size;
+	int ret;
+
+	if (!perf_event_switch_match(event))
+		return;
+
+	perf_event_header__init_id(&switch_event->event_id.header, &sample, event);
+
+	ret = perf_output_begin(&handle, event,
+				switch_event->event_id.header.size);
+	if (ret)
+		goto out;
+
+	/* Only CPU-wide events are allowed to see next/prev pid/tid */
+	if (event->ctx->task) {
+		switch_event->event_id.next_prev_pid = -1;
+		switch_event->event_id.next_prev_tid = -1;
+	} else {
+		switch_event->event_id.next_prev_pid =
+				perf_event_pid(event, switch_event->next_prev);
+		switch_event->event_id.next_prev_tid =
+				perf_event_tid(event, switch_event->next_prev);
+	}
+
+	perf_output_put(&handle, switch_event->event_id);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+out:
+	switch_event->event_id.header.size = size;
+}
+
+static void perf_event_switch(struct task_struct *task,
+			      struct task_struct *next_prev, bool sched_in)
+{
+	struct perf_switch_event switch_event;
+
+	/* N.B. caller checks nr_switch_events != 0 */
+
+	switch_event = (struct perf_switch_event){
+		.task		= task,
+		.next_prev	= next_prev,
+		.event_id	= {
+			.header = {
+				.type = PERF_RECORD_SWITCH,
+				.misc = sched_in ? 0 : PERF_RECORD_MISC_SWITCH_OUT,
+				.size = sizeof(switch_event.event_id),
+			},
+			/* .next_prev_pid */
+			/* .next_prev_tid */
+		},
+	};
+
+	perf_event_aux(perf_event_switch_output,
+		       &switch_event,
+		       NULL);
+}
+
+/*
  * IRQ throttle logging
  */
 
@@ -7479,6 +7577,10 @@ static void account_event(struct perf_event *event)
 		if (atomic_inc_return(&nr_freq_events) == 1)
 			tick_nohz_full_kick_all();
 	}
+	if (event->attr.context_switch) {
+		atomic_inc(&nr_switch_events);
+		static_key_slow_inc(&perf_sched_events.key);
+	}
 	if (has_branch_stack(event))
 		static_key_slow_inc(&perf_sched_events.key);
 	if (is_cgroup_event(event))
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V3 2/4] perf tools: Add new PERF_RECORD_SWITCH event
  2015-07-07  8:36 [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
  2015-07-07  8:36 ` [PATCH V3 1/4] " Adrian Hunter
@ 2015-07-07  8:36 ` Adrian Hunter
  2015-07-07 13:50   ` Arnaldo Carvalho de Melo
  2015-07-07  8:36 ` [PATCH V3 3/4] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2015-07-07  8:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Support processing of PERF_RECORD_SWITCH events.
There is still no way to select them, though.
That is added in a subsequest patch.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c |  1 +
 tools/perf/util/event.c     | 26 ++++++++++++++++++++++++++
 tools/perf/util/event.h     | 12 ++++++++++++
 tools/perf/util/evsel.c     |  1 +
 tools/perf/util/machine.c   | 10 ++++++++++
 tools/perf/util/machine.h   |  2 ++
 tools/perf/util/session.c   | 16 ++++++++++++++++
 tools/perf/util/tool.h      |  1 +
 8 files changed, 69 insertions(+)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 01b06492bd6a..f62c49b35be0 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -561,6 +561,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			.lost		= perf_event__repipe,
 			.aux		= perf_event__repipe,
 			.itrace_start	= perf_event__repipe,
+			.context_switch	= perf_event__repipe,
 			.read		= perf_event__repipe_sample,
 			.throttle	= perf_event__repipe,
 			.unthrottle	= perf_event__repipe,
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 67a977e5d0ab..c61cc02932fa 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -26,6 +26,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_AUX]			= "AUX",
 	[PERF_RECORD_ITRACE_START]		= "ITRACE_START",
 	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
+	[PERF_RECORD_SWITCH]			= "SWITCH",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -749,6 +750,14 @@ int perf_event__process_lost_samples(struct perf_tool *tool __maybe_unused,
 	return machine__process_lost_samples_event(machine, event, sample);
 }
 
+int perf_event__process_switch(struct perf_tool *tool __maybe_unused,
+			       union perf_event *event,
+			       struct perf_sample *sample __maybe_unused,
+			       struct machine *machine)
+{
+	return machine__process_switch_event(machine, event);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
@@ -827,6 +836,20 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp)
 		       event->itrace_start.pid, event->itrace_start.tid);
 }
 
+size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp)
+{
+	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
+	const char *in_out = out ? "OUT" : "IN ";
+
+	if (event->context_switch.next_prev_pid == (u32)-1)
+		return fprintf(fp, " %s\n", in_out);
+
+	return fprintf(fp, " %s  %s pid/tid: %5u/%-5u\n",
+		       in_out, out ? "next" : "prev",
+		       event->context_switch.next_prev_pid,
+		       event->context_switch.next_prev_tid);
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -852,6 +875,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_ITRACE_START:
 		ret += perf_event__fprintf_itrace_start(event, fp);
 		break;
+	case PERF_RECORD_SWITCH:
+		ret += perf_event__fprintf_switch(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c53f36384b64..4bb2ae894c78 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -348,6 +348,12 @@ struct itrace_start_event {
 	u32 pid, tid;
 };
 
+struct context_switch_event {
+	struct perf_event_header header;
+	u32 next_prev_pid;
+	u32 next_prev_tid;
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -369,6 +375,7 @@ union perf_event {
 	struct auxtrace_error_event	auxtrace_error;
 	struct aux_event		aux;
 	struct itrace_start_event	itrace_start;
+	struct context_switch_event	context_switch;
 };
 
 void perf_event__print_totals(void);
@@ -418,6 +425,10 @@ int perf_event__process_itrace_start(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_sample *sample,
 				     struct machine *machine);
+int perf_event__process_switch(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_sample *sample,
+			       struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -480,6 +491,7 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 u64 kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 83c08037e7e2..67d2b43dd500 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1126,6 +1126,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(mmap2, p_unsigned);
 	PRINT_ATTRf(comm_exec, p_unsigned);
 	PRINT_ATTRf(use_clockid, p_unsigned);
+	PRINT_ATTRf(context_switch, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7ff682770fdb..dfcd80df7cba 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -550,6 +550,14 @@ int machine__process_itrace_start_event(struct machine *machine __maybe_unused,
 	return 0;
 }
 
+int machine__process_switch_event(struct machine *machine __maybe_unused,
+				  union perf_event *event)
+{
+	if (dump_trace)
+		perf_event__fprintf_switch(event, stdout);
+	return 0;
+}
+
 struct map *machine__findnew_module_map(struct machine *machine, u64 start,
 					const char *filename)
 {
@@ -1451,6 +1459,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_itrace_start_event(machine, event); break;
 	case PERF_RECORD_LOST_SAMPLES:
 		ret = machine__process_lost_samples_event(machine, event, sample); break;
+	case PERF_RECORD_SWITCH:
+		ret = machine__process_switch_event(machine, event); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 887798e511e9..f890697a4c9c 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -87,6 +87,8 @@ int machine__process_aux_event(struct machine *machine,
 			       union perf_event *event);
 int machine__process_itrace_start_event(struct machine *machine,
 					union perf_event *event);
+int machine__process_switch_event(struct machine *machine __maybe_unused,
+				  union perf_event *event);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ed9dc2555ec7..be6914a1c29d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -332,6 +332,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->aux = perf_event__process_aux;
 	if (tool->itrace_start == NULL)
 		tool->itrace_start = perf_event__process_itrace_start;
+	if (tool->context_switch == NULL)
+		tool->context_switch = perf_event__process_switch;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -470,6 +472,17 @@ static void perf_event__itrace_start_swap(union perf_event *event,
 		swap_sample_id_all(event, &event->itrace_start + 1);
 }
 
+static void perf_event__switch_swap(union perf_event *event, bool sample_id_all)
+{
+	event->context_switch.next_prev_pid =
+				bswap_32(event->context_switch.next_prev_pid);
+	event->context_switch.next_prev_tid =
+				bswap_32(event->context_switch.next_prev_tid);
+
+	if (sample_id_all)
+		swap_sample_id_all(event, &event->context_switch + 1);
+}
+
 static void perf_event__throttle_swap(union perf_event *event,
 				      bool sample_id_all)
 {
@@ -632,6 +645,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_AUX]		  = perf_event__aux_swap,
 	[PERF_RECORD_ITRACE_START]	  = perf_event__itrace_start_swap,
 	[PERF_RECORD_LOST_SAMPLES]	  = perf_event__all64_swap,
+	[PERF_RECORD_SWITCH]		  = perf_event__switch_swap,
 	[PERF_RECORD_HEADER_ATTR]	  = perf_event__hdr_attr_swap,
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1093,6 +1107,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->aux(tool, event, sample, machine);
 	case PERF_RECORD_ITRACE_START:
 		return tool->itrace_start(tool, event, sample, machine);
+	case PERF_RECORD_SWITCH:
+		return tool->context_switch(tool, event, sample, machine);
 	default:
 		++evlist->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index c307dd438286..cab8cc24831b 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -46,6 +46,7 @@ struct perf_tool {
 			lost_samples,
 			aux,
 			itrace_start,
+			context_switch,
 			throttle,
 			unthrottle;
 	event_attr_op	attr;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V3 3/4] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
  2015-07-07  8:36 [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
  2015-07-07  8:36 ` [PATCH V3 1/4] " Adrian Hunter
  2015-07-07  8:36 ` [PATCH V3 2/4] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
@ 2015-07-07  8:36 ` Adrian Hunter
  2015-07-07 13:53   ` Arnaldo Carvalho de Melo
  2015-07-07  8:36 ` [PATCH V3 4/4] perf script: Add option --show-switch-events Adrian Hunter
  2015-07-07  9:06 ` [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Peter Zijlstra
  4 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2015-07-07  8:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Add an option to select PERF_RECORD_SWITCH events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt |  3 +++
 tools/perf/builtin-record.c              |  7 +++++++
 tools/perf/perf.h                        |  1 +
 tools/perf/util/evlist.h                 |  1 +
 tools/perf/util/evsel.c                  |  3 +++
 tools/perf/util/record.c                 | 10 ++++++++++
 6 files changed, 25 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 9b9d9d086680..92d098c5123b 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -276,6 +276,9 @@ When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
 because the file may be huge. A time out is needed in such cases.
 This option sets the time out limit. The default value is 500 ms.
 
+--switch-events::
+Record context switch events i.e. events of type PERF_RECORD_SWITCH.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 283fe96bdfc1..a20ad2191a92 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1072,6 +1072,8 @@ struct option __record_options[] = {
 			  "opts", "AUX area tracing Snapshot Mode", ""),
 	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
+		    "Record context switch events"),
 	OPT_END()
 };
 
@@ -1099,6 +1101,11 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 			  " system-wide mode\n");
 		usage_with_options(record_usage, record_options);
 	}
+	if (rec->opts.record_switch_events &&
+	    !perf_can_record_switch_events()) {
+		ui__error("kernel does not support recording context switch events (--switch-events option)\n");
+		usage_with_options(record_usage, record_options);
+	}
 
 	if (!rec->itr) {
 		rec->itr = auxtrace_record__init(rec->evlist, &err);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 937b16aa0300..cf459f89fc9b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -57,6 +57,7 @@ struct record_opts {
 	bool	     running_time;
 	bool	     full_auxtrace;
 	bool	     auxtrace_snapshot_mode;
+	bool	     record_switch_events;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 037633c1da9d..c2fb15768d79 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -114,6 +114,7 @@ void perf_evlist__close(struct perf_evlist *evlist);
 
 void perf_evlist__set_id_pos(struct perf_evlist *evlist);
 bool perf_can_sample_identifier(void);
+bool perf_can_record_switch_events(void);
 void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts);
 int record_opts__config(struct record_opts *opts);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 67d2b43dd500..c63f96dba0d2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -737,6 +737,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
 
+	if (opts->record_switch_events)
+		attr->context_switch = track;
+
 	if (opts->sample_transaction)
 		perf_evsel__set_sample_bit(evsel, TRANSACTION);
 
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 1f7becbe5e18..0d228a29526d 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -85,6 +85,11 @@ static void perf_probe_comm_exec(struct perf_evsel *evsel)
 	evsel->attr.comm_exec = 1;
 }
 
+static void perf_probe_context_switch(struct perf_evsel *evsel)
+{
+	evsel->attr.context_switch = 1;
+}
+
 bool perf_can_sample_identifier(void)
 {
 	return perf_probe_api(perf_probe_sample_identifier);
@@ -95,6 +100,11 @@ static bool perf_can_comm_exec(void)
 	return perf_probe_api(perf_probe_comm_exec);
 }
 
+bool perf_can_record_switch_events(void)
+{
+	return perf_probe_api(perf_probe_context_switch);
+}
+
 void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts)
 {
 	struct perf_evsel *evsel;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH V3 4/4] perf script: Add option --show-switch-events
  2015-07-07  8:36 [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
                   ` (2 preceding siblings ...)
  2015-07-07  8:36 ` [PATCH V3 3/4] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
@ 2015-07-07  8:36 ` Adrian Hunter
  2015-07-07 13:54   ` Arnaldo Carvalho de Melo
  2015-07-07  9:06 ` [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Peter Zijlstra
  4 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2015-07-07  8:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

Add option --show-switch-events to show switch events
in a similar fashion to --show-task-events and
--show-mmap-events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt |  3 +++
 tools/perf/builtin-script.c              | 31 +++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index c82df572fac2..beca47ae9dbf 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -222,6 +222,9 @@ OPTIONS
 --show-mmap-events
 	Display mmap related events (e.g. MMAP, MMAP2).
 
+--show-switch-events
+	Display context switch events i.e. events of type PERF_RECORD_SWITCH.
+
 --header
 	Show perf.data header.
 
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 24809787369f..b3a00c93bfa1 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -623,6 +623,7 @@ struct perf_script {
 	struct perf_session	*session;
 	bool			show_task_events;
 	bool			show_mmap_events;
+	bool			show_switch_events;
 };
 
 static int process_attr(struct perf_tool *tool, union perf_event *event,
@@ -813,6 +814,32 @@ static int process_mmap2_event(struct perf_tool *tool,
 	return 0;
 }
 
+static int process_switch_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct perf_evsel *evsel = perf_evlist__first(session->evlist);
+
+	if (perf_event__process_switch(tool, event, sample, machine) < 0)
+		return -1;
+
+	thread = machine__findnew_thread(machine, sample->pid,
+					 sample->tid);
+	if (thread == NULL) {
+		pr_debug("problem processing SWITCH event, skipping it.\n");
+		return -1;
+	}
+
+	print_sample_start(sample, thread, evsel);
+	perf_event__fprintf(event, stdout);
+	thread__put(thread);
+	return 0;
+}
+
 static void sig_handler(int sig __maybe_unused)
 {
 	session_done = 1;
@@ -834,6 +861,8 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.mmap = process_mmap_event;
 		script->tool.mmap2 = process_mmap2_event;
 	}
+	if (script->show_switch_events)
+		script->tool.context_switch = process_switch_event;
 
 	ret = perf_session__process_events(script->session);
 
@@ -1618,6 +1647,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show the fork/comm/exit events"),
 	OPT_BOOLEAN('\0', "show-mmap-events", &script.show_mmap_events,
 		    "Show the mmap events"),
+	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
+		    "Show context switch events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options",
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07  8:36 [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
                   ` (3 preceding siblings ...)
  2015-07-07  8:36 ` [PATCH V3 4/4] perf script: Add option --show-switch-events Adrian Hunter
@ 2015-07-07  9:06 ` Peter Zijlstra
  4 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2015-07-07  9:06 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	Jiri Olsa, Stephane Eranian, mathieu.poirier, Pawel Moll

On Tue, Jul 07, 2015 at 11:36:38AM +0300, Adrian Hunter wrote:
> Adrian Hunter (4):
>       perf: Add PERF_RECORD_SWITCH to indicate context switches
>       perf tools: Add new PERF_RECORD_SWITCH event
>       perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
>       perf script: Add option --show-switch-events
> 
>  include/uapi/linux/perf_event.h          |  20 +++++-
>  kernel/events/core.c                     | 102 +++++++++++++++++++++++++++++++
>  tools/perf/Documentation/perf-record.txt |   3 +
>  tools/perf/Documentation/perf-script.txt |   3 +
>  tools/perf/builtin-inject.c              |   1 +
>  tools/perf/builtin-record.c              |   7 +++
>  tools/perf/builtin-script.c              |  31 ++++++++++
>  tools/perf/perf.h                        |   1 +
>  tools/perf/util/event.c                  |  26 ++++++++
>  tools/perf/util/event.h                  |  12 ++++
>  tools/perf/util/evlist.h                 |   1 +
>  tools/perf/util/evsel.c                  |   4 ++
>  tools/perf/util/machine.c                |  10 +++
>  tools/perf/util/machine.h                |   2 +
>  tools/perf/util/record.c                 |  10 +++
>  tools/perf/util/session.c                |  16 +++++
>  tools/perf/util/tool.h                   |   1 +
>  17 files changed, 249 insertions(+), 1 deletion(-)

Acme, Jiri, ACK on the tools bits?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07  8:36 ` [PATCH V3 1/4] " Adrian Hunter
@ 2015-07-07 13:25   ` Arnaldo Carvalho de Melo
  2015-07-07 13:44     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-07 13:25 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 07, 2015 at 11:36:39AM +0300, Adrian Hunter escreveu:
> To make the event useful for privileged users also, if the
> context is cpu-wide then the event will also provide the
> next or previous pid/tid.

<SNIP>
  
> +	/*
> +	 * Records a context switch in or out (flagged by
> +	 * PERF_RECORD_MISC_SWITCH_OUT).  next_prev_pid and next_prev_tid are
> +	 * (u32)-1 unless the context is cpu-wide, in which case they are the

Why carry those extra 8 bytes for non priviledged users, all the time
with -1?

Can't userspace cope with this, i.e. we should be able to look for those
fields when the context is CPU wide, and to not look for them otherwise,
no?

> +	 * next (switching out) or previous (switching in) pid/tid.
> +	 *
> +	 * struct {
> +	 *	struct perf_event_header	header;
> +	 *	u32				next_prev_pid;
> +	 *	u32				next_prev_tid;
> +	 *	struct sample_id		sample_id;
> +	 * };
> +	 */
> +	PERF_RECORD_SWITCH			= 14,
> +

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07 13:25   ` Arnaldo Carvalho de Melo
@ 2015-07-07 13:44     ` Arnaldo Carvalho de Melo
  2015-07-07 15:36       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-07 13:44 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 07, 2015 at 10:25:52AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jul 07, 2015 at 11:36:39AM +0300, Adrian Hunter escreveu:
> > To make the event useful for privileged users also, if the
> > context is cpu-wide then the event will also provide the
> > next or previous pid/tid.
> 
> <SNIP>
>   
> > +	/*
> > +	 * Records a context switch in or out (flagged by
> > +	 * PERF_RECORD_MISC_SWITCH_OUT).  next_prev_pid and next_prev_tid are
> > +	 * (u32)-1 unless the context is cpu-wide, in which case they are the
> 
> Why carry those extra 8 bytes for non priviledged users, all the time
> with -1?

> Can't userspace cope with this, i.e. we should be able to look for those
> fields when the context is CPU wide, and to not look for them otherwise,
> no?

To help userspace in places where all it has is the union perf_event, we
can reuse one bit in misc to state that, i.e.

  #define PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID 14

For instance.

- Arnaldo
 
> > +	 * next (switching out) or previous (switching in) pid/tid.
> > +	 *
> > +	 * struct {
> > +	 *	struct perf_event_header	header;
> > +	 *	u32				next_prev_pid;
> > +	 *	u32				next_prev_tid;
> > +	 *	struct sample_id		sample_id;
> > +	 * };
> > +	 */
> > +	PERF_RECORD_SWITCH			= 14,
> > +


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 2/4] perf tools: Add new PERF_RECORD_SWITCH event
  2015-07-07  8:36 ` [PATCH V3 2/4] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
@ 2015-07-07 13:50   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-07 13:50 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 07, 2015 at 11:36:40AM +0300, Adrian Hunter escreveu:
> +size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp)
> +{
> +	bool out = event->header.misc & PERF_RECORD_MISC_SWITCH_OUT;
> +	const char *in_out = out ? "OUT" : "IN ";
> +
> +	if (event->context_switch.next_prev_pid == (u32)-1)

I.e. we would have the test:

	if (!(event->header.misc & PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID))
		return fprintf(fp, " %s\n", in_out);

> +	return fprintf(fp, " %s  %s pid/tid: %5u/%-5u\n",
> +		       in_out, out ? "next" : "prev",
> +		       event->context_switch.next_prev_pid,
> +		       event->context_switch.next_prev_tid);


<SNIP>

> +static void perf_event__switch_swap(union perf_event *event, bool sample_id_all)
> +{

Here too:

	if (event->header.misc & PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID) {
		event->context_switch.next_prev_pid = bswap_32(event->context_switch.next_prev_pid);
		event->context_switch.next_prev_tid = bswap_32(event->context_switch.next_prev_tid);
		if (sample_id_all)
			swap_sample_id_all(event, &event->context_switch + 1);
	} else if (sample_id_all)
		swap_sample_id_all(event, &event->header + 1);
> +}
> +

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 3/4] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events
  2015-07-07  8:36 ` [PATCH V3 3/4] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
@ 2015-07-07 13:53   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-07 13:53 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 07, 2015 at 11:36:41AM +0300, Adrian Hunter escreveu:
> Add an option to select PERF_RECORD_SWITCH events.

Looks ok, only if explicitely asked, and probed for presence in the
runninbg kerrnel that attr.context_switch will be set, ok.

Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 4/4] perf script: Add option --show-switch-events
  2015-07-07  8:36 ` [PATCH V3 4/4] perf script: Add option --show-switch-events Adrian Hunter
@ 2015-07-07 13:54   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-07 13:54 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 07, 2015 at 11:36:42AM +0300, Adrian Hunter escreveu:
> Add option --show-switch-events to show switch events
> in a similar fashion to --show-task-events and
> --show-mmap-events.

Looks, ok,

Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07 13:44     ` Arnaldo Carvalho de Melo
@ 2015-07-07 15:36       ` Peter Zijlstra
  2015-07-07 16:13         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2015-07-07 15:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

On Tue, Jul 07, 2015 at 10:44:37AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jul 07, 2015 at 10:25:52AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Tue, Jul 07, 2015 at 11:36:39AM +0300, Adrian Hunter escreveu:
> > > To make the event useful for privileged users also, if the
> > > context is cpu-wide then the event will also provide the
> > > next or previous pid/tid.
> > 
> > <SNIP>
> >   
> > > +	/*
> > > +	 * Records a context switch in or out (flagged by
> > > +	 * PERF_RECORD_MISC_SWITCH_OUT).  next_prev_pid and next_prev_tid are
> > > +	 * (u32)-1 unless the context is cpu-wide, in which case they are the
> > 
> > Why carry those extra 8 bytes for non priviledged users, all the time
> > with -1?
> 
> > Can't userspace cope with this, i.e. we should be able to look for those
> > fields when the context is CPU wide, and to not look for them otherwise,
> > no?
> 
> To help userspace in places where all it has is the union perf_event, we
> can reuse one bit in misc to state that, i.e.
> 
>   #define PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID 14
> 
> For instance.

The other option would be a separate RECORD type, which might be
simpler.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07 15:36       ` Peter Zijlstra
@ 2015-07-07 16:13         ` Arnaldo Carvalho de Melo
  2015-07-07 22:52           ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-07 16:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Tue, Jul 07, 2015 at 05:36:14PM +0200, Peter Zijlstra escreveu:
> On Tue, Jul 07, 2015 at 10:44:37AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jul 07, 2015 at 10:25:52AM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Tue, Jul 07, 2015 at 11:36:39AM +0300, Adrian Hunter escreveu:
> > > > +	 * Records a context switch in or out (flagged by
> > > > +	 * PERF_RECORD_MISC_SWITCH_OUT).  next_prev_pid and next_prev_tid are
> > > > +	 * (u32)-1 unless the context is cpu-wide, in which case they are the

> > > Why carry those extra 8 bytes for non priviledged users, all the time
> > > with -1?

> > > Can't userspace cope with this, i.e. we should be able to look for those
> > > fields when the context is CPU wide, and to not look for them otherwise,
> > > no?

> > To help userspace in places where all it has is the union perf_event, we
> > can reuse one bit in misc to state that, i.e.

> >   #define PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID 14

> > For instance.

> The other option would be a separate RECORD type, which might be
> simpler.

Humm, do we really need it?

I think this is just us wanting to, since we are going to add a new
record, to make it more useful for other, not right now needed,
situations, i.e. if the user is priviledged, there are two other options
to get his info, right?

So, yeah, since we have those bits doing nothing in header.misc, we
could well use them :-)

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07 16:13         ` Arnaldo Carvalho de Melo
@ 2015-07-07 22:52           ` Peter Zijlstra
  2015-07-08 13:28             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2015-07-07 22:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

On Tue, Jul 07, 2015 at 01:13:59PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jul 07, 2015 at 05:36:14PM +0200, Peter Zijlstra escreveu:
> > > To help userspace in places where all it has is the union perf_event, we
> > > can reuse one bit in misc to state that, i.e.
> 
> > >   #define PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID 14
> 
> > > For instance.
> 
> > The other option would be a separate RECORD type, which might be
> > simpler.
> 
> Humm, do we really need it?
> 
> I think this is just us wanting to, since we are going to add a new
> record, to make it more useful for other, not right now needed,
> situations, i.e. if the user is priviledged, there are two other options
> to get his info, right?

I was just thinking that 2 records, each with a fixed layout would be
easier to parse than 1 record with variable layout.

The record space is immense, so from that point it really doesn't
matter.

Do whatever is easiest, less mistakes get made etc. :-)

No real preference either way, as long we we've thought about it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-07 22:52           ` Peter Zijlstra
@ 2015-07-08 13:28             ` Arnaldo Carvalho de Melo
  2015-07-08 13:42               ` Adrian Hunter
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-08 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Wed, Jul 08, 2015 at 12:52:40AM +0200, Peter Zijlstra escreveu:
> On Tue, Jul 07, 2015 at 01:13:59PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jul 07, 2015 at 05:36:14PM +0200, Peter Zijlstra escreveu:
> > > > To help userspace in places where all it has is the union perf_event, we
> > > > can reuse one bit in misc to state that, i.e.

> > > >   #define PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID 14
 
> > > > For instance.

> > > The other option would be a separate RECORD type, which might be
> > > simpler.

> > Humm, do we really need it?

> > I think this is just us wanting to, since we are going to add a new
> > record, to make it more useful for other, not right now needed,
> > situations, i.e. if the user is priviledged, there are two other options
> > to get his info, right?
 
> I was just thinking that 2 records, each with a fixed layout would be
> easier to parse than 1 record with variable layout.
 
> The record space is immense, so from that point it really doesn't
> matter.

We could do a land grab at some point there, if/when we find some reason
for that... :-)
 
> Do whatever is easiest, less mistakes get made etc. :-)
 
> No real preference either way, as long we we've thought about it.

Right, I just don't want to have two u32 carrying -1 for no reason.

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-08 13:28             ` Arnaldo Carvalho de Melo
@ 2015-07-08 13:42               ` Adrian Hunter
  2015-07-08 14:56                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2015-07-08 13:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

On 08/07/15 16:28, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jul 08, 2015 at 12:52:40AM +0200, Peter Zijlstra escreveu:
>> On Tue, Jul 07, 2015 at 01:13:59PM -0300, Arnaldo Carvalho de Melo wrote:
>>> Em Tue, Jul 07, 2015 at 05:36:14PM +0200, Peter Zijlstra escreveu:
>>>>> To help userspace in places where all it has is the union perf_event, we
>>>>> can reuse one bit in misc to state that, i.e.
> 
>>>>>   #define PERF_RECORD_MISC_SWITCH_NEXT_PREV_PID 14
>  
>>>>> For instance.
> 
>>>> The other option would be a separate RECORD type, which might be
>>>> simpler.
> 
>>> Humm, do we really need it?
> 
>>> I think this is just us wanting to, since we are going to add a new
>>> record, to make it more useful for other, not right now needed,
>>> situations, i.e. if the user is priviledged, there are two other options
>>> to get his info, right?
>  
>> I was just thinking that 2 records, each with a fixed layout would be
>> easier to parse than 1 record with variable layout.
>  
>> The record space is immense, so from that point it really doesn't
>> matter.
> 
> We could do a land grab at some point there, if/when we find some reason
> for that... :-)
>  
>> Do whatever is easiest, less mistakes get made etc. :-)
>  
>> No real preference either way, as long we we've thought about it.
> 
> Right, I just don't want to have two u32 carrying -1 for no reason.

So you'd be OK with 2 RECORD types?

I will see what is involved.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH V3 1/4] perf: Add PERF_RECORD_SWITCH to indicate context switches
  2015-07-08 13:42               ` Adrian Hunter
@ 2015-07-08 14:56                 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-08 14:56 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Andi Kleen, Ingo Molnar, linux-kernel, Jiri Olsa,
	Stephane Eranian, mathieu.poirier, Pawel Moll

Em Wed, Jul 08, 2015 at 04:42:03PM +0300, Adrian Hunter escreveu:
> On 08/07/15 16:28, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Jul 08, 2015 at 12:52:40AM +0200, Peter Zijlstra escreveu:
> >> On Tue, Jul 07, 2015 at 01:13:59PM -0300, Arnaldo Carvalho de Melo wrote:
> >>> Em Tue, Jul 07, 2015 at 05:36:14PM +0200, Peter Zijlstra escreveu:
> >> I was just thinking that 2 records, each with a fixed layout would be
> >> easier to parse than 1 record with variable layout.

> >> The record space is immense, so from that point it really doesn't
> >> matter.

> > We could do a land grab at some point there, if/when we find some reason
> > for that... :-)

> >> Do whatever is easiest, less mistakes get made etc. :-)

> >> No real preference either way, as long we we've thought about it.

> > Right, I just don't want to have two u32 carrying -1 for no reason.

> So you'd be OK with 2 RECORD types?

Yeah, no problem.
 
> I will see what is involved.

Ok.

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-07-08 14:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-07  8:36 [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Adrian Hunter
2015-07-07  8:36 ` [PATCH V3 1/4] " Adrian Hunter
2015-07-07 13:25   ` Arnaldo Carvalho de Melo
2015-07-07 13:44     ` Arnaldo Carvalho de Melo
2015-07-07 15:36       ` Peter Zijlstra
2015-07-07 16:13         ` Arnaldo Carvalho de Melo
2015-07-07 22:52           ` Peter Zijlstra
2015-07-08 13:28             ` Arnaldo Carvalho de Melo
2015-07-08 13:42               ` Adrian Hunter
2015-07-08 14:56                 ` Arnaldo Carvalho de Melo
2015-07-07  8:36 ` [PATCH V3 2/4] perf tools: Add new PERF_RECORD_SWITCH event Adrian Hunter
2015-07-07 13:50   ` Arnaldo Carvalho de Melo
2015-07-07  8:36 ` [PATCH V3 3/4] perf record: Add option --switch-events to select PERF_RECORD_SWITCH events Adrian Hunter
2015-07-07 13:53   ` Arnaldo Carvalho de Melo
2015-07-07  8:36 ` [PATCH V3 4/4] perf script: Add option --show-switch-events Adrian Hunter
2015-07-07 13:54   ` Arnaldo Carvalho de Melo
2015-07-07  9:06 ` [PATCH V3 0/4] perf: Add PERF_RECORD_SWITCH to indicate context switches Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).