All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/2] Yet another take at user/kernel time correlation problem
@ 2014-09-12 11:48 Pawel Moll
  2014-09-12 11:48 ` [RFC 1/2] perf: Add sampling of the raw monotonic clock Pawel Moll
  2014-09-12 11:48   ` Pawel Moll
  0 siblings, 2 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 11:48 UTC (permalink / raw)
  To: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz
  Cc: linux-kernel, linux-api, Pawel Moll

Greetings,

Here comes yet another take at the problem of correlating perf
samples, timestamped in kernel (with - de-facto - sched_clock values),
with performance-related events (be it debug information from JIT
engines or energy sensor data obtained via USB or hwmon) generated
in user space.

The first patch adds an additional timestamp field in the perf
sample data, which can be requested for any perf event along
with normal PERF_SAMPLE_TIME. Events with both values appearing
periodically in the perf data allow user code to translate
raw monotonic time (obtained via POSIX clock API) to sched_clock
domain. Although any perf event can be used, the natural choice
would be a sched_switch trace event (for processes with root
permissions) or a hrtimer-based PERF_COUNT_SW_CPU_CLOCK.

One question I haven't found answer to is: could it be even more
generic? As in: would it be possible to request a time value from
any of the available time sources? It doesn't make sense, I
believe, to have a PERF_SAMPLE_* for each possibility. An extra
flags in perf_event_attrs maybe? (we still have 39 spare bits there)

The second patch, functionally orthogonal but complementing
the first one, replicates the "trace_maker" idea from ftrace
in the perf world. Instead of a sysfs file, there is an ioctl
command, which simply injects a new type of software event into
the buffer. The argument can point at a zero-terminated string
of PAGE_SIZE max lenght. If provided, it will be copied to
the "raw" part of a sample. Of course the event can sample
a monotonic clock as well, if used with the above, so one
gets means of both synchronisation and time stamp approximation.

One doubt I have here is the ioctl argument. It takes
a strong now, like trace_marker does. But maybe a simple
integer value would suffice? After all the ioctl can be
only generated by the "owner" of the perf stream (unlike
in trace_marker case, where "anyone" can write to it), so
we could rely on him to have a dictionary of events of
some sort.

On another note, I proposed this subject for the tracing
microconference on Plumbers next month. Hope to have some good
discussion there. Maybe even a conclusion? (I wish... ;-)

Thanks!

Pawel


Pawel Moll (2):
  perf: Add sampling of the raw monotonic clock
  perf: Marker software event and ioctl

 include/linux/perf_event.h      |  2 ++
 include/uapi/linux/perf_event.h |  6 ++++-
 kernel/events/core.c            | 55 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 1 deletion(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC 1/2] perf: Add sampling of the raw monotonic clock
  2014-09-12 11:48 [RFC 0/2] Yet another take at user/kernel time correlation problem Pawel Moll
@ 2014-09-12 11:48 ` Pawel Moll
  2014-09-12 11:48   ` Pawel Moll
  1 sibling, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 11:48 UTC (permalink / raw)
  To: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz
  Cc: linux-kernel, linux-api, Pawel Moll

This patch adds an option to sample raw monotonic clock
value with any perf event, with the the aim of allowing
time correlation between data coming from perf and
additional performance-related information generated in
userspace.

In order to correlate timestamps in perf data stream
with events happening in userspace (be it JITed debug
symbols or hwmon-originating environment data), user
requests a more or less periodic event (sched_switch
trace event of a hrtimer-based cpu-clock being the
most obvious examples) with PERF_SAMPLE_TIME *and*
PERF_SAMPLE_CLOCK_RAW_MONOTONIC and stamps
user-originating data with values obtained from
clock_gettime(CLOCK_MONOTONIC_RAW). Then, during
analysis, one looks at the perf events immediately
preceding and following (in terms of the
clock_raw_monotonic sample) the userspace event and
does simple linear approximation to get the equivalent
perf time.

        perf event     user event
       -----O--------------+-------------O------> t_mono
            :              |             :
            :              V             :
       -----O----------------------------O------> t_perf

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
---
 include/linux/perf_event.h      |  2 ++
 include/uapi/linux/perf_event.h |  4 +++-
 kernel/events/core.c            | 12 ++++++++++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 707617a..28b73b2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -602,6 +602,8 @@ struct perf_sample_data {
 	 * Transaction flags for abort events:
 	 */
 	u64				txn;
+	/* Raw monotonic timestamp, for userspace time correlation */
+	u64				clock_raw_monotonic;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9269de2..e5a75c5 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -137,8 +137,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_DATA_SRC			= 1U << 15,
 	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
+	PERF_SAMPLE_CLOCK_RAW_MONOTONIC		= 1U << 18,
 
-	PERF_SAMPLE_MAX = 1U << 18,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 19,		/* non-ABI */
 };
 
 /*
@@ -686,6 +687,7 @@ enum perf_event_type {
 	 *	{ u64			weight;   } && PERF_SAMPLE_WEIGHT
 	 *	{ u64			data_src; } && PERF_SAMPLE_DATA_SRC
 	 *	{ u64			transaction; } && PERF_SAMPLE_TRANSACTION
+	 *	{ u64			clock_raw_monotonic; } && PERF_SAMPLE_CLOCK_RAW_MONOTONIC
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f9c1ed0..df093e3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1216,6 +1216,9 @@ static void perf_event__header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_TRANSACTION)
 		size += sizeof(data->txn);
 
+	if (sample_type & PERF_SAMPLE_CLOCK_RAW_MONOTONIC)
+		size += sizeof(data->clock_raw_monotonic);
+
 	event->header_size = size;
 }
 
@@ -4456,6 +4459,12 @@ static void __perf_event_header__init_id(struct perf_event_header *header,
 		data->cpu_entry.cpu	 = raw_smp_processor_id();
 		data->cpu_entry.reserved = 0;
 	}
+
+	if (sample_type & PERF_SAMPLE_CLOCK_RAW_MONOTONIC) {
+		struct timespec now;
+		getrawmonotonic(&now);
+		data->clock_raw_monotonic = timespec_to_ns(&now);
+	}
 }
 
 void perf_event_header__init_id(struct perf_event_header *header,
@@ -4714,6 +4723,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_TRANSACTION)
 		perf_output_put(handle, data->txn);
 
+	if (sample_type & PERF_SAMPLE_CLOCK_RAW_MONOTONIC)
+		perf_output_put(handle, data->clock_raw_monotonic);
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 11:48   ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 11:48 UTC (permalink / raw)
  To: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz
  Cc: linux-kernel, linux-api, Pawel Moll

This patch adds a PERF_COUNT_SW_MARKER event type, which
can be requested by user and a PERF_EVENT_IOC_MARKER
ioctl command which will inject an event of said type into
the perf buffer. The ioctl can take a zero-terminated
string argument, similar to tracing_marker in ftrace,
which will be kept in the "raw" field of the sample.

The main use case for this is synchronisation of
performance data generated in user space with the perf
stream coming from the kernel. For example, the marker
can be inserted by a JIT engine after it generated
portion of the code, but before the code is executed
for the first time, allowing the post-processor to
pick the correct debugging information. Other example
is a system profiling tool taking data from other
sources than just perf, which generates a marker
at the beginning at at the end of the session
(also possibly periodically during the session) to
synchronise kernel timestamps with clock values
obtained in userspace (gtod or raw_monotonic).

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
---
 include/uapi/linux/perf_event.h |  2 ++
 kernel/events/core.c            | 43 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e5a75c5..83b0f5b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -110,6 +110,7 @@ enum perf_sw_ids {
 	PERF_COUNT_SW_ALIGNMENT_FAULTS		= 7,
 	PERF_COUNT_SW_EMULATION_FAULTS		= 8,
 	PERF_COUNT_SW_DUMMY			= 9,
+	PERF_COUNT_SW_MARKER			= 10,
 
 	PERF_COUNT_SW_MAX,			/* non-ABI */
 };
@@ -350,6 +351,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_SET_OUTPUT	_IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
 #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
+#define PERF_EVENT_IOC_MARKER		_IOR('$', 8, char *)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index df093e3..dbce284 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3655,6 +3655,7 @@ static inline int perf_fget_light(int fd, struct fd *p)
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
+static int perf_sw_event_marker(struct perf_event *event, char __user *arg);
 
 static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
@@ -3709,6 +3710,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case PERF_EVENT_IOC_SET_FILTER:
 		return perf_event_set_filter(event, (void __user *)arg);
 
+	case PERF_EVENT_IOC_MARKER:
+		return perf_sw_event_marker(event, (char __user *)arg);
+
 	default:
 		return -ENOTTY;
 	}
@@ -3728,6 +3732,7 @@ static long perf_compat_ioctl(struct file *file, unsigned int cmd,
 	switch (_IOC_NR(cmd)) {
 	case _IOC_NR(PERF_EVENT_IOC_SET_FILTER):
 	case _IOC_NR(PERF_EVENT_IOC_ID):
+	case _IOC_NR(PERF_EVENT_IOC_MARKER):
 		/* Fix up pointer size (usually 4 -> 8 in 32-on-64-bit case */
 		if (_IOC_SIZE(cmd) == sizeof(compat_uptr_t)) {
 			cmd &= ~IOCSIZE_MASK;
@@ -5960,6 +5965,44 @@ static struct pmu perf_swevent = {
 	.event_idx	= perf_swevent_event_idx,
 };
 
+static int perf_sw_event_marker(struct perf_event *event, char __user *arg)
+{
+	struct perf_sample_data data;
+	struct pt_regs *regs = current_pt_regs();
+	struct perf_raw_record raw = { 0, };
+
+	if (!static_key_false(&perf_swevent_enabled[PERF_COUNT_SW_MARKER]))
+		return 0;
+
+	perf_sample_data_init(&data, 0, 0);
+
+	if (arg) {
+		long len = strnlen_user(arg, PAGE_SIZE);
+
+		if (len) {
+			raw.size = ALIGN(len + sizeof(u32), sizeof(u64))
+					- sizeof(u32);
+			raw.data = kzalloc(raw.size, GFP_KERNEL);
+			if (!raw.data)
+				return -ENOMEM;
+
+			if (copy_from_user(raw.data, arg, len)) {
+				kfree(raw.data);
+				return -EFAULT;
+			}
+
+			data.raw = &raw;
+		}
+	}
+
+	perf_event_output(event, &data, regs);
+
+	if (raw.size)
+		kfree(raw.data);
+
+	return 0;
+}
+
 #ifdef CONFIG_EVENT_TRACING
 
 static int perf_tp_filter_match(struct perf_event *event,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 11:48   ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 11:48 UTC (permalink / raw)
  To: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Pawel Moll

This patch adds a PERF_COUNT_SW_MARKER event type, which
can be requested by user and a PERF_EVENT_IOC_MARKER
ioctl command which will inject an event of said type into
the perf buffer. The ioctl can take a zero-terminated
string argument, similar to tracing_marker in ftrace,
which will be kept in the "raw" field of the sample.

The main use case for this is synchronisation of
performance data generated in user space with the perf
stream coming from the kernel. For example, the marker
can be inserted by a JIT engine after it generated
portion of the code, but before the code is executed
for the first time, allowing the post-processor to
pick the correct debugging information. Other example
is a system profiling tool taking data from other
sources than just perf, which generates a marker
at the beginning at at the end of the session
(also possibly periodically during the session) to
synchronise kernel timestamps with clock values
obtained in userspace (gtod or raw_monotonic).

Signed-off-by: Pawel Moll <pawel.moll-5wv7dgnIgG8@public.gmane.org>
---
 include/uapi/linux/perf_event.h |  2 ++
 kernel/events/core.c            | 43 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e5a75c5..83b0f5b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -110,6 +110,7 @@ enum perf_sw_ids {
 	PERF_COUNT_SW_ALIGNMENT_FAULTS		= 7,
 	PERF_COUNT_SW_EMULATION_FAULTS		= 8,
 	PERF_COUNT_SW_DUMMY			= 9,
+	PERF_COUNT_SW_MARKER			= 10,
 
 	PERF_COUNT_SW_MAX,			/* non-ABI */
 };
@@ -350,6 +351,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_SET_OUTPUT	_IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
 #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
+#define PERF_EVENT_IOC_MARKER		_IOR('$', 8, char *)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index df093e3..dbce284 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3655,6 +3655,7 @@ static inline int perf_fget_light(int fd, struct fd *p)
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
+static int perf_sw_event_marker(struct perf_event *event, char __user *arg);
 
 static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
@@ -3709,6 +3710,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case PERF_EVENT_IOC_SET_FILTER:
 		return perf_event_set_filter(event, (void __user *)arg);
 
+	case PERF_EVENT_IOC_MARKER:
+		return perf_sw_event_marker(event, (char __user *)arg);
+
 	default:
 		return -ENOTTY;
 	}
@@ -3728,6 +3732,7 @@ static long perf_compat_ioctl(struct file *file, unsigned int cmd,
 	switch (_IOC_NR(cmd)) {
 	case _IOC_NR(PERF_EVENT_IOC_SET_FILTER):
 	case _IOC_NR(PERF_EVENT_IOC_ID):
+	case _IOC_NR(PERF_EVENT_IOC_MARKER):
 		/* Fix up pointer size (usually 4 -> 8 in 32-on-64-bit case */
 		if (_IOC_SIZE(cmd) == sizeof(compat_uptr_t)) {
 			cmd &= ~IOCSIZE_MASK;
@@ -5960,6 +5965,44 @@ static struct pmu perf_swevent = {
 	.event_idx	= perf_swevent_event_idx,
 };
 
+static int perf_sw_event_marker(struct perf_event *event, char __user *arg)
+{
+	struct perf_sample_data data;
+	struct pt_regs *regs = current_pt_regs();
+	struct perf_raw_record raw = { 0, };
+
+	if (!static_key_false(&perf_swevent_enabled[PERF_COUNT_SW_MARKER]))
+		return 0;
+
+	perf_sample_data_init(&data, 0, 0);
+
+	if (arg) {
+		long len = strnlen_user(arg, PAGE_SIZE);
+
+		if (len) {
+			raw.size = ALIGN(len + sizeof(u32), sizeof(u64))
+					- sizeof(u32);
+			raw.data = kzalloc(raw.size, GFP_KERNEL);
+			if (!raw.data)
+				return -ENOMEM;
+
+			if (copy_from_user(raw.data, arg, len)) {
+				kfree(raw.data);
+				return -EFAULT;
+			}
+
+			data.raw = &raw;
+		}
+	}
+
+	perf_event_output(event, &data, regs);
+
+	if (raw.size)
+		kfree(raw.data);
+
+	return 0;
+}
+
 #ifdef CONFIG_EVENT_TRACING
 
 static int perf_tp_filter_match(struct perf_event *event,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 12:43     ` Christopher Covington
  0 siblings, 0 replies; 35+ messages in thread
From: Christopher Covington @ 2014-09-12 12:43 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz,
	linux-kernel, linux-api

Hi Pawel,

On 09/12/2014 07:48 AM, Pawel Moll wrote:
> This patch adds a PERF_COUNT_SW_MARKER event type, which
> can be requested by user and a PERF_EVENT_IOC_MARKER
> ioctl command which will inject an event of said type into
> the perf buffer. The ioctl can take a zero-terminated
> string argument, similar to tracing_marker in ftrace,
> which will be kept in the "raw" field of the sample.
> 
> The main use case for this is synchronisation of
> performance data generated in user space with the perf
> stream coming from the kernel. For example, the marker
> can be inserted by a JIT engine after it generated
> portion of the code, but before the code is executed
> for the first time, allowing the post-processor to
> pick the correct debugging information. Other example
> is a system profiling tool taking data from other
> sources than just perf, which generates a marker
> at the beginning at at the end of the session
> (also possibly periodically during the session) to
> synchronise kernel timestamps with clock values
> obtained in userspace (gtod or raw_monotonic).

> @@ -5960,6 +5965,44 @@ static struct pmu perf_swevent = {
>  	.event_idx	= perf_swevent_event_idx,
>  };
>  
> +static int perf_sw_event_marker(struct perf_event *event, char __user *arg)
> +{
> +	struct perf_sample_data data;
> +	struct pt_regs *regs = current_pt_regs();
> +	struct perf_raw_record raw = { 0, };
> +
> +	if (!static_key_false(&perf_swevent_enabled[PERF_COUNT_SW_MARKER]))
> +		return 0;
> +
> +	perf_sample_data_init(&data, 0, 0);
> +
> +	if (arg) {
> +		long len = strnlen_user(arg, PAGE_SIZE);

Just to ask the dumb questions in case the answers I've come up with are
wrong: What is PAGE_SIZE on an arm64 kernel? How does userspace know?

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 12:43     ` Christopher Covington
  0 siblings, 0 replies; 35+ messages in thread
From: Christopher Covington @ 2014-09-12 12:43 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Hi Pawel,

On 09/12/2014 07:48 AM, Pawel Moll wrote:
> This patch adds a PERF_COUNT_SW_MARKER event type, which
> can be requested by user and a PERF_EVENT_IOC_MARKER
> ioctl command which will inject an event of said type into
> the perf buffer. The ioctl can take a zero-terminated
> string argument, similar to tracing_marker in ftrace,
> which will be kept in the "raw" field of the sample.
> 
> The main use case for this is synchronisation of
> performance data generated in user space with the perf
> stream coming from the kernel. For example, the marker
> can be inserted by a JIT engine after it generated
> portion of the code, but before the code is executed
> for the first time, allowing the post-processor to
> pick the correct debugging information. Other example
> is a system profiling tool taking data from other
> sources than just perf, which generates a marker
> at the beginning at at the end of the session
> (also possibly periodically during the session) to
> synchronise kernel timestamps with clock values
> obtained in userspace (gtod or raw_monotonic).

> @@ -5960,6 +5965,44 @@ static struct pmu perf_swevent = {
>  	.event_idx	= perf_swevent_event_idx,
>  };
>  
> +static int perf_sw_event_marker(struct perf_event *event, char __user *arg)
> +{
> +	struct perf_sample_data data;
> +	struct pt_regs *regs = current_pt_regs();
> +	struct perf_raw_record raw = { 0, };
> +
> +	if (!static_key_false(&perf_swevent_enabled[PERF_COUNT_SW_MARKER]))
> +		return 0;
> +
> +	perf_sample_data_init(&data, 0, 0);
> +
> +	if (arg) {
> +		long len = strnlen_user(arg, PAGE_SIZE);

Just to ask the dumb questions in case the answers I've come up with are
wrong: What is PAGE_SIZE on an arm64 kernel? How does userspace know?

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 12:57       ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 12:57 UTC (permalink / raw)
  To: Christopher Covington
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz,
	linux-kernel, linux-api

On Fri, 2014-09-12 at 13:43 +0100, Christopher Covington wrote:
> Just to ask the dumb questions in case the answers I've come up with are
> wrong: What is PAGE_SIZE on an arm64 kernel? 

It's either 4 or 64k, depending on CONFIG_ARM64_64K_PAGES.

> How does userspace know?
> 
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	printf("%ld\n", sysconf(_SC_PAGESIZE));
	return 0;
}

Now a word of explanation. The PAGE_SIZE limitation was shamelessly
stolen from perf_event_set_filter() (so PERF_EVENT_IOC_SET_FILTER) as an
attempt to address a problem of passing a zero-terminated string from
userspace. Simply speaking - there must be some limitation, and a page
size seem as good as any other. I have strong doubts about this myself,
so all alternative ideas are more than welcome.

As I mentioned in the cover letter, maybe this simply shouldn't be a
string? I made it like this to mimic trace_marker, but maybe an integer
value + some kind of a dictionary in userspace is a better approach? I
belive that ftrace's maker is taking a string, because it's: 1. natural
for its interface and 2. anyone (sort of) can write to it, so it's hard
to assume anything. In this case the user "owns" the perf data, so he
could handle int<->whatever-else relation table...

Pawel
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 12:57       ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 12:57 UTC (permalink / raw)
  To: Christopher Covington
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Fri, 2014-09-12 at 13:43 +0100, Christopher Covington wrote:
> Just to ask the dumb questions in case the answers I've come up with are
> wrong: What is PAGE_SIZE on an arm64 kernel? 

It's either 4 or 64k, depending on CONFIG_ARM64_64K_PAGES.

> How does userspace know?
> 
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	printf("%ld\n", sysconf(_SC_PAGESIZE));
	return 0;
}

Now a word of explanation. The PAGE_SIZE limitation was shamelessly
stolen from perf_event_set_filter() (so PERF_EVENT_IOC_SET_FILTER) as an
attempt to address a problem of passing a zero-terminated string from
userspace. Simply speaking - there must be some limitation, and a page
size seem as good as any other. I have strong doubts about this myself,
so all alternative ideas are more than welcome.

As I mentioned in the cover letter, maybe this simply shouldn't be a
string? I made it like this to mimic trace_marker, but maybe an integer
value + some kind of a dictionary in userspace is a better approach? I
belive that ftrace's maker is taking a string, because it's: 1. natural
for its interface and 2. anyone (sort of) can write to it, so it's hard
to assume anything. In this case the user "owns" the perf data, so he
could handle int<->whatever-else relation table...

Pawel
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
  2014-09-12 12:57       ` Pawel Moll
@ 2014-09-12 13:49         ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-12 13:49 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel, linux-api

Em Fri, Sep 12, 2014 at 01:57:52PM +0100, Pawel Moll escreveu:
> On Fri, 2014-09-12 at 13:43 +0100, Christopher Covington wrote:
> > Just to ask the dumb questions in case the answers I've come up with are
> > wrong: What is PAGE_SIZE on an arm64 kernel? 
 
> It's either 4 or 64k, depending on CONFIG_ARM64_64K_PAGES.
 
> > How does userspace know?

> #include <unistd.h>
> #include <stdio.h>
 
> int main(void)
> {
> 	printf("%ld\n", sysconf(_SC_PAGESIZE));
> 	return 0;
> }
 
> Now a word of explanation. The PAGE_SIZE limitation was shamelessly
> stolen from perf_event_set_filter() (so PERF_EVENT_IOC_SET_FILTER) as an
> attempt to address a problem of passing a zero-terminated string from
> userspace. Simply speaking - there must be some limitation, and a page
> size seem as good as any other. I have strong doubts about this myself,
> so all alternative ideas are more than welcome.
 
> As I mentioned in the cover letter, maybe this simply shouldn't be a
> string? I made it like this to mimic trace_marker, but maybe an integer
> value + some kind of a dictionary in userspace is a better approach? I
> belive that ftrace's maker is taking a string, because it's: 1. natural
> for its interface and 2. anyone (sort of) can write to it, so it's hard
> to assume anything. In this case the user "owns" the perf data, so he
> could handle int<->whatever-else relation table...

Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
then there is a string right after it?

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 13:49         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-12 13:49 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Em Fri, Sep 12, 2014 at 01:57:52PM +0100, Pawel Moll escreveu:
> On Fri, 2014-09-12 at 13:43 +0100, Christopher Covington wrote:
> > Just to ask the dumb questions in case the answers I've come up with are
> > wrong: What is PAGE_SIZE on an arm64 kernel? 
 
> It's either 4 or 64k, depending on CONFIG_ARM64_64K_PAGES.
 
> > How does userspace know?

> #include <unistd.h>
> #include <stdio.h>
 
> int main(void)
> {
> 	printf("%ld\n", sysconf(_SC_PAGESIZE));
> 	return 0;
> }
 
> Now a word of explanation. The PAGE_SIZE limitation was shamelessly
> stolen from perf_event_set_filter() (so PERF_EVENT_IOC_SET_FILTER) as an
> attempt to address a problem of passing a zero-terminated string from
> userspace. Simply speaking - there must be some limitation, and a page
> size seem as good as any other. I have strong doubts about this myself,
> so all alternative ideas are more than welcome.
 
> As I mentioned in the cover letter, maybe this simply shouldn't be a
> string? I made it like this to mimic trace_marker, but maybe an integer
> value + some kind of a dictionary in userspace is a better approach? I
> belive that ftrace's maker is taking a string, because it's: 1. natural
> for its interface and 2. anyone (sort of) can write to it, so it's hard
> to assume anything. In this case the user "owns" the perf data, so he
> could handle int<->whatever-else relation table...

Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
then there is a string right after it?

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 13:58           ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 13:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel, linux-api

On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> then there is a string right after it?

How would this look like in userspace? Something like this?

8<----
struct perf_event_marker {
	uint64_t value;
	char *string;
} arg;

arg.value = 0x1234;

/* or */

arg.value = 0;
arg.string = "abcd";

ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
8<----

If so, maybe it would simpler just to go for classic size/data
structure?

8<-----
struct perf_event_marker {
	uint32_t size;
	void *data;
}
8<-----

This would directly map into struct perf_raw_record...

Paweł


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 13:58           ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-12 13:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> then there is a string right after it?

How would this look like in userspace? Something like this?

8<----
struct perf_event_marker {
	uint64_t value;
	char *string;
} arg;

arg.value = 0x1234;

/* or */

arg.value = 0;
arg.string = "abcd";

ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
8<----

If so, maybe it would simpler just to go for classic size/data
structure?

8<-----
struct perf_event_marker {
	uint32_t size;
	void *data;
}
8<-----

This would directly map into struct perf_raw_record...

Paweł

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
  2014-09-12 12:57       ` Pawel Moll
@ 2014-09-12 14:00         ` Christopher Covington
  -1 siblings, 0 replies; 35+ messages in thread
From: Christopher Covington @ 2014-09-12 14:00 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz,
	linux-kernel, linux-api

On 09/12/2014 08:57 AM, Pawel Moll wrote:
> On Fri, 2014-09-12 at 13:43 +0100, Christopher Covington wrote:
>> Just to ask the dumb questions in case the answers I've come up with are
>> wrong: What is PAGE_SIZE on an arm64 kernel? 
> 
> It's either 4 or 64k, depending on CONFIG_ARM64_64K_PAGES.
> 
>> How does userspace know?
>>
> #include <unistd.h>
> #include <stdio.h>
> 
> int main(void)
> {
> 	printf("%ld\n", sysconf(_SC_PAGESIZE));
> 	return 0;
> }

Oh excellent, that actually works. Based on a misreading of the glibc code I
thought it was hard-coded to 64K.

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 14:00         ` Christopher Covington
  0 siblings, 0 replies; 35+ messages in thread
From: Christopher Covington @ 2014-09-12 14:00 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Richard Cochran, Steven Rostedt, Ingo Molnar, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On 09/12/2014 08:57 AM, Pawel Moll wrote:
> On Fri, 2014-09-12 at 13:43 +0100, Christopher Covington wrote:
>> Just to ask the dumb questions in case the answers I've come up with are
>> wrong: What is PAGE_SIZE on an arm64 kernel? 
> 
> It's either 4 or 64k, depending on CONFIG_ARM64_64K_PAGES.
> 
>> How does userspace know?
>>
> #include <unistd.h>
> #include <stdio.h>
> 
> int main(void)
> {
> 	printf("%ld\n", sysconf(_SC_PAGESIZE));
> 	return 0;
> }

Oh excellent, that actually works. Based on a misreading of the glibc code I
thought it was hard-coded to 64K.

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
  2014-09-12 13:58           ` Pawel Moll
@ 2014-09-12 16:19             ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-12 16:19 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel, linux-api

Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > then there is a string right after it?
 
> How would this look like in userspace? Something like this?
 
> 8<----
> struct perf_event_marker {
> 	uint64_t value;
> 	char *string;
> } arg;
 
> arg.value = 0x1234;
 
> /* or */
 
> arg.value = 0;
> arg.string = "abcd";
 
> ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> 8<----
 
> If so, maybe it would simpler just to go for classic size/data
> structure?
 
> 8<-----
> struct perf_event_marker {
> 	uint32_t size;
> 	void *data;
> }
> 8<-----
 
> This would directly map into struct perf_raw_record...

I can see the usefulness of having it all, i.e. if we do just:

perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`

Then 'perf trace' doesn't know about any binary format a tool may have,
getting strings there (hey, LD_PRELOADing some logging library to hook
into this comes to mind) and having it merged with other events
(syscalls, pagefaults, etc) looks useful.

As well as some specialized version of 'perf trace' that knows about
some binary protocol that would get app specific stats or lock status,
etc, perhaps even plugins for 'perf trace' that would be selected by
that first u64? Also seems useful.

I.e. having a way to provide just strings and another that would allow
passing perf_raw_record.

- Arnaldo

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 16:19             ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-12 16:19 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > then there is a string right after it?
 
> How would this look like in userspace? Something like this?
 
> 8<----
> struct perf_event_marker {
> 	uint64_t value;
> 	char *string;
> } arg;
 
> arg.value = 0x1234;
 
> /* or */
 
> arg.value = 0;
> arg.string = "abcd";
 
> ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> 8<----
 
> If so, maybe it would simpler just to go for classic size/data
> structure?
 
> 8<-----
> struct perf_event_marker {
> 	uint32_t size;
> 	void *data;
> }
> 8<-----
 
> This would directly map into struct perf_raw_record...

I can see the usefulness of having it all, i.e. if we do just:

perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`

Then 'perf trace' doesn't know about any binary format a tool may have,
getting strings there (hey, LD_PRELOADing some logging library to hook
into this comes to mind) and having it merged with other events
(syscalls, pagefaults, etc) looks useful.

As well as some specialized version of 'perf trace' that knows about
some binary protocol that would get app specific stats or lock status,
etc, perhaps even plugins for 'perf trace' that would be selected by
that first u64? Also seems useful.

I.e. having a way to provide just strings and another that would allow
passing perf_raw_record.

- Arnaldo

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 17:37     ` David Ahern
  0 siblings, 0 replies; 35+ messages in thread
From: David Ahern @ 2014-09-12 17:37 UTC (permalink / raw)
  To: Pawel Moll, Richard Cochran, Steven Rostedt, Ingo Molnar,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	John Stultz
  Cc: linux-kernel, linux-api

On 9/12/14, 4:48 AM, Pawel Moll wrote:
> This patch adds a PERF_COUNT_SW_MARKER event type, which
> can be requested by user and a PERF_EVENT_IOC_MARKER
> ioctl command which will inject an event of said type into
> the perf buffer. The ioctl can take a zero-terminated
> string argument, similar to tracing_marker in ftrace,
> which will be kept in the "raw" field of the sample.
>
> The main use case for this is synchronisation of
> performance data generated in user space with the perf
> stream coming from the kernel. For example, the marker
> can be inserted by a JIT engine after it generated
> portion of the code, but before the code is executed
> for the first time, allowing the post-processor to
> pick the correct debugging information. Other example
> is a system profiling tool taking data from other
> sources than just perf, which generates a marker
> at the beginning at at the end of the session
> (also possibly periodically during the session) to
> synchronise kernel timestamps with clock values
> obtained in userspace (gtod or raw_monotonic).

Seems really similar to what I proposed in the past:

https://lkml.org/lkml/2011/2/27/159

Which was rejected.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 17:37     ` David Ahern
  0 siblings, 0 replies; 35+ messages in thread
From: David Ahern @ 2014-09-12 17:37 UTC (permalink / raw)
  To: Pawel Moll, Richard Cochran, Steven Rostedt, Ingo Molnar,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	John Stultz
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA

On 9/12/14, 4:48 AM, Pawel Moll wrote:
> This patch adds a PERF_COUNT_SW_MARKER event type, which
> can be requested by user and a PERF_EVENT_IOC_MARKER
> ioctl command which will inject an event of said type into
> the perf buffer. The ioctl can take a zero-terminated
> string argument, similar to tracing_marker in ftrace,
> which will be kept in the "raw" field of the sample.
>
> The main use case for this is synchronisation of
> performance data generated in user space with the perf
> stream coming from the kernel. For example, the marker
> can be inserted by a JIT engine after it generated
> portion of the code, but before the code is executed
> for the first time, allowing the post-processor to
> pick the correct debugging information. Other example
> is a system profiling tool taking data from other
> sources than just perf, which generates a marker
> at the beginning at at the end of the session
> (also possibly periodically during the session) to
> synchronise kernel timestamps with clock values
> obtained in userspace (gtod or raw_monotonic).

Seems really similar to what I proposed in the past:

https://lkml.org/lkml/2011/2/27/159

Which was rejected.

David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 20:44       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-12 20:44 UTC (permalink / raw)
  To: David Ahern
  Cc: Pawel Moll, Richard Cochran, Steven Rostedt, Ingo Molnar,
	Peter Zijlstra, Paul Mackerras, John Stultz, linux-kernel,
	linux-api

Em Fri, Sep 12, 2014 at 10:37:39AM -0700, David Ahern escreveu:
> On 9/12/14, 4:48 AM, Pawel Moll wrote:
> >This patch adds a PERF_COUNT_SW_MARKER event type, which
> >can be requested by user and a PERF_EVENT_IOC_MARKER
> >ioctl command which will inject an event of said type into
> >the perf buffer. The ioctl can take a zero-terminated
> >string argument, similar to tracing_marker in ftrace,
> >which will be kept in the "raw" field of the sample.
> >
> >The main use case for this is synchronisation of
> >performance data generated in user space with the perf
> >stream coming from the kernel. For example, the marker
> >can be inserted by a JIT engine after it generated
> >portion of the code, but before the code is executed
> >for the first time, allowing the post-processor to
> >pick the correct debugging information. Other example
> >is a system profiling tool taking data from other
> >sources than just perf, which generates a marker
> >at the beginning at at the end of the session
> >(also possibly periodically during the session) to
> >synchronise kernel timestamps with clock values
> >obtained in userspace (gtod or raw_monotonic).
> 
> Seems really similar to what I proposed in the past:
> 
> https://lkml.org/lkml/2011/2/27/159
> 
> Which was rejected.

I took a look at that thread, but just barely, emphasis on that.

Injecting something from userspace, a la ftrace, seems to be something,
as tglx mentioned, "buried" in that patchset.

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-12 20:44       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-12 20:44 UTC (permalink / raw)
  To: David Ahern
  Cc: Pawel Moll, Richard Cochran, Steven Rostedt, Ingo Molnar,
	Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Em Fri, Sep 12, 2014 at 10:37:39AM -0700, David Ahern escreveu:
> On 9/12/14, 4:48 AM, Pawel Moll wrote:
> >This patch adds a PERF_COUNT_SW_MARKER event type, which
> >can be requested by user and a PERF_EVENT_IOC_MARKER
> >ioctl command which will inject an event of said type into
> >the perf buffer. The ioctl can take a zero-terminated
> >string argument, similar to tracing_marker in ftrace,
> >which will be kept in the "raw" field of the sample.
> >
> >The main use case for this is synchronisation of
> >performance data generated in user space with the perf
> >stream coming from the kernel. For example, the marker
> >can be inserted by a JIT engine after it generated
> >portion of the code, but before the code is executed
> >for the first time, allowing the post-processor to
> >pick the correct debugging information. Other example
> >is a system profiling tool taking data from other
> >sources than just perf, which generates a marker
> >at the beginning at at the end of the session
> >(also possibly periodically during the session) to
> >synchronise kernel timestamps with clock values
> >obtained in userspace (gtod or raw_monotonic).
> 
> Seems really similar to what I proposed in the past:
> 
> https://lkml.org/lkml/2011/2/27/159
> 
> Which was rejected.

I took a look at that thread, but just barely, emphasis on that.

Injecting something from userspace, a la ftrace, seems to be something,
as tglx mentioned, "buried" in that patchset.

- Arnaldo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
  2014-09-12 20:44       ` Arnaldo Carvalho de Melo
  (?)
@ 2014-09-14 15:43       ` David Ahern
  2014-09-15 17:18           ` Pawel Moll
  2014-09-16  7:44           ` Ingo Molnar
  -1 siblings, 2 replies; 35+ messages in thread
From: David Ahern @ 2014-09-14 15:43 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Pawel Moll, Richard Cochran, Steven Rostedt, Ingo Molnar,
	Peter Zijlstra, Paul Mackerras, John Stultz, linux-kernel,
	linux-api

On 9/12/14, 2:44 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 12, 2014 at 10:37:39AM -0700, David Ahern escreveu:
>> On 9/12/14, 4:48 AM, Pawel Moll wrote:
>>> This patch adds a PERF_COUNT_SW_MARKER event type, which
>>> can be requested by user and a PERF_EVENT_IOC_MARKER
>>> ioctl command which will inject an event of said type into
>>> the perf buffer. The ioctl can take a zero-terminated
>>> string argument, similar to tracing_marker in ftrace,
>>> which will be kept in the "raw" field of the sample.
>>>
>>> The main use case for this is synchronisation of
>>> performance data generated in user space with the perf
>>> stream coming from the kernel. For example, the marker
>>> can be inserted by a JIT engine after it generated
>>> portion of the code, but before the code is executed
>>> for the first time, allowing the post-processor to
>>> pick the correct debugging information. Other example
>>> is a system profiling tool taking data from other
>>> sources than just perf, which generates a marker
>>> at the beginning at at the end of the session
>>> (also possibly periodically during the session) to
>>> synchronise kernel timestamps with clock values
>>> obtained in userspace (gtod or raw_monotonic).
>>
>> Seems really similar to what I proposed in the past:
>>
>> https://lkml.org/lkml/2011/2/27/159
>>
>> Which was rejected.
>
> I took a look at that thread, but just barely, emphasis on that.
>
> Injecting something from userspace, a la ftrace, seems to be something,
> as tglx mentioned, "buried" in that patchset.

Thomas object to an ioctl buried deep in a patch -- newbie mistake.

Peter objected to the ioctl https://lkml.org/lkml/2011/3/1/229

It was not userspace injecting random data into the stream but rather 
forcing the sample to be generated and added to the stream.

David

David



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-15 17:18           ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-15 17:18 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel, linux-api

On Sun, 2014-09-14 at 16:43 +0100, David Ahern wrote:
> >> Seems really similar to what I proposed in the past:

Yeah, it wasn't really hard to come with similar conclusions :-)

> >> https://lkml.org/lkml/2011/2/27/159
> >>
> >> Which was rejected.
> >
> > I took a look at that thread, but just barely, emphasis on that.
> >
> > Injecting something from userspace, a la ftrace, seems to be something,
> > as tglx mentioned, "buried" in that patchset.
> 
> Thomas object to an ioctl buried deep in a patch -- newbie mistake.
> 
> Peter objected to the ioctl https://lkml.org/lkml/2011/3/1/229
> 
> It was not userspace injecting random data into the stream but rather 
> forcing the sample to be generated and added to the stream.

I would like to hear from Peter and others. If not here, I'll get them
to talk next month on Linux Plumbers :-)

Pawel


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-15 17:18           ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-15 17:18 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Sun, 2014-09-14 at 16:43 +0100, David Ahern wrote:
> >> Seems really similar to what I proposed in the past:

Yeah, it wasn't really hard to come with similar conclusions :-)

> >> https://lkml.org/lkml/2011/2/27/159
> >>
> >> Which was rejected.
> >
> > I took a look at that thread, but just barely, emphasis on that.
> >
> > Injecting something from userspace, a la ftrace, seems to be something,
> > as tglx mentioned, "buried" in that patchset.
> 
> Thomas object to an ioctl buried deep in a patch -- newbie mistake.
> 
> Peter objected to the ioctl https://lkml.org/lkml/2011/3/1/229
> 
> It was not userspace injecting random data into the stream but rather 
> forcing the sample to be generated and added to the stream.

I would like to hear from Peter and others. If not here, I'll get them
to talk next month on Linux Plumbers :-)

Pawel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-15 17:27               ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-15 17:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel, linux-api

On Fri, 2014-09-12 at 17:19 +0100, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> > On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > > then there is a string right after it?
>  
> > How would this look like in userspace? Something like this?
>  
> > 8<----
> > struct perf_event_marker {
> > 	uint64_t value;
> > 	char *string;
> > } arg;
>  
> > arg.value = 0x1234;
>  
> > /* or */
>  
> > arg.value = 0;
> > arg.string = "abcd";
>  
> > ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> > 8<----
>  
> > If so, maybe it would simpler just to go for classic size/data
> > structure?
>  
> > 8<-----
> > struct perf_event_marker {
> > 	uint32_t size;
> > 	void *data;
> > }
> > 8<-----
>  
> > This would directly map into struct perf_raw_record...
> 
> I can see the usefulness of having it all, i.e. if we do just:
> 
> perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`

Hm. I haven't thought about a situation when 3rd party wants to inject
something into "my" data stream... I guess it could be implemented (a
"pid" member of the struct perf_event_marker with default 0 meaning
"myself"?), but will definitely complicate the patch. Should I have a
look at it now or maybe leave it till we get a general agreement about
the marker ioctl existence?

> Then 'perf trace' doesn't know about any binary format a tool may have,
> getting strings there (hey, LD_PRELOADing some logging library to hook
> into this comes to mind) and having it merged with other events
> (syscalls, pagefaults, etc) looks useful.

But do you still mean a "magic" u64 before the rest? Injecting a string
would just mean:

	marker.size = strlen(s) + 1;
	marker.data = s;

> As well as some specialized version of 'perf trace' that knows about
> some binary protocol that would get app specific stats or lock status,
> etc, perhaps even plugins for 'perf trace' that would be selected by
> that first u64? Also seems useful.
> 	  
> I.e. having a way to provide just strings and another that would allow
> passing perf_raw_record.

Sounds interesting. But then maybe this stuff shouldn't go into "raw"
then? It could be something like this in the sample:

	{ u64 type; /* 0 means zero-terminated string in data */
	  u32 size;
	  char data[size]; } && PERF_SAMPLE_MARKER

Pawel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-15 17:27               ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-15 17:27 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Fri, 2014-09-12 at 17:19 +0100, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> > On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > > then there is a string right after it?
>  
> > How would this look like in userspace? Something like this?
>  
> > 8<----
> > struct perf_event_marker {
> > 	uint64_t value;
> > 	char *string;
> > } arg;
>  
> > arg.value = 0x1234;
>  
> > /* or */
>  
> > arg.value = 0;
> > arg.string = "abcd";
>  
> > ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> > 8<----
>  
> > If so, maybe it would simpler just to go for classic size/data
> > structure?
>  
> > 8<-----
> > struct perf_event_marker {
> > 	uint32_t size;
> > 	void *data;
> > }
> > 8<-----
>  
> > This would directly map into struct perf_raw_record...
> 
> I can see the usefulness of having it all, i.e. if we do just:
> 
> perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`

Hm. I haven't thought about a situation when 3rd party wants to inject
something into "my" data stream... I guess it could be implemented (a
"pid" member of the struct perf_event_marker with default 0 meaning
"myself"?), but will definitely complicate the patch. Should I have a
look at it now or maybe leave it till we get a general agreement about
the marker ioctl existence?

> Then 'perf trace' doesn't know about any binary format a tool may have,
> getting strings there (hey, LD_PRELOADing some logging library to hook
> into this comes to mind) and having it merged with other events
> (syscalls, pagefaults, etc) looks useful.

But do you still mean a "magic" u64 before the rest? Injecting a string
would just mean:

	marker.size = strlen(s) + 1;
	marker.data = s;

> As well as some specialized version of 'perf trace' that knows about
> some binary protocol that would get app specific stats or lock status,
> etc, perhaps even plugins for 'perf trace' that would be selected by
> that first u64? Also seems useful.
> 	  
> I.e. having a way to provide just strings and another that would allow
> passing perf_raw_record.

Sounds interesting. But then maybe this stuff shouldn't go into "raw"
then? It could be something like this in the sample:

	{ u64 type; /* 0 means zero-terminated string in data */
	  u32 size;
	  char data[size]; } && PERF_SAMPLE_MARKER

Pawel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
  2014-09-15 17:27               ` Pawel Moll
@ 2014-09-15 18:31                 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-15 18:31 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel, linux-api

Em Mon, Sep 15, 2014 at 06:27:14PM +0100, Pawel Moll escreveu:
> On Fri, 2014-09-12 at 17:19 +0100, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> > > On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > > > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > > > then there is a string right after it?
> >  
> > > How would this look like in userspace? Something like this?
> >  
> > > 8<----
> > > struct perf_event_marker {
> > > 	uint64_t value;
> > > 	char *string;
> > > } arg;
> >  
> > > arg.value = 0x1234;
> >  
> > > /* or */
> >  
> > > arg.value = 0;
> > > arg.string = "abcd";
> >  
> > > ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> > > 8<----
> >  
> > > If so, maybe it would simpler just to go for classic size/data
> > > structure?
> >  
> > > 8<-----
> > > struct perf_event_marker {
> > > 	uint32_t size;
> > > 	void *data;
> > > }
> > > 8<-----
> >  
> > > This would directly map into struct perf_raw_record...
> > 
> > I can see the usefulness of having it all, i.e. if we do just:
> > 
> > perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`
> 
> Hm. I haven't thought about a situation when 3rd party wants to inject
> something into "my" data stream... I guess it could be implemented (a

I was thinking about intercepting calls that pass some logging data, as
strings, and 'tee' them to the 'perf trace' 'data stream'.

> "pid" member of the struct perf_event_marker with default 0 meaning

Humm, Isn't PERF_SAMPLE_TID enough for that?

> "myself"?), but will definitely complicate the patch. Should I have a
> look at it now or maybe leave it till we get a general agreement about
> the marker ioctl existence?
> 
> > Then 'perf trace' doesn't know about any binary format a tool may have,
> > getting strings there (hey, LD_PRELOADing some logging library to hook
> > into this comes to mind) and having it merged with other events
> > (syscalls, pagefaults, etc) looks useful.
> 
> But do you still mean a "magic" u64 before the rest? Injecting a string
> would just mean:
> 
> 	marker.size = strlen(s) + 1;
> 	marker.data = s;
> 
> > As well as some specialized version of 'perf trace' that knows about
> > some binary protocol that would get app specific stats or lock status,
> > etc, perhaps even plugins for 'perf trace' that would be selected by
> > that first u64? Also seems useful.
> > 	  
> > I.e. having a way to provide just strings and another that would allow
> > passing perf_raw_record.
> 
> Sounds interesting. But then maybe this stuff shouldn't go into "raw"
> then? It could be something like this in the sample:
> 
> 	{ u64 type; /* 0 means zero-terminated string in data */
> 	  u32 size;
> 	  char data[size]; } && PERF_SAMPLE_MARKER

Yes, this is how I think it should be.
 
> Pawel
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-15 18:31                 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 35+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-15 18:31 UTC (permalink / raw)
  To: Pawel Moll
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Em Mon, Sep 15, 2014 at 06:27:14PM +0100, Pawel Moll escreveu:
> On Fri, 2014-09-12 at 17:19 +0100, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> > > On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > > > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > > > then there is a string right after it?
> >  
> > > How would this look like in userspace? Something like this?
> >  
> > > 8<----
> > > struct perf_event_marker {
> > > 	uint64_t value;
> > > 	char *string;
> > > } arg;
> >  
> > > arg.value = 0x1234;
> >  
> > > /* or */
> >  
> > > arg.value = 0;
> > > arg.string = "abcd";
> >  
> > > ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> > > 8<----
> >  
> > > If so, maybe it would simpler just to go for classic size/data
> > > structure?
> >  
> > > 8<-----
> > > struct perf_event_marker {
> > > 	uint32_t size;
> > > 	void *data;
> > > }
> > > 8<-----
> >  
> > > This would directly map into struct perf_raw_record...
> > 
> > I can see the usefulness of having it all, i.e. if we do just:
> > 
> > perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`
> 
> Hm. I haven't thought about a situation when 3rd party wants to inject
> something into "my" data stream... I guess it could be implemented (a

I was thinking about intercepting calls that pass some logging data, as
strings, and 'tee' them to the 'perf trace' 'data stream'.

> "pid" member of the struct perf_event_marker with default 0 meaning

Humm, Isn't PERF_SAMPLE_TID enough for that?

> "myself"?), but will definitely complicate the patch. Should I have a
> look at it now or maybe leave it till we get a general agreement about
> the marker ioctl existence?
> 
> > Then 'perf trace' doesn't know about any binary format a tool may have,
> > getting strings there (hey, LD_PRELOADing some logging library to hook
> > into this comes to mind) and having it merged with other events
> > (syscalls, pagefaults, etc) looks useful.
> 
> But do you still mean a "magic" u64 before the rest? Injecting a string
> would just mean:
> 
> 	marker.size = strlen(s) + 1;
> 	marker.data = s;
> 
> > As well as some specialized version of 'perf trace' that knows about
> > some binary protocol that would get app specific stats or lock status,
> > etc, perhaps even plugins for 'perf trace' that would be selected by
> > that first u64? Also seems useful.
> > 	  
> > I.e. having a way to provide just strings and another that would allow
> > passing perf_raw_record.
> 
> Sounds interesting. But then maybe this stuff shouldn't go into "raw"
> then? It could be something like this in the sample:
> 
> 	{ u64 type; /* 0 means zero-terminated string in data */
> 	  u32 size;
> 	  char data[size]; } && PERF_SAMPLE_MARKER

Yes, this is how I think it should be.
 
> Pawel
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-16  7:44           ` Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2014-09-16  7:44 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Pawel Moll, Richard Cochran,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	John Stultz, linux-kernel, linux-api


* David Ahern <dsahern@gmail.com> wrote:

> On 9/12/14, 2:44 PM, Arnaldo Carvalho de Melo wrote:
> >Em Fri, Sep 12, 2014 at 10:37:39AM -0700, David Ahern escreveu:
> >>On 9/12/14, 4:48 AM, Pawel Moll wrote:
> >>>This patch adds a PERF_COUNT_SW_MARKER event type, which
> >>>can be requested by user and a PERF_EVENT_IOC_MARKER
> >>>ioctl command which will inject an event of said type into
> >>>the perf buffer. The ioctl can take a zero-terminated
> >>>string argument, similar to tracing_marker in ftrace,
> >>>which will be kept in the "raw" field of the sample.
> >>>
> >>>The main use case for this is synchronisation of
> >>>performance data generated in user space with the perf
> >>>stream coming from the kernel. For example, the marker
> >>>can be inserted by a JIT engine after it generated
> >>>portion of the code, but before the code is executed
> >>>for the first time, allowing the post-processor to
> >>>pick the correct debugging information. Other example
> >>>is a system profiling tool taking data from other
> >>>sources than just perf, which generates a marker
> >>>at the beginning at at the end of the session
> >>>(also possibly periodically during the session) to
> >>>synchronise kernel timestamps with clock values
> >>>obtained in userspace (gtod or raw_monotonic).
> >>
> >>Seems really similar to what I proposed in the past:
> >>
> >>https://lkml.org/lkml/2011/2/27/159
> >>
> >>Which was rejected.
> >
> >I took a look at that thread, but just barely, emphasis on that.
> >
> >Injecting something from userspace, a la ftrace, seems to be something,
> >as tglx mentioned, "buried" in that patchset.
> 
> Thomas object to an ioctl buried deep in a patch -- newbie 
> mistake.
> 
> Peter objected to the ioctl https://lkml.org/lkml/2011/3/1/229
> 
> It was not userspace injecting random data into the stream but 
> rather forcing the sample to be generated and added to the 
> stream.

I think adding an ioctl to inject user-provided data into the 
event stream is sensible, as long as there's a separate 'user 
generated data' event for it, etc.

The main usecase I could see would be to introduce a 
perf_printf() variant, supported by 'perf trace' by default, to 
add various tracable printouts to apps.

Timestamps generated by apps would be another usecase. It would 
probably be wise to add a 32-bit (or 64-bit) message type ID, 
plus a length field, with a message type registry somewhere in 
tools/perf/ (and reference implementation for each new subtype), 
to keep things organized yet flexible going forward.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-16  7:44           ` Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2014-09-16  7:44 UTC (permalink / raw)
  To: David Ahern
  Cc: Arnaldo Carvalho de Melo, Pawel Moll, Richard Cochran,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	John Stultz, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


* David Ahern <dsahern-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> On 9/12/14, 2:44 PM, Arnaldo Carvalho de Melo wrote:
> >Em Fri, Sep 12, 2014 at 10:37:39AM -0700, David Ahern escreveu:
> >>On 9/12/14, 4:48 AM, Pawel Moll wrote:
> >>>This patch adds a PERF_COUNT_SW_MARKER event type, which
> >>>can be requested by user and a PERF_EVENT_IOC_MARKER
> >>>ioctl command which will inject an event of said type into
> >>>the perf buffer. The ioctl can take a zero-terminated
> >>>string argument, similar to tracing_marker in ftrace,
> >>>which will be kept in the "raw" field of the sample.
> >>>
> >>>The main use case for this is synchronisation of
> >>>performance data generated in user space with the perf
> >>>stream coming from the kernel. For example, the marker
> >>>can be inserted by a JIT engine after it generated
> >>>portion of the code, but before the code is executed
> >>>for the first time, allowing the post-processor to
> >>>pick the correct debugging information. Other example
> >>>is a system profiling tool taking data from other
> >>>sources than just perf, which generates a marker
> >>>at the beginning at at the end of the session
> >>>(also possibly periodically during the session) to
> >>>synchronise kernel timestamps with clock values
> >>>obtained in userspace (gtod or raw_monotonic).
> >>
> >>Seems really similar to what I proposed in the past:
> >>
> >>https://lkml.org/lkml/2011/2/27/159
> >>
> >>Which was rejected.
> >
> >I took a look at that thread, but just barely, emphasis on that.
> >
> >Injecting something from userspace, a la ftrace, seems to be something,
> >as tglx mentioned, "buried" in that patchset.
> 
> Thomas object to an ioctl buried deep in a patch -- newbie 
> mistake.
> 
> Peter objected to the ioctl https://lkml.org/lkml/2011/3/1/229
> 
> It was not userspace injecting random data into the stream but 
> rather forcing the sample to be generated and added to the 
> stream.

I think adding an ioctl to inject user-provided data into the 
event stream is sensible, as long as there's a separate 'user 
generated data' event for it, etc.

The main usecase I could see would be to introduce a 
perf_printf() variant, supported by 'perf trace' by default, to 
add various tracable printouts to apps.

Timestamps generated by apps would be another usecase. It would 
probably be wise to add a 32-bit (or 64-bit) message type ID, 
plus a length field, with a message type registry somewhere in 
tools/perf/ (and reference implementation for each new subtype), 
to keep things organized yet flexible going forward.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-16 16:33                   ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-16 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel, linux-api

On Mon, 2014-09-15 at 19:31 +0100, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 15, 2014 at 06:27:14PM +0100, Pawel Moll escreveu:
> > On Fri, 2014-09-12 at 17:19 +0100, Arnaldo Carvalho de Melo wrote:
> > > Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> > > > On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > > > > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > > > > then there is a string right after it?
> > >  
> > > > How would this look like in userspace? Something like this?
> > >  
> > > > 8<----
> > > > struct perf_event_marker {
> > > > 	uint64_t value;
> > > > 	char *string;
> > > > } arg;
> > >  
> > > > arg.value = 0x1234;
> > >  
> > > > /* or */
> > >  
> > > > arg.value = 0;
> > > > arg.string = "abcd";
> > >  
> > > > ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> > > > 8<----
> > >  
> > > > If so, maybe it would simpler just to go for classic size/data
> > > > structure?
> > >  
> > > > 8<-----
> > > > struct perf_event_marker {
> > > > 	uint32_t size;
> > > > 	void *data;
> > > > }
> > > > 8<-----
> > >  
> > > > This would directly map into struct perf_raw_record...
> > > 
> > > I can see the usefulness of having it all, i.e. if we do just:
> > > 
> > > perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`
> > 
> > Hm. I haven't thought about a situation when 3rd party wants to inject
> > something into "my" data stream... I guess it could be implemented (a
> 
> I was thinking about intercepting calls that pass some logging data, as
> strings, and 'tee' them to the 'perf trace' 'data stream'.

Right, ok, like LD_PRELOADing printf (a stupid example of course) and
piping it inside perf... So if I'm getting it right, it's the perf
process that would eventually do the ioctl(PERF_EVENT_IOC_MARKER), not
the traced process, correct? This makes sense. Another use case for
ioctl justification, thanks :-)

> > "myself"?), but will definitely complicate the patch. Should I have a
> > look at it now or maybe leave it till we get a general agreement about
> > the marker ioctl existence?
> > 
> > > Then 'perf trace' doesn't know about any binary format a tool may have,
> > > getting strings there (hey, LD_PRELOADing some logging library to hook
> > > into this comes to mind) and having it merged with other events
> > > (syscalls, pagefaults, etc) looks useful.
> > 
> > But do you still mean a "magic" u64 before the rest? Injecting a string
> > would just mean:
> > 
> > 	marker.size = strlen(s) + 1;
> > 	marker.data = s;
> > 
> > > As well as some specialized version of 'perf trace' that knows about
> > > some binary protocol that would get app specific stats or lock status,
> > > etc, perhaps even plugins for 'perf trace' that would be selected by
> > > that first u64? Also seems useful.
> > > 	  
> > > I.e. having a way to provide just strings and another that would allow
> > > passing perf_raw_record.
> > 
> > Sounds interesting. But then maybe this stuff shouldn't go into "raw"
> > then? It could be something like this in the sample:
> > 
> > 	{ u64 type; /* 0 means zero-terminated string in data */
> > 	  u32 size;
> > 	  char data[size]; } && PERF_SAMPLE_MARKER
> 
> Yes, this is how I think it should be.

Seems that Ingo had exactly the same thing on mind. I'll get a patch
done.

Paweł


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-16 16:33                   ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-16 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Christopher Covington, Richard Cochran, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, Paul Mackerras, John Stultz,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Mon, 2014-09-15 at 19:31 +0100, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 15, 2014 at 06:27:14PM +0100, Pawel Moll escreveu:
> > On Fri, 2014-09-12 at 17:19 +0100, Arnaldo Carvalho de Melo wrote:
> > > Em Fri, Sep 12, 2014 at 02:58:55PM +0100, Pawel Moll escreveu:
> > > > On Fri, 2014-09-12 at 14:49 +0100, Arnaldo Carvalho de Melo wrote:
> > > > > Perhaps both? I.e. an u64 followed from a string, if the u64 is zero,
> > > > > then there is a string right after it?
> > >  
> > > > How would this look like in userspace? Something like this?
> > >  
> > > > 8<----
> > > > struct perf_event_marker {
> > > > 	uint64_t value;
> > > > 	char *string;
> > > > } arg;
> > >  
> > > > arg.value = 0x1234;
> > >  
> > > > /* or */
> > >  
> > > > arg.value = 0;
> > > > arg.string = "abcd";
> > >  
> > > > ioctl(fd, PERF_EVENT_IOC_MARKER, &arg)
> > > > 8<----
> > >  
> > > > If so, maybe it would simpler just to go for classic size/data
> > > > structure?
> > >  
> > > > 8<-----
> > > > struct perf_event_marker {
> > > > 	uint32_t size;
> > > > 	void *data;
> > > > }
> > > > 8<-----
> > >  
> > > > This would directly map into struct perf_raw_record...
> > > 
> > > I can see the usefulness of having it all, i.e. if we do just:
> > > 
> > > perf trace --pid `pidof some-tool-in-debug-mode-using-this-interface`
> > 
> > Hm. I haven't thought about a situation when 3rd party wants to inject
> > something into "my" data stream... I guess it could be implemented (a
> 
> I was thinking about intercepting calls that pass some logging data, as
> strings, and 'tee' them to the 'perf trace' 'data stream'.

Right, ok, like LD_PRELOADing printf (a stupid example of course) and
piping it inside perf... So if I'm getting it right, it's the perf
process that would eventually do the ioctl(PERF_EVENT_IOC_MARKER), not
the traced process, correct? This makes sense. Another use case for
ioctl justification, thanks :-)

> > "myself"?), but will definitely complicate the patch. Should I have a
> > look at it now or maybe leave it till we get a general agreement about
> > the marker ioctl existence?
> > 
> > > Then 'perf trace' doesn't know about any binary format a tool may have,
> > > getting strings there (hey, LD_PRELOADing some logging library to hook
> > > into this comes to mind) and having it merged with other events
> > > (syscalls, pagefaults, etc) looks useful.
> > 
> > But do you still mean a "magic" u64 before the rest? Injecting a string
> > would just mean:
> > 
> > 	marker.size = strlen(s) + 1;
> > 	marker.data = s;
> > 
> > > As well as some specialized version of 'perf trace' that knows about
> > > some binary protocol that would get app specific stats or lock status,
> > > etc, perhaps even plugins for 'perf trace' that would be selected by
> > > that first u64? Also seems useful.
> > > 	  
> > > I.e. having a way to provide just strings and another that would allow
> > > passing perf_raw_record.
> > 
> > Sounds interesting. But then maybe this stuff shouldn't go into "raw"
> > then? It could be something like this in the sample:
> > 
> > 	{ u64 type; /* 0 means zero-terminated string in data */
> > 	  u32 size;
> > 	  char data[size]; } && PERF_SAMPLE_MARKER
> 
> Yes, this is how I think it should be.

Seems that Ingo had exactly the same thing on mind. I'll get a patch
done.

Paweł

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-16 16:37             ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-16 16:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, Arnaldo Carvalho de Melo, Richard Cochran,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	John Stultz, linux-kernel, linux-api

On Tue, 2014-09-16 at 08:44 +0100, Ingo Molnar wrote:
> I think adding an ioctl to inject user-provided data into the 
> event stream is sensible, as long as there's a separate 'user 
> generated data' event for it, etc.
> 
> The main usecase I could see would be to introduce a 
> perf_printf() variant, supported by 'perf trace' by default, to 
> add various tracable printouts to apps.
> 
> Timestamps generated by apps would be another usecase. It would 
> probably be wise to add a 32-bit (or 64-bit) message type ID, 
> plus a length field, with a message type registry somewhere in 
> tools/perf/ (and reference implementation for each new subtype), 
> to keep things organized yet flexible going forward.

Right, so this is pretty much what I got talking to Arnaldo...

>       { u64 type; /* 0 means zero-terminated string in data */
>         u32 size;
>         char data[size]; } && PERF_SAMPLE_MARKER

... with one type - 0 - defined as a "universal" string (so any possible
tool knows what to do about it), the rest being left to userspace (this
"registry" you mention).

Before I proceed any further, is the term "marker" acceptable? Maybe a
"printf" instead? Or a "log"? As we know naming is often single most
discussed subject when it comes to new things in the kernel ;-)

Pawel


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-16 16:37             ` Pawel Moll
  0 siblings, 0 replies; 35+ messages in thread
From: Pawel Moll @ 2014-09-16 16:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, Arnaldo Carvalho de Melo, Richard Cochran,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	John Stultz, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Tue, 2014-09-16 at 08:44 +0100, Ingo Molnar wrote:
> I think adding an ioctl to inject user-provided data into the 
> event stream is sensible, as long as there's a separate 'user 
> generated data' event for it, etc.
> 
> The main usecase I could see would be to introduce a 
> perf_printf() variant, supported by 'perf trace' by default, to 
> add various tracable printouts to apps.
> 
> Timestamps generated by apps would be another usecase. It would 
> probably be wise to add a 32-bit (or 64-bit) message type ID, 
> plus a length field, with a message type registry somewhere in 
> tools/perf/ (and reference implementation for each new subtype), 
> to keep things organized yet flexible going forward.

Right, so this is pretty much what I got talking to Arnaldo...

>       { u64 type; /* 0 means zero-terminated string in data */
>         u32 size;
>         char data[size]; } && PERF_SAMPLE_MARKER

... with one type - 0 - defined as a "universal" string (so any possible
tool knows what to do about it), the rest being left to userspace (this
"registry" you mention).

Before I proceed any further, is the term "marker" acceptable? Maybe a
"printf" instead? Or a "log"? As we know naming is often single most
discussed subject when it comes to new things in the kernel ;-)

Pawel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
  2014-09-16 16:37             ` Pawel Moll
@ 2014-09-16 17:58               ` Ingo Molnar
  -1 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2014-09-16 17:58 UTC (permalink / raw)
  To: Pawel Moll
  Cc: David Ahern, Arnaldo Carvalho de Melo, Richard Cochran,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	John Stultz, linux-kernel, linux-api


* Pawel Moll <pawel.moll@arm.com> wrote:

> On Tue, 2014-09-16 at 08:44 +0100, Ingo Molnar wrote:
> > I think adding an ioctl to inject user-provided data into the 
> > event stream is sensible, as long as there's a separate 'user 
> > generated data' event for it, etc.
> > 
> > The main usecase I could see would be to introduce a 
> > perf_printf() variant, supported by 'perf trace' by default, to 
> > add various tracable printouts to apps.
> > 
> > Timestamps generated by apps would be another usecase. It would 
> > probably be wise to add a 32-bit (or 64-bit) message type ID, 
> > plus a length field, with a message type registry somewhere in 
> > tools/perf/ (and reference implementation for each new subtype), 
> > to keep things organized yet flexible going forward.
> 
> Right, so this is pretty much what I got talking to Arnaldo...
> 
> >       { u64 type; /* 0 means zero-terminated string in data */
> >         u32 size;
> >         char data[size]; } && PERF_SAMPLE_MARKER
> 
> ... with one type - 0 - defined as a "universal" string (so any 
> possible tool knows what to do about it), the rest being left 
> to userspace (this "registry" you mention).
> 
> Before I proceed any further, is the term "marker" acceptable? 
> Maybe a "printf" instead? Or a "log"? As we know naming is 
> often single most discussed subject when it comes to new things 
> in the kernel ;-)

Well, it's a user-space generated trace/event entry, so lets call 
it that?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 2/2] perf: Marker software event and ioctl
@ 2014-09-16 17:58               ` Ingo Molnar
  0 siblings, 0 replies; 35+ messages in thread
From: Ingo Molnar @ 2014-09-16 17:58 UTC (permalink / raw)
  To: Pawel Moll
  Cc: David Ahern, Arnaldo Carvalho de Melo, Richard Cochran,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, Paul Mackerras,
	John Stultz, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA


* Pawel Moll <pawel.moll-5wv7dgnIgG8@public.gmane.org> wrote:

> On Tue, 2014-09-16 at 08:44 +0100, Ingo Molnar wrote:
> > I think adding an ioctl to inject user-provided data into the 
> > event stream is sensible, as long as there's a separate 'user 
> > generated data' event for it, etc.
> > 
> > The main usecase I could see would be to introduce a 
> > perf_printf() variant, supported by 'perf trace' by default, to 
> > add various tracable printouts to apps.
> > 
> > Timestamps generated by apps would be another usecase. It would 
> > probably be wise to add a 32-bit (or 64-bit) message type ID, 
> > plus a length field, with a message type registry somewhere in 
> > tools/perf/ (and reference implementation for each new subtype), 
> > to keep things organized yet flexible going forward.
> 
> Right, so this is pretty much what I got talking to Arnaldo...
> 
> >       { u64 type; /* 0 means zero-terminated string in data */
> >         u32 size;
> >         char data[size]; } && PERF_SAMPLE_MARKER
> 
> ... with one type - 0 - defined as a "universal" string (so any 
> possible tool knows what to do about it), the rest being left 
> to userspace (this "registry" you mention).
> 
> Before I proceed any further, is the term "marker" acceptable? 
> Maybe a "printf" instead? Or a "log"? As we know naming is 
> often single most discussed subject when it comes to new things 
> in the kernel ;-)

Well, it's a user-space generated trace/event entry, so lets call 
it that?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-09-16 17:58 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-12 11:48 [RFC 0/2] Yet another take at user/kernel time correlation problem Pawel Moll
2014-09-12 11:48 ` [RFC 1/2] perf: Add sampling of the raw monotonic clock Pawel Moll
2014-09-12 11:48 ` [RFC 2/2] perf: Marker software event and ioctl Pawel Moll
2014-09-12 11:48   ` Pawel Moll
2014-09-12 12:43   ` Christopher Covington
2014-09-12 12:43     ` Christopher Covington
2014-09-12 12:57     ` Pawel Moll
2014-09-12 12:57       ` Pawel Moll
2014-09-12 13:49       ` Arnaldo Carvalho de Melo
2014-09-12 13:49         ` Arnaldo Carvalho de Melo
2014-09-12 13:58         ` Pawel Moll
2014-09-12 13:58           ` Pawel Moll
2014-09-12 16:19           ` Arnaldo Carvalho de Melo
2014-09-12 16:19             ` Arnaldo Carvalho de Melo
2014-09-15 17:27             ` Pawel Moll
2014-09-15 17:27               ` Pawel Moll
2014-09-15 18:31               ` Arnaldo Carvalho de Melo
2014-09-15 18:31                 ` Arnaldo Carvalho de Melo
2014-09-16 16:33                 ` Pawel Moll
2014-09-16 16:33                   ` Pawel Moll
2014-09-12 14:00       ` Christopher Covington
2014-09-12 14:00         ` Christopher Covington
2014-09-12 17:37   ` David Ahern
2014-09-12 17:37     ` David Ahern
2014-09-12 20:44     ` Arnaldo Carvalho de Melo
2014-09-12 20:44       ` Arnaldo Carvalho de Melo
2014-09-14 15:43       ` David Ahern
2014-09-15 17:18         ` Pawel Moll
2014-09-15 17:18           ` Pawel Moll
2014-09-16  7:44         ` Ingo Molnar
2014-09-16  7:44           ` Ingo Molnar
2014-09-16 16:37           ` Pawel Moll
2014-09-16 16:37             ` Pawel Moll
2014-09-16 17:58             ` Ingo Molnar
2014-09-16 17:58               ` Ingo Molnar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.