linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
@ 2013-06-11 16:42 Robert Richter
  2013-06-11 16:42 ` [PATCH v2 01/14] perf, ring_buffer: Use same prefix Robert Richter
                   ` (17 more replies)
  0 siblings, 18 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter

This patch set implements out of the box support of perf tools for
persistent events. For this the kernel must provide necessary
information about existing persistent events via sysfs to userland.
Persistent events are provided by the kernel with readonly event
buffers. To allow the independent usage of the event buffers by any
process without limiting other processes, multiple users of a single
event must be handled too.

The basic concept is to use a pmu as an event container for persistent
events. The pmu registers events in sysfs and provides format and
event information for the userland. The persistent event framework
requires to add events to the pmu dynamically.

With the information in sysfs userland knows about how to setup the
perf_event attribute of a persistent event. Since a persistent event
always has the persistent flag set, a way is needed to express this in
sysfs. A new syntax is used for this. With 'attr<num>:<mask>' any bit
in the attribute structure may be set in a similar way as using
'config<num>', but <num> is an index that points to the u64 value to
change within the attribute.

For persistent events the persistent flag (bit 23 of flag field in
struct perf_event_attr) needs to be set which is expressed in sysfs
with "attr5:23". E.g. the mce_record event is described in sysfs as
follows:

 /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23

Note that perf tools need to support the 'attr<num>' syntax that is
added in a separate patch set. With it we are able to run perf tool
commands to read persistent events, e.g.:

 # perf record -e persistent/mce_record/ sleep 10
 # perf top -e persistent/mce_record/

In general the new syntax is flexible to describe with sysfs any event
to be setup by perf tools.

First patches contain also fixes and reworks made after reviewing
code. Could be applied separately.

Patches base on Boris' patches which I have rebased to latest
tip/perf/core. All patches can be found here:

 git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent-v2

I wonder if this patch set could be applied to a tip/perf topic
branch? This would avoid reposting already reviewed patches.

Note: The perf tools patch set do not need to be reposted.

-Robert


Changes for V2:
* Merged minor changes into Boris' patches
* Included Boris' patches for review
* Document attr<index> syntax in sysfs ABI
* Adding cpu check to perf_get_persistent_event_fd()
* Rebased to latest tip/perf/core


Borislav Petkov (4):
  perf, ring_buffer: Use same prefix
  perf: Add persistent events
  perf: Add persistent event facilities
  MCE: Enable persistent event

Robert Richter (10):
  perf, persistent: Rework struct pers_event_desc
  perf, persistent: Remove rb_put()
  perf, persistent: Introduce get_persistent_event()
  perf, persistent: Reworking perf_get_persistent_event_fd()
  perf, persistent: Protect event lists with mutex
  perf, persistent: Avoid adding identical events
  perf, persistent: Implementing a persistent pmu
  perf, persistent: Name each persistent event
  perf, persistent: Exposing persistent events using sysfs
  perf, persistent: Allow multiple users for an event

 .../testing/sysfs-bus-event_source-devices-format  |  43 ++-
 arch/x86/kernel/cpu/mcheck/mce.c                   |  19 ++
 include/linux/perf_event.h                         |   9 +-
 include/uapi/linux/perf_event.h                    |   3 +-
 kernel/events/Makefile                             |   2 +-
 kernel/events/core.c                               |  56 ++--
 kernel/events/internal.h                           |   3 +
 kernel/events/persistent.c                         | 320 +++++++++++++++++++++
 kernel/events/ring_buffer.c                        |   7 +-
 9 files changed, 419 insertions(+), 43 deletions(-)
 create mode 100644 kernel/events/persistent.c

-- 
1.8.1.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 01/14] perf, ring_buffer: Use same prefix
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 02/14] perf: Add persistent events Robert Richter
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov, Robert Richter

From: Borislav Petkov <bp@suse.de>

Rename ring_buffer-handling functions consistently with the "rb_"
prefix.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/core.c        | 37 +++++++++++++++++--------------------
 kernel/events/ring_buffer.c |  7 +++----
 2 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a0780b3..b790ab6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -198,8 +198,7 @@ static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx,
 static void update_context_time(struct perf_event_context *ctx);
 static u64 perf_event_time(struct perf_event *event);
 
-static void ring_buffer_attach(struct perf_event *event,
-			       struct ring_buffer *rb);
+static void rb_attach(struct perf_event *event, struct ring_buffer *rb);
 
 void __weak perf_event_print_debug(void)	{ }
 
@@ -3022,7 +3021,7 @@ static void free_event_rcu(struct rcu_head *head)
 	kfree(event);
 }
 
-static void ring_buffer_put(struct ring_buffer *rb);
+static void rb_put(struct ring_buffer *rb);
 
 static void free_event(struct perf_event *event)
 {
@@ -3054,7 +3053,7 @@ static void free_event(struct perf_event *event)
 	}
 
 	if (event->rb) {
-		ring_buffer_put(event->rb);
+		rb_put(event->rb);
 		event->rb = NULL;
 	}
 
@@ -3299,8 +3298,8 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 	 * t0: T1, rb = rcu_dereference(event->rb)
 	 * t1: T2, old_rb = event->rb
 	 * t2: T2, event->rb = new rb
-	 * t3: T2, ring_buffer_detach(old_rb)
-	 * t4: T1, ring_buffer_attach(rb1)
+	 * t3: T2, rb_detach(old_rb)
+	 * t4: T1, rb_attach(rb1)
 	 * t5: T1, poll_wait(event->waitq)
 	 *
 	 * To avoid this problem, we grab mmap_mutex in perf_poll()
@@ -3312,7 +3311,7 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 	rcu_read_lock();
 	rb = rcu_dereference(event->rb);
 	if (rb) {
-		ring_buffer_attach(event, rb);
+		rb_attach(event, rb);
 		events = atomic_xchg(&rb->poll, 0);
 	}
 	rcu_read_unlock();
@@ -3617,8 +3616,7 @@ unlock:
 	return ret;
 }
 
-static void ring_buffer_attach(struct perf_event *event,
-			       struct ring_buffer *rb)
+static void rb_attach(struct perf_event *event, struct ring_buffer *rb)
 {
 	unsigned long flags;
 
@@ -3634,8 +3632,7 @@ unlock:
 	spin_unlock_irqrestore(&rb->event_lock, flags);
 }
 
-static void ring_buffer_detach(struct perf_event *event,
-			       struct ring_buffer *rb)
+static void rb_detach(struct perf_event *event, struct ring_buffer *rb)
 {
 	unsigned long flags;
 
@@ -3648,7 +3645,7 @@ static void ring_buffer_detach(struct perf_event *event,
 	spin_unlock_irqrestore(&rb->event_lock, flags);
 }
 
-static void ring_buffer_wakeup(struct perf_event *event)
+static void rb_wakeup(struct perf_event *event)
 {
 	struct ring_buffer *rb;
 
@@ -3672,7 +3669,7 @@ static void rb_free_rcu(struct rcu_head *rcu_head)
 	rb_free(rb);
 }
 
-static struct ring_buffer *ring_buffer_get(struct perf_event *event)
+static struct ring_buffer *rb_get(struct perf_event *event)
 {
 	struct ring_buffer *rb;
 
@@ -3687,7 +3684,7 @@ static struct ring_buffer *ring_buffer_get(struct perf_event *event)
 	return rb;
 }
 
-static void ring_buffer_put(struct ring_buffer *rb)
+static void rb_put(struct ring_buffer *rb)
 {
 	struct perf_event *event, *n;
 	unsigned long flags;
@@ -3724,10 +3721,10 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 		atomic_long_sub((size >> PAGE_SHIFT) + 1, &user->locked_vm);
 		vma->vm_mm->pinned_vm -= event->mmap_locked;
 		rcu_assign_pointer(event->rb, NULL);
-		ring_buffer_detach(event, rb);
+		rb_detach(event, rb);
 		mutex_unlock(&event->mmap_mutex);
 
-		ring_buffer_put(rb);
+		rb_put(rb);
 		free_uid(user);
 	}
 }
@@ -3881,7 +3878,7 @@ static const struct file_operations perf_fops = {
 
 void perf_event_wakeup(struct perf_event *event)
 {
-	ring_buffer_wakeup(event);
+	rb_wakeup(event);
 
 	if (event->pending_kill) {
 		kill_fasync(&event->fasync, SIGIO, event->pending_kill);
@@ -6567,7 +6564,7 @@ set:
 
 	if (output_event) {
 		/* get the rb we want to redirect to */
-		rb = ring_buffer_get(output_event);
+		rb = rb_get(output_event);
 		if (!rb)
 			goto unlock;
 	}
@@ -6575,13 +6572,13 @@ set:
 	old_rb = event->rb;
 	rcu_assign_pointer(event->rb, rb);
 	if (old_rb)
-		ring_buffer_detach(event, old_rb);
+		rb_detach(event, old_rb);
 	ret = 0;
 unlock:
 	mutex_unlock(&event->mmap_mutex);
 
 	if (old_rb)
-		ring_buffer_put(old_rb);
+		rb_put(old_rb);
 out:
 	return ret;
 }
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index cd55144..b514c56 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -212,8 +212,7 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
-static void
-ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
+static void rb_init(struct ring_buffer *rb, long watermark, int flags)
 {
 	long max_size = perf_data_size(rb);
 
@@ -290,7 +289,7 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
 
 	rb->nr_pages = nr_pages;
 
-	ring_buffer_init(rb, watermark, flags);
+	rb_init(rb, watermark, flags);
 
 	return rb;
 
@@ -395,7 +394,7 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
 	rb->page_order = ilog2(nr_pages);
 	rb->nr_pages = !!nr_pages;
 
-	ring_buffer_init(rb, watermark, flags);
+	rb_init(rb, watermark, flags);
 
 	return rb;
 
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 02/14] perf: Add persistent events
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
  2013-06-11 16:42 ` [PATCH v2 01/14] perf, ring_buffer: Use same prefix Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-24  9:28   ` Peter Zijlstra
  2013-06-11 16:42 ` [PATCH v2 03/14] perf: Add persistent event facilities Robert Richter
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov, Robert Richter, Robert Richter

From: Borislav Petkov <bp@alien8.de>

Add the needed pieces for persistent events which makes them
process-agnostic. Also, make their buffers read-only when mmaping them
from userspace.

While at it, do not return a void function, as caught by Fengguang's
build robot.

Changes made by Robert Richter <robert.richter@linaro.org>:

* mmap should return EACCES error if fd can not be opened writable.
  This error code also helps userland to map buffers readonly on
  failure.

Signed-off-by: Borislav Petkov <bp@suse.de>
[ Return -EACCES if mapped buffers must be readonly ]
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c            | 10 +++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index fb104e5..6032361 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -272,8 +272,9 @@ struct perf_event_attr {
 
 				exclude_callchain_kernel : 1, /* exclude kernel callchains */
 				exclude_callchain_user   : 1, /* exclude user callchains */
+				persistent     :  1, /* always-on event */
 
-				__reserved_1   : 41;
+				__reserved_1   : 40;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b790ab6..a13e457 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3713,6 +3713,11 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 {
 	struct perf_event *event = vma->vm_file->private_data;
 
+	if (event->attr.persistent) {
+		atomic_dec(&event->mmap_count);
+		return;
+	}
+
 	if (atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex)) {
 		unsigned long size = perf_data_size(event->rb);
 		struct user_struct *user = event->mmap_user;
@@ -3756,9 +3761,12 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	if (event->cpu == -1 && event->attr.inherit)
 		return -EINVAL;
 
-	if (!(vma->vm_flags & VM_SHARED))
+	if (!(vma->vm_flags & VM_SHARED) && !event->attr.persistent)
 		return -EINVAL;
 
+	if (event->attr.persistent && (vma->vm_flags & VM_WRITE))
+		return -EACCES;
+
 	vma_size = vma->vm_end - vma->vm_start;
 	nr_pages = (vma_size / PAGE_SIZE) - 1;
 
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
  2013-06-11 16:42 ` [PATCH v2 01/14] perf, ring_buffer: Use same prefix Robert Richter
  2013-06-11 16:42 ` [PATCH v2 02/14] perf: Add persistent events Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-14  2:15   ` Namhyung Kim
                     ` (3 more replies)
  2013-06-11 16:42 ` [PATCH v2 04/14] MCE: Enable persistent event Robert Richter
                   ` (14 subsequent siblings)
  17 siblings, 4 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov, Robert Richter, Robert Richter

From: Borislav Petkov <bp@alien8.de>

Add a barebones implementation for registering persistent events with
perf. For that, we don't destroy the buffers when they're unmapped;
also, we map them read-only so that multiple agents can access them.

Also, we allocate the event buffers at event init time and not at mmap
time so that we can log samples into them regardless of whether there
are readers in userspace or not.

Changes made by Robert Richter <robert.richter@linaro.org>:

* Fixing wrongly determined attribute size.

* The default buffer size used to setup event buffers with perf tools
  is 512k. Using the same buffer size for persistent events. This also
  avoids failed mmap calls due to different buffer sizes.

* Improve error reporting.

* Returning -ENODEV if no file descriptor is found. An error code of
  -1 (-EPERM) is misleading in this case.

* Adding cpu check to perf_get_persistent_event_fd()

[ make percpu variable static ]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
[ Fix attr size ]
[ Setting default buffer size to 512k as in perf tools ]
[ Print error code on failure when adding events ]
[ Return resonable error code ]
[ Adding cpu check to perf_get_persistent_event_fd() ]
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 include/linux/perf_event.h |  16 +++-
 kernel/events/Makefile     |   2 +-
 kernel/events/core.c       |  13 ++--
 kernel/events/internal.h   |   4 +
 kernel/events/persistent.c | 181 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 207 insertions(+), 9 deletions(-)
 create mode 100644 kernel/events/persistent.c

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6fddac1..d2a42b7 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -518,6 +518,13 @@ struct perf_output_handle {
 	int				page;
 };
 
+struct pers_event_desc {
+	struct perf_event_attr *attr;
+	struct perf_event *event;
+	struct list_head plist;
+	int fd;
+};
+
 #ifdef CONFIG_PERF_EVENTS
 
 extern int perf_pmu_register(struct pmu *pmu, char *name, int type);
@@ -750,7 +757,9 @@ extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
-#else
+extern int perf_add_persistent_event(struct perf_event_attr *, unsigned);
+extern int perf_add_persistent_event_by_id(int id);
+#else /* !CONFIG_PERF_EVENTS */
 static inline void
 perf_event_task_sched_in(struct task_struct *prev,
 			 struct task_struct *task)			{ }
@@ -790,7 +799,10 @@ static inline void perf_event_enable(struct perf_event *event)		{ }
 static inline void perf_event_disable(struct perf_event *event)		{ }
 static inline int __perf_event_disable(void *info)			{ return -1; }
 static inline void perf_event_task_tick(void)				{ }
-#endif
+static inline int perf_add_persistent_event(struct perf_event_attr *attr,
+					    unsigned nr_pages)		{ return -EINVAL; }
+static inline int perf_add_persistent_event_by_id(int id)		{ return -EINVAL; }
+#endif /* !CONFIG_PERF_EVENTS */
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
 extern bool perf_event_can_stop_tick(void);
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 103f5d1..70990d5 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -2,7 +2,7 @@ ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_core.o = -pg
 endif
 
-obj-y := core.o ring_buffer.o callchain.o
+obj-y := core.o ring_buffer.o callchain.o persistent.o
 
 obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_UPROBES) += uprobes.o
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a13e457..a9b6470 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3021,8 +3021,6 @@ static void free_event_rcu(struct rcu_head *head)
 	kfree(event);
 }
 
-static void rb_put(struct ring_buffer *rb);
-
 static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
@@ -3398,8 +3396,6 @@ unlock:
 	return ret;
 }
 
-static const struct file_operations perf_fops;
-
 static inline int perf_fget_light(int fd, struct fd *p)
 {
 	struct fd f = fdget(fd);
@@ -3684,7 +3680,7 @@ static struct ring_buffer *rb_get(struct perf_event *event)
 	return rb;
 }
 
-static void rb_put(struct ring_buffer *rb)
+void rb_put(struct ring_buffer *rb)
 {
 	struct perf_event *event, *n;
 	unsigned long flags;
@@ -3866,7 +3862,7 @@ static int perf_fasync(int fd, struct file *filp, int on)
 	return 0;
 }
 
-static const struct file_operations perf_fops = {
+const struct file_operations perf_fops = {
 	.llseek			= no_llseek,
 	.release		= perf_release,
 	.read			= perf_read,
@@ -6623,6 +6619,9 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (err)
 		return err;
 
+	if (attr.persistent)
+		return perf_get_persistent_event_fd(cpu, &attr);
+
 	if (!attr.exclude_kernel) {
 		if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
 			return -EACCES;
@@ -7579,6 +7578,8 @@ void __init perf_event_init(void)
 	 */
 	BUILD_BUG_ON((offsetof(struct perf_event_mmap_page, data_head))
 		     != 1024);
+
+	persistent_events_init();
 }
 
 static int __init perf_event_sysfs_init(void)
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index eb675c4..3b481be 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -38,6 +38,7 @@ struct ring_buffer {
 extern void rb_free(struct ring_buffer *rb);
 extern struct ring_buffer *
 rb_alloc(int nr_pages, long watermark, int cpu, int flags);
+extern void rb_put(struct ring_buffer *rb);
 extern void perf_event_wakeup(struct perf_event *event);
 
 extern void
@@ -174,4 +175,7 @@ static inline bool arch_perf_have_user_stack_dump(void)
 #define perf_user_stack_pointer(regs) 0
 #endif /* CONFIG_HAVE_PERF_USER_STACK_DUMP */
 
+extern const struct file_operations perf_fops;
+extern int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr);
+extern void __init persistent_events_init(void);
 #endif /* _KERNEL_EVENTS_INTERNAL_H */
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
new file mode 100644
index 0000000..53411b4
--- /dev/null
+++ b/kernel/events/persistent.c
@@ -0,0 +1,181 @@
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/perf_event.h>
+#include <linux/anon_inodes.h>
+
+#include "internal.h"
+
+/* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */
+#define CPU_BUFFER_NR_PAGES	((512 * 1024) / PAGE_SIZE)
+
+static DEFINE_PER_CPU(struct list_head, pers_events);
+
+static struct perf_event *
+add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
+			    unsigned nr_pages)
+{
+	struct perf_event *event = ERR_PTR(-ENOMEM);
+	struct pers_event_desc *desc;
+	struct ring_buffer *buf;
+
+	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
+	if (!desc)
+		goto out;
+
+	buf = rb_alloc(nr_pages, 0, cpu, 0);
+	if (!buf)
+		goto err_rb;
+
+	event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
+	if (IS_ERR(event))
+		goto err_event;
+
+	rcu_assign_pointer(event->rb, buf);
+
+	desc->event = event;
+	desc->attr  = attr;
+
+	INIT_LIST_HEAD(&desc->plist);
+	list_add_tail(&desc->plist, &per_cpu(pers_events, cpu));
+
+	/* All workie, enable event now */
+	perf_event_enable(event);
+
+	goto out;
+
+ err_event:
+	rb_put(buf);
+
+ err_rb:
+	kfree(desc);
+
+ out:
+	return event;
+}
+
+static void del_persistent_event(int cpu, struct perf_event_attr *attr)
+{
+	struct pers_event_desc *desc, *tmp;
+	struct perf_event *event = NULL;
+
+	list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
+		if (desc->attr->config == attr->config) {
+			event = desc->event;
+			break;
+		}
+	}
+
+	if (!event)
+		return;
+
+	list_del(&desc->plist);
+
+	perf_event_disable(event);
+	if (event->rb) {
+		rb_put(event->rb);
+		rcu_assign_pointer(event->rb, NULL);
+	}
+
+	perf_event_release_kernel(event);
+	put_unused_fd(desc->fd);
+	kfree(desc->attr);
+	kfree(desc);
+}
+
+static int __alloc_persistent_event_fd(struct pers_event_desc *desc)
+{
+	struct file *event_file = NULL;
+	int event_fd = -1;
+
+	event_fd = get_unused_fd();
+	if (event_fd < 0)
+		goto out;
+
+	event_file = anon_inode_getfile("[pers_event]", &perf_fops,
+					desc->event, O_RDONLY);
+	if (IS_ERR(event_file))
+		goto err_event_file;
+
+	desc->fd = event_fd;
+	fd_install(event_fd, event_file);
+
+	goto out;
+
+
+ err_event_file:
+	put_unused_fd(event_fd);
+
+ out:
+	return event_fd;
+}
+
+/*
+ * Create and enable the persistent version of the perf event described by
+ * @attr.
+ *
+ * @attr: perf event descriptor
+ * @nr_pages: size in pages
+ */
+int perf_add_persistent_event(struct perf_event_attr *attr, unsigned nr_pages)
+{
+	struct perf_event *event;
+	int i;
+
+	for_each_possible_cpu(i) {
+		event = add_persistent_event_on_cpu(i, attr, nr_pages);
+		if (IS_ERR(event))
+			goto unwind;
+	}
+	return 0;
+
+unwind:
+	pr_err("%s: Error adding persistent event on cpu %d: %ld\n",
+		__func__, i, PTR_ERR(event));
+
+	while (--i >= 0)
+		del_persistent_event(i, attr);
+
+	return PTR_ERR(event);
+}
+
+int perf_add_persistent_event_by_id(int id)
+{
+	struct perf_event_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		return -ENOMEM;
+
+	attr->sample_period	= 1;
+	attr->wakeup_events	= 1;
+	attr->sample_type	= PERF_SAMPLE_RAW;
+	attr->persistent	= 1;
+	attr->config		= id;
+	attr->type		= PERF_TYPE_TRACEPOINT;
+	attr->size		= sizeof(*attr);
+
+	return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
+}
+
+int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
+{
+	struct pers_event_desc *desc;
+
+	if (cpu >= (unsigned)nr_cpu_ids)
+		return -EINVAL;
+
+	list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
+		if (desc->attr->config == attr->config)
+			return __alloc_persistent_event_fd(desc);
+
+	return -ENODEV;
+}
+
+
+void __init persistent_events_init(void)
+{
+	int i;
+
+	for_each_possible_cpu(i)
+		INIT_LIST_HEAD(&per_cpu(pers_events, i));
+}
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 04/14] MCE: Enable persistent event
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (2 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 03/14] perf: Add persistent event facilities Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 05/14] perf, persistent: Rework struct pers_event_desc Robert Richter
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov, Robert Richter, Robert Richter

From: Borislav Petkov <bp@suse.de>

... for MCEs collection.

Changes made by Robert Richter <robert.richter@linaro.org>:

The mce_record tracepoint needs tracepoints to be enabled. Fixing
build error for no-tracepoints configs.

Signed-off-by: Borislav Petkov <bp@suse.de>
[ Fix build error for no-tracepoints configs ]
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 9239504..d421937 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1987,6 +1987,24 @@ int __init mcheck_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_TRACEPOINTS
+
+int __init mcheck_init_tp(void)
+{
+	if (perf_add_persistent_event_by_id(event_mce_record.event.type)) {
+		pr_err("Error adding MCE persistent event.\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+/*
+ * We can't run earlier because persistent events uses anon_inode_getfile and
+ * its anon_inode_mnt gets initialized as a fs_initcall.
+ */
+fs_initcall_sync(mcheck_init_tp);
+
+#endif /* CONFIG_TRACEPOINTS */
+
 /*
  * mce_syscore: PM support
  */
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 05/14] perf, persistent: Rework struct pers_event_desc
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (3 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 04/14] MCE: Enable persistent event Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 06/14] perf, persistent: Remove rb_put() Robert Richter
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

Struct pers_event_desc is only used in kernel/events/persistent.c.
Moving it there. Also, removing attr member as this is a copy of
event->attr.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 include/linux/perf_event.h |  7 -------
 kernel/events/persistent.c | 12 ++++++++----
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a42b7..dc72c93 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -518,13 +518,6 @@ struct perf_output_handle {
 	int				page;
 };
 
-struct pers_event_desc {
-	struct perf_event_attr *attr;
-	struct perf_event *event;
-	struct list_head plist;
-	int fd;
-};
-
 #ifdef CONFIG_PERF_EVENTS
 
 extern int perf_pmu_register(struct pmu *pmu, char *name, int type);
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 53411b4..d7aaf95 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -8,6 +8,12 @@
 /* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */
 #define CPU_BUFFER_NR_PAGES	((512 * 1024) / PAGE_SIZE)
 
+struct pers_event_desc {
+	struct perf_event *event;
+	struct list_head plist;
+	int fd;
+};
+
 static DEFINE_PER_CPU(struct list_head, pers_events);
 
 static struct perf_event *
@@ -33,7 +39,6 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
 	rcu_assign_pointer(event->rb, buf);
 
 	desc->event = event;
-	desc->attr  = attr;
 
 	INIT_LIST_HEAD(&desc->plist);
 	list_add_tail(&desc->plist, &per_cpu(pers_events, cpu));
@@ -59,7 +64,7 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 	struct perf_event *event = NULL;
 
 	list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
-		if (desc->attr->config == attr->config) {
+		if (desc->event->attr.config == attr->config) {
 			event = desc->event;
 			break;
 		}
@@ -78,7 +83,6 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 
 	perf_event_release_kernel(event);
 	put_unused_fd(desc->fd);
-	kfree(desc->attr);
 	kfree(desc);
 }
 
@@ -165,7 +169,7 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 		return -EINVAL;
 
 	list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
-		if (desc->attr->config == attr->config)
+		if (desc->event->attr.config == attr->config)
 			return __alloc_persistent_event_fd(desc);
 
 	return -ENODEV;
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 06/14] perf, persistent: Remove rb_put()
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (4 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 05/14] perf, persistent: Rework struct pers_event_desc Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 07/14] perf, persistent: Introduce get_persistent_event() Robert Richter
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

rb_put() is called already in perf_event_release_kernel(), so no need
to do the same in del_persistent_event(). We also don't need it in
add_persistent_event_on_cpu() after a rework. Since there are no users
of rb_put() anymore, we can make it private again to only
events/core.c.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/core.c       |  4 +++-
 kernel/events/internal.h   |  1 -
 kernel/events/persistent.c | 29 +++++++++++------------------
 3 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a9b6470..8f85caa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3021,6 +3021,8 @@ static void free_event_rcu(struct rcu_head *head)
 	kfree(event);
 }
 
+static void rb_put(struct ring_buffer *rb);
+
 static void free_event(struct perf_event *event)
 {
 	irq_work_sync(&event->pending);
@@ -3680,7 +3682,7 @@ static struct ring_buffer *rb_get(struct perf_event *event)
 	return rb;
 }
 
-void rb_put(struct ring_buffer *rb)
+static void rb_put(struct ring_buffer *rb)
 {
 	struct perf_event *event, *n;
 	unsigned long flags;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 3b481be..3647289 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -38,7 +38,6 @@ struct ring_buffer {
 extern void rb_free(struct ring_buffer *rb);
 extern struct ring_buffer *
 rb_alloc(int nr_pages, long watermark, int cpu, int flags);
-extern void rb_put(struct ring_buffer *rb);
 extern void perf_event_wakeup(struct perf_event *event);
 
 extern void
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index d7aaf95..e6a7664 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -20,22 +20,22 @@ static struct perf_event *
 add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
 			    unsigned nr_pages)
 {
-	struct perf_event *event = ERR_PTR(-ENOMEM);
+	struct perf_event *event;
 	struct pers_event_desc *desc;
 	struct ring_buffer *buf;
 
 	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
 	if (!desc)
-		goto out;
-
-	buf = rb_alloc(nr_pages, 0, cpu, 0);
-	if (!buf)
-		goto err_rb;
+		return ERR_PTR(-ENOMEM);
 
 	event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
 	if (IS_ERR(event))
 		goto err_event;
 
+	buf = rb_alloc(nr_pages, 0, cpu, 0);
+	if (!buf)
+		goto err_rb;
+
 	rcu_assign_pointer(event->rb, buf);
 
 	desc->event = event;
@@ -47,14 +47,12 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
 	perf_event_enable(event);
 
 	goto out;
-
- err_event:
-	rb_put(buf);
-
- err_rb:
+err_rb:
+	perf_event_release_kernel(event);
+	event = ERR_PTR(-ENOMEM);
+err_event:
 	kfree(desc);
-
- out:
+out:
 	return event;
 }
 
@@ -76,11 +74,6 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 	list_del(&desc->plist);
 
 	perf_event_disable(event);
-	if (event->rb) {
-		rb_put(event->rb);
-		rcu_assign_pointer(event->rb, NULL);
-	}
-
 	perf_event_release_kernel(event);
 	put_unused_fd(desc->fd);
 	kfree(desc);
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 07/14] perf, persistent: Introduce get_persistent_event()
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (5 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 06/14] perf, persistent: Remove rb_put() Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 08/14] perf, persistent: Reworking perf_get_persistent_event_fd() Robert Richter
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

Reducing duplicate code by introducing function get_persistent_event()
to get the descriptor of an persistent event.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index e6a7664..7d91871 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -16,6 +16,19 @@ struct pers_event_desc {
 
 static DEFINE_PER_CPU(struct list_head, pers_events);
 
+static struct pers_event_desc
+*get_persistent_event(int cpu, struct perf_event_attr *attr)
+{
+	struct pers_event_desc *desc;
+
+	list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist) {
+		if (desc->event->attr.config == attr->config)
+			return desc;
+	}
+
+	return NULL;
+}
+
 static struct perf_event *
 add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
 			    unsigned nr_pages)
@@ -58,18 +71,13 @@ out:
 
 static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 {
-	struct pers_event_desc *desc, *tmp;
-	struct perf_event *event = NULL;
-
-	list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
-		if (desc->event->attr.config == attr->config) {
-			event = desc->event;
-			break;
-		}
-	}
+	struct pers_event_desc *desc;
+	struct perf_event *event;
 
-	if (!event)
+	desc = get_persistent_event(cpu, attr);
+	if (!desc)
 		return;
+	event = desc->event;
 
 	list_del(&desc->plist);
 
@@ -161,11 +169,11 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 	if (cpu >= (unsigned)nr_cpu_ids)
 		return -EINVAL;
 
-	list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
-		if (desc->event->attr.config == attr->config)
-			return __alloc_persistent_event_fd(desc);
+	desc = get_persistent_event(cpu, attr);
+	if (!desc)
+		return -ENODEV;
 
-	return -ENODEV;
+	return __alloc_persistent_event_fd(desc);
 }
 
 
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 08/14] perf, persistent: Reworking perf_get_persistent_event_fd()
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (6 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 07/14] perf, persistent: Introduce get_persistent_event() Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 09/14] perf, persistent: Protect event lists with mutex Robert Richter
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

There is already a function anon_inode_getfd() that does already all
the work. Reworking and simplifying code.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 7d91871..16ed47c 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -87,33 +87,6 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 	kfree(desc);
 }
 
-static int __alloc_persistent_event_fd(struct pers_event_desc *desc)
-{
-	struct file *event_file = NULL;
-	int event_fd = -1;
-
-	event_fd = get_unused_fd();
-	if (event_fd < 0)
-		goto out;
-
-	event_file = anon_inode_getfile("[pers_event]", &perf_fops,
-					desc->event, O_RDONLY);
-	if (IS_ERR(event_file))
-		goto err_event_file;
-
-	desc->fd = event_fd;
-	fd_install(event_fd, event_file);
-
-	goto out;
-
-
- err_event_file:
-	put_unused_fd(event_fd);
-
- out:
-	return event_fd;
-}
-
 /*
  * Create and enable the persistent version of the perf event described by
  * @attr.
@@ -165,6 +138,7 @@ int perf_add_persistent_event_by_id(int id)
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 {
 	struct pers_event_desc *desc;
+	int event_fd;
 
 	if (cpu >= (unsigned)nr_cpu_ids)
 		return -EINVAL;
@@ -173,7 +147,12 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 	if (!desc)
 		return -ENODEV;
 
-	return __alloc_persistent_event_fd(desc);
+	event_fd = anon_inode_getfd("[pers_event]", &perf_fops,
+				desc->event, O_RDONLY);
+	if (event_fd >= 0)
+		desc->fd = event_fd;
+
+	return event_fd;
 }
 
 
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 09/14] perf, persistent: Protect event lists with mutex
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (7 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 08/14] perf, persistent: Reworking perf_get_persistent_event_fd() Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 10/14] perf, persistent: Avoid adding identical events Robert Richter
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

Protect esp. access to struct pers_event_desc *desc. There are race
conditions possible where the descriptor could be removed from list
while it is used.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 16ed47c..586cea5 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -15,6 +15,7 @@ struct pers_event_desc {
 };
 
 static DEFINE_PER_CPU(struct list_head, pers_events);
+static DEFINE_PER_CPU(struct mutex, pers_events_lock);
 
 static struct pers_event_desc
 *get_persistent_event(int cpu, struct perf_event_attr *attr)
@@ -37,9 +38,13 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
 	struct pers_event_desc *desc;
 	struct ring_buffer *buf;
 
+	mutex_lock(&per_cpu(pers_events_lock, cpu));
+
 	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
-	if (!desc)
-		return ERR_PTR(-ENOMEM);
+	if (!desc) {
+		event = ERR_PTR(-ENOMEM);
+		goto out;
+	}
 
 	event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
 	if (IS_ERR(event))
@@ -66,6 +71,7 @@ err_rb:
 err_event:
 	kfree(desc);
 out:
+	mutex_unlock(&per_cpu(pers_events_lock, cpu));
 	return event;
 }
 
@@ -74,9 +80,11 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 	struct pers_event_desc *desc;
 	struct perf_event *event;
 
+	mutex_lock(&per_cpu(pers_events_lock, cpu));
+
 	desc = get_persistent_event(cpu, attr);
 	if (!desc)
-		return;
+		goto out;
 	event = desc->event;
 
 	list_del(&desc->plist);
@@ -85,6 +93,8 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 	perf_event_release_kernel(event);
 	put_unused_fd(desc->fd);
 	kfree(desc);
+out:
+	mutex_unlock(&per_cpu(pers_events_lock, cpu));
 }
 
 /*
@@ -138,28 +148,33 @@ int perf_add_persistent_event_by_id(int id)
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 {
 	struct pers_event_desc *desc;
-	int event_fd;
+	int event_fd = -ENODEV;
 
 	if (cpu >= (unsigned)nr_cpu_ids)
 		return -EINVAL;
 
+	mutex_lock(&per_cpu(pers_events_lock, cpu));
+
 	desc = get_persistent_event(cpu, attr);
 	if (!desc)
-		return -ENODEV;
+		goto out;
 
 	event_fd = anon_inode_getfd("[pers_event]", &perf_fops,
 				desc->event, O_RDONLY);
 	if (event_fd >= 0)
 		desc->fd = event_fd;
+out:
+	mutex_unlock(&per_cpu(pers_events_lock, cpu));
 
 	return event_fd;
 }
 
-
 void __init persistent_events_init(void)
 {
-	int i;
+	int cpu;
 
-	for_each_possible_cpu(i)
-		INIT_LIST_HEAD(&per_cpu(pers_events, i));
+	for_each_possible_cpu(cpu) {
+		INIT_LIST_HEAD(&per_cpu(pers_events, cpu));
+		mutex_init(&per_cpu(pers_events_lock, cpu));
+	}
 }
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 10/14] perf, persistent: Avoid adding identical events
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (8 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 09/14] perf, persistent: Protect event lists with mutex Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 11/14] perf, persistent: Implementing a persistent pmu Robert Richter
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

Check if an event already exists before adding it.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 586cea5..4fcd071 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -40,6 +40,12 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
 
 	mutex_lock(&per_cpu(pers_events_lock, cpu));
 
+	desc = get_persistent_event(cpu, attr);
+	if (desc) {
+		event = ERR_PTR(-EEXIST);
+		goto out;
+	}
+
 	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
 	if (!desc) {
 		event = ERR_PTR(-ENOMEM);
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 11/14] perf, persistent: Implementing a persistent pmu
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (9 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 10/14] perf, persistent: Avoid adding identical events Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 12/14] perf, persistent: Name each persistent event Robert Richter
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

We want to use the kernel's pmu design to later expose persistent
events via sysfs to userland. Initially implement a persistent pmu.

The format syntax is introduced allowing to set bits anywhere in
struct perf_event_attr. This is used in this case to set the
persistent flag (attr5:23). The syntax is attr<num> where num is the
index of the u64 array in struct perf_event_attr. Otherwise syntax is
same as for config<num>.

Patches that implement this functionality for perf tools are sent in a
separate patchset.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 4fcd071..97c57c9 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -175,10 +175,45 @@ out:
 	return event_fd;
 }
 
+PMU_FORMAT_ATTR(persistent, "attr5:23");
+
+static struct attribute *persistent_format_attrs[] = {
+	&format_attr_persistent.attr,
+	NULL,
+};
+
+static struct attribute_group persistent_format_group = {
+	.name = "format",
+	.attrs = persistent_format_attrs,
+};
+
+static const struct attribute_group *persistent_attr_groups[] = {
+	&persistent_format_group,
+	NULL,
+};
+
+static struct pmu persistent_pmu;
+
+static int persistent_pmu_init(struct perf_event *event)
+{
+	if (persistent_pmu.type != event->attr.type)
+		return -ENOENT;
+
+	/* Not a persistent event. */
+	return -EFAULT;
+}
+
+static struct pmu persistent_pmu = {
+	.event_init	= persistent_pmu_init,
+	.attr_groups	= persistent_attr_groups,
+};
+
 void __init persistent_events_init(void)
 {
 	int cpu;
 
+	perf_pmu_register(&persistent_pmu, "persistent", -1);
+
 	for_each_possible_cpu(cpu) {
 		INIT_LIST_HEAD(&per_cpu(pers_events, cpu));
 		mutex_init(&per_cpu(pers_events_lock, cpu));
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 12/14] perf, persistent: Name each persistent event
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (10 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 11/14] perf, persistent: Implementing a persistent pmu Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-11 16:42 ` [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs Robert Richter
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

For later adding persistent events to sysfs we need a name for each
event. Adding a name to each persistent event.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c |  3 ++-
 include/linux/perf_event.h       |  4 ++--
 kernel/events/persistent.c       | 30 +++++++++++++++++++++++++-----
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d421937..833eb7a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1991,7 +1991,8 @@ int __init mcheck_init(void)
 
 int __init mcheck_init_tp(void)
 {
-	if (perf_add_persistent_event_by_id(event_mce_record.event.type)) {
+	if (perf_add_persistent_event_by_id(event_mce_record.name,
+					    event_mce_record.event.type)) {
 		pr_err("Error adding MCE persistent event.\n");
 		return -EINVAL;
 	}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index dc72c93..06b4357b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -751,7 +751,7 @@ extern void perf_event_disable(struct perf_event *event);
 extern int __perf_event_disable(void *info);
 extern void perf_event_task_tick(void);
 extern int perf_add_persistent_event(struct perf_event_attr *, unsigned);
-extern int perf_add_persistent_event_by_id(int id);
+extern int perf_add_persistent_event_by_id(char *name, int id);
 #else /* !CONFIG_PERF_EVENTS */
 static inline void
 perf_event_task_sched_in(struct task_struct *prev,
@@ -794,7 +794,7 @@ static inline int __perf_event_disable(void *info)			{ return -1; }
 static inline void perf_event_task_tick(void)				{ }
 static inline int perf_add_persistent_event(struct perf_event_attr *attr,
 					    unsigned nr_pages)		{ return -EINVAL; }
-static inline int perf_add_persistent_event_by_id(int id)		{ return -EINVAL; }
+static inline int perf_add_persistent_event_by_id(char *name, int id)	{ return -EINVAL; }
 #endif /* !CONFIG_PERF_EVENTS */
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 97c57c9..96201c1 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -14,6 +14,11 @@ struct pers_event_desc {
 	int fd;
 };
 
+struct pers_event {
+	char				*name;
+	struct perf_event_attr		attr;
+};
+
 static DEFINE_PER_CPU(struct list_head, pers_events);
 static DEFINE_PER_CPU(struct mutex, pers_events_lock);
 
@@ -132,14 +137,20 @@ unwind:
 	return PTR_ERR(event);
 }
 
-int perf_add_persistent_event_by_id(int id)
+int perf_add_persistent_event_by_id(char* name, int id)
 {
-	struct perf_event_attr *attr;
+	struct pers_event	*event;
+	struct perf_event_attr	*attr;
+	int ret = -ENOMEM;
 
-	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
-	if (!attr)
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event)
 		return -ENOMEM;
+	event->name = kstrdup(name, GFP_KERNEL);
+	if (!event->name)
+		goto fail;
 
+	attr = &event->attr;
 	attr->sample_period	= 1;
 	attr->wakeup_events	= 1;
 	attr->sample_type	= PERF_SAMPLE_RAW;
@@ -148,7 +159,16 @@ int perf_add_persistent_event_by_id(int id)
 	attr->type		= PERF_TYPE_TRACEPOINT;
 	attr->size		= sizeof(*attr);
 
-	return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
+	ret = perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
+	if (ret)
+		goto fail;
+
+	return 0;
+fail:
+	kfree(event->name);
+	kfree(event);
+
+	return ret;
 }
 
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (11 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 12/14] perf, persistent: Name each persistent event Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-14  2:36   ` Namhyung Kim
  2013-06-11 16:42 ` [PATCH v2 14/14] perf, persistent: Allow multiple users for an event Robert Richter
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

Expose persistent events in the system to userland using sysfs. Perf
tools are able to read existing pmu events from sysfs. Now we use a
persistent pmu as an event container containing all registered
persistent events of the system. This patch adds dynamically
registration of persistent events to sysfs. E.g. something like this:

 /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23

Perf tools need to support the attr<num> syntax that is added in a
separate patch set. With it we are able to run perf tool commands to
read persistent events, e.g.:

 # perf record -e persistent/mce_record/ sleep 10
 # perf top -e persistent/mce_record/

[ Document attr<index> syntax in sysfs ABI ]
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 .../testing/sysfs-bus-event_source-devices-format  | 43 ++++++++++++-----
 kernel/events/persistent.c                         | 55 +++++++++++++++++++++-
 2 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
index 77f47ff..47b7353 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format
@@ -1,13 +1,14 @@
-Where:		/sys/bus/event_source/devices/<dev>/format
+Where:		/sys/bus/event_source/devices/<pmu>/format/<name>
 Date:		January 2012
-Kernel Version: 3.3
+Kernel Version:	3.3
+		3.12	(added attr<index>:<bits>)
 Contact:	Jiri Olsa <jolsa@redhat.com>
-Description:
-		Attribute group to describe the magic bits that go into
-		perf_event_attr::config[012] for a particular pmu.
-		Each attribute of this group defines the 'hardware' bitmask
-		we want to export, so that userspace can deal with sane
-		name/value pairs.
+
+Description:	Define formats for bit ranges in perf_event_attr
+
+		Attribute group to describe the magic bits that go
+		into struct perf_event_attr for a particular pmu. Bit
+		range may be any bit mask of an u64 (bits 0 to 63).
 
 		Userspace must be prepared for the possibility that attributes
 		define overlapping bit ranges. For example:
@@ -15,6 +16,26 @@ Description:
 			attr2 = 'config:0-7'
 			attr3 = 'config:12-35'
 
-		Example: 'config1:1,6-10,44'
-		Defines contents of attribute that occupies bits 1,6-10,44 of
-		perf_event_attr::config1.
+		Syntax			Description
+
+		config[012]*:<bits>	Each attribute of this group
+					defines the 'hardware' bitmask
+					we want to export, so that
+					userspace can deal with sane
+					name/value pairs.
+
+		attr<index>:<bits>	Set any field of the event
+					attribute. The index is a
+					decimal number that specifies
+					the u64 value to be set within
+					struct perf_event_attr.
+
+		Examples:
+
+		'config1:1,6-10,44'	Defines contents of attribute
+					that occupies bits 1,6-10,44
+					of perf_event_attr::config1.
+
+		'attr5:23'		Define the persistent event
+					flag (bit 23 of the attribute
+					flags)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 96201c1..8be7c05 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -17,8 +17,10 @@ struct pers_event_desc {
 struct pers_event {
 	char				*name;
 	struct perf_event_attr		attr;
+	struct perf_pmu_events_attr	sysfs;
 };
 
+static struct pmu persistent_pmu;
 static DEFINE_PER_CPU(struct list_head, pers_events);
 static DEFINE_PER_CPU(struct mutex, pers_events_lock);
 
@@ -137,6 +139,8 @@ unwind:
 	return PTR_ERR(event);
 }
 
+static int pers_event_sysfs_register(struct pers_event *event);
+
 int perf_add_persistent_event_by_id(char* name, int id)
 {
 	struct pers_event	*event;
@@ -150,6 +154,8 @@ int perf_add_persistent_event_by_id(char* name, int id)
 	if (!event->name)
 		goto fail;
 
+	event->sysfs.id		= id;
+
 	attr = &event->attr;
 	attr->sample_period	= 1;
 	attr->wakeup_events	= 1;
@@ -163,6 +169,8 @@ int perf_add_persistent_event_by_id(char* name, int id)
 	if (ret)
 		goto fail;
 
+	pers_event_sysfs_register(event);
+
 	return 0;
 fail:
 	kfree(event->name);
@@ -207,12 +215,57 @@ static struct attribute_group persistent_format_group = {
 	.attrs = persistent_format_attrs,
 };
 
+#define MAX_EVENTS 16
+
+static struct attribute *persistent_events_attr[MAX_EVENTS + 1] = { };
+
+static struct attribute_group persistent_events_group = {
+	.name = "events",
+	.attrs = persistent_events_attr,
+};
+
 static const struct attribute_group *persistent_attr_groups[] = {
 	&persistent_format_group,
+	NULL, /* placeholder: &persistent_events_group */
 	NULL,
 };
+#define EVENTS_GROUP	(persistent_attr_groups[1])
 
-static struct pmu persistent_pmu;
+static ssize_t pers_event_sysfs_show(struct device *dev,
+				struct device_attribute *__attr, char *page)
+{
+	struct perf_pmu_events_attr *attr =
+		container_of(__attr, struct perf_pmu_events_attr, attr);
+	return sprintf(page, "persistent,config=%lld",
+		(unsigned long long)attr->id);
+}
+
+static int pers_event_sysfs_register(struct pers_event *event)
+{
+	struct device_attribute *attr = &event->sysfs.attr;
+	int idx;
+
+	*attr = (struct device_attribute)__ATTR(, 0444, pers_event_sysfs_show,
+						NULL);
+	attr->attr.name = event->name;
+
+	/* add sysfs attr to events: */
+	for (idx = 0; idx < MAX_EVENTS; idx++) {
+		if (!cmpxchg(persistent_events_attr + idx, NULL, &attr->attr))
+			break;
+	}
+
+	if (idx >= MAX_EVENTS)
+		return -ENOSPC;
+	if (!idx)
+		EVENTS_GROUP = &persistent_events_group;
+	if (!persistent_pmu.dev)
+		return 0;	/* sysfs not yet initialized */
+	if (idx)
+		return sysfs_update_group(&persistent_pmu.dev->kobj,
+					EVENTS_GROUP);
+	return sysfs_create_group(&persistent_pmu.dev->kobj, EVENTS_GROUP);
+}
 
 static int persistent_pmu_init(struct perf_event *event)
 {
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 14/14] perf, persistent: Allow multiple users for an event
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (12 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs Robert Richter
@ 2013-06-11 16:42 ` Robert Richter
  2013-06-24 10:08 ` [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Peter Zijlstra
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-11 16:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Robert Richter, Robert Richter

From: Robert Richter <robert.richter@linaro.org>

Usually a fd close leads to the release of the event too. For
persistent events this is different as the events should be
permanently enabled in the system. Using reference counting to avoid
releasing an event during a fd close. This also allows it to have
multiple users (open file descriptors) for a single persistent event.

While at this, we don't need desc->fd any longer. The fd is attached
to a task and reference counting keeps the event. Removing desc->fd.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 46 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c
index 8be7c05..dd20b55 100644
--- a/kernel/events/persistent.c
+++ b/kernel/events/persistent.c
@@ -11,7 +11,6 @@
 struct pers_event_desc {
 	struct perf_event *event;
 	struct list_head plist;
-	int fd;
 };
 
 struct pers_event {
@@ -88,6 +87,18 @@ out:
 	return event;
 }
 
+static void detach_persistent_event(struct pers_event_desc *desc)
+{
+	list_del(&desc->plist);
+	kfree(desc);
+}
+
+static void release_persistent_event(struct perf_event *event)
+{
+	perf_event_disable(event);
+	perf_event_release_kernel(event);
+}
+
 static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 {
 	struct pers_event_desc *desc;
@@ -100,12 +111,14 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
 		goto out;
 	event = desc->event;
 
-	list_del(&desc->plist);
-
-	perf_event_disable(event);
-	perf_event_release_kernel(event);
-	put_unused_fd(desc->fd);
-	kfree(desc);
+	/*
+	 * We primarily want to remove desc from the list. If there
+	 * are no open files, the refcount is 0 and we need to release
+	 * the event too.
+	 */
+	detach_persistent_event(desc);
+	if (atomic_long_dec_and_test(&event->refcount))
+		release_persistent_event(event);
 out:
 	mutex_unlock(&per_cpu(pers_events_lock, cpu));
 }
@@ -182,6 +195,7 @@ fail:
 int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 {
 	struct pers_event_desc *desc;
+	struct perf_event *event;
 	int event_fd = -ENODEV;
 
 	if (cpu >= (unsigned)nr_cpu_ids)
@@ -190,13 +204,25 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
 	mutex_lock(&per_cpu(pers_events_lock, cpu));
 
 	desc = get_persistent_event(cpu, attr);
-	if (!desc)
+
+	/* Increment refcount to keep event on put_event() */
+	if (!desc || !atomic_long_inc_not_zero(&desc->event->refcount))
 		goto out;
 
 	event_fd = anon_inode_getfd("[pers_event]", &perf_fops,
 				desc->event, O_RDONLY);
-	if (event_fd >= 0)
-		desc->fd = event_fd;
+
+	if (event_fd < 0) {
+		event = desc->event;
+		if (WARN_ON(atomic_long_dec_and_test(&event->refcount))) {
+			/*
+			 * May not happen since decrementing refcount is
+			 * protected by pers_events_lock.
+			 */
+			detach_persistent_event(desc);
+			release_persistent_event(event);
+		}
+	}
 out:
 	mutex_unlock(&per_cpu(pers_events_lock, cpu));
 
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-11 16:42 ` [PATCH v2 03/14] perf: Add persistent event facilities Robert Richter
@ 2013-06-14  2:15   ` Namhyung Kim
  2013-06-14  7:20     ` Robert Richter
  2013-06-24  9:32   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 54+ messages in thread
From: Namhyung Kim @ 2013-06-14  2:15 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel,
	Borislav Petkov, Robert Richter

Hi Robert and Boris,

On Tue, 11 Jun 2013 18:42:29 +0200, Robert Richter wrote:
> From: Borislav Petkov <bp@alien8.de>
>
> Add a barebones implementation for registering persistent events with
> perf. For that, we don't destroy the buffers when they're unmapped;
> also, we map them read-only so that multiple agents can access them.
>
> Also, we allocate the event buffers at event init time and not at mmap
> time so that we can log samples into them regardless of whether there
> are readers in userspace or not.
>
> Changes made by Robert Richter <robert.richter@linaro.org>:
>
> * Fixing wrongly determined attribute size.
>
> * The default buffer size used to setup event buffers with perf tools
>   is 512k. Using the same buffer size for persistent events. This also
>   avoids failed mmap calls due to different buffer sizes.
>
> * Improve error reporting.
>
> * Returning -ENODEV if no file descriptor is found. An error code of
>   -1 (-EPERM) is misleading in this case.
>
> * Adding cpu check to perf_get_persistent_event_fd()
>
> [ make percpu variable static ]
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> [ Fix attr size ]
> [ Setting default buffer size to 512k as in perf tools ]
> [ Print error code on failure when adding events ]
> [ Return resonable error code ]
> [ Adding cpu check to perf_get_persistent_event_fd() ]
> Reported-by: Jiri Olsa <jolsa@redhat.com>
> Signed-off-by: Robert Richter <robert.richter@linaro.org>
> Signed-off-by: Robert Richter <rric@kernel.org>
[SNIP]
> +int perf_add_persistent_event_by_id(int id)
> +{
> +	struct perf_event_attr *attr;
> +
> +	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
> +	if (!attr)
> +		return -ENOMEM;
> +
> +	attr->sample_period	= 1;
> +	attr->wakeup_events	= 1;
> +	attr->sample_type	= PERF_SAMPLE_RAW;
> +	attr->persistent	= 1;
> +	attr->config		= id;
> +	attr->type		= PERF_TYPE_TRACEPOINT;
> +	attr->size		= sizeof(*attr);
> +
> +	return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
> +}
> +
> +int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
> +{
> +	struct pers_event_desc *desc;
> +
> +	if (cpu >= (unsigned)nr_cpu_ids)
> +		return -EINVAL;
> +
> +	list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
> +		if (desc->attr->config == attr->config)
> +			return __alloc_persistent_event_fd(desc);
> +

So it only supports tracepoint events.  Don't we need to add other types
of event?

Thanks,
Namhyung


> +	return -ENODEV;
> +}
> +
> +
> +void __init persistent_events_init(void)
> +{
> +	int i;
> +
> +	for_each_possible_cpu(i)
> +		INIT_LIST_HEAD(&per_cpu(pers_events, i));
> +}

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs
  2013-06-11 16:42 ` [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs Robert Richter
@ 2013-06-14  2:36   ` Namhyung Kim
  2013-06-14  8:57     ` Robert Richter
  0 siblings, 1 reply; 54+ messages in thread
From: Namhyung Kim @ 2013-06-14  2:36 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel,
	Robert Richter

On Tue, 11 Jun 2013 18:42:39 +0200, Robert Richter wrote:
> From: Robert Richter <robert.richter@linaro.org>
>
> Expose persistent events in the system to userland using sysfs. Perf
> tools are able to read existing pmu events from sysfs. Now we use a
> persistent pmu as an event container containing all registered
> persistent events of the system. This patch adds dynamically
> registration of persistent events to sysfs. E.g. something like this:
>
>  /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
>  /sys/bus/event_source/devices/persistent/format/persistent:attr5:23
>
> Perf tools need to support the attr<num> syntax that is added in a
> separate patch set. With it we are able to run perf tool commands to
> read persistent events, e.g.:
>
>  # perf record -e persistent/mce_record/ sleep 10
>  # perf top -e persistent/mce_record/
>
> [ Document attr<index> syntax in sysfs ABI ]
> Reported-by: Jiri Olsa <jolsa@redhat.com>
> Signed-off-by: Robert Richter <robert.richter@linaro.org>
> Signed-off-by: Robert Richter <rric@kernel.org>
> ---
[SNIP]
> +static int pers_event_sysfs_register(struct pers_event *event)
> +{
> +	struct device_attribute *attr = &event->sysfs.attr;
> +	int idx;
> +
> +	*attr = (struct device_attribute)__ATTR(, 0444, pers_event_sysfs_show,
> +						NULL);
> +	attr->attr.name = event->name;

When I added another persistent event with this API, I got an WARNING
from lockdep like this:

[    0.432506] BUG: key ffff88040946f140 not in .data!
[    0.432581] ------------[ cut here ]------------
[    0.432656] WARNING: at /home/namhyung/project/linux/kernel/lockdep.c:2987 lockdep_init_map+0x53d/0x570()
[    0.432763] DEBUG_LOCKS_WARN_ON(1)


I guess we need the following line here:

	sysfs_attr_init(&attr->attr);


Thanks,
Namhyung


> +
> +	/* add sysfs attr to events: */
> +	for (idx = 0; idx < MAX_EVENTS; idx++) {
> +		if (!cmpxchg(persistent_events_attr + idx, NULL, &attr->attr))
> +			break;
> +	}
> +
> +	if (idx >= MAX_EVENTS)
> +		return -ENOSPC;
> +	if (!idx)
> +		EVENTS_GROUP = &persistent_events_group;
> +	if (!persistent_pmu.dev)
> +		return 0;	/* sysfs not yet initialized */
> +	if (idx)
> +		return sysfs_update_group(&persistent_pmu.dev->kobj,
> +					EVENTS_GROUP);
> +	return sysfs_create_group(&persistent_pmu.dev->kobj, EVENTS_GROUP);
> +}
>  
>  static int persistent_pmu_init(struct perf_event *event)
>  {

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-14  2:15   ` Namhyung Kim
@ 2013-06-14  7:20     ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-14  7:20 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel,
	Borislav Petkov

On 14.06.13 11:15:13, Namhyung Kim wrote:
> > +int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
> > +{
> > +   struct pers_event_desc *desc;
> > +
> > +   if (cpu >= (unsigned)nr_cpu_ids)
> > +           return -EINVAL;  
> > +
> > +   list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist)
> > +           if (desc->attr->config == attr->config)
> > +                   return __alloc_persistent_event_fd(desc);
> > +
>
> So it only supports tracepoint events.  Don't we need to add other types
> of event?

We have choosen tracepoints as initial implementation since they are
our first use case and its specifiers are unique already.

So this is for the initial implementation only. We can easily
implement support for other event types later. The only thing we need
to implement for this are unique identifiers for each persistent
event. Since all events are described in sysfs this change can be done
later without changing ABI or breaking userland.

The support for all event types is definitely on our list. We need
this also for a later implemention of registering all system wide
events as persistent events. This allows sharing events between
processes and is one of Ingos use cases to enable persistent events at
runtime.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs
  2013-06-14  2:36   ` Namhyung Kim
@ 2013-06-14  8:57     ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-14  8:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

On 14.06.13 11:36:00, Namhyung Kim wrote:
> > +static int pers_event_sysfs_register(struct pers_event *event)
> > +{
> > +	struct device_attribute *attr = &event->sysfs.attr;
> > +	int idx;
> > +
> > +	*attr = (struct device_attribute)__ATTR(, 0444, pers_event_sysfs_show,
> > +						NULL);
> > +	attr->attr.name = event->name;
> 
> When I added another persistent event with this API, I got an WARNING
> from lockdep like this:
> 
> [    0.432506] BUG: key ffff88040946f140 not in .data!
> [    0.432581] ------------[ cut here ]------------
> [    0.432656] WARNING: at /home/namhyung/project/linux/kernel/lockdep.c:2987 lockdep_init_map+0x53d/0x570()
> [    0.432763] DEBUG_LOCKS_WARN_ON(1)
> 
> 
> I guess we need the following line here:
> 
> 	sysfs_attr_init(&attr->attr);

Yes, added your change. Thanks Namhyung for reviewing and testing.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/14] perf: Add persistent events
  2013-06-11 16:42 ` [PATCH v2 02/14] perf: Add persistent events Robert Richter
@ 2013-06-24  9:28   ` Peter Zijlstra
  2013-06-24 19:24     ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24  9:28 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel, Borislav Petkov, Robert Richter

On Tue, Jun 11, 2013 at 06:42:28PM +0200, Robert Richter wrote:
> From: Borislav Petkov <bp@alien8.de>
> 
> Add the needed pieces for persistent events which makes them
> process-agnostic. Also, make their buffers read-only when mmaping them
> from userspace.
> 
> While at it, do not return a void function, as caught by Fengguang's
> build robot.
> 
> Changes made by Robert Richter <robert.richter@linaro.org>:
> 
> * mmap should return EACCES error if fd can not be opened writable.
>   This error code also helps userland to map buffers readonly on
>   failure.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> [ Return -EACCES if mapped buffers must be readonly ]
> Signed-off-by: Robert Richter <robert.richter@linaro.org>
> Signed-off-by: Robert Richter <rric@kernel.org>
> ---
>  include/uapi/linux/perf_event.h |  3 ++-
>  kernel/events/core.c            | 10 +++++++++-
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index fb104e5..6032361 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -272,8 +272,9 @@ struct perf_event_attr {
>  
>  				exclude_callchain_kernel : 1, /* exclude kernel callchains */
>  				exclude_callchain_user   : 1, /* exclude user callchains */
> +				persistent     :  1, /* always-on event */
>  
> -				__reserved_1   : 41;
> +				__reserved_1   : 40;
>  
>  	union {
>  		__u32		wakeup_events;	  /* wakeup every n events */
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b790ab6..a13e457 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3713,6 +3713,11 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>  {
>  	struct perf_event *event = vma->vm_file->private_data;
>  
> +	if (event->attr.persistent) {
> +		atomic_dec(&event->mmap_count);
> +		return;
> +	}
> +
>  	if (atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex)) {
>  		unsigned long size = perf_data_size(event->rb);
>  		struct user_struct *user = event->mmap_user;
> @@ -3756,9 +3761,12 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
>  	if (event->cpu == -1 && event->attr.inherit)
>  		return -EINVAL;
>  
> -	if (!(vma->vm_flags & VM_SHARED))
> +	if (!(vma->vm_flags & VM_SHARED) && !event->attr.persistent)
>  		return -EINVAL;
>  
> +	if (event->attr.persistent && (vma->vm_flags & VM_WRITE))
> +		return -EACCES;
> +
>  	vma_size = vma->vm_end - vma->vm_start;
>  	nr_pages = (vma_size / PAGE_SIZE) - 1;


I find this patch incomplete for its asymmetric; there a dec, but not an
implied inc.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-11 16:42 ` [PATCH v2 03/14] perf: Add persistent event facilities Robert Richter
  2013-06-14  2:15   ` Namhyung Kim
@ 2013-06-24  9:32   ` Peter Zijlstra
  2013-06-25  8:47     ` Robert Richter
  2013-06-24  9:44   ` Peter Zijlstra
  2013-06-24  9:48   ` Peter Zijlstra
  3 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24  9:32 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel, Borislav Petkov, Robert Richter

On Tue, Jun 11, 2013 at 06:42:29PM +0200, Robert Richter wrote:
> +static struct perf_event *
> +add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
> +			    unsigned nr_pages)
> +{
> +	struct perf_event *event = ERR_PTR(-ENOMEM);
> +	struct pers_event_desc *desc;
> +	struct ring_buffer *buf;
> +
> +	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
> +	if (!desc)
> +		goto out;
> +
> +	buf = rb_alloc(nr_pages, 0, cpu, 0);
> +	if (!buf)
> +		goto err_rb;
> +
> +	event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL);
> +	if (IS_ERR(event))
> +		goto err_event;
> +
> +	rcu_assign_pointer(event->rb, buf);
> +
> +	desc->event = event;
> +	desc->attr  = attr;
> +
> +	INIT_LIST_HEAD(&desc->plist);
> +	list_add_tail(&desc->plist, &per_cpu(pers_events, cpu));
> +
> +	/* All workie, enable event now */
> +	perf_event_enable(event);
> +
> +	goto out;
> +
> + err_event:
> +	rb_put(buf);
> +
> + err_rb:
> +	kfree(desc);
> +
> + out:
> +	return event;
> +}

I generally disapprove of indented labels. None of the perf code has
that and the tools are easy to 'fix'.

My quiltrc contains:

QUILT_DIFF_OPTS="-F ^[[:alpha:]\$_].*[^:]\$"

my .gitconfig contains:

[diff "default"]
                xfuncname = "^[[:alpha:]$_].*[^:]$"

Both avoid diff thinking labels are function names.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-11 16:42 ` [PATCH v2 03/14] perf: Add persistent event facilities Robert Richter
  2013-06-14  2:15   ` Namhyung Kim
  2013-06-24  9:32   ` Peter Zijlstra
@ 2013-06-24  9:44   ` Peter Zijlstra
  2013-06-25  8:41     ` Robert Richter
  2013-06-24  9:48   ` Peter Zijlstra
  3 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24  9:44 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel, Borislav Petkov, Robert Richter

> +static void del_persistent_event(int cpu, struct perf_event_attr *attr)
> +{
> +	struct pers_event_desc *desc, *tmp;
> +	struct perf_event *event = NULL;
> +
> +	list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
> +		if (desc->attr->config == attr->config) {

attr->config might not be enough; you might also want to compare type,
config1, config2, and possible bp_type as well; although I think we want
to avoid having hwbp style persistent events, but then what do I know.

> +			event = desc->event;
> +			break;
> +		}
> +	}
> +
> +	if (!event)
> +		return;
> +
> +	list_del(&desc->plist);
> +
> +	perf_event_disable(event);
> +	if (event->rb) {
> +		rb_put(event->rb);
> +		rcu_assign_pointer(event->rb, NULL);
> +	}
> +
> +	perf_event_release_kernel(event);
> +	put_unused_fd(desc->fd);
> +	kfree(desc->attr);
> +	kfree(desc);
> +}

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-11 16:42 ` [PATCH v2 03/14] perf: Add persistent event facilities Robert Richter
                     ` (2 preceding siblings ...)
  2013-06-24  9:44   ` Peter Zijlstra
@ 2013-06-24  9:48   ` Peter Zijlstra
  2013-06-24 19:26     ` Borislav Petkov
  3 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24  9:48 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel, Borislav Petkov, Robert Richter

On Tue, Jun 11, 2013 at 06:42:29PM +0200, Robert Richter wrote:
> +int perf_add_persistent_event_by_id(int id)
> +{
> +	struct perf_event_attr *attr;
> +
> +	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
> +	if (!attr)
> +		return -ENOMEM;
> +
> +	attr->sample_period	= 1;
> +	attr->wakeup_events	= 1;
> +	attr->sample_type	= PERF_SAMPLE_RAW;
> +	attr->persistent	= 1;
> +	attr->config		= id;
> +	attr->type		= PERF_TYPE_TRACEPOINT;
> +	attr->size		= sizeof(*attr);
> +
> +	return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
> +}

I would call this what it is: perf_add_persistent_tracepoint(), or so
:-)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (13 preceding siblings ...)
  2013-06-11 16:42 ` [PATCH v2 14/14] perf, persistent: Allow multiple users for an event Robert Richter
@ 2013-06-24 10:08 ` Peter Zijlstra
  2013-06-25 10:46   ` Robert Richter
  2013-06-24 10:22 ` Peter Zijlstra
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24 10:08 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On Tue, Jun 11, 2013 at 06:42:26PM +0200, Robert Richter wrote:
> This patch set implements out of the box support of perf tools for
> persistent events. For this the kernel must provide necessary
> information about existing persistent events via sysfs to userland.
> Persistent events are provided by the kernel with readonly event
> buffers. To allow the independent usage of the event buffers by any
> process without limiting other processes, multiple users of a single
> event must be handled too.
> 
> The basic concept is to use a pmu as an event container for persistent
> events. The pmu registers events in sysfs and provides format and
> event information for the userland. The persistent event framework
> requires to add events to the pmu dynamically.
> 
> With the information in sysfs userland knows about how to setup the
> perf_event attribute of a persistent event. Since a persistent event
> always has the persistent flag set, a way is needed to express this in
> sysfs. A new syntax is used for this. With 'attr<num>:<mask>' any bit
> in the attribute structure may be set in a similar way as using
> 'config<num>', but <num> is an index that points to the u64 value to
> change within the attribute.
> 
> For persistent events the persistent flag (bit 23 of flag field in
> struct perf_event_attr) needs to be set which is expressed in sysfs
> with "attr5:23". E.g. the mce_record event is described in sysfs as
> follows:
> 
>  /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
>  /sys/bus/event_source/devices/persistent/format/persistent:attr5:23
> 
> Note that perf tools need to support the 'attr<num>' syntax that is
> added in a separate patch set. With it we are able to run perf tool
> commands to read persistent events, e.g.:
> 
>  # perf record -e persistent/mce_record/ sleep 10
>  # perf top -e persistent/mce_record/
> 
> In general the new syntax is flexible to describe with sysfs any event
> to be setup by perf tools.
> 
> First patches contain also fixes and reworks made after reviewing
> code. Could be applied separately.
> 
> Patches base on Boris' patches which I have rebased to latest
> tip/perf/core. All patches can be found here:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent-v2
> 

OK, so I gave up on reading the patches :/ and went and looked at the git
tree. There's just too much needless churn in the patches.

As already said somewhere else; get_persistent_event() -- although that
wasn't the name at the time -- needs a better match function.


+ int perf_add_persistent_event(struct perf_event_attr *attr, unsigned nr_pages)
+ {
+ 	struct perf_event *event;
+ 	int i;
+ 
+ 	for_each_possible_cpu(i) {
+ 		event = add_persistent_event_on_cpu(i, attr, nr_pages);
+ 		if (IS_ERR(event))
+ 			goto unwind;
+ 	}
+ 	return 0;
+ 
+ unwind:
+ 	pr_err("%s: Error adding persistent event on cpu %d: %ld\n",
+ 		__func__, i, PTR_ERR(event));
+ 
+ 	while (--i >= 0)
+ 		del_persistent_event(i, attr);
+ 
+ 	return PTR_ERR(event);
+ }

This assumes cpu_possible_mask doesn't have holes in; I'm fairly sure we cannot
assume such.

Furthermore; while the function is exposed and thus looks like a generic usable
thing; however it looks to 'forget' making a sysfs entry, making it look like
something incomplete.

Only the ill named perf_add_persistent_event_by_id() function looks
something that's usable outside of this.


What's with MAX_EVENTS ?




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (14 preceding siblings ...)
  2013-06-24 10:08 ` [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Peter Zijlstra
@ 2013-06-24 10:22 ` Peter Zijlstra
  2013-06-25 16:56   ` Robert Richter
  2013-06-24 10:24 ` Peter Zijlstra
  2013-06-24 15:25 ` Peter Zijlstra
  17 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24 10:22 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On Tue, Jun 11, 2013 at 06:42:26PM +0200, Robert Richter wrote:
> Note that perf tools need to support the 'attr<num>' syntax that is
> added in a separate patch set. With it we are able to run perf tool
> commands to read persistent events, e.g.:

where is this patch? I can't find it.

I also find attr<num>:<bit> a bit weird. So far we've used
<perf_event_attr::fieldname>:<bits>, so while initializing anonymous
unions is a bit difficult with 'older' GCCs and we cannot actually do:

struct perf_event_attr {
	...
	union {
		u64 flags;
		u64 disabled : 1,
		    ...
		    __reserved_1 : X;
	}
	...
};

We could fake it in userspace by allowing things like: flags:23. It
would not be a much worse hack than attr<num>:<bit> I suppose.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (15 preceding siblings ...)
  2013-06-24 10:22 ` Peter Zijlstra
@ 2013-06-24 10:24 ` Peter Zijlstra
  2013-06-24 15:25 ` Peter Zijlstra
  17 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24 10:24 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On Tue, Jun 11, 2013 at 06:42:26PM +0200, Robert Richter wrote:
> The basic concept is to use a pmu as an event container for persistent
> events. The pmu registers events in sysfs and provides format and
> event information for the userland. The persistent event framework
> requires to add events to the pmu dynamically.
> 
> With the information in sysfs userland knows about how to setup the
> perf_event attribute of a persistent event. Since a persistent event
> always has the persistent flag set, a way is needed to express this in
> sysfs. A new syntax is used for this. With 'attr<num>:<mask>' any bit
> in the attribute structure may be set in a similar way as using
> 'config<num>', but <num> is an index that points to the u64 value to
> change within the attribute.
> 
> For persistent events the persistent flag (bit 23 of flag field in
> struct perf_event_attr) needs to be set which is expressed in sysfs
> with "attr5:23". E.g. the mce_record event is described in sysfs as
> follows:
> 
>  /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
>  /sys/bus/event_source/devices/persistent/format/persistent:attr5:23
> 

Ok so it took me a little to 'get' this PMU 'abuse', but I suppose I can
go along with it ;-)

On the specifics though; it again seems to hard assume persistent events
are for tracepoints only -- either make this very explicit or remove it.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
                   ` (16 preceding siblings ...)
  2013-06-24 10:24 ` Peter Zijlstra
@ 2013-06-24 15:25 ` Peter Zijlstra
  2013-06-24 19:45   ` Ingo Molnar
  17 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-24 15:25 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On Tue, Jun 11, 2013 at 06:42:26PM +0200, Robert Richter wrote:

Oh and what Boris and Ingo said; persistent events should 'persist' and
not be tied to particular processes. I'm not sure about the entire
eventfs thing; but the proposed sysfs thing should definitely work for
now.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/14] perf: Add persistent events
  2013-06-24  9:28   ` Peter Zijlstra
@ 2013-06-24 19:24     ` Borislav Petkov
  2013-06-25  8:46       ` Robert Richter
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-06-24 19:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov, Robert Richter

On Mon, Jun 24, 2013 at 11:28:07AM +0200, Peter Zijlstra wrote:
> I find this patch incomplete for its asymmetric; there a dec, but not an
> implied inc.

There's one in perf_mmap_open.

Basically the idea was to pull up the ->mmap_count decrement (haha,
sounds like excrement :-)) and not deallocate the ring buffers for a
persistent event.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-24  9:48   ` Peter Zijlstra
@ 2013-06-24 19:26     ` Borislav Petkov
  2013-06-25  7:44       ` Peter Zijlstra
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-06-24 19:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov, Robert Richter

On Mon, Jun 24, 2013 at 11:48:30AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 11, 2013 at 06:42:29PM +0200, Robert Richter wrote:
> > +int perf_add_persistent_event_by_id(int id)
> > +{
> > +	struct perf_event_attr *attr;
> > +
> > +	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
> > +	if (!attr)
> > +		return -ENOMEM;
> > +
> > +	attr->sample_period	= 1;
> > +	attr->wakeup_events	= 1;
> > +	attr->sample_type	= PERF_SAMPLE_RAW;
> > +	attr->persistent	= 1;
> > +	attr->config		= id;
> > +	attr->type		= PERF_TYPE_TRACEPOINT;
> > +	attr->size		= sizeof(*attr);
> > +
> > +	return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
> > +}
> 
> I would call this what it is: perf_add_persistent_tracepoint(), or so
> :-)

While we're at it, can we agree on a shortened variant for "persistent"
- it is too much to type and makes function names uglily¹ long.


¹ look at me making up adverbs.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-24 15:25 ` Peter Zijlstra
@ 2013-06-24 19:45   ` Ingo Molnar
  2013-06-25 17:57     ` Robert Richter
  0 siblings, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2013-06-24 19:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Borislav Petkov, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Jun 11, 2013 at 06:42:26PM +0200, Robert Richter wrote:
> 
> Oh and what Boris and Ingo said; persistent events should 'persist' and
> not be tied to particular processes. I'm not sure about the entire
> eventfs thing; but the proposed sysfs thing should definitely work for
> now.

I'm fine with a sysfs approach as well, as long as it's correct and 
obvious enough to use.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-24 19:26     ` Borislav Petkov
@ 2013-06-25  7:44       ` Peter Zijlstra
  2013-06-25  9:24         ` Robert Richter
  0 siblings, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2013-06-25  7:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Robert Richter, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov, Robert Richter

On Mon, Jun 24, 2013 at 09:26:16PM +0200, Borislav Petkov wrote:
> On Mon, Jun 24, 2013 at 11:48:30AM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 11, 2013 at 06:42:29PM +0200, Robert Richter wrote:
> > > +int perf_add_persistent_event_by_id(int id)
> > > +{
> > > +	struct perf_event_attr *attr;
> > > +
> > > +	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
> > > +	if (!attr)
> > > +		return -ENOMEM;
> > > +
> > > +	attr->sample_period	= 1;
> > > +	attr->wakeup_events	= 1;
> > > +	attr->sample_type	= PERF_SAMPLE_RAW;
> > > +	attr->persistent	= 1;
> > > +	attr->config		= id;
> > > +	attr->type		= PERF_TYPE_TRACEPOINT;
> > > +	attr->size		= sizeof(*attr);
> > > +
> > > +	return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES);
> > > +}
> > 
> > I would call this what it is: perf_add_persistent_tracepoint(), or so
> > :-)
> 
> While we're at it, can we agree on a shortened variant for "persistent"
> - it is too much to type and makes function names uglily¹ long.

Elsewhere in this series you use 'pers' to shorten things; it reads a
bit odd to me because 'pers' is the Dutch word for press (both meanings
transfer) but that is just something I'll have to live with isn't it ;-)

As for tracepoint, it seems common to shorten that to tp; which always
reminds me of toilet paper, but I suppose more people suffer from that.

Yielding: perf_add_pers_tp() 

which I read as adding pressed toilet paper.. a well :-)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-24  9:44   ` Peter Zijlstra
@ 2013-06-25  8:41     ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-25  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel, Borislav Petkov

On 24.06.13 11:44:58, Peter Zijlstra wrote:
> > +static void del_persistent_event(int cpu, struct perf_event_attr *attr)
> > +{
> > +	struct pers_event_desc *desc, *tmp;
> > +	struct perf_event *event = NULL;
> > +
> > +	list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) {
> > +		if (desc->attr->config == attr->config) {
> 
> attr->config might not be enough; you might also want to compare type,
> config1, config2, and possible bp_type as well; although I think we want
> to avoid having hwbp style persistent events, but then what do I know.

[ I would call this what it is: perf_add_persistent_tracepoint(), or
so :-) ]

We chosed a tracepoint-only implementation as a first try and since
this is our use case. The goal is to have support for any event type.

I also was thinking of comparing the whole attr structure. But since
attr are similar (or almost identical) only you can't compare each
member of attr and need exceptions (e.g. cpu number). These exceptions
are ugly and depending on the specific use case.

I rather tend to assign unique identifiers to each persistent event
and then use attr->config to select the event, e.g.:

		if (desc->unique_id == attr->config) ...

Tracepoints have this unique id in it already, thus tp only first.

The change can be made without breaking ABI, since the event setup is
described in sysfs.

-Robert

> 
> > +			event = desc->event;
> > +			break;
> > +		}
> > +	}

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 02/14] perf: Add persistent events
  2013-06-24 19:24     ` Borislav Petkov
@ 2013-06-25  8:46       ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-25  8:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov

On 24.06.13 21:24:04, Borislav Petkov wrote:
> On Mon, Jun 24, 2013 at 11:28:07AM +0200, Peter Zijlstra wrote:
> > I find this patch incomplete for its asymmetric; there a dec, but not an
> > implied inc.
> 
> There's one in perf_mmap_open.
> 
> Basically the idea was to pull up the ->mmap_count decrement (haha,
> sounds like excrement :-)) and not deallocate the ring buffers for a
> persistent event.

The asymmetry comes from rb_alloc() in add_persistent_event_on_cpu()
which is usually implemented in the mmap functions. We might better
improve functions to attach/detach events and buffers for more
symmetric code.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-24  9:32   ` Peter Zijlstra
@ 2013-06-25  8:47     ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-25  8:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel, Borislav Petkov

On 24.06.13 11:32:04, Peter Zijlstra wrote:
> I generally disapprove of indented labels. None of the perf code has
> that and the tools are easy to 'fix'.
> 
> My quiltrc contains:
> 
> QUILT_DIFF_OPTS="-F ^[[:alpha:]\$_].*[^:]\$"
> 
> my .gitconfig contains:
> 
> [diff "default"]
>                 xfuncname = "^[[:alpha:]$_].*[^:]$"
> 
> Both avoid diff thinking labels are function names.

Right, we will use your coding style. ;)

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-25  7:44       ` Peter Zijlstra
@ 2013-06-25  9:24         ` Robert Richter
  2013-06-25  9:37           ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-25  9:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel, Borislav Petkov

On 25.06.13 09:44:01, Peter Zijlstra wrote:
> Elsewhere in this series you use 'pers' to shorten things; it reads a
> bit odd to me because 'pers' is the Dutch word for press (both meanings
> transfer) but that is just something I'll have to live with isn't it ;-)
> 
> As for tracepoint, it seems common to shorten that to tp; which always
> reminds me of toilet paper, but I suppose more people suffer from that.
> 
> Yielding: perf_add_pers_tp() 
> 
> which I read as adding pressed toilet paper.. a well :-)

Better we avoid this, don't wonat you misread the code. ;)

I also see 'pers_' not as an optimum since it could be mixed-up easily
with 'perf_'. Maybe we take 'persist_' instead?

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-25  9:24         ` Robert Richter
@ 2013-06-25  9:37           ` Borislav Petkov
  2013-06-25 10:51             ` Robert Richter
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-06-25  9:37 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov

On Tue, Jun 25, 2013 at 11:24:39AM +0200, Robert Richter wrote:
> On 25.06.13 09:44:01, Peter Zijlstra wrote:
> > Elsewhere in this series you use 'pers' to shorten things; it reads a
> > bit odd to me because 'pers' is the Dutch word for press (both meanings
> > transfer) but that is just something I'll have to live with isn't it ;-)
> > 
> > As for tracepoint, it seems common to shorten that to tp; which always
> > reminds me of toilet paper, but I suppose more people suffer from that.
> > 
> > Yielding: perf_add_pers_tp() 
> > 
> > which I read as adding pressed toilet paper.. a well :-)
> 
> Better we avoid this, don't wonat you misread the code. ;)
> 
> I also see 'pers_' not as an optimum since it could be mixed-up easily
> with 'perf_'. Maybe we take 'persist_' instead?

Yep, although it reads wrong:

perf_add_persist_event

Another not really optimal idea would be to call them sticky events:

perf_add_sticky_event.

Also funny.

Then there's

perf_add_prstnt_event

Also, not really great.

Hmmm.. I got nothing so far...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-24 10:08 ` [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Peter Zijlstra
@ 2013-06-25 10:46   ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-25 10:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On 24.06.13 12:08:19, Peter Zijlstra wrote:
> >  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent-v2
> > 
> 
> OK, so I gave up on reading the patches :/ and went and looked at the git
> tree. There's just too much needless churn in the patches.

Ok, we will rework the patches to have a final patch set hiding all
reworks. I was hoping we could work on a public branch since this
eases developing code with multiple people working on it. This also
shows how code matures. But this seems not to fly.

> + int perf_add_persistent_event(struct perf_event_attr *attr, unsigned nr_pages)
> + {
> + 	struct perf_event *event;
> + 	int i;
> + 
> + 	for_each_possible_cpu(i) {
> + 		event = add_persistent_event_on_cpu(i, attr, nr_pages);
> + 		if (IS_ERR(event))
> + 			goto unwind;
> + 	}
> + 	return 0;
> + 
> + unwind:
> + 	pr_err("%s: Error adding persistent event on cpu %d: %ld\n",
> + 		__func__, i, PTR_ERR(event));
> + 
> + 	while (--i >= 0)
> + 		del_persistent_event(i, attr);
> + 
> + 	return PTR_ERR(event);
> + }
> 
> This assumes cpu_possible_mask doesn't have holes in; I'm fairly sure we cannot
> assume such.

Right, needs to be fixed.

> Furthermore; while the function is exposed and thus looks like a generic usable
> thing; however it looks to 'forget' making a sysfs entry, making it look like
> something incomplete.
> 
> Only the ill named perf_add_persistent_event_by_id() function looks
> something that's usable outside of this.

Right, the name should be actually perf_add_persistent_tp() or so.

And, most of the function should be merged with
perf_add_persistent_event(). Maybe perf_add_persistent_event_by_id()
could be removed completly leaving perf_add_persistent_tp() as wrapper
for tracepoints.

> What's with MAX_EVENTS ?

It is the total number of sysfs entries that can currently be
registered dynamically. The problem with an unlimited event number is
that the sysfs attr list is a fixed array. We need to resize the array
which involves copying entries including locking etc. This was left
for later. ;)

Btw, another thing left for later is cpu hotplug code. I know this
needs to be implemented. But we want to see something running first.
Do you know what happens to system-wide events if a cpu is offline? It
looks like cpu_function_call() in perf_install_in_context() simply
ignores to install an event on an offline cpu. Not sure if it is later
enabled while bringing the cpu online. Quick looking at the code seems
to be not.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-25  9:37           ` Borislav Petkov
@ 2013-06-25 10:51             ` Robert Richter
  2013-06-25 15:29               ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-25 10:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov

On 25.06.13 11:37:06, Borislav Petkov wrote:
> On Tue, Jun 25, 2013 at 11:24:39AM +0200, Robert Richter wrote:
> > I also see 'pers_' not as an optimum since it could be mixed-up easily
> > with 'perf_'. Maybe we take 'persist_' instead?
> 
> Yep, although it reads wrong:
> 
> perf_add_persist_event

We could use persist_ as prefix for static functions and use the long
versions for the interface only.

But all this is a bit bikeshedding. I am sure we find something.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-25 10:51             ` Robert Richter
@ 2013-06-25 15:29               ` Borislav Petkov
  2013-06-25 16:14                 ` Robert Richter
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-06-25 15:29 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov

On Tue, Jun 25, 2013 at 12:51:35PM +0200, Robert Richter wrote:
> On 25.06.13 11:37:06, Borislav Petkov wrote:
> > On Tue, Jun 25, 2013 at 11:24:39AM +0200, Robert Richter wrote:
> > > I also see 'pers_' not as an optimum since it could be mixed-up easily
> > > with 'perf_'. Maybe we take 'persist_' instead?
> > 
> > Yep, although it reads wrong:
> > 
> > perf_add_persist_event
> 
> We could use persist_ as prefix for static functions and use the long
> versions for the interface only.
> 
> But all this is a bit bikeshedding. I am sure we find something.

How about

perf_add_pevent?

It is nice and short, although maybe too short but "pevent" is almost
like a special term and the name shows that it is a special type of
event, i.e. a p-event.

:-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 03/14] perf: Add persistent event facilities
  2013-06-25 15:29               ` Borislav Petkov
@ 2013-06-25 16:14                 ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-25 16:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel, Borislav Petkov

On 25.06.13 17:29:41, Borislav Petkov wrote:
> How about
> 
> perf_add_pevent?
> 
> It is nice and short, although maybe too short but "pevent" is almost
> like a special term and the name shows that it is a special type of
> event, i.e. a p-event.

Fine with me, though it's more for internal usage. There is also use
of it as name for event pointers. And, it is used in perf tools as
event parser structure. But otherwise its ok.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-24 10:22 ` Peter Zijlstra
@ 2013-06-25 16:56   ` Robert Richter
  0 siblings, 0 replies; 54+ messages in thread
From: Robert Richter @ 2013-06-25 16:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On 24.06.13 12:22:00, Peter Zijlstra wrote:
> On Tue, Jun 11, 2013 at 06:42:26PM +0200, Robert Richter wrote:
> > Note that perf tools need to support the 'attr<num>' syntax that is
> > added in a separate patch set. With it we are able to run perf tool
> > commands to read persistent events, e.g.:
> 
> where is this patch? I can't find it.

I posted it with that subject:

 [PATCH 0/4] perf tools: Persistent events, changes for perf tool integration

> I also find attr<num>:<bit> a bit weird. So far we've used
> <perf_event_attr::fieldname>:<bits>, so while initializing anonymous
> unions is a bit difficult with 'older' GCCs and we cannot actually do:
> 
> struct perf_event_attr {
> 	...
> 	union {
> 		u64 flags;
> 		u64 disabled : 1,
> 		    ...
> 		    __reserved_1 : X;
> 	}
> 	...
> };
> 
> We could fake it in userspace by allowing things like: flags:23. It
> would not be a much worse hack than attr<num>:<bit> I suppose.

A goal is to be able to entirely describe the complete attr structure
with sysfs. perf tools should work out-of-the-box for every kind of
event. For this particular case a 'flags' syntax would be ok, but not
if you like to fill in other attr members.

With attr<num>:<bit> you are flexible to put in *everything* into
attr. And for better readability there are abstractions as formats
like:

 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23

Not sure if the event parser handles recursion already, but we could
also support the 'flag' format you describe that way:

 /sys/bus/event_source/devices/persistent/format/flags:attr5:0-63
 /sys/bus/event_source/devices/persistent/format/persistent:flags:23

We could even list events for other pmus with small changes in the
kernel, e.g:

 /sys/bus/event_source/devices/persistent/format/type:attr0:0-31
 /sys/bus/event_source/devices/persistent/events/some_tracepoint:type=2,config=<tp_id>

The example is a bit weird. But in general we can translate struct
perf_event_attr to a string and vice versa. And perf tools understand
how to set up every bit of the event attr structure.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-24 19:45   ` Ingo Molnar
@ 2013-06-25 17:57     ` Robert Richter
  2013-06-25 19:16       ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-25 17:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Borislav Petkov, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On 24.06.13 21:45:11, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Oh and what Boris and Ingo said; persistent events should 'persist' and
> > not be tied to particular processes.

Fine with me too. But we need an answer how to create/release
persistent events.

It's easy to add them as in-kernel events in the kernel. But, how
could persistent events be created from userspace without inventing
too much new? As for standard events this could be done with the perf
syscall too. The event is created by a process and buffers are also
setup using the event's file descriptor. After the setup we detach the
event from the process. The event is persistent now. We could release
an event the same way by attaching it to a process an then closing it.

Detaching an event from a process basically means the event is not
released after the process is closing the event or is killed itself.
Sharing an event between processes is very important for this. Sharing
requires the event buffer to be accessed readonly and the event's ref
count needs to keep track of all attached processes. As long as a
process is using the event it is not released.

But what options there are to detach the event from all processes and
make it persistent?  We could just put the creating process to sleep
as long as the event should be persistent. This seems not to be an
option for you. There could be other ways to just increase the
refcount. We could use a ftrace-like approach and modify the refcount
by:

Detach:

 # echo opened_event > /sys/bus/event_source/devices/persistent/detached

Release:

 # echo '!opened_event' > /sys/bus/event_source/devices/persistent/detached

And Ingo's proposal using eventfs somehow:

 /sys/fs/events/user/PID/

And/or cgroups?

Are there other suggestions?

> > I'm not sure about the entire
> > eventfs thing; but the proposed sysfs thing should definitely work for
> > now.
> 
> I'm fine with a sysfs approach as well, as long as it's correct and 
> obvious enough to use.

Glad to hear this. :)

Thanks,

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-25 17:57     ` Robert Richter
@ 2013-06-25 19:16       ` Borislav Petkov
  2013-06-26  8:12         ` Robert Richter
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-06-25 19:16 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Tue, Jun 25, 2013 at 07:57:29PM +0200, Robert Richter wrote:
> But what options there are to detach the event from all processes and
> make it persistent?

Something like this:

	ioctl(fd, PERF_EVENT_IOC_DETACH, 0);

I guess this could simply set the persistent flag so that the rest of
the perf code knows not to destroy event buffers etc.

I don't have an idea about the reattaching though because you don't have
a file descriptor there.

Maybe for that we could really use the sys_perf_event_open() with flags
set to PERF_FLAG_PERSISTENT to note that we want to reattach to the
persistent event instead of opening a new one.

Something to that effect...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-25 19:16       ` Borislav Petkov
@ 2013-06-26  8:12         ` Robert Richter
  2013-06-26  8:24           ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-26  8:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On 25.06.13 21:16:54, Borislav Petkov wrote:
> On Tue, Jun 25, 2013 at 07:57:29PM +0200, Robert Richter wrote:
> > But what options there are to detach the event from all processes and
> > make it persistent?
> 
> Something like this:
> 
> 	ioctl(fd, PERF_EVENT_IOC_DETACH, 0);
> 
> I guess this could simply set the persistent flag so that the rest of
> the perf code knows not to destroy event buffers etc.
> 
> I don't have an idea about the reattaching though because you don't have
> a file descriptor there.
> 
> Maybe for that we could really use the sys_perf_event_open() with flags
> set to PERF_FLAG_PERSISTENT to note that we want to reattach to the
> persistent event instead of opening a new one.

We get a new fd by opening the persistent event with the syscall.
There would be 2 new ioctls:

 ioctl(fd, PERF_EVENT_IOC_DETACH, 0);
 ioctl(fd, PERF_EVENT_IOC_ATTACH, 0);

This would be fine and reuses existing infrastructure.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26  8:12         ` Robert Richter
@ 2013-06-26  8:24           ` Borislav Petkov
  2013-06-26  9:46             ` Ingo Molnar
  2013-06-26 10:11             ` Robert Richter
  0 siblings, 2 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-06-26  8:24 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On Wed, Jun 26, 2013 at 10:12:23AM +0200, Robert Richter wrote:
> We get a new fd by opening the persistent event with the syscall.
> There would be 2 new ioctls:
> 
>  ioctl(fd, PERF_EVENT_IOC_DETACH, 0);
>  ioctl(fd, PERF_EVENT_IOC_ATTACH, 0);
> 
> This would be fine and reuses existing infrastructure.

Well, how are you going to say that you want to open an already existing
persistent event or your want to create exactly the same persistent
event? Are we even going to allow identical persistent events to
coexist?

The answers to those questions gives us the implementation.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26  8:24           ` Borislav Petkov
@ 2013-06-26  9:46             ` Ingo Molnar
  2013-06-26  9:56               ` Borislav Petkov
  2013-06-26 10:11             ` Robert Richter
  1 sibling, 1 reply; 54+ messages in thread
From: Ingo Molnar @ 2013-06-26  9:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Robert Richter, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel


* Borislav Petkov <bp@alien8.de> wrote:

> On Wed, Jun 26, 2013 at 10:12:23AM +0200, Robert Richter wrote:
> > We get a new fd by opening the persistent event with the syscall.
> > There would be 2 new ioctls:
> > 
> >  ioctl(fd, PERF_EVENT_IOC_DETACH, 0);
> >  ioctl(fd, PERF_EVENT_IOC_ATTACH, 0);
> > 
> > This would be fine and reuses existing infrastructure.
> 
> Well, how are you going to say that you want to open an already existing 
> persistent event or your want to create exactly the same persistent 
> event? Are we even going to allow identical persistent events to 
> coexist?

If already existing persistent events show up somewhere in sysfs (or in a 
separate pseudofilesystem) then an open() of them [given sufficient 
privileges of the caller, etc.] could attach to them.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26  9:46             ` Ingo Molnar
@ 2013-06-26  9:56               ` Borislav Petkov
  0 siblings, 0 replies; 54+ messages in thread
From: Borislav Petkov @ 2013-06-26  9:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Robert Richter, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On Wed, Jun 26, 2013 at 11:46:34AM +0200, Ingo Molnar wrote:
> If already existing persistent events show up somewhere in sysfs
> (or in a separate pseudofilesystem) then an open() of them [given
> sufficient privileges of the caller, etc.] could attach to them.

Yep, I think we want to say sys_perf_event_open(attr.persistent=1) which
would give you a file descriptor to an already existing persistent event
with attr.config=<id>.

If it doesn't exist, it creates it.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26  8:24           ` Borislav Petkov
  2013-06-26  9:46             ` Ingo Molnar
@ 2013-06-26 10:11             ` Robert Richter
  2013-06-26 11:45               ` Ingo Molnar
  1 sibling, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-26 10:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa,
	linux-kernel

On 26.06.13 10:24:08, Borislav Petkov wrote:
> On Wed, Jun 26, 2013 at 10:12:23AM +0200, Robert Richter wrote:
> > We get a new fd by opening the persistent event with the syscall.
> > There would be 2 new ioctls:
> > 
> >  ioctl(fd, PERF_EVENT_IOC_DETACH, 0);
> >  ioctl(fd, PERF_EVENT_IOC_ATTACH, 0);
> > 
> > This would be fine and reuses existing infrastructure.
> 
> Well, how are you going to say that you want to open an already existing
> persistent event or your want to create exactly the same persistent
> event? Are we even going to allow identical persistent events to
> coexist?

Here is the scenario:

Creating a persistent event from userspace:

 * A process opens a system-wide event with the syscall and gets a fd.
 * The process mmaps the buffer.
 * The process does an ioctl to detach the process which increases the
   events and buffers refcount. The event is listed as 'persistent' in
   sysfs with a unique id.
 * The process closes the fd. Event and buffer remain in the system
   since the refcounts are not zero.

Opening a persistent event:

 * A process scans sysfs for persistent events.
 * To open the event it sets up the event attr according to sysfs.
 * The persistent event is opened with the syscall, the process gets a
   new fd of the event.
 * The process attaches to the event buffer with mmap.

Releasing a persistent event:

 * A process opens a persistent event and gets a fd.
 * The process does an ioctl to attach the process which decreases the
   refcounts. The sysfs entry is removed.
 * The process closes the fd.
 * After all processes that are tied to the event closed their event's
   fds, the persistent event and its buffer is released.

Sounds like a plan?

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26 10:11             ` Robert Richter
@ 2013-06-26 11:45               ` Ingo Molnar
  2013-06-26 12:25                 ` Ingo Molnar
  2013-06-26 12:44                 ` Robert Richter
  0 siblings, 2 replies; 54+ messages in thread
From: Ingo Molnar @ 2013-06-26 11:45 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel


* Robert Richter <rric@kernel.org> wrote:

> On 26.06.13 10:24:08, Borislav Petkov wrote:
> > On Wed, Jun 26, 2013 at 10:12:23AM +0200, Robert Richter wrote:
> > > We get a new fd by opening the persistent event with the syscall.
> > > There would be 2 new ioctls:
> > > 
> > >  ioctl(fd, PERF_EVENT_IOC_DETACH, 0);
> > >  ioctl(fd, PERF_EVENT_IOC_ATTACH, 0);
> > > 
> > > This would be fine and reuses existing infrastructure.
> > 
> > Well, how are you going to say that you want to open an already existing
> > persistent event or your want to create exactly the same persistent
> > event? Are we even going to allow identical persistent events to
> > coexist?
> 
> Here is the scenario:

Looks mostly good - with a few suggestions:

> 
> Creating a persistent event from userspace:
> 
>  * A process opens a system-wide event with the syscall and gets a fd.

Should this really be limited to system-wide events?

>  * The process mmaps the buffer.
>  * The process does an ioctl to detach the process which increases the
>    events and buffers refcount. The event is listed as 'persistent' in
>    sysfs with a unique id.
>  * The process closes the fd. Event and buffer remain in the system
>    since the refcounts are not zero.
> 
> Opening a persistent event:
> 
>  * A process scans sysfs for persistent events.
>  * To open the event it sets up the event attr according to sysfs.

Basically it would just put some ID (found in sysfs) into the attr and set 
attr.persistent=1 - not any other information, right?

If it knows the ID straight away (the user told it, or it remembers it 
from some other file such as a temporary file, etc.) then it does not even 
have to scan sysfs.

[ How about to additional logic: attr.persistent=1 && attr.config==0 means 
  a new persistent event is created straight away - no ioctl is needed to 
  detach it explicitly. ]

>  * The persistent event is opened with the syscall, the process gets a
>    new fd of the event.
>  * The process attaches to the event buffer with mmap.

Yes. And gets the pre-existing event and mmap buffer.

> Releasing a persistent event:
> 
>  * A process opens a persistent event and gets a fd.
>  * The process does an ioctl to attach the process which decreases the
>    refcounts. The sysfs entry is removed.
>  * The process closes the fd.
>  * After all processes that are tied to the event closed their event's
>    fds, the persistent event and its buffer is released.
> 
> Sounds like a plan?

It does :-)

I'm sure there will be some details going down that path, but it looks 
workable at first glance.

Note, for tracing the PERF_FLAG_FD_OUTPUT method of multiplexing multiple 
events onto a single mmap buffers is probably useful (also usable via the 
PERF_EVENT_IOC_SET_OUTPUT ioctl()), so please make sure the scheme works 
naturally with that model as well, not just with 1:1 event+buffer 
mappings.

See the uses of PERF_EVENT_IOC_SET_OUTPUT in tools/perf/.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26 11:45               ` Ingo Molnar
@ 2013-06-26 12:25                 ` Ingo Molnar
  2013-06-26 12:44                 ` Robert Richter
  1 sibling, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2013-06-26 12:25 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel


* Ingo Molnar <mingo@kernel.org> wrote:

> Note, for tracing the PERF_FLAG_FD_OUTPUT method of multiplexing 
> multiple events onto a single mmap buffers is probably useful (also 
> usable via the PERF_EVENT_IOC_SET_OUTPUT ioctl()), so please make sure 
> the scheme works naturally with that model as well, not just with 1:1 
> event+buffer mappings.
> 
> See the uses of PERF_EVENT_IOC_SET_OUTPUT in tools/perf/.

Note that another facility that would be very useful for tracing is 
PeterZ's and tglx's patch that enables multiple tracepoints to be attached 
to a single event.

See the 2+ years old (bitrotten and unfinished) WIP patch below.

It adds a PERF_EVENT_IOC_ADD_TP ioctl() that adds a new tracepoint to an 
existing event. This makes perf based tracing scale up to an arbitrary 
number of tracepoints in essence.

Thanks,

	Ingo

------------------>
Subject: perf-tracepoint-idr.patch
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 24 Nov 2010 12:09:26 +0100

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/ftrace_event.h    |   10 
 include/linux/perf_event.h      |    9 
 include/linux/sched.h           |    9 
 include/trace/ftrace.h          |    4 
 kernel/events/core.c            |  407 ++++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace_event_perf.c |   95 +++------
 kernel/trace/trace_kprobe.c     |   10 
 kernel/trace/trace_output.c     |  116 +++--------
 kernel/trace/trace_syscalls.c   |    8 
 9 files changed, 498 insertions(+), 170 deletions(-)

Index: linux/include/linux/ftrace_event.h
===================================================================
--- linux.orig/include/linux/ftrace_event.h
+++ linux/include/linux/ftrace_event.h
@@ -87,8 +87,6 @@ struct trace_event_functions {
 };
 
 struct trace_event {
-	struct hlist_node		node;
-	struct list_head		list;
 	int				type;
 	struct trace_event_functions	*funcs;
 };
@@ -194,7 +192,6 @@ struct ftrace_event_call {
 
 #ifdef CONFIG_PERF_EVENTS
 	int				perf_refcount;
-	struct hlist_head __percpu	*perf_events;
 #endif
 };
 
@@ -263,8 +260,9 @@ struct perf_event;
 
 DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
 
-extern int  perf_trace_init(struct perf_event *event);
+extern int  perf_trace_init(struct perf_event *event, int event_id);
 extern void perf_trace_destroy(struct perf_event *event);
+extern void perf_trace_destroy_id(int id);
 extern int  perf_trace_add(struct perf_event *event, int flags);
 extern void perf_trace_del(struct perf_event *event, int flags);
 extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
@@ -275,9 +273,9 @@ extern void *perf_trace_buf_prepare(int 
 
 static inline void
 perf_trace_buf_submit(void *raw_data, int size, int rctx, u64 addr,
-		       u64 count, struct pt_regs *regs, void *head)
+		       u64 count, struct pt_regs *regs, int id)
 {
-	perf_tp_event(addr, count, raw_data, size, regs, head, rctx);
+	perf_tp_event(addr, count, raw_data, size, regs, rctx, id);
 }
 #endif
 
Index: linux/include/linux/perf_event.h
===================================================================
--- linux.orig/include/linux/perf_event.h
+++ linux/include/linux/perf_event.h
@@ -247,6 +247,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_PERIOD		_IOW('$', 4, __u64)
 #define PERF_EVENT_IOC_SET_OUTPUT	_IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
+#define PERF_EVENT_IOC_ADD_TP		_IO ('$', 7)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
@@ -568,6 +569,11 @@ struct hw_perf_event {
 			struct task_struct		*bp_target;
 		};
 #endif
+		/*
+		 * Same fudge as for breakpoints, trace-events needs
+		 * it too,.. convert the bp crap over..
+		 */
+		struct task_struct *event_target;
 	};
 	int				state;
 	local64_t			prev_count;
@@ -859,6 +865,7 @@ struct perf_event {
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
+	struct perf_tp_idr		tp_idr;
 #endif
 
 #ifdef CONFIG_CGROUP_PERF
@@ -1133,7 +1140,7 @@ static inline bool perf_paranoid_kernel(
 extern void perf_event_init(void);
 extern void perf_tp_event(u64 addr, u64 count, void *record,
 			  int entry_size, struct pt_regs *regs,
-			  struct hlist_head *head, int rctx);
+			  int rctx, int id);
 extern void perf_bp_event(struct perf_event *event, void *data);
 
 #ifndef perf_misc_flags
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -82,6 +82,7 @@ struct sched_param {
 #include <linux/rculist.h>
 #include <linux/rtmutex.h>
 
+#include <linux/idr.h>
 #include <linux/time.h>
 #include <linux/param.h>
 #include <linux/resource.h>
@@ -1199,6 +1200,11 @@ enum perf_event_task_context {
 	perf_nr_task_contexts,
 };
 
+struct perf_tp_idr {
+	struct mutex	lock;
+	struct idr	idr;
+};
+
 struct task_struct {
 	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
 	void *stack;
@@ -1485,6 +1491,9 @@ struct task_struct {
 	struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts];
 	struct mutex perf_event_mutex;
 	struct list_head perf_event_list;
+#ifdef CONFIG_EVENT_TRACING
+	struct perf_tp_idr *perf_tp_idr;
+#endif
 #endif
 #ifdef CONFIG_NUMA
 	struct mempolicy *mempolicy;	/* Protected by alloc_lock */
Index: linux/include/trace/ftrace.h
===================================================================
--- linux.orig/include/trace/ftrace.h
+++ linux/include/trace/ftrace.h
@@ -708,7 +708,6 @@ perf_trace_##call(void *__data, proto)		
 	struct ftrace_raw_##call *entry;				\
 	struct pt_regs __regs;						\
 	u64 __addr = 0, __count = 1;					\
-	struct hlist_head *head;					\
 	int __entry_size;						\
 	int __data_size;						\
 	int rctx;							\
@@ -733,9 +732,8 @@ perf_trace_##call(void *__data, proto)		
 									\
 	{ assign; }							\
 									\
-	head = this_cpu_ptr(event_call->perf_events);			\
 	perf_trace_buf_submit(entry, __entry_size, rctx, __addr,	\
-		__count, &__regs, head);				\
+		__count, &__regs, event_call->event.type);		\
 }
 
 /*
Index: linux/kernel/events/core.c
===================================================================
--- linux.orig/kernel/events/core.c
+++ linux/kernel/events/core.c
@@ -823,6 +823,7 @@ list_add_event(struct perf_event *event,
 	ctx->nr_events++;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat++;
+	++ctx->generation;
 }
 
 /*
@@ -976,6 +977,7 @@ list_del_event(struct perf_event *event,
 	 */
 	if (event->state > PERF_EVENT_STATE_OFF)
 		event->state = PERF_EVENT_STATE_OFF;
+	++ctx->generation;
 }
 
 static void perf_group_detach(struct perf_event *event)
@@ -1894,6 +1896,12 @@ static void perf_event_context_sched_out
 	if (!cpuctx->task_ctx)
 		return;
 
+#if 0
+	/*
+	 * Need to sort out how to make task_struct::perf_tp_idr
+	 * work with this fancy switching stuff.. tracepoints could be
+	 * in multiple contexts due to the software event muck.
+	 */
 	rcu_read_lock();
 	parent = rcu_dereference(ctx->parent_ctx);
 	next_ctx = next->perf_event_ctxp[ctxn];
@@ -1927,6 +1935,7 @@ static void perf_event_context_sched_out
 		raw_spin_unlock(&ctx->lock);
 	}
 	rcu_read_unlock();
+#endif
 
 	if (do_switch) {
 		ctx_sched_out(ctx, cpuctx, EVENT_ALL);
@@ -3261,6 +3270,7 @@ static struct perf_event *perf_fget_ligh
 static int perf_event_set_output(struct perf_event *event,
 				 struct perf_event *output_event);
 static int perf_event_set_filter(struct perf_event *event, void __user *arg);
+static int perf_event_add_tp(struct perf_event *event, int tp_id);
 
 static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
@@ -3307,6 +3317,9 @@ static long perf_ioctl(struct file *file
 	case PERF_EVENT_IOC_SET_FILTER:
 		return perf_event_set_filter(event, (void __user *)arg);
 
+	case PERF_EVENT_IOC_ADD_TP:
+		return perf_event_add_tp(event, arg);
+
 	default:
 		return -ENOTTY;
 	}
@@ -5471,6 +5484,9 @@ static struct pmu perf_swevent = {
 
 #ifdef CONFIG_EVENT_TRACING
 
+#include <linux/ftrace_event.h>
+#include "../trace/trace_output.h"
+
 static int perf_tp_filter_match(struct perf_event *event,
 				struct perf_sample_data *data)
 {
@@ -5485,8 +5501,9 @@ static int perf_tp_event_match(struct pe
 				struct perf_sample_data *data,
 				struct pt_regs *regs)
 {
-	if (event->hw.state & PERF_HES_STOPPED)
+	if (event->state != PERF_EVENT_STATE_ACTIVE)
 		return 0;
+
 	/*
 	 * All tracepoints are from kernel-space.
 	 */
@@ -5499,8 +5516,60 @@ static int perf_tp_event_match(struct pe
 	return 1;
 }
 
+static void perf_tp_idr_init(struct perf_tp_idr *idr)
+{
+	idr_init(&idr->idr);
+	mutex_init(&idr->lock);
+}
+
+static DEFINE_PER_CPU(struct perf_tp_idr, perf_tp_idr);
+
+struct perf_tp_node {
+	struct list_head	list;
+	struct perf_event	*event;
+	struct rcu_head		rcu;
+};
+
+static void do_perf_tp_event(struct perf_event *event, u64 count,
+			     struct perf_sample_data *data,
+			     struct pt_regs *regs)
+{
+	if (perf_tp_event_match(event, data, regs))
+		perf_swevent_event(event, count, 1, data, regs);
+}
+
+static void perf_tp_idr_event(struct perf_tp_idr *tp_idr,
+			      int id, u64 count,
+			      struct perf_sample_data *data,
+			      struct pt_regs *regs)
+{
+	struct perf_tp_node *tp_node, *node;
+	struct perf_event *event;
+
+	if (!tp_idr)
+		return;
+
+	/*
+	 * Most of this is done under rcu_read_lock_sched(), which doesn't
+	 * exclude regular RCU grace periods, but the IDR code uses call_rcu()
+	 * so we have to use rcu_read_lock() here as well.
+	 */
+	rcu_read_lock();
+	tp_node = idr_find(&tp_idr->idr, id);
+	rcu_read_unlock();
+
+	if (!tp_node)
+		return;
+
+	event = tp_node->event;
+
+	do_perf_tp_event(event, count, data, regs);
+	list_for_each_entry_rcu(node, &tp_node->list, list)
+		do_perf_tp_event(node->event, count, data, regs);
+}
+
 void perf_tp_event(u64 addr, u64 count, void *record, int entry_size,
-		   struct pt_regs *regs, struct hlist_head *head, int rctx)
+		   struct pt_regs *regs, int rctx, int id)
 {
 	struct perf_sample_data data;
 	struct perf_event *event;
@@ -5514,18 +5583,197 @@ void perf_tp_event(u64 addr, u64 count, 
 	perf_sample_data_init(&data, addr);
 	data.raw = &raw;
 
-	hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
-		if (perf_tp_event_match(event, &data, regs))
-			perf_swevent_event(event, count, 1, &data, regs);
-	}
+	perf_tp_idr_event(&__get_cpu_var(perf_tp_idr), id, count, &data, regs);
+	perf_tp_idr_event(current->perf_tp_idr, id, count, &data, regs);
 
 	perf_swevent_put_recursion_context(rctx);
 }
 EXPORT_SYMBOL_GPL(perf_tp_event);
 
+static struct perf_tp_idr *
+perf_tp_init_task(struct perf_event *event, struct task_struct *task)
+{
+	struct perf_tp_idr *idr;
+
+	mutex_lock(&task->perf_event_mutex);
+	idr = task->perf_tp_idr;
+	if (idr)
+		goto unlock;
+
+	idr = kzalloc(sizeof(struct perf_tp_idr), GFP_KERNEL);
+	if (!idr)
+		goto unlock;
+
+	perf_tp_idr_init(idr);
+
+	task->perf_tp_idr = idr;
+unlock:
+	mutex_unlock(&task->perf_event_mutex);
+
+	return idr;
+}
+
+static struct perf_tp_idr *perf_event_idr(struct perf_event *event, bool create)
+{
+	struct perf_tp_idr *tp_idr;
+	struct task_struct *task;
+
+	if (event->attach_state & PERF_ATTACH_TASK) {
+		task = event->hw.event_target;
+		tp_idr = task->perf_tp_idr;
+		if (!tp_idr && create)
+			tp_idr = perf_tp_init_task(event, task);
+	} else
+		tp_idr = &per_cpu(perf_tp_idr, event->cpu);
+
+	return tp_idr;
+}
+
+static void perf_tp_free_node(struct rcu_head *rcu)
+{
+	struct perf_tp_node *node = container_of(rcu, struct perf_tp_node, rcu);
+
+	kfree(node);
+}
+
+static int perf_tp_remove_idr(int id, void *p, void *data)
+{
+	struct perf_tp_node *node = p;
+	struct perf_tp_node *first, *next;
+	struct perf_tp_idr *tp_idr = data;
+
+	if (!tp_idr)
+		goto no_idr;
+
+	mutex_lock(&tp_idr->lock);
+	first = idr_find(&tp_idr->idr, id);
+	if (first == node) {
+		next = list_first_entry(&first->list, struct perf_tp_node, list);
+		if (next != first)
+			idr_replace(&tp_idr->idr, next, id);
+		else
+			idr_remove(&tp_idr->idr, id);
+	}
+	list_del_rcu(&node->list);
+	mutex_unlock(&tp_idr->lock);
+
+no_idr:
+	perf_trace_destroy_id(id);
+	call_rcu_sched(&node->rcu, perf_tp_free_node);
+	return 0;
+}
+
 static void tp_perf_event_destroy(struct perf_event *event)
 {
-	perf_trace_destroy(event);
+	/*
+	 * Since this is the free path, the fd is gone an there
+	 * can be no concurrency on event->tp_idr.
+	 */
+
+	idr_for_each(&event->tp_idr.idr, perf_tp_remove_idr,
+			perf_event_idr(event, false));
+
+	idr_remove_all(&event->tp_idr.idr);
+	idr_destroy(&event->tp_idr.idr);
+}
+
+static int __perf_event_add_tp(struct perf_event *event, int tp_id)
+{
+	struct perf_tp_node *node, *first;
+	struct perf_tp_idr *idr;
+	int tmp_id, err, ret = -ENOMEM;
+
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		goto out;
+
+	node->event = event;
+	INIT_LIST_HEAD(&node->list);
+
+	/*
+	 * Insert the node into the event->idr, this idr tracks the
+	 * tracepoints we're interested in, it has a 1:1 relation
+	 * with the node.
+	 */
+	idr = &event->tp_idr;
+	mutex_lock(&idr->lock);
+	err = idr_pre_get(&idr->idr, GFP_KERNEL);
+	if (!err) {
+		ret = -ENOMEM;
+		goto free_node;
+	}
+
+	ret = idr_get_new_above(&idr->idr, node, tp_id, &tmp_id);
+	if (ret)
+		goto free_node;
+
+	if (WARN_ON(tp_id != tmp_id)) {
+		printk(KERN_ERR "fail: %d %d\n" , tp_id, tmp_id);
+		ret = -EBUSY;
+		goto free_idr1;
+	}
+	mutex_unlock(&idr->lock);
+
+	/*
+	 * Insert the node into the task/cpu idr, this idr tracks
+	 * all active tracepoints for the task/cpu, it has a 1:n relation
+	 * with the node.
+	 */
+	idr = perf_event_idr(event, true);
+	if (!idr) {
+		if (event->attach_state & PERF_ATTACH_CONTEXT)
+			ret = -ENOMEM;
+		else
+			ret = -ESRCH;
+		goto free_idr1_set;
+	}
+	mutex_lock(&idr->lock);
+	first = idr_find(&idr->idr, tp_id);
+	if (first) {
+		list_add_rcu(&node->list, &first->list);
+		goto unlock;
+	}
+
+	err = idr_pre_get(&idr->idr, GFP_KERNEL);
+	if (!err) {
+		ret = -ENOMEM;
+		goto free_idr1_set_unlock;
+	}
+
+	ret = idr_get_new_above(&idr->idr, node, tp_id, &tmp_id);
+	if (ret)
+		goto free_idr1_set;
+
+	if (WARN_ON(tp_id != tmp_id)) {
+		ret = -EBUSY;
+		goto free_idr2;
+	}
+unlock:
+	mutex_unlock(&idr->lock);
+
+	ret = perf_trace_init(event, tp_id);
+	if (ret)
+		goto free_all;
+
+out:
+	return ret;
+
+free_all:
+	mutex_lock(&idr->lock);
+free_idr2:
+	idr_remove(&idr->idr, tmp_id);
+free_idr1_set_unlock:
+	mutex_unlock(&idr->lock);
+free_idr1_set:
+	idr = &event->tp_idr;
+	tmp_id = tp_id;
+	mutex_lock(&idr->lock);
+free_idr1:
+	idr_remove(&idr->idr, tmp_id);
+free_node:
+	mutex_unlock(&idr->lock);
+	kfree(node);
+	goto out;
 }
 
 static int perf_tp_event_init(struct perf_event *event)
@@ -5535,21 +5783,35 @@ static int perf_tp_event_init(struct per
 	if (event->attr.type != PERF_TYPE_TRACEPOINT)
 		return -ENOENT;
 
-	err = perf_trace_init(event);
-	if (err)
-		return err;
+	perf_tp_idr_init(&event->tp_idr);
 
 	event->destroy = tp_perf_event_destroy;
 
+	if (event->attr.config != ~0ULL) {
+		err = __perf_event_add_tp(event, event->attr.config);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
+static int perf_tp_event_add(struct perf_event *event, int flags)
+{
+	event->hw.state = flags & PERF_EF_START ? 0 : PERF_HES_STOPPED;
+	return 0;
+}
+
+static void perf_tp_event_del(struct perf_event *event, int flags)
+{
+}
+
 static struct pmu perf_tracepoint = {
 	.task_ctx_nr	= perf_sw_context,
 
 	.event_init	= perf_tp_event_init,
-	.add		= perf_trace_add,
-	.del		= perf_trace_del,
+	.add		= perf_tp_event_add,
+	.del		= perf_tp_event_del,
 	.start		= perf_swevent_start,
 	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
@@ -5557,6 +5819,11 @@ static struct pmu perf_tracepoint = {
 
 static inline void perf_tp_register(void)
 {
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		perf_tp_idr_init(&per_cpu(perf_tp_idr, cpu));
+
 	perf_pmu_register(&perf_tracepoint, "tracepoint", PERF_TYPE_TRACEPOINT);
 }
 
@@ -5565,7 +5832,8 @@ static int perf_event_set_filter(struct 
 	char *filter_str;
 	int ret;
 
-	if (event->attr.type != PERF_TYPE_TRACEPOINT)
+	if (event->attr.type != PERF_TYPE_TRACEPOINT ||
+	    event->attr.config == ~0ULL)
 		return -EINVAL;
 
 	filter_str = strndup_user(arg, PAGE_SIZE);
@@ -5583,6 +5851,74 @@ static void perf_event_free_filter(struc
 	ftrace_profile_free_filter(event);
 }
 
+static int perf_event_add_tp(struct perf_event *event, int tp_id)
+{
+	if (event->attr.type != PERF_TYPE_TRACEPOINT &&
+	    event->attr.config != ~0ULL)
+		return -EINVAL;
+
+	return __perf_event_add_tp(event, tp_id);
+}
+
+/*
+ * Called from the exit path, _after_ all events have been detached from it.
+ */
+static void perf_tp_event_exit(struct task_struct *tsk)
+{
+	struct perf_tp_idr *idr = tsk->perf_tp_idr;
+
+	if (!idr)
+		return;
+
+	idr_remove_all(&idr->idr);
+	idr_destroy(&idr->idr);
+}
+
+static void perf_tp_event_delayed_put(struct task_struct *tsk)
+{
+	struct perf_tp_idr *idr = tsk->perf_tp_idr;
+
+	tsk->perf_tp_idr = NULL;
+	kfree(idr);
+}
+
+static int perf_tp_inherit_idr(int id, void *p, void *data)
+{
+	struct perf_event *child = data;
+
+	return __perf_event_add_tp(child, id);
+}
+
+static int perf_tp_event_inherit(struct perf_event *parent_event,
+				 struct perf_event *child_event)
+{
+	int ret;
+
+	if (parent_event->attr.type != PERF_TYPE_TRACEPOINT ||
+	    parent_event->attr.config != ~0ULL)
+		return 0;
+
+	/*
+	 * The child is not yet exposed, hence no need to serialize things
+	 * on that side.
+	 */
+	mutex_lock(&parent_event->tp_idr.lock);
+	ret = idr_for_each(&parent_event->tp_idr.idr,
+			perf_tp_inherit_idr,
+			child_event);
+	mutex_unlock(&parent_event->tp_idr.lock);
+
+	return ret;
+}
+
+static void perf_tp_event_init_task(struct task_struct *child)
+{
+	/*
+	 * Clear the idr pointer copied from the parent.
+	 */
+	child->perf_tp_idr = NULL;
+}
+
 #else
 
 static inline void perf_tp_register(void)
@@ -5598,6 +5934,29 @@ static void perf_event_free_filter(struc
 {
 }
 
+static int perf_event_add_tp(struct perf_event *event, int tp_id)
+{
+	return -ENOENT;
+}
+
+static void perf_tp_event_exit(struct task_struct *tsk)
+{
+}
+
+static void perf_tp_event_delayed_put(struct task_struct *tsk)
+{
+}
+
+static int perf_tp_event_inherit(struct perf_event *parent_event,
+				 struct perf_event *child_event)
+{
+	return 0;
+}
+
+static void perf_tp_event_init_task()(struct task_struct *child)
+{
+}
+
 #endif /* CONFIG_EVENT_TRACING */
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
@@ -6173,6 +6532,9 @@ perf_event_alloc(struct perf_event_attr 
 	INIT_LIST_HEAD(&event->sibling_list);
 	init_waitqueue_head(&event->waitq);
 	init_irq_work(&event->pending, perf_pending_event);
+#ifdef CONFIG_EVENT_TRACING
+	perf_tp_idr_init(&event->tp_idr);
+#endif
 
 	mutex_init(&event->mmap_mutex);
 
@@ -6191,6 +6553,7 @@ perf_event_alloc(struct perf_event_attr 
 
 	if (task) {
 		event->attach_state = PERF_ATTACH_TASK;
+		event->hw.event_target = task;
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 		/*
 		 * hw_breakpoint is a bit difficult here..
@@ -6236,7 +6599,7 @@ done:
 	if (err) {
 		if (event->ns)
 			put_pid_ns(event->ns);
-		kfree(event);
+		free_event(event);
 		return ERR_PTR(err);
 	}
 
@@ -6604,7 +6967,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	perf_install_in_context(ctx, event, cpu);
-	++ctx->generation;
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 
@@ -6681,7 +7043,6 @@ perf_event_create_kernel_counter(struct 
 	WARN_ON_ONCE(ctx->parent_ctx);
 	mutex_lock(&ctx->mutex);
 	perf_install_in_context(ctx, event, cpu);
-	++ctx->generation;
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 
@@ -6858,6 +7219,8 @@ void perf_event_exit_task(struct task_st
 
 	for_each_task_context_nr(ctxn)
 		perf_event_exit_task_context(child, ctxn);
+
+	perf_tp_event_exit(child);
 }
 
 static void perf_free_event(struct perf_event *event,
@@ -6920,6 +7283,8 @@ void perf_event_delayed_put(struct task_
 
 	for_each_task_context_nr(ctxn)
 		WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
+
+	perf_tp_event_delayed_put(task);
 }
 
 /*
@@ -6935,6 +7300,7 @@ inherit_event(struct perf_event *parent_
 {
 	struct perf_event *child_event;
 	unsigned long flags;
+	int ret;
 
 	/*
 	 * Instead of creating recursive hierarchies of events,
@@ -6952,6 +7318,13 @@ inherit_event(struct perf_event *parent_
 					   NULL);
 	if (IS_ERR(child_event))
 		return child_event;
+
+	ret = perf_tp_event_inherit(parent_event, child_event);
+	if (ret) {
+		free_event(child_event);
+		return ERR_PTR(ret);
+	}
+
 	get_ctx(child_ctx);
 
 	/*
@@ -7177,6 +7550,8 @@ int perf_event_init_task(struct task_str
 	mutex_init(&child->perf_event_mutex);
 	INIT_LIST_HEAD(&child->perf_event_list);
 
+	perf_tp_event_init_task(child);
+
 	for_each_task_context_nr(ctxn) {
 		ret = perf_event_init_context(child, ctxn);
 		if (ret)
Index: linux/kernel/trace/trace_event_perf.c
===================================================================
--- linux.orig/kernel/trace/trace_event_perf.c
+++ linux/kernel/trace/trace_event_perf.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/kprobes.h>
 #include "trace.h"
+#include "trace_output.h"
 
 static char __percpu *perf_trace_buf[PERF_NR_CONTEXTS];
 
@@ -47,9 +48,7 @@ static int perf_trace_event_perm(struct 
 static int perf_trace_event_init(struct ftrace_event_call *tp_event,
 				 struct perf_event *p_event)
 {
-	struct hlist_head __percpu *list;
 	int ret;
-	int cpu;
 
 	ret = perf_trace_event_perm(tp_event, p_event);
 	if (ret)
@@ -61,15 +60,6 @@ static int perf_trace_event_init(struct 
 
 	ret = -ENOMEM;
 
-	list = alloc_percpu(struct hlist_head);
-	if (!list)
-		goto fail;
-
-	for_each_possible_cpu(cpu)
-		INIT_HLIST_HEAD(per_cpu_ptr(list, cpu));
-
-	tp_event->perf_events = list;
-
 	if (!total_ref_count) {
 		char __percpu *buf;
 		int i;
@@ -100,63 +90,40 @@ fail:
 		}
 	}
 
-	if (!--tp_event->perf_refcount) {
-		free_percpu(tp_event->perf_events);
-		tp_event->perf_events = NULL;
-	}
+	--tp_event->perf_refcount;
 
 	return ret;
 }
 
-int perf_trace_init(struct perf_event *p_event)
+int perf_trace_init(struct perf_event *p_event, int event_id)
 {
 	struct ftrace_event_call *tp_event;
-	int event_id = p_event->attr.config;
+	struct trace_event *t_event;
 	int ret = -EINVAL;
 
+	trace_event_read_lock();
+	t_event = ftrace_find_event(event_id);
+	if (!t_event)
+		goto out;
+
+	tp_event = container_of(t_event, struct ftrace_event_call, event);
+
 	mutex_lock(&event_mutex);
-	list_for_each_entry(tp_event, &ftrace_events, list) {
-		if (tp_event->event.type == event_id &&
-		    tp_event->class && tp_event->class->reg &&
-		    try_module_get(tp_event->mod)) {
-			ret = perf_trace_event_init(tp_event, p_event);
-			if (ret)
-				module_put(tp_event->mod);
-			break;
-		}
+	if (tp_event->class && tp_event->class->reg &&
+			try_module_get(tp_event->mod)) {
+		ret = perf_trace_event_init(tp_event, p_event);
+		if (ret)
+			module_put(tp_event->mod);
 	}
 	mutex_unlock(&event_mutex);
+out:
+	trace_event_read_unlock();
 
 	return ret;
 }
 
-int perf_trace_add(struct perf_event *p_event, int flags)
-{
-	struct ftrace_event_call *tp_event = p_event->tp_event;
-	struct hlist_head __percpu *pcpu_list;
-	struct hlist_head *list;
-
-	pcpu_list = tp_event->perf_events;
-	if (WARN_ON_ONCE(!pcpu_list))
-		return -EINVAL;
-
-	if (!(flags & PERF_EF_START))
-		p_event->hw.state = PERF_HES_STOPPED;
-
-	list = this_cpu_ptr(pcpu_list);
-	hlist_add_head_rcu(&p_event->hlist_entry, list);
-
-	return 0;
-}
-
-void perf_trace_del(struct perf_event *p_event, int flags)
-{
-	hlist_del_rcu(&p_event->hlist_entry);
-}
-
-void perf_trace_destroy(struct perf_event *p_event)
+static void __perf_trace_destroy(struct ftrace_event_call *tp_event)
 {
-	struct ftrace_event_call *tp_event = p_event->tp_event;
 	int i;
 
 	mutex_lock(&event_mutex);
@@ -171,9 +138,6 @@ void perf_trace_destroy(struct perf_even
 	 */
 	tracepoint_synchronize_unregister();
 
-	free_percpu(tp_event->perf_events);
-	tp_event->perf_events = NULL;
-
 	if (!--total_ref_count) {
 		for (i = 0; i < PERF_NR_CONTEXTS; i++) {
 			free_percpu(perf_trace_buf[i]);
@@ -185,6 +149,27 @@ out:
 	mutex_unlock(&event_mutex);
 }
 
+void perf_trace_destroy(struct perf_event *p_event)
+{
+	__perf_trace_destroy(p_event->tp_event);
+}
+
+void perf_trace_destroy_id(int event_id)
+{
+	struct ftrace_event_call *tp_event;
+	struct trace_event *t_event;
+
+	trace_event_read_lock();
+	t_event = ftrace_find_event(event_id);
+	if (!t_event)
+		goto unlock;
+
+	tp_event = container_of(t_event, struct ftrace_event_call, event);
+	__perf_trace_destroy(tp_event);
+unlock:
+	trace_event_read_unlock();
+}
+
 __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 				       struct pt_regs *regs, int *rctxp)
 {
Index: linux/kernel/trace/trace_kprobe.c
===================================================================
--- linux.orig/kernel/trace/trace_kprobe.c
+++ linux/kernel/trace/trace_kprobe.c
@@ -1659,7 +1659,6 @@ static __kprobes void kprobe_perf_func(s
 	struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
 	struct ftrace_event_call *call = &tp->call;
 	struct kprobe_trace_entry_head *entry;
-	struct hlist_head *head;
 	int size, __size, dsize;
 	int rctx;
 
@@ -1679,8 +1678,8 @@ static __kprobes void kprobe_perf_func(s
 	memset(&entry[1], 0, dsize);
 	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
 
-	head = this_cpu_ptr(call->perf_events);
-	perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head);
+	perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs,
+			      call->event.type);
 }
 
 /* Kretprobe profile handler */
@@ -1690,7 +1689,6 @@ static __kprobes void kretprobe_perf_fun
 	struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
 	struct ftrace_event_call *call = &tp->call;
 	struct kretprobe_trace_entry_head *entry;
-	struct hlist_head *head;
 	int size, __size, dsize;
 	int rctx;
 
@@ -1710,8 +1708,8 @@ static __kprobes void kretprobe_perf_fun
 	entry->ret_ip = (unsigned long)ri->ret_addr;
 	store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
 
-	head = this_cpu_ptr(call->perf_events);
-	perf_trace_buf_submit(entry, size, rctx, entry->ret_ip, 1, regs, head);
+	perf_trace_buf_submit(entry, size, rctx, entry->ret_ip, 1,
+			      regs, call->event.type);
 }
 
 static int probe_perf_enable(struct ftrace_event_call *call)
Index: linux/kernel/trace/trace_output.c
===================================================================
--- linux.orig/kernel/trace/trace_output.c
+++ linux/kernel/trace/trace_output.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/ftrace.h>
+#include <linux/idr.h>
 
 #include "trace_output.h"
 
@@ -16,9 +17,9 @@
 
 DECLARE_RWSEM(trace_event_mutex);
 
-static struct hlist_head event_hash[EVENT_HASHSIZE] __read_mostly;
+static const int first_event_type = __TRACE_LAST_TYPE + 1;
 
-static int next_event_type = __TRACE_LAST_TYPE + 1;
+static DEFINE_IDR(trace_type_idr);
 
 int trace_print_seq(struct seq_file *m, struct trace_seq *s)
 {
@@ -664,58 +665,43 @@ static int task_state_char(unsigned long
  */
 struct trace_event *ftrace_find_event(int type)
 {
-	struct trace_event *event;
-	struct hlist_node *n;
-	unsigned key;
-
-	key = type & (EVENT_HASHSIZE - 1);
-
-	hlist_for_each_entry(event, n, &event_hash[key], node) {
-		if (event->type == type)
-			return event;
-	}
-
-	return NULL;
+	return idr_find(&trace_type_idr, type);
 }
 
-static LIST_HEAD(ftrace_event_list);
+void trace_event_read_lock(void)
+{
+	down_read(&trace_event_mutex);
+}
 
-static int trace_search_list(struct list_head **list)
+void trace_event_read_unlock(void)
 {
-	struct trace_event *e;
-	int last = __TRACE_LAST_TYPE;
+	up_read(&trace_event_mutex);
+}
 
-	if (list_empty(&ftrace_event_list)) {
-		*list = &ftrace_event_list;
-		return last + 1;
-	}
+static int register_event(struct trace_event *event, int id, bool strict)
+{
+	int ret, type;
 
-	/*
-	 * We used up all possible max events,
-	 * lets see if somebody freed one.
-	 */
-	list_for_each_entry(e, &ftrace_event_list, list) {
-		if (e->type != last + 1)
-			break;
-		last++;
-	}
+	ret = idr_pre_get(&trace_type_idr, GFP_KERNEL);
+	if (!ret)
+		return 0;
 
-	/* Did we used up all 65 thousand events??? */
-	if ((last + 1) > FTRACE_MAX_EVENT)
+	ret = idr_get_new_above(&trace_type_idr, event, id, &type);
+	if (ret)
 		return 0;
 
-	*list = &e->list;
-	return last + 1;
-}
+	if (strict && id != type) {
+		idr_remove(&trace_type_idr, type);
+		return 0;
+	}
 
-void trace_event_read_lock(void)
-{
-	down_read(&trace_event_mutex);
-}
+	if (type > FTRACE_MAX_EVENT) {
+		idr_remove(&trace_type_idr, type);
+		return 0;
+	}
 
-void trace_event_read_unlock(void)
-{
-	up_read(&trace_event_mutex);
+	event->type = type;
+	return type;
 }
 
 /**
@@ -735,7 +721,6 @@ void trace_event_read_unlock(void)
  */
 int register_ftrace_event(struct trace_event *event)
 {
-	unsigned key;
 	int ret = 0;
 
 	down_write(&trace_event_mutex);
@@ -746,35 +731,18 @@ int register_ftrace_event(struct trace_e
 	if (WARN_ON(!event->funcs))
 		goto out;
 
-	INIT_LIST_HEAD(&event->list);
-
 	if (!event->type) {
-		struct list_head *list = NULL;
-
-		if (next_event_type > FTRACE_MAX_EVENT) {
-
-			event->type = trace_search_list(&list);
-			if (!event->type)
-				goto out;
-
-		} else {
-			
-			event->type = next_event_type++;
-			list = &ftrace_event_list;
-		}
-
-		if (WARN_ON(ftrace_find_event(event->type)))
+		ret = register_event(event, first_event_type, false);
+		if (!ret)
 			goto out;
-
-		list_add_tail(&event->list, list);
-
-	} else if (event->type > __TRACE_LAST_TYPE) {
-		printk(KERN_WARNING "Need to add type to trace.h\n");
-		WARN_ON(1);
-		goto out;
 	} else {
-		/* Is this event already used */
-		if (ftrace_find_event(event->type))
+		if (event->type > __TRACE_LAST_TYPE) {
+			printk(KERN_WARNING "Need to add type to trace.h\n");
+			WARN_ON(1);
+			goto out;
+		}
+		ret = register_event(event, event->type, true);
+		if (!ret)
 			goto out;
 	}
 
@@ -787,11 +755,6 @@ int register_ftrace_event(struct trace_e
 	if (event->funcs->binary == NULL)
 		event->funcs->binary = trace_nop_print;
 
-	key = event->type & (EVENT_HASHSIZE - 1);
-
-	hlist_add_head(&event->node, &event_hash[key]);
-
-	ret = event->type;
  out:
 	up_write(&trace_event_mutex);
 
@@ -804,8 +767,7 @@ EXPORT_SYMBOL_GPL(register_ftrace_event)
  */
 int __unregister_ftrace_event(struct trace_event *event)
 {
-	hlist_del(&event->node);
-	list_del(&event->list);
+	idr_remove(&trace_type_idr, event->type);
 	return 0;
 }
 
Index: linux/kernel/trace/trace_syscalls.c
===================================================================
--- linux.orig/kernel/trace/trace_syscalls.c
+++ linux/kernel/trace/trace_syscalls.c
@@ -499,7 +499,6 @@ static void perf_syscall_enter(void *ign
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_enter *rec;
-	struct hlist_head *head;
 	int syscall_nr;
 	int rctx;
 	int size;
@@ -530,8 +529,7 @@ static void perf_syscall_enter(void *ign
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
 
-	head = this_cpu_ptr(sys_data->enter_event->perf_events);
-	perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, head);
+	perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, rec->ent.type);
 }
 
 int perf_sysenter_enable(struct ftrace_event_call *call)
@@ -573,7 +571,6 @@ static void perf_syscall_exit(void *igno
 {
 	struct syscall_metadata *sys_data;
 	struct syscall_trace_exit *rec;
-	struct hlist_head *head;
 	int syscall_nr;
 	int rctx;
 	int size;
@@ -606,8 +603,7 @@ static void perf_syscall_exit(void *igno
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
 
-	head = this_cpu_ptr(sys_data->exit_event->perf_events);
-	perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, head);
+	perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, rec->ent.type);
 }
 
 int perf_sysexit_enable(struct ftrace_event_call *call)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26 11:45               ` Ingo Molnar
  2013-06-26 12:25                 ` Ingo Molnar
@ 2013-06-26 12:44                 ` Robert Richter
  2013-06-27  5:46                   ` Namhyung Kim
  1 sibling, 1 reply; 54+ messages in thread
From: Robert Richter @ 2013-06-26 12:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa, linux-kernel

On 26.06.13 13:45:38, Ingo Molnar wrote:
> * Robert Richter <rric@kernel.org> wrote:
> > Creating a persistent event from userspace:
> > 
> >  * A process opens a system-wide event with the syscall and gets a fd.
> 
> Should this really be limited to system-wide events?

It must not necessarily be restricted to system-wide events. Limiting
it is just to make it easier in the beginning, we don't need to think
about what happens if a process dies and permissions in case of
per-task events, etc (didn't thought about it yet ;).

Also, a persistent event is currently per-system, meaning there is one
entry only for the same kind of event scheduled on all cpus. This
keeps event handling easy (e.g. no need to export the cpu in the
event's sysfs entry, just the flag and id) but also has some drawbacks
(handling of multiple events per entry). Probably it's better to have
a 1:1 mapping.

> >  * The process mmaps the buffer.
> >  * The process does an ioctl to detach the process which increases the
> >    events and buffers refcount. The event is listed as 'persistent' in
> >    sysfs with a unique id.
> >  * The process closes the fd. Event and buffer remain in the system
> >    since the refcounts are not zero.
> > 
> > Opening a persistent event:
> > 
> >  * A process scans sysfs for persistent events.
> >  * To open the event it sets up the event attr according to sysfs.
> 
> Basically it would just put some ID (found in sysfs) into the attr and set 
> attr.persistent=1 - not any other information, right?
> 
> If it knows the ID straight away (the user told it, or it remembers it 
> from some other file such as a temporary file, etc.) then it does not even 
> have to scan sysfs.

Yes, there is a unique id which we could also return with the ioctl or
so. sysfs is esp. to let perf tools and the event parser know about
how to setup the events. It might be also useful if the syscall setup
changes in the future for these kind of events, then we just modify
the sysfs entry.

> [ How about to additional logic: attr.persistent=1 && attr.config==0 means 
>   a new persistent event is created straight away - no ioctl is needed to 
>   detach it explicitly. ]

That's correct. We could also do the following:

To connect to an existing event:

 attr.type=<persistent-pmu> && attr.config==<event-id>

(This might be harder to implement except the persistent event pmu
type will be fix, PERF_TYPE_PERSISTENT=6.)

To create a new persistent event:

 attr.persistent=1 && attr=<some event setup: pmu, config, flags, etc>

> >  * The persistent event is opened with the syscall, the process gets a
> >    new fd of the event.
> >  * The process attaches to the event buffer with mmap.
> 
> Yes. And gets the pre-existing event and mmap buffer.

That's what I mean.

A problem here is that mmap'ed buffer size (number of pages) must be
be equal to the pre-existing buffer size and thus to be known somehow.

> > Releasing a persistent event:
> > 
> >  * A process opens a persistent event and gets a fd.
> >  * The process does an ioctl to attach the process which decreases the
> >    refcounts. The sysfs entry is removed.
> >  * The process closes the fd.
> >  * After all processes that are tied to the event closed their event's
> >    fds, the persistent event and its buffer is released.
> > 
> > Sounds like a plan?
> 
> It does :-)
> 
> I'm sure there will be some details going down that path, but it looks 
> workable at first glance.

Yes, there will be some 'implementation details', but it should work.

> Note, for tracing the PERF_FLAG_FD_OUTPUT method of multiplexing multiple 
> events onto a single mmap buffers is probably useful (also usable via the 
> PERF_EVENT_IOC_SET_OUTPUT ioctl()), so please make sure the scheme works 
> naturally with that model as well, not just with 1:1 event+buffer 
> mappings.
> 
> See the uses of PERF_EVENT_IOC_SET_OUTPUT in tools/perf/.

Yes, thanks for this hint. I wasn't aware of this feature yet.

Thanks for your comments. Will start reworking the patches into this
direction.

-Robert

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-26 12:44                 ` Robert Richter
@ 2013-06-27  5:46                   ` Namhyung Kim
  2013-06-27  8:35                     ` Borislav Petkov
  0 siblings, 1 reply; 54+ messages in thread
From: Namhyung Kim @ 2013-06-27  5:46 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

Hi Robert,

On Wed, 26 Jun 2013 14:44:24 +0200, Robert Richter wrote:
> On 26.06.13 13:45:38, Ingo Molnar wrote:
>> [ How about to additional logic: attr.persistent=1 && attr.config==0 means 
>>   a new persistent event is created straight away - no ioctl is needed to 
>>   detach it explicitly. ]
>
> That's correct. We could also do the following:
>
> To connect to an existing event:
>
>  attr.type=<persistent-pmu> && attr.config==<event-id>
>
> (This might be harder to implement except the persistent event pmu
> type will be fix, PERF_TYPE_PERSISTENT=6.)
>
> To create a new persistent event:
>
>  attr.persistent=1 && attr=<some event setup: pmu, config, flags, etc>

How about using 2 bits for perfsistent flag, 1 for connecting to an
existing one, 2 for creating new one.


>> >  * The persistent event is opened with the syscall, the process gets a
>> >    new fd of the event.
>> >  * The process attaches to the event buffer with mmap.
>> 
>> Yes. And gets the pre-existing event and mmap buffer.
>
> That's what I mean.
>
> A problem here is that mmap'ed buffer size (number of pages) must be
> be equal to the pre-existing buffer size and thus to be known somehow.

What about also exporting the buffer size via sysfs pmu directory?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-27  5:46                   ` Namhyung Kim
@ 2013-06-27  8:35                     ` Borislav Petkov
  2013-06-27  8:50                       ` Ingo Molnar
  0 siblings, 1 reply; 54+ messages in thread
From: Borislav Petkov @ 2013-06-27  8:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Robert Richter, Ingo Molnar, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel

On Thu, Jun 27, 2013 at 02:46:36PM +0900, Namhyung Kim wrote:
> How about using 2 bits for perfsistent flag, 1 for connecting to an
> existing one, 2 for creating new one.

No need since persistent events don't need to be duplicated. Think of a
tracepoint: the samples you get there are the same, no matter how many
times you enable it.

So, if you open a persistent event which doesn't exist, it gets created.
If you open an existing one, you get read-only access to its buffers. No
need for two bits. Actually, with the detach/attach ioctl we don't even
need a single flag but having it makes the implementation a lot simpler.

> > A problem here is that mmap'ed buffer size (number of pages) must be
> > be equal to the pre-existing buffer size and thus to be known somehow.
> 
> What about also exporting the buffer size via sysfs pmu directory?

Yes, we've been discussing buffer size. The simplest thing is to
hardcode the buffer size so that it is implicitly known to all agents.
However, I don't know whether there is a use case where different buffer
sizes actually make sense. I'd tend to do the simplest thing initially.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration
  2013-06-27  8:35                     ` Borislav Petkov
@ 2013-06-27  8:50                       ` Ingo Molnar
  0 siblings, 0 replies; 54+ messages in thread
From: Ingo Molnar @ 2013-06-27  8:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Namhyung Kim, Robert Richter, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel


* Borislav Petkov <bp@alien8.de> wrote:

> > > A problem here is that mmap'ed buffer size (number of pages) must be 
> > > be equal to the pre-existing buffer size and thus to be known 
> > > somehow.
> > 
> > What about also exporting the buffer size via sysfs pmu directory?
> 
> Yes, we've been discussing buffer size. The simplest thing is to 
> hardcode the buffer size so that it is implicitly known to all agents. 
> However, I don't know whether there is a use case where different buffer 
> sizes actually make sense. I'd tend to do the simplest thing initially.

Btw., in terms of testing and design focus I'd suggest concentrating not 
on rare and relatively singular events like RAS MCE events, but on a more 
'everyday tracing' flow of events:

 - large, per cpu trace buffers

 - all events output into a single trace buffer to get a coherent trace

 - [ possibly revive the tracepoint-multiplexing patch from peterz/tglx,
     to be able to get a rich selection of tracepoints. ]

That is I think the workflow most people will be interested in:

 - they have a workload to analyze and they want to do some 'tracing' to 
   understand it better or to pinpoint a problem.

 - based on the problem they want to trace a selection of tracepoints, as 
   easily and quickly as possible.

 - we could possibly provide a few 'groups' of tracepoints for typical
   uses - for example scheduler tracing, or MM tracing, or IO tracing,
   etc.), so people wouldn't have to specify a dozen tracepoints but could
   symbolically refer to any of these pre-cooked groups. [this is probably
   a tooling detail.]

 - they want to persistently trace into a generously sized trace buffer,
   without any tool running while the collection period lasts.

 - to refine the result they'd like to stop/start tracing, reset/clear the
   tracebuffer for failed attempts, and generally see how large the
   tracebuffer is.

 - and extract/analyze traces - both in a detailed form and in summary
   fashion.

 - they possibly want to save it to a perf.data and have equivalent
   facilities of analysis.

If that workflow works well then the RAS angle will work well too.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2013-06-27  8:51 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-11 16:42 [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Robert Richter
2013-06-11 16:42 ` [PATCH v2 01/14] perf, ring_buffer: Use same prefix Robert Richter
2013-06-11 16:42 ` [PATCH v2 02/14] perf: Add persistent events Robert Richter
2013-06-24  9:28   ` Peter Zijlstra
2013-06-24 19:24     ` Borislav Petkov
2013-06-25  8:46       ` Robert Richter
2013-06-11 16:42 ` [PATCH v2 03/14] perf: Add persistent event facilities Robert Richter
2013-06-14  2:15   ` Namhyung Kim
2013-06-14  7:20     ` Robert Richter
2013-06-24  9:32   ` Peter Zijlstra
2013-06-25  8:47     ` Robert Richter
2013-06-24  9:44   ` Peter Zijlstra
2013-06-25  8:41     ` Robert Richter
2013-06-24  9:48   ` Peter Zijlstra
2013-06-24 19:26     ` Borislav Petkov
2013-06-25  7:44       ` Peter Zijlstra
2013-06-25  9:24         ` Robert Richter
2013-06-25  9:37           ` Borislav Petkov
2013-06-25 10:51             ` Robert Richter
2013-06-25 15:29               ` Borislav Petkov
2013-06-25 16:14                 ` Robert Richter
2013-06-11 16:42 ` [PATCH v2 04/14] MCE: Enable persistent event Robert Richter
2013-06-11 16:42 ` [PATCH v2 05/14] perf, persistent: Rework struct pers_event_desc Robert Richter
2013-06-11 16:42 ` [PATCH v2 06/14] perf, persistent: Remove rb_put() Robert Richter
2013-06-11 16:42 ` [PATCH v2 07/14] perf, persistent: Introduce get_persistent_event() Robert Richter
2013-06-11 16:42 ` [PATCH v2 08/14] perf, persistent: Reworking perf_get_persistent_event_fd() Robert Richter
2013-06-11 16:42 ` [PATCH v2 09/14] perf, persistent: Protect event lists with mutex Robert Richter
2013-06-11 16:42 ` [PATCH v2 10/14] perf, persistent: Avoid adding identical events Robert Richter
2013-06-11 16:42 ` [PATCH v2 11/14] perf, persistent: Implementing a persistent pmu Robert Richter
2013-06-11 16:42 ` [PATCH v2 12/14] perf, persistent: Name each persistent event Robert Richter
2013-06-11 16:42 ` [PATCH v2 13/14] perf, persistent: Exposing persistent events using sysfs Robert Richter
2013-06-14  2:36   ` Namhyung Kim
2013-06-14  8:57     ` Robert Richter
2013-06-11 16:42 ` [PATCH v2 14/14] perf, persistent: Allow multiple users for an event Robert Richter
2013-06-24 10:08 ` [PATCH v2 00/14] perf, persistent: Kernel updates for perf tool integration Peter Zijlstra
2013-06-25 10:46   ` Robert Richter
2013-06-24 10:22 ` Peter Zijlstra
2013-06-25 16:56   ` Robert Richter
2013-06-24 10:24 ` Peter Zijlstra
2013-06-24 15:25 ` Peter Zijlstra
2013-06-24 19:45   ` Ingo Molnar
2013-06-25 17:57     ` Robert Richter
2013-06-25 19:16       ` Borislav Petkov
2013-06-26  8:12         ` Robert Richter
2013-06-26  8:24           ` Borislav Petkov
2013-06-26  9:46             ` Ingo Molnar
2013-06-26  9:56               ` Borislav Petkov
2013-06-26 10:11             ` Robert Richter
2013-06-26 11:45               ` Ingo Molnar
2013-06-26 12:25                 ` Ingo Molnar
2013-06-26 12:44                 ` Robert Richter
2013-06-27  5:46                   ` Namhyung Kim
2013-06-27  8:35                     ` Borislav Petkov
2013-06-27  8:50                       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).