* perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Jiri Olsa @ 2015-01-16  7:57 UTC
  To: Vince Weaver; +Cc: Peter Zijlstra, Ingo Molnar, Andi Kleen, linux-kernel

hi Vince,
I was able to reproduce the issue you described in:
  http://marc.info/?l=linux-kernel&m=141806390822670&w=2

I might have found one way that could lead to screwing up
the context's refcounts.. could you please try the attached patch?

I'm now at 2 days with no crash, while it used to happen
3 times a day before.

Or you can use the following git branch:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/trinity_fix

thanks,
jirka


---
We need to make sure that no event in the group loses
its last reference and gets removed from the context
during the group move in the perf syscall.

This could happen if the child exits and calls put_event
on the parent event which was already closed, as in the
following scenario:

  - T1 creates software event E1
  - T1 creates other software events as a group with E1 as group leader
  - T1 forks T2
  - T2 has a cloned E1 event that holds a reference on E1
  - T1 closes an event within the E1 group (say E3); the event stays
    alive due to the T2 reference
  - the following happens concurrently:
    A) T1 creates hardware event E2 with group leader E1
    B) T2 exits

ad A) T1 triggers the E1 group move into the hardware context:
        mutex_lock(E1->ctx)
          - remove the E1 group only from the E1->ctx context, leaving
            the group links untouched
        mutex_unlock(E1->ctx)
        mutex_lock(E2->ctx)
          - install the E1 group into E2->ctx using the E1 group links
        mutex_unlock(E2->ctx)

ad B) put_event(E3) is called and E3 is removed from E1->ctx
      completely, including the group links

If 'A' and 'B' race, we get unbalanced refcounts because
of the removed group links.

Add get_group/put_group functions to take and drop the event
references for the whole group.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
---
 kernel/events/core.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index af0a5ba4e21d..1922bae9f24e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7250,6 +7250,55 @@ out:
 	return ret;
 }
 
+static void put_group(struct perf_event **group_arr)
+{
+	struct perf_event *event;
+	int i = 0;
+
+	while ((event = group_arr[i++])) {
+		put_event(event);
+	}
+
+	kfree(group_arr);
+}
+
+static int get_group(struct perf_event *leader, struct perf_event ***group_arr)
+{
+	struct perf_event_context *ctx = leader->ctx;
+	struct perf_event *sibling, **arr = NULL;
+	int i = 0, err = -ENOMEM;
+	size_t size;
+
+	if (!atomic_long_inc_not_zero(&leader->refcount))
+		return -EINVAL;
+
+	mutex_lock(&ctx->mutex);
+	/* +1 for the leader and +1 for the final NULL */
+	size = (leader->nr_siblings + 2) * sizeof(leader);
+
+	arr = *group_arr = kzalloc(size, GFP_KERNEL);
+	if (!arr)
+		goto err;
+
+	err = -EINVAL;
+	arr[i++] = leader;
+
+	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+		if (!atomic_long_inc_not_zero(&sibling->refcount))
+			goto err;
+
+		arr[i++] = sibling;
+	}
+
+	mutex_unlock(&ctx->mutex);
+	return 0;
+err:
+	mutex_unlock(&ctx->mutex);
+	if (arr)
+		put_group(arr);
+	return err;
+}
+
 /**
  * sys_perf_event_open - open a performance event, associate it to a task/cpu
  *
@@ -7263,6 +7312,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
 {
 	struct perf_event *group_leader = NULL, *output_event = NULL;
+	struct perf_event **group_arr = NULL;
 	struct perf_event *event, *sibling;
 	struct perf_event_attr attr;
 	struct perf_event_context *ctx;
@@ -7443,6 +7493,12 @@ SYSCALL_DEFINE5(perf_event_open,
 			goto err_context;
 	}
 
+	if (move_group) {
+		err = get_group(group_leader, &group_arr);
+		if (err)
+			goto err_context;
+	}
+
 	event_file = anon_inode_getfile("[perf_event]", &perf_fops, event,
 					f_flags);
 	if (IS_ERR(event_file)) {
@@ -7490,6 +7546,9 @@ SYSCALL_DEFINE5(perf_event_open,
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 
+	if (move_group)
+		put_group(group_arr);
+
 	put_online_cpus();
 
 	event->owner = current;
@@ -7515,6 +7574,8 @@ SYSCALL_DEFINE5(perf_event_open,
 	return event_fd;
 
 err_context:
+	if (group_arr)
+		put_group(group_arr);
 	perf_unpin_context(ctx);
 	put_ctx(ctx);
 err_alloc:
-- 
1.9.3
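
For illustration, the race scenario described in the changelog
corresponds roughly to the userspace sequence below. This is only a
sketch: the attr settings, the syscall wrapper and the timing are
assumptions for illustration, and on a real run steps A) and B) still
have to race.

/* Hypothetical reproducer sketch for the E1/E2/E3 scenario above. */
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
			       int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr sw = {
		.type		= PERF_TYPE_SOFTWARE,
		.size		= sizeof(sw),
		.config		= PERF_COUNT_SW_TASK_CLOCK,
		.inherit	= 1,	/* children get cloned events */
	};
	struct perf_event_attr hw = {
		.type		= PERF_TYPE_HARDWARE,
		.size		= sizeof(hw),
		.config		= PERF_COUNT_HW_CPU_CYCLES,
	};
	int e1, e3;

	e1 = sys_perf_event_open(&sw, 0, -1, -1, 0);	/* E1: group leader */
	e3 = sys_perf_event_open(&sw, 0, -1, e1, 0);	/* E3: sibling of E1 */

	if (fork() == 0)	/* T2 inherits clones holding refs on E1/E3 */
		_exit(0);	/* B: child exit drops those references */

	close(e3);		/* E3 stays alive through T2's clone */

	/* A: hardware event with group_fd = E1 triggers the group move */
	sys_perf_event_open(&hw, 0, -1, e1, 0);
	return 0;
}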



* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Peter Zijlstra @ 2015-01-16 10:46 UTC
  To: Jiri Olsa; +Cc: Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel

On Fri, Jan 16, 2015 at 08:57:46AM +0100, Jiri Olsa wrote:
> We need to make sure that no event in the group loses
> its last reference and gets removed from the context
> during the group move in the perf syscall.
>
> This could happen if the child exits and calls put_event
> on the parent event which was already closed, as in the
> following scenario:
>
>   - T1 creates software event E1
>   - T1 creates other software events as a group with E1 as group leader
>   - T1 forks T2
>   - T2 has a cloned E1 event that holds a reference on E1
>   - T1 closes an event within the E1 group (say E3); the event stays
>     alive due to the T2 reference
>   - the following happens concurrently:
>     A) T1 creates hardware event E2 with group leader E1
>     B) T2 exits
>
> ad A) T1 triggers the E1 group move into the hardware context:
>         mutex_lock(E1->ctx)
>           - remove the E1 group only from the E1->ctx context, leaving
>             the group links untouched
>         mutex_unlock(E1->ctx)
>         mutex_lock(E2->ctx)
>           - install the E1 group into E2->ctx using the E1 group links
>         mutex_unlock(E2->ctx)
>
> ad B) put_event(E3) is called and E3 is removed from E1->ctx
>       completely, including the group links
>
> If 'A' and 'B' race, we get unbalanced refcounts because
> of the removed group links.
>
> Add get_group/put_group functions to take and drop the event
> references for the whole group.

It's a bandaid at best :/ The problem is (again) that we change
event->ctx without any kind of serialization.

The issue came up before:

  https://lkml.org/lkml/2014/9/5/397

and I've not been able to come up with anything much saner.


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Peter Zijlstra @ 2015-01-16 14:11 UTC
  To: Jiri Olsa
  Cc: Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel,
	mark.rutland, Linus Torvalds

On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> It's a bandaid at best :/ The problem is (again) that we change
> event->ctx without any kind of serialization.
> 
> The issue came up before:
> 
>   https://lkml.org/lkml/2014/9/5/397
> 
> and I've not been able to come up with anything much saner.

A little something like the below is the best I could come up with; I
know Linus hated it, but I figure we ought to do something to stop
crashing.
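
The shape of the scheme, reduced to a sketch (the names follow the
patch below, but the two function bodies here are simplified
placeholders, not the actual diff):

/* Readers: every syscall-side path that dereferences event->ctx. */
static void some_reader_path(struct perf_event *event)
{
	struct perf_event_context *ctx;

	percpu_down_read(&perf_rwsem);
	ctx = perf_event_ctx(event);	/* ctx cannot change while held */
	/* ... use ctx ... */
	percpu_up_read(&perf_rwsem);
}

/* Writers: the only two places that re-point event->ctx. */
static void some_mover_path(struct perf_event *event)
{
	percpu_down_write(&perf_rwsem);	/* excludes all readers above */
	/* remove_from_context(); event->ctx = new_ctx; install_in_context(); */
	percpu_up_write(&perf_rwsem);
}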



---
 init/Kconfig         |    1 +
 kernel/events/core.c |  126 ++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 103 insertions(+), 24 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9afb971..ebc8522 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1595,6 +1595,7 @@ config PERF_EVENTS
 	depends on HAVE_PERF_EVENTS
 	select ANON_INODES
 	select IRQ_WORK
+	select PERCPU_RWSEM
 	help
 	  Enable kernel support for various performance events provided
 	  by software and hardware.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c10124b..fb3971d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -42,6 +42,7 @@
 #include <linux/module.h>
 #include <linux/mman.h>
 #include <linux/compat.h>
+#include <linux/percpu-rwsem.h>
 
 #include "internal.h"
 
@@ -122,6 +123,42 @@ static int cpu_function_call(int cpu, int (*func) (void *info), void *info)
 	return data.ret;
 }
 
+/*
+ * Required to migrate events between contexts.
+ *
+ * Migrating events between contexts is rather tricky; there is no real
+ * serialization around the perf_event::ctx pointer.
+ *
+ * So what we do is hold this rwsem over the remove_from_context and
+ * install_in_context. The remove_from_context ensures the event is inactive
+ * and will not be used from IRQ/NMI context anymore, and the remaining
+ * sites can acquire the rwsem read side.
+ */
+static struct percpu_rw_semaphore perf_rwsem;
+
+static inline struct perf_event_context *perf_event_ctx(struct perf_event *event)
+{
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * Assert the locking rules outlined above; in order to dereference
+	 * event->ctx we must either be attached to the context or hold
+	 * perf_rwsem.
+	 *
+	 * XXX not usable from IPIs because the lockdep held lock context
+	 * will be wrong; maybe add trylock variants to the percpu_rw_semaphore
+	 */
+	WARN_ON_ONCE(!(event->attach_state & PERF_ATTACH_CONTEXT) ||
+		     (debug_locks && !lockdep_is_held(&perf_rwsem.rw_sem)));
+#endif
+
+	return event->ctx;
+}
+
+static inline struct perf_event_context *__perf_event_ctx(struct perf_event *event)
+{
+	return event->ctx;
+}
+
 #define EVENT_OWNER_KERNEL ((void *) -1)
 
 static bool is_kernel_event(struct perf_event *event)
@@ -380,7 +417,7 @@ perf_cgroup_from_task(struct task_struct *task)
 static inline bool
 perf_cgroup_match(struct perf_event *event)
 {
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
 	/* @event doesn't care about cgroup */
@@ -1054,7 +1091,7 @@ static void update_context_time(struct perf_event_context *ctx)
 
 static u64 perf_event_time(struct perf_event *event)
 {
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 
 	if (is_cgroup_event(event))
 		return perf_cgroup_event_time(event);
@@ -1068,7 +1105,7 @@ static u64 perf_event_time(struct perf_event *event)
  */
 static void update_event_times(struct perf_event *event)
 {
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 	u64 run_end;
 
 	if (event->state < PERF_EVENT_STATE_INACTIVE ||
@@ -1518,7 +1555,7 @@ static int __perf_remove_from_context(void *info)
 {
 	struct remove_event *re = info;
 	struct perf_event *event = re->event;
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
 	raw_spin_lock(&ctx->lock);
@@ -1551,7 +1588,7 @@ static int __perf_remove_from_context(void *info)
  */
 static void perf_remove_from_context(struct perf_event *event, bool detach_group)
 {
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = perf_event_ctx(event);
 	struct task_struct *task = ctx->task;
 	struct remove_event re = {
 		.event = event,
@@ -1606,7 +1643,7 @@ retry:
 int __perf_event_disable(void *info)
 {
 	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
 	/*
@@ -1656,20 +1693,24 @@ int __perf_event_disable(void *info)
  */
 void perf_event_disable(struct perf_event *event)
 {
-	struct perf_event_context *ctx = event->ctx;
-	struct task_struct *task = ctx->task;
+	struct perf_event_context *ctx;
+	struct task_struct *task;
+
+	percpu_down_read(&perf_rwsem);
+	ctx = perf_event_ctx(event);
+	task = ctx->task;
 
 	if (!task) {
 		/*
 		 * Disable the event on the cpu that it's on
 		 */
 		cpu_function_call(event->cpu, __perf_event_disable, event);
-		return;
+		goto unlock;
 	}
 
 retry:
 	if (!task_function_call(task, __perf_event_disable, event))
-		return;
+		goto unlock;
 
 	raw_spin_lock_irq(&ctx->lock);
 	/*
@@ -1694,6 +1735,8 @@ retry:
 		event->state = PERF_EVENT_STATE_OFF;
 	}
 	raw_spin_unlock_irq(&ctx->lock);
+unlock:
+	percpu_up_read(&perf_rwsem);
 }
 EXPORT_SYMBOL_GPL(perf_event_disable);
 
@@ -1937,7 +1980,7 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
 static int  __perf_install_in_context(void *info)
 {
 	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 	struct perf_event_context *task_ctx = cpuctx->task_ctx;
 	struct task_struct *task = current;
@@ -2076,7 +2119,7 @@ static void __perf_event_mark_enabled(struct perf_event *event)
 static int __perf_event_enable(void *info)
 {
 	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 	struct perf_event *leader = event->group_leader;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 	int err;
@@ -2160,15 +2203,19 @@ unlock:
  */
 void perf_event_enable(struct perf_event *event)
 {
-	struct perf_event_context *ctx = event->ctx;
-	struct task_struct *task = ctx->task;
+	struct perf_event_context *ctx;
+	struct task_struct *task;
+
+	percpu_down_read(&perf_rwsem);
+	ctx = perf_event_ctx(event);
+	task = ctx->task;
 
 	if (!task) {
 		/*
 		 * Enable the event on the cpu that it's on
 		 */
 		cpu_function_call(event->cpu, __perf_event_enable, event);
-		return;
+		goto unlock;
 	}
 
 	raw_spin_lock_irq(&ctx->lock);
@@ -2194,7 +2241,7 @@ retry:
 	raw_spin_unlock_irq(&ctx->lock);
 
 	if (!task_function_call(task, __perf_event_enable, event))
-		return;
+		goto unlock;
 
 	raw_spin_lock_irq(&ctx->lock);
 
@@ -2213,6 +2260,8 @@ retry:
 
 out:
 	raw_spin_unlock_irq(&ctx->lock);
+unlock:
+	percpu_up_read(&perf_rwsem);
 }
 EXPORT_SYMBOL_GPL(perf_event_enable);
 
@@ -3076,7 +3125,7 @@ void perf_event_exec(void)
 static void __perf_event_read(void *info)
 {
 	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx = __perf_event_ctx(event);
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
 	/*
@@ -3115,7 +3164,7 @@ static u64 perf_event_read(struct perf_event *event)
 		smp_call_function_single(event->oncpu,
 					 __perf_event_read, event, 1);
 	} else if (event->state == PERF_EVENT_STATE_INACTIVE) {
-		struct perf_event_context *ctx = event->ctx;
+		struct perf_event_context *ctx = perf_event_ctx(event);
 		unsigned long flags;
 
 		raw_spin_lock_irqsave(&ctx->lock, flags);
@@ -3440,7 +3489,7 @@ static void perf_remove_from_owner(struct perf_event *event)
  */
 static void put_event(struct perf_event *event)
 {
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx;
 
 	if (!atomic_long_dec_and_test(&event->refcount))
 		return;
@@ -3448,6 +3497,8 @@ static void put_event(struct perf_event *event)
 	if (!is_kernel_event(event))
 		perf_remove_from_owner(event);
 
+	percpu_down_read(&perf_rwsem);
+	ctx = perf_event_ctx(event);
 	WARN_ON_ONCE(ctx->parent_ctx);
 	/*
 	 * There are two ways this annotation is useful:
@@ -3464,6 +3515,7 @@ static void put_event(struct perf_event *event)
 	mutex_lock_nested(&ctx->mutex, SINGLE_DEPTH_NESTING);
 	perf_remove_from_context(event, true);
 	mutex_unlock(&ctx->mutex);
+	percpu_up_read(&perf_rwsem);
 
 	_free_event(event);
 }
@@ -3647,11 +3699,13 @@ perf_read_hw(struct perf_event *event, char __user *buf, size_t count)
 	if (count < event->read_size)
 		return -ENOSPC;
 
-	WARN_ON_ONCE(event->ctx->parent_ctx);
+	percpu_down_read(&perf_rwsem);
+	WARN_ON_ONCE(perf_event_ctx(event)->parent_ctx);
 	if (read_format & PERF_FORMAT_GROUP)
 		ret = perf_event_read_group(event, read_format, buf);
 	else
 		ret = perf_event_read_one(event, read_format, buf);
+	percpu_up_read(&perf_rwsem);
 
 	return ret;
 }
@@ -3689,9 +3743,11 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 
 static void perf_event_reset(struct perf_event *event)
 {
+	percpu_down_read(&perf_rwsem);
 	(void)perf_event_read(event);
 	local64_set(&event->count, 0);
 	perf_event_update_userpage(event);
+	percpu_up_read(&perf_rwsem);
 }
 
 /*
@@ -3705,7 +3761,7 @@ static void perf_event_for_each_child(struct perf_event *event,
 {
 	struct perf_event *child;
 
-	WARN_ON_ONCE(event->ctx->parent_ctx);
+	WARN_ON_ONCE(__perf_event_ctx(event)->parent_ctx);
 	mutex_lock(&event->child_mutex);
 	func(event);
 	list_for_each_entry(child, &event->child_list, child_list)
@@ -3716,6 +3772,14 @@ static void perf_event_for_each_child(struct perf_event *event,
 static void perf_event_for_each(struct perf_event *event,
 				  void (*func)(struct perf_event *))
 {
+	/* 
+	 * XXX broken 
+	 *
+	 * lock inversion and recursion issues; ctx->mutex must nest inside
+	 * perf_rwsem, but func() will take perf_rwsem again.
+	 *
+	 * Cure with ugly.
+	 */
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_event *sibling;
 
@@ -3731,7 +3795,7 @@ static void perf_event_for_each(struct perf_event *event,
 
 static int perf_event_period(struct perf_event *event, u64 __user *arg)
 {
-	struct perf_event_context *ctx = event->ctx;
+	struct perf_event_context *ctx;
 	int ret = 0, active;
 	u64 value;
 
@@ -3744,6 +3808,8 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 	if (!value)
 		return -EINVAL;
 
+	percpu_down_read(&perf_rwsem);
+	ctx = perf_event_ctx(event);
 	raw_spin_lock_irq(&ctx->lock);
 	if (event->attr.freq) {
 		if (value > sysctl_perf_event_sample_rate) {
@@ -3772,6 +3838,7 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 
 unlock:
 	raw_spin_unlock_irq(&ctx->lock);
+	percpu_up_read(&perf_rwsem);
 
 	return ret;
 }
@@ -7229,11 +7296,13 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 	if (output_event->cpu != event->cpu)
 		goto out;
 
+	percpu_down_read(&perf_rwsem);
 	/*
 	 * If its not a per-cpu rb, it must be the same task.
 	 */
-	if (output_event->cpu == -1 && output_event->ctx != event->ctx)
-		goto out;
+	if (output_event->cpu == -1 &&
+	    perf_event_ctx(output_event) != perf_event_ctx(event))
+		goto unlock_rwsem;
 
 set:
 	mutex_lock(&event->mmap_mutex);
@@ -7253,6 +7322,8 @@ set:
 	ret = 0;
 unlock:
 	mutex_unlock(&event->mmap_mutex);
+unlock_rwsem:
+	percpu_up_read(&perf_rwsem);
 
 out:
 	return ret;
@@ -7461,6 +7532,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (move_group) {
 		struct perf_event_context *gctx = group_leader->ctx;
 
+		percpu_down_write(&perf_rwsem);
 		mutex_lock(&gctx->mutex);
 		perf_remove_from_context(group_leader, false);
 
@@ -7498,6 +7570,9 @@ SYSCALL_DEFINE5(perf_event_open,
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 
+	if (move_group)
+		percpu_up_write(&perf_rwsem);
+
 	put_online_cpus();
 
 	event->owner = current;
@@ -7600,6 +7675,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
 	struct perf_event *event, *tmp;
 	LIST_HEAD(events);
 
+	percpu_down_write(&perf_rwsem);
 	src_ctx = &per_cpu_ptr(pmu->pmu_cpu_context, src_cpu)->ctx;
 	dst_ctx = &per_cpu_ptr(pmu->pmu_cpu_context, dst_cpu)->ctx;
 
@@ -7625,6 +7701,7 @@ void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu)
 		get_ctx(dst_ctx);
 	}
 	mutex_unlock(&dst_ctx->mutex);
+	percpu_up_write(&perf_rwsem);
 }
 EXPORT_SYMBOL_GPL(perf_pmu_migrate_context);
 
@@ -8261,6 +8338,7 @@ void __init perf_event_init(void)
 
 	idr_init(&pmu_idr);
 
+	percpu_init_rwsem(&perf_rwsem);
 	perf_event_init_all_cpus();
 	init_srcu_struct(&pmus_srcu);
 	perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Vince Weaver @ 2015-01-16 18:54 UTC
  To: Peter Zijlstra
  Cc: Jiri Olsa, Ingo Molnar, Andi Kleen, linux-kernel, mark.rutland,
	Linus Torvalds

On Fri, 16 Jan 2015, Peter Zijlstra wrote:
> On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > It's a bandaid at best :/ The problem is (again) that we change
> > event->ctx without any kind of serialization.
> > 
> > The issue came up before:
> > 
> >   https://lkml.org/lkml/2014/9/5/397
> > 
> > and I've not been able to come up with anything much saner.
> 
> A little something like the below is the best I could come up with; I
> know Linus hated it, but I figure we ought to do something to stop
> crashing.

I just wanted to report that I've tested both this patch and Jiri's 
original one and they both keep my easy-to-trigger test case from crashing
my core2 machine (when applied against 3.18).

This is great!  I've been chasing this bug for months.

Vince


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Ingo Molnar @ 2015-01-18 14:13 UTC
  To: Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel,
	mark.rutland, Linus Torvalds


* Peter Zijlstra <peterz@infradead.org> wrote:

> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -42,6 +42,7 @@
>  #include <linux/module.h>
>  #include <linux/mman.h>
>  #include <linux/compat.h>
> +#include <linux/percpu-rwsem.h>
>  
>  #include "internal.h"
>  
> @@ -122,6 +123,42 @@ static int cpu_function_call(int cpu, int (*func) (void *info), void *info)
>  	return data.ret;
>  }
>  
> +/*
> + * Required to migrate events between contexts.
> + *
> + * Migrating events between contexts is rather tricky; there is no real
> + * serialization around the perf_event::ctx pointer.
> + *
> + * So what we do is hold this rwsem over the remove_from_context and
> + * install_in_context. The remove_from_context ensures the event is inactive
> + * and will not be used from IRQ/NMI context anymore, and the remaining
> + * sites can acquire the rwsem read side.
> + */
> +static struct percpu_rw_semaphore perf_rwsem;
> +
> +static inline struct perf_event_context *perf_event_ctx(struct perf_event *event)
> +{
> +#ifdef CONFIG_LOCKDEP
> +	/*
> +	 * Assert the locking rules outlined above; in order to dereference
> +	 * event->ctx we must either be attached to the context or hold
> +	 * perf_rwsem.
> +	 *
> +	 * XXX not usable from IPIs because the lockdep held lock context
> +	 * will be wrong; maybe add trylock variants to the percpu_rw_semaphore
> +	 */
> +	WARN_ON_ONCE(!(event->attach_state & PERF_ATTACH_CONTEXT) ||
> +		     (debug_locks && !lockdep_is_held(&perf_rwsem.rw_sem)));
> +#endif
> +
> +	return event->ctx;
> +}
> +
> +static inline struct perf_event_context *__perf_event_ctx(struct perf_event *event)
> +{
> +	return event->ctx;
> +}


So if this approach is acceptable I'd also rename event->ctx to 
event->__ctx, to make sure it's not used accidentally without 
serialization in any old (or new) perf-related patches.
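
I.e., something along these lines (a sketch of the idea, not a patch):

struct perf_event {
	/* ... */
	struct perf_event_context	*__ctx;	/* use perf_event_ctx() */
	/* ... */
};

static inline struct perf_event_context *perf_event_ctx(struct perf_event *event)
{
	/* locking assertions as in the patch above */
	return event->__ctx;
}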

Thanks,

	Ingo


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Vince Weaver @ 2015-01-19  3:49 UTC
  To: Peter Zijlstra
  Cc: Jiri Olsa, Ingo Molnar, Andi Kleen, linux-kernel, mark.rutland,
	Linus Torvalds

On Fri, 16 Jan 2015, Vince Weaver wrote:
> 
> I just wanted to report that I've tested both this patch and Jiri's 
> original one and they both keep my easy-to-trigger test case from crashing
> my core2 machine (when applied against 3.18).

I've continued fuzzing all weekend, with PeterZ's patch 
applied to 3.19-rc4.  It has managed to stay up without crashing, 
impressive.   

Fuzzing still turns up a few issues but no crashes:

[ 1753.904001] WARNING: CPU: 0 PID: 23800 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x6a/0xf6()
[ 1753.904001] Can't find any breakpoint slot

(I thought we had fixed that one).

[80608.209702] Uhhuh. NMI received for unknown reason 3d on CPU 0.
[80608.212041] Do you have a strange power saving mode enabled?
[80608.212041] Dazed and confused, but trying to continue

(these are common but don't seem to affect system stability).

Vince


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Mark Rutland @ 2015-01-19 14:40 UTC
  To: Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel,
	Linus Torvalds

On Fri, Jan 16, 2015 at 02:11:04PM +0000, Peter Zijlstra wrote:
> On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > It's a bandaid at best :/ The problem is (again) that we change
> > event->ctx without any kind of serialization.
> >
> > The issue came up before:
> >
> >   https://lkml.org/lkml/2014/9/5/397

In the end neither the CCI nor CCN perf drivers migrate events on
hotplug, so ARM is currently safe from the perf_pmu_migrate_context
case, but I see that you fix the move_group handling too.

I had a go at testing this by hacking migration back into the CCI PMU
driver (atop of v3.19-rc5), but I'm seeing lockups after a few minutes
with my original test case (https://lkml.org/lkml/2014/9/1/569 with
PMU_TYPE and PMU_EVENT fixed up).
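
(For reference, "hacking migration back in" means roughly the following
in the driver's CPU notifier; this is a hypothetical sketch, and the
cci_pmu global and its fields are assumptions rather than the actual
CCI driver code:)

static int cci_pmu_cpu_notifier(struct notifier_block *nb,
				unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned long)hcpu;
	unsigned int target;

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_DOWN_PREPARE:
		if (cpu != cci_pmu->cpu)
			break;
		target = cpumask_any_but(cpu_online_mask, cpu);
		if (target >= nr_cpu_ids)
			break;
		/* move all active events over to a surviving CPU */
		perf_pmu_migrate_context(&cci_pmu->pmu, cpu, target);
		cci_pmu->cpu = target;
		break;
	}
	return NOTIFY_OK;
}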

I unfortunately don't have a suitable x86 box spare to run that on.
Would someone be able to give it a spin on something with an uncore PMU?

I'll go and dig a bit further. I may just be hitting another latent
issue on my board.

Thanks,
Mark.


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Mark Rutland @ 2015-01-19 17:40 UTC
  To: Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel,
	Linus Torvalds

On Mon, Jan 19, 2015 at 02:40:28PM +0000, Mark Rutland wrote:
> On Fri, Jan 16, 2015 at 02:11:04PM +0000, Peter Zijlstra wrote:
> > On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > > It's a bandaid at best :/ The problem is (again) that we change
> > > event->ctx without any kind of serialization.
> > >
> > > The issue came up before:
> > >
> > >   https://lkml.org/lkml/2014/9/5/397
> 
> In the end neither the CCI nor CCN perf drivers migrate events on
> hotplug, so ARM is currently safe from the perf_pmu_migrate_context
> case, but I see that you fix the move_group handling too.
> 
> I had a go at testing this by hacking migration back into the CCI PMU
> driver (atop of v3.19-rc5), but I'm seeing lockups after a few minutes
> with my original test case (https://lkml.org/lkml/2014/9/1/569 with
> PMU_TYPE and PMU_EVENT fixed up).
> 
> I unfortunately don't have a suitable x86 box spare to run that on.
> Would someone be able to give it a spin on something with an uncore PMU?
> 
> I'll go and dig a bit further. I may just be hitting another latent
> issue on my board.

I'm able to trigger the lockups even without both your patch and the
call to perf_pmu_migrate_context, so there is a latent issue.

On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
script hung when run concurrently with the test case against the CCI PMU
driver (without migration). The v3.18 and v3.19-rc5 lockups are
identical:

INFO: task hpall.sh:1506 blocked for more than 120 seconds.
      Not tainted 3.19.0-rc5 #9
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
hpall.sh        D 804a6ffc     0  1506   1497 0x00000000
[<804a6ffc>] (__schedule) from [<80022308>] (cpu_hotplug_begin+0xa0/0xac)
[<80022308>] (cpu_hotplug_begin) from [<8002236c>] (_cpu_up+0x24/0x180)
[<8002236c>] (_cpu_up) from [<8002253c>] (cpu_up+0x74/0x98)
[<8002253c>] (cpu_up) from [<802bce60>] (device_online+0x64/0x90)
[<802bce60>] (device_online) from [<802bcef4>] (online_store+0x68/0x74)
[<802bcef4>] (online_store) from [<8014059c>] (kernfs_fop_write+0xbc/0x1a0)
[<8014059c>] (kernfs_fop_write) from [<800e71b0>] (vfs_write+0xa0/0x1ac)
[<800e71b0>] (vfs_write) from [<800e7808>] (SyS_write+0x44/0x9c)
[<800e7808>] (SyS_write) from [<8000e560>] (ret_fast_syscall+0x0/0x48)
7 locks held by hpall.sh/1506:
 #0:  (sb_writers#6){.+.+.+}, at: [<800e729c>] vfs_write+0x18c/0x1ac
 #1:  (&of->mutex){+.+.+.}, at: [<8014052c>] kernfs_fop_write+0x4c/0x1a0
 #2:  (s_active#15){.+.+.+}, at: [<80140534>] kernfs_fop_write+0x54/0x1a0
 #3:  (device_hotplug_lock){+.+.+.}, at: [<802bbe44>] lock_device_hotplug_sysfs+0xc/0x4c
 #4:  (&dev->mutex){......}, at: [<802bce14>] device_online+0x18/0x90
 #5:  (cpu_add_remove_lock){+.+.+.}, at: [<80022508>] cpu_up+0x40/0x98
 #6:  (cpu_hotplug.lock){++++++}, at: [<80022268>] cpu_hotplug_begin+0x0/0xac

I guess that lockup is my fundamental issue, and with your patch the
perf_rwsem manages to spread a transitive dependency on one of those
locks all over the perf subsystem. I haven't considered that in great
detail, however.

Thanks,
Mark.


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Mark Rutland @ 2015-01-20 13:39 UTC
  To: Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Jiri Kosina, Borislav Petkov,
	Will Deacon

On Mon, Jan 19, 2015 at 05:40:09PM +0000, Mark Rutland wrote:
> On Mon, Jan 19, 2015 at 02:40:28PM +0000, Mark Rutland wrote:
> > On Fri, Jan 16, 2015 at 02:11:04PM +0000, Peter Zijlstra wrote:
> > > On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > > > It's a bandaid at best :/ The problem is (again) that we change
> > > > event->ctx without any kind of serialization.
> > > >
> > > > The issue came up before:
> > > >
> > > >   https://lkml.org/lkml/2014/9/5/397
> > 
> > In the end neither the CCI nor CCN perf drivers migrate events on
> > hotplug, so ARM is currently safe from the perf_pmu_migrate_context
> > case, but I see that you fix the move_group handling too.
> > 
> > I had a go at testing this by hacking migration back into the CCI PMU
> > driver (atop of v3.19-rc5), but I'm seeing lockups after a few minutes
> > with my original test case (https://lkml.org/lkml/2014/9/1/569 with
> > PMU_TYPE and PMU_EVENT fixed up).
> > 
> > I unfortunately don't have a suitable x86 box spare to run that on.
> > Would someone be able to give it a spin on something with an uncore PMU?
> > 
> > I'll go and dig a bit further. I may just be hitting another latent
> > issue on my board.
> 
> I'm able to trigger the lockups even without both your patch and the
> call to perf_pmu_migrate_context, so there is a latent issue.
> 
> On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> script hung when run concurrently with the test case against the CCI PMU
> driver (without migration). The v3.18 and v3.19-rc5 lockups are
> identical:
> 
> [hung task backtrace and held-locks list snipped; quoted in full above]
> 
> I guess that lockup is my fundamental issue, and with your patch the
> perf_rwsem manages to spread a transitive dependency on one of those
> locks all over the perf subsystem. I haven't considered that in great
> detail, however.

I found that I couldn't trigger the issue with v3.17, and I was able to
bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
CPU hotplug and expedited grace periods").

I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
misled me.

Thanks,
Mark.


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Mark Rutland @ 2015-01-20 14:35 UTC
  To: Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Jiri Kosina, Borislav Petkov,
	Will Deacon

On Tue, Jan 20, 2015 at 01:39:47PM +0000, Mark Rutland wrote:
> On Mon, Jan 19, 2015 at 05:40:09PM +0000, Mark Rutland wrote:
> > On Mon, Jan 19, 2015 at 02:40:28PM +0000, Mark Rutland wrote:
> > > On Fri, Jan 16, 2015 at 02:11:04PM +0000, Peter Zijlstra wrote:
> > > > On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > > > > It's a bandaid at best :/ The problem is (again) that we change
> > > > > event->ctx without any kind of serialization.
> > > > >
> > > > > The issue came up before:
> > > > >
> > > > >   https://lkml.org/lkml/2014/9/5/397
> > > 
> > > In the end neither the CCI nor CCN perf drivers migrate events on
> > > hotplug, so ARM is currently safe from the perf_pmu_migrate_context
> > > case, but I see that you fix the move_group handling too.
> > > 
> > > I had a go at testing this by hacking migration back into the CCI PMU
> > > driver (atop of v3.19-rc5), but I'm seeing lockups after a few minutes
> > > with my original test case (https://lkml.org/lkml/2014/9/1/569 with
> > > PMU_TYPE and PMU_EVENT fixed up).
> > > 
> > > I unfortunately don't have a suitable x86 box spare to run that on.
> > > Would someone be able to give it a spin on something with an uncore PMU?
> > > 
> > > I'll go and dig a bit further. I may just be hitting another latent
> > > issue on my board.
> > 
> > I'm able to trigger the lockups even without both your patch and the
> > call to perf_pmu_migrate_context, so there is a latent issue.
> > 
> > On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> > script hung when run concurrently with the test case against the CCI PMU
> > driver (without migration). The v3.18 and v3.19-rc5 lockups are
> > identical:
> > 
> > [hung task backtrace and held-locks list snipped; quoted in full above]
> > 
> > I guess that lockup is my fundamental issue, and with your patch the
> > perf_rwsem manages to spread a transitive dependency on one of those
> > locks all over the perf subsystem. I haven't considered that in great
> > detail, however.
> 
> I found that I couldn't trigger the issue with v3.17, and I was able to
> bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
> CPU hotplug and expedited grace periods").
> 
> I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
> misled me.

That seems to be solid, and I think I see what's going on.

The task doing hotplug (hpall.sh:1506) gets to cpu_hotplug_begin(), and
sets cpu_hotplug.active_writer to current (I assume writes to this are
protected by cpu_add_remove_lock from cpu_up()?). Then it loops, acquiring
cpu_hotplug.lock and testing the refcount, and if non-zero dropping the
lock and going into uninterruptible sleep, expecting to be woken by
put_online_cpus().

Concurrently a task holding the refcount non-zero calls
put_online_cpus(), and finds there to be contention on cpu_hotplug.lock.
Thus it increments cpu_hotplug.puts_pending and goes off on its merry
way, without trying to wake the writer.

So the writer is never woken and never gets to handle the non-zero
cpu_hotplug.puts_pending.
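
In rough pseudo-C, the interaction looks like this (paraphrased from
memory as a simplified sketch of the post-b2c4623dcd07af4b logic, not
the exact 3.18 code):

void put_online_cpus(void)
{
	if (!mutex_trylock(&cpu_hotplug.lock)) {
		/* contended: defer the decrement for the writer to fold in */
		atomic_inc(&cpu_hotplug.puts_pending);
		return;		/* ...but nothing wakes the writer up */
	}
	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
		wake_up_process(cpu_hotplug.active_writer);
	mutex_unlock(&cpu_hotplug.lock);
}

void cpu_hotplug_begin(void)
{
	cpu_hotplug.active_writer = current;
	for (;;) {
		mutex_lock(&cpu_hotplug.lock);
		/* fold any deferred puts_pending into refcount here */
		if (!cpu_hotplug.refcount)
			break;		/* all readers are gone */
		__set_current_state(TASK_UNINTERRUPTIBLE);
		mutex_unlock(&cpu_hotplug.lock);
		schedule();	/* never woken if the last put was deferred */
	}
}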

I'm not sure what the right fix for that is. It looks like the writer
could observe the change to puts_pending and so
cpu_hotplug.active_writer could change under our feet unless we hold
cpu_hotplug.lock. But holding that would reintroduce the deadlock
b2c4623dcd07af4b was trying to avoid.

Any ideas?

Mark.


* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
From: Paul E. McKenney @ 2015-01-21  1:00 UTC
  To: Mark Rutland
  Cc: Peter Zijlstra, Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen,
	linux-kernel, Linus Torvalds, Jiri Kosina, Borislav Petkov,
	Will Deacon

On Tue, Jan 20, 2015 at 02:35:09PM +0000, Mark Rutland wrote:
> On Tue, Jan 20, 2015 at 01:39:47PM +0000, Mark Rutland wrote:
> > On Mon, Jan 19, 2015 at 05:40:09PM +0000, Mark Rutland wrote:
> > > On Mon, Jan 19, 2015 at 02:40:28PM +0000, Mark Rutland wrote:
> > > > On Fri, Jan 16, 2015 at 02:11:04PM +0000, Peter Zijlstra wrote:
> > > > > On Fri, Jan 16, 2015 at 11:46:44AM +0100, Peter Zijlstra wrote:
> > > > > > It's a bandaid at best :/ The problem is (again) that we change
> > > > > > event->ctx without any kind of serialization.
> > > > > >
> > > > > > The issue came up before:
> > > > > >
> > > > > >   https://lkml.org/lkml/2014/9/5/397
> > > > 
> > > > In the end neither the CCI nor CCN perf drivers migrate events on
> > > > hotplug, so ARM is currently safe from the perf_pmu_migrate_context
> > > > case, but I see that you fix the move_group handling too.
> > > > 
> > > > I had a go at testing this by hacking migration back into the CCI PMU
> > > > driver (atop of v3.19-rc5), but I'm seeing lockups after a few minutes
> > > > with my original test case (https://lkml.org/lkml/2014/9/1/569 with
> > > > PMU_TYPE and PMU_EVENT fixed up).
> > > > 
> > > > I unfortunately don't have a suitable x86 box spare to run that on.
> > > > Would someone be able to give it a spin on something with an uncore PMU?
> > > > 
> > > > I'll go and dig a bit further. I may just be hitting another latent
> > > > issue on my board.
> > > 
> > > I'm able to trigger the lockups even with neither your patch nor the
> > > call to perf_pmu_migrate_context applied, so there is a latent issue.
> > > 
> > > On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> > > script hung when run concurrently with the test case against the CCI PMU
> > > driver (without migration). The v3.18 and v3.19-rc5 lockups are
> > > identical:
> > > 
> > > INFO: task hpall.sh:1506 blocked for more than 120 seconds.
> > >       Not tainted 3.19.0-rc5 #9
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > hpall.sh        D 804a6ffc     0  1506   1497 0x00000000
> > > [<804a6ffc>] (__schedule) from [<80022308>] (cpu_hotplug_begin+0xa0/0xac)
> > > [<80022308>] (cpu_hotplug_begin) from [<8002236c>] (_cpu_up+0x24/0x180)
> > > [<8002236c>] (_cpu_up) from [<8002253c>] (cpu_up+0x74/0x98)
> > > [<8002253c>] (cpu_up) from [<802bce60>] (device_online+0x64/0x90)
> > > [<802bce60>] (device_online) from [<802bcef4>] (online_store+0x68/0x74)
> > > [<802bcef4>] (online_store) from [<8014059c>] (kernfs_fop_write+0xbc/0x1a0)
> > > [<8014059c>] (kernfs_fop_write) from [<800e71b0>] (vfs_write+0xa0/0x1ac)
> > > [<800e71b0>] (vfs_write) from [<800e7808>] (SyS_write+0x44/0x9c)
> > > [<800e7808>] (SyS_write) from [<8000e560>] (ret_fast_syscall+0x0/0x48)
> > > 7 locks held by hpall.sh/1506:
> > >  #0:  (sb_writers#6){.+.+.+}, at: [<800e729c>] vfs_write+0x18c/0x1ac
> > >  #1:  (&of->mutex){+.+.+.}, at: [<8014052c>] kernfs_fop_write+0x4c/0x1a0
> > >  #2:  (s_active#15){.+.+.+}, at: [<80140534>] kernfs_fop_write+0x54/0x1a0
> > >  #3:  (device_hotplug_lock){+.+.+.}, at: [<802bbe44>] lock_device_hotplug_sysfs+0xc/0x4c
> > >  #4:  (&dev->mutex){......}, at: [<802bce14>] device_online+0x18/0x90
> > >  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<80022508>] cpu_up+0x40/0x98
> > >  #6:  (cpu_hotplug.lock){++++++}, at: [<80022268>] cpu_hotplug_begin+0x0/0xac
> > > 
> > > I guess that lockup is my fundamental issue, and with your patch the
> > > perf_rwsem manages to spread a transitive dependency on one of those
> > > locks all over the perf subsystem. I haven't considered that in great
> > > detail, however.
> > 
> > I found that I couldn't trigger the issue with v3.17, and I was able to
> > bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
> > CPU hotplug and expedited grace periods").
> > 
> > I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
> > misled me.
> 
> That seems to be solid, and I think I see what's going on.
> 
> The task doing hotplug (hpall.sh:1506) gets to cpu_hotplug_begin(), and
> sets cpu_hotplug.active_writer to current (I assume writes to this are
> protected by cpu_add_remove_lock from cpu_up()?). Then it loops, acquiring
> cpu_hotplug.lock and testing the refcount, and if non-zero dropping the
> lock and going into uninterruptible sleep, expecting to be woken by
> put_online_cpus().
> 
> Concurrently, a task holding the refcount non-zero calls
> put_online_cpus() and finds cpu_hotplug.lock contended. It therefore
> increments cpu_hotplug.puts_pending and goes off on its merry way,
> without trying to wake the writer.
> 
> So the writer is never woken and never gets to handle the non-zero
> cpu_hotplug.puts_pending.
> 
> I'm not sure what the right fix for that is. It looks like the writer
> could observe the change to puts_pending and finish, so
> cpu_hotplug.active_writer could change under our feet unless we hold
> cpu_hotplug.lock. But holding that lock would reintroduce the deadlock
> b2c4623dcd07af4b was trying to avoid.
> 
> Any ideas?

You need 87af9e7ff9d90 (hotplugcpu: Avoid deadlocks by waking active_writer),
which is in -rcu at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

With some luck, this will be in -tip soon, and hit mainline during
the next merge window.
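
The gist of that commit (sketched here from its subject line, not the
exact diff) is to make the contended put_online_cpus() path wake the
writer as well, so the writer re-checks puts_pending instead of
sleeping forever:

	if (!mutex_trylock(&cpu_hotplug.lock)) {
		atomic_inc(&cpu_hotplug.puts_pending);
		/* New: kick the writer so it folds in puts_pending. */
		if (unlikely(cpu_hotplug.active_writer))
			wake_up_process(cpu_hotplug.active_writer);
		return;
	}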

							Thanx, Paul


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
  2015-01-21  1:00             ` Paul E. McKenney
@ 2015-01-21 12:08               ` Mark Rutland
  2015-01-21 20:07                 ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2015-01-21 12:08 UTC (permalink / raw)
  To: Paul E. McKenney, Peter Zijlstra
  Cc: Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen, linux-kernel,
	Linus Torvalds, Jiri Kosina, Borislav Petkov, Will Deacon

[...]

> > > > On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> > > > script hung when run concurrently with the test case against the CCI PMU
> > > > driver (without migration). The v3.18 and v3.19-rc5 lockups are
> > > > identical:
> > > > 
> > > > INFO: task hpall.sh:1506 blocked for more than 120 seconds.
> > > >       Not tainted 3.19.0-rc5 #9
> > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > hpall.sh        D 804a6ffc     0  1506   1497 0x00000000
> > > > [<804a6ffc>] (__schedule) from [<80022308>] (cpu_hotplug_begin+0xa0/0xac)
> > > > [<80022308>] (cpu_hotplug_begin) from [<8002236c>] (_cpu_up+0x24/0x180)
> > > > [<8002236c>] (_cpu_up) from [<8002253c>] (cpu_up+0x74/0x98)
> > > > [<8002253c>] (cpu_up) from [<802bce60>] (device_online+0x64/0x90)
> > > > [<802bce60>] (device_online) from [<802bcef4>] (online_store+0x68/0x74)
> > > > [<802bcef4>] (online_store) from [<8014059c>] (kernfs_fop_write+0xbc/0x1a0)
> > > > [<8014059c>] (kernfs_fop_write) from [<800e71b0>] (vfs_write+0xa0/0x1ac)
> > > > [<800e71b0>] (vfs_write) from [<800e7808>] (SyS_write+0x44/0x9c)
> > > > [<800e7808>] (SyS_write) from [<8000e560>] (ret_fast_syscall+0x0/0x48)
> > > > 7 locks held by hpall.sh/1506:
> > > >  #0:  (sb_writers#6){.+.+.+}, at: [<800e729c>] vfs_write+0x18c/0x1ac
> > > >  #1:  (&of->mutex){+.+.+.}, at: [<8014052c>] kernfs_fop_write+0x4c/0x1a0
> > > >  #2:  (s_active#15){.+.+.+}, at: [<80140534>] kernfs_fop_write+0x54/0x1a0
> > > >  #3:  (device_hotplug_lock){+.+.+.}, at: [<802bbe44>] lock_device_hotplug_sysfs+0xc/0x4c
> > > >  #4:  (&dev->mutex){......}, at: [<802bce14>] device_online+0x18/0x90
> > > >  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<80022508>] cpu_up+0x40/0x98
> > > >  #6:  (cpu_hotplug.lock){++++++}, at: [<80022268>] cpu_hotplug_begin+0x0/0xac
> > > > 
> > > > I guess that lockup is my fundamental issue, and with your patch the
> > > > perf_rwsem manages to spread a transitive dependency on one of those
> > > > locks all over the perf subsystem. I haven't considered that in great
> > > > detail, however.
> > > 
> > > I found that I couldn't trigger the issue with v3.17, and I was able to
> > > bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
> > > CPU hotplug and expedited grace periods").
> > > 
> > > I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
> > > misled me.
> > 
> > That seems to be solid, and I think I see what's going on.
> > 
> > The task doing hotplug (hpall.sh:1506) gets to cpu_hotplug_begin(), and
> > sets cpu_hotplug.active_writer to current (I assume writes to this are
> > protected by cpu_add_remove_lock from cpu_up()?). Then it loops, acquiring
> > cpu_hotplug.lock and testing the refcount, and if non-zero dropping the
> > lock and going into uninterruptible sleep, expecting to be woken by
> > put_online_cpus().
> > 
> > Concurrently, a task holding the refcount non-zero calls
> > put_online_cpus() and finds cpu_hotplug.lock contended. It therefore
> > increments cpu_hotplug.puts_pending and goes off on its merry way,
> > without trying to wake the writer.
> > 
> > So the writer is never woken and never gets to handle the non-zero
> > cpu_hotplug.puts_pending.
> > 
> > I'm not sure what the right fix for that is. It looks like the writer
> > could observe the change to puts_pending and finish, so
> > cpu_hotplug.active_writer could change under our feet unless we hold
> > cpu_hotplug.lock. But holding that lock would reintroduce the deadlock
> > b2c4623dcd07af4b was trying to avoid.
> > 
> > Any ideas?
> 
> You need 87af9e7ff9d90 (hotplugcpu: Avoid deadlocks by waking active_writer),
> which is in -rcu at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> 
> With some luck, this will be in -tip soon, and hit mainline during
> the next merge window.

Thanks Paul, that fixes the issue for me.

Peter, with that fix applied in addition to your patch, I don't see the
CCI PMU code exploding around hotplug, even with event migration hacked
into the driver.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
  2015-01-21 12:08               ` Mark Rutland
@ 2015-01-21 20:07                 ` Paul E. McKenney
  0 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2015-01-21 20:07 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Peter Zijlstra, Jiri Olsa, Vince Weaver, Ingo Molnar, Andi Kleen,
	linux-kernel, Linus Torvalds, Jiri Kosina, Borislav Petkov,
	Will Deacon

On Wed, Jan 21, 2015 at 12:08:12PM +0000, Mark Rutland wrote:
> [...]
> 
> > > > > On vanilla v3.19-rc5 and vanilla v3.18, I'm able to get my hotplug
> > > > > script hung when run concurrently with the test case against the CCI PMU
> > > > > driver (without migration). The v3.18 and v3.19-rc5 lockups are
> > > > > identical:
> > > > > 
> > > > > INFO: task hpall.sh:1506 blocked for more than 120 seconds.
> > > > >       Not tainted 3.19.0-rc5 #9
> > > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > hpall.sh        D 804a6ffc     0  1506   1497 0x00000000
> > > > > [<804a6ffc>] (__schedule) from [<80022308>] (cpu_hotplug_begin+0xa0/0xac)
> > > > > [<80022308>] (cpu_hotplug_begin) from [<8002236c>] (_cpu_up+0x24/0x180)
> > > > > [<8002236c>] (_cpu_up) from [<8002253c>] (cpu_up+0x74/0x98)
> > > > > [<8002253c>] (cpu_up) from [<802bce60>] (device_online+0x64/0x90)
> > > > > [<802bce60>] (device_online) from [<802bcef4>] (online_store+0x68/0x74)
> > > > > [<802bcef4>] (online_store) from [<8014059c>] (kernfs_fop_write+0xbc/0x1a0)
> > > > > [<8014059c>] (kernfs_fop_write) from [<800e71b0>] (vfs_write+0xa0/0x1ac)
> > > > > [<800e71b0>] (vfs_write) from [<800e7808>] (SyS_write+0x44/0x9c)
> > > > > [<800e7808>] (SyS_write) from [<8000e560>] (ret_fast_syscall+0x0/0x48)
> > > > > 7 locks held by hpall.sh/1506:
> > > > >  #0:  (sb_writers#6){.+.+.+}, at: [<800e729c>] vfs_write+0x18c/0x1ac
> > > > >  #1:  (&of->mutex){+.+.+.}, at: [<8014052c>] kernfs_fop_write+0x4c/0x1a0
> > > > >  #2:  (s_active#15){.+.+.+}, at: [<80140534>] kernfs_fop_write+0x54/0x1a0
> > > > >  #3:  (device_hotplug_lock){+.+.+.}, at: [<802bbe44>] lock_device_hotplug_sysfs+0xc/0x4c
> > > > >  #4:  (&dev->mutex){......}, at: [<802bce14>] device_online+0x18/0x90
> > > > >  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<80022508>] cpu_up+0x40/0x98
> > > > >  #6:  (cpu_hotplug.lock){++++++}, at: [<80022268>] cpu_hotplug_begin+0x0/0xac
> > > > > 
> > > > > I guess that lockup is my fundamental issue, and with your patch the
> > > > > perf_rwsem manages to spread a transitive dependency on one of those
> > > > > locks all over the perf subsystem. I haven't considered that in great
> > > > > detail, however.
> > > > 
> > > > I found that I couldn't trigger the issue with v3.17, and I was able to
> > > > bisect down to commit b2c4623dcd07af4b ("rcu: More on deadlock between
> > > > CPU hotplug and expedited grace periods").
> > > > 
> > > > I'm currently stressing b2c4623dcd07af4b~1 to make sure my bisect hasn't
> > > > misled me.
> > > 
> > > That seems to be solid, and I think I see what's going on.
> > > 
> > > The task doing hotplug (hpall.sh:1506) gets to cpu_hotplug_begin(), and
> > > sets cpu_hotplug.active_writer to current (I assume writes to this are
> > > protected by cpu_add_remove_lock from cpu_up()?). Then it loops, acquiring
> > > cpu_hotplug.lock and testing the refcount, and if non-zero dropping the
> > > lock and going into uninterruptible sleep, expecting to be woken by
> > > put_online_cpus().
> > > 
> > > Concurrently, a task holding the refcount non-zero calls
> > > put_online_cpus() and finds cpu_hotplug.lock contended. It therefore
> > > increments cpu_hotplug.puts_pending and goes off on its merry way,
> > > without trying to wake the writer.
> > > 
> > > So the writer is never woken and never gets to handle the non-zero
> > > cpu_hotplug.puts_pending.
> > > 
> > > I'm not sure what the right fix for that is. It looks like the writer
> > > could observe the change to puts_pending and finish, so
> > > cpu_hotplug.active_writer could change under our feet unless we hold
> > > cpu_hotplug.lock. But holding that lock would reintroduce the deadlock
> > > b2c4623dcd07af4b was trying to avoid.
> > > 
> > > Any ideas?
> > 
> > You need 87af9e7ff9d90 (hotplugcpu: Avoid deadlocks by waking active_writer),
> > which is in -rcu at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> > 
> > With some luck, this will be in -tip soon, and hit mainline during
> > the next merge window.
> 
> Thanks Paul, that fixes the issue for me.

Good to hear!  As luck would have it, it is already in -tip, so I cannot
apply your Tested-by.  :-(

							Thanx, Paul

> Peter, with that fix applied in addition to your patch, I don't see the
> CCI PMU code exploding around hotplug, even with event migration hacked
> into the driver.
> 
> Thanks,
> Mark.
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: perf fuzzer crash [PATCH] perf: Get group events reference before moving the group
@ 2015-01-19 18:09 Vince Weaver
  0 siblings, 0 replies; 14+ messages in thread
From: Vince Weaver @ 2015-01-19 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, Ingo Molnar, Andi Kleen, linux-kernel, mark.rutland,
	Linus Torvalds

On Sun, 18 Jan 2015, Vince Weaver wrote:

> On Fri, 16 Jan 2015, Vince Weaver wrote:
> 
> I've continued fuzzing all weekend, with PeterZ's patch 
> applied to 3.19-rc4.  It has managed to stay up without crashing, 
> impressive.   

That was on my core2 system.

I tried on my Haswell system running 3.19-rc5 plus PeterZ's patch, and it
crashes nearly instantly, possibly because I have a lot of gratuitous
debugging options enabled from trying to debug fuzzer-found issues in
the past.

Here's what I get:

Stopping after 30000
Watchdog enabled with timeout 60s
Will auto-exit if signal storm detected

*** perf_fuzzer 0.29 *** by Vince Weaver

	Linux version 3.19.0-rc5+ x86_64
	Processor: Intel 6/60/3

	Seeding random number generator with 1421689769
	/proc/sys/kernel/perf_event_max_sample_rate currently: 100000/s
	/proc/sys/kernel/perf_event_paranoid currently: 0
	Logging perf_event_open() failures: no
	Running fsync after every syscall: no
	To reproduce, try: ./perf_fuzzer -s 30000 -r 1421689769

Pid=2303, sleeping 1s
==================================================
Fuzzing the following syscalls:
	mmap perf_event_open close read write ioct

[  775.544727] ------------[ cut here ]------------
[  775.550944] WARNING: CPU: 4 PID: 2305 at kernel/events/core.c:151 perf_remove_from_context+0x167/0x180()
[  775.558422] ------------[ cut here ]------------
[  775.558424] WARNING: CPU: 5 PID: 2306 at kernel/events/core.c:151 perf_remove_from_context+0x167/0x180()
[  775.558441] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller snd_hda_codec ghash_clmulni_intel snd_hwdep i915 aesni_intel snd_pcm aes_x86_64 snd_timer lrw iTCO_wdt iTCO_vendor_support evdev gf128mul snd glue_helper psmouse ppdev xhci_pci ablk_helper pcspkr serio_raw parport_pc soundcore cryptd xhci_hcd lpc_ich tpm_tis mei_me mei tpm i2c_i801 drm_kms_helper parport mfd_core battery video drm wmi i2c_algo_bit button processor sg sr_mod cdrom sd_mod ehci_pci ahci ehci_hcd libahci libata e1000e ptp scsi_mod crc32c_intel usbcore usb_common pps_core fan thermal thermal_sys
[  775.558442] CPU: 5 PID: 2306 Comm: perf_fuzzer Not tainted 3.19.0-rc5+ #123
[  775.558443] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  775.558444]  ffffffff81a3ef6f ffff8800cd9c3bc8 ffffffff816b62b1 0000000000000000
[  775.558445]  0000000000000000 ffff8800cd9c3c08 ffffffff8106dcba 0000000000000286
[  775.558446]  ffff8800cf4a49d0 0000000000000001 ffff8800ce9b3800 ffff8800365de800
[  775.558446] Call Trace:
[  775.558450]  [<ffffffff816b62b1>] dump_stack+0x45/0x57
[  775.558452]  [<ffffffff8106dcba>] warn_slowpath_common+0x8a/0xc0
[  775.558453]  [<ffffffff8106ddaa>] warn_slowpath_null+0x1a/0x20
[  775.558454]  [<ffffffff81157637>] perf_remove_from_context+0x167/0x180
[  775.558456]  [<ffffffff8115ea2f>] perf_event_exit_task+0x2bf/0x330
[  775.558458]  [<ffffffff81070786>] do_exit+0x326/0xac0
[  775.558460]  [<ffffffff8107cc79>] ? get_signal+0xe9/0x770
[  775.558461]  [<ffffffff8107cee9>] ? get_signal+0x359/0x770
[  775.558462]  [<ffffffff81070fc4>] do_group_exit+0x54/0xe0
[  775.558463]  [<ffffffff8107ce26>] get_signal+0x296/0x770
[  775.558465]  [<ffffffff8105ded2>] ? __do_page_fault+0x1f2/0x580
[  775.558467]  [<ffffffff81013578>] do_signal+0x28/0xbb0
[  775.558469]  [<ffffffff8105e282>] ? do_page_fault+0x22/0x30
[  775.558470]  [<ffffffff81014170>] do_notify_resume+0x70/0x90
[  775.558472]  [<ffffffff816bf1a2>] retint_signal+0x48/0x86
[  775.558472] ---[ end trace 667f9b8301b0c838 ]---
[  775.807809] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller snd_hda_codec ghash_clmulni_intel snd_hwdep i915 aesni_intel snd_pcm aes_x86_64 snd_timer lrw iTCO_wdt iTCO_vendor_support evdev gf128mul snd glue_helper psmouse ppdev xhci_pci ablk_helper pcspkr serio_raw parport_pc soundcore cryptd xhci_hcd lpc_ich tpm_tis mei_me mei tpm i2c_i801 drm_kms_helper parport mfd_core battery video drm wmi i2c_algo_bit button processor sg sr_mod cdrom sd_mod ehci_pci ahci ehci_hcd libahci libata e1000e ptp scsi_mod crc32c_intel usbcore usb_common pps_core fan thermal thermal_sys
[  775.879242] CPU: 4 PID: 2305 Comm: perf_fuzzer Tainted: G        W      3.19.0-rc5+ #123
[  775.887958] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  775.895894]  ffffffff81a3ef6f ffff8800cefd7bc8 ffffffff816b62b1 0000000000000000
[  775.903977]  0000000000000000 ffff8800cefd7c08 ffffffff8106dcba 0000000000000286
[  775.911979]  ffff880036c84990 0000000000000001 ffff8800cdaa20f0 ffff8800cd74d000
[  775.920009] Call Trace:
[  775.922613]  [<ffffffff816b62b1>] dump_stack+0x45/0x57
[  775.928145]  [<ffffffff8106dcba>] warn_slowpath_common+0x8a/0xc0
[  775.934603]  [<ffffffff8106ddaa>] warn_slowpath_null+0x1a/0x20
[  775.940839]  [<ffffffff81157637>] perf_remove_from_context+0x167/0x180
[  775.947856]  [<ffffffff8115ea2f>] perf_event_exit_task+0x2bf/0x330
[  775.954480]  [<ffffffff81070786>] do_exit+0x326/0xac0
[  775.959914]  [<ffffffff8107cc79>] ? get_signal+0xe9/0x770
[  775.965724]  [<ffffffff8107cee9>] ? get_signal+0x359/0x770
[  775.971610]  [<ffffffff81070fc4>] do_group_exit+0x54/0xe0
[  775.977385]  [<ffffffff8107ce26>] get_signal+0x296/0x770
[  775.983076]  [<ffffffff8105ded2>] ? __do_page_fault+0x1f2/0x580
[  775.989402]  [<ffffffff81013578>] do_signal+0x28/0xbb0
[  775.994924]  [<ffffffff8105e282>] ? do_page_fault+0x22/0x30
[  776.000933]  [<ffffffff81014170>] do_notify_resume+0x70/0x90
[  776.006995]  [<ffffffff816bf1a2>] retint_signal+0x48/0x86
[  776.012792] ---[ end trace 667f9b8301b0c839 ]---
[  782.600462] ------------[ cut here ]------------
[  782.605452] WARNING: CPU: 2 PID: 0 at arch/x86/kernel/cpu/perf_event.c:1206 x86_pmu_del+0x9c/0x140()
[  782.616019] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller snd_hda_codec ghash_clmulni_intel snd_hwdep i915 aesni_intel snd_pcm aes_x86_64 snd_timer lrw iTCO_wdt iTCO_vendor_support evdev gf128mul snd glue_helper psmouse ppdev xhci_pci ablk_helper pcspkr serio_raw parport_pc soundcore cryptd xhci_hcd lpc_ich tpm_tis mei_me mei tpm i2c_i801 drm_kms_helper parport mfd_core battery video drm wmi i2c_algo_bit button processor sg sr_mod cdrom sd_mod ehci_pci ahci ehci_hcd libahci libata e1000e ptp scsi_mod crc32c_intel usbcore usb_common pps_core fan thermal thermal_sys
[  782.690685] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W      3.19.0-rc5+ #123
[  782.699771] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  782.708545]  ffffffff81a25cf8 ffff88011ea83dc8 ffffffff816b62b1 0000000000000000
[  782.717468]  0000000000000000 ffff88011ea83e08 ffffffff8106dcba ffff88011ea9cee0
[  782.726334]  0000000000000001 ffff88011ea8bd40 ffff880118f6b800 ffff88011ea9cee0
[  782.735206] Call Trace:
[  782.738686]  <IRQ>  [<ffffffff816b62b1>] dump_stack+0x45/0x57
[  782.745740]  [<ffffffff8106dcba>] warn_slowpath_common+0x8a/0xc0
[  782.753074]  [<ffffffff8106ddaa>] warn_slowpath_null+0x1a/0x20
[  782.760154]  [<ffffffff8102a8ec>] x86_pmu_del+0x9c/0x140
[  782.766662]  [<ffffffff811585b6>] event_sched_out.isra.73+0xf6/0x240
[  782.774218]  [<ffffffff81158828>] ? __perf_event_disable+0x58/0x140
[  782.781625]  [<ffffffff81158771>] group_sched_out+0x71/0xd0
[  782.788348]  [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
[  782.795503]  [<ffffffff811588d6>] __perf_event_disable+0x106/0x140
[  782.802842]  [<ffffffff810ea589>] ? tick_nohz_irq_exit+0x29/0x30
[  782.810013]  [<ffffffff811541b0>] remote_function+0x50/0x60
[  782.816740]  [<ffffffff810ef762>] flush_smp_call_function_queue+0x62/0x140
[  782.824793]  [<ffffffff810efd83>] generic_smp_call_function_single_interrupt+0x13/0x60
[  782.834039]  [<ffffffff81046e28>] smp_trace_call_function_single_interrupt+0x38/0xc0
[  782.843107]  [<ffffffff816bf83d>] trace_call_function_single_interrupt+0x6d/0x80
[  782.851772]  <EOI>  [<ffffffff81553ca5>] ? cpuidle_enter_state+0x65/0x160
[  782.859843]  [<ffffffff81553c91>] ? cpuidle_enter_state+0x51/0x160
[  782.867195]  [<ffffffff81553e87>] cpuidle_enter+0x17/0x20
[  782.873731]  [<ffffffff810aebc1>] cpu_startup_entry+0x311/0x3c0
[  782.880799]  [<ffffffff810476b0>] start_secondary+0x140/0x150
[  782.887705] ---[ end trace 667f9b8301b0c83a ]---
[  782.948582] ------------[ cut here ]------------
[  782.955444] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/perf_event.c:1161 x86_pmu_stop+0xca/0xe0()
[  783.062026] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W      3.19.0-rc5+ #123
[  783.071617] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  783.080904]  ffffffff81a25cf8(#12 0/18)(#13 0 ffff88011ea43c58 ffffffff816b62b1/13)(#14 0/22)(> 0000000000000000
[  783.093032]  0000000000000000 ffff88011ea43c98ose attempts: 88 ffffffff8106dcba 0000000000000000
[  783.103833]  ffff88011ea4bd40 ffff880118f6b80084 0000000000000004 ffff88011ea4c270
[  783.115993] Call Trace:
[  783.120017]  <IRQ> [<ffffffff816b62b1>] dump_stack+0x45/0x57
[  783.128954]  [<ffffffff8106dcba>] warn_slowpath_common+0x8a/0xc0
[  783.136775]  [<ffffffff8106ddaa>] warn_slowpath_null+0x1a/0x20
[  783.144435]  [<ffffffff8102a83a>] x86_pmu_stop+0xca/0xe0
[  783.151481]  [<ffffffff8102b83d>] x86_pmu_enable+0xad/0x310
[  783.158848]  [<ffffffff811584ba>] perf_pmu_enable+0x2a/0x30
[  783.166222]  [<ffffffff81029e08>] x86_pmu_commit_txn+0x78/0xa0
[  783.173901]  [<ffffffff8141357b>] ? debug_object_activate+0x14b/0x1e0
[  783.182167]  [<ffffffff810b58df>] ? __lock_acquire.isra.31+0x3af/0xfe0
[  783.190548]  [<ffffffff810b4758>] ? __lock_is_held+0x58/0x80
[  783.198022]  [<ffffffff81159340>] ? event_sched_in.isra.75+0x180/0x280
[  783.206395]  [<ffffffff811595f8>] group_sched_in+0x1b8/0x1e0
[  783.213862]  [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
[  783.221687]  [<ffffffff81159d0c>] __perf_event_enable+0x25c/0x2a0
[  783.229620]  [<ffffffff810ea589>] ? tick_nohz_irq_exit+0x29/0x30
[  783.237440]  [<ffffffff811541b0>] remote_function+0x50/0x60
[  783.244840]  [<ffffffff810ef762>] flush_smp_call_function_queue+0x62/0x140
[  783.253564]  [<ffffffff810efd83>] generic_smp_call_function_single_interrupt+0x13/0x60
[  783.263453]  [<ffffffff81046e28>] smp_trace_call_function_single_interrupt+0x38/0xc0
[  783.273176]  [<ffffffff816bf83d>] trace_call_function_single_interrupt+0x6d/0x80
[  783.282518]  <EOI>  full: 0[<ffffffff81553ca5>] ? cpuidle_enter_state+0x65/0x160
[  783.292114]  [<ffffffff81553c91>] ? cpuidle_enter_state+0x51/0x160
[  783.299502]  [<ffffffff81553e87>] cpuidle_enter+0x17/0x20
[  783.306059]  [<ffffffff810aebc1>] cpu_startup_entry+0x311/0x3c0
[  783.313107]  [<ffffffff810476b0>] start_secondary+0x140/0x150
[  783.320026] ---[ end trace 667f9b8301b0c83b ]---
[  783.496936] ------------[ cut here ]------------
[  783.502576] WARNING: CPU: 1 PID: 2783 at arch/x86/kernel/cpu/perf_event.c:1079 x86_pmu_start+0xb2/0x120()
[  783.513534] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller snd_hda_codec ghash_clmulni_intel snd_hwdep i915 aesni_intel snd_pcm aes_x86_64 snd_timer lrw iTCO_wdt iTCO_vendor_support evdev gf128mul snd glue_helper psmouse ppdev xhci_pci ablk_helper pcspkr serio_raw parport_pc soundcore cryptd xhci_hcd lpc_ich tpm_tis mei_me mei tpm i2c_i801 drm_kms_helper parport mfd_core battery video drm wmi i2c_algo_bit button processor sg sr_mod cdrom sd_mod ehci_pci ahci ehci_hcd libahci libata e1000e ptp scsi_mod crc32c_intel usbcore usb_common pps_core fan thermal thermal_sys
[  783.587991] CPU: 1 PID: 2783 Comm: perf_fuzzer Tainted: G        W      3.19.0-rc5+ #123
[  783.597490] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  783.606217]  ffffffff81a25cf8 ffff88011ea43c58 ffffffff816b62b1 0000000000000000
[  783.615030]  0000000000000000 ffff88011ea43c98 ffffffff8106dcba ffff88011ea43ca8
[  783.623826]  ffff88011ea4bd40 ffff880118f6b800 0000000000000002 ffff88011ea4bf6c
[  783.632683] Call Trace:
[  783.636123]  <IRQ>  [<ffffffff816b62b1>] dump_stack+0x45/0x57
[  783.643160]  [<ffffffff8106dcba>] warn_slowpath_common+0x8a/0xc0
[  783.650449]  [<ffffffff8106ddaa>] warn_slowpath_null+0x1a/0x20
[  783.657523]  [<ffffffff8102b282>] x86_pmu_start+0xb2/0x120
[  783.664225]  [<ffffffff8102ba25>] x86_pmu_enable+0x295/0x310
[  783.671072]  [<ffffffff811584ba>] perf_pmu_enable+0x2a/0x30
[  783.677867]  [<ffffffff81029e08>] x86_pmu_commit_txn+0x78/0xa0
[  783.684940]  [<ffffffff81413db6>] ? debug_check_no_obj_freed+0x186/0x220
[  783.692936]  [<ffffffff810b58df>] ? __lock_acquire.isra.31+0x3af/0xfe0
[  783.700759]  [<ffffffff810b4758>] ? __lock_is_held+0x58/0x80
[  783.707650]  [<ffffffff81159340>] ? event_sched_in.isra.75+0x180/0x280
[  783.715443]  [<ffffffff811595f8>] group_sched_in+0x1b8/0x1e0
[  783.722296]  [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
[  783.729510]  [<ffffffff81159d0c>] __perf_event_enable+0x25c/0x2a0
[  783.736824]  [<ffffffff811541b0>] remote_function+0x50/0x60
[  783.743600]  [<ffffffff810ef762>] flush_smp_call_function_queue+0x62/0x140
[  783.751778]  [<ffffffff810efd83>] generic_smp_call_function_single_interrupt+0x13/0x60
[  783.761049]  [<ffffffff81046e28>] smp_trace_call_function_single_interrupt+0x38/0xc0
[  783.770165]  [<ffffffff816bf83d>] trace_call_function_single_interrupt+0x6d/0x80
[  783.778835]  <EOI> 
[  783.780912] ---[ end trace 667f9b8301b0c83c ]---
[  784.067436] BUG: unable to handle kernel paging request at 000000000000cc68
[  784.075656] IP: [<ffffffff81032bb2>] rapl_event_update+0x82/0xb0
[  784.082789] PGD 0 
[  784.085640] Oops: 0000 [#1] SMP 
[  784.089821] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_controller snd_hda_codec ghash_clmulni_intel snd_hwdep i915 aesni_intel snd_pcm aes_x86_64 snd_timer lrw iTCO_wdt iTCO_vendor_support evdev gf128mul snd glue_helper psmouse ppdev xhci_pci ablk_helper pcspkr serio_raw parport_pc soundcore cryptd xhci_hcd lpc_ich tpm_tis mei_me mei tpm i2c_i801 drm_kms_helper parport mfd_core battery video drm wmi i2c_algo_bit button processor sg sr_mod cdrom sd_mod ehci_pci ahci ehci_hcd libahci libata e1000e ptp scsi_mod crc32c_intel usbcore usb_common pps_core fan thermal thermal_sys
[  784.164146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W      3.19.0-rc5+ #123
[  784.173122] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[  784.181834] task: ffff880119470390 ti: ffff880119474000 task.ti: ffff880119474000
[  784.190665] RIP: 0010:[<ffffffff81032bb2>]  [<ffffffff81032bb2>] rapl_event_update+0x82/0xb0
[  784.200530] RSP: 0000:ffff88011eb43dd0  EFLAGS: 00010046
[  784.207075] RAX: 0000000003f8077a RBX: ffff88011eb43ddc RCX: 0000000000000020
[  784.215532] RDX: 0000000000000000 RSI: 0000000003f8077a RDI: 0000000000000000
[  784.224001] RBP: ffff88011eb43e08 R08: 0000000000000000 R09: 0000000000000090
[  784.232471] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000003f80574
[  784.240952] R13: ffff8800ce2c81a0 R14: ffff8800ce2c8000 R15: 0000000003f8077a
[  784.249414] FS:  0000000000000000(0000) GS:ffff88011eb40000(0000) knlGS:0000000000000000
[  784.258911] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  784.265926] CR2: 000000000000cc68 CR3: 0000000001c13000 CR4: 00000000001407e0
[  784.274389] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  784.282869] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  784.291350] Stack:
[  784.294326]  ffff8800cda1cbe0 00000000cda1cb80 ffff8800ce2c8000 ffff8800cda1cb80
[  784.303169]  0000000000000086 0000000000000004 ffffe8ffffd43f54 ffff88011eb43e38
[  784.312047]  ffffffff81032db8 ffff8800ce2c8000 ffffe8ffffd43d10 000000b68e0eb5a0
[  784.320889] Call Trace:
[  784.324325]  <IRQ> 
[  784.326416]  [<ffffffff81032db8>] rapl_pmu_event_stop+0x98/0x120
[  784.334741]  [<ffffffff81032e53>] rapl_pmu_event_del+0x13/0x20
[  784.341796]  [<ffffffff811585b6>] event_sched_out.isra.73+0xf6/0x240
[  784.349420]  [<ffffffff81158828>] ? __perf_event_disable+0x58/0x140
[  784.356927]  [<ffffffff8115874d>] group_sched_out+0x4d/0xd0
[  784.363716]  [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
[  784.370964]  [<ffffffff811588d6>] __perf_event_disable+0x106/0x140
[  784.378366]  [<ffffffff810ea589>] ? tick_nohz_irq_exit+0x29/0x30
[  784.385560]  [<ffffffff811541b0>] remote_function+0x50/0x60
[  784.392245]  [<ffffffff810ef762>] flush_smp_call_function_queue+0x62/0x140
[  784.400323]  [<ffffffff810efd83>] generic_smp_call_function_single_interrupt+0x13/0x60
[  784.409466]  [<ffffffff81046e28>] smp_trace_call_function_single_interrupt+0x38/0xc0
[  784.418418]  [<ffffffff816bf83d>] trace_call_function_single_interrupt+0x6d/0x80
[  784.426989]  <EOI> 
[  784.429083]  [<ffffffff81553ca5>] ? cpuidle_enter_state+0x65/0x160
[  784.437316]  [<ffffffff81553c91>] ? cpuidle_enter_state+0x51/0x160
[  784.444623]  [<ffffffff81553e87>] cpuidle_enter+0x17/0x20
[  784.451067]  [<ffffffff810aebc1>] cpu_startup_entry+0x311/0x3c0
[  784.458060]  [<ffffffff810476b0>] start_secondary+0x140/0x150
[  784.464880] Code: 00 00 41 8b be 48 01 00 00 48 89 de e8 d8 42 02 00 66 90 49 89 c7 4c 89 e0 4d 0f b1 7d 00 4c 39 e0 75 d6 4c 89 f8 b9 20 00 00 00 <48> 8b 15 af a0 fd 7e 4c 29 e0 65 8b 52 38 48 98 29 d1 48 d3 e0 
[  784.487063] RIP  [<ffffffff81032bb2>] rapl_event_update+0x82/0xb0
[  784.494312]  RSP <ffff88011eb43dd0>
[  784.498774] CR2: 000000000000cc68
[  784.509591] ---[ end trace 667f9b8301b0c83d ]---
[  784.515268] Kernel panic - not syncing: Fatal exception in interrupt
[  784.522828] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[  784.534493] drm_kms_helper: panic occurred, switching back to text console
[  784.542629] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

and then the machine is locked solid.

Vince

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-01-21 20:07 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-16  7:57 perf fuzzer crash [PATCH] perf: Get group events reference before moving the group Jiri Olsa
2015-01-16 10:46 ` Peter Zijlstra
2015-01-16 14:11   ` Peter Zijlstra
2015-01-16 18:54     ` Vince Weaver
2015-01-19  3:49       ` Vince Weaver
2015-01-18 14:13     ` Ingo Molnar
2015-01-19 14:40     ` Mark Rutland
2015-01-19 17:40       ` Mark Rutland
2015-01-20 13:39         ` Mark Rutland
2015-01-20 14:35           ` Mark Rutland
2015-01-21  1:00             ` Paul E. McKenney
2015-01-21 12:08               ` Mark Rutland
2015-01-21 20:07                 ` Paul E. McKenney
2015-01-19 18:09 Vince Weaver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).