* [PATCH 0/7] perf: Untangle aux refcounting
@ 2015-12-03 10:32 Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 1/7] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin

Hi Peter,

As we discussed, here's a patchset that tweaks aux refcounting so that
freeing stuff in NMI context is no longer possible. Also, as discussed,
I tried to generalize the "unlock context, task_function_call(), lock
context, check, retry" sequence into a helper and converted the
existing offenders to use it instead.
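
For reference, the open-coded sequence that each of those callers
currently carries looks roughly like this (a simplified sketch;
__perf_do_x() and do_x_locked() are stand-ins for the per-call-site
functions):

retry:
	if (!task_function_call(task, __perf_do_x, info))
		return;

	raw_spin_lock_irq(&ctx->lock);
	if (ctx->is_active) {
		raw_spin_unlock_irq(&ctx->lock);
		/*
		 * Reload the task pointer, it might have been changed by
		 * a concurrent perf_event_context_sched_out().
		 */
		task = ctx->task;
		goto retry;
	}

	/* the context is inactive and ctx::lock is held: do it locally */
	do_x_locked(info);
	raw_spin_unlock_irq(&ctx->lock);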

I'm not very happy with the way perf_mmap_close turned out, but I
figured it's an unmap path and it's at least a start. At some point, a
better way of doing things to events on rb::event_list may have to be
devised.

For bisectability purposes, it *may* make sense to bring the last two
patches forward as they should work just fine with or without the
perf_event_stop() trick in mmap_close.

I may have gone slightly over the top with the verbal explanations
around what's happening, but at least we won't have to dig up old IRC
conversations next time it blows up (which it won't do).

Alexander Shishkin (7):
  perf: Refuse to begin aux transaction after aux_mmap_count drops
  perf: Generalize task_function_call()ers
  perf: Add a helper to stop running events
  perf: Free aux pages in unmap path
  perf: Document aux api usage
  perf/x86/intel/pt: Move transaction start/stop to pmu start/stop
    callbacks
  perf/x86/intel/bts: Move transaction start/stop to start/stop
    callbacks

 arch/x86/kernel/cpu/perf_event_intel_bts.c | 105 +++++------
 arch/x86/kernel/cpu/perf_event_intel_pt.c  |  85 +++------
 include/linux/perf_event.h                 |   1 +
 kernel/events/core.c                       | 287 ++++++++++++++++-------------
 kernel/events/internal.h                   |   1 -
 kernel/events/ring_buffer.c                |  52 +++---
 6 files changed, 262 insertions(+), 269 deletions(-)

-- 
2.6.2



* [PATCH 1/7] perf: Refuse to begin aux transaction after aux_mmap_count drops
  2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
@ 2015-12-03 10:32 ` Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 2/7] perf: Generalize task_function_call()ers Alexander Shishkin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin

When the ring buffer's aux area is unmapped and aux_mmap_count drops to
zero, new aux transactions into this buffer can still be started,
even though the buffer is en route to deallocation.

This patch adds a check to perf_aux_output_begin() for aux_mmap_count
being zero, in which case there is no point in starting new transactions.
In other words, ring buffers that have passed a certain point in
perf_mmap_close() will no longer have their events start sending new
data, which clears the path for freeing those buffers' pages right there
and then, provided that no active transactions are still holding the aux
reference.
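
For pmu drivers this simply means that perf_aux_output_begin() can now
also return NULL while the aux area is being torn down; a NULL return
is already treated as "don't start", roughly like this (a sketch along
the lines of the PT driver changes later in this series):

	buf = perf_aux_output_begin(&pt->handle, event);
	if (!buf) {
		/* no usable aux buffer: leave the event stopped */
		event->hw.state = PERF_HES_STOPPED;
		return;
	}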

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/ring_buffer.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index adfdc05361..5709cc222f 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -288,6 +288,13 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 		goto err;
 
 	/*
+	 * If rb::aux_mmap_count is zero (and rb_has_aux() above went through),
+	 * the aux buffer is in perf_mmap_close(), about to get free'd.
+	 */
+	if (!atomic_read(&rb->aux_mmap_count))
+		goto err;
+
+	/*
 	 * Nesting is not supported for AUX area, make sure nested
 	 * writers are caught early
 	 */
-- 
2.6.2



* [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 1/7] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
@ 2015-12-03 10:32 ` Alexander Shishkin
  2015-12-03 17:34   ` Peter Zijlstra
  2015-12-03 10:32 ` [PATCH 3/7] perf: Add a helper to stop running events Alexander Shishkin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin

A number of places in perf core (perf_event_enable, perf_event_disable,
perf_install_in_context, perf_remove_from_context) perform the same
tapdance around task_function_call() to modify an event and/or its
context on its target cpu. The sequence boils down to re-trying
task_function_call() until it either succeeds, is no longer necessary,
or the event's context has gone inactive, in which case the action can
be performed on the caller's cpu, provided the context lock is held.
The complication all callers have to deal with is that they must
release the context lock before calling task_function_call(), which
gives the context in question a chance to get scheduled out; they then
have to re-acquire the lock, check whether that happened and, if so,
re-try the cross-call.

This patch adds a more generic helper that either executes a callback
on the target cpu or returns with ctx::lock held, so that the caller
can perform its action on an inactive context. It also converts
the existing users of this sequence to the new helper.
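
The calling convention is: a zero return means the cross-call did the
work; a non-zero return means the helper came back with ctx::lock held
(either the context went inactive or the event left the retry state),
and the caller finishes the job locally before dropping the lock.
Roughly what the perf_event_disable() conversion below boils down to:

	if (!remote_call_or_ctx_lock(event, __perf_event_disable, event,
				     PERF_EVENT_STATE_ACTIVE))
		return;

	/*
	 * Since we have the lock this context can't be scheduled
	 * in, so we can change the state safely.
	 */
	if (event->state == PERF_EVENT_STATE_INACTIVE) {
		update_group_times(event);
		event->state = PERF_EVENT_STATE_OFF;
	}
	raw_spin_unlock_irq(&ctx->lock);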

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h |   1 +
 kernel/events/core.c       | 190 +++++++++++++++------------------------------
 2 files changed, 64 insertions(+), 127 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f9828a48f1..55290d609e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -402,6 +402,7 @@ enum perf_event_active_state {
 	PERF_EVENT_STATE_OFF		= -1,
 	PERF_EVENT_STATE_INACTIVE	=  0,
 	PERF_EVENT_STATE_ACTIVE		=  1,
+	PERF_EVENT_STATE_NONE		=  2,
 };
 
 struct file;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5854fcf7f0..0d3296f600 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -126,6 +126,59 @@ static int cpu_function_call(int cpu, remote_function_f func, void *info)
 	return data.ret;
 }
 
+static int
+remote_call_or_ctx_lock(struct perf_event *event, remote_function_f func,
+			void *info, enum perf_event_active_state retry_state)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct task_struct *task = ctx->task;
+	unsigned int retry = 1;
+
+	if (!task) {
+		/*
+		 * Per cpu events are removed via an smp call. The removal can
+		 * fail if the CPU is currently offline, but in that case we
+		 * already called __perf_remove_from_context from
+		 * perf_event_exit_cpu.
+		 */
+		cpu_function_call(event->cpu, func, info);
+		return 0;
+	}
+
+	raw_spin_lock_irq(&ctx->lock);
+	do {
+		/*
+		 * Reload the task pointer, it might have been changed by
+		 * a concurrent perf_event_context_sched_out().
+		 */
+		task = ctx->task;
+
+		/*
+		 * If the context is inactive, we don't need a cross call to
+		 * fiddle with the event so long as the ctx::lock is held.
+		 */
+		if (!ctx->is_active)
+			break;
+
+		raw_spin_unlock_irq(&ctx->lock);
+
+		if (!task_function_call(task, func, info))
+			return 0;
+
+		raw_spin_lock_irq(&ctx->lock);
+
+		if (retry_state <= PERF_EVENT_STATE_ACTIVE)
+			retry = event->state == retry_state;
+	} while (retry);
+
+	/*
+	 * No luck: leaving with ctx::lock held and interrupts disabled
+	 * so that the caller can safely fiddle with event or context.
+	 */
+
+	return -ENXIO;
+}
+
 #define EVENT_OWNER_KERNEL ((void *) -1)
 
 static bool is_kernel_event(struct perf_event *event)
@@ -1673,7 +1726,6 @@ static int __perf_remove_from_context(void *info)
 static void perf_remove_from_context(struct perf_event *event, bool detach_group)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct task_struct *task = ctx->task;
 	struct remove_event re = {
 		.event = event,
 		.detach_group = detach_group,
@@ -1681,36 +1733,10 @@ static void perf_remove_from_context(struct perf_event *event, bool detach_group
 
 	lockdep_assert_held(&ctx->mutex);
 
-	if (!task) {
-		/*
-		 * Per cpu events are removed via an smp call. The removal can
-		 * fail if the CPU is currently offline, but in that case we
-		 * already called __perf_remove_from_context from
-		 * perf_event_exit_cpu.
-		 */
-		cpu_function_call(event->cpu, __perf_remove_from_context, &re);
-		return;
-	}
-
-retry:
-	if (!task_function_call(task, __perf_remove_from_context, &re))
+	if (!remote_call_or_ctx_lock(event, __perf_remove_from_context, &re,
+				     PERF_EVENT_STATE_NONE))
 		return;
 
-	raw_spin_lock_irq(&ctx->lock);
-	/*
-	 * If we failed to find a running task, but find the context active now
-	 * that we've acquired the ctx->lock, retry.
-	 */
-	if (ctx->is_active) {
-		raw_spin_unlock_irq(&ctx->lock);
-		/*
-		 * Reload the task pointer, it might have been changed by
-		 * a concurrent perf_event_context_sched_out().
-		 */
-		task = ctx->task;
-		goto retry;
-	}
-
 	/*
 	 * Since the task isn't running, its safe to remove the event, us
 	 * holding the ctx->lock ensures the task won't get scheduled in.
@@ -1778,34 +1804,10 @@ int __perf_event_disable(void *info)
 static void _perf_event_disable(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct task_struct *task = ctx->task;
-
-	if (!task) {
-		/*
-		 * Disable the event on the cpu that it's on
-		 */
-		cpu_function_call(event->cpu, __perf_event_disable, event);
-		return;
-	}
 
-retry:
-	if (!task_function_call(task, __perf_event_disable, event))
+	if (!remote_call_or_ctx_lock(event, __perf_event_disable, event, PERF_EVENT_STATE_ACTIVE))
 		return;
 
-	raw_spin_lock_irq(&ctx->lock);
-	/*
-	 * If the event is still active, we need to retry the cross-call.
-	 */
-	if (event->state == PERF_EVENT_STATE_ACTIVE) {
-		raw_spin_unlock_irq(&ctx->lock);
-		/*
-		 * Reload the task pointer, it might have been changed by
-		 * a concurrent perf_event_context_sched_out().
-		 */
-		task = ctx->task;
-		goto retry;
-	}
-
 	/*
 	 * Since we have the lock this context can't be scheduled
 	 * in, so we can change the state safely.
@@ -2143,42 +2145,16 @@ perf_install_in_context(struct perf_event_context *ctx,
 			struct perf_event *event,
 			int cpu)
 {
-	struct task_struct *task = ctx->task;
-
 	lockdep_assert_held(&ctx->mutex);
 
 	event->ctx = ctx;
 	if (event->cpu != -1)
 		event->cpu = cpu;
 
-	if (!task) {
-		/*
-		 * Per cpu events are installed via an smp call and
-		 * the install is always successful.
-		 */
-		cpu_function_call(cpu, __perf_install_in_context, event);
-		return;
-	}
-
-retry:
-	if (!task_function_call(task, __perf_install_in_context, event))
+	if (!remote_call_or_ctx_lock(event, __perf_install_in_context, event,
+				     PERF_EVENT_STATE_NONE))
 		return;
 
-	raw_spin_lock_irq(&ctx->lock);
-	/*
-	 * If we failed to find a running task, but find the context active now
-	 * that we've acquired the ctx->lock, retry.
-	 */
-	if (ctx->is_active) {
-		raw_spin_unlock_irq(&ctx->lock);
-		/*
-		 * Reload the task pointer, it might have been changed by
-		 * a concurrent perf_event_context_sched_out().
-		 */
-		task = ctx->task;
-		goto retry;
-	}
-
 	/*
 	 * Since the task isn't running, its safe to add the event, us holding
 	 * the ctx->lock ensures the task won't get scheduled in.
@@ -2299,15 +2275,6 @@ unlock:
 static void _perf_event_enable(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct task_struct *task = ctx->task;
-
-	if (!task) {
-		/*
-		 * Enable the event on the cpu that it's on
-		 */
-		cpu_function_call(event->cpu, __perf_event_enable, event);
-		return;
-	}
 
 	raw_spin_lock_irq(&ctx->lock);
 	if (event->state >= PERF_EVENT_STATE_INACTIVE)
@@ -2322,32 +2289,13 @@ static void _perf_event_enable(struct perf_event *event)
 	 */
 	if (event->state == PERF_EVENT_STATE_ERROR)
 		event->state = PERF_EVENT_STATE_OFF;
-
-retry:
-	if (!ctx->is_active) {
-		__perf_event_mark_enabled(event);
-		goto out;
-	}
-
 	raw_spin_unlock_irq(&ctx->lock);
 
-	if (!task_function_call(task, __perf_event_enable, event))
+	if (!remote_call_or_ctx_lock(event, __perf_event_enable, event, PERF_EVENT_STATE_OFF))
 		return;
 
-	raw_spin_lock_irq(&ctx->lock);
-
-	/*
-	 * If the context is active and the event is still off,
-	 * we need to retry the cross-call.
-	 */
-	if (ctx->is_active && event->state == PERF_EVENT_STATE_OFF) {
-		/*
-		 * task could have been flipped by a concurrent
-		 * perf_event_context_sched_out()
-		 */
-		task = ctx->task;
-		goto retry;
-	}
+	if (!ctx->is_active)
+		__perf_event_mark_enabled(event);
 
 out:
 	raw_spin_unlock_irq(&ctx->lock);
@@ -4209,22 +4157,10 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 	task = ctx->task;
 	pe.value = value;
 
-	if (!task) {
-		cpu_function_call(event->cpu, __perf_event_period, &pe);
-		return 0;
-	}
-
-retry:
-	if (!task_function_call(task, __perf_event_period, &pe))
+	if (!remote_call_or_ctx_lock(event, __perf_event_period, &pe,
+				     PERF_EVENT_STATE_NONE))
 		return 0;
 
-	raw_spin_lock_irq(&ctx->lock);
-	if (ctx->is_active) {
-		raw_spin_unlock_irq(&ctx->lock);
-		task = ctx->task;
-		goto retry;
-	}
-
 	__perf_event_period(&pe);
 	raw_spin_unlock_irq(&ctx->lock);
 
-- 
2.6.2



* [PATCH 3/7] perf: Add a helper to stop running events
  2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 1/7] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 2/7] perf: Generalize task_function_call()ers Alexander Shishkin
@ 2015-12-03 10:32 ` Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 4/7] perf: Free aux pages in unmap path Alexander Shishkin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin

This patch adds a helper function that stops running events without
changing their state. The use case at the moment is stopping active AUX
events while their ring buffer's AUX area is getting unmapped. Since we
know that a new AUX transaction can't be started once the ring buffer's
aux_mmap_count drops to zero, the only thing we need to do is stop the
active transactions.

This is done via a cross-cpu call that only acts on events that are
active and currently running on their cpus.
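
The intended caller is the aux unmap path in the next patch, roughly:

	/* rb::aux_mmap_count is already zero, no new transactions can start */
	list_for_each_entry(iter, &stop_list, rb_entry) {
		if (!has_aux(iter))
			continue;

		perf_event_stop(iter);
	}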

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0d3296f600..66f835a2df 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2314,6 +2314,50 @@ void perf_event_enable(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_enable);
 
+static int __perf_event_stop(void *info)
+{
+	int ret = -EINVAL;
+	struct perf_event *event = info;
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+	raw_spin_lock(&ctx->lock);
+	/* scheduler had a chance to do its thing */
+	if (ctx->task && cpuctx->task_ctx != ctx)
+		goto unlock;
+
+	ret = 0;
+	if (event->state < PERF_EVENT_STATE_ACTIVE)
+		goto unlock;
+
+	perf_pmu_disable(event->pmu);
+	event->pmu->stop(event, PERF_EF_UPDATE);
+	perf_pmu_enable(event->pmu);
+
+unlock:
+	raw_spin_unlock(&ctx->lock);
+
+	return ret;
+}
+
+/*
+ * Stop an event without touching its state;
+ * useful if you know that it won't get restarted when you let go of
+ * the context lock, such as aux path in perf_mmap_close().
+ */
+static void perf_event_stop(struct perf_event *event)
+{
+	struct perf_event_context *ctx;
+
+	ctx = perf_event_ctx_lock(event);
+
+	if (remote_call_or_ctx_lock(event, __perf_event_stop, event,
+				    PERF_EVENT_STATE_NONE))
+		raw_spin_unlock_irq(&event->ctx->lock);
+
+	perf_event_ctx_unlock(event, ctx);
+}
+
 static int _perf_event_refresh(struct perf_event *event, int refresh)
 {
 	/*
-- 
2.6.2



* [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
                   ` (2 preceding siblings ...)
  2015-12-03 10:32 ` [PATCH 3/7] perf: Add a helper to stop running events Alexander Shishkin
@ 2015-12-03 10:32 ` Alexander Shishkin
  2015-12-04 17:02   ` Peter Zijlstra
  2015-12-03 10:32 ` [PATCH 5/7] perf: Document aux api usage Alexander Shishkin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin

Now that we can ensure that no new transactions start once the ring
buffer's aux area is on its way to getting unmapped, and we have the
means to stop the transactions that are already running, we can stop
the events on this ring buffer's event list and then safely free the
aux pages and the corresponding pmu data, since at that point this path
is guaranteed to be the last aux reference holder. This partially
reverts 57ffc5ca679 ("perf: Fix AUX buffer refcounting"), which was done
to defer a deallocation that could otherwise happen in NMI context.
That is no longer the case: the last rb_free_aux() call, the one that
drops the last AUX reference, now has to happen in perf_mmap_close()
on that AUX area.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c        | 53 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |  1 -
 kernel/events/ring_buffer.c | 37 ++++++++++---------------------
 3 files changed, 63 insertions(+), 28 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 66f835a2df..10fce18710 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4630,11 +4630,62 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	 */
 	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
 	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
+		struct perf_event *iter;
+		LIST_HEAD(stop_list);
+		unsigned long flags;
+
+		/*
+		 * Stop all aux events that are writing to this here buffer,
+		 * so that we can free its aux pages and corresponding pmu
+		 * data. Note that after rb::aux_mmap_count dropped to zero,
+		 * they won't start any more (see perf_aux_output_begin()).
+		 *
+		 * Since we can't take ctx::mutex under rb::event_lock, we
+		 * need to jump through hoops to get there, namely fish out
+		 * all events from rb::event_list onto an on-stack list,
+		 * carry out the stopping and splice this on-stack list back
+		 * to rb::event_list.
+		 * This means that these events will miss wakeups during this
+		 * window, but since it's mmap_close, assume the consumer
+		 * doesn't care any more.
+		 *
+		 * Note: list_splice_init_rcu() doesn't cut it, since it syncs
+		 * and rb::event_lock is a spinlock.
+		 */
+retry:
+		spin_lock_irqsave(&rb->event_lock, flags);
+		list_for_each_entry_rcu(iter, &rb->event_list, rb_entry) {
+			list_del_rcu(&iter->rb_entry);
+			spin_unlock_irqrestore(&rb->event_lock, flags);
+
+			synchronize_rcu();
+			list_add_tail(&iter->rb_entry, &stop_list);
+
+			goto retry;
+		}
+		spin_unlock_irqrestore(&rb->event_lock, flags);
+
+		mutex_unlock(&event->mmap_mutex);
+
+		list_for_each_entry(iter, &stop_list, rb_entry) {
+			if (!has_aux(iter))
+				continue;
+
+			perf_event_stop(iter);
+		}
+
+		/* and splice it back now that we're done with them */
+		spin_lock_irqsave(&rb->event_lock, flags);
+		list_splice_tail(&stop_list, &rb->event_list);
+		spin_unlock_irqrestore(&rb->event_lock, flags);
+
+		/* now it's safe to free the pages */
 		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
 		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
 
+		/* this has to be the last one */
 		rb_free_aux(rb);
-		mutex_unlock(&event->mmap_mutex);
+		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
 	}
 
 	atomic_dec(&rb->mmap_count);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c127..2b229fdcfc 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,7 +11,6 @@
 struct ring_buffer {
 	atomic_t			refcount;
 	struct rcu_head			rcu_head;
-	struct irq_work			irq_work;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 5709cc222f..6865ac95ca 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,8 +221,6 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
-static void rb_irq_work(struct irq_work *work);
-
 static void
 ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 {
@@ -243,16 +241,6 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-	init_irq_work(&rb->irq_work, rb_irq_work);
-}
-
-static void ring_buffer_put_async(struct ring_buffer *rb)
-{
-	if (!atomic_dec_and_test(&rb->refcount))
-		return;
-
-	rb->rcu_head.next = (void *)rb;
-	irq_work_queue(&rb->irq_work);
 }
 
 /*
@@ -292,7 +280,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	 * the aux buffer is in perf_mmap_close(), about to get free'd.
 	 */
 	if (!atomic_read(&rb->aux_mmap_count))
-		goto err;
+		goto err_put;
 
 	/*
 	 * Nesting is not supported for AUX area, make sure nested
@@ -338,7 +326,7 @@ err_put:
 	rb_free_aux(rb);
 
 err:
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 	handle->event = NULL;
 
 	return NULL;
@@ -389,7 +377,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 
 	local_set(&rb->aux_nest, 0);
 	rb_free_aux(rb);
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 }
 
 /*
@@ -563,6 +551,14 @@ static void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
+	/*
+	 * Should never happen, the last reference should be dropped from
+	 * perf_mmap_close() path, which first stops aux transactions (which
+	 * in turn are the atomic holders of aux_refcount) and then does the
+	 * last rb_free_aux().
+	 */
+	WARN_ON_ONCE(in_atomic());
+
 	if (rb->aux_priv) {
 		rb->free_aux(rb->aux_priv);
 		rb->free_aux = NULL;
@@ -581,18 +577,7 @@ static void __rb_free_aux(struct ring_buffer *rb)
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount))
-		irq_work_queue(&rb->irq_work);
-}
-
-static void rb_irq_work(struct irq_work *work)
-{
-	struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
-
-	if (!atomic_read(&rb->aux_refcount))
 		__rb_free_aux(rb);
-
-	if (rb->rcu_head.next == (void *)rb)
-		call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC
-- 
2.6.2



* [PATCH 5/7] perf: Document aux api usage
  2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
                   ` (3 preceding siblings ...)
  2015-12-03 10:32 ` [PATCH 4/7] perf: Free aux pages in unmap path Alexander Shishkin
@ 2015-12-03 10:32 ` Alexander Shishkin
  2015-12-03 20:36   ` Mathieu Poirier
  2015-12-03 10:32 ` [PATCH 6/7] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 7/7] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
  6 siblings, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Mathieu Poirier

In order to ensure safe aux buffer management, we rely on the assumption
that pmu::stop() stops its ongoing aux transaction and not just the hw.

This patch documents this requirement for perf_aux_output_{begin,end}()
APIs.
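
In other words, the expected shape of a pmu driver's start/stop
callbacks is roughly the following (a sketch with made-up foo_* names
and placeholder size/truncated values; see the PT and BTS patches later
in this series for the real thing):

static void foo_event_start(struct perf_event *event, int mode)
{
	struct foo_buffer *buf;

	buf = perf_aux_output_begin(&foo->handle, event);
	if (!buf) {
		event->hw.state = PERF_HES_STOPPED;
		return;
	}

	/* program the hardware to write into buf */
}

static void foo_event_stop(struct perf_event *event, int mode)
{
	/* stop the hardware first, then close the transaction */

	if (mode & PERF_EF_UPDATE) {
		/* this drops the aux reference taken in foo_event_start() */
		perf_aux_output_end(&foo->handle, size, truncated);
	}
}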

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 kernel/events/ring_buffer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 6865ac95ca..1aed2617e8 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -252,6 +252,10 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
  * The ordering is similar to that of perf_output_{begin,end}, with
  * the exception of (B), which should be taken care of by the pmu
  * driver, since ordering rules will differ depending on hardware.
+ *
+ * Call this from pmu::start(); see the comment in perf_aux_output_end()
+ * about its use in pmu callbacks. Both can also be called from the PMI
+ * handler if needed.
  */
 void *perf_aux_output_begin(struct perf_output_handle *handle,
 			    struct perf_event *event)
@@ -323,6 +327,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	return handle->rb->aux_priv;
 
 err_put:
+	/* can't be last */
 	rb_free_aux(rb);
 
 err:
@@ -337,6 +342,10 @@ err:
  * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
  * pmu driver's responsibility to observe ordering rules of the hardware,
  * so that all the data is externally visible before this is called.
+ *
+ * Note: this has to be called from pmu::stop() callback, as the assumption
+ * of the aux buffer management code is that after pmu::stop(), the aux
+ * transaction must be stopped and therefore drop the aux reference count.
  */
 void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 			 bool truncated)
@@ -376,6 +385,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 	handle->event = NULL;
 
 	local_set(&rb->aux_nest, 0);
+	/* can't be last */
 	rb_free_aux(rb);
 	ring_buffer_put(rb);
 }
-- 
2.6.2



* [PATCH 6/7] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks
  2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
                   ` (4 preceding siblings ...)
  2015-12-03 10:32 ` [PATCH 5/7] perf: Document aux api usage Alexander Shishkin
@ 2015-12-03 10:32 ` Alexander Shishkin
  2015-12-03 10:32 ` [PATCH 7/7] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
  6 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Mathieu Poirier

As per the AUX buffer management requirements, AUX output has to happen
between the pmu::start and pmu::stop calls, so that perf_event_stop()
actually stops the AUX transaction and perf can therefore free the aux
data once it has called pmu::stop.

This patch moves perf_aux_output_{begin,end} from pt_event_{add,del} to
pt_event_{start,stop}. As a bonus, we get rid of pt_buffer_is_full(),
which is already taken care of by perf_aux_output_begin() anyway.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 85 ++++++++++---------------------
 1 file changed, 27 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index c0bbd1033b..daf5f6caf8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -906,26 +906,6 @@ static void pt_buffer_free_aux(void *data)
 }
 
 /**
- * pt_buffer_is_full() - check if the buffer is full
- * @buf:	PT buffer.
- * @pt:		Per-cpu pt handle.
- *
- * If the user hasn't read data from the output region that aux_head
- * points to, the buffer is considered full: the user needs to read at
- * least this region and update aux_tail to point past it.
- */
-static bool pt_buffer_is_full(struct pt_buffer *buf, struct pt *pt)
-{
-	if (buf->snapshot)
-		return false;
-
-	if (local_read(&buf->data_size) >= pt->handle.size)
-		return true;
-
-	return false;
-}
-
-/**
  * intel_pt_interrupt() - PT PMI handler
  */
 void intel_pt_interrupt(void)
@@ -989,20 +969,33 @@ void intel_pt_interrupt(void)
 
 static void pt_event_start(struct perf_event *event, int mode)
 {
+	struct hw_perf_event *hwc = &event->hw;
 	struct pt *pt = this_cpu_ptr(&pt_ctx);
-	struct pt_buffer *buf = perf_get_aux(&pt->handle);
+	struct pt_buffer *buf;
 
-	if (!buf || pt_buffer_is_full(buf, pt)) {
-		event->hw.state = PERF_HES_STOPPED;
-		return;
+	buf = perf_aux_output_begin(&pt->handle, event);
+	if (!buf)
+		goto fail_stop;
+
+	pt_buffer_reset_offsets(buf, pt->handle.head);
+	if (!buf->snapshot) {
+		if (pt_buffer_reset_markers(buf, &pt->handle))
+			goto fail_end_stop;
 	}
 
 	ACCESS_ONCE(pt->handle_nmi) = 1;
-	event->hw.state = 0;
+	hwc->state = 0;
 
 	pt_config_buffer(buf->cur->table, buf->cur_idx,
 			 buf->output_off);
 	pt_config(event);
+
+	return;
+
+fail_end_stop:
+	perf_aux_output_end(&pt->handle, 0, true);
+fail_stop:
+	hwc->state = PERF_HES_STOPPED;
 }
 
 static void pt_event_stop(struct perf_event *event, int mode)
@@ -1035,19 +1028,7 @@ static void pt_event_stop(struct perf_event *event, int mode)
 		pt_handle_status(pt);
 
 		pt_update_head(pt);
-	}
-}
 
-static void pt_event_del(struct perf_event *event, int mode)
-{
-	struct pt *pt = this_cpu_ptr(&pt_ctx);
-	struct pt_buffer *buf;
-
-	pt_event_stop(event, PERF_EF_UPDATE);
-
-	buf = perf_get_aux(&pt->handle);
-
-	if (buf) {
 		if (buf->snapshot)
 			pt->handle.head =
 				local_xchg(&buf->data_size,
@@ -1057,9 +1038,13 @@ static void pt_event_del(struct perf_event *event, int mode)
 	}
 }
 
+static void pt_event_del(struct perf_event *event, int mode)
+{
+	pt_event_stop(event, PERF_EF_UPDATE);
+}
+
 static int pt_event_add(struct perf_event *event, int mode)
 {
-	struct pt_buffer *buf;
 	struct pt *pt = this_cpu_ptr(&pt_ctx);
 	struct hw_perf_event *hwc = &event->hw;
 	int ret = -EBUSY;
@@ -1067,34 +1052,18 @@ static int pt_event_add(struct perf_event *event, int mode)
 	if (pt->handle.event)
 		goto fail;
 
-	buf = perf_aux_output_begin(&pt->handle, event);
-	ret = -EINVAL;
-	if (!buf)
-		goto fail_stop;
-
-	pt_buffer_reset_offsets(buf, pt->handle.head);
-	if (!buf->snapshot) {
-		ret = pt_buffer_reset_markers(buf, &pt->handle);
-		if (ret)
-			goto fail_end_stop;
-	}
-
 	if (mode & PERF_EF_START) {
 		pt_event_start(event, 0);
-		ret = -EBUSY;
+		ret = -EINVAL;
 		if (hwc->state == PERF_HES_STOPPED)
-			goto fail_end_stop;
+			goto fail;
 	} else {
 		hwc->state = PERF_HES_STOPPED;
 	}
 
-	return 0;
-
-fail_end_stop:
-	perf_aux_output_end(&pt->handle, 0, true);
-fail_stop:
-	hwc->state = PERF_HES_STOPPED;
+	ret = 0;
 fail:
+
 	return ret;
 }
 
-- 
2.6.2



* [PATCH 7/7] perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks
  2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
                   ` (5 preceding siblings ...)
  2015-12-03 10:32 ` [PATCH 6/7] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
@ 2015-12-03 10:32 ` Alexander Shishkin
  6 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-03 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Mathieu Poirier

As per the AUX buffer management requirements, AUX output has to happen
between the pmu::start and pmu::stop calls, so that perf_event_stop()
actually stops the AUX transaction and perf can therefore free the aux
data once it has called pmu::stop.

This patch moves perf_aux_output_{begin,end} from bts_event_{add,del} to
bts_event_{start,stop}. As a bonus, we get rid of bts_buffer_is_full(),
which is already taken care of by perf_aux_output_begin() anyway.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 arch/x86/kernel/cpu/perf_event_intel_bts.c | 105 +++++++++++++----------------
 1 file changed, 48 insertions(+), 57 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_bts.c b/arch/x86/kernel/cpu/perf_event_intel_bts.c
index 2cad71d1b1..5f9c6fbac2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
@@ -171,18 +171,6 @@ static void bts_buffer_pad_out(struct bts_phys *phys, unsigned long head)
 	memset(page_address(phys->page) + index, 0, phys->size - index);
 }
 
-static bool bts_buffer_is_full(struct bts_buffer *buf, struct bts_ctx *bts)
-{
-	if (buf->snapshot)
-		return false;
-
-	if (local_read(&buf->data_size) >= bts->handle.size ||
-	    bts->handle.size - local_read(&buf->data_size) < BTS_RECORD_SIZE)
-		return true;
-
-	return false;
-}
-
 static void bts_update(struct bts_ctx *bts)
 {
 	int cpu = raw_smp_processor_id();
@@ -213,18 +201,15 @@ static void bts_update(struct bts_ctx *bts)
 	}
 }
 
+static int
+bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle);
+
 static void __bts_event_start(struct perf_event *event)
 {
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
 	struct bts_buffer *buf = perf_get_aux(&bts->handle);
 	u64 config = 0;
 
-	if (!buf || bts_buffer_is_full(buf, bts))
-		return;
-
-	event->hw.itrace_started = 1;
-	event->hw.state = 0;
-
 	if (!buf->snapshot)
 		config |= ARCH_PERFMON_EVENTSEL_INT;
 	if (!event->attr.exclude_kernel)
@@ -241,16 +226,41 @@ static void __bts_event_start(struct perf_event *event)
 	wmb();
 
 	intel_pmu_enable_bts(config);
+
 }
 
 static void bts_event_start(struct perf_event *event, int flags)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
+	struct bts_buffer *buf;
+
+	buf = perf_aux_output_begin(&bts->handle, event);
+	if (!buf)
+		goto fail_stop;
+
+	if (bts_buffer_reset(buf, &bts->handle))
+		goto fail_end_stop;
+
+	bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
+	bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
+	bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
+
+	event->hw.itrace_started = 1;
+	event->hw.state = 0;
 
 	__bts_event_start(event);
 
 	/* PMI handler: this counter is running and likely generating PMIs */
 	ACCESS_ONCE(bts->started) = 1;
+
+	return;
+
+fail_end_stop:
+	perf_aux_output_end(&bts->handle, 0, false);
+
+fail_stop:
+	event->hw.state = PERF_HES_STOPPED;
 }
 
 static void __bts_event_stop(struct perf_event *event)
@@ -269,15 +279,32 @@ static void __bts_event_stop(struct perf_event *event)
 
 static void bts_event_stop(struct perf_event *event, int flags)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
+	struct bts_buffer *buf = perf_get_aux(&bts->handle);
 
 	/* PMI handler: don't restart this counter */
 	ACCESS_ONCE(bts->started) = 0;
 
 	__bts_event_stop(event);
 
-	if (flags & PERF_EF_UPDATE)
+	if (flags & PERF_EF_UPDATE) {
 		bts_update(bts);
+
+		if (buf) {
+			if (buf->snapshot)
+				bts->handle.head =
+					local_xchg(&buf->data_size,
+						   buf->nr_pages << PAGE_SHIFT);
+			perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
+					    !!local_xchg(&buf->lost, 0));
+		}
+
+		cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
+		cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
+		cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
+		cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
+	}
 }
 
 void intel_bts_enable_local(void)
@@ -417,34 +444,14 @@ int intel_bts_interrupt(void)
 
 static void bts_event_del(struct perf_event *event, int mode)
 {
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
-	struct bts_buffer *buf = perf_get_aux(&bts->handle);
-
 	bts_event_stop(event, PERF_EF_UPDATE);
-
-	if (buf) {
-		if (buf->snapshot)
-			bts->handle.head =
-				local_xchg(&buf->data_size,
-					   buf->nr_pages << PAGE_SHIFT);
-		perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
-				    !!local_xchg(&buf->lost, 0));
-	}
-
-	cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
-	cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
-	cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
-	cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
 }
 
 static int bts_event_add(struct perf_event *event, int mode)
 {
-	struct bts_buffer *buf;
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
-	int ret = -EBUSY;
 
 	event->hw.state = PERF_HES_STOPPED;
 
@@ -454,26 +461,10 @@ static int bts_event_add(struct perf_event *event, int mode)
 	if (bts->handle.event)
 		return -EBUSY;
 
-	buf = perf_aux_output_begin(&bts->handle, event);
-	if (!buf)
-		return -EINVAL;
-
-	ret = bts_buffer_reset(buf, &bts->handle);
-	if (ret) {
-		perf_aux_output_end(&bts->handle, 0, false);
-		return ret;
-	}
-
-	bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
-	bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
-	bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
-
 	if (mode & PERF_EF_START) {
 		bts_event_start(event, 0);
-		if (hwc->state & PERF_HES_STOPPED) {
-			bts_event_del(event, 0);
-			return -EBUSY;
-		}
+		if (hwc->state & PERF_HES_STOPPED)
+			return -EINVAL;
 	}
 
 	return 0;
-- 
2.6.2



* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-03 10:32 ` [PATCH 2/7] perf: Generalize task_function_call()ers Alexander Shishkin
@ 2015-12-03 17:34   ` Peter Zijlstra
  2015-12-08 16:42     ` Alexander Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-03 17:34 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Thu, Dec 03, 2015 at 12:32:37PM +0200, Alexander Shishkin wrote:
> +static int
> +remote_call_or_ctx_lock(struct perf_event *event, remote_function_f func,
> +			void *info, enum perf_event_active_state retry_state)
> +{
> +	struct perf_event_context *ctx = event->ctx;
> +	struct task_struct *task = ctx->task;
> +	unsigned int retry = 1;
> +
> +	if (!task) {
> +		/*
> +		 * Per cpu events are removed via an smp call. The removal can
> +		 * fail if the CPU is currently offline, but in that case we
> +		 * already called __perf_remove_from_context from
> +		 * perf_event_exit_cpu.
> +		 */
> +		cpu_function_call(event->cpu, func, info);
> +		return 0;
> +	}
> +
> +	raw_spin_lock_irq(&ctx->lock);
> +	do {
> +		/*
> +		 * Reload the task pointer, it might have been changed by
> +		 * a concurrent perf_event_context_sched_out().
> +		 */
> +		task = ctx->task;
> +
> +		/*
> +		 * If the context is inactive, we don't need a cross call to
> +		 * fiddle with the event so long as the ctx::lock is held.
> +		 */
> +		if (!ctx->is_active)
> +			break;
> +
> +		raw_spin_unlock_irq(&ctx->lock);
> +
> +		if (!task_function_call(task, func, info))
> +			return 0;
> +
> +		raw_spin_lock_irq(&ctx->lock);
> +
> +		if (retry_state <= PERF_EVENT_STATE_ACTIVE)
> +			retry = event->state == retry_state;
> +	} while (retry);

OK, so the retry_state thing is clever, but either I'm too tired or it's
not quite right. Nor do I think it's actually required.

/me frobs...

Hmm, I cannot seem to convince myself the current code is correct to
begin with.

In any case, consider the below (on top of my previous collapse patch).
The two 'hard' cases are perf_event_{dis,en}able(), those appear to play
silly games with event->state.

So starting with perf_event_disable(); we don't strictly need to test
for event->state == ACTIVE, ctx->is_active is enough. If the event is
not scheduled while the ctx is, __perf_event_disable() still does the
right thing.  It's a little less efficient to IPI in that case, overall
simpler.

For perf_event_enable(); the same goes, but I think that's actually
broken in its current form. The current condition is: ctx->is_active &&
event->state == OFF, that means it doesn't do anything when !ctx->active
&& event->state == OFF. This is wrong, it should still mark the event
INACTIVE in that case, otherwise we'll still not try and schedule the
event once the context becomes active again.


--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1766,6 +1766,20 @@ int __perf_event_disable(void *info)
 	return 0;
 }
 
+void ___perf_event_disable(void *info)
+{
+	struct perf_event *event = info;
+
+	/*
+	 * Since we have the lock this context can't be scheduled
+	 * in, so we can change the state safely.
+	 */
+	if (event->state == PERF_EVENT_STATE_INACTIVE) {
+		update_group_times(event);
+		event->state = PERF_EVENT_STATE_OFF;
+	}
+}
+
 /*
  * Disable a event.
  *
@@ -1782,43 +1796,16 @@ int __perf_event_disable(void *info)
 static void _perf_event_disable(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct task_struct *task = ctx->task;
-
-	if (!task) {
-		/*
-		 * Disable the event on the cpu that it's on
-		 */
-		cpu_function_call(event->cpu, __perf_event_disable, event);
-		return;
-	}
-
-retry:
-	if (!task_function_call(task, __perf_event_disable, event))
-		return;
 
 	raw_spin_lock_irq(&ctx->lock);
-	/*
-	 * If the event is still active, we need to retry the cross-call.
-	 */
-	if (event->state == PERF_EVENT_STATE_ACTIVE) {
+	if (event->state <= PERF_EVENT_STATE_OFF) {
 		raw_spin_unlock_irq(&ctx->lock);
-		/*
-		 * Reload the task pointer, it might have been changed by
-		 * a concurrent perf_event_context_sched_out().
-		 */
-		task = ctx->task;
-		goto retry;
-	}
-
-	/*
-	 * Since we have the lock this context can't be scheduled
-	 * in, so we can change the state safely.
-	 */
-	if (event->state == PERF_EVENT_STATE_INACTIVE) {
-		update_group_times(event);
-		event->state = PERF_EVENT_STATE_OFF;
+		return;
 	}
 	raw_spin_unlock_irq(&ctx->lock);
+
+	event_function_call(event, __perf_event_disable,
+			    ___perf_event_disable, event);
 }
 
 /*
@@ -2269,6 +2256,11 @@ static int __perf_event_enable(void *inf
 	return 0;
 }
 
+void ___perf_event_enable(void *info)
+{
+	__perf_event_mark_enabled((struct perf_event *)info);
+}
+
 /*
  * Enable a event.
  *
@@ -2281,58 +2273,26 @@ static int __perf_event_enable(void *inf
 static void _perf_event_enable(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct task_struct *task = ctx->task;
 
-	if (!task) {
-		/*
-		 * Enable the event on the cpu that it's on
-		 */
-		cpu_function_call(event->cpu, __perf_event_enable, event);
+	raw_spin_lock_irq(&ctx->lock);
+	if (event->state >= PERF_EVENT_STATE_INACTIVE) {
+		raw_spin_unlock_irq(&ctx->lock);
 		return;
 	}
 
-	raw_spin_lock_irq(&ctx->lock);
-	if (event->state >= PERF_EVENT_STATE_INACTIVE)
-		goto out;
-
 	/*
 	 * If the event is in error state, clear that first.
-	 * That way, if we see the event in error state below, we
-	 * know that it has gone back into error state, as distinct
-	 * from the task having been scheduled away before the
-	 * cross-call arrived.
+	 *
+	 * That way, if we see the event in error state below, we know that it
+	 * has gone back into error state, as distinct from the task having
+	 * been scheduled away before the cross-call arrived.
 	 */
 	if (event->state == PERF_EVENT_STATE_ERROR)
 		event->state = PERF_EVENT_STATE_OFF;
-
-retry:
-	if (!ctx->is_active) {
-		__perf_event_mark_enabled(event);
-		goto out;
-	}
-
 	raw_spin_unlock_irq(&ctx->lock);
 
-	if (!task_function_call(task, __perf_event_enable, event))
-		return;
-
-	raw_spin_lock_irq(&ctx->lock);
-
-	/*
-	 * If the context is active and the event is still off,
-	 * we need to retry the cross-call.
-	 */
-	if (ctx->is_active && event->state == PERF_EVENT_STATE_OFF) {
-		/*
-		 * task could have been flipped by a concurrent
-		 * perf_event_context_sched_out()
-		 */
-		task = ctx->task;
-		goto retry;
-	}
-
-out:
-	raw_spin_unlock_irq(&ctx->lock);
+	event_function_call(event, __perf_event_enable,
+			    ___perf_event_enable, event);
 }
 
 /*


* Re: [PATCH 5/7] perf: Document aux api usage
  2015-12-03 10:32 ` [PATCH 5/7] perf: Document aux api usage Alexander Shishkin
@ 2015-12-03 20:36   ` Mathieu Poirier
  0 siblings, 0 replies; 28+ messages in thread
From: Mathieu Poirier @ 2015-12-03 20:36 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, vince,
	Stephane Eranian, johannes, Arnaldo Carvalho de Melo

On 3 December 2015 at 03:32, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> In order to ensure safe aux buffer management, we rely on the assumption
> that pmu::stop() stops its ongoing aux transaction and not just the hw.
>
> This patch documents this requirement for perf_aux_output_{begin,end}()
> APIs.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> ---
>  kernel/events/ring_buffer.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 6865ac95ca..1aed2617e8 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -252,6 +252,10 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
>   * The ordering is similar to that of perf_output_{begin,end}, with
>   * the exception of (B), which should be taken care of by the pmu
>   * driver, since ordering rules will differ depending on hardware.
> + *
> + * Call this from pmu::start(); see the comment in perf_aux_output_end()
> + * about its use in pmu callbacks. Both can also be called from the PMI
> + * handler if needed.
>   */
>  void *perf_aux_output_begin(struct perf_output_handle *handle,
>                             struct perf_event *event)
> @@ -323,6 +327,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
>         return handle->rb->aux_priv;
>
>  err_put:
> +       /* can't be last */
>         rb_free_aux(rb);
>
>  err:
> @@ -337,6 +342,10 @@ err:
>   * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
>   * pmu driver's responsibility to observe ordering rules of the hardware,
>   * so that all the data is externally visible before this is called.
> + *
> + * Note: this has to be called from pmu::stop() callback, as the assumption
> + * of the aux buffer management code is that after pmu::stop(), the aux
> + * transaction must be stopped and therefore drop the aux reference count.
>   */
>  void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
>                          bool truncated)
> @@ -376,6 +385,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
>         handle->event = NULL;
>
>         local_set(&rb->aux_nest, 0);
> +       /* can't be last */
>         rb_free_aux(rb);
>         ring_buffer_put(rb);
>  }
> --
> 2.6.2
>

Thanks for the heads-up.  My next version (V7) will follow the same scheme.

Mathieu


* Re: [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-03 10:32 ` [PATCH 4/7] perf: Free aux pages in unmap path Alexander Shishkin
@ 2015-12-04 17:02   ` Peter Zijlstra
  2015-12-04 22:17     ` Peter Zijlstra
  2015-12-09  9:57     ` Alexander Shishkin
  0 siblings, 2 replies; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-04 17:02 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Thu, Dec 03, 2015 at 12:32:39PM +0200, Alexander Shishkin wrote:
> +++ b/kernel/events/core.c
> @@ -4630,11 +4630,62 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>  	 */
>  	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
>  	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
> +		struct perf_event *iter;
> +		LIST_HEAD(stop_list);
> +		unsigned long flags;
> +
> +		/*
> +		 * Stop all aux events that are writing to this here buffer,
> +		 * so that we can free its aux pages and corresponding pmu
> +		 * data. Note that after rb::aux_mmap_count dropped to zero,
> +		 * they won't start any more (see perf_aux_output_begin()).
> +		 *
> +		 * Since we can't take ctx::mutex under rb::event_lock, we
> +		 * need to jump through hoops to get there, namely fish out
> +		 * all events from rb::event_list onto an on-stack list,
> +		 * carry out the stopping and splice this on-stack list back
> +		 * to rb::event_list.
> +		 * This means that these events will miss wakeups during this
> +		 * window, but since it's mmap_close, assume the consumer
> +		 * doesn't care any more.
> +		 *
> +		 * Note: list_splice_init_rcu() doesn't cut it, since it syncs
> +		 * and rb::event_lock is a spinlock.
> +		 */
> +retry:
> +		spin_lock_irqsave(&rb->event_lock, flags);
> +		list_for_each_entry_rcu(iter, &rb->event_list, rb_entry) {
> +			list_del_rcu(&iter->rb_entry);
> +			spin_unlock_irqrestore(&rb->event_lock, flags);
> +
> +			synchronize_rcu();
> +			list_add_tail(&iter->rb_entry, &stop_list);
> +
> +			goto retry;
> +		}
> +		spin_unlock_irqrestore(&rb->event_lock, flags);
> +
> +		mutex_unlock(&event->mmap_mutex);
> +
> +		list_for_each_entry(iter, &stop_list, rb_entry) {
> +			if (!has_aux(iter))
> +				continue;
> +
> +			perf_event_stop(iter);
> +		}
> +
> +		/* and splice it back now that we're done with them */
> +		spin_lock_irqsave(&rb->event_lock, flags);
> +		list_splice_tail(&stop_list, &rb->event_list);
> +		spin_unlock_irqrestore(&rb->event_lock, flags);
> +
> +		/* now it's safe to free the pages */
>  		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
>  		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
>  
> +		/* this has to be the last one */
>  		rb_free_aux(rb);
> -		mutex_unlock(&event->mmap_mutex);
> +		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
>  	}

Yuck, nasty problem. Also, I think it's broken. By not having
mmap_mutex around the whole thing, notably rb_free_aux(), you can race
against mmap().

What seems possible now is that:

	mmap(aux); // rb->aux_mmap_count == 1
	munmap(aux)
	  atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex); // == 0

	  mutex_unlock(&event->mmap_mutex);

					mmap(aux)
					  if (rb_has_aux())
					    atomic_inc(&rb->aux_mmap_count); // == 1

	  rb_free_aux(); // oops!!




So I thought that pulling all the aux bits out from the ring_buffer
struct, such that we have rb->aux, would solve the issue in that we can
then fix mmap() to have the same retry loop as for event->rb.

And while that fixes that race (I almost had that patch complete -- I
might still send it out, just so you can see what it looks like), it
doesn't solve the complete problem I don't think.

Because in that case, you want the event to start again on the new
buffer, and I think it's possible we end up calling ->start() before
we've issued the ->stop() and that would be BAD (tm).

The only solution I've come up with is:

	struct rb_aux *aux = rb->aux;

	if (aux && vma->vm_pgoff == aux->pgoff) {
		ctx = perf_event_ctx_lock(event);
		if (!atomic_dec_and_mutex_lock(&aux->mmap_count, &event->mmap_mutex)) {
			/* we now hold both ctx::mutex and event::mmap_mutex */
			rb->aux = NULL;
			ring_buffer_put(rb); /* aux had a reference */
			_perf_event_stop(event);
			ring_buffer_put_aux(aux); /* should be last */
			mutex_unlock(&event->mmap_mutex);
		}
		mutex_unlock(&ctx->mutex);
	}




* Re: [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-04 17:02   ` Peter Zijlstra
@ 2015-12-04 22:17     ` Peter Zijlstra
  2015-12-07 16:16       ` Peter Zijlstra
  2015-12-09  9:57     ` Alexander Shishkin
  1 sibling, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-04 22:17 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Fri, Dec 04, 2015 at 06:02:06PM +0100, Peter Zijlstra wrote:
> The only solution I've come up with is:
> 
> 	struct rb_aux *aux = rb->aux;
> 
> 	if (aux && vma->vm_pgoff == aux->pgoff) {
> 		ctx = perf_event_ctx_lock(event);

Can't do this at all, see the comment in put_event(). perf_read_group()
accesses user memory (and hence causes faults, which in turn take
mmap_sem) while holding ctx::mutex.

So neither this, nor what you proposed, can work.

Will need moar thinking.

> 		if (!atomic_dec_and_mutex_lock(&aux->mmap_count, &event->mmap_mutex)) {
> 			/* we now hold both ctx::mutex and event::mmap_mutex */
> 			rb->aux = NULL;
> 			ring_buffer_put(rb); /* aux had a reference */
> 			_perf_event_stop(event);
> 			ring_buffer_put_aux(aux); /* should be last */
> 			mutex_unlock(&event->mmap_mutex);
> 		}
> 		mutex_unlock(&ctx->mutex);
> 	}
> 
> 


* Re: [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-04 22:17     ` Peter Zijlstra
@ 2015-12-07 16:16       ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-07 16:16 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Fri, Dec 04, 2015 at 11:17:23PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 04, 2015 at 06:02:06PM +0100, Peter Zijlstra wrote:
> > The only solution I've come up with is:
> > 
> > 	struct rb_aux *aux = rb->aux;
> > 
> > 	if (aux && vma->vm_pgoff == aux->pgoff) {
> > 		ctx = perf_event_ctx_lock(event);
> 
> Can't do this at all, see the comment in put_event(). perf_read_group()
> accesses user memory (and hence causes faults, which in turn take
> mmap_sem) while holding ctx::mutex.
> 
> So neither this, nor what you proposed, can work.
> 
> Will need moar thinking.

So we could try and see if we can get this working:

static int __perf_event_stop(void *info)
{
	struct perf_event *event = info;

	/* IRQs disabled, cannot get scheduled away */
	if (event->oncpu == smp_processor_id()) {
		event->pmu->stop(event, PERF_EF_UPDATE);
		return 0;
	}

	return -EAGAIN;
}

perf_event_stop(struct perf_event *event)
{
	for (;;) {
		if (READ_ONCE(event->state) != PERF_EVENT_STATE_ACTIVE)
			break;

		smp_rmb(); /* if we see ACTIVE, ->oncpu must be set */

		if (!cpu_function_call(READ_ONCE(event->oncpu), __perf_event_stop, event))
			break;
	}
}


That probably wants some WRITE_ONCE() and maybe some memory barriers in
event_sched_in() as well, like:

	WRITE_ONCE(event->oncpu, smp_processor_id());
	smp_wmb();
	WRITE_ONCE(event->state, PERF_EVENT_STATE_ACTIVE);





* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-03 17:34   ` Peter Zijlstra
@ 2015-12-08 16:42     ` Alexander Shishkin
  2015-12-08 16:57       ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-08 16:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> OK, so the retry_state thing is clever, but either I'm too tired or it's
> not quite right. Nor do I think it's actually required.
>
> /me frobs...
>
> Hmm, I cannot seem to convince myself the current code is correct to
> begin with.
>
> In any case, consider the below (on top of my previous collapse patch).
> The two 'hard' cases are perf_event_{dis,en}able(), those appear to play
> silly games with event->state.
>
> So, starting with perf_event_disable(): we don't strictly need to test
> for event->state == ACTIVE; ctx->is_active is enough. If the event is
> not scheduled while the ctx is, __perf_event_disable() still does the
> right thing. It's a little less efficient to IPI in that case, but
> over-all simpler.
>
> For perf_event_enable(), the same goes, but I think that's actually
> broken in its current form. The current condition is: ctx->is_active &&
> event->state == OFF, which means it doesn't do anything when
> !ctx->is_active && event->state == OFF. This is wrong; it should still
> mark the event INACTIVE in that case, otherwise we'll still not try and
> schedule the event once the context becomes active again.

Yes, this does look more logically correct.
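
In other words (just a sketch of the argument, not the actual patch):
even with !ctx->is_active, the enable path should still do the
OFF -> INACTIVE transition so that the next sched-in picks the event up,
roughly:

	if (!ctx->is_active) {
		/* can't run now, but let the next sched-in pick it up */
		__perf_event_mark_enabled(event);
		return;
	}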

>
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1766,6 +1766,20 @@ int __perf_event_disable(void *info)
>  	return 0;
>  }
>  
> +void ___perf_event_disable(void *info)

Only maybe change these to __perf_event_disable_locked() or something
visually distinctive from the 'active' callback?

FWIW,

Reviewed-by: Alexander Shishkin <alexander.shishkin@intel.com>

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-08 16:42     ` Alexander Shishkin
@ 2015-12-08 16:57       ` Peter Zijlstra
  2015-12-17 13:40         ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-08 16:57 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Tue, Dec 08, 2015 at 06:42:01PM +0200, Alexander Shishkin wrote:
> > +void ___perf_event_disable(void *info)
> 
> Only maybe change these to __perf_event_disable_locked() or something
> visually distinctive from the 'active' callback?

Yeah, I ran out of naming-foo and punted. I'll give it another go
tomorrow.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-04 17:02   ` Peter Zijlstra
  2015-12-04 22:17     ` Peter Zijlstra
@ 2015-12-09  9:57     ` Alexander Shishkin
  2015-12-09 10:56       ` Peter Zijlstra
  1 sibling, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-09  9:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> Yuck, nasty problem. Also, I think it's broken. By not having
> mmap_mutex around the whole thing, notably rb_free_aux(), you can race
> against mmap().
>
> What seems possible now is that:
>
> 	mmap(aux); // rb->aux_mmap_count == 1
> 	munmap(aux)
> 	  atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex); // == 0
>
> 	  mutex_unlock(&event->mmap_mutex);
>
> 					mmap(aux)
> 					  if (rb_has_aux())
> 					    atomic_inc(&rb->aux_mmap_count); // == 1
>
> 	  rb_free_aux(); // oops!!

Wait, this isn't actually a problem: we can hold mmap_mutex over
rb_free_aux(), as we already do in the current code. My patch did it
wrongly, though; there's really no reason to drop the mutex before
rb_free_aux().
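
Something like this shape, that is (sketch only; the point being that
event->mmap_mutex stays held until the aux pages are actually gone, so
the concurrent mmap() from your example above can't sneak in and grab a
new aux reference in between):

	if (atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
		/* stop the writers, undo the mlock accounting, ... */
		rb_free_aux(rb);		/* still under mmap_mutex */
		mutex_unlock(&event->mmap_mutex);
	}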

> So I thought that pulling all the aux bits out from the ring_buffer
> struct, such that we have rb->aux, would solve the issue in that we can
> then fix mmap() to have the same retry loop as for event->rb.
>
> And while that fixes that race (I almost had that patch complete -- I
> might still send it out, just so you can see what it looks like), it
> doesn't solve the complete problem, I don't think.

I was toying with that some time ago, but I couldn't really see the
benefits that would justify the hassle.

> Because in that case, you want the event to start again on the new
> buffer, and I think it's possible we end up calling ->start() before
> we've issued the ->stop() and that would be BAD (tm).

So if we just hold the mmap_mutex over rb_free_aux(), this won't
happen, right?

> The only solution I've come up with is:
>
> 	struct rb_aux *aux = rb->aux;
>
> 	if (aux && vma->vm_pgoff == aux->pgoff) {
> 		ctx = perf_event_ctx_lock(event);
> > 		if (!atomic_dec_and_mutex_lock(&aux->mmap_count, &event->mmap_mutex)) {
> 			/* we now hold both ctx::mutex and event::mmap_mutex */
> 			rb->aux = NULL;
> 			ring_buffer_put(rb); /* aux had a reference */
> 			_perf_event_stop(event);

Here we really need to ensure that none of the events on rb->event_list
is running, not just the parent, and that still presents complications
wrt the irqsave rb->event_lock, even with your new idea for
perf_event_stop().

How about something like this to stop the writers:

static int __ring_buffer_output_stop(void *info)
{
	struct ring_buffer *rb = info;
	struct perf_event *event;
 
	spin_lock(&rb->event_lock);
	list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
		if (event->state != PERF_EVENT_STATE_ACTIVE)
			continue;

		event->pmu->stop(event, PERF_EF_UPDATE);
	}
	spin_unlock(&rb->event_lock);

	return 0;
}

static void perf_event_output_stop(struct perf_event *event)
{
	struct ring_buffer *rb = event->rb;

	lockdep_assert_held(&event->mmap_mutex);

	if (event->cpu == -1)
		perf_event_stop(event);

	cpu_function_call(event->cpu, __ring_buffer_output_stop, rb);
}
 
And then in the mmap_close:

	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
		perf_event_output_stop(event);

		/* undo the mlock accounting here */

		rb_free_aux(rb);
		mutex_unlock(&event->mmap_mutex);
	}

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-09  9:57     ` Alexander Shishkin
@ 2015-12-09 10:56       ` Peter Zijlstra
  2015-12-10 11:20         ` Alexander Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-09 10:56 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Wed, Dec 09, 2015 at 11:57:51AM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > Yuck, nasty problem. Also, I think it's broken. By not having
> > mmap_mutex around the whole thing, notably rb_free_aux(), you can race
> > against mmap().
> >
> > What seems possible now is that:
> >
> > 	mmap(aux); // rb->aux_mmap_count == 1
> > 	munmap(aux)
> > 	  atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex); // == 0
> >
> > 	  mutex_unlock(&event->mmap_mutex);
> >
> > 					mmap(aux)
> > 					  if (rb_has_aux())
> > 					    atomic_inc(&rb->aux_mmap_count); // == 1
> >
> > 	  rb_free_aux(); // oops!!
> 
> Wait, this isn't actually a problem: we can hold mmap_mutex over
> rb_free_aux(), as we already do in the current code. My patch did it
> wrongly, though; there's really no reason to drop the mutex before
> rb_free_aux().

Well, you had to drop it because you wanted to acquire the ctx::mutex,
but if we drop that requirement, as we must per the other emails, then
this should indeed be possible.

> So if we just hold the mmap_mutex over rb_free_aux(), this won't
> happen, right?

Correct.

> How about something like this to stop the writers:
> 
> static int __ring_buffer_output_stop(void *info)
> {
> 	struct ring_buffer *rb = info;
> 	struct perf_event *event;
>  
> 	spin_lock(&rb->event_lock);
> 	list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
> 		if (event->state != PERF_EVENT_STATE_ACTIVE)
> 			continue;
> 
> 		event->pmu->stop(event, PERF_EF_UPDATE);
> 	}
> 	spin_unlock(&rb->event_lock);
> 
> 	return 0;
> }
> 
> static void perf_event_output_stop(struct perf_event *event)
> {
> 	struct ring_buffer *rb = event->rb;
> 
> 	lockdep_assert_held(&event->mmap_mutex);
> 
> 	if (event->cpu == -1)
> 		perf_event_stop(event);
> 
> 	cpu_function_call(event->cpu, __ring_buffer_output_stop, rb);

I'm not sure about the different semantics between event->cpu == -1 and
not, but yes, something along those lines.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-09 10:56       ` Peter Zijlstra
@ 2015-12-10 11:20         ` Alexander Shishkin
  2015-12-10 12:58           ` Alexander Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-10 11:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

>> How about something like this to stop the writers:
>> 
>> static int __ring_buffer_output_stop(void *info)
>> {
>> 	struct ring_buffer *rb = info;
>> 	struct perf_event *event;
>>  
>> 	spin_lock(&rb->event_lock);
>> 	list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
>> 		if (event->state != PERF_EVENT_STATE_ACTIVE)
>> 			continue;
>> 
>> 		event->pmu->stop(event, PERF_EF_UPDATE);
>> 	}
>> 	spin_unlock(&rb->event_lock);
>> 
>> 	return 0;
>> }
>> 
>> static void perf_event_output_stop(struct perf_event *event)
>> {
>> 	struct ring_buffer *rb = event->rb;
>> 
>> 	lockdep_assert_held(&event->mmap_mutex);
>> 
>> 	if (event->cpu == -1)
>> 		perf_event_stop(event);
>> 
>> 	cpu_function_call(event->cpu, __ring_buffer_output_stop, rb);
>
> I'm not sure about the different semantics between event->cpu == -1 and
> not, but yes, something along those lines.

So this also doesn't quite cut it, since we also have children (which
aren't explicitly ring_buffer_attach()ed to the ring buffer, but will
still write there) and in particular children of those events that are
on rb->event_list, which triples the fun of synchronizing and iterating
these.

So I tried a different approach: iterating through the pmu's contexts
instead. Consider the following.

+static void __perf_event_output_stop(struct perf_event *event, void *data)
+{
+       struct perf_event *parent = event->parent;
+       struct ring_buffer *rb = data;
+
+       if (rcu_dereference(event->rb) == rb)
+               __perf_event_stop(event);
+       if (parent && rcu_dereference(parent->rb) == rb)
+               __perf_event_stop(event);
+}
+
+static int __perf_pmu_output_stop(void *info)
+{
+       struct perf_event *event = info;
+       struct pmu *pmu = event->pmu;
+       struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+
+       rcu_read_lock();
+       perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, event->rb, true);
+       if (cpuctx->task_ctx)
+               perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
+                                  event->rb, true);
+       rcu_read_unlock();
+
+       return 0;
+}
+
+static void perf_pmu_output_stop(struct perf_event *event)
+{
+       int cpu;
+
+       get_online_cpus();
+       for_each_online_cpu(cpu) {
+               cpu_function_call(cpu, __perf_pmu_output_stop, event);
+       }
+       put_online_cpus();
+}

And then we just call this perf_pmu_output_stop() from perf_mmap_close()
under mmap_mutex, before rb_free_aux(). Going through all online cpus is
likely not necessary, but I'm gonna grab lunch first.
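
To be concrete, the mmap_close side would keep the shape from my earlier
sketch, just with the stop call swapped out (untested, for illustration):

	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
		perf_pmu_output_stop(event);	/* writers on all cpus, incl. children */

		/* undo the mlock accounting here */

		rb_free_aux(rb);
		mutex_unlock(&event->mmap_mutex);
	}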

I also hacked perf_event_aux_ctx() to make the above work, like so:

@@ -5696,15 +5696,18 @@ typedef void (perf_event_aux_output_cb)(struct perf_event *event, void *data);
 static void
 perf_event_aux_ctx(struct perf_event_context *ctx,
                   perf_event_aux_output_cb output,
-                  void *data)
+                  void *data, bool all)
 {
        struct perf_event *event;
 
        list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
-               if (event->state < PERF_EVENT_STATE_INACTIVE)
-                       continue;
-               if (!event_filter_match(event))
-                       continue;
+               if (!all) {
+                       if (event->state < PERF_EVENT_STATE_INACTIVE)
+                               continue;
+                       if (!event_filter_match(event))
+                               continue;
+               }
+
                output(event, data);
        }
 }

How does this look to you?

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/7] perf: Free aux pages in unmap path
  2015-12-10 11:20         ` Alexander Shishkin
@ 2015-12-10 12:58           ` Alexander Shishkin
  0 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-10 12:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> I also hacked perf_event_aux_ctx() to make the above work, like so:
>
> @@ -5696,15 +5696,18 @@ typedef void (perf_event_aux_output_cb)(struct perf_event *event, void *data);
>  static void
>  perf_event_aux_ctx(struct perf_event_context *ctx,
>                    perf_event_aux_output_cb output,
> -                  void *data)
> +                  void *data, bool all)
>  {
>         struct perf_event *event;
>  
>         list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
> -               if (event->state < PERF_EVENT_STATE_INACTIVE)
> -                       continue;
> -               if (!event_filter_match(event))
> -                       continue;
> +               if (!all) {
> +                       if (event->state < PERF_EVENT_STATE_INACTIVE)
> +                               continue;
> +                       if (!event_filter_match(event))
> +                               continue;
> +               }
> +
>                 output(event, data);
>         }
>  }
>

This last bit is actually not needed at all, not for the task at hand anyway.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-08 16:57       ` Peter Zijlstra
@ 2015-12-17 13:40         ` Peter Zijlstra
  2015-12-17 14:25           ` Alexander Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-17 13:40 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Tue, Dec 08, 2015 at 05:57:00PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 08, 2015 at 06:42:01PM +0200, Alexander Shishkin wrote:
> > > +void ___perf_event_disable(void *info)
> > 
> > Only maybe change these to __perf_event_disable_locked() or something
> > visually distinctive from the 'active' callback?
> 
> Yeah, I ran out of naming-foo and punted. I'll give it another go
> tomorrow.

How about something like so?

It's a bit 'weird', but they're already long function names, and adding
things like _locked to them makes them really rather unwieldy.

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1660,7 +1660,7 @@ struct remove_event {
 	bool detach_group;
 };
 
-static void ___perf_remove_from_context(void *info)
+static void __I_perf_remove_from_context(void *info)
 {
 	struct remove_event *re = info;
 	struct perf_event *event = re->event;
@@ -1677,7 +1677,7 @@ static void ___perf_remove_from_context(
  * We disable the event on the hardware level first. After that we
  * remove it from the context list.
  */
-static int __perf_remove_from_context(void *info)
+static int __A_perf_remove_from_context(void *info)
 {
 	struct remove_event *re = info;
 	struct perf_event *event = re->event;
@@ -1721,14 +1721,14 @@ static void perf_remove_from_context(str
 
 	lockdep_assert_held(&ctx->mutex);
 
-	event_function_call(event, __perf_remove_from_context,
-			    ___perf_remove_from_context, &re);
+	event_function_call(event, __A_perf_remove_from_context,
+				   __I_perf_remove_from_context, &re);
 }
 
 /*
  * Cross CPU call to disable a performance event
  */
-int __perf_event_disable(void *info)
+int __A_perf_event_disable(void *info)
 {
 	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
@@ -1766,7 +1766,7 @@ int __perf_event_disable(void *info)
 	return 0;
 }
 
-void ___perf_event_disable(void *info)
+void __I_perf_event_disable(void *info)
 {
 	struct perf_event *event = info;
 
@@ -1804,8 +1804,8 @@ static void _perf_event_disable(struct p
 	}
 	raw_spin_unlock_irq(&ctx->lock);
 
-	event_function_call(event, __perf_event_disable,
-			    ___perf_event_disable, event);
+	event_function_call(event, __A_perf_event_disable,
+				   __I_perf_event_disable, event);
 }
 
 /*
@@ -2058,7 +2058,7 @@ static void perf_event_sched_in(struct p
 		ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
 }
 
-static void ___perf_install_in_context(void *info)
+static void __I_perf_install_in_context(void *info)
 {
 	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
@@ -2075,7 +2075,7 @@ static void ___perf_install_in_context(v
  *
  * Must be called with ctx->mutex held
  */
-static int  __perf_install_in_context(void *info)
+static int  __A_perf_install_in_context(void *info)
 {
 	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
@@ -2152,8 +2152,8 @@ perf_install_in_context(struct perf_even
 	if (event->cpu != -1)
 		event->cpu = cpu;
 
-	event_function_call(event, __perf_install_in_context,
-			    ___perf_install_in_context, event);
+	event_function_call(event, __A_perf_install_in_context,
+				   __I_perf_install_in_context, event);
 }
 
 /*
@@ -2180,7 +2180,7 @@ static void __perf_event_mark_enabled(st
 /*
  * Cross CPU call to enable a performance event
  */
-static int __perf_event_enable(void *info)
+static int __A_perf_event_enable(void *info)
 {
 	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
@@ -2256,7 +2256,7 @@ static int __perf_event_enable(void *inf
 	return 0;
 }
 
-void ___perf_event_enable(void *info)
+void __I_perf_event_enable(void *info)
 {
 	__perf_event_mark_enabled((struct perf_event *)info);
 }
@@ -2291,8 +2291,8 @@ static void _perf_event_enable(struct pe
 		event->state = PERF_EVENT_STATE_OFF;
 	raw_spin_unlock_irq(&ctx->lock);
 
-	event_function_call(event, __perf_event_enable,
-			    ___perf_event_enable, event);
+	event_function_call(event, __A_perf_event_enable,
+				   __I_perf_event_enable, event);
 }
 
 /*
@@ -4091,7 +4091,7 @@ struct period_event {
 	u64 value;
 };
 
-static void ___perf_event_period(void *info)
+static void __I_perf_event_period(void *info)
 {
 	struct period_event *pe = info;
 	struct perf_event *event = pe->event;
@@ -4107,7 +4107,7 @@ static void ___perf_event_period(void *i
 	local64_set(&event->hw.period_left, 0);
 }
 
-static int __perf_event_period(void *info)
+static int __A_perf_event_period(void *info)
 {
 	struct period_event *pe = info;
 	struct perf_event *event = pe->event;
@@ -4159,8 +4159,8 @@ static int perf_event_period(struct perf
 
 	pe.value = value;
 
-	event_function_call(event, __perf_event_period,
-			    ___perf_event_period, &pe);
+	event_function_call(event, __A_perf_event_period,
+				   __I_perf_event_period, &pe);
 
 	return 0;
 }
@@ -9226,7 +9226,7 @@ static void __perf_event_exit_context(vo
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(re.event, &ctx->event_list, event_entry)
-		__perf_remove_from_context(&re);
+		__A_perf_remove_from_context(&re);
 	rcu_read_unlock();
 }
 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-17 13:40         ` Peter Zijlstra
@ 2015-12-17 14:25           ` Alexander Shishkin
  2015-12-17 15:07             ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-17 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Dec 08, 2015 at 05:57:00PM +0100, Peter Zijlstra wrote:
>> On Tue, Dec 08, 2015 at 06:42:01PM +0200, Alexander Shishkin wrote:
>> > > +void ___perf_event_disable(void *info)
>> > 
>> > Only maybe change these to __perf_event_disable_locked() or something
>> > visually distinctive from the 'active' callback?
>> 
>> Yeah, I ran out of naming-foo and punted. I'll give it another go
>> tomorrow.
>
> How about something like so?
>
> It's a bit 'weird', but they're already long function names, and adding
> things like _locked to them makes them really rather unwieldy.

That aside, why I brought it up in the first place is because the two
functions are asymmetric: one is called with irqs disabled and the
other -- with ctx::lock held (and not because I'm into bikeshedding or
anything like that). Looking at the pair of them sets off my "that's not
right" trigger and sends me to the event_function_call()
implementation. So in that sense, prepending an extra underscore kind of
made sense. Maybe __perf_remove_from_context_{on,off}()?

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-17 14:25           ` Alexander Shishkin
@ 2015-12-17 15:07             ` Peter Zijlstra
  2015-12-18  9:01               ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-17 15:07 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Thu, Dec 17, 2015 at 04:25:14PM +0200, Alexander Shishkin wrote:

> That aside, why I brought it up in the first place is because the two
> functions are asymmetric: one is called with irqs disabled and the
> other -- with ctx::lock held (and not because I'm into bikeshedding or
> anything like that). Looking at the pair of them sets off my "that's not
> right" trigger and sends me to the event_function_call()
> implementation. So in that sense, prepending an extra underscore kind of
> made sense. Maybe __perf_remove_from_context_{on,off}()?

You are quite right, and I think I've found more problems because of
this. Let me prod at this some more.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-17 15:07             ` Peter Zijlstra
@ 2015-12-18  9:01               ` Peter Zijlstra
  2015-12-18 15:07                 ` Alexander Shishkin
                                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-18  9:01 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Thu, Dec 17, 2015 at 04:07:32PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 04:25:14PM +0200, Alexander Shishkin wrote:
> 
> > That aside, why I brought it up in the first place is because the two
> > functions are asymmetric: one is called with irqs disabled and the
> > other -- with ctx::lock held (and not because I'm into bikeshedding or
> > anything like that). Looking at the pair of them sets off my "that's not
> > right" trigger and sends me to the event_function_call()
> > implementation. So in that sense, prepending an extra underscore kind of
> > made sense. Maybe __perf_remove_from_context_{on,off}()?
> 
> You are quite right, and I think I've found more problems because of
> this. Let me prod at this some more.

So this then...

This fixes, I think, 3 separate bugs:

 - remove_from_context used to clear ->is_active, this is against the
   update rules from ctx_sched_in() which set ->is_active even though
   there might be !nr_events

 - install_in_context did bad things to cpuctx->task_ctx; it would not
   validate that ctx->task == current and could do random things because
   of that.

 - cpuctx->task_ctx tracking was iffy

It also unifies a lot of the weird and fragile code we had around all
those IPI calls and adds a bunch of assertions.

It seems to survive a little pounding with 'normal' workloads.

Please have an extra careful look..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/perf_event.h    |    4 
 kernel/events/core.c          |  463 ++++++++++++++++--------------------------
 kernel/events/hw_breakpoint.c |    2 
 3 files changed, 180 insertions(+), 289 deletions(-)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1044,7 +1044,7 @@ extern void perf_swevent_put_recursion_c
 extern u64 perf_swevent_set_period(struct perf_event *event);
 extern void perf_event_enable(struct perf_event *event);
 extern void perf_event_disable(struct perf_event *event);
-extern int __perf_event_disable(void *info);
+extern void perf_event_disable_local(struct perf_event *event);
 extern void perf_event_task_tick(void);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
@@ -1106,7 +1106,7 @@ static inline void perf_swevent_put_recu
 static inline u64 perf_swevent_set_period(struct perf_event *event)	{ return 0; }
 static inline void perf_event_enable(struct perf_event *event)		{ }
 static inline void perf_event_disable(struct perf_event *event)		{ }
-static inline int __perf_event_disable(void *info)			{ return -1; }
+static inline void perf_event_disable_local(struct perf_event *event)	{ }
 static inline void perf_event_task_tick(void)				{ }
 static inline int perf_event_release_kernel(struct perf_event *event)	{ return 0; }
 #endif
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -126,21 +126,127 @@ static int cpu_function_call(int cpu, re
 	return data.ret;
 }
 
-static void event_function_call(struct perf_event *event,
-				int (*active)(void *),
-				void (*inactive)(void *),
-				void *data)
+static inline struct perf_cpu_context *
+__get_cpu_context(struct perf_event_context *ctx)
+{
+	return this_cpu_ptr(ctx->pmu->pmu_cpu_context);
+}
+
+static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
+			  struct perf_event_context *ctx)
+{
+	raw_spin_lock(&cpuctx->ctx.lock);
+	if (ctx)
+		raw_spin_lock(&ctx->lock);
+}
+
+static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
+			    struct perf_event_context *ctx)
+{
+	if (ctx)
+		raw_spin_unlock(&ctx->lock);
+	raw_spin_unlock(&cpuctx->ctx.lock);
+}
+
+typedef void (*event_f)(struct perf_event *, struct perf_cpu_context *,
+			struct perf_event_context *, void *);
+
+struct event_function_struct {
+	struct perf_event *event;
+	event_f func;
+	void *data;
+};
+
+static int event_function(void *info)
+{
+	struct event_function_struct *efs = info;
+	struct perf_event *event = efs->event;
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	struct perf_event_context *task_ctx = NULL;
+
+	WARN_ON_ONCE(!irqs_disabled());
+
+	/*
+	 * If this is aimed at a task, check that we hit it.
+	 *
+	 * This being a smp_call_function() we'll have IRQs disabled, therefore
+	 * if this is current, we cannot race with context switches swapping
+	 * out contexts, see perf_event_context_sched_out().
+	 */
+	if (ctx->task) {
+		/* Wrong task, try again. */
+		if (ctx->task != current)
+			return -EAGAIN;
+
+		task_ctx = ctx;
+	} else
+		WARN_ON_ONCE(&cpuctx->ctx != ctx);
+
+	perf_ctx_lock(cpuctx, task_ctx);
+	/*
+	 * Now that we hold locks, double check state. Paranoia pays.
+	 */
+	if (task_ctx) {
+		WARN_ON_ONCE(task_ctx->task != current);
+
+		/*
+		 * The rules are that:
+		 *
+		 *  * ctx->is_active is set irrespective of the actual number of
+		 *    events, such that we can know it got scheduled in etc.
+		 *
+		 *  * cpuctx->task_ctx is only set if there were actual events on.
+		 *
+		 * It can happen that even though we hit the right task,
+		 * ->is_active is still false, this is possible when the ctx is
+		 * new and the task hasn't scheduled yet, or the ctx on its way
+		 * out.
+		 */
+		if (task_ctx->is_active && task_ctx->nr_events) {
+			WARN_ON_ONCE(cpuctx->task_ctx != task_ctx);
+		} else {
+			WARN_ON_ONCE(cpuctx->task_ctx);
+		}
+	}
+
+	efs->func(event, cpuctx, ctx, efs->data);
+	perf_ctx_unlock(cpuctx, task_ctx);
+
+	return 0;
+}
+
+static void event_function_local(struct perf_event *event, event_f func, void *data)
+{
+	struct event_function_struct efs = {
+		.event = event,
+		.func = func,
+		.data = data,
+	};
+
+	int ret = event_function(&efs);
+	WARN_ON_ONCE(ret);
+}
+
+static void event_function_call(struct perf_event *event, event_f func, void *data)
 {
 	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *task = ctx->task;
+	struct perf_cpu_context *cpuctx;
+
+	struct event_function_struct efs = {
+		.event = event,
+		.func = func,
+		.data = data,
+	};
 
 	if (!task) {
-		cpu_function_call(event->cpu, active, data);
+		cpu_function_call(event->cpu, event_function, &efs);
 		return;
 	}
 
 again:
-	if (!task_function_call(task, active, data))
+	if (!task_function_call(task, event_function, &efs))
 		return;
 
 	raw_spin_lock_irq(&ctx->lock);
@@ -153,7 +259,8 @@ static void event_function_call(struct p
 		raw_spin_unlock_irq(&ctx->lock);
 		goto again;
 	}
-	inactive(data);
+	cpuctx = __get_cpu_context(ctx);
+	func(event, cpuctx, ctx, data);
 	raw_spin_unlock_irq(&ctx->lock);
 }
 
@@ -368,28 +475,6 @@ static inline u64 perf_event_clock(struc
 	return event->clock();
 }
 
-static inline struct perf_cpu_context *
-__get_cpu_context(struct perf_event_context *ctx)
-{
-	return this_cpu_ptr(ctx->pmu->pmu_cpu_context);
-}
-
-static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
-			  struct perf_event_context *ctx)
-{
-	raw_spin_lock(&cpuctx->ctx.lock);
-	if (ctx)
-		raw_spin_lock(&ctx->lock);
-}
-
-static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
-			    struct perf_event_context *ctx)
-{
-	if (ctx)
-		raw_spin_unlock(&ctx->lock);
-	raw_spin_unlock(&cpuctx->ctx.lock);
-}
-
 #ifdef CONFIG_CGROUP_PERF
 
 static inline bool
@@ -1655,47 +1740,18 @@ group_sched_out(struct perf_event *group
 		cpuctx->exclusive = 0;
 }
 
-struct remove_event {
-	struct perf_event *event;
-	bool detach_group;
-};
-
-static void ___perf_remove_from_context(void *info)
-{
-	struct remove_event *re = info;
-	struct perf_event *event = re->event;
-	struct perf_event_context *ctx = event->ctx;
-
-	if (re->detach_group)
-		perf_group_detach(event);
-	list_del_event(event, ctx);
-}
-
-/*
- * Cross CPU call to remove a performance event
- *
- * We disable the event on the hardware level first. After that we
- * remove it from the context list.
- */
-static int __perf_remove_from_context(void *info)
+static void
+__perf_remove_from_context(struct perf_event *event, struct perf_cpu_context *cpuctx,
+			   struct perf_event_context *ctx, void *info)
 {
-	struct remove_event *re = info;
-	struct perf_event *event = re->event;
-	struct perf_event_context *ctx = event->ctx;
-	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	bool detach_group = (unsigned long)info;
 
-	raw_spin_lock(&ctx->lock);
 	event_sched_out(event, cpuctx, ctx);
-	if (re->detach_group)
+	if (detach_group)
 		perf_group_detach(event);
 	list_del_event(event, ctx);
-	if (!ctx->nr_events && cpuctx->task_ctx == ctx) {
-		ctx->is_active = 0;
+	if (!ctx->nr_events && cpuctx->task_ctx == ctx)
 		cpuctx->task_ctx = NULL;
-	}
-	raw_spin_unlock(&ctx->lock);
-
-	return 0;
 }
 
 /*
@@ -1714,70 +1770,33 @@ static int __perf_remove_from_context(vo
 static void perf_remove_from_context(struct perf_event *event, bool detach_group)
 {
 	struct perf_event_context *ctx = event->ctx;
-	struct remove_event re = {
-		.event = event,
-		.detach_group = detach_group,
-	};
 
 	lockdep_assert_held(&ctx->mutex);
 
 	event_function_call(event, __perf_remove_from_context,
-			    ___perf_remove_from_context, &re);
+			    (void *)(unsigned long)detach_group);
 }
 
-/*
- * Cross CPU call to disable a performance event
- */
-int __perf_event_disable(void *info)
+static void
+__perf_event_disable(struct perf_event *event, struct perf_cpu_context *cpuctx,
+		     struct perf_event_context *ctx, void *info)
 {
-	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
-	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-
-	/*
-	 * If this is a per-task event, need to check whether this
-	 * event's task is the current task on this cpu.
-	 *
-	 * Can trigger due to concurrent perf_event_context_sched_out()
-	 * flipping contexts around.
-	 */
-	if (ctx->task && cpuctx->task_ctx != ctx)
-		return -EINVAL;
-
-	raw_spin_lock(&ctx->lock);
-
-	/*
-	 * If the event is on, turn it off.
-	 * If it is in error state, leave it in error state.
-	 */
-	if (event->state >= PERF_EVENT_STATE_INACTIVE) {
-		update_context_time(ctx);
-		update_cgrp_time_from_event(event);
-		update_group_times(event);
-		if (event == event->group_leader)
-			group_sched_out(event, cpuctx, ctx);
-		else
-			event_sched_out(event, cpuctx, ctx);
-		event->state = PERF_EVENT_STATE_OFF;
-	}
-
-	raw_spin_unlock(&ctx->lock);
+	if (event->state < PERF_EVENT_STATE_INACTIVE)
+		return;
 
-	return 0;
+	update_context_time(ctx);
+	update_cgrp_time_from_event(event);
+	update_group_times(event);
+	if (event == event->group_leader)
+		group_sched_out(event, cpuctx, ctx);
+	else
+		event_sched_out(event, cpuctx, ctx);
+	event->state = PERF_EVENT_STATE_OFF;
 }
 
-void ___perf_event_disable(void *info)
+void perf_event_disable_local(struct perf_event *event)
 {
-	struct perf_event *event = info;
-
-	/*
-	 * Since we have the lock this context can't be scheduled
-	 * in, so we can change the state safely.
-	 */
-	if (event->state == PERF_EVENT_STATE_INACTIVE) {
-		update_group_times(event);
-		event->state = PERF_EVENT_STATE_OFF;
-	}
+	event_function_local(event, __perf_event_disable, NULL);
 }
 
 /*
@@ -1804,8 +1823,7 @@ static void _perf_event_disable(struct p
 	}
 	raw_spin_unlock_irq(&ctx->lock);
 
-	event_function_call(event, __perf_event_disable,
-			    ___perf_event_disable, event);
+	event_function_call(event, __perf_event_disable, NULL);
 }
 
 /*
@@ -2054,62 +2072,30 @@ static void perf_event_sched_in(struct p
 	if (ctx)
 		ctx_sched_in(ctx, cpuctx, EVENT_PINNED, task);
 	cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE, task);
-	if (ctx)
+	if (ctx) {
 		ctx_sched_in(ctx, cpuctx, EVENT_FLEXIBLE, task);
+		if (ctx->nr_events)
+			cpuctx->task_ctx = ctx;
+	}
 }
 
-static void ___perf_install_in_context(void *info)
-{
-	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
-
-	/*
-	 * Since the task isn't running, its safe to add the event, us holding
-	 * the ctx->lock ensures the task won't get scheduled in.
-	 */
-	add_event_to_ctx(event, ctx);
-}
-
-/*
- * Cross CPU call to install and enable a performance event
- *
- * Must be called with ctx->mutex held
- */
-static int  __perf_install_in_context(void *info)
+static void perf_resched_context(struct perf_cpu_context *cpuctx)
 {
-	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
-	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 	struct perf_event_context *task_ctx = cpuctx->task_ctx;
-	struct task_struct *task = current;
 
-	perf_ctx_lock(cpuctx, task_ctx);
 	perf_pmu_disable(cpuctx->ctx.pmu);
-
-	/*
-	 * If there was an active task_ctx schedule it out.
-	 */
 	if (task_ctx)
 		task_ctx_sched_out(task_ctx);
-
-	/*
-	 * If the context we're installing events in is not the
-	 * active task_ctx, flip them.
-	 */
-	if (ctx->task && task_ctx != ctx) {
-		if (task_ctx)
-			raw_spin_unlock(&task_ctx->lock);
-		raw_spin_lock(&ctx->lock);
-		task_ctx = ctx;
-	}
-
-	if (task_ctx) {
-		cpuctx->task_ctx = task_ctx;
-		task = task_ctx->task;
-	}
-
 	cpu_ctx_sched_out(cpuctx, EVENT_ALL);
 
+	perf_event_sched_in(cpuctx, task_ctx, current);
+	perf_pmu_enable(cpuctx->ctx.pmu);
+}
+
+static void
+__perf_install_in_context(struct perf_event *event, struct perf_cpu_context *cpuctx,
+			  struct perf_event_context *ctx, void *info)
+{
 	update_context_time(ctx);
 	/*
 	 * update cgrp time only if current cgrp
@@ -2120,15 +2106,8 @@ static int  __perf_install_in_context(vo
 
 	add_event_to_ctx(event, ctx);
 
-	/*
-	 * Schedule everything back in
-	 */
-	perf_event_sched_in(cpuctx, task_ctx, task);
-
-	perf_pmu_enable(cpuctx->ctx.pmu);
-	perf_ctx_unlock(cpuctx, task_ctx);
-
-	return 0;
+	if (ctx->is_active)
+		perf_resched_context(cpuctx);
 }
 
 /*
@@ -2152,8 +2131,7 @@ perf_install_in_context(struct perf_even
 	if (event->cpu != -1)
 		event->cpu = cpu;
 
-	event_function_call(event, __perf_install_in_context,
-			    ___perf_install_in_context, event);
+	event_function_call(event, __perf_install_in_context, NULL);
 }
 
 /*
@@ -2177,88 +2155,33 @@ static void __perf_event_mark_enabled(st
 	}
 }
 
-/*
- * Cross CPU call to enable a performance event
- */
-static int __perf_event_enable(void *info)
+static void
+__perf_event_enable(struct perf_event *event, struct perf_cpu_context *cpuctx,
+		    struct perf_event_context *ctx, void *info)
 {
-	struct perf_event *event = info;
-	struct perf_event_context *ctx = event->ctx;
 	struct perf_event *leader = event->group_leader;
-	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
-	int err;
-
-	/*
-	 * There's a time window between 'ctx->is_active' check
-	 * in perf_event_enable function and this place having:
-	 *   - IRQs on
-	 *   - ctx->lock unlocked
-	 *
-	 * where the task could be killed and 'ctx' deactivated
-	 * by perf_event_exit_task.
-	 */
-	if (!ctx->is_active)
-		return -EINVAL;
-
-	raw_spin_lock(&ctx->lock);
-	update_context_time(ctx);
 
 	if (event->state >= PERF_EVENT_STATE_INACTIVE)
-		goto unlock;
+		return;
 
-	/*
-	 * set current task's cgroup time reference point
-	 */
+	update_context_time(ctx);
 	perf_cgroup_set_timestamp(current, ctx);
-
 	__perf_event_mark_enabled(event);
 
 	if (!event_filter_match(event)) {
 		if (is_cgroup_event(event))
 			perf_cgroup_defer_enabled(event);
-		goto unlock;
+		return;
 	}
 
 	/*
-	 * If the event is in a group and isn't the group leader,
-	 * then don't put it on unless the group is on.
+	 * If we're part of a group and the leader isn't active, we won't be either.
 	 */
 	if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE)
-		goto unlock;
-
-	if (!group_can_go_on(event, cpuctx, 1)) {
-		err = -EEXIST;
-	} else {
-		if (event == leader)
-			err = group_sched_in(event, cpuctx, ctx);
-		else
-			err = event_sched_in(event, cpuctx, ctx);
-	}
-
-	if (err) {
-		/*
-		 * If this event can't go on and it's part of a
-		 * group, then the whole group has to come off.
-		 */
-		if (leader != event) {
-			group_sched_out(leader, cpuctx, ctx);
-			perf_mux_hrtimer_restart(cpuctx);
-		}
-		if (leader->attr.pinned) {
-			update_group_times(leader);
-			leader->state = PERF_EVENT_STATE_ERROR;
-		}
-	}
-
-unlock:
-	raw_spin_unlock(&ctx->lock);
-
-	return 0;
-}
+		return;
 
-void ___perf_event_enable(void *info)
-{
-	__perf_event_mark_enabled((struct perf_event *)info);
+	if (ctx->is_active)
+		perf_resched_context(cpuctx);
 }
 
 /*
@@ -2291,8 +2214,7 @@ static void _perf_event_enable(struct pe
 		event->state = PERF_EVENT_STATE_OFF;
 	raw_spin_unlock_irq(&ctx->lock);
 
-	event_function_call(event, __perf_event_enable,
-			    ___perf_event_enable, event);
+	event_function_call(event, __perf_event_enable, NULL);
 }
 
 /*
@@ -2774,9 +2696,6 @@ static void perf_event_context_sched_in(
 	 */
 	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 
-	if (ctx->nr_events)
-		cpuctx->task_ctx = ctx;
-
 	perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);
 
 	perf_pmu_enable(ctx->pmu);
@@ -4086,36 +4005,13 @@ static void perf_event_for_each(struct p
 		perf_event_for_each_child(sibling, func);
 }
 
-struct period_event {
-	struct perf_event *event;
-	u64 value;
-};
-
-static void ___perf_event_period(void *info)
-{
-	struct period_event *pe = info;
-	struct perf_event *event = pe->event;
-	u64 value = pe->value;
-
-	if (event->attr.freq) {
-		event->attr.sample_freq = value;
-	} else {
-		event->attr.sample_period = value;
-		event->hw.sample_period = value;
-	}
-
-	local64_set(&event->hw.period_left, 0);
-}
-
-static int __perf_event_period(void *info)
+static void
+__perf_event_period(struct perf_event *event, struct perf_cpu_context *cpuctx,
+		    struct perf_event_context *ctx, void *info)
 {
-	struct period_event *pe = info;
-	struct perf_event *event = pe->event;
-	struct perf_event_context *ctx = event->ctx;
-	u64 value = pe->value;
+	u64 value = *((u64 *)info);
 	bool active;
 
-	raw_spin_lock(&ctx->lock);
 	if (event->attr.freq) {
 		event->attr.sample_freq = value;
 	} else {
@@ -4125,6 +4021,7 @@ static int __perf_event_period(void *inf
 
 	active = (event->state == PERF_EVENT_STATE_ACTIVE);
 	if (active) {
+		WARN_ON_ONCE(!ctx->is_active);
 		perf_pmu_disable(ctx->pmu);
 		event->pmu->stop(event, PERF_EF_UPDATE);
 	}
@@ -4135,14 +4032,10 @@ static int __perf_event_period(void *inf
 		event->pmu->start(event, PERF_EF_RELOAD);
 		perf_pmu_enable(ctx->pmu);
 	}
-	raw_spin_unlock(&ctx->lock);
-
-	return 0;
 }
 
 static int perf_event_period(struct perf_event *event, u64 __user *arg)
 {
-	struct period_event pe = { .event = event, };
 	u64 value;
 
 	if (!is_sampling_event(event))
@@ -4157,10 +4050,7 @@ static int perf_event_period(struct perf
 	if (event->attr.freq && value > sysctl_perf_event_sample_rate)
 		return -EINVAL;
 
-	pe.value = value;
-
-	event_function_call(event, __perf_event_period,
-			    ___perf_event_period, &pe);
+	event_function_call(event, __perf_event_period, &value);
 
 	return 0;
 }
@@ -4932,7 +4822,7 @@ static void perf_pending_event(struct ir
 
 	if (event->pending_disable) {
 		event->pending_disable = 0;
-		__perf_event_disable(event);
+		perf_event_disable_local(event);
 	}
 
 	if (event->pending_wakeup) {
@@ -9221,13 +9111,14 @@ static void perf_event_init_cpu(int cpu)
 #if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC_CORE
 static void __perf_event_exit_context(void *__info)
 {
-	struct remove_event re = { .detach_group = true };
 	struct perf_event_context *ctx = __info;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	struct perf_event *event;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(re.event, &ctx->event_list, event_entry)
-		__perf_remove_from_context(&re);
-	rcu_read_unlock();
+	raw_spin_lock(&ctx->lock);
+	list_for_each_entry(event, &ctx->event_list, event_entry)
+		__perf_remove_from_context(event, cpuctx, ctx, (void *)(unsigned long)true);
+	raw_spin_unlock(&ctx->lock);
 }
 
 static void perf_event_exit_cpu_context(int cpu)
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -444,7 +444,7 @@ int modify_user_hw_breakpoint(struct per
 	 * current task.
 	 */
 	if (irqs_disabled() && bp->ctx && bp->ctx->task == current)
-		__perf_event_disable(bp);
+		perf_event_disable_local(bp);
 	else
 		perf_event_disable(bp);
 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-18  9:01               ` Peter Zijlstra
@ 2015-12-18 15:07                 ` Alexander Shishkin
  2015-12-18 16:47                   ` Peter Zijlstra
  2015-12-21 14:39                 ` Alexander Shishkin
  2016-01-11 10:44                 ` Alexander Shishkin
  2 siblings, 1 reply; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-18 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> @@ -2774,9 +2696,6 @@ static void perf_event_context_sched_in(
>  	 */
>  	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>  
> -	if (ctx->nr_events)
> -		cpuctx->task_ctx = ctx;
> -
>  	perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);

This then should probably become

  	perf_event_sched_in(cpuctx, ctx, task);

otherwise task contexts just don't get scheduled any more.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-18 15:07                 ` Alexander Shishkin
@ 2015-12-18 16:47                   ` Peter Zijlstra
  2015-12-18 17:41                     ` Alexander Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2015-12-18 16:47 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

On Fri, Dec 18, 2015 at 05:07:34PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > @@ -2774,9 +2696,6 @@ static void perf_event_context_sched_in(
> >  	 */
> >  	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
> >  
> > -	if (ctx->nr_events)
> > -		cpuctx->task_ctx = ctx;
> > -
> >  	perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);
> 
> This then should probably become
> 
>   	perf_event_sched_in(cpuctx, ctx, task);
> 
> otherwise task contexts just don't get scheduled any more.

Very good, thanks!

Running with that triggers another WARN; I'll have a look at that later
tonight.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-18 16:47                   ` Peter Zijlstra
@ 2015-12-18 17:41                     ` Alexander Shishkin
  0 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-18 17:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Fri, Dec 18, 2015 at 05:07:34PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > @@ -2774,9 +2696,6 @@ static void perf_event_context_sched_in(
>> >  	 */
>> >  	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
>> >  
>> > -	if (ctx->nr_events)
>> > -		cpuctx->task_ctx = ctx;
>> > -
>> >  	perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);
>> 
>> This then should probably become
>> 
>>   	perf_event_sched_in(cpuctx, ctx, task);
>> 
>> otherwise task contexts just don't get scheduled any more.
>
> Very good, thanks!
>
> Running with that triggers another WARN; I'll have a look at that later
> tonight.

Hmm, is it the one from event_function() where a task context is
scheduled, but is inactive and/or doesn't have events?

I think there's also more fun to be had there now that you've started
digging it up; I'll also poke at it more later today.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-18  9:01               ` Peter Zijlstra
  2015-12-18 15:07                 ` Alexander Shishkin
@ 2015-12-21 14:39                 ` Alexander Shishkin
  2016-01-11 10:44                 ` Alexander Shishkin
  2 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2015-12-21 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> -
> -	/*
> -	 * If the context we're installing events in is not the
> -	 * active task_ctx, flip them.
> -	 */
> -	if (ctx->task && task_ctx != ctx) {
> -		if (task_ctx)
> -			raw_spin_unlock(&task_ctx->lock);
> -		raw_spin_lock(&ctx->lock);
> -		task_ctx = ctx;
> -	}
> -
> -	if (task_ctx) {
> -		cpuctx->task_ctx = task_ctx;
> -		task = task_ctx->task;
> -	}
> -

So previously, this would schedule in the task_ctx right in the
perf_install_in_context() path.

The new code would only reschedule the context if it is already on:

> +	if (ctx->is_active)
> +		perf_resched_context(cpuctx);
>  }

which means, iiuc, that an enabled event (say, attr.disabled==0) will
have to wait till the next time the ctx::task is scheduled instead of
getting scheduled right here.

Something like

    if (ctx->task == current && ctx->nr_events)
       perf_event_sched_in(cpuctx, ctx, ctx->task);

might make sense here.

Also, the new __perf_event_enable() has the same symptom: it doesn't
schedule the new context on; it only reschedules it if it's already on.
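
i.e., presumably something of the same shape would be needed there as
well (just a sketch, untested):

	if (ctx->is_active)
		perf_resched_context(cpuctx);
	else if (ctx->task == current && ctx->nr_events)
		perf_event_sched_in(cpuctx, ctx, ctx->task);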

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/7] perf: Generalize task_function_call()ers
  2015-12-18  9:01               ` Peter Zijlstra
  2015-12-18 15:07                 ` Alexander Shishkin
  2015-12-21 14:39                 ` Alexander Shishkin
@ 2016-01-11 10:44                 ` Alexander Shishkin
  2 siblings, 0 replies; 28+ messages in thread
From: Alexander Shishkin @ 2016-01-11 10:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, johannes,
	Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 17, 2015 at 04:07:32PM +0100, Peter Zijlstra wrote:
>> On Thu, Dec 17, 2015 at 04:25:14PM +0200, Alexander Shishkin wrote:
>> 
>> > That aside, why I brought it up in the first place is because the two
>> > functions are asymmetric: one is called with irqs disabled and the
>> > other -- with ctx::lock held (and not because I'm into bikeshedding or
>> > anything like that). Looking at the pair of them sets off my "that's not
>> > right" trigger and sends me to the event_function_call()
>> > implementation. So in that sense, prepending an extra underscore kind of
>> > made sense. Maybe __perf_remove_from_context_{on,off}()?
>> 
>> You are quite right, and I think I've found more problems because of
>> this. Let me prod at this some more.
>
> So this then...
>
> This fixes, I think, 3 separate bugs:
>
>  - remove_from_context used to clear ->is_active, this is against the
>    update rules from ctx_sched_in() which set ->is_active even though
>    there might be !nr_events
>
>  - install_in_context did bad things to cpuctx->task_ctx; it would not
>    validate that ctx->task == current and could do random things because
>    of that.
>
>  - cpuctx->task_ctx tracking was iffy
>
> It also unifies a lot of the weird and fragile code we had around all
> those IPI calls and adds a bunch of assertions.
>
> It seems to survive a little pounding with 'normal' workloads.
>
> Please have an extra careful look..

I notice that you dropped this from your queue; do you have any plans to
proceed with it, or should I pick it up?

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread

Thread overview: 28+ messages
2015-12-03 10:32 [PATCH 0/7] perf: Untangle aux refcounting Alexander Shishkin
2015-12-03 10:32 ` [PATCH 1/7] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
2015-12-03 10:32 ` [PATCH 2/7] perf: Generalize task_function_call()ers Alexander Shishkin
2015-12-03 17:34   ` Peter Zijlstra
2015-12-08 16:42     ` Alexander Shishkin
2015-12-08 16:57       ` Peter Zijlstra
2015-12-17 13:40         ` Peter Zijlstra
2015-12-17 14:25           ` Alexander Shishkin
2015-12-17 15:07             ` Peter Zijlstra
2015-12-18  9:01               ` Peter Zijlstra
2015-12-18 15:07                 ` Alexander Shishkin
2015-12-18 16:47                   ` Peter Zijlstra
2015-12-18 17:41                     ` Alexander Shishkin
2015-12-21 14:39                 ` Alexander Shishkin
2016-01-11 10:44                 ` Alexander Shishkin
2015-12-03 10:32 ` [PATCH 3/7] perf: Add a helper to stop running events Alexander Shishkin
2015-12-03 10:32 ` [PATCH 4/7] perf: Free aux pages in unmap path Alexander Shishkin
2015-12-04 17:02   ` Peter Zijlstra
2015-12-04 22:17     ` Peter Zijlstra
2015-12-07 16:16       ` Peter Zijlstra
2015-12-09  9:57     ` Alexander Shishkin
2015-12-09 10:56       ` Peter Zijlstra
2015-12-10 11:20         ` Alexander Shishkin
2015-12-10 12:58           ` Alexander Shishkin
2015-12-03 10:32 ` [PATCH 5/7] perf: Document aux api usage Alexander Shishkin
2015-12-03 20:36   ` Mathieu Poirier
2015-12-03 10:32 ` [PATCH 6/7] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
2015-12-03 10:32 ` [PATCH 7/7] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
