* [PATCH v2 0/5] perf: Untangle aux refcounting
@ 2016-03-04 13:42 Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 1/5] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-04 13:42 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian,
	Arnaldo Carvalho de Melo, Alexander Shishkin

Hi Peter,

Here's a long overdue update of the previous attempt to untangle the
aux buffer unmapping and deallocation. Most of the work is in 2/5.

Alexander Shishkin (5):
  perf: Refuse to begin aux transaction after aux_mmap_count drops
  perf: Free aux pages in unmap path
  perf: Document aux api usage
  perf/x86/intel/pt: Move transaction start/stop to pmu start/stop
    callbacks
  perf/x86/intel/bts: Move transaction start/stop to start/stop
    callbacks

 arch/x86/events/intel/bts.c | 105 ++++++++++++++++++++------------------------
 arch/x86/events/intel/pt.c  |  85 ++++++++++++-----------------------
 kernel/events/core.c        | 100 ++++++++++++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |   1 -
 kernel/events/ring_buffer.c |  52 +++++++++++-----------
 5 files changed, 200 insertions(+), 143 deletions(-)

-- 
2.7.0


* [PATCH v2 1/5] perf: Refuse to begin aux transaction after aux_mmap_count drops
  2016-03-04 13:42 [PATCH v2 0/5] perf: Untangle aux refcounting Alexander Shishkin
@ 2016-03-04 13:42 ` Alexander Shishkin
  2016-03-31  9:23   ` [tip:perf/core] perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops tip-bot for Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 2/5] perf: Free aux pages in unmap path Alexander Shishkin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-04 13:42 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian,
	Arnaldo Carvalho de Melo, Alexander Shishkin

When ring buffer's aux area is unmapped and aux_mmap_count drops to
zero, new aux transactions into this buffer can still be started,
even though the buffer is en route to deallocation.

This patch adds a check to perf_aux_output_begin() for aux_mmap_count
being zero, in which case there is no point in starting new transactions.
In other words, ring buffers that pass a certain point in
perf_mmap_close() will not have their events sending new data, which
clears the path for freeing those buffers' pages right there and then,
provided that no active transactions are holding the aux reference.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/ring_buffer.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 1faad2cfdb..af5fbc7e91 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -288,6 +288,13 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 		goto err;
 
 	/*
+	 * If rb::aux_mmap_count is zero (and rb_has_aux() above went through),
+	 * the aux buffer is in perf_mmap_close(), about to get free'd.
+	 */
+	if (!atomic_read(&rb->aux_mmap_count))
+		goto err;
+
+	/*
 	 * Nesting is not supported for AUX area, make sure nested
 	 * writers are caught early
 	 */
-- 
2.7.0
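
For illustration, a sketch (not part of the patch) of the window this
check closes; the interleaving and the CPU numbering are hypothetical:

	CPU0: perf_mmap_close()              CPU1: PMI / pmu::start()
	  aux_mmap_count drops 1 -> 0
	                                     perf_aux_output_begin()
	                                       rb_has_aux(rb)  /* still true */
	                                       !atomic_read(&rb->aux_mmap_count)
	                                         -> goto err   /* refused */
	  ...the AUX area can now be torn
	     down with no new writer racing in...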


* [PATCH v2 2/5] perf: Free aux pages in unmap path
  2016-03-04 13:42 [PATCH v2 0/5] perf: Untangle aux refcounting Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 1/5] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
@ 2016-03-04 13:42 ` Alexander Shishkin
  2016-03-14 12:38   ` Peter Zijlstra
  2016-03-04 13:42 ` [PATCH v2 3/5] perf: Document aux api usage Alexander Shishkin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-04 13:42 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian,
	Arnaldo Carvalho de Melo, Alexander Shishkin

Now that we can ensure that no new transactions start once the ring
buffer's aux area is on the way to getting unmapped, we only need to stop
all events that can potentially be writing aux data to our ring buffer.
Having done that, we can safely free the aux pages and corresponding
pmu data, as this time it is guaranteed to be the last aux reference
holder. This partially reverts 57ffc5ca679 ("perf: Fix AUX buffer
refcounting"), which was made to defer deallocation that was otherwise
possible from an NMI context. Now it is no longer the case; the last
call to rb_free_aux() that drops the last AUX reference has to happen
in perf_mmap_close() on that AUX area.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c        | 100 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |   1 -
 kernel/events/ring_buffer.c |  37 +++++-----------
 3 files changed, 109 insertions(+), 29 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5dcc0bd08d..00cf56abe8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1910,8 +1910,13 @@ event_sched_in(struct perf_event *event,
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
 
-	event->state = PERF_EVENT_STATE_ACTIVE;
-	event->oncpu = smp_processor_id();
+	WRITE_ONCE(event->oncpu, smp_processor_id());
+	/*
+	 * Order event::oncpu write to happen before the ACTIVE state
+	 * is visible.
+	 */
+	smp_wmb();
+	WRITE_ONCE(event->state, PERF_EVENT_STATE_ACTIVE);
 
 	/*
 	 * Unthrottle events, since we scheduled we might have missed several
@@ -2343,6 +2348,29 @@ void perf_event_enable(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_enable);
 
+static int __perf_event_stop(void *info)
+{
+	struct perf_event *event = info;
+
+	/* for aux events, our job is done if the event is already inactive */
+	if (READ_ONCE(event->state) != PERF_EVENT_STATE_ACTIVE)
+		return 0;
+
+	/* matches smp_wmb() in event_sched_in() */
+	smp_rmb();
+
+	/*
+	 * There is a window with interrupts enabled before we get here,
+	 * so we need to check again lest we try to stop another cpu's event.
+	 */
+	if (READ_ONCE(event->oncpu) != smp_processor_id())
+		return -EAGAIN;
+
+	event->pmu->stop(event, PERF_EF_UPDATE);
+
+	return 0;
+}
+
 static int _perf_event_refresh(struct perf_event *event, int refresh)
 {
 	/*
@@ -4622,6 +4650,8 @@ static void perf_mmap_open(struct vm_area_struct *vma)
 		event->pmu->event_mapped(event);
 }
 
+static void perf_pmu_output_stop(struct perf_event *event);
+
 /*
  * A buffer can be mmap()ed multiple times; either directly through the same
  * event, or through other events by use of perf_event_set_output().
@@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	 */
 	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
 	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
+		/*
+		 * Stop all aux events that are writing to this here buffer,
+		 * so that we can free its aux pages and corresponding pmu
+		 * data. Note that after rb::aux_mmap_count dropped to zero,
+		 * they won't start any more (see perf_aux_output_begin()).
+		 */
+		perf_pmu_output_stop(event);
+
+		/* now it's safe to free the pages */
 		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
 		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
 
+		/* this has to be the last one */
 		rb_free_aux(rb);
+		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
+
 		mutex_unlock(&event->mmap_mutex);
 	}
 
@@ -5723,6 +5765,60 @@ next:
 	rcu_read_unlock();
 }
 
+struct remote_output {
+	struct ring_buffer	*rb;
+	int			err;
+};
+
+static void __perf_event_output_stop(struct perf_event *event, void *data)
+{
+	struct perf_event *parent = event->parent;
+	struct remote_output *ro = data;
+	struct ring_buffer *rb = ro->rb;
+
+	if (!has_aux(event))
+		return;
+
+	if (rcu_dereference(event->rb) == rb)
+		ro->err = __perf_event_stop(event);
+	else if (parent && rcu_dereference(parent->rb) == rb)
+		ro->err = __perf_event_stop(event);
+}
+
+static int __perf_pmu_output_stop(void *info)
+{
+	struct perf_event *event = info;
+	struct pmu *pmu = event->pmu;
+	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+	struct remote_output ro = {
+		.rb	= event->rb,
+	};
+
+	rcu_read_lock();
+	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
+	if (cpuctx->task_ctx)
+		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
+				   &ro);
+	rcu_read_unlock();
+
+	return ro.err;
+}
+
+static void perf_pmu_output_stop(struct perf_event *event)
+{
+	int cpu, err;
+
+	/* better be thorough */
+	get_online_cpus();
+restart:
+	for_each_online_cpu(cpu) {
+		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
+		if (err)
+			goto restart;
+	}
+	put_online_cpus();
+}
+
 /*
  * task tracking -- fork/exit
  *
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c127..2b229fdcfc 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,7 +11,6 @@
 struct ring_buffer {
 	atomic_t			refcount;
 	struct rcu_head			rcu_head;
-	struct irq_work			irq_work;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index af5fbc7e91..fb7f5bce27 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,8 +221,6 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
-static void rb_irq_work(struct irq_work *work);
-
 static void
 ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 {
@@ -243,16 +241,6 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-	init_irq_work(&rb->irq_work, rb_irq_work);
-}
-
-static void ring_buffer_put_async(struct ring_buffer *rb)
-{
-	if (!atomic_dec_and_test(&rb->refcount))
-		return;
-
-	rb->rcu_head.next = (void *)rb;
-	irq_work_queue(&rb->irq_work);
 }
 
 /*
@@ -292,7 +280,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	 * the aux buffer is in perf_mmap_close(), about to get free'd.
 	 */
 	if (!atomic_read(&rb->aux_mmap_count))
-		goto err;
+		goto err_put;
 
 	/*
 	 * Nesting is not supported for AUX area, make sure nested
@@ -338,7 +326,7 @@ err_put:
 	rb_free_aux(rb);
 
 err:
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 	handle->event = NULL;
 
 	return NULL;
@@ -389,7 +377,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 
 	local_set(&rb->aux_nest, 0);
 	rb_free_aux(rb);
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 }
 
 /*
@@ -470,6 +458,14 @@ static void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
+	/*
+	 * Should never happen, the last reference should be dropped from
+	 * perf_mmap_close() path, which first stops aux transactions (which
+	 * in turn are the atomic holders of aux_refcount) and then does the
+	 * last rb_free_aux().
+	 */
+	WARN_ON_ONCE(in_atomic());
+
 	if (rb->aux_priv) {
 		rb->free_aux(rb->aux_priv);
 		rb->free_aux = NULL;
@@ -581,18 +577,7 @@ out:
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount))
-		irq_work_queue(&rb->irq_work);
-}
-
-static void rb_irq_work(struct irq_work *work)
-{
-	struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
-
-	if (!atomic_read(&rb->aux_refcount))
 		__rb_free_aux(rb);
-
-	if (rb->rcu_head.next == (void *)rb)
-		call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC
-- 
2.7.0
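
For context, this path is driven from userspace via munmap() of the AUX
area. A minimal sketch, assuming a perf event fd and its base buffer
mapping are already set up (base, base_size and aux_size are
illustrative; error handling omitted):

	struct perf_event_mmap_page *pc = base;	/* the event's user page */
	void *aux;

	/* tell the kernel where the AUX area goes before mapping it */
	pc->aux_offset = base_size;
	pc->aux_size   = aux_size;
	aux = mmap(NULL, aux_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, pc->aux_offset);

	/* ... the hardware traces into the AUX area ... */

	/*
	 * Drops rb->aux_mmap_count; with this patch, perf_mmap_close()
	 * stops all writers and then frees the AUX pages right here.
	 */
	munmap(aux, aux_size);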


* [PATCH v2 3/5] perf: Document aux api usage
  2016-03-04 13:42 [PATCH v2 0/5] perf: Untangle aux refcounting Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 1/5] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 2/5] perf: Free aux pages in unmap path Alexander Shishkin
@ 2016-03-04 13:42 ` Alexander Shishkin
  2016-03-31  9:24   ` [tip:perf/core] perf/ring_buffer: Document AUX API usage tip-bot for Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 4/5] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 5/5] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
  4 siblings, 1 reply; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-04 13:42 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Mathieu Poirier

In order to ensure safe aux buffer management, we rely on the assumption
that pmu::stop() stops its ongoing aux transaction and not just the hw.

This patch documents this requirement for the perf_aux_output_{begin,end}()
APIs.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 kernel/events/ring_buffer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index fb7f5bce27..b5feb7ebb9 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -252,6 +252,10 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
  * The ordering is similar to that of perf_output_{begin,end}, with
  * the exception of (B), which should be taken care of by the pmu
  * driver, since ordering rules will differ depending on hardware.
+ *
+ * Call this from pmu::start(); see the comment in perf_aux_output_end()
+ * about its use in pmu callbacks. Both can also be called from the PMI
+ * handler if needed.
  */
 void *perf_aux_output_begin(struct perf_output_handle *handle,
 			    struct perf_event *event)
@@ -323,6 +327,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	return handle->rb->aux_priv;
 
 err_put:
+	/* can't be last */
 	rb_free_aux(rb);
 
 err:
@@ -337,6 +342,10 @@ err:
  * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
  * pmu driver's responsibility to observe ordering rules of the hardware,
  * so that all the data is externally visible before this is called.
+ *
+ * Note: this has to be called from pmu::stop() callback, as the assumption
+ * of the aux buffer management code is that after pmu::stop(), the aux
+ * transaction must be stopped and therefore drop the aux reference count.
  */
 void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 			 bool truncated)
@@ -376,6 +385,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 	handle->event = NULL;
 
 	local_set(&rb->aux_nest, 0);
+	/* can't be last */
 	rb_free_aux(rb);
 	ring_buffer_put(rb);
 }
-- 
2.7.0
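
Condensed into a skeleton, the convention documented above looks roughly
like this (a sketch only: the foo_* names and foo_handle are
hypothetical; patches 4 and 5 below convert the real PT and BTS drivers
along these lines):

	static void foo_event_start(struct perf_event *event, int mode)
	{
		/* begins the AUX transaction, takes the aux reference */
		struct foo_buf *buf = perf_aux_output_begin(&foo_handle, event);

		if (!buf) {
			event->hw.state = PERF_HES_STOPPED;
			return;
		}

		event->hw.state = 0;
		/* ... point the hardware at buf and enable it ... */
	}

	static void foo_event_stop(struct perf_event *event, int mode)
	{
		/* ... disable the hardware, sync its write pointer ... */

		if (mode & PERF_EF_UPDATE)
			/* ends the transaction, drops the aux reference */
			perf_aux_output_end(&foo_handle, foo_data_size(event),
					    false);
	}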


* [PATCH v2 4/5] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks
  2016-03-04 13:42 [PATCH v2 0/5] perf: Untangle aux refcounting Alexander Shishkin
                   ` (2 preceding siblings ...)
  2016-03-04 13:42 ` [PATCH v2 3/5] perf: Document aux api usage Alexander Shishkin
@ 2016-03-04 13:42 ` Alexander Shishkin
  2016-03-31  9:24   ` [tip:perf/core] perf/x86/intel/pt: Move transaction start/stop to PMU " tip-bot for Alexander Shishkin
  2016-03-04 13:42 ` [PATCH v2 5/5] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
  4 siblings, 1 reply; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-04 13:42 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Mathieu Poirier

As per the AUX buffer management requirement, AUX output has to happen
between pmu::start and pmu::stop calls, so that perf_event_stop() actually
stops it and therefore perf can free the aux data after it has called
pmu::stop.

This patch moves perf_aux_output_{begin,end} from pt_event_{add,del} to
pt_event_{start,stop}. As a bonus, we get rid of pt_buffer_is_full(),
which is already taken care of by perf_aux_output_begin() anyway.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 arch/x86/events/intel/pt.c | 85 +++++++++++++++-------------------------------
 1 file changed, 27 insertions(+), 58 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 6af7cf71d6..127f58c179 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -906,26 +906,6 @@ static void pt_buffer_free_aux(void *data)
 }
 
 /**
- * pt_buffer_is_full() - check if the buffer is full
- * @buf:	PT buffer.
- * @pt:		Per-cpu pt handle.
- *
- * If the user hasn't read data from the output region that aux_head
- * points to, the buffer is considered full: the user needs to read at
- * least this region and update aux_tail to point past it.
- */
-static bool pt_buffer_is_full(struct pt_buffer *buf, struct pt *pt)
-{
-	if (buf->snapshot)
-		return false;
-
-	if (local_read(&buf->data_size) >= pt->handle.size)
-		return true;
-
-	return false;
-}
-
-/**
  * intel_pt_interrupt() - PT PMI handler
  */
 void intel_pt_interrupt(void)
@@ -989,20 +969,33 @@ void intel_pt_interrupt(void)
 
 static void pt_event_start(struct perf_event *event, int mode)
 {
+	struct hw_perf_event *hwc = &event->hw;
 	struct pt *pt = this_cpu_ptr(&pt_ctx);
-	struct pt_buffer *buf = perf_get_aux(&pt->handle);
+	struct pt_buffer *buf;
 
-	if (!buf || pt_buffer_is_full(buf, pt)) {
-		event->hw.state = PERF_HES_STOPPED;
-		return;
+	buf = perf_aux_output_begin(&pt->handle, event);
+	if (!buf)
+		goto fail_stop;
+
+	pt_buffer_reset_offsets(buf, pt->handle.head);
+	if (!buf->snapshot) {
+		if (pt_buffer_reset_markers(buf, &pt->handle))
+			goto fail_end_stop;
 	}
 
 	ACCESS_ONCE(pt->handle_nmi) = 1;
-	event->hw.state = 0;
+	hwc->state = 0;
 
 	pt_config_buffer(buf->cur->table, buf->cur_idx,
 			 buf->output_off);
 	pt_config(event);
+
+	return;
+
+fail_end_stop:
+	perf_aux_output_end(&pt->handle, 0, true);
+fail_stop:
+	hwc->state = PERF_HES_STOPPED;
 }
 
 static void pt_event_stop(struct perf_event *event, int mode)
@@ -1035,19 +1028,7 @@ static void pt_event_stop(struct perf_event *event, int mode)
 		pt_handle_status(pt);
 
 		pt_update_head(pt);
-	}
-}
 
-static void pt_event_del(struct perf_event *event, int mode)
-{
-	struct pt *pt = this_cpu_ptr(&pt_ctx);
-	struct pt_buffer *buf;
-
-	pt_event_stop(event, PERF_EF_UPDATE);
-
-	buf = perf_get_aux(&pt->handle);
-
-	if (buf) {
 		if (buf->snapshot)
 			pt->handle.head =
 				local_xchg(&buf->data_size,
@@ -1057,9 +1038,13 @@ static void pt_event_del(struct perf_event *event, int mode)
 	}
 }
 
+static void pt_event_del(struct perf_event *event, int mode)
+{
+	pt_event_stop(event, PERF_EF_UPDATE);
+}
+
 static int pt_event_add(struct perf_event *event, int mode)
 {
-	struct pt_buffer *buf;
 	struct pt *pt = this_cpu_ptr(&pt_ctx);
 	struct hw_perf_event *hwc = &event->hw;
 	int ret = -EBUSY;
@@ -1067,34 +1052,18 @@ static int pt_event_add(struct perf_event *event, int mode)
 	if (pt->handle.event)
 		goto fail;
 
-	buf = perf_aux_output_begin(&pt->handle, event);
-	ret = -EINVAL;
-	if (!buf)
-		goto fail_stop;
-
-	pt_buffer_reset_offsets(buf, pt->handle.head);
-	if (!buf->snapshot) {
-		ret = pt_buffer_reset_markers(buf, &pt->handle);
-		if (ret)
-			goto fail_end_stop;
-	}
-
 	if (mode & PERF_EF_START) {
 		pt_event_start(event, 0);
-		ret = -EBUSY;
+		ret = -EINVAL;
 		if (hwc->state == PERF_HES_STOPPED)
-			goto fail_end_stop;
+			goto fail;
 	} else {
 		hwc->state = PERF_HES_STOPPED;
 	}
 
-	return 0;
-
-fail_end_stop:
-	perf_aux_output_end(&pt->handle, 0, true);
-fail_stop:
-	hwc->state = PERF_HES_STOPPED;
+	ret = 0;
 fail:
+
 	return ret;
 }
 
-- 
2.7.0


* [PATCH v2 5/5] perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks
  2016-03-04 13:42 [PATCH v2 0/5] perf: Untangle aux refcounting Alexander Shishkin
                   ` (3 preceding siblings ...)
  2016-03-04 13:42 ` [PATCH v2 4/5] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
@ 2016-03-04 13:42 ` Alexander Shishkin
  2016-03-31  9:25   ` [tip:perf/core] " tip-bot for Alexander Shishkin
  4 siblings, 1 reply; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-04 13:42 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Mathieu Poirier

As per the AUX buffer management requirement, AUX output has to happen
between pmu::start and pmu::stop calls, so that perf_event_stop() actually
stops it and therefore perf can free the aux data after it has called
pmu::stop.

This patch moves perf_aux_output_{begin,end} from bts_event_{add,del} to
bts_event_{start,stop}. As a bonus, we get rid of bts_buffer_is_full(),
which is already taken care of by perf_aux_output_begin() anyway.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 arch/x86/events/intel/bts.c | 105 ++++++++++++++++++++------------------------
 1 file changed, 48 insertions(+), 57 deletions(-)

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index b99dc9258c..0a6e393a2e 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -171,18 +171,6 @@ static void bts_buffer_pad_out(struct bts_phys *phys, unsigned long head)
 	memset(page_address(phys->page) + index, 0, phys->size - index);
 }
 
-static bool bts_buffer_is_full(struct bts_buffer *buf, struct bts_ctx *bts)
-{
-	if (buf->snapshot)
-		return false;
-
-	if (local_read(&buf->data_size) >= bts->handle.size ||
-	    bts->handle.size - local_read(&buf->data_size) < BTS_RECORD_SIZE)
-		return true;
-
-	return false;
-}
-
 static void bts_update(struct bts_ctx *bts)
 {
 	int cpu = raw_smp_processor_id();
@@ -213,18 +201,15 @@ static void bts_update(struct bts_ctx *bts)
 	}
 }
 
+static int
+bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle);
+
 static void __bts_event_start(struct perf_event *event)
 {
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
 	struct bts_buffer *buf = perf_get_aux(&bts->handle);
 	u64 config = 0;
 
-	if (!buf || bts_buffer_is_full(buf, bts))
-		return;
-
-	event->hw.itrace_started = 1;
-	event->hw.state = 0;
-
 	if (!buf->snapshot)
 		config |= ARCH_PERFMON_EVENTSEL_INT;
 	if (!event->attr.exclude_kernel)
@@ -241,16 +226,41 @@ static void __bts_event_start(struct perf_event *event)
 	wmb();
 
 	intel_pmu_enable_bts(config);
+
 }
 
 static void bts_event_start(struct perf_event *event, int flags)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
+	struct bts_buffer *buf;
+
+	buf = perf_aux_output_begin(&bts->handle, event);
+	if (!buf)
+		goto fail_stop;
+
+	if (bts_buffer_reset(buf, &bts->handle))
+		goto fail_end_stop;
+
+	bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
+	bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
+	bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
+
+	event->hw.itrace_started = 1;
+	event->hw.state = 0;
 
 	__bts_event_start(event);
 
 	/* PMI handler: this counter is running and likely generating PMIs */
 	ACCESS_ONCE(bts->started) = 1;
+
+	return;
+
+fail_end_stop:
+	perf_aux_output_end(&bts->handle, 0, false);
+
+fail_stop:
+	event->hw.state = PERF_HES_STOPPED;
 }
 
 static void __bts_event_stop(struct perf_event *event)
@@ -269,15 +279,32 @@ static void __bts_event_stop(struct perf_event *event)
 
 static void bts_event_stop(struct perf_event *event, int flags)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
+	struct bts_buffer *buf = perf_get_aux(&bts->handle);
 
 	/* PMI handler: don't restart this counter */
 	ACCESS_ONCE(bts->started) = 0;
 
 	__bts_event_stop(event);
 
-	if (flags & PERF_EF_UPDATE)
+	if (flags & PERF_EF_UPDATE) {
 		bts_update(bts);
+
+		if (buf) {
+			if (buf->snapshot)
+				bts->handle.head =
+					local_xchg(&buf->data_size,
+						   buf->nr_pages << PAGE_SHIFT);
+			perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
+					    !!local_xchg(&buf->lost, 0));
+		}
+
+		cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
+		cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
+		cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
+		cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
+	}
 }
 
 void intel_bts_enable_local(void)
@@ -417,34 +444,14 @@ int intel_bts_interrupt(void)
 
 static void bts_event_del(struct perf_event *event, int mode)
 {
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
-	struct bts_buffer *buf = perf_get_aux(&bts->handle);
-
 	bts_event_stop(event, PERF_EF_UPDATE);
-
-	if (buf) {
-		if (buf->snapshot)
-			bts->handle.head =
-				local_xchg(&buf->data_size,
-					   buf->nr_pages << PAGE_SHIFT);
-		perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
-				    !!local_xchg(&buf->lost, 0));
-	}
-
-	cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
-	cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
-	cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
-	cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
 }
 
 static int bts_event_add(struct perf_event *event, int mode)
 {
-	struct bts_buffer *buf;
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
-	int ret = -EBUSY;
 
 	event->hw.state = PERF_HES_STOPPED;
 
@@ -454,26 +461,10 @@ static int bts_event_add(struct perf_event *event, int mode)
 	if (bts->handle.event)
 		return -EBUSY;
 
-	buf = perf_aux_output_begin(&bts->handle, event);
-	if (!buf)
-		return -EINVAL;
-
-	ret = bts_buffer_reset(buf, &bts->handle);
-	if (ret) {
-		perf_aux_output_end(&bts->handle, 0, false);
-		return ret;
-	}
-
-	bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
-	bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
-	bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
-
 	if (mode & PERF_EF_START) {
 		bts_event_start(event, 0);
-		if (hwc->state & PERF_HES_STOPPED) {
-			bts_event_del(event, 0);
-			return -EBUSY;
-		}
+		if (hwc->state & PERF_HES_STOPPED)
+			return -EINVAL;
 	}
 
 	return 0;
-- 
2.7.0


* Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
  2016-03-04 13:42 ` [PATCH v2 2/5] perf: Free aux pages in unmap path Alexander Shishkin
@ 2016-03-14 12:38   ` Peter Zijlstra
  2016-03-14 14:04     ` Alexander Shishkin
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2016-03-14 12:38 UTC
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, Arnaldo Carvalho de Melo

On Fri, Mar 04, 2016 at 03:42:46PM +0200, Alexander Shishkin wrote:
> @@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>  	 */
>  	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
>  	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
> +		/*
> +		 * Stop all aux events that are writing to this here buffer,
> +		 * so that we can free its aux pages and corresponding pmu
> +		 * data. Note that after rb::aux_mmap_count dropped to zero,
> +		 * they won't start any more (see perf_aux_output_begin()).
> +		 */
> +		perf_pmu_output_stop(event);

So to me it seems like we're interested in rb, we don't particularly
care about @event in this case.

> +
> +		/* now it's safe to free the pages */
>  		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
>  		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
>  
> +		/* this has to be the last one */
>  		rb_free_aux(rb);
> +		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
> +
>  		mutex_unlock(&event->mmap_mutex);
>  	}
>  

> +static void __perf_event_output_stop(struct perf_event *event, void *data)
> +{
> +	struct perf_event *parent = event->parent;
> +	struct remote_output *ro = data;
> +	struct ring_buffer *rb = ro->rb;
> +
> +	if (!has_aux(event))
> +		return;
> +

	if (!parent)
		parent = event;

> +	if (rcu_dereference(event->rb) == rb)
s/event/parent/

> +		ro->err = __perf_event_stop(event);

> +	else if (parent && rcu_dereference(parent->rb) == rb)
> +		ro->err = __perf_event_stop(event);

and these can go.. However..

> +}
> +
> +static int __perf_pmu_output_stop(void *info)
> +{
> +	struct perf_event *event = info;
> +	struct pmu *pmu = event->pmu;
> +	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
> +	struct remote_output ro = {
> +		.rb	= event->rb,
> +	};
> +
> +	rcu_read_lock();
> +	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
> +	if (cpuctx->task_ctx)
> +		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
> +				   &ro);
> +	rcu_read_unlock();
> +
> +	return ro.err;
> +}
> +
> +static void perf_pmu_output_stop(struct perf_event *event)
> +{
> +	int cpu, err;
> +
> +	/* better be thorough */
> +	get_online_cpus();
> +restart:
> +	for_each_online_cpu(cpu) {
> +		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
> +		if (err)
> +			goto restart;
> +	}
> +	put_online_cpus();
> +}

This seems wildly overkill, could we not iterate rb->event_list like we
do for the normal buffer?

Sure, we need to IPI for each event found, but that seems better than
unconditionally sending IPIs to all CPUs.


* Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
  2016-03-14 12:38   ` Peter Zijlstra
@ 2016-03-14 14:04     ` Alexander Shishkin
  2016-03-14 16:42       ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-14 14:04 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Fri, Mar 04, 2016 at 03:42:46PM +0200, Alexander Shishkin wrote:
>> @@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>>  	 */
>>  	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
>>  	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
>> +		/*
>> +		 * Stop all aux events that are writing to this here buffer,
>> +		 * so that we can free its aux pages and corresponding pmu
>> +		 * data. Note that after rb::aux_mmap_count dropped to zero,
>> +		 * they won't start any more (see perf_aux_output_begin()).
>> +		 */
>> +		perf_pmu_output_stop(event);
>
> So to me it seems like we're interested in rb, we don't particularly
> care about @event in this case.

Yeah, @event is used only for its rb and pmu down this path.

>> +	if (!has_aux(event))
>> +		return;
>> +
>
> 	if (!parent)
> 		parent = event;
>
>> +	if (rcu_dereference(event->rb) == rb)
> s/event/parent/
>
>> +		ro->err = __perf_event_stop(event);
>
>> +	else if (parent && rcu_dereference(parent->rb) == rb)
>> +		ro->err = __perf_event_stop(event);
>
> and these can go.. However..

You're right: it's the parent that's got the ->rb, but it may well be
the child that's actually running and writing data there. So we need to
stop any running children. I think I actually broke this bit since the
previous version; there needs to be a comment about it.

>
>> +}
>> +
>> +static int __perf_pmu_output_stop(void *info)
>> +{
>> +	struct perf_event *event = info;
>> +	struct pmu *pmu = event->pmu;
>> +	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
>> +	struct remote_output ro = {
>> +		.rb	= event->rb,
>> +	};
>> +
>> +	rcu_read_lock();
>> +	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
>> +	if (cpuctx->task_ctx)
>> +		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
>> +				   &ro);
>> +	rcu_read_unlock();
>> +
>> +	return ro.err;
>> +}
>> +
>> +static void perf_pmu_output_stop(struct perf_event *event)
>> +{
>> +	int cpu, err;
>> +
>> +	/* better be thorough */
>> +	get_online_cpus();
>> +restart:
>> +	for_each_online_cpu(cpu) {
>> +		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
>> +		if (err)
>> +			goto restart;
>> +	}
>> +	put_online_cpus();
>> +}
>
> This seems wildly overkill, could we not iterate rb->event_list like we
> do for the normal buffer?

Actually we can. One problem though is that iterating rb::event_list
requires an RCU read section or the irqsafe rb::event_lock, and we need
to send IPIs. The normal buffer case tears down the rb::event_list as it
goes, so it can close the RCU read section right after it fetches one
event from it. In this case, however, we must keep the list intact.

> Sure, we need to IPI for each event found, but that seems better than
> unconditionally sending IPIs to all CPUs.

Actually, won't it "often" be the case that the number of events will be
a multiple of the number of cpus? The usual use case is one event per
task per cpu with inheritance enabled. In this case we'll zap multiple
events per IPI.

Regards,
--
Alex


* Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
  2016-03-14 14:04     ` Alexander Shishkin
@ 2016-03-14 16:42       ` Peter Zijlstra
  2016-03-17 13:05         ` Alexander Shishkin
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2016-03-14 16:42 UTC
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, Arnaldo Carvalho de Melo

On Mon, Mar 14, 2016 at 04:04:44PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:

> >> +static void perf_pmu_output_stop(struct perf_event *event)
> >> +{
> >> +	int cpu, err;
> >> +
> >> +	/* better be thorough */
> >> +	get_online_cpus();
> >> +restart:
> >> +	for_each_online_cpu(cpu) {
> >> +		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
> >> +		if (err)
> >> +			goto restart;
> >> +	}
> >> +	put_online_cpus();
> >> +}
> >
> > This seems wildly overkill, could we not iterate rb->event_list like we
> > do for the normal buffer?
> 
> Actually we can. One problem though is that iterating rb::event_list
> requires an RCU read section or the irqsafe rb::event_lock, and we need
> to send IPIs.

We should be able to send IPIs with rcu_read_lock() held; doing so with
IRQs disabled is a bit harder.

> The normal buffer case tears down the rb::event_list as it goes,
> so it can close the RCU read section right after it fetches one event
> from it. In this case, however, we must keep the list intact.

Yep..

> > Sure, we need to IPI for each event found, but that seems better than
> > unconditionally sending IPIs to all CPUs.
> 
> Actually, won't it "often" be the case that the number of events will be
> a multiple of the number of cpus? The usual use case is one event per
> task per cpu with inheritance enabled. In this case we'll zap multiple
> events per IPI.

Right, but then each event (or set thereof) will be for one particular
CPU. So for the one munmap() you typically only end up sending IPIs to
that one CPU.

If OTOH you send IPIs to all CPUs for all events, you end up with n^2
IPIs, because for each CPU's munmap() you send IPIs to all other CPUs.
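
As a concrete illustration (the numbers are hypothetical): with 4 CPUs
each munmap()ing its own per-CPU AUX buffer, the broadcast scheme sends
4 IPIs for each of the 4 munmap() calls, 16 in total, whereas targeting
only the CPU each event runs on sends 4 in total.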


* Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
  2016-03-14 16:42       ` Peter Zijlstra
@ 2016-03-17 13:05         ` Alexander Shishkin
  2016-03-23  8:34           ` Peter Zijlstra
  2016-03-31  9:24           ` [tip:perf/core] perf/core: Free AUX " tip-bot for Alexander Shishkin
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Shishkin @ 2016-03-17 13:05 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, vince, eranian, Arnaldo Carvalho de Melo

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Mar 14, 2016 at 04:04:44PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>
>> >> +static void perf_pmu_output_stop(struct perf_event *event)
>> >> +{
>> >> +	int cpu, err;
>> >> +
>> >> +	/* better be thorough */
>> >> +	get_online_cpus();
>> >> +restart:
>> >> +	for_each_online_cpu(cpu) {
>> >> +		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
>> >> +		if (err)
>> >> +			goto restart;
>> >> +	}
>> >> +	put_online_cpus();
>> >> +}
>> >
>> > This seems wildly overkill, could we not iterate rb->event_list like we
>> > do for the normal buffer?
>> 
>> Actually we can. One problem though is that iterating rb::event_list
>> requires an RCU read section or the irqsafe rb::event_lock, and we need
>> to send IPIs.
>
> We should be able to send IPIs with rcu_read_lock() held; doing so with
> IRQs disabled is a bit harder.

Ok, so how about this one instead.

From f674bc768b77479478ca4a67251c5f6333993249 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date: Wed, 2 Dec 2015 18:41:11 +0200
Subject: [PATCH v3] perf: Free aux pages in unmap path

Now that we can ensure that no new transactions start once the ring
buffer's aux area is on the way to getting unmapped, we only need to stop
all events that can potentially be writing aux data to our ring buffer.
Having done that, we can safely free the aux pages and corresponding
pmu data, as this time it is guaranteed to be the last aux reference
holder. This partially reverts 57ffc5ca679 ("perf: Fix AUX buffer
refcounting"), which was made to defer deallocation that was otherwise
possible from an NMI context. Now it is no longer the case; the last
call to rb_free_aux() that drops the last AUX reference has to happen
in perf_mmap_close() on that AUX area.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c        | 120 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |   1 -
 kernel/events/ring_buffer.c |  37 ++++----------
 3 files changed, 129 insertions(+), 29 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b7231498de..4bfa0ff2fb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1910,8 +1910,13 @@ event_sched_in(struct perf_event *event,
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
 
-	event->state = PERF_EVENT_STATE_ACTIVE;
-	event->oncpu = smp_processor_id();
+	WRITE_ONCE(event->oncpu, smp_processor_id());
+	/*
+	 * Order event::oncpu write to happen before the ACTIVE state
+	 * is visible.
+	 */
+	smp_wmb();
+	WRITE_ONCE(event->state, PERF_EVENT_STATE_ACTIVE);
 
 	/*
 	 * Unthrottle events, since we scheduled we might have missed several
@@ -2343,6 +2348,29 @@ void perf_event_enable(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_enable);
 
+static int __perf_event_stop(void *info)
+{
+	struct perf_event *event = info;
+
+	/* for aux events, our job is done if the event is already inactive */
+	if (READ_ONCE(event->state) != PERF_EVENT_STATE_ACTIVE)
+		return 0;
+
+	/* matches smp_wmb() in event_sched_in() */
+	smp_rmb();
+
+	/*
+	 * There is a window with interrupts enabled before we get here,
+	 * so we need to check again lest we try to stop another cpu's event.
+	 */
+	if (READ_ONCE(event->oncpu) != smp_processor_id())
+		return -EAGAIN;
+
+	event->pmu->stop(event, PERF_EF_UPDATE);
+
+	return 0;
+}
+
 static int _perf_event_refresh(struct perf_event *event, int refresh)
 {
 	/*
@@ -4622,6 +4650,8 @@ static void perf_mmap_open(struct vm_area_struct *vma)
 		event->pmu->event_mapped(event);
 }
 
+static void perf_pmu_output_stop(struct perf_event *event);
+
 /*
  * A buffer can be mmap()ed multiple times; either directly through the same
  * event, or through other events by use of perf_event_set_output().
@@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	 */
 	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
 	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
+		/*
+		 * Stop all aux events that are writing to this here buffer,
+		 * so that we can free its aux pages and corresponding pmu
+		 * data. Note that after rb::aux_mmap_count dropped to zero,
+		 * they won't start any more (see perf_aux_output_begin()).
+		 */
+		perf_pmu_output_stop(event);
+
+		/* now it's safe to free the pages */
 		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
 		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
 
+		/* this has to be the last one */
 		rb_free_aux(rb);
+		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
+
 		mutex_unlock(&event->mmap_mutex);
 	}
 
@@ -5723,6 +5765,80 @@ next:
 	rcu_read_unlock();
 }
 
+struct remote_output {
+	struct ring_buffer	*rb;
+	int			err;
+};
+
+static void __perf_event_output_stop(struct perf_event *event, void *data)
+{
+	struct perf_event *parent = event->parent;
+	struct remote_output *ro = data;
+	struct ring_buffer *rb = ro->rb;
+
+	if (!has_aux(event))
+		return;
+
+	if (!parent)
+		parent = event;
+
+	/*
+	 * in case of inheritance, it will be the parent that links to the
+	 * rb, but it will be the child that's actually using it
+	 */
+	if (rcu_dereference(parent->rb) == rb)
+		ro->err = __perf_event_stop(event);
+}
+
+static int __perf_pmu_output_stop(void *info)
+{
+	struct perf_event *event = info;
+	struct pmu *pmu = event->pmu;
+	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+	struct remote_output ro = {
+		.rb	= event->rb,
+	};
+
+	rcu_read_lock();
+	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
+	if (cpuctx->task_ctx)
+		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
+				   &ro);
+	rcu_read_unlock();
+
+	return ro.err;
+}
+
+static void perf_pmu_output_stop(struct perf_event *event)
+{
+	struct perf_event *iter;
+	int err, cpu;
+
+restart:
+	rcu_read_lock();
+	list_for_each_entry_rcu(iter, &event->rb->event_list, rb_entry) {
+		/*
+		 * For per-cpu events, we need to make sure that neither they
+		 * nor their children are running; for cpu==-1 events it's
+		 * sufficient to stop the event itself if it's active, since
+		 * it can't have children.
+		 */
+		cpu = iter->cpu;
+		if (cpu == -1)
+			cpu = READ_ONCE(iter->oncpu);
+
+		if (cpu == -1)
+			continue;
+
+		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
+		if (err == -EAGAIN) {
+			rcu_read_unlock();
+			goto restart;
+		}
+	}
+	rcu_read_unlock();
+}
+
 /*
  * task tracking -- fork/exit
  *
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c127..2b229fdcfc 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,7 +11,6 @@
 struct ring_buffer {
 	atomic_t			refcount;
 	struct rcu_head			rcu_head;
-	struct irq_work			irq_work;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index af5fbc7e91..fb7f5bce27 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,8 +221,6 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
-static void rb_irq_work(struct irq_work *work);
-
 static void
 ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 {
@@ -243,16 +241,6 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-	init_irq_work(&rb->irq_work, rb_irq_work);
-}
-
-static void ring_buffer_put_async(struct ring_buffer *rb)
-{
-	if (!atomic_dec_and_test(&rb->refcount))
-		return;
-
-	rb->rcu_head.next = (void *)rb;
-	irq_work_queue(&rb->irq_work);
 }
 
 /*
@@ -292,7 +280,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	 * the aux buffer is in perf_mmap_close(), about to get free'd.
 	 */
 	if (!atomic_read(&rb->aux_mmap_count))
-		goto err;
+		goto err_put;
 
 	/*
 	 * Nesting is not supported for AUX area, make sure nested
@@ -338,7 +326,7 @@ err_put:
 	rb_free_aux(rb);
 
 err:
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 	handle->event = NULL;
 
 	return NULL;
@@ -389,7 +377,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 
 	local_set(&rb->aux_nest, 0);
 	rb_free_aux(rb);
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 }
 
 /*
@@ -470,6 +458,14 @@ static void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
+	/*
+	 * Should never happen, the last reference should be dropped from
+	 * perf_mmap_close() path, which first stops aux transactions (which
+	 * in turn are the atomic holders of aux_refcount) and then does the
+	 * last rb_free_aux().
+	 */
+	WARN_ON_ONCE(in_atomic());
+
 	if (rb->aux_priv) {
 		rb->free_aux(rb->aux_priv);
 		rb->free_aux = NULL;
@@ -581,18 +577,7 @@ out:
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount))
-		irq_work_queue(&rb->irq_work);
-}
-
-static void rb_irq_work(struct irq_work *work)
-{
-	struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
-
-	if (!atomic_read(&rb->aux_refcount))
 		__rb_free_aux(rb);
-
-	if (rb->rcu_head.next == (void *)rb)
-		call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC
-- 
2.7.0


* Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
  2016-03-17 13:05         ` Alexander Shishkin
@ 2016-03-23  8:34           ` Peter Zijlstra
  2016-03-31  9:24           ` [tip:perf/core] perf/core: Free AUX " tip-bot for Alexander Shishkin
  1 sibling, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2016-03-23  8:34 UTC
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, vince, eranian, Arnaldo Carvalho de Melo

On Thu, Mar 17, 2016 at 03:05:42PM +0200, Alexander Shishkin wrote:
> > We should be able to send IPIs with rcu_read_lock() held;

> Ok, so how about this one instead.

Looks good, thanks!


* [tip:perf/core] perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops
  2016-03-04 13:42 ` [PATCH v2 1/5] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
@ 2016-03-31  9:23   ` tip-bot for Alexander Shishkin
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Alexander Shishkin @ 2016-03-31  9:23 UTC
  To: linux-tip-commits
  Cc: mingo, linux-kernel, peterz, vincent.weaver, eranian, hpa, jolsa,
	alexander.shishkin, tglx, acme, acme, torvalds

Commit-ID:  dcb10a967ce82d5ad20570693091139ae716ff76
Gitweb:     http://git.kernel.org/tip/dcb10a967ce82d5ad20570693091139ae716ff76
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Fri, 4 Mar 2016 15:42:45 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Mar 2016 10:30:41 +0200

perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops

When ring buffer's AUX area is unmapped and rb->aux_mmap_count drops to
zero, new AUX transactions into this buffer can still be started,
even though the buffer is en route to deallocation.

This patch adds a check to perf_aux_output_begin() for rb->aux_mmap_count
being zero, in which case there is no point in starting new transactions.
In other words, ring buffers that pass a certain point in
perf_mmap_close() will not have their events sending new data, which
clears the path for freeing those buffers' pages right there and then,
provided that no active transactions are holding the AUX reference.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1457098969-21595-2-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/ring_buffer.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index c61f0cb..89abf62 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -288,6 +288,13 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 		goto err;
 
 	/*
+	 * If rb::aux_mmap_count is zero (and rb_has_aux() above went through),
+	 * the aux buffer is in perf_mmap_close(), about to get freed.
+	 */
+	if (!atomic_read(&rb->aux_mmap_count))
+		goto err;
+
+	/*
 	 * Nesting is not supported for AUX area, make sure nested
 	 * writers are caught early
 	 */


* [tip:perf/core] perf/core: Free AUX pages in unmap path
  2016-03-17 13:05         ` Alexander Shishkin
  2016-03-23  8:34           ` Peter Zijlstra
@ 2016-03-31  9:24           ` tip-bot for Alexander Shishkin
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot for Alexander Shishkin @ 2016-03-31  9:24 UTC
  To: linux-tip-commits
  Cc: peterz, acme, torvalds, tglx, linux-kernel, mingo, eranian, hpa,
	acme, vincent.weaver, alexander.shishkin, jolsa

Commit-ID:  95ff4ca26c492fc1ed7751f5dd7ab7674b54f4e0
Gitweb:     http://git.kernel.org/tip/95ff4ca26c492fc1ed7751f5dd7ab7674b54f4e0
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Wed, 2 Dec 2015 18:41:11 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Mar 2016 10:30:42 +0200

perf/core: Free AUX pages in unmap path

Now that we can ensure that no new transactions start once the ring
buffer's AUX area is on the way to getting unmapped, we only need to stop
all events that can potentially be writing aux data to our ring buffer.

Having done that, we can safely free the AUX pages and corresponding
PMU data, as this time it is guaranteed to be the last aux reference
holder.

This partially reverts:

  57ffc5ca679 ("perf: Fix AUX buffer refcounting")

... which was made to defer deallocation that was otherwise possible
from an NMI context. Now it is no longer the case; the last call to
rb_free_aux() that drops the last AUX reference has to happen in
perf_mmap_close() on that AUX area.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/87d1qtz23d.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c        | 120 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |   1 -
 kernel/events/ring_buffer.c |  37 ++++----------
 3 files changed, 129 insertions(+), 29 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 525d11c..243df4b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1925,8 +1925,13 @@ event_sched_in(struct perf_event *event,
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
 
-	event->state = PERF_EVENT_STATE_ACTIVE;
-	event->oncpu = smp_processor_id();
+	WRITE_ONCE(event->oncpu, smp_processor_id());
+	/*
+	 * Order event::oncpu write to happen before the ACTIVE state
+	 * is visible.
+	 */
+	smp_wmb();
+	WRITE_ONCE(event->state, PERF_EVENT_STATE_ACTIVE);
 
 	/*
 	 * Unthrottle events, since we scheduled we might have missed several
@@ -2358,6 +2363,29 @@ void perf_event_enable(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_enable);
 
+static int __perf_event_stop(void *info)
+{
+	struct perf_event *event = info;
+
+	/* for AUX events, our job is done if the event is already inactive */
+	if (READ_ONCE(event->state) != PERF_EVENT_STATE_ACTIVE)
+		return 0;
+
+	/* matches smp_wmb() in event_sched_in() */
+	smp_rmb();
+
+	/*
+	 * There is a window with interrupts enabled before we get here,
+	 * so we need to check again lest we try to stop another CPU's event.
+	 */
+	if (READ_ONCE(event->oncpu) != smp_processor_id())
+		return -EAGAIN;
+
+	event->pmu->stop(event, PERF_EF_UPDATE);
+
+	return 0;
+}
+
 static int _perf_event_refresh(struct perf_event *event, int refresh)
 {
 	/*
@@ -4667,6 +4695,8 @@ static void perf_mmap_open(struct vm_area_struct *vma)
 		event->pmu->event_mapped(event);
 }
 
+static void perf_pmu_output_stop(struct perf_event *event);
+
 /*
  * A buffer can be mmap()ed multiple times; either directly through the same
  * event, or through other events by use of perf_event_set_output().
@@ -4694,10 +4724,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	 */
 	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
 	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
+		/*
+		 * Stop all AUX events that are writing to this buffer,
+		 * so that we can free its AUX pages and corresponding PMU
+		 * data. Note that after rb::aux_mmap_count dropped to zero,
+		 * they won't start any more (see perf_aux_output_begin()).
+		 */
+		perf_pmu_output_stop(event);
+
+		/* now it's safe to free the pages */
 		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
 		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
 
+		/* this has to be the last one */
 		rb_free_aux(rb);
+		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
+
 		mutex_unlock(&event->mmap_mutex);
 	}
 
@@ -5768,6 +5810,80 @@ next:
 	rcu_read_unlock();
 }
 
+struct remote_output {
+	struct ring_buffer	*rb;
+	int			err;
+};
+
+static void __perf_event_output_stop(struct perf_event *event, void *data)
+{
+	struct perf_event *parent = event->parent;
+	struct remote_output *ro = data;
+	struct ring_buffer *rb = ro->rb;
+
+	if (!has_aux(event))
+		return;
+
+	if (!parent)
+		parent = event;
+
+	/*
+	 * In case of inheritance, it will be the parent that links to the
+	 * ring-buffer, but it will be the child that's actually using it:
+	 */
+	if (rcu_dereference(parent->rb) == rb)
+		ro->err = __perf_event_stop(event);
+}
+
+static int __perf_pmu_output_stop(void *info)
+{
+	struct perf_event *event = info;
+	struct pmu *pmu = event->pmu;
+	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+	struct remote_output ro = {
+		.rb	= event->rb,
+	};
+
+	rcu_read_lock();
+	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
+	if (cpuctx->task_ctx)
+		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
+				   &ro);
+	rcu_read_unlock();
+
+	return ro.err;
+}
+
+static void perf_pmu_output_stop(struct perf_event *event)
+{
+	struct perf_event *iter;
+	int err, cpu;
+
+restart:
+	rcu_read_lock();
+	list_for_each_entry_rcu(iter, &event->rb->event_list, rb_entry) {
+		/*
+		 * For per-CPU events, we need to make sure that neither they
+		 * nor their children are running; for cpu==-1 events it's
+		 * sufficient to stop the event itself if it's active, since
+		 * it can't have children.
+		 */
+		cpu = iter->cpu;
+		if (cpu == -1)
+			cpu = READ_ONCE(iter->oncpu);
+
+		if (cpu == -1)
+			continue;
+
+		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
+		if (err == -EAGAIN) {
+			rcu_read_unlock();
+			goto restart;
+		}
+	}
+	rcu_read_unlock();
+}
+
 /*
  * task tracking -- fork/exit
  *
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c..2b229fd 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,7 +11,6 @@
 struct ring_buffer {
 	atomic_t			refcount;
 	struct rcu_head			rcu_head;
-	struct irq_work			irq_work;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 89abf62..367e9c5 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,8 +221,6 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
-static void rb_irq_work(struct irq_work *work);
-
 static void
 ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 {
@@ -243,16 +241,6 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-	init_irq_work(&rb->irq_work, rb_irq_work);
-}
-
-static void ring_buffer_put_async(struct ring_buffer *rb)
-{
-	if (!atomic_dec_and_test(&rb->refcount))
-		return;
-
-	rb->rcu_head.next = (void *)rb;
-	irq_work_queue(&rb->irq_work);
 }
 
 /*
@@ -292,7 +280,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	 * the aux buffer is in perf_mmap_close(), about to get freed.
 	 */
 	if (!atomic_read(&rb->aux_mmap_count))
-		goto err;
+		goto err_put;
 
 	/*
 	 * Nesting is not supported for AUX area, make sure nested
@@ -338,7 +326,7 @@ err_put:
 	rb_free_aux(rb);
 
 err:
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 	handle->event = NULL;
 
 	return NULL;
@@ -389,7 +377,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 
 	local_set(&rb->aux_nest, 0);
 	rb_free_aux(rb);
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 }
 
 /*
@@ -470,6 +458,14 @@ static void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
+	/*
+	 * This should never happen: the last reference is dropped from the
+	 * perf_mmap_close() path, which first stops the AUX transactions
+	 * (which, in turn, are the atomic holders of aux_refcount) and then
+	 * does the last rb_free_aux().
+	 */
+	WARN_ON_ONCE(in_atomic());
+
 	if (rb->aux_priv) {
 		rb->free_aux(rb->aux_priv);
 		rb->free_aux = NULL;
@@ -581,18 +577,7 @@ out:
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount))
-		irq_work_queue(&rb->irq_work);
-}
-
-static void rb_irq_work(struct irq_work *work)
-{
-	struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
-
-	if (!atomic_read(&rb->aux_refcount))
 		__rb_free_aux(rb);
-
-	if (rb->rcu_head.next == (void *)rb)
-		call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC
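
To see the resulting ordering in isolation, here is a stand-alone
user-space sketch of the scheme this patch establishes (a model only,
not kernel code: aux_buf, writer_begin()/writer_end() and unmap() are
illustrative stand-ins for the ring buffer, perf_aux_output_{begin,end}()
and the perf_mmap_close() path):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct aux_buf {
	atomic_int mmap_count;	/* models rb->aux_mmap_count */
	atomic_int refcount;	/* models rb->aux_refcount */
	void *pages;
};

/* models perf_aux_output_begin(): refuse once the unmap has started */
static bool writer_begin(struct aux_buf *b)
{
	atomic_fetch_add(&b->refcount, 1);
	if (atomic_load(&b->mmap_count) == 0) {
		atomic_fetch_sub(&b->refcount, 1);	/* can't be last */
		return false;
	}
	return true;
}

/* models perf_aux_output_end() dropping the transaction's reference */
static void writer_end(struct aux_buf *b)
{
	atomic_fetch_sub(&b->refcount, 1);
}

/* models the perf_mmap_close() path after this patch */
static void unmap(struct aux_buf *b)
{
	if (atomic_fetch_sub(&b->mmap_count, 1) != 1)
		return;		/* not the last mapping */

	/*
	 * Here the kernel runs perf_pmu_output_stop(): every event
	 * writing to this buffer is stopped, so from now on
	 * writer_begin() can only fail and the reference dropped
	 * below is guaranteed to be the last one.
	 */
	if (atomic_fetch_sub(&b->refcount, 1) == 1) {
		free(b->pages);		/* the "last rb_free_aux()" */
		b->pages = NULL;
	}
}

int main(void)
{
	struct aux_buf b = { 1, 1, malloc(64) };

	if (writer_begin(&b))	/* succeeds: buffer still mapped */
		writer_end(&b);

	unmap(&b);		/* last unmap: stop writers, free pages */

	return writer_begin(&b) ? 1 : 0;	/* now refused, as intended */
}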

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip:perf/core] perf/ring_buffer: Document AUX API usage
  2016-03-04 13:42 ` [PATCH v2 3/5] perf: Document aux api usage Alexander Shishkin
@ 2016-03-31  9:24   ` tip-bot for Alexander Shishkin
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Alexander Shishkin @ 2016-03-31  9:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, peterz, linux-kernel, tglx, eranian, alexander.shishkin,
	hpa, acme, mingo, torvalds, vincent.weaver, mathieu.poirier,
	acme

Commit-ID:  af5bb4ed1254a378b6028c09e58bdcc1cd9bf5b3
Gitweb:     http://git.kernel.org/tip/af5bb4ed1254a378b6028c09e58bdcc1cd9bf5b3
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Fri, 4 Mar 2016 15:42:47 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Mar 2016 10:30:43 +0200

perf/ring_buffer: Document AUX API usage

In order to ensure safe AUX buffer management, we rely on the assumption
that pmu::stop() stops its ongoing AUX transaction and not just the hardware.

This patch documents this requirement for the perf_aux_output_{begin,end}()
APIs.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1457098969-21595-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/ring_buffer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 367e9c5..0ed4555 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -252,6 +252,10 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
  * The ordering is similar to that of perf_output_{begin,end}, with
  * the exception of (B), which should be taken care of by the pmu
  * driver, since ordering rules will differ depending on hardware.
+ *
+ * Call this from pmu::start(); see the comment in perf_aux_output_end()
+ * about its use in pmu callbacks. Both can also be called from the PMI
+ * handler if needed.
  */
 void *perf_aux_output_begin(struct perf_output_handle *handle,
 			    struct perf_event *event)
@@ -323,6 +327,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	return handle->rb->aux_priv;
 
 err_put:
+	/* can't be last */
 	rb_free_aux(rb);
 
 err:
@@ -337,6 +342,10 @@ err:
  * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
  * pmu driver's responsibility to observe ordering rules of the hardware,
  * so that all the data is externally visible before this is called.
+ *
+ * Note: this has to be called from the pmu::stop() callback, as the
+ * assumption of the AUX buffer management code is that after pmu::stop(),
+ * the AUX transaction has been stopped and its AUX reference dropped.
  */
 void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 			 bool truncated)
@@ -376,6 +385,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 	handle->event = NULL;
 
 	local_set(&rb->aux_nest, 0);
+	/* can't be last */
 	rb_free_aux(rb);
 	ring_buffer_put(rb);
 }
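
For a PMU driver, the contract documented here looks roughly like the
sketch below (my_pmu_ctx and the drv_* helpers are hypothetical
placeholders, not real APIs; only perf_aux_output_{begin,end}(), the
event->hw.state handling and the PERF_EF_UPDATE flag are the real
interface):

struct my_pmu_ctx {
	struct perf_output_handle handle;
	/* hypothetical per-cpu hardware state would live here */
};

static DEFINE_PER_CPU(struct my_pmu_ctx, my_pmu_ctx);

/* hypothetical driver helpers, declared here only for the sketch */
static void drv_hw_start(struct my_pmu_ctx *ctx, void *buf);
static void drv_hw_stop(struct my_pmu_ctx *ctx);
static unsigned long drv_data_size(struct my_pmu_ctx *ctx);
static bool drv_data_truncated(struct my_pmu_ctx *ctx);

static void my_pmu_start(struct perf_event *event, int flags)
{
	struct my_pmu_ctx *ctx = this_cpu_ptr(&my_pmu_ctx);
	void *buf;

	/* takes the AUX reference; fails if the buffer is on its way out */
	buf = perf_aux_output_begin(&ctx->handle, event);
	if (!buf) {
		event->hw.state = PERF_HES_STOPPED;
		return;
	}

	drv_hw_start(ctx, buf);		/* hypothetical: program the hardware */
	event->hw.state = 0;
}

static void my_pmu_stop(struct perf_event *event, int flags)
{
	struct my_pmu_ctx *ctx = this_cpu_ptr(&my_pmu_ctx);

	drv_hw_stop(ctx);		/* hypothetical: quiesce the hardware */
	event->hw.state = PERF_HES_STOPPED;

	if (flags & PERF_EF_UPDATE) {
		/*
		 * Mandatory: end the transaction here, in pmu::stop(),
		 * so that the AUX reference is dropped by the time
		 * perf_mmap_close() gets to freeing the buffer.
		 */
		perf_aux_output_end(&ctx->handle, drv_data_size(ctx),
				    drv_data_truncated(ctx));
	}
}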

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip:perf/core] perf/x86/intel/pt: Move transaction start/stop to PMU start/stop callbacks
  2016-03-04 13:42 ` [PATCH v2 4/5] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
@ 2016-03-31  9:24   ` tip-bot for Alexander Shishkin
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Alexander Shishkin @ 2016-03-31  9:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, jolsa, eranian, alexander.shishkin, hpa, linux-kernel,
	peterz, acme, tglx, mathieu.poirier, acme, vincent.weaver, mingo

Commit-ID:  66d219014a4ee47ad4ca2b9db5fe6547353e2a56
Gitweb:     http://git.kernel.org/tip/66d219014a4ee47ad4ca2b9db5fe6547353e2a56
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Fri, 4 Mar 2016 15:42:48 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Mar 2016 10:30:43 +0200

perf/x86/intel/pt: Move transaction start/stop to PMU start/stop callbacks

As per the AUX buffer management requirements, AUX output has to happen
between the pmu::start and pmu::stop calls, so that perf_event_stop()
actually stops the output and perf can therefore free the AUX data after
it has called pmu::stop.

This patch moves perf_aux_output_{begin,end} from pt_event_{add,del} to
pt_event_{start,stop}. As a bonus, we get rid of pt_buffer_is_full(),
whose check perf_aux_output_begin() already takes care of anyway.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1457098969-21595-5-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/intel/pt.c | 85 +++++++++++++++-------------------------------
 1 file changed, 27 insertions(+), 58 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 6af7cf7..127f58c 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -906,26 +906,6 @@ static void pt_buffer_free_aux(void *data)
 }
 
 /**
- * pt_buffer_is_full() - check if the buffer is full
- * @buf:	PT buffer.
- * @pt:		Per-cpu pt handle.
- *
- * If the user hasn't read data from the output region that aux_head
- * points to, the buffer is considered full: the user needs to read at
- * least this region and update aux_tail to point past it.
- */
-static bool pt_buffer_is_full(struct pt_buffer *buf, struct pt *pt)
-{
-	if (buf->snapshot)
-		return false;
-
-	if (local_read(&buf->data_size) >= pt->handle.size)
-		return true;
-
-	return false;
-}
-
-/**
  * intel_pt_interrupt() - PT PMI handler
  */
 void intel_pt_interrupt(void)
@@ -989,20 +969,33 @@ void intel_pt_interrupt(void)
 
 static void pt_event_start(struct perf_event *event, int mode)
 {
+	struct hw_perf_event *hwc = &event->hw;
 	struct pt *pt = this_cpu_ptr(&pt_ctx);
-	struct pt_buffer *buf = perf_get_aux(&pt->handle);
+	struct pt_buffer *buf;
 
-	if (!buf || pt_buffer_is_full(buf, pt)) {
-		event->hw.state = PERF_HES_STOPPED;
-		return;
+	buf = perf_aux_output_begin(&pt->handle, event);
+	if (!buf)
+		goto fail_stop;
+
+	pt_buffer_reset_offsets(buf, pt->handle.head);
+	if (!buf->snapshot) {
+		if (pt_buffer_reset_markers(buf, &pt->handle))
+			goto fail_end_stop;
 	}
 
 	ACCESS_ONCE(pt->handle_nmi) = 1;
-	event->hw.state = 0;
+	hwc->state = 0;
 
 	pt_config_buffer(buf->cur->table, buf->cur_idx,
 			 buf->output_off);
 	pt_config(event);
+
+	return;
+
+fail_end_stop:
+	perf_aux_output_end(&pt->handle, 0, true);
+fail_stop:
+	hwc->state = PERF_HES_STOPPED;
 }
 
 static void pt_event_stop(struct perf_event *event, int mode)
@@ -1035,19 +1028,7 @@ static void pt_event_stop(struct perf_event *event, int mode)
 		pt_handle_status(pt);
 
 		pt_update_head(pt);
-	}
-}
 
-static void pt_event_del(struct perf_event *event, int mode)
-{
-	struct pt *pt = this_cpu_ptr(&pt_ctx);
-	struct pt_buffer *buf;
-
-	pt_event_stop(event, PERF_EF_UPDATE);
-
-	buf = perf_get_aux(&pt->handle);
-
-	if (buf) {
 		if (buf->snapshot)
 			pt->handle.head =
 				local_xchg(&buf->data_size,
@@ -1057,9 +1038,13 @@ static void pt_event_del(struct perf_event *event, int mode)
 	}
 }
 
+static void pt_event_del(struct perf_event *event, int mode)
+{
+	pt_event_stop(event, PERF_EF_UPDATE);
+}
+
 static int pt_event_add(struct perf_event *event, int mode)
 {
-	struct pt_buffer *buf;
 	struct pt *pt = this_cpu_ptr(&pt_ctx);
 	struct hw_perf_event *hwc = &event->hw;
 	int ret = -EBUSY;
@@ -1067,34 +1052,18 @@ static int pt_event_add(struct perf_event *event, int mode)
 	if (pt->handle.event)
 		goto fail;
 
-	buf = perf_aux_output_begin(&pt->handle, event);
-	ret = -EINVAL;
-	if (!buf)
-		goto fail_stop;
-
-	pt_buffer_reset_offsets(buf, pt->handle.head);
-	if (!buf->snapshot) {
-		ret = pt_buffer_reset_markers(buf, &pt->handle);
-		if (ret)
-			goto fail_end_stop;
-	}
-
 	if (mode & PERF_EF_START) {
 		pt_event_start(event, 0);
-		ret = -EBUSY;
+		ret = -EINVAL;
 		if (hwc->state == PERF_HES_STOPPED)
-			goto fail_end_stop;
+			goto fail;
 	} else {
 		hwc->state = PERF_HES_STOPPED;
 	}
 
-	return 0;
-
-fail_end_stop:
-	perf_aux_output_end(&pt->handle, 0, true);
-fail_stop:
-	hwc->state = PERF_HES_STOPPED;
+	ret = 0;
 fail:
+
 	return ret;
 }
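
With the transaction start/stop moved into the pmu::start/stop callbacks,
pmu::add() and pmu::del() reduce to roughly the shape below (continuing
the hypothetical my_pmu_* sketch from the AUX API documentation patch
above; the real code is in the hunks of this patch):

static int my_pmu_add(struct perf_event *event, int mode)
{
	struct my_pmu_ctx *ctx = this_cpu_ptr(&my_pmu_ctx);

	if (ctx->handle.event)		/* hardware already claimed */
		return -EBUSY;

	if (mode & PERF_EF_START) {
		my_pmu_start(event, 0);
		/* start failed to take the AUX reference */
		if (event->hw.state == PERF_HES_STOPPED)
			return -EINVAL;
	} else {
		event->hw.state = PERF_HES_STOPPED;
	}

	return 0;
}

static void my_pmu_del(struct perf_event *event, int mode)
{
	/* stopping with PERF_EF_UPDATE also ends the AUX transaction */
	my_pmu_stop(event, PERF_EF_UPDATE);
}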
 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [tip:perf/core] perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks
  2016-03-04 13:42 ` [PATCH v2 5/5] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
@ 2016-03-31  9:25   ` tip-bot for Alexander Shishkin
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Alexander Shishkin @ 2016-03-31  9:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mathieu.poirier, hpa, vincent.weaver, linux-kernel, jolsa, mingo,
	eranian, torvalds, tglx, alexander.shishkin, acme, acme, peterz

Commit-ID:  981a4cb380d3dff7010ce9f89618064a254eab8c
Gitweb:     http://git.kernel.org/tip/981a4cb380d3dff7010ce9f89618064a254eab8c
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Fri, 4 Mar 2016 15:42:49 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Mar 2016 10:30:44 +0200

perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks

As per the AUX buffer management requirements, AUX output has to happen
between the pmu::start and pmu::stop calls, so that perf_event_stop()
actually stops the output and perf can therefore free the AUX data after
it has called pmu::stop.

This patch moves perf_aux_output_{begin,end} from bts_event_{add,del} to
bts_event_{start,stop}. As a bonus, we get rid of bts_buffer_is_full(),
whose check perf_aux_output_begin() already takes care of anyway.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/1457098969-21595-6-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/intel/bts.c | 105 ++++++++++++++++++++------------------------
 1 file changed, 48 insertions(+), 57 deletions(-)

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index b99dc92..0a6e393 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -171,18 +171,6 @@ static void bts_buffer_pad_out(struct bts_phys *phys, unsigned long head)
 	memset(page_address(phys->page) + index, 0, phys->size - index);
 }
 
-static bool bts_buffer_is_full(struct bts_buffer *buf, struct bts_ctx *bts)
-{
-	if (buf->snapshot)
-		return false;
-
-	if (local_read(&buf->data_size) >= bts->handle.size ||
-	    bts->handle.size - local_read(&buf->data_size) < BTS_RECORD_SIZE)
-		return true;
-
-	return false;
-}
-
 static void bts_update(struct bts_ctx *bts)
 {
 	int cpu = raw_smp_processor_id();
@@ -213,18 +201,15 @@ static void bts_update(struct bts_ctx *bts)
 	}
 }
 
+static int
+bts_buffer_reset(struct bts_buffer *buf, struct perf_output_handle *handle);
+
 static void __bts_event_start(struct perf_event *event)
 {
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
 	struct bts_buffer *buf = perf_get_aux(&bts->handle);
 	u64 config = 0;
 
-	if (!buf || bts_buffer_is_full(buf, bts))
-		return;
-
-	event->hw.itrace_started = 1;
-	event->hw.state = 0;
-
 	if (!buf->snapshot)
 		config |= ARCH_PERFMON_EVENTSEL_INT;
 	if (!event->attr.exclude_kernel)
@@ -241,16 +226,41 @@ static void __bts_event_start(struct perf_event *event)
 	wmb();
 
 	intel_pmu_enable_bts(config);
+
 }
 
 static void bts_event_start(struct perf_event *event, int flags)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
+	struct bts_buffer *buf;
+
+	buf = perf_aux_output_begin(&bts->handle, event);
+	if (!buf)
+		goto fail_stop;
+
+	if (bts_buffer_reset(buf, &bts->handle))
+		goto fail_end_stop;
+
+	bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
+	bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
+	bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
+
+	event->hw.itrace_started = 1;
+	event->hw.state = 0;
 
 	__bts_event_start(event);
 
 	/* PMI handler: this counter is running and likely generating PMIs */
 	ACCESS_ONCE(bts->started) = 1;
+
+	return;
+
+fail_end_stop:
+	perf_aux_output_end(&bts->handle, 0, false);
+
+fail_stop:
+	event->hw.state = PERF_HES_STOPPED;
 }
 
 static void __bts_event_stop(struct perf_event *event)
@@ -269,15 +279,32 @@ static void __bts_event_stop(struct perf_event *event)
 
 static void bts_event_stop(struct perf_event *event, int flags)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
+	struct bts_buffer *buf = perf_get_aux(&bts->handle);
 
 	/* PMI handler: don't restart this counter */
 	ACCESS_ONCE(bts->started) = 0;
 
 	__bts_event_stop(event);
 
-	if (flags & PERF_EF_UPDATE)
+	if (flags & PERF_EF_UPDATE) {
 		bts_update(bts);
+
+		if (buf) {
+			if (buf->snapshot)
+				bts->handle.head =
+					local_xchg(&buf->data_size,
+						   buf->nr_pages << PAGE_SHIFT);
+			perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
+					    !!local_xchg(&buf->lost, 0));
+		}
+
+		cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
+		cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
+		cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
+		cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
+	}
 }
 
 void intel_bts_enable_local(void)
@@ -417,34 +444,14 @@ int intel_bts_interrupt(void)
 
 static void bts_event_del(struct perf_event *event, int mode)
 {
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
-	struct bts_buffer *buf = perf_get_aux(&bts->handle);
-
 	bts_event_stop(event, PERF_EF_UPDATE);
-
-	if (buf) {
-		if (buf->snapshot)
-			bts->handle.head =
-				local_xchg(&buf->data_size,
-					   buf->nr_pages << PAGE_SHIFT);
-		perf_aux_output_end(&bts->handle, local_xchg(&buf->data_size, 0),
-				    !!local_xchg(&buf->lost, 0));
-	}
-
-	cpuc->ds->bts_index = bts->ds_back.bts_buffer_base;
-	cpuc->ds->bts_buffer_base = bts->ds_back.bts_buffer_base;
-	cpuc->ds->bts_absolute_maximum = bts->ds_back.bts_absolute_maximum;
-	cpuc->ds->bts_interrupt_threshold = bts->ds_back.bts_interrupt_threshold;
 }
 
 static int bts_event_add(struct perf_event *event, int mode)
 {
-	struct bts_buffer *buf;
 	struct bts_ctx *bts = this_cpu_ptr(&bts_ctx);
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
-	int ret = -EBUSY;
 
 	event->hw.state = PERF_HES_STOPPED;
 
@@ -454,26 +461,10 @@ static int bts_event_add(struct perf_event *event, int mode)
 	if (bts->handle.event)
 		return -EBUSY;
 
-	buf = perf_aux_output_begin(&bts->handle, event);
-	if (!buf)
-		return -EINVAL;
-
-	ret = bts_buffer_reset(buf, &bts->handle);
-	if (ret) {
-		perf_aux_output_end(&bts->handle, 0, false);
-		return ret;
-	}
-
-	bts->ds_back.bts_buffer_base = cpuc->ds->bts_buffer_base;
-	bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
-	bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
-
 	if (mode & PERF_EF_START) {
 		bts_event_start(event, 0);
-		if (hwc->state & PERF_HES_STOPPED) {
-			bts_event_del(event, 0);
-			return -EBUSY;
-		}
+		if (hwc->state & PERF_HES_STOPPED)
+			return -EINVAL;
 	}
 
 	return 0;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-03-31  9:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-04 13:42 [PATCH v2 0/5] perf: Untangle aux refcounting Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 1/5] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
2016-03-31  9:23   ` [tip:perf/core] perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 2/5] perf: Free aux pages in unmap path Alexander Shishkin
2016-03-14 12:38   ` Peter Zijlstra
2016-03-14 14:04     ` Alexander Shishkin
2016-03-14 16:42       ` Peter Zijlstra
2016-03-17 13:05         ` Alexander Shishkin
2016-03-23  8:34           ` Peter Zijlstra
2016-03-31  9:24           ` [tip:perf/core] perf/core: Free AUX " tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 3/5] perf: Document aux api usage Alexander Shishkin
2016-03-31  9:24   ` [tip:perf/core] perf/ring_buffer: Document AUX API usage tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 4/5] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
2016-03-31  9:24   ` [tip:perf/core] perf/x86/intel/pt: Move transaction start/stop to PMU " tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 5/5] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
2016-03-31  9:25   ` [tip:perf/core] " tip-bot for Alexander Shishkin
