linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, vince@deater.net,
	eranian@google.com, Arnaldo Carvalho de Melo <acme@infradead.org>
Subject: Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
Date: Thu, 17 Mar 2016 15:05:42 +0200	[thread overview]
Message-ID: <87d1qtz23d.fsf@ashishki-desk.ger.corp.intel.com> (raw)
In-Reply-To: <20160314164218.GP6344@twins.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Mar 14, 2016 at 04:04:44PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>
>> >> +static void perf_pmu_output_stop(struct perf_event *event)
>> >> +{
>> >> +	int cpu, err;
>> >> +
>> >> +	/* better be thorough */
>> >> +	get_online_cpus();
>> >> +restart:
>> >> +	for_each_online_cpu(cpu) {
>> >> +		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
>> >> +		if (err)
>> >> +			goto restart;
>> >> +	}
>> >> +	put_online_cpus();
>> >> +}
>> >
>> > This seems wildly overkill, could we not iterate rb->event_list like we
>> > do for the normal buffer?
>> 
>> Actually we can. One problem though is that iterating rb::event_list
>> requires rcu read section or irqsafe rb::event_lock and we need to send
>> IPIs. 
>
> We should be able to send IPIs with rcu_read_lock() held; doing so with
> IRQs disabled is a bit harder.

Ok, so how about this one instead.

>From f674bc768b77479478ca4a67251c5f6333993249 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date: Wed, 2 Dec 2015 18:41:11 +0200
Subject: [PATCH v3] perf: Free aux pages in unmap path

Now that we can ensure that when ring buffer's aux area is on the way
to getting unmapped new transactions won't start, we only need to stop
all events that can potentially be writing aux data to our ring buffer.
Having done that, we can safely free the aux pages and corresponding
pmu data, as this time it is guaranteed to be the last aux reference
holder. This partially reverts 57ffc5ca679 ("perf: Fix AUX buffer
refcounting"), which was made to defer deallocation that was otherwise
possible from an NMI context. Now it is no longer the case; the last
call to rb_free_aux() that drops the last AUX reference has to happen
in perf_mmap_close() on that AUX area.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c        | 120 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/events/internal.h    |   1 -
 kernel/events/ring_buffer.c |  37 ++++----------
 3 files changed, 129 insertions(+), 29 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b7231498de..4bfa0ff2fb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1910,8 +1910,13 @@ event_sched_in(struct perf_event *event,
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
 
-	event->state = PERF_EVENT_STATE_ACTIVE;
-	event->oncpu = smp_processor_id();
+	WRITE_ONCE(event->oncpu, smp_processor_id());
+	/*
+	 * Order event::oncpu write to happen before the ACTIVE state
+	 * is visible.
+	 */
+	smp_wmb();
+	WRITE_ONCE(event->state, PERF_EVENT_STATE_ACTIVE);
 
 	/*
 	 * Unthrottle events, since we scheduled we might have missed several
@@ -2343,6 +2348,29 @@ void perf_event_enable(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_enable);
 
+static int __perf_event_stop(void *info)
+{
+	struct perf_event *event = info;
+
+	/* for aux events, our job is done if the event is already inactive */
+	if (READ_ONCE(event->state) != PERF_EVENT_STATE_ACTIVE)
+		return 0;
+
+	/* matches smp_wmb() in event_sched_in() */
+	smp_rmb();
+
+	/*
+	 * There is a window with interrupts enabled before we get here,
+	 * so we need to check again lest we try to stop another cpu's event.
+	 */
+	if (READ_ONCE(event->oncpu) != smp_processor_id())
+		return -EAGAIN;
+
+	event->pmu->stop(event, PERF_EF_UPDATE);
+
+	return 0;
+}
+
 static int _perf_event_refresh(struct perf_event *event, int refresh)
 {
 	/*
@@ -4622,6 +4650,8 @@ static void perf_mmap_open(struct vm_area_struct *vma)
 		event->pmu->event_mapped(event);
 }
 
+static void perf_pmu_output_stop(struct perf_event *event);
+
 /*
  * A buffer can be mmap()ed multiple times; either directly through the same
  * event, or through other events by use of perf_event_set_output().
@@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	 */
 	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
 	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
+		/*
+		 * Stop all aux events that are writing to this here buffer,
+		 * so that we can free its aux pages and corresponding pmu
+		 * data. Note that after rb::aux_mmap_count dropped to zero,
+		 * they won't start any more (see perf_aux_output_begin()).
+		 */
+		perf_pmu_output_stop(event);
+
+		/* now it's safe to free the pages */
 		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
 		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
 
+		/* this has to be the last one */
 		rb_free_aux(rb);
+		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
+
 		mutex_unlock(&event->mmap_mutex);
 	}
 
@@ -5723,6 +5765,80 @@ next:
 	rcu_read_unlock();
 }
 
+struct remote_output {
+	struct ring_buffer	*rb;
+	int			err;
+};
+
+static void __perf_event_output_stop(struct perf_event *event, void *data)
+{
+	struct perf_event *parent = event->parent;
+	struct remote_output *ro = data;
+	struct ring_buffer *rb = ro->rb;
+
+	if (!has_aux(event))
+		return;
+
+	if (!parent)
+		parent = event;
+
+	/*
+	 * in case of inheritance, it will be the parent that links to the
+	 * rb, but it will be the child that's actually using it
+	 */
+	if (rcu_dereference(parent->rb) == rb)
+		ro->err = __perf_event_stop(event);
+}
+
+static int __perf_pmu_output_stop(void *info)
+{
+	struct perf_event *event = info;
+	struct pmu *pmu = event->pmu;
+	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+	struct remote_output ro = {
+		.rb	= event->rb,
+	};
+
+	rcu_read_lock();
+	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
+	if (cpuctx->task_ctx)
+		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
+				   &ro);
+	rcu_read_unlock();
+
+	return ro.err;
+}
+
+static void perf_pmu_output_stop(struct perf_event *event)
+{
+	struct perf_event *iter;
+	int err, cpu;
+
+restart:
+	rcu_read_lock();
+	list_for_each_entry_rcu(iter, &event->rb->event_list, rb_entry) {
+		/*
+		 * For per-cpu events, we need to make sure that neither they
+		 * nor their children are running; for cpu==-1 events it's
+		 * sufficient to stop the event itself if it's active, since
+		 * it can't have children.
+		 */
+		cpu = iter->cpu;
+		if (cpu == -1)
+			cpu = READ_ONCE(iter->oncpu);
+
+		if (cpu == -1)
+			continue;
+
+		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
+		if (err == -EAGAIN) {
+			rcu_read_unlock();
+			goto restart;
+		}
+	}
+	rcu_read_unlock();
+}
+
 /*
  * task tracking -- fork/exit
  *
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2bbad9c127..2b229fdcfc 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,7 +11,6 @@
 struct ring_buffer {
 	atomic_t			refcount;
 	struct rcu_head			rcu_head;
-	struct irq_work			irq_work;
 #ifdef CONFIG_PERF_USE_VMALLOC
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index af5fbc7e91..fb7f5bce27 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,8 +221,6 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
-static void rb_irq_work(struct irq_work *work);
-
 static void
 ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 {
@@ -243,16 +241,6 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 
 	INIT_LIST_HEAD(&rb->event_list);
 	spin_lock_init(&rb->event_lock);
-	init_irq_work(&rb->irq_work, rb_irq_work);
-}
-
-static void ring_buffer_put_async(struct ring_buffer *rb)
-{
-	if (!atomic_dec_and_test(&rb->refcount))
-		return;
-
-	rb->rcu_head.next = (void *)rb;
-	irq_work_queue(&rb->irq_work);
 }
 
 /*
@@ -292,7 +280,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	 * the aux buffer is in perf_mmap_close(), about to get free'd.
 	 */
 	if (!atomic_read(&rb->aux_mmap_count))
-		goto err;
+		goto err_put;
 
 	/*
 	 * Nesting is not supported for AUX area, make sure nested
@@ -338,7 +326,7 @@ err_put:
 	rb_free_aux(rb);
 
 err:
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 	handle->event = NULL;
 
 	return NULL;
@@ -389,7 +377,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,
 
 	local_set(&rb->aux_nest, 0);
 	rb_free_aux(rb);
-	ring_buffer_put_async(rb);
+	ring_buffer_put(rb);
 }
 
 /*
@@ -470,6 +458,14 @@ static void __rb_free_aux(struct ring_buffer *rb)
 {
 	int pg;
 
+	/*
+	 * Should never happen, the last reference should be dropped from
+	 * perf_mmap_close() path, which first stops aux transactions (which
+	 * in turn are the atomic holders of aux_refcount) and then does the
+	 * last rb_free_aux().
+	 */
+	WARN_ON_ONCE(in_atomic());
+
 	if (rb->aux_priv) {
 		rb->free_aux(rb->aux_priv);
 		rb->free_aux = NULL;
@@ -581,18 +577,7 @@ out:
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount))
-		irq_work_queue(&rb->irq_work);
-}
-
-static void rb_irq_work(struct irq_work *work)
-{
-	struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
-
-	if (!atomic_read(&rb->aux_refcount))
 		__rb_free_aux(rb);
-
-	if (rb->rcu_head.next == (void *)rb)
-		call_rcu(&rb->rcu_head, rb_free_rcu);
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC
-- 
2.7.0

  reply	other threads:[~2016-03-17 13:05 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-04 13:42 [PATCH v2 0/5] perf: Untangle aux refcounting Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 1/5] perf: Refuse to begin aux transaction after aux_mmap_count drops Alexander Shishkin
2016-03-31  9:23   ` [tip:perf/core] perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 2/5] perf: Free aux pages in unmap path Alexander Shishkin
2016-03-14 12:38   ` Peter Zijlstra
2016-03-14 14:04     ` Alexander Shishkin
2016-03-14 16:42       ` Peter Zijlstra
2016-03-17 13:05         ` Alexander Shishkin [this message]
2016-03-23  8:34           ` Peter Zijlstra
2016-03-31  9:24           ` [tip:perf/core] perf/core: Free AUX " tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 3/5] perf: Document aux api usage Alexander Shishkin
2016-03-31  9:24   ` [tip:perf/core] perf/ring_buffer: Document AUX API usage tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 4/5] perf/x86/intel/pt: Move transaction start/stop to pmu start/stop callbacks Alexander Shishkin
2016-03-31  9:24   ` [tip:perf/core] perf/x86/intel/pt: Move transaction start/stop to PMU " tip-bot for Alexander Shishkin
2016-03-04 13:42 ` [PATCH v2 5/5] perf/x86/intel/bts: Move transaction start/stop to " Alexander Shishkin
2016-03-31  9:25   ` [tip:perf/core] " tip-bot for Alexander Shishkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87d1qtz23d.fsf@ashishki-desk.ger.corp.intel.com \
    --to=alexander.shishkin@linux.intel.com \
    --cc=acme@infradead.org \
    --cc=eranian@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=vince@deater.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).