From: Alexander Shishkin
To: Peter Zijlstra
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, vince@deater.net,
 eranian@google.com, Arnaldo Carvalho de Melo
Subject: Re: [PATCH v2 2/5] perf: Free aux pages in unmap path
In-Reply-To: <20160314123837.GU6356@twins.programming.kicks-ass.net>
References: <1457098969-21595-1-git-send-email-alexander.shishkin@linux.intel.com>
 <1457098969-21595-3-git-send-email-alexander.shishkin@linux.intel.com>
 <20160314123837.GU6356@twins.programming.kicks-ass.net>
Date: Mon, 14 Mar 2016 16:04:44 +0200
Message-ID: <87fuvtyx37.fsf@ashishki-desk.ger.corp.intel.com>

Peter Zijlstra writes:

> On Fri, Mar 04, 2016 at 03:42:46PM +0200, Alexander Shishkin wrote:
>> @@ -4649,10 +4679,22 @@ static void perf_mmap_close(struct vm_area_struct *vma)
>>  	 */
>>  	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
>>  	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
>> +		/*
>> +		 * Stop all aux events that are writing to this here buffer,
>> +		 * so that we can free its aux pages and corresponding pmu
>> +		 * data. Note that after rb::aux_mmap_count dropped to zero,
>> +		 * they won't start any more (see perf_aux_output_begin()).
>> +		 */
>> +		perf_pmu_output_stop(event);
>
> So to me it seems like we're interested in rb, we don't particularly
> care about @event in this case.

Yeah, @event is used only for its rb and pmu down this path.

>> +	if (!has_aux(event))
>> +		return;
>> +
>
> 	if (!parent)
> 		parent = event;
>
>> +	if (rcu_dereference(event->rb) == rb)
>
> s/event/parent/
>
>> +		ro->err = __perf_event_stop(event);
>
>> +	else if (parent && rcu_dereference(parent->rb) == rb)
>> +		ro->err = __perf_event_stop(event);
>
> and these can go.. However..

You're right: it's the parent that's got the ->rb, but it may well be
the child that's actually running and writing data there, so we need to
stop any running children as well. I think I actually broke this bit
since the previous version; either way, it needs a comment explaining
what's going on.
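Roughly what I'm thinking of for the respin (a sketch only, untested;
the exact shape may still change):

static void __perf_event_output_stop(struct perf_event *event, void *data)
{
	struct perf_event *parent = event->parent;
	struct remote_output *ro = data;
	struct ring_buffer *rb = ro->rb;

	if (!has_aux(event))
		return;

	if (!parent)
		parent = event;

	/*
	 * With inheritance, it's the parent that holds the rb reference,
	 * but it's the child (@event here) that may be scheduled in and
	 * writing aux data right now, so check the parent's rb (falling
	 * back to the event itself) and stop the event we're looking at.
	 */
	if (rcu_dereference(parent->rb) == rb)
		ro->err = __perf_event_stop(event);
}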
>
>> +}
>> +
>> +static int __perf_pmu_output_stop(void *info)
>> +{
>> +	struct perf_event *event = info;
>> +	struct pmu *pmu = event->pmu;
>> +	struct perf_cpu_context *cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
>> +	struct remote_output ro = {
>> +		.rb = event->rb,
>> +	};
>> +
>> +	rcu_read_lock();
>> +	perf_event_aux_ctx(&cpuctx->ctx, __perf_event_output_stop, &ro);
>> +	if (cpuctx->task_ctx)
>> +		perf_event_aux_ctx(cpuctx->task_ctx, __perf_event_output_stop,
>> +				   &ro);
>> +	rcu_read_unlock();
>> +
>> +	return ro.err;
>> +}
>> +
>> +static void perf_pmu_output_stop(struct perf_event *event)
>> +{
>> +	int cpu, err;
>> +
>> +	/* better be thorough */
>> +	get_online_cpus();
>> +restart:
>> +	for_each_online_cpu(cpu) {
>> +		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
>> +		if (err)
>> +			goto restart;
>> +	}
>> +	put_online_cpus();
>> +}
>
> This seems wildly overkill, could we not iterate rb->event_list like we
> do for the normal buffer?

Actually we can; I've appended a rough sketch of what I mean at the end
of this mail. One problem though is that iterating rb::event_list
requires an rcu read section (or the irqsafe rb::event_lock), while we
also need to send IPIs from inside the walk. The normal buffer case
tears down rb::event_list as it goes, so it can close the rcu read
section right after it fetches one event from the list; here, however,
we must keep the list intact.

> Sure, we need to IPI for each event found, but that seems better than
> unconditionally sending IPIs to all CPUs.

Actually, won't it "often" be the case that the number of events is a
multiple of the number of cpus? The usual use case is one event per
task per cpu with inheritance enabled, in which case we'll zap multiple
events per IPI anyway.

Regards,
--
Alex
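The rb::event_list walk I have in mind, roughly (again a sketch only,
untested; the cpu==-1 handling via ->oncpu is my guess at how to pick
which cpu to IPI):

static void perf_pmu_output_stop(struct perf_event *event)
{
	struct perf_event *iter;
	int err, cpu;

restart:
	rcu_read_lock();
	list_for_each_entry_rcu(iter, &event->rb->event_list, rb_entry) {
		/*
		 * For a per-cpu event, IPI the cpu it's bound to; for a
		 * task (cpu==-1) event, peek at ->oncpu and skip it if
		 * it's not currently scheduled in anywhere.
		 */
		cpu = iter->cpu;
		if (cpu == -1)
			cpu = READ_ONCE(iter->oncpu);

		if (cpu == -1)
			continue;

		/*
		 * Unlike the normal buffer teardown, the list has to stay
		 * intact, so keep the rcu read section across the whole
		 * walk and start over if an IPI fails.
		 */
		err = cpu_function_call(cpu, __perf_pmu_output_stop, event);
		if (err) {
			rcu_read_unlock();
			goto restart;
		}
	}
	rcu_read_unlock();
}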