From mboxrd@z Thu Jan  1 00:00:00 1970
From: Neil Leeder <nleeder@codeaurora.org>
Subject: Re: [PATCH 2/2] soc: qcom: add l2 cache perf events driver
Date: Fri, 10 Jun 2016 18:34:04 -0400
Message-ID: <7a8fd879-6f0b-3b73-013f-99b12ba5729f@codeaurora.org>
References: <1464987812-14360-1-git-send-email-nleeder@codeaurora.org>
 <1464987812-14360-3-git-send-email-nleeder@codeaurora.org>
 <20160606095139.GC6831@leverpostej>
 <be1cc820-7b9d-041b-06d8-cefeb43934d5@codeaurora.org>
 <20160609155615.GE12201@leverpostej>
 <20160609194136.GZ30154@twins.programming.kicks-ass.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Return-path: <linux-arm-msm-owner@vger.kernel.org>
Received: from smtp.codeaurora.org ([198.145.29.96]:55596 "EHLO
	smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751724AbcFJWeJ (ORCPT
	<rfc822;linux-arm-msm@vger.kernel.org>);
	Fri, 10 Jun 2016 18:34:09 -0400
In-Reply-To: <20160609194136.GZ30154@twins.programming.kicks-ass.net>
Sender: linux-arm-msm-owner@vger.kernel.org
List-Id: linux-arm-msm@vger.kernel.org
To: Peter Zijlstra <peterz@infradead.org>, Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>, WillDeacon <will.deacon@arm.com>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Mark Langsdorf <mlangsdo@redhat.com>, Mark Salter <msalter@redhat.com>, Jon Masters <jcm@redhat.com>, Timur Tabi <timur@codeaurora.org>, cov@codeaurora.org, nleeder@codeaurora.org


On 6/9/2016 03:41 PM, Peter Zijlstra wrote:
> On Thu, Jun 09, 2016 at 04:56:16PM +0100, Mark Rutland wrote:
>>>>> +static irqreturn_t l2_cache__handle_irq(int irq_num, void *data)
>>>>> +{
>>>>> +	struct hml2_pmu *slice = data;
>>>>> +	u32 ovsr;
>>>>> +	int idx;
>>>>> +	struct pt_regs *regs;
>>>>> +
>>>>> +	ovsr = hml2_pmu__getreset_ovsr();
>>>>> +	if (!hml2_pmu__has_overflowed(ovsr))
>>>>> +		return IRQ_NONE;
>>>>> +
>>>>> +	regs = get_irq_regs();
>>>>> +
>>>>> +	for (idx = 0; idx < l2cache_pmu.num_counters; idx++) {
>>>>> +		struct perf_event *event = slice->events[idx];
>>>>> +		struct hw_perf_event *hwc;
>>>>> +		struct perf_sample_data data;
>>>>> +
>>>>> +		if (!event)
>>>>> +			continue;
>>>>> +
>>>>> +		if (!hml2_pmu__counter_has_overflowed(ovsr, idx))
>>>>> +			continue;
>>>>> +
>>>>> +		l2_cache__event_update_from_slice(event, slice);
>>>>> +		hwc = &event->hw;
>>>>> +
>>>>> +		if (is_sampling_event(event)) {
>>>>> +			perf_sample_data_init(&data, 0, hwc->last_period);
>>>>
>>>> I don't think sampling makes sense, given this is an uncore PMU and the
>>>> events are triggered by other CPUs.
>>>
>>> There is origin filtering so events can be attributed to a CPU when sampling.
>>
>> Ok. I believe that's different from all other uncore PMUs we support
>> (none of the drivers support sampling, certainly), so I'm not entirely
>> sure how/if we can make use of that sanely and reliably.
> 
> Right; because not only do you need to know which CPU originated the
> event, the IRQ must also happen on that CPU. Simply knowing which CPU
> triggered it is not enough for sampling.
> 
>> For the timebeing, I think this sampling needs to go, and the event_init
>> logic needs to reject sampling as with other uncore PMU drivers.
> 
> Agreed.

I want to make sure I understand what the concern is here.
Given the hardware filter which restricts counting to events generated by
a specific CPU, and an irq which is affine to that CPU, sampling and task mode
would seem to work for a single perf use. 
Is the issue only related to multiple concurrent perf uses?

> 
>> One thing I forgot to mention in my earlier comments is that as an
>> uncore PMU you need to have task_ctx_nr = perf_invalid_context here
>> also.
> 
> For good reasons, uncore PMUs (as is the case here) count strictly more
> than the effect of single CPUs (and thus also the current task). So
> attributing it back to a task is nonsense.
> 

Thanks,
Neil

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

From mboxrd@z Thu Jan  1 00:00:00 1970
From: nleeder@codeaurora.org (Neil Leeder)
Date: Fri, 10 Jun 2016 18:34:04 -0400
Subject: [PATCH 2/2] soc: qcom: add l2 cache perf events driver
In-Reply-To: <20160609194136.GZ30154@twins.programming.kicks-ass.net>
References: <1464987812-14360-1-git-send-email-nleeder@codeaurora.org>
 <1464987812-14360-3-git-send-email-nleeder@codeaurora.org>
 <20160606095139.GC6831@leverpostej>
 <be1cc820-7b9d-041b-06d8-cefeb43934d5@codeaurora.org>
 <20160609155615.GE12201@leverpostej>
 <20160609194136.GZ30154@twins.programming.kicks-ass.net>
Message-ID: <7a8fd879-6f0b-3b73-013f-99b12ba5729f@codeaurora.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org


On 6/9/2016 03:41 PM, Peter Zijlstra wrote:
> On Thu, Jun 09, 2016 at 04:56:16PM +0100, Mark Rutland wrote:
>>>>> +static irqreturn_t l2_cache__handle_irq(int irq_num, void *data)
>>>>> +{
>>>>> +	struct hml2_pmu *slice = data;
>>>>> +	u32 ovsr;
>>>>> +	int idx;
>>>>> +	struct pt_regs *regs;
>>>>> +
>>>>> +	ovsr = hml2_pmu__getreset_ovsr();
>>>>> +	if (!hml2_pmu__has_overflowed(ovsr))
>>>>> +		return IRQ_NONE;
>>>>> +
>>>>> +	regs = get_irq_regs();
>>>>> +
>>>>> +	for (idx = 0; idx < l2cache_pmu.num_counters; idx++) {
>>>>> +		struct perf_event *event = slice->events[idx];
>>>>> +		struct hw_perf_event *hwc;
>>>>> +		struct perf_sample_data data;
>>>>> +
>>>>> +		if (!event)
>>>>> +			continue;
>>>>> +
>>>>> +		if (!hml2_pmu__counter_has_overflowed(ovsr, idx))
>>>>> +			continue;
>>>>> +
>>>>> +		l2_cache__event_update_from_slice(event, slice);
>>>>> +		hwc = &event->hw;
>>>>> +
>>>>> +		if (is_sampling_event(event)) {
>>>>> +			perf_sample_data_init(&data, 0, hwc->last_period);
>>>>
>>>> I don't think sampling makes sense, given this is an uncore PMU and the
>>>> events are triggered by other CPUs.
>>>
>>> There is origin filtering so events can be attributed to a CPU when sampling.
>>
>> Ok. I believe that's different from all other uncore PMUs we support
>> (none of the drivers support sampling, certainly), so I'm not entirely
>> sure how/if we can make use of that sanely and reliably.
> 
> Right; because not only do you need to know which CPU originated the
> event, the IRQ must also happen on that CPU. Simply knowing which CPU
> triggered it is not enough for sampling.
> 
>> For the timebeing, I think this sampling needs to go, and the event_init
>> logic needs to reject sampling as with other uncore PMU drivers.
> 
> Agreed.

I want to make sure I understand what the concern is here.
Given the hardware filter which restricts counting to events generated by
a specific CPU, and an irq which is affine to that CPU, sampling and task mode
would seem to work for a single perf use. 
Is the issue only related to multiple concurrent perf uses?

> 
>> One thing I forgot to mention in my earlier comments is that as an
>> uncore PMU you need to have task_ctx_nr = perf_invalid_context here
>> also.
> 
> For good reasons, uncore PMUs (as is the case here) count strictly more
> than the effect of single CPUs (and thus also the current task). So
> attributing it back to a task is nonsense.
> 

Thanks,
Neil

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.