From: Marc Zyngier <maz@kernel.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, Will Deacon <will@kernel.org>,
	Hector Martin <marcan@marcan.st>, Sven Peter <sven@svenpeter.dev>,
	Alyssa Rosenzweig <alyssa@rosenzweig.io>,
	Rob Herring <robh+dt@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dougall <dougallj@gmail.com>,
	kernel-team@android.com
Subject: Re: [PATCH v2 8/8] drivers/perf: Add Apple icestorm/firestorm CPU PMU driver
Date: Thu, 02 Dec 2021 15:39:46 +0000	[thread overview]
Message-ID: <877dcnm2wt.wl-maz@kernel.org> (raw)
In-Reply-To: <YaepolizIKkzDQoV@FVFF77S0Q05N>

On Wed, 01 Dec 2021 16:58:10 +0000,
Mark Rutland <mark.rutland@arm.com> wrote:
> 
> On Wed, Dec 01, 2021 at 01:49:09PM +0000, Marc Zyngier wrote:
> > Add a new, weird and wonderful driver for the equally weird Apple
> > PMU HW. Although the PMU itself is functional, we don't know much
> > about the events yet, so this can be considered as yet another
> > random number generator...
> 
> It's really frustrating that Apple built this rather than the
> architected PMU, because we've generally pushed back on
> IMPLEMENTATION DEFINED junk in this area, and supporting this makes
> it harder to push back on other vendors going the same route, which
> I'm not keen on. That, and the usual state of IMP-DEF stuff making
> this stupidly painful to reason about.

As much as I agree with you on the stinking aspect of an IMPDEF PMU,
this doesn't contradict the architecture. To avoid the spread of this
madness, forbidding an IMPDEF implementation in the architecture would
be the right thing to do.

> 
> I can see that we can get this working bare-metal with DT, but I
> really don't want to try to support this in other cases (e.g. in a
> VM, potentially with ACPI), or this IMP-DEFness is going to spread
> more throughout the arm_pmu code.

Well, an alternative would be to sidestep the arm_pmu framework
altogether.  Which would probably suck even more.

> How does this interact with PMU emulation for a KVM guest?

It doesn't. No non-architected PMU will get exposed to a KVM guest,
and the usual "inject an UNDEF exception on IMPDEF access" applies. As
far as I am concerned, KVM is purely architectural and doesn't need to
be encumbered with this.

> 
> I have a bunch of comments and questions below.
> 
> [...]
> 
> > +#define ANY_BUT_0_1			GENMASK(9, 2)
> > +#define ONLY_2_TO_7			GENMASK(7, 2)
> > +#define ONLY_2_4_6			(BIT(2) | BIT(4) | BIT(6))
> > +#define ONLY_5_6_7			GENMASK(7, 5)
> 
> For clarity/consistency it might be better to use separate BIT()s for
> ONLY_5_6_7 too.

Sure.
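
i.e. something along these lines (strictly equivalent to GENMASK(7, 5),
just spelled out bit by bit):

#define ONLY_5_6_7			(BIT(5) | BIT(6) | BIT(7))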

[...]

> > +static const u16 m1_pmu_event_affinity[M1_PMU_PERFCTR_LAST + 1] = {
> > +	[0 ... M1_PMU_PERFCTR_LAST]	= ANY_BUT_0_1,
> > +	[M1_PMU_PERFCTR_UNKNOWN_01]	= BIT(7),
> > +	[M1_PMU_PERFCTR_CPU_CYCLES]	= ANY_BUT_0_1 | BIT(0),
> > +	[M1_PMU_PERFCTR_INSTRUCTIONS]	= BIT(7) | BIT(1),
> > +	[M1_PMU_PERFCTR_UNKNOWN_8d]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_8e]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_8f]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_90]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_93]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_94]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_95]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_96]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_97]	= BIT(7),
> > +	[M1_PMU_PERFCTR_UNKNOWN_98]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_99]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_9a]	= BIT(7),
> > +	[M1_PMU_PERFCTR_UNKNOWN_9b]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_9c]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_9f]	= BIT(7),
> > +	[M1_PMU_PERFCTR_UNKNOWN_bf]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_c0]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_c1]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_c4]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_c5]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_c6]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_c8]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_ca]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_cb]	= ONLY_5_6_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_f5]	= ONLY_2_4_6,
> > +	[M1_PMU_PERFCTR_UNKNOWN_f6]	= ONLY_2_4_6,
> > +	[M1_PMU_PERFCTR_UNKNOWN_f7]	= ONLY_2_4_6,
> > +	[M1_PMU_PERFCTR_UNKNOWN_f8]	= ONLY_2_TO_7,
> > +	[M1_PMU_PERFCTR_UNKNOWN_fd]	= ONLY_2_4_6,
> > +};
> 
> I don't entirely follow what's going on here. Is this a matrix
> scheme like what QC had in their IMP-DEF Krait PMUs? See:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/kernel/perf_event_v7.c?h=v5.16-rc3#n1286

It is nowhere near as complicated as that.

> I'm a bit worried about this, since if this is of that shape, there
> are potential constraints on which counters and/or events you can
> use concurrently, and if you violate those they can conflict. If so,
> we need to be *very* careful about the abstraction we provide to
> userspace.

The HW does have placement constraints (this is what this per-event
bitmap is expressing), but the counting seems completely independent
as long as you can find a suitable counter to place the event on. Which
means that if you try to count (for example) 4 events that would only fit
in {5,6,7}, we'll say NO to the fourth one.

As I say somewhere in a comment, we could do a better job if we had a
global view of the events to be counted, and split them into batches
that the perf core would then schedule.
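
To make that a bit more concrete, the placement logic boils down to
something like this (a rough sketch of the get_event_idx callback;
M1_PMU_NR_COUNTERS and the exact shape are approximations, not the
code as posted):

static int m1_pmu_get_event_idx(struct pmu_hw_events *cpuc,
				struct perf_event *event)
{
	unsigned long evtype = event->hw.config_base & M1_PMU_CFG_EVENT;
	unsigned long affinity = m1_pmu_event_affinity[evtype];
	int idx;

	/*
	 * Walk the counters this event can live on and grab the first
	 * free one. If they are all busy, reject the event; that's
	 * the "NO to the fourth one" above.
	 */
	for_each_set_bit(idx, &affinity, M1_PMU_NR_COUNTERS)
		if (!test_and_set_bit(idx, cpuc->used_mask))
			return idx;

	return -EAGAIN;
}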

If you think any of this somehow breaks the userspace ABI, please let
me know (my understanding of perf is pretty limited...).

> 
> [...]
> 
> > +/* Low level accessors. No synchronisation. */
> > +#define PMU_READ_COUNTER(_idx)						\
> > +	case _idx:	return read_sysreg_s(SYS_IMP_APL_PMC## _idx ##_EL1)
> > +
> > +#define PMU_WRITE_COUNTER(_val, _idx)					\
> > +	case _idx:							\
> > +		write_sysreg_s(_val, SYS_IMP_APL_PMC## _idx ##_EL1);	\
> > +		return
> > +
> > +static u64 m1_pmu_read_hw_counter(unsigned int index)
> > +{
> > +	switch (index) {
> > +		PMU_READ_COUNTER(0);
> > +		PMU_READ_COUNTER(1);
> > +		PMU_READ_COUNTER(2);
> > +		PMU_READ_COUNTER(3);
> > +		PMU_READ_COUNTER(4);
> > +		PMU_READ_COUNTER(5);
> > +		PMU_READ_COUNTER(6);
> > +		PMU_READ_COUNTER(7);
> > +		PMU_READ_COUNTER(8);
> > +		PMU_READ_COUNTER(9);
> > +	}
> > +
> > +	BUG();
> > +}
> > +
> > +static void m1_pmu_write_hw_counter(u64 val, unsigned int index)
> > +{
> > +	switch (index) {
> > +		PMU_WRITE_COUNTER(val, 0);
> > +		PMU_WRITE_COUNTER(val, 1);
> > +		PMU_WRITE_COUNTER(val, 2);
> > +		PMU_WRITE_COUNTER(val, 3);
> > +		PMU_WRITE_COUNTER(val, 4);
> > +		PMU_WRITE_COUNTER(val, 5);
> > +		PMU_WRITE_COUNTER(val, 6);
> > +		PMU_WRITE_COUNTER(val, 7);
> > +		PMU_WRITE_COUNTER(val, 8);
> > +		PMU_WRITE_COUNTER(val, 9);
> > +	}
> > +
> > +	BUG();
> > +}
> 
> As an aside, since this pattern has cropped up in a few places, maybe we want
> to look into scripting the generation of sysreg bank accessors like this.
> 
> [...]
> 
> > +static void m1_pmu_configure_counter(unsigned int index, u8 event,
> > +				     bool user, bool kernel)
> > +{
> > +	u64 val, user_bit, kernel_bit;
> > +	int shift;
> > +
> > +	switch (index) {
> > +	case 0 ... 7:
> > +		user_bit = BIT(get_bit_offset(index, PMCR1_COUNT_A64_EL0_0_7));
> > +		kernel_bit = BIT(get_bit_offset(index, PMCR1_COUNT_A64_EL1_0_7));
> > +		break;
> > +	case 8 ... 9:
> > +		user_bit = BIT(get_bit_offset(index - 8, PMCR1_COUNT_A64_EL0_8_9));
> > +		kernel_bit = BIT(get_bit_offset(index - 8, PMCR1_COUNT_A64_EL1_8_9));
> > +		break;
> 
> When this says 'EL1', presumably that's counting at EL2 in VHE?

It does.

> Are there separate EL1 / EL2 controls, or anything of that sort we need to be
> aware of?

No, there is a single, per-counter control for EL0 and EL2. I couldn't
get the counters to report anything useful while a guest was running,
but that doesn't mean such a control doesn't exist.

> 
> [...]
> 
> > +/* arm_pmu backend */
> > +static void m1_pmu_enable_event(struct perf_event *event)
> > +{
> > +	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
> > +	struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
> > +	unsigned long flags;
> > +	bool user, kernel;
> > +	u8 evt;
> > +
> > +	evt = event->hw.config_base & M1_PMU_CFG_EVENT;
> > +	user = event->hw.config_base & M1_PMU_CFG_COUNT_USER;
> > +	kernel = event->hw.config_base & M1_PMU_CFG_COUNT_KERNEL;
> > +
> > +	raw_spin_lock_irqsave(&cpuc->pmu_lock, flags);
> 
> You shouldn't need this locking. The perf core always calls into the HW access
> functions with IRQs disabled and we don't do cross-cpu state modification.
> Likewise elsewhere in this file.
> 
> We pulled similar out of the architectural PMU driver in commit:
> 
> 2a0e2a02e4b71917 ("arm64: perf: Remove PMU locking")
> 
> ... though that says non-preemptible when it should say non-interruptible.

Ah, nice. I'll get rid of it.
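
The enable path then becomes something like this (a sketch of the
lock-free version only; the *_counter helpers below stand in for
whatever actually pokes the enable/IRQ bits):

static void m1_pmu_enable_event(struct perf_event *event)
{
	bool user, kernel;
	u8 evt;

	evt = event->hw.config_base & M1_PMU_CFG_EVENT;
	user = event->hw.config_base & M1_PMU_CFG_COUNT_USER;
	kernel = event->hw.config_base & M1_PMU_CFG_COUNT_KERNEL;

	/*
	 * No lock needed: the perf core calls us with interrupts
	 * disabled, and we only ever touch this CPU's own registers.
	 */
	m1_pmu_configure_counter(event->hw.idx, evt, user, kernel);
	m1_pmu_enable_counter(event->hw.idx);		/* placeholder */
	m1_pmu_enable_counter_interrupt(event->hw.idx);	/* placeholder */
	isb();
}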

[...]

> > +	state = read_sysreg_s(SYS_IMP_APL_PMCR0_EL1);
> > +	overflow = read_sysreg_s(SYS_IMP_APL_PMSR_EL1);
> 
> I assume the overflow behaviour is free-running rather than stopping?

Configurable, apparently. At the moment, I set it to stop on overflow.
Happy to change the behaviour though.

> > +	if (!overflow) {
> > +		ret = IRQ_NONE;
> > +		goto out;
> > +	}
> > +
> > +	regs = get_irq_regs();
> > +
> > +	for (idx = 0; idx < cpu_pmu->num_events; idx++) {
> > +		struct perf_event *event = cpuc->events[idx];
> > +		struct perf_sample_data data;
> > +
> > +		if (!event)
> > +			continue;
> > +
> > +		armpmu_event_update(event);
> > +		perf_sample_data_init(&data, 0, event->hw.last_period);
> > +		if (!armpmu_event_set_period(event))
> > +			continue;
> > +
> > +		if (perf_event_overflow(event, &data, regs))
> > +			__m1_pmu_disable_event(event);
> > +	}
> > +
> > +out:
> > +	state &= ~PMCR0_IACT;
> > +	write_sysreg_s(state, SYS_IMP_APL_PMCR0_EL1);
> > +	isb();
> > +
> > +	raw_spin_unlock_irqrestore(&cpuc->pmu_lock, flags);
> > +
> > +	return ret;
> > +}
> 
> [...]
> 
> > +static int m1_pmu_device_probe(struct platform_device *pdev)
> > +{
> > +	int ret;
> > +
> > +	ret = arm_pmu_device_probe(pdev, m1_pmu_of_device_ids, NULL);
> > +	if (!ret) {
> > +		/*
> > +		 * If probe succeeds, taint the kernel as this is all
> > +		 * undocumented, implementation defined black magic.
> > +		 */
> > +		add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
> > +	}
> > +
> > +	return ret;
> > +}
> 
> Hmmm... that means we're always going to TAINT on this HW with an appropriate
> DT, which could mask other reasons TAINT_CPU_OUT_OF_SPEC would be set, even
> where the user isn't using the PMU.
> 
> Maybe we should have a cmdline option to opt-in to using the IMP-DEF PMU (and
> only tainting in that case)?

I'd rather taint on first use. Requiring a command-line argument for
this seems a bit over the top...
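
Something along the lines of the below, i.e. moving the add_taint()
call out of probe and into the event setup path (only a sketch, with
the actual mapping logic elided):

static int m1_pmu_map_event(struct perf_event *event)
{
	/*
	 * Taint when someone actually opens a counter, not merely
	 * because the DT describes the PMU.
	 */
	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);

	/* ... existing event mapping ... */
	return 0;
}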

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
