All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-01 10:07 ` Kulkarni, Ganapatrao
  0 siblings, 0 replies; 18+ messages in thread
From: Kulkarni, Ganapatrao @ 2018-10-01 10:07 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Will.Deacon, mark.rutland, catalin.marinas, peterz, mingo, acme,
	Nair, Jayachandran, Richter, Robert, Lomovtsev, Vadim,
	Jan Glauber, gklkml16

Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
since L1D_CACHE_REFILL counts both load and store misses.
Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
from armv8_pmuv3 cache mapping.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
---
 arch/arm64/kernel/perf_event.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aacdafd..6a67ad22d1eb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -207,17 +207,9 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 						[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
 	PERF_CACHE_MAP_ALL_UNSUPPORTED,
 
-	[C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-	[C(L1D)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-
 	[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE,
 	[C(L1I)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL,
 
-	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL,
-	[C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB,
-
 	[C(ITLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL,
 	[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_TLB,
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-01 10:07 ` Kulkarni, Ganapatrao
  0 siblings, 0 replies; 18+ messages in thread
From: Kulkarni, Ganapatrao @ 2018-10-01 10:07 UTC (permalink / raw)
  To: linux-arm-kernel

Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
since L1D_CACHE_REFILL counts both load and store misses.
Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
from armv8_pmuv3 cache mapping.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
---
 arch/arm64/kernel/perf_event.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aacdafd..6a67ad22d1eb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -207,17 +207,9 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 						[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
 	PERF_CACHE_MAP_ALL_UNSUPPORTED,
 
-	[C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-	[C(L1D)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-
 	[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE,
 	[C(L1I)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL,
 
-	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL,
-	[C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB,
-
 	[C(ITLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL,
 	[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_TLB,
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-01 10:07 ` Kulkarni, Ganapatrao
@ 2018-10-01 14:29   ` Will Deacon
  -1 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-01 14:29 UTC (permalink / raw)
  To: Kulkarni, Ganapatrao
  Cc: linux-kernel, linux-arm-kernel, mark.rutland, catalin.marinas,
	peterz, mingo, acme, Nair, Jayachandran, Richter, Robert,
	Lomovtsev, Vadim, Jan Glauber, gklkml16

Hi Ganapat,

On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> since L1D_CACHE_REFILL counts both load and store misses.
> Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> from armv8_pmuv3 cache mapping.
> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> ---
>  arch/arm64/kernel/perf_event.c | 8 --------
>  1 file changed, 8 deletions(-)

The "generic" events are really implemented on a best-effort basis, as
they rarely tend to map exactly to what the hardware supports. I think
they originally stemmed from the x86 CPU PMU, but that doesn't really
help us.

I had a discussion with Ingo back when we originally implemented perf
because I actually preferred not to implement the generic events at all.
However, he was strongly of the opinion that a best-effort approach was
sufficient to get casual users going with the tool, so that's what we went
with.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-01 14:29   ` Will Deacon
  0 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-01 14:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ganapat,

On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> since L1D_CACHE_REFILL counts both load and store misses.
> Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> from armv8_pmuv3 cache mapping.
> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> ---
>  arch/arm64/kernel/perf_event.c | 8 --------
>  1 file changed, 8 deletions(-)

The "generic" events are really implemented on a best-effort basis, as
they rarely tend to map exactly to what the hardware supports. I think
they originally stemmed from the x86 CPU PMU, but that doesn't really
help us.

I had a discussion with Ingo back when we originally implemented perf
because I actually preferred not to implement the generic events at all.
However, he was strongly of the opinion that a best-effort approach was
sufficient to get casual users going with the tool, so that's what we went
with.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-01 14:29   ` Will Deacon
@ 2018-10-01 16:39     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 18+ messages in thread
From: Ganapatrao Kulkarni @ 2018-10-01 16:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland,
	catalin.marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter,
	Vadim.Lomovtsev, Jan.Glauber

Hi Will,

On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote:
>
> Hi Ganapat,
>
> On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> > since L1D_CACHE_REFILL counts both load and store misses.
> > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> > from armv8_pmuv3 cache mapping.
> >
> > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > ---
> >  arch/arm64/kernel/perf_event.c | 8 --------
> >  1 file changed, 8 deletions(-)
>
> The "generic" events are really implemented on a best-effort basis, as
> they rarely tend to map exactly to what the hardware supports. I think
> they originally stemmed from the x86 CPU PMU, but that doesn't really
> help us.

This works fairly well for DT based boots, since almost all SoCs have
added remapping using custom dt object binding.
However we have concluded in the past to drop SoC specific from the
ACPI mapping and use json to add SoC/micro architecture specific
events support.
At present ,  When we boot with ACPI,  it is misleading for these events.

In fact, this has been pointed internally from benchmark team and
reported it as hardware bug!
IMHO, it would be much simpler to delete these misleading events
mapping rather explaining to perf tool users.

We already have proper mapping for these events,
armv8_pmuv3_0/l1d_cache_refill/
armv8_pmuv3_0/l1d_cache/
[core imp def:]
l1d_cache_rd
l1d_cache_wr
l1d_cache_refill_rd
l1d_cache_refill_wr

>
> I had a discussion with Ingo back when we originally implemented perf
> because I actually preferred not to implement the generic events at all.
> However, he was strongly of the opinion that a best-effort approach was
> sufficient to get casual users going with the tool, so that's what we went
> with.

thanks for the background info, these generic mapping fairly works
except these events.

>
> Will

thanks,
Ganapat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-01 16:39     ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 18+ messages in thread
From: Ganapatrao Kulkarni @ 2018-10-01 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote:
>
> Hi Ganapat,
>
> On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> > since L1D_CACHE_REFILL counts both load and store misses.
> > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> > from armv8_pmuv3 cache mapping.
> >
> > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > ---
> >  arch/arm64/kernel/perf_event.c | 8 --------
> >  1 file changed, 8 deletions(-)
>
> The "generic" events are really implemented on a best-effort basis, as
> they rarely tend to map exactly to what the hardware supports. I think
> they originally stemmed from the x86 CPU PMU, but that doesn't really
> help us.

This works fairly well for DT based boots, since almost all SoCs have
added remapping using custom dt object binding.
However we have concluded in the past to drop SoC specific from the
ACPI mapping and use json to add SoC/micro architecture specific
events support.
At present ,  When we boot with ACPI,  it is misleading for these events.

In fact, this has been pointed internally from benchmark team and
reported it as hardware bug!
IMHO, it would be much simpler to delete these misleading events
mapping rather explaining to perf tool users.

We already have proper mapping for these events,
armv8_pmuv3_0/l1d_cache_refill/
armv8_pmuv3_0/l1d_cache/
[core imp def:]
l1d_cache_rd
l1d_cache_wr
l1d_cache_refill_rd
l1d_cache_refill_wr

>
> I had a discussion with Ingo back when we originally implemented perf
> because I actually preferred not to implement the generic events at all.
> However, he was strongly of the opinion that a best-effort approach was
> sufficient to get casual users going with the tool, so that's what we went
> with.

thanks for the background info, these generic mapping fairly works
except these events.

>
> Will

thanks,
Ganapat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-01 16:39     ` Ganapatrao Kulkarni
@ 2018-10-04  5:42       ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 18+ messages in thread
From: Ganapatrao Kulkarni @ 2018-10-04  5:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland,
	catalin.marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter,
	Vadim.Lomovtsev, Jan.Glauber

Hi Will,

can you please pull this patch?

On Mon, Oct 1, 2018 at 10:09 PM Ganapatrao Kulkarni <gklkml16@gmail.com> wrote:
>
> Hi Will,
>
> On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote:
> >
> > Hi Ganapat,
> >
> > On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> > > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> > > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> > > since L1D_CACHE_REFILL counts both load and store misses.
> > > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> > > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> > > from armv8_pmuv3 cache mapping.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > > ---
> > >  arch/arm64/kernel/perf_event.c | 8 --------
> > >  1 file changed, 8 deletions(-)
> >
> > The "generic" events are really implemented on a best-effort basis, as
> > they rarely tend to map exactly to what the hardware supports. I think
> > they originally stemmed from the x86 CPU PMU, but that doesn't really
> > help us.
>
> This works fairly well for DT based boots, since almost all SoCs have
> added remapping using custom dt object binding.
> However we have concluded in the past to drop SoC specific from the
> ACPI mapping and use json to add SoC/micro architecture specific
> events support.
> At present ,  When we boot with ACPI,  it is misleading for these events.
>
> In fact, this has been pointed internally from benchmark team and
> reported it as hardware bug!
> IMHO, it would be much simpler to delete these misleading events
> mapping rather explaining to perf tool users.
>
> We already have proper mapping for these events,
> armv8_pmuv3_0/l1d_cache_refill/
> armv8_pmuv3_0/l1d_cache/
> [core imp def:]
> l1d_cache_rd
> l1d_cache_wr
> l1d_cache_refill_rd
> l1d_cache_refill_wr
>
> >
> > I had a discussion with Ingo back when we originally implemented perf
> > because I actually preferred not to implement the generic events at all.
> > However, he was strongly of the opinion that a best-effort approach was
> > sufficient to get casual users going with the tool, so that's what we went
> > with.
>
> thanks for the background info, these generic mapping fairly works
> except these events.
>
> >
> > Will
>
> thanks,
> Ganapat

thanks,
Ganapat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-04  5:42       ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 18+ messages in thread
From: Ganapatrao Kulkarni @ 2018-10-04  5:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

can you please pull this patch?

On Mon, Oct 1, 2018 at 10:09 PM Ganapatrao Kulkarni <gklkml16@gmail.com> wrote:
>
> Hi Will,
>
> On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote:
> >
> > Hi Ganapat,
> >
> > On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote:
> > > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
> > > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
> > > since L1D_CACHE_REFILL counts both load and store misses.
> > > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
> > > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events
> > > from armv8_pmuv3 cache mapping.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > > ---
> > >  arch/arm64/kernel/perf_event.c | 8 --------
> > >  1 file changed, 8 deletions(-)
> >
> > The "generic" events are really implemented on a best-effort basis, as
> > they rarely tend to map exactly to what the hardware supports. I think
> > they originally stemmed from the x86 CPU PMU, but that doesn't really
> > help us.
>
> This works fairly well for DT based boots, since almost all SoCs have
> added remapping using custom dt object binding.
> However we have concluded in the past to drop SoC specific from the
> ACPI mapping and use json to add SoC/micro architecture specific
> events support.
> At present ,  When we boot with ACPI,  it is misleading for these events.
>
> In fact, this has been pointed internally from benchmark team and
> reported it as hardware bug!
> IMHO, it would be much simpler to delete these misleading events
> mapping rather explaining to perf tool users.
>
> We already have proper mapping for these events,
> armv8_pmuv3_0/l1d_cache_refill/
> armv8_pmuv3_0/l1d_cache/
> [core imp def:]
> l1d_cache_rd
> l1d_cache_wr
> l1d_cache_refill_rd
> l1d_cache_refill_wr
>
> >
> > I had a discussion with Ingo back when we originally implemented perf
> > because I actually preferred not to implement the generic events at all.
> > However, he was strongly of the opinion that a best-effort approach was
> > sufficient to get casual users going with the tool, so that's what we went
> > with.
>
> thanks for the background info, these generic mapping fairly works
> except these events.
>
> >
> > Will
>
> thanks,
> Ganapat

thanks,
Ganapat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-04  5:42       ` Ganapatrao Kulkarni
@ 2018-10-04 12:22         ` Will Deacon
  -1 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-04 12:22 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland,
	catalin.marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter,
	Vadim.Lomovtsev, Jan.Glauber

Hi Ganapat,

On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> can you please pull this patch?

I still don't like the idea of just removing events like this, especially
when other architectures (including some x86 and Power CPUs afaict) playa
similar games for generic events, and these events do actually appear in
user code.

I also don't understand why you remove the TLB events. I think that logic
would imply we should remove all of the events, because we can't distinguish
prefetches from reads either. If we want to be consistent, then I think
we should just remove the OP_WRITE events for L1D and BPU -- would you be
ok with that instead?

Also, looking at the code, I think our PMCEID parsing is broken for 8.1
parts, where the upper 32 bits of the register are offset by 0x4000 in the
event numbering space.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-04 12:22         ` Will Deacon
  0 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-04 12:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ganapat,

On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> can you please pull this patch?

I still don't like the idea of just removing events like this, especially
when other architectures (including some x86 and Power CPUs afaict) playa
similar games for generic events, and these events do actually appear in
user code.

I also don't understand why you remove the TLB events. I think that logic
would imply we should remove all of the events, because we can't distinguish
prefetches from reads either. If we want to be consistent, then I think
we should just remove the OP_WRITE events for L1D and BPU -- would you be
ok with that instead?

Also, looking at the code, I think our PMCEID parsing is broken for 8.1
parts, where the upper 32 bits of the register are offset by 0x4000 in the
event numbering space.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-04 12:22         ` Will Deacon
@ 2018-10-04 19:39           ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 18+ messages in thread
From: Ganapatrao Kulkarni @ 2018-10-04 19:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland,
	catalin.marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter,
	Vadim.Lomovtsev, Jan.Glauber

Hi Will,

On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote:
>
> Hi Ganapat,
>
> On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> > can you please pull this patch?
>
> I still don't like the idea of just removing events like this, especially
> when other architectures (including some x86 and Power CPUs afaict) playa
> similar games for generic events, and these events do actually appear in
> user code.
>
> I also don't understand why you remove the TLB events. I think that logic
> would imply we should remove all of the events, because we can't distinguish
> prefetches from reads either. If we want to be consistent, then I think
> we should just remove the OP_WRITE events for L1D and BPU -- would you be
> ok with that instead?

IIUC, dTLB-load-misses is mapped to
ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is
mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec,
counts TLB access/misses for both memory-read operation and
memory-write operation.

IMO, It won't help in keeping these events, knowingly that their
mapping is not accurate, only thing i can say to users is , dont use
events that are marked as "Hardware cache event"

>
> Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> parts, where the upper 32 bits of the register are offset by 0x4000 in the
> event numbering space.

yes, i did not find any mapping in PMCEIDx registers for
implementation defined events, otherwise we would have remapped at
runtime.

>
> Will

thanks
Ganapat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-04 19:39           ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 18+ messages in thread
From: Ganapatrao Kulkarni @ 2018-10-04 19:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote:
>
> Hi Ganapat,
>
> On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> > can you please pull this patch?
>
> I still don't like the idea of just removing events like this, especially
> when other architectures (including some x86 and Power CPUs afaict) playa
> similar games for generic events, and these events do actually appear in
> user code.
>
> I also don't understand why you remove the TLB events. I think that logic
> would imply we should remove all of the events, because we can't distinguish
> prefetches from reads either. If we want to be consistent, then I think
> we should just remove the OP_WRITE events for L1D and BPU -- would you be
> ok with that instead?

IIUC, dTLB-load-misses is mapped to
ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is
mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec,
counts TLB access/misses for both memory-read operation and
memory-write operation.

IMO, It won't help in keeping these events, knowingly that their
mapping is not accurate, only thing i can say to users is , dont use
events that are marked as "Hardware cache event"

>
> Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> parts, where the upper 32 bits of the register are offset by 0x4000 in the
> event numbering space.

yes, i did not find any mapping in PMCEIDx registers for
implementation defined events, otherwise we would have remapped at
runtime.

>
> Will

thanks
Ganapat

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-04 12:22         ` Will Deacon
@ 2018-10-05 12:27           ` John Garry
  -1 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2018-10-05 12:27 UTC (permalink / raw)
  To: Will Deacon, Ganapatrao Kulkarni
  Cc: Mark Rutland, Nair, Jayachandran, Jan.Glauber, Peter Zijlstra,
	catalin.marinas, LKML, Arnaldo Carvalho de Melo, Robert Richter,
	Ingo Molnar, Vadim.Lomovtsev, Ganapatrao Kulkarni,
	linux-arm-kernel

On 04/10/2018 13:22, Will Deacon wrote:
> Hi Ganapat,
>
> On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
>> can you please pull this patch?
>
> I still don't like the idea of just removing events like this, especially
> when other architectures (including some x86 and Power CPUs afaict) playa
> similar games for generic events, and these events do actually appear in
> user code.
>
> I also don't understand why you remove the TLB events. I think that logic
> would imply we should remove all of the events, because we can't distinguish
> prefetches from reads either. If we want to be consistent, then I think
> we should just remove the OP_WRITE events for L1D and BPU -- would you be
> ok with that instead?
>
> Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> parts, where the upper 32 bits of the register are offset by 0x4000 in the
> event numbering space.
>

Here's something I noticed:
static ssize_t
armv8pmu_events_sysfs_show(struct device *dev,
			   struct device_attribute *attr, char *page)
{
	struct perf_pmu_events_attr *pmu_attr;

	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);

	return sprintf(page, "event=0x%03llx\n", pmu_attr->id);

Should this be min width now be 4, since event width is now 16 bits 
(even though I don't know why we need to specify this width at all)?

Cheers,
John

> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-05 12:27           ` John Garry
  0 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2018-10-05 12:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/10/2018 13:22, Will Deacon wrote:
> Hi Ganapat,
>
> On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
>> can you please pull this patch?
>
> I still don't like the idea of just removing events like this, especially
> when other architectures (including some x86 and Power CPUs afaict) playa
> similar games for generic events, and these events do actually appear in
> user code.
>
> I also don't understand why you remove the TLB events. I think that logic
> would imply we should remove all of the events, because we can't distinguish
> prefetches from reads either. If we want to be consistent, then I think
> we should just remove the OP_WRITE events for L1D and BPU -- would you be
> ok with that instead?
>
> Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> parts, where the upper 32 bits of the register are offset by 0x4000 in the
> event numbering space.
>

Here's something I noticed:
static ssize_t
armv8pmu_events_sysfs_show(struct device *dev,
			   struct device_attribute *attr, char *page)
{
	struct perf_pmu_events_attr *pmu_attr;

	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);

	return sprintf(page, "event=0x%03llx\n", pmu_attr->id);

Should this be min width now be 4, since event width is now 16 bits 
(even though I don't know why we need to specify this width at all)?

Cheers,
John

> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-05 12:27           ` John Garry
@ 2018-10-05 12:39             ` Will Deacon
  -1 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-05 12:39 UTC (permalink / raw)
  To: John Garry
  Cc: Ganapatrao Kulkarni, Mark Rutland, Nair, Jayachandran,
	Jan.Glauber, Peter Zijlstra, catalin.marinas, LKML,
	Arnaldo Carvalho de Melo, Robert Richter, Ingo Molnar,
	Vadim.Lomovtsev, Ganapatrao Kulkarni, linux-arm-kernel

On Fri, Oct 05, 2018 at 01:27:08PM +0100, John Garry wrote:
> On 04/10/2018 13:22, Will Deacon wrote:
> >Hi Ganapat,
> >
> >On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> >>can you please pull this patch?
> >
> >I still don't like the idea of just removing events like this, especially
> >when other architectures (including some x86 and Power CPUs afaict) playa
> >similar games for generic events, and these events do actually appear in
> >user code.
> >
> >I also don't understand why you remove the TLB events. I think that logic
> >would imply we should remove all of the events, because we can't distinguish
> >prefetches from reads either. If we want to be consistent, then I think
> >we should just remove the OP_WRITE events for L1D and BPU -- would you be
> >ok with that instead?
> >
> >Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> >parts, where the upper 32 bits of the register are offset by 0x4000 in the
> >event numbering space.
> >
> 
> Here's something I noticed:
> static ssize_t
> armv8pmu_events_sysfs_show(struct device *dev,
> 			   struct device_attribute *attr, char *page)
> {
> 	struct perf_pmu_events_attr *pmu_attr;
> 
> 	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
> 
> 	return sprintf(page, "event=0x%03llx\n", pmu_attr->id);
> 
> Should this be min width now be 4, since event width is now 16 bits (even
> though I don't know why we need to specify this width at all)?

Yeah, that is pretty weird, but the 16-bit events won't be truncated, so
I think I'd rather just leave that string as-is since it's ABI. I agree
that we probably shouldn't have bothered with the width at all in hindsight.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-05 12:39             ` Will Deacon
  0 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-05 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 05, 2018 at 01:27:08PM +0100, John Garry wrote:
> On 04/10/2018 13:22, Will Deacon wrote:
> >Hi Ganapat,
> >
> >On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> >>can you please pull this patch?
> >
> >I still don't like the idea of just removing events like this, especially
> >when other architectures (including some x86 and Power CPUs afaict) playa
> >similar games for generic events, and these events do actually appear in
> >user code.
> >
> >I also don't understand why you remove the TLB events. I think that logic
> >would imply we should remove all of the events, because we can't distinguish
> >prefetches from reads either. If we want to be consistent, then I think
> >we should just remove the OP_WRITE events for L1D and BPU -- would you be
> >ok with that instead?
> >
> >Also, looking at the code, I think our PMCEID parsing is broken for 8.1
> >parts, where the upper 32 bits of the register are offset by 0x4000 in the
> >event numbering space.
> >
> 
> Here's something I noticed:
> static ssize_t
> armv8pmu_events_sysfs_show(struct device *dev,
> 			   struct device_attribute *attr, char *page)
> {
> 	struct perf_pmu_events_attr *pmu_attr;
> 
> 	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
> 
> 	return sprintf(page, "event=0x%03llx\n", pmu_attr->id);
> 
> Should this be min width now be 4, since event width is now 16 bits (even
> though I don't know why we need to specify this width at all)?

Yeah, that is pretty weird, but the 16-bit events won't be truncated, so
I think I'd rather just leave that string as-is since it's ABI. I agree
that we probably shouldn't have bothered with the width at all in hindsight.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
  2018-10-04 19:39           ` Ganapatrao Kulkarni
@ 2018-10-05 13:34             ` Will Deacon
  -1 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-05 13:34 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland,
	catalin.marinas, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter,
	Vadim.Lomovtsev, Jan.Glauber

On Fri, Oct 05, 2018 at 01:09:33AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> > > can you please pull this patch?
> >
> > I still don't like the idea of just removing events like this, especially
> > when other architectures (including some x86 and Power CPUs afaict) playa
> > similar games for generic events, and these events do actually appear in
> > user code.
> >
> > I also don't understand why you remove the TLB events. I think that logic
> > would imply we should remove all of the events, because we can't distinguish
> > prefetches from reads either. If we want to be consistent, then I think
> > we should just remove the OP_WRITE events for L1D and BPU -- would you be
> > ok with that instead?
> 
> IIUC, dTLB-load-misses is mapped to
> ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is
> mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec,
> counts TLB access/misses for both memory-read operation and
> memory-write operation.
> 
> IMO, It won't help in keeping these events, knowingly that their
> mapping is not accurate, only thing i can say to users is , dont use
> events that are marked as "Hardware cache event"

Right, but my point is that all of the events are inaccurate by this line
of reasoning, because we don't support PREFETCH for any of them. So I'd
rather just drop the duplicate events from the WRITE entries, like other
CPUs and architectures do.

I'm about to pack my desk up because we're moving office, but I pushed out
some patches I hacked up on my perf/updates branch:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=perf/updates

I'll post them to the list next week if it's not the merge window.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
@ 2018-10-05 13:34             ` Will Deacon
  0 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-10-05 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 05, 2018 at 01:09:33AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote:
> > > can you please pull this patch?
> >
> > I still don't like the idea of just removing events like this, especially
> > when other architectures (including some x86 and Power CPUs afaict) playa
> > similar games for generic events, and these events do actually appear in
> > user code.
> >
> > I also don't understand why you remove the TLB events. I think that logic
> > would imply we should remove all of the events, because we can't distinguish
> > prefetches from reads either. If we want to be consistent, then I think
> > we should just remove the OP_WRITE events for L1D and BPU -- would you be
> > ok with that instead?
> 
> IIUC, dTLB-load-misses is mapped to
> ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is
> mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec,
> counts TLB access/misses for both memory-read operation and
> memory-write operation.
> 
> IMO, It won't help in keeping these events, knowingly that their
> mapping is not accurate, only thing i can say to users is , dont use
> events that are marked as "Hardware cache event"

Right, but my point is that all of the events are inaccurate by this line
of reasoning, because we don't support PREFETCH for any of them. So I'd
rather just drop the duplicate events from the WRITE entries, like other
CPUs and architectures do.

I'm about to pack my desk up because we're moving office, but I pushed out
some patches I hacked up on my perf/updates branch:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=perf/updates

I'll post them to the list next week if it's not the merge window.

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-10-05 13:34 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-01 10:07 [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events Kulkarni, Ganapatrao
2018-10-01 10:07 ` Kulkarni, Ganapatrao
2018-10-01 14:29 ` Will Deacon
2018-10-01 14:29   ` Will Deacon
2018-10-01 16:39   ` Ganapatrao Kulkarni
2018-10-01 16:39     ` Ganapatrao Kulkarni
2018-10-04  5:42     ` Ganapatrao Kulkarni
2018-10-04  5:42       ` Ganapatrao Kulkarni
2018-10-04 12:22       ` Will Deacon
2018-10-04 12:22         ` Will Deacon
2018-10-04 19:39         ` Ganapatrao Kulkarni
2018-10-04 19:39           ` Ganapatrao Kulkarni
2018-10-05 13:34           ` Will Deacon
2018-10-05 13:34             ` Will Deacon
2018-10-05 12:27         ` John Garry
2018-10-05 12:27           ` John Garry
2018-10-05 12:39           ` Will Deacon
2018-10-05 12:39             ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.