linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
@ 2021-05-04  6:52 Suravee Suthikulpanit
  2021-05-04  9:39 ` Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Suravee Suthikulpanit @ 2021-05-04  6:52 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, iommu
  Cc: peterz, mingo, joro, Jon.Grimm, amonakov, Suravee Suthikulpanit,
	David Coe

On certain AMD platforms, when the IOMMU performance counter source
(csource) field is zero, power-gating for the counter is enabled, which
prevents write access and returns zero for read access.

This can cause invalid perf result especially when event multiplexing
is needed (i.e. more number of events than available counters) since
the current logic keeps track of the previously read counter value,
and subsequently re-program the counter to continue counting the event.
With power-gating enabled, we cannot gurantee successful re-programming
of the counter.

Workaround this issue by :

1. Modifying the ordering of setting/reading counters and enabing/
   disabling csources to only access the counter when the csource
   is set to non-zero.

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
   can be simplified to always start counting with value zero,
   and accumulate the counter value when stopping without the need
   to keep track and reprogram the counter with the previously read
   counter value.

This has been tested on systems with and without power-gating.

Fixes: 994d6608efe4 ("iommu/amd: Remove performance counter pre-initialization test")
Suggested-by: Alexander Monakov <amonakov@ispras.ru>
Cc: David Coe <david.coe@live.co.uk>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/events/amd/iommu.c | 47 ++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 1c1a7e45dc64..913745f1419b 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -19,8 +19,6 @@
 #include "../perf_event.h"
 #include "iommu.h"
 
-#define COUNTER_SHIFT		16
-
 /* iommu pmu conf masks */
 #define GET_CSOURCE(x)     ((x)->conf & 0xFFULL)
 #define GET_DEVID(x)       (((x)->conf >> 8)  & 0xFFFFULL)
@@ -286,22 +284,31 @@ static void perf_iommu_start(struct perf_event *event, int flags)
 	WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
 	hwc->state = 0;
 
+	/*
+	 * To account for power-gating, which prevents write to
+	 * the counter, we need to enable the counter
+	 * before setting up counter register.
+	 */
+	perf_iommu_enable_event(event);
+
 	if (flags & PERF_EF_RELOAD) {
-		u64 prev_raw_count = local64_read(&hwc->prev_count);
+		u64 count = 0;
 		struct amd_iommu *iommu = perf_event_2_iommu(event);
 
+		/*
+		 * Since the IOMMU PMU only support counting mode,
+		 * the counter always start with value zero.
+		 */
 		amd_iommu_pc_set_reg(iommu, hwc->iommu_bank, hwc->iommu_cntr,
-				     IOMMU_PC_COUNTER_REG, &prev_raw_count);
+				     IOMMU_PC_COUNTER_REG, &count);
 	}
 
-	perf_iommu_enable_event(event);
 	perf_event_update_userpage(event);
-
 }
 
 static void perf_iommu_read(struct perf_event *event)
 {
-	u64 count, prev, delta;
+	u64 count;
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_iommu *iommu = perf_event_2_iommu(event);
 
@@ -312,14 +319,11 @@ static void perf_iommu_read(struct perf_event *event)
 	/* IOMMU pc counter register is only 48 bits */
 	count &= GENMASK_ULL(47, 0);
 
-	prev = local64_read(&hwc->prev_count);
-	if (local64_cmpxchg(&hwc->prev_count, prev, count) != prev)
-		return;
-
-	/* Handle 48-bit counter overflow */
-	delta = (count << COUNTER_SHIFT) - (prev << COUNTER_SHIFT);
-	delta >>= COUNTER_SHIFT;
-	local64_add(delta, &event->count);
+	/*
+	 * Since the counter always start with value zero,
+	 * simply just accumulate the count for the event.
+	 */
+	local64_add(count, &event->count);
 }
 
 static void perf_iommu_stop(struct perf_event *event, int flags)
@@ -329,15 +333,16 @@ static void perf_iommu_stop(struct perf_event *event, int flags)
 	if (hwc->state & PERF_HES_UPTODATE)
 		return;
 
+	/*
+	 * To account for power-gating, in which reading the counter would
+	 * return zero, we need to read the register before disabling.
+	 */
+	perf_iommu_read(event);
+	hwc->state |= PERF_HES_UPTODATE;
+
 	perf_iommu_disable_event(event);
 	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 	hwc->state |= PERF_HES_STOPPED;
-
-	if (hwc->state & PERF_HES_UPTODATE)
-		return;
-
-	perf_iommu_read(event);
-	hwc->state |= PERF_HES_UPTODATE;
 }
 
 static int perf_iommu_add(struct perf_event *event, int flags)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04  6:52 [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating Suravee Suthikulpanit
@ 2021-05-04  9:39 ` Peter Zijlstra
  2021-05-04 11:58   ` Suthikulpanit, Suravee
  2021-05-04 17:04 ` David Coe
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2021-05-04  9:39 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, linux-perf-users, iommu, mingo, joro, Jon.Grimm,
	amonakov, David Coe

On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:

> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>    can be simplified to always start counting with value zero,
>    and accumulate the counter value when stopping without the need
>    to keep track and reprogram the counter with the previously read
>    counter value.

This relies on the hardware counter being the full 64bit wide, is it?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04  9:39 ` Peter Zijlstra
@ 2021-05-04 11:58   ` Suthikulpanit, Suravee
  2021-05-04 12:13     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Suthikulpanit, Suravee @ 2021-05-04 11:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, iommu, mingo, joro, Jon.Grimm,
	amonakov, David Coe

Peter,

On 5/4/2021 4:39 PM, Peter Zijlstra wrote:
> On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:
> 
>> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>>     can be simplified to always start counting with value zero,
>>     and accumulate the counter value when stopping without the need
>>     to keep track and reprogram the counter with the previously read
>>     counter value.
> 
> This relies on the hardware counter being the full 64bit wide, is it?
> 

The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
I might be missing some points here? Could you please describe?

Thanks,
Suravee



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04 11:58   ` Suthikulpanit, Suravee
@ 2021-05-04 12:13     ` Peter Zijlstra
  2021-05-05 12:39       ` Suthikulpanit, Suravee
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2021-05-04 12:13 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: linux-kernel, linux-perf-users, iommu, mingo, joro, Jon.Grimm,
	amonakov, David Coe

On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:
> Peter,
> 
> On 5/4/2021 4:39 PM, Peter Zijlstra wrote:
> > On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:
> > 
> > > 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
> > >     can be simplified to always start counting with value zero,
> > >     and accumulate the counter value when stopping without the need
> > >     to keep track and reprogram the counter with the previously read
> > >     counter value.
> > 
> > This relies on the hardware counter being the full 64bit wide, is it?
> > 
> 
> The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
> I might be missing some points here? Could you please describe?

How do you deal with the 48bit overflow if you don't use the interrupt?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04  6:52 [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating Suravee Suthikulpanit
  2021-05-04  9:39 ` Peter Zijlstra
@ 2021-05-04 17:04 ` David Coe
  2021-05-05 10:24 ` David Coe
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: David Coe @ 2021-05-04 17:04 UTC (permalink / raw)
  To: Suravee Suthikulpanit, linux-kernel, linux-perf-users, iommu
  Cc: peterz, mingo, joro, Jon.Grimm, amonakov

Hi again!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:
> On certain AMD platforms, when the IOMMU performance counter source
> (csource) field is zero, power-gating for the counter is enabled, which
> prevents write access and returns zero for read access.
> 
> This can cause invalid perf result especially when event multiplexing
> is needed (i.e. more number of events than available counters) since
> the current logic keeps track of the previously read counter value,
> and subsequently re-program the counter to continue counting the event.
> With power-gating enabled, we cannot gurantee successful re-programming
> of the counter.
> 
> Workaround this issue by :
> 
> 1. Modifying the ordering of setting/reading counters and enabing/
>     disabling csources to only access the counter when the csource
>     is set to non-zero.
> 
> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>     can be simplified to always start counting with value zero,
>     and accumulate the counter value when stopping without the need
>     to keep track and reprogram the counter with the previously read
>     counter value.
> 

Results for Ryzen 4700U running Ubuntu 21.04 kernel 5.11.0-16 patched as 
above.

All amd_iommu events:

  Performance counter stats for 'system wide':

                 18       amd_iommu_0/cmd_processed/            (33.29%)
                  9       amd_iommu_0/cmd_processed_inv/        (33.33%)
                  0       amd_iommu_0/ign_rd_wr_mmio_1ff8h/     (33.36%)
                308       amd_iommu_0/int_dte_hit/              (33.40%)
                  5       amd_iommu_0/int_dte_mis/              (33.45%)
                346       amd_iommu_0/mem_dte_hit/              (33.46%)
              8,954       amd_iommu_0/mem_dte_mis/              (33.48%)
                  0       amd_iommu_0/mem_iommu_tlb_pde_hit/    (33.46%)
                771       amd_iommu_0/mem_iommu_tlb_pde_mis/    (33.44%)
                 14       amd_iommu_0/mem_iommu_tlb_pte_hit/    (33.40%)
                836       amd_iommu_0/mem_iommu_tlb_pte_mis/    (33.36%)
                  0       amd_iommu_0/mem_pass_excl/            (33.32%)
                  0       amd_iommu_0/mem_pass_pretrans/        (33.28%)
              1,601       amd_iommu_0/mem_pass_untrans/         (33.27%)
                  0       amd_iommu_0/mem_target_abort/         (33.27%)
              1,130       amd_iommu_0/mem_trans_total/          (33.27%)
                  0       amd_iommu_0/page_tbl_read_gst/        (33.27%)
                312       amd_iommu_0/page_tbl_read_nst/        (33.28%)
                279       amd_iommu_0/page_tbl_read_tot/        (33.27%)
                  0       amd_iommu_0/smi_blk/                  (33.29%)
                  0       amd_iommu_0/smi_recv/                 (33.27%)
                  0       amd_iommu_0/tlb_inv/                  (33.26%)
                  0       amd_iommu_0/vapic_int_guest/          (33.25%)
                366       amd_iommu_0/vapic_int_non_guest/      (33.27%)

       10.001941666 seconds time elapsed


Groups of 8 amd_iommu events:

  Performance counter stats for 'system wide':

                 14       amd_iommu_0/cmd_processed/ 

                  7       amd_iommu_0/cmd_processed_inv/ 

                  0       amd_iommu_0/ign_rd_wr_mmio_1ff8h/ 

                502       amd_iommu_0/int_dte_hit/ 

                  6       amd_iommu_0/int_dte_mis/ 

                532       amd_iommu_0/mem_dte_hit/ 

             13,622       amd_iommu_0/mem_dte_mis/ 

                159       amd_iommu_0/mem_iommu_tlb_pde_hit/ 


       10.002170562 seconds time elapsed


  Performance counter stats for 'system wide':

                762       amd_iommu_0/mem_iommu_tlb_pde_mis/ 

                 20       amd_iommu_0/mem_iommu_tlb_pte_hit/ 

                698       amd_iommu_0/mem_iommu_tlb_pte_mis/ 

                  0       amd_iommu_0/mem_pass_excl/ 

                  0       amd_iommu_0/mem_pass_pretrans/ 

                 15       amd_iommu_0/mem_pass_untrans/ 

                  0       amd_iommu_0/mem_target_abort/ 

                718       amd_iommu_0/mem_trans_total/ 


       10.001683428 seconds time elapsed


  Performance counter stats for 'system wide':

                  0       amd_iommu_0/page_tbl_read_gst/ 

                 33       amd_iommu_0/page_tbl_read_nst/ 

                 33       amd_iommu_0/page_tbl_read_tot/ 

                  0       amd_iommu_0/smi_blk/ 

                  0       amd_iommu_0/smi_recv/ 

                  0       amd_iommu_0/tlb_inv/ 

                  0       amd_iommu_0/vapic_int_guest/ 

             11,638       amd_iommu_0/vapic_int_non_guest/ 


       10.002205748 seconds time elapsed

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04  6:52 [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating Suravee Suthikulpanit
  2021-05-04  9:39 ` Peter Zijlstra
  2021-05-04 17:04 ` David Coe
@ 2021-05-05 10:24 ` David Coe
  2021-05-06 13:48 ` [tip: perf/urgent] " tip-bot2 for Suravee Suthikulpanit
  2021-05-14 10:48 ` [PATCH] " David Coe
  4 siblings, 0 replies; 11+ messages in thread
From: David Coe @ 2021-05-05 10:24 UTC (permalink / raw)
  To: Suravee Suthikulpanit, linux-kernel, linux-perf-users, iommu
  Cc: peterz, mingo, joro, Jon.Grimm, amonakov

Hi, once more!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:
> On certain AMD platforms, when the IOMMU performance counter source
> (csource) field is zero, power-gating for the counter is enabled, which
> prevents write access and returns zero for read access.
> 
> This can cause invalid perf result especially when event multiplexing
> is needed (i.e. more number of events than available counters) since
> the current logic keeps track of the previously read counter value,
> and subsequently re-program the counter to continue counting the event.
> With power-gating enabled, we cannot gurantee successful re-programming
> of the counter.
> 
> Workaround this issue by :
> 
> 1. Modifying the ordering of setting/reading counters and enabing/
>     disabling csources to only access the counter when the csource
>     is set to non-zero.
> 
> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>     can be simplified to always start counting with value zero,
>     and accumulate the counter value when stopping without the need
>     to keep track and reprogram the counter with the previously read
>     counter value.
> 
> This has been tested on systems with and without power-gating.
> 

Just as a final, sanity check I've loaded the same patched kernel 
5.11.0-16 on to an old AMD Athlon FX8350. So far, all seems in order: it 
loads IOMMUv1 and runs Ubuntu 21.04 without incident!

Much appreciate all your efforts, Suravee, Alex et al. Best regards.

-- 
David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04 12:13     ` Peter Zijlstra
@ 2021-05-05 12:39       ` Suthikulpanit, Suravee
  2021-05-05 13:05         ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Suthikulpanit, Suravee @ 2021-05-05 12:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, iommu, mingo, joro, Jon.Grimm,
	amonakov, David Coe

Peter,

On 5/4/2021 7:13 PM, Peter Zijlstra wrote:
> On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:
>> Peter,
>>
>> On 5/4/2021 4:39 PM, Peter Zijlstra wrote:
>>> On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:
>>>
>>>> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>>>>      can be simplified to always start counting with value zero,
>>>>      and accumulate the counter value when stopping without the need
>>>>      to keep track and reprogram the counter with the previously read
>>>>      counter value.
>>>
>>> This relies on the hardware counter being the full 64bit wide, is it?
>>>
>>
>> The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
>> I might be missing some points here? Could you please describe?
> 
> How do you deal with the 48bit overflow if you don't use the interrupt?

The IOMMU Perf driver does not currently handle counter overflow since the overflow
notification mechanism (i.e. IOMMU creates an EVENT_COUNTER_ZERO event in the IOMMU event log,
and generate an IOMMU MSI interrupt to signal IOMMU driver to process the event.) is not
currently supported. When counter overflows, the counter becomes zero, and Perf
reports value zero for the event.

Alternatively, to detect overflow, we might start counting with value 1 so that
we can detect overflow when the value becomes zero in which case the Perf driver
could generate error message.

Regards,
Suravee

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-05 12:39       ` Suthikulpanit, Suravee
@ 2021-05-05 13:05         ` Peter Zijlstra
  2021-05-10  2:08           ` Suthikulpanit, Suravee
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2021-05-05 13:05 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: linux-kernel, linux-perf-users, iommu, mingo, joro, Jon.Grimm,
	amonakov, David Coe

On Wed, May 05, 2021 at 07:39:14PM +0700, Suthikulpanit, Suravee wrote:
> Peter,
> 
> On 5/4/2021 7:13 PM, Peter Zijlstra wrote:
> > On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:
> > > Peter,
> > > 
> > > On 5/4/2021 4:39 PM, Peter Zijlstra wrote:
> > > > On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:
> > > > 
> > > > > 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
> > > > >      can be simplified to always start counting with value zero,
> > > > >      and accumulate the counter value when stopping without the need
> > > > >      to keep track and reprogram the counter with the previously read
> > > > >      counter value.
> > > > 
> > > > This relies on the hardware counter being the full 64bit wide, is it?
> > > > 
> > > 
> > > The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
> > > I might be missing some points here? Could you please describe?
> > 
> > How do you deal with the 48bit overflow if you don't use the interrupt?
> 
> The IOMMU Perf driver does not currently handle counter overflow since the overflow
> notification mechanism (i.e. IOMMU creates an EVENT_COUNTER_ZERO event in the IOMMU event log,
> and generate an IOMMU MSI interrupt to signal IOMMU driver to process the event.) is not
> currently supported. When counter overflows, the counter becomes zero, and Perf
> reports value zero for the event.
> 
> Alternatively, to detect overflow, we might start counting with value 1 so that
> we can detect overflow when the value becomes zero in which case the Perf driver
> could generate error message.

Urgh.. the intel uncore driver programs an hrtimer to periodically fold
deltas. That way the counter will never be short.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tip: perf/urgent] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04  6:52 [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating Suravee Suthikulpanit
                   ` (2 preceding siblings ...)
  2021-05-05 10:24 ` David Coe
@ 2021-05-06 13:48 ` tip-bot2 for Suravee Suthikulpanit
  2021-05-14 10:48 ` [PATCH] " David Coe
  4 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Suravee Suthikulpanit @ 2021-05-06 13:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Alexander Monakov, Suravee Suthikulpanit, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     e10de314287c2c14b0e6f0e3e961975ce2f4a83d
Gitweb:        https://git.kernel.org/tip/e10de314287c2c14b0e6f0e3e961975ce2f4a83d
Author:        Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
AuthorDate:    Tue, 04 May 2021 01:52:36 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 06 May 2021 15:33:37 +02:00

x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

On certain AMD platforms, when the IOMMU performance counter source
(csource) field is zero, power-gating for the counter is enabled, which
prevents write access and returns zero for read access.

This can cause invalid perf result especially when event multiplexing
is needed (i.e. more number of events than available counters) since
the current logic keeps track of the previously read counter value,
and subsequently re-program the counter to continue counting the event.
With power-gating enabled, we cannot gurantee successful re-programming
of the counter.

Workaround this issue by :

1. Modifying the ordering of setting/reading counters and enabing/
   disabling csources to only access the counter when the csource
   is set to non-zero.

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
   can be simplified to always start counting with value zero,
   and accumulate the counter value when stopping without the need
   to keep track and reprogram the counter with the previously read
   counter value.

This has been tested on systems with and without power-gating.

Fixes: 994d6608efe4 ("iommu/amd: Remove performance counter pre-initialization test")
Suggested-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210504065236.4415-1-suravee.suthikulpanit@amd.com
---
 arch/x86/events/amd/iommu.c | 47 +++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 6a98a76..2da6139 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -18,8 +18,6 @@
 #include "../perf_event.h"
 #include "iommu.h"
 
-#define COUNTER_SHIFT		16
-
 /* iommu pmu conf masks */
 #define GET_CSOURCE(x)     ((x)->conf & 0xFFULL)
 #define GET_DEVID(x)       (((x)->conf >> 8)  & 0xFFFFULL)
@@ -285,22 +283,31 @@ static void perf_iommu_start(struct perf_event *event, int flags)
 	WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
 	hwc->state = 0;
 
+	/*
+	 * To account for power-gating, which prevents write to
+	 * the counter, we need to enable the counter
+	 * before setting up counter register.
+	 */
+	perf_iommu_enable_event(event);
+
 	if (flags & PERF_EF_RELOAD) {
-		u64 prev_raw_count = local64_read(&hwc->prev_count);
+		u64 count = 0;
 		struct amd_iommu *iommu = perf_event_2_iommu(event);
 
+		/*
+		 * Since the IOMMU PMU only support counting mode,
+		 * the counter always start with value zero.
+		 */
 		amd_iommu_pc_set_reg(iommu, hwc->iommu_bank, hwc->iommu_cntr,
-				     IOMMU_PC_COUNTER_REG, &prev_raw_count);
+				     IOMMU_PC_COUNTER_REG, &count);
 	}
 
-	perf_iommu_enable_event(event);
 	perf_event_update_userpage(event);
-
 }
 
 static void perf_iommu_read(struct perf_event *event)
 {
-	u64 count, prev, delta;
+	u64 count;
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_iommu *iommu = perf_event_2_iommu(event);
 
@@ -311,14 +318,11 @@ static void perf_iommu_read(struct perf_event *event)
 	/* IOMMU pc counter register is only 48 bits */
 	count &= GENMASK_ULL(47, 0);
 
-	prev = local64_read(&hwc->prev_count);
-	if (local64_cmpxchg(&hwc->prev_count, prev, count) != prev)
-		return;
-
-	/* Handle 48-bit counter overflow */
-	delta = (count << COUNTER_SHIFT) - (prev << COUNTER_SHIFT);
-	delta >>= COUNTER_SHIFT;
-	local64_add(delta, &event->count);
+	/*
+	 * Since the counter always start with value zero,
+	 * simply just accumulate the count for the event.
+	 */
+	local64_add(count, &event->count);
 }
 
 static void perf_iommu_stop(struct perf_event *event, int flags)
@@ -328,15 +332,16 @@ static void perf_iommu_stop(struct perf_event *event, int flags)
 	if (hwc->state & PERF_HES_UPTODATE)
 		return;
 
+	/*
+	 * To account for power-gating, in which reading the counter would
+	 * return zero, we need to read the register before disabling.
+	 */
+	perf_iommu_read(event);
+	hwc->state |= PERF_HES_UPTODATE;
+
 	perf_iommu_disable_event(event);
 	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 	hwc->state |= PERF_HES_STOPPED;
-
-	if (hwc->state & PERF_HES_UPTODATE)
-		return;
-
-	perf_iommu_read(event);
-	hwc->state |= PERF_HES_UPTODATE;
 }
 
 static int perf_iommu_add(struct perf_event *event, int flags)

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-05 13:05         ` Peter Zijlstra
@ 2021-05-10  2:08           ` Suthikulpanit, Suravee
  0 siblings, 0 replies; 11+ messages in thread
From: Suthikulpanit, Suravee @ 2021-05-10  2:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, iommu, mingo, joro, Jon.Grimm,
	amonakov, David Coe



On 5/5/2021 8:05 PM, Peter Zijlstra wrote:
> On Wed, May 05, 2021 at 07:39:14PM +0700, Suthikulpanit, Suravee wrote:
>> Peter,
>>
>> On 5/4/2021 7:13 PM, Peter Zijlstra wrote:
>>> On Tue, May 04, 2021 at 06:58:29PM +0700, Suthikulpanit, Suravee wrote:
>>>> Peter,
>>>>
>>>> On 5/4/2021 4:39 PM, Peter Zijlstra wrote:
>>>>> On Tue, May 04, 2021 at 01:52:36AM -0500, Suravee Suthikulpanit wrote:
>>>>>
>>>>>> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>>>>>>       can be simplified to always start counting with value zero,
>>>>>>       and accumulate the counter value when stopping without the need
>>>>>>       to keep track and reprogram the counter with the previously read
>>>>>>       counter value.
>>>>>
>>>>> This relies on the hardware counter being the full 64bit wide, is it?
>>>>>
>>>>
>>>> The HW counter value is 48-bit. Not sure why it needs to be 64-bit?
>>>> I might be missing some points here? Could you please describe?
>>>
>>> How do you deal with the 48bit overflow if you don't use the interrupt?
>>
>> The IOMMU Perf driver does not currently handle counter overflow since the overflow
>> notification mechanism (i.e. IOMMU creates an EVENT_COUNTER_ZERO event in the IOMMU event log,
>> and generate an IOMMU MSI interrupt to signal IOMMU driver to process the event.) is not
>> currently supported. When counter overflows, the counter becomes zero, and Perf
>> reports value zero for the event.
>>
>> Alternatively, to detect overflow, we might start counting with value 1 so that
>> we can detect overflow when the value becomes zero in which case the Perf driver
>> could generate error message.
> 
> Urgh.. the intel uncore driver programs an hrtimer to periodically fold
> deltas. That way the counter will never be short.
> 

Thanks for the info. I'll look into ways to detect and handle counter overflow for this.
So far, with the current power-gating, it has several restrictions regarding when
the HW counter can be accessed, which makes it more difficult to handle this.

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating
  2021-05-04  6:52 [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating Suravee Suthikulpanit
                   ` (3 preceding siblings ...)
  2021-05-06 13:48 ` [tip: perf/urgent] " tip-bot2 for Suravee Suthikulpanit
@ 2021-05-14 10:48 ` David Coe
  4 siblings, 0 replies; 11+ messages in thread
From: David Coe @ 2021-05-14 10:48 UTC (permalink / raw)
  To: Suravee Suthikulpanit, linux-kernel, linux-perf-users, iommu
  Cc: peterz, mingo, joro, Jon.Grimm, amonakov

Hi all!

On 04/05/2021 07:52, Suravee Suthikulpanit wrote:
> On certain AMD platforms, when the IOMMU performance counter source
> (csource) field is zero, power-gating for the counter is enabled, which
> prevents write access and returns zero for read access.
> 
> This can cause invalid perf result especially when event multiplexing
> is needed (i.e. more number of events than available counters) since
> the current logic keeps track of the previously read counter value,
> and subsequently re-program the counter to continue counting the event.
> With power-gating enabled, we cannot gurantee successful re-programming
> of the counter.
> 
> Workaround this issue by :
> 
> 1. Modifying the ordering of setting/reading counters and enabing/
>     disabling csources to only access the counter when the csource
>     is set to non-zero.
> 
> 2. Since AMD IOMMU PMU does not support interrupt mode, the logic
>     can be simplified to always start counting with value zero,
>     and accumulate the counter value when stopping without the need
>     to keep track and reprogram the counter with the previously read
>     counter value.
> 
> This has been tested on systems with and without power-gating.

I've just noticed kernel-5.13-rc1 includes your full iommu enchilada. A 
quick test with Ubuntu's mainline ppa debs (and a home-spun perf)gives 
on a Ryzen 2400G what seem very satisfactory results. Bravo!

  Performance counter stats for 'system wide':

                  0       amd_iommu_0/cmd_processed/           (33.32%)
                  0       amd_iommu_0/cmd_processed_inv/       (33.34%)
                  0       amd_iommu_0/ign_rd_wr_mmio_1ff8h/    (33.38%)
                615       amd_iommu_0/int_dte_hit/             (33.44%)
                  5       amd_iommu_0/int_dte_mis/             (33.44%)
              1,347       amd_iommu_0/mem_dte_hit/             (33.46%)
             19,127       amd_iommu_0/mem_dte_mis/             (33.44%)
                 71       amd_iommu_0/mem_iommu_tlb_pde_hit/   (33.43%)
                754       amd_iommu_0/mem_iommu_tlb_pde_mis/   (33.41%)
              1,777       amd_iommu_0/mem_iommu_tlb_pte_hit/   (33.36%)
             20,163       amd_iommu_0/mem_iommu_tlb_pte_mis/   (33.32%)
                  0       amd_iommu_0/mem_pass_excl/           (33.25%)
                  0       amd_iommu_0/mem_pass_pretrans/       (33.28%)
             27,283       amd_iommu_0/mem_pass_untrans/        (33.27%)
                  0       amd_iommu_0/mem_target_abort/        (33.29%)
                645       amd_iommu_0/mem_trans_total/         (33.32%)
                  0       amd_iommu_0/page_tbl_read_gst/       (33.28%)
                183       amd_iommu_0/page_tbl_read_nst/       (33.30%)
                 45       amd_iommu_0/page_tbl_read_tot/       (33.30%)
                  0       amd_iommu_0/smi_blk/                 (33.32%)
                  0       amd_iommu_0/smi_recv/                (33.28%)
                  0       amd_iommu_0/tlb_inv/                 (33.27%)
                  0       amd_iommu_0/vapic_int_guest/         (33.28%)
                613       amd_iommu_0/vapic_int_non_guest/     (33.26%)

        9.998673791 seconds time elapsed

Running Windows 10 & etc under QEMU/KVM produces nothing untoward. 
Again, congratulations and many thanks.

-- 
David

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-14 10:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-04  6:52 [PATCH] x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating Suravee Suthikulpanit
2021-05-04  9:39 ` Peter Zijlstra
2021-05-04 11:58   ` Suthikulpanit, Suravee
2021-05-04 12:13     ` Peter Zijlstra
2021-05-05 12:39       ` Suthikulpanit, Suravee
2021-05-05 13:05         ` Peter Zijlstra
2021-05-10  2:08           ` Suthikulpanit, Suravee
2021-05-04 17:04 ` David Coe
2021-05-05 10:24 ` David Coe
2021-05-06 13:48 ` [tip: perf/urgent] " tip-bot2 for Suravee Suthikulpanit
2021-05-14 10:48 ` [PATCH] " David Coe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).