kvm.vger.kernel.org archive mirror
* [PATCH] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292
@ 2022-01-17  5:57 Like Xu
  2022-02-02  4:28 ` [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Like Xu @ 2022-01-17  5:57 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: Ananth Narayan, Sean Christopherson, Wanpeng Li,
	Vitaly Kuznetsov, Joerg Roedel, x86, kvm, linux-kernel

From: Like Xu <likexu@tencent.com>

AMD Family 19h Models 00h-0Fh processors may experience sampling
inaccuracies that cause performance counters to overcount some
retire-based events. To count the affected non-FP PMC events correctly,
a patched guest with a matching vCPU model would:

    - Use Core::X86::Msr::PERF_CTL2 to count the events, and
    - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
    - Program Core::X86::Msr::PERF_CTL2[20] to 0b.

To support this use by AMD guests, KVM should stop treating bit 43 as
reserved, but only for counter #2. All other cases are handled as before.

Note, the host's perf subsystem will decide which hardware counter
will be used for the guest counter, based on its own physical CPU
model and its own workaround(s) in the host perf context.

Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/svm/pmu.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 12d8b301065a..1111b12adcca 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -18,6 +18,17 @@
 #include "pmu.h"
 #include "svm.h"
 
+/*
+ * As a workaround of "Retire Based Events May Overcount" for erratum 1292,
+ * some patched guests may set PERF_CTL2[43] to 1b and PERF_CTL2[20] to 0b
+ * to count the non-FP affected PMC events correctly.
+ */
+static inline bool vcpu_overcount_retire_events(struct kvm_vcpu *vcpu)
+{
+	return guest_cpuid_family(vcpu) == 0x19 &&
+		guest_cpuid_model(vcpu) < 0x10;
+}
+
 enum pmu_type {
 	PMU_TYPE_COUNTER = 0,
 	PMU_TYPE_EVNTSEL,
@@ -252,6 +263,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	struct kvm_pmc *pmc;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u64 reserved_bits;
 
 	/* MSR_PERFCTRn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
@@ -264,7 +276,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	if (pmc) {
 		if (data == pmc->eventsel)
 			return 0;
-		if (!(data & pmu->reserved_bits)) {
+		reserved_bits = pmu->reserved_bits;
+		if (pmc->idx == 2 && vcpu_overcount_retire_events(vcpu))
+			reserved_bits &= ~BIT_ULL(43);
+		if (!(data & reserved_bits)) {
 			reprogram_gp_counter(pmc, data);
 			return 0;
 		}
-- 
2.33.1
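
For readers following the hunk above, the reserved-bit check can be
sketched in isolation. This is a standalone user-space illustration, not
the KVM code: eventsel_write_allowed(), its parameters, and the reserved
mask passed in by the caller are assumptions made up for the example.
Only the family/model test, counter index 2, and bit 43 come from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Guest CPUs covered by erratum 1292: Family 19h, Models 00h-0Fh. */
static bool guest_overcounts_retire_events(unsigned int family,
					   unsigned int model)
{
	return family == 0x19 && model < 0x10;
}

/*
 * Hypothetical stand-in for the check in amd_pmu_set_msr(): a write is
 * accepted only if it sets no reserved bit. Bit 43 is dropped from the
 * reserved mask for counter 2 on affected guest models, and nowhere else.
 */
static bool eventsel_write_allowed(uint64_t data, uint64_t reserved_bits,
				   unsigned int counter_idx,
				   unsigned int family, unsigned int model)
{
	if (counter_idx == 2 && guest_overcounts_retire_events(family, model))
		reserved_bits &= ~BIT_ULL(43);

	return !(data & reserved_bits);
}
```

With bit 43 in the reserved mask, a write that sets bit 43 is accepted
only for counter 2 on a Family 19h Model 00h-0Fh guest. The same write
to any other counter, or on any other guest model, is still rejected.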


* [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh
  2022-01-17  5:57 [PATCH] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292 Like Xu
@ 2022-02-02  4:28 ` Ravi Bangoria
  2022-02-02  5:27   ` Stephane Eranian
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-02  4:28 UTC (permalink / raw)
  To: like.xu.linux, jmattson
  Cc: ravi.bangoria, santosh.shukla, pbonzini, eranian, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

Perf counters may overcount a list of retire-based events. Implement
the workaround for Zen3 Family 19h Model 00h-0Fh processors as suggested
in the Revision Guide[1]:

  To count the non-FP affected PMC events correctly:
    o Use Core::X86::Msr::PERF_CTL2 to count the events, and
    o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
    o Program Core::X86::Msr::PERF_CTL2[20] to 0b.

The workaround above suggests clearing PERF_CTL2[20], but that would
disable sampling mode. Given that there is already a skew between the
actual counter overflow and the PMI hit, sampling events do not get an
accurate count anyway. Also, using PMC2 with both bit 43 and bit 20
set can result in additional issues. Hence the Linux implementation of
the workaround uses non-PMC2 counters for sampling events.

Although the issue exists on all previous Zen revisions, the workaround
is different and thus not included in this patch.

This patch needs Like's patch[2] to work in KVM guests.

[1] https://bugzilla.kernel.org/attachment.cgi?id=298241
[2] https://lore.kernel.org/lkml/20220117055703.52020-1-likexu@tencent.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/core.c | 75 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9687a8aef01c..e2f172e75ce8 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -874,8 +874,78 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, int idx,
 	}
 }
 
+/* Errata 1292: Overcounting of Retire Based Events */
+static struct event_constraint retire_event_count_constraints[] __read_mostly = {
+	EVENT_CONSTRAINT(0xC0, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC1, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC2, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC3, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC4, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC5, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC8, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC9, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xCA, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xCC, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xD1, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000C7, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000D0, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT_END
+};
+
+#define SAMPLE_IDX_MASK	(((1ULL << AMD64_NUM_COUNTERS_CORE) - 1) & ~0x4ULL)
+
+static struct event_constraint retire_event_sample_constraints[] __read_mostly = {
+	EVENT_CONSTRAINT(0xC0, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC0, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC1, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC2, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC3, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC4, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC5, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC8, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC9, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xCA, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xCC, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xD1, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000C7, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000D0, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT_END
+};
+
 static struct event_constraint pair_constraint;
 
+/*
+ * Although 'Overcounting of Retire Based Events' errata exists
+ * for older generation cpus, workaround to set bit 43 works only
+ * for Family 19h Model 00-0Fh as per the Revision Guide.
+ */
+static struct event_constraint *
+amd_get_event_constraints_f19h_m00_0fh(struct cpu_hw_events *cpuc, int idx,
+				       struct perf_event *event)
+{
+	struct event_constraint *c;
+
+	if (amd_is_pair_event_code(&event->hw))
+		return &pair_constraint;
+
+	if (is_sampling_event(event)) {
+		for_each_event_constraint(c, retire_event_sample_constraints) {
+			if (constraint_match(c, event->hw.config))
+				return c;
+		}
+	} else {
+		for_each_event_constraint(c, retire_event_count_constraints) {
+			if (constraint_match(c, event->hw.config)) {
+				event->hw.config |= (1ULL << 43);
+				event->hw.config &= ~(1ULL << 20);
+				return c;
+			}
+		}
+	}
+
+	return &unconstrained;
+}
+
 static struct event_constraint *
 amd_get_event_constraints_f17h(struct cpu_hw_events *cpuc, int idx,
 			       struct perf_event *event)
@@ -983,7 +1053,10 @@ static int __init amd_core_pmu_init(void)
 				    x86_pmu.num_counters / 2, 0,
 				    PERF_X86_EVENT_PAIR);
 
-		x86_pmu.get_event_constraints = amd_get_event_constraints_f17h;
+		if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xf)
+			x86_pmu.get_event_constraints = amd_get_event_constraints_f19h_m00_0fh;
+		else
+			x86_pmu.get_event_constraints = amd_get_event_constraints_f17h;
 		x86_pmu.put_event_constraints = amd_put_event_constraints_f17h;
 		x86_pmu.perf_ctr_pair_en = AMD_MERGE_EVENT_ENABLE;
 		x86_pmu.flags |= PMU_FL_PAIR;
-- 
2.27.0
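
The two counter masks used by the constraint tables above can be checked
by hand. A minimal sketch, assuming six core counters (the value I
believe the kernel headers give AMD64_NUM_COUNTERS_CORE);
NUM_COUNTERS_CORE, COUNT_IDX_MASK, and counter_allowed() are names
invented for this example.

```c
#include <stdint.h>

/* Assumption: six core PMCs, matching AMD64_NUM_COUNTERS_CORE. */
#define NUM_COUNTERS_CORE 6

/* Counting retire events: only counter 2 (bit 2 set). */
#define COUNT_IDX_MASK   0x4ULL

/* Sampling retire events: every core counter except counter 2. */
#define SAMPLE_IDX_MASK  (((1ULL << NUM_COUNTERS_CORE) - 1) & ~0x4ULL)

/* Nonzero if counter 'idx' is permitted by the index mask. */
static int counter_allowed(uint64_t idxmsk, unsigned int idx)
{
	return (int)((idxmsk >> idx) & 1);
}
```

Under that assumption SAMPLE_IDX_MASK evaluates to 0x3b, i.e. counters
0, 1, 3, 4 and 5, while the counting tables pin events to counter 2.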


* Re: [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh
  2022-02-02  4:28 ` [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh Ravi Bangoria
@ 2022-02-02  5:27   ` Stephane Eranian
  2022-02-02  6:02     ` Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Stephane Eranian @ 2022-02-02  5:27 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, jmattson, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Tue, Feb 1, 2022 at 8:29 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Perf counter may overcount for a list of Retire Based Events. Implement
> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> Revision Guide[1]:
>
>   To count the non-FP affected PMC events correctly:
>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>
> Above workaround suggests to clear PERF_CTL2[20], but that will disable
> sampling mode. Given the fact that, there is already a skew between
> actual counter overflow vs PMI hit, we are anyway not getting accurate
> count for sampling events. Also, using PMC2 with both bit43 and bit20
> set can result in additional issues. Hence Linux implementation of
> workaround uses non-PMC2 counter for sampling events.
>
Something is missing from your description here. If you are not
clearing bit[20] and not setting bit[43], then how does running on CTL2
by itself improve the count? Is that enough to make the counter count
correctly?

For sampling events, your patch makes CTL2 unavailable. That seems to
contradict the workaround. Are you doing this to free CTL2 for
counting-mode events instead? If you are not using CTL2, then you are
not correcting the count. Are you saying this is okay in sampling mode
because of the skid, anyway?

> Although the issue exists on all previous Zen revisions, the workaround
> is different and thus not included in this patch.
>
> This patch needs Like's patch[2] to make it work on kvm guest.
>
> [1] https://bugzilla.kernel.org/attachment.cgi?id=298241
> [2] https://lore.kernel.org/lkml/20220117055703.52020-1-likexu@tencent.com
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
>  arch/x86/events/amd/core.c | 75 +++++++++++++++++++++++++++++++++++++-
>  1 file changed, 74 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index 9687a8aef01c..e2f172e75ce8 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -874,8 +874,78 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, int idx,
>         }
>  }
>
> +/* Errata 1292: Overcounting of Retire Based Events */
> +static struct event_constraint retire_event_count_constraints[] __read_mostly = {
> +       EVENT_CONSTRAINT(0xC0, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC1, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC2, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC3, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC4, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC5, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC8, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC9, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xCA, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xCC, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xD1, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0x1000000C7, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0x1000000D0, 0x4, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT_END
> +};
> +
> +#define SAMPLE_IDX_MASK        (((1ULL << AMD64_NUM_COUNTERS_CORE) - 1) & ~0x4ULL)
> +
> +static struct event_constraint retire_event_sample_constraints[] __read_mostly = {
> +       EVENT_CONSTRAINT(0xC0, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC0, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC1, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC2, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC3, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC4, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC5, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC8, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xC9, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xCA, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xCC, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0xD1, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0x1000000C7, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT(0x1000000D0, SAMPLE_IDX_MASK, AMD64_EVENTSEL_EVENT),
> +       EVENT_CONSTRAINT_END
> +};
> +
>  static struct event_constraint pair_constraint;
>
> +/*
> + * Although 'Overcounting of Retire Based Events' errata exists
> + * for older generation cpus, workaround to set bit 43 works only
> + * for Family 19h Model 00-0Fh as per the Revision Guide.
> + */
> +static struct event_constraint *
> +amd_get_event_constraints_f19h_m00_0fh(struct cpu_hw_events *cpuc, int idx,
> +                                      struct perf_event *event)
> +{
> +       struct event_constraint *c;
> +
> +       if (amd_is_pair_event_code(&event->hw))
> +               return &pair_constraint;
> +
> +       if (is_sampling_event(event)) {
> +               for_each_event_constraint(c, retire_event_sample_constraints) {
> +                       if (constraint_match(c, event->hw.config))
> +                               return c;
> +               }
> +       } else {
> +               for_each_event_constraint(c, retire_event_count_constraints) {
> +                       if (constraint_match(c, event->hw.config)) {
> +                               event->hw.config |= (1ULL << 43);
> +                               event->hw.config &= ~(1ULL << 20);
> +                               return c;
> +                       }
> +               }
> +       }
> +
> +       return &unconstrained;
> +}
> +
>  static struct event_constraint *
>  amd_get_event_constraints_f17h(struct cpu_hw_events *cpuc, int idx,
>                                struct perf_event *event)
> @@ -983,7 +1053,10 @@ static int __init amd_core_pmu_init(void)
>                                     x86_pmu.num_counters / 2, 0,
>                                     PERF_X86_EVENT_PAIR);
>
> -               x86_pmu.get_event_constraints = amd_get_event_constraints_f17h;
> +               if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xf)
> +                       x86_pmu.get_event_constraints = amd_get_event_constraints_f19h_m00_0fh;
> +               else
> +                       x86_pmu.get_event_constraints = amd_get_event_constraints_f17h;
>                 x86_pmu.put_event_constraints = amd_put_event_constraints_f17h;
>                 x86_pmu.perf_ctr_pair_en = AMD_MERGE_EVENT_ENABLE;
>                 x86_pmu.flags |= PMU_FL_PAIR;
> --
> 2.27.0
>

* Re: [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh
  2022-02-02  5:27   ` Stephane Eranian
@ 2022-02-02  6:02     ` Ravi Bangoria
  2022-02-02  6:16       ` Stephane Eranian
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-02  6:02 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: like.xu.linux, jmattson, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips, Ravi Bangoria

Hi Stephane,

On 02-Feb-22 10:57 AM, Stephane Eranian wrote:
> On Tue, Feb 1, 2022 at 8:29 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> Perf counter may overcount for a list of Retire Based Events. Implement
>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>> Revision Guide[1]:
>>
>>   To count the non-FP affected PMC events correctly:
>>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>
>> Above workaround suggests to clear PERF_CTL2[20], but that will disable
>> sampling mode. Given the fact that, there is already a skew between
>> actual counter overflow vs PMI hit, we are anyway not getting accurate
>> count for sampling events. Also, using PMC2 with both bit43 and bit20
>> set can result in additional issues. Hence Linux implementation of
>> workaround uses non-PMC2 counter for sampling events.
>>
> Something is missing from your description here. If you are not
> clearing bit[20] and
> not setting bit[43], then how does running on CTL2 by itself improve
> the count. Is that
> enough to make the counter count correctly?

Yes. For counting retire based events, we need PMC2[43] set and
PMC2[20] clear so that it will not overcount.

> 
> For sampling events, your patch makes CTL2 not available. That seems
> to contradict the
> workaround. Are you doing this to free CTL2 for counting mode events
> instead? If you are
> not using CTL2, then you are not correcting the count. Are you saying
> this is okay in sampling mode
> because of the skid, anyway?

Correct. The constraint I am placing is to count retire events on
PMC2 and sample retire events on other counters.

Thanks,
Ravi

* Re: [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh
  2022-02-02  6:02     ` Ravi Bangoria
@ 2022-02-02  6:16       ` Stephane Eranian
  2022-02-02  6:32         ` Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Stephane Eranian @ 2022-02-02  6:16 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, jmattson, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Tue, Feb 1, 2022 at 10:03 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Hi Stephane,
>
> On 02-Feb-22 10:57 AM, Stephane Eranian wrote:
> > On Tue, Feb 1, 2022 at 8:29 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >> Perf counter may overcount for a list of Retire Based Events. Implement
> >> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> >> Revision Guide[1]:
> >>
> >>   To count the non-FP affected PMC events correctly:
> >>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>
> >> Above workaround suggests to clear PERF_CTL2[20], but that will disable
> >> sampling mode. Given the fact that, there is already a skew between
> >> actual counter overflow vs PMI hit, we are anyway not getting accurate
> >> count for sampling events. Also, using PMC2 with both bit43 and bit20
> >> set can result in additional issues. Hence Linux implementation of
> >> workaround uses non-PMC2 counter for sampling events.
> >>
> > Something is missing from your description here. If you are not
> > clearing bit[20] and
> > not setting bit[43], then how does running on CTL2 by itself improve
> > the count. Is that
> > enough to make the counter count correctly?
>
> Yes. For counting retire based events, we need PMC2[43] set and
> PMC2[20] clear so that it will not overcount.
>
Ok, I get that part now. You are forcing the bits in the
get_constraint() function.

> >
> > For sampling events, your patch makes CTL2 not available. That seems
> > to contradict the
> > workaround. Are you doing this to free CTL2 for counting mode events
> > instead? If you are
> > not using CTL2, then you are not correcting the count. Are you saying
> > this is okay in sampling mode
> > because of the skid, anyway?
>
> Correct. The constraint I am placing is to count retire events on
> PMC2 and sample retire events on other counters.
>
Why do you need to permanently exclude CTL2 for retired events, given
that you are forcing the bits in get_constraints() for the counting
events' config only, i.e., as opposed to in CTL2 itself? If the sampling
retired events are unconstrained, they can use any counter. If a
counting retired event is added, it has a "stronger" constraint and
will be scheduled before the unconstrained events, yielding the same
behavior you wanted, except on demand, which is preferable.

> Thanks,
> Ravi
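
The scheduling argument above can be illustrated with a toy model. This
is only a sketch of the idea (place the most constrained events first),
not the x86 perf scheduler, which is more elaborate. struct ev,
weight(), schedule(), and the six-counter assumption are all
hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_COUNTERS 6	/* assumed number of core PMCs */

struct ev {
	uint64_t idxmsk;	/* bit n set: counter n may host this event */
	int assigned;		/* counter chosen by schedule(), or -1 */
};

/* Number of counters an event may use; lower means more constrained. */
static int weight(uint64_t m)
{
	int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

static int by_weight(const void *a, const void *b)
{
	return weight(((const struct ev *)a)->idxmsk) -
	       weight(((const struct ev *)b)->idxmsk);
}

/*
 * Sort events so the most constrained come first, then give each one
 * the lowest free counter its mask allows. Returns the number placed.
 * Note that the array is reordered in place.
 */
static int schedule(struct ev *evs, int n)
{
	uint64_t used = 0;
	int placed = 0;

	qsort(evs, (size_t)n, sizeof(*evs), by_weight);
	for (int i = 0; i < n; i++) {
		evs[i].assigned = -1;
		for (int c = 0; c < NUM_COUNTERS; c++) {
			if (((evs[i].idxmsk >> c) & 1) && !((used >> c) & 1)) {
				used |= 1ULL << c;
				evs[i].assigned = c;
				placed++;
				break;
			}
		}
	}
	return placed;
}
```

In this model a counting retire event (mask 0x4) is placed on counter 2
before any unconstrained sampling event (mask 0x3f) is considered, so
the sampling events only give up counter 2 when a counting event
actually needs it.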

* Re: [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh
  2022-02-02  6:16       ` Stephane Eranian
@ 2022-02-02  6:32         ` Ravi Bangoria
  2022-02-02 10:51           ` [PATCH v2] perf/amd: Implement erratum " Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-02  6:32 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: like.xu.linux, jmattson, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips, Ravi Bangoria

Hi Stephane,

On 02-Feb-22 11:46 AM, Stephane Eranian wrote:
> On Tue, Feb 1, 2022 at 10:03 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> Hi Stephane,
>>
>> On 02-Feb-22 10:57 AM, Stephane Eranian wrote:
>>> On Tue, Feb 1, 2022 at 8:29 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>> Perf counter may overcount for a list of Retire Based Events. Implement
>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>>>> Revision Guide[1]:
>>>>
>>>>   To count the non-FP affected PMC events correctly:
>>>>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>>>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>>>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>>>
>>>> Above workaround suggests to clear PERF_CTL2[20], but that will disable
>>>> sampling mode. Given the fact that, there is already a skew between
>>>> actual counter overflow vs PMI hit, we are anyway not getting accurate
>>>> count for sampling events. Also, using PMC2 with both bit43 and bit20
>>>> set can result in additional issues. Hence Linux implementation of
>>>> workaround uses non-PMC2 counter for sampling events.
>>>>
>>> Something is missing from your description here. If you are not
>>> clearing bit[20] and
>>> not setting bit[43], then how does running on CTL2 by itself improve
>>> the count. Is that
>>> enough to make the counter count correctly?
>>
>> Yes. For counting retire based events, we need PMC2[43] set and
>> PMC2[20] clear so that it will not overcount.
>>
> Ok, I get that part now. You are forcing the bits in the
> get_constraint() function.
> 
>>>
>>> For sampling events, your patch makes CTL2 not available. That seems
>>> to contradict the
>>> workaround. Are you doing this to free CTL2 for counting mode events
>>> instead? If you are
>>> not using CTL2, then you are not correcting the count. Are you saying
>>> this is okay in sampling mode
>>> because of the skid, anyway?
>>
>> Correct. The constraint I am placing is to count retire events on
>> PMC2 and sample retire events on other counters.
>>
> Why do you need to permanently exclude CTL2 for retired events given
> you are forcing the bits
> in the get_constraints() for counting events config only, i.e., as
> opposed to in CTL2 itself.
> If the sampling retired events are unconstrained, they can use any
> counters. If a counting retired
> event is added, it has a "stronger" constraints and will be scheduled
> before the unconstrained events,
> yield the same behavior you wanted, except on demand which is preferable.

Got it. Let me respin.

Thanks,
Ravi

* [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-02  6:32         ` Ravi Bangoria
@ 2022-02-02 10:51           ` Ravi Bangoria
  2022-02-02 14:36             ` Peter Zijlstra
  2022-02-03  4:09             ` [PATCH v2] " Jim Mattson
  0 siblings, 2 replies; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-02 10:51 UTC (permalink / raw)
  To: like.xu.linux, jmattson, eranian
  Cc: ravi.bangoria, santosh.shukla, pbonzini, seanjc, wanpengli,
	vkuznets, joro, peterz, mingo, alexander.shishkin, tglx, bp,
	dave.hansen, hpa, kvm, x86, linux-perf-users, ananth.narayan,
	kim.phillips

Perf counters may overcount a list of retire-based events. Implement
the workaround for Zen3 Family 19h Model 00h-0Fh processors as suggested
in the Revision Guide[1]:

  To count the non-FP affected PMC events correctly:
    o Use Core::X86::Msr::PERF_CTL2 to count the events, and
    o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
    o Program Core::X86::Msr::PERF_CTL2[20] to 0b.

Note that the specified workaround applies only to counting events and
not to sampling events. Thus sampling events will continue to function
as is.

Although the issue exists on all previous Zen revisions, the workaround
is different and thus not included in this patch.

This patch needs Like's patch[2] to work in KVM guests.

[1] https://bugzilla.kernel.org/attachment.cgi?id=298241
[2] https://lore.kernel.org/lkml/20220117055703.52020-1-likexu@tencent.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
v1: https://lore.kernel.org/r/20220202042838.6532-1-ravi.bangoria@amd.com
v1->v2:
- Don't put any constraint on sampling events
- s/errata/erratum/


 arch/x86/events/amd/core.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9687a8aef01c..d4dc5ff35366 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -874,6 +874,24 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, int idx,
 	}
 }
 
+/* Overcounting of Retire Based Events Erratum */
+static struct event_constraint retire_event_constraints[] __read_mostly = {
+	EVENT_CONSTRAINT(0xC0, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC1, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC2, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC3, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC4, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC5, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC8, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xC9, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xCA, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xCC, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xD1, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000C7, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000D0, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT_END
+};
+
 static struct event_constraint pair_constraint;
 
 static struct event_constraint *
@@ -881,10 +899,30 @@ amd_get_event_constraints_f17h(struct cpu_hw_events *cpuc, int idx,
 			       struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
+	struct event_constraint *c;
 
 	if (amd_is_pair_event_code(hwc))
 		return &pair_constraint;
 
+	/*
+	 * Although 'Overcounting of Retire Based Events' erratum exists
+	 * for older generation cpus, workaround to set bit 43 works only
+	 * for Family 19h Model 00-0Fh as per the Revision Guide.
+	 */
+	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xf) {
+		if (is_sampling_event(event))
+			goto out;
+
+		for_each_event_constraint(c, retire_event_constraints) {
+			if (constraint_match(c, event->hw.config)) {
+				event->hw.config |= (1ULL << 43);
+				event->hw.config &= ~(1ULL << 20);
+				return c;
+			}
+		}
+	}
+
+out:
 	return &unconstrained;
 }
 
-- 
2.27.0
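
The effect of the v2 hunk on an event's config can be sketched on its
own. A minimal user-space illustration, assuming the caller has already
decided whether the event is in the affected list; fixup_config() and
the macro names are invented for the example, and bit 20 is taken to be
the interrupt-enable bit, as the discussion of sampling mode implies.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVENTSEL_BIT20	(1ULL << 20)	/* PERF_CTL[20]: interrupt enable */
#define EVENTSEL_BIT43	(1ULL << 43)	/* PERF_CTL[43]: set by the workaround */

/*
 * Hypothetical helper: return the config to program for an event.
 * Only affected events in counting mode are modified. Sampling events
 * and unaffected events pass through untouched.
 */
static uint64_t fixup_config(uint64_t config, bool affected_event,
			     bool sampling)
{
	if (!affected_event || sampling)
		return config;

	config |= EVENTSEL_BIT43;
	config &= ~EVENTSEL_BIT20;
	return config;
}
```

A counting event with code 0xC0 ends up with bit 43 set and bit 20
clear. The same event in sampling mode keeps bit 20, so its overflow
interrupt still fires.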


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-02 10:51           ` [PATCH v2] perf/amd: Implement erratum " Ravi Bangoria
@ 2022-02-02 14:36             ` Peter Zijlstra
  2022-02-02 15:32               ` Ravi Bangoria
  2022-02-03  4:09             ` [PATCH v2] " Jim Mattson
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-02-02 14:36 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, jmattson, eranian, santosh.shukla, pbonzini,
	seanjc, wanpengli, vkuznets, joro, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Wed, Feb 02, 2022 at 04:21:58PM +0530, Ravi Bangoria wrote:
> +/* Overcounting of Retire Based Events Erratum */
> +static struct event_constraint retire_event_constraints[] __read_mostly = {
> +	EVENT_CONSTRAINT(0xC0, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xC1, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xC2, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xC3, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xC4, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xC5, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xC8, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xC9, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xCA, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xCC, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0xD1, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0x1000000C7, 0x4, AMD64_EVENTSEL_EVENT),
> +	EVENT_CONSTRAINT(0x1000000D0, 0x4, AMD64_EVENTSEL_EVENT),

Can't this be encoded nicer? Something like:

	EVENT_CONSTRAINT(0xC0, 0x4, AMD64_EVENTSEL_EVENT & ~0xF).

To match all of 0xCn ?


> +	EVENT_CONSTRAINT_END
> +};

* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-02 14:36             ` Peter Zijlstra
@ 2022-02-02 15:32               ` Ravi Bangoria
  2022-02-03  9:58                 ` [PATCH v3] " Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-02 15:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: like.xu.linux, jmattson, eranian, santosh.shukla, pbonzini,
	seanjc, wanpengli, vkuznets, joro, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips, Ravi Bangoria

Hi Peter,

On 02-Feb-22 8:06 PM, Peter Zijlstra wrote:
> On Wed, Feb 02, 2022 at 04:21:58PM +0530, Ravi Bangoria wrote:
>> +/* Overcounting of Retire Based Events Erratum */
>> +static struct event_constraint retire_event_constraints[] __read_mostly = {
>> +	EVENT_CONSTRAINT(0xC0, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xC1, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xC2, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xC3, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xC4, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xC5, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xC8, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xC9, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xCA, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xCC, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0xD1, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0x1000000C7, 0x4, AMD64_EVENTSEL_EVENT),
>> +	EVENT_CONSTRAINT(0x1000000D0, 0x4, AMD64_EVENTSEL_EVENT),
> 
> Can't this be encoded nicer? Something like:
> 
> 	EVENT_CONSTRAINT(0xC0, 0x4, AMD64_EVENTSEL_EVENT & ~0xF).
> 
> To match all of 0xCn ?

I don't think so, since not all 0xCn events are constrained.

But I can probably use EVENT_CONSTRAINT_RANGE() for contiguous event
codes:

	EVENT_CONSTRAINT_RANGE(0xC0, 0xC5, 0x4, AMD64_EVENTSEL_EVENT),
	EVENT_CONSTRAINT_RANGE(0xC8, 0xCA, 0x4, AMD64_EVENTSEL_EVENT),
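
To illustrate the difference between the two encodings, here is a minimal
user-space sketch. It does not use the kernel macros; match_masked() and
match_ranges() are hypothetical helpers that only model the comparison on
the low 8 event-select bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of "0xC0 with mask & ~0xF": the low nibble is
 * ignored, so every event code of the form 0xCn matches.
 */
bool match_masked(uint64_t event)
{
	return (event & 0xF0) == 0xC0;
}

/*
 * Hypothetical model of the two contiguous ranges proposed above:
 * only 0xC0-0xC5 and 0xC8-0xCA match.
 */
bool match_ranges(uint64_t event)
{
	return (event >= 0xC0 && event <= 0xC5) ||
	       (event >= 0xC8 && event <= 0xCA);
}
```

With these helpers, an event code such as 0xC6 or 0xCB, neither of which
is in the table, matches the masked form but not the range form.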

Thanks,
Ravi


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-02 10:51           ` [PATCH v2] perf/amd: Implement erratum " Ravi Bangoria
  2022-02-02 14:36             ` Peter Zijlstra
@ 2022-02-03  4:09             ` Jim Mattson
  2022-02-03  5:18               ` Ravi Bangoria
  1 sibling, 1 reply; 22+ messages in thread
From: Jim Mattson @ 2022-02-03  4:09 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, eranian, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Perf counter may overcount for a list of Retire Based Events. Implement
> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> Revision Guide[1]:
>
>   To count the non-FP affected PMC events correctly:
>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>
> Note that the specified workaround applies only to counting events and
> not to sampling events. Thus sampling event will continue functioning
> as is.
>
> Although the issue exists on all previous Zen revisions, the workaround
> is different and thus not included in this patch.
>
> This patch needs Like's patch[2] to make it work on kvm guest.

IIUC, this patch along with Like's patch actually breaks PMU
virtualization for a kvm guest.

Suppose I have some code which counts event 0xC2 [Retired Branch
Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
percentage of my branch instructions are taken. On hardware that
suffers from erratum 1292, both counters may overcount, but if the
inaccuracy is small, then my final result may still be fairly close to
reality.

With these patches, if I run that same code in a kvm guest, it looks
like one of those events will be counted on PMC2 and the other won't
be counted at all. So, when I calculate the percentage of branch
instructions taken, I either get 0 or infinity.
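
The arithmetic behind "0 or infinity" can be sketched as follows. This is
only an illustration: taken_percent() is a hypothetical helper, and the
infinite result assumes IEEE 754 floating-point division.

```c
/*
 * Hypothetical helper: percentage of branches taken, computed from two
 * raw counter values (PMC1 = taken branches, PMC0 = all branches).
 * Assumes IEEE 754 doubles, where a positive value divided by zero
 * yields +infinity.
 */
double taken_percent(unsigned long long taken, unsigned long long branches)
{
	return 100.0 * (double)taken / (double)branches;
}
```

If the taken-branch counter never runs, the numerator stays at zero and
the result is 0. If the branch counter never runs, the denominator stays
at zero and the result is infinite.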

> [1] https://bugzilla.kernel.org/attachment.cgi?id=298241
> [2] https://lore.kernel.org/lkml/20220117055703.52020-1-likexu@tencent.com
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-03  4:09             ` [PATCH v2] " Jim Mattson
@ 2022-02-03  5:18               ` Ravi Bangoria
  2022-02-03 17:55                 ` Jim Mattson
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-03  5:18 UTC (permalink / raw)
  To: Jim Mattson
  Cc: like.xu.linux, eranian, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips, Ravi Bangoria

Hi Jim,

On 03-Feb-22 9:39 AM, Jim Mattson wrote:
> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> Perf counter may overcount for a list of Retire Based Events. Implement
>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>> Revision Guide[1]:
>>
>>   To count the non-FP affected PMC events correctly:
>>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>
>> Note that the specified workaround applies only to counting events and
>> not to sampling events. Thus sampling event will continue functioning
>> as is.
>>
>> Although the issue exists on all previous Zen revisions, the workaround
>> is different and thus not included in this patch.
>>
>> This patch needs Like's patch[2] to make it work on kvm guest.
> 
> IIUC, this patch along with Like's patch actually breaks PMU
> virtualization for a kvm guest.
> 
> Suppose I have some code which counts event 0xC2 [Retired Branch
> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
> percentage of my branch instructions are taken. On hardware that
> suffers from erratum 1292, both counters may overcount, but if the
> inaccuracy is small, then my final result may still be fairly close to
> reality.
> 
> With these patches, if I run that same code in a kvm guest, it looks
> like one of those events will be counted on PMC2 and the other won't
> be counted at all. So, when I calculate the percentage of branch
> instructions taken, I either get 0 or infinity.

Events get multiplexed internally. See the quick test below, which I ran
inside a guest. The host is running with my patch plus Like's patch, and
the guest is running with only my patch.

  $ ./perf stat -e branch-instructions,branch-misses -- ./branch-misses
   Performance counter stats for './branch-misses':

    19,847,153,209      branch-instructions:u                                         (50.03%)
       950,410,251      branch-misses:u           #    4.79% of all branches          (49.97%)


  $ cat branch-misses.c
  #include <stdlib.h>
  
  int main()
  {
          long i = 1000000000;
          long j = 0;
  
          while(i--) {
                  switch(rand() % 20) {
                  case 0:  j += 0; break;
                  case 1:  j += 1; break;
                  case 2:  j += 2; break;
                  case 3:  j += 3; break;
                  case 4:  j += 4; break;
                  case 5:  j += 5; break;
                  case 6:  j += 6; break;
                  case 7:  j += 7; break;
                  case 8:  j += 8; break;
                  case 9:  j += 9; break;
                  case 10: j += 10; break;
                  case 11: j += 11; break;
                  case 12: j += 12; break;
                  case 13: j += 13; break;
                  case 14: j += 14; break;
                  case 15: j += 15; break;
                  case 16: j += 16; break;
                  case 17: j += 17; break;
                  case 18: j += 18; break;
                  case 19: j += 19; break;
                  default: j += 20; break;
                  }
          }
          return 0;
  }

Thanks,
Ravi


* [PATCH v3] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-02 15:32               ` Ravi Bangoria
@ 2022-02-03  9:58                 ` Ravi Bangoria
  2022-02-09 12:51                   ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-03  9:58 UTC (permalink / raw)
  To: like.xu.linux, jmattson, eranian, peterz
  Cc: ravi.bangoria, santosh.shukla, pbonzini, seanjc, wanpengli,
	vkuznets, joro, mingo, alexander.shishkin, tglx, bp, dave.hansen,
	hpa, kvm, x86, linux-perf-users, ananth.narayan, kim.phillips

Perf counter may overcount for a list of Retire Based Events. Implement
workaround for Zen3 Family 19 Model 00-0F processors as suggested in
Revision Guide[1]:

  To count the non-FP affected PMC events correctly:
    o Use Core::X86::Msr::PERF_CTL2 to count the events, and
    o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
    o Program Core::X86::Msr::PERF_CTL2[20] to 0b.

Note that the specified workaround applies only to counting events and
not to sampling events. Thus sampling events will continue functioning
as is.

Although the issue exists on all previous Zen revisions, the workaround
is different and thus not included in this patch.

This patch needs Like's patch[2] to work in a KVM guest.

[1] https://bugzilla.kernel.org/attachment.cgi?id=298241
[2] https://lore.kernel.org/lkml/20220117055703.52020-1-likexu@tencent.com

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
v2: https://lore.kernel.org/r/20220202105158.7072-1-ravi.bangoria@amd.com
v2->v3:
  - Use EVENT_CONSTRAINT_RANGE() for contiguous event codes.

 arch/x86/events/amd/core.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9687a8aef01c..124ec15851bc 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -874,6 +874,17 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, int idx,
 	}
 }
 
+/* Overcounting of Retire Based Events Erratum */
+static struct event_constraint retire_event_constraints[] __read_mostly = {
+	EVENT_CONSTRAINT_RANGE(0xC0, 0xC5, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT_RANGE(0xC8, 0xCA, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xCC, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0xD1, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000C7, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT(0x1000000D0, 0x4, AMD64_EVENTSEL_EVENT),
+	EVENT_CONSTRAINT_END
+};
+
 static struct event_constraint pair_constraint;
 
 static struct event_constraint *
@@ -881,10 +892,30 @@ amd_get_event_constraints_f17h(struct cpu_hw_events *cpuc, int idx,
 			       struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
+	struct event_constraint *c;
 
 	if (amd_is_pair_event_code(hwc))
 		return &pair_constraint;
 
+	/*
+	 * Although 'Overcounting of Retire Based Events' erratum exists
+	 * for older generation cpus, workaround to set bit 43 works only
+	 * for Family 19h Model 00-0Fh as per the Revision Guide.
+	 */
+	if (boot_cpu_data.x86 == 0x19 && boot_cpu_data.x86_model <= 0xf) {
+		if (is_sampling_event(event))
+			goto out;
+
+		for_each_event_constraint(c, retire_event_constraints) {
+			if (constraint_match(c, event->hw.config)) {
+				event->hw.config |= (1ULL << 43);
+				event->hw.config &= ~(1ULL << 20);
+				return c;
+			}
+		}
+	}
+
+out:
 	return &unconstrained;
 }
 
-- 
2.27.0
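
As a stand-alone illustration of the two bit operations the patch applies
to event->hw.config, here is a minimal user-space sketch. The function
name apply_erratum_1292_bits() is hypothetical and is not part of the
patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the two statements in the patch:
 * set bit 43 and clear bit 20 of a PERF_CTL-style config value.
 * All other bits are left untouched.
 */
uint64_t apply_erratum_1292_bits(uint64_t config)
{
	config |= (1ULL << 43);
	config &= ~(1ULL << 20);
	return config;
}
```

For example, a config of 0xC0 with bit 20 set comes back as 0xC0 with
bit 43 set and bit 20 cleared.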



* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-03  5:18               ` Ravi Bangoria
@ 2022-02-03 17:55                 ` Jim Mattson
  2022-02-04  9:32                   ` Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Jim Mattson @ 2022-02-03 17:55 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, eranian, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Hi Jim,
>
> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
> > On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >> Perf counter may overcount for a list of Retire Based Events. Implement
> >> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> >> Revision Guide[1]:
> >>
> >>   To count the non-FP affected PMC events correctly:
> >>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>
> >> Note that the specified workaround applies only to counting events and
> >> not to sampling events. Thus sampling event will continue functioning
> >> as is.
> >>
> >> Although the issue exists on all previous Zen revisions, the workaround
> >> is different and thus not included in this patch.
> >>
> >> This patch needs Like's patch[2] to make it work on kvm guest.
> >
> > IIUC, this patch along with Like's patch actually breaks PMU
> > virtualization for a kvm guest.
> >
> > Suppose I have some code which counts event 0xC2 [Retired Branch
> > Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
> > Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
> > percentage of my branch instructions are taken. On hardware that
> > suffers from erratum 1292, both counters may overcount, but if the
> > inaccuracy is small, then my final result may still be fairly close to
> > reality.
> >
> > With these patches, if I run that same code in a kvm guest, it looks
> > like one of those events will be counted on PMC2 and the other won't
> > be counted at all. So, when I calculate the percentage of branch
> > instructions taken, I either get 0 or infinity.
>
> Events get multiplexed internally. See below quick test I ran inside
> guest. My host is running with my+Like's patch and guest is running
> with only my patch.

Your guest may be multiplexing the counters. The guest I posited does not.

I hope that you are not saying that kvm's *thread-pinned* perf events
are not being multiplexed at the host level, because that completely
breaks PMU virtualization.


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-03 17:55                 ` Jim Mattson
@ 2022-02-04  9:32                   ` Ravi Bangoria
  2022-02-04 13:01                     ` Jim Mattson
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-04  9:32 UTC (permalink / raw)
  To: Jim Mattson
  Cc: like.xu.linux, eranian, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips, Ravi Bangoria



On 03-Feb-22 11:25 PM, Jim Mattson wrote:
> On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> Hi Jim,
>>
>> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
>>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>> Perf counter may overcount for a list of Retire Based Events. Implement
>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>>>> Revision Guide[1]:
>>>>
>>>>   To count the non-FP affected PMC events correctly:
>>>>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>>>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>>>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>>>
>>>> Note that the specified workaround applies only to counting events and
>>>> not to sampling events. Thus sampling event will continue functioning
>>>> as is.
>>>>
>>>> Although the issue exists on all previous Zen revisions, the workaround
>>>> is different and thus not included in this patch.
>>>>
>>>> This patch needs Like's patch[2] to make it work on kvm guest.
>>>
>>> IIUC, this patch along with Like's patch actually breaks PMU
>>> virtualization for a kvm guest.
>>>
>>> Suppose I have some code which counts event 0xC2 [Retired Branch
>>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
>>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
>>> percentage of my branch instructions are taken. On hardware that
>>> suffers from erratum 1292, both counters may overcount, but if the
>>> inaccuracy is small, then my final result may still be fairly close to
>>> reality.
>>>
>>> With these patches, if I run that same code in a kvm guest, it looks
>>> like one of those events will be counted on PMC2 and the other won't
>>> be counted at all. So, when I calculate the percentage of branch
>>> instructions taken, I either get 0 or infinity.
>>
>> Events get multiplexed internally. See below quick test I ran inside
>> guest. My host is running with my+Like's patch and guest is running
>> with only my patch.
> 
> Your guest may be multiplexing the counters. The guest I posited does not.

It would be helpful if you can provide an example.

> I hope that you are not saying that kvm's *thread-pinned* perf events
> are not being multiplexed at the host level, because that completely
> breaks PMU virtualization.

IIUC, multiplexing happens inside the guest.

Thanks,
Ravi


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-04  9:32                   ` Ravi Bangoria
@ 2022-02-04 13:01                     ` Jim Mattson
  2022-02-09 10:18                       ` Like Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Jim Mattson @ 2022-02-04 13:01 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, eranian, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
>
>
> On 03-Feb-22 11:25 PM, Jim Mattson wrote:
> > On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >> Hi Jim,
> >>
> >> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
> >>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>>>
> >>>> Perf counter may overcount for a list of Retire Based Events. Implement
> >>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> >>>> Revision Guide[1]:
> >>>>
> >>>>   To count the non-FP affected PMC events correctly:
> >>>>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >>>>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >>>>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>>>
> >>>> Note that the specified workaround applies only to counting events and
> >>>> not to sampling events. Thus sampling event will continue functioning
> >>>> as is.
> >>>>
> >>>> Although the issue exists on all previous Zen revisions, the workaround
> >>>> is different and thus not included in this patch.
> >>>>
> >>>> This patch needs Like's patch[2] to make it work on kvm guest.
> >>>
> >>> IIUC, this patch along with Like's patch actually breaks PMU
> >>> virtualization for a kvm guest.
> >>>
> >>> Suppose I have some code which counts event 0xC2 [Retired Branch
> >>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
> >>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
> >>> percentage of my branch instructions are taken. On hardware that
> >>> suffers from erratum 1292, both counters may overcount, but if the
> >>> inaccuracy is small, then my final result may still be fairly close to
> >>> reality.
> >>>
> >>> With these patches, if I run that same code in a kvm guest, it looks
> >>> like one of those events will be counted on PMC2 and the other won't
> >>> be counted at all. So, when I calculate the percentage of branch
> >>> instructions taken, I either get 0 or infinity.
> >>
> >> Events get multiplexed internally. See below quick test I ran inside
> >> guest. My host is running with my+Like's patch and guest is running
> >> with only my patch.
> >
> > Your guest may be multiplexing the counters. The guest I posited does not.
>
> It would be helpful if you can provide an example.

Perf on any current Linux distro (i.e. without your fix).

> > I hope that you are not saying that kvm's *thread-pinned* perf events
> > are not being multiplexed at the host level, because that completely
> > breaks PMU virtualization.
>
> IIUC, multiplexing happens inside the guest.

I'm not sure that multiplexing is the answer. Extrapolation may
introduce greater imprecision than the erratum.

If you count something like "instructions retired" three ways:
1) Unfixed counter
2) PMC2 with the fix
3) Multiplexed on PMC2 with the fix

Is (3) always more accurate than (1)?
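
For reference, the extrapolation in question can be sketched as below.
When an event is multiplexed, the raw count covers only the time the
event was actually on a counter, and the reported value is scaled up by
enabled time over running time. scale_count() is a hypothetical helper
written for illustration, not the code perf uses.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the full count of a multiplexed event.
 * raw     - count observed while the event was on a hardware counter
 * enabled - total time the event was enabled
 * running - time the event was actually counting
 * Returns 0 if the event never ran, since nothing can be extrapolated.
 */
uint64_t scale_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
	if (running == 0)
		return 0;
	return (uint64_t)((double)raw * (double)enabled / (double)running);
}
```

The estimate is exact only if the event rate is the same while the event
is off the counter as while it is on it. The percentages in parentheses
in the perf stat output earlier in the thread (about 50% for each event)
are the running/enabled fractions used for this scaling.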

> Thanks,
> Ravi


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-04 13:01                     ` Jim Mattson
@ 2022-02-09 10:18                       ` Like Xu
  2022-02-09 21:40                         ` Jim Mattson
  0 siblings, 1 reply; 22+ messages in thread
From: Like Xu @ 2022-02-09 10:18 UTC (permalink / raw)
  To: Jim Mattson, Ravi Bangoria
  Cc: eranian, santosh.shukla, pbonzini, seanjc, wanpengli, vkuznets,
	joro, peterz, mingo, alexander.shishkin, tglx, bp, dave.hansen,
	hpa, kvm, x86, linux-perf-users, ananth.narayan, kim.phillips

On 4/2/2022 9:01 pm, Jim Mattson wrote:
> On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>>
>>
>> On 03-Feb-22 11:25 PM, Jim Mattson wrote:
>>> On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>> Hi Jim,
>>>>
>>>> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
>>>>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>>
>>>>>> Perf counter may overcount for a list of Retire Based Events. Implement
>>>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>>>>>> Revision Guide[1]:
>>>>>>
>>>>>>    To count the non-FP affected PMC events correctly:
>>>>>>      o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>>>>>      o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>>>>>      o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>>>>>
>>>>>> Note that the specified workaround applies only to counting events and
>>>>>> not to sampling events. Thus sampling event will continue functioning
>>>>>> as is.
>>>>>>
>>>>>> Although the issue exists on all previous Zen revisions, the workaround
>>>>>> is different and thus not included in this patch.
>>>>>>
>>>>>> This patch needs Like's patch[2] to make it work on kvm guest.
>>>>>
>>>>> IIUC, this patch along with Like's patch actually breaks PMU
>>>>> virtualization for a kvm guest.
>>>>>
>>>>> Suppose I have some code which counts event 0xC2 [Retired Branch
>>>>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
>>>>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
>>>>> percentage of my branch instructions are taken. On hardware that
>>>>> suffers from erratum 1292, both counters may overcount, but if the
>>>>> inaccuracy is small, then my final result may still be fairly close to
>>>>> reality.
>>>>>
>>>>> With these patches, if I run that same code in a kvm guest, it looks
>>>>> like one of those events will be counted on PMC2 and the other won't
>>>>> be counted at all. So, when I calculate the percentage of branch
>>>>> instructions taken, I either get 0 or infinity.
>>>>
>>>> Events get multiplexed internally. See below quick test I ran inside
>>>> guest. My host is running with my+Like's patch and guest is running
>>>> with only my patch.
>>>
>>> Your guest may be multiplexing the counters. The guest I posited does not.
>>
>> It would be helpful if you can provide an example.
> 
> Perf on any current Linux distro (i.e. without your fix).

The patch for erratum #1292 (like most fixes for hw issues or
vulnerabilities) should be applied to both the host and the guest.

For unpatched guests on a patched host, is_sampling_event() will return
true for the KVM-created perf_events because of get_sample_period().

I think we (KVM) have a congenital defect in distinguishing whether guest
counters are used in counting mode or sampling mode, which is just
a different use of pure software.

> 
>>> I hope that you are not saying that kvm's *thread-pinned* perf events
>>> are not being multiplexed at the host level, because that completely
>>> breaks PMU virtualization.
>>
>> IIUC, multiplexing happens inside the guest.
> 
> I'm not sure that multiplexing is the answer. Extrapolation may
> introduce greater imprecision than the erratum.

If you run the same test on the patched host, PMC2 will be multiplexed
in the same way. This is no different.

> 
> If you count something like "instructions retired" three ways:
> 1) Unfixed counter
> 2) PMC2 with the fix
> 3) Multiplexed on PMC2 with the fix
> 
> Is (3) always more accurate than (1)?

The loss of accuracy is due to a reduction in the number of trustworthy counters,
not to these two workaround patches. Any multiplexing (whether on the host or
the guest) will result in a loss of accuracy. Right?

I'm not sure if we should provide a sysfs knob for (1). Is there a precedent
for this?

> 
>> Thanks,
>> Ravi


* Re: [PATCH v3] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-03  9:58                 ` [PATCH v3] " Ravi Bangoria
@ 2022-02-09 12:51                   ` Peter Zijlstra
  2022-02-10  4:05                     ` Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2022-02-09 12:51 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, jmattson, eranian, santosh.shukla, pbonzini,
	seanjc, wanpengli, vkuznets, joro, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Thu, Feb 03, 2022 at 03:28:41PM +0530, Ravi Bangoria wrote:
> Perf counter may overcount for a list of Retire Based Events. Implement
> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> Revision Guide[1]:
> 
>   To count the non-FP affected PMC events correctly:
>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> 
> Note that the specified workaround applies only to counting events and
> not to sampling events. Thus sampling event will continue functioning
> as is.
> 
> Although the issue exists on all previous Zen revisions, the workaround
> is different and thus not included in this patch.
> 
> This patch needs Like's patch[2] to make it work on kvm guest.
> 
> [1] https://bugzilla.kernel.org/attachment.cgi?id=298241
> [2] https://lore.kernel.org/lkml/20220117055703.52020-1-likexu@tencent.com
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>

Thanks!


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-09 10:18                       ` Like Xu
@ 2022-02-09 21:40                         ` Jim Mattson
  2022-02-10  4:06                           ` Ravi Bangoria
  0 siblings, 1 reply; 22+ messages in thread
From: Jim Mattson @ 2022-02-09 21:40 UTC (permalink / raw)
  To: Like Xu
  Cc: Ravi Bangoria, eranian, santosh.shukla, pbonzini, seanjc,
	wanpengli, vkuznets, joro, peterz, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Wed, Feb 9, 2022 at 2:19 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 4/2/2022 9:01 pm, Jim Mattson wrote:
> > On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >>
> >>
> >> On 03-Feb-22 11:25 PM, Jim Mattson wrote:
> >>> On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>>>
> >>>> Hi Jim,
> >>>>
> >>>> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
> >>>>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>>>>>
> >>>>>> Perf counter may overcount for a list of Retire Based Events. Implement
> >>>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> >>>>>> Revision Guide[1]:
> >>>>>>
> >>>>>>    To count the non-FP affected PMC events correctly:
> >>>>>>      o Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >>>>>>      o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >>>>>>      o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>>>>>
> >>>>>> Note that the specified workaround applies only to counting events and
> >>>>>> not to sampling events. Thus sampling event will continue functioning
> >>>>>> as is.
> >>>>>>
> >>>>>> Although the issue exists on all previous Zen revisions, the workaround
> >>>>>> is different and thus not included in this patch.
> >>>>>>
> >>>>>> This patch needs Like's patch[2] to make it work on kvm guest.
> >>>>>
> >>>>> IIUC, this patch along with Like's patch actually breaks PMU
> >>>>> virtualization for a kvm guest.
> >>>>>
> >>>>> Suppose I have some code which counts event 0xC2 [Retired Branch
> >>>>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
> >>>>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
> >>>>> percentage of my branch instructions are taken. On hardware that
> >>>>> suffers from erratum 1292, both counters may overcount, but if the
> >>>>> inaccuracy is small, then my final result may still be fairly close to
> >>>>> reality.
> >>>>>
> >>>>> With these patches, if I run that same code in a kvm guest, it looks
> >>>>> like one of those events will be counted on PMC2 and the other won't
> >>>>> be counted at all. So, when I calculate the percentage of branch
> >>>>> instructions taken, I either get 0 or infinity.
> >>>>
> >>>> Events get multiplexed internally. See below quick test I ran inside
> >>>> guest. My host is running with my+Like's patch and guest is running
> >>>> with only my patch.
> >>>
> >>> Your guest may be multiplexing the counters. The guest I posited does not.
> >>
> >> It would be helpful if you can provide an example.
> >
> > Perf on any current Linux distro (i.e. without your fix).
>
> The patch for errata #1292 (like most hw issues or vulnerabilities) should be
> applied to both the host and guest.

As I'm sure you are aware, guests are often not patched. For example,
we have a lot of Debian-9 guests running on Milan, despite the fact
that it has to be booted with "nopcid" due to a bug introduced on
4.9-stable. We submitted the fix and notified Debian about a year ago,
but they have not seen fit to cut a new kernel. Do you think they will
cut a new kernel for this patch?

> For non-patched guests on a patched host, the KVM-created perf_events
> will be true for is_sampling_event() due to get_sample_period().
>
> I think we (KVM) have a congenital defect in distinguishing whether guest
> counters are used in counting mode or sampling mode, which is just
> a different use of pure software.

I have no idea what you are saying. However, when kvm sees a guest
counter used in sampling mode, it will just request a PERF_TYPE_RAW
perf event with the INT bit set in 'config.' If it sees a guest
counter used in counting mode, it will either request a PERF_TYPE_RAW
perf event or a PERF_TYPE_HARDWARE perf event, depending on whether or
not it finds the requested event in amd_event_mapping[].

> >
> >>> I hope that you are not saying that kvm's *thread-pinned* perf events
> >>> are not being multiplexed at the host level, because that completely
> >>> breaks PMU virtualization.
> >>
> >> IIUC, multiplexing happens inside the guest.
> >
> > I'm not sure that multiplexing is the answer. Extrapolation may
> > introduce greater imprecision than the erratum.
>
> If you run the same test on the patched host, the PMC2 will be
> used in a multiplexing way. This is no different.
>
> >
> > If you count something like "instructions retired" three ways:
> > 1) Unfixed counter
> > 2) PMC2 with the fix
> > 3) Multiplexed on PMC2 with the fix
> >
> > Is (3) always more accurate than (1)?

Since Ravi has gone dark, I will answer my own question.

For better reproducibility, I simplified his program to:

int main() { return 0;}

On an unpatched Milan host, I get instructions retired between 21911
and 21915. I get branch instructions retired between 5565 and 5566. It
does not matter if I count them separately or at the same time.

After applying v3 of Ravi's patch, if I try to count these events at
the same time, I get 36869 instructions retired and 4962 branch
instructions on the first run. On subsequent runs, perf refuses to
count both at the same time. I get branch instructions retired between
5565 and 5567, but no instructions retired. Instead, perf tells me:

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog

If I just count one thing at a time (on the patched kernel), I get
between 21911 and 21916 instructions retired, and I get between 5565
and 5566 branch instructions retired.

I don't know under what circumstances the unfixed counters overcount
or by how much. However, for this simple test case, the fixed PMC2
yields the same results as any unfixed counter. Ravi's patch, however,
makes counting two of these events simultaneously either (a)
impossible, or (b) highly inaccurate (from 10% under to 68% over).
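
For anyone checking the percentages, this is the arithmetic, sketched as a small helper. The function name is made up for illustration, and I am assuming the low end of each unpatched range (21911 and 5565) as the baseline.

```c
#include <assert.h>

/* Signed deviation of an observed count from a baseline, in percent. */
static double deviation_percent(double observed, double baseline)
{
	assert(baseline > 0.0);
	return (observed - baseline) / baseline * 100.0;
}
```

deviation_percent(36869, 21911) comes out at roughly +68.3 (instructions retired), and deviation_percent(4962, 5565) at roughly -10.8 (branch instructions retired).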

> The loss of accuracy is due to a reduction in the number of trustworthy counters,
> not to these two workaround patches. Any multiplexing (whatever on the host or
> the guest) will result in a loss of accuracy. Right ?

Yes, that's my point. Fixing one inaccuracy by using a mechanism that
introduces another inaccuracy only makes sense if the inaccuracy you
are fixing is worse than the inaccuracy you are introducing. That does
not appear to be the case here, but I am not privy to all of the
details of this erratum.
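
The inaccuracy being introduced is the extrapolation itself. A minimal sketch, assuming the usual perf convention of scaling a multiplexed event's raw count by time_enabled / time_running (the helper name is hypothetical):

```c
#include <stdint.h>

/*
 * Extrapolate a multiplexed event's count to the full enabled window.
 * Returns 0 when the event never got onto a counter, as a placeholder
 * for "not counted".
 */
static uint64_t sketch_scaled_count(uint64_t raw, uint64_t time_enabled,
				    uint64_t time_running)
{
	if (time_running == 0)
		return 0;

	/* Round to nearest rather than truncating. */
	return (uint64_t)((double)raw * (double)time_enabled /
			  (double)time_running + 0.5);
}
```

An event that was on PMC2 for half of the enabled time has its raw count doubled, which is only accurate if the half that was not measured behaved like the half that was.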


* Re: [PATCH v3] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-09 12:51                   ` Peter Zijlstra
@ 2022-02-10  4:05                     ` Ravi Bangoria
  2022-02-10  8:46                       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-10  4:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: like.xu.linux, jmattson, eranian, santosh.shukla, pbonzini,
	seanjc, wanpengli, vkuznets, joro, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips, Ravi Bangoria



On 09-Feb-22 6:21 PM, Peter Zijlstra wrote:
> On Thu, Feb 03, 2022 at 03:28:41PM +0530, Ravi Bangoria wrote:
>> Perf counter may overcount for a list of Retire Based Events. Implement
>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>> Revision Guide[1]:
>>
>>   To count the non-FP affected PMC events correctly:
>>     o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>     o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>     o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>
>> Note that the specified workaround applies only to counting events and
>> not to sampling events. Thus sampling event will continue functioning
>> as is.
>>
>> Although the issue exists on all previous Zen revisions, the workaround
>> is different and thus not included in this patch.
>>
>> This patch needs Like's patch[2] to make it work on kvm guest.
>>
>> [1] https://bugzilla.kernel.org/attachment.cgi?id=298241
>> [2] https://lore.kernel.org/lkml/20220117055703.52020-1-likexu@tencent.com
>>
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> 
> Thanks!

Peter, in subsequent tests I found that this 'fix' is still not
optimal. Please drop this patch from your queue for now. Really
sorry for the noise.

Thanks,
Ravi


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-09 21:40                         ` Jim Mattson
@ 2022-02-10  4:06                           ` Ravi Bangoria
  2022-02-10 13:56                             ` Like Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Ravi Bangoria @ 2022-02-10  4:06 UTC (permalink / raw)
  To: Jim Mattson
  Cc: eranian, santosh.shukla, pbonzini, seanjc, wanpengli, vkuznets,
	joro, peterz, mingo, alexander.shishkin, tglx, bp, dave.hansen,
	hpa, kvm, x86, linux-perf-users, ananth.narayan, kim.phillips,
	Like Xu, Ravi Bangoria

Hi Jim,

On 10-Feb-22 3:10 AM, Jim Mattson wrote:
> On Wed, Feb 9, 2022 at 2:19 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>
>> On 4/2/2022 9:01 pm, Jim Mattson wrote:
>>> On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 03-Feb-22 11:25 PM, Jim Mattson wrote:
>>>>> On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>>
>>>>>> Hi Jim,
>>>>>>
>>>>>> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
>>>>>>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>>>>
>>>>>>>> Perf counter may overcount for a list of Retire Based Events. Implement
>>>>>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>>>>>>>> Revision Guide[1]:
>>>>>>>>
>>>>>>>>    To count the non-FP affected PMC events correctly:
>>>>>>>>      o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>>>>>>>      o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>>>>>>>      o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>>>>>>>
>>>>>>>> Note that the specified workaround applies only to counting events and
>>>>>>>> not to sampling events. Thus sampling event will continue functioning
>>>>>>>> as is.
>>>>>>>>
>>>>>>>> Although the issue exists on all previous Zen revisions, the workaround
>>>>>>>> is different and thus not included in this patch.
>>>>>>>>
>>>>>>>> This patch needs Like's patch[2] to make it work on kvm guest.
>>>>>>>
>>>>>>> IIUC, this patch along with Like's patch actually breaks PMU
>>>>>>> virtualization for a kvm guest.
>>>>>>>
>>>>>>> Suppose I have some code which counts event 0xC2 [Retired Branch
>>>>>>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
>>>>>>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
>>>>>>> percentage of my branch instructions are taken. On hardware that
>>>>>>> suffers from erratum 1292, both counters may overcount, but if the
>>>>>>> inaccuracy is small, then my final result may still be fairly close to
>>>>>>> reality.
>>>>>>>
>>>>>>> With these patches, if I run that same code in a kvm guest, it looks
>>>>>>> like one of those events will be counted on PMC2 and the other won't
>>>>>>> be counted at all. So, when I calculate the percentage of branch
>>>>>>> instructions taken, I either get 0 or infinity.
>>>>>>
>>>>>> Events get multiplexed internally. See below quick test I ran inside
>>>>>> guest. My host is running with my+Like's patch and guest is running
>>>>>> with only my patch.
>>>>>
>>>>> Your guest may be multiplexing the counters. The guest I posited does not.
>>>>
>>>> It would be helpful if you can provide an example.
>>>
>>> Perf on any current Linux distro (i.e. without your fix).
>>
>> The patch for errata #1292 (like most hw issues or vulnerabilities) should be
>> applied to both the host and guest.
> 
> As I'm sure you are aware, guests are often not patched. For example,
> we have a lot of Debian-9 guests running on Milan, despite the fact
> that it has to be booted with "nopcid" due to a bug introduced on
> 4.9-stable. We submitted the fix and notified Debian about a year ago,
> but they have not seen fit to cut a new kernel. Do you think they will
> cut a new kernel for this patch?
> 
>> For non-patched guests on a patched host, the KVM-created perf_events
>> will be true for is_sampling_event() due to get_sample_period().
>>
>> I think we (KVM) have a congenital defect in distinguishing whether guest
>> counters are used in counting mode or sampling mode, which is just
>> a different use of pure software.
> 
> I have no idea what you are saying. However, when kvm sees a guest
> counter used in sampling mode, it will just request a PERF_TYPE_RAW
> perf event with the INT bit set in 'config.' If it sees a guest
> counter used in counting mode, it will either request a PERF_TYPE_RAW
> perf event or a PERF_TYPE_HARDWARE perf event, depending on whether or
> not it finds the requested event in amd_event_mapping[].
> 
>>>
>>>>> I hope that you are not saying that kvm's *thread-pinned* perf events
>>>>> are not being multiplexed at the host level, because that completely
>>>>> breaks PMU virtualization.
>>>>
>>>> IIUC, multiplexing happens inside the guest.
>>>
>>> I'm not sure that multiplexing is the answer. Extrapolation may
>>> introduce greater imprecision than the erratum.
>>
>> If you run the same test on the patched host, the PMC2 will be
>> used in a multiplexing way. This is no different.
>>
>>>
>>> If you count something like "instructions retired" three ways:
>>> 1) Unfixed counter
>>> 2) PMC2 with the fix
>>> 3) Multiplexed on PMC2 with the fix
>>>
>>> Is (3) always more accurate than (1)?
> 
> Since Ravi has gone dark, I will answer my own question.

Sorry about the delay. I was discussing this internally with hw folks.

> 
> For better reproducibility, I simplified his program to:
> 
> int main() { return 0;}
> 
> On an unpatched Milan host, I get instructions retired between 21911
> and 21915. I get branch instructions retired between 5565 and 5566. It
> does not matter if I count them separately or at the same time.
> 
> After applying v3 of Ravi's patch, if I try to count these events at
> the same time, I get 36869 instructions retired and 4962 branch
> instructions on the first run. On subsequent runs, perf refuses to
> count both at the same time. I get branch instructions retired between
> 5565 and 5567, but no instructions retired. Instead, perf tells me:
> 
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog
> 
> If I just count one thing at a time (on the patched kernel), I get
> between 21911 and 21916 instructions retired, and I get between 5565
> and 5566 branch instructions retired.
> 
> I don't know under what circumstances the unfixed counters overcount
> or by how much. However, for this simple test case, the fixed PMC2
> yields the same results as any unfixed counter. Ravi's patch, however
> makes counting two of these events simultaneously either (a)
> impossible, or (b) highly inaccurate (from 10% under to 68% over).

In further discussions with our hardware team, I am given to understand
that the conditions under which the overcounting can happen are quite
rare. In my tests, I've found that the patched and unpatched cases do
not differ enough to warrant the restriction introduced by
this fix. I have asked Peter to hold off on pushing this fix.

> 
>> The loss of accuracy is due to a reduction in the number of trustworthy counters,
>> not to these two workaround patches. Any multiplexing (whatever on the host or
>> the guest) will result in a loss of accuracy. Right ?
> 
> Yes, that's my point. Fixing one inaccuracy by using a mechanism that
> introduces another inaccuracy only makes sense if the inaccuracy you
> are fixing is worse than the inaccuracy you are introducing. That does
> not appear to be the case here, but I am not privy to all of the
> details of this erratum.

Thanks,
Ravi


* Re: [PATCH v3] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-10  4:05                     ` Ravi Bangoria
@ 2022-02-10  8:46                       ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2022-02-10  8:46 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: like.xu.linux, jmattson, eranian, santosh.shukla, pbonzini,
	seanjc, wanpengli, vkuznets, joro, mingo, alexander.shishkin,
	tglx, bp, dave.hansen, hpa, kvm, x86, linux-perf-users,
	ananth.narayan, kim.phillips

On Thu, Feb 10, 2022 at 09:35:14AM +0530, Ravi Bangoria wrote:

> Peter, On subsequent tests, I found that this 'fix' is still not
> optimal. Please drop this patch from your queue for now. Really
> sorry for the noise.

Just in time, and done.


* Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
  2022-02-10  4:06                           ` Ravi Bangoria
@ 2022-02-10 13:56                             ` Like Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Like Xu @ 2022-02-10 13:56 UTC (permalink / raw)
  To: Ravi Bangoria, Jim Mattson
  Cc: eranian, santosh.shukla, pbonzini, seanjc, wanpengli, vkuznets,
	joro, peterz, mingo, alexander.shishkin, tglx, bp, dave.hansen,
	hpa, kvm, x86, linux-perf-users, ananth.narayan, kim.phillips

On 10/2/2022 12:06 pm, Ravi Bangoria wrote:
> Hi Jim,
> 
> On 10-Feb-22 3:10 AM, Jim Mattson wrote:
>> On Wed, Feb 9, 2022 at 2:19 AM Like Xu <like.xu.linux@gmail.com> wrote:
>>>
>>> On 4/2/2022 9:01 pm, Jim Mattson wrote:
>>>> On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 03-Feb-22 11:25 PM, Jim Mattson wrote:
>>>>>> On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>>>
>>>>>>> Hi Jim,
>>>>>>>
>>>>>>> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
>>>>>>>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> Perf counter may overcount for a list of Retire Based Events. Implement
>>>>>>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>>>>>>>>> Revision Guide[1]:
>>>>>>>>>
>>>>>>>>>     To count the non-FP affected PMC events correctly:
>>>>>>>>>       o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>>>>>>>>       o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>>>>>>>>       o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>>>>>>>>
>>>>>>>>> Note that the specified workaround applies only to counting events and
>>>>>>>>> not to sampling events. Thus sampling event will continue functioning
>>>>>>>>> as is.
>>>>>>>>>
>>>>>>>>> Although the issue exists on all previous Zen revisions, the workaround
>>>>>>>>> is different and thus not included in this patch.
>>>>>>>>>
>>>>>>>>> This patch needs Like's patch[2] to make it work on kvm guest.
>>>>>>>>
>>>>>>>> IIUC, this patch along with Like's patch actually breaks PMU
>>>>>>>> virtualization for a kvm guest.
>>>>>>>>
>>>>>>>> Suppose I have some code which counts event 0xC2 [Retired Branch
>>>>>>>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
>>>>>>>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
>>>>>>>> percentage of my branch instructions are taken. On hardware that
>>>>>>>> suffers from erratum 1292, both counters may overcount, but if the
>>>>>>>> inaccuracy is small, then my final result may still be fairly close to
>>>>>>>> reality.
>>>>>>>>
>>>>>>>> With these patches, if I run that same code in a kvm guest, it looks
>>>>>>>> like one of those events will be counted on PMC2 and the other won't
>>>>>>>> be counted at all. So, when I calculate the percentage of branch
>>>>>>>> instructions taken, I either get 0 or infinity.
>>>>>>>
>>>>>>> Events get multiplexed internally. See below quick test I ran inside
>>>>>>> guest. My host is running with my+Like's patch and guest is running
>>>>>>> with only my patch.
>>>>>>
>>>>>> Your guest may be multiplexing the counters. The guest I posited does not.
>>>>>
>>>>> It would be helpful if you can provide an example.
>>>>
>>>> Perf on any current Linux distro (i.e. without your fix).
>>>
>>> The patch for errata #1292 (like most hw issues or vulnerabilities) should be
>>> applied to both the host and guest.
>>
>> As I'm sure you are aware, guests are often not patched. For example,

It's true. That's the real world.

>> we have a lot of Debian-9 guests running on Milan, despite the fact
>> that it has to be booted with "nopcid" due to a bug introduced on
>> 4.9-stable. We submitted the fix and notified Debian about a year ago,
>> but they have not seen fit to cut a new kernel. Do you think they will
>> cut a new kernel for this patch?

Indeed, thanks for your user stories.

>>
>>> For non-patched guests on a patched host, the KVM-created perf_events
>>> will be true for is_sampling_event() due to get_sample_period().
>>>
>>> I think we (KVM) have a congenital defect in distinguishing whether guest
>>> counters are used in counting mode or sampling mode, which is just
>>> a different use of pure software.
>>
>> I have no idea what you are saying. However, when kvm sees a guest
>> counter used in sampling mode, it will just request a PERF_TYPE_RAW
>> perf event with the INT bit set in 'config.' If it sees a guest

A counter works very simply: it increments until it overflows.

The use of the INT bit is not related to counting or sampling mode.
A PMU driver can set the INT bit, load a very small counter value that it
does not expect to overflow, and use the counter in counting mode as well, right?

We don't know under what circumstances the overcount occurs. Maybe
it's related to the INT bit and maybe not, but it certainly has nothing to do
with the software check is_sampling_event().
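
For reference, the two PERF_CTL2 bits named in the erratum text quoted above can be written as masks; bit 20 is the INT bit discussed here. This is only a sketch of the bit manipulation with hypothetical names, not the code from either patch.

```c
#include <stdint.h>

/* Hypothetical names for the two bits the erratum text refers to. */
#define SKETCH_PERF_CTL_INT   (1ULL << 20) /* PERF_CTL2[20] */
#define SKETCH_PERF_CTL_BIT43 (1ULL << 43) /* PERF_CTL2[43] */

/* Apply the documented workaround to a PERF_CTL2 value. */
static uint64_t sketch_apply_erratum_1292(uint64_t perf_ctl2)
{
	perf_ctl2 |= SKETCH_PERF_CTL_BIT43; /* program bit 43 to 1b */
	perf_ctl2 &= ~SKETCH_PERF_CTL_INT;  /* program bit 20 to 0b */
	return perf_ctl2;
}
```

Note that nothing in this transformation depends on whether software regards the event as counting or sampling.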

>> counter used in counting mode, it will either request a PERF_TYPE_RAW
>> perf event or a PERF_TYPE_HARDWARE perf event, depending on whether or
>> not it finds the requested event in amd_event_mapping[].
>>
>>>>
>>>>>> I hope that you are not saying that kvm's *thread-pinned* perf events
>>>>>> are not being multiplexed at the host level, because that completely
>>>>>> breaks PMU virtualization.
>>>>>
>>>>> IIUC, multiplexing happens inside the guest.
>>>>
>>>> I'm not sure that multiplexing is the answer. Extrapolation may
>>>> introduce greater imprecision than the erratum.
>>>
>>> If you run the same test on the patched host, the PMC2 will be
>>> used in a multiplexing way. This is no different.
>>>
>>>>
>>>> If you count something like "instructions retired" three ways:
>>>> 1) Unfixed counter
>>>> 2) PMC2 with the fix
>>>> 3) Multiplexed on PMC2 with the fix
>>>>
>>>> Is (3) always more accurate than (1)?
>>
>> Since Ravi has gone dark, I will answer my own question.
> 
> Sorry about the delay. I was discussing this internally with hw folks.
> 
>>
>> For better reproducibility, I simplified his program to:
>>
>> int main() { return 0;}
>>
>> On an unpatched Milan host, I get instructions retired between 21911
>> and 21915. I get branch instructions retired between 5565 and 5566. It
>> does not matter if I count them separately or at the same time.
>>
>> After applying v3 of Ravi's patch, if I try to count these events at
>> the same time, I get 36869 instructions retired and 4962 branch
>> instructions on the first run. On subsequent runs, perf refuses to
>> count both at the same time. I get branch instructions retired between
>> 5565 and 5567, but no instructions retired. Instead, perf tells me:
>>
>> Some events weren't counted. Try disabling the NMI watchdog:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>>
>> If I just count one thing at a time (on the patched kernel), I get
>> between 21911 and 21916 instructions retired, and I get between 5565
>> and 5566 branch instructions retired.
>>
>> I don't know under what circumstances the unfixed counters overcount
>> or by how much. However, for this simple test case, the fixed PMC2
>> yields the same results as any unfixed counter. Ravi's patch, however
>> makes counting two of these events simultaneously either (a)
>> impossible, or (b) highly inaccurate (from 10% under to 68% over).
> 
> In further discussions with our hardware team, I am given to understand
> that the conditions under which the overcounting can happen, is quite
> rare. In my tests, I've found that the patched vs. unpatched cases are
> not significantly different to warrant the restriction introduced by

That's cute and thank you both.

I hope we can come to this conclusion before the code is committed.

But the KVM patch may still make PMU driver developers who have read
erratum #1292 a little happier, won't it?

> this fix. I have requested Peter to hold off pushing this fix.
> 
>>
>>> The loss of accuracy is due to a reduction in the number of trustworthy counters,
>>> not to these two workaround patches. Any multiplexing (whatever on the host or
>>> the guest) will result in a loss of accuracy. Right ?
>>
>> Yes, that's my point. Fixing one inaccuracy by using a mechanism that
>> introduces another inaccuracy only makes sense if the inaccuracy you
>> are fixing is worse than the inaccuracy you are introducing. That does

Couldn't agree more, and in response to similar issues,
we may adopt a quantitative-first strategy in the future.

>> not appear to be the case here, but I am not privy to all of the
>> details of this erratum.
> 
> Thanks,
> Ravi


end of thread, other threads:[~2022-02-10 13:57 UTC | newest]

Thread overview: 22+ messages
2022-01-17  5:57 [PATCH] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292 Like Xu
2022-02-02  4:28 ` [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh Ravi Bangoria
2022-02-02  5:27   ` Stephane Eranian
2022-02-02  6:02     ` Ravi Bangoria
2022-02-02  6:16       ` Stephane Eranian
2022-02-02  6:32         ` Ravi Bangoria
2022-02-02 10:51           ` [PATCH v2] perf/amd: Implement erratum " Ravi Bangoria
2022-02-02 14:36             ` Peter Zijlstra
2022-02-02 15:32               ` Ravi Bangoria
2022-02-03  9:58                 ` [PATCH v3] " Ravi Bangoria
2022-02-09 12:51                   ` Peter Zijlstra
2022-02-10  4:05                     ` Ravi Bangoria
2022-02-10  8:46                       ` Peter Zijlstra
2022-02-03  4:09             ` [PATCH v2] " Jim Mattson
2022-02-03  5:18               ` Ravi Bangoria
2022-02-03 17:55                 ` Jim Mattson
2022-02-04  9:32                   ` Ravi Bangoria
2022-02-04 13:01                     ` Jim Mattson
2022-02-09 10:18                       ` Like Xu
2022-02-09 21:40                         ` Jim Mattson
2022-02-10  4:06                           ` Ravi Bangoria
2022-02-10 13:56                             ` Like Xu
