linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs
@ 2023-11-09 18:06 Konstantin Khorenko
  2023-11-09 18:06 ` [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC Konstantin Khorenko
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Konstantin Khorenko @ 2023-11-09 18:06 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, kvm, linux-kernel, Konstantin Khorenko,
	Denis V. Lunev

We have detected significant performance drop of our atomic test which
checks the rate of CPUID instructions rate inside an L1 VM on an AMD
node.

Investigation led to 2 mainstream patches which have introduced extra
events accounting:

   018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
   9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")

And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID
rate.

Checking latest mainsteam kernel the performance difference is much less
but still quite noticeable: 13.4% and shows up on AMD CPUs only.

Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap
on Intel and expensive on AMD CPUs.

So the idea behind this patch is to skip iterations over PMCs at all in
case PMU is disabled for a VM completely or PMU is enabled for a VM, but
there are no active PMCs at all.

Unfortunately
 * current kernel code does not differentiate if PMU is globally enabled
   for a VM or not (pmu->version is always 1)
 * AMD CPUs older than Zen 4 do not support PMU v2 and thus efficient
   check for enabled PMCs is not possible

=> the patch speeds up vmexit for AMD Zen 4 CPUs only, this is sad.
   but the patch does not hurt other CPUs - and this is fortunate!

i have no access to a node with AMD Zen 4 CPU, so i had to test on
AMD Zen 3 CPU and i hope my expectations are right for AMD Zen 4.

i would appreciate if anyone perform the test of a real AMD Zen 4 node.

AMD performance results:
CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor

 * The test binary is run inside an AlmaLinux 9 VM with their stock kernel
   5.14.0-284.11.1.el9_2.x86_64.
 * Test binary checks the CPUID instractions rate (instructions per sec).
 * Default VM config (PMU is off, pmu->version is reported as 1).
 * The Host runs the kernel under test.

 # for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \
   awk -e '{print $4;}' | \
   cut -f1 --delimiter='.' | \
   ./avg.sh

Measurements:
1. Host runs stock latest mainstream kernel commit 305230142ae0.
2. Host runs same mainstream kernel + current patch.
3. Host runs same mainstream kernel + current patch + force
   guest_pmu_is_enabled() to always return "false" using following change:

   -       if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
   +       if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))

   -----------------------------------------
   | Kernels       | CPUID rate            |
   -----------------------------------------
   | 1.            | 1360250               |
   | 2.            | 1365536 (+ 0.4%)      |
   | 3.            | 1541850 (+13.4%)      |
   -----------------------------------------

Measurement (2) gives some fluctuation, the performance is not increased
because the test was done on a Zen 3 CPU, so we are unable to use fast
check for active PMCs.
Measurement (3) shows expected performance boost on a Zen 4 CPU under
the same test.

The test used:
# cat at_cpu_cpuid.pub.cpp
/*
 * The test executes CPUID instruction in a loop and reports the calls rate.
 */

#include <stdio.h>
#include <time.h>

/* #define CPUID_EAX            0x80000002 */
#define CPUID_EAX               0x29a
#define CPUID_ECX               0

#define TEST_EXEC_SECS          30      // in seconds
#define LOOPS_APPROX_RATE       1000000

static inline void cpuid(unsigned int _eax, unsigned int _ecx)
{
        unsigned int regs[4] = {_eax, 0, _ecx, 0};

        asm __volatile__(
                "cpuid"
                : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
                :  "0" (regs[0]),  "1" (regs[1]),  "2" (regs[2]),  "3" (regs[3])
                : "memory");
}

double cpuid_rate_loops(int loops_num)
{
        int i;
        clock_t start_time, end_time;
        double spent_time, rate;

        start_time = clock();

        for (i = 0; i < loops_num; i++)
                cpuid((unsigned int)CPUID_EAX, (unsigned int)CPUID_ECX);

        end_time = clock();
        spent_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;

        rate = (double)loops_num / spent_time;

        return rate;
}

int main(int argc, char* argv[])
{
        double approx_rate, rate;
        int loops;

        /* First we detect approximate CPUIDs rate. */
        approx_rate = cpuid_rate_loops(LOOPS_APPROX_RATE);

        /*
         * How many loops there should be in order to run the test for
         * TEST_EXEC_SECS seconds?
         */
        loops = (int)(approx_rate * TEST_EXEC_SECS);

        /* Get the precise instructions rate. */
        rate = cpuid_rate_loops(loops);

        printf( "CPUID instructions rate: %f instructions/second\n", rate);

        return 0;
}

Konstantin Khorenko (1):
  KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC

 arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

-- 
2.39.3


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC
  2023-11-09 18:06 [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Konstantin Khorenko
@ 2023-11-09 18:06 ` Konstantin Khorenko
  2023-11-09 20:09   ` Sean Christopherson
  2023-11-09 20:13   ` Jim Mattson
  2023-11-09 18:18 ` KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM? Konstantin Khorenko
  2023-11-09 22:44 ` [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Jim Mattson
  2 siblings, 2 replies; 20+ messages in thread
From: Konstantin Khorenko @ 2023-11-09 18:06 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, kvm, linux-kernel, Konstantin Khorenko,
	Denis V. Lunev

The following 2 mainstream patches have introduced extra
events accounting:

  018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
  9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")

kvm_pmu_trigger_event() iterates over all PMCs looking for enabled and
this appeared to be fast on Intel CPUs and quite expensive for AMD CPUs.

kvm_pmu_trigger_event() can be optimized not to iterate over all PMCs in
the following cases:

  * if PMU is completely disabled for a VM, which is the default
    configuration
  * if PMU v2 is enabled, but no PMCs are configured

For Intel CPUs:
  * By default PMU is disabled for KVM VMs (<pmu state='off'/> or absent
    in the VM config xml which results in "-cpu pmu=off" qemu option).
    In this case pmu->version is reported as 0 for the appropriate vCPU.

  * According to Intel® 64 and IA-32 Architectures Software Developer’s
    Manual PMU version 2 and higher provide IA32_PERF_GLOBAL_CTRL MSR
    which in particular contains bits which can be used for efficient
    detection which fixed-function performance and general-purpose
    performance monitoring counters are enabled at the moment.

  * Searching for enabled PMCs is fast and the optimization does not
    bring noticeable performance increase.

For AMD CPUs:
  * For CPUs older than Zen 4 pmu->version is always reported as "1" for
    the appropriate vCPU, no matter if PMU is disabled for KVM VMs
    (<pmu state='off'/>) or enabled.
    So for "old" CPUs currently it's impossible to detect when PMU is
    disabled for a VM and skip the iteration by PMCs efficiently.

  * Since Zen 4 AMD CPUs support PMU v2 and in this case pmu->version
    should be reported as "2" and IA32_PERF_GLOBAL_CTRL MSR is available
    and can be used for fast and efficient check for enabled PMCs.
    https://www.phoronix.com/news/AMD-PerfMonV2-Linux-Patches
    https://www.phoronix.com/news/AMD-PerfMonV2-Guests-KVM

  * Optimized preliminary check for enabled PMCs on AMD Zen 4 CPUs
    should give quite noticeable performance improvement.

AMD performance results:
CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor

 * The test binary is run inside an AlmaLinux 9 VM with their stock kernel
   5.14.0-284.11.1.el9_2.x86_64.
 * Test binary checks the CPUID instractions rate (instructions per sec).
 * Default VM config (PMU is off, pmu->version is reported as 1).
 * The Host runs the kernel under test.

 # for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \
   awk -e '{print $4;}' | \
   cut -f1 --delimiter='.' | \
   ./avg.sh

Measurements:
1. Host runs stock latest mainstream kernel commit 305230142ae0.
2. Host runs same mainstream kernel + current patch.
3. Host runs same mainstream kernel + current patch + force
   guest_pmu_is_enabled() to always return "false" using following change:

   -       if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
   +       if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))

   --------------------------------------
   | Kernels	| CPUID rate		|
   --------------------------------------
   | 1.		| 1360250		|
   | 2.		| 1365536 (+ 0.4%)	|
   | 3.		| 1541850 (+13.4%)	|
   --------------------------------------

Measurement (2) gives some fluctuation, the performance is not increased
because the test was done on a Zen 3 CPU, so we are unable to use fast
check for active PMCs.
Measurement (3) shows expected performance boost on a Zen 4 CPU under
the same test.

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
---
 arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 9ae07db6f0f6..290d407f339b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -731,12 +731,38 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
 	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
 }
 
+static inline bool guest_pmu_is_enabled(struct kvm_pmu *pmu)
+{
+	/*
+	 * Currently VMs do not have PMU settings in configs which defaults
+	 * to "pmu=off".
+	 *
+	 * For Intel currently this means pmu->version will be 0.
+	 * For AMD currently PMU cannot be disabled:
+	 * pmu->version should be 2 for Zen 4 cpus and 1 otherwise.
+	 */
+	if (pmu->version == 0)
+		return false;
+
+	/*
+	 * Starting with PMU v2 IA32_PERF_GLOBAL_CTRL MSR is available and
+	 * it can be used to check if none PMCs are enabled.
+	 */
+	if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
+		return false;
+
+	return true;
+}
+
 void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct kvm_pmc *pmc;
 	int i;
 
+	if (!guest_pmu_is_enabled(pmu))
+		return;
+
 	for_each_set_bit(i, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX) {
 		pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, i);
 
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-09 18:06 [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Konstantin Khorenko
  2023-11-09 18:06 ` [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC Konstantin Khorenko
@ 2023-11-09 18:18 ` Konstantin Khorenko
  2023-11-09 22:52   ` Jim Mattson
  2023-11-09 22:44 ` [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Jim Mattson
  2 siblings, 1 reply; 20+ messages in thread
From: Konstantin Khorenko @ 2023-11-09 18:18 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, kvm, linux-kernel, Denis V. Lunev

Hi All,

as a followup for my patch: i have noticed that
currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM
(pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which 
results in "-cpu pmu=off" qemu option).

So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is 
impossible by design?

i can try implementing this, but need a direction.

Thank you in advance.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 09.11.2023 19:06, Konstantin Khorenko wrote:
> We have detected significant performance drop of our atomic test which
> checks the rate of CPUID instructions rate inside an L1 VM on an AMD
> node.
> 
> Investigation led to 2 mainstream patches which have introduced extra
> events accounting:
> 
>     018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
>     9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
> 
> And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID
> rate.
> 
> Checking latest mainsteam kernel the performance difference is much less
> but still quite noticeable: 13.4% and shows up on AMD CPUs only.
> 
> Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap
> on Intel and expensive on AMD CPUs.
> 
> So the idea behind this patch is to skip iterations over PMCs at all in
> case PMU is disabled for a VM completely or PMU is enabled for a VM, but
> there are no active PMCs at all.
> 
> Unfortunately
>   * current kernel code does not differentiate if PMU is globally enabled
>     for a VM or not (pmu->version is always 1)
>   * AMD CPUs older than Zen 4 do not support PMU v2 and thus efficient
>     check for enabled PMCs is not possible
> 
> => the patch speeds up vmexit for AMD Zen 4 CPUs only, this is sad.
>     but the patch does not hurt other CPUs - and this is fortunate!
> 
> i have no access to a node with AMD Zen 4 CPU, so i had to test on
> AMD Zen 3 CPU and i hope my expectations are right for AMD Zen 4.
> 
> i would appreciate if anyone perform the test of a real AMD Zen 4 node.
> 
> AMD performance results:
> CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor
> 
>   * The test binary is run inside an AlmaLinux 9 VM with their stock kernel
>     5.14.0-284.11.1.el9_2.x86_64.
>   * Test binary checks the CPUID instractions rate (instructions per sec).
>   * Default VM config (PMU is off, pmu->version is reported as 1).
>   * The Host runs the kernel under test.
> 
>   # for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \
>     awk -e '{print $4;}' | \
>     cut -f1 --delimiter='.' | \
>     ./avg.sh
> 
> Measurements:
> 1. Host runs stock latest mainstream kernel commit 305230142ae0.
> 2. Host runs same mainstream kernel + current patch.
> 3. Host runs same mainstream kernel + current patch + force
>     guest_pmu_is_enabled() to always return "false" using following change:
> 
>     -       if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
>     +       if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
> 
>     -----------------------------------------
>     | Kernels       | CPUID rate            |
>     -----------------------------------------
>     | 1.            | 1360250               |
>     | 2.            | 1365536 (+ 0.4%)      |
>     | 3.            | 1541850 (+13.4%)      |
>     -----------------------------------------
> 
> Measurement (2) gives some fluctuation, the performance is not increased
> because the test was done on a Zen 3 CPU, so we are unable to use fast
> check for active PMCs.
> Measurement (3) shows expected performance boost on a Zen 4 CPU under
> the same test.
> 
> The test used:
> # cat at_cpu_cpuid.pub.cpp
> /*
>   * The test executes CPUID instruction in a loop and reports the calls rate.
>   */
> 
> #include <stdio.h>
> #include <time.h>
> 
> /* #define CPUID_EAX            0x80000002 */
> #define CPUID_EAX               0x29a
> #define CPUID_ECX               0
> 
> #define TEST_EXEC_SECS          30      // in seconds
> #define LOOPS_APPROX_RATE       1000000
> 
> static inline void cpuid(unsigned int _eax, unsigned int _ecx)
> {
>          unsigned int regs[4] = {_eax, 0, _ecx, 0};
> 
>          asm __volatile__(
>                  "cpuid"
>                  : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
>                  :  "0" (regs[0]),  "1" (regs[1]),  "2" (regs[2]),  "3" (regs[3])
>                  : "memory");
> }
> 
> double cpuid_rate_loops(int loops_num)
> {
>          int i;
>          clock_t start_time, end_time;
>          double spent_time, rate;
> 
>          start_time = clock();
> 
>          for (i = 0; i < loops_num; i++)
>                  cpuid((unsigned int)CPUID_EAX, (unsigned int)CPUID_ECX);
> 
>          end_time = clock();
>          spent_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
> 
>          rate = (double)loops_num / spent_time;
> 
>          return rate;
> }
> 
> int main(int argc, char* argv[])
> {
>          double approx_rate, rate;
>          int loops;
> 
>          /* First we detect approximate CPUIDs rate. */
>          approx_rate = cpuid_rate_loops(LOOPS_APPROX_RATE);
> 
>          /*
>           * How many loops there should be in order to run the test for
>           * TEST_EXEC_SECS seconds?
>           */
>          loops = (int)(approx_rate * TEST_EXEC_SECS);
> 
>          /* Get the precise instructions rate. */
>          rate = cpuid_rate_loops(loops);
> 
>          printf( "CPUID instructions rate: %f instructions/second\n", rate);
> 
>          return 0;
> }
> 
> Konstantin Khorenko (1):
>    KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC
> 
>   arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC
  2023-11-09 18:06 ` [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC Konstantin Khorenko
@ 2023-11-09 20:09   ` Sean Christopherson
  2023-11-09 20:13   ` Jim Mattson
  1 sibling, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-11-09 20:09 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, kvm, linux-kernel,
	Denis V. Lunev

On Thu, Nov 09, 2023, Konstantin Khorenko wrote:
> The following 2 mainstream patches have introduced extra
> events accounting:
> 
>   018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
>   9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
> 
> kvm_pmu_trigger_event() iterates over all PMCs looking for enabled and
> this appeared to be fast on Intel CPUs and quite expensive for AMD CPUs.
> 
> kvm_pmu_trigger_event() can be optimized not to iterate over all PMCs in
> the following cases:

Heh, I'm just putting the finishing touches on a series to optimize this mess.

> ---
>  arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 9ae07db6f0f6..290d407f339b 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -731,12 +731,38 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
>  	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
>  }
>  
> +static inline bool guest_pmu_is_enabled(struct kvm_pmu *pmu)
> +{
> +	/*
> +	 * Currently VMs do not have PMU settings in configs which defaults
> +	 * to "pmu=off".
> +	 *
> +	 * For Intel currently this means pmu->version will be 0.
> +	 * For AMD currently PMU cannot be disabled:
> +	 * pmu->version should be 2 for Zen 4 cpus and 1 otherwise.
> +	 */
> +	if (pmu->version == 0)
> +		return false;
> +
> +	/*
> +	 * Starting with PMU v2 IA32_PERF_GLOBAL_CTRL MSR is available and
> +	 * it can be used to check if none PMCs are enabled.
> +	 */
> +	if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
> +		return false;
> +
> +	return true;
> +}
> +
>  void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
>  {
>  	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>  	struct kvm_pmc *pmc;
>  	int i;
>  
> +	if (!guest_pmu_is_enabled(pmu))
> +		return;

This _should_ be unnecessary, because all_valid_pmc_idx _should_ be zero when the
PMU is disabled.  Architecturally, AMD doesn't provide a way to disable the PMU,
but KVM open that can of worms a long time ago, e.g. see the enable_pmu check in
get_gp_pmc_amd().  Kernels have long since learned to not panic if the PMU isn't
available, precisely because hypervisors have a long history of not virtualizing
a PMU.

The issue is that KVM stupidly doesn't zero out the metadata, i.e. configures
all_valid_pmc_idx as if the vCPU has a PMU even though KVM will deny access to
all assets in the end.

The below should resolve the issue.  Note, it won't apply on kvm-x86/next due to
multiple dependencies on other PMU changes I have in flight.  Give me a few hours
to test; with luck I'll get this posted by end-of-day.

--
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 9 Nov 2023 11:03:48 -0800
Subject: [PATCH 1/4] KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is
 disabled

Move the purging of common PMU metadata from intel_pmu_refresh() to
kvm_pmu_refresh(), and invoke the vendor refresh() hook if and only if
the VM is supposed to have a vPMU.

KVM already denies access to the PMU based on kvm->arch.enable_pmu, as
get_gp_pmc_amd() returns NULL for all PMCs in that case, i.e. KVM already
violates AMD's architecture by not virtualizing a PMU (kernels have long
since learned to not panic when the PMU is unavailable).  But configuring
the PMU as if it were enabled causes unwanted side effects, e.g. calls to
kvm_pmu_trigger_event() waste an absurd number of cycles due to the
all_valid_pmc_idx bitmap being non-zero.

Fixes: b1d66dad65dc ("KVM: x86/svm: Add module param to control PMU virtualization")
Reported-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Closes: https://lore.kernel.org/all/20231109180646.2963718-2-khorenko@virtuozzo.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c           | 20 ++++++++++++++++++--
 arch/x86/kvm/vmx/pmu_intel.c | 16 ++--------------
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 1b74a29ed250..b52bab7dc422 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -739,6 +739,8 @@ static void kvm_pmu_reset(struct kvm_vcpu *vcpu)
  */
 void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 {
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
 	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
 		return;
 
@@ -748,8 +750,22 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 	 */
 	kvm_pmu_reset(vcpu);
 
-	bitmap_zero(vcpu_to_pmu(vcpu)->all_valid_pmc_idx, X86_PMC_IDX_MAX);
-	static_call(kvm_x86_pmu_refresh)(vcpu);
+	pmu->version = 0;
+	pmu->nr_arch_gp_counters = 0;
+	pmu->nr_arch_fixed_counters = 0;
+	pmu->counter_bitmask[KVM_PMC_GP] = 0;
+	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
+	pmu->reserved_bits = 0xffffffff00200000ull;
+	pmu->raw_event_mask = X86_RAW_EVENT_MASK;
+	pmu->global_ctrl_mask = ~0ull;
+	pmu->global_status_mask = ~0ull;
+	pmu->fixed_ctr_ctrl_mask = ~0ull;
+	pmu->pebs_enable_mask = ~0ull;
+	pmu->pebs_data_cfg_mask = ~0ull;
+	bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
+
+	if (vcpu->kvm->arch.enable_pmu)
+		static_call(kvm_x86_pmu_refresh)(vcpu);
 }
 
 void kvm_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index c3a841d8df27..0d2fd9fdcf4b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -463,19 +463,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	u64 counter_mask;
 	int i;
 
-	pmu->nr_arch_gp_counters = 0;
-	pmu->nr_arch_fixed_counters = 0;
-	pmu->counter_bitmask[KVM_PMC_GP] = 0;
-	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
-	pmu->version = 0;
-	pmu->reserved_bits = 0xffffffff00200000ull;
-	pmu->raw_event_mask = X86_RAW_EVENT_MASK;
-	pmu->global_ctrl_mask = ~0ull;
-	pmu->global_status_mask = ~0ull;
-	pmu->fixed_ctr_ctrl_mask = ~0ull;
-	pmu->pebs_enable_mask = ~0ull;
-	pmu->pebs_data_cfg_mask = ~0ull;
-
 	memset(&lbr_desc->records, 0, sizeof(lbr_desc->records));
 
 	/*
@@ -487,8 +474,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		return;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa);
-	if (!entry || !vcpu->kvm->arch.enable_pmu)
+	if (!entry)
 		return;
+
 	eax.full = entry->eax;
 	edx.full = entry->edx;
 

base-commit: 45c6565ff59d0b254ca3755cb4e14776a2c0b324
-- 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC
  2023-11-09 18:06 ` [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC Konstantin Khorenko
  2023-11-09 20:09   ` Sean Christopherson
@ 2023-11-09 20:13   ` Jim Mattson
  2023-11-09 23:05     ` Sean Christopherson
  1 sibling, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2023-11-09 20:13 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko
<khorenko@virtuozzo.com> wrote:
>
> The following 2 mainstream patches have introduced extra
> events accounting:
>
>   018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
>   9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
>
> kvm_pmu_trigger_event() iterates over all PMCs looking for enabled and
> this appeared to be fast on Intel CPUs and quite expensive for AMD CPUs.
>
> kvm_pmu_trigger_event() can be optimized not to iterate over all PMCs in
> the following cases:
>
>   * if PMU is completely disabled for a VM, which is the default
>     configuration
>   * if PMU v2 is enabled, but no PMCs are configured
>
> For Intel CPUs:
>   * By default PMU is disabled for KVM VMs (<pmu state='off'/> or absent
>     in the VM config xml which results in "-cpu pmu=off" qemu option).
>     In this case pmu->version is reported as 0 for the appropriate vCPU.
>
>   * According to Intel® 64 and IA-32 Architectures Software Developer’s
>     Manual PMU version 2 and higher provide IA32_PERF_GLOBAL_CTRL MSR
>     which in particular contains bits which can be used for efficient
>     detection which fixed-function performance and general-purpose
>     performance monitoring counters are enabled at the moment.
>
>   * Searching for enabled PMCs is fast and the optimization does not
>     bring noticeable performance increase.
>
> For AMD CPUs:
>   * For CPUs older than Zen 4 pmu->version is always reported as "1" for
>     the appropriate vCPU, no matter if PMU is disabled for KVM VMs
>     (<pmu state='off'/>) or enabled.
>     So for "old" CPUs currently it's impossible to detect when PMU is
>     disabled for a VM and skip the iteration by PMCs efficiently.
>
>   * Since Zen 4 AMD CPUs support PMU v2 and in this case pmu->version
>     should be reported as "2" and IA32_PERF_GLOBAL_CTRL MSR is available
>     and can be used for fast and efficient check for enabled PMCs.
>     https://www.phoronix.com/news/AMD-PerfMonV2-Linux-Patches
>     https://www.phoronix.com/news/AMD-PerfMonV2-Guests-KVM
>
>   * Optimized preliminary check for enabled PMCs on AMD Zen 4 CPUs
>     should give quite noticeable performance improvement.
>
> AMD performance results:
> CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor
>
>  * The test binary is run inside an AlmaLinux 9 VM with their stock kernel
>    5.14.0-284.11.1.el9_2.x86_64.
>  * Test binary checks the CPUID instractions rate (instructions per sec).
>  * Default VM config (PMU is off, pmu->version is reported as 1).
>  * The Host runs the kernel under test.
>
>  # for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \
>    awk -e '{print $4;}' | \
>    cut -f1 --delimiter='.' | \
>    ./avg.sh
>
> Measurements:
> 1. Host runs stock latest mainstream kernel commit 305230142ae0.
> 2. Host runs same mainstream kernel + current patch.
> 3. Host runs same mainstream kernel + current patch + force
>    guest_pmu_is_enabled() to always return "false" using following change:
>
>    -       if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
>    +       if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
>
>    --------------------------------------
>    | Kernels    | CPUID rate            |
>    --------------------------------------
>    | 1.         | 1360250               |
>    | 2.         | 1365536 (+ 0.4%)      |
>    | 3.         | 1541850 (+13.4%)      |
>    --------------------------------------
>
> Measurement (2) gives some fluctuation, the performance is not increased
> because the test was done on a Zen 3 CPU, so we are unable to use fast
> check for active PMCs.
> Measurement (3) shows expected performance boost on a Zen 4 CPU under
> the same test.
>
> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
> ---
>  arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 9ae07db6f0f6..290d407f339b 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -731,12 +731,38 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
>         return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
>  }
>
> +static inline bool guest_pmu_is_enabled(struct kvm_pmu *pmu)
> +{
> +       /*
> +        * Currently VMs do not have PMU settings in configs which defaults
> +        * to "pmu=off".
> +        *
> +        * For Intel currently this means pmu->version will be 0.
> +        * For AMD currently PMU cannot be disabled:

Isn't that what KVM_PMU_CAP_DISABLE is for?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs
  2023-11-09 18:06 [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Konstantin Khorenko
  2023-11-09 18:06 ` [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC Konstantin Khorenko
  2023-11-09 18:18 ` KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM? Konstantin Khorenko
@ 2023-11-09 22:44 ` Jim Mattson
  2023-11-09 23:42   ` Sean Christopherson
  2 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2023-11-09 22:44 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko
<khorenko@virtuozzo.com> wrote:
>
> We have detected significant performance drop of our atomic test which
> checks the rate of CPUID instructions rate inside an L1 VM on an AMD
> node.
>
> Investigation led to 2 mainstream patches which have introduced extra
> events accounting:
>
>    018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
>    9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
>
> And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID
> rate.
>
> Checking latest mainsteam kernel the performance difference is much less
> but still quite noticeable: 13.4% and shows up on AMD CPUs only.
>
> Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap
> on Intel and expensive on AMD CPUs.
>
> So the idea behind this patch is to skip iterations over PMCs at all in
> case PMU is disabled for a VM completely or PMU is enabled for a VM, but
> there are no active PMCs at all.

A better solution may be to maintain two bitmaps of general purpose
counters that need to be incremented, one for instructions retired and
one for branch instructions retired. Set or clear these bits whenever
the PerfEvtSelN MSRs are written. I think I would keep the PGC bits
separate, on those microarchitectures that support PGC. Then,
kvm_pmu_trigger_event() need only consult the appropriate bitmap (or
the logical and of that bitmap with PGC). In most cases, the value
will be zero, and the function can simply return.

This would work even for AMD microarchitectures that don't support PGC.

> Unfortunately
>  * current kernel code does not differentiate if PMU is globally enabled
>    for a VM or not (pmu->version is always 1)
>  * AMD CPUs older than Zen 4 do not support PMU v2 and thus efficient
>    check for enabled PMCs is not possible
>
> => the patch speeds up vmexit for AMD Zen 4 CPUs only, this is sad.
>    but the patch does not hurt other CPUs - and this is fortunate!
>
> i have no access to a node with AMD Zen 4 CPU, so i had to test on
> AMD Zen 3 CPU and i hope my expectations are right for AMD Zen 4.
>
> i would appreciate if anyone perform the test of a real AMD Zen 4 node.
>
> AMD performance results:
> CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor
>
>  * The test binary is run inside an AlmaLinux 9 VM with their stock kernel
>    5.14.0-284.11.1.el9_2.x86_64.
>  * Test binary checks the CPUID instractions rate (instructions per sec).
>  * Default VM config (PMU is off, pmu->version is reported as 1).
>  * The Host runs the kernel under test.
>
>  # for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \
>    awk -e '{print $4;}' | \
>    cut -f1 --delimiter='.' | \
>    ./avg.sh
>
> Measurements:
> 1. Host runs stock latest mainstream kernel commit 305230142ae0.
> 2. Host runs same mainstream kernel + current patch.
> 3. Host runs same mainstream kernel + current patch + force
>    guest_pmu_is_enabled() to always return "false" using following change:
>
>    -       if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
>    +       if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
>
>    -----------------------------------------
>    | Kernels       | CPUID rate            |
>    -----------------------------------------
>    | 1.            | 1360250               |
>    | 2.            | 1365536 (+ 0.4%)      |
>    | 3.            | 1541850 (+13.4%)      |
>    -----------------------------------------
>
> Measurement (2) gives some fluctuation, the performance is not increased
> because the test was done on a Zen 3 CPU, so we are unable to use fast
> check for active PMCs.
> Measurement (3) shows expected performance boost on a Zen 4 CPU under
> the same test.
>
> The test used:
> # cat at_cpu_cpuid.pub.cpp
> /*
>  * The test executes CPUID instruction in a loop and reports the calls rate.
>  */
>
> #include <stdio.h>
> #include <time.h>
>
> /* #define CPUID_EAX            0x80000002 */
> #define CPUID_EAX               0x29a
> #define CPUID_ECX               0
>
> #define TEST_EXEC_SECS          30      // in seconds
> #define LOOPS_APPROX_RATE       1000000
>
> static inline void cpuid(unsigned int _eax, unsigned int _ecx)
> {
>         unsigned int regs[4] = {_eax, 0, _ecx, 0};
>
>         asm __volatile__(
>                 "cpuid"
>                 : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
>                 :  "0" (regs[0]),  "1" (regs[1]),  "2" (regs[2]),  "3" (regs[3])
>                 : "memory");
> }
>
> double cpuid_rate_loops(int loops_num)
> {
>         int i;
>         clock_t start_time, end_time;
>         double spent_time, rate;
>
>         start_time = clock();
>
>         for (i = 0; i < loops_num; i++)
>                 cpuid((unsigned int)CPUID_EAX, (unsigned int)CPUID_ECX);
>
>         end_time = clock();
>         spent_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
>
>         rate = (double)loops_num / spent_time;
>
>         return rate;
> }
>
> int main(int argc, char* argv[])
> {
>         double approx_rate, rate;
>         int loops;
>
>         /* First we detect approximate CPUIDs rate. */
>         approx_rate = cpuid_rate_loops(LOOPS_APPROX_RATE);
>
>         /*
>          * How many loops there should be in order to run the test for
>          * TEST_EXEC_SECS seconds?
>          */
>         loops = (int)(approx_rate * TEST_EXEC_SECS);
>
>         /* Get the precise instructions rate. */
>         rate = cpuid_rate_loops(loops);
>
>         printf( "CPUID instructions rate: %f instructions/second\n", rate);
>
>         return 0;
> }
>
> Konstantin Khorenko (1):
>   KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC
>
>  arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> --
> 2.39.3
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-09 18:18 ` KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM? Konstantin Khorenko
@ 2023-11-09 22:52   ` Jim Mattson
  2023-11-09 23:46     ` Denis V. Lunev
  0 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2023-11-09 22:52 UTC (permalink / raw)
  To: Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
<khorenko@virtuozzo.com> wrote:
>
> Hi All,
>
> as a followup for my patch: i have noticed that
> currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM
> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which
> results in "-cpu pmu=off" qemu option).
>
> So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is
> impossible by design?

The AMD architectural specification prior to AMD PMU v2 does not allow
one to describe a CPU (via CPUID or MSRs) that has fewer than 4
general purpose PMU counters. While AMD PMU v2 does allow one to
describe such a CPU, legacy software that knows nothing of AMD PMU v2
can expect four counters regardless.

Having said that, KVM does provide a per-VM capability for disabling
the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
section 8.35 in Documentation/virt/kvm/api.rst.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC
  2023-11-09 20:13   ` Jim Mattson
@ 2023-11-09 23:05     ` Sean Christopherson
  0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-11-09 23:05 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Konstantin Khorenko, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Thu, Nov 09, 2023, Jim Mattson wrote:
> On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko <khorenko@virtuozzo.com> wrote:
> > ---
> >  arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > index 9ae07db6f0f6..290d407f339b 100644
> > --- a/arch/x86/kvm/pmu.c
> > +++ b/arch/x86/kvm/pmu.c
> > @@ -731,12 +731,38 @@ static inline bool cpl_is_matched(struct kvm_pmc *pmc)
> >         return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
> >  }
> >
> > +static inline bool guest_pmu_is_enabled(struct kvm_pmu *pmu)
> > +{
> > +       /*
> > +        * Currently VMs do not have PMU settings in configs which defaults
> > +        * to "pmu=off".
> > +        *
> > +        * For Intel currently this means pmu->version will be 0.
> > +        * For AMD currently PMU cannot be disabled:
> 
> Isn't that what KVM_PMU_CAP_DISABLE is for?

Yeah, see my response.  KVM doesn't clear the metadata, so internally it looks
like the PMU is enabled even though it's effectively disabled from the guest's
perspective.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs
  2023-11-09 22:44 ` [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Jim Mattson
@ 2023-11-09 23:42   ` Sean Christopherson
  2023-11-10  0:17     ` Jim Mattson
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2023-11-09 23:42 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Konstantin Khorenko, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Thu, Nov 09, 2023, Jim Mattson wrote:
> On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko
> <khorenko@virtuozzo.com> wrote:
> >
> > We have detected significant performance drop of our atomic test which
> > checks the rate of CPUID instructions rate inside an L1 VM on an AMD
> > node.
> >
> > Investigation led to 2 mainstream patches which have introduced extra
> > events accounting:
> >
> >    018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
> >    9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
> >
> > And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID
> > rate.
> >
> > Checking latest mainsteam kernel the performance difference is much less
> > but still quite noticeable: 13.4% and shows up on AMD CPUs only.
> >
> > Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap
> > on Intel and expensive on AMD CPUs.
> >
> > So the idea behind this patch is to skip iterations over PMCs at all in
> > case PMU is disabled for a VM completely or PMU is enabled for a VM, but
> > there are no active PMCs at all.
> 
> A better solution may be to maintain two bitmaps of general purpose
> counters that need to be incremented, one for instructions retired and
> one for branch instructions retired. Set or clear these bits whenever
> the PerfEvtSelN MSRs are written. I think I would keep the PGC bits
> separate, on those microarchitectures that support PGC. Then,
> kvm_pmu_trigger_event() need only consult the appropriate bitmap (or
> the logical and of that bitmap with PGC). In most cases, the value
> will be zero, and the function can simply return.
> 
> This would work even for AMD microarchitectures that don't support PGC.

Yeah.  There are multiple lower-hanging fruits to be picked though, most of which
won't conflict with using dedicated per-event bitmaps, or at worst are trivial
to resolve.

 1. Don't call into perf to get the eventsel (which generates an indirect call)
    on every invocation, let alone every iteration.

 2. Avoid getting the CPL when it's irrelevant.

 3. Check the eventsel before querying the event filter.

 4. Mask out PMCs that aren't globally enabled from the get-go (masking out
    PMCs based on eventsel would essentially be the same as per-event bitmaps).

I'm definitely not opposed to per-event bitmaps, but it'd be nice to avoid them,
e.g. if we can eke out 99% of the performance just by doing a few obvious
optimizations.

This is the end result of what I'm testing and will (hopefully) post shortly:

static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel)
{
	return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB);
}

static inline bool cpl_is_matched(struct kvm_pmc *pmc)
{
	bool select_os, select_user;
	u64 config;

	if (pmc_is_gp(pmc)) {
		config = pmc->eventsel;
		select_os = config & ARCH_PERFMON_EVENTSEL_OS;
		select_user = config & ARCH_PERFMON_EVENTSEL_USR;
	} else {
		config = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl,
					  pmc->idx - KVM_FIXED_PMC_BASE_IDX);
		select_os = config & 0x1;
		select_user = config & 0x2;
	}

	/*
	 * Skip the CPL lookup, which isn't free on Intel, if the result will
	 * be the same regardless of the CPL.
	 */
	if (select_os == select_user)
		return select_os;

	return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
}

void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
{
	DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	struct kvm_pmc *pmc;
	int i;

	BUILD_BUG_ON(sizeof(pmu->global_ctrl) * BITS_PER_BYTE != X86_PMC_IDX_MAX);

	if (!kvm_pmu_has_perf_global_ctrl(pmu))
		bitmap_copy(bitmap, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
	else if (!bitmap_and(bitmap, pmu->all_valid_pmc_idx,
			     (unsigned long *)&pmu->global_ctrl, X86_PMC_IDX_MAX))
		return;

	kvm_for_each_pmc(pmu, pmc, i, bitmap) {
		/* Ignore checks for edge detect, pin control, invert and CMASK bits */
		if (!pmc_is_eventsel_match(pmc, eventsel) ||
		    !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc))
			continue;

		kvm_pmu_incr_counter(pmc);
	}
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-09 22:52   ` Jim Mattson
@ 2023-11-09 23:46     ` Denis V. Lunev
  2023-11-10  0:01       ` Dongli Zhang
  2023-11-10  0:02       ` Jim Mattson
  0 siblings, 2 replies; 20+ messages in thread
From: Denis V. Lunev @ 2023-11-09 23:46 UTC (permalink / raw)
  To: Jim Mattson, Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel

On 11/9/23 23:52, Jim Mattson wrote:
> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
> <khorenko@virtuozzo.com> wrote:
>> Hi All,
>>
>> as a followup for my patch: i have noticed that
>> currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM
>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which
>> results in "-cpu pmu=off" qemu option).
>>
>> So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is
>> impossible by design?
> The AMD architectural specification prior to AMD PMU v2 does not allow
> one to describe a CPU (via CPUID or MSRs) that has fewer than 4
> general purpose PMU counters. While AMD PMU v2 does allow one to
> describe such a CPU, legacy software that knows nothing of AMD PMU v2
> can expect four counters regardless.
>
> Having said that, KVM does provide a per-VM capability for disabling
> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
> section 8.35 in Documentation/virt/kvm/api.rst.
But this means in particular that QEMU should immediately
use this KVM_PMU_CAP_DISABLE if this capability is supported and 
PMU=off. I am not seeing this code thus I believe that we have missed 
this. I think that this change worth adding. We will measure the impact 
:-) Den

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-09 23:46     ` Denis V. Lunev
@ 2023-11-10  0:01       ` Dongli Zhang
  2023-11-13  9:31         ` Denis V. Lunev
  2023-11-10  0:02       ` Jim Mattson
  1 sibling, 1 reply; 20+ messages in thread
From: Dongli Zhang @ 2023-11-10  0:01 UTC (permalink / raw)
  To: Denis V. Lunev, Jim Mattson, Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel



On 11/9/23 3:46 PM, Denis V. Lunev wrote:
> On 11/9/23 23:52, Jim Mattson wrote:
>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
>> <khorenko@virtuozzo.com> wrote:
>>> Hi All,
>>>
>>> as a followup for my patch: i have noticed that
>>> currently Intel kernel code provides an ability to detect if PMU is totally
>>> disabled for a VM
>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/>
>>> in the VM config which
>>> results in "-cpu pmu=off" qemu option).
>>>
>>> So the question is - is it possible to enhance the code for AMD to also honor
>>> PMU VM setting or it is
>>> impossible by design?
>> The AMD architectural specification prior to AMD PMU v2 does not allow
>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4
>> general purpose PMU counters. While AMD PMU v2 does allow one to
>> describe such a CPU, legacy software that knows nothing of AMD PMU v2
>> can expect four counters regardless.
>>
>> Having said that, KVM does provide a per-VM capability for disabling
>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
>> section 8.35 in Documentation/virt/kvm/api.rst.
> But this means in particular that QEMU should immediately
> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am
> not seeing this code thus I believe that we have missed this. I think that this
> change worth adding. We will measure the impact :-) Den
> 

I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw
many developers' attention.

https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/

It is time to first re-send that again.

Dongli Zhang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-09 23:46     ` Denis V. Lunev
  2023-11-10  0:01       ` Dongli Zhang
@ 2023-11-10  0:02       ` Jim Mattson
  1 sibling, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2023-11-10  0:02 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Konstantin Khorenko, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, kvm, linux-kernel

On Thu, Nov 9, 2023 at 3:46 PM Denis V. Lunev <den@virtuozzo.com> wrote:
>
> On 11/9/23 23:52, Jim Mattson wrote:
> > On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
> > <khorenko@virtuozzo.com> wrote:
> >> Hi All,
> >>
> >> as a followup for my patch: i have noticed that
> >> currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM
> >> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
> >> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which
> >> results in "-cpu pmu=off" qemu option).
> >>
> >> So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is
> >> impossible by design?
> > The AMD architectural specification prior to AMD PMU v2 does not allow
> > one to describe a CPU (via CPUID or MSRs) that has fewer than 4
> > general purpose PMU counters. While AMD PMU v2 does allow one to
> > describe such a CPU, legacy software that knows nothing of AMD PMU v2
> > can expect four counters regardless.
> >
> > Having said that, KVM does provide a per-VM capability for disabling
> > the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
> > section 8.35 in Documentation/virt/kvm/api.rst.
> But this means in particular that QEMU should immediately
> use this KVM_PMU_CAP_DISABLE if this capability is supported and
> PMU=off. I am not seeing this code thus I believe that we have missed
> this. I think that this change worth adding. We will measure the impact
> :-) Den

At present, KVM will still blindly cycle through each GP counter (4
minimum for AMD) until it checks vcpu->kvm->arch.enable_pmu at the top
of get_gp_pmc_amd().

Sean's proposal to clear the metadata should eliminate the overhead
for VMs with the virtual PMU disabled. My proposal should eliminate
the overhead even for VMs with the virtual PMU enabled, as long as no
counters are programmed for "instructions retired."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs
  2023-11-09 23:42   ` Sean Christopherson
@ 2023-11-10  0:17     ` Jim Mattson
  2023-11-10  0:52       ` Sean Christopherson
  0 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2023-11-10  0:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Konstantin Khorenko, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Thu, Nov 9, 2023 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Nov 09, 2023, Jim Mattson wrote:
> > On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko
> > <khorenko@virtuozzo.com> wrote:
> > >
> > > We have detected significant performance drop of our atomic test which
> > > checks the rate of CPUID instructions rate inside an L1 VM on an AMD
> > > node.
> > >
> > > Investigation led to 2 mainstream patches which have introduced extra
> > > events accounting:
> > >
> > >    018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
> > >    9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
> > >
> > > And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID
> > > rate.
> > >
> > > Checking latest mainsteam kernel the performance difference is much less
> > > but still quite noticeable: 13.4% and shows up on AMD CPUs only.
> > >
> > > Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap
> > > on Intel and expensive on AMD CPUs.
> > >
> > > So the idea behind this patch is to skip iterations over PMCs at all in
> > > case PMU is disabled for a VM completely or PMU is enabled for a VM, but
> > > there are no active PMCs at all.
> >
> > A better solution may be to maintain two bitmaps of general purpose
> > counters that need to be incremented, one for instructions retired and
> > one for branch instructions retired. Set or clear these bits whenever
> > the PerfEvtSelN MSRs are written. I think I would keep the PGC bits
> > separate, on those microarchitectures that support PGC. Then,
> > kvm_pmu_trigger_event() need only consult the appropriate bitmap (or
> > the logical and of that bitmap with PGC). In most cases, the value
> > will be zero, and the function can simply return.
> >
> > This would work even for AMD microarchitectures that don't support PGC.
>
> Yeah.  There are multiple lower-hanging fruits to be picked though, most of which
> won't conflict with using dedicated per-event bitmaps, or at worst are trivial
> to resolve.
>
>  1. Don't call into perf to get the eventsel (which generates an indirect call)
>     on every invocation, let alone every iteration.
>
>  2. Avoid getting the CPL when it's irrelevant.
>
>  3. Check the eventsel before querying the event filter.
>
>  4. Mask out PMCs that aren't globally enabled from the get-go (masking out
>     PMCs based on eventsel would essentially be the same as per-event bitmaps).

The code below only looks at PGC. Even on CPUs that support PGC, some
PMU clients still use the enable bits in the PerfEvtSelN. Linux perf,
for instance, can't seem to make up its mind whether to use PGC or
PerfEvtSelN.EN.

> I'm definitely not opposed to per-event bitmaps, but it'd be nice to avoid them,
> e.g. if we can eke out 99% of the performance just by doing a few obvious
> optimizations.
>
> This is the end result of what I'm testing and will (hopefully) post shortly:
>
> static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel)
> {
>         return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB);
> }

The top nybble of AMD's 3-nybble event select collides with Intel's
IN_TX and IN_TXCP bits. I think we can assert that the vCPU can't be
in a transaction if KVM is emulating an instruction, but this probably
merits a comment. The function name should also be more descriptive,
so that it doesn't get misused out of context. :)

> static inline bool cpl_is_matched(struct kvm_pmc *pmc)
> {
>         bool select_os, select_user;
>         u64 config;
>
>         if (pmc_is_gp(pmc)) {
>                 config = pmc->eventsel;
>                 select_os = config & ARCH_PERFMON_EVENTSEL_OS;
>                 select_user = config & ARCH_PERFMON_EVENTSEL_USR;
>         } else {
>                 config = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl,
>                                           pmc->idx - KVM_FIXED_PMC_BASE_IDX);
>                 select_os = config & 0x1;
>                 select_user = config & 0x2;
>         }
>
>         /*
>          * Skip the CPL lookup, which isn't free on Intel, if the result will
>          * be the same regardless of the CPL.
>          */
>         if (select_os == select_user)
>                 return select_os;
>
>         return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user;
> }
>
> void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel)
> {
>         DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX);
>         struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>         struct kvm_pmc *pmc;
>         int i;
>
>         BUILD_BUG_ON(sizeof(pmu->global_ctrl) * BITS_PER_BYTE != X86_PMC_IDX_MAX);
>
>         if (!kvm_pmu_has_perf_global_ctrl(pmu))
>                 bitmap_copy(bitmap, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>         else if (!bitmap_and(bitmap, pmu->all_valid_pmc_idx,
>                              (unsigned long *)&pmu->global_ctrl, X86_PMC_IDX_MAX))
>                 return;
>
>         kvm_for_each_pmc(pmu, pmc, i, bitmap) {
>                 /* Ignore checks for edge detect, pin control, invert and CMASK bits */
>                 if (!pmc_is_eventsel_match(pmc, eventsel) ||
>                     !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc))
>                         continue;
>
>                 kvm_pmu_incr_counter(pmc);
>         }
> }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs
  2023-11-10  0:17     ` Jim Mattson
@ 2023-11-10  0:52       ` Sean Christopherson
  2023-11-10  1:14         ` Sean Christopherson
  0 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2023-11-10  0:52 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Konstantin Khorenko, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Thu, Nov 09, 2023, Jim Mattson wrote:
> On Thu, Nov 9, 2023 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Nov 09, 2023, Jim Mattson wrote:
> > > A better solution may be to maintain two bitmaps of general purpose
> > > counters that need to be incremented, one for instructions retired and
> > > one for branch instructions retired. Set or clear these bits whenever
> > > the PerfEvtSelN MSRs are written. I think I would keep the PGC bits
> > > separate, on those microarchitectures that support PGC. Then,
> > > kvm_pmu_trigger_event() need only consult the appropriate bitmap (or
> > > the logical and of that bitmap with PGC). In most cases, the value
> > > will be zero, and the function can simply return.
> > >
> > > This would work even for AMD microarchitectures that don't support PGC.
> >
> > Yeah.  There are multiple lower-hanging fruits to be picked though, most of which
> > won't conflict with using dedicated per-event bitmaps, or at worst are trivial
> > to resolve.
> >
> >  1. Don't call into perf to get the eventsel (which generates an indirect call)
> >     on every invocation, let alone every iteration.
> >
> >  2. Avoid getting the CPL when it's irrelevant.
> >
> >  3. Check the eventsel before querying the event filter.
> >
> >  4. Mask out PMCs that aren't globally enabled from the get-go (masking out
> >     PMCs based on eventsel would essentially be the same as per-event bitmaps).
> 
> The code below only looks at PGC. Even on CPUs that support PGC, some
> PMU clients still use the enable bits in the PerfEvtSelN. Linux perf,
> for instance, can't seem to make up its mind whether to use PGC or
> PerfEvtSelN.EN.

Ugh, as in perf leaves all GLOBAL_CTRL bits set and toggles only the eventsel
enable bit?  Lame.

> > I'm definitely not opposed to per-event bitmaps, but it'd be nice to avoid them,
> > e.g. if we can eke out 99% of the performance just by doing a few obvious
> > optimizations.
> >
> > This is the end result of what I'm testing and will (hopefully) post shortly:
> >
> > static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel)
> > {
> >         return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB);
> > }
> 
> The top nybble of AMD's 3-nybble event select collides with Intel's
> IN_TX and IN_TXCP bits. I think we can assert that the vCPU can't be
> in a transaction if KVM is emulating an instruction, but this probably
> merits a comment.

Argh, more pre-existing crud.  This is silly, the vendor mask is already in
kvm_pmu_ops.EVENTSEL_EVENT.  What's one more patch...

> The function name should also be more descriptive, so that it doesn't get
> misused out of context. :)

Heh, I think I'll just delete the darn thing.  That also makes it easier to
understand the comment about ignoring certain fields.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs
  2023-11-10  0:52       ` Sean Christopherson
@ 2023-11-10  1:14         ` Sean Christopherson
  0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2023-11-10  1:14 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Konstantin Khorenko, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Denis V. Lunev

On Fri, Nov 10, 2023, Sean Christopherson wrote:
> On Thu, Nov 09, 2023, Jim Mattson wrote:
> > On Thu, Nov 9, 2023 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote:
> > > static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel)
> > > {
> > >         return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB);
> > > }
> > 
> > The top nybble of AMD's 3-nybble event select collides with Intel's
> > IN_TX and IN_TXCP bits. I think we can assert that the vCPU can't be
> > in a transaction if KVM is emulating an instruction, but this probably
> > merits a comment.
> 
> Argh, more pre-existing crud.  This is silly, the vendor mask is already in
> kvm_pmu_ops.EVENTSEL_EVENT.  What's one more patch...

Ah, I see what your saying.  Checking the bits is actually correct, probably through
sheer dumb luck.  I'll expand the comment to cover that and the reserved bits.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-10  0:01       ` Dongli Zhang
@ 2023-11-13  9:31         ` Denis V. Lunev
  2023-11-13 14:14           ` Dongli Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Denis V. Lunev @ 2023-11-13  9:31 UTC (permalink / raw)
  To: Dongli Zhang, Jim Mattson, Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Alexander Ivanov

On 11/10/23 01:01, Dongli Zhang wrote:
>
> On 11/9/23 3:46 PM, Denis V. Lunev wrote:
>> On 11/9/23 23:52, Jim Mattson wrote:
>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
>>> <khorenko@virtuozzo.com> wrote:
>>>> Hi All,
>>>>
>>>> as a followup for my patch: i have noticed that
>>>> currently Intel kernel code provides an ability to detect if PMU is totally
>>>> disabled for a VM
>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/>
>>>> in the VM config which
>>>> results in "-cpu pmu=off" qemu option).
>>>>
>>>> So the question is - is it possible to enhance the code for AMD to also honor
>>>> PMU VM setting or it is
>>>> impossible by design?
>>> The AMD architectural specification prior to AMD PMU v2 does not allow
>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4
>>> general purpose PMU counters. While AMD PMU v2 does allow one to
>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2
>>> can expect four counters regardless.
>>>
>>> Having said that, KVM does provide a per-VM capability for disabling
>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
>>> section 8.35 in Documentation/virt/kvm/api.rst.
>> But this means in particular that QEMU should immediately
>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am
>> not seeing this code thus I believe that we have missed this. I think that this
>> change worth adding. We will measure the impact :-) Den
>>
> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw
> many developers' attention.
>
> https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/
>
> It is time to first re-send that again.
>
> Dongli Zhang
We have checked that setting KVM_PMU_CAP_DISABLE really helps. 
Konstantin has done this and this is good. On the other hand, looking 
into these patches I disagree with them. We should not introduce new 
option for QEMU. If PMU is disabled, i.e. we assume that pmu=off passed 
in the command line, we should set KVM_PMU_CAP_DISABLE for that virtual 
machine. Den

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-13  9:31         ` Denis V. Lunev
@ 2023-11-13 14:14           ` Dongli Zhang
  2023-11-13 14:42             ` Denis V. Lunev
  0 siblings, 1 reply; 20+ messages in thread
From: Dongli Zhang @ 2023-11-13 14:14 UTC (permalink / raw)
  To: Denis V. Lunev, Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Alexander Ivanov, Jim Mattson

Hi Denis,

On 11/13/23 01:31, Denis V. Lunev wrote:
> On 11/10/23 01:01, Dongli Zhang wrote:
>>
>> On 11/9/23 3:46 PM, Denis V. Lunev wrote:
>>> On 11/9/23 23:52, Jim Mattson wrote:
>>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
>>>> <khorenko@virtuozzo.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> as a followup for my patch: i have noticed that
>>>>> currently Intel kernel code provides an ability to detect if PMU is totally
>>>>> disabled for a VM
>>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
>>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/>
>>>>> in the VM config which
>>>>> results in "-cpu pmu=off" qemu option).
>>>>>
>>>>> So the question is - is it possible to enhance the code for AMD to also honor
>>>>> PMU VM setting or it is
>>>>> impossible by design?
>>>> The AMD architectural specification prior to AMD PMU v2 does not allow
>>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4
>>>> general purpose PMU counters. While AMD PMU v2 does allow one to
>>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2
>>>> can expect four counters regardless.
>>>>
>>>> Having said that, KVM does provide a per-VM capability for disabling
>>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
>>>> section 8.35 in Documentation/virt/kvm/api.rst.
>>> But this means in particular that QEMU should immediately
>>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am
>>> not seeing this code thus I believe that we have missed this. I think that this
>>> change worth adding. We will measure the impact :-) Den
>>>
>> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw
>> many developers' attention.
>>
>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$
>> It is time to first re-send that again.
>>
>> Dongli Zhang
> We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has
> done this and this is good. On the other hand, looking into these patches I
> disagree with them. We should not introduce new option for QEMU. If PMU is
> disabled, i.e. we assume that pmu=off passed in the command line, we should set
> KVM_PMU_CAP_DISABLE for that virtual machine. Den

Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU?

In my opinion, cpu->enable_pmu indicates the option to control the cpu features.
It may be used by any accelerators, and it is orthogonal to the KVM cap.


The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator.


That's why I had introduced a new option, to allow to configure the VM in my
dimensions.

It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or
KVM cap.

Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list.

Thank you very much!

Dongli Zhang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-13 14:14           ` Dongli Zhang
@ 2023-11-13 14:42             ` Denis V. Lunev
  2023-11-13 16:17               ` Dongli Zhang
  0 siblings, 1 reply; 20+ messages in thread
From: Denis V. Lunev @ 2023-11-13 14:42 UTC (permalink / raw)
  To: Dongli Zhang, Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Alexander Ivanov, Jim Mattson

On 11/13/23 15:14, Dongli Zhang wrote:
> Hi Denis,
>
> On 11/13/23 01:31, Denis V. Lunev wrote:
>> On 11/10/23 01:01, Dongli Zhang wrote:
>>> On 11/9/23 3:46 PM, Denis V. Lunev wrote:
>>>> On 11/9/23 23:52, Jim Mattson wrote:
>>>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
>>>>> <khorenko@virtuozzo.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> as a followup for my patch: i have noticed that
>>>>>> currently Intel kernel code provides an ability to detect if PMU is totally
>>>>>> disabled for a VM
>>>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
>>>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/>
>>>>>> in the VM config which
>>>>>> results in "-cpu pmu=off" qemu option).
>>>>>>
>>>>>> So the question is - is it possible to enhance the code for AMD to also honor
>>>>>> PMU VM setting or it is
>>>>>> impossible by design?
>>>>> The AMD architectural specification prior to AMD PMU v2 does not allow
>>>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4
>>>>> general purpose PMU counters. While AMD PMU v2 does allow one to
>>>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2
>>>>> can expect four counters regardless.
>>>>>
>>>>> Having said that, KVM does provide a per-VM capability for disabling
>>>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
>>>>> section 8.35 in Documentation/virt/kvm/api.rst.
>>>> But this means in particular that QEMU should immediately
>>>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am
>>>> not seeing this code thus I believe that we have missed this. I think that this
>>>> change worth adding. We will measure the impact :-) Den
>>>>
>>> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw
>>> many developers' attention.
>>>
>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$
>>> It is time to first re-send that again.
>>>
>>> Dongli Zhang
>> We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has
>> done this and this is good. On the other hand, looking into these patches I
>> disagree with them. We should not introduce new option for QEMU. If PMU is
>> disabled, i.e. we assume that pmu=off passed in the command line, we should set
>> KVM_PMU_CAP_DISABLE for that virtual machine. Den
> Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU?
>
> In my opinion, cpu->enable_pmu indicates the option to control the cpu features.
> It may be used by any accelerators, and it is orthogonal to the KVM cap.
>
>
> The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator.
>
>
> That's why I had introduced a new option, to allow to configure the VM in my
> dimensions.
>
> It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or
> KVM cap.
>
> Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list.
>
> Thank you very much!
>
> Dongli Zhang
with the option pmu='off' it is expected that PMU should be
off for the guest. At the moment (without this KVM capability)
we can disable PMU for Intel only and thus have performance
degradation on AMD.

This option disables PMU and thus normally when we are
running KVM guest and wanting PMU to be off it would
be required to
* disable CPUID leaf for Intel
* set KVM_PMU_CAP_DISABLE for both processors This would be quite 
natural and transparent for the libvirt. Alexander will prepare the 
patch today or tomorrow for the discussion. Den

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-13 14:42             ` Denis V. Lunev
@ 2023-11-13 16:17               ` Dongli Zhang
  2023-11-13 16:33                 ` Denis V. Lunev
  0 siblings, 1 reply; 20+ messages in thread
From: Dongli Zhang @ 2023-11-13 16:17 UTC (permalink / raw)
  To: Denis V. Lunev, Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Alexander Ivanov, Jim Mattson



On 11/13/23 06:42, Denis V. Lunev wrote:
> On 11/13/23 15:14, Dongli Zhang wrote:
>> Hi Denis,
>>
>> On 11/13/23 01:31, Denis V. Lunev wrote:
>>> On 11/10/23 01:01, Dongli Zhang wrote:
>>>> On 11/9/23 3:46 PM, Denis V. Lunev wrote:
>>>>> On 11/9/23 23:52, Jim Mattson wrote:
>>>>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
>>>>>> <khorenko@virtuozzo.com> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> as a followup for my patch: i have noticed that
>>>>>>> currently Intel kernel code provides an ability to detect if PMU is totally
>>>>>>> disabled for a VM
>>>>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
>>>>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/>
>>>>>>> in the VM config which
>>>>>>> results in "-cpu pmu=off" qemu option).
>>>>>>>
>>>>>>> So the question is - is it possible to enhance the code for AMD to also
>>>>>>> honor
>>>>>>> PMU VM setting or it is
>>>>>>> impossible by design?
>>>>>> The AMD architectural specification prior to AMD PMU v2 does not allow
>>>>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4
>>>>>> general purpose PMU counters. While AMD PMU v2 does allow one to
>>>>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2
>>>>>> can expect four counters regardless.
>>>>>>
>>>>>> Having said that, KVM does provide a per-VM capability for disabling
>>>>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
>>>>>> section 8.35 in Documentation/virt/kvm/api.rst.
>>>>> But this means in particular that QEMU should immediately
>>>>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am
>>>>> not seeing this code thus I believe that we have missed this. I think that
>>>>> this
>>>>> change worth adding. We will measure the impact :-) Den
>>>>>
>>>> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not
>>>> draw
>>>> many developers' attention.
>>>>
>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$
>>>> It is time to first re-send that again.
>>>>
>>>> Dongli Zhang
>>> We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has
>>> done this and this is good. On the other hand, looking into these patches I
>>> disagree with them. We should not introduce new option for QEMU. If PMU is
>>> disabled, i.e. we assume that pmu=off passed in the command line, we should set
>>> KVM_PMU_CAP_DISABLE for that virtual machine. Den
>> Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU?
>>
>> In my opinion, cpu->enable_pmu indicates the option to control the cpu features.
>> It may be used by any accelerators, and it is orthogonal to the KVM cap.
>>
>>
>> The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator.
>>
>>
>> That's why I had introduced a new option, to allow to configure the VM in my
>> dimensions.
>>
>> It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or
>> KVM cap.
>>
>> Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list.
>>
>> Thank you very much!
>>
>> Dongli Zhang
> with the option pmu='off' it is expected that PMU should be
> off for the guest. At the moment (without this KVM capability)
> we can disable PMU for Intel only and thus have performance
> degradation on AMD.
> 
> This option disables PMU and thus normally when we are
> running KVM guest and wanting PMU to be off it would
> be required to
> * disable CPUID leaf for Intel
> * set KVM_PMU_CAP_DISABLE for both processors This would be quite natural and
> transparent for the libvirt. Alexander will prepare the patch today or tomorrow
> for the discussion. Den

That is what I had implemented in the v1 of patch.

https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com/

However, I changed that after people suggested introduce a new property.

Dongli Zhang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?
  2023-11-13 16:17               ` Dongli Zhang
@ 2023-11-13 16:33                 ` Denis V. Lunev
  0 siblings, 0 replies; 20+ messages in thread
From: Denis V. Lunev @ 2023-11-13 16:33 UTC (permalink / raw)
  To: Dongli Zhang, Konstantin Khorenko
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, kvm,
	linux-kernel, Alexander Ivanov, Jim Mattson

On 11/13/23 17:17, Dongli Zhang wrote:
>
> On 11/13/23 06:42, Denis V. Lunev wrote:
>> On 11/13/23 15:14, Dongli Zhang wrote:
>>> Hi Denis,
>>>
>>> On 11/13/23 01:31, Denis V. Lunev wrote:
>>>> On 11/10/23 01:01, Dongli Zhang wrote:
>>>>> On 11/9/23 3:46 PM, Denis V. Lunev wrote:
>>>>>> On 11/9/23 23:52, Jim Mattson wrote:
>>>>>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko
>>>>>>> <khorenko@virtuozzo.com> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> as a followup for my patch: i have noticed that
>>>>>>>> currently Intel kernel code provides an ability to detect if PMU is totally
>>>>>>>> disabled for a VM
>>>>>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
>>>>>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/>
>>>>>>>> in the VM config which
>>>>>>>> results in "-cpu pmu=off" qemu option).
>>>>>>>>
>>>>>>>> So the question is - is it possible to enhance the code for AMD to also
>>>>>>>> honor
>>>>>>>> PMU VM setting or it is
>>>>>>>> impossible by design?
>>>>>>> The AMD architectural specification prior to AMD PMU v2 does not allow
>>>>>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4
>>>>>>> general purpose PMU counters. While AMD PMU v2 does allow one to
>>>>>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2
>>>>>>> can expect four counters regardless.
>>>>>>>
>>>>>>> Having said that, KVM does provide a per-VM capability for disabling
>>>>>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See
>>>>>>> section 8.35 in Documentation/virt/kvm/api.rst.
>>>>>> But this means in particular that QEMU should immediately
>>>>>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am
>>>>>> not seeing this code thus I believe that we have missed this. I think that
>>>>>> this
>>>>>> change worth adding. We will measure the impact :-) Den
>>>>>>
>>>>> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not
>>>>> draw
>>>>> many developers' attention.
>>>>>
>>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$
>>>>> It is time to first re-send that again.
>>>>>
>>>>> Dongli Zhang
>>>> We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has
>>>> done this and this is good. On the other hand, looking into these patches I
>>>> disagree with them. We should not introduce new option for QEMU. If PMU is
>>>> disabled, i.e. we assume that pmu=off passed in the command line, we should set
>>>> KVM_PMU_CAP_DISABLE for that virtual machine. Den
>>> Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU?
>>>
>>> In my opinion, cpu->enable_pmu indicates the option to control the cpu features.
>>> It may be used by any accelerators, and it is orthogonal to the KVM cap.
>>>
>>>
>>> The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator.
>>>
>>>
>>> That's why I had introduced a new option, to allow to configure the VM in my
>>> dimensions.
>>>
>>> It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or
>>> KVM cap.
>>>
>>> Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list.
>>>
>>> Thank you very much!
>>>
>>> Dongli Zhang
>> with the option pmu='off' it is expected that PMU should be
>> off for the guest. At the moment (without this KVM capability)
>> we can disable PMU for Intel only and thus have performance
>> degradation on AMD.
>>
>> This option disables PMU and thus normally when we are
>> running KVM guest and wanting PMU to be off it would
>> be required to
>> * disable CPUID leaf for Intel
>> * set KVM_PMU_CAP_DISABLE for both processors This would be quite natural and
>> transparent for the libvirt. Alexander will prepare the patch today or tomorrow
>> for the discussion. Den
> That is what I had implemented in the v1 of patch.
>
> https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com/
>
> However, I changed that after people suggested introduce a new property.
>
> Dongli Zhang
That would save a bit of our work :)

For me this patch looks absolutely awesome and is doing exactly
what I want to do in our downstream. This would get us required
15+% benefit for each VMexit.

Den

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-11-13 16:33 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-09 18:06 [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Konstantin Khorenko
2023-11-09 18:06 ` [PATCH 1/1] KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC Konstantin Khorenko
2023-11-09 20:09   ` Sean Christopherson
2023-11-09 20:13   ` Jim Mattson
2023-11-09 23:05     ` Sean Christopherson
2023-11-09 18:18 ` KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM? Konstantin Khorenko
2023-11-09 22:52   ` Jim Mattson
2023-11-09 23:46     ` Denis V. Lunev
2023-11-10  0:01       ` Dongli Zhang
2023-11-13  9:31         ` Denis V. Lunev
2023-11-13 14:14           ` Dongli Zhang
2023-11-13 14:42             ` Denis V. Lunev
2023-11-13 16:17               ` Dongli Zhang
2023-11-13 16:33                 ` Denis V. Lunev
2023-11-10  0:02       ` Jim Mattson
2023-11-09 22:44 ` [PATCH 0/1] KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs Jim Mattson
2023-11-09 23:42   ` Sean Christopherson
2023-11-10  0:17     ` Jim Mattson
2023-11-10  0:52       ` Sean Christopherson
2023-11-10  1:14         ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).