* [PATCH 1/3] KVM: emulate: #GP when emulating rdpmc if CR0.PE is 1
From: Wanpeng Li @ 2021-10-08  9:57 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

The SDM gives the following operation for RDPMC:

  IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter)) 
      THEN
          EAX := counter[31:0];
          EDX := ZeroExtend(counter[MSCB:32]);
      ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
          #GP(0); 
  FI;

Add the missing CR0.PE == 1 check to the RDPMC emulation, as sketched below.
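
For illustration, a guest-side sketch (hypothetical helper, not part of
this patch) of the access this check guards; executed at CPL > 0 with
CR4.PCE = 0 and CR0.PE = 1, it must take #GP(0):

  #include <stdint.h>

  /* RDPMC takes the counter index in ECX and returns the counter
   * value in EDX:EAX. */
  static inline uint64_t guest_rdpmc(uint32_t counter)
  {
          uint32_t lo, hi;

          asm volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
          return ((uint64_t)hi << 32) | lo;
  }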

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/emulate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9a144ca8e146..ab7ec569e8c9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4213,6 +4213,7 @@ static int check_rdtsc(struct x86_emulate_ctxt *ctxt)
 static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
 {
 	u64 cr4 = ctxt->ops->get_cr(ctxt, 4);
+	u64 cr0 = ctxt->ops->get_cr(ctxt, 0);
 	u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
 
 	/*
@@ -4222,7 +4223,7 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
 	if (enable_vmware_backdoor && is_vmware_backdoor_pmc(rcx))
 		return X86EMUL_CONTINUE;
 
-	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
+	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt) && (cr0 & X86_CR0_PE)) ||
 	    ctxt->ops->check_pmc(ctxt, rcx))
 		return emulate_gp(ctxt, 0);
 
-- 
2.25.1



* [PATCH 2/3] KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0
From: Wanpeng Li @ 2021-10-08  9:57 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

SDM section 18.2.3 states:

  "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of 
   any general-purpose or fixed-function counters via a single WRMSR."

The SDM describes this MSR as R/W, but reading it on bare metal during
perf testing always returns 0 on the CLX/SKX boxes at hand. Fill get_msr
MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0 to match hardware behavior.
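
As a hedged model of the net semantics (illustrative only, not KVM
code): writes clear the selected overflow bits in
IA32_PERF_GLOBAL_STATUS, while reads always return 0.

  #include <stdint.h>

  struct pmu_model {
          uint64_t global_status;
  };

  /* WRMSR to OVF_CTRL: ack/clear the overflow bits set in 'val'. */
  static void ovf_ctrl_write(struct pmu_model *pmu, uint64_t val)
  {
          pmu->global_status &= ~val;
  }

  /* RDMSR of OVF_CTRL: always 0, matching bare-metal observation. */
  static uint64_t ovf_ctrl_read(struct pmu_model *pmu)
  {
          (void)pmu;
          return 0;
  }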

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
Btw, Xen also fills get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0.

 arch/x86/kvm/vmx/pmu_intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 10cc4f65c4ef..47260a8563f9 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -365,7 +365,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = pmu->global_ctrl;
 		return 0;
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-		msr_info->data = pmu->global_ovf_ctrl;
+		msr_info->data = 0;
 		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
-- 
2.25.1



* [PATCH 3/3] KVM: LAPIC: Optimize PMI delivering overhead
From: Wanpeng Li @ 2021-10-08  9:57 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

The overhead of kvm_vcpu_kick() is huge, due to the expensive RCU and
memory-barrier operations in rcuwait_wake_up(). It is even worse for
local delivery: the vCPU is already running, yet we still pay the full
cost. With ftrace we observe 12us+ for kvm_vcpu_kick() in the
kvm_pmu_deliver_pmi() path before the patch, and 6us+ after the
optimization.
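
The condition, as a standalone sketch (the patch open-codes it in
__apic_accept_irq(); the helper name is illustrative):

  /* A self-directed kick is redundant: the running vCPU re-checks
   * pending events, e.g. the NMI queued by kvm_inject_nmi(), before
   * it re-enters the guest. */
  static bool pmi_kick_needed(struct kvm_vcpu *vcpu)
  {
          return vcpu != kvm_get_running_vcpu();
  }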

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 76fb00921203..ec6997187c6d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1120,7 +1120,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 	case APIC_DM_NMI:
 		result = 1;
 		kvm_inject_nmi(vcpu);
-		kvm_vcpu_kick(vcpu);
+		if (vcpu != kvm_get_running_vcpu())
+			kvm_vcpu_kick(vcpu);
 		break;
 
 	case APIC_DM_INIT:
-- 
2.25.1



* Re: [PATCH 3/3] KVM: LAPIC: Optimize PMI delivering overhead
From: Vitaly Kuznetsov @ 2021-10-08 10:52 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson,
	Joerg Roedel

Wanpeng Li <kernellwp@gmail.com> writes:

> From: Wanpeng Li <wanpengli@tencent.com>
>
> The overhead of kvm_vcpu_kick() is huge, due to the expensive RCU and
> memory-barrier operations in rcuwait_wake_up(). It is even worse for
> local delivery: the vCPU is already running, yet we still pay the full
> cost. With ftrace we observe 12us+ for kvm_vcpu_kick() in the
> kvm_pmu_deliver_pmi() path before the patch, and 6us+ after the
> optimization.
>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/lapic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 76fb00921203..ec6997187c6d 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1120,7 +1120,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>  	case APIC_DM_NMI:
>  		result = 1;
>  		kvm_inject_nmi(vcpu);
> -		kvm_vcpu_kick(vcpu);
> +		if (vcpu != kvm_get_running_vcpu())
> +			kvm_vcpu_kick(vcpu);

Out of curiosity,

can this be converted into a generic optimization for kvm_vcpu_kick()
instead? I.e. if kvm_vcpu_kick() is called for the currently running
vCPU, there's almost nothing to do, especially when we already have a
request pending, right? (I didn't put too much thought into it)
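
Something along these lines, maybe (untested sketch):

  void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
  {
          /*
           * Hypothetical early-out: the running vCPU will re-check
           * pending requests before re-entering the guest anyway.
           */
          if (vcpu == kvm_get_running_vcpu())
                  return;

          if (kvm_vcpu_wake_up(vcpu))
                  return;

          /* ... existing IPI path unchanged ... */
  }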

>  		break;
>  
>  	case APIC_DM_INIT:

-- 
Vitaly



* Re: [PATCH 2/3] KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0
From: Like Xu @ 2021-10-08 11:02 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, Andi Kleen

cc Andi,

On 8/10/2021 5:57 pm, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> SDM section 18.2.3 states:
> 
>    "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
>     any general-purpose or fixed-function counters via a single WRMSR."
> 
> The SDM describes this MSR as R/W, but reading it on bare metal during
> perf testing always returns 0 on the CLX/SKX boxes at hand. Fill get_msr
> MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0 to match hardware behavior.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> Btw, Xen also fills get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0.
> 
>   arch/x86/kvm/vmx/pmu_intel.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 10cc4f65c4ef..47260a8563f9 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -365,7 +365,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		msr_info->data = pmu->global_ctrl;
>   		return 0;
>   	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> -		msr_info->data = pmu->global_ovf_ctrl;
> +		msr_info->data = 0;

Tested-by: Like Xu <likexu@tencent.com>

Further, it would be better to drop 'u64 global_ovf_ctrl' entirely.
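
i.e., something like this (sketch; surrounding fields from memory):

  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ struct kvm_pmu {
          u64 global_ctrl;
          u64 global_status;
  -       u64 global_ovf_ctrl;
          u64 global_ctrl_mask;

plus dropping the corresponding write in intel_pmu_set_msr().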

>   		return 0;
>   	default:
>   		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
> 


* Re: [PATCH 3/3] KVM: LAPIC: Optimize PMI delivering overhead
From: Wanpeng Li @ 2021-10-08 11:06 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: LKML, kvm, Paolo Bonzini, Sean Christopherson, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Fri, 8 Oct 2021 at 18:52, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Wanpeng Li <kernellwp@gmail.com> writes:
>
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > The overhead of kvm_vcpu_kick() is huge, due to the expensive RCU and
> > memory-barrier operations in rcuwait_wake_up(). It is even worse for
> > local delivery: the vCPU is already running, yet we still pay the full
> > cost. With ftrace we observe 12us+ for kvm_vcpu_kick() in the
> > kvm_pmu_deliver_pmi() path before the patch, and 6us+ after the
> > optimization.
> >
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/kvm/lapic.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 76fb00921203..ec6997187c6d 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1120,7 +1120,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> >       case APIC_DM_NMI:
> >               result = 1;
> >               kvm_inject_nmi(vcpu);
> > -             kvm_vcpu_kick(vcpu);
> > +             if (vcpu != kvm_get_running_vcpu())
> > +                     kvm_vcpu_kick(vcpu);
>
> Out of curiosity,
>
> can this be converted into a generic optimization for kvm_vcpu_kick()
> instead? I.e. if kvm_vcpu_kick() is called for the currently running
> vCPU, there's almost nothing to do, especially when we already have a
> request pending, right? (I didn't put too much thought into it)

I thought about it before, I will do it in the next version since you
also vote for it. :)

    Wanpeng


* Re: [PATCH 2/3] KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0
From: Wanpeng Li @ 2021-10-08 11:17 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm, linux-kernel, Andi Kleen

On Fri, 8 Oct 2021 at 19:02, Like Xu <like.xu.linux@gmail.com> wrote:
>
> cc Andi,
>
> On 8/10/2021 5:57 pm, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > SDM section 18.2.3 states:
> >
> >    "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
> >     any general-purpose or fixed-function counters via a single WRMSR."
> >
> > The SDM describes this MSR as R/W, but reading it on bare metal during
> > perf testing always returns 0 on the CLX/SKX boxes at hand. Fill get_msr
> > MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0 to match hardware behavior.
> >
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > Btw, Xen also fills get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL with 0.
> >
> >   arch/x86/kvm/vmx/pmu_intel.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > index 10cc4f65c4ef..47260a8563f9 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -365,7 +365,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >               msr_info->data = pmu->global_ctrl;
> >               return 0;
> >       case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> > -             msr_info->data = pmu->global_ovf_ctrl;
> > +             msr_info->data = 0;
>
> Tested-by: Like Xu <likexu@tencent.com>

Thanks.

> Further, it would be better to drop 'u64 global_ovf_ctrl' entirely.

Good suggestion. :)

    Wanpeng


* Re: [PATCH 1/3] KVM: emulate: #GP when emulating rdpmc if CR0.PE is 1
From: Sean Christopherson @ 2021-10-08 15:20 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

The shortlog makes it sound like "inject a #GP if CR0.PE=1", i.e. unconditionally
inject #GP for RDPMC in protected mode.  Maybe "Don't inject #GP when emulating
RDPMC if CR0.PE=0"?

On Fri, Oct 08, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> The SDM gives the following operation for RDPMC:
> 
>   IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter)) 
>       THEN
>           EAX := counter[31:0];
>           EDX := ZeroExtend(counter[MSCB:32]);
>       ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
>           #GP(0); 
>   FI;
> 
> Add the missing CR0.PE == 1 check to the RDPMC emulation.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/emulate.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 9a144ca8e146..ab7ec569e8c9 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4213,6 +4213,7 @@ static int check_rdtsc(struct x86_emulate_ctxt *ctxt)
>  static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
>  {
>  	u64 cr4 = ctxt->ops->get_cr(ctxt, 4);
> +	u64 cr0 = ctxt->ops->get_cr(ctxt, 0);
>  	u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>  
>  	/*
> @@ -4222,7 +4223,7 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
>  	if (enable_vmware_backdoor && is_vmware_backdoor_pmc(rcx))
>  		return X86EMUL_CONTINUE;
>  
> -	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
> +	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt) && (cr0 & X86_CR0_PE)) ||

I don't think it's possible for CPL to be >0 if CR0.PE=0, e.g. we could probably
WARN in the #GP path.  Realistically it doesn't add value though, so maybe just
add a blurb in the changelog saying this isn't strictly necessary?
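
For concreteness, the WARN variant would be something like this
(check_pmc() handling elided):

  if (!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) {
          /* CPL > 0 should imply CR0.PE = 1. */
          WARN_ON_ONCE(!(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE));
          return emulate_gp(ctxt, 0);
  }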

>  	    ctxt->ops->check_pmc(ctxt, rcx))
>  		return emulate_gp(ctxt, 0);
>  
> -- 
> 2.25.1
> 


* Re: [PATCH 3/3] KVM: LAPIC: Optimize PMI delivering overhead
From: Sean Christopherson @ 2021-10-08 15:59 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Vitaly Kuznetsov, LKML, kvm, Paolo Bonzini, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Fri, Oct 08, 2021, Wanpeng Li wrote:
> On Fri, 8 Oct 2021 at 18:52, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >
> > Wanpeng Li <kernellwp@gmail.com> writes:
> >
> > > From: Wanpeng Li <wanpengli@tencent.com>
> > >
> > > The overhead of kvm_vcpu_kick() is huge, due to the expensive RCU and
> > > memory-barrier operations in rcuwait_wake_up(). It is even worse for

Memory barriers on x86 are just compiler barriers.  The only meaningful overhead
is the locked transaction in rcu_read_lock() => preempt_disable().  I suspect the
performance benefit from this patch comes either from avoiding a second lock
when disabling preemption again for get_cpu(), or from avoiding the cmpxchg()
in kvm_vcpu_exiting_guest_mode().
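
For reference, rcuwait_wake_up() is roughly the following (paraphrased
from kernel/exit.c; details vary by kernel version):

  int rcuwait_wake_up(struct rcuwait *w)
  {
          int ret = 0;
          struct task_struct *task;

          rcu_read_lock();
          smp_mb(); /* order the wait condition vs. reading w->task */
          task = rcu_dereference(w->task);
          if (task)
                  ret = wake_up_process(task);
          rcu_read_unlock();

          return ret;
  }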

> > > local delivery: the vCPU is already running, yet we still pay the full
> > > cost. With ftrace we observe 12us+ for kvm_vcpu_kick() in the
> > > kvm_pmu_deliver_pmi() path before the patch, and 6us+ after the
> > > optimization.

Those numbers seem off; I wouldn't expect a few locks to take 6us.

> > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > ---
> > >  arch/x86/kvm/lapic.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 76fb00921203..ec6997187c6d 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -1120,7 +1120,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> > >       case APIC_DM_NMI:
> > >               result = 1;
> > >               kvm_inject_nmi(vcpu);
> > > -             kvm_vcpu_kick(vcpu);
> > > +             if (vcpu != kvm_get_running_vcpu())
> > > +                     kvm_vcpu_kick(vcpu);
> >
> > Out of curiosity,
> >
> > can this be converted into a generic optimization for kvm_vcpu_kick()
> > instead? I.e. if kvm_vcpu_kick() is called for the currently running
> > vCPU, there's almost nothing to do, especially when we already have a
> > request pending, right? (I didn't put too much thought into it)
> 
> I thought about it before, I will do it in the next version since you
> also vote for it. :)

Adding a kvm_get_running_vcpu() check before kvm_vcpu_wake_up() in kvm_vcpu_kick()
is not functionally correct as it's possible to reach kvm_vcpu_kick() from (soft)
IRQ context, e.g. hrtimer => apic_timer_expired() and pi_wakeup_handler().  If
the kick occurs after prepare_to_rcuwait() and the final kvm_vcpu_check_block(),
but before the vCPU is scheduled out, then the kvm_vcpu_wake_up() is required to
wake the vCPU, even if it is the current running vCPU.

The extra check might also degrade performance in many cases since the full kick
path would need to disable preemption three times, though if the overhead is from
x86's cmpxchg() then it's a moot point.

I think we'd want something like this to avoid extra preempt_disable() as well
as the cmpxchg() when @vcpu is the running vCPU.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b7dc6e89fd7..f148a7d2a8b9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3349,8 +3349,15 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
        int me, cpu;

-       if (kvm_vcpu_wake_up(vcpu))
-               return;
+       me = get_cpu();
+
+       if (rcuwait_active(&vcpu->wait) && kvm_vcpu_wake_up(vcpu))
+               goto out;
+
+       if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
+               WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE);
+               goto out;
+       }

        /*
         * Note, the vCPU could get migrated to a different pCPU at any point
@@ -3359,12 +3366,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
         * IPI is to force the vCPU to leave IN_GUEST_MODE, and migrating the
         * vCPU also requires it to leave IN_GUEST_MODE.
         */
-       me = get_cpu();
        if (kvm_arch_vcpu_should_kick(vcpu)) {
                cpu = READ_ONCE(vcpu->cpu);
                if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
                        smp_send_reschedule(cpu);
        }
+out:
        put_cpu();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);


* Re: [PATCH 1/3] KVM: emulate: #GP when emulating rdpmc if CR0.PE is 1
From: Wanpeng Li @ 2021-10-09  9:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Fri, 8 Oct 2021 at 23:20, Sean Christopherson <seanjc@google.com> wrote:
>
> The shortlog makes it sound like "inject a #GP if CR0.PE=1", i.e. unconditionally
> inject #GP for RDPMC in protected mode.  Maybe "Don't inject #GP when emulating
> RDPMC if CR0.PE=0"?
>

Agreed.

> On Fri, Oct 08, 2021, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > The SDM gives the following operation for RDPMC:
> >
> >   IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter))
> >       THEN
> >           EAX := counter[31:0];
> >           EDX := ZeroExtend(counter[MSCB:32]);
> >       ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
> >           #GP(0);
> >   FI;
> >
> > Add the missing CR0.PE == 1 check to the RDPMC emulation.
> >
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/kvm/emulate.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > index 9a144ca8e146..ab7ec569e8c9 100644
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -4213,6 +4213,7 @@ static int check_rdtsc(struct x86_emulate_ctxt *ctxt)
> >  static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
> >  {
> >       u64 cr4 = ctxt->ops->get_cr(ctxt, 4);
> > +     u64 cr0 = ctxt->ops->get_cr(ctxt, 0);
> >       u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
> >
> >       /*
> > @@ -4222,7 +4223,7 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
> >       if (enable_vmware_backdoor && is_vmware_backdoor_pmc(rcx))
> >               return X86EMUL_CONTINUE;
> >
> > -     if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
> > +     if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt) && (cr0 & X86_CR0_PE)) ||
>
> I don't think it's possible for CPL to be >0 if CR0.PE=0, e.g. we could probably
> WARN in the #GP path.  Realistically it doesn't add value though, so maybe just
> add a blurb in the changelog saying this isn't strictly necessary?

Will do in v2.

    Wanpeng


* Re: [PATCH 3/3] KVM: LAPIC: Optimize PMI delivering overhead
From: Wanpeng Li @ 2021-10-09  9:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, LKML, kvm, Paolo Bonzini, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Fri, 8 Oct 2021 at 23:59, Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Oct 08, 2021, Wanpeng Li wrote:
> > On Fri, 8 Oct 2021 at 18:52, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> > >
> > > Wanpeng Li <kernellwp@gmail.com> writes:
> > >
> > > > From: Wanpeng Li <wanpengli@tencent.com>
> > > >
> > > > The overhead of kvm_vcpu_kick() is huge, due to the expensive RCU and
> > > > memory-barrier operations in rcuwait_wake_up(). It is even worse for
>
> Memory barriers on x86 are just compiler barriers.  The only meaningful overhead
> is the locked transaction in rcu_read_lock() => preempt_disable().  I suspect the
> performance benefit from this patch comes either from avoiding a second lock
> when disabling preemption again for get_cpu(), or from avoiding the cmpxchg()
> in kvm_vcpu_exiting_guest_mode().
>
> > > > local delivery: the vCPU is already running, yet we still pay the full
> > > > cost. With ftrace we observe 12us+ for kvm_vcpu_kick() in the
> > > > kvm_pmu_deliver_pmi() path before the patch, and 6us+ after the
> > > > optimization.
>
> Those numbers seem off; I wouldn't expect a few locks to take 6us.

Maybe ftrace itself introduces extra overhead.

>
> > > > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > ---
> > > >  arch/x86/kvm/lapic.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 76fb00921203..ec6997187c6d 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -1120,7 +1120,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
> > > >       case APIC_DM_NMI:
> > > >               result = 1;
> > > >               kvm_inject_nmi(vcpu);
> > > > -             kvm_vcpu_kick(vcpu);
> > > > +             if (vcpu != kvm_get_running_vcpu())
> > > > +                     kvm_vcpu_kick(vcpu);
> > >
> > > Out of curiosity,
> > >
> > > can this be converted into a generic optimization for kvm_vcpu_kick()
> > > instead? I.e. if kvm_vcpu_kick() is called for the currently running
> > > vCPU, there's almost nothing to do, especially when we already have a
> > > request pending, right? (I didn't put too much thought into it)
> >
> > I thought about it before, I will do it in the next version since you
> > also vote for it. :)
>
> Adding a kvm_get_running_vcpu() check before kvm_vcpu_wake_up() in kvm_vcpu_kick()
> is not functionally correct as it's possible to reach kvm_vcpu_kick() from (soft)
> IRQ context, e.g. hrtimer => apic_timer_expired() and pi_wakeup_handler().  If
> the kick occurs after prepare_to_rcuwait() and the final kvm_vcpu_check_block(),
> but before the vCPU is scheduled out, then the kvm_vcpu_wake_up() is required to
> wake the vCPU, even if it is the current running vCPU.

Good point.

>
> The extra check might also degrade performance in many cases since the full kick
> path would need to disable preemption three times, though if the overhead is from
> x86's cmpxchg() then it's a moot point.
>
> I think we'd want something like this to avoid extra preempt_disable() as well
> as the cmpxchg() when @vcpu is the running vCPU.

Will do in v2, thanks for the suggestion.

    Wanpeng

