* [PATCH v3 1/3] KVM: emulate: Don't inject #GP when emulating RDPMC if CR0.PE=0
From: Wanpeng Li @ 2021-10-19  8:12 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

The SDM gives the following pseudocode for RDPMC:

  IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter))
      THEN
          EAX := counter[31:0];
          EDX := ZeroExtend(counter[MSCB:32]);
      ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
          #GP(0);
  FI;

Add the CR0.PE = 1 check to RDPMC emulation, though this isn't strictly
necessary since it's impossible for CPL to be > 0 when CR0.PE = 0.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v2 -> v3:
 * add the missing 'S'
v1 -> v2:
 * update patch description

 arch/x86/kvm/emulate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9a144ca8e146..ab7ec569e8c9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4213,6 +4213,7 @@ static int check_rdtsc(struct x86_emulate_ctxt *ctxt)
 static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
 {
 	u64 cr4 = ctxt->ops->get_cr(ctxt, 4);
+	u64 cr0 = ctxt->ops->get_cr(ctxt, 0);
 	u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
 
 	/*
@@ -4222,7 +4223,7 @@ static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
 	if (enable_vmware_backdoor && is_vmware_backdoor_pmc(rcx))
 		return X86EMUL_CONTINUE;
 
-	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
+	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt) && (cr0 & X86_CR0_PE)) ||
 	    ctxt->ops->check_pmc(ctxt, rcx))
 		return emulate_gp(ctxt, 0);
 
-- 
2.25.1



* [PATCH v3 2/3] KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0
From: Wanpeng Li @ 2021-10-19  8:12 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

SDM section 18.2.3 says:

  "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
   any general-purpose or fixed-function counters via a single WRMSR."

The SDM documents this MSR as R/W, but reading it on bare metal during
perf testing always returns 0 on the ICX/SKX boxes at hand. Make get_msr
return 0 for MSR_CORE_PERF_GLOBAL_OVF_CTRL to match hardware behavior,
and drop the global_ovf_ctrl variable.
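
For a quick guest-side sanity check, something along these lines can be
used (an illustrative sketch, not part of this patch; it must run at
CPL 0, and 0x390 is the architectural index of IA32_PERF_GLOBAL_OVF_CTRL):

	#include <stdint.h>

	static inline uint64_t rdmsr64(uint32_t index)
	{
		uint32_t lo, hi;

		asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(index));
		return lo | ((uint64_t)hi << 32);
	}

	static int global_ovf_ctrl_reads_zero(void)
	{
		/* With this patch the read always returns 0, matching the
		 * bare-metal ICX/SKX behavior described above. */
		return rdmsr64(0x390) == 0;
	}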

Tested-by: Like Xu <likexu@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
Btw, Xen also returns 0 for reads of MSR_CORE_PERF_GLOBAL_OVF_CTRL.
v1 -> v2:
 * drop 'u64 global_ovf_ctrl' directly

 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/vmx/pmu_intel.c    | 6 ++----
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8f48a7ec577..7aaac918e992 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -499,7 +499,6 @@ struct kvm_pmu {
 	u64 fixed_ctr_ctrl;
 	u64 global_ctrl;
 	u64 global_status;
-	u64 global_ovf_ctrl;
 	u64 counter_bitmask[2];
 	u64 global_ctrl_mask;
 	u64 global_ovf_ctrl_mask;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 10cc4f65c4ef..b8e0d21b7c8a 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -365,7 +365,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = pmu->global_ctrl;
 		return 0;
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-		msr_info->data = pmu->global_ovf_ctrl;
+		msr_info->data = 0;
 		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
@@ -423,7 +423,6 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!(data & pmu->global_ovf_ctrl_mask)) {
 			if (!msr_info->host_initiated)
 				pmu->global_status &= ~data;
-			pmu->global_ovf_ctrl = data;
 			return 0;
 		}
 		break;
@@ -588,8 +587,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
 		pmc->counter = 0;
 	}
 
-	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
-		pmu->global_ovf_ctrl = 0;
+	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
 
 	intel_pmu_release_guest_lbr_event(vcpu);
 }
-- 
2.25.1



* [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Wanpeng Li @ 2021-10-19  8:12 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

A vCPU kick often follows a pending request, even when @vcpu is the
running vCPU. In that case the kick pays for both rcuwait_wake_up(),
which performs RCU and memory-barrier operations, and cmpxchg(). Check
vcpu->wait before calling rcuwait_wake_up(), and check whether @vcpu is
the running vCPU before the cmpxchg(), to cut this overhead.

Evaluating kvm-unit-tests/vmexit.flat on an Intel ICX box, most of the
scores improve by ~600 CPU cycles, especially when APICv is disabled:

tscdeadline_immed
tscdeadline
self_ipi_sti_nop
..............
x2apic_self_ipi_tpr_sti_hlt

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v2 -> v3:
 * use kvm_arch_vcpu_get_wait()
v1 -> v2:
 * move checking running vCPU logic to kvm_vcpu_kick
 * check rcuwait_active(&vcpu->wait) etc

 virt/kvm/kvm_main.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7851f3a1b5f7..1bc52eab0a7d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3314,8 +3314,15 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
 	int me, cpu;
 
-	if (kvm_vcpu_wake_up(vcpu))
-		return;
+	me = get_cpu();
+
+	if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
+		goto out;
+
+	if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
+		WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE);
+		goto out;
+	}
 
 	/*
 	 * Note, the vCPU could get migrated to a different pCPU at any point
@@ -3324,12 +3331,12 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 	 * IPI is to force the vCPU to leave IN_GUEST_MODE, and migrating the
 	 * vCPU also requires it to leave IN_GUEST_MODE.
 	 */
-	me = get_cpu();
 	if (kvm_arch_vcpu_should_kick(vcpu)) {
 		cpu = READ_ONCE(vcpu->cpu);
 		if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
 			smp_send_reschedule(cpu);
 	}
+out:
 	put_cpu();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
-- 
2.25.1



* Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Paolo Bonzini @ 2021-10-19 16:59 UTC
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On 19/10/21 10:12, Wanpeng Li wrote:
> -	if (kvm_vcpu_wake_up(vcpu))
> -		return;
> +	me = get_cpu();
> +
> +	if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> +		goto out;

This is racy.  You are basically doing the same check that 
rcuwait_wake_up does, but without the memory barrier before.
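
For illustration, the wait/wake pairing looks roughly like this (a
minimal sketch of the rcuwait protocol, not the actual KVM code;
"wake_condition" stands in for the vCPU's pending work):

	static bool wake_condition;

	static void waiter_side(struct rcuwait *w)
	{
		prepare_to_rcuwait(w);			/* w->task = current */
		set_current_state(TASK_INTERRUPTIBLE);	/* implies a full barrier */
		if (!READ_ONCE(wake_condition))
			schedule();
		finish_rcuwait(w);			/* w->task = NULL */
	}

	static void waker_side(struct rcuwait *w)
	{
		WRITE_ONCE(wake_condition, true);
		/*
		 * rcuwait_wake_up() does smp_mb() before loading w->task,
		 * ordering the condition store against that load.  A bare
		 * rcuwait_active() check done before that barrier can read
		 * a stale NULL and skip a wakeup the waiter needs.
		 */
		rcuwait_wake_up(w);
	}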

Also here:

> +	if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
> +		WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE);

it's better to do

	if (vcpu == ... && !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
		goto out;

so that if the bug happens you do get a smp_send_reschedule() and fail 
safely.

Paolo


* Re: [PATCH v3 2/3] KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0
From: Paolo Bonzini @ 2021-10-19 16:59 UTC
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On 19/10/21 10:12, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> SDM section 18.2.3 says:
> 
>    "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
>     any general-purpose or fixed-function counters via a single WRMSR."
> 
> The SDM documents this MSR as R/W, but reading it on bare metal during
> perf testing always returns 0 on the ICX/SKX boxes at hand. Make get_msr
> return 0 for MSR_CORE_PERF_GLOBAL_OVF_CTRL to match hardware behavior,
> and drop the global_ovf_ctrl variable.
> 
> Tested-by: Like Xu <likexu@tencent.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> Btw, xen also fills get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0.
> v1 -> v2:
>   * drop 'u64 global_ovf_ctrl' directly
> 
>   arch/x86/include/asm/kvm_host.h | 1 -
>   arch/x86/kvm/vmx/pmu_intel.c    | 6 ++----
>   2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f8f48a7ec577..7aaac918e992 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -499,7 +499,6 @@ struct kvm_pmu {
>   	u64 fixed_ctr_ctrl;
>   	u64 global_ctrl;
>   	u64 global_status;
> -	u64 global_ovf_ctrl;
>   	u64 counter_bitmask[2];
>   	u64 global_ctrl_mask;
>   	u64 global_ovf_ctrl_mask;
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 10cc4f65c4ef..b8e0d21b7c8a 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -365,7 +365,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		msr_info->data = pmu->global_ctrl;
>   		return 0;
>   	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> -		msr_info->data = pmu->global_ovf_ctrl;
> +		msr_info->data = 0;
>   		return 0;
>   	default:
>   		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
> @@ -423,7 +423,6 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		if (!(data & pmu->global_ovf_ctrl_mask)) {
>   			if (!msr_info->host_initiated)
>   				pmu->global_status &= ~data;
> -			pmu->global_ovf_ctrl = data;
>   			return 0;
>   		}
>   		break;
> @@ -588,8 +587,7 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>   		pmc->counter = 0;
>   	}
>   
> -	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
> -		pmu->global_ovf_ctrl = 0;
> +	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
>   
>   	intel_pmu_release_guest_lbr_event(vcpu);
>   }
> 

Queued, thanks.

Paolo



* Re: [PATCH v3 1/3] KVM: emulate: Don't inject #GP when emulating RDPMC if CR0.PE=0
From: Paolo Bonzini @ 2021-10-19 17:00 UTC
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On 19/10/21 10:12, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> The SDM gives the following pseudocode for RDPMC:
> 
>    IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter))
>        THEN
>            EAX := counter[31:0];
>            EDX := ZeroExtend(counter[MSCB:32]);
>        ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
>            #GP(0);
>    FI;
> 
> Add the CR0.PE = 1 check to RDPMC emulation, though this isn't strictly
> necessary since it's impossible for CPL to be > 0 when CR0.PE = 0.

Why not just add a comment then instead?

Paolo


* Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Sean Christopherson @ 2021-10-19 17:34 UTC
  To: Paolo Bonzini
  Cc: Wanpeng Li, linux-kernel, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Tue, Oct 19, 2021, Paolo Bonzini wrote:
> On 19/10/21 10:12, Wanpeng Li wrote:
> > -	if (kvm_vcpu_wake_up(vcpu))
> > -		return;
> > +	me = get_cpu();
> > +
> > +	if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> > +		goto out;
> 
> This is racy.  You are basically doing the same check that rcuwait_wake_up
> does, but without the memory barrier before.

I was worried that was the case[*], but I didn't have the two hours it would have
taken me to verify there was indeed a problem :-)

The intent of the extra check was to avoid the locked instruction that comes with
disabling preemption via rcu_read_lock().  But thinking more, the extra op should
be little more than a basic arithmetic operation in the grand scheme on modern x86
since the cache line is going to be locked and written no matter what, either
immediately before or immediately after.

So with Paolo's other comment, maybe just this?  And if this doesn't provide the
desired performance boost, changes to the rcuwait behavior should go in a
separate patch.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9ec99f5b972c..ebc6d4f2fbfa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3333,11 +3333,22 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
         * vCPU also requires it to leave IN_GUEST_MODE.
         */
        me = get_cpu();
+
+       /*
+        * Avoid the moderately expensive "should kick" operation if this pCPU
+        * is currently running the target vCPU, in which case it's a KVM bug
+        * if the vCPU is in the inner run loop.
+        */
+       if (vcpu == __this_cpu_read(kvm_running_vcpu) &&
+           !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
+               goto out;
+
        if (kvm_arch_vcpu_should_kick(vcpu)) {
                cpu = READ_ONCE(vcpu->cpu);
                if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
                        smp_send_reschedule(cpu);
        }
+out:
        put_cpu();
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);

[*] https://lkml.kernel.org/r/YWoOG40Ap0Islpu2@google.com


* Re: [PATCH v3 1/3] KVM: emulate: Don't inject #GP when emulating RDPMC if CR0.PE=0
From: Wanpeng Li @ 2021-10-20  1:56 UTC
  To: Paolo Bonzini
  Cc: LKML, kvm, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, 20 Oct 2021 at 01:00, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 19/10/21 10:12, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > The SDM gives the following pseudocode for RDPMC:
> >
> >    IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter))
> >        THEN
> >            EAX := counter[31:0];
> >            EDX := ZeroExtend(counter[MSCB:32]);
> >        ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
> >            #GP(0);
> >    FI;
> >
> > Add the CR0.PE = 1 check to RDPMC emulation, though this isn't strictly
> > necessary since it's impossible for CPL to be > 0 when CR0.PE = 0.
>
> Why not just add a comment then instead?

Will do.
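
E.g. something along these lines (just a sketch of such a comment):

	/*
	 * CR0.PE is intentionally not checked: CPL can only be non-zero
	 * when CR0.PE=1, so the existing CPL check already covers the
	 * SDM's CR0.PE=0 case.
	 */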

    Wanpeng


* Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Wanpeng Li @ 2021-10-20  2:49 UTC
  To: Sean Christopherson
  Cc: Paolo Bonzini, LKML, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, 20 Oct 2021 at 01:34, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Oct 19, 2021, Paolo Bonzini wrote:
> > On 19/10/21 10:12, Wanpeng Li wrote:
> > > -   if (kvm_vcpu_wake_up(vcpu))
> > > -           return;
> > > +   me = get_cpu();
> > > +
> > > +   if (rcuwait_active(kvm_arch_vcpu_get_wait(vcpu)) && kvm_vcpu_wake_up(vcpu))
> > > +           goto out;
> >
> > This is racy.  You are basically doing the same check that rcuwait_wake_up
> > does, but without the memory barrier before.
>
> I was worried that was the case[*], but I didn't have the two hours it would have
> taken me to verify there was indeed a problem :-)
>
> The intent of the extra check was to avoid the locked instruction that comes with
> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
> be little more than a basic arithmetic operation in the grand scheme on modern x86
> since the cache line is going to be locked and written no matter what, either
> immediately before or immediately after.

I observe that the main overhead of rcuwait_wake_up() comes from the RCU
operations, especially rcu_read_lock/unlock().

>
> So with Paolo's other comment, maybe just this?  And if this doesn't provide the
> desired performance boost, changes to the rcuwait behavior should go in a
> separate patch.

Ok.

    Wanpeng


* Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Paolo Bonzini @ 2021-10-20  6:47 UTC
  To: Wanpeng Li, Sean Christopherson
  Cc: LKML, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On 20/10/21 04:49, Wanpeng Li wrote:
>> The intent of the extra check was to avoid the locked instruction that comes with
>> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
>> be little more than a basic arithmetic operation in the grand scheme on modern x86
>> since the cache line is going to be locked and written no matter what, either
>> immediately before or immediately after.
>
> I observe the main overhead of rcuwait_wake_up() is from rcu
> operations, especially rcu_read_lock/unlock().

Do you have CONFIG_PREEMPT_RCU set?  If so, maybe something like this would help:

diff --git a/kernel/exit.c b/kernel/exit.c
index fd1c04193e18..ca1e60a1234d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -235,8 +235,6 @@ int rcuwait_wake_up(struct rcuwait *w)
  	int ret = 0;
  	struct task_struct *task;
  
-	rcu_read_lock();
-
  	/*
  	 * Order condition vs @task, such that everything prior to the load
  	 * of @task is visible. This is the condition as to why the user called
@@ -250,6 +248,14 @@ int rcuwait_wake_up(struct rcuwait *w)
  	 */
  	smp_mb(); /* (B) */
  
+#ifdef CONFIG_PREEMPT_RCU
+	/* The cost of rcu_read_lock() is nontrivial for preemptable RCU.  */
+	if (!rcuwait_active(w))
+		return ret;
+#endif
+
+	rcu_read_lock();
+
  	task = rcu_dereference(w->task);
  	if (task)
  		ret = wake_up_process(task);

(If you don't, rcu_read_lock is essentially preempt_disable() and it
should not have a large overhead).  You still need the memory barrier
though, in order to avoid missed wakeups; shameless plug for my
article at https://lwn.net/Articles/847481/.

Paolo


* Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Wanpeng Li @ 2021-10-20 10:02 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, LKML, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, 20 Oct 2021 at 14:47, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 20/10/21 04:49, Wanpeng Li wrote:
> >> The intent of the extra check was to avoid the locked instruction that comes with
> >> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
> >> be little more than a basic arithmetic operation in the grand scheme on modern x86
> >> since the cache line is going to be locked and written no matter what, either
> >> immediately before or immediately after.
> >
> > I observe the main overhead of rcuwait_wake_up() is from rcu
> > operations, especially rcu_read_lock/unlock().
>
> Do you have CONFIG_PREEMPT_RCU set?  If so, maybe something like this would help:

Yes.

>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index fd1c04193e18..ca1e60a1234d 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -235,8 +235,6 @@ int rcuwait_wake_up(struct rcuwait *w)
>         int ret = 0;
>         struct task_struct *task;
>
> -       rcu_read_lock();
> -
>         /*
>          * Order condition vs @task, such that everything prior to the load
>          * of @task is visible. This is the condition as to why the user called
> @@ -250,6 +248,14 @@ int rcuwait_wake_up(struct rcuwait *w)
>          */
>         smp_mb(); /* (B) */
>
> +#ifdef CONFIG_PREEMPT_RCU
> +       /* The cost of rcu_read_lock() is nontrivial for preemptable RCU.  */
> +       if (!rcuwait_active(w))
> +               return ret;
> +#endif
> +
> +       rcu_read_lock();
> +
>         task = rcu_dereference(w->task);
>         if (task)
>                 ret = wake_up_process(task);
>
> (If you don't, rcu_read_lock is essentially preempt_disable() and it
> should not have a large overhead).  You still need the memory barrier
> though, in order to avoid missed wakeups; shameless plug for my
> article at https://lwn.net/Articles/847481/.

You are right, the cost of rcu_read_lock() for preemptable RCU
introduces too much overhead. Do you want to send a separate patch?

    Wanpeng


* Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Paolo Bonzini @ 2021-10-20 10:33 UTC
  To: Sean Christopherson
  Cc: Wanpeng Li, linux-kernel, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On 19/10/21 19:34, Sean Christopherson wrote:
> The intent of the extra check was to avoid the locked instruction that comes with
> disabling preemption via rcu_read_lock().  But thinking more, the extra op should
> be little more than a basic arithmetic operation in the grand scheme on modern x86
> since the cache line is going to be locked and written no matter what, either
> immediately before or immediately after.

There should be no locked instructions unless you're using 
PREEMPT_RT/PREEMPT_RCU, no?  The preempt_disable count is in a percpu 
variable.
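
For reference, with !CONFIG_PREEMPT_RCU the relevant definitions reduce
to roughly this (a simplified sketch; on x86 the increment compiles to a
plain percpu "incl %gs:__preempt_count" with no LOCK prefix):

	static inline void rcu_read_lock(void)
	{
		preempt_disable();
	}

	#define preempt_disable() \
	do { \
		preempt_count_inc();	/* percpu increment, not atomic */ \
		barrier();		/* compiler barrier only */ \
	} while (0)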

> 
> +       /*
> +        * Avoid the moderately expensive "should kick" operation if this pCPU
> +        * is currently running the target vCPU, in which case it's a KVM bug
> +        * if the vCPU is in the inner run loop.
> +        */
> +       if (vcpu == __this_cpu_read(kvm_running_vcpu) &&
> +           !WARN_ON_ONCE(vcpu->mode == IN_GUEST_MODE))
> +               goto out;
> +

It should not even be a problem if vcpu->mode == IN_GUEST_MODE; you can
just set it to EXITING_GUEST_MODE without even needing atomic_cmpxchg.
I'll send a few patches out, since I think I found some related issues.
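
The running-vCPU case would then look roughly like this (an untested
sketch of the idea, not an actual patch):

	if (vcpu == __this_cpu_read(kvm_running_vcpu)) {
		/*
		 * Kicking the vCPU currently running on this pCPU: no IPI
		 * and no atomic_cmpxchg() needed, a plain write is enough
		 * because we are in this vCPU's own context.
		 */
		if (READ_ONCE(vcpu->mode) == IN_GUEST_MODE)
			WRITE_ONCE(vcpu->mode, EXITING_GUEST_MODE);
		goto out;
	}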

Paolo



* Re: [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU
From: Paolo Bonzini @ 2021-10-20 10:37 UTC
  To: Wanpeng Li
  Cc: Sean Christopherson, LKML, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On 20/10/21 12:02, Wanpeng Li wrote:
>> +#ifdef CONFIG_PREEMPT_RCU
>> +       /* The cost of rcu_read_lock() is nontrivial for preemptable RCU.  */
>> +       if (!rcuwait_active(w))
>> +               return ret;
>> +#endif
>> +
>> +       rcu_read_lock();
>> +
>>          task = rcu_dereference(w->task);
>>          if (task)
>>                  ret = wake_up_process(task);
>>
>> (If you don't, rcu_read_lock is essentially preempt_disable() and it
>> should not have a large overhead).  You still need the memory barrier
>> though, in order to avoid missed wakeups; shameless plug for my
>> article at https://lwn.net/Articles/847481/.
> You are right, the cost of rcu_read_lock() for preemptable RCU
> introduces too much overhead. Do you want to send a separate patch?

Yes, I'll take care of this.  Thanks!

Paolo



Thread overview: 13+ messages
2021-10-19  8:12 [PATCH v3 1/3] KVM: emulate: Don't inject #GP when emulating RDPMC if CR0.PE=0 Wanpeng Li
2021-10-19  8:12 ` [PATCH v3 2/3] KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 Wanpeng Li
2021-10-19 16:59   ` Paolo Bonzini
2021-10-19  8:12 ` [PATCH v3 3/3] KVM: vCPU kick tax cut for running vCPU Wanpeng Li
2021-10-19 16:59   ` Paolo Bonzini
2021-10-19 17:34     ` Sean Christopherson
2021-10-20  2:49       ` Wanpeng Li
2021-10-20  6:47         ` Paolo Bonzini
2021-10-20 10:02           ` Wanpeng Li
2021-10-20 10:37             ` Paolo Bonzini
2021-10-20 10:33       ` Paolo Bonzini
2021-10-19 17:00 ` [PATCH v3 1/3] KVM: emulate: Don't inject #GP when emulating RDPMC if CR0.PE=0 Paolo Bonzini
2021-10-20  1:56   ` Wanpeng Li
