All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read
@ 2019-05-17  8:49 Wanpeng Li
  2019-05-17  8:49 ` [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-05-17  8:49 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

Allow guest reads CORE cstate when exposing host CPU power management capabilities 
to the guest. PKG cstate is restricted to avoid a guest to get the whole package 
information in multi-tenant scenario.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/vmx/vmx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 771d3bf..b0d6be5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6615,6 +6615,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
 	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+	if (kvm_mwait_in_guest(kvm)) {
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
+	}
 	vmx->msr_bitmap_mode = 0;
 
 	vmx->loaded_vmcs = &vmx->vmcs01;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit
  2019-05-17  8:49 [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Wanpeng Li
@ 2019-05-17  8:49 ` Wanpeng Li
  2019-05-20 10:34   ` Paolo Bonzini
  2019-05-17  8:49 ` [PATCH 3/4] KVM: Fix spinlock taken warning during host resume Wanpeng Li
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2019-05-17  8:49 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon, Konrad Rzeszutek Wilk

From: Wanpeng Li <wanpengli@tencent.com>

MSR IA32_MSIC_ENABLE bit 18, according to SDM:

| When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0).
| This indicates that MONITOR/MWAIT are not supported.
|
| Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0.
|
| When this bit is set to 1 (default), MONITOR/MWAIT are supported (CPUID.01H:ECX[bit 3] = 1).

The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
guest visibility.

This patch implements toggling of the CPUID bit based on guest writes
to the MSR.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/cpuid.c | 8 ++++++++
 arch/x86/kvm/x86.c   | 9 +++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd39516..9244d63 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,6 +137,14 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
 		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
 
+	best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+	if (best) {
+		if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT)
+			best->ecx |= F(MWAIT);
+		else
+			best->ecx &= ~F(MWAIT);
+	}
+
 	/* Update physical-address width */
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
 	kvm_mmu_reset_context(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1d89cb9..f2e3847 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2506,6 +2506,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_MISC_ENABLE:
+		if ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT) {
+			if ((vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT) &&
+				!(data & MSR_IA32_MISC_ENABLE_MWAIT)) {
+				if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
+					return 1;
+			}
+			vcpu->arch.ia32_misc_enable_msr = data;
+			kvm_update_cpuid(vcpu);
+		}
 		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
 	case MSR_IA32_SMBASE:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] KVM: Fix spinlock taken warning during host resume
  2019-05-17  8:49 [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Wanpeng Li
  2019-05-17  8:49 ` [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
@ 2019-05-17  8:49 ` Wanpeng Li
  2019-05-20 10:36   ` Paolo Bonzini
  2019-05-17  8:49 ` [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context Wanpeng Li
  2019-05-20 10:30 ` [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Paolo Bonzini
  3 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2019-05-17  8:49 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Paul E . McKenney

From: Wanpeng Li <wanpengli@tencent.com>

 WARNING: CPU: 0 PID: 13554 at kvm/arch/x86/kvm//../../../virt/kvm/kvm_main.c:4183 kvm_resume+0x3c/0x40 [kvm]
  CPU: 0 PID: 13554 Comm: step_after_susp Tainted: G           OE     5.1.0-rc4+ #1
  RIP: 0010:kvm_resume+0x3c/0x40 [kvm]
  Call Trace:
   syscore_resume+0x63/0x2d0
   suspend_devices_and_enter+0x9d1/0xa40
   pm_suspend+0x33a/0x3b0
   state_store+0x82/0xf0
   kobj_attr_store+0x12/0x20
   sysfs_kf_write+0x4b/0x60
   kernfs_fop_write+0x120/0x1a0
   __vfs_write+0x1b/0x40
   vfs_write+0xcd/0x1d0
   ksys_write+0x5f/0xe0
   __x64_sys_write+0x1a/0x20
   do_syscall_64+0x6f/0x6c0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Commit ca84d1a24 (KVM: x86: Add clock sync request to hardware enable) mentioned 
that "we always hold kvm_lock when hardware_enable is called.  The one place that 
doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock 
won't be taken." However, commit 6706dae9 (virt/kvm: Replace spin_is_locked() with 
lockdep) introduces a bug, it asserts when the lock is not held which is contrary 
to the original goal. 

This patch fixes it by WARN_ON when the lock is held.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5fb0f16..c7eab5f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4096,7 +4096,7 @@ static int kvm_suspend(void)
 static void kvm_resume(void)
 {
 	if (kvm_usage_count) {
-		lockdep_assert_held(&kvm_count_lock);
+		WARN_ON(lockdep_is_held(&kvm_count_lock));
 		hardware_enable_nolock(NULL);
 	}
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context
  2019-05-17  8:49 [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Wanpeng Li
  2019-05-17  8:49 ` [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
  2019-05-17  8:49 ` [PATCH 3/4] KVM: Fix spinlock taken warning during host resume Wanpeng Li
@ 2019-05-17  8:49 ` Wanpeng Li
  2019-05-17 20:12   ` Sean Christopherson
  2019-05-20 10:36   ` Paolo Bonzini
  2019-05-20 10:30 ` [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Paolo Bonzini
  3 siblings, 2 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-05-17  8:49 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpengli@tencent.com>

 BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590
  caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
  CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G           OE     5.1.0-rc4+ #1
  Call Trace:
   dump_stack+0x67/0x95
   __this_cpu_preempt_check+0xd2/0xe0
   nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
   nested_vmx_run+0xda/0x2b0 [kvm_intel]
   handle_vmlaunch+0x13/0x20 [kvm_intel]
   vmx_handle_exit+0xbd/0x660 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm]
   kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm]
   do_vfs_ioctl+0xa5/0x6e0
   ksys_ioctl+0x6d/0x80
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x6f/0x6c0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Accessing per-cpu variable should disable preemption, this patch extends the 
preemption disable region for __this_cpu_read().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/vmx/nested.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c601d0..8f6f69c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2792,14 +2792,13 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
 	      : "cc", "memory"
 	);
 
-	preempt_enable();
-
 	if (vmx->msr_autoload.host.nr)
 		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
 	if (vmx->msr_autoload.guest.nr)
 		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
 
 	if (vm_fail) {
+		preempt_enable();
 		WARN_ON_ONCE(vmcs_read32(VM_INSTRUCTION_ERROR) !=
 			     VMXERR_ENTRY_INVALID_CONTROL_FIELD);
 		return 1;
@@ -2811,6 +2810,7 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
 	local_irq_enable();
 	if (hw_breakpoint_active())
 		set_debugreg(__this_cpu_read(cpu_dr7), 7);
+	preempt_enable();
 
 	/*
 	 * A non-failing VMEntry means we somehow entered guest mode with
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context
  2019-05-17  8:49 ` [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context Wanpeng Li
@ 2019-05-17 20:12   ` Sean Christopherson
  2019-05-20 10:36   ` Paolo Bonzini
  1 sibling, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2019-05-17 20:12 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář

On Fri, May 17, 2019 at 04:49:50PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
>  BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590
>   caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
>   CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G           OE     5.1.0-rc4+ #1
>   Call Trace:
>    dump_stack+0x67/0x95
>    __this_cpu_preempt_check+0xd2/0xe0
>    nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
>    nested_vmx_run+0xda/0x2b0 [kvm_intel]
>    handle_vmlaunch+0x13/0x20 [kvm_intel]
>    vmx_handle_exit+0xbd/0x660 [kvm_intel]
>    kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm]
>    kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm]
>    do_vfs_ioctl+0xa5/0x6e0
>    ksys_ioctl+0x6d/0x80
>    __x64_sys_ioctl+0x1a/0x20
>    do_syscall_64+0x6f/0x6c0
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Accessing per-cpu variable should disable preemption, this patch extends the 
> preemption disable region for __this_cpu_read().
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---

Fixes: 52017608da33 ("KVM: nVMX: add option to perform early consistency checks via H/W")

and probably

Cc: stable@vger.kernel.org

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read
  2019-05-17  8:49 [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Wanpeng Li
                   ` (2 preceding siblings ...)
  2019-05-17  8:49 ` [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context Wanpeng Li
@ 2019-05-20 10:30 ` Paolo Bonzini
  2019-05-20 11:29   ` Wanpeng Li
  3 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2019-05-20 10:30 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Sean Christopherson, Liran Alon

On 17/05/19 10:49, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Allow guest reads CORE cstate when exposing host CPU power management capabilities 
> to the guest. PKG cstate is restricted to avoid a guest to get the whole package 
> information in multi-tenant scenario.

Hmm, I am not sure about this.  I can see why it can be useful to run
turbostat in the guest, but is it a good idea to share it with the
guest, since it counts from machine reset rather than from VM reset?

Maybe it could use a separate bit for KVM_CAP_X86_DISABLE_EXITS?

Thanks,

Paolo

> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 771d3bf..b0d6be5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6615,6 +6615,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>  	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
>  	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
>  	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> +	if (kvm_mwait_in_guest(kvm)) {
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
> +	}
>  	vmx->msr_bitmap_mode = 0;
>  
>  	vmx->loaded_vmcs = &vmx->vmcs01;
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit
  2019-05-17  8:49 ` [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
@ 2019-05-20 10:34   ` Paolo Bonzini
  2019-05-20 11:39     ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2019-05-20 10:34 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář,
	Sean Christopherson, Liran Alon, Konrad Rzeszutek Wilk

On 17/05/19 10:49, Wanpeng Li wrote:
> MSR IA32_MSIC_ENABLE bit 18, according to SDM:
> 
> | When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0).
> | This indicates that MONITOR/MWAIT are not supported.
> |
> | Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0.
> |
> | When this bit is set to 1 (default), MONITOR/MWAIT are supported (CPUID.01H:ECX[bit 3] = 1).
> 
> The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
> CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
> kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
> guest visibility.
> 
> This patch implements toggling of the CPUID bit based on guest writes
> to the MSR.

Won't this disable mwait after migration, unless IA32_MISC_ENABLE is set
correctly by firmware or userspace?  I think you need to hide this
behind KVM_CAP_DISABLE_QUIRKS.  (Also, what is the reason for this
change in general besides making behavior closer to real hardware?)

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] KVM: Fix spinlock taken warning during host resume
  2019-05-17  8:49 ` [PATCH 3/4] KVM: Fix spinlock taken warning during host resume Wanpeng Li
@ 2019-05-20 10:36   ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2019-05-20 10:36 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Paul E . McKenney

On 17/05/19 10:49, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
>  WARNING: CPU: 0 PID: 13554 at kvm/arch/x86/kvm//../../../virt/kvm/kvm_main.c:4183 kvm_resume+0x3c/0x40 [kvm]
>   CPU: 0 PID: 13554 Comm: step_after_susp Tainted: G           OE     5.1.0-rc4+ #1
>   RIP: 0010:kvm_resume+0x3c/0x40 [kvm]
>   Call Trace:
>    syscore_resume+0x63/0x2d0
>    suspend_devices_and_enter+0x9d1/0xa40
>    pm_suspend+0x33a/0x3b0
>    state_store+0x82/0xf0
>    kobj_attr_store+0x12/0x20
>    sysfs_kf_write+0x4b/0x60
>    kernfs_fop_write+0x120/0x1a0
>    __vfs_write+0x1b/0x40
>    vfs_write+0xcd/0x1d0
>    ksys_write+0x5f/0xe0
>    __x64_sys_write+0x1a/0x20
>    do_syscall_64+0x6f/0x6c0
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Commit ca84d1a24 (KVM: x86: Add clock sync request to hardware enable) mentioned 
> that "we always hold kvm_lock when hardware_enable is called.  The one place that 
> doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock 
> won't be taken." However, commit 6706dae9 (virt/kvm: Replace spin_is_locked() with 
> lockdep) introduces a bug, it asserts when the lock is not held which is contrary 
> to the original goal. 
> 
> This patch fixes it by WARN_ON when the lock is held.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Paul E. McKenney <paulmck@linux.ibm.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5fb0f16..c7eab5f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4096,7 +4096,7 @@ static int kvm_suspend(void)
>  static void kvm_resume(void)
>  {
>  	if (kvm_usage_count) {
> -		lockdep_assert_held(&kvm_count_lock);
> +		WARN_ON(lockdep_is_held(&kvm_count_lock));
>  		hardware_enable_nolock(NULL);
>  	}
>  }
> 

Queued, thanks.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context
  2019-05-17  8:49 ` [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context Wanpeng Li
  2019-05-17 20:12   ` Sean Christopherson
@ 2019-05-20 10:36   ` Paolo Bonzini
  1 sibling, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2019-05-20 10:36 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm; +Cc: Radim Krčmář

On 17/05/19 10:49, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
>  BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590
>   caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
>   CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G           OE     5.1.0-rc4+ #1
>   Call Trace:
>    dump_stack+0x67/0x95
>    __this_cpu_preempt_check+0xd2/0xe0
>    nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
>    nested_vmx_run+0xda/0x2b0 [kvm_intel]
>    handle_vmlaunch+0x13/0x20 [kvm_intel]
>    vmx_handle_exit+0xbd/0x660 [kvm_intel]
>    kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm]
>    kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm]
>    do_vfs_ioctl+0xa5/0x6e0
>    ksys_ioctl+0x6d/0x80
>    __x64_sys_ioctl+0x1a/0x20
>    do_syscall_64+0x6f/0x6c0
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Accessing per-cpu variable should disable preemption, this patch extends the 
> preemption disable region for __this_cpu_read().
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0c601d0..8f6f69c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2792,14 +2792,13 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
>  	      : "cc", "memory"
>  	);
>  
> -	preempt_enable();
> -
>  	if (vmx->msr_autoload.host.nr)
>  		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
>  	if (vmx->msr_autoload.guest.nr)
>  		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
>  
>  	if (vm_fail) {
> +		preempt_enable();
>  		WARN_ON_ONCE(vmcs_read32(VM_INSTRUCTION_ERROR) !=
>  			     VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>  		return 1;
> @@ -2811,6 +2810,7 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
>  	local_irq_enable();
>  	if (hw_breakpoint_active())
>  		set_debugreg(__this_cpu_read(cpu_dr7), 7);
> +	preempt_enable();
>  
>  	/*
>  	 * A non-failing VMEntry means we somehow entered guest mode with
> 


Queued, thanks.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read
  2019-05-20 10:30 ` [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Paolo Bonzini
@ 2019-05-20 11:29   ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-05-20 11:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Mon, 20 May 2019 at 18:30, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 17/05/19 10:49, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Allow guest reads CORE cstate when exposing host CPU power management capabilities
> > to the guest. PKG cstate is restricted to avoid a guest to get the whole package
> > information in multi-tenant scenario.
>
> Hmm, I am not sure about this.  I can see why it can be useful to run
> turbostat in the guest, but is it a good idea to share it with the

Yeah.

> guest, since it counts from machine reset rather than from VM reset?

I also saw amazon expose these in their nitro c5 instance.

>
> Maybe it could use a separate bit for KVM_CAP_X86_DISABLE_EXITS?

It could be.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit
  2019-05-20 10:34   ` Paolo Bonzini
@ 2019-05-20 11:39     ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-05-20 11:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Sean Christopherson, Liran Alon, Konrad Rzeszutek Wilk

On Mon, 20 May 2019 at 18:34, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 17/05/19 10:49, Wanpeng Li wrote:
> > MSR IA32_MSIC_ENABLE bit 18, according to SDM:
> >
> > | When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0).
> > | This indicates that MONITOR/MWAIT are not supported.
> > |
> > | Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0.
> > |
> > | When this bit is set to 1 (default), MONITOR/MWAIT are supported (CPUID.01H:ECX[bit 3] = 1).
> >
> > The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
> > CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
> > kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
> > guest visibility.
> >
> > This patch implements toggling of the CPUID bit based on guest writes
> > to the MSR.
>
> Won't this disable mwait after migration, unless IA32_MISC_ENABLE is set
> correctly by firmware or userspace?  I think you need to hide this

Agreed.

> behind KVM_CAP_DISABLE_QUIRKS.  (Also, what is the reason for this
> change in general besides making behavior closer to real hardware?)

Just making behavior closer to real hardware. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-05-20 11:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-17  8:49 [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Wanpeng Li
2019-05-17  8:49 ` [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
2019-05-20 10:34   ` Paolo Bonzini
2019-05-20 11:39     ` Wanpeng Li
2019-05-17  8:49 ` [PATCH 3/4] KVM: Fix spinlock taken warning during host resume Wanpeng Li
2019-05-20 10:36   ` Paolo Bonzini
2019-05-17  8:49 ` [PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context Wanpeng Li
2019-05-17 20:12   ` Sean Christopherson
2019-05-20 10:36   ` Paolo Bonzini
2019-05-20 10:30 ` [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read Paolo Bonzini
2019-05-20 11:29   ` Wanpeng Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.