kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] KVM: Documentation: Add disable pause exits to KVM_CAP_X86_DISABLE_EXITS
@ 2019-05-21  6:06 Wanpeng Li
  2019-05-21  6:06 ` [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts Wanpeng Li
  2019-05-21  6:06 ` [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
  0 siblings, 2 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-05-21  6:06 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

Commit b31c114b (KVM: X86: Provide a capability to disable PAUSE intercepts)
forgot to add the KVM_X86_DISABLE_EXITS_PAUSE into api doc. This patch adds 
it.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 Documentation/virtual/kvm/api.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ba6c42c..33cd92d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4893,6 +4893,7 @@ Valid bits in args[0] are
 
 #define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
 #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
+#define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
 
 Enabling this capability on a VM provides userspace with a way to no
 longer intercept some instructions for improved latency in some
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
  2019-05-21  6:06 [PATCH 1/3] KVM: Documentation: Add disable pause exits to KVM_CAP_X86_DISABLE_EXITS Wanpeng Li
@ 2019-05-21  6:06 ` Wanpeng Li
  2019-06-04 16:53   ` Paolo Bonzini
  2019-05-21  6:06 ` [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
  1 sibling, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2019-05-21  6:06 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon

From: Wanpeng Li <wanpengli@tencent.com>

Allow guest reads CORE cstate when exposing host CPU power management capabilities 
to the guest. PKG cstate is restricted to avoid a guest to get the whole package 
information in multi-tenant scenario.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v1 -> v2:
 * use a separate bit for KVM_CAP_X86_DISABLE_EXITS

 Documentation/virtual/kvm/api.txt | 1 +
 arch/x86/include/asm/kvm_host.h   | 1 +
 arch/x86/kvm/vmx/vmx.c            | 6 ++++++
 arch/x86/kvm/x86.c                | 5 ++++-
 arch/x86/kvm/x86.h                | 5 +++++
 include/uapi/linux/kvm.h          | 4 +++-
 tools/include/uapi/linux/kvm.h    | 4 +++-
 7 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 33cd92d..91fd86f 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4894,6 +4894,7 @@ Valid bits in args[0] are
 #define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
 #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
+#define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
 
 Enabling this capability on a VM provides userspace with a way to no
 longer intercept some instructions for improved latency in some
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d5457c7..1ce8289 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -882,6 +882,7 @@ struct kvm_arch {
 	bool mwait_in_guest;
 	bool hlt_in_guest;
 	bool pause_in_guest;
+	bool cstate_in_guest;
 
 	unsigned long irq_sources_bitmap;
 	s64 kvmclock_offset;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0861c71..da24f18 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
 	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
 	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+	if (kvm_cstate_in_guest(kvm)) {
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
+	}
 	vmx->msr_bitmap_mode = 0;
 
 	vmx->loaded_vmcs = &vmx->vmcs01;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 202048e..765fe59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3098,7 +3098,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_CLOCK_TSC_STABLE;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
-		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE;
+		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
+		      KVM_X86_DISABLE_EXITS_CSTATE;
 		if(kvm_can_mwait_in_guest())
 			r |= KVM_X86_DISABLE_EXITS_MWAIT;
 		break;
@@ -4612,6 +4613,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			kvm->arch.hlt_in_guest = true;
 		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
 			kvm->arch.pause_in_guest = true;
+		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
+			kvm->arch.cstate_in_guest = true;
 		r = 0;
 		break;
 	case KVM_CAP_MSR_PLATFORM_INFO:
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a470ff0..275b3b6 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -333,6 +333,11 @@ static inline bool kvm_pause_in_guest(struct kvm *kvm)
 	return kvm->arch.pause_in_guest;
 }
 
+static inline bool kvm_cstate_in_guest(struct kvm *kvm)
+{
+	return kvm->arch.cstate_in_guest;
+}
+
 DECLARE_PER_CPU(struct kvm_vcpu *, current_vcpu);
 
 static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2fe12b4..c2152f3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -696,9 +696,11 @@ struct kvm_ioeventfd {
 #define KVM_X86_DISABLE_EXITS_MWAIT          (1 << 0)
 #define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
+#define KVM_X86_DISABLE_EXITS_CSTATE         (1 << 3)
 #define KVM_X86_DISABLE_VALID_EXITS          (KVM_X86_DISABLE_EXITS_MWAIT | \
                                               KVM_X86_DISABLE_EXITS_HLT | \
-                                              KVM_X86_DISABLE_EXITS_PAUSE)
+                                              KVM_X86_DISABLE_EXITS_PAUSE | \
+                                              KVM_X86_DISABLE_EXITS_CSTATE)
 
 /* for KVM_ENABLE_CAP */
 struct kvm_enable_cap {
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 6d4ea4b..ef3303f 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -696,9 +696,11 @@ struct kvm_ioeventfd {
 #define KVM_X86_DISABLE_EXITS_MWAIT          (1 << 0)
 #define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
+#define KVM_X86_DISABLE_EXITS_CSTATE         (1 << 3)
 #define KVM_X86_DISABLE_VALID_EXITS          (KVM_X86_DISABLE_EXITS_MWAIT | \
                                               KVM_X86_DISABLE_EXITS_HLT | \
-                                              KVM_X86_DISABLE_EXITS_PAUSE)
+                                              KVM_X86_DISABLE_EXITS_PAUSE | \
+                                              KVM_X86_DISABLE_EXITS_CSTATE)
 
 /* for KVM_ENABLE_CAP */
 struct kvm_enable_cap {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit
  2019-05-21  6:06 [PATCH 1/3] KVM: Documentation: Add disable pause exits to KVM_CAP_X86_DISABLE_EXITS Wanpeng Li
  2019-05-21  6:06 ` [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts Wanpeng Li
@ 2019-05-21  6:06 ` Wanpeng Li
  2019-06-04 16:59   ` Paolo Bonzini
  1 sibling, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2019-05-21  6:06 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Liran Alon, Konrad Rzeszutek Wilk

From: Wanpeng Li <wanpengli@tencent.com>

MSR IA32_MISC_ENABLE bit 18, according to SDM:

| When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0).
| This indicates that MONITOR/MWAIT are not supported.
|
| Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0.
|
| When this bit is set to 1 (default), MONITOR/MWAIT are supported (CPUID.01H:ECX[bit 3] = 1).

The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
guest visibility.

This patch implements toggling of the CPUID bit based on guest writes
to the MSR.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v1 -> v2:
 * hide behind KVM_CAP_DISABLE_QUIRKS

 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/cpuid.c            | 10 ++++++++++
 arch/x86/kvm/x86.c              | 10 ++++++++++
 3 files changed, 21 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7a0e64c..e3ae96b5 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -382,6 +382,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_CD_NW_CLEARED	(1 << 1)
 #define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	(1 << 2)
 #define KVM_X86_QUIRK_OUT_7E_INC_RIP	(1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_MWAIT (1 << 4)
 
 #define KVM_STATE_NESTED_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_RUN_PENDING	0x00000002
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e18a9f9..f54d266 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,6 +137,16 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
 		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
 
+	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT)) {
+		best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
+		if (best) {
+			if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT)
+				best->ecx |= F(MWAIT);
+			else
+				best->ecx &= ~F(MWAIT);
+		}
+	}
+
 	/* Update physical-address width */
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
 	kvm_mmu_reset_context(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 765fe59..a4eb711 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2547,6 +2547,16 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_MISC_ENABLE:
+		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT) &&
+			((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
+			if ((vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT) &&
+				!(data & MSR_IA32_MISC_ENABLE_MWAIT)) {
+				if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
+					return 1;
+			}
+			vcpu->arch.ia32_misc_enable_msr = data;
+			kvm_update_cpuid(vcpu);
+		}
 		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
 	case MSR_IA32_SMBASE:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
  2019-05-21  6:06 ` [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts Wanpeng Li
@ 2019-06-04 16:53   ` Paolo Bonzini
  2019-06-05 10:56     ` Wanpeng Li
  2019-06-11  7:38     ` Wanpeng Li
  0 siblings, 2 replies; 11+ messages in thread
From: Paolo Bonzini @ 2019-06-04 16:53 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Sean Christopherson, Liran Alon

On 21/05/19 08:06, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Allow guest reads CORE cstate when exposing host CPU power management capabilities 
> to the guest. PKG cstate is restricted to avoid a guest to get the whole package 
> information in multi-tenant scenario.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> v1 -> v2:
>  * use a separate bit for KVM_CAP_X86_DISABLE_EXITS
> 
>  Documentation/virtual/kvm/api.txt | 1 +
>  arch/x86/include/asm/kvm_host.h   | 1 +
>  arch/x86/kvm/vmx/vmx.c            | 6 ++++++
>  arch/x86/kvm/x86.c                | 5 ++++-
>  arch/x86/kvm/x86.h                | 5 +++++
>  include/uapi/linux/kvm.h          | 4 +++-
>  tools/include/uapi/linux/kvm.h    | 4 +++-
>  7 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 33cd92d..91fd86f 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4894,6 +4894,7 @@ Valid bits in args[0] are
>  #define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
>  #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
>  #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
> +#define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
>  
>  Enabling this capability on a VM provides userspace with a way to no
>  longer intercept some instructions for improved latency in some
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d5457c7..1ce8289 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -882,6 +882,7 @@ struct kvm_arch {
>  	bool mwait_in_guest;
>  	bool hlt_in_guest;
>  	bool pause_in_guest;
> +	bool cstate_in_guest;
>  
>  	unsigned long irq_sources_bitmap;
>  	s64 kvmclock_offset;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0861c71..da24f18 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>  	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
>  	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
>  	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> +	if (kvm_cstate_in_guest(kvm)) {
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> +		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);

I think I have changed my mind on the implementation of this, sorry.

1) We should emulate these MSRs always, otherwise the guest API changes
between different values of KVM_CAP_X86_DISABLE_EXITS which is not
intended.  Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live
migration, so it should be possible to set the MSRs in the host to
change the delta between the host and guest values.

2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are
disabled (i.e. exit happens), the MSRs will be purely emulated.
C3/C6/C7 residency will never increase (it will remain the value that is
set by the host).  When the VM executes an hlt vmexit, it should save
the current TSC.  When it comes back, the C1 residency MSR should be
increased by the time that has passed.

3) If KVM_X86_DISABLE_EXITS_HLT is enabled but
KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mait exits happen),
C3/C6/C7 residency will also never increase, but the C1 residency value
should be read using rdmsr from the host, with a delta added from the
host value.

4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both
disabled (i.e. mwait exits do not happen), all four residency values
should be read using rdmsr from the host, with a delta added from the
host value.

5) If KVM_X86_DISABLE_EXITS_HLT is disabled and
KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense
so it's okay not to be very optimized.  In this case, the residency
value should be read as in (4), but hlt vmexits will be accounted as in
(2) so we need to be careful not to double-count the residency during
hlt.  This means doing four rdmsr before the beginning of the hlt vmexit
and four at the end of the hlt vmexit.

Therefore the data structure should be something like

struct kvm_residency_msr {
	u64 value;
	bool delta_from_host;
	bool count_with_host;
}

u64 kvm_residency_read_host(struct kvm_residency_msr *msr)
{
	u64 unscaled_value = rdmsrl(msr->index);
	// apply TSC scaling...
	return ...
}

u64 kvm_residency_read(struct kvm_residency_msr *msr)
{
	return msr->value +
		(msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
}

void kvm_residency_write(struct kvm_residency_msr *msr,
			 u64 value)
{
	msr->value = value -
		(msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
}

// count_with_host is true for C1 iff any of KVM_CAP_DISABLE_EXITS_HLT
// or KVM_CAP_DISABLE_EXITS_MWAIT is set
// count_with_host is true for C3/C6/C7 iff KVM_CAP_DISABLE_EXITS_MWAIT
is set
void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index,
			 bool count_with_host)
{
	/* Preserve value on calls after the first */
	u64 value = msr->index ? kvm_residency_read(msr) : 0;
	msr->delta_from_host = msr->count_with_host = count_with_host;
	msr->index = index;
	kvm_residency_write(msr, value);
}

// The following functions are called from hlt vmexits.

void kvm_residency_start_hlt(struct kvm_residency_msr *msr)
{
	if (msr->count_with_host) {
		WARN_ON(msr->delta_from_host);
		msr->value += kvm_residency_read_host(msr);
		msr->delta_from_host = false;
	}
}

// host_tsc_waited is 0 except for MSR_CORE_C1_RES
void kvm_residency_end_hlt(struct kvm_residency_msr *msr,
			   u64 host_tsc_waited)
{
	if (msr->count_with_host) {
		WARN_ON(!msr->delta_from_host);
		msr->value -= kvm_residency_read_host(msr);
		msr->delta_from_host = true;
	}
	if (host_tsc_waited) {
		// ... apply TSC scaling to host_tsc_waited ...
		msr->value += ...;
	}
}

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit
  2019-05-21  6:06 ` [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
@ 2019-06-04 16:59   ` Paolo Bonzini
  2019-06-05 11:00     ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2019-06-04 16:59 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář,
	Sean Christopherson, Liran Alon, Konrad Rzeszutek Wilk

On 21/05/19 08:06, Wanpeng Li wrote:
> 
> The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
> CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
> kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
> guest visibility.

This needs some adjustment so that the default is backwards-compatible:

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index e3ae96b52a16..f9b021e16ebc 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -378,11 +378,11 @@ struct kvm_sync_regs {
 	struct kvm_vcpu_events events;
 };
 
-#define KVM_X86_QUIRK_LINT0_REENABLED	(1 << 0)
-#define KVM_X86_QUIRK_CD_NW_CLEARED	(1 << 1)
-#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	(1 << 2)
-#define KVM_X86_QUIRK_OUT_7E_INC_RIP	(1 << 3)
-#define KVM_X86_QUIRK_MISC_ENABLE_MWAIT (1 << 4)
+#define KVM_X86_QUIRK_LINT0_REENABLED	   (1 << 0)
+#define KVM_X86_QUIRK_CD_NW_CLEARED	   (1 << 1)
+#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	   (1 << 2)
+#define KVM_X86_QUIRK_OUT_7E_INC_RIP	   (1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
 
 #define KVM_STATE_NESTED_GUEST_MODE	0x00000001
 #define KVM_STATE_NESTED_RUN_PENDING	0x00000002
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f54d266fd3b5..bfa1341ce6f1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,10 +137,10 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 		(best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
 		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
 
-	if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT)) {
+	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
 		best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
 		if (best) {
-			if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT)
+			if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_NO_MWAIT)
 				best->ecx |= F(MWAIT);
 			else
 				best->ecx &= ~F(MWAIT);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 528935733fe0..0c1498da46c7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2548,17 +2548,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_MISC_ENABLE:
-		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT) &&
-			((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
-			if ((vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT) &&
-				!(data & MSR_IA32_MISC_ENABLE_MWAIT)) {
-				if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
-					return 1;
-			}
+		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
+		    ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_NO_MWAIT)) {
+			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
+				return 1;
 			vcpu->arch.ia32_misc_enable_msr = data;
 			kvm_update_cpuid(vcpu);
+		} else {
+			vcpu->arch.ia32_misc_enable_msr = data;
 		}
-		vcpu->arch.ia32_misc_enable_msr = data;
 		break;
 	case MSR_IA32_SMBASE:
 		if (!msr_info->host_initiated)

Please double check, in the meanwhile I've queued the patch.

Paolo

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
  2019-06-04 16:53   ` Paolo Bonzini
@ 2019-06-05 10:56     ` Wanpeng Li
  2019-06-11  7:38     ` Wanpeng Li
  1 sibling, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-06-05 10:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Wed, 5 Jun 2019 at 00:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 21/05/19 08:06, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Allow guest reads CORE cstate when exposing host CPU power management capabilities
> > to the guest. PKG cstate is restricted to avoid a guest to get the whole package
> > information in multi-tenant scenario.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> > Cc: Liran Alon <liran.alon@oracle.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > v1 -> v2:
> >  * use a separate bit for KVM_CAP_X86_DISABLE_EXITS
> >
> >  Documentation/virtual/kvm/api.txt | 1 +
> >  arch/x86/include/asm/kvm_host.h   | 1 +
> >  arch/x86/kvm/vmx/vmx.c            | 6 ++++++
> >  arch/x86/kvm/x86.c                | 5 ++++-
> >  arch/x86/kvm/x86.h                | 5 +++++
> >  include/uapi/linux/kvm.h          | 4 +++-
> >  tools/include/uapi/linux/kvm.h    | 4 +++-
> >  7 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 33cd92d..91fd86f 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -4894,6 +4894,7 @@ Valid bits in args[0] are
> >  #define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
> >  #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
> >  #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
> > +#define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
> >
> >  Enabling this capability on a VM provides userspace with a way to no
> >  longer intercept some instructions for improved latency in some
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d5457c7..1ce8289 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -882,6 +882,7 @@ struct kvm_arch {
> >       bool mwait_in_guest;
> >       bool hlt_in_guest;
> >       bool pause_in_guest;
> > +     bool cstate_in_guest;
> >
> >       unsigned long irq_sources_bitmap;
> >       s64 kvmclock_offset;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 0861c71..da24f18 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> >       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> >       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> >       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> > +     if (kvm_cstate_in_guest(kvm)) {
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>
> I think I have changed my mind on the implementation of this, sorry.
>
> 1) We should emulate these MSRs always, otherwise the guest API changes
> between different values of KVM_CAP_X86_DISABLE_EXITS which is not
> intended.  Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live
> migration, so it should be possible to set the MSRs in the host to
> change the delta between the host and guest values.
>
> 2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are
> disabled (i.e. exit happens), the MSRs will be purely emulated.
> C3/C6/C7 residency will never increase (it will remain the value that is
> set by the host).  When the VM executes an hlt vmexit, it should save
> the current TSC.  When it comes back, the C1 residency MSR should be
> increased by the time that has passed.
>
> 3) If KVM_X86_DISABLE_EXITS_HLT is enabled but
> KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mait exits happen),
> C3/C6/C7 residency will also never increase, but the C1 residency value
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both
> disabled (i.e. mwait exits do not happen), all four residency values
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 5) If KVM_X86_DISABLE_EXITS_HLT is disabled and
> KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense
> so it's okay not to be very optimized.  In this case, the residency
> value should be read as in (4), but hlt vmexits will be accounted as in
> (2) so we need to be careful not to double-count the residency during
> hlt.  This means doing four rdmsr before the beginning of the hlt vmexit
> and four at the end of the hlt vmexit.

I will have a try, thanks Paolo! :)

Regards,
Wanpeng Li

>
> Therefore the data structure should be something like
>
> struct kvm_residency_msr {
>         u64 value;
>         bool delta_from_host;
>         bool count_with_host;
> }
>
> u64 kvm_residency_read_host(struct kvm_residency_msr *msr)
> {
>         u64 unscaled_value = rdmsrl(msr->index);
>         // apply TSC scaling...
>         return ...
> }
>
> u64 kvm_residency_read(struct kvm_residency_msr *msr)
> {
>         return msr->value +
>                 (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> void kvm_residency_write(struct kvm_residency_msr *msr,
>                          u64 value)
> {
>         msr->value = value -
>                 (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> // count_with_host is true for C1 iff any of KVM_CAP_DISABLE_EXITS_HLT
> // or KVM_CAP_DISABLE_EXITS_MWAIT is set
> // count_with_host is true for C3/C6/C7 iff KVM_CAP_DISABLE_EXITS_MWAIT
> is set
> void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index,
>                          bool count_with_host)
> {
>         /* Preserve value on calls after the first */
>         u64 value = msr->index ? kvm_residency_read(msr) : 0;
>         msr->delta_from_host = msr->count_with_host = count_with_host;
>         msr->index = index;
>         kvm_residency_write(msr, value);
> }
>
> // The following functions are called from hlt vmexits.
>
> void kvm_residency_start_hlt(struct kvm_residency_msr *msr)
> {
>         if (msr->count_with_host) {
>                 WARN_ON(msr->delta_from_host);
>                 msr->value += kvm_residency_read_host(msr);
>                 msr->delta_from_host = false;
>         }
> }
>
> // host_tsc_waited is 0 except for MSR_CORE_C1_RES
> void kvm_residency_end_hlt(struct kvm_residency_msr *msr,
>                            u64 host_tsc_waited)
> {
>         if (msr->count_with_host) {
>                 WARN_ON(!msr->delta_from_host);
>                 msr->value -= kvm_residency_read_host(msr);
>                 msr->delta_from_host = true;
>         }
>         if (host_tsc_waited) {
>                 // ... apply TSC scaling to host_tsc_waited ...
>                 msr->value += ...;
>         }
> }
>
> Thanks,
>
> Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit
  2019-06-04 16:59   ` Paolo Bonzini
@ 2019-06-05 11:00     ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-06-05 11:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Sean Christopherson, Liran Alon, Konrad Rzeszutek Wilk

On Wed, 5 Jun 2019 at 00:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 21/05/19 08:06, Wanpeng Li wrote:
> >
> > The CPUID.01H:ECX[bit 3] ought to mirror the value of the MSR bit,
> > CPUID.01H:ECX[bit 3] is a better guard than kvm_mwait_in_guest().
> > kvm_mwait_in_guest() affects the behavior of MONITOR/MWAIT, not its
> > guest visibility.
>
> This needs some adjustment so that the default is backwards-compatible:
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index e3ae96b52a16..f9b021e16ebc 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -378,11 +378,11 @@ struct kvm_sync_regs {
>         struct kvm_vcpu_events events;
>  };
>
> -#define KVM_X86_QUIRK_LINT0_REENABLED  (1 << 0)
> -#define KVM_X86_QUIRK_CD_NW_CLEARED    (1 << 1)
> -#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE  (1 << 2)
> -#define KVM_X86_QUIRK_OUT_7E_INC_RIP   (1 << 3)
> -#define KVM_X86_QUIRK_MISC_ENABLE_MWAIT (1 << 4)
> +#define KVM_X86_QUIRK_LINT0_REENABLED     (1 << 0)
> +#define KVM_X86_QUIRK_CD_NW_CLEARED       (1 << 1)
> +#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE     (1 << 2)
> +#define KVM_X86_QUIRK_OUT_7E_INC_RIP      (1 << 3)
> +#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
>
>  #define KVM_STATE_NESTED_GUEST_MODE    0x00000001
>  #define KVM_STATE_NESTED_RUN_PENDING   0x00000002
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f54d266fd3b5..bfa1341ce6f1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -137,10 +137,10 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
>                 (best->eax & (1 << KVM_FEATURE_PV_UNHALT)))
>                 best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
>
> -       if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT)) {
> +       if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
>                 best = kvm_find_cpuid_entry(vcpu, 0x1, 0);
>                 if (best) {
> -                       if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT)
> +                       if (vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_NO_MWAIT)
>                                 best->ecx |= F(MWAIT);
>                         else
>                                 best->ecx &= ~F(MWAIT);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 528935733fe0..0c1498da46c7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2548,17 +2548,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 }
>                 break;
>         case MSR_IA32_MISC_ENABLE:
> -               if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_MWAIT) &&
> -                       ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
> -                       if ((vcpu->arch.ia32_misc_enable_msr & MSR_IA32_MISC_ENABLE_MWAIT) &&
> -                               !(data & MSR_IA32_MISC_ENABLE_MWAIT)) {
> -                               if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> -                                       return 1;
> -                       }
> +               if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
> +                   ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_NO_MWAIT)) {
> +                       if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> +                               return 1;
>                         vcpu->arch.ia32_misc_enable_msr = data;
>                         kvm_update_cpuid(vcpu);
> +               } else {
> +                       vcpu->arch.ia32_misc_enable_msr = data;
>                 }
> -               vcpu->arch.ia32_misc_enable_msr = data;
>                 break;
>         case MSR_IA32_SMBASE:
>                 if (!msr_info->host_initiated)
>
> Please double check, in the meanwhile I've queued the patch.

Looks good to me, thanks. :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
  2019-06-04 16:53   ` Paolo Bonzini
  2019-06-05 10:56     ` Wanpeng Li
@ 2019-06-11  7:38     ` Wanpeng Li
  2019-06-11 11:09       ` Paolo Bonzini
  1 sibling, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2019-06-11  7:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Wed, 5 Jun 2019 at 00:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 21/05/19 08:06, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Allow guest reads CORE cstate when exposing host CPU power management capabilities
> > to the guest. PKG cstate is restricted to avoid a guest to get the whole package
> > information in multi-tenant scenario.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> > Cc: Liran Alon <liran.alon@oracle.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > v1 -> v2:
> >  * use a separate bit for KVM_CAP_X86_DISABLE_EXITS
> >
> >  Documentation/virtual/kvm/api.txt | 1 +
> >  arch/x86/include/asm/kvm_host.h   | 1 +
> >  arch/x86/kvm/vmx/vmx.c            | 6 ++++++
> >  arch/x86/kvm/x86.c                | 5 ++++-
> >  arch/x86/kvm/x86.h                | 5 +++++
> >  include/uapi/linux/kvm.h          | 4 +++-
> >  tools/include/uapi/linux/kvm.h    | 4 +++-
> >  7 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 33cd92d..91fd86f 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -4894,6 +4894,7 @@ Valid bits in args[0] are
> >  #define KVM_X86_DISABLE_EXITS_MWAIT            (1 << 0)
> >  #define KVM_X86_DISABLE_EXITS_HLT              (1 << 1)
> >  #define KVM_X86_DISABLE_EXITS_PAUSE            (1 << 2)
> > +#define KVM_X86_DISABLE_EXITS_CSTATE           (1 << 3)
> >
> >  Enabling this capability on a VM provides userspace with a way to no
> >  longer intercept some instructions for improved latency in some
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d5457c7..1ce8289 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -882,6 +882,7 @@ struct kvm_arch {
> >       bool mwait_in_guest;
> >       bool hlt_in_guest;
> >       bool pause_in_guest;
> > +     bool cstate_in_guest;
> >
> >       unsigned long irq_sources_bitmap;
> >       s64 kvmclock_offset;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 0861c71..da24f18 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6637,6 +6637,12 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> >       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
> >       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
> >       vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
> > +     if (kvm_cstate_in_guest(kvm)) {
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
> > +             vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
>
> I think I have changed my mind on the implementation of this, sorry.
>
> 1) We should emulate these MSRs always, otherwise the guest API changes
> between different values of KVM_CAP_X86_DISABLE_EXITS which is not
> intended.  Also, KVM_CAP_X86_DISABLE_EXITS does not prevent live
> migration, so it should be possible to set the MSRs in the host to
> change the delta between the host and guest values.
>
> 2) If both KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are
> disabled (i.e. exit happens), the MSRs will be purely emulated.
> C3/C6/C7 residency will never increase (it will remain the value that is
> set by the host).  When the VM executes an hlt vmexit, it should save
> the current TSC.  When it comes back, the C1 residency MSR should be
> increased by the time that has passed.
>
> 3) If KVM_X86_DISABLE_EXITS_HLT is enabled but
> KVM_X86_DISABLE_EXITS_MWAIT is disabled (i.e. mait exits happen),
> C3/C6/C7 residency will also never increase, but the C1 residency value
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 4) If KVM_X86_DISABLE_EXITS_HLT and KVM_X86_DISABLE_EXITS_MWAIT are both
> disabled (i.e. mwait exits do not happen), all four residency values
> should be read using rdmsr from the host, with a delta added from the
> host value.
>
> 5) If KVM_X86_DISABLE_EXITS_HLT is disabled and
> KVM_X86_DISABLE_EXITS_MWAIT is enabled, the configuration makes no sense
> so it's okay not to be very optimized.  In this case, the residency
> value should be read as in (4), but hlt vmexits will be accounted as in
> (2) so we need to be careful not to double-count the residency during
> hlt.  This means doing four rdmsr before the beginning of the hlt vmexit
> and four at the end of the hlt vmexit.

MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
can avoid the complex logic to handle C1 now. :)

Regards,
Wanpeng Li

>
> Therefore the data structure should be something like
>
> struct kvm_residency_msr {
>         u64 value;
>         bool delta_from_host;
>         bool count_with_host;
> }
>
> u64 kvm_residency_read_host(struct kvm_residency_msr *msr)
> {
>         u64 unscaled_value = rdmsrl(msr->index);
>         // apply TSC scaling...
>         return ...
> }
>
> u64 kvm_residency_read(struct kvm_residency_msr *msr)
> {
>         return msr->value +
>                 (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> void kvm_residency_write(struct kvm_residency_msr *msr,
>                          u64 value)
> {
>         msr->value = value -
>                 (msr->delta_from_host ? kvm_residency_read_host(msr) : 0);
> }
>
> // count_with_host is true for C1 iff any of KVM_CAP_DISABLE_EXITS_HLT
> // or KVM_CAP_DISABLE_EXITS_MWAIT is set
> // count_with_host is true for C3/C6/C7 iff KVM_CAP_DISABLE_EXITS_MWAIT
> is set
> void kvm_residency_setup(struct kvm_residency_msr *msr, u16 index,
>                          bool count_with_host)
> {
>         /* Preserve value on calls after the first */
>         u64 value = msr->index ? kvm_residency_read(msr) : 0;
>         msr->delta_from_host = msr->count_with_host = count_with_host;
>         msr->index = index;
>         kvm_residency_write(msr, value);
> }
>
> // The following functions are called from hlt vmexits.
>
> void kvm_residency_start_hlt(struct kvm_residency_msr *msr)
> {
>         if (msr->count_with_host) {
>                 WARN_ON(msr->delta_from_host);
>                 msr->value += kvm_residency_read_host(msr);
>                 msr->delta_from_host = false;
>         }
> }
>
> // host_tsc_waited is 0 except for MSR_CORE_C1_RES
> void kvm_residency_end_hlt(struct kvm_residency_msr *msr,
>                            u64 host_tsc_waited)
> {
>         if (msr->count_with_host) {
>                 WARN_ON(!msr->delta_from_host);
>                 msr->value -= kvm_residency_read_host(msr);
>                 msr->delta_from_host = true;
>         }
>         if (host_tsc_waited) {
>                 // ... apply TSC scaling to host_tsc_waited ...
>                 msr->value += ...;
>         }
> }
>
> Thanks,
>
> Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
  2019-06-11  7:38     ` Wanpeng Li
@ 2019-06-11 11:09       ` Paolo Bonzini
  2019-06-11 11:35         ` Wanpeng Li
  2019-06-12  2:20         ` Wanpeng Li
  0 siblings, 2 replies; 11+ messages in thread
From: Paolo Bonzini @ 2019-06-11 11:09 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On 11/06/19 09:38, Wanpeng Li wrote:
> MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
> can avoid the complex logic to handle C1 now. :)

I disagree.  Linux uses it on all platforms is available, and virtual
machines that don't pass mwait through _only_ have C1, so it would be
less useful to have deep C-state residency MSRs and not C1 residency.

But turbostat can get the information from sysfs, so what are these MSRs
used for?

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
  2019-06-11 11:09       ` Paolo Bonzini
@ 2019-06-11 11:35         ` Wanpeng Li
  2019-06-12  2:20         ` Wanpeng Li
  1 sibling, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-06-11 11:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Tue, 11 Jun 2019 at 19:09, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/06/19 09:38, Wanpeng Li wrote:
> > MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
> > can avoid the complex logic to handle C1 now. :)
>
> I disagree.  Linux uses it on all platforms is available, and virtual
> machines that don't pass mwait through _only_ have C1, so it would be
> less useful to have deep C-state residency MSRs and not C1 residency.
>
> But turbostat can get the information from sysfs, so what are these MSRs
> used for?

The sysfs is not accurate, the time which is accumulated in the
function cpuidle_enter_state() is the expected cstate we hope to enter
instead of the real cstate we finally enter. For example, we found
several SKX/CLX models can't enter deeper cstates in non-root mode,
Intel hardware team has been working on this according to our report
recently. The bare-metal cstate residency msrs don't increase in
non-root mode, however, the time under
/sys/devices/system/cpu/cpux/cpuidle/statex/ increase gradually in the
guest.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts
  2019-06-11 11:09       ` Paolo Bonzini
  2019-06-11 11:35         ` Wanpeng Li
@ 2019-06-12  2:20         ` Wanpeng Li
  1 sibling, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2019-06-12  2:20 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, Sean Christopherson, Liran Alon

On Tue, 11 Jun 2019 at 19:09, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/06/19 09:38, Wanpeng Li wrote:
> > MSR_CORE_C1_RES is unreadable except for ATOM platform, so I think we
> > can avoid the complex logic to handle C1 now. :)
>
> I disagree.  Linux uses it on all platforms is available, and virtual
> machines that don't pass mwait through _only_ have C1, so it would be
> less useful to have deep C-state residency MSRs and not C1 residency.

Your design heavily depends on read host MSR_CORE_C1_RES for C1
residency msr emulation, however, the host MSR_CORE_C1_RES is
unreadable.
#rdmsr 0x660
rdmsr: CPU 0 cannot read MSR 0x00000660

Refer to turbostat codes,
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/power/x86/turbostat/turbostat.c#n1335
C1 is derivated from other parameters. use_c1_residency_msr is true
just for ATOM platform.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-06-12  2:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-21  6:06 [PATCH 1/3] KVM: Documentation: Add disable pause exits to KVM_CAP_X86_DISABLE_EXITS Wanpeng Li
2019-05-21  6:06 ` [PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts Wanpeng Li
2019-06-04 16:53   ` Paolo Bonzini
2019-06-05 10:56     ` Wanpeng Li
2019-06-11  7:38     ` Wanpeng Li
2019-06-11 11:09       ` Paolo Bonzini
2019-06-11 11:35         ` Wanpeng Li
2019-06-12  2:20         ` Wanpeng Li
2019-05-21  6:06 ` [PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit Wanpeng Li
2019-06-04 16:59   ` Paolo Bonzini
2019-06-05 11:00     ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).