kvm.vger.kernel.org archive mirror
* [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support
@ 2017-08-24 12:27 Yu Zhang
  2017-08-24 12:27 ` [PATCH v3 1/5] KVM: x86: Add return value to kvm_cpuid() Yu Zhang
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 12:27 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

Intel's existing processors limit the maximum linear address width to
48 bits and the maximum physical address width to 46 bits. Upcoming
processors will extend the maximum linear address width to 57 bits, and
the maximum physical address width can reach up to 52 bits in practice.

With a linear address width greater than 48 bits, a new IA-32e paging
mode is introduced - 5-level paging (also known as LA57). To support VMs
with this feature, the KVM MMU code needs to be extended.

To achieve this, this patchset:
1> leverages 2 QEMU parameters, +la57 and phys-bits, to expose wider linear
and physical address widths to the VM;
2> extends the shadow logic to construct 5-level shadow page tables for VMs
running in LA57 mode;
3> extends the EPT logic to construct 5-level EPT tables for VMs whose maximum
physical address width exceeds 48 bits.
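
For background, a rough sketch of how the linear address is split in the two
modes (illustration only, not code from this series):

    /*
     * 4-level (48-bit linear addresses):
     *   [47:39] PML4 | [38:30] PDPT | [29:21] PD | [20:12] PT | [11:0] offset
     * 5-level (57-bit linear addresses, LA57):
     *   [56:48] PML5 | [47:39] PML4 | [38:30] PDPT | [29:21] PD | [20:12] PT
     *   | [11:0] offset
     *
     * Each level indexes 512 (2^9) entries, so each extra level adds 9 bits:
     * 4 * 9 + 12 = 48 and 5 * 9 + 12 = 57.
     */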

Changes in v3: 
- Address comments from Paolo Bonzini: do not fall into check_cpuid_limit()
  in kvm_cpuid() for em_movbe() and check_fxsr();
- Address comments from Paolo Bonzini: change parameter 'check_limit' of
  kvm_cpuid() to bool type;
- Address comments from Paolo Bonzini: set maxphyaddr to 36 for the guest cr3
  reserved bits check if cpuid.0x80000008 is not available;
- Address comments from Paolo Bonzini: replace the hardcoded value 48 with
  va_bits in __linearize();
- Rebase change: add a new eptp definition, VMX_EPTP_PWL_5, instead of using
  bit shifts (in line with previous commit bb97a01).

Changes in v2: 
- Address comments from Paolo Bonzini and Jim Mattson: add a new patch to let
  kvm_cpuid() return false when cpuid entry is not found; 
- Address comments from Paolo Bonzini: fix a typo in check_cr_write() and use
  62 as the upper limit when checking reserved bits for a physical address;
- Address comments from Paolo Bonzini: move definition of PT64_ROOT_MAX_LEVEL
  into kvm_host.h;
- Address comments from Paolo Bonzini: add checking for shadow_root_level in
  mmu_free_roots(); 
- Address comments from Paolo Bonzini: set root_level & shadow_root_level both
  to PT64_ROOT_4LEVEL for shadow ept situation.


Yu Zhang (5):
  KVM: x86: Add return value to kvm_cpuid().
  KVM: MMU: check guest CR3 reserved bits based on its physical address
    width.
  KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL.
  KVM: MMU: Add 5 level EPT & Shadow page table support.
  KVM: MMU: Expose the LA57 feature to VM.

 arch/x86/include/asm/kvm_emulate.h |  4 +--
 arch/x86/include/asm/kvm_host.h    | 31 ++++++--------------
 arch/x86/include/asm/vmx.h         |  2 ++
 arch/x86/kvm/cpuid.c               | 38 +++++++++++++++++-------
 arch/x86/kvm/cpuid.h               |  3 +-
 arch/x86/kvm/emulate.c             | 42 +++++++++++++++++----------
 arch/x86/kvm/kvm_cache_regs.h      |  2 +-
 arch/x86/kvm/mmu.c                 | 59 ++++++++++++++++++++++++--------------
 arch/x86/kvm/mmu.h                 |  6 +++-
 arch/x86/kvm/mmu_audit.c           |  4 +--
 arch/x86/kvm/svm.c                 |  8 +++---
 arch/x86/kvm/trace.h               | 11 ++++---
 arch/x86/kvm/vmx.c                 | 29 ++++++++++++-------
 arch/x86/kvm/x86.c                 | 21 ++++++++------
 arch/x86/kvm/x86.h                 | 44 ++++++++++++++++++++++++++++
 15 files changed, 201 insertions(+), 103 deletions(-)

-- 
2.5.0

* [PATCH v3 1/5] KVM: x86: Add return value to kvm_cpuid().
  2017-08-24 12:27 [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Yu Zhang
@ 2017-08-24 12:27 ` Yu Zhang
  2017-08-24 12:27 ` [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width Yu Zhang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 12:27 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

Return false in kvm_cpuid() when it fails to find the cpuid
entry. Also, this routine (and its caller) gains a new
argument, check_limit, so that the check_cpuid_limit()
fallback can be avoided where it is not wanted.
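
As a rough illustration of the new calling convention (the caller below is
hypothetical, not part of this patch):

    u32 eax = 0x80000001, ebx, ecx, edx;

    /* check_limit == false: never fall back to check_cpuid_limit() */
    if (!kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, false))
        /* the leaf is absent for this vCPU; all four outputs were zeroed */
        pr_debug("cpuid leaf 0x80000001 not present\n");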

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_emulate.h |  4 ++--
 arch/x86/kvm/cpuid.c               | 17 +++++++++++++----
 arch/x86/kvm/cpuid.h               |  3 ++-
 arch/x86/kvm/emulate.c             | 12 ++++++------
 arch/x86/kvm/svm.c                 |  2 +-
 arch/x86/kvm/trace.h               | 11 +++++++----
 arch/x86/kvm/x86.c                 |  6 +++---
 7 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index fde36f1..fa2558e 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -219,8 +219,8 @@ struct x86_emulate_ops {
 			 struct x86_instruction_info *info,
 			 enum x86_intercept_stage stage);
 
-	void (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
-			  u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
+	bool (*get_cpuid)(struct x86_emulate_ctxt *ctxt, u32 *eax, u32 *ebx,
+			  u32 *ecx, u32 *edx, bool check_limit);
 	void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
 
 	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 59ca2ee..1450547 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -853,16 +853,24 @@ static struct kvm_cpuid_entry2* check_cpuid_limit(struct kvm_vcpu *vcpu,
 	return kvm_find_cpuid_entry(vcpu, maxlevel->eax, index);
 }
 
-void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
+bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
+	       u32 *ecx, u32 *edx, bool check_limit)
 {
 	u32 function = *eax, index = *ecx;
 	struct kvm_cpuid_entry2 *best;
+	bool entry_found = true;
 
 	best = kvm_find_cpuid_entry(vcpu, function, index);
 
-	if (!best)
+	if (!best) {
+		entry_found = false;
+		if (!check_limit)
+			goto out;
+
 		best = check_cpuid_limit(vcpu, function, index);
+	}
 
+out:
 	if (best) {
 		*eax = best->eax;
 		*ebx = best->ebx;
@@ -870,7 +878,8 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
 		*edx = best->edx;
 	} else
 		*eax = *ebx = *ecx = *edx = 0;
-	trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx);
+	trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx, entry_found);
+	return entry_found;
 }
 EXPORT_SYMBOL_GPL(kvm_cpuid);
 
@@ -883,7 +892,7 @@ int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 
 	eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
-	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx);
+	kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx, true);
 	kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
 	kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
 	kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index ac15193..1ea3c0e 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -21,7 +21,8 @@ int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
 int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 			      struct kvm_cpuid2 *cpuid,
 			      struct kvm_cpuid_entry2 __user *entries);
-void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
+bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
+	       u32 *ecx, u32 *edx, bool check_limit);
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fb00559..319d91f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2333,7 +2333,7 @@ static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
 
 	eax = 0x80000001;
 	ecx = 0;
-	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
 	return edx & bit(X86_FEATURE_LM);
 }
 
@@ -2636,7 +2636,7 @@ static bool vendor_intel(struct x86_emulate_ctxt *ctxt)
 	u32 eax, ebx, ecx, edx;
 
 	eax = ecx = 0;
-	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
 	return ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx
 		&& ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx
 		&& edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx;
@@ -2656,7 +2656,7 @@ static bool em_syscall_is_enabled(struct x86_emulate_ctxt *ctxt)
 
 	eax = 0x00000000;
 	ecx = 0x00000000;
-	ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+	ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
 	/*
 	 * Intel ("GenuineIntel")
 	 * remark: Intel CPUs only support "syscall" in 64bit
@@ -3551,7 +3551,7 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
 	/*
 	 * Check MOVBE is set in the guest-visible CPUID leaf.
 	 */
-	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
 	if (!(ecx & FFL(MOVBE)))
 		return emulate_ud(ctxt);
 
@@ -3865,7 +3865,7 @@ static int em_cpuid(struct x86_emulate_ctxt *ctxt)
 
 	eax = reg_read(ctxt, VCPU_REGS_RAX);
 	ecx = reg_read(ctxt, VCPU_REGS_RCX);
-	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, true);
 	*reg_write(ctxt, VCPU_REGS_RAX) = eax;
 	*reg_write(ctxt, VCPU_REGS_RBX) = ebx;
 	*reg_write(ctxt, VCPU_REGS_RCX) = ecx;
@@ -3924,7 +3924,7 @@ static int check_fxsr(struct x86_emulate_ctxt *ctxt)
 {
 	u32 eax = 1, ebx, ecx = 0, edx;
 
-	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+	ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx, false);
 	if (!(edx & FFL(FXSR)))
 		return emulate_ud(ctxt);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3681287..be25072 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1561,7 +1561,7 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	}
 	init_vmcb(svm);
 
-	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
+	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy, true);
 	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
 
 	if (kvm_vcpu_apicv_active(vcpu) && !init_event)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 0a6cc67..8a202c4 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -151,8 +151,8 @@ TRACE_EVENT(kvm_fast_mmio,
  */
 TRACE_EVENT(kvm_cpuid,
 	TP_PROTO(unsigned int function, unsigned long rax, unsigned long rbx,
-		 unsigned long rcx, unsigned long rdx),
-	TP_ARGS(function, rax, rbx, rcx, rdx),
+		 unsigned long rcx, unsigned long rdx, bool found),
+	TP_ARGS(function, rax, rbx, rcx, rdx, found),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	function	)
@@ -160,6 +160,7 @@ TRACE_EVENT(kvm_cpuid,
 		__field(	unsigned long,	rbx		)
 		__field(	unsigned long,	rcx		)
 		__field(	unsigned long,	rdx		)
+		__field(	bool,		found		)
 	),
 
 	TP_fast_assign(
@@ -168,11 +169,13 @@ TRACE_EVENT(kvm_cpuid,
 		__entry->rbx		= rbx;
 		__entry->rcx		= rcx;
 		__entry->rdx		= rdx;
+		__entry->found		= found;
 	),
 
-	TP_printk("func %x rax %lx rbx %lx rcx %lx rdx %lx",
+	TP_printk("func %x rax %lx rbx %lx rcx %lx rdx %lx, cpuid entry %s",
 		  __entry->function, __entry->rax,
-		  __entry->rbx, __entry->rcx, __entry->rdx)
+		  __entry->rbx, __entry->rcx, __entry->rdx,
+		  __entry->found ? "found" : "not found")
 );
 
 #define AREG(x) { APIC_##x, "APIC_" #x }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e69923..cc2c7e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5206,10 +5206,10 @@ static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
 	return kvm_x86_ops->check_intercept(emul_to_vcpu(ctxt), info, stage);
 }
 
-static void emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
-			       u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
+static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
+			u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool check_limit)
 {
-	kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx);
+	return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, check_limit);
 }
 
 static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg)
-- 
2.5.0

* [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 12:27 [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Yu Zhang
  2017-08-24 12:27 ` [PATCH v3 1/5] KVM: x86: Add return value to kvm_cpuid() Yu Zhang
@ 2017-08-24 12:27 ` Yu Zhang
  2017-08-24 13:40   ` Paolo Bonzini
  2017-09-15 23:19   ` Jim Mattson
  2017-08-24 12:27 ` [PATCH v3 3/5] KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL Yu Zhang
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 12:27 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
reserved bits in CR3. Yet the set of reserved bits in the
guest CR3 should be based on the physical address width
exposed to the VM. This patch changes the CR3 check logic to
calculate the reserved bits at runtime.
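
For example, with a guest MAXPHYADDR of 46 the runtime mask becomes
rsvd_bits(46, 62), i.e. CR3 bits 46..62 must be zero. A worked example of the
arithmetic (not code added by this patch):

    /*
     * rsvd_bits(46, 62) = ((1ULL << (62 - 46 + 1)) - 1) << 46
     *                   = ((1ULL << 17) - 1) << 46
     *                   = 0x7FFFC00000000000
     * Bit 63 stays outside the range because it is CR3_PCID_INVD.
     */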

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/emulate.c          | 14 ++++++++++++--
 arch/x86/kvm/mmu.h              |  3 +++
 arch/x86/kvm/x86.c              |  8 ++++----
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6db0ed9..e716228 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -79,7 +79,6 @@
 			  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
 			  | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
 
-#define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL
 #define CR3_PCID_INVD		 BIT_64(63)
 #define CR4_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 319d91f..a89b595 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -28,6 +28,7 @@
 
 #include "x86.h"
 #include "tss.h"
+#include "mmu.h"
 
 /*
  * Operand types
@@ -4097,8 +4098,17 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
 		u64 rsvd = 0;
 
 		ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
-		if (efer & EFER_LMA)
-			rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;
+		if (efer & EFER_LMA) {
+			u64 maxphyaddr;
+			u32 eax = 0x80000008;
+
+			if (ctxt->ops->get_cpuid(ctxt, &eax, NULL, NULL,
+						 NULL, false))
+				maxphyaddr = eax & 0xff;
+			else
+				maxphyaddr = 36;
+			rsvd = rsvd_bits(maxphyaddr, 62);
+		}
 
 		if (new_val & rsvd)
 			return emulate_gp(ctxt, 0);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 3ed6192..67e7ec2 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,6 +48,9 @@
 
 static inline u64 rsvd_bits(int s, int e)
 {
+	if (e < s)
+		return 0;
+
 	return ((1ULL << (e - s + 1)) - 1) << s;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc2c7e4..79f5889 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -813,10 +813,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		return 0;
 	}
 
-	if (is_long_mode(vcpu)) {
-		if (cr3 & CR3_L_MODE_RESERVED_BITS)
-			return 1;
-	} else if (is_pae(vcpu) && is_paging(vcpu) &&
+	if (is_long_mode(vcpu) &&
+	    (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
+		return 1;
+	else if (is_pae(vcpu) && is_paging(vcpu) &&
 		   !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
 		return 1;
 
-- 
2.5.0

* [PATCH v3 3/5] KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL.
  2017-08-24 12:27 [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Yu Zhang
  2017-08-24 12:27 ` [PATCH v3 1/5] KVM: x86: Add return value to kvm_cpuid() Yu Zhang
  2017-08-24 12:27 ` [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width Yu Zhang
@ 2017-08-24 12:27 ` Yu Zhang
  2017-08-24 12:27 ` [PATCH v3 4/5] KVM: MMU: Add 5 level EPT & Shadow page table support Yu Zhang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 12:27 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

Now that we have both 4-level and 5-level page tables in 64-bit
long mode, rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL; we can then
use PT64_ROOT_5LEVEL for the 5-level page table, which helps make
the code clearer.

Also, PT64_ROOT_MAX_LEVEL is defined as 4, so that it can simply be
redefined to 5 wherever a replacement is needed for 5-level
paging.
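
A small sketch of why the new constant helps: arrays that need one slot per
page-table level can be sized with it and pick up 5-level support for free
once the value is bumped (the declaration below paraphrases the mmu.c hunks):

    u64 sptes[PT64_ROOT_MAX_LEVEL];  /* grows automatically when the
                                        constant is redefined to 5 */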

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  4 +++-
 arch/x86/kvm/mmu.c              | 36 ++++++++++++++++++------------------
 arch/x86/kvm/mmu.h              |  2 +-
 arch/x86/kvm/mmu_audit.c        |  4 ++--
 arch/x86/kvm/svm.c              |  2 +-
 5 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e716228..5907d46 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,8 +315,10 @@ struct kvm_pio_request {
 	int size;
 };
 
+#define PT64_ROOT_MAX_LEVEL 4
+
 struct rsvd_bits_validate {
-	u64 rsvd_bits_mask[2][4];
+	u64 rsvd_bits_mask[2][PT64_ROOT_MAX_LEVEL];
 	u64 bad_mt_xwr;
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2dafd36..3faa20c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2167,8 +2167,8 @@ static bool kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
 }
 
 struct mmu_page_path {
-	struct kvm_mmu_page *parent[PT64_ROOT_LEVEL];
-	unsigned int idx[PT64_ROOT_LEVEL];
+	struct kvm_mmu_page *parent[PT64_ROOT_MAX_LEVEL];
+	unsigned int idx[PT64_ROOT_MAX_LEVEL];
 };
 
 #define for_each_sp(pvec, sp, parents, i)			\
@@ -2383,8 +2383,8 @@ static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
 	iterator->shadow_addr = vcpu->arch.mmu.root_hpa;
 	iterator->level = vcpu->arch.mmu.shadow_root_level;
 
-	if (iterator->level == PT64_ROOT_LEVEL &&
-	    vcpu->arch.mmu.root_level < PT64_ROOT_LEVEL &&
+	if (iterator->level == PT64_ROOT_4LEVEL &&
+	    vcpu->arch.mmu.root_level < PT64_ROOT_4LEVEL &&
 	    !vcpu->arch.mmu.direct_map)
 		--iterator->level;
 
@@ -3322,8 +3322,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL &&
-	    (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL ||
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL &&
+	    (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL ||
 	     vcpu->arch.mmu.direct_map)) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
@@ -3375,13 +3375,13 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	struct kvm_mmu_page *sp;
 	unsigned i;
 
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL) {
 		spin_lock(&vcpu->kvm->mmu_lock);
 		if(make_mmu_pages_available(vcpu) < 0) {
 			spin_unlock(&vcpu->kvm->mmu_lock);
 			return 1;
 		}
-		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
+		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_4LEVEL, 1, ACC_ALL);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3425,7 +3425,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
 	 */
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
+	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		MMU_WARN_ON(VALID_PAGE(root));
@@ -3435,7 +3435,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 			spin_unlock(&vcpu->kvm->mmu_lock);
 			return 1;
 		}
-		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
+		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_4LEVEL,
 				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
@@ -3450,7 +3450,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * the shadow page table may be a PAE or a long mode page table.
 	 */
 	pm_mask = PT_PRESENT_MASK;
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL)
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL)
 		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
 
 	for (i = 0; i < 4; ++i) {
@@ -3486,7 +3486,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * If we shadow a 32 bit page table with a long mode page
 	 * table we enter this path.
 	 */
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL) {
 		if (vcpu->arch.mmu.lm_root == NULL) {
 			/*
 			 * The additional page necessary for this is only
@@ -3531,7 +3531,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 
 	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 	kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
+	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 		sp = page_header(root);
 		mmu_sync_children(vcpu, sp);
@@ -3614,7 +3614,7 @@ static bool
 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
 {
 	struct kvm_shadow_walk_iterator iterator;
-	u64 sptes[PT64_ROOT_LEVEL], spte = 0ull;
+	u64 sptes[PT64_ROOT_MAX_LEVEL], spte = 0ull;
 	int root, leaf;
 	bool reserved = false;
 
@@ -4057,7 +4057,7 @@ __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 		rsvd_check->rsvd_bits_mask[1][0] =
 			rsvd_check->rsvd_bits_mask[0][0];
 		break;
-	case PT64_ROOT_LEVEL:
+	case PT64_ROOT_4LEVEL:
 		rsvd_check->rsvd_bits_mask[0][3] = exb_bit_rsvd |
 			nonleaf_bit8_rsvd | rsvd_bits(7, 7) |
 			rsvd_bits(maxphyaddr, 51);
@@ -4367,7 +4367,7 @@ static void paging64_init_context_common(struct kvm_vcpu *vcpu,
 static void paging64_init_context(struct kvm_vcpu *vcpu,
 				  struct kvm_mmu *context)
 {
-	paging64_init_context_common(vcpu, context, PT64_ROOT_LEVEL);
+	paging64_init_context_common(vcpu, context, PT64_ROOT_4LEVEL);
 }
 
 static void paging32_init_context(struct kvm_vcpu *vcpu,
@@ -4422,7 +4422,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->root_level = 0;
 	} else if (is_long_mode(vcpu)) {
 		context->nx = is_nx(vcpu);
-		context->root_level = PT64_ROOT_LEVEL;
+		context->root_level = PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	} else if (is_pae(vcpu)) {
@@ -4533,7 +4533,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
 	} else if (is_long_mode(vcpu)) {
 		g_context->nx = is_nx(vcpu);
-		g_context->root_level = PT64_ROOT_LEVEL;
+		g_context->root_level = PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
 	} else if (is_pae(vcpu)) {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 67e7ec2..45b9a11 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -37,7 +37,7 @@
 #define PT32_DIR_PSE36_MASK \
 	(((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT)
 
-#define PT64_ROOT_LEVEL 4
+#define PT64_ROOT_4LEVEL 4
 #define PT32_ROOT_LEVEL 2
 #define PT32E_ROOT_LEVEL 3
 
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index dcce533..2e6996d 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -62,11 +62,11 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
+	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		sp = page_header(root);
-		__mmu_spte_walk(vcpu, sp, fn, PT64_ROOT_LEVEL);
+		__mmu_spte_walk(vcpu, sp, fn, PT64_ROOT_4LEVEL);
 		return;
 	}
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index be25072..19e43ca 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -566,7 +566,7 @@ static inline void invlpga(unsigned long addr, u32 asid)
 static int get_npt_level(void)
 {
 #ifdef CONFIG_X86_64
-	return PT64_ROOT_LEVEL;
+	return PT64_ROOT_4LEVEL;
 #else
 	return PT32E_ROOT_LEVEL;
 #endif
-- 
2.5.0

* [PATCH v3 4/5] KVM: MMU: Add 5 level EPT & Shadow page table support.
  2017-08-24 12:27 [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Yu Zhang
                   ` (2 preceding siblings ...)
  2017-08-24 12:27 ` [PATCH v3 3/5] KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL Yu Zhang
@ 2017-08-24 12:27 ` Yu Zhang
  2017-08-24 13:42   ` Paolo Bonzini
  2017-08-24 12:27 ` [PATCH v3 5/5] KVM: MMU: Expose the LA57 feature to VM Yu Zhang
  2017-08-24 13:47 ` [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Paolo Bonzini
  5 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 12:27 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

Extend the shadow paging code so that a 5-level shadow page
table can be constructed if the VM is running in 5-level paging
mode.

Also extend the EPT code so that a 5-level EPT table can be
constructed if the maxphyaddr of the VM exceeds 48 bits. Unlike the
shadow logic, KVM should still use a 4-level EPT table for a VM
whose physical address width does not exceed 48 bits, even when
the VM is running in 5-level paging mode.
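
The reasoning: a 4-level EPT walk already covers a 48-bit guest-physical
address space (4 * 9 + 12 = 48 bits), so only guests whose MAXPHYADDR exceeds
48 need the 5-level walk. A condensed sketch of the decision, mirroring
get_ept_level() in the vmx.c hunk below:

    /* pick the EPT page-walk length from the guest's MAXPHYADDR */
    int level = (cpu_has_vmx_ept_5levels() && cpuid_maxphyaddr(vcpu) > 48) ?
                5 : 4;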

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 10 +++++-----
 arch/x86/include/asm/vmx.h      |  2 ++
 arch/x86/kvm/cpuid.c            |  5 +++++
 arch/x86/kvm/mmu.c              | 43 +++++++++++++++++++++++++++--------------
 arch/x86/kvm/mmu.h              |  1 +
 arch/x86/kvm/mmu_audit.c        |  4 ++--
 arch/x86/kvm/svm.c              |  4 ++--
 arch/x86/kvm/vmx.c              | 21 ++++++++++++++------
 arch/x86/kvm/x86.h              | 10 ++++++++++
 9 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5907d46..bdef532 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,7 +315,7 @@ struct kvm_pio_request {
 	int size;
 };
 
-#define PT64_ROOT_MAX_LEVEL 4
+#define PT64_ROOT_MAX_LEVEL 5
 
 struct rsvd_bits_validate {
 	u64 rsvd_bits_mask[2][PT64_ROOT_MAX_LEVEL];
@@ -323,9 +323,9 @@ struct rsvd_bits_validate {
 };
 
 /*
- * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
- * 32-bit).  The kvm_mmu structure abstracts the details of the current mmu
- * mode.
+ * x86 supports 4 paging modes (5-level 64-bit, 4-level 64-bit, 3-level 32-bit,
+ * and 2-level 32-bit).  The kvm_mmu structure abstracts the details of the
+ * current mmu mode.
  */
 struct kvm_mmu {
 	void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
@@ -982,7 +982,7 @@ struct kvm_x86_ops {
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
-	int (*get_tdp_level)(void);
+	int (*get_tdp_level)(struct kvm_vcpu *vcpu);
 	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 	int (*get_lpage_level)(void);
 	bool (*rdtscp_supported)(void);
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 340007a..caec841 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -453,6 +453,7 @@ enum vmcs_field {
 
 #define VMX_EPT_EXECUTE_ONLY_BIT		(1ull)
 #define VMX_EPT_PAGE_WALK_4_BIT			(1ull << 6)
+#define VMX_EPT_PAGE_WALK_5_BIT			(1ull << 7)
 #define VMX_EPTP_UC_BIT				(1ull << 8)
 #define VMX_EPTP_WB_BIT				(1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT			(1ull << 16)
@@ -471,6 +472,7 @@ enum vmcs_field {
 #define VMX_EPT_MT_EPTE_SHIFT			3
 #define VMX_EPTP_PWL_MASK			0x38ull
 #define VMX_EPTP_PWL_4				0x18ull
+#define VMX_EPTP_PWL_5				0x20ull
 #define VMX_EPTP_AD_ENABLE_BIT			(1ull << 6)
 #define VMX_EPTP_MT_MASK			0x7ull
 #define VMX_EPTP_MT_WB				0x6ull
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1450547..83865a3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -137,6 +137,11 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 	/* Update physical-address width */
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
 
+#ifdef CONFIG_X86_64
+	if (vcpu->arch.maxphyaddr > 48)
+		kvm_mmu_reset_context(vcpu);
+#endif
+
 	kvm_pmu_refresh(vcpu);
 	return 0;
 }
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3faa20c..f47ccca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3322,8 +3322,8 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL &&
-	    (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL ||
+	if (vcpu->arch.mmu.shadow_root_level >= PT64_ROOT_4LEVEL &&
+	    (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL ||
 	     vcpu->arch.mmu.direct_map)) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
@@ -3375,13 +3375,14 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	struct kvm_mmu_page *sp;
 	unsigned i;
 
-	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.shadow_root_level >= PT64_ROOT_4LEVEL) {
 		spin_lock(&vcpu->kvm->mmu_lock);
 		if(make_mmu_pages_available(vcpu) < 0) {
 			spin_unlock(&vcpu->kvm->mmu_lock);
 			return 1;
 		}
-		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_4LEVEL, 1, ACC_ALL);
+		sp = kvm_mmu_get_page(vcpu, 0, 0,
+				vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3425,7 +3426,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	 * Do we shadow a long mode page table? If so we need to
 	 * write-protect the guests page table root.
 	 */
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		MMU_WARN_ON(VALID_PAGE(root));
@@ -3435,8 +3436,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 			spin_unlock(&vcpu->kvm->mmu_lock);
 			return 1;
 		}
-		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_4LEVEL,
-				      0, ACC_ALL);
+		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
+				vcpu->arch.mmu.shadow_root_level, 0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3531,7 +3532,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
 
 	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 	kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 		sp = page_header(root);
 		mmu_sync_children(vcpu, sp);
@@ -4057,6 +4058,12 @@ __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 		rsvd_check->rsvd_bits_mask[1][0] =
 			rsvd_check->rsvd_bits_mask[0][0];
 		break;
+	case PT64_ROOT_5LEVEL:
+		rsvd_check->rsvd_bits_mask[0][4] = exb_bit_rsvd |
+			nonleaf_bit8_rsvd | rsvd_bits(7, 7) |
+			rsvd_bits(maxphyaddr, 51);
+		rsvd_check->rsvd_bits_mask[1][4] =
+			rsvd_check->rsvd_bits_mask[0][4];
 	case PT64_ROOT_4LEVEL:
 		rsvd_check->rsvd_bits_mask[0][3] = exb_bit_rsvd |
 			nonleaf_bit8_rsvd | rsvd_bits(7, 7) |
@@ -4098,6 +4105,8 @@ __reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
 {
 	u64 bad_mt_xwr;
 
+	rsvd_check->rsvd_bits_mask[0][4] =
+		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
 	rsvd_check->rsvd_bits_mask[0][3] =
 		rsvd_bits(maxphyaddr, 51) | rsvd_bits(3, 7);
 	rsvd_check->rsvd_bits_mask[0][2] =
@@ -4107,6 +4116,7 @@ __reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
 	rsvd_check->rsvd_bits_mask[0][0] = rsvd_bits(maxphyaddr, 51);
 
 	/* large page */
+	rsvd_check->rsvd_bits_mask[1][4] = rsvd_check->rsvd_bits_mask[0][4];
 	rsvd_check->rsvd_bits_mask[1][3] = rsvd_check->rsvd_bits_mask[0][3];
 	rsvd_check->rsvd_bits_mask[1][2] =
 		rsvd_bits(maxphyaddr, 51) | rsvd_bits(12, 29);
@@ -4367,7 +4377,10 @@ static void paging64_init_context_common(struct kvm_vcpu *vcpu,
 static void paging64_init_context(struct kvm_vcpu *vcpu,
 				  struct kvm_mmu *context)
 {
-	paging64_init_context_common(vcpu, context, PT64_ROOT_4LEVEL);
+	int root_level = is_la57_mode(vcpu) ?
+			 PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
+
+	paging64_init_context_common(vcpu, context, root_level);
 }
 
 static void paging32_init_context(struct kvm_vcpu *vcpu,
@@ -4408,7 +4421,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 	context->sync_page = nonpaging_sync_page;
 	context->invlpg = nonpaging_invlpg;
 	context->update_pte = nonpaging_update_pte;
-	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+	context->shadow_root_level = kvm_x86_ops->get_tdp_level(vcpu);
 	context->root_hpa = INVALID_PAGE;
 	context->direct_map = true;
 	context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
@@ -4422,7 +4435,8 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 		context->root_level = 0;
 	} else if (is_long_mode(vcpu)) {
 		context->nx = is_nx(vcpu);
-		context->root_level = PT64_ROOT_4LEVEL;
+		context->root_level = is_la57_mode(vcpu) ?
+				PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, context);
 		context->gva_to_gpa = paging64_gva_to_gpa;
 	} else if (is_pae(vcpu)) {
@@ -4479,7 +4493,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 
 	MMU_WARN_ON(VALID_PAGE(context->root_hpa));
 
-	context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+	context->shadow_root_level = PT64_ROOT_4LEVEL;
 
 	context->nx = true;
 	context->ept_ad = accessed_dirty;
@@ -4488,7 +4502,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 	context->sync_page = ept_sync_page;
 	context->invlpg = ept_invlpg;
 	context->update_pte = ept_update_pte;
-	context->root_level = context->shadow_root_level;
+	context->root_level = PT64_ROOT_4LEVEL;
 	context->root_hpa = INVALID_PAGE;
 	context->direct_map = false;
 	context->base_role.ad_disabled = !accessed_dirty;
@@ -4533,7 +4547,8 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 		g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
 	} else if (is_long_mode(vcpu)) {
 		g_context->nx = is_nx(vcpu);
-		g_context->root_level = PT64_ROOT_4LEVEL;
+		g_context->root_level = is_la57_mode(vcpu) ?
+					PT64_ROOT_5LEVEL : PT64_ROOT_4LEVEL;
 		reset_rsvds_bits_mask(vcpu, g_context);
 		g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
 	} else if (is_pae(vcpu)) {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 45b9a11..e2999f5 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -37,6 +37,7 @@
 #define PT32_DIR_PSE36_MASK \
 	(((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT)
 
+#define PT64_ROOT_5LEVEL 5
 #define PT64_ROOT_4LEVEL 4
 #define PT32_ROOT_LEVEL 2
 #define PT32E_ROOT_LEVEL 3
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 2e6996d..d22ddbd 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -62,11 +62,11 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
 	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 		return;
 
-	if (vcpu->arch.mmu.root_level == PT64_ROOT_4LEVEL) {
+	if (vcpu->arch.mmu.root_level >= PT64_ROOT_4LEVEL) {
 		hpa_t root = vcpu->arch.mmu.root_hpa;
 
 		sp = page_header(root);
-		__mmu_spte_walk(vcpu, sp, fn, PT64_ROOT_4LEVEL);
+		__mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu.root_level);
 		return;
 	}
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 19e43ca..7161d79 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -563,7 +563,7 @@ static inline void invlpga(unsigned long addr, u32 asid)
 	asm volatile (__ex(SVM_INVLPGA) : : "a"(addr), "c"(asid));
 }
 
-static int get_npt_level(void)
+static int get_npt_level(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
 	return PT64_ROOT_4LEVEL;
@@ -2370,7 +2370,7 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.get_cr3           = nested_svm_get_tdp_cr3;
 	vcpu->arch.mmu.get_pdptr         = nested_svm_get_tdp_pdptr;
 	vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;
-	vcpu->arch.mmu.shadow_root_level = get_npt_level();
+	vcpu->arch.mmu.shadow_root_level = get_npt_level(vcpu);
 	reset_shadow_zero_bits_mask(vcpu, &vcpu->arch.mmu);
 	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 19aa69a..d4038a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1205,6 +1205,11 @@ static inline bool cpu_has_vmx_ept_mt_wb(void)
 	return vmx_capability.ept & VMX_EPTP_WB_BIT;
 }
 
+static inline bool cpu_has_vmx_ept_5levels(void)
+{
+	return vmx_capability.ept & VMX_EPT_PAGE_WALK_5_BIT;
+}
+
 static inline bool cpu_has_vmx_ept_ad_bits(void)
 {
 	return vmx_capability.ept & VMX_EPT_AD_BIT;
@@ -4301,9 +4306,18 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	vmx->emulation_required = emulation_required(vcpu);
 }
 
+static int get_ept_level(struct kvm_vcpu *vcpu)
+{
+	if (cpu_has_vmx_ept_5levels() && (cpuid_maxphyaddr(vcpu) > 48))
+		return 5;
+	return 4;
+}
+
 static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
 {
-	u64 eptp = VMX_EPTP_MT_WB | VMX_EPTP_PWL_4;
+	u64 eptp = VMX_EPTP_MT_WB;
+
+	eptp |= (get_ept_level(vcpu) == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4;
 
 	if (enable_ept_ad_bits &&
 	    (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu)))
@@ -9501,11 +9515,6 @@ static void __init vmx_check_processor_compat(void *rtn)
 	}
 }
 
-static int get_ept_level(void)
-{
-	return 4;
-}
-
 static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	u8 cache;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1134603..69de8bf 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -62,6 +62,16 @@ static inline bool is_64_bit_mode(struct kvm_vcpu *vcpu)
 	return cs_l;
 }
 
+static inline bool is_la57_mode(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_X86_64
+	return (vcpu->arch.efer & EFER_LMA) &&
+		 kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
+#else
+	return 0;
+#endif
+}
+
 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
-- 
2.5.0

* [PATCH v3 5/5] KVM: MMU: Expose the LA57 feature to VM.
  2017-08-24 12:27 [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Yu Zhang
                   ` (3 preceding siblings ...)
  2017-08-24 12:27 ` [PATCH v3 4/5] KVM: MMU: Add 5 level EPT & Shadow page table support Yu Zhang
@ 2017-08-24 12:27 ` Yu Zhang
  2017-08-24 13:47 ` [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Paolo Bonzini
  5 siblings, 0 replies; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 12:27 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, pbonzini, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

This patch exposes the 5-level page table feature (LA57) to the VM.
At the same time, the canonical virtual address checking is
extended to support both 48-bit and 57-bit address widths.
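
For instance, 0x0000800000000000 (bit 47 set) is non-canonical with a 48-bit
address width, but becomes canonical once LA57 raises the width to 57 bits.
A condensed form of the check, following the get_canonical() helper added
below:

    /* canonical <=> bits [63:width-1] are a sign extension of bit width-1 */
    bool canonical = (((s64)la << (64 - width)) >> (64 - width)) == la;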

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 18 ++----------------
 arch/x86/kvm/cpuid.c            | 16 ++++++++++------
 arch/x86/kvm/emulate.c          | 16 +++++++++-------
 arch/x86/kvm/kvm_cache_regs.h   |  2 +-
 arch/x86/kvm/vmx.c              |  8 ++++----
 arch/x86/kvm/x86.c              |  7 +++++--
 arch/x86/kvm/x86.h              | 34 ++++++++++++++++++++++++++++++++++
 7 files changed, 65 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdef532..b4d4f51 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -85,8 +85,8 @@
 			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE     \
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
-			  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE | X86_CR4_SMAP \
-			  | X86_CR4_PKE))
+			  | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
+			  | X86_CR4_SMAP | X86_CR4_PKE))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
 
@@ -1300,20 +1300,6 @@ static inline void kvm_inject_gp(struct kvm_vcpu *vcpu, u32 error_code)
 	kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
 }
 
-static inline u64 get_canonical(u64 la)
-{
-	return ((int64_t)la << 16) >> 16;
-}
-
-static inline bool is_noncanonical_address(u64 la)
-{
-#ifdef CONFIG_X86_64
-	return get_canonical(la) != la;
-#else
-	return false;
-#endif
-}
-
 #define TSS_IOPB_BASE_OFFSET 0x66
 #define TSS_BASE_SIZE 0x68
 #define TSS_IOPB_SIZE (65536 / 8)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 83865a3..8683811 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -126,13 +126,16 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
 	/*
-	 * The existing code assumes virtual address is 48-bit in the canonical
-	 * address checks; exit if it is ever changed.
+	 * The existing code assumes virtual address is 48-bit or 57-bit in the
+	 * canonical address checks; exit if it is ever changed.
 	 */
 	best = kvm_find_cpuid_entry(vcpu, 0x80000008, 0);
-	if (best && ((best->eax & 0xff00) >> 8) != 48 &&
-		((best->eax & 0xff00) >> 8) != 0)
-		return -EINVAL;
+	if (best) {
+		int vaddr_bits = (best->eax & 0xff00) >> 8;
+
+		if (vaddr_bits != 48 && vaddr_bits != 57 && vaddr_bits != 0)
+			return -EINVAL;
+	}
 
 	/* Update physical-address width */
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
@@ -388,7 +391,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
 	/* cpuid 7.0.ecx*/
 	const u32 kvm_cpuid_7_0_ecx_x86_features =
-		F(AVX512VBMI) | F(PKU) | 0 /*OSPKE*/ | F(AVX512_VPOPCNTDQ);
+		F(AVX512VBMI) | F(LA57) | F(PKU) |
+		0 /*OSPKE*/ | F(AVX512_VPOPCNTDQ);
 
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a89b595..16bf665 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -689,16 +689,18 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
 	ulong la;
 	u32 lim;
 	u16 sel;
+	u8  va_bits;
 
 	la = seg_base(ctxt, addr.seg) + addr.ea;
 	*max_size = 0;
 	switch (mode) {
 	case X86EMUL_MODE_PROT64:
 		*linear = la;
-		if (is_noncanonical_address(la))
+		va_bits = ctxt_virt_addr_bits(ctxt);
+		if (get_canonical(la, va_bits) != la)
 			goto bad;
 
-		*max_size = min_t(u64, ~0u, (1ull << 48) - la);
+		*max_size = min_t(u64, ~0u, (1ull << va_bits) - la);
 		if (size > *max_size)
 			goto bad;
 		break;
@@ -1749,8 +1751,8 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 				sizeof(base3), &ctxt->exception);
 		if (ret != X86EMUL_CONTINUE)
 			return ret;
-		if (is_noncanonical_address(get_desc_base(&seg_desc) |
-					     ((u64)base3 << 32)))
+		if (emul_is_noncanonical_address(get_desc_base(&seg_desc) |
+				((u64)base3 << 32), ctxt))
 			return emulate_gp(ctxt, 0);
 	}
 load:
@@ -2841,8 +2843,8 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
 		ss_sel = cs_sel + 8;
 		cs.d = 0;
 		cs.l = 1;
-		if (is_noncanonical_address(rcx) ||
-		    is_noncanonical_address(rdx))
+		if (emul_is_noncanonical_address(rcx, ctxt) ||
+		    emul_is_noncanonical_address(rdx, ctxt))
 			return emulate_gp(ctxt, 0);
 		break;
 	}
@@ -3757,7 +3759,7 @@ static int em_lgdt_lidt(struct x86_emulate_ctxt *ctxt, bool lgdt)
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 	if (ctxt->mode == X86EMUL_MODE_PROT64 &&
-	    is_noncanonical_address(desc_ptr.address))
+	    emul_is_noncanonical_address(desc_ptr.address, ctxt))
 		return emulate_gp(ctxt, 0);
 	if (lgdt)
 		ctxt->ops->set_gdt(ctxt, &desc_ptr);
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 762cdf2..0052317 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -4,7 +4,7 @@
 #define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS
 #define KVM_POSSIBLE_CR4_GUEST_BITS				  \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_PGE)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_PGE)
 
 static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu,
 					      enum kvm_reg reg)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d4038a1..18be45d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -122,7 +122,7 @@ module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO);
 	(KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE)
 #define KVM_CR4_GUEST_OWNED_BITS				      \
 	(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR      \
-	 | X86_CR4_OSXMMEXCPT | X86_CR4_TSD)
+	 | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_TSD)
 
 #define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
 #define KVM_RMODE_VM_CR4_ALWAYS_ON (X86_CR4_VME | X86_CR4_PAE | X86_CR4_VMXE)
@@ -3373,7 +3373,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    (!msr_info->host_initiated &&
 		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
 			return 1;
-		if (is_noncanonical_address(data & PAGE_MASK) ||
+		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
 		    (data & MSR_IA32_BNDCFGS_RSVD))
 			return 1;
 		vmcs_write64(GUEST_BNDCFGS, data);
@@ -7034,7 +7034,7 @@ static int get_vmx_mem_address(struct kvm_vcpu *vcpu,
 		 * non-canonical form. This is the only check on the memory
 		 * destination for long mode!
 		 */
-		exn = is_noncanonical_address(*ret);
+		exn = is_noncanonical_address(*ret, vcpu);
 	} else if (is_protmode(vcpu)) {
 		/* Protected mode: apply checks for segment validity in the
 		 * following order:
@@ -7839,7 +7839,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 
 	switch (type) {
 	case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
-		if (is_noncanonical_address(operand.gla)) {
+		if (is_noncanonical_address(operand.gla, vcpu)) {
 			nested_vmx_failValid(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 			return kvm_skip_emulated_instruction(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79f5889..80eceb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -769,6 +769,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE))
 		return 1;
 
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57))
+		return 1;
+
 	if (is_long_mode(vcpu)) {
 		if (!(cr4 & X86_CR4_PAE))
 			return 1;
@@ -1074,7 +1077,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_KERNEL_GS_BASE:
 	case MSR_CSTAR:
 	case MSR_LSTAR:
-		if (is_noncanonical_address(msr->data))
+		if (is_noncanonical_address(msr->data, vcpu))
 			return 1;
 		break;
 	case MSR_IA32_SYSENTER_EIP:
@@ -1091,7 +1094,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 * value, and that something deterministic happens if the guest
 		 * invokes 64-bit SYSENTER.
 		 */
-		msr->data = get_canonical(msr->data);
+		msr->data = get_canonical(msr->data, vcpu_virt_addr_bits(vcpu));
 	}
 	return kvm_x86_ops->set_msr(vcpu, msr);
 }
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 69de8bf..de1abf4 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -97,6 +97,40 @@ static inline u32 bit(int bitno)
 	return 1 << (bitno & 31);
 }
 
+static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
+{
+	return kvm_read_cr4_bits(vcpu, X86_CR4_LA57) ? 57 : 48;
+}
+
+static inline u8 ctxt_virt_addr_bits(struct x86_emulate_ctxt *ctxt)
+{
+	return (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_LA57) ? 57 : 48;
+}
+
+static inline u64 get_canonical(u64 la, u8 vaddr_bits)
+{
+	return ((int64_t)la << (64 - vaddr_bits)) >> (64 - vaddr_bits);
+}
+
+static inline bool is_noncanonical_address(u64 la, struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_X86_64
+	return get_canonical(la, vcpu_virt_addr_bits(vcpu)) != la;
+#else
+	return false;
+#endif
+}
+
+static inline bool emul_is_noncanonical_address(u64 la,
+						struct x86_emulate_ctxt *ctxt)
+{
+#ifdef CONFIG_X86_64
+	return get_canonical(la, ctxt_virt_addr_bits(ctxt)) != la;
+#else
+	return false;
+#endif
+}
+
 static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 					gva_t gva, gfn_t gfn, unsigned access)
 {
-- 
2.5.0

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 12:27 ` [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width Yu Zhang
@ 2017-08-24 13:40   ` Paolo Bonzini
  2017-08-24 15:23     ` Yu Zhang
  2017-09-15 23:19   ` Jim Mattson
  1 sibling, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2017-08-24 13:40 UTC (permalink / raw)
  To: Yu Zhang, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

On 24/08/2017 14:27, Yu Zhang wrote:
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 3ed6192..67e7ec2 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -48,6 +48,9 @@
>  
>  static inline u64 rsvd_bits(int s, int e)
>  {
> +	if (e < s)
> +		return 0;
> +
>  	return ((1ULL << (e - s + 1)) - 1) << s;
>  }

e = s - 1 is already supported; why do you need e <= s - 2?

Paolo

* Re: [PATCH v3 4/5] KVM: MMU: Add 5 level EPT & Shadow page table support.
  2017-08-24 12:27 ` [PATCH v3 4/5] KVM: MMU: Add 5 level EPT & Shadow page table support Yu Zhang
@ 2017-08-24 13:42   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2017-08-24 13:42 UTC (permalink / raw)
  To: Yu Zhang, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

On 24/08/2017 14:27, Yu Zhang wrote:
>  	/* Update physical-address width */
>  	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
>  
> +#ifdef CONFIG_X86_64
> +	if (vcpu->arch.maxphyaddr > 48)
> +		kvm_mmu_reset_context(vcpu);
> +#endif

I think CONFIG_X86_64 is not necessary to have maxphyaddr>48?

Paolo

>  	kvm_pmu_refresh(vcpu);
>  	return 0;
>  }

* Re: [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support
  2017-08-24 12:27 [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Yu Zhang
                   ` (4 preceding siblings ...)
  2017-08-24 12:27 ` [PATCH v3 5/5] KVM: MMU: Expose the LA57 feature to VM Yu Zhang
@ 2017-08-24 13:47 ` Paolo Bonzini
  5 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2017-08-24 13:47 UTC (permalink / raw)
  To: Yu Zhang, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

On 24/08/2017 14:27, Yu Zhang wrote:
> Intel's existing processors limit the maximum linear address width to
> 48 bits, and the maximum physical address width to 46 bits. And the
> upcoming processors will extend maximum linear address width to 57 bits
> and maximum physical address width can go upto 52 bits in practical.
> 
> With linear address width greater than 48, a new paging mode in IA-32e
> is introduced - 5 level paging(also known as LA57). And to support VMs 
> with this feature, KVM MMU code need to be extended. 
> 
> And to achieve this, this patchset:
> 1> leverages 2 qemu parameters: +la57 and phys-bits to expose wider linear
> address width and physical address width to the VM; 
> 2> extends shadow logic to construct 5 level shadow page for VMs running
> in LA57 mode;
> 3> extends ept logic to construct 5 level ept table for VMs whose maximum
> physical width exceeds 48 bits.
> 
> Changes in v3: 
> - Address comments from Paolo Bonzini: do not fall into check_cpuid_limit()
>   in kvm_cpuid() for em_movbe() and check_fxsr();
> - Address comments from Paolo Bonzini: change parameter 'check_limit' of
>   kvm_cpuid() to bool type;
> - Address comments from Paolo Bonzini: set maxphyaddr to 36, for guest cr3
>   reserved bits check if cpuid.0x80000008 is not available;
> - Address comments from Paolo Bonzini: replace the hardcoded value 48 as
>   va_bits in __linearize();
> - Rebase change: add new eptp definition VMX_EPTP_PWL_5, instead of use bit
>   shifts(in line with previous commit bb97a01).
> 
> Changes in v2: 
> - Address comments from Paolo Bonzini and Jim Mattson: add a new patch to let
>   kvm_cpuid() return false when cpuid entry is not found; 
> - Address comments from Paolo Bonzini: fix a typo in check_cr_write() and use
>   62 as the upper limit when checking reserved bits for a physical address;
> - Address comments from Paolo Bonzini: move definition of PT64_ROOT_MAX_LEVEL
>   into kvm_host.h;
> - Address comments from Paolo Bonzini: add checking for shadow_root_level in
>   mmu_free_roots(); 
> - Address comments from Paolo Bonzini: set root_level & shadow_root_level both
>   to PT64_ROOT_4LEVEL for shadow ept situation.
> 
> 
> Yu Zhang (5):
>   KVM: x86: Add return value to kvm_cpuid().
>   KVM: MMU: check guest CR3 reserved bits based on its physical address
>     width.
>   KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL.
>   KVM: MMU: Add 5 level EPT & Shadow page table support.
>   KVM: MMU: Expose the LA57 feature to VM.
> 
>  arch/x86/include/asm/kvm_emulate.h |  4 +--
>  arch/x86/include/asm/kvm_host.h    | 31 ++++++--------------
>  arch/x86/include/asm/vmx.h         |  2 ++
>  arch/x86/kvm/cpuid.c               | 38 +++++++++++++++++-------
>  arch/x86/kvm/cpuid.h               |  3 +-
>  arch/x86/kvm/emulate.c             | 42 +++++++++++++++++----------
>  arch/x86/kvm/kvm_cache_regs.h      |  2 +-
>  arch/x86/kvm/mmu.c                 | 59 ++++++++++++++++++++++++--------------
>  arch/x86/kvm/mmu.h                 |  6 +++-
>  arch/x86/kvm/mmu_audit.c           |  4 +--
>  arch/x86/kvm/svm.c                 |  8 +++---
>  arch/x86/kvm/trace.h               | 11 ++++---
>  arch/x86/kvm/vmx.c                 | 29 ++++++++++++-------
>  arch/x86/kvm/x86.c                 | 21 ++++++++------
>  arch/x86/kvm/x86.h                 | 44 ++++++++++++++++++++++++++++
>  15 files changed, 201 insertions(+), 103 deletions(-)
> 

Applied to kvm/queue.  The only change I made is to make
kvm_mmu_reset_context unconditional in patch 4, because changing
MAXPHYADDR invalidates the MMU's bitmasks for reserved bits.
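
(As a guess at what that looks like - not the diff actually applied - the
kvm_update_cpuid() hunk from patch 4 would simply become:

    /*
     * A MAXPHYADDR change invalidates the cached reserved-bit masks,
     * so rebuild the MMU context unconditionally.
     */
    kvm_mmu_reset_context(vcpu);

i.e. without the CONFIG_X86_64 guard and the maxphyaddr > 48 check.)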

Thanks,

Paolo

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 13:40   ` Paolo Bonzini
@ 2017-08-24 15:23     ` Yu Zhang
  2017-08-24 15:50       ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 15:23 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro



On 8/24/2017 9:40 PM, Paolo Bonzini wrote:
> On 24/08/2017 14:27, Yu Zhang wrote:
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 3ed6192..67e7ec2 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -48,6 +48,9 @@
>>   
>>   static inline u64 rsvd_bits(int s, int e)
>>   {
>> +	if (e < s)
>> +		return 0;
>> +
>>   	return ((1ULL << (e - s + 1)) - 1) << s;
>>   }
> e = s - 1 is already supported; why do you need e <= s - 2?

Sorry? I do not quite understand. When will e = s - 1?

Thanks
Yu
> Paolo
>

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 15:50       ` Paolo Bonzini
@ 2017-08-24 15:38         ` Yu Zhang
  2017-08-24 16:27           ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 15:38 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro



On 8/24/2017 11:50 PM, Paolo Bonzini wrote:
> On 24/08/2017 17:23, Yu Zhang wrote:
>>>>      static inline u64 rsvd_bits(int s, int e)
>>>>    {
>>>> +    if (e < s)
>>>> +        return 0;
>>>> +
>>>>        return ((1ULL << (e - s + 1)) - 1) << s;
>>>>    }
>>> e = s - 1 is already supported; why do you need e <= s - 2?
>> Sorry? I do not quite understand. When will e = s - 1?
> Is there any case where e < s?  I can see that MAXPHYADDR=63 gives
> rsvd_bits(63, 62), but that works.
>
> In practice, MAXPHYADDR will never be 59 even because the PKRU bits are
> at bits 59..62.

Thanks, Paolo.
I see. I had assumed that MAXPHYADDR would not exceed the physical one,
which is 52 I believe. But I'm not sure there's any place that checks this.
Maybe we should make sure vcpu->arch.maxphyaddr will not be greater than
the value of the host?

Yu
>
> Paolo
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
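
If such a check were added (it is not part of this series), it could be as simple as
clamping the guest's reported width to the host's; a hypothetical sketch, with the
function name invented for illustration:

    /* Hypothetical clamp, not from this series: never report a guest
     * MAXPHYADDR wider than the host's. */
    static u32 clamp_guest_phys_bits(u32 guest_phys_bits, u32 host_phys_bits)
    {
            return guest_phys_bits > host_phys_bits ? host_phys_bits
                                                    : guest_phys_bits;
    }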

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 15:23     ` Yu Zhang
@ 2017-08-24 15:50       ` Paolo Bonzini
  2017-08-24 15:38         ` Yu Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2017-08-24 15:50 UTC (permalink / raw)
  To: Yu Zhang, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

On 24/08/2017 17:23, Yu Zhang wrote:
>>>
>>>     static inline u64 rsvd_bits(int s, int e)
>>>   {
>>> +    if (e < s)
>>> +        return 0;
>>> +
>>>       return ((1ULL << (e - s + 1)) - 1) << s;
>>>   }
>> e = s - 1 is already supported; why do you need e <= s - 2?
> 
> Sorry? I do not quite understand. When will e = s - 1?

Is there any case where e < s?  I can see that MAXPHYADDR=63 gives
rsvd_bits(63, 62), but that works.

In practice, MAXPHYADDR will never even be 59, because the PKRU bits are
at bits 59..62.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread
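
The arithmetic behind that remark, as a standalone user-space sketch (not kernel
code): with e = s - 1 the mask width is zero and the existing formula already yields
0, so only e <= s - 2 would shift by a negative amount and need the new guard.

    #include <stdio.h>
    #include <stdint.h>

    /* Same formula as the kernel's rsvd_bits(), minus the proposed guard. */
    static uint64_t rsvd_bits(int s, int e)
    {
            return ((1ULL << (e - s + 1)) - 1) << s;
    }

    int main(void)
    {
            /* MAXPHYADDR = 52: bits 52..62 of CR3 are reserved. */
            printf("%llx\n", (unsigned long long)rsvd_bits(52, 62)); /* 7ff0000000000000 */
            /* MAXPHYADDR = 63: e = s - 1, empty range, already evaluates to 0. */
            printf("%llx\n", (unsigned long long)rsvd_bits(63, 62)); /* 0 */
            return 0;
    }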

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 16:27           ` Paolo Bonzini
@ 2017-08-24 16:21             ` Yu Zhang
  2017-08-24 16:51               ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2017-08-24 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro



On 8/25/2017 12:27 AM, Paolo Bonzini wrote:
> On 24/08/2017 17:38, Yu Zhang wrote:
>>>
>>> In practice, MAXPHYADDR will never be 59 even because the PKRU bits are
>>> at bits 59..62.
>> Thanks, Paolo.
>> I see. I had made an assumption that MAXPHYADDR shall not exceed the
>> physical one,
>> which is 52 I believe. But I'm not sure there's any place to check this.
>> Maybe we should make sure the vcpu->arch.maxphyaddr will not be greater
>> than the value of the host?
> That's a separate change anyway.  In any case, since currently the
> MAXPHYADDR is not validated, your change to rsvd_bits makes sense.

Thanks, Paolo.
As to this patch series, is there any change I need to make?

BTW, I have written a patch for the kvm-unit-tests access test, but the test
failed. I'm not sure whether my patch is erroneous or it's a simulator error.
I'll send out the test patch after it works. :-)

Yu
> Paolo
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 15:38         ` Yu Zhang
@ 2017-08-24 16:27           ` Paolo Bonzini
  2017-08-24 16:21             ` Yu Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2017-08-24 16:27 UTC (permalink / raw)
  To: Yu Zhang, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

On 24/08/2017 17:38, Yu Zhang wrote:
>>
>>
>> In practice, MAXPHYADDR will never be 59 even because the PKRU bits are
>> at bits 59..62.
> 
> Thanks, Paolo.
> I see. I had made an assumption that MAXPHYADDR shall not exceed the
> physical one,
> which is 52 I believe. But I'm not sure there's any place to check this.
> Maybe we should make sure the vcpu->arch.maxphyaddr will not be greater
> than the value of the host?

That's a separate change anyway.  In any case, since currently the
MAXPHYADDR is not validated, your change to rsvd_bits makes sense.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 16:21             ` Yu Zhang
@ 2017-08-24 16:51               ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2017-08-24 16:51 UTC (permalink / raw)
  To: Yu Zhang, kvm
  Cc: linux-kernel, rkrcmar, tglx, mingo, hpa, xiaoguangrong, joro

On 24/08/2017 18:21, Yu Zhang wrote:
> 
> 
> On 8/25/2017 12:27 AM, Paolo Bonzini wrote:
>> On 24/08/2017 17:38, Yu Zhang wrote:
>>>>
>>>> In practice, MAXPHYADDR will never be 59 even because the PKRU bits are
>>>> at bits 59..62.
>>> Thanks, Paolo.
>>> I see. I had made an assumption that MAXPHYADDR shall not exceed the
>>> physical one,
>>> which is 52 I believe. But I'm not sure there's any place to check this.
>>> Maybe we should make sure the vcpu->arch.maxphyaddr will not be greater
>>> than the value of the host?
>> That's a separate change anyway.  In any case, since currently the
>> MAXPHYADDR is not validated, your change to rsvd_bits makes sense.
> 
> Thanks, Paolo.
> As to this patch series, any change I need do?

No, it's fine.

> BTW,  I have written a patch for kvm-unit-test access test, but the test
> failed.
> Not sure if my patch is erroneous or due to a simulator error. I'll send
> out the
> test patch after it works.:-)

Try to send it.  I can also test it with QEMU TCG.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-08-24 12:27 ` [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width Yu Zhang
  2017-08-24 13:40   ` Paolo Bonzini
@ 2017-09-15 23:19   ` Jim Mattson
  2017-09-18  8:15     ` Yu Zhang
  1 sibling, 1 reply; 19+ messages in thread
From: Jim Mattson @ 2017-09-15 23:19 UTC (permalink / raw)
  To: Yu Zhang
  Cc: kvm list, LKML, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, xiaoguangrong,
	Joerg Roedel

On Thu, Aug 24, 2017 at 5:27 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
> Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
> reserved bits in CR3. Yet the length of reserved bits in
> guest CR3 should be based on the physical address width
> exposed to the VM. This patch changes CR3 check logic to
> calculate the reserved bits at runtime.
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/emulate.c          | 14 ++++++++++++--
>  arch/x86/kvm/mmu.h              |  3 +++
>  arch/x86/kvm/x86.c              |  8 ++++----
>  4 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6db0ed9..e716228 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -79,7 +79,6 @@
>                           | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
>                           | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
>
> -#define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL
>  #define CR3_PCID_INVD           BIT_64(63)
>  #define CR4_RESERVED_BITS                                               \
>         (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 319d91f..a89b595 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -28,6 +28,7 @@
>
>  #include "x86.h"
>  #include "tss.h"
> +#include "mmu.h"
>
>  /*
>   * Operand types
> @@ -4097,8 +4098,17 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
>                 u64 rsvd = 0;
>
>                 ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
> -               if (efer & EFER_LMA)
> -                       rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;
> +               if (efer & EFER_LMA) {
> +                       u64 maxphyaddr;
> +                       u32 eax = 0x80000008;
> +
> +                       if (ctxt->ops->get_cpuid(ctxt, &eax, NULL, NULL,
> +                                                NULL, false))

Passing NULL for the address of ecx looks problematic to me.

We have:

static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
                        u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool
check_limit)
{
        return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, check_limit);
}

And:

bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
       u32 *ecx, u32 *edx, bool check_limit)
{
u32 function = *eax, index = *ecx;
struct kvm_cpuid_entry2 *best;
bool entry_found = true;
...

Doesn't this immediately try to dereference a NULL pointer?  How much
testing have you done of this code?

> +                               maxphyaddr = eax & 0xff;
> +                       else
> +                               maxphyaddr = 36;
> +                       rsvd = rsvd_bits(maxphyaddr, 62);
> +               }
>
>                 if (new_val & rsvd)
>                         return emulate_gp(ctxt, 0);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 3ed6192..67e7ec2 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -48,6 +48,9 @@
>
>  static inline u64 rsvd_bits(int s, int e)
>  {
> +       if (e < s)
> +               return 0;
> +
>         return ((1ULL << (e - s + 1)) - 1) << s;
>  }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cc2c7e4..79f5889 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -813,10 +813,10 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>                 return 0;
>         }
>
> -       if (is_long_mode(vcpu)) {
> -               if (cr3 & CR3_L_MODE_RESERVED_BITS)
> -                       return 1;
> -       } else if (is_pae(vcpu) && is_paging(vcpu) &&
> +       if (is_long_mode(vcpu) &&
> +           (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 62)))
> +               return 1;
> +       else if (is_pae(vcpu) && is_paging(vcpu) &&
>                    !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
>                 return 1;
>
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
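
One possible shape of the fix being asked for here, sketched against the hunk above
(hypothetical; the actual follow-up patch may differ): pass the addresses of
throwaway locals rather than NULL, so kvm_cpuid() has something valid to dereference.

                    if (efer & EFER_LMA) {
                            u64 maxphyaddr;
                            u32 eax, ebx, ecx, edx;

                            eax = 0x80000008;
                            ecx = 0;
                            /* ebx/ecx/edx are dummies; kvm_cpuid() reads *ecx. */
                            if (ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx,
                                                     &edx, false))
                                    maxphyaddr = eax & 0xff;
                            else
                                    maxphyaddr = 36;
                            rsvd = rsvd_bits(maxphyaddr, 62);
                    }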

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-09-15 23:19   ` Jim Mattson
@ 2017-09-18  8:15     ` Yu Zhang
  2017-09-18  8:41       ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Yu Zhang @ 2017-09-18  8:15 UTC (permalink / raw)
  To: Jim Mattson
  Cc: kvm list, LKML, Paolo Bonzini, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, xiaoguangrong,
	Joerg Roedel



On 9/16/2017 7:19 AM, Jim Mattson wrote:
> On Thu, Aug 24, 2017 at 5:27 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
>> Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
>> reserved bits in CR3. Yet the length of reserved bits in
>> guest CR3 should be based on the physical address width
>> exposed to the VM. This patch changes CR3 check logic to
>> calculate the reserved bits at runtime.
>>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  1 -
>>   arch/x86/kvm/emulate.c          | 14 ++++++++++++--
>>   arch/x86/kvm/mmu.h              |  3 +++
>>   arch/x86/kvm/x86.c              |  8 ++++----
>>   4 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 6db0ed9..e716228 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -79,7 +79,6 @@
>>                            | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
>>                            | X86_CR0_NW | X86_CR0_CD | X86_CR0_PG))
>>
>> -#define CR3_L_MODE_RESERVED_BITS 0xFFFFFF0000000000ULL
>>   #define CR3_PCID_INVD           BIT_64(63)
>>   #define CR4_RESERVED_BITS                                               \
>>          (~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 319d91f..a89b595 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -28,6 +28,7 @@
>>
>>   #include "x86.h"
>>   #include "tss.h"
>> +#include "mmu.h"
>>
>>   /*
>>    * Operand types
>> @@ -4097,8 +4098,17 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
>>                  u64 rsvd = 0;
>>
>>                  ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
>> -               if (efer & EFER_LMA)
>> -                       rsvd = CR3_L_MODE_RESERVED_BITS & ~CR3_PCID_INVD;
>> +               if (efer & EFER_LMA) {
>> +                       u64 maxphyaddr;
>> +                       u32 eax = 0x80000008;
>> +
>> +                       if (ctxt->ops->get_cpuid(ctxt, &eax, NULL, NULL,
>> +                                                NULL, false))
> Passing NULL for the address of ecx looks problematic to me.
>
> We have:
>
> static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
>                          u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool
> check_limit)
> {
>          return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx, check_limit);
> }
>
> And:
>
> bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>         u32 *ecx, u32 *edx, bool check_limit)
> {
> u32 function = *eax, index = *ecx;
> struct kvm_cpuid_entry2 *best;
> bool entry_found = true;
> ...
>
> Doesn't this immediately try to dereference a NULL pointer?  How much
> testing have you done of this code?

Thanks Jim.
I have tested this code in a simulator and successfully booted a VM in
shadow mode. It seems this code path is not covered (though I am still
perplexed as to why it is not covered). Is there any possibility that
check_cr_write() is not triggered when emulating the CR operations?

Anyway, this should be a bug; thanks for pointing it out, and I'll send
out the fix later.

BR
Yu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-09-18  8:15     ` Yu Zhang
@ 2017-09-18  8:41       ` Paolo Bonzini
  2017-09-18  9:35         ` Yu Zhang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2017-09-18  8:41 UTC (permalink / raw)
  To: Yu Zhang, Jim Mattson
  Cc: kvm list, LKML, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, xiaoguangrong,
	Joerg Roedel

On 18/09/2017 10:15, Yu Zhang wrote:
>>
>> static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
>>                          u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool
>> check_limit)
>> {
>>          return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx,
>> check_limit);
>> }
>>
>> And:
>>
>> bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>>         u32 *ecx, u32 *edx, bool check_limit)
>> {
>> u32 function = *eax, index = *ecx;
>> struct kvm_cpuid_entry2 *best;
>> bool entry_found = true;
>> ...
>>
>> Doesn't this immediately try to dereference a NULL pointer?  How much
>> testing have you done of this code?
> 
> Thanks Jim.
> I have tested this code in a simulator to successfully boot a VM in 
> shadow mode. Seems this code is not covered(but I am now still
> perplexed why this is not covered). Any possibility that the
> check_cr_write() is not triggered when emulating the cr operations?

CR moves usually don't go through the emulator (the main exception is
emulation of invalid guest state when the processor doesn't support
unrestricted_guest=1, but even that is unlikely to happen with
EFER.LMA=1).  This explains why you didn't see the failure.

> Anyway, this should be a bug and thanks for pointing this out, and
> I'll send out the fix later.
Thanks,

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width.
  2017-09-18  8:41       ` Paolo Bonzini
@ 2017-09-18  9:35         ` Yu Zhang
  0 siblings, 0 replies; 19+ messages in thread
From: Yu Zhang @ 2017-09-18  9:35 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson
  Cc: kvm list, LKML, Radim Krčmář,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, xiaoguangrong,
	Joerg Roedel



On 9/18/2017 4:41 PM, Paolo Bonzini wrote:
> On 18/09/2017 10:15, Yu Zhang wrote:
>>> static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
>>>                           u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool
>>> check_limit)
>>> {
>>>           return kvm_cpuid(emul_to_vcpu(ctxt), eax, ebx, ecx, edx,
>>> check_limit);
>>> }
>>>
>>> And:
>>>
>>> bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>>>          u32 *ecx, u32 *edx, bool check_limit)
>>> {
>>> u32 function = *eax, index = *ecx;
>>> struct kvm_cpuid_entry2 *best;
>>> bool entry_found = true;
>>> ...
>>>
>>> Doesn't this immediately try to dereference a NULL pointer?  How much
>>> testing have you done of this code?
>> Thanks Jim.
>> I have tested this code in a simulator to successfully boot a VM in
>> shadow mode. Seems this code is not covered(but I am now still
>> perplexed why this is not covered). Any possibility that the
>> check_cr_write() is not triggered when emulating the cr operations?
> CR moves usually don't go through the emulator (the main exception is
> emulation of invalid guest state when the processor doesn't support
> unrestricted_guest=1, but even that is unlikely to happen with
> EFER.LMA=1).  This explains why you didn't see the failure.

Oh, right. It normally goes to handle_cr(). Thanks, Paolo.

Yu

>
>> Anyway, this should be a bug and thanks for pointing this out, and
>> I'll send out the fix later.
> Thanks,
>
> Paolo
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-09-18  9:35 UTC | newest]

Thread overview: 19+ messages
-- links below jump to the message on this page --
2017-08-24 12:27 [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Yu Zhang
2017-08-24 12:27 ` [PATCH v3 1/5] KVM: x86: Add return value to kvm_cpuid() Yu Zhang
2017-08-24 12:27 ` [PATCH v3 2/5] KVM: MMU: check guest CR3 reserved bits based on its physical address width Yu Zhang
2017-08-24 13:40   ` Paolo Bonzini
2017-08-24 15:23     ` Yu Zhang
2017-08-24 15:50       ` Paolo Bonzini
2017-08-24 15:38         ` Yu Zhang
2017-08-24 16:27           ` Paolo Bonzini
2017-08-24 16:21             ` Yu Zhang
2017-08-24 16:51               ` Paolo Bonzini
2017-09-15 23:19   ` Jim Mattson
2017-09-18  8:15     ` Yu Zhang
2017-09-18  8:41       ` Paolo Bonzini
2017-09-18  9:35         ` Yu Zhang
2017-08-24 12:27 ` [PATCH v3 3/5] KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL Yu Zhang
2017-08-24 12:27 ` [PATCH v3 4/5] KVM: MMU: Add 5 level EPT & Shadow page table support Yu Zhang
2017-08-24 13:42   ` Paolo Bonzini
2017-08-24 12:27 ` [PATCH v3 5/5] KVM: MMU: Expose the LA57 feature to VM Yu Zhang
2017-08-24 13:47 ` [PATCH v3 0/5] KVM: MMU: 5 level EPT/shadow support Paolo Bonzini
