kvm.vger.kernel.org archive mirror
* [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation
@ 2020-10-01 11:29 Maxim Levitsky
  2020-10-01 11:29 ` [PATCH v7 1/4] KVM: x86: xen_hvm_config: cleanup return values Maxim Levitsky
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Maxim Levitsky @ 2020-10-01 11:29 UTC (permalink / raw)
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This is the next version of this patch series.

In V5 I adopted Sean Christopherson's suggestion to make .set_efer return
a negative error (-ENOMEM in this case), which in most cases in KVM
propagates to userspace.
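
For context, such a negative value propagates all the way out of the KVM_RUN
ioctl, so a VMM would observe it roughly as in the snippet below. This is
illustrative only; run_vcpu and vcpu_fd are hypothetical names for an
already-created vCPU file descriptor:

#include <errno.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative: how a VMM sees an error that KVM propagated. */
static int run_vcpu(int vcpu_fd)
{
	if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
		/* e.g. errno == ENOMEM if nested state allocation failed */
		perror("KVM_RUN");
		return -errno;
	}
	return 0;
}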

I noticed though that the wrmsr emulation code doesn't do this; instead it
injects a #GP into the guest on _any_ error.

So I fixed the wrmsr code to behave like the rest of KVM: inject a #GP only
on a positive error value, and forward negative errors to userspace.
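
To make the convention concrete, a hypothetical MSR-write handler following
it would look like the sketch below (example_msr_write and
EXAMPLE_VALID_BITS are made-up names, not anything from this series):

/* Sketch of the convention: 0 = success, positive = guest error
 * (the caller injects #GP), negative = kernel error (goes to userspace). */
#define EXAMPLE_VALID_BITS	0xffULL

static int example_msr_write(struct kvm_vcpu *vcpu, u64 data)
{
	void *buf;

	if (data & ~EXAMPLE_VALID_BITS)
		return 1;		/* guest's fault -> #GP */

	buf = kzalloc(PAGE_SIZE, GFP_KERNEL_ACCOUNT);
	if (!buf)
		return -ENOMEM;		/* kernel's fault -> userspace */

	kfree(buf);
	return 0;
}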

I had to adjust one wrmsr handler (xen_hvm_config) to stop it from returning
negative values so that the new WRMSR emulation behavior doesn't break it.
That patch was only compile tested.

The memory allocation failure was tested by always returning -ENOMEM
from svm_allocate_nested.
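
Concretely, the fault injection amounted to a hack along these lines on top
of patch 4 (the early-return version shown here is illustrative, obviously
not part of the series):

int svm_allocate_nested(struct vcpu_svm *svm)
{
	/* Testing hack: force every allocation to fail so that the
	 * -ENOMEM path is exercised all the way to userspace. */
	return -ENOMEM;
}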

The nested allocation itself was tested by countless attempts to run
nested guests and to do nested migration, on both my AMD and Intel machines.
I wasn't able to break it.

Changes from V5: addressed Sean Christopherson's review feedback.
Changes from V6: rebased the code on latest kvm/queue

Best regards,
	Maxim Levitsky

Maxim Levitsky (4):
  KVM: x86: xen_hvm_config: cleanup return values
  KVM: x86: report negative values from wrmsr emulation to userspace
  KVM: x86: allow kvm_x86_ops.set_efer to return an error value
  KVM: nSVM: implement on demand allocation of the nested state

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/emulate.c          |  4 +--
 arch/x86/kvm/svm/nested.c       | 42 ++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          | 64 ++++++++++++++++++---------------
 arch/x86/kvm/svm/svm.h          | 10 +++++-
 arch/x86/kvm/vmx/vmx.c          |  6 ++--
 arch/x86/kvm/x86.c              | 41 ++++++++++++----------
 7 files changed, 116 insertions(+), 53 deletions(-)

-- 
2.26.2




* [PATCH v7 1/4] KVM: x86: xen_hvm_config: cleanup return values
  2020-10-01 11:29 [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation Maxim Levitsky
@ 2020-10-01 11:29 ` Maxim Levitsky
  2020-10-01 11:29 ` [PATCH v7 2/4] KVM: x86: report negative values from wrmsr emulation to userspace Maxim Levitsky
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Maxim Levitsky @ 2020-10-01 11:29 UTC (permalink / raw)
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

Return 1 on errors that are caused by wrong guest behavior (which will
inject a #GP into the guest), and return a negative error value on issues
that are the kernel's fault (e.g. -ENOMEM).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c4015a43cc8a7..09a0cad49af51 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2812,24 +2812,21 @@ static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
 	u32 page_num = data & ~PAGE_MASK;
 	u64 page_addr = data & PAGE_MASK;
 	u8 *page;
-	int r;
 
-	r = -E2BIG;
 	if (page_num >= blob_size)
-		goto out;
-	r = -ENOMEM;
+		return 1;
+
 	page = memdup_user(blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE);
-	if (IS_ERR(page)) {
-		r = PTR_ERR(page);
-		goto out;
+	if (IS_ERR(page))
+		return PTR_ERR(page);
+
+	if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE)) {
+		kfree(page);
+		return 1;
 	}
-	if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE))
-		goto out_free;
-	r = 0;
-out_free:
-	kfree(page);
-out:
-	return r;
+
+	kfree(page);
+	return 0;
 }
 
 static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
-- 
2.26.2



* [PATCH v7 2/4] KVM: x86: report negative values from wrmsr emulation to userspace
  2020-10-01 11:29 [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation Maxim Levitsky
  2020-10-01 11:29 ` [PATCH v7 1/4] KVM: x86: xen_hvm_config: cleanup return values Maxim Levitsky
@ 2020-10-01 11:29 ` Maxim Levitsky
  2020-10-01 11:29 ` [PATCH v7 3/4] KVM: x86: allow kvm_x86_ops.set_efer to return an error value Maxim Levitsky
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Maxim Levitsky @ 2020-10-01 11:29 UTC (permalink / raw)
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This will allow KVM to report such errors (e.g. -ENOMEM)
to userspace.
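
In the emulator this maps onto the X86EMUL return codes roughly as follows
(a summary of the emulate.c hunk below):

	r == X86EMUL_IO_NEEDED  ->  passed through (exit to userspace, unchanged)
	r > 0                   ->  emulate_gp(ctxt, 0)
	r < 0                   ->  X86EMUL_UNHANDLEABLE
	r == 0                  ->  X86EMUL_CONTINUE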

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/emulate.c | 4 ++--
 arch/x86/kvm/x86.c     | 9 ++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0cc0db500f718..0d917eb703194 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3712,10 +3712,10 @@ static int em_wrmsr(struct x86_emulate_ctxt *ctxt)
 	if (r == X86EMUL_IO_NEEDED)
 		return r;
 
-	if (r)
+	if (r > 0)
 		return emulate_gp(ctxt, 0);
 
-	return X86EMUL_CONTINUE;
+	return r < 0 ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE;
 }
 
 static int em_rdmsr(struct x86_emulate_ctxt *ctxt)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09a0cad49af51..7af04f9e20b48 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1737,13 +1737,16 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	r = kvm_set_msr(vcpu, ecx, data);
 
 	/* MSR write failed? See if we should ask user space */
-	if (r && kvm_set_msr_user_space(vcpu, ecx, data, r)) {
+	if (r && kvm_set_msr_user_space(vcpu, ecx, data, r))
 		/* Bounce to user space */
 		return 0;
-	}
+
+	/* Signal all other negative errors to userspace */
+	if (r < 0)
+		return r;
 
 	/* MSR write failed? Inject a #GP */
-	if (r) {
+	if (r > 0) {
 		trace_kvm_msr_write_ex(ecx, data);
 		kvm_inject_gp(vcpu, 0);
 		return 1;
-- 
2.26.2



* [PATCH v7 3/4] KVM: x86: allow kvm_x86_ops.set_efer to return an error value
  2020-10-01 11:29 [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation Maxim Levitsky
  2020-10-01 11:29 ` [PATCH v7 1/4] KVM: x86: xen_hvm_config: cleanup return values Maxim Levitsky
  2020-10-01 11:29 ` [PATCH v7 2/4] KVM: x86: report negative values from wrmsr emulation to userspace Maxim Levitsky
@ 2020-10-01 11:29 ` Maxim Levitsky
  2020-10-01 11:29 ` [PATCH v7 4/4] KVM: nSVM: implement on demand allocation of the nested state Maxim Levitsky
  2020-10-19 15:52 ` [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation Paolo Bonzini
  4 siblings, 0 replies; 6+ messages in thread
From: Maxim Levitsky @ 2020-10-01 11:29 UTC (permalink / raw)
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This will be used to signal an error to userspace in case the vendor
code fails while handling this MSR (e.g. -ENOMEM).
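
In outline, the error path this enables together with the previous two
patches (a summary of the hunks, not additional code):

	svm_set_efer() / vmx_set_efer() return -ENOMEM
	  -> set_efer() in x86.c forwards the negative value
	  -> kvm_set_msr() returns it to kvm_emulate_wrmsr() / em_wrmsr()
	  -> the error reaches userspace instead of injecting a #GP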

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/svm/svm.c          | 3 ++-
 arch/x86/kvm/svm/svm.h          | 2 +-
 arch/x86/kvm/vmx/vmx.c          | 6 ++++--
 arch/x86/kvm/x86.c              | 7 ++++++-
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d0f77235da923..99657051ee8a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1086,7 +1086,7 @@ struct kvm_x86_ops {
 	void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
 	void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
 	int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
-	void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
+	int (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
 	void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4f401fc6a05d9..57e0f27ff7d20 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -263,7 +263,7 @@ static int get_max_npt_level(void)
 #endif
 }
 
-void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	vcpu->arch.efer = efer;
@@ -283,6 +283,7 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
 	svm->vmcb->save.efer = efer | EFER_SVME;
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
+	return 0;
 }
 
 static int is_external_interrupt(u32 info)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a7f997459b870..e7af21e6fe1e0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -350,7 +350,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
 #define MSR_INVALID				0xffffffffU
 
 u32 svm_msrpm_offset(u32 msr);
-void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
+int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 void svm_flush_tlb(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4551a7e80ebc3..2115d35094bc5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2815,13 +2815,14 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	kvm_mmu_reset_context(vcpu);
 }
 
-void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmx_uret_msr *msr = vmx_find_uret_msr(vmx, MSR_EFER);
 
+	/* Nothing to do if hardware doesn't support EFER. */
 	if (!msr)
-		return;
+		return 0;
 
 	vcpu->arch.efer = efer;
 	if (efer & EFER_LMA) {
@@ -2833,6 +2834,7 @@ void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 		msr->data = efer & ~EFER_LME;
 	}
 	setup_msrs(vmx);
+	return 0;
 }
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7af04f9e20b48..80e3dcd542382 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1457,6 +1457,7 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u64 old_efer = vcpu->arch.efer;
 	u64 efer = msr_info->data;
+	int r;
 
 	if (efer & efer_reserved_bits)
 		return 1;
@@ -1473,7 +1474,11 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	efer &= ~EFER_LMA;
 	efer |= vcpu->arch.efer & EFER_LMA;
 
-	kvm_x86_ops.set_efer(vcpu, efer);
+	r = kvm_x86_ops.set_efer(vcpu, efer);
+	if (r) {
+		WARN_ON(r > 0);
+		return r;
+	}
 
 	/* Update reserved bits */
 	if ((efer ^ old_efer) & EFER_NX)
-- 
2.26.2



* [PATCH v7 4/4] KVM: nSVM: implement on demand allocation of the nested state
  2020-10-01 11:29 [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation Maxim Levitsky
                   ` (2 preceding siblings ...)
  2020-10-01 11:29 ` [PATCH v7 3/4] KVM: x86: allow kvm_x86_ops.set_efer to return an error value Maxim Levitsky
@ 2020-10-01 11:29 ` Maxim Levitsky
  2020-10-19 15:52 ` [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation Paolo Bonzini
  4 siblings, 0 replies; 6+ messages in thread
From: Maxim Levitsky @ 2020-10-01 11:29 UTC (permalink / raw)
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This way we don't waste memory on VMs which don't use nested
virtualization, even when the host has enabled it for them.
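
In outline, the resulting lifecycle of the nested state (a simplified
summary of the diff below):

	guest sets EFER.SVME    -> svm_set_efer()  -> svm_allocate_nested()
	guest clears EFER.SVME  -> svm_set_efer()  -> svm_free_nested()
	                           (skipped in SMM; svm_pre_leave_smm()
	                            allocates again when returning to a
	                            nested guest)
	vCPU is destroyed       -> svm_free_vcpu() -> svm_free_nested()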

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 42 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c    | 61 +++++++++++++++++++++------------------
 arch/x86/kvm/svm/svm.h    |  8 +++++
 3 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ba50ff6e35c7c..9e4c226dbf7d9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -481,6 +481,9 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 
 	vmcb12 = map.hva;
 
+	if (WARN_ON_ONCE(!svm->nested.initialized))
+		return -EINVAL;
+
 	if (!nested_vmcb_checks(svm, vmcb12)) {
 		vmcb12->control.exit_code    = SVM_EXIT_ERR;
 		vmcb12->control.exit_code_hi = 0;
@@ -698,6 +701,45 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	return 0;
 }
 
+int svm_allocate_nested(struct vcpu_svm *svm)
+{
+	struct page *hsave_page;
+
+	if (svm->nested.initialized)
+		return 0;
+
+	hsave_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!hsave_page)
+		return -ENOMEM;
+	svm->nested.hsave = page_address(hsave_page);
+
+	svm->nested.msrpm = svm_vcpu_alloc_msrpm();
+	if (!svm->nested.msrpm)
+		goto err_free_hsave;
+	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);
+
+	svm->nested.initialized = true;
+	return 0;
+
+err_free_hsave:
+	__free_page(hsave_page);
+	return -ENOMEM;
+}
+
+void svm_free_nested(struct vcpu_svm *svm)
+{
+	if (!svm->nested.initialized)
+		return;
+
+	svm_vcpu_free_msrpm(svm->nested.msrpm);
+	svm->nested.msrpm = NULL;
+
+	__free_page(virt_to_page(svm->nested.hsave));
+	svm->nested.hsave = NULL;
+
+	svm->nested.initialized = false;
+}
+
 /*
  * Forcibly leave nested mode in order to be able to reset the VCPU later on.
  */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 57e0f27ff7d20..dc4fe579d460e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -266,6 +266,7 @@ static int get_max_npt_level(void)
 int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	u64 old_efer = vcpu->arch.efer;
 	vcpu->arch.efer = efer;
 
 	if (!npt_enabled) {
@@ -276,9 +277,27 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 			efer &= ~EFER_LME;
 	}
 
-	if (!(efer & EFER_SVME)) {
-		svm_leave_nested(svm);
-		svm_set_gif(svm, true);
+	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
+		if (!(efer & EFER_SVME)) {
+			svm_leave_nested(svm);
+			svm_set_gif(svm, true);
+
+			/*
+			 * Free the nested guest state, unless we are in SMM.
+			 * In this case we will return to the nested guest
+			 * as soon as we leave SMM.
+			 */
+			if (!is_smm(&svm->vcpu))
+				svm_free_nested(svm);
+
+		} else {
+			int ret = svm_allocate_nested(svm);
+
+			if (ret) {
+				vcpu->arch.efer = old_efer;
+				return ret;
+			}
+		}
 	}
 
 	svm->vmcb->save.efer = efer | EFER_SVME;
@@ -650,7 +669,7 @@ static void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
 	set_msr_interception_bitmap(vcpu, msrpm, msr, read, write);
 }
 
-static u32 *svm_vcpu_alloc_msrpm(void)
+u32 *svm_vcpu_alloc_msrpm(void)
 {
 	struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
 	u32 *msrpm;
@@ -664,7 +683,7 @@ static u32 *svm_vcpu_alloc_msrpm(void)
 	return msrpm;
 }
 
-static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
+void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
 {
 	int i;
 
@@ -675,7 +694,8 @@ static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
 	}
 }
 
-static void svm_vcpu_free_msrpm(u32 *msrpm)
+
+void svm_vcpu_free_msrpm(u32 *msrpm)
 {
 	__free_pages(virt_to_page(msrpm), MSRPM_ALLOC_ORDER);
 }
@@ -1268,7 +1288,6 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm;
 	struct page *vmcb_page;
-	struct page *hsave_page;
 	int err;
 
 	BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
@@ -1279,13 +1298,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	if (!vmcb_page)
 		goto out;
 
-	hsave_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-	if (!hsave_page)
-		goto error_free_vmcb_page;
-
 	err = avic_init_vcpu(svm);
 	if (err)
-		goto error_free_hsave_page;
+		goto error_free_vmcb_page;
 
 	/* We initialize this flag to true to make sure that the is_running
 	 * bit would be set the first time the vcpu is loaded.
@@ -1293,21 +1308,12 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	if (irqchip_in_kernel(vcpu->kvm) && kvm_apicv_activated(vcpu->kvm))
 		svm->avic_is_running = true;
 
-	svm->nested.hsave = page_address(hsave_page);
-
 	svm->msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->msrpm)
-		goto error_free_hsave_page;
+		goto error_free_vmcb_page;
 
 	svm_vcpu_init_msrpm(vcpu, svm->msrpm);
 
-	svm->nested.msrpm = svm_vcpu_alloc_msrpm();
-	if (!svm->nested.msrpm)
-		goto error_free_msrpm;
-
-	/* We only need the L1 pass-through MSR state, so leave vcpu as NULL */
-	svm_vcpu_init_msrpm(vcpu, svm->nested.msrpm);
-
 	svm->vmcb = page_address(vmcb_page);
 	svm->vmcb_pa = __sme_set(page_to_pfn(vmcb_page) << PAGE_SHIFT);
 	svm->asid_generation = 0;
@@ -1318,10 +1324,6 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 
 	return 0;
 
-error_free_msrpm:
-	svm_vcpu_free_msrpm(svm->msrpm);
-error_free_hsave_page:
-	__free_page(hsave_page);
 error_free_vmcb_page:
 	__free_page(vmcb_page);
 out:
@@ -1347,10 +1349,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	 */
 	svm_clear_current_vmcb(svm->vmcb);
 
+	svm_free_nested(svm);
+
 	__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
-	__free_page(virt_to_page(svm->nested.hsave));
-	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@@ -4038,6 +4040,9 @@ static int svm_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 					 gpa_to_gfn(vmcb12_gpa), &map) == -EINVAL)
 				return 1;
 
+			if (svm_allocate_nested(svm))
+				return 1;
+
 			ret = enter_svm_guest_mode(svm, vmcb12_gpa, map.hva);
 			kvm_vcpu_unmap(&svm->vcpu, &map, true);
 		}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e7af21e6fe1e0..1d853fe4c778b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -97,6 +97,8 @@ struct svm_nested_state {
 
 	/* cache for control fields of the guest */
 	struct vmcb_control_area ctl;
+
+	bool initialized;
 };
 
 struct vcpu_svm {
@@ -350,6 +352,10 @@ static inline bool gif_set(struct vcpu_svm *svm)
 #define MSR_INVALID				0xffffffffU
 
 u32 svm_msrpm_offset(u32 msr);
+u32 *svm_vcpu_alloc_msrpm(void);
+void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
+void svm_vcpu_free_msrpm(u32 *msrpm);
+
 int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
@@ -391,6 +397,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			 struct vmcb *nested_vmcb);
 void svm_leave_nested(struct vcpu_svm *svm);
+void svm_free_nested(struct vcpu_svm *svm);
+int svm_allocate_nested(struct vcpu_svm *svm);
 int nested_svm_vmrun(struct vcpu_svm *svm);
 void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
 int nested_svm_vmexit(struct vcpu_svm *svm);
-- 
2.26.2



* Re: [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation
  2020-10-01 11:29 [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation Maxim Levitsky
                   ` (3 preceding siblings ...)
  2020-10-01 11:29 ` [PATCH v7 4/4] KVM: nSVM: implement on demand allocation of the nested state Maxim Levitsky
@ 2020-10-19 15:52 ` Paolo Bonzini
  4 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2020-10-19 15:52 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Vitaly Kuznetsov,
	Joerg Roedel

On 01/10/20 13:29, Maxim Levitsky wrote:
> This is the next version of this patch series.
> 
> In V5 I adopted Sean Christopherson's suggestion to make .set_efer return
> a negative error (-ENOMEM in this case), which in most cases in KVM
> propagates to userspace.
> 
> I noticed though that the wrmsr emulation code doesn't do this; instead it
> injects a #GP into the guest on _any_ error.
> 
> So I fixed the wrmsr code to behave like the rest of KVM: inject a #GP only
> on a positive error value, and forward negative errors to userspace.
> 
> I had to adjust one wrmsr handler (xen_hvm_config) to stop it from returning
> negative values so that the new WRMSR emulation behavior doesn't break it.
> That patch was only compile tested.
> 
> The memory allocation failure was tested by always returning -ENOMEM
> from svm_allocate_nested.
> 
> The nested allocation itself was tested by countless attempts to run
> nested guests and to do nested migration, on both my AMD and Intel machines.
> I wasn't able to break it.
> 
> Changes from V5: addressed Sean Christopherson's review feedback.
> Changes from V6: rebased the code on latest kvm/queue
> 
> Best regards,
> 	Maxim Levitsky
> 
> Maxim Levitsky (4):
>   KVM: x86: xen_hvm_config: cleanup return values
>   KVM: x86: report negative values from wrmsr emulation to userspace
>   KVM: x86: allow kvm_x86_ops.set_efer to return an error value
>   KVM: nSVM: implement on demand allocation of the nested state
> 
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/emulate.c          |  4 +--
>  arch/x86/kvm/svm/nested.c       | 42 ++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c          | 64 ++++++++++++++++++---------------
>  arch/x86/kvm/svm/svm.h          | 10 +++++-
>  arch/x86/kvm/vmx/vmx.c          |  6 ++--
>  arch/x86/kvm/x86.c              | 41 ++++++++++++----------
>  7 files changed, 116 insertions(+), 53 deletions(-)
> 

Queued, thanks.

Paolo

