* [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation
From: Maxim Levitsky @ 2020-10-01 11:29 UTC
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This is the next version of this patch series.

In V5 I adopted Sean Christopherson's suggestion to make .set_efer return
a negative error (-ENOMEM in this case), which in most cases in KVM
propagates to userspace.

I noticed, though, that the WRMSR emulation code doesn't do this and
instead injects a #GP into the guest on _any_ error.

So I fixed the WRMSR code to behave in the same way as the rest of the
KVM code: inject #GP only on a positive error value, and forward negative
errors to userspace.
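
In other words, the tail of kvm_emulate_wrmsr now looks roughly like
this (a simplified sketch of patch 2; the kvm_set_msr_user_space()
check is elided):

	r = kvm_set_msr(vcpu, ecx, data);
	if (r < 0)		/* kernel problem, e.g. -ENOMEM: report it */
		return r;
	if (r > 0) {		/* guest problem: inject #GP */
		trace_kvm_msr_write_ex(ecx, data);
		kvm_inject_gp(vcpu, 0);
		return 1;
	}
	/* r == 0: success */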

I had to adjust one WRMSR handler (xen_hvm_config) to stop it from
returning negative values so that the new WRMSR emulation behavior
doesn't break it. That patch was only compile tested.

The memory allocation failure was tested by always returning -ENOMEM
from svm_allocate_nested.
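
For example, with a temporary hack like this (not part of the series):

	/* test hack: pretend the nested state allocation always fails */
	int svm_allocate_nested(struct vcpu_svm *svm)
	{
		return -ENOMEM;
	}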

The nested allocation itself was tested by countless attempts to run
nested guests and do nested migration on both my AMD and Intel machines.
I wasn't able to break it.

Changes from V5: addressed Sean Christopherson's review feedback.
Changes from V6: rebased the code on the latest kvm/queue.

Best regards,
	Maxim Levitsky

Maxim Levitsky (4):
  KVM: x86: xen_hvm_config: cleanup return values
  KVM: x86: report negative values from wrmsr emulation to userspace
  KVM: x86: allow kvm_x86_ops.set_efer to return an error value
  KVM: nSVM: implement on demand allocation of the nested state

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/emulate.c          |  4 +--
 arch/x86/kvm/svm/nested.c       | 42 ++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          | 64 ++++++++++++++++++---------------
 arch/x86/kvm/svm/svm.h          | 10 +++++-
 arch/x86/kvm/vmx/vmx.c          |  6 ++--
 arch/x86/kvm/x86.c              | 39 ++++++++++----------
 7 files changed, 114 insertions(+), 53 deletions(-)

-- 
2.26.2




* [PATCH v7 1/4] KVM: x86: xen_hvm_config: cleanup return values
From: Maxim Levitsky @ 2020-10-01 11:29 UTC
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

Return 1 on errors that are caused by wrong guest behavior
(which will inject a #GP into the guest), and return a negative
error value on issues that are the kernel's fault (e.g. -ENOMEM).
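
With this change the handler's return values map as follows (a summary
of the diff below):

	page_num >= blob_size          -> 1         (guest error, #GP)
	memdup_user() failure          -> PTR_ERR() (negative, to userspace)
	kvm_vcpu_write_guest() failure -> 1         (guest error, #GP)
	otherwise                      -> 0         (success)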

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c4015a43cc8a7..09a0cad49af51 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2812,24 +2812,19 @@ static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
 	u32 page_num = data & ~PAGE_MASK;
 	u64 page_addr = data & PAGE_MASK;
 	u8 *page;
-	int r;
 
-	r = -E2BIG;
 	if (page_num >= blob_size)
-		goto out;
-	r = -ENOMEM;
+		return 1;
+
 	page = memdup_user(blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE);
-	if (IS_ERR(page)) {
-		r = PTR_ERR(page);
-		goto out;
+	if (IS_ERR(page))
+		return PTR_ERR(page);
+
+	if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE)) {
+		kfree(page);
+		return 1;
 	}
-	if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE))
-		goto out_free;
-	r = 0;
-out_free:
-	kfree(page);
-out:
-	return r;
+	return 0;
 }
 
 static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
-- 
2.26.2



* [PATCH v7 2/4] KVM: x86: report negative values from wrmsr emulation to userspace
From: Maxim Levitsky @ 2020-10-01 11:29 UTC
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This will allow KVM to report such errors (e.g. -ENOMEM)
to userspace.
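
In the instruction emulator path (em_wrmsr) this maps onto the emulator
return codes as follows (a summary of the emulate.c hunk below):

	r == X86EMUL_IO_NEEDED -> returned as-is (completed in userspace)
	r > 0                  -> emulate_gp(ctxt, 0)
	r < 0                  -> X86EMUL_UNHANDLEABLE
	r == 0                 -> X86EMUL_CONTINUE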

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/emulate.c | 4 ++--
 arch/x86/kvm/x86.c     | 9 ++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0cc0db500f718..0d917eb703194 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3712,10 +3712,10 @@ static int em_wrmsr(struct x86_emulate_ctxt *ctxt)
 	if (r == X86EMUL_IO_NEEDED)
 		return r;
 
-	if (r)
+	if (r > 0)
 		return emulate_gp(ctxt, 0);
 
-	return X86EMUL_CONTINUE;
+	return r < 0 ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE;
 }
 
 static int em_rdmsr(struct x86_emulate_ctxt *ctxt)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09a0cad49af51..7af04f9e20b48 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1737,13 +1737,16 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	r = kvm_set_msr(vcpu, ecx, data);
 
 	/* MSR write failed? See if we should ask user space */
-	if (r && kvm_set_msr_user_space(vcpu, ecx, data, r)) {
+	if (r && kvm_set_msr_user_space(vcpu, ecx, data, r))
 		/* Bounce to user space */
 		return 0;
-	}
+
+	/* Signal all other negative errors to userspace */
+	if (r < 0)
+		return r;
 
 	/* MSR write failed? Inject a #GP */
-	if (r) {
+	if (r > 0) {
 		trace_kvm_msr_write_ex(ecx, data);
 		kvm_inject_gp(vcpu, 0);
 		return 1;
-- 
2.26.2



* [PATCH v7 3/4] KVM: x86: allow kvm_x86_ops.set_efer to return an error value
From: Maxim Levitsky @ 2020-10-01 11:29 UTC
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This will be used to signal an error to userspace in case
the vendor code fails while handling this MSR (e.g. -ENOMEM).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/svm/svm.c          | 3 ++-
 arch/x86/kvm/svm/svm.h          | 2 +-
 arch/x86/kvm/vmx/vmx.c          | 6 ++++--
 arch/x86/kvm/x86.c              | 7 ++++++-
 5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d0f77235da923..99657051ee8a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1086,7 +1086,7 @@ struct kvm_x86_ops {
 	void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
 	void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
 	int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
-	void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
+	int (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
 	void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4f401fc6a05d9..57e0f27ff7d20 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -263,7 +263,7 @@ static int get_max_npt_level(void)
 #endif
 }
 
-void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	vcpu->arch.efer = efer;
@@ -283,6 +283,7 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 
 	svm->vmcb->save.efer = efer | EFER_SVME;
 	vmcb_mark_dirty(svm->vmcb, VMCB_CR);
+	return 0;
 }
 
 static int is_external_interrupt(u32 info)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a7f997459b870..e7af21e6fe1e0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -350,7 +350,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
 #define MSR_INVALID				0xffffffffU
 
 u32 svm_msrpm_offset(u32 msr);
-void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
+int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 void svm_flush_tlb(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4551a7e80ebc3..2115d35094bc5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2815,13 +2815,14 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	kvm_mmu_reset_context(vcpu);
 }
 
-void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
+int vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	struct vmx_uret_msr *msr = vmx_find_uret_msr(vmx, MSR_EFER);
 
+	/* Nothing to do if hardware doesn't support EFER. */
 	if (!msr)
-		return;
+		return 0;
 
 	vcpu->arch.efer = efer;
 	if (efer & EFER_LMA) {
@@ -2833,6 +2834,7 @@ void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 		msr->data = efer & ~EFER_LME;
 	}
 	setup_msrs(vmx);
+	return 0;
 }
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7af04f9e20b48..80e3dcd542382 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1457,6 +1457,7 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u64 old_efer = vcpu->arch.efer;
 	u64 efer = msr_info->data;
+	int r;
 
 	if (efer & efer_reserved_bits)
 		return 1;
@@ -1473,7 +1474,11 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	efer &= ~EFER_LMA;
 	efer |= vcpu->arch.efer & EFER_LMA;
 
-	kvm_x86_ops.set_efer(vcpu, efer);
+	r = kvm_x86_ops.set_efer(vcpu, efer);
+	if (r) {
+		WARN_ON(r > 0);
+		return r;
+	}
 
 	/* Update reserved bits */
 	if ((efer ^ old_efer) & EFER_NX)
-- 
2.26.2



* [PATCH v7 4/4] KVM: nSVM: implement on demand allocation of the nested state
From: Maxim Levitsky @ 2020-10-01 11:29 UTC
  To: kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, linux-kernel,
	Vitaly Kuznetsov, Joerg Roedel, Maxim Levitsky

This way we don't waste memory on VMs which don't use nested
virtualization, even when the host has enabled it for them.
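
Concretely, the allocation is now driven by the guest toggling
EFER.SVME (a sketch of the core of the svm.c hunk below):

	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
		if (!(efer & EFER_SVME)) {
			svm_leave_nested(svm);
			svm_set_gif(svm, true);

			/* keep the state across SMM, free it otherwise */
			if (!is_smm(&svm->vcpu))
				svm_free_nested(svm);
		} else {
			int ret = svm_allocate_nested(svm);

			if (ret) {
				vcpu->arch.efer = old_efer;
				return ret;	/* e.g. -ENOMEM, to userspace */
			}
		}
	}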

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 42 +++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c    | 61 +++++++++++++++++++++------------------
 arch/x86/kvm/svm/svm.h    |  8 +++++
 3 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ba50ff6e35c7c..9e4c226dbf7d9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -481,6 +481,9 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 
 	vmcb12 = map.hva;
 
+	if (WARN_ON_ONCE(!svm->nested.initialized))
+		return -EINVAL;
+
 	if (!nested_vmcb_checks(svm, vmcb12)) {
 		vmcb12->control.exit_code    = SVM_EXIT_ERR;
 		vmcb12->control.exit_code_hi = 0;
@@ -698,6 +701,45 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	return 0;
 }
 
+int svm_allocate_nested(struct vcpu_svm *svm)
+{
+	struct page *hsave_page;
+
+	if (svm->nested.initialized)
+		return 0;
+
+	hsave_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!hsave_page)
+		return -ENOMEM;
+	svm->nested.hsave = page_address(hsave_page);
+
+	svm->nested.msrpm = svm_vcpu_alloc_msrpm();
+	if (!svm->nested.msrpm)
+		goto err_free_hsave;
+	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);
+
+	svm->nested.initialized = true;
+	return 0;
+
+err_free_hsave:
+	__free_page(hsave_page);
+	return -ENOMEM;
+}
+
+void svm_free_nested(struct vcpu_svm *svm)
+{
+	if (!svm->nested.initialized)
+		return;
+
+	svm_vcpu_free_msrpm(svm->nested.msrpm);
+	svm->nested.msrpm = NULL;
+
+	__free_page(virt_to_page(svm->nested.hsave));
+	svm->nested.hsave = NULL;
+
+	svm->nested.initialized = false;
+}
+
 /*
  * Forcibly leave nested mode in order to be able to reset the VCPU later on.
  */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 57e0f27ff7d20..dc4fe579d460e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -266,6 +266,7 @@ static int get_max_npt_level(void)
 int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	u64 old_efer = vcpu->arch.efer;
 	vcpu->arch.efer = efer;
 
 	if (!npt_enabled) {
@@ -276,9 +277,27 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 			efer &= ~EFER_LME;
 	}
 
-	if (!(efer & EFER_SVME)) {
-		svm_leave_nested(svm);
-		svm_set_gif(svm, true);
+	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
+		if (!(efer & EFER_SVME)) {
+			svm_leave_nested(svm);
+			svm_set_gif(svm, true);
+
+			/*
+			 * Free the nested guest state, unless we are in SMM.
+			 * In this case we will return to the nested guest
+			 * as soon as we leave SMM.
+			 */
+			if (!is_smm(&svm->vcpu))
+				svm_free_nested(svm);
+
+		} else {
+			int ret = svm_allocate_nested(svm);
+
+			if (ret) {
+				vcpu->arch.efer = old_efer;
+				return ret;
+			}
+		}
 	}
 
 	svm->vmcb->save.efer = efer | EFER_SVME;
@@ -650,7 +669,7 @@ static void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
 	set_msr_interception_bitmap(vcpu, msrpm, msr, read, write);
 }
 
-static u32 *svm_vcpu_alloc_msrpm(void)
+u32 *svm_vcpu_alloc_msrpm(void)
 {
 	struct page *pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
 	u32 *msrpm;
@@ -664,7 +683,7 @@ static u32 *svm_vcpu_alloc_msrpm(void)
 	return msrpm;
 }
 
-static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
+void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
 {
 	int i;
 
@@ -675,7 +694,8 @@ static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
 	}
 }
 
-static void svm_vcpu_free_msrpm(u32 *msrpm)
+
+void svm_vcpu_free_msrpm(u32 *msrpm)
 {
 	__free_pages(virt_to_page(msrpm), MSRPM_ALLOC_ORDER);
 }
@@ -1268,7 +1288,6 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm;
 	struct page *vmcb_page;
-	struct page *hsave_page;
 	int err;
 
 	BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
@@ -1279,13 +1298,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	if (!vmcb_page)
 		goto out;
 
-	hsave_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-	if (!hsave_page)
-		goto error_free_vmcb_page;
-
 	err = avic_init_vcpu(svm);
 	if (err)
-		goto error_free_hsave_page;
+		goto error_free_vmcb_page;
 
 	/* We initialize this flag to true to make sure that the is_running
 	 * bit would be set the first time the vcpu is loaded.
@@ -1293,21 +1308,12 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	if (irqchip_in_kernel(vcpu->kvm) && kvm_apicv_activated(vcpu->kvm))
 		svm->avic_is_running = true;
 
-	svm->nested.hsave = page_address(hsave_page);
-
 	svm->msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->msrpm)
-		goto error_free_hsave_page;
+		goto error_free_vmcb_page;
 
 	svm_vcpu_init_msrpm(vcpu, svm->msrpm);
 
-	svm->nested.msrpm = svm_vcpu_alloc_msrpm();
-	if (!svm->nested.msrpm)
-		goto error_free_msrpm;
-
-	/* We only need the L1 pass-through MSR state, so leave vcpu as NULL */
-	svm_vcpu_init_msrpm(vcpu, svm->nested.msrpm);
-
 	svm->vmcb = page_address(vmcb_page);
 	svm->vmcb_pa = __sme_set(page_to_pfn(vmcb_page) << PAGE_SHIFT);
 	svm->asid_generation = 0;
@@ -1318,10 +1324,6 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 
 	return 0;
 
-error_free_msrpm:
-	svm_vcpu_free_msrpm(svm->msrpm);
-error_free_hsave_page:
-	__free_page(hsave_page);
 error_free_vmcb_page:
 	__free_page(vmcb_page);
 out:
@@ -1347,10 +1349,10 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	 */
 	svm_clear_current_vmcb(svm->vmcb);
 
+	svm_free_nested(svm);
+
 	__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
-	__free_page(virt_to_page(svm->nested.hsave));
-	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@@ -4038,6 +4040,9 @@ static int svm_pre_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 					 gpa_to_gfn(vmcb12_gpa), &map) == -EINVAL)
 				return 1;
 
+			if (svm_allocate_nested(svm))
+				return 1;
+
 			ret = enter_svm_guest_mode(svm, vmcb12_gpa, map.hva);
 			kvm_vcpu_unmap(&svm->vcpu, &map, true);
 		}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e7af21e6fe1e0..1d853fe4c778b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -97,6 +97,8 @@ struct svm_nested_state {
 
 	/* cache for control fields of the guest */
 	struct vmcb_control_area ctl;
+
+	bool initialized;
 };
 
 struct vcpu_svm {
@@ -350,6 +352,10 @@ static inline bool gif_set(struct vcpu_svm *svm)
 #define MSR_INVALID				0xffffffffU
 
 u32 svm_msrpm_offset(u32 msr);
+u32 *svm_vcpu_alloc_msrpm(void);
+void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
+void svm_vcpu_free_msrpm(u32 *msrpm);
+
 int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
@@ -391,6 +397,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
 int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			 struct vmcb *nested_vmcb);
 void svm_leave_nested(struct vcpu_svm *svm);
+void svm_free_nested(struct vcpu_svm *svm);
+int svm_allocate_nested(struct vcpu_svm *svm);
 int nested_svm_vmrun(struct vcpu_svm *svm);
 void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
 int nested_svm_vmexit(struct vcpu_svm *svm);
-- 
2.26.2



* Re: [PATCH v7 0/4] KVM: nSVM: ondemand nested state allocation
From: Paolo Bonzini @ 2020-10-19 15:52 UTC
  To: Maxim Levitsky, kvm
  Cc: Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Wanpeng Li, Jim Mattson, Sean Christopherson, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Vitaly Kuznetsov,
	Joerg Roedel

On 01/10/20 13:29, Maxim Levitsky wrote:
> This is the next version of this patch series.
> 
> In V5 I adopted Sean Christopherson's suggestion to make .set_efer return
> a negative error (-ENOMEM in this case), which in most cases in KVM
> propagates to userspace.
> 
> I noticed, though, that the WRMSR emulation code doesn't do this and
> instead injects a #GP into the guest on _any_ error.
> 
> So I fixed the WRMSR code to behave in the same way as the rest of the
> KVM code: inject #GP only on a positive error value, and forward negative
> errors to userspace.
> 
> I had to adjust one WRMSR handler (xen_hvm_config) to stop it from
> returning negative values so that the new WRMSR emulation behavior
> doesn't break it. That patch was only compile tested.
> 
> The memory allocation failure was tested by always returning -ENOMEM
> from svm_allocate_nested.
> 
> The nested allocation itself was tested by countless attempts to run
> nested guests and do nested migration on both my AMD and Intel machines.
> I wasn't able to break it.
> 
> Changes from V5: addressed Sean Christopherson's review feedback.
> Changes from V6: rebased the code on the latest kvm/queue.
> 
> Best regards,
> 	Maxim Levitsky
> 
> Maxim Levitsky (4):
>   KVM: x86: xen_hvm_config: cleanup return values
>   KVM: x86: report negative values from wrmsr emulation to userspace
>   KVM: x86: allow kvm_x86_ops.set_efer to return an error value
>   KVM: nSVM: implement on demand allocation of the nested state
> 
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/emulate.c          |  4 +--
>  arch/x86/kvm/svm/nested.c       | 42 ++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c          | 64 ++++++++++++++++++---------------
>  arch/x86/kvm/svm/svm.h          | 10 +++++-
>  arch/x86/kvm/vmx/vmx.c          |  6 ++--
>  arch/x86/kvm/x86.c              | 39 ++++++++++----------
>  7 files changed, 114 insertions(+), 53 deletions(-)
> 

Queued, thanks.

Paolo

