* [Patch 0/4] Defer page pinning for SEV guests until guest pages touched
@ 2020-07-24 23:54 eric van tassell
  2020-07-24 23:54 ` [Patch 1/4] KVM:MMU: Introduce the set_spte_notify() callback eric van tassell
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: eric van tassell @ 2020-07-24 23:54 UTC (permalink / raw)
  To: kvm
  Cc: bp, hpa, mingo, jmattson, joro, pbonzini, sean.j.christopherson,
	tglx, vkuznets, wanpengli, x86, evantass

Overview
========
	Defer pinning of the guest's pages until nested page faults occur
	to improve startup time and reduce memory pressure for SEV guests.

	Cease paying the computational cost of pinning all pages when an
	encrypted region is registered, before it is known whether they will
	be accessed.

	Cease creating the memory pressure caused by pinning all pages when an
	encrypted region is registered, before it is known whether they will
	be accessed.

Timing Results
==============
All timings were done by hand with an Android stopwatch app.

SEV guest size(GiB)  	     |  4 |  8 | 16 | 32 | 60 |
without patch series(sec)    |  2 |  3 |  4 |  8 | 14 |
with patch series (sec)      |  1 |  1 |  1 |  1 |  1 |

Applies To:
===========
	This patch series applies on top of this commit from the <next> branch of
	the kvm tree:
	    c34b26b98cac   Tianjia Zhang : KVM: MIPS: clean up redundant 'kvm_run' parameters

eric van tassell (4):
  KVM:MMU: Introduce the set_spte_notify() callback
  KVM:SVM: Introduce set_spte_notify support
  KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page()
  KVM:SVM: Remove struct enc_region and associated pinned page tracking.

 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/mmu/mmu.c          |  31 ++++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  27 ++--
 arch/x86/kvm/svm/sev.c          | 232 ++++++++++++++++----------------
 arch/x86/kvm/svm/svm.c          |   2 +
 arch/x86/kvm/svm/svm.h          |   4 +-
 6 files changed, 169 insertions(+), 130 deletions(-)

-- 
2.17.1



* [Patch 1/4] KVM:MMU: Introduce the set_spte_notify() callback
  2020-07-24 23:54 [Patch 0/4] Defer page pinning for SEV guests until guest pages touched eric van tassell
@ 2020-07-24 23:54 ` eric van tassell
  2020-07-24 23:54 ` [Patch 2/4] KVM:SVM: Introduce set_spte_notify support eric van tassell
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: eric van tassell @ 2020-07-24 23:54 UTC (permalink / raw)
  To: kvm
  Cc: bp, hpa, mingo, jmattson, joro, pbonzini, sean.j.christopherson,
	tglx, vkuznets, wanpengli, x86, evantass

This generic callback will be called just before setting the spte.

Check the return code where it was not previously checked, since this
operation can now return an error.

This will be used by SVM to defer pinning of SEV guest pages until they're
used in order to reduce SEV guest startup time by eliminating the compute
cycles required to pin all the pages up front. Additionally, we want to
reduce memory pressure due to guests that reserve but only sparsely access
a large amount of memory.

Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/mmu/mmu.c          | 31 +++++++++++++++++++++++++++----
 arch/x86/kvm/mmu/paging_tmpl.h  | 27 ++++++++++++++++-----------
 3 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5aaef036627f..e8c37e28a8ae 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1227,6 +1227,9 @@ struct kvm_x86_ops {
 	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
+
+	int (*set_spte_notify)(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
+			       int level, bool mmio, u64 *spte);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa506aaaf019..0a6a3df8c8c8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3004,6 +3004,23 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		spte |= kvm_x86_ops.get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));
 
+	if (kvm_x86_ops.set_spte_notify) {
+		ret = kvm_x86_ops.set_spte_notify(vcpu, gfn, pfn, level,
+						  kvm_is_mmio_pfn(pfn), &spte);
+		if (ret) {
+			if (WARN_ON_ONCE(ret > 0))
+				/*
+				 * set_spte_notify() should return 0 on success
+				 * and non-zero preferably less than zero,
+				 * for failure.  We check for any unanticipated
+				 * positive return values here.
+				 */
+				ret = -EINVAL;
+
+			return ret;
+		}
+	}
+
 	if (host_writable)
 		spte |= SPTE_HOST_WRITEABLE;
 	else
@@ -3086,6 +3103,9 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	set_spte_ret = set_spte(vcpu, sptep, pte_access, level, gfn, pfn,
 				speculative, true, host_writable);
+	if (set_spte_ret < 0)
+		return set_spte_ret;
+
 	if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
 		if (write_fault)
 			ret = RET_PF_EMULATE;
@@ -3134,7 +3154,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 	struct page *pages[PTE_PREFETCH_NUM];
 	struct kvm_memory_slot *slot;
 	unsigned int access = sp->role.access;
-	int i, ret;
+	int i, ret, error_ret = 0;
 	gfn_t gfn;
 
 	gfn = kvm_mmu_page_get_gfn(sp, start - sp->spt);
@@ -3147,12 +3167,15 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 		return -1;
 
 	for (i = 0; i < ret; i++, gfn++, start++) {
-		mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
-			     page_to_pfn(pages[i]), true, true);
+		ret = mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+				   page_to_pfn(pages[i]), true, true);
+		if (ret < 0 && error_ret == 0) /* only track 1st fail */
+			error_ret = ret;
 		put_page(pages[i]);
 	}
 
-	return 0;
+	/* If there was an error for any gfn, return non-0. */
+	return error_ret;
 }
 
 static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0172a949f6a7..a777bb43dfa0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -532,6 +532,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	unsigned pte_access;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
+	int set_spte_ret;
 
 	if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
 		return false;
@@ -550,11 +551,12 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, pte_access, 0, PG_LEVEL_4K, gfn, pfn,
-		     true, true);
+	set_spte_ret = mmu_set_spte(vcpu, spte, pte_access, 0,
+				    PG_LEVEL_4K, gfn, pfn, true, true);
 
 	kvm_release_pfn_clean(pfn);
-	return true;
+
+	return set_spte_ret >= 0;
 }
 
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
@@ -1011,7 +1013,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	int i, nr_present = 0;
 	bool host_writable;
 	gpa_t first_pte_gpa;
-	int set_spte_ret = 0;
+	int ret;
+	int accum_set_spte_flags = 0;
 
 	/* direct kvm_mmu_page can not be unsync. */
 	BUG_ON(sp->role.direct);
@@ -1064,17 +1067,19 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 			continue;
 		}
 
-		nr_present++;
-
 		host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
 
-		set_spte_ret |= set_spte(vcpu, &sp->spt[i],
-					 pte_access, PG_LEVEL_4K,
-					 gfn, spte_to_pfn(sp->spt[i]),
-					 true, false, host_writable);
+		ret = set_spte(vcpu, &sp->spt[i], pte_access,
+			       PG_LEVEL_4K, gfn,
+			       spte_to_pfn(sp->spt[i]), true, false,
+			       host_writable);
+		if (ret >= 0) {
+			nr_present++;
+			accum_set_spte_flags |= ret;
+		}
 	}
 
-	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
+	if (accum_set_spte_flags & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
 		kvm_flush_remote_tlbs(vcpu->kvm);
 
 	return nr_present;
-- 
2.17.1



* [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-07-24 23:54 [Patch 0/4] Defer page pinning for SEV guests until guest pages touched eric van tassell
  2020-07-24 23:54 ` [Patch 1/4] KVM:MMU: Introduce the set_spte_notify() callback eric van tassell
@ 2020-07-24 23:54 ` eric van tassell
  2020-07-31 20:25   ` Sean Christopherson
  2020-07-24 23:54 ` [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page() eric van tassell
  2020-07-24 23:54 ` [Patch 4/4] KVM:SVM: Remove struct enc_region and associated pinned page tracking eric van tassell
  3 siblings, 1 reply; 17+ messages in thread
From: eric van tassell @ 2020-07-24 23:54 UTC (permalink / raw)
  To: kvm
  Cc: bp, hpa, mingo, jmattson, joro, pbonzini, sean.j.christopherson,
	tglx, vkuznets, wanpengli, x86, evantass

Improve SEV guest startup time from O(n) to a constant by deferring
guest page pinning until the pages are used to satisfy nested page faults.

Implement the code to do the pinning (sev_get_page) and the notifier
sev_set_spte_notify().

Track the pinned pages with an xarray so they can be released during guest
termination.

Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
---
 arch/x86/kvm/svm/sev.c | 71 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c |  2 ++
 arch/x86/kvm/svm/svm.h |  3 ++
 3 files changed, 76 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f7f1f4ecf08e..040ae4aa7c5a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -184,6 +184,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	sev->asid = asid;
 	INIT_LIST_HEAD(&sev->regions_list);
 
+	xa_init(&sev->pages_xarray);
+
 	return 0;
 
 e_free:
@@ -415,6 +417,42 @@ static unsigned long get_num_contig_pages(unsigned long idx,
 	return pages;
 }
 
+static int sev_get_page(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	struct xarray *xa = &sev->pages_xarray;
+	struct page *page = pfn_to_page(pfn);
+	int ret;
+
+	/* store page at index = gfn */
+	ret = xa_insert(xa, gfn, page, GFP_ATOMIC);
+	if (ret == -EBUSY) {
+		/*
+		 * If xa_insert returned -EBUSY, the  gfn was already associated
+		 * with a struct page *.
+		 */
+		struct page *cur_page;
+
+		cur_page = xa_load(xa, gfn);
+		/* If cur_page == page, no change is needed, so return 0 */
+		if (cur_page == page)
+			return 0;
+
+		/* Release the page that was stored at index = gfn */
+		put_page(cur_page);
+
+		/* Return result of attempting to store page at index = gfn */
+		ret = xa_err(xa_store(xa, gfn, page, GFP_ATOMIC));
+	}
+
+	if (ret)
+		return ret;
+
+	get_page(page);
+
+	return 0;
+}
+
 static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
@@ -1085,6 +1123,8 @@ void sev_vm_destroy(struct kvm *kvm)
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct list_head *head = &sev->regions_list;
 	struct list_head *pos, *q;
+	XA_STATE(xas, &sev->pages_xarray, 0);
+	struct page *xa_page;
 
 	if (!sev_guest(kvm))
 		return;
@@ -1109,6 +1149,12 @@ void sev_vm_destroy(struct kvm *kvm)
 		}
 	}
 
+	/* Release each pinned page that SEV tracked in sev->pages_xarray. */
+	xas_for_each(&xas, xa_page, ULONG_MAX) {
+		put_page(xa_page);
+	}
+	xa_destroy(&sev->pages_xarray);
+
 	mutex_unlock(&kvm->lock);
 
 	sev_unbind_asid(kvm, sev->handle);
@@ -1193,3 +1239,28 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
 	svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
+
+int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
+			int level, bool mmio, u64 *spte)
+{
+	int rc;
+
+	if (!sev_guest(vcpu->kvm))
+		return 0;
+
+	/* MMIO page contains the unencrypted data, no need to lock this page */
+	if (mmio)
+		return 0;
+
+	rc = sev_get_page(vcpu->kvm, gfn, pfn);
+	if (rc)
+		return rc;
+
+	/*
+	 * Flush any cached lines of the page being added since "ownership" of
+	 * it will be transferred from the host to an encrypted guest.
+	 */
+	clflush_cache_range(__va(pfn << PAGE_SHIFT), page_level_size(level));
+
+	return 0;
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 535ad311ad02..9b304c761a99 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4130,6 +4130,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
+
+	.set_spte_notify = sev_set_spte_notify,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 121b198b51e9..8a5c01516c89 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -65,6 +65,7 @@ struct kvm_sev_info {
 	int fd;			/* SEV device fd */
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
+	struct xarray pages_xarray; /* List of PFN locked */
 };
 
 struct kvm_svm {
@@ -488,5 +489,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 int __init sev_hardware_setup(void);
 void sev_hardware_teardown(void);
+int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
+			int level, bool mmio, u64 *spte);
 
 #endif
-- 
2.17.1



* [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page()
  2020-07-24 23:54 [Patch 0/4] Defer page pinning for SEV guests until guest pages touched eric van tassell
  2020-07-24 23:54 ` [Patch 1/4] KVM:MMU: Introduce the set_spte_notify() callback eric van tassell
  2020-07-24 23:54 ` [Patch 2/4] KVM:SVM: Introduce set_spte_notify support eric van tassell
@ 2020-07-24 23:54 ` eric van tassell
  2020-07-31 20:40   ` Sean Christopherson
  2020-07-24 23:54 ` [Patch 4/4] KVM:SVM: Remove struct enc_region and associated pinned page tracking eric van tassell
  3 siblings, 1 reply; 17+ messages in thread
From: eric van tassell @ 2020-07-24 23:54 UTC (permalink / raw)
  To: kvm
  Cc: bp, hpa, mingo, jmattson, joro, pbonzini, sean.j.christopherson,
	tglx, vkuznets, wanpengli, x86, evantass

Add two small infrastructure functions to enable pinning the SEV guest
pages used for sev_launch_update_data() via sev_get_page().

Pin the memory for the data being passed to launch_update_data() because it
gets encrypted before the guest is first run and must not be moved, which
would corrupt it.

Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
---
 arch/x86/kvm/svm/sev.c | 48 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 040ae4aa7c5a..e0eed9a20a51 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -453,6 +453,37 @@ static int sev_get_page(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
 	return 0;
 }
 
+static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm,
+					      unsigned long hva)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot;
+
+	kvm_for_each_memslot(memslot, slots) {
+		if (hva >= memslot->userspace_addr &&
+		    hva < memslot->userspace_addr +
+			      (memslot->npages << PAGE_SHIFT))
+			return memslot;
+	}
+
+	return NULL;
+}
+
+static bool hva_to_gfn(struct kvm *kvm, unsigned long hva, gfn_t *gfn)
+{
+	struct kvm_memory_slot *memslot;
+	gpa_t gpa_offset;
+
+	memslot = hva_to_memslot(kvm, hva);
+	if (!memslot)
+		return false;
+
+	gpa_offset = hva - memslot->userspace_addr;
+	*gfn = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset) >> PAGE_SHIFT;
+
+	return true;
+}
+
 static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
 	unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
@@ -483,6 +514,23 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		goto e_free;
 	}
 
+	/*
+	 * Increment the page ref count so that the pages do not get migrated or
+	 * moved after we are done from the LAUNCH_UPDATE_DATA.
+	 */
+	for (i = 0; i < npages; i++) {
+		gfn_t gfn;
+
+		if (!hva_to_gfn(kvm, (vaddr + (i * PAGE_SIZE)) & PAGE_MASK, &gfn)) {
+			ret = -EFAULT;
+			goto e_unpin;
+		}
+
+		ret = sev_get_page(kvm, gfn, page_to_pfn(inpages[i]));
+		if (ret)
+			goto e_unpin;
+	}
+
 	/*
 	 * The LAUNCH_UPDATE command will perform in-place encryption of the
 	 * memory content (i.e it will write the same memory region with C=1).
-- 
2.17.1



* [Patch 4/4] KVM:SVM: Remove struct enc_region and associated pinned page tracking.
  2020-07-24 23:54 [Patch 0/4] Defer page pinning for SEV guests until guest pages touched eric van tassell
                   ` (2 preceding siblings ...)
  2020-07-24 23:54 ` [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page() eric van tassell
@ 2020-07-24 23:54 ` eric van tassell
  3 siblings, 0 replies; 17+ messages in thread
From: eric van tassell @ 2020-07-24 23:54 UTC (permalink / raw)
  To: kvm
  Cc: bp, hpa, mingo, jmattson, joro, pbonzini, sean.j.christopherson,
	tglx, vkuznets, wanpengli, x86, evantass

Remove the enc_region structure definition and the code which maintained
it, as they are no longer needed in view of the xarray support we added in
the previous patch.

Leave svm_register_enc_region() and svm_unregister_enc_region() as stubs
since the ioctls are used by QEMU, which will crash if they do not
return 0.

Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
---
 arch/x86/kvm/svm/sev.c | 117 +----------------------------------------
 arch/x86/kvm/svm/svm.h |   1 -
 2 files changed, 1 insertion(+), 117 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e0eed9a20a51..7ca5d043d908 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -27,14 +27,6 @@ static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
-struct enc_region {
-	struct list_head list;
-	unsigned long npages;
-	struct page **pages;
-	unsigned long uaddr;
-	unsigned long size;
-};
-
 static int sev_flush_asids(void)
 {
 	int ret, error = 0;
@@ -182,7 +174,6 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 
 	sev->active = true;
 	sev->asid = asid;
-	INIT_LIST_HEAD(&sev->regions_list);
 
 	xa_init(&sev->pages_xarray);
 
@@ -1064,113 +1055,18 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 int svm_register_enc_region(struct kvm *kvm,
 			    struct kvm_enc_region *range)
 {
-	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-	struct enc_region *region;
-	int ret = 0;
-
-	if (!sev_guest(kvm))
-		return -ENOTTY;
-
-	if (range->addr > ULONG_MAX || range->size > ULONG_MAX)
-		return -EINVAL;
-
-	region = kzalloc(sizeof(*region), GFP_KERNEL_ACCOUNT);
-	if (!region)
-		return -ENOMEM;
-
-	region->pages = sev_pin_memory(kvm, range->addr, range->size, &region->npages, 1);
-	if (IS_ERR(region->pages)) {
-		ret = PTR_ERR(region->pages);
-		goto e_free;
-	}
-
-	/*
-	 * The guest may change the memory encryption attribute from C=0 -> C=1
-	 * or vice versa for this memory range. Lets make sure caches are
-	 * flushed to ensure that guest data gets written into memory with
-	 * correct C-bit.
-	 */
-	sev_clflush_pages(region->pages, region->npages);
-
-	region->uaddr = range->addr;
-	region->size = range->size;
-
-	mutex_lock(&kvm->lock);
-	list_add_tail(&region->list, &sev->regions_list);
-	mutex_unlock(&kvm->lock);
-
-	return ret;
-
-e_free:
-	kfree(region);
-	return ret;
-}
-
-static struct enc_region *
-find_enc_region(struct kvm *kvm, struct kvm_enc_region *range)
-{
-	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-	struct list_head *head = &sev->regions_list;
-	struct enc_region *i;
-
-	list_for_each_entry(i, head, list) {
-		if (i->uaddr == range->addr &&
-		    i->size == range->size)
-			return i;
-	}
-
-	return NULL;
-}
-
-static void __unregister_enc_region_locked(struct kvm *kvm,
-					   struct enc_region *region)
-{
-	sev_unpin_memory(kvm, region->pages, region->npages);
-	list_del(&region->list);
-	kfree(region);
+	return 0;
 }
 
 int svm_unregister_enc_region(struct kvm *kvm,
 			      struct kvm_enc_region *range)
 {
-	struct enc_region *region;
-	int ret;
-
-	mutex_lock(&kvm->lock);
-
-	if (!sev_guest(kvm)) {
-		ret = -ENOTTY;
-		goto failed;
-	}
-
-	region = find_enc_region(kvm, range);
-	if (!region) {
-		ret = -EINVAL;
-		goto failed;
-	}
-
-	/*
-	 * Ensure that all guest tagged cache entries are flushed before
-	 * releasing the pages back to the system for use. CLFLUSH will
-	 * not do this, so issue a WBINVD.
-	 */
-	wbinvd_on_all_cpus();
-
-	__unregister_enc_region_locked(kvm, region);
-
-	mutex_unlock(&kvm->lock);
 	return 0;
-
-failed:
-	mutex_unlock(&kvm->lock);
-	return ret;
 }
 
 void sev_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-	struct list_head *head = &sev->regions_list;
-	struct list_head *pos, *q;
 	XA_STATE(xas, &sev->pages_xarray, 0);
 	struct page *xa_page;
 
@@ -1186,17 +1082,6 @@ void sev_vm_destroy(struct kvm *kvm)
 	 */
 	wbinvd_on_all_cpus();
 
-	/*
-	 * if userspace was terminated before unregistering the memory regions
-	 * then lets unpin all the registered memory.
-	 */
-	if (!list_empty(head)) {
-		list_for_each_safe(pos, q, head) {
-			__unregister_enc_region_locked(kvm,
-				list_entry(pos, struct enc_region, list));
-		}
-	}
-
 	/* Release each pinned page that SEV tracked in sev->pages_xarray. */
 	xas_for_each(&xas, xa_page, ULONG_MAX) {
 		put_page(xa_page);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8a5c01516c89..ed6311768030 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -64,7 +64,6 @@ struct kvm_sev_info {
 	unsigned int handle;	/* SEV firmware handle */
 	int fd;			/* SEV device fd */
 	unsigned long pages_locked; /* Number of pages locked */
-	struct list_head regions_list;  /* List of registered regions */
 	struct xarray pages_xarray; /* List of PFN locked */
 };
 
-- 
2.17.1



* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-07-24 23:54 ` [Patch 2/4] KVM:SVM: Introduce set_spte_notify support eric van tassell
@ 2020-07-31 20:25   ` Sean Christopherson
  2020-08-02 20:53     ` Eric van Tassell
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2020-07-31 20:25 UTC (permalink / raw)
  To: eric van tassell
  Cc: kvm, bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86, evantass

On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
> Improve SEV guest startup time from O(n) to a constant by deferring
> guest page pinning until the pages are used to satisfy nested page faults.
> 
> Implement the code to do the pinning (sev_get_page) and the notifier
> sev_set_spte_notify().
> 
> Track the pinned pages with xarray so they can be released during guest
> termination.

I like that SEV is trying to be a better citizen, but this is trading one
hack for another.

  - KVM goes through a lot of effort to ensure page faults don't need to
    allocate memory, and this throws all that effort out the window.

  - Tracking all gfns in a separate database (from the MMU) is wasteful.

  - Having to wait to free pinned memory until the VM is destroyed is less
    than ideal.

More thoughts in the next patch.

> Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 71 ++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c |  2 ++
>  arch/x86/kvm/svm/svm.h |  3 ++
>  3 files changed, 76 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index f7f1f4ecf08e..040ae4aa7c5a 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -184,6 +184,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	sev->asid = asid;
>  	INIT_LIST_HEAD(&sev->regions_list);
>  
> +	xa_init(&sev->pages_xarray);
> +
>  	return 0;
>  
>  e_free:
> @@ -415,6 +417,42 @@ static unsigned long get_num_contig_pages(unsigned long idx,
>  	return pages;
>  }
>  
> +static int sev_get_page(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> +{
> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +	struct xarray *xa = &sev->pages_xarray;
> +	struct page *page = pfn_to_page(pfn);
> +	int ret;
> +
> +	/* store page at index = gfn */
> +	ret = xa_insert(xa, gfn, page, GFP_ATOMIC);
> +	if (ret == -EBUSY) {
> +		/*
> +		 * If xa_insert returned -EBUSY, the  gfn was already associated
> +		 * with a struct page *.
> +		 */
> +		struct page *cur_page;
> +
> +		cur_page = xa_load(xa, gfn);
> +		/* If cur_page == page, no change is needed, so return 0 */
> +		if (cur_page == page)
> +			return 0;
> +
> +		/* Release the page that was stored at index = gfn */
> +		put_page(cur_page);
> +
> +		/* Return result of attempting to store page at index = gfn */
> +		ret = xa_err(xa_store(xa, gfn, page, GFP_ATOMIC));
> +	}
> +
> +	if (ret)
> +		return ret;
> +
> +	get_page(page);
> +
> +	return 0;
> +}
> +
>  static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
>  	unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
> @@ -1085,6 +1123,8 @@ void sev_vm_destroy(struct kvm *kvm)
>  	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>  	struct list_head *head = &sev->regions_list;
>  	struct list_head *pos, *q;
> +	XA_STATE(xas, &sev->pages_xarray, 0);
> +	struct page *xa_page;
>  
>  	if (!sev_guest(kvm))
>  		return;
> @@ -1109,6 +1149,12 @@ void sev_vm_destroy(struct kvm *kvm)
>  		}
>  	}
>  
> +	/* Release each pinned page that SEV tracked in sev->pages_xarray. */
> +	xas_for_each(&xas, xa_page, ULONG_MAX) {
> +		put_page(xa_page);
> +	}
> +	xa_destroy(&sev->pages_xarray);
> +
>  	mutex_unlock(&kvm->lock);
>  
>  	sev_unbind_asid(kvm, sev->handle);
> @@ -1193,3 +1239,28 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
>  	svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
>  	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  }
> +
> +int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
> +			int level, bool mmio, u64 *spte)
> +{
> +	int rc;
> +
> +	if (!sev_guest(vcpu->kvm))
> +		return 0;
> +
> +	/* MMIO page contains the unencrypted data, no need to lock this page */
> +	if (mmio)

Rather than make this a generic set_spte() notify hook, I think it makes
more sense to have it be a "pin_spte" style hook.  That way the caller can
skip mmio PFNs as well as flows that can't possibly be relevant to SEV,
e.g. the sync_page() flow.
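
(For illustration, a narrower hook might look roughly like the sketch below;
the "pin_spte" name and exact signature are hypothetical, not code in this
series:)

	/* hypothetical hook, invoked only for non-MMIO PFNs on the TDP fault path */
	int (*pin_spte)(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
			int level);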

> +		return 0;
> +
> +	rc = sev_get_page(vcpu->kvm, gfn, pfn);
> +	if (rc)
> +		return rc;
> +
> +	/*
> +	 * Flush any cached lines of the page being added since "ownership" of
> +	 * it will be transferred from the host to an encrypted guest.
> +	 */
> +	clflush_cache_range(__va(pfn << PAGE_SHIFT), page_level_size(level));
> +
> +	return 0;
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 535ad311ad02..9b304c761a99 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4130,6 +4130,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>  
>  	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> +	.set_spte_notify = sev_set_spte_notify,
>  };
>  
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 121b198b51e9..8a5c01516c89 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -65,6 +65,7 @@ struct kvm_sev_info {
>  	int fd;			/* SEV device fd */
>  	unsigned long pages_locked; /* Number of pages locked */
>  	struct list_head regions_list;  /* List of registered regions */
> +	struct xarray pages_xarray; /* List of PFN locked */
>  };
>  
>  struct kvm_svm {
> @@ -488,5 +489,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  int __init sev_hardware_setup(void);
>  void sev_hardware_teardown(void);
> +int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
> +			int level, bool mmio, u64 *spte);
>  
>  #endif
> -- 
> 2.17.1
> 


* Re: [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page()
  2020-07-24 23:54 ` [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page() eric van tassell
@ 2020-07-31 20:40   ` Sean Christopherson
  2020-08-02 23:55     ` Eric van Tassell
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2020-07-31 20:40 UTC (permalink / raw)
  To: eric van tassell
  Cc: kvm, bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86, evantass

On Fri, Jul 24, 2020 at 06:54:47PM -0500, eric van tassell wrote:
> Add 2 small infrastructure functions here which to enable pinning the SEV
> guest pages used for sev_launch_update_data() using sev_get_page().
> 
> Pin the memory for the data being passed to launch_update_data() because it
> gets encrypted before the guest is first run and must not be moved which
> would corrupt it.
> 
> Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 48 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 040ae4aa7c5a..e0eed9a20a51 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -453,6 +453,37 @@ static int sev_get_page(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
>  	return 0;
>  }
>  
> +static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm,
> +					      unsigned long hva)
> +{
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	struct kvm_memory_slot *memslot;
> +
> +	kvm_for_each_memslot(memslot, slots) {
> +		if (hva >= memslot->userspace_addr &&
> +		    hva < memslot->userspace_addr +
> +			      (memslot->npages << PAGE_SHIFT))
> +			return memslot;
> +	}
> +
> +	return NULL;
> +}
> +
> +static bool hva_to_gfn(struct kvm *kvm, unsigned long hva, gfn_t *gfn)
> +{
> +	struct kvm_memory_slot *memslot;
> +	gpa_t gpa_offset;
> +
> +	memslot = hva_to_memslot(kvm, hva);
> +	if (!memslot)
> +		return false;
> +
> +	gpa_offset = hva - memslot->userspace_addr;
> +	*gfn = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset) >> PAGE_SHIFT;
> +
> +	return true;
> +}
> +
>  static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
>  	unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
> @@ -483,6 +514,23 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  		goto e_free;
>  	}
>  
> +	/*
> +	 * Increment the page ref count so that the pages do not get migrated or
> +	 * moved after we are done from the LAUNCH_UPDATE_DATA.
> +	 */
> +	for (i = 0; i < npages; i++) {
> +		gfn_t gfn;
> +
> +		if (!hva_to_gfn(kvm, (vaddr + (i * PAGE_SIZE)) & PAGE_MASK, &gfn)) {

This needs to hold kvm->srcu to block changes to memslots while looking up
the hva->gpa translation.
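
(Roughly, i.e. wrap the lookups in an SRCU read-side critical section, e.g. a
sketch along these lines:)

	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	/* ... hva_to_memslot()/hva_to_gfn() lookups ... */
	srcu_read_unlock(&kvm->srcu, idx);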

> +			ret = -EFAULT;
> +			goto e_unpin;
> +		}
> +
> +		ret = sev_get_page(kvm, gfn, page_to_pfn(inpages[i]));

Rather than dump everything into an xarray, KVM can instead pin the pages
by faulting them into its MMU.  By pinning pages in the MMU proper, KVM can
use software available bits in the SPTEs to track which pages are pinned,
can assert/WARN on unexpected behavior with respect to pinned pages, and
can drop/unpin pages as soon as they are no longer reachable from KVM, e.g.
when the mm_struct dies or the associated memslot is removed.

Leveraging the MMU would also make this extensible to non-SEV features,
e.g. it can be shared by VMX if VMX adds a feature that needs similar hooks
in the MMU.  Shoving the tracking in SEV means the core logic would need to
be duplicated for other features.

The big caveat is that funneling this through the MMU requires a vCPU[*],
i.e. is only viable if userspace has already created at least one vCPU.
For QEMU, this is guaranteed.  I don't know about other VMMs.

If there are VMMs that support SEV and don't create vCPUs before encrypting
guest memory, one option would be to automatically go down the optimized
route iff at least one vCPU has been created.  In other words, don't break
old VMMs, but don't carry more hacks to make them faster either.

It just so happens that I have some code that sort of implements the above.
I reworked it to mesh with SEV and will post it as an RFC.  It's far from
a tested-and-ready-to-roll implementation, but I think it's fleshed out
enough to start a conversation.

[*] This isn't a hard requirement, i.e. KVM could be reworked to provide a
    common MMU for non-nested TDP, but that's a much bigger effort.

> +		if (ret)
> +			goto e_unpin;
> +	}
> +
>  	/*
>  	 * The LAUNCH_UPDATE command will perform in-place encryption of the
>  	 * memory content (i.e it will write the same memory region with C=1).
> -- 
> 2.17.1
> 


* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-07-31 20:25   ` Sean Christopherson
@ 2020-08-02 20:53     ` Eric van Tassell
  2020-08-03 16:27       ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric van Tassell @ 2020-08-02 20:53 UTC (permalink / raw)
  To: Sean Christopherson, eric van tassell, Tom Lendacky, Singh,
	Brijesh, Grimm, Jon
  Cc: kvm, bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86



On 7/31/20 3:25 PM, Sean Christopherson wrote:
> On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
>> Improve SEV guest startup time from O(n) to a constant by deferring
>> guest page pinning until the pages are used to satisfy nested page faults.
>>
>> Implement the code to do the pinning (sev_get_page) and the notifier
>> sev_set_spte_notify().
>>
>> Track the pinned pages with xarray so they can be released during guest
>> termination.
> 
> I like that SEV is trying to be a better citizen, but this is trading one
> hack for another.
> 
>    - KVM goes through a lot of effort to ensure page faults don't need to
>      allocate memory, and this throws all that effort out the window.
> 
can you elaborate on that?
>    - Tracking all gfns in a separate database (from the MMU) is wasteful.
> 
>    - Having to wait to free pinned memory until the VM is destroyed is less
>      than ideal.
> 
> More thoughts in the next patch.
> 
>> Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
>> ---
>>   arch/x86/kvm/svm/sev.c | 71 ++++++++++++++++++++++++++++++++++++++++++
>>   arch/x86/kvm/svm/svm.c |  2 ++
>>   arch/x86/kvm/svm/svm.h |  3 ++
>>   3 files changed, 76 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index f7f1f4ecf08e..040ae4aa7c5a 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -184,6 +184,8 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>   	sev->asid = asid;
>>   	INIT_LIST_HEAD(&sev->regions_list);
>>   
>> +	xa_init(&sev->pages_xarray);
>> +
>>   	return 0;
>>   
>>   e_free:
>> @@ -415,6 +417,42 @@ static unsigned long get_num_contig_pages(unsigned long idx,
>>   	return pages;
>>   }
>>   
>> +static int sev_get_page(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
>> +{
>> +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>> +	struct xarray *xa = &sev->pages_xarray;
>> +	struct page *page = pfn_to_page(pfn);
>> +	int ret;
>> +
>> +	/* store page at index = gfn */
>> +	ret = xa_insert(xa, gfn, page, GFP_ATOMIC);
>> +	if (ret == -EBUSY) {
>> +		/*
>> +		 * If xa_insert returned -EBUSY, the  gfn was already associated
>> +		 * with a struct page *.
>> +		 */
>> +		struct page *cur_page;
>> +
>> +		cur_page = xa_load(xa, gfn);
>> +		/* If cur_page == page, no change is needed, so return 0 */
>> +		if (cur_page == page)
>> +			return 0;
>> +
>> +		/* Release the page that was stored at index = gfn */
>> +		put_page(cur_page);
>> +
>> +		/* Return result of attempting to store page at index = gfn */
>> +		ret = xa_err(xa_store(xa, gfn, page, GFP_ATOMIC));
>> +	}
>> +
>> +	if (ret)
>> +		return ret;
>> +
>> +	get_page(page);
>> +
>> +	return 0;
>> +}
>> +
>>   static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>   {
>>   	unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
>> @@ -1085,6 +1123,8 @@ void sev_vm_destroy(struct kvm *kvm)
>>   	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>   	struct list_head *head = &sev->regions_list;
>>   	struct list_head *pos, *q;
>> +	XA_STATE(xas, &sev->pages_xarray, 0);
>> +	struct page *xa_page;
>>   
>>   	if (!sev_guest(kvm))
>>   		return;
>> @@ -1109,6 +1149,12 @@ void sev_vm_destroy(struct kvm *kvm)
>>   		}
>>   	}
>>   
>> +	/* Release each pinned page that SEV tracked in sev->pages_xarray. */
>> +	xas_for_each(&xas, xa_page, ULONG_MAX) {
>> +		put_page(xa_page);
>> +	}
>> +	xa_destroy(&sev->pages_xarray);
>> +
>>   	mutex_unlock(&kvm->lock);
>>   
>>   	sev_unbind_asid(kvm, sev->handle);
>> @@ -1193,3 +1239,28 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
>>   	svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
>>   	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>>   }
>> +
>> +int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
>> +			int level, bool mmio, u64 *spte)
>> +{
>> +	int rc;
>> +
>> +	if (!sev_guest(vcpu->kvm))
>> +		return 0;
>> +
>> +	/* MMIO page contains the unencrypted data, no need to lock this page */
>> +	if (mmio)
> 
> Rather than make this a generic set_spte() notify hook, I think it makes
> more sense to specifying have it be a "pin_spte" style hook.  That way the
> caller can skip mmio PFNs as well as flows that can't possibly be relevant
> to SEV, e.g. the sync_page() flow.
Not sure I understand. We do ignore mmio here. Can you detail a bit more
what you see as problematic with the sync_page() flow?
> 
>> +		return 0;
>> +
>> +	rc = sev_get_page(vcpu->kvm, gfn, pfn);
>> +	if (rc)
>> +		return rc;
>> +
>> +	/*
>> +	 * Flush any cached lines of the page being added since "ownership" of
>> +	 * it will be transferred from the host to an encrypted guest.
>> +	 */
>> +	clflush_cache_range(__va(pfn << PAGE_SHIFT), page_level_size(level));
>> +
>> +	return 0;
>> +}
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 535ad311ad02..9b304c761a99 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -4130,6 +4130,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>>   	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>>   
>>   	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>> +
>> +	.set_spte_notify = sev_set_spte_notify,
>>   };
>>   
>>   static struct kvm_x86_init_ops svm_init_ops __initdata = {
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 121b198b51e9..8a5c01516c89 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -65,6 +65,7 @@ struct kvm_sev_info {
>>   	int fd;			/* SEV device fd */
>>   	unsigned long pages_locked; /* Number of pages locked */
>>   	struct list_head regions_list;  /* List of registered regions */
>> +	struct xarray pages_xarray; /* List of PFN locked */
>>   };
>>   
>>   struct kvm_svm {
>> @@ -488,5 +489,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
>>   void pre_sev_run(struct vcpu_svm *svm, int cpu);
>>   int __init sev_hardware_setup(void);
>>   void sev_hardware_teardown(void);
>> +int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
>> +			int level, bool mmio, u64 *spte);
>>   
>>   #endif
>> -- 
>> 2.17.1
>>


* Re: [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page()
  2020-07-31 20:40   ` Sean Christopherson
@ 2020-08-02 23:55     ` Eric van Tassell
  2020-08-19 16:20       ` Eric van Tassell
  0 siblings, 1 reply; 17+ messages in thread
From: Eric van Tassell @ 2020-08-02 23:55 UTC (permalink / raw)
  To: Sean Christopherson, eric van tassell
  Cc: kvm, bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86



On 7/31/20 3:40 PM, Sean Christopherson wrote:
> On Fri, Jul 24, 2020 at 06:54:47PM -0500, eric van tassell wrote:
>> Add 2 small infrastructure functions here which to enable pinning the SEV
>> guest pages used for sev_launch_update_data() using sev_get_page().
>>
>> Pin the memory for the data being passed to launch_update_data() because it
>> gets encrypted before the guest is first run and must not be moved which
>> would corrupt it.
>>
>> Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
>> ---
>>   arch/x86/kvm/svm/sev.c | 48 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 48 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 040ae4aa7c5a..e0eed9a20a51 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -453,6 +453,37 @@ static int sev_get_page(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
>>   	return 0;
>>   }
>>   
>> +static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm,
>> +					      unsigned long hva)
>> +{
>> +	struct kvm_memslots *slots = kvm_memslots(kvm);
>> +	struct kvm_memory_slot *memslot;
>> +
>> +	kvm_for_each_memslot(memslot, slots) {
>> +		if (hva >= memslot->userspace_addr &&
>> +		    hva < memslot->userspace_addr +
>> +			      (memslot->npages << PAGE_SHIFT))
>> +			return memslot;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static bool hva_to_gfn(struct kvm *kvm, unsigned long hva, gfn_t *gfn)
>> +{
>> +	struct kvm_memory_slot *memslot;
>> +	gpa_t gpa_offset;
>> +
>> +	memslot = hva_to_memslot(kvm, hva);
>> +	if (!memslot)
>> +		return false;
>> +
>> +	gpa_offset = hva - memslot->userspace_addr;
>> +	*gfn = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset) >> PAGE_SHIFT;
>> +
>> +	return true;
>> +}
>> +
>>   static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>   {
>>   	unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
>> @@ -483,6 +514,23 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>   		goto e_free;
>>   	}
>>   
>> +	/*
>> +	 * Increment the page ref count so that the pages do not get migrated or
>> +	 * moved after we are done from the LAUNCH_UPDATE_DATA.
>> +	 */
>> +	for (i = 0; i < npages; i++) {
>> +		gfn_t gfn;
>> +
>> +		if (!hva_to_gfn(kvm, (vaddr + (i * PAGE_SIZE)) & PAGE_MASK, &gfn)) {
> 
> This needs to hold kvm->srcu to block changes to memslots while looking up
> the hva->gpa translation.
I'll look into this.
> 
>> +			ret = -EFAULT;
>> +			goto e_unpin;
>> +		}
>> +
>> +		ret = sev_get_page(kvm, gfn, page_to_pfn(inpages[i]));
> 
> Rather than dump everything into an xarray, KVM can instead pin the pages
> by faulting them into its MMU.  By pinning pages in the MMU proper, KVM can
> use software available bits in the SPTEs to track which pages are pinned,
> can assert/WARN on unexpected behavior with respect to pinned pages, and
> can drop/unpin pages as soon as they are no longer reachable from KVM, e.g.
> when the mm_struct dies or the associated memslot is removed.
> 
> Leveraging the MMU would also make this extensible to non-SEV features,
> e.g. it can be shared by VMX if VMX adds a feature that needs similar hooks
> in the MMU.  Shoving the tracking in SEV means the core logic would need to
> be duplicated for other features.
> 
> The big caveat is that funneling this through the MMU requires a vCPU[*],
> i.e. is only viable if userspace has already created at least one vCPU.
> For QEMU, this is guaranteed.  I don't know about other VMMs.
> 
> If there are VMMs that support SEV and don't create vCPUs before encrypting
> guest memory, one option would be to automatically go down the optimized
> route iff at least one vCPU has been created.  In other words, don't break
> old VMMs, but don't carry more hacks to make them faster either.
> 
> It just so happens that I have some code that sort of implements the above.
> I reworked it to mesh with SEV and will post it as an RFC.  It's far from
> a tested-and-ready-to-roll implemenation, but I think it's fleshed out
> enough to start a conversation.
> 
> [*] This isn't a hard requirement, i.e. KVM could be reworked to provide a
>      common MMU for non-nested TDP, but that's a much bigger effort.
> 
I will think about this. (I'm out of the office Monday and Tuesday.)
>> +		if (ret)
>> +			goto e_unpin;
>> +	}
>> +
>>   	/*
>>   	 * The LAUNCH_UPDATE command will perform in-place encryption of the
>>   	 * memory content (i.e it will write the same memory region with C=1).
>> -- 
>> 2.17.1
>>


* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-08-02 20:53     ` Eric van Tassell
@ 2020-08-03 16:27       ` Sean Christopherson
  2020-08-19 16:03         ` Eric van Tassell
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2020-08-03 16:27 UTC (permalink / raw)
  To: Eric van Tassell
  Cc: eric van tassell, Tom Lendacky, Singh, Brijesh, Grimm, Jon, kvm,
	bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86

On Sun, Aug 02, 2020 at 03:53:54PM -0500, Eric van Tassell wrote:
> 
> On 7/31/20 3:25 PM, Sean Christopherson wrote:
> >On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
> >>Improve SEV guest startup time from O(n) to a constant by deferring
> >>guest page pinning until the pages are used to satisfy nested page faults.
> >>
> >>Implement the code to do the pinning (sev_get_page) and the notifier
> >>sev_set_spte_notify().
> >>
> >>Track the pinned pages with xarray so they can be released during guest
> >>termination.
> >
> >I like that SEV is trying to be a better citizen, but this is trading one
> >hack for another.
> >
> >   - KVM goes through a lot of effort to ensure page faults don't need to
> >     allocate memory, and this throws all that effort out the window.
> >
> can you elaborate on that?

mmu_topup_memory_caches() is called from the page fault handlers before
acquiring mmu_lock to pre-allocate shadow pages, PTE list descriptors, GFN
arrays, etc... that may be needed to handle the page fault.  This allows
using standard GFP flags for the allocation and obviates the need for error
handling in the consumers.
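
(Very roughly, the fault paths follow this pattern; simplified sketch, error
paths trimmed:)

	r = mmu_topup_memory_caches(vcpu);	/* may sleep, standard GFP flags */
	if (r)
		return r;

	spin_lock(&vcpu->kvm->mmu_lock);
	/*
	 * Map the fault, consuming only the objects pre-allocated above, so
	 * there are no allocations and no allocation failures under the lock.
	 */
	spin_unlock(&vcpu->kvm->mmu_lock);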

> >>+int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
> >>+			int level, bool mmio, u64 *spte)
> >>+{
> >>+	int rc;
> >>+
> >>+	if (!sev_guest(vcpu->kvm))
> >>+		return 0;
> >>+
> >>+	/* MMIO page contains the unencrypted data, no need to lock this page */
> >>+	if (mmio)
> >
> >Rather than make this a generic set_spte() notify hook, I think it makes
> >more sense to specifying have it be a "pin_spte" style hook.  That way the
> >caller can skip mmio PFNs as well as flows that can't possibly be relevant
> >to SEV, e.g. the sync_page() flow.
> Not sure i understand. We do ignore mmio here.

I'm saying we can have the caller, i.e. set_spte(), skip the hook for MMIO.
If the kvm_x86_ops hook is specifically designed to allow pinning pages (and
to support related features), then set_spte() can filter out MMIO PFNs.  It's
a minor detail, but it's one less thing to have to check in the vendor code.
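
(Concretely, something like the sketch below in set_spte(); "pin_spte" is
just a placeholder name for the narrower hook:)

	if (!kvm_is_mmio_pfn(pfn) && kvm_x86_ops.pin_spte) {
		ret = kvm_x86_ops.pin_spte(vcpu, gfn, pfn, level);
		if (ret)
			return ret;
	}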

> Can you detail a bit more what you see as problematic with the sync_page() flow?

There's no problem per se.  But, assuming TDP/NPT is required to enable SEV,
then sync_page() simply isn't relevant for pinning a host PFN as pages can't
become unsynchronized when TDP is enabled, e.g. ->sync_page() is a nop when
TDP is enabled.  If the hook is completely generic, then we need to think
about how it interacts with changing existing SPTEs via ->sync_page().  Giving
the hook more narrowly focused semantics means we can again filter out that
path and not have to worry about testing it.

The above doesn't hold true for nested TDP/NPT, but AIUI SEV doesn't yet
support nested virtualization, i.e. it's a future problem.


* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-08-03 16:27       ` Sean Christopherson
@ 2020-08-19 16:03         ` Eric van Tassell
  2020-08-19 16:05           ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric van Tassell @ 2020-08-19 16:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: eric van tassell, Tom Lendacky, Singh, Brijesh, Grimm, Jon, kvm,
	bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86



On 8/3/20 11:27 AM, Sean Christopherson wrote:
> On Sun, Aug 02, 2020 at 03:53:54PM -0500, Eric van Tassell wrote:
>>
>> On 7/31/20 3:25 PM, Sean Christopherson wrote:
>>> On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
>>>> Improve SEV guest startup time from O(n) to a constant by deferring
>>>> guest page pinning until the pages are used to satisfy nested page faults.
>>>>
>>>> Implement the code to do the pinning (sev_get_page) and the notifier
>>>> sev_set_spte_notify().
>>>>
>>>> Track the pinned pages with xarray so they can be released during guest
>>>> termination.
>>>
>>> I like that SEV is trying to be a better citizen, but this is trading one
>>> hack for another.
>>>
>>>    - KVM goes through a lot of effort to ensure page faults don't need to
>>>      allocate memory, and this throws all that effort out the window.
>>>
>> can you elaborate on that?
> 
> mmu_topup_memory_caches() is called from the page fault handlers before
> acquiring mmu_lock to pre-allocate shadow pages, PTE list descriptors, GFN
> arrays, etc... that may be needed to handle the page fault.  This allows
> using standard GFP flags for the allocation and obviates the need for error
> handling in the consumers.
> 

I see what you meant. The issue that causes us to use this approach is 
that we need to be able to unpin the pages when the VM exits.

>>>> +int sev_set_spte_notify(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn,
>>>> +			int level, bool mmio, u64 *spte)
>>>> +{
>>>> +	int rc;
>>>> +
>>>> +	if (!sev_guest(vcpu->kvm))
>>>> +		return 0;
>>>> +
>>>> +	/* MMIO page contains the unencrypted data, no need to lock this page */
>>>> +	if (mmio)
>>>
>>> Rather than make this a generic set_spte() notify hook, I think it makes
>>> more sense to specifying have it be a "pin_spte" style hook.  That way the
>>> caller can skip mmio PFNs as well as flows that can't possibly be relevant
>>> to SEV, e.g. the sync_page() flow.
>> Not sure i understand. We do ignore mmio here.
> 
> I'm saying we can have the caller, i.e. set_spte(), skip the hook for MMIO.
> If the kvm_x86_ops hook is specifically designed to allow pinning pages (and
> to support related features), then set_spte() can filter out MMIO PFNs.  It's
> a minor detail, but it's one less thing to have to check in the vendor code.
> 

The check is moved to the caller.

>> Can you detail a bit more what you see as problematic with the sync_page() flow?
> 
> There's no problem per se.  But, assuming TDP/NPT is required to enable SEV,
> then sync_page() simply isn't relevant for pinning a host PFN as pages can't
> become unsynchronized when TDP is enabled, e.g. ->sync_page() is a nop when
> TDP is enabled.  If the hook is completely generic, then we need to think
> about how it interacts with changing existing SPTEs via ->sync_page().  Giving
> the hook more narrowly focused semantics means we can again filter out that
> path and not have to worry about testing it.
> 
> The above doesn't hold true for nested TDP/NPT, but AIUI SEV doesn't yet
> support nested virtualization, i.e. it's a future problem.
> 


* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-08-19 16:03         ` Eric van Tassell
@ 2020-08-19 16:05           ` Sean Christopherson
  2020-08-20 17:05             ` Eric van Tassell
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2020-08-19 16:05 UTC (permalink / raw)
  To: Eric van Tassell
  Cc: eric van tassell, Tom Lendacky, Singh, Brijesh, Grimm, Jon, kvm,
	bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86

On Wed, Aug 19, 2020 at 11:03:48AM -0500, Eric van Tassell wrote:
> 
> 
> On 8/3/20 11:27 AM, Sean Christopherson wrote:
> >On Sun, Aug 02, 2020 at 03:53:54PM -0500, Eric van Tassell wrote:
> >>
> >>On 7/31/20 3:25 PM, Sean Christopherson wrote:
> >>>On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
> >>>>Improve SEV guest startup time from O(n) to a constant by deferring
> >>>>guest page pinning until the pages are used to satisfy nested page faults.
> >>>>
> >>>>Implement the code to do the pinning (sev_get_page) and the notifier
> >>>>sev_set_spte_notify().
> >>>>
> >>>>Track the pinned pages with xarray so they can be released during guest
> >>>>termination.
> >>>
> >>>I like that SEV is trying to be a better citizen, but this is trading one
> >>>hack for another.
> >>>
> >>>   - KVM goes through a lot of effort to ensure page faults don't need to
> >>>     allocate memory, and this throws all that effort out the window.
> >>>
> >>can you elaborate on that?
> >
> >mmu_topup_memory_caches() is called from the page fault handlers before
> >acquiring mmu_lock to pre-allocate shadow pages, PTE list descriptors, GFN
> >arrays, etc... that may be needed to handle the page fault.  This allows
> >using standard GFP flags for the allocation and obviates the need for error
> >handling in the consumers.
> >
> 
> I see what you meant. The issue that causes us to use this approach is that
> we need to be able to unpin the pages when the VM exits.

Yes, but using a software available flag in the SPTE to track pinned pages
should be very doable. 
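
(Sketch of the idea; the bit shown is illustrative only and would need to be
one KVM doesn't already use:)

	/* hypothetical software-available SPTE bit marking a pinned page */
	#define SPTE_PINNED	BIT_ULL(60)

	/* when installing the SPTE for an SEV guest: */
	get_page(pfn_to_page(pfn));
	spte |= SPTE_PINNED;

	/* when zapping the SPTE (or tearing down the MMU): */
	if (is_shadow_present_pte(spte) && (spte & SPTE_PINNED))
		put_page(pfn_to_page(spte_to_pfn(spte)));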


* Re: [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page()
  2020-08-02 23:55     ` Eric van Tassell
@ 2020-08-19 16:20       ` Eric van Tassell
  0 siblings, 0 replies; 17+ messages in thread
From: Eric van Tassell @ 2020-08-19 16:20 UTC (permalink / raw)
  To: Sean Christopherson, eric van tassell
  Cc: kvm, bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86



On 8/2/20 6:55 PM, Eric van Tassell wrote:
> 
> 
> On 7/31/20 3:40 PM, Sean Christopherson wrote:
>> On Fri, Jul 24, 2020 at 06:54:47PM -0500, eric van tassell wrote:
>>> Add 2 small infrastructure functions here which to enable pinning the 
>>> SEV
>>> guest pages used for sev_launch_update_data() using sev_get_page().
>>>
>>> Pin the memory for the data being passed to launch_update_data() 
>>> because it
>>> gets encrypted before the guest is first run and must not be moved which
>>> would corrupt it.
>>>
>>> Signed-off-by: eric van tassell <Eric.VanTassell@amd.com>
>>> ---
>>>   arch/x86/kvm/svm/sev.c | 48 ++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 48 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>> index 040ae4aa7c5a..e0eed9a20a51 100644
>>> --- a/arch/x86/kvm/svm/sev.c
>>> +++ b/arch/x86/kvm/svm/sev.c
>>> @@ -453,6 +453,37 @@ static int sev_get_page(struct kvm *kvm, gfn_t 
>>> gfn, kvm_pfn_t pfn)
>>>       return 0;
>>>   }
>>> +static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm,
>>> +                          unsigned long hva)
>>> +{
>>> +    struct kvm_memslots *slots = kvm_memslots(kvm);
>>> +    struct kvm_memory_slot *memslot;
>>> +
>>> +    kvm_for_each_memslot(memslot, slots) {
>>> +        if (hva >= memslot->userspace_addr &&
>>> +            hva < memslot->userspace_addr +
>>> +                  (memslot->npages << PAGE_SHIFT))
>>> +            return memslot;
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static bool hva_to_gfn(struct kvm *kvm, unsigned long hva, gfn_t *gfn)
>>> +{
>>> +    struct kvm_memory_slot *memslot;
>>> +    gpa_t gpa_offset;
>>> +
>>> +    memslot = hva_to_memslot(kvm, hva);
>>> +    if (!memslot)
>>> +        return false;
>>> +
>>> +    gpa_offset = hva - memslot->userspace_addr;
>>> +    *gfn = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset) >> 
>>> PAGE_SHIFT;
>>> +
>>> +    return true;
>>> +}
>>> +
>>>   static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>   {
>>>       unsigned long vaddr, vaddr_end, next_vaddr, npages, pages, size, i;
>>> @@ -483,6 +514,23 @@ static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
>>>           goto e_free;
>>>       }
>>> +    /*
>>> +     * Increment the page ref count so that the pages do not get migrated
>>> +     * or moved after we are done with LAUNCH_UPDATE_DATA.
>>> +     */
>>> +    for (i = 0; i < npages; i++) {
>>> +        gfn_t gfn;
>>> +
>>> +        if (!hva_to_gfn(kvm, (vaddr + (i * PAGE_SIZE)) & PAGE_MASK, &gfn)) {
>>
>> This needs to hold kvm->srcu to block changes to memslots while looking
>> up the hva->gpa translation.
> I'll look into this.

fixed
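
A sketch of the shape of the fix, reusing the locals (vaddr, npages,
inpages, i, ret) already present in sev_launch_update_data(); the exact
code in the next revision may differ:

	int srcu_idx;

	/* Memslots cannot change while kvm->srcu is held. */
	srcu_idx = srcu_read_lock(&kvm->srcu);
	for (i = 0; i < npages; i++) {
		gfn_t gfn;

		if (!hva_to_gfn(kvm, (vaddr + (i * PAGE_SIZE)) & PAGE_MASK,
				&gfn)) {
			ret = -EFAULT;
			break;
		}

		ret = sev_get_page(kvm, gfn, page_to_pfn(inpages[i]));
		if (ret)
			break;
	}
	srcu_read_unlock(&kvm->srcu, srcu_idx);
	if (ret)
		goto e_unpin;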

>>
>>> +            ret = -EFAULT;
>>> +            goto e_unpin;
>>> +        }
>>> +
>>> +        ret = sev_get_page(kvm, gfn, page_to_pfn(inpages[i]));
>>
>> Rather than dump everything into an xarray, KVM can instead pin the pages
>> by faulting them into its MMU.  By pinning pages in the MMU proper, KVM
>> can use software available bits in the SPTEs to track which pages are
>> pinned, can assert/WARN on unexpected behavior with respect to pinned
>> pages, and can drop/unpin pages as soon as they are no longer reachable
>> from KVM, e.g. when the mm_struct dies or the associated memslot is
>> removed.
>>
>> Leveraging the MMU would also make this extensible to non-SEV features,
>> e.g. it can be shared by VMX if VMX adds a feature that needs similar
>> hooks in the MMU.  Shoving the tracking in SEV means the core logic would
>> need to be duplicated for other features.
>>
>> The big caveat is that funneling this through the MMU requires a vCPU[*],
>> i.e. is only viable if userspace has already created at least one vCPU.
>> For QEMU, this is guaranteed.  I don't know about other VMMs.
>>
>> If there are VMMs that support SEV and don't create vCPUs before
>> encrypting guest memory, one option would be to automatically go down the
>> optimized route iff at least one vCPU has been created.  In other words,
>> don't break old VMMs, but don't carry more hacks to make them faster
>> either.
>>
>> It just so happens that I have some code that sort of implements the
>> above.  I reworked it to mesh with SEV and will post it as an RFC.  It's
>> far from a tested-and-ready-to-roll implementation, but I think it's
>> fleshed out enough to start a conversation.
>>
>> [*] This isn't a hard requirement, i.e. KVM could be reworked to provide
>>     a common MMU for non-nested TDP, but that's a much bigger effort.
>>
> I will think about this. (I'm out of the office Monday and Tuesday.)

Brijesh invested time and could not get this approach to meet SEV's 
needs as he reported in a previous email.

>>> +        if (ret)
>>> +            goto e_unpin;
>>> +    }
>>> +
>>>       /*
>>>        * The LAUNCH_UPDATE command will perform in-place encryption of the
>>>        * memory content (i.e it will write the same memory region with C=1).
>>> -- 
>>> 2.17.1
>>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-08-19 16:05           ` Sean Christopherson
@ 2020-08-20 17:05             ` Eric van Tassell
  2020-08-20 23:59               ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric van Tassell @ 2020-08-20 17:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: eric van tassell, Tom Lendacky, Singh, Brijesh, Grimm, Jon, kvm,
	bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86



On 8/19/20 11:05 AM, Sean Christopherson wrote:
> On Wed, Aug 19, 2020 at 11:03:48AM -0500, Eric van Tassell wrote:
>>
>>
>> On 8/3/20 11:27 AM, Sean Christopherson wrote:
>>> On Sun, Aug 02, 2020 at 03:53:54PM -0500, Eric van Tassell wrote:
>>>>
>>>> On 7/31/20 3:25 PM, Sean Christopherson wrote:
>>>>> On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
>>>>>> Improve SEV guest startup time from O(n) to a constant by deferring
>>>>>> guest page pinning until the pages are used to satisfy nested page faults.
>>>>>>
>>>>>> Implement the code to do the pinning (sev_get_page) and the notifier
>>>>>> sev_set_spte_notify().
>>>>>>
>>>>>> Track the pinned pages with xarray so they can be released during guest
>>>>>> termination.
>>>>>
>>>>> I like that SEV is trying to be a better citizen, but this is trading one
>>>>> hack for another.
>>>>>
>>>>>    - KVM goes through a lot of effort to ensure page faults don't need to
>>>>>      allocate memory, and this throws all that effort out the window.
>>>>>
>>>> can you elaborate on that?
>>>
>>> mmu_topup_memory_caches() is called from the page fault handlers before
>>> acquiring mmu_lock to pre-allocate shadow pages, PTE list descriptors, GFN
>>> arrays, etc... that may be needed to handle the page fault.  This allows
>>> using standard GFP flags for the allocation and obviates the need for error
>>> handling in the consumers.
>>>
>>
>> I see what you meant. The issue that causes us to use this approach is that
>> we need to be able to unpin the pages when the VM exits.
> 
> Yes, but using a software available flag in the SPTE to track pinned pages
> should be very doable.
> 

The issue, as I understand it, is that when SPTEs get zapped/unzapped,
the flags are lost, so we'd need some mechanism to cache the
pfn <-> spte mapping before zapping.
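
For context, the xarray in the current series keeps its own gfn -> page
map, which is what lets us drop the pins at teardown regardless of SPTE
state.  Roughly (the 'pinned_pages' field name is illustrative, not the
exact name in the patches):

static int sev_get_page_sketch(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
{
	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
	struct page *page = pfn_to_page(pfn);
	int ret;

	get_page(page);
	ret = xa_insert(&sev->pinned_pages, gfn, page, GFP_KERNEL);
	if (ret == -EBUSY) {
		/* gfn already pinned, drop the extra reference. */
		put_page(page);
		return 0;
	}
	if (ret) {
		put_page(page);
		return ret;
	}
	return 0;
}

static void sev_unpin_pages_sketch(struct kvm_sev_info *sev)
{
	struct page *page;
	unsigned long gfn;

	/* Called on guest termination: release every pinned page. */
	xa_for_each(&sev->pinned_pages, gfn, page)
		put_page(page);
	xa_destroy(&sev->pinned_pages);
}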

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-08-20 17:05             ` Eric van Tassell
@ 2020-08-20 23:59               ` Sean Christopherson
  2020-08-21  0:36                 ` Eric van Tassell
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2020-08-20 23:59 UTC (permalink / raw)
  To: Eric van Tassell
  Cc: eric van tassell, Tom Lendacky, Singh, Brijesh, Grimm, Jon, kvm,
	bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86

On Thu, Aug 20, 2020 at 12:05:00PM -0500, Eric van Tassell wrote:
> 
> 
> On 8/19/20 11:05 AM, Sean Christopherson wrote:
> > On Wed, Aug 19, 2020 at 11:03:48AM -0500, Eric van Tassell wrote:
> > > 
> > > 
> > > On 8/3/20 11:27 AM, Sean Christopherson wrote:
> > > > On Sun, Aug 02, 2020 at 03:53:54PM -0500, Eric van Tassell wrote:
> > > > > 
> > > > > On 7/31/20 3:25 PM, Sean Christopherson wrote:
> > > > > > On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
> > > > > > > Improve SEV guest startup time from O(n) to a constant by deferring
> > > > > > > guest page pinning until the pages are used to satisfy nested page faults.
> > > > > > > 
> > > > > > > Implement the code to do the pinning (sev_get_page) and the notifier
> > > > > > > sev_set_spte_notify().
> > > > > > > 
> > > > > > > Track the pinned pages with xarray so they can be released during guest
> > > > > > > termination.
> > > > > > 
> > > > > > I like that SEV is trying to be a better citizen, but this is trading one
> > > > > > hack for another.
> > > > > > 
> > > > > >    - KVM goes through a lot of effort to ensure page faults don't need to
> > > > > >      allocate memory, and this throws all that effort out the window.
> > > > > > 
> > > > > can you elaborate on that?
> > > > 
> > > > mmu_topup_memory_caches() is called from the page fault handlers before
> > > > acquiring mmu_lock to pre-allocate shadow pages, PTE list descriptors, GFN
> > > > arrays, etc... that may be needed to handle the page fault.  This allows
> > > > using standard GFP flags for the allocation and obviates the need for error
> > > > handling in the consumers.
> > > > 
> > > 
> > > I see what you meant. The issue that causes us to use this approach is that
> > > we need to be able to unpin the pages when the VM exits.
> > 
> > Yes, but using a software available flag in the SPTE to track pinned pages
> > should be very doable.
> > 
> 
> The issue, as I understand it, is that when spte(s) get zapped/unzapped, the
> flags are lost so we'd have to have some mechanism to, before zapping, cache
> the pfn <-> spte mapping

The issue is that code doesn't exist :-)  

The idea is to leave the pfn in the spte itself when a pinned spte is zapped,
and use software available bits in the spte to indicate the page is pinned
and has been zapped.  When the VM is destroyed, remove all sptes and drop the page
reference for pinned pages.
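
Roughly, as an untested sketch (the bit position and helper names are made
up, and the real code would need to audit which software-available bits
are actually free):

/* Hypothetical software-available bit marking a pinned page. */
#define SPTE_PINNED	BIT_ULL(60)

/* Zapping a pinned SPTE: drop the present bit but keep the PFN. */
static u64 zap_pinned_spte(u64 spte)
{
	return (spte & PT64_BASE_ADDR_MASK) | SPTE_PINNED;
}

/* VM teardown: any SPTE still carrying the pinned bit owns a page ref. */
static void drop_pinned_spte(u64 spte)
{
	if (!(spte & SPTE_PINNED))
		return;

	put_page(pfn_to_page((spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT));
}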

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-08-20 23:59               ` Sean Christopherson
@ 2020-08-21  0:36                 ` Eric van Tassell
  2020-08-21 18:16                   ` Eric van Tassell
  0 siblings, 1 reply; 17+ messages in thread
From: Eric van Tassell @ 2020-08-21  0:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: eric van tassell, Tom Lendacky, Singh, Brijesh, Grimm, Jon, kvm,
	bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86



On 8/20/20 6:59 PM, Sean Christopherson wrote:
> On Thu, Aug 20, 2020 at 12:05:00PM -0500, Eric van Tassell wrote:
>>
>>
>> On 8/19/20 11:05 AM, Sean Christopherson wrote:
>>> On Wed, Aug 19, 2020 at 11:03:48AM -0500, Eric van Tassell wrote:
>>>>
>>>>
>>>> On 8/3/20 11:27 AM, Sean Christopherson wrote:
>>>>> On Sun, Aug 02, 2020 at 03:53:54PM -0500, Eric van Tassell wrote:
>>>>>>
>>>>>> On 7/31/20 3:25 PM, Sean Christopherson wrote:
>>>>>>> On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
>>>>>>>> Improve SEV guest startup time from O(n) to a constant by deferring
>>>>>>>> guest page pinning until the pages are used to satisfy nested page faults.
>>>>>>>>
>>>>>>>> Implement the code to do the pinning (sev_get_page) and the notifier
>>>>>>>> sev_set_spte_notify().
>>>>>>>>
>>>>>>>> Track the pinned pages with xarray so they can be released during guest
>>>>>>>> termination.
>>>>>>>
>>>>>>> I like that SEV is trying to be a better citizen, but this is trading one
>>>>>>> hack for another.
>>>>>>>
>>>>>>>     - KVM goes through a lot of effort to ensure page faults don't need to
>>>>>>>       allocate memory, and this throws all that effort out the window.
>>>>>>>
>>>>>> can you elaborate on that?
>>>>>
>>>>> mmu_topup_memory_caches() is called from the page fault handlers before
>>>>> acquiring mmu_lock to pre-allocate shadow pages, PTE list descriptors, GFN
>>>>> arrays, etc... that may be needed to handle the page fault.  This allows
>>>>> using standard GFP flags for the allocation and obviates the need for error
>>>>> handling in the consumers.
>>>>>
>>>>
>>>> I see what you meant. The issue that causes us to use this approach is that
>>>> we need to be able to unpin the pages when the VM exits.
>>>
>>> Yes, but using a software available flag in the SPTE to track pinned pages
>>> should be very doable.
>>>
>>
>> The issue, as I understand it, is that when spte(s) get zapped/unzapped, the
>> flags are lost so we'd have to have some mechanism to, before zapping, cache
>> the pfn <-> spte mapping
> 
> The issue is that code doesn't exist :-)

let me look into that and discuss it in our team meeting

> 
> The idea is to leave the pfn in the spte itself when a pinned spte is zapped,
> and use software available bits in the spte to indicate the page is pinned
> and has been zapped.  When the VM is destroyed, remove all sptes and drop the page
> reference for pinned pages.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/4] KVM:SVM: Introduce set_spte_notify support
  2020-08-21  0:36                 ` Eric van Tassell
@ 2020-08-21 18:16                   ` Eric van Tassell
  0 siblings, 0 replies; 17+ messages in thread
From: Eric van Tassell @ 2020-08-21 18:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: eric van tassell, Tom Lendacky, Singh, Brijesh, Grimm, Jon, kvm,
	bp, hpa, mingo, jmattson, joro, pbonzini, tglx, vkuznets,
	wanpengli, x86



On 8/20/20 7:36 PM, Eric van Tassell wrote:
> 
> 
> On 8/20/20 6:59 PM, Sean Christopherson wrote:
>> On Thu, Aug 20, 2020 at 12:05:00PM -0500, Eric van Tassell wrote:
>>>
>>>
>>> On 8/19/20 11:05 AM, Sean Christopherson wrote:
>>>> On Wed, Aug 19, 2020 at 11:03:48AM -0500, Eric van Tassell wrote:
>>>>>
>>>>>
>>>>> On 8/3/20 11:27 AM, Sean Christopherson wrote:
>>>>>> On Sun, Aug 02, 2020 at 03:53:54PM -0500, Eric van Tassell wrote:
>>>>>>>
>>>>>>> On 7/31/20 3:25 PM, Sean Christopherson wrote:
>>>>>>>> On Fri, Jul 24, 2020 at 06:54:46PM -0500, eric van tassell wrote:
>>>>>>>>> Improve SEV guest startup time from O(n) to a constant by 
>>>>>>>>> deferring
>>>>>>>>> guest page pinning until the pages are used to satisfy nested 
>>>>>>>>> page faults.
>>>>>>>>>
>>>>>>>>> Implement the code to do the pinning (sev_get_page) and the 
>>>>>>>>> notifier
>>>>>>>>> sev_set_spte_notify().
>>>>>>>>>
>>>>>>>>> Track the pinned pages with xarray so they can be released 
>>>>>>>>> during guest
>>>>>>>>> termination.
>>>>>>>>
>>>>>>>> I like that SEV is trying to be a better citizen, but this is 
>>>>>>>> trading one
>>>>>>>> hack for another.
>>>>>>>>
>>>>>>>>     - KVM goes through a lot of effort to ensure page faults 
>>>>>>>> don't need to
>>>>>>>>       allocate memory, and this throws all that effort out the 
>>>>>>>> window.
>>>>>>>>
>>>>>>> can you elaborate on that?
>>>>>>
>>>>>> mmu_topup_memory_caches() is called from the page fault handlers 
>>>>>> before
>>>>>> acquiring mmu_lock to pre-allocate shadow pages, PTE list 
>>>>>> descriptors, GFN
>>>>>> arrays, etc... that may be needed to handle the page fault.  This 
>>>>>> allows
>>>>>> using standard GFP flags for the allocation and obviates the need 
>>>>>> for error
>>>>>> handling in the consumers.
>>>>>>
>>>>>
>>>>> I see what you meant. The issue that causes us to use this approach 
>>>>> is that
>>>>> we need to be able to unpin the pages when the VM exits.
>>>>
>>>> Yes, but using a software available flag in the SPTE to track pinned 
>>>> pages
>>>> should be very doable.
>>>>
>>>
>>> The issue, as I understand it, is that when spte(s) get 
>>> zapped/unzapped, the
>>> flags are lost so we'd have to have some mechanism to, before 
>>> zapping, cache
>>> the pfn <-> spte mapping
>>
>> The issue is that code doesn't exist :-)

looking at the suggested approach.

> 
> let me look into that and discuss it in our team meeting
> 
>>
>> The idea is to leave the pfn in the spte itself when a pinned spte is 
>> zapped,
>> and use software available bits in the spte to indicate the page is 
>> pinned
>> and has been zapped.  When the VM is destroyed, remove all sptes and drop the
>> page
>> reference for pinned pages.
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-08-21 18:17 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-24 23:54 [Patch 0/4] Defer page pinning for SEV guests until guest pages touched eric van tassell
2020-07-24 23:54 ` [Patch 1/4] KVM:MMU: Introduce the set_spte_notify() callback eric van tassell
2020-07-24 23:54 ` [Patch 2/4] KVM:SVM: Introduce set_spte_notify support eric van tassell
2020-07-31 20:25   ` Sean Christopherson
2020-08-02 20:53     ` Eric van Tassell
2020-08-03 16:27       ` Sean Christopherson
2020-08-19 16:03         ` Eric van Tassell
2020-08-19 16:05           ` Sean Christopherson
2020-08-20 17:05             ` Eric van Tassell
2020-08-20 23:59               ` Sean Christopherson
2020-08-21  0:36                 ` Eric van Tassell
2020-08-21 18:16                   ` Eric van Tassell
2020-07-24 23:54 ` [Patch 3/4] KVM:SVM: Pin sev_launch_update_data() pages via sev_get_page() eric van tassell
2020-07-31 20:40   ` Sean Christopherson
2020-08-02 23:55     ` Eric van Tassell
2020-08-19 16:20       ` Eric van Tassell
2020-07-24 23:54 ` [Patch 4/4] KVM:SVM: Remove struct enc_region and associated pinned page tracking eric van tassell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).