* [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
@ 2020-09-01 11:52 yulei.kernel
  2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:52 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang

From: Yulei Zhang <yulei.kernel@gmail.com>

Currently KVM memory virtualization relies on mmu_lock to
synchronize memory mapping updates, which effectively serializes
the vCPUs and slows down execution. This is most visible right
after migration, when the substantial memory mapping work causes a
noticeable performance drop, and it gets worse as the guest is
configured with more vCPUs and more memory.
  
The idea presented in this patch set is to mitigate the issue with
a pre-constructed memory mapping table. We pin the guest memory up
front and build a global memory mapping table that follows the
guest memslot changes, then apply it to CR3, so that once the
guest is up all vCPUs can update memory simultaneously without
taking page-fault exceptions; that is where the performance
improvement is expected to come from.
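
To illustrate the idea outside of kernel code, the contrast between
demand faulting and a pre-constructed table can be sketched in plain
userspace C; the structure and function names below are invented for
the sketch and are not part of the patches.

#include <stdlib.h>

#define SLOT_PAGES    1024
#define TOY_PAGE_SIZE 4096

/* Toy "mapping table": one backing page per guest page frame in a slot. */
struct toy_slot {
	char *map[SLOT_PAGES];
};

/* Lazy mode: allocate (i.e. "fault in") the backing page on first access. */
static char *access_lazy(struct toy_slot *s, unsigned long gfn)
{
	if (!s->map[gfn])                       /* simulated page fault */
		s->map[gfn] = calloc(1, TOY_PAGE_SIZE);
	return s->map[gfn];
}

/* Direct build mode: populate the whole table when the slot is created,
 * so later accesses never go through the fault path (cf. patch 2/9). */
static void populate_slot(struct toy_slot *s)
{
	unsigned long gfn;

	for (gfn = 0; gfn < SLOT_PAGES; gfn++)
		if (!s->map[gfn])
			s->map[gfn] = calloc(1, TOY_PAGE_SIZE);
}

int main(void)
{
	struct toy_slot slot = { { 0 } };
	char *p;

	populate_slot(&slot);
	p = access_lazy(&slot, 42);             /* already mapped, no "fault" */
	if (p)
		p[0] = 1;
	return 0;
}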

We used a memory dirty pattern workload to test the initial patch
set and got positive results even with huge pages enabled. For
example, with a guest configured with 32 vCPUs and 64G of memory,
and all vCPUs dirtying the entire memory region concurrently, the
patch set eliminates the mmu_lock overhead and the job finishes
about 50% faster in 2M/1G huge page mode.
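
For reference, the dirty pattern workload is conceptually just one
writer thread per vCPU touching every page of its share of a large
buffer. A minimal userspace sketch is below; the thread count and
region size are placeholders (the real test dirtied the full 64G of
guest memory), not the actual test harness.

#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 32                     /* one writer per vCPU in the test */
#define REGION   (1UL << 28)            /* 256M per thread, scaled down */
#define PG_SIZE  4096UL

static void *dirty_region(void *arg)
{
	char *p = arg;
	unsigned long off;

	/* touch every page so the hypervisor has to map (and log) it */
	for (off = 0; off < REGION; off += PG_SIZE)
		p[off] = 1;
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	char *buf = malloc((unsigned long)NTHREADS * REGION);
	int i;

	if (!buf)
		return 1;
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, dirty_region,
			       buf + (unsigned long)i * REGION);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	free(buf);
	return 0;
}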

We have only validated this feature on the Intel x86 platform. As
Ben pointed out in RFC V1, SMM is disabled for now to limit
resource consumption, and the mmu notifier is dropped because the
memory is pinned in this mode.
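
Patch 8/9 exposes the mode as a read-only kvm module parameter
(global_tdp), so it has to be set at module load time and can only
be inspected afterwards. A small sketch for checking it from
userspace, assuming the parameter shows up at the usual
/sys/module/kvm/parameters/ location:

#include <stdio.h>

int main(void)
{
	int val = 'N';
	/* module_param_named(global_tdp, ..., S_IRUGO) => readable via sysfs */
	FILE *f = fopen("/sys/module/kvm/parameters/global_tdp", "r");

	if (f) {
		val = fgetc(f);
		fclose(f);
	}
	printf("direct build EPT (global_tdp): %s\n",
	       (val == 'Y' || val == '1') ? "enabled" : "disabled");
	return 0;
}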

V1->V2:
* Rebase the code to kernel version 5.9.0-rc1.

Yulei Zhang (9):
  Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
    support
  Introduce page table population function for direct build EPT feature
  Introduce page table remove function for direct build EPT feature
  Add release function for direct build ept when guest VM exit
  Modify the page fault path to meet the direct build EPT requirement
  Apply the direct build EPT according to the memory slots change
  Add migration support when using direct build EPT
  Introduce kvm module parameter global_tdp to turn on the direct build
    EPT mode
  Handle certain mmu exposed functions properly while turn on direct
    build EPT mode

 arch/mips/kvm/mips.c            |  13 +
 arch/powerpc/kvm/powerpc.c      |  13 +
 arch/s390/kvm/kvm-s390.c        |  13 +
 arch/x86/include/asm/kvm_host.h |  13 +-
 arch/x86/kvm/mmu/mmu.c          | 533 ++++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.c          |   2 +-
 arch/x86/kvm/vmx/vmx.c          |   7 +-
 arch/x86/kvm/x86.c              |  55 ++--
 include/linux/kvm_host.h        |   7 +-
 virt/kvm/kvm_main.c             |  43 ++-
 10 files changed, 639 insertions(+), 60 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
@ 2020-09-01 11:54 ` yulei.kernel
  2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:54 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang

From: Yulei Zhang <yuleixzhang@tencent.com>

Add the field global_root_hpa to save the root pointer of the
pre-constructed (direct build) global EPT, and add a per-vcpu flag
direct_build_tdp to indicate that the global EPT root is in use.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/include/asm/kvm_host.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ab3af7275d8..485b1239ad39 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -788,6 +788,9 @@ struct kvm_vcpu_arch {
 
 	/* AMD MSRC001_0015 Hardware Configuration */
 	u64 msr_hwcr;
+
+	/* vcpu use pre-constructed EPT */
+	bool direct_build_tdp;
 };
 
 struct kvm_lpage_info {
@@ -963,6 +966,8 @@ struct kvm_arch {
 
 	struct kvm_pmu_event_filter *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
+	/* global root hpa for pre-constructed EPT */
+	hpa_t  global_root_hpa;
 };
 
 struct kvm_vm_stat {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 2/9] Introduce page table population function for direct build EPT feature
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
  2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
@ 2020-09-01 11:55 ` yulei.kernel
  2020-09-01 17:33   ` kernel test robot
  2020-09-01 19:04   ` kernel test robot
  2020-09-01 11:55 ` [RFC V2 3/9] Introduce page table remove " yulei.kernel
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:55 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Yulei Zhang

From: Yulei Zhang <yulei.kernel@gmail.com>

The page table population function pins the memory and
pre-constructs the EPT based on the input memory slot
configuration, so that setting up the page table no longer relies
on the page fault path.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/mmu/mmu.c          | 212 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c          |   2 +-
 arch/x86/kvm/vmx/vmx.c          |   7 +-
 include/linux/kvm_host.h        |   4 +-
 virt/kvm/kvm_main.c             |  30 ++++-
 6 files changed, 244 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 485b1239ad39..ab3cbef8c1aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1138,7 +1138,7 @@ struct kvm_x86_ops {
 	int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
-	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+	u64 (*get_mt_mask)(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, unsigned long pgd,
 			     int pgd_level);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e03841f053d..bfe4d2b3e809 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -241,6 +241,11 @@ struct kvm_shadow_walk_iterator {
 		({ spte = mmu_spte_get_lockless(_walker.sptep); 1; });	\
 	     __shadow_walk_next(&(_walker), spte))
 
+#define for_each_direct_build_shadow_entry(_walker, shadow_addr, _addr, level)	\
+	for (__shadow_walk_init(&(_walker), shadow_addr, _addr, level);		\
+	     shadow_walk_okay(&(_walker));					\
+	     shadow_walk_next(&(_walker)))
+
 static struct kmem_cache *pte_list_desc_cache;
 static struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
@@ -2506,13 +2511,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	return sp;
 }
 
+static void __shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
+			       hpa_t shadow_addr, u64 addr, int level)
+{
+	iterator->addr = addr;
+	iterator->shadow_addr = shadow_addr;
+	iterator->level = level;
+	iterator->sptep = NULL;
+}
+
 static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
 					struct kvm_vcpu *vcpu, hpa_t root,
 					u64 addr)
 {
-	iterator->addr = addr;
-	iterator->shadow_addr = root;
-	iterator->level = vcpu->arch.mmu->shadow_root_level;
+	__shadow_walk_init(iterator, root, addr, vcpu->arch.mmu->shadow_root_level);
 
 	if (iterator->level == PT64_ROOT_4LEVEL &&
 	    vcpu->arch.mmu->root_level < PT64_ROOT_4LEVEL &&
@@ -3014,7 +3026,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (level > PG_LEVEL_4K)
 		spte |= PT_PAGE_SIZE_MASK;
 	if (tdp_enabled)
-		spte |= kvm_x86_ops.get_mt_mask(vcpu, gfn,
+		spte |= kvm_x86_ops.get_mt_mask(vcpu->kvm, vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));
 
 	if (host_writable)
@@ -6278,6 +6290,198 @@ int kvm_mmu_module_init(void)
 	return ret;
 }
 
+static int direct_build_tdp_set_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+		    u64 *sptep, unsigned pte_access, int level,
+		    gfn_t gfn, kvm_pfn_t pfn, bool speculative,
+		    bool dirty, bool host_writable)
+{
+	u64 spte = 0;
+	int ret = 0;
+	/*
+	 * For the EPT case, shadow_present_mask is 0 if hardware
+	 * supports exec-only page table entries.  In that case,
+	 * ACC_USER_MASK and shadow_user_mask are used to represent
+	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
+	 */
+	spte |= shadow_present_mask;
+	if (!speculative)
+		spte |= shadow_accessed_mask;
+
+	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
+	    is_nx_huge_page_enabled()) {
+		pte_access &= ~ACC_EXEC_MASK;
+	}
+
+	if (pte_access & ACC_EXEC_MASK)
+		spte |= shadow_x_mask;
+	else
+		spte |= shadow_nx_mask;
+
+	if (pte_access & ACC_USER_MASK)
+		spte |= shadow_user_mask;
+
+	if (level > PG_LEVEL_4K)
+		spte |= PT_PAGE_SIZE_MASK;
+
+	if (tdp_enabled)
+		spte |= kvm_x86_ops.get_mt_mask(kvm, NULL, gfn, kvm_is_mmio_pfn(pfn));
+
+	if (host_writable)
+		spte |= SPTE_HOST_WRITEABLE;
+	else
+		pte_access &= ~ACC_WRITE_MASK;
+
+	spte |= (u64)pfn << PAGE_SHIFT;
+
+	if (pte_access & ACC_WRITE_MASK) {
+
+		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
+
+		if (dirty) {
+			mark_page_dirty_in_slot(slot, gfn);
+			spte |= shadow_dirty_mask;
+		}
+	}
+
+	if (mmu_spte_update(sptep, spte))
+		kvm_flush_remote_tlbs(kvm);
+
+	return ret;
+}
+
+static void __kvm_walk_global_page(struct kvm *kvm, u64 addr, int level)
+{
+	int i;
+	kvm_pfn_t pfn;
+	u64 *sptep = (u64 *)__va(addr);
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+		if (is_shadow_present_pte(sptep[i])) {
+			if (!is_last_spte(sptep[i], level)) {
+				__kvm_walk_global_page(kvm, sptep[i] & PT64_BASE_ADDR_MASK, level - 1);
+			} else {
+				pfn = spte_to_pfn(sptep[i]);
+				mmu_spte_clear_track_bits(&sptep[i]);
+				kvm_release_pfn_clean(pfn);
+			}
+		}
+	}
+	put_page(pfn_to_page(addr >> PAGE_SHIFT));
+}
+
+static int direct_build_tdp_map(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn,
+				kvm_pfn_t pfn, int level)
+{
+	int ret = 0;
+
+	struct kvm_shadow_walk_iterator iterator;
+	kvm_pfn_t old_pfn;
+	u64 spte;
+
+	for_each_direct_build_shadow_entry(iterator, kvm->arch.global_root_hpa,
+				gfn << PAGE_SHIFT, max_tdp_level) {
+		if (iterator.level == level) {
+			break;
+		}
+
+		if (!is_shadow_present_pte(*iterator.sptep)) {
+			struct page *page;
+			page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			if (!page)
+				return 0;
+
+			spte = page_to_phys(page) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+				shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+			mmu_spte_set(iterator.sptep, spte);
+		}
+	}
+	/* if presented pte, release the original pfn  */
+	if (is_shadow_present_pte(*iterator.sptep)) {
+		if (level > PG_LEVEL_4K)
+			__kvm_walk_global_page(kvm, (*iterator.sptep) & PT64_BASE_ADDR_MASK, level - 1);
+		else {
+			old_pfn = spte_to_pfn(*iterator.sptep);
+			mmu_spte_clear_track_bits(iterator.sptep);
+			kvm_release_pfn_clean(old_pfn);
+		}
+	}
+	direct_build_tdp_set_spte(kvm, slot, iterator.sptep, ACC_ALL, level, gfn, pfn, false, true, true);
+
+	return ret;
+}
+
+static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
+{
+	unsigned long page_size;
+	int i, ret = 0;
+
+	page_size = kvm_host_page_size(kvm, NULL, gfn);
+
+	for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+		if (page_size >= KVM_HPAGE_SIZE(i))
+			ret = i;
+		else
+			break;
+	}
+
+	return ret;
+}
+
+int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	int host_level, max_level, level;
+	struct kvm_lpage_info *linfo;
+
+	host_level = host_mapping_level(kvm, gfn);
+	if (host_level != PG_LEVEL_4K) {
+		max_level = min(max_huge_page_level, host_level);
+		for (level = PG_LEVEL_4K; level <= max_level; ++level) {
+			linfo = lpage_info_slot(gfn, slot, level);
+			if (linfo->disallow_lpage)
+				break;
+		}
+		host_level = level - 1;
+	}
+	return host_level;
+}
+
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	gfn_t gfn;
+	kvm_pfn_t pfn;
+	int host_level;
+
+	if (!kvm->arch.global_root_hpa) {
+		struct page *page;
+		WARN_ON(!tdp_enabled);
+		WARN_ON(max_tdp_level != PT64_ROOT_4LEVEL);
+
+		/* init global root hpa */
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return -ENOMEM;
+
+		kvm->arch.global_root_hpa = page_to_phys(page);
+	}
+
+	/* setup page table for the slot */
+	for (gfn = slot->base_gfn;
+		gfn < slot->base_gfn + slot->npages;
+		gfn += KVM_PAGES_PER_HPAGE(host_level)) {
+		pfn = gfn_to_pfn_try_write(slot, gfn);
+		if ((pfn & KVM_PFN_ERR_FAULT) || is_noslot_pfn(pfn))
+			return -ENOMEM;
+
+		host_level = direct_build_mapping_level(kvm, slot, gfn);
+
+		if (host_level > PG_LEVEL_4K)
+			MMU_WARN_ON(gfn & (KVM_PAGES_PER_HPAGE(host_level) - 1));
+		direct_build_tdp_map(kvm, slot, gfn, pfn, host_level);
+	}
+
+	return 0;
+}
+
 /*
  * Calculate mmu pages needed for kvm.
  */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 03dd7bac8034..3b7ee65cd941 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3607,7 +3607,7 @@ static bool svm_has_emulated_msr(u32 index)
 	return true;
 }
 
-static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+static u64 svm_get_mt_mask(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..6f79343ed40e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7106,7 +7106,7 @@ static int __init vmx_check_processor_compat(void)
 	return 0;
 }
 
-static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+static u64 vmx_get_mt_mask(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	u8 cache;
 	u64 ipat = 0;
@@ -7134,12 +7134,15 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 		goto exit;
 	}
 
-	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+	if (!kvm_arch_has_noncoherent_dma(kvm)) {
 		ipat = VMX_EPT_IPAT_BIT;
 		cache = MTRR_TYPE_WRBACK;
 		goto exit;
 	}
 
+	if (!vcpu)
+		vcpu = kvm->vcpus[0];
+
 	if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
 		ipat = VMX_EPT_IPAT_BIT;
 		if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a23076765b4c..8901862ba2a3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -694,6 +694,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change);
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
 /* flush all memory translations */
 void kvm_arch_flush_shadow_all(struct kvm *kvm);
 /* flush memory translations pointing to 'slot' */
@@ -721,6 +722,7 @@ kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
 kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 			       bool atomic, bool *async, bool write_fault,
 			       bool *writable);
+kvm_pfn_t gfn_to_pfn_try_write(struct kvm_memory_slot *slot, gfn_t gfn);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn);
 void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -775,7 +777,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
 struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
+unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 737666db02de..47fc18b05c53 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -143,7 +143,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 
-static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
 
 __visible bool kvm_rebooting;
 EXPORT_SYMBOL_GPL(kvm_rebooting);
@@ -1689,14 +1689,17 @@ bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_is_visible_gfn);
 
-unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
+unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	struct vm_area_struct *vma;
 	unsigned long addr, size;
 
 	size = PAGE_SIZE;
 
-	addr = kvm_vcpu_gfn_to_hva_prot(vcpu, gfn, NULL);
+	if (vcpu)
+		addr = kvm_vcpu_gfn_to_hva_prot(vcpu, gfn, NULL);
+	else
+		addr = gfn_to_hva(kvm, gfn);
 	if (kvm_is_error_hva(addr))
 		return PAGE_SIZE;
 
@@ -1989,6 +1992,25 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
 	return pfn;
 }
 
+/* Map pfn for direct EPT mode, if map failed and it is readonly memslot,
+ * will try to remap it with readonly flag.
+ */
+kvm_pfn_t gfn_to_pfn_try_write(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	kvm_pfn_t pfn;
+	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, !memslot_is_readonly(slot));
+
+	if (kvm_is_error_hva(addr))
+		return KVM_PFN_NOSLOT;
+
+	pfn = hva_to_pfn(addr, false, NULL, true, NULL);
+	if (pfn & KVM_PFN_ERR_FAULT) {
+		if (memslot_is_readonly(slot))
+			pfn = hva_to_pfn(addr, false, NULL, false, NULL);
+	}
+	return pfn;
+}
+
 kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 			       bool atomic, bool *async, bool write_fault,
 			       bool *writable)
@@ -2638,7 +2660,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
-static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
 				    gfn_t gfn)
 {
 	if (memslot && memslot->dirty_bitmap) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 3/9] Introduce page table remove function for direct build EPT feature
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
  2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
  2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
@ 2020-09-01 11:55 ` yulei.kernel
  2020-09-01 11:55 ` [RFC V2 4/9] Add release function for direct build ept when guest VM exit yulei.kernel
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:55 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang

From: Yulei Zhang <yuleixzhang@tencent.com>

The guest modifies the memory slots multiple times while it boots,
so add a page table remove function to free the pre-pinned memory
according to the memory slot changes.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/kvm/mmu/mmu.c | 56 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bfe4d2b3e809..03c5e73b96cb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6482,6 +6482,62 @@ int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *
 	return 0;
 }
 
+static int __kvm_remove_spte(struct kvm *kvm, u64 *addr, gfn_t gfn, int level)
+{
+	int i;
+	int ret = level;
+	bool present = false;
+	kvm_pfn_t pfn;
+	u64 *sptep = (u64 *)__va((*addr) & PT64_BASE_ADDR_MASK);
+	unsigned index = SHADOW_PT_INDEX(gfn << PAGE_SHIFT, level);
+
+	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+		if (is_shadow_present_pte(sptep[i])) {
+			if (i == index) {
+				if (!is_last_spte(sptep[i], level)) {
+					ret = __kvm_remove_spte(kvm, &sptep[i], gfn, level - 1);
+					if (is_shadow_present_pte(sptep[i]))
+						return ret;
+				} else {
+					pfn = spte_to_pfn(sptep[i]);
+					mmu_spte_clear_track_bits(&sptep[i]);
+					kvm_release_pfn_clean(pfn);
+					if (present)
+						return ret;
+				}
+			} else {
+				if (i > index)
+					return ret;
+				else
+					present = true;
+			}
+		}
+	}
+
+	if (!present) {
+		pfn = spte_to_pfn(*addr);
+		mmu_spte_clear_track_bits(addr);
+		kvm_release_pfn_clean(pfn);
+	}
+	return ret;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	gfn_t gfn = slot->base_gfn;
+	int host_level;
+
+	if (!kvm->arch.global_root_hpa)
+		return;
+
+	for (gfn = slot->base_gfn;
+		gfn < slot->base_gfn + slot->npages;
+		gfn += KVM_PAGES_PER_HPAGE(host_level))
+		host_level = __kvm_remove_spte(kvm, &(kvm->arch.global_root_hpa), gfn, PT64_ROOT_4LEVEL);
+
+	kvm_flush_remote_tlbs(kvm);
+}
+
 /*
  * Calculate mmu pages needed for kvm.
  */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 4/9] Add release function for direct build ept when guest VM exit
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
                   ` (2 preceding siblings ...)
  2020-09-01 11:55 ` [RFC V2 3/9] Introduce page table remove " yulei.kernel
@ 2020-09-01 11:55 ` yulei.kernel
  2020-09-01 11:56 ` [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement yulei.kernel
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:55 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Yulei Zhang

From: Yulei Zhang <yulei.kernel@gmail.com>

Release the pre-pinned memory in the direct build EPT when the
guest VM exits.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/kvm/mmu/mmu.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 03c5e73b96cb..f2124f52b286 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4309,8 +4309,11 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
 		     bool skip_mmu_sync)
 {
-	__kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu),
-			  skip_tlb_flush, skip_mmu_sync);
+	if (!vcpu->arch.direct_build_tdp)
+		__kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu),
+				  skip_tlb_flush, skip_mmu_sync);
+	else
+		vcpu->arch.mmu->root_hpa = INVALID_PAGE;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
@@ -5207,10 +5210,14 @@ EXPORT_SYMBOL_GPL(kvm_mmu_load);
 
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
+	if (!vcpu->arch.direct_build_tdp) {
+		kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
+		WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
+		kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
+		WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
+	}
+	vcpu->arch.direct_build_tdp = false;
+	vcpu->arch.mmu->root_hpa = INVALID_PAGE;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unload);
 
@@ -6538,6 +6545,14 @@ void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *s
 	kvm_flush_remote_tlbs(kvm);
 }
 
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+	if (kvm->arch.global_root_hpa)
+		__kvm_walk_global_page(kvm, kvm->arch.global_root_hpa, max_tdp_level);
+
+	return;
+}
+
 /*
  * Calculate mmu pages needed for kvm.
  */
@@ -6564,9 +6579,13 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
 
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_unload(vcpu);
-	free_mmu_pages(&vcpu->arch.root_mmu);
-	free_mmu_pages(&vcpu->arch.guest_mmu);
+	if (vcpu->arch.direct_build_tdp) {
+		vcpu->arch.mmu->root_hpa = INVALID_PAGE;
+	} else {
+		kvm_mmu_unload(vcpu);
+		free_mmu_pages(&vcpu->arch.root_mmu);
+		free_mmu_pages(&vcpu->arch.guest_mmu);
+	}
 	mmu_free_memory_caches(vcpu);
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
                   ` (3 preceding siblings ...)
  2020-09-01 11:55 ` [RFC V2 4/9] Add release function for direct build ept when guest VM exit yulei.kernel
@ 2020-09-01 11:56 ` yulei.kernel
  2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:56 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Yulei Zhang

From: Yulei Zhang <yulei.kernel@gmail.com>

Refine the fast page fault code so that it can be used in either
normal EPT mode or direct build EPT mode.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f2124f52b286..fda6c4196854 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3443,12 +3443,13 @@ static bool page_fault_can_be_fast(u32 error_code)
  * someone else modified the SPTE from its original value.
  */
 static bool
-fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, gpa_t gpa,
 			u64 *sptep, u64 old_spte, u64 new_spte)
 {
 	gfn_t gfn;
 
-	WARN_ON(!sp->role.direct);
+	WARN_ON(!vcpu->arch.direct_build_tdp &&
+		(!sptep_to_sp(sptep)->role.direct));
 
 	/*
 	 * Theoretically we could also set dirty bit (and flush TLB) here in
@@ -3470,7 +3471,8 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		 * The gfn of direct spte is stable since it is
 		 * calculated by sp->gfn.
 		 */
-		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+		gfn = gpa >> PAGE_SHIFT;
 		kvm_vcpu_mark_page_dirty(vcpu, gfn);
 	}
 
@@ -3498,10 +3500,10 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			    u32 error_code)
 {
 	struct kvm_shadow_walk_iterator iterator;
-	struct kvm_mmu_page *sp;
 	bool fault_handled = false;
 	u64 spte = 0ull;
 	uint retry_count = 0;
+	int pte_level = 0;
 
 	if (!page_fault_can_be_fast(error_code))
 		return false;
@@ -3515,8 +3517,15 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			if (!is_shadow_present_pte(spte))
 				break;
 
-		sp = sptep_to_sp(iterator.sptep);
-		if (!is_last_spte(spte, sp->role.level))
+		if (iterator.level < PG_LEVEL_4K)
+			pte_level  = PG_LEVEL_4K;
+		else
+			pte_level = iterator.level;
+
+		WARN_ON(!vcpu->arch.direct_build_tdp &&
+			(pte_level != sptep_to_sp(iterator.sptep)->role.level));
+
+		if (!is_last_spte(spte, pte_level))
 			break;
 
 		/*
@@ -3559,7 +3568,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			 *
 			 * See the comments in kvm_arch_commit_memory_region().
 			 */
-			if (sp->role.level > PG_LEVEL_4K)
+			if (pte_level > PG_LEVEL_4K)
 				break;
 		}
 
@@ -3573,7 +3582,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		 * since the gfn is not stable for indirect shadow page. See
 		 * Documentation/virt/kvm/locking.rst to get more detail.
 		 */
-		fault_handled = fast_pf_fix_direct_spte(vcpu, sp,
+		fault_handled = fast_pf_fix_direct_spte(vcpu, cr2_or_gpa,
 							iterator.sptep, spte,
 							new_spte);
 		if (fault_handled)
@@ -4106,6 +4115,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	if (fast_page_fault(vcpu, gpa, error_code))
 		return RET_PF_RETRY;
 
+	if (vcpu->arch.direct_build_tdp)
+		return RET_PF_EMULATE;
+
 	r = mmu_topup_memory_caches(vcpu, false);
 	if (r)
 		return r;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 6/9] Apply the direct build EPT according to the memory slots change
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
                   ` (4 preceding siblings ...)
  2020-09-01 11:56 ` [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement yulei.kernel
@ 2020-09-01 11:56 ` yulei.kernel
  2020-09-01 22:20   ` kernel test robot
  2020-09-02  7:00   ` kernel test robot
  2020-09-01 11:56 ` [RFC V2 7/9] Add migration support when using direct build EPT yulei.kernel
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:56 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Yulei Zhang

From: Yulei Zhang <yulei.kernel@gmail.com>

Construct the direct build EPT when the guest memory slots change,
and issue an mmu_reload request to update CR3 so that the guest
can use the pre-constructed EPT without taking page faults.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/mips/kvm/mips.c       | 13 +++++++++++++
 arch/powerpc/kvm/powerpc.c | 13 +++++++++++++
 arch/s390/kvm/kvm-s390.c   | 13 +++++++++++++
 arch/x86/kvm/mmu/mmu.c     | 33 ++++++++++++++++++++++++++-------
 include/linux/kvm_host.h   |  3 +++
 virt/kvm/kvm_main.c        | 13 +++++++++++++
 6 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 7de85d2253ff..05d053a53ebf 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -267,6 +267,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	}
 }
 
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	return 0;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+}
+
 static inline void dump_handler(const char *symbol, void *start, void *end)
 {
 	u32 *p;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..c6964cbeb6da 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -715,6 +715,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	kvmppc_core_commit_memory_region(kvm, mem, old, new, change);
 }
 
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	return 0;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+}
+
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6b74b92c1a58..d6f7cf1a30a3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -5021,6 +5021,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	return;
 }
 
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	return 0;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+}
+
 static inline unsigned long nonhyp_mask(int i)
 {
 	unsigned int nonhyp_fai = (sclp.hmfai << i * 2) >> 30;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fda6c4196854..47d2a1c18f36 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5206,13 +5206,20 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
 	int r;
 
-	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
-	if (r)
-		goto out;
-	r = mmu_alloc_roots(vcpu);
-	kvm_mmu_sync_roots(vcpu);
-	if (r)
-		goto out;
+	if (vcpu->kvm->arch.global_root_hpa) {
+		vcpu->arch.direct_build_tdp = true;
+		vcpu->arch.mmu->root_hpa = vcpu->kvm->arch.global_root_hpa;
+	}
+
+	if (!vcpu->arch.direct_build_tdp) {
+		r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
+		if (r)
+			goto out;
+		r = mmu_alloc_roots(vcpu);
+		kvm_mmu_sync_roots(vcpu);
+		if (r)
+			goto out;
+	}
 	kvm_mmu_load_pgd(vcpu);
 	kvm_x86_ops.tlb_flush_current(vcpu);
 out:
@@ -6464,6 +6471,17 @@ int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gf
 	return host_level;
 }
 
+static void kvm_make_direct_build_update(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
 	gfn_t gfn;
@@ -6498,6 +6516,7 @@ int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *
 		direct_build_tdp_map(kvm, slot, gfn, pfn, host_level);
 	}
 
+	kvm_make_direct_build_update(kvm);
 	return 0;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8901862ba2a3..b2aa0daad6dd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -694,6 +694,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change);
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_direct_tdp_release_global_root(struct kvm *kvm);
 void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
 /* flush all memory translations */
 void kvm_arch_flush_shadow_all(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 47fc18b05c53..fd1b419f4eb4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -876,6 +876,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 #endif
 	kvm_arch_destroy_vm(kvm);
 	kvm_destroy_devices(kvm);
+	kvm_direct_tdp_release_global_root(kvm);
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 		kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
 	cleanup_srcu_struct(&kvm->irq_srcu);
@@ -1195,6 +1196,10 @@ static int kvm_set_memslot(struct kvm *kvm,
 		 * in the freshly allocated memslots, not in @old or @new.
 		 */
 		slot = id_to_memslot(slots, old->id);
+		/* Remove pre-constructed page table */
+		if (!as_id)
+			kvm_direct_tdp_remove_page_table(kvm, slot);
+
 		slot->flags |= KVM_MEMSLOT_INVALID;
 
 		/*
@@ -1222,6 +1227,14 @@ static int kvm_set_memslot(struct kvm *kvm,
 	update_memslots(slots, new, change);
 	slots = install_new_memslots(kvm, as_id, slots);
 
+	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
+		if (!as_id) {
+			r = kvm_direct_tdp_populate_page_table(kvm, new);
+			if (r)
+				goto out_slots;
+		}
+	}
+
 	kvm_arch_commit_memory_region(kvm, mem, old, new, change);
 
 	kvfree(slots);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 7/9] Add migration support when using direct build EPT
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
                   ` (5 preceding siblings ...)
  2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
@ 2020-09-01 11:56 ` yulei.kernel
  2020-09-01 11:57 ` [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode yulei.kernel
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:56 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Yulei Zhang

From: Yulei Zhang <yulei.kernel@gmail.com>

Make migration work in direct build EPT mode whether or not PML is
enabled.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/mmu/mmu.c          | 153 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  44 +++++----
 3 files changed, 178 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ab3cbef8c1aa..429a50c89268 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1318,6 +1318,8 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_mmu_slot_direct_build_handle_wp(struct kvm *kvm,
+					 struct kvm_memory_slot *memslot);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 47d2a1c18f36..f03bf8efcefe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -249,6 +249,8 @@ struct kvm_shadow_walk_iterator {
 static struct kmem_cache *pte_list_desc_cache;
 static struct kmem_cache *mmu_page_header_cache;
 static struct percpu_counter kvm_total_used_mmu_pages;
+static int __kvm_write_protect_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+				gfn_t gfn, int level);
 
 static u64 __read_mostly shadow_nx_mask;
 static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
@@ -1644,11 +1646,18 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 				     gfn_t gfn_offset, unsigned long mask)
 {
 	struct kvm_rmap_head *rmap_head;
+	gfn_t gfn;
 
 	while (mask) {
-		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
-					  PG_LEVEL_4K, slot);
-		__rmap_write_protect(kvm, rmap_head, false);
+		if (kvm->arch.global_root_hpa) {
+			gfn = slot->base_gfn + gfn_offset + __ffs(mask);
+
+			__kvm_write_protect_spte(kvm, slot, gfn, PG_LEVEL_4K);
+		} else {
+			rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+						  PG_LEVEL_4K, slot);
+			__rmap_write_protect(kvm, rmap_head, false);
+		}
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -6584,6 +6593,144 @@ void kvm_direct_tdp_release_global_root(struct kvm *kvm)
 	return;
 }
 
+static int __kvm_write_protect_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+				gfn_t gfn, int level)
+{
+	int ret = 0;
+	/* add write protect on pte, tear down the page table if large page is enabled */
+	struct kvm_shadow_walk_iterator iterator;
+	unsigned long i;
+	kvm_pfn_t pfn;
+	struct page *page;
+	u64 *sptep;
+	u64 spte, t_spte;
+
+	for_each_direct_build_shadow_entry(iterator, kvm->arch.global_root_hpa,
+			gfn << PAGE_SHIFT, max_tdp_level) {
+		if (iterator.level == level) {
+			break;
+		}
+	}
+
+	if (level != PG_LEVEL_4K) {
+		sptep = iterator.sptep;
+
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return ret;
+
+		t_spte = page_to_phys(page) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+			shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+
+		for (i = 0; i < KVM_PAGES_PER_HPAGE(level); i++) {
+
+			for_each_direct_build_shadow_entry(iterator, t_spte & PT64_BASE_ADDR_MASK,
+					gfn << PAGE_SHIFT, level - 1) {
+				if (iterator.level == PG_LEVEL_4K) {
+					break;
+				}
+
+				if (!is_shadow_present_pte(*iterator.sptep)) {
+					struct page *page;
+					page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+					if (!page) {
+						__kvm_walk_global_page(kvm, t_spte & PT64_BASE_ADDR_MASK, level - 1);
+						return ret;
+					}
+					spte = page_to_phys(page) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+						shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+					mmu_spte_set(iterator.sptep, spte);
+				}
+			}
+
+			pfn = gfn_to_pfn_try_write(slot, gfn);
+			if ((pfn & KVM_PFN_ERR_FAULT) || is_noslot_pfn(pfn))
+				return ret;
+
+			if (kvm_x86_ops.slot_enable_log_dirty)
+				direct_build_tdp_set_spte(kvm, slot, iterator.sptep,
+						ACC_ALL, iterator.level, gfn, pfn, false, false, true);
+
+			else
+				direct_build_tdp_set_spte(kvm, slot, iterator.sptep,
+						ACC_EXEC_MASK | ACC_USER_MASK, iterator.level, gfn, pfn, false, true, true);
+			gfn++;
+		}
+		WARN_ON(!is_last_spte(*sptep, level));
+		pfn = spte_to_pfn(*sptep);
+		mmu_spte_clear_track_bits(sptep);
+		kvm_release_pfn_clean(pfn);
+		mmu_spte_set(sptep, t_spte);
+	} else {
+		if (kvm_x86_ops.slot_enable_log_dirty)
+			spte_clear_dirty(iterator.sptep);
+		else
+			spte_write_protect(iterator.sptep, false);
+	}
+	return ret;
+}
+
+static void __kvm_remove_wp_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+					gfn_t gfn, int level)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	kvm_pfn_t pfn;
+	u64 addr, spte;
+
+	for_each_direct_build_shadow_entry(iterator, kvm->arch.global_root_hpa,
+			gfn << PAGE_SHIFT, max_tdp_level) {
+		if (iterator.level == level)
+			break;
+	}
+
+	if (level != PG_LEVEL_4K) {
+		if (is_shadow_present_pte(*iterator.sptep)) {
+			addr = (*iterator.sptep) & PT64_BASE_ADDR_MASK;
+
+			pfn = gfn_to_pfn_try_write(slot, gfn);
+			if ((pfn & KVM_PFN_ERR_FAULT) || is_noslot_pfn(pfn)) {
+				printk("Failed to alloc page\n");
+				return;
+			}
+			mmu_spte_clear_track_bits(iterator.sptep);
+			direct_build_tdp_set_spte(kvm, slot, iterator.sptep,
+					ACC_ALL, level, gfn, pfn, false, true, true);
+
+			__kvm_walk_global_page(kvm, addr, level - 1);
+		}
+	} else {
+		if (is_shadow_present_pte(*iterator.sptep)) {
+			if (kvm_x86_ops.slot_enable_log_dirty) {
+				spte_set_dirty(iterator.sptep);
+			} else {
+				spte = (*iterator.sptep) | PT_WRITABLE_MASK;
+				mmu_spte_update(iterator.sptep, spte);
+			}
+		}
+	}
+}
+
+void kvm_mmu_slot_direct_build_handle_wp(struct kvm *kvm,
+					 struct kvm_memory_slot *memslot)
+{
+	gfn_t gfn = memslot->base_gfn;
+	int host_level;
+
+	/* remove write mask from PTE */
+	for (gfn = memslot->base_gfn; gfn < memslot->base_gfn + memslot->npages; ) {
+
+		host_level = direct_build_mapping_level(kvm, memslot, gfn);
+
+		if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES)
+			__kvm_write_protect_spte(kvm, memslot, gfn, host_level);
+		else
+			__kvm_remove_wp_spte(kvm, memslot, gfn, host_level);
+		gfn += KVM_PAGES_PER_HPAGE(host_level);
+	}
+
+	kvm_flush_remote_tlbs(kvm);
+}
+
 /*
  * Calculate mmu pages needed for kvm.
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 599d73206299..ee898003f22f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10196,9 +10196,12 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	 *		kvm_arch_flush_shadow_memslot()
 	 */
 	if ((old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-	    !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-		kvm_mmu_zap_collapsible_sptes(kvm, new);
-
+	    !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+		if (kvm->arch.global_root_hpa)
+			kvm_mmu_slot_direct_build_handle_wp(kvm, (struct kvm_memory_slot *)new);
+		else
+			kvm_mmu_zap_collapsible_sptes(kvm, new);
+	}
 	/*
 	 * Enable or disable dirty logging for the slot.
 	 *
@@ -10228,25 +10231,30 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	 * is enabled the D-bit or the W-bit will be cleared.
 	 */
 	if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-		if (kvm_x86_ops.slot_enable_log_dirty) {
-			kvm_x86_ops.slot_enable_log_dirty(kvm, new);
+		if (kvm->arch.global_root_hpa) {
+			kvm_mmu_slot_direct_build_handle_wp(kvm, new);
 		} else {
-			int level =
-				kvm_dirty_log_manual_protect_and_init_set(kvm) ?
-				PG_LEVEL_2M : PG_LEVEL_4K;
+			if (kvm_x86_ops.slot_enable_log_dirty) {
+				kvm_x86_ops.slot_enable_log_dirty(kvm, new);
+			} else {
+				int level =
+					kvm_dirty_log_manual_protect_and_init_set(kvm) ?
+					PG_LEVEL_2M : PG_LEVEL_4K;
 
-			/*
-			 * If we're with initial-all-set, we don't need
-			 * to write protect any small page because
-			 * they're reported as dirty already.  However
-			 * we still need to write-protect huge pages
-			 * so that the page split can happen lazily on
-			 * the first write to the huge page.
-			 */
-			kvm_mmu_slot_remove_write_access(kvm, new, level);
+				/*
+				 * If we're with initial-all-set, we don't need
+				 * to write protect any small page because
+				 * they're reported as dirty already.  However
+				 * we still need to write-protect huge pages
+				 * so that the page split can happen lazily on
+				 * the first write to the huge page.
+				 */
+				kvm_mmu_slot_remove_write_access(kvm, new, level);
+			}
 		}
 	} else {
-		if (kvm_x86_ops.slot_disable_log_dirty)
+		if (kvm_x86_ops.slot_disable_log_dirty
+			&& !kvm->arch.global_root_hpa)
 			kvm_x86_ops.slot_disable_log_dirty(kvm, new);
 	}
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
                   ` (6 preceding siblings ...)
  2020-09-01 11:56 ` [RFC V2 7/9] Add migration support when using direct build EPT yulei.kernel
@ 2020-09-01 11:57 ` yulei.kernel
  2020-09-01 11:57 ` [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on " yulei.kernel
  2020-09-09  3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
  9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:57 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang

From: Yulei Zhang <yuleixzhang@tencent.com>

Currently global_tdp is only supported on Intel x86 systems with
EPT support, and enabling global_tdp turns off SMM support.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++++
 arch/x86/kvm/mmu/mmu.c          |  5 ++++-
 arch/x86/kvm/x86.c              | 11 ++++++++++-
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 429a50c89268..330cb254b34b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1357,6 +1357,8 @@ extern u64  kvm_default_tsc_scaling_ratio;
 
 extern u64 kvm_mce_cap_supported;
 
+extern bool global_tdp;
+
 /*
  * EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
  *			userspace I/O) to indicate that the emulation context
@@ -1689,6 +1691,8 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #endif
 }
 
+inline bool boot_cpu_is_amd(void);
+
 #define put_smstate(type, buf, offset, val)                      \
 	*(type *)((buf) + (offset) - 0x7e00) = val
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f03bf8efcefe..6639d9c7012e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4573,7 +4573,7 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
 
-static inline bool boot_cpu_is_amd(void)
+inline bool boot_cpu_is_amd(void)
 {
 	WARN_ON_ONCE(!tdp_enabled);
 	return shadow_x_mask == 0;
@@ -6497,6 +6497,9 @@ int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *
 	kvm_pfn_t pfn;
 	int host_level;
 
+	if (!global_tdp)
+		return 0;
+
 	if (!kvm->arch.global_root_hpa) {
 		struct page *page;
 		WARN_ON(!tdp_enabled);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee898003f22f..57d64f3239e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -161,6 +161,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+bool __read_mostly global_tdp;
+module_param_named(global_tdp, global_tdp, bool, S_IRUGO);
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -3539,7 +3542,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		 * fringe case that is not enabled except via specific settings
 		 * of the module parameters.
 		 */
-		r = kvm_x86_ops.has_emulated_msr(MSR_IA32_SMBASE);
+		if (global_tdp)
+			r = 0;
+		else
+			r = kvm_x86_ops.has_emulated_msr(MSR_IA32_SMBASE);
 		break;
 	case KVM_CAP_VAPIC:
 		r = !kvm_x86_ops.cpu_has_accelerated_tpr();
@@ -9808,6 +9814,9 @@ int kvm_arch_hardware_setup(void *opaque)
 	if (r != 0)
 		return r;
 
+	if ((tdp_enabled == false) || boot_cpu_is_amd())
+		global_tdp = 0;
+
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on direct build EPT mode
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
                   ` (7 preceding siblings ...)
  2020-09-01 11:57 ` [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode yulei.kernel
@ 2020-09-01 11:57 ` yulei.kernel
  2020-09-09  3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
  9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:57 UTC (permalink / raw)
  To: pbonzini
  Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
	bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
	lihaiwei.kernel, Yulei Zhang, Yulei Zhang

From: Yulei Zhang <yulei.kernel@gmail.com>

Make kvm_mmu_slot_gfn_write_protect(), kvm_mmu_zap_all_fast() and
kvm_zap_gfn_range() return early when the pre-constructed global
EPT root is in use, as those paths operate on shadow page state
that is not maintained in direct build EPT mode.

Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
 arch/x86/kvm/mmu/mmu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6639d9c7012e..35bd87bf965f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1719,6 +1719,9 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	int i;
 	bool write_protected = false;
 
+	if (kvm->arch.global_root_hpa)
+		return write_protected;
+
 	for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
 		rmap_head = __gfn_to_rmap(gfn, i, slot);
 		write_protected |= __rmap_write_protect(kvm, rmap_head, true);
@@ -5862,6 +5865,9 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
  */
 static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 {
+	if (kvm->arch.global_root_hpa)
+		return;
+
 	lockdep_assert_held(&kvm->slots_lock);
 
 	spin_lock(&kvm->mmu_lock);
@@ -5924,6 +5930,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 	struct kvm_memory_slot *memslot;
 	int i;
 
+	if (kvm->arch.global_root_hpa)
+		return;
+
 	spin_lock(&kvm->mmu_lock);
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 		slots = __kvm_memslots(kvm, i);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC V2 2/9] Introduce page table population function for direct build EPT feature
  2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
@ 2020-09-01 17:33   ` kernel test robot
  2020-09-01 19:04   ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 17:33 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2710 bytes --]

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on kvm/linux-next]
[also build test WARNING on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/kvm/mmu/mmu.c:6430:5: warning: no previous prototype for 'direct_build_mapping_level' [-Wmissing-prototypes]
    6430 | int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/x86/kvm/mmu/mmu.c:6448:5: warning: no previous prototype for 'kvm_direct_tdp_populate_page_table' [-Wmissing-prototypes]
    6448 | int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# https://github.com/0day-ci/linux/commit/9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
git checkout 9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
vim +/direct_build_mapping_level +6430 arch/x86/kvm/mmu/mmu.c

  6429	
> 6430	int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
  6431	{
  6432		int host_level, max_level, level;
  6433		struct kvm_lpage_info *linfo;
  6434	
  6435		host_level = host_mapping_level(kvm, gfn);
  6436		if (host_level != PG_LEVEL_4K) {
  6437			max_level = min(max_huge_page_level, host_level);
  6438			for (level = PG_LEVEL_4K; level <= max_level; ++level) {
  6439				linfo = lpage_info_slot(gfn, slot, level);
  6440				if (linfo->disallow_lpage)
  6441					break;
  6442			}
  6443			host_level = level - 1;
  6444		}
  6445		return host_level;
  6446	}
  6447	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 74609 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 2/9] Introduce page table population function for direct build EPT feature
  2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
  2020-09-01 17:33   ` kernel test robot
@ 2020-09-01 19:04   ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 19:04 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 15821 bytes --]

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvm-ppc/kvm-ppc-next kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   arch/powerpc/kvm/book3s_xive_native.c: In function 'kvmppc_xive_native_set_queue_config':
>> arch/powerpc/kvm/book3s_xive_native.c:640:33: error: passing argument 1 of 'kvm_host_page_size' from incompatible pointer type [-Werror=incompatible-pointer-types]
     640 |  page_size = kvm_host_page_size(vcpu, gfn);
         |                                 ^~~~
         |                                 |
         |                                 struct kvm_vcpu *
   In file included from arch/powerpc/kvm/book3s_xive_native.c:9:
   include/linux/kvm_host.h:780:46: note: expected 'struct kvm *' but argument is of type 'struct kvm_vcpu *'
     780 | unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
         |                                  ~~~~~~~~~~~~^~~
>> arch/powerpc/kvm/book3s_xive_native.c:640:39: warning: passing argument 2 of 'kvm_host_page_size' makes pointer from integer without a cast [-Wint-conversion]
     640 |  page_size = kvm_host_page_size(vcpu, gfn);
         |                                       ^~~
         |                                       |
         |                                       gfn_t {aka long long unsigned int}
   In file included from arch/powerpc/kvm/book3s_xive_native.c:9:
   include/linux/kvm_host.h:780:68: note: expected 'struct kvm_vcpu *' but argument is of type 'gfn_t' {aka 'long long unsigned int'}
     780 | unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
         |                                                   ~~~~~~~~~~~~~~~~~^~~~
>> arch/powerpc/kvm/book3s_xive_native.c:640:14: error: too few arguments to function 'kvm_host_page_size'
     640 |  page_size = kvm_host_page_size(vcpu, gfn);
         |              ^~~~~~~~~~~~~~~~~~
   In file included from arch/powerpc/kvm/book3s_xive_native.c:9:
   include/linux/kvm_host.h:780:15: note: declared here
     780 | unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
         |               ^~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

# https://github.com/0day-ci/linux/commit/9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
git checkout 9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
vim +/kvm_host_page_size +640 arch/powerpc/kvm/book3s_xive_native.c

13ce3297c5766b Cédric Le Goater    2019-04-18  549  
13ce3297c5766b Cédric Le Goater    2019-04-18  550  static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
13ce3297c5766b Cédric Le Goater    2019-04-18  551  					       long eq_idx, u64 addr)
13ce3297c5766b Cédric Le Goater    2019-04-18  552  {
13ce3297c5766b Cédric Le Goater    2019-04-18  553  	struct kvm *kvm = xive->kvm;
13ce3297c5766b Cédric Le Goater    2019-04-18  554  	struct kvm_vcpu *vcpu;
13ce3297c5766b Cédric Le Goater    2019-04-18  555  	struct kvmppc_xive_vcpu *xc;
13ce3297c5766b Cédric Le Goater    2019-04-18  556  	void __user *ubufp = (void __user *) addr;
13ce3297c5766b Cédric Le Goater    2019-04-18  557  	u32 server;
13ce3297c5766b Cédric Le Goater    2019-04-18  558  	u8 priority;
13ce3297c5766b Cédric Le Goater    2019-04-18  559  	struct kvm_ppc_xive_eq kvm_eq;
13ce3297c5766b Cédric Le Goater    2019-04-18  560  	int rc;
13ce3297c5766b Cédric Le Goater    2019-04-18  561  	__be32 *qaddr = 0;
13ce3297c5766b Cédric Le Goater    2019-04-18  562  	struct page *page;
13ce3297c5766b Cédric Le Goater    2019-04-18  563  	struct xive_q *q;
13ce3297c5766b Cédric Le Goater    2019-04-18  564  	gfn_t gfn;
13ce3297c5766b Cédric Le Goater    2019-04-18  565  	unsigned long page_size;
aedb5b19429c80 Cédric Le Goater    2019-05-28  566  	int srcu_idx;
13ce3297c5766b Cédric Le Goater    2019-04-18  567  
13ce3297c5766b Cédric Le Goater    2019-04-18  568  	/*
13ce3297c5766b Cédric Le Goater    2019-04-18  569  	 * Demangle priority/server tuple from the EQ identifier
13ce3297c5766b Cédric Le Goater    2019-04-18  570  	 */
13ce3297c5766b Cédric Le Goater    2019-04-18  571  	priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >>
13ce3297c5766b Cédric Le Goater    2019-04-18  572  		KVM_XIVE_EQ_PRIORITY_SHIFT;
13ce3297c5766b Cédric Le Goater    2019-04-18  573  	server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >>
13ce3297c5766b Cédric Le Goater    2019-04-18  574  		KVM_XIVE_EQ_SERVER_SHIFT;
13ce3297c5766b Cédric Le Goater    2019-04-18  575  
13ce3297c5766b Cédric Le Goater    2019-04-18  576  	if (copy_from_user(&kvm_eq, ubufp, sizeof(kvm_eq)))
13ce3297c5766b Cédric Le Goater    2019-04-18  577  		return -EFAULT;
13ce3297c5766b Cédric Le Goater    2019-04-18  578  
13ce3297c5766b Cédric Le Goater    2019-04-18  579  	vcpu = kvmppc_xive_find_server(kvm, server);
13ce3297c5766b Cédric Le Goater    2019-04-18  580  	if (!vcpu) {
13ce3297c5766b Cédric Le Goater    2019-04-18  581  		pr_err("Can't find server %d\n", server);
13ce3297c5766b Cédric Le Goater    2019-04-18  582  		return -ENOENT;
13ce3297c5766b Cédric Le Goater    2019-04-18  583  	}
13ce3297c5766b Cédric Le Goater    2019-04-18  584  	xc = vcpu->arch.xive_vcpu;
13ce3297c5766b Cédric Le Goater    2019-04-18  585  
13ce3297c5766b Cédric Le Goater    2019-04-18  586  	if (priority != xive_prio_from_guest(priority)) {
13ce3297c5766b Cédric Le Goater    2019-04-18  587  		pr_err("Trying to restore invalid queue %d for VCPU %d\n",
13ce3297c5766b Cédric Le Goater    2019-04-18  588  		       priority, server);
13ce3297c5766b Cédric Le Goater    2019-04-18  589  		return -EINVAL;
13ce3297c5766b Cédric Le Goater    2019-04-18  590  	}
13ce3297c5766b Cédric Le Goater    2019-04-18  591  	q = &xc->queues[priority];
13ce3297c5766b Cédric Le Goater    2019-04-18  592  
13ce3297c5766b Cédric Le Goater    2019-04-18  593  	pr_devel("%s VCPU %d priority %d fl:%x shift:%d addr:%llx g:%d idx:%d\n",
13ce3297c5766b Cédric Le Goater    2019-04-18  594  		 __func__, server, priority, kvm_eq.flags,
13ce3297c5766b Cédric Le Goater    2019-04-18  595  		 kvm_eq.qshift, kvm_eq.qaddr, kvm_eq.qtoggle, kvm_eq.qindex);
13ce3297c5766b Cédric Le Goater    2019-04-18  596  
13ce3297c5766b Cédric Le Goater    2019-04-18  597  	/* reset queue and disable queueing */
13ce3297c5766b Cédric Le Goater    2019-04-18  598  	if (!kvm_eq.qshift) {
13ce3297c5766b Cédric Le Goater    2019-04-18  599  		q->guest_qaddr  = 0;
13ce3297c5766b Cédric Le Goater    2019-04-18  600  		q->guest_qshift = 0;
13ce3297c5766b Cédric Le Goater    2019-04-18  601  
31a88c82b466d2 Greg Kurz           2019-11-13  602  		rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
13ce3297c5766b Cédric Le Goater    2019-04-18  603  							NULL, 0, true);
13ce3297c5766b Cédric Le Goater    2019-04-18  604  		if (rc) {
13ce3297c5766b Cédric Le Goater    2019-04-18  605  			pr_err("Failed to reset queue %d for VCPU %d: %d\n",
13ce3297c5766b Cédric Le Goater    2019-04-18  606  			       priority, xc->server_num, rc);
13ce3297c5766b Cédric Le Goater    2019-04-18  607  			return rc;
13ce3297c5766b Cédric Le Goater    2019-04-18  608  		}
13ce3297c5766b Cédric Le Goater    2019-04-18  609  
13ce3297c5766b Cédric Le Goater    2019-04-18  610  		return 0;
13ce3297c5766b Cédric Le Goater    2019-04-18  611  	}
13ce3297c5766b Cédric Le Goater    2019-04-18  612  
c468bc4e8468cb Cédric Le Goater    2019-05-20  613  	/*
c468bc4e8468cb Cédric Le Goater    2019-05-20  614  	 * sPAPR specifies a "Unconditional Notify (n) flag" for the
c468bc4e8468cb Cédric Le Goater    2019-05-20  615  	 * H_INT_SET_QUEUE_CONFIG hcall which forces notification
c468bc4e8468cb Cédric Le Goater    2019-05-20  616  	 * without using the coalescing mechanisms provided by the
c468bc4e8468cb Cédric Le Goater    2019-05-20  617  	 * XIVE END ESBs. This is required on KVM as notification
c468bc4e8468cb Cédric Le Goater    2019-05-20  618  	 * using the END ESBs is not supported.
c468bc4e8468cb Cédric Le Goater    2019-05-20  619  	 */
c468bc4e8468cb Cédric Le Goater    2019-05-20  620  	if (kvm_eq.flags != KVM_XIVE_EQ_ALWAYS_NOTIFY) {
c468bc4e8468cb Cédric Le Goater    2019-05-20  621  		pr_err("invalid flags %d\n", kvm_eq.flags);
c468bc4e8468cb Cédric Le Goater    2019-05-20  622  		return -EINVAL;
c468bc4e8468cb Cédric Le Goater    2019-05-20  623  	}
c468bc4e8468cb Cédric Le Goater    2019-05-20  624  
c468bc4e8468cb Cédric Le Goater    2019-05-20  625  	rc = xive_native_validate_queue_size(kvm_eq.qshift);
c468bc4e8468cb Cédric Le Goater    2019-05-20  626  	if (rc) {
c468bc4e8468cb Cédric Le Goater    2019-05-20  627  		pr_err("invalid queue size %d\n", kvm_eq.qshift);
c468bc4e8468cb Cédric Le Goater    2019-05-20  628  		return rc;
c468bc4e8468cb Cédric Le Goater    2019-05-20  629  	}
c468bc4e8468cb Cédric Le Goater    2019-05-20  630  
13ce3297c5766b Cédric Le Goater    2019-04-18  631  	if (kvm_eq.qaddr & ((1ull << kvm_eq.qshift) - 1)) {
13ce3297c5766b Cédric Le Goater    2019-04-18  632  		pr_err("queue page is not aligned %llx/%llx\n", kvm_eq.qaddr,
13ce3297c5766b Cédric Le Goater    2019-04-18  633  		       1ull << kvm_eq.qshift);
13ce3297c5766b Cédric Le Goater    2019-04-18  634  		return -EINVAL;
13ce3297c5766b Cédric Le Goater    2019-04-18  635  	}
13ce3297c5766b Cédric Le Goater    2019-04-18  636  
aedb5b19429c80 Cédric Le Goater    2019-05-28  637  	srcu_idx = srcu_read_lock(&kvm->srcu);
13ce3297c5766b Cédric Le Goater    2019-04-18  638  	gfn = gpa_to_gfn(kvm_eq.qaddr);
13ce3297c5766b Cédric Le Goater    2019-04-18  639  
f9b84e19221efc Sean Christopherson 2020-01-08 @640  	page_size = kvm_host_page_size(vcpu, gfn);
13ce3297c5766b Cédric Le Goater    2019-04-18  641  	if (1ull << kvm_eq.qshift > page_size) {
aedb5b19429c80 Cédric Le Goater    2019-05-28  642  		srcu_read_unlock(&kvm->srcu, srcu_idx);
13ce3297c5766b Cédric Le Goater    2019-04-18  643  		pr_warn("Incompatible host page size %lx!\n", page_size);
13ce3297c5766b Cédric Le Goater    2019-04-18  644  		return -EINVAL;
13ce3297c5766b Cédric Le Goater    2019-04-18  645  	}
13ce3297c5766b Cédric Le Goater    2019-04-18  646  
30486e72093ea2 Greg Kurz           2019-11-13  647  	page = gfn_to_page(kvm, gfn);
30486e72093ea2 Greg Kurz           2019-11-13  648  	if (is_error_page(page)) {
30486e72093ea2 Greg Kurz           2019-11-13  649  		srcu_read_unlock(&kvm->srcu, srcu_idx);
30486e72093ea2 Greg Kurz           2019-11-13  650  		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
30486e72093ea2 Greg Kurz           2019-11-13  651  		return -EINVAL;
30486e72093ea2 Greg Kurz           2019-11-13  652  	}
30486e72093ea2 Greg Kurz           2019-11-13  653  
13ce3297c5766b Cédric Le Goater    2019-04-18  654  	qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
aedb5b19429c80 Cédric Le Goater    2019-05-28  655  	srcu_read_unlock(&kvm->srcu, srcu_idx);
13ce3297c5766b Cédric Le Goater    2019-04-18  656  
13ce3297c5766b Cédric Le Goater    2019-04-18  657  	/*
13ce3297c5766b Cédric Le Goater    2019-04-18  658  	 * Backup the queue page guest address to the mark EQ page
13ce3297c5766b Cédric Le Goater    2019-04-18  659  	 * dirty for migration.
13ce3297c5766b Cédric Le Goater    2019-04-18  660  	 */
13ce3297c5766b Cédric Le Goater    2019-04-18  661  	q->guest_qaddr  = kvm_eq.qaddr;
13ce3297c5766b Cédric Le Goater    2019-04-18  662  	q->guest_qshift = kvm_eq.qshift;
13ce3297c5766b Cédric Le Goater    2019-04-18  663  
13ce3297c5766b Cédric Le Goater    2019-04-18  664  	 /*
13ce3297c5766b Cédric Le Goater    2019-04-18  665  	  * Unconditional Notification is forced by default at the
13ce3297c5766b Cédric Le Goater    2019-04-18  666  	  * OPAL level because the use of END ESBs is not supported by
13ce3297c5766b Cédric Le Goater    2019-04-18  667  	  * Linux.
13ce3297c5766b Cédric Le Goater    2019-04-18  668  	  */
31a88c82b466d2 Greg Kurz           2019-11-13  669  	rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
13ce3297c5766b Cédric Le Goater    2019-04-18  670  					(__be32 *) qaddr, kvm_eq.qshift, true);
13ce3297c5766b Cédric Le Goater    2019-04-18  671  	if (rc) {
13ce3297c5766b Cédric Le Goater    2019-04-18  672  		pr_err("Failed to configure queue %d for VCPU %d: %d\n",
13ce3297c5766b Cédric Le Goater    2019-04-18  673  		       priority, xc->server_num, rc);
13ce3297c5766b Cédric Le Goater    2019-04-18  674  		put_page(page);
13ce3297c5766b Cédric Le Goater    2019-04-18  675  		return rc;
13ce3297c5766b Cédric Le Goater    2019-04-18  676  	}
13ce3297c5766b Cédric Le Goater    2019-04-18  677  
13ce3297c5766b Cédric Le Goater    2019-04-18  678  	/*
13ce3297c5766b Cédric Le Goater    2019-04-18  679  	 * Only restore the queue state when needed. When doing the
13ce3297c5766b Cédric Le Goater    2019-04-18  680  	 * H_INT_SET_SOURCE_CONFIG hcall, it should not.
13ce3297c5766b Cédric Le Goater    2019-04-18  681  	 */
13ce3297c5766b Cédric Le Goater    2019-04-18  682  	if (kvm_eq.qtoggle != 1 || kvm_eq.qindex != 0) {
13ce3297c5766b Cédric Le Goater    2019-04-18  683  		rc = xive_native_set_queue_state(xc->vp_id, priority,
13ce3297c5766b Cédric Le Goater    2019-04-18  684  						 kvm_eq.qtoggle,
13ce3297c5766b Cédric Le Goater    2019-04-18  685  						 kvm_eq.qindex);
13ce3297c5766b Cédric Le Goater    2019-04-18  686  		if (rc)
13ce3297c5766b Cédric Le Goater    2019-04-18  687  			goto error;
13ce3297c5766b Cédric Le Goater    2019-04-18  688  	}
13ce3297c5766b Cédric Le Goater    2019-04-18  689  
13ce3297c5766b Cédric Le Goater    2019-04-18  690  	rc = kvmppc_xive_attach_escalation(vcpu, priority,
13ce3297c5766b Cédric Le Goater    2019-04-18  691  					   xive->single_escalation);
13ce3297c5766b Cédric Le Goater    2019-04-18  692  error:
13ce3297c5766b Cédric Le Goater    2019-04-18  693  	if (rc)
13ce3297c5766b Cédric Le Goater    2019-04-18  694  		kvmppc_xive_native_cleanup_queue(vcpu, priority);
13ce3297c5766b Cédric Le Goater    2019-04-18  695  	return rc;
13ce3297c5766b Cédric Le Goater    2019-04-18  696  }
13ce3297c5766b Cédric Le Goater    2019-04-18  697  
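
The root cause is that the series changes the kvm_host_page_size()
prototype to take the struct kvm as well as the vCPU, but this powerpc
caller was not converted.  Going by the three-argument prototype quoted in
the report (and the 'kvm' pointer already in scope in this function), the
missing fixup would look like:

        /* arch/powerpc/kvm/book3s_xive_native.c -- sketch of the fixup */
        page_size = kvm_host_page_size(kvm, vcpu, gfn);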

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 70243 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 6/9] Apply the direct build EPT according to the memory slots change
  2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
@ 2020-09-01 22:20   ` kernel test robot
  2020-09-02  7:00   ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 22:20 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5674 bytes --]

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvm-ppc/kvm-ppc-next kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: arm64-randconfig-r036-20200901 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_set_memslot':
>> arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1201: undefined reference to `kvm_direct_tdp_remove_page_table'
>> aarch64-linux-ld: arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1232: undefined reference to `kvm_direct_tdp_populate_page_table'
>> aarch64-linux-ld: arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1232: undefined reference to `kvm_direct_tdp_populate_page_table'
   aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_destroy_vm':
>> arch/arm64/kvm/../../../virt/kvm/kvm_main.c:879: undefined reference to `kvm_direct_tdp_release_global_root'
   aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_crtc.o: in function `rcar_du_cmm_setup':
   drivers/gpu/drm/rcar-du/rcar_du_crtc.c:515: undefined reference to `rcar_cmm_setup'
   aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_crtc.o: in function `rcar_du_crtc_atomic_enable':
   drivers/gpu/drm/rcar-du/rcar_du_crtc.c:720: undefined reference to `rcar_cmm_enable'
   aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_crtc.o: in function `rcar_du_crtc_stop':
   drivers/gpu/drm/rcar-du/rcar_du_crtc.c:664: undefined reference to `rcar_cmm_disable'
   aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_kms.o: in function `rcar_du_cmm_init':
   drivers/gpu/drm/rcar-du/rcar_du_kms.c:678: undefined reference to `rcar_cmm_init'

# https://github.com/0day-ci/linux/commit/751ce77392ca79955a0577617878ee1950ef3445
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
git checkout 751ce77392ca79955a0577617878ee1950ef3445
vim +1201 arch/arm64/kvm/../../../virt/kvm/kvm_main.c

  1178	
  1179	static int kvm_set_memslot(struct kvm *kvm,
  1180				   const struct kvm_userspace_memory_region *mem,
  1181				   struct kvm_memory_slot *old,
  1182				   struct kvm_memory_slot *new, int as_id,
  1183				   enum kvm_mr_change change)
  1184	{
  1185		struct kvm_memory_slot *slot;
  1186		struct kvm_memslots *slots;
  1187		int r;
  1188	
  1189		slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
  1190		if (!slots)
  1191			return -ENOMEM;
  1192	
  1193		if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
  1194			/*
  1195			 * Note, the INVALID flag needs to be in the appropriate entry
  1196			 * in the freshly allocated memslots, not in @old or @new.
  1197			 */
  1198			slot = id_to_memslot(slots, old->id);
  1199			/* Remove pre-constructed page table */
  1200			if (!as_id)
> 1201				kvm_direct_tdp_remove_page_table(kvm, slot);
  1202	
  1203			slot->flags |= KVM_MEMSLOT_INVALID;
  1204	
  1205			/*
  1206			 * We can re-use the old memslots, the only difference from the
  1207			 * newly installed memslots is the invalid flag, which will get
  1208			 * dropped by update_memslots anyway.  We'll also revert to the
  1209			 * old memslots if preparing the new memory region fails.
  1210			 */
  1211			slots = install_new_memslots(kvm, as_id, slots);
  1212	
  1213			/* From this point no new shadow pages pointing to a deleted,
  1214			 * or moved, memslot will be created.
  1215			 *
  1216			 * validation of sp->gfn happens in:
  1217			 *	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
  1218			 *	- kvm_is_visible_gfn (mmu_check_root)
  1219			 */
  1220			kvm_arch_flush_shadow_memslot(kvm, slot);
  1221		}
  1222	
  1223		r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
  1224		if (r)
  1225			goto out_slots;
  1226	
  1227		update_memslots(slots, new, change);
  1228		slots = install_new_memslots(kvm, as_id, slots);
  1229	
  1230		if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
  1231			if (!as_id) {
> 1232				r = kvm_direct_tdp_populate_page_table(kvm, new);
  1233				if (r)
  1234					goto out_slots;
  1235			}
  1236		}
  1237	
  1238		kvm_arch_commit_memory_region(kvm, mem, old, new, change);
  1239	
  1240		kvfree(slots);
  1241		return 0;
  1242	
  1243	out_slots:
  1244		if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
  1245			slots = install_new_memslots(kvm, as_id, slots);
  1246		kvfree(slots);
  1247		return r;
  1248	}
  1249	
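
The undefined references show up because the kvm_direct_tdp_* helpers
called from the generic kvm_set_memslot() and kvm_destroy_vm() paths only
have an x86 implementation; the diffstat adds stubs for mips, powerpc and
s390, but arm64 is not covered.  One way to avoid per-architecture stubs
altogether would be weak default implementations next to the callers (a
sketch only -- the remove/release signatures below are inferred from the
call sites, not taken from the patches):

        /* virt/kvm/kvm_main.c -- illustrative weak defaults */
        int __weak kvm_direct_tdp_populate_page_table(struct kvm *kvm,
                                                      struct kvm_memory_slot *slot)
        {
                return 0;
        }

        void __weak kvm_direct_tdp_remove_page_table(struct kvm *kvm,
                                                     struct kvm_memory_slot *slot)
        {
        }

        void __weak kvm_direct_tdp_release_global_root(struct kvm *kvm)
        {
        }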

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35282 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 6/9] Apply the direct build EPT according to the memory slots change
  2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
  2020-09-01 22:20   ` kernel test robot
@ 2020-09-02  7:00   ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-02  7:00 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1916 bytes --]

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvm-ppc/kvm-ppc-next kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_set_memslot.constprop.0':
>> kvm_main.c:(.text+0xb384): undefined reference to `kvm_direct_tdp_remove_page_table'
>> aarch64-linux-ld: kvm_main.c:(.text+0xb9e8): undefined reference to `kvm_direct_tdp_populate_page_table'
   aarch64-linux-ld: kvm_main.c:(.text+0xba10): undefined reference to `kvm_direct_tdp_populate_page_table'
   aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_destroy_vm':
>> kvm_main.c:(.text+0x11b98): undefined reference to `kvm_direct_tdp_release_global_root'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 73856 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
                   ` (8 preceding siblings ...)
  2020-09-01 11:57 ` [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on " yulei.kernel
@ 2020-09-09  3:04 ` Wanpeng Li
  2020-09-24  6:28   ` Wanpeng Li
  9 siblings, 1 reply; 22+ messages in thread
From: Wanpeng Li @ 2020-09-09  3:04 UTC (permalink / raw)
  To: Yulei Zhang
  Cc: Paolo Bonzini, kvm, LKML, Sean Christopherson, Jim Mattson,
	Junaid Shahid, Ben Gardon, Vitaly Kuznetsov, Xiao Guangrong,
	Haiwei Li

Any comments? guys!
On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
>
> From: Yulei Zhang <yulei.kernel@gmail.com>
>
> Currently in KVM memory virtulization we relay on mmu_lock to
> synchronize the memory mapping update, which make vCPUs work
> in serialize mode and slow down the execution, especially after
> migration to do substantial memory mapping will cause visible
> performance drop, and it can get worse if guest has more vCPU
> numbers and memories.
>
> The idea we present in this patch set is to mitigate the issue
> with pre-constructed memory mapping table. We will fast pin the
> guest memory to build up a global memory mapping table according
> to the guest memslots changes and apply it to cr3, so that after
> guest starts up all the vCPUs would be able to update the memory
> simultaneously without page fault exception, thus the performance
> improvement is expected.
>
> We use memory dirty pattern workload to test the initial patch
> set and get positive result even with huge page enabled. For example,
> we create guest with 32 vCPUs and 64G memories, and let the vcpus
> dirty the entire memory region concurrently, as the initial patch
> eliminate the overhead of mmu_lock, in 2M/1G huge page mode we would
> get the job done in about 50% faster.
>
> We only validate this feature on Intel x86 platform. And as Ben
> pointed out in RFC V1, so far we disable the SMM for resource
> consideration, drop the mmu notification as in this case the
> memory is pinned.
>
> V1->V2:
> * Rebase the code to kernel version 5.9.0-rc1.
>
> Yulei Zhang (9):
>   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
>     support
>   Introduce page table population function for direct build EPT feature
>   Introduce page table remove function for direct build EPT feature
>   Add release function for direct build ept when guest VM exit
>   Modify the page fault path to meet the direct build EPT requirement
>   Apply the direct build EPT according to the memory slots change
>   Add migration support when using direct build EPT
>   Introduce kvm module parameter global_tdp to turn on the direct build
>     EPT mode
>   Handle certain mmu exposed functions properly while turn on direct
>     build EPT mode
>
>  arch/mips/kvm/mips.c            |  13 +
>  arch/powerpc/kvm/powerpc.c      |  13 +
>  arch/s390/kvm/kvm-s390.c        |  13 +
>  arch/x86/include/asm/kvm_host.h |  13 +-
>  arch/x86/kvm/mmu/mmu.c          | 533 ++++++++++++++++++++++++++++++--
>  arch/x86/kvm/svm/svm.c          |   2 +-
>  arch/x86/kvm/vmx/vmx.c          |   7 +-
>  arch/x86/kvm/x86.c              |  55 ++--
>  include/linux/kvm_host.h        |   7 +-
>  virt/kvm/kvm_main.c             |  43 ++-
>  10 files changed, 639 insertions(+), 60 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-09-09  3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
@ 2020-09-24  6:28   ` Wanpeng Li
  2020-09-24 17:14     ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: Wanpeng Li @ 2020-09-24  6:28 UTC (permalink / raw)
  To: Yulei Zhang
  Cc: Paolo Bonzini, kvm, LKML, Sean Christopherson, Jim Mattson,
	Junaid Shahid, Ben Gardon, Vitaly Kuznetsov, Xiao Guangrong,
	Haiwei Li

Any comments? Paolo! :)
On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> Any comments? guys!
> On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> >
> > From: Yulei Zhang <yulei.kernel@gmail.com>
> >
> > Currently in KVM memory virtulization we relay on mmu_lock to
> > synchronize the memory mapping update, which make vCPUs work
> > in serialize mode and slow down the execution, especially after
> > migration to do substantial memory mapping will cause visible
> > performance drop, and it can get worse if guest has more vCPU
> > numbers and memories.
> >
> > The idea we present in this patch set is to mitigate the issue
> > with pre-constructed memory mapping table. We will fast pin the
> > guest memory to build up a global memory mapping table according
> > to the guest memslots changes and apply it to cr3, so that after
> > guest starts up all the vCPUs would be able to update the memory
> > simultaneously without page fault exception, thus the performance
> > improvement is expected.
> >
> > We use memory dirty pattern workload to test the initial patch
> > set and get positive result even with huge page enabled. For example,
> > we create guest with 32 vCPUs and 64G memories, and let the vcpus
> > dirty the entire memory region concurrently, as the initial patch
> > eliminate the overhead of mmu_lock, in 2M/1G huge page mode we would
> > get the job done in about 50% faster.
> >
> > We only validate this feature on Intel x86 platform. And as Ben
> > pointed out in RFC V1, so far we disable the SMM for resource
> > consideration, drop the mmu notification as in this case the
> > memory is pinned.
> >
> > V1->V2:
> > * Rebase the code to kernel version 5.9.0-rc1.
> >
> > Yulei Zhang (9):
> >   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> >     support
> >   Introduce page table population function for direct build EPT feature
> >   Introduce page table remove function for direct build EPT feature
> >   Add release function for direct build ept when guest VM exit
> >   Modify the page fault path to meet the direct build EPT requirement
> >   Apply the direct build EPT according to the memory slots change
> >   Add migration support when using direct build EPT
> >   Introduce kvm module parameter global_tdp to turn on the direct build
> >     EPT mode
> >   Handle certain mmu exposed functions properly while turn on direct
> >     build EPT mode
> >
> >  arch/mips/kvm/mips.c            |  13 +
> >  arch/powerpc/kvm/powerpc.c      |  13 +
> >  arch/s390/kvm/kvm-s390.c        |  13 +
> >  arch/x86/include/asm/kvm_host.h |  13 +-
> >  arch/x86/kvm/mmu/mmu.c          | 533 ++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/svm/svm.c          |   2 +-
> >  arch/x86/kvm/vmx/vmx.c          |   7 +-
> >  arch/x86/kvm/x86.c              |  55 ++--
> >  include/linux/kvm_host.h        |   7 +-
> >  virt/kvm/kvm_main.c             |  43 ++-
> >  10 files changed, 639 insertions(+), 60 deletions(-)
> >
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-09-24  6:28   ` Wanpeng Li
@ 2020-09-24 17:14     ` Ben Gardon
  2020-09-25 12:04       ` yulei zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2020-09-24 17:14 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Yulei Zhang, Paolo Bonzini, kvm, LKML, Sean Christopherson,
	Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
	Haiwei Li

On Wed, Sep 23, 2020 at 11:28 PM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> Any comments? Paolo! :)

Hi, sorry to be so late in replying! I wanted to post the first part
of the TDP MMU series I've been working on before responding so we
could discuss the two together, but I haven't been able to get it out
as fast as I would have liked. (I'll send it ASAP!) I'm hopeful that
it will ultimately help address some of the page fault handling and
lock contention issues you're addressing with these patches. I'd also
be happy to work together to add a prepopulation feature to it. I'll
put in some more comments inline below.

> On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > Any comments? guys!
> > On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> > >
> > > From: Yulei Zhang <yulei.kernel@gmail.com>
> > >
> > > Currently in KVM memory virtulization we relay on mmu_lock to
> > > synchronize the memory mapping update, which make vCPUs work
> > > in serialize mode and slow down the execution, especially after
> > > migration to do substantial memory mapping will cause visible
> > > performance drop, and it can get worse if guest has more vCPU
> > > numbers and memories.
> > >
> > > The idea we present in this patch set is to mitigate the issue
> > > with pre-constructed memory mapping table. We will fast pin the
> > > guest memory to build up a global memory mapping table according
> > > to the guest memslots changes and apply it to cr3, so that after
> > > guest starts up all the vCPUs would be able to update the memory
> > > simultaneously without page fault exception, thus the performance
> > > improvement is expected.

My understanding from this RFC is that your primary goal is to
eliminate page fault latencies and lock contention arising from the
first page faults incurred by vCPUs when initially populating the EPT.
Is that right?

I have the impression that the pinning and generally static memory
mappings are more a convenient simplification than part of a larger
goal to avoid incurring page faults down the line. Is that correct?

I ask because I didn't fully understand, from our conversation on v1
of this RFC, why reimplementing the page fault handler and associated
functions was necessary for the above goals, as I understood them.
My impression of the prepopulation approach is that, KVM will
sequentially populate all the EPT entries to map guest memory. I
understand how this could be optimized to be quite efficient, but I
don't understand how it would scale better than the existing
implementation with one vCPU accessing memory.

> > >
> > > We use memory dirty pattern workload to test the initial patch
> > > set and get positive result even with huge page enabled. For example,
> > > we create guest with 32 vCPUs and 64G memories, and let the vcpus
> > > dirty the entire memory region concurrently, as the initial patch
> > > eliminate the overhead of mmu_lock, in 2M/1G huge page mode we would
> > > get the job done in about 50% faster.

In this benchmark did you include the time required to pre-populate
the EPT or just the time required for the vCPUs to dirty memory?
I ask because I'm curious if your priority is to decrease the total
end-to-end time, or you just care about the guest experience, and not
so much the VM startup time.
How does this compare to the case where 1 vCPU reads every page of
memory and then 32 vCPUs concurrently dirty every page?

> > >
> > > We only validate this feature on Intel x86 platform. And as Ben
> > > pointed out in RFC V1, so far we disable the SMM for resource
> > > consideration, drop the mmu notification as in this case the
> > > memory is pinned.

I'm excited to see big MMU changes like this, and I look forward to
combining our needs towards a better MMU for the x86 TDP case. Have
you thought about how you would build SMM and MMU notifier support
onto this patch series? I know that the invalidate range notifiers, at
least, added a lot of non-trivial complexity to the direct MMU
implementation I presented last year.

> > >
> > > V1->V2:
> > > * Rebase the code to kernel version 5.9.0-rc1.
> > >
> > > Yulei Zhang (9):
> > >   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > >     support
> > >   Introduce page table population function for direct build EPT feature
> > >   Introduce page table remove function for direct build EPT feature
> > >   Add release function for direct build ept when guest VM exit
> > >   Modify the page fault path to meet the direct build EPT requirement
> > >   Apply the direct build EPT according to the memory slots change
> > >   Add migration support when using direct build EPT
> > >   Introduce kvm module parameter global_tdp to turn on the direct build
> > >     EPT mode
> > >   Handle certain mmu exposed functions properly while turn on direct
> > >     build EPT mode
> > >
> > >  arch/mips/kvm/mips.c            |  13 +
> > >  arch/powerpc/kvm/powerpc.c      |  13 +
> > >  arch/s390/kvm/kvm-s390.c        |  13 +
> > >  arch/x86/include/asm/kvm_host.h |  13 +-
> > >  arch/x86/kvm/mmu/mmu.c          | 533 ++++++++++++++++++++++++++++++--
> > >  arch/x86/kvm/svm/svm.c          |   2 +-
> > >  arch/x86/kvm/vmx/vmx.c          |   7 +-
> > >  arch/x86/kvm/x86.c              |  55 ++--
> > >  include/linux/kvm_host.h        |   7 +-
> > >  virt/kvm/kvm_main.c             |  43 ++-
> > >  10 files changed, 639 insertions(+), 60 deletions(-)
> > >
> > > --
> > > 2.17.1
> > >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-09-24 17:14     ` Ben Gardon
@ 2020-09-25 12:04       ` yulei zhang
  2020-09-25 17:30         ` Ben Gardon
  0 siblings, 1 reply; 22+ messages in thread
From: yulei zhang @ 2020-09-25 12:04 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Wanpeng Li, Paolo Bonzini, kvm, LKML, Sean Christopherson,
	Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
	Haiwei Li

On Fri, Sep 25, 2020 at 1:14 AM Ben Gardon <bgardon@google.com> wrote:
>
> On Wed, Sep 23, 2020 at 11:28 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > Any comments? Paolo! :)
>
> Hi, sorry to be so late in replying! I wanted to post the first part
> of the TDP MMU series I've been working on before responding so we
> could discuss the two together, but I haven't been able to get it out
> as fast as I would have liked. (I'll send it ASAP!) I'm hopeful that
> it will ultimately help address some of the page fault handling and
> lock contention issues you're addressing with these patches. I'd also
> be happy to work together to add a prepopulation feature to it. I'll
> put in some more comments inline below.
>

Thanks for the feedback and looking forward to your patchset.

> > On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
> > >
> > > Any comments? guys!
> > > On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> > > >
> > > > From: Yulei Zhang <yulei.kernel@gmail.com>
> > > >
> > > > Currently in KVM memory virtulization we relay on mmu_lock to
> > > > synchronize the memory mapping update, which make vCPUs work
> > > > in serialize mode and slow down the execution, especially after
> > > > migration to do substantial memory mapping will cause visible
> > > > performance drop, and it can get worse if guest has more vCPU
> > > > numbers and memories.
> > > >
> > > > The idea we present in this patch set is to mitigate the issue
> > > > with pre-constructed memory mapping table. We will fast pin the
> > > > guest memory to build up a global memory mapping table according
> > > > to the guest memslots changes and apply it to cr3, so that after
> > > > guest starts up all the vCPUs would be able to update the memory
> > > > simultaneously without page fault exception, thus the performance
> > > > improvement is expected.
>
> My understanding from this RFC is that your primary goal is to
> eliminate page fault latencies and lock contention arising from the
> first page faults incurred by vCPUs when initially populating the EPT.
> Is that right?
>

That's right.

> I have the impression that the pinning and generally static memory
> mappings are more a convenient simplification than part of a larger
> goal to avoid incurring page faults down the line. Is that correct?
>
> I ask because I didn't fully understand, from our conversation on v1
> of this RFC, why reimplementing the page fault handler and associated
> functions was necessary for the above goals, as I understood them.
> My impression of the prepopulation approach is that, KVM will
> sequentially populate all the EPT entries to map guest memory. I
> understand how this could be optimized to be quite efficient, but I
> don't understand how it would scale better than the existing
> implementation with one vCPU accessing memory.
>

I don't think our goal is to simply eliminate the page fault. Our target
scenario is live migration: when the workload resumes on the destination
VM after migration, it kicks off the vcpus to build the gfn to pfn
mapping, but the mmu_lock forces the vcpus to execute sequentially, which
significantly slows down the workload execution in the VM and affects the
end user experience, especially for memory sensitive workloads.
Pre-populating the EPT entries solves the problem smoothly as it allows
the vcpus to execute in parallel after migration.

> > > >
> > > > We use memory dirty pattern workload to test the initial patch
> > > > set and get positive result even with huge page enabled. For example,
> > > > we create guest with 32 vCPUs and 64G memories, and let the vcpus
> > > > dirty the entire memory region concurrently, as the initial patch
> > > > eliminate the overhead of mmu_lock, in 2M/1G huge page mode we would
> > > > get the job done in about 50% faster.
>
> In this benchmark did you include the time required to pre-populate
> the EPT or just the time required for the vCPUs to dirty memory?
> I ask because I'm curious if your priority is to decrease the total
> end-to-end time, or you just care about the guest experience, and not
> so much the VM startup time.

We compare the time for each vcpu thread to finish the dirty job. Yes, it
can take some time to pre-populate the page table, but since each vcpu
thread gains a huge advantage from the concurrent dirty writes, the total
result is still better even when the pre-population time is counted in.

> How does this compare to the case where 1 vCPU reads every page of
> memory and then 32 vCPUs concurrently dirty every page?
>

Haven't tried this yet, I think the major difference would be the page fault
latency introduced by the one vCPU read.

> > > >
> > > > We only validate this feature on Intel x86 platform. And as Ben
> > > > pointed out in RFC V1, so far we disable the SMM for resource
> > > > consideration, drop the mmu notification as in this case the
> > > > memory is pinned.
>
> I'm excited to see big MMU changes like this, and I look forward to
> combining our needs towards a better MMU for the x86 TDP case. Have
> you thought about how you would build SMM and MMU notifier support
> onto this patch series? I know that the invalidate range notifiers, at
> least, added a lot of non-trivial complexity to the direct MMU
> implementation I presented last year.
>

Thanks for the suggestion, I will think about it.

> > > >
> > > > V1->V2:
> > > > * Rebase the code to kernel version 5.9.0-rc1.
> > > >
> > > > Yulei Zhang (9):
> > > >   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > > >     support
> > > >   Introduce page table population function for direct build EPT feature
> > > >   Introduce page table remove function for direct build EPT feature
> > > >   Add release function for direct build ept when guest VM exit
> > > >   Modify the page fault path to meet the direct build EPT requirement
> > > >   Apply the direct build EPT according to the memory slots change
> > > >   Add migration support when using direct build EPT
> > > >   Introduce kvm module parameter global_tdp to turn on the direct build
> > > >     EPT mode
> > > >   Handle certain mmu exposed functions properly while turn on direct
> > > >     build EPT mode
> > > >
> > > >  arch/mips/kvm/mips.c            |  13 +
> > > >  arch/powerpc/kvm/powerpc.c      |  13 +
> > > >  arch/s390/kvm/kvm-s390.c        |  13 +
> > > >  arch/x86/include/asm/kvm_host.h |  13 +-
> > > >  arch/x86/kvm/mmu/mmu.c          | 533 ++++++++++++++++++++++++++++++--
> > > >  arch/x86/kvm/svm/svm.c          |   2 +-
> > > >  arch/x86/kvm/vmx/vmx.c          |   7 +-
> > > >  arch/x86/kvm/x86.c              |  55 ++--
> > > >  include/linux/kvm_host.h        |   7 +-
> > > >  virt/kvm/kvm_main.c             |  43 ++-
> > > >  10 files changed, 639 insertions(+), 60 deletions(-)
> > > >
> > > > --
> > > > 2.17.1
> > > >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-09-25 12:04       ` yulei zhang
@ 2020-09-25 17:30         ` Ben Gardon
  2020-09-25 20:50           ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2020-09-25 17:30 UTC (permalink / raw)
  To: yulei zhang
  Cc: Wanpeng Li, Paolo Bonzini, kvm, LKML, Sean Christopherson,
	Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
	Haiwei Li

On Fri, Sep 25, 2020 at 5:04 AM yulei zhang <yulei.kernel@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 1:14 AM Ben Gardon <bgardon@google.com> wrote:
> >
> > On Wed, Sep 23, 2020 at 11:28 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> > >
> > > Any comments? Paolo! :)
> >
> > Hi, sorry to be so late in replying! I wanted to post the first part
> > of the TDP MMU series I've been working on before responding so we
> > could discuss the two together, but I haven't been able to get it out
> > as fast as I would have liked. (I'll send it ASAP!) I'm hopeful that
> > it will ultimately help address some of the page fault handling and
> > lock contention issues you're addressing with these patches. I'd also
> > be happy to work together to add a prepopulation feature to it. I'll
> > put in some more comments inline below.
> >
>
> Thanks for the feedback and looking forward to your patchset.
>
> > > On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
> > > >
> > > > Any comments? guys!
> > > > On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> > > > >
> > > > > From: Yulei Zhang <yulei.kernel@gmail.com>
> > > > >
> > > > > Currently in KVM memory virtulization we relay on mmu_lock to
> > > > > synchronize the memory mapping update, which make vCPUs work
> > > > > in serialize mode and slow down the execution, especially after
> > > > > migration to do substantial memory mapping will cause visible
> > > > > performance drop, and it can get worse if guest has more vCPU
> > > > > numbers and memories.
> > > > >
> > > > > The idea we present in this patch set is to mitigate the issue
> > > > > with pre-constructed memory mapping table. We will fast pin the
> > > > > guest memory to build up a global memory mapping table according
> > > > > to the guest memslots changes and apply it to cr3, so that after
> > > > > guest starts up all the vCPUs would be able to update the memory
> > > > > simultaneously without page fault exception, thus the performance
> > > > > improvement is expected.
> >
> > My understanding from this RFC is that your primary goal is to
> > eliminate page fault latencies and lock contention arising from the
> > first page faults incurred by vCPUs when initially populating the EPT.
> > Is that right?
> >
>
> That's right.
>
> > I have the impression that the pinning and generally static memory
> > mappings are more a convenient simplification than part of a larger
> > goal to avoid incurring page faults down the line. Is that correct?
> >
> > I ask because I didn't fully understand, from our conversation on v1
> > of this RFC, why reimplementing the page fault handler and associated
> > functions was necessary for the above goals, as I understood them.
> > My impression of the prepopulation approach is that, KVM will
> > sequentially populate all the EPT entries to map guest memory. I
> > understand how this could be optimized to be quite efficient, but I
> > don't understand how it would scale better than the existing
> > implementation with one vCPU accessing memory.
> >
>
> I don't think our goal is to simply eliminate the page fault. Our target
> scenario is live migration: when the workload resumes on the destination
> VM after migration, it kicks off the vcpus to build the gfn to pfn
> mapping, but the mmu_lock forces the vcpus to execute sequentially, which
> significantly slows down the workload execution in the VM and affects the
> end user experience, especially for memory sensitive workloads.
> Pre-populating the EPT entries solves the problem smoothly as it allows
> the vcpus to execute in parallel after migration.

Oh, thank you for explaining that. I didn't realize the goal here was
to improve LM performance. I was under the impression that this was to
give VMs a better experience on startup for fast scaling or something.
In your testing with live migration how has this affected the
distribution of time between the phases of live migration? Just for
terminology (since I'm not sure how standard it is across the
industry) I think of a live migration as consisting of 3 stages:
precopy, blackout, and postcopy. In precopy we're tracking the VM's
working set via dirty logging and sending the contents of its memory
to the target host. In blackout we pause the vCPUs on the source, copy
minimal data to the target, and resume the vCPUs on the target. In
postcopy we may still have some pages that have not been copied to the
target and so request those in response to vCPU page faults via user
fault fd or some other mechanism.

Does EPT pre-population preclude the use of a postcopy phase? I would
expect that to make the blackout phase really long. Has that not been
a problem for you?

I love the idea of partial EPT pre-population during precopy if you
could still handle postcopy and just pre-populate as memory came in.

>
> > > > >
> > > > > We use memory dirty pattern workload to test the initial patch
> > > > > set and get positive result even with huge page enabled. For example,
> > > > > we create guest with 32 vCPUs and 64G memories, and let the vcpus
> > > > > dirty the entire memory region concurrently, as the initial patch
> > > > > eliminate the overhead of mmu_lock, in 2M/1G huge page mode we would
> > > > > get the job done in about 50% faster.
> >
> > In this benchmark did you include the time required to pre-populate
> > the EPT or just the time required for the vCPUs to dirty memory?
> > I ask because I'm curious if your priority is to decrease the total
> > end-to-end time, or you just care about the guest experience, and not
> > so much the VM startup time.
>
> We compare the time for each vcpu thread to finish the dirty job. Yes, it
> can take some time to pre-populate the page table, but since each vcpu
> thread gains a huge advantage from the concurrent dirty writes, the total
> result is still better even when the pre-population time is counted in.

That makes sense to me. Your implementation definitely seems more
efficient than the existing PF handling path. It's probably much
easier to parallelize as a sort of recursive population operation too.

>
> > How does this compare to the case where 1 vCPU reads every page of
> > memory and then 32 vCPUs concurrently dirty every page?
> >
>
> Haven't tried this yet, I think the major difference would be the page fault
> latency introduced by the one vCPU read.

I agree. The whole VM exit path adds a lot of overhead. I wonder what
kind of numbers you'd get if you cranked PTE_PREFETCH_NUM way up
though. If you set that to >= your memory size, one PF could
pre-populate the entire EPT. It's a silly approach, but it would be a
lot more efficient as an easy POC.
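
For reference, the knob for that quick experiment is the PTE_PREFETCH_NUM
constant in arch/x86/kvm/mmu/mmu.c, which bounds how many neighbouring
PTEs are prefetched per fault; a throwaway hack (the value below is purely
illustrative) would be no more than:

        /* arch/x86/kvm/mmu/mmu.c -- throwaway POC tweak, not a real patch */
        #define PTE_PREFETCH_NUM	512	/* bumped from the small default for the POC */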

>
> > > > >
> > > > > We only validate this feature on Intel x86 platform. And as Ben
> > > > > pointed out in RFC V1, so far we disable the SMM for resource
> > > > > consideration, drop the mmu notification as in this case the
> > > > > memory is pinned.
> >
> > I'm excited to see big MMU changes like this, and I look forward to
> > combining our needs towards a better MMU for the x86 TDP case. Have
> > you thought about how you would build SMM and MMU notifier support
> > onto this patch series? I know that the invalidate range notifiers, at
> > least, added a lot of non-trivial complexity to the direct MMU
> > implementation I presented last year.
> >
>
> Thanks for the suggestion, I will think about it.
>
> > > > >
> > > > > V1->V2:
> > > > > * Rebase the code to kernel version 5.9.0-rc1.
> > > > >
> > > > > Yulei Zhang (9):
> > > > >   Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > > > >     support
> > > > >   Introduce page table population function for direct build EPT feature
> > > > >   Introduce page table remove function for direct build EPT feature
> > > > >   Add release function for direct build ept when guest VM exit
> > > > >   Modify the page fault path to meet the direct build EPT requirement
> > > > >   Apply the direct build EPT according to the memory slots change
> > > > >   Add migration support when using direct build EPT
> > > > >   Introduce kvm module parameter global_tdp to turn on the direct build
> > > > >     EPT mode
> > > > >   Handle certain mmu exposed functions properly while turn on direct
> > > > >     build EPT mode
> > > > >
> > > > >  arch/mips/kvm/mips.c            |  13 +
> > > > >  arch/powerpc/kvm/powerpc.c      |  13 +
> > > > >  arch/s390/kvm/kvm-s390.c        |  13 +
> > > > >  arch/x86/include/asm/kvm_host.h |  13 +-
> > > > >  arch/x86/kvm/mmu/mmu.c          | 533 ++++++++++++++++++++++++++++++--
> > > > >  arch/x86/kvm/svm/svm.c          |   2 +-
> > > > >  arch/x86/kvm/vmx/vmx.c          |   7 +-
> > > > >  arch/x86/kvm/x86.c              |  55 ++--
> > > > >  include/linux/kvm_host.h        |   7 +-
> > > > >  virt/kvm/kvm_main.c             |  43 ++-
> > > > >  10 files changed, 639 insertions(+), 60 deletions(-)
> > > > >
> > > > > --
> > > > > 2.17.1
> > > > >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-09-25 17:30         ` Ben Gardon
@ 2020-09-25 20:50           ` Paolo Bonzini
  2020-09-28 11:52             ` yulei zhang
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2020-09-25 20:50 UTC (permalink / raw)
  To: Ben Gardon, yulei zhang
  Cc: Wanpeng Li, kvm, LKML, Sean Christopherson, Jim Mattson,
	Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong, Haiwei Li

On 25/09/20 19:30, Ben Gardon wrote:
> Oh, thank you for explaining that. I didn't realize the goal here was
> to improve LM performance. I was under the impression that this was to
> give VMs a better experience on startup for fast scaling or something.
> In your testing with live migration how has this affected the
> distribution of time between the phases of live migration? Just for
> terminology (since I'm not sure how standard it is across the
> industry) I think of a live migration as consisting of 3 stages:
> precopy, blackout, and postcopy. In precopy we're tracking the VM's
> working set via dirty logging and sending the contents of its memory
> to the target host. In blackout we pause the vCPUs on the source, copy
> minimal data to the target, and resume the vCPUs on the target. In
> postcopy we may still have some pages that have not been copied to the
> target and so request those in response to vCPU page faults via user
> fault fd or some other mechanism.
> 
> Does EPT pre-population preclude the use of a postcopy phase?

I think so.

As a quick recap, postcopy migration handles two kinds of
pages: they can be copied to the destination either in the background
(stuff that was dirty when userspace decided to transition to the
blackout phase) or on demand (relayed from KVM to userspace via
get_user_pages and userfaultfd).  Normally only on-demand pages would be
served through userfaultfd, while with prepopulation every missing page
would be faulted in from the kernel through userfaultfd.  In practice
this would just extend the blackout phase.
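
For concreteness, the on-demand leg works roughly like this on the
userspace side: the VMM reads fault events from the userfaultfd and
resolves each one with UFFDIO_COPY once it has fetched the page from the
source.  A minimal sketch (error handling omitted; uffd, page_buf and
page_size are assumed to be set up elsewhere):

        #include <linux/userfaultfd.h>
        #include <sys/ioctl.h>
        #include <unistd.h>

        static void resolve_one_fault(int uffd, void *page_buf, long page_size)
        {
                struct uffd_msg msg;
                struct uffdio_copy copy;

                if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
                    msg.event != UFFD_EVENT_PAGEFAULT)
                        return;

                copy.dst  = msg.arg.pagefault.address & ~(page_size - 1);
                copy.src  = (unsigned long)page_buf;  /* page brought over from the source */
                copy.len  = page_size;
                copy.mode = 0;
                ioctl(uffd, UFFDIO_COPY, &copy);
        }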

Paolo

> I would
> expect that to make the blackout phase really long. Has that not been
> a problem for you?
> 
> I love the idea of partial EPT pre-population during precopy if you
> could still handle postcopy and just pre-populate as memory came in.
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
  2020-09-25 20:50           ` Paolo Bonzini
@ 2020-09-28 11:52             ` yulei zhang
  0 siblings, 0 replies; 22+ messages in thread
From: yulei zhang @ 2020-09-28 11:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, Wanpeng Li, kvm, LKML, Sean Christopherson,
	Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
	Haiwei Li

On Sat, Sep 26, 2020 at 4:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 25/09/20 19:30, Ben Gardon wrote:
> > Oh, thank you for explaining that. I didn't realize the goal here was
> > to improve LM performance. I was under the impression that this was to
> > give VMs a better experience on startup for fast scaling or something.
> > In your testing with live migration how has this affected the
> > distribution of time between the phases of live migration? Just for
> > terminology (since I'm not sure how standard it is across the
> > industry) I think of a live migration as consisting of 3 stages:
> > precopy, blackout, and postcopy. In precopy we're tracking the VM's
> > working set via dirty logging and sending the contents of its memory
> > to the target host. In blackout we pause the vCPUs on the source, copy
> > minimal data to the target, and resume the vCPUs on the target. In
> > postcopy we may still have some pages that have not been copied to the
> > target and so request those in response to vCPU page faults via user
> > fault fd or some other mechanism.
> >
> > Does EPT pre-population preclude the use of a postcopy phase?
>
> I think so.
>
> As a quick recap, postcopy migration handles two kinds of
> pages---they can be copied to the destination either in the background
> (stuff that was dirty when userspace decided to transition to the
> blackout phase) or on-demand (relayed from KVM to userspace via
> get_user_pages and userfaultfd).  Normally only on-demand pages would be
> served through userfaultfd, while with prepopulation every missing page
> would be faulted in from the kernel through userfaultfd.  In practice
> this would just extend the blackout phase.
>
> Paolo
>

Yep, you are right, the current implementation doesn't support postcopy.
Thanks for the suggestion, we will try to fill the gap with proper EPT
population during postcopy.

> > I would
> > expect that to make the blackout phase really long. Has that not been
> > a problem for you?
> >
> > I love the idea of partial EPT pre-population during precopy if you
> > could still handle postcopy and just pre-populate as memory came in.
> >
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC V2 2/9] Introduce page table population function for direct build EPT feature
@ 2020-09-01 19:56 kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 19:56 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 1524 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <f0c109e76f3cd4a1bfd1ca3ff74e0d36c0288ca9.1598868204.git.yulei.kernel@gmail.com>
References: <f0c109e76f3cd4a1bfd1ca3ff74e0d36c0288ca9.1598868204.git.yulei.kernel@gmail.com>
TO: yulei.kernel(a)gmail.com

Hi,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on kvm/linux-next]
[also build test WARNING on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
:::::: branch date: 6 hours ago
:::::: commit date: 6 hours ago
config: x86_64-randconfig-c002-20200901 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>


coccinelle warnings: (new ones prefixed by >>)

>> arch/x86/kvm/mmu/mmu.c:6299:5-8: Unneeded variable: "ret". Return "0" on line 6349

Please review and possibly fold the followup patch.
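
(The coccinelle check above refers to a pattern like the reduction
below; this is illustrative only, not the code from
arch/x86/kvm/mmu/mmu.c.)

/* Pattern being flagged: 'ret' is set once and never modified. */
static int populate_direct_ept_example(void)
{
        int ret = 0;

        /* ... work that never assigns to ret on this path ... */

        return ret;     /* coccinelle: unneeded variable, just return 0 */
}

/* Suggested shape of the fix: drop the variable. */
static int populate_direct_ept_example_fixed(void)
{
        /* ... */
        return 0;
}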

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35445 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-09-28 11:53 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
2020-09-01 17:33   ` kernel test robot
2020-09-01 19:04   ` kernel test robot
2020-09-01 11:55 ` [RFC V2 3/9] Introduce page table remove " yulei.kernel
2020-09-01 11:55 ` [RFC V2 4/9] Add release function for direct build ept when guest VM exit yulei.kernel
2020-09-01 11:56 ` [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement yulei.kernel
2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
2020-09-01 22:20   ` kernel test robot
2020-09-02  7:00   ` kernel test robot
2020-09-01 11:56 ` [RFC V2 7/9] Add migration support when using direct build EPT yulei.kernel
2020-09-01 11:57 ` [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode yulei.kernel
2020-09-01 11:57 ` [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on " yulei.kernel
2020-09-09  3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
2020-09-24  6:28   ` Wanpeng Li
2020-09-24 17:14     ` Ben Gardon
2020-09-25 12:04       ` yulei zhang
2020-09-25 17:30         ` Ben Gardon
2020-09-25 20:50           ` Paolo Bonzini
2020-09-28 11:52             ` yulei zhang
2020-09-01 19:56 [RFC V2 2/9] Introduce page table population function for direct build EPT feature kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.