* [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
@ 2020-09-01 11:52 yulei.kernel
2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
` (9 more replies)
0 siblings, 10 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:52 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang
From: Yulei Zhang <yulei.kernel@gmail.com>
Currently KVM memory virtualization relies on mmu_lock to
synchronize memory mapping updates, which forces the vCPUs to
work serially and slows down execution. In particular, the
substantial memory mapping work after a migration causes a
visible performance drop, and it gets worse as the guest has
more vCPUs and more memory.
The idea we present in this patch set is to mitigate the issue
with a pre-constructed memory mapping table. We fast-pin the
guest memory and build up a global memory mapping table according
to the guest memslot changes, then apply it to cr3, so that after
the guest starts up all the vCPUs can update memory simultaneously
without taking page fault exceptions; a performance improvement
is therefore expected.
We used a memory dirty pattern workload to test the initial patch
set and got a positive result even with huge pages enabled. For
example, we created a guest with 32 vCPUs and 64G of memory and let
the vCPUs dirty the entire memory region concurrently. Since the
patch set eliminates the mmu_lock overhead, in 2M/1G huge page mode
the job finishes about 50% faster.
We have only validated this feature on the Intel x86 platform. As
Ben pointed out in RFC V1, SMM is disabled for now for resource
considerations, and the mmu notifier is dropped because in this
case the memory is pinned.
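The approach can be pictured with a toy model (plain C, not KVM code; the names and constants here are illustrative only): instead of filling translation entries lazily from the fault path under a lock, every entry is populated once up front, so no vCPU ever takes a first-touch fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES  1024
#define PRESENT (1ULL << 0)

/* Toy "EPT": one entry per guest frame.  faults counts how many
 * times the lazy population path had to run. */
static uint64_t ept[NPAGES];
static unsigned long faults;

static uint64_t access_gfn(size_t gfn)
{
	if (!(ept[gfn] & PRESENT)) {	/* lazy path: a "page fault" */
		faults++;
		ept[gfn] = ((uint64_t)gfn << 12) | PRESENT;
	}
	return ept[gfn];
}

/* Pre-construct every entry before any vCPU runs. */
static void prepopulate(void)
{
	for (size_t gfn = 0; gfn < NPAGES; gfn++)
		ept[gfn] = ((uint64_t)gfn << 12) | PRESENT;
}
```

After prepopulate(), concurrent accesses by all vCPUs hit only the present-entry fast path, which is the effect the series aims for.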
V1->V2:
* Rebase the code to kernel version 5.9.0-rc1.
Yulei Zhang (9):
Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
support
Introduce page table population function for direct build EPT feature
Introduce page table remove function for direct build EPT feature
Add release function for direct build ept when guest VM exit
Modify the page fault path to meet the direct build EPT requirement
Apply the direct build EPT according to the memory slots change
Add migration support when using direct build EPT
Introduce kvm module parameter global_tdp to turn on the direct build
EPT mode
Handle certain mmu exposed functions properly while turn on direct
build EPT mode
arch/mips/kvm/mips.c | 13 +
arch/powerpc/kvm/powerpc.c | 13 +
arch/s390/kvm/kvm-s390.c | 13 +
arch/x86/include/asm/kvm_host.h | 13 +-
arch/x86/kvm/mmu/mmu.c | 533 ++++++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 7 +-
arch/x86/kvm/x86.c | 55 ++--
include/linux/kvm_host.h | 7 +-
virt/kvm/kvm_main.c | 43 ++-
10 files changed, 639 insertions(+), 60 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
@ 2020-09-01 11:54 ` yulei.kernel
2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
` (8 subsequent siblings)
9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:54 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang
From: Yulei Zhang <yuleixzhang@tencent.com>
Add the field global_root_hpa to kvm_arch to save the direct build
global EPT root pointer, and add the per-vCPU flag direct_build_tdp
to indicate that the global EPT root pointer is in use.
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/include/asm/kvm_host.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ab3af7275d8..485b1239ad39 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -788,6 +788,9 @@ struct kvm_vcpu_arch {
/* AMD MSRC001_0015 Hardware Configuration */
u64 msr_hwcr;
+
+ /* vcpu uses the pre-constructed EPT */
+ bool direct_build_tdp;
};
struct kvm_lpage_info {
@@ -963,6 +966,8 @@ struct kvm_arch {
struct kvm_pmu_event_filter *pmu_event_filter;
struct task_struct *nx_lpage_recovery_thread;
+ /* global root hpa for pre-constructed EPT */
+ hpa_t global_root_hpa;
};
struct kvm_vm_stat {
--
2.17.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [RFC V2 2/9] Introduce page table population function for direct build EPT feature
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
@ 2020-09-01 11:55 ` yulei.kernel
2020-09-01 17:33 ` kernel test robot
2020-09-01 19:04 ` kernel test robot
2020-09-01 11:55 ` [RFC V2 3/9] Introduce page table remove " yulei.kernel
` (7 subsequent siblings)
9 siblings, 2 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:55 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang, Yulei Zhang
From: Yulei Zhang <yulei.kernel@gmail.com>
The page table population function pins the memory and pre-constructs
the EPT based on the input memory slot configuration, so that it does
not rely on page fault exceptions to set up the page table.
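The walker this patch introduces steps one paging level at a time, selecting a 9-bit index per level on top of the 12-bit page offset. A minimal standalone sketch of that index computation (constants assumed to mirror PAGE_SHIFT and PT64_LEVEL_BITS; this is not the kernel code itself):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4K base pages */
#define PT64_LEVEL_BITS	9	/* 512 entries per table page */

/* Index into the table page at the given level (1 = leaf, 4 = root)
 * for a guest-physical address. */
static unsigned pt_index(uint64_t addr, int level)
{
	return (addr >> (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS))
	       & ((1u << PT64_LEVEL_BITS) - 1);
}
```

Each iteration of the walk reads the entry at pt_index(addr, level) and descends until it reaches the level at which the mapping should be installed.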
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 212 +++++++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 7 +-
include/linux/kvm_host.h | 4 +-
virt/kvm/kvm_main.c | 30 ++++-
6 files changed, 244 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 485b1239ad39..ab3cbef8c1aa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1138,7 +1138,7 @@ struct kvm_x86_ops {
int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
- u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+ u64 (*get_mt_mask)(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, unsigned long pgd,
int pgd_level);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e03841f053d..bfe4d2b3e809 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -241,6 +241,11 @@ struct kvm_shadow_walk_iterator {
({ spte = mmu_spte_get_lockless(_walker.sptep); 1; }); \
__shadow_walk_next(&(_walker), spte))
+#define for_each_direct_build_shadow_entry(_walker, shadow_addr, _addr, level) \
+ for (__shadow_walk_init(&(_walker), shadow_addr, _addr, level); \
+ shadow_walk_okay(&(_walker)); \
+ shadow_walk_next(&(_walker)))
+
static struct kmem_cache *pte_list_desc_cache;
static struct kmem_cache *mmu_page_header_cache;
static struct percpu_counter kvm_total_used_mmu_pages;
@@ -2506,13 +2511,20 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
return sp;
}
+static void __shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
+ hpa_t shadow_addr, u64 addr, int level)
+{
+ iterator->addr = addr;
+ iterator->shadow_addr = shadow_addr;
+ iterator->level = level;
+ iterator->sptep = NULL;
+}
+
static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterator,
struct kvm_vcpu *vcpu, hpa_t root,
u64 addr)
{
- iterator->addr = addr;
- iterator->shadow_addr = root;
- iterator->level = vcpu->arch.mmu->shadow_root_level;
+ __shadow_walk_init(iterator, root, addr, vcpu->arch.mmu->shadow_root_level);
if (iterator->level == PT64_ROOT_4LEVEL &&
vcpu->arch.mmu->root_level < PT64_ROOT_4LEVEL &&
@@ -3014,7 +3026,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (level > PG_LEVEL_4K)
spte |= PT_PAGE_SIZE_MASK;
if (tdp_enabled)
- spte |= kvm_x86_ops.get_mt_mask(vcpu, gfn,
+ spte |= kvm_x86_ops.get_mt_mask(vcpu->kvm, vcpu, gfn,
kvm_is_mmio_pfn(pfn));
if (host_writable)
@@ -6278,6 +6290,198 @@ int kvm_mmu_module_init(void)
return ret;
}
+static int direct_build_tdp_set_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+ u64 *sptep, unsigned pte_access, int level,
+ gfn_t gfn, kvm_pfn_t pfn, bool speculative,
+ bool dirty, bool host_writable)
+{
+ u64 spte = 0;
+ int ret = 0;
+ /*
+ * For the EPT case, shadow_present_mask is 0 if hardware
+ * supports exec-only page table entries. In that case,
+ * ACC_USER_MASK and shadow_user_mask are used to represent
+ * read access. See FNAME(gpte_access) in paging_tmpl.h.
+ */
+ spte |= shadow_present_mask;
+ if (!speculative)
+ spte |= shadow_accessed_mask;
+
+ if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
+ is_nx_huge_page_enabled()) {
+ pte_access &= ~ACC_EXEC_MASK;
+ }
+
+ if (pte_access & ACC_EXEC_MASK)
+ spte |= shadow_x_mask;
+ else
+ spte |= shadow_nx_mask;
+
+ if (pte_access & ACC_USER_MASK)
+ spte |= shadow_user_mask;
+
+ if (level > PG_LEVEL_4K)
+ spte |= PT_PAGE_SIZE_MASK;
+
+ if (tdp_enabled)
+ spte |= kvm_x86_ops.get_mt_mask(kvm, NULL, gfn, kvm_is_mmio_pfn(pfn));
+
+ if (host_writable)
+ spte |= SPTE_HOST_WRITEABLE;
+ else
+ pte_access &= ~ACC_WRITE_MASK;
+
+ spte |= (u64)pfn << PAGE_SHIFT;
+
+ if (pte_access & ACC_WRITE_MASK) {
+
+ spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
+
+ if (dirty) {
+ mark_page_dirty_in_slot(slot, gfn);
+ spte |= shadow_dirty_mask;
+ }
+ }
+
+ if (mmu_spte_update(sptep, spte))
+ kvm_flush_remote_tlbs(kvm);
+
+ return ret;
+}
+
+static void __kvm_walk_global_page(struct kvm *kvm, u64 addr, int level)
+{
+ int i;
+ kvm_pfn_t pfn;
+ u64 *sptep = (u64 *)__va(addr);
+
+ for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+ if (is_shadow_present_pte(sptep[i])) {
+ if (!is_last_spte(sptep[i], level)) {
+ __kvm_walk_global_page(kvm, sptep[i] & PT64_BASE_ADDR_MASK, level - 1);
+ } else {
+ pfn = spte_to_pfn(sptep[i]);
+ mmu_spte_clear_track_bits(&sptep[i]);
+ kvm_release_pfn_clean(pfn);
+ }
+ }
+ }
+ put_page(pfn_to_page(addr >> PAGE_SHIFT));
+}
+
+static int direct_build_tdp_map(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn,
+ kvm_pfn_t pfn, int level)
+{
+ int ret = 0;
+
+ struct kvm_shadow_walk_iterator iterator;
+ kvm_pfn_t old_pfn;
+ u64 spte;
+
+ for_each_direct_build_shadow_entry(iterator, kvm->arch.global_root_hpa,
+ gfn << PAGE_SHIFT, max_tdp_level) {
+ if (iterator.level == level) {
+ break;
+ }
+
+ if (!is_shadow_present_pte(*iterator.sptep)) {
+ struct page *page;
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return 0;
+
+ spte = page_to_phys(page) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+ shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+ mmu_spte_set(iterator.sptep, spte);
+ }
+ }
+ /* if a pte is already present, release the original pfn */
+ if (is_shadow_present_pte(*iterator.sptep)) {
+ if (level > PG_LEVEL_4K)
+ __kvm_walk_global_page(kvm, (*iterator.sptep) & PT64_BASE_ADDR_MASK, level - 1);
+ else {
+ old_pfn = spte_to_pfn(*iterator.sptep);
+ mmu_spte_clear_track_bits(iterator.sptep);
+ kvm_release_pfn_clean(old_pfn);
+ }
+ }
+ direct_build_tdp_set_spte(kvm, slot, iterator.sptep, ACC_ALL, level, gfn, pfn, false, true, true);
+
+ return ret;
+}
+
+static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
+{
+ unsigned long page_size;
+ int i, ret = 0;
+
+ page_size = kvm_host_page_size(kvm, NULL, gfn);
+
+ for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+ if (page_size >= KVM_HPAGE_SIZE(i))
+ ret = i;
+ else
+ break;
+ }
+
+ return ret;
+}
+
+int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ int host_level, max_level, level;
+ struct kvm_lpage_info *linfo;
+
+ host_level = host_mapping_level(kvm, gfn);
+ if (host_level != PG_LEVEL_4K) {
+ max_level = min(max_huge_page_level, host_level);
+ for (level = PG_LEVEL_4K; level <= max_level; ++level) {
+ linfo = lpage_info_slot(gfn, slot, level);
+ if (linfo->disallow_lpage)
+ break;
+ }
+ host_level = level - 1;
+ }
+ return host_level;
+}
+
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ gfn_t gfn;
+ kvm_pfn_t pfn;
+ int host_level;
+
+ if (!kvm->arch.global_root_hpa) {
+ struct page *page;
+ WARN_ON(!tdp_enabled);
+ WARN_ON(max_tdp_level != PT64_ROOT_4LEVEL);
+
+ /* init global root hpa */
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+
+ kvm->arch.global_root_hpa = page_to_phys(page);
+ }
+
+ /* setup page table for the slot */
+ for (gfn = slot->base_gfn;
+ gfn < slot->base_gfn + slot->npages;
+ gfn += KVM_PAGES_PER_HPAGE(host_level)) {
+ pfn = gfn_to_pfn_try_write(slot, gfn);
+ if ((pfn & KVM_PFN_ERR_FAULT) || is_noslot_pfn(pfn))
+ return -ENOMEM;
+
+ host_level = direct_build_mapping_level(kvm, slot, gfn);
+
+ if (host_level > PG_LEVEL_4K)
+ MMU_WARN_ON(gfn & (KVM_PAGES_PER_HPAGE(host_level) - 1));
+ direct_build_tdp_map(kvm, slot, gfn, pfn, host_level);
+ }
+
+ return 0;
+}
+
/*
* Calculate mmu pages needed for kvm.
*/
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 03dd7bac8034..3b7ee65cd941 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3607,7 +3607,7 @@ static bool svm_has_emulated_msr(u32 index)
return true;
}
-static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+static u64 svm_get_mt_mask(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
return 0;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46ba2e03a892..6f79343ed40e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7106,7 +7106,7 @@ static int __init vmx_check_processor_compat(void)
return 0;
}
-static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+static u64 vmx_get_mt_mask(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
{
u8 cache;
u64 ipat = 0;
@@ -7134,12 +7134,15 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
goto exit;
}
- if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+ if (!kvm_arch_has_noncoherent_dma(kvm)) {
ipat = VMX_EPT_IPAT_BIT;
cache = MTRR_TYPE_WRBACK;
goto exit;
}
+ if (!vcpu)
+ vcpu = kvm->vcpus[0];
+
if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
ipat = VMX_EPT_IPAT_BIT;
if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a23076765b4c..8901862ba2a3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -694,6 +694,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
struct kvm_memory_slot *old,
const struct kvm_memory_slot *new,
enum kvm_mr_change change);
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
/* flush all memory translations */
void kvm_arch_flush_shadow_all(struct kvm *kvm);
/* flush memory translations pointing to 'slot' */
@@ -721,6 +722,7 @@ kvm_pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
bool atomic, bool *async, bool write_fault,
bool *writable);
+kvm_pfn_t gfn_to_pfn_try_write(struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_release_pfn_clean(kvm_pfn_t pfn);
void kvm_release_pfn_dirty(kvm_pfn_t pfn);
@@ -775,7 +777,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
-unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
+unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 737666db02de..47fc18b05c53 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -143,7 +143,7 @@ static void hardware_disable_all(void);
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
-static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
__visible bool kvm_rebooting;
EXPORT_SYMBOL_GPL(kvm_rebooting);
@@ -1689,14 +1689,17 @@ bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_is_visible_gfn);
-unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
+unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn)
{
struct vm_area_struct *vma;
unsigned long addr, size;
size = PAGE_SIZE;
- addr = kvm_vcpu_gfn_to_hva_prot(vcpu, gfn, NULL);
+ if (vcpu)
+ addr = kvm_vcpu_gfn_to_hva_prot(vcpu, gfn, NULL);
+ else
+ addr = gfn_to_hva(kvm, gfn);
if (kvm_is_error_hva(addr))
return PAGE_SIZE;
@@ -1989,6 +1992,25 @@ static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
return pfn;
}
+/* Map the pfn for direct EPT mode; if the mapping fails and the memslot
+ * is read-only, retry the mapping with the read-only flag.
+ */
+kvm_pfn_t gfn_to_pfn_try_write(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ kvm_pfn_t pfn;
+ unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, !memslot_is_readonly(slot));
+
+ if (kvm_is_error_hva(addr))
+ return KVM_PFN_NOSLOT;
+
+ pfn = hva_to_pfn(addr, false, NULL, true, NULL);
+ if (pfn & KVM_PFN_ERR_FAULT) {
+ if (memslot_is_readonly(slot))
+ pfn = hva_to_pfn(addr, false, NULL, false, NULL);
+ }
+ return pfn;
+}
+
kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
bool atomic, bool *async, bool write_fault,
bool *writable)
@@ -2638,7 +2660,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
}
EXPORT_SYMBOL_GPL(kvm_clear_guest);
-static void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
gfn_t gfn)
{
if (memslot && memslot->dirty_bitmap) {
--
2.17.1
* [RFC V2 3/9] Introduce page table remove function for direct build EPT feature
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
@ 2020-09-01 11:55 ` yulei.kernel
2020-09-01 11:55 ` [RFC V2 4/9] Add release function for direct build ept when guest VM exit yulei.kernel
` (6 subsequent siblings)
9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:55 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang
From: Yulei Zhang <yuleixzhang@tencent.com>
While the guest boots up it modifies the memory slots multiple times,
so add a page table remove function to free the pre-pinned memory
according to the memory slot changes.
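A table page may only be freed once no other entry in it is still present, which is the invariant the remove walk tracks with its `present` flag. A toy illustration of that check (hypothetical helper, not the patch's code; fan-out constant assumed to be 512):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES	512		/* entries per table page */
#define PRESENT	(1ULL << 0)

/* A table page can be freed only when every entry is non-present;
 * otherwise another slot's mapping still routes through it. */
static bool table_is_empty(const uint64_t *table)
{
	for (int i = 0; i < ENTRIES; i++)
		if (table[i] & PRESENT)
			return false;
	return true;
}
```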
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/kvm/mmu/mmu.c | 56 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bfe4d2b3e809..03c5e73b96cb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6482,6 +6482,62 @@ int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *
return 0;
}
+static int __kvm_remove_spte(struct kvm *kvm, u64 *addr, gfn_t gfn, int level)
+{
+ int i;
+ int ret = level;
+ bool present = false;
+ kvm_pfn_t pfn;
+ u64 *sptep = (u64 *)__va((*addr) & PT64_BASE_ADDR_MASK);
+ unsigned index = SHADOW_PT_INDEX(gfn << PAGE_SHIFT, level);
+
+ for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+ if (is_shadow_present_pte(sptep[i])) {
+ if (i == index) {
+ if (!is_last_spte(sptep[i], level)) {
+ ret = __kvm_remove_spte(kvm, &sptep[i], gfn, level - 1);
+ if (is_shadow_present_pte(sptep[i]))
+ return ret;
+ } else {
+ pfn = spte_to_pfn(sptep[i]);
+ mmu_spte_clear_track_bits(&sptep[i]);
+ kvm_release_pfn_clean(pfn);
+ if (present)
+ return ret;
+ }
+ } else {
+ if (i > index)
+ return ret;
+ else
+ present = true;
+ }
+ }
+ }
+
+ if (!present) {
+ pfn = spte_to_pfn(*addr);
+ mmu_spte_clear_track_bits(addr);
+ kvm_release_pfn_clean(pfn);
+ }
+ return ret;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ gfn_t gfn = slot->base_gfn;
+ int host_level;
+
+ if (!kvm->arch.global_root_hpa)
+ return;
+
+ for (gfn = slot->base_gfn;
+ gfn < slot->base_gfn + slot->npages;
+ gfn += KVM_PAGES_PER_HPAGE(host_level))
+ host_level = __kvm_remove_spte(kvm, &(kvm->arch.global_root_hpa), gfn, PT64_ROOT_4LEVEL);
+
+ kvm_flush_remote_tlbs(kvm);
+}
+
/*
* Calculate mmu pages needed for kvm.
*/
--
2.17.1
* [RFC V2 4/9] Add release function for direct build ept when guest VM exit
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
` (2 preceding siblings ...)
2020-09-01 11:55 ` [RFC V2 3/9] Introduce page table remove " yulei.kernel
@ 2020-09-01 11:55 ` yulei.kernel
2020-09-01 11:56 ` [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement yulei.kernel
` (5 subsequent siblings)
9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:55 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang, Yulei Zhang
From: Yulei Zhang <yulei.kernel@gmail.com>
Release the pre-pinned memory of the direct build EPT when the
guest VM exits.
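The release walk is a depth-first traversal that drops every present leaf and then frees the table page itself. A self-contained toy analogue (heap arrays stand in for page-table pages, and "releasing a pfn" is just a counter; not KVM code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES	8		/* tiny fan-out for the sketch */
#define PRESENT	(1ULL << 0)

static unsigned long released_leaves;

/* Recursively release all present leaves below this table page,
 * then free the table page itself (level 1 entries are leaves). */
static void walk_release(uint64_t *table, int level)
{
	for (int i = 0; i < ENTRIES; i++) {
		if (!(table[i] & PRESENT))
			continue;
		if (level > 1)
			walk_release((uint64_t *)(uintptr_t)(table[i] & ~1ULL),
				     level - 1);
		else
			released_leaves++;
	}
	free(table);
}
```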
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/kvm/mmu/mmu.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 03c5e73b96cb..f2124f52b286 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4309,8 +4309,11 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
bool skip_mmu_sync)
{
- __kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu),
- skip_tlb_flush, skip_mmu_sync);
+ if (!vcpu->arch.direct_build_tdp)
+ __kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu),
+ skip_tlb_flush, skip_mmu_sync);
+ else
+ vcpu->arch.mmu->root_hpa = INVALID_PAGE;
}
EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
@@ -5207,10 +5210,14 @@ EXPORT_SYMBOL_GPL(kvm_mmu_load);
void kvm_mmu_unload(struct kvm_vcpu *vcpu)
{
- kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
- WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
- kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
- WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
+ if (!vcpu->arch.direct_build_tdp) {
+ kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
+ WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
+ kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
+ WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
+ }
+ vcpu->arch.direct_build_tdp = false;
+ vcpu->arch.mmu->root_hpa = INVALID_PAGE;
}
EXPORT_SYMBOL_GPL(kvm_mmu_unload);
@@ -6538,6 +6545,14 @@ void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *s
kvm_flush_remote_tlbs(kvm);
}
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+ if (kvm->arch.global_root_hpa)
+ __kvm_walk_global_page(kvm, kvm->arch.global_root_hpa, max_tdp_level);
+
+ return;
+}
+
/*
* Calculate mmu pages needed for kvm.
*/
@@ -6564,9 +6579,13 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
{
- kvm_mmu_unload(vcpu);
- free_mmu_pages(&vcpu->arch.root_mmu);
- free_mmu_pages(&vcpu->arch.guest_mmu);
+ if (vcpu->arch.direct_build_tdp) {
+ vcpu->arch.mmu->root_hpa = INVALID_PAGE;
+ } else {
+ kvm_mmu_unload(vcpu);
+ free_mmu_pages(&vcpu->arch.root_mmu);
+ free_mmu_pages(&vcpu->arch.guest_mmu);
+ }
mmu_free_memory_caches(vcpu);
}
--
2.17.1
* [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
` (3 preceding siblings ...)
2020-09-01 11:55 ` [RFC V2 4/9] Add release function for direct build ept when guest VM exit yulei.kernel
@ 2020-09-01 11:56 ` yulei.kernel
2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
` (4 subsequent siblings)
9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:56 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang, Yulei Zhang
From: Yulei Zhang <yulei.kernel@gmail.com>
Refine the fast page fault code so that it can be used in either
normal EPT mode or direct build EPT mode.
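The fast path fixes up an SPTE without taking mmu_lock by using a compare-and-exchange, so a concurrent modification makes the fix-up fail and the fault is simply retried instead of clobbering the other writer's update. A minimal userspace sketch of that pattern (C11 atomics; not the kernel's cmpxchg64-based code, and the bit layout is illustrative):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SPTE_WRITABLE	(1ULL << 1)

/* Restore the writable bit only if the SPTE still holds the value
 * we observed; otherwise report failure so the fault is retried. */
static bool fast_fix_spte(_Atomic uint64_t *sptep, uint64_t old_spte)
{
	uint64_t new_spte = old_spte | SPTE_WRITABLE;

	return atomic_compare_exchange_strong(sptep, &old_spte, new_spte);
}
```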
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f2124f52b286..fda6c4196854 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3443,12 +3443,13 @@ static bool page_fault_can_be_fast(u32 error_code)
* someone else modified the SPTE from its original value.
*/
static bool
-fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, gpa_t gpa,
u64 *sptep, u64 old_spte, u64 new_spte)
{
gfn_t gfn;
- WARN_ON(!sp->role.direct);
+ WARN_ON(!vcpu->arch.direct_build_tdp &&
+ (!sptep_to_sp(sptep)->role.direct));
/*
* Theoretically we could also set dirty bit (and flush TLB) here in
@@ -3470,7 +3471,8 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* The gfn of direct spte is stable since it is
* calculated by sp->gfn.
*/
- gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+ gfn = gpa >> PAGE_SHIFT;
kvm_vcpu_mark_page_dirty(vcpu, gfn);
}
@@ -3498,10 +3500,10 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 error_code)
{
struct kvm_shadow_walk_iterator iterator;
- struct kvm_mmu_page *sp;
bool fault_handled = false;
u64 spte = 0ull;
uint retry_count = 0;
+ int pte_level = 0;
if (!page_fault_can_be_fast(error_code))
return false;
@@ -3515,8 +3517,15 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
if (!is_shadow_present_pte(spte))
break;
- sp = sptep_to_sp(iterator.sptep);
- if (!is_last_spte(spte, sp->role.level))
+ if (iterator.level < PG_LEVEL_4K)
+ pte_level = PG_LEVEL_4K;
+ else
+ pte_level = iterator.level;
+
+ WARN_ON(!vcpu->arch.direct_build_tdp &&
+ (pte_level != sptep_to_sp(iterator.sptep)->role.level));
+
+ if (!is_last_spte(spte, pte_level))
break;
/*
@@ -3559,7 +3568,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
*
* See the comments in kvm_arch_commit_memory_region().
*/
- if (sp->role.level > PG_LEVEL_4K)
+ if (pte_level > PG_LEVEL_4K)
break;
}
@@ -3573,7 +3582,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
* since the gfn is not stable for indirect shadow page. See
* Documentation/virt/kvm/locking.rst to get more detail.
*/
- fault_handled = fast_pf_fix_direct_spte(vcpu, sp,
+ fault_handled = fast_pf_fix_direct_spte(vcpu, cr2_or_gpa,
iterator.sptep, spte,
new_spte);
if (fault_handled)
@@ -4106,6 +4115,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (fast_page_fault(vcpu, gpa, error_code))
return RET_PF_RETRY;
+ if (vcpu->arch.direct_build_tdp)
+ return RET_PF_EMULATE;
+
r = mmu_topup_memory_caches(vcpu, false);
if (r)
return r;
--
2.17.1
* [RFC V2 6/9] Apply the direct build EPT according to the memory slots change
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
` (4 preceding siblings ...)
2020-09-01 11:56 ` [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement yulei.kernel
@ 2020-09-01 11:56 ` yulei.kernel
2020-09-01 22:20 ` kernel test robot
2020-09-02 7:00 ` kernel test robot
2020-09-01 11:56 ` [RFC V2 7/9] Add migration support when using direct build EPT yulei.kernel
` (3 subsequent siblings)
9 siblings, 2 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:56 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang, Yulei Zhang
From: Yulei Zhang <yulei.kernel@gmail.com>
Construct the direct build EPT when the guest memory slots have
changed, and issue an mmu_reload request to update cr3 so that the
guest can use the pre-constructed EPT without page faults.
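The populate loop strides over a slot by whole mappings: each entry written at level N advances gfn by the number of 4K pages that mapping covers (512 for a 2M page, 512*512 for a 1G page). A standalone sketch of that stride (the 9-bits-per-level constant is assumed; not the patch's code):

```c
#include <assert.h>
#include <stdint.h>

/* 4K pages covered by one mapping at the given level:
 * level 1 -> 1, level 2 -> 512 (2M), level 3 -> 262144 (1G). */
#define PAGES_PER_HPAGE(level)	(1ULL << (((level) - 1) * 9))

/* How many entries the populate loop writes for a slot of npages
 * 4K pages when every mapping is installed at the same level. */
static unsigned long mappings_needed(uint64_t npages, int level)
{
	uint64_t step = PAGES_PER_HPAGE(level);
	unsigned long n = 0;

	for (uint64_t gfn = 0; gfn < npages; gfn += step)
		n++;
	return n;
}
```

This is why the huge-page path cuts the population work so sharply: a 64G slot needs 64 entries at 1G granularity instead of sixteen million at 4K.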
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/mips/kvm/mips.c | 13 +++++++++++++
arch/powerpc/kvm/powerpc.c | 13 +++++++++++++
arch/s390/kvm/kvm-s390.c | 13 +++++++++++++
arch/x86/kvm/mmu/mmu.c | 33 ++++++++++++++++++++++++++-------
include/linux/kvm_host.h | 3 +++
virt/kvm/kvm_main.c | 13 +++++++++++++
6 files changed, 81 insertions(+), 7 deletions(-)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 7de85d2253ff..05d053a53ebf 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -267,6 +267,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
}
}
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ return 0;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+}
+
static inline void dump_handler(const char *symbol, void *start, void *end)
{
u32 *p;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..c6964cbeb6da 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -715,6 +715,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
kvmppc_core_commit_memory_region(kvm, mem, old, new, change);
}
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ return 0;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+}
+
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6b74b92c1a58..d6f7cf1a30a3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -5021,6 +5021,19 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
return;
}
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ return 0;
+}
+
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+}
+
+void kvm_direct_tdp_release_global_root(struct kvm *kvm)
+{
+}
+
static inline unsigned long nonhyp_mask(int i)
{
unsigned int nonhyp_fai = (sclp.hmfai << i * 2) >> 30;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fda6c4196854..47d2a1c18f36 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5206,13 +5206,20 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
{
int r;
- r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
- if (r)
- goto out;
- r = mmu_alloc_roots(vcpu);
- kvm_mmu_sync_roots(vcpu);
- if (r)
- goto out;
+ if (vcpu->kvm->arch.global_root_hpa) {
+ vcpu->arch.direct_build_tdp = true;
+ vcpu->arch.mmu->root_hpa = vcpu->kvm->arch.global_root_hpa;
+ }
+
+ if (!vcpu->arch.direct_build_tdp) {
+ r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
+ if (r)
+ goto out;
+ r = mmu_alloc_roots(vcpu);
+ kvm_mmu_sync_roots(vcpu);
+ if (r)
+ goto out;
+ }
kvm_mmu_load_pgd(vcpu);
kvm_x86_ops.tlb_flush_current(vcpu);
out:
@@ -6464,6 +6471,17 @@ int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gf
return host_level;
}
+static void kvm_make_direct_build_update(struct kvm *kvm)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+ kvm_vcpu_kick(vcpu);
+ }
+}
+
int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
{
gfn_t gfn;
@@ -6498,6 +6516,7 @@ int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *
direct_build_tdp_map(kvm, slot, gfn, pfn, host_level);
}
+ kvm_make_direct_build_update(kvm);
return 0;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8901862ba2a3..b2aa0daad6dd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -694,6 +694,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
struct kvm_memory_slot *old,
const struct kvm_memory_slot *new,
enum kvm_mr_change change);
+int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_direct_tdp_remove_page_table(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_direct_tdp_release_global_root(struct kvm *kvm);
void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);
/* flush all memory translations */
void kvm_arch_flush_shadow_all(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 47fc18b05c53..fd1b419f4eb4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -876,6 +876,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
#endif
kvm_arch_destroy_vm(kvm);
kvm_destroy_devices(kvm);
+ kvm_direct_tdp_release_global_root(kvm);
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
cleanup_srcu_struct(&kvm->irq_srcu);
@@ -1195,6 +1196,10 @@ static int kvm_set_memslot(struct kvm *kvm,
* in the freshly allocated memslots, not in @old or @new.
*/
slot = id_to_memslot(slots, old->id);
+ /* Remove pre-constructed page table */
+ if (!as_id)
+ kvm_direct_tdp_remove_page_table(kvm, slot);
+
slot->flags |= KVM_MEMSLOT_INVALID;
/*
@@ -1222,6 +1227,14 @@ static int kvm_set_memslot(struct kvm *kvm,
update_memslots(slots, new, change);
slots = install_new_memslots(kvm, as_id, slots);
+ if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
+ if (!as_id) {
+ r = kvm_direct_tdp_populate_page_table(kvm, new);
+ if (r)
+ goto out_slots;
+ }
+ }
+
kvm_arch_commit_memory_region(kvm, mem, old, new, change);
kvfree(slots);
--
2.17.1
* [RFC V2 7/9] Add migration support when using direct build EPT
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
` (5 preceding siblings ...)
2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
@ 2020-09-01 11:56 ` yulei.kernel
2020-09-01 11:57 ` [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode yulei.kernel
` (2 subsequent siblings)
9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:56 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang, Yulei Zhang
From: Yulei Zhang <yulei.kernel@gmail.com>
Make migration available in direct build EPT mode whether
PML is enabled or not.
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu/mmu.c | 153 +++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 44 +++++----
3 files changed, 178 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ab3cbef8c1aa..429a50c89268 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1318,6 +1318,8 @@ void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_mmu_slot_direct_build_handle_wp(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 47d2a1c18f36..f03bf8efcefe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -249,6 +249,8 @@ struct kvm_shadow_walk_iterator {
static struct kmem_cache *pte_list_desc_cache;
static struct kmem_cache *mmu_page_header_cache;
static struct percpu_counter kvm_total_used_mmu_pages;
+static int __kvm_write_protect_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, int level);
static u64 __read_mostly shadow_nx_mask;
static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
@@ -1644,11 +1646,18 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
gfn_t gfn_offset, unsigned long mask)
{
struct kvm_rmap_head *rmap_head;
+ gfn_t gfn;
while (mask) {
- rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
- PG_LEVEL_4K, slot);
- __rmap_write_protect(kvm, rmap_head, false);
+ if (kvm->arch.global_root_hpa) {
+ gfn = slot->base_gfn + gfn_offset + __ffs(mask);
+
+ __kvm_write_protect_spte(kvm, slot, gfn, PG_LEVEL_4K);
+ } else {
+ rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+ PG_LEVEL_4K, slot);
+ __rmap_write_protect(kvm, rmap_head, false);
+ }
/* clear the first set bit */
mask &= mask - 1;
@@ -6584,6 +6593,144 @@ void kvm_direct_tdp_release_global_root(struct kvm *kvm)
return;
}
+static int __kvm_write_protect_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, int level)
+{
+ int ret = 0;
+ /* add write protect on pte, tear down the page table if large page is enabled */
+ struct kvm_shadow_walk_iterator iterator;
+ unsigned long i;
+ kvm_pfn_t pfn;
+ struct page *page;
+ u64 *sptep;
+ u64 spte, t_spte;
+
+ for_each_direct_build_shadow_entry(iterator, kvm->arch.global_root_hpa,
+ gfn << PAGE_SHIFT, max_tdp_level) {
+ if (iterator.level == level) {
+ break;
+ }
+ }
+
+ if (level != PG_LEVEL_4K) {
+ sptep = iterator.sptep;
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return ret;
+
+ t_spte = page_to_phys(page) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+ shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+
+ for (i = 0; i < KVM_PAGES_PER_HPAGE(level); i++) {
+
+ for_each_direct_build_shadow_entry(iterator, t_spte & PT64_BASE_ADDR_MASK,
+ gfn << PAGE_SHIFT, level - 1) {
+ if (iterator.level == PG_LEVEL_4K) {
+ break;
+ }
+
+ if (!is_shadow_present_pte(*iterator.sptep)) {
+ struct page *page;
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page) {
+ __kvm_walk_global_page(kvm, t_spte & PT64_BASE_ADDR_MASK, level - 1);
+ return ret;
+ }
+ spte = page_to_phys(page) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
+ shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+ mmu_spte_set(iterator.sptep, spte);
+ }
+ }
+
+ pfn = gfn_to_pfn_try_write(slot, gfn);
+ if ((pfn & KVM_PFN_ERR_FAULT) || is_noslot_pfn(pfn))
+ return ret;
+
+ if (kvm_x86_ops.slot_enable_log_dirty)
+ direct_build_tdp_set_spte(kvm, slot, iterator.sptep,
+ ACC_ALL, iterator.level, gfn, pfn, false, false, true);
+
+ else
+ direct_build_tdp_set_spte(kvm, slot, iterator.sptep,
+ ACC_EXEC_MASK | ACC_USER_MASK, iterator.level, gfn, pfn, false, true, true);
+ gfn++;
+ }
+ WARN_ON(!is_last_spte(*sptep, level));
+ pfn = spte_to_pfn(*sptep);
+ mmu_spte_clear_track_bits(sptep);
+ kvm_release_pfn_clean(pfn);
+ mmu_spte_set(sptep, t_spte);
+ } else {
+ if (kvm_x86_ops.slot_enable_log_dirty)
+ spte_clear_dirty(iterator.sptep);
+ else
+ spte_write_protect(iterator.sptep, false);
+ }
+ return ret;
+}
+
+static void __kvm_remove_wp_spte(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, int level)
+{
+ struct kvm_shadow_walk_iterator iterator;
+ kvm_pfn_t pfn;
+ u64 addr, spte;
+
+ for_each_direct_build_shadow_entry(iterator, kvm->arch.global_root_hpa,
+ gfn << PAGE_SHIFT, max_tdp_level) {
+ if (iterator.level == level)
+ break;
+ }
+
+ if (level != PG_LEVEL_4K) {
+ if (is_shadow_present_pte(*iterator.sptep)) {
+ addr = (*iterator.sptep) & PT64_BASE_ADDR_MASK;
+
+ pfn = gfn_to_pfn_try_write(slot, gfn);
+ if ((pfn & KVM_PFN_ERR_FAULT) || is_noslot_pfn(pfn)) {
+ printk("Failed to alloc page\n");
+ return;
+ }
+ mmu_spte_clear_track_bits(iterator.sptep);
+ direct_build_tdp_set_spte(kvm, slot, iterator.sptep,
+ ACC_ALL, level, gfn, pfn, false, true, true);
+
+ __kvm_walk_global_page(kvm, addr, level - 1);
+ }
+ } else {
+ if (is_shadow_present_pte(*iterator.sptep)) {
+ if (kvm_x86_ops.slot_enable_log_dirty) {
+ spte_set_dirty(iterator.sptep);
+ } else {
+ spte = (*iterator.sptep) | PT_WRITABLE_MASK;
+ mmu_spte_update(iterator.sptep, spte);
+ }
+ }
+ }
+}
+
+void kvm_mmu_slot_direct_build_handle_wp(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
+{
+ gfn_t gfn = memslot->base_gfn;
+ int host_level;
+
+ /* remove write mask from PTE */
+ for (gfn = memslot->base_gfn; gfn < memslot->base_gfn + memslot->npages; ) {
+
+ host_level = direct_build_mapping_level(kvm, memslot, gfn);
+
+ if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES)
+ __kvm_write_protect_spte(kvm, memslot, gfn, host_level);
+ else
+ __kvm_remove_wp_spte(kvm, memslot, gfn, host_level);
+ gfn += KVM_PAGES_PER_HPAGE(host_level);
+ }
+
+ kvm_flush_remote_tlbs(kvm);
+}
+
/*
* Calculate mmu pages needed for kvm.
*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 599d73206299..ee898003f22f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10196,9 +10196,12 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
* kvm_arch_flush_shadow_memslot()
*/
if ((old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
- !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
- kvm_mmu_zap_collapsible_sptes(kvm, new);
-
+ !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+ if (kvm->arch.global_root_hpa)
+ kvm_mmu_slot_direct_build_handle_wp(kvm, (struct kvm_memory_slot *)new);
+ else
+ kvm_mmu_zap_collapsible_sptes(kvm, new);
+ }
/*
* Enable or disable dirty logging for the slot.
*
@@ -10228,25 +10231,30 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
* is enabled the D-bit or the W-bit will be cleared.
*/
if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
- if (kvm_x86_ops.slot_enable_log_dirty) {
- kvm_x86_ops.slot_enable_log_dirty(kvm, new);
+ if (kvm->arch.global_root_hpa) {
+ kvm_mmu_slot_direct_build_handle_wp(kvm, new);
} else {
- int level =
- kvm_dirty_log_manual_protect_and_init_set(kvm) ?
- PG_LEVEL_2M : PG_LEVEL_4K;
+ if (kvm_x86_ops.slot_enable_log_dirty) {
+ kvm_x86_ops.slot_enable_log_dirty(kvm, new);
+ } else {
+ int level =
+ kvm_dirty_log_manual_protect_and_init_set(kvm) ?
+ PG_LEVEL_2M : PG_LEVEL_4K;
- /*
- * If we're with initial-all-set, we don't need
- * to write protect any small page because
- * they're reported as dirty already. However
- * we still need to write-protect huge pages
- * so that the page split can happen lazily on
- * the first write to the huge page.
- */
- kvm_mmu_slot_remove_write_access(kvm, new, level);
+ /*
+ * If we're with initial-all-set, we don't need
+ * to write protect any small page because
+ * they're reported as dirty already. However
+ * we still need to write-protect huge pages
+ * so that the page split can happen lazily on
+ * the first write to the huge page.
+ */
+ kvm_mmu_slot_remove_write_access(kvm, new, level);
+ }
}
} else {
- if (kvm_x86_ops.slot_disable_log_dirty)
+ if (kvm_x86_ops.slot_disable_log_dirty
+ && !kvm->arch.global_root_hpa)
kvm_x86_ops.slot_disable_log_dirty(kvm, new);
}
}
--
2.17.1
* [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
` (6 preceding siblings ...)
2020-09-01 11:56 ` [RFC V2 7/9] Add migration support when using direct build EPT yulei.kernel
@ 2020-09-01 11:57 ` yulei.kernel
2020-09-01 11:57 ` [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on " yulei.kernel
2020-09-09 3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:57 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang
From: Yulei Zhang <yuleixzhang@tencent.com>
Currently global_tdp is only supported on Intel x86 systems with EPT
support, and SMM mode is turned off when global_tdp is enabled.
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/include/asm/kvm_host.h | 4 ++++
arch/x86/kvm/mmu/mmu.c | 5 ++++-
arch/x86/kvm/x86.c | 11 ++++++++++-
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 429a50c89268..330cb254b34b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1357,6 +1357,8 @@ extern u64 kvm_default_tsc_scaling_ratio;
extern u64 kvm_mce_cap_supported;
+extern bool global_tdp;
+
/*
* EMULTYPE_NO_DECODE - Set when re-emulating an instruction (after completing
* userspace I/O) to indicate that the emulation context
@@ -1689,6 +1691,8 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
#endif
}
+inline bool boot_cpu_is_amd(void);
+
#define put_smstate(type, buf, offset, val) \
*(type *)((buf) + (offset) - 0x7e00) = val
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f03bf8efcefe..6639d9c7012e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4573,7 +4573,7 @@ reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
}
EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
-static inline bool boot_cpu_is_amd(void)
+inline bool boot_cpu_is_amd(void)
{
WARN_ON_ONCE(!tdp_enabled);
return shadow_x_mask == 0;
@@ -6497,6 +6497,9 @@ int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *
kvm_pfn_t pfn;
int host_level;
+ if (!global_tdp)
+ return 0;
+
if (!kvm->arch.global_root_hpa) {
struct page *page;
WARN_ON(!tdp_enabled);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee898003f22f..57d64f3239e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -161,6 +161,9 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
int __read_mostly pi_inject_timer = -1;
module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
+bool __read_mostly global_tdp;
+module_param_named(global_tdp, global_tdp, bool, S_IRUGO);
+
#define KVM_NR_SHARED_MSRS 16
struct kvm_shared_msrs_global {
@@ -3539,7 +3542,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
* fringe case that is not enabled except via specific settings
* of the module parameters.
*/
- r = kvm_x86_ops.has_emulated_msr(MSR_IA32_SMBASE);
+ if (global_tdp)
+ r = 0;
+ else
+ r = kvm_x86_ops.has_emulated_msr(MSR_IA32_SMBASE);
break;
case KVM_CAP_VAPIC:
r = !kvm_x86_ops.cpu_has_accelerated_tpr();
@@ -9808,6 +9814,9 @@ int kvm_arch_hardware_setup(void *opaque)
if (r != 0)
return r;
+ if ((tdp_enabled == false) || boot_cpu_is_amd())
+ global_tdp = 0;
+
memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
--
2.17.1
* [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on direct build EPT mode
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
` (7 preceding siblings ...)
2020-09-01 11:57 ` [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode yulei.kernel
@ 2020-09-01 11:57 ` yulei.kernel
2020-09-09 3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
9 siblings, 0 replies; 22+ messages in thread
From: yulei.kernel @ 2020-09-01 11:57 UTC (permalink / raw)
To: pbonzini
Cc: kvm, linux-kernel, sean.j.christopherson, jmattson, junaids,
bgardon, vkuznets, xiaoguangrong.eric, kernellwp,
lihaiwei.kernel, Yulei Zhang, Yulei Zhang
From: Yulei Zhang <yulei.kernel@gmail.com>
Signed-off-by: Yulei Zhang <yuleixzhang@tencent.com>
---
arch/x86/kvm/mmu/mmu.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6639d9c7012e..35bd87bf965f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1719,6 +1719,9 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
int i;
bool write_protected = false;
+ if (kvm->arch.global_root_hpa)
+ return write_protected;
+
for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
rmap_head = __gfn_to_rmap(gfn, i, slot);
write_protected |= __rmap_write_protect(kvm, rmap_head, true);
@@ -5862,6 +5865,9 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
*/
static void kvm_mmu_zap_all_fast(struct kvm *kvm)
{
+ if (kvm->arch.global_root_hpa)
+ return;
+
lockdep_assert_held(&kvm->slots_lock);
spin_lock(&kvm->mmu_lock);
@@ -5924,6 +5930,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
struct kvm_memory_slot *memslot;
int i;
+ if (kvm->arch.global_root_hpa)
+ return;
+
spin_lock(&kvm->mmu_lock);
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
slots = __kvm_memslots(kvm, i);
--
2.17.1
* Re: [RFC V2 2/9] Introduce page table population function for direct build EPT feature
2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
@ 2020-09-01 17:33 ` kernel test robot
2020-09-01 19:04 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 17:33 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 2710 bytes --]
Hi,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on kvm/linux-next]
[also build test WARNING on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# save the attached .config to linux build tree
make W=1 ARCH=i386
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> arch/x86/kvm/mmu/mmu.c:6430:5: warning: no previous prototype for 'direct_build_mapping_level' [-Wmissing-prototypes]
6430 | int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/mmu/mmu.c:6448:5: warning: no previous prototype for 'kvm_direct_tdp_populate_page_table' [-Wmissing-prototypes]
6448 | int kvm_direct_tdp_populate_page_table(struct kvm *kvm, struct kvm_memory_slot *slot)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# https://github.com/0day-ci/linux/commit/9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
git checkout 9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
vim +/direct_build_mapping_level +6430 arch/x86/kvm/mmu/mmu.c
6429
> 6430 int direct_build_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
6431 {
6432 int host_level, max_level, level;
6433 struct kvm_lpage_info *linfo;
6434
6435 host_level = host_mapping_level(kvm, gfn);
6436 if (host_level != PG_LEVEL_4K) {
6437 max_level = min(max_huge_page_level, host_level);
6438 for (level = PG_LEVEL_4K; level <= max_level; ++level) {
6439 linfo = lpage_info_slot(gfn, slot, level);
6440 if (linfo->disallow_lpage)
6441 break;
6442 }
6443 host_level = level - 1;
6444 }
6445 return host_level;
6446 }
6447
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 74609 bytes --]
* Re: [RFC V2 2/9] Introduce page table population function for direct build EPT feature
2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
2020-09-01 17:33 ` kernel test robot
@ 2020-09-01 19:04 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 19:04 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 15821 bytes --]
Hi,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvm-ppc/kvm-ppc-next kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
arch/powerpc/kvm/book3s_xive_native.c: In function 'kvmppc_xive_native_set_queue_config':
>> arch/powerpc/kvm/book3s_xive_native.c:640:33: error: passing argument 1 of 'kvm_host_page_size' from incompatible pointer type [-Werror=incompatible-pointer-types]
640 | page_size = kvm_host_page_size(vcpu, gfn);
| ^~~~
| |
| struct kvm_vcpu *
In file included from arch/powerpc/kvm/book3s_xive_native.c:9:
include/linux/kvm_host.h:780:46: note: expected 'struct kvm *' but argument is of type 'struct kvm_vcpu *'
780 | unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
| ~~~~~~~~~~~~^~~
>> arch/powerpc/kvm/book3s_xive_native.c:640:39: warning: passing argument 2 of 'kvm_host_page_size' makes pointer from integer without a cast [-Wint-conversion]
640 | page_size = kvm_host_page_size(vcpu, gfn);
| ^~~
| |
| gfn_t {aka long long unsigned int}
In file included from arch/powerpc/kvm/book3s_xive_native.c:9:
include/linux/kvm_host.h:780:68: note: expected 'struct kvm_vcpu *' but argument is of type 'gfn_t' {aka 'long long unsigned int'}
780 | unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
| ~~~~~~~~~~~~~~~~~^~~~
>> arch/powerpc/kvm/book3s_xive_native.c:640:14: error: too few arguments to function 'kvm_host_page_size'
640 | page_size = kvm_host_page_size(vcpu, gfn);
| ^~~~~~~~~~~~~~~~~~
In file included from arch/powerpc/kvm/book3s_xive_native.c:9:
include/linux/kvm_host.h:780:15: note: declared here
780 | unsigned long kvm_host_page_size(struct kvm *kvm, struct kvm_vcpu *vcpu, gfn_t gfn);
| ^~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
# https://github.com/0day-ci/linux/commit/9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
git checkout 9607ad9b47ad43e0b1fa4b2f4ef0c2e6a1217d08
vim +/kvm_host_page_size +640 arch/powerpc/kvm/book3s_xive_native.c
13ce3297c5766b Cédric Le Goater 2019-04-18 549
13ce3297c5766b Cédric Le Goater 2019-04-18 550 static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
13ce3297c5766b Cédric Le Goater 2019-04-18 551 long eq_idx, u64 addr)
13ce3297c5766b Cédric Le Goater 2019-04-18 552 {
13ce3297c5766b Cédric Le Goater 2019-04-18 553 struct kvm *kvm = xive->kvm;
13ce3297c5766b Cédric Le Goater 2019-04-18 554 struct kvm_vcpu *vcpu;
13ce3297c5766b Cédric Le Goater 2019-04-18 555 struct kvmppc_xive_vcpu *xc;
13ce3297c5766b Cédric Le Goater 2019-04-18 556 void __user *ubufp = (void __user *) addr;
13ce3297c5766b Cédric Le Goater 2019-04-18 557 u32 server;
13ce3297c5766b Cédric Le Goater 2019-04-18 558 u8 priority;
13ce3297c5766b Cédric Le Goater 2019-04-18 559 struct kvm_ppc_xive_eq kvm_eq;
13ce3297c5766b Cédric Le Goater 2019-04-18 560 int rc;
13ce3297c5766b Cédric Le Goater 2019-04-18 561 __be32 *qaddr = 0;
13ce3297c5766b Cédric Le Goater 2019-04-18 562 struct page *page;
13ce3297c5766b Cédric Le Goater 2019-04-18 563 struct xive_q *q;
13ce3297c5766b Cédric Le Goater 2019-04-18 564 gfn_t gfn;
13ce3297c5766b Cédric Le Goater 2019-04-18 565 unsigned long page_size;
aedb5b19429c80 Cédric Le Goater 2019-05-28 566 int srcu_idx;
13ce3297c5766b Cédric Le Goater 2019-04-18 567
13ce3297c5766b Cédric Le Goater 2019-04-18 568 /*
13ce3297c5766b Cédric Le Goater 2019-04-18 569 * Demangle priority/server tuple from the EQ identifier
13ce3297c5766b Cédric Le Goater 2019-04-18 570 */
13ce3297c5766b Cédric Le Goater 2019-04-18 571 priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >>
13ce3297c5766b Cédric Le Goater 2019-04-18 572 KVM_XIVE_EQ_PRIORITY_SHIFT;
13ce3297c5766b Cédric Le Goater 2019-04-18 573 server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >>
13ce3297c5766b Cédric Le Goater 2019-04-18 574 KVM_XIVE_EQ_SERVER_SHIFT;
13ce3297c5766b Cédric Le Goater 2019-04-18 575
13ce3297c5766b Cédric Le Goater 2019-04-18 576 if (copy_from_user(&kvm_eq, ubufp, sizeof(kvm_eq)))
13ce3297c5766b Cédric Le Goater 2019-04-18 577 return -EFAULT;
13ce3297c5766b Cédric Le Goater 2019-04-18 578
13ce3297c5766b Cédric Le Goater 2019-04-18 579 vcpu = kvmppc_xive_find_server(kvm, server);
13ce3297c5766b Cédric Le Goater 2019-04-18 580 if (!vcpu) {
13ce3297c5766b Cédric Le Goater 2019-04-18 581 pr_err("Can't find server %d\n", server);
13ce3297c5766b Cédric Le Goater 2019-04-18 582 return -ENOENT;
13ce3297c5766b Cédric Le Goater 2019-04-18 583 }
13ce3297c5766b Cédric Le Goater 2019-04-18 584 xc = vcpu->arch.xive_vcpu;
13ce3297c5766b Cédric Le Goater 2019-04-18 585
13ce3297c5766b Cédric Le Goater 2019-04-18 586 if (priority != xive_prio_from_guest(priority)) {
13ce3297c5766b Cédric Le Goater 2019-04-18 587 pr_err("Trying to restore invalid queue %d for VCPU %d\n",
13ce3297c5766b Cédric Le Goater 2019-04-18 588 priority, server);
13ce3297c5766b Cédric Le Goater 2019-04-18 589 return -EINVAL;
13ce3297c5766b Cédric Le Goater 2019-04-18 590 }
13ce3297c5766b Cédric Le Goater 2019-04-18 591 q = &xc->queues[priority];
13ce3297c5766b Cédric Le Goater 2019-04-18 592
13ce3297c5766b Cédric Le Goater 2019-04-18 593 pr_devel("%s VCPU %d priority %d fl:%x shift:%d addr:%llx g:%d idx:%d\n",
13ce3297c5766b Cédric Le Goater 2019-04-18 594 __func__, server, priority, kvm_eq.flags,
13ce3297c5766b Cédric Le Goater 2019-04-18 595 kvm_eq.qshift, kvm_eq.qaddr, kvm_eq.qtoggle, kvm_eq.qindex);
13ce3297c5766b Cédric Le Goater 2019-04-18 596
13ce3297c5766b Cédric Le Goater 2019-04-18 597 /* reset queue and disable queueing */
13ce3297c5766b Cédric Le Goater 2019-04-18 598 if (!kvm_eq.qshift) {
13ce3297c5766b Cédric Le Goater 2019-04-18 599 q->guest_qaddr = 0;
13ce3297c5766b Cédric Le Goater 2019-04-18 600 q->guest_qshift = 0;
13ce3297c5766b Cédric Le Goater 2019-04-18 601
31a88c82b466d2 Greg Kurz 2019-11-13 602 rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
13ce3297c5766b Cédric Le Goater 2019-04-18 603 NULL, 0, true);
13ce3297c5766b Cédric Le Goater 2019-04-18 604 if (rc) {
13ce3297c5766b Cédric Le Goater 2019-04-18 605 pr_err("Failed to reset queue %d for VCPU %d: %d\n",
13ce3297c5766b Cédric Le Goater 2019-04-18 606 priority, xc->server_num, rc);
13ce3297c5766b Cédric Le Goater 2019-04-18 607 return rc;
13ce3297c5766b Cédric Le Goater 2019-04-18 608 }
13ce3297c5766b Cédric Le Goater 2019-04-18 609
13ce3297c5766b Cédric Le Goater 2019-04-18 610 return 0;
13ce3297c5766b Cédric Le Goater 2019-04-18 611 }
13ce3297c5766b Cédric Le Goater 2019-04-18 612
c468bc4e8468cb Cédric Le Goater 2019-05-20 613 /*
c468bc4e8468cb Cédric Le Goater 2019-05-20 614 * sPAPR specifies a "Unconditional Notify (n) flag" for the
c468bc4e8468cb Cédric Le Goater 2019-05-20 615 * H_INT_SET_QUEUE_CONFIG hcall which forces notification
c468bc4e8468cb Cédric Le Goater 2019-05-20 616 * without using the coalescing mechanisms provided by the
c468bc4e8468cb Cédric Le Goater 2019-05-20 617 * XIVE END ESBs. This is required on KVM as notification
c468bc4e8468cb Cédric Le Goater 2019-05-20 618 * using the END ESBs is not supported.
c468bc4e8468cb Cédric Le Goater 2019-05-20 619 */
c468bc4e8468cb Cédric Le Goater 2019-05-20 620 if (kvm_eq.flags != KVM_XIVE_EQ_ALWAYS_NOTIFY) {
c468bc4e8468cb Cédric Le Goater 2019-05-20 621 pr_err("invalid flags %d\n", kvm_eq.flags);
c468bc4e8468cb Cédric Le Goater 2019-05-20 622 return -EINVAL;
c468bc4e8468cb Cédric Le Goater 2019-05-20 623 }
c468bc4e8468cb Cédric Le Goater 2019-05-20 624
c468bc4e8468cb Cédric Le Goater 2019-05-20 625 rc = xive_native_validate_queue_size(kvm_eq.qshift);
c468bc4e8468cb Cédric Le Goater 2019-05-20 626 if (rc) {
c468bc4e8468cb Cédric Le Goater 2019-05-20 627 pr_err("invalid queue size %d\n", kvm_eq.qshift);
c468bc4e8468cb Cédric Le Goater 2019-05-20 628 return rc;
c468bc4e8468cb Cédric Le Goater 2019-05-20 629 }
c468bc4e8468cb Cédric Le Goater 2019-05-20 630
13ce3297c5766b Cédric Le Goater 2019-04-18 631 if (kvm_eq.qaddr & ((1ull << kvm_eq.qshift) - 1)) {
13ce3297c5766b Cédric Le Goater 2019-04-18 632 pr_err("queue page is not aligned %llx/%llx\n", kvm_eq.qaddr,
13ce3297c5766b Cédric Le Goater 2019-04-18 633 1ull << kvm_eq.qshift);
13ce3297c5766b Cédric Le Goater 2019-04-18 634 return -EINVAL;
13ce3297c5766b Cédric Le Goater 2019-04-18 635 }
13ce3297c5766b Cédric Le Goater 2019-04-18 636
aedb5b19429c80 Cédric Le Goater 2019-05-28 637 srcu_idx = srcu_read_lock(&kvm->srcu);
13ce3297c5766b Cédric Le Goater 2019-04-18 638 gfn = gpa_to_gfn(kvm_eq.qaddr);
13ce3297c5766b Cédric Le Goater 2019-04-18 639
f9b84e19221efc Sean Christopherson 2020-01-08 @640 page_size = kvm_host_page_size(vcpu, gfn);
13ce3297c5766b Cédric Le Goater 2019-04-18 641 if (1ull << kvm_eq.qshift > page_size) {
aedb5b19429c80 Cédric Le Goater 2019-05-28 642 srcu_read_unlock(&kvm->srcu, srcu_idx);
13ce3297c5766b Cédric Le Goater 2019-04-18 643 pr_warn("Incompatible host page size %lx!\n", page_size);
13ce3297c5766b Cédric Le Goater 2019-04-18 644 return -EINVAL;
13ce3297c5766b Cédric Le Goater 2019-04-18 645 }
13ce3297c5766b Cédric Le Goater 2019-04-18 646
30486e72093ea2 Greg Kurz 2019-11-13 647 page = gfn_to_page(kvm, gfn);
30486e72093ea2 Greg Kurz 2019-11-13 648 if (is_error_page(page)) {
30486e72093ea2 Greg Kurz 2019-11-13 649 srcu_read_unlock(&kvm->srcu, srcu_idx);
30486e72093ea2 Greg Kurz 2019-11-13 650 pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
30486e72093ea2 Greg Kurz 2019-11-13 651 return -EINVAL;
30486e72093ea2 Greg Kurz 2019-11-13 652 }
30486e72093ea2 Greg Kurz 2019-11-13 653
13ce3297c5766b Cédric Le Goater 2019-04-18 654 qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
aedb5b19429c80 Cédric Le Goater 2019-05-28 655 srcu_read_unlock(&kvm->srcu, srcu_idx);
13ce3297c5766b Cédric Le Goater 2019-04-18 656
13ce3297c5766b Cédric Le Goater 2019-04-18 657 /*
13ce3297c5766b Cédric Le Goater 2019-04-18 658 * Backup the queue page guest address to the mark EQ page
13ce3297c5766b Cédric Le Goater 2019-04-18 659 * dirty for migration.
13ce3297c5766b Cédric Le Goater 2019-04-18 660 */
13ce3297c5766b Cédric Le Goater 2019-04-18 661 q->guest_qaddr = kvm_eq.qaddr;
13ce3297c5766b Cédric Le Goater 2019-04-18 662 q->guest_qshift = kvm_eq.qshift;
13ce3297c5766b Cédric Le Goater 2019-04-18 663
13ce3297c5766b Cédric Le Goater 2019-04-18 664 /*
13ce3297c5766b Cédric Le Goater 2019-04-18 665 * Unconditional Notification is forced by default at the
13ce3297c5766b Cédric Le Goater 2019-04-18 666 * OPAL level because the use of END ESBs is not supported by
13ce3297c5766b Cédric Le Goater 2019-04-18 667 * Linux.
13ce3297c5766b Cédric Le Goater 2019-04-18 668 */
31a88c82b466d2 Greg Kurz 2019-11-13 669 rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
13ce3297c5766b Cédric Le Goater 2019-04-18 670 (__be32 *) qaddr, kvm_eq.qshift, true);
13ce3297c5766b Cédric Le Goater 2019-04-18 671 if (rc) {
13ce3297c5766b Cédric Le Goater 2019-04-18 672 pr_err("Failed to configure queue %d for VCPU %d: %d\n",
13ce3297c5766b Cédric Le Goater 2019-04-18 673 priority, xc->server_num, rc);
13ce3297c5766b Cédric Le Goater 2019-04-18 674 put_page(page);
13ce3297c5766b Cédric Le Goater 2019-04-18 675 return rc;
13ce3297c5766b Cédric Le Goater 2019-04-18 676 }
13ce3297c5766b Cédric Le Goater 2019-04-18 677
13ce3297c5766b Cédric Le Goater 2019-04-18 678 /*
13ce3297c5766b Cédric Le Goater 2019-04-18 679 * Only restore the queue state when needed. When doing the
13ce3297c5766b Cédric Le Goater 2019-04-18 680 * H_INT_SET_SOURCE_CONFIG hcall, it should not.
13ce3297c5766b Cédric Le Goater 2019-04-18 681 */
13ce3297c5766b Cédric Le Goater 2019-04-18 682 if (kvm_eq.qtoggle != 1 || kvm_eq.qindex != 0) {
13ce3297c5766b Cédric Le Goater 2019-04-18 683 rc = xive_native_set_queue_state(xc->vp_id, priority,
13ce3297c5766b Cédric Le Goater 2019-04-18 684 kvm_eq.qtoggle,
13ce3297c5766b Cédric Le Goater 2019-04-18 685 kvm_eq.qindex);
13ce3297c5766b Cédric Le Goater 2019-04-18 686 if (rc)
13ce3297c5766b Cédric Le Goater 2019-04-18 687 goto error;
13ce3297c5766b Cédric Le Goater 2019-04-18 688 }
13ce3297c5766b Cédric Le Goater 2019-04-18 689
13ce3297c5766b Cédric Le Goater 2019-04-18 690 rc = kvmppc_xive_attach_escalation(vcpu, priority,
13ce3297c5766b Cédric Le Goater 2019-04-18 691 xive->single_escalation);
13ce3297c5766b Cédric Le Goater 2019-04-18 692 error:
13ce3297c5766b Cédric Le Goater 2019-04-18 693 if (rc)
13ce3297c5766b Cédric Le Goater 2019-04-18 694 kvmppc_xive_native_cleanup_queue(vcpu, priority);
13ce3297c5766b Cédric Le Goater 2019-04-18 695 return rc;
13ce3297c5766b Cédric Le Goater 2019-04-18 696 }
13ce3297c5766b Cédric Le Goater 2019-04-18 697
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
* Re: [RFC V2 6/9] Apply the direct build EPT according to the memory slots change
2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
@ 2020-09-01 22:20 ` kernel test robot
2020-09-02 7:00 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 22:20 UTC (permalink / raw)
To: kbuild-all
Hi,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvm-ppc/kvm-ppc-next kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: arm64-randconfig-r036-20200901 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_set_memslot':
>> arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1201: undefined reference to `kvm_direct_tdp_remove_page_table'
>> aarch64-linux-ld: arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1232: undefined reference to `kvm_direct_tdp_populate_page_table'
>> aarch64-linux-ld: arch/arm64/kvm/../../../virt/kvm/kvm_main.c:1232: undefined reference to `kvm_direct_tdp_populate_page_table'
aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_destroy_vm':
>> arch/arm64/kvm/../../../virt/kvm/kvm_main.c:879: undefined reference to `kvm_direct_tdp_release_global_root'
aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_crtc.o: in function `rcar_du_cmm_setup':
drivers/gpu/drm/rcar-du/rcar_du_crtc.c:515: undefined reference to `rcar_cmm_setup'
aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_crtc.o: in function `rcar_du_crtc_atomic_enable':
drivers/gpu/drm/rcar-du/rcar_du_crtc.c:720: undefined reference to `rcar_cmm_enable'
aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_crtc.o: in function `rcar_du_crtc_stop':
drivers/gpu/drm/rcar-du/rcar_du_crtc.c:664: undefined reference to `rcar_cmm_disable'
aarch64-linux-ld: drivers/gpu/drm/rcar-du/rcar_du_kms.o: in function `rcar_du_cmm_init':
drivers/gpu/drm/rcar-du/rcar_du_kms.c:678: undefined reference to `rcar_cmm_init'
# https://github.com/0day-ci/linux/commit/751ce77392ca79955a0577617878ee1950ef3445
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
git checkout 751ce77392ca79955a0577617878ee1950ef3445
vim +1201 arch/arm64/kvm/../../../virt/kvm/kvm_main.c
1178
1179 static int kvm_set_memslot(struct kvm *kvm,
1180 const struct kvm_userspace_memory_region *mem,
1181 struct kvm_memory_slot *old,
1182 struct kvm_memory_slot *new, int as_id,
1183 enum kvm_mr_change change)
1184 {
1185 struct kvm_memory_slot *slot;
1186 struct kvm_memslots *slots;
1187 int r;
1188
1189 slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
1190 if (!slots)
1191 return -ENOMEM;
1192
1193 if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
1194 /*
1195 * Note, the INVALID flag needs to be in the appropriate entry
1196 * in the freshly allocated memslots, not in @old or @new.
1197 */
1198 slot = id_to_memslot(slots, old->id);
1199 /* Remove pre-constructed page table */
1200 if (!as_id)
> 1201 kvm_direct_tdp_remove_page_table(kvm, slot);
1202
1203 slot->flags |= KVM_MEMSLOT_INVALID;
1204
1205 /*
1206 * We can re-use the old memslots, the only difference from the
1207 * newly installed memslots is the invalid flag, which will get
1208 * dropped by update_memslots anyway. We'll also revert to the
1209 * old memslots if preparing the new memory region fails.
1210 */
1211 slots = install_new_memslots(kvm, as_id, slots);
1212
1213 /* From this point no new shadow pages pointing to a deleted,
1214 * or moved, memslot will be created.
1215 *
1216 * validation of sp->gfn happens in:
1217 * - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
1218 * - kvm_is_visible_gfn (mmu_check_root)
1219 */
1220 kvm_arch_flush_shadow_memslot(kvm, slot);
1221 }
1222
1223 r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
1224 if (r)
1225 goto out_slots;
1226
1227 update_memslots(slots, new, change);
1228 slots = install_new_memslots(kvm, as_id, slots);
1229
1230 if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
1231 if (!as_id) {
> 1232 r = kvm_direct_tdp_populate_page_table(kvm, new);
1233 if (r)
1234 goto out_slots;
1235 }
1236 }
1237
1238 kvm_arch_commit_memory_region(kvm, mem, old, new, change);
1239
1240 kvfree(slots);
1241 return 0;
1242
1243 out_slots:
1244 if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
1245 slots = install_new_memslots(kvm, as_id, slots);
1246 kvfree(slots);
1247 return r;
1248 }
1249
---
* Re: [RFC V2 6/9] Apply the direct build EPT according to the memory slots change
2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
2020-09-01 22:20 ` kernel test robot
@ 2020-09-02 7:00 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-02 7:00 UTC (permalink / raw)
To: kbuild-all
Hi,
[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvm-ppc/kvm-ppc-next kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_set_memslot.constprop.0':
>> kvm_main.c:(.text+0xb384): undefined reference to `kvm_direct_tdp_remove_page_table'
>> aarch64-linux-ld: kvm_main.c:(.text+0xb9e8): undefined reference to `kvm_direct_tdp_populate_page_table'
aarch64-linux-ld: kvm_main.c:(.text+0xba10): undefined reference to `kvm_direct_tdp_populate_page_table'
aarch64-linux-ld: arch/arm64/../../virt/kvm/kvm_main.o: in function `kvm_destroy_vm':
>> kvm_main.c:(.text+0x11b98): undefined reference to `kvm_direct_tdp_release_global_root'
---
* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
` (8 preceding siblings ...)
2020-09-01 11:57 ` [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on " yulei.kernel
@ 2020-09-09 3:04 ` Wanpeng Li
2020-09-24 6:28 ` Wanpeng Li
9 siblings, 1 reply; 22+ messages in thread
From: Wanpeng Li @ 2020-09-09 3:04 UTC (permalink / raw)
To: Yulei Zhang
Cc: Paolo Bonzini, kvm, LKML, Sean Christopherson, Jim Mattson,
Junaid Shahid, Ben Gardon, Vitaly Kuznetsov, Xiao Guangrong,
Haiwei Li
Any comments? guys!
On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
>
> From: Yulei Zhang <yulei.kernel@gmail.com>
>
> Currently, KVM memory virtualization relies on mmu_lock to
> synchronize memory mapping updates, which makes the vCPUs run
> serialized and slows down execution. This is especially visible
> after migration, when substantial memory mapping causes a
> noticeable performance drop, and it gets worse as the guest's
> vCPU count and memory size grow.
>
> The idea we present in this patch set is to mitigate the issue
> with a pre-constructed memory mapping table. We fast-pin the
> guest memory to build up a global memory mapping table according
> to the guest memslot changes and apply it to CR3, so that after
> the guest starts up, all the vCPUs can update memory
> simultaneously without page fault exceptions; from this we
> expect a performance improvement.
>
> We used a memory dirty pattern workload to test the initial
> patch set and got positive results even with huge pages enabled.
> For example, we created a guest with 32 vCPUs and 64G of memory
> and let the vCPUs dirty the entire memory region concurrently;
> since the initial patch eliminates the mmu_lock overhead, in
> 2M/1G huge page mode the job completes about 50% faster.
>
> We have only validated this feature on the Intel x86 platform.
> As Ben pointed out in RFC V1, so far we disable SMM for resource
> considerations and drop the MMU notification, since in this case
> the memory is pinned.
>
> V1->V2:
> * Rebase the code to kernel version 5.9.0-rc1.
>
> Yulei Zhang (9):
> Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> support
> Introduce page table population function for direct build EPT feature
> Introduce page table remove function for direct build EPT feature
> Add release function for direct build ept when guest VM exit
> Modify the page fault path to meet the direct build EPT requirement
> Apply the direct build EPT according to the memory slots change
> Add migration support when using direct build EPT
> Introduce kvm module parameter global_tdp to turn on the direct build
> EPT mode
> Handle certain mmu exposed functions properly while turn on direct
> build EPT mode
>
> arch/mips/kvm/mips.c | 13 +
> arch/powerpc/kvm/powerpc.c | 13 +
> arch/s390/kvm/kvm-s390.c | 13 +
> arch/x86/include/asm/kvm_host.h | 13 +-
> arch/x86/kvm/mmu/mmu.c | 533 ++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm/svm.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 7 +-
> arch/x86/kvm/x86.c | 55 ++--
> include/linux/kvm_host.h | 7 +-
> virt/kvm/kvm_main.c | 43 ++-
> 10 files changed, 639 insertions(+), 60 deletions(-)
>
> --
> 2.17.1
>
* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
2020-09-09 3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
@ 2020-09-24 6:28 ` Wanpeng Li
2020-09-24 17:14 ` Ben Gardon
0 siblings, 1 reply; 22+ messages in thread
From: Wanpeng Li @ 2020-09-24 6:28 UTC (permalink / raw)
To: Yulei Zhang
Cc: Paolo Bonzini, kvm, LKML, Sean Christopherson, Jim Mattson,
Junaid Shahid, Ben Gardon, Vitaly Kuznetsov, Xiao Guangrong,
Haiwei Li
Any comments? Paolo! :)
On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> Any comments? guys!
> On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> >
> > From: Yulei Zhang <yulei.kernel@gmail.com>
> >
> > Currently, KVM memory virtualization relies on mmu_lock to
> > synchronize memory mapping updates, which makes the vCPUs run
> > serialized and slows down execution. This is especially visible
> > after migration, when substantial memory mapping causes a
> > noticeable performance drop, and it gets worse as the guest's
> > vCPU count and memory size grow.
> >
> > The idea we present in this patch set is to mitigate the issue
> > with a pre-constructed memory mapping table. We fast-pin the
> > guest memory to build up a global memory mapping table according
> > to the guest memslot changes and apply it to CR3, so that after
> > the guest starts up, all the vCPUs can update memory
> > simultaneously without page fault exceptions; from this we
> > expect a performance improvement.
> >
> > We used a memory dirty pattern workload to test the initial
> > patch set and got positive results even with huge pages enabled.
> > For example, we created a guest with 32 vCPUs and 64G of memory
> > and let the vCPUs dirty the entire memory region concurrently;
> > since the initial patch eliminates the mmu_lock overhead, in
> > 2M/1G huge page mode the job completes about 50% faster.
> >
> > We have only validated this feature on the Intel x86 platform.
> > As Ben pointed out in RFC V1, so far we disable SMM for resource
> > considerations and drop the MMU notification, since in this case
> > the memory is pinned.
> >
> > V1->V2:
> > * Rebase the code to kernel version 5.9.0-rc1.
> >
> > Yulei Zhang (9):
> > Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > support
> > Introduce page table population function for direct build EPT feature
> > Introduce page table remove function for direct build EPT feature
> > Add release function for direct build ept when guest VM exit
> > Modify the page fault path to meet the direct build EPT requirement
> > Apply the direct build EPT according to the memory slots change
> > Add migration support when using direct build EPT
> > Introduce kvm module parameter global_tdp to turn on the direct build
> > EPT mode
> > Handle certain mmu exposed functions properly while turn on direct
> > build EPT mode
> >
> > arch/mips/kvm/mips.c | 13 +
> > arch/powerpc/kvm/powerpc.c | 13 +
> > arch/s390/kvm/kvm-s390.c | 13 +
> > arch/x86/include/asm/kvm_host.h | 13 +-
> > arch/x86/kvm/mmu/mmu.c | 533 ++++++++++++++++++++++++++++++--
> > arch/x86/kvm/svm/svm.c | 2 +-
> > arch/x86/kvm/vmx/vmx.c | 7 +-
> > arch/x86/kvm/x86.c | 55 ++--
> > include/linux/kvm_host.h | 7 +-
> > virt/kvm/kvm_main.c | 43 ++-
> > 10 files changed, 639 insertions(+), 60 deletions(-)
> >
> > --
> > 2.17.1
> >
* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
2020-09-24 6:28 ` Wanpeng Li
@ 2020-09-24 17:14 ` Ben Gardon
2020-09-25 12:04 ` yulei zhang
0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2020-09-24 17:14 UTC (permalink / raw)
To: Wanpeng Li
Cc: Yulei Zhang, Paolo Bonzini, kvm, LKML, Sean Christopherson,
Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
Haiwei Li
On Wed, Sep 23, 2020 at 11:28 PM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> Any comments? Paolo! :)
Hi, sorry to be so late in replying! I wanted to post the first part
of the TDP MMU series I've been working on before responding so we
could discuss the two together, but I haven't been able to get it out
as fast as I would have liked. (I'll send it ASAP!) I'm hopeful that
it will ultimately help address some of the page fault handling and
lock contention issues you're addressing with these patches. I'd also
be happy to work together to add a prepopulation feature to it. I'll
put in some more comments inline below.
> On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > Any comments? guys!
> > On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> > >
> > > From: Yulei Zhang <yulei.kernel@gmail.com>
> > >
> > > Currently, KVM memory virtualization relies on mmu_lock to
> > > synchronize memory mapping updates, which makes the vCPUs run
> > > serialized and slows down execution. This is especially visible
> > > after migration, when substantial memory mapping causes a
> > > noticeable performance drop, and it gets worse as the guest's
> > > vCPU count and memory size grow.
> > >
> > > The idea we present in this patch set is to mitigate the issue
> > > with a pre-constructed memory mapping table. We fast-pin the
> > > guest memory to build up a global memory mapping table according
> > > to the guest memslot changes and apply it to CR3, so that after
> > > the guest starts up, all the vCPUs can update memory
> > > simultaneously without page fault exceptions; from this we
> > > expect a performance improvement.
My understanding from this RFC is that your primary goal is to
eliminate page fault latencies and lock contention arising from the
first page faults incurred by vCPUs when initially populating the EPT.
Is that right?
I have the impression that the pinning and generally static memory
mappings are more a convenient simplification than part of a larger
goal to avoid incurring page faults down the line. Is that correct?
I ask because I didn't fully understand, from our conversation on v1
of this RFC, why reimplementing the page fault handler and associated
functions was necessary for the above goals, as I understood them.
My impression of the prepopulation approach is that, KVM will
sequentially populate all the EPT entries to map guest memory. I
understand how this could be optimized to be quite efficient, but I
don't understand how it would scale better than the existing
implementation with one vCPU accessing memory.
> > >
> > > We used a memory dirty pattern workload to test the initial
> > > patch set and got positive results even with huge pages enabled.
> > > For example, we created a guest with 32 vCPUs and 64G of memory
> > > and let the vCPUs dirty the entire memory region concurrently;
> > > since the initial patch eliminates the mmu_lock overhead, in
> > > 2M/1G huge page mode the job completes about 50% faster.
In this benchmark did you include the time required to pre-populate
the EPT or just the time required for the vCPUs to dirty memory?
I ask because I'm curious if your priority is to decrease the total
end-to-end time, or you just care about the guest experience, and not
so much the VM startup time.
How does this compare to the case where 1 vCPU reads every page of
memory and then 32 vCPUs concurrently dirty every page?
> > >
> > > We have only validated this feature on the Intel x86 platform.
> > > As Ben pointed out in RFC V1, so far we disable SMM for resource
> > > considerations and drop the MMU notification, since in this case
> > > the memory is pinned.
I'm excited to see big MMU changes like this, and I look forward to
combining our needs towards a better MMU for the x86 TDP case. Have
you thought about how you would build SMM and MMU notifier support
onto this patch series? I know that the invalidate range notifiers, at
least, added a lot of non-trivial complexity to the direct MMU
implementation I presented last year.
> > >
> > > V1->V2:
> > > * Rebase the code to kernel version 5.9.0-rc1.
> > >
> > > Yulei Zhang (9):
> > > Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > > support
> > > Introduce page table population function for direct build EPT feature
> > > Introduce page table remove function for direct build EPT feature
> > > Add release function for direct build ept when guest VM exit
> > > Modify the page fault path to meet the direct build EPT requirement
> > > Apply the direct build EPT according to the memory slots change
> > > Add migration support when using direct build EPT
> > > Introduce kvm module parameter global_tdp to turn on the direct build
> > > EPT mode
> > > Handle certain mmu exposed functions properly while turn on direct
> > > build EPT mode
> > >
> > > arch/mips/kvm/mips.c | 13 +
> > > arch/powerpc/kvm/powerpc.c | 13 +
> > > arch/s390/kvm/kvm-s390.c | 13 +
> > > arch/x86/include/asm/kvm_host.h | 13 +-
> > > arch/x86/kvm/mmu/mmu.c | 533 ++++++++++++++++++++++++++++++--
> > > arch/x86/kvm/svm/svm.c | 2 +-
> > > arch/x86/kvm/vmx/vmx.c | 7 +-
> > > arch/x86/kvm/x86.c | 55 ++--
> > > include/linux/kvm_host.h | 7 +-
> > > virt/kvm/kvm_main.c | 43 ++-
> > > 10 files changed, 639 insertions(+), 60 deletions(-)
> > >
> > > --
> > > 2.17.1
> > >
* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
2020-09-24 17:14 ` Ben Gardon
@ 2020-09-25 12:04 ` yulei zhang
2020-09-25 17:30 ` Ben Gardon
0 siblings, 1 reply; 22+ messages in thread
From: yulei zhang @ 2020-09-25 12:04 UTC (permalink / raw)
To: Ben Gardon
Cc: Wanpeng Li, Paolo Bonzini, kvm, LKML, Sean Christopherson,
Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
Haiwei Li
On Fri, Sep 25, 2020 at 1:14 AM Ben Gardon <bgardon@google.com> wrote:
>
> On Wed, Sep 23, 2020 at 11:28 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > Any comments? Paolo! :)
>
> Hi, sorry to be so late in replying! I wanted to post the first part
> of the TDP MMU series I've been working on before responding so we
> could discuss the two together, but I haven't been able to get it out
> as fast as I would have liked. (I'll send it ASAP!) I'm hopeful that
> it will ultimately help address some of the page fault handling and
> lock contention issues you're addressing with these patches. I'd also
> be happy to work together to add a prepopulation feature to it. I'll
> put in some more comments inline below.
>
Thanks for the feedback; looking forward to your patchset.
> > On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
> > >
> > > Any comments? guys!
> > > On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> > > >
> > > > From: Yulei Zhang <yulei.kernel@gmail.com>
> > > >
> > > > Currently, KVM memory virtualization relies on mmu_lock to
> > > > synchronize memory mapping updates, which makes the vCPUs run
> > > > serialized and slows down execution. This is especially visible
> > > > after migration, when substantial memory mapping causes a
> > > > noticeable performance drop, and it gets worse as the guest's
> > > > vCPU count and memory size grow.
> > > >
> > > > The idea we present in this patch set is to mitigate the issue
> > > > with a pre-constructed memory mapping table. We fast-pin the
> > > > guest memory to build up a global memory mapping table according
> > > > to the guest memslot changes and apply it to CR3, so that after
> > > > the guest starts up, all the vCPUs can update memory
> > > > simultaneously without page fault exceptions; from this we
> > > > expect a performance improvement.
>
> My understanding from this RFC is that your primary goal is to
> eliminate page fault latencies and lock contention arising from the
> first page faults incurred by vCPUs when initially populating the EPT.
> Is that right?
>
That's right.
> I have the impression that the pinning and generally static memory
> mappings are more a convenient simplification than part of a larger
> goal to avoid incurring page faults down the line. Is that correct?
>
> I ask because I didn't fully understand, from our conversation on v1
> of this RFC, why reimplementing the page fault handler and associated
> functions was necessary for the above goals, as I understood them.
> My impression of the prepopulation approach is that, KVM will
> sequentially populate all the EPT entries to map guest memory. I
> understand how this could be optimized to be quite efficient, but I
> don't understand how it would scale better than the existing
> implementation with one vCPU accessing memory.
>
I don't think our goal is simply to eliminate page faults. Our target
scenario is live migration: when the workload resumes on the
destination VM after migration, it kicks off the vCPUs to build the
gfn-to-pfn mappings, but because of the mmu_lock the vCPUs execute
sequentially, which significantly slows down the workload in the VM
and hurts the end-user experience, especially for memory-sensitive
workloads. Pre-populating the EPT entries solves the problem smoothly,
as it allows the vCPUs to execute in parallel after migration.
> > > >
> > > > We used a memory dirty pattern workload to test the initial
> > > > patch set and got positive results even with huge pages enabled.
> > > > For example, we created a guest with 32 vCPUs and 64G of memory
> > > > and let the vCPUs dirty the entire memory region concurrently;
> > > > since the initial patch eliminates the mmu_lock overhead, in
> > > > 2M/1G huge page mode the job completes about 50% faster.
>
> In this benchmark did you include the time required to pre-populate
> the EPT or just the time required for the vCPUs to dirty memory?
> I ask because I'm curious if your priority is to decrease the total
> end-to-end time, or you just care about the guest experience, and not
> so much the VM startup time.
We compare the time it takes each vCPU thread to finish the dirtying
job. Yes, pre-populating the page table takes some time, but since
each vCPU thread gains a large advantage from the concurrent dirty
writes, the result is still better even when we count the
pre-population in the total time.
> How does this compare to the case where 1 vCPU reads every page of
> memory and then 32 vCPUs concurrently dirty every page?
>
We haven't tried this yet; I think the major difference would be the
page fault latency introduced by the single-vCPU read.
> > > >
> > > > We have only validated this feature on the Intel x86 platform.
> > > > As Ben pointed out in RFC V1, so far we disable SMM for resource
> > > > considerations and drop the MMU notification, since in this case
> > > > the memory is pinned.
>
> I'm excited to see big MMU changes like this, and I look forward to
> combining our needs towards a better MMU for the x86 TDP case. Have
> you thought about how you would build SMM and MMU notifier support
> onto this patch series? I know that the invalidate range notifiers, at
> least, added a lot of non-trivial complexity to the direct MMU
> implementation I presented last year.
>
Thanks for the suggestion, I will think about it.
> > > >
> > > > V1->V2:
> > > > * Rebase the code to kernel version 5.9.0-rc1.
> > > >
> > > > Yulei Zhang (9):
> > > > Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > > > support
> > > > Introduce page table population function for direct build EPT feature
> > > > Introduce page table remove function for direct build EPT feature
> > > > Add release function for direct build ept when guest VM exit
> > > > Modify the page fault path to meet the direct build EPT requirement
> > > > Apply the direct build EPT according to the memory slots change
> > > > Add migration support when using direct build EPT
> > > > Introduce kvm module parameter global_tdp to turn on the direct build
> > > > EPT mode
> > > > Handle certain mmu exposed functions properly while turn on direct
> > > > build EPT mode
> > > >
> > > > arch/mips/kvm/mips.c | 13 +
> > > > arch/powerpc/kvm/powerpc.c | 13 +
> > > > arch/s390/kvm/kvm-s390.c | 13 +
> > > > arch/x86/include/asm/kvm_host.h | 13 +-
> > > > arch/x86/kvm/mmu/mmu.c | 533 ++++++++++++++++++++++++++++++--
> > > > arch/x86/kvm/svm/svm.c | 2 +-
> > > > arch/x86/kvm/vmx/vmx.c | 7 +-
> > > > arch/x86/kvm/x86.c | 55 ++--
> > > > include/linux/kvm_host.h | 7 +-
> > > > virt/kvm/kvm_main.c | 43 ++-
> > > > 10 files changed, 639 insertions(+), 60 deletions(-)
> > > >
> > > > --
> > > > 2.17.1
> > > >
* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
2020-09-25 12:04 ` yulei zhang
@ 2020-09-25 17:30 ` Ben Gardon
2020-09-25 20:50 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Ben Gardon @ 2020-09-25 17:30 UTC (permalink / raw)
To: yulei zhang
Cc: Wanpeng Li, Paolo Bonzini, kvm, LKML, Sean Christopherson,
Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
Haiwei Li
On Fri, Sep 25, 2020 at 5:04 AM yulei zhang <yulei.kernel@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 1:14 AM Ben Gardon <bgardon@google.com> wrote:
> >
> > On Wed, Sep 23, 2020 at 11:28 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> > >
> > > Any comments? Paolo! :)
> >
> > Hi, sorry to be so late in replying! I wanted to post the first part
> > of the TDP MMU series I've been working on before responding so we
> > could discuss the two together, but I haven't been able to get it out
> > as fast as I would have liked. (I'll send it ASAP!) I'm hopeful that
> > it will ultimately help address some of the page fault handling and
> > lock contention issues you're addressing with these patches. I'd also
> > be happy to work together to add a prepopulation feature to it. I'll
> > put in some more comments inline below.
> >
>
> Thanks for the feedback and looking forward to your patchset.
>
> > > On Wed, 9 Sep 2020 at 11:04, Wanpeng Li <kernellwp@gmail.com> wrote:
> > > >
> > > > Any comments? guys!
> > > > On Tue, 1 Sep 2020 at 19:52, <yulei.kernel@gmail.com> wrote:
> > > > >
> > > > > From: Yulei Zhang <yulei.kernel@gmail.com>
> > > > >
> > > > > Currently in KVM memory virtualization we rely on mmu_lock to
> > > > > synchronize memory mapping updates, which makes the vCPUs work
> > > > > in serialized mode and slows down execution. Especially after
> > > > > migration, the substantial memory mapping work causes a visible
> > > > > performance drop, and it gets worse as the guest's vCPU count
> > > > > and memory size grow.
> > > > >
> > > > > The idea we present in this patch set is to mitigate the issue
> > > > > with a pre-constructed memory mapping table. We fast-pin the
> > > > > guest memory to build a global memory mapping table according
> > > > > to the guest memslot changes and apply it to cr3, so that after
> > > > > the guest starts up all the vCPUs can update memory
> > > > > simultaneously without page fault exceptions; thus a performance
> > > > > improvement is expected.
> >
> > My understanding from this RFC is that your primary goal is to
> > eliminate page fault latencies and lock contention arising from the
> > first page faults incurred by vCPUs when initially populating the EPT.
> > Is that right?
> >
>
> That's right.
>
> > I have the impression that the pinning and generally static memory
> > mappings are more a convenient simplification than part of a larger
> > goal to avoid incurring page faults down the line. Is that correct?
> >
> > I ask because I didn't fully understand, from our conversation on v1
> > of this RFC, why reimplementing the page fault handler and associated
> > functions was necessary for the above goals, as I understood them.
> > My impression of the prepopulation approach is that KVM will
> > sequentially populate all the EPT entries to map guest memory. I
> > understand how this could be optimized to be quite efficient, but I
> > don't understand how it would scale better than the existing
> > implementation with one vCPU accessing memory.
> >
>
> I don't think our goal is simply to eliminate page faults. Our target
> scenario is live migration: when the workload resumes on the destination
> VM after migration, it kicks off the vCPUs to build the gfn-to-pfn
> mappings, but the mmu_lock forces the vCPUs to execute sequentially,
> which significantly slows down the workload in the VM and hurts the
> end-user experience, especially for memory-sensitive workloads.
> Pre-populating the EPT entries solves the problem smoothly, as it
> allows the vCPUs to execute in parallel after migration.
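The contention being described can be illustrated with a toy Python simulation (this is not KVM code; the class, page counts, and PFN encoding are all made-up illustrations). On-demand faulting forces every vCPU through one global lock, like mmu_lock; a pre-built table removes the faults entirely:

```python
import threading

# Toy model: each "vCPU" thread touches every GFN in guest memory.
# On-demand mode: a missing GFN->PFN entry triggers a "fault" that must
# hold one global lock (serializing the vCPUs, like mmu_lock).
# Prepopulated mode: the table is built up front, so vCPUs never fault.

NUM_VCPUS = 4
PAGES = 1024

class ToyMmu:
    def __init__(self, prepopulate=False):
        self.lock = threading.Lock()
        self.table = {}
        self.faults = 0
        if prepopulate:
            # Build the whole GFN->PFN table before any vCPU runs,
            # mirroring the patch set's direct-build EPT idea.
            self.table = {gfn: gfn + 0x1000 for gfn in range(PAGES)}

    def access(self, gfn):
        if gfn not in self.table:
            with self.lock:              # contended: serializes all vCPUs
                if gfn not in self.table:  # double-check under the lock
                    self.table[gfn] = gfn + 0x1000
                    self.faults += 1
        return self.table[gfn]

def run(mmu):
    def vcpu():
        for gfn in range(PAGES):
            mmu.access(gfn)
    threads = [threading.Thread(target=vcpu) for _ in range(NUM_VCPUS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mmu.faults

print(run(ToyMmu(prepopulate=False)))  # PAGES faults, each under the lock
print(run(ToyMmu(prepopulate=True)))   # 0 faults, no lock contention
```

With prepopulation the lock is never taken on the access path, which is the property that lets the vCPUs dirty memory in parallel after migration.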
Oh, thank you for explaining that. I didn't realize the goal here was
to improve LM performance. I was under the impression that this was to
give VMs a better experience on startup for fast scaling or something.
In your testing with live migration how has this affected the
distribution of time between the phases of live migration? Just for
terminology (since I'm not sure how standard it is across the
industry) I think of a live migration as consisting of 3 stages:
precopy, blackout, and postcopy. In precopy we're tracking the VM's
working set via dirty logging and sending the contents of its memory
to the target host. In blackout we pause the vCPUs on the source, copy
minimal data to the target, and resume the vCPUs on the target. In
postcopy we may still have some pages that have not been copied to the
target and so request those in response to vCPU page faults via
userfaultfd or some other mechanism.
Does EPT pre-population preclude the use of a postcopy phase? I would
expect that to make the blackout phase really long. Has that not been
a problem for you?
I love the idea of partial EPT pre-population during precopy if you
could still handle postcopy and just pre-populate as memory came in.
>
> > > > >
> > > > > We use a memory dirty pattern workload to test the initial patch
> > > > > set and get positive results even with huge pages enabled. For example,
> > > > > we create a guest with 32 vCPUs and 64G of memory, and let the vCPUs
> > > > > dirty the entire memory region concurrently. As the initial patch set
> > > > > eliminates the overhead of mmu_lock, in 2M/1G huge page mode we
> > > > > get the job done about 50% faster.
> >
> > In this benchmark did you include the time required to pre-populate
> > the EPT or just the time required for the vCPUs to dirty memory?
> > I ask because I'm curious if your priority is to decrease the total
> > end-to-end time, or you just care about the guest experience, and not
> > so much the VM startup time.
>
> We compare the time for each vCPU thread to finish the dirty job. Yes,
> the page table pre-population can take some time, but since each vCPU
> thread gains a huge advantage from the concurrent dirty writes, the
> total time is still better even counting that in.
That makes sense to me. Your implementation definitely seems more
efficient than the existing PF handling path. It's probably much
easier to parallelize as a sort of recursive population operation too.
>
> > How does this compare to the case where 1 vCPU reads every page of
> > memory and then 32 vCPUs concurrently dirty every page?
> >
>
> Haven't tried this yet; I think the major difference would be the page fault
> latency introduced by the single vCPU's reads.
I agree. The whole VM exit path adds a lot of overhead. I wonder what
kind of numbers you'd get if you cranked PTE_PREFETCH_NUM way up
though. If you set that to >= your memory size, one PF could
pre-populate the entire EPT. It's a silly approach, but it would be a
lot more efficient as an easy POC.
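The effect of cranking the prefetch fan-out can be sketched with a toy counter (illustrative only: the function, the page counts, and the simplified aligned-window fill are assumptions, not KVM's actual prefetch logic):

```python
# Toy sketch of the prefetch idea: on a fault, fill the faulting GFN's
# aligned window of `prefetch` contiguous entries, so a larger fan-out
# needs fewer faults (i.e. fewer VM exits) to cover all of guest memory.

PTE_PREFETCH_NUM = 8  # KVM's default fan-out; any power of two works here

def faults_to_map(num_pages, prefetch):
    """Count faults needed if each fault fills `prefetch` contiguous entries."""
    mapped = [False] * num_pages
    faults = 0
    for gfn in range(num_pages):
        if not mapped[gfn]:
            faults += 1
            # Fill the aligned prefetch window containing the faulting GFN.
            start = (gfn // prefetch) * prefetch
            for g in range(start, min(start + prefetch, num_pages)):
                mapped[g] = True
    return faults

print(faults_to_map(4096, 1))                 # 4096: one fault per page
print(faults_to_map(4096, PTE_PREFETCH_NUM))  # 512: 8x fewer exits
print(faults_to_map(4096, 4096))              # 1: one fault maps everything
```

Setting the fan-out to the whole memory size collapses population into a single fault, which is the "silly but efficient" POC being suggested.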
>
> > > > >
> > > > > We have only validated this feature on the Intel x86 platform. As Ben
> > > > > pointed out in RFC V1, so far we disable SMM for resource
> > > > > considerations and drop the mmu notifier, as in this case the
> > > > > memory is pinned.
> >
> > I'm excited to see big MMU changes like this, and I look forward to
> > combining our needs towards a better MMU for the x86 TDP case. Have
> > you thought about how you would build SMM and MMU notifier support
> > onto this patch series? I know that the invalidate range notifiers, at
> > least, added a lot of non-trivial complexity to the direct MMU
> > implementation I presented last year.
> >
>
> Thanks for the suggestion, I will think about it.
>
> > > > >
> > > > > V1->V2:
> > > > > * Rebase the code to kernel version 5.9.0-rc1.
> > > > >
> > > > > Yulei Zhang (9):
> > > > > Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT
> > > > > support
> > > > > Introduce page table population function for direct build EPT feature
> > > > > Introduce page table remove function for direct build EPT feature
> > > > > Add release function for direct build ept when guest VM exit
> > > > > Modify the page fault path to meet the direct build EPT requirement
> > > > > Apply the direct build EPT according to the memory slots change
> > > > > Add migration support when using direct build EPT
> > > > > Introduce kvm module parameter global_tdp to turn on the direct build
> > > > > EPT mode
> > > > > Handle certain mmu exposed functions properly while turn on direct
> > > > > build EPT mode
> > > > >
> > > > > arch/mips/kvm/mips.c | 13 +
> > > > > arch/powerpc/kvm/powerpc.c | 13 +
> > > > > arch/s390/kvm/kvm-s390.c | 13 +
> > > > > arch/x86/include/asm/kvm_host.h | 13 +-
> > > > > arch/x86/kvm/mmu/mmu.c | 533 ++++++++++++++++++++++++++++++--
> > > > > arch/x86/kvm/svm/svm.c | 2 +-
> > > > > arch/x86/kvm/vmx/vmx.c | 7 +-
> > > > > arch/x86/kvm/x86.c | 55 ++--
> > > > > include/linux/kvm_host.h | 7 +-
> > > > > virt/kvm/kvm_main.c | 43 ++-
> > > > > 10 files changed, 639 insertions(+), 60 deletions(-)
> > > > >
> > > > > --
> > > > > 2.17.1
> > > > >
* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
2020-09-25 17:30 ` Ben Gardon
@ 2020-09-25 20:50 ` Paolo Bonzini
2020-09-28 11:52 ` yulei zhang
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2020-09-25 20:50 UTC (permalink / raw)
To: Ben Gardon, yulei zhang
Cc: Wanpeng Li, kvm, LKML, Sean Christopherson, Jim Mattson,
Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong, Haiwei Li
On 25/09/20 19:30, Ben Gardon wrote:
> Oh, thank you for explaining that. I didn't realize the goal here was
> to improve LM performance. I was under the impression that this was to
> give VMs a better experience on startup for fast scaling or something.
> In your testing with live migration how has this affected the
> distribution of time between the phases of live migration? Just for
> terminology (since I'm not sure how standard it is across the
> industry) I think of a live migration as consisting of 3 stages:
> precopy, blackout, and postcopy. In precopy we're tracking the VM's
> working set via dirty logging and sending the contents of its memory
> to the target host. In blackout we pause the vCPUs on the source, copy
> minimal data to the target, and resume the vCPUs on the target. In
> postcopy we may still have some pages that have not been copied to the
> target and so request those in response to vCPU page faults via user
> fault fd or some other mechanism.
>
> Does EPT pre-population preclude the use of a postcopy phase?
I think so.
As a quick recap, pure postcopy migration handles two kinds of
pages---they can be copied to the destination either in background
(stuff that was dirty when userspace decided to transition to the
blackout phase) or on-demand (relayed from KVM to userspace via
get_user_pages and userfaultfd). Normally only on-demand pages would be
served through userfaultfd, while with prepopulation every missing page
would be faulted in from the kernel through userfaultfd. In practice
this would just extend the blackout phase.
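A toy accounting of that point (illustrative only: the function, the guest size, and the hot-set figure are made-up numbers, not measurements): plain postcopy only stalls on the pages the guest actually touches early, while prepopulation pulls every missing page before the guest can run.

```python
# Toy accounting: in plain postcopy, only the missing pages the guest
# touches early are pulled on demand; prepopulation faults in *every*
# not-yet-copied page up front, so that cost lands in the blackout phase.

def blackout_pages(total_pages, copied_precopy, touched_early, prepopulate):
    missing = total_pages - copied_precopy
    if prepopulate:
        # Building the full EPT faults every missing page (each relayed
        # through userfaultfd) before the guest can run.
        return missing
    # Plain postcopy: only early-touched missing pages stall the guest;
    # the rest stream in the background.
    return min(touched_early, missing)

# 16 GiB guest (4 KiB pages), 75% sent during precopy, small hot set.
total, precopied, hot = 4 << 20, 3 << 20, 10_000
print(blackout_pages(total, precopied, hot, prepopulate=False))  # 10000
print(blackout_pages(total, precopied, hot, prepopulate=True))   # 1048576
```

The two orders of magnitude between those counts is why full prepopulation effectively folds the postcopy transfer into the blackout phase.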
Paolo
> I would
> expect that to make the blackout phase really long. Has that not been
> a problem for you?
>
> I love the idea of partial EPT pre-population during precopy if you
> could still handle postcopy and just pre-populate as memory came in.
>
* Re: [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance
2020-09-25 20:50 ` Paolo Bonzini
@ 2020-09-28 11:52 ` yulei zhang
0 siblings, 0 replies; 22+ messages in thread
From: yulei zhang @ 2020-09-28 11:52 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Ben Gardon, Wanpeng Li, kvm, LKML, Sean Christopherson,
Jim Mattson, Junaid Shahid, Vitaly Kuznetsov, Xiao Guangrong,
Haiwei Li
On Sat, Sep 26, 2020 at 4:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 25/09/20 19:30, Ben Gardon wrote:
> > Oh, thank you for explaining that. I didn't realize the goal here was
> > to improve LM performance. I was under the impression that this was to
> > give VMs a better experience on startup for fast scaling or something.
> > In your testing with live migration how has this affected the
> > distribution of time between the phases of live migration? Just for
> > terminology (since I'm not sure how standard it is across the
> > industry) I think of a live migration as consisting of 3 stages:
> > precopy, blackout, and postcopy. In precopy we're tracking the VM's
> > working set via dirty logging and sending the contents of its memory
> > to the target host. In blackout we pause the vCPUs on the source, copy
> > minimal data to the target, and resume the vCPUs on the target. In
> > postcopy we may still have some pages that have not been copied to the
> > target and so request those in response to vCPU page faults via user
> > fault fd or some other mechanism.
> >
> > Does EPT pre-population preclude the use of a postcopy phase?
>
> I think so.
>
> As a quick recap, pure postcopy migration handles two kinds of
> pages---they can be copied to the destination either in background
> (stuff that was dirty when userspace decided to transition to the
> blackout phase) or on-demand (relayed from KVM to userspace via
> get_user_pages and userfaultfd). Normally only on-demand pages would be
> served through userfaultfd, while with prepopulation every missing page
> would be faulted in from the kernel through userfaultfd. In practice
> this would just extend the blackout phase.
>
> Paolo
>
Yep, you are right; based on the current implementation it doesn't support
postcopy. Thanks for the suggestion, we will try to fill the gap with proper
EPT population during postcopy.
> > I would
> > expect that to make the blackout phase really long. Has that not been
> > a problem for you?
> >
> > I love the idea of partial EPT pre-population during precopy if you
> > could still handle postcopy and just pre-populate as memory came in.
> >
>
* Re: [RFC V2 2/9] Introduce page table population function for direct build EPT feature
@ 2020-09-01 19:56 kernel test robot
0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-09-01 19:56 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 1524 bytes --]
CC: kbuild-all(a)lists.01.org
In-Reply-To: <f0c109e76f3cd4a1bfd1ca3ff74e0d36c0288ca9.1598868204.git.yulei.kernel@gmail.com>
References: <f0c109e76f3cd4a1bfd1ca3ff74e0d36c0288ca9.1598868204.git.yulei.kernel@gmail.com>
TO: yulei.kernel(a)gmail.com
Hi,
[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on kvm/linux-next]
[also build test WARNING on linus/master v5.9-rc3 next-20200828]
[cannot apply to kvms390/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/yulei-kernel-gmail-com/x86-mmu-Introduce-parallel-memory-virtualization-to-boost-performance/20200901-221509
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
:::::: branch date: 6 hours ago
:::::: commit date: 6 hours ago
config: x86_64-randconfig-c002-20200901 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
coccinelle warnings: (new ones prefixed by >>)
>> arch/x86/kvm/mmu/mmu.c:6299:5-8: Unneeded variable: "ret". Return "0" on line 6349
Please review and possibly fold the followup patch.
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 35445 bytes --]
end of thread, other threads:[~2020-09-28 11:53 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-01 11:52 [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance yulei.kernel
2020-09-01 11:54 ` [RFC V2 1/9] Introduce new fields in kvm_arch/vcpu_arch struct for direct build EPT support yulei.kernel
2020-09-01 11:55 ` [RFC V2 2/9] Introduce page table population function for direct build EPT feature yulei.kernel
2020-09-01 17:33 ` kernel test robot
2020-09-01 19:04 ` kernel test robot
2020-09-01 11:55 ` [RFC V2 3/9] Introduce page table remove " yulei.kernel
2020-09-01 11:55 ` [RFC V2 4/9] Add release function for direct build ept when guest VM exit yulei.kernel
2020-09-01 11:56 ` [RFC V2 5/9] Modify the page fault path to meet the direct build EPT requirement yulei.kernel
2020-09-01 11:56 ` [RFC V2 6/9] Apply the direct build EPT according to the memory slots change yulei.kernel
2020-09-01 22:20 ` kernel test robot
2020-09-02 7:00 ` kernel test robot
2020-09-01 11:56 ` [RFC V2 7/9] Add migration support when using direct build EPT yulei.kernel
2020-09-01 11:57 ` [RFC V2 8/9] Introduce kvm module parameter global_tdp to turn on the direct build EPT mode yulei.kernel
2020-09-01 11:57 ` [RFC V2 9/9] Handle certain mmu exposed functions properly while turn on " yulei.kernel
2020-09-09 3:04 ` [RFC V2 0/9] x86/mmu:Introduce parallel memory virtualization to boost performance Wanpeng Li
2020-09-24 6:28 ` Wanpeng Li
2020-09-24 17:14 ` Ben Gardon
2020-09-25 12:04 ` yulei zhang
2020-09-25 17:30 ` Ben Gardon
2020-09-25 20:50 ` Paolo Bonzini
2020-09-28 11:52 ` yulei zhang
2020-09-01 19:56 [RFC V2 2/9] Introduce page table population function for direct build EPT feature kernel test robot