* [RFC PATCH] KVM: x86: Support write protect huge pages lazily @ 2020-08-28 8:11 Keqian Zhu
  2021-04-06 11:49 ` Keqian Zhu
  0 siblings, 1 reply; 10+ messages in thread
From: Keqian Zhu @ 2020-08-28  8:11 UTC (permalink / raw)
  To: x86, kvm, linux-kernel
  Cc: wanghaibin.wang, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Keqian Zhu

Currently, when enabling dirty logging with init-all-set, we just
write protect huge pages and leave normal pages untouched, so that
dirty logging can be enabled for these pages lazily.

It seems that enabling dirty logging lazily for huge pages is feasible
too, which not only reduces the time needed to start dirty logging but
also greatly reduces the side effects on the guest when there is a
high dirty rate.

(This code is not tested; it is for RFC purposes :-) ).

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/mmu/mmu.c          | 65 ++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/vmx.c          |  3 +-
 arch/x86/kvm/x86.c              | 22 +++++------
 4 files changed, 62 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5303dbc5c9bc..201a068cf43d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-				      struct kvm_memory_slot *memslot,
-				      int start_level);
+				      struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 43fdb0c12a5d..4b7d577de6cd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 }
 
 /**
- * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * kvm_mmu_write_protect_largepage_masked - write protect selected largepages
  * @kvm: kvm instance
  * @slot: slot to protect
  * @gfn_offset: start of the BITS_PER_LONG pages we care about
  * @mask: indicates which pages we should protect
  *
- * Used when we do not need to care about huge page mappings: e.g. during dirty
- * logging we do not have any such mappings.
+ * Return: true if all pages are write protected
+ */
+static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm,
+				struct kvm_memory_slot *slot,
+				gfn_t gfn_offset, unsigned long mask)
+{
+	struct kvm_rmap_head *rmap_head;
+	bool protected, all_protected;
+	gfn_t start_gfn = slot->base_gfn + gfn_offset;
+	int i;
+
+	all_protected = true;
+	while (mask) {
+		protected = false;
+		for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+			rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot);
+			protected |= __rmap_write_protect(kvm, rmap_head, false);
+		}
+
+		all_protected &= protected;
+		/* clear the first set bit */
+		mask &= mask - 1;
+	}
+
+	return all_protected;
+}
+
+/**
+ * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
+ * @kvm: kvm instance
+ * @slot: slot to protect
+ * @gfn_offset: start of the BITS_PER_LONG pages we care about
+ * @mask: indicates which pages we should protect
  */
 static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
@@ -1679,18 +1710,25 @@ EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
 
 /**
  * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
- * PT level pages.
- *
- * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to
- * enable dirty logging for them.
- *
- * Used when we do not need to care about huge page mappings: e.g. during dirty
- * logging we do not have any such mappings.
+ * dirty pages.
  */
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 				struct kvm_memory_slot *slot,
 				gfn_t gfn_offset, unsigned long mask)
 {
+	/*
+	 * If we're with initial-all-set, huge pages are NOT
+	 * write protected when we start dirty log, so we must
+	 * write protect them here.
+	 */
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
+		if (kvm_mmu_write_protect_largepage_masked(kvm, slot,
+					gfn_offset, mask))
+			return;
+	}
+
+	/* Then we can handle the 4K level pages */
+
 	if (kvm_x86_ops.enable_log_dirty_pt_masked)
 		kvm_x86_ops.enable_log_dirty_pt_masked(kvm, slot, gfn_offset,
 						       mask);
@@ -5906,14 +5944,13 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
 }
 
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-				      struct kvm_memory_slot *memslot,
-				      int start_level)
+				      struct kvm_memory_slot *memslot)
 {
 	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-	flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
-				  start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
+	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
+				      false);
 	spin_unlock(&kvm->mmu_lock);
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 819c185adf09..ba871c52ef8b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7538,8 +7538,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 				      struct kvm_memory_slot *slot)
 {
-	if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
-		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
+	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
 	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d39d6cf1d473..c31c32f1424b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10225,22 +10225,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
	 * is enabled the D-bit or the W-bit will be cleared.
	 */
	if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+		/*
+		 * If we're with initial-all-set, we don't need
+		 * to write protect any page because they're
+		 * reported as dirty already.
+		 */
+		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+			return;
+
 		if (kvm_x86_ops.slot_enable_log_dirty) {
 			kvm_x86_ops.slot_enable_log_dirty(kvm, new);
 		} else {
-			int level =
-				kvm_dirty_log_manual_protect_and_init_set(kvm) ?
-				PG_LEVEL_2M : PG_LEVEL_4K;
-
-			/*
-			 * If we're with initial-all-set, we don't need
-			 * to write protect any small page because
-			 * they're reported as dirty already. However
-			 * we still need to write-protect huge pages
-			 * so that the page split can happen lazily on
-			 * the first write to the huge page.
-			 */
-			kvm_mmu_slot_remove_write_access(kvm, new, level);
+			kvm_mmu_slot_remove_write_access(kvm, new);
 		}
 	} else {
 		if (kvm_x86_ops.slot_disable_log_dirty)
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily
  2020-08-28  8:11 [RFC PATCH] KVM: x86: Support write protect huge pages lazily Keqian Zhu
@ 2021-04-06 11:49 ` Keqian Zhu
  2021-04-06 23:42 ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Keqian Zhu @ 2021-04-06 11:49 UTC (permalink / raw)
  To: kvm, linux-kernel, Paolo Bonzini; +Cc: wanghaibin.wang

Hi Paolo,

I plan to rework this patch and do a full test. What do you think about this
idea (enabling dirty logging for huge pages lazily)?

Best Regards,
Keqian

PS: As dirty logging for the TDP MMU has since been supported, I will need to
add more code.

On 2020/8/28 16:11, Keqian Zhu wrote:
> Currently during enable dirty logging, if we're with init-all-set,
> we just write protect huge pages and leave normal pages untouched,
> for that we can enable dirty logging for these pages lazily.
> 
> It seems that enable dirty logging lazily for huge pages is feasible
> too, which not only reduces the time of start dirty logging, also
> greatly reduces side-effect on guest when there is high dirty rate.
> 
> (These codes are not tested, for RFC purpose :-) ).
> > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> > --- > arch/x86/include/asm/kvm_host.h | 3 +- > arch/x86/kvm/mmu/mmu.c | 65 ++++++++++++++++++++++++++------- > arch/x86/kvm/vmx/vmx.c | 3 +- > arch/x86/kvm/x86.c | 22 +++++------ > 4 files changed, 62 insertions(+), 31 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 5303dbc5c9bc..201a068cf43d 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, > > void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); > void kvm_mmu_slot_remove_write_access(struct kvm *kvm, > - struct kvm_memory_slot *memslot, > - int start_level); > + struct kvm_memory_slot *memslot); > void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, > const struct kvm_memory_slot *memslot); > void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 43fdb0c12a5d..4b7d577de6cd 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) > } > > /** > - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages > + * kvm_mmu_write_protect_largepage_masked - write protect selected largepages > * @kvm: kvm instance > * @slot: slot to protect > * @gfn_offset: start of the BITS_PER_LONG pages we care about > * @mask: indicates which pages we should protect > * > - * Used when we do not need to care about huge page mappings: e.g. during dirty > - * logging we do not have any such mappings. 
> + * @ret: true if all pages are write protected > + */ > +static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm, > + struct kvm_memory_slot *slot, > + gfn_t gfn_offset, unsigned long mask) > +{ > + struct kvm_rmap_head *rmap_head; > + bool protected, all_protected; > + gfn_t start_gfn = slot->base_gfn + gfn_offset; > + int i; > + > + all_protected = true; > + while (mask) { > + protected = false; > + for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { > + rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot); > + protectd |= __rmap_write_protect(kvm, rmap_head, false); > + } > + > + all_protected &= protectd; > + /* clear the first set bit */ > + mask &= mask - 1; > + } > + > + return all_protected; > +} > + > +/** > + * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages > + * @kvm: kvm instance > + * @slot: slot to protect > + * @gfn_offset: start of the BITS_PER_LONG pages we care about > + * @mask: indicates which pages we should protect > */ > static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, > struct kvm_memory_slot *slot, > @@ -1679,18 +1710,25 @@ EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked); > > /** > * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected > - * PT level pages. > - * > - * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to > - * enable dirty logging for them. > - * > - * Used when we do not need to care about huge page mappings: e.g. during dirty > - * logging we do not have any such mappings. > + * dirty pages. > */ > void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, > struct kvm_memory_slot *slot, > gfn_t gfn_offset, unsigned long mask) > { > + /* > + * If we're with initial-all-set, huge pages are NOT > + * write protected when we start dirty log, so we must > + * write protect them here. 
> + */ > + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) { > + if (kvm_mmu_write_protect_largepage_masked(kvm, slot, > + gfn_offset, mask)) > + return; > + } > + > + /* Then we can handle the 4K level pages */ > + > if (kvm_x86_ops.enable_log_dirty_pt_masked) > kvm_x86_ops.enable_log_dirty_pt_masked(kvm, slot, gfn_offset, > mask); > @@ -5906,14 +5944,13 @@ static bool slot_rmap_write_protect(struct kvm *kvm, > } > > void kvm_mmu_slot_remove_write_access(struct kvm *kvm, > - struct kvm_memory_slot *memslot, > - int start_level) > + struct kvm_memory_slot *memslot) > { > bool flush; > > spin_lock(&kvm->mmu_lock); > - flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect, > - start_level, KVM_MAX_HUGEPAGE_LEVEL, false); > + flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect, > + false); > spin_unlock(&kvm->mmu_lock); > > /* > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 819c185adf09..ba871c52ef8b 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7538,8 +7538,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu) > static void vmx_slot_enable_log_dirty(struct kvm *kvm, > struct kvm_memory_slot *slot) > { > - if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) > - kvm_mmu_slot_leaf_clear_dirty(kvm, slot); > + kvm_mmu_slot_leaf_clear_dirty(kvm, slot); > kvm_mmu_slot_largepage_remove_write_access(kvm, slot); > } > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index d39d6cf1d473..c31c32f1424b 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10225,22 +10225,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm, > * is enabled the D-bit or the W-bit will be cleared. > */ > if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) { > + /* > + * If we're with initial-all-set, we don't need > + * to write protect any page because they're > + * reported as dirty already. 
> + */ > + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) > + return; > + > if (kvm_x86_ops.slot_enable_log_dirty) { > kvm_x86_ops.slot_enable_log_dirty(kvm, new); > } else { > - int level = > - kvm_dirty_log_manual_protect_and_init_set(kvm) ? > - PG_LEVEL_2M : PG_LEVEL_4K; > - > - /* > - * If we're with initial-all-set, we don't need > - * to write protect any small page because > - * they're reported as dirty already. However > - * we still need to write-protect huge pages > - * so that the page split can happen lazily on > - * the first write to the huge page. > - */ > - kvm_mmu_slot_remove_write_access(kvm, new, level); > + kvm_mmu_slot_remove_write_access(kvm, new); > } > } else { > if (kvm_x86_ops.slot_disable_log_dirty) > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily 2021-04-06 11:49 ` Keqian Zhu @ 2021-04-06 23:42 ` Sean Christopherson 2021-04-08 7:35 ` Keqian Zhu 2021-04-12 17:19 ` Ben Gardon 0 siblings, 2 replies; 10+ messages in thread From: Sean Christopherson @ 2021-04-06 23:42 UTC (permalink / raw) To: Keqian Zhu; +Cc: kvm, linux-kernel, Paolo Bonzini, wanghaibin.wang, Ben Gardon +Ben On Tue, Apr 06, 2021, Keqian Zhu wrote: > Hi Paolo, > > I plan to rework this patch and do full test. What do you think about this idea > (enable dirty logging for huge pages lazily)? Ben, don't you also have something similar (or maybe the exact opposite?) in the hopper? This sounds very familiar, but I can't quite connect the dots that are floating around my head... > PS: As dirty log of TDP MMU has been supported, I should add more code. > > On 2020/8/28 16:11, Keqian Zhu wrote: > > Currently during enable dirty logging, if we're with init-all-set, > > we just write protect huge pages and leave normal pages untouched, > > for that we can enable dirty logging for these pages lazily. > > > > It seems that enable dirty logging lazily for huge pages is feasible > > too, which not only reduces the time of start dirty logging, also > > greatly reduces side-effect on guest when there is high dirty rate. > > > > (These codes are not tested, for RFC purpose :-) ). 
> > > > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> > > --- > > arch/x86/include/asm/kvm_host.h | 3 +- > > arch/x86/kvm/mmu/mmu.c | 65 ++++++++++++++++++++++++++------- > > arch/x86/kvm/vmx/vmx.c | 3 +- > > arch/x86/kvm/x86.c | 22 +++++------ > > 4 files changed, 62 insertions(+), 31 deletions(-) > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > index 5303dbc5c9bc..201a068cf43d 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, > > > > void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); > > void kvm_mmu_slot_remove_write_access(struct kvm *kvm, > > - struct kvm_memory_slot *memslot, > > - int start_level); > > + struct kvm_memory_slot *memslot); > > void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, > > const struct kvm_memory_slot *memslot); > > void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index 43fdb0c12a5d..4b7d577de6cd 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) > > } > > > > /** > > - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages > > + * kvm_mmu_write_protect_largepage_masked - write protect selected largepages > > * @kvm: kvm instance > > * @slot: slot to protect > > * @gfn_offset: start of the BITS_PER_LONG pages we care about > > * @mask: indicates which pages we should protect > > * > > - * Used when we do not need to care about huge page mappings: e.g. during dirty > > - * logging we do not have any such mappings. 
> > + * @ret: true if all pages are write protected > > + */ > > +static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm, > > + struct kvm_memory_slot *slot, > > + gfn_t gfn_offset, unsigned long mask) > > +{ > > + struct kvm_rmap_head *rmap_head; > > + bool protected, all_protected; > > + gfn_t start_gfn = slot->base_gfn + gfn_offset; > > + int i; > > + > > + all_protected = true; > > + while (mask) { > > + protected = false; > > + for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { > > + rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot); > > + protectd |= __rmap_write_protect(kvm, rmap_head, false); > > + } > > + > > + all_protected &= protectd; > > + /* clear the first set bit */ > > + mask &= mask - 1; > > + } > > + > > + return all_protected; > > +} > > + > > +/** > > + * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages > > + * @kvm: kvm instance > > + * @slot: slot to protect > > + * @gfn_offset: start of the BITS_PER_LONG pages we care about > > + * @mask: indicates which pages we should protect > > */ > > static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, > > struct kvm_memory_slot *slot, > > @@ -1679,18 +1710,25 @@ EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked); > > > > /** > > * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected > > - * PT level pages. > > - * > > - * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to > > - * enable dirty logging for them. > > - * > > - * Used when we do not need to care about huge page mappings: e.g. during dirty > > - * logging we do not have any such mappings. > > + * dirty pages. > > */ > > void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, > > struct kvm_memory_slot *slot, > > gfn_t gfn_offset, unsigned long mask) > > { > > + /* > > + * If we're with initial-all-set, huge pages are NOT > > + * write protected when we start dirty log, so we must > > + * write protect them here. 
> > + */ > > + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) { > > + if (kvm_mmu_write_protect_largepage_masked(kvm, slot, > > + gfn_offset, mask)) > > + return; > > + } > > + > > + /* Then we can handle the 4K level pages */ > > + > > if (kvm_x86_ops.enable_log_dirty_pt_masked) > > kvm_x86_ops.enable_log_dirty_pt_masked(kvm, slot, gfn_offset, > > mask); > > @@ -5906,14 +5944,13 @@ static bool slot_rmap_write_protect(struct kvm *kvm, > > } > > > > void kvm_mmu_slot_remove_write_access(struct kvm *kvm, > > - struct kvm_memory_slot *memslot, > > - int start_level) > > + struct kvm_memory_slot *memslot) > > { > > bool flush; > > > > spin_lock(&kvm->mmu_lock); > > - flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect, > > - start_level, KVM_MAX_HUGEPAGE_LEVEL, false); > > + flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect, > > + false); > > spin_unlock(&kvm->mmu_lock); > > > > /* > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > index 819c185adf09..ba871c52ef8b 100644 > > --- a/arch/x86/kvm/vmx/vmx.c > > +++ b/arch/x86/kvm/vmx/vmx.c > > @@ -7538,8 +7538,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu) > > static void vmx_slot_enable_log_dirty(struct kvm *kvm, > > struct kvm_memory_slot *slot) > > { > > - if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) > > - kvm_mmu_slot_leaf_clear_dirty(kvm, slot); > > + kvm_mmu_slot_leaf_clear_dirty(kvm, slot); > > kvm_mmu_slot_largepage_remove_write_access(kvm, slot); > > } > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index d39d6cf1d473..c31c32f1424b 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -10225,22 +10225,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm, > > * is enabled the D-bit or the W-bit will be cleared. 
> > */ > > if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) { > > + /* > > + * If we're with initial-all-set, we don't need > > + * to write protect any page because they're > > + * reported as dirty already. > > + */ > > + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) > > + return; > > + > > if (kvm_x86_ops.slot_enable_log_dirty) { > > kvm_x86_ops.slot_enable_log_dirty(kvm, new); > > } else { > > - int level = > > - kvm_dirty_log_manual_protect_and_init_set(kvm) ? > > - PG_LEVEL_2M : PG_LEVEL_4K; > > - > > - /* > > - * If we're with initial-all-set, we don't need > > - * to write protect any small page because > > - * they're reported as dirty already. However > > - * we still need to write-protect huge pages > > - * so that the page split can happen lazily on > > - * the first write to the huge page. > > - */ > > - kvm_mmu_slot_remove_write_access(kvm, new, level); > > + kvm_mmu_slot_remove_write_access(kvm, new); > > } > > } else { > > if (kvm_x86_ops.slot_disable_log_dirty) > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily
  2021-04-06 23:42 ` Sean Christopherson
@ 2021-04-08  7:35 ` Keqian Zhu
  2021-04-08 15:52 ` Sean Christopherson
  2021-04-12 17:19 ` Ben Gardon
  1 sibling, 1 reply; 10+ messages in thread
From: Keqian Zhu @ 2021-04-08  7:35 UTC (permalink / raw)
  To: Ben Gardon
  Cc: Sean Christopherson, kvm, linux-kernel, Paolo Bonzini,
	wanghaibin.wang

Hi Ben,

Do you have any similar idea that you can share with us?

Thanks,
Keqian

On 2021/4/7 7:42, Sean Christopherson wrote:
> +Ben
> 
> On Tue, Apr 06, 2021, Keqian Zhu wrote:
>> Hi Paolo,
>>
>> I plan to rework this patch and do full test. What do you think about this idea
>> (enable dirty logging for huge pages lazily)?
> 
> Ben, don't you also have something similar (or maybe the exact opposite?) in the
> hopper? This sounds very familiar, but I can't quite connect the dots that are
> floating around my head...
> 
>> PS: As dirty log of TDP MMU has been supported, I should add more code.
>>
>> On 2020/8/28 16:11, Keqian Zhu wrote:
>>> Currently during enable dirty logging, if we're with init-all-set,
>>> we just write protect huge pages and leave normal pages untouched,
>>> for that we can enable dirty logging for these pages lazily.
>>>
>>> It seems that enable dirty logging lazily for huge pages is feasible
>>> too, which not only reduces the time of start dirty logging, also
>>> greatly reduces side-effect on guest when there is high dirty rate.
>>>
>>> (These codes are not tested, for RFC purpose :-) ).
>>> >>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> >>> --- >>> arch/x86/include/asm/kvm_host.h | 3 +- >>> arch/x86/kvm/mmu/mmu.c | 65 ++++++++++++++++++++++++++------- >>> arch/x86/kvm/vmx/vmx.c | 3 +- >>> arch/x86/kvm/x86.c | 22 +++++------ >>> 4 files changed, 62 insertions(+), 31 deletions(-) >>> >>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >>> index 5303dbc5c9bc..201a068cf43d 100644 >>> --- a/arch/x86/include/asm/kvm_host.h >>> +++ b/arch/x86/include/asm/kvm_host.h >>> @@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, >>> >>> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); >>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, >>> - struct kvm_memory_slot *memslot, >>> - int start_level); >>> + struct kvm_memory_slot *memslot); >>> void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, >>> const struct kvm_memory_slot *memslot); >>> void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c >>> index 43fdb0c12a5d..4b7d577de6cd 100644 >>> --- a/arch/x86/kvm/mmu/mmu.c >>> +++ b/arch/x86/kvm/mmu/mmu.c >>> @@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) >>> } >>> >>> /** >>> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages >>> + * kvm_mmu_write_protect_largepage_masked - write protect selected largepages >>> * @kvm: kvm instance >>> * @slot: slot to protect >>> * @gfn_offset: start of the BITS_PER_LONG pages we care about >>> * @mask: indicates which pages we should protect >>> * >>> - * Used when we do not need to care about huge page mappings: e.g. during dirty >>> - * logging we do not have any such mappings. 
>>> + * @ret: true if all pages are write protected >>> + */ >>> +static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm, >>> + struct kvm_memory_slot *slot, >>> + gfn_t gfn_offset, unsigned long mask) >>> +{ >>> + struct kvm_rmap_head *rmap_head; >>> + bool protected, all_protected; >>> + gfn_t start_gfn = slot->base_gfn + gfn_offset; >>> + int i; >>> + >>> + all_protected = true; >>> + while (mask) { >>> + protected = false; >>> + for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { >>> + rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot); >>> + protectd |= __rmap_write_protect(kvm, rmap_head, false); >>> + } >>> + >>> + all_protected &= protectd; >>> + /* clear the first set bit */ >>> + mask &= mask - 1; >>> + } >>> + >>> + return all_protected; >>> +} >>> + >>> +/** >>> + * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages >>> + * @kvm: kvm instance >>> + * @slot: slot to protect >>> + * @gfn_offset: start of the BITS_PER_LONG pages we care about >>> + * @mask: indicates which pages we should protect >>> */ >>> static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, >>> struct kvm_memory_slot *slot, >>> @@ -1679,18 +1710,25 @@ EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked); >>> >>> /** >>> * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected >>> - * PT level pages. >>> - * >>> - * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to >>> - * enable dirty logging for them. >>> - * >>> - * Used when we do not need to care about huge page mappings: e.g. during dirty >>> - * logging we do not have any such mappings. >>> + * dirty pages. >>> */ >>> void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, >>> struct kvm_memory_slot *slot, >>> gfn_t gfn_offset, unsigned long mask) >>> { >>> + /* >>> + * If we're with initial-all-set, huge pages are NOT >>> + * write protected when we start dirty log, so we must >>> + * write protect them here. 
>>> + */ >>> + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) { >>> + if (kvm_mmu_write_protect_largepage_masked(kvm, slot, >>> + gfn_offset, mask)) >>> + return; >>> + } >>> + >>> + /* Then we can handle the 4K level pages */ >>> + >>> if (kvm_x86_ops.enable_log_dirty_pt_masked) >>> kvm_x86_ops.enable_log_dirty_pt_masked(kvm, slot, gfn_offset, >>> mask); >>> @@ -5906,14 +5944,13 @@ static bool slot_rmap_write_protect(struct kvm *kvm, >>> } >>> >>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, >>> - struct kvm_memory_slot *memslot, >>> - int start_level) >>> + struct kvm_memory_slot *memslot) >>> { >>> bool flush; >>> >>> spin_lock(&kvm->mmu_lock); >>> - flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect, >>> - start_level, KVM_MAX_HUGEPAGE_LEVEL, false); >>> + flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect, >>> + false); >>> spin_unlock(&kvm->mmu_lock); >>> >>> /* >>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c >>> index 819c185adf09..ba871c52ef8b 100644 >>> --- a/arch/x86/kvm/vmx/vmx.c >>> +++ b/arch/x86/kvm/vmx/vmx.c >>> @@ -7538,8 +7538,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu) >>> static void vmx_slot_enable_log_dirty(struct kvm *kvm, >>> struct kvm_memory_slot *slot) >>> { >>> - if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) >>> - kvm_mmu_slot_leaf_clear_dirty(kvm, slot); >>> + kvm_mmu_slot_leaf_clear_dirty(kvm, slot); >>> kvm_mmu_slot_largepage_remove_write_access(kvm, slot); >>> } >>> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index d39d6cf1d473..c31c32f1424b 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -10225,22 +10225,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm, >>> * is enabled the D-bit or the W-bit will be cleared. 
>>> */ >>> if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) { >>> + /* >>> + * If we're with initial-all-set, we don't need >>> + * to write protect any page because they're >>> + * reported as dirty already. >>> + */ >>> + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) >>> + return; >>> + >>> if (kvm_x86_ops.slot_enable_log_dirty) { >>> kvm_x86_ops.slot_enable_log_dirty(kvm, new); >>> } else { >>> - int level = >>> - kvm_dirty_log_manual_protect_and_init_set(kvm) ? >>> - PG_LEVEL_2M : PG_LEVEL_4K; >>> - >>> - /* >>> - * If we're with initial-all-set, we don't need >>> - * to write protect any small page because >>> - * they're reported as dirty already. However >>> - * we still need to write-protect huge pages >>> - * so that the page split can happen lazily on >>> - * the first write to the huge page. >>> - */ >>> - kvm_mmu_slot_remove_write_access(kvm, new, level); >>> + kvm_mmu_slot_remove_write_access(kvm, new); >>> } >>> } else { >>> if (kvm_x86_ops.slot_disable_log_dirty) >>> > . > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily 2021-04-08 7:35 ` Keqian Zhu @ 2021-04-08 15:52 ` Sean Christopherson 2021-04-09 1:13 ` Keqian Zhu 0 siblings, 1 reply; 10+ messages in thread From: Sean Christopherson @ 2021-04-08 15:52 UTC (permalink / raw) To: Keqian Zhu; +Cc: Ben Gardon, kvm, linux-kernel, Paolo Bonzini, wanghaibin.wang On Thu, Apr 08, 2021, Keqian Zhu wrote: > Hi Ben, > > Do you have any similar idea that can share with us? Doh, Ben is out this week, he'll be back Monday. Sorry for gumming up the works :-/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily
  2021-04-08 15:52 ` Sean Christopherson
@ 2021-04-09  1:13 ` Keqian Zhu
  0 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2021-04-09  1:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ben Gardon, kvm, linux-kernel, Paolo Bonzini, wanghaibin.wang

Hi Sean,

On 2021/4/8 23:52, Sean Christopherson wrote:
> On Thu, Apr 08, 2021, Keqian Zhu wrote:
>> Hi Ben,
>>
>> Do you have any similar idea that can share with us?
> 
> Doh, Ben is out this week, he'll be back Monday.  Sorry for gumming up the works :-/
No worries. I'm glad we have some intersection of ideas.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily
  2021-04-06 23:42 ` Sean Christopherson
  2021-04-08  7:35 ` Keqian Zhu
@ 2021-04-12 17:19 ` Ben Gardon
  2021-04-13  9:39 ` Keqian Zhu
  1 sibling, 1 reply; 10+ messages in thread
From: Ben Gardon @ 2021-04-12 17:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Keqian Zhu, kvm, LKML, Paolo Bonzini, wanghaibin.wang

On Tue, Apr 6, 2021 at 4:42 PM Sean Christopherson <seanjc@google.com> wrote:
>
> +Ben
>
> On Tue, Apr 06, 2021, Keqian Zhu wrote:
> > Hi Paolo,
> >
> > I plan to rework this patch and do full test. What do you think about this idea
> > (enable dirty logging for huge pages lazily)?
>
> Ben, don't you also have something similar (or maybe the exact opposite?) in the
> hopper? This sounds very familiar, but I can't quite connect the dots that are
> floating around my head...

Sorry for the late response; I was out of office last week.

Yes, we have two relevant features I'd like to reconcile somehow:

1.) Large page shattering - Instead of clearing a large TDP mapping,
flushing the TLBs, then replacing it with an empty TDP page table, go
straight from the large mapping to a fully pre-populated table. This
is slightly slower because the table needs to be pre-populated, but it
saves many vCPU page faults.

2.) Eager page splitting - split all large mappings down to 4k when
enabling dirty logging, using large page shattering. This makes
enabling dirty logging much slower, but speeds up the first round (and
later rounds) of gathering / clearing the dirty log and reduces the
number of vCPU page faults. We've preferred to do this when enabling
dirty logging because it's a little less perf-sensitive than the later
passes, where latency and convergence are critical.

Large page shattering can happen in the NPT page fault handler or the
thread enabling dirty logging / clearing the dirty log, so it's
more-or-less orthogonal to this patch.
Eager page splitting on the other hand takes the opposite approach to this patch, frontloading as much of the work to enable dirty logging as possible. Which approach is better is going to depend a lot on the guest workload, your live migration constraints, and how the user-space hypervisor makes use of KVM's growing number of dirty logging options. In our case, the time to migrate a VM is usually less of a concern than the performance degradation the guest experiences, so we want to do everything we can to minimize vCPU exits and exit latency. I think this is a reasonable change in principle if we're not write protecting 4k pages already, but it's hard to really validate all the performance implications. With this change we'd move pretty much all the work to the first pass of clearing the dirty log, which is probably an improvement since it's much more granular. The downside is that we do more work when we'd really like to be converging the dirty set as opposed to earlier when we know all pages are dirty anyway. > > > PS: As dirty log of TDP MMU has been supported, I should add more code. > > > > On 2020/8/28 16:11, Keqian Zhu wrote: > > > Currently during enable dirty logging, if we're with init-all-set, > > > we just write protect huge pages and leave normal pages untouched, > > > for that we can enable dirty logging for these pages lazily. > > > > > > It seems that enable dirty logging lazily for huge pages is feasible > > > too, which not only reduces the time of start dirty logging, also > > > greatly reduces side-effect on guest when there is high dirty rate. The side effect on the guest would also be greatly reduced with large page shattering above. > > > > > > (These codes are not tested, for RFC purpose :-) ). 
> > > > > > Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> > > > --- > > > arch/x86/include/asm/kvm_host.h | 3 +- > > > arch/x86/kvm/mmu/mmu.c | 65 ++++++++++++++++++++++++++------- > > > arch/x86/kvm/vmx/vmx.c | 3 +- > > > arch/x86/kvm/x86.c | 22 +++++------ > > > 4 files changed, 62 insertions(+), 31 deletions(-) > > > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > > index 5303dbc5c9bc..201a068cf43d 100644 > > > --- a/arch/x86/include/asm/kvm_host.h > > > +++ b/arch/x86/include/asm/kvm_host.h > > > @@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, > > > > > > void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); > > > void kvm_mmu_slot_remove_write_access(struct kvm *kvm, > > > - struct kvm_memory_slot *memslot, > > > - int start_level); > > > + struct kvm_memory_slot *memslot); > > > void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, > > > const struct kvm_memory_slot *memslot); > > > void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > > index 43fdb0c12a5d..4b7d577de6cd 100644 > > > --- a/arch/x86/kvm/mmu/mmu.c > > > +++ b/arch/x86/kvm/mmu/mmu.c > > > @@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) > > > } > > > > > > /** > > > - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages > > > + * kvm_mmu_write_protect_largepage_masked - write protect selected largepages > > > * @kvm: kvm instance > > > * @slot: slot to protect > > > * @gfn_offset: start of the BITS_PER_LONG pages we care about > > > * @mask: indicates which pages we should protect > > > * > > > - * Used when we do not need to care about huge page mappings: e.g. during dirty > > > - * logging we do not have any such mappings. 
> > > + * @ret: true if all pages are write protected > > > + */ > > > +static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm, > > > + struct kvm_memory_slot *slot, > > > + gfn_t gfn_offset, unsigned long mask) > > > +{ > > > + struct kvm_rmap_head *rmap_head; > > > + bool protected, all_protected; > > > + gfn_t start_gfn = slot->base_gfn + gfn_offset; > > > + int i; > > > + > > > + all_protected = true; > > > + while (mask) { > > > + protected = false; > > > + for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { > > > + rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot); > > > + protectd |= __rmap_write_protect(kvm, rmap_head, false); > > > + } > > > + > > > + all_protected &= protectd; > > > + /* clear the first set bit */ > > > + mask &= mask - 1; I'm a little confused by the use of mask in this function. If gfn_offset is aligned to some multiple of 64, which I think it is, all the bits in the mask will be part of the same large page, so I don't think the mask adds anything. I'm also not sure this function compiles since I think the use of protectd above will result in an error. > > > + } > > > + > > > + return all_protected; > > > +} > > > + > > > +/** > > > + * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages > > > + * @kvm: kvm instance > > > + * @slot: slot to protect > > > + * @gfn_offset: start of the BITS_PER_LONG pages we care about > > > + * @mask: indicates which pages we should protect > > > */ > > > static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, > > > struct kvm_memory_slot *slot, > > > @@ -1679,18 +1710,25 @@ EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked); > > > > > > /** > > > * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected > > > - * PT level pages. > > > - * > > > - * It calls kvm_mmu_write_protect_pt_masked to write protect selected pages to > > > - * enable dirty logging for them. 
> > > - * > > > - * Used when we do not need to care about huge page mappings: e.g. during dirty > > > - * logging we do not have any such mappings. > > > + * dirty pages. > > > */ > > > void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm, > > > struct kvm_memory_slot *slot, > > > gfn_t gfn_offset, unsigned long mask) > > > { > > > + /* > > > + * If we're with initial-all-set, huge pages are NOT > > > + * write protected when we start dirty log, so we must > > > + * write protect them here. > > > + */ > > > + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) { > > > + if (kvm_mmu_write_protect_largepage_masked(kvm, slot, > > > + gfn_offset, mask)) > > > + return; > > > + } > > > + > > > + /* Then we can handle the 4K level pages */ > > > + > > > if (kvm_x86_ops.enable_log_dirty_pt_masked) > > > kvm_x86_ops.enable_log_dirty_pt_masked(kvm, slot, gfn_offset, > > > mask); > > > @@ -5906,14 +5944,13 @@ static bool slot_rmap_write_protect(struct kvm *kvm, > > > } > > > > > > void kvm_mmu_slot_remove_write_access(struct kvm *kvm, > > > - struct kvm_memory_slot *memslot, > > > - int start_level) > > > + struct kvm_memory_slot *memslot) > > > { > > > bool flush; > > > > > > spin_lock(&kvm->mmu_lock); > > > - flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect, > > > - start_level, KVM_MAX_HUGEPAGE_LEVEL, false); > > > + flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect, > > > + false); > > > spin_unlock(&kvm->mmu_lock); > > > > > > /* > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index 819c185adf09..ba871c52ef8b 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -7538,8 +7538,7 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu) > > > static void vmx_slot_enable_log_dirty(struct kvm *kvm, > > > struct kvm_memory_slot *slot) > > > { > > > - if (!kvm_dirty_log_manual_protect_and_init_set(kvm)) > > > - kvm_mmu_slot_leaf_clear_dirty(kvm, slot); > > > + 
kvm_mmu_slot_leaf_clear_dirty(kvm, slot); > > > kvm_mmu_slot_largepage_remove_write_access(kvm, slot); > > > } > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index d39d6cf1d473..c31c32f1424b 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -10225,22 +10225,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm, > > > * is enabled the D-bit or the W-bit will be cleared. > > > */ > > > if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) { > > > + /* > > > + * If we're with initial-all-set, we don't need > > > + * to write protect any page because they're > > > + * reported as dirty already. > > > + */ > > > + if (kvm_dirty_log_manual_protect_and_init_set(kvm)) > > > + return; > > > + > > > if (kvm_x86_ops.slot_enable_log_dirty) { > > > kvm_x86_ops.slot_enable_log_dirty(kvm, new); > > > } else { > > > - int level = > > > - kvm_dirty_log_manual_protect_and_init_set(kvm) ? > > > - PG_LEVEL_2M : PG_LEVEL_4K; > > > - > > > - /* > > > - * If we're with initial-all-set, we don't need > > > - * to write protect any small page because > > > - * they're reported as dirty already. However > > > - * we still need to write-protect huge pages > > > - * so that the page split can happen lazily on > > > - * the first write to the huge page. > > > - */ > > > - kvm_mmu_slot_remove_write_access(kvm, new, level); > > > + kvm_mmu_slot_remove_write_access(kvm, new); > > > } > > > } else { > > > if (kvm_x86_ops.slot_disable_log_dirty) > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily 2021-04-12 17:19 ` Ben Gardon @ 2021-04-13 9:39 ` Keqian Zhu 2021-04-13 16:43 ` Ben Gardon 0 siblings, 1 reply; 10+ messages in thread From: Keqian Zhu @ 2021-04-13 9:39 UTC (permalink / raw) To: Ben Gardon, Sean Christopherson; +Cc: kvm, LKML, Paolo Bonzini, wanghaibin.wang On 2021/4/13 1:19, Ben Gardon wrote: > On Tue, Apr 6, 2021 at 4:42 PM Sean Christopherson <seanjc@google.com> wrote: >> >> +Ben >> >> On Tue, Apr 06, 2021, Keqian Zhu wrote: >>> Hi Paolo, >>> >>> I plan to rework this patch and do full test. What do you think about this idea >>> (enable dirty logging for huge pages lazily)? >> >> Ben, don't you also have something similar (or maybe the exact opposite?) in the >> hopper? This sounds very familiar, but I can't quite connect the dots that are >> floating around my head... > > Sorry for the late response, I was out of office last week. Never mind, Sean has told to me. :) > > Yes, we have two relevant features I'd like to reconcile somehow: > 1.) Large page shattering - Instead of clearing a large TDP mapping, > flushing the TLBs, then replacing it with an empty TDP page table, go > straight from the large mapping to a fully pre-populated table. This > is slightly slower because the table needs to be pre-populated, but it > saves many vCPU page faults. > 2.) Eager page splitting - split all large mappings down to 4k when > enabling dirty logging, using large page shattering. This makes > enabling dirty logging much slower, but speeds up the first round (and > later rounds) of gathering / clearing the dirty log and reduces the > number of vCPU page faults. We've prefered to do this when enabling > dirty logging because it's a little less perf-sensitive than the later > passes where latency and convergence are critical. OK, I see. I think the lock stuff is an important part, so one question is that the shattering process is designed to be locked (i.e., protect mapping) or lock-less? 
If it's locked, vCPU thread may be blocked for a long time (For arm, there is a mmu_lock per VM). If it's lock-less, how can we ensure the synchronization of mapping? > > Large page shattering can happen in the NPT page fault handler or the > thread enabling dirty logging / clearing the dirty log, so it's > more-or-less orthogonal to this patch. > > Eager page splitting on the other hand takes the opposite approach to > this patch, frontloading as much of the work to enable dirty logging > as possible. Which approach is better is going to depend a lot on the > guest workload, your live migration constraints, and how the > user-space hypervisor makes use of KVM's growing number of dirty > logging options. In our case, the time to migrate a VM is usually less > of a concern than the performance degradation the guest experiences, > so we want to do everything we can to minimize vCPU exits and exit > latency. Yes, make sense to me. > > I think this is a reasonable change in principle if we're not write > protecting 4k pages already, but it's hard to really validate all the > performance implications. With this change we'd move pretty much all > the work to the first pass of clearing the dirty log, which is > probably an improvement since it's much more granular. The downside is Yes, at least split large page lazily is better than current logic. > that we do more work when we'd really like to be converging the dirty > set as opposed to earlier when we know all pages are dirty anyway. I think the dirty collecting procedure is not affected, do I miss something? > >> >>> PS: As dirty log of TDP MMU has been supported, I should add more code. >>> >>> On 2020/8/28 16:11, Keqian Zhu wrote: >>>> Currently during enable dirty logging, if we're with init-all-set, >>>> we just write protect huge pages and leave normal pages untouched, >>>> for that we can enable dirty logging for these pages lazily. 
>>>> >>>> It seems that enable dirty logging lazily for huge pages is feasible >>>> too, which not only reduces the time of start dirty logging, also >>>> greatly reduces side-effect on guest when there is high dirty rate. > > The side effect on the guest would also be greatly reduced with large > page shattering above. Sure. > >>>> >>>> (These codes are not tested, for RFC purpose :-) ). >>>> >>>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> >>>> --- >>>> arch/x86/include/asm/kvm_host.h | 3 +- >>>> arch/x86/kvm/mmu/mmu.c | 65 ++++++++++++++++++++++++++------- >>>> arch/x86/kvm/vmx/vmx.c | 3 +- >>>> arch/x86/kvm/x86.c | 22 +++++------ >>>> 4 files changed, 62 insertions(+), 31 deletions(-) >>>> >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h >>>> index 5303dbc5c9bc..201a068cf43d 100644 >>>> --- a/arch/x86/include/asm/kvm_host.h >>>> +++ b/arch/x86/include/asm/kvm_host.h >>>> @@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, >>>> >>>> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); >>>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, >>>> - struct kvm_memory_slot *memslot, >>>> - int start_level); >>>> + struct kvm_memory_slot *memslot); >>>> void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, >>>> const struct kvm_memory_slot *memslot); >>>> void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, >>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c >>>> index 43fdb0c12a5d..4b7d577de6cd 100644 >>>> --- a/arch/x86/kvm/mmu/mmu.c >>>> +++ b/arch/x86/kvm/mmu/mmu.c >>>> @@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) >>>> } >>>> >>>> /** >>>> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages >>>> + * kvm_mmu_write_protect_largepage_masked - write protect selected largepages >>>> * @kvm: kvm instance >>>> * @slot: slot to protect >>>> * @gfn_offset: start of the BITS_PER_LONG pages we care about >>>> 
* @mask: indicates which pages we should protect >>>> * >>>> - * Used when we do not need to care about huge page mappings: e.g. during dirty >>>> - * logging we do not have any such mappings. >>>> + * @ret: true if all pages are write protected >>>> + */ >>>> +static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm, >>>> + struct kvm_memory_slot *slot, >>>> + gfn_t gfn_offset, unsigned long mask) >>>> +{ >>>> + struct kvm_rmap_head *rmap_head; >>>> + bool protected, all_protected; >>>> + gfn_t start_gfn = slot->base_gfn + gfn_offset; >>>> + int i; >>>> + >>>> + all_protected = true; >>>> + while (mask) { >>>> + protected = false; >>>> + for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { >>>> + rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot); >>>> + protectd |= __rmap_write_protect(kvm, rmap_head, false); >>>> + } >>>> + >>>> + all_protected &= protectd; >>>> + /* clear the first set bit */ >>>> + mask &= mask - 1; > > I'm a little confused by the use of mask in this function. If > gfn_offset is aligned to some multiple of 64, which I think it is, all > the bits in the mask will be part of the same large page, so I don't > think the mask adds anything. Right. We just need to consider the first set bit. Thanks for your careful review. > I'm also not sure this function compiles since I think the use of > protectd above will result in an error. Yep, my fault. And as I mentioned before, I should add more code to support TDP... Thanks, Keqian ^ permalink raw reply [flat|nested] 10+ messages in thread
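Ben's observation about the mask can be checked numerically: with 4k base pages, a 2M large page covers 512 pages, so any 64-aligned gfn_offset keeps all 64 mask bits inside one large page and only the first set bit matters. The sketch below (plain userspace C; the helper name is invented, and GCC's __builtin_ctzl stands in for the kernel's __ffs) walks the mask with the same `mask &= mask - 1` idiom as the patch, assuming 64-bit unsigned long.

```c
#include <stdbool.h>

#define PT_2M_SHIFT 9   /* a 2M large page covers 512 4k pages */

/* True if every set bit of @mask names a gfn inside the same 2M
 * large page, for pages starting at @start_gfn (i.e. base_gfn +
 * gfn_offset in the patch's terms). */
static bool mask_in_one_largepage(unsigned long start_gfn,
				  unsigned long mask)
{
	unsigned long first = start_gfn >> PT_2M_SHIFT;

	while (mask) {
		/* __builtin_ctzl() plays the role of __ffs() */
		unsigned long gfn = start_gfn +
			(unsigned long)__builtin_ctzl(mask);
		if ((gfn >> PT_2M_SHIFT) != first)
			return false;
		mask &= mask - 1;   /* clear the lowest set bit */
	}
	return true;
}
```

Since 64-aligned starts satisfy start mod 512 <= 448, the window start..start+63 can never cross a 2M boundary, which is exactly why the per-bit loop over the mask adds nothing over handling the first set bit alone.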
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily 2021-04-13 9:39 ` Keqian Zhu @ 2021-04-13 16:43 ` Ben Gardon 2021-04-14 8:35 ` Keqian Zhu 0 siblings, 1 reply; 10+ messages in thread From: Ben Gardon @ 2021-04-13 16:43 UTC (permalink / raw) To: Keqian Zhu; +Cc: Sean Christopherson, kvm, LKML, Paolo Bonzini, wanghaibin.wang On Tue, Apr 13, 2021 at 2:39 AM Keqian Zhu <zhukeqian1@huawei.com> wrote: > > > > On 2021/4/13 1:19, Ben Gardon wrote: > > On Tue, Apr 6, 2021 at 4:42 PM Sean Christopherson <seanjc@google.com> wrote: > >> > >> +Ben > >> > >> On Tue, Apr 06, 2021, Keqian Zhu wrote: > >>> Hi Paolo, > >>> > >>> I plan to rework this patch and do full test. What do you think about this idea > >>> (enable dirty logging for huge pages lazily)? > >> > >> Ben, don't you also have something similar (or maybe the exact opposite?) in the > >> hopper? This sounds very familiar, but I can't quite connect the dots that are > >> floating around my head... > > > > Sorry for the late response, I was out of office last week. > Never mind, Sean has told to me. :) > > > > > Yes, we have two relevant features I'd like to reconcile somehow: > > 1.) Large page shattering - Instead of clearing a large TDP mapping, > > flushing the TLBs, then replacing it with an empty TDP page table, go > > straight from the large mapping to a fully pre-populated table. This > > is slightly slower because the table needs to be pre-populated, but it > > saves many vCPU page faults. > > 2.) Eager page splitting - split all large mappings down to 4k when > > enabling dirty logging, using large page shattering. This makes > > enabling dirty logging much slower, but speeds up the first round (and > > later rounds) of gathering / clearing the dirty log and reduces the > > number of vCPU page faults. We've prefered to do this when enabling > > dirty logging because it's a little less perf-sensitive than the later > > passes where latency and convergence are critical. > OK, I see. 
I think the lock stuff is an important part, so one question is that > the shattering process is designed to be locked (i.e., protect mapping) or lock-less? > > If it's locked, vCPU thread may be blocked for a long time (For arm, there is a > mmu_lock per VM). If it's lock-less, how can we ensure the synchronization of > mapping? The TDP MMU for x86 could do it under the MMU read lock, but the legacy / shadow x86 MMU and other architectures would need the whole MMU lock. While we do increase the time required to address a large SPTE, we can completely avoid the vCPU needing the MMU lock on an access to that SPTE as the translation goes straight from a large, writable SPTE, to a 4k spte with either the d bit cleared or write protected. If it's write protected, the fault can (at least on x86) be resolved without the MMU lock. When I'm able to put together a large page shattering series, I'll do some performance analysis and see how it changes things, but that part is sort of orthogonal to this change. The more I think about it, the better the init-all-set approach for large pages sounds, compared to eager splitting. I'm definitely in support of this patch and am happy to help review when you send out the v2 with TDP MMU support and such. > > > > > Large page shattering can happen in the NPT page fault handler or the > > thread enabling dirty logging / clearing the dirty log, so it's > > more-or-less orthogonal to this patch. > > > > Eager page splitting on the other hand takes the opposite approach to > > this patch, frontloading as much of the work to enable dirty logging > > as possible. Which approach is better is going to depend a lot on the > > guest workload, your live migration constraints, and how the > > user-space hypervisor makes use of KVM's growing number of dirty > > logging options. 
In our case, the time to migrate a VM is usually less > > of a concern than the performance degradation the guest experiences, > > so we want to do everything we can to minimize vCPU exits and exit > > latency. > Yes, make sense to me. > > > > > I think this is a reasonable change in principle if we're not write > > protecting 4k pages already, but it's hard to really validate all the > > performance implications. With this change we'd move pretty much all > > the work to the first pass of clearing the dirty log, which is > > probably an improvement since it's much more granular. The downside is > Yes, at least split large page lazily is better than current logic. > > > that we do more work when we'd really like to be converging the dirty > > set as opposed to earlier when we know all pages are dirty anyway. > I think the dirty collecting procedure is not affected, do I miss something? Oh yeah, good point. Since the splitting of large SPTEs is happening in the vCPU threads it wouldn't slow dirty log collection at all. We would have to do slightly more work to write protect the large SPTEs that weren't written to, but that's a relatively small amount of work. > > > > >> > >>> PS: As dirty log of TDP MMU has been supported, I should add more code. > >>> > >>> On 2020/8/28 16:11, Keqian Zhu wrote: > >>>> Currently during enable dirty logging, if we're with init-all-set, > >>>> we just write protect huge pages and leave normal pages untouched, > >>>> for that we can enable dirty logging for these pages lazily. > >>>> > >>>> It seems that enable dirty logging lazily for huge pages is feasible > >>>> too, which not only reduces the time of start dirty logging, also > >>>> greatly reduces side-effect on guest when there is high dirty rate. > > > > The side effect on the guest would also be greatly reduced with large > > page shattering above. > Sure. > > > > >>>> > >>>> (These codes are not tested, for RFC purpose :-) ). 
> >>>> > >>>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> > >>>> --- > >>>> arch/x86/include/asm/kvm_host.h | 3 +- > >>>> arch/x86/kvm/mmu/mmu.c | 65 ++++++++++++++++++++++++++------- > >>>> arch/x86/kvm/vmx/vmx.c | 3 +- > >>>> arch/x86/kvm/x86.c | 22 +++++------ > >>>> 4 files changed, 62 insertions(+), 31 deletions(-) > >>>> > >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > >>>> index 5303dbc5c9bc..201a068cf43d 100644 > >>>> --- a/arch/x86/include/asm/kvm_host.h > >>>> +++ b/arch/x86/include/asm/kvm_host.h > >>>> @@ -1296,8 +1296,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, > >>>> > >>>> void kvm_mmu_reset_context(struct kvm_vcpu *vcpu); > >>>> void kvm_mmu_slot_remove_write_access(struct kvm *kvm, > >>>> - struct kvm_memory_slot *memslot, > >>>> - int start_level); > >>>> + struct kvm_memory_slot *memslot); > >>>> void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm, > >>>> const struct kvm_memory_slot *memslot); > >>>> void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm, > >>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > >>>> index 43fdb0c12a5d..4b7d577de6cd 100644 > >>>> --- a/arch/x86/kvm/mmu/mmu.c > >>>> +++ b/arch/x86/kvm/mmu/mmu.c > >>>> @@ -1625,14 +1625,45 @@ static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head) > >>>> } > >>>> > >>>> /** > >>>> - * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages > >>>> + * kvm_mmu_write_protect_largepage_masked - write protect selected largepages > >>>> * @kvm: kvm instance > >>>> * @slot: slot to protect > >>>> * @gfn_offset: start of the BITS_PER_LONG pages we care about > >>>> * @mask: indicates which pages we should protect > >>>> * > >>>> - * Used when we do not need to care about huge page mappings: e.g. during dirty > >>>> - * logging we do not have any such mappings. 
> >>>> + * @ret: true if all pages are write protected > >>>> + */ > >>>> +static bool kvm_mmu_write_protect_largepage_masked(struct kvm *kvm, > >>>> + struct kvm_memory_slot *slot, > >>>> + gfn_t gfn_offset, unsigned long mask) > >>>> +{ > >>>> + struct kvm_rmap_head *rmap_head; > >>>> + bool protected, all_protected; > >>>> + gfn_t start_gfn = slot->base_gfn + gfn_offset; > >>>> + int i; > >>>> + > >>>> + all_protected = true; > >>>> + while (mask) { > >>>> + protected = false; > >>>> + for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) { > >>>> + rmap_head = __gfn_to_rmap(start_gfn + __ffs(mask), i, slot); > >>>> + protectd |= __rmap_write_protect(kvm, rmap_head, false); > >>>> + } > >>>> + > >>>> + all_protected &= protectd; > >>>> + /* clear the first set bit */ > >>>> + mask &= mask - 1; > > > > I'm a little confused by the use of mask in this function. If > > gfn_offset is aligned to some multiple of 64, which I think it is, all > > the bits in the mask will be part of the same large page, so I don't > > think the mask adds anything. > Right. We just need to consider the first set bit. Thanks for your careful review. > > > I'm also not sure this function compiles since I think the use of > > protectd above will result in an error. > Yep, my fault. > > And as I mentioned before, I should add more code to support TDP... > > Thanks, > Keqian ^ permalink raw reply [flat|nested] 10+ messages in thread
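The lazy scheme discussed above can be modelled as a small state machine (a hand-rolled userspace sketch, not KVM code; state and function names are invented): with init-all-set, enabling dirty logging leaves the huge SPTE writable, the first dirty-log clear write protects it, and only a guest write after that actually splits it.

```c
enum spte_state {
	SPTE_2M_WRITABLE,   /* huge page untouched at log-enable time */
	SPTE_2M_WP,         /* write protected on first dirty-log clear */
	SPTE_4K,            /* split on the guest's first write after that */
};

/* With init-all-set, enabling dirty logging touches nothing: all
 * pages are already reported dirty, so the huge SPTE stays writable. */
static enum spte_state enable_dirty_log(enum spte_state s)
{
	return s;
}

/* Clearing the dirty log write protects the huge page lazily; this
 * corresponds to the kvm_arch_mmu_enable_log_dirty_pt_masked() hunk
 * in the patch. */
static enum spte_state clear_dirty_log(enum spte_state s)
{
	return s == SPTE_2M_WRITABLE ? SPTE_2M_WP : s;
}

/* A guest write then faults, and the fault handler splits the huge
 * page on demand (to 4k, or via shattering to a pre-populated table). */
static enum spte_state guest_write(enum spte_state s)
{
	return s == SPTE_2M_WP ? SPTE_4K : s;
}
```

The point of the lazy approach is visible in the transitions: the expensive split work is deferred from log-enable time into the vCPU fault path, and huge pages that are never written after the clear are never split at all.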
* Re: [RFC PATCH] KVM: x86: Support write protect huge pages lazily 2021-04-13 16:43 ` Ben Gardon @ 2021-04-14 8:35 ` Keqian Zhu 0 siblings, 0 replies; 10+ messages in thread From: Keqian Zhu @ 2021-04-14 8:35 UTC (permalink / raw) To: Ben Gardon; +Cc: Sean Christopherson, kvm, LKML, Paolo Bonzini, wanghaibin.wang Hi Ben, On 2021/4/14 0:43, Ben Gardon wrote: > On Tue, Apr 13, 2021 at 2:39 AM Keqian Zhu <zhukeqian1@huawei.com> wrote: >> >> >> >> On 2021/4/13 1:19, Ben Gardon wrote: >>> On Tue, Apr 6, 2021 at 4:42 PM Sean Christopherson <seanjc@google.com> wrote: >>>> >>>> +Ben >>>> >>>> On Tue, Apr 06, 2021, Keqian Zhu wrote: >>>>> Hi Paolo, >>>>> >>>>> I plan to rework this patch and do full test. What do you think about this idea >>>>> (enable dirty logging for huge pages lazily)? >>>> >>>> Ben, don't you also have something similar (or maybe the exact opposite?) in the >>>> hopper? This sounds very familiar, but I can't quite connect the dots that are >>>> floating around my head... >>> >>> Sorry for the late response, I was out of office last week. >> Never mind, Sean has told to me. :) >> >>> >>> Yes, we have two relevant features I'd like to reconcile somehow: >>> 1.) Large page shattering - Instead of clearing a large TDP mapping, >>> flushing the TLBs, then replacing it with an empty TDP page table, go >>> straight from the large mapping to a fully pre-populated table. This >>> is slightly slower because the table needs to be pre-populated, but it >>> saves many vCPU page faults. >>> 2.) Eager page splitting - split all large mappings down to 4k when >>> enabling dirty logging, using large page shattering. This makes >>> enabling dirty logging much slower, but speeds up the first round (and >>> later rounds) of gathering / clearing the dirty log and reduces the >>> number of vCPU page faults. 
We've prefered to do this when enabling >>> dirty logging because it's a little less perf-sensitive than the later >>> passes where latency and convergence are critical. >> OK, I see. I think the lock stuff is an important part, so one question is that >> the shattering process is designed to be locked (i.e., protect mapping) or lock-less? >> >> If it's locked, vCPU thread may be blocked for a long time (For arm, there is a >> mmu_lock per VM). If it's lock-less, how can we ensure the synchronization of >> mapping? > > The TDP MMU for x86 could do it under the MMU read lock, but the > legacy / shadow x86 MMU and other architectures would need the whole > MMU lock. > While we do increase the time required to address a large SPTE, we can > completely avoid the vCPU needing the MMU lock on an access to that > SPTE as the translation goes straight from a large, writable SPTE, to > a 4k spte with either the d bit cleared or write protected. If it's > write protected, the fault can (at least on x86) be resolved without > the MMU lock. That's sounds good! In terms of lock, x86 is better than arm64. For arm64, we must hold whole MMU lock both for split large page or change permission for 4K page. > > When I'm able to put together a large page shattering series, I'll do > some performance analysis and see how it changes things, but that part OK. > is sort of orthogonal to this change. The more I think about it, the > better the init-all-set approach for large pages sounds, compared to > eager splitting. I'm definitely in support of this patch and am happy > to help review when you send out the v2 with TDP MMU support and such. Thanks a lot. :) > >> >>> >>> Large page shattering can happen in the NPT page fault handler or the >>> thread enabling dirty logging / clearing the dirty log, so it's >>> more-or-less orthogonal to this patch. 
>>> >>> Eager page splitting on the other hand takes the opposite approach to >>> this patch, frontloading as much of the work to enable dirty logging >>> as possible. Which approach is better is going to depend a lot on the >>> guest workload, your live migration constraints, and how the >>> user-space hypervisor makes use of KVM's growing number of dirty >>> logging options. In our case, the time to migrate a VM is usually less >>> of a concern than the performance degradation the guest experiences, >>> so we want to do everything we can to minimize vCPU exits and exit >>> latency. >> Yes, make sense to me. >> >>> >>> I think this is a reasonable change in principle if we're not write >>> protecting 4k pages already, but it's hard to really validate all the >>> performance implications. With this change we'd move pretty much all >>> the work to the first pass of clearing the dirty log, which is >>> probably an improvement since it's much more granular. The downside is >> Yes, at least split large page lazily is better than current logic. >> >>> that we do more work when we'd really like to be converging the dirty >>> set as opposed to earlier when we know all pages are dirty anyway. >> I think the dirty collecting procedure is not affected, do I miss something? > > Oh yeah, good point. Since the splitting of large SPTEs is happening > in the vCPU threads it wouldn't slow dirty log collection at all. We > would have to do slightly more work to write protect the large SPTEs > that weren't written to, but that's a relatively small amount of work. Indeed. BRs, Keqian ^ permalink raw reply [flat|nested] 10+ messages in thread
Thread overview: 10+ messages (end of thread; newest: 2021-04-14 8:35 UTC)
2020-08-28  8:11 [RFC PATCH] KVM: x86: Support write protect huge pages lazily Keqian Zhu
2021-04-06 11:49 ` Keqian Zhu
2021-04-06 23:42 ` Sean Christopherson
2021-04-08  7:35 ` Keqian Zhu
2021-04-08 15:52 ` Sean Christopherson
2021-04-09  1:13 ` Keqian Zhu
2021-04-12 17:19 ` Ben Gardon
2021-04-13  9:39 ` Keqian Zhu
2021-04-13 16:43 ` Ben Gardon
2021-04-14  8:35 ` Keqian Zhu