From: "Zhoujian (jay)" <jianjay.zhou@huawei.com>
To: Peter Xu <peterx@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"wangxin (U)" <wangxinxin.wang@huawei.com>,
"Huangweidong (C)" <weidong.huang@huawei.com>,
"sean.j.christopherson@intel.com"
<sean.j.christopherson@intel.com>,
"Liujinsong (Paul)" <liu.jinsong@huawei.com>
Subject: RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
Date: Fri, 21 Feb 2020 09:31:52 +0000 [thread overview]
Message-ID: <B2D15215269B544CADD246097EACE7474BB064B7@DGGEMM528-MBX.china.huawei.com> (raw)
In-Reply-To: <20200220191706.GF2905@xz-x1>
> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 3:17 AM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
>
> On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > It could take kvm->mmu_lock for an extended period of time when
> > enabling dirty log for the first time. The main cost is to clear all
> > the D-bits of last level SPTEs. This situation can benefit from manual
> > dirty log protect as well, which can reduce the mmu_lock time taken.
> > The sequence is like this:
> >
> > 1. Initialize all the bits of the dirty bitmap to 1 when enabling
> > dirty log for the first time
> > 2. Only write protect the huge pages
> > 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
> > 4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
> >    SPTEs gradually in small chunks
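For illustration, the sequence above can be modeled in a few lines of userspace C. The bitmap helpers below are simplified stand-ins for the kernel's, not the real KVM code:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NPAGES        64UL
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS  ((NPAGES + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in for a memslot's dirty bitmap. */
static unsigned long dirty_bitmap[BITMAP_LONGS];

/* Step 1: on first enable, mark every page dirty up front instead of
 * clearing all leaf SPTE D-bits under mmu_lock. */
static void enable_dirty_log_initially_set(void)
{
	memset(dirty_bitmap, 0xff, sizeof(dirty_bitmap));
}

/* Step 3: KVM_GET_DIRTY_LOG just reports the bitmap. */
static int page_is_dirty(unsigned long nr)
{
	return (dirty_bitmap[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

/* Step 4: KVM_CLEAR_DIRTY_LOG clears one small chunk at a time; the
 * real code also clears the D-bits of the leaf SPTEs in that range,
 * so the mmu_lock hold time is bounded by the chunk size. */
static void clear_dirty_log_chunk(unsigned long first, unsigned long n)
{
	unsigned long i;

	for (i = first; i < first + n && i < NPAGES; i++)
		dirty_bitmap[i / BITS_PER_LONG] &=
			~(1UL << (i % BITS_PER_LONG));
}
```

Clearing in chunks is what keeps the per-call mmu_lock hold time small compared with write-protecting the whole slot at enable time.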
> >
> > Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I did
> > some tests with a 128G windows VM and counted the time taken of
> > memory_global_dirty_log_start, here is the numbers:
> >
> > VM Size    Before    After optimization
> > 128G       460ms     10ms
> >
> > Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
> > ---
> > v2:
> > * add new bit to KVM_ENABLE_CAP for KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 [Paolo]
> > * support non-PML path [Peter]
> > * delete the unnecessary ifdef and make the initialization of bitmap
> > more clear [Sean]
> > * document the new bits and tweak the testcase
> >
> > Documentation/virt/kvm/api.rst               | 23 +++++++++++++++++------
> > arch/x86/kvm/mmu/mmu.c                       |  8 ++++++--
> > arch/x86/kvm/vmx/vmx.c                       |  3 ++-
> > include/linux/kvm_host.h                     |  7 ++++++-
> > tools/testing/selftests/kvm/dirty_log_test.c |  3 ++-
> > virt/kvm/kvm_main.c                          | 25 ++++++++++++++++++-------
> > 6 files changed, 51 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 97a72a5..1afd310 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1267,9 +1267,11 @@ pages in the host.
> > The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> > KVM_MEM_READONLY. The former can be set to instruct KVM to keep track
> > of writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to
> > -use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> > -to make a new slot read-only. In this case, writes to this memory will be
> > -posted to userspace as KVM_EXIT_MMIO exits.
> > +use it. It will be different if the KVM_DIRTY_LOG_INITIALLY_SET flag of
> > +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is set. For more information, see
> > +the description of the capability. The latter can be set, if
> > +KVM_CAP_READONLY_MEM capability allows it, to make a new slot
> > +read-only. In this case, writes to this memory will be posted to userspace
> > +as KVM_EXIT_MMIO exits.
>
> Not sure about others, but my own preference is that we keep this part
> untouched... The changed document could be even more confusing to me...
I just wanted to point out that the code takes a different path when the
KVM_DIRTY_LOG_INITIALLY_SET flag is set, but I admit that the "difference"
is not described clearly here. Since it is described at [1] below, keeping
this part untouched may be the better choice.
> >
> > When the KVM_CAP_SYNC_MMU capability is available, changes in the
> > backing of the memory region are automatically reflected into the
> > guest. For example, an
> > @@ -5704,10 +5706,19 @@ and injected exceptions.
> > :Architectures: x86, arm, arm64, mips
> > :Parameters: args[0] whether feature should be enabled or not
> >
> > -With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
> > +Valid flags are::
> > +
> > +  #define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
> > +  #define KVM_DIRTY_LOG_INITIALLY_SET  (1 << 1)
> > +
> > +With KVM_DIRTY_LOG_MANUAL_PROTECT set, KVM_GET_DIRTY_LOG will not automatically
> > clear and write-protect all pages that are returned as dirty.
> > Rather, userspace will have to do this operation separately using
> > -KVM_CLEAR_DIRTY_LOG.
> > +KVM_CLEAR_DIRTY_LOG. With KVM_DIRTY_LOG_INITIALLY_SET set, all the
> > +bits of the dirty bitmap will be initialized to 1 when created, dirty
> > +logging will be enabled gradually in small chunks using
> > +KVM_CLEAR_DIRTY_LOG ioctl. However, the KVM_DIRTY_LOG_INITIALLY_SET
> > +depends on KVM_DIRTY_LOG_MANUAL_PROTECT, it can not be set
> > +individually and supports x86 only for now.
[1]
>
> Need s/KVM_DIRTY_LOG_MANUAL_PROTECT/KVM_DIRTY_LOG_MANUAL_PROTECT2/?
Since the capability name is KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, this renaming
seems reasonable. If no one objects, I'll take it in the next version.
>
> >
> > At the cost of a slightly more complicated operation, this provides
> > better scalability and responsiveness for two reasons. First,
> > @@ -5716,7 +5727,7 @@ than requiring to sync a full memslot; this ensures
> > that KVM does not take spinlocks for an extended period of time.
> > Second, in some cases a large amount of time can pass between a call
> > to KVM_GET_DIRTY_LOG and userspace actually using the data in the
> > page. Pages can be modified
> > -during this time, which is inefficint for both the guest and userspace:
> > +during this time, which is inefficient for both the guest and userspace:
> > the guest will incur a higher penalty due to write protection faults,
> > while userspace can see false reports of dirty pages. Manual
> > reprotection helps reducing this time, improving guest performance
> > and reducing the
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 87e9ba2..f9c120e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> > bool flush;
> >
> > spin_lock(&kvm->mmu_lock);
> > - flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > - false);
> > + if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > + flush = slot_handle_large_level(kvm, memslot,
> > + slot_rmap_write_protect, false);
> > + else
> > + flush = slot_handle_all_level(kvm, memslot,
> > + slot_rmap_write_protect, false);
> > spin_unlock(&kvm->mmu_lock);
> >
> > /*
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 3be25ec..fcc585a 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7201,7 +7201,8 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
> > static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> > 			struct kvm_memory_slot *slot)
> > {
> > - kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> > + if (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))
> > +	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> > kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
> > }
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index e89eb67..a555b52 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -39,6 +39,11 @@
> > #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
> > #endif
> >
> > +#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
> > +#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
> > +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT | \
> > +				    KVM_DIRTY_LOG_INITIALLY_SET)
> > +
> > /*
> > * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
> > * in kvm, other bits are visible for userspace which are defined in
> > @@ -493,7 +498,7 @@ struct kvm {
> > #endif
> > long tlbs_dirty;
> > struct list_head devices;
> > - bool manual_dirty_log_protect;
> > + u64 manual_dirty_log_protect;
> > struct dentry *debugfs_dentry;
> > struct kvm_stat_data **debugfs_stat_data;
> > struct srcu_struct srcu;
> > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> > index 5614222..2a493c1 100644
> > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
> > host_bmap_track = bitmap_alloc(host_num_pages);
> >
> > #ifdef USE_CLEAR_DIRTY_LOG
> > + int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > struct kvm_enable_cap cap = {};
> >
> > cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> > - cap.args[0] = 1;
> > + cap.args[0] = ret;
>
> You enabled the initial-all-set but didn't really check it, so it didn't help much
> from the testcase pov...
vm_enable_cap is called afterwards, and its return value is checked inside
it. May I ask whether that check is enough, or whether the value needs to be
read back through something like vm_get_cap?
> I'd suggest you drop this change, and you can work
> on top after this patch can be accepted.
OK. I think the input values for cap.args[0] should be tested: 0, 1 and 3
should be accepted, while the other values should be rejected.
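As a standalone sketch of what that check would accept and reject (the flag values are copied from the patch; the helper name is made up for illustration, not kernel code):

```c
#include <stdint.h>

#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
#define KVM_DIRTY_LOG_INITIALLY_SET  (1 << 1)
#define KVM_DIRTY_LOG_MANUAL_CAPS    (KVM_DIRTY_LOG_MANUAL_PROTECT | \
				      KVM_DIRTY_LOG_INITIALLY_SET)

/* Hypothetical userspace version of the cap->args[0] validation:
 * 0, 1 and 3 pass, while 2 (INITIALLY_SET without MANUAL_PROTECT)
 * and any unknown bits are rejected.
 * Returns 0 on success, -1 on rejection. */
static int check_manual_dirty_log_arg(uint64_t arg0)
{
	if (arg0 & ~(uint64_t)KVM_DIRTY_LOG_MANUAL_CAPS)
		return -1;
	if (arg0 == KVM_DIRTY_LOG_INITIALLY_SET)
		return -1;
	return 0;
}
```

A testcase could loop over a handful of values and assert exactly this accept/reject pattern against the real ioctl's return code.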
>
> (Not to mention the original test actually verified that we don't break, which
> seems good..)
>
> > vm_enable_cap(vm, &cap);
> > #endif
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 70f03ce..f2631d0 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
> > * Allocation size is twice as large as the actual dirty bitmap size.
> > * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
> > */
> > -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
>
> This change seems irrelevant..
>
> > {
> > unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
> >
> > @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
> >
> > /* Allocate page dirty bitmap if needed */
> > if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> > - if (kvm_create_dirty_bitmap(&new) < 0)
> > + if (kvm_alloc_dirty_bitmap(&new))
>
> Same here.
Renaming kvm_create_dirty_bitmap to kvm_alloc_dirty_bitmap was Sean's
suggestion, to make it clear that the helper is only responsible for
allocation; all the bitmap bits are then set to 1 afterwards using
bitmap_set. That seems reasonable to me. Do you still think it's better to
keep the name untouched?
>
> > goto out_free;
> > +
> > + if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
>
> (Maybe time to introduce a helper to shorten this check. :)
Yeah, but could this be done on top of this patch?
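For what it's worth, such a helper could look something like this as a userspace sketch (the name and the plain-integer parameter are only illustrative; the kernel version would take a struct kvm * and whatever name gets merged may differ):

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
#define KVM_DIRTY_LOG_INITIALLY_SET  (1 << 1)

/* In the kernel this would read kvm->manual_dirty_log_protect;
 * here it takes the field's value directly for illustration. */
static bool kvm_dirty_log_initially_set(uint64_t manual_dirty_log_protect)
{
	return manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET;
}
```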
>
> > + bitmap_set(new.dirty_bitmap, 0, new.npages);
> > }
> >
> > slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> > @@ -1255,7 +1258,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
> >
> > n = kvm_dirty_bitmap_bytes(memslot);
> > *flush = false;
> > - if (kvm->manual_dirty_log_protect) {
> > + if (kvm->manual_dirty_log_protect &
> KVM_DIRTY_LOG_MANUAL_PROTECT) {
>
> Can also introduce a helper for this too.
>
> Side note: logically this line doesn't need a change, because bit 1 depends on bit
> 0 so if (kvm->manual_dirty_log_protect) it means bit 0 must be set.
Yes, I agree.
>
> > /*
> > * Unlike kvm_get_dirty_log, we always return false in *flush,
> > * because no flush is needed until KVM_CLEAR_DIRTY_LOG. There
> > @@ -3310,9 +3313,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> > case KVM_CAP_IOEVENTFD_ANY_LENGTH:
> > case KVM_CAP_CHECK_EXTENSION_VM:
> > case KVM_CAP_ENABLE_CAP_VM:
> > -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > - case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -#endif
> > return 1;
> > #ifdef CONFIG_KVM_MMIO
> > case KVM_CAP_COALESCED_MMIO:
> > @@ -3320,6 +3320,10 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> > case KVM_CAP_COALESCED_PIO:
> > return 1;
> > #endif
> > +#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > + case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > + return KVM_DIRTY_LOG_MANUAL_CAPS;
> > +#endif
> > #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> > case KVM_CAP_IRQ_ROUTING:
> > return KVM_MAX_IRQ_ROUTES;
> > @@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > switch (cap->cap) {
> > #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > - if (cap->flags || (cap->args[0] & ~1))
> > + if (cap->flags ||
> > + (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > + /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> > + * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> > + * set individually
> > + */
> > + ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > + KVM_DIRTY_LOG_INITIALLY_SET))
>
> How about something easier to read? :)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70f03ce0e5c1..9dfbab2a9929 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3348,7 +3348,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> switch (cap->cap) {
> #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> - if (cap->flags || (cap->args[0] & ~1))
> + if (cap->flags || (cap->args[0] & ~3))
> + return -EINVAL;
> + /* Allow 00, 01, and 11. */
> + if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
> return -EINVAL;
> kvm->manual_dirty_log_protect = cap->args[0];
> return 0;
Looks simpler than mine.
> Otherwise it looks good to me!
Thanks!
Regards,
Jay Zhou
>
> Thanks,
>
> --
> Peter Xu
2020-02-20 4:28 [PATCH v2] KVM: x86: enable dirty log gradually in small chunks Jay Zhou
2020-02-20 19:17 ` Peter Xu
2020-02-20 19:23 ` Sean Christopherson
2020-02-20 19:42 ` Peter Xu
2020-02-21 9:43 ` Zhoujian (jay)
2020-02-21 9:31 ` Zhoujian (jay) [this message]
2020-02-21 15:19 ` Peter Xu
2020-02-22 8:11 ` Zhoujian (jay)
2020-02-20 19:28 ` Peter Xu
2020-02-21 9:53 ` Zhoujian (jay)
2020-02-21 15:41 ` Peter Xu
2020-02-22 8:18 ` Zhoujian (jay)