From: Keqian Zhu <zhukeqian1@huawei.com>
To: Peter Xu <peterx@redhat.com>, <qemu-devel@nongnu.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Hyman <huangy81@chinatelecom.cn>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
Date: Mon, 22 Mar 2021 22:02:38 +0800	[thread overview]
Message-ID: <2e057323-8102-7bfc-051b-cd3950c93875@huawei.com> (raw)
In-Reply-To: <20210310203301.194842-1-peterx@redhat.com>

Hi Peter,

On 2021/3/11 4:32, Peter Xu wrote:
> This is v5 of the qemu dirty ring interface support.
>
> v5:
> - rebase
> - dropped patch "update-linux-headers: Include const.h" after rebase
> - dropped patch "KVM: Fixup kvm_log_clear_one_slot() ioctl return check",
>   since a similar patch got merged recently (38e0b7904eca7cd32f8953c3)
>
> ========= v4 cover letter below =============
>
> It is mostly the same as v3 content-wise, but there are a few things to
> mention besides the rebase itself:
>
>   - I picked up two patches from Eric Farman for the linux-header updates
>     (from Eric's v3 series) for convenience, just in case either series gets
>     queued by a maintainer.
>
>   - One more patch is added: "KVM: Disable manual dirty log when dirty ring
>     enabled".  I found this when testing the branch after rebasing to the
>     latest qemu: not only is the manual dirty log capability not needed for
>     the kvm dirty ring, but more importantly INITIALLY_ALL_SET is totally
>     against the kvm dirty ring and could silently crash the guest after
>     migration.  For this new commit, I touched up "KVM: Add dirty-gfn-count
>     property" a bit.
>
>   - A few more documentation lines in qemu-options.hx.
>
>   - I removed the RFC tag after the kernel series got merged.
>
> Again, this is only the first step to support the dirty ring.  Ideally the
> dirty ring should grant QEMU the possibility to remove the whole layered
> dirty bitmap, so that the dirty ring will work similarly to auto-converge,
> but better: we just throttle vcpus with the dirty-ring-full kvm exit rather
> than explicitly adding a timer to stop the vcpu thread from entering the
> guest again (like what we do with the current migration auto-converge).
> Some more information can also be found in the kvm forum 2020 talk regarding
> the kvm dirty ring (slides 21/22 [1]).
I have read the pdf and the code, and I have some questions; I hope you can
help me. :)

You emphasize that the dirty ring uses "thread-local buffers" while the dirty
bitmap is global, but I don't see any locking-related optimization in it
compared to the dirty bitmap.

Thread-local means a vCPU can flush its hardware buffer into the dirty ring
without locking, but with the bitmap a vCPU can also mark pages dirty with an
atomic set, again without locking.  Maybe I am missing something?
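
To make the comparison concrete, here is a minimal sketch of the two marking
paths as I understand them (hypothetical helper names, not the actual KVM
code); both are indeed lock-free from the vCPU's point of view:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Bitmap path: one shared bitmap; vCPUs set bits with an atomic RMW.
     * No lock, but every writer touches cache lines shared by all vCPUs. */
    static void mark_dirty_bitmap(_Atomic unsigned long *bitmap, uint64_t gfn)
    {
        const unsigned bits = 8 * sizeof(unsigned long);
        atomic_fetch_or(&bitmap[gfn / bits], 1UL << (gfn % bits));
    }

    /* Ring path: per-vCPU ring; only the owning vCPU advances the producer
     * index, so the fast path touches no cache line shared with other vCPUs. */
    struct dirty_ring {
        uint64_t *gfns;   /* fixed-size array of dirtied GFNs */
        uint32_t size;    /* number of entries, a power of two */
        uint32_t prod;    /* written only by the owning vCPU */
    };

    static void mark_dirty_ring(struct dirty_ring *ring, uint64_t gfn)
    {
        ring->gfns[ring->prod & (ring->size - 1)] = gfn;
        ring->prod++;     /* the reaper reads this with an acquire load */
    }

So is the intended win cacheline contention and the bounded-ring semantics (a
full ring forces a vcpu exit, which enables throttling), rather than locking
itself?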

The second question is about the longer migration time you observed (55s ->
73s) when the guest has 24G RAM and the dirty rate is 800M/s.  I am not clear
about the reason.  With the dirty ring enabled, QEMU can get the dirty info
sooner, which means it handles dirty pages more quickly, and the guest can be
throttled, which means dirty pages are generated more slowly.  What is the
rationale for the longer migration time?

PS: Since the dirty ring is still converted into the dirty_bitmap of the
kvm_slot, the "get dirty info faster" part may not actually hold. :-(

Thanks,
Keqian

>
> That next step (to remove all the dirty bitmaps, as mentioned above) is
> still up for discussion: firstly, I don't know whether there's anything I've
> overlooked there.  Meanwhile, it only serves huge-VM cases, and may not be
> very helpful for the many scenarios where VMs are not that huge.
>
> There are probably other ways to fix huge-VM migration issues, mainly
> focusing on responsiveness and convergence.  For example, Google has
> proposed a new userfaultfd kernel capability called "minor modes" [2] to
> track page minor faults, which could ultimately serve that purpose too,
> using postcopy.  That's another long story, so I'll stop here; this is just
> a marker alongside the dirty ring series so there will still be a record to
> reference.
>
> That said, I still think this series is well worth merging even if we don't
> pursue the next steps yet, since the dirty ring is disabled by default, and
> we can always build upon this series.
>
> Please review, thanks.
>
> V3: https://lore.kernel.org/qemu-devel/20200523232035.1029349-1-peterx@redhat.com/
>     (V3 contains all the pre-v3 changelog)
>
> QEMU branch for testing (requires kernel version 5.11-rc1+):
>     https://github.com/xzpeter/qemu/tree/kvm-dirty-ring
>
> [1] https://static.sched.com/hosted_files/kvmforum2020/97/kvm_dirty_ring_peter.pdf
> [2] https://lore.kernel.org/lkml/20210107190453.3051110-1-axelrasmussen@google.com/
>
> ---------------------------8<---------------------------------
>
> Overview
> ========
>
> The KVM dirty ring is a new interface for passing dirty bits from the kernel
> to userspace.  Instead of using a bitmap for each memory region, the dirty
> ring contains an array of dirtied GPAs to fetch, one ring per vcpu.
>
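For concreteness, each ring entry in the 5.11 uAPI is roughly the following,
simplified from include/uapi/linux/kvm.h -- a memslot id plus a page offset
within that slot, rather than a raw GPA:

    struct kvm_dirty_gfn {
            __u32 flags;    /* KVM_DIRTY_GFN_F_DIRTY / KVM_DIRTY_GFN_F_RESET */
            __u32 slot;     /* address space id and memslot id */
            __u64 offset;   /* page offset within that memslot */
    };

Userspace harvests the entries whose DIRTY flag is set, marks them RESET, and
later calls the KVM_RESET_DIRTY_RINGS ioctl so KVM can reuse those ring slots.
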
> There are a few major changes compared to how the old dirty logging
> interface works:
>
> - Granularity of dirty bits
>
>   The KVM dirty ring interface does not offer memory-region-level
>   granularity for collecting dirty bits (i.e., per KVM memory slot).
>   Instead, dirty bits are collected globally for all the vcpus at once.  The
>   major effect is on the VGA part, because VGA dirty tracking is enabled as
>   long as the device exists, and it used to work at memory-region
>   granularity; now that operation is amplified into a whole-VM sync.  Maybe
>   there's a smarter way to do the same thing in VGA with the new interface,
>   but so far I don't see it having much effect, at least on regular VMs.
>
> - Collection of dirty bits
>
>   The old dirty logging interface collects KVM dirty bits at
>   synchronization time.  The KVM dirty ring interface instead uses a
>   standalone thread for that, so when another thread (e.g., the migration
>   thread) wants to synchronize the dirty bits, it simply kicks that thread
>   and waits until it has flushed all the dirty bits to the ramblock dirty
>   bitmap.
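
A minimal sketch of that kick-and-wait handshake, using plain pthreads and
hypothetical names -- the series itself builds on QEMU's own threading
primitives:

    #include <pthread.h>
    #include <stdint.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static uint64_t reap_requests, reap_done;

    static void reap_all_vcpu_rings(void);  /* hypothetical: drain every ring
                                             * into the ramblock dirty bitmap */

    /* Migration thread: request one full reap pass and wait for it. */
    static void dirty_ring_flush(void)
    {
        pthread_mutex_lock(&lock);
        uint64_t target = ++reap_requests;      /* kick... */
        pthread_cond_broadcast(&cond);
        while (reap_done < target)              /* ...and wait */
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Reaper thread: sleep until kicked, drain the rings, publish done. */
    static void *reaper_thread(void *opaque)
    {
        pthread_mutex_lock(&lock);
        for (;;) {
            while (reap_done == reap_requests)
                pthread_cond_wait(&cond, &lock);
            pthread_mutex_unlock(&lock);
            reap_all_vcpu_rings();
            pthread_mutex_lock(&lock);
            reap_done++;
            pthread_cond_broadcast(&cond);
        }
    }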
>
> A new parameter, "dirty-ring-size", is added to "-accel kvm".  By default,
> the dirty ring is still disabled (size==0).  To enable it, use:
>
>   -accel kvm,dirty-ring-size=65536
>
> This establishes a 64K dirty ring buffer per vcpu.  Then, if we migrate, it
> will switch to the dirty ring.
>
> I gave it a shot with a 24G guest, 8 vcpus, using a 10g NIC as the migration
> channel.  When idle, or when the dirty workload is small, I don't observe a
> major difference in total migration time.  With a higher random dirty
> workload (800MB/s dirty rate over 20G of memory, which is worse for the kvm
> dirty ring), total migration time is (ping-pong migrating 6 times, in
> seconds):
>
> |-------------------------+---------------|
> | dirty ring (4k entries) | dirty logging |
> |-------------------------+---------------|
> |                      70 |            58 |
> |                      78 |            70 |
> |                      72 |            48 |
> |                      74 |            52 |
> |                      83 |            49 |
> |                      65 |            54 |
> |-------------------------+---------------|
>
> Summary:
>
> dirty ring average:    73s
> dirty logging average: 55s
>
> The KVM dirty ring is slower in the above case.  The numbers suggest that
> dirty logging is still preferred as the default, because small/medium VMs
> remain the major use case, and high dirty workloads happen frequently too.
> And that's what this series does.
>
> TODO:
>
> - Consider dropping the BQL dependency: then we can run the reaper thread in
>   parallel with the main thread.  Needs some thought around the race
>   conditions.
>
> - Consider dropping the kvmslot bitmap: logically this can be dropped with
>   the kvm dirty ring, not only to save space, but also because it is yet
>   another layer linear in guest memory size, which goes against the whole
>   idea of the kvm dirty ring.  This should make the above (kvm dirty ring)
>   numbers even smaller (though still perhaps not as good as dirty logging
>   under such a high workload).
>
> Please refer to the code and the comments themselves for more information.
>
> Thanks,
>
> Peter Xu (10):
>   memory: Introduce log_sync_global() to memory listener
>   KVM: Use a big lock to replace per-kml slots_lock
>   KVM: Create the KVMSlot dirty bitmap on flag changes
>   KVM: Provide helper to get kvm dirty log
>   KVM: Provide helper to sync dirty bitmap from slot to ramblock
>   KVM: Simplify dirty log sync in kvm_set_phys_mem
>   KVM: Cache kvm slot dirty bitmap size
>   KVM: Add dirty-gfn-count property
>   KVM: Disable manual dirty log when dirty ring enabled
>   KVM: Dirty ring support
>
>  accel/kvm/kvm-all.c      | 585 +++++++++++++++++++++++++++++++++------
>  accel/kvm/trace-events   |   7 +
>  include/exec/memory.h    |  12 +
>  include/hw/core/cpu.h    |   8 +
>  include/sysemu/kvm_int.h |   7 +-
>  qemu-options.hx          |  12 +
>  softmmu/memory.c         |  33 ++-
>  7 files changed, 565 insertions(+), 99 deletions(-)

Thread overview: 29+ messages
2021-03-10 20:32 [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
2021-03-10 20:32 ` [PATCH v5 01/10] memory: Introduce log_sync_global() to memory listener Peter Xu
2021-03-10 20:32 ` [PATCH v5 02/10] KVM: Use a big lock to replace per-kml slots_lock Peter Xu
2021-03-22 10:47   ` Keqian Zhu
2021-03-22 13:54     ` Paolo Bonzini
2021-03-22 16:27       ` Peter Xu
2021-03-24 18:08         ` Peter Xu
2021-03-10 20:32 ` [PATCH v5 03/10] KVM: Create the KVMSlot dirty bitmap on flag changes Peter Xu
2021-03-10 20:32 ` [PATCH v5 04/10] KVM: Provide helper to get kvm dirty log Peter Xu
2021-03-10 20:32 ` [PATCH v5 05/10] KVM: Provide helper to sync dirty bitmap from slot to ramblock Peter Xu
2021-03-10 20:32 ` [PATCH v5 06/10] KVM: Simplify dirty log sync in kvm_set_phys_mem Peter Xu
2021-03-10 20:32 ` [PATCH v5 07/10] KVM: Cache kvm slot dirty bitmap size Peter Xu
2021-03-10 20:32 ` [PATCH v5 08/10] KVM: Add dirty-gfn-count property Peter Xu
2021-03-10 20:33 ` [PATCH v5 09/10] KVM: Disable manual dirty log when dirty ring enabled Peter Xu
2021-03-22  9:17   ` Keqian Zhu
2021-03-22 13:55     ` Paolo Bonzini
2021-03-22 16:21       ` Peter Xu
2021-03-10 20:33 ` [PATCH v5 10/10] KVM: Dirty ring support Peter Xu
2021-03-22 13:37   ` Keqian Zhu
2021-03-22 18:52     ` Peter Xu
2021-03-23  1:25       ` Keqian Zhu
2021-03-19 18:12 ` [PATCH v5 00/10] KVM: Dirty ring support (QEMU part) Peter Xu
2021-03-22 14:02 ` Keqian Zhu [this message]
2021-03-22 19:45   ` Peter Xu
2021-03-23  6:40     ` Keqian Zhu
2021-03-23 14:34       ` Peter Xu
2021-03-24  2:56         ` Keqian Zhu
2021-03-24 15:09           ` Peter Xu
2021-03-25  1:21             ` Keqian Zhu
