From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Xu <peterx@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Andrew Jones <drjones@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>
Subject: Re: [PATCH v13 00/14] KVM: Dirty ring interface
Date: Fri, 6 Nov 2020 12:28:17 +0100	[thread overview]
Message-ID: <8b3f68dd-c61c-16a0-2077-0a5d3d57a357@redhat.com> (raw)
In-Reply-To: <20201001012044.5151-1-peterx@redhat.com>

On 01/10/20 03:20, Peter Xu wrote:
> KVM branch:
>    https://github.com/xzpeter/linux/tree/kvm-dirty-ring
> 
> QEMU branch for testing:
>    https://github.com/xzpeter/qemu/tree/kvm-dirty-ring
> 
> v13:
> - rebased to kvm/queue rather than 5.9-rc7.  I think kvm/queue is broken: I
>    can only test the dirty ring after reverting 3eb900173c71 ("KVM: x86: VMX:
>    Prevent MSR passthrough when MSR access is denied", 2020-09-28); otherwise
>    the guest hangs on vcpu0, looping forever while booting Linux.
> - added another trivial patch "KVM: Documentation: Update entry for
>    KVM_X86_SET_MSR_FILTER".  It should be squashed into 1a155254ff93 ("KVM: x86:
>    Introduce MSR filtering", 2020-09-28) directly.
> 
> v12:
> - add r-bs for Sean
> - rebase
> 
> v11:
> - rebased to kvm/queue (which seems to be the newest)
> - removed the kvm_dirty_ring_waitqueue() tracepoint since it was unused
> - set memslot->as_id in kvm_delete_memslot() [Sean]
> - let __copy_to_user() always return -EFAULT [Sean]
> - rename 'r' in alloc_apic_access_page into 'hva' [Sean]
> 
> v10:
> - remove unused identity_map_pfn in init_rmode_identity_map [syzbot]
> - add "static" to kvm_dirty_ring_full [syzbot]
> - made kvm_page_in_dirty_ring() use "#if" macros for KVM_DIRTY_LOG_PAGE_OFFSET
>    to quiesce syzbot [syzbot]
> - s/false/null/ in gfn_to_memslot_dirty_bitmap() [syzbot]
> 
> v9:
> - patch 3: __x86_set_memory_region: squashed in another trivial change to
>    always return (0xdeadull << 48) for slot removal [Sean]
> - pick r-bs for Drew
> 
> For previous versions, please refer to:
> 
> V1: https://lore.kernel.org/kvm/20191129213505.18472-1-peterx@redhat.com
> V2: https://lore.kernel.org/kvm/20191221014938.58831-1-peterx@redhat.com
> V3: https://lore.kernel.org/kvm/20200109145729.32898-1-peterx@redhat.com
> V4: https://lore.kernel.org/kvm/20200205025105.367213-1-peterx@redhat.com
> V5: https://lore.kernel.org/kvm/20200304174947.69595-1-peterx@redhat.com
> V6: https://lore.kernel.org/kvm/20200309214424.330363-1-peterx@redhat.com
> V7: https://lore.kernel.org/kvm/20200318163720.93929-1-peterx@redhat.com
> V8: https://lore.kernel.org/kvm/20200331190000.659614-1-peterx@redhat.com
> V9: https://lore.kernel.org/kvm/20200523225659.1027044-1-peterx@redhat.com
> V10: https://lore.kernel.org/kvm/20200601115957.1581250-1-peterx@redhat.com/
> 
> Overview
> ============
> 
> This series continues the work of Lei Cao <lei.cao@stratus.com> and
> Paolo Bonzini on the KVM dirty ring interface.
> 
> The dirty ring is an alternative way to collect dirty pages for
> virtual machines.  It differs from the existing dirty logging
> interface in a few major ways:
> 
>    - Data format: The dirty data is kept in a ring rather than a
>      bitmap, so the amount of data to sync for dirty logging no
>      longer depends on the size of guest memory, only on the rate of
>      dirtying.  Also, the dirty ring is per-vcpu, while the dirty
>      bitmap is per-vm.
> 
>    - Data copy: Syncing dirty pages no longer requires copying data;
>      instead the ring is shared between userspace and the kernel
>      through shared pages (mmap() on the vcpu fd).
> 
>    - Interface: Instead of the old KVM_GET_DIRTY_LOG and
>      KVM_CLEAR_DIRTY_LOG ioctls, the ring uses the new
>      KVM_RESET_DIRTY_RINGS ioctl to re-protect the collected dirty
>      pages (it works like KVM_CLEAR_DIRTY_LOG, but ring based).  To
>      collect dirty bits, userspace only needs to read the ring data;
>      no ioctl is needed.  A rough usage sketch follows this list.
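> 
> As a rough usage sketch (untested; error handling elided; the
> constant names follow this series' uapi, while vm_fd, vcpu_fd,
> ring_bytes and page_size are placeholders):
> 
>     #include <linux/kvm.h>
>     #include <sys/ioctl.h>
>     #include <sys/mman.h>
> 
>     /* Before creating any vcpu: enable the ring on the VM fd. */
>     struct kvm_enable_cap cap = {
>             .cap = KVM_CAP_DIRTY_LOG_RING,
>             .args = { ring_bytes },          /* ring size, in bytes */
>     };
>     ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
> 
>     /* After creating a vcpu: mmap() its ring at the fixed page offset. */
>     struct kvm_dirty_gfn *ring =
>             mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
>                  vcpu_fd, KVM_DIRTY_LOG_PAGE_OFFSET * page_size);
> 
>     /* ... run the vcpu, read dirty GFNs out of the ring ... */
> 
>     /* Finally, re-protect all collected pages across every vcpu ring. */
>     ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);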
> 
> Ring Layout
> ===========
> 
> The KVM dirty ring is per-vcpu.  Each ring is an array of struct
> kvm_dirty_gfn, defined as:
> 
> struct kvm_dirty_gfn {
>          __u32 flags;
>          __u32 slot; /* as_id | slot_id */
>          __u64 offset;
> };
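> 
> The slot field uses the same encoding as KVM_SET_USER_MEMORY_REGION:
> the address space id in the high 16 bits, the memslot id in the low
> 16 bits.  A decode sketch (hedged; dgfn points at a ring entry):
> 
>     __u16 as_id   = dgfn->slot >> 16;     /* address space, e.g. SMM */
>     __u16 slot_id = dgfn->slot & 0xffff;  /* memslot id in that space */
>     /* dgfn->offset is the dirty gfn relative to the memslot's base. */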
> 
> Each GFN entry is itself a small state machine.  The state is embedded
> in the flags field, as defined in the uapi header:
> 
> /*
>   * KVM dirty GFN flags, defined as:
>   *
>   * |---------------+---------------+--------------|
>   * | bit 1 (reset) | bit 0 (dirty) | Status       |
>   * |---------------+---------------+--------------|
>   * |             0 |             0 | Invalid GFN  |
>   * |             0 |             1 | Dirty GFN    |
>   * |             1 |             X | GFN to reset |
>   * |---------------+---------------+--------------|
>   *
>   * Lifecycle of a dirty GFN goes like:
>   *
>   *      dirtied         collected        reset
>   * 00 -----------> 01 -------------> 1X -------+
>   *  ^                                          |
>   *  |                                          |
>   *  +------------------------------------------+
>   *
>   * The userspace program is only responsible for the 01->1X state
>   * conversion (to collect dirty bits).  Also, it must not skip any
>   * dirty bits so that dirty bits are always collected in sequence.
>   */
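> 
> To make the lifecycle concrete, a hedged collection sketch (untested;
> the KVM_DIRTY_GFN_F_* names follow this series' uapi, nentries is the
> ring size in entries, fetch is a private index kept by userspace, and
> handle_dirty_page() is a hypothetical helper; a real implementation
> would also need acquire/release ordering on the flags field):
> 
>     while (ring[fetch % nentries].flags & KVM_DIRTY_GFN_F_DIRTY) {
>             struct kvm_dirty_gfn *e = &ring[fetch % nentries];
> 
>             handle_dirty_page(e->slot, e->offset);
>             e->flags |= KVM_DIRTY_GFN_F_RESET;     /* 01 -> 1X */
>             fetch++;
>     }
>     ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);        /* reset: 1X -> 00 */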
> 
> Testing
> =======
> 
> This series provides both the implementation of the KVM dirty ring and
> a test case.  I've also implemented the QEMU counterpart that can run
> with the new KVM; the link can be found at the top of this cover
> letter.  However, that is still a very early version, prone to change
> and future optimization.
> 
> I did some measurements of the new method with a 24G guest running
> dirty workloads.  I don't see any speedup so far; under heavy dirty
> loads it is even slower (e.g., at an 800MB/s random dirty rate, the
> kvm dirty ring takes ~73s on average to complete migration, while
> dirty logging only needs ~55s on average).  That is understandable,
> though: a 24G guest means only a ~1M dirty bitmap, which is still a
> comfortable case for dirty logging, while a heavier workload is the
> worst case for the dirty ring.
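> 
> (Back-of-the-envelope, for reference: 24 GiB / 4 KiB pages = 6M pages,
> i.e. 6M bits, so the bitmap is only ~0.75 MiB, roughly the "1M"
> quoted above.)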
> 
> More tests are welcome on bigger hosts/guests, especially with
> COLO-like workloads.
> 
> Please review, thanks.
> 
> Peter Xu (14):
>    KVM: Documentation: Update entry for KVM_X86_SET_MSR_FILTER
>    KVM: Cache as_id in kvm_memory_slot
>    KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]
>    KVM: Pass in kvm pointer into mark_page_dirty_in_slot()
>    KVM: X86: Implement ring-based dirty memory tracking
>    KVM: Make dirty ring exclusive to dirty bitmap log
>    KVM: Don't allocate dirty bitmap if dirty ring is enabled
>    KVM: selftests: Always clear dirty bitmap after iteration
>    KVM: selftests: Sync uapi/linux/kvm.h to tools/
>    KVM: selftests: Use a single binary for dirty/clear log test
>    KVM: selftests: Introduce after_vcpu_run hook for dirty log test
>    KVM: selftests: Add dirty ring buffer test
>    KVM: selftests: Let dirty_log_test async for dirty ring test
>    KVM: selftests: Add "-c" parameter to dirty log test
> 
>   Documentation/virt/kvm/api.rst                | 126 ++++-
>   arch/x86/include/asm/kvm_host.h               |   6 +-
>   arch/x86/include/uapi/asm/kvm.h               |   1 +
>   arch/x86/kvm/Makefile                         |   3 +-
>   arch/x86/kvm/mmu/mmu.c                        |  10 +-
>   arch/x86/kvm/svm/avic.c                       |   9 +-
>   arch/x86/kvm/vmx/vmx.c                        |  96 ++--
>   arch/x86/kvm/x86.c                            |  46 +-
>   include/linux/kvm_dirty_ring.h                | 103 ++++
>   include/linux/kvm_host.h                      |  19 +
>   include/trace/events/kvm.h                    |  63 +++
>   include/uapi/linux/kvm.h                      |  53 ++
>   tools/include/uapi/linux/kvm.h                |  77 ++-
>   tools/testing/selftests/kvm/Makefile          |   2 -
>   .../selftests/kvm/clear_dirty_log_test.c      |   6 -
>   tools/testing/selftests/kvm/dirty_log_test.c  | 505 ++++++++++++++++--
>   .../testing/selftests/kvm/include/kvm_util.h  |   4 +
>   tools/testing/selftests/kvm/lib/kvm_util.c    |  72 ++-
>   .../selftests/kvm/lib/kvm_util_internal.h     |   4 +
>   virt/kvm/dirty_ring.c                         | 197 +++++++
>   virt/kvm/kvm_main.c                           | 168 +++++-
>   21 files changed, 1432 insertions(+), 138 deletions(-)
>   create mode 100644 include/linux/kvm_dirty_ring.h
>   delete mode 100644 tools/testing/selftests/kvm/clear_dirty_log_test.c
>   create mode 100644 virt/kvm/dirty_ring.c
> 

Queued, thanks!  Now on to QEMU...

Paolo

