kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
From: Ben Gardon <bgardon@google.com>
To: Ricardo Koller <ricarkol@google.com>
Cc: pbonzini@redhat.com, maz@kernel.org, oupton@google.com,
	 yuzenghui@huawei.com, dmatlack@google.com, kvm@vger.kernel.org,
	 kvmarm@lists.linux.dev, qperret@google.com,
	catalin.marinas@arm.com,  andrew.jones@linux.dev,
	seanjc@google.com, alexandru.elisei@arm.com,
	 suzuki.poulose@arm.com, eric.auger@redhat.com, gshan@redhat.com,
	 reijiw@google.com, rananta@google.com, ricarkol@gmail.com
Subject: Re: [PATCH 0/9] KVM: arm64: Eager Huge-page splitting for dirty-logging
Date: Mon, 23 Jan 2023 16:48:02 -0800	[thread overview]
Message-ID: <CANgfPd-zjCcV08qPVpbvqT+aYaaaWsdGuDbiFyYS9HM-KaDKug@mail.gmail.com> (raw)
In-Reply-To: <20230113035000.480021-1-ricarkol@google.com>

On Thu, Jan 12, 2023 at 7:50 PM Ricardo Koller <ricarkol@google.com> wrote:
>
> Implement Eager Page Splitting for ARM.
>
> Eager Page Splitting improves the performance of dirty-logging (used
> in live migrations) when guest memory is backed by huge-pages.  It's
> an optimization used in Google Cloud since 2016 on x86, and for the
> last couple of months on ARM.
>
> Background and motivation
> =========================
> Dirty logging is typically used for live-migration iterative copying.
> KVM implements dirty-logging at the PAGE_SIZE granularity (will refer
> to 4K pages from now on).  It does it by faulting on write-protected
> 4K pages.  Therefore, enabling dirty-logging on a huge-page requires
> breaking it into 4K pages in the first place.  KVM does this breaking
> on fault, and because it's in the critical path it only maps the 4K
> page that faulted; every other 4K page is left unmapped.  This is not
> great for performance on ARM for a couple of reasons:
>
> - Splitting on fault can halt vcpus for milliseconds in some
>   implementations. Splitting a block PTE requires using a broadcasted
>   TLB invalidation (TLBI) for every huge-page (due to the
>   break-before-make requirement). Note that x86 doesn't need this. We
>   observed some implementations that take millliseconds to complete
>   broadcasted TLBIs when done in parallel from multiple vcpus.  And
>   that's exactly what happens when doing it on fault: multiple vcpus
>   fault at the same time triggering TLBIs in parallel.
>
> - Read intensive guest workloads end up paying for dirty-logging.
>   Only mapping the faulting 4K page means that all the other pages
>   that were part of the huge-page will now be unmapped. The effect is
>   that any access, including reads, now has to fault.
>
> Eager Page Splitting (on ARM)
> =============================
> Eager Page Splitting fixes the above two issues by eagerly splitting
> huge-pages when enabling dirty logging. The goal is to avoid doing it
> while faulting on write-protected pages. This is what the TDP MMU does
> for x86 [0], except that x86 does it for different reasons: to avoid
> grabbing the MMU lock on fault. Note that taking care of
> write-protection faults still requires grabbing the MMU lock on ARM,
> but not on x86 (with the fast_page_fault path).
>
> An additional benefit of eagerly splitting huge-pages is that it can
> be done in a controlled way (e.g., via an IOCTL). This series provides
> two knobs for doing it, just like its x86 counterpart: when enabling
> dirty logging, and when using the KVM_CLEAR_DIRTY_LOG ioctl. The
> benefit of doing it on KVM_CLEAR_DIRTY_LOG is that this ioctl takes
> ranges, and not complete memslots like when enabling dirty logging.
> This means that the cost of splitting (mainly broadcasted TLBIs) can
> be throttled: split a range, wait for a bit, split another range, etc.
> The benefits of this approach were presented by Oliver Upton at KVM
> Forum 2022 [1].
>
> Implementation
> ==============
> Patches 1-3 add a pgtable utility function for splitting huge block
> PTEs: kvm_pgtable_stage2_split(). Patches 4-8 add support for eagerly
> splitting huge-pages when enabling dirty-logging and when using the
> KVM_CLEAR_DIRTY_LOG ioctl. Note that this is just like what x86 does,
> and the code is actually based on it.  And finally, patch 9:
>
>         KVM: arm64: Use local TLBI on permission relaxation
>
> adds support for using local TLBIs instead of broadcasts when doing
> permission relaxation. This last patch is key to achieving good
> performance during dirty-logging, as eagerly breaking huge-pages
> replaces mapping new pages with permission relaxation. Got this patch
> (indirectly) from Marc Z.  and took the liberty of adding a commit
> message.
>
> Note: this applies on top of 6.2-rc3.
>
> Performance evaluation
> ======================
> The performance benefits were tested using the dirty_log_perf_test
> selftest with 2M huge-pages.
>
> The first test uses a write-only sequential workload where the stride
> is 2M instead of 4K [2]. The idea with this experiment is to emulate a
> random access pattern writing a different huge-page at every access.
> Observe that the benefit increases with the number of vcpus: up to
> 5.76x for 152 vcpus.
>
> ./dirty_log_perf_test_sparse -s anonymous_hugetlb_2mb -b 1G -v $i -i 3 -m 2
>
>         +-------+----------+------------------+
>         | vCPUs | 6.2-rc3  | 6.2-rc3 + series |
>         |       |    (ms)  |             (ms) |
>         +-------+----------+------------------+
>         |    1  |    2.63  |          1.66    |
>         |    2  |    2.95  |          1.70    |
>         |    4  |    3.21  |          1.71    |
>         |    8  |    4.97  |          1.78    |
>         |   16  |    9.51  |          1.82    |
>         |   32  |   20.15  |          3.03    |
>         |   64  |   40.09  |          5.80    |
>         |  128  |   80.08  |         12.24    |
>         |  152  |  109.81  |         15.14    |
>         +-------+----------+------------------+

This is the average pass time I assume? Or the total or last? Or whole
test runtime?
Was this run with MANUAL_PROTECT or do the split during the CLEAR
IOCTL or with the splitting at enable time?

>
> This second test measures the benefit of eager page splitting on read
> intensive workloads (1 write for every 10 reads). As in the other
> test, the benefit increases with the number of vcpus, up to 8.82x for
> 152 vcpus.
>
> ./dirty_log_perf_test -s anonymous_hugetlb_2mb -b 1G -v $i -i 3 -m 2 -f 10
>
>         +-------+----------+------------------+
>         | vCPUs | 6.2-rc3  | 6.2-rc3 + series |
>         |       |   (sec)  |            (sec) |
>         +-------+----------+------------------+
>         |    1  |    0.65  |          0.07    |
>         |    2  |    0.70  |          0.08    |
>         |    4  |    0.71  |          0.08    |
>         |    8  |    0.72  |          0.08    |
>         |   16  |    0.76  |          0.08    |
>         |   32  |    1.61  |          0.14    |
>         |   64  |    3.46  |          0.30    |
>         |  128  |    5.49  |          0.64    |
>         |  152  |    6.44  |          0.63    |
>         +-------+----------+------------------+
>
> Changes from the RFC:
> https://lore.kernel.org/kvmarm/20221112081714.2169495-1-ricarkol@google.com/
> - dropped the changes to split on POST visits. No visible perf
>   benefit.
> - changed the kvm_pgtable_stage2_free_removed() implementation to
>   reuse the stage2 mapper.
> - dropped the FEAT_BBM changes and optimization. Will send this on a
>   different series.
>
> Thanks,
> Ricardo
>
> [0] https://lore.kernel.org/kvm/20220119230739.2234394-1-dmatlack@google.com/
> [1] https://kvmforum2022.sched.com/event/15jJq/kvmarm-at-scale-improvements-to-the-mmu-in-the-face-of-hardware-growing-pains-oliver-upton-google
> [2] https://github.com/ricarkol/linux/commit/f78e9102b2bff4fb7f30bee810d7d611a537b46d
> [3] https://lore.kernel.org/kvmarm/20221107215644.1895162-1-oliver.upton@linux.dev/
>
> Marc Zyngier (1):
>   KVM: arm64: Use local TLBI on permission relaxation
>
> Ricardo Koller (8):
>   KVM: arm64: Add KVM_PGTABLE_WALK_REMOVED into ctx->flags
>   KVM: arm64: Add helper for creating removed stage2 subtrees
>   KVM: arm64: Add kvm_pgtable_stage2_split()
>   KVM: arm64: Refactor kvm_arch_commit_memory_region()
>   KVM: arm64: Add kvm_uninit_stage2_mmu()
>   KVM: arm64: Split huge pages when dirty logging is enabled
>   KVM: arm64: Open-code kvm_mmu_write_protect_pt_masked()
>   KVM: arm64: Split huge pages during KVM_CLEAR_DIRTY_LOG
>
>  arch/arm64/include/asm/kvm_asm.h     |   4 +
>  arch/arm64/include/asm/kvm_host.h    |  30 +++++
>  arch/arm64/include/asm/kvm_mmu.h     |   1 +
>  arch/arm64/include/asm/kvm_pgtable.h |  62 ++++++++++
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c   |  10 ++
>  arch/arm64/kvm/hyp/nvhe/tlb.c        |  54 +++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 143 +++++++++++++++++++++--
>  arch/arm64/kvm/hyp/vhe/tlb.c         |  32 +++++
>  arch/arm64/kvm/mmu.c                 | 168 ++++++++++++++++++++++-----
>  9 files changed, 466 insertions(+), 38 deletions(-)
>
> --
> 2.39.0.314.g84b9a713c41-goog
>

  parent reply	other threads:[~2023-01-24  0:48 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-13  3:49 [PATCH 0/9] KVM: arm64: Eager Huge-page splitting for dirty-logging Ricardo Koller
2023-01-13  3:49 ` [PATCH 1/9] KVM: arm64: Add KVM_PGTABLE_WALK_REMOVED into ctx->flags Ricardo Koller
2023-01-24  0:51   ` Ben Gardon
2023-01-24  0:56     ` Oliver Upton
2023-01-24 16:32       ` Ricardo Koller
2023-01-24 18:00         ` Ben Gardon
2023-01-26 18:48           ` Ricardo Koller
2023-01-24 16:30     ` Ricardo Koller
2023-01-13  3:49 ` [PATCH 2/9] KVM: arm64: Add helper for creating removed stage2 subtrees Ricardo Koller
2023-01-14 17:58   ` kernel test robot
2023-01-24  0:55   ` Ben Gardon
2023-01-24 16:35     ` Ricardo Koller
2023-01-24 17:07       ` Oliver Upton
2023-01-13  3:49 ` [PATCH 3/9] KVM: arm64: Add kvm_pgtable_stage2_split() Ricardo Koller
2023-01-24  1:03   ` Ben Gardon
2023-01-24 16:46     ` Ricardo Koller
2023-01-24 17:11       ` Oliver Upton
2023-01-24 17:18         ` Ricardo Koller
2023-01-24 17:48           ` David Matlack
2023-01-24 20:28             ` Oliver Upton
2023-02-06  9:20   ` Zheng Chuan
2023-02-06 16:28     ` Ricardo Koller
2023-01-13  3:49 ` [PATCH 4/9] KVM: arm64: Refactor kvm_arch_commit_memory_region() Ricardo Koller
2023-01-13  3:49 ` [PATCH 5/9] KVM: arm64: Add kvm_uninit_stage2_mmu() Ricardo Koller
2023-01-13  3:49 ` [PATCH 6/9] KVM: arm64: Split huge pages when dirty logging is enabled Ricardo Koller
2023-01-24 17:52   ` Ben Gardon
2023-01-24 22:19     ` Oliver Upton
2023-01-24 22:45   ` Oliver Upton
2023-01-26 18:45     ` Ricardo Koller
2023-01-26 19:25       ` Ricardo Koller
2023-01-26 20:10       ` Marc Zyngier
2023-01-27 15:45         ` Ricardo Koller
2023-01-30 21:18           ` Oliver Upton
2023-01-31  1:18             ` Sean Christopherson
2023-01-31 17:45               ` Oliver Upton
2023-01-31 17:54                 ` Sean Christopherson
2023-01-31 19:06                   ` Oliver Upton
2023-01-31 18:01                 ` David Matlack
2023-01-31 18:19                   ` Ricardo Koller
2023-01-31 18:35                   ` Oliver Upton
2023-01-31 10:31             ` Marc Zyngier
2023-01-31 10:28           ` Marc Zyngier
2023-02-06 16:35             ` Ricardo Koller
2023-01-13  3:49 ` [PATCH 7/9] KVM: arm64: Open-code kvm_mmu_write_protect_pt_masked() Ricardo Koller
2023-01-13  3:49 ` [PATCH 8/9] KVM: arm64: Split huge pages during KVM_CLEAR_DIRTY_LOG Ricardo Koller
2023-01-13  3:50 ` [PATCH 9/9] KVM: arm64: Use local TLBI on permission relaxation Ricardo Koller
2023-01-24  0:48 ` Ben Gardon [this message]
2023-01-24 16:50   ` [PATCH 0/9] KVM: arm64: Eager Huge-page splitting for dirty-logging Ricardo Koller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANgfPd-zjCcV08qPVpbvqT+aYaaaWsdGuDbiFyYS9HM-KaDKug@mail.gmail.com \
    --to=bgardon@google.com \
    --cc=alexandru.elisei@arm.com \
    --cc=andrew.jones@linux.dev \
    --cc=catalin.marinas@arm.com \
    --cc=dmatlack@google.com \
    --cc=eric.auger@redhat.com \
    --cc=gshan@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=maz@kernel.org \
    --cc=oupton@google.com \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=rananta@google.com \
    --cc=reijiw@google.com \
    --cc=ricarkol@gmail.com \
    --cc=ricarkol@google.com \
    --cc=seanjc@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).