All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Gardon <bgardon@google.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Peter Shier <pshier@google.com>,
	David Matlack <dmatlack@google.com>,
	Mingwei Zhang <mizhang@google.com>,
	Yulei Zhang <yulei.kernel@gmail.com>,
	Wanpeng Li <kernellwp@gmail.com>,
	Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
	Kai Huang <kai.huang@intel.com>,
	Keqian Zhu <zhukeqian1@huawei.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH 00/15] Currently disabling dirty logging with the TDP MMU is extremely slow. On a 96 vCPU / 96G VM it takes ~45 seconds to disable dirty logging with the TDP MMU, as opposed to ~3.5 seconds with the legacy MMU. This series optimizes TLB flushes and introduces in-place large page promotion, to bring the disable dirty log time down to ~2 seconds.
Date: Mon, 15 Nov 2021 15:58:25 -0800	[thread overview]
Message-ID: <CANgfPd8AHMdXAKrOVOwfAps+0qXVfkHD6EY-kW6oU+DisqGiLg@mail.gmail.com> (raw)
In-Reply-To: <20211115234603.2908381-1-bgardon@google.com>

On Mon, Nov 15, 2021 at 3:46 PM Ben Gardon <bgardon@google.com> wrote:

Haha oops, I lost the covers letter title there. It should have been
"KVM: x86/mmu: Optimize disabling dirty logging" and the subject line
should have been the first paragraph.
I'll correct that on any future versions of this series I send out.

>
> Testing:
> Ran KVM selftests and kvm-unit-tests on an Intel Skylake. This
> series introduced no new failures.
>
> Performance:
> To collect these results I needed to apply Mingwei's patch
> "selftests: KVM: align guest physical memory base address to 1GB"
> https://lkml.org/lkml/2021/8/29/310
> David Matlack is going to send out an updated version of that patch soon.
>
> Without this series, TDP MMU:
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 10.966500447s
> Enabling dirty logging time: 0.002068737s
>
> Iteration 1 dirty memory time: 0.047556280s
> Iteration 1 get dirty log time: 0.001253914s
> Iteration 1 clear dirty log time: 0.049716661s
> Iteration 2 dirty memory time: 3.679662016s
> Iteration 2 get dirty log time: 0.000659546s
> Iteration 2 clear dirty log time: 1.834329322s
> Disabling dirty logging time: 45.738439510s
> Get dirty log over 2 iterations took 0.001913460s. (Avg 0.000956730s/iteration)
> Clear dirty log over 2 iterations took 1.884045983s. (Avg 0.942022991s/iteration)
>
> Without this series, Legacy MMU:
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 12.664750666s
> Enabling dirty logging time: 0.002025510s
>
> Iteration 1 dirty memory time: 0.046240875s
> Iteration 1 get dirty log time: 0.001864342s
> Iteration 1 clear dirty log time: 0.170243637s
> Iteration 2 dirty memory time: 31.571088701s
> Iteration 2 get dirty log time: 0.000626245s
> Iteration 2 clear dirty log time: 1.294817729s
> Disabling dirty logging time: 3.566831573s
> Get dirty log over 2 iterations took 0.002490587s. (Avg 0.001245293s/iteration)
> Clear dirty log over 2 iterations took 1.465061366s. (Avg 0.732530683s/iteration)
>
> With this series, TDP MMU:
> (Updated since RFC. Pulling out patches 1-4 could have a performance impact.)
> > ./dirty_log_perf_test -v 96 -s anonymous_hugetlb_1gb
> Test iterations: 2
> Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> guest physical test memory offset: 0x3fe7c0000000
> Populate memory time: 12.225242366s
> Enabling dirty logging time: 0.002063442s
>
> Iteration 1 dirty memory time: 0.047598123s
> Iteration 1 get dirty log time: 0.001247702s
> Iteration 1 clear dirty log time: 0.051062420s
> Iteration 2 dirty memory time: 3.660439803s
> Iteration 2 get dirty log time: 0.000736229s
> Iteration 2 clear dirty log time: 1.043469951s
> Disabling dirty logging time: 1.400549627s
> Get dirty log over 2 iterations took 0.001983931s. (Avg 0.000991965s/iteration)
> Clear dirty log over 2 iterations took 1.094532371s. (Avg 0.547266185s/iteration)
>
> Patch breakdown:
> Patches 1 eliminates extra TLB flushes while disabling dirty logging.
> Patches 2-8 remove the need for a vCPU pointer to make_spte
> Patches 9-14 are small refactors in perparation for patch 19
> Patch 15 implements in-place largepage promotion when disabling dirty logging
>
> Changelog:
> RFC -> v1:
>         Dropped the first 4 patches from the series. Patch 1 was sent
>         separately, patches 2-4 will be taken over by Sean Christopherson.
>         Incorporated David Matlack's Reviewed-by.
>
> Ben Gardon (15):
>   KVM: x86/mmu: Remove redundant flushes when disabling dirty logging
>   KVM: x86/mmu: Introduce vcpu_make_spte
>   KVM: x86/mmu: Factor wrprot for nested PML out of make_spte
>   KVM: x86/mmu: Factor mt_mask out of make_spte
>   KVM: x86/mmu: Remove need for a vcpu from
>     kvm_slot_page_track_is_active
>   KVM: x86/mmu: Remove need for a vcpu from mmu_try_to_unsync_pages
>   KVM: x86/mmu: Factor shadow_zero_check out of make_spte
>   KVM: x86/mmu: Replace vcpu argument with kvm pointer in make_spte
>   KVM: x86/mmu: Factor out the meat of reset_tdp_shadow_zero_bits_mask
>   KVM: x86/mmu: Propagate memslot const qualifier
>   KVM: x86/MMU: Refactor vmx_get_mt_mask
>   KVM: x86/mmu: Factor out part of vmx_get_mt_mask which does not depend
>     on vcpu
>   KVM: x86/mmu: Add try_get_mt_mask to x86_ops
>   KVM: x86/mmu: Make kvm_is_mmio_pfn usable outside of spte.c
>   KVM: x86/mmu: Promote pages in-place when disabling dirty logging
>
>  arch/x86/include/asm/kvm-x86-ops.h    |  1 +
>  arch/x86/include/asm/kvm_host.h       |  2 +
>  arch/x86/include/asm/kvm_page_track.h |  6 +-
>  arch/x86/kvm/mmu/mmu.c                | 45 +++++++------
>  arch/x86/kvm/mmu/mmu_internal.h       |  6 +-
>  arch/x86/kvm/mmu/page_track.c         |  8 +--
>  arch/x86/kvm/mmu/paging_tmpl.h        |  6 +-
>  arch/x86/kvm/mmu/spte.c               | 43 ++++++++----
>  arch/x86/kvm/mmu/spte.h               | 17 +++--
>  arch/x86/kvm/mmu/tdp_mmu.c            | 97 +++++++++++++++++++++------
>  arch/x86/kvm/mmu/tdp_mmu.h            |  5 +-
>  arch/x86/kvm/svm/svm.c                |  8 +++
>  arch/x86/kvm/vmx/vmx.c                | 40 ++++++-----
>  include/linux/kvm_host.h              | 10 +--
>  virt/kvm/kvm_main.c                   | 12 ++--
>  15 files changed, 205 insertions(+), 101 deletions(-)
>
> --
> 2.34.0.rc1.387.gb447b232ab-goog
>

      parent reply	other threads:[~2021-11-16  3:37 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-15 23:45 [PATCH 00/15] Currently disabling dirty logging with the TDP MMU is extremely slow. On a 96 vCPU / 96G VM it takes ~45 seconds to disable dirty logging with the TDP MMU, as opposed to ~3.5 seconds with the legacy MMU. This series optimizes TLB flushes and introduces in-place large page promotion, to bring the disable dirty log time down to ~2 seconds Ben Gardon
2021-11-15 23:45 ` [PATCH 01/15] KVM: x86/mmu: Remove redundant flushes when disabling dirty logging Ben Gardon
2021-11-18  8:26   ` Paolo Bonzini
2021-11-15 23:45 ` [PATCH 02/15] KVM: x86/mmu: Introduce vcpu_make_spte Ben Gardon
2021-11-15 23:45 ` [PATCH 03/15] KVM: x86/mmu: Factor wrprot for nested PML out of make_spte Ben Gardon
2021-11-15 23:45 ` [PATCH 04/15] KVM: x86/mmu: Factor mt_mask " Ben Gardon
2021-11-15 23:45 ` [PATCH 05/15] KVM: x86/mmu: Remove need for a vcpu from kvm_slot_page_track_is_active Ben Gardon
2021-11-18  8:25   ` Paolo Bonzini
2021-11-15 23:45 ` [PATCH 06/15] KVM: x86/mmu: Remove need for a vcpu from mmu_try_to_unsync_pages Ben Gardon
2021-11-18  8:25   ` Paolo Bonzini
2021-11-15 23:45 ` [PATCH 07/15] KVM: x86/mmu: Factor shadow_zero_check out of make_spte Ben Gardon
2021-11-15 23:45 ` [PATCH 08/15] KVM: x86/mmu: Replace vcpu argument with kvm pointer in make_spte Ben Gardon
2021-11-15 23:45 ` [PATCH 09/15] KVM: x86/mmu: Factor out the meat of reset_tdp_shadow_zero_bits_mask Ben Gardon
2021-11-15 23:45 ` [PATCH 10/15] KVM: x86/mmu: Propagate memslot const qualifier Ben Gardon
2021-11-18  8:27   ` Paolo Bonzini
2021-11-15 23:45 ` [PATCH 11/15] KVM: x86/MMU: Refactor vmx_get_mt_mask Ben Gardon
2021-11-18  8:30   ` Paolo Bonzini
2021-11-18 15:30     ` Sean Christopherson
2021-11-19  9:02       ` Paolo Bonzini
2021-11-22 18:11         ` Ben Gardon
2021-11-22 18:46           ` Sean Christopherson
2021-11-15 23:46 ` [PATCH 12/15] KVM: x86/mmu: Factor out part of vmx_get_mt_mask which does not depend on vcpu Ben Gardon
2021-11-15 23:46 ` [PATCH 13/15] KVM: x86/mmu: Add try_get_mt_mask to x86_ops Ben Gardon
2021-11-15 23:46 ` [PATCH 14/15] KVM: x86/mmu: Make kvm_is_mmio_pfn usable outside of spte.c Ben Gardon
2021-11-15 23:46 ` [PATCH 15/15] KVM: x86/mmu: Promote pages in-place when disabling dirty logging Ben Gardon
2021-11-25  4:18   ` Peter Xu
2021-11-29 18:31     ` Ben Gardon
2021-11-30  0:13       ` Sean Christopherson
2021-11-30  7:28       ` Peter Xu
2021-11-30 16:01         ` Sean Christopherson
2021-12-01  1:59           ` Peter Xu
2021-11-15 23:58 ` Ben Gardon [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANgfPd8AHMdXAKrOVOwfAps+0qXVfkHD6EY-kW6oU+DisqGiLg@mail.gmail.com \
    --to=bgardon@google.com \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=kai.huang@intel.com \
    --cc=kernellwp@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pshier@google.com \
    --cc=seanjc@google.com \
    --cc=xiaoguangrong.eric@gmail.com \
    --cc=yulei.kernel@gmail.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.