From: Zdenek Kaspar <zkaspar82@gmail.com>
To: Sean Christopherson <seanjc@google.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: Bad performance since 5.9-rc1
Date: Wed, 13 Jan 2021 23:17:15 +0100	[thread overview]
Message-ID: <20210113231715.615b8d1b.zkaspar82@gmail.com> (raw)
In-Reply-To: <X/9VT6ZgLPZW3dxc@google.com>

On Wed, 13 Jan 2021 12:17:19 -0800
Sean Christopherson <seanjc@google.com> wrote:

> On Tue, Jan 12, 2021, Zdenek Kaspar wrote:
> > On Tue, 22 Dec 2020 22:26:45 +0100
> > Zdenek Kaspar <zkaspar82@gmail.com> wrote:
> > 
> > > On Tue, 22 Dec 2020 09:07:39 -0800
> > > Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > > On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> > > > > [  179.364305] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
> > > > > [  179.365415] Call Trace:
> > > > > [  179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]
> > > > 
> > > > This means the shadow page zapping is occurring because KVM is
> > > > hitting the max number of allowed MMU shadow pages.  Can you
> > > > provide your QEMU command line?  I can reproduce the performance
> > > > degradation, but only by deliberately overriding the max number
> > > > of MMU pages via `-machine kvm-shadow-mem` to be an absurdly
> > > > low value.
> > > > 
> > > > > [  179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> > > > > [  179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> > > > > [  179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> > > > > [  179.365938]  __x64_sys_ioctl+0x338/0x720
> > > > > [  179.365992]  do_syscall_64+0x33/0x40
> > > > > [  179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > 
> > > It's one long line; I added "\" for mail readability:
> > > 
> > > qemu-system-x86_64 -machine type=q35,accel=kvm            \
> > > -cpu host,host-cache-info=on -smp cpus=2,cores=2          \
> > > -m size=1024 -global virtio-pci.disable-legacy=on         \
> > > -global virtio-pci.disable-modern=off                     \
> > > -device virtio-balloon                                    \
> > > -device virtio-net,netdev=tap-build,mac=DE:AD:BE:EF:00:80 \
> > > -object rng-random,filename=/dev/urandom,id=rng0          \
> > > -device virtio-rng,rng=rng0                               \
> > > -name build,process=qemu-build                            \
> > > -drive file=/mnt/data/export/unix/kvm/build/openbsd-amd64.img,if=virtio,cache=none,format=raw,aio=native \
> > > -netdev type=tap,id=tap-build,vhost=on                    \
> > > -serial none                                              \
> > > -parallel none                                            \
> > > -monitor unix:/dev/shm/kvm-build.sock,server,nowait       \
> > > -enable-kvm -daemonize -runas qemu
> > > 
> > > Z.
> > 
> > BTW, with kvm-shadow-mem=1073741824, v5.11-rc3 seems OK.
> >
> > Just curious what v5.8 does
> 
> Aha!  Figured it out.  v5.9 (the commit you bisected to) broke the
> zapping, that's what it did.  The list of MMU pages is a FIFO list,
> meaning KVM adds entries to the head, not the tail.  I botched the
> zapping flow and used for_each instead of for_each_reverse, which
> meant KVM would zap the _newest_ pages instead of the _oldest_ pages.
>  So once a VM hit its limit, KVM would constantly zap the shadow
> pages it just allocated.
> 
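(To illustrate the head-insert vs. iteration-direction issue described above: a
minimal userspace sketch, not the kernel code -- the struct, helper, and output
names here are made up for the example, only the visit order matters.)

#include <stdio.h>

/* Toy doubly-linked list with a sentinel head, mimicking how
 * kvm->arch.active_mmu_pages is used: new entries are added at the head. */
struct page { int id; struct page *prev, *next; };

static struct page head = { .id = -1, .prev = &head, .next = &head };

static void add_to_head(struct page *p)      /* like list_add() */
{
	p->next = head.next;
	p->prev = &head;
	head.next->prev = p;
	head.next = p;
}

int main(void)
{
	static struct page pages[3];

	/* "Allocate" pages 0, 1, 2 in that order; each goes on the head. */
	for (int i = 0; i < 3; i++) {
		pages[i].id = i;
		add_to_head(&pages[i]);
	}

	/* Forward walk (what the buggy zapping did): newest entry first. */
	printf("forward:");
	for (struct page *p = head.next; p != &head; p = p->next)
		printf(" %d", p->id);                /* prints: 2 1 0 */

	/* Reverse walk (the fix): oldest entry first. */
	printf("\nreverse:");
	for (struct page *p = head.prev; p != &head; p = p->prev)
		printf(" %d", p->id);                /* prints: 0 1 2 */
	printf("\n");

	return 0;
}

(The forward walk visits the newest entry first, which is exactly the "zap the
shadow pages it just allocated" behaviour; only the reverse walk starts from
the oldest.)
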
> This should resolve the performance regression, or at least make it
> far less painful.  It's possible you may still see some performance
> degradation due to other changes in the zapping, e.g. more
> aggressive recursive zapping.  If that's the case, I can explore
> other tweaks, e.g. skip higher levels when possible.  I'll get a
> proper patch posted later today.
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c478904af518..2c6e6fdb26ad 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2417,7 +2417,7 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
>                  return 0;
> 
>  restart:
> -        list_for_each_entry_safe(sp, tmp, &kvm->arch.active_mmu_pages, link) {
> +        list_for_each_entry_safe_reverse(sp, tmp, &kvm->arch.active_mmu_pages, link) {
>                  /*
>                   * Don't zap active root pages, the page itself can't be freed
>                   * and zapping it will just force vCPUs to realloc and reload.
> 
> Side topic, I still can't figure out how on earth your guest kernel
> is hitting the default max number of MMU pages.  Even with large pages
> completely disabled, PTI enabled, multiple guest processes running,
> etc... I hit OOM in the guest before the host's shadow page limit
> kicks in.  I had to force the limit down to 25% of the default to
> reproduce the bad behavior.  All I can figure is that BSD has a
> substantially different paging scheme than Linux.
> 
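(For reference, some back-of-the-envelope numbers for this 1 GiB guest -- a
rough sketch assuming the KVM_PERMILLE_MMU_PAGES/1000 heuristic in
kvm_mmu_calculate_default_mmu_pages() and QEMU converting kvm-shadow-mem from
bytes to 4 KiB pages before KVM_SET_NR_MMU_PAGES; both are from my reading of
the sources, so treat the exact figures as approximate.)

#include <stdio.h>

int main(void)
{
	unsigned long guest_bytes = 1024UL << 20;        /* -m 1024 */
	unsigned long guest_pages = guest_bytes / 4096;  /* 262144 */

	/* Assumed default limit: ~2% of guest pages, with a floor of 64. */
	unsigned long def_limit = guest_pages * 20 / 1000;
	if (def_limit < 64)
		def_limit = 64;

	/* Explicit override: -machine kvm-shadow-mem=1073741824 (bytes). */
	unsigned long override = 1073741824UL / 4096;

	printf("default limit:  %lu shadow pages\n", def_limit);  /* 5242 */
	printf("override limit: %lu shadow pages\n", override);   /* 262144 */
	return 0;
}

(If those assumptions hold, the default cap for this guest is only ~5242 shadow
pages, while kvm-shadow-mem=1073741824 raises it to 262144, which would explain
why the override made the unfixed kernel usable.)
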
> > so, by any chance, is there a command to change the kvm-shadow-mem
> > value via the qemu monitor?
> > 
> > Z.

Cool, tested with a quick compile in the guest and it's a good fix!

5.11.0-rc3-amd64 (list_for_each_entry_safe):
 - with kvm-shadow-mem=1073741824 (without it, it's unusable)
    0m14.86s real     0m10.87s user     0m12.15s system

5.11.0-rc3-2-amd64 (list_for_each_entry_safe_reverse):
    0m14.36s real     0m10.50s user     0m12.43s system

Thanks, Z.
