kvm.vger.kernel.org archive mirror
* Bad performance since 5.9-rc1
@ 2020-11-19  3:05 Zdenek Kaspar
  2020-12-01  6:35 ` Zdenek Kaspar
  2020-12-02  0:31 ` Sean Christopherson
  0 siblings, 2 replies; 11+ messages in thread
From: Zdenek Kaspar @ 2020-11-19  3:05 UTC (permalink / raw)
  To: kvm

Hi,

in my initial report (https://marc.info/?l=kvm&m=160502183220080&w=2 -
now fixed by c887c9b9ca62c051d339b1c7b796edf2724029ed) I saw degraded
performance introduced somewhere between v5.8 and v5.9-rc1.

OpenBSD 6.8 (GENERIC.MP) guest performance (time ./test-build.sh)
good: 0m13.54s real     0m10.51s user     0m10.96s system
bad : 6m20.07s real    11m42.93s user     0m13.57s system

bisected to first bad commit: 6b82ef2c9cf18a48726e4bb359aa9014632f6466

git bisect log:
# bad: [e47c4aee5bde03e7018f4fde45ba21028a8f8438] KVM: x86/mmu: Rename page_header() to to_shadow_page()
# good: [01c3b2b5cdae39af8dfcf6e40fdf484ae0e812c5] KVM: SVM: Rename svm_nested_virtualize_tpr() to nested_svm_virtualize_tpr()
git bisect start 'e47c4aee5bde' '01c3b2b5cdae'
# bad: [ebdb292dac7993425c8e31e2c21c9978e914a676] KVM: x86/mmu: Batch zap MMU pages when shrinking the slab
git bisect bad ebdb292dac7993425c8e31e2c21c9978e914a676
# good: [fb58a9c345f645f1774dcf6a36fda169253008ae] KVM: x86/mmu: Optimize MMU page cache lookup for fully direct MMUs
git bisect good fb58a9c345f645f1774dcf6a36fda169253008ae
# bad: [6b82ef2c9cf18a48726e4bb359aa9014632f6466] KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
git bisect bad 6b82ef2c9cf18a48726e4bb359aa9014632f6466
# good: [f95eec9bed76d42194c23153cb1cc8f186bf91cb] KVM: x86/mmu: Don't put invalid SPs back on the list of active pages
git bisect good f95eec9bed76d42194c23153cb1cc8f186bf91cb
# first bad commit: [6b82ef2c9cf18a48726e4bb359aa9014632f6466] KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages

Host machine is old Intel Core2 without EPT (TDP).

TIA, Z.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bad performance since 5.9-rc1
  2020-11-19  3:05 Bad performance since 5.9-rc1 Zdenek Kaspar
@ 2020-12-01  6:35 ` Zdenek Kaspar
  2020-12-18 19:33   ` Zdenek Kaspar
  2020-12-02  0:31 ` Sean Christopherson
  1 sibling, 1 reply; 11+ messages in thread
From: Zdenek Kaspar @ 2020-12-01  6:35 UTC (permalink / raw)
  To: kvm

On Thu, 19 Nov 2020 04:05:26 +0100
Zdenek Kaspar <zkaspar82@gmail.com> wrote:

> Hi,
> 
> in my initial report (https://marc.info/?l=kvm&m=160502183220080&w=2 -
> now fixed by c887c9b9ca62c051d339b1c7b796edf2724029ed) I saw degraded
> performance going back somewhere between v5.8 - v5.9-rc1.
> 
> OpenBSD 6.8 (GENERIC.MP) guest performance (time ./test-build.sh)
> good: 0m13.54s real     0m10.51s user     0m10.96s system
> bad : 6m20.07s real    11m42.93s user     0m13.57s system
> 
> bisected to first bad commit: 6b82ef2c9cf18a48726e4bb359aa9014632f6466
> 
> git bisect log:
> # bad: [e47c4aee5bde03e7018f4fde45ba21028a8f8438] KVM: x86/mmu: Rename page_header() to to_shadow_page()
> # good: [01c3b2b5cdae39af8dfcf6e40fdf484ae0e812c5] KVM: SVM: Rename svm_nested_virtualize_tpr() to nested_svm_virtualize_tpr()
> git bisect start 'e47c4aee5bde' '01c3b2b5cdae'
> # bad: [ebdb292dac7993425c8e31e2c21c9978e914a676] KVM: x86/mmu: Batch zap MMU pages when shrinking the slab
> git bisect bad ebdb292dac7993425c8e31e2c21c9978e914a676
> # good: [fb58a9c345f645f1774dcf6a36fda169253008ae] KVM: x86/mmu: Optimize MMU page cache lookup for fully direct MMUs
> git bisect good fb58a9c345f645f1774dcf6a36fda169253008ae
> # bad: [6b82ef2c9cf18a48726e4bb359aa9014632f6466] KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
> git bisect bad 6b82ef2c9cf18a48726e4bb359aa9014632f6466
> # good: [f95eec9bed76d42194c23153cb1cc8f186bf91cb] KVM: x86/mmu: Don't put invalid SPs back on the list of active pages
> git bisect good f95eec9bed76d42194c23153cb1cc8f186bf91cb
> # first bad commit: [6b82ef2c9cf18a48726e4bb359aa9014632f6466] KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
> 
> Host machine is old Intel Core2 without EPT (TDP).
> 
> TIA, Z.

Hi, with v5.10-rc6:
get_mmio_spte: detect reserved bits on spte, addr 0xfe00d000, dump hierarchy:
------ spte 0x8000030e level 3.
------ spte 0xaf82027 level 2.
------ spte 0x2038001ffe00d407 level 1.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 355 at kvm_mmu_page_fault.cold+0x42/0x4f [kvm]
...
CPU: 1 PID: 355 Comm: qemu-build Not tainted 5.10.0-rc6-amd64 #1
Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
RIP: 0010:kvm_mmu_page_fault.cold+0x42/0x4f [kvm]
Code: e2 ec 44 8b 04 24 8b 5c 24 0c 44 89 c5 89 da 83 eb 01 48 c7 c7 20 b2 65 c0 48 63 c3 48 8b 74 c4 30 e8 dd 74 e2 ec 39 dd 7e e3 <0f> 0b 41 b8 ea ff ff ff e9 27 99 ff ff 0f 0b 48 8b 54 24 10 48 83
RSP: 0018:ffffb67400653d30 EFLAGS: 00010202
RAX: 0000000000000027 RBX: 0000000000000000 RCX: ffffa271ff2976f8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa271ff2976f0
RBP: 0000000000000001 R08: ffffffffadd02ae8 R09: 0000000000000003
R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 00000000fe00d000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS:  00007fc10ae3d640(0000) GS:ffffa271ff280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002dc2000 CR4: 00000000000026e0
Call Trace:
 kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
 ? do_futex+0x7c4/0xb80
 kvm_vcpu_ioctl+0x203/0x520 [kvm]
 ? set_next_entity+0x5b/0x80
 ? __switch_to_asm+0x32/0x60
 ? finish_task_switch+0x70/0x260
 __x64_sys_ioctl+0x338/0x720
 ? __x64_sys_futex+0x120/0x190
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc10c389f6b
Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fc10ae3c628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fc10c389f6b
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000012
RBP: 000055ad3767baf0 R08: 000055ad36be4850 R09: 00000000000000ff
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 000055ad371d9800 R14: 0000000000000001 R15: 0000000000000002
---[ end trace c5f7ae690f5abcc4 ]---

Without "kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT"
I can run the guest again, but with degraded performance as before.

Z.


* Re: Bad performance since 5.9-rc1
  2020-11-19  3:05 Bad performance since 5.9-rc1 Zdenek Kaspar
  2020-12-01  6:35 ` Zdenek Kaspar
@ 2020-12-02  0:31 ` Sean Christopherson
  1 sibling, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2020-12-02  0:31 UTC (permalink / raw)
  To: Zdenek Kaspar; +Cc: kvm

On Thu, Nov 19, 2020, Zdenek Kaspar wrote:
> Hi,
> 
> in my initial report (https://marc.info/?l=kvm&m=160502183220080&w=2 -
> now fixed by c887c9b9ca62c051d339b1c7b796edf2724029ed) I saw degraded
> performance going back somewhere between v5.8 - v5.9-rc1.
> 
> OpenBSD 6.8 (GENERIC.MP) guest performance (time ./test-build.sh)
> good: 0m13.54s real     0m10.51s user     0m10.96s system
> bad : 6m20.07s real    11m42.93s user     0m13.57s system
> 
> bisected to first bad commit: 6b82ef2c9cf18a48726e4bb359aa9014632f6466

This is working as intended, in the sense that it's expected that guest
performance would go down the drain due to KVM being much more aggressive when
reclaiming shadow pages.  Prior to commit 6b82ef2c9cf1 ("KVM: x86/mmu: Batch zap
MMU pages when recycling oldest pages"), the zapping was completely anemic,
e.g. a few shadow pages would get zapped each call, without even really making a
dent in the memory consumed by KVM for shadow pages.

Any chance you can track down what is triggering KVM reclaim of shadow pages?
E.g. is KVM hitting its limit on the number of MMU pages and reclaiming via
make_mmu_pages_available()?  Or is the host under high memory pressure and
reclaiming memory via mmu_shrink_scan()?


* Re: Bad performance since 5.9-rc1
  2020-12-01  6:35 ` Zdenek Kaspar
@ 2020-12-18 19:33   ` Zdenek Kaspar
  2020-12-21 19:41     ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kaspar @ 2020-12-18 19:33 UTC (permalink / raw)
  To: kvm

On Tue, 1 Dec 2020 07:35:37 +0100
Zdenek Kaspar <zkaspar82@gmail.com> wrote:

> On Thu, 19 Nov 2020 04:05:26 +0100
> Zdenek Kaspar <zkaspar82@gmail.com> wrote:
> 
> > Hi,
> > 
> > in my initial report
> > (https://marc.info/?l=kvm&m=160502183220080&w=2 - now fixed by
> > c887c9b9ca62c051d339b1c7b796edf2724029ed) I saw degraded
> > performance going back somewhere between v5.8 - v5.9-rc1.
> > 
> > OpenBSD 6.8 (GENERIC.MP) guest performance (time ./test-build.sh)
> > good: 0m13.54s real     0m10.51s user     0m10.96s system
> > bad : 6m20.07s real    11m42.93s user     0m13.57s system
> > 
> > bisected to first bad commit:
> > 6b82ef2c9cf18a48726e4bb359aa9014632f6466
> > 
> > git bisect log:
> > # bad: [e47c4aee5bde03e7018f4fde45ba21028a8f8438] KVM: x86/mmu: Rename page_header() to to_shadow_page()
> > # good: [01c3b2b5cdae39af8dfcf6e40fdf484ae0e812c5] KVM: SVM: Rename svm_nested_virtualize_tpr() to nested_svm_virtualize_tpr()
> > git bisect start 'e47c4aee5bde' '01c3b2b5cdae'
> > # bad: [ebdb292dac7993425c8e31e2c21c9978e914a676] KVM: x86/mmu: Batch zap MMU pages when shrinking the slab
> > git bisect bad ebdb292dac7993425c8e31e2c21c9978e914a676
> > # good: [fb58a9c345f645f1774dcf6a36fda169253008ae] KVM: x86/mmu: Optimize MMU page cache lookup for fully direct MMUs
> > git bisect good fb58a9c345f645f1774dcf6a36fda169253008ae
> > # bad: [6b82ef2c9cf18a48726e4bb359aa9014632f6466] KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
> > git bisect bad 6b82ef2c9cf18a48726e4bb359aa9014632f6466
> > # good: [f95eec9bed76d42194c23153cb1cc8f186bf91cb] KVM: x86/mmu: Don't put invalid SPs back on the list of active pages
> > git bisect good f95eec9bed76d42194c23153cb1cc8f186bf91cb
> > # first bad commit: [6b82ef2c9cf18a48726e4bb359aa9014632f6466] KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
> > 
> > Host machine is old Intel Core2 without EPT (TDP).
> > 
> > TIA, Z.
> 
> Hi, with v5.10-rc6:
> get_mmio_spte: detect reserved bits on spte, addr 0xfe00d000, dump hierarchy:
> ------ spte 0x8000030e level 3.
> ------ spte 0xaf82027 level 2.
> ------ spte 0x2038001ffe00d407 level 1.
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 355 at kvm_mmu_page_fault.cold+0x42/0x4f [kvm]
> ...
> CPU: 1 PID: 355 Comm: qemu-build Not tainted 5.10.0-rc6-amd64 #1
> Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
> RIP: 0010:kvm_mmu_page_fault.cold+0x42/0x4f [kvm]
> Code: e2 ec 44 8b 04 24 8b 5c 24 0c 44 89 c5 89 da 83 eb 01 48 c7 c7 20 b2 65 c0 48 63 c3 48 8b 74 c4 30 e8 dd 74 e2 ec 39 dd 7e e3 <0f> 0b 41 b8 ea ff ff ff e9 27 99 ff ff 0f 0b 48 8b 54 24 10 48 83
> RSP: 0018:ffffb67400653d30 EFLAGS: 00010202
> RAX: 0000000000000027 RBX: 0000000000000000 RCX: ffffa271ff2976f8
> RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa271ff2976f0
> RBP: 0000000000000001 R08: ffffffffadd02ae8 R09: 0000000000000003
> R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 00000000fe00d000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
> FS:  00007fc10ae3d640(0000) GS:ffffa271ff280000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000002dc2000 CR4: 00000000000026e0
> Call Trace:
>  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
>  ? do_futex+0x7c4/0xb80
>  kvm_vcpu_ioctl+0x203/0x520 [kvm]
>  ? set_next_entity+0x5b/0x80
>  ? __switch_to_asm+0x32/0x60
>  ? finish_task_switch+0x70/0x260
>  __x64_sys_ioctl+0x338/0x720
>  ? __x64_sys_futex+0x120/0x190
>  do_syscall_64+0x33/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7fc10c389f6b
> Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007fc10ae3c628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fc10c389f6b
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000012
> RBP: 000055ad3767baf0 R08: 000055ad36be4850 R09: 00000000000000ff
> R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> R13: 000055ad371d9800 R14: 0000000000000001 R15: 0000000000000002
> ---[ end trace c5f7ae690f5abcc4 ]---
> 
> without: kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
> I can run guest again, but with degraded performance as before.
> 
> Z.

With the "KVM: x86/mmu: Bug fixes and cleanups in get_mmio_spte()" series applied,
I can run the guest again and performance is slightly better:

v5.8:        0m13.54s real     0m10.51s user     0m10.96s system
v5.9:        6m20.07s real    11m42.93s user     0m13.57s system
v5.10+fixes: 5m50.77s real    10m38.29s user     0m15.96s system

perf top from host when guest (openbsd) is compiling:
  26.85%  [kernel]                  [k] queued_spin_lock_slowpath
   8.49%  [kvm]                     [k] mmu_page_zap_pte
   7.47%  [kvm]                     [k] __kvm_mmu_prepare_zap_page
   3.61%  [kernel]                  [k] clear_page_rep
   2.43%  [kernel]                  [k] page_counter_uncharge
   2.30%  [kvm]                     [k] paging64_page_fault
   2.03%  [kvm_intel]               [k] vmx_vcpu_run
   2.02%  [kvm]                     [k] kvm_vcpu_gfn_to_memslot
   1.95%  [kernel]                  [k] internal_get_user_pages_fast
   1.64%  [kvm]                     [k] kvm_mmu_get_page
   1.55%  [kernel]                  [k] page_counter_try_charge
   1.33%  [kernel]                  [k] propagate_protected_usage
   1.29%  [kvm]                     [k] kvm_arch_vcpu_ioctl_run
   1.13%  [kernel]                  [k] get_page_from_freelist
   1.01%  [kvm]                     [k] paging64_walk_addr_generic
   0.83%  [kernel]                  [k] ___slab_alloc.constprop.0
   0.83%  [kernel]                  [k] kmem_cache_free
   0.82%  [kvm]                     [k] __pte_list_remove
   0.77%  [kernel]                  [k] try_grab_compound_head
   0.76%  [kvm_intel]               [k] 0x000000000001cfa0
   0.74%  [kvm]                     [k] pte_list_add

HTH, Z.


* Re: Bad performance since 5.9-rc1
  2020-12-18 19:33   ` Zdenek Kaspar
@ 2020-12-21 19:41     ` Sean Christopherson
  2020-12-21 21:13       ` Zdenek Kaspar
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2020-12-21 19:41 UTC (permalink / raw)
  To: Zdenek Kaspar; +Cc: kvm

On Fri, Dec 18, 2020, Zdenek Kaspar wrote:
> > without: kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
> > I can run guest again, but with degraded performance as before.
> > 
> > Z.
> 
> With: KVM: x86/mmu: Bug fixes and cleanups in get_mmio_spte() series

Apologies, I completely missed your bug report for the get_mmio_spte() bugs.

> I can run guest again and performance is slightly better:
> 
> v5.8:        0m13.54s real     0m10.51s user     0m10.96s system
> v5.9:        6m20.07s real    11m42.93s user     0m13.57s system
> v5.10+fixes: 5m50.77s real    10m38.29s user     0m15.96s system
> 
> perf top from host when guest (openbsd) is compiling:
>   26.85%  [kernel]                  [k] queued_spin_lock_slowpath
>    8.49%  [kvm]                     [k] mmu_page_zap_pte
>    7.47%  [kvm]                     [k] __kvm_mmu_prepare_zap_page
>    3.61%  [kernel]                  [k] clear_page_rep
>    2.43%  [kernel]                  [k] page_counter_uncharge
>    2.30%  [kvm]                     [k] paging64_page_fault
>    2.03%  [kvm_intel]               [k] vmx_vcpu_run
>    2.02%  [kvm]                     [k] kvm_vcpu_gfn_to_memslot
>    1.95%  [kernel]                  [k] internal_get_user_pages_fast
>    1.64%  [kvm]                     [k] kvm_mmu_get_page
>    1.55%  [kernel]                  [k] page_counter_try_charge
>    1.33%  [kernel]                  [k] propagate_protected_usage
>    1.29%  [kvm]                     [k] kvm_arch_vcpu_ioctl_run
>    1.13%  [kernel]                  [k] get_page_from_freelist
>    1.01%  [kvm]                     [k] paging64_walk_addr_generic
>    0.83%  [kernel]                  [k] ___slab_alloc.constprop.0
>    0.83%  [kernel]                  [k] kmem_cache_free
>    0.82%  [kvm]                     [k] __pte_list_remove
>    0.77%  [kernel]                  [k] try_grab_compound_head
>    0.76%  [kvm_intel]               [k] 0x000000000001cfa0
>    0.74%  [kvm]                     [k] pte_list_add

Can you try running with this debug hack to understand what is causing KVM to
zap shadow pages?  The expected behavior is that you'll get backtraces for the
first five cases where KVM zaps valid shadow pages.  Compile tested only.


diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5dfe0ede0e81..c5da993ac753 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2404,6 +2404,8 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
        }
 }

+static unsigned long zapped_warns;
+
 static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
                                                  unsigned long nr_to_zap)
 {
@@ -2435,6 +2437,8 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
                        goto restart;
        }

+       WARN_ON(total_zapped && zapped_warns++ < 5);
+
        kvm_mmu_commit_zap_page(kvm, &invalid_list);

        kvm->stat.mmu_recycled += total_zapped;


* Re: Bad performance since 5.9-rc1
  2020-12-21 19:41     ` Sean Christopherson
@ 2020-12-21 21:13       ` Zdenek Kaspar
  2020-12-22 17:07         ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kaspar @ 2020-12-21 21:13 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Mon, 21 Dec 2020 11:41:44 -0800
Sean Christopherson <seanjc@google.com> wrote:

> On Fri, Dec 18, 2020, Zdenek Kaspar wrote:
> > > without: kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT
> > > I can run guest again, but with degraded performance as before.
> > > 
> > > Z.
> > 
> > With: KVM: x86/mmu: Bug fixes and cleanups in get_mmio_spte() series
> 
> Apologies, I completely missed your bug report for the
> get_mmio_spte() bugs.
> 
> > I can run guest again and performance is slightly better:
> > 
> > v5.8:        0m13.54s real     0m10.51s user     0m10.96s system
> > v5.9:        6m20.07s real    11m42.93s user     0m13.57s system
> > v5.10+fixes: 5m50.77s real    10m38.29s user     0m15.96s system
> > 
> > perf top from host when guest (openbsd) is compiling:
> >   26.85%  [kernel]                  [k] queued_spin_lock_slowpath
> >    8.49%  [kvm]                     [k] mmu_page_zap_pte
> >    7.47%  [kvm]                     [k] __kvm_mmu_prepare_zap_page
> >    3.61%  [kernel]                  [k] clear_page_rep
> >    2.43%  [kernel]                  [k] page_counter_uncharge
> >    2.30%  [kvm]                     [k] paging64_page_fault
> >    2.03%  [kvm_intel]               [k] vmx_vcpu_run
> >    2.02%  [kvm]                     [k] kvm_vcpu_gfn_to_memslot
> >    1.95%  [kernel]                  [k] internal_get_user_pages_fast
> >    1.64%  [kvm]                     [k] kvm_mmu_get_page
> >    1.55%  [kernel]                  [k] page_counter_try_charge
> >    1.33%  [kernel]                  [k] propagate_protected_usage
> >    1.29%  [kvm]                     [k] kvm_arch_vcpu_ioctl_run
> >    1.13%  [kernel]                  [k] get_page_from_freelist
> >    1.01%  [kvm]                     [k] paging64_walk_addr_generic
> >    0.83%  [kernel]                  [k] ___slab_alloc.constprop.0
> >    0.83%  [kernel]                  [k] kmem_cache_free
> >    0.82%  [kvm]                     [k] __pte_list_remove
> >    0.77%  [kernel]                  [k] try_grab_compound_head
> >    0.76%  [kvm_intel]               [k] 0x000000000001cfa0
> >    0.74%  [kvm]                     [k] pte_list_add
> 
> Can you try running with this debug hack to understand what is
> causing KVM to zap shadow pages?  The expected behavior is that
> you'll get backtraces for the first five cases where KVM zaps valid
> shadow pages.  Compile tested only.
> 
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5dfe0ede0e81..c5da993ac753 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2404,6 +2404,8 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>         }
>  }
> 
> +static unsigned long zapped_warns;
> +
>  static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
>                                                   unsigned long nr_to_zap)
>  {
> @@ -2435,6 +2437,8 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
>                         goto restart;
>         }
> 
> +       WARN_ON(total_zapped && zapped_warns++ < 5);
> +
>         kvm_mmu_commit_zap_page(kvm, &invalid_list);
> 
>         kvm->stat.mmu_recycled += total_zapped;

[  179.364234] ------------[ cut here ]------------
[  179.364305] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.364347] Modules linked in: vhost_net vhost vhost_iotlb tun auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc lzo zram zsmalloc cpufreq_powersave i915 kvm_intel video intel_gtt iosf_mbi kvm i2c_algo_bit drm_kms_helper bridge stp iTCO_wdt e1000e syscopyarea irqbypass sysfillrect 8250 evdev 8250_base serial_core sysimgblt lpc_ich mfd_core llc button acpi_cpufreq fb_sys_fops processor drm i2c_core sch_fq_codel backlight ip_tables x_tables ipv6 autofs4 btrfs blake2b_generic libcrc32c crc32c_generic xor zstd_decompress zstd_compress lzo_compress lzo_decompress raid6_pq ecb xts dm_crypt dm_mod sd_mod t10_pi hid_generic usbhid hid uhci_hcd ahci libahci pata_jmicron ehci_pci ehci_hcd sata_sil24 usbcore usb_common
[  179.364818] CPU: 0 PID: 369 Comm: qemu-build Not tainted 5.10.2-1-amd64 #1
[  179.364857] Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
[  179.364923] RIP: 0010:kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.364959] Code: 48 83 c4 18 4c 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 05 a0 6b 03 00 48 8d 50 01 48 83 f8 04 48 89 15 91 6b 03 00 77 b9 <0f> 0b eb b5 45 31 f6 eb cd 66 0f 1f 44 00 00 41 57 48 c7 c7 e0 23
[  179.365065] RSP: 0018:ffffab7e8069bb10 EFLAGS: 00010297
[  179.365097] RAX: 0000000000000000 RBX: ffff8fd62c589c78 RCX: 0000000000000000
[  179.365138] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: 00003ba800a00478
[  179.365179] RBP: ffffab7e8073da78 R08: ffff8fd608d06800 R09: 000000000000000a
[  179.365218] R10: ffff8fd608d06800 R11: 000000000000000a R12: ffffab7e80735000
[  179.365257] R13: 0000000000000015 R14: 0000000000000015 R15: ffffab7e8069bb18
[  179.365299] FS:  00007f22e4b6a640(0000) GS:ffff8fd67f200000(0000) knlGS:0000000000000000
[  179.365343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  179.365376] CR2: 0000098452a94c00 CR3: 000000000a3c0000 CR4: 00000000000026f0
[  179.365415] Call Trace:
[  179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]
[  179.365482]  ? kvm_mmu_pte_write+0x161/0x410 [kvm]
[  179.365521]  ? write_emulate+0x36/0x50 [kvm]
[  179.365558]  ? kvm_fetch_guest_virt+0x7c/0xb0 [kvm]
[  179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
[  179.365628]  ? vmx_vmexit+0x1d/0x40 [kvm_intel]
[  179.365655]  ? vmx_vmexit+0x11/0x40 [kvm_intel]
[  179.365683]  ? vmx_vcpu_enter_exit+0x5c/0x90 [kvm_intel]
[  179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
[  179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
[  179.365801]  ? tick_sched_timer+0x69/0xf0
[  179.365823]  ? tick_nohz_handler+0xf0/0xf0
[  179.365848]  ? timerqueue_add+0x96/0xb0
[  179.365870]  ? __hrtimer_run_queues+0x151/0x1b0
[  179.365895]  ? recalibrate_cpu_khz+0x10/0x10
[  179.365918]  ? ktime_get+0x33/0x90
[  179.365938]  __x64_sys_ioctl+0x338/0x720
[  179.365963]  ? fire_user_return_notifiers+0x3c/0x60
[  179.365992]  do_syscall_64+0x33/0x40
[  179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  179.366041] RIP: 0033:0x7f22e60cef6b
[  179.366061] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
[  179.366167] RSP: 002b:00007f22e4b69608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  179.366210] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f22e60cef6b
[  179.366248] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000012
[  179.366287] RBP: 000055a81d11cf70 R08: 000055a81ac42ad8 R09: 00000000000000ff
[  179.366326] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  179.366365] R13: 00007f22e6c35001 R14: 0000000000000064 R15: 0000000000000000
[  179.366405] ---[ end trace 63d1ba11f1bc6180 ]---
[  179.367537] ------------[ cut here ]------------
[  179.367583] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.367625] Modules linked in: vhost_net vhost vhost_iotlb tun auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc lzo zram zsmalloc cpufreq_powersave i915 kvm_intel video intel_gtt iosf_mbi kvm i2c_algo_bit drm_kms_helper bridge stp iTCO_wdt e1000e syscopyarea irqbypass sysfillrect 8250 evdev 8250_base serial_core sysimgblt lpc_ich mfd_core llc button acpi_cpufreq fb_sys_fops processor drm i2c_core sch_fq_codel backlight ip_tables x_tables ipv6 autofs4 btrfs blake2b_generic libcrc32c crc32c_generic xor zstd_decompress zstd_compress lzo_compress lzo_decompress raid6_pq ecb xts dm_crypt dm_mod sd_mod t10_pi hid_generic usbhid hid uhci_hcd ahci libahci pata_jmicron ehci_pci ehci_hcd sata_sil24 usbcore usb_common
[  179.375035] CPU: 0 PID: 369 Comm: qemu-build Tainted: G        W         5.10.2-1-amd64 #1
[  179.377470] Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
[  179.379948] RIP: 0010:kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.382436] Code: 48 83 c4 18 4c 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 05 a0 6b 03 00 48 8d 50 01 48 83 f8 04 48 89 15 91 6b 03 00 77 b9 <0f> 0b eb b5 45 31 f6 eb cd 66 0f 1f 44 00 00 41 57 48 c7 c7 e0 23
[  179.387680] RSP: 0018:ffffab7e8069bb10 EFLAGS: 00010293
[  179.390307] RAX: 0000000000000001 RBX: ffff8fd62c589c78 RCX: 0000000000000000
[  179.392890] RDX: 0000000000000002 RSI: ffffffffffffffff RDI: 00003ba800a00478
[  179.395420] RBP: ffffab7e8073da78 R08: ffff8fd608d06800 R09: 000000000000000a
[  179.397901] R10: ffff8fd608d06800 R11: 000000000000000a R12: ffffab7e80735000
[  179.400339] R13: 0000000000000015 R14: 0000000000000015 R15: ffffab7e8069bb18
[  179.402715] FS:  00007f22e4b6a640(0000) GS:ffff8fd67f200000(0000) knlGS:0000000000000000
[  179.405097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  179.407466] CR2: 000009837e65c200 CR3: 000000000a3c0000 CR4: 00000000000026f0
[  179.409839] Call Trace:
[  179.412140]  paging64_page_fault+0x244/0x8e0 [kvm]
[  179.414377]  ? kvm_mmu_pte_write+0x161/0x410 [kvm]
[  179.416608]  ? write_emulate+0x36/0x50 [kvm]
[  179.418821]  ? kvm_fetch_guest_virt+0x7c/0xb0 [kvm]
[  179.421031]  kvm_mmu_page_fault+0x376/0x550 [kvm]
[  179.423220]  ? vmx_vmexit+0x1d/0x40 [kvm_intel]
[  179.425402]  ? vmx_vmexit+0x11/0x40 [kvm_intel]
[  179.427556]  ? vmx_vcpu_enter_exit+0x5c/0x90 [kvm_intel]
[  179.429707]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
[  179.431838]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
[  179.433944]  ? tick_sched_timer+0x69/0xf0
[  179.436029]  ? tick_nohz_handler+0xf0/0xf0
[  179.438108]  ? timerqueue_add+0x96/0xb0
[  179.440185]  ? __hrtimer_run_queues+0x151/0x1b0
[  179.442263]  ? recalibrate_cpu_khz+0x10/0x10
[  179.444339]  ? ktime_get+0x33/0x90
[  179.446406]  __x64_sys_ioctl+0x338/0x720
[  179.448470]  ? fire_user_return_notifiers+0x3c/0x60
[  179.450558]  do_syscall_64+0x33/0x40
[  179.452639]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  179.454728] RIP: 0033:0x7f22e60cef6b
[  179.456812] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
[  179.461281] RSP: 002b:00007f22e4b69608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  179.463572] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f22e60cef6b
[  179.465875] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000012
[  179.468185] RBP: 000055a81d11cf70 R08: 000055a81ac42ad8 R09: 00000000000000ff
[  179.470499] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  179.472829] R13: 00007f22e6c35001 R14: 0000000000000064 R15: 0000000000000000
[  179.475154] ---[ end trace 63d1ba11f1bc6181 ]---
[  179.478464] ------------[ cut here ]------------
[  179.480804] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.483160] Modules linked in: vhost_net vhost vhost_iotlb tun auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc lzo zram zsmalloc cpufreq_powersave i915 kvm_intel video intel_gtt iosf_mbi kvm i2c_algo_bit drm_kms_helper bridge stp iTCO_wdt e1000e syscopyarea irqbypass sysfillrect 8250 evdev 8250_base serial_core sysimgblt lpc_ich mfd_core llc button acpi_cpufreq fb_sys_fops processor drm i2c_core sch_fq_codel backlight ip_tables x_tables ipv6 autofs4 btrfs blake2b_generic libcrc32c crc32c_generic xor zstd_decompress zstd_compress lzo_compress lzo_decompress raid6_pq ecb xts dm_crypt dm_mod sd_mod t10_pi hid_generic usbhid hid uhci_hcd ahci libahci pata_jmicron ehci_pci ehci_hcd sata_sil24 usbcore usb_common
[  179.495262] CPU: 0 PID: 369 Comm: qemu-build Tainted: G        W         5.10.2-1-amd64 #1
[  179.497745] Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
[  179.500251] RIP: 0010:kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.502761] Code: 48 83 c4 18 4c 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 05 a0 6b 03 00 48 8d 50 01 48 83 f8 04 48 89 15 91 6b 03 00 77 b9 <0f> 0b eb b5 45 31 f6 eb cd 66 0f 1f 44 00 00 41 57 48 c7 c7 e0 23
[  179.508004] RSP: 0018:ffffab7e8069bb10 EFLAGS: 00010293
[  179.510628] RAX: 0000000000000002 RBX: ffff8fd62c589c78 RCX: 0000000000000000
[  179.513214] RDX: 0000000000000003 RSI: ffffffffffffffff RDI: 00003ba800a00478
[  179.515747] RBP: ffffab7e8073da78 R08: ffff8fd608d06800 R09: 000000000000000a
[  179.518229] R10: ffff8fd608d06800 R11: 000000000000000a R12: ffffab7e80735000
[  179.520670] R13: 0000000000000015 R14: 0000000000000015 R15: ffffab7e8069bb18
[  179.523049] FS:  00007f22e4b6a640(0000) GS:ffff8fd67f200000(0000) knlGS:0000000000000000
[  179.525431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  179.527799] CR2: 000009844140de00 CR3: 000000000a3c0000 CR4: 00000000000026f0
[  179.530172] Call Trace:
[  179.532471]  paging64_page_fault+0x244/0x8e0 [kvm]
[  179.534709]  ? kvm_fetch_guest_virt+0x7c/0xb0 [kvm]
[  179.536946]  kvm_mmu_page_fault+0x376/0x550 [kvm]
[  179.539160]  ? vmx_vmexit+0x1d/0x40 [kvm_intel]
[  179.541366]  ? vmx_vmexit+0x11/0x40 [kvm_intel]
[  179.543544]  ? vmx_vcpu_enter_exit+0x5c/0x90 [kvm_intel]
[  179.545720]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
[  179.547890]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
[  179.550033]  ? tick_sched_timer+0x69/0xf0
[  179.552153]  ? tick_nohz_handler+0xf0/0xf0
[  179.554250]  ? timerqueue_add+0x96/0xb0
[  179.556329]  ? __hrtimer_run_queues+0x151/0x1b0
[  179.558407]  ? recalibrate_cpu_khz+0x10/0x10
[  179.560486]  ? ktime_get+0x33/0x90
[  179.562557]  __x64_sys_ioctl+0x338/0x720
[  179.564628]  ? fire_user_return_notifiers+0x3c/0x60
[  179.566702]  do_syscall_64+0x33/0x40
[  179.568772]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  179.570867] RIP: 0033:0x7f22e60cef6b
[  179.572957] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
[  179.577419] RSP: 002b:00007f22e4b69608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  179.579699] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f22e60cef6b
[  179.582000] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000012
[  179.584312] RBP: 000055a81d11cf70 R08: 000055a81ac42ad8 R09: 00000000000000ff
[  179.586632] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  179.588955] R13: 00007f22e6c35001 R14: 0000000000000064 R15: 0000000000000000
[  179.591283] ---[ end trace 63d1ba11f1bc6182 ]---
[  179.595542] ------------[ cut here ]------------
[  179.597890] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.600253] Modules linked in: vhost_net vhost vhost_iotlb tun auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc lzo zram zsmalloc cpufreq_powersave i915 kvm_intel video intel_gtt iosf_mbi kvm i2c_algo_bit drm_kms_helper bridge stp iTCO_wdt e1000e syscopyarea irqbypass sysfillrect 8250 evdev 8250_base serial_core sysimgblt lpc_ich mfd_core llc button acpi_cpufreq fb_sys_fops processor drm i2c_core sch_fq_codel backlight ip_tables x_tables ipv6 autofs4 btrfs blake2b_generic libcrc32c crc32c_generic xor zstd_decompress zstd_compress lzo_compress lzo_decompress raid6_pq ecb xts dm_crypt dm_mod sd_mod t10_pi hid_generic usbhid hid uhci_hcd ahci libahci pata_jmicron ehci_pci ehci_hcd sata_sil24 usbcore usb_common
[  179.612747] CPU: 0 PID: 369 Comm: qemu-build Tainted: G        W         5.10.2-1-amd64 #1
[  179.615250] Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
[  179.617769] RIP: 0010:kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.620283] Code: 48 83 c4 18 4c 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 05 a0 6b 03 00 48 8d 50 01 48 83 f8 04 48 89 15 91 6b 03 00 77 b9 <0f> 0b eb b5 45 31 f6 eb cd 66 0f 1f 44 00 00 41 57 48 c7 c7 e0 23
[  179.625553] RSP: 0018:ffffab7e8069bb10 EFLAGS: 00010297
[  179.628194] RAX: 0000000000000003 RBX: ffff8fd62c589c78 RCX: 0000000000000000
[  179.630866] RDX: 0000000000000004 RSI: ffffffffffffffff RDI: 00003ba800a00478
[  179.633522] RBP: ffffab7e8073da78 R08: ffff8fd608d06800 R09: 000000000000000a
[  179.636131] R10: ffff8fd608d06800 R11: 000000000000000a R12: ffffab7e80735000
[  179.638696] R13: 0000000000000015 R14: 0000000000000015 R15: ffffab7e8069bb18
[  179.641201] FS:  00007f22e4b6a640(0000) GS:ffff8fd67f200000(0000) knlGS:0000000000000000
[  179.643671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  179.646085] CR2: 00000983af128a00 CR3: 000000000a3c0000 CR4: 00000000000026f0
[  179.648504] Call Trace:
[  179.650901]  paging64_page_fault+0x244/0x8e0 [kvm]
[  179.653291]  ? kvm_mmu_pte_write+0x161/0x410 [kvm]
[  179.655605]  ? write_emulate+0x36/0x50 [kvm]
[  179.657838]  ? kvm_fetch_guest_virt+0x7c/0xb0 [kvm]
[  179.660073]  kvm_mmu_page_fault+0x376/0x550 [kvm]
[  179.662286]  ? vmx_vmexit+0x1d/0x40 [kvm_intel]
[  179.664491]  ? vmx_vmexit+0x11/0x40 [kvm_intel]
[  179.666668]  ? vmx_vcpu_enter_exit+0x5c/0x90 [kvm_intel]
[  179.668842]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
[  179.671013]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
[  179.673155]  ? tick_sched_timer+0x69/0xf0
[  179.675276]  ? tick_nohz_handler+0xf0/0xf0
[  179.677373]  ? timerqueue_add+0x96/0xb0
[  179.679455]  ? __hrtimer_run_queues+0x151/0x1b0
[  179.681534]  ? recalibrate_cpu_khz+0x10/0x10
[  179.683611]  ? ktime_get+0x33/0x90
[  179.685676]  __x64_sys_ioctl+0x338/0x720
[  179.687741]  ? fire_user_return_notifiers+0x3c/0x60
[  179.689829]  do_syscall_64+0x33/0x40
[  179.691909]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  179.693999] RIP: 0033:0x7f22e60cef6b
[  179.696081] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
[  179.700547] RSP: 002b:00007f22e4b69608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  179.702836] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f22e60cef6b
[  179.705138] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000012
[  179.707447] RBP: 000055a81d11cf70 R08: 000055a81ac42ad8 R09: 00000000000000ff
[  179.709767] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  179.712099] R13: 00007f22e6c35001 R14: 0000000000000064 R15: 0000000000000000
[  179.714422] ---[ end trace 63d1ba11f1bc6183 ]---
[  179.720536] ------------[ cut here ]------------
[  179.722911] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.725273] Modules linked in: vhost_net vhost vhost_iotlb tun auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc lzo zram zsmalloc cpufreq_powersave i915 kvm_intel video intel_gtt iosf_mbi kvm i2c_algo_bit drm_kms_helper bridge stp iTCO_wdt e1000e syscopyarea irqbypass sysfillrect 8250 evdev 8250_base serial_core sysimgblt lpc_ich mfd_core llc button acpi_cpufreq fb_sys_fops processor drm i2c_core sch_fq_codel backlight ip_tables x_tables ipv6 autofs4 btrfs blake2b_generic libcrc32c crc32c_generic xor zstd_decompress zstd_compress lzo_compress lzo_decompress raid6_pq ecb xts dm_crypt dm_mod sd_mod t10_pi hid_generic usbhid hid uhci_hcd ahci libahci pata_jmicron ehci_pci ehci_hcd sata_sil24 usbcore usb_common
[  179.737395] CPU: 0 PID: 369 Comm: qemu-build Tainted: G        W         5.10.2-1-amd64 #1
[  179.739880] Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
[  179.742391] RIP: 0010:kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
[  179.744905] Code: 48 83 c4 18 4c 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 05 a0 6b 03 00 48 8d 50 01 48 83 f8 04 48 89 15 91 6b 03 00 77 b9 <0f> 0b eb b5 45 31 f6 eb cd 66 0f 1f 44 00 00 41 57 48 c7 c7 e0 23
[  179.750149] RSP: 0018:ffffab7e8069bb10 EFLAGS: 00010246
[  179.752776] RAX: 0000000000000004 RBX: ffff8fd62c589c78 RCX: 0000000000000000
[  179.755366] RDX: 0000000000000005 RSI: ffffffffffffffff RDI: 00003ba800a00478
[  179.757901] RBP: ffffab7e8073da78 R08: ffff8fd608d06800 R09: 000000000000000a
[  179.760388] R10: ffff8fd608d06800 R11: 000000000000000a R12: ffffab7e80735000
[  179.762832] R13: 0000000000000015 R14: 0000000000000015 R15: ffffab7e8069bb18
[  179.765216] FS:  00007f22e4b6a640(0000) GS:ffff8fd67f200000(0000) knlGS:ffffffff820e7ff0
[  179.767602] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  179.769976] CR2: 000009844cfd4620 CR3: 000000000a3c0000 CR4: 00000000000026f0
[  179.772351] Call Trace:
[  179.774657]  paging64_page_fault+0x244/0x8e0 [kvm]
[  179.776900]  ? kvm_fetch_guest_virt+0x7c/0xb0 [kvm]
[  179.779139]  kvm_mmu_page_fault+0x376/0x550 [kvm]
[  179.781357]  ? vmx_vmexit+0x1d/0x40 [kvm_intel]
[  179.783566]  ? vmx_vmexit+0x11/0x40 [kvm_intel]
[  179.785748]  ? vmx_vcpu_enter_exit+0x5c/0x90 [kvm_intel]
[  179.787928]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
[  179.790105]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
[  179.792254]  ? tick_sched_timer+0x69/0xf0
[  179.794377]  ? tick_nohz_handler+0xf0/0xf0
[  179.796478]  ? timerqueue_add+0x96/0xb0
[  179.798560]  ? __hrtimer_run_queues+0x151/0x1b0
[  179.800642]  ? recalibrate_cpu_khz+0x10/0x10
[  179.802724]  ? ktime_get+0x33/0x90
[  179.804800]  __x64_sys_ioctl+0x338/0x720
[  179.806872]  ? fire_user_return_notifiers+0x3c/0x60
[  179.808951]  do_syscall_64+0x33/0x40
[  179.811025]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  179.813123] RIP: 0033:0x7f22e60cef6b
[  179.815220] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 ae 0c 00 f7 d8 64 89 01 48
[  179.819687] RSP: 002b:00007f22e4b69608 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  179.821972] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f22e60cef6b
[  179.824277] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000012
[  179.826590] RBP: 000055a81d11cf70 R08: 000055a81ac42ad8 R09: 00000000000000ff
[  179.828911] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  179.831236] R13: 00007f22e6c35001 R14: 0000000000000064 R15: 0000000000000000
[  179.833563] ---[ end trace 63d1ba11f1bc6184 ]---

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bad performance since 5.9-rc1
  2020-12-21 21:13       ` Zdenek Kaspar
@ 2020-12-22 17:07         ` Sean Christopherson
  2020-12-22 21:26           ` Zdenek Kaspar
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2020-12-22 17:07 UTC (permalink / raw)
  To: Zdenek Kaspar; +Cc: kvm

On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> [  179.364305] WARNING: CPU: 0 PID: 369 at kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm]
> [  179.365415] Call Trace:
> [  179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]

This means the shadow page zapping is occurring because KVM is hitting the max
number of allowed MMU shadow pages.  Can you provide your QEMU command line?  I
can reproduce the performance degradation, but only by deliberately overriding
the max number of MMU pages via `-machine kvm-shadow-mem` to be an absurdly low
value.

> [  179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> [  179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> [  179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> [  179.365938]  __x64_sys_ioctl+0x338/0x720
> [  179.365992]  do_syscall_64+0x33/0x40
> [  179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
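
For reference, the deliberate override looks roughly like this; the byte
value, image path, and everything besides the `kvm-shadow-mem` property
itself are illustrative placeholders, not my actual repro command:

```shell
# Illustrative only: force an absurdly low shadow-page limit (value in
# bytes) via the q35 machine property; guest.img is a placeholder path.
qemu-system-x86_64 \
    -machine type=q35,accel=kvm,kvm-shadow-mem=1048576 \
    -cpu host -smp 2 -m 1024 \
    -drive file=guest.img,if=virtio,format=raw
```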

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bad performance since 5.9-rc1
  2020-12-22 17:07         ` Sean Christopherson
@ 2020-12-22 21:26           ` Zdenek Kaspar
  2021-01-12 11:18             ` Zdenek Kaspar
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kaspar @ 2020-12-22 21:26 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Tue, 22 Dec 2020 09:07:39 -0800
Sean Christopherson <seanjc@google.com> wrote:

> On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> > [  179.364305] WARNING: CPU: 0 PID: 369 at
> > kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm] [  179.365415] Call
> > Trace: [  179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]
> 
> This means the shadow page zapping is occurring because KVM is hitting
> the max number of allowed MMU shadow pages.  Can you provide your
> QEMU command line?  I can reproduce the performance degradation, but
> only by deliberately overriding the max number of MMU pages via
> `-machine kvm-shadow-mem` to be an absurdly low value.
> 
> > [  179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> > [  179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> > [  179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> > [  179.365938]  __x64_sys_ioctl+0x338/0x720
> > [  179.365992]  do_syscall_64+0x33/0x40
> > [  179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

It's one long line, added "\" for mail readability:

qemu-system-x86_64 -machine type=q35,accel=kvm            \
-cpu host,host-cache-info=on -smp cpus=2,cores=2          \
-m size=1024 -global virtio-pci.disable-legacy=on         \
-global virtio-pci.disable-modern=off                     \
-device virtio-balloon                                    \
-device virtio-net,netdev=tap-build,mac=DE:AD:BE:EF:00:80 \
-object rng-random,filename=/dev/urandom,id=rng0          \
-device virtio-rng,rng=rng0                               \
-name build,process=qemu-build                            \
-drive file=/mnt/data/export/unix/kvm/build/openbsd-amd64.img,if=virtio,cache=none,format=raw,aio=native \
-netdev type=tap,id=tap-build,vhost=on                    \
-serial none                                              \
-parallel none                                            \
-monitor unix:/dev/shm/kvm-build.sock,server,nowait       \
-enable-kvm -daemonize -runas qemu

Z.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bad performance since 5.9-rc1
  2020-12-22 21:26           ` Zdenek Kaspar
@ 2021-01-12 11:18             ` Zdenek Kaspar
  2021-01-13 20:17               ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Zdenek Kaspar @ 2021-01-12 11:18 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Tue, 22 Dec 2020 22:26:45 +0100
Zdenek Kaspar <zkaspar82@gmail.com> wrote:

> On Tue, 22 Dec 2020 09:07:39 -0800
> Sean Christopherson <seanjc@google.com> wrote:
> 
> > On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> > > [  179.364305] WARNING: CPU: 0 PID: 369 at
> > > kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm] [  179.365415] Call
> > > Trace: [  179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]
> > 
> > This means the shadow page zapping is occurring because KVM is
> > hitting the max number of allowed MMU shadow pages.  Can you
> > provide your QEMU command line?  I can reproduce the performance
> > degradation, but only by deliberately overriding the max number of
> > MMU pages via `-machine kvm-shadow-mem` to be an absurdly low value.
> > 
> > > [  179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> > > [  179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> > > [  179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> > > [  179.365938]  __x64_sys_ioctl+0x338/0x720
> > > [  179.365992]  do_syscall_64+0x33/0x40
> > > [  179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> It's one long line, added "\" for mail readability:
> 
> qemu-system-x86_64 -machine type=q35,accel=kvm            \
> -cpu host,host-cache-info=on -smp cpus=2,cores=2          \
> -m size=1024 -global virtio-pci.disable-legacy=on         \
> -global virtio-pci.disable-modern=off                     \
> -device virtio-balloon                                    \
> -device virtio-net,netdev=tap-build,mac=DE:AD:BE:EF:00:80 \
> -object rng-random,filename=/dev/urandom,id=rng0          \
> -device virtio-rng,rng=rng0                               \
> -name build,process=qemu-build                            \
> -drive file=/mnt/data/export/unix/kvm/build/openbsd-amd64.img,if=virtio,cache=none,format=raw,aio=native \
> -netdev type=tap,id=tap-build,vhost=on                    \
> -serial none                                              \
> -parallel none                                            \
> -monitor unix:/dev/shm/kvm-build.sock,server,nowait       \
> -enable-kvm -daemonize -runas qemu
> 
> Z.

BTW, v5.11-rc3 with kvm-shadow-mem=1073741824 seems OK.

Just curious what v5.8 does, so by any chance is there a command to set
the kvm-shadow-mem value via the QEMU monitor?

Z.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bad performance since 5.9-rc1
  2021-01-12 11:18             ` Zdenek Kaspar
@ 2021-01-13 20:17               ` Sean Christopherson
  2021-01-13 22:17                 ` Zdenek Kaspar
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2021-01-13 20:17 UTC (permalink / raw)
  To: Zdenek Kaspar; +Cc: kvm

On Tue, Jan 12, 2021, Zdenek Kaspar wrote:
> On Tue, 22 Dec 2020 22:26:45 +0100
> Zdenek Kaspar <zkaspar82@gmail.com> wrote:
> 
> > On Tue, 22 Dec 2020 09:07:39 -0800
> > Sean Christopherson <seanjc@google.com> wrote:
> > 
> > > On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> > > > [  179.364305] WARNING: CPU: 0 PID: 369 at
> > > > kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm] [  179.365415] Call
> > > > Trace: [  179.365443]  paging64_page_fault+0x244/0x8e0 [kvm]
> > > 
> > > This means the shadow page zapping is occurring because KVM is
> > > hitting the max number of allowed MMU shadow pages.  Can you
> > > provide your QEMU command line?  I can reproduce the performance
> > > degradation, but only by deliberately overriding the max number of
> > > MMU pages via `-machine kvm-shadow-mem` to be an absurdly low value.
> > > 
> > > > [  179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> > > > [  179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> > > > [  179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> > > > [  179.365938]  __x64_sys_ioctl+0x338/0x720
> > > > [  179.365992]  do_syscall_64+0x33/0x40
> > > > [  179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > It's one long line, added "\" for mail readability:
> > 
> > qemu-system-x86_64 -machine type=q35,accel=kvm            \
> > -cpu host,host-cache-info=on -smp cpus=2,cores=2          \
> > -m size=1024 -global virtio-pci.disable-legacy=on         \
> > -global virtio-pci.disable-modern=off                     \
> > -device virtio-balloon                                    \
> > -device virtio-net,netdev=tap-build,mac=DE:AD:BE:EF:00:80 \
> > -object rng-random,filename=/dev/urandom,id=rng0          \
> > -device virtio-rng,rng=rng0                               \
> > -name build,process=qemu-build                            \
> > -drive file=/mnt/data/export/unix/kvm/build/openbsd-amd64.img,if=virtio,cache=none,format=raw,aio=native \
> > -netdev type=tap,id=tap-build,vhost=on                    \
> > -serial none                                              \
> > -parallel none                                            \
> > -monitor unix:/dev/shm/kvm-build.sock,server,nowait       \
> > -enable-kvm -daemonize -runas qemu
> > 
> > Z.
> 
> BTW, v5.11-rc3 with kvm-shadow-mem=1073741824 seems OK.
>
> Just curious what v5.8 does

Aha!  Figured it out.  v5.9 (the commit you bisected to) broke the zapping,
that's what it did.  The list of MMU pages is a FIFO list, meaning KVM adds
entries to the head, not the tail.  I botched the zapping flow and used
for_each instead of for_each_reverse, which meant KVM would zap the _newest_
pages instead of the _oldest_ pages.  So once a VM hit its limit, KVM would
constantly zap the shadow pages it just allocated.

This should resolve the performance regression, or at least make it far less
painful.  It's possible you may still see some performance degradation due to
other changes in the zapping, e.g. more aggressive recursive zapping.  If
that's the case, I can explore other tweaks, e.g. skip higher levels when
possible.  I'll get a proper patch posted later today.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c478904af518..2c6e6fdb26ad 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2417,7 +2417,7 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
                return 0;

 restart:
-       list_for_each_entry_safe(sp, tmp, &kvm->arch.active_mmu_pages, link) {
+       list_for_each_entry_safe_reverse(sp, tmp, &kvm->arch.active_mmu_pages, link) {
                /*
                 * Don't zap active root pages, the page itself can't be freed
                 * and zapping it will just force vCPUs to realloc and reload.

Side topic, I still can't figure out how on earth your guest kernel is hitting
the max number of default pages.  Even with large pages completely disabled, PTI
enabled, multiple guest processes running, etc... I hit OOM in the guest before
the host's shadow page limit kicks in.  I had to force the limit down to 25% of
the default to reproduce the bad behavior.  All I can figure is that BSD has a
substantially different paging scheme than Linux.

> so by any chance is there a command to set the kvm-shadow-mem value via the QEMU monitor?
> 
> Z.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Bad performance since 5.9-rc1
  2021-01-13 20:17               ` Sean Christopherson
@ 2021-01-13 22:17                 ` Zdenek Kaspar
  0 siblings, 0 replies; 11+ messages in thread
From: Zdenek Kaspar @ 2021-01-13 22:17 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Wed, 13 Jan 2021 12:17:19 -0800
Sean Christopherson <seanjc@google.com> wrote:

> On Tue, Jan 12, 2021, Zdenek Kaspar wrote:
> > On Tue, 22 Dec 2020 22:26:45 +0100
> > Zdenek Kaspar <zkaspar82@gmail.com> wrote:
> > 
> > > On Tue, 22 Dec 2020 09:07:39 -0800
> > > Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > > On Mon, Dec 21, 2020, Zdenek Kaspar wrote:
> > > > > [  179.364305] WARNING: CPU: 0 PID: 369 at
> > > > > kvm_mmu_zap_oldest_mmu_pages+0xd1/0xe0 [kvm] [  179.365415]
> > > > > Call Trace: [  179.365443]  paging64_page_fault+0x244/0x8e0
> > > > > [kvm]
> > > > 
> > > > This means the shadow page zapping is occurring because KVM is
> > > > hitting the max number of allowed MMU shadow pages.  Can you
> > > > provide your QEMU command line?  I can reproduce the performance
> > > > degradation, but only by deliberately overriding the max number
> > > > of MMU pages via `-machine kvm-shadow-mem` to be an absurdly
> > > > low value.
> > > > 
> > > > > [  179.365596]  kvm_mmu_page_fault+0x376/0x550 [kvm]
> > > > > [  179.365725]  kvm_arch_vcpu_ioctl_run+0xbaf/0x18f0 [kvm]
> > > > > [  179.365772]  kvm_vcpu_ioctl+0x203/0x520 [kvm]
> > > > > [  179.365938]  __x64_sys_ioctl+0x338/0x720
> > > > > [  179.365992]  do_syscall_64+0x33/0x40
> > > > > [  179.366013]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > 
> > > It's one long line, added "\" for mail readability:
> > > 
> > > qemu-system-x86_64 -machine type=q35,accel=kvm            \
> > > -cpu host,host-cache-info=on -smp cpus=2,cores=2          \
> > > -m size=1024 -global virtio-pci.disable-legacy=on         \
> > > -global virtio-pci.disable-modern=off                     \
> > > -device virtio-balloon                                    \
> > > -device virtio-net,netdev=tap-build,mac=DE:AD:BE:EF:00:80 \
> > > -object rng-random,filename=/dev/urandom,id=rng0          \
> > > -device virtio-rng,rng=rng0                               \
> > > -name build,process=qemu-build                            \
> > > -drive file=/mnt/data/export/unix/kvm/build/openbsd-amd64.img,if=virtio,cache=none,format=raw,aio=native \
> > > -netdev type=tap,id=tap-build,vhost=on                    \
> > > -serial none                                              \
> > > -parallel none                                            \
> > > -monitor unix:/dev/shm/kvm-build.sock,server,nowait       \
> > > -enable-kvm -daemonize -runas qemu
> > > 
> > > Z.
> > 
> > BTW, v5.11-rc3 with kvm-shadow-mem=1073741824 seems OK.
> >
> > Just curious what v5.8 does
> 
> Aha!  Figured it out.  v5.9 (the commit you bisected to) broke the
> zapping, that's what it did.  The list of MMU pages is a FIFO list,
> meaning KVM adds entries to the head, not the tail.  I botched the
> zapping flow and used for_each instead of for_each_reverse, which
> meant KVM would zap the _newest_ pages instead of the _oldest_ pages.
>  So once a VM hit its limit, KVM would constantly zap the shadow
> pages it just allocated.
> 
> This should resolve the performance regression, or at least make it
> far less painful.  It's possible you may still see some performance
> degradation due to other changes in the zapping, e.g. more
> aggressive recursive zapping.  If that's the case, I can explore
> other tweaks, e.g. skip higher levels when possible.  I'll get a
> proper patch posted later today.
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c478904af518..2c6e6fdb26ad 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2417,7 +2417,7 @@ static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
>                 return 0;
> 
>  restart:
> -       list_for_each_entry_safe(sp, tmp, &kvm->arch.active_mmu_pages, link) {
> +       list_for_each_entry_safe_reverse(sp, tmp, &kvm->arch.active_mmu_pages, link) {
>                 /*
>                  * Don't zap active root pages, the page itself can't be freed
>                  * and zapping it will just force vCPUs to realloc and reload.
> 
> Side topic, I still can't figure out how on earth your guest kernel
> is hitting the max number of default pages.  Even with large pages
> completely disabled, PTI enabled, multiple guest processes running,
> etc... I hit OOM in the guest before the host's shadow page limit
> kicks in.  I had to force the limit down to 25% of the default to
> reproduce the bad behavior.  All I can figure is that BSD has a
> substantially different paging scheme than Linux.
> 
> > so by any chance is there a command to set the kvm-shadow-mem value
> > via the QEMU monitor?
> > 
> > Z.

Cool, tested with a quick compile in the guest and it's a good fix!

5.11.0-rc3-amd64 (list_for_each_entry_safe):
 - with kvm-shadow-mem=1073741824 (without == unusable)
    0m14.86s real     0m10.87s user     0m12.15s system

5.11.0-rc3-2-amd64 (list_for_each_entry_safe_reverse):
    0m14.36s real     0m10.50s user     0m12.43s system

Thanks, Z.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-13 22:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-19  3:05 Bad performance since 5.9-rc1 Zdenek Kaspar
2020-12-01  6:35 ` Zdenek Kaspar
2020-12-18 19:33   ` Zdenek Kaspar
2020-12-21 19:41     ` Sean Christopherson
2020-12-21 21:13       ` Zdenek Kaspar
2020-12-22 17:07         ` Sean Christopherson
2020-12-22 21:26           ` Zdenek Kaspar
2021-01-12 11:18             ` Zdenek Kaspar
2021-01-13 20:17               ` Sean Christopherson
2021-01-13 22:17                 ` Zdenek Kaspar
2020-12-02  0:31 ` Sean Christopherson
