All of lore.kernel.org
 help / color / mirror / Atom feed
* reproducible BUG() in kvm_mmu_get_root() in TDP MMU
@ 2021-01-04 23:05 Maciej S. Szmigiero
       [not found] ` <8A352C2E-E7D2-4873-807F-635A595DCAEF@gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej S. Szmigiero @ 2021-01-04 23:05 UTC (permalink / raw)
  To: Ben Gardon, Sean Christopherson
  Cc: linux-kernel, kvm, Cannon Matthews, Paolo Bonzini, Peter Xu,
	Peter Shier, Peter Feiner, Junaid Shahid, Jim Mattson,
	Yulei Zhang, Wanpeng Li, Vitaly Kuznetsov,
	Xiao Guangrong <xiaoguangrong.eric

Hi,

I am hitting a reproducible BUG() with KVM TDP MMU.

The reproducer based on set_memory_region_test.c from KVM selftests
is available here:
https://gist.github.com/maciejsszmigiero/890218151c242d99f63ea0825334c6c0

The test simply moves a memslot a bit back and forth on the host
while the guest is concurrently writing around the area being
moved.

The code runs fine on the default KVM MMU but triggers a BUG() when
TDP MMU is enabled by adding "tdp_mmu=1" kvm module parameter.

The backtrace is:
[ 1308.455120] kernel BUG at arch/x86/kvm/mmu/mmu_internal.h:100!
[ 1308.524951] invalid opcode: 0000 [#1] SMP PTI
[ 1308.577080] CPU: 92 PID: 18675 Comm: memslot_move_te Not tainted 5.11.0-rc2+ #80
[ 1308.665617] Hardware name: Oracle Corporation ORACLE SERVER X7-2c/SERVER MODULE ASSY, , BIOS 46070300 12/20/2019
[ 1308.787438] RIP: 0010:kvm_tdp_mmu_get_vcpu_root_hpa+0x10c/0x120 [kvm]
[ 1308.864587] Code: db 74 1c b8 00 00 00 80 48 03 43 40 72 1e 48 c7 c2 00 00 00 80 48 2b 15 92 0a 1d d3 48 01 d0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 15 eb e8 3c d3 eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f
[ 1309.089393] RSP: 0018:ffffa65affa73d10 EFLAGS: 00010246
[ 1309.151922] RAX: 0000000000000000 RBX: ffff9b46829bac78 RCX: 0000000000000000
[ 1309.237334] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa65ada1bd000
[ 1309.322744] RBP: ffffa65affa73d38 R08: 0000000000000000 R09: ffff9b454e443200
[ 1309.408156] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001794
[ 1309.493567] R13: ffffa65ada1bd000 R14: ffff9b454e443040 R15: ffffa65ada1d2418
[ 1309.578977] FS:  00007fdb0430b700(0000) GS:ffff9ba3bfa00000(0000) knlGS:0000000000000000
[ 1309.675833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1309.744605] CR2: 0000000000000000 CR3: 0000006090046006 CR4: 00000000007726e0
[ 1309.830018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1309.915428] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1310.000837] PKRU: 55555554
[ 1310.033199] Call Trace:
[ 1310.062445]  kvm_mmu_load+0x29e/0x480 [kvm]
[ 1310.112542]  vcpu_enter_guest+0x112d/0x15b0 [kvm]
[ 1310.168865]  ? vmx_vcpu_load+0x2e/0x40 [kvm_intel]
[ 1310.226201]  kvm_arch_vcpu_ioctl_run+0xf9/0x580 [kvm]
[ 1310.286685]  kvm_vcpu_ioctl+0x247/0x600 [kvm]
[ 1310.338838]  ? tick_program_event+0x44/0x70
[ 1310.388888]  ? __audit_syscall_entry+0xdd/0x130
[ 1310.443101]  __x64_sys_ioctl+0x92/0xd0
[ 1310.487946]  do_syscall_64+0x37/0x50
[ 1310.530711]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1310.591158] RIP: 0033:0x7fdb44a06307
[ 1310.633925] Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
[ 1310.858726] RSP: 002b:00007fdb0430ae78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1310.949338] RAX: ffffffffffffffda RBX: 00000000019662f0 RCX: 00007fdb44a06307
[ 1311.034747] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
[ 1311.120159] RBP: 0000000001965000 R08: 000000000040b2ff R09: 0000000000000000
[ 1311.205567] R10: 00007fdb0430a2a0 R11: 0000000000000246 R12: 0000000000000000
[ 1311.291738] R13: 0000000001965000 R14: 0000000000000000 R15: 00007fdb0430b700
[ 1311.377873] Modules linked in: kvm_intel kvm xt_comment xt_owner ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp bnxt_re ib_uverbs mgag200 ib_core drm_kms_helper cec drm iTCO_wdt iTCO_vendor_support sg irqbypass pcspkr syscopyarea sysfillrect sysimgblt i2c_i801 ioatdma fb_sys_fops joydev i2c_algo_bit i2c_smbus lpc_ich intel_pch_thermal dca ip_tables vfat fat xfs sd_mod t10_pi be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi
[ 1311.377953]  libcxgb qla4xxx iscsi_boot_sysfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper bnxt_en wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: kvm]
[ 1312.712917] ---[ end trace 4716cc8fd037784d ]---
[ 1312.884672] RIP: 0010:kvm_tdp_mmu_get_vcpu_root_hpa+0x10c/0x120 [kvm]
[ 1312.962622] Code: db 74 1c b8 00 00 00 80 48 03 43 40 72 1e 48 c7 c2 00 00 00 80 48 2b 15 92 0a 1d d3 48 01 d0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 15 eb e8 3c d3 eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f
[ 1313.189000] RSP: 0018:ffffa65affa73d10 EFLAGS: 00010246
[ 1313.252321] RAX: 0000000000000000 RBX: ffff9b46829bac78 RCX: 0000000000000000
[ 1313.338522] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa65ada1bd000
[ 1313.424727] RBP: ffffa65affa73d38 R08: 0000000000000000 R09: ffff9b454e443200
[ 1313.510932] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001794
[ 1313.597140] R13: ffffa65ada1bd000 R14: ffff9b454e443040 R15: ffffa65ada1d2418
[ 1313.683343] FS:  00007fdb0430b700(0000) GS:ffff9ba3bfa00000(0000) knlGS:0000000000000000
[ 1313.780987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1313.850556] CR2: 0000000000000000 CR3: 0000006090046006 CR4: 00000000007726e0
[ 1313.936759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1314.022964] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1314.109171] PKRU: 55555554
[ 1314.142325] Kernel panic - not syncing: Fatal exception
[ 1314.205755] Kernel Offset: 0x11a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1315.367254] ---[ end Kernel panic - not syncing: Fatal exception ]---

It looks like there might be an inbalance of kvm_mmu_get_root()
and kvm_mmu_put_root() somewhere but I couldn't really nail it down.

I've tried with and without "KVM: x86/mmu: Bug fixes and cleanups in
get_mmio_spte()" series applied, doesn't make any difference.

Thanks,
Maciej


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU
       [not found] ` <8A352C2E-E7D2-4873-807F-635A595DCAEF@gmail.com>
@ 2021-01-05 17:01   ` Ben Gardon
  2021-01-05 17:49     ` Ben Gardon
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Gardon @ 2021-01-05 17:01 UTC (permalink / raw)
  To: leohou1402
  Cc: maciej.szmigiero, seanjc, linux-kernel, kvm, cannonmatthews,
	pbonzini, peterx, pshier, pfeiner, junaids, jmattson,
	yulei.kernel, kernellwp, vkuznets

On Mon, Jan 4, 2021 at 6:37 PM leohou1402 <leohou1402@gmail.com> wrote:
>
>  On 1/5/2021 07:09,Maciej S. Szmigiero<maciej.szmigiero@oracle.com> wrote:
>
> > Hi,
>
> > I am hitting a reproducible BUG() with KVM TDP MMU.
>
> > The reproducer based on set_memory_region_test.c from KVM selftests
> > is available here:
> > https://gist.github.com/maciejsszmigiero/890218151c242d99f63ea0825334c6c0
>
> > The test simply moves a memslot a bit back and forth on the host
> > while the guest is concurrently writing around the area being
> > moved.
>
> > The code runs fine on the default KVM MMU but triggers a BUG() when
> > TDP MMU is enabled by adding "tdp_mmu=1" kvm module parameter.
>
> > The backtrace is:
> > [ 1308.455120] kernel BUG at arch/x86/kvm/mmu/mmu_internal.h:100!
> > [ 1308.524951] invalid opcode: 0000 [#1] SMP PTI
> > [ 1308.577080] CPU: 92 PID: 18675 Comm: memslot_move_te Not tainted 5.11.0-rc2+ #80
> > [ 1308.665617] Hardware name: Oracle Corporation ORACLE SERVER X7-2c/SERVER MODULE ASSY, , BIOS 46070300 12/20/2019
> > [ 1308.787438] RIP: 0010:kvm_tdp_mmu_get_vcpu_root_hpa+0x10c/0x120 [kvm]
> > [ 1308.864587] Code: db 74 1c b8 00 00 00 80 48 03 43 40 72 1e 48 c7 c2 00 00 00 80 48 2b 15 92 0a 1d d3 48 01 d0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b > 48 8b 15 eb e8 3c d3 eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f
> > [ 1309.089393] RSP: 0018:ffffa65affa73d10 EFLAGS: 00010246
> > [ 1309.151922] RAX: 0000000000000000 RBX: ffff9b46829bac78 RCX: 0000000000000000
> > [ 1309.237334] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa65ada1bd000
> > [ 1309.322744] RBP: ffffa65affa73d38 R08: 0000000000000000 R09: ffff9b454e443200
> > [ 1309.408156] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001794
> > [ 1309.493567] R13: ffffa65ada1bd000 R14: ffff9b454e443040 R15: ffffa65ada1d2418
> > [ 1309.578977] FS:  00007fdb0430b700(0000) GS:ffff9ba3bfa00000(0000) knlGS:0000000000000000
> > [ 1309.675833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1309.744605] CR2: 0000000000000000 CR3: 0000006090046006 CR4: 00000000007726e0
> > [ 1309.830018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 1309.915428] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 1310.000837] PKRU: 55555554
> > [ 1310.033199] Call Trace:
> > [ 1310.062445]  kvm_mmu_load+0x29e/0x480 [kvm]
> > [ 1310.112542]  vcpu_enter_guest+0x112d/0x15b0 [kvm]
> > [ 1310.168865]  ? vmx_vcpu_load+0x2e/0x40 [kvm_intel]
> > [ 1310.226201]  kvm_arch_vcpu_ioctl_run+0xf9/0x580 [kvm]
> > [ 1310.286685]  kvm_vcpu_ioctl+0x247/0x600 [kvm]
> > [ 1310.338838]  ? tick_program_event+0x44/0x70
> > [ 1310.388888]  ? __audit_syscall_entry+0xdd/0x130
> > [ 1310.443101]  __x64_sys_ioctl+0x92/0xd0
> > [ 1310.487946]  do_syscall_64+0x37/0x50
> > [ 1310.530711]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 1310.591158] RIP: 0033:0x7fdb44a06307
> > [ 1310.633925] Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 > ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
> > [ 1310.858726] RSP: 002b:00007fdb0430ae78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [ 1310.949338] RAX: ffffffffffffffda RBX: 00000000019662f0 RCX: 00007fdb44a06307
> > [ 1311.034747] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
> > [ 1311.120159] RBP: 0000000001965000 R08: 000000000040b2ff R09: 0000000000000000
> > [ 1311.205567] R10: 00007fdb0430a2a0 R11: 0000000000000246 R12: 0000000000000000
> > [ 1311.291738] R13: 0000000001965000 R14: 0000000000000000 R15: 00007fdb0430b700
> > [ 1311.377873] Modules linked in: kvm_intel kvm xt_comment xt_owner ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp bnxt_re ib_uverbs mgag200 ib_core drm_kms_helper cec drm iTCO_wdt iTCO_vendor_support sg irqbypass pcspkr syscopyarea sysfillrect sysimgblt i2c_i801 ioatdma fb_sys_fops joydev i2c_algo_bit i2c_smbus lpc_ich intel_pch_thermal dca ip_tables vfat fat xfs sd_mod t10_pi be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi
> > [ 1311.377953]  libcxgb qla4xxx iscsi_boot_sysfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper bnxt_en wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: kvm]
> > [ 1312.712917] ---[ end trace 4716cc8fd037784d ]---
> > [ 1312.884672] RIP: 0010:kvm_tdp_mmu_get_vcpu_root_hpa+0x10c/0x120 [kvm]
> > [ 1312.962622] Code: db 74 1c b8 00 00 00 80 48 03 43 40 72 1e 48 c7 c2 00 00 00 80 48 2b 15 92 0a 1d d3 48 01 d0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 15 eb e8 3c d3 eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f
> > [ 1313.189000] RSP: 0018:ffffa65affa73d10 EFLAGS: 00010246
> > [ 1313.252321] RAX: 0000000000000000 RBX: ffff9b46829bac78 RCX: 0000000000000000
> > [ 1313.338522] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa65ada1bd000
> > [ 1313.424727] RBP: ffffa65affa73d38 R08: 0000000000000000 R09: ffff9b454e443200
> > [ 1313.510932] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001794
> > [ 1313.597140] R13: ffffa65ada1bd000 R14: ffff9b454e443040 R15: ffffa65ada1d2418
> > [ 1313.683343] FS:  00007fdb0430b700(0000) GS:ffff9ba3bfa00000(0000) knlGS:0000000000000000
> > [ 1313.780987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1313.850556] CR2: 0000000000000000 CR3: 0000006090046006 CR4: 00000000007726e0
> > [ 1313.936759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 1314.022964] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 1314.109171] PKRU: 55555554
> > [ 1314.142325] Kernel panic - not syncing: Fatal exception
> > [ 1314.205755] Kernel Offset: 0x11a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 1315.367254] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> > It looks like there might be an inbalance of kvm_mmu_get_root()
> > and kvm_mmu_put_root() somewhere but I couldn't really nail it down.
>
> > I've tried with and without "KVM: x86/mmu: Bug fixes and cleanups in
> > get_mmio_spte()" series applied, doesn't make any difference.
>
> > Thanks,
> > Maciej
>
> Hi, Maciej,
>
> I think you should post the environment of your hardware and software system, such as which distribution hostOS is, kernel version, CPU model, etc .
>
> Leo Hou

Thanks for reporting this Maciej. I'll look into it this week. As Leo
Hou said, it would be helpful to know more about the environment you
ran this test on. Your theory about a get / put roots imbalance seems
like a good explanation. I'll see if I can find such an imbalance.
Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU
  2021-01-05 17:01   ` Ben Gardon
@ 2021-01-05 17:49     ` Ben Gardon
  2021-01-05 18:06       ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Gardon @ 2021-01-05 17:49 UTC (permalink / raw)
  To: leohou1402
  Cc: maciej.szmigiero, seanjc, linux-kernel, kvm, cannonmatthews,
	pbonzini, peterx, pshier, pfeiner, junaids, jmattson,
	yulei.kernel, kernellwp, vkuznets

I believe I've found the bug and it could happen on any platform, so
the environment probably doesn't matter after all.

There are several TDP MMU functions which follow this pattern:
for_each_tdp_mmu_root(kvm, root) {
        kvm_mmu_get_root(kvm, root);
        <Do something, yield the MMU lock>
        kvm_mmu_put_root(kvm, root);
}

In these cases the get and put root calls are there to ensure that the
root is not freed while the function is running, however they do this
too well. If the put root call reduces the root's root_count to 0, it
should be removed from the roots list and freed before the MMU lock is
released. However the above pattern never bothers to free the root.
The following would fix this bug:

-kvm_mmu_put_root(kvm, root);
+if (kvm_mmu_put_root(kvm, root))
+       kvm_tdp_mmu_free_root(kvm, root);

The current code violates an (undocumented) invariant on the
tdp_mmu_roots list that no root in the list should have a 0
root_count. That invariant is checked by kvm_mmu_get_root but should
probably also be in the comment on the field.

Anyway, I'll write up a patch to fix this and send it out.
Ben

On Tue, Jan 5, 2021 at 9:01 AM Ben Gardon <bgardon@google.com> wrote:
>
> On Mon, Jan 4, 2021 at 6:37 PM leohou1402 <leohou1402@gmail.com> wrote:
> >
> >  On 1/5/2021 07:09,Maciej S. Szmigiero<maciej.szmigiero@oracle.com> wrote:
> >
> > > Hi,
> >
> > > I am hitting a reproducible BUG() with KVM TDP MMU.
> >
> > > The reproducer based on set_memory_region_test.c from KVM selftests
> > > is available here:
> > > https://gist.github.com/maciejsszmigiero/890218151c242d99f63ea0825334c6c0
> >
> > > The test simply moves a memslot a bit back and forth on the host
> > > while the guest is concurrently writing around the area being
> > > moved.
> >
> > > The code runs fine on the default KVM MMU but triggers a BUG() when
> > > TDP MMU is enabled by adding "tdp_mmu=1" kvm module parameter.
> >
> > > The backtrace is:
> > > [ 1308.455120] kernel BUG at arch/x86/kvm/mmu/mmu_internal.h:100!
> > > [ 1308.524951] invalid opcode: 0000 [#1] SMP PTI
> > > [ 1308.577080] CPU: 92 PID: 18675 Comm: memslot_move_te Not tainted 5.11.0-rc2+ #80
> > > [ 1308.665617] Hardware name: Oracle Corporation ORACLE SERVER X7-2c/SERVER MODULE ASSY, , BIOS 46070300 12/20/2019
> > > [ 1308.787438] RIP: 0010:kvm_tdp_mmu_get_vcpu_root_hpa+0x10c/0x120 [kvm]
> > > [ 1308.864587] Code: db 74 1c b8 00 00 00 80 48 03 43 40 72 1e 48 c7 c2 00 00 00 80 48 2b 15 92 0a 1d d3 48 01 d0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b > 48 8b 15 eb e8 3c d3 eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f
> > > [ 1309.089393] RSP: 0018:ffffa65affa73d10 EFLAGS: 00010246
> > > [ 1309.151922] RAX: 0000000000000000 RBX: ffff9b46829bac78 RCX: 0000000000000000
> > > [ 1309.237334] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa65ada1bd000
> > > [ 1309.322744] RBP: ffffa65affa73d38 R08: 0000000000000000 R09: ffff9b454e443200
> > > [ 1309.408156] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001794
> > > [ 1309.493567] R13: ffffa65ada1bd000 R14: ffff9b454e443040 R15: ffffa65ada1d2418
> > > [ 1309.578977] FS:  00007fdb0430b700(0000) GS:ffff9ba3bfa00000(0000) knlGS:0000000000000000
> > > [ 1309.675833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1309.744605] CR2: 0000000000000000 CR3: 0000006090046006 CR4: 00000000007726e0
> > > [ 1309.830018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 1309.915428] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 1310.000837] PKRU: 55555554
> > > [ 1310.033199] Call Trace:
> > > [ 1310.062445]  kvm_mmu_load+0x29e/0x480 [kvm]
> > > [ 1310.112542]  vcpu_enter_guest+0x112d/0x15b0 [kvm]
> > > [ 1310.168865]  ? vmx_vcpu_load+0x2e/0x40 [kvm_intel]
> > > [ 1310.226201]  kvm_arch_vcpu_ioctl_run+0xf9/0x580 [kvm]
> > > [ 1310.286685]  kvm_vcpu_ioctl+0x247/0x600 [kvm]
> > > [ 1310.338838]  ? tick_program_event+0x44/0x70
> > > [ 1310.388888]  ? __audit_syscall_entry+0xdd/0x130
> > > [ 1310.443101]  __x64_sys_ioctl+0x92/0xd0
> > > [ 1310.487946]  do_syscall_64+0x37/0x50
> > > [ 1310.530711]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [ 1310.591158] RIP: 0033:0x7fdb44a06307
> > > [ 1310.633925] Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 > ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
> > > [ 1310.858726] RSP: 002b:00007fdb0430ae78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > [ 1310.949338] RAX: ffffffffffffffda RBX: 00000000019662f0 RCX: 00007fdb44a06307
> > > [ 1311.034747] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000007
> > > [ 1311.120159] RBP: 0000000001965000 R08: 000000000040b2ff R09: 0000000000000000
> > > [ 1311.205567] R10: 00007fdb0430a2a0 R11: 0000000000000246 R12: 0000000000000000
> > > [ 1311.291738] R13: 0000000001965000 R14: 0000000000000000 R15: 00007fdb0430b700
> > > [ 1311.377873] Modules linked in: kvm_intel kvm xt_comment xt_owner ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp bnxt_re ib_uverbs mgag200 ib_core drm_kms_helper cec drm iTCO_wdt iTCO_vendor_support sg irqbypass pcspkr syscopyarea sysfillrect sysimgblt i2c_i801 ioatdma fb_sys_fops joydev i2c_algo_bit i2c_smbus lpc_ich intel_pch_thermal dca ip_tables vfat fat xfs sd_mod t10_pi be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi
> > > [ 1311.377953]  libcxgb qla4xxx iscsi_boot_sysfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper bnxt_en wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: kvm]
> > > [ 1312.712917] ---[ end trace 4716cc8fd037784d ]---
> > > [ 1312.884672] RIP: 0010:kvm_tdp_mmu_get_vcpu_root_hpa+0x10c/0x120 [kvm]
> > > [ 1312.962622] Code: db 74 1c b8 00 00 00 80 48 03 43 40 72 1e 48 c7 c2 00 00 00 80 48 2b 15 92 0a 1d d3 48 01 d0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 8b 15 eb e8 3c d3 eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f
> > > [ 1313.189000] RSP: 0018:ffffa65affa73d10 EFLAGS: 00010246
> > > [ 1313.252321] RAX: 0000000000000000 RBX: ffff9b46829bac78 RCX: 0000000000000000
> > > [ 1313.338522] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa65ada1bd000
> > > [ 1313.424727] RBP: ffffa65affa73d38 R08: 0000000000000000 R09: ffff9b454e443200
> > > [ 1313.510932] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000001794
> > > [ 1313.597140] R13: ffffa65ada1bd000 R14: ffff9b454e443040 R15: ffffa65ada1d2418
> > > [ 1313.683343] FS:  00007fdb0430b700(0000) GS:ffff9ba3bfa00000(0000) knlGS:0000000000000000
> > > [ 1313.780987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1313.850556] CR2: 0000000000000000 CR3: 0000006090046006 CR4: 00000000007726e0
> > > [ 1313.936759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 1314.022964] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 1314.109171] PKRU: 55555554
> > > [ 1314.142325] Kernel panic - not syncing: Fatal exception
> > > [ 1314.205755] Kernel Offset: 0x11a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [ 1315.367254] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
> > > It looks like there might be an inbalance of kvm_mmu_get_root()
> > > and kvm_mmu_put_root() somewhere but I couldn't really nail it down.
> >
> > > I've tried with and without "KVM: x86/mmu: Bug fixes and cleanups in
> > > get_mmio_spte()" series applied, doesn't make any difference.
> >
> > > Thanks,
> > > Maciej
> >
> > Hi, Maciej,
> >
> > I think you should post the environment of your hardware and software system, such as which distribution hostOS is, kernel version, CPU model, etc .
> >
> > Leo Hou
>
> Thanks for reporting this Maciej. I'll look into it this week. As Leo
> Hou said, it would be helpful to know more about the environment you
> ran this test on. Your theory about a get / put roots imbalance seems
> like a good explanation. I'll see if I can find such an imbalance.
> Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU
  2021-01-05 17:49     ` Ben Gardon
@ 2021-01-05 18:06       ` Paolo Bonzini
  2021-01-05 18:46         ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2021-01-05 18:06 UTC (permalink / raw)
  To: Ben Gardon, leohou1402
  Cc: maciej.szmigiero, seanjc, linux-kernel, kvm, cannonmatthews,
	peterx, pshier, pfeiner, junaids, jmattson, yulei.kernel,
	kernellwp, vkuznets

On 05/01/21 18:49, Ben Gardon wrote:
> for_each_tdp_mmu_root(kvm, root) {
>          kvm_mmu_get_root(kvm, root);
>          <Do something, yield the MMU lock>
>          kvm_mmu_put_root(kvm, root);
> }
> 
> In these cases the get and put root calls are there to ensure that the
> root is not freed while the function is running, however they do this
> too well. If the put root call reduces the root's root_count to 0, it
> should be removed from the roots list and freed before the MMU lock is
> released. However the above pattern never bothers to free the root.
> The following would fix this bug:
> 
> -kvm_mmu_put_root(kvm, root);
> +if (kvm_mmu_put_root(kvm, root))
> +       kvm_tdp_mmu_free_root(kvm, root);

Is it worth writing a more complex iterator struct, so that 
for_each_tdp_mmu_root takes care of the get and put?

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU
  2021-01-05 18:06       ` Paolo Bonzini
@ 2021-01-05 18:46         ` Sean Christopherson
  2021-01-05 19:21           ` Ben Gardon
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2021-01-05 18:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ben Gardon, leohou1402, maciej.szmigiero, linux-kernel, kvm,
	cannonmatthews, peterx, pshier, pfeiner, junaids, jmattson,
	yulei.kernel, kernellwp, vkuznets

On Tue, Jan 05, 2021, Paolo Bonzini wrote:
> On 05/01/21 18:49, Ben Gardon wrote:
> > for_each_tdp_mmu_root(kvm, root) {
> >          kvm_mmu_get_root(kvm, root);
> >          <Do something, yield the MMU lock>
> >          kvm_mmu_put_root(kvm, root);
> > }
> > 
> > In these cases the get and put root calls are there to ensure that the
> > root is not freed while the function is running, however they do this
> > too well. If the put root call reduces the root's root_count to 0, it
> > should be removed from the roots list and freed before the MMU lock is
> > released. However the above pattern never bothers to free the root.
> > The following would fix this bug:
> > 
> > -kvm_mmu_put_root(kvm, root);
> > +if (kvm_mmu_put_root(kvm, root))
> > +       kvm_tdp_mmu_free_root(kvm, root);
> 
> Is it worth writing a more complex iterator struct, so that
> for_each_tdp_mmu_root takes care of the get and put?

Ya, and maybe with an "as_id" variant to avoid the get/put?  Not sure that's a
worthwhile optimization though.

On a related topic, there are a few subtleties with respect to
for_each_tdp_mmu_root() that we should document/comment.  The flows that drop
mmu_lock while iterating over the roots don't protect against the list itself
from being modified.  E.g. the next entry could be deleted, or a new root
could be added.  I think I've convinced myself that there are no existing bugs,
but we should document that the exact current behavior is required for
correctness.

The use of "unsafe" list_for_each_entry() in particular is unintuitive, as using
the "safe" variant would dereference a deleted entry in the "next entry is
deleted" scenario.

And regarding addomg a root, using list_add_tail() instead of list_add() in
get_tdp_mmu_vcpu_root() would cause iteration to visit a root that was added
after the iteration started (though I don't think this would ever be problematic
in practice?).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU
  2021-01-05 18:46         ` Sean Christopherson
@ 2021-01-05 19:21           ` Ben Gardon
  0 siblings, 0 replies; 6+ messages in thread
From: Ben Gardon @ 2021-01-05 19:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, leohou1402, maciej.szmigiero, linux-kernel, kvm,
	cannonmatthews, peterx, pshier, pfeiner, junaids, jmattson,
	yulei.kernel, kernellwp, vkuznets

On Tue, Jan 5, 2021 at 10:46 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 05, 2021, Paolo Bonzini wrote:
> > On 05/01/21 18:49, Ben Gardon wrote:
> > > for_each_tdp_mmu_root(kvm, root) {
> > >          kvm_mmu_get_root(kvm, root);
> > >          <Do something, yield the MMU lock>
> > >          kvm_mmu_put_root(kvm, root);
> > > }
> > >
> > > In these cases the get and put root calls are there to ensure that the
> > > root is not freed while the function is running, however they do this
> > > too well. If the put root call reduces the root's root_count to 0, it
> > > should be removed from the roots list and freed before the MMU lock is
> > > released. However the above pattern never bothers to free the root.
> > > The following would fix this bug:
> > >
> > > -kvm_mmu_put_root(kvm, root);
> > > +if (kvm_mmu_put_root(kvm, root))
> > > +       kvm_tdp_mmu_free_root(kvm, root);
> >
> > Is it worth writing a more complex iterator struct, so that
> > for_each_tdp_mmu_root takes care of the get and put?
>
> Ya, and maybe with an "as_id" variant to avoid the get/put?  Not sure that's a
> worthwhile optimization though.

I'll see about adding such an iterator. I don't think avoiding the get
/ put is really worthwhile in this case since they're cheap operations
and putting them in the iterator makes it less obvious that they're
missing if those functions ever need to yield.

>
> On a related topic, there are a few subtleties with respect to
> for_each_tdp_mmu_root() that we should document/comment.  The flows that drop
> mmu_lock while iterating over the roots don't protect against the list itself
> from being modified.  E.g. the next entry could be deleted, or a new root
> could be added.  I think I've convinced myself that there are no existing bugs,
> but we should document that the exact current behavior is required for
> correctness.
>
> The use of "unsafe" list_for_each_entry() in particular is unintuitive, as using
> the "safe" variant would dereference a deleted entry in the "next entry is
> deleted" scenario.
>
> And regarding addomg a root, using list_add_tail() instead of list_add() in
> get_tdp_mmu_vcpu_root() would cause iteration to visit a root that was added
> after the iteration started (though I don't think this would ever be problematic
> in practice?).

A lot of these observations are safe because the operations using this
iterator only consider one root at a time and aren't really interested
in the state of the list.
Your point about the dangers of adding and removing roots while one of
these functions is running is valid, but like the legacy / shadow MMU
implementation, properties which need to be guaranteed across multiple
roots need to be managed at a higher level.
I believe that with the legacy / shadow MMU the creation of a new
root, while enabling write protection dirty logging for example, could
result in entries in the new root being considered, and others not
considered. That's why we set the dirty logging flag on the memslot
before we write protect SPTEs.
I'm not sure where exactly to document these properties, but I agree
it would be a good thing to do in a future patch set.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-01-05 19:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-04 23:05 reproducible BUG() in kvm_mmu_get_root() in TDP MMU Maciej S. Szmigiero
     [not found] ` <8A352C2E-E7D2-4873-807F-635A595DCAEF@gmail.com>
2021-01-05 17:01   ` Ben Gardon
2021-01-05 17:49     ` Ben Gardon
2021-01-05 18:06       ` Paolo Bonzini
2021-01-05 18:46         ` Sean Christopherson
2021-01-05 19:21           ` Ben Gardon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.