kvm.vger.kernel.org archive mirror
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
To: Ben Gardon <bgardon@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Peter Shier <pshier@google.com>, Leo Hou <leohou1402@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH 2/3] kvm: x86/mmu: Ensure TDP MMU roots are freed after yield
Date: Wed, 6 Jan 2021 10:26:02 +0100
Message-ID: <e94e674e-1775-3c67-97f0-8c61e1add554@oracle.com>
In-Reply-To: <CANgfPd8TXa3GG4mQ7MD0wBrUOTdRDeR0z50uDmbcR88rQMn5FQ@mail.gmail.com>

Thanks for looking at it, Ben.

On 06.01.2021 00:38, Ben Gardon wrote:
(..)
> 
> +Sean Christopherson, for whom I used a stale email address.
> .
> I tested this series by running kvm-unit-tests on an Intel Skylake
> machine. It did not introduce any new failures. I also ran the
> set_memory_region_test

It's "memslot_move_test" that is crashing the kernel - a memslot
move test based on "set_memory_region_test".

>, but was unable to reproduce Maciej's problem.
> Maciej, if you'd be willing to confirm this series solves the problem
> you observed, or provide more details on the setup in which you
> observed it, I'd appreciate it.
> 

I've applied your patches and am now getting a slightly
different backtrace for the same test:
[  534.768212] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP PTI
[  534.887969] CPU: 97 PID: 4651 Comm: memslot_move_te Not tainted 5.11.0-rc2+ #81
[  534.975465] Hardware name: Oracle Corporation ORACLE SERVER X7-2c/SERVER MODULE ASSY, , BIOS 46070300 12/20/2019
[  535.097288] RIP: 0010:kvm_tdp_mmu_zap_gfn_range+0x70/0xb0 [kvm]
[  535.168199] Code: b8 01 00 00 00 4c 89 f1 41 89 45 50 4c 89 ee 48 89 df e8 a3 f3 ff ff 41 09 c4 41 83 6d 50 01 74 13 4d 8b 6d 00 4d 39 fd 74 1e <41> 8b 45 50 85 c0 75 c6 0f 0b 4c 89 ee 48 89 df e8 0b fc ff ff 4d
[  535.393005] RSP: 0018:ffffbded19083b90 EFLAGS: 00010297
[  535.455533] RAX: 0000000000000001 RBX: ffffbded1a27d000 RCX: 000000008030000e
[  535.540945] RDX: 000000008030000f RSI: ffffffffc0ad5453 RDI: ffff9cd72a00d300
[  535.626358] RBP: ffffbded19083bc0 R08: 0000000000000001 R09: ffffffffc0ad5400
[  535.711769] R10: ffff9d370acf31b8 R11: 0000000000000001 R12: 0000000000000001
[  535.797181] R13: dead000000000100 R14: 0000000400000000 R15: ffffbded1a292418
[  535.882590] FS:  00007ff50312e740(0000) GS:ffff9d947fb40000(0000) knlGS:0000000000000000
[  535.979443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  536.048211] CR2: 0000000001e02fe0 CR3: 00000060a78e8003 CR4: 00000000007726e0
[  536.133628] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  536.219043] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  536.304452] PKRU: 55555554
[  536.336813] Call Trace:
[  536.366057]  kvm_tdp_mmu_zap_all+0x26/0x40 [kvm]
[  536.421357]  kvm_mmu_zap_all_fast+0x167/0x180 [kvm]
[  536.479767]  kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10 [kvm]
[  536.554817]  kvm_page_track_flush_slot+0x5a/0x90 [kvm]
[  536.616344]  kvm_arch_flush_shadow_memslot+0xe/0x10 [kvm]
[  536.680986]  kvm_set_memslot+0x18f/0x690 [kvm]
[  536.734186]  __kvm_set_memory_region+0x41f/0x580 [kvm]
[  536.795705]  kvm_set_memory_region+0x2b/0x40 [kvm]
[  536.853062]  kvm_vm_ioctl+0x216/0x1060 [kvm]
[  536.904182]  ? irqtime_account_irq+0x40/0xc0
[  536.955270]  ? irq_exit_rcu+0x55/0xf0
[  536.999079]  ? sysvec_apic_timer_interrupt+0x45/0x90
[  537.058485]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[  537.122058]  ? __audit_syscall_entry+0xdd/0x130
[  537.176267]  __x64_sys_ioctl+0x92/0xd0
[  537.221114]  do_syscall_64+0x37/0x50
[  537.263878]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  537.324324] RIP: 0033:0x7ff502a27307
[  537.367882] Code: 44 00 00 48 8b 05 69 1b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 1b 2d 00 f7 d8 64 89 01 48
[  537.594221] RSP: 002b:00007fffde6b2d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  537.685616] RAX: ffffffffffffffda RBX: 0000000001de8000 RCX: 00007ff502a27307
[  537.771797] RDX: 0000000001e02fe0 RSI: 000000004020ae46 RDI: 0000000000000004
[  537.857967] RBP: 00000000000001fc R08: 00007fffde74b090 R09: 000000000005af86
[  537.944110] R10: 000000000005af86 R11: 0000000000000246 R12: 0000000050000000
[  538.030236] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000001fb
[  538.116345] Modules linked in: kvm_intel kvm xt_comment xt_owner ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 drm_kms_helper iTCO_wdt bnxt_re cec iTCO_vendor_support drm ib_uverbs syscopyarea sysfillrect ib_core sg irqbypass sysimgblt pcspkr ioatdma i2c_i801 fb_sys_fops joydev lpc_ich intel_pch_thermal i2c_smbus i2c_algo_bit dca ip_tables vfat fat xfs sd_mod t10_pi be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi
[  538.116423]  libcxgb qla4xxx iscsi_boot_sysfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper bnxt_en wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi [last unloaded: kvm]
[  539.450863] ---[ end trace 7c17f445a2093145 ]---
[  539.623473] RIP: 0010:kvm_tdp_mmu_zap_gfn_range+0x70/0xb0 [kvm]
[  539.695136] Code: b8 01 00 00 00 4c 89 f1 41 89 45 50 4c 89 ee 48 89 df e8 a3 f3 ff ff 41 09 c4 41 83 6d 50 01 74 13 4d 8b 6d 00 4d 39 fd 74 1e <41> 8b 45 50 85 c0 75 c6 0f 0b 4c 89 ee 48 89 df e8 0b fc ff ff 4d
[  539.921479] RSP: 0018:ffffbded19083b90 EFLAGS: 00010297
[  539.984788] RAX: 0000000000000001 RBX: ffffbded1a27d000 RCX: 000000008030000e
[  540.070982] RDX: 000000008030000f RSI: ffffffffc0ad5453 RDI: ffff9cd72a00d300
[  540.157173] RBP: ffffbded19083bc0 R08: 0000000000000001 R09: ffffffffc0ad5400
[  540.243372] R10: ffff9d370acf31b8 R11: 0000000000000001 R12: 0000000000000001
[  540.329567] R13: dead000000000100 R14: 0000000400000000 R15: ffffbded1a292418
[  540.415772] FS:  00007ff50312e740(0000) GS:ffff9d947fb40000(0000) knlGS:0000000000000000
[  540.513427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  540.583005] CR2: 0000000001e02fe0 CR3: 00000060a78e8003 CR4: 00000000007726e0
[  540.669228] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  540.755448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  540.841659] PKRU: 55555554
[  540.874826] Kernel panic - not syncing: Fatal exception
[  540.938269] Kernel Offset: 0xe200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  542.097054] ---[ end Kernel panic - not syncing: Fatal exception ]---

The code that is crashing is:
# arch/x86/kvm/mmu/mmu_internal.h:100:  BUG_ON(!sp->root_count);
         movl    80(%r13), %eax  # MEM[(int *)__mptr_14 + 80B], _17 <- here
         testl   %eax, %eax      # _17
         jne     .L421   #,

So it looks like it now crashes at the same BUG_ON(), but this time
while trying to dereference a "dead" (poisoned) sp pointer.
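
For reference, the R13 value in the dump, 0xdead000000000100, equals
LIST_POISON1 on x86-64 (assuming the default CONFIG_ILLEGAL_POINTER_VALUE
of 0xdead000000000000). That would be consistent with the root walker
reloading ->next from an entry that had been list_del()'d in the meantime,
although the log alone does not prove it. Below is a minimal userspace
sketch of that scenario. All names in it (fake_sp, list_del_poison,
next_after_stale_delete) are hypothetical and the list helpers are
simplified stand-ins, not the actual KVM code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/*
 * Value the x86-64 kernel stores in ->next on list_del(), assuming the
 * default CONFIG_ILLEGAL_POINTER_VALUE (0xdead000000000000). 64-bit only.
 */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)

/*
 * Hypothetical miniature of a shadow page: list link at offset 0 and
 * root_count at byte offset 80, mirroring the offsets in the disassembly.
 */
struct fake_sp {
	struct list_head link;
	char pad[80 - sizeof(struct list_head)];
	int root_count;
};
static_assert(offsetof(struct fake_sp, root_count) == 80, "layout");

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink an entry and poison ->next, as the kernel's list_del() does. */
static void list_del_poison(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = LIST_POISON1;
	e->prev = NULL; /* the kernel writes LIST_POISON2; not modelled here */
}

/*
 * The walker holds a pointer to the first root, "yields", and another path
 * removes that root from the list. Returns the pointer the walker would
 * load as its next entry (and then read root_count from, at +80).
 */
uintptr_t next_after_stale_delete(void)
{
	struct list_head roots;
	struct fake_sp a = { .root_count = 1 }, b = { .root_count = 1 };

	list_init(&roots);
	list_add_tail(&a.link, &roots);
	list_add_tail(&b.link, &roots);

	struct fake_sp *cur = (struct fake_sp *)roots.next; /* link is first */
	list_del_poison(&cur->link);       /* happens during the yield */
	return (uintptr_t)cur->link.next;  /* stale reload of ->next */
}
```

In this sketch the walker never dereferences the returned pointer; in the
kernel the equivalent load of root_count through it is what would fault.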

It's unfortunate that you can't reproduce the issue, however, as being
able to do so would probably make root-causing it much more effective.
Are you testing on bare metal like me or while running nested?

My test machine has a Xeon Platinum 8167M CPU, so it's a Skylake, too.
It has 754G RAM + 8G swap, running just the test program.

I've uploaded the kernel that I've used for testing here:
https://github.com/maciejsszmigiero/linux/tree/tdp_mmu_bug

It is basically a 5.11.0-rc2 kernel with
"KVM: x86/mmu: Bug fixes and cleanups in get_mmio_spte()" series and
your fixes applied on top of it.

In addition to that, I've updated
https://gist.github.com/maciejsszmigiero/890218151c242d99f63ea0825334c6c0
with the kernel .config file that was used.

The compiler that I've used to compile the test kernel was:
"gcc version 8.3.1 20190311 (Red Hat 8.3.1-3.2.0.1)"

Thanks,
Maciej

Thread overview: 9+ messages
2021-01-05 23:31 [PATCH 1/3] kvm: x86/mmu: Clarify TDP MMU page list invariants Ben Gardon
2021-01-05 23:31 ` [PATCH 2/3] kvm: x86/mmu: Ensure TDP MMU roots are freed after yield Ben Gardon
2021-01-05 23:38   ` Ben Gardon
2021-01-06  9:26     ` Maciej S. Szmigiero [this message]
2021-01-06 17:28       ` Ben Gardon
2021-01-06 17:37         ` Maciej S. Szmigiero
2021-01-06 17:56           ` Ben Gardon
2021-01-06 18:02             ` Maciej S. Szmigiero
2021-01-05 23:31 ` [PATCH 3/3] kvm: x86/mmu: Get/put TDP MMU root refs in iterator Ben Gardon
