linux-kernel.vger.kernel.org archive mirror
* 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew
@ 2019-01-09  5:38 Mike Galbraith
  2019-01-09 14:42 ` Adam Borowski
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2019-01-09  5:38 UTC (permalink / raw)
  To: kvm; +Cc: lkml

[-- Attachment #1: Type: text/plain, Size: 3355 bytes --]

Greetings,

KVM seems to be busted in master ATM.  All I have to do to have host
start screaming and maybe exploding (if the guest doesn't do so first)
is to try to install an (obese in this case) kernel over nfs mount of
the host in a guest.

Kernel producing the spew below is 3bd6e94, config attached.

homer: # grep BUG: /netconsole.log
[ 1531.909703] BUG: Bad page state in process X  pfn:100491
[ 1531.958141] BUG: Bad page state in process systemd-journal  pfn:100412
[ 1532.662359] BUG: Bad page state in process X  pfn:10043f
[ 1532.664033] BUG: Bad page state in process X  pfn:10044d
[ 1532.686433] BUG: Bad page state in process systemd-journal  pfn:1027b0
[ 1532.687704] BUG: Bad page state in process systemd-journal  pfn:1027ba
[ 1532.689039] BUG: Bad page state in process systemd-journal  pfn:1027de
[ 1532.694762] BUG: Bad page state in process systemd-journal  pfn:102602
[ 1532.696337] BUG: Bad page state in process systemd-journal  pfn:102726
[ 1533.086254] BUG: Bad page state in process CPU 0/KVM  pfn:100a48
[ 1533.086869] BUG: Bad page state in process CPU 0/KVM  pfn:100a49
[ 1533.087413] BUG: Bad page state in process CPU 0/KVM  pfn:100a4a
[ 1533.087947] BUG: Bad page state in process CPU 0/KVM  pfn:100a4b
[ 1533.088477] BUG: Bad page state in process CPU 0/KVM  pfn:100a4c
[ 1533.089044] BUG: Bad page state in process CPU 0/KVM  pfn:100a4d
[ 1533.089586] BUG: Bad page state in process CPU 0/KVM  pfn:100a4e
[ 1533.090121] BUG: Bad page state in process CPU 0/KVM  pfn:100a61
[ 1533.090657] BUG: Bad page state in process CPU 0/KVM  pfn:100a62
[ 1533.091191] BUG: Bad page state in process CPU 0/KVM  pfn:100a63
[ 1533.091739] BUG: Bad page state in process CPU 0/KVM  pfn:100a64
[ 1533.092276] BUG: Bad page state in process CPU 0/KVM  pfn:100a65
[ 1533.092814] BUG: Bad page state in process CPU 0/KVM  pfn:100a66
[ 1533.093348] BUG: Bad page state in process CPU 0/KVM  pfn:100a67
[ 1533.093940] BUG: Bad page state in process CPU 0/KVM  pfn:1017e8
[ 1533.094512] BUG: Bad page state in process CPU 0/KVM  pfn:1017e9
[ 1533.095049] BUG: Bad page state in process CPU 0/KVM  pfn:1017ea
[ 1533.095585] BUG: Bad page state in process CPU 0/KVM  pfn:1017eb
[ 1533.096120] BUG: Bad page state in process CPU 0/KVM  pfn:1017ee
[ 1533.096679] BUG: Bad page state in process CPU 0/KVM  pfn:101a20
[ 1533.097221] BUG: Bad page state in process CPU 0/KVM  pfn:101a22
[ 1533.097757] BUG: Bad page state in process systemd-journal  pfn:101ce4
[ 1533.098162] BUG: Bad page state in process systemd-journal  pfn:101ce5
[ 1533.098535] BUG: Bad page state in process CPU 0/KVM  pfn:101ce6
[ 1533.099075] BUG: Bad page state in process CPU 0/KVM  pfn:101ce7
[ 1533.099667] BUG: Bad page state in process CPU 0/KVM  pfn:1024cc
[ 1533.101840] BUG: Bad page state in process CPU 0/KVM  pfn:1024e9
[ 1533.102379] BUG: Bad page state in process CPU 0/KVM  pfn:1024ea
[ 1533.102935] BUG: Bad page state in process CPU 0/KVM  pfn:1024eb
[ 1533.103516] BUG: Bad page state in process CPU 0/KVM  pfn:102bc8
[ 1533.105771] BUG: Bad page state in process CPU 0/KVM  pfn:102be0
[ 1533.106308] BUG: Bad page state in process CPU 0/KVM  pfn:102be1
[ 1533.106875] BUG: Bad page state in process CPU 0/KVM  pfn:102be2
[ 1533.107412] BUG: Bad page state in process CPU 0/KVM  pfn:102be3
[ 1533.111166] BUG: Bad page state in process CPU 0/KVM  pfn:102bee

[-- Attachment #2: config.xz --]
[-- Type: application/x-xz, Size: 39140 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew
  2019-01-09  5:38 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew Mike Galbraith
@ 2019-01-09 14:42 ` Adam Borowski
  2019-01-09 15:03   ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Adam Borowski @ 2019-01-09 14:42 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: kvm, lkml

On Wed, Jan 09, 2019 at 06:38:58AM +0100, Mike Galbraith wrote:
> KVM seems to be busted in master ATM.  All I have to do to have host
> start screaming and maybe exploding (if the guest doesn't do so first)
> is to try to install an (obese in this case) kernel over nfs mount of
> the host in a guest.
> 
> Kernel producing the spew below is 3bd6e94, config attached.

I get the same, except that the BUGs were preceded by a bunch of warnings,

> homer: # grep BUG: /netconsole.log
> [ 1531.909703] BUG: Bad page state in process X  pfn:100491
> [ 1531.958141] BUG: Bad page state in process systemd-journal  pfn:100412
> [ 1532.662359] BUG: Bad page state in process X  pfn:10043f
> [ 1532.664033] BUG: Bad page state in process X  pfn:10044d
> [ 1532.686433] BUG: Bad page state in process systemd-journal  pfn:1027b0

the first one being:

Jan  9 00:41:22 umbar kernel: [74122.790461] WARNING: CPU: 2 PID: 26769 at arch/x86/kvm/mmu.c:830 mmu_spte_clear_track_bits+0x7e/0x100
Jan  9 00:41:22 umbar kernel: [74122.799676] Modules linked in: tun dm_mod cdc_acm rndis_wlan rndis_host cdc_ether usbnet sha256_generic cfg80211 rfkill mii nfnetlink snd_usb_audio snd_usbmidi_lib cp210x usbserial radeon ttm pcc_cpufreq
Jan  9 00:41:22 umbar kernel: [74122.817764] CPU: 2 PID: 26769 Comm: qemu-system-x86 Not tainted 5.0.0-rc1-debug-00035-gdce22716f1b4 #1
Jan  9 00:41:22 umbar kernel: [74122.827069] Hardware name: System manufacturer System Product Name/M4A77T, BIOS 2401    05/18/2011
Jan  9 00:41:22 umbar kernel: [74122.836025] RIP: 0010:mmu_spte_clear_track_bits+0x7e/0x100
Jan  9 00:41:22 umbar kernel: [74122.841528] Code: 48 89 e8 48 ba 00 00 00 00 00 ea ff ff 48 c1 e0 06 48 01 d0 48 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 8b 40 34 85 c0 75 02 <0f> 0b 48 0f ba e3 3e 73 34 48 85 1d d2 32 26 01 75 5a 48 d1 eb 83
Jan  9 00:41:22 umbar kernel: [74122.860290] RSP: 0018:ffffc900028634d0 EFLAGS: 00010246
Jan  9 00:41:22 umbar kernel: [74122.865516] RAX: 0000000000000000 RBX: 000000010198bc67 RCX: dead0000000000ff
Jan  9 00:41:22 umbar kernel: [74122.872649] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea00040662c0
Jan  9 00:41:22 umbar kernel: [74122.879790] RBP: 000000000010198b R08: 28f5c28f5c28f5c3 R09: ffff8882264dd000
Jan  9 00:41:22 umbar kernel: [74122.886912] R10: ffff88822fffa000 R11: ffff88822fffa000 R12: 0000000000000000
Jan  9 00:41:22 umbar kernel: [74122.894046] R13: ffffc90002863580 R14: 0000000000000000 R15: ffffc90002863580
Jan  9 00:41:22 umbar kernel: [74122.901170] FS:  00007f19b9953700(0000) GS:ffff888227a80000(0000) knlGS:0000000000000000
Jan  9 00:41:22 umbar kernel: [74122.909249] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan  9 00:41:22 umbar kernel: [74122.914994] CR2: 00000000102630e4 CR3: 00000001c80ee000 CR4: 00000000000006e0
Jan  9 00:41:22 umbar kernel: [74122.922135] Call Trace:
Jan  9 00:41:22 umbar kernel: [74122.924590]  drop_spte+0x1b/0xc0
Jan  9 00:41:22 umbar kernel: [74122.927822]  mmu_page_zap_pte+0xb2/0xe0
Jan  9 00:41:22 umbar kernel: [74122.931661]  kvm_mmu_prepare_zap_page+0x4f/0x2b0
Jan  9 00:41:22 umbar kernel: [74122.936297]  mmu_shrink_scan+0x199/0x240
Jan  9 00:41:22 umbar kernel: [74122.940223]  do_shrink_slab+0x11e/0x1a0
Jan  9 00:41:22 umbar kernel: [74122.944054]  shrink_slab+0x220/0x2b0
Jan  9 00:41:22 umbar kernel: [74122.947632]  shrink_node+0x168/0x460
Jan  9 00:41:22 umbar kernel: [74122.951203]  do_try_to_free_pages+0xc1/0x370
Jan  9 00:41:22 umbar kernel: [74122.955467]  try_to_free_pages+0xb0/0xe0
Jan  9 00:41:22 umbar kernel: [74122.959385]  __alloc_pages_slowpath+0x37d/0xb60
Jan  9 00:41:22 umbar kernel: [74122.963917]  __alloc_pages_nodemask+0x255/0x270
Jan  9 00:41:22 umbar kernel: [74122.968441]  do_huge_pmd_anonymous_page+0x13c/0x5d0
Jan  9 00:41:22 umbar kernel: [74122.973322]  __handle_mm_fault+0x984/0xfb0
Jan  9 00:41:22 umbar kernel: [74122.977438]  handle_mm_fault+0xc2/0x200
Jan  9 00:41:22 umbar kernel: [74122.981278]  __get_user_pages+0x258/0x6c0
Jan  9 00:41:22 umbar kernel: [74122.985290]  get_user_pages_unlocked+0x153/0x1d0
Jan  9 00:41:22 umbar kernel: [74122.989954]  __gfn_to_pfn_memslot+0x149/0x410
Jan  9 00:41:22 umbar kernel: [74122.994320]  try_async_pf+0x96/0x1b0
Jan  9 00:41:22 umbar kernel: [74122.997900]  ? kvm_host_page_size+0x81/0xa0
Jan  9 00:41:22 umbar kernel: [74123.002093]  tdp_page_fault+0x168/0x2b0
Jan  9 00:41:22 umbar kernel: [74123.005927]  ? svm_vcpu_run+0x294/0x680
Jan  9 00:41:22 umbar kernel: [74123.009765]  kvm_mmu_page_fault+0x89/0x3f0
Jan  9 00:41:22 umbar kernel: [74123.013865]  ? kvm_set_cr8.part.87+0xa/0x30
Jan  9 00:41:22 umbar kernel: [74123.018049]  ? svm_vcpu_run+0x4fd/0x680
Jan  9 00:41:22 umbar kernel: [74123.021888]  ? kvm_fast_pio+0x140/0x190
Jan  9 00:41:22 umbar kernel: [74123.025729]  kvm_arch_vcpu_ioctl_run+0x59e/0x19c0
Jan  9 00:41:22 umbar kernel: [74123.030435]  ? _copy_to_user+0x26/0x30
Jan  9 00:41:22 umbar kernel: [74123.034187]  ? kvm_vm_ioctl+0x69a/0x950
Jan  9 00:41:22 umbar kernel: [74123.038018]  kvm_vcpu_ioctl+0x26d/0x5b0
Jan  9 00:41:22 umbar kernel: [74123.041858]  ? __switch_to_asm+0x34/0x70
Jan  9 00:41:22 umbar kernel: [74123.045792]  ? wake_up_q+0x70/0x70
Jan  9 00:41:22 umbar kernel: [74123.049198]  do_vfs_ioctl+0xb0/0x650
Jan  9 00:41:22 umbar kernel: [74123.052769]  ? __x64_sys_futex+0x151/0x1b0
Jan  9 00:41:22 umbar kernel: [74123.056860]  ksys_ioctl+0x7d/0xa0
Jan  9 00:41:22 umbar kernel: [74123.060186]  __x64_sys_ioctl+0x11/0x20
Jan  9 00:41:22 umbar kernel: [74123.063931]  do_syscall_64+0x4a/0xf0
Jan  9 00:41:22 umbar kernel: [74123.067509]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan  9 00:41:22 umbar kernel: [74123.072564] RIP: 0033:0x7f19be5087f7
Jan  9 00:41:22 umbar kernel: [74123.076143] Code: 00 00 90 48 8b 05 99 a6 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 a6 0c 00 f7 d8 64 89 01 48
Jan  9 00:41:22 umbar kernel: [74123.094905] RSP: 002b:00007f19b9952578 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan  9 00:41:22 umbar kernel: [74123.102496] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f19be5087f7
Jan  9 00:41:22 umbar kernel: [74123.109629] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000010
Jan  9 00:41:22 umbar kernel: [74123.116764] RBP: 0000000000000000 R08: 0000561da63a2c50 R09: 00000000000000ff
Jan  9 00:41:22 umbar kernel: [74123.123904] R10: 0000000000000001 R11: 0000000000000246 R12: 0000561da74bd4f0
Jan  9 00:41:22 umbar kernel: [74123.131046] R13: 00007f19c0268000 R14: 0000000000000000 R15: 0000561da74bd4f0
Jan  9 00:41:22 umbar kernel: [74123.138178] ---[ end trace d25bca6c6bf22c09 ]---


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ Hans 1 was born and raised in Johannesburg, then moved to Boston,
⣾⠁⢠⠒⠀⣿⡁ and has just become a naturalized citizen.  Hans 2's grandparents
⢿⡄⠘⠷⠚⠋⠀ came from Melanesia to Düsseldorf, and he hasn't ever been outside
⠈⠳⣄⠀⠀⠀⠀ Germany until yesterday.  Which one is an African-American?


* Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew
  2019-01-09 14:42 ` Adam Borowski
@ 2019-01-09 15:03   ` Mike Galbraith
  2019-01-09 19:26     ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2019-01-09 15:03 UTC (permalink / raw)
  To: Adam Borowski; +Cc: kvm, lkml

On Wed, 2019-01-09 at 15:42 +0100, Adam Borowski wrote:
> On Wed, Jan 09, 2019 at 06:38:58AM +0100, Mike Galbraith wrote:
> > KVM seems to be busted in master ATM.  All I have to do to have host
> > start screaming and maybe exploding (if the guest doesn't do so first)
> > is to try to install an (obese in this case) kernel over nfs mount of
> > the host in a guest.
> > 
> > Kernel producing the spew below is 3bd6e94, config attached.
> 
> I get the same, except that the BUGs were preceded by a bunch of warnings,

Yeah, I was in too much of a rush...

> > homer: # grep BUG: /netconsole.log
> > [ 1531.909703] BUG: Bad page state in process X  pfn:100491
> > [ 1531.958141] BUG: Bad page state in process systemd-journal  pfn:100412
> > [ 1532.662359] BUG: Bad page state in process X  pfn:10043f
> > [ 1532.664033] BUG: Bad page state in process X  pfn:10044d
> > [ 1532.686433] BUG: Bad page state in process systemd-journal  pfn:1027b0
> 
> the first one being:
> 
> Jan  9 00:41:22 umbar kernel: [74122.790461] WARNING: CPU: 2 PID: 26769 at arch/x86/kvm/mmu.c:830 mmu_spte_clear_track_bits+0x7e/0x100

...I also get oodles of those.

	-Mike


* Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew
  2019-01-09 15:03   ` Mike Galbraith
@ 2019-01-09 19:26     ` Sean Christopherson
  2019-01-12  6:43       ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2019-01-09 19:26 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Adam Borowski, kvm, lkml

On Wed, Jan 09, 2019 at 04:03:16PM +0100, Mike Galbraith wrote:
> On Wed, 2019-01-09 at 15:42 +0100, Adam Borowski wrote:
> > On Wed, Jan 09, 2019 at 06:38:58AM +0100, Mike Galbraith wrote:
> > > KVM seems to be busted in master ATM.  All I have to do to have host
> > > start screaming and maybe exploding (if the guest doesn't do so first)
> > > is to try to install an (obese in this case) kernel over nfs mount of
> > > the host in a guest.
> > > 
> > > Kernel producing the spew below is 3bd6e94, config attached.
> > 
> > I get the same, except that the BUGs were preceded by a bunch of warnings,
> 
> Yeah, I was in too much of a rush...
> 
> > > homer: # grep BUG: /netconsole.log
> > > [ 1531.909703] BUG: Bad page state in process X  pfn:100491
> > > [ 1531.958141] BUG: Bad page state in process systemd-journal  pfn:100412
> > > [ 1532.662359] BUG: Bad page state in process X  pfn:10043f
> > > [ 1532.664033] BUG: Bad page state in process X  pfn:10044d
> > > [ 1532.686433] BUG: Bad page state in process systemd-journal  pfn:1027b0
> > 
> > the first one being:
> > 
> > Jan  9 00:41:22 umbar kernel: [74122.790461] WARNING: CPU: 2 PID: 26769 at arch/x86/kvm/mmu.c:830 mmu_spte_clear_track_bits+0x7e/0x100
> 
> ...I also get oodles of those.

There's also a bugzilla bug that's probably the same thing:
https://bugzilla.kernel.org/show_bug.cgi?id=202189.

I'm 99.9% confident this only manifests when reclaiming from the guest,
i.e. when the host is swapping out a VM's memory.  The WARNING is
complaining that KVM is trying to reclaim a page before it has been
removed from KVM's MMU, i.e. a use-after-free scenario is imminent, and
the stack trace shows the kernel is reclaiming.  The bugzilla report also
listed host swapping as a possible requirement, and last but not least, I
was only able to reproduce this by forcing reclaim.

I'll try to bisect.


* Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew
  2019-01-09 19:26     ` Sean Christopherson
@ 2019-01-12  6:43       ` Mike Galbraith
  2019-01-14 15:46         ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2019-01-12  6:43 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Adam Borowski, kvm, lkml

On Wed, 2019-01-09 at 11:26 -0800, Sean Christopherson wrote:
> 
> I'll try to bisect.

Good luck with that.  I gave it a go, but that apparently invalidated
the warranty of my vm image :)

	-Mike


* Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew
  2019-01-12  6:43       ` Mike Galbraith
@ 2019-01-14 15:46         ` Sean Christopherson
  2019-01-14 17:42           ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2019-01-14 15:46 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Adam Borowski, kvm, lkml

On Sat, Jan 12, 2019 at 07:43:02AM +0100, Mike Galbraith wrote:
> On Wed, 2019-01-09 at 11:26 -0800, Sean Christopherson wrote:
> > 
> > I'll try to bisect.
> 
> Good luck with that.  I gave it a go, but that apparently invalidated
> the warranty of my vm image :)

This is fixed in v5.0-rc2 by commit ba422731316d ("mm/mmu_notifier:
mm/rmap.c: Fix a mmu_notifier range bug in try_to_unmap_one").

Nested VMs are great for bisecting KVM issues :)


* Re: 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew
  2019-01-14 15:46         ` Sean Christopherson
@ 2019-01-14 17:42           ` Mike Galbraith
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Galbraith @ 2019-01-14 17:42 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Adam Borowski, kvm, lkml

On Mon, 2019-01-14 at 07:46 -0800, Sean Christopherson wrote:
> On Sat, Jan 12, 2019 at 07:43:02AM +0100, Mike Galbraith wrote:
> > On Wed, 2019-01-09 at 11:26 -0800, Sean Christopherson wrote:
> > > 
> > > I'll try to bisect.
> > 
> > Good luck with that.  I gave it a go, but that apparently invalidated
> > the warranty of my vm image :)
> 
> This is fixed in v5.0-rc2 by commit ba422731316d ("mm/mmu_notifier:
> mm/rmap.c: Fix a mmu_notifier range bug in try_to_unmap_one").

Good to hear.

> Nested VMs are great for bisecting KVM issues :)

I tried nesting dolls (to speed bisection.. hah!), but couldn't
reliably trigger the problem with my reproducer.  Even if it had worked,
nested VMs are pretty cool, but not the _least bit_ fast :)

	-Mike


end of thread, other threads:[~2019-01-14 17:42 UTC | newest]

Thread overview: 7+ messages
2019-01-09  5:38 5.0-rc1 KVM inspired "BUG: Bad page state in process" spew Mike Galbraith
2019-01-09 14:42 ` Adam Borowski
2019-01-09 15:03   ` Mike Galbraith
2019-01-09 19:26     ` Sean Christopherson
2019-01-12  6:43       ` Mike Galbraith
2019-01-14 15:46         ` Sean Christopherson
2019-01-14 17:42           ` Mike Galbraith
