* bisected 4.17-rc - BUG: Bad page state in process qemu-system-x86 pfn:7178f3
From: Amadeusz Sławiński @ 2018-06-02  9:56 UTC
  To: Alex Williamson, kvm, linux-kernel; +Cc: Jason Cai (Xiang Feng)

Hey,

So I've been getting system instability problems after shutting down
a virtual machine with GPU pass-through on the 4.17-rc series, and I
finally got around to bisecting it.

It seems to be caused by 356e88ebe4473a3663cf3d14727ce293a4526d34
("vfio/type1: Improve memory pinning process for raw PFN mapping"),
and the problem seems to be gone after reverting it.

Trace from /var/log/messages:

Jun  1 22:47:23 milkyway kernel: BUG: Bad page state in process qemu-system-x86  pfn:7178f3
Jun  1 22:47:23 milkyway kernel: page:fffffbfddc5e3cc0 count:0 mapcount:1 mapping:0000000000000000 index:0x1
Jun  1 22:47:23 milkyway kernel: flags: 0x200000000000000()
Jun  1 22:47:23 milkyway kernel: raw: 0200000000000000 0000000000000000 0000000000000001 0000000000000000
Jun  1 22:47:23 milkyway kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
Jun  1 22:47:23 milkyway kernel: page dumped because: nonzero mapcount
Jun  1 22:47:23 milkyway kernel: Modules linked in: x86_pkg_temp_thermal coretemp crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel eeepc_wmi asus_wmi wmi_bmof aes_x86_64 crypto_simd cryptd wmi glue_helper
Jun  1 22:47:23 milkyway kernel: CPU: 4 PID: 4303 Comm: qemu-system-x86 Not tainted 4.16.0+ #26
Jun  1 22:47:23 milkyway kernel: Hardware name: ASUS All Series/SABERTOOTH Z97 MARK 2, BIOS 3503 04/18/2018
Jun  1 22:47:23 milkyway kernel: Call Trace:
Jun  1 22:47:23 milkyway kernel:  dump_stack+0x46/0x5b
Jun  1 22:47:23 milkyway kernel:  bad_page+0xbf/0x120
Jun  1 22:47:23 milkyway kernel:  free_pcppages_bulk+0x434/0x500
Jun  1 22:47:23 milkyway kernel:  free_unref_page+0x33/0x40
Jun  1 22:47:23 milkyway kernel:  dma_free_pagelist+0x27/0x40
Jun  1 22:47:23 milkyway kernel:  intel_iommu_unmap+0x114/0x150
Jun  1 22:47:23 milkyway kernel:  __iommu_unmap+0xe4/0x130
Jun  1 22:47:23 milkyway kernel:  vfio_unmap_unpin+0x13f/0x330
Jun  1 22:47:23 milkyway kernel:  vfio_remove_dma+0x12/0x40
Jun  1 22:47:23 milkyway kernel:  vfio_iommu_unmap_unpin_all+0x16/0x30
Jun  1 22:47:23 milkyway kernel:  vfio_iommu_type1_detach_group+0x2b3/0x2c0
Jun  1 22:47:23 milkyway kernel:  __vfio_group_unset_container+0x4d/0x180
Jun  1 22:47:23 milkyway kernel:  vfio_group_put_external_user+0x9/0x20
Jun  1 22:47:23 milkyway kernel:  kvm_vfio_group_put_external_user+0x1d/0x30
Jun  1 22:47:23 milkyway kernel:  kvm_vfio_destroy+0x4a/0xc0
Jun  1 22:47:23 milkyway kernel:  kvm_put_kvm+0x1a1/0x290
Jun  1 22:47:23 milkyway kernel:  kvm_vm_release+0x18/0x20
Jun  1 22:47:23 milkyway kernel:  __fput+0xcd/0x1f0
Jun  1 22:47:23 milkyway kernel:  task_work_run+0x8d/0xb0
Jun  1 22:47:23 milkyway kernel:  do_exit+0x2d9/0xbe0
Jun  1 22:47:23 milkyway kernel:  ? hrtimer_init+0x10/0x10
Jun  1 22:47:23 milkyway kernel:  do_group_exit+0x31/0xb0
Jun  1 22:47:23 milkyway kernel:  get_signal+0x12d/0x570
Jun  1 22:47:23 milkyway kernel:  do_signal+0x3e/0x5d0
Jun  1 22:47:23 milkyway kernel:  exit_to_usermode_loop+0x46/0x80
Jun  1 22:47:23 milkyway kernel:  do_syscall_64+0xe0/0xf0
Jun  1 22:47:23 milkyway kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun  1 22:47:23 milkyway kernel: RIP: 0033:0x7e7c7512750f
Jun  1 22:47:23 milkyway kernel: RSP: 002b:00007e77df3f29d0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Jun  1 22:47:23 milkyway kernel: RAX: fffffffffffffdfc RBX: 0000000000000189 RCX: 00007e7c7512750f
Jun  1 22:47:23 milkyway kernel: RDX: 0000000000000000 RSI: 0000000000000189 RDI: 000057066f99c0a8
Jun  1 22:47:23 milkyway kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
Jun  1 22:47:23 milkyway kernel: R10: 00007e77df3f2a80 R11: 0000000000000246 R12: 00007e77df3f2a80
Jun  1 22:47:23 milkyway kernel: R13: 000057066f99c0a8 R14: 00007e77df3f2a80 R15: 00007fff7e253a30
Jun  1 22:47:23 milkyway kernel: Disabling lock debugging due to kernel taint



git bisect log

git bisect start
# good: [0adb32858b0bddf4ada5f364a84ed60b196dbcda] Linux 4.16
git bisect good 0adb32858b0bddf4ada5f364a84ed60b196dbcda
# bad: [60cc43fc888428bb2f18f08997432d426a243338] Linux 4.17-rc1
git bisect bad 60cc43fc888428bb2f18f08997432d426a243338
# good: [ac9053d2dcb9e8c3fa35ce458dfca8fddc141680] Merge tag 'usb-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good ac9053d2dcb9e8c3fa35ce458dfca8fddc141680
# good: [38c23685b273cfb4ccf31a199feccce3bdcb5d83] Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 38c23685b273cfb4ccf31a199feccce3bdcb5d83
# bad: [fbe173e3ffbd897b5a859020d714c0eaf4af2a1a] Merge tag 'rtc-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
git bisect bad fbe173e3ffbd897b5a859020d714c0eaf4af2a1a
# bad: [299f89d53e61c0b17479cc7d6f3b5382d5e83f28] Merge tag 'leaks-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tobin/leaks
git bisect bad 299f89d53e61c0b17479cc7d6f3b5382d5e83f28
# good: [28da7be5ebc096ada5e6bc526c623bdd8c47800a] Merge tag 'mailbox-v4.17' of git://git.linaro.org/landing-teams/working/fujitsu/integration
git bisect good 28da7be5ebc096ada5e6bc526c623bdd8c47800a
# good: [19fd08b85bc7e0502b55cd726f466df82ee7e777] Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
git bisect good 19fd08b85bc7e0502b55cd726f466df82ee7e777
# good: [14d8d776aeda8e367a9354b6cb6a0696671630c9] Merge branch 'lorenzo/pci/endpoint'
git bisect good 14d8d776aeda8e367a9354b6cb6a0696671630c9
# bad: [f605ba97fb80522656c7dce9825a908f1e765b57] Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio
git bisect bad f605ba97fb80522656c7dce9825a908f1e765b57
# good: [d2f48c5d7fd791104f3227d8e6b55fca892eb2ba] Merge branch 'lorenzo/pci/xgene'
git bisect good d2f48c5d7fd791104f3227d8e6b55fca892eb2ba
# good: [dc32bb678e103afbcfa4d814489af0566307f528] vhost: add vsock compat ioctl
git bisect good dc32bb678e103afbcfa4d814489af0566307f528
# bad: [da9147140fe3de5a3a3fe5fe7f69739d4f39bea1] MAINTAINERS: vfio/platform: Update sub-maintainer
git bisect bad da9147140fe3de5a3a3fe5fe7f69739d4f39bea1
# bad: [356e88ebe4473a3663cf3d14727ce293a4526d34] vfio/type1: Improve memory pinning process for raw PFN mapping
git bisect bad 356e88ebe4473a3663cf3d14727ce293a4526d34
# good: [c9f89c3f87cfc026d88c08054710902dd52a7772] vfio-mdev/samples: change RDI interrupt condition
git bisect good c9f89c3f87cfc026d88c08054710902dd52a7772
# first bad commit: [356e88ebe4473a3663cf3d14727ce293a4526d34] vfio/type1: Improve memory pinning process for raw PFN mapping


Cheers,
Amadeusz


* Re: bisected 4.17-rc - BUG: Bad page state in process qemu-system-x86  pfn:7178f3
From: Alex Williamson @ 2018-06-02 15:05 UTC
  To: Amadeusz Sławiński; +Cc: kvm, linux-kernel, Jason Cai (Xiang Feng)

On Sat, 2 Jun 2018 11:56:24 +0200
Amadeusz Sławiński <amade@asmblr.net> wrote:

> Hey,
> 
> So I've been getting system instability problems after shutting down
> a virtual machine with GPU pass-through on the 4.17-rc series, and I
> finally got around to bisecting it.
> 
> It seems to be caused by 356e88ebe4473a3663cf3d14727ce293a4526d34
> ("vfio/type1: Improve memory pinning process for raw PFN mapping"),
> and the problem seems to be gone after reverting it.

Thanks for bisecting this.  It seems that we're hitting some sort of
unbalanced page state, suggesting we're not skipping the pfn mappings
on unmap.  As this was introduced in v4.17-rc, which is about to close,
I think our only option is to revert it for now.  I'll post that
shortly.  Thanks,

Alex
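To make the "unbalanced page state" hypothesis concrete: if the pin path and the unpin path disagree about which pages are raw PFN mappings (no struct page, so no reference is taken), unpin ends up dropping references that pin never took. The sketch below is a hypothetical userspace toy model in C. Every name in it (toy_page, toy_pin(), toy_pin_first_page_decides(), toy_unpin()) is invented for illustration. It does not reproduce the code of 356e88ebe447 or the intel-iommu free path seen in the trace, and the "first page decides for the whole range" rule is only one assumed way such a mismatch could arise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model, NOT kernel code. All names are invented;
 * refcount loosely stands in for the page reference count.
 */
struct toy_page {
	bool raw_pfn;   /* no struct page behind it, e.g. a PCI BAR */
	int refcount;   /* references held on this page */
};

/* Pin: take a reference only on pages that are not raw PFNs. */
static void toy_pin(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!pages[i].raw_pfn)
			pages[i].refcount++;
}

/*
 * Hypothetical divergent pin: decides for the whole range from the
 * first page alone, so normal pages after a raw PFN get no reference.
 */
static void toy_pin_first_page_decides(struct toy_page *pages, size_t n)
{
	if (n > 0 && pages[0].raw_pfn)
		return;
	toy_pin(pages, n);
}

/* Unpin: drop a reference on every page that is not a raw PFN. */
static void toy_unpin(struct toy_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!pages[i].raw_pfn)
			pages[i].refcount--;
}
```

If each normal page starts at refcount 1 (the process's own mapping), toy_pin() followed by toy_unpin() returns it to 1. toy_pin_first_page_decides() followed by toy_unpin() on a range that starts with a raw PFN instead leaves the following normal page at 0 while the process still maps it, loosely resembling the "count:0 mapcount:1" and "nonzero mapcount" lines in the trace. The invariant the toy illustrates is that both paths must apply the same per-page test.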

