archive mirror
 help / color / mirror / Atom feed
From: "Hoyer, David" <>
To: "Haeuptle, Michael" <>,
	"" <>
Cc: "" <>
Subject: RE: Deadlock during PCIe hot remove
Date: Tue, 24 Mar 2020 15:35:12 +0000	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM>

You mentioned that you are using the latest pciehp code.   Does this code contain the recently introduced ist_running flag?

-----Original Message-----
From: <> On Behalf Of Haeuptle, Michael
Sent: Tuesday, March 24, 2020 10:22 AM
Subject: RE: Deadlock during PCIe hot remove

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.

From: Haeuptle, Michael
Sent: Tuesday, March 24, 2020 8:46 AM
To: '' <>
Cc: '' <>
Subject: Deadlock during PCIe hot remove

Dear PCI maintainers,

I'm running into a deadlock scenario between the hotplug, pcie and vfio_pci driver when removing multiple devices in parallel.
This is happening on CentOS8 (4.18) with SPDK ( I'm using the latest pciehp code, the rest is all 4.18.

The sequence that leads to the deadlock is as follows:

The pciehp_ist() takes the reset_lock early in its processing. While the pciehp_ist processing is progressing, vfio_pci calls pci_try_reset_function() as part of vfio_pci_release or open. The pci_try_reset_function() takes the device lock.

Eventually, pci_try_reset_function() calls pci_reset_hotplug_slot() which calls pciehp_reset_slot(). The pciehp_reset_slot() tries to take the reset_lock but has to wait since it is already taken by pciehp_ist().

Eventually pciehp_ist calls pcie_stop_device() which calls device_release_driver_internal(). This function also tries to take device_lock causing the dead lock.

Here's the kernel stack trace when the deadlock occurs:

[root@localhost ~]# cat /proc/8594/task/8598/stack [<0>] pciehp_reset_slot+0xa5/0x220 [<0>] pci_reset_hotplug_slot.cold.72+0x20/0x36
[<0>] pci_dev_reset_slot_function+0x72/0x9b
[<0>] __pci_reset_function_locked+0x15b/0x190
[<0>] pci_try_reset_function.cold.77+0x9b/0x108
[<0>] vfio_pci_disable+0x261/0x280
[<0>] vfio_pci_release+0xcb/0xf0
[<0>] vfio_device_fops_release+0x1e/0x40
[<0>] __fput+0xa5/0x1d0
[<0>] task_work_run+0x8a/0xb0
[<0>] exit_to_usermode_loop+0xd3/0xe0
[<0>] do_syscall_64+0xe1/0x100
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
[<0>] 0xffffffffffffffff

I was wondering if there's a quick workaround. I think the pci_try_reset_function would need to take the reset_lock before the device lock but there doesn't seem to be a good way of doing that.

I'm also trying to see if we can delay calling the vfio functions that are initiated by SPDK but I think this inherent race should be addressed.

I'm also happy to submit a defect report if this emailing list is not appropriate.

* Michael

  reply	other threads:[~2020-03-24 15:35 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-24 15:21 Deadlock during PCIe hot remove Haeuptle, Michael
2020-03-24 15:35 ` Hoyer, David [this message]
2020-03-24 15:37   ` Haeuptle, Michael
2020-03-24 15:39     ` Hoyer, David
2020-03-24 16:26       ` Haeuptle, Michael
2020-03-24 16:32         ` Hoyer, David
2020-03-24 16:15 ` Lukas Wunner
2020-03-25 10:40   ` Christoph Hellwig
2020-03-26 19:30     ` Haeuptle, Michael
2020-03-29 13:04     ` Lukas Wunner
2020-03-31  8:14       ` Christoph Hellwig
2020-03-29 15:43 ` Lukas Wunner
2020-03-30 16:15   ` Haeuptle, Michael
2020-03-31 13:01     ` Lukas Wunner
2020-03-31 15:02       ` Haeuptle, Michael
2020-04-02 19:24         ` Haeuptle, Michael
     [not found] <>
2022-10-30  8:28 ` Christoph Hellwig
2022-10-30  8:39   ` Alex Michon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).