linux-pci.vger.kernel.org archive mirror
From: Lukas Wunner <lukas@wunner.de>
To: "Haeuptle, Michael" <michael.haeuptle@hpe.com>,
	Christoph Hellwig <hch@lst.de>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"michaelhaeuptle@gmail.com" <michaelhaeuptle@gmail.com>
Subject: Re: Deadlock during PCIe hot remove
Date: Tue, 24 Mar 2020 17:15:34 +0100
Message-ID: <20200324161534.b2u6ag6oecvcthqd@wunner.de>
In-Reply-To: <CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM>

[cc += Christoph]

On Tue, Mar 24, 2020 at 03:21:52PM +0000, Haeuptle, Michael wrote:
> I'm running into a deadlock scenario between the hotplug, pcie and
> vfio_pci driver when removing multiple devices in parallel.
> This is happening on CentOS8 (4.18) with SPDK (spdk.io). I'm using the
> latest pciehp code, the rest is all 4.18.
> 
> The sequence that leads to the deadlock is as follows:
> 
> The pciehp_ist() takes the reset_lock early in its processing. While
> the pciehp_ist processing is progressing, vfio_pci calls
> pci_try_reset_function() as part of vfio_pci_release or open.
> The pci_try_reset_function() takes the device lock.
> 
> Eventually, pci_try_reset_function() calls pci_reset_hotplug_slot()
> which calls pciehp_reset_slot(). The pciehp_reset_slot() tries to take
> the reset_lock but has to wait since it is already taken by pciehp_ist().
> 
> Eventually pciehp_ist calls pcie_stop_device() which calls
> device_release_driver_internal(). This function also tries to take
> device_lock causing the deadlock.
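
The lock ordering reported above can be replayed as a small single-threaded
model. This is only a sketch: struct lock, try_acquire() and
reported_interleaving_deadlocks() are hypothetical names, the two locks are
modelled as plain owner slots rather than the kernel's real lock types, and
the owner strings simply reuse the function names from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock: owner is NULL while the lock is free. */
struct lock {
	const char *owner;
};

/* Take the lock if it is free; report whether the caller now holds it. */
static bool try_acquire(struct lock *l, const char *who)
{
	if (l->owner)
		return false;
	l->owner = who;
	return true;
}

/*
 * Replay the reported interleaving step by step. Returns true when both
 * paths end up waiting for a lock the other path already holds.
 */
bool reported_interleaving_deadlocks(void)
{
	struct lock reset_lock = { NULL };
	struct lock device_lock = { NULL };
	bool ok;

	/* Hotplug path: pciehp_ist() takes reset_lock early. */
	ok = try_acquire(&reset_lock, "pciehp_ist");
	assert(ok);

	/* Reset path: pci_try_reset_function() takes the device lock. */
	ok = try_acquire(&device_lock, "pci_try_reset_function");
	assert(ok);

	/* Reset path continues: pciehp_reset_slot() wants reset_lock. */
	bool reset_path_stuck = !try_acquire(&reset_lock, "pciehp_reset_slot");

	/* Hotplug path continues: driver release wants device_lock. */
	bool hotplug_path_stuck =
		!try_acquire(&device_lock, "device_release_driver_internal");

	return reset_path_stuck && hotplug_path_stuck;
}
```

In this model each path holds one lock and waits for the other, so neither
can make progress; swapping the acquisition order in one of the two paths
would make one of the later try_acquire() calls unnecessary and break the
cycle.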

The pci_dev_trylock() in pci_try_reset_function() looks questionable
to me.  It was added by commit b014e96d1abb ("PCI: Protect
pci_error_handlers->reset_notify() usage with device_lock()")
with the following rationale:

    Every method in struct device_driver or structures derived from it like
    struct pci_driver MUST provide exclusion vs the driver's ->remove()
    method, usually by using device_lock().
    [...]
    Without this, ->reset_notify() may race with ->remove() calls, which
    can be easily triggered in NVMe.

The intersection of drivers defining a ->reset_notify() hook and files
invoking pci_try_reset_function() appears to be empty.  So I don't quite
understand the problem the commit sought to address.  What am I missing?

Thanks,

Lukas
