From: Bjorn Helgaas <helgaas@kernel.org>
To: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-pci@vger.kernel.org, f.fangjian@huawei.com,
huangdaode@huawei.com,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Sinan Kaya <Okaya@kernel.org>,
Jakub Kicinski <jakub.kicinski@netronome.com>,
Christoph Hellwig <hch@lst.de>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>
Subject: Re: [PATCH 2/2] PCI/AER: Fix deadlock triggered by AER and sriov enable routine
Date: Tue, 30 Jun 2020 18:58:53 -0500 [thread overview]
Message-ID: <20200630235853.GA3500801@bjorn-Precision-5520> (raw)
In-Reply-To: <1580894289-21498-1-git-send-email-yangyicong@hisilicon.com>
[+cc Ben, Sinan, Jakub, Christoph, Sathy]
On Wed, Feb 05, 2020 at 05:18:09PM +0800, Yicong Yang wrote:
> When enabling VF of pci device by sysfs, we hold device_lock
> first in sriov_enable() and then pci_bus_sem in pci_device_add().
> But when AER invoked, we hold pci_bus_sem first in
> pcie_do_recovery() and then device_lock subsequently.
> The inconsistent order will lead to deadlock as reported [1].
>
> when adding VF by sysfs:
> sriov_numvfs_store()
> device_lock()
> ...
> sriov_enable()
> ...
> pci_device_add()
> down_write(&pci_bus_sem)
> when invoking aer routine:
> pcie_do_recovery()
> ...
> pci_walk_bus()
> down_read(&pci_bus_sem)
> report_*()
> ...
> device_lock()
>
> Add pci_lock_and_walk_bus(), which locks the devices on the bus first
> using pci_bus_lock() and then walk the bus with specific callback.
> Use pci_lock_and_walk_bus() in pcie_do_recovery() and remove
> device_lock() in report_*() callbacks. Then the lock order will be
> consistent with that in the sriov enable routine.
> User space access to the configuration space of the devices on the bus
> will also be blocked in the error recovery routine. The device should
> be unreachable during error recovery.
Did you consider Ben's response [1]? He suggested that the SR-IOV
side maybe shouldn't take the device_lock before doing something that
can take the pci_bus_sem.
The device_lock() in sriov_numvfs_store() was added by 17530e71e016
("PCI: Protect pci_driver->sriov_configure() usage with
device_lock()") [2]. The commit log says we must provide exclusion vs
the driver's ->remove() method, usually by using device_lock().
Maybe this patch is doing the right thing by acquiring all the
device_locks in the subtree on the AER side.
But I do feel a little queasy about pci_device_add() and
pci_bus_add_device() and pci_bus_sem being held while holding
device_lock. Maybe the device addition should be done by a separate
thread or something.
[1] https://lore.kernel.org/linux-pci/a1c90cfb9ce4062b4823c6647d7709baf1c5534f.camel@kernel.crashing.org/
[2] https://git.kernel.org/linus/17530e71e016
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=203981
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
> drivers/pci/bus.c | 8 ++++++++
> drivers/pci/pci.c | 4 ++--
> drivers/pci/pci.h | 4 ++++
> drivers/pci/pcie/err.c | 18 +++++-------------
> 4 files changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 8e40b3e..eb29273 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -411,6 +411,14 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
> }
> EXPORT_SYMBOL_GPL(pci_walk_bus);
>
> +void pci_lock_and_walk_bus(struct pci_bus *top,
> + int (*cb)(struct pci_dev *, void *), void *userdata)
> +{
> + pci_bus_lock(top);
> + pci_walk_bus(top, cb, userdata);
> + pci_bus_unlock(top);
> +}
> +
> struct pci_bus *pci_bus_get(struct pci_bus *bus)
> {
> if (bus)
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 78c99ef..94a7f91 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5150,7 +5150,7 @@ static bool pci_bus_resetable(struct pci_bus *bus)
> }
>
> /* Lock devices from the top of the tree down */
> -static void pci_bus_lock(struct pci_bus *bus)
> +void pci_bus_lock(struct pci_bus *bus)
> {
> struct pci_dev *dev;
>
> @@ -5162,7 +5162,7 @@ static void pci_bus_lock(struct pci_bus *bus)
> }
>
> /* Unlock devices from the bottom of the tree up */
> -static void pci_bus_unlock(struct pci_bus *bus)
> +void pci_bus_unlock(struct pci_bus *bus)
> {
> struct pci_dev *dev;
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index a0a53bd..8f8cd53 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -39,6 +39,8 @@ int pci_mmap_fits(struct pci_dev *pdev, int resno, struct vm_area_struct *vmai,
> int pci_probe_reset_function(struct pci_dev *dev);
> int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
> int pci_bus_error_reset(struct pci_dev *dev);
> +void pci_bus_lock(struct pci_bus *bus);
> +void pci_bus_unlock(struct pci_bus *bus);
>
> #define PCI_PM_D2_DELAY 200
> #define PCI_PM_D3_WAIT 10
> @@ -286,6 +288,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
>
> void pci_reassigndev_resource_alignment(struct pci_dev *dev);
> void pci_disable_bridge_window(struct pci_dev *dev);
> +void pci_lock_and_walk_bus(struct pci_bus *top,
> + int (*cb)(struct pci_dev *, void *), void *userdata);
> struct pci_bus *pci_bus_get(struct pci_bus *bus);
> void pci_bus_put(struct pci_bus *bus);
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index b0e6048..ce0ef5c 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -50,7 +50,6 @@ static int report_error_detected(struct pci_dev *dev,
> pci_ers_result_t vote;
> const struct pci_error_handlers *err_handler;
>
> - device_lock(&dev->dev);
> if (!pci_dev_set_io_state(dev, state) ||
> !dev->driver ||
> !dev->driver->err_handler ||
> @@ -71,7 +70,6 @@ static int report_error_detected(struct pci_dev *dev,
> }
> pci_uevent_ers(dev, vote);
> *result = merge_result(*result, vote);
> - device_unlock(&dev->dev);
> return 0;
> }
>
> @@ -90,7 +88,6 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data)
> pci_ers_result_t vote, *result = data;
> const struct pci_error_handlers *err_handler;
>
> - device_lock(&dev->dev);
> if (!dev->driver ||
> !dev->driver->err_handler ||
> !dev->driver->err_handler->mmio_enabled)
> @@ -100,7 +97,6 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data)
> vote = err_handler->mmio_enabled(dev);
> *result = merge_result(*result, vote);
> out:
> - device_unlock(&dev->dev);
> return 0;
> }
>
> @@ -109,7 +105,6 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
> pci_ers_result_t vote, *result = data;
> const struct pci_error_handlers *err_handler;
>
> - device_lock(&dev->dev);
> if (!dev->driver ||
> !dev->driver->err_handler ||
> !dev->driver->err_handler->slot_reset)
> @@ -119,7 +114,6 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
> vote = err_handler->slot_reset(dev);
> *result = merge_result(*result, vote);
> out:
> - device_unlock(&dev->dev);
> return 0;
> }
>
> @@ -127,7 +121,6 @@ static int report_resume(struct pci_dev *dev, void *data)
> {
> const struct pci_error_handlers *err_handler;
>
> - device_lock(&dev->dev);
> if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
> !dev->driver ||
> !dev->driver->err_handler ||
> @@ -138,7 +131,6 @@ static int report_resume(struct pci_dev *dev, void *data)
> err_handler->resume(dev);
> out:
> pci_uevent_ers(dev, PCI_ERS_RESULT_RECOVERED);
> - device_unlock(&dev->dev);
> return 0;
> }
>
> @@ -200,9 +192,9 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>
> pci_dbg(dev, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen)
> - pci_walk_bus(bus, report_frozen_detected, &status);
> + pci_lock_and_walk_bus(bus, report_frozen_detected, &status);
> else
> - pci_walk_bus(bus, report_normal_detected, &status);
> + pci_lock_and_walk_bus(bus, report_normal_detected, &status);
>
> if (state == pci_channel_io_frozen &&
> reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> @@ -211,7 +203,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> status = PCI_ERS_RESULT_RECOVERED;
> pci_dbg(dev, "broadcast mmio_enabled message\n");
> - pci_walk_bus(bus, report_mmio_enabled, &status);
> + pci_lock_and_walk_bus(bus, report_mmio_enabled, &status);
> }
>
> if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -222,14 +214,14 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> */
> status = PCI_ERS_RESULT_RECOVERED;
> pci_dbg(dev, "broadcast slot_reset message\n");
> - pci_walk_bus(bus, report_slot_reset, &status);
> + pci_lock_and_walk_bus(bus, report_slot_reset, &status);
> }
>
> if (status != PCI_ERS_RESULT_RECOVERED)
> goto failed;
>
> pci_dbg(dev, "broadcast resume message\n");
> - pci_walk_bus(bus, report_resume, &status);
> + pci_lock_and_walk_bus(bus, report_resume, &status);
>
> pci_aer_clear_device_status(dev);
> pci_cleanup_aer_uncorrect_error_status(dev);
> --
> 2.8.1
>
next prev parent reply other threads:[~2020-06-30 23:58 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-05 9:18 [PATCH 2/2] PCI/AER: Fix deadlock triggered by AER and sriov enable routine Yicong Yang
2020-06-30 23:58 ` Bjorn Helgaas [this message]
2020-07-01 15:24 ` Sinan Kaya
2020-07-04 10:25 ` Yicong Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200630235853.GA3500801@bjorn-Precision-5520 \
--to=helgaas@kernel.org \
--cc=Okaya@kernel.org \
--cc=benh@kernel.crashing.org \
--cc=f.fangjian@huawei.com \
--cc=hch@lst.de \
--cc=huangdaode@huawei.com \
--cc=jakub.kicinski@netronome.com \
--cc=linux-pci@vger.kernel.org \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=yangyicong@hisilicon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).