* avoid null pointer dereference during FLR V2
@ 2017-06-01 11:10 ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-01 11:10 UTC (permalink / raw)
  To: helgaas; +Cc: rakesh, linux-pci, linux-nvme

Hi all,

Rakesh reported a bug where an FLR can trivially crash his system.
The reason is that NVMe unbinds the driver from the PCI device on an
unrecoverable error, and that unbind races with the reset_notify method.

This is fairly easy to fix by holding the device lock for a slightly
longer period.  Note that the other PCI error handling methods have the
same issue, but as they don't take the lock yet and I have no good way
to call them reproducibly, I'm a little reluctant to touch them; it
would be great if we could fix those issues as well.
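
For reference, this is the ->reset_notify instance involved, as it looks
before patch 2 below; only the comments are added here to mark where the
crash happens once ->remove has already run and cleared drvdata:

	static void nvme_reset_notify(struct pci_dev *pdev, bool prepare)
	{
		struct nvme_dev *dev = pci_get_drvdata(pdev);	/* NULL once nvme_remove() ran */

		if (prepare)
			nvme_dev_disable(dev, false);
		else
			nvme_reset(dev);	/* crashes if dev == NULL */
	}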

Patches 2 and 3 are cleanups in the same area and not 4.12 material,
but given that they depend on the first one I thought I'd send them
along.

Changes since V1:
 - lock over all calls to ->reset_notify

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-01 11:10 ` Christoph Hellwig
@ 2017-06-01 11:10   ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-01 11:10 UTC (permalink / raw)
  To: helgaas; +Cc: rakesh, linux-pci, linux-nvme

Without this ->notify_reset instance may race with ->remove calls,
which can be easily triggered in NVMe.

Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Tested-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/pci/pci.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 563901cd9c06..92f7e5ae2e5e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4276,11 +4276,13 @@ int pci_reset_function(struct pci_dev *dev)
 	if (rc)
 		return rc;
 
+	pci_dev_lock(dev);
 	pci_dev_save_and_disable(dev);
 
-	rc = pci_dev_reset(dev, 0);
+	rc = __pci_dev_reset(dev, 0);
 
 	pci_dev_restore(dev);
+	pci_dev_unlock(dev);
 
 	return rc;
 }
@@ -4300,16 +4302,14 @@ int pci_try_reset_function(struct pci_dev *dev)
 	if (rc)
 		return rc;
 
-	pci_dev_save_and_disable(dev);
+	if (pci_dev_trylock(dev))
+		return -EAGAIN;
 
-	if (pci_dev_trylock(dev)) {
-		rc = __pci_dev_reset(dev, 0);
-		pci_dev_unlock(dev);
-	} else
-		rc = -EAGAIN;
+	pci_dev_save_and_disable(dev);
+	rc = __pci_dev_reset(dev, 0);
+	pci_dev_unlock(dev);
 
 	pci_dev_restore(dev);
-
 	return rc;
 }
 EXPORT_SYMBOL_GPL(pci_try_reset_function);
@@ -4459,7 +4459,9 @@ static void pci_bus_save_and_disable(struct pci_bus *bus)
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
+		pci_dev_lock(dev);
 		pci_dev_save_and_disable(dev);
+		pci_dev_unlock(dev);
 		if (dev->subordinate)
 			pci_bus_save_and_disable(dev->subordinate);
 	}
@@ -4474,7 +4476,9 @@ static void pci_bus_restore(struct pci_bus *bus)
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
+		pci_dev_lock(dev);
 		pci_dev_restore(dev);
+		pci_dev_unlock(dev);
 		if (dev->subordinate)
 			pci_bus_restore(dev->subordinate);
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/3] PCI: split reset_notify method
  2017-06-01 11:10 ` Christoph Hellwig
@ 2017-06-01 11:10   ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-01 11:10 UTC (permalink / raw)
  To: helgaas; +Cc: rakesh, linux-pci, linux-nvme

The prepare and done cases have no shared functionality whatsoever, so
split them into separate methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 36 +++++--------
 drivers/net/wireless/marvell/mwifiex/pcie.c  | 75 ++++++++++++++++------------
 drivers/nvme/host/pci.c                      | 15 +++---
 drivers/pci/pci.c                            | 25 +++-------
 include/linux/pci.h                          |  3 +-
 5 files changed, 75 insertions(+), 79 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 3e26d27ad213..63784576ae8b 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2348,30 +2348,19 @@ static void fm10k_io_resume(struct pci_dev *pdev)
 		netif_device_attach(netdev);
 }
 
-/**
- * fm10k_io_reset_notify - called when PCI function is reset
- * @pdev: Pointer to PCI device
- *
- * This callback is called when the PCI function is reset such as from
- * /sys/class/net/<enpX>/device/reset or similar. When prepare is true, it
- * means we should prepare for a function reset. If prepare is false, it means
- * the function reset just occurred.
- */
-static void fm10k_io_reset_notify(struct pci_dev *pdev, bool prepare)
+static void fm10k_io_reset_prepare(struct pci_dev *pdev)
 {
-	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
-	int err = 0;
-
-	if (prepare) {
-		/* warn incase we have any active VF devices */
-		if (pci_num_vf(pdev))
-			dev_warn(&pdev->dev,
-				 "PCIe FLR may cause issues for any active VF devices\n");
+	/* warn incase we have any active VF devices */
+	if (pci_num_vf(pdev))
+		dev_warn(&pdev->dev,
+			 "PCIe FLR may cause issues for any active VF devices\n");
+	fm10k_prepare_suspend(pci_get_drvdata(pdev));
+}
 
-		fm10k_prepare_suspend(interface);
-	} else {
-		err = fm10k_handle_resume(interface);
-	}
+static void fm10k_io_reset_done(struct pci_dev *pdev)
+{
+	struct fm10k_intfc *interface = pci_get_drvdata(pdev);
+	int err = fm10k_handle_resume(interface);
 
 	if (err) {
 		dev_warn(&pdev->dev,
@@ -2384,7 +2373,8 @@ static const struct pci_error_handlers fm10k_err_handler = {
 	.error_detected = fm10k_io_error_detected,
 	.slot_reset = fm10k_io_slot_reset,
 	.resume = fm10k_io_resume,
-	.reset_notify = fm10k_io_reset_notify,
+	.reset_prepare = fm10k_io_reset_prepare,
+	.reset_done = fm10k_io_reset_done,
 };
 
 static struct pci_driver fm10k_driver = {
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index ac62bce50e96..279adf124fc9 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -346,11 +346,13 @@ static const struct pci_device_id mwifiex_ids[] = {
 
 MODULE_DEVICE_TABLE(pci, mwifiex_ids);
 
-static void mwifiex_pcie_reset_notify(struct pci_dev *pdev, bool prepare)
+/*
+ * Cleanup all software without cleaning anything related to PCIe and HW.
+ */
+static void mwifiex_pcie_reset_prepare(struct pci_dev *pdev)
 {
 	struct pcie_service_card *card = pci_get_drvdata(pdev);
 	struct mwifiex_adapter *adapter = card->adapter;
-	int ret;
 
 	if (!adapter) {
 		dev_err(&pdev->dev, "%s: adapter structure is not valid\n",
@@ -359,37 +361,48 @@ static void mwifiex_pcie_reset_notify(struct pci_dev *pdev, bool prepare)
 	}
 
 	mwifiex_dbg(adapter, INFO,
-		    "%s: vendor=0x%4.04x device=0x%4.04x rev=%d %s\n",
-		    __func__, pdev->vendor, pdev->device,
-		    pdev->revision,
-		    prepare ? "Pre-FLR" : "Post-FLR");
-
-	if (prepare) {
-		/* Kernel would be performing FLR after this notification.
-		 * Cleanup all software without cleaning anything related to
-		 * PCIe and HW.
-		 */
-		mwifiex_shutdown_sw(adapter);
-		adapter->surprise_removed = true;
-		clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags);
-		clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags);
-	} else {
-		/* Kernel stores and restores PCIe function context before and
-		 * after performing FLR respectively. Reconfigure the software
-		 * and firmware including firmware redownload
-		 */
-		adapter->surprise_removed = false;
-		ret = mwifiex_reinit_sw(adapter);
-		if (ret) {
-			dev_err(&pdev->dev, "reinit failed: %d\n", ret);
-			return;
-		}
-	}
+		    "%s: vendor=0x%4.04x device=0x%4.04x rev=%d Pre-FLR\n",
+		    __func__, pdev->vendor, pdev->device, pdev->revision);
+
+	mwifiex_shutdown_sw(adapter);
+	adapter->surprise_removed = true;
+	clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags);
+	clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags);
 	mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
 }
 
-static const struct pci_error_handlers mwifiex_pcie_err_handler[] = {
-		{ .reset_notify = mwifiex_pcie_reset_notify, },
+/*
+ * Kernel stores and restores PCIe function context before and after performing
+ * FLR respectively. Reconfigure the software and firmware including firmware
+ * redownload.
+ */
+static void mwifiex_pcie_reset_done(struct pci_dev *pdev)
+{
+	struct pcie_service_card *card = pci_get_drvdata(pdev);
+	struct mwifiex_adapter *adapter = card->adapter;
+	int ret;
+
+	if (!adapter) {
+		dev_err(&pdev->dev, "%s: adapter structure is not valid\n",
+			__func__);
+		return;
+	}
+
+	mwifiex_dbg(adapter, INFO,
+		    "%s: vendor=0x%4.04x device=0x%4.04x rev=%d Post-FLR\n",
+		    __func__, pdev->vendor, pdev->device, pdev->revision);
+
+	adapter->surprise_removed = false;
+	ret = mwifiex_reinit_sw(adapter);
+	if (ret)
+		dev_err(&pdev->dev, "reinit failed: %d\n", ret);
+	else
+		mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
+}
+
+static const struct pci_error_handlers mwifiex_pcie_err_handler = {
+	.reset_prepare		= mwifiex_pcie_reset_prepare,
+	.reset_done		= mwifiex_pcie_reset_done,
 };
 
 #ifdef CONFIG_PM_SLEEP
@@ -410,7 +423,7 @@ static struct pci_driver __refdata mwifiex_pcie = {
 	},
 #endif
 	.shutdown = mwifiex_pcie_shutdown,
-	.err_handler = mwifiex_pcie_err_handler,
+	.err_handler = &mwifiex_pcie_err_handler,
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d52701df7245..5059f0f983b5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2152,14 +2152,14 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return result;
 }
 
-static void nvme_reset_notify(struct pci_dev *pdev, bool prepare)
+static void nvme_reset_prepare(struct pci_dev *pdev)
 {
-	struct nvme_dev *dev = pci_get_drvdata(pdev);
+	nvme_dev_disable(pci_get_drvdata(pdev), false);
+}
 
-	if (prepare)
-		nvme_dev_disable(dev, false);
-	else
-		nvme_reset(dev);
+static void nvme_reset_done(struct pci_dev *pdev)
+{
+	nvme_reset(pci_get_drvdata(pdev));
 }
 
 static void nvme_shutdown(struct pci_dev *pdev)
@@ -2281,7 +2281,8 @@ static const struct pci_error_handlers nvme_err_handler = {
 	.error_detected	= nvme_error_detected,
 	.slot_reset	= nvme_slot_reset,
 	.resume		= nvme_error_resume,
-	.reset_notify	= nvme_reset_notify,
+	.reset_prepare	= nvme_reset_prepare,
+	.reset_done	= nvme_reset_done,
 };
 
 static const struct pci_device_id nvme_id_table[] = {
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 92f7e5ae2e5e..b3f9ae7a4e21 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4128,26 +4128,13 @@ static void pci_dev_unlock(struct pci_dev *dev)
 	pci_cfg_access_unlock(dev);
 }
 
-/**
- * pci_reset_notify - notify device driver of reset
- * @dev: device to be notified of reset
- * @prepare: 'true' if device is about to be reset; 'false' if reset attempt
- *           completed
- *
- * Must be called prior to device access being disabled and after device
- * access is restored.
- */
-static void pci_reset_notify(struct pci_dev *dev, bool prepare)
+static void pci_dev_save_and_disable(struct pci_dev *dev)
 {
 	const struct pci_error_handlers *err_handler =
 			dev->driver ? dev->driver->err_handler : NULL;
-	if (err_handler && err_handler->reset_notify)
-		err_handler->reset_notify(dev, prepare);
-}
 
-static void pci_dev_save_and_disable(struct pci_dev *dev)
-{
-	pci_reset_notify(dev, true);
+	if (err_handler && err_handler->reset_prepare)
+		err_handler->reset_prepare(dev);
 
 	/*
 	 * Wake-up device prior to save.  PM registers default to D0 after
@@ -4169,8 +4156,12 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
 
 static void pci_dev_restore(struct pci_dev *dev)
 {
+	const struct pci_error_handlers *err_handler =
+			dev->driver ? dev->driver->err_handler : NULL;
+
 	pci_restore_state(dev);
-	pci_reset_notify(dev, false);
+	if (err_handler && err_handler->reset_done)
+		err_handler->reset_done(dev);
 }
 
 static int pci_dev_reset(struct pci_dev *dev, int probe)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8039f9f0ca05..b87aec18e673 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -695,7 +695,8 @@ struct pci_error_handlers {
 	pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
 
 	/* PCI function reset prepare or completed */
-	void (*reset_notify)(struct pci_dev *dev, bool prepare);
+	void (*reset_prepare)(struct pci_dev *dev);
+	void (*reset_done)(struct pci_dev *dev);
 
 	/* Device driver may resume normal operations */
 	void (*resume)(struct pci_dev *dev);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/3] PCI: remove __pci_dev_reset and pci_dev_reset
  2017-06-01 11:10 ` Christoph Hellwig
@ 2017-06-01 11:10   ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-01 11:10 UTC (permalink / raw)
  To: helgaas; +Cc: rakesh, linux-pci, linux-nvme

And instead implement the reset probing / reset chains in
pci_probe_reset_function and __pci_reset_function_locked respectively.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/pci/pci.c | 108 ++++++++++++++++++++++++++----------------------------
 1 file changed, 52 insertions(+), 56 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b3f9ae7a4e21..20be3b87124a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4069,40 +4069,6 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe)
 	return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
 }
 
-static int __pci_dev_reset(struct pci_dev *dev, int probe)
-{
-	int rc;
-
-	might_sleep();
-
-	rc = pci_dev_specific_reset(dev, probe);
-	if (rc != -ENOTTY)
-		goto done;
-
-	if (pcie_has_flr(dev)) {
-		if (!probe)
-			pcie_flr(dev);
-		rc = 0;
-		goto done;
-	}
-
-	rc = pci_af_flr(dev, probe);
-	if (rc != -ENOTTY)
-		goto done;
-
-	rc = pci_pm_reset(dev, probe);
-	if (rc != -ENOTTY)
-		goto done;
-
-	rc = pci_dev_reset_slot_function(dev, probe);
-	if (rc != -ENOTTY)
-		goto done;
-
-	rc = pci_parent_bus_reset(dev, probe);
-done:
-	return rc;
-}
-
 static void pci_dev_lock(struct pci_dev *dev)
 {
 	pci_cfg_access_lock(dev);
@@ -4164,21 +4130,6 @@ static void pci_dev_restore(struct pci_dev *dev)
 		err_handler->reset_done(dev);
 }
 
-static int pci_dev_reset(struct pci_dev *dev, int probe)
-{
-	int rc;
-
-	if (!probe)
-		pci_dev_lock(dev);
-
-	rc = __pci_dev_reset(dev, probe);
-
-	if (!probe)
-		pci_dev_unlock(dev);
-
-	return rc;
-}
-
 /**
  * __pci_reset_function - reset a PCI device function
  * @dev: PCI device to reset
@@ -4198,7 +4149,13 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
  */
 int __pci_reset_function(struct pci_dev *dev)
 {
-	return pci_dev_reset(dev, 0);
+	int ret;
+
+	pci_dev_lock(dev);
+	ret = __pci_reset_function_locked(dev);
+	pci_dev_unlock(dev);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__pci_reset_function);
 
@@ -4223,7 +4180,27 @@ EXPORT_SYMBOL_GPL(__pci_reset_function);
  */
 int __pci_reset_function_locked(struct pci_dev *dev)
 {
-	return __pci_dev_reset(dev, 0);
+	int rc;
+
+	might_sleep();
+
+	rc = pci_dev_specific_reset(dev, 0);
+	if (rc != -ENOTTY)
+		return rc;
+	if (pcie_has_flr(dev)) {
+		pcie_flr(dev);
+		return 0;
+	}
+	rc = pci_af_flr(dev, 0);
+	if (rc != -ENOTTY)
+		return rc;
+	rc = pci_pm_reset(dev, 0);
+	if (rc != -ENOTTY)
+		return rc;
+	rc = pci_dev_reset_slot_function(dev, 0);
+	if (rc != -ENOTTY)
+		return rc;
+	return pci_parent_bus_reset(dev, 0);
 }
 EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
 
@@ -4240,7 +4217,26 @@ EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
  */
 int pci_probe_reset_function(struct pci_dev *dev)
 {
-	return pci_dev_reset(dev, 1);
+	int rc;
+
+	might_sleep();
+
+	rc = pci_dev_specific_reset(dev, 1);
+	if (rc != -ENOTTY)
+		return rc;
+	if (pcie_has_flr(dev))
+		return 0;
+	rc = pci_af_flr(dev, 1);
+	if (rc != -ENOTTY)
+		return rc;
+	rc = pci_pm_reset(dev, 1);
+	if (rc != -ENOTTY)
+		return rc;
+	rc = pci_dev_reset_slot_function(dev, 1);
+	if (rc != -ENOTTY)
+		return rc;
+
+	return pci_parent_bus_reset(dev, 1);
 }
 
 /**
@@ -4263,14 +4259,14 @@ int pci_reset_function(struct pci_dev *dev)
 {
 	int rc;
 
-	rc = pci_dev_reset(dev, 1);
+	rc = pci_probe_reset_function(dev);
 	if (rc)
 		return rc;
 
 	pci_dev_lock(dev);
 	pci_dev_save_and_disable(dev);
 
-	rc = __pci_dev_reset(dev, 0);
+	rc = __pci_reset_function_locked(dev);
 
 	pci_dev_restore(dev);
 	pci_dev_unlock(dev);
@@ -4289,7 +4285,7 @@ int pci_try_reset_function(struct pci_dev *dev)
 {
 	int rc;
 
-	rc = pci_dev_reset(dev, 1);
+	rc = pci_probe_reset_function(dev);
 	if (rc)
 		return rc;
 
@@ -4297,7 +4293,7 @@ int pci_try_reset_function(struct pci_dev *dev)
 		return -EAGAIN;
 
 	pci_dev_save_and_disable(dev);
-	rc = __pci_dev_reset(dev, 0);
+	rc = __pci_reset_function_locked(dev);
 	pci_dev_unlock(dev);
 
 	pci_dev_restore(dev);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-01 11:10   ` Christoph Hellwig
@ 2017-06-06  5:31     ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2017-06-06  5:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: rakesh, linux-pci, linux-nvme

On Thu, Jun 01, 2017 at 01:10:37PM +0200, Christoph Hellwig wrote:
> Without this ->notify_reset instance may race with ->remove calls,

s/notify_reset/reset_notify/

> which can be easily triggered in NVMe.

OK, sorry to be dense; it's taking me a long time to work out the
details here.  It feels like there should be a general principle to
help figure out where we need locking, and it would be really awesome
if we could include that in the changelog.  But it's not obvious to me
what that principle would be.

I think the two racing paths are these:

PATH 1 (write to sysfs "reset" file):

  sysfs_kf_write                      # <-- A (sysfs write)
    reset_store
      pci_reset_function
        pci_dev_lock                  # <-- patch moves lock here
          device_lock
        pci_dev_save_and_disable
          pci_reset_notify(dev, true)
            err_handler->reset_notify
              nvme_reset_notify       # nvme_err_handler.reset_notify
                nvme_dev_disable      # prepare == true
                  # shutdown == false
                  nvme_pci_disable
          pci_save_state
        pci_dev_reset
          pci_dev_lock                # <-- lock was originally here
          __pci_dev_reset
            pcie_flr                  # <-- B (issue reset)
          pci_dev_unlock              # <-- unlock was originally here
        pci_dev_restore
          pci_restore_state
          pci_reset_notify(dev, false)
            err_handler->reset_notify
              nvme_reset_notify       # nvme_err_handler.reset_notify
                dev = pci_get_drvdata(pdev)   # <-- F (dev == NULL)
                nvme_reset            # prepare == false
                  queue_work(..., &dev->reset_work)   # nvme_reset_work
        pci_dev_unlock                # <-- patch moves unlock here

  ...
  nvme_reset_work
    nvme_remove_dead_ctrl
      nvme_dev_disable
        if (!schedule_work(&dev->remove_work)) # nvme_remove_dead_ctrl_work
          nvme_put_ctrl

  ...
  nvme_remove_dead_ctrl_work
    if (pci_get_drvdata(pdev))
      device_release_driver(&pdev->dev)
        ...
          __device_release_driver
            drv->remove
              nvme_remove
                pci_set_drvdata(pdev, NULL)

PATH 2 (AER recovery):

  do_recovery                         # <-- C (AER interrupt)
    if (severity == AER_FATAL)
      state = pci_channel_io_frozen
    status = broadcast_error_message(..., report_error_detected)
      pci_walk_bus
        report_error_detected
          err_handler->error_detected
            nvme_error_detected
              return PCI_ERS_RESULT_NEED_RESET        # frozen case
    # status == PCI_ERS_RESULT_NEED_RESET
    if (severity == AER_FATAL)
      reset_link
    if (status == PCI_ERS_RESULT_NEED_RESET)
      broadcast_error_message(..., report_slot_reset)
        pci_walk_bus
          report_slot_reset
            device_lock               # <-- D (acquire device lock)
            err_handler->slot_reset
              nvme_slot_reset
                nvme_reset
                  queue_work(..., &dev->reset_work)   # nvme_reset_work
            device_unlock             # <-- unlock

  ...
  nvme_reset_work
    ...
      schedule_work(&dev->remove_work)  # nvme_remove_dead_ctrl_work

  ...
  nvme_remove_dead_ctrl_work
    ...
      drv->remove
        nvme_remove                     # <-- E (driver remove() method)
          pci_set_drvdata(pdev, NULL)

So the critical point is that with the current code, we can have this
sequence:

  A sysfs write occurs
  B sysfs write thread issues FLR to device
  C AER thread takes an interrupt as a result of the FLR
  D AER thread acquires device lock
  E AER thread calls the driver remove() method and clears drvdata
  F sysfs write thread reads drvdata which is now NULL

The AER thread acquires the device lock before calling
err_handler->slot_reset, and this patch makes the sysfs thread hold
the lock until after it has read drvdata, so the patch avoids the NULL
pointer dereference at F by changing the sequence to this:

  A sysfs write occurs
  B sysfs write thread issues FLR to device
  C AER thread takes an interrupt as a result of the FLR
  F sysfs write thread reads drvdata
  D AER thread acquires device lock
  E AER thread calls the driver remove() method and clears drvdata

But I'm still nervous because I think both threads will queue
nvme_reset_work() work items for the same device, and I'm not sure
they're prepared to run concurrently.

I don't really think it should be the driver's responsibility to
understand issues like this and worry about things like
nvme_reset_work() running concurrently.  So I'm thinking maybe the PCI
core needs to be a little stricter here, but I don't know exactly
*how*.

What do you think?

Bjorn

> Reported-by: Rakesh Pandit <rakesh@tuxera.com>
> Tested-by: Rakesh Pandit <rakesh@tuxera.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/pci/pci.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 563901cd9c06..92f7e5ae2e5e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4276,11 +4276,13 @@ int pci_reset_function(struct pci_dev *dev)
>  	if (rc)
>  		return rc;
>  
> +	pci_dev_lock(dev);
>  	pci_dev_save_and_disable(dev);
>  
> -	rc = pci_dev_reset(dev, 0);
> +	rc = __pci_dev_reset(dev, 0);
>  
>  	pci_dev_restore(dev);
> +	pci_dev_unlock(dev);
>  
>  	return rc;
>  }
> @@ -4300,16 +4302,14 @@ int pci_try_reset_function(struct pci_dev *dev)
>  	if (rc)
>  		return rc;
>  
> -	pci_dev_save_and_disable(dev);
> +	if (pci_dev_trylock(dev))
> +		return -EAGAIN;
>  
> -	if (pci_dev_trylock(dev)) {
> -		rc = __pci_dev_reset(dev, 0);
> -		pci_dev_unlock(dev);
> -	} else
> -		rc = -EAGAIN;
> +	pci_dev_save_and_disable(dev);
> +	rc = __pci_dev_reset(dev, 0);
> +	pci_dev_unlock(dev);
>  
>  	pci_dev_restore(dev);
> -
>  	return rc;
>  }
>  EXPORT_SYMBOL_GPL(pci_try_reset_function);
> @@ -4459,7 +4459,9 @@ static void pci_bus_save_and_disable(struct pci_bus *bus)
>  	struct pci_dev *dev;
>  
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		pci_dev_lock(dev);
>  		pci_dev_save_and_disable(dev);
> +		pci_dev_unlock(dev);
>  		if (dev->subordinate)
>  			pci_bus_save_and_disable(dev->subordinate);
>  	}
> @@ -4474,7 +4476,9 @@ static void pci_bus_restore(struct pci_bus *bus)
>  	struct pci_dev *dev;
>  
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
> +		pci_dev_lock(dev);
>  		pci_dev_restore(dev);
> +		pci_dev_unlock(dev);
>  		if (dev->subordinate)
>  			pci_bus_restore(dev->subordinate);
>  	}
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-06  5:31     ` Bjorn Helgaas
@ 2017-06-06  7:28       ` Marta Rybczynska
  -1 siblings, 0 replies; 28+ messages in thread
From: Marta Rybczynska @ 2017-06-06  7:28 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Christoph Hellwig, rakesh, linux-nvme, linux-pci

> 
> But I'm still nervous because I think both threads will queue
> nvme_reset_work() work items for the same device, and I'm not sure
> they're prepared to run concurrently.
> 
> I don't really think it should be the driver's responsibility to
> understand issues like this and worry about things like
> nvme_reset_work() running concurrently.  So I'm thinking maybe the PCI
> core needs to be a little stricter here, but I don't know exactly
> *how*.
> 
> What do you think?

From what I can see, if nvme_reset_work is run twice concurrently it
may disable the controller (the out label).  If it is run twice it will
also initialize the device twice, which isn't great either.

I think running nvme_reset_work twice should be prevented.  Maybe add
state information saying that the device is in the reset procedure, and
then run nvme_reset_work just once?

Marta
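
A minimal sketch of the kind of guard Marta describes; the flag bit, the
flags field and the workqueue pointer are hypothetical and only illustrate
the idea, they are not the actual NVMe code:

	/* hypothetical flag bit; NVMe would use its own state machinery */
	#define NVME_RESET_PENDING	0

	static int nvme_queue_reset(struct nvme_dev *dev)
	{
		/* refuse to queue a second reset while one is in flight */
		if (test_and_set_bit(NVME_RESET_PENDING, &dev->flags))
			return -EBUSY;
		queue_work(nvme_workq, &dev->reset_work);
		return 0;
	}

	/* nvme_reset_work() would clear_bit(NVME_RESET_PENDING, &dev->flags)
	 * once the reset completes or fails, re-arming the guard. */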

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-06  5:31     ` Bjorn Helgaas
@ 2017-06-06 10:48       ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-06 10:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Christoph Hellwig, rakesh, linux-pci, linux-nvme,
	Greg Kroah-Hartman, linux-kernel

On Tue, Jun 06, 2017 at 12:31:42AM -0500, Bjorn Helgaas wrote:
> OK, sorry to be dense; it's taking me a long time to work out the
> details here.  It feels like there should be a general principle to
> help figure out where we need locking, and it would be really awesome
> if we could include that in the changelog.  But it's not obvious to me
> what that principle would be.

The principle is very simple: every method in struct device_driver
or structures derived from it like struct pci_driver MUST provide
exclusion vs ->remove, usually by using device_lock().

If we don't provide such an exclusion the method call can race with
a removal in one form or another.
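
As a sketch of that pattern (illustrative only; the helper name is made
up, and the reset_prepare hook is the one introduced in patch 2):

	static void pci_call_reset_prepare(struct pci_dev *dev)
	{
		const struct pci_error_handlers *eh;

		device_lock(&dev->dev);
		eh = dev->driver ? dev->driver->err_handler : NULL;
		if (eh && eh->reset_prepare)
			eh->reset_prepare(dev);	/* cannot race with ->remove */
		device_unlock(&dev->dev);
	}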

> But I'm still nervous because I think both threads will queue
> nvme_reset_work() work items for the same device, and I'm not sure
> they're prepared to run concurrently.

We had another bug in that area, and the fix for that is hopefully
going to go into the next 4.12-rc.

> I don't really think it should be the driver's responsibility to
> understand issues like this and worry about things like
> nvme_reset_work() running concurrently.  So I'm thinking maybe the PCI
> core needs to be a little stricter here, but I don't know exactly
> *how*.
> 
> What do you think?

The driver core / bus driver must ensure that method calls don't
race with ->remove.  There is nothing the driver can do about it,
and the race is just as possible with explicit user removals or
hardware hotplug.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-06 10:48       ` Christoph Hellwig
@ 2017-06-06 21:14         ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2017-06-06 21:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: rakesh, linux-pci, linux-nvme, Greg Kroah-Hartman, linux-kernel

On Tue, Jun 06, 2017 at 12:48:36PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 06, 2017 at 12:31:42AM -0500, Bjorn Helgaas wrote:
> > OK, sorry to be dense; it's taking me a long time to work out the
> > details here.  It feels like there should be a general principle to
> > help figure out where we need locking, and it would be really awesome
> > if we could include that in the changelog.  But it's not obvious to me
> > what that principle would be.
> 
> The principle is very simple: every method in struct device_driver
> or structures derived from it like struct pci_driver MUST provide
> exclusion vs ->remove.  Usuaull by using device_lock().
> 
> If we don't provide such an exclusion the method call can race with
> a removal in one form or another.

So I guess the method here is
dev->driver->err_handler->reset_notify(), and the PCI core should be
holding device_lock() while calling it?  That makes sense to me;
thanks a lot for articulating that!

1) The current patch protects the err_handler->reset_notify() uses by
adding or expanding device_lock regions in the paths that lead to
pci_reset_notify().  Could we simplify it by doing the locking
directly in pci_reset_notify()?  Then it would be easy to verify the
locking, and we would be less likely to add new callers without the
proper locking.

2) Stating the rule explicitly helps look for other problems, and I
think we have a similar problem in all the pcie_portdrv_err_handler
methods.  These are all called in the AER do_recovery() path, and the
functions there, e.g., report_error_detected(), do hold device_lock().
But pcie_portdrv_error_detected() propagates this to all the children,
and we *don't* hold the lock for the children.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-06 21:14         ` Bjorn Helgaas
@ 2017-06-07 18:29           ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-07 18:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Christoph Hellwig, rakesh, linux-pci, linux-nvme,
	Greg Kroah-Hartman, linux-kernel

On Tue, Jun 06, 2017 at 04:14:43PM -0500, Bjorn Helgaas wrote:
> So I guess the method here is
> dev->driver->err_handler->reset_notify(), and the PCI core should be
> holding device_lock() while calling it?  That makes sense to me;
> thanks a lot for articulating that!

Yes.

> 1) The current patch protects the err_handler->reset_notify() uses by
> adding or expanding device_lock regions in the paths that lead to
> pci_reset_notify().  Could we simplify it by doing the locking
> directly in pci_reset_notify()?  Then it would be easy to verify the
> locking, and we would be less likely to add new callers without the
> proper locking.

We could do that, except that I'd rather hold the lock over a longer
period if we have many calls following each other.  I also have
a patch to actually kill pci_reset_notify() later in the series as
well, as the calling convention for it and ->reset_notify() are
awkward - depending on prepare parameter they do two entirely
different things.  That being said I could also add new
pci_reset_prepare() and pci_reset_done() helpers.
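
Roughly, those helpers would just split the existing prepare flag into
two calls; this is only a sketch of the idea, not code from this
series, and it assumes the current ->reset_notify(dev, prepare)
convention:

#include <linux/pci.h>

/* Tell the driver a reset is about to be performed. */
static void pci_reset_prepare(struct pci_dev *dev)
{
        const struct pci_error_handlers *eh =
                dev->driver ? dev->driver->err_handler : NULL;

        if (eh && eh->reset_notify)
                eh->reset_notify(dev, true);
}

/* Tell the driver the reset has completed. */
static void pci_reset_done(struct pci_dev *dev)
{
        const struct pci_error_handlers *eh =
                dev->driver ? dev->driver->err_handler : NULL;

        if (eh && eh->reset_notify)
                eh->reset_notify(dev, false);
}

Both would be called with the device lock already held by the caller.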

> 2) Stating the rule explicitly helps look for other problems, and I
> think we have a similar problem in all the pcie_portdrv_err_handler
> methods.

Yes, I mentioned this earlier, and I also vaguely remember we got
bug reports from IBM on power for this a while ago.  I just don't
feel confident enough to touch all these without a good test plan.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-07 18:29           ` Christoph Hellwig
@ 2017-06-12 23:14             ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2017-06-12 23:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: rakesh, linux-pci, linux-nvme, Greg Kroah-Hartman, linux-kernel

On Wed, Jun 07, 2017 at 08:29:36PM +0200, Christoph Hellwig wrote:
> On Tue, Jun 06, 2017 at 04:14:43PM -0500, Bjorn Helgaas wrote:
> > So I guess the method here is
> > dev->driver->err_handler->reset_notify(), and the PCI core should be
> > holding device_lock() while calling it?  That makes sense to me;
> > thanks a lot for articulating that!
> 
> Yes.
> 
> > 1) The current patch protects the err_handler->reset_notify() uses by
> > adding or expanding device_lock regions in the paths that lead to
> > pci_reset_notify().  Could we simplify it by doing the locking
> > directly in pci_reset_notify()?  Then it would be easy to verify the
> > locking, and we would be less likely to add new callers without the
> > proper locking.
> 
> We could do that, except that I'd rather hold the lock over a longer
> period if we have many calls following each other.  

My main concern is being able to verify the locking.  I think that is
much easier if the locking is adjacent to the method invocation.  But
if you just add a comment at the method invocation about where the
locking is, that should be sufficient.

> I also have
> a patch to actually kill pci_reset_notify() later in the series as
> well, as the calling convention for it and ->reset_notify() are
> awkward - depending on prepare parameter they do two entirely
> different things.  That being said I could also add new
> pci_reset_prepare() and pci_reset_done() helpers.

I like your pci_reset_notify() changes; they make that much clearer.
I don't think new helpers are necessary.

> > 2) Stating the rule explicitly helps look for other problems, and I
> > think we have a similar problem in all the pcie_portdrv_err_handler
> > methods.
> 
> Yes, I mentioned this earlier, and I also vaguely remember we got
> bug reports from IBM on power for this a while ago.  I just don't
> feel confident enough to touch all these without a good test plan.

Hmmm.  I see your point, but I hate leaving a known bug unfixed.  I
wonder if some enterprising soul could tickle this bug by injecting
errors while removing and rescanning devices below the bridge?

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-12 23:14             ` Bjorn Helgaas
@ 2017-06-13  7:08               ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2017-06-13  7:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Christoph Hellwig, rakesh, linux-pci, linux-nvme,
	Greg Kroah-Hartman, linux-kernel

On Mon, Jun 12, 2017 at 06:14:23PM -0500, Bjorn Helgaas wrote:
> My main concern is being able to verify the locking.  I think that is
> much easier if the locking is adjacent to the method invocation.  But
> if you just add a comment at the method invocation about where the
> locking is, that should be sufficient.

Ok.  I can add comments for all the methods as a separate patch,
similar to Documentation/filesystems/Locking.

> > Yes, I mentioned this earlier, and I also vaguely remember we got
> > bug reports from IBM on power for this a while ago.  I just don't
> > feel confident enough to touch all these without a good test plan.
> 
> Hmmm.  I see your point, but I hate leaving a known bug unfixed.  I
> wonder if some enterprising soul could tickle this bug by injecting
> errors while removing and rescanning devices below the bridge?

I'm completely loaded up at the moment, but this sounds like a good
idea.

In the meantime how do you want to proceed with this patch?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-13  7:08               ` Christoph Hellwig
@ 2017-06-13 14:05                 ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2017-06-13 14:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: rakesh, linux-pci, linux-nvme, Greg Kroah-Hartman, linux-kernel

On Tue, Jun 13, 2017 at 09:08:10AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 12, 2017 at 06:14:23PM -0500, Bjorn Helgaas wrote:
> > My main concern is being able to verify the locking.  I think that is
> > much easier if the locking is adjacent to the method invocation.  But
> > if you just add a comment at the method invocation about where the
> > locking is, that should be sufficient.
> 
> Ok.  I can add comments for all the methods as a separate patch,
> similar to Documentation/filesystems/Locking.
> 
> > > Yes, I mentioned this earlier, and I also vaguely remember we got
> > > bug reports from IBM on power for this a while ago.  I just don't
> > > feel confident enough to touch all these without a good test plan.
> > 
> > Hmmm.  I see your point, but I hate leaving a known bug unfixed.  I
> > wonder if some enterprising soul could tickle this bug by injecting
> > errors while removing and rescanning devices below the bridge?
> 
> I'm completely loaded up at the moment, but this sounds like a good
> idea.
> 
> In the meantime how do you want to proceed with this patch?

Can you just add comments about the locking?  I'd prefer that in the
same patch that adds the locking because that's what I had a hard time
reviewing.  I'm not thinking of anything fancy like
Documentation/filesystems/Locking; I'm just thinking of something
along the lines of "caller must hold pci_dev_lock() to protect
err_handler->reset_notify from racing ->remove()".  And the changelog
should contain the general principle about the locking strategy.
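
Something along these lines at the invocation would be enough; the
surrounding function body is only sketched here, it's the comment
placement that matters:

#include <linux/pci.h>

static void pci_reset_notify(struct pci_dev *dev, bool prepare)
{
        const struct pci_error_handlers *err_handler =
                dev->driver ? dev->driver->err_handler : NULL;

        /*
         * Callers must hold the device lock (pci_dev_lock()) so that
         * err_handler->reset_notify() cannot race with a concurrent
         * ->remove().
         */
        if (err_handler && err_handler->reset_notify)
                err_handler->reset_notify(dev, prepare);
}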

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: avoid null pointer rereference during FLR V2
  2017-06-01 11:10 ` Christoph Hellwig
@ 2017-06-15  3:11   ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2017-06-15  3:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: rakesh, linux-pci, linux-nvme

On Thu, Jun 01, 2017 at 01:10:36PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> Rakesh reported a bug where a FLR can trivially crash his system.
> The reason for that is that NVMe unbinds the driver from the PCI device
> on an unrecoverable error, and that races with the reset_notify method.
> 
> This is fairly easily fixable by taking the device lock for a slightly
> longer period.  Note that the other PCI error handling methods actually
> have the same issue, but with them not taking the lock yet and me having
> no good way to reproducibly call them I'm a little reluctant to touch
> them, but it would be great if we could fix those issues as well.
> 
> Patches 2 and 3 are cleanups in the same area and not 4.12 material,
> but given that they depend on the first one I thought I'd send them
> along.
> 
> Changes since V1:
>  - lock over all calls to ->reset_notify

Applied all three (with some updated changelogs and comments) to
pci/virtualization for v4.13, thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls
  2017-06-12 23:14             ` Bjorn Helgaas
@ 2017-06-22 20:41               ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 28+ messages in thread
From: Guilherme G. Piccoli @ 2017-06-22 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas, Christoph Hellwig
  Cc: rakesh, linux-pci, linux-nvme, Greg Kroah-Hartman, linux-kernel

On 06/12/2017 08:14 PM, Bjorn Helgaas wrote:
> On Wed, Jun 07, 2017 at 08:29:36PM +0200, Christoph Hellwig wrote:
>> On Tue, Jun 06, 2017 at 04:14:43PM -0500, Bjorn Helgaas wrote:
>>> So I guess the method here is
>>> dev->driver->err_handler->reset_notify(), and the PCI core should be
>>> holding device_lock() while calling it?  That makes sense to me;
>>> thanks a lot for articulating that!
>>
>> Yes.
>>
>>> 1) The current patch protects the err_handler->reset_notify() uses by
>>> adding or expanding device_lock regions in the paths that lead to
>>> pci_reset_notify().  Could we simplify it by doing the locking
>>> directly in pci_reset_notify()?  Then it would be easy to verify the
>>> locking, and we would be less likely to add new callers without the
>>> proper locking.
>>
>> We could do that, except that I'd rather hold the lock over a longer
>> period if we have many calls following each other.  
> 
> My main concern is being able to verify the locking.  I think that is
> much easier if the locking is adjacent to the method invocation.  But
> if you just add a comment at the method invocation about where the
> locking is, that should be sufficient.
> 
>> I also have
>> a patch to actually kill pci_reset_notify() later in the series as
>> well, as the calling convention for it and ->reset_notify() are
>> awkward - depending on prepare parameter they do two entirely
>> different things.  That being said I could also add new
>> pci_reset_prepare() and pci_reset_done() helpers.
> 
> I like your pci_reset_notify() changes; they make that much clearer.
> I don't think new helpers are necessary.
> 
>>> 2) Stating the rule explicitly helps look for other problems, and I
>>> think we have a similar problem in all the pcie_portdrv_err_handler
>>> methods.
>>
>> Yes, I mentioned this earlier, and I also vaguely remember we got
>> bug reports from IBM on power for this a while ago.  I just don't
>> feel confident enough to touch all these without a good test plan.
> 
> Hmmm.  I see your point, but I hate leaving a known bug unfixed.  I
> wonder if some enterprising soul could tickle this bug by injecting
> errors while removing and rescanning devices below the bridge?

Well, although I don't consider myself an enterprising soul...heheh
I can test it; just CC me on the next spin and add a note on how to
test (or point me to the thread of the original report).

I guess I was the reporter of the original issue: I tried a simple fix
for our case, and Christoph mentioned the issue was more generic and
needed a proper fix.

Hopefully this one is that fix!
Thanks,


Guilherme

> 
> Bjorn
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

Thread overview: 14 messages

2017-06-01 11:10 avoid null pointer rereference during FLR V2 Christoph Hellwig
2017-06-01 11:10 ` [PATCH 1/3] PCI: ensure the PCI device is locked over ->reset_notify calls Christoph Hellwig
2017-06-06  5:31   ` Bjorn Helgaas
2017-06-06  7:28     ` Marta Rybczynska
2017-06-06 10:48     ` Christoph Hellwig
2017-06-06 21:14       ` Bjorn Helgaas
2017-06-07 18:29         ` Christoph Hellwig
2017-06-12 23:14           ` Bjorn Helgaas
2017-06-13  7:08             ` Christoph Hellwig
2017-06-13 14:05               ` Bjorn Helgaas
2017-06-22 20:41             ` Guilherme G. Piccoli
2017-06-01 11:10 ` [PATCH 2/3] PCI: split reset_notify method Christoph Hellwig
2017-06-01 11:10 ` [PATCH 3/3] PCI: remove __pci_dev_reset and pci_dev_reset Christoph Hellwig
2017-06-15  3:11 ` avoid null pointer rereference during FLR V2 Bjorn Helgaas
