All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-rc v2 1/3] PCI: Fix faulty logic in pci_reset_bus()
@ 2018-09-05 16:08 Sinan Kaya
  2018-09-05 16:08 ` [PATCH for-rc v2 2/3] PCI: Request reset type as part of function reset Sinan Kaya
  2018-09-05 16:08 ` [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type Sinan Kaya
  0 siblings, 2 replies; 13+ messages in thread
From: Sinan Kaya @ 2018-09-05 16:08 UTC (permalink / raw)
  To: linux-pci; +Cc: Dennis Dalessandro, Sinan Kaya, Bjorn Helgaas

From: Dennis Dalessandro <dennis.dalessandro@intel.com>

The pci_rest_bus() function calls into pci_probe_reset_slot() to determine
whether to call the slot or bus reset. The check has faulty logic in that
it does not account for pci_probe_reset_slot() being able to return an
errno. Fix by only calling the slot reset when the function returns 0.
Treat < 1 and > 1 the same.

Cc: Sinan Kaya <okaya@codeaurora.org>
Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29ff9619b5fa..30b260332a10 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5200,7 +5200,7 @@ static int __pci_reset_bus(struct pci_bus *bus)
  */
 int pci_reset_bus(struct pci_dev *pdev)
 {
-	return pci_probe_reset_slot(pdev->slot) ?
+	return (!pci_probe_reset_slot(pdev->slot)) ?
 	    __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);
 }
 EXPORT_SYMBOL_GPL(pci_reset_bus);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for-rc v2 2/3] PCI: Request reset type as part of function reset
  2018-09-05 16:08 [PATCH for-rc v2 1/3] PCI: Fix faulty logic in pci_reset_bus() Sinan Kaya
@ 2018-09-05 16:08 ` Sinan Kaya
  2018-09-05 23:07   ` Boris Ostrovsky
  2018-09-05 16:08 ` [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type Sinan Kaya
  1 sibling, 1 reply; 13+ messages in thread
From: Sinan Kaya @ 2018-09-05 16:08 UTC (permalink / raw)
  To: linux-pci
  Cc: Sinan Kaya, Derek Chickles, Satanand Burla, Felix Manlunas,
	Raghu Vatsavayi, David S. Miller, Bjorn Helgaas, Boris Ostrovsky,
	Juergen Gross, Jia-Ju Bai

We need a contract between the reset API users and the PCI core about the
types of reset that a user needs vs. what PCI core can do internally.
If a platform supports hotplug, we need to do hotplug reset as an example.

Expose the reset types to the drivers and try different reset types based
on the new reset_type parameter.

Most users are expected to use PCI_RESET_ANY, PCI_RESET_FUNC or
PCI_RESET_LINK parameters.

Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sinan Kaya <okaya@kernel.org>
---
 .../net/ethernet/cavium/liquidio/lio_main.c   |  2 +-
 drivers/pci/pci.c                             | 59 ++++++++++++-------
 drivers/xen/xen-pciback/pci_stub.c            |  6 +-
 include/linux/pci.h                           | 13 +++-
 4 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 6fb13fa73b27..0ff76722734d 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -989,7 +989,7 @@ static void octeon_pci_flr(struct octeon_device *oct)
 	pci_write_config_word(oct->pci_dev, PCI_COMMAND,
 			      PCI_COMMAND_INTX_DISABLE);
 
-	rc = __pci_reset_function_locked(oct->pci_dev);
+	rc = __pci_reset_function_locked(oct->pci_dev, PCI_RESET_ANY);
 
 	if (rc != 0)
 		dev_err(&oct->pci_dev->dev, "Error %d resetting PCI function %d\n",
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 30b260332a10..43a4b0969153 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4672,6 +4672,7 @@ static void pci_dev_restore(struct pci_dev *dev)
  * __pci_reset_function_locked - reset a PCI device function while holding
  * the @dev mutex lock.
  * @dev: PCI device to reset
+ * @reset_type: reset mask to try
  *
  * Some devices allow an individual function to be reset without affecting
  * other functions in the same device.  The PCI device must be responsive
@@ -4687,9 +4688,9 @@ static void pci_dev_restore(struct pci_dev *dev)
  * Returns 0 if the device function was successfully reset or negative if the
  * device doesn't support resetting a single function.
  */
-int __pci_reset_function_locked(struct pci_dev *dev)
+int __pci_reset_function_locked(struct pci_dev *dev, u32 reset_type)
 {
-	int rc;
+	int rc = 0;
 
 	might_sleep();
 
@@ -4701,24 +4702,42 @@ int __pci_reset_function_locked(struct pci_dev *dev)
 	 * other error, we're also finished: this indicates that further
 	 * reset mechanisms might be broken on the device.
 	 */
-	rc = pci_dev_specific_reset(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	if (pcie_has_flr(dev)) {
-		rc = pcie_flr(dev);
+	if (reset_type & PCI_RESET_DEV_SPECIFIC) {
+		rc = pci_dev_specific_reset(dev, 0);
 		if (rc != -ENOTTY)
 			return rc;
 	}
-	rc = pci_af_flr(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_pm_reset(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	rc = pci_dev_reset_slot_function(dev, 0);
-	if (rc != -ENOTTY)
-		return rc;
-	return pci_parent_bus_reset(dev, 0);
+
+	if (reset_type & PCI_RESET_FLR) {
+		if (pcie_has_flr(dev)) {
+			rc = pcie_flr(dev);
+			if (rc != -ENOTTY)
+				return rc;
+		}
+		rc = pci_af_flr(dev, 0);
+		if (rc != -ENOTTY)
+			return rc;
+	}
+
+	if (reset_type & PCI_RESET_PM) {
+		rc = pci_pm_reset(dev, 0);
+		if (rc != -ENOTTY)
+			return rc;
+	}
+
+	if (reset_type & PCI_RESET_SLOT) {
+		rc = pci_dev_reset_slot_function(dev, 0);
+		if (rc != -ENOTTY)
+			return rc;
+	}
+
+	if (reset_type & PCI_RESET_BUS) {
+		rc = pci_parent_bus_reset(dev, 0);
+		if (rc != -ENOTTY)
+			return rc;
+	}
+
+	return rc;
 }
 EXPORT_SYMBOL_GPL(__pci_reset_function_locked);
 
@@ -4783,7 +4802,7 @@ int pci_reset_function(struct pci_dev *dev)
 	pci_dev_lock(dev);
 	pci_dev_save_and_disable(dev);
 
-	rc = __pci_reset_function_locked(dev);
+	rc = __pci_reset_function_locked(dev, PCI_RESET_ANY);
 
 	pci_dev_restore(dev);
 	pci_dev_unlock(dev);
@@ -4818,7 +4837,7 @@ int pci_reset_function_locked(struct pci_dev *dev)
 
 	pci_dev_save_and_disable(dev);
 
-	rc = __pci_reset_function_locked(dev);
+	rc = __pci_reset_function_locked(dev, PCI_RESET_ANY);
 
 	pci_dev_restore(dev);
 
@@ -4843,7 +4862,7 @@ int pci_try_reset_function(struct pci_dev *dev)
 		return -EAGAIN;
 
 	pci_dev_save_and_disable(dev);
-	rc = __pci_reset_function_locked(dev);
+	rc = __pci_reset_function_locked(dev, PCI_RESET_ANY);
 	pci_dev_restore(dev);
 	pci_dev_unlock(dev);
 
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 59661db144e5..6dfb805bcb19 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -105,7 +105,7 @@ static void pcistub_device_release(struct kref *kref)
 	/* Call the reset function which does not take lock as this
 	 * is called from "unbind" which takes a device_lock mutex.
 	 */
-	__pci_reset_function_locked(dev);
+	__pci_reset_function_locked(dev, PCI_RESET_ANY);
 	if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
 		dev_info(&dev->dev, "Could not reload PCI state\n");
 	else
@@ -283,7 +283,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 	 * (so it's ready for the next domain)
 	 */
 	device_lock_assert(&dev->dev);
-	__pci_reset_function_locked(dev);
+	__pci_reset_function_locked(dev, PCI_RESET_ANY);
 
 	dev_data = pci_get_drvdata(dev);
 	ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -417,7 +417,7 @@ static int pcistub_init_device(struct pci_dev *dev)
 		dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
 	else {
 		dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-		__pci_reset_function_locked(dev);
+		__pci_reset_function_locked(dev, PCI_RESET_ANY);
 		pci_restore_state(dev);
 	}
 	/* Now disable the device (this also ensures some private device
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..6924d837c959 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -849,6 +849,17 @@ enum {
 	PCI_SCAN_ALL_PCIE_DEVS	= 0x00000040,	/* Scan all, not just dev 0 */
 };
 
+#define PCI_RESET_DEV_SPECIFIC	(1 << 0)
+#define PCI_RESET_FLR		(1 << 1)
+#define PCI_RESET_PM		(1 << 2)
+#define PCI_RESET_SLOT		(1 << 3)
+#define PCI_RESET_BUS		(1 << 4)
+
+#define PCI_RESET_ANY		(~0)
+#define PCI_RESET_FUNC		(PCI_RESET_DEV_SPECIFIC | \
+				 PCI_RESET_FLR | PCI_RESET_PM)
+#define PCI_RESET_LINK		(PCI_RESET_SLOT | PCI_RESET_BUS)
+
 /* These external functions are only available when PCI support is enabled */
 #ifdef CONFIG_PCI
 
@@ -1111,7 +1122,7 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
 void pcie_print_link_status(struct pci_dev *dev);
 bool pcie_has_flr(struct pci_dev *dev);
 int pcie_flr(struct pci_dev *dev);
-int __pci_reset_function_locked(struct pci_dev *dev);
+int __pci_reset_function_locked(struct pci_dev *dev, u32 reset_type);
 int pci_reset_function(struct pci_dev *dev);
 int pci_reset_function_locked(struct pci_dev *dev);
 int pci_try_reset_function(struct pci_dev *dev);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-05 16:08 [PATCH for-rc v2 1/3] PCI: Fix faulty logic in pci_reset_bus() Sinan Kaya
  2018-09-05 16:08 ` [PATCH for-rc v2 2/3] PCI: Request reset type as part of function reset Sinan Kaya
@ 2018-09-05 16:08 ` Sinan Kaya
  2018-09-05 16:17   ` Doug Ledford
                     ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Sinan Kaya @ 2018-09-05 16:08 UTC (permalink / raw)
  To: linux-pci
  Cc: Sinan Kaya, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe

pci_reset_bus() is being called from the probe context and causes
a deadlock since pci_reset_bus() also tries to obtain kernel lock.

Use __pci_reset_function_locked() that provides locking guarantees.

Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
Signed-off-by: Sinan Kaya <okaya@kernel.org>
---
 drivers/infiniband/hw/hfi1/pcie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
index eec83757d55f..13162289b748 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -900,7 +900,7 @@ static int trigger_sbr(struct hfi1_devdata *dd)
 	 * delay after a reset is required.  Per spec requirements,
 	 * the link is either working or not after that point.
 	 */
-	return pci_reset_bus(dev);
+	return __pci_reset_function_locked(dev, PCI_RESET_LINK);
 }
 
 /*
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-05 16:08 ` [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type Sinan Kaya
@ 2018-09-05 16:17   ` Doug Ledford
  2018-09-05 17:21   ` Dennis Dalessandro
  2018-09-12  1:09   ` Bjorn Helgaas
  2 siblings, 0 replies; 13+ messages in thread
From: Doug Ledford @ 2018-09-05 16:17 UTC (permalink / raw)
  To: Sinan Kaya, linux-pci
  Cc: Mike Marciniszyn, Dennis Dalessandro, Jason Gunthorpe

[-- Attachment #1: Type: text/plain, Size: 1283 bytes --]

On Wed, 2018-09-05 at 16:08 +0000, Sinan Kaya wrote:
> pci_reset_bus() is being called from the probe context and causes
> a deadlock since pci_reset_bus() also tries to obtain kernel lock.
> 
> Use __pci_reset_function_locked() that provides locking guarantees.
> 
> Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
> Signed-off-by: Sinan Kaya <okaya@kernel.org>

Acked-by: Doug Ledford <dledford@redhat.com>

> ---
>  drivers/infiniband/hw/hfi1/pcie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
> index eec83757d55f..13162289b748 100644
> --- a/drivers/infiniband/hw/hfi1/pcie.c
> +++ b/drivers/infiniband/hw/hfi1/pcie.c
> @@ -900,7 +900,7 @@ static int trigger_sbr(struct hfi1_devdata *dd)
>  	 * delay after a reset is required.  Per spec requirements,
>  	 * the link is either working or not after that point.
>  	 */
> -	return pci_reset_bus(dev);
> +	return __pci_reset_function_locked(dev, PCI_RESET_LINK);
>  }
>  
>  /*

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-05 16:08 ` [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type Sinan Kaya
  2018-09-05 16:17   ` Doug Ledford
@ 2018-09-05 17:21   ` Dennis Dalessandro
  2018-09-12  1:09   ` Bjorn Helgaas
  2 siblings, 0 replies; 13+ messages in thread
From: Dennis Dalessandro @ 2018-09-05 17:21 UTC (permalink / raw)
  To: Sinan Kaya, linux-pci; +Cc: Mike Marciniszyn, Doug Ledford, Jason Gunthorpe

On 9/5/2018 12:08 PM, Sinan Kaya wrote:
> pci_reset_bus() is being called from the probe context and causes
> a deadlock since pci_reset_bus() also tries to obtain kernel lock.
> 
> Use __pci_reset_function_locked() that provides locking guarantees.
> 
> Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
> Signed-off-by: Sinan Kaya <okaya@kernel.org>

Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 2/3] PCI: Request reset type as part of function reset
  2018-09-05 16:08 ` [PATCH for-rc v2 2/3] PCI: Request reset type as part of function reset Sinan Kaya
@ 2018-09-05 23:07   ` Boris Ostrovsky
  0 siblings, 0 replies; 13+ messages in thread
From: Boris Ostrovsky @ 2018-09-05 23:07 UTC (permalink / raw)
  To: Sinan Kaya, linux-pci
  Cc: Derek Chickles, Satanand Burla, Felix Manlunas, Raghu Vatsavayi,
	David S. Miller, Bjorn Helgaas, Juergen Gross, Jia-Ju Bai

On 09/05/2018 12:08 PM, Sinan Kaya wrote:
> We need a contract between the reset API users and the PCI core about the
> types of reset that a user needs vs. what PCI core can do internally.
> If a platform supports hotplug, we need to do hotplug reset as an example.
>
> Expose the reset types to the drivers and try different reset types based
> on the new reset_type parameter.
>
> Most users are expected to use PCI_RESET_ANY, PCI_RESET_FUNC or
> PCI_RESET_LINK parameters.
>
> Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Sinan Kaya <okaya@kernel.org>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-05 16:08 ` [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type Sinan Kaya
  2018-09-05 16:17   ` Doug Ledford
  2018-09-05 17:21   ` Dennis Dalessandro
@ 2018-09-12  1:09   ` Bjorn Helgaas
  2018-09-12  1:23     ` Sinan Kaya
  2 siblings, 1 reply; 13+ messages in thread
From: Bjorn Helgaas @ 2018-09-12  1:09 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe, Alex Williamson

On Wed, Sep 05, 2018 at 04:08:05PM +0000, Sinan Kaya wrote:
> pci_reset_bus() is being called from the probe context and causes
> a deadlock since pci_reset_bus() also tries to obtain kernel lock.

This doesn't tell me what the deadlock is.

> Use __pci_reset_function_locked() that provides locking guarantees.

And the previous patch that adds the "reset_type" parameter doesn't
tell me anything about what locking guarantees it provides.

I want to merge minimal changes for v4.19 to fix the known problems.

It's not clear to me what actually broke hfi1.  You mention a deadlock
above, so I assume some locking change broke it, but 811c5cb37df4 isn't
obviously related to locking.

It's obvious that 811c5cb37df4 tested the return value of
pci_probe_reset_slot() incorrectly, so there's no problem with patch 1/1.

But patches 2 & 3:

  PCI: Request reset type as part of function reset
  IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type

do not connect the dots for me.

> Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
> Signed-off-by: Sinan Kaya <okaya@kernel.org>
> ---
>  drivers/infiniband/hw/hfi1/pcie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
> index eec83757d55f..13162289b748 100644
> --- a/drivers/infiniband/hw/hfi1/pcie.c
> +++ b/drivers/infiniband/hw/hfi1/pcie.c
> @@ -900,7 +900,7 @@ static int trigger_sbr(struct hfi1_devdata *dd)
>  	 * delay after a reset is required.  Per spec requirements,
>  	 * the link is either working or not after that point.
>  	 */
> -	return pci_reset_bus(dev);
> +	return __pci_reset_function_locked(dev, PCI_RESET_LINK);
>  }
>  
>  /*
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-12  1:09   ` Bjorn Helgaas
@ 2018-09-12  1:23     ` Sinan Kaya
  2018-09-12  1:59       ` Bjorn Helgaas
  0 siblings, 1 reply; 13+ messages in thread
From: Sinan Kaya @ 2018-09-12  1:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe, Alex Williamson

On 9/11/2018 9:09 PM, Bjorn Helgaas wrote:
> On Wed, Sep 05, 2018 at 04:08:05PM +0000, Sinan Kaya wrote:
>> pci_reset_bus() is being called from the probe context and causes
>> a deadlock since pci_reset_bus() also tries to obtain kernel lock.
> 
> This doesn't tell me what the deadlock is.

Sorry, let me try to set the context.

Driver probe routine is being called with pci_dev_lock() held.

pci_reset_bus() tries to obtain the pci_dev_lock() via this path one
more time while holding the lock.

pci_dev_lock()
	enter driver probe()
	pci_reset_bus()
		 __pci_reset_bus()
			pci_bus_save_and_disable()
				pci_dev_lock()

	exit driver probe()
pci_dev_unlock()


> 
>> Use __pci_reset_function_locked() that provides locking guarantees.
> 
> And the previous patch that adds the "reset_type" parameter doesn't
> tell me anything about what locking guarantees it provides.

This is sort of implied.

As the name of the function is __pci_reset_function_locked(), it is
assumed that the caller is responsible for obtaining the necessary
locks before calling this function.

> 
> I want to merge minimal changes for v4.19 to fix the known problems.
> 
> It's not clear to me what actually broke hfi1.  You mention a deadlock
> above, so I assume some locking change broke it, but 811c5cb37df4 isn't
> obviously related to locking.

Previous code was calling secondary bus reset function directly. With the
generalization of the code, we are going through device save and restore
as follows:

1. obtain device lock
2. save device state
3. release device lock
4. perform secondary bus reset
5. obtain device lock
6. restore device state
7. release device lock

We have introduced the locks in step #1, #3, #5 and #7 into the existing
code path.

> 
> It's obvious that 811c5cb37df4 tested the return value of
> pci_probe_reset_slot() incorrectly, so there's no problem with patch 1/1.
> 

yup.

> But patches 2 & 3:
> 
>    PCI: Request reset type as part of function reset
>    IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
> 
> do not connect the dots for me.

I hope the above explanation helps. The consensus after conversation with Alex
was to reuse the locked API but add masks so that requester can selectively
request a slot reset without any locks.

> 
>> Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
>> Signed-off-by: Sinan Kaya <okaya@kernel.org>
>> ---
>>   drivers/infiniband/hw/hfi1/pcie.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
>> index eec83757d55f..13162289b748 100644
>> --- a/drivers/infiniband/hw/hfi1/pcie.c
>> +++ b/drivers/infiniband/hw/hfi1/pcie.c
>> @@ -900,7 +900,7 @@ static int trigger_sbr(struct hfi1_devdata *dd)
>>   	 * delay after a reset is required.  Per spec requirements,
>>   	 * the link is either working or not after that point.
>>   	 */
>> -	return pci_reset_bus(dev);
>> +	return __pci_reset_function_locked(dev, PCI_RESET_LINK);
>>   }
>>   
>>   /*
>> -- 
>> 2.18.0
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-12  1:23     ` Sinan Kaya
@ 2018-09-12  1:59       ` Bjorn Helgaas
  2018-09-12  2:05         ` Sinan Kaya
                           ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2018-09-12  1:59 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: linux-pci, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe, Alex Williamson

Thanks for the quick response (and sorry for my late one).

On Tue, Sep 11, 2018 at 09:23:31PM -0400, Sinan Kaya wrote:
> On 9/11/2018 9:09 PM, Bjorn Helgaas wrote:
> > On Wed, Sep 05, 2018 at 04:08:05PM +0000, Sinan Kaya wrote:
> > > pci_reset_bus() is being called from the probe context and causes
> > > a deadlock since pci_reset_bus() also tries to obtain kernel lock.
> > 
> > This doesn't tell me what the deadlock is.
> 
> Sorry, let me try to set the context.
> 
> Driver probe routine is being called with pci_dev_lock() held.
> 
> pci_reset_bus() tries to obtain the pci_dev_lock() via this path one
> more time while holding the lock.
> 
> pci_dev_lock()
> 	enter driver probe()
> 	pci_reset_bus()
> 		 __pci_reset_bus()
> 			pci_bus_save_and_disable()
> 				pci_dev_lock()
> 
> 	exit driver probe()
> pci_dev_unlock()
> 
> 
> > 
> > > Use __pci_reset_function_locked() that provides locking guarantees.
> > 
> > And the previous patch that adds the "reset_type" parameter doesn't
> > tell me anything about what locking guarantees it provides.
> 
> This is sort of implied.

Uhh, maybe, but the implication is too subtle for me.  It adds a
"reset_type" that can be FLR, PM, SLOT, BUS, etc.  Those are different
types of resets, but those types say nothing about locking.

> As the name of the function is __pci_reset_function_locked(), it is
> assumed that the caller is responsible for obtaining the necessary
> locks before calling this function.
> 
> > 
> > I want to merge minimal changes for v4.19 to fix the known problems.
> > 
> > It's not clear to me what actually broke hfi1.  You mention a deadlock
> > above, so I assume some locking change broke it, but 811c5cb37df4 isn't
> > obviously related to locking.
> 
> Previous code was calling secondary bus reset function directly. With the

Ah, so it looks like 409888e0966e ("IB/hfi1: Use pci_try_reset_bus()
for initiating PCI Secondary Bus Reset") is an essential part of the
picture here.  That should probably be part of one of these
changelogs.

409888e0966e is where the locking change happened.  trigger_sbr()
previously called pci_reset_bridge_secondary_bus(), but after
409888e0966e, it calls pci_try_reset_bus(), which at the time called
pci_bus_trylock() *before* calling pci_reset_bridge_secondary_bus().

Can we just move the pci_bridge_secondary_bus_reset() decl back to
include/linux/pci.h temporarily and have trigger_sbr() call that
again?  I'd rather do that than try to squeeze some API rework into an
-rc.

> generalization of the code, we are going through device save and restore
> as follows:
> 
> 1. obtain device lock
> 2. save device state
> 3. release device lock
> 4. perform secondary bus reset
> 5. obtain device lock
> 6. restore device state
> 7. release device lock
> 
> We have introduced the locks in step #1, #3, #5 and #7 into the existing
> code path.
> 
> > 
> > It's obvious that 811c5cb37df4 tested the return value of
> > pci_probe_reset_slot() incorrectly, so there's no problem with patch 1/1.
> > 
> 
> yup.
> 
> > But patches 2 & 3:
> > 
> >    PCI: Request reset type as part of function reset
> >    IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
> > 
> > do not connect the dots for me.
> 
> I hope the above explanation helps. The consensus after conversation with Alex
> was to reuse the locked API but add masks so that requester can selectively
> request a slot reset without any locks.

I'm not buying this yet.  There's nothing in that API that says to me
"requesting certain types of resets doesn't acquire locks".

> > > Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
> > > Signed-off-by: Sinan Kaya <okaya@kernel.org>
> > > ---
> > >   drivers/infiniband/hw/hfi1/pcie.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
> > > index eec83757d55f..13162289b748 100644
> > > --- a/drivers/infiniband/hw/hfi1/pcie.c
> > > +++ b/drivers/infiniband/hw/hfi1/pcie.c
> > > @@ -900,7 +900,7 @@ static int trigger_sbr(struct hfi1_devdata *dd)
> > >   	 * delay after a reset is required.  Per spec requirements,
> > >   	 * the link is either working or not after that point.
> > >   	 */
> > > -	return pci_reset_bus(dev);
> > > +	return __pci_reset_function_locked(dev, PCI_RESET_LINK);
> > >   }
> > >   /*
> > > -- 
> > > 2.18.0
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-12  1:59       ` Bjorn Helgaas
@ 2018-09-12  2:05         ` Sinan Kaya
  2018-09-12  2:23         ` Dennis Dalessandro
  2018-09-12  2:34         ` Alex Williamson
  2 siblings, 0 replies; 13+ messages in thread
From: Sinan Kaya @ 2018-09-12  2:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe, Alex Williamson

On 9/11/2018 9:59 PM, Bjorn Helgaas wrote:
> Can we just move the pci_bridge_secondary_bus_reset() decl back to
> include/linux/pci.h temporarily and have trigger_sbr() call that
> again?  I'd rather do that than try to squeeze some API rework into an
> -rc.
yes, that was the initial proposal from Dennis. If you prefer to have
a simpler/temporary change instead for the moment, correct patches are:

https://patchwork.kernel.org/patch/10584283/
https://patchwork.kernel.org/patch/10584277/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-12  1:59       ` Bjorn Helgaas
  2018-09-12  2:05         ` Sinan Kaya
@ 2018-09-12  2:23         ` Dennis Dalessandro
  2018-09-12  2:34         ` Alex Williamson
  2 siblings, 0 replies; 13+ messages in thread
From: Dennis Dalessandro @ 2018-09-12  2:23 UTC (permalink / raw)
  To: Bjorn Helgaas, Sinan Kaya
  Cc: linux-pci, Mike Marciniszyn, Doug Ledford, Jason Gunthorpe,
	Alex Williamson

On 9/11/2018 9:59 PM, Bjorn Helgaas wrote:
> Can we just move the pci_bridge_secondary_bus_reset() decl back to
> include/linux/pci.h temporarily and have trigger_sbr() call that
> again?  I'd rather do that than try to squeeze some API rework into an
> -rc.
> 

That was what I was initially hoping for [1]. Just get things back to 
working for now, especially since we are approaching the later -rc's. 
That being said, I have reviewed and tested Sinan's patch. So far so 
good, but I do see the concern.

[1] https://marc.info/?l=linux-pci&m=153573732824788&w=2

-Denny

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-12  1:59       ` Bjorn Helgaas
  2018-09-12  2:05         ` Sinan Kaya
  2018-09-12  2:23         ` Dennis Dalessandro
@ 2018-09-12  2:34         ` Alex Williamson
  2018-09-12  2:47           ` Sinan Kaya
  2 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2018-09-12  2:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sinan Kaya, linux-pci, Mike Marciniszyn, Dennis Dalessandro,
	Doug Ledford, Jason Gunthorpe

On Tue, 11 Sep 2018 20:59:27 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> Thanks for the quick response (and sorry for my late one).
> 
> On Tue, Sep 11, 2018 at 09:23:31PM -0400, Sinan Kaya wrote:
> > On 9/11/2018 9:09 PM, Bjorn Helgaas wrote:  
> > > On Wed, Sep 05, 2018 at 04:08:05PM +0000, Sinan Kaya wrote:  
> > > > pci_reset_bus() is being called from the probe context and causes
> > > > a deadlock since pci_reset_bus() also tries to obtain kernel lock.  
> > > 
> > > This doesn't tell me what the deadlock is.  
> > 
> > Sorry, let me try to set the context.
> > 
> > Driver probe routine is being called with pci_dev_lock() held.
> > 
> > pci_reset_bus() tries to obtain the pci_dev_lock() via this path one
> > more time while holding the lock.
> > 
> > pci_dev_lock()
> > 	enter driver probe()
> > 	pci_reset_bus()
> > 		 __pci_reset_bus()
> > 			pci_bus_save_and_disable()
> > 				pci_dev_lock()
> > 
> > 	exit driver probe()
> > pci_dev_unlock()
> > 
> >   
> > >   
> > > > Use __pci_reset_function_locked() that provides locking guarantees.  
> > > 
> > > And the previous patch that adds the "reset_type" parameter doesn't
> > > tell me anything about what locking guarantees it provides.  
> > 
> > This is sort of implied.  
> 
> Uhh, maybe, but the implication is too subtle for me.  It adds a
> "reset_type" that can be FLR, PM, SLOT, BUS, etc.  Those are different
> types of resets, but those types say nothing about locking.
> 
> > As the name of the function is __pci_reset_function_locked(), it is
> > assumed that the caller is responsible for obtaining the necessary
> > locks before calling this function.
> >   
> > > 
> > > I want to merge minimal changes for v4.19 to fix the known problems.
> > > 
> > > It's not clear to me what actually broke hfi1.  You mention a deadlock
> > > above, so I assume some locking change broke it, but 811c5cb37df4 isn't
> > > obviously related to locking.  
> > 
> > Previous code was calling secondary bus reset function directly. With the  
> 
> Ah, so it looks like 409888e0966e ("IB/hfi1: Use pci_try_reset_bus()
> for initiating PCI Secondary Bus Reset") is an essential part of the
> picture here.  That should probably be part of one of these
> changelogs.
> 
> 409888e0966e is where the locking change happened.  trigger_sbr()
> previously called pci_reset_bridge_secondary_bus(), but after
> 409888e0966e, it calls pci_try_reset_bus(), which at the time called
> pci_bus_trylock() *before* calling pci_reset_bridge_secondary_bus().
> 
> Can we just move the pci_bridge_secondary_bus_reset() decl back to
> include/linux/pci.h temporarily and have trigger_sbr() call that
> again?  I'd rather do that than try to squeeze some API rework into an
> -rc.
> 
> > generalization of the code, we are going through device save and restore
> > as follows:
> > 
> > 1. obtain device lock
> > 2. save device state
> > 3. release device lock
> > 4. perform secondary bus reset
> > 5. obtain device lock
> > 6. restore device state
> > 7. release device lock
> > 
> > We have introduced the locks in step #1, #3, #5 and #7 into the existing
> > code path.
> >   
> > > 
> > > It's obvious that 811c5cb37df4 tested the return value of
> > > pci_probe_reset_slot() incorrectly, so there's no problem with patch 1/1.
> > >   
> > 
> > yup.
> >   
> > > But patches 2 & 3:
> > > 
> > >    PCI: Request reset type as part of function reset
> > >    IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
> > > 
> > > do not connect the dots for me.  
> > 
> > I hope the above explanation helps. The consensus after conversation with Alex
> > was to reuse the locked API but add masks so that requester can selectively
> > request a slot reset without any locks.  
> 
> I'm not buying this yet.  There's nothing in that API that says to me
> "requesting certain types of resets doesn't acquire locks".

The type of reset doesn't determine the locking, it's that the
_locked() version of pci_reset_function() requires the caller to
provide the locking and the "__" prefix version requires the caller to
handle the device save and restore.  So the hfi1 requirement was nearly
satisfied by the existing exported __pci_reset_function_locked(), but
they want a link level reset, not an FLR, but the hotplug or sbr resets
are the last options of all the pci_reset_function() choices
currently.  Therefore adding a reset type as a parameter fixes both the
locking issue and the device save/restore, which the hfi1 driver
duplicates, and still makes the hfi1 driver work better with hotplug
slots, where their previous sbr might trigger surprise hotplugs.

Additionally, the original series unified pci_reset_bus() such that the
caller no longer has control of whether a sbr or hotplug reset is used,
which is undesirable for vfio-pci (811c5cb37df4 PCI: Unify try slot and
bus reset API) which really wants to specify since we do a fair bit or
work making sure the user owns the right set of devices for the
available type of reset.  Abstracting this allows PCI core and vfio to
get out of sync.  The idea of allowing the caller to specify the type
of reset is applicable here as well, unify the function, but still allow
the caller to specify.

For rc, I'd certainly agree that a simple revert might be a better
option and this API can play out for v4.20, but I definitely want to
get back an API that allows vfio to specify the bus reset type before
implementations diverge.  Thanks,

Alex

> > > > Fixes: 811c5cb37df4 ("PCI: Unify try slot and bus reset API")
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985
> > > > Signed-off-by: Sinan Kaya <okaya@kernel.org>
> > > > ---
> > > >   drivers/infiniband/hw/hfi1/pcie.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
> > > > index eec83757d55f..13162289b748 100644
> > > > --- a/drivers/infiniband/hw/hfi1/pcie.c
> > > > +++ b/drivers/infiniband/hw/hfi1/pcie.c
> > > > @@ -900,7 +900,7 @@ static int trigger_sbr(struct hfi1_devdata *dd)
> > > >   	 * delay after a reset is required.  Per spec requirements,
> > > >   	 * the link is either working or not after that point.
> > > >   	 */
> > > > -	return pci_reset_bus(dev);
> > > > +	return __pci_reset_function_locked(dev, PCI_RESET_LINK);
> > > >   }
> > > >   /*
> > > > -- 
> > > > 2.18.0
> > > >   
> > >   
> >   

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
  2018-09-12  2:34         ` Alex Williamson
@ 2018-09-12  2:47           ` Sinan Kaya
  0 siblings, 0 replies; 13+ messages in thread
From: Sinan Kaya @ 2018-09-12  2:47 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas
  Cc: linux-pci, Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe

On 9/11/2018 10:34 PM, Alex Williamson wrote:
> For rc, I'd certainly agree that a simple revert might be a better
> option and this API can play out for v4.20, but I definitely want to
> get back an API that allows vfio to specify the bus reset type before
> implementations diverge.  Thanks,

I posted these per Alex's feedback:

https://patchwork.ozlabs.org/patch/967557/
https://patchwork.ozlabs.org/patch/967558/
https://patchwork.ozlabs.org/patch/967559/
https://patchwork.ozlabs.org/patch/967560/

Please feel free to comment on them targeting v4.20.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-09-12  7:01 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-05 16:08 [PATCH for-rc v2 1/3] PCI: Fix faulty logic in pci_reset_bus() Sinan Kaya
2018-09-05 16:08 ` [PATCH for-rc v2 2/3] PCI: Request reset type as part of function reset Sinan Kaya
2018-09-05 23:07   ` Boris Ostrovsky
2018-09-05 16:08 ` [PATCH for-rc v2 3/3] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type Sinan Kaya
2018-09-05 16:17   ` Doug Ledford
2018-09-05 17:21   ` Dennis Dalessandro
2018-09-12  1:09   ` Bjorn Helgaas
2018-09-12  1:23     ` Sinan Kaya
2018-09-12  1:59       ` Bjorn Helgaas
2018-09-12  2:05         ` Sinan Kaya
2018-09-12  2:23         ` Dennis Dalessandro
2018-09-12  2:34         ` Alex Williamson
2018-09-12  2:47           ` Sinan Kaya

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.