All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Limiting pci access requests
@ 2016-08-08 19:14 Keith Busch
  2016-08-08 19:14 ` [PATCH 1/3] pcie: Don't search capabilities on removed devices Keith Busch
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Keith Busch @ 2016-08-08 19:14 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Wei Zhang, Jens Axboe, Keith Busch

We observe that error handling and device hot removal creates many
unnecessary config and memory accesses to devices, some of which are not
even present. While we expect command processing to proceed, we observe
on various platforms that the unnecessary accesses create instability
with hardware performing completion synthesis, and slows down handling
of such error events as well as normal IO processing.

This patch series aims to reduce unnecessary accesses.

Keith Busch (3):
  pcie: Don't search capabilities on removed devices
  pci/msix: Skip disabling removed devices
  pcie/aer: Cache capability position

 drivers/pci/msi.c             |  5 +++++
 drivers/pci/pci.c             |  2 +-
 drivers/pci/pcie/aer/aerdrv.c | 38 +++++++++++++++++---------------------
 drivers/pci/pcie/aer/aerdrv.h |  1 +
 4 files changed, 24 insertions(+), 22 deletions(-)

-- 
2.7.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] pcie: Don't search capabilities on removed devices
  2016-08-08 19:14 [PATCH 0/3] Limiting pci access requests Keith Busch
@ 2016-08-08 19:14 ` Keith Busch
  2016-08-18 22:38   ` Bjorn Helgaas
  2016-08-08 19:14 ` [PATCH 2/3] pci/msix: Skip disabling " Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-08-08 19:14 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Wei Zhang, Jens Axboe, Keith Busch

This patch returns immediately if trying to find a pcie capability
on a removed device, as seen with an all 1's completion from config
read. Previously this function would iterate the maximum 480 times to
search for a capability at position 0xffc. There is never a case where
we'd expect all 1's to a successful config read on a capability register,
so this is a safe criteria to check before bailing on the device.

While accessing a removed device shouldn't be fatal, it's doesn't
accomplish anything. Instead, the code was testing completion synthesis
capabilities which is observed to cause distruption to normal operations.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..e884608 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -331,7 +331,7 @@ int pci_find_next_ext_capability(struct pci_dev *dev, int start, int cap)
 	if (header == 0)
 		return 0;
 
-	while (ttl-- > 0) {
+	while (ttl-- > 0 && header != -1) {
 		if (PCI_EXT_CAP_ID(header) == cap && pos != start)
 			return pos;
 
-- 
2.7.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] pci/msix: Skip disabling removed devices
  2016-08-08 19:14 [PATCH 0/3] Limiting pci access requests Keith Busch
  2016-08-08 19:14 ` [PATCH 1/3] pcie: Don't search capabilities on removed devices Keith Busch
@ 2016-08-08 19:14 ` Keith Busch
  2016-08-18 23:29   ` Bjorn Helgaas
  2016-08-08 19:14 ` [PATCH 3/3] pcie/aer: Cache capability position Keith Busch
  2016-08-09 17:36 ` [PATCH 0/3] Limiting pci access requests Bjorn Helgaas
  3 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-08-08 19:14 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Wei Zhang, Jens Axboe, Keith Busch

There is no need to disable MSIx interrupts on a device that doesn't
exist.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/msi.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a02981e..b60ee25 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -999,6 +999,11 @@ void pci_msix_shutdown(struct pci_dev *dev)
 	if (!pci_msi_enable || !dev || !dev->msix_enabled)
 		return;
 
+	if (!pci_device_is_present(dev)) {
+		dev->msix_enabled = 0;
+		return;
+	}
+
 	/* Return the device with MSI-X masked as initial states */
 	for_each_pci_msi_entry(entry, dev) {
 		/* Keep cached states to be restored */
-- 
2.7.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] pcie/aer: Cache capability position
  2016-08-08 19:14 [PATCH 0/3] Limiting pci access requests Keith Busch
  2016-08-08 19:14 ` [PATCH 1/3] pcie: Don't search capabilities on removed devices Keith Busch
  2016-08-08 19:14 ` [PATCH 2/3] pci/msix: Skip disabling " Keith Busch
@ 2016-08-08 19:14 ` Keith Busch
  2016-08-09 17:33   ` Bjorn Helgaas
  2016-08-09 17:36 ` [PATCH 0/3] Limiting pci access requests Bjorn Helgaas
  3 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-08-08 19:14 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas; +Cc: Wei Zhang, Jens Axboe, Keith Busch

This saves the postition of the error reporting capability so that it
doesn't need to be rediscovered during error handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer/aerdrv.c | 38 +++++++++++++++++---------------------
 drivers/pci/pcie/aer/aerdrv.h |  1 +
 2 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 48d21e0..d05c864 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -122,7 +122,6 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
 static void aer_enable_rootport(struct aer_rpc *rpc)
 {
 	struct pci_dev *pdev = rpc->rpd->port;
-	int aer_pos;
 	u16 reg16;
 	u32 reg32;
 
@@ -134,14 +133,13 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
 	pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
 				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
 
-	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
 	/* Clear error status */
-	pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, reg32);
-	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, reg32);
-	pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
-	pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
+	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, &reg32);
+	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, reg32);
+	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_COR_STATUS, &reg32);
+	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_COR_STATUS, reg32);
+	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_UNCOR_STATUS, &reg32);
+	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_UNCOR_STATUS, reg32);
 
 	/*
 	 * Enable error reporting for the root port device and downstream port
@@ -150,9 +148,9 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
 	set_downstream_devices_error_reporting(pdev, true);
 
 	/* Enable Root Port's interrupt in response to error messages */
-	pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32);
+	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, &reg32);
 	reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
-	pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, reg32);
+	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, reg32);
 }
 
 /**
@@ -165,7 +163,6 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
 {
 	struct pci_dev *pdev = rpc->rpd->port;
 	u32 reg32;
-	int pos;
 
 	/*
 	 * Disable error reporting for the root port device and downstream port
@@ -173,15 +170,14 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
 	 */
 	set_downstream_devices_error_reporting(pdev, false);
 
-	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
 	/* Disable Root's interrupt in response to error messages */
-	pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, &reg32);
+	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, &reg32);
 	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
-	pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, reg32);
+	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, reg32);
 
 	/* Clear Root's error status reg */
-	pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, &reg32);
-	pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, reg32);
+	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, &reg32);
+	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, reg32);
 }
 
 /**
@@ -198,9 +194,7 @@ irqreturn_t aer_irq(int irq, void *context)
 	struct aer_rpc *rpc = get_service_data(pdev);
 	int next_prod_idx;
 	unsigned long flags;
-	int pos;
 
-	pos = pci_find_ext_capability(pdev->port, PCI_EXT_CAP_ID_ERR);
 	/*
 	 * Must lock access to Root Error Status Reg, Root Error ID Reg,
 	 * and Root error producer/consumer index
@@ -208,15 +202,15 @@ irqreturn_t aer_irq(int irq, void *context)
 	spin_lock_irqsave(&rpc->e_lock, flags);
 
 	/* Read error status */
-	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, &status);
+	pci_read_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_STATUS, &status);
 	if (!(status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) {
 		spin_unlock_irqrestore(&rpc->e_lock, flags);
 		return IRQ_NONE;
 	}
 
 	/* Read error source and clear error status */
-	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_ERR_SRC, &id);
-	pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, status);
+	pci_read_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_ERR_SRC, &id);
+	pci_write_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_STATUS, status);
 
 	/* Store error source for later DPC handler */
 	next_prod_idx = rpc->prod_idx + 1;
@@ -317,6 +311,8 @@ static int aer_probe(struct pcie_device *dev)
 		return -ENOMEM;
 	}
 
+	rpc->pos = pci_find_ext_capability(dev->port, PCI_EXT_CAP_ID_ERR);
+
 	/* Request IRQ ISR */
 	status = request_irq(dev->irq, aer_irq, IRQF_SHARED, "aerdrv", dev);
 	if (status) {
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 945c939..d7c586e 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -72,6 +72,7 @@ struct aer_rpc {
 					 * recovery on the same
 					 * root port hierarchy
 					 */
+	int pos;			/* position of AER capability */
 };
 
 struct aer_broadcast_data {
-- 
2.7.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] pcie/aer: Cache capability position
  2016-08-08 19:14 ` [PATCH 3/3] pcie/aer: Cache capability position Keith Busch
@ 2016-08-09 17:33   ` Bjorn Helgaas
  2016-09-06 21:05     ` Jon Derrick
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2016-08-09 17:33 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

Hi Keith,

On Mon, Aug 08, 2016 at 01:14:27PM -0600, Keith Busch wrote:
> This saves the postition of the error reporting capability so that it
> doesn't need to be rediscovered during error handling.

I like the idea of this, and I wonder if we should go even further,
and cache aer_pos in the pci_dev.  There are a zillion places that
look for PCI_EXT_CAP_ID_ERR, and several of them are outside the AER
driver, where it would be inconvenient to look up the aer_rpc.

That would probably mean adding a hook to pci_init_capabilities(),
which would be kind of nice anyway because we already have a call to
pci_cleanup_aer_error_status_regs() there, which is sort of dangling
and not parallel to the other calls.

> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/aer/aerdrv.c | 38 +++++++++++++++++---------------------
>  drivers/pci/pcie/aer/aerdrv.h |  1 +
>  2 files changed, 18 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
> index 48d21e0..d05c864 100644
> --- a/drivers/pci/pcie/aer/aerdrv.c
> +++ b/drivers/pci/pcie/aer/aerdrv.c
> @@ -122,7 +122,6 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
>  static void aer_enable_rootport(struct aer_rpc *rpc)
>  {
>  	struct pci_dev *pdev = rpc->rpd->port;

What if you did this instead:

 -	int aer_pos;
 +	int aer_pos = rpc->pos;

Then all the config accesses wouldn't have to change.

>  	u16 reg16;
>  	u32 reg32;
>  
> @@ -134,14 +133,13 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>  	pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
>  				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
>  
> -	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
>  	/* Clear error status */
> -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, &reg32);
> -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, reg32);
> -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, &reg32);
> -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, reg32);
> -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
> -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
> +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, &reg32);
> +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, reg32);
> +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_COR_STATUS, &reg32);
> +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_COR_STATUS, reg32);
> +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_UNCOR_STATUS, &reg32);
> +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_UNCOR_STATUS, reg32);
>  
>  	/*
>  	 * Enable error reporting for the root port device and downstream port
> @@ -150,9 +148,9 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
>  	set_downstream_devices_error_reporting(pdev, true);
>  
>  	/* Enable Root Port's interrupt in response to error messages */
> -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32);
> +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, &reg32);
>  	reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, reg32);
> +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, reg32);
>  }
>  
>  /**
> @@ -165,7 +163,6 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
>  {
>  	struct pci_dev *pdev = rpc->rpd->port;
>  	u32 reg32;
> -	int pos;
>  
>  	/*
>  	 * Disable error reporting for the root port device and downstream port
> @@ -173,15 +170,14 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
>  	 */
>  	set_downstream_devices_error_reporting(pdev, false);
>  
> -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
>  	/* Disable Root's interrupt in response to error messages */
> -	pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, &reg32);
> +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, &reg32);
>  	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> -	pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, reg32);
> +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, reg32);
>  
>  	/* Clear Root's error status reg */
> -	pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, &reg32);
> -	pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, reg32);
> +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, &reg32);
> +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, reg32);
>  }
>  
>  /**
> @@ -198,9 +194,7 @@ irqreturn_t aer_irq(int irq, void *context)
>  	struct aer_rpc *rpc = get_service_data(pdev);
>  	int next_prod_idx;
>  	unsigned long flags;
> -	int pos;
>  
> -	pos = pci_find_ext_capability(pdev->port, PCI_EXT_CAP_ID_ERR);
>  	/*
>  	 * Must lock access to Root Error Status Reg, Root Error ID Reg,
>  	 * and Root error producer/consumer index
> @@ -208,15 +202,15 @@ irqreturn_t aer_irq(int irq, void *context)
>  	spin_lock_irqsave(&rpc->e_lock, flags);
>  
>  	/* Read error status */
> -	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, &status);
> +	pci_read_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_STATUS, &status);
>  	if (!(status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) {
>  		spin_unlock_irqrestore(&rpc->e_lock, flags);
>  		return IRQ_NONE;
>  	}
>  
>  	/* Read error source and clear error status */
> -	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_ERR_SRC, &id);
> -	pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, status);
> +	pci_read_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_ERR_SRC, &id);
> +	pci_write_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_STATUS, status);
>  
>  	/* Store error source for later DPC handler */
>  	next_prod_idx = rpc->prod_idx + 1;
> @@ -317,6 +311,8 @@ static int aer_probe(struct pcie_device *dev)
>  		return -ENOMEM;
>  	}
>  
> +	rpc->pos = pci_find_ext_capability(dev->port, PCI_EXT_CAP_ID_ERR);
> +
>  	/* Request IRQ ISR */
>  	status = request_irq(dev->irq, aer_irq, IRQF_SHARED, "aerdrv", dev);
>  	if (status) {
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index 945c939..d7c586e 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -72,6 +72,7 @@ struct aer_rpc {
>  					 * recovery on the same
>  					 * root port hierarchy
>  					 */
> +	int pos;			/* position of AER capability */
>  };
>  
>  struct aer_broadcast_data {
> -- 
> 2.7.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Limiting pci access requests
  2016-08-08 19:14 [PATCH 0/3] Limiting pci access requests Keith Busch
                   ` (2 preceding siblings ...)
  2016-08-08 19:14 ` [PATCH 3/3] pcie/aer: Cache capability position Keith Busch
@ 2016-08-09 17:36 ` Bjorn Helgaas
  2016-08-09 18:56   ` Keith Busch
  3 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2016-08-09 17:36 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

Hi Keith,

On Mon, Aug 08, 2016 at 01:14:24PM -0600, Keith Busch wrote:
> We observe that error handling and device hot removal creates many
> unnecessary config and memory accesses to devices, some of which are not
> even present. While we expect command processing to proceed, we observe
> on various platforms that the unnecessary accesses create instability
> with hardware performing completion synthesis, and slows down handling
> of such error events as well as normal IO processing.

Is there some hot removal path that we've suddenly starting exercising
more than we used to?  Can you give us any details of that?  I'm
wondering if there are any more generic fixes we can make.  These
patches seem good, but a little piece-meal, so it feels like there
could be more places where we trip over similar issues.

> This patch series aims to reduce unnecessary accesses.
> 
> Keith Busch (3):
>   pcie: Don't search capabilities on removed devices
>   pci/msix: Skip disabling removed devices
>   pcie/aer: Cache capability position
> 
>  drivers/pci/msi.c             |  5 +++++
>  drivers/pci/pci.c             |  2 +-
>  drivers/pci/pcie/aer/aerdrv.c | 38 +++++++++++++++++---------------------
>  drivers/pci/pcie/aer/aerdrv.h |  1 +
>  4 files changed, 24 insertions(+), 22 deletions(-)
> 
> -- 
> 2.7.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Limiting pci access requests
  2016-08-09 18:56   ` Keith Busch
@ 2016-08-09 18:56     ` Lukas Wunner
  2016-08-17 21:05       ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Lukas Wunner @ 2016-08-09 18:56 UTC (permalink / raw)
  To: Keith Busch; +Cc: Bjorn Helgaas, linux-pci, Wei Zhang, Jens Axboe

On Tue, Aug 09, 2016 at 02:56:54PM -0400, Keith Busch wrote:
> On Tue, Aug 09, 2016 at 12:36:33PM -0500, Bjorn Helgaas wrote:
> > On Mon, Aug 08, 2016 at 01:14:24PM -0600, Keith Busch wrote:
> > > We observe that error handling and device hot removal creates many
> > > unnecessary config and memory accesses to devices, some of which are not
> > > even present. While we expect command processing to proceed, we observe
> > > on various platforms that the unnecessary accesses create instability
> > > with hardware performing completion synthesis, and slows down handling
> > > of such error events as well as normal IO processing.
> > 
> > Is there some hot removal path that we've suddenly starting exercising
> > more than we used to?  Can you give us any details of that?  I'm
> > wondering if there are any more generic fixes we can make.  These
> > patches seem good, but a little piece-meal, so it feels like there
> > could be more places where we trip over similar issues.
> 
> This series came from testing JBODs of PCIe SSDs. I think the main
> difference with this setup compared to most other PCIe testing is the
> sheer number of simultaneous add + remove + error events while running
> continuous IO. We're not hitting any new code paths in the kernel, but
> we are discovering interesting software and hardware interactions that
> were likely less reachable before such testing.
> 
> There are still more places that we can remove unnecessary config and
> MMIO, though they're just micro-improvements compared to this series.
> Even those just repeat the same pattern of looking for a -1 completion
> or false return from "pci_device_is_present". So the "fixes" do look
> tedious and piecemeal, but I didn't see how else we could do it. Any
> thoughts or guidance is much appreciated.

FWIW, similar checks were added to pciehp with commit 1469d17dd341
("PCI: pciehp: Handle invalid data when reading from non-existent
devices"). So the general idea to handle such faults is already
present in the kernel, the only improvement I could see here would
be to harmonize (i.e. make identical everywhere) the way this is
coded (check for ~0) as well as the message logged with KERN_INFO
(your patches do not log a message at all AFAICS).

Best regards,

Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Limiting pci access requests
  2016-08-09 17:36 ` [PATCH 0/3] Limiting pci access requests Bjorn Helgaas
@ 2016-08-09 18:56   ` Keith Busch
  2016-08-09 18:56     ` Lukas Wunner
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-08-09 18:56 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

On Tue, Aug 09, 2016 at 12:36:33PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 08, 2016 at 01:14:24PM -0600, Keith Busch wrote:
> > We observe that error handling and device hot removal creates many
> > unnecessary config and memory accesses to devices, some of which are not
> > even present. While we expect command processing to proceed, we observe
> > on various platforms that the unnecessary accesses create instability
> > with hardware performing completion synthesis, and slows down handling
> > of such error events as well as normal IO processing.
> 
> Is there some hot removal path that we've suddenly starting exercising
> more than we used to?  Can you give us any details of that?  I'm
> wondering if there are any more generic fixes we can make.  These
> patches seem good, but a little piece-meal, so it feels like there
> could be more places where we trip over similar issues.

Hi Bjorn,

This series came from testing JBODs of PCIe SSDs. I think the main
difference with this setup compared to most other PCIe testing is the
sheer number of simultaneous add + remove + error events while running
continuous IO. We're not hitting any new code paths in the kernel, but
we are discovering interesting software and hardware interactions that
were likely less reachable before such testing.

There are still more places that we can remove unnecessary config and
MMIO, though they're just micro-improvements compared to this series.
Even those just repeat the same pattern of looking for a -1 completion
or false return from "pci_device_is_present". So the "fixes" do look
tedious and piecemeal, but I didn't see how else we could do it. Any
thoughts or guidance is much appreciated.

Thanks,
Keith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Limiting pci access requests
  2016-08-09 18:56     ` Lukas Wunner
@ 2016-08-17 21:05       ` Keith Busch
  2016-08-18 14:02         ` Lukas Wunner
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-08-17 21:05 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Bjorn Helgaas, linux-pci, Wei Zhang, Jens Axboe

On Tue, Aug 09, 2016 at 08:56:28PM +0200, Lukas Wunner wrote:
> On Tue, Aug 09, 2016 at 02:56:54PM -0400, Keith Busch wrote:
> > There are still more places that we can remove unnecessary config and
> > MMIO, though they're just micro-improvements compared to this series.
> > Even those just repeat the same pattern of looking for a -1 completion
> > or false return from "pci_device_is_present". So the "fixes" do look
> > tedious and piecemeal, but I didn't see how else we could do it. Any
> > thoughts or guidance is much appreciated.
> 
> FWIW, similar checks were added to pciehp with commit 1469d17dd341
> ("PCI: pciehp: Handle invalid data when reading from non-existent
> devices"). So the general idea to handle such faults is already
> present in the kernel, the only improvement I could see here would
> be to harmonize (i.e. make identical everywhere) the way this is
> coded (check for ~0) as well as the message logged with KERN_INFO
> (your patches do not log a message at all AFAICS).

AFAICT, the only thing we can do is have every caller of
pci_read_config_*, pci_bus_read_config_*, and pcie_capability_read_*
check for ~0 completion, and handle accordingly. Is that what you mean
by making this identical everywhere?  That is a lot of places to fix! :)

I could write a coccinelle pattern for that, but the actual handling
would need to be specific to where it's called, and it may accidently
treat a legit all 1's completion as an error.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Limiting pci access requests
  2016-08-17 21:05       ` Keith Busch
@ 2016-08-18 14:02         ` Lukas Wunner
  2016-08-18 16:05           ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Lukas Wunner @ 2016-08-18 14:02 UTC (permalink / raw)
  To: Keith Busch; +Cc: Bjorn Helgaas, linux-pci, Wei Zhang, Jens Axboe

On Wed, Aug 17, 2016 at 05:05:40PM -0400, Keith Busch wrote:
> On Tue, Aug 09, 2016 at 08:56:28PM +0200, Lukas Wunner wrote:
> > On Tue, Aug 09, 2016 at 02:56:54PM -0400, Keith Busch wrote:
> > > There are still more places that we can remove unnecessary config and
> > > MMIO, though they're just micro-improvements compared to this series.
> > > Even those just repeat the same pattern of looking for a -1 completion
> > > or false return from "pci_device_is_present". So the "fixes" do look
> > > tedious and piecemeal, but I didn't see how else we could do it. Any
> > > thoughts or guidance is much appreciated.
> > 
> > FWIW, similar checks were added to pciehp with commit 1469d17dd341
> > ("PCI: pciehp: Handle invalid data when reading from non-existent
> > devices"). So the general idea to handle such faults is already
> > present in the kernel, the only improvement I could see here would
> > be to harmonize (i.e. make identical everywhere) the way this is
> > coded (check for ~0) as well as the message logged with KERN_INFO
> > (your patches do not log a message at all AFAICS).
> 
> AFAICT, the only thing we can do is have every caller of
> pci_read_config_*, pci_bus_read_config_*, and pcie_capability_read_*
> check for ~0 completion, and handle accordingly. Is that what you mean
> by making this identical everywhere?  That is a lot of places to fix! :)

Littering the code with such checks won't improve its readability
so we ought to put that where it's really needed. I just meant that
the way the check is written in the code and the accompanying log
message should probably be made identical to improve readability
and clarity.

TBH I don't understand exactly how you trigger these errors. Re-reading
patch [1/3], I can only guess that pci_find_next_ext_capability() might
be called from aerdrv via pci_find_ext_capability(). The other patches
concern aerdrv, so that guess seems plausible. Explaining the call stack
would be helpful.

How is it possible that a device is accessed that no longer exists?
Are these (native) pciehp ports and the attached pci_dev isn't torn
down quickly enough? Do we need some kind of locking or an atomic flag
that prevents accesses to devices until they're torn down completely?

Since your patches pertain to aerdrv, do we need synchronization between
the pciehp and aer drivers so that aer doesn't touch a device for which
pciehp has sensed removal? (Is the interrupt shared between pciehp and
aerdrv?)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Limiting pci access requests
  2016-08-18 14:02         ` Lukas Wunner
@ 2016-08-18 16:05           ` Keith Busch
  2016-08-18 16:59             ` Lukas Wunner
  0 siblings, 1 reply; 17+ messages in thread
From: Keith Busch @ 2016-08-18 16:05 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Bjorn Helgaas, linux-pci, Wei Zhang, Jens Axboe

On Thu, Aug 18, 2016 at 04:02:13PM +0200, Lukas Wunner wrote:
> TBH I don't understand exactly how you trigger these errors. Re-reading
> patch [1/3], I can only guess that pci_find_next_ext_capability() might
> be called from aerdrv via pci_find_ext_capability(). The other patches
> concern aerdrv, so that guess seems plausible. Explaining the call stack
> would be helpful.

Yes, I believe the majority of what we observed in protocol traces did
originate from the aer driver.

pci_find_capability already breaks early on an all 1's, so patch 1/3
just provides the same benefit to searching the extended capabilities.

Caching the AER capability may obviate the immediate benefit of 1/3,
but I think it's still a good change.
 
> How is it possible that a device is accessed that no longer exists?

Surprise hot removal.

> Are these (native) pciehp ports and the attached pci_dev isn't torn
> down quickly enough? Do we need some kind of locking or an atomic flag
> that prevents accesses to devices until they're torn down completely?

Tearing down a device and unbinding it from a driver generates lots of
additional accesses. Patch 2/3 removes MSI-x teardown which was one of
the larger sources of config and MMIO access to a non-existent device.

There are others, too. Heck, even checking if the device is present
(pci_device_is_present) generates config access to the removed device. :)

What do you think about adding a state to the pci_dev to say that it is
removed? The state can be set by pciehp or pcie-dpc if either detects
removal or link down, or on the first ~0 completion. Then have the
teardown check for the removal state before doing orderly device removal.

> Since your patches pertain to aerdrv, do we need synchronization between
> the pciehp and aer drivers so that aer doesn't touch a device for which
> pciehp has sensed removal? (Is the interrupt shared between pciehp and
> aerdrv?)

pciehp and aerdrv can share an interrupt on root ports, but that's it.
The aer driver, though, does access every device in its sub-tree.
There's also pciehp and pcie-dpc that could benifit from coordination.

I can look into these, but it's much less trivial than these incremental
improvements. I'm hoping we can clean up these biggest offenders first
before attempting a more risky synchronization among the different
services.

I have a v2 series ready that addresses Bjorn's comment on caching AER cap
position, but that's the only difference I've accumulated so far from v1.

Thanks,
 Keith

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] Limiting pci access requests
  2016-08-18 16:05           ` Keith Busch
@ 2016-08-18 16:59             ` Lukas Wunner
  0 siblings, 0 replies; 17+ messages in thread
From: Lukas Wunner @ 2016-08-18 16:59 UTC (permalink / raw)
  To: Keith Busch; +Cc: Bjorn Helgaas, linux-pci, Wei Zhang, Jens Axboe

On Thu, Aug 18, 2016 at 12:05:16PM -0400, Keith Busch wrote:
> On Thu, Aug 18, 2016 at 04:02:13PM +0200, Lukas Wunner wrote:
> > How is it possible that a device is accessed that no longer exists?
> 
> Surprise hot removal.
> 
> > Are these (native) pciehp ports and the attached pci_dev isn't torn
> > down quickly enough? Do we need some kind of locking or an atomic flag
> > that prevents accesses to devices until they're torn down completely?
> 
> Tearing down a device and unbinding it from a driver generates lots of
> additional accesses. Patch 2/3 removes MSI-x teardown which was one of
> the larger sources of config and MMIO access to a non-existent device.
> 
> There are others, too. Heck, even checking if the device is present
> (pci_device_is_present) generates config access to the removed device. :)
> 
> What do you think about adding a state to the pci_dev to say that it is
> removed? The state can be set by pciehp or pcie-dpc if either detects
> removal or link down, or on the first ~0 completion. Then have the
> teardown check for the removal state before doing orderly device removal.

Exactly.

Attribute names that come to mind: "removed", "hot_removed",
"surprise_removed", perhaps with an "is_" prefix.

In principle this could be checked at the lowest level when
accessing config space in drivers/pci/access.c, and immediately
return ~0. With the check wrapped in unlikely().

aerdrv is not the only driver that has trouble with surprise removal:
Unplugging the Thunderbolt Ethernet adapter on a Mac while the interface
is up currently causes a lockup in the tg3 driver. Same with nouveau,
which often queries a timer on the GPU and ends up in an infinite loop
if the timer readout returns with -1. If the drivers could sense hot
removal by querying a flag, they could react accordingly in their
->remove hook, so this would be a real improvement.

> > Since your patches pertain to aerdrv, do we need synchronization between
> > the pciehp and aer drivers so that aer doesn't touch a device for which
> > pciehp has sensed removal? (Is the interrupt shared between pciehp and
> > aerdrv?)
> 
> pciehp and aerdrv can share an interrupt on root ports, but that's it.
> The aer driver, though, does access every device in its sub-tree.
> There's also pciehp and pcie-dpc that could benifit from coordination.
> 
> I can look into these, but it's much less trivial than these incremental
> improvements. I'm hoping we can clean up these biggest offenders first
> before attempting a more risky synchronization among the different
> services.

There's no synchronization necessary if there's just a flag to be
checked. Of course if aerdrv/dpc or other drivers need to react
immediately on hot removal, we'd need a separate ->hot_remove hook.

Best regards,

Lukas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] pcie: Don't search capabilities on removed devices
  2016-08-08 19:14 ` [PATCH 1/3] pcie: Don't search capabilities on removed devices Keith Busch
@ 2016-08-18 22:38   ` Bjorn Helgaas
  0 siblings, 0 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2016-08-18 22:38 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

Hi Keith,

On Mon, Aug 08, 2016 at 01:14:25PM -0600, Keith Busch wrote:
> This patch returns immediately if trying to find a pcie capability
> on a removed device, as seen with an all 1's completion from config
> read. Previously this function would iterate the maximum 480 times to
> search for a capability at position 0xffc. There is never a case where
> we'd expect all 1's to a successful config read on a capability register,
> so this is a safe criteria to check before bailing on the device.

I'm nothing if not pedantic, so I think we're talking about reading
PCIe Extended Capability Headers (PCIe r3.0, sec 7.9.3), and I don't
think the spec 100% guarantees that the following is invalid:

  PCIe Extended Capability ID == 0xffff
  Capability Version == 0xf
  Next Capability Offset = 0xfff

It's true that capabilities must be DWORD aligned and the low two bits
of the Next Capability Offset are currently reserved and must be
implemented as 00b, which is not quite the same as saying they will
*always* be zero for all devices, because the spec does explicitly
allow for future uses.

I don't see that Capability ID 0xffff is actually reserved, but sec
7.9.2 does hint at that for capabilities in a RCRB.

So I guess I agree that a 0xffffffff value is unlikely enough that we
can consider it invalid :)

> While accessing a removed device shouldn't be fatal, it's doesn't
> accomplish anything. Instead, the code was testing completion synthesis
> capabilities which is observed to cause distruption to normal operations.

I didn't quite parse this last sentence.

> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index aab9d51..e884608 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -331,7 +331,7 @@ int pci_find_next_ext_capability(struct pci_dev *dev, int start, int cap)
>  	if (header == 0)
>  		return 0;
>  
> -	while (ttl-- > 0) {
> +	while (ttl-- > 0 && header != -1) {

I would prefer to do this check right after the
pci_read_config_dword() instead of putting it in the loop control, and
I'd write the "-1" as 0xffffffff.  That way it's more obviously an
error condition.

I know that means duplicating the check, which is sort of a bummer.
Wonder if it's possible to restructure the loop so we only need one
pci_read_config_dword() call?

I'm not sure "ttl" is the most natural way of controlling the loop.
The spec merely requires these headers to be at offsets between
0x100 and 0xffc (or maybe even 0xff8 if capabilities must have data).

>  		if (PCI_EXT_CAP_ID(header) == cap && pos != start)
>  			return pos;
>  
> -- 
> 2.7.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] pci/msix: Skip disabling removed devices
  2016-08-08 19:14 ` [PATCH 2/3] pci/msix: Skip disabling " Keith Busch
@ 2016-08-18 23:29   ` Bjorn Helgaas
  2016-08-19 17:11     ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2016-08-18 23:29 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

On Mon, Aug 08, 2016 at 01:14:26PM -0600, Keith Busch wrote:
> There is no need to disable MSIx interrupts on a device that doesn't
> exist.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/msi.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index a02981e..b60ee25 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -999,6 +999,11 @@ void pci_msix_shutdown(struct pci_dev *dev)
>  	if (!pci_msi_enable || !dev || !dev->msix_enabled)
>  		return;
>  
> +	if (!pci_device_is_present(dev)) {

It doesn't really make sense to me to use pci_device_is_present()
(which calls pci_bus_read_dev_vendor_id()) for this purpose.

Adding a new config read and checking for failure seems like just
narrowing the window -- a device that stops responding between this
point and the next required config read could still cause a problem.

Is this fixing a performance problem?  What's the specific motivation
for this?

I see "completion synthesis" in your cover letter.  I don't know what
that is; maybe the capabilities cover letter would have made more
sense to me if I did.

And you also mention "instability with hardware" -- what exactly is
that?  I understand some slowdown if we do config accesses to
non-existent devices, but I don't understand hardware instability.

> +		dev->msix_enabled = 0;
> +		return;
> +	}
> +
>  	/* Return the device with MSI-X masked as initial states */
>  	for_each_pci_msi_entry(entry, dev) {
>  		/* Keep cached states to be restored */
> -- 
> 2.7.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] pci/msix: Skip disabling removed devices
  2016-08-18 23:29   ` Bjorn Helgaas
@ 2016-08-19 17:11     ` Keith Busch
  0 siblings, 0 replies; 17+ messages in thread
From: Keith Busch @ 2016-08-19 17:11 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

On Thu, Aug 18, 2016 at 06:29:41PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 08, 2016 at 01:14:26PM -0600, Keith Busch wrote:
> > @@ -999,6 +999,11 @@ void pci_msix_shutdown(struct pci_dev *dev)
> >  	if (!pci_msi_enable || !dev || !dev->msix_enabled)
> >  		return;
> >  
> > +	if (!pci_device_is_present(dev)) {
> 
> It doesn't really make sense to me to use pci_device_is_present()
> (which calls pci_bus_read_dev_vendor_id()) for this purpose.

Roger that.

Based of some feedback from Lukas, it sounds like we could add a new
state to pci_dev for "is_removed" that can be set by pciehp or pcie-dpc.
Does that sound reasonable? If so, I think we can use that criteria
instead.
 
> Adding a new config read and checking for failure seems like just
> narrowing the window -- a device that stops responding between this
> point and the next required config read could still cause a problem.

That's true, but it's not a window that concerns us in real life. It just
doesn't really occur that a driver is unbinding from a device right as you
hot remove it. The driver is unbinding from the device _because_ it was
hot removed, so presence is down long before we release the MSI vectors.

But if we want to close that gab, it should be trivial if we can add the
is_removed state to pci_dev.

> Is this fixing a performance problem?  What's the specific motivation
> for this?

When you hot remove a device, a non-posted command to that device must be
handled by something. This is what I meant by "completion synthesis",
though I must have pulled that terminology from PCI DPC.

This series plus some additional enhancements we've made sense the
initial posting significantly reduces total teardown time. We've seen
transactions to non-existent devices go from ~1000 down to just 1. These
take on the order of milliseconds for hardware to produce the completion
that allows software to proceed, so the many unnecessary commands add
up very quickly and noticeably degrades user experience.

Also, considering Linux PCI enumeration is serialized with a single mutex,
this becomes a very big deal when hot plugging entire trays of devices.

If you would like, I would be happy setup a conference call to discuss
this further. We have protocol traces showing timings that I think make
a very compelling case for reducing our software overhead.

> And you also mention "instability with hardware" -- what exactly is
> that?  I understand some slowdown if we do config accesses to
> non-existent devices, but I don't understand hardware instability.

I should have left that description out. I'm just a software engineer, and
the root causes you may be looking for are at a lower level than I grok.

But since I already went there, I'll just say occasionally completion
timeout doesn't quite work on time. That's not supposed to happen,
but we don't always get perfection out of devices. After millions of
non-posted commands to devices that don't exist, rare cases of too many
completion timeouts result in machine checking NMI.

I can only say empirically, Linux just works better if we don't
gratuitously rely on hardware to cleanup software's unnecessary
transactions. That observation alone doesn't justify a change, so I hope
the improved hot plug teardown can carry this patch concept forward.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] pcie/aer: Cache capability position
  2016-08-09 17:33   ` Bjorn Helgaas
@ 2016-09-06 21:05     ` Jon Derrick
  2016-09-06 21:18       ` Keith Busch
  0 siblings, 1 reply; 17+ messages in thread
From: Jon Derrick @ 2016-09-06 21:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

Hi Keith, I like really this patch and am looking forward to v2. Should aer_root_reset and aer_error_resume be changed as well?

Also is there any reason the same couldn't be done for any pci_dev? I'd like to have this optimization available for is_error_source(). 


On Tue, Aug 09, 2016 at 12:33:50PM -0500, Bjorn Helgaas wrote:
> Hi Keith,
> 
> On Mon, Aug 08, 2016 at 01:14:27PM -0600, Keith Busch wrote:
> > This saves the postition of the error reporting capability so that it
> > doesn't need to be rediscovered during error handling.
> 
> I like the idea of this, and I wonder if we should go even further,
> and cache aer_pos in the pci_dev.  There are a zillion places that
> look for PCI_EXT_CAP_ID_ERR, and several of them are outside the AER
> driver, where it would be inconvenient to look up the aer_rpc.
> 
> That would probably mean adding a hook to pci_init_capabilities(),
> which would be kind of nice anyway because we already have a call to
> pci_cleanup_aer_error_status_regs() there, which is sort of dangling
> and not parallel to the other calls.
> 
> > Signed-off-by: Keith Busch <keith.busch@intel.com>
> > ---
> >  drivers/pci/pcie/aer/aerdrv.c | 38 +++++++++++++++++---------------------
> >  drivers/pci/pcie/aer/aerdrv.h |  1 +
> >  2 files changed, 18 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
> > index 48d21e0..d05c864 100644
> > --- a/drivers/pci/pcie/aer/aerdrv.c
> > +++ b/drivers/pci/pcie/aer/aerdrv.c
> > @@ -122,7 +122,6 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
> >  static void aer_enable_rootport(struct aer_rpc *rpc)
> >  {
> >  	struct pci_dev *pdev = rpc->rpd->port;
> 
> What if you did this instead:
> 
>  -	int aer_pos;
>  +	int aer_pos = rpc->pos;
> 
> Then all the config accesses wouldn't have to change.
> 
> >  	u16 reg16;
> >  	u32 reg32;
> >  
> > @@ -134,14 +133,13 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
> >  	pcie_capability_clear_word(pdev, PCI_EXP_RTCTL,
> >  				   SYSTEM_ERROR_INTR_ON_MESG_MASK);
> >  
> > -	aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
> >  	/* Clear error status */
> > -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, &reg32);
> > -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, reg32);
> > -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, &reg32);
> > -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, reg32);
> > -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
> > -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
> > +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, &reg32);
> > +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, reg32);
> > +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_COR_STATUS, &reg32);
> > +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_COR_STATUS, reg32);
> > +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_UNCOR_STATUS, &reg32);
> > +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_UNCOR_STATUS, reg32);
> >  
> >  	/*
> >  	 * Enable error reporting for the root port device and downstream port
> > @@ -150,9 +148,9 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
> >  	set_downstream_devices_error_reporting(pdev, true);
> >  
> >  	/* Enable Root Port's interrupt in response to error messages */
> > -	pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32);
> > +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, &reg32);
> >  	reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> > -	pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, reg32);
> > +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, reg32);
> >  }
> >  
> >  /**
> > @@ -165,7 +163,6 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
> >  {
> >  	struct pci_dev *pdev = rpc->rpd->port;
> >  	u32 reg32;
> > -	int pos;
> >  
> >  	/*
> >  	 * Disable error reporting for the root port device and downstream port
> > @@ -173,15 +170,14 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
> >  	 */
> >  	set_downstream_devices_error_reporting(pdev, false);
> >  
> > -	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
> >  	/* Disable Root's interrupt in response to error messages */
> > -	pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, &reg32);
> > +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, &reg32);
> >  	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> > -	pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, reg32);
> > +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_COMMAND, reg32);
> >  
> >  	/* Clear Root's error status reg */
> > -	pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, &reg32);
> > -	pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, reg32);
> > +	pci_read_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, &reg32);
> > +	pci_write_config_dword(pdev, rpc->pos + PCI_ERR_ROOT_STATUS, reg32);
> >  }
> >  
> >  /**
> > @@ -198,9 +194,7 @@ irqreturn_t aer_irq(int irq, void *context)
> >  	struct aer_rpc *rpc = get_service_data(pdev);
> >  	int next_prod_idx;
> >  	unsigned long flags;
> > -	int pos;
> >  
> > -	pos = pci_find_ext_capability(pdev->port, PCI_EXT_CAP_ID_ERR);
> >  	/*
> >  	 * Must lock access to Root Error Status Reg, Root Error ID Reg,
> >  	 * and Root error producer/consumer index
> > @@ -208,15 +202,15 @@ irqreturn_t aer_irq(int irq, void *context)
> >  	spin_lock_irqsave(&rpc->e_lock, flags);
> >  
> >  	/* Read error status */
> > -	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, &status);
> > +	pci_read_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_STATUS, &status);
> >  	if (!(status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) {
> >  		spin_unlock_irqrestore(&rpc->e_lock, flags);
> >  		return IRQ_NONE;
> >  	}
> >  
> >  	/* Read error source and clear error status */
> > -	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_ERR_SRC, &id);
> > -	pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, status);
> > +	pci_read_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_ERR_SRC, &id);
> > +	pci_write_config_dword(pdev->port, rpc->pos + PCI_ERR_ROOT_STATUS, status);
> >  
> >  	/* Store error source for later DPC handler */
> >  	next_prod_idx = rpc->prod_idx + 1;
> > @@ -317,6 +311,8 @@ static int aer_probe(struct pcie_device *dev)
> >  		return -ENOMEM;
> >  	}
> >  
> > +	rpc->pos = pci_find_ext_capability(dev->port, PCI_EXT_CAP_ID_ERR);
> > +
> >  	/* Request IRQ ISR */
> >  	status = request_irq(dev->irq, aer_irq, IRQF_SHARED, "aerdrv", dev);
> >  	if (status) {
> > diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> > index 945c939..d7c586e 100644
> > --- a/drivers/pci/pcie/aer/aerdrv.h
> > +++ b/drivers/pci/pcie/aer/aerdrv.h
> > @@ -72,6 +72,7 @@ struct aer_rpc {
> >  					 * recovery on the same
> >  					 * root port hierarchy
> >  					 */
> > +	int pos;			/* position of AER capability */
> >  };
> >  
> >  struct aer_broadcast_data {
> > -- 
> > 2.7.2
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] pcie/aer: Cache capability position
  2016-09-06 21:05     ` Jon Derrick
@ 2016-09-06 21:18       ` Keith Busch
  0 siblings, 0 replies; 17+ messages in thread
From: Keith Busch @ 2016-09-06 21:18 UTC (permalink / raw)
  To: Jon Derrick
  Cc: Bjorn Helgaas, linux-pci, Bjorn Helgaas, Wei Zhang, Jens Axboe

On Tue, Sep 06, 2016 at 03:05:21PM -0600, Jon Derrick wrote:
> Hi Keith, I like really this patch and am looking forward to v2. Should aer_root_reset and aer_error_resume be changed as well?
> 
> Also is there any reason the same couldn't be done for any pci_dev? I'd like to have this optimization available for is_error_source(). 

Yeah, I think this can go in pci_dev like all the other cached
capabilities. I'll send out v2 of the series today.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-09-06 21:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-08 19:14 [PATCH 0/3] Limiting pci access requests Keith Busch
2016-08-08 19:14 ` [PATCH 1/3] pcie: Don't search capabilities on removed devices Keith Busch
2016-08-18 22:38   ` Bjorn Helgaas
2016-08-08 19:14 ` [PATCH 2/3] pci/msix: Skip disabling " Keith Busch
2016-08-18 23:29   ` Bjorn Helgaas
2016-08-19 17:11     ` Keith Busch
2016-08-08 19:14 ` [PATCH 3/3] pcie/aer: Cache capability position Keith Busch
2016-08-09 17:33   ` Bjorn Helgaas
2016-09-06 21:05     ` Jon Derrick
2016-09-06 21:18       ` Keith Busch
2016-08-09 17:36 ` [PATCH 0/3] Limiting pci access requests Bjorn Helgaas
2016-08-09 18:56   ` Keith Busch
2016-08-09 18:56     ` Lukas Wunner
2016-08-17 21:05       ` Keith Busch
2016-08-18 14:02         ` Lukas Wunner
2016-08-18 16:05           ` Keith Busch
2016-08-18 16:59             ` Lukas Wunner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.