linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] pci: export pcie_find_root_port
@ 2016-11-02 22:35 Johannes Thumshirn
  2016-11-02 22:35 ` [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't Johannes Thumshirn
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2016-11-02 22:35 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke,
	Johannes Thumshirn

Export pcie_find_root_port() so we can use it outside of PCIe-AER error
injection.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/pci/pcie/aer/aer_inject.c | 14 --------------
 include/linux/pci.h               | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/pcie/aer/aer_inject.c b/drivers/pci/pcie/aer/aer_inject.c
index db553dc..2b6a592 100644
--- a/drivers/pci/pcie/aer/aer_inject.c
+++ b/drivers/pci/pcie/aer/aer_inject.c
@@ -307,20 +307,6 @@ static int pci_bus_set_aer_ops(struct pci_bus *bus)
 	return 0;
 }
 
-static struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
-{
-	while (1) {
-		if (!pci_is_pcie(dev))
-			break;
-		if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
-			return dev;
-		if (!dev->bus->self)
-			break;
-		dev = dev->bus->self;
-	}
-	return NULL;
-}
-
 static int find_aer_device_iter(struct device *device, void *data)
 {
 	struct pcie_device **result = data;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0e49f70..a38772a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1928,6 +1928,20 @@ static inline int pci_pcie_type(const struct pci_dev *dev)
 	return (pcie_caps_reg(dev) & PCI_EXP_FLAGS_TYPE) >> 4;
 }
 
+static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
+{
+	while (1) {
+		if (!pci_is_pcie(dev))
+			break;
+		if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
+			return dev;
+		if (!dev->bus->self)
+			break;
+		dev = dev->bus->self;
+	}
+	return NULL;
+}
+
 void pci_request_acs(void);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-02 22:35 [PATCH 1/2] pci: export pcie_find_root_port Johannes Thumshirn
@ 2016-11-02 22:35 ` Johannes Thumshirn
  2016-11-09 17:11   ` Bjorn Helgaas
  2016-11-16 18:11   ` Bjorn Helgaas
  0 siblings, 2 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2016-11-02 22:35 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke,
	Johannes Thumshirn

The Read Completion Boundary (RCB) bit must only be set on a device or
endpoint if it is set on the root complex.

Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
even if it is not set on the root port. This is a violation to the PCIe
Specification and is known to bring some Mellanox Connect-X 3 HCAs into
a state where they can't map their firmware and go into error recovery.

BIOS Information
	Vendor: IBM
	Version: -[A8E120CUS-1.30]-
	Release Date: 08/22/2016

>From PCI Express Base Specification 1.1,
section 2.3.1.1. Data Return for Read Requests:
The Read Completion Boundary (RCB) parameter determines the naturally
aligned address boundaries on which a Read Request may be serviced with
multiple Completions
o For a Root Complex, RCB is 64 bytes or 128 bytes
  o This value is reported through a configuration register
    (see Section 7.8)
  Note: Bridges and Endpoints may implement a corresponding command
  bit which may be set by system software to indicate the RCB value
  for the Root Complex, allowing the Bridge/Endpoint to optimize its
  behavior when the Root Complex’s RCB is 128 bytes.
o For all other system elements, RCB is 128 bytes

Table 7-16: Link Control Register:
Configuration software must only Set this bit if the Root Port
Upstream from the Endpoint or Bridge reports an RCB value of
128 bytes (a value of 1b in the Read Completion Boundary bit).
Default value of this bit is 0b.

Functions that do not implement this feature must hardwire the
bit to 0b.

Before commit 7a1562d4f:
> 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
>
> 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-

After:
> 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
>
> 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> 		LnkCtl:	ASPM Disabled; RCB 128 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 drivers/pci/probe.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ab00267..0a4ab9c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1439,6 +1439,19 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
 		dev_warn(&dev->dev, "PCI-X settings not supported\n");
 }
 
+static bool pcie_get_root_rcb(struct pci_dev *dev)
+{
+	struct pci_dev *rp = pcie_find_root_port(dev);
+	u16 lnkctl;
+
+	if (!rp)
+		return false;
+
+	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
+
+	return lnkctl & PCI_EXP_LNKCTL_RCB;
+}
+
 static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 {
 	int pos;
@@ -1468,9 +1481,21 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
 
 	/* Initialize Link Control Register */
-	if (pcie_cap_has_lnkctl(dev))
+	if (pcie_cap_has_lnkctl(dev)) {
+		bool rrcb;
+		u16 clear;
+		u16 set;
+
+		rrcb = pcie_get_root_rcb(dev);
+
+		clear = ~hpp->pci_exp_lnkctl_and;
+		set = hpp->pci_exp_lnkctl_or;
+		if (!rrcb)
+			set &= ~PCI_EXP_LNKCTL_RCB;
+
 		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
-			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
+						  clear, set);
+	}
 
 	/* Find Advanced Error Reporting Enhanced Capability */
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-02 22:35 ` [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't Johannes Thumshirn
@ 2016-11-09 17:11   ` Bjorn Helgaas
  2016-11-14 11:56     ` Johannes Thumshirn
  2016-11-16 18:11   ` Bjorn Helgaas
  1 sibling, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2016-11-09 17:11 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

Hi Johannes,

On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> The Read Completion Boundary (RCB) bit must only be set on a device or
> endpoint if it is set on the root complex.
> 
> Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
> even if it is not set on the root port. This is a violation to the PCIe
> Specification and is known to bring some Mellanox Connect-X 3 HCAs into
> a state where they can't map their firmware and go into error recovery.
> 
> BIOS Information
> 	Vendor: IBM
> 	Version: -[A8E120CUS-1.30]-
> 	Release Date: 08/22/2016

This seems like a pretty serious problem (sounds like maybe the HCA is
completely useless?)

Can you point us at a bugzilla or other problem report?  It's nice to
have details of what this looks like to a user, so people who trip
over this problem have a little more chance of finding the solution.

7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
with a link") appeared in v3.18, so it's probably not a *new* problem,
so my guess is that this is v4.10 material.

> From PCI Express Base Specification 1.1,
> section 2.3.1.1. Data Return for Read Requests:
> The Read Completion Boundary (RCB) parameter determines the naturally
> aligned address boundaries on which a Read Request may be serviced with
> multiple Completions
> o For a Root Complex, RCB is 64 bytes or 128 bytes
>   o This value is reported through a configuration register
>     (see Section 7.8)
>   Note: Bridges and Endpoints may implement a corresponding command
>   bit which may be set by system software to indicate the RCB value
>   for the Root Complex, allowing the Bridge/Endpoint to optimize its
>   behavior when the Root Complex’s RCB is 128 bytes.
> o For all other system elements, RCB is 128 bytes
> 
> Table 7-16: Link Control Register:
> Configuration software must only Set this bit if the Root Port
> Upstream from the Endpoint or Bridge reports an RCB value of
> 128 bytes (a value of 1b in the Read Completion Boundary bit).
> Default value of this bit is 0b.
> 
> Functions that do not implement this feature must hardwire the
> bit to 0b.
> 
> Before commit 7a1562d4f:
> > 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> > 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> >
> > 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> > 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> 
> After:
> > 40:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
> > 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > 			ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
> >
> > 41:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
> > 		LnkCtl:	ASPM Disabled; RCB 128 bytes Disabled- CommClk+
> > 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 
> Fixes: 7a1562d4f ("PCI: Apply _HPX Link Control settings to all devices with a link")
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/pci/probe.c | 29 +++++++++++++++++++++++++++--
>  1 file changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ab00267..0a4ab9c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1439,6 +1439,19 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
>  		dev_warn(&dev->dev, "PCI-X settings not supported\n");
>  }
>  
> +static bool pcie_get_root_rcb(struct pci_dev *dev)
> +{
> +	struct pci_dev *rp = pcie_find_root_port(dev);
> +	u16 lnkctl;
> +
> +	if (!rp)
> +		return false;
> +
> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +
> +	return lnkctl & PCI_EXP_LNKCTL_RCB;
> +}
> +
>  static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  {
>  	int pos;
> @@ -1468,9 +1481,21 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
>  
>  	/* Initialize Link Control Register */
> -	if (pcie_cap_has_lnkctl(dev))
> +	if (pcie_cap_has_lnkctl(dev)) {
> +		bool rrcb;
> +		u16 clear;
> +		u16 set;
> +
> +		rrcb = pcie_get_root_rcb(dev);
> +
> +		clear = ~hpp->pci_exp_lnkctl_and;
> +		set = hpp->pci_exp_lnkctl_or;
> +		if (!rrcb)
> +			set &= ~PCI_EXP_LNKCTL_RCB;
> +
>  		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> -			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
> +						  clear, set);
> +	}
>  
>  	/* Find Advanced Error Reporting Enhanced Capability */
>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> -- 
> 2.10.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-09 17:11   ` Bjorn Helgaas
@ 2016-11-14 11:56     ` Johannes Thumshirn
  2016-11-14 16:16       ` Don Dutile
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2016-11-14 11:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf,
	Hannes Reinecke, Don Dutile

On Wed, Nov 09, 2016 at 11:11:40AM -0600, Bjorn Helgaas wrote:
> Hi Johannes,
> 
> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > The Read Completion Boundary (RCB) bit must only be set on a device or
> > endpoint if it is set on the root complex.
> > 
> > Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
> > even if it is not set on the root port. This is a violation to the PCIe
> > Specification and is known to bring some Mellanox Connect-X 3 HCAs into
> > a state where they can't map their firmware and go into error recovery.
> > 
> > BIOS Information
> > 	Vendor: IBM
> > 	Version: -[A8E120CUS-1.30]-
> > 	Release Date: 08/22/2016
> 
> This seems like a pretty serious problem (sounds like maybe the HCA is
> completely useless?)

Correct.

> 
> Can you point us at a bugzilla or other problem report?  It's nice to
> have details of what this looks like to a user, so people who trip
> over this problem have a little more chance of finding the solution.

As we already said, our bugzilla entry for this is not accessible from the
outside, but I know Red Hat does have a bugzilla entry for the same issue as
well. Maybe this is reachable from the outside (adding Don for this, as I know
he has worked on this problem as well).

> 
> 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
> with a link") appeared in v3.18, so it's probably not a *new* problem,
> so my guess is that this is v4.10 material.

Yes 4.10 sounds good to me. I personally think, this problem hasn't
materialized yet, as this is the kind of hardware you run on a rather /stable/
kernel either you built on your own or get from an enterprise distribution and
until recently these kernels haven't been updated to something newer than
3.18.

Thanks,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-14 11:56     ` Johannes Thumshirn
@ 2016-11-14 16:16       ` Don Dutile
  2016-11-15 12:58         ` Johannes Thumshirn
  0 siblings, 1 reply; 14+ messages in thread
From: Don Dutile @ 2016-11-14 16:16 UTC (permalink / raw)
  To: Johannes Thumshirn, Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

On 11/14/2016 06:56 AM, Johannes Thumshirn wrote:
> On Wed, Nov 09, 2016 at 11:11:40AM -0600, Bjorn Helgaas wrote:
>> Hi Johannes,
>>
>> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
>>> The Read Completion Boundary (RCB) bit must only be set on a device or
>>> endpoint if it is set on the root complex.
>>>
>>> Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
>>> even if it is not set on the root port. This is a violation to the PCIe
>>> Specification and is known to bring some Mellanox Connect-X 3 HCAs into
>>> a state where they can't map their firmware and go into error recovery.
>>>
>>> BIOS Information
>>> 	Vendor: IBM
>>> 	Version: -[A8E120CUS-1.30]-
>>> 	Release Date: 08/22/2016
>>
>> This seems like a pretty serious problem (sounds like maybe the HCA is
>> completely useless?)
>
> Correct.
>
>>
>> Can you point us at a bugzilla or other problem report?  It's nice to
>> have details of what this looks like to a user, so people who trip
>> over this problem have a little more chance of finding the solution.
>
> As we already said, our bugzilla entry for this is not accessible from the
> outside, but I know Red Hat does have a bugzilla entry for the same issue as
> well. Maybe this is reachable from the outside (adding Don for this, as I know
> he has worked on this problem as well).
>
RHEL bz's are not accessible from the outside.
I suggest capturing the content of the RH bz issue and creating a k.o. bz
with the information.

>>
>> 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
>> with a link") appeared in v3.18, so it's probably not a *new* problem,
>> so my guess is that this is v4.10 material.
>
> Yes 4.10 sounds good to me. I personally think, this problem hasn't
> materialized yet, as this is the kind of hardware you run on a rather /stable/
> kernel either you built on your own or get from an enterprise distribution and
> until recently these kernels haven't been updated to something newer than
> 3.18.
>
> Thanks,
> 	Johannes
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-14 16:16       ` Don Dutile
@ 2016-11-15 12:58         ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2016-11-15 12:58 UTC (permalink / raw)
  To: Don Dutile
  Cc: Bjorn Helgaas, Bjorn Helgaas, linux-pci, linux-kernel,
	Alexander Graf, Hannes Reinecke

On Mon, Nov 14, 2016 at 11:16:51AM -0500, Don Dutile wrote:
> On 11/14/2016 06:56 AM, Johannes Thumshirn wrote:
> > On Wed, Nov 09, 2016 at 11:11:40AM -0600, Bjorn Helgaas wrote:
> > > Hi Johannes,
> > > 
> > > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > > > The Read Completion Boundary (RCB) bit must only be set on a device or
> > > > endpoint if it is set on the root complex.
> > > > 
> > > > Certain BIOSes erroneously set the RCB Bit in their ACPI _HPX Tables
> > > > even if it is not set on the root port. This is a violation to the PCIe
> > > > Specification and is known to bring some Mellanox Connect-X 3 HCAs into
> > > > a state where they can't map their firmware and go into error recovery.
> > > > 
> > > > BIOS Information
> > > > 	Vendor: IBM
> > > > 	Version: -[A8E120CUS-1.30]-
> > > > 	Release Date: 08/22/2016
> > > 
> > > This seems like a pretty serious problem (sounds like maybe the HCA is
> > > completely useless?)
> > 
> > Correct.
> > 
> > > 
> > > Can you point us at a bugzilla or other problem report?  It's nice to
> > > have details of what this looks like to a user, so people who trip
> > > over this problem have a little more chance of finding the solution.
> > 
> > As we already said, our bugzilla entry for this is not accessible from the
> > outside, but I know Red Hat does have a bugzilla entry for the same issue as
> > well. Maybe this is reachable from the outside (adding Don for this, as I know
> > he has worked on this problem as well).
> > 
> RHEL bz's are not accessible from the outside.
> I suggest capturing the content of the RH bz issue and creating a k.o. bz
> with the information.

I've created https://bugzilla.kernel.org/show_bug.cgi?id=187781 to track this
on b.k.o. Feel free to add any information you have.

@Bjorn anything else I can provide in order to get the fix applied?

Byte,
 	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-02 22:35 ` [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't Johannes Thumshirn
  2016-11-09 17:11   ` Bjorn Helgaas
@ 2016-11-16 18:11   ` Bjorn Helgaas
  2016-11-17  9:57     ` Johannes Thumshirn
  2016-11-21 16:53     ` Bjorn Helgaas
  1 sibling, 2 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2016-11-16 18:11 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

Hi Johannes,

On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> The Read Completion Boundary (RCB) bit must only be set on a device or
> endpoint if it is set on the root complex.

I propose the following slightly modified patch.  The interesting
difference is that your patch only touches the _HPX "OR" mask, so it
refrains from *setting* RCB in some cases, but it never actually
*clears* it.  The only time we clear RCB is when the _HPX "AND" mask
has RCB == 0.

My intent below is that we completely ignore the _HPX RCB bits, and we
set an Endpoint's RCB if and only if the Root Port's RCB is set.

I made an ugly ASCII table to think about the cases:

      Root   EP    _HPX  _HPX     Final Endpoint RCB state
      Port  (init)  AND   OR     (curr)  (yours)  (mine)
  0)   0     0      0    0          0       0       0
  1)   0     0      0    1          1       0       0
  2)   0     0      1    0          0       0       0
  3)   0     0      1    1          1       0       0
  4)   0     1      0    0          0       0       0
  5)   0     1      0    1          1       0       0
  6)   0     1      1    0          1       1       0
  7)   0     1      1    1          1       1       0
  8)   1     0      0    0          0       0       1
  9)   1     0      0    1          1       1       1
  A)   1     0      1    0          0       0       1
  B)   1     0      1    1          1       1       1
  C)   1     1      0    0          0       0       1
  D)   1     1      0    1          1       1       1
  E)   1     1      1    0          1       1       1
  F)   1     1      1    1          1       1       1

Cases 0-7 should all result in the Endpoint RCB being zero because the
Root Port RCB is zero.  Case 1 is the bug you're fixing.  Cases 3 & 5
are similar hypothetical bugs your patch also fixes.

Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't
tell us to clear it, are hypothetical firmware bugs that your patch
wouldn't fix.

In cases 8, A, and C, we currently leave the Endpoint RCB cleared,
either because firmware left it clear and _HPX didn't tell us to set
it (8 and A), or because firmware set it but _HPX told us to clear it
(C).

One could argue that 8, A, and C should stay as they currently are, as
a way for _HPX to work around hardware bugs, e.g., a Root Port that
advertises a 128-byte RCB but doesn't actually support it.  I didn't
bother with that and set the Endpoint's RCB to 128 in all cases when
the Root Port claims to support it.

It'd be great if you could test this and comment.

If you get a chance, collect the /proc/iomem contents, too.  That's
not for this bug; it's because I'm curious about the

  ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST
  
problem in your dmesg log.


commit da567a6c67cb5c5a55db7b26b46e9b7e9af69299
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date:   Wed Nov 2 16:35:52 2016 -0600

    PCI: Set Read Completion Boundary to 128 iff Root Port supports it
    
    Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB)
    determines the naturally aligned address boundaries on which a Read Request
    may be serviced with multiple Completions:
    
      - For a Root Complex, RCB is 64 bytes or 128 bytes
        This value is reported in the Link Control Register
    
        Note: Bridges and Endpoints may implement a corresponding command bit
        which may be set by system software to indicate the RCB value for the
        Root Complex, allowing the Bridge/Endpoint to optimize its behavior
        when the Root Complex’s RCB is 128 bytes.
    
      - For all other system elements, RCB is 128 bytes
    
    Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all
    downstream devices must be clear, indicating an RCB of 64 bytes.  If the
    Root Port supports a 128-byte RCB, we may optionally set the RCB of
    downstream devices so they know they can generate larger Completions.
    
    Some BIOSes supply an _HPX that tells us to set RCB, even though the Root
    Port doesn't have RCB set, which may lead to Malformed TLP errors if the
    Endpoint generates completions larger than the Root Port can handle.
    
    The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies
    such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize:
    
      mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
      mlx4_core 0000:41:00.0: device is going to be reset
      mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
      mlx4_core 0000:41:00.0: Fail to reset HCA
      ------------[ cut here ]------------
      kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!
    
    After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
    and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
    with a link"), we apply _HPX settings to *all* devices, not just those
    hot-added after boot.
    
    Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device
    worked.  After 7a1562d4f2d0, we set its RCB to 128, and it failed.
    
    Set the RCB to 128 iff the Root Port supports a 128-byte RCB.  Otherwise,
    set RCB to 64 bytes.  This effectively ignores what _HPX tells us about
    RCB.
    
    [bhelgaas: changelog, clear RCB if not set for Root Port]
    Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
    Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781
    Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    CC: stable@vger.kernel.org	# v3.18+

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ab00267..8854161 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1439,6 +1439,21 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
 		dev_warn(&dev->dev, "PCI-X settings not supported\n");
 }
 
+static bool pcie_root_rcb_set(struct pci_dev *dev)
+{
+	struct pci_dev *rp = pcie_find_root_port(dev);
+	u16 lnkctl;
+
+	if (!rp)
+		return false;
+
+	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
+	if (lnkctl & PCI_EXP_LNKCTL_RCB)
+		return true;
+
+	return false;
+}
+
 static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 {
 	int pos;
@@ -1468,9 +1483,19 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
 
 	/* Initialize Link Control Register */
-	if (pcie_cap_has_lnkctl(dev))
+	if (pcie_cap_has_lnkctl(dev)) {
+
+		/*
+		 * If the Root Port supports Read Completion Boundary of
+		 * 128, set RCB to 128.  Otherwise, clear it.
+		 */
+		hpp->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
+		if (pcie_root_rcb_set(dev))
+			hpp->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
+
 		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
 			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
+	}
 
 	/* Find Advanced Error Reporting Enhanced Capability */
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-16 18:11   ` Bjorn Helgaas
@ 2016-11-17  9:57     ` Johannes Thumshirn
  2016-11-21 16:53     ` Bjorn Helgaas
  1 sibling, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2016-11-17  9:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
> Hi Johannes,
> 
> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > The Read Completion Boundary (RCB) bit must only be set on a device or
> > endpoint if it is set on the root complex.
> 
> I propose the following slightly modified patch.  The interesting
> difference is that your patch only touches the _HPX "OR" mask, so it
> refrains from *setting* RCB in some cases, but it never actually
> *clears* it.  The only time we clear RCB is when the _HPX "AND" mask
> has RCB == 0.
> 
> My intent below is that we completely ignore the _HPX RCB bits, and we
> set an Endpoint's RCB if and only if the Root Port's RCB is set.
> 
> I made an ugly ASCII table to think about the cases:
> 
>       Root   EP    _HPX  _HPX     Final Endpoint RCB state
>       Port  (init)  AND   OR     (curr)  (yours)  (mine)
>   0)   0     0      0    0          0       0       0
>   1)   0     0      0    1          1       0       0
>   2)   0     0      1    0          0       0       0
>   3)   0     0      1    1          1       0       0
>   4)   0     1      0    0          0       0       0
>   5)   0     1      0    1          1       0       0
>   6)   0     1      1    0          1       1       0
>   7)   0     1      1    1          1       1       0
>   8)   1     0      0    0          0       0       1
>   9)   1     0      0    1          1       1       1
>   A)   1     0      1    0          0       0       1
>   B)   1     0      1    1          1       1       1
>   C)   1     1      0    0          0       0       1
>   D)   1     1      0    1          1       1       1
>   E)   1     1      1    0          1       1       1
>   F)   1     1      1    1          1       1       1
> 
> Cases 0-7 should all result in the Endpoint RCB being zero because the
> Root Port RCB is zero.  Case 1 is the bug you're fixing.  Cases 3 & 5
> are similar hypothetical bugs your patch also fixes.
> 
> Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't
> tell us to clear it, are hypothetical firmware bugs that your patch
> wouldn't fix.
> 
> In cases 8, A, and C, we currently leave the Endpoint RCB cleared,
> either because firmware left it clear and _HPX didn't tell us to set
> it (8 and A), or because firmware set it but _HPX told us to clear it
> (C).
> 
> One could argue that 8, A, and C should stay as they currently are, as
> a way for _HPX to work around hardware bugs, e.g., a Root Port that
> advertises a 128-byte RCB but doesn't actually support it.  I didn't
> bother with that and set the Endpoint's RCB to 128 in all cases when
> the Root Port claims to support it.
> 
> It'd be great if you could test this and comment.

I've lost access to the machines, but I'll try to delegate it to someone who
has access.

> 
> If you get a chance, collect the /proc/iomem contents, too.  That's
> not for this bug; it's because I'm curious about the
> 
>   ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST
>   
> problem in your dmesg log.

I'll ask for this as well.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-16 18:11   ` Bjorn Helgaas
  2016-11-17  9:57     ` Johannes Thumshirn
@ 2016-11-21 16:53     ` Bjorn Helgaas
  2016-11-22  7:59       ` Johannes Thumshirn
                         ` (3 more replies)
  1 sibling, 4 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2016-11-21 16:53 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
> Hi Johannes,
> 
> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > The Read Completion Boundary (RCB) bit must only be set on a device or
> > endpoint if it is set on the root complex.
> 
> I propose the following slightly modified patch.  The interesting
> difference is that your patch only touches the _HPX "OR" mask, so it
> refrains from *setting* RCB in some cases, but it never actually
> *clears* it.  The only time we clear RCB is when the _HPX "AND" mask
> has RCB == 0.
> 
> My intent below is that we completely ignore the _HPX RCB bits, and we
> set an Endpoint's RCB if and only if the Root Port's RCB is set.
> 
> I made an ugly ASCII table to think about the cases:
> 
>       Root   EP    _HPX  _HPX     Final Endpoint RCB state
>       Port  (init)  AND   OR     (curr)  (yours)  (mine)
>   0)   0     0      0    0          0       0       0
>   1)   0     0      0    1          1       0       0
>   2)   0     0      1    0          0       0       0
>   3)   0     0      1    1          1       0       0
>   4)   0     1      0    0          0       0       0
>   5)   0     1      0    1          1       0       0
>   6)   0     1      1    0          1       1       0
>   7)   0     1      1    1          1       1       0
>   8)   1     0      0    0          0       0       1
>   9)   1     0      0    1          1       1       1
>   A)   1     0      1    0          0       0       1
>   B)   1     0      1    1          1       1       1
>   C)   1     1      0    0          0       0       1
>   D)   1     1      0    1          1       1       1
>   E)   1     1      1    0          1       1       1
>   F)   1     1      1    1          1       1       1
> 
> Cases 0-7 should all result in the Endpoint RCB being zero because the
> Root Port RCB is zero.  Case 1 is the bug you're fixing.  Cases 3 & 5
> are similar hypothetical bugs your patch also fixes.
> 
> Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't
> tell us to clear it, are hypothetical firmware bugs that your patch
> wouldn't fix.
> 
> In cases 8, A, and C, we currently leave the Endpoint RCB cleared,
> either because firmware left it clear and _HPX didn't tell us to set
> it (8 and A), or because firmware set it but _HPX told us to clear it
> (C).
> 
> One could argue that 8, A, and C should stay as they currently are, as
> a way for _HPX to work around hardware bugs, e.g., a Root Port that
> advertises a 128-byte RCB but doesn't actually support it.  I didn't
> bother with that and set the Endpoint's RCB to 128 in all cases when
> the Root Port claims to support it.
> 
> It'd be great if you could test this and comment.
> 
> If you get a chance, collect the /proc/iomem contents, too.  That's
> not for this bug; it's because I'm curious about the
> 
>   ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST
>   
> problem in your dmesg log.

Oops, I goofed and forgot to clear RCB by default.
Here's the fixed one.


commit b7bff74c2e6babf12906291ee177f16444de81ad
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date:   Wed Nov 16 15:47:53 2016 -0600

    PCI: Set Read Completion Boundary to 128 iff Root Port supports it
    
    Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB)
    determines the naturally aligned address boundaries on which a Read Request
    may be serviced with multiple Completions:
    
      - For a Root Complex, RCB is 64 bytes or 128 bytes
        This value is reported in the Link Control Register
    
        Note: Bridges and Endpoints may implement a corresponding command bit
        which may be set by system software to indicate the RCB value for the
        Root Complex, allowing the Bridge/Endpoint to optimize its behavior
        when the Root Complex’s RCB is 128 bytes.
    
      - For all other system elements, RCB is 128 bytes
    
    Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all
    downstream devices must be clear, indicating an RCB of 64 bytes.  If the
    Root Port supports a 128-byte RCB, we may optionally set the RCB of
    downstream devices so they know they can generate larger Completions.
    
    Some BIOSes supply an _HPX that tells us to set RCB, even though the Root
    Port doesn't have RCB set, which may lead to Malformed TLP errors if the
    Endpoint generates completions larger than the Root Port can handle.
    
    The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies
    such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize:
    
      mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
      mlx4_core 0000:41:00.0: device is going to be reset
      mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
      mlx4_core 0000:41:00.0: Fail to reset HCA
      ------------[ cut here ]------------
      kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!
    
    After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
    and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
    with a link"), we apply _HPX settings to *all* devices, not just those
    hot-added after boot.
    
    Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device
    worked.  After 7a1562d4f2d0, we set its RCB to 128, and it failed.
    
    Set the RCB to 128 iff the Root Port supports a 128-byte RCB.  Otherwise,
    set RCB to 64 bytes.  This effectively ignores what _HPX tells us about
    RCB.
    
    [bhelgaas: changelog, clear RCB if not set for Root Port]
    Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
    Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781
    Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    CC: stable@vger.kernel.org	# v3.18+

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ab00267..104c46d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1439,6 +1439,21 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
 		dev_warn(&dev->dev, "PCI-X settings not supported\n");
 }
 
+static bool pcie_root_rcb_set(struct pci_dev *dev)
+{
+	struct pci_dev *rp = pcie_find_root_port(dev);
+	u16 lnkctl;
+
+	if (!rp)
+		return false;
+
+	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
+	if (lnkctl & PCI_EXP_LNKCTL_RCB)
+		return true;
+
+	return false;
+}
+
 static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 {
 	int pos;
@@ -1468,9 +1483,20 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
 			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
 
 	/* Initialize Link Control Register */
-	if (pcie_cap_has_lnkctl(dev))
+	if (pcie_cap_has_lnkctl(dev)) {
+
+		/*
+		 * If the Root Port supports Read Completion Boundary of
+		 * 128, set RCB to 128.  Otherwise, clear it.
+		 */
+		hpp->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
+		hpp->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
+		if (pcie_root_rcb_set(dev))
+			hpp->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
+
 		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
 			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
+	}
 
 	/* Find Advanced Error Reporting Enhanced Capability */
 	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-21 16:53     ` Bjorn Helgaas
@ 2016-11-22  7:59       ` Johannes Thumshirn
  2016-11-22 10:56       ` Johannes Thumshirn
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2016-11-22  7:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
> > Hi Johannes,
> > 
> > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > > The Read Completion Boundary (RCB) bit must only be set on a device or
> > > endpoint if it is set on the root complex.
> > 
> > I propose the following slightly modified patch.  The interesting
> > difference is that your patch only touches the _HPX "OR" mask, so it
> > refrains from *setting* RCB in some cases, but it never actually
> > *clears* it.  The only time we clear RCB is when the _HPX "AND" mask
> > has RCB == 0.
> > 
> > My intent below is that we completely ignore the _HPX RCB bits, and we
> > set an Endpoint's RCB if and only if the Root Port's RCB is set.
> > 
> > I made an ugly ASCII table to think about the cases:
> > 
> >       Root   EP    _HPX  _HPX     Final Endpoint RCB state
> >       Port  (init)  AND   OR     (curr)  (yours)  (mine)
> >   0)   0     0      0    0          0       0       0
> >   1)   0     0      0    1          1       0       0
> >   2)   0     0      1    0          0       0       0
> >   3)   0     0      1    1          1       0       0
> >   4)   0     1      0    0          0       0       0
> >   5)   0     1      0    1          1       0       0
> >   6)   0     1      1    0          1       1       0
> >   7)   0     1      1    1          1       1       0
> >   8)   1     0      0    0          0       0       1
> >   9)   1     0      0    1          1       1       1
> >   A)   1     0      1    0          0       0       1
> >   B)   1     0      1    1          1       1       1
> >   C)   1     1      0    0          0       0       1
> >   D)   1     1      0    1          1       1       1
> >   E)   1     1      1    0          1       1       1
> >   F)   1     1      1    1          1       1       1
> > 
> > Cases 0-7 should all result in the Endpoint RCB being zero because the
> > Root Port RCB is zero.  Case 1 is the bug you're fixing.  Cases 3 & 5
> > are similar hypothetical bugs your patch also fixes.
> > 
> > Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't
> > tell us to clear it, are hypothetical firmware bugs that your patch
> > wouldn't fix.
> > 
> > In cases 8, A, and C, we currently leave the Endpoint RCB cleared,
> > either because firmware left it clear and _HPX didn't tell us to set
> > it (8 and A), or because firmware set it but _HPX told us to clear it
> > (C).
> > 
> > One could argue that 8, A, and C should stay as they currently are, as
> > a way for _HPX to work around hardware bugs, e.g., a Root Port that
> > advertises a 128-byte RCB but doesn't actually support it.  I didn't
> > bother with that and set the Endpoint's RCB to 128 in all cases when
> > the Root Port claims to support it.
> > 
> > It'd be great if you could test this and comment.
> > 
> > If you get a chance, collect the /proc/iomem contents, too.  That's
> > not for this bug; it's because I'm curious about the
> > 
> >   ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST
> >   
> > problem in your dmesg log.
> 
> Oops, I goofed and forgot to clear RCB by default.
> Here's the fixed one.

Yep, my contact already noticed. I have heard rumors that the first two
patches worked on RHEL and the 3rd one didn't (but that's just rumors) so I
try to persuade our field engineer to spend another day testing the patches.
But please be aware this is a bit cumbersome as I don't have access to the
machine and our field engineer only has remote access as well.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-21 16:53     ` Bjorn Helgaas
  2016-11-22  7:59       ` Johannes Thumshirn
@ 2016-11-22 10:56       ` Johannes Thumshirn
  2016-11-22 16:01       ` Myron Stowe
  2016-11-23 17:31       ` Bjorn Helgaas
  3 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2016-11-22 10:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
> > Hi Johannes,

[...]

> > 
> > If you get a chance, collect the /proc/iomem contents, too.  That's
> > not for this bug; it's because I'm curious about the
> > 
> >   ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST
> >   
> > problem in your dmesg log.

It's pasted at the end (not sure whether attaching works).

> 
> Oops, I goofed and forgot to clear RCB by default.
> Here's the fixed one.

So this version works :-). 

Do you plan to merge it for 4.9-rc7 or is it more 4.10 material?

Byte,
	Johannes

/proc/iomem:

00000000-00000fff : reserved
00001000-0009bfff : System RAM
0009c000-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000fffff : PCI Bus 0000:00
  000c0000-000c7fff : Video ROM
  000c8000-000c8fff : Adapter ROM
  000c9000-000c9fff : Adapter ROM
  000ca000-000cafff : Adapter ROM
  000cb000-000cbfff : Adapter ROM
  000cc000-000d3bff : Adapter ROM
  000e0000-000fffff : reserved
    000f0000-000fffff : System ROM
00100000-b6a32fff : System RAM
  01000000-015e8742 : Kernel code
  015e8743-01d549ff : Kernel data
  01f57000-021cefff : Kernel bss
b6a33000-b6a41fff : reserved
b6a42000-b7ffefff : System RAM
b7fff000-b83fefff : reserved
b83ff000-ba3fefff : ACPI Non-volatile Storage
ba3ff000-ba7a7fff : ACPI Tables
ba7a8000-ba7fffff : System RAM
ba800000-bb7fffff : RAM buffer
bb800000-cfffffff : reserved
  c0000000-cfffffff : PCI MMCONFIG 0000 [bus 00-ff]
d0004000-d0004fff : dmar3
d0007000-d3ffffff : PCI Bus 0000:00
  d0100000-d02fffff : PCI Bus 0000:11
  d0300000-d04fffff : PCI Bus 0000:17
  d20fd000-d20fdfff : 0000:00:05.4
  d20fe000-d20fe3ff : 0000:00:1a.0
    d20fe000-d20fe3ff : ehci_hcd
  d20ff000-d20ff3ff : 0000:00:1d.0
    d20ff000-d20ff3ff : ehci_hcd
  d2100000-d22fffff : PCI Bus 0000:01
    d2100000-d213ffff : 0000:01:00.0
    d2140000-d217ffff : 0000:01:00.1
    d2180000-d21bffff : 0000:01:00.2
    d21f0000-d21f3fff : 0000:01:00.0
      d21f0000-d21f3fff : igb
    d21f4000-d21f7fff : 0000:01:00.1
      d21f4000-d21f7fff : igb
    d21f8000-d21fbfff : 0000:01:00.2
      d21f8000-d21fbfff : igb
    d21fc000-d21fffff : 0000:01:00.3
      d21fc000-d21fffff : igb
    d2200000-d223ffff : 0000:01:00.0
      d2200000-d223ffff : igb
    d2240000-d227ffff : 0000:01:00.1
      d2240000-d227ffff : igb
    d2280000-d22bffff : 0000:01:00.2
      d2280000-d22bffff : igb
    d22c0000-d22fffff : 0000:01:00.3
      d22c0000-d22fffff : igb
  d2300000-d24fffff : PCI Bus 0000:07
    d23f0000-d23fffff : 0000:07:00.0
      d23f0000-d23fffff : megasas: LSI
    d2400000-d24fffff : 0000:07:00.0
  d2500000-d25fffff : PCI Bus 0000:1d
    d25fe000-d25fffff : 0000:1d:00.0
      d25fe000-d25fffff : xhci-hcd
  d2600000-d2ffffff : PCI Bus 0000:18
    d2600000-d2ffffff : PCI Bus 0000:19
      d2600000-d26fffff : PCI Bus 0000:1c
      d2700000-d2ffffff : PCI Bus 0000:1a
        d2700000-d2ffffff : PCI Bus 0000:1b
          d27fc000-d27fffff : 0000:1b:00.0
          d27fc000-d27fffff : mgadrmfb_mmio
          d2800000-d2ffffff : 0000:1b:00.0
  d3000000-d3ffffff : PCI Bus 0000:18
    d3000000-d3ffffff : PCI Bus 0000:19
      d3000000-d3ffffff : PCI Bus 0000:1a
        d3000000-d3ffffff : PCI Bus 0000:1b
          d3000000-d3ffffff : 0000:1b:00.0
          d3000000-d3ffffff : mgadrmfb_vram
d4000000-d4000fff : dmar0
d4003000-dbffffff : PCI Bus 0000:40
  d4100000-d42fffff : PCI Bus 0000:51
  d76ff000-d76fffff : 0000:40:05.4
  d7700000-d77fffff : PCI Bus 0000:41
    d7700000-d77fffff : 0000:41:00.0
  d7800000-db7fffff : PCI Bus 0000:47
  db800000-dbffffff : PCI Bus 0000:41
    db800000-dbffffff : 0000:41:00.0
dc000000-dc000fff : dmar1
dc003000-ebffffff : PCI Bus 0000:80
  dffff000-dfffffff : 0000:80:05.4
  e0000000-e3ffffff : PCI Bus 0000:81
  e4000000-e7ffffff : PCI Bus 0000:8b
  e8000000-ebffffff : PCI Bus 0000:95
ec000000-ec000fff : dmar2
ec003000-fbffffff : PCI Bus 0000:c0
  effff000-efffffff : 0000:c0:05.4
  f0000000-f3ffffff : PCI Bus 0000:c1
  f4000000-f7ffffff : PCI Bus 0000:cb
  f8000000-fbffffff : PCI Bus 0000:d5
fec00000-fecfffff : PNP0003:00
  fec00000-fec003ff : IOAPIC 0
  fec01000-fec013ff : IOAPIC 1
  fec21000-fec213ff : IOAPIC 2
  fec41000-fec413ff : IOAPIC 3
  fec61000-fec613ff : IOAPIC 4
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : PNP0103:00
fed12000-fed1200f : pnp 00:01
fed12010-fed1201f : pnp 00:01
fed1b000-fed1bfff : pnp 00:01
fed1c000-fed1ffff : reserved
  fed1f410-fed1f414 : iTCO_wdt.0.auto
fed45000-fed8bfff : pnp 00:01
fee00000-feefffff : pnp 00:01
  fee00000-fee00fff : Local APIC
ff000000-ffffffff : reserved
  ff000000-ffffffff : pnp 00:01
100000000-803fffffff : System RAM
20000000000-23fffffffff : PCI Bus 0000:00
  20000000000-200001fffff : PCI Bus 0000:07
  20000200000-200003fffff : PCI Bus 0000:11
  20000400000-200005fffff : PCI Bus 0000:17
  20000600000-200007fffff : PCI Bus 0000:1d
  23fffee0000-23fffee3fff : 0000:00:04.0
    23fffee0000-23fffee3fff : ioatdma
  23fffee4000-23fffee7fff : 0000:00:04.1
    23fffee4000-23fffee7fff : ioatdma
  23fffee8000-23fffeebfff : 0000:00:04.2
    23fffee8000-23fffeebfff : ioatdma
  23fffeec000-23fffeeffff : 0000:00:04.3
    23fffeec000-23fffeeffff : ioatdma
  23fffef0000-23fffef3fff : 0000:00:04.4
    23fffef0000-23fffef3fff : ioatdma
  23fffef4000-23fffef7fff : 0000:00:04.5
    23fffef4000-23fffef7fff : ioatdma
  23fffef8000-23fffefbfff : 0000:00:04.6
    23fffef8000-23fffefbfff : ioatdma
  23fffefc000-23fffefffff : 0000:00:04.7
    23fffefc000-23fffefffff : ioatdma
  23ffff00000-23fffffffff : PCI Bus 0000:01
    23ffff00000-23ffff1ffff : 0000:01:00.0
    23ffff20000-23ffff3ffff : 0000:01:00.0
    23ffff40000-23ffff5ffff : 0000:01:00.1
    23ffff60000-23ffff7ffff : 0000:01:00.1
    23ffff80000-23ffff9ffff : 0000:01:00.2
    23ffffa0000-23ffffbffff : 0000:01:00.2
    23ffffc0000-23ffffdffff : 0000:01:00.3
    23ffffe0000-23fffffffff : 0000:01:00.3
24000000000-27fffffffff : PCI Bus 0000:40
  24000000000-240001fffff : PCI Bus 0000:51
  26ffffe0000-26ffffe3fff : 0000:40:04.0
    26ffffe0000-26ffffe3fff : ioatdma
  26ffffe4000-26ffffe7fff : 0000:40:04.1
    26ffffe4000-26ffffe7fff : ioatdma
  26ffffe8000-26ffffebfff : 0000:40:04.2
    26ffffe8000-26ffffebfff : ioatdma
  26ffffec000-26ffffeffff : 0000:40:04.3
    26ffffec000-26ffffeffff : ioatdma
  26fffff0000-26fffff3fff : 0000:40:04.4
    26fffff0000-26fffff3fff : ioatdma
  26fffff4000-26fffff7fff : 0000:40:04.5
    26fffff4000-26fffff7fff : ioatdma
  26fffff8000-26fffffbfff : 0000:40:04.6
    26fffff8000-26fffffbfff : ioatdma
  26fffffc000-26fffffffff : 0000:40:04.7
    26fffffc000-26fffffffff : ioatdma
  27000000000-27fffffffff : PCI Bus 0000:47
28000000000-2bfffffffff : PCI Bus 0000:80
  28ffffe0000-28ffffe3fff : 0000:80:04.0
    28ffffe0000-28ffffe3fff : ioatdma
  28ffffe4000-28ffffe7fff : 0000:80:04.1
    28ffffe4000-28ffffe7fff : ioatdma
  28ffffe8000-28ffffebfff : 0000:80:04.2
    28ffffe8000-28ffffebfff : ioatdma
  28ffffec000-28ffffeffff : 0000:80:04.3
    28ffffec000-28ffffeffff : ioatdma
  28fffff0000-28fffff3fff : 0000:80:04.4
    28fffff0000-28fffff3fff : ioatdma
  28fffff4000-28fffff7fff : 0000:80:04.5
    28fffff4000-28fffff7fff : ioatdma
  28fffff8000-28fffffbfff : 0000:80:04.6
    28fffff8000-28fffffbfff : ioatdma
  28fffffc000-28fffffffff : 0000:80:04.7
    28fffffc000-28fffffffff : ioatdma
  29000000000-29fffffffff : PCI Bus 0000:95
  2a000000000-2afffffffff : PCI Bus 0000:8b
  2b000000000-2bfffffffff : PCI Bus 0000:81
2c000000000-2ffffffffff : PCI Bus 0000:c0
  2cffffe0000-2cffffe3fff : 0000:c0:04.0
    2cffffe0000-2cffffe3fff : ioatdma
  2cffffe4000-2cffffe7fff : 0000:c0:04.1
    2cffffe4000-2cffffe7fff : ioatdma
  2cffffe8000-2cffffebfff : 0000:c0:04.2
    2cffffe8000-2cffffebfff : ioatdma
  2cffffec000-2cffffeffff : 0000:c0:04.3
    2cffffec000-2cffffeffff : ioatdma
  2cfffff0000-2cfffff3fff : 0000:c0:04.4
    2cfffff0000-2cfffff3fff : ioatdma
  2cfffff4000-2cfffff7fff : 0000:c0:04.5
    2cfffff4000-2cfffff7fff : ioatdma
  2cfffff8000-2cfffffbfff : 0000:c0:04.6
    2cfffff8000-2cfffffbfff : ioatdma
  2cfffffc000-2cfffffffff : 0000:c0:04.7
    2cfffffc000-2cfffffffff : ioatdma
  2d000000000-2dfffffffff : PCI Bus 0000:d5
  2e000000000-2efffffffff : PCI Bus 0000:cb
  2f000000000-2ffffffffff : PCI Bus 0000:c1

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-21 16:53     ` Bjorn Helgaas
  2016-11-22  7:59       ` Johannes Thumshirn
  2016-11-22 10:56       ` Johannes Thumshirn
@ 2016-11-22 16:01       ` Myron Stowe
  2016-11-23 17:31       ` Bjorn Helgaas
  3 siblings, 0 replies; 14+ messages in thread
From: Myron Stowe @ 2016-11-22 16:01 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Johannes Thumshirn, Bjorn Helgaas, linux-pci, LKML,
	Alexander Graf, Hannes Reinecke

On Mon, Nov 21, 2016 at 9:53 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
>> Hi Johannes,
>>
>> On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
>> > The Read Completion Boundary (RCB) bit must only be set on a device or
>> > endpoint if it is set on the root complex.
>>
>> I propose the following slightly modified patch.  The interesting
>> difference is that your patch only touches the _HPX "OR" mask, so it
>> refrains from *setting* RCB in some cases, but it never actually
>> *clears* it.  The only time we clear RCB is when the _HPX "AND" mask
>> has RCB == 0.
>>
>> My intent below is that we completely ignore the _HPX RCB bits, and we
>> set an Endpoint's RCB if and only if the Root Port's RCB is set.
>>
>> I made an ugly ASCII table to think about the cases:
>>
>>       Root   EP    _HPX  _HPX     Final Endpoint RCB state
>>       Port  (init)  AND   OR     (curr)  (yours)  (mine)
>>   0)   0     0      0    0          0       0       0
>>   1)   0     0      0    1          1       0       0
>>   2)   0     0      1    0          0       0       0
>>   3)   0     0      1    1          1       0       0
>>   4)   0     1      0    0          0       0       0
>>   5)   0     1      0    1          1       0       0
>>   6)   0     1      1    0          1       1       0
>>   7)   0     1      1    1          1       1       0
>>   8)   1     0      0    0          0       0       1
>>   9)   1     0      0    1          1       1       1
>>   A)   1     0      1    0          0       0       1
>>   B)   1     0      1    1          1       1       1
>>   C)   1     1      0    0          0       0       1
>>   D)   1     1      0    1          1       1       1
>>   E)   1     1      1    0          1       1       1
>>   F)   1     1      1    1          1       1       1
>>
>> Cases 0-7 should all result in the Endpoint RCB being zero because the
>> Root Port RCB is zero.  Case 1 is the bug you're fixing.  Cases 3 & 5
>> are similar hypothetical bugs your patch also fixes.
>>
>> Cases 6 & 7, where firmware left the Endpoint RCB set and _HPX didn't
>> tell us to clear it, are hypothetical firmware bugs that your patch
>> wouldn't fix.
>>
>> In cases 8, A, and C, we currently leave the Endpoint RCB cleared,
>> either because firmware left it clear and _HPX didn't tell us to set
>> it (8 and A), or because firmware set it but _HPX told us to clear it
>> (C).
>>
>> One could argue that 8, A, and C should stay as they currently are, as
>> a way for _HPX to work around hardware bugs, e.g., a Root Port that
>> advertises a 128-byte RCB but doesn't actually support it.  I didn't
>> bother with that and set the Endpoint's RCB to 128 in all cases when
>> the Root Port claims to support it.
>>
>> It'd be great if you could test this and comment.
>>
>> If you get a chance, collect the /proc/iomem contents, too.  That's
>> not for this bug; it's because I'm curious about the
>>
>>   ERST: Can not request [mem 0xb928b000-0xb928cbff] for ERST
>>
>> problem in your dmesg log.
>
> Oops, I goofed and forgot to clear RCB by default.
> Here's the fixed one.
>

Testing on our end was successful again with this fix  ;)

Tested-by: Frank Danapfel <fdanapfe@redhat.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>

>
> commit b7bff74c2e6babf12906291ee177f16444de81ad
> Author: Johannes Thumshirn <jthumshirn@suse.de>
> Date:   Wed Nov 16 15:47:53 2016 -0600
>
>     PCI: Set Read Completion Boundary to 128 iff Root Port supports it
>
>     Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB)
>     determines the naturally aligned address boundaries on which a Read Request
>     may be serviced with multiple Completions:
>
>       - For a Root Complex, RCB is 64 bytes or 128 bytes
>         This value is reported in the Link Control Register
>
>         Note: Bridges and Endpoints may implement a corresponding command bit
>         which may be set by system software to indicate the RCB value for the
>         Root Complex, allowing the Bridge/Endpoint to optimize its behavior
>         when the Root Complex’s RCB is 128 bytes.
>
>       - For all other system elements, RCB is 128 bytes
>
>     Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all
>     downstream devices must be clear, indicating an RCB of 64 bytes.  If the
>     Root Port supports a 128-byte RCB, we may optionally set the RCB of
>     downstream devices so they know they can generate larger Completions.
>
>     Some BIOSes supply an _HPX that tells us to set RCB, even though the Root
>     Port doesn't have RCB set, which may lead to Malformed TLP errors if the
>     Endpoint generates completions larger than the Root Port can handle.
>
>     The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies
>     such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize:
>
>       mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
>       mlx4_core 0000:41:00.0: device is going to be reset
>       mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
>       mlx4_core 0000:41:00.0: Fail to reset HCA
>       ------------[ cut here ]------------
>       kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!
>
>     After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
>     and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
>     with a link"), we apply _HPX settings to *all* devices, not just those
>     hot-added after boot.
>
>     Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device
>     worked.  After 7a1562d4f2d0, we set its RCB to 128, and it failed.
>
>     Set the RCB to 128 iff the Root Port supports a 128-byte RCB.  Otherwise,
>     set RCB to 64 bytes.  This effectively ignores what _HPX tells us about
>     RCB.
>
>     [bhelgaas: changelog, clear RCB if not set for Root Port]
>     Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
>     Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link")
>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781
>     Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>     CC: stable@vger.kernel.org  # v3.18+
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ab00267..104c46d 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1439,6 +1439,21 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
>                 dev_warn(&dev->dev, "PCI-X settings not supported\n");
>  }
>
> +static bool pcie_root_rcb_set(struct pci_dev *dev)
> +{
> +       struct pci_dev *rp = pcie_find_root_port(dev);
> +       u16 lnkctl;
> +
> +       if (!rp)
> +               return false;
> +
> +       pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +       if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +               return true;
> +
> +       return false;
> +}
> +
>  static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  {
>         int pos;
> @@ -1468,9 +1483,20 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>                         ~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
>
>         /* Initialize Link Control Register */
> -       if (pcie_cap_has_lnkctl(dev))
> +       if (pcie_cap_has_lnkctl(dev)) {
> +
> +               /*
> +                * If the Root Port supports Read Completion Boundary of
> +                * 128, set RCB to 128.  Otherwise, clear it.
> +                */
> +               hpp->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
> +               hpp->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
> +               if (pcie_root_rcb_set(dev))
> +                       hpp->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
> +
>                 pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>                         ~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
> +       }
>
>         /* Find Advanced Error Reporting Enhanced Capability */
>         pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-21 16:53     ` Bjorn Helgaas
                         ` (2 preceding siblings ...)
  2016-11-22 16:01       ` Myron Stowe
@ 2016-11-23 17:31       ` Bjorn Helgaas
  2016-11-23 19:49         ` Bjorn Helgaas
  3 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2016-11-23 17:31 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
> > Hi Johannes,
> > 
> > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > > The Read Completion Boundary (RCB) bit must only be set on a device or
> > > endpoint if it is set on the root complex.
> ...

> Here's the fixed one.

I applied the one below to for-linus for v4.9.

I did tweak the preceding patch to only share pcie_find_root_port()
inside drivers/pci -- moved the declaration from include/linux/pci.h
to drivers/pci/pci.h and moved the definition to drivers/pci/search.c.

I don't think it needs to be visible to the whole kernel.

> commit b7bff74c2e6babf12906291ee177f16444de81ad
> Author: Johannes Thumshirn <jthumshirn@suse.de>
> Date:   Wed Nov 16 15:47:53 2016 -0600
> 
>     PCI: Set Read Completion Boundary to 128 iff Root Port supports it
>     
>     Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB)
>     determines the naturally aligned address boundaries on which a Read Request
>     may be serviced with multiple Completions:
>     
>       - For a Root Complex, RCB is 64 bytes or 128 bytes
>         This value is reported in the Link Control Register
>     
>         Note: Bridges and Endpoints may implement a corresponding command bit
>         which may be set by system software to indicate the RCB value for the
>         Root Complex, allowing the Bridge/Endpoint to optimize its behavior
>         when the Root Complex’s RCB is 128 bytes.
>     
>       - For all other system elements, RCB is 128 bytes
>     
>     Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all
>     downstream devices must be clear, indicating an RCB of 64 bytes.  If the
>     Root Port supports a 128-byte RCB, we may optionally set the RCB of
>     downstream devices so they know they can generate larger Completions.
>     
>     Some BIOSes supply an _HPX that tells us to set RCB, even though the Root
>     Port doesn't have RCB set, which may lead to Malformed TLP errors if the
>     Endpoint generates completions larger than the Root Port can handle.
>     
>     The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies
>     such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize:
>     
>       mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
>       mlx4_core 0000:41:00.0: device is going to be reset
>       mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
>       mlx4_core 0000:41:00.0: Fail to reset HCA
>       ------------[ cut here ]------------
>       kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!
>     
>     After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
>     and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
>     with a link"), we apply _HPX settings to *all* devices, not just those
>     hot-added after boot.
>     
>     Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device
>     worked.  After 7a1562d4f2d0, we set its RCB to 128, and it failed.
>     
>     Set the RCB to 128 iff the Root Port supports a 128-byte RCB.  Otherwise,
>     set RCB to 64 bytes.  This effectively ignores what _HPX tells us about
>     RCB.
>     
>     [bhelgaas: changelog, clear RCB if not set for Root Port]
>     Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
>     Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link")
>     Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781
>     Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>     CC: stable@vger.kernel.org	# v3.18+
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ab00267..104c46d 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1439,6 +1439,21 @@ static void program_hpp_type1(struct pci_dev *dev, struct hpp_type1 *hpp)
>  		dev_warn(&dev->dev, "PCI-X settings not supported\n");
>  }
>  
> +static bool pcie_root_rcb_set(struct pci_dev *dev)
> +{
> +	struct pci_dev *rp = pcie_find_root_port(dev);
> +	u16 lnkctl;
> +
> +	if (!rp)
> +		return false;
> +
> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +	if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +		return true;
> +
> +	return false;
> +}
> +
>  static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  {
>  	int pos;
> @@ -1468,9 +1483,20 @@ static void program_hpp_type2(struct pci_dev *dev, struct hpp_type2 *hpp)
>  			~hpp->pci_exp_devctl_and, hpp->pci_exp_devctl_or);
>  
>  	/* Initialize Link Control Register */
> -	if (pcie_cap_has_lnkctl(dev))
> +	if (pcie_cap_has_lnkctl(dev)) {
> +
> +		/*
> +		 * If the Root Port supports Read Completion Boundary of
> +		 * 128, set RCB to 128.  Otherwise, clear it.
> +		 */
> +		hpp->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
> +		hpp->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
> +		if (pcie_root_rcb_set(dev))
> +			hpp->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
> +
>  		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>  			~hpp->pci_exp_lnkctl_and, hpp->pci_exp_lnkctl_or);
> +	}
>  
>  	/* Find Advanced Error Reporting Enhanced Capability */
>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't
  2016-11-23 17:31       ` Bjorn Helgaas
@ 2016-11-23 19:49         ` Bjorn Helgaas
  0 siblings, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2016-11-23 19:49 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Alexander Graf, Hannes Reinecke

On Wed, Nov 23, 2016 at 11:31:52AM -0600, Bjorn Helgaas wrote:
> On Mon, Nov 21, 2016 at 10:53:52AM -0600, Bjorn Helgaas wrote:
> > On Wed, Nov 16, 2016 at 12:11:58PM -0600, Bjorn Helgaas wrote:
> > > Hi Johannes,
> > > 
> > > On Wed, Nov 02, 2016 at 04:35:52PM -0600, Johannes Thumshirn wrote:
> > > > The Read Completion Boundary (RCB) bit must only be set on a device or
> > > > endpoint if it is set on the root complex.
> > ...
> 
> > Here's the fixed one.
> 
> I applied the one below to for-linus for v4.9.
> 
> I did tweak the preceding patch to only share pcie_find_root_port()
> inside drivers/pci -- moved the declaration from include/linux/pci.h
> to drivers/pci/pci.h and moved the definition to drivers/pci/search.c.
> 
> I don't think it needs to be visible to the whole kernel.

Sigh, I guess it *does*, because the aer_inject user can be a module.
I'll fix that.

Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-11-23 19:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-02 22:35 [PATCH 1/2] pci: export pcie_find_root_port Johannes Thumshirn
2016-11-02 22:35 ` [PATCH 2/2] pci: Don't set RCB bit in LNKCTL if the upstream bridge hasn't Johannes Thumshirn
2016-11-09 17:11   ` Bjorn Helgaas
2016-11-14 11:56     ` Johannes Thumshirn
2016-11-14 16:16       ` Don Dutile
2016-11-15 12:58         ` Johannes Thumshirn
2016-11-16 18:11   ` Bjorn Helgaas
2016-11-17  9:57     ` Johannes Thumshirn
2016-11-21 16:53     ` Bjorn Helgaas
2016-11-22  7:59       ` Johannes Thumshirn
2016-11-22 10:56       ` Johannes Thumshirn
2016-11-22 16:01       ` Myron Stowe
2016-11-23 17:31       ` Bjorn Helgaas
2016-11-23 19:49         ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).