* [PATCH v3] PCI: Check for PCIe downtraining conditions
@ 2018-06-04 15:55 Alexandru Gagniuc
  2018-06-05 12:27 ` Andy Shevchenko
  2018-07-16 21:17 ` Bjorn Helgaas
  0 siblings, 2 replies; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-06-04 15:55 UTC (permalink / raw)
  To: bhelgaas
  Cc: alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	Alexandru Gagniuc, linux-pci, linux-kernel

PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive, nor straigh
forward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---

Changes since v2:
 - Check dev->is_virtfn flag

Changes since v1:
 - Use pcie_print_link_status() instead of reimplementing logic
 
 drivers/pci/probe.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac91b6fd0bcd..a88ec8c25dd5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices. */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe share the same link/status. */
+	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
+		return;
+
+	pcie_print_link_status(dev);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	/* Check link and detect downtrain errors */
+	pcie_check_upstream_link(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }
-- 
2.14.4

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-06-04 15:55 [PATCH v3] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
@ 2018-06-05 12:27 ` Andy Shevchenko
  2018-06-05 13:04   ` Andy Shevchenko
  2018-07-16 21:17 ` Bjorn Helgaas
  1 sibling, 1 reply; 47+ messages in thread
From: Andy Shevchenko @ 2018-06-05 12:27 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: Bjorn Helgaas, alex_gagniuc, austin_bolen, shyam_iyer,
	Keith Busch, linux-pci, Linux Kernel Mailing List

On Mon, Jun 4, 2018 at 6:55 PM, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor straigh
> forward.
>
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

Have you seen any of my comments?
For your convenience repeating below.

>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>
> Changes since v2:
>  - Check dev->is_virtfn flag
>
> Changes since v1:
>  - Use pcie_print_link_status() instead of reimplementing logic
>
>  drivers/pci/probe.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac91b6fd0bcd..a88ec8c25dd5 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>         return dev;
>  }
>
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{

> +

This is redundant blank line.

> +       if (!pci_is_pcie(dev))
> +               return;
> +
> +       /* Look from the device up to avoid downstream ports with no devices. */
> +       if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +               return;

I looked briefly at the use of these calls and perhaps it might make
sense to introduce
pci_is_pcie_type(dev, type) which unifies pci_is_pcie() + pci_pcie_type().
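
Something like this minimal sketch, perhaps (name and placement
hypothetical, untested):

  static inline bool pci_is_pcie_type(struct pci_dev *dev, int type)
  {
          return pci_is_pcie(dev) && pci_pcie_type(dev) == type;
  }

The guard above would then collapse to three negated
pci_is_pcie_type() calls, with the separate pci_is_pcie() check folded
in.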

> +
> +       /* Multi-function PCIe share the same link/status. */

> +       if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)

The one pair of parens is not needed.

> +               return;
> +
> +       pcie_print_link_status(dev);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>         /* Enhanced Allocation */
> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>         /* Advanced Error Reporting */
>         pci_aer_init(dev);
>
> +       /* Check link and detect downtrain errors */
> +       pcie_check_upstream_link(dev);
> +
>         if (pci_probe_reset_function(dev) == 0)
>                 dev->reset_fn = 1;
>  }
> --
> 2.14.4
>



-- 
With Best Regards,
Andy Shevchenko

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-06-05 12:27 ` Andy Shevchenko
@ 2018-06-05 13:04   ` Andy Shevchenko
  0 siblings, 0 replies; 47+ messages in thread
From: Andy Shevchenko @ 2018-06-05 13:04 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: Bjorn Helgaas, alex_gagniuc, austin_bolen, shyam_iyer,
	Keith Busch, linux-pci, Linux Kernel Mailing List

On Tue, Jun 5, 2018 at 3:27 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Mon, Jun 4, 2018 at 6:55 PM, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive, nor straigh
>> forward.
>>
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
>
> Have you seen any of my comments?
> For your convenience repeating below.

Ah, found the answer in a pile of emails. OK, I see your point about
helper, though the rest is still applicable here.

-- 
With Best Regards,
Andy Shevchenko

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-06-04 15:55 [PATCH v3] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
  2018-06-05 12:27 ` Andy Shevchenko
@ 2018-07-16 21:17 ` Bjorn Helgaas
  2018-07-16 22:28   ` Alex_Gagniuc
  2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
  1 sibling, 2 replies; 47+ messages in thread
From: Bjorn Helgaas @ 2018-07-16 21:17 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Tal Gilboa,
	Dave Airlie, Alex Deucher

[+cc maintainers of drivers that already use pcie_print_link_status()
and GPU folks]

On Mon, Jun 04, 2018 at 10:55:21AM -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor straigh
> forward.

s/straigh/straight/
In this context, I think "straightforward" should be closed up
(without the space).

> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

This is an interesting idea.  I have two concerns:

Some drivers already do this on their own, and we probably don't want
duplicate output for those devices.  In most cases (ixgbe and mlx* are
exceptions), the drivers do this unconditionally so we *could* remove
it from the driver if we add it to the core.  The dmesg order would
change, and the message wouldn't be associated with the driver as it
now is.

Also, I think some of the GPU devices might come up at a lower speed,
then download firmware, then reset the device so it comes up at a
higher speed.  I think this patch will make us complain about
the low initial speed, which might confuse users.

So I'm not sure whether it's better to do this in the core for all
devices, or if we should just add it to the high-performance drivers
that really care.

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
> 
> Changes since v2:
>  - Check dev->is_virtfn flag
> 
> Changes since v1:
>  - Use pcie_print_link_status() instead of reimplementing logic
>  
>  drivers/pci/probe.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac91b6fd0bcd..a88ec8c25dd5 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> +
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	/* Look from the device up to avoid downstream ports with no devices. */
> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +		return;

Do we care about Upstream Ports here?  I suspect that ultimately we
only care about the bandwidth to Endpoints, and if an Endpoint is
constrained by a slow link farther up the tree,
pcie_print_link_status() is supposed to identify that slow link.

I would find this test easier to read as

  if (!(type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END))
    return;

But maybe I'm the only one that finds the conjunction of inequalities
hard to read.  No big deal either way.

> +	/* Multi-function PCIe share the same link/status. */
> +	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
> +		return;
> +
> +	pcie_print_link_status(dev);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>  	/* Enhanced Allocation */
> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	/* Advanced Error Reporting */
>  	pci_aer_init(dev);
>  
> +	/* Check link and detect downtrain errors */
> +	pcie_check_upstream_link(dev);
> +
>  	if (pci_probe_reset_function(dev) == 0)
>  		dev->reset_fn = 1;
>  }
> -- 
> 2.14.4
> 

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-16 21:17 ` Bjorn Helgaas
@ 2018-07-16 22:28   ` Alex_Gagniuc
  2018-07-18 21:53     ` Bjorn Helgaas
  2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
  1 sibling, 1 reply; 47+ messages in thread
From: Alex_Gagniuc @ 2018-07-16 22:28 UTC (permalink / raw)
  To: helgaas, mr.nuke.me
  Cc: bhelgaas, Austin.Bolen, Shyam.Iyer, keith.busch, linux-pci,
	linux-kernel, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, tariqt, jakub.kicinski, talgi, airlied,
	alexander.deucher

On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
> [+cc maintainers of drivers that already use pcie_print_link_status()
> and GPU folks]

Thanks for finding them!

[snip]
>> identifying this from userspace is neither intuitive, nor straigh
>> forward.
> 
> s/straigh/straight/
> In this context, I think "straightforward" should be closed up
> (without the space).

That's a straightforward edit. Thanks for the feedback!

>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
> 
> This is an interesting idea.  I have two concerns:
> 
> Some drivers already do this on their own, and we probably don't want
> duplicate output for those devices.  In most cases (ixgbe and mlx* are
> exceptions), the drivers do this unconditionally so we *could* remove
> it from the driver if we add it to the core.  The dmesg order would
> change, and the message wouldn't be associated with the driver as it
> now is.

Oh, there are only 8 users of that. Even I could patch up the drivers to 
remove the call, assuming we reach agreement about this change.

> Also, I think some of the GPU devices might come up at a lower speed,
> then download firmware, then reset the device so it comes up at a
> higher speed.  I think this patch will make us complain about
> the low initial speed, which might confuse users.

I spoke to one of the PCIe spec writers. It's allowable for a device to 
downtrain speed or width. It would also be extremely dumb to downtrain 
with the intent to re-train at a higher speed later, but it's possible 
devices do dumb stuff like that. That's why it's an informational 
message, instead of a warning.

Another case: Some devices (lower-end GPUs) use silicon (and marketing) 
that advertises x16, but they're only routed for x8. I'm okay with 
seeing an informational message in this case. In fact, I didn't know 
that the Quadro card I've had for three years is only wired for x8 
until I tested this patch.

> So I'm not sure whether it's better to do this in the core for all
> devices, or if we should just add it to the high-performance drivers
> that really care.

You're thinking "do I really need that bandwidth" because I'm using a 
function called "_bandwidth_". The point of the change is very far from 
that: it is to help in system troubleshooting by detecting downtraining 
conditions.

>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[snip]
>> +	/* Look from the device up to avoid downstream ports with no devices. */
>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>> +		return;
> 
> Do we care about Upstream Ports here?  

YES! Switches. e.g. an x16 switch with 4x downstream ports could 
downtrain at 8x and 4x, and we'd never catch it.

> I suspect that ultimately we
> only care about the bandwidth to Endpoints, and if an Endpoint is
> constrained by a slow link farther up the tree,
> pcie_print_link_status() is supposed to identify that slow link.

See above.

> I would find this test easier to read as
> 
>    if (!(type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END))
>      return;
> 
> But maybe I'm the only one that finds the conjunction of inequalities
> hard to read.  No big deal either way.
> 
>> +	/* Multi-function PCIe share the same link/status. */
>> +	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>> +		return;
>> +
>> +	pcie_print_link_status(dev);
>> +}
>> +
>>   static void pci_init_capabilities(struct pci_dev *dev)
>>   {
>>   	/* Enhanced Allocation */
>> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>>   	/* Advanced Error Reporting */
>>   	pci_aer_init(dev);
>>   
>> +	/* Check link and detect downtrain errors */
>> +	pcie_check_upstream_link(dev);
>> +
>>   	if (pci_probe_reset_function(dev) == 0)
>>   		dev->reset_fn = 1;
>>   }
>> -- 
>> 2.14.4
>>
> 


* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-16 21:17 ` Bjorn Helgaas
  2018-07-16 22:28   ` Alex_Gagniuc
@ 2018-07-18 13:38   ` Tal Gilboa
  2018-07-19 15:49     ` Alex G.
  1 sibling, 1 reply; 47+ messages in thread
From: Tal Gilboa @ 2018-07-18 13:38 UTC (permalink / raw)
  To: Bjorn Helgaas, Alexandru Gagniuc
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
> [+cc maintainers of drivers that already use pcie_print_link_status()
> and GPU folks]
> 
> On Mon, Jun 04, 2018 at 10:55:21AM -0500, Alexandru Gagniuc wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive, nor straigh
>> forward.
> 
> s/straigh/straight/
> In this context, I think "straightforward" should be closed up
> (without the space).
> 
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
> 
> This is an interesting idea.  I have two concerns:
> 
> Some drivers already do this on their own, and we probably don't want
> duplicate output for those devices.  In most cases (ixgbe and mlx* are
> exceptions), the drivers do this unconditionally so we *could* remove
> it from the driver if we add it to the core.  The dmesg order would
> change, and the message wouldn't be associated with the driver as it
> now is.
> 
> Also, I think some of the GPU devices might come up at a lower speed,
> then download firmware, then reset the device so it comes up at a
> higher speed.  I think this patch will make us complain about
> the low initial speed, which might confuse users.
> 
> So I'm not sure whether it's better to do this in the core for all
> devices, or if we should just add it to the high-performance drivers
> that really care.
> 
>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> ---
>>
>> Changes since v2:
>>   - Check dev->is_virtfn flag
>>
>> Changes since v1:
>>   - Use pcie_print_link_status() instead of reimplementing logic
>>   
>>   drivers/pci/probe.c | 22 ++++++++++++++++++++++
>>   1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index ac91b6fd0bcd..a88ec8c25dd5 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>>   	return dev;
>>   }
>>   
>> +static void pcie_check_upstream_link(struct pci_dev *dev)
>> +{
>> +
>> +	if (!pci_is_pcie(dev))
>> +		return;
>> +
>> +	/* Look from the device up to avoid downstream ports with no devices. */
>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>> +		return;
> 
> Do we care about Upstream Ports here?  I suspect that ultimately we
> only care about the bandwidth to Endpoints, and if an Endpoint is
> constrained by a slow link farther up the tree,
> pcie_print_link_status() is supposed to identify that slow link.
> 
> I would find this test easier to read as
> 
>    if (!(type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END))
>      return;
> 
> But maybe I'm the only one that finds the conjunction of inequalities
> hard to read.  No big deal either way.
> 
>> +	/* Multi-function PCIe share the same link/status. */
>> +	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>> +		return;
>> +
>> +	pcie_print_link_status(dev);
>> +}

Is this function called by default for every PCIe device? What about 
VFs? We make an exception for them on our driver since a VF doesn't have 
access to the needed information in order to provide a meaningful message.

>> +
>>   static void pci_init_capabilities(struct pci_dev *dev)
>>   {
>>   	/* Enhanced Allocation */
>> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>>   	/* Advanced Error Reporting */
>>   	pci_aer_init(dev);
>>   
>> +	/* Check link and detect downtrain errors */
>> +	pcie_check_upstream_link(dev);
>> +
>>   	if (pci_probe_reset_function(dev) == 0)
>>   		dev->reset_fn = 1;
>>   }
>> -- 
>> 2.14.4
>>

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-16 22:28   ` Alex_Gagniuc
@ 2018-07-18 21:53     ` Bjorn Helgaas
  2018-07-19 15:46       ` Alex G.
                         ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Bjorn Helgaas @ 2018-07-18 21:53 UTC (permalink / raw)
  To: Alex_Gagniuc
  Cc: mr.nuke.me, bhelgaas, Austin.Bolen, Shyam.Iyer, keith.busch,
	linux-pci, linux-kernel, jeffrey.t.kirsher, ariel.elior,
	michael.chan, ganeshgr, tariqt, jakub.kicinski, talgi, airlied,
	alexander.deucher, Mike Marciniszyn

[+cc Mike (hfi1)]

On Mon, Jul 16, 2018 at 10:28:35PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
> >> ...
> >> The easiest way to detect this is with pcie_print_link_status(),
> >> since the bottleneck is usually the link that is downtrained. It's not
> >> a perfect solution, but it works extremely well in most cases.
> > 
> > This is an interesting idea.  I have two concerns:
> > 
> > Some drivers already do this on their own, and we probably don't want
> > duplicate output for those devices.  In most cases (ixgbe and mlx* are
> > exceptions), the drivers do this unconditionally so we *could* remove
> > it from the driver if we add it to the core.  The dmesg order would
> > change, and the message wouldn't be associated with the driver as it
> > now is.
> 
> Oh, there are only 8 users of that. Even I could patch up the drivers to 
> remove the call, assuming we reach agreement about this change.
> 
> > Also, I think some of the GPU devices might come up at a lower speed,
> > then download firmware, then reset the device so it comes up at a
> > higher speed.  I think this patch will make us complain about
> > the low initial speed, which might confuse users.
> 
> I spoke to one of the PCIe spec writers. It's allowable for a device to 
> downtrain speed or width. It would also be extremely dumb to downtrain 
> with the intent to re-train at a higher speed later, but it's possible 
> devices do dumb stuff like that. That's why it's an informational 
> message, instead of a warning.

FWIW, here's some of the discussion related to hfi1 from [1]:

  > Btw, why is the driver configuring the PCIe link speed?  Isn't
  > this something we should be handling in the PCI core?

  The device comes out of reset at the 5GT/s speed. The driver
  downloads device firmware, programs PCIe registers, and co-ordinates
  the transition to 8GT/s.

  This recipe is device specific and is therefore implemented in the
  hfi1 driver built on top of PCI core functions and macros.

Also several DRM drivers seem to do this (see cik_pcie_gen3_enable(),
si_pcie_gen3_enable()); from [2]:

  My understanding was that some platforms only bring up the link in gen 1
  mode for compatibility reasons. 

[1] https://lkml.kernel.org/r/32E1700B9017364D9B60AED9960492BC627FF54C@fmsmsx120.amr.corp.intel.com
[2] https://lkml.kernel.org/r/BN6PR12MB1809BD30AA5B890C054F9832F7B50@BN6PR12MB1809.namprd12.prod.outlook.com

> Another case: Some devices (lower-end GPUs) use silicon (and marketing) 
> that advertises x16, but they're only routed for x8. I'm okay with 
> seeing an informational message in this case. In fact, I didn't know 
> that the Quadro card I've had for three years is only wired for x8 
> until I tested this patch.

Yeah, it's probably OK.  I don't want bug reports from people who
think something's broken when it's really just a hardware limitation
of their system.  But hopefully the message is not alarming.

> > So I'm not sure whether it's better to do this in the core for all
> > devices, or if we should just add it to the high-performance drivers
> > that really care.
> 
> You're thinking "do I really need that bandwidth" because I'm using a 
> function called "_bandwidth_". The point of the change is very far from 
> that: it is to help in system troubleshooting by detecting downtraining 
> conditions.

I'm not sure what you think I'm thinking :)  My question is whether
it's worthwhile to print this extra information for *every* PCIe
device, given that your use case is the tiny percentage of broken
systems.

If we only printed the info in the "bw_avail < bw_cap" case, i.e.,
when the device is capable of more than it's getting, that would make
a lot of sense to me.  The normal case line is more questionable.  I
think the reason that's there is because the network drivers are very
performance sensitive and like to see that info all the time.

Maybe we need something like this:

  pcie_print_link_status(struct pci_dev *dev, int verbose)
  {
    ...
    if (bw_avail >= bw_cap) {
      if (verbose)
        pci_info(dev, "... available PCIe bandwidth ...");
    } else
      pci_info(dev, "... available PCIe bandwidth, limited by ...");
  }

So the core could print only the potential problems with:

  pcie_print_link_status(dev, 0);

and drivers that really care even if there's no problem could do:

  pcie_print_link_status(dev, 1);

> >> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> [snip]
> >> +	/* Look from the device up to avoid downstream ports with no devices. */
> >> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> >> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> >> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> >> +		return;
> > 
> > Do we care about Upstream Ports here?  
> 
> YES! Switches. e.g. an x16 switch with 4x downstream ports could 
> downtrain at 8x and 4x, and we'd never catch it.

OK, I think I see your point: if the upstream port *could* do 16x but
only trains to 4x, and two endpoints below it are both capable of 4x,
the endpoints *think* they're happy but in fact they have to share 4x
when they could use more.
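
To put rough, illustrative numbers on that (assuming 8 GT/s links and
the ~7.876 Gb/s of usable bandwidth per Gen3 lane):

  upstream port capable of x16:        ~126.0 Gb/s
  upstream port trained at x4:          ~31.5 Gb/s
  each x4 endpoint below, on paper:     ~31.5 Gb/s

Both endpoints train "clean" at x4, yet together they contend for the
single ~31.5 Gb/s link above them.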

Bjorn

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-18 21:53     ` Bjorn Helgaas
@ 2018-07-19 15:46       ` Alex G.
  2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
  2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
  2 siblings, 0 replies; 47+ messages in thread
From: Alex G. @ 2018-07-19 15:46 UTC (permalink / raw)
  To: Bjorn Helgaas, Alex_Gagniuc
  Cc: bhelgaas, Austin.Bolen, Shyam.Iyer, keith.busch, linux-pci,
	linux-kernel, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, tariqt, jakub.kicinski, talgi, airlied,
	alexander.deucher, Mike Marciniszyn



On 07/18/2018 04:53 PM, Bjorn Helgaas wrote:
> [+cc Mike (hfi1)]
> 
> On Mon, Jul 16, 2018 at 10:28:35PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>> On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
>>>> ...
>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>> a perfect solution, but it works extremely well in most cases.
>>>
>>> This is an interesting idea.  I have two concerns:
>>>
>>> Some drivers already do this on their own, and we probably don't want
>>> duplicate output for those devices.  In most cases (ixgbe and mlx* are
>>> exceptions), the drivers do this unconditionally so we *could* remove
>>> it from the driver if we add it to the core.  The dmesg order would
>>> change, and the message wouldn't be associated with the driver as it
>>> now is.
>>
>> Oh, there are only 8 users of that. Even I could patch up the drivers to
>> remove the call, assuming we reach agreement about this change.
>>
>>> Also, I think some of the GPU devices might come up at a lower speed,
>>> then download firmware, then reset the device so it comes up at a
>>> higher speed.  I think this patch will make us complain about
>>> the low initial speed, which might confuse users.
>>
>> I spoke to one of the PCIe spec writers. It's allowable for a device to
>> downtrain speed or width. It would also be extremely dumb to downtrain
>> with the intent to re-train at a higher speed later, but it's possible
>> devices do dumb stuff like that. That's why it's an informational
>> message, instead of a warning.
> 
> FWIW, here's some of the discussion related to hfi1 from [1]:
> 
>    > Btw, why is the driver configuring the PCIe link speed?  Isn't
>    > this something we should be handling in the PCI core?
> 
>    The device comes out of reset at the 5GT/s speed. The driver
>    downloads device firmware, programs PCIe registers, and co-ordinates
>    the transition to 8GT/s.
> 
>    This recipe is device specific and is therefore implemented in the
>    hfi1 driver built on top of PCI core functions and macros.
> 
> Also several DRM drivers seem to do this (see cik_pcie_gen3_enable(),
> si_pcie_gen3_enable()); from [2]:
> 
>    My understanding was that some platforms only bring up the link in gen 1
>    mode for compatibility reasons.
> 
> [1] https://lkml.kernel.org/r/32E1700B9017364D9B60AED9960492BC627FF54C@fmsmsx120.amr.corp.intel.com
> [2] https://lkml.kernel.org/r/BN6PR12MB1809BD30AA5B890C054F9832F7B50@BN6PR12MB1809.namprd12.prod.outlook.com

Downtraining a link "for compatibility reasons" is one of those dumb 
things that devices do. I'm SURPRISED AMD HW does it, although it is 
perfectly permissible by PCIe spec.

>> Another case: Some devices (lower-end GPUs) use silicon (and marketing)
>> that advertises x16, but they're only routed for x8. I'm okay with
>> seeing an informational message in this case. In fact, I didn't know
>> that the Quadro card I've had for three years is only wired for x8
>> until I tested this patch.
> 
> Yeah, it's probably OK.  I don't want bug reports from people who
> think something's broken when it's really just a hardware limitation
> of their system.  But hopefully the message is not alarming.

It looks fairly innocent:

[    0.749415] pci 0000:18:00.0: 4.000 Gb/s available PCIe bandwidth, 
limited by 5 GT/s x1 link at 0000:17:03.0 (capable of 15.752 Gb/s with 8 
GT/s x2 link)
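
(For reference, those numbers follow from the per-lane encoding
overhead: 5 GT/s with 8b/10b encoding carries 5 * 8/10 = 4.000 Gb/s
per lane, while 8 GT/s with 128b/130b encoding carries roughly
7.876 Gb/s per lane, so x2 gives the 15.752 Gb/s capability shown.)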

>>> So I'm not sure whether it's better to do this in the core for all
>>> devices, or if we should just add it to the high-performance drivers
>>> that really care.
>>
>> You're thinking "do I really need that bandwidth" because I'm using a
>> function called "_bandwidth_". The point of the change is very far from
>> that: it is to help in system troubleshooting by detecting downtraining
>> conditions.
> 
> I'm not sure what you think I'm thinking :)  My question is whether
> it's worthwhile to print this extra information for *every* PCIe
> device, given that your use case is the tiny percentage of broken
> systems.

I think this information is a lot more useful than a bunch of other info 
that's printed. Is "type 00 class 0x088000" more valuable? What about 
"reg 0x20: [mem 0x9d950000-0x9d95ffff 64bit pref]", which is also 
available under /proc/iomem for those curious?

> If we only printed the info in the "bw_avail < bw_cap" case, i.e.,
> when the device is capable of more than it's getting, that would make
> a lot of sense to me.  The normal case line is more questionable.  I
> think the reason that's there is because the network drivers are very
> performance sensitive and like to see that info all the time.

I agree that can be an acceptable compromise.

> Maybe we need something like this:
> 
>    pcie_print_link_status(struct pci_dev *dev, int verbose)
>    {
>      ...
>      if (bw_avail >= bw_cap) {
>        if (verbose)
>          pci_info(dev, "... available PCIe bandwidth ...");
>      } else
>        pci_info(dev, "... available PCIe bandwidth, limited by ...");
>    }
> 
> So the core could print only the potential problems with:
> 
>    pcie_print_link_status(dev, 0);
> 
> and drivers that really care even if there's no problem could do:
> 
>    pcie_print_link_status(dev, 1);

Sounds good. I'll try to push out an updated patch early next week.

>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> [snip]
>>>> +	/* Look from the device up to avoid downstream ports with no devices. */
>>>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>>>> +		return;
>>>
>>> Do we care about Upstream Ports here?
>>
>> YES! Switches. e.g. an x16 switch with 4x downstream ports could
>> downtrain at 8x and 4x, and we'd never catch it.
> 
> OK, I think I see your point: if the upstream port *could* do 16x but
> only trains to 4x, and two endpoints below it are both capable of 4x,
> the endpoints *think* they're happy but in fact they have to share 4x
> when they could use more.
> 
> Bjorn
> 

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
@ 2018-07-19 15:49     ` Alex G.
  2018-07-23  5:21       ` Tal Gilboa
  0 siblings, 1 reply; 47+ messages in thread
From: Alex G. @ 2018-07-19 15:49 UTC (permalink / raw)
  To: Tal Gilboa, Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher



On 07/18/2018 08:38 AM, Tal Gilboa wrote:
> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>> [+cc maintainers of drivers that already use pcie_print_link_status()
>> and GPU folks]
[snip]
>>
>>> +    /* Multi-function PCIe share the same link/status. */
>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>> +        return;
>>> +
>>> +    pcie_print_link_status(dev);
>>> +}
> 
> Is this function called by default for every PCIe device? What about 
> VFs? We make an exception for them on our driver since a VF doesn't have 
> access to the needed information in order to provide a meaningful message.

I'm assuming VF means virtual function. pcie_print_link_status() doesn't 
care if it's passed a virtual function. It will try to do its job. 
That's why I bail out three lines above, with 'dev->is_virtfn' check.

Alex

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-19 15:49     ` Alex G.
@ 2018-07-23  5:21       ` Tal Gilboa
  2018-07-23 17:01         ` Alex G.
  0 siblings, 1 reply; 47+ messages in thread
From: Tal Gilboa @ 2018-07-23  5:21 UTC (permalink / raw)
  To: Alex G., Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 7/19/2018 6:49 PM, Alex G. wrote:
> 
> 
> On 07/18/2018 08:38 AM, Tal Gilboa wrote:
>> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>>> [+cc maintainers of drivers that already use pcie_print_link_status()
>>> and GPU folks]
> [snip]
>>>
>>>> +    /* Multi-function PCIe share the same link/status. */
>>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>>> +        return;
>>>> +
>>>> +    pcie_print_link_status(dev);
>>>> +}
>>
>> Is this function called by default for every PCIe device? What about 
>> VFs? We make an exception for them on our driver since a VF doesn't 
>> have access to the needed information in order to provide a meaningful 
>> message.
> 
> I'm assuming VF means virtual function. pcie_print_link_status() doesn't 
> care if it's passed a virtual function. It will try to do its job. 
> That's why I bail out three lines above, with 'dev->is_virtfn' check.
> 
> Alex

That's the point - we don't want to call pcie_print_link_status() for 
virtual functions. We make the distinction in our driver. If you want to 
change the code to call this function by default it shouldn't affect the 
current usage.

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-23  5:21       ` Tal Gilboa
@ 2018-07-23 17:01         ` Alex G.
  2018-07-23 21:35           ` Tal Gilboa
  0 siblings, 1 reply; 47+ messages in thread
From: Alex G. @ 2018-07-23 17:01 UTC (permalink / raw)
  To: Tal Gilboa, Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 07/23/2018 12:21 AM, Tal Gilboa wrote:
> On 7/19/2018 6:49 PM, Alex G. wrote:
>>
>>
>> On 07/18/2018 08:38 AM, Tal Gilboa wrote:
>>> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>>>> [+cc maintainers of drivers that already use pcie_print_link_status()
>>>> and GPU folks]
>> [snip]
>>>>
>>>>> +    /* Multi-function PCIe share the same link/status. */
>>>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>>>> +        return;
>>>>> +
>>>>> +    pcie_print_link_status(dev);
>>>>> +}
>>>
>>> Is this function called by default for every PCIe device? What about 
>>> VFs? We make an exception for them on our driver since a VF doesn't 
>>> have access to the needed information in order to provide a 
>>> meaningful message.
>>
>> I'm assuming VF means virtual function. pcie_print_link_status() 
>> doesn't care if it's passed a virtual function. It will try to do its 
>> job. That's why I bail out three lines above, with 'dev->is_virtfn' 
>> check.
>>
>> Alex
> 
> That's the point - we don't want to call pcie_print_link_status() for 
> virtual functions. We make the distinction in our driver. If you want to 
> change the code to call this function by default it shouldn't affect the 
> current usage.

I'm not sure I follow what you're asking. I understand you 
want to avoid printing this message on virtual functions, and that's 
already taken care of. I'm also not changing current behavior.  Let's 
get v2 out and start the discussion again based on that.

Alex

* [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER
  2018-07-18 21:53     ` Bjorn Helgaas
  2018-07-19 15:46       ` Alex G.
@ 2018-07-23 20:01       ` Alexandru Gagniuc
  2018-07-25  1:24         ` kbuild test robot
  2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
  2 siblings, 1 reply; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-07-23 20:01 UTC (permalink / raw)
  To: linux-pci, bhelgaas
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	jakub.kicinski, airlied, alexander.deucher, mike.marciniszyn,
	Alexandru Gagniuc, Frederick Lawler, Oza Pawandeep,
	Greg Kroah-Hartman, linux-kernel

When we don't own AER, we shouldn't touch the AER error bits. Clearing
error bits willy-nilly might cause firmware to miss some errors. In
theory, these bits get cleared by FFS, or via ACPI _HPX method. These
mechanisms are not subject to the problem.

This race is mostly of theoretical significance, since I can't
reasonably demonstrate this race in the lab.

On a side-note, pcie_aer_is_kernel_first() is created to alleviate the
need for two checks: aer_cap and get_firmware_first().

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/pcie/aer.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e88386af28..85c3e173c025 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -307,6 +307,12 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
 		aer_set_firmware_first(dev);
 	return dev->__aer_firmware_first;
 }
+
+static bool pcie_aer_is_kernel_first(struct pci_dev *dev)
+{
+	return !!dev->aer_cap && !pcie_aer_get_firmware_first(dev);
+}
+
 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
 				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
 
@@ -337,10 +343,7 @@ bool aer_acpi_firmware_first(void)
 
 int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
-	if (pcie_aer_get_firmware_first(dev))
-		return -EIO;
-
-	if (!dev->aer_cap)
+	if (!pcie_aer_is_kernel_first(dev))
 		return -EIO;
 
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
@@ -349,7 +352,7 @@ EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
 int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
-	if (pcie_aer_get_firmware_first(dev))
+	if (!pcie_aer_is_kernel_first(dev))
 		return -EIO;
 
 	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
@@ -383,10 +386,10 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 	if (!pci_is_pcie(dev))
 		return -ENODEV;
 
-	pos = dev->aer_cap;
-	if (!pos)
+	if (pcie_aer_is_kernel_first(dev))
 		return -EIO;
 
+	pos = dev->aer_cap;
 	port_type = pci_pcie_type(dev);
 	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
 		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
-- 
2.17.1


* [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-18 21:53     ` Bjorn Helgaas
  2018-07-19 15:46       ` Alex G.
  2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
@ 2018-07-23 20:03       ` Alexandru Gagniuc
  2018-07-23 21:01         ` Jakub Kicinski
  2 siblings, 1 reply; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-07-23 20:03 UTC (permalink / raw)
  To: linux-pci, bhelgaas
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	jakub.kicinski, airlied, alexander.deucher, mike.marciniszyn,
	Alexandru Gagniuc, linux-kernel

PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive, nor
straightforward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---

For the sake of review, I've created a __pcie_print_link_status() which
takes a 'verbose' argument. If we agree we want to go this route, and
update the users of pcie_print_link_status(), I can split this up in two
patches. I prefer just printing this information in the core functions,
and letting drivers not have to worry about this. Though there seems to
be a strong preference for not going that route, so here it goes:

Changes since v4:
 - Use 'verbose' argument to print bandwidth under normal conditions
 - Without verbose, only downtraining conditions are reported

Changes since v3:
 - Remove extra newline and parentheses.

Changes since v2:
 - Check dev->is_virtfn flag

Changes since v1:
 - Use pcie_print_link_status() instead of reimplementing logic

 drivers/pci/pci.c   | 22 ++++++++++++++++++----
 drivers/pci/probe.c | 21 +++++++++++++++++++++
 include/linux/pci.h |  1 +
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 316496e99da9..414ad7b3abdb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Be verbose -- print info even when enough bandwidth is available.
  *
  * Report the available bandwidth at the device.  If this is less than the
  * device is capable of, report the device's maximum possible bandwidth and
  * the upstream link that limits its performance to less than that.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
 	enum pcie_link_width width, width_cap;
 	enum pci_bus_speed speed, speed_cap;
@@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
 	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
 	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-	if (bw_avail >= bw_cap)
+	if (bw_avail >= bw_cap && verbose)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
-	else
+	else if (bw_avail < bw_cap)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 			 bw_avail / 1000, bw_avail % 1000,
 			 PCIE_SPEED2STR(speed), width,
@@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.  If this is less than the
+ * device is capable of, report the device's maximum possible bandwidth and
+ * the upstream link that limits its performance to less than that.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	__pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..1f7336377c3b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices. */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe share the same link/status. */
+	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+		return;
+
+	__pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	/* Check link and detect downtrain errors */
+	pcie_check_upstream_link(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index abd5d5e17aee..15bfab8f7a1b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
 u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
 			     enum pci_bus_speed *speed,
 			     enum pcie_link_width *width);
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
 void pcie_print_link_status(struct pci_dev *dev);
 int pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
-- 
2.17.1


* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
@ 2018-07-23 21:01         ` Jakub Kicinski
  2018-07-23 21:52           ` Tal Gilboa
  0 siblings, 1 reply; 47+ messages in thread
From: Jakub Kicinski @ 2018-07-23 21:01 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, tariqt, airlied, alexander.deucher, mike.marciniszyn,
	linux-kernel

On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
> 
> For the sake of review, I've created a __pcie_print_link_status() which
> takes a 'verbose' argument. If we agree we want to go this route, and
> update the users of pcie_print_link_status(), I can split this up in two
> patches. I prefer just printing this information in the core functions,
> and letting drivers not have to worry about this. Though there seems to
> be a strong preference for not going that route, so here it goes:

FWIW the networking drivers print PCIe BW because sometimes the network
bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
card on a x8 link.

Sorry to bike shed, but currently the networking cards print the info
during probe.  Would it make sense to move your message closer to probe
time, rather than when the device is added?  If the driver structure is
available, we could also consider adding a boolean to struct pci_driver
to indicate if driver wants the verbose message?  This way we avoid
duplicated prints.

I have no objection to current patch, it LGTM.  Just a thought.
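
Something like this rough sketch of the idea, say (field name
hypothetical, untested):

  struct pci_driver {
          ...
          /* opt in to the bandwidth message even when not degraded */
          bool verbose_link_status;
          ...
  };

with the core then doing, around probe time:

  __pcie_print_link_status(dev, drv->verbose_link_status);

so drivers that care get the verbose line and nothing is printed
twice.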

>  drivers/pci/pci.c   | 22 ++++++++++++++++++----
>  drivers/pci/probe.c | 21 +++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  3 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 316496e99da9..414ad7b3abdb 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>  }
>  
>  /**
> - * pcie_print_link_status - Report the PCI device's link speed and width
> + * __pcie_print_link_status - Report the PCI device's link speed and width
>   * @dev: PCI device to query
> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>   *
>   * Report the available bandwidth at the device.  If this is less than the
>   * device is capable of, report the device's maximum possible bandwidth and
>   * the upstream link that limits its performance to less than that.
>   */
> -void pcie_print_link_status(struct pci_dev *dev)
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>  {
>  	enum pcie_link_width width, width_cap;
>  	enum pci_bus_speed speed, speed_cap;
> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>  	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>  	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
> -	if (bw_avail >= bw_cap)
> +	if (bw_avail >= bw_cap && verbose)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
> -	else
> +	else if (bw_avail < bw_cap)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>  			 bw_avail / 1000, bw_avail % 1000,
>  			 PCIE_SPEED2STR(speed), width,
> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
>  }
> +
> +/**
> + * pcie_print_link_status - Report the PCI device's link speed and width
> + * @dev: PCI device to query
> + *
> + * Report the available bandwidth at the device.  If this is less than the
> + * device is capable of, report the device's maximum possible bandwidth and
> + * the upstream link that limits its performance to less than that.
> + */
> +void pcie_print_link_status(struct pci_dev *dev)
> +{
> +	__pcie_print_link_status(dev, true);
> +}
>  EXPORT_SYMBOL(pcie_print_link_status);
>  
>  /**
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac876e32de4b..1f7336377c3b 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	/* Look from the device up to avoid downstream ports with no devices. */
> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +		return;
> +
> +	/* Multi-function PCIe share the same link/status. */
> +	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
> +		return;
> +
> +	__pcie_print_link_status(dev, false);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>  	/* Enhanced Allocation */
> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	/* Advanced Error Reporting */
>  	pci_aer_init(dev);
>  
> +	/* Check link and detect downtrain errors */
> +	pcie_check_upstream_link(dev);
> +
>  	if (pci_probe_reset_function(dev) == 0)
>  		dev->reset_fn = 1;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index abd5d5e17aee..15bfab8f7a1b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>  u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
>  			     enum pci_bus_speed *speed,
>  			     enum pcie_link_width *width);
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>  void pcie_print_link_status(struct pci_dev *dev);
>  int pcie_flr(struct pci_dev *dev);
>  int __pci_reset_function_locked(struct pci_dev *dev);


* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-23 17:01         ` Alex G.
@ 2018-07-23 21:35           ` Tal Gilboa
  0 siblings, 0 replies; 47+ messages in thread
From: Tal Gilboa @ 2018-07-23 21:35 UTC (permalink / raw)
  To: Alex G., Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 7/23/2018 8:01 PM, Alex G. wrote:
> On 07/23/2018 12:21 AM, Tal Gilboa wrote:
>> On 7/19/2018 6:49 PM, Alex G. wrote:
>>>
>>>
>>> On 07/18/2018 08:38 AM, Tal Gilboa wrote:
>>>> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>>>>> [+cc maintainers of drivers that already use pcie_print_link_status()
>>>>> and GPU folks]
>>> [snip]
>>>>>
>>>>>> +    /* Multi-function PCIe share the same link/status. */
>>>>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>>>>> +        return;
>>>>>> +
>>>>>> +    pcie_print_link_status(dev);
>>>>>> +}
>>>>
>>>> Is this function called by default for every PCIe device? What about 
>>>> VFs? We make an exception for them on our driver since a VF doesn't 
>>>> have access to the needed information in order to provide a 
>>>> meaningful message.
>>>
>>> I'm assuming VF means virtual function. pcie_print_link_status() 
>>> doesn't care if it's passed a virtual function. It will try to do its 
>>> job. That's why I bail out three lines above, with 'dev->is_virtfn' 
>>> check.
>>>
>>> Alex
>>
>> That's the point - we don't want to call pcie_print_link_status() for 
>> virtual functions. We make the distinction in our driver. If you want 
>> to change the code to call this function by default it shouldn't 
>> affect the current usage.
> 
> I'm not understanding very well what you're asking. I understand you 
> want to avoid printing this message on virtual functions, and that's 
> already taken care of. I'm also not changing current behavior.  Let's 
> get v2 out and start the discussion again based on that.
> 
> Alex

Oh ok I see. In this case, please remove the explicit call in mlx4/5 
drivers so it won't be duplicated.

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 21:01         ` Jakub Kicinski
@ 2018-07-23 21:52           ` Tal Gilboa
  2018-07-23 22:14             ` Jakub Kicinski
  2018-07-31  6:40             ` Tal Gilboa
  0 siblings, 2 replies; 47+ messages in thread
From: Tal Gilboa @ 2018-07-23 21:52 UTC (permalink / raw)
  To: Jakub Kicinski, Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive, nor
>> straightforward.
>>
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
>>
>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> ---
>>
>> For the sake of review, I've created a __pcie_print_link_status() which
>> takes a 'verbose' argument. If we agree we want to go this route, and update
>> the users of pcie_print_link_status(), I can split this up in two patches.
>> I prefer just printing this information in the core functions, and letting
>> drivers not have to worry about this. Though there seems to be strong support for
>> not going that route, so here it goes:
> 
> FWIW the networking drivers print PCIe BW because sometimes the network
> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
> card on a x8 link.
> 
> Sorry to bike shed, but currently the networking cards print the info
> during probe.  Would it make sense to move your message closer to probe
> time?  Rather than when device is added.  If driver structure is
> available, we could also consider adding a boolean to struct pci_driver
> to indicate if driver wants the verbose message?  This way we avoid
> duplicated prints.
> 
> I have no objection to current patch, it LGTM.  Just a thought.
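
(As a sanity check on the over-provisioning arithmetic: a PCIe 3.0 x8
link carries roughly 8 GT/s * 8 lanes * 128b/130b encoding ~= 63 Gb/s,
so an 80 Gb/s multi-port card genuinely cannot reach aggregate line
rate even when the link trains at full width and speed.)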

I don't see the reason for having two functions. What's the problem with 
adding the verbose argument to the original function?

> 
>>   drivers/pci/pci.c   | 22 ++++++++++++++++++----
>>   drivers/pci/probe.c | 21 +++++++++++++++++++++
>>   include/linux/pci.h |  1 +
>>   3 files changed, 40 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 316496e99da9..414ad7b3abdb 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>>   }
>>   
>>   /**
>> - * pcie_print_link_status - Report the PCI device's link speed and width
>> + * __pcie_print_link_status - Report the PCI device's link speed and width
>>    * @dev: PCI device to query
>> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>>    *
>>    * Report the available bandwidth at the device.  If this is less than the
>>    * device is capable of, report the device's maximum possible bandwidth and
>>    * the upstream link that limits its performance to less than that.
>>    */
>> -void pcie_print_link_status(struct pci_dev *dev)
>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>>   {
>>   	enum pcie_link_width width, width_cap;
>>   	enum pci_bus_speed speed, speed_cap;
>> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>>   	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>>   	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>>   
>> -	if (bw_avail >= bw_cap)
>> +	if (bw_avail >= bw_cap && verbose)
>>   		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>>   			 bw_cap / 1000, bw_cap % 1000,
>>   			 PCIE_SPEED2STR(speed_cap), width_cap);
>> -	else
>> +	else if (bw_avail < bw_cap)
>>   		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>>   			 bw_avail / 1000, bw_avail % 1000,
>>   			 PCIE_SPEED2STR(speed), width,
>> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>>   			 bw_cap / 1000, bw_cap % 1000,
>>   			 PCIE_SPEED2STR(speed_cap), width_cap);
>>   }
>> +
>> +/**
>> + * pcie_print_link_status - Report the PCI device's link speed and width
>> + * @dev: PCI device to query
>> + *
>> + * Report the available bandwidth at the device.  If this is less than the
>> + * device is capable of, report the device's maximum possible bandwidth and
>> + * the upstream link that limits its performance to less than that.
>> + */
>> +void pcie_print_link_status(struct pci_dev *dev)
>> +{
>> +	__pcie_print_link_status(dev, true);
>> +}
>>   EXPORT_SYMBOL(pcie_print_link_status);
>>   
>>   /**
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index ac876e32de4b..1f7336377c3b 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>>   	return dev;
>>   }
>>   
>> +static void pcie_check_upstream_link(struct pci_dev *dev)
>> +{
>> +	if (!pci_is_pcie(dev))
>> +		return;
>> +
>> +	/* Look from the device up to avoid downstream ports with no devices. */
>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>> +		return;
>> +
>> +	/* Multi-function PCIe devices share the same link/status. */
>> +	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
>> +		return;
>> +
>> +	__pcie_print_link_status(dev, false);
>> +}
>> +
>>   static void pci_init_capabilities(struct pci_dev *dev)
>>   {
>>   	/* Enhanced Allocation */
>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>>   	/* Advanced Error Reporting */
>>   	pci_aer_init(dev);
>>   
>> +	/* Check link and detect downtrain errors */
>> +	pcie_check_upstream_link(dev);

This is called for every PCIe device right? Won't there be a duplicated 
print in case a device loads with lower PCIe bandwidth than needed?

>> +
>>   	if (pci_probe_reset_function(dev) == 0)
>>   		dev->reset_fn = 1;
>>   }
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index abd5d5e17aee..15bfab8f7a1b 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
>>   			     enum pci_bus_speed *speed,
>>   			     enum pcie_link_width *width);
>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>   void pcie_print_link_status(struct pci_dev *dev);
>>   int pcie_flr(struct pci_dev *dev);
>>   int __pci_reset_function_locked(struct pci_dev *dev);
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 21:52           ` Tal Gilboa
@ 2018-07-23 22:14             ` Jakub Kicinski
  2018-07-23 23:59               ` Alex G.
  2018-07-31  6:40             ` Tal Gilboa
  1 sibling, 1 reply; 47+ messages in thread
From: Jakub Kicinski @ 2018-07-23 22:14 UTC (permalink / raw)
  To: Tal Gilboa
  Cc: Alexandru Gagniuc, linux-pci, bhelgaas, keith.busch,
	alex_gagniuc, austin_bolen, shyam_iyer, jeffrey.t.kirsher,
	ariel.elior, michael.chan, ganeshgr, Tariq Toukan, airlied,
	alexander.deucher, mike.marciniszyn, linux-kernel

On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
> > On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:  
> >> PCIe downtraining happens when both the device and PCIe port are
> >> capable of a larger bus width or higher speed than negotiated.
> >> Downtraining might be indicative of other problems in the system, and
> >> identifying this from userspace is neither intuitive, nor
> >> straightforward.
> >>
> >> The easiest way to detect this is with pcie_print_link_status(),
> >> since the bottleneck is usually the link that is downtrained. It's not
> >> a perfect solution, but it works extremely well in most cases.
> >>
> >> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> >> ---
> >>
> >> For the sake of review, I've created a __pcie_print_link_status() which
> >> takes a 'verbose' argument. If we agree we want to go this route, and update
> >> the users of pcie_print_link_status(), I can split this up in two patches.
> >> I prefer just printing this information in the core functions, and letting
> >> drivers not have to worry about this. Though there seems to be strong support for
> >> not going that route, so here it goes:  
> > 
> > FWIW the networking drivers print PCIe BW because sometimes the network
> > bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
> > card on a x8 link.
> > 
> > Sorry to bike shed, but currently the networking cards print the info
> > during probe.  Would it make sense to move your message closer to probe
> > time?  Rather than when device is added.  If driver structure is
> > available, we could also consider adding a boolean to struct pci_driver
> > to indicate if driver wants the verbose message?  This way we avoid
> > duplicated prints.
> > 
> > I have no objection to current patch, it LGTM.  Just a thought.  
> 
> I don't see the reason for having two functions. What's the problem with 
> adding the verbose argument to the original function?

IMHO it's reasonable to keep the default parameter to what 90% of users
want by means of a wrapper.  The non-verbose output is provided by
the core already for all devices.

What do you think of my proposal above Tal?  That would make the extra
wrapper unnecessary since the verbose parameter would be part of the
driver structure, and it would avoid the duplicated output.
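
To make that concrete, a minimal sketch of the proposed flag -- the
field name and hook point below are illustrative assumptions, not an
existing kernel API:

	/* Hypothetical per-driver opt-in for the verbose bandwidth print. */
	struct pci_driver {
		/* ... existing fields ... */
		unsigned int pcie_bw_verbose:1;	/* hypothetical field */
	};

	/* The PCI core could then decide verbosity once the driver is bound: */
	static void pci_report_link_bw(struct pci_dev *dev)
	{
		bool verbose = dev->driver && dev->driver->pcie_bw_verbose;

		__pcie_print_link_status(dev, verbose);
	}

With something like this in place, the explicit
pcie_print_link_status() calls in the network drivers would become
redundant.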

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 22:14             ` Jakub Kicinski
@ 2018-07-23 23:59               ` Alex G.
  2018-07-24 13:39                 ` Tal Gilboa
  0 siblings, 1 reply; 47+ messages in thread
From: Alex G. @ 2018-07-23 23:59 UTC (permalink / raw)
  To: Jakub Kicinski, Tal Gilboa
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel



On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
> On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
>> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>>> PCIe downtraining happens when both the device and PCIe port are
>>>> capable of a larger bus width or higher speed than negotiated.
>>>> Downtraining might be indicative of other problems in the system, and
>>>> identifying this from userspace is neither intuitive, nor
>>>> straightforward.
>>>>
>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>> a perfect solution, but it works extremely well in most cases.
>>>>
>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>> ---
>>>>
>>>> For the sake of review, I've created a __pcie_print_link_status() which
>>>> takes a 'verbose' argument. If we agree we want to go this route, and update
>>>> the users of pcie_print_link_status(), I can split this up in two patches.
>>>> I prefer just printing this information in the core functions, and letting
>>>> drivers not have to worry about this. Though there seems to be strong support for
>>>> not going that route, so here it goes:
>>>
>>> FWIW the networking drivers print PCIe BW because sometimes the network
>>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>>> card on a x8 link.
>>>
>>> Sorry to bike shed, but currently the networking cards print the info
>>> during probe.  Would it make sense to move your message closer to probe
>>> time?  Rather than when device is added.  If driver structure is
>>> available, we could also consider adding a boolean to struct pci_driver
>>> to indicate if driver wants the verbose message?  This way we avoid
>>> duplicated prints.
>>>
>>> I have no objection to current patch, it LGTM.  Just a thought.
>>
>> I don't see the reason for having two functions. What's the problem with
>> adding the verbose argument to the original function?
> 
> IMHO it's reasonable to keep the default parameter to what 90% of users
> want by means of a wrapper.  The non-verbose output is provided by
> the core already for all devices.
> 
> What do you think of my proposal above Tal?  That would make the extra
> wrapper unnecessary since the verbose parameter would be part of the
> driver structure, and it would avoid the duplicated output.

I see how it might make sense to add another member to the driver 
struct, but is it worth the extra learning curve? It seems to be 
something with the potential to confuse new driver developers, while 
having only a marginal benefit.
Although, if that's what people want...

Alex

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 23:59               ` Alex G.
@ 2018-07-24 13:39                 ` Tal Gilboa
  2018-07-30 23:26                   ` Alex_Gagniuc
  0 siblings, 1 reply; 47+ messages in thread
From: Tal Gilboa @ 2018-07-24 13:39 UTC (permalink / raw)
  To: Alex G., Jakub Kicinski
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/24/2018 2:59 AM, Alex G. wrote:
> 
> 
> On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
>> On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
>>> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>>>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>>>> PCIe downtraining happens when both the device and PCIe port are
>>>>> capable of a larger bus width or higher speed than negotiated.
>>>>> Downtraining might be indicative of other problems in the system, and
>>>>> identifying this from userspace is neither intuitive, nor
>>>>> straightforward.
>>>>>
>>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>>> a perfect solution, but it works extremely well in most cases.
>>>>>
>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>>> ---
>>>>>
>>>>> For the sake of review, I've created a __pcie_print_link_status() 
>>>>> which
>>>>> takes a 'verbose' argument. If we agree we want to go this route, and 
>>>>> update
>>>>> the users of pcie_print_link_status(), I can split this up in two 
>>>>> patches.
>>>>> I prefer just printing this information in the core functions, and 
>>>>> letting
>>>>> drivers not have to worry about this. Though there seems to be 
>>>>> strong support for
>>>>> not going that route, so here it goes:
>>>>
>>>> FWIW the networking drivers print PCIe BW because sometimes the network
>>>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>>>> card on a x8 link.
>>>>
>>>> Sorry to bike shed, but currently the networking cards print the info
>>>> during probe.  Would it make sense to move your message closer to probe
>>>> time?  Rather than when device is added.  If driver structure is
>>>> available, we could also consider adding a boolean to struct pci_driver
>>>> to indicate if driver wants the verbose message?  This way we avoid
>>>> duplicated prints.
>>>>
>>>> I have no objection to current patch, it LGTM.  Just a thought.
>>>
>>> I don't see the reason for having two functions. What's the problem with
>>> adding the verbose argument to the original function?
>>
>> IMHO it's reasonable to keep the default parameter to what 90% of users
>> want by means of a wrapper.  The non-verbose output is provided by
>> the core already for all devices.
>>
>> What do you think of my proposal above Tal?  That would make the extra
>> wrapper unnecessary since the verbose parameter would be part of the
>> driver structure, and it would avoid the duplicated output.
> 
> I see how it might make sense to add another member to the driver 
> struct, but is it worth the extra learning curve? It seems to be 
> something with the potential to confuse new driver developers, while 
> having only a marginal benefit.
> Although, if that's what people want...

I prefer the wrapper function. Looking at struct pci_driver it would 
seem strange for it to hold a field for controlling verbosity (IMO). 
This is a very (very) specific field in a very general struct.

> 
> Alex

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER
  2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
@ 2018-07-25  1:24         ` kbuild test robot
  0 siblings, 0 replies; 47+ messages in thread
From: kbuild test robot @ 2018-07-25  1:24 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: kbuild-all, linux-pci, bhelgaas, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, jeffrey.t.kirsher, ariel.elior,
	michael.chan, ganeshgr, tariqt, jakub.kicinski, airlied,
	alexander.deucher, mike.marciniszyn, Alexandru Gagniuc,
	Frederick Lawler, Oza Pawandeep, Greg Kroah-Hartman,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]

Hi Alexandru,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on v4.18-rc6 next-20180724]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Alexandru-Gagniuc/PCI-AER-Do-not-clear-AER-bits-if-we-don-t-own-AER/20180725-003708
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-s0-07250049 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/string.h:6:0,
                    from include/linux/uuid.h:20,
                    from include/linux/cper.h:24,
                    from drivers/pci/pcie/aer.c:15:
   drivers/pci/pcie/aer.c: In function 'pci_enable_pcie_error_reporting':
   drivers/pci/pcie/aer.c:371:7: error: implicit declaration of function 'pcie_aer_is_kernel_first' [-Werror=implicit-function-declaration]
     if (!pcie_aer_is_kernel_first(dev))
          ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> drivers/pci/pcie/aer.c:371:2: note: in expansion of macro 'if'
     if (!pcie_aer_is_kernel_first(dev))
     ^~
   cc1: some warnings being treated as errors

vim +/if +371 drivers/pci/pcie/aer.c

   365	
   366	#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
   367					 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
   368	
   369	int pci_enable_pcie_error_reporting(struct pci_dev *dev)
   370	{
 > 371		if (!pcie_aer_is_kernel_first(dev))
   372			return -EIO;
   373	
   374		return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
   375	}
   376	EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
   377	
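
The build failure above is the classic missing-declaration pattern:
pcie_aer_is_kernel_first() is presumably only declared when AER support
is configured in, and this randconfig lacks it. A common fix for this
class of error -- sketched here as an assumption, not the actual
follow-up patch -- is a static inline stub for the disabled case:

	#ifdef CONFIG_PCIEAER
	bool pcie_aer_is_kernel_first(struct pci_dev *dev);
	#else
	static inline bool pcie_aer_is_kernel_first(struct pci_dev *dev)
	{
		return false;	/* without AER support, the kernel is never first */
	}
	#endif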

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32082 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-24 13:39                 ` Tal Gilboa
@ 2018-07-30 23:26                   ` Alex_Gagniuc
  0 siblings, 0 replies; 47+ messages in thread
From: Alex_Gagniuc @ 2018-07-30 23:26 UTC (permalink / raw)
  To: talgi, mr.nuke.me, jakub.kicinski
  Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen, Shyam.Iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	airlied, alexander.deucher, mike.marciniszyn, linux-kernel

On 07/24/2018 08:40 AM, Tal Gilboa wrote:
> On 7/24/2018 2:59 AM, Alex G. wrote:
>>
>>
>> On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
>>> On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
>>>> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>>>>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>>>>> PCIe downtraining happens when both the device and PCIe port are
>>>>>> capable of a larger bus width or higher speed than negotiated.
>>>>>> Downtraining might be indicative of other problems in the system, and
>>>>>> identifying this from userspace is neither intuitive, nor
>>>>>> straightforward.
>>>>>>
>>>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>>>> a perfect solution, but it works extremely well in most cases.
>>>>>>
>>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>>>> ---
>>>>>>
>>>>>> For the sake of review, I've created a __pcie_print_link_status()
>>>>>> which
>>>>>> takes a 'verbose' argument. If we agree we want to go this route, and
>>>>>> update
>>>>>> the users of pcie_print_link_status(), I can split this up in two
>>>>>> patches.
>>>>>> I prefer just printing this information in the core functions, and
>>>>>> letting
>>>>>> drivers not have to worry about this. Though there seems to be
>>>>>> strong support for
>>>>>> not going that route, so here it goes:
>>>>>
>>>>> FWIW the networking drivers print PCIe BW because sometimes the network
>>>>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>>>>> card on a x8 link.
>>>>>
>>>>> Sorry to bike shed, but currently the networking cards print the info
>>>>> during probe.  Would it make sense to move your message closer to probe
>>>>> time?  Rather than when device is added.  If driver structure is
>>>>> available, we could also consider adding a boolean to struct pci_driver
>>>>> to indicate if driver wants the verbose message?  This way we avoid
>>>>> duplicated prints.
>>>>>
>>>>> I have no objection to current patch, it LGTM.  Just a thought.
>>>>
>>>> I don't see the reason for having two functions. What's the problem with
>>>> adding the verbose argument to the original function?
>>>
>>> IMHO it's reasonable to keep the default parameter to what 90% of users
> >>> want by means of a wrapper.  The non-verbose output is provided by
>>> the core already for all devices.
>>>
>>> What do you think of my proposal above Tal?  That would make the extra
>>> wrapper unnecessary since the verbose parameter would be part of the
>>> driver structure, and it would avoid the duplicated output.
>>
>> I see how it might make sense to add another member to the driver
>> struct, but is it worth the extra learning curve? It seems to be
>> something with the potential to confuse new driver developers, while
>> having only a marginal benefit.
>> Although, if that's what people want...
> 
> I prefer the wrapper function. Looking at struct pci_driver it would
> seem strange for it to hold a field for controlling verbosity (IMO).
> This is a very (very) specific field in a very general struct.

If people are okay with the wrapper, then I'm not going to update the patch.

Alex

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 21:52           ` Tal Gilboa
  2018-07-23 22:14             ` Jakub Kicinski
@ 2018-07-31  6:40             ` Tal Gilboa
  2018-07-31 15:10               ` Alex G.
  1 sibling, 1 reply; 47+ messages in thread
From: Tal Gilboa @ 2018-07-31  6:40 UTC (permalink / raw)
  To: Jakub Kicinski, Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/24/2018 12:52 AM, Tal Gilboa wrote:
> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>> PCIe downtraining happens when both the device and PCIe port are
>>> capable of a larger bus width or higher speed than negotiated.
>>> Downtraining might be indicative of other problems in the system, and
>>> identifying this from userspace is neither intuitive, nor
>>> straightforward.
>>>
>>> The easiest way to detect this is with pcie_print_link_status(),
>>> since the bottleneck is usually the link that is downtrained. It's not
>>> a perfect solution, but it works extremely well in most cases.
>>>
>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>> ---
>>>
>>> For the sake of review, I've created a __pcie_print_link_status() which
>>> takes a 'verbose' argument. If we agree we want to go this route, and 
>>> update
>>> the users of pcie_print_link_status(), I can split this up in two 
>>> patches.
>>> I prefer just printing this information in the core functions, and 
>>> letting
>>> drivers not have to worry about this. Though there seems to be strong 
>>> support for
>>> not going that route, so here it goes:
>>
>> FWIW the networking drivers print PCIe BW because sometimes the network
>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>> card on a x8 link.
>>
>> Sorry to bike shed, but currently the networking cards print the info
>> during probe.  Would it make sense to move your message closer to probe
>> time?  Rather than when device is added.  If driver structure is
>> available, we could also consider adding a boolean to struct pci_driver
>> to indicate if driver wants the verbose message?  This way we avoid
>> duplicated prints.
>>
>> I have no objection to current patch, it LGTM.  Just a thought.
> 
> I don't see the reason for having two functions. What's the problem with 
> adding the verbose argument to the original function?
> 
>>
>>>   drivers/pci/pci.c   | 22 ++++++++++++++++++----
>>>   drivers/pci/probe.c | 21 +++++++++++++++++++++
>>>   include/linux/pci.h |  1 +
>>>   3 files changed, 40 insertions(+), 4 deletions(-)
>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 316496e99da9..414ad7b3abdb 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev 
>>> *dev, enum pci_bus_speed *speed,
>>>   }
>>>   /**
>>> - * pcie_print_link_status - Report the PCI device's link speed and 
>>> width
>>> + * __pcie_print_link_status - Report the PCI device's link speed and 
>>> width
>>>    * @dev: PCI device to query
>>> + * @verbose: Be verbose -- print info even when enough bandwidth is 
>>> available.
>>>    *
>>>    * Report the available bandwidth at the device.  If this is less 
>>> than the
>>>    * device is capable of, report the device's maximum possible 
>>> bandwidth and
>>>    * the upstream link that limits its performance to less than that.
>>>    */
>>> -void pcie_print_link_status(struct pci_dev *dev)
>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>>>   {
>>>       enum pcie_link_width width, width_cap;
>>>       enum pci_bus_speed speed, speed_cap;
>>> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>>>       bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>>>       bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, 
>>> &width);
>>> -    if (bw_avail >= bw_cap)
>>> +    if (bw_avail >= bw_cap && verbose)
>>>           pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s 
>>> x%d link)\n",
>>>                bw_cap / 1000, bw_cap % 1000,
>>>                PCIE_SPEED2STR(speed_cap), width_cap);
>>> -    else
>>> +    else if (bw_avail < bw_cap)
>>>           pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, 
>>> limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d 
>>> link)\n",
>>>                bw_avail / 1000, bw_avail % 1000,
>>>                PCIE_SPEED2STR(speed), width,
>>> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>>>                bw_cap / 1000, bw_cap % 1000,
>>>                PCIE_SPEED2STR(speed_cap), width_cap);
>>>   }
>>> +
>>> +/**
>>> + * pcie_print_link_status - Report the PCI device's link speed and 
>>> width
>>> + * @dev: PCI device to query
>>> + *
>>> + * Report the available bandwidth at the device.  If this is less 
>>> than the
>>> + * device is capable of, report the device's maximum possible 
>>> bandwidth and
>>> + * the upstream link that limits its performance to less than that.
>>> + */
>>> +void pcie_print_link_status(struct pci_dev *dev)
>>> +{
>>> +    __pcie_print_link_status(dev, true);
>>> +}
>>>   EXPORT_SYMBOL(pcie_print_link_status);
>>>   /**
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index ac876e32de4b..1f7336377c3b 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct 
>>> pci_bus *bus, int devfn)
>>>       return dev;
>>>   }
>>> +static void pcie_check_upstream_link(struct pci_dev *dev)
>>> +{
>>> +    if (!pci_is_pcie(dev))
>>> +        return;
>>> +
>>> +    /* Look from the device up to avoid downstream ports with no 
>>> devices. */
>>> +    if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>>> +        (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>>> +        (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>>> +        return;
>>> +
>>> +    /* Multi-function PCIe devices share the same link/status. */
>>> +    if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
>>> +        return;
>>> +
>>> +    __pcie_print_link_status(dev, false);
>>> +}
>>> +
>>>   static void pci_init_capabilities(struct pci_dev *dev)
>>>   {
>>>       /* Enhanced Allocation */
>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct 
>>> pci_dev *dev)
>>>       /* Advanced Error Reporting */
>>>       pci_aer_init(dev);
>>> +    /* Check link and detect downtrain errors */
>>> +    pcie_check_upstream_link(dev);
> 
> This is called for every PCIe device right? Won't there be a duplicated 
> print in case a device loads with lower PCIe bandwidth than needed?

Alex, can you comment on this please?

> 
>>> +
>>>       if (pci_probe_reset_function(dev) == 0)
>>>           dev->reset_fn = 1;
>>>   }
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
>>> **limiting_dev,
>>>                    enum pci_bus_speed *speed,
>>>                    enum pcie_link_width *width);
>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>   void pcie_print_link_status(struct pci_dev *dev);
>>>   int pcie_flr(struct pci_dev *dev);
>>>   int __pci_reset_function_locked(struct pci_dev *dev);
>>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-31  6:40             ` Tal Gilboa
@ 2018-07-31 15:10               ` Alex G.
  2018-08-05  7:05                 ` Tal Gilboa
  0 siblings, 1 reply; 47+ messages in thread
From: Alex G. @ 2018-07-31 15:10 UTC (permalink / raw)
  To: Tal Gilboa, Jakub Kicinski
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 07/31/2018 01:40 AM, Tal Gilboa wrote:
[snip]
>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct 
>>>> pci_dev *dev)
>>>>       /* Advanced Error Reporting */
>>>>       pci_aer_init(dev);
>>>> +    /* Check link and detect downtrain errors */
>>>> +    pcie_check_upstream_link(dev);
>>
>> This is called for every PCIe device right? Won't there be a 
>> duplicated print in case a device loads with lower PCIe bandwidth than 
>> needed?
> 
> Alex, can you comment on this please?

Of course I can.

There's one print at probe() time, which happens if bandwidth < max. I 
would think that's fine. There is a way to duplicate it, and that is if 
the driver also calls print_link_status(). A few driver maintainers who 
call it have indicated they'd be fine with removing it from the driver, 
and leaving it in the core PCI.

Should the device come back at low speed, go away, then come back at 
full speed, we'd still only see one print from the first probe. Again, if 
the driver doesn't also call the function.
There's the corner case where the device comes up at < max, goes away, 
then comes back faster, but still < max. There will be two prints, but 
they won't be true duplicates -- each one will indicate different BW.

Alex
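
For reference, with the format string in this patch, the single
probe-time print for a downtrained device would look roughly like the
line below (the addresses are made up, and the numbers assume the
kernel's encoding-adjusted per-lane rates, e.g. 8 GT/s ~= 7.876 Gb/s
per lane):

	pci 0000:3b:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8 GT/s x4 link at 0000:3a:01.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)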

>>>> +
>>>>       if (pci_probe_reset_function(dev) == 0)
>>>>           dev->reset_fn = 1;
>>>>   }
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
>>>> **limiting_dev,
>>>>                    enum pci_bus_speed *speed,
>>>>                    enum pcie_link_width *width);
>>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>>   void pcie_print_link_status(struct pci_dev *dev);
>>>>   int pcie_flr(struct pci_dev *dev);
>>>>   int __pci_reset_function_locked(struct pci_dev *dev);
>>>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-31 15:10               ` Alex G.
@ 2018-08-05  7:05                 ` Tal Gilboa
  2018-08-06 18:39                   ` Alex_Gagniuc
  0 siblings, 1 reply; 47+ messages in thread
From: Tal Gilboa @ 2018-08-05  7:05 UTC (permalink / raw)
  To: Alex G., Jakub Kicinski
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/31/2018 6:10 PM, Alex G. wrote:
> On 07/31/2018 01:40 AM, Tal Gilboa wrote:
> [snip]
>>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct 
>>>>> pci_dev *dev)
>>>>>       /* Advanced Error Reporting */
>>>>>       pci_aer_init(dev);
>>>>> +    /* Check link and detect downtrain errors */
>>>>> +    pcie_check_upstream_link(dev);
>>>
>>> This is called for every PCIe device right? Won't there be a 
>>> duplicated print in case a device loads with lower PCIe bandwidth 
>>> than needed?
>>
>> Alex, can you comment on this please?
> 
> Of course I can.
> 
> There's one print at probe() time, which happens if bandwidth < max. I 
> would think that's fine. There is a way to duplicate it, and that is if 
> the driver also calls print_link_status(). A few driver maintainers who 
> call it have indicated they'd be fine with removing it from the driver, 
> and leaving it in the core PCI.

We would be fine with that as well. Please include the removal in your 
patches.

> 
> Should the device come back at low speed, go away, then come back at 
> full speed, we'd still only see one print from the first probe. Again, if 
> the driver doesn't also call the function.
> There's the corner case where the device comes up at < max, goes away, 
> then comes back faster, but still < max. There will be two prints, but 
> they won't be true duplicates -- each one will indicate different BW.

This is fine.

> 
> Alex
> 
>>>>> +
>>>>>       if (pci_probe_reset_function(dev) == 0)
>>>>>           dev->reset_fn = 1;
>>>>>   }
>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>>>> --- a/include/linux/pci.h
>>>>> +++ b/include/linux/pci.h
>>>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
>>>>> **limiting_dev,
>>>>>                    enum pci_bus_speed *speed,
>>>>>                    enum pcie_link_width *width);
>>>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>>>   void pcie_print_link_status(struct pci_dev *dev);
>>>>>   int pcie_flr(struct pci_dev *dev);
>>>>>   int __pci_reset_function_locked(struct pci_dev *dev);
>>>>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-08-05  7:05                 ` Tal Gilboa
@ 2018-08-06 18:39                   ` Alex_Gagniuc
  2018-08-06 19:46                     ` Bjorn Helgaas
  0 siblings, 1 reply; 47+ messages in thread
From: Alex_Gagniuc @ 2018-08-06 18:39 UTC (permalink / raw)
  To: talgi, mr.nuke.me, jakub.kicinski
  Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen, Shyam.Iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	airlied, alexander.deucher, mike.marciniszyn, linux-kernel

On 08/05/2018 02:06 AM, Tal Gilboa wrote:
> On 7/31/2018 6:10 PM, Alex G. wrote:
>> On 07/31/2018 01:40 AM, Tal Gilboa wrote:
>> [snip]
>>>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct
>>>>>> pci_dev *dev)
>>>>>>        /* Advanced Error Reporting */
>>>>>>        pci_aer_init(dev);
>>>>>> +    /* Check link and detect downtrain errors */
>>>>>> +    pcie_check_upstream_link(dev);
>>>>
>>>> This is called for every PCIe device right? Won't there be a
>>>> duplicated print in case a device loads with lower PCIe bandwidth
>>>> than needed?
>>>
>>> Alex, can you comment on this please?
>>
>> Of course I can.
>>
>> There's one print at probe() time, which happens if bandwidth < max. I
>> would think that's fine. There is a way to duplicate it, and that is if
>> the driver also calls print_link_status(). A few driver maintainers who
>> call it have indicated they'd be fine with removing it from the driver,
>> and leaving it in the core PCI.
> 
> We would be fine with that as well. Please include the removal in your
> patches.

What's the proper procedure? Do I wait for confirmation from Bjorn 
before knocking on maintainers' doors, or do I William Wallace into 
their trees and demand they merge the removal (pending Bjorn's approval 
on the other side)?

Alex

>>
>> Should the device come back at low speed, go away, then come back at
>> full speed, we'd still only see one print from the first probe. Again, if
>> the driver doesn't also call the function.
>> There's the corner case where the device comes up at < max, goes away,
>> then comes back faster, but still < max. There will be two prints, but
>> they won't be true duplicates -- each one will indicate different BW.
> 
> This is fine.
> 
>>
>> Alex
>>
>>>>>> +
>>>>>>        if (pci_probe_reset_function(dev) == 0)
>>>>>>            dev->reset_fn = 1;
>>>>>>    }
>>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>>>>> --- a/include/linux/pci.h
>>>>>> +++ b/include/linux/pci.h
>>>>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>>>>    u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev
>>>>>> **limiting_dev,
>>>>>>                     enum pci_bus_speed *speed,
>>>>>>                     enum pcie_link_width *width);
>>>>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>>>>    void pcie_print_link_status(struct pci_dev *dev);
>>>>>>    int pcie_flr(struct pci_dev *dev);
>>>>>>    int __pci_reset_function_locked(struct pci_dev *dev);
>>>>>
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-08-06 18:39                   ` Alex_Gagniuc
@ 2018-08-06 19:46                     ` Bjorn Helgaas
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
  0 siblings, 1 reply; 47+ messages in thread
From: Bjorn Helgaas @ 2018-08-06 19:46 UTC (permalink / raw)
  To: Alex_Gagniuc
  Cc: Tal Gilboa, mr.nuke.me, jakub.kicinski, linux-pci, Keith Busch,
	Austin.Bolen, Shyam.Iyer, Kirsher, Jeffrey T, ariel.elior,
	michael.chan, ganeshgr, tariqt, Dave Airlie, Alex Deucher,
	Marciniszyn, Mike, Linux Kernel Mailing List

On Mon, Aug 6, 2018 at 1:39 PM <Alex_Gagniuc@dellteam.com> wrote:
>
> On 08/05/2018 02:06 AM, Tal Gilboa wrote:
> > On 7/31/2018 6:10 PM, Alex G. wrote:
> >> On 07/31/2018 01:40 AM, Tal Gilboa wrote:
> >> [snip]
> >>>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct
> >>>>>> pci_dev *dev)
> >>>>>>        /* Advanced Error Reporting */
> >>>>>>        pci_aer_init(dev);
> >>>>>> +    /* Check link and detect downtrain errors */
> >>>>>> +    pcie_check_upstream_link(dev);
> >>>>
> >>>> This is called for every PCIe device right? Won't there be a
> >>>> duplicated print in case a device loads with lower PCIe bandwidth
> >>>> than needed?
> >>>
> >>> Alex, can you comment on this please?
> >>
> >> Of course I can.
> >>
> >> There's one print at probe() time, which happens if bandwidth < max. I
> >> would think that's fine. There is a way to duplicate it, and that is if
> >> the driver also calls print_link_status(). A few driver maintainers who
> >> call it have indicated they'd be fine with removing it from the driver,
> >> and leaving it in the core PCI.
> >
> > We would be fine with that as well. Please include the removal in your
> > patches.
>
> What's the proper procedure? Do I wait for confirmation from Bjorn
> before knocking on maintainers' doors, or do I William Wallace into
> their trees and demand they merge the removal (pending Bjorn's approval
> on the other side)?

Post a v4 series that does the PCI core stuff as well as removing the
driver code.

Bjorn

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions
  2018-08-06 19:46                     ` Bjorn Helgaas
@ 2018-08-06 23:25                       ` Alexandru Gagniuc
  2018-08-06 23:25                         ` [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status() Alexandru Gagniuc
                                           ` (9 more replies)
  0 siblings, 10 replies; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive, nor
straightforward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/pci.c   | 22 ++++++++++++++++++----
 drivers/pci/probe.c | 21 +++++++++++++++++++++
 include/linux/pci.h |  1 +
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 316496e99da9..414ad7b3abdb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Be verbose -- print info even when enough bandwidth is available.
  *
  * Report the available bandwidth at the device.  If this is less than the
  * device is capable of, report the device's maximum possible bandwidth and
  * the upstream link that limits its performance to less than that.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
 	enum pcie_link_width width, width_cap;
 	enum pci_bus_speed speed, speed_cap;
@@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
 	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
 	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-	if (bw_avail >= bw_cap)
+	if (bw_avail >= bw_cap && verbose)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
-	else
+	else if (bw_avail < bw_cap)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 			 bw_avail / 1000, bw_avail % 1000,
 			 PCIE_SPEED2STR(speed), width,
@@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.  If this is less than the
+ * device is capable of, report the device's maximum possible bandwidth and
+ * the upstream link that limits its performance to less than that.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	__pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 611adcd9c169..1c8c26dd2cb2 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices. */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe devices share the same link/status. */
+	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+		return;
+
+	__pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	/* Check link and detect downtrain errors */
+	pcie_check_upstream_link(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c133ccfa002e..d212de231259 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1087,6 +1087,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
 u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
 			     enum pci_bus_speed *speed,
 			     enum pcie_link_width *width);
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
 void pcie_print_link_status(struct pci_dev *dev);
 int pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-06 23:25                         ` [PATCH v6 3/9] bnxt_en: " Alexandru Gagniuc
                                           ` (8 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 57348f2b49a3..3eadd6201dff 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14100,7 +14100,6 @@ static int bnx2x_init_one(struct pci_dev *pdev,
 	       board_info[ent->driver_data].name,
 	       (CHIP_REV(bp) >> 12) + 'A', (CHIP_METAL(bp) >> 4),
 	       dev->base_addr, bp->pdev->irq, dev->dev_addr);
-	pcie_print_link_status(bp->pdev);
 
 	bnx2x_register_phc(bp);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 3/9] bnxt_en: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
  2018-08-06 23:25                         ` [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status() Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-06 23:25                         ` [PATCH v6 4/9] cxgb4: " Alexandru Gagniuc
                                           ` (7 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4394c1162be4..4b3928c89076 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8909,7 +8909,6 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	netdev_info(dev, "%s found at mem %lx, node addr %pM\n",
 		    board_info[ent->driver_data].name,
 		    (long)pci_resource_start(pdev, 0), dev->dev_addr);
-	pcie_print_link_status(pdev);
 
 	return 0;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 4/9] cxgb4: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
  2018-08-06 23:25                         ` [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status() Alexandru Gagniuc
  2018-08-06 23:25                         ` [PATCH v6 3/9] bnxt_en: " Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-06 23:25                         ` [PATCH v6 5/9] fm10k: " Alexandru Gagniuc
                                           ` (6 subsequent siblings)
  9 siblings, 0 replies; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index a8926e97935e..049958898c17 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5726,9 +5726,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 			free_msix_info(adapter);
 	}
 
-	/* check for PCI Express bandwidth capabiltites */
-	pcie_print_link_status(pdev);
-
 	err = init_rss(adapter);
 	if (err)
 		goto out_free_dev;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 5/9] fm10k: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
                                           ` (2 preceding siblings ...)
  2018-08-06 23:25                         ` [PATCH v6 4/9] cxgb4: " Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-07 17:52                           ` Jeff Kirsher
  2018-08-06 23:25                         ` [PATCH v6 6/9] ixgbe: " Alexandru Gagniuc
                                           ` (5 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 15071e4adb98..079fd3c884ea 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2224,9 +2224,6 @@ static int fm10k_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* kick off service timer now, even when interface is down */
 	mod_timer(&interface->service_timer, (HZ * 2) + jiffies);
 
-	/* print warning for non-optimal configurations */
-	pcie_print_link_status(interface->pdev);
-
 	/* report MAC address for logging */
 	dev_info(&pdev->dev, "%pM\n", netdev->dev_addr);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 6/9] ixgbe: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
                                           ` (3 preceding siblings ...)
  2018-08-06 23:25                         ` [PATCH v6 5/9] fm10k: " Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-07 17:51                           ` Jeff Kirsher
  2018-08-06 23:25                         ` [PATCH v6 7/9] net/mlx4: " Alexandru Gagniuc
                                           ` (4 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 26 -------------------
 1 file changed, 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 62e57b05a0ae..7ecdc6c03a66 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -241,28 +241,6 @@ static inline bool ixgbe_pcie_from_parent(struct ixgbe_hw *hw)
 	}
 }
 
-static void ixgbe_check_minimum_link(struct ixgbe_adapter *adapter,
-				     int expected_gts)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-	struct pci_dev *pdev;
-
-	/* Some devices are not connected over PCIe and thus do not negotiate
-	 * speed. These devices do not have valid bus info, and thus any report
-	 * we generate may not be correct.
-	 */
-	if (hw->bus.type == ixgbe_bus_type_internal)
-		return;
-
-	/* determine whether to use the parent device */
-	if (ixgbe_pcie_from_parent(&adapter->hw))
-		pdev = adapter->pdev->bus->parent->self;
-	else
-		pdev = adapter->pdev;
-
-	pcie_print_link_status(pdev);
-}
-
 static void ixgbe_service_event_schedule(struct ixgbe_adapter *adapter)
 {
 	if (!test_bit(__IXGBE_DOWN, &adapter->state) &&
@@ -10585,10 +10563,6 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		break;
 	}
 
-	/* don't check link if we failed to enumerate functions */
-	if (expected_gts > 0)
-		ixgbe_check_minimum_link(adapter, expected_gts);
-
 	err = ixgbe_read_pba_string_generic(hw, part_str, sizeof(part_str));
 	if (err)
 		strlcpy(part_str, "Unknown", sizeof(part_str));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 7/9] net/mlx4: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
                                           ` (4 preceding siblings ...)
  2018-08-06 23:25                         ` [PATCH v6 6/9] ixgbe: " Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-08  6:10                           ` Leon Romanovsky
  2018-08-06 23:25                         ` [PATCH v6 8/9] net/mlx5: " Alexandru Gagniuc
                                           ` (3 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 872014702fc1..da4d54195853 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3398,13 +3398,6 @@ static int mlx4_load_one(struct pci_dev *pdev, int pci_dev_data,
 		}
 	}
 
-	/* check if the device is functioning at its maximum possible speed.
-	 * No return code for this call, just warn the user in case of PCI
-	 * express device capabilities are under-satisfied by the bus.
-	 */
-	if (!mlx4_is_slave(dev))
-		pcie_print_link_status(dev->persist->pdev);
-
 	/* In master functions, the communication channel must be initialized
 	 * after obtaining its address from fw */
 	if (mlx4_is_master(dev)) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
                                           ` (5 preceding siblings ...)
  2018-08-06 23:25                         ` [PATCH v6 7/9] net/mlx4: " Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-08  6:08                           ` Leon Romanovsky
  2018-08-06 23:25                         ` [PATCH v6 9/9] nfp: " Alexandru Gagniuc
                                           ` (2 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 615005e63819..e10f9c2bea3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1045,10 +1045,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	dev_info(&pdev->dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
 		 fw_rev_min(dev), fw_rev_sub(dev));
 
-	/* Only PFs hold the relevant PCIe information for this query */
-	if (mlx5_core_is_pf(dev))
-		pcie_print_link_status(dev->pdev);
-
 	/* on load removing any previous indication of internal error, device is
 	 * up
 	 */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v6 9/9] nfp: Do not call pcie_print_link_status()
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
                                           ` (6 preceding siblings ...)
  2018-08-06 23:25                         ` [PATCH v6 8/9] net/mlx5: " Alexandru Gagniuc
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  2018-08-07 19:44                         ` [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions David Miller
  2018-08-07 21:41                         ` Bjorn Helgaas
  9 siblings, 0 replies; 47+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 749655c329b2..0324f99bd1a7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -1324,7 +1324,6 @@ struct nfp_cpp *nfp_cpp_from_nfp6000_pcie(struct pci_dev *pdev)
 	/*  Finished with card initialization. */
 	dev_info(&pdev->dev,
 		 "Netronome Flow Processor NFP4000/NFP6000 PCIe Card Probe\n");
-	pcie_print_link_status(pdev);
 
 	nfp = kzalloc(sizeof(*nfp), GFP_KERNEL);
 	if (!nfp) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 6/9] ixgbe: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [PATCH v6 6/9] ixgbe: " Alexandru Gagniuc
@ 2018-08-07 17:51                           ` Jeff Kirsher
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Kirsher @ 2018-08-07 17:51 UTC (permalink / raw)
  To: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

On Mon, 2018-08-06 at 18:25 -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 26 ---------------
> ----
>  1 file changed, 26 deletions(-)

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 5/9] fm10k: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [PATCH v6 5/9] fm10k: " Alexandru Gagniuc
@ 2018-08-07 17:52                           ` Jeff Kirsher
  0 siblings, 0 replies; 47+ messages in thread
From: Jeff Kirsher @ 2018-08-07 17:52 UTC (permalink / raw)
  To: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

On Mon, 2018-08-06 at 18:25 -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 3 ---
>  1 file changed, 3 deletions(-)

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
                                           ` (7 preceding siblings ...)
  2018-08-06 23:25                         ` [PATCH v6 9/9] nfp: " Alexandru Gagniuc
@ 2018-08-07 19:44                         ` David Miller
  2018-08-07 21:41                         ` Bjorn Helgaas
  9 siblings, 0 replies; 47+ messages in thread
From: David Miller @ 2018-08-07 19:44 UTC (permalink / raw)
  To: mr.nuke.me
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, ariel.elior, everest-linux-l2,
	michael.chan, ganeshgr, jeffrey.t.kirsher, tariqt, saeedm, leon,
	dirk.vandermerwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

From: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Date: Mon,  6 Aug 2018 18:25:35 -0500

> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>

Feel free to merge this entire series via the PCI tree.

For the series:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions
  2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
                                           ` (8 preceding siblings ...)
  2018-08-07 19:44                         ` [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions David Miller
@ 2018-08-07 21:41                         ` Bjorn Helgaas
  9 siblings, 0 replies; 47+ messages in thread
From: Bjorn Helgaas @ 2018-08-07 21:41 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

On Mon, Aug 06, 2018 at 06:25:35PM -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

After this series, there are no callers of pcie_print_link_status(),
which means we *only* print something if a device is capable of more
bandwidth than the fabric can deliver.

ISTR some desire to have this information for NICs even if the device
isn't limited, so I'm just double-checking to make sure the driver
guys are OK with this change.

There are no callers of __pcie_print_link_status() outside the PCI
core, so I would move the declaration from include/linux/pci.h to
drivers/pci/pci.h.

If we agree that we *never* need to print anything unless a device is
constrained by the fabric, I would get rid of the "verbose" flag and
keep everything in pcie_print_link_status().

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/pci/pci.c   | 22 ++++++++++++++++++----
>  drivers/pci/probe.c | 21 +++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  3 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 316496e99da9..414ad7b3abdb 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>  }
>  
>  /**
> - * pcie_print_link_status - Report the PCI device's link speed and width
> + * __pcie_print_link_status - Report the PCI device's link speed and width
>   * @dev: PCI device to query
> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>   *
>   * Report the available bandwidth at the device.  If this is less than the
>   * device is capable of, report the device's maximum possible bandwidth and
>   * the upstream link that limits its performance to less than that.
>   */
> -void pcie_print_link_status(struct pci_dev *dev)
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>  {
>  	enum pcie_link_width width, width_cap;
>  	enum pci_bus_speed speed, speed_cap;
> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>  	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>  	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
> -	if (bw_avail >= bw_cap)
> +	if (bw_avail >= bw_cap && verbose)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
> -	else
> +	else if (bw_avail < bw_cap)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>  			 bw_avail / 1000, bw_avail % 1000,
>  			 PCIE_SPEED2STR(speed), width,
> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
>  }
> +
> +/**
> + * pcie_print_link_status - Report the PCI device's link speed and width
> + * @dev: PCI device to query
> + *
> + * Report the available bandwidth at the device.  If this is less than the
> + * device is capable of, report the device's maximum possible bandwidth and
> + * the upstream link that limits its performance to less than that.
> + */
> +void pcie_print_link_status(struct pci_dev *dev)
> +{
> +	__pcie_print_link_status(dev, true);
> +}
>  EXPORT_SYMBOL(pcie_print_link_status);
>  
>  /**
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 611adcd9c169..1c8c26dd2cb2 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	/* Look from the device up to avoid downstream ports with no devices. */
> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +		return;
> +
> +	/* Multi-function PCIe share the same link/status. */
> +	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
> +		return;
> +
> +	__pcie_print_link_status(dev, false);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>  	/* Enhanced Allocation */
> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	/* Advanced Error Reporting */
>  	pci_aer_init(dev);
>  
> +	/* Check link and detect downtrain errors */
> +	pcie_check_upstream_link(dev);
> +
>  	if (pci_probe_reset_function(dev) == 0)
>  		dev->reset_fn = 1;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c133ccfa002e..d212de231259 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1087,6 +1087,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>  u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
>  			     enum pci_bus_speed *speed,
>  			     enum pcie_link_width *width);
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>  void pcie_print_link_status(struct pci_dev *dev);
>  int pcie_flr(struct pci_dev *dev);
>  int __pci_reset_function_locked(struct pci_dev *dev);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread
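
A rough sketch of the simplification suggested above: drop the verbose
flag and print only when the fabric constrains the device. This is not
code from the series; the body merely reuses the helpers and the
message format from the patch quoted above.

/* Hypothetical sketch, not part of the series: stay silent unless the
 * device is constrained by the fabric. */
void pcie_print_link_status(struct pci_dev *dev)
{
	enum pcie_link_width width, width_cap;
	enum pci_bus_speed speed, speed_cap;
	struct pci_dev *limiting_dev = NULL;
	u32 bw_avail, bw_cap;

	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed,
					    &width);

	if (bw_avail >= bw_cap)
		return;	/* link is not the bottleneck; nothing to report */

	pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
		 bw_avail / 1000, bw_avail % 1000,
		 PCIE_SPEED2STR(speed), width,
		 limiting_dev ? pci_name(limiting_dev) : "<unknown>",
		 bw_cap / 1000, bw_cap % 1000,
		 PCIE_SPEED2STR(speed_cap), width_cap);
}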

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [PATCH v6 8/9] net/mlx5: " Alexandru Gagniuc
@ 2018-08-08  6:08                           ` Leon Romanovsky
  2018-08-08 14:23                             ` Tal Gilboa
  0 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2018-08-08  6:08 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>  1 file changed, 4 deletions(-)
>

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 7/9] net/mlx4: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [PATCH v6 7/9] net/mlx4: " Alexandru Gagniuc
@ 2018-08-08  6:10                           ` Leon Romanovsky
  0 siblings, 0 replies; 47+ messages in thread
From: Leon Romanovsky @ 2018-08-08  6:10 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Mon, Aug 06, 2018 at 06:25:41PM -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
>  1 file changed, 7 deletions(-)
>

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08  6:08                           ` Leon Romanovsky
@ 2018-08-08 14:23                             ` Tal Gilboa
  2018-08-08 15:41                               ` Leon Romanovsky
  0 siblings, 1 reply; 47+ messages in thread
From: Tal Gilboa @ 2018-08-08 14:23 UTC (permalink / raw)
  To: Leon Romanovsky, Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers,
	Saeed Mahameed

On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>
>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
> 
> Thanks,
> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> 

Alex,
I loaded the mlx5 driver with and without this series. The report in
dmesg is now missing. As I understand it, the status should be reported
at least once, even if everything is in order. We need this
functionality to stay.

net-next (dmesg output for 07:00.0):
[270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
[270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe 
bandwidth (8 GT/s x8 link)
[270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max 
uc(1024) max mc(16384)
[270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, 
Cable plugged

net-next + patches (dmesg output for 07:00.0):
[  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
[  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max 
uc(1024) max mc(16384)
[  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, 
Cable plugged



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 14:23                             ` Tal Gilboa
@ 2018-08-08 15:41                               ` Leon Romanovsky
  2018-08-08 15:56                                 ` Tal Gilboa
  0 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2018-08-08 15:41 UTC (permalink / raw)
  To: Tal Gilboa
  Cc: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski,
	keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > >
> > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > ---
> > >   drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > >   1 file changed, 4 deletions(-)
> > >
> >
> > Thanks,
> > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> >
>
> Alex,
> I loaded mlx5 driver with and without these series. The report in dmesg is
> now missing. From what I understood, the status should be reported at least
> once, even if everything is in order.

That is not what this series does: it removes the print completely if
the fabric can deliver more bandwidth than the card is capable of.

> We need this functionality to stay.

I'm not sure you need this information in the driver's dmesg output;
it would more likely belong in something globally visible and
accessible per PCI device.

>
> net-next (dmesg output for 07:00.0):
> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe bandwidth
> (8 GT/s x8 link)
> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> uc(1024) max mc(16384)
> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
> plugged
>
> net-next + patches (dmesg output for 07:00.0):
> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> uc(1024) max mc(16384)
> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
> plugged
>
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 15:41                               ` Leon Romanovsky
@ 2018-08-08 15:56                                 ` Tal Gilboa
  2018-08-08 16:33                                   ` Alex G.
  0 siblings, 1 reply; 47+ messages in thread
From: Tal Gilboa @ 2018-08-08 15:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski,
	keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
>> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
>>> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>>>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>>>
>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>> ---
>>>>    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>>>    1 file changed, 4 deletions(-)
>>>>
>>>
>>> Thanks,
>>> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>
>>
>> Alex,
>> I loaded mlx5 driver with and without these series. The report in dmesg is
>> now missing. From what I understood, the status should be reported at least
>> once, even if everything is in order.
> 
> It is not what this series is doing and it removes prints completely if
> fabric can deliver more than card is capable.
> 
>> We need this functionality to stay.
> 
> I'm not sure that you need this information in driver's dmesg output,
> but most probably something globally visible and accessible per-pci
> device.

Currently we have users who look for it. If we remove the dmesg print,
we need this to be reported elsewhere. Adding it to sysfs, for example,
would be a valid solution for our case.

> 
>>
>> net-next (dmesg output for 07:00.0):
>> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe bandwidth
>> (8 GT/s x8 link)
>> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>> uc(1024) max mc(16384)
>> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
>> plugged
>>
>> net-next + patches (dmesg output for 07:00.0):
>> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>> uc(1024) max mc(16384)
>> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
>> plugged
>>
>>

^ permalink raw reply	[flat|nested] 47+ messages in thread
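
A sketch of the sysfs direction mentioned above: sysfs already exposes
current_link_speed/width and max_link_speed/width per device, but not
the available-vs-capable bandwidth computed by walking the fabric. A
hypothetical attribute for that (the name, format, and placement are
assumptions, not an existing interface) could sit in
drivers/pci/pci-sysfs.c, where the bandwidth helpers are visible:

/* Hypothetical attribute: available vs. capable PCIe bandwidth. */
static ssize_t link_bandwidth_show(struct device *dev,
				   struct device_attribute *attr, char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	enum pci_bus_speed speed, speed_cap;
	enum pcie_link_width width, width_cap;
	u32 bw_avail, bw_cap;

	bw_cap = pcie_bandwidth_capable(pdev, &speed_cap, &width_cap);
	bw_avail = pcie_bandwidth_available(pdev, NULL, &speed, &width);

	return sprintf(buf, "%u.%03u of %u.%03u Gb/s\n",
		       bw_avail / 1000, bw_avail % 1000,
		       bw_cap / 1000, bw_cap % 1000);
}
static DEVICE_ATTR_RO(link_bandwidth);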

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 15:56                                 ` Tal Gilboa
@ 2018-08-08 16:33                                   ` Alex G.
  2018-08-08 17:27                                     ` Leon Romanovsky
  0 siblings, 1 reply; 47+ messages in thread
From: Alex G. @ 2018-08-08 16:33 UTC (permalink / raw)
  To: Tal Gilboa, Leon Romanovsky
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers



On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
>> On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
>>> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
>>>> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>>>>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>>>>
>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>>> ---
>>>>>    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>>>>    1 file changed, 4 deletions(-)
>>>>>
>>>>
>>>> Thanks,
>>>> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>>
>>>
>>> Alex,
>>> I loaded mlx5 driver with and without these series. The report in 
>>> dmesg is
>>> now missing. From what I understood, the status should be reported at 
>>> least
>>> once, even if everything is in order.
>>
>> It is not what this series is doing and it removes prints completely if
>> fabric can deliver more than card is capable.
>>
>>> We need this functionality to stay.
>>
>> I'm not sure that you need this information in driver's dmesg output,
>> but most probably something globally visible and accessible per-pci
>> device.
> 
> Currently we have users that look for it. If we remove the dmesg print 
> we need this to be reported elsewhere. Adding it to sysfs for example 
> should be a valid solution for our case.

I think a stop-gap measure is to leave the pcie_print_link_status()
call in drivers that really need it for whatever reason. Implementing
reliable reporting through sysfs might take some tinkering, and I don't
think that's a sufficient reason to block the heart of this series --
being able to detect bottlenecks and link downtraining.

Alex

>>
>>>
>>> net-next (dmesg output for 07:00.0):
>>> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>>> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe 
>>> bandwidth
>>> (8 GT/s x8 link)
>>> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>>> uc(1024) max mc(16384)
>>> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, 
>>> Cable
>>> plugged
>>>
>>> net-next + patches (dmesg output for 07:00.0):
>>> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>>> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>>> uc(1024) max mc(16384)
>>> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, 
>>> Cable
>>> plugged
>>>
>>>

^ permalink raw reply	[flat|nested] 47+ messages in thread
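
Concretely, the stop-gap described above amounts to keeping (or
restoring) the hunk that patch 8/9 removes from mlx5_load_one(), e.g.:

	/* Stop-gap sketch: keep the always-verbose report in the mlx5
	 * probe path, so PF users still see the bandwidth message even
	 * on a healthy link. */
	if (mlx5_core_is_pf(dev))
		pcie_print_link_status(dev->pdev);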

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 16:33                                   ` Alex G.
@ 2018-08-08 17:27                                     ` Leon Romanovsky
  2018-08-09 14:02                                       ` Bjorn Helgaas
  0 siblings, 1 reply; 47+ messages in thread
From: Leon Romanovsky @ 2018-08-08 17:27 UTC (permalink / raw)
  To: Alex G.
  Cc: Tal Gilboa, linux-pci, bhelgaas, jakub.kicinski, keith.busch,
	alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Wed, Aug 08, 2018 at 11:33:51AM -0500, Alex G. wrote:
>
>
> On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> > On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> > > On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> > > > On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > > > > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > > > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > > > > >
> > > > > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > > > > ---
> > > > > >    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > > > > >    1 file changed, 4 deletions(-)
> > > > > >
> > > > >
> > > > > Thanks,
> > > > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > > > >
> > > >
> > > > Alex,
> > > > I loaded mlx5 driver with and without these series. The report
> > > > in dmesg is
> > > > now missing. From what I understood, the status should be
> > > > reported at least
> > > > once, even if everything is in order.
> > >
> > > It is not what this series is doing and it removes prints completely if
> > > fabric can deliver more than card is capable.
> > >
> > > > We need this functionality to stay.
> > >
> > > I'm not sure that you need this information in driver's dmesg output,
> > > but most probably something globally visible and accessible per-pci
> > > device.
> >
> > Currently we have users that look for it. If we remove the dmesg print
> > we need this to be reported elsewhere. Adding it to sysfs for example
> > should be a valid solution for our case.
>
> I think a stop-gap measure is to leave the pcie_print_link_status() call in
> drivers that really need it for whatever reason. Implementing a reliable
> reporting through sysfs might take some tinkering, and I don't think it's a
> sufficient reason to block the heart of this series -- being able to detect
> bottlenecks and link downtraining.

IMHO, you made the right change, and it is better to replace this
print with a more generic solution now, while you are at it, rather
than leave leftovers.

Thanks

>
> Alex
>
> > >
> > > >
> > > > net-next (dmesg output for 07:00.0):
> > > > [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> > > > [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available
> > > > PCIe bandwidth
> > > > (8 GT/s x8 link)
> > > > [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> > > > uc(1024) max mc(16384)
> > > > [270499.182358] mlx5_core 0000:07:00.0: Port module event:
> > > > module 0, Cable
> > > > plugged
> > > >
> > > > net-next + patches (dmesg output for 07:00.0):
> > > > [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> > > > [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> > > > uc(1024) max mc(16384)
> > > > [  332.616271] mlx5_core 0000:07:00.0: Port module event: module
> > > > 0, Cable
> > > > plugged
> > > >
> > > >

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 17:27                                     ` Leon Romanovsky
@ 2018-08-09 14:02                                       ` Bjorn Helgaas
  0 siblings, 0 replies; 47+ messages in thread
From: Bjorn Helgaas @ 2018-08-09 14:02 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Alex G.,
	Tal Gilboa, linux-pci, bhelgaas, jakub.kicinski, keith.busch,
	alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Wed, Aug 08, 2018 at 08:27:36PM +0300, Leon Romanovsky wrote:
> On Wed, Aug 08, 2018 at 11:33:51AM -0500, Alex G. wrote:
> >
> >
> > On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> > > On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> > > > On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> > > > > On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > > > > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > > > > > >
> > > > > > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > > > > > ---
> > > > > > >    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > > > > > >    1 file changed, 4 deletions(-)
> > > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > > > > >
> > > > >
> > > > > Alex,
> > > > > I loaded mlx5 driver with and without these series. The report
> > > > > in dmesg is
> > > > > now missing. From what I understood, the status should be
> > > > > reported at least
> > > > > once, even if everything is in order.
> > > >
> > > > It is not what this series is doing and it removes prints completely if
> > > > fabric can deliver more than card is capable.
> > > >
> > > > > We need this functionality to stay.
> > > >
> > > > I'm not sure that you need this information in driver's dmesg output,
> > > > but most probably something globally visible and accessible per-pci
> > > > device.
> > >
> > > Currently we have users that look for it. If we remove the dmesg print
> > > we need this to be reported elsewhere. Adding it to sysfs for example
> > > should be a valid solution for our case.
> >
> > I think a stop-gap measure is to leave the pcie_print_link_status() call in
> > drivers that really need it for whatever reason. Implementing a reliable
> > reporting through sysfs might take some tinkering, and I don't think it's a
> > sufficient reason to block the heart of this series -- being able to detect
> > bottlenecks and link downtraining.
> 
> IMHO, you did right change and it is better to replace this print to some
> more generic solution now while you are doing it and don't leave leftovers.

I'd like to make forward progress on this, so I propose we merge only
the PCI core change (patch 1/9) and drop the individual driver
changes.  That would mean:

  - We'll get a message from every NIC driver that calls
    pcie_print_link_status() as before.

  - We'll get a new message from the core for every downtrained link.

  - If a link leading to the NIC is downtrained, there will be
    duplicate messages.  Maybe that's overkill but it's not terrible.

I provisionally put the patch below on my pci/enumeration branch.
Objections?


commit c870cc8cbc4d79014f3daa74d1e412f32e42bf1b
Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Date:   Mon Aug 6 18:25:35 2018 -0500

    PCI: Check for PCIe Link downtraining
    
    When both ends of a PCIe Link are capable of a higher bandwidth than is
    currently in use, the Link is said to be "downtrained".  A downtrained Link
    may indicate hardware or configuration problems in the system, but it's
    hard to identify such Links from userspace.
    
    Refactor pcie_print_link_status() so it continues to always print PCIe
    bandwidth information, as several NIC drivers desire.
    
    Add a new internal __pcie_print_link_status() to emit a message only when a
    device's bandwidth is constrained by the fabric and call it from the PCI
    core for all devices, which identifies all downtrained Links.  It also
    emits messages for a few cases that are technically not downtrained, such
    as a x4 device in an open-ended x1 slot.
    
    Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
    [bhelgaas: changelog, move __pcie_print_link_status() declaration to
    drivers/pci/, rename pcie_check_upstream_link() to
    pcie_report_downtraining()]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 97acba712e4e..a84d341504a5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5264,14 +5264,16 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Print info even when enough bandwidth is available
  *
- * Report the available bandwidth at the device.  If this is less than the
- * device is capable of, report the device's maximum possible bandwidth and
- * the upstream link that limits its performance to less than that.
+ * If the available bandwidth at the device is less than the device is
+ * capable of, report the device's maximum possible bandwidth and the
+ * upstream link that limits its performance.  If @verbose, always print
+ * the available bandwidth, even if the device isn't constrained.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
 	enum pcie_link_width width, width_cap;
 	enum pci_bus_speed speed, speed_cap;
@@ -5281,11 +5283,11 @@ void pcie_print_link_status(struct pci_dev *dev)
 	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
 	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-	if (bw_avail >= bw_cap)
+	if (bw_avail >= bw_cap && verbose)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
-	else
+	else if (bw_avail < bw_cap)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 			 bw_avail / 1000, bw_avail % 1000,
 			 PCIE_SPEED2STR(speed), width,
@@ -5293,6 +5295,17 @@ void pcie_print_link_status(struct pci_dev *dev)
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	__pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 70808c168fb9..ce880dab5bc8 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -263,6 +263,7 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
 enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
 u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 			   enum pcie_link_width *width);
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
 
 /* Single Root I/O Virtualization */
 struct pci_sriov {
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bc147c586643..387fc8ac54ec 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2231,6 +2231,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_report_downtraining(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe devices share the same link/status */
+	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+		return;
+
+	/* Print link status only if the device is constrained by the fabric */
+	__pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2266,6 +2285,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	pcie_report_downtraining(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }

^ permalink raw reply related	[flat|nested] 47+ messages in thread
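
For completeness, the userspace side of the problem: a link that
trained below the device's own capability can be spotted from the
current_link_* and max_link_* attributes in sysfs. The sketch below is
illustrative only and does just that naive per-device comparison;
unlike pcie_bandwidth_available(), it does not walk up the fabric to
find the limiting link, which is exactly what the in-core check above
adds.

/* Naive userspace sketch: compare a PCIe function's negotiated link
 * against its capability via sysfs.  Does not walk the upstream
 * fabric.  Usage: ./downtrain 0000:07:00.0 */
#include <stdio.h>
#include <string.h>

static int read_attr(const char *bdf, const char *attr,
		     char *buf, size_t len)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s",
		 bdf, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	buf[strcspn(buf, "\n")] = '\0';	/* strip trailing newline */
	fclose(f);
	return 0;
}

int main(int argc, char **argv)
{
	char cs[64], ms[64], cw[64], mw[64];

	if (argc != 2) {
		fprintf(stderr, "usage: %s <domain:bus:dev.fn>\n", argv[0]);
		return 1;
	}
	if (read_attr(argv[1], "current_link_speed", cs, sizeof(cs)) ||
	    read_attr(argv[1], "max_link_speed", ms, sizeof(ms)) ||
	    read_attr(argv[1], "current_link_width", cw, sizeof(cw)) ||
	    read_attr(argv[1], "max_link_width", mw, sizeof(mw)))
		return 1;	/* not PCIe, or attributes unavailable */

	if (strcmp(cs, ms) || strcmp(cw, mw))
		printf("%s: downtrained: %s x%s (capable of %s x%s)\n",
		       argv[1], cs, cw, ms, mw);
	return 0;
}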

end of thread

Thread overview: 47+ messages
2018-06-04 15:55 [PATCH v3] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
2018-06-05 12:27 ` Andy Shevchenko
2018-06-05 13:04   ` Andy Shevchenko
2018-07-16 21:17 ` Bjorn Helgaas
2018-07-16 22:28   ` Alex_Gagniuc
2018-07-18 21:53     ` Bjorn Helgaas
2018-07-19 15:46       ` Alex G.
2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
2018-07-25  1:24         ` kbuild test robot
2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
2018-07-23 21:01         ` Jakub Kicinski
2018-07-23 21:52           ` Tal Gilboa
2018-07-23 22:14             ` Jakub Kicinski
2018-07-23 23:59               ` Alex G.
2018-07-24 13:39                 ` Tal Gilboa
2018-07-30 23:26                   ` Alex_Gagniuc
2018-07-31  6:40             ` Tal Gilboa
2018-07-31 15:10               ` Alex G.
2018-08-05  7:05                 ` Tal Gilboa
2018-08-06 18:39                   ` Alex_Gagniuc
2018-08-06 19:46                     ` Bjorn Helgaas
2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status() Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 3/9] bnxt_en: " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 4/9] cxgb4: " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 5/9] fm10k: " Alexandru Gagniuc
2018-08-07 17:52                           ` Jeff Kirsher
2018-08-06 23:25                         ` [PATCH v6 6/9] ixgbe: " Alexandru Gagniuc
2018-08-07 17:51                           ` Jeff Kirsher
2018-08-06 23:25                         ` [PATCH v6 7/9] net/mlx4: " Alexandru Gagniuc
2018-08-08  6:10                           ` Leon Romanovsky
2018-08-06 23:25                         ` [PATCH v6 8/9] net/mlx5: " Alexandru Gagniuc
2018-08-08  6:08                           ` Leon Romanovsky
2018-08-08 14:23                             ` Tal Gilboa
2018-08-08 15:41                               ` Leon Romanovsky
2018-08-08 15:56                                 ` Tal Gilboa
2018-08-08 16:33                                   ` Alex G.
2018-08-08 17:27                                     ` Leon Romanovsky
2018-08-09 14:02                                       ` Bjorn Helgaas
2018-08-06 23:25                         ` [PATCH v6 9/9] nfp: " Alexandru Gagniuc
2018-08-07 19:44                         ` [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions David Miller
2018-08-07 21:41                         ` Bjorn Helgaas
2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
2018-07-19 15:49     ` Alex G.
2018-07-23  5:21       ` Tal Gilboa
2018-07-23 17:01         ` Alex G.
2018-07-23 21:35           ` Tal Gilboa
