* [PATCH v3] PCI: Check for PCIe downtraining conditions
@ 2018-06-04 15:55 Alexandru Gagniuc
  2018-06-05 12:27 ` Andy Shevchenko
  2018-07-16 21:17 ` Bjorn Helgaas
  0 siblings, 2 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-06-04 15:55 UTC (permalink / raw)
  To: bhelgaas
  Cc: alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	Alexandru Gagniuc, linux-pci, linux-kernel

PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive, nor straigh
forward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---

Changes since v2:
 - Check dev->is_virtfn flag

Changes since v1:
 - Use pcie_print_link_status() instead of reimplementing logic
 
 drivers/pci/probe.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac91b6fd0bcd..a88ec8c25dd5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices. */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe share the same link/status. */
+	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
+		return;
+
+	pcie_print_link_status(dev);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	/* Check link and detect downtrain errors */
+	pcie_check_upstream_link(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-06-04 15:55 [PATCH v3] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
@ 2018-06-05 12:27 ` Andy Shevchenko
  2018-06-05 13:04   ` Andy Shevchenko
  2018-07-16 21:17 ` Bjorn Helgaas
  1 sibling, 1 reply; 73+ messages in thread
From: Andy Shevchenko @ 2018-06-05 12:27 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: Bjorn Helgaas, alex_gagniuc, austin_bolen, shyam_iyer,
	Keith Busch, linux-pci, Linux Kernel Mailing List

On Mon, Jun 4, 2018 at 6:55 PM, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor straigh
> forward.
>
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

Have you seen any of my comments?
For your convenience repeating below.

>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>
> Changes since v2:
>  - Check dev->is_virtfn flag
>
> Changes since v1:
>  - Use pcie_print_link_status() instead of reimplementing logic
>
>  drivers/pci/probe.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac91b6fd0bcd..a88ec8c25dd5 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>         return dev;
>  }
>
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{

> +

This is redundant blank line.

> +       if (!pci_is_pcie(dev))
> +               return;
> +
> +       /* Look from the device up to avoid downstream ports with no devices. */
> +       if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +           (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +               return;

I looked briefly at the use of these calls and perhaps it might make
sense to introduce
pci_is_pcie_type(dev, type) which unifies pci_is_pcie() + pci_pcie_type().
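
A minimal sketch of what that could look like (untested, and
pci_is_pcie_type() is only a name proposed here, not an existing
helper in include/linux/pci.h):

  static inline bool pci_is_pcie_type(struct pci_dev *dev, int type)
  {
  	return pci_is_pcie(dev) && pci_pcie_type(dev) == type;
  }

The guard above would then collapse into three pci_is_pcie_type()
calls with no separate pci_is_pcie() check.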

> +
> +       /* Multi-function PCIe share the same link/status. */

> +       if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)

The one pair of parens is not needed.
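
That is, the condition can simply read:

  if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)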

> +               return;
> +
> +       pcie_print_link_status(dev);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>         /* Enhanced Allocation */
> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>         /* Advanced Error Reporting */
>         pci_aer_init(dev);
>
> +       /* Check link and detect downtrain errors */
> +       pcie_check_upstream_link(dev);
> +
>         if (pci_probe_reset_function(dev) == 0)
>                 dev->reset_fn = 1;
>  }
> --
> 2.14.4
>



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-06-05 12:27 ` Andy Shevchenko
@ 2018-06-05 13:04   ` Andy Shevchenko
  0 siblings, 0 replies; 73+ messages in thread
From: Andy Shevchenko @ 2018-06-05 13:04 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: Bjorn Helgaas, alex_gagniuc, austin_bolen, shyam_iyer,
	Keith Busch, linux-pci, Linux Kernel Mailing List

On Tue, Jun 5, 2018 at 3:27 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Mon, Jun 4, 2018 at 6:55 PM, Alexandru Gagniuc <mr.nuke.me@gmail.com> wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive, nor straigh
>> forward.
>>
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
>
> Have you seen any of my comments?
> For your convenience repeating below.

Ah, found the answer in a pile of emails. OK, I see your point about
helper, though the rest is still applicable here.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-06-04 15:55 [PATCH v3] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
  2018-06-05 12:27 ` Andy Shevchenko
@ 2018-07-16 21:17 ` Bjorn Helgaas
  2018-07-16 22:28     ` Alex_Gagniuc
  2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
  1 sibling, 2 replies; 73+ messages in thread
From: Bjorn Helgaas @ 2018-07-16 21:17 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Tal Gilboa,
	Dave Airlie, Alex Deucher

[+cc maintainers of drivers that already use pcie_print_link_status()
and GPU folks]

On Mon, Jun 04, 2018 at 10:55:21AM -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor straigh
> forward.

s/straigh/straight/
In this context, I think "straightforward" should be closed up
(without the space).

> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

This is an interesting idea.  I have two concerns:

Some drivers already do this on their own, and we probably don't want
duplicate output for those devices.  In most cases (ixgbe and mlx* are
exceptions), the drivers do this unconditionally so we *could* remove
it from the driver if we add it to the core.  The dmesg order would
change, and the message wouldn't be associated with the driver as it
now is.

Also, I think some of the GPU devices might come up at a lower speed,
then download firmware, then reset the device so it comes up at a
higher speed.  I think this patch will make us complain about
the low initial speed, which might confuse users.

So I'm not sure whether it's better to do this in the core for all
devices, or if we should just add it to the high-performance drivers
that really care.

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
> 
> Changes since v2:
>  - Check dev->is_virtfn flag
> 
> Changes since v1:
>  - Use pcie_print_link_status() instead of reimplementing logic
>  
>  drivers/pci/probe.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac91b6fd0bcd..a88ec8c25dd5 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> +
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	/* Look from the device up to avoid downstream ports with no devices. */
> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +		return;

Do we care about Upstream Ports here?  I suspect that ultimately we
only care about the bandwidth to Endpoints, and if an Endpoint is
constrained by a slow link farther up the tree,
pcie_print_link_status() is supposed to identify that slow link.

I would find this test easier to read as

  if (!(type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END))
    return;

But maybe I'm the only one that finds the conjunction of inequalities
hard to read.  No big deal either way.

> +	/* Multi-function PCIe share the same link/status. */
> +	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
> +		return;
> +
> +	pcie_print_link_status(dev);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>  	/* Enhanced Allocation */
> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	/* Advanced Error Reporting */
>  	pci_aer_init(dev);
>  
> +	/* Check link and detect downtrain errors */
> +	pcie_check_upstream_link(dev);
> +
>  	if (pci_probe_reset_function(dev) == 0)
>  		dev->reset_fn = 1;
>  }
> -- 
> 2.14.4
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-16 21:17 ` Bjorn Helgaas
@ 2018-07-16 22:28     ` Alex_Gagniuc
  2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
  1 sibling, 0 replies; 73+ messages in thread
From: Alex_Gagniuc @ 2018-07-16 22:28 UTC (permalink / raw)
  To: helgaas, mr.nuke.me
  Cc: bhelgaas, Austin.Bolen, Shyam.Iyer, keith.busch, linux-pci,
	linux-kernel, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, tariqt, jakub.kicinski, talgi, airlied,
	alexander.deucher

On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
> [+cc maintainers of drivers that already use pcie_print_link_status()
> and GPU folks]

Thanks for finding them!

[snip]
>> identifying this from userspace is neither intuitive, nor straigh
>> forward.
> 
> s/straigh/straight/
> In this context, I think "straightforward" should be closed up
> (without the space).

That's a straightforward edit. Thanks for the feedback!

>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
> 
> This is an interesting idea.  I have two concerns:
> 
> Some drivers already do this on their own, and we probably don't want
> duplicate output for those devices.  In most cases (ixgbe and mlx* are
> exceptions), the drivers do this unconditionally so we *could* remove
> it from the driver if we add it to the core.  The dmesg order would
> change, and the message wouldn't be associated with the driver as it
> now is.

Oh, there are only 8 users of that. Even I could patch up the drivers to 
remove the call, assuming we reach agreement about this change.

> Also, I think some of the GPU devices might come up at a lower speed,
> then download firmware, then reset the device so it comes up at a
> higher speed.  I think this patch will make us complain about
> the low initial speed, which might confuse users.

I spoke to one of the PCIe spec writers. It's allowable for a device to 
downtrain speed or width. It would also be extremely dumb to downtrain 
with the intent to re-train at a higher speed later, but it's possible 
devices do dumb stuff like that. That's why it's an informational 
message, instead of a warning.

Another case: Some devices (lower-end GPUs) use silicon (and marketing) 
that advertises x16, but they're only routed for x8. I'm okay with 
seeing an informational message in this case. In fact, I didn't know 
that my Quadro card for three years is only wired for x8 until I was 
testing this patch.

> So I'm not sure whether it's better to do this in the core for all
> devices, or if we should just add it to the high-performance drivers
> that really care.

You're thinking "do I really need that bandwidth" because I'm using a 
function called "_bandwidth_". The point of the change is very far from 
that: it is to help in system troubleshooting by detecting downtraining 
conditions.
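
(For reference, the manual userspace equivalent is roughly to compare
the negotiated link against the capable one for every device in the
chain, e.g.:

   lspci -vv | grep -E 'LnkCap:|LnkSta:'

and then spot which LnkSta speed/width falls short of its LnkCap --
the non-obvious procedure this check automates.)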

>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[snip]
>> +	/* Look from the device up to avoid downstream ports with no devices. */
>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>> +		return;
> 
> Do we care about Upstream Ports here?  

YES! Switches. e.g. an x16 switch with 4x downstream ports could 
downtrain at 8x and 4x, and we'd never catch it.

> I suspect that ultimately we
> only care about the bandwidth to Endpoints, and if an Endpoint is
> constrained by a slow link farther up the tree,
> pcie_print_link_status() is supposed to identify that slow link.

See above.

> I would find this test easier to read as
> 
>    if (!(type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END))
>      return;
> 
> But maybe I'm the only one that finds the conjunction of inequalities
> hard to read.  No big deal either way.
> 
>> +	/* Multi-function PCIe share the same link/status. */
>> +	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>> +		return;
>> +
>> +	pcie_print_link_status(dev);
>> +}
>> +
>>   static void pci_init_capabilities(struct pci_dev *dev)
>>   {
>>   	/* Enhanced Allocation */
>> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>>   	/* Advanced Error Reporting */
>>   	pci_aer_init(dev);
>>   
>> +	/* Check link and detect downtrain errors */
>> +	pcie_check_upstream_link(dev);
>> +
>>   	if (pci_probe_reset_function(dev) == 0)
>>   		dev->reset_fn = 1;
>>   }
>> -- 
>> 2.14.4
>>
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-16 21:17 ` Bjorn Helgaas
  2018-07-16 22:28     ` Alex_Gagniuc
@ 2018-07-18 13:38   ` Tal Gilboa
  2018-07-19 15:49     ` Alex G.
  1 sibling, 1 reply; 73+ messages in thread
From: Tal Gilboa @ 2018-07-18 13:38 UTC (permalink / raw)
  To: Bjorn Helgaas, Alexandru Gagniuc
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
> [+cc maintainers of drivers that already use pcie_print_link_status()
> and GPU folks]
> 
> On Mon, Jun 04, 2018 at 10:55:21AM -0500, Alexandru Gagniuc wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive, nor straigh
>> forward.
> 
> s/straigh/straight/
> In this context, I think "straightforward" should be closed up
> (without the space).
> 
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
> 
> This is an interesting idea.  I have two concerns:
> 
> Some drivers already do this on their own, and we probably don't want
> duplicate output for those devices.  In most cases (ixgbe and mlx* are
> exceptions), the drivers do this unconditionally so we *could* remove
> it from the driver if we add it to the core.  The dmesg order would
> change, and the message wouldn't be associated with the driver as it
> now is.
> 
> Also, I think some of the GPU devices might come up at a lower speed,
> then download firmware, then reset the device so it comes up at a
> higher speed.  I think this patch will make us complain about
> the low initial speed, which might confuse users.
> 
> So I'm not sure whether it's better to do this in the core for all
> devices, or if we should just add it to the high-performance drivers
> that really care.
> 
>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> ---
>>
>> Changes since v2:
>>   - Check dev->is_virtfn flag
>>
>> Changes since v1:
>>   - Use pcie_print_link_status() instead of reimplementing logic
>>   
>>   drivers/pci/probe.c | 22 ++++++++++++++++++++++
>>   1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index ac91b6fd0bcd..a88ec8c25dd5 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2146,6 +2146,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>>   	return dev;
>>   }
>>   
>> +static void pcie_check_upstream_link(struct pci_dev *dev)
>> +{
>> +
>> +	if (!pci_is_pcie(dev))
>> +		return;
>> +
>> +	/* Look from the device up to avoid downstream ports with no devices. */
>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>> +		return;
> 
> Do we care about Upstream Ports here?  I suspect that ultimately we
> only care about the bandwidth to Endpoints, and if an Endpoint is
> constrained by a slow link farther up the tree,
> pcie_print_link_status() is supposed to identify that slow link.
> 
> I would find this test easier to read as
> 
>    if (!(type == PCI_EXP_TYPE_ENDPOINT || type == PCI_EXP_TYPE_LEG_END))
>      return;
> 
> But maybe I'm the only one that finds the conjunction of inequalities
> hard to read.  No big deal either way.
> 
>> +	/* Multi-function PCIe share the same link/status. */
>> +	if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>> +		return;
>> +
>> +	pcie_print_link_status(dev);
>> +}

Is this function called by default for every PCIe device? What about 
VFs? We make an exception for them on our driver since a VF doesn't have 
access to the needed information in order to provide a meaningful message.

>> +
>>   static void pci_init_capabilities(struct pci_dev *dev)
>>   {
>>   	/* Enhanced Allocation */
>> @@ -2181,6 +2200,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>>   	/* Advanced Error Reporting */
>>   	pci_aer_init(dev);
>>   
>> +	/* Check link and detect downtrain errors */
>> +	pcie_check_upstream_link(dev);
>> +
>>   	if (pci_probe_reset_function(dev) == 0)
>>   		dev->reset_fn = 1;
>>   }
>> -- 
>> 2.14.4
>>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-16 22:28     ` Alex_Gagniuc
@ 2018-07-18 21:53     ` Bjorn Helgaas
  2018-07-19 15:46       ` Alex G.
                         ` (2 more replies)
  -1 siblings, 3 replies; 73+ messages in thread
From: Bjorn Helgaas @ 2018-07-18 21:53 UTC (permalink / raw)
  To: Alex_Gagniuc
  Cc: mr.nuke.me, bhelgaas, Austin.Bolen, Shyam.Iyer, keith.busch,
	linux-pci, linux-kernel, jeffrey.t.kirsher, ariel.elior,
	michael.chan, ganeshgr, tariqt, jakub.kicinski, talgi, airlied,
	alexander.deucher, Mike Marciniszyn

[+cc Mike (hfi1)]

On Mon, Jul 16, 2018 at 10:28:35PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
> >> ...
> >> The easiest way to detect this is with pcie_print_link_status(),
> >> since the bottleneck is usually the link that is downtrained. It's not
> >> a perfect solution, but it works extremely well in most cases.
> > 
> > This is an interesting idea.  I have two concerns:
> > 
> > Some drivers already do this on their own, and we probably don't want
> > duplicate output for those devices.  In most cases (ixgbe and mlx* are
> > exceptions), the drivers do this unconditionally so we *could* remove
> > it from the driver if we add it to the core.  The dmesg order would
> > change, and the message wouldn't be associated with the driver as it
> > now is.
> 
> Oh, there are only 8 users of that. Even I could patch up the drivers to 
> remove the call, assuming we reach agreement about this change.
> 
> > Also, I think some of the GPU devices might come up at a lower speed,
> > then download firmware, then reset the device so it comes up at a
> > higher speed.  I think this patch will make us complain about
> > the low initial speed, which might confuse users.
> 
> I spoke to one of the PCIe spec writers. It's allowable for a device to 
> downtrain speed or width. It would also be extremely dumb to downtrain 
> with the intent to re-train at a higher speed later, but it's possible 
> devices do dumb stuff like that. That's why it's an informational 
> message, instead of a warning.

FWIW, here's some of the discussion related to hfi1 from [1]:

  > Btw, why is the driver configuring the PCIe link speed?  Isn't
  > this something we should be handling in the PCI core?

  The device comes out of reset at the 5GT/s speed. The driver
  downloads device firmware, programs PCIe registers, and co-ordinates
  the transition to 8GT/s.

  This recipe is device specific and is therefore implemented in the
  hfi1 driver built on top of PCI core functions and macros.

Also several DRM drivers seem to do this (see cik_pcie_gen3_enable(),
si_pcie_gen3_enable()); from [2]:

  My understanding was that some platforms only bring up the link in gen 1
  mode for compatibility reasons. 

[1] https://lkml.kernel.org/r/32E1700B9017364D9B60AED9960492BC627FF54C@fmsmsx120.amr.corp.intel.com
[2] https://lkml.kernel.org/r/BN6PR12MB1809BD30AA5B890C054F9832F7B50@BN6PR12MB1809.namprd12.prod.outlook.com

> Another case: Some devices (lower-end GPUs) use silicon (and marketing) 
> that advertises x16, but they're only routed for x8. I'm okay with 
> seeing an informational message in this case. In fact, I didn't know 
> that my Quadro card for three years is only wired for x8 until I was 
> testing this patch.

Yeah, it's probably OK.  I don't want bug reports from people who
think something's broken when it's really just a hardware limitation
of their system.  But hopefully the message is not alarming.

> > So I'm not sure whether it's better to do this in the core for all
> > devices, or if we should just add it to the high-performance drivers
> > that really care.
> 
> You're thinking "do I really need that bandwidth" because I'm using a 
> function called "_bandwidth_". The point of the change is very far from 
> that: it is to help in system troubleshooting by detecting downtraining 
> conditions.

I'm not sure what you think I'm thinking :)  My question is whether
it's worthwhile to print this extra information for *every* PCIe
device, given that your use case is the tiny percentage of broken
systems.

If we only printed the info in the "bw_avail < bw_cap" case, i.e.,
when the device is capable of more than it's getting, that would make
a lot of sense to me.  The normal case line is more questionable.  I
think the reason that's there is because the network drivers are very
performance sensitive and like to see that info all the time.

Maybe we need something like this:

  pcie_print_link_status(struct pci_dev *dev, int verbose)
  {
    ...
    if (bw_avail >= bw_cap) {
      if (verbose)
        pci_info(dev, "... available PCIe bandwidth ...");
    } else
      pci_info(dev, "... available PCIe bandwidth, limited by ...");
  }

So the core could print only the potential problems with:

  pcie_print_link_status(dev, 0);

and drivers that really care even if there's no problem could do:

  pcie_print_link_status(dev, 1);

> >> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> [snip]
> >> +	/* Look from the device up to avoid downstream ports with no devices. */
> >> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> >> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> >> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> >> +		return;
> > 
> > Do we care about Upstream Ports here?  
> 
> YES! Switches. e.g. an x16 switch with 4x downstream ports could 
> downtrain at 8x and 4x, and we'd never catch it.

OK, I think I see your point: if the upstream port *could* do 16x but
only trains to 4x, and two endpoints below it are both capable of 4x,
the endpoints *think* they're happy but in fact they have to share 4x
when they could use more.
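
(Putting rough numbers on that, assuming Gen3 links and the kernel's
128b/130b accounting of ~7.88 Gb/s per lane: the x16-capable Upstream
Port trained at x4 carries only ~31.5 Gb/s instead of ~126 Gb/s, so two
endpoints that each report a healthy x4 link are in fact contending for
a single x4's worth of upstream bandwidth.)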

Bjorn

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-18 21:53     ` Bjorn Helgaas
@ 2018-07-19 15:46       ` Alex G.
  2018-07-19 17:07         ` Deucher, Alexander
  2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
  2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
  2 siblings, 1 reply; 73+ messages in thread
From: Alex G. @ 2018-07-19 15:46 UTC (permalink / raw)
  To: Bjorn Helgaas, Alex_Gagniuc
  Cc: bhelgaas, Austin.Bolen, Shyam.Iyer, keith.busch, linux-pci,
	linux-kernel, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, tariqt, jakub.kicinski, talgi, airlied,
	alexander.deucher, Mike Marciniszyn



On 07/18/2018 04:53 PM, Bjorn Helgaas wrote:
> [+cc Mike (hfi1)]
> 
> On Mon, Jul 16, 2018 at 10:28:35PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>> On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
>>>> ...
>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>> a perfect solution, but it works extremely well in most cases.
>>>
>>> This is an interesting idea.  I have two concerns:
>>>
>>> Some drivers already do this on their own, and we probably don't want
>>> duplicate output for those devices.  In most cases (ixgbe and mlx* are
>>> exceptions), the drivers do this unconditionally so we *could* remove
>>> it from the driver if we add it to the core.  The dmesg order would
>>> change, and the message wouldn't be associated with the driver as it
>>> now is.
>>
>> Oh, there are only 8 users of that. Even I could patch up the drivers to
>> remove the call, assuming we reach agreement about this change.
>>
>>> Also, I think some of the GPU devices might come up at a lower speed,
>>> then download firmware, then reset the device so it comes up at a
>>> higher speed.  I think this patch will make us complain about
>>> the low initial speed, which might confuse users.
>>
>> I spoke to one of the PCIe spec writers. It's allowable for a device to
>> downtrain speed or width. It would also be extremely dumb to downtrain
>> with the intent to re-train at a higher speed later, but it's possible
>> devices do dumb stuff like that. That's why it's an informational
>> message, instead of a warning.
> 
> FWIW, here's some of the discussion related to hfi1 from [1]:
> 
>    > Btw, why is the driver configuring the PCIe link speed?  Isn't
>    > this something we should be handling in the PCI core?
> 
>    The device comes out of reset at the 5GT/s speed. The driver
>    downloads device firmware, programs PCIe registers, and co-ordinates
>    the transition to 8GT/s.
> 
>    This recipe is device specific and is therefore implemented in the
>    hfi1 driver built on top of PCI core functions and macros.
> 
> Also several DRM drivers seem to do this (see cik_pcie_gen3_enable(),
> si_pcie_gen3_enable()); from [2]:
> 
>    My understanding was that some platforms only bring up the link in gen 1
>    mode for compatibility reasons.
> 
> [1] https://lkml.kernel.org/r/32E1700B9017364D9B60AED9960492BC627FF54C@fmsmsx120.amr.corp.intel.com
> [2] https://lkml.kernel.org/r/BN6PR12MB1809BD30AA5B890C054F9832F7B50@BN6PR12MB1809.namprd12.prod.outlook.com

Downtraining a link "for compatibility reasons" is one of those dumb 
things that devices do. I'm SURPRISED AMD HW does it, although it is 
perfectly permissible by PCIe spec.

>> Another case: Some devices (lower-end GPUs) use silicon (and marketing)
>> that advertises x16, but they're only routed for x8. I'm okay with
>> seeing an informational message in this case. In fact, I didn't know
>> that my Quadro card for three years is only wired for x8 until I was
>> testing this patch.
> 
> Yeah, it's probably OK.  I don't want bug reports from people who
> think something's broken when it's really just a hardware limitation
> of their system.  But hopefully the message is not alarming.

It looks fairly innocent:

[    0.749415] pci 0000:18:00.0: 4.000 Gb/s available PCIe bandwidth, 
limited by 5 GT/s x1 link at 0000:17:03.0 (capable of 15.752 Gb/s with 8 
GT/s x2 link)

>>> So I'm not sure whether it's better to do this in the core for all
>>> devices, or if we should just add it to the high-performance drivers
>>> that really care.
>>
>> You're thinking "do I really need that bandwidth" because I'm using a
>> function called "_bandwidth_". The point of the change is very far from
>> that: it is to help in system troubleshooting by detecting downtraining
>> conditions.
> 
> I'm not sure what you think I'm thinking :)  My question is whether
> it's worthwhile to print this extra information for *every* PCIe
> device, given that your use case is the tiny percentage of broken
> systems.

I think this information is a lot more useful than a bunch of other info 
that's printed. Is "type 00 class 0x088000" more valuable? What about 
"reg 0x20: [mem 0x9d950000-0x9d95ffff 64bit pref]", which is also 
available under /proc/iomem for those curious?

> If we only printed the info in the "bw_avail < bw_cap" case, i.e.,
> when the device is capable of more than it's getting, that would make
> a lot of sense to me.  The normal case line is more questionable.  I
> think the reason that's there is because the network drivers are very
> performance sensitive and like to see that info all the time.

I agree that can be an acceptable compromise.

> Maybe we need something like this:
> 
>    pcie_print_link_status(struct pci_dev *dev, int verbose)
>    {
>      ...
>      if (bw_avail >= bw_cap) {
>        if (verbose)
>          pci_info(dev, "... available PCIe bandwidth ...");
>      } else
>        pci_info(dev, "... available PCIe bandwidth, limited by ...");
>    }
> 
> So the core could print only the potential problems with:
> 
>    pcie_print_link_status(dev, 0);
> 
> and drivers that really care even if there's no problem could do:
> 
>    pcie_print_link_status(dev, 1);

Sounds good. I'll try to push out updated PATCH early next week.

>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> [snip]
>>>> +	/* Look from the device up to avoid downstream ports with no devices. */
>>>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>>>> +		return;
>>>
>>> Do we care about Upstream Ports here?
>>
>> YES! Switches. e.g. an x16 switch with 4x downstream ports could
>> downtrain at 8x and 4x, and we'd never catch it.
> 
> OK, I think I see your point: if the upstream port *could* do 16x but
> only trains to 4x, and two endpoints below it are both capable of 4x,
> the endpoints *think* they're happy but in fact they have to share 4x
> when they could use more.
> 
> Bjorn
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
@ 2018-07-19 15:49     ` Alex G.
  2018-07-23  5:21       ` Tal Gilboa
  0 siblings, 1 reply; 73+ messages in thread
From: Alex G. @ 2018-07-19 15:49 UTC (permalink / raw)
  To: Tal Gilboa, Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher



On 07/18/2018 08:38 AM, Tal Gilboa wrote:
> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>> [+cc maintainers of drivers that already use pcie_print_link_status()
>> and GPU folks]
[snip]
>>
>>> +    /* Multi-function PCIe share the same link/status. */
>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>> +        return;
>>> +
>>> +    pcie_print_link_status(dev);
>>> +}
> 
> Is this function called by default for every PCIe device? What about 
> VFs? We make an exception for them on our driver since a VF doesn't have 
> access to the needed information in order to provide a meaningful message.

I'm assuming VF means virtual function. pcie_print_link_status() doesn't 
care if it's passed a virtual function. It will try to do its job. 
That's why I bail out three lines above, with 'dev->is_virtfn' check.

Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* RE: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-19 15:46       ` Alex G.
@ 2018-07-19 17:07         ` Deucher, Alexander
  0 siblings, 0 replies; 73+ messages in thread
From: Deucher, Alexander @ 2018-07-19 17:07 UTC (permalink / raw)
  To: Alex G., Bjorn Helgaas, Alex_Gagniuc
  Cc: bhelgaas, Austin.Bolen, Shyam.Iyer, keith.busch, linux-pci,
	linux-kernel, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, tariqt, jakub.kicinski, talgi, airlied,
	Mike Marciniszyn

> -----Original Message-----
> From: Alex G. [mailto:mr.nuke.me@gmail.com]
> Sent: Thursday, July 19, 2018 11:47 AM
> To: Bjorn Helgaas <helgaas@kernel.org>; Alex_Gagniuc@Dellteam.com
> Cc: bhelgaas@google.com; Austin.Bolen@dell.com; Shyam.Iyer@dell.com;
> keith.busch@intel.com; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org; jeffrey.t.kirsher@intel.com;
> ariel.elior@cavium.com; michael.chan@broadcom.com;
> ganeshgr@chelsio.com; tariqt@mellanox.com;
> jakub.kicinski@netronome.com; talgi@mellanox.com; airlied@gmail.com;
> Deucher, Alexander <Alexander.Deucher@amd.com>; Mike Marciniszyn
> <mike.marciniszyn@intel.com>
> Subject: Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
> 
> 
> On 07/18/2018 04:53 PM, Bjorn Helgaas wrote:
> > [+cc Mike (hfi1)]
> >
> > On Mon, Jul 16, 2018 at 10:28:35PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> >> On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
> >>>> ...
> >>>> The easiest way to detect this is with pcie_print_link_status(),
> >>>> since the bottleneck is usually the link that is downtrained. It's
> >>>> not a perfect solution, but it works extremely well in most cases.
> >>>
> >>> This is an interesting idea.  I have two concerns:
> >>>
> >>> Some drivers already do this on their own, and we probably don't
> >>> want duplicate output for those devices.  In most cases (ixgbe and
> >>> mlx* are exceptions), the drivers do this unconditionally so we
> >>> *could* remove it from the driver if we add it to the core.  The
> >>> dmesg order would change, and the message wouldn't be associated
> >>> with the driver as it now is.
> >>
> >> Oh, there are only 8 users of that. Even I could patch up the drivers
> >> to remove the call, assuming we reach agreement about this change.
> >>
> >>> Also, I think some of the GPU devices might come up at a lower
> >>> speed, then download firmware, then reset the device so it comes up
> >>> at a higher speed.  I think this patch will make us complain about
> >>> the low initial speed, which might confuse users.
> >>
> >> I spoke to one of the PCIe spec writers. It's allowable for a device
> >> to downtrain speed or width. It would also be extremely dumb to
> >> downtrain with the intent to re-train at a higher speed later, but
> >> it's possible devices do dumb stuff like that. That's why it's an
> >> informational message, instead of a warning.
> >
> > FWIW, here's some of the discussion related to hfi1 from [1]:
> >
> >    > Btw, why is the driver configuring the PCIe link speed?  Isn't
> >    > this something we should be handling in the PCI core?
> >
> >    The device comes out of reset at the 5GT/s speed. The driver
> >    downloads device firmware, programs PCIe registers, and co-ordinates
> >    the transition to 8GT/s.
> >
> >    This recipe is device specific and is therefore implemented in the
> >    hfi1 driver built on top of PCI core functions and macros.
> >
> > Also several DRM drivers seem to do this (see cik_pcie_gen3_enable(),
> > si_pcie_gen3_enable()); from [2]:
> >
> >    My understanding was that some platforms only bring up the link in gen 1
> >    mode for compatibility reasons.
> >
> > [1] https://lkml.kernel.org/r/32E1700B9017364D9B60AED9960492BC627FF54C@fmsmsx120.amr.corp.intel.com
> > [2] https://lkml.kernel.org/r/BN6PR12MB1809BD30AA5B890C054F9832F7B50@BN6PR12MB1809.namprd12.prod.outlook.com
> 
> Downtraining a link "for compatibility reasons" is one of those dumb
> things that devices do. I'm SURPRISED AMD HW does it, although it is
> perfectly permissible by PCIe spec.

It's not AMD hardware, that does it per se.  IIRC, some early pcie x86
gen 2.0 capable motherboards used to bring up devices in gen1 mode (I
guess for compatibility reasons?).  So I added code to the radeon driver
to switch to gen2 or 3 if the motherboard supported it.  Most (all?)
relatively recent motherboards seem to train at the highest speed so
it's not really an issue anymore.

On a somewhat related note, we dynamically adjust the speed and lanes on
the fly to save power.  This is done by a microcontroller on the GPU
though so the driver doesn't program this directly.

Alex

> 
> >> Another case: Some devices (lower-end GPUs) use silicon (and
> >> marketing) that advertises x16, but they're only routed for x8. I'm
> >> okay with seeing an informational message in this case. In fact, I
> >> didn't know that my Quadro card for three years is only wired for x8
> >> until I was testing this patch.
> >
> > Yeah, it's probably OK.  I don't want bug reports from people who
> > think something's broken when it's really just a hardware limitation
> > of their system.  But hopefully the message is not alarming.
> 
> It looks fairly innocent:
> 
> [    0.749415] pci 0000:18:00.0: 4.000 Gb/s available PCIe bandwidth,
> limited by 5 GT/s x1 link at 0000:17:03.0 (capable of 15.752 Gb/s with
> 8 GT/s x2 link)
> 
> >>> So I'm not sure whether it's better to do this in the core for all
> >>> devices, or if we should just add it to the high-performance drivers
> >>> that really care.
> >>
> >> You're thinking "do I really need that bandwidth" because I'm using a
> >> function called "_bandwidth_". The point of the change is very far
> >> from that: it is to help in system troubleshooting by detecting
> >> downtraining conditions.
> >
> > I'm not sure what you think I'm thinking :)  My question is whether
> > it's worthwhile to print this extra information for *every* PCIe
> > device, given that your use case is the tiny percentage of broken
> > systems.
> 
> I think this information is a lot more useful than a bunch of other
> info that's printed. Is "type 00 class 0x088000" more valuable? What
> about "reg 0x20: [mem 0x9d950000-0x9d95ffff 64bit pref]", which is also
> available under /proc/iomem for those curious?
> 
> > If we only printed the info in the "bw_avail < bw_cap" case, i.e.,
> > when the device is capable of more than it's getting, that would make
> > a lot of sense to me.  The normal case line is more questionable.  I
> > think the reason that's there is because the network drivers are very
> > performance sensitive and like to see that info all the time.
> 
> I agree that can be an acceptable compromise.
> 
> > Maybe we need something like this:
> >
> >    pcie_print_link_status(struct pci_dev *dev, int verbose)
> >    {
> >      ...
> >      if (bw_avail >= bw_cap) {
> >        if (verbose)
> >          pci_info(dev, "... available PCIe bandwidth ...");
> >      } else
> >        pci_info(dev, "... available PCIe bandwidth, limited by ...");
> >    }
> >
> > So the core could print only the potential problems with:
> >
> >    pcie_print_link_status(dev, 0);
> >
> > and drivers that really care even if there's no problem could do:
> >
> >    pcie_print_link_status(dev, 1);
> 
> Sounds good. I'll try to push out updated PATCH early next week.
> 
> >>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> >> [snip]
> >>>> +	/* Look from the device up to avoid downstream ports with no devices. */
> >>>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> >>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> >>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> >>>> +		return;
> >>>
> >>> Do we care about Upstream Ports here?
> >>
> >> YES! Switches. e.g. an x16 switch with 4x downstream ports could
> >> downtrain at 8x and 4x, and we'd never catch it.
> >
> > OK, I think I see your point: if the upstream port *could* do 16x but
> > only trains to 4x, and two endpoints below it are both capable of 4x,
> > the endpoints *think* they're happy but in fact they have to share 4x
> > when they could use more.
> >
> > Bjorn
> >

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-19 15:49     ` Alex G.
@ 2018-07-23  5:21       ` Tal Gilboa
  2018-07-23 17:01         ` Alex G.
  0 siblings, 1 reply; 73+ messages in thread
From: Tal Gilboa @ 2018-07-23  5:21 UTC (permalink / raw)
  To: Alex G., Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 7/19/2018 6:49 PM, Alex G. wrote:
> 
> 
> On 07/18/2018 08:38 AM, Tal Gilboa wrote:
>> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>>> [+cc maintainers of drivers that already use pcie_print_link_status()
>>> and GPU folks]
> [snip]
>>>
>>>> +    /* Multi-function PCIe share the same link/status. */
>>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>>> +        return;
>>>> +
>>>> +    pcie_print_link_status(dev);
>>>> +}
>>
>> Is this function called by default for every PCIe device? What about 
>> VFs? We make an exception for them on our driver since a VF doesn't 
>> have access to the needed information in order to provide a meaningful 
>> message.
> 
> I'm assuming VF means virtual function. pcie_print_link_status() doesn't 
> care if it's passed a virtual function. It will try to do its job. 
> That's why I bail out three lines above, with 'dev->is_virtfn' check.
> 
> Alex

That's the point - we don't want to call pcie_print_link_status() for 
virtual functions. We make the distinction in our driver. If you want to 
change the code to call this function by default it shouldn't affect the 
current usage.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-23  5:21       ` Tal Gilboa
@ 2018-07-23 17:01         ` Alex G.
  2018-07-23 21:35           ` Tal Gilboa
  0 siblings, 1 reply; 73+ messages in thread
From: Alex G. @ 2018-07-23 17:01 UTC (permalink / raw)
  To: Tal Gilboa, Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 07/23/2018 12:21 AM, Tal Gilboa wrote:
> On 7/19/2018 6:49 PM, Alex G. wrote:
>>
>>
>> On 07/18/2018 08:38 AM, Tal Gilboa wrote:
>>> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>>>> [+cc maintainers of drivers that already use pcie_print_link_status()
>>>> and GPU folks]
>> [snip]
>>>>
>>>>> +    /* Multi-function PCIe share the same link/status. */
>>>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>>>> +        return;
>>>>> +
>>>>> +    pcie_print_link_status(dev);
>>>>> +}
>>>
>>> Is this function called by default for every PCIe device? What about 
>>> VFs? We make an exception for them on our driver since a VF doesn't 
>>> have access to the needed information in order to provide a 
>>> meaningful message.
>>
>> I'm assuming VF means virtual function. pcie_print_link_status() 
>> doesn't care if it's passed a virtual function. It will try to do its 
>> job. That's why I bail out three lines above, with 'dev->is_virtfn' 
>> check.
>>
>> Alex
> 
> That's the point - we don't want to call pcie_print_link_status() for 
> virtual functions. We make the distinction in our driver. If you want to 
> change the code to call this function by default it shouldn't affect the 
> current usage.

I'm not understanding very well what you're asking. I understand you 
want to avoid printing this message on virtual functions, and that's 
already taken care of. I'm also not changing current behavior.  Let's 
get v2 out and start the discussion again based on that.

Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER
  2018-07-18 21:53     ` Bjorn Helgaas
  2018-07-19 15:46       ` Alex G.
@ 2018-07-23 20:01       ` Alexandru Gagniuc
  2018-07-25  1:24         ` kbuild test robot
  2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
  2 siblings, 1 reply; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-07-23 20:01 UTC (permalink / raw)
  To: linux-pci, bhelgaas
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	jakub.kicinski, airlied, alexander.deucher, mike.marciniszyn,
	Alexandru Gagniuc, Frederick Lawler, Oza Pawandeep,
	Greg Kroah-Hartman, linux-kernel

When we don't own AER, we shouldn't touch the AER error bits. Clearing
error bits willy-nilly might cause firmware to miss some errors. In
theory, these bits get cleared by FFS, or via ACPI _HPX method. These
mechanisms are not subject to the problem.

This race is mostly of theoretical significance, since I can't
reasonably demonstrate this race in the lab.

On a side-note, pcie_aer_is_kernel_first() is created to alleviate the
need for two checks: aer_cap and get_firmware_first().

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/pcie/aer.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e88386af28..85c3e173c025 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -307,6 +307,12 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
 		aer_set_firmware_first(dev);
 	return dev->__aer_firmware_first;
 }
+
+static bool pcie_aer_is_kernel_first(struct pci_dev *dev)
+{
+	return !!dev->aer_cap && !pcie_aer_get_firmware_first(dev);
+}
+
 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
 				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
 
@@ -337,10 +343,7 @@ bool aer_acpi_firmware_first(void)
 
 int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
-	if (pcie_aer_get_firmware_first(dev))
-		return -EIO;
-
-	if (!dev->aer_cap)
+	if (!pcie_aer_is_kernel_first(dev))
 		return -EIO;
 
 	return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
@@ -349,7 +352,7 @@ EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
 
 int pci_disable_pcie_error_reporting(struct pci_dev *dev)
 {
-	if (pcie_aer_get_firmware_first(dev))
+	if (!pcie_aer_is_kernel_first(dev))
 		return -EIO;
 
 	return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
@@ -383,10 +386,10 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 	if (!pci_is_pcie(dev))
 		return -ENODEV;
 
-	pos = dev->aer_cap;
-	if (!pos)
+	if (pcie_aer_is_kernel_first(dev))
 		return -EIO;
 
+	pos = dev->aer_cap;
 	port_type = pci_pcie_type(dev);
 	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
 		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-18 21:53     ` Bjorn Helgaas
  2018-07-19 15:46       ` Alex G.
  2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
@ 2018-07-23 20:03       ` Alexandru Gagniuc
  2018-07-23 21:01         ` Jakub Kicinski
  2 siblings, 1 reply; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-07-23 20:03 UTC (permalink / raw)
  To: linux-pci, bhelgaas
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	jakub.kicinski, airlied, alexander.deucher, mike.marciniszyn,
	Alexandru Gagniuc, linux-kernel

PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive, nor
straightforward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---

For the sake of review, I've created a __pcie_print_link_status() which
takes a 'verbose' argument. If we agree we want to go this route and update
the users of pcie_print_link_status(), I can split this up into two patches.
I prefer just printing this information in the core functions, and letting
drivers not have to worry about this. Though there seems to be a strong
preference for not going that route, so here it goes:

Changes since v4:
 - Use 'verbose' argument to print bandwidth under normal conditions
 - Without verbose, only downtraining conditions are reported

Changes since v3:
 - Remove extra newline and parentheses.

Changes since v2:
 - Check dev->is_virtfn flag

Changes since v1:
 - Use pcie_print_link_status() instead of reimplementing logic

 drivers/pci/pci.c   | 22 ++++++++++++++++++----
 drivers/pci/probe.c | 21 +++++++++++++++++++++
 include/linux/pci.h |  1 +
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 316496e99da9..414ad7b3abdb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Be verbose -- print info even when enough bandwidth is available.
  *
  * Report the available bandwidth at the device.  If this is less than the
  * device is capable of, report the device's maximum possible bandwidth and
  * the upstream link that limits its performance to less than that.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
 	enum pcie_link_width width, width_cap;
 	enum pci_bus_speed speed, speed_cap;
@@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
 	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
 	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-	if (bw_avail >= bw_cap)
+	if (bw_avail >= bw_cap && verbose)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
-	else
+	else if (bw_avail < bw_cap)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 			 bw_avail / 1000, bw_avail % 1000,
 			 PCIE_SPEED2STR(speed), width,
@@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.  If this is less than the
+ * device is capable of, report the device's maximum possible bandwidth and
+ * the upstream link that limits its performance to less than that.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	__pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..1f7336377c3b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices. */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe share the same link/status. */
+	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+		return;
+
+	__pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	/* Check link and detect downtrain errors */
+	pcie_check_upstream_link(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index abd5d5e17aee..15bfab8f7a1b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
 u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
 			     enum pci_bus_speed *speed,
 			     enum pcie_link_width *width);
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
 void pcie_print_link_status(struct pci_dev *dev);
 int pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
@ 2018-07-23 21:01         ` Jakub Kicinski
  2018-07-23 21:52           ` Tal Gilboa
  0 siblings, 1 reply; 73+ messages in thread
From: Jakub Kicinski @ 2018-07-23 21:01 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, tariqt, airlied, alexander.deucher, mike.marciniszyn,
	linux-kernel

On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
> 
> For the sake of review, I've created a __pcie_print_link_status() which
> takes a 'verbose' argument. If we agree want to go this route, and update
> the users of pcie_print_link_status(), I can split this up in two patches.
> I prefer just printing this information in the core functions, and letting
> drivers not have to worry about this. Though there seems to be strong for
> not going that route, so here it goes:

FWIW the networking drivers print PCIe BW because sometimes the network
bandwidth is simply over-provisioned on multi-port cards, e.g. an 80Gbps
card on an x8 link.
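
(For scale, assuming a PCIe 3.0 link: 8 GT/s per lane with 128b/130b
encoding is about 7.88 Gb/s per lane, so an x8 link tops out near
63 Gb/s of raw bandwidth -- below 80 Gb/s even before protocol
overhead.)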

Sorry to bike shed, but currently the networking cards print the info
during probe.  Would it make sense to move your message closer to probe
time, rather than when the device is added?  If the driver structure is
available, we could also consider adding a boolean to struct pci_driver
to indicate whether the driver wants the verbose message.  This way we
avoid duplicated prints.

I have no objection to current patch, it LGTM.  Just a thought.
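
A minimal sketch of that idea follows; the field name
'verbose_link_status' is hypothetical and not part of any posted patch,
and since no driver is bound at device-add time, the print would have
to move into the probe path:

	/* include/linux/pci.h (sketch, existing callbacks unchanged) */
	struct pci_driver {
		const char *name;
		const struct pci_device_id *id_table;
		/* Opt in to the BW print even when the link isn't degraded. */
		bool verbose_link_status;
	};

	/* drivers/pci/pci-driver.c, after a successful probe (sketch) */
	if (pci_dev->driver->verbose_link_status)
		pcie_print_link_status(pci_dev);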

>  drivers/pci/pci.c   | 22 ++++++++++++++++++----
>  drivers/pci/probe.c | 21 +++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  3 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 316496e99da9..414ad7b3abdb 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>  }
>  
>  /**
> - * pcie_print_link_status - Report the PCI device's link speed and width
> + * __pcie_print_link_status - Report the PCI device's link speed and width
>   * @dev: PCI device to query
> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>   *
>   * Report the available bandwidth at the device.  If this is less than the
>   * device is capable of, report the device's maximum possible bandwidth and
>   * the upstream link that limits its performance to less than that.
>   */
> -void pcie_print_link_status(struct pci_dev *dev)
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>  {
>  	enum pcie_link_width width, width_cap;
>  	enum pci_bus_speed speed, speed_cap;
> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>  	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>  	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
> -	if (bw_avail >= bw_cap)
> +	if (bw_avail >= bw_cap && verbose)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
> -	else
> +	else if (bw_avail < bw_cap)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>  			 bw_avail / 1000, bw_avail % 1000,
>  			 PCIE_SPEED2STR(speed), width,
> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
>  }
> +
> +/**
> + * pcie_print_link_status - Report the PCI device's link speed and width
> + * @dev: PCI device to query
> + *
> + * Report the available bandwidth at the device.  If this is less than the
> + * device is capable of, report the device's maximum possible bandwidth and
> + * the upstream link that limits its performance to less than that.
> + */
> +void pcie_print_link_status(struct pci_dev *dev)
> +{
> +	__pcie_print_link_status(dev, true);
> +}
>  EXPORT_SYMBOL(pcie_print_link_status);
>  
>  /**
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac876e32de4b..1f7336377c3b 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	/* Look from the device up to avoid downstream ports with no devices. */
> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +		return;
> +
> +	/* Multi-function PCIe share the same link/status. */
> +	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
> +		return;
> +
> +	__pcie_print_link_status(dev, false);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>  	/* Enhanced Allocation */
> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	/* Advanced Error Reporting */
>  	pci_aer_init(dev);
>  
> +	/* Check link and detect downtrain errors */
> +	pcie_check_upstream_link(dev);
> +
>  	if (pci_probe_reset_function(dev) == 0)
>  		dev->reset_fn = 1;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index abd5d5e17aee..15bfab8f7a1b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>  u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
>  			     enum pci_bus_speed *speed,
>  			     enum pcie_link_width *width);
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>  void pcie_print_link_status(struct pci_dev *dev);
>  int pcie_flr(struct pci_dev *dev);
>  int __pci_reset_function_locked(struct pci_dev *dev);


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
  2018-07-23 17:01         ` Alex G.
@ 2018-07-23 21:35           ` Tal Gilboa
  0 siblings, 0 replies; 73+ messages in thread
From: Tal Gilboa @ 2018-07-23 21:35 UTC (permalink / raw)
  To: Alex G., Bjorn Helgaas
  Cc: bhelgaas, alex_gagniuc, austin_bolen, shyam_iyer, keith.busch,
	linux-pci, linux-kernel, Jeff Kirsher, Ariel Elior, Michael Chan,
	Ganesh Goudar, Tariq Toukan, Jakub Kicinski, Dave Airlie,
	Alex Deucher

On 7/23/2018 8:01 PM, Alex G. wrote:
> On 07/23/2018 12:21 AM, Tal Gilboa wrote:
>> On 7/19/2018 6:49 PM, Alex G. wrote:
>>>
>>>
>>> On 07/18/2018 08:38 AM, Tal Gilboa wrote:
>>>> On 7/16/2018 5:17 PM, Bjorn Helgaas wrote:
>>>>> [+cc maintainers of drivers that already use pcie_print_link_status()
>>>>> and GPU folks]
>>> [snip]
>>>>>
>>>>>> +    /* Multi-function PCIe share the same link/status. */
>>>>>> +    if ((PCI_FUNC(dev->devfn) != 0) || dev->is_virtfn)
>>>>>> +        return;
>>>>>> +
>>>>>> +    pcie_print_link_status(dev);
>>>>>> +}
>>>>
>>>> Is this function called by default for every PCIe device? What about 
>>>> VFs? We make an exception for them on our driver since a VF doesn't 
>>>> have access to the needed information in order to provide a 
>>>> meaningful message.
>>>
>>> I'm assuming VF means virtual function. pcie_print_link_status() 
>>> doesn't care if it's passed a virtual function. It will try to do its 
>>> job. That's why I bail out three lines above, with 'dev->is_virtfn' 
>>> check.
>>>
>>> Alex
>>
>> That's the point - we don't want to call pcie_print_link_status() for 
>> virtual functions. We make the distinction in our driver. If you want 
>> to change the code to call this function by default it shouldn't 
>> affect the current usage.
> 
> I'm not understanding very well what you're asking. I understand you 
> want to avoid printing this message on virtual functions, and that's 
> already taken care of. I'm also not changing current behavior.  Let's 
> get v2 out and start the discussion again based on that.
> 
> Alex

Oh, OK, I see. In this case, please remove the explicit call in the 
mlx4/5 drivers so it won't be duplicated.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 21:01         ` Jakub Kicinski
@ 2018-07-23 21:52           ` Tal Gilboa
  2018-07-23 22:14             ` Jakub Kicinski
  2018-07-31  6:40             ` Tal Gilboa
  0 siblings, 2 replies; 73+ messages in thread
From: Tal Gilboa @ 2018-07-23 21:52 UTC (permalink / raw)
  To: Jakub Kicinski, Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>> PCIe downtraining happens when both the device and PCIe port are
>> capable of a larger bus width or higher speed than negotiated.
>> Downtraining might be indicative of other problems in the system, and
>> identifying this from userspace is neither intuitive, nor
>> straightforward.
>>
>> The easiest way to detect this is with pcie_print_link_status(),
>> since the bottleneck is usually the link that is downtrained. It's not
>> a perfect solution, but it works extremely well in most cases.
>>
>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> ---
>>
>> For the sake of review, I've created a __pcie_print_link_status() which
>> takes a 'verbose' argument. If we agree want to go this route, and update
>> the users of pcie_print_link_status(), I can split this up in two patches.
>> I prefer just printing this information in the core functions, and letting
>> drivers not have to worry about this. Though there seems to be strong for
>> not going that route, so here it goes:
> 
> FWIW the networking drivers print PCIe BW because sometimes the network
> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
> card on a x8 link.
> 
> Sorry to bike shed, but currently the networking cards print the info
> during probe.  Would it make sense to move your message closer to probe
> time?  Rather than when device is added.  If driver structure is
> available, we could also consider adding a boolean to struct pci_driver
> to indicate if driver wants the verbose message?  This way we avoid
> duplicated prints.
> 
> I have no objection to current patch, it LGTM.  Just a thought.

I don't see the reason for having two functions. What's the problem with 
adding the verbose argument to the original function?
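
For comparison, that alternative changes the exported signature and has
to touch every existing caller at once, roughly as below (illustrative,
not a posted patch; 'mdev' stands in for whatever pointer each driver
passes today):

	-void pcie_print_link_status(struct pci_dev *dev)
	+void pcie_print_link_status(struct pci_dev *dev, bool verbose)

	/* and in each driver that already calls it: */
	-	pcie_print_link_status(mdev->pdev);
	+	pcie_print_link_status(mdev->pdev, true);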

> 
>>   drivers/pci/pci.c   | 22 ++++++++++++++++++----
>>   drivers/pci/probe.c | 21 +++++++++++++++++++++
>>   include/linux/pci.h |  1 +
>>   3 files changed, 40 insertions(+), 4 deletions(-) >>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 316496e99da9..414ad7b3abdb 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>>   }
>>   
>>   /**
>> - * pcie_print_link_status - Report the PCI device's link speed and width
>> + * __pcie_print_link_status - Report the PCI device's link speed and width
>>    * @dev: PCI device to query
>> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>>    *
>>    * Report the available bandwidth at the device.  If this is less than the
>>    * device is capable of, report the device's maximum possible bandwidth and
>>    * the upstream link that limits its performance to less than that.
>>    */
>> -void pcie_print_link_status(struct pci_dev *dev)
>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>>   {
>>   	enum pcie_link_width width, width_cap;
>>   	enum pci_bus_speed speed, speed_cap;
>> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>>   	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>>   	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>>   
>> -	if (bw_avail >= bw_cap)
>> +	if (bw_avail >= bw_cap && verbose)
>>   		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>>   			 bw_cap / 1000, bw_cap % 1000,
>>   			 PCIE_SPEED2STR(speed_cap), width_cap);
>> -	else
>> +	else if (bw_avail < bw_cap)
>>   		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>>   			 bw_avail / 1000, bw_avail % 1000,
>>   			 PCIE_SPEED2STR(speed), width,
>> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>>   			 bw_cap / 1000, bw_cap % 1000,
>>   			 PCIE_SPEED2STR(speed_cap), width_cap);
>>   }
>> +
>> +/**
>> + * pcie_print_link_status - Report the PCI device's link speed and width
>> + * @dev: PCI device to query
>> + *
>> + * Report the available bandwidth at the device.  If this is less than the
>> + * device is capable of, report the device's maximum possible bandwidth and
>> + * the upstream link that limits its performance to less than that.
>> + */
>> +void pcie_print_link_status(struct pci_dev *dev)
>> +{
>> +	__pcie_print_link_status(dev, true);
>> +}
>>   EXPORT_SYMBOL(pcie_print_link_status);
>>   
>>   /**
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index ac876e32de4b..1f7336377c3b 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>>   	return dev;
>>   }
>>   
>> +static void pcie_check_upstream_link(struct pci_dev *dev)
>> +{
>> +	if (!pci_is_pcie(dev))
>> +		return;
>> +
>> +	/* Look from the device up to avoid downstream ports with no devices. */
>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>> +		return;
>> +
>> +	/* Multi-function PCIe share the same link/status. */
>> +	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
>> +		return;
>> +
>> +	__pcie_print_link_status(dev, false);
>> +}
>> +
>>   static void pci_init_capabilities(struct pci_dev *dev)
>>   {
>>   	/* Enhanced Allocation */
>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>>   	/* Advanced Error Reporting */
>>   	pci_aer_init(dev);
>>   
>> +	/* Check link and detect downtrain errors */
>> +	pcie_check_upstream_link(dev);

This is called for every PCIe device, right? Won't there be a duplicated 
print in case a device loads with lower PCIe bandwidth than needed?

>> +
>>   	if (pci_probe_reset_function(dev) == 0)
>>   		dev->reset_fn = 1;
>>   }
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index abd5d5e17aee..15bfab8f7a1b 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
>>   			     enum pci_bus_speed *speed,
>>   			     enum pcie_link_width *width);
>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>   void pcie_print_link_status(struct pci_dev *dev);
>>   int pcie_flr(struct pci_dev *dev);
>>   int __pci_reset_function_locked(struct pci_dev *dev);
> 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 21:52           ` Tal Gilboa
@ 2018-07-23 22:14             ` Jakub Kicinski
  2018-07-23 23:59               ` Alex G.
  2018-07-31  6:40             ` Tal Gilboa
  1 sibling, 1 reply; 73+ messages in thread
From: Jakub Kicinski @ 2018-07-23 22:14 UTC (permalink / raw)
  To: Tal Gilboa
  Cc: Alexandru Gagniuc, linux-pci, bhelgaas, keith.busch,
	alex_gagniuc, austin_bolen, shyam_iyer, jeffrey.t.kirsher,
	ariel.elior, michael.chan, ganeshgr, Tariq Toukan, airlied,
	alexander.deucher, mike.marciniszyn, linux-kernel

On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
> > On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:  
> >> PCIe downtraining happens when both the device and PCIe port are
> >> capable of a larger bus width or higher speed than negotiated.
> >> Downtraining might be indicative of other problems in the system, and
> >> identifying this from userspace is neither intuitive, nor
> >> straightforward.
> >>
> >> The easiest way to detect this is with pcie_print_link_status(),
> >> since the bottleneck is usually the link that is downtrained. It's not
> >> a perfect solution, but it works extremely well in most cases.
> >>
> >> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> >> ---
> >>
> >> For the sake of review, I've created a __pcie_print_link_status() which
> >> takes a 'verbose' argument. If we agree want to go this route, and update
> >> the users of pcie_print_link_status(), I can split this up in two patches.
> >> I prefer just printing this information in the core functions, and letting
> >> drivers not have to worry about this. Though there seems to be strong for
> >> not going that route, so here it goes:  
> > 
> > FWIW the networking drivers print PCIe BW because sometimes the network
> > bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
> > card on a x8 link.
> > 
> > Sorry to bike shed, but currently the networking cards print the info
> > during probe.  Would it make sense to move your message closer to probe
> > time?  Rather than when device is added.  If driver structure is
> > available, we could also consider adding a boolean to struct pci_driver
> > to indicate if driver wants the verbose message?  This way we avoid
> > duplicated prints.
> > 
> > I have no objection to current patch, it LGTM.  Just a thought.  
> 
> I don't see the reason for having two functions. What's the problem with 
> adding the verbose argument to the original function?

IMHO it's reasonable to keep the default parameter as what 90% of users
want by means of a wrapper.  The non-verbose output is provided by
the core already for all devices.

What do you think of my proposal above Tal?  That would make the extra
wrapper unnecessary since the verbose parameter would be part of the
driver structure, and it would avoid the duplicated output.

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 22:14             ` Jakub Kicinski
@ 2018-07-23 23:59               ` Alex G.
  2018-07-24 13:39                 ` Tal Gilboa
  0 siblings, 1 reply; 73+ messages in thread
From: Alex G. @ 2018-07-23 23:59 UTC (permalink / raw)
  To: Jakub Kicinski, Tal Gilboa
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel



On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
> On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
>> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>>> PCIe downtraining happens when both the device and PCIe port are
>>>> capable of a larger bus width or higher speed than negotiated.
>>>> Downtraining might be indicative of other problems in the system, and
>>>> identifying this from userspace is neither intuitive, nor
>>>> straightforward.
>>>>
>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>> a perfect solution, but it works extremely well in most cases.
>>>>
>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>> ---
>>>>
>>>> For the sake of review, I've created a __pcie_print_link_status() which
>>>> takes a 'verbose' argument. If we agree want to go this route, and update
>>>> the users of pcie_print_link_status(), I can split this up in two patches.
>>>> I prefer just printing this information in the core functions, and letting
>>>> drivers not have to worry about this. Though there seems to be strong for
>>>> not going that route, so here it goes:
>>>
>>> FWIW the networking drivers print PCIe BW because sometimes the network
>>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>>> card on a x8 link.
>>>
>>> Sorry to bike shed, but currently the networking cards print the info
>>> during probe.  Would it make sense to move your message closer to probe
>>> time?  Rather than when device is added.  If driver structure is
>>> available, we could also consider adding a boolean to struct pci_driver
>>> to indicate if driver wants the verbose message?  This way we avoid
>>> duplicated prints.
>>>
>>> I have no objection to current patch, it LGTM.  Just a thought.
>>
>> I don't see the reason for having two functions. What's the problem with
>> adding the verbose argument to the original function?
> 
> IMHO it's reasonable to keep the default parameter to what 90% of users
> want by a means on a wrapper.  The non-verbose output is provided by
> the core already for all devices.
> 
> What do you think of my proposal above Tal?  That would make the extra
> wrapper unnecessary since the verbose parameter would be part of the
> driver structure, and it would avoid the duplicated output.

I see how it might make sense to add another member to the driver 
struct, but is it worth the extra learning curve? It seems to be 
something with the potential to confuse new driver developers, while 
having a very marginal benefit.
Although, if that's what people want...

Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 23:59               ` Alex G.
@ 2018-07-24 13:39                 ` Tal Gilboa
  2018-07-30 23:26                     ` Alex_Gagniuc
  0 siblings, 1 reply; 73+ messages in thread
From: Tal Gilboa @ 2018-07-24 13:39 UTC (permalink / raw)
  To: Alex G., Jakub Kicinski
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/24/2018 2:59 AM, Alex G. wrote:
> 
> 
> On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
>> On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
>>> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>>>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>>>> PCIe downtraining happens when both the device and PCIe port are
>>>>> capable of a larger bus width or higher speed than negotiated.
>>>>> Downtraining might be indicative of other problems in the system, and
>>>>> identifying this from userspace is neither intuitive, nor
>>>>> straightforward.
>>>>>
>>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>>> a perfect solution, but it works extremely well in most cases.
>>>>>
>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>>> ---
>>>>>
>>>>> For the sake of review, I've created a __pcie_print_link_status() 
>>>>> which
>>>>> takes a 'verbose' argument. If we agree want to go this route, and 
>>>>> update
>>>>> the users of pcie_print_link_status(), I can split this up in two 
>>>>> patches.
>>>>> I prefer just printing this information in the core functions, and 
>>>>> letting
>>>>> drivers not have to worry about this. Though there seems to be 
>>>>> strong for
>>>>> not going that route, so here it goes:
>>>>
>>>> FWIW the networking drivers print PCIe BW because sometimes the network
>>>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>>>> card on a x8 link.
>>>>
>>>> Sorry to bike shed, but currently the networking cards print the info
>>>> during probe.  Would it make sense to move your message closer to probe
>>>> time?  Rather than when device is added.  If driver structure is
>>>> available, we could also consider adding a boolean to struct pci_driver
>>>> to indicate if driver wants the verbose message?  This way we avoid
>>>> duplicated prints.
>>>>
>>>> I have no objection to current patch, it LGTM.  Just a thought.
>>>
>>> I don't see the reason for having two functions. What's the problem with
>>> adding the verbose argument to the original function?
>>
>> IMHO it's reasonable to keep the default parameter to what 90% of users
>> want by a means on a wrapper.  The non-verbose output is provided by
>> the core already for all devices.
>>
>> What do you think of my proposal above Tal?  That would make the extra
>> wrapper unnecessary since the verbose parameter would be part of the
>> driver structure, and it would avoid the duplicated output.
> 
> I see how it might make sense to add another member to the driver 
> struct, but is it worth the extra learning curve? It seems to be 
> something with the potential to confuse new driver developers, and 
> having a very marginal benefit.
> Although, if that's what people want...

I prefer the wrapper function. Looking at struct pci_driver it would 
seem strange for it to hold a field for controlling verbosity (IMO). 
This is a very (very) specific field in a very general struct.

> 
> Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER
  2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
@ 2018-07-25  1:24         ` kbuild test robot
  0 siblings, 0 replies; 73+ messages in thread
From: kbuild test robot @ 2018-07-25  1:24 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: kbuild-all, linux-pci, bhelgaas, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, jeffrey.t.kirsher, ariel.elior,
	michael.chan, ganeshgr, tariqt, jakub.kicinski, airlied,
	alexander.deucher, mike.marciniszyn, Alexandru Gagniuc,
	Frederick Lawler, Oza Pawandeep, Greg Kroah-Hartman,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]

Hi Alexandru,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on v4.18-rc6 next-20180724]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Alexandru-Gagniuc/PCI-AER-Do-not-clear-AER-bits-if-we-don-t-own-AER/20180725-003708
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-s0-07250049 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/string.h:6:0,
                    from include/linux/uuid.h:20,
                    from include/linux/cper.h:24,
                    from drivers/pci/pcie/aer.c:15:
   drivers/pci/pcie/aer.c: In function 'pci_enable_pcie_error_reporting':
   drivers/pci/pcie/aer.c:371:7: error: implicit declaration of function 'pcie_aer_is_kernel_first' [-Werror=implicit-function-declaration]
     if (!pcie_aer_is_kernel_first(dev))
          ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> drivers/pci/pcie/aer.c:371:2: note: in expansion of macro 'if'
     if (!pcie_aer_is_kernel_first(dev))
     ^~
   cc1: some warnings being treated as errors

vim +/if +371 drivers/pci/pcie/aer.c

   365	
   366	#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
   367					 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
   368	
   369	int pci_enable_pcie_error_reporting(struct pci_dev *dev)
   370	{
 > 371		if (!pcie_aer_is_kernel_first(dev))
   372			return -EIO;
   373	
   374		return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_AER_FLAGS);
   375	}
   376	EXPORT_SYMBOL_GPL(pci_enable_pcie_error_reporting);
   377	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32082 bytes --]

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-24 13:39                 ` Tal Gilboa
@ 2018-07-30 23:26                     ` Alex_Gagniuc
  0 siblings, 0 replies; 73+ messages in thread
From: Alex_Gagniuc @ 2018-07-30 23:26 UTC (permalink / raw)
  To: talgi, mr.nuke.me, jakub.kicinski
  Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen, Shyam.Iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	airlied, alexander.deucher, mike.marciniszyn, linux-kernel

On 07/24/2018 08:40 AM, Tal Gilboa wrote:
> On 7/24/2018 2:59 AM, Alex G. wrote:
>>
>>
>> On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
>>> On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
>>>> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>>>>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>>>>> PCIe downtraining happens when both the device and PCIe port are
>>>>>> capable of a larger bus width or higher speed than negotiated.
>>>>>> Downtraining might be indicative of other problems in the system, and
>>>>>> identifying this from userspace is neither intuitive, nor
>>>>>> straightforward.
>>>>>>
>>>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>>>> a perfect solution, but it works extremely well in most cases.
>>>>>>
>>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>>>> ---
>>>>>>
>>>>>> For the sake of review, I've created a __pcie_print_link_status()
>>>>>> which
>>>>>> takes a 'verbose' argument. If we agree want to go this route, and
>>>>>> update
>>>>>> the users of pcie_print_link_status(), I can split this up in two
>>>>>> patches.
>>>>>> I prefer just printing this information in the core functions, and
>>>>>> letting
>>>>>> drivers not have to worry about this. Though there seems to be
>>>>>> strong for
>>>>>> not going that route, so here it goes:
>>>>>
>>>>> FWIW the networking drivers print PCIe BW because sometimes the network
>>>>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>>>>> card on a x8 link.
>>>>>
>>>>> Sorry to bike shed, but currently the networking cards print the info
>>>>> during probe.  Would it make sense to move your message closer to probe
>>>>> time?  Rather than when device is added.  If driver structure is
>>>>> available, we could also consider adding a boolean to struct pci_driver
>>>>> to indicate if driver wants the verbose message?  This way we avoid
>>>>> duplicated prints.
>>>>>
>>>>> I have no objection to current patch, it LGTM.  Just a thought.
>>>>
>>>> I don't see the reason for having two functions. What's the problem with
>>>> adding the verbose argument to the original function?
>>>
>>> IMHO it's reasonable to keep the default parameter to what 90% of users
>>> want by a means on a wrapper.  The non-verbose output is provided by
>>> the core already for all devices.
>>>
>>> What do you think of my proposal above Tal?  That would make the extra
>>> wrapper unnecessary since the verbose parameter would be part of the
>>> driver structure, and it would avoid the duplicated output.
>>
>> I see how it might make sense to add another member to the driver
>> struct, but is it worth the extra learning curve? It seems to be
>> something with the potential to confuse new driver developers, and
>> having a very marginal benefit.
>> Although, if that's what people want...
> 
> I prefer the wrapper function. Looking at struct pci_driver it would
> seem strange for it to hold a field for controlling verbosity (IMO).
> This is a very (very) specific field in a very general struct.

If people are okay with the wrapper, then I'm not going to update the patch.

Alex

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-23 21:52           ` Tal Gilboa
  2018-07-23 22:14             ` Jakub Kicinski
@ 2018-07-31  6:40             ` Tal Gilboa
  2018-07-31 15:10               ` Alex G.
  1 sibling, 1 reply; 73+ messages in thread
From: Tal Gilboa @ 2018-07-31  6:40 UTC (permalink / raw)
  To: Jakub Kicinski, Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/24/2018 12:52 AM, Tal Gilboa wrote:
> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>> PCIe downtraining happens when both the device and PCIe port are
>>> capable of a larger bus width or higher speed than negotiated.
>>> Downtraining might be indicative of other problems in the system, and
>>> identifying this from userspace is neither intuitive, nor
>>> straightforward.
>>>
>>> The easiest way to detect this is with pcie_print_link_status(),
>>> since the bottleneck is usually the link that is downtrained. It's not
>>> a perfect solution, but it works extremely well in most cases.
>>>
>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>> ---
>>>
>>> For the sake of review, I've created a __pcie_print_link_status() which
>>> takes a 'verbose' argument. If we agree want to go this route, and 
>>> update
>>> the users of pcie_print_link_status(), I can split this up in two 
>>> patches.
>>> I prefer just printing this information in the core functions, and 
>>> letting
>>> drivers not have to worry about this. Though there seems to be strong 
>>> for
>>> not going that route, so here it goes:
>>
>> FWIW the networking drivers print PCIe BW because sometimes the network
>> bandwidth is simply over-provisioned on multi port cards, e.g. 80Gbps
>> card on a x8 link.
>>
>> Sorry to bike shed, but currently the networking cards print the info
>> during probe.  Would it make sense to move your message closer to probe
>> time?  Rather than when device is added.  If driver structure is
>> available, we could also consider adding a boolean to struct pci_driver
>> to indicate if driver wants the verbose message?  This way we avoid
>> duplicated prints.
>>
>> I have no objection to current patch, it LGTM.  Just a thought.
> 
> I don't see the reason for having two functions. What's the problem with 
> adding the verbose argument to the original function?
> 
>>
>>>   drivers/pci/pci.c   | 22 ++++++++++++++++++----
>>>   drivers/pci/probe.c | 21 +++++++++++++++++++++
>>>   include/linux/pci.h |  1 +
>>>   3 files changed, 40 insertions(+), 4 deletions(-) >>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 316496e99da9..414ad7b3abdb 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev 
>>> *dev, enum pci_bus_speed *speed,
>>>   }
>>>   /**
>>> - * pcie_print_link_status - Report the PCI device's link speed and 
>>> width
>>> + * __pcie_print_link_status - Report the PCI device's link speed and 
>>> width
>>>    * @dev: PCI device to query
>>> + * @verbose: Be verbose -- print info even when enough bandwidth is 
>>> available.
>>>    *
>>>    * Report the available bandwidth at the device.  If this is less 
>>> than the
>>>    * device is capable of, report the device's maximum possible 
>>> bandwidth and
>>>    * the upstream link that limits its performance to less than that.
>>>    */
>>> -void pcie_print_link_status(struct pci_dev *dev)
>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>>>   {
>>>       enum pcie_link_width width, width_cap;
>>>       enum pci_bus_speed speed, speed_cap;
>>> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>>>       bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>>>       bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, 
>>> &width);
>>> -    if (bw_avail >= bw_cap)
>>> +    if (bw_avail >= bw_cap && verbose)
>>>           pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s 
>>> x%d link)\n",
>>>                bw_cap / 1000, bw_cap % 1000,
>>>                PCIE_SPEED2STR(speed_cap), width_cap);
>>> -    else
>>> +    else if (bw_avail < bw_cap)
>>>           pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, 
>>> limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d 
>>> link)\n",
>>>                bw_avail / 1000, bw_avail % 1000,
>>>                PCIE_SPEED2STR(speed), width,
>>> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>>>                bw_cap / 1000, bw_cap % 1000,
>>>                PCIE_SPEED2STR(speed_cap), width_cap);
>>>   }
>>> +
>>> +/**
>>> + * pcie_print_link_status - Report the PCI device's link speed and 
>>> width
>>> + * @dev: PCI device to query
>>> + *
>>> + * Report the available bandwidth at the device.  If this is less 
>>> than the
>>> + * device is capable of, report the device's maximum possible 
>>> bandwidth and
>>> + * the upstream link that limits its performance to less than that.
>>> + */
>>> +void pcie_print_link_status(struct pci_dev *dev)
>>> +{
>>> +    __pcie_print_link_status(dev, true);
>>> +}
>>>   EXPORT_SYMBOL(pcie_print_link_status);
>>>   /**
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index ac876e32de4b..1f7336377c3b 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct 
>>> pci_bus *bus, int devfn)
>>>       return dev;
>>>   }
>>> +static void pcie_check_upstream_link(struct pci_dev *dev)
>>> +{
>>> +    if (!pci_is_pcie(dev))
>>> +        return;
>>> +
>>> +    /* Look from the device up to avoid downstream ports with no 
>>> devices. */
>>> +    if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>>> +        (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>>> +        (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>>> +        return;
>>> +
>>> +    /* Multi-function PCIe share the same link/status. */
>>> +    if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
>>> +        return;
>>> +
>>> +    __pcie_print_link_status(dev, false);
>>> +}
>>> +
>>>   static void pci_init_capabilities(struct pci_dev *dev)
>>>   {
>>>       /* Enhanced Allocation */
>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct 
>>> pci_dev *dev)
>>>       /* Advanced Error Reporting */
>>>       pci_aer_init(dev);
>>> +    /* Check link and detect downtrain errors */
>>> +    pcie_check_upstream_link(dev);
> 
> This is called for every PCIe device right? Won't there be a duplicated 
> print in case a device loads with lower PCIe bandwidth than needed?

Alex, can you comment on this please?

> 
>>> +
>>>       if (pci_probe_reset_function(dev) == 0)
>>>           dev->reset_fn = 1;
>>>   }
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
>>> **limiting_dev,
>>>                    enum pci_bus_speed *speed,
>>>                    enum pcie_link_width *width);
>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>   void pcie_print_link_status(struct pci_dev *dev);
>>>   int pcie_flr(struct pci_dev *dev);
>>>   int __pci_reset_function_locked(struct pci_dev *dev);
>>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-31  6:40             ` Tal Gilboa
@ 2018-07-31 15:10               ` Alex G.
  2018-08-05  7:05                 ` Tal Gilboa
  0 siblings, 1 reply; 73+ messages in thread
From: Alex G. @ 2018-07-31 15:10 UTC (permalink / raw)
  To: Tal Gilboa, Jakub Kicinski
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 07/31/2018 01:40 AM, Tal Gilboa wrote:
[snip]
>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct 
>>>> pci_dev *dev)
>>>>       /* Advanced Error Reporting */
>>>>       pci_aer_init(dev);
>>>> +    /* Check link and detect downtrain errors */
>>>> +    pcie_check_upstream_link(dev);
>>
>> This is called for every PCIe device right? Won't there be a 
>> duplicated print in case a device loads with lower PCIe bandwidth than 
>> needed?
> 
> Alex, can you comment on this please?

Of course I can.

There's one print at probe() time, which happens if bandwidth < max. I 
would think that's fine. There is a way to duplicate it, and that is if 
the driver also calls pcie_print_link_status(). A few driver maintainers 
who call it have indicated they'd be fine with removing it from the 
driver, and leaving it in the core PCI.

Should the device come back at low speed, go away, then come back at 
full speed, we'd still only see one print, from the first probe -- 
again, provided the driver doesn't also call the function.
There's the corner case where the device comes up at < max, goes away, 
then comes back faster, but still < max. There will be two prints, but 
they won't be true duplicates -- each one will indicate a different BW.

Alex

>>>> +
>>>>       if (pci_probe_reset_function(dev) == 0)
>>>>           dev->reset_fn = 1;
>>>>   }
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
>>>> **limiting_dev,
>>>>                    enum pci_bus_speed *speed,
>>>>                    enum pcie_link_width *width);
>>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>>   void pcie_print_link_status(struct pci_dev *dev);
>>>>   int pcie_flr(struct pci_dev *dev);
>>>>   int __pci_reset_function_locked(struct pci_dev *dev);
>>>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-07-31 15:10               ` Alex G.
@ 2018-08-05  7:05                 ` Tal Gilboa
  2018-08-06 18:39                     ` Alex_Gagniuc
  0 siblings, 1 reply; 73+ messages in thread
From: Tal Gilboa @ 2018-08-05  7:05 UTC (permalink / raw)
  To: Alex G., Jakub Kicinski
  Cc: linux-pci, bhelgaas, keith.busch, alex_gagniuc, austin_bolen,
	shyam_iyer, jeffrey.t.kirsher, ariel.elior, michael.chan,
	ganeshgr, Tariq Toukan, airlied, alexander.deucher,
	mike.marciniszyn, linux-kernel

On 7/31/2018 6:10 PM, Alex G. wrote:
> On 07/31/2018 01:40 AM, Tal Gilboa wrote:
> [snip]
>>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct 
>>>>> pci_dev *dev)
>>>>>       /* Advanced Error Reporting */
>>>>>       pci_aer_init(dev);
>>>>> +    /* Check link and detect downtrain errors */
>>>>> +    pcie_check_upstream_link(dev);
>>>
>>> This is called for every PCIe device right? Won't there be a 
>>> duplicated print in case a device loads with lower PCIe bandwidth 
>>> than needed?
>>
>> Alex, can you comment on this please?
> 
> Of course I can.
> 
> There's one print at probe() time, which happens if bandwidth < max. I 
> would think that's fine. There is a way to duplicate it, and that is if 
> the driver also calls print_link_status(). A few driver maintainers who 
> call it have indicated they'd be fine with removing it from the driver, 
> and leaving it in the core PCI.

We would be fine with that as well. Please include the removal in your 
patches.
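
The removal being asked for is of this shape -- the file and context
below are illustrative of the mlx5 case, not an actual hunk:

	--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
	+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
	@@ (probe path) @@
	-	pcie_print_link_status(dev->pdev);

with the equivalent call dropped from mlx4 as well, since the PCI core
now prints the degraded-link message at enumeration time.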

> 
> Should the device come back at low speed, go away, then come back at 
> full speed we'd still only see one print from the first probe. Again, if 
> driver doesn't also call the function.
> There's the corner case where the device come up at < max, goes away, 
> then comes back faster, but still < max. There will be two prints, but 
> they won't be true duplicates -- each one will indicate different BW.

This is fine.

> 
> Alex
> 
>>>>> +
>>>>>       if (pci_probe_reset_function(dev) == 0)
>>>>>           dev->reset_fn = 1;
>>>>>   }
>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>>>> --- a/include/linux/pci.h
>>>>> +++ b/include/linux/pci.h
>>>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>>>   u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev 
>>>>> **limiting_dev,
>>>>>                    enum pci_bus_speed *speed,
>>>>>                    enum pcie_link_width *width);
>>>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>>>   void pcie_print_link_status(struct pci_dev *dev);
>>>>>   int pcie_flr(struct pci_dev *dev);
>>>>>   int __pci_reset_function_locked(struct pci_dev *dev);
>>>>

^ permalink raw reply	[flat|nested] 73+ messages in thread
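
To make the corner case above concrete: a hypothetical x4-capable device that
first trains at x1, is removed, and later re-enumerates at x2 would log two
distinct lines. The device names and bandwidth figures below are illustrative
only; the format follows the pci_info() strings in the patch under discussion:

  pci 0000:03:00.0: 7.877 Gb/s available PCIe bandwidth, limited by 8 GT/s x1 link at 0000:00:1c.0 (capable of 31.508 Gb/s with 8 GT/s x4 link)

  pci 0000:03:00.0: 15.754 Gb/s available PCIe bandwidth, limited by 8 GT/s x2 link at 0000:00:1c.0 (capable of 31.508 Gb/s with 8 GT/s x4 link)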

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-08-05  7:05                 ` Tal Gilboa
@ 2018-08-06 18:39                     ` Alex_Gagniuc
  0 siblings, 0 replies; 73+ messages in thread
From: Alex_Gagniuc @ 2018-08-06 18:39 UTC (permalink / raw)
  To: talgi, mr.nuke.me, jakub.kicinski
  Cc: linux-pci, bhelgaas, keith.busch, Austin.Bolen, Shyam.Iyer,
	jeffrey.t.kirsher, ariel.elior, michael.chan, ganeshgr, tariqt,
	airlied, alexander.deucher, mike.marciniszyn, linux-kernel

On 08/05/2018 02:06 AM, Tal Gilboa wrote:
> On 7/31/2018 6:10 PM, Alex G. wrote:
>> On 07/31/2018 01:40 AM, Tal Gilboa wrote:
>> [snip]
>>>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct
>>>>>> pci_dev *dev)
>>>>>>        /* Advanced Error Reporting */
>>>>>>        pci_aer_init(dev);
>>>>>> +    /* Check link and detect downtrain errors */
>>>>>> +    pcie_check_upstream_link(dev);
>>>>
>>>> This is called for every PCIe device right? Won't there be a
>>>> duplicated print in case a device loads with lower PCIe bandwidth
>>>> than needed?
>>>
>>> Alex, can you comment on this please?
>>
>> Of course I can.
>>
>> There's one print at probe() time, which happens if bandwidth < max. I
>> would think that's fine. There is a way to duplicate it, and that is if
>> the driver also calls print_link_status(). A few driver maintainers who
>> call it have indicated they'd be fine with removing it from the driver,
>> and leaving it in the core PCI.
> 
> We would be fine with that as well. Please include the removal in your
> patches.

What's the proper procedure? Do I wait for confirmation from Bjorn 
before knocking on maintainers' doors, or do I William Wallace into 
their trees and demand they merge the removal (pending Bjorn's approval 
on the other side) ?

Alex

>>
>> Should the device come back at low speed, go away, then come back at
>> full speed we'd still only see one print from the first probe. Again, if
>> driver doesn't also call the function.
>> There's the corner case where the device comes up at < max, goes away,
>> then comes back faster, but still < max. There will be two prints, but
>> they won't be true duplicates -- each one will indicate different BW.
> 
> This is fine.
> 
>>
>> Alex
>>
>>>>>> +
>>>>>>        if (pci_probe_reset_function(dev) == 0)
>>>>>>            dev->reset_fn = 1;
>>>>>>    }
>>>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>>>> index abd5d5e17aee..15bfab8f7a1b 100644
>>>>>> --- a/include/linux/pci.h
>>>>>> +++ b/include/linux/pci.h
>>>>>> @@ -1088,6 +1088,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>>>>>>    u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev
>>>>>> **limiting_dev,
>>>>>>                     enum pci_bus_speed *speed,
>>>>>>                     enum pcie_link_width *width);
>>>>>> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>>>>>>    void pcie_print_link_status(struct pci_dev *dev);
>>>>>>    int pcie_flr(struct pci_dev *dev);
>>>>>>    int __pci_reset_function_locked(struct pci_dev *dev);
>>>>>
> 


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
  2018-08-06 18:39                     ` Alex_Gagniuc
  (?)
@ 2018-08-06 19:46                     ` Bjorn Helgaas
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
  -1 siblings, 1 reply; 73+ messages in thread
From: Bjorn Helgaas @ 2018-08-06 19:46 UTC (permalink / raw)
  To: Alex_Gagniuc
  Cc: Tal Gilboa, mr.nuke.me, jakub.kicinski, linux-pci, Keith Busch,
	Austin.Bolen, Shyam.Iyer, Kirsher, Jeffrey T, ariel.elior,
	michael.chan, ganeshgr, tariqt, Dave Airlie, Alex Deucher,
	Marciniszyn, Mike, Linux Kernel Mailing List

On Mon, Aug 6, 2018 at 1:39 PM <Alex_Gagniuc@dellteam.com> wrote:
>
> On 08/05/2018 02:06 AM, Tal Gilboa wrote:
> > On 7/31/2018 6:10 PM, Alex G. wrote:
> >> On 07/31/2018 01:40 AM, Tal Gilboa wrote:
> >> [snip]
> >>>>>> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct
> >>>>>> pci_dev *dev)
> >>>>>>        /* Advanced Error Reporting */
> >>>>>>        pci_aer_init(dev);
> >>>>>> +    /* Check link and detect downtrain errors */
> >>>>>> +    pcie_check_upstream_link(dev);
> >>>>
> >>>> This is called for every PCIe device right? Won't there be a
> >>>> duplicated print in case a device loads with lower PCIe bandwidth
> >>>> than needed?
> >>>
> >>> Alex, can you comment on this please?
> >>
> >> Of course I can.
> >>
> >> There's one print at probe() time, which happens if bandwidth < max. I
> >> would think that's fine. There is a way to duplicate it, and that is if
> >> the driver also calls print_link_status(). A few driver maintainers who
> >> call it have indicated they'd be fine with removing it from the driver,
> >> and leaving it in the core PCI.
> >
> > We would be fine with that as well. Please include the removal in your
> > patches.
>
> What's the proper procedure? Do I wait for confirmation from Bjorn
> before knocking on maintainer's doors, or do I William Wallace into
> their trees and demand they merge the removal (pending Bjorn's approval
> on the other side) ?

Post a v4 series that does the PCI core stuff as well as removing the
driver code.

Bjorn

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions
  2018-08-06 19:46                     ` Bjorn Helgaas
@ 2018-08-06 23:25                         ` Alexandru Gagniuc
  0 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

PCIe downtraining happens when both the device and PCIe port are
capable of a larger bus width or higher speed than negotiated.
Downtraining might be indicative of other problems in the system, and
identifying this from userspace is neither intuitive, nor
straightforward.

The easiest way to detect this is with pcie_print_link_status(),
since the bottleneck is usually the link that is downtrained. It's not
a perfect solution, but it works extremely well in most cases.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/pci.c   | 22 ++++++++++++++++++----
 drivers/pci/probe.c | 21 +++++++++++++++++++++
 include/linux/pci.h |  1 +
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 316496e99da9..414ad7b3abdb 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Be verbose -- print info even when enough bandwidth is available.
  *
  * Report the available bandwidth at the device.  If this is less than the
  * device is capable of, report the device's maximum possible bandwidth and
  * the upstream link that limits its performance to less than that.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
 	enum pcie_link_width width, width_cap;
 	enum pci_bus_speed speed, speed_cap;
@@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
 	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
 	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-	if (bw_avail >= bw_cap)
+	if (bw_avail >= bw_cap && verbose)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
-	else
+	else if (bw_avail < bw_cap)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 			 bw_avail / 1000, bw_avail % 1000,
 			 PCIE_SPEED2STR(speed), width,
@@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.  If this is less than the
+ * device is capable of, report the device's maximum possible bandwidth and
+ * the upstream link that limits its performance to less than that.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	__pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 611adcd9c169..1c8c26dd2cb2 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_check_upstream_link(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices. */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe share the same link/status. */
+	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+		return;
+
+	__pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	/* Check link and detect downtrain errors */
+	pcie_check_upstream_link(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c133ccfa002e..d212de231259 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1087,6 +1087,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
 u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
 			     enum pci_bus_speed *speed,
 			     enum pcie_link_width *width);
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
 void pcie_print_link_status(struct pci_dev *dev);
 int pcie_flr(struct pci_dev *dev);
 int __pci_reset_function_locked(struct pci_dev *dev);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread
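
With this split, the PCI core prints only when a link is degraded, while a
driver that still wants the unconditional report can keep calling the exported
helper. A minimal sketch of such a probe() follows; the driver and function
names are hypothetical, and only the API declared in the patch above is
assumed:

  #include <linux/pci.h>

  static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
  {
  	int err;

  	err = pci_enable_device(pdev);
  	if (err)
  		return err;

  	/* Unconditionally report negotiated link speed/width at probe time. */
  	pcie_print_link_status(pdev);
  	return 0;
  }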

* [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 57348f2b49a3..3eadd6201dff 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14100,7 +14100,6 @@ static int bnx2x_init_one(struct pci_dev *pdev,
 	       board_info[ent->driver_data].name,
 	       (CHIP_REV(bp) >> 12) + 'A', (CHIP_METAL(bp) >> 4),
 	       dev->base_addr, bp->pdev->irq, dev->dev_addr);
-	pcie_print_link_status(bp->pdev);
 
 	bnx2x_register_phc(bp);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 3/9] bnxt_en: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4394c1162be4..4b3928c89076 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -8909,7 +8909,6 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	netdev_info(dev, "%s found at mem %lx, node addr %pM\n",
 		    board_info[ent->driver_data].name,
 		    (long)pci_resource_start(pdev, 0), dev->dev_addr);
-	pcie_print_link_status(pdev);
 
 	return 0;
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 4/9] cxgb4: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index a8926e97935e..049958898c17 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5726,9 +5726,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 			free_msix_info(adapter);
 	}
 
-	/* check for PCI Express bandwidth capabiltites */
-	pcie_print_link_status(pdev);
-
 	err = init_rss(adapter);
 	if (err)
 		goto out_free_dev;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 5/9] fm10k: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 15071e4adb98..079fd3c884ea 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2224,9 +2224,6 @@ static int fm10k_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* kick off service timer now, even when interface is down */
 	mod_timer(&interface->service_timer, (HZ * 2) + jiffies);
 
-	/* print warning for non-optimal configurations */
-	pcie_print_link_status(interface->pdev);
-
 	/* report MAC address for logging */
 	dev_info(&pdev->dev, "%pM\n", netdev->dev_addr);
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 6/9] ixgbe: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 26 -------------------
 1 file changed, 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 62e57b05a0ae..7ecdc6c03a66 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -241,28 +241,6 @@ static inline bool ixgbe_pcie_from_parent(struct ixgbe_hw *hw)
 	}
 }
 
-static void ixgbe_check_minimum_link(struct ixgbe_adapter *adapter,
-				     int expected_gts)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-	struct pci_dev *pdev;
-
-	/* Some devices are not connected over PCIe and thus do not negotiate
-	 * speed. These devices do not have valid bus info, and thus any report
-	 * we generate may not be correct.
-	 */
-	if (hw->bus.type == ixgbe_bus_type_internal)
-		return;
-
-	/* determine whether to use the parent device */
-	if (ixgbe_pcie_from_parent(&adapter->hw))
-		pdev = adapter->pdev->bus->parent->self;
-	else
-		pdev = adapter->pdev;
-
-	pcie_print_link_status(pdev);
-}
-
 static void ixgbe_service_event_schedule(struct ixgbe_adapter *adapter)
 {
 	if (!test_bit(__IXGBE_DOWN, &adapter->state) &&
@@ -10585,10 +10563,6 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		break;
 	}
 
-	/* don't check link if we failed to enumerate functions */
-	if (expected_gts > 0)
-		ixgbe_check_minimum_link(adapter, expected_gts);
-
 	err = ixgbe_read_pba_string_generic(hw, part_str, sizeof(part_str));
 	if (err)
 		strlcpy(part_str, "Unknown", sizeof(part_str));
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 7/9] net/mlx4: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 872014702fc1..da4d54195853 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3398,13 +3398,6 @@ static int mlx4_load_one(struct pci_dev *pdev, int pci_dev_data,
 		}
 	}
 
-	/* check if the device is functioning at its maximum possible speed.
-	 * No return code for this call, just warn the user in case of PCI
-	 * express device capabilities are under-satisfied by the bus.
-	 */
-	if (!mlx4_is_slave(dev))
-		pcie_print_link_status(dev->persist->pdev);
-
 	/* In master functions, the communication channel must be initialized
 	 * after obtaining its address from fw */
 	if (mlx4_is_master(dev)) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 615005e63819..e10f9c2bea3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1045,10 +1045,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	dev_info(&pdev->dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
 		 fw_rev_min(dev), fw_rev_sub(dev));
 
-	/* Only PFs hold the relevant PCIe information for this query */
-	if (mlx5_core_is_pf(dev))
-		pcie_print_link_status(dev->pdev);
-
 	/* on load removing any previous indication of internal error, device is
 	 * up
 	 */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* [PATCH v6 9/9] nfp: Do not call pcie_print_link_status()
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-06 23:25                           ` Alexandru Gagniuc
  -1 siblings, 0 replies; 73+ messages in thread
From: Alexandru Gagniuc @ 2018-08-06 23:25 UTC (permalink / raw)
  To: linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
	Alexandru Gagniuc, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

This is now done by the PCI core to warn of sub-optimal bandwidth.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 749655c329b2..0324f99bd1a7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -1324,7 +1324,6 @@ struct nfp_cpp *nfp_cpp_from_nfp6000_pcie(struct pci_dev *pdev)
 	/*  Finished with card initialization. */
 	dev_info(&pdev->dev,
 		 "Netronome Flow Processor NFP4000/NFP6000 PCIe Card Probe\n");
-	pcie_print_link_status(pdev);
 
 	nfp = kzalloc(sizeof(*nfp), GFP_KERNEL);
 	if (!nfp) {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 73+ messages in thread

* Re: [PATCH v6 6/9] ixgbe: Do not call pcie_print_link_status()
  2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-07 17:51                             ` Jeff Kirsher
  -1 siblings, 0 replies; 73+ messages in thread
From: Jeff Kirsher @ 2018-08-07 17:51 UTC (permalink / raw)
  To: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

On Mon, 2018-08-06 at 18:25 -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 26 ---------------
> ----
>  1 file changed, 26 deletions(-)

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v6 5/9] fm10k: Do not call pcie_print_link_status()
  2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-07 17:52                             ` Jeff Kirsher
  -1 siblings, 0 replies; 73+ messages in thread
From: Jeff Kirsher @ 2018-08-07 17:52 UTC (permalink / raw)
  To: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski
  Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

On Mon, 2018-08-06 at 18:25 -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 3 ---
>  1 file changed, 3 deletions(-)

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-07 19:44                           ` David Miller
  -1 siblings, 0 replies; 73+ messages in thread
From: David Miller @ 2018-08-07 19:44 UTC (permalink / raw)
  To: mr.nuke.me
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, ariel.elior, everest-linux-l2,
	michael.chan, ganeshgr, jeffrey.t.kirsher, tariqt, saeedm, leon,
	dirk.vandermerwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

From: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Date: Mon,  6 Aug 2018 18:25:35 -0500

> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.
> 
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>

Feel free to merge this entire series via the PCI tree.

For the series:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions
  2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-07 21:41                           ` Bjorn Helgaas
  -1 siblings, 0 replies; 73+ messages in thread
From: Bjorn Helgaas @ 2018-08-07 21:41 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky,
	Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
	linux-rdma, oss-drivers

On Mon, Aug 06, 2018 at 06:25:35PM -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive, nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

After this series, there are no callers of pcie_print_link_status(),
which means we *only* print something if a device is capable of more
bandwidth than the fabric can deliver.

ISTR some desire to have this information for NICs even if the device
isn't limited, so I'm just double-checking to make sure the driver
guys are OK with this change.

There are no callers of __pcie_print_link_status() outside the PCI
core, so I would move the declaration from include/linux/pci.h to
drivers/pci/pci.h.

If we agree that we *never* need to print anything unless a device is
constrained by the fabric, I would get rid of the "verbose" flag and
keep everything in pcie_print_link_status().

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/pci/pci.c   | 22 ++++++++++++++++++----
>  drivers/pci/probe.c | 21 +++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  3 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 316496e99da9..414ad7b3abdb 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>  }
>  
>  /**
> - * pcie_print_link_status - Report the PCI device's link speed and width
> + * __pcie_print_link_status - Report the PCI device's link speed and width
>   * @dev: PCI device to query
> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>   *
>   * Report the available bandwidth at the device.  If this is less than the
>   * device is capable of, report the device's maximum possible bandwidth and
>   * the upstream link that limits its performance to less than that.
>   */
> -void pcie_print_link_status(struct pci_dev *dev)
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>  {
>  	enum pcie_link_width width, width_cap;
>  	enum pci_bus_speed speed, speed_cap;
> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>  	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>  	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
> -	if (bw_avail >= bw_cap)
> +	if (bw_avail >= bw_cap && verbose)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
> -	else
> +	else if (bw_avail < bw_cap)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>  			 bw_avail / 1000, bw_avail % 1000,
>  			 PCIE_SPEED2STR(speed), width,
> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
>  }
> +
> +/**
> + * pcie_print_link_status - Report the PCI device's link speed and width
> + * @dev: PCI device to query
> + *
> + * Report the available bandwidth at the device.  If this is less than the
> + * device is capable of, report the device's maximum possible bandwidth and
> + * the upstream link that limits its performance to less than that.
> + */
> +void pcie_print_link_status(struct pci_dev *dev)
> +{
> +	__pcie_print_link_status(dev, true);
> +}
>  EXPORT_SYMBOL(pcie_print_link_status);
>  
>  /**
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 611adcd9c169..1c8c26dd2cb2 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	/* Look from the device up to avoid downstream ports with no devices. */
> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +		return;
> +
> +	/* Multi-function PCIe share the same link/status. */
> +	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
> +		return;
> +
> +	__pcie_print_link_status(dev, false);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>  	/* Enhanced Allocation */
> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	/* Advanced Error Reporting */
>  	pci_aer_init(dev);
>  
> +	/* Check link and detect downtrain errors */
> +	pcie_check_upstream_link(dev);
> +
>  	if (pci_probe_reset_function(dev) == 0)
>  		dev->reset_fn = 1;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c133ccfa002e..d212de231259 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1087,6 +1087,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>  u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
>  			     enum pci_bus_speed *speed,
>  			     enum pcie_link_width *width);
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>  void pcie_print_link_status(struct pci_dev *dev);
>  int pcie_flr(struct pci_dev *dev);
>  int __pci_reset_function_locked(struct pci_dev *dev);
> -- 
> 2.17.1
> 

* [Intel-wired-lan] [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions
@ 2018-08-07 21:41                           ` Bjorn Helgaas
  0 siblings, 0 replies; 73+ messages in thread
From: Bjorn Helgaas @ 2018-08-07 21:41 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Aug 06, 2018 at 06:25:35PM -0500, Alexandru Gagniuc wrote:
> PCIe downtraining happens when both the device and PCIe port are
> capable of a larger bus width or higher speed than negotiated.
> Downtraining might be indicative of other problems in the system, and
> identifying this from userspace is neither intuitive nor
> straightforward.
> 
> The easiest way to detect this is with pcie_print_link_status(),
> since the bottleneck is usually the link that is downtrained. It's not
> a perfect solution, but it works extremely well in most cases.

After this series, there are no callers of pcie_print_link_status(),
which means we *only* print something if a device is capable of more
bandwidth than the fabric can deliver.

ISTR some desire to have this information for NICs even if the device
isn't limited, so I'm just double-checking to make sure the driver
guys are OK with this change.

There are no callers of __pcie_print_link_status() outside the PCI
core, so I would move the declaration from include/linux/pci.h to
drivers/pci/pci.h.

If we agree that we *never* need to print anything unless a device is
constrained by the fabric, I would get rid of the "verbose" flag and
keep everything in pcie_print_link_status().
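
To sketch what I mean -- pcie_print_link_status() with the "verbose"
flag dropped, printing only when the fabric constrains the device
(illustrative only, reusing the helpers from the patch below):

void pcie_print_link_status(struct pci_dev *dev)
{
	enum pcie_link_width width, width_cap;
	enum pci_bus_speed speed, speed_cap;
	struct pci_dev *limiting_dev = NULL;
	u32 bw_avail, bw_cap;

	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);

	/* Nothing to report: the fabric is not the bottleneck */
	if (bw_avail >= bw_cap)
		return;

	pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
		 bw_avail / 1000, bw_avail % 1000,
		 PCIE_SPEED2STR(speed), width,
		 limiting_dev ? pci_name(limiting_dev) : "<unknown>",
		 bw_cap / 1000, bw_cap % 1000,
		 PCIE_SPEED2STR(speed_cap), width_cap);
}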

> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/pci/pci.c   | 22 ++++++++++++++++++----
>  drivers/pci/probe.c | 21 +++++++++++++++++++++
>  include/linux/pci.h |  1 +
>  3 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 316496e99da9..414ad7b3abdb 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5302,14 +5302,15 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
>  }
>  
>  /**
> - * pcie_print_link_status - Report the PCI device's link speed and width
> + * __pcie_print_link_status - Report the PCI device's link speed and width
>   * @dev: PCI device to query
> + * @verbose: Be verbose -- print info even when enough bandwidth is available.
>   *
>   * Report the available bandwidth at the device.  If this is less than the
>   * device is capable of, report the device's maximum possible bandwidth and
>   * the upstream link that limits its performance to less than that.
>   */
> -void pcie_print_link_status(struct pci_dev *dev)
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
>  {
>  	enum pcie_link_width width, width_cap;
>  	enum pci_bus_speed speed, speed_cap;
> @@ -5319,11 +5320,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>  	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
>  	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
> -	if (bw_avail >= bw_cap)
> +	if (bw_avail >= bw_cap && verbose)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
> -	else
> +	else if (bw_avail < bw_cap)
>  		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>  			 bw_avail / 1000, bw_avail % 1000,
>  			 PCIE_SPEED2STR(speed), width,
> @@ -5331,6 +5332,19 @@ void pcie_print_link_status(struct pci_dev *dev)
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
>  }
> +
> +/**
> + * pcie_print_link_status - Report the PCI device's link speed and width
> + * @dev: PCI device to query
> + *
> + * Report the available bandwidth at the device.  If this is less than the
> + * device is capable of, report the device's maximum possible bandwidth and
> + * the upstream link that limits its performance to less than that.
> + */
> +void pcie_print_link_status(struct pci_dev *dev)
> +{
> +	__pcie_print_link_status(dev, true);
> +}
>  EXPORT_SYMBOL(pcie_print_link_status);
>  
>  /**
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 611adcd9c169..1c8c26dd2cb2 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2205,6 +2205,24 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
>  	return dev;
>  }
>  
> +static void pcie_check_upstream_link(struct pci_dev *dev)
> +{
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	/* Look from the device up to avoid downstream ports with no devices. */
> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
> +		return;
> +
> +	/* Multi-function PCIe devices share the same link/status. */
> +	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
> +		return;
> +
> +	__pcie_print_link_status(dev, false);
> +}
> +
>  static void pci_init_capabilities(struct pci_dev *dev)
>  {
>  	/* Enhanced Allocation */
> @@ -2240,6 +2258,9 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	/* Advanced Error Reporting */
>  	pci_aer_init(dev);
>  
> +	/* Check link and detect downtrain errors */
> +	pcie_check_upstream_link(dev);
> +
>  	if (pci_probe_reset_function(dev) == 0)
>  		dev->reset_fn = 1;
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c133ccfa002e..d212de231259 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1087,6 +1087,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
>  u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
>  			     enum pci_bus_speed *speed,
>  			     enum pcie_link_width *width);
> +void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
>  void pcie_print_link_status(struct pci_dev *dev);
>  int pcie_flr(struct pci_dev *dev);
>  int __pci_reset_function_locked(struct pci_dev *dev);
> -- 
> 2.17.1
> 

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-08  6:08                             ` Leon Romanovsky
  -1 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2018-08-08  6:08 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>  1 file changed, 4 deletions(-)
>

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

* Re: [PATCH v6 7/9] net/mlx4: Do not call pcie_print_link_status()
  2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
@ 2018-08-08  6:10                             ` Leon Romanovsky
  -1 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2018-08-08  6:10 UTC (permalink / raw)
  To: Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Mon, Aug 06, 2018 at 06:25:41PM -0500, Alexandru Gagniuc wrote:
> This is now done by the PCI core to warn of sub-optimal bandwidth.
>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
>  1 file changed, 7 deletions(-)
>

Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08  6:08                             ` [Intel-wired-lan] " Leon Romanovsky
  (?)
@ 2018-08-08 14:23                               ` Tal Gilboa
  -1 siblings, 0 replies; 73+ messages in thread
From: Tal Gilboa @ 2018-08-08 14:23 UTC (permalink / raw)
  To: Leon Romanovsky, Alexandru Gagniuc
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>
>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
> 
> Thanks,
> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> 

Alex,
I loaded the mlx5 driver with and without this series. The report in
dmesg is now missing. From what I understood, the status should be
reported at least once, even if everything is in order. We need this
functionality to stay.

net-next (dmesg output for 07:00.0):
[270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
[270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe 
bandwidth (8 GT/s x8 link)
[270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max 
uc(1024) max mc(16384)
[270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, 
Cable plugged

net-next + patches (dmesg output for 07:00.0):
[  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
[  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max 
uc(1024) max mc(16384)
[  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, 
Cable plugged

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 14:23                               ` Tal Gilboa
@ 2018-08-08 15:41                                 ` Leon Romanovsky
  -1 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2018-08-08 15:41 UTC (permalink / raw)
  To: Tal Gilboa
  Cc: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski,
	keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > >
> > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > ---
> > >   drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > >   1 file changed, 4 deletions(-)
> > >
> >
> > Thanks,
> > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> >
>
> Alex,
> I loaded mlx5 driver with and without these series. The report in dmesg is
> now missing. From what I understood, the status should be reported at least
> once, even if everything is in order.

That is not what this series does: it removes the print completely if
the fabric can deliver more than the card is capable of.

> We need this functionality to stay.

I'm not sure you need this information in the driver's dmesg output;
most probably it should be something globally visible and accessible
per PCI device.

>
> net-next (dmesg output for 07:00.0):
> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe bandwidth
> (8 GT/s x8 link)
> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> uc(1024) max mc(16384)
> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
> plugged
>
> net-next + patches (dmesg output for 07:00.0):
> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> uc(1024) max mc(16384)
> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
> plugged
>
>

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 15:41                                 ` [Intel-wired-lan] " Leon Romanovsky
@ 2018-08-08 15:56                                   ` Tal Gilboa
  -1 siblings, 0 replies; 73+ messages in thread
From: Tal Gilboa @ 2018-08-08 15:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Alexandru Gagniuc, linux-pci, bhelgaas, jakub.kicinski,
	keith.busch, alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
>> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
>>> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>>>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>>>
>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>> ---
>>>>    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>>>    1 file changed, 4 deletions(-)
>>>>
>>>
>>> Thanks,
>>> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>
>>
>> Alex,
>> I loaded mlx5 driver with and without these series. The report in dmesg is
>> now missing. From what I understood, the status should be reported at least
>> once, even if everything is in order.
> 
> It is not what this series is doing and it removes prints completely if
> fabric can deliver more than card is capable.
> 
>> We need this functionality to stay.
> 
> I'm not sure that you need this information in driver's dmesg output,
> but most probably something globally visible and accessible per-pci
> device.

Currently we have users that look for it. If we remove the dmesg print,
we need this to be reported elsewhere. Adding it to sysfs, for example,
would be a valid solution for our case.
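
For example, something along these lines could be derived from the same
helpers this series already uses (the attribute name and placement are
made up for illustration, not a proposed ABI):

/* Hypothetical read-only sysfs attribute: 1 if the fabric limits the
 * device below its capability, 0 otherwise.
 */
static ssize_t link_downtrained_show(struct device *dev,
				     struct device_attribute *attr, char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	enum pcie_link_width width, width_cap;
	enum pci_bus_speed speed, speed_cap;
	struct pci_dev *limiting_dev = NULL;
	u32 bw_avail, bw_cap;

	if (!pci_is_pcie(pdev))
		return -EINVAL;

	bw_cap = pcie_bandwidth_capable(pdev, &speed_cap, &width_cap);
	bw_avail = pcie_bandwidth_available(pdev, &limiting_dev, &speed, &width);

	return sprintf(buf, "%d\n", bw_avail < bw_cap);
}
static DEVICE_ATTR_RO(link_downtrained);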

> 
>>
>> net-next (dmesg output for 07:00.0):
>> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe bandwidth
>> (8 GT/s x8 link)
>> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>> uc(1024) max mc(16384)
>> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
>> plugged
>>
>> net-next + patches (dmesg output for 07:00.0):
>> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>> uc(1024) max mc(16384)
>> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, Cable
>> plugged
>>
>>

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 15:56                                   ` [Intel-wired-lan] " Tal Gilboa
@ 2018-08-08 16:33                                     ` Alex G.
  -1 siblings, 0 replies; 73+ messages in thread
From: Alex G. @ 2018-08-08 16:33 UTC (permalink / raw)
  To: Tal Gilboa, Leon Romanovsky
  Cc: linux-pci, bhelgaas, jakub.kicinski, keith.busch, alex_gagniuc,
	austin_bolen, shyam_iyer, Ariel Elior, everest-linux-l2,
	David S. Miller, Michael Chan, Ganesh Goudar, Jeff Kirsher,
	Tariq Toukan, Saeed Mahameed, Dirk van der Merwe, netdev,
	linux-kernel, intel-wired-lan, linux-rdma, oss-drivers



On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
>> On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
>>> On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
>>>> On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
>>>>> This is now done by the PCI core to warn of sub-optimal bandwidth.
>>>>>
>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>>>>> ---
>>>>>    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
>>>>>    1 file changed, 4 deletions(-)
>>>>>
>>>>
>>>> Thanks,
>>>> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
>>>>
>>>
>>> Alex,
>>> I loaded mlx5 driver with and without these series. The report in 
>>> dmesg is
>>> now missing. From what I understood, the status should be reported at 
>>> least
>>> once, even if everything is in order.
>>
>> It is not what this series is doing and it removes prints completely if
>> fabric can deliver more than card is capable.
>>
>>> We need this functionality to stay.
>>
>> I'm not sure that you need this information in driver's dmesg output,
>> but most probably something globally visible and accessible per-pci
>> device.
> 
> Currently we have users that look for it. If we remove the dmesg print 
> we need this to be reported elsewhere. Adding it to sysfs for example 
> should be a valid solution for our case.

I think a stop-gap measure is to leave the pcie_print_link_status() call
in drivers that really need it for whatever reason. Implementing reliable
reporting through sysfs might take some tinkering, and I don't think
that's a sufficient reason to block the heart of this series -- being
able to detect bottlenecks and link downtraining.

Alex

>>
>>>
>>> net-next (dmesg output for 07:00.0):
>>> [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>>> [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available PCIe 
>>> bandwidth
>>> (8 GT/s x8 link)
>>> [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>>> uc(1024) max mc(16384)
>>> [270499.182358] mlx5_core 0000:07:00.0: Port module event: module 0, 
>>> Cable
>>> plugged
>>>
>>> net-next + patches (dmesg output for 07:00.0):
>>> [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
>>> [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
>>> uc(1024) max mc(16384)
>>> [  332.616271] mlx5_core 0000:07:00.0: Port module event: module 0, 
>>> Cable
>>> plugged
>>>
>>>

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 16:33                                     ` [Intel-wired-lan] " Alex G.
@ 2018-08-08 17:27                                       ` Leon Romanovsky
  -1 siblings, 0 replies; 73+ messages in thread
From: Leon Romanovsky @ 2018-08-08 17:27 UTC (permalink / raw)
  To: Alex G.
  Cc: Tal Gilboa, linux-pci, bhelgaas, jakub.kicinski, keith.busch,
	alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Wed, Aug 08, 2018 at 11:33:51AM -0500, Alex G. wrote:
>
>
> On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> > On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> > > On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> > > > On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > > > > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > > > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > > > > >
> > > > > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > > > > ---
> > > > > >    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > > > > >    1 file changed, 4 deletions(-)
> > > > > >
> > > > >
> > > > > Thanks,
> > > > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > > > >
> > > >
> > > > Alex,
> > > > I loaded mlx5 driver with and without these series. The report
> > > > in dmesg is
> > > > now missing. From what I understood, the status should be
> > > > reported at least
> > > > once, even if everything is in order.
> > >
> > > It is not what this series is doing and it removes prints completely if
> > > fabric can deliver more than card is capable.
> > >
> > > > We need this functionality to stay.
> > >
> > > I'm not sure that you need this information in driver's dmesg output,
> > > but most probably something globally visible and accessible per-pci
> > > device.
> >
> > Currently we have users that look for it. If we remove the dmesg print
> > we need this to be reported elsewhere. Adding it to sysfs for example
> > should be a valid solution for our case.
>
> I think a stop-gap measure is to leave the pcie_print_link_status() call in
> drivers that really need it for whatever reason. Implementing a reliable
> reporting through sysfs might take some tinkering, and I don't think it's a
> sufficient reason to block the heart of this series -- being able to detect
> bottlenecks and link downtraining.

IMHO, you made the right change, and it is better to replace this print
with some more generic solution now, while you are at it, rather than
leave leftovers.

Thanks

>
> Alex
>
> > >
> > > >
> > > > net-next (dmesg output for 07:00.0):
> > > > [270498.625351] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> > > > [270498.632130] mlx5_core 0000:07:00.0: 63.008 Gb/s available
> > > > PCIe bandwidth
> > > > (8 GT/s x8 link)
> > > > [270499.169533] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> > > > uc(1024) max mc(16384)
> > > > [270499.182358] mlx5_core 0000:07:00.0: Port module event:
> > > > module 0, Cable
> > > > plugged
> > > >
> > > > net-next + patches (dmesg output for 07:00.0):
> > > > [  331.608472] mlx5_core 0000:07:00.0: firmware version: 14.22.4020
> > > > [  332.564938] (0000:07:00.0): E-Switch: Total vports 9, per vport: max
> > > > uc(1024) max mc(16384)
> > > > [  332.616271] mlx5_core 0000:07:00.0: Port module event: module
> > > > 0, Cable
> > > > plugged
> > > >
> > > >

* Re: [PATCH v6 8/9] net/mlx5: Do not call pcie_print_link_status()
  2018-08-08 17:27                                       ` [Intel-wired-lan] " Leon Romanovsky
@ 2018-08-09 14:02                                         ` Bjorn Helgaas
  -1 siblings, 0 replies; 73+ messages in thread
From: Bjorn Helgaas @ 2018-08-09 14:02 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Alex G.,
	Tal Gilboa, linux-pci, bhelgaas, jakub.kicinski, keith.busch,
	alex_gagniuc, austin_bolen, shyam_iyer, Ariel Elior,
	everest-linux-l2, David S. Miller, Michael Chan, Ganesh Goudar,
	Jeff Kirsher, Tariq Toukan, Saeed Mahameed, Dirk van der Merwe,
	netdev, linux-kernel, intel-wired-lan, linux-rdma, oss-drivers

On Wed, Aug 08, 2018 at 08:27:36PM +0300, Leon Romanovsky wrote:
> On Wed, Aug 08, 2018 at 11:33:51AM -0500, Alex G. wrote:
> >
> >
> > On 08/08/2018 10:56 AM, Tal Gilboa wrote:
> > > On 8/8/2018 6:41 PM, Leon Romanovsky wrote:
> > > > On Wed, Aug 08, 2018 at 05:23:12PM +0300, Tal Gilboa wrote:
> > > > > On 8/8/2018 9:08 AM, Leon Romanovsky wrote:
> > > > > > On Mon, Aug 06, 2018 at 06:25:42PM -0500, Alexandru Gagniuc wrote:
> > > > > > > This is now done by the PCI core to warn of sub-optimal bandwidth.
> > > > > > >
> > > > > > > Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > > > > > > ---
> > > > > > >    drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
> > > > > > >    1 file changed, 4 deletions(-)
> > > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
> > > > > >
> > > > >
> > > > > Alex,
> > > > > I loaded mlx5 driver with and without these series. The report
> > > > > in dmesg is
> > > > > now missing. From what I understood, the status should be
> > > > > reported at least
> > > > > once, even if everything is in order.
> > > >
> > > > It is not what this series is doing and it removes prints completely if
> > > > fabric can deliver more than card is capable.
> > > >
> > > > > We need this functionality to stay.
> > > >
> > > > I'm not sure that you need this information in driver's dmesg output,
> > > > but most probably something globally visible and accessible per-pci
> > > > device.
> > >
> > > Currently we have users that look for it. If we remove the dmesg print
> > > we need this to be reported elsewhere. Adding it to sysfs for example
> > > should be a valid solution for our case.
> >
> > I think a stop-gap measure is to leave the pcie_print_link_status() call in
> > drivers that really need it for whatever reason. Implementing a reliable
> > reporting through sysfs might take some tinkering, and I don't think it's a
> > sufficient reason to block the heart of this series -- being able to detect
> > bottlenecks and link downtraining.
> 
> IMHO, you did right change and it is better to replace this print to some
> more generic solution now while you are doing it and don't leave leftovers.

I'd like to make forward progress on this, so I propose we merge only
the PCI core change (patch 1/9) and drop the individual driver
changes.  That would mean:

  - We'll get a message from every NIC driver that calls
    pcie_print_link_status() as before.

  - We'll get a new message from the core for every downtrained link.

  - If a link leading to the NIC is downtrained, there will be
    duplicate messages.  Maybe that's overkill but it's not terrible.

I provisionally put the patch below on my pci/enumeration branch.
Objections?


commit c870cc8cbc4d79014f3daa74d1e412f32e42bf1b
Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Date:   Mon Aug 6 18:25:35 2018 -0500

    PCI: Check for PCIe Link downtraining
    
    When both ends of a PCIe Link are capable of a higher bandwidth than is
    currently in use, the Link is said to be "downtrained".  A downtrained Link
    may indicate hardware or configuration problems in the system, but it's
    hard to identify such Links from userspace.
    
    Refactor pcie_print_link_status() so it continues to always print PCIe
    bandwidth information, as several NIC drivers desire.
    
    Add a new internal __pcie_print_link_status() to emit a message only when a
    device's bandwidth is constrained by the fabric and call it from the PCI
    core for all devices, which identifies all downtrained Links.  It also
    emits messages for a few cases that are technically not downtrained, such
    as a x4 device in an open-ended x1 slot.
    
    Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
    [bhelgaas: changelog, move __pcie_print_link_status() declaration to
    drivers/pci/, rename pcie_check_upstream_link() to
    pcie_report_downtraining()]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 97acba712e4e..a84d341504a5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5264,14 +5264,16 @@ u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 }
 
 /**
- * pcie_print_link_status - Report the PCI device's link speed and width
+ * __pcie_print_link_status - Report the PCI device's link speed and width
  * @dev: PCI device to query
+ * @verbose: Print info even when enough bandwidth is available
  *
- * Report the available bandwidth at the device.  If this is less than the
- * device is capable of, report the device's maximum possible bandwidth and
- * the upstream link that limits its performance to less than that.
+ * If the available bandwidth at the device is less than the device is
+ * capable of, report the device's maximum possible bandwidth and the
+ * upstream link that limits its performance.  If @verbose, always print
+ * the available bandwidth, even if the device isn't constrained.
  */
-void pcie_print_link_status(struct pci_dev *dev)
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
 {
 	enum pcie_link_width width, width_cap;
 	enum pci_bus_speed speed, speed_cap;
@@ -5281,11 +5283,11 @@ void pcie_print_link_status(struct pci_dev *dev)
 	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
 	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
 
-	if (bw_avail >= bw_cap)
+	if (bw_avail >= bw_cap && verbose)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
-	else
+	else if (bw_avail < bw_cap)
 		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
 			 bw_avail / 1000, bw_avail % 1000,
 			 PCIE_SPEED2STR(speed), width,
@@ -5293,6 +5295,17 @@ void pcie_print_link_status(struct pci_dev *dev)
 			 bw_cap / 1000, bw_cap % 1000,
 			 PCIE_SPEED2STR(speed_cap), width_cap);
 }
+
+/**
+ * pcie_print_link_status - Report the PCI device's link speed and width
+ * @dev: PCI device to query
+ *
+ * Report the available bandwidth at the device.
+ */
+void pcie_print_link_status(struct pci_dev *dev)
+{
+	__pcie_print_link_status(dev, true);
+}
 EXPORT_SYMBOL(pcie_print_link_status);
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 70808c168fb9..ce880dab5bc8 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -263,6 +263,7 @@ enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
 enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
 u32 pcie_bandwidth_capable(struct pci_dev *dev, enum pci_bus_speed *speed,
 			   enum pcie_link_width *width);
+void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
 
 /* Single Root I/O Virtualization */
 struct pci_sriov {
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bc147c586643..387fc8ac54ec 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2231,6 +2231,25 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn)
 	return dev;
 }
 
+static void pcie_report_downtraining(struct pci_dev *dev)
+{
+	if (!pci_is_pcie(dev))
+		return;
+
+	/* Look from the device up to avoid downstream ports with no devices */
+	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
+		return;
+
+	/* Multi-function PCIe devices share the same link/status */
+	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
+		return;
+
+	/* Print link status only if the device is constrained by the fabric */
+	__pcie_print_link_status(dev, false);
+}
+
 static void pci_init_capabilities(struct pci_dev *dev)
 {
 	/* Enhanced Allocation */
@@ -2266,6 +2285,8 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	/* Advanced Error Reporting */
 	pci_aer_init(dev);
 
+	pcie_report_downtraining(dev);
+
 	if (pci_probe_reset_function(dev) == 0)
 		dev->reset_fn = 1;
 }

end of thread, other threads:[~2018-08-09 14:02 UTC | newest]

Thread overview: 73+ messages
2018-06-04 15:55 [PATCH v3] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
2018-06-05 12:27 ` Andy Shevchenko
2018-06-05 13:04   ` Andy Shevchenko
2018-07-16 21:17 ` Bjorn Helgaas
2018-07-16 22:28   ` Alex_Gagniuc
2018-07-16 22:28     ` Alex_Gagniuc
2018-07-18 21:53     ` Bjorn Helgaas
2018-07-19 15:46       ` Alex G.
2018-07-19 17:07         ` Deucher, Alexander
2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
2018-07-25  1:24         ` kbuild test robot
2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
2018-07-23 21:01         ` Jakub Kicinski
2018-07-23 21:52           ` Tal Gilboa
2018-07-23 22:14             ` Jakub Kicinski
2018-07-23 23:59               ` Alex G.
2018-07-24 13:39                 ` Tal Gilboa
2018-07-30 23:26                   ` Alex_Gagniuc
2018-07-30 23:26                     ` Alex_Gagniuc
2018-07-31  6:40             ` Tal Gilboa
2018-07-31 15:10               ` Alex G.
2018-08-05  7:05                 ` Tal Gilboa
2018-08-06 18:39                   ` Alex_Gagniuc
2018-08-06 18:39                     ` Alex_Gagniuc
2018-08-06 19:46                     ` Bjorn Helgaas
2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status() Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 3/9] bnxt_en: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 4/9] cxgb4: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 5/9] fm10k: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-07 17:52                           ` Jeff Kirsher
2018-08-07 17:52                             ` [Intel-wired-lan] " Jeff Kirsher
2018-08-06 23:25                         ` [PATCH v6 6/9] ixgbe: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-07 17:51                           ` Jeff Kirsher
2018-08-07 17:51                             ` [Intel-wired-lan] " Jeff Kirsher
2018-08-06 23:25                         ` [PATCH v6 7/9] net/mlx4: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-08  6:10                           ` Leon Romanovsky
2018-08-08  6:10                             ` [Intel-wired-lan] " Leon Romanovsky
2018-08-06 23:25                         ` [PATCH v6 8/9] net/mlx5: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-08  6:08                           ` Leon Romanovsky
2018-08-08  6:08                             ` [Intel-wired-lan] " Leon Romanovsky
2018-08-08 14:23                             ` Tal Gilboa
2018-08-08 14:23                               ` [Intel-wired-lan] " Tal Gilboa
2018-08-08 14:23                               ` Tal Gilboa
2018-08-08 15:41                               ` Leon Romanovsky
2018-08-08 15:41                                 ` [Intel-wired-lan] " Leon Romanovsky
2018-08-08 15:56                                 ` Tal Gilboa
2018-08-08 15:56                                   ` [Intel-wired-lan] " Tal Gilboa
2018-08-08 16:33                                   ` Alex G.
2018-08-08 16:33                                     ` [Intel-wired-lan] " Alex G.
2018-08-08 17:27                                     ` Leon Romanovsky
2018-08-08 17:27                                       ` [Intel-wired-lan] " Leon Romanovsky
2018-08-09 14:02                                       ` Bjorn Helgaas
2018-08-09 14:02                                         ` [Intel-wired-lan] " Bjorn Helgaas
2018-08-06 23:25                         ` [PATCH v6 9/9] nfp: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-07 19:44                         ` [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions David Miller
2018-08-07 19:44                           ` [Intel-wired-lan] " David Miller
2018-08-07 21:41                         ` Bjorn Helgaas
2018-08-07 21:41                           ` [Intel-wired-lan] " Bjorn Helgaas
2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
2018-07-19 15:49     ` Alex G.
2018-07-23  5:21       ` Tal Gilboa
2018-07-23 17:01         ` Alex G.
2018-07-23 21:35           ` Tal Gilboa
