* [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex Williamson @ 2019-04-22 22:43 UTC
  To: bhelgaas, helgaas, mr.nuke.me, linux-pci
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	okaya, torvalds, linux-kernel

On systems that don't support any PCIe services other than bandwidth
notification, pcie_message_numbers() can return zero vectors, causing
the vector reallocation in pcie_port_enable_irq_vec() to retry with
zero, which fails, resulting in fallback to INTx (which might be
broken) for the bandwidth notification service.  This can resolve
spurious interrupt faults due to this service on some systems.

Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

However, the system is still susceptible to random spew in dmesg
depending on how the root port handles downstream device managed link
speed changes.  For example, GPUs like to scale their link speed for
power management when idle.  A GPU assigned to a VM through vfio-pci
can generate link bandwidth notification every time the link is
scaled down, ex:

[  329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[  708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[  718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)
[ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
GT/s x16 link)

What is the value of this nagging?

 drivers/pci/pcie/portdrv_core.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 7d04f9d087a6..1b330129089f 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 	 * 7.8.2, 7.10.10, 7.31.2.
 	 */
 
-	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
+	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
+		    PCIE_PORT_SERVICE_BWNOTIF)) {
 		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
 		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
 		nvec = *pme + 1;
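
For context, a minimal sketch of the field this hunk keys off: per the
PCIe spec, the PME, hotplug, and link bandwidth notification interrupts
all report their MSI/MSI-X vector in the Interrupt Message Number field
(bits 13:9) of the PCIe Capabilities register. The helper below is
illustrative, not the kernel function itself:

  /* Interrupt Message Number is bits 13:9 of PCI_EXP_FLAGS. */
  static unsigned int example_irq_message_number(u16 pcie_caps)
  {
          return (pcie_caps & PCI_EXP_FLAGS_IRQ) >> 9;
  }

pcie_message_numbers() returns one more than the highest message number
in use, so before this patch a port exposing only the bandwidth
notification service never took the branch above and the function could
return zero vectors.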


* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex G @ 2019-04-23  0:05 UTC
  To: Alex Williamson, bhelgaas, helgaas, linux-pci
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	okaya, torvalds, linux-kernel

On 4/22/19 5:43 PM, Alex Williamson wrote:
> [  329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> GT/s x16 link)
> [  708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> GT/s x16 link)
> [  718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> GT/s x16 link)
> [ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> GT/s x16 link)
> 
> What is the value of this nagging?

Good! The bandwidth notification service is working as intended. If this 
bothers you, you can unbind the device from the bandwidth notification 
driver:

echo 0000:07:00.0:pcie010 |
sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind



> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7d04f9d087a6..1b330129089f 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
>   	 * 7.8.2, 7.10.10, 7.31.2.
>   	 */
>   
> -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> +		    PCIE_PORT_SERVICE_BWNOTIF)) {
>   		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
>   		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
>   		nvec = *pme + 1;

Good catch!

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex Williamson @ 2019-04-23  0:33 UTC
  To: Alex G
  Cc: bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On Mon, 22 Apr 2019 19:05:57 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/22/19 5:43 PM, Alex Williamson wrote:
> > [  329.725607] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > [  708.151488] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > [  718.262959] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > [ 1138.124932] vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth,
> > limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5
> > GT/s x16 link)
> > 
> > What is the value of this nagging?  
> 
> Good! The bandwidth notification service is working as intended. If this 
> bothers you, you can unbind the device from the bandwidth notification 
> driver:
> 
> echo 0000:07:00.0:pcie010 |
> sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind

That's a bad solution for users, this is meaningless tracking of a
device whose driver is actively managing the link bandwidth for power
purposes.  There is nothing wrong happening here that needs to fill
logs.  I thought maybe if I enabled notification of autonomous
bandwidth changes that it might categorize these as something we could
ignore, but it doesn't.  How can we identify only cases where this is
an erroneous/noteworthy situation?  Thanks,

Alex
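
For reference, the two notification enables mentioned above live in the
Link Control register (constants as in include/uapi/linux/pci_regs.h);
a minimal sketch of turning both on for a port:

  /* Enable both bandwidth-change interrupt sources on a port. */
  pcie_capability_set_word(port, PCI_EXP_LNKCTL,
                           PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);

Even with both enabled, the interrupt only says that the link bandwidth
changed, not why, which is the categorization problem described above.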

> > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> > index 7d04f9d087a6..1b330129089f 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
> >   	 * 7.8.2, 7.10.10, 7.31.2.
> >   	 */
> >   
> > -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> > +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> > +		    PCIE_PORT_SERVICE_BWNOTIF)) {
> >   		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
> >   		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
> >   		nvec = *pme + 1;  
> 
> Good catch!


* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex G @ 2019-04-23 14:33 UTC
  To: Alex Williamson
  Cc: bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On 4/22/19 7:33 PM, Alex Williamson wrote:
> On Mon, 22 Apr 2019 19:05:57 -0500
> Alex G <mr.nuke.me@gmail.com> wrote:
>> echo 0000:07:00.0:pcie010 |
>> sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
> 
> That's a bad solution for users, this is meaningless tracking of a
> device whose driver is actively managing the link bandwidth for power
> purposes. 

0.5W savings on a 100+W GPU? I agree it's meaningless.

> There is nothing wrong happening here that needs to fill
> logs.  I thought maybe if I enabled notification of autonomous
> bandwidth changes that it might categorize these as something we could
> ignore, but it doesn't.
> How can we identify only cases where this is
> an erroneous/noteworthy situation?  Thanks,

You don't. Ethernet doesn't. USB doesn't. This logging behavior is 
consistent with every other subsystem that deals with multi-speed links. 
I realize some people are very resistant to change (and use very ancient 
kernels). I do not, however, agree that this is a sufficient argument to 
dis-unify behavior.

Alex

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex Williamson @ 2019-04-23 15:34 UTC
  To: Alex G
  Cc: bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On Tue, 23 Apr 2019 09:33:53 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/22/19 7:33 PM, Alex Williamson wrote:
> > On Mon, 22 Apr 2019 19:05:57 -0500
> > Alex G <mr.nuke.me@gmail.com> wrote:  
> >> echo 0000:07:00.0:pcie010 |
> >> sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind  
> > 
> > That's a bad solution for users, this is meaningless tracking of a
> > device whose driver is actively managing the link bandwidth for power
> > purposes.   
> 
> 0.5W savings on a 100+W GPU? I agree it's meaningless.

Evidence?  Regardless, I don't have control of the driver that's making
these changes, but the claim seems unfounded and irrelevant.
 
> > There is nothing wrong happening here that needs to fill
> > logs.  I thought maybe if I enabled notification of autonomous
> > bandwidth changes that it might categorize these as something we could
> > ignore, but it doesn't.
> > How can we identify only cases where this is
> > an erroneous/noteworthy situation?  Thanks,  
> 
> You don't. Ethernet doesn't. USB doesn't. This logging behavior is 
> consistent with every other subsystem that deals with multi-speed links. 
> I realize some people are very resistant to change (and use very ancient 
> kernels). I do not, however, agree that this is a sufficient argument to 
> dis-unify behavior.

Sorry, I don't see how any of this is relevant either.  Clearly I'm
using a recent kernel or I wouldn't be seeing this new bandwidth
notification driver.  I'm assigning a device to a VM whose driver is
power managing the device via link speed changes.  The result is that
we now see irrelevant spam in the host dmesg for every inconsequential
link downgrade directed by the device.  I can see why we might want to
be notified of degraded links due to signal issues, but what I'm
reporting is that there are also entirely normal and benign reasons
that a link might be reduced, we can't seem to tell the difference
between a fault and this normal dynamic scaling, and the assumption of
a fault is spamming dmesg.  So, I don't think what we have here is well
cooked.  Do drivers have a mechanism to opt-out of this error
reporting?  Can drivers register an anticipated link change to avoid
the spam?  What instructions can we *reasonably* give to users as to
when these messages mean something, when they don't, and how they can
be turned off?  Thanks,

Alex

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Lukas Wunner @ 2019-04-23 15:49 UTC
  To: Alex Williamson
  Cc: Alex G, bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, okaya, torvalds, linux-kernel

On Tue, Apr 23, 2019 at 09:34:08AM -0600, Alex Williamson wrote:
> On Tue, 23 Apr 2019 09:33:53 -0500 Alex G <mr.nuke.me@gmail.com> wrote:
> > 0.5W savings on a 100+W GPU? I agree it's meaningless.
> 
> Evidence?  Regardless, I don't have control of the driver that's making
> these changes, but the claim seems unfounded and irrelevant.

On laptops, 0.5 W can result in noticeably longer battery life.

> I can see why we might want to
> be notified of degraded links due to signal issues, but what I'm
> reporting is that there are also entirely normal and benign reasons
> that a link might be reduced, we can't seem to tell the difference
> between a fault and this normal dynamic scaling, and the assumption of
> a fault is spamming dmesg.  So, I don't think what we have here is well
> cooked.  Do drivers have a mechanism to opt-out of this error
> reporting?

Is dmesg spammed even if no driver is bound to a GPU?  If so, that would
suggest a solution that's not dependent on drivers.  E.g., the
bw_notification port service could avoid reports for devices matching
PCI_BASE_CLASS_DISPLAY.  (It could also avoid binding to ports whose
children include such a device, but the child may be hot-pluggable
and thus appear only after the port is bound.)  Then we'd still get
a notification on boot about degraded link speed, but not continuous
messages.
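
As a sketch, the class check that suggestion implies (illustrative
only; the helper name is invented):

  /* Skip bandwidth reports when a child device is display-class. */
  static bool example_child_is_gpu(struct pci_dev *child)
  {
          return (child->class >> 16) == PCI_BASE_CLASS_DISPLAY;
  }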

Thanks,

Lukas

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex G @ 2019-04-23 16:03 UTC
  To: Alex Williamson
  Cc: bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel



On 4/23/19 10:34 AM, Alex Williamson wrote:
> On Tue, 23 Apr 2019 09:33:53 -0500
> Alex G <mr.nuke.me@gmail.com> wrote:
> 
>> On 4/22/19 7:33 PM, Alex Williamson wrote:
>>> On Mon, 22 Apr 2019 19:05:57 -0500
>>> Alex G <mr.nuke.me@gmail.com> wrote:
>>>> echo 0000:07:00.0:pcie010 |
>>>> sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind
>>>
>>> That's a bad solution for users, this is meaningless tracking of a
>>> device whose driver is actively managing the link bandwidth for power
>>> purposes.
>>
>> 0.5W savings on a 100+W GPU? I agree it's meaningless.
> 
> Evidence?  Regardless, I don't have control of the driver that's making
> these changes, but the claim seems unfounded and irrelevant.

The figure of 5 mW/Gb/lane doesn't ring a bell? [1] [2]. Your GPU
supports 5 Gb/s, so it's likely using an older, more power-hungry process. I
suspect it's still within the same order of magnitude.


> I'm assigning a device to a VM [snip]
> I can see why we might want to be notified of degraded links due to signal issues,
> but what I'm reporting is that there are also entirely normal reasons
> [snip] we can't seem to tell the difference

Unfortunately, there is no way in PCI-Express to distinguish between an 
expected link bandwidth change and one due to error.
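
The status the hardware does provide only records that a change
happened and which mechanism triggered it (constants as in
include/uapi/linux/pci_regs.h):

  /* Link Status register: bandwidth-change status bits */
  #define PCI_EXP_LNKSTA_LBMS  0x4000  /* Link Bandwidth Management Status */
  #define PCI_EXP_LNKSTA_LABS  0x8000  /* Link Autonomous Bandwidth Status */

Neither bit says whether the new speed is expected power management or
a symptom of a link-integrity problem.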

If you're using virt-manager to configure the VM, then virt-manager 
could have a checkbox to disable link bandwidth management messages. I'd 
rather we avoid kernel-side heuristics (like Lukas suggested). If you're 
confident that your link will operate as intended, and don't want 
messages about it, that's your call as a user -- we shouldn't decide 
this in the kernel.

Alex

[1] https://www.synopsys.com/designware-ip/technical-bulletin/reduce-power-consumption.html

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex Williamson @ 2019-04-23 16:22 UTC
  To: Alex G
  Cc: bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On Tue, 23 Apr 2019 11:03:04 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 10:34 AM, Alex Williamson wrote:
> > On Tue, 23 Apr 2019 09:33:53 -0500
> > Alex G <mr.nuke.me@gmail.com> wrote:
> >   
> >> On 4/22/19 7:33 PM, Alex Williamson wrote:  
> >>> On Mon, 22 Apr 2019 19:05:57 -0500
> >>> Alex G <mr.nuke.me@gmail.com> wrote:  
> >>>> echo 0000:07:00.0:pcie010 |
> >>>> sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind  
> >>>
> >>> That's a bad solution for users, this is meaningless tracking of a
> >>> device whose driver is actively managing the link bandwidth for power
> >>> purposes.  
> >>
> >> 0.5W savings on a 100+W GPU? I agree it's meaningless.  
> > 
> > Evidence?  Regardless, I don't have control of the driver that's making
> > these changes, but the claim seems unfounded and irrelevant.  
> 
> The figure of 5 mW/Gb/lane doesn't ring a bell? [1] [2]. Your GPU
> supports 5 Gb/s, so it's likely using an older, more power-hungry process. I
> suspect it's still within the same order of magnitude.

This doesn't necessarily imply the overall power savings to the
endpoint as a whole though, and it's still irrelevant to the discussion
here.  The driver is doing something reasonable that's generating host
dmesg spam.

> > I'm assigning a device to a VM [snip]
> > I can see why we might want to be notified of degraded links due to signal issues,
> > but what I'm reporting is that there are also entirely normal reasons
> > [snip] we can't seem to tell the difference  
> 
> Unfortunately, there is no way in PCI-Express to distinguish between an 
> expected link bandwidth change and one due to error.

Then assuming every link speed change is an error seems like the wrong
approach.  Should we instead have a callback that drivers can
optionally register to receive link change notifications?  If a driver
doesn't register such a callback then a generic message can be posted,
but if they do, the driver can decide whether this is an error.
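
One possible shape for such an opt-in hook, purely as a sketch (no such
callback exists in struct pci_driver today; all names are invented for
illustration):

  /* Hypothetical: let a bound driver classify bandwidth changes itself. */
  static void example_report_bw_change(struct pci_dev *dev, u16 lnksta,
                                       void (*hook)(struct pci_dev *, u16))
  {
          if (hook)
                  hook(dev, lnksta);      /* driver decides severity */
          else
                  pci_info(dev, "link bandwidth changed\n");
  }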
 
> If you're using virt-manager to configure the VM, then virt-manager 
> could have a checkbox to disable link bandwidth management messages. I'd 

What makes us think that this is the only case where such link speed
changes will occur?  Hand waving that a userspace management utility
should go unbind drivers that over-zealously report errors is a poor
solution.

> rather we avoid kernel-side heuristics (like Lukas suggested). If you're 
> confident that your link will operate as intended, and don't want 
> messages about it, that's your call as a user -- we shouldn't decide 
> this in the kernel.

Nor should pci-core decide what link speed changes are intended or
errors.  Minimally we should be enabling drivers to receive this
feedback.  Thanks,

Alex

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex G @ 2019-04-23 16:27 UTC
  To: Alex Williamson
  Cc: bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On 4/23/19 11:22 AM, Alex Williamson wrote:
> Nor should pci-core decide what link speed changes are intended or
> errors.  Minimally we should be enabling drivers to receive this
> feedback.  Thanks,

Not errors. pci core reports that a link speed change event has occurred.
Period.

Alex

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex Williamson @ 2019-04-23 16:37 UTC
  To: Alex G
  Cc: bhelgaas, helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On Tue, 23 Apr 2019 11:27:39 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 11:22 AM, Alex Williamson wrote:
> > Nor should pci-core decide what link speed changes are intended or
> > errors.  Minimally we should be enabling drivers to receive this
> > feedback.  Thanks,  
> 
> Not errors. pci core reports that a link speed change event has occurred.
> Period.

And it shows up in dmesg, and what do users (and developers) think when
things are repeatedly reported in dmesg?  Whether this is "information"
or "error", it's spamming dmesg, irrelevant, and confusing.  Thanks,

Alex

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Bjorn Helgaas @ 2019-04-23 17:10 UTC
  To: Alex G
  Cc: Alex Williamson, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
> On 4/22/19 7:33 PM, Alex Williamson wrote:
> > There is nothing wrong happening here that needs to fill logs.  I
> > thought maybe if I enabled notification of autonomous bandwidth
> > changes that it might categorize these as something we could
> > ignore, but it doesn't.  How can we identify only cases where this
> > is an erroneous/noteworthy situation?  Thanks,
> 
> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> consistent with every other subsystem that deals with multi-speed links.

Can you point me to the logging in these other subsystems so I can
learn more about how they deal with this?

I agree that emitting log messages for normal and expected events will
lead to user confusion and we need to do something.

e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
notification") was merged in v5.1-rc1, so we still have (a little)
time to figure this out before v5.1.

Bjorn

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex G @ 2019-04-23 17:53 UTC
  To: Bjorn Helgaas
  Cc: Alex Williamson, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
> On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:
>> On 4/22/19 7:33 PM, Alex Williamson wrote:
>>> There is nothing wrong happening here that needs to fill logs.  I
>>> thought maybe if I enabled notification of autonomous bandwidth
>>> changes that it might categorize these as something we could
>>> ignore, but it doesn't.  How can we identify only cases where this
>>> is an erroneous/noteworthy situation?  Thanks,
>>
>> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
>> consistent with every other subsystem that deals with multi-speed links.
> 
> Can you point me to the logging in these other subsystems so I can
> learn more about how they deal with this?

I don't have any in-depth articles about the logging in these systems, 
but I can extract some logs from my machines.

Ethernet:

[Sun Apr 21 11:14:06 2019] e1000e: eno1 NIC Link is Down
[Sun Apr 21 11:14:17 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: Rx/Tx
[Sun Apr 21 11:14:23 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: Rx/Tx
[Sun Apr 21 23:33:31 2019] e1000e: eno1 NIC Link is Down
[Sun Apr 21 23:33:43 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: Rx/Tx
[Sun Apr 21 23:33:48 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: Rx/Tx

I used to have one of these "green" ethernet switches that went down to 
100 Mbps automatically. You can imagine how "clogged" the logs were with 
link up messages. Thank goodness that switch was killed in a thunderstorm.

USB will log every device insertion and removal, very verbosely (see 
appendix A).


> I agree that emitting log messages for normal and expected events will
> lead to user confusion and we need to do something.
> 
> e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> notification") was merged in v5.1-rc1, so we still have (a little)
> time to figure this out before v5.1.

I always viewed the system log as a system log, instead of a database of 
system errors. I may have extremist views, but going back to Alex's 
example, I prefer to see that the power saving mechanism is doing 
something to save power on my laptop (I'll just ignore it on a desktop).

If you think it's worth increasing code complexity because people don't
want things logged into the system log, then I'm certain we can work out
some sane solution. It's the same problem we see with GCC, where people
want warning messages here, but don't want the same messages there.

Alex


P.S. The pedant in me points out that one of the examples I gave is a 
terrible example. ASPM "allows hardware-autonomous, dynamic Link power 
reduction beyond what is achievable by software-only control" [1].

[1] PCI-Express 3.0 -- 5.4.1. Active State Power Management (ASPM)


Appendix A:

[1618067.987084] usb 1-3.5: new high-speed USB device number 79 using 
xhci_hcd
[1618068.179914] usb 1-3.5: New USB device found, idVendor=0bda, 
idProduct=4014, bcdDevice= 0.05
[1618068.179924] usb 1-3.5: New USB device strings: Mfr=3, Product=1, 
SerialNumber=2
[1618068.179930] usb 1-3.5: Product: USB Audio
[1618068.179936] usb 1-3.5: Manufacturer: Generic
[1618068.179941] usb 1-3.5: SerialNumber: 200901010001
[1618068.280100] usb 1-3.6: new low-speed USB device number 80 using 
xhci_hcd
[1618068.342541] Bluetooth: hci0: Waiting for firmware download to complete
[1618068.342795] Bluetooth: hci0: Firmware loaded in 1509081 usecs
[1618068.342887] Bluetooth: hci0: Waiting for device to boot
[1618068.354919] Bluetooth: hci0: Device booted in 11797 usecs
[1618068.356006] Bluetooth: hci0: Found Intel DDC parameters: 
intel/ibt-12-16.ddc
[1618068.358958] Bluetooth: hci0: Applying Intel DDC parameters completed
[1618068.378624] usb 1-3.6: New USB device found, idVendor=04d9, 
idProduct=1400, bcdDevice= 1.43
[1618068.378626] usb 1-3.6: New USB device strings: Mfr=0, Product=0, 
SerialNumber=0
[1618068.390686] input: HID 04d9:1400 as 
/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.0/0003:04D9:1400.0139/input/input921
[1618068.444282] hid-generic 0003:04D9:1400.0139: input,hidraw1: USB HID 
v1.10 Keyboard [HID 04d9:1400] on usb-0000:00:14.0-3.6/input0
[1618068.456373] input: HID 04d9:1400 Mouse as 
/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input922
[1618068.457929] input: HID 04d9:1400 Consumer Control as 
/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input923
[1618068.509294] input: HID 04d9:1400 System Control as 
/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.6/1-3.6:1.1/0003:04D9:1400.013A/input/input924
[1618068.509518] hid-generic 0003:04D9:1400.013A: input,hidraw2: USB HID 
v1.10 Mouse [HID 04d9:1400] on usb-0000:00:14.0-3.6/input1
[1618068.588078] usb 1-3.7: new full-speed USB device number 81 using 
xhci_hcd
[1618068.679132] usb 1-3.7: New USB device found, idVendor=046d, 
idProduct=c52b, bcdDevice=12.03
[1618068.679137] usb 1-3.7: New USB device strings: Mfr=1, Product=2, 
SerialNumber=0
[1618068.679139] usb 1-3.7: Product: USB Receiver
[1618068.679142] usb 1-3.7: Manufacturer: Logitech
[1618068.692430] logitech-djreceiver 0003:046D:C52B.013D: 
hiddev96,hidraw3: USB HID v1.11 Device [Logitech USB Receiver] on 
usb-0000:00:14.0-3.7/input2
[1618068.817334] input: Logitech Performance MX as 
/devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3.7/1-3.7:1.2/0003:046D:C52B.013D/0003:046D:101A.013E/input/input925
[1618068.820357] logitech-hidpp-device 0003:046D:101A.013E: 
input,hidraw4: USB HID v1.11 Mouse [Logitech Performance MX] on 
usb-0000:00:14.0-3.7:1



* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex G @ 2019-04-23 17:59 UTC
  To: Alex Williamson, bhelgaas, helgaas, linux-pci
  Cc: austin_bolen, alex_gagniuc, keith.busch, Shyam_Iyer, lukas,
	okaya, torvalds, linux-kernel



On 4/22/19 5:43 PM, Alex Williamson wrote:
> On systems that don't support any PCIe services other than bandwidth
> notification, pcie_message_numbers() can return zero vectors, causing
> the vector reallocation in pcie_port_enable_irq_vec() to retry with
> zero, which fails, resulting in fallback to INTx (which might be
> broken) for the bandwidth notification service.  This can resolve
> spurious interrupt faults due to this service on some systems.
> 
> Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---

+1
Tested on some Dell servers. Everything works as expected. I don't have 
a system with a device that only supports bandwidth notification.

Alex

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Alex Williamson @ 2019-04-23 18:38 UTC
  To: Alex G
  Cc: Bjorn Helgaas, linux-pci, austin_bolen, alex_gagniuc,
	keith.busch, Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On Tue, 23 Apr 2019 12:53:07 -0500
Alex G <mr.nuke.me@gmail.com> wrote:

> On 4/23/19 12:10 PM, Bjorn Helgaas wrote:
> > On Tue, Apr 23, 2019 at 09:33:53AM -0500, Alex G wrote:  
> >> On 4/22/19 7:33 PM, Alex Williamson wrote:  
> >>> There is nothing wrong happening here that needs to fill logs.  I
> >>> thought maybe if I enabled notification of autonomous bandwidth
> >>> changes that it might categorize these as something we could
> >>> ignore, but it doesn't.  How can we identify only cases where this
> >>> is an erroneous/noteworthy situation?  Thanks,  
> >>
> >> You don't. Ethernet doesn't. USB doesn't. This logging behavior is
> >> consistent with every other subsystem that deals with multi-speed links.  
> > 
> > Can you point me to the logging in these other subsystems so I can
> > learn more about how they deal with this?  
> 
> I don't have any in-depth articles about the logging in these systems, 
> but I can extract some logs from my machines.
> 
> Ethernet:
> 
> [Sun Apr 21 11:14:06 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 11:14:17 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
> Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 11:14:23 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
> Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:31 2019] e1000e: eno1 NIC Link is Down
> [Sun Apr 21 23:33:43 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
> Duplex, Flow Control: Rx/Tx
> [Sun Apr 21 23:33:48 2019] e1000e: eno1 NIC Link is Up 1000 Mbps Full 
> Duplex, Flow Control: Rx/Tx
> 
> I used to have one of these "green" ethernet switches that went down to 
> 100 Mbps automatically. You can imagine how "clogged" the logs were with 
> link up messages. Thank goodness that switch was killed in a thunderstorm.
> 
> USB will log every device insertion and removal, very verbosely (see 
> appendix A).

I have a hard time putting USB insertion and removal into the same
class, the equivalent is PCI hotplug which is logged separately.  Do
we ever log beyond USB device discovery if a device is running at a
lower speed than is possible?  The most directly related is the green
ethernet switch, which you admit was a nuisance due to exactly this
sort of logging.  It was probably confusing to see this logging; perhaps
you wondered whether the cable was bad or the switch was defective.

> > I agree that emitting log messages for normal and expected events will
> > lead to user confusion and we need to do something.
> > 
> > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > notification") was merged in v5.1-rc1, so we still have (a little)
> > time to figure this out before v5.1.  
> 
> I always viewed the system log as a system log, instead of a database of 
> system errors. I may have extremist views, but going back to Alex's 
> example, I prefer to see that the power saving mechanism is doing 
> something to save power on my laptop (I'll just ignore it on a desktop).

There's a disconnect from above, where similar behavior on ethernet
"clogged" the log files, but here we just want to ignore it.
Excessive logging can also be considered a denial of service vector
when the device generating that excessive logging is attached to a
userspace driver.

> If you think it's worth increasing code complexity because people don't
> want things logged into the system log, then I'm certain we can work out
> some sane solution. It's the same problem we see with GCC, where people
> want warning messages here, but don't want the same messages there.

v5.1 is approaching quickly, can we downgrade these to pci_dbg() while
we work on maybe some sort of driver participation in this logging?
Thanks,

Alex
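
In sketch form, the proposed stopgap (illustrative, not the actual call
site in the bandwidth notification service): move the report from
pci_info(), which always lands in dmesg, to pci_dbg(), which is silent
unless dynamic debug (or DEBUG) is enabled:

  pci_info(dev, "link bandwidth changed\n");  /* current: always logged */
  pci_dbg(dev,  "link bandwidth changed\n");  /* proposed: debug-only */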

* Re: [PATCH] PCI/LINK: Account for BW notification in vector calculation
From: Bjorn Helgaas @ 2019-05-01 20:30 UTC
  To: Alex Williamson
  Cc: mr.nuke.me, linux-pci, austin_bolen, alex_gagniuc, keith.busch,
	Shyam_Iyer, lukas, okaya, torvalds, linux-kernel

On Mon, Apr 22, 2019 at 04:43:30PM -0600, Alex Williamson wrote:
> On systems that don't support any PCIe services other than bandwidth
> notification, pcie_message_numbers() can return zero vectors, causing
> the vector reallocation in pcie_port_enable_irq_vec() to retry with
> zero, which fails, resulting in fallback to INTx (which might be
> broken) for the bandwidth notification service.  This can resolve
> spurious interrupt faults due to this service on some systems.
> 
> Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Applied for (hopefully) v5.1, thanks!

>  drivers/pci/pcie/portdrv_core.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 7d04f9d087a6..1b330129089f 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -55,7 +55,8 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
>  	 * 7.8.2, 7.10.10, 7.31.2.
>  	 */
>  
> -	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP)) {
> +	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> +		    PCIE_PORT_SERVICE_BWNOTIF)) {
>  		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
>  		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
>  		nvec = *pme + 1;
> 
