linux-kernel.vger.kernel.org archive mirror
* Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
@ 2020-01-15 22:10 Bjorn Helgaas
  2020-01-16  2:44 ` Alex G
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2020-01-15 22:10 UTC (permalink / raw)
  To: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel

I think we have a problem with link bandwidth change notifications
(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).

Here's a recent bug report where Jan reported "_tons_" of these
notifications on an nvme device:
https://bugzilla.kernel.org/show_bug.cgi?id=206197

There was similar discussion involving GPU drivers at
https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org

The current solution is the CONFIG_PCIE_BW config option, which
disables the messages completely.  That option defaults to "off" (no
messages), but even so, I think it's a little problematic.

Users are not really in a position to figure out whether it's safe to
enable.  All they can do is experiment and see whether it works with
their current mix of devices and drivers.

I don't think it's currently useful for distros because it's a
compile-time switch, and distros cannot predict what system configs
will be used, so I don't think they can enable it.

Does anybody have proposals for making it smarter about distinguishing
real problems from intentional power management, or maybe interfaces
drivers could use to tell us when we should ignore bandwidth changes?

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-15 22:10 Issues with "PCI/LINK: Report degraded links via link bandwidth notification" Bjorn Helgaas
@ 2020-01-16  2:44 ` Alex G
  2020-01-18  0:18   ` Bjorn Helgaas
  2020-01-20  2:33 ` Bjorn Helgaas
  2020-02-22 16:58 ` Bjorn Helgaas
  2 siblings, 1 reply; 24+ messages in thread
From: Alex G @ 2020-01-16  2:44 UTC (permalink / raw)
  To: Bjorn Helgaas, Alexandru Gagniuc, Keith Busch
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel

Hi Bjorn,

I'm no longer working on this, so my memory may not be up to speed. If 
the endpoint is causing the bandwidth change, then we should get an 
_autonomous_ link management interrupt instead. I don't think we report 
those, and that shouldn't spam the logs.

If it's not a (non-autonomous) link management interrupt, then something 
is causing the downstream port to do funny things. I don't think ASPM is 
supposed to be causing this.

Do we know what's causing these swings?

For now, I suggest a boot-time parameter to disable link speed reporting 
instead of a compile time option.

Alex

On 1/15/20 4:10 PM, Bjorn Helgaas wrote:
> I think we have a problem with link bandwidth change notifications
> (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> 
> Here's a recent bug report where Jan reported "_tons_" of these
> notifications on an nvme device:
> https://bugzilla.kernel.org/show_bug.cgi?id=206197
> 
> There was similar discussion involving GPU drivers at
> https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> 
> The current solution is the CONFIG_PCIE_BW config option, which
> disables the messages completely.  That option defaults to "off" (no
> messages), but even so, I think it's a little problematic.
> 
> Users are not really in a position to figure out whether it's safe to
> enable.  All they can do is experiment and see whether it works with
> their current mix of devices and drivers.
> 
> I don't think it's currently useful for distros because it's a
> compile-time switch, and distros cannot predict what system configs
> will be used, so I don't think they can enable it.
> 
> Does anybody have proposals for making it smarter about distinguishing
> real problems from intentional power management, or maybe interfaces
> drivers could use to tell us when we should ignore bandwidth changes?
> 
> Bjorn
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-16  2:44 ` Alex G
@ 2020-01-18  0:18   ` Bjorn Helgaas
  0 siblings, 0 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2020-01-18  0:18 UTC (permalink / raw)
  To: Alex G
  Cc: Alexandru Gagniuc, Keith Busch, Jan Vesely, Lukas Wunner,
	Alex Williamson, Austin Bolen, Shyam Iyer, Sinan Kaya, linux-pci,
	linux-kernel

On Wed, Jan 15, 2020 at 08:44:21PM -0600, Alex G wrote:
> Hi Bjorn,
> 
> I'm no longer working on this, so my memory may not be up to speed. If the
> endpoint is causing the bandwidth change, then we should get an _autonomous_
> link management interrupt instead. I don't think we report those, and that
> shouldn't spam the logs.
> 
> If it's not a (non-autonomous) link management interrupt, then something is
> causing the downstream port to do funny things. I don't think ASPM is
> supposed to be causing this.
> 
> Do we know what's causing these swings?
> 
> For now, I suggest a boot-time parameter to disable link speed reporting
> instead of a compile time option.

If we add a parameter, it would have to be that reporting is disabled
by default, and the parameter would enable it.  That way the default
will not spam the logs and cause more problem reports.

I don't have time to debug this and I don't like boot parameters in
general, so somebody else will have to step up to resolve this.

At the very least, we need a Kconfig update to warn about the
possibility, and we may need to consider reverting this if we don't
have a better solution.

> On 1/15/20 4:10 PM, Bjorn Helgaas wrote:
> > I think we have a problem with link bandwidth change notifications
> > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > 
> > Here's a recent bug report where Jan reported "_tons_" of these
> > notifications on an nvme device:
> > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> > 
> > There was similar discussion involving GPU drivers at
> > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > 
> > The current solution is the CONFIG_PCIE_BW config option, which
> > disables the messages completely.  That option defaults to "off" (no
> > messages), but even so, I think it's a little problematic.
> > 
> > Users are not really in a position to figure out whether it's safe to
> > enable.  All they can do is experiment and see whether it works with
> > their current mix of devices and drivers.
> > 
> > I don't think it's currently useful for distros because it's a
> > compile-time switch, and distros cannot predict what system configs
> > will be used, so I don't think they can enable it.
> > 
> > Does anybody have proposals for making it smarter about distinguishing
> > real problems from intentional power management, or maybe interfaces
> > drivers could use to tell us when we should ignore bandwidth changes?
> > 
> > Bjorn
> > 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-15 22:10 Issues with "PCI/LINK: Report degraded links via link bandwidth notification" Bjorn Helgaas
  2020-01-16  2:44 ` Alex G
@ 2020-01-20  2:33 ` Bjorn Helgaas
  2020-01-20 15:56   ` Alex Williamson
                     ` (2 more replies)
  2020-02-22 16:58 ` Bjorn Helgaas
  2 siblings, 3 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2020-01-20  2:33 UTC (permalink / raw)
  To: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel

[+cc NVMe, GPU driver folks]

On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> I think we have a problem with link bandwidth change notifications
> (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> 
> Here's a recent bug report where Jan reported "_tons_" of these
> notifications on an nvme device:
> https://bugzilla.kernel.org/show_bug.cgi?id=206197
> 
> There was similar discussion involving GPU drivers at
> https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> 
> The current solution is the CONFIG_PCIE_BW config option, which
> disables the messages completely.  That option defaults to "off" (no
> messages), but even so, I think it's a little problematic.
> 
> Users are not really in a position to figure out whether it's safe to
> enable.  All they can do is experiment and see whether it works with
> their current mix of devices and drivers.
> 
> I don't think it's currently useful for distros because it's a
> compile-time switch, and distros cannot predict what system configs
> will be used, so I don't think they can enable it.
> 
> Does anybody have proposals for making it smarter about distinguishing
> real problems from intentional power management, or maybe interfaces
> drivers could use to tell us when we should ignore bandwidth changes?

NVMe, GPU folks, do your drivers or devices change PCIe link
speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
the PCI core interprets changes like that as problems that need to be
reported.

If drivers do change link speed/width, can you point me to where
that's done?  Would it be feasible to add some sort of PCI core
interface so the driver could say "ignore" or "pay attention to"
subsequent link changes?

Or maybe there would even be a way to move the link change itself into
the PCI core, so the core would be aware of what's going on?

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-20  2:33 ` Bjorn Helgaas
@ 2020-01-20 15:56   ` Alex Williamson
  2020-01-20 16:01   ` Alex G.
  2020-01-30 16:26   ` Christoph Hellwig
  2 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2020-01-20 15:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Lukas Wunner, Austin Bolen, Shyam Iyer, Sinan Kaya,
	linux-pci, linux-kernel

On Sun, 19 Jan 2020 20:33:26 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc NVMe, GPU driver folks]
> 
> On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > I think we have a problem with link bandwidth change notifications
> > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > 
> > Here's a recent bug report where Jan reported "_tons_" of these
> > notifications on an nvme device:
> > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> > 
> > There was similar discussion involving GPU drivers at
> > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > 
> > The current solution is the CONFIG_PCIE_BW config option, which
> > disables the messages completely.  That option defaults to "off" (no
> > messages), but even so, I think it's a little problematic.
> > 
> > Users are not really in a position to figure out whether it's safe to
> > enable.  All they can do is experiment and see whether it works with
> > their current mix of devices and drivers.
> > 
> > I don't think it's currently useful for distros because it's a
> > compile-time switch, and distros cannot predict what system configs
> > will be used, so I don't think they can enable it.
> > 
> > Does anybody have proposals for making it smarter about distinguishing
> > real problems from intentional power management, or maybe interfaces
> > drivers could use to tell us when we should ignore bandwidth changes?  
> 
> NVMe, GPU folks, do your drivers or devices change PCIe link
> speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> the PCI core interprets changes like that as problems that need to be
> reported.
> 
> If drivers do change link speed/width, can you point me to where
> that's done?  Would it be feasible to add some sort of PCI core
> interface so the driver could say "ignore" or "pay attention to"
> subsequent link changes?
> 
> Or maybe there would even be a way to move the link change itself into
> the PCI core, so the core would be aware of what's going on?

One case where we previously saw sporadic link change messages was
vfio-pci owned devices.  If the transitions are based on config space
manipulation then I can trap those accesses and wrap them in a PCI core
API, but I suspect that's not the exclusive (or potentially even
primary) mechanism for initiating link changes.  So I think we'd
probably need a mechanism for a driver to opt out of link notification
for their devices (presumably the fn0 device per link would opt out the
entire link?).  Thanks,

Alex


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-20  2:33 ` Bjorn Helgaas
  2020-01-20 15:56   ` Alex Williamson
@ 2020-01-20 16:01   ` Alex G.
  2020-01-21 11:10     ` Lucas Stach
  2020-01-30 16:26   ` Christoph Hellwig
  2 siblings, 1 reply; 24+ messages in thread
From: Alex G. @ 2020-01-20 16:01 UTC (permalink / raw)
  To: Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel



On 1/19/20 8:33 PM, Bjorn Helgaas wrote:
> [+cc NVMe, GPU driver folks]
> 
> On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
>> I think we have a problem with link bandwidth change notifications
>> (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
>>
>> Here's a recent bug report where Jan reported "_tons_" of these
>> notifications on an nvme device:
>> https://bugzilla.kernel.org/show_bug.cgi?id=206197
>>
>> There was similar discussion involving GPU drivers at
>> https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
>>
>> The current solution is the CONFIG_PCIE_BW config option, which
>> disables the messages completely.  That option defaults to "off" (no
>> messages), but even so, I think it's a little problematic.
>>
>> Users are not really in a position to figure out whether it's safe to
>> enable.  All they can do is experiment and see whether it works with
>> their current mix of devices and drivers.
>>
>> I don't think it's currently useful for distros because it's a
>> compile-time switch, and distros cannot predict what system configs
>> will be used, so I don't think they can enable it.
>>
>> Does anybody have proposals for making it smarter about distinguishing
>> real problems from intentional power management, or maybe interfaces
>> drivers could use to tell us when we should ignore bandwidth changes?
> 
> NVMe, GPU folks, do your drivers or devices change PCIe link
> speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> the PCI core interprets changes like that as problems that need to be
> reported.
> 
> If drivers do change link speed/width, can you point me to where
> that's done?  Would it be feasible to add some sort of PCI core
> interface so the driver could say "ignore" or "pay attention to"
> subsequent link changes?
> 
> Or maybe there would even be a way to move the link change itself into
> the PCI core, so the core would be aware of what's going on?

Funny thing is, I was going to suggest an in-kernel API for this.
   * Driver requests lower link speed 'X'
   * Link management interrupt fires
   * If link speed is at or above 'X' then do not report it.
I think an "ignore" flag would defeat the purpose of having link 
bandwidth reporting in the first place. If some drivers set it, and 
others don't, then it would be inconsistent enough to not be useful.

A second suggestion: ratelimit these messages on a per-downstream-port
basis, if there is a way to do that.

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-20 16:01   ` Alex G.
@ 2020-01-21 11:10     ` Lucas Stach
  2020-01-21 14:55       ` Alex G.
  2020-02-03  1:56       ` Dave Airlie
  0 siblings, 2 replies; 24+ messages in thread
From: Lucas Stach @ 2020-01-21 11:10 UTC (permalink / raw)
  To: Alex G.,
	Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel

On Mo, 2020-01-20 at 10:01 -0600, Alex G. wrote:
> 
> On 1/19/20 8:33 PM, Bjorn Helgaas wrote:
> > [+cc NVMe, GPU driver folks]
> > 
> > On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > > I think we have a problem with link bandwidth change notifications
> > > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > > 
> > > Here's a recent bug report where Jan reported "_tons_" of these
> > > notifications on an nvme device:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> > > 
> > > There was similar discussion involving GPU drivers at
> > > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > > 
> > > The current solution is the CONFIG_PCIE_BW config option, which
> > > disables the messages completely.  That option defaults to "off" (no
> > > messages), but even so, I think it's a little problematic.
> > > 
> > > Users are not really in a position to figure out whether it's safe to
> > > enable.  All they can do is experiment and see whether it works with
> > > their current mix of devices and drivers.
> > > 
> > > I don't think it's currently useful for distros because it's a
> > > compile-time switch, and distros cannot predict what system configs
> > > will be used, so I don't think they can enable it.
> > > 
> > > Does anybody have proposals for making it smarter about distinguishing
> > > real problems from intentional power management, or maybe interfaces
> > > drivers could use to tell us when we should ignore bandwidth changes?
> > 
> > NVMe, GPU folks, do your drivers or devices change PCIe link
> > speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> > the PCI core interprets changes like that as problems that need to be
> > reported.
> > 
> > If drivers do change link speed/width, can you point me to where
> > that's done?  Would it be feasible to add some sort of PCI core
> > interface so the driver could say "ignore" or "pay attention to"
> > subsequent link changes?
> > 
> > Or maybe there would even be a way to move the link change itself into
> > the PCI core, so the core would be aware of what's going on?
> 
> Funny thing is, I was going to suggest an in-kernel API for this.
>    * Driver requests lower link speed 'X'
>    * Link management interrupt fires
>    * If link speed is at or above 'X' then do not report it.
> I think an "ignore" flag would defeat the purpose of having link 
> bandwidth reporting in the first place. If some drivers set it, and 
> others don't, then it would be inconsistent enough to not be useful.
> 
> A second suggestion is, if there is a way to ratelimit these messages on 
> a per-downstream port basis.

Both AMD and Nvidia GPUs have embedded controllers, which are
responsible for the power management. IIRC those controllers can
autonomously initiate PCIe link speed changes depending on measured bus
load. So there is no way for the driver to signal the requested bus
speed to the PCIe core.

I guess for the GPU usecase the best we can do is to have the driver
opt-out of the link bandwidth notifications, as the driver knows that
there is some autonomous entity on the endpoint mucking with the link
parameters.

Regards,
Lucas


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-21 11:10     ` Lucas Stach
@ 2020-01-21 14:55       ` Alex G.
  2020-02-03  1:56       ` Dave Airlie
  1 sibling, 0 replies; 24+ messages in thread
From: Alex G. @ 2020-01-21 14:55 UTC (permalink / raw)
  To: Lucas Stach, Bjorn Helgaas, Alexandru Gagniuc, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, David Airlie,
	Daniel Vetter
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel

On 1/21/20 5:10 AM, Lucas Stach wrote:
> On Mo, 2020-01-20 at 10:01 -0600, Alex G. wrote:
>>
>> On 1/19/20 8:33 PM, Bjorn Helgaas wrote:
>>> [+cc NVMe, GPU driver folks]
>>>
>>> On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
>>>> I think we have a problem with link bandwidth change notifications
>>>> (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
>>>>
>>>> Here's a recent bug report where Jan reported "_tons_" of these
>>>> notifications on an nvme device:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=206197
>>>>
>>>> There was similar discussion involving GPU drivers at
>>>> https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
>>>>
>>>> The current solution is the CONFIG_PCIE_BW config option, which
>>>> disables the messages completely.  That option defaults to "off" (no
>>>> messages), but even so, I think it's a little problematic.
>>>>
>>>> Users are not really in a position to figure out whether it's safe to
>>>> enable.  All they can do is experiment and see whether it works with
>>>> their current mix of devices and drivers.
>>>>
>>>> I don't think it's currently useful for distros because it's a
>>>> compile-time switch, and distros cannot predict what system configs
>>>> will be used, so I don't think they can enable it.
>>>>
>>>> Does anybody have proposals for making it smarter about distinguishing
>>>> real problems from intentional power management, or maybe interfaces
>>>> drivers could use to tell us when we should ignore bandwidth changes?
>>>
>>> NVMe, GPU folks, do your drivers or devices change PCIe link
>>> speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
>>> the PCI core interprets changes like that as problems that need to be
>>> reported.
>>>
>>> If drivers do change link speed/width, can you point me to where
>>> that's done?  Would it be feasible to add some sort of PCI core
>>> interface so the driver could say "ignore" or "pay attention to"
>>> subsequent link changes?
>>>
>>> Or maybe there would even be a way to move the link change itself into
>>> the PCI core, so the core would be aware of what's going on?
>>
>> Funny thing is, I was going to suggest an in-kernel API for this.
>>     * Driver requests lower link speed 'X'
>>     * Link management interrupt fires
>>     * If link speed is at or above 'X' then do not report it.
>> I think an "ignore" flag would defeat the purpose of having link
>> bandwidth reporting in the first place. If some drivers set it, and
>> others don't, then it would be inconsistent enough to not be useful.
>>
>> A second suggestion is, if there is a way to ratelimit these messages on
>> a per-downstream port basis.
> 
> Both AMD and Nvidia GPUs have embedded controllers, which are
> responsible for the power management. IIRC those controllers can
> autonomously initiate PCIe link speed changes depending on measured bus
> load. So there is no way for the driver to signal the requested bus
> speed to the PCIe core.
> 
> I guess for the GPU usecase the best we can do is to have the driver
> opt-out of the link bandwidth notifications, as the driver knows that
> there is some autonomous entity on the endpoint mucking with the link
> parameters.

I'm confused. Are you saying that the autonomous mechanism is causing a 
link bandwidth notification?

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-20  2:33 ` Bjorn Helgaas
  2020-01-20 15:56   ` Alex Williamson
  2020-01-20 16:01   ` Alex G.
@ 2020-01-30 16:26   ` Christoph Hellwig
  2 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2020-01-30 16:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel

On Sun, Jan 19, 2020 at 08:33:26PM -0600, Bjorn Helgaas wrote:
> NVMe, GPU folks, do your drivers or devices change PCIe link
> speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> the PCI core interprets changes like that as problems that need to be
> reported.

The NVMe driver doesn't.  For devices I don't know of any, but I can't
find anything in the spec that would forbid it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-21 11:10     ` Lucas Stach
  2020-01-21 14:55       ` Alex G.
@ 2020-02-03  1:56       ` Dave Airlie
  2020-02-03  2:04         ` Dave Airlie
  1 sibling, 1 reply; 24+ messages in thread
From: Dave Airlie @ 2020-02-03  1:56 UTC (permalink / raw)
  To: Lucas Stach, Alex Deucher, Ben Skeggs
  Cc: Alex G.,
	Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, Linux PCI, LKML

On Tue, 21 Jan 2020 at 21:11, Lucas Stach <l.stach@pengutronix.de> wrote:
>
> On Mo, 2020-01-20 at 10:01 -0600, Alex G. wrote:
> >
> > On 1/19/20 8:33 PM, Bjorn Helgaas wrote:
> > > [+cc NVMe, GPU driver folks]
> > >
> > > On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > > > I think we have a problem with link bandwidth change notifications
> > > > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > > >
> > > > Here's a recent bug report where Jan reported "_tons_" of these
> > > > notifications on an nvme device:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> > > >
> > > > There was similar discussion involving GPU drivers at
> > > > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > > >
> > > > The current solution is the CONFIG_PCIE_BW config option, which
> > > > disables the messages completely.  That option defaults to "off" (no
> > > > messages), but even so, I think it's a little problematic.
> > > >
> > > > Users are not really in a position to figure out whether it's safe to
> > > > enable.  All they can do is experiment and see whether it works with
> > > > their current mix of devices and drivers.
> > > >
> > > > I don't think it's currently useful for distros because it's a
> > > > compile-time switch, and distros cannot predict what system configs
> > > > will be used, so I don't think they can enable it.
> > > >
> > > > Does anybody have proposals for making it smarter about distinguishing
> > > > real problems from intentional power management, or maybe interfaces
> > > > drivers could use to tell us when we should ignore bandwidth changes?
> > >
> > > NVMe, GPU folks, do your drivers or devices change PCIe link
> > > speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> > > the PCI core interprets changes like that as problems that need to be
> > > reported.
> > >
> > > If drivers do change link speed/width, can you point me to where
> > > that's done?  Would it be feasible to add some sort of PCI core
> > > interface so the driver could say "ignore" or "pay attention to"
> > > subsequent link changes?
> > >
> > > Or maybe there would even be a way to move the link change itself into
> > > the PCI core, so the core would be aware of what's going on?
> >
> > Funny thing is, I was going to suggest an in-kernel API for this.
> >    * Driver requests lower link speed 'X'
> >    * Link management interrupt fires
> >    * If link speed is at or above 'X' then do not report it.
> > I think an "ignore" flag would defeat the purpose of having link
> > bandwidth reporting in the first place. If some drivers set it, and
> > others don't, then it would be inconsistent enough to not be useful.
> >
> > A second suggestion is, if there is a way to ratelimit these messages on
> > a per-downstream port basis.
>
> Both AMD and Nvidia GPUs have embedded controllers, which are
> responsible for the power management. IIRC those controllers can
> autonomously initiate PCIe link speed changes depending on measured bus
> load. So there is no way for the driver to signal the requested bus
> speed to the PCIe core.
>
> I guess for the GPU usecase the best we can do is to have the driver
> opt-out of the link bandwidth notifications, as the driver knows that
> there is some autonomous entity on the endpoint mucking with the link
> parameters.
>

Adding Alex and Ben for AMD and NVIDIA info

Dave.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-02-03  1:56       ` Dave Airlie
@ 2020-02-03  2:04         ` Dave Airlie
  2020-02-03  2:07           ` Ben Skeggs
  2020-02-03 21:16           ` Alex Deucher
  0 siblings, 2 replies; 24+ messages in thread
From: Dave Airlie @ 2020-02-03  2:04 UTC (permalink / raw)
  To: Lucas Stach, Alex Deucher, Ben Skeggs, karolherbst
  Cc: Alex G.,
	Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, Linux PCI, LKML

On Mon, 3 Feb 2020 at 11:56, Dave Airlie <airlied@gmail.com> wrote:
>
> On Tue, 21 Jan 2020 at 21:11, Lucas Stach <l.stach@pengutronix.de> wrote:
> >
> > On Mo, 2020-01-20 at 10:01 -0600, Alex G. wrote:
> > >
> > > On 1/19/20 8:33 PM, Bjorn Helgaas wrote:
> > > > [+cc NVMe, GPU driver folks]
> > > >
> > > > On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > > > > I think we have a problem with link bandwidth change notifications
> > > > > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > > > >
> > > > > Here's a recent bug report where Jan reported "_tons_" of these
> > > > > notifications on an nvme device:
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> > > > >
> > > > > There was similar discussion involving GPU drivers at
> > > > > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > > > >
> > > > > The current solution is the CONFIG_PCIE_BW config option, which
> > > > > disables the messages completely.  That option defaults to "off" (no
> > > > > messages), but even so, I think it's a little problematic.
> > > > >
> > > > > Users are not really in a position to figure out whether it's safe to
> > > > > enable.  All they can do is experiment and see whether it works with
> > > > > their current mix of devices and drivers.
> > > > >
> > > > > I don't think it's currently useful for distros because it's a
> > > > > compile-time switch, and distros cannot predict what system configs
> > > > > will be used, so I don't think they can enable it.
> > > > >
> > > > > Does anybody have proposals for making it smarter about distinguishing
> > > > > real problems from intentional power management, or maybe interfaces
> > > > > drivers could use to tell us when we should ignore bandwidth changes?
> > > >
> > > > NVMe, GPU folks, do your drivers or devices change PCIe link
> > > > speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> > > > the PCI core interprets changes like that as problems that need to be
> > > > reported.
> > > >
> > > > If drivers do change link speed/width, can you point me to where
> > > > that's done?  Would it be feasible to add some sort of PCI core
> > > > interface so the driver could say "ignore" or "pay attention to"
> > > > subsequent link changes?
> > > >
> > > > Or maybe there would even be a way to move the link change itself into
> > > > the PCI core, so the core would be aware of what's going on?
> > >
> > > Funny thing is, I was going to suggest an in-kernel API for this.
> > >    * Driver requests lower link speed 'X'
> > >    * Link management interrupt fires
> > >    * If link speed is at or above 'X' then do not report it.
> > > I think an "ignore" flag would defeat the purpose of having link
> > > bandwidth reporting in the first place. If some drivers set it, and
> > > others don't, then it would be inconsistent enough to not be useful.
> > >
> > > A second suggestion is, if there is a way to ratelimit these messages on
> > > a per-downstream port basis.
> >
> > Both AMD and Nvidia GPUs have embedded controllers, which are
> > responsible for the power management. IIRC those controllers can
> > autonomously initiate PCIe link speed changes depending on measured bus
> > load. So there is no way for the driver to signal the requested bus
> > speed to the PCIe core.
> >
> > I guess for the GPU usecase the best we can do is to have the driver
> > opt-out of the link bandwidth notifications, as the driver knows that
> > there is some autonomous entity on the endpoint mucking with the link
> > parameters.
> >
>
> Adding Alex and Ben for AMD and NVIDIA info (and Karol).
>
> Dave.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-02-03  2:04         ` Dave Airlie
@ 2020-02-03  2:07           ` Ben Skeggs
  2020-02-03 21:16           ` Alex Deucher
  1 sibling, 0 replies; 24+ messages in thread
From: Ben Skeggs @ 2020-02-03  2:07 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Lucas Stach, Alex Deucher, Karol Herbst, Alex G.,
	Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, Linux PCI, LKML

On Mon, 3 Feb 2020 at 12:04, Dave Airlie <airlied@gmail.com> wrote:
>
> On Mon, 3 Feb 2020 at 11:56, Dave Airlie <airlied@gmail.com> wrote:
> >
> > On Tue, 21 Jan 2020 at 21:11, Lucas Stach <l.stach@pengutronix.de> wrote:
> > >
> > > On Mo, 2020-01-20 at 10:01 -0600, Alex G. wrote:
> > > >
> > > > On 1/19/20 8:33 PM, Bjorn Helgaas wrote:
> > > > > [+cc NVMe, GPU driver folks]
> > > > >
> > > > > On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > > > > > I think we have a problem with link bandwidth change notifications
> > > > > > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > > > > >
> > > > > > Here's a recent bug report where Jan reported "_tons_" of these
> > > > > > notifications on an nvme device:
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> > > > > >
> > > > > > There was similar discussion involving GPU drivers at
> > > > > > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > > > > >
> > > > > > The current solution is the CONFIG_PCIE_BW config option, which
> > > > > > disables the messages completely.  That option defaults to "off" (no
> > > > > > messages), but even so, I think it's a little problematic.
> > > > > >
> > > > > > Users are not really in a position to figure out whether it's safe to
> > > > > > enable.  All they can do is experiment and see whether it works with
> > > > > > their current mix of devices and drivers.
> > > > > >
> > > > > > I don't think it's currently useful for distros because it's a
> > > > > > compile-time switch, and distros cannot predict what system configs
> > > > > > will be used, so I don't think they can enable it.
> > > > > >
> > > > > > Does anybody have proposals for making it smarter about distinguishing
> > > > > > real problems from intentional power management, or maybe interfaces
> > > > > > drivers could use to tell us when we should ignore bandwidth changes?
> > > > >
> > > > > NVMe, GPU folks, do your drivers or devices change PCIe link
> > > > > speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> > > > > the PCI core interprets changes like that as problems that need to be
> > > > > reported.
> > > > >
> > > > > If drivers do change link speed/width, can you point me to where
> > > > > that's done?  Would it be feasible to add some sort of PCI core
> > > > > interface so the driver could say "ignore" or "pay attention to"
> > > > > subsequent link changes?
> > > > >
> > > > > Or maybe there would even be a way to move the link change itself into
> > > > > the PCI core, so the core would be aware of what's going on?
> > > >
> > > > Funny thing is, I was going to suggest an in-kernel API for this.
> > > >    * Driver requests lower link speed 'X'
> > > >    * Link management interrupt fires
> > > >    * If link speed is at or above 'X' then do not report it.
> > > > I think an "ignore" flag would defeat the purpose of having link
> > > > bandwidth reporting in the first place. If some drivers set it, and
> > > > others don't, then it would be inconsistent enough to not be useful.
> > > >
> > > > A second suggestion is, if there is a way to ratelimit these messages on
> > > > a per-downstream port basis.
> > >
> > > Both AMD and Nvidia GPUs have embedded controllers, which are
> > > responsible for the power management. IIRC those controllers can
> > > autonomously initiate PCIe link speed changes depending on measured bus
> > > load. So there is no way for the driver to signal the requested bus
> > > speed to the PCIe core.
> > >
> > > I guess for the GPU usecase the best we can do is to have the driver
> > > opt-out of the link bandwidth notifications, as the driver knows that
> > > there is some autonomous entity on the endpoint mucking with the link
> > > parameters.
> > >
> >
> > Adding Alex and Ben for AMD and NVIDIA info (and Karol).
We don't attempt link speed changes by default in Nouveau currently;
however, I believe we will (if instructed by the VBIOS perf tables) if
you enable clock frequency changes.  We do this from the driver code
currently, but I'm not 100% sure what this will look like if/when we
receive PMU firmware.  More and more autonomy is being pushed in there
by NVIDIA with each generation; it's possible this even happens on the
Volta/Turing PMU already, but it's not an area I've looked into
recently.

Ben.

> >
> > Dave.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-02-03  2:04         ` Dave Airlie
  2020-02-03  2:07           ` Ben Skeggs
@ 2020-02-03 21:16           ` Alex Deucher
  2020-02-04  4:38             ` Lukas Wunner
  1 sibling, 1 reply; 24+ messages in thread
From: Alex Deucher @ 2020-02-03 21:16 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Lucas Stach, Ben Skeggs, Karol Herbst, Alex G.,
	Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, Linux PCI, LKML

On Sun, Feb 2, 2020 at 9:04 PM Dave Airlie <airlied@gmail.com> wrote:
>
> On Mon, 3 Feb 2020 at 11:56, Dave Airlie <airlied@gmail.com> wrote:
> >
> > On Tue, 21 Jan 2020 at 21:11, Lucas Stach <l.stach@pengutronix.de> wrote:
> > >
> > > On Mo, 2020-01-20 at 10:01 -0600, Alex G. wrote:
> > > >
> > > > On 1/19/20 8:33 PM, Bjorn Helgaas wrote:
> > > > > [+cc NVMe, GPU driver folks]
> > > > >
> > > > > On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > > > > > I think we have a problem with link bandwidth change notifications
> > > > > > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > > > > >
> > > > > > Here's a recent bug report where Jan reported "_tons_" of these
> > > > > > notifications on an nvme device:
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> > > > > >
> > > > > > There was similar discussion involving GPU drivers at
> > > > > > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > > > > >
> > > > > > The current solution is the CONFIG_PCIE_BW config option, which
> > > > > > disables the messages completely.  That option defaults to "off" (no
> > > > > > messages), but even so, I think it's a little problematic.
> > > > > >
> > > > > > Users are not really in a position to figure out whether it's safe to
> > > > > > enable.  All they can do is experiment and see whether it works with
> > > > > > their current mix of devices and drivers.
> > > > > >
> > > > > > I don't think it's currently useful for distros because it's a
> > > > > > compile-time switch, and distros cannot predict what system configs
> > > > > > will be used, so I don't think they can enable it.
> > > > > >
> > > > > > Does anybody have proposals for making it smarter about distinguishing
> > > > > > real problems from intentional power management, or maybe interfaces
> > > > > > drivers could use to tell us when we should ignore bandwidth changes?
> > > > >
> > > > > NVMe, GPU folks, do your drivers or devices change PCIe link
> > > > > speed/width for power saving or other reasons?  When CONFIG_PCIE_BW=y,
> > > > > the PCI core interprets changes like that as problems that need to be
> > > > > reported.
> > > > >
> > > > > If drivers do change link speed/width, can you point me to where
> > > > > that's done?  Would it be feasible to add some sort of PCI core
> > > > > interface so the driver could say "ignore" or "pay attention to"
> > > > > subsequent link changes?
> > > > >
> > > > > Or maybe there would even be a way to move the link change itself into
> > > > > the PCI core, so the core would be aware of what's going on?
> > > >
> > > > Funny thing is, I was going to suggest an in-kernel API for this.
> > > >    * Driver requests lower link speed 'X'
> > > >    * Link management interrupt fires
> > > >    * If link speed is at or above 'X' then do not report it.
> > > > I think an "ignore" flag would defeat the purpose of having link
> > > > bandwidth reporting in the first place. If some drivers set it, and
> > > > others don't, then it would be inconsistent enough to not be useful.
> > > >
> > > > A second suggestion is, if there is a way to ratelimit these messages on
> > > > a per-downstream port basis.
> > >
> > > Both AMD and Nvidia GPUs have embedded controllers, which are
> > > responsible for the power management. IIRC those controllers can
> > > autonomously initiate PCIe link speed changes depending on measured bus
> > > load. So there is no way for the driver to signal the requested bus
> > > speed to the PCIe core.
> > >
> > > I guess for the GPU usecase the best we can do is to have the driver
> > > opt-out of the link bandwidth notifications, as the driver knows that
> > > there is some autonomous entity on the endpoint mucking with the link
> > > parameters.
> > >
> >
> > Adding Alex and Ben for AMD and NVIDIA info (and Karol).

AMD has had a micro-controller on the GPU handling pcie link speeds
and widths dynamically (in addition to GPU clocks and voltages) for
about 12 years or so at this point to save power when the GPU is idle
and improve performance when it's required.  The micro-controller
changes the link parameters dynamically based on load independent of
the driver.  The driver can tweak the heuristics, or even disable the
dynamic changes, but by default it's enabled when the driver loads.
The ucode for this micro-controller is loaded by the driver so you'll
see fixed clocks and widths prior to the driver loading.  We'd need
some sort of opt out I suppose for periods when the driver has enabled
dynamic pcie power management in the micro-controller.

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-02-03 21:16           ` Alex Deucher
@ 2020-02-04  4:38             ` Lukas Wunner
  2020-02-04 14:47               ` Alex Deucher
  0 siblings, 1 reply; 24+ messages in thread
From: Lukas Wunner @ 2020-02-04  4:38 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Dave Airlie, Lucas Stach, Ben Skeggs, Karol Herbst, Alex G.,
	Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Alex Williamson, Austin Bolen, Shyam Iyer,
	Sinan Kaya, Linux PCI, LKML

On Mon, Feb 03, 2020 at 04:16:36PM -0500, Alex Deucher wrote:
> AMD has had a micro-controller on the GPU handling pcie link speeds
> and widths dynamically (in addition to GPU clocks and voltages) for
> about 12 years or so at this point to save power when the GPU is idle
> and improve performance when it's required.  The micro-controller
> changes the link parameters dynamically based on load independent of
> the driver.  The driver can tweak the heuristics, or even disable the
> dynamic changes, but by default it's enabled when the driver loads.
> The ucode for this micro-controller is loaded by the driver so you'll
> see fixed clocks and widths prior to the driver loading.  We'd need
> some sort of opt out I suppose for periods when the driver has enabled
> dynamic pcie power management in the micro-controller.

Note that there are *two* bits in the Link Status Register:

* Link Autonomous Bandwidth Status
  "This bit is Set by hardware to indicate that hardware has
  autonomously changed Link speed or width, without the Port
  transitioning through DL_Down status, for reasons other than to
  attempt to correct unreliable Link operation.  This bit must be set if
  the Physical Layer reports a speed or width change was initiated by
  the Downstream component that was indicated as an autonomous change."

* Link Bandwidth Management Status
  "This bit is Set by hardware to indicate that either of the
  following has occurred without the Port transitioning through
  DL_Down status. [...] Hardware has changed Link speed or width to
  attempt to correct unreliable Link operation, either through an
  LTSSM timeout or a higher level process."

See PCIe Base Spec 4.0 sec 7.8.8, 7.8.7, 4.2.6.3.3.1.

The two bits generate *separate* interrupts.  We only enable the
interrupt for the latter.

If AMD GPUs generate a Link Bandwidth Management Interrupt upon
autonomously changing bandwidth for power management reasons
(instead of to correct unreliability issues), that would be a
spec violation.

So the question is, do your GPUs violate the spec in this regard?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-02-04  4:38             ` Lukas Wunner
@ 2020-02-04 14:47               ` Alex Deucher
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Deucher @ 2020-02-04 14:47 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Dave Airlie, Lucas Stach, Ben Skeggs, Karol Herbst, Alex G.,
	Bjorn Helgaas, Alexandru Gagniuc, Keith Busch, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, David Airlie, Daniel Vetter,
	Jan Vesely, Alex Williamson, Austin Bolen, Shyam Iyer,
	Sinan Kaya, Linux PCI, LKML

On Mon, Feb 3, 2020 at 11:38 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Mon, Feb 03, 2020 at 04:16:36PM -0500, Alex Deucher wrote:
> > AMD has had a micro-controller on the GPU handling pcie link speeds
> > and widths dynamically (in addition to GPU clocks and voltages) for
> > about 12 years or so at this point to save power when the GPU is idle
> > and improve performance when it's required.  The micro-controller
> > changes the link parameters dynamically based on load independent of
> > the driver.  The driver can tweak the heuristics, or even disable the
> > dynamic changes, but by default it's enabled when the driver loads.
> > The ucode for this micro-controller is loaded by the driver so you'll
> > see fixed clocks and widths prior to the driver loading.  We'd need
> > some sort of opt out I suppose for periods when the driver has enabled
> > dynamic pcie power management in the micro-controller.
>
> Note that there are *two* bits in the Link Status Register:
>
> * Link Autonomous Bandwidth Status
>   "This bit is Set by hardware to indicate that hardware has
>   autonomously changed Link speed or width, without the Port
>   transitioning through DL_Down status, for reasons other than to
>   attempt to correct unreliable Link operation.  This bit must be set if
>   the Physical Layer reports a speed or width change was initiated by
>   the Downstream component that was indicated as an autonomous change."
>
> * Link Bandwidth Management Status
>   "This bit is Set by hardware to indicate that either of the
>   following has occurred without the Port transitioning through
>   DL_Down status. [...] Hardware has changed Link speed or width to
>   attempt to correct unreliable Link operation, either through an
>   LTSSM timeout or a higher level process."
>
> See PCIe Base Spec 4.0 sec 7.8.8, 7.8.7, 4.2.6.3.3.1.
>
> The two bits generate *separate* interrupts.  We only enable the
> interrupt for the latter.
>
> If AMD GPUs generate a Link Bandwidth Management Interrupt upon
> autonomously changing bandwidth for power management reasons
> (instead of to correct unreliability issues), that would be a
> spec violation.
>
> So the question is, do your GPUs violate the spec in this regard?

I don't know offhand.  I can ask the firmware team.  That said, I
haven't seen any reports of problems with our GPUs with this code
enabled.

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-01-15 22:10 Issues with "PCI/LINK: Report degraded links via link bandwidth notification" Bjorn Helgaas
  2020-01-16  2:44 ` Alex G
  2020-01-20  2:33 ` Bjorn Helgaas
@ 2020-02-22 16:58 ` Bjorn Helgaas
  2021-01-28 23:39   ` Bjorn Helgaas
  2 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2020-02-22 16:58 UTC (permalink / raw)
  To: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel,
	Christoph Hellwig, Lucas Stach, Dave Airlie, Ben Skeggs,
	Alex Deucher, Myron Stowe

[+cc Christoph, Lucas, Dave, Ben, Alex, Myron]

On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> I think we have a problem with link bandwidth change notifications
> (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> 
> Here's a recent bug report where Jan reported "_tons_" of these
> notifications on an nvme device:
> https://bugzilla.kernel.org/show_bug.cgi?id=206197

AFAICT, this thread petered out with no resolution.

If the bandwidth change notifications are important to somebody,
please speak up, preferably with a patch that makes the notifications
disabled by default and adds a parameter to enable them (or some other
strategy that makes sense).

I think these are potentially useful, so I don't really want to just
revert them, but if nobody thinks these are important enough to fix,
that's a possibility.

> There was similar discussion involving GPU drivers at
> https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> 
> The current solution is the CONFIG_PCIE_BW config option, which
> disables the messages completely.  That option defaults to "off" (no
> messages), but even so, I think it's a little problematic.
> 
> Users are not really in a position to figure out whether it's safe to
> enable.  All they can do is experiment and see whether it works with
> their current mix of devices and drivers.
> 
> I don't think it's currently useful for distros because it's a
> compile-time switch, and distros cannot predict what system configs
> will be used, so I don't think they can enable it.
> 
> Does anybody have proposals for making it smarter about distinguishing
> real problems from intentional power management, or maybe interfaces
> drivers could use to tell us when we should ignore bandwidth changes?
> 
> Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2020-02-22 16:58 ` Bjorn Helgaas
@ 2021-01-28 23:39   ` Bjorn Helgaas
  2021-01-28 23:51     ` Sinan Kaya
  2021-01-29  1:30     ` Alex Deucher
  0 siblings, 2 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2021-01-28 23:39 UTC (permalink / raw)
  To: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, Sinan Kaya, linux-pci, linux-kernel,
	Christoph Hellwig, Lucas Stach, Dave Airlie, Ben Skeggs,
	Alex Deucher, Myron Stowe, A. Vladimirov

[+cc Atanas -- thank you very much for the bug report!]

On Sat, Feb 22, 2020 at 10:58:40AM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > I think we have a problem with link bandwidth change notifications
> > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > 
> > Here's a recent bug report where Jan reported "_tons_" of these
> > notifications on an nvme device:
> > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> 
> AFAICT, this thread petered out with no resolution.
> 
> If the bandwidth change notifications are important to somebody,
> please speak up, preferably with a patch that makes the notifications
> disabled by default and adds a parameter to enable them (or some other
> strategy that makes sense).
> 
> I think these are potentially useful, so I don't really want to just
> revert them, but if nobody thinks these are important enough to fix,
> that's a possibility.

Atanas is also seeing this problem and went to the trouble of digging
up this bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=206197#c8

I'm actually a little surprised that we haven't seen more reports of
this.  I don't think distros enable CONFIG_PCIE_BW, but even so, I
would think more people running upstream kernels would trip over it.
But maybe people just haven't turned CONFIG_PCIE_BW on.

I don't have a suggestion; just adding Atanas to this old thread.

> > There was similar discussion involving GPU drivers at
> > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > 
> > The current solution is the CONFIG_PCIE_BW config option, which
> > disables the messages completely.  That option defaults to "off" (no
> > messages), but even so, I think it's a little problematic.
> > 
> > Users are not really in a position to figure out whether it's safe to
> > enable.  All they can do is experiment and see whether it works with
> > their current mix of devices and drivers.
> > 
> > I don't think it's currently useful for distros because it's a
> > compile-time switch, and distros cannot predict what system configs
> > will be used, so I don't think they can enable it.
> > 
> > Does anybody have proposals for making it smarter about distinguishing
> > real problems from intentional power management, or maybe interfaces
> > drivers could use to tell us when we should ignore bandwidth changes?
> > 
> > Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2021-01-28 23:39   ` Bjorn Helgaas
@ 2021-01-28 23:51     ` Sinan Kaya
  2021-01-29  0:07       ` Alex G.
  2021-01-29  1:30     ` Alex Deucher
  1 sibling, 1 reply; 24+ messages in thread
From: Sinan Kaya @ 2021-01-28 23:51 UTC (permalink / raw)
  To: Bjorn Helgaas, Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, linux-pci, linux-kernel, Christoph Hellwig,
	Lucas Stach, Dave Airlie, Ben Skeggs, Alex Deucher, Myron Stowe,
	A. Vladimirov

On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
> AFAICT, this thread petered out with no resolution.
> 
> If the bandwidth change notifications are important to somebody,
> please speak up, preferably with a patch that makes the notifications
> disabled by default and adds a parameter to enable them (or some other
> strategy that makes sense).
> 
> I think these are potentially useful, so I don't really want to just
> revert them, but if nobody thinks these are important enough to fix,
> that's a possibility.

Hide it behind a debug or expert option by default?  Or even mark it as
BROKEN until someone fixes it?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2021-01-28 23:51     ` Sinan Kaya
@ 2021-01-29  0:07       ` Alex G.
  2021-01-29 21:56         ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Alex G. @ 2021-01-29  0:07 UTC (permalink / raw)
  To: Sinan Kaya, Bjorn Helgaas, Alexandru Gagniuc, Keith Busch
  Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen,
	Shyam Iyer, linux-pci, linux-kernel, Christoph Hellwig,
	Lucas Stach, Dave Airlie, Ben Skeggs, Alex Deucher, Myron Stowe,
	A. Vladimirov

On 1/28/21 5:51 PM, Sinan Kaya wrote:
> On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
>> AFAICT, this thread petered out with no resolution.
>>
>> If the bandwidth change notifications are important to somebody,
>> please speak up, preferably with a patch that makes the notifications
>> disabled by default and adds a parameter to enable them (or some other
>> strategy that makes sense).
>>
>> I think these are potentially useful, so I don't really want to just
>> revert them, but if nobody thinks these are important enough to fix,
>> that's a possibility.
> 
> Hide behind debug or expert option by default? or even mark it as BROKEN
> until someone fixes it?
> 
Instead of making it a config option, wouldn't it be better as a kernel 
parameter? People encountering this seem quite competent at passing 
kernel arguments, so a "pcie_bw_notification=off" parameter would solve 
their problems.

As far as marking this as broken, I've seen no conclusive evidence to 
tell whether it's a sw bug or an actual hardware problem. Could we have 
a sysfs attribute to disable this on a per-downstream-port basis?

e.g.
     echo 0 > /sys/bus/pci/devices/0000:00:04.0/bw_notification_enabled

This probably won't be ideal if there are many devices downtraining 
their links ad hoc. At worst, we'd have a way to silence those messages 
if we do encounter such devices.

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2021-01-28 23:39   ` Bjorn Helgaas
  2021-01-28 23:51     ` Sinan Kaya
@ 2021-01-29  1:30     ` Alex Deucher
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Deucher @ 2021-01-29  1:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch, Jan Vesely,
	Lukas Wunner, Alex Williamson, Austin Bolen, Shyam Iyer,
	Sinan Kaya, Linux PCI, LKML, Christoph Hellwig, Lucas Stach,
	Dave Airlie, Ben Skeggs, Myron Stowe, A. Vladimirov

On Thu, Jan 28, 2021 at 6:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Atanas -- thank you very much for the bug report!]
>
> On Sat, Feb 22, 2020 at 10:58:40AM -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 15, 2020 at 04:10:08PM -0600, Bjorn Helgaas wrote:
> > > I think we have a problem with link bandwidth change notifications
> > > (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> > >
> > > Here's a recent bug report where Jan reported "_tons_" of these
> > > notifications on an nvme device:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=206197
> >
> > AFAICT, this thread petered out with no resolution.
> >
> > If the bandwidth change notifications are important to somebody,
> > please speak up, preferably with a patch that makes the notifications
> > disabled by default and adds a parameter to enable them (or some other
> > strategy that makes sense).
> >
> > I think these are potentially useful, so I don't really want to just
> > revert them, but if nobody thinks these are important enough to fix,
> > that's a possibility.
>
> Atanas is also seeing this problem and went to the trouble of digging
> up this bug report:
> https://bugzilla.kernel.org/show_bug.cgi?id=206197#c8
>
> I'm actually a little surprised that we haven't seen more reports of
> this.  I don't think distros enable CONFIG_PCIE_BW, but even so, I
> would think more people running upstream kernels would trip over it.
> But maybe people just haven't turned CONFIG_PCIE_BW on.
>
> I don't have a suggestion; just adding Atanas to this old thread.
>
> > > There was similar discussion involving GPU drivers at
> > > https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> > >
> > > The current solution is the CONFIG_PCIE_BW config option, which
> > > disables the messages completely.  That option defaults to "off" (no
> > > messages), but even so, I think it's a little problematic.
> > >
> > > Users are not really in a position to figure out whether it's safe to
> > > enable.  All they can do is experiment and see whether it works with
> > > their current mix of devices and drivers.
> > >
> > > I don't think it's currently useful for distros because it's a
> > > compile-time switch, and distros cannot predict what system configs
> > > will be used, so I don't think they can enable it.
> > >
> > > Does anybody have proposals for making it smarter about distinguishing
> > > real problems from intentional power management, or maybe interfaces
> > > drivers could use to tell us when we should ignore bandwidth changes?

There's also this recently filed bug:
https://gitlab.freedesktop.org/drm/amd/-/issues/1447
The root cause of it appears to be related to ASPM.

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2021-01-29  0:07       ` Alex G.
@ 2021-01-29 21:56         ` Bjorn Helgaas
  2021-02-02 19:50           ` Alex G.
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2021-01-29 21:56 UTC (permalink / raw)
  To: Alex G.
  Cc: Sinan Kaya, Alexandru Gagniuc, Keith Busch, Jan Vesely,
	Lukas Wunner, Alex Williamson, Austin Bolen, Shyam Iyer,
	linux-pci, linux-kernel, Christoph Hellwig, Lucas Stach,
	Dave Airlie, Ben Skeggs, Alex Deucher, Myron Stowe,
	A. Vladimirov

On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
> On 1/28/21 5:51 PM, Sinan Kaya wrote:
> > On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
> > > AFAICT, this thread petered out with no resolution.
> > > 
> > > If the bandwidth change notifications are important to somebody,
> > > please speak up, preferably with a patch that makes the notifications
> > > disabled by default and adds a parameter to enable them (or some other
> > > strategy that makes sense).
> > > 
> > > I think these are potentially useful, so I don't really want to just
> > > revert them, but if nobody thinks these are important enough to fix,
> > > that's a possibility.
> > 
> > Hide behind debug or expert option by default? or even mark it as BROKEN
> > until someone fixes it?
> > 
> Instead of making it a config option, wouldn't it be better as a kernel
> parameter? People encountering this seem quite competent in passing kernel
> arguments, so having a "pcie_bw_notification=off" would solve their
> problems.

I don't want people to have to discover a parameter to solve issues.
If there's a parameter, notification should default to off, and people
who want notification should supply a parameter to enable it.  Same
thing for the sysfs idea.

I think we really just need to figure out what's going on.  Then it
should be clearer how to handle it.  I'm not really in a position to
debug the root cause since I don't have the hardware or the time.  If
nobody can figure out what's going on, I think we'll have to make it
disabled by default.

> As far as marking this as broken, I've seen no conclusive evidence to
> tell whether it's a SW bug or an actual hardware problem. Could we have
> a sysfs attribute to disable this on a per-downstream-port basis?
> 
> e.g.
>     echo 0 > /sys/bus/pci/devices/0000:00:04.0/bw_notification_enabled
> 
> This probably won't be ideal if there are many devices downtraining their
> links ad hoc. At worst we'd have a way to silence those messages if we do
> encounter such devices.
> 
> Alex


* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2021-01-29 21:56         ` Bjorn Helgaas
@ 2021-02-02 19:50           ` Alex G.
  2021-02-02 20:16             ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Alex G. @ 2021-02-02 19:50 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sinan Kaya, Keith Busch, Jan Vesely, Lukas Wunner,
	Alex Williamson, Austin Bolen, Shyam Iyer, linux-pci,
	linux-kernel, Christoph Hellwig, Lucas Stach, Dave Airlie,
	Ben Skeggs, Alex Deucher, Myron Stowe, A. Vladimirov

On 1/29/21 3:56 PM, Bjorn Helgaas wrote:
> On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
>> On 1/28/21 5:51 PM, Sinan Kaya wrote:
>>> On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
>>>> AFAICT, this thread petered out with no resolution.
>>>>
>>>> If the bandwidth change notifications are important to somebody,
>>>> please speak up, preferably with a patch that makes the notifications
>>>> disabled by default and adds a parameter to enable them (or some other
>>>> strategy that makes sense).
>>>>
>>>> I think these are potentially useful, so I don't really want to just
>>>> revert them, but if nobody thinks these are important enough to fix,
>>>> that's a possibility.
>>>
>>> Hide behind debug or expert option by default? or even mark it as BROKEN
>>> until someone fixes it?
>>>
>> Instead of making it a config option, wouldn't it be better as a kernel
>> parameter? People encountering this seem quite competent in passing kernel
>> arguments, so having a "pcie_bw_notification=off" would solve their
>> problems.
> 
> I don't want people to have to discover a parameter to solve issues.
> If there's a parameter, notification should default to off, and people
> who want notification should supply a parameter to enable it.  Same
> thing for the sysfs idea.

I can imagine cases where a per-port flag would be useful. For example, 
a machine with a NIC and a couple of PCIe storage drives. In this 
example, the PCIe drives downtrain willy-nilly, so it's useful to turn 
off their notifications, but the NIC absolutely must not downtrain. It's 
debatable whether it should be default on or default off.

> I think we really just need to figure out what's going on.  Then it
> should be clearer how to handle it.  I'm not really in a position to
> debug the root cause since I don't have the hardware or the time.

I wonder
(a) if some PCIe devices are downtraining willy-nilly to save power
(b) if this willy-nilly downtraining somehow violates the PCIe spec
(c) what the official behavior is when downtraining is intentional

My theory is: YES, YES, ASPM. But I don't know how to figure this out 
without having the problem hardware in hand.
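Even without the problem hardware in hand, the downtraining itself is
observable from userspace: the PCI core exposes `current_link_speed` and
`max_link_speed` in sysfs. A rough shell sketch (those two attribute
names are the real sysfs ones; the helper name and the example path are
just for illustration):

```shell
#!/bin/sh
# Report whether a PCI device's link is running below its capability.
# Pass the device's sysfs directory, e.g. /sys/bus/pci/devices/0000:00:04.0
check_downtrain() {
    dev="$1"
    cur=$(cat "$dev/current_link_speed")   # e.g. "8.0 GT/s PCIe"
    max=$(cat "$dev/max_link_speed")
    if [ "$cur" = "$max" ]; then
        echo "$dev: link at full speed ($cur)"
    else
        echo "$dev: link downtrained ($cur, capable of $max)"
    fi
}
```

Sampling that in a loop while the workload is idle vs. busy would at
least show whether the speed changes track power management.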


> If nobody can figure out what's going on, I think we'll have to make it
> disabled by default.

I think most distros do "CONFIG_PCIE_BW is not set". Is that not true?

Alex


* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2021-02-02 19:50           ` Alex G.
@ 2021-02-02 20:16             ` Bjorn Helgaas
  2021-02-02 20:25               ` Alex G.
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2021-02-02 20:16 UTC (permalink / raw)
  To: Alex G.
  Cc: Sinan Kaya, Keith Busch, Jan Vesely, Lukas Wunner,
	Alex Williamson, Austin Bolen, Shyam Iyer, linux-pci,
	linux-kernel, Christoph Hellwig, Lucas Stach, Dave Airlie,
	Ben Skeggs, Alex Deucher, Myron Stowe, A. Vladimirov

On Tue, Feb 02, 2021 at 01:50:20PM -0600, Alex G. wrote:
> On 1/29/21 3:56 PM, Bjorn Helgaas wrote:
> > On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
> > > On 1/28/21 5:51 PM, Sinan Kaya wrote:
> > > > On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
> > > > > AFAICT, this thread petered out with no resolution.
> > > > > 
> > > > > If the bandwidth change notifications are important to somebody,
> > > > > please speak up, preferably with a patch that makes the notifications
> > > > > disabled by default and adds a parameter to enable them (or some other
> > > > > strategy that makes sense).
> > > > > 
> > > > > I think these are potentially useful, so I don't really want to just
> > > > > revert them, but if nobody thinks these are important enough to fix,
> > > > > that's a possibility.
> > > > 
> > > > Hide behind debug or expert option by default? or even mark it as BROKEN
> > > > until someone fixes it?
> > > > 
> > > Instead of making it a config option, wouldn't it be better as a kernel
> > > parameter? People encountering this seem quite competent in passing kernel
> > > arguments, so having a "pcie_bw_notification=off" would solve their
> > > problems.
> > 
> > I don't want people to have to discover a parameter to solve issues.
> > If there's a parameter, notification should default to off, and people
> > who want notification should supply a parameter to enable it.  Same
> > thing for the sysfs idea.
> 
> I can imagine cases where a per-port flag would be useful. For example, a
> machine with a NIC and a couple of PCIe storage drives. In this example, the
> PCIe drives downtrain willy-nilly, so it's useful to turn off their
> notifications, but the NIC absolutely must not downtrain. It's debatable
> whether it should be default on or default off.
>
> > I think we really just need to figure out what's going on.  Then it
> > should be clearer how to handle it.  I'm not really in a position to
> > debug the root cause since I don't have the hardware or the time.
> 
> I wonder
> (a) if some PCIe devices are downtraining willy-nilly to save power
> (b) if this willy-nilly downtraining somehow violates PCIe spec
> (c) what is the official behavior when downtraining is intentional
> 
> My theory is: YES, YES, ASPM. But I don't know how to figure this out
> without having the problem hardware in hand.
> 
> > If nobody can figure out what's going on, I think we'll have to make it
> > disabled by default.
> 
> I think most distros do "CONFIG_PCIE_BW is not set". Is that not true?

I think it *is* true that distros do not enable CONFIG_PCIE_BW.

But it's perfectly reasonable for people building their own kernels to
enable it.  It should be safe to enable all config options.  If they
do enable CONFIG_PCIE_BW, I don't want them to waste time debugging
messages they don't expect.

If we understood why these happen and could filter out the expected
ones, that would be great.  But we don't.  We've already wasted quite
a bit of Jan's and Atanas' time, and no doubt others who haven't
bothered to file bug reports.

So I think I'll queue up a patch to remove the functionality for now.
It's easily restored if somebody debugs the problem or adds a
command-line switch or something.

Bjorn


* Re: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
  2021-02-02 20:16             ` Bjorn Helgaas
@ 2021-02-02 20:25               ` Alex G.
  0 siblings, 0 replies; 24+ messages in thread
From: Alex G. @ 2021-02-02 20:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sinan Kaya, Keith Busch, Jan Vesely, Lukas Wunner,
	Alex Williamson, Austin Bolen, Shyam Iyer, linux-pci,
	linux-kernel, Christoph Hellwig, Lucas Stach, Dave Airlie,
	Ben Skeggs, Alex Deucher, Myron Stowe, A. Vladimirov

On 2/2/21 2:16 PM, Bjorn Helgaas wrote:
> On Tue, Feb 02, 2021 at 01:50:20PM -0600, Alex G. wrote:
>> On 1/29/21 3:56 PM, Bjorn Helgaas wrote:
>>> On Thu, Jan 28, 2021 at 06:07:36PM -0600, Alex G. wrote:
>>>> On 1/28/21 5:51 PM, Sinan Kaya wrote:
>>>>> On 1/28/2021 6:39 PM, Bjorn Helgaas wrote:
>>>>>> AFAICT, this thread petered out with no resolution.
>>>>>>
>>>>>> If the bandwidth change notifications are important to somebody,
>>>>>> please speak up, preferably with a patch that makes the notifications
>>>>>> disabled by default and adds a parameter to enable them (or some other
>>>>>> strategy that makes sense).
>>>>>>
>>>>>> I think these are potentially useful, so I don't really want to just
>>>>>> revert them, but if nobody thinks these are important enough to fix,
>>>>>> that's a possibility.
>>>>>
>>>>> Hide behind debug or expert option by default? or even mark it as BROKEN
>>>>> until someone fixes it?
>>>>>
>>>> Instead of making it a config option, wouldn't it be better as a kernel
>>>> parameter? People encountering this seem quite competent in passing kernel
>>>> arguments, so having a "pcie_bw_notification=off" would solve their
>>>> problems.
>>>
>>> I don't want people to have to discover a parameter to solve issues.
>>> If there's a parameter, notification should default to off, and people
>>> who want notification should supply a parameter to enable it.  Same
>>> thing for the sysfs idea.
>>
>> I can imagine cases where a per-port flag would be useful. For example, a
>> machine with a NIC and a couple of PCIe storage drives. In this example, the
>> PCIe drives downtrain willy-nilly, so it's useful to turn off their
>> notifications, but the NIC absolutely must not downtrain. It's debatable
>> whether it should be default on or default off.
>>
>>> I think we really just need to figure out what's going on.  Then it
>>> should be clearer how to handle it.  I'm not really in a position to
>>> debug the root cause since I don't have the hardware or the time.
>>
>> I wonder
>> (a) if some PCIe devices are downtraining willy-nilly to save power
>> (b) if this willy-nilly downtraining somehow violates PCIe spec
>> (c) what is the official behavior when downtraining is intentional
>>
>> My theory is: YES, YES, ASPM. But I don't know how to figure this out
>> without having the problem hardware in hand.
>>
>>> If nobody can figure out what's going on, I think we'll have to make it
>>> disabled by default.
>>
>> I think most distros do "CONFIG_PCIE_BW is not set". Is that not true?
> 
> I think it *is* true that distros do not enable CONFIG_PCIE_BW.
> 
> But it's perfectly reasonable for people building their own kernels to
> enable it.  It should be safe to enable all config options.  If they
> do enable CONFIG_PCIE_BW, I don't want them to waste time debugging
> messages they don't expect.
> 
> If we understood why these happen and could filter out the expected
> ones, that would be great.  But we don't.  We've already wasted quite
> a bit of Jan's and Atanas' time, and no doubt others who haven't
> bothered to file bug reports.
> 
> So I think I'll queue up a patch to remove the functionality for now.
> It's easily restored if somebody debugs the problem or adds a
> command-line switch or something.

I think it's best we make it a module (or kernel) parameter, default=off 
for the time being.

Alex


end of thread, other threads:[~2021-02-02 20:29 UTC | newest]

Thread overview: 24+ messages
2020-01-15 22:10 Issues with "PCI/LINK: Report degraded links via link bandwidth notification" Bjorn Helgaas
2020-01-16  2:44 ` Alex G
2020-01-18  0:18   ` Bjorn Helgaas
2020-01-20  2:33 ` Bjorn Helgaas
2020-01-20 15:56   ` Alex Williamson
2020-01-20 16:01   ` Alex G.
2020-01-21 11:10     ` Lucas Stach
2020-01-21 14:55       ` Alex G.
2020-02-03  1:56       ` Dave Airlie
2020-02-03  2:04         ` Dave Airlie
2020-02-03  2:07           ` Ben Skeggs
2020-02-03 21:16           ` Alex Deucher
2020-02-04  4:38             ` Lukas Wunner
2020-02-04 14:47               ` Alex Deucher
2020-01-30 16:26   ` Christoph Hellwig
2020-02-22 16:58 ` Bjorn Helgaas
2021-01-28 23:39   ` Bjorn Helgaas
2021-01-28 23:51     ` Sinan Kaya
2021-01-29  0:07       ` Alex G.
2021-01-29 21:56         ` Bjorn Helgaas
2021-02-02 19:50           ` Alex G.
2021-02-02 20:16             ` Bjorn Helgaas
2021-02-02 20:25               ` Alex G.
2021-01-29  1:30     ` Alex Deucher
