linux-pci.vger.kernel.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Alex G." <mr.nuke.me@gmail.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	linux-pci@vger.kernel.org,
	"Bolen, Austin" <austin_bolen@dell.com>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: spammy dmesg about fluctuating pcie bandwidth on 5.9
Date: Fri, 9 Oct 2020 10:05:17 -0500
Message-ID: <20201009150517.GA3475242@bjorn-Precision-5520>
In-Reply-To: <e0b40eed-72a4-dde0-3622-2a9a28db1e62@gmail.com>

[+cc Keith]

On Fri, Oct 09, 2020 at 09:32:32AM -0500, Alex G. wrote:
> [+cc Austin]
> 
> Hi Bjorn,
> 
> That log looks like the PCIe registers spitting wrong values, but all I can
> do is speculate. Have we verified we don't have a race condition between
> ASPM and pcie_report_downtraining()?
> 
> I wouldn't be surprised if the cause of the messages is the device (or
> downstream port) faking all f's responses. I don't see how else we'd get
> links reported as x63 32GT/s.

Right, I think you and Keith nailed it: likely we read ~0 from
PCI_EXP_LNKSTA because of some error.  I don't know what the
connection with ASPM would be, although we do have some known problems
with ASPM configuration, especially in "powersave" mode; see [1].
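
To make the all-ones theory concrete, here is a rough sketch (plain Python, not kernel code) of what happens when a link register reads back as ~0. The width field sits in bits 9:4 of both Link Status and Link Capabilities, so an all-ones read decodes to x63. The per-lane rates use integer math with 128b/130b encoding overhead, which is an assumption about how the message was computed; it does reproduce both figures in the quoted log. The helper names are made up for illustration, and the decoding of the 32.0 GT/s speed itself is not modeled here.

```python
# Field layout of the PCIe Link Status / Link Capabilities registers.
LNK_WIDTH_MASK = 0x03F0   # bits 9:4 hold the link width
LNK_WIDTH_SHIFT = 4


def link_width(reg):
    """Extract the link width field from a raw register value."""
    return (reg & LNK_WIDTH_MASK) >> LNK_WIDTH_SHIFT


def lane_mbps(rate_mtps):
    """Usable Mb/s per lane for a raw rate in MT/s, using integer math.

    Assumption: 8 GT/s and faster use 128b/130b encoding, slower rates
    use 8b/10b.
    """
    if rate_mtps >= 8000:
        return rate_mtps * 128 // 130
    return rate_mtps * 8 // 10


all_ones = 0xFFFF                     # what a failed config read tends to return
width = link_width(all_ones)          # 63, hence the impossible "x63" link
bogus = width * lane_mbps(32000)      # 1984941 Mb/s, i.e. "1984.941 Gb/s"
actual = 4 * lane_mbps(8000)          # 31504 Mb/s, i.e. "31.504 Gb/s"
print(width, bogus, actual)
```

The "limited by" half of the message matches a real x4 link at 8.0 GT/s, while the "capable of" half matches a register that returned all ones.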

Jason, could you capture the "sudo lspci -vvxxx" output and dmesg log
after the messages start?  You could instrument bw_notification.c so
it shuts up after a few messages if necessary.  The lspci should show
us if a port leading to the device is in DPC or other error condition.
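
By "shuts up after a few messages" I mean something like the sketch below. It is Python rather than a patch to bw_notification.c, and the class name, the limit, and the suppression text are all made up for illustration; the only point is the shape of the logic: emit the first few reports, note once that the rest are being dropped, then stay quiet.

```python
class CappedLogger:
    """Emit at most `limit` messages, then one suppression notice."""

    def __init__(self, limit, emit=print):
        self.limit = limit
        self.count = 0
        self.emit = emit  # callable that actually writes the message

    def log(self, msg):
        self.count += 1
        if self.count <= self.limit:
            self.emit(msg)
        elif self.count == self.limit + 1:
            # Say once that we are going quiet so the log is not misleading.
            self.emit("further bandwidth messages suppressed")


captured = []
logger = CappedLogger(limit=3, emit=captured.append)
for i in range(10):
    logger.log(f"bandwidth report {i}")
print(captured)
```

With the cap in place the first few reports still reach dmesg, which is all we need alongside the lspci dump.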

DPC takes down the link (as would an AER event).  It doesn't seem like
these should cause Link Bandwidth Management interrupts, but maybe?

[1] https://lore.kernel.org/r/20201007132808.647589-1-ian.kumlien@gmail.com

> On 10/7/20 12:02 PM, Bjorn Helgaas wrote:
> > [+cc Alex]
> > 
> > On Wed, Oct 07, 2020 at 06:09:06PM +0200, Jason A. Donenfeld wrote:
> > > Hi,
> > > 
> > > Since 5.9 I've been seeing lots of the below in my logs. I'm wondering
> > > if this is a case of "ASPM finally working properly," or if I'm
> > > actually running into aberrant behavior that I should look into
> > > further. I run with `pcie_aspm=force pcie_aspm.policy=powersave` on my
> > > command line. But I wasn't seeing these messages in 5.8.
> > 
> > I'm sorry that you need to use "pcie_aspm=force
> > pcie_aspm.policy=powersave".  Someday maybe we'll get enough of ASPM
> > fixed so we won't need junk like that.  I don't think we're there yet.
> > Do you build with CONFIG_PCIEASPM_POWERSAVE=y?  Do you need
> > "pcie_aspm=force" because the firmware tells us not to use ASPM?
> > 
> > Re: the messages below, they come from Link Bandwidth Management
> > interrupts.  These *should* only happen because of a
> > software-initiated link retrain or because hardware changed the link
> > speed or width because the link was unreliable.  ASPM shouldn't cause
> > these.
> > 
> > So it's possible you have an unreliable slot, but I doubt it because
> > you said v5.8 works fine, and also the Link Bandwidth interrupts
> > should only happen if something *changed*, but all the messages below
> > look the same to me.
> > 
> > Something is also wrong with them -- there's no such thing as a "x63"
> > link.  But maybe these are copy/paste errors?  I don't know where the
> > "b.4" comes from either.  Oh, that probably belongs with
> > "0000:00:1b.4" but got separated by copy/paste.
> > 
> > Obviously you can stop the messages by unsetting CONFIG_PCIE_BW.
> > 
> > The code (drivers/pci/pcie/bw_notification.c) is pretty
> > straightforward and I don't see an obvious problem, but maybe Alex
> > will.
> > 
> > > [79960.801929] pcieport 0000:04:00.0: 31.504 Gb/s available PCIe
> > > bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:1
> > > b.4 (capable of 1984.941 Gb/s with 32.0 GT/s PCIe x63 link)
> > > [79981.679813] pcieport 0000:04:00.0: 31.504 Gb/s available PCIe
> > > bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:1
> > > b.4 (capable of 1984.941 Gb/s with 32.0 GT/s PCIe x63 link)
> > > ...

