linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Stuart Hayes <stuart.w.hayes@gmail.com>,
	Dongdong Liu <liudongdong3@huawei.com>,
	John Garry <john.garry@huawei.com>,
	Keith Busch <kbusch@kernel.org>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: pciehp: Reduce noisiness on hot removal
Date: Mon, 3 Aug 2020 13:10:40 +0300	[thread overview]
Message-ID: <20200803101040.GH1375436@lahna.fi.intel.com> (raw)
In-Reply-To: <b45e46fd8a6aa6930aaac9d7718c2e4b787a4e5e.1595935071.git.lukas@wunner.de>

On Tue, Jul 28, 2020 at 01:24:10PM +0200, Lukas Wunner wrote:
> When a PCIe card is hot-removed, the Presence Detect State and Data Link
> Layer Link Active bits often do not clear simultaneously.  I've seen
> delays of up to 244 msec between the two events with Thunderbolt.
> 
> After pciehp has brought down the slot in response to the first event,
> the other bit may still be set.  It's not discernible whether it's set
> because a new card is already in the slot or if it will soon clear.
> So pciehp tries to bring up the slot and in the latter case fails with
> a bunch of messages, some of them at KERN_ERR severity.  The messages
> are false positives if the slot is no longer occupied and annoy users.
> 
> Stuart Hayes reports the following splat on hot removal:
> 
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
> KERN_ERR  pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
> KERN_ERR  pcieport 0000:3c:06.0: pciehp: Failed to check link status
> 
> Dongdong Liu complains about a similar splat:
> 
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
> KERN_INFO iommu: Removing device 0000:87:00.0 from group 12
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
> KERN_INFO pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
> KERN_ERR  pciehp 0000:80:10.0:pcie004: Failed to check link status
> 
> Users are particularly irritated to see a bringup attempt even though
> the slot was explicitly brought down via sysfs.  In a perfect world, we
> could avoid this by setting Link Disable on slot bringdown and
> re-enabling it upon a Presence Detect State change.  In reality however,
> there are broken hotplug ports which hardwire Presence Detect to zero,
> see 80696f991424 ("PCI: pciehp: Tolerate Presence Detect hardwired to
> zero").  Conversely, PCIe r1.0 hotplug ports hardwire Link Active to
> zero because Link Active Reporting wasn't specified before PCIe r1.1.
> On unplug, some ports first clear Presence then Link (see Stuart Hayes'
> splat) whereas others use the inverse order (see Dongdong Liu's splat).
> To top it off, there are hotplug ports which flap the Presence and Link
> bits on slot bringup, see 6c35a1ac3da6 ("PCI: pciehp: Tolerate initially
> unstable link").
> 
> pciehp is designed to work with all of these variants.  Surplus attempts
> at slot bringup are a lesser evil than not being able to bring up slots
> at all.  Although we could try to perfect the behavior for specific
> hotplug controllers, we'd risk breaking others or increasing code
> complexity.
> 
> But we can certainly minimize annoyance by emitting only a single
> message with KERN_INFO severity if bringup is unsuccessful:
> 
> * Drop the "Timeout waiting for Presence Detect" message in
>   pcie_wait_for_presence().  The sole caller of that function,
>   pciehp_check_link_status(), ignores the timeout and carries on.
>   It emits error messages of its own and I don't think this particular
>   message adds much value.
> 
> * There's a single error condition in pciehp_check_link_status() which
>   does not emit a message.  Adding one allows dropping the "Failed to
>   check link status" message emitted by board_added() if
>   pciehp_check_link_status() returns a non-zero integer.
> 
> * Tone down all messages in pciehp_check_link_status() to KERN_INFO
>   severity and rephrase them to look as innocuous as possible.  To this
>   end, move the message emitted by pcie_wait_for_link_delay() to its
>   callers.
> 
> As a result, Stuart Hayes' splat becomes:
> 
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Cannot train link: status 0x0001
> 
> Dongdong Liu's splat becomes:
> 
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): No link
> 
> The messages now merely serve as information that presence or link bits
> were set a little longer than expected.  Bringup failures which are not
> false positives are still reported, albeit no longer at KERN_ERR
> severity.
> 
> Link: https://lore.kernel.org/linux-pci/20200310182100.102987-1-stuart.w.hayes@gmail.com/
> Link: https://lore.kernel.org/linux-pci/1547649064-19019-1-git-send-email-liudongdong3@huawei.com/
> Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> Reported-by: Dongdong Liu <liudongdong3@huawei.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

  reply	other threads:[~2020-08-03 10:10 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28 11:24 [PATCH] PCI: pciehp: Reduce noisiness on hot removal Lukas Wunner
2020-08-03 10:10 ` Mika Westerberg [this message]
2020-09-17 21:24 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200803101040.GH1375436@lahna.fi.intel.com \
    --to=mika.westerberg@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=john.garry@huawei.com \
    --cc=kbusch@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=liudongdong3@huawei.com \
    --cc=lukas@wunner.de \
    --cc=stuart.w.hayes@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).