Linux-PCI Archive on
 help / color / Atom feed
From: Mika Westerberg <>
To: Lukas Wunner <>
Cc: Bjorn Helgaas <>,
	Stuart Hayes <>,
	Dongdong Liu <>,
	John Garry <>,
	Keith Busch <>,
Subject: Re: [PATCH] PCI: pciehp: Reduce noisiness on hot removal
Date: Mon, 3 Aug 2020 13:10:40 +0300
Message-ID: <> (raw)
In-Reply-To: <>

On Tue, Jul 28, 2020 at 01:24:10PM +0200, Lukas Wunner wrote:
> When a PCIe card is hot-removed, the Presence Detect State and Data Link
> Layer Link Active bits often do not clear simultaneously.  I've seen
> delays of up to 244 msec between the two events with Thunderbolt.
> After pciehp has brought down the slot in response to the first event,
> the other bit may still be set.  It's not discernible whether it's set
> because a new card is already in the slot or if it will soon clear.
> So pciehp tries to bring up the slot and in the latter case fails with
> a bunch of messages, some of them at KERN_ERR severity.  The messages
> are false positives if the slot is no longer occupied and annoy users.
> Stuart Hayes reports the following splat on hot removal:
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
> KERN_ERR  pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
> KERN_ERR  pcieport 0000:3c:06.0: pciehp: Failed to check link status
> Dongdong Liu complains about a similar splat:
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
> KERN_INFO iommu: Removing device 0000:87:00.0 from group 12
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
> KERN_INFO pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
> KERN_ERR  pciehp 0000:80:10.0:pcie004: Failed to check link status
> Users are particularly irritated to see a bringup attempt even though
> the slot was explicitly brought down via sysfs.  In a perfect world, we
> could avoid this by setting Link Disable on slot bringdown and
> re-enabling it upon a Presence Detect State change.  In reality however,
> there are broken hotplug ports which hardwire Presence Detect to zero,
> see 80696f991424 ("PCI: pciehp: Tolerate Presence Detect hardwired to
> zero").  Conversely, PCIe r1.0 hotplug ports hardwire Link Active to
> zero because Link Active Reporting wasn't specified before PCIe r1.1.
> On unplug, some ports first clear Presence then Link (see Stuart Hayes'
> splat) whereas others use the inverse order (see Dongdong Liu's splat).
> To top it off, there are hotplug ports which flap the Presence and Link
> bits on slot bringup, see 6c35a1ac3da6 ("PCI: pciehp: Tolerate initially
> unstable link").
> pciehp is designed to work with all of these variants.  Surplus attempts
> at slot bringup are a lesser evil than not being able to bring up slots
> at all.  Although we could try to perfect the behavior for specific
> hotplug controllers, we'd risk breaking others or increasing code
> complexity.
> But we can certainly minimize annoyance by emitting only a single
> message with KERN_INFO severity if bringup is unsuccessful:
> * Drop the "Timeout waiting for Presence Detect" message in
>   pcie_wait_for_presence().  The sole caller of that function,
>   pciehp_check_link_status(), ignores the timeout and carries on.
>   It emits error messages of its own and I don't think this particular
>   message adds much value.
> * There's a single error condition in pciehp_check_link_status() which
>   does not emit a message.  Adding one allows dropping the "Failed to
>   check link status" message emitted by board_added() if
>   pciehp_check_link_status() returns a non-zero integer.
> * Tone down all messages in pciehp_check_link_status() to KERN_INFO
>   severity and rephrase them to look as innocuous as possible.  To this
>   end, move the message emitted by pcie_wait_for_link_delay() to its
>   callers.
> As a result, Stuart Hayes' splat becomes:
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
> KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Cannot train link: status 0x0001
> Dongdong Liu's splat becomes:
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
> KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): No link
> The messages now merely serve as information that presence or link bits
> were set a little longer than expected.  Bringup failures which are not
> false positives are still reported, albeit no longer at KERN_ERR
> severity.
> Link:
> Link:
> Reported-by: Stuart Hayes <>
> Reported-by: Dongdong Liu <>
> Signed-off-by: Lukas Wunner <>

Reviewed-by: Mika Westerberg <>

  reply index

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28 11:24 Lukas Wunner
2020-08-03 10:10 ` Mika Westerberg [this message]
2020-09-17 21:24 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-PCI Archive on

Archives are clonable:
	git clone --mirror linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ \
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone