linux-pci.vger.kernel.org archive mirror
From: Lukas Wunner <lukas@wunner.de>
To: Dongdong Liu <liudongdong3@huawei.com>
Cc: helgaas@kernel.org, lorenzo.pieralisi@arm.com,
	linux-pci@vger.kernel.org, linuxarm@huawei.com,
	john.garry@huawei.com,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [RFC PATCH] PCI: hotplug: Fix surprise removal report card present and link failed
Date: Thu, 17 Jan 2019 19:30:46 +0100
Message-ID: <20190117183046.gvwf7rwi7eqgtkbg@wunner.de>
In-Reply-To: <aeec5f59-1f41-65d0-b2c1-7a606e32e58b@huawei.com>

On Thu, Jan 17, 2019 at 08:07:13PM +0800, Dongdong Liu wrote:
> On 2019/1/16 22:22, Lukas Wunner wrote:
> > On Wed, Jan 16, 2019 at 10:31:04PM +0800, Dongdong Liu wrote:
> > > The lspci -tv topology is as below.
> > >  +-[0000:80]-+-00.0-[81]----00.0  Huawei Technologies Co., Ltd. Device 3714
> > >  |           +-02.0-[82]----00.0  Huawei Technologies Co., Ltd. Device 3714
> > >  |           +-04.0-[83]----00.0  Huawei Technologies Co., Ltd. Device 3714
> > >  |           +-06.0-[84]----00.0  Huawei Technologies Co., Ltd. Device 3714
> > >  |           +-10.0-[87]----00.0  Huawei Technologies Co., Ltd. Device 3714
> > > 
> > > Then surprise-remove the 87:00.0 NVMe SSD card. The messages are as below.
> > > 
> > > pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
> > > iommu: Removing device 0000:87:00.0 from group 12
> > > pciehp 0000:80:10.0:pcie004: Slot(36): Card present
> > > pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
> > > pciehp 0000:80:10.0:pcie004: Failed to check link status
> > 
> > What is the problem that you're trying to fix?  That these messages
> > are logged?  Or is there a bigger issue?  If the only problem are the
> > messages, then I feel that the current behavior is a feature, not a bug.
> > We could probably tone down the "Failed to check link status" message's
> > severity.  (Currently it's KERN_ERR, all the other messages are KERN_INFO.)
> 
> Yes, the only problem is the messages.  They don't look good:
> the card has been removed from the board, yet the messages still
> report card present and a failure to check link status.
> Only toning down the severity of the "Failed to check link status"
> message doesn't seem good enough.

Well, getting messages like this is par for the course with PCIe hotplug.

E.g. some older Thunderbolt controllers support only INTx on their
hotplug ports, not MSI.  If multiple such devices are daisy-chained,
they share an interrupt, so whenever a device is hot-removed, a
"pciehp_isr: no response from device" message is logged with KERN_INFO
severity because the hot-removed device is no longer accessible to its
interrupt handler.  The interrupt didn't come from the hot-removed
device, of course, but from another one further upstream in the
daisy chain where the plug event occurred.  We can't do much
better with such broken hardware.
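
To illustrate the shared-INTx case, here is a minimal userspace sketch,
not the actual pciehp code.  struct hp_port, port_isr() and
dispatch_shared_irq() are hypothetical names made up for this example.
It assumes that a config space read from a removed device comes back
with all bits set, which is what the handler treats as "no response":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a hotplug port; not a kernel structure. */
struct hp_port {
	bool accessible;      /* false once the device has been hot-removed */
	uint16_t slot_status; /* pending event bits, valid only if accessible */
};

enum irq_result { IRQ_NOT_MINE, IRQ_WAS_HANDLED, IRQ_NO_RESPONSE };

/* Assumption: reads from a vanished device complete with all bits set. */
static uint16_t read_slot_status(const struct hp_port *p)
{
	return p->accessible ? p->slot_status : 0xffff;
}

/* Per-port handler: called for every port sharing the interrupt line. */
static enum irq_result port_isr(struct hp_port *p)
{
	uint16_t status = read_slot_status(p);

	if (status == 0xffff)
		return IRQ_NO_RESPONSE; /* where "no response" would be logged */
	if (status == 0)
		return IRQ_NOT_MINE;    /* some other port raised the interrupt */

	p->slot_status = 0;             /* acknowledge the pending events */
	return IRQ_WAS_HANDLED;
}

/*
 * A shared INTx line cannot say which port fired, so every handler runs.
 * Returns the number of ports that handled an event and stores the number
 * of ports that did not respond in *no_response.
 */
static size_t dispatch_shared_irq(struct hp_port *ports, size_t n,
				  size_t *no_response)
{
	size_t handled = 0;

	assert(no_response != NULL);
	*no_response = 0;

	for (size_t i = 0; i < n; i++) {
		switch (port_isr(&ports[i])) {
		case IRQ_WAS_HANDLED:
			handled++;
			break;
		case IRQ_NO_RESPONSE:
			(*no_response)++;
			break;
		case IRQ_NOT_MINE:
			break;
		}
	}
	return handled;
}
```

With an upstream port that has a pending event and a downstream port
that was just unplugged, the upstream handler services the event while
the downstream handler can only report that its device is gone.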

The reason you're seeing these messages is that it takes an unusually
long time for the controller to clear the Presence Detect State bit
after a Data Link Layer State Changed event upon hot-removal.
That's arguably a quirk of the hardware you're dealing with.

pciehp cannot tell whether the Presence Detect State bit is set
because a new card is already present in the slot or because the bit
is lagging behind the hot-removal and will be cleared shortly.  The
protocol doesn't allow
for a clear disambiguation, so pciehp copes by optimistically trying
to bring up the slot, and giving up after a certain delay.
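
As a rough sketch of that behavior, under simplifying assumptions and
not the real pciehp code path: struct slot_model and try_bring_up()
below are hypothetical, time is simulated rather than slept, and the
1000 msec used in the usage note is taken from the log quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a slot after a Link Down event. */
struct slot_model {
	int pds_clears_at_ms; /* when Presence Detect State drops; -1 = never */
	int link_up_at_ms;    /* when Link Active gets set; -1 = never */
};

static bool presence_detected(const struct slot_model *s, int now_ms)
{
	return s->pds_clears_at_ms < 0 || now_ms < s->pds_clears_at_ms;
}

static bool link_active(const struct slot_model *s, int now_ms)
{
	return s->link_up_at_ms >= 0 && now_ms >= s->link_up_at_ms;
}

enum bringup_result { SLOT_EMPTY, SLOT_UP, SLOT_LINK_TIMEOUT };

/*
 * If presence is reported, optimistically wait for the link to come up.
 * A stale presence bit and a freshly inserted card look identical at
 * this point, so the only way out is to give up after timeout_ms.
 */
static enum bringup_result try_bring_up(const struct slot_model *s,
					int timeout_ms, int poll_ms)
{
	assert(timeout_ms >= 0 && poll_ms > 0);

	if (!presence_detected(s, 0))
		return SLOT_EMPTY;

	/* This is the point where "Card present" would be logged. */
	for (int now = 0; now <= timeout_ms; now += poll_ms) {
		if (link_active(s, now))
			return SLOT_UP;
	}

	/* Corresponds to "Link Active not set in ... msec". */
	return SLOT_LINK_TIMEOUT;
}
```

A slot whose presence bit clears only after 1500 msec and whose link
never comes up ends in SLOT_LINK_TIMEOUT with a 1000 msec timeout,
which matches the sequence of messages in the report.  A slot with a
new card whose link comes up after 200 msec ends in SLOT_UP.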

There is other quirky hardware out there which flaps the Presence
Detect State and Data Link Layer Link Active bits a couple of times
before they become stable, which is why pciehp needs to try for a
certain period to bring up the slot.

Again, we could probably tone down or remove some of the messages,
but that might make it harder to diagnose when something really
doesn't work.  It's Bjorn's call anyway.

Thanks,

Lukas

Thread overview: 4+ messages
2019-01-16 14:31 [RFC PATCH] PCI: hotplug: Fix surprise removal report card present and link failed Dongdong Liu
2019-01-16 14:22 ` Lukas Wunner
2019-01-17 12:07   ` Dongdong Liu
2019-01-17 18:30     ` Lukas Wunner [this message]