linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>, linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: pciehp: Don't enable slot unless link or presence up
Date: Thu, 9 Jul 2020 18:05:26 +0200	[thread overview]
Message-ID: <20200709160526.unwnsqqfcvd6wkux@wunner.de> (raw)
In-Reply-To: <d8c4c36a-9eb9-33c6-556a-2a2d55e127ec@gmail.com>

On Wed, Jul 08, 2020 at 05:39:49PM -0500, Stuart Hayes wrote:
> It's more than just the "Link Up" message.  When I "power down" an NVMe
> drive via sysfs (on a system that doesn't actually shut off the power to
> the slot), and then I physically remove the drive, I get this ugly string
> of messages:
> 
> pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
> pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
> pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
> pcieport 0000:3c:06.0: pciehp: Failed to check link status

I was just wondering if those messages could be avoided by setting
the Link Disable bit in the Link Control register if the slot
is brought down via sysfs.  The question is, if a new NVMe
drive is subsequently inserted into the slot, does the hotplug
controller signal a DLLSC event even though Link Disable is set?
Or at least a PDC event, whereupon we could clear Link Disable?

Or is there some other way to force the link into a state such that
it is off but DLLSC events continue to be received?

There's a function pciehp_link_enable() which can set or clear the
Link Disable flag.  But it's only ever invoked to clear it.  Hm, digging
in the git history shows that between 2012 and 2014 it was used to set
Link Disable when bringing down the slot, but b1811d2455f3 dropped that
because DLLSC was no longer received.

I guess we might be able to set Link Disable, then clear it once the
PDC event is received upon removal of the NVMe drive from the slot.
But I imagine that would complicate the state machine quite a bit.

Another idea:  If the hotplug controller is suspended to D3hot,
the link should be down but presence or link events should still
be received.  Sadly, runtime D3hot is disabled by default for
hotplug ports because it is known to not work on certain SkyLake
Xeon-SP systems and possibly others (see eb3b5bf1a88d).  You can
force it on by passing pcie_port_pm=force on the command line.


> Would you consider the patch if I rework it to separate the
> BLINKINGON_STATE so that its behavior isn't changed, and change the
> fake PDC events to PDC | DLLSC?

TBH I'd suggest to simply drop some of the messages and tone down the
remaining ones, in particular:

* Drop the "Timeout waiting for Presence Detect" message in
  pcie_wait_for_presence().  The sole caller of that function,
  pciehp_check_link_status(), doesn't bail out if it fails
  but continues and will emit an error message of its own.
  I don't think the message provides much additional value.

* Tone down the "link training error" message in
  pciehp_check_link_status() to ctrl_info().  That message at
  least prints the contents of the Link Status register, which
  may have some value.

* Add a message with ctrl_info() severity in the "if (!found)"
  clause in pciehp_check_link_status().  That way the function
  emits a message in all error cases.

* This in turn allows dropping the "Failed to check link status"
  message in board_added().

As a result you should just get two messages with KERN_INFO severity
which simply serve as an indication that DLLLA and PDS did not change
in unison.

With Thunderbolt, unplugging a daisy chain of devices always causes
a bunch of harmless "no response from device" messages with KERN_INFO
severity.  We never know if there's a real problem or not, so we try
to alert the user that there *may* be a problem, but at the same time
we should try to keep noisiness at a minimum.

Thanks,

Lukas

      reply	other threads:[~2020-07-09 16:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 18:21 [PATCH] PCI: pciehp: Don't enable slot unless link or presence up Stuart Hayes
2020-03-28 20:39 ` Bjorn Helgaas
2020-03-29 12:36 ` Lukas Wunner
2020-07-08 22:39   ` Stuart Hayes
2020-07-09 16:05     ` Lukas Wunner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200709160526.unwnsqqfcvd6wkux@wunner.de \
    --to=lukas@wunner.de \
    --cc=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=stuart.w.hayes@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).