linux-pci.vger.kernel.org archive mirror
From: stuart hayes <stuart.w.hayes@gmail.com>
To: Lukas Wunner <lukas@wunner.de>, Bjorn Helgaas <helgaas@kernel.org>
Cc: Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ethan Zhao <haifeng.zhao@intel.com>,
	Sinan Kaya <okaya@kernel.org>, Ashok Raj <ashok.raj@intel.com>,
	Keith Busch <kbusch@kernel.org>,
	Yicong Yang <yangyicong@hisilicon.com>,
	linux-pci@vger.kernel.org, Russell Currey <ruscur@russell.cc>,
	Oliver OHalloran <oohall@gmail.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
Date: Fri, 25 Jun 2021 15:38:41 -0500
Message-ID: <08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com>
In-Reply-To: <20210620073804.GA13118@wunner.de>



On 6/20/2021 2:38 AM, Lukas Wunner wrote:
> On Wed, Jun 16, 2021 at 05:19:45PM -0500, Bjorn Helgaas wrote:
>> On Sat, May 01, 2021 at 10:29:00AM +0200, Lukas Wunner wrote:
>>> Downstream Port Containment (PCIe Base Spec, sec. 6.2.10) disables the
>>> link upon an error and attempts to re-enable it when instructed by the
>>> DPC driver.
>>>
>>> A slot which is both DPC- and hotplug-capable is currently brought down
>>> by pciehp once DPC is triggered (due to the link change) and brought up
>>> on successful recovery.  That's undesirable, the slot should remain up
>>> so that the hotplugged device remains bound to its driver.
>>
>> I think the slot being "brought down" means slot power is turned off,
>> right?
>>
>> I reworded it along those lines and applied this to pci/hotplug for
>> v5.14, thanks!
> 
> Thanks, the reworded commit message LGTM and is more readable.
> 
> "Being brought down" is just a colloquial term for pciehp_disable_slot(),
> i.e. unbinding and removal of the pci_dev's below the hotplug port,
> removing slot power, turning off the power LED and setting the slot's
> state to OFF_STATE.
> 
> Indeed, turning off slot power concurrently to DPC recovery is wrong
> and likely the biggest contributor to the problems seen.
> 
> Another issue is that after bringing down the slot due to the Link Change
> event, pciehp will notice that Presence Detect State is set and will try
> to bring the slot up again, even though DPC recovery may not have completed
> yet.
> 
> The commit should solve all those synchronization issues between pciehp
> and DPC.
> 
> Thanks,
> 
> Lukas
> 
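As a reading aid for the description quoted above, here is a toy user-space model in C of what "bringing down" a slot involves, and of Link Down handling with and without ignoring DPC-caused events. Everything in it (`struct toy_slot`, `toy_disable_slot()`, `toy_handle_link_down()`) is hypothetical and simplified. It is not pciehp's code and models only the ordering described in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical and does not mirror
 * the real pciehp data structures or functions. */
enum toy_slot_state { TOY_OFF_STATE, TOY_ON_STATE };

struct toy_slot {
	enum toy_slot_state state;
	bool power_on;      /* slot power */
	bool power_led_on;  /* power indicator */
	int bound_devices;  /* pci_devs below the hotplug port */
};

/* The steps of "bringing down" a slot, in the order described above. */
static void toy_disable_slot(struct toy_slot *slot)
{
	assert(slot != NULL);
	slot->bound_devices = 0;    /* unbind and remove child devices */
	slot->power_on = false;     /* remove slot power */
	slot->power_led_on = false; /* turn off the power LED */
	slot->state = TOY_OFF_STATE;
}

/*
 * Handle a Link Down event.  Returns true if the handler would then try
 * to bring the slot up again because Presence Detect State is still set.
 * With ignore_dpc (a simplification of the behavior named in the patch
 * subject), a Link Down caused by DPC leaves the slot untouched.
 */
static bool toy_handle_link_down(struct toy_slot *slot, bool caused_by_dpc,
				 bool ignore_dpc, bool presence_detected)
{
	if (caused_by_dpc && ignore_dpc)
		return false;       /* slot stays up, driver stays bound */
	toy_disable_slot(slot);
	/* Re-enable is attempted even if DPC recovery has not finished. */
	return presence_detected;
}
```

Without the ignore path, a DPC-caused Link Down both tears the slot down and immediately queues a re-enable attempt, which is the synchronization problem the quoted text describes.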

Lukas--

I have a system that fails to recover after an EDR event, with (or
without) this patch.  The problem looks similar to the one this patch is
trying to fix, except that on my system the hotplug port is downstream
of the root port that has DPC, so the "link down" event on it is not
ignored.  The hotplug code therefore disables the slot (which contains
an NVMe device on this system) while the nvme driver is trying to use
it.  That causes a failed recovery and another EDR event, and the kernel
ends up with the DPC Trigger Status bit set in the root port, so
everything downstream is gone.
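The topology difference can be sketched with a toy C model. The types and helpers (`struct toy_port`, `self_in_dpc_recovery()`, `any_ancestor_in_dpc_recovery()`) are hypothetical and are not kernel code. A check that looks only at the hotplug port itself misses DPC recovery on an upstream root port, while walking up the hierarchy would see it. This illustrates the gap; it is not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy type; not struct pci_dev. */
struct toy_port {
	struct toy_port *upstream; /* NULL for the root port */
	bool has_dpc;
	bool dpc_recovering;
};

/* Looks only at the port that saw the link event. */
static bool self_in_dpc_recovery(const struct toy_port *port)
{
	assert(port != NULL);
	return port->has_dpc && port->dpc_recovering;
}

/* Hypothetical alternative: also consider every upstream port. */
static bool any_ancestor_in_dpc_recovery(const struct toy_port *port)
{
	for (; port != NULL; port = port->upstream)
		if (port->has_dpc && port->dpc_recovering)
			return true;
	return false;
}
```

For example, with a root port that has DPC, a switch upstream port below it, and the hotplug-capable downstream port below that, only the ancestor walk reports that recovery is in progress.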

I added the hack below so that the hotplug code ignores "link down"
events on ports downstream of the root port during DPC recovery, and
with it the system recovers without a problem.  (I'm not proposing this
as a correct fix.)

Does this sound like a real issue, or am I possibly misunderstanding 
something about how this should work?

Thanks
Stuart

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b576aa890c76..dfd983c3c5bf 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -119,8 +132,10 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
 		!dev->driver->err_handler->slot_reset)
 		goto out;
 
+	set_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
 	err_handler = dev->driver->err_handler;
 	vote = err_handler->slot_reset(dev);
+	clear_bit(PCI_DPC_RECOVERING, &dev->priv_flags);
 	*result = merge_result(*result, vote);
 out:
 	device_unlock(&dev->dev);
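The set/clear bracket in this hunk can be modeled in user space with C11 atomics standing in for the kernel's `set_bit()`/`clear_bit()`/`test_bit()`. The bit number `TOY_DPC_RECOVERING`, the callback type, and `toy_should_ignore_link_down()` are hypothetical: the hunk does not show where `PCI_DPC_RECOVERING` is defined or how the hotplug side consults it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit number: the hunk does not show where
 * PCI_DPC_RECOVERING is defined or what value it has. */
enum { TOY_DPC_RECOVERING = 0 };

/* User-space stand-ins for the kernel's atomic bit operations. */
static void toy_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

static void toy_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

static bool toy_test_bit(int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}

/* What a (hypothetical) hotplug-side check could consult. */
static bool toy_should_ignore_link_down(atomic_ulong *priv_flags)
{
	return toy_test_bit(TOY_DPC_RECOVERING, priv_flags);
}

typedef int (*toy_slot_reset_fn)(atomic_ulong *priv_flags);

/* Mirrors the hunk: the flag is set only while slot_reset() runs. */
static int toy_report_slot_reset(atomic_ulong *priv_flags,
				 toy_slot_reset_fn slot_reset)
{
	int vote;

	assert(slot_reset != NULL);
	toy_set_bit(TOY_DPC_RECOVERING, priv_flags);
	vote = slot_reset(priv_flags);
	toy_clear_bit(TOY_DPC_RECOVERING, priv_flags);
	return vote;
}

/* Example driver callback: returns 1 if a link-down check made during
 * the reset would have been ignored. */
static int toy_driver_slot_reset(atomic_ulong *priv_flags)
{
	return toy_should_ignore_link_down(priv_flags) ? 1 : 0;
}
```

In the hunk the flag lives on `dev`, the device whose driver's `slot_reset()` is being called, and is held only for the duration of that callback, so a link event handled outside that window would not see it.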

