linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: stuart hayes <stuart.w.hayes@gmail.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ethan Zhao <haifeng.zhao@intel.com>,
	Sinan Kaya <okaya@kernel.org>, Ashok Raj <ashok.raj@intel.com>,
	Keith Busch <kbusch@kernel.org>,
	Yicong Yang <yangyicong@hisilicon.com>,
	linux-pci@vger.kernel.org, Russell Currey <ruscur@russell.cc>,
	Oliver OHalloran <oohall@gmail.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC
Date: Tue, 20 Jul 2021 17:11:07 -0500	[thread overview]
Message-ID: <f092c1fd-17fb-9714-ec09-9e5513364a2b@gmail.com> (raw)
In-Reply-To: <20210720065718.GA21556@wunner.de>



On 7/20/2021 1:57 AM, Lukas Wunner wrote:
> On Mon, Jul 19, 2021 at 02:00:51PM -0500, stuart hayes wrote:
>> On 7/19/2021 10:10 AM, Lukas Wunner wrote:
>>> Could you test if the below patch fixes the issue?
>>
>> That does appear to fix the issue, thanks!  Without your patch, the PCIe
>> devices under 64:02.0 disappear (the triggered bit is still set in the DPC
>> capability).  With your patch, recovery is successful and all of the PCIe
>> devices are still there.
> 
> Thanks for testing.
> 
> The test patch clears DLLSC because the Hot Reset that is propagated
> down the hierarchy causes the link to flap.  I'm wondering though if
> that's sufficient or if PDC needs to be cleared as well.  According
> to PCIe Base Spec sec. 4.2.6, LTSSM transitions from "Hot Reset" state
> to "Detect", then "Polling".  If I understand the table "Link Status
> Mapped to the LTSSM" in the spec correctly, in-band presence is 0b
> in Detect state, hence I'd expect PDC to flap as well as a result of
> a Hot Reset being propagated down the hierarchy.
> 

I think the table "Link Status Mapped to the LTSSM" is saying that when 
in-band presence is 0, the LTSSM state must be "Detect" (not that being 
in "Detect" will force in-band presence to zero).

I would not expect PDC to flap since the presence detect (even in-band) 
should not go away during hot reset.

On the system I'm using, I modified the kernel to read and print the 
slot status register right before your test patch clears DLLSC, and it 
reads 0x140 (link status changed, presence is detected, but PDC is not set).

> Does the hotplug port at 0000:68:00.0 support In-Band Presence Disable?
> That would explain why only clearing DLLSC is sufficient.
> 

No... the slot capabilities 2 register is 0.

> The problem is, if PDC is cleared as well, we lose the ability to
> detect that a device was hot-removed while the reset was ongoing,
> which is unfortunate.
> 

Agreed, but I don't think PDC should get set on hot reset.

> If an error is handled by aer_root_reset() (instead of dpc_reset_link())
> and the reset is performed at a hotplug port, then pciehp_reset_slot()
> is invoked:
> 
> aer_root_reset()
>    pci_bus_error_reset()
>      pci_slot_reset()
>        pci_reset_hotplug_slot()
>          pciehp_reset_slot()
> 
> pciehp_reset_slot() temporarily masks both DLLSC *and* PDC events,
> then performs a Secondary Bus Reset at the hotplug port.
> 
> If there are further hotplug ports below that hotplug port
> where the SBR is performed, my expectation is that the Hot Reset
> is likewise propagated down the hierarchy (just as with DPC),
> so those cascaded hotplug ports should also see their link go down.
> 
> In other words, the issue you're seeing isn't really DPC-specific.
> However, the test patch should fix the issue for AER-handled errors
> as well.  Do you agree with this analysis or did I miss anything?
> 

That looks correct to me.

> Thanks,
> 
> Lukas
> 

      reply	other threads:[~2021-07-20 22:11 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-01  8:29 [PATCH v2] PCI: pciehp: Ignore Link Down/Up caused by DPC Lukas Wunner
2021-05-01  8:38 ` Lukas Wunner
2021-06-16 22:19 ` Bjorn Helgaas
2021-06-20  7:38   ` Lukas Wunner
2021-06-25 20:38     ` stuart hayes
2021-06-26  6:50       ` Lukas Wunner
2021-07-06 22:15         ` stuart hayes
2021-07-18 21:26           ` Lukas Wunner
2021-07-19 15:10       ` Lukas Wunner
2021-07-19 19:00         ` stuart hayes
2021-07-20  6:57           ` Lukas Wunner
2021-07-20 22:11             ` stuart hayes [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f092c1fd-17fb-9714-ec09-9e5513364a2b@gmail.com \
    --to=stuart.w.hayes@gmail.com \
    --cc=ashok.raj@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=haifeng.zhao@intel.com \
    --cc=helgaas@kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=okaya@kernel.org \
    --cc=oohall@gmail.com \
    --cc=ruscur@russell.cc \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=yangyicong@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).