From: Yicong Yang <yangyicong@hisilicon.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Ethan Zhao <haifeng.zhao@intel.com>,
	Sinan Kaya <okaya@kernel.org>,
	Ashok Raj <ashok.raj@intel.com>,
	Keith Busch <kbusch@kernel.org>,
	<linux-pci@vger.kernel.org>,
	Russell Currey <ruscur@russell.cc>,
	Oliver O'Halloran <oohall@gmail.com>,
	Stuart Hayes <stuart.w.hayes@gmail.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Linuxarm <linuxarm@huawei.com>
Subject: Re: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by DPC
Date: Thu, 29 Apr 2021 19:29:59 +0800
Message-ID: <c7932c4e-81b1-279d-48df-5d621efff757@hisilicon.com>
In-Reply-To: <20210428144041.GA27967@wunner.de>

On 2021/4/28 22:40, Lukas Wunner wrote:
> On Wed, Apr 28, 2021 at 06:08:02PM +0800, Yicong Yang wrote:
>> I've tested the patch on our board, but the hotplug will still be
>> triggered sometimes.
>> seems the hotplug doesn't find the link down event is caused by dpc.
>> Any further test I can do?
>>
>> mestuary:/$ [12508.408576] pcieport 0000:00:10.0: DPC: containment event, status:0x1f21 source:0x0000
>> [12508.423016] pcieport 0000:00:10.0: DPC: unmasked uncorrectable error detected
>> [12508.434277] pcieport 0000:00:10.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Completer ID)
>> [12508.447651] pcieport 0000:00:10.0: device [19e5:a130] error status/mask=00008000/04400000
>> [12508.458279] pcieport 0000:00:10.0: [15] CmpltAbrt (First)
>> [12508.467094] pcieport 0000:00:10.0: AER: TLP Header: 00000000 00000000 00000000 00000000
>> [12511.152329] pcieport 0000:00:10.0: pciehp: Slot(0): Link Down
>
> Note that about 3 seconds pass between DPC trigger and hotplug link down
> (12508 -> 12511).
> That's most likely the 3 second timeout in my patch:
>
> +	/*
> +	 * Need a timeout in case DPC never completes due to failure of
> +	 * dpc_wait_rp_inactive().
> +	 */
> +	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
> +			   msecs_to_jiffies(3000));
>
> If DPC doesn't recover within 3 seconds, pciehp will consider the
> error unrecoverable and bring down the slot, no matter what.
>
> I can't tell you why DPC is unable to recover. Does it help if you
> raise the timeout to, say, 5000 msec?
>

I raised the timeout to 4s and it works well. I dumped the remaining
jiffies in the log and found that the recovery sometimes takes a bit
more than 3s:

[ 826.564141] pcieport 0000:00:10.0: DPC: containment event, status:0x1f01 source:0x0000
[ 826.579790] pcieport 0000:00:10.0: DPC: unmasked uncorrectable error detected
[ 826.591881] pcieport 0000:00:10.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Completer ID)
[ 826.608137] pcieport 0000:00:10.0: device [19e5:a130] error status/mask=00008000/04400000
[ 826.620888] pcieport 0000:00:10.0: [15] CmpltAbrt (First)
[ 826.638742] pcieport 0000:00:10.0: AER: TLP Header: 00000000 00000000 00000000 00000000
[ 828.955313] pcieport 0000:00:10.0: DPC: dpc_reset_link: begin reset
[ 829.719875] pcieport 0000:00:10.0: DPC: DPC reset has been finished.
[ 829.731449] pcieport 0000:00:10.0: DPC: remaining time for waiting dpc compelete: 0xd0 <-------- 208 jiffies remaining
[ 829.732459] ixgbe 0000:01:00.0: enabling device (0000 -> 0002)
[ 829.744535] pcieport 0000:00:10.0: pciehp: Slot(0): Link Down/Up ignored (recovered by DPC)
[ 829.993188] ixgbe 0000:01:00.1: enabling device (0000 -> 0002)
[ 830.760190] pcieport 0000:00:10.0: AER: device recovery successful
[ 831.013197] ixgbe 0000:01:00.0 eth0: detected SFP+: 5
[ 831.164242] ixgbe 0000:01:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 831.827845] ixgbe 0000:01:00.0 eth0: NIC Link is Down
[ 833.381018] ixgbe 0000:01:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX

CONFIG_HZ=250, so with the 4s timeout the remaining jiffies should be
larger than 250 if the recovery had finished within 3s. Here only 208
were left.

Is there a reference for the 3s timeout? And does it make sense to
raise it a little bit?

Thanks,
Yicong

> Thanks,
>
> Lukas
>
> .
>
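
The jiffies arithmetic in the message above can be checked with a small
userspace sketch. It assumes that the 0xd0 value in the debug line is
the return value of wait_event_timeout() (which returns the jiffies
left when the condition became true) and that the timeout had been
raised to 4000 msec for that run. The helper names are hypothetical and
are not kernel APIs; the conversion is a simplified stand-in for
msecs_to_jiffies().

```c
/* Hypothetical helpers for illustration only; not kernel APIs. */

/* Convert milliseconds to jiffies at tick rate hz, rounding up. */
unsigned long ms_to_jiffies_hz(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/* Convert jiffies back to milliseconds at tick rate hz. */
unsigned long jiffies_to_ms_hz(unsigned long j, unsigned long hz)
{
	return j * 1000 / hz;
}

/*
 * Assumption: "remaining" is what wait_event_timeout() returned, i.e.
 * the jiffies left of the timeout when the condition became true.
 * Time actually spent waiting is then timeout minus remaining.
 */
unsigned long elapsed_ms(unsigned long timeout_ms, unsigned long remaining,
			 unsigned long hz)
{
	unsigned long timeout = ms_to_jiffies_hz(timeout_ms, hz);

	return jiffies_to_ms_hz(timeout - remaining, hz);
}
```

With the values from the log, elapsed_ms(4000, 0xd0, 250) gives 3168,
so under these assumptions the wait lasted about 168 msec longer than
the original 3000 msec limit would have allowed.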