From: Hinko Kocevar <hinko.kocevar@ess.eu>
To: Keith Busch <kbusch@kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: Recovering from AER: Uncorrected (Fatal) error
Date: Tue, 15 Dec 2020 13:56:21 +0100
Message-ID: <0def63a9-9a9e-440c-6bd8-7fd8dfef5b63@ess.eu>
In-Reply-To: <20201214212319.GB22809@redsun51.ssa.fujisawa.hgst.com>
Hi Keith,
On 12/14/20 10:23 PM, Keith Busch wrote:
> On Wed, Dec 09, 2020 at 11:55:07PM +0100, Hinko Kocevar wrote:
>> Adding a bunch of printk()'s to portdrv_pci.c led to (partial) success!
>>
>> So, the pcie_portdrv_error_detected() returns PCI_ERS_RESULT_CAN_RECOVER and
>> therefore the pcie_portdrv_slot_reset() is not called.
>>
>> But the pcie_portdrv_err_resume() is called! Adding these two lines to
>> pcie_portdrv_err_resume(), before the call to device_for_each_child():
>>
>> pci_restore_state(dev);
>> pci_save_state(dev);
>
> You need to do that with the current kernel or are you still using a
> 3.10? A more recent kernel shouldn't have needed such a fix after the
This was tested on the 5.9.12 kernel at that time. Today I re-ran the
tests on the pci/err branch of Bjorn's git tree as of Sunday (5.10.0,
I believe).
> following commit was introduced:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=874b3251113a1e2cbe79c24994dc03fe4fe4b99b
>
I noticed the change you are pointing out while trying to propose a patch.
It made me curious as to why pcie_portdrv_slot_reset() is not invoked.
After sprinkling a couple of printk()'s around pcie_do_recovery() and
the pcie_portdrv_err_handler callbacks, I can see that
pcie_portdrv_slot_reset() is never called from pcie_do_recovery()
because the status returned by reset_subordinates() (actually
aer_root_reset() from pcie/aer.c) is PCI_ERS_RESULT_RECOVERED.
I reckon that, for pcie_portdrv_slot_reset() to be invoked,
aer_root_reset() would have to return PCI_ERS_RESULT_NEED_RESET.
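To illustrate the gating described above, here is a minimal user-space
sketch. It is not the kernel code: the enum, the struct and
model_recovery_tail() are hypothetical names, and the sketch only models
the decision made on the status that reaches the slot_reset check. All
other branches of pcie_do_recovery(), including failure handling, are
left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum ers_result {
	ERS_RESULT_CAN_RECOVER,
	ERS_RESULT_NEED_RESET,
	ERS_RESULT_RECOVERED,
};

/* Records which port driver callbacks the model would broadcast. */
struct broadcast {
	bool slot_reset;
	bool resume;
};

/*
 * 'status' is the value that reaches the slot_reset check; in the case
 * described in this message it is the PCI_ERS_RESULT_RECOVERED coming
 * back from reset_subordinates().
 */
static struct broadcast model_recovery_tail(enum ers_result status)
{
	struct broadcast b = { false, false };

	if (status == ERS_RESULT_NEED_RESET) {
		b.slot_reset = true;
		/* Assumption: every slot_reset handler reports success. */
		status = ERS_RESULT_RECOVERED;
	}
	if (status == ERS_RESULT_RECOVERED)
		b.resume = true;
	return b;
}
```

With ERS_RESULT_RECOVERED as input, only the resume broadcast is set,
which matches what the printk()'s show: the resume handler runs and the
slot_reset handler is skipped.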
As soon as I add calls to pci_restore_state() and pci_save_state() to
pcie_portdrv_err_resume(), the bus and devices are operational again.
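A rough user-space mock of that workaround is sketched below. Everything
in it is hypothetical (struct mock_port and the mock_*() helpers are not
kernel APIs), and it rests on two assumptions of mine: that the reset
wipes the port's configuration, and that a saved state is consumed by a
restore, which is why the save follows the restore.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev: one register plus its copy. */
struct mock_port {
	uint16_t command;       /* live config-space value */
	uint16_t saved_command; /* copy taken by the save step */
	bool state_saved;       /* assumption: cleared by a restore */
};

static void mock_save_state(struct mock_port *p)
{
	p->saved_command = p->command;
	p->state_saved = true;
}

static void mock_restore_state(struct mock_port *p)
{
	if (!p->state_saved)
		return;
	p->command = p->saved_command;
	p->state_saved = false;
}

/* Assumption: the reset leaves the port with its config wiped. */
static void mock_bus_reset(struct mock_port *p)
{
	p->command = 0;
}

/* Mirrors the two lines added to the resume handler. */
static void mock_err_resume(struct mock_port *p)
{
	mock_restore_state(p); /* bring back the pre-error config */
	mock_save_state(p);    /* re-arm the saved copy for next time */
	/* the real handler would then walk the child devices */
}
```

In this mock, a port saved before mock_bus_reset() gets its register
back in mock_err_resume() and ends up with a valid saved copy again.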