linux-pci.vger.kernel.org archive mirror
From: Hinko Kocevar <hinko.kocevar@ess.eu>
To: Keith Busch <kbusch@kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: Recovering from AER: Uncorrected (Fatal) error
Date: Tue, 15 Dec 2020 13:56:21 +0100
Message-ID: <0def63a9-9a9e-440c-6bd8-7fd8dfef5b63@ess.eu>
In-Reply-To: <20201214212319.GB22809@redsun51.ssa.fujisawa.hgst.com>

Hi Keith,

On 12/14/20 10:23 PM, Keith Busch wrote:
> On Wed, Dec 09, 2020 at 11:55:07PM +0100, Hinko Kocevar wrote:
>> Adding a bunch of printk()'s to portdrv_pci.c led to (partial) success!
>>
>> So, the pcie_portdrv_error_detected() returns PCI_ERS_RESULT_CAN_RECOVER and
>> therefore the pcie_portdrv_slot_reset() is not called.
>>
>> But the pcie_portdrv_err_resume() is called! Adding these two lines to
>> pcie_portdrv_err_resume(), before the call to device_for_each_child():
>>
>>          pci_restore_state(dev);
>>          pci_save_state(dev);
> 
> You need to do that with the current kernel or are you still using a
> 3.10? A more recent kernel shouldn't have needed such a fix after the


This was tested on the 5.9.12 kernel at that time. As of today, I've
re-run the tests on Bjorn's git tree, pci/err branch from Sunday
(version 5.10.0, I guess).

> following commit was introduced:
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=874b3251113a1e2cbe79c24994dc03fe4fe4b99b
> 

I noticed the change you are pointing out when trying to propose a patch.

It made me curious as to why pcie_portdrv_slot_reset() is not invoked.

After sprinkling a couple of printk()'s around pcie_do_recovery() and
the pcie_portdrv_err_handler callbacks, I can observe that
pcie_portdrv_slot_reset() is never called from pcie_do_recovery() because
the status returned by reset_subordinates() (actually aer_root_reset()
from pcie/aer.c) is PCI_ERS_RESULT_RECOVERED.
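
The observation above can be sketched as a small user-space model. This is
not the kernel source: the enum, the struct and model_recovery() are
hypothetical names, and only the two gates described in this message are
modelled (slot_reset is broadcast only for a "need reset" status, resume
only for a "recovered" status). The real pcie_do_recovery() has further
checks that are left out here.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's pci_ers_result_t values. */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_RECOVERED,
	ERS_DISCONNECT,
};

/* Records which callbacks the modelled recovery path would broadcast. */
struct recovery_trace {
	bool slot_reset_called;
	bool resume_called;
};

/*
 * Hypothetical model: 'status' is the value held after
 * reset_subordinates() has returned. Only the gating described in the
 * message is reproduced; failure handling is omitted.
 */
struct recovery_trace model_recovery(enum ers_result status)
{
	struct recovery_trace t = { false, false };

	if (status == ERS_NEED_RESET) {
		t.slot_reset_called = true;
		/* Assumption: every driver's slot_reset callback succeeds. */
		status = ERS_RECOVERED;
	}

	if (status == ERS_RECOVERED)
		t.resume_called = true;

	return t;
}
```

Calling model_recovery(ERS_RECOVERED) leaves slot_reset_called false and
sets resume_called, which matches the printk() trace described above:
the err_resume handler runs, the slot_reset handler does not.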

I reckon that, in order to invoke pcie_portdrv_slot_reset(),
aer_root_reset() should have returned PCI_ERS_RESULT_NEED_RESET.

As soon as I plug the calls to pci_restore_state() and pci_save_state()
into pcie_portdrv_err_resume(), the bus and devices are operational
again.
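
A toy user-space model of why the two calls are paired, under the
assumption that restoring consumes the saved copy (the kernel's
pci_restore_state() does nothing when no state is saved and clears the
saved flag when it finishes). struct toy_dev and the toy_*() helpers are
hypothetical and only mimic that behaviour; they are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 16

/* Toy device: live config space plus one saved copy. */
struct toy_dev {
	uint32_t config[CFG_DWORDS];
	uint32_t saved[CFG_DWORDS];
	bool state_saved;
};

/* Copy the live config into the saved slot and arm it. */
void toy_save_state(struct toy_dev *d)
{
	memcpy(d->saved, d->config, sizeof(d->config));
	d->state_saved = true;
}

/* Write the saved copy back; a no-op unless a save is armed. */
void toy_restore_state(struct toy_dev *d)
{
	if (!d->state_saved)
		return;
	memcpy(d->config, d->saved, sizeof(d->config));
	d->state_saved = false;	/* the saved copy is consumed */
}

/* Model a link reset that wipes the live config space. */
void toy_reset(struct toy_dev *d)
{
	memset(d->config, 0, sizeof(d->config));
}
```

In this model, restore followed by save brings the configuration back
after a reset and re-arms the saved copy, so a later error can be
recovered the same way. A restore without the following save would leave
the next restore as a no-op.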
