linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: poza@codeaurora.org
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sinan Kaya <okaya@kernel.org>, Bjorn Helgaas <helgaas@kernel.org>,
	Thomas Tai <thomas.tai@oracle.com>,
	bhelgaas@google.com, keith.busch@intel.com,
	linux-pci@vger.kernel.org, linux-pci-owner@vger.kernel.org,
	Sam Bobroff <sam.bobroff@au1.ibm.com>
Subject: Re: [PATCH 1/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed
Date: Mon, 20 Aug 2018 18:56:27 +0530	[thread overview]
Message-ID: <a149d9657ca83b41c724b0ddd319c6fb@codeaurora.org> (raw)
In-Reply-To: <2ecd1fd6d763810d45697f846fa876b58a193b1b.camel@kernel.crashing.org>

On 2018-08-20 16:52, Benjamin Herrenschmidt wrote:
> On Mon, 2018-08-20 at 13:26 +0530, poza@codeaurora.org wrote:
>>  now, I've sent a revert patch for the Documentation/ bit to
>> > Bjorn, and I have no (not yet at least) beef in what you do in
>> > drivers/pci/* ... however, that said, I think it would be great to move
>> > EEH toward having a bulk of the policy use common code as well.
>> >
>> 
>> In my opinion, I think revert patch can wait, because again we might
>> need to change it according to improvements we bring.
> 
> I doubt it will differ much from what's there ... anyway...
> 
>> > It will be long road, in part due to the historical crappyness of our
>> > EEH code, so my thinking is we should:
>> >
>> >  - First agree on what we want the policy to be. I need to read a bit
>> > more about DPC since that's new to me, it seems to be similar to what
>> > our EEH does, with slighty less granularity (we can freeze access to
>> > individual functions for example).
>> >
>> >  - Rework err.c to implement that policy with the existing AER and DPC
>> > code.
>> >
>> 
>> I went through eeh-pci-error-recovery.txt
>> If I pick up folowing
>> 
>> "
>> If the OS or device driver suspects that a PCI slot has been
>> EEH-isolated, there is a firmware call it can make to determine if 
>> this
>> is the case. If so, then the device driver should put itself into a
>> consistent state (given that it won't be able to complete any pending
>> work) and start recovery of the card.  Recovery normally would consist
>> of resetting the PCI device (holding the PCI #RST line high for two
>> seconds), followed by setting up the device config space (the base
>> address registers (BAR's), latency timer, cache line size, interrupt
>> line, and so on).  This is followed by a reinitialization of the 
>> device
>> driver.  In a worst-case scenario, the power to the card can be 
>> toggled,
>> at least on hot-plug-capable slots.  In principle, layers far above 
>> the
>> device driver probably do not need to know that the PCI card has been
>> "rebooted" in this way; ideally, there should be at most a pause in
>> Ethernet/disk/USB I/O while the card is being reset.
> 
> This verbiage is a bit old :-) A bunch of it predates PCIe.
> 
>> If the card cannot be recovered after three or four resets, the
>> kernel/device driver should assume the worst-case scenario, that the
>> card has died completely, and report this error to the sysadmin. In
>> addition, error messages are reported through RTAS and also through
>> syslogd (/var/log/messages) to alert the sysadmin of PCI resets. The
>> correct way to deal with failed adapters is to use the standard PCI
>> hotplug tools to remove and replace the dead card.
>> "
>> 
>> some differences: I find is:
>> Current framework does not attempt recovery 3-4 times.
>> 
>> Besides If I grasp eeh-pci-error-recovery.txt correctly, (although I
>> could be easily wrong),
>> mainly the difference is coming how we reset the device.
> 
> See below
> 
>> Linux uses SBR, while EEH seems to be using #PERST.
>> PERST signal implementation might be board specific, e.g. we have
>> io-port-expander sitting on i2c to drive #PERST.
>> we rely on SBR. and SBR also should be a good way to bring hotreset of
>> device.
>> "All Lanes in the configured Link transmit TS1 Ordered Sets with the 
>> Hot
>> Reset bit 15 asserted and the configured Link and Lane numbers."
>> although I am not sure between #PERST and SBR is there is any subtle
>> differences such as some sticky bits might not be affected by SBR.
>> but so far SBR has served well for us.
>> 
>> Also I see that eeh-pci-error-recovery.txt does not seem to 
>> distinguish
>> between fatal and nonfatal errors, is the behavior same for both type 
>> of
>> errors in EEH ?
> 
> Yeah so...
> 
>> I would like to mention is that there should nto be any need to reset
>> the card in case of NONFATAL because by definition link is functional,
>> only the particular transaction
>> had a problem. so not sure how EEH deals with ERR_NONFATAL.
> 
> As I believe I said earlier... NONFATAL means the link is still usable.
> It doesn't mean *anything* as far as the state of the device itself is
> concerned. The device microcode could have crashed for example etc...
> 
> That is why it's important to honor a driver requesting a reset.
> 
> Ignoring the verbiage in the documents for a minute, think of the big
> picture... the idea is that on error, there could be many actors
> involved in the recovery. The brigde, the link itself (NONFATAL vs.
> FATAL matters here), the device, the driver ....
> 
> So the design is that it's always permitted to perform a "deeper"
> action than strictly necessary. IE: If the link is ok, the device might
> still require a reset. If the device is ok, the platform might still
> enforce a reset....
> 
> There can be multiple devices or drivers involved in a recovery
> operation (for example if the error came from a switch).
> 
> The difference between FATAL and NON_FATAL is PCIe specific and only
> really matters from the perspective that if it's FATAL the core will
> require a reset. It doens't mean some other agent can't request one too
> when NON_FATAL errors occur.
> 
> As for EEH, *any* error that isn't purely informational results in the
> adapter being isolated physcially by some dedicated HW. It's possible
> to selectively un-isolate but the policy has generally been to just go
> through the reset sequence as it was generally deemed safer.
> 
> It's hard enough to properly test and validate recovery path, it's
> almost impossible to test and verify every way the HW can recover from
> a all type of non-fatal errors.
> 
> Thus by enforcing that on any error we simply reset the adapter provide
> a slower but much more robust recovery mechanism that is also a lot
> more testable.

Hi Ben,

I get the idea and get your points. and when I started implementation of 
this, I did ask these questions to myself.
but then we all fell aligned to what DPC was doing, because thats how 
the driver was dealing with ERR_FATAL.

I can put together a patch adhering to the idea, and modify err.c.
let me know if you are okay with that.
I have both DPC and AER capable root port, so I can easily validate the 
changes as we improve upon this.

besides We have to come to some common ground on policy.
1) if device driver dictates the reset we should agree to do SBR. 
(irrespective of the type of error)
2) under what circumstances framework will impose the reset, even if 
device driver did not !

probably changes might be simpler; although it will require to exercise 
some tests cases for both DPC and AER.
let me know if anything I missed here, but let me attempt to make patch 
and we can go form there.

Let me know you opinion.

Regards,
Oza.

> 
>> Also I am very much interested in knowing original intention of DPC
>> driver to unplug/plug devices,
>> all I remember in some conversation was:
>> hotplug capable bridge might have see devices changed, so it is safer 
>> to
>> remove/unplug the devices and during which .shutdown methods of driver
>> is called, in case of ERR_FATAL.
> 
> The device being "swapped" during an error recovery operation is ...
> unlikely. Do the bridges have a way to latch that the presence detect
> changed ?
> 
>> although DPC is HW recovery while AER is sw recovery both should
>> fundamentally act in the same way as far as device drivers callbacks 
>> are
>> concerned.
>> (again I really dont know real motivation behind this)
> 
> The main problem with unplug/replug (as I mentioned earlier) is that it
> just does NOT work for storage controllers (or similar type of
> devices). The links between the storage controller and the mounted
> filesystems is lost permanently, you'll most likely have to reboot the
> machine.
> 
> With our current EEH implementation we can successfully recover from
> fatal errors with the storage controller by resetting it.
> 
> Finally as for PERST vs. Hot Reset, this is an implementation detail.
> Not all our platforms can control PERST on all devices either, we
> generally prefer PERST as it provides a more reliable reset mechanism
> in practice (talking from experience) but will fallback to hot reset.
> 
> Cheers,
> Ben.
> 
>> 
>> Regards,
>> Oza.
>> 
>> >  - Figure out what hooks might be needed to be able to plumb EEH into
>> > it, possibly removing a bunch of crap in arch/powerpc (yay !)
>> >
>> > I don't think having a webex will be that practical with the timezones
>> > involved. I'm trying to get approval to go to Plumbers in which case we
>> > could setup a BOF but I have no guarantee at this point that I can make
>> > it happen.
>> >
>> > So let's try using email as much possible for now.
>> >
>> > Cheers,
>> > Ben.

  reply	other threads:[~2018-08-20 13:26 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-13 16:51 [PATCH 0/1] PCI/AER: prevent pcie_do_fatal_recovery from using device after it is removed Thomas Tai
2018-08-13 16:51 ` [PATCH 1/1] " Thomas Tai
2018-08-14  9:16   ` poza
2018-08-14  9:22     ` poza
2018-08-14 13:51       ` Thomas Tai
2018-08-15 14:57         ` poza
2018-08-15 15:02           ` Thomas Tai
2018-08-15 15:26             ` poza
2018-08-15 15:43               ` Thomas Tai
2018-08-15 15:59                 ` poza
2018-08-15 16:04                   ` Thomas Tai
2018-08-15 21:55       ` Benjamin Herrenschmidt
2018-08-15 21:56       ` Benjamin Herrenschmidt
2018-08-16  6:36         ` poza
2018-08-16  6:51           ` Benjamin Herrenschmidt
2018-08-16  6:59             ` Benjamin Herrenschmidt
2018-08-16  8:07               ` poza
2018-08-16  8:12                 ` Benjamin Herrenschmidt
2018-08-16  9:03                   ` poza
2018-08-16 10:07                     ` Benjamin Herrenschmidt
2018-08-16 14:11                       ` poza
2018-08-16 23:30                         ` Benjamin Herrenschmidt
2018-08-17 10:29                           ` poza
2018-08-17 10:44                             ` poza
2018-08-18  7:38                             ` Benjamin Herrenschmidt
2018-08-16  7:05             ` Benjamin Herrenschmidt
2018-08-16  7:15               ` Benjamin Herrenschmidt
2018-08-16  7:56                 ` poza
2018-08-16  8:10                   ` Benjamin Herrenschmidt
2018-08-16  8:05               ` poza
2018-08-16  8:15                 ` Benjamin Herrenschmidt
2018-08-16  8:22                   ` poza
2018-08-16  8:28                     ` Benjamin Herrenschmidt
2018-08-16 13:30                       ` Thomas Tai
2018-08-16 13:46                     ` Sinan Kaya
2018-08-16 23:27                       ` Benjamin Herrenschmidt
2018-08-17  6:35                         ` poza
2018-08-19  2:24                         ` Bjorn Helgaas
2018-08-20  5:09                           ` poza
2018-08-20  5:15                             ` Benjamin Herrenschmidt
2018-08-20 13:02                               ` Thomas Tai
2018-08-20 13:27                                 ` Benjamin Herrenschmidt
2018-08-19  2:19               ` Bjorn Helgaas
2018-08-19 21:41                 ` Sinan Kaya
2018-08-20  2:03                   ` Benjamin Herrenschmidt
2018-08-20  5:19                     ` poza
2018-08-20  5:33                       ` Benjamin Herrenschmidt
2018-08-20  7:56                         ` poza
2018-08-20 11:22                           ` Benjamin Herrenschmidt
2018-08-20 13:26                             ` poza [this message]
2018-08-20 21:02                               ` Benjamin Herrenschmidt
2018-08-21  5:14                                 ` poza
2018-08-21  6:06                                   ` Benjamin Herrenschmidt
2018-08-21 14:37                                     ` Keith Busch
2018-08-21 15:07                                       ` Sinan Kaya
2018-08-21 15:29                                         ` Keith Busch
2018-08-21 15:50                                           ` Sinan Kaya
2018-08-21 15:55                                             ` Sinan Kaya
2018-08-21 16:44                                             ` Keith Busch
2018-08-21 15:30                                       ` poza
2018-08-21 21:14                                       ` Benjamin Herrenschmidt
2018-08-21 22:04                                         ` Keith Busch
2018-08-21 22:33                                           ` Keith Busch
2018-08-21 23:06                                           ` Benjamin Herrenschmidt
2018-08-21 23:13                                             ` Keith Busch
2018-08-22  0:36                                               ` Benjamin Herrenschmidt
2018-08-30  0:01                                             ` Keith Busch
2018-08-30  0:10                                               ` Sinan Kaya
2018-08-30  0:46                                                 ` Keith Busch
2018-08-30  4:26                                               ` Benjamin Herrenschmidt
2018-08-20 15:53                             ` Keith Busch
2018-08-20 16:13                               ` poza
2018-08-20 16:32                                 ` Keith Busch
2018-08-20 21:05                               ` Benjamin Herrenschmidt
2018-08-20 21:21                                 ` Sinan Kaya
2018-08-20 21:35                                   ` Keith Busch
2018-08-20 21:53                                     ` Benjamin Herrenschmidt
2018-08-20 22:02                                       ` Sinan Kaya
2018-08-20 22:04                                         ` Benjamin Herrenschmidt
2018-08-20 22:13                                           ` Sinan Kaya
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-22  9:13                                             ` Lukas Wunner
2018-08-22 14:38                                               ` Keith Busch
2018-08-22 14:51                                                 ` Sinan Kaya
2018-08-20 22:13                                           ` Keith Busch
2018-08-20 22:19                                             ` Benjamin Herrenschmidt
2018-08-21  1:30                                               ` Keith Busch
2018-08-20  4:37                 ` Benjamin Herrenschmidt
2018-08-20  4:39                 ` PATCH] Partial revert of "PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices" Benjamin Herrenschmidt
2018-08-21 19:50                   ` Bjorn Helgaas
2018-08-22  4:35                     ` poza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a149d9657ca83b41c724b0ddd319c6fb@codeaurora.org \
    --to=poza@codeaurora.org \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=keith.busch@intel.com \
    --cc=linux-pci-owner@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=okaya@kernel.org \
    --cc=sam.bobroff@au1.ibm.com \
    --cc=thomas.tai@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).