[PATCH v2 0/4] s390/pci: automatic error recovery

From: Niklas Schnelle @ 2021-09-16  9:33 UTC
  To: Bjorn Helgaas
  Cc: Linas Vepstas, Oliver O'Halloran, linux-kernel, linux-s390,
	linux-pci, Matthew Rosato, Pierre Morel

Hello,

This series implements automatic error recovery for PCI devices on s390,
following the scheme outlined in Documentation/PCI/pci-error-recovery.rst.
It applies on top of current v5.15-rc1.

The patches are almost entirely s390 specific. The one exception is a patch
exporting existing functionality for use by arch/s390/pci/ code, which is down
from two common code changes in the first version of this series. Nevertheless
I would appreciate any feedback.

The outline of the patches is as follows:

Patches 1 and 2 add s390 specific code implementing a reset mechanism that
takes the PCI function out of the platform specific error state.

Patch 3 "PCI: Export pci_dev_lock()" is basically an extension to commit
e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which
already exported pci_dev_trylock(). In the final patch we make use of
pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished
before starting recovery.

Finally, patch 4 implements the recovery flow as part of the existing s390
specific PCI availability and error event mechanism. Previously, the error
event handler only set pdev->error_state and required manual intervention to
make the device usable again. Now we handle the case where firmware has already
reset a PCI function after an error was encountered and informs the OS that the
function should be ready to be used again. In that case, if the driver supports
error recovery, we use it to transparently reset the device or simply take it
out of the error state and then, if possible, let the driver resume operations.
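
To make the flow described above easier to follow, here is a minimal userspace
sketch. It is not the code from patch 4. All names (model_dev, model_driver,
model_recover(), the ERS_* values and the demo_* callbacks) are hypothetical
and only loosely mirror the callbacks described in
Documentation/PCI/pci-error-recovery.rst. Giving up when no driver is bound, or
when a driver asks for a reset but has no slot_reset callback, is an assumption
of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; loosely mirrors the idea of pci_ers_result_t. */
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Hypothetical stand-in for a driver's error recovery callbacks. */
struct model_driver {
	enum ers_result (*error_detected)(void);
	enum ers_result (*slot_reset)(void);
	void (*resume)(void);
};

struct model_dev {
	const struct model_driver *driver;	/* NULL: no driver bound */
	bool in_error_state;
	int resets;
};

/* Hypothetical stand-in for the platform reset added by patches 1 and 2. */
static void model_reset_function(struct model_dev *dev)
{
	dev->resets++;
	dev->in_error_state = false;
}

/* Take the function out of the error state without resetting it. */
static void model_clear_error(struct model_dev *dev)
{
	dev->in_error_state = false;
}

/* Returns true if the device is usable again after the event. */
static bool model_recover(struct model_dev *dev)
{
	const struct model_driver *drv;
	enum ers_result res;

	assert(dev != NULL);
	drv = dev->driver;

	/* Assumption: without driver support the model leaves the device alone. */
	if (!drv || !drv->error_detected)
		return false;

	res = drv->error_detected();
	if (res == ERS_DISCONNECT)
		return false;

	if (res == ERS_NEED_RESET) {
		/* The driver asked for a reset: reset, then let it reinitialize. */
		model_reset_function(dev);
		res = drv->slot_reset ? drv->slot_reset() : ERS_DISCONNECT;
	} else {
		/* No reset needed: simply leave the error state. */
		model_clear_error(dev);
		res = ERS_RECOVERED;
	}

	if (res != ERS_RECOVERED)
		return false;

	if (drv->resume)
		drv->resume();
	return true;
}

/* Demo driver that always requests a reset and then recovers. */
static int demo_resumed;
static enum ers_result demo_error_detected(void) { return ERS_NEED_RESET; }
static enum ers_result demo_slot_reset(void) { return ERS_RECOVERED; }
static void demo_resume(void) { demo_resumed++; }

static const struct model_driver demo_driver = {
	.error_detected = demo_error_detected,
	.slot_reset = demo_slot_reset,
	.resume = demo_resume,
};
```

With demo_driver bound, model_recover() resets the device once, clears the
error state and calls the resume callback. With no driver bound it returns
false and leaves the device in the error state.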

Note that the same event is also issued by the hypervisor if the function was
previously taken into a service mode, for example for a firmware upgrade via
the hypervisor, and is now ready to be used again.

Changes since v1:
- Dropped the patch moving pci_dev_is_added(); we can rely on pdev->driver
  being unset for a device that has already been removed or not yet
  initialized. While I believe pci_dev_is_added() would still be a cleaner
  check, we need to check for a bound driver anyway and that is sufficient.
- Adapted the hotplug_slot_ops::reset_slot() signature to current upstream,
  taking a bool instead of an int
- Added missing parameter documentation and reworded some comments
- Reworded some debug/info messages

Thanks,
Niklas Schnelle

Niklas Schnelle (4):
  s390/pci: refresh function handle in iomap
  s390/pci: implement reset_slot for hotplug slot
  PCI: Export pci_dev_lock()
  s390/pci: implement minimal PCI error recovery

 arch/s390/include/asm/pci.h        |   6 +-
 arch/s390/pci/pci.c                | 143 +++++++++++++++++++++-
 arch/s390/pci/pci_event.c          | 188 ++++++++++++++++++++++++++++-
 arch/s390/pci/pci_insn.c           |   4 +-
 arch/s390/pci/pci_irq.c            |   9 ++
 drivers/pci/hotplug/s390_pci_hpc.c |  24 ++++
 drivers/pci/pci.c                  |   3 +-
 include/linux/pci.h                |   1 +
 8 files changed, 366 insertions(+), 12 deletions(-)

-- 
2.25.1