linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v4 0/11] EEH Followup Fixes (II)
@ 2013-07-24  2:24 Gavin Shan
From: Gavin Shan @ 2013-07-24  2:24 UTC
  To: linuxppc-dev; +Cc: Gavin Shan

This patch series was initially based on linux-powerpc-next and is intended to
resolve the following problems:

	- On the pSeries platform, EEH doesn't work after PHB hotplug
	  with "drmgr". The root cause is that the EEH resources
	  (EEH devices, EEH caches) aren't released correctly. To fix
	  that, we add one hook (pcibios_stop_dev), which is called
	  on pci_stop_and_remove_device(). In pcibios_stop_dev(), we
	  release the EEH resources.
	- We also need to put the domain (PE or PHB) into a quiet state
	  while resetting that domain. However, some devices in the
	  domain might not have EEH-sensitive drivers, or might have no
	  driver at all. Those devices can't be put into a quiet state
	  and may keep issuing PCI-CFG or MMIO requests while the domain
	  is being reset. That can cause the reset to fail and,
	  eventually, EEH recovery to fail. To address this, we introduce
	  so-called "partial hotplug": devices without a driver, or
	  without an EEH-sensitive driver, are removed before the reset
	  and plugged (probed) back into the system after the reset.
	- We need to traverse the EEH devices of one specific PE with the
	  safe variant of the list traversal function, because an EEH
	  device might be removed during the iteration.
	- When plugging a PCI bus, we need to check whether the resources
	  of subordinate devices have to be reassigned
	  (PCI_REASSIGN_ALL_RSRC) and do that accordingly.
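
To illustrate why the safe traversal matters for partial hotplug, here is a
minimal userspace sketch, not code from the patches. All names (demo_pe,
demo_edev, demo_pe_remove_unaware) are hypothetical, and a plain singly linked
list stands in for the kernel's list helpers. The point is only that the next
pointer is saved before the current entry may be freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an EEH device linked into a PE's device list. */
struct demo_edev {
	int config_addr;
	int has_eeh_driver;	/* non-zero if bound to an EEH-sensitive driver */
	struct demo_edev *next;
};

/* Hypothetical stand-in for a PE owning a list of EEH devices. */
struct demo_pe {
	struct demo_edev *edevs;
};

/* Add a device at the head of the PE's list; returns 0 on success. */
int demo_pe_add(struct demo_pe *pe, int config_addr, int has_eeh_driver)
{
	struct demo_edev *edev;

	assert(pe != NULL);
	edev = malloc(sizeof(*edev));
	if (!edev)
		return -1;
	edev->config_addr = config_addr;
	edev->has_eeh_driver = has_eeh_driver;
	edev->next = pe->edevs;
	pe->edevs = edev;
	return 0;
}

/*
 * Remove every device without an EEH-sensitive driver, as partial hotplug
 * would do before a reset. "tmp" holds the next entry so that freeing the
 * current one does not break the walk. Returns the number removed.
 */
int demo_pe_remove_unaware(struct demo_pe *pe)
{
	struct demo_edev **link = &pe->edevs;
	struct demo_edev *edev, *tmp;
	int removed = 0;

	for (edev = pe->edevs; edev; edev = tmp) {
		tmp = edev->next;	/* saved before edev may be freed */
		if (!edev->has_eeh_driver) {
			*link = tmp;	/* unlink the current entry */
			free(edev);
			removed++;
		} else {
			link = &edev->next;
		}
	}
	return removed;
}

/* Count the devices still attached to the PE. */
int demo_pe_count(const struct demo_pe *pe)
{
	const struct demo_edev *edev;
	int n = 0;

	for (edev = pe->edevs; edev; edev = edev->next)
		n++;
	return n;
}
```

A walk that read edev->next only after the free would touch released memory,
which is the class of problem the safe variant avoids.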

The patchset has been verified on the pSeries and PowerNV platforms:

pSeries Platform:

drmgr -c phb -r -s "PHB 513"
drmgr -c phb -a -s "PHB 513"
errinjct eeh -f 1 -s net/eth2

PowerNV Platform:

cd /sys/devices/pci0005:00/0005:00:00.0/0005:01:00.0/0005:02:08.0/0005:80:00.0/0005:90:01.0
while true; do od -x config > /dev/null; sleep 1; done
echo 1 > /sys/kernel/debug/powerpc/PCI0005/err_injct

---

v3 -> v4:
	* Add some comments to explain why we needn't check the return
	  value of pci_scan_slot() in pcibios_add_pci_devices().
	* Check PCI_PROBE_ONLY while assigning those unassigned resources
	  in pcibios_finish_adding_to_bus().
v2 -> v3:
	* Make pcibios_add_pci_devices() support "partial" hotplug
	  according to Ben's comments. arch/powerpc/kernel/pci_of_scan.c
	  has been adjusted for that.
	* Use pcibios_add_pci_devices() to do "partial" hotplug inside
	  eeh_reset_device().
	* Introduce flag EEH_DEV_SYSFS to track the state of the sysfs
	  entries of the EEH device (and thus the PCI device) to avoid a
	  race condition during "partial" hotplug.
v1 -> v2:
	* Rebase to 3.11-rc1 in order to use pcibios_release_device().
	* Use pcibios_release_device() to release EEH cache and detach
	  EEH device from PCI device.
	* Remove reference to PCI device in EEH cache since we're relying
	  on pcibios_release_device().
	* The PCI device instance (struct pci_dev) isn't available during
	  BAR restore, so avoid using it at that time.
	* Fix unbalanced enable for IRQ in eeh_driver.c
	* Retest the series of patches on Firebird-L/VPL3/VPL4
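
As a rough picture of the EEH_DEV_SYSFS idea from the v2 -> v3 notes, the
sketch below uses a flag bit to make adding and removing sysfs entries
idempotent. It is an assumption-laden illustration, not the patch itself: the
bit value, the demo_* names, and the counter standing in for real sysfs files
are all hypothetical, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit value, for illustration only. */
#define DEMO_EEH_DEV_SYSFS	(1 << 0)

/* Hypothetical stand-in for an EEH device. */
struct demo_eeh_dev {
	int mode;		/* flag bits */
	int sysfs_files;	/* stands in for the created sysfs entries */
};

/* Create the entries only if not already present; returns 1 if created. */
int demo_sysfs_add(struct demo_eeh_dev *edev)
{
	assert(edev != NULL);
	if (edev->mode & DEMO_EEH_DEV_SYSFS)
		return 0;	/* already created, nothing to do */
	edev->sysfs_files++;
	edev->mode |= DEMO_EEH_DEV_SYSFS;
	return 1;
}

/* Remove the entries only if they exist; returns 1 if removed. */
int demo_sysfs_remove(struct demo_eeh_dev *edev)
{
	assert(edev != NULL);
	if (!(edev->mode & DEMO_EEH_DEV_SYSFS))
		return 0;	/* never created or already removed */
	edev->sysfs_files--;
	edev->mode &= ~DEMO_EEH_DEV_SYSFS;
	return 1;
}
```

With the flag in place, a device that is removed and re-probed during partial
hotplug can go through both paths repeatedly without double creation or
double removal.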

---

arch/powerpc/include/asm/eeh.h               |   30 ++++++++--
arch/powerpc/include/asm/pci-bridge.h        |    1 -
arch/powerpc/kernel/eeh.c                    |   70 +++++++++++------------
arch/powerpc/kernel/eeh_cache.c              |   18 ++----
arch/powerpc/kernel/eeh_driver.c             |   77 +++++++++++++++++++++++++-
arch/powerpc/kernel/eeh_pe.c                 |   58 ++++++++-----------
arch/powerpc/kernel/eeh_sysfs.c              |   21 +++++++
arch/powerpc/kernel/pci-common.c             |    2 +
arch/powerpc/kernel/pci-hotplug.c            |   49 ++++++++--------
arch/powerpc/kernel/pci_of_scan.c            |   56 +++++++++++++-----
arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++-
arch/powerpc/platforms/pseries/eeh_pseries.c |   67 +++++++++++++++++++++-
drivers/pci/hotplug/rpadlpar_core.c          |    1 -
13 files changed, 327 insertions(+), 140 deletions(-)

Thanks,
Gavin

* [PATCH v3 0/11] EEH Followup Fixes (II)
@ 2013-07-23 11:10 Gavin Shan
From: Gavin Shan @ 2013-07-23 11:10 UTC
  To: linuxppc-dev; +Cc: Gavin Shan

This patch series was initially based on linux-powerpc-next and is intended to
resolve the following problems:

	- On the pSeries platform, EEH doesn't work after PHB hotplug
	  with "drmgr". The root cause is that the EEH resources
	  (EEH devices, EEH caches) aren't released correctly. To fix
	  that, we add one hook (pcibios_stop_dev), which is called
	  on pci_stop_and_remove_device(). In pcibios_stop_dev(), we
	  release the EEH resources.
	- We also need to put the domain (PE or PHB) into a quiet state
	  while resetting that domain. However, some devices in the
	  domain might not have EEH-sensitive drivers, or might have no
	  driver at all. Those devices can't be put into a quiet state
	  and may keep issuing PCI-CFG or MMIO requests while the domain
	  is being reset. That can cause the reset to fail and,
	  eventually, EEH recovery to fail. To address this, we introduce
	  so-called "partial hotplug": devices without a driver, or
	  without an EEH-sensitive driver, are removed before the reset
	  and plugged (probed) back into the system after the reset.
	- We need to traverse the EEH devices of one specific PE with the
	  safe variant of the list traversal function, because an EEH
	  device might be removed during the iteration.
	- When plugging a PCI bus, we need to check whether the resources
	  of subordinate devices have to be reassigned
	  (PCI_REASSIGN_ALL_RSRC) and do that accordingly.

The patchset has been verified on the pSeries and PowerNV platforms:

pSeries Platform:

drmgr -c phb -r -s "PHB 513"
drmgr -c phb -a -s "PHB 513"
errinjct eeh -f 1 -s net/eth2

PowerNV Platform:

cd /sys/devices/pci0005:00/0005:00:00.0/0005:01:00.0/0005:02:08.0/0005:80:00.0/0005:90:01.0
while true; do od -x config > /dev/null; sleep 1; done
echo 1 > /sys/kernel/debug/powerpc/PCI0005/err_injct

---

v2 -> v3:
	* Make pcibios_add_pci_devices() support "partial" hotplug
	  according to Ben's comments. arch/powerpc/kernel/pci_of_scan.c
	  has been adjusted for that.
	* Use pcibios_add_pci_devices() to do "partial" hotplug inside
	  eeh_reset_device().
	* Introduce flag EEH_DEV_SYSFS to track the state of the sysfs
	  entries of the EEH device (and thus the PCI device) to avoid a
	  race condition during "partial" hotplug.
v1 -> v2:
	* Rebase to 3.11-rc1 in order to use pcibios_release_device().
	* Use pcibios_release_device() to release EEH cache and detach
	  EEH device from PCI device.
	* Remove reference to PCI device in EEH cache since we're relying
	  on pcibios_release_device().
	* The PCI device instance (struct pci_dev) isn't available during
	  BAR restore, so avoid using it at that time.
	* Fix unbalanced enable for IRQ in eeh_driver.c
	* Retest the series of patches on Firebird-L/VPL3/VPL4

---

arch/powerpc/include/asm/eeh.h               |   30 ++++++++--
arch/powerpc/include/asm/pci-bridge.h        |    1 -
arch/powerpc/kernel/eeh.c                    |   70 +++++++++++------------
arch/powerpc/kernel/eeh_cache.c              |   18 ++----
arch/powerpc/kernel/eeh_driver.c             |   77 +++++++++++++++++++++++++-
arch/powerpc/kernel/eeh_pe.c                 |   58 ++++++++-----------
arch/powerpc/kernel/eeh_sysfs.c              |   21 +++++++
arch/powerpc/kernel/pci-common.c             |    1 +
arch/powerpc/kernel/pci-hotplug.c            |   41 ++++++--------
arch/powerpc/kernel/pci_of_scan.c            |   56 +++++++++++++-----
arch/powerpc/platforms/powernv/eeh-powernv.c |   17 +++++-
arch/powerpc/platforms/pseries/eeh_pseries.c |   67 +++++++++++++++++++++-
drivers/pci/hotplug/rpadlpar_core.c          |    1 -
13 files changed, 319 insertions(+), 139 deletions(-)

Thanks,
Gavin


end of thread, other threads:[~2013-07-24 21:48 UTC | newest]

Thread overview: 15+ messages
2013-07-24  2:24 [PATCH v4 0/11] EEH Followup Fixes (II) Gavin Shan
2013-07-24  2:24 ` [PATCH 01/11] powerpc/eeh: Remove reference to PCI device Gavin Shan
2013-07-24  2:24 ` [PATCH 02/11] powerpc/eeh: Export functions for hotplug Gavin Shan
2013-07-24  2:24 ` [PATCH 03/11] powerpc/pci: Override pcibios_release_device() Gavin Shan
2013-07-24  2:24 ` [PATCH 04/11] PCI/hotplug: Needn't remove EEH cache again Gavin Shan
2013-07-24 18:02   ` Bjorn Helgaas
2013-07-24 21:47     ` Benjamin Herrenschmidt
2013-07-24  2:24 ` [PATCH 05/11] powerpc/eeh: Keep PE during hotplug Gavin Shan
2013-07-24  2:24 ` [PATCH 06/11] powerpc/eeh: Tranverse EEH devices with safe mode Gavin Shan
2013-07-24  2:24 ` [PATCH 07/11] powerpc/pci: Partial hotplug support Gavin Shan
2013-07-24  2:24 ` [PATCH 08/11] powerpc/eeh: Support partial hotplug Gavin Shan
2013-07-24  2:24 ` [PATCH 09/11] powerpc/eeh: Don't use pci_dev during BAR restore Gavin Shan
2013-07-24  2:25 ` [PATCH 10/11] powerpc/eeh: Fix unbalanced enable for IRQ Gavin Shan
2013-07-24  2:25 ` [PATCH 11/11] powerpc/eeh: Introdce flag to protect sysfs Gavin Shan
  -- strict thread matches above, loose matches on Subject: below --
2013-07-23 11:10 [PATCH v3 0/11] EEH Followup Fixes (II) Gavin Shan
2013-07-23 11:10 ` [PATCH 09/11] powerpc/eeh: Don't use pci_dev during BAR restore Gavin Shan
