linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/12]
@ 2019-08-16  4:48 Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init() Sam Bobroff
                   ` (11 more replies)
  0 siblings, 12 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

Hi all,

Here is v5, with a complete rewrite of the commit message for patch 1, and the
inclusion of three patches from another set which are based on this set
(previously titled "EEH fixes 4").

Cover letter:

This patch set adds support for EEH recovery of hot plugged devices on pSeries
machines. Specifically, devices discovered by PCI rescanning using
/sys/bus/pci/rescan, which includes devices hotplugged by QEMU's device_add
command. (Upstream Linux pSeries guests running under QEMU/KVM don't currently
use slot power control for hotplugging.)

As a side effect this also provides EEH support for devices removed by
/sys/bus/pci/devices/*/remove and re-discovered by writing to /sys/bus/pci/rescan,
on all platforms.

The approach I've taken is to use the fact that the existing
pcibios_bus_add_device() platform hooks (which are used to set up EEH on
Virtual Function devices (VFs)) are actually called for all devices, so I've
widened their scope and made other adjustments necessary to allow them to work
for hotplugged and boot-time devices as well.

Because some of the changes are in generic PowerPC code, it's
possible that I've disturbed something for another PowerPC platform. I've tried
to minimize this by leaving that code alone as much as possible and so there
are a few cases where eeh_add_device_{early,late}() or eeh_add_sysfs_files() is
called more than once. I think these can be looked at later, as duplicate calls
are not harmful.

The first patch is a rework of the pcibios_init reordering patch I posted
earlier, which I've included here because it's necessary for this set.

I have done some testing for PowerNV on Power9 using a modified pnv_php module
and some testing on pSeries with slot power control using a modified rpaphp
module, and the EEH-related parts seem to work.

Cheers,
Sam.

Patch set changelog follows:

Patch set v5: 
Patch 1/12: powerpc/64: Adjust order in pcibios_init()
- Complete rewrite of commit message based on more research.
Patch 2/12: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Patch 3/12: powerpc/eeh: Improve debug messages around device addition
Patch 4/12: powerpc/eeh: Initialize EEH address cache earlier
Patch 5/12: powerpc/eeh: EEH for pSeries hot plug
Patch 6/12: powerpc/eeh: Refactor around eeh_probe_devices()
Patch 7/12: powerpc/eeh: Add bdfn field to eeh_dev
Patch 8/12: powerpc/eeh: Introduce EEH edev logging macros
Patch 9/12: powerpc/eeh: Convert log messages to eeh_edev_* macros
Patch 10/12 (new in this version): powerpc/eeh: Fix crash when edev->pdev changes
Patch 11/12 (new in this version): powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse()
Patch 12/12 (new in this version): powerpc/eeh: Slightly simplify eeh_add_to_parent_pe()

Patch set v4: 
Patch 1/9: powerpc/64: Adjust order in pcibios_init()
Patch 2/9: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Patch 3/9: powerpc/eeh: Improve debug messages around device addition
Patch 4/9: powerpc/eeh: Initialize EEH address cache earlier
Patch 5/9: powerpc/eeh: EEH for pSeries hot plug
Patch 6/9: powerpc/eeh: Refactor around eeh_probe_devices()
Patch 7/9: powerpc/eeh: Add bdfn field to eeh_dev
Patch 8/9: powerpc/eeh: Introduce EEH edev logging macros
Patch 9/9: powerpc/eeh: Convert log messages to eeh_edev_* macros
- Fixed compile warning when compiling without CONFIG_IOV.

Patch set v3: 
Patch 1/9: powerpc/64: Adjust order in pcibios_init()
Patch 2/9: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Patch 3/9: powerpc/eeh: Improve debug messages around device addition
Patch 4/9: powerpc/eeh: Initialize EEH address cache earlier
Patch 5/9: powerpc/eeh: EEH for pSeries hot plug
Patch 6/9: powerpc/eeh: Refactor around eeh_probe_devices()
Patch 7/9 (new in this version): powerpc/eeh: Add bdfn field to eeh_dev
Patch 8/9 (new in this version): powerpc/eeh: Introduce EEH edev logging macros
Patch 9/9 (new in this version): powerpc/eeh: Convert log messages to eeh_edev_* macros

Patch set v2: 
Patch 1/6: powerpc/64: Adjust order in pcibios_init()
Patch 2/6: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
* Also clear EEH_DEV_NO_HANDLER in eeh_handle_special_event().
Patch 3/6 (was 4/8): powerpc/eeh: Improve debug messages around device addition
Patch 4/6 (was 6/8): powerpc/eeh: Initialize EEH address cache earlier
Patch 5/6 (was 3/8 and 7/8): powerpc/eeh: EEH for pSeries hot plug
- Dropped changes to the PowerNV PHB EEH flag, instead refactor just enough to
  use the existing flag from multiple places.
- Merge the little remaining work from the above change into the patch where
  it's used.
Patch 6/6 (was 5/8 and 8/8): powerpc/eeh: Refactor around eeh_probe_devices()
- As it's so small, merged the enablement message patch into this one (where it's used).
- Reworked enablement messages.

Patch set v1:
Patch 1/8: powerpc/64: Adjust order in pcibios_init()
Patch 2/8: powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Patch 3/8: powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag
Patch 4/8: powerpc/eeh: Improve debug messages around device addition
Patch 5/8: powerpc/eeh: Add eeh_show_enabled()
Patch 6/8: powerpc/eeh: Initialize EEH address cache earlier
Patch 7/8: powerpc/eeh: EEH for pSeries hot plug
Patch 8/8: powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build()

Oliver O'Halloran (1):
  powerpc/eeh: Add bdfn field to eeh_dev

Sam Bobroff (11):
  powerpc/64: Adjust order in pcibios_init()
  powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
  powerpc/eeh: Improve debug messages around device addition
  powerpc/eeh: Initialize EEH address cache earlier
  powerpc/eeh: EEH for pSeries hot plug
  powerpc/eeh: Refactor around eeh_probe_devices()
  powerpc/eeh: Introduce EEH edev logging macros
  powerpc/eeh: Convert log messages to eeh_edev_* macros
  powerpc/eeh: Fix crash when edev->pdev changes
  powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse()
  powerpc/eeh: Slightly simplify eeh_add_to_parent_pe()

 arch/powerpc/include/asm/eeh.h               |  27 ++--
 arch/powerpc/include/asm/ppc-pci.h           |   7 +-
 arch/powerpc/kernel/eeh.c                    |  66 ++++------
 arch/powerpc/kernel/eeh_cache.c              |  37 ++----
 arch/powerpc/kernel/eeh_dev.c                |   2 +
 arch/powerpc/kernel/eeh_driver.c             | 105 ++++++++--------
 arch/powerpc/kernel/eeh_pe.c                 | 122 +++++++------------
 arch/powerpc/kernel/of_platform.c            |   3 +-
 arch/powerpc/kernel/pci-common.c             |   4 -
 arch/powerpc/kernel/pci_32.c                 |   4 +
 arch/powerpc/kernel/pci_64.c                 |  12 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  56 +++++----
 arch/powerpc/platforms/pseries/eeh_pseries.c |  68 ++++++-----
 arch/powerpc/platforms/pseries/pci.c         |   3 +-
 14 files changed, 238 insertions(+), 278 deletions(-)

-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init()
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-28  4:24   ` Michael Ellerman
  2019-08-16  4:48 ` [PATCH v5 02/12] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag Sam Bobroff
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

The pcibios_init() function for PowerPC 64 currently calls
pci_bus_add_devices() before pcibios_resource_survey(). This means
that at boot time, when the pcibios_bus_add_device() hooks are called
by pci_bus_add_devices(), device resources have not been allocated and
they are unable to perform EEH setup, so a separate pass is needed.

This patch adjusts that order so that it will become possible to
consolidate the EEH setup work into a single location.

The only functional change is to execute pcibios_resource_survey()
(excepting ppc_md.pcibios_fixup(), see below) before
pci_bus_add_devices() instead of after it.

Because pcibios_scan_phb() and pci_bus_add_devices() are called
together in a loop, this must be broken into one loop for each call.
Then the call to pcibios_resource_survey() is moved up in between
them. This changes the ordering but because pcibios_resource_survey()
also calls ppc_md.pcibios_fixup(), that call is extracted out into
pcibios_init() to where pcibios_resource_survey() was, so that it is
not moved.

The only other caller of pcibios_resource_survey() is the PowerPC 32
version of pcibios_init(), and therefore, that is modified to call
ppc_md.pcibios_fixup() right after pcibios_resource_survey() so that
there is no functional change there at all.

The re-arrangement will cause very few side-effects because at this
stage in the boot, pci_bus_add_devices() does very little:
- pci_create_sysfs_dev_files() does nothing (no sysfs yet)
- pci_proc_attach_device() does nothing (no proc yet)
- device_attach() does nothing (no drivers yet)
This leaves only the pci_final_fixup calls, D3 support, and marking
the device as added. Of those, only the pci_final_fixup calls have the
potential to be affected by resource allocation.

The only pci_final_fixup handlers that touch resources seem to be one
for x86 (pci_amd_enable_64bit_bar()), and a PowerPC 32 platform driver
(quirk_final_uli1575()), neither of which use this pcibios_init()
function. Even if they did, it would almost certainly be a bug, under
the current ordering, to rely on or make changes to resources before
they were allocated.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
v5 - Complete rewrite of commit message based on more research.

 arch/powerpc/kernel/pci-common.c |  4 ----
 arch/powerpc/kernel/pci_32.c     |  4 ++++
 arch/powerpc/kernel/pci_64.c     | 12 +++++++++---
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index f627e15bb43c..1c448cf25506 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1379,10 +1379,6 @@ void __init pcibios_resource_survey(void)
 		pr_debug("PCI: Assigning unassigned resources...\n");
 		pci_assign_unassigned_resources();
 	}
-
-	/* Call machine dependent fixup */
-	if (ppc_md.pcibios_fixup)
-		ppc_md.pcibios_fixup();
 }
 
 /* This is used by the PCI hotplug driver to allocate resource
diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index 50942a1d1a5f..b49e1060a3bf 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -263,6 +263,10 @@ static int __init pcibios_init(void)
 	/* Call common code to handle resource allocation */
 	pcibios_resource_survey();
 
+	/* Call machine dependent fixup */
+	if (ppc_md.pcibios_fixup)
+		ppc_md.pcibios_fixup();
+
 	/* Call machine dependent post-init code */
 	if (ppc_md.pcibios_after_init)
 		ppc_md.pcibios_after_init();
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index b7030b1189d0..f83d1f69b1dd 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -54,14 +54,20 @@ static int __init pcibios_init(void)
 	pci_add_flags(PCI_ENABLE_PROC_DOMAINS | PCI_COMPAT_DOMAIN_0);
 
 	/* Scan all of the recorded PCI controllers.  */
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
 		pcibios_scan_phb(hose);
-		pci_bus_add_devices(hose->bus);
-	}
 
 	/* Call common code to handle resource allocation */
 	pcibios_resource_survey();
 
+	/* Add devices. */
+	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
+		pci_bus_add_devices(hose->bus);
+
+	/* Call machine dependent fixup */
+	if (ppc_md.pcibios_fixup)
+		ppc_md.pcibios_fixup();
+
 	printk(KERN_DEBUG "PCI: Probing PCI hardware done\n");
 
 	return 0;
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 02/12] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init() Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 03/12] powerpc/eeh: Improve debug messages around device addition Sam Bobroff
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
use of driver callbacks in drivers that have been bound part way
through the recovery process. This is necessary to prevent later stage
handlers from being called when the earlier stage handlers haven't,
which can be confusing for drivers.

However, the flag is set for all devices that are added after boot
time and only cleared at the end of the EEH recovery process. This
results in hot plugged devices erroneously having the flag set during
the first recovery after they are added (causing their driver's
handlers to be incorrectly ignored).

To remedy this, clear the flag at the beginning of recovery
processing. The flag is still cleared at the end of recovery
processing, although it is no longer really necessary.

Also clear the flag during eeh_handle_special_event(), for the same
reasons.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 6f3ee30565dd..d6f54840a3a9 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		result = PCI_ERS_RESULT_DISCONNECT;
 	}
 
+	eeh_for_each_pe(pe, tmp_pe)
+		eeh_pe_for_each_dev(tmp_pe, edev, tmp)
+			edev->mode &= ~EEH_DEV_NO_HANDLER;
+
 	/* Walk the various device drivers attached to this slot through
 	 * a reset sequence, giving each an opportunity to do what it needs
 	 * to accomplish the reset.  Each child gets a report of the
@@ -1009,7 +1013,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
  */
 void eeh_handle_special_event(void)
 {
-	struct eeh_pe *pe, *phb_pe;
+	struct eeh_pe *pe, *phb_pe, *tmp_pe;
+	struct eeh_dev *edev, *tmp_edev;
 	struct pci_bus *bus;
 	struct pci_controller *hose;
 	unsigned long flags;
@@ -1078,6 +1083,10 @@ void eeh_handle_special_event(void)
 				    (phb_pe->state & EEH_PE_RECOVERING))
 					continue;
 
+				eeh_for_each_pe(pe, tmp_pe)
+					eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
+						edev->mode &= ~EEH_DEV_NO_HANDLER;
+
 				/* Notify all devices to be down */
 				eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
 				eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 03/12] powerpc/eeh: Improve debug messages around device addition
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init() Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 02/12] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 04/12] powerpc/eeh: Initialize EEH address cache earlier Sam Bobroff
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

Also remove useless comment.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kernel/eeh.c                    |  2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 14 ++++++++----
 arch/powerpc/platforms/pseries/eeh_pseries.c | 23 +++++++++++++++-----
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d44533bba642..846cc697030c 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1278,7 +1278,7 @@ void eeh_add_device_late(struct pci_dev *dev)
 	pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	edev = pdn_to_eeh_dev(pdn);
 	if (edev->pdev == dev) {
-		pr_debug("EEH: Already referenced !\n");
+		pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
 		return;
 	}
 
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 1cd5ebd7299c..629f9390d9af 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -46,10 +46,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
 	if (!pdev->is_virtfn)
 		return;
 
-	/*
-	 * The following operations will fail if VF's sysfs files
-	 * aren't created or its resources aren't finalized.
-	 */
+	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
 	eeh_add_device_early(pdn);
 	eeh_add_device_late(pdev);
 	eeh_sysfs_add_device(pdev);
@@ -393,6 +390,10 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	int ret;
 	int config_addr = (pdn->busno << 8) | (pdn->devfn);
 
+	pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
+		__func__, hose->global_number, pdn->busno,
+		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+
 	/*
 	 * When probing the root bridge, which doesn't have any
 	 * subordinate PCI devices. We don't have OF node for
@@ -487,6 +488,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	/* Save memory bars */
 	eeh_save_bars(edev);
 
+	pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
+		__func__, pdn->busno, PCI_SLOT(pdn->devfn),
+		PCI_FUNC(pdn->devfn), edev->pe->phb->global_number,
+		edev->pe->addr);
+
 	return NULL;
 }
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 07e3fc2667aa..31733f6d642c 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -52,6 +52,8 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 	if (!pdev->is_virtfn)
 		return;
 
+	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+
 	pdn->device_id  =  pdev->device;
 	pdn->vendor_id  =  pdev->vendor;
 	pdn->class_code =  pdev->class;
@@ -238,6 +240,10 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 	int enable = 0;
 	int ret;
 
+	pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
+		__func__, pdn->phb->global_number, pdn->busno,
+		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+
 	/* Retrieve OF node and eeh device */
 	edev = pdn_to_eeh_dev(pdn);
 	if (!edev || edev->pe)
@@ -281,7 +287,12 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 
 	/* Enable EEH on the device */
 	ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
-	if (!ret) {
+	if (ret) {
+		pr_debug("%s: EEH failed to enable on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
+			__func__, pdn->busno, PCI_SLOT(pdn->devfn),
+			PCI_FUNC(pdn->devfn), pe.phb->global_number,
+			pe.addr, ret);
+	} else {
 		/* Retrieve PE address */
 		edev->pe_config_addr = eeh_ops->get_pe_addr(&pe);
 		pe.addr = edev->pe_config_addr;
@@ -297,11 +308,6 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 		if (enable) {
 			eeh_add_flag(EEH_ENABLED);
 			eeh_add_to_parent_pe(edev);
-
-			pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
-				__func__, pdn->busno, PCI_SLOT(pdn->devfn),
-				PCI_FUNC(pdn->devfn), pe.phb->global_number,
-				pe.addr);
 		} else if (pdn->parent && pdn_to_eeh_dev(pdn->parent) &&
 			   (pdn_to_eeh_dev(pdn->parent))->pe) {
 			/* This device doesn't support EEH, but it may have an
@@ -310,6 +316,11 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 			edev->pe_config_addr = pdn_to_eeh_dev(pdn->parent)->pe_config_addr;
 			eeh_add_to_parent_pe(edev);
 		}
+		pr_debug("%s: EEH %s on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
+			__func__, (enable ? "enabled" : "unsupported"),
+			pdn->busno, PCI_SLOT(pdn->devfn),
+			PCI_FUNC(pdn->devfn), pe.phb->global_number,
+			pe.addr, ret);
 	}
 
 	/* Save memory bars */
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 04/12] powerpc/eeh: Initialize EEH address cache earlier
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (2 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 03/12] powerpc/eeh: Improve debug messages around device addition Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Sam Bobroff
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

The EEH address cache is currently initialized and populated by a
single function: eeh_addr_cache_build().  While the initial population
of the cache can only be done once resources are allocated,
initialization (just setting up a spinlock) could be done much
earlier.

So move the initialization step into a separate function and call it
from a core_initcall (rather than a subsys initcall).

This will allow future work to make use of the cache during boot time
PCI scanning.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/eeh.h  |  3 +++
 arch/powerpc/kernel/eeh.c       |  2 ++
 arch/powerpc/kernel/eeh_cache.c | 13 +++++++++++--
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 45c9b26e3cce..20105964287a 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -275,6 +275,7 @@ int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 int eeh_check_failure(const volatile void __iomem *token);
 int eeh_dev_check_failure(struct eeh_dev *edev);
+void eeh_addr_cache_init(void);
 void eeh_addr_cache_build(void);
 void eeh_add_device_early(struct pci_dn *);
 void eeh_add_device_tree_early(struct pci_dn *);
@@ -335,6 +336,8 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
 
 #define eeh_dev_check_failure(x) (0)
 
+static inline void eeh_addr_cache_init(void) { }
+
 static inline void eeh_addr_cache_build(void) { }
 
 static inline void eeh_add_device_early(struct pci_dn *pdn) { }
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 846cc697030c..ca8b0c58a6a7 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1200,6 +1200,8 @@ static int eeh_init(void)
 	list_for_each_entry_safe(hose, tmp, &hose_list, list_node)
 		eeh_dev_phb_init_dynamic(hose);
 
+	eeh_addr_cache_init();
+
 	/* Initialize EEH event */
 	return eeh_event_init();
 }
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 320472373122..a790fa49c62d 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -254,6 +254,17 @@ void eeh_addr_cache_rmv_dev(struct pci_dev *dev)
 	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
 }
 
+/**
+ * eeh_addr_cache_init - Initialize a cache of I/O addresses
+ *
+ * Initialize a cache of pci i/o addresses.  This cache will be used to
+ * find the pci device that corresponds to a given address.
+ */
+void eeh_addr_cache_init(void)
+{
+	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
+}
+
 /**
  * eeh_addr_cache_build - Build a cache of I/O addresses
  *
@@ -269,8 +280,6 @@ void eeh_addr_cache_build(void)
 	struct eeh_dev *edev;
 	struct pci_dev *dev = NULL;
 
-	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
-
 	for_each_pci_dev(dev) {
 		pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 		if (!pdn)
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (3 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 04/12] powerpc/eeh: Initialize EEH address cache earlier Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-21  3:28   ` Michael Ellerman
  2019-09-19 20:28   ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Nathan Lynch
  2019-08-16  4:48 ` [PATCH v5 06/12] powerpc/eeh: Refactor around eeh_probe_devices() Sam Bobroff
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

On PowerNV and pSeries, devices currently acquire EEH support from
several different places: Boot-time devices from eeh_probe_devices()
and eeh_addr_cache_build(), Virtual Function devices from the pcibios
bus add device hooks and hot plugged devices from pci_hp_add_devices()
(with other platforms using other methods as well).  Unfortunately,
pSeries machines currently discover hot plugged devices using
pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
not receive EEH support.

Rather than adding another case for pci_rescan_bus(), this change
widens the scope of the pcibios bus add device hooks so that they can
handle all devices. As a side effect this also supports devices
discovered after manually rescanning via /sys/bus/pci/rescan.

Note that on PowerNV, this change allows the EEH subsystem to become
enabled after boot as long as it has not been forced off, which was
not previously possible (it was already possible on pSeries).

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh.c                    |  2 +-
 arch/powerpc/kernel/of_platform.c            |  3 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 39 +++++++++-----
 arch/powerpc/platforms/pseries/eeh_pseries.c | 54 ++++++++++----------
 4 files changed, 56 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ca8b0c58a6a7..87edac6f2fd9 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
 	struct pci_dn *pdn;
 	struct eeh_dev *edev;
 
-	if (!dev || !eeh_enabled())
+	if (!dev)
 		return;
 
 	pr_debug("EEH: Adding device %s\n", pci_name(dev));
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index 427fc22f72b6..11c807468ab5 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -81,7 +81,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
 	pcibios_claim_one_bus(phb->bus);
 
 	/* Finish EEH setup */
-	eeh_add_device_tree_late(phb->bus);
+	if (!eeh_has_flag(EEH_FORCE_DISABLED))
+		eeh_add_device_tree_late(phb->bus);
 
 	/* Add probed PCI devices to the device model */
 	pci_bus_add_devices(phb->bus);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 629f9390d9af..77cc2f51c2ea 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -43,7 +43,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn = pci_get_pdn(pdev);
 
-	if (!pdev->is_virtfn)
+	if (eeh_has_flag(EEH_FORCE_DISABLED))
 		return;
 
 	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
@@ -222,6 +222,25 @@ static const struct file_operations eeh_tree_state_debugfs_ops = {
 
 #endif /* CONFIG_DEBUG_FS */
 
+void pnv_eeh_enable_phbs(void)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		phb = hose->private_data;
+		/*
+		 * If EEH is enabled, we're going to rely on that.
+		 * Otherwise, we restore to conventional mechanism
+		 * to clear frozen PE during PCI config access.
+		 */
+		if (eeh_enabled())
+			phb->flags |= PNV_PHB_FLAG_EEH;
+		else
+			phb->flags &= ~PNV_PHB_FLAG_EEH;
+	}
+}
+
 /**
  * pnv_eeh_post_init - EEH platform dependent post initialization
  *
@@ -260,19 +279,11 @@ int pnv_eeh_post_init(void)
 	if (!eeh_enabled())
 		disable_irq(eeh_event_irq);
 
+	pnv_eeh_enable_phbs();
+
 	list_for_each_entry(hose, &hose_list, list_node) {
 		phb = hose->private_data;
 
-		/*
-		 * If EEH is enabled, we're going to rely on that.
-		 * Otherwise, we restore to conventional mechanism
-		 * to clear frozen PE during PCI config access.
-		 */
-		if (eeh_enabled())
-			phb->flags |= PNV_PHB_FLAG_EEH;
-		else
-			phb->flags &= ~PNV_PHB_FLAG_EEH;
-
 		/* Create debugfs entries */
 #ifdef CONFIG_DEBUG_FS
 		if (phb->has_dbgfs || !phb->dbgfs)
@@ -483,7 +494,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	 * Enable EEH explicitly so that we will do EEH check
 	 * while accessing I/O stuff
 	 */
-	eeh_add_flag(EEH_ENABLED);
+	if (!eeh_has_flag(EEH_ENABLED)) {
+		enable_irq(eeh_event_irq);
+		pnv_eeh_enable_phbs();
+		eeh_add_flag(EEH_ENABLED);
+	}
 
 	/* Save memory bars */
 	eeh_save_bars(edev);
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 31733f6d642c..96ad41fbf96b 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -42,44 +42,44 @@ static int ibm_get_config_addr_info;
 static int ibm_get_config_addr_info2;
 static int ibm_configure_pe;
 
-#ifdef CONFIG_PCI_IOV
 void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn = pci_get_pdn(pdev);
-	struct pci_dn *physfn_pdn;
-	struct eeh_dev *edev;
 
-	if (!pdev->is_virtfn)
+	if (eeh_has_flag(EEH_FORCE_DISABLED))
 		return;
 
 	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+#ifdef CONFIG_PCI_IOV
+	if (pdev->is_virtfn) {
+		struct pci_dn *physfn_pdn;
 
-	pdn->device_id  =  pdev->device;
-	pdn->vendor_id  =  pdev->vendor;
-	pdn->class_code =  pdev->class;
-	/*
-	 * Last allow unfreeze return code used for retrieval
-	 * by user space in eeh-sysfs to show the last command
-	 * completion from platform.
-	 */
-	pdn->last_allow_rc =  0;
-	physfn_pdn      =  pci_get_pdn(pdev->physfn);
-	pdn->pe_number  =  physfn_pdn->pe_num_map[pdn->vf_index];
-	edev = pdn_to_eeh_dev(pdn);
-
-	/*
-	 * The following operations will fail if VF's sysfs files
-	 * aren't created or its resources aren't finalized.
-	 */
+		pdn->device_id  =  pdev->device;
+		pdn->vendor_id  =  pdev->vendor;
+		pdn->class_code =  pdev->class;
+		/*
+		 * Last allow unfreeze return code used for retrieval
+		 * by user space in eeh-sysfs to show the last command
+		 * completion from platform.
+		 */
+		pdn->last_allow_rc =  0;
+		physfn_pdn      =  pci_get_pdn(pdev->physfn);
+		pdn->pe_number  =  physfn_pdn->pe_num_map[pdn->vf_index];
+	}
+#endif
 	eeh_add_device_early(pdn);
 	eeh_add_device_late(pdev);
-	edev->pe_config_addr =  (pdn->busno << 16) | (pdn->devfn << 8);
-	eeh_rmv_from_parent_pe(edev); /* Remove as it is adding to bus pe */
-	eeh_add_to_parent_pe(edev);   /* Add as VF PE type */
-	eeh_sysfs_add_device(pdev);
+#ifdef CONFIG_PCI_IOV
+	if (pdev->is_virtfn) {
+		struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
 
-}
+		edev->pe_config_addr =  (pdn->busno << 16) | (pdn->devfn << 8);
+		eeh_rmv_from_parent_pe(edev); /* Remove as it is adding to bus pe */
+		eeh_add_to_parent_pe(edev);   /* Add as VF PE type */
+	}
 #endif
+	eeh_sysfs_add_device(pdev);
+}
 
 /*
  * Buffer for reporting slot-error-detail rtas calls. Its here
@@ -146,10 +146,8 @@ static int pseries_eeh_init(void)
 	/* Set EEH probe mode */
 	eeh_add_flag(EEH_PROBE_MODE_DEVTREE | EEH_ENABLE_IO_FOR_LOG);
 
-#ifdef CONFIG_PCI_IOV
 	/* Set EEH machine dependent code */
 	ppc_md.pcibios_bus_add_device = pseries_pcibios_bus_add_device;
-#endif
 
 	return 0;
 }
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 06/12] powerpc/eeh: Refactor around eeh_probe_devices()
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (4 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 07/12] powerpc/eeh: Add bdfn field to eeh_dev Sam Bobroff
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

Now that EEH support for all devices (on PowerNV and pSeries) is
provided by the pcibios bus add device hooks, eeh_probe_devices() and
eeh_addr_cache_build() are redundant and can be removed.

Move the EEH enabled message into it's own function so that it can be
called from multiple places.

Note that previously on pSeries, useless EEH sysfs files were created
for some devices that did not have EEH support and this change
prevents them from being created.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |  7 ++---
 arch/powerpc/kernel/eeh.c                    | 27 ++++++-----------
 arch/powerpc/kernel/eeh_cache.c              | 32 --------------------
 arch/powerpc/platforms/powernv/eeh-powernv.c |  4 +--
 arch/powerpc/platforms/pseries/pci.c         |  3 +-
 5 files changed, 14 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 20105964287a..7f9404a0c3bb 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -270,13 +270,12 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
-void eeh_probe_devices(void);
+void eeh_show_enabled(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 int eeh_check_failure(const volatile void __iomem *token);
 int eeh_dev_check_failure(struct eeh_dev *edev);
 void eeh_addr_cache_init(void);
-void eeh_addr_cache_build(void);
 void eeh_add_device_early(struct pci_dn *);
 void eeh_add_device_tree_early(struct pci_dn *);
 void eeh_add_device_late(struct pci_dev *);
@@ -320,7 +319,7 @@ static inline bool eeh_enabled(void)
         return false;
 }
 
-static inline void eeh_probe_devices(void) { }
+static inline void eeh_show_enabled(void) { }
 
 static inline void *eeh_dev_init(struct pci_dn *pdn, void *data)
 {
@@ -338,8 +337,6 @@ static inline int eeh_check_failure(const volatile void __iomem *token)
 
 static inline void eeh_addr_cache_init(void) { }
 
-static inline void eeh_addr_cache_build(void) { }
-
 static inline void eeh_add_device_early(struct pci_dn *pdn) { }
 
 static inline void eeh_add_device_tree_early(struct pci_dn *pdn) { }
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 87edac6f2fd9..c0ec1b6b1e69 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -150,6 +150,16 @@ static int __init eeh_setup(char *str)
 }
 __setup("eeh=", eeh_setup);
 
+void eeh_show_enabled(void)
+{
+	if (eeh_has_flag(EEH_FORCE_DISABLED))
+		pr_info("EEH: Recovery disabled by kernel parameter.\n");
+	else if (eeh_has_flag(EEH_ENABLED))
+		pr_info("EEH: Capable adapter found: recovery enabled.\n");
+	else
+		pr_info("EEH: No capable adapters found: recovery disabled.\n");
+}
+
 /*
  * This routine captures assorted PCI configuration space data
  * for the indicated PCI device, and puts them into a buffer
@@ -1143,23 +1153,6 @@ static struct notifier_block eeh_reboot_nb = {
 	.notifier_call = eeh_reboot_notifier,
 };
 
-void eeh_probe_devices(void)
-{
-	struct pci_controller *hose, *tmp;
-	struct pci_dn *pdn;
-
-	/* Enable EEH for all adapters */
-	list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
-		pdn = hose->pci_data;
-		traverse_pci_dn(pdn, eeh_ops->probe, NULL);
-	}
-	if (eeh_enabled())
-		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
-	else
-		pr_info("EEH: No capable adapters found\n");
-
-}
-
 /**
  * eeh_init - EEH initialization
  *
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index a790fa49c62d..8c8649172e97 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -265,38 +265,6 @@ void eeh_addr_cache_init(void)
 	spin_lock_init(&pci_io_addr_cache_root.piar_lock);
 }
 
-/**
- * eeh_addr_cache_build - Build a cache of I/O addresses
- *
- * Build a cache of pci i/o addresses.  This cache will be used to
- * find the pci device that corresponds to a given address.
- * This routine scans all pci busses to build the cache.
- * Must be run late in boot process, after the pci controllers
- * have been scanned for devices (after all device resources are known).
- */
-void eeh_addr_cache_build(void)
-{
-	struct pci_dn *pdn;
-	struct eeh_dev *edev;
-	struct pci_dev *dev = NULL;
-
-	for_each_pci_dev(dev) {
-		pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
-		if (!pdn)
-			continue;
-
-		edev = pdn_to_eeh_dev(pdn);
-		if (!edev)
-			continue;
-
-		dev->dev.archdata.edev = edev;
-		edev->pdev = dev;
-
-		eeh_addr_cache_insert_dev(dev);
-		eeh_sysfs_add_device(dev);
-	}
-}
-
 static int eeh_addr_cache_show(struct seq_file *s, void *v)
 {
 	struct pci_io_addr_range *piar;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 77cc2f51c2ea..7ee0df9ba2c8 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -255,9 +255,7 @@ int pnv_eeh_post_init(void)
 	struct pnv_phb *phb;
 	int ret = 0;
 
-	/* Probe devices & build address cache */
-	eeh_probe_devices();
-	eeh_addr_cache_build();
+	eeh_show_enabled();
 
 	/* Register OPAL event notifier */
 	eeh_event_irq = opal_event_request(ilog2(OPAL_EVENT_PCI_ERROR));
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 1eae1d09980c..722830978639 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -229,8 +229,7 @@ void __init pSeries_final_fixup(void)
 
 	pSeries_request_regions();
 
-	eeh_probe_devices();
-	eeh_addr_cache_build();
+	eeh_show_enabled();
 
 #ifdef CONFIG_PCI_IOV
 	ppc_md.pcibios_sriov_enable = pseries_pcibios_sriov_enable;
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 07/12] powerpc/eeh: Add bdfn field to eeh_dev
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (5 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 06/12] powerpc/eeh: Refactor around eeh_probe_devices() Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 08/12] powerpc/eeh: Introduce EEH edev logging macros Sam Bobroff
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

From: Oliver O'Halloran <oohall@gmail.com>

Preparation for removing pci_dn from the powernv EEH code. The only
thing we really use pci_dn for is to get the bdfn of the device for
config space accesses, so adding that information to eeh_dev reduces
the need to carry around the pci_dn.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[SB: Re-wrapped commit message, fixed whitespace damage.]
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h     | 2 ++
 arch/powerpc/include/asm/ppc-pci.h | 2 ++
 arch/powerpc/kernel/eeh_dev.c      | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7f9404a0c3bb..bbe0798f6624 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -121,6 +121,8 @@ static inline bool eeh_pe_passed(struct eeh_pe *pe)
 struct eeh_dev {
 	int mode;			/* EEH mode			*/
 	int class_code;			/* Class code of the device	*/
+	int bdfn;			/* bdfn of device (for cfg ops) */
+	struct pci_controller *controller;
 	int pe_config_addr;		/* PE config address		*/
 	u32 config_space[16];		/* Saved PCI config space	*/
 	int pcix_cap;			/* Saved PCIx capability	*/
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index cec2d6409515..72860de205a0 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -74,6 +74,8 @@ static inline const char *eeh_driver_name(struct pci_dev *pdev)
 
 #endif /* CONFIG_EEH */
 
+#define PCI_BUSNO(bdfn) ((bdfn >> 8) & 0xff)
+
 #else /* CONFIG_PCI */
 static inline void init_pci_config_tokens(void) { }
 #endif /* !CONFIG_PCI */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index c4317c452d98..7370185c7a05 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -47,6 +47,8 @@ struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
 	/* Associate EEH device with OF node */
 	pdn->edev = edev;
 	edev->pdn = pdn;
+	edev->bdfn = (pdn->busno << 8) | pdn->devfn;
+	edev->controller = pdn->phb;
 
 	return edev;
 }
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 08/12] powerpc/eeh: Introduce EEH edev logging macros
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (6 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 07/12] powerpc/eeh: Add bdfn field to eeh_dev Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 09/12] powerpc/eeh: Convert log messages to eeh_edev_* macros Sam Bobroff
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

Now that struct eeh_dev includes the BDFN of it's PCI device, make use
of it to replace eeh_edev_info() with a set of dev_dbg()-style macros
that only need a struct edev.

With the BDFN available without the struct pci_dev, eeh_pci_name() is
now unnecessary, so remove it.

While only the "info" level function is used here, the others will be
used in followup work.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h   | 11 +++++++++++
 arch/powerpc/kernel/eeh_driver.c | 17 -----------------
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index bbe0798f6624..e1023a556721 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -138,6 +138,17 @@ struct eeh_dev {
 	struct pci_dev *physfn;		/* Associated SRIOV PF		*/
 };
 
+/* "fmt" must be a simple literal string */
+#define EEH_EDEV_PRINT(level, edev, fmt, ...) \
+	pr_##level("PCI %04x:%02x:%02x.%x#%04x: EEH: " fmt, \
+	(edev)->controller->global_number, PCI_BUSNO((edev)->bdfn), \
+	PCI_SLOT((edev)->bdfn), PCI_FUNC((edev)->bdfn), \
+	((edev)->pe ? (edev)->pe_config_addr : 0xffff), ##__VA_ARGS__)
+#define eeh_edev_dbg(edev, fmt, ...) EEH_EDEV_PRINT(debug, (edev), fmt, ##__VA_ARGS__)
+#define eeh_edev_info(edev, fmt, ...) EEH_EDEV_PRINT(info, (edev), fmt, ##__VA_ARGS__)
+#define eeh_edev_warn(edev, fmt, ...) EEH_EDEV_PRINT(warn, (edev), fmt, ##__VA_ARGS__)
+#define eeh_edev_err(edev, fmt, ...) EEH_EDEV_PRINT(err, (edev), fmt, ##__VA_ARGS__)
+
 static inline struct pci_dn *eeh_dev_to_pdn(struct eeh_dev *edev)
 {
 	return edev ? edev->pdn : NULL;
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index d6f54840a3a9..29424d5e5fea 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -82,23 +82,6 @@ static const char *pci_ers_result_name(enum pci_ers_result result)
 	}
 };
 
-static __printf(2, 3) void eeh_edev_info(const struct eeh_dev *edev,
-					 const char *fmt, ...)
-{
-	struct va_format vaf;
-	va_list args;
-
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	printk(KERN_INFO "EEH: PE#%x (PCI %s): %pV\n", edev->pe_config_addr,
-	       edev->pdev ? dev_name(&edev->pdev->dev) : "none", &vaf);
-
-	va_end(args);
-}
-
 static enum pci_ers_result pci_ers_merge_result(enum pci_ers_result old,
 						enum pci_ers_result new)
 {
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 09/12] powerpc/eeh: Convert log messages to eeh_edev_* macros
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (7 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 08/12] powerpc/eeh: Introduce EEH edev logging macros Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 10/12] powerpc/eeh: Fix crash when edev->pdev changes Sam Bobroff
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

Convert existing messages, where appropriate, to use the eeh_edev_*
logging macros.

The only effect should be minor adjustments to the log messages, apart
from:

- A new message in pseries_eeh_probe() "Probing device" to match the
powernv case.
- The "Probing device" message in pnv_eeh_probe() is now generated
slightly later, which will mean that it is no longer emitted for
devices that aren't probed due to the initial checks.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/include/asm/ppc-pci.h           |  5 --
 arch/powerpc/kernel/eeh.c                    | 19 +++-----
 arch/powerpc/kernel/eeh_cache.c              |  8 +--
 arch/powerpc/kernel/eeh_driver.c             |  7 +--
 arch/powerpc/kernel/eeh_pe.c                 | 51 +++++---------------
 arch/powerpc/platforms/powernv/eeh-powernv.c | 17 ++-----
 arch/powerpc/platforms/pseries/eeh_pseries.c | 21 +++-----
 7 files changed, 39 insertions(+), 89 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 72860de205a0..7f4be5a05eb3 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -62,11 +62,6 @@ void eeh_pe_dev_mode_mark(struct eeh_pe *pe, int mode);
 void eeh_sysfs_add_device(struct pci_dev *pdev);
 void eeh_sysfs_remove_device(struct pci_dev *pdev);
 
-static inline const char *eeh_pci_name(struct pci_dev *pdev) 
-{ 
-	return pdev ? pci_name(pdev) : "<null>";
-} 
-
 static inline const char *eeh_driver_name(struct pci_dev *pdev)
 {
 	return (pdev && pdev->driver) ? pdev->driver->name : "<null>";
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c0ec1b6b1e69..b6683f367f7f 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -461,8 +461,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	/* Access to IO BARs might get this far and still not want checking. */
 	if (!pe) {
 		eeh_stats.ignored_check++;
-		pr_debug("EEH: Ignored check for %s\n",
-			eeh_pci_name(dev));
+		eeh_edev_dbg(edev, "Ignored check\n");
 		return 0;
 	}
 
@@ -502,12 +501,11 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 			if (dn)
 				location = of_get_property(dn, "ibm,loc-code",
 						NULL);
-			printk(KERN_ERR "EEH: %d reads ignored for recovering device at "
-				"location=%s driver=%s pci addr=%s\n",
+			eeh_edev_err(edev, "%d reads ignored for recovering device at location=%s driver=%s\n",
 				pe->check_count,
 				location ? location : "unknown",
-				eeh_driver_name(dev), eeh_pci_name(dev));
-			printk(KERN_ERR "EEH: Might be infinite loop in %s driver\n",
+				eeh_driver_name(dev));
+			eeh_edev_err(edev, "Might be infinite loop in %s driver\n",
 				eeh_driver_name(dev));
 			dump_stack();
 		}
@@ -1268,12 +1266,11 @@ void eeh_add_device_late(struct pci_dev *dev)
 	if (!dev)
 		return;
 
-	pr_debug("EEH: Adding device %s\n", pci_name(dev));
-
 	pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
 	edev = pdn_to_eeh_dev(pdn);
+	eeh_edev_dbg(edev, "Adding device\n");
 	if (edev->pdev == dev) {
-		pr_debug("EEH: Device %s already referenced!\n", pci_name(dev));
+		eeh_edev_dbg(edev, "Device already referenced!\n");
 		return;
 	}
 
@@ -1374,10 +1371,10 @@ void eeh_remove_device(struct pci_dev *dev)
 	edev = pci_dev_to_eeh_dev(dev);
 
 	/* Unregister the device with the EEH/PCI address search system */
-	pr_debug("EEH: Removing device %s\n", pci_name(dev));
+	dev_dbg(&dev->dev, "EEH: Removing device\n");
 
 	if (!edev || !edev->pdev || !edev->pe) {
-		pr_debug("EEH: Not referenced !\n");
+		dev_dbg(&dev->dev, "EEH: Device not referenced!\n");
 		return;
 	}
 
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 8c8649172e97..45360b9eab90 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -145,8 +145,8 @@ eeh_addr_cache_insert(struct pci_dev *dev, resource_size_t alo,
 	piar->pcidev = dev;
 	piar->flags = flags;
 
-	pr_debug("PIAR: insert range=[%pap:%pap] dev=%s\n",
-		 &alo, &ahi, pci_name(dev));
+	eeh_edev_dbg(piar->edev, "PIAR: insert range=[%pap:%pap]\n",
+		 &alo, &ahi);
 
 	rb_link_node(&piar->rb_node, parent, p);
 	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
@@ -226,8 +226,8 @@ static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
 		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
 
 		if (piar->pcidev == dev) {
-			pr_debug("PIAR: remove range=[%pap:%pap] dev=%s\n",
-				 &piar->addr_lo, &piar->addr_hi, pci_name(dev));
+			eeh_edev_dbg(piar->edev, "PIAR: remove range=[%pap:%pap]\n",
+				 &piar->addr_lo, &piar->addr_hi);
 			rb_erase(n, &pci_io_addr_cache_root.rb_root);
 			kfree(piar);
 			goto restart;
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 29424d5e5fea..670b6159cef1 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -457,12 +457,9 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
 {
 	struct pci_driver *driver;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
-	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	if (!(edev->physfn)) {
-		pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
-			__func__, pdn->phb->global_number, pdn->busno,
-			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+		eeh_edev_warn(edev, "Not for VF\n");
 		return NULL;
 	}
 
@@ -476,7 +473,7 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
 	}
 
 #ifdef CONFIG_PCI_IOV
-	pci_iov_add_virtfn(edev->physfn, pdn->vf_index);
+	pci_iov_add_virtfn(edev->physfn, eeh_dev_to_pdn(edev)->vf_index);
 #endif
 	return NULL;
 }
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 854cef7b18f4..317a31624526 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -379,8 +379,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 
 	/* Check if the PE number is valid */
 	if (!eeh_has_flag(EEH_VALID_PE_ZERO) && !edev->pe_config_addr) {
-		pr_err("%s: Invalid PE#0 for edev 0x%x on PHB#%x\n",
-		       __func__, config_addr, pdn->phb->global_number);
+		eeh_edev_err(edev, "PE#0 is invalid for this PHB!\n");
 		return -EINVAL;
 	}
 
@@ -398,12 +397,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 
 		/* Put the edev to PE */
 		list_add_tail(&edev->entry, &pe->edevs);
-		pr_debug("EEH: Add %04x:%02x:%02x.%01x to Bus PE#%x\n",
-			 pdn->phb->global_number,
-			 pdn->busno,
-			 PCI_SLOT(pdn->devfn),
-			 PCI_FUNC(pdn->devfn),
-			 pe->addr);
+		eeh_edev_dbg(edev, "Added to bus PE\n");
 		return 0;
 	} else if (pe && (pe->type & EEH_PE_INVALID)) {
 		list_add_tail(&edev->entry, &pe->edevs);
@@ -420,13 +414,8 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 			parent = parent->parent;
 		}
 
-		pr_debug("EEH: Add %04x:%02x:%02x.%01x to Device "
-			 "PE#%x, Parent PE#%x\n",
-			 pdn->phb->global_number,
-			 pdn->busno,
-			 PCI_SLOT(pdn->devfn),
-			 PCI_FUNC(pdn->devfn),
-			 pe->addr, pe->parent->addr);
+		eeh_edev_dbg(edev, "Added to device PE (parent: PE#%x)\n",
+			     pe->parent->addr);
 		return 0;
 	}
 
@@ -468,13 +457,8 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 	list_add_tail(&pe->child, &parent->child_list);
 	list_add_tail(&edev->entry, &pe->edevs);
 	edev->pe = pe;
-	pr_debug("EEH: Add %04x:%02x:%02x.%01x to "
-		 "Device PE#%x, Parent PE#%x\n",
-		 pdn->phb->global_number,
-		 pdn->busno,
-		 PCI_SLOT(pdn->devfn),
-		 PCI_FUNC(pdn->devfn),
-		 pe->addr, pe->parent->addr);
+	eeh_edev_dbg(edev, "Added to device PE (parent: PE#%x)\n",
+		     pe->parent->addr);
 
 	return 0;
 }
@@ -492,15 +476,10 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
 {
 	struct eeh_pe *pe, *parent, *child;
 	int cnt;
-	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
 	pe = eeh_dev_to_pe(edev);
 	if (!pe) {
-		pr_debug("%s: No PE found for device %04x:%02x:%02x.%01x\n",
-			 __func__,  pdn->phb->global_number,
-			 pdn->busno,
-			 PCI_SLOT(pdn->devfn),
-			 PCI_FUNC(pdn->devfn));
+		eeh_edev_dbg(edev, "No PE found for device.\n");
 		return -EEXIST;
 	}
 
@@ -717,17 +696,13 @@ static void eeh_bridge_check_link(struct eeh_dev *edev)
 	if (!(edev->mode & (EEH_DEV_ROOT_PORT | EEH_DEV_DS_PORT)))
 		return;
 
-	pr_debug("%s: Check PCIe link for %04x:%02x:%02x.%01x ...\n",
-		 __func__, pdn->phb->global_number,
-		 pdn->busno,
-		 PCI_SLOT(pdn->devfn),
-		 PCI_FUNC(pdn->devfn));
+	eeh_edev_dbg(edev, "Checking PCIe link...\n");
 
 	/* Check slot status */
 	cap = edev->pcie_cap;
 	eeh_ops->read_config(pdn, cap + PCI_EXP_SLTSTA, 2, &val);
 	if (!(val & PCI_EXP_SLTSTA_PDS)) {
-		pr_debug("  No card in the slot (0x%04x) !\n", val);
+		eeh_edev_dbg(edev, "No card in the slot (0x%04x) !\n", val);
 		return;
 	}
 
@@ -736,7 +711,7 @@ static void eeh_bridge_check_link(struct eeh_dev *edev)
 	if (val & PCI_EXP_SLTCAP_PCP) {
 		eeh_ops->read_config(pdn, cap + PCI_EXP_SLTCTL, 2, &val);
 		if (val & PCI_EXP_SLTCTL_PCC) {
-			pr_debug("  In power-off state, power it on ...\n");
+			eeh_edev_dbg(edev, "In power-off state, power it on ...\n");
 			val &= ~(PCI_EXP_SLTCTL_PCC | PCI_EXP_SLTCTL_PIC);
 			val |= (0x0100 & PCI_EXP_SLTCTL_PIC);
 			eeh_ops->write_config(pdn, cap + PCI_EXP_SLTCTL, 2, val);
@@ -752,7 +727,7 @@ static void eeh_bridge_check_link(struct eeh_dev *edev)
 	/* Check link */
 	eeh_ops->read_config(pdn, cap + PCI_EXP_LNKCAP, 4, &val);
 	if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
-		pr_debug("  No link reporting capability (0x%08x) \n", val);
+		eeh_edev_dbg(edev, "No link reporting capability (0x%08x) \n", val);
 		msleep(1000);
 		return;
 	}
@@ -769,10 +744,10 @@ static void eeh_bridge_check_link(struct eeh_dev *edev)
 	}
 
 	if (val & PCI_EXP_LNKSTA_DLLLA)
-		pr_debug("  Link up (%s)\n",
+		eeh_edev_dbg(edev, "Link up (%s)\n",
 			 (val & PCI_EXP_LNKSTA_CLS_2_5GB) ? "2.5GB" : "5GB");
 	else
-		pr_debug("  Link not ready (0x%04x)\n", val);
+		eeh_edev_dbg(edev, "Link not ready (0x%04x)\n", val);
 }
 
 #define BYTE_SWAP(OFF)	(8*((OFF)/4)+3-(OFF))
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 7ee0df9ba2c8..43c3b7d81794 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -46,7 +46,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
 	if (eeh_has_flag(EEH_FORCE_DISABLED))
 		return;
 
-	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+	dev_dbg(&pdev->dev, "EEH: Setting up device\n");
 	eeh_add_device_early(pdn);
 	eeh_add_device_late(pdev);
 	eeh_sysfs_add_device(pdev);
@@ -399,10 +399,6 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	int ret;
 	int config_addr = (pdn->busno << 8) | (pdn->devfn);
 
-	pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
-		__func__, hose->global_number, pdn->busno,
-		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
-
 	/*
 	 * When probing the root bridge, which doesn't have any
 	 * subordinate PCI devices. We don't have OF node for
@@ -416,6 +412,8 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	if ((pdn->class_code >> 8) == PCI_CLASS_BRIDGE_ISA)
 		return NULL;
 
+	eeh_edev_dbg(edev, "Probing device\n");
+
 	/* Initialize eeh device */
 	edev->class_code = pdn->class_code;
 	edev->mode	&= 0xFFFFFF00;
@@ -441,9 +439,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	/* Create PE */
 	ret = eeh_add_to_parent_pe(edev);
 	if (ret) {
-		pr_warn("%s: Can't add PCI dev %04x:%02x:%02x.%01x to parent PE (%x)\n",
-			__func__, hose->global_number, pdn->busno,
-			PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn), ret);
+		eeh_edev_warn(edev, "Failed to add device to PE (code %d)\n", ret);
 		return NULL;
 	}
 
@@ -501,10 +497,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	/* Save memory bars */
 	eeh_save_bars(edev);
 
-	pr_debug("%s: EEH enabled on %02x:%02x.%01x PHB#%x-PE#%x\n",
-		__func__, pdn->busno, PCI_SLOT(pdn->devfn),
-		PCI_FUNC(pdn->devfn), edev->pe->phb->global_number,
-		edev->pe->addr);
+	eeh_edev_dbg(edev, "EEH enabled on device\n");
 
 	return NULL;
 }
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 96ad41fbf96b..517982197451 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -49,7 +49,7 @@ void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 	if (eeh_has_flag(EEH_FORCE_DISABLED))
 		return;
 
-	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+	dev_dbg(&pdev->dev, "EEH: Setting up device\n");
 #ifdef CONFIG_PCI_IOV
 	if (pdev->is_virtfn) {
 		struct pci_dn *physfn_pdn;
@@ -238,10 +238,6 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 	int enable = 0;
 	int ret;
 
-	pr_debug("%s: probing %04x:%02x:%02x.%01x\n",
-		__func__, pdn->phb->global_number, pdn->busno,
-		PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
-
 	/* Retrieve OF node and eeh device */
 	edev = pdn_to_eeh_dev(pdn);
 	if (!edev || edev->pe)
@@ -255,6 +251,8 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
         if ((pdn->class_code >> 8) == PCI_CLASS_BRIDGE_ISA)
 		return NULL;
 
+	eeh_edev_dbg(edev, "Probing device\n");
+
 	/*
 	 * Update class code and mode of eeh device. We need
 	 * correctly reflects that current device is root port
@@ -284,12 +282,10 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 	pe.config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
 
 	/* Enable EEH on the device */
+	eeh_edev_dbg(edev, "Enabling EEH on device\n");
 	ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
 	if (ret) {
-		pr_debug("%s: EEH failed to enable on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
-			__func__, pdn->busno, PCI_SLOT(pdn->devfn),
-			PCI_FUNC(pdn->devfn), pe.phb->global_number,
-			pe.addr, ret);
+		eeh_edev_dbg(edev, "EEH failed to enable on device (code %d)\n", ret);
 	} else {
 		/* Retrieve PE address */
 		edev->pe_config_addr = eeh_ops->get_pe_addr(&pe);
@@ -314,11 +310,8 @@ static void *pseries_eeh_probe(struct pci_dn *pdn, void *data)
 			edev->pe_config_addr = pdn_to_eeh_dev(pdn->parent)->pe_config_addr;
 			eeh_add_to_parent_pe(edev);
 		}
-		pr_debug("%s: EEH %s on %02x:%02x.%01x PHB#%x-PE#%x (code %d)\n",
-			__func__, (enable ? "enabled" : "unsupported"),
-			pdn->busno, PCI_SLOT(pdn->devfn),
-			PCI_FUNC(pdn->devfn), pe.phb->global_number,
-			pe.addr, ret);
+		eeh_edev_dbg(edev, "EEH is %s on device (code %d)\n",
+			     (enable ? "enabled" : "unsupported"), ret);
 	}
 
 	/* Save memory bars */
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 10/12] powerpc/eeh: Fix crash when edev->pdev changes
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (8 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 09/12] powerpc/eeh: Convert log messages to eeh_edev_* macros Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 11/12] powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse() Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 12/12] powerpc/eeh: Slightly simplify eeh_add_to_parent_pe() Sam Bobroff
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

If a PCI device is removed during eeh_pe_report_edev(), between the
calls to device_lock() and device_unlock(), edev->pdev will change and
cause a crash as the wrong mutex is released.

To correct this, hold the PCI rescan/remove lock while taking a copy
of edev->pdev and performing a get_device() on it.  Use this value to
release the mutex, but also pass it through to the device driver's EEH
handlers so that they always see the same device.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
v5 * New in this version.

 arch/powerpc/kernel/eeh_driver.c | 44 +++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 670b6159cef1..d0d175f22129 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -258,20 +258,27 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
 }
 
 typedef enum pci_ers_result (*eeh_report_fn)(struct eeh_dev *,
+					     struct pci_dev *,
 					     struct pci_driver *);
 static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn,
 			       enum pci_ers_result *result)
 {
+	struct pci_dev *pdev;
 	struct pci_driver *driver;
 	enum pci_ers_result new_result;
 
-	if (!edev->pdev) {
+	pci_lock_rescan_remove();
+	pdev = edev->pdev;
+	if (pdev)
+		get_device(&pdev->dev);
+	pci_unlock_rescan_remove();
+	if (!pdev) {
 		eeh_edev_info(edev, "no device");
 		return;
 	}
-	device_lock(&edev->pdev->dev);
+	device_lock(&pdev->dev);
 	if (eeh_edev_actionable(edev)) {
-		driver = eeh_pcid_get(edev->pdev);
+		driver = eeh_pcid_get(pdev);
 
 		if (!driver)
 			eeh_edev_info(edev, "no driver");
@@ -280,7 +287,7 @@ static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn,
 		else if (edev->mode & EEH_DEV_NO_HANDLER)
 			eeh_edev_info(edev, "driver bound too late");
 		else {
-			new_result = fn(edev, driver);
+			new_result = fn(edev, pdev, driver);
 			eeh_edev_info(edev, "%s driver reports: '%s'",
 				      driver->name,
 				      pci_ers_result_name(new_result));
@@ -289,12 +296,15 @@ static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn,
 							       new_result);
 		}
 		if (driver)
-			eeh_pcid_put(edev->pdev);
+			eeh_pcid_put(pdev);
 	} else {
-		eeh_edev_info(edev, "not actionable (%d,%d,%d)", !!edev->pdev,
+		eeh_edev_info(edev, "not actionable (%d,%d,%d)", !!pdev,
 			      !eeh_dev_removed(edev), !eeh_pe_passed(edev->pe));
 	}
-	device_unlock(&edev->pdev->dev);
+	device_unlock(&pdev->dev);
+	if (edev->pdev != pdev)
+		eeh_edev_warn(edev, "Device changed during processing!\n");
+	put_device(&pdev->dev);
 }
 
 static void eeh_pe_report(const char *name, struct eeh_pe *root,
@@ -321,20 +331,20 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root,
  * Report an EEH error to each device driver.
  */
 static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
+					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
-	struct pci_dev *dev = edev->pdev;
 
 	if (!driver->err_handler->error_detected)
 		return PCI_ERS_RESULT_NONE;
 
 	eeh_edev_info(edev, "Invoking %s->error_detected(IO frozen)",
 		      driver->name);
-	rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
+	rc = driver->err_handler->error_detected(pdev, pci_channel_io_frozen);
 
 	edev->in_error = true;
-	pci_uevent_ers(dev, PCI_ERS_RESULT_NONE);
+	pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE);
 	return rc;
 }
 
@@ -347,12 +357,13 @@ static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
  * are now enabled.
  */
 static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev,
+						   struct pci_dev *pdev,
 						   struct pci_driver *driver)
 {
 	if (!driver->err_handler->mmio_enabled)
 		return PCI_ERS_RESULT_NONE;
 	eeh_edev_info(edev, "Invoking %s->mmio_enabled()", driver->name);
-	return driver->err_handler->mmio_enabled(edev->pdev);
+	return driver->err_handler->mmio_enabled(pdev);
 }
 
 /**
@@ -366,12 +377,13 @@ static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev,
  * driver can work again while the device is recovered.
  */
 static enum pci_ers_result eeh_report_reset(struct eeh_dev *edev,
+					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
 	if (!driver->err_handler->slot_reset || !edev->in_error)
 		return PCI_ERS_RESULT_NONE;
 	eeh_edev_info(edev, "Invoking %s->slot_reset()", driver->name);
-	return driver->err_handler->slot_reset(edev->pdev);
+	return driver->err_handler->slot_reset(pdev);
 }
 
 static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
@@ -412,13 +424,14 @@ static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
  * to make the recovered device work again.
  */
 static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev,
+					     struct pci_dev *pdev,
 					     struct pci_driver *driver)
 {
 	if (!driver->err_handler->resume || !edev->in_error)
 		return PCI_ERS_RESULT_NONE;
 
 	eeh_edev_info(edev, "Invoking %s->resume()", driver->name);
-	driver->err_handler->resume(edev->pdev);
+	driver->err_handler->resume(pdev);
 
 	pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_RECOVERED);
 #ifdef CONFIG_PCI_IOV
@@ -437,6 +450,7 @@ static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev,
  * dead, and that no further recovery attempts will be made on it.
  */
 static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev,
+					      struct pci_dev *pdev,
 					      struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
@@ -446,10 +460,10 @@ static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev,
 
 	eeh_edev_info(edev, "Invoking %s->error_detected(permanent failure)",
 		      driver->name);
-	rc = driver->err_handler->error_detected(edev->pdev,
+	rc = driver->err_handler->error_detected(pdev,
 						 pci_channel_io_perm_failure);
 
-	pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_DISCONNECT);
+	pci_uevent_ers(pdev, PCI_ERS_RESULT_DISCONNECT);
 	return rc;
 }
 
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 11/12] powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse()
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (9 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 10/12] powerpc/eeh: Fix crash when edev->pdev changes Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  2019-08-16  4:48 ` [PATCH v5 12/12] powerpc/eeh: Slightly simplify eeh_add_to_parent_pe() Sam Bobroff
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

There are no users of the early-out return value from
eeh_pe_dev_traverse(), so remove it.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
v5 * New in this version.

 arch/powerpc/include/asm/eeh.h   |  6 +++---
 arch/powerpc/kernel/eeh.c        | 16 +++++-----------
 arch/powerpc/kernel/eeh_driver.c | 26 +++++++++++---------------
 arch/powerpc/kernel/eeh_pe.c     | 25 +++++++------------------
 4 files changed, 26 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e1023a556721..10cb6342f60b 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -261,7 +261,7 @@ static inline bool eeh_state_active(int state)
 	== (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
 }
 
-typedef void *(*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag);
+typedef void (*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag);
 typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
@@ -275,8 +275,8 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev);
 void eeh_pe_update_time_stamp(struct eeh_pe *pe);
 void *eeh_pe_traverse(struct eeh_pe *root,
 		      eeh_pe_traverse_func fn, void *flag);
-void *eeh_pe_dev_traverse(struct eeh_pe *root,
-			  eeh_edev_traverse_func fn, void *flag);
+void eeh_pe_dev_traverse(struct eeh_pe *root,
+			 eeh_edev_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index b6683f367f7f..b7a884305ae5 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -696,7 +696,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
 	return rc;
 }
 
-static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev,
+static void eeh_disable_and_save_dev_state(struct eeh_dev *edev,
 					    void *userdata)
 {
 	struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
@@ -707,7 +707,7 @@ static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev,
 	 * state for the specified device
 	 */
 	if (!pdev || pdev == dev)
-		return NULL;
+		return;
 
 	/* Ensure we have D0 power state */
 	pci_set_power_state(pdev, PCI_D0);
@@ -720,18 +720,16 @@ static void *eeh_disable_and_save_dev_state(struct eeh_dev *edev,
 	 * interrupt from the device
 	 */
 	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
-
-	return NULL;
 }
 
-static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata)
+static void eeh_restore_dev_state(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 	struct pci_dev *pdev = eeh_dev_to_pci_dev(edev);
 	struct pci_dev *dev = userdata;
 
 	if (!pdev)
-		return NULL;
+		return;
 
 	/* Apply customization from firmware */
 	if (pdn && eeh_ops->restore_config)
@@ -740,8 +738,6 @@ static void *eeh_restore_dev_state(struct eeh_dev *edev, void *userdata)
 	/* The caller should restore state for the specified device */
 	if (pdev != dev)
 		pci_restore_state(pdev);
-
-	return NULL;
 }
 
 int eeh_restore_vf_config(struct pci_dn *pdn)
@@ -956,7 +952,7 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
  * the indicated device and its children so that the bunch of the
  * devices could be reset properly.
  */
-static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag)
+static void eeh_set_dev_freset(struct eeh_dev *edev, void *flag)
 {
 	struct pci_dev *dev;
 	unsigned int *freset = (unsigned int *)flag;
@@ -964,8 +960,6 @@ static void *eeh_set_dev_freset(struct eeh_dev *edev, void *flag)
 	dev = eeh_dev_to_pci_dev(edev);
 	if (dev)
 		*freset |= dev->needs_freset;
-
-	return NULL;
 }
 
 static void eeh_pe_refreeze_passed(struct eeh_pe *root)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index d0d175f22129..56841c6451e5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -198,12 +198,12 @@ static void eeh_enable_irq(struct eeh_dev *edev)
 	}
 }
 
-static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
+static void eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_dev *pdev;
 
 	if (!edev)
-		return NULL;
+		return;
 
 	/*
 	 * We cannot access the config space on some adapters.
@@ -213,14 +213,13 @@ static void *eeh_dev_save_state(struct eeh_dev *edev, void *userdata)
 	 * device is created.
 	 */
 	if (edev->pe && (edev->pe->state & EEH_PE_CFG_RESTRICTED))
-		return NULL;
+		return;
 
 	pdev = eeh_dev_to_pci_dev(edev);
 	if (!pdev)
-		return NULL;
+		return;
 
 	pci_save_state(pdev);
-	return NULL;
 }
 
 static void eeh_set_channel_state(struct eeh_pe *root, enum pci_channel_state s)
@@ -386,12 +385,12 @@ static enum pci_ers_result eeh_report_reset(struct eeh_dev *edev,
 	return driver->err_handler->slot_reset(pdev);
 }
 
-static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
+static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_dev *pdev;
 
 	if (!edev)
-		return NULL;
+		return;
 
 	/*
 	 * The content in the config space isn't saved because
@@ -403,15 +402,14 @@ static void *eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
 		if (list_is_last(&edev->entry, &edev->pe->edevs))
 			eeh_pe_restore_bars(edev->pe);
 
-		return NULL;
+		return;
 	}
 
 	pdev = eeh_dev_to_pci_dev(edev);
 	if (!pdev)
-		return NULL;
+		return;
 
 	pci_restore_state(pdev);
-	return NULL;
 }
 
 /**
@@ -492,7 +490,7 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
 	return NULL;
 }
 
-static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata)
+static void eeh_rmv_device(struct eeh_dev *edev, void *userdata)
 {
 	struct pci_driver *driver;
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
@@ -507,7 +505,7 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata)
 	 */
 	if (!eeh_edev_actionable(edev) ||
 	    (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE))
-		return NULL;
+		return;
 
 	if (rmv_data) {
 		driver = eeh_pcid_get(dev);
@@ -516,7 +514,7 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata)
 			    driver->err_handler->error_detected &&
 			    driver->err_handler->slot_reset) {
 				eeh_pcid_put(dev);
-				return NULL;
+				return;
 			}
 			eeh_pcid_put(dev);
 		}
@@ -549,8 +547,6 @@ static void *eeh_rmv_device(struct eeh_dev *edev, void *userdata)
 		pci_stop_and_remove_bus_device(dev);
 		pci_unlock_rescan_remove();
 	}
-
-	return NULL;
 }
 
 static void *eeh_pe_detach_dev(struct eeh_pe *pe, void *userdata)
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 317a31624526..9c38fa7c33aa 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -231,29 +231,22 @@ void *eeh_pe_traverse(struct eeh_pe *root,
  * The function is used to traverse the devices of the specified
  * PE and its child PEs.
  */
-void *eeh_pe_dev_traverse(struct eeh_pe *root,
+void eeh_pe_dev_traverse(struct eeh_pe *root,
 			  eeh_edev_traverse_func fn, void *flag)
 {
 	struct eeh_pe *pe;
 	struct eeh_dev *edev, *tmp;
-	void *ret;
 
 	if (!root) {
 		pr_warn("%s: Invalid PE %p\n",
 			__func__, root);
-		return NULL;
+		return;
 	}
 
 	/* Traverse root PE */
-	eeh_for_each_pe(root, pe) {
-		eeh_pe_for_each_dev(pe, edev, tmp) {
-			ret = fn(edev, flag);
-			if (ret)
-				return ret;
-		}
-	}
-
-	return NULL;
+	eeh_for_each_pe(root, pe)
+		eeh_pe_for_each_dev(pe, edev, tmp)
+			fn(edev, flag);
 }
 
 /**
@@ -602,13 +595,11 @@ void eeh_pe_mark_isolated(struct eeh_pe *root)
 }
 EXPORT_SYMBOL_GPL(eeh_pe_mark_isolated);
 
-static void *__eeh_pe_dev_mode_mark(struct eeh_dev *edev, void *flag)
+static void __eeh_pe_dev_mode_mark(struct eeh_dev *edev, void *flag)
 {
 	int mode = *((int *)flag);
 
 	edev->mode |= mode;
-
-	return NULL;
 }
 
 /**
@@ -827,7 +818,7 @@ static void eeh_restore_device_bars(struct eeh_dev *edev)
  * the expansion ROM base address, the latency timer, and etc.
  * from the saved values in the device node.
  */
-static void *eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag)
+static void eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag)
 {
 	struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
@@ -839,8 +830,6 @@ static void *eeh_restore_one_device_bars(struct eeh_dev *edev, void *flag)
 
 	if (eeh_ops->restore_config && pdn)
 		eeh_ops->restore_config(pdn);
-
-	return NULL;
 }
 
 /**
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 12/12] powerpc/eeh: Slightly simplify eeh_add_to_parent_pe()
  2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
                   ` (10 preceding siblings ...)
  2019-08-16  4:48 ` [PATCH v5 11/12] powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse() Sam Bobroff
@ 2019-08-16  4:48 ` Sam Bobroff
  11 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-16  4:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, oohall, tyreld

Simplify some needlessly complicated boolean logic in
eeh_add_to_parent_pe().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
v5 * New in this version.

 arch/powerpc/kernel/eeh_pe.c | 52 +++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 9c38fa7c33aa..1a6254bcf056 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -383,32 +383,34 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
 	 * components.
 	 */
 	pe = eeh_pe_get(pdn->phb, edev->pe_config_addr, config_addr);
-	if (pe && !(pe->type & EEH_PE_INVALID)) {
-		/* Mark the PE as type of PCI bus */
-		pe->type = EEH_PE_BUS;
-		edev->pe = pe;
-
-		/* Put the edev to PE */
-		list_add_tail(&edev->entry, &pe->edevs);
-		eeh_edev_dbg(edev, "Added to bus PE\n");
-		return 0;
-	} else if (pe && (pe->type & EEH_PE_INVALID)) {
-		list_add_tail(&edev->entry, &pe->edevs);
-		edev->pe = pe;
-		/*
-		 * We're running to here because of PCI hotplug caused by
-		 * EEH recovery. We need clear EEH_PE_INVALID until the top.
-		 */
-		parent = pe;
-		while (parent) {
-			if (!(parent->type & EEH_PE_INVALID))
-				break;
-			parent->type &= ~EEH_PE_INVALID;
-			parent = parent->parent;
-		}
+	if (pe) {
+		if (pe->type & EEH_PE_INVALID) {
+			list_add_tail(&edev->entry, &pe->edevs);
+			edev->pe = pe;
+			/*
+			 * We're running to here because of PCI hotplug caused by
+			 * EEH recovery. We need clear EEH_PE_INVALID until the top.
+			 */
+			parent = pe;
+			while (parent) {
+				if (!(parent->type & EEH_PE_INVALID))
+					break;
+				parent->type &= ~EEH_PE_INVALID;
+				parent = parent->parent;
+			}
 
-		eeh_edev_dbg(edev, "Added to device PE (parent: PE#%x)\n",
-			     pe->parent->addr);
+			eeh_edev_dbg(edev,
+				     "Added to device PE (parent: PE#%x)\n",
+				     pe->parent->addr);
+		} else {
+			/* Mark the PE as type of PCI bus */
+			pe->type = EEH_PE_BUS;
+			edev->pe = pe;
+
+			/* Put the edev to PE */
+			list_add_tail(&edev->entry, &pe->edevs);
+			eeh_edev_dbg(edev, "Added to bus PE\n");
+		}
 		return 0;
 	}
 
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
  2019-08-16  4:48 ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Sam Bobroff
@ 2019-08-21  3:28   ` Michael Ellerman
  2019-08-22  6:17     ` [PATCH] powerpc/eeh: Fixup EEH for pSeries hotplug Sam Bobroff
  2019-09-19 20:28   ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Nathan Lynch
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Ellerman @ 2019-08-21  3:28 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev; +Cc: aik, oohall, tyreld

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
> index 427fc22f72b6..11c807468ab5 100644
> --- a/arch/powerpc/kernel/of_platform.c
> +++ b/arch/powerpc/kernel/of_platform.c
> @@ -81,7 +81,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
>  	pcibios_claim_one_bus(phb->bus);
>  
>  	/* Finish EEH setup */
> -	eeh_add_device_tree_late(phb->bus);
> +	if (!eeh_has_flag(EEH_FORCE_DISABLED))
> +		eeh_add_device_tree_late(phb->bus);

This breaks cell_defconfig which has CONFIG_EEH=n.

That's because while eeh_add_device_tree_late() has an empty definition
in that case, eeh_has_flag() and EEH_FORCE_DISABLED do not.

Let me know how you want to fix it, if it's small just send me an
incremental diff.

cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] powerpc/eeh: Fixup EEH for pSeries hotplug
  2019-08-21  3:28   ` Michael Ellerman
@ 2019-08-22  6:17     ` Sam Bobroff
  0 siblings, 0 replies; 21+ messages in thread
From: Sam Bobroff @ 2019-08-22  6:17 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: aik, oohall, tyreld

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
Let's move the test into eeh_add_device_tree_late().

Thanks,
Sam.

 arch/powerpc/kernel/eeh.c         | 2 ++
 arch/powerpc/kernel/of_platform.c | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 87edac6f2fd9..e95a7a3c9037 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1328,6 +1328,8 @@ void eeh_add_device_tree_late(struct pci_bus *bus)
 {
 	struct pci_dev *dev;
 
+	if (eeh_has_flag(EEH_FORCE_DISABLED))
+		return;
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		eeh_add_device_late(dev);
 		if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index 11c807468ab5..427fc22f72b6 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -81,8 +81,7 @@ static int of_pci_phb_probe(struct platform_device *dev)
 	pcibios_claim_one_bus(phb->bus);
 
 	/* Finish EEH setup */
-	if (!eeh_has_flag(EEH_FORCE_DISABLED))
-		eeh_add_device_tree_late(phb->bus);
+	eeh_add_device_tree_late(phb->bus);
 
 	/* Add probed PCI devices to the device model */
 	pci_bus_add_devices(phb->bus);
-- 
2.22.0.216.g00a2a96fc9


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init()
  2019-08-16  4:48 ` [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init() Sam Bobroff
@ 2019-08-28  4:24   ` Michael Ellerman
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2019-08-28  4:24 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev; +Cc: aik, oohall, tyreld

On Fri, 2019-08-16 at 04:48:05 UTC, Sam Bobroff wrote:
> The pcibios_init() function for PowerPC 64 currently calls
> pci_bus_add_devices() before pcibios_resource_survey(). This means
> that at boot time, when the pcibios_bus_add_device() hooks are called
> by pci_bus_add_devices(), device resources have not been allocated and
> they are unable to perform EEH setup, so a separate pass is needed.
> 
> This patch adjusts that order so that it will become possible to
> consolidate the EEH setup work into a single location.
> 
> The only functional change is to execute pcibios_resource_survey()
> (excepting ppc_md.pcibios_fixup(), see below) before
> pci_bus_add_devices() instead of after it.
> 
> Because pcibios_scan_phb() and pci_bus_add_devices() are called
> together in a loop, this must be broken into one loop for each call.
> Then the call to pcibios_resource_survey() is moved up in between
> them. This changes the ordering but because pcibios_resource_survey()
> also calls ppc_md.pcibios_fixup(), that call is extracted out into
> pcibios_init() to where pcibios_resource_survey() was, so that it is
> not moved.
> 
> The only other caller of pcibios_resource_survey() is the PowerPC 32
> version of pcibios_init(), and therefore, that is modified to call
> ppc_md.pcibios_fixup() right after pcibios_resource_survey() so that
> there is no functional change there at all.
> 
> The re-arrangement will cause very few side-effects because at this
> stage in the boot, pci_bus_add_devices() does very little:
> - pci_create_sysfs_dev_files() does nothing (no sysfs yet)
> - pci_proc_attach_device() does nothing (no proc yet)
> - device_attach() does nothing (no drivers yet)
> This leaves only the pci_final_fixup calls, D3 support, and marking
> the device as added. Of those, only the pci_final_fixup calls have the
> potential to be affected by resource allocation.
> 
> The only pci_final_fixup handlers that touch resources seem to be one
> for x86 (pci_amd_enable_64bit_bar()), and a PowerPC 32 platform driver
> (quirk_final_uli1575()), neither of which use this pcibios_init()
> function. Even if they did, it would almost certainly be a bug, under
> the current ordering, to rely on or make changes to resources before
> they were allocated.
> 
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3f068aae7a958555533847af88705b5629f31600

cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
  2019-08-16  4:48 ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Sam Bobroff
  2019-08-21  3:28   ` Michael Ellerman
@ 2019-09-19 20:28   ` Nathan Lynch
  2019-09-19 23:27     ` Oliver O'Halloran
  2019-09-23  5:00     ` Sam Bobroff
  1 sibling, 2 replies; 21+ messages in thread
From: Nathan Lynch @ 2019-09-19 20:28 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: aik, linuxppc-dev, oohall, tyreld

Hello Sam,

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> On PowerNV and pSeries, devices currently acquire EEH support from
> several different places: Boot-time devices from eeh_probe_devices()
> and eeh_addr_cache_build(), Virtual Function devices from the pcibios
> bus add device hooks and hot plugged devices from pci_hp_add_devices()
> (with other platforms using other methods as well).  Unfortunately,
> pSeries machines currently discover hot plugged devices using
> pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
> not receive EEH support.
>
> Rather than adding another case for pci_rescan_bus(), this change
> widens the scope of the pcibios bus add device hooks so that they can
> handle all devices. As a side effect this also supports devices
> discovered after manually rescanning via /sys/bus/pci/rescan.
>
> Note that on PowerNV, this change allows the EEH subsystem to become
> enabled after boot as long as it has not been forced off, which was
> not previously possible (it was already possible on pSeries).

With this change, I get a crash (use after free by the looks of it) when
I remove and then add a pci device in qemu:

$ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \
  -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \
  -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \
  -nic model=e1000

...

# echo 1 > /sys/devices/pci0000:00/0000:00:00.0/remove ; \
  echo 1 > /sys/devices/pci0000:00/pci_bus/0000:00/rescan

pci 0000:00:00.0: Removing from iommu group 0
pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
pci 0000:00:00.0: reg 0x10: [mem 0x200080000000-0x20008001ffff]
pci 0000:00:00.0: reg 0x14: [io  0x10040-0x1007f]
pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
pci 0000:00:00.0: Adding to iommu group 0
pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
e1000 0000:00:00.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
e1000 0000:00:00.0 eth0: Intel(R) PRO/1000 Network Connection
pci 0000:00:00.0: Removing from iommu group 0
pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
pci 0000:00:00.0: reg 0x10: [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: reg 0x14: [io  0x10000-0x1003f]
pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6bfb
Faulting instruction address: 0xc000000000597270
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 2464 Comm: pci-probe-vs-cp Not tainted 5.3.0-rc2-00092-gf381d5711f09 #76
NIP:  c000000000597270 LR: c000000000599470 CTR: c0000000002030b0
REGS: c00000003ee4f650 TRAP: 0380   Not tainted  (5.3.0-rc2-00092-gf381d5711f09)
MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002442  XER: 00000000
CFAR: c00000000059946c IRQMASK: 0 
GPR00: c000000000599470 c00000003ee4f8e0 c000000003317a00 6b6b6b6b6b6b6b6b 
GPR04: c000000001d0fa38 0000000000000000 0000000000000000 221a64979a66f870 
GPR08: c00000000347b398 0000000000000000 c00000000336e070 ffffffffffffffff 
GPR12: 0000000000002000 c000000004060000 0000000000000000 0000000000000000 
GPR16: 00000000100a78d8 00007fffe9fdff96 00000000100a7898 0000000000000000 
GPR20: 0000000000000000 00000000100e0ff0 0000000000000000 00000000100e0fe8 
GPR24: 0000000000000000 000001002ae50260 c000000001d0fa38 6b6b6b6b6b6b6b6b 
GPR28: fffffffffffffff2 c000000001d0fa38 0000000000000000 c000000003118c18 
NIP [c000000000597270] kernfs_find_ns+0x50/0x3d0
LR [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
Call Trace:
[c00000003ee4f8e0] [c00000000020950c] lockdep_hardirqs_on+0x10c/0x210 (unreliable)
[c00000003ee4f970] [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
[c00000003ee4fa00] [c00000000059ca08] sysfs_remove_file_ns+0x28/0x40
[c00000003ee4fa20] [c000000000cbd70c] device_remove_file+0x2c/0x40
[c00000003ee4fa40] [c000000000051480] eeh_sysfs_remove_device+0x50/0xf0
[c00000003ee4fa80] [c00000000004a594] eeh_add_device_late.part.7+0x84/0x220
[c00000003ee4fb00] [c0000000000e94f0] pseries_pcibios_bus_add_device+0x60/0xb0
[c00000003ee4fb70] [c00000000006fc40] pcibios_bus_add_device+0x40/0x60
[c00000003ee4fb90] [c000000000bc5220] pci_bus_add_device+0x30/0x100
[c00000003ee4fc00] [c000000000bc5344] pci_bus_add_devices+0x54/0xb0
[c00000003ee4fc40] [c000000000bca058] pci_rescan_bus+0x48/0x70
[c00000003ee4fc70] [c000000000bd9adc] dev_bus_rescan_store+0xcc/0x100
[c00000003ee4fcb0] [c000000000cbc9d8] dev_attr_store+0x38/0x60
[c00000003ee4fcd0] [c00000000059c460] sysfs_kf_write+0x70/0xb0
[c00000003ee4fd10] [c00000000059aa98] kernfs_fop_write+0xf8/0x280
[c00000003ee4fd60] [c0000000004b3e5c] __vfs_write+0x3c/0x70
[c00000003ee4fd80] [c0000000004b81f0] vfs_write+0xd0/0x220
[c00000003ee4fdd0] [c0000000004b85ac] ksys_write+0x7c/0x140
[c00000003ee4fe20] [c00000000000bc6c] system_call+0x5c/0x70

FWIW during boot the EEH core reports:

  EEH: No capable adapters found: recovery disabled.

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index ca8b0c58a6a7..87edac6f2fd9 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
>  	struct pci_dn *pdn;
>  	struct eeh_dev *edev;
>  
> -	if (!dev || !eeh_enabled())
> +	if (!dev)
>  		return;
>  
>  	pr_debug("EEH: Adding device %s\n", pci_name(dev));

Reverting this hunk works around (fixes?) it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
  2019-09-19 20:28   ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Nathan Lynch
@ 2019-09-19 23:27     ` Oliver O'Halloran
  2019-09-19 23:44       ` Nathan Lynch
  2019-09-23  5:00     ` Sam Bobroff
  1 sibling, 1 reply; 21+ messages in thread
From: Oliver O'Halloran @ 2019-09-19 23:27 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Sam Bobroff, Alexey Kardashevskiy, linuxppc-dev, Tyrel Datwyler

On Fri, Sep 20, 2019 at 6:28 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> Hello Sam,
>
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
>
> With this change, I get a crash (use after free by the looks of it) when
> I remove and then add a pci device in qemu:
>
> $ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \
>   -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \
>   -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \
>   -nic model=e1000

is there anything special in your kernel config? I tested this with
pseries_le_defconfig and couldn't hit the crash.

>
> ...
>
> # echo 1 > /sys/devices/pci0000:00/0000:00:00.0/remove ; \
>   echo 1 > /sys/devices/pci0000:00/pci_bus/0000:00/rescan
>
> pci 0000:00:00.0: Removing from iommu group 0
> pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
> pci 0000:00:00.0: reg 0x10: [mem 0x200080000000-0x20008001ffff]
> pci 0000:00:00.0: reg 0x14: [io  0x10040-0x1007f]
> pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
> pci 0000:00:00.0: Adding to iommu group 0
> pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
> pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
> pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
> e1000 0000:00:00.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> e1000 0000:00:00.0 eth0: Intel(R) PRO/1000 Network Connection
> pci 0000:00:00.0: Removing from iommu group 0
> pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
> pci 0000:00:00.0: reg 0x10: [mem 0x200080040000-0x20008005ffff]
> pci 0000:00:00.0: reg 0x14: [io  0x10000-0x1003f]
> pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
> pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
> pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
> pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
> BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6bfb
> Faulting instruction address: 0xc000000000597270
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in:
> CPU: 0 PID: 2464 Comm: pci-probe-vs-cp Not tainted 5.3.0-rc2-00092-gf381d5711f09 #76
> NIP:  c000000000597270 LR: c000000000599470 CTR: c0000000002030b0
> REGS: c00000003ee4f650 TRAP: 0380   Not tainted  (5.3.0-rc2-00092-gf381d5711f09)
> MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002442  XER: 00000000
> CFAR: c00000000059946c IRQMASK: 0
> GPR00: c000000000599470 c00000003ee4f8e0 c000000003317a00 6b6b6b6b6b6b6b6b
> GPR04: c000000001d0fa38 0000000000000000 0000000000000000 221a64979a66f870
> GPR08: c00000000347b398 0000000000000000 c00000000336e070 ffffffffffffffff
> GPR12: 0000000000002000 c000000004060000 0000000000000000 0000000000000000
> GPR16: 00000000100a78d8 00007fffe9fdff96 00000000100a7898 0000000000000000
> GPR20: 0000000000000000 00000000100e0ff0 0000000000000000 00000000100e0fe8
> GPR24: 0000000000000000 000001002ae50260 c000000001d0fa38 6b6b6b6b6b6b6b6b
> GPR28: fffffffffffffff2 c000000001d0fa38 0000000000000000 c000000003118c18
> NIP [c000000000597270] kernfs_find_ns+0x50/0x3d0
> LR [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
> Call Trace:
> [c00000003ee4f8e0] [c00000000020950c] lockdep_hardirqs_on+0x10c/0x210 (unreliable)
> [c00000003ee4f970] [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
> [c00000003ee4fa00] [c00000000059ca08] sysfs_remove_file_ns+0x28/0x40
> [c00000003ee4fa20] [c000000000cbd70c] device_remove_file+0x2c/0x40
> [c00000003ee4fa40] [c000000000051480] eeh_sysfs_remove_device+0x50/0xf0
> [c00000003ee4fa80] [c00000000004a594] eeh_add_device_late.part.7+0x84/0x220
> [c00000003ee4fb00] [c0000000000e94f0] pseries_pcibios_bus_add_device+0x60/0xb0
> [c00000003ee4fb70] [c00000000006fc40] pcibios_bus_add_device+0x40/0x60
> [c00000003ee4fb90] [c000000000bc5220] pci_bus_add_device+0x30/0x100
> [c00000003ee4fc00] [c000000000bc5344] pci_bus_add_devices+0x54/0xb0
> [c00000003ee4fc40] [c000000000bca058] pci_rescan_bus+0x48/0x70
> [c00000003ee4fc70] [c000000000bd9adc] dev_bus_rescan_store+0xcc/0x100
> [c00000003ee4fcb0] [c000000000cbc9d8] dev_attr_store+0x38/0x60
> [c00000003ee4fcd0] [c00000000059c460] sysfs_kf_write+0x70/0xb0
> [c00000003ee4fd10] [c00000000059aa98] kernfs_fop_write+0xf8/0x280
> [c00000003ee4fd60] [c0000000004b3e5c] __vfs_write+0x3c/0x70
> [c00000003ee4fd80] [c0000000004b81f0] vfs_write+0xd0/0x220
> [c00000003ee4fdd0] [c0000000004b85ac] ksys_write+0x7c/0x140
> [c00000003ee4fe20] [c00000000000bc6c] system_call+0x5c/0x70
>
> FWIW during boot the EEH core reports:
>
>   EEH: No capable adapters found: recovery disabled.
>
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index ca8b0c58a6a7..87edac6f2fd9 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
> >       struct pci_dn *pdn;
> >       struct eeh_dev *edev;
> >
> > -     if (!dev || !eeh_enabled())
> > +     if (!dev)
> >               return;
> >
> >       pr_debug("EEH: Adding device %s\n", pci_name(dev));
>
> Reverting this hunk works around (fixes?) it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
  2019-09-19 23:27     ` Oliver O'Halloran
@ 2019-09-19 23:44       ` Nathan Lynch
  0 siblings, 0 replies; 21+ messages in thread
From: Nathan Lynch @ 2019-09-19 23:44 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Sam Bobroff, Alexey Kardashevskiy, linuxppc-dev, Tyrel Datwyler

"Oliver O'Halloran" <oohall@gmail.com> writes:

> On Fri, Sep 20, 2019 at 6:28 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>>
>> Hello Sam,
>>
>> Sam Bobroff <sbobroff@linux.ibm.com> writes:
>>
>> With this change, I get a crash (use after free by the looks of it) when
>> I remove and then add a pci device in qemu:
>>
>> $ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \
>>   -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \
>>   -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \
>>   -nic model=e1000
>
> is there anything special in your kernel config? I tested this with
> pseries_le_defconfig and couldn't hit the crash.

My config is below; CONFIG_SLUB_DEBUG_ON=y probably makes the difference.

CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_AUDIT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_LOG_CPU_MAX_BUF_SHIFT=13
CONFIG_NUMA_BALANCING=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_BPF=y
CONFIG_USER_NS=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="rootfs.cpio"
CONFIG_BPF_SYSCALL=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_PPC64=y
CONFIG_NR_CPUS=2048
CONFIG_CPU_LITTLE_ENDIAN=y
CONFIG_PPC_SPLPAR=y
CONFIG_DTL=y
CONFIG_SCANLOG=y
CONFIG_PPC_SMLPAR=y
CONFIG_RTAS_FLASH=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_HZ_100=y
CONFIG_PPC_TRANSACTIONAL_MEM=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_IRQ_ALL_CPUS=y
CONFIG_PPC_64K_PAGES=y
CONFIG_PPC_SUBPAGE_PROT=y
CONFIG_SCHED_SMT=y
CONFIG_PM_DEBUG=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM_BOOK3S_64=y
CONFIG_KVM_BOOK3S_64_HV=y
CONFIG_VHOST_NET=y
CONFIG_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
CONFIG_REFCOUNT_FULL=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_BINFMT_MISC=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_KSM=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=y
CONFIG_NET_KEY=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_NET_IPIP=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=y
CONFIG_INET_ESP=y
CONFIG_INET_IPCOMP=y
# CONFIG_IPV6 is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_ADVANCED is not set
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NF_CT_NETLINK=y
CONFIG_NETFILTER_XT_MARK=y
CONFIG_NETFILTER_XT_TARGET_LOG=y
CONFIG_NETFILTER_XT_TARGET_NFLOG=y
CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=y
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
CONFIG_NETFILTER_XT_MATCH_POLICY=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
CONFIG_NF_LOG_ARP=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_MANGLE=y
CONFIG_BRIDGE=y
CONFIG_VLAN_8021Q=y
CONFIG_NET_SCHED=y
CONFIG_NET_CLS_BPF=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_BPF=y
CONFIG_BPF_JIT=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_RPA=y
CONFIG_HOTPLUG_PCI_RPA_DLPAR=y
CONFIG_UEVENT_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_OF_UNITTEST=y
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_NBD=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=65536
CONFIG_VIRTIO_BLK=y
CONFIG_CXL=y
CONFIG_OCXL=y
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_CXGB3_ISCSI=y
CONFIG_SCSI_CXGB4_ISCSI=y
CONFIG_SCSI_BNX2_ISCSI=y
CONFIG_BE2ISCSI=y
CONFIG_CXLFLASH=y
CONFIG_SCSI_MPT2SAS=y
CONFIG_SCSI_IBMVSCSI=y
CONFIG_SCSI_IBMVFC=y
CONFIG_SCSI_SYM53C8XX_2=y
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0
CONFIG_SCSI_IPR=y
CONFIG_SCSI_QLA_FC=y
CONFIG_SCSI_QLA_ISCSI=y
CONFIG_SCSI_LPFC=y
CONFIG_SCSI_VIRTIO=y
CONFIG_SCSI_DH=y
CONFIG_SCSI_DH_RDAC=y
CONFIG_SCSI_DH_ALUA=y
CONFIG_ATA=y
CONFIG_SATA_AHCI=y
CONFIG_PATA_AMD=y
CONFIG_ATA_GENERIC=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_LINEAR=y
CONFIG_MD_RAID0=y
CONFIG_MD_RAID1=y
CONFIG_MD_RAID10=y
CONFIG_MD_RAID456=y
CONFIG_MD_MULTIPATH=y
CONFIG_MD_FAULTY=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_CRYPT=y
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_THIN_PROVISIONING=y
CONFIG_DM_MIRROR=y
CONFIG_DM_ZERO=y
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=y
CONFIG_DM_MULTIPATH_ST=y
CONFIG_DM_UEVENT=y
CONFIG_BONDING=y
CONFIG_DUMMY=y
CONFIG_MACVLAN=y
CONFIG_MACVTAP=y
CONFIG_VXLAN=y
CONFIG_NETCONSOLE=y
CONFIG_TUN=y
CONFIG_VETH=y
CONFIG_VIRTIO_NET=y
CONFIG_VORTEX=y
CONFIG_ACENIC=y
CONFIG_ACENIC_OMIT_TIGON_I=y
CONFIG_PCNET32=y
CONFIG_TIGON3=y
CONFIG_BNX2X=y
CONFIG_CHELSIO_T1=y
CONFIG_BE2NET=y
CONFIG_IBMVETH=y
CONFIG_E100=y
CONFIG_E1000=y
CONFIG_E1000E=y
CONFIG_IXGB=y
CONFIG_IXGBE=y
CONFIG_I40E=y
CONFIG_MLX4_EN=y
CONFIG_MYRI10GE=y
CONFIG_S2IO=y
CONFIG_QLGE=y
CONFIG_NETXEN_NIC=y
CONFIG_PPP=y
CONFIG_PPP_BSDCOMP=y
CONFIG_PPP_DEFLATE=y
CONFIG_PPPOE=y
CONFIG_PPP_ASYNC=y
CONFIG_PPP_SYNC_TTY=y
CONFIG_INPUT_EVDEV=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=y
# CONFIG_SERIO_SERPORT is not set
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_ICOM=y
CONFIG_SERIAL_JSM=y
CONFIG_HVC_CONSOLE=y
CONFIG_HVC_RTAS=y
CONFIG_HVCS=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_IBM_BSR=y
CONFIG_POWERNV_OP_PANEL=y
CONFIG_HW_RANDOM=y
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=1024
CONFIG_I2C_CHARDEV=y
CONFIG_FB=y
CONFIG_FIRMWARE_EDID=y
CONFIG_FB_OF=y
CONFIG_FB_MATROX=y
CONFIG_FB_MATROX_MILLENIUM=y
CONFIG_FB_MATROX_MYSTIQUE=y
CONFIG_FB_MATROX_G=y
CONFIG_FB_RADEON=y
CONFIG_FB_IBM_GXT4500=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_PLATFORM=y
# CONFIG_VGA_CONSOLE is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_LOGO=y
CONFIG_HID_GYRATION=y
CONFIG_HID_PANTHERLORD=y
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SUNPLUS=y
CONFIG_USB_HIDDEV=y
CONFIG_USB=y
CONFIG_USB_MON=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_HCD_PPC_OF is not set
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_POWERNV=y
CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_CXGB3=y
CONFIG_INFINIBAND_CXGB4=y
CONFIG_MLX4_INFINIBAND=y
CONFIG_INFINIBAND_IPOIB=y
CONFIG_INFINIBAND_IPOIB_CM=y
CONFIG_INFINIBAND_SRP=y
CONFIG_INFINIBAND_ISER=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_GENERIC=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_VALIDATE_FS_PARSER=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_JFS_FS=y
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
CONFIG_XFS_FS=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_BTRFS_FS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_NILFS2_FS=y
CONFIG_FS_DAX=y
CONFIG_AUTOFS4_FS=y
CONFIG_FUSE_FS=y
CONFIG_OVERLAY_FS=y
CONFIG_ISO9660_FS=y
CONFIG_UDF_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_CRAMFS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
CONFIG_PSTORE=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFSD=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_CIFS=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_UTF8=y
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_PCBC=y
CONFIG_CRYPTO_CRC32C_VPMSUM=y
CONFIG_CRYPTO_MD5_PPC=y
CONFIG_CRYPTO_MICHAEL_MIC=y
CONFIG_CRYPTO_SHA1_PPC=y
CONFIG_CRYPTO_TGR192=y
CONFIG_CRYPTO_WP512=y
CONFIG_CRYPTO_ANUBIS=y
CONFIG_CRYPTO_ARC4=y
CONFIG_CRYPTO_BLOWFISH=y
CONFIG_CRYPTO_CAST6=y
CONFIG_CRYPTO_KHAZAD=y
CONFIG_CRYPTO_SALSA20=y
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_TEA=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_DEV_NX=y
CONFIG_CRYPTO_DEV_VMX=y
CONFIG_CRYPTO_DEV_VMX_ENCRYPT=y
CONFIG_CRYPTO_DEV_VIRTIO=y
CONFIG_PRINTK_TIME=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
CONFIG_GDB_SCRIPTS=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_PAGE_EXTENSION=y
CONFIG_PAGE_POISONING=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_WQ_WATCHDOG=y
CONFIG_PANIC_ON_OOPS=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_SG=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_WQ_FORCE_RR_CPU=y
CONFIG_LATENCYTOP=y
CONFIG_FUNCTION_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_CODE_PATCHING_SELFTEST=y
CONFIG_FTR_FIXUP_SELFTEST=y
CONFIG_MSI_BITMAP_SELFTEST=y
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
  2019-09-19 20:28   ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Nathan Lynch
  2019-09-19 23:27     ` Oliver O'Halloran
@ 2019-09-23  5:00     ` Sam Bobroff
  2019-09-23 18:01       ` Nathan Lynch
  1 sibling, 1 reply; 21+ messages in thread
From: Sam Bobroff @ 2019-09-23  5:00 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: aik, linuxppc-dev, oohall, tyreld

[-- Attachment #1: Type: text/plain, Size: 7054 bytes --]

On Thu, Sep 19, 2019 at 03:28:40PM -0500, Nathan Lynch wrote:
> Hello Sam,
> 
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> > On PowerNV and pSeries, devices currently acquire EEH support from
> > several different places: Boot-time devices from eeh_probe_devices()
> > and eeh_addr_cache_build(), Virtual Function devices from the pcibios
> > bus add device hooks and hot plugged devices from pci_hp_add_devices()
> > (with other platforms using other methods as well).  Unfortunately,
> > pSeries machines currently discover hot plugged devices using
> > pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
> > not receive EEH support.
> >
> > Rather than adding another case for pci_rescan_bus(), this change
> > widens the scope of the pcibios bus add device hooks so that they can
> > handle all devices. As a side effect this also supports devices
> > discovered after manually rescanning via /sys/bus/pci/rescan.
> >
> > Note that on PowerNV, this change allows the EEH subsystem to become
> > enabled after boot as long as it has not been forced off, which was
> > not previously possible (it was already possible on pSeries).
> 
> With this change, I get a crash (use after free by the looks of it) when
> I remove and then add a pci device in qemu:
> 
> $ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \
>   -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \
>   -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \
>   -nic model=e1000
> 
> ...
> 
> # echo 1 > /sys/devices/pci0000:00/0000:00:00.0/remove ; \
>   echo 1 > /sys/devices/pci0000:00/pci_bus/0000:00/rescan
> 
> pci 0000:00:00.0: Removing from iommu group 0
> pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
> pci 0000:00:00.0: reg 0x10: [mem 0x200080000000-0x20008001ffff]
> pci 0000:00:00.0: reg 0x14: [io  0x10040-0x1007f]
> pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
> pci 0000:00:00.0: Adding to iommu group 0
> pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
> pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
> pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
> e1000 0000:00:00.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> e1000 0000:00:00.0 eth0: Intel(R) PRO/1000 Network Connection
> pci 0000:00:00.0: Removing from iommu group 0
> pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
> pci 0000:00:00.0: reg 0x10: [mem 0x200080040000-0x20008005ffff]
> pci 0000:00:00.0: reg 0x14: [io  0x10000-0x1003f]
> pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
> pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
> pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
> pci 0000:00:00.0: BAR 1: assigned [io  0x10000-0x1003f]
> BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6bfb
> Faulting instruction address: 0xc000000000597270
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in:
> CPU: 0 PID: 2464 Comm: pci-probe-vs-cp Not tainted 5.3.0-rc2-00092-gf381d5711f09 #76
> NIP:  c000000000597270 LR: c000000000599470 CTR: c0000000002030b0
> REGS: c00000003ee4f650 TRAP: 0380   Not tainted  (5.3.0-rc2-00092-gf381d5711f09)
> MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002442  XER: 00000000
> CFAR: c00000000059946c IRQMASK: 0 
> GPR00: c000000000599470 c00000003ee4f8e0 c000000003317a00 6b6b6b6b6b6b6b6b 
> GPR04: c000000001d0fa38 0000000000000000 0000000000000000 221a64979a66f870 
> GPR08: c00000000347b398 0000000000000000 c00000000336e070 ffffffffffffffff 
> GPR12: 0000000000002000 c000000004060000 0000000000000000 0000000000000000 
> GPR16: 00000000100a78d8 00007fffe9fdff96 00000000100a7898 0000000000000000 
> GPR20: 0000000000000000 00000000100e0ff0 0000000000000000 00000000100e0fe8 
> GPR24: 0000000000000000 000001002ae50260 c000000001d0fa38 6b6b6b6b6b6b6b6b 
> GPR28: fffffffffffffff2 c000000001d0fa38 0000000000000000 c000000003118c18 
> NIP [c000000000597270] kernfs_find_ns+0x50/0x3d0
> LR [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
> Call Trace:
> [c00000003ee4f8e0] [c00000000020950c] lockdep_hardirqs_on+0x10c/0x210 (unreliable)
> [c00000003ee4f970] [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
> [c00000003ee4fa00] [c00000000059ca08] sysfs_remove_file_ns+0x28/0x40
> [c00000003ee4fa20] [c000000000cbd70c] device_remove_file+0x2c/0x40
> [c00000003ee4fa40] [c000000000051480] eeh_sysfs_remove_device+0x50/0xf0
> [c00000003ee4fa80] [c00000000004a594] eeh_add_device_late.part.7+0x84/0x220
> [c00000003ee4fb00] [c0000000000e94f0] pseries_pcibios_bus_add_device+0x60/0xb0
> [c00000003ee4fb70] [c00000000006fc40] pcibios_bus_add_device+0x40/0x60
> [c00000003ee4fb90] [c000000000bc5220] pci_bus_add_device+0x30/0x100
> [c00000003ee4fc00] [c000000000bc5344] pci_bus_add_devices+0x54/0xb0
> [c00000003ee4fc40] [c000000000bca058] pci_rescan_bus+0x48/0x70
> [c00000003ee4fc70] [c000000000bd9adc] dev_bus_rescan_store+0xcc/0x100
> [c00000003ee4fcb0] [c000000000cbc9d8] dev_attr_store+0x38/0x60
> [c00000003ee4fcd0] [c00000000059c460] sysfs_kf_write+0x70/0xb0
> [c00000003ee4fd10] [c00000000059aa98] kernfs_fop_write+0xf8/0x280
> [c00000003ee4fd60] [c0000000004b3e5c] __vfs_write+0x3c/0x70
> [c00000003ee4fd80] [c0000000004b81f0] vfs_write+0xd0/0x220
> [c00000003ee4fdd0] [c0000000004b85ac] ksys_write+0x7c/0x140
> [c00000003ee4fe20] [c00000000000bc6c] system_call+0x5c/0x70
> 
> FWIW during boot the EEH core reports:
> 
>   EEH: No capable adapters found: recovery disabled.
> 
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index ca8b0c58a6a7..87edac6f2fd9 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
> >  	struct pci_dn *pdn;
> >  	struct eeh_dev *edev;
> >  
> > -	if (!dev || !eeh_enabled())
> > +	if (!dev)
> >  		return;
> >  
> >  	pr_debug("EEH: Adding device %s\n", pci_name(dev));
> 
> Reverting this hunk works around (fixes?) it.

Hi Nathan,

Thanks, this does look like a bug to me. I couldn't replicate your crash
(even with CONFIG_SLUB_DEBUG_ON) but I think I do see a bug there.

Does the below patch also fix it for you?

Cheers,
Sam.

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 0a91dee51245..f8aa65cb2931 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1207,10 +1207,11 @@ void eeh_add_device_late(struct pci_dev *dev)
        if (eeh_has_flag(EEH_PROBE_MODE_DEV))
                eeh_ops->probe(pdn, NULL);

-       edev->pdev = dev;
-       dev->dev.archdata.edev = edev;
-
-       eeh_addr_cache_insert_dev(dev);
+       if (eeh_enabled()) {
+               edev->pdev = dev;
+               dev->dev.archdata.edev = edev;
+               eeh_addr_cache_insert_dev(dev);
+       }
 }

 /**

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug
  2019-09-23  5:00     ` Sam Bobroff
@ 2019-09-23 18:01       ` Nathan Lynch
  0 siblings, 0 replies; 21+ messages in thread
From: Nathan Lynch @ 2019-09-23 18:01 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: aik, linuxppc-dev, oohall, tyreld

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> Thanks, this does look like a bug to me. I couldn't replicate your crash
> (even with CONFIG_SLUB_DEBUG_ON) but I think I do see a bug there.
>
> Does the below patch also fix it for you?

Yes, this works as well, thanks.


> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 0a91dee51245..f8aa65cb2931 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1207,10 +1207,11 @@ void eeh_add_device_late(struct pci_dev *dev)
>         if (eeh_has_flag(EEH_PROBE_MODE_DEV))
>                 eeh_ops->probe(pdn, NULL);
>
> -       edev->pdev = dev;
> -       dev->dev.archdata.edev = edev;
> -
> -       eeh_addr_cache_insert_dev(dev);
> +       if (eeh_enabled()) {
> +               edev->pdev = dev;
> +               dev->dev.archdata.edev = edev;
> +               eeh_addr_cache_insert_dev(dev);
> +       }
>  }
>
>  /**

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-09-23 18:04 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16  4:48 [PATCH v5 00/12] Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 01/12] powerpc/64: Adjust order in pcibios_init() Sam Bobroff
2019-08-28  4:24   ` Michael Ellerman
2019-08-16  4:48 ` [PATCH v5 02/12] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 03/12] powerpc/eeh: Improve debug messages around device addition Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 04/12] powerpc/eeh: Initialize EEH address cache earlier Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Sam Bobroff
2019-08-21  3:28   ` Michael Ellerman
2019-08-22  6:17     ` [PATCH] powerpc/eeh: Fixup EEH for pSeries hotplug Sam Bobroff
2019-09-19 20:28   ` [PATCH v5 05/12] powerpc/eeh: EEH for pSeries hot plug Nathan Lynch
2019-09-19 23:27     ` Oliver O'Halloran
2019-09-19 23:44       ` Nathan Lynch
2019-09-23  5:00     ` Sam Bobroff
2019-09-23 18:01       ` Nathan Lynch
2019-08-16  4:48 ` [PATCH v5 06/12] powerpc/eeh: Refactor around eeh_probe_devices() Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 07/12] powerpc/eeh: Add bdfn field to eeh_dev Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 08/12] powerpc/eeh: Introduce EEH edev logging macros Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 09/12] powerpc/eeh: Convert log messages to eeh_edev_* macros Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 10/12] powerpc/eeh: Fix crash when edev->pdev changes Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 11/12] powerpc/eeh: Remove unused return path from eeh_pe_dev_traverse() Sam Bobroff
2019-08-16  4:48 ` [PATCH v5 12/12] powerpc/eeh: Slightly simplify eeh_add_to_parent_pe() Sam Bobroff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).