All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] Asynchronous EEH recovery
@ 2022-08-16  3:27 Ganesh Goudar
  2022-08-16  3:27 ` [RFC 1/3] powerpc/eeh: Synchronization for safety Ganesh Goudar
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Ganesh Goudar @ 2022-08-16  3:27 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh

Hi,

EEH reocvery is currently serialized and these patches shorten
the time taken for EEH recovery by making the recovery to run
in parallel. The original author of these patches is Sam Bobroff,
I have rebased and tested these patches.

On powervm with 64 VFs and I see approximately 48% reduction
in time taken in EEH recovery, Yet to be tested on powernv.

These patches were originally posted as separate RFCs, I think
posting them as single series would be more helpful, I know the
patches are too big, I will try to logically divide in next
iterations.

Thanks 

Ganesh Goudar (3):
  powerpc/eeh: Synchronization for safety
  powerpc/eeh: Provide a unique ID for each EEH recovery
  powerpc/eeh: Asynchronous recovery

 arch/powerpc/include/asm/eeh.h               |   7 +-
 arch/powerpc/include/asm/eeh_event.h         |  10 +-
 arch/powerpc/include/asm/pci-bridge.h        |   3 +
 arch/powerpc/include/asm/ppc-pci.h           |   2 +-
 arch/powerpc/kernel/eeh.c                    | 154 +++--
 arch/powerpc/kernel/eeh_driver.c             | 578 +++++++++++++++----
 arch/powerpc/kernel/eeh_event.c              |  71 ++-
 arch/powerpc/kernel/eeh_pe.c                 |  33 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  12 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c |   5 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c   |   5 +-
 drivers/pci/hotplug/pnv_php.c                |   5 +-
 drivers/pci/hotplug/rpadlpar_core.c          |   2 +
 drivers/vfio/vfio_spapr_eeh.c                |  10 +-
 14 files changed, 685 insertions(+), 212 deletions(-)

-- 
2.37.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC 1/3] powerpc/eeh: Synchronization for safety
  2022-08-16  3:27 [RFC 0/3] Asynchronous EEH recovery Ganesh Goudar
@ 2022-08-16  3:27 ` Ganesh Goudar
  2022-08-16  3:27 ` [RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery Ganesh Goudar
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ganesh Goudar @ 2022-08-16  3:27 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh

Based on the original work from Sam Bobroff.

There is currently little synchronization between EEH error detection
(eeh_dev_check_failure()), EEH error recovery
(eeh_handle_{normal,special}_event()) and the PCI subsystem (device
addition and removal), and so there are race conditions that lead to
crashes (often access to free'd memory or LIST_POISON).

However, a solution must consider:
- EEH error detection can occur in interrupt context, which prevents
the use of a mutex.
- EEH recovery may need to sleep, which prevents the use of a spinlock.
- EEH recovery uses PCI operations that may require the PCI
rescan/remove lock and/or device lock to be held
- PCI operations may hold the rescan/remove and/or device lock when
calling into EEH functions.
- Device driver callbacks may perform arbitrary PCI operations
during recovery, including device removal.

In this patch the existing mutex and spinlock are combined with the
EEH_PE_RECOVERING flag to provide some assurances that are then used
to reduce the race conditions.

The fields to be protected are the ones that provide the structure
of the trees of struct eeh_pe that are held for each PHB: the parent
pointer and child lists and the list of struct eeh_dev, as well as
the pe and pdev pointers within struct eeh_dev.

The existing way of using EEH_PE_RECOVERING is kept and slightly
extended: No struct eeh_pe will be removed while it has the flag set
on it. Additionally, when adding new PEs, they are marked
EEH_PE_RECOVERING if their parent PE is marked: this allows the
recovery thread to assume that all PEs underneath the one it's
processing will continue to exist during recovery.

Both the mutex and spinlock are held while any protected field is
changed or a PE is deleted, so holding either of them (elsewhere) will
keep them stable and safe to access. Additionally, if
EEH_PE_RECOVERING is set on a PE then the locks can be released and
re-acquired safely, as long as the protected fields aren't used while
no locks are held. This is used during recovery to release locks
for long sleeps (i.e. during eeh_wait_state() when we may sleep up to
5 minutes), or to maintain lock ordering.

The spinlock is used in error detection (which cannot use a mutex, see
above) and also where it's possible that the mutex is already held.
The mutex is used in areas that don't have that restriction, and where
blocking may be required. Care must be taken when ordering these locks
against the PCI rescan/remove lock and the device locks to avoid
deadlocking.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h               |   6 +-
 arch/powerpc/kernel/eeh.c                    | 112 ++++++--
 arch/powerpc/kernel/eeh_driver.c             | 287 ++++++++++++++-----
 arch/powerpc/kernel/eeh_pe.c                 |  30 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  12 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c |   5 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c   |   5 +-
 drivers/pci/hotplug/pnv_php.c                |   5 +-
 drivers/pci/hotplug/rpadlpar_core.c          |   2 +
 drivers/vfio/vfio_spapr_eeh.c                |  10 +-
 10 files changed, 364 insertions(+), 110 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 514dd056c2c8..f659c0433de5 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -271,11 +271,15 @@ static inline bool eeh_state_active(int state)
 	== (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
 }
 
+void eeh_recovery_lock(void);
+void eeh_recovery_unlock(void);
+void eeh_recovery_must_be_locked(void);
+
 typedef void (*eeh_edev_traverse_func)(struct eeh_dev *edev, void *flag);
 typedef void *(*eeh_pe_traverse_func)(struct eeh_pe *pe, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
-int eeh_wait_state(struct eeh_pe *pe, int max_wait);
+int eeh_wait_state(struct eeh_pe *pe, int max_waiti, bool unlock);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, struct eeh_pe *root);
 struct eeh_pe *eeh_pe_get(struct pci_controller *phb, int pe_no);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ab316e155ea9..2c90c37524ed 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -108,7 +108,25 @@ bool eeh_debugfs_no_recover;
 /* Platform dependent EEH operations */
 struct eeh_ops *eeh_ops = NULL;
 
-/* Lock to avoid races due to multiple reports of an error */
+/*
+ * confirm_error_lock and eeh_dev_mutex are used together to provide
+ * safety during EEH operations.
+ *
+ * Generally, the spinlock is used in error detection where it's not possible
+ * to use a mutex or where there is potential to deadlock with the mutex, and
+ * the mutex is used during recovery and other PCI related operations. One must
+ * be held when reading and both must be held when making changes to the
+ * protected fields: eeh_pe.parent, child_list, child, edevs and eeh_dev.pe,
+ * .pdev.
+ *
+ * Lock ordering:
+ * - the PCI rescan/remove mutex (see pci_lock_rescan_remove())
+ * - the struct device lock (see device_lock())
+ * - the eeh_dev_mutex mutex (see eeh_recovery_lock())
+ * - the confirm_error_lock spinlock (see eeh_serialize_lock())
+ * - the eeh_eventlist_lock spinlock
+ */
+
 DEFINE_RAW_SPINLOCK(confirm_error_lock);
 EXPORT_SYMBOL_GPL(confirm_error_lock);
 
@@ -160,6 +178,23 @@ void eeh_show_enabled(void)
 		pr_info("EEH: No capable adapters found: recovery disabled.\n");
 }
 
+void eeh_recovery_lock(void)
+{
+	mutex_lock(&eeh_dev_mutex);
+}
+EXPORT_SYMBOL_GPL(eeh_recovery_lock);
+
+void eeh_recovery_unlock(void)
+{
+	mutex_unlock(&eeh_dev_mutex);
+}
+EXPORT_SYMBOL_GPL(eeh_recovery_unlock);
+void eeh_recovery_must_be_locked(void)
+{
+	WARN_ON_ONCE(!mutex_is_locked(&eeh_dev_mutex));
+}
+EXPORT_SYMBOL_GPL(eeh_recovery_must_be_locked);
+
 /*
  * This routine captures assorted PCI configuration space data
  * for the indicated PCI device, and puts them into a buffer
@@ -353,11 +388,12 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
  * On PowerNV platform, we might already have fenced PHB there.
  * For that case, it's meaningless to recover frozen PE. Intead,
  * We have to handle fenced PHB firstly.
+ *
+ * eeh_serialize_lock must be held when calling this function.
  */
 static int eeh_phb_check_failure(struct eeh_pe *pe)
 {
 	struct eeh_pe *phb_pe;
-	unsigned long flags;
 	int ret;
 
 	if (!eeh_has_flag(EEH_PROBE_MODE_DEV))
@@ -372,7 +408,6 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 	}
 
 	/* If the PHB has been in problematic state */
-	eeh_serialize_lock(&flags);
 	if (phb_pe->state & EEH_PE_ISOLATED) {
 		ret = 0;
 		goto out;
@@ -388,14 +423,12 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 
 	/* Isolate the PHB and send event */
 	eeh_pe_mark_isolated(phb_pe);
-	eeh_serialize_unlock(flags);
 
 	pr_debug("EEH: PHB#%x failure detected, location: %s\n",
 		phb_pe->phb->global_number, eeh_pe_loc_get(phb_pe));
 	eeh_send_failure_event(phb_pe);
 	return 1;
 out:
-	eeh_serialize_unlock(flags);
 	return ret;
 }
 
@@ -423,12 +456,11 @@ static inline const char *eeh_driver_name(struct pci_dev *pdev)
  */
 int eeh_dev_check_failure(struct eeh_dev *edev)
 {
-	int ret;
 	unsigned long flags;
 	struct device_node *dn;
 	struct pci_dev *dev;
 	struct eeh_pe *pe, *parent_pe;
-	int rc = 0;
+	int rc;
 	const char *location = NULL;
 
 	eeh_stats.total_mmio_ffs++;
@@ -440,6 +472,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 		eeh_stats.no_dn++;
 		return 0;
 	}
+	eeh_serialize_lock(&flags);
 	dev = eeh_dev_to_pci_dev(edev);
 	pe = eeh_dev_to_pe(edev);
 
@@ -447,24 +480,27 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	if (!pe) {
 		eeh_stats.ignored_check++;
 		eeh_edev_dbg(edev, "Ignored check\n");
-		return 0;
+		rc = 0;
+		goto dn_unlock;
 	}
 
 	/*
 	 * On PowerNV platform, we might already have fenced PHB
 	 * there and we need take care of that firstly.
 	 */
-	ret = eeh_phb_check_failure(pe);
-	if (ret > 0)
-		return ret;
+	rc = eeh_phb_check_failure(pe);
+	if (rc > 0)
+		goto dn_unlock;
 
 	/*
 	 * If the PE isn't owned by us, we shouldn't check the
 	 * state. Instead, let the owner handle it if the PE has
 	 * been frozen.
 	 */
-	if (eeh_pe_passed(pe))
-		return 0;
+	if (eeh_pe_passed(pe)) {
+		rc = 0;
+		goto dn_unlock;
+	}
 
 	/* If we already have a pending isolation event for this
 	 * slot, we know it's bad already, we don't need to check.
@@ -472,8 +508,6 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	 * in one slot might report errors simultaneously, and we
 	 * only want one error recovery routine running.
 	 */
-	eeh_serialize_lock(&flags);
-	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
 		if (pe->check_count == EEH_MAX_FAILS) {
@@ -489,6 +523,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 				eeh_driver_name(dev));
 			dump_stack();
 		}
+		rc = 1;
 		goto dn_unlock;
 	}
 
@@ -499,7 +534,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	 * function zero of a multi-function device.
 	 * In any case they must share a common PHB.
 	 */
-	ret = eeh_ops->get_state(pe, NULL);
+	rc = eeh_ops->get_state(pe, NULL);
 
 	/* Note that config-io to empty slots may fail;
 	 * they are empty when they don't have children.
@@ -507,8 +542,8 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	 * PE's state, EEH not support and Permanently unavailable
 	 * state, PE is in good state.
 	 */
-	if ((ret < 0) ||
-	    (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) {
+	if (rc < 0 ||
+	    rc == EEH_STATE_NOT_SUPPORT || eeh_state_active(rc)) {
 		eeh_stats.false_positives++;
 		pe->false_positives++;
 		rc = 0;
@@ -527,8 +562,8 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 			break;
 
 		/* Frozen parent PE ? */
-		ret = eeh_ops->get_state(parent_pe, NULL);
-		if (ret > 0 && !eeh_state_active(ret)) {
+		rc = eeh_ops->get_state(parent_pe, NULL);
+		if (rc > 0 && !eeh_state_active(rc)) {
 			pe = parent_pe;
 			pr_err("EEH: Failure of PHB#%x-PE#%x will be handled at parent PHB#%x-PE#%x.\n",
 			       pe->phb->global_number, pe->addr,
@@ -546,7 +581,6 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	 * bridges.
 	 */
 	eeh_pe_mark_isolated(pe);
-	eeh_serialize_unlock(flags);
 
 	/* Most EEH events are due to device driver bugs.  Having
 	 * a stack trace will help the device-driver authors figure
@@ -555,6 +589,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	pr_debug("EEH: %s: Frozen PHB#%x-PE#%x detected\n",
 		__func__, pe->phb->global_number, pe->addr);
 	eeh_send_failure_event(pe);
+	eeh_serialize_unlock(flags);
 
 	return 1;
 
@@ -659,7 +694,7 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
 
 	/* Check if the request is finished successfully */
 	if (active_flag) {
-		rc = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
+		rc = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC, false);
 		if (rc < 0)
 			return rc;
 
@@ -857,7 +892,7 @@ int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed)
 				pe->phb->global_number, pe->addr, i + 1);
 
 		/* Wait until the PE is in a functioning state */
-		state = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
+		state = eeh_wait_state(pe, PCI_BUS_RESET_WAIT_MSEC, false);
 		if (state < 0) {
 			pr_warn("EEH: Unrecoverable slot failure on PHB#%x-PE#%x",
 				pe->phb->global_number, pe->addr);
@@ -933,7 +968,9 @@ static int eeh_device_notifier(struct notifier_block *nb,
 	 * the device's resources, which have not yet been set up.
 	 */
 	case BUS_NOTIFY_DEL_DEVICE:
+		eeh_recovery_lock();
 		eeh_remove_device(to_pci_dev(dev));
+		eeh_recovery_unlock();
 		break;
 	default:
 		break;
@@ -998,6 +1035,7 @@ int eeh_init(struct eeh_ops *ops)
 void eeh_probe_device(struct pci_dev *dev)
 {
 	struct eeh_dev *edev;
+	unsigned long flags;
 
 	pr_debug("EEH: Adding device %s\n", pci_name(dev));
 
@@ -1038,9 +1076,13 @@ void eeh_probe_device(struct pci_dev *dev)
 		edev->mode |= EEH_DEV_NO_HANDLER;
 	}
 
+	/* Both locks are required to make changes */
+	eeh_recovery_must_be_locked();
+	eeh_serialize_lock(&flags);
 	/* bind the pdev and the edev together */
 	edev->pdev = dev;
 	dev->dev.archdata.edev = edev;
+	eeh_serialize_unlock(flags);
 	eeh_addr_cache_insert_dev(dev);
 	eeh_sysfs_add_device(dev);
 }
@@ -1058,6 +1100,7 @@ void eeh_probe_device(struct pci_dev *dev)
 void eeh_remove_device(struct pci_dev *dev)
 {
 	struct eeh_dev *edev;
+	unsigned long flags;
 
 	if (!dev || !eeh_enabled())
 		return;
@@ -1071,6 +1114,9 @@ void eeh_remove_device(struct pci_dev *dev)
 		return;
 	}
 
+	/* Both locks are required to make changes */
+	eeh_recovery_must_be_locked();
+	eeh_serialize_lock(&flags);
 	/*
 	 * During the hotplug for EEH error recovery, we need the EEH
 	 * device attached to the parent PE in order for BAR restore
@@ -1078,6 +1124,7 @@ void eeh_remove_device(struct pci_dev *dev)
 	 * from the parent PE during the BAR resotre.
 	 */
 	edev->pdev = NULL;
+	eeh_serialize_unlock(flags);
 
 	/*
 	 * eeh_sysfs_remove_device() uses pci_dev_to_eeh_dev() so we need to
@@ -1103,7 +1150,11 @@ void eeh_remove_device(struct pci_dev *dev)
 	 * for the VF EEH device.
 	 */
 	edev->in_error = false;
+	/* Both locks are required to make changes */
+	eeh_recovery_must_be_locked();
+	eeh_serialize_lock(&flags);
 	dev->dev.archdata.edev = NULL;
+	eeh_serialize_unlock(flags);
 	if (!(edev->pe->state & EEH_PE_KEEP))
 		eeh_pe_tree_remove(edev);
 	else
@@ -1199,7 +1250,7 @@ int eeh_dev_open(struct pci_dev *pdev)
 	struct eeh_dev *edev;
 	int ret = -ENODEV;
 
-	mutex_lock(&eeh_dev_mutex);
+	eeh_recovery_lock();
 
 	/* No PCI device ? */
 	if (!pdev)
@@ -1222,11 +1273,11 @@ int eeh_dev_open(struct pci_dev *pdev)
 
 	/* Increase PE's pass through count */
 	atomic_inc(&edev->pe->pass_dev_cnt);
-	mutex_unlock(&eeh_dev_mutex);
+	eeh_recovery_unlock();
 
 	return 0;
 out:
-	mutex_unlock(&eeh_dev_mutex);
+	eeh_recovery_unlock();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(eeh_dev_open);
@@ -1243,7 +1294,7 @@ void eeh_dev_release(struct pci_dev *pdev)
 {
 	struct eeh_dev *edev;
 
-	mutex_lock(&eeh_dev_mutex);
+	eeh_recovery_lock();
 
 	/* No PCI device ? */
 	if (!pdev)
@@ -1258,7 +1309,7 @@ void eeh_dev_release(struct pci_dev *pdev)
 	WARN_ON(atomic_dec_if_positive(&edev->pe->pass_dev_cnt) < 0);
 	eeh_pe_change_owner(edev->pe);
 out:
-	mutex_unlock(&eeh_dev_mutex);
+	eeh_recovery_unlock();
 }
 EXPORT_SYMBOL(eeh_dev_release);
 
@@ -1646,6 +1697,7 @@ static ssize_t eeh_force_recover_write(struct file *filp,
 	struct eeh_pe *pe;
 	char buf[20];
 	int ret;
+	unsigned long flags;
 
 	ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
 	if (!ret)
@@ -1658,7 +1710,9 @@ static ssize_t eeh_force_recover_write(struct file *filp,
 	 * recoveries can occur.
 	 */
 	if (!strncmp(buf, "hwcheck", 7)) {
+		eeh_serialize_lock(&flags);
 		__eeh_send_failure_event(NULL);
+		eeh_serialize_unlock(flags);
 		return count;
 	}
 
@@ -1682,7 +1736,9 @@ static ssize_t eeh_force_recover_write(struct file *filp,
 	 * from an odd state (e.g. PE removed, or recovery of a
 	 * non-isolated PE)
 	 */
+	eeh_serialize_lock(&flags);
 	__eeh_send_failure_event(pe);
+	eeh_serialize_unlock(flags);
 
 	return ret < 0 ? ret : count;
 }
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 260273e56431..d956ae624691 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -247,65 +247,124 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
 	}
 }
 
-typedef enum pci_ers_result (*eeh_report_fn)(struct eeh_dev *,
-					     struct pci_dev *,
+typedef enum pci_ers_result (*eeh_report_fn)(struct pci_dev *,
 					     struct pci_driver *);
-static void eeh_pe_report_edev(struct eeh_dev *edev, eeh_report_fn fn,
+static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
 			       enum pci_ers_result *result)
 {
-	struct pci_dev *pdev;
+	struct eeh_dev *edev;
 	struct pci_driver *driver;
+	bool actionable, late, removed, passed;
 	enum pci_ers_result new_result;
 
-	pci_lock_rescan_remove();
-	pdev = edev->pdev;
-	if (pdev)
-		get_device(&pdev->dev);
-	pci_unlock_rescan_remove();
-	if (!pdev) {
-		eeh_edev_info(edev, "no device");
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (!edev) {
+		pci_info(pdev, "no EEH state for device");
 		return;
 	}
-	device_lock(&pdev->dev);
-	if (eeh_edev_actionable(edev)) {
+	/* Cache some useful values before releasing the lock: */
+	actionable = eeh_edev_actionable(edev);
+	late = edev->mode & EEH_DEV_NO_HANDLER;
+	removed = eeh_dev_removed(edev);
+	passed = eeh_pe_passed(edev->pe);
+	if (actionable) {
+		/*
+		 * Driver callbacks may end up calling back into EEH functions
+		 * (for example by removing a PCI device) which will deadlock
+		 * unless the EEH locks are released first. Note that it may be
+		 * re-acquired by the report functions, if necessary.
+		 */
+		eeh_recovery_unlock();
+		device_lock(&pdev->dev);
 		driver = eeh_pcid_get(pdev);
 
 		if (!driver)
-			eeh_edev_info(edev, "no driver");
+			pci_info(pdev, "no driver");
 		else if (!driver->err_handler)
-			eeh_edev_info(edev, "driver not EEH aware");
-		else if (edev->mode & EEH_DEV_NO_HANDLER)
-			eeh_edev_info(edev, "driver bound too late");
+			pci_info(pdev, "driver not EEH aware");
+		else if (late)
+			pci_info(pdev, "driver bound too late");
 		else {
-			new_result = fn(edev, pdev, driver);
-			eeh_edev_info(edev, "%s driver reports: '%s'",
-				      driver->name,
-				      pci_ers_result_name(new_result));
+			new_result = fn(pdev, driver);
+			/*
+			 * It's not safe to use edev here, because the locks
+			 * have been released and devices could have changed.
+			 */
+			pci_info(pdev, "%s driver reports: '%s'",
+				 driver->name,
+				 pci_ers_result_name(new_result));
 			if (result)
 				*result = pci_ers_merge_result(*result,
 							       new_result);
 		}
 		if (driver)
 			eeh_pcid_put(pdev);
+		device_unlock(&pdev->dev);
+		eeh_recovery_lock();
 	} else {
-		eeh_edev_info(edev, "not actionable (%d,%d,%d)", !!pdev,
-			      !eeh_dev_removed(edev), !eeh_pe_passed(edev->pe));
+		pci_info(pdev, "not actionable (%d,%d,%d)", !!pdev,
+			 !removed, !passed);
 	}
-	device_unlock(&pdev->dev);
-	if (edev->pdev != pdev)
-		eeh_edev_warn(edev, "Device changed during processing!\n");
-	put_device(&pdev->dev);
 }
 
-static void eeh_pe_report(const char *name, struct eeh_pe *root,
-			  eeh_report_fn fn, enum pci_ers_result *result)
+struct pci_dev **pdev_cache_list_create(struct eeh_pe *root)
 {
 	struct eeh_pe *pe;
 	struct eeh_dev *edev, *tmp;
+	struct pci_dev **pdevs;
+	int i, n;
+
+	n = 0;
+	eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp) {
+		if (edev->pdev)
+			n++;
+	}
+	pdevs = kmalloc(sizeof(*pdevs) * (n + 1), GFP_KERNEL);
+	if (WARN_ON_ONCE(!pdevs))
+		return NULL;
+	i = 0;
+	eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp) {
+		if (i < n) {
+			get_device(&edev->pdev->dev);
+			pdevs[i++] = edev->pdev;
+		}
+	}
+	if (WARN_ON_ONCE(i < n))
+		n = i;
+	pdevs[n] = NULL; /* terminator */
+	return pdevs;
+}
+
+static void pdev_cache_list_destroy(struct pci_dev **pdevs)
+{
+	struct pci_dev **pdevp;
+
+	for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
+		put_device(&(*pdevp)->dev);
+	kfree(pdevs);
+}
+
+static void eeh_pe_report(const char *name, struct eeh_pe *root,
+			  eeh_report_fn fn, enum pci_ers_result *result)
+{
+	struct pci_dev **pdevs, **pdevp;
 
 	pr_info("EEH: Beginning: '%s'\n", name);
-	eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp)
-		eeh_pe_report_edev(edev, fn, result);
+	/*
+	 * It would be convenient to continue to hold the recovery lock here
+	 * but driver callbacks can take a very long time or never return at
+	 * all.
+	 */
+	pdevs = pdev_cache_list_create(root);
+	for (pdevp = pdevs; pdevp && *pdevp; pdevp++) {
+		/*
+		 * NOTE! eeh_recovery_lock() is released briefly
+		 * in eeh_pe_report_pdev()
+		 */
+		eeh_pe_report_pdev(*pdevp, fn, result);
+	}
+	pdev_cache_list_destroy(pdevs);
+
 	if (result)
 		pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n",
 			name, pci_ers_result_name(*result));
@@ -315,25 +374,30 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root,
 
 /**
  * eeh_report_error - Report pci error to each device driver
- * @edev: eeh device
+ * @pdev: eeh device
  * @driver: device's PCI driver
  *
  * Report an EEH error to each device driver.
  */
-static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
-					    struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_error(struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
+	struct eeh_dev *edev;
+	unsigned long flags;
 
 	if (!driver->err_handler->error_detected)
 		return PCI_ERS_RESULT_NONE;
 
-	eeh_edev_info(edev, "Invoking %s->error_detected(IO frozen)",
-		      driver->name);
+	pci_info(pdev, "Invoking %s->error_detected(IO frozen)", driver->name);
 	rc = driver->err_handler->error_detected(pdev, pci_channel_io_frozen);
 
-	edev->in_error = true;
+	eeh_serialize_lock(&flags);
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (edev)
+		edev->in_error = true;
+	eeh_serialize_unlock(flags);
+
 	pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE);
 	return rc;
 }
@@ -346,19 +410,19 @@ static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
  * Tells each device driver that IO ports, MMIO and config space I/O
  * are now enabled.
  */
-static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev,
-						   struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_mmio_enabled(struct pci_dev *pdev,
 						   struct pci_driver *driver)
 {
 	if (!driver->err_handler->mmio_enabled)
 		return PCI_ERS_RESULT_NONE;
-	eeh_edev_info(edev, "Invoking %s->mmio_enabled()", driver->name);
+	pci_info(pdev, "Invoking %s->mmio_enabled()", driver->name);
 	return driver->err_handler->mmio_enabled(pdev);
 }
 
 /**
  * eeh_report_reset - Tell device that slot has been reset
  * @edev: eeh device
+ * @edev: eeh device
  * @driver: device's PCI driver
  *
  * This routine must be called while EEH tries to reset particular
@@ -366,13 +430,20 @@ static enum pci_ers_result eeh_report_mmio_enabled(struct eeh_dev *edev,
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static enum pci_ers_result eeh_report_reset(struct eeh_dev *edev,
-					    struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_reset(struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
-	if (!driver->err_handler->slot_reset || !edev->in_error)
+	struct eeh_dev *edev;
+	unsigned long flags;
+
+	eeh_serialize_lock(&flags);
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (!driver->err_handler->slot_reset || !edev->in_error) {
+		eeh_serialize_unlock(flags);
 		return PCI_ERS_RESULT_NONE;
-	eeh_edev_info(edev, "Invoking %s->slot_reset()", driver->name);
+	}
+	eeh_serialize_unlock(flags);
+	pci_info(pdev, "Invoking %s->slot_reset()", driver->name);
 	return driver->err_handler->slot_reset(pdev);
 }
 
@@ -412,20 +483,29 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
  * could resume so that the device driver can do some initialization
  * to make the recovered device work again.
  */
-static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev,
-					     struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_resume(struct pci_dev *pdev,
 					     struct pci_driver *driver)
 {
-	if (!driver->err_handler->resume || !edev->in_error)
+	struct eeh_dev *edev;
+	unsigned long flags;
+
+	eeh_serialize_lock(&flags);
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (!driver->err_handler->resume || !edev->in_error) {
+		eeh_serialize_unlock(flags);
 		return PCI_ERS_RESULT_NONE;
+	}
+	eeh_serialize_unlock(flags);
 
-	eeh_edev_info(edev, "Invoking %s->resume()", driver->name);
+	pci_info(pdev, "Invoking %s->resume()", driver->name);
 	driver->err_handler->resume(pdev);
 
-	pci_uevent_ers(edev->pdev, PCI_ERS_RESULT_RECOVERED);
+	pci_uevent_ers(pdev, PCI_ERS_RESULT_RECOVERED);
 #ifdef CONFIG_PCI_IOV
+	eeh_serialize_lock(&flags);
 	if (eeh_ops->notify_resume)
 		eeh_ops->notify_resume(edev);
+	eeh_serialize_unlock(flags);
 #endif
 	return PCI_ERS_RESULT_NONE;
 }
@@ -438,8 +518,7 @@ static enum pci_ers_result eeh_report_resume(struct eeh_dev *edev,
  * This informs the device driver that the device is permanently
  * dead, and that no further recovery attempts will be made on it.
  */
-static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev,
-					      struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_failure(struct pci_dev *pdev,
 					      struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
@@ -447,8 +526,8 @@ static enum pci_ers_result eeh_report_failure(struct eeh_dev *edev,
 	if (!driver->err_handler->error_detected)
 		return PCI_ERS_RESULT_NONE;
 
-	eeh_edev_info(edev, "Invoking %s->error_detected(permanent failure)",
-		      driver->name);
+	pci_info(pdev, "Invoking %s->error_detected(permanent failure)",
+		 driver->name);
 	rc = driver->err_handler->error_detected(pdev,
 						 pci_channel_io_perm_failure);
 
@@ -476,17 +555,38 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
 	}
 
 #ifdef CONFIG_PCI_IOV
-	pci_iov_add_virtfn(edev->physfn, edev->vf_index);
+	{
+		struct pci_dev *physfn = edev->physfn;
+		int vf_index = eeh_dev_to_pdn(edev)->vf_index;
+
+		get_device(&physfn->dev);
+		eeh_recovery_unlock();
+		/*
+		 * This PCI operation will call back into EEH code where the
+		 * recovery lock will be acquired, so it must be released here,
+		 * first:
+		 */
+		pci_iov_add_virtfn(physfn, vf_index);
+		put_device(&physfn->dev);
+		eeh_recovery_lock();
+	}
 #endif
 	return NULL;
 }
 
-static void eeh_rmv_device(struct eeh_dev *edev, void *userdata)
+static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
 {
+	struct eeh_dev *edev;
 	struct pci_driver *driver;
-	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct eeh_rmv_data *rmv_data = (struct eeh_rmv_data *)userdata;
 
+	edev = pci_dev_to_eeh_dev(pdev);
+	if (!edev) {
+		pci_warn(pdev, "EEH: Device removed during processing (#%d)\n",
+			 __LINE__);
+		return;
+	}
+
 	/*
 	 * Actually, we should remove the PCI bridges as well.
 	 * However, that's lots of complexity to do that,
@@ -495,40 +595,50 @@ static void eeh_rmv_device(struct eeh_dev *edev, void *userdata)
 	 * simplicity here.
 	 */
 	if (!eeh_edev_actionable(edev) ||
-	    (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE))
+	    (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE))
 		return;
 
 	if (rmv_data) {
-		driver = eeh_pcid_get(dev);
+		driver = eeh_pcid_get(pdev);
 		if (driver) {
 			if (driver->err_handler &&
 			    driver->err_handler->error_detected &&
 			    driver->err_handler->slot_reset) {
-				eeh_pcid_put(dev);
+				eeh_pcid_put(pdev);
 				return;
 			}
-			eeh_pcid_put(dev);
+			eeh_pcid_put(pdev);
 		}
 	}
 
 	/* Remove it from PCI subsystem */
-	pr_info("EEH: Removing %s without EEH sensitive driver\n",
-		pci_name(dev));
+	pci_info(pdev, "EEH: Removing device without EEH sensitive driver\n");
 	edev->mode |= EEH_DEV_DISCONNECTED;
 	if (rmv_data)
 		rmv_data->removed_dev_count++;
 
 	if (edev->physfn) {
 #ifdef CONFIG_PCI_IOV
+		eeh_recovery_unlock();
 		pci_iov_remove_virtfn(edev->physfn, edev->vf_index);
+		eeh_recovery_lock();
+		/* Both locks are required to make changes */
+		eeh_serialize_lock(&flags);
 		edev->pdev = NULL;
+		eeh_serialize_unlock(flags);
 #endif
 		if (rmv_data)
 			list_add(&edev->rmv_entry, &rmv_data->removed_vf_list);
 	} else {
+		/*
+		 * Lock ordering requires that the recovery lock be released
+		 * before acquiring the PCI rescan/remove lock.
+		 */
+		eeh_recovery_unlock();
 		pci_lock_rescan_remove();
-		pci_stop_and_remove_bus_device(dev);
+		pci_stop_and_remove_bus_device(pdev);
 		pci_unlock_rescan_remove();
+		eeh_recovery_lock();
 	}
 }
 
@@ -626,6 +736,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 {
 	time64_t tstamp;
 	int cnt, rc;
+	struct pci_dev **pdevs, **pdevp;
 	struct eeh_dev *edev;
 	struct eeh_pe *tmp_pe;
 	bool any_passed = false;
@@ -645,11 +756,23 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	 */
 	eeh_pe_state_mark(pe, EEH_PE_KEEP);
 	if (any_passed || driver_eeh_aware || (pe->type & EEH_PE_VF)) {
-		eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
+		/*
+		 * eeh_rmv_device() may need to release the recovery lock to
+		 * remove a PCI device so we can't rely on the PE lists staying
+		 * valid:
+		 */
+		pdevs = pdev_cache_list_create(pe);
+		/* eeh_rmv_device() may re-acquire the recovery lock */
+		for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
+			eeh_rmv_device(*pdevp, rmv_data);
+		pdev_cache_list_destroy(pdevs);
+
 	} else {
+		eeh_recovery_unlock();
 		pci_lock_rescan_remove();
 		pci_hp_remove_devices(bus);
 		pci_unlock_rescan_remove();
+		eeh_recovery_lock();
 	}
 
 	/*
@@ -665,7 +788,13 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	if (rc)
 		return rc;
 
+	/*
+	 * The PCI rescan/remove lock must always be taken first, but we need
+	 * both here:
+	 */
+	eeh_recovery_unlock();
 	pci_lock_rescan_remove();
+	eeh_recovery_lock();
 
 	/* Restore PE */
 	eeh_ops->configure_bridge(pe);
@@ -673,10 +802,9 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 
 	/* Clear frozen state */
 	rc = eeh_clear_pe_frozen_state(pe, false);
-	if (rc) {
-		pci_unlock_rescan_remove();
+	pci_unlock_rescan_remove();
+	if (rc)
 		return rc;
-	}
 
 	/* Give the system 5 seconds to finish running the user-space
 	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
@@ -687,7 +815,9 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	if (!driver_eeh_aware || rmv_data->removed_dev_count) {
 		pr_info("EEH: Sleep 5s ahead of %s hotplug\n",
 			(driver_eeh_aware ? "partial" : "complete"));
+		eeh_recovery_unlock();
 		ssleep(5);
+		eeh_recovery_lock();
 
 		/*
 		 * The EEH device is still connected with its parent
@@ -701,7 +831,17 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 		} else {
 			if (!driver_eeh_aware)
 				eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
+			/*
+			 * Lock ordering requires that the recovery lock be
+			 * released before acquiring the PCI rescan/remove
+			 * lock.
+			 */
+			eeh_recovery_unlock();
+			pci_lock_rescan_remove();
 			pci_hp_add_devices(bus);
+			pci_unlock_rescan_remove();
+			eeh_recovery_lock();
+
 		}
 	}
 	eeh_pe_state_clear(pe, EEH_PE_KEEP, true);
@@ -709,7 +849,6 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	pe->tstamp = tstamp;
 	pe->freeze_count = cnt;
 
-	pci_unlock_rescan_remove();
 	return 0;
 }
 
@@ -837,16 +976,19 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	struct pci_bus *bus;
 	struct eeh_dev *edev, *tmp;
 	struct eeh_pe *tmp_pe;
+	struct pci_dev **pdevs, **pdevp;
 	int rc = 0;
 	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
 	struct eeh_rmv_data rmv_data =
 		{LIST_HEAD_INIT(rmv_data.removed_vf_list), 0};
 	int devices = 0;
 
+	eeh_recovery_lock();
 	bus = eeh_pe_bus_get(pe);
 	if (!bus) {
 		pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
 			__func__, pe->phb->global_number, pe->addr);
+		eeh_recovery_unlock();
 		return;
 	}
 
@@ -948,7 +1090,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	/* Get the current PCI slot state. This can take a long time,
 	 * sometimes over 300 seconds for certain systems.
 	 */
-	rc = eeh_wait_state(pe, MAX_WAIT_FOR_RECOVERY * 1000);
+	rc = eeh_wait_state(pe, MAX_WAIT_FOR_RECOVERY * 1000, true);
 	if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) {
 		pr_warn("EEH: Permanent failure\n");
 		goto recover_failed;
@@ -1079,12 +1221,16 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * the their PCI config any more.
 	 */
 	if (pe->type & EEH_PE_VF) {
-		eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+		pdevs = pdev_cache_list_create(pe);
+		for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
+			eeh_rmv_device(*pdevp, NULL);
+		pdev_cache_list_destroy(pdevs);
 		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 	} else {
 		eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
 		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 
+		eeh_recovery_unlock();
 		pci_lock_rescan_remove();
 		pci_hp_remove_devices(bus);
 		pci_unlock_rescan_remove();
@@ -1105,6 +1251,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			eeh_clear_slot_attention(edev->pdev);
 
 	eeh_pe_state_clear(pe, EEH_PE_RECOVERING, true);
+	eeh_recovery_unlock();
 }
 
 /**
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index d2873d17d2b1..8f30ad3df8e0 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -103,8 +103,15 @@ int eeh_phb_pe_create(struct pci_controller *phb)
  *
  * Wait for the state of associated PE. It might take some time
  * to retrieve the PE's state.
+ *
+ * Note that if this function sleeps and unlock is true, the EEH recovery lock
+ * will be released and re-acquired.
+ * It is only safe to do this when the PE has the recovering flag set on it.
+ * This is important because the sleep may be
+ * very long (300 seconds) and device removal will be blocked by the recovery
+ * mutex. See eeh_device_notifier().
  */
-int eeh_wait_state(struct eeh_pe *pe, int max_wait)
+int eeh_wait_state(struct eeh_pe *pe, int max_wait, bool unlock)
 {
 	int ret;
 	int mwait;
@@ -120,6 +127,8 @@ int eeh_wait_state(struct eeh_pe *pe, int max_wait)
 #define EEH_STATE_MIN_WAIT_TIME	(1000)
 #define EEH_STATE_MAX_WAIT_TIME	(300 * 1000)
 
+	WARN_ON_ONCE(unlock && !(pe->state & EEH_PE_RECOVERING));
+
 	while (1) {
 		ret = eeh_ops->get_state(pe, &mwait);
 
@@ -141,8 +150,11 @@ int eeh_wait_state(struct eeh_pe *pe, int max_wait)
 				__func__, mwait);
 			mwait = EEH_STATE_MAX_WAIT_TIME;
 		}
-
+		if (unlock)
+			eeh_recovery_unlock();
 		msleep(min(mwait, max_wait));
+		if (unlock)
+			eeh_recovery_lock();
 		max_wait -= mwait;
 	}
 }
@@ -308,6 +320,7 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 {
 	struct pci_controller *hose = edev->controller;
 	struct eeh_pe *pe, *parent;
+	unsigned long flags;
 
 	/*
 	 * Search the PE has been existing or not according
@@ -315,6 +328,8 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 	 * PE should be composed of PCI bus and its subordinate
 	 * components.
 	 */
+	eeh_recovery_must_be_locked();
+	eeh_serialize_lock(&flags);
 	pe = eeh_pe_get(hose, edev->pe_config_addr);
 	if (pe) {
 		if (pe->type & EEH_PE_INVALID) {
@@ -343,8 +358,10 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 			list_add_tail(&edev->entry, &pe->edevs);
 			eeh_edev_dbg(edev, "Added to bus PE\n");
 		}
+		eeh_serialize_unlock(flags);
 		return 0;
 	}
+	eeh_serialize_unlock(flags);
 
 	/* Create a new EEH PE */
 	if (edev->physfn)
@@ -364,6 +381,7 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 	 * to PHB directly. Otherwise, we have to associate the
 	 * PE with its parent.
 	 */
+	eeh_serialize_lock(&flags);
 	if (!new_pe_parent) {
 		new_pe_parent = eeh_phb_pe_get(hose);
 		if (!new_pe_parent) {
@@ -371,6 +389,7 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 				__func__, hose->global_number);
 			edev->pe = NULL;
 			kfree(pe);
+			eeh_serialize_unlock(flags);
 			return -EEXIST;
 		}
 	}
@@ -385,6 +404,7 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 	 */
 	list_add_tail(&edev->entry, &pe->edevs);
 	edev->pe = pe;
+	eeh_serialize_unlock(flags);
 	eeh_edev_dbg(edev, "Added to new (parent: PE#%x)\n",
 		     new_pe_parent->addr);
 
@@ -402,13 +422,18 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
  */
 int eeh_pe_tree_remove(struct eeh_dev *edev)
 {
+	unsigned long flags;
 	struct eeh_pe *pe, *parent, *child;
 	bool keep, recover;
 	int cnt;
 
+	/* Both locks are required to make changes */
+	eeh_recovery_must_be_locked();
+	eeh_serialize_lock(&flags);
 	pe = eeh_dev_to_pe(edev);
 	if (!pe) {
 		eeh_edev_dbg(edev, "No PE found for device.\n");
+		eeh_serialize_unlock(flags);
 		return -EEXIST;
 	}
 
@@ -475,6 +500,7 @@ int eeh_pe_tree_remove(struct eeh_dev *edev)
 		pe = parent;
 	}
 
+	eeh_serialize_unlock(flags);
 	return 0;
 }
 
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index a83cb679dd59..12178be1d70a 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -72,6 +72,7 @@ static ssize_t pnv_eeh_ei_write(struct file *filp,
 	unsigned long addr, mask;
 	char buf[50];
 	int ret;
+	unsigned long flags;
 
 	if (!eeh_ops || !eeh_ops->err_inject)
 		return -ENXIO;
@@ -86,14 +87,21 @@ static ssize_t pnv_eeh_ei_write(struct file *filp,
 		     &pe_no, &type, &func, &addr, &mask);
 	if (ret != 5)
 		return -EINVAL;
-
+	/*
+	 * Use the spinlock rather than the mutex so that errors can be
+	 * injected during slow recovery operations (for testing).
+	 */
+	eeh_serialize_lock(&flags);
 	/* Retrieve PE */
 	pe = eeh_pe_get(hose, pe_no);
-	if (!pe)
+	if (!pe) {
+		eeh_serialize_unlock(flags);
 		return -ENODEV;
+	}
 
 	/* Do error injection */
 	ret = eeh_ops->err_inject(pe, type, func, addr, mask);
+	eeh_serialize_unlock(flags);
 	return ret < 0 ? ret : count;
 }
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 1b0c901a6f3b..aeb0ec94dfb8 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -47,11 +47,13 @@ static void pseries_eeh_init_edev(struct pci_dn *pdn);
 
 static void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 {
-	struct pci_dn *pdn = pci_get_pdn(pdev);
+	struct pci_dn *pdn;
 
 	if (eeh_has_flag(EEH_FORCE_DISABLED))
 		return;
 
+	eeh_recovery_lock();
+	pdn = pci_get_pdn(pdev);
 	dev_dbg(&pdev->dev, "EEH: Setting up device\n");
 #ifdef CONFIG_PCI_IOV
 	if (pdev->is_virtfn) {
@@ -82,6 +84,7 @@ static void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 	}
 #endif
 	eeh_probe_device(pdev);
+	eeh_recovery_unlock();
 }
 
 
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 4ba824568119..495dd9204ee5 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -38,8 +38,11 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
 	/* Create EEH devices for the PHB */
 	eeh_phb_pe_create(phb);
 
-	if (dn->child)
+	if (dn->child) {
+		eeh_recovery_lock();
 		pseries_eeh_init_edev_recursive(PCI_DN(dn));
+		eeh_recovery_unlock();
+	}
 
 	pcibios_scan_phb(phb);
 	pcibios_finish_adding_to_bus(phb->bus);
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
index 881d420637bf..7837c9054c73 100644
--- a/drivers/pci/hotplug/pnv_php.c
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -769,7 +769,6 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data)
 	u16 sts, lsts;
 	u8 presence;
 	bool added;
-	unsigned long flags;
 	int ret;
 
 	pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &sts);
@@ -807,10 +806,10 @@ static irqreturn_t pnv_php_interrupt(int irq, void *data)
 		edev = pchild ? pci_dev_to_eeh_dev(pchild) : NULL;
 		pe = edev ? edev->pe : NULL;
 		if (pe) {
-			eeh_serialize_lock(&flags);
+			eeh_recovery_lock();
 			eeh_pe_mark_isolated(pe);
-			eeh_serialize_unlock(flags);
 			eeh_pe_set_option(pe, EEH_OPT_FREEZE_PE);
+			eeh_recovery_unlock();
 		}
 	}
 
diff --git a/drivers/pci/hotplug/rpadlpar_core.c b/drivers/pci/hotplug/rpadlpar_core.c
index 980bb3afd092..1811f24ad151 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -141,7 +141,9 @@ static void dlpar_pci_add_bus(struct device_node *dn)
 	struct pci_controller *phb = pdn->phb;
 	struct pci_dev *dev = NULL;
 
+	eeh_recovery_lock();
 	pseries_eeh_init_edev_recursive(pdn);
+	eeh_recovery_unlock();
 
 	/* Add EADS device to PHB bus, adding new entry to bus->devices */
 	dev = of_create_pci_dev(dn, phb->bus, pdn->devfn);
diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
index 67f55ac1d459..2f184725af5a 100644
--- a/drivers/vfio/vfio_spapr_eeh.c
+++ b/drivers/vfio/vfio_spapr_eeh.c
@@ -54,6 +54,7 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
 		if (op.argsz < minsz || op.flags)
 			return -EINVAL;
 
+		eeh_recovery_lock();
 		switch (op.op) {
 		case VFIO_EEH_PE_DISABLE:
 			ret = eeh_pe_set_option(pe, EEH_OPT_DISABLE);
@@ -84,10 +85,14 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
 			break;
 		case VFIO_EEH_PE_INJECT_ERR:
 			minsz = offsetofend(struct vfio_eeh_pe_op, err.mask);
-			if (op.argsz < minsz)
+			if (op.argsz < minsz) {
+				eeh_recovery_unlock();
 				return -EINVAL;
-			if (copy_from_user(&op, (void __user *)arg, minsz))
+			}
+			if (copy_from_user(&op, (void __user *)arg, minsz)) {
+				eeh_recovery_unlock();
 				return -EFAULT;
+			}
 
 			ret = eeh_pe_inject_err(pe, op.err.type, op.err.func,
 						op.err.addr, op.err.mask);
@@ -95,6 +100,7 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
 		default:
 			ret = -EINVAL;
 		}
+		eeh_recovery_unlock();
 	}
 
 	return ret;
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery
  2022-08-16  3:27 [RFC 0/3] Asynchronous EEH recovery Ganesh Goudar
  2022-08-16  3:27 ` [RFC 1/3] powerpc/eeh: Synchronization for safety Ganesh Goudar
@ 2022-08-16  3:27 ` Ganesh Goudar
  2022-08-16  3:27 ` [RFC 3/3] powerpc/eeh: Asynchronous recovery Ganesh Goudar
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Ganesh Goudar @ 2022-08-16  3:27 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh

Based on the original work from Sam Bobroff.

Give a unique ID to each recovery event, to ease log parsing
and prepare for parallel recovery.

Also add some new messages with a very simple format that may
be useful to log-parsers.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh_event.h |   3 +-
 arch/powerpc/include/asm/ppc-pci.h   |   2 +-
 arch/powerpc/kernel/eeh.c            |  42 +++---
 arch/powerpc/kernel/eeh_driver.c     | 188 ++++++++++++++++-----------
 arch/powerpc/kernel/eeh_event.c      |  12 +-
 5 files changed, 146 insertions(+), 101 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
index dadde7d52f46..a1fe736bc4cf 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -17,13 +17,14 @@
 struct eeh_event {
 	struct list_head	list;	/* to form event queue	*/
 	struct eeh_pe		*pe;	/* EEH PE		*/
+	unsigned int		id;	/* Event ID		*/
 };
 
 int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
 int __eeh_send_failure_event(struct eeh_pe *pe);
 void eeh_remove_event(struct eeh_pe *pe, bool force);
-void eeh_handle_normal_event(struct eeh_pe *pe);
+void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe);
 void eeh_handle_special_event(void);
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index f6cf0159024e..42d175af33cb 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -40,7 +40,7 @@ extern int rtas_setup_phb(struct pci_controller *phb);
 void eeh_addr_cache_insert_dev(struct pci_dev *dev);
 void eeh_addr_cache_rmv_dev(struct pci_dev *dev);
 struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr);
-void eeh_slot_error_detail(struct eeh_pe *pe, int severity);
+void eeh_slot_error_detail(unsigned int event_id, struct eeh_pe *pe, int severity);
 int eeh_pci_enable(struct eeh_pe *pe, int function);
 int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed);
 void eeh_save_bars(struct eeh_dev *edev);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 2c90c37524ed..148d5df0e606 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -200,7 +200,8 @@ EXPORT_SYMBOL_GPL(eeh_recovery_must_be_locked);
  * for the indicated PCI device, and puts them into a buffer
  * for RTAS error logging.
  */
-static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
+static size_t eeh_dump_dev_log(unsigned int event_id, struct eeh_dev *edev,
+			       char *buf, size_t len)
 {
 	u32 cfg;
 	int cap, i;
@@ -210,27 +211,29 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	n += scnprintf(buf+n, len-n, "%04x:%02x:%02x.%01x\n",
 			edev->pe->phb->global_number, edev->bdfn >> 8,
 			PCI_SLOT(edev->bdfn), PCI_FUNC(edev->bdfn));
-	pr_warn("EEH: of node=%04x:%02x:%02x.%01x\n",
+	pr_warn("EEH(%u): of node=%04x:%02x:%02x.%01x\n",
+		event_id,
 		edev->pe->phb->global_number, edev->bdfn >> 8,
 		PCI_SLOT(edev->bdfn), PCI_FUNC(edev->bdfn));
 
 	eeh_ops->read_config(edev, PCI_VENDOR_ID, 4, &cfg);
 	n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
-	pr_warn("EEH: PCI device/vendor: %08x\n", cfg);
+	pr_warn("EEH(%u): PCI device/vendor: %08x\n",event_id, cfg);
 
 	eeh_ops->read_config(edev, PCI_COMMAND, 4, &cfg);
 	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
-	pr_warn("EEH: PCI cmd/status register: %08x\n", cfg);
+	pr_warn("EEH(%u): PCI cmd/status register: %08x\n", event_id, cfg);
 
 	/* Gather bridge-specific registers */
 	if (edev->mode & EEH_DEV_BRIDGE) {
 		eeh_ops->read_config(edev, PCI_SEC_STATUS, 2, &cfg);
 		n += scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
-		pr_warn("EEH: Bridge secondary status: %04x\n", cfg);
+		pr_warn("EEH(%u): Bridge secondary status: %04x\n",
+			event_id, cfg);
 
 		eeh_ops->read_config(edev, PCI_BRIDGE_CONTROL, 2, &cfg);
 		n += scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
-		pr_warn("EEH: Bridge control: %04x\n", cfg);
+		pr_warn("EEH(%u): Bridge control: %04x\n", event_id, cfg);
 	}
 
 	/* Dump out the PCI-X command and status regs */
@@ -238,18 +241,19 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	if (cap) {
 		eeh_ops->read_config(edev, cap, 4, &cfg);
 		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
-		pr_warn("EEH: PCI-X cmd: %08x\n", cfg);
+		pr_warn("EEH(%u): PCI-X cmd: %08x\n", event_id, cfg);
 
 		eeh_ops->read_config(edev, cap+4, 4, &cfg);
 		n += scnprintf(buf+n, len-n, "pcix-stat:%x\n", cfg);
-		pr_warn("EEH: PCI-X status: %08x\n", cfg);
+		pr_warn("EEH(%u): PCI-X status: %08x\n", event_id, cfg);
 	}
 
 	/* If PCI-E capable, dump PCI-E cap 10 */
 	cap = edev->pcie_cap;
 	if (cap) {
 		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
-		pr_warn("EEH: PCI-E capabilities and status follow:\n");
+		pr_warn("EEH(%u): PCI-E capabilities and status follow:\n",
+			event_id);
 
 		for (i=0; i<=8; i++) {
 			eeh_ops->read_config(edev, cap+4*i, 4, &cfg);
@@ -260,8 +264,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 					pr_warn("%s\n", buffer);
 
 				l = scnprintf(buffer, sizeof(buffer),
-					      "EEH: PCI-E %02x: %08x ",
-					      4*i, cfg);
+					      "EEH(%u): PCI-E %02x: %08x ",
+					      event_id, 4*i, cfg);
 			} else {
 				l += scnprintf(buffer+l, sizeof(buffer)-l,
 					       "%08x ", cfg);
@@ -276,7 +280,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	cap = edev->aer_cap;
 	if (cap) {
 		n += scnprintf(buf+n, len-n, "pci-e AER:\n");
-		pr_warn("EEH: PCI-E AER capability register set follows:\n");
+		pr_warn("EEH(%u): PCI-E AER capability register set follows:\n",
+			event_id);
 
 		for (i=0; i<=13; i++) {
 			eeh_ops->read_config(edev, cap+4*i, 4, &cfg);
@@ -301,16 +306,13 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	return n;
 }
 
-static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag)
+static void eeh_dump_pe_log(unsigned int event_id, struct eeh_pe *pe, size_t *plen)
 {
 	struct eeh_dev *edev, *tmp;
-	size_t *plen = flag;
 
 	eeh_pe_for_each_dev(pe, edev, tmp)
-		*plen += eeh_dump_dev_log(edev, pci_regs_buf + *plen,
+		*plen += eeh_dump_dev_log(event_id, edev, pci_regs_buf + *plen,
 					  EEH_PCI_REGS_LOG_LEN - *plen);
-
-	return NULL;
 }
 
 /**
@@ -323,9 +325,10 @@ static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag)
  * out from the config space of the corresponding PCI device, while
  * the error log is fetched through platform dependent function call.
  */
-void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
+void eeh_slot_error_detail(unsigned int event_id, struct eeh_pe *pe, int severity)
 {
 	size_t loglen = 0;
+	struct eeh_pe *tmp_pe;
 
 	/*
 	 * When the PHB is fenced or dead, it's pointless to collect
@@ -365,7 +368,8 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 			eeh_pe_restore_bars(pe);
 
 			pci_regs_buf[0] = 0;
-			eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
+			eeh_for_each_pe(pe, tmp_pe)
+				eeh_dump_pe_log(event_id, tmp_pe, &loglen);
 		}
 	}
 
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index d956ae624691..894326cc4dfa 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -247,10 +247,13 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
 	}
 }
 
-typedef enum pci_ers_result (*eeh_report_fn)(struct pci_dev *,
+typedef enum pci_ers_result (*eeh_report_fn)(unsigned int event_id,
+					     struct pci_dev *,
 					     struct pci_driver *);
-static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
-			       enum pci_ers_result *result)
+static void eeh_pe_report_pdev(unsigned int event_id,
+			       struct pci_dev *pdev, eeh_report_fn fn,
+			       enum pci_ers_result *result,
+			       const char *handler_name)
 {
 	struct eeh_dev *edev;
 	struct pci_driver *driver;
@@ -259,7 +262,7 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
 
 	edev = pci_dev_to_eeh_dev(pdev);
 	if (!edev) {
-		pci_info(pdev, "no EEH state for device");
+		pci_info(pdev, "EEH(%u): no EEH state for device", event_id);
 		return;
 	}
 	/* Cache some useful values before releasing the lock: */
@@ -279,19 +282,26 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
 		driver = eeh_pcid_get(pdev);
 
 		if (!driver)
-			pci_info(pdev, "no driver");
+			pci_info(pdev, "EEH(%u): no driver", event_id);
 		else if (!driver->err_handler)
-			pci_info(pdev, "driver not EEH aware");
+			pci_info(pdev, "EEH(%u): driver not EEH aware", event_id);
 		else if (late)
-			pci_info(pdev, "driver bound too late");
+			pci_info(pdev, "EEH(%u): driver bound too late", event_id);
 		else {
-			new_result = fn(pdev, driver);
+			pr_warn("EEH(%u): EVENT=HANDLER_CALL DEVICE=%04x:%02x:%02x.%x DRIVER='%s' HANDLER='%s'\n",
+				event_id, edev->controller->global_number,
+				PCI_BUSNO(edev->bdfn), PCI_SLOT(edev->bdfn),
+				PCI_FUNC(edev->bdfn), driver->name, handler_name);
+
+			new_result = fn(event_id, pdev, driver);
 			/*
 			 * It's not safe to use edev here, because the locks
 			 * have been released and devices could have changed.
 			 */
-			pci_info(pdev, "%s driver reports: '%s'",
-				 driver->name,
+			pr_warn("EEH(%u): EVENT=HANDLER_RETURN RESULT='%s'\n",
+				event_id, pci_ers_result_name(new_result));
+			pci_info(pdev, "EEH(%u): %s driver reports: %s",
+				 event_id, driver->name,
 				 pci_ers_result_name(new_result));
 			if (result)
 				*result = pci_ers_merge_result(*result,
@@ -302,8 +312,8 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
 		device_unlock(&pdev->dev);
 		eeh_recovery_lock();
 	} else {
-		pci_info(pdev, "not actionable (%d,%d,%d)", !!pdev,
-			 !removed, !passed);
+		pci_info(pdev, "EEH(%u): not actionable (%d,%d,%d)",
+			 event_id, !!pdev, !removed, !passed);
 	}
 }
 
@@ -344,12 +354,13 @@ static void pdev_cache_list_destroy(struct pci_dev **pdevs)
 	kfree(pdevs);
 }
 
-static void eeh_pe_report(const char *name, struct eeh_pe *root,
+static void eeh_pe_report(unsigned int event_id,
+			  const char *name, struct eeh_pe *root,
 			  eeh_report_fn fn, enum pci_ers_result *result)
 {
 	struct pci_dev **pdevs, **pdevp;
 
-	pr_info("EEH: Beginning: '%s'\n", name);
+	pr_info("EEH(%u): Beginning: '%s'\n", event_id, name);
 	/*
 	 * It would be convenient to continue to hold the recovery lock here
 	 * but driver callbacks can take a very long time or never return at
@@ -361,15 +372,15 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root,
 		 * NOTE! eeh_recovery_lock() is released briefly
 		 * in eeh_pe_report_pdev()
 		 */
-		eeh_pe_report_pdev(*pdevp, fn, result);
+		eeh_pe_report_pdev(event_id, *pdevp, fn, result, name);
 	}
 	pdev_cache_list_destroy(pdevs);
 
 	if (result)
-		pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n",
-			name, pci_ers_result_name(*result));
+		pr_info("EEH(%u): Finished:'%s' with aggregate recovery state:'%s'\n",
+			event_id, name, pci_ers_result_name(*result));
 	else
-		pr_info("EEH: Finished:'%s'", name);
+		pr_info("EEH(%u): Finished:'%s'",event_id, name);
 }
 
 /**
@@ -379,7 +390,8 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root,
  *
  * Report an EEH error to each device driver.
  */
-static enum pci_ers_result eeh_report_error(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_error(unsigned int event_id,
+					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
@@ -410,12 +422,14 @@ static enum pci_ers_result eeh_report_error(struct pci_dev *pdev,
  * Tells each device driver that IO ports, MMIO and config space I/O
  * are now enabled.
  */
-static enum pci_ers_result eeh_report_mmio_enabled(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_mmio_enabled(unsigned int event_id,
+						   struct pci_dev *pdev,
 						   struct pci_driver *driver)
 {
 	if (!driver->err_handler->mmio_enabled)
 		return PCI_ERS_RESULT_NONE;
-	pci_info(pdev, "Invoking %s->mmio_enabled()", driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->mmio_enabled()",
+		 event_id, driver->name);
 	return driver->err_handler->mmio_enabled(pdev);
 }
 
@@ -430,7 +444,8 @@ static enum pci_ers_result eeh_report_mmio_enabled(struct pci_dev *pdev,
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static enum pci_ers_result eeh_report_reset(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_reset(unsigned int event_id,
+					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
 	struct eeh_dev *edev;
@@ -443,7 +458,8 @@ static enum pci_ers_result eeh_report_reset(struct pci_dev *pdev,
 		return PCI_ERS_RESULT_NONE;
 	}
 	eeh_serialize_unlock(flags);
-	pci_info(pdev, "Invoking %s->slot_reset()", driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->slot_reset()",
+		 event_id, driver->name);
 	return driver->err_handler->slot_reset(pdev);
 }
 
@@ -483,7 +499,8 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
  * could resume so that the device driver can do some initialization
  * to make the recovered device work again.
  */
-static enum pci_ers_result eeh_report_resume(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_resume(unsigned int event_id,
+					     struct pci_dev *pdev,
 					     struct pci_driver *driver)
 {
 	struct eeh_dev *edev;
@@ -497,7 +514,8 @@ static enum pci_ers_result eeh_report_resume(struct pci_dev *pdev,
 	}
 	eeh_serialize_unlock(flags);
 
-	pci_info(pdev, "Invoking %s->resume()", driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->resume()",
+		 event_id, driver->name);
 	driver->err_handler->resume(pdev);
 
 	pci_uevent_ers(pdev, PCI_ERS_RESULT_RECOVERED);
@@ -518,7 +536,8 @@ static enum pci_ers_result eeh_report_resume(struct pci_dev *pdev,
  * This informs the device driver that the device is permanently
  * dead, and that no further recovery attempts will be made on it.
  */
-static enum pci_ers_result eeh_report_failure(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_failure(unsigned int event_id,
+					      struct pci_dev *pdev,
 					      struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
@@ -526,8 +545,8 @@ static enum pci_ers_result eeh_report_failure(struct pci_dev *pdev,
 	if (!driver->err_handler->error_detected)
 		return PCI_ERS_RESULT_NONE;
 
-	pci_info(pdev, "Invoking %s->error_detected(permanent failure)",
-		 driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->error_detected(permanent failure)",
+		 event_id, driver->name);
 	rc = driver->err_handler->error_detected(pdev,
 						 pci_channel_io_perm_failure);
 
@@ -574,7 +593,8 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
 	return NULL;
 }
 
-static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
+static void eeh_rmv_device(unsigned int event_id,
+			   struct pci_dev *pdev, void *userdata)
 {
 	struct eeh_dev *edev;
 	struct pci_driver *driver;
@@ -582,8 +602,8 @@ static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
 
 	edev = pci_dev_to_eeh_dev(pdev);
 	if (!edev) {
-		pci_warn(pdev, "EEH: Device removed during processing (#%d)\n",
-			 __LINE__);
+		pci_warn(pdev, "EEH(%u): Device removed during processing (#%d)\n",
+			 event_id, __LINE__);
 		return;
 	}
 
@@ -612,7 +632,8 @@ static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
 	}
 
 	/* Remove it from PCI subsystem */
-	pci_info(pdev, "EEH: Removing device without EEH sensitive driver\n");
+	pci_info(pdev, "EEH(%u): Removing device without EEH sensitive driver\n",
+		 event_id);
 	edev->mode |= EEH_DEV_DISCONNECTED;
 	if (rmv_data)
 		rmv_data->removed_dev_count++;
@@ -730,7 +751,8 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
  * During the reset, udev might be invoked because those affected
  * PCI devices will be removed and then added.
  */
-static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
+static int eeh_reset_device(unsigned int event_id,
+			    struct eeh_pe *pe, struct pci_bus *bus,
 			    struct eeh_rmv_data *rmv_data,
 			    bool driver_eeh_aware)
 {
@@ -764,7 +786,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 		pdevs = pdev_cache_list_create(pe);
 		/* eeh_rmv_device() may re-acquire the recovery lock */
 		for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
-			eeh_rmv_device(*pdevp, rmv_data);
+			eeh_rmv_device(event_id, *pdevp, rmv_data);
 		pdev_cache_list_destroy(pdevs);
 
 	} else {
@@ -813,8 +835,8 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	 * potentially weird things happen.
 	 */
 	if (!driver_eeh_aware || rmv_data->removed_dev_count) {
-		pr_info("EEH: Sleep 5s ahead of %s hotplug\n",
-			(driver_eeh_aware ? "partial" : "complete"));
+		pr_info("EEH(%u): Sleep 5s ahead of %s hotplug\n",
+			event_id, (driver_eeh_aware ? "partial" : "complete"));
 		eeh_recovery_unlock();
 		ssleep(5);
 		eeh_recovery_lock();
@@ -971,7 +993,7 @@ static void eeh_clear_slot_attention(struct pci_dev *pdev)
  * drivers (which cause a second set of hotplug events to go out to
  * userspace).
  */
-void eeh_handle_normal_event(struct eeh_pe *pe)
+void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 {
 	struct pci_bus *bus;
 	struct eeh_dev *edev, *tmp;
@@ -986,8 +1008,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_recovery_lock();
 	bus = eeh_pe_bus_get(pe);
 	if (!bus) {
-		pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
-			__func__, pe->phb->global_number, pe->addr);
+		pr_err("EEH(%u): %s: Cannot find PCI bus for PHB#%x-PE#%x\n",
+			event_id, __func__, pe->phb->global_number, pe->addr);
 		eeh_recovery_unlock();
 		return;
 	}
@@ -1007,22 +1029,27 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 				devices++;
 
 	if (!devices) {
-		pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n",
-			pe->phb->global_number, pe->addr);
+		pr_debug("EEH(%u): Frozen PHB#%x-PE#%x is empty!\n",
+			event_id, pe->phb->global_number, pe->addr);
 		goto out; /* nothing to recover */
 	}
 
+	pe->freeze_count++;
+	pr_warn("EEH(%u): EVENT=RECOVERY_START TYPE=%s PHB=%#x PE=%#x COUNT=%d\n",
+		event_id, ((pe->type & EEH_PE_PHB) ? "PHB" : "PE"),
+		pe->phb->global_number, pe->addr, pe->freeze_count);
+
 	/* Log the event */
 	if (pe->type & EEH_PE_PHB) {
-		pr_err("EEH: Recovering PHB#%x, location: %s\n",
-			pe->phb->global_number, eeh_pe_loc_get(pe));
+		pr_err("EEH(%u): Recovering PHB#%x, location: %s\n",
+			event_id, pe->phb->global_number, eeh_pe_loc_get(pe));
 	} else {
 		struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb);
 
-		pr_err("EEH: Recovering PHB#%x-PE#%x\n",
-		       pe->phb->global_number, pe->addr);
-		pr_err("EEH: PE location: %s, PHB location: %s\n",
-		       eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
+		pr_err("EEH(%u): Recovering PHB#%x-PE#%x\n",
+		       event_id, pe->phb->global_number, pe->addr);
+		pr_err("EEH(%u): PE location: %s, PHB location: %s\n",
+		       event_id, eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
 	}
 
 #ifdef CONFIG_STACKTRACE
@@ -1034,13 +1061,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		void **ptrs = (void **) pe->stack_trace;
 		int i;
 
-		pr_err("EEH: Frozen PHB#%x-PE#%x detected\n",
-		       pe->phb->global_number, pe->addr);
+		pr_err("EEH(%u): Frozen PHB#%x-PE#%x detected\n",
+		       event_id, pe->phb->global_number, pe->addr);
 
 		/* FIXME: Use the same format as dump_stack() */
-		pr_err("EEH: Call Trace:\n");
+		pr_err("EEH(%u): Call Trace:\n", event_id);
 		for (i = 0; i < pe->trace_entries; i++)
-			pr_err("EEH: [%pK] %pS\n", ptrs[i], ptrs[i]);
+			pr_err("EEH(%u): [%pK] %pS\n", event_id, ptrs[i], ptrs[i]);
 
 		pe->trace_entries = 0;
 	}
@@ -1053,8 +1080,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_pe_update_time_stamp(pe);
 	pe->freeze_count++;
 	if (pe->freeze_count > eeh_max_freezes) {
-		pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n",
-		       pe->phb->global_number, pe->addr,
+		pr_err("EEH(%u): PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n",
+		       event_id, pe->phb->global_number, pe->addr,
 		       pe->freeze_count);
 
 		goto recover_failed;
@@ -1070,12 +1097,12 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * the error. Override the result if necessary to have partially
 	 * hotplug for this case.
 	 */
-	pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n",
-		pe->freeze_count, eeh_max_freezes);
-	pr_info("EEH: Notify device drivers to shutdown\n");
+	pr_warn("EEH(%u): This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n",
+		event_id, pe->freeze_count, eeh_max_freezes);
+	pr_info("EEH(%u): Notify device drivers to shutdown\n", event_id);
 	eeh_set_channel_state(pe, pci_channel_io_frozen);
 	eeh_set_irq_state(pe, false);
-	eeh_pe_report("error_detected(IO frozen)", pe,
+	eeh_pe_report(event_id, "error_detected(IO frozen)", pe,
 		      eeh_report_error, &result);
 	if (result == PCI_ERS_RESULT_DISCONNECT)
 		goto recover_failed;
@@ -1092,7 +1119,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 */
 	rc = eeh_wait_state(pe, MAX_WAIT_FOR_RECOVERY * 1000, true);
 	if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) {
-		pr_warn("EEH: Permanent failure\n");
+		pr_warn("EEH(%u): Permanent failure\n", event_id);
 		goto recover_failed;
 	}
 
@@ -1100,16 +1127,16 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * don't post the error log until after all dev drivers
 	 * have been informed.
 	 */
-	pr_info("EEH: Collect temporary log\n");
-	eeh_slot_error_detail(pe, EEH_LOG_TEMP);
+	pr_info("EEH(%u): Collect temporary log\n", event_id);
+	eeh_slot_error_detail(event_id, pe, EEH_LOG_TEMP);
 
 	/* If all device drivers were EEH-unaware, then shut
 	 * down all of the device drivers, and hope they
 	 * go down willingly, without panicing the system.
 	 */
 	if (result == PCI_ERS_RESULT_NONE) {
-		pr_info("EEH: Reset with hotplug activity\n");
-		rc = eeh_reset_device(pe, bus, NULL, false);
+		pr_info("EEH(%u): Reset with hotplug activity\n", event_id);
+		rc = eeh_reset_device(event_id, pe, bus, NULL, false);
 		if (rc) {
 			pr_warn("%s: Unable to reset, err=%d\n", __func__, rc);
 			goto recover_failed;
@@ -1118,7 +1145,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* If all devices reported they can proceed, then re-enable MMIO */
 	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
-		pr_info("EEH: Enable I/O for affected devices\n");
+		pr_info("EEH(%u): Enable I/O for affected devices\n", event_id);
 		rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
 		if (rc < 0)
 			goto recover_failed;
@@ -1126,13 +1153,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		if (rc) {
 			result = PCI_ERS_RESULT_NEED_RESET;
 		} else {
-			pr_info("EEH: Notify device drivers to resume I/O\n");
-			eeh_pe_report("mmio_enabled", pe,
+			pr_info("EEH(%u): Notify device drivers to resume I/O\n", event_id);
+			eeh_pe_report(event_id, "mmio_enabled", pe,
 				      eeh_report_mmio_enabled, &result);
 		}
 	}
 	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
-		pr_info("EEH: Enabled DMA for affected devices\n");
+		pr_info("EEH(%u): Enabled DMA for affected devices\n", event_id);
 		rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA);
 		if (rc < 0)
 			goto recover_failed;
@@ -1152,8 +1179,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* If any device called out for a reset, then reset the slot */
 	if (result == PCI_ERS_RESULT_NEED_RESET) {
-		pr_info("EEH: Reset without hotplug activity\n");
-		rc = eeh_reset_device(pe, bus, &rmv_data, true);
+		pr_info("EEH(%u): Reset without hotplug activity\n", event_id);
+		rc = eeh_reset_device(event_id, pe, bus, &rmv_data, true);
 		if (rc) {
 			pr_warn("%s: Cannot reset, err=%d\n", __func__, rc);
 			goto recover_failed;
@@ -1162,7 +1189,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		result = PCI_ERS_RESULT_NONE;
 		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_set_irq_state(pe, true);
-		eeh_pe_report("slot_reset", pe, eeh_report_reset,
+		eeh_pe_report(event_id, "slot_reset", pe, eeh_report_reset,
 			      &result);
 	}
 
@@ -1179,10 +1206,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		}
 
 		/* Tell all device drivers that they can resume operations */
-		pr_info("EEH: Notify device driver to resume\n");
+		pr_info("EEH(%u): Notify device driver to resume\n", event_id);
 		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_set_irq_state(pe, true);
-		eeh_pe_report("resume", pe, eeh_report_resume, NULL);
+		eeh_pe_report(event_id, "resume", pe, eeh_report_resume, NULL);
 		eeh_for_each_pe(pe, tmp_pe) {
 			eeh_pe_for_each_dev(tmp_pe, edev, tmp) {
 				edev->mode &= ~EEH_DEV_NO_HANDLER;
@@ -1190,7 +1217,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			}
 		}
 
-		pr_info("EEH: Recovery successful.\n");
+		pr_info("EEH(%u): Recovery successful.\n", event_id);
 		goto out;
 	}
 
@@ -1200,17 +1227,18 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * are due to poorly seated PCI cards. Only 10% or so are
 	 * due to actual, failed cards.
 	 */
-	pr_err("EEH: Unable to recover from failure from PHB#%x-PE#%x.\n"
+	pr_err("EEH(%u): Unable to recover from failure from PHB#%x-PE#%x.\n"
 		"Please try reseating or replacing it\n",
-		pe->phb->global_number, pe->addr);
+		event_id, pe->phb->global_number, pe->addr);
 
-	eeh_slot_error_detail(pe, EEH_LOG_PERM);
+	eeh_slot_error_detail(event_id, pe, EEH_LOG_PERM);
 
 	/* Notify all devices that they're about to go down. */
 	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
 	eeh_set_irq_state(pe, false);
-	eeh_pe_report("error_detected(permanent failure)", pe,
+	eeh_pe_report(event_id, "error_detected(permanent failure)", pe,
 		      eeh_report_failure, NULL);
+	pr_crit("EEH(%u): EVENT=RECOVERY_END RESULT=failure\n", event_id);
 
 	/* Mark the PE to be removed permanently */
 	eeh_pe_state_mark(pe, EEH_PE_REMOVED);
@@ -1223,7 +1251,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	if (pe->type & EEH_PE_VF) {
 		pdevs = pdev_cache_list_create(pe);
 		for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
-			eeh_rmv_device(*pdevp, NULL);
+			eeh_rmv_device(event_id, *pdevp, NULL);
 		pdev_cache_list_destroy(pdevs);
 		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 	} else {
@@ -1238,6 +1266,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		return;
 	}
 
+	pr_info("EEH(%u): EVENT=RECOVERY_END RESULT=success\n", event_id);
+
 out:
 	/*
 	 * Clean up any PEs without devices. While marked as EEH_PE_RECOVERYING
@@ -1324,7 +1354,7 @@ void eeh_handle_special_event(void)
 		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
 		    rc == EEH_NEXT_ERR_FENCED_PHB) {
 			eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
-			eeh_handle_normal_event(pe);
+			eeh_handle_normal_event(0, pe);
 		} else {
 			eeh_for_each_pe(pe, tmp_pe)
 				eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
@@ -1333,7 +1363,7 @@ void eeh_handle_special_event(void)
 			/* Notify all devices to be down */
 			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
 			eeh_set_channel_state(pe, pci_channel_io_perm_failure);
-			eeh_pe_report(
+			eeh_pe_report(0,
 				"error_detected(permanent failure)", pe,
 				eeh_report_failure, NULL);
 
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index c23a454af08a..6c205a77581f 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -26,6 +26,9 @@ static DEFINE_SPINLOCK(eeh_eventlist_lock);
 static DECLARE_COMPLETION(eeh_eventlist_event);
 static LIST_HEAD(eeh_eventlist);
 
+/* Event ID 0 is reserved for special events */
+static atomic_t eeh_event_id = ATOMIC_INIT(1);
+
 /**
  * eeh_event_handler - Dispatch EEH events.
  * @dummy - unused
@@ -59,7 +62,7 @@ static int eeh_event_handler(void * dummy)
 
 		/* We might have event without binding PE */
 		if (event->pe)
-			eeh_handle_normal_event(event->pe);
+			eeh_handle_normal_event(event->id, event->pe);
 		else
 			eeh_handle_special_event();
 
@@ -110,6 +113,13 @@ int __eeh_send_failure_event(struct eeh_pe *pe)
 		return -ENOMEM;
 	}
 	event->pe = pe;
+	do {
+		/* Skip over the special value (0) */
+		event->id = (unsigned int)atomic_inc_return(&eeh_event_id);
+	} while (!event->id);
+
+	pr_err("EEH(%u): EVENT=ERROR_DETECTED PHB=%#x PE=%#x\n",
+	      event->id, pe->phb->global_number, pe->addr);
 
 	/*
 	 * Mark the PE as recovering before inserting it in the queue.
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC 3/3] powerpc/eeh: Asynchronous recovery
  2022-08-16  3:27 [RFC 0/3] Asynchronous EEH recovery Ganesh Goudar
  2022-08-16  3:27 ` [RFC 1/3] powerpc/eeh: Synchronization for safety Ganesh Goudar
  2022-08-16  3:27 ` [RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery Ganesh Goudar
@ 2022-08-16  3:27 ` Ganesh Goudar
  2022-08-17  7:16 ` [RFC 0/3] Asynchronous EEH recovery Oliver O'Halloran
  2022-09-02  0:19 ` Jason Gunthorpe
  4 siblings, 0 replies; 9+ messages in thread
From: Ganesh Goudar @ 2022-08-16  3:27 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh

Based on the original work from Sam Bobroff.

Currently, EEH recovery is entirely serialized and takes place within
a single kernel thread. This can cause recovery to take a long time
when there are many devices.

To shorten recovery time, this change allows recovery to proceed in
parallel in two ways:
- Each PHB is given it's own recovery event queue and can be recovered
independently from other PHBs.
- Driver handlers are called in parallel, but with the constraint that
handlers higher up (closer to the PHB) in the PE hierarchy must be
called before those lower down.

To maintain the constraint, above, the driver handlers are called by
traversing the tree of affected PEs from the top, stopping to call
handlers (in parallel) when a PE with devices is discovered. When the
calls for that PE are complete, traversal continues at each child PE.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh.h        |   1 +
 arch/powerpc/include/asm/eeh_event.h  |   7 +
 arch/powerpc/include/asm/pci-bridge.h |   3 +
 arch/powerpc/kernel/eeh_driver.c      | 323 +++++++++++++++++++-------
 arch/powerpc/kernel/eeh_event.c       |  65 +++---
 arch/powerpc/kernel/eeh_pe.c          |   3 +
 6 files changed, 288 insertions(+), 114 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index f659c0433de5..2728aee5cb0b 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -128,6 +128,7 @@ static inline bool eeh_pe_passed(struct eeh_pe *pe)
 #define EEH_DEV_NO_HANDLER	(1 << 8)	/* No error handler	*/
 #define EEH_DEV_SYSFS		(1 << 9)	/* Sysfs created	*/
 #define EEH_DEV_REMOVED		(1 << 10)	/* Removed permanently	*/
+#define EEH_DEV_RECOVERING	(1 << 11)	/* Recovering		*/
 
 struct eeh_dev {
 	int mode;			/* EEH mode			*/
diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
index a1fe736bc4cf..b21f49e87b7b 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -8,6 +8,8 @@
 #define ASM_POWERPC_EEH_EVENT_H
 #ifdef __KERNEL__
 
+#include <linux/workqueue.h>
+
 /*
  * structure holding pci controller data that describes a
  * change in the isolation status of a PCI slot.  A pointer
@@ -15,16 +17,21 @@
  * callback.
  */
 struct eeh_event {
+	struct work_struct	work;
 	struct list_head	list;	/* to form event queue	*/
 	struct eeh_pe		*pe;	/* EEH PE		*/
 	unsigned int		id;	/* Event ID		*/
 };
 
+extern spinlock_t eeh_eventlist_lock;
+
 int eeh_event_init(void);
+int eeh_phb_event(struct eeh_pe *pe);
 int eeh_send_failure_event(struct eeh_pe *pe);
 int __eeh_send_failure_event(struct eeh_pe *pe);
 void eeh_remove_event(struct eeh_pe *pe, bool force);
 void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe);
+void eeh_handle_normal_event_work(struct work_struct *work);
 void eeh_handle_special_event(void);
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c85f901227c9..74806009f50a 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -131,6 +131,9 @@ struct pci_controller {
 	struct irq_domain	*dev_domain;
 	struct irq_domain	*msi_domain;
 	struct fwnode_handle	*fwnode;
+
+	bool eeh_in_progress;
+	struct list_head eeh_eventlist;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 894326cc4dfa..3abd5f2d146c 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -12,12 +12,17 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/pci_hotplug.h>
+#include <linux/kthread.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
 #include <asm/eeh.h>
 #include <asm/eeh_event.h>
 #include <asm/ppc-pci.h>
 #include <asm/pci-bridge.h>
 #include <asm/rtas.h>
 
+static atomic_t eeh_wu_id = ATOMIC_INIT(0);
+
 struct eeh_rmv_data {
 	struct list_head removed_vf_list;
 	int removed_dev_count;
@@ -248,73 +253,59 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
 }
 
 typedef enum pci_ers_result (*eeh_report_fn)(unsigned int event_id,
+					     unsigned int id,
 					     struct pci_dev *,
 					     struct pci_driver *);
 static void eeh_pe_report_pdev(unsigned int event_id,
-			       struct pci_dev *pdev, eeh_report_fn fn,
+			       unsigned int id,
+			       struct pci_dev *pdev,
+			       const char *fn_name, eeh_report_fn fn,
 			       enum pci_ers_result *result,
-			       const char *handler_name)
+			       bool late, bool removed, bool passed)
 {
-	struct eeh_dev *edev;
 	struct pci_driver *driver;
-	bool actionable, late, removed, passed;
 	enum pci_ers_result new_result;
 
-	edev = pci_dev_to_eeh_dev(pdev);
-	if (!edev) {
-		pci_info(pdev, "EEH(%u): no EEH state for device", event_id);
-		return;
-	}
-	/* Cache some useful values before releasing the lock: */
-	actionable = eeh_edev_actionable(edev);
-	late = edev->mode & EEH_DEV_NO_HANDLER;
-	removed = eeh_dev_removed(edev);
-	passed = eeh_pe_passed(edev->pe);
-	if (actionable) {
+	/*
+	 * Driver callbacks may end up calling back into EEH functions
+	 * (for example by removing a PCI device) which will deadlock
+	 * unless the EEH locks are released first. Note that it may be
+	 * re-acquired by the report functions, if necessary.
+	 */
+	device_lock(&pdev->dev);
+	driver = eeh_pcid_get(pdev);
+
+	if (!driver) {
+		pci_info(pdev, "EEH(%u): W%u: no driver", event_id, id);
+	} else if (!driver->err_handler) {
+		pci_info(pdev, "EEH(%u): W%u: driver not EEH aware", event_id, id);
+	} else if (late) {
+		pci_info(pdev, "EEH(%u): W%u: driver bound too late", event_id, id);
+	} else {
+		pci_info(pdev, "EEH(%u): EVENT=HANDLER_CALL HANDLER='%s'\n",
+			 event_id, fn_name);
+
+		new_result = fn(event_id, id, pdev, driver);
+
 		/*
-		 * Driver callbacks may end up calling back into EEH functions
-		 * (for example by removing a PCI device) which will deadlock
-		 * unless the EEH locks are released first. Note that it may be
-		 * re-acquired by the report functions, if necessary.
+		 * It's not safe to use edev here, because the locks
+		 * have been released and devices could have changed.
 		 */
-		eeh_recovery_unlock();
-		device_lock(&pdev->dev);
-		driver = eeh_pcid_get(pdev);
-
-		if (!driver)
-			pci_info(pdev, "EEH(%u): no driver", event_id);
-		else if (!driver->err_handler)
-			pci_info(pdev, "EEH(%u): driver not EEH aware", event_id);
-		else if (late)
-			pci_info(pdev, "EEH(%u): driver bound too late", event_id);
-		else {
-			pr_warn("EEH(%u): EVENT=HANDLER_CALL DEVICE=%04x:%02x:%02x.%x DRIVER='%s' HANDLER='%s'\n",
-				event_id, edev->controller->global_number,
-				PCI_BUSNO(edev->bdfn), PCI_SLOT(edev->bdfn),
-				PCI_FUNC(edev->bdfn), driver->name, handler_name);
-
-			new_result = fn(event_id, pdev, driver);
-			/*
-			 * It's not safe to use edev here, because the locks
-			 * have been released and devices could have changed.
-			 */
-			pr_warn("EEH(%u): EVENT=HANDLER_RETURN RESULT='%s'\n",
-				event_id, pci_ers_result_name(new_result));
-			pci_info(pdev, "EEH(%u): %s driver reports: %s",
-				 event_id, driver->name,
-				 pci_ers_result_name(new_result));
-			if (result)
-				*result = pci_ers_merge_result(*result,
-							       new_result);
+		pr_warn("EEH(%u): EVENT=HANDLER_RETURN RESULT='%s'\n",
+			event_id, pci_ers_result_name(new_result));
+		pci_info(pdev, "EEH(%u): W%u: %s driver reports: '%s'",
+			 event_id, id, driver->name,
+			 pci_ers_result_name(new_result));
+		if (result) {
+			eeh_recovery_lock();
+			*result = pci_ers_merge_result(*result,
+						       new_result);
+			eeh_recovery_unlock();
 		}
-		if (driver)
-			eeh_pcid_put(pdev);
-		device_unlock(&pdev->dev);
-		eeh_recovery_lock();
-	} else {
-		pci_info(pdev, "EEH(%u): not actionable (%d,%d,%d)",
-			 event_id, !!pdev, !removed, !passed);
 	}
+	if (driver)
+		eeh_pcid_put(pdev);
+	device_unlock(&pdev->dev);
 }
 
 struct pci_dev **pdev_cache_list_create(struct eeh_pe *root)
@@ -354,27 +345,153 @@ static void pdev_cache_list_destroy(struct pci_dev **pdevs)
 	kfree(pdevs);
 }
 
-static void eeh_pe_report(unsigned int event_id,
-			  const char *name, struct eeh_pe *root,
-			  eeh_report_fn fn, enum pci_ers_result *result)
+struct work_unit {
+	unsigned int id;
+	struct work_struct work;
+	unsigned int event_id;
+	struct pci_dev *pdev;
+	struct eeh_pe *pe;
+	const char *fn_name;
+	eeh_report_fn fn;
+	enum pci_ers_result *result;
+	atomic_t *count;
+	struct completion *done;
+};
+
+static void eeh_pe_report_pdev_thread(struct work_struct *work);
+/*
+ * Traverse down from a PE through it's children, to find devices and enqueue
+ * jobs to call the handler (fn) on them.  But do not traverse below a PE that
+ * has devices, so that devices are always handled strictly before their
+ * children. (Traversal is continued by the jobs after handlers are called.)
+ * The recovery lock must be held.
+ * TODO: Convert away from recursive descent traversal?
+ */
+static bool enqueue_pe_work(struct eeh_pe *root, unsigned int event_id,
+			    const char *fn_name, eeh_report_fn fn,
+			    enum pci_ers_result *result, atomic_t *count,
+			    struct completion *done)
 {
-	struct pci_dev **pdevs, **pdevp;
+	struct eeh_pe *pe;
+	struct eeh_dev *edev, *tmp;
+	struct work_unit *wu;
+	bool work_added = false;
+
+	if (list_empty(&root->edevs)) {
+		list_for_each_entry(pe, &root->child_list, child)
+			work_added |= enqueue_pe_work(pe, event_id, fn_name,
+						      fn, result, count, done);
+	} else {
+		eeh_pe_for_each_dev(root, edev, tmp) {
+			work_added = true;
+			edev->mode |= EEH_DEV_RECOVERING;
+			atomic_inc(count);
+			WARN_ON(!(edev->mode & EEH_DEV_RECOVERING));
+			wu = kmalloc(sizeof(*wu), GFP_KERNEL);
+			wu->id = (unsigned int)atomic_inc_return(&eeh_wu_id);
+			wu->event_id = event_id;
+			get_device(&edev->pdev->dev);
+			wu->pdev = edev->pdev;
+			wu->pe = root;
+			wu->fn_name = fn_name;
+			wu->fn = fn;
+			wu->result = result;
+			wu->count = count;
+			wu->done = done;
+			INIT_WORK(&wu->work, eeh_pe_report_pdev_thread);
+			pr_debug("EEH(%u): Queue work unit W%u for device %s (count ~ %d)\n",
+				 event_id, wu->id, pci_name(edev->pdev),
+				 atomic_read(count));
+			queue_work(system_unbound_wq, &wu->work);
+		}
+		/* This PE has devices, so don't traverse further now */
+	}
+	return work_added;
+}
+
+static void eeh_pe_report_pdev_thread(struct work_struct *work)
+{
+	struct work_unit *wu = container_of(work, struct work_unit, work);
+	struct eeh_dev *edev, *oedev, *tmp;
+	struct eeh_pe *pe;
+	int todo;
 
-	pr_info("EEH(%u): Beginning: '%s'\n", event_id, name);
 	/*
 	 * It would be convenient to continue to hold the recovery lock here
 	 * but driver callbacks can take a very long time or never return at
 	 * all.
 	 */
-	pdevs = pdev_cache_list_create(root);
-	for (pdevp = pdevs; pdevp && *pdevp; pdevp++) {
-		/*
-		 * NOTE! eeh_recovery_lock() is released briefly
-		 * in eeh_pe_report_pdev()
-		 */
-		eeh_pe_report_pdev(event_id, *pdevp, fn, result, name);
+	pr_debug("EEH(%u): W%u: start (device: %s)\n", wu->event_id, wu->id, pci_name(wu->pdev));
+	eeh_recovery_lock();
+	edev = pci_dev_to_eeh_dev(wu->pdev);
+	if (edev) {
+		bool late, removed, passed;
+
+		WARN_ON(!(edev->mode & EEH_DEV_RECOVERING));
+		removed = eeh_dev_removed(edev);
+		passed = eeh_pe_passed(edev->pe);
+		late = edev->mode & EEH_DEV_NO_HANDLER;
+		if (eeh_edev_actionable(edev)) {
+			eeh_recovery_unlock();
+			eeh_pe_report_pdev(wu->event_id, wu->id, wu->pdev,
+					   wu->fn_name, wu->fn, wu->result,
+					   late, removed, passed);
+			eeh_recovery_lock();
+		} else {
+			pci_info(wu->pdev, "EEH(%u): W%u: Not actionable (%d,%d,%d)\n",
+				 wu->event_id, wu->id, !!wu->pdev, !removed, !passed);
+		}
+		edev = pci_dev_to_eeh_dev(wu->pdev); // Re-acquire after lock release
+		if (edev)
+			edev->mode &= ~EEH_DEV_RECOVERING;
+		/* The edev may be lost, but not moved to a different PE! */
+		WARN_ON(eeh_dev_to_pe(edev) && (eeh_dev_to_pe(edev) != wu->pe));
+		todo = 0;
+		eeh_pe_for_each_dev(wu->pe, oedev, tmp)
+			if (oedev->mode & EEH_DEV_RECOVERING)
+				todo++;
+		pci_dbg(wu->pdev, "EEH(%u): W%u: Remaining devices in this PE: %d\n",
+			wu->event_id, wu->id, todo);
+		if (todo) {
+			pr_debug("EEH(%u): W%u: Remaining work units at this PE: %d\n",
+				 wu->event_id, wu->id, todo);
+		} else {
+			pr_debug("EEH(%u): W%u: All work for this PE complete, continuing traversal:\n",
+				 wu->event_id, wu->id);
+			list_for_each_entry(pe, &wu->pe->child_list, child)
+				enqueue_pe_work(pe, wu->event_id, wu->fn_name,
+						wu->fn, wu->result, wu->count,
+						wu->done);
+		}
+	} else {
+		pr_warn("EEH(%u): W%u: Device removed.\n", wu->event_id, wu->id);
+	}
+	eeh_recovery_unlock();
+	if (atomic_dec_and_test(wu->count)) {
+		pr_debug("EEH(%u): W%u: done\n", wu->event_id, wu->id);
+		complete(wu->done);
+	}
+	put_device(&wu->pdev->dev);
+	kfree(wu);
+}
+
+static void eeh_pe_report(unsigned int event_id, const char *name, struct eeh_pe *root,
+			  eeh_report_fn fn, enum pci_ers_result *result)
+{
+	atomic_t count = ATOMIC_INIT(0);
+	DECLARE_COMPLETION_ONSTACK(done);
+
+	pr_info("EEH(%u): Beginning: '%s'\n", event_id, name);
+	if (enqueue_pe_work(root, event_id, name, fn, result, &count, &done)) {
+		pr_info("EEH(%u): Waiting for asynchronous recovery work to complete...\n",
+			event_id);
+		eeh_recovery_unlock();
+		wait_for_completion_interruptible(&done);
+		pr_info("EEH(%u): Asynchronous recovery work is complete.\n", event_id);
+		eeh_recovery_lock();
+	} else {
+		pr_info("EEH(%u): No recovery work do.\n", event_id);
 	}
-	pdev_cache_list_destroy(pdevs);
 
 	if (result)
 		pr_info("EEH(%u): Finished:'%s' with aggregate recovery state:'%s'\n",
@@ -391,6 +508,7 @@ static void eeh_pe_report(unsigned int event_id,
  * Report an EEH error to each device driver.
  */
 static enum pci_ers_result eeh_report_error(unsigned int event_id,
+					    unsigned int id,
 					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
@@ -401,7 +519,8 @@ static enum pci_ers_result eeh_report_error(unsigned int event_id,
 	if (!driver->err_handler->error_detected)
 		return PCI_ERS_RESULT_NONE;
 
-	pci_info(pdev, "Invoking %s->error_detected(IO frozen)", driver->name);
+	pci_info(pdev, "EEH(%u): W%u: Invoking %s->error_detected(IO frozen)",
+		 event_id, id, driver->name);
 	rc = driver->err_handler->error_detected(pdev, pci_channel_io_frozen);
 
 	eeh_serialize_lock(&flags);
@@ -423,13 +542,14 @@ static enum pci_ers_result eeh_report_error(unsigned int event_id,
  * are now enabled.
  */
 static enum pci_ers_result eeh_report_mmio_enabled(unsigned int event_id,
+						   unsigned int id,
 						   struct pci_dev *pdev,
 						   struct pci_driver *driver)
 {
 	if (!driver->err_handler->mmio_enabled)
 		return PCI_ERS_RESULT_NONE;
-	pci_info(pdev, "EEH(%u): Invoking %s->mmio_enabled()",
-		 event_id, driver->name);
+	pci_info(pdev, "EEH(%u): W%u: Invoking %s->mmio_enabled()",
+		 event_id, id, driver->name);
 	return driver->err_handler->mmio_enabled(pdev);
 }
 
@@ -445,6 +565,7 @@ static enum pci_ers_result eeh_report_mmio_enabled(unsigned int event_id,
  * driver can work again while the device is recovered.
  */
 static enum pci_ers_result eeh_report_reset(unsigned int event_id,
+					    unsigned int id,
 					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
@@ -458,8 +579,8 @@ static enum pci_ers_result eeh_report_reset(unsigned int event_id,
 		return PCI_ERS_RESULT_NONE;
 	}
 	eeh_serialize_unlock(flags);
-	pci_info(pdev, "EEH(%u): Invoking %s->slot_reset()",
-		 event_id, driver->name);
+	pci_info(pdev, "EEH(%u): W%u: Invoking %s->slot_reset()",
+		 event_id, id, driver->name);
 	return driver->err_handler->slot_reset(pdev);
 }
 
@@ -500,6 +621,7 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
  * to make the recovered device work again.
  */
 static enum pci_ers_result eeh_report_resume(unsigned int event_id,
+					     unsigned int id,
 					     struct pci_dev *pdev,
 					     struct pci_driver *driver)
 {
@@ -514,8 +636,8 @@ static enum pci_ers_result eeh_report_resume(unsigned int event_id,
 	}
 	eeh_serialize_unlock(flags);
 
-	pci_info(pdev, "EEH(%u): Invoking %s->resume()",
-		 event_id, driver->name);
+	pci_info(pdev, "EEH(%u): W%u Invoking %s->resume()",
+		 event_id, id, driver->name);
 	driver->err_handler->resume(pdev);
 
 	pci_uevent_ers(pdev, PCI_ERS_RESULT_RECOVERED);
@@ -537,6 +659,7 @@ static enum pci_ers_result eeh_report_resume(unsigned int event_id,
  * dead, and that no further recovery attempts will be made on it.
  */
 static enum pci_ers_result eeh_report_failure(unsigned int event_id,
+					      unsigned int id,
 					      struct pci_dev *pdev,
 					      struct pci_driver *driver)
 {
@@ -545,8 +668,8 @@ static enum pci_ers_result eeh_report_failure(unsigned int event_id,
 	if (!driver->err_handler->error_detected)
 		return PCI_ERS_RESULT_NONE;
 
-	pci_info(pdev, "EEH(%u): Invoking %s->error_detected(permanent failure)",
-		 event_id, driver->name);
+	pci_info(pdev, "EEH(%u): W%u: Invoking %s->error_detected(permanent failure)",
+		 event_id, id, driver->name);
 	rc = driver->err_handler->error_detected(pdev,
 						 pci_channel_io_perm_failure);
 
@@ -995,9 +1118,10 @@ static void eeh_clear_slot_attention(struct pci_dev *pdev)
  */
 void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 {
+	struct eeh_pe *tmp_pe;
+	struct pci_controller *phb = pe->phb;
 	struct pci_bus *bus;
 	struct eeh_dev *edev, *tmp;
-	struct eeh_pe *tmp_pe;
 	struct pci_dev **pdevs, **pdevp;
 	int rc = 0;
 	enum pci_ers_result result = PCI_ERS_RESULT_NONE;
@@ -1009,7 +1133,7 @@ void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 	bus = eeh_pe_bus_get(pe);
 	if (!bus) {
 		pr_err("EEH(%u): %s: Cannot find PCI bus for PHB#%x-PE#%x\n",
-			event_id, __func__, pe->phb->global_number, pe->addr);
+			event_id, __func__, phb->global_number, pe->addr);
 		eeh_recovery_unlock();
 		return;
 	}
@@ -1030,7 +1154,7 @@ void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 
 	if (!devices) {
 		pr_debug("EEH(%u): Frozen PHB#%x-PE#%x is empty!\n",
-			event_id, pe->phb->global_number, pe->addr);
+			event_id, phb->global_number, pe->addr);
 		goto out; /* nothing to recover */
 	}
 
@@ -1042,12 +1166,12 @@ void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 	/* Log the event */
 	if (pe->type & EEH_PE_PHB) {
 		pr_err("EEH(%u): Recovering PHB#%x, location: %s\n",
-			event_id, pe->phb->global_number, eeh_pe_loc_get(pe));
+			event_id, phb->global_number, eeh_pe_loc_get(pe));
 	} else {
-		struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb);
+		struct eeh_pe *phb_pe = eeh_phb_pe_get(phb);
 
 		pr_err("EEH(%u): Recovering PHB#%x-PE#%x\n",
-		       event_id, pe->phb->global_number, pe->addr);
+		       event_id, phb->global_number, pe->addr);
 		pr_err("EEH(%u): PE location: %s, PHB location: %s\n",
 		       event_id, eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
 	}
@@ -1062,7 +1186,7 @@ void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 		int i;
 
 		pr_err("EEH(%u): Frozen PHB#%x-PE#%x detected\n",
-		       event_id, pe->phb->global_number, pe->addr);
+		       event_id, phb->global_number, pe->addr);
 
 		/* FIXME: Use the same format as dump_stack() */
 		pr_err("EEH(%u): Call Trace:\n", event_id);
@@ -1081,7 +1205,7 @@ void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 	pe->freeze_count++;
 	if (pe->freeze_count > eeh_max_freezes) {
 		pr_err("EEH(%u): PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n",
-		       event_id, pe->phb->global_number, pe->addr,
+		       event_id, phb->global_number, pe->addr,
 		       pe->freeze_count);
 
 		goto recover_failed;
@@ -1229,7 +1353,7 @@ void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 	 */
 	pr_err("EEH(%u): Unable to recover from failure from PHB#%x-PE#%x.\n"
 		"Please try reseating or replacing it\n",
-		event_id, pe->phb->global_number, pe->addr);
+		event_id, phb->global_number, pe->addr);
 
 	eeh_slot_error_detail(event_id, pe, EEH_LOG_PERM);
 
@@ -1284,6 +1408,30 @@ void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 	eeh_recovery_unlock();
 }
 
+void eeh_handle_normal_event_work(struct work_struct *work)
+{
+	unsigned long flags;
+	struct eeh_event *event = container_of(work, struct eeh_event, work);
+	struct pci_controller *phb = event->pe->phb;
+
+	eeh_handle_normal_event(event->id, event->pe);
+
+	kfree(event);
+	spin_lock_irqsave(&eeh_eventlist_lock, flags);
+	WARN_ON_ONCE(!phb->eeh_in_progress);
+	if (list_empty(&phb->eeh_eventlist)) {
+		phb->eeh_in_progress = false;
+		pr_debug("EEH(%u): No more work to do\n", event->id);
+	} else {
+		pr_warn("EEH(%u): More work to do\n", event->id);
+		event = list_entry(phb->eeh_eventlist.next,
+				   struct eeh_event, list);
+		list_del(&event->list);
+		queue_work(system_unbound_wq, &event->work);
+	}
+	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+}
+
 /**
  * eeh_handle_special_event - Handle EEH events without a specific failing PE
  *
@@ -1353,8 +1501,7 @@ void eeh_handle_special_event(void)
 		 */
 		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
 		    rc == EEH_NEXT_ERR_FENCED_PHB) {
-			eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
-			eeh_handle_normal_event(0, pe);
+			eeh_phb_event(pe);
 		} else {
 			eeh_for_each_pe(pe, tmp_pe)
 				eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 6c205a77581f..f01cb8e981e1 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -22,7 +22,7 @@
  *  work-queue, where a worker thread can drive recovery.
  */
 
-static DEFINE_SPINLOCK(eeh_eventlist_lock);
+DEFINE_SPINLOCK(eeh_eventlist_lock);
 static DECLARE_COMPLETION(eeh_eventlist_event);
 static LIST_HEAD(eeh_eventlist);
 
@@ -61,7 +61,7 @@ static int eeh_event_handler(void * dummy)
 			continue;
 
 		/* We might have event without binding PE */
-		if (event->pe)
+		if (event->pe) /* TODO: Unused now? */
 			eeh_handle_normal_event(event->id, event->pe);
 		else
 			eeh_handle_special_event();
@@ -94,33 +94,56 @@ int eeh_event_init(void)
 	return 0;
 }
 
-/**
- * eeh_send_failure_event - Generate a PCI error event
- * @pe: EEH PE
- *
- * This routine can be called within an interrupt context;
- * the actual event will be delivered in a normal context
- * (from a workqueue).
- */
-int __eeh_send_failure_event(struct eeh_pe *pe)
+int eeh_phb_event(struct eeh_pe *pe)
 {
-	unsigned long flags;
 	struct eeh_event *event;
+	unsigned long flags;
 
 	event = kzalloc(sizeof(*event), GFP_ATOMIC);
 	if (!event) {
 		pr_err("EEH: out of memory, event not handled\n");
 		return -ENOMEM;
 	}
-	event->pe = pe;
+
 	do {
 		/* Skip over the special value (0) */
 		event->id = (unsigned int)atomic_inc_return(&eeh_event_id);
 	} while (!event->id);
 
-	pr_err("EEH(%u): EVENT=ERROR_DETECTED PHB=%#x PE=%#x\n",
-	      event->id, pe->phb->global_number, pe->addr);
+	spin_lock_irqsave(&eeh_eventlist_lock, flags);
+	INIT_WORK(&event->work, eeh_handle_normal_event_work);
+	if (pe) {
+		event->pe = pe;
+		eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
+		pr_err("EEH(%u): EVENT=ERROR_DETECTED PHB=%#x PE=%#x\n",
+		       event->id, pe->phb->global_number, pe->addr);
+		if (event->pe->phb->eeh_in_progress) {
+			pr_info("EEH: EEH already in progress on this PHB, queueing.\n");
+			list_add(&event->list, &event->pe->phb->eeh_eventlist);
+		} else {
+			pr_info("EEH: Beginning recovery on this PHB.\n");
+			WARN_ON_ONCE(!list_empty(&event->pe->phb->eeh_eventlist));
+			event->pe->phb->eeh_in_progress = true;
+			queue_work(system_unbound_wq, &event->work);
+		}
+	} else {
+		list_add(&event->list, &eeh_eventlist);
+		complete(&eeh_eventlist_event);
+	}
+	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
+	return 0;
+}
 
+/**
+ * eeh_send_failure_event - Generate a PCI error event
+ * @pe: EEH PE
+ *
+ * This routine can be called within an interrupt context;
+ * the actual event will be delivered in a normal context
+ * (from a workqueue).
+ */
+int __eeh_send_failure_event(struct eeh_pe *pe)
+{
 	/*
 	 * Mark the PE as recovering before inserting it in the queue.
 	 * This prevents the PE from being free()ed by a hotplug driver
@@ -136,18 +159,8 @@ int __eeh_send_failure_event(struct eeh_pe *pe)
 					 ARRAY_SIZE(pe->stack_trace), 0);
 #endif /* CONFIG_STACKTRACE */
 
-		eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
 	}
-
-	/* We may or may not be called in an interrupt context */
-	spin_lock_irqsave(&eeh_eventlist_lock, flags);
-	list_add(&event->list, &eeh_eventlist);
-	spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
-
-	/* For EEH deamon to knick in */
-	complete(&eeh_eventlist_event);
-
-	return 0;
+	return eeh_phb_event(pe);
 }
 
 int eeh_send_failure_event(struct eeh_pe *pe)
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 8f30ad3df8e0..0e9676b2d71f 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -81,6 +81,9 @@ int eeh_phb_pe_create(struct pci_controller *phb)
 {
 	struct eeh_pe *pe;
 
+	phb->eeh_in_progress = false; /* TODO: Necessary? */
+	INIT_LIST_HEAD(&phb->eeh_eventlist);
+
 	/* Allocate PHB PE */
 	pe = eeh_pe_alloc(phb, EEH_PE_PHB);
 	if (!pe) {
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC 0/3] Asynchronous EEH recovery
  2022-08-16  3:27 [RFC 0/3] Asynchronous EEH recovery Ganesh Goudar
                   ` (2 preceding siblings ...)
  2022-08-16  3:27 ` [RFC 3/3] powerpc/eeh: Asynchronous recovery Ganesh Goudar
@ 2022-08-17  7:16 ` Oliver O'Halloran
  2022-09-02  0:19 ` Jason Gunthorpe
  4 siblings, 0 replies; 9+ messages in thread
From: Oliver O'Halloran @ 2022-08-17  7:16 UTC (permalink / raw)
  To: Ganesh Goudar; +Cc: linuxppc-dev, Mahesh J Salgaonkar

On Tue, Aug 16, 2022 at 1:29 PM Ganesh Goudar <ganeshgr@linux.ibm.com> wrote:
>
> Hi,
>
> EEH reocvery is currently serialized and these patches shorten
> the time taken for EEH recovery by making the recovery to run
> in parallel. The original author of these patches is Sam Bobroff,
> I have rebased and tested these patches.
>
> On powervm with 64 VFs and I see approximately 48% reduction
> in time taken in EEH recovery,

Can you elaborate on how you're testing this? A VM with 64 separate
VFs on 64 separate PHBs I guess? How are you injecting errors?

> Yet to be tested on powernv.

If you're not testing on powernv you might as well not be testing. EEH
support on pseries is relatively straightforward since managing PCI
bridges, etc is all hidden away in the hypervisor. Most of the things
about EEH which give me headaches simply don't happen on pseries since
the PCI topology seen by the guest is flat.

> These patches were originally posted as separate RFCs, I think
> posting them as single series would be more helpful, I know the
> patches are too big, I will try to logically divide in next
> iterations.
>
> Thanks
>
> Ganesh Goudar (3):
>   powerpc/eeh: Synchronization for safety

I didn't pick that patch up after Sam left for a reason. I don't
recall the exact details, but I think I decided that the lock being
added had the same (or mostly the same) scope as the pci rescan lock.
All modifications to the EEH PE tree have to occur with the rescan
lock held since the per-device EEH setup occurs while configuring the
pci_dev (similarly the teardown happens when we remove the pci_dev
from its bus). I don't think that was true when Same wrote the patch
originally though since it pre-dates my big EEH init re-work.

It's possible (probable even!) I got that wrong, or that it was
something I was trying to make true through re-works rather than an
accurate assessment of the current state of things. I'll try page some
of this back into my brain later. I think I left some comments on
Sam's original RFC patch which might still be valid.

>   powerpc/eeh: Asynchronous recovery

I remember not hating the idea, but I think the implementation leaves
a lot to be desired. If you only really care about pseries then just
moving to a model where each PHB has its own recovery thread would get
you most of the benefit without needing to turn the already
complicated recovery path into an async trainwreck. IIRC Sam's
motivation for that patch was to make recovery faster for powernv
systems when an error was injected on a PF with a lot of child VFs.
Certain drivers took forever to recover each VF (might have been mlx5)
so doing it in parallel would have sped things up considerably.

Oliver

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 0/3] Asynchronous EEH recovery
  2022-08-16  3:27 [RFC 0/3] Asynchronous EEH recovery Ganesh Goudar
                   ` (3 preceding siblings ...)
  2022-08-17  7:16 ` [RFC 0/3] Asynchronous EEH recovery Oliver O'Halloran
@ 2022-09-02  0:19 ` Jason Gunthorpe
  2022-09-15 10:15   ` Ganesh
  4 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2022-09-02  0:19 UTC (permalink / raw)
  To: Ganesh Goudar; +Cc: linuxppc-dev, mahesh

On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote:
> Hi,
> 
> EEH reocvery is currently serialized and these patches shorten
> the time taken for EEH recovery by making the recovery to run
> in parallel. The original author of these patches is Sam Bobroff,
> I have rebased and tested these patches.

How did you test this?

I understand that VFIO on 6.0 does not work at all on power?

I am waiting for power maintainers to pick up this series to fix it:

https://lore.kernel.org/kvm/20220714081822.3717693-1-aik@ozlabs.ru/

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 0/3] Asynchronous EEH recovery
  2022-09-02  0:19 ` Jason Gunthorpe
@ 2022-09-15 10:15   ` Ganesh
  2023-01-25 14:04     ` Christophe Leroy
  0 siblings, 1 reply; 9+ messages in thread
From: Ganesh @ 2022-09-15 10:15 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linuxppc-dev, mahesh

[-- Attachment #1: Type: text/plain, Size: 635 bytes --]

On 9/2/22 05:49, Jason Gunthorpe wrote:

> On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote:
>> Hi,
>>
>> EEH reocvery is currently serialized and these patches shorten
>> the time taken for EEH recovery by making the recovery to run
>> in parallel. The original author of these patches is Sam Bobroff,
>> I have rebased and tested these patches.
> How did you test this?

This is tested on SRIOV VFs.

>
> I understand that VFIO on 6.0 does not work at all on power?
>
> I am waiting for power maintainers to pick up this series to fix it:
>
> https://lore.kernel.org/kvm/20220714081822.3717693-1-aik@ozlabs.ru/
>
> Jason

[-- Attachment #2: Type: text/html, Size: 1328 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC 0/3] Asynchronous EEH recovery
  2022-09-15 10:15   ` Ganesh
@ 2023-01-25 14:04     ` Christophe Leroy
  0 siblings, 0 replies; 9+ messages in thread
From: Christophe Leroy @ 2023-01-25 14:04 UTC (permalink / raw)
  To: Ganesh, Jason Gunthorpe; +Cc: linuxppc-dev, mahesh

Hi,

Le 15/09/2022 à 12:15, Ganesh a écrit :
> On 9/2/22 05:49, Jason Gunthorpe wrote:
> 
>> On Tue, Aug 16, 2022 at 08:57:13AM +0530, Ganesh Goudar wrote:
>>> Hi,
>>>
>>> EEH reocvery is currently serialized and these patches shorten
>>> the time taken for EEH recovery by making the recovery to run
>>> in parallel. The original author of these patches is Sam Bobroff,
>>> I have rebased and tested these patches.
>> How did you test this?
> 
> This is tested on SRIOV VFs.
> 
>> I understand that VFIO on 6.0 does not work at all on power?
>>
>> I am waiting for power maintainers to pick up this series to fix it:
>>
>> https://lore.kernel.org/kvm/20220714081822.3717693-1-aik@ozlabs.ru/
>>


This series doesn't apply anymore, for instance 
drivers/vfio/vfio_spapr_eeh.c doesn't exist anymore.

If you series is still relevant, please resubmit.

Christophe

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery
  2023-06-13  1:43 Ganesh Goudar
@ 2023-06-13  1:43 ` Ganesh Goudar
  0 siblings, 0 replies; 9+ messages in thread
From: Ganesh Goudar @ 2023-06-13  1:43 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, oohall, mahesh

Based on the original work from Sam Bobroff.

Give a unique ID to each recovery event, to ease log parsing
and prepare for parallel recovery.

Also add some new messages with a very simple format that may
be useful to log-parsers.

Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/include/asm/eeh_event.h |   3 +-
 arch/powerpc/include/asm/ppc-pci.h   |   2 +-
 arch/powerpc/kernel/eeh.c            |  42 +++---
 arch/powerpc/kernel/eeh_driver.c     | 189 +++++++++++++++------------
 arch/powerpc/kernel/eeh_event.c      |  12 +-
 include/linux/mmzone.h               |   2 +-
 6 files changed, 147 insertions(+), 103 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
index dadde7d52f46..a1fe736bc4cf 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -17,13 +17,14 @@
 struct eeh_event {
 	struct list_head	list;	/* to form event queue	*/
 	struct eeh_pe		*pe;	/* EEH PE		*/
+	unsigned int		id;	/* Event ID		*/
 };
 
 int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
 int __eeh_send_failure_event(struct eeh_pe *pe);
 void eeh_remove_event(struct eeh_pe *pe, bool force);
-void eeh_handle_normal_event(struct eeh_pe *pe);
+void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe);
 void eeh_handle_special_event(void);
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index d9fcff575027..5b82e76dbd19 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -40,7 +40,7 @@ extern int rtas_setup_phb(struct pci_controller *phb);
 void eeh_addr_cache_insert_dev(struct pci_dev *dev);
 void eeh_addr_cache_rmv_dev(struct pci_dev *dev);
 struct eeh_dev *eeh_addr_cache_get_dev(unsigned long addr);
-void eeh_slot_error_detail(struct eeh_pe *pe, int severity);
+void eeh_slot_error_detail(unsigned int event_id, struct eeh_pe *pe, int severity);
 int eeh_pci_enable(struct eeh_pe *pe, int function);
 int eeh_pe_reset_full(struct eeh_pe *pe, bool include_passed);
 void eeh_save_bars(struct eeh_dev *edev);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 2c90c37524ed..148d5df0e606 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -200,7 +200,8 @@ EXPORT_SYMBOL_GPL(eeh_recovery_must_be_locked);
  * for the indicated PCI device, and puts them into a buffer
  * for RTAS error logging.
  */
-static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
+static size_t eeh_dump_dev_log(unsigned int event_id, struct eeh_dev *edev,
+			       char *buf, size_t len)
 {
 	u32 cfg;
 	int cap, i;
@@ -210,27 +211,29 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	n += scnprintf(buf+n, len-n, "%04x:%02x:%02x.%01x\n",
 			edev->pe->phb->global_number, edev->bdfn >> 8,
 			PCI_SLOT(edev->bdfn), PCI_FUNC(edev->bdfn));
-	pr_warn("EEH: of node=%04x:%02x:%02x.%01x\n",
+	pr_warn("EEH(%u): of node=%04x:%02x:%02x.%01x\n",
+		event_id,
 		edev->pe->phb->global_number, edev->bdfn >> 8,
 		PCI_SLOT(edev->bdfn), PCI_FUNC(edev->bdfn));
 
 	eeh_ops->read_config(edev, PCI_VENDOR_ID, 4, &cfg);
 	n += scnprintf(buf+n, len-n, "dev/vend:%08x\n", cfg);
-	pr_warn("EEH: PCI device/vendor: %08x\n", cfg);
+	pr_warn("EEH(%u): PCI device/vendor: %08x\n",event_id, cfg);
 
 	eeh_ops->read_config(edev, PCI_COMMAND, 4, &cfg);
 	n += scnprintf(buf+n, len-n, "cmd/stat:%x\n", cfg);
-	pr_warn("EEH: PCI cmd/status register: %08x\n", cfg);
+	pr_warn("EEH(%u): PCI cmd/status register: %08x\n", event_id, cfg);
 
 	/* Gather bridge-specific registers */
 	if (edev->mode & EEH_DEV_BRIDGE) {
 		eeh_ops->read_config(edev, PCI_SEC_STATUS, 2, &cfg);
 		n += scnprintf(buf+n, len-n, "sec stat:%x\n", cfg);
-		pr_warn("EEH: Bridge secondary status: %04x\n", cfg);
+		pr_warn("EEH(%u): Bridge secondary status: %04x\n",
+			event_id, cfg);
 
 		eeh_ops->read_config(edev, PCI_BRIDGE_CONTROL, 2, &cfg);
 		n += scnprintf(buf+n, len-n, "brdg ctl:%x\n", cfg);
-		pr_warn("EEH: Bridge control: %04x\n", cfg);
+		pr_warn("EEH(%u): Bridge control: %04x\n", event_id, cfg);
 	}
 
 	/* Dump out the PCI-X command and status regs */
@@ -238,18 +241,19 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	if (cap) {
 		eeh_ops->read_config(edev, cap, 4, &cfg);
 		n += scnprintf(buf+n, len-n, "pcix-cmd:%x\n", cfg);
-		pr_warn("EEH: PCI-X cmd: %08x\n", cfg);
+		pr_warn("EEH(%u): PCI-X cmd: %08x\n", event_id, cfg);
 
 		eeh_ops->read_config(edev, cap+4, 4, &cfg);
 		n += scnprintf(buf+n, len-n, "pcix-stat:%x\n", cfg);
-		pr_warn("EEH: PCI-X status: %08x\n", cfg);
+		pr_warn("EEH(%u): PCI-X status: %08x\n", event_id, cfg);
 	}
 
 	/* If PCI-E capable, dump PCI-E cap 10 */
 	cap = edev->pcie_cap;
 	if (cap) {
 		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
-		pr_warn("EEH: PCI-E capabilities and status follow:\n");
+		pr_warn("EEH(%u): PCI-E capabilities and status follow:\n",
+			event_id);
 
 		for (i=0; i<=8; i++) {
 			eeh_ops->read_config(edev, cap+4*i, 4, &cfg);
@@ -260,8 +264,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 					pr_warn("%s\n", buffer);
 
 				l = scnprintf(buffer, sizeof(buffer),
-					      "EEH: PCI-E %02x: %08x ",
-					      4*i, cfg);
+					      "EEH(%u): PCI-E %02x: %08x ",
+					      event_id, 4*i, cfg);
 			} else {
 				l += scnprintf(buffer+l, sizeof(buffer)-l,
 					       "%08x ", cfg);
@@ -276,7 +280,8 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	cap = edev->aer_cap;
 	if (cap) {
 		n += scnprintf(buf+n, len-n, "pci-e AER:\n");
-		pr_warn("EEH: PCI-E AER capability register set follows:\n");
+		pr_warn("EEH(%u): PCI-E AER capability register set follows:\n",
+			event_id);
 
 		for (i=0; i<=13; i++) {
 			eeh_ops->read_config(edev, cap+4*i, 4, &cfg);
@@ -301,16 +306,13 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
 	return n;
 }
 
-static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag)
+static void eeh_dump_pe_log(unsigned int event_id, struct eeh_pe *pe, size_t *plen)
 {
 	struct eeh_dev *edev, *tmp;
-	size_t *plen = flag;
 
 	eeh_pe_for_each_dev(pe, edev, tmp)
-		*plen += eeh_dump_dev_log(edev, pci_regs_buf + *plen,
+		*plen += eeh_dump_dev_log(event_id, edev, pci_regs_buf + *plen,
 					  EEH_PCI_REGS_LOG_LEN - *plen);
-
-	return NULL;
 }
 
 /**
@@ -323,9 +325,10 @@ static void *eeh_dump_pe_log(struct eeh_pe *pe, void *flag)
  * out from the config space of the corresponding PCI device, while
  * the error log is fetched through platform dependent function call.
  */
-void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
+void eeh_slot_error_detail(unsigned int event_id, struct eeh_pe *pe, int severity)
 {
 	size_t loglen = 0;
+	struct eeh_pe *tmp_pe;
 
 	/*
 	 * When the PHB is fenced or dead, it's pointless to collect
@@ -365,7 +368,8 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 			eeh_pe_restore_bars(pe);
 
 			pci_regs_buf[0] = 0;
-			eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
+			eeh_for_each_pe(pe, tmp_pe)
+				eeh_dump_pe_log(event_id, tmp_pe, &loglen);
 		}
 	}
 
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 09f181bd39a3..cdf2de0eba57 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -247,10 +247,13 @@ static void eeh_set_irq_state(struct eeh_pe *root, bool enable)
 	}
 }
 
-typedef enum pci_ers_result (*eeh_report_fn)(struct pci_dev *,
+typedef enum pci_ers_result (*eeh_report_fn)(unsigned int event_id,
+					     struct pci_dev *,
 					     struct pci_driver *);
-static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
-			       enum pci_ers_result *result)
+static void eeh_pe_report_pdev(unsigned int event_id,
+			       struct pci_dev *pdev, eeh_report_fn fn,
+			       enum pci_ers_result *result,
+			       const char *handler_name)
 {
 	struct eeh_dev *edev;
 	struct pci_driver *driver;
@@ -259,7 +262,7 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
 
 	edev = pci_dev_to_eeh_dev(pdev);
 	if (!edev) {
-		pci_info(pdev, "no EEH state for device");
+		pci_info(pdev, "EEH(%u): no EEH state for device", event_id);
 		return;
 	}
 	/* Cache some useful values before releasing the lock: */
@@ -279,19 +282,26 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
 		driver = eeh_pcid_get(pdev);
 
 		if (!driver)
-			pci_info(pdev, "no driver");
+			pci_info(pdev, "EEH(%u): no driver", event_id);
 		else if (!driver->err_handler)
-			pci_info(pdev, "driver not EEH aware");
+			pci_info(pdev, "EEH(%u): driver not EEH aware", event_id);
 		else if (late)
-			pci_info(pdev, "driver bound too late");
+			pci_info(pdev, "EEH(%u): driver bound too late", event_id);
 		else {
-			new_result = fn(pdev, driver);
+			pr_warn("EEH(%u): EVENT=HANDLER_CALL DEVICE=%04x:%02x:%02x.%x DRIVER='%s' HANDLER='%s'\n",
+				event_id, edev->controller->global_number,
+				PCI_BUSNO(edev->bdfn), PCI_SLOT(edev->bdfn),
+				PCI_FUNC(edev->bdfn), driver->name, handler_name);
+
+			new_result = fn(event_id, pdev, driver);
 			/*
 			 * It's not safe to use edev here, because the locks
 			 * have been released and devices could have changed.
 			 */
-			pci_info(pdev, "%s driver reports: '%s'",
-				 driver->name,
+			pr_warn("EEH(%u): EVENT=HANDLER_RETURN RESULT='%s'\n",
+				event_id, pci_ers_result_name(new_result));
+			pci_info(pdev, "EEH(%u): %s driver reports: %s",
+				 event_id, driver->name,
 				 pci_ers_result_name(new_result));
 			if (result)
 				*result = pci_ers_merge_result(*result,
@@ -302,8 +312,8 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
 		device_unlock(&pdev->dev);
 		eeh_recovery_lock();
 	} else {
-		pci_info(pdev, "not actionable (%d,%d,%d)", !!pdev,
-			 !removed, !passed);
+		pci_info(pdev, "EEH(%u): not actionable (%d,%d,%d)",
+			 event_id, !!pdev, !removed, !passed);
 	}
 }
 
@@ -344,12 +354,13 @@ static void pdev_cache_list_destroy(struct pci_dev **pdevs)
 	kfree(pdevs);
 }
 
-static void eeh_pe_report(const char *name, struct eeh_pe *root,
+static void eeh_pe_report(unsigned int event_id,
+			  const char *name, struct eeh_pe *root,
 			  eeh_report_fn fn, enum pci_ers_result *result)
 {
 	struct pci_dev **pdevs, **pdevp;
 
-	pr_info("EEH: Beginning: '%s'\n", name);
+	pr_info("EEH(%u): Beginning: '%s'\n", event_id, name);
 	/*
 	 * It would be convenient to continue to hold the recovery lock here
 	 * but driver callbacks can take a very long time or never return at
@@ -361,15 +372,15 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root,
 		 * NOTE! eeh_recovery_lock() is released briefly
 		 * in eeh_pe_report_pdev()
 		 */
-		eeh_pe_report_pdev(*pdevp, fn, result);
+		eeh_pe_report_pdev(event_id, *pdevp, fn, result, name);
 	}
 	pdev_cache_list_destroy(pdevs);
 
 	if (result)
-		pr_info("EEH: Finished:'%s' with aggregate recovery state:'%s'\n",
-			name, pci_ers_result_name(*result));
+		pr_info("EEH(%u): Finished:'%s' with aggregate recovery state:'%s'\n",
+			event_id, name, pci_ers_result_name(*result));
 	else
-		pr_info("EEH: Finished:'%s'", name);
+		pr_info("EEH(%u): Finished:'%s'",event_id, name);
 }
 
 /**
@@ -379,7 +390,8 @@ static void eeh_pe_report(const char *name, struct eeh_pe *root,
  *
  * Report an EEH error to each device driver.
  */
-static enum pci_ers_result eeh_report_error(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_error(unsigned int event_id,
+					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
@@ -410,12 +422,14 @@ static enum pci_ers_result eeh_report_error(struct pci_dev *pdev,
  * Tells each device driver that IO ports, MMIO and config space I/O
  * are now enabled.
  */
-static enum pci_ers_result eeh_report_mmio_enabled(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_mmio_enabled(unsigned int event_id,
+						   struct pci_dev *pdev,
 						   struct pci_driver *driver)
 {
 	if (!driver->err_handler->mmio_enabled)
 		return PCI_ERS_RESULT_NONE;
-	pci_info(pdev, "Invoking %s->mmio_enabled()", driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->mmio_enabled()",
+		 event_id, driver->name);
 	return driver->err_handler->mmio_enabled(pdev);
 }
 
@@ -430,7 +444,8 @@ static enum pci_ers_result eeh_report_mmio_enabled(struct pci_dev *pdev,
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static enum pci_ers_result eeh_report_reset(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_reset(unsigned int event_id,
+					    struct pci_dev *pdev,
 					    struct pci_driver *driver)
 {
 	struct eeh_dev *edev;
@@ -443,7 +458,8 @@ static enum pci_ers_result eeh_report_reset(struct pci_dev *pdev,
 		return PCI_ERS_RESULT_NONE;
 	}
 	eeh_serialize_unlock(flags);
-	pci_info(pdev, "Invoking %s->slot_reset()", driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->slot_reset()",
+		 event_id, driver->name);
 	return driver->err_handler->slot_reset(pdev);
 }
 
@@ -483,7 +499,8 @@ static void eeh_dev_restore_state(struct eeh_dev *edev, void *userdata)
  * could resume so that the device driver can do some initialization
  * to make the recovered device work again.
  */
-static enum pci_ers_result eeh_report_resume(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_resume(unsigned int event_id,
+					     struct pci_dev *pdev,
 					     struct pci_driver *driver)
 {
 	struct eeh_dev *edev;
@@ -497,7 +514,8 @@ static enum pci_ers_result eeh_report_resume(struct pci_dev *pdev,
 	}
 	eeh_serialize_unlock(flags);
 
-	pci_info(pdev, "Invoking %s->resume()", driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->resume()",
+		 event_id, driver->name);
 	driver->err_handler->resume(pdev);
 
 	pci_uevent_ers(pdev, PCI_ERS_RESULT_RECOVERED);
@@ -518,7 +536,8 @@ static enum pci_ers_result eeh_report_resume(struct pci_dev *pdev,
  * This informs the device driver that the device is permanently
  * dead, and that no further recovery attempts will be made on it.
  */
-static enum pci_ers_result eeh_report_failure(struct pci_dev *pdev,
+static enum pci_ers_result eeh_report_failure(unsigned int event_id,
+					      struct pci_dev *pdev,
 					      struct pci_driver *driver)
 {
 	enum pci_ers_result rc;
@@ -526,8 +545,8 @@ static enum pci_ers_result eeh_report_failure(struct pci_dev *pdev,
 	if (!driver->err_handler->error_detected)
 		return PCI_ERS_RESULT_NONE;
 
-	pci_info(pdev, "Invoking %s->error_detected(permanent failure)",
-		 driver->name);
+	pci_info(pdev, "EEH(%u): Invoking %s->error_detected(permanent failure)",
+		 event_id, driver->name);
 	rc = driver->err_handler->error_detected(pdev,
 						 pci_channel_io_perm_failure);
 
@@ -574,7 +593,8 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
 	return NULL;
 }
 
-static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
+static void eeh_rmv_device(unsigned int event_id,
+			   struct pci_dev *pdev, void *userdata)
 {
 	unsigned long flags;
 	struct eeh_dev *edev;
@@ -583,8 +603,8 @@ static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
 
 	edev = pci_dev_to_eeh_dev(pdev);
 	if (!edev) {
-		pci_warn(pdev, "EEH: Device removed during processing (#%d)\n",
-			 __LINE__);
+		pci_warn(pdev, "EEH(%u): Device removed during processing (#%d)\n",
+			 event_id, __LINE__);
 		return;
 	}
 
@@ -613,7 +633,8 @@ static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
 	}
 
 	/* Remove it from PCI subsystem */
-	pci_info(pdev, "EEH: Removing device without EEH sensitive driver\n");
+	pci_info(pdev, "EEH(%u): Removing device without EEH sensitive driver\n",
+		 event_id);
 	edev->mode |= EEH_DEV_DISCONNECTED;
 	if (rmv_data)
 		rmv_data->removed_dev_count++;
@@ -731,7 +752,8 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
  * During the reset, udev might be invoked because those affected
  * PCI devices will be removed and then added.
  */
-static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
+static int eeh_reset_device(unsigned int event_id,
+			    struct eeh_pe *pe, struct pci_bus *bus,
 			    struct eeh_rmv_data *rmv_data,
 			    bool driver_eeh_aware)
 {
@@ -765,7 +787,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 		pdevs = pdev_cache_list_create(pe);
 		/* eeh_rmv_device() may re-acquire the recovery lock */
 		for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
-			eeh_rmv_device(*pdevp, rmv_data);
+			eeh_rmv_device(event_id, *pdevp, rmv_data);
 		pdev_cache_list_destroy(pdevs);
 
 	} else {
@@ -814,8 +836,8 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 	 * potentially weird things happen.
 	 */
 	if (!driver_eeh_aware || rmv_data->removed_dev_count) {
-		pr_info("EEH: Sleep 5s ahead of %s hotplug\n",
-			(driver_eeh_aware ? "partial" : "complete"));
+		pr_info("EEH(%u): Sleep 5s ahead of %s hotplug\n",
+			event_id, (driver_eeh_aware ? "partial" : "complete"));
 		eeh_recovery_unlock();
 		ssleep(5);
 		eeh_recovery_lock();
@@ -972,7 +994,7 @@ static void eeh_clear_slot_attention(struct pci_dev *pdev)
  * drivers (which cause a second set of hotplug events to go out to
  * userspace).
  */
-void eeh_handle_normal_event(struct eeh_pe *pe)
+void eeh_handle_normal_event(unsigned int event_id, struct eeh_pe *pe)
 {
 	struct pci_bus *bus;
 	struct eeh_dev *edev, *tmp;
@@ -987,8 +1009,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	eeh_recovery_lock();
 	bus = eeh_pe_bus_get(pe);
 	if (!bus) {
-		pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
-			__func__, pe->phb->global_number, pe->addr);
+		pr_err("EEH(%u): %s: Cannot find PCI bus for PHB#%x-PE#%x\n",
+			event_id, __func__, pe->phb->global_number, pe->addr);
 		eeh_recovery_unlock();
 		return;
 	}
@@ -1008,22 +1030,27 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 				devices++;
 
 	if (!devices) {
-		pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n",
-			pe->phb->global_number, pe->addr);
+		pr_debug("EEH(%u): Frozen PHB#%x-PE#%x is empty!\n",
+			event_id, pe->phb->global_number, pe->addr);
 		goto out; /* nothing to recover */
 	}
 
+	pe->freeze_count++;
+	pr_warn("EEH(%u): EVENT=RECOVERY_START TYPE=%s PHB=%#x PE=%#x COUNT=%d\n",
+		event_id, ((pe->type & EEH_PE_PHB) ? "PHB" : "PE"),
+		pe->phb->global_number, pe->addr, pe->freeze_count);
+
 	/* Log the event */
 	if (pe->type & EEH_PE_PHB) {
-		pr_err("EEH: Recovering PHB#%x, location: %s\n",
-			pe->phb->global_number, eeh_pe_loc_get(pe));
+		pr_err("EEH(%u): Recovering PHB#%x, location: %s\n",
+			event_id, pe->phb->global_number, eeh_pe_loc_get(pe));
 	} else {
 		struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb);
 
-		pr_err("EEH: Recovering PHB#%x-PE#%x\n",
-		       pe->phb->global_number, pe->addr);
-		pr_err("EEH: PE location: %s, PHB location: %s\n",
-		       eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
+		pr_err("EEH(%u): Recovering PHB#%x-PE#%x\n",
+		       event_id, pe->phb->global_number, pe->addr);
+		pr_err("EEH(%u): PE location: %s, PHB location: %s\n",
+		       event_id, eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
 	}
 
 #ifdef CONFIG_STACKTRACE
@@ -1035,13 +1062,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		void **ptrs = (void **) pe->stack_trace;
 		int i;
 
-		pr_err("EEH: Frozen PHB#%x-PE#%x detected\n",
-		       pe->phb->global_number, pe->addr);
+		pr_err("EEH(%u): Frozen PHB#%x-PE#%x detected\n",
+		       event_id, pe->phb->global_number, pe->addr);
 
 		/* FIXME: Use the same format as dump_stack() */
-		pr_err("EEH: Call Trace:\n");
+		pr_err("EEH(%u): Call Trace:\n", event_id);
 		for (i = 0; i < pe->trace_entries; i++)
-			pr_err("EEH: [%pK] %pS\n", ptrs[i], ptrs[i]);
+			pr_err("EEH(%u): [%pK] %pS\n", event_id, ptrs[i], ptrs[i]);
 
 		pe->trace_entries = 0;
 	}
@@ -1052,10 +1079,9 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			edev->mode &= ~EEH_DEV_NO_HANDLER;
 
 	eeh_pe_update_time_stamp(pe);
-	pe->freeze_count++;
 	if (pe->freeze_count > eeh_max_freezes) {
-		pr_err("EEH: PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n",
-		       pe->phb->global_number, pe->addr,
+		pr_err("EEH(%u): PHB#%x-PE#%x has failed %d times in the last hour and has been permanently disabled.\n",
+		       event_id, pe->phb->global_number, pe->addr,
 		       pe->freeze_count);
 
 		goto recover_failed;
@@ -1071,12 +1097,12 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * the error. Override the result if necessary to have partially
 	 * hotplug for this case.
 	 */
-	pr_warn("EEH: This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n",
-		pe->freeze_count, eeh_max_freezes);
-	pr_info("EEH: Notify device drivers to shutdown\n");
+	pr_warn("EEH(%u): This PCI device has failed %d times in the last hour and will be permanently disabled after %d failures.\n",
+		event_id, pe->freeze_count, eeh_max_freezes);
+	pr_info("EEH(%u): Notify device drivers to shutdown\n", event_id);
 	eeh_set_channel_state(pe, pci_channel_io_frozen);
 	eeh_set_irq_state(pe, false);
-	eeh_pe_report("error_detected(IO frozen)", pe,
+	eeh_pe_report(event_id, "error_detected(IO frozen)", pe,
 		      eeh_report_error, &result);
 	if (result == PCI_ERS_RESULT_DISCONNECT)
 		goto recover_failed;
@@ -1093,7 +1119,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 */
 	rc = eeh_wait_state(pe, MAX_WAIT_FOR_RECOVERY * 1000, true);
 	if (rc < 0 || rc == EEH_STATE_NOT_SUPPORT) {
-		pr_warn("EEH: Permanent failure\n");
+		pr_warn("EEH(%u): Permanent failure\n", event_id);
 		goto recover_failed;
 	}
 
@@ -1101,16 +1127,16 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * don't post the error log until after all dev drivers
 	 * have been informed.
 	 */
-	pr_info("EEH: Collect temporary log\n");
-	eeh_slot_error_detail(pe, EEH_LOG_TEMP);
+	pr_info("EEH(%u): Collect temporary log\n", event_id);
+	eeh_slot_error_detail(event_id, pe, EEH_LOG_TEMP);
 
 	/* If all device drivers were EEH-unaware, then shut
 	 * down all of the device drivers, and hope they
 	 * go down willingly, without panicing the system.
 	 */
 	if (result == PCI_ERS_RESULT_NONE) {
-		pr_info("EEH: Reset with hotplug activity\n");
-		rc = eeh_reset_device(pe, bus, NULL, false);
+		pr_info("EEH(%u): Reset with hotplug activity\n", event_id);
+		rc = eeh_reset_device(event_id, pe, bus, NULL, false);
 		if (rc) {
 			pr_warn("%s: Unable to reset, err=%d\n", __func__, rc);
 			goto recover_failed;
@@ -1119,7 +1145,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* If all devices reported they can proceed, then re-enable MMIO */
 	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
-		pr_info("EEH: Enable I/O for affected devices\n");
+		pr_info("EEH(%u): Enable I/O for affected devices\n", event_id);
 		rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
 		if (rc < 0)
 			goto recover_failed;
@@ -1127,13 +1153,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		if (rc) {
 			result = PCI_ERS_RESULT_NEED_RESET;
 		} else {
-			pr_info("EEH: Notify device drivers to resume I/O\n");
-			eeh_pe_report("mmio_enabled", pe,
+			pr_info("EEH(%u): Notify device drivers to resume I/O\n", event_id);
+			eeh_pe_report(event_id, "mmio_enabled", pe,
 				      eeh_report_mmio_enabled, &result);
 		}
 	}
 	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
-		pr_info("EEH: Enabled DMA for affected devices\n");
+		pr_info("EEH(%u): Enabled DMA for affected devices\n", event_id);
 		rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA);
 		if (rc < 0)
 			goto recover_failed;
@@ -1153,8 +1179,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	/* If any device called out for a reset, then reset the slot */
 	if (result == PCI_ERS_RESULT_NEED_RESET) {
-		pr_info("EEH: Reset without hotplug activity\n");
-		rc = eeh_reset_device(pe, bus, &rmv_data, true);
+		pr_info("EEH(%u): Reset without hotplug activity\n", event_id);
+		rc = eeh_reset_device(event_id, pe, bus, &rmv_data, true);
 		if (rc) {
 			pr_warn("%s: Cannot reset, err=%d\n", __func__, rc);
 			goto recover_failed;
@@ -1163,7 +1189,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		result = PCI_ERS_RESULT_NONE;
 		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_set_irq_state(pe, true);
-		eeh_pe_report("slot_reset", pe, eeh_report_reset,
+		eeh_pe_report(event_id, "slot_reset", pe, eeh_report_reset,
 			      &result);
 	}
 
@@ -1180,10 +1206,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		}
 
 		/* Tell all device drivers that they can resume operations */
-		pr_info("EEH: Notify device driver to resume\n");
+		pr_info("EEH(%u): Notify device driver to resume\n", event_id);
 		eeh_set_channel_state(pe, pci_channel_io_normal);
 		eeh_set_irq_state(pe, true);
-		eeh_pe_report("resume", pe, eeh_report_resume, NULL);
+		eeh_pe_report(event_id, "resume", pe, eeh_report_resume, NULL);
 		eeh_for_each_pe(pe, tmp_pe) {
 			eeh_pe_for_each_dev(tmp_pe, edev, tmp) {
 				edev->mode &= ~EEH_DEV_NO_HANDLER;
@@ -1191,7 +1217,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 			}
 		}
 
-		pr_info("EEH: Recovery successful.\n");
+		pr_info("EEH(%u): Recovery successful.\n", event_id);
 		goto out;
 	}
 
@@ -1201,17 +1227,18 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	 * are due to poorly seated PCI cards. Only 10% or so are
 	 * due to actual, failed cards.
 	 */
-	pr_err("EEH: Unable to recover from failure from PHB#%x-PE#%x.\n"
+	pr_err("EEH(%u): Unable to recover from failure from PHB#%x-PE#%x.\n"
 		"Please try reseating or replacing it\n",
-		pe->phb->global_number, pe->addr);
+		event_id, pe->phb->global_number, pe->addr);
 
-	eeh_slot_error_detail(pe, EEH_LOG_PERM);
+	eeh_slot_error_detail(event_id, pe, EEH_LOG_PERM);
 
 	/* Notify all devices that they're about to go down. */
 	eeh_set_irq_state(pe, false);
-	eeh_pe_report("error_detected(permanent failure)", pe,
+	eeh_pe_report(event_id, "error_detected(permanent failure)", pe,
 		      eeh_report_failure, NULL);
 	eeh_set_channel_state(pe, pci_channel_io_perm_failure);
+	pr_crit("EEH(%u): EVENT=RECOVERY_END RESULT=failure\n", event_id);
 
 	/* Mark the PE to be removed permanently */
 	eeh_pe_state_mark(pe, EEH_PE_REMOVED);
@@ -1224,7 +1251,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	if (pe->type & EEH_PE_VF) {
 		pdevs = pdev_cache_list_create(pe);
 		for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
-			eeh_rmv_device(*pdevp, NULL);
+			eeh_rmv_device(event_id, *pdevp, NULL);
 		pdev_cache_list_destroy(pdevs);
 		eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 	} else {
@@ -1239,6 +1266,8 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 		return;
 	}
 
+	pr_info("EEH(%u): EVENT=RECOVERY_END RESULT=success\n", event_id);
+
 out:
 	/*
 	 * Clean up any PEs without devices. While marked as EEH_PE_RECOVERYING
@@ -1325,7 +1354,7 @@ void eeh_handle_special_event(void)
 		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
 		    rc == EEH_NEXT_ERR_FENCED_PHB) {
 			eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
-			eeh_handle_normal_event(pe);
+			eeh_handle_normal_event(0, pe);
 		} else {
 			eeh_for_each_pe(pe, tmp_pe)
 				eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
@@ -1333,7 +1362,7 @@ void eeh_handle_special_event(void)
 
 			/* Notify all devices to be down */
 			eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
-			eeh_pe_report(
+			eeh_pe_report(0,
 				"error_detected(permanent failure)", pe,
 				eeh_report_failure, NULL);
 			eeh_set_channel_state(pe, pci_channel_io_perm_failure);
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index c23a454af08a..6c205a77581f 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -26,6 +26,9 @@ static DEFINE_SPINLOCK(eeh_eventlist_lock);
 static DECLARE_COMPLETION(eeh_eventlist_event);
 static LIST_HEAD(eeh_eventlist);
 
+/* Event ID 0 is reserved for special events */
+static atomic_t eeh_event_id = ATOMIC_INIT(1);
+
 /**
  * eeh_event_handler - Dispatch EEH events.
  * @dummy - unused
@@ -59,7 +62,7 @@ static int eeh_event_handler(void * dummy)
 
 		/* We might have event without binding PE */
 		if (event->pe)
-			eeh_handle_normal_event(event->pe);
+			eeh_handle_normal_event(event->id, event->pe);
 		else
 			eeh_handle_special_event();
 
@@ -110,6 +113,13 @@ int __eeh_send_failure_event(struct eeh_pe *pe)
 		return -ENOMEM;
 	}
 	event->pe = pe;
+	do {
+		/* Skip over the special value (0) */
+		event->id = (unsigned int)atomic_inc_return(&eeh_event_id);
+	} while (!event->id);
+
+	pr_err("EEH(%u): EVENT=ERROR_DETECTED PHB=%#x PE=%#x\n",
+	      event->id, pe->phb->global_number, pe->addr);
 
 	/*
 	 * Mark the PE as recovering before inserting it in the queue.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a4889c9d4055..093e8f560ea4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1732,7 +1732,7 @@ static inline bool movable_only_nodes(nodemask_t *nodes)
 	((1UL << (PFN_SECTION_SHIFT - pageblock_order)) * NR_PAGEBLOCK_BITS)
 
 #if (MAX_ORDER + PAGE_SHIFT) > SECTION_SIZE_BITS
-#error Allocator MAX_ORDER exceeds SECTION_SIZE
+//#error Allocator MAX_ORDER exceeds SECTION_SIZE
 #endif
 
 static inline unsigned long pfn_to_section_nr(unsigned long pfn)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-06-13  2:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-16  3:27 [RFC 0/3] Asynchronous EEH recovery Ganesh Goudar
2022-08-16  3:27 ` [RFC 1/3] powerpc/eeh: Synchronization for safety Ganesh Goudar
2022-08-16  3:27 ` [RFC 2/3] powerpc/eeh: Provide a unique ID for each EEH recovery Ganesh Goudar
2022-08-16  3:27 ` [RFC 3/3] powerpc/eeh: Asynchronous recovery Ganesh Goudar
2022-08-17  7:16 ` [RFC 0/3] Asynchronous EEH recovery Oliver O'Halloran
2022-09-02  0:19 ` Jason Gunthorpe
2022-09-15 10:15   ` Ganesh
2023-01-25 14:04     ` Christophe Leroy
2023-06-13  1:43 Ganesh Goudar
2023-06-13  1:43 ` [RFC 2/3] powerpc/eeh: Provide a unique ID for each " Ganesh Goudar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.