LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes
@ 2019-02-08  3:07 Oliver O'Halloran
  2019-02-08  3:07 ` [PATCH 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Oliver O'Halloran @ 2019-02-08  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

There's no need for the custom getter/setter functions, so we should remove
them in favour of using the generic one. While we're here, change the type
of eeh_max_freezes to uint32_t and print the value in decimal rather than
hex, because printing it in hex makes no sense.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/eeh.h |  2 +-
 arch/powerpc/kernel/eeh.c      | 21 +++------------------
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 8b596d096ebe..c003628441cc 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -219,7 +219,7 @@ struct eeh_ops {
 };
 
 extern int eeh_subsystem_flags;
-extern int eeh_max_freezes;
+extern uint32_t eeh_max_freezes;
 extern struct eeh_ops *eeh_ops;
 extern raw_spinlock_t confirm_error_lock;
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ae05203eb4de..f6e65375a8de 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -109,7 +109,7 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
  * frozen count in last hour exceeds this limit, the PE will
  * be forced to be offline permanently.
  */
-int eeh_max_freezes = 5;
+uint32_t eeh_max_freezes = 5;
 
 /* Platform dependent EEH operations */
 struct eeh_ops *eeh_ops = NULL;
@@ -1796,22 +1796,8 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
 	return 0;
 }
 
-static int eeh_freeze_dbgfs_set(void *data, u64 val)
-{
-	eeh_max_freezes = val;
-	return 0;
-}
-
-static int eeh_freeze_dbgfs_get(void *data, u64 *val)
-{
-	*val = eeh_max_freezes;
-	return 0;
-}
-
 DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
 			 eeh_enable_dbgfs_set, "0x%llx\n");
-DEFINE_DEBUGFS_ATTRIBUTE(eeh_freeze_dbgfs_ops, eeh_freeze_dbgfs_get,
-			 eeh_freeze_dbgfs_set, "0x%llx\n");
 #endif
 
 static int __init eeh_init_proc(void)
@@ -1822,9 +1808,8 @@ static int __init eeh_init_proc(void)
 		debugfs_create_file_unsafe("eeh_enable", 0600,
 					   powerpc_debugfs_root, NULL,
 					   &eeh_enable_dbgfs_ops);
-		debugfs_create_file_unsafe("eeh_max_freezes", 0600,
-					   powerpc_debugfs_root, NULL,
-					   &eeh_freeze_dbgfs_ops);
+		debugfs_create_u32("eeh_max_freezes", 0600,
+				powerpc_debugfs_root, &eeh_max_freezes);
 #endif
 	}
 
-- 
2.20.1



* [PATCH 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove
  2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
@ 2019-02-08  3:07 ` Oliver O'Halloran
  2019-02-08  3:07 ` [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Oliver O'Halloran @ 2019-02-08  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

The EEH address cache is used to map a physical MMIO address back to a PCI
device. It's useful to know when it's being manipulated, but currently this
requires recompiling with #define DEBUG set. This is pointless since we
have dynamic_debug nowadays, so remove the #ifdef guard and add a pr_debug()
for the remove case too.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/kernel/eeh_cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 201943d54a6e..b2c320e0fcef 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -157,10 +157,8 @@ eeh_addr_cache_insert(struct pci_dev *dev, resource_size_t alo,
 	piar->pcidev = dev;
 	piar->flags = flags;
 
-#ifdef DEBUG
 	pr_debug("PIAR: insert range=[%pap:%pap] dev=%s\n",
 		 &alo, &ahi, pci_name(dev));
-#endif
 
 	rb_link_node(&piar->rb_node, parent, p);
 	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
@@ -240,6 +238,8 @@ static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
 		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
 
 		if (piar->pcidev == dev) {
+			pr_debug("PIAR: remove range=[%pap:%pap] dev=%s\n",
+				 &piar->addr_lo, &piar->addr_hi, pci_name(dev));
 			rb_erase(n, &pci_io_addr_cache_root.rb_root);
 			kfree(piar);
 			goto restart;
-- 
2.20.1
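
For readers less familiar with the difference between a compile-time
guard and a runtime switch, here is a rough userspace sketch. It is
only an analogy and is not how dynamic_debug is implemented; the names
cache_debug() and cache_debug_enabled are made up for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical runtime switch; plays the role of enabling a debug site. */
static bool cache_debug_enabled;

/*
 * Print only when the switch is on. Returns the number of characters
 * written, or 0 when the message was suppressed.
 */
static int cache_debug(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);

	if (!cache_debug_enabled)
		return 0;

	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);

	return n;
}
```

The point is that the call site is always compiled in, so turning the
message on does not need a rebuild.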



* [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache
  2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
  2019-02-08  3:07 ` [PATCH 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
@ 2019-02-08  3:07 ` Oliver O'Halloran
  2019-02-08  9:00   ` kbuild test robot
  2019-02-08  9:47   ` Michael Ellerman
  2019-02-08  3:07 ` [PATCH 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print() Oliver O'Halloran
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Oliver O'Halloran @ 2019-02-08  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

Add a debugfs file that can be read to view the contents of the EEH
address cache. This is pretty similar to the existing
eeh_addr_cache_print() function, but that function is intended for debugging
issues inside the kernel: it's #ifdef'ed out by default and writes
into the kernel log.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/eeh.h  |  3 +++
 arch/powerpc/kernel/eeh.c       |  2 +-
 arch/powerpc/kernel/eeh_cache.c | 34 +++++++++++++++++++++++++++++----
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c003628441cc..fc21b6e78e91 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -460,6 +460,9 @@ static inline void eeh_readsl(const volatile void __iomem *addr, void * buf,
 		eeh_check_failure(addr);
 }
 
+
+void eeh_cache_debugfs_init(void);
+
 #endif /* CONFIG_PPC64 */
 #endif /* __KERNEL__ */
 #endif /* _POWERPC_EEH_H */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f6e65375a8de..d1f0bdf41fac 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1810,7 +1810,7 @@ static int __init eeh_init_proc(void)
 					   &eeh_enable_dbgfs_ops);
 		debugfs_create_u32("eeh_max_freezes", 0600,
 				powerpc_debugfs_root, &eeh_max_freezes);
-#endif
+		eeh_cache_debugfs_init();
 	}
 
 	return 0;
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index b2c320e0fcef..dba421a577e7 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
 #include <asm/pci-bridge.h>
+#include <asm/debugfs.h>
 #include <asm/ppc-pci.h>
 
 
@@ -298,9 +299,34 @@ void eeh_addr_cache_build(void)
 		eeh_addr_cache_insert_dev(dev);
 		eeh_sysfs_add_device(dev);
 	}
+}
 
-#ifdef DEBUG
-	/* Verify tree built up above, echo back the list of addrs. */
-	eeh_addr_cache_print(&pci_io_addr_cache_root);
-#endif
+static int eeh_addr_cache_show(struct seq_file *s, void *v)
+{
+	struct rb_node *n = rb_first(&pci_io_addr_cache_root.rb_root);
+	struct pci_io_addr_range *piar;
+	int cnt = 0;
+
+	spin_lock(&pci_io_addr_cache_root.piar_lock);
+	while (n) {
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
+		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
+		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
+
+		n = rb_next(n);
+		cnt++;
+	}
+	spin_unlock(&pci_io_addr_cache_root.piar_lock);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(eeh_addr_cache);
+
+void eeh_cache_debugfs_init(void)
+{
+	debugfs_create_file_unsafe("eeh_address_cache", 0400,
+			powerpc_debugfs_root, NULL,
+			&eeh_addr_cache_fops);
 }
-- 
2.20.1
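
As a rough illustration of the line format the new debugfs file
produces, here is a userspace sketch that walks a sorted array instead
of the rb-tree and formats into a caller-supplied buffer. The struct
and function names are hypothetical, and plain 0x%llx stands in for
the kernel-only %pap specifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct pci_io_addr_range. */
struct addr_range {
	unsigned long long lo;
	unsigned long long hi;
	int is_io;		/* stands in for (flags & IORESOURCE_IO) */
	const char *name;	/* stands in for pci_name(piar->pcidev) */
};

/*
 * Write one line per range into buf, in the same shape as the
 * seq_printf() call in the patch. Returns the number of bytes
 * written, or -1 if buf is too small.
 */
static int dump_ranges(char *buf, size_t len,
		       const struct addr_range *r, size_t nr)
{
	size_t off = 0;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < nr; i++) {
		int n = snprintf(buf + off, len - off,
				 "%s addr range %3d [0x%llx-0x%llx]: %s\n",
				 r[i].is_io ? "i/o" : "mem", (int)i,
				 r[i].lo, r[i].hi, r[i].name);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	return (int)off;
}
```

The index, advance and termination all live in the for statement, which
keeps the loop body down to the formatting call.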



* [PATCH 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print()
  2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
  2019-02-08  3:07 ` [PATCH 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
  2019-02-08  3:07 ` [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
@ 2019-02-08  3:07 ` Oliver O'Halloran
  2019-02-08  3:08 ` [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain() Oliver O'Halloran
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Oliver O'Halloran @ 2019-02-08  3:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

To use this function at all, #define DEBUG needs to be set in eeh_cache.c.
Printing at pr_debug is probably not all that useful, since it adds the
additional hurdle of requiring you to enable the debug print if
dynamic_debug is in use, so this patch bumps it to pr_info.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/kernel/eeh_cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index dba421a577e7..7de278f88e3d 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -114,7 +114,7 @@ static void eeh_addr_cache_print(struct pci_io_addr_cache *cache)
 	while (n) {
 		struct pci_io_addr_range *piar;
 		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-		pr_debug("PCI: %s addr range %d [%pap-%pap]: %s\n",
+		pr_info("PCI: %s addr range %d [%pap-%pap]: %s\n",
 		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
 		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
 		cnt++;
-- 
2.20.1



* [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain()
  2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (2 preceding siblings ...)
  2019-02-08  3:07 ` [PATCH 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print() Oliver O'Halloran
@ 2019-02-08  3:08 ` Oliver O'Halloran
  2019-02-08  9:57   ` Michael Ellerman
  2019-02-08  3:08 ` [PATCH 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Oliver O'Halloran @ 2019-02-08  3:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

Add a helper to find the pci_controller structure based on the domain
number / phb id.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/pci-bridge.h |  2 ++
 arch/powerpc/kernel/pci-common.c      | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index aee4fcc24990..149053b7f481 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -274,6 +274,8 @@ extern int pcibios_map_io_space(struct pci_bus *bus);
 extern struct pci_controller *pci_find_hose_for_OF_device(
 			struct device_node* node);
 
+extern struct pci_controller *pci_find_hose_for_domain(uint32_t domain_nr);
+
 /* Fill up host controller resources from the OF node */
 extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			struct device_node *dev, int primary);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 88e4f69a09e5..958f38c698da 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -357,6 +357,17 @@ struct pci_controller* pci_find_hose_for_OF_device(struct device_node* node)
 	return NULL;
 }
 
+struct pci_controller *pci_find_hose_for_domain(uint32_t domain_nr)
+{
+	struct pci_controller *hose;
+
+	list_for_each_entry(hose, &hose_list, list_node)
+		if (hose->global_number == domain_nr)
+			return hose;
+
+	return NULL;
+}
+
 /*
  * Reads the interrupt pin to determine if interrupt is use by card.
  * If the interrupt is used, then gets the interrupt line from the
-- 
2.20.1
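
The helper is a linear search keyed on the domain number. A minimal
userspace sketch of the same idea follows, assuming a plain singly
linked list rather than the kernel's list_head; struct controller,
add_controller() and find_controller_for_domain() are hypothetical
names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct pci_controller. */
struct controller {
	uint32_t global_number;		/* PCI domain number / PHB id */
	struct controller *next;	/* simple list, not list_head */
};

/* Prepend a controller to the list. */
static void add_controller(struct controller **head, struct controller *c)
{
	assert(head != NULL && c != NULL);

	c->next = *head;
	*head = c;
}

/* Return the controller with a matching domain number, or NULL. */
static struct controller *find_controller_for_domain(struct controller *head,
						      uint32_t domain_nr)
{
	struct controller *c;

	for (c = head; c; c = c->next)
		if (c->global_number == domain_nr)
			return c;

	return NULL;
}
```

Callers have to handle the NULL return, just as the debugfs write
handler later in the series does with -ENODEV.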



* [PATCH 6/7] powerpc/eeh: Allow disabling recovery
  2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (3 preceding siblings ...)
  2019-02-08  3:08 ` [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain() Oliver O'Halloran
@ 2019-02-08  3:08 ` Oliver O'Halloran
  2019-02-08  9:58   ` Michael Ellerman
  2019-02-08  3:08 ` [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
  2019-02-08  9:38 ` [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Michael Ellerman
  6 siblings, 1 reply; 21+ messages in thread
From: Oliver O'Halloran @ 2019-02-08  3:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

Currently when we detect an error we automatically invoke the EEH recovery
handler. This can be annoying when debugging EEH problems, or when working
on EEH itself, so this patch adds a debugfs knob that will prevent a
recovery event from being queued up when an issue is detected.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/eeh.h  |  1 +
 arch/powerpc/kernel/eeh.c       | 11 +++++++++++
 arch/powerpc/kernel/eeh_event.c |  9 +++++++++
 3 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index fc21b6e78e91..6f6721561302 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -220,6 +220,7 @@ struct eeh_ops {
 
 extern int eeh_subsystem_flags;
 extern uint32_t eeh_max_freezes;
+extern bool eeh_debugfs_no_recover;
 extern struct eeh_ops *eeh_ops;
 extern raw_spinlock_t confirm_error_lock;
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d1f0bdf41fac..92809b137e39 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -111,6 +111,13 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
  */
 uint32_t eeh_max_freezes = 5;
 
+/*
+ * Controls whether a recovery event should be scheduled when an
+ * isolated device is discovered. This is only really useful for
+ * debugging problems with the EEH core.
+ */
+bool eeh_debugfs_no_recover;
+
 /* Platform dependent EEH operations */
 struct eeh_ops *eeh_ops = NULL;
 
@@ -1810,7 +1817,11 @@ static int __init eeh_init_proc(void)
 					   &eeh_enable_dbgfs_ops);
 		debugfs_create_u32("eeh_max_freezes", 0600,
 				powerpc_debugfs_root, &eeh_max_freezes);
+		debugfs_create_bool("eeh_disable_recovery", 0600,
+				powerpc_debugfs_root,
+				&eeh_debugfs_no_recover);
 		eeh_cache_debugfs_init();
+#endif
 	}
 
 	return 0;
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 227e57f980df..19837798bb1d 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -126,6 +126,15 @@ int eeh_send_failure_event(struct eeh_pe *pe)
 	unsigned long flags;
 	struct eeh_event *event;
 
+	/*
+	 * If we've manually supressed recovery events via debugfs
+	 * then just drop it on the floor.
+	 */
+	if (eeh_debugfs_no_recover) {
+		pr_err("EEH: Event dropped due to no_recover setting\n");
+		return 0;
+	}
+
 	event = kzalloc(sizeof(*event), GFP_ATOMIC);
 	if (!event) {
 		pr_err("EEH: out of memory, event not handled\n");
-- 
2.20.1
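
The behaviour of the knob can be sketched in userspace as below. This
is an illustration under assumptions, not the kernel code: a counter
stands in for the real event queue, and struct event_queue, no_recover
and send_failure_event() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the EEH event queue. */
struct event_queue {
	int pending;
};

/* Stands in for eeh_debugfs_no_recover. */
static bool no_recover;

/*
 * Queue a failure event unless recovery has been suppressed. A dropped
 * event still returns 0, as in the patch, so the caller cannot tell a
 * dropped event from a queued one by the return value alone.
 */
static int send_failure_event(struct event_queue *q)
{
	assert(q != NULL);

	if (no_recover) {
		fprintf(stderr, "event dropped due to no_recover setting\n");
		return 0;
	}

	q->pending++;
	return 0;
}
```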



* [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs
  2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (4 preceding siblings ...)
  2019-02-08  3:08 ` [PATCH 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
@ 2019-02-08  3:08 ` Oliver O'Halloran
  2019-02-08 12:31   ` Michael Ellerman
  2019-02-13  4:37   ` Sam Bobroff
  2019-02-08  9:38 ` [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Michael Ellerman
  6 siblings, 2 replies; 21+ messages in thread
From: Oliver O'Halloran @ 2019-02-08  3:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

This patch adds a debugfs interface to force scheduling a recovery event.
This can be used to recover a specific PE or schedule a "special" recovery
event that checks for errors at the PHB level.
To force a recovery of a normal PE, use:

 echo '<#phb>:<#pe>' > /sys/kernel/debug/powerpc/eeh_force_recover

To force a scan for broken PHBs:

 echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/eeh_event.h |  1 +
 arch/powerpc/kernel/eeh.c            | 60 ++++++++++++++++++++++++++++
 arch/powerpc/kernel/eeh_event.c      | 25 +++++++-----
 3 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
index 9884e872686f..6d0412b846ac 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -33,6 +33,7 @@ struct eeh_event {
 
 int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
+int __eeh_send_failure_event(struct eeh_pe *pe);
 void eeh_remove_event(struct eeh_pe *pe, bool force);
 void eeh_handle_normal_event(struct eeh_pe *pe);
 void eeh_handle_special_event(void);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 92809b137e39..63b91a4918c9 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1805,6 +1805,63 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
 
 DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
 			 eeh_enable_dbgfs_set, "0x%llx\n");
+
+static ssize_t eeh_force_recover_write(struct file *filp,
+				const char __user *user_buf,
+				size_t count, loff_t *ppos)
+{
+	struct pci_controller *hose;
+	uint32_t phbid, pe_no;
+	struct eeh_pe *pe;
+	char buf[20];
+	int ret;
+
+	ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
+	if (!ret)
+		return -EFAULT;
+
+	/*
+	 * When PE is NULL the event is a "special" event. Rather than
+	 * recovering a specific PE it forces the EEH core to scan for failed
+	 * PHBs and recovers each. This needs to be done before any device
+	 * recoveries can occur.
+	 */
+	if (!strncmp(buf, "null", 4)) {
+		pr_err("sending failure event\n");
+		__eeh_send_failure_event(NULL);
+		return count;
+	}
+
+	ret = sscanf(buf, "%x:%x", &phbid, &pe_no);
+	if (ret != 2)
+		return -EINVAL;
+
+	hose = pci_find_hose_for_domain(phbid);
+	if (!hose)
+		return -ENODEV;
+
+	/* Retrieve PE */
+	pe = eeh_pe_get(hose, pe_no, 0);
+	if (!pe)
+		return -ENODEV;
+
+	/*
+	 * We don't do any state checking here since the detection
+	 * process is async to the recovery process. The recovery
+	 * thread *should* not break even if we schedule a recovery
+	 * from an odd state (e.g. PE removed, or recovery of a
+	 * non-isolated PE)
+	 */
+	__eeh_send_failure_event(pe);
+
+	return ret < 0 ? ret : count;
+}
+
+static const struct file_operations eeh_force_recover_fops = {
+	.open	= simple_open,
+	.llseek	= no_llseek,
+	.write	= eeh_force_recover_write,
+};
 #endif
 
 static int __init eeh_init_proc(void)
@@ -1820,6 +1877,9 @@ static int __init eeh_init_proc(void)
 		debugfs_create_bool("eeh_disable_recovery", 0600,
 				powerpc_debugfs_root,
 				&eeh_debugfs_no_recover);
+		debugfs_create_file_unsafe("eeh_force_recover", 0600,
+				powerpc_debugfs_root, NULL,
+				&eeh_force_recover_fops);
 		eeh_cache_debugfs_init();
 #endif
 	}
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 19837798bb1d..539aca055d70 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -121,20 +121,11 @@ int eeh_event_init(void)
  * the actual event will be delivered in a normal context
  * (from a workqueue).
  */
-int eeh_send_failure_event(struct eeh_pe *pe)
+int __eeh_send_failure_event(struct eeh_pe *pe)
 {
 	unsigned long flags;
 	struct eeh_event *event;
 
-	/*
-	 * If we've manually supressed recovery events via debugfs
-	 * then just drop it on the floor.
-	 */
-	if (eeh_debugfs_no_recover) {
-		pr_err("EEH: Event dropped due to no_recover setting\n");
-		return 0;
-	}
-
 	event = kzalloc(sizeof(*event), GFP_ATOMIC);
 	if (!event) {
 		pr_err("EEH: out of memory, event not handled\n");
@@ -153,6 +144,20 @@ int eeh_send_failure_event(struct eeh_pe *pe)
 	return 0;
 }
 
+int eeh_send_failure_event(struct eeh_pe *pe)
+{
+	/*
+	 * If we've manually supressed recovery events via debugfs
+	 * then just drop it on the floor.
+	 */
+	if (eeh_debugfs_no_recover) {
+		pr_err("EEH: Event dropped due to no_recover setting\n");
+		return 0;
+	}
+
+	return __eeh_send_failure_event(pe);
+}
+
 /**
  * eeh_remove_event - Remove EEH event from the queue
  * @pe: Event binding to the PE
-- 
2.20.1
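
The input handling in the write handler can be sketched in userspace
as follows. This is a hedged illustration, not the kernel code: the
enum and parse_force_recover() are hypothetical, memcpy() stands in
for simple_write_to_buffer(), and over-long input is rejected here
rather than truncated. The field order (PHB id first, then PE number)
follows the sscanf() call in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for the sketch. */
enum recover_req {
	REQ_INVALID = -1,
	REQ_SCAN_PHBS = 0,	/* the "null" special event */
	REQ_ONE_PE = 1,		/* a specific <phb>:<pe> pair */
};

static enum recover_req parse_force_recover(const char *in, size_t count,
					     uint32_t *phbid, uint32_t *pe_no)
{
	char buf[20];
	unsigned int phb, pe;

	assert(in != NULL && phbid != NULL && pe_no != NULL);

	/* Data from write() is not NUL-terminated, so leave room for one. */
	if (count == 0 || count >= sizeof(buf))
		return REQ_INVALID;
	memcpy(buf, in, count);
	buf[count] = '\0';

	if (!strncmp(buf, "null", 4))
		return REQ_SCAN_PHBS;

	/* Both fields are parsed as hex: PHB id, then PE number. */
	if (sscanf(buf, "%x:%x", &phb, &pe) != 2)
		return REQ_INVALID;

	*phbid = phb;
	*pe_no = pe;

	return REQ_ONE_PE;
}
```

Terminating the buffer explicitly matters because strncmp() and
sscanf() both expect a C string.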



* Re: [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache
  2019-02-08  3:07 ` [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
@ 2019-02-08  9:00   ` kbuild test robot
  2019-02-08  9:47   ` Michael Ellerman
  1 sibling, 0 replies; 21+ messages in thread
From: kbuild test robot @ 2019-02-08  9:00 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: Oliver O'Halloran, linuxppc-dev, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2235 bytes --]

Hi Oliver,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.0-rc4 next-20190207]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Oliver-O-Halloran/powerpc-eeh-Use-debugfs_create_u32-for-eeh_max_freezes/20190208-145918
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.2.0 make.cross ARCH=powerpc 

Note: the linux-review/Oliver-O-Halloran/powerpc-eeh-Use-debugfs_create_u32-for-eeh_max_freezes/20190208-145918 HEAD a8dcd44575537e3e67a44fe3139b273a64c0f620 builds fine.
      It only hurts bisectability.

All errors (new ones prefixed by >>):

>> arch/powerpc/kernel/eeh.c:1840: error: unterminated #ifdef
    #ifdef CONFIG_DEBUG_FS
    

vim +1840 arch/powerpc/kernel/eeh.c

7f52a526f arch/powerpc/kernel/eeh.c Gavin Shan        2014-04-24  1835  
^1da177e4 arch/ppc64/kernel/eeh.c   Linus Torvalds    2005-04-16  1836  static int __init eeh_init_proc(void)
^1da177e4 arch/ppc64/kernel/eeh.c   Linus Torvalds    2005-04-16  1837  {
7f52a526f arch/powerpc/kernel/eeh.c Gavin Shan        2014-04-24  1838  	if (machine_is(pseries) || machine_is(powernv)) {
3f3942aca arch/powerpc/kernel/eeh.c Christoph Hellwig 2018-05-15  1839  		proc_create_single("powerpc/eeh", 0, NULL, proc_eeh_show);
7f52a526f arch/powerpc/kernel/eeh.c Gavin Shan        2014-04-24 @1840  #ifdef CONFIG_DEBUG_FS

:::::: The code at line 1840 was first introduced by commit
:::::: 7f52a526f64c69c913f0027fbf43821ff0b3a7d7 powerpc/eeh: Allow to disable EEH

:::::: TO: Gavin Shan <gwshan@linux.vnet.ibm.com>
:::::: CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 59719 bytes --]


* Re: [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes
  2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (5 preceding siblings ...)
  2019-02-08  3:08 ` [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
@ 2019-02-08  9:38 ` Michael Ellerman
  6 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2019-02-08  9:38 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: Oliver O'Halloran

Oliver O'Halloran <oohall@gmail.com> writes:

> There's no need for the custom getter/setter functions, so we should remove
> them in favour of using the generic one. While we're here, change the type
> of eeh_max_freezes to uint32_t and print the value in decimal rather than

Please use kernel types, ie. u32.

Looks fine otherwise.

cheers

> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  arch/powerpc/include/asm/eeh.h |  2 +-
>  arch/powerpc/kernel/eeh.c      | 21 +++------------------
>  2 files changed, 4 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 8b596d096ebe..c003628441cc 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -219,7 +219,7 @@ struct eeh_ops {
>  };
>  
>  extern int eeh_subsystem_flags;
> -extern int eeh_max_freezes;
> +extern uint32_t eeh_max_freezes;
>  extern struct eeh_ops *eeh_ops;
>  extern raw_spinlock_t confirm_error_lock;
>  
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index ae05203eb4de..f6e65375a8de 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -109,7 +109,7 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
>   * frozen count in last hour exceeds this limit, the PE will
>   * be forced to be offline permanently.
>   */
> -int eeh_max_freezes = 5;
> +uint32_t eeh_max_freezes = 5;
>  
>  /* Platform dependent EEH operations */
>  struct eeh_ops *eeh_ops = NULL;
> @@ -1796,22 +1796,8 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
>  	return 0;
>  }
>  
> -static int eeh_freeze_dbgfs_set(void *data, u64 val)
> -{
> -	eeh_max_freezes = val;
> -	return 0;
> -}
> -
> -static int eeh_freeze_dbgfs_get(void *data, u64 *val)
> -{
> -	*val = eeh_max_freezes;
> -	return 0;
> -}
> -
>  DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
>  			 eeh_enable_dbgfs_set, "0x%llx\n");
> -DEFINE_DEBUGFS_ATTRIBUTE(eeh_freeze_dbgfs_ops, eeh_freeze_dbgfs_get,
> -			 eeh_freeze_dbgfs_set, "0x%llx\n");
>  #endif
>  
>  static int __init eeh_init_proc(void)
> @@ -1822,9 +1808,8 @@ static int __init eeh_init_proc(void)
>  		debugfs_create_file_unsafe("eeh_enable", 0600,
>  					   powerpc_debugfs_root, NULL,
>  					   &eeh_enable_dbgfs_ops);
> -		debugfs_create_file_unsafe("eeh_max_freezes", 0600,
> -					   powerpc_debugfs_root, NULL,
> -					   &eeh_freeze_dbgfs_ops);
> +		debugfs_create_u32("eeh_max_freezes", 0600,
> +				powerpc_debugfs_root, &eeh_max_freezes);
>  #endif
>  	}
>  
> -- 
> 2.20.1


* Re: [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache
  2019-02-08  3:07 ` [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
  2019-02-08  9:00   ` kbuild test robot
@ 2019-02-08  9:47   ` Michael Ellerman
  2019-02-08 13:14     ` Oliver
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Ellerman @ 2019-02-08  9:47 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: Oliver O'Halloran

Oliver O'Halloran <oohall@gmail.com> writes:
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index f6e65375a8de..d1f0bdf41fac 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1810,7 +1810,7 @@ static int __init eeh_init_proc(void)
>  					   &eeh_enable_dbgfs_ops);
>  		debugfs_create_u32("eeh_max_freezes", 0600,
>  				powerpc_debugfs_root, &eeh_max_freezes);
> -#endif
> +		eeh_cache_debugfs_init();

Oops :)

> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index b2c320e0fcef..dba421a577e7 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -298,9 +299,34 @@ void eeh_addr_cache_build(void)
>  		eeh_addr_cache_insert_dev(dev);
>  		eeh_sysfs_add_device(dev);
>  	}
> +}
>  
> -#ifdef DEBUG
> -	/* Verify tree built up above, echo back the list of addrs. */
> -	eeh_addr_cache_print(&pci_io_addr_cache_root);
> -#endif
> +static int eeh_addr_cache_show(struct seq_file *s, void *v)
> +{
> +	struct rb_node *n = rb_first(&pci_io_addr_cache_root.rb_root);
> +	struct pci_io_addr_range *piar;
> +	int cnt = 0;
> +
> +	spin_lock(&pci_io_addr_cache_root.piar_lock);
> +	while (n) {
> +		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> +
> +		seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
> +		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
> +		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
> +
> +		n = rb_next(n);
> +		cnt++;
> +	}

You can write that as a for loop can't you?

	struct rb_node *n;
	int i = 0;

	for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n), i++) {
		piar = rb_entry(n, struct pci_io_addr_range, rb_node);

		seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", i,
		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
	}

cheers


* Re: [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain()
  2019-02-08  3:08 ` [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain() Oliver O'Halloran
@ 2019-02-08  9:57   ` Michael Ellerman
  2019-02-08 12:53     ` Oliver
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Ellerman @ 2019-02-08  9:57 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: Oliver O'Halloran

Oliver O'Halloran <oohall@gmail.com> writes:
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index aee4fcc24990..149053b7f481 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -274,6 +274,8 @@ extern int pcibios_map_io_space(struct pci_bus *bus);
>  extern struct pci_controller *pci_find_hose_for_OF_device(
>  			struct device_node* node);
>  
> +extern struct pci_controller *pci_find_hose_for_domain(uint32_t domain_nr);

I know we use "hose" a lot in the PCI code, but it's a stupid name. Can
we not introduce new usages?

It returns a pci_controller so pci_find_controller_for_domain() ?

cheers


* Re: [PATCH 6/7] powerpc/eeh: Allow disabling recovery
  2019-02-08  3:08 ` [PATCH 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
@ 2019-02-08  9:58   ` Michael Ellerman
  2019-02-08 12:52     ` Oliver
  0 siblings, 1 reply; 21+ messages in thread
From: Michael Ellerman @ 2019-02-08  9:58 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: Oliver O'Halloran

Oliver O'Halloran <oohall@gmail.com> writes:

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index d1f0bdf41fac..92809b137e39 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1810,7 +1817,11 @@ static int __init eeh_init_proc(void)
>  					   &eeh_enable_dbgfs_ops);
>  		debugfs_create_u32("eeh_max_freezes", 0600,
>  				powerpc_debugfs_root, &eeh_max_freezes);
> +		debugfs_create_bool("eeh_disable_recovery", 0600,
> +				powerpc_debugfs_root,
> +				&eeh_debugfs_no_recover);
>  		eeh_cache_debugfs_init();
> +#endif

There's that endif.

When I'm rebasing and think I might have broken bisectability, I
build every commit with:

  https://github.com/mpe/misc-scripts/blob/master/git/for-each-commit


cheers


* Re: [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs
  2019-02-08  3:08 ` [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
@ 2019-02-08 12:31   ` Michael Ellerman
  2019-02-08 12:50     ` Oliver
  2019-02-13  4:37   ` Sam Bobroff
  1 sibling, 1 reply; 21+ messages in thread
From: Michael Ellerman @ 2019-02-08 12:31 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: Oliver O'Halloran

Oliver O'Halloran <oohall@gmail.com> writes:

> This patch adds a debugfs interface to force scheduling a recovery event.
> This can be used to recover a specific PE or schedule a "special" recovery
> event that checks for errors at the PHB level.
> To force a recovery of a normal PE, use:
>
>  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover
>
> To force a scan of broken PHBs:
>
>  echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover

Why 'null', that seems like an odd choice. Why not "all" or "scan" or
something?

Also it oopsed on me:

[   76.323164] sending failure event
[   76.323421] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[   76.323655] Faulting instruction address: 0x00000000
[   76.323856] Oops: Kernel access of bad area, sig: 11 [#1]
[   76.323946] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   76.324295] Modules linked in: vmx_crypto kvm binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
[   76.324669] CPU: 2 PID: 97 Comm: eehd Not tainted 5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517 #435
[   76.325054] NIP:  0000000000000000 LR: c0000000000451f8 CTR: 0000000000000000
[   76.325402] REGS: c0000000fec779c0 TRAP: 0400   Not tainted  (5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517)
[   76.325768] MSR:  800000014280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24000482  XER: 20000000
[   76.326243] CFAR: c000000000002528 IRQMASK: 0 
[   76.326243] GPR00: c000000000045edc c0000000fec77c50 c000000001574000 c0000000fec77cb0 
[   76.326243] GPR04: 0000000000000000 00177d76e3e321bc 00177d76e4293a1f 5deadbeef0000100 
[   76.326243] GPR08: 5deadbeef0000200 0000000000000000 0000000000000000 00177d76e3e3216b 
[   76.326243] GPR12: 0000000000000000 c00000003fffdf00 c0000000001438a8 c0000000fe211700 
[   76.326243] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[   76.326243] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000e814e8 
[   76.326243] GPR24: c000000000e814c0 5deadbeef0000100 c000000001622480 0000000100000000 
[   76.326243] GPR28: c000000001413310 c0000000016244e0 c0000000014132f0 c0000001f84246a0 
[   76.329073] NIP [0000000000000000]           (null)
[   76.329285] LR [c0000000000451f8] eeh_handle_special_event+0x78/0x348
[   76.329602] Call Trace:
[   76.329762] [c0000000fec77c50] [c0000000fec77ce0] 0xc0000000fec77ce0 (unreliable)
[   76.330113] [c0000000fec77d00] [c000000000045edc] eeh_event_handler+0x10c/0x1c0
[   76.330464] [c0000000fec77db0] [c000000000143a4c] kthread+0x1ac/0x1c0
[   76.330681] [c0000000fec77e20] [c00000000000bdc4] ret_from_kernel_thread+0x5c/0x78
[   76.331026] Instruction dump:
[   76.331197] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[   76.331550] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[   76.331803] ---[ end trace dc73d37df5bb9ecd ]---


cheers


* Re: [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs
  2019-02-08 12:31   ` Michael Ellerman
@ 2019-02-08 12:50     ` Oliver
  2019-02-11  2:24       ` Michael Ellerman
  0 siblings, 1 reply; 21+ messages in thread
From: Oliver @ 2019-02-08 12:50 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Fri, Feb 8, 2019 at 11:32 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Oliver O'Halloran <oohall@gmail.com> writes:
>
> > This patch adds a debugfs interface to force scheduling a recovery event.
> > This can be used to recover a specific PE or schedule a "special" recovery
> > event that checks for errors at the PHB level.
> > To force a recovery of a normal PE, use:
> >
> >  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover
> >
> > To force a scan of broken PHBs:
> >
> >  echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover
>
> Why 'null', that seems like an odd choice. Why not "all" or "scan" or
> something?

When an EEH event occurs the bit that is sent to the event handler is
just a pointer to the struct eeh_pe. If the pointer is null it's then
treated as a special event which indicates a PHB failure. I agree it's
a bit dumb, but I don't really expect anyone except me or samb to use
this interface so I went with what would make sense to someone
familiar with the internals.
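
A minimal userspace sketch of the convention described above, not the kernel code itself: the handler receives only a PE pointer, and a NULL pointer is what marks the event as "special". `struct pe_model`, `enum event_kind` and `classify_event()` are hypothetical names used purely for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct eeh_pe. */
struct pe_model {
	unsigned int phb_id;
	unsigned int pe_no;
};

enum event_kind {
	EVENT_SPECIAL,	/* no PE given: scan for failed PHBs */
	EVENT_NORMAL	/* recover the PE that was passed in */
};

/*
 * The event carries nothing but the PE pointer, so NULL is the only
 * way to tell the two kinds of event apart.
 */
static enum event_kind classify_event(const struct pe_model *pe)
{
	return pe ? EVENT_NORMAL : EVENT_SPECIAL;
}
```

This is why 'null' reads naturally to someone who knows the internals, and also why it tells an outside user nothing about what the write will do.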

>
> Also it oopsed on me:
>
> [   76.323164] sending failure event
> [   76.323421] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
> [   76.323655] Faulting instruction address: 0x00000000
> [   76.323856] Oops: Kernel access of bad area, sig: 11 [#1]
> [   76.323946] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [   76.324295] Modules linked in: vmx_crypto kvm binfmt_misc ip_tables x_tables autofs4 crc32c_vpmsum
> [   76.324669] CPU: 2 PID: 97 Comm: eehd Not tainted 5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517 #435
> [   76.325054] NIP:  0000000000000000 LR: c0000000000451f8 CTR: 0000000000000000
> [   76.325402] REGS: c0000000fec779c0 TRAP: 0400   Not tainted  (5.0.0-rc2-gcc-8.2.0-00080-gfacc0d1d9517)
> [   76.325768] MSR:  800000014280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24000482  XER: 20000000
> [   76.326243] CFAR: c000000000002528 IRQMASK: 0
> [   76.326243] GPR00: c000000000045edc c0000000fec77c50 c000000001574000 c0000000fec77cb0
> [   76.326243] GPR04: 0000000000000000 00177d76e3e321bc 00177d76e4293a1f 5deadbeef0000100
> [   76.326243] GPR08: 5deadbeef0000200 0000000000000000 0000000000000000 00177d76e3e3216b
> [   76.326243] GPR12: 0000000000000000 c00000003fffdf00 c0000000001438a8 c0000000fe211700
> [   76.326243] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [   76.326243] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000e814e8
> [   76.326243] GPR24: c000000000e814c0 5deadbeef0000100 c000000001622480 0000000100000000
> [   76.326243] GPR28: c000000001413310 c0000000016244e0 c0000000014132f0 c0000001f84246a0
> [   76.329073] NIP [0000000000000000]           (null)
> [   76.329285] LR [c0000000000451f8] eeh_handle_special_event+0x78/0x348
> [   76.329602] Call Trace:
> [   76.329762] [c0000000fec77c50] [c0000000fec77ce0] 0xc0000000fec77ce0 (unreliable)
> [   76.330113] [c0000000fec77d00] [c000000000045edc] eeh_event_handler+0x10c/0x1c0
> [   76.330464] [c0000000fec77db0] [c000000000143a4c] kthread+0x1ac/0x1c0
> [   76.330681] [c0000000fec77e20] [c00000000000bdc4] ret_from_kernel_thread+0x5c/0x78
> [   76.331026] Instruction dump:
> [   76.331197] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [   76.331550] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [   76.331803] ---[ end trace dc73d37df5bb9ecd ]---
>
>
> cheers

This is probably a side effect of special events being a
PowerNV-specific concept. For a pseries guest there should never be any
PHB PEs since (hardware) PHBs are a concept that is hidden from a guest.
It's like EEH is poorly thought out and full of layering violations or
something...
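
A hedged userspace sketch of one way to avoid this kind of oops. It assumes that the NIP of 0 with LR in eeh_handle_special_event() comes from calling a platform callback that is simply not set on pseries; the trace suggests that but does not prove it. `struct platform_ops_model`, `handle_special_event()` and the two sample tables are hypothetical models, not the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>

struct pe_model;	/* opaque, hypothetical stand-in for struct eeh_pe */

/* Hypothetical, trimmed-down model of a per-platform ops table. */
struct platform_ops_model {
	const char *name;
	/* Optional: only set by platforms that can see PHB-level errors. */
	int (*next_error)(struct pe_model **pe);
};

/* Sample callback for the host-style table: reports nothing pending. */
static int no_error_pending(struct pe_model **pe)
{
	*pe = NULL;
	return 0;
}

static const struct platform_ops_model host_ops = {
	.name = "host",
	.next_error = no_error_pending,
};

/* Guest-style table: the optional callback is left unset (NULL). */
static const struct platform_ops_model guest_ops = {
	.name = "guest",
};

/* Refuse the request instead of jumping through a NULL pointer. */
static int handle_special_event(const struct platform_ops_model *ops)
{
	struct pe_model *pe = NULL;

	if (!ops || !ops->next_error)
		return -EOPNOTSUPP;

	return ops->next_error(&pe);
}
```

With a check like this, a forced special event on a platform without the callback would fail with an error code rather than take down the recovery thread.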


* Re: [PATCH 6/7] powerpc/eeh: Allow disabling recovery
  2019-02-08  9:58   ` Michael Ellerman
@ 2019-02-08 12:52     ` Oliver
  0 siblings, 0 replies; 21+ messages in thread
From: Oliver @ 2019-02-08 12:52 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Fri, Feb 8, 2019 at 8:58 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Oliver O'Halloran <oohall@gmail.com> writes:
>
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index d1f0bdf41fac..92809b137e39 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1810,7 +1817,11 @@ static int __init eeh_init_proc(void)
> >                                          &eeh_enable_dbgfs_ops);
> >               debugfs_create_u32("eeh_max_freezes", 0600,
> >                               powerpc_debugfs_root, &eeh_max_freezes);
> > +             debugfs_create_bool("eeh_disable_recovery", 0600,
> > +                             powerpc_debugfs_root,
> > +                             &eeh_debugfs_no_recover);
> >               eeh_cache_debugfs_init();
> > +#endif
>
> There's that endif.

Bleh

>
> When I'm rebasing and think I might have broken bisectability, I
> build every commit with:
>
>   https://github.com/mpe/misc-scripts/blob/master/git/for-each-commit

Thanks, I have something similar for skiboot but never got around to
porting it to the kernel.


* Re: [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain()
  2019-02-08  9:57   ` Michael Ellerman
@ 2019-02-08 12:53     ` Oliver
  0 siblings, 0 replies; 21+ messages in thread
From: Oliver @ 2019-02-08 12:53 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Fri, Feb 8, 2019 at 8:57 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Oliver O'Halloran <oohall@gmail.com> writes:
> > diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> > index aee4fcc24990..149053b7f481 100644
> > --- a/arch/powerpc/include/asm/pci-bridge.h
> > +++ b/arch/powerpc/include/asm/pci-bridge.h
> > @@ -274,6 +274,8 @@ extern int pcibios_map_io_space(struct pci_bus *bus);
> >  extern struct pci_controller *pci_find_hose_for_OF_device(
> >                       struct device_node* node);
> >
> > +extern struct pci_controller *pci_find_hose_for_domain(uint32_t domain_nr);
>
> I know we use "hose" a lot in the PCI code, but it's a stupid name. Can
> we not introduce new usages?

I was tempted to call it pci_find_horse_for_domain(), but neigh.

>
> It returns a pci_controller so pci_find_controller_for_domain()?

ok

>
> cheers


* Re: [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache
  2019-02-08  9:47   ` Michael Ellerman
@ 2019-02-08 13:14     ` Oliver
  2019-02-11  2:16       ` Michael Ellerman
  0 siblings, 1 reply; 21+ messages in thread
From: Oliver @ 2019-02-08 13:14 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

On Fri, Feb 8, 2019 at 8:47 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Oliver O'Halloran <oohall@gmail.com> writes:
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index f6e65375a8de..d1f0bdf41fac 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1810,7 +1810,7 @@ static int __init eeh_init_proc(void)
> >                                          &eeh_enable_dbgfs_ops);
> >               debugfs_create_u32("eeh_max_freezes", 0600,
> >                               powerpc_debugfs_root, &eeh_max_freezes);
> > -#endif
> > +             eeh_cache_debugfs_init();
>
> Oops :)

Yeah :(

> > diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> > index b2c320e0fcef..dba421a577e7 100644
> > --- a/arch/powerpc/kernel/eeh_cache.c
> > +++ b/arch/powerpc/kernel/eeh_cache.c
> > @@ -298,9 +299,34 @@ void eeh_addr_cache_build(void)
> >               eeh_addr_cache_insert_dev(dev);
> >               eeh_sysfs_add_device(dev);
> >       }
> > +}
> >
> > -#ifdef DEBUG
> > -     /* Verify tree built up above, echo back the list of addrs. */
> > -     eeh_addr_cache_print(&pci_io_addr_cache_root);
> > -#endif
> > +static int eeh_addr_cache_show(struct seq_file *s, void *v)
> > +{
> > +     struct rb_node *n = rb_first(&pci_io_addr_cache_root.rb_root);
> > +     struct pci_io_addr_range *piar;
> > +     int cnt = 0;
> > +
> > +     spin_lock(&pci_io_addr_cache_root.piar_lock);
> > +     while (n) {
> > +             piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> > +
> > +             seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
> > +                    (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
> > +                    &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
> > +
> > +             n = rb_next(n);
> > +             cnt++;
> > +     }
>
> You can write that as a for loop can't you?
>
>         struct rb_node *n;
>         int i = 0;
>
>         for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n), i++) {

IIRC I did try that, but it's too long. 85 cols wide according to my editor.

>                 piar = rb_entry(n, struct pci_io_addr_range, rb_node);
>
>                 seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
>                        (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", i,
>                        &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
>         }
>
> cheers


* Re: [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache
  2019-02-08 13:14     ` Oliver
@ 2019-02-11  2:16       ` Michael Ellerman
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2019-02-11  2:16 UTC (permalink / raw)
  To: Oliver; +Cc: linuxppc-dev

Oliver <oohall@gmail.com> writes:
> On Fri, Feb 8, 2019 at 8:47 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>> Oliver O'Halloran <oohall@gmail.com> writes:
>> > diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
>> > index b2c320e0fcef..dba421a577e7 100644
>> > --- a/arch/powerpc/kernel/eeh_cache.c
>> > +++ b/arch/powerpc/kernel/eeh_cache.c
>> > @@ -298,9 +299,34 @@ void eeh_addr_cache_build(void)
>> >               eeh_addr_cache_insert_dev(dev);
>> >               eeh_sysfs_add_device(dev);
>> >       }
>> > +}
>> >
>> > -#ifdef DEBUG
>> > -     /* Verify tree built up above, echo back the list of addrs. */
>> > -     eeh_addr_cache_print(&pci_io_addr_cache_root);
>> > -#endif
>> > +static int eeh_addr_cache_show(struct seq_file *s, void *v)
>> > +{
>> > +     struct rb_node *n = rb_first(&pci_io_addr_cache_root.rb_root);
>> > +     struct pci_io_addr_range *piar;
>> > +     int cnt = 0;
>> > +
>> > +     spin_lock(&pci_io_addr_cache_root.piar_lock);
>> > +     while (n) {
>> > +             piar = rb_entry(n, struct pci_io_addr_range, rb_node);
>> > +
>> > +             seq_printf(s, "%s addr range %3d [%pap-%pap]: %s\n",
>> > +                    (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
>> > +                    &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
>> > +
>> > +             n = rb_next(n);
>> > +             cnt++;
>> > +     }
>>
>> You can write that as a for loop can't you?
>>
>>         struct rb_node *n;
>>         int i = 0;
>>
>>         for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n), i++) {
>
> IIRC I did try that, but it's too long. 85 cols wide according to my editor.

Don't care.

Long lines aren't inherently evil. They have some downsides, but so do
the other options. In a case like this 85 columns would be preferable to
splitting the line or writing it as a while loop.

cheers


* Re: [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs
  2019-02-08 12:50     ` Oliver
@ 2019-02-11  2:24       ` Michael Ellerman
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2019-02-11  2:24 UTC (permalink / raw)
  To: Oliver; +Cc: linuxppc-dev

Oliver <oohall@gmail.com> writes:
> On Fri, Feb 8, 2019 at 11:32 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>> Oliver O'Halloran <oohall@gmail.com> writes:
>>
>> > This patch adds a debugfs interface to force scheduling a recovery event.
>> > This can be used to recover a specific PE or schedule a "special" recovery
>> > event that checks for errors at the PHB level.
>> > To force a recovery of a normal PE, use:
>> >
>> >  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover
>> >
>> > To force a scan of broken PHBs:
>> >
>> >  echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover
>>
>> Why 'null', that seems like an odd choice. Why not "all" or "scan" or
>> something?
>
> When an EEH event occurs the bit that is sent to the event handler is
> just a pointer to the struct eeh_pe. If the pointer is null it's then
> treated as a special event which indicates a PHB failure. I agree it's
> a bit dumb, but I don't really expect anyone except me or samb to use
> this interface so I went with what would make sense to someone
> familiar with the internals.

Yeah, nah. Let's use something that's at least vaguely self documenting
so people like me can have some clue what it's doing.
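
A rough userspace sketch of what a more self-documenting parser could look like. The keyword "scan" is only one of the names floated earlier in the thread and is an assumption here, as are `parse_request()` and the types around it. The field order (PHB first, then PE) follows the sscanf() call in the patch. The sketch also NUL-terminates the copied input before handing it to the string functions.

```c
#include <stdio.h>
#include <string.h>

enum request_kind { REQ_INVALID, REQ_SCAN_PHBS, REQ_RECOVER_PE };

/* Hypothetical result type; not part of the patch. */
struct recover_request {
	enum request_kind kind;
	unsigned int phb_id;
	unsigned int pe_no;
};

static struct recover_request parse_request(const char *data, size_t len)
{
	struct recover_request req = { REQ_INVALID, 0, 0 };
	unsigned int phb_id, pe_no;
	char buf[20];

	/* Reject empty input and anything that would not fit with a NUL. */
	if (len == 0 || len >= sizeof(buf))
		return req;

	memcpy(buf, data, len);
	buf[len] = '\0';

	/* Drop the trailing newline that echo appends. */
	buf[strcspn(buf, "\n")] = '\0';

	/* Assumed keyword: ask for a scan of all PHBs. */
	if (strcmp(buf, "scan") == 0) {
		req.kind = REQ_SCAN_PHBS;
		return req;
	}

	/* Otherwise expect "<phb>:<pe>", both in hex. */
	if (sscanf(buf, "%x:%x", &phb_id, &pe_no) == 2) {
		req.kind = REQ_RECOVER_PE;
		req.phb_id = phb_id;
		req.pe_no = pe_no;
	}

	return req;
}
```

Keeping the parsing separate from the event dispatch means the accepted spellings can be changed later without touching the recovery path.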

cheers


* Re: [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs
  2019-02-08  3:08 ` [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
  2019-02-08 12:31   ` Michael Ellerman
@ 2019-02-13  4:37   ` Sam Bobroff
  2019-02-13  5:18     ` Oliver
  1 sibling, 1 reply; 21+ messages in thread
From: Sam Bobroff @ 2019-02-13  4:37 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev


On Fri, Feb 08, 2019 at 02:08:02PM +1100, Oliver O'Halloran wrote:
> This patch adds a debugfs interface to force scheduling a recovery event.
> This can be used to recover a specific PE or schedule a "special" recovery
> event that checks for errors at the PHB level.
> To force a recovery of a normal PE, use:
> 
>  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover

How about placing these in the per-PHB debugfs directory?
echo '<#pe>' > /sys/kernel/debug/powerpc/PCI0000/eeh_force_recover

> To force a scan of broken PHBs:
> 
>  echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover

And keep this one where it is, and just trigger with any write (or a '1'
or whatever)?

Sam.

> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  arch/powerpc/include/asm/eeh_event.h |  1 +
>  arch/powerpc/kernel/eeh.c            | 60 ++++++++++++++++++++++++++++
>  arch/powerpc/kernel/eeh_event.c      | 25 +++++++-----
>  3 files changed, 76 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
> index 9884e872686f..6d0412b846ac 100644
> --- a/arch/powerpc/include/asm/eeh_event.h
> +++ b/arch/powerpc/include/asm/eeh_event.h
> @@ -33,6 +33,7 @@ struct eeh_event {
>  
>  int eeh_event_init(void);
>  int eeh_send_failure_event(struct eeh_pe *pe);
> +int __eeh_send_failure_event(struct eeh_pe *pe);
>  void eeh_remove_event(struct eeh_pe *pe, bool force);
>  void eeh_handle_normal_event(struct eeh_pe *pe);
>  void eeh_handle_special_event(void);
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 92809b137e39..63b91a4918c9 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1805,6 +1805,63 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
>  
>  DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
>  			 eeh_enable_dbgfs_set, "0x%llx\n");
> +
> +static ssize_t eeh_force_recover_write(struct file *filp,
> +				const char __user *user_buf,
> +				size_t count, loff_t *ppos)
> +{
> +	struct pci_controller *hose;
> +	uint32_t phbid, pe_no;
> +	struct eeh_pe *pe;
> +	char buf[20];
> +	int ret;
> +
> +	ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
> +	if (!ret)
> +		return -EFAULT;
> +
> +	/*
> +	 * When PE is NULL the event is a "special" event. Rather than
> +	 * recovering a specific PE it forces the EEH core to scan for failed
> +	 * PHBs and recovers each. This needs to be done before any device
> +	 * recoveries can occur.
> +	 */
> +	if (!strncmp(buf, "null", 4)) {
> +		pr_err("sending failure event\n");
> +		__eeh_send_failure_event(NULL);
> +		return count;
> +	}
> +
> +	ret = sscanf(buf, "%x:%x", &phbid, &pe_no);
> +	if (ret != 2)
> +		return -EINVAL;
> +
> +	hose = pci_find_hose_for_domain(phbid);
> +	if (!hose)
> +		return -ENODEV;
> +
> +	/* Retrieve PE */
> +	pe = eeh_pe_get(hose, pe_no, 0);
> +	if (!pe)
> +		return -ENODEV;
> +
> +	/*
> +	 * We don't do any state checking here since the detection
> +	 * process is async to the recovery process. The recovery
> +	 * thread *should* not break even if we schedule a recovery
> +	 * from an odd state (e.g. PE removed, or recovery of a
> +	 * non-isolated PE)
> +	 */
> +	__eeh_send_failure_event(pe);
> +
> +	return ret < 0 ? ret : count;
> +}
> +
> +static const struct file_operations eeh_force_recover_fops = {
> +	.open	= simple_open,
> +	.llseek	= no_llseek,
> +	.write	= eeh_force_recover_write,
> +};
>  #endif
>  
>  static int __init eeh_init_proc(void)
> @@ -1820,6 +1877,9 @@ static int __init eeh_init_proc(void)
>  		debugfs_create_bool("eeh_disable_recovery", 0600,
>  				powerpc_debugfs_root,
>  				&eeh_debugfs_no_recover);
> +		debugfs_create_file_unsafe("eeh_force_recover", 0600,
> +				powerpc_debugfs_root, NULL,
> +				&eeh_force_recover_fops);
>  		eeh_cache_debugfs_init();
>  #endif
>  	}
> diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> index 19837798bb1d..539aca055d70 100644
> --- a/arch/powerpc/kernel/eeh_event.c
> +++ b/arch/powerpc/kernel/eeh_event.c
> @@ -121,20 +121,11 @@ int eeh_event_init(void)
>   * the actual event will be delivered in a normal context
>   * (from a workqueue).
>   */
> -int eeh_send_failure_event(struct eeh_pe *pe)
> +int __eeh_send_failure_event(struct eeh_pe *pe)
>  {
>  	unsigned long flags;
>  	struct eeh_event *event;
>  
> -	/*
> -	 * If we've manually supressed recovery events via debugfs
> -	 * then just drop it on the floor.
> -	 */
> -	if (eeh_debugfs_no_recover) {
> -		pr_err("EEH: Event dropped due to no_recover setting\n");
> -		return 0;
> -	}
> -
>  	event = kzalloc(sizeof(*event), GFP_ATOMIC);
>  	if (!event) {
>  		pr_err("EEH: out of memory, event not handled\n");
> @@ -153,6 +144,20 @@ int eeh_send_failure_event(struct eeh_pe *pe)
>  	return 0;
>  }
>  
> +int eeh_send_failure_event(struct eeh_pe *pe)
> +{
> +	/*
> +	 * If we've manually supressed recovery events via debugfs
> +	 * then just drop it on the floor.
> +	 */
> +	if (eeh_debugfs_no_recover) {
> +		pr_err("EEH: Event dropped due to no_recover setting\n");
> +		return 0;
> +	}
> +
> +	return __eeh_send_failure_event(pe);
> +}
> +
>  /**
>   * eeh_remove_event - Remove EEH event from the queue
>   * @pe: Event binding to the PE
> -- 
> 2.20.1
> 



* Re: [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs
  2019-02-13  4:37   ` Sam Bobroff
@ 2019-02-13  5:18     ` Oliver
  0 siblings, 0 replies; 21+ messages in thread
From: Oliver @ 2019-02-13  5:18 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: linuxppc-dev

On Wed, Feb 13, 2019 at 3:38 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote:
>
> On Fri, Feb 08, 2019 at 02:08:02PM +1100, Oliver O'Halloran wrote:
> > This patch adds a debugfs interface to force scheduling a recovery event.
> > This can be used to recover a specific PE or schedule a "special" recovery
> > event that checks for errors at the PHB level.
> > To force a recovery of a normal PE, use:
> >
> >  echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover
>
> How about placing these in the per-PHB debugfs directory?
> echo '<#pe>' > /sys/kernel/debug/powerpc/PCI0000/eeh_force_recover
>
> > To force a scan of broken PHBs:
> >
> >  echo 'null' > /sys/kernel/debug/powerpc/eeh_force_recover
>
> And keep this one where it is, and just trigger with any write (or a '1'
> or whatever)?

The per-PHB directories only exist on PowerNV. I'd rather this was
merged as-is since it handles both platforms. If we want to add the
per-PHB debugfs stuff to pseries we can do it later.

>
> Sam.
>
> > Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> > ---
> >  arch/powerpc/include/asm/eeh_event.h |  1 +
> >  arch/powerpc/kernel/eeh.c            | 60 ++++++++++++++++++++++++++++
> >  arch/powerpc/kernel/eeh_event.c      | 25 +++++++-----
> >  3 files changed, 76 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
> > index 9884e872686f..6d0412b846ac 100644
> > --- a/arch/powerpc/include/asm/eeh_event.h
> > +++ b/arch/powerpc/include/asm/eeh_event.h
> > @@ -33,6 +33,7 @@ struct eeh_event {
> >
> >  int eeh_event_init(void);
> >  int eeh_send_failure_event(struct eeh_pe *pe);
> > +int __eeh_send_failure_event(struct eeh_pe *pe);
> >  void eeh_remove_event(struct eeh_pe *pe, bool force);
> >  void eeh_handle_normal_event(struct eeh_pe *pe);
> >  void eeh_handle_special_event(void);
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index 92809b137e39..63b91a4918c9 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1805,6 +1805,63 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
> >
> >  DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
> >                        eeh_enable_dbgfs_set, "0x%llx\n");
> > +
> > +static ssize_t eeh_force_recover_write(struct file *filp,
> > +                             const char __user *user_buf,
> > +                             size_t count, loff_t *ppos)
> > +{
> > +     struct pci_controller *hose;
> > +     uint32_t phbid, pe_no;
> > +     struct eeh_pe *pe;
> > +     char buf[20];
> > +     int ret;
> > +
> > +     ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
> > +     if (!ret)
> > +             return -EFAULT;
> > +
> > +     /*
> > +      * When PE is NULL the event is a "special" event. Rather than
> > +      * recovering a specific PE it forces the EEH core to scan for failed
> > +      * PHBs and recovers each. This needs to be done before any device
> > +      * recoveries can occur.
> > +      */
> > +     if (!strncmp(buf, "null", 4)) {
> > +             pr_err("sending failure event\n");
> > +             __eeh_send_failure_event(NULL);
> > +             return count;
> > +     }
> > +
> > +     ret = sscanf(buf, "%x:%x", &phbid, &pe_no);
> > +     if (ret != 2)
> > +             return -EINVAL;
> > +
> > +     hose = pci_find_hose_for_domain(phbid);
> > +     if (!hose)
> > +             return -ENODEV;
> > +
> > +     /* Retrieve PE */
> > +     pe = eeh_pe_get(hose, pe_no, 0);
> > +     if (!pe)
> > +             return -ENODEV;
> > +
> > +     /*
> > +      * We don't do any state checking here since the detection
> > +      * process is async to the recovery process. The recovery
> > +      * thread *should* not break even if we schedule a recovery
> > +      * from an odd state (e.g. PE removed, or recovery of a
> > +      * non-isolated PE)
> > +      */
> > +     __eeh_send_failure_event(pe);
> > +
> > +     return ret < 0 ? ret : count;
> > +}
> > +
> > +static const struct file_operations eeh_force_recover_fops = {
> > +     .open   = simple_open,
> > +     .llseek = no_llseek,
> > +     .write  = eeh_force_recover_write,
> > +};
> >  #endif
> >
> >  static int __init eeh_init_proc(void)
> > @@ -1820,6 +1877,9 @@ static int __init eeh_init_proc(void)
> >               debugfs_create_bool("eeh_disable_recovery", 0600,
> >                               powerpc_debugfs_root,
> >                               &eeh_debugfs_no_recover);
> > +             debugfs_create_file_unsafe("eeh_force_recover", 0600,
> > +                             powerpc_debugfs_root, NULL,
> > +                             &eeh_force_recover_fops);
> >               eeh_cache_debugfs_init();
> >  #endif
> >       }
> > diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> > index 19837798bb1d..539aca055d70 100644
> > --- a/arch/powerpc/kernel/eeh_event.c
> > +++ b/arch/powerpc/kernel/eeh_event.c
> > @@ -121,20 +121,11 @@ int eeh_event_init(void)
> >   * the actual event will be delivered in a normal context
> >   * (from a workqueue).
> >   */
> > -int eeh_send_failure_event(struct eeh_pe *pe)
> > +int __eeh_send_failure_event(struct eeh_pe *pe)
> >  {
> >       unsigned long flags;
> >       struct eeh_event *event;
> >
> > -     /*
> > -      * If we've manually supressed recovery events via debugfs
> > -      * then just drop it on the floor.
> > -      */
> > -     if (eeh_debugfs_no_recover) {
> > -             pr_err("EEH: Event dropped due to no_recover setting\n");
> > -             return 0;
> > -     }
> > -
> >       event = kzalloc(sizeof(*event), GFP_ATOMIC);
> >       if (!event) {
> >               pr_err("EEH: out of memory, event not handled\n");
> > @@ -153,6 +144,20 @@ int eeh_send_failure_event(struct eeh_pe *pe)
> >       return 0;
> >  }
> >
> > +int eeh_send_failure_event(struct eeh_pe *pe)
> > +{
> > +     /*
> > +      * If we've manually supressed recovery events via debugfs
> > +      * then just drop it on the floor.
> > +      */
> > +     if (eeh_debugfs_no_recover) {
> > +             pr_err("EEH: Event dropped due to no_recover setting\n");
> > +             return 0;
> > +     }
> > +
> > +     return __eeh_send_failure_event(pe);
> > +}
> > +
> >  /**
> >   * eeh_remove_event - Remove EEH event from the queue
> >   * @pe: Event binding to the PE
> > --
> > 2.20.1
> >


Thread overview: 21+ messages
2019-02-08  3:07 [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
2019-02-08  3:07 ` [PATCH 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
2019-02-08  3:07 ` [PATCH 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
2019-02-08  9:00   ` kbuild test robot
2019-02-08  9:47   ` Michael Ellerman
2019-02-08 13:14     ` Oliver
2019-02-11  2:16       ` Michael Ellerman
2019-02-08  3:07 ` [PATCH 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print() Oliver O'Halloran
2019-02-08  3:08 ` [PATCH 5/7] powerpc/pci: Add pci_find_hose_for_domain() Oliver O'Halloran
2019-02-08  9:57   ` Michael Ellerman
2019-02-08 12:53     ` Oliver
2019-02-08  3:08 ` [PATCH 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
2019-02-08  9:58   ` Michael Ellerman
2019-02-08 12:52     ` Oliver
2019-02-08  3:08 ` [PATCH 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
2019-02-08 12:31   ` Michael Ellerman
2019-02-08 12:50     ` Oliver
2019-02-11  2:24       ` Michael Ellerman
2019-02-13  4:37   ` Sam Bobroff
2019-02-13  5:18     ` Oliver
2019-02-08  9:38 ` [PATCH 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Michael Ellerman
