linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes
@ 2019-02-15  0:48 Oliver O'Halloran
  2019-02-15  0:48 ` [PATCH v2 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Oliver O'Halloran @ 2019-02-15  0:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

There's no need to the custom getter/setter functions so we should remove
them in favour of using the generic one. While we're here, change the type
of eeh_max_freeze to u32 and print the value in decimal rather than
hex because printing it in hex makes no sense.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
v2: Replaced uint32_t with u32.
---
 arch/powerpc/include/asm/eeh.h |  2 +-
 arch/powerpc/kernel/eeh.c      | 21 +++------------------
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 8b596d096ebe..2daecd677855 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -219,7 +219,7 @@ struct eeh_ops {
 };
 
 extern int eeh_subsystem_flags;
-extern int eeh_max_freezes;
+extern u32 eeh_max_freezes;
 extern struct eeh_ops *eeh_ops;
 extern raw_spinlock_t confirm_error_lock;
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ae05203eb4de..d021f5abeec5 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -109,7 +109,7 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
  * frozen count in last hour exceeds this limit, the PE will
  * be forced to be offline permanently.
  */
-int eeh_max_freezes = 5;
+u32 eeh_max_freezes = 5;
 
 /* Platform dependent EEH operations */
 struct eeh_ops *eeh_ops = NULL;
@@ -1796,22 +1796,8 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
 	return 0;
 }
 
-static int eeh_freeze_dbgfs_set(void *data, u64 val)
-{
-	eeh_max_freezes = val;
-	return 0;
-}
-
-static int eeh_freeze_dbgfs_get(void *data, u64 *val)
-{
-	*val = eeh_max_freezes;
-	return 0;
-}
-
 DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
 			 eeh_enable_dbgfs_set, "0x%llx\n");
-DEFINE_DEBUGFS_ATTRIBUTE(eeh_freeze_dbgfs_ops, eeh_freeze_dbgfs_get,
-			 eeh_freeze_dbgfs_set, "0x%llx\n");
 #endif
 
 static int __init eeh_init_proc(void)
@@ -1822,9 +1808,8 @@ static int __init eeh_init_proc(void)
 		debugfs_create_file_unsafe("eeh_enable", 0600,
 					   powerpc_debugfs_root, NULL,
 					   &eeh_enable_dbgfs_ops);
-		debugfs_create_file_unsafe("eeh_max_freezes", 0600,
-					   powerpc_debugfs_root, NULL,
-					   &eeh_freeze_dbgfs_ops);
+		debugfs_create_u32("eeh_max_freezes", 0600,
+				powerpc_debugfs_root, &eeh_max_freezes);
 #endif
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
@ 2019-02-15  0:48 ` Oliver O'Halloran
  2019-02-15  5:11   ` Sam Bobroff
  2019-02-15  0:48 ` [PATCH v2 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Oliver O'Halloran @ 2019-02-15  0:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

The EEH address cache is used to map a physical MMIO address back to a PCI
device. It's useful to know when it's being manipulated, but currently this
requires recompiling with #define DEBUG set. This is pointless since we
have dynamic_debug nowdays, so remove the #ifdef guard and add a pr_debug()
for the remove case too.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/kernel/eeh_cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 201943d54a6e..b2c320e0fcef 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -157,10 +157,8 @@ eeh_addr_cache_insert(struct pci_dev *dev, resource_size_t alo,
 	piar->pcidev = dev;
 	piar->flags = flags;
 
-#ifdef DEBUG
 	pr_debug("PIAR: insert range=[%pap:%pap] dev=%s\n",
 		 &alo, &ahi, pci_name(dev));
-#endif
 
 	rb_link_node(&piar->rb_node, parent, p);
 	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
@@ -240,6 +238,8 @@ static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
 		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
 
 		if (piar->pcidev == dev) {
+			pr_debug("PIAR: remove range=[%pap:%pap] dev=%s\n",
+				 &piar->addr_lo, &piar->addr_hi, pci_name(dev));
 			rb_erase(n, &pci_io_addr_cache_root.rb_root);
 			kfree(piar);
 			goto restart;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
  2019-02-15  0:48 ` [PATCH v2 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
@ 2019-02-15  0:48 ` Oliver O'Halloran
  2019-02-15  5:12   ` Sam Bobroff
  2019-02-15  0:48 ` [PATCH v2 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print() Oliver O'Halloran
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Oliver O'Halloran @ 2019-02-15  0:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

Adds a debugfs file that can be read to view the contents of the EEH
address cache. This is pretty similar to the existing
eeh_addr_cache_print() function, but that function is intended to debug
issues inside of the kernel since it's #ifdef`ed out by default, and writes
into the kernel log.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
v2: Added missing #endif
    Replaced while loop with a for
---
 arch/powerpc/include/asm/eeh.h  |  3 +++
 arch/powerpc/kernel/eeh.c       |  1 +
 arch/powerpc/kernel/eeh_cache.c | 30 ++++++++++++++++++++++++++----
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 2daecd677855..478f199d5663 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -460,6 +460,9 @@ static inline void eeh_readsl(const volatile void __iomem *addr, void * buf,
 		eeh_check_failure(addr);
 }
 
+
+void eeh_cache_debugfs_init(void);
+
 #endif /* CONFIG_PPC64 */
 #endif /* __KERNEL__ */
 #endif /* _POWERPC_EEH_H */
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index d021f5abeec5..82d22c671c0e 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1810,6 +1810,7 @@ static int __init eeh_init_proc(void)
 					   &eeh_enable_dbgfs_ops);
 		debugfs_create_u32("eeh_max_freezes", 0600,
 				powerpc_debugfs_root, &eeh_max_freezes);
+		eeh_cache_debugfs_init();
 #endif
 	}
 
diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index b2c320e0fcef..5c5697cced41 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
 #include <asm/pci-bridge.h>
+#include <asm/debugfs.h>
 #include <asm/ppc-pci.h>
 
 
@@ -298,9 +299,30 @@ void eeh_addr_cache_build(void)
 		eeh_addr_cache_insert_dev(dev);
 		eeh_sysfs_add_device(dev);
 	}
+}
 
-#ifdef DEBUG
-	/* Verify tree built up above, echo back the list of addrs. */
-	eeh_addr_cache_print(&pci_io_addr_cache_root);
-#endif
+static int eeh_addr_cache_show(struct seq_file *s, void *v)
+{
+	struct pci_io_addr_range *piar;
+	struct rb_node *n;
+
+	spin_lock(&pci_io_addr_cache_root.piar_lock);
+	for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) {
+		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
+
+		seq_printf(s, "%s addr range [%pap-%pap]: %s\n",
+		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem",
+		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
+	}
+	spin_unlock(&pci_io_addr_cache_root.piar_lock);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(eeh_addr_cache);
+
+void eeh_cache_debugfs_init(void)
+{
+	debugfs_create_file_unsafe("eeh_address_cache", 0400,
+			powerpc_debugfs_root, NULL,
+			&eeh_addr_cache_fops);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print()
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
  2019-02-15  0:48 ` [PATCH v2 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
  2019-02-15  0:48 ` [PATCH v2 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
@ 2019-02-15  0:48 ` Oliver O'Halloran
  2019-02-15  5:12   ` Sam Bobroff
  2019-02-15  0:48 ` [PATCH v2 5/7] powerpc/pci: Add pci_find_controller_for_domain() Oliver O'Halloran
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Oliver O'Halloran @ 2019-02-15  0:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

To use this function at all #define DEBUG needs to be set in eeh_cache.c.
Considering that printing at pr_debug is probably not all that useful since
it adds the additional hurdle of requiring you to enable the debug print if
dynamic_debug is in use so this patch bumps it to pr_info.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/kernel/eeh_cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 5c5697cced41..9c68f0837385 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -114,7 +114,7 @@ static void eeh_addr_cache_print(struct pci_io_addr_cache *cache)
 	while (n) {
 		struct pci_io_addr_range *piar;
 		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
-		pr_debug("PCI: %s addr range %d [%pap-%pap]: %s\n",
+		pr_info("PCI: %s addr range %d [%pap-%pap]: %s\n",
 		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
 		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
 		cnt++;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/7] powerpc/pci: Add pci_find_controller_for_domain()
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (2 preceding siblings ...)
  2019-02-15  0:48 ` [PATCH v2 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print() Oliver O'Halloran
@ 2019-02-15  0:48 ` Oliver O'Halloran
  2019-02-15  5:13   ` Sam Bobroff
  2019-02-15  0:48 ` [PATCH v2 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Oliver O'Halloran @ 2019-02-15  0:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

Add a helper to find the pci_controller structure based on the domain
number / phb id.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
v2: Renamed pci_find_hose_for_domain() to
    pci_find_controller_for_domain()
---
 arch/powerpc/include/asm/pci-bridge.h |  2 ++
 arch/powerpc/kernel/pci-common.c      | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index aee4fcc24990..feea9534a015 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -274,6 +274,8 @@ extern int pcibios_map_io_space(struct pci_bus *bus);
 extern struct pci_controller *pci_find_hose_for_OF_device(
 			struct device_node* node);
 
+extern struct pci_controller *pci_find_controller_for_domain(int domain_nr);
+
 /* Fill up host controller resources from the OF node */
 extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
 			struct device_node *dev, int primary);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 88e4f69a09e5..21eeeabb3e0e 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -357,6 +357,17 @@ struct pci_controller* pci_find_hose_for_OF_device(struct device_node* node)
 	return NULL;
 }
 
+struct pci_controller *pci_find_controller_for_domain(int domain_nr)
+{
+	struct pci_controller *hose;
+
+	list_for_each_entry(hose, &hose_list, list_node)
+		if (hose->global_number == domain_nr)
+			return hose;
+
+	return NULL;
+}
+
 /*
  * Reads the interrupt pin to determine if interrupt is use by card.
  * If the interrupt is used, then gets the interrupt line from the
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 6/7] powerpc/eeh: Allow disabling recovery
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (3 preceding siblings ...)
  2019-02-15  0:48 ` [PATCH v2 5/7] powerpc/pci: Add pci_find_controller_for_domain() Oliver O'Halloran
@ 2019-02-15  0:48 ` Oliver O'Halloran
  2019-02-15  5:58   ` Sam Bobroff
  2019-02-15  0:48 ` [PATCH v2 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Oliver O'Halloran @ 2019-02-15  0:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

Currently when we detect an error we automatically invoke the EEH recovery
handler. This can be annoying when debugging EEH problems, or when working
on EEH itself so this patch adds a debugfs knob that will prevent a
recovery event from being queued up when an issue is detected.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/eeh.h  |  1 +
 arch/powerpc/kernel/eeh.c       | 10 ++++++++++
 arch/powerpc/kernel/eeh_event.c |  9 +++++++++
 3 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 478f199d5663..810e05273ad3 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -220,6 +220,7 @@ struct eeh_ops {
 
 extern int eeh_subsystem_flags;
 extern u32 eeh_max_freezes;
+extern bool eeh_debugfs_no_recover;
 extern struct eeh_ops *eeh_ops;
 extern raw_spinlock_t confirm_error_lock;
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 82d22c671c0e..9f20099ce2d9 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -111,6 +111,13 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
  */
 u32 eeh_max_freezes = 5;
 
+/*
+ * Controls whether a recovery event should be scheduled when an
+ * isolated device is discovered. This is only really useful for
+ * debugging problems with the EEH core.
+ */
+bool eeh_debugfs_no_recover;
+
 /* Platform dependent EEH operations */
 struct eeh_ops *eeh_ops = NULL;
 
@@ -1810,6 +1817,9 @@ static int __init eeh_init_proc(void)
 					   &eeh_enable_dbgfs_ops);
 		debugfs_create_u32("eeh_max_freezes", 0600,
 				powerpc_debugfs_root, &eeh_max_freezes);
+		debugfs_create_bool("eeh_disable_recovery", 0600,
+				powerpc_debugfs_root,
+				&eeh_debugfs_no_recover);
 		eeh_cache_debugfs_init();
 #endif
 	}
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 227e57f980df..19837798bb1d 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -126,6 +126,15 @@ int eeh_send_failure_event(struct eeh_pe *pe)
 	unsigned long flags;
 	struct eeh_event *event;
 
+	/*
+	 * If we've manually supressed recovery events via debugfs
+	 * then just drop it on the floor.
+	 */
+	if (eeh_debugfs_no_recover) {
+		pr_err("EEH: Event dropped due to no_recover setting\n");
+		return 0;
+	}
+
 	event = kzalloc(sizeof(*event), GFP_ATOMIC);
 	if (!event) {
 		pr_err("EEH: out of memory, event not handled\n");
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 7/7] powerpc/eeh: Add eeh_force_recover to debugfs
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (4 preceding siblings ...)
  2019-02-15  0:48 ` [PATCH v2 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
@ 2019-02-15  0:48 ` Oliver O'Halloran
  2019-02-15  5:10 ` [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Sam Bobroff
  2019-02-22  9:47 ` [v2,1/7] " Michael Ellerman
  7 siblings, 0 replies; 15+ messages in thread
From: Oliver O'Halloran @ 2019-02-15  0:48 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

This patch adds a debugfs interface to force scheduling a recovery event.
This can be used to recover a specific PE or schedule a "special" recovery
even that checks for errors at the PHB level.
To force a recovery of a normal PE, use:

 echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover

To force a scan for broken PHBs:

 echo 'hwcheck' > /sys/kernel/debug/powerpc/eeh_force_recover

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
v2: Rename from pci_find_hose_for_domain() to
    pci_find_controller_for_domain()

    Use the more descriptive "hwcheck" to send a special event
    rather than "null"
---
 arch/powerpc/include/asm/eeh_event.h |  1 +
 arch/powerpc/kernel/eeh.c            | 59 ++++++++++++++++++++++++++++
 arch/powerpc/kernel/eeh_event.c      | 25 +++++++-----
 3 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
index 9884e872686f..6d0412b846ac 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -33,6 +33,7 @@ struct eeh_event {
 
 int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
+int __eeh_send_failure_event(struct eeh_pe *pe);
 void eeh_remove_event(struct eeh_pe *pe, bool force);
 void eeh_handle_normal_event(struct eeh_pe *pe);
 void eeh_handle_special_event(void);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 9f20099ce2d9..37677468be6d 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1805,6 +1805,62 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
 
 DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
 			 eeh_enable_dbgfs_set, "0x%llx\n");
+
+static ssize_t eeh_force_recover_write(struct file *filp,
+				const char __user *user_buf,
+				size_t count, loff_t *ppos)
+{
+	struct pci_controller *hose;
+	uint32_t phbid, pe_no;
+	struct eeh_pe *pe;
+	char buf[20];
+	int ret;
+
+	ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
+	if (!ret)
+		return -EFAULT;
+
+	/*
+	 * When PE is NULL the event is a "special" event. Rather than
+	 * recovering a specific PE it forces the EEH core to scan for failed
+	 * PHBs and recovers each. This needs to be done before any device
+	 * recoveries can occur.
+	 */
+	if (!strncmp(buf, "hwcheck", 7)) {
+		__eeh_send_failure_event(NULL);
+		return count;
+	}
+
+	ret = sscanf(buf, "%x:%x", &phbid, &pe_no);
+	if (ret != 2)
+		return -EINVAL;
+
+	hose = pci_find_controller_for_domain(phbid);
+	if (!hose)
+		return -ENODEV;
+
+	/* Retrieve PE */
+	pe = eeh_pe_get(hose, pe_no, 0);
+	if (!pe)
+		return -ENODEV;
+
+	/*
+	 * We don't do any state checking here since the detection
+	 * process is async to the recovery process. The recovery
+	 * thread *should* not break even if we schedule a recovery
+	 * from an odd state (e.g. PE removed, or recovery of a
+	 * non-isolated PE)
+	 */
+	__eeh_send_failure_event(pe);
+
+	return ret < 0 ? ret : count;
+}
+
+static const struct file_operations eeh_force_recover_fops = {
+	.open	= simple_open,
+	.llseek	= no_llseek,
+	.write	= eeh_force_recover_write,
+};
 #endif
 
 static int __init eeh_init_proc(void)
@@ -1820,6 +1876,9 @@ static int __init eeh_init_proc(void)
 		debugfs_create_bool("eeh_disable_recovery", 0600,
 				powerpc_debugfs_root,
 				&eeh_debugfs_no_recover);
+		debugfs_create_file_unsafe("eeh_force_recover", 0600,
+				powerpc_debugfs_root, NULL,
+				&eeh_force_recover_fops);
 		eeh_cache_debugfs_init();
 #endif
 	}
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 19837798bb1d..539aca055d70 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -121,20 +121,11 @@ int eeh_event_init(void)
  * the actual event will be delivered in a normal context
  * (from a workqueue).
  */
-int eeh_send_failure_event(struct eeh_pe *pe)
+int __eeh_send_failure_event(struct eeh_pe *pe)
 {
 	unsigned long flags;
 	struct eeh_event *event;
 
-	/*
-	 * If we've manually supressed recovery events via debugfs
-	 * then just drop it on the floor.
-	 */
-	if (eeh_debugfs_no_recover) {
-		pr_err("EEH: Event dropped due to no_recover setting\n");
-		return 0;
-	}
-
 	event = kzalloc(sizeof(*event), GFP_ATOMIC);
 	if (!event) {
 		pr_err("EEH: out of memory, event not handled\n");
@@ -153,6 +144,20 @@ int eeh_send_failure_event(struct eeh_pe *pe)
 	return 0;
 }
 
+int eeh_send_failure_event(struct eeh_pe *pe)
+{
+	/*
+	 * If we've manually supressed recovery events via debugfs
+	 * then just drop it on the floor.
+	 */
+	if (eeh_debugfs_no_recover) {
+		pr_err("EEH: Event dropped due to no_recover setting\n");
+		return 0;
+	}
+
+	return __eeh_send_failure_event(pe);
+}
+
 /**
  * eeh_remove_event - Remove EEH event from the queue
  * @pe: Event binding to the PE
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (5 preceding siblings ...)
  2019-02-15  0:48 ` [PATCH v2 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
@ 2019-02-15  5:10 ` Sam Bobroff
  2019-02-22  9:47 ` [v2,1/7] " Michael Ellerman
  7 siblings, 0 replies; 15+ messages in thread
From: Sam Bobroff @ 2019-02-15  5:10 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2715 bytes --]

On Fri, Feb 15, 2019 at 11:48:11AM +1100, Oliver O'Halloran wrote:
> There's no need to the custom getter/setter functions so we should remove
> them in favour of using the generic one. While we're here, change the type
> of eeh_max_freeze to u32 and print the value in decimal rather than
> hex because printing it in hex makes no sense.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
> v2: Replaced uint32_t with u32.
> ---
>  arch/powerpc/include/asm/eeh.h |  2 +-
>  arch/powerpc/kernel/eeh.c      | 21 +++------------------
>  2 files changed, 4 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 8b596d096ebe..2daecd677855 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -219,7 +219,7 @@ struct eeh_ops {
>  };
>  
>  extern int eeh_subsystem_flags;
> -extern int eeh_max_freezes;
> +extern u32 eeh_max_freezes;
>  extern struct eeh_ops *eeh_ops;
>  extern raw_spinlock_t confirm_error_lock;
>  
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index ae05203eb4de..d021f5abeec5 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -109,7 +109,7 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
>   * frozen count in last hour exceeds this limit, the PE will
>   * be forced to be offline permanently.
>   */
> -int eeh_max_freezes = 5;
> +u32 eeh_max_freezes = 5;
>  
>  /* Platform dependent EEH operations */
>  struct eeh_ops *eeh_ops = NULL;
> @@ -1796,22 +1796,8 @@ static int eeh_enable_dbgfs_get(void *data, u64 *val)
>  	return 0;
>  }
>  
> -static int eeh_freeze_dbgfs_set(void *data, u64 val)
> -{
> -	eeh_max_freezes = val;
> -	return 0;
> -}
> -
> -static int eeh_freeze_dbgfs_get(void *data, u64 *val)
> -{
> -	*val = eeh_max_freezes;
> -	return 0;
> -}
> -
>  DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
>  			 eeh_enable_dbgfs_set, "0x%llx\n");
> -DEFINE_DEBUGFS_ATTRIBUTE(eeh_freeze_dbgfs_ops, eeh_freeze_dbgfs_get,
> -			 eeh_freeze_dbgfs_set, "0x%llx\n");
>  #endif
>  
>  static int __init eeh_init_proc(void)
> @@ -1822,9 +1808,8 @@ static int __init eeh_init_proc(void)
>  		debugfs_create_file_unsafe("eeh_enable", 0600,
>  					   powerpc_debugfs_root, NULL,
>  					   &eeh_enable_dbgfs_ops);
> -		debugfs_create_file_unsafe("eeh_max_freezes", 0600,
> -					   powerpc_debugfs_root, NULL,
> -					   &eeh_freeze_dbgfs_ops);
> +		debugfs_create_u32("eeh_max_freezes", 0600,
> +				powerpc_debugfs_root, &eeh_max_freezes);
>  #endif
>  	}
>  
> -- 
> 2.20.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove
  2019-02-15  0:48 ` [PATCH v2 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
@ 2019-02-15  5:11   ` Sam Bobroff
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Bobroff @ 2019-02-15  5:11 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1640 bytes --]

On Fri, Feb 15, 2019 at 11:48:12AM +1100, Oliver O'Halloran wrote:
> The EEH address cache is used to map a physical MMIO address back to a PCI
> device. It's useful to know when it's being manipulated, but currently this
> requires recompiling with #define DEBUG set. This is pointless since we
> have dynamic_debug nowdays, so remove the #ifdef guard and add a pr_debug()
> for the remove case too.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
>  arch/powerpc/kernel/eeh_cache.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index 201943d54a6e..b2c320e0fcef 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -157,10 +157,8 @@ eeh_addr_cache_insert(struct pci_dev *dev, resource_size_t alo,
>  	piar->pcidev = dev;
>  	piar->flags = flags;
>  
> -#ifdef DEBUG
>  	pr_debug("PIAR: insert range=[%pap:%pap] dev=%s\n",
>  		 &alo, &ahi, pci_name(dev));
> -#endif
>  
>  	rb_link_node(&piar->rb_node, parent, p);
>  	rb_insert_color(&piar->rb_node, &pci_io_addr_cache_root.rb_root);
> @@ -240,6 +238,8 @@ static inline void __eeh_addr_cache_rmv_dev(struct pci_dev *dev)
>  		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
>  
>  		if (piar->pcidev == dev) {
> +			pr_debug("PIAR: remove range=[%pap:%pap] dev=%s\n",
> +				 &piar->addr_lo, &piar->addr_hi, pci_name(dev));
>  			rb_erase(n, &pci_io_addr_cache_root.rb_root);
>  			kfree(piar);
>  			goto restart;
> -- 
> 2.20.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache
  2019-02-15  0:48 ` [PATCH v2 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
@ 2019-02-15  5:12   ` Sam Bobroff
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Bobroff @ 2019-02-15  5:12 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 3235 bytes --]

On Fri, Feb 15, 2019 at 11:48:13AM +1100, Oliver O'Halloran wrote:
> Adds a debugfs file that can be read to view the contents of the EEH
> address cache. This is pretty similar to the existing
> eeh_addr_cache_print() function, but that function is intended to debug
> issues inside of the kernel since it's #ifdef`ed out by default, and writes
> into the kernel log.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
> v2: Added missing #endif
>     Replaced while loop with a for
> ---
>  arch/powerpc/include/asm/eeh.h  |  3 +++
>  arch/powerpc/kernel/eeh.c       |  1 +
>  arch/powerpc/kernel/eeh_cache.c | 30 ++++++++++++++++++++++++++----
>  3 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 2daecd677855..478f199d5663 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -460,6 +460,9 @@ static inline void eeh_readsl(const volatile void __iomem *addr, void * buf,
>  		eeh_check_failure(addr);
>  }
>  
> +
> +void eeh_cache_debugfs_init(void);
> +
>  #endif /* CONFIG_PPC64 */
>  #endif /* __KERNEL__ */
>  #endif /* _POWERPC_EEH_H */
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index d021f5abeec5..82d22c671c0e 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1810,6 +1810,7 @@ static int __init eeh_init_proc(void)
>  					   &eeh_enable_dbgfs_ops);
>  		debugfs_create_u32("eeh_max_freezes", 0600,
>  				powerpc_debugfs_root, &eeh_max_freezes);
> +		eeh_cache_debugfs_init();
>  #endif
>  	}
>  
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index b2c320e0fcef..5c5697cced41 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -26,6 +26,7 @@
>  #include <linux/spinlock.h>
>  #include <linux/atomic.h>
>  #include <asm/pci-bridge.h>
> +#include <asm/debugfs.h>
>  #include <asm/ppc-pci.h>
>  
>  
> @@ -298,9 +299,30 @@ void eeh_addr_cache_build(void)
>  		eeh_addr_cache_insert_dev(dev);
>  		eeh_sysfs_add_device(dev);
>  	}
> +}
>  
> -#ifdef DEBUG
> -	/* Verify tree built up above, echo back the list of addrs. */
> -	eeh_addr_cache_print(&pci_io_addr_cache_root);
> -#endif
> +static int eeh_addr_cache_show(struct seq_file *s, void *v)
> +{
> +	struct pci_io_addr_range *piar;
> +	struct rb_node *n;
> +
> +	spin_lock(&pci_io_addr_cache_root.piar_lock);
> +	for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) {
> +		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> +
> +		seq_printf(s, "%s addr range [%pap-%pap]: %s\n",
> +		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem",
> +		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
> +	}
> +	spin_unlock(&pci_io_addr_cache_root.piar_lock);
> +
> +	return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(eeh_addr_cache);
> +
> +void eeh_cache_debugfs_init(void)
> +{
> +	debugfs_create_file_unsafe("eeh_address_cache", 0400,
> +			powerpc_debugfs_root, NULL,
> +			&eeh_addr_cache_fops);
>  }
> -- 
> 2.20.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print()
  2019-02-15  0:48 ` [PATCH v2 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print() Oliver O'Halloran
@ 2019-02-15  5:12   ` Sam Bobroff
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Bobroff @ 2019-02-15  5:12 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On Fri, Feb 15, 2019 at 11:48:14AM +1100, Oliver O'Halloran wrote:
> To use this function at all #define DEBUG needs to be set in eeh_cache.c.
> Considering that printing at pr_debug is probably not all that useful since
> it adds the additional hurdle of requiring you to enable the debug print if
> dynamic_debug is in use so this patch bumps it to pr_info.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
>  arch/powerpc/kernel/eeh_cache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
> index 5c5697cced41..9c68f0837385 100644
> --- a/arch/powerpc/kernel/eeh_cache.c
> +++ b/arch/powerpc/kernel/eeh_cache.c
> @@ -114,7 +114,7 @@ static void eeh_addr_cache_print(struct pci_io_addr_cache *cache)
>  	while (n) {
>  		struct pci_io_addr_range *piar;
>  		piar = rb_entry(n, struct pci_io_addr_range, rb_node);
> -		pr_debug("PCI: %s addr range %d [%pap-%pap]: %s\n",
> +		pr_info("PCI: %s addr range %d [%pap-%pap]: %s\n",
>  		       (piar->flags & IORESOURCE_IO) ? "i/o" : "mem", cnt,
>  		       &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
>  		cnt++;
> -- 
> 2.20.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 5/7] powerpc/pci: Add pci_find_controller_for_domain()
  2019-02-15  0:48 ` [PATCH v2 5/7] powerpc/pci: Add pci_find_controller_for_domain() Oliver O'Halloran
@ 2019-02-15  5:13   ` Sam Bobroff
  0 siblings, 0 replies; 15+ messages in thread
From: Sam Bobroff @ 2019-02-15  5:13 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1985 bytes --]

On Fri, Feb 15, 2019 at 11:48:15AM +1100, Oliver O'Halloran wrote:
> Add a helper to find the pci_controller structure based on the domain
> number / phb id.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

> ---
> v2: Renamed pci_find_hose_for_domain() to
>     pci_find_controller_for_domain()
> ---
>  arch/powerpc/include/asm/pci-bridge.h |  2 ++
>  arch/powerpc/kernel/pci-common.c      | 11 +++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index aee4fcc24990..feea9534a015 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -274,6 +274,8 @@ extern int pcibios_map_io_space(struct pci_bus *bus);
>  extern struct pci_controller *pci_find_hose_for_OF_device(
>  			struct device_node* node);
>  
> +extern struct pci_controller *pci_find_controller_for_domain(int domain_nr);
> +
>  /* Fill up host controller resources from the OF node */
>  extern void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>  			struct device_node *dev, int primary);
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 88e4f69a09e5..21eeeabb3e0e 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -357,6 +357,17 @@ struct pci_controller* pci_find_hose_for_OF_device(struct device_node* node)
>  	return NULL;
>  }
>  
> +struct pci_controller *pci_find_controller_for_domain(int domain_nr)
> +{
> +	struct pci_controller *hose;
> +
> +	list_for_each_entry(hose, &hose_list, list_node)
> +		if (hose->global_number == domain_nr)
> +			return hose;
> +
> +	return NULL;
> +}
> +
>  /*
>   * Reads the interrupt pin to determine if interrupt is use by card.
>   * If the interrupt is used, then gets the interrupt line from the
> -- 
> 2.20.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 6/7] powerpc/eeh: Allow disabling recovery
  2019-02-15  0:48 ` [PATCH v2 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
@ 2019-02-15  5:58   ` Sam Bobroff
  2019-02-17 23:19     ` Oliver
  0 siblings, 1 reply; 15+ messages in thread
From: Sam Bobroff @ 2019-02-15  5:58 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2973 bytes --]

On Fri, Feb 15, 2019 at 11:48:16AM +1100, Oliver O'Halloran wrote:
> Currently when we detect an error we automatically invoke the EEH recovery
> handler. This can be annoying when debugging EEH problems, or when working
> on EEH itself so this patch adds a debugfs knob that will prevent a
> recovery event from being queued up when an issue is detected.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  arch/powerpc/include/asm/eeh.h  |  1 +
>  arch/powerpc/kernel/eeh.c       | 10 ++++++++++
>  arch/powerpc/kernel/eeh_event.c |  9 +++++++++
>  3 files changed, 20 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 478f199d5663..810e05273ad3 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -220,6 +220,7 @@ struct eeh_ops {
>  
>  extern int eeh_subsystem_flags;
>  extern u32 eeh_max_freezes;
> +extern bool eeh_debugfs_no_recover;
>  extern struct eeh_ops *eeh_ops;
>  extern raw_spinlock_t confirm_error_lock;
>  
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 82d22c671c0e..9f20099ce2d9 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -111,6 +111,13 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
>   */
>  u32 eeh_max_freezes = 5;
>  
> +/*
> + * Controls whether a recovery event should be scheduled when an
> + * isolated device is discovered. This is only really useful for
> + * debugging problems with the EEH core.
> + */
> +bool eeh_debugfs_no_recover;
> +
>  /* Platform dependent EEH operations */
>  struct eeh_ops *eeh_ops = NULL;
>  
> @@ -1810,6 +1817,9 @@ static int __init eeh_init_proc(void)
>  					   &eeh_enable_dbgfs_ops);
>  		debugfs_create_u32("eeh_max_freezes", 0600,
>  				powerpc_debugfs_root, &eeh_max_freezes);
> +		debugfs_create_bool("eeh_disable_recovery", 0600,
> +				powerpc_debugfs_root,
> +				&eeh_debugfs_no_recover);
>  		eeh_cache_debugfs_init();
>  #endif
>  	}
> diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> index 227e57f980df..19837798bb1d 100644
> --- a/arch/powerpc/kernel/eeh_event.c
> +++ b/arch/powerpc/kernel/eeh_event.c
> @@ -126,6 +126,15 @@ int eeh_send_failure_event(struct eeh_pe *pe)
>  	unsigned long flags;
>  	struct eeh_event *event;
>  
> +	/*
> +	 * If we've manually supressed recovery events via debugfs
> +	 * then just drop it on the floor.
> +	 */
> +	if (eeh_debugfs_no_recover) {
> +		pr_err("EEH: Event dropped due to no_recover setting\n");
> +		return 0;
> +	}
> +

I think it might be clearer if you did the 'no recovery' test at the
call sites (I think there are only a few), instead of inside the
function (and then the next patch wouldn't need to add a wrapper).

>  	event = kzalloc(sizeof(*event), GFP_ATOMIC);
>  	if (!event) {
>  		pr_err("EEH: out of memory, event not handled\n");
> -- 
> 2.20.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 6/7] powerpc/eeh: Allow disabling recovery
  2019-02-15  5:58   ` Sam Bobroff
@ 2019-02-17 23:19     ` Oliver
  0 siblings, 0 replies; 15+ messages in thread
From: Oliver @ 2019-02-17 23:19 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: linuxppc-dev

On Fri, Feb 15, 2019 at 4:58 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote:
>
> On Fri, Feb 15, 2019 at 11:48:16AM +1100, Oliver O'Halloran wrote:
> > Currently when we detect an error we automatically invoke the EEH recovery
> > handler. This can be annoying when debugging EEH problems, or when working
> > on EEH itself so this patch adds a debugfs knob that will prevent a
> > recovery event from being queued up when an issue is detected.
> >
> > Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> > ---
> >  arch/powerpc/include/asm/eeh.h  |  1 +
> >  arch/powerpc/kernel/eeh.c       | 10 ++++++++++
> >  arch/powerpc/kernel/eeh_event.c |  9 +++++++++
> >  3 files changed, 20 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> > index 478f199d5663..810e05273ad3 100644
> > --- a/arch/powerpc/include/asm/eeh.h
> > +++ b/arch/powerpc/include/asm/eeh.h
> > @@ -220,6 +220,7 @@ struct eeh_ops {
> >
> >  extern int eeh_subsystem_flags;
> >  extern u32 eeh_max_freezes;
> > +extern bool eeh_debugfs_no_recover;
> >  extern struct eeh_ops *eeh_ops;
> >  extern raw_spinlock_t confirm_error_lock;
> >
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index 82d22c671c0e..9f20099ce2d9 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -111,6 +111,13 @@ EXPORT_SYMBOL(eeh_subsystem_flags);
> >   */
> >  u32 eeh_max_freezes = 5;
> >
> > +/*
> > + * Controls whether a recovery event should be scheduled when an
> > + * isolated device is discovered. This is only really useful for
> > + * debugging problems with the EEH core.
> > + */
> > +bool eeh_debugfs_no_recover;
> > +
> >  /* Platform dependent EEH operations */
> >  struct eeh_ops *eeh_ops = NULL;
> >
> > @@ -1810,6 +1817,9 @@ static int __init eeh_init_proc(void)
> >                                          &eeh_enable_dbgfs_ops);
> >               debugfs_create_u32("eeh_max_freezes", 0600,
> >                               powerpc_debugfs_root, &eeh_max_freezes);
> > +             debugfs_create_bool("eeh_disable_recovery", 0600,
> > +                             powerpc_debugfs_root,
> > +                             &eeh_debugfs_no_recover);
> >               eeh_cache_debugfs_init();
> >  #endif
> >       }
> > diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> > index 227e57f980df..19837798bb1d 100644
> > --- a/arch/powerpc/kernel/eeh_event.c
> > +++ b/arch/powerpc/kernel/eeh_event.c
> > @@ -126,6 +126,15 @@ int eeh_send_failure_event(struct eeh_pe *pe)
> >       unsigned long flags;
> >       struct eeh_event *event;
> >
> > +     /*
> > +      * If we've manually supressed recovery events via debugfs
> > +      * then just drop it on the floor.
> > +      */
> > +     if (eeh_debugfs_no_recover) {
> > +             pr_err("EEH: Event dropped due to no_recover setting\n");
> > +             return 0;
> > +     }
> > +
>
> I think it might be clearer if you did the 'no recovery' test at the
> call sites (I think there are only a few), instead of inside the
> function (and then the next patch wouldn't need to add a wrapper).

I don't think adding boilerplate at all the call sites is an
improvement. It's just a recipe for adding bugs.

>
> >       event = kzalloc(sizeof(*event), GFP_ATOMIC);
> >       if (!event) {
> >               pr_err("EEH: out of memory, event not handled\n");
> > --
> > 2.20.1
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [v2,1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes
  2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
                   ` (6 preceding siblings ...)
  2019-02-15  5:10 ` [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Sam Bobroff
@ 2019-02-22  9:47 ` Michael Ellerman
  7 siblings, 0 replies; 15+ messages in thread
From: Michael Ellerman @ 2019-02-22  9:47 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: sbobroff, Oliver O'Halloran

On Fri, 2019-02-15 at 00:48:11 UTC, Oliver O'Halloran wrote:
> There's no need to the custom getter/setter functions so we should remove
> them in favour of using the generic one. While we're here, change the type
> of eeh_max_freeze to u32 and print the value in decimal rather than
> hex because printing it in hex makes no sense.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/46ee7c3c5212b0f4f8713d60cfd59572

cheers

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-02-22 10:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-15  0:48 [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Oliver O'Halloran
2019-02-15  0:48 ` [PATCH v2 2/7] powerpc/eeh_cache: Add pr_debug() prints for insert/remove Oliver O'Halloran
2019-02-15  5:11   ` Sam Bobroff
2019-02-15  0:48 ` [PATCH v2 3/7] powerpc/eeh_cache: Add a way to dump the EEH address cache Oliver O'Halloran
2019-02-15  5:12   ` Sam Bobroff
2019-02-15  0:48 ` [PATCH v2 4/7] powerpc/eeh_cache: Bump log level of eeh_addr_cache_print() Oliver O'Halloran
2019-02-15  5:12   ` Sam Bobroff
2019-02-15  0:48 ` [PATCH v2 5/7] powerpc/pci: Add pci_find_controller_for_domain() Oliver O'Halloran
2019-02-15  5:13   ` Sam Bobroff
2019-02-15  0:48 ` [PATCH v2 6/7] powerpc/eeh: Allow disabling recovery Oliver O'Halloran
2019-02-15  5:58   ` Sam Bobroff
2019-02-17 23:19     ` Oliver
2019-02-15  0:48 ` [PATCH v2 7/7] powerpc/eeh: Add eeh_force_recover to debugfs Oliver O'Halloran
2019-02-15  5:10 ` [PATCH v2 1/7] powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes Sam Bobroff
2019-02-22  9:47 ` [v2,1/7] " Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).