All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-09-18 23:58 Keith Busch
  2018-09-18 23:58 ` [PATCH 01/12] PCI: Set PCI bus accessors to noinline Keith Busch
                   ` (12 more replies)
  0 siblings, 13 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

I ran into a lot of trouble testing error handling, and this series is
just trying to simplify some things. The first 4 fix up aer_inject, and
the rest are cleanup to make better use of kernel APIs.

Keith Busch (12):
  PCI: Set PCI bus accessors to noinline
  PCI/AER: Covertly inject errors
  PCI/AER: Reuse existing service device lookup
  PCI/AER: Abstract AER interrupt handling
  PCI/AER: Remove dead code
  PCI/AER: Remove error source from aer struct
  PCI/AER: Use kfifo for tracking events
  PCI/AER: Use kfifo helper inserting locked elements
  PCI/AER: Don't read upstream ports below fatal errors
  PCI/AER: Use threaded IRQ for bottom half
  PCI/AER: Use managed resource allocations
  PCI/pciehp: Use device managed allocations

 drivers/pci/access.c              |   4 +-
 drivers/pci/hotplug/pciehp_core.c |  14 +-
 drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
 drivers/pci/pcie/Kconfig          |   2 +-
 drivers/pci/pcie/aer.c            | 219 ++++++---------------------
 drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
 drivers/pci/pcie/portdrv.h        |   4 -
 drivers/pci/pcie/portdrv_core.c   |   1 +
 8 files changed, 227 insertions(+), 371 deletions(-)

-- 
2.14.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 01/12] PCI: Set PCI bus accessors to noinline
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 02/12] PCI/AER: Covertly inject errors Keith Busch
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The pci bus config accessors could compile inlined with other accessor
functions, which makes it so they can't be traced. This patch forces
these to never be inlined so that ftrace can hook into these functions.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/access.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index a3ad2fe185b9..544922f097c0 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -33,7 +33,7 @@ DEFINE_RAW_SPINLOCK(pci_lock);
 #endif
 
 #define PCI_OP_READ(size, type, len) \
-int pci_bus_read_config_##size \
+int noinline pci_bus_read_config_##size \
 	(struct pci_bus *bus, unsigned int devfn, int pos, type *value)	\
 {									\
 	int res;							\
@@ -48,7 +48,7 @@ int pci_bus_read_config_##size \
 }
 
 #define PCI_OP_WRITE(size, type, len) \
-int pci_bus_write_config_##size \
+int noinline pci_bus_write_config_##size \
 	(struct pci_bus *bus, unsigned int devfn, int pos, type value)	\
 {									\
 	int res;							\
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 02/12] PCI/AER: Covertly inject errors
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
  2018-09-18 23:58 ` [PATCH 01/12] PCI: Set PCI bus accessors to noinline Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 03/12] PCI/AER: Reuse existing service device lookup Keith Busch
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The aer_inject module was intercepting config requests by overwriting
the config accessor operations in the pci_bus. This has several issues.

First, the module was tracking kernel objects unbeknownst to the drivers
that own them. The kernel may free those devices, leaving the AER inject
module holding stale references and no way to know that happened.

Second, the pci enumeration has child devices inherit the pci_bus ops
from its parent bus. Since errors may lead to link resets that trigger
re-enuemration, the child devices have now inherited operations that
don't know about the devices using them, causing kernel crashes.

Finally, CONFIG_PCI_LOCKLESS_CONFIG doesn't block accessing the pci_bus
ops, so it's racing with potential in flight config requests.

It's just a bad idea to muck with kernel objects out from under the
drivers depending on them.

This patch uses a more discreet approach to intercept config requests
using ftrace for thunking the config space accessors to inject the desired
errors, and removing any need for aer_inject to track pci structuresh.

Inspired-by: https://github.com/ilammy/ftrace-hook
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/Kconfig      |   2 +-
 drivers/pci/pcie/aer_inject.c | 276 +++++++++++++++++++++++-------------------
 2 files changed, 152 insertions(+), 126 deletions(-)

diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 0a1e9d379bc5..87bcf40f415d 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -35,7 +35,7 @@ config PCIEAER
 
 config PCIEAER_INJECT
 	tristate "PCI Express error injection support"
-	depends on PCIEAER
+	depends on PCIEAER && DYNAMIC_FTRACE_WITH_REGS
 	default n
 	help
 	  This enables PCI Express Root Port Advanced Error Reporting
diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 0eb24346cad3..6f8b96499f44 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -14,6 +14,10 @@
 
 #include <linux/module.h>
 #include <linux/init.h>
+#include <linux/ftrace.h>
+#include <linux/kallsyms.h>
+#include <linux/kernel.h>
+#include <linux/linkage.h>
 #include <linux/miscdevice.h>
 #include <linux/pci.h>
 #include <linux/slab.h>
@@ -58,17 +62,9 @@ struct aer_error {
 	u32 source_id;
 };
 
-struct pci_bus_ops {
-	struct list_head list;
-	struct pci_bus *bus;
-	struct pci_ops *ops;
-};
-
 static LIST_HEAD(einjected);
 
-static LIST_HEAD(pci_bus_ops_list);
-
-/* Protect einjected and pci_bus_ops_list */
+/* Protect einjected */
 static DEFINE_SPINLOCK(inject_lock);
 
 static void aer_error_init(struct aer_error *err, u32 domain,
@@ -82,7 +78,6 @@ static void aer_error_init(struct aer_error *err, u32 domain,
 	err->pos_cap_err = pos_cap_err;
 }
 
-/* inject_lock must be held before calling */
 static struct aer_error *__find_aer_error(u32 domain, unsigned int bus,
 					  unsigned int devfn)
 {
@@ -97,7 +92,6 @@ static struct aer_error *__find_aer_error(u32 domain, unsigned int bus,
 	return NULL;
 }
 
-/* inject_lock must be held before calling */
 static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
 {
 	int domain = pci_domain_nr(dev->bus);
@@ -106,32 +100,6 @@ static struct aer_error *__find_aer_error_by_dev(struct pci_dev *dev)
 	return __find_aer_error(domain, dev->bus->number, dev->devfn);
 }
 
-/* inject_lock must be held before calling */
-static struct pci_ops *__find_pci_bus_ops(struct pci_bus *bus)
-{
-	struct pci_bus_ops *bus_ops;
-
-	list_for_each_entry(bus_ops, &pci_bus_ops_list, list) {
-		if (bus_ops->bus == bus)
-			return bus_ops->ops;
-	}
-	return NULL;
-}
-
-static struct pci_bus_ops *pci_bus_ops_pop(void)
-{
-	unsigned long flags;
-	struct pci_bus_ops *bus_ops;
-
-	spin_lock_irqsave(&inject_lock, flags);
-	bus_ops = list_first_entry_or_null(&pci_bus_ops_list,
-					   struct pci_bus_ops, list);
-	if (bus_ops)
-		list_del(&bus_ops->list);
-	spin_unlock_irqrestore(&inject_lock, flags);
-	return bus_ops;
-}
-
 static u32 *find_pci_config_dword(struct aer_error *err, int where,
 				  int *prw1cs)
 {
@@ -176,19 +144,14 @@ static u32 *find_pci_config_dword(struct aer_error *err, int where,
 }
 
 static int aer_inj_read_config(struct pci_bus *bus, unsigned int devfn,
-			       int where, int size, u32 *val)
+			       int where, u32 *val)
 {
 	u32 *sim;
 	struct aer_error *err;
 	unsigned long flags;
-	struct pci_ops *ops;
-	struct pci_ops *my_ops;
 	int domain;
-	int rv;
 
 	spin_lock_irqsave(&inject_lock, flags);
-	if (size != sizeof(u32))
-		goto out;
 	domain = pci_domain_nr(bus);
 	if (domain < 0)
 		goto out;
@@ -203,37 +166,20 @@ static int aer_inj_read_config(struct pci_bus *bus, unsigned int devfn,
 		return 0;
 	}
 out:
-	ops = __find_pci_bus_ops(bus);
-	/*
-	 * pci_lock must already be held, so we can directly
-	 * manipulate bus->ops.  Many config access functions,
-	 * including pci_generic_config_read() require the original
-	 * bus->ops be installed to function, so temporarily put them
-	 * back.
-	 */
-	my_ops = bus->ops;
-	bus->ops = ops;
-	rv = ops->read(bus, devfn, where, size, val);
-	bus->ops = my_ops;
 	spin_unlock_irqrestore(&inject_lock, flags);
-	return rv;
+	return -1;
 }
 
 static int aer_inj_write_config(struct pci_bus *bus, unsigned int devfn,
-				int where, int size, u32 val)
+				int where, u32 val)
 {
 	u32 *sim;
 	struct aer_error *err;
 	unsigned long flags;
 	int rw1cs;
-	struct pci_ops *ops;
-	struct pci_ops *my_ops;
 	int domain;
-	int rv;
 
 	spin_lock_irqsave(&inject_lock, flags);
-	if (size != sizeof(u32))
-		goto out;
 	domain = pci_domain_nr(bus);
 	if (domain < 0)
 		goto out;
@@ -250,57 +196,9 @@ static int aer_inj_write_config(struct pci_bus *bus, unsigned int devfn,
 		spin_unlock_irqrestore(&inject_lock, flags);
 		return 0;
 	}
-out:
-	ops = __find_pci_bus_ops(bus);
-	/*
-	 * pci_lock must already be held, so we can directly
-	 * manipulate bus->ops.  Many config access functions,
-	 * including pci_generic_config_write() require the original
-	 * bus->ops be installed to function, so temporarily put them
-	 * back.
-	 */
-	my_ops = bus->ops;
-	bus->ops = ops;
-	rv = ops->write(bus, devfn, where, size, val);
-	bus->ops = my_ops;
-	spin_unlock_irqrestore(&inject_lock, flags);
-	return rv;
-}
-
-static struct pci_ops aer_inj_pci_ops = {
-	.read = aer_inj_read_config,
-	.write = aer_inj_write_config,
-};
-
-static void pci_bus_ops_init(struct pci_bus_ops *bus_ops,
-			     struct pci_bus *bus,
-			     struct pci_ops *ops)
-{
-	INIT_LIST_HEAD(&bus_ops->list);
-	bus_ops->bus = bus;
-	bus_ops->ops = ops;
-}
-
-static int pci_bus_set_aer_ops(struct pci_bus *bus)
-{
-	struct pci_ops *ops;
-	struct pci_bus_ops *bus_ops;
-	unsigned long flags;
-
-	bus_ops = kmalloc(sizeof(*bus_ops), GFP_KERNEL);
-	if (!bus_ops)
-		return -ENOMEM;
-	ops = pci_bus_set_ops(bus, &aer_inj_pci_ops);
-	spin_lock_irqsave(&inject_lock, flags);
-	if (ops == &aer_inj_pci_ops)
-		goto out;
-	pci_bus_ops_init(bus_ops, bus, ops);
-	list_add(&bus_ops->list, &pci_bus_ops_list);
-	bus_ops = NULL;
 out:
 	spin_unlock_irqrestore(&inject_lock, flags);
-	kfree(bus_ops);
-	return 0;
+	return -1;
 }
 
 static int find_aer_device_iter(struct device *device, void *data)
@@ -457,13 +355,6 @@ static int aer_inject(struct aer_error_inj *einj)
 				       uncor_mask_orig);
 	}
 
-	ret = pci_bus_set_aer_ops(dev->bus);
-	if (ret)
-		goto out_put;
-	ret = pci_bus_set_aer_ops(rpdev->bus);
-	if (ret)
-		goto out_put;
-
 	if (find_aer_device(rpdev, &edev)) {
 		if (!get_service_data(edev)) {
 			dev_warn(&edev->device,
@@ -518,24 +409,159 @@ static struct miscdevice aer_inject_device = {
 	.fops = &aer_inject_fops,
 };
 
+static asmlinkage int (*read_config_dword)(struct pci_bus *bus,
+					   unsigned int devfn,
+					   int where, u32 *val);
+static asmlinkage int (*write_config_dword)(struct pci_bus *bus,
+					    unsigned int devfn,
+					    int where, u32 val);
+struct aer_hook {
+	struct ftrace_ops ops;
+	const char *name;
+	void *function;
+	void *original;
+	unsigned long address;
+};
+
+static int asmlinkage ainj_read_config_dword(struct pci_bus *bus,
+			unsigned int devfn, int where, u32 *val)
+{
+	if (!aer_inj_read_config(bus, devfn, where, (u32 *)val))
+		return 0;
+	return read_config_dword(bus, devfn, where, val);
+}
+
+static int asmlinkage ainj_write_config_dword(struct pci_bus *bus,
+			unsigned int devfn, int where, u32 val)
+{
+	if (!aer_inj_write_config(bus, devfn, where, val))
+		return 0;
+	return write_config_dword(bus, devfn, where, val);
+}
+
+static int aer_inject_resolve_hook_address(struct aer_hook *hook)
+{
+	hook->address = kallsyms_lookup_name(hook->name);
+
+	if (!hook->address) {
+		pr_warn("unresolved symbol: %s\n", hook->name);
+		return -ENOENT;
+	}
+	*((unsigned long*) hook->original) = hook->address + MCOUNT_INSN_SIZE;
+	return 0;
+}
+
+static void notrace aer_inject_ftrace_thunk(unsigned long ip, unsigned long parent_ip,
+		struct ftrace_ops *ops, struct pt_regs *regs)
+{
+	struct aer_hook *hook = container_of(ops, struct aer_hook, ops);
+	regs->ip = (unsigned long) hook->function;
+}
+
+static int aer_inject_install_hook(struct aer_hook *hook)
+{
+	int err;
+
+	err = aer_inject_resolve_hook_address(hook);
+	if (err)
+		return err;
+
+	hook->ops.func = aer_inject_ftrace_thunk;
+	hook->ops.flags = FTRACE_OPS_FL_SAVE_REGS |
+			  FTRACE_OPS_FL_RECURSION_SAFE |
+			  FTRACE_OPS_FL_IPMODIFY;
+
+	err = ftrace_set_filter_ip(&hook->ops, hook->address, 0, 0);
+	if (err) {
+		pr_warn("ftrace_set_filter_ip() failed: %d\n", err);
+		return err;
+	}
+
+	err = register_ftrace_function(&hook->ops);
+	if (err) {
+		pr_warn("register_ftrace_function() failed: %d\n", err);
+		ftrace_set_filter_ip(&hook->ops, hook->address, 1, 0);
+		return err;
+	}
+
+	return 0;
+}
+
+static void aer_inject_remove_hook(struct aer_hook *hook)
+{
+	int err;
+
+	err = unregister_ftrace_function(&hook->ops);
+	if (err)
+		pr_warn("unregister_ftrace_function() failed: %d\n", err);
+
+	err = ftrace_set_filter_ip(&hook->ops, hook->address, 1, 0);
+	if (err)
+		pr_warn("ftrace_set_filter_ip() failed: %d\n", err);
+}
+
+static int aer_inject_install_hooks(struct aer_hook *hooks, size_t count)
+{
+	int err, i;
+
+	for (i = 0; i < count; i++) {
+		err = aer_inject_install_hook(&hooks[i]);
+		if (err)
+			goto error;
+	}
+	return 0;
+error:
+	while (i != 0)
+		aer_inject_remove_hook(&hooks[--i]);
+	return err;
+}
+
+static void aer_inject_remove_hooks(struct aer_hook *hooks, size_t count)
+{
+	int i;
+
+	for (i = 0; i < count; i++)
+		aer_inject_remove_hook(&hooks[i]);
+}
+
+static struct aer_hook aer_hooks[] = {
+	{
+		.name = "pci_bus_read_config_dword",
+		.function = ainj_read_config_dword,
+		.original = &read_config_dword,
+	},
+	{
+		.name = "pci_bus_write_config_dword",
+		.function = ainj_write_config_dword,
+		.original = &write_config_dword,
+	},
+};
+
 static int __init aer_inject_init(void)
 {
-	return misc_register(&aer_inject_device);
+	int err;
+
+	err = misc_register(&aer_inject_device);
+	if (err)
+		return err;
+
+	err = aer_inject_install_hooks(aer_hooks, ARRAY_SIZE(aer_hooks));
+	if (err)
+		goto out;
+	return 0;
+ out:
+	misc_deregister(&aer_inject_device);
+	return err;
 }
 
 static void __exit aer_inject_exit(void)
 {
 	struct aer_error *err, *err_next;
 	unsigned long flags;
-	struct pci_bus_ops *bus_ops;
 
+	aer_inject_remove_hooks(aer_hooks, ARRAY_SIZE(aer_hooks));
 	misc_deregister(&aer_inject_device);
 
-	while ((bus_ops = pci_bus_ops_pop())) {
-		pci_bus_set_ops(bus_ops->bus, bus_ops->ops);
-		kfree(bus_ops);
-	}
-
 	spin_lock_irqsave(&inject_lock, flags);
 	list_for_each_entry_safe(err, err_next, &einjected, list) {
 		list_del(&err->list);
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 03/12] PCI/AER: Reuse existing service device lookup
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
  2018-09-18 23:58 ` [PATCH 01/12] PCI: Set PCI bus accessors to noinline Keith Busch
  2018-09-18 23:58 ` [PATCH 02/12] PCI/AER: Covertly inject errors Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 04/12] PCI/AER: Abstract AER interrupt handling Keith Busch
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The port services driver already provides a method to find the pcie_device
for a service. This patch exports the function and uses it from the
aer_inject module and removes the duplicate functionality.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer_inject.c   | 25 ++++---------------------
 drivers/pci/pcie/portdrv_core.c |  1 +
 2 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 6f8b96499f44..7d641a543194 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -201,32 +201,13 @@ static int aer_inj_write_config(struct pci_bus *bus, unsigned int devfn,
 	return -1;
 }
 
-static int find_aer_device_iter(struct device *device, void *data)
-{
-	struct pcie_device **result = data;
-	struct pcie_device *pcie_dev;
-
-	if (device->bus == &pcie_port_bus_type) {
-		pcie_dev = to_pcie_device(device);
-		if (pcie_dev->service & PCIE_PORT_SERVICE_AER) {
-			*result = pcie_dev;
-			return 1;
-		}
-	}
-	return 0;
-}
-
-static int find_aer_device(struct pci_dev *dev, struct pcie_device **result)
-{
-	return device_for_each_child(&dev->dev, result, find_aer_device_iter);
-}
-
 static int aer_inject(struct aer_error_inj *einj)
 {
 	struct aer_error *err, *rperr;
 	struct aer_error *err_alloc = NULL, *rperr_alloc = NULL;
 	struct pci_dev *dev, *rpdev;
 	struct pcie_device *edev;
+	struct device *device;
 	unsigned long flags;
 	unsigned int devfn = PCI_DEVFN(einj->dev, einj->fn);
 	int pos_cap_err, rp_pos_cap_err;
@@ -355,7 +336,9 @@ static int aer_inject(struct aer_error_inj *einj)
 				       uncor_mask_orig);
 	}
 
-	if (find_aer_device(rpdev, &edev)) {
+	device = pcie_port_find_device(rpdev, PCIE_PORT_SERVICE_AER);
+	if (device) {
+		edev = to_pcie_device(device);
 		if (!get_service_data(edev)) {
 			dev_warn(&edev->device,
 				 "aer_inject: AER service is not initialized\n");
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 7c37d815229e..5508285a0bff 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -466,6 +466,7 @@ struct device *pcie_port_find_device(struct pci_dev *dev,
 	device = pdrvs.dev;
 	return device;
 }
+EXPORT_SYMBOL_GPL(pcie_port_find_device);
 
 /**
  * pcie_port_device_remove - unregister PCI Express port service devices
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 04/12] PCI/AER: Abstract AER interrupt handling
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (2 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 03/12] PCI/AER: Reuse existing service device lookup Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 05/12] PCI/AER: Remove dead code Keith Busch
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The aer_inject module was directly calling aer_irq. This required the
aer driver export its private irq handler for no other reason than to
support error injection. A driver should not have to expose its private
interfaces, so this patch uses the irq subsystem to route injection to
the aer driver, and makes the aer_irq handler a private interface.

This are additional benefits this provides.

First, directly calling the irq handler bypasses the IRQ subsytem so
the injection wasn't really synthesizing what happens if a shared AER
interrupt occurs.

The error injection had to provide the callback data directly, which
may be racing with a removal that is freeing that structure. The irq
subsystem can handle that race.

Finally, if we ever need to modify the aer interrupt handling, for
example, to use threaded IRQs for the benefits those provide, abstracting
the interface from the error injection will make that easier to modify.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c        | 3 +--
 drivers/pci/pcie/aer_inject.c | 5 ++++-
 drivers/pci/pcie/portdrv.h    | 4 ----
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 6b59a23568f8..1318483a080c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1270,7 +1270,7 @@ static void aer_isr(struct work_struct *work)
  *
  * Invoked when Root Port detects AER messages.
  */
-irqreturn_t aer_irq(int irq, void *context)
+static irqreturn_t aer_irq(int irq, void *context)
 {
 	unsigned int status, id;
 	struct pcie_device *pdev = (struct pcie_device *)context;
@@ -1319,7 +1319,6 @@ irqreturn_t aer_irq(int irq, void *context)
 
 	return IRQ_HANDLED;
 }
-EXPORT_SYMBOL_GPL(aer_irq);
 
 static int set_device_error_reporting(struct pci_dev *dev, void *data)
 {
diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 7d641a543194..57821bb61c2a 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/ftrace.h>
+#include <linux/irq.h>
 #include <linux/kallsyms.h>
 #include <linux/kernel.h>
 #include <linux/linkage.h>
@@ -348,7 +349,9 @@ static int aer_inject(struct aer_error_inj *einj)
 		dev_info(&edev->device,
 			 "aer_inject: Injecting errors %08x/%08x into device %s\n",
 			 einj->cor_status, einj->uncor_status, pci_name(dev));
-		aer_irq(-1, edev);
+		local_irq_disable();
+		generic_handle_irq(edev->irq);
+		local_irq_enable();
 	} else {
 		pci_err(rpdev, "aer_inject: AER device not found\n");
 		ret = -ENODEV;
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index d59afa42fc14..127b8b246437 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -123,10 +123,6 @@ static inline int pcie_aer_get_firmware_first(struct pci_dev *pci_dev)
 }
 #endif
 
-#ifdef CONFIG_PCIEAER
-irqreturn_t aer_irq(int irq, void *context);
-#endif
-
 struct pcie_port_service_driver *pcie_port_find_service(struct pci_dev *dev,
 							u32 service);
 struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 05/12] PCI/AER: Remove dead code
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (3 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 04/12] PCI/AER: Abstract AER interrupt handling Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 06/12] PCI/AER: Remove error source from aer struct Keith Busch
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The error recovery callbacks are only run on child devices. A root port
is never a child device, so this error resume callback was never
invoked.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 1318483a080c..a66e33868303 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1545,18 +1545,6 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 	return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
 }
 
-/**
- * aer_error_resume - clean up corresponding error status bits
- * @dev: pointer to Root Port's pci_dev data structure
- *
- * Invoked by Port Bus driver during nonfatal recovery.
- */
-static void aer_error_resume(struct pci_dev *dev)
-{
-	pci_aer_clear_device_status(dev);
-	pci_cleanup_aer_uncorrect_error_status(dev);
-}
-
 static struct pcie_port_service_driver aerdriver = {
 	.name		= "aer",
 	.port_type	= PCI_EXP_TYPE_ROOT_PORT,
@@ -1564,7 +1552,6 @@ static struct pcie_port_service_driver aerdriver = {
 
 	.probe		= aer_probe,
 	.remove		= aer_remove,
-	.error_resume	= aer_error_resume,
 	.reset_link	= aer_root_reset,
 };
 
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 06/12] PCI/AER: Remove error source from aer struct
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (4 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 05/12] PCI/AER: Remove dead code Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 07/12] PCI/AER: Use kfifo for tracking events Keith Busch
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The aer struct was carrying a copy of the error source simply as a
temperary variable. This patch removes that from the structure and uses
a stack variable for the purpose.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a66e33868303..72f8b4d6be32 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -44,7 +44,6 @@ struct aer_rpc {
 	struct pci_dev *rpd;		/* Root Port device */
 	struct work_struct dpc_handler;
 	struct aer_err_source e_sources[AER_ERROR_SOURCES_MAX];
-	struct aer_err_info e_info;
 	unsigned short prod_idx;	/* Error Producer Index */
 	unsigned short cons_idx;	/* Error Consumer Index */
 	int isr;
@@ -1175,7 +1174,7 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 		struct aer_err_source *e_src)
 {
 	struct pci_dev *pdev = rpc->rpd;
-	struct aer_err_info *e_info = &rpc->e_info;
+	struct aer_err_info e_info;
 
 	pci_rootport_aer_stats_incr(pdev, e_src);
 
@@ -1184,36 +1183,36 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 	 * uncorrectable error being logged. Report correctable error first.
 	 */
 	if (e_src->status & PCI_ERR_ROOT_COR_RCV) {
-		e_info->id = ERR_COR_ID(e_src->id);
-		e_info->severity = AER_CORRECTABLE;
+		e_info.id = ERR_COR_ID(e_src->id);
+		e_info.severity = AER_CORRECTABLE;
 
 		if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV)
-			e_info->multi_error_valid = 1;
+			e_info.multi_error_valid = 1;
 		else
-			e_info->multi_error_valid = 0;
-		aer_print_port_info(pdev, e_info);
+			e_info.multi_error_valid = 0;
+		aer_print_port_info(pdev, &e_info);
 
-		if (find_source_device(pdev, e_info))
-			aer_process_err_devices(e_info);
+		if (find_source_device(pdev, &e_info))
+			aer_process_err_devices(&e_info);
 	}
 
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
-		e_info->id = ERR_UNCOR_ID(e_src->id);
+		e_info.id = ERR_UNCOR_ID(e_src->id);
 
 		if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
-			e_info->severity = AER_FATAL;
+			e_info.severity = AER_FATAL;
 		else
-			e_info->severity = AER_NONFATAL;
+			e_info.severity = AER_NONFATAL;
 
 		if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
-			e_info->multi_error_valid = 1;
+			e_info.multi_error_valid = 1;
 		else
-			e_info->multi_error_valid = 0;
+			e_info.multi_error_valid = 0;
 
-		aer_print_port_info(pdev, e_info);
+		aer_print_port_info(pdev, &e_info);
 
-		if (find_source_device(pdev, e_info))
-			aer_process_err_devices(e_info);
+		if (find_source_device(pdev, &e_info))
+			aer_process_err_devices(&e_info);
 	}
 }
 
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 07/12] PCI/AER: Use kfifo for tracking events
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (5 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 06/12] PCI/AER: Remove error source from aer struct Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 08/12] PCI/AER: Use kfifo helper inserting locked elements Keith Busch
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The kernel provides a generic fifo implementation, so no need to reinvent
that capability in a driver. This patch replaces the aer specific
implementation with the kernel provided kfifo. Since the interrupt
handler producer and work queue consumer run single threaded, there is
no need for additional locking, so this patch removes that lock too.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 88 +++++++-------------------------------------------
 1 file changed, 11 insertions(+), 77 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 72f8b4d6be32..813b5dd6fe68 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -30,7 +30,7 @@
 #include "../pci.h"
 #include "portdrv.h"
 
-#define AER_ERROR_SOURCES_MAX		100
+#define AER_ERROR_SOURCES_MAX		128
 
 #define AER_MAX_TYPEOF_COR_ERRS		16	/* as per PCI_ERR_COR_STATUS */
 #define AER_MAX_TYPEOF_UNCOR_ERRS	26	/* as per PCI_ERR_UNCOR_STATUS*/
@@ -43,14 +43,8 @@ struct aer_err_source {
 struct aer_rpc {
 	struct pci_dev *rpd;		/* Root Port device */
 	struct work_struct dpc_handler;
-	struct aer_err_source e_sources[AER_ERROR_SOURCES_MAX];
-	unsigned short prod_idx;	/* Error Producer Index */
-	unsigned short cons_idx;	/* Error Consumer Index */
+	DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
 	int isr;
-	spinlock_t e_lock;		/*
-					 * Lock access to Error Status/ID Regs
-					 * and error producer/consumer index
-					 */
 	struct mutex rpc_mutex;		/*
 					 * only one thread could do
 					 * recovery on the same
@@ -1216,35 +1210,6 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 	}
 }
 
-/**
- * get_e_source - retrieve an error source
- * @rpc: pointer to the root port which holds an error
- * @e_src: pointer to store retrieved error source
- *
- * Return 1 if an error source is retrieved, otherwise 0.
- *
- * Invoked by DPC handler to consume an error.
- */
-static int get_e_source(struct aer_rpc *rpc, struct aer_err_source *e_src)
-{
-	unsigned long flags;
-
-	/* Lock access to Root error producer/consumer index */
-	spin_lock_irqsave(&rpc->e_lock, flags);
-	if (rpc->prod_idx == rpc->cons_idx) {
-		spin_unlock_irqrestore(&rpc->e_lock, flags);
-		return 0;
-	}
-
-	*e_src = rpc->e_sources[rpc->cons_idx];
-	rpc->cons_idx++;
-	if (rpc->cons_idx == AER_ERROR_SOURCES_MAX)
-		rpc->cons_idx = 0;
-	spin_unlock_irqrestore(&rpc->e_lock, flags);
-
-	return 1;
-}
-
 /**
  * aer_isr - consume errors detected by root port
  * @work: definition of this work item
@@ -1257,7 +1222,7 @@ static void aer_isr(struct work_struct *work)
 	struct aer_err_source uninitialized_var(e_src);
 
 	mutex_lock(&rpc->rpc_mutex);
-	while (get_e_source(rpc, &e_src))
+	while (kfifo_get(&rpc->aer_fifo, &e_src))
 		aer_isr_one_error(rpc, &e_src);
 	mutex_unlock(&rpc->rpc_mutex);
 }
@@ -1271,51 +1236,23 @@ static void aer_isr(struct work_struct *work)
  */
 static irqreturn_t aer_irq(int irq, void *context)
 {
-	unsigned int status, id;
 	struct pcie_device *pdev = (struct pcie_device *)context;
 	struct aer_rpc *rpc = get_service_data(pdev);
-	int next_prod_idx;
-	unsigned long flags;
-	int pos;
-
-	pos = pdev->port->aer_cap;
-	/*
-	 * Must lock access to Root Error Status Reg, Root Error ID Reg,
-	 * and Root error producer/consumer index
-	 */
-	spin_lock_irqsave(&rpc->e_lock, flags);
+	struct pci_dev *rp = rpc->rpd;
+	struct aer_err_source e_src = {};
+	int pos = rp->aer_cap;
 
-	/* Read error status */
-	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, &status);
-	if (!(status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) {
-		spin_unlock_irqrestore(&rpc->e_lock, flags);
+	pci_read_config_dword(rp, pos + PCI_ERR_ROOT_STATUS, &e_src.status);
+	if (!(e_src.status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV)))
 		return IRQ_NONE;
-	}
 
-	/* Read error source and clear error status */
-	pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_ERR_SRC, &id);
-	pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, status);
+	pci_read_config_dword(rp, pos + PCI_ERR_ROOT_ERR_SRC, &e_src.id);
+	pci_write_config_dword(rp, pos + PCI_ERR_ROOT_STATUS, e_src.status);
 
-	/* Store error source for later DPC handler */
-	next_prod_idx = rpc->prod_idx + 1;
-	if (next_prod_idx == AER_ERROR_SOURCES_MAX)
-		next_prod_idx = 0;
-	if (next_prod_idx == rpc->cons_idx) {
-		/*
-		 * Error Storm Condition - possibly the same error occurred.
-		 * Drop the error.
-		 */
-		spin_unlock_irqrestore(&rpc->e_lock, flags);
+	if (!kfifo_put(&rpc->aer_fifo, e_src))
 		return IRQ_HANDLED;
-	}
-	rpc->e_sources[rpc->prod_idx].status =  status;
-	rpc->e_sources[rpc->prod_idx].id = id;
-	rpc->prod_idx = next_prod_idx;
-	spin_unlock_irqrestore(&rpc->e_lock, flags);
 
-	/*  Invoke DPC handler */
 	schedule_work(&rpc->dpc_handler);
-
 	return IRQ_HANDLED;
 }
 
@@ -1439,9 +1376,6 @@ static struct aer_rpc *aer_alloc_rpc(struct pcie_device *dev)
 	if (!rpc)
 		return NULL;
 
-	/* Initialize Root lock access, e_lock, to Root Error Status Reg */
-	spin_lock_init(&rpc->e_lock);
-
 	rpc->rpd = dev->port;
 	INIT_WORK(&rpc->dpc_handler, aer_isr);
 	mutex_init(&rpc->rpc_mutex);
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 08/12] PCI/AER: Use kfifo helper inserting locked elements
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (6 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 07/12] PCI/AER: Use kfifo for tracking events Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 09/12] PCI/AER: Don't read upstream ports below fatal errors Keith Busch
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

This patch uses the recommended kernel API for writing to a concurrently
accessed kfifo. No functional change here.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 813b5dd6fe68..33bbcaa41f65 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1063,7 +1063,6 @@ static DECLARE_WORK(aer_recover_work, aer_recover_work_func);
 void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
 		       int severity, struct aer_capability_regs *aer_regs)
 {
-	unsigned long flags;
 	struct aer_recover_entry entry = {
 		.bus		= bus,
 		.devfn		= devfn,
@@ -1072,13 +1071,12 @@ void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
 		.regs		= aer_regs,
 	};
 
-	spin_lock_irqsave(&aer_recover_ring_lock, flags);
-	if (kfifo_put(&aer_recover_ring, entry))
+	if (kfifo_in_spinlocked(&aer_recover_ring, &entry, sizeof(entry),
+				 &aer_recover_ring_lock))
 		schedule_work(&aer_recover_work);
 	else
 		pr_err("AER recover: Buffer overflow when recovering AER for %04x:%02x:%02x:%x\n",
 		       domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
-	spin_unlock_irqrestore(&aer_recover_ring_lock, flags);
 }
 EXPORT_SYMBOL_GPL(aer_recover_queue);
 #endif
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 09/12] PCI/AER: Don't read upstream ports below fatal errors
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (7 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 08/12] PCI/AER: Use kfifo helper inserting locked elements Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 10/12] PCI/AER: Use threaded IRQ for bottom half Keith Busch
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The AER driver has never read the config space of an end device
that reported a fatal error because the link to that device is considered
unreliable.

An ERR_FATAL from upstream port was almost certainly detected that error
on its upstream link, so we can't expect to reliably read its config
space for the same reason.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 33bbcaa41f65..41c36916d46c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1111,8 +1111,9 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 			&info->mask);
 		if (!(info->status & ~info->mask))
 			return 0;
-	} else if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE ||
-		info->severity == AER_NONFATAL) {
+	} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	           pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+		   info->severity == AER_NONFATAL) {
 
 		/* Link is still healthy for IO reads */
 		pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS,
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 10/12] PCI/AER: Use threaded IRQ for bottom half
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (8 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 09/12] PCI/AER: Don't read upstream ports below fatal errors Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-18 23:58 ` [PATCH 11/12] PCI/AER: Use managed resource allocations Keith Busch
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

The threaded IRQ is naturally single threaded as desired, so use that
to simplify the aer bottom half handler. Since the root port structure
is has much less to do now, this patch removes the rpc construction
helper routine.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 62 ++++++++++++--------------------------------------
 1 file changed, 14 insertions(+), 48 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 41c36916d46c..1878d9d7760b 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -42,14 +42,7 @@ struct aer_err_source {
 
 struct aer_rpc {
 	struct pci_dev *rpd;		/* Root Port device */
-	struct work_struct dpc_handler;
 	DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
-	int isr;
-	struct mutex rpc_mutex;		/*
-					 * only one thread could do
-					 * recovery on the same
-					 * root port hierarchy
-					 */
 };
 
 /* AER stats for the device */
@@ -1215,15 +1208,18 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
  *
  * Invoked, as DPC, when root port records new detected error
  */
-static void aer_isr(struct work_struct *work)
+static irqreturn_t aer_isr(int irq, void *context)
 {
-	struct aer_rpc *rpc = container_of(work, struct aer_rpc, dpc_handler);
+	struct pcie_device *dev = (struct pcie_device *)context;
+	struct aer_rpc *rpc = get_service_data(dev);
 	struct aer_err_source uninitialized_var(e_src);
 
-	mutex_lock(&rpc->rpc_mutex);
+	if (kfifo_is_empty(&rpc->aer_fifo))
+		return IRQ_NONE;
+
 	while (kfifo_get(&rpc->aer_fifo, &e_src))
 		aer_isr_one_error(rpc, &e_src);
-	mutex_unlock(&rpc->rpc_mutex);
+	return IRQ_HANDLED;
 }
 
 /**
@@ -1251,8 +1247,7 @@ static irqreturn_t aer_irq(int irq, void *context)
 	if (!kfifo_put(&rpc->aer_fifo, e_src))
 		return IRQ_HANDLED;
 
-	schedule_work(&rpc->dpc_handler);
-	return IRQ_HANDLED;
+	return IRQ_WAKE_THREAD;
 }
 
 static int set_device_error_reporting(struct pci_dev *dev, void *data)
@@ -1361,30 +1356,6 @@ static void aer_disable_rootport(struct aer_rpc *rpc)
 	pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, reg32);
 }
 
-/**
- * aer_alloc_rpc - allocate Root Port data structure
- * @dev: pointer to the pcie_dev data structure
- *
- * Invoked when Root Port's AER service is loaded.
- */
-static struct aer_rpc *aer_alloc_rpc(struct pcie_device *dev)
-{
-	struct aer_rpc *rpc;
-
-	rpc = kzalloc(sizeof(struct aer_rpc), GFP_KERNEL);
-	if (!rpc)
-		return NULL;
-
-	rpc->rpd = dev->port;
-	INIT_WORK(&rpc->dpc_handler, aer_isr);
-	mutex_init(&rpc->rpc_mutex);
-
-	/* Use PCIe bus function to store rpc into PCIe device */
-	set_service_data(dev, rpc);
-
-	return rpc;
-}
-
 /**
  * aer_remove - clean up resources
  * @dev: pointer to the pcie_dev data structure
@@ -1396,11 +1367,6 @@ static void aer_remove(struct pcie_device *dev)
 	struct aer_rpc *rpc = get_service_data(dev);
 
 	if (rpc) {
-		/* If register interrupt service, it must be free. */
-		if (rpc->isr)
-			free_irq(dev->irq, dev);
-
-		flush_work(&rpc->dpc_handler);
 		aer_disable_rootport(rpc);
 		kfree(rpc);
 		set_service_data(dev, NULL);
@@ -1417,18 +1383,20 @@ static int aer_probe(struct pcie_device *dev)
 {
 	int status;
 	struct aer_rpc *rpc;
-	struct device *device = &dev->port->dev;
+	struct device *device = &dev->port->device;
 
 	/* Alloc rpc data structure */
-	rpc = aer_alloc_rpc(dev);
+	rpc = kzalloc(sizeof(struct aer_rpc), GFP_KERNEL);
 	if (!rpc) {
 		dev_printk(KERN_DEBUG, device, "alloc AER rpc failed\n");
-		aer_remove(dev);
 		return -ENOMEM;
 	}
+	rpc->rpd = dev->port;
+	set_service_data(dev, rpc);
 
 	/* Request IRQ ISR */
-	status = request_irq(dev->irq, aer_irq, IRQF_SHARED, "aerdrv", dev);
+	status = request_threaded_irq(dev->irq, aer_irq, aer_isr,
+				      IRQF_SHARED, "aerdrv", dev);
 	if (status) {
 		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
 			   dev->irq);
@@ -1436,8 +1404,6 @@ static int aer_probe(struct pcie_device *dev)
 		return status;
 	}
 
-	rpc->isr = 1;
-
 	aer_enable_rootport(rpc);
 	dev_info(device, "AER enabled with IRQ %d\n", dev->irq);
 	return 0;
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 11/12] PCI/AER: Use managed resource allocations
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (9 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 10/12] PCI/AER: Use threaded IRQ for bottom half Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-19 16:29   ` Sinan Kaya
  2018-09-25  1:13   ` Benjamin Herrenschmidt
  2018-09-18 23:58 ` [PATCH 12/12] PCI/pciehp: Use device managed allocations Keith Busch
  2018-10-04 21:40 ` [PATCH 00/12] error handling and pciehp maintenance Bjorn Helgaas
  12 siblings, 2 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

This uses the managed device resource allocations for the service data
so that the aer driver doesn't need to manage it, further simplifying
this driver.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 1878d9d7760b..7ecad011458d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1366,11 +1366,7 @@ static void aer_remove(struct pcie_device *dev)
 {
 	struct aer_rpc *rpc = get_service_data(dev);
 
-	if (rpc) {
-		aer_disable_rootport(rpc);
-		kfree(rpc);
-		set_service_data(dev, NULL);
-	}
+	aer_disable_rootport(rpc);
 }
 
 /**
@@ -1383,10 +1379,9 @@ static int aer_probe(struct pcie_device *dev)
 {
 	int status;
 	struct aer_rpc *rpc;
-	struct device *device = &dev->port->device;
+	struct device *device = &dev->device;
 
-	/* Alloc rpc data structure */
-	rpc = kzalloc(sizeof(struct aer_rpc), GFP_KERNEL);
+	rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
 	if (!rpc) {
 		dev_printk(KERN_DEBUG, device, "alloc AER rpc failed\n");
 		return -ENOMEM;
@@ -1394,13 +1389,11 @@ static int aer_probe(struct pcie_device *dev)
 	rpc->rpd = dev->port;
 	set_service_data(dev, rpc);
 
-	/* Request IRQ ISR */
-	status = request_threaded_irq(dev->irq, aer_irq, aer_isr,
-				      IRQF_SHARED, "aerdrv", dev);
+	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
+					   IRQF_SHARED, "aerdrv", dev);
 	if (status) {
 		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
 			   dev->irq);
-		aer_remove(dev);
 		return status;
 	}
 
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 12/12] PCI/pciehp: Use device managed allocations
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (10 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 11/12] PCI/AER: Use managed resource allocations Keith Busch
@ 2018-09-18 23:58 ` Keith Busch
  2018-09-19 15:11   ` Sinan Kaya
  2018-09-22 18:10   ` Lukas Wunner
  2018-10-04 21:40 ` [PATCH 00/12] error handling and pciehp maintenance Bjorn Helgaas
  12 siblings, 2 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-18 23:58 UTC (permalink / raw)
  To: Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg, Keith Busch

All of pciehp's resources are tied to the lifetime of the device it is
driving. This patch simplifies the resource tracking by using the device
managed resource allocations.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/hotplug/pciehp_core.c | 14 +++---------
 drivers/pci/hotplug/pciehp_hpc.c  | 48 +++++++++++----------------------------
 2 files changed, 16 insertions(+), 46 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 334044814dbe..c9ae89f25e8c 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -58,16 +58,16 @@ static int init_slot(struct controller *ctrl)
 	char name[SLOT_NAME_SIZE];
 	int retval = -ENOMEM;
 
-	hotplug = kzalloc(sizeof(*hotplug), GFP_KERNEL);
+	hotplug = devm_kzalloc(&ctrl->pcie->device, sizeof(*hotplug), GFP_KERNEL);
 	if (!hotplug)
 		goto out;
 
-	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	info = devm_kzalloc(&ctrl->pcie->device, sizeof(*info), GFP_KERNEL);
 	if (!info)
 		goto out;
 
 	/* Setup hotplug slot ops */
-	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
+	ops = devm_kzalloc(&ctrl->pcie->device, sizeof(*ops), GFP_KERNEL);
 	if (!ops)
 		goto out;
 
@@ -98,11 +98,6 @@ static int init_slot(struct controller *ctrl)
 	if (retval)
 		ctrl_err(ctrl, "pci_hp_initialize failed: error %d\n", retval);
 out:
-	if (retval) {
-		kfree(ops);
-		kfree(info);
-		kfree(hotplug);
-	}
 	return retval;
 }
 
@@ -111,9 +106,6 @@ static void cleanup_slot(struct controller *ctrl)
 	struct hotplug_slot *hotplug_slot = ctrl->slot->hotplug_slot;
 
 	pci_hp_destroy(hotplug_slot);
-	kfree(hotplug_slot->ops);
-	kfree(hotplug_slot->info);
-	kfree(hotplug_slot);
 }
 
 /*
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 13650f079188..72c22e9c0b63 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -45,22 +45,15 @@ static inline int pciehp_request_irq(struct controller *ctrl)
 	}
 
 	/* Installs the interrupt handler */
-	retval = request_threaded_irq(irq, pciehp_isr, pciehp_ist,
-				      IRQF_SHARED, MY_NAME, ctrl);
+	retval = devm_request_threaded_irq(&ctrl->pcie->device, irq, pciehp_isr,
+					   pciehp_ist, IRQF_SHARED, MY_NAME,
+					   ctrl);
 	if (retval)
 		ctrl_err(ctrl, "Cannot get irq %d for the hotplug controller\n",
 			 irq);
 	return retval;
 }
 
-static inline void pciehp_free_irq(struct controller *ctrl)
-{
-	if (pciehp_poll_mode)
-		kthread_stop(ctrl->poll_thread);
-	else
-		free_irq(ctrl->pcie->irq, ctrl);
-}
-
 static int pcie_poll_cmd(struct controller *ctrl, int timeout)
 {
 	struct pci_dev *pdev = ctrl_dev(ctrl);
@@ -780,17 +773,14 @@ int pcie_init_notification(struct controller *ctrl)
 	if (pciehp_request_irq(ctrl))
 		return -1;
 	pcie_enable_notification(ctrl);
-	ctrl->notification_enabled = 1;
 	return 0;
 }
 
 void pcie_shutdown_notification(struct controller *ctrl)
 {
-	if (ctrl->notification_enabled) {
-		pcie_disable_notification(ctrl);
-		pciehp_free_irq(ctrl);
-		ctrl->notification_enabled = 0;
-	}
+	pcie_disable_notification(ctrl);
+	if (pciehp_poll_mode)
+		kthread_stop(ctrl->poll_thread);
 }
 
 static int pcie_init_slot(struct controller *ctrl)
@@ -798,7 +788,7 @@ static int pcie_init_slot(struct controller *ctrl)
 	struct pci_bus *subordinate = ctrl_dev(ctrl)->subordinate;
 	struct slot *slot;
 
-	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
+	slot = devm_kzalloc(&ctrl->pcie->device, sizeof(*slot), GFP_KERNEL);
 	if (!slot)
 		return -ENOMEM;
 
@@ -813,14 +803,6 @@ static int pcie_init_slot(struct controller *ctrl)
 	return 0;
 }
 
-static void pcie_cleanup_slot(struct controller *ctrl)
-{
-	struct slot *slot = ctrl->slot;
-
-	cancel_delayed_work_sync(&slot->work);
-	kfree(slot);
-}
-
 static inline void dbg_ctrl(struct controller *ctrl)
 {
 	struct pci_dev *pdev = ctrl->pcie->port;
@@ -845,9 +827,9 @@ struct controller *pcie_init(struct pcie_device *dev)
 	u8 occupied, poweron;
 	struct pci_dev *pdev = dev->port;
 
-	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+	ctrl = devm_kzalloc(&dev->device, sizeof(*ctrl), GFP_KERNEL);
 	if (!ctrl)
-		goto abort;
+		return NULL;
 
 	ctrl->pcie = dev;
 	pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
@@ -893,7 +875,7 @@ struct controller *pcie_init(struct pcie_device *dev)
 		pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
 
 	if (pcie_init_slot(ctrl))
-		goto abort_ctrl;
+		return NULL;
 
 	/*
 	 * If empty slot's power status is on, turn power off.  The IRQ isn't
@@ -909,17 +891,13 @@ struct controller *pcie_init(struct pcie_device *dev)
 	}
 
 	return ctrl;
-
-abort_ctrl:
-	kfree(ctrl);
-abort:
-	return NULL;
 }
 
 void pciehp_release_ctrl(struct controller *ctrl)
 {
-	pcie_cleanup_slot(ctrl);
-	kfree(ctrl);
+	struct slot *slot = ctrl->slot;
+
+	cancel_delayed_work_sync(&slot->work);
 }
 
 static void quirk_cmd_compl(struct pci_dev *pdev)
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 12/12] PCI/pciehp: Use device managed allocations
  2018-09-18 23:58 ` [PATCH 12/12] PCI/pciehp: Use device managed allocations Keith Busch
@ 2018-09-19 15:11   ` Sinan Kaya
  2018-09-19 16:17     ` Keith Busch
  2018-09-22 18:10   ` Lukas Wunner
  1 sibling, 1 reply; 41+ messages in thread
From: Sinan Kaya @ 2018-09-19 15:11 UTC (permalink / raw)
  To: Keith Busch, Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Thomas Tai, poza, Lukas Wunner,
	Christoph Hellwig, Mika Westerberg

On 9/18/2018 7:58 PM, Keith Busch wrote:
> -	ctrl->notification_enabled = 1;
>   	return 0;
>   }
>   
>   void pcie_shutdown_notification(struct controller *ctrl)
>   {
> -	if (ctrl->notification_enabled) {
> -		pcie_disable_notification(ctrl);
> -		pciehp_free_irq(ctrl);
> -		ctrl->notification_enabled = 0;
> -	}
> +	pcie_disable_notification(ctrl);
> +	if (pciehp_poll_mode)
> +		kthread_stop(ctrl->poll_thread);
>   }
>   

I think this notification_enabled bit change needs to go to
another path. The rest of the change in this file is pretty mechanic changes.
Also, are you going to remove the notification_enabled member?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 12/12] PCI/pciehp: Use device managed allocations
  2018-09-19 15:11   ` Sinan Kaya
@ 2018-09-19 16:17     ` Keith Busch
  0 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-19 16:17 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Thomas Tai,
	poza, Lukas Wunner, Christoph Hellwig, Mika Westerberg

On Wed, Sep 19, 2018 at 11:11:19AM -0400, Sinan Kaya wrote:
> On 9/18/2018 7:58 PM, Keith Busch wrote:
> > -	ctrl->notification_enabled = 1;
> >   	return 0;
> >   }
> >   void pcie_shutdown_notification(struct controller *ctrl)
> >   {
> > -	if (ctrl->notification_enabled) {
> > -		pcie_disable_notification(ctrl);
> > -		pciehp_free_irq(ctrl);
> > -		ctrl->notification_enabled = 0;
> > -	}
> > +	pcie_disable_notification(ctrl);
> > +	if (pciehp_poll_mode)
> > +		kthread_stop(ctrl->poll_thread);
> >   }
> 
> I think this notification_enabled bit change needs to go to
> another path. The rest of the change in this file is pretty mechanic changes.
> Also, are you going to remove the notification_enabled member?

Right, we don't need it, and I should have removed it. There is no path
that would call pcie_shutdown_notification if it hadn't been enabled.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/12] PCI/AER: Use managed resource allocations
  2018-09-18 23:58 ` [PATCH 11/12] PCI/AER: Use managed resource allocations Keith Busch
@ 2018-09-19 16:29   ` Sinan Kaya
  2018-09-19 17:25     ` Keith Busch
  2018-09-25  1:13   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 41+ messages in thread
From: Sinan Kaya @ 2018-09-19 16:29 UTC (permalink / raw)
  To: Keith Busch, Linux PCI, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Thomas Tai, poza, Lukas Wunner,
	Christoph Hellwig, Mika Westerberg

On 9/18/2018 7:58 PM, Keith Busch wrote:
>   	if (status) {
>   		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
>   			   dev->irq);
> -		aer_remove(dev);
>   		return status;
>   	}
>   

Don't we still need to call aer_remove() here?

Old code would call aer_disable_rootport(rpc) via aer_remove() on IRQ allocation
failure. We are no longer doing this.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/12] PCI/AER: Use managed resource allocations
  2018-09-19 16:29   ` Sinan Kaya
@ 2018-09-19 17:25     ` Keith Busch
  2018-09-19 17:36       ` Sinan Kaya
  0 siblings, 1 reply; 41+ messages in thread
From: Keith Busch @ 2018-09-19 17:25 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Thomas Tai,
	poza, Lukas Wunner, Christoph Hellwig, Mika Westerberg

On Wed, Sep 19, 2018 at 12:29:15PM -0400, Sinan Kaya wrote:
> On 9/18/2018 7:58 PM, Keith Busch wrote:
> >   	if (status) {
> >   		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
> >   			   dev->irq);
> > -		aer_remove(dev);
> >   		return status;
> >   	}
> 
> Don't we still need to call aer_remove() here?
> 
> Old code would call aer_disable_rootport(rpc) via aer_remove() on IRQ allocation
> failure. We are no longer doing this.

We need to call aer_disable_rootport only if aer_enable_rootport was
called, but that happens *after* irq allocation.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/12] PCI/AER: Use managed resource allocations
  2018-09-19 17:25     ` Keith Busch
@ 2018-09-19 17:36       ` Sinan Kaya
  0 siblings, 0 replies; 41+ messages in thread
From: Sinan Kaya @ 2018-09-19 17:36 UTC (permalink / raw)
  To: Keith Busch
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Thomas Tai,
	poza, Lukas Wunner, Christoph Hellwig, Mika Westerberg

On 9/19/2018 1:25 PM, Keith Busch wrote:
> On Wed, Sep 19, 2018 at 12:29:15PM -0400, Sinan Kaya wrote:
>> On 9/18/2018 7:58 PM, Keith Busch wrote:
>>>    	if (status) {
>>>    		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
>>>    			   dev->irq);
>>> -		aer_remove(dev);
>>>    		return status;
>>>    	}
>>
>> Don't we still need to call aer_remove() here?
>>
>> Old code would call aer_disable_rootport(rpc) via aer_remove() on IRQ allocation
>> failure. We are no longer doing this.
> 
> We need to call aer_disable_rootport only if aer_enable_rootport was
> called, but that happens *after* irq allocation.
> 

I see. Thanks for clarification.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 12/12] PCI/pciehp: Use device managed allocations
  2018-09-18 23:58 ` [PATCH 12/12] PCI/pciehp: Use device managed allocations Keith Busch
  2018-09-19 15:11   ` Sinan Kaya
@ 2018-09-22 18:10   ` Lukas Wunner
  2018-09-24 23:43     ` Bjorn Helgaas
  1 sibling, 1 reply; 41+ messages in thread
From: Lukas Wunner @ 2018-09-22 18:10 UTC (permalink / raw)
  To: Keith Busch
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Sinan Kaya,
	Thomas Tai, poza, Christoph Hellwig, Mika Westerberg

On Tue, Sep 18, 2018 at 05:58:48PM -0600, Keith Busch wrote:
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -58,16 +58,16 @@ static int init_slot(struct controller *ctrl)
>  	char name[SLOT_NAME_SIZE];
>  	int retval = -ENOMEM;
>  
> -	hotplug = kzalloc(sizeof(*hotplug), GFP_KERNEL);
> +	hotplug = devm_kzalloc(&ctrl->pcie->device, sizeof(*hotplug), GFP_KERNEL);
>  	if (!hotplug)
>  		goto out;
>  
> -	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	info = devm_kzalloc(&ctrl->pcie->device, sizeof(*info), GFP_KERNEL);
>  	if (!info)
>  		goto out;
>  
>  	/* Setup hotplug slot ops */
> -	ops = kzalloc(sizeof(*ops), GFP_KERNEL);
> +	ops = devm_kzalloc(&ctrl->pcie->device, sizeof(*ops), GFP_KERNEL);
>  	if (!ops)
>  		goto out;

The "hotplug" and "info" allocations are gone on the pci/hotplug branch,
so this needs a rebase.


> @@ -111,9 +106,6 @@ static void cleanup_slot(struct controller *ctrl)
>  	struct hotplug_slot *hotplug_slot = ctrl->slot->hotplug_slot;
>  
>  	pci_hp_destroy(hotplug_slot);
> -	kfree(hotplug_slot->ops);
> -	kfree(hotplug_slot->info);
> -	kfree(hotplug_slot);
>  }

Please replace all invocations of cleanup_slot() with a direct call to
pci_hp_destroy() since this is now becoming merely a wrapper function.


> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -45,22 +45,15 @@ static inline int pciehp_request_irq(struct controller *ctrl)
>  	}
>  
>  	/* Installs the interrupt handler */
> -	retval = request_threaded_irq(irq, pciehp_isr, pciehp_ist,
> -				      IRQF_SHARED, MY_NAME, ctrl);
> +	retval = devm_request_threaded_irq(&ctrl->pcie->device, irq, pciehp_isr,
> +					   pciehp_ist, IRQF_SHARED, MY_NAME,
> +					   ctrl);

While using devm_kzalloc() for the "ops" and "ctrl" allocations is fine,
using devm_request_threaded_irq() is *not* because it allows a
use-after-free of the hotplug_slot_name():

With your patch the teardown order becomes:
  pci_hp_del(&ctrl->hotplug_slot);       # After this user space can no longer
                                         # issue enable/disable requests (but
				         # ->reset_slot() is still possible).
  pcie_disable_notification();           # After this, the IRQ thread is no
                                         # longer woken because of slot events.
  pci_hp_destroy(hotplug_slot);          # After this, hotplug_slot_name() must
                                         # no longer be called.
  cancel_delayed_work_sync(&slot->work); # After this, the IRQ thread is no
                                         # longer woken by the button_work.

What can happen here is the button_work gets scheduled before it is
canceled, it wakes up the IRQ thread, the IRQ thread brings up or
down the slot and prints messages to the log which include the
hotplug_slot_name(), leading to a use-after-free.

This cannot happen with what's in v4.19 or on the pci/hotplug branch
because the IRQ is released right after the call to pci_hp_del().
If the button_work happens to be scheduled afterwards, it will call
irq_wake_thread(ctrl->pcie->irq, ctrl).  Now if you look at the
implementation of irq_wake_thread(), it iterates over all the
actions of the IRQ (in case it's shared by multiple devices) and
finds the action which matches the ctrl pointer.  But because we've
already released the IRQ at that point, it won't find the IRQ action
and just return.  So irq_wake_thread() essentially becomes a no-op.

The teardown order is very meticulously designed, I went over it
dozens of times to ensure it's bullet-proof.  The IRQ really has to
be released at exactly the right time and using a device-managed IRQ
breaks the correct order I'm afraid.

While I had thought of using devm_kzalloc(), I also knew that Bjorn
wants to get rid of portdrv and I didn't want to make any changes that
might constrain the future design of portdrv.  Right now, portdrv
creates a sub-device for each service and service drivers bind to it.
An alternative design could either be a single driver that binds to
ports and handles all services at once (probably wouldn't be all that
different to the current situation), or no driver would be bound to
ports and the functionality of the port services would be handled
within the core.  In either case we have the pci_dev of the port to
attach the device-managed allocations to, so I suppose using
devm_kzalloc() here isn't constraining us all that much, hence
seems fine.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 12/12] PCI/pciehp: Use device managed allocations
  2018-09-22 18:10   ` Lukas Wunner
@ 2018-09-24 23:43     ` Bjorn Helgaas
  2018-09-25  7:13       ` Lukas Wunner
  0 siblings, 1 reply; 41+ messages in thread
From: Bjorn Helgaas @ 2018-09-24 23:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt,
	Sinan Kaya, Thomas Tai, poza, Christoph Hellwig, Mika Westerberg

On Sat, Sep 22, 2018 at 08:10:57PM +0200, Lukas Wunner wrote:
> On Tue, Sep 18, 2018 at 05:58:48PM -0600, Keith Busch wrote:
> > --- a/drivers/pci/hotplug/pciehp_core.c
> > +++ b/drivers/pci/hotplug/pciehp_core.c

> >  	/* Installs the interrupt handler */
> > -	retval = request_threaded_irq(irq, pciehp_isr, pciehp_ist,
> > -				      IRQF_SHARED, MY_NAME, ctrl);
> > +	retval = devm_request_threaded_irq(&ctrl->pcie->device, irq, pciehp_isr,
> > +					   pciehp_ist, IRQF_SHARED, MY_NAME,
> > +					   ctrl);
> 
> While using devm_kzalloc() for the "ops" and "ctrl" allocations is fine,
> using devm_request_threaded_irq() is *not* because it allows a
> use-after-free of the hotplug_slot_name():
> 
> With your patch the teardown order becomes:
>   pci_hp_del(&ctrl->hotplug_slot);       # After this user space can no longer
>                                          # issue enable/disable requests (but
>                                          # ->reset_slot() is still possible).
>   pcie_disable_notification();           # After this, the IRQ thread is no
>                                          # longer woken because of slot events.
>   pci_hp_destroy(hotplug_slot);          # After this, hotplug_slot_name() must
>                                          # no longer be called.
>   cancel_delayed_work_sync(&slot->work); # After this, the IRQ thread is no
>                                          # longer woken by the button_work.
> 
> What can happen here is the button_work gets scheduled before it is
> canceled, it wakes up the IRQ thread, the IRQ thread brings up or
> down the slot and prints messages to the log which include the
> hotplug_slot_name(), leading to a use-after-free.
> 
> This cannot happen with what's in v4.19 or on the pci/hotplug branch
> because the IRQ is released right after the call to pci_hp_del().
> If the button_work happens to be scheduled afterwards, it will call
> irq_wake_thread(ctrl->pcie->irq, ctrl).  Now if you look at the
> implementation of irq_wake_thread(), it iterates over all the
> actions of the IRQ (in case it's shared by multiple devices) and
> finds the action which matches the ctrl pointer.  But because we've
> already released the IRQ at that point, it won't find the IRQ action
> and just return.  So irq_wake_thread() essentially becomes a no-op.
> 
> The teardown order is very meticulously designed, I went over it
> dozens of times to ensure it's bullet-proof.  The IRQ really has to
> be released at exactly the right time and using a device-managed IRQ
> breaks the correct order I'm afraid.

Thanks for the very detailed analysis!  

This seems like a very subtle issue that we're likely to trip over
again.  I assume most drivers can use device-managed IRQs safely, but
it's a problem here because of the extra complexity around
hotplug_slot?  Is there some way we can make it less subtle, e.g., by
moving the slot cleanup into the pci_destroy_dev() path or something?

Is there some value we get from having both struct hotplug_slot and
struct pci_slot?

> While I had thought of using devm_kzalloc(), I also knew that Bjorn
> wants to get rid of portdrv and I didn't want to make any changes that
> might constrain the future design of portdrv.  Right now, portdrv
> creates a sub-device for each service and service drivers bind to it.
> An alternative design could either be a single driver that binds to
> ports and handles all services at once (probably wouldn't be all that
> different to the current situation), or no driver would be bound to
> ports and the functionality of the port services would be handled
> within the core.  In either case we have the pci_dev of the port to
> attach the device-managed allocations to, so I suppose using
> devm_kzalloc() here isn't constraining us all that much, hence
> seems fine.

I'd *like* to get rid of portdrv, but I don't see a clear path to do
that yet, so it might be impractical.  One reason I don't like it is
that only one driver can bind to a device, so portdrv prevents us from
having drivers for device-specific bridge features like performance
monitors.  For that reason, I'd like to have the architected services
handled directly in the core.

Bjorn

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/12] PCI/AER: Use managed resource allocations
  2018-09-18 23:58 ` [PATCH 11/12] PCI/AER: Use managed resource allocations Keith Busch
  2018-09-19 16:29   ` Sinan Kaya
@ 2018-09-25  1:13   ` Benjamin Herrenschmidt
  2018-09-25 14:17     ` Keith Busch
  1 sibling, 1 reply; 41+ messages in thread
From: Benjamin Herrenschmidt @ 2018-09-25  1:13 UTC (permalink / raw)
  To: Keith Busch, Linux PCI, Bjorn Helgaas
  Cc: Sinan Kaya, Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg

On Tue, 2018-09-18 at 17:58 -0600, Keith Busch wrote:
> This uses the managed device resource allocations for the service data
> so that the aer driver doesn't need to manage it, further simplifying
> this driver.

Just be careful (it migh be ok, I haven't audited everything, but I got
bitten by something like that in the past) that the devm stuff will get
disposed of in two cases:

 - The owner device going away (so far so good)

 - The owner device's driver being unbound

The latter is something not completely obvious, ie, even if the owner
device still has held references, the successful completion of
->remove() on the driver will be followed by a cleanup of the managed
stuff.

As I said, it might be ok in the AER case, but you might want to at
least keep the set_service_data(dev, NULL) to make sure you don't leave
a stale pointer there.

Cheers,
Ben.

> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/aer.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 1878d9d7760b..7ecad011458d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1366,11 +1366,7 @@ static void aer_remove(struct pcie_device *dev)
>  {
>  	struct aer_rpc *rpc = get_service_data(dev);
>  
> -	if (rpc) {
> -		aer_disable_rootport(rpc);
> -		kfree(rpc);
> -		set_service_data(dev, NULL);
> -	}
> +	aer_disable_rootport(rpc);
>  }
>  
>  /**
> @@ -1383,10 +1379,9 @@ static int aer_probe(struct pcie_device *dev)
>  {
>  	int status;
>  	struct aer_rpc *rpc;
> -	struct device *device = &dev->port->device;
> +	struct device *device = &dev->device;
>  
> -	/* Alloc rpc data structure */
> -	rpc = kzalloc(sizeof(struct aer_rpc), GFP_KERNEL);
> +	rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
>  	if (!rpc) {
>  		dev_printk(KERN_DEBUG, device, "alloc AER rpc failed\n");
>  		return -ENOMEM;
> @@ -1394,13 +1389,11 @@ static int aer_probe(struct pcie_device *dev)
>  	rpc->rpd = dev->port;
>  	set_service_data(dev, rpc);
>  
> -	/* Request IRQ ISR */
> -	status = request_threaded_irq(dev->irq, aer_irq, aer_isr,
> -				      IRQF_SHARED, "aerdrv", dev);
> +	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> +					   IRQF_SHARED, "aerdrv", dev);
>  	if (status) {
>  		dev_printk(KERN_DEBUG, device, "request AER IRQ %d failed\n",
>  			   dev->irq);
> -		aer_remove(dev);
>  		return status;
>  	}
>  


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 12/12] PCI/pciehp: Use device managed allocations
  2018-09-24 23:43     ` Bjorn Helgaas
@ 2018-09-25  7:13       ` Lukas Wunner
  0 siblings, 0 replies; 41+ messages in thread
From: Lukas Wunner @ 2018-09-25  7:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt,
	Sinan Kaya, Thomas Tai, poza, Christoph Hellwig, Mika Westerberg

On Mon, Sep 24, 2018 at 06:43:07PM -0500, Bjorn Helgaas wrote:
> On Sat, Sep 22, 2018 at 08:10:57PM +0200, Lukas Wunner wrote:
> > This cannot happen with what's in v4.19 or on the pci/hotplug branch
> > because the IRQ is released right after the call to pci_hp_del().
> > If the button_work happens to be scheduled afterwards, it will call
> > irq_wake_thread(ctrl->pcie->irq, ctrl).  Now if you look at the
> > implementation of irq_wake_thread(), it iterates over all the
> > actions of the IRQ (in case it's shared by multiple devices) and
> > finds the action which matches the ctrl pointer.  But because we've
> > already released the IRQ at that point, it won't find the IRQ action
> > and just return.  So irq_wake_thread() essentially becomes a no-op.
> 
> This seems like a very subtle issue that we're likely to trip over
> again.  I assume most drivers can use device-managed IRQs safely, but
> it's a problem here because of the extra complexity around
> hotplug_slot?  Is there some way we can make it less subtle, e.g., by
> moving the slot cleanup into the pci_destroy_dev() path or something?

The extra complexity is caused by the various ways in which events can
be triggered (Slot Status register change, sysfs, button_work) plus the
deregistration of hotplug_slot.  Teardown paths are always hard, I don't
really see a better code solution, but I should add more code comments to
make it less subtle.


> Is there some value we get from having both struct hotplug_slot and
> struct pci_slot?

When Alex Chiang introduced struct pci_slot, he thought of migrating
struct hotplug_slot into it.  See the "migrate over time" code comment
in commit f46753c5e354.  (It got changed to "move here" by 0aa0f5d1084c.)

On the current pci/hotplug branch, I've embedded struct hotplug_slot
in each hotplug driver's struct slot.  But that doesn't preclude
merging struct hotplug_slot into struct pci_slot, the hotplug drivers
would just have to embed a struct pci_slot instead, or struct pci_slot
would be embedded in struct hotplug_slot.

TBH the rationale behind all the different structs isn't quite clear
to me, e.g. struct pci_slot is also used for non-hotplug slots it seems,
see 8344b568f5bd.

(Note, Alex is no longer reachable under his hp.com address, same for
his canonical.com address.)


> > While I had thought of using devm_kzalloc(), I also knew that Bjorn
> > wants to get rid of portdrv and I didn't want to make any changes that
> > might constrain the future design of portdrv.  Right now, portdrv
> > creates a sub-device for each service and service drivers bind to it.
> > An alternative design could either be a single driver that binds to
> > ports and handles all services at once (probably wouldn't be all that
> > different to the current situation), or no driver would be bound to
> > ports and the functionality of the port services would be handled
> > within the core.  In either case we have the pci_dev of the port to
> > attach the device-managed allocations to, so I suppose using
> > devm_kzalloc() here isn't constraining us all that much, hence
> > seems fine.
> 
> I'd *like* to get rid of portdrv, but I don't see a clear path to do
> that yet, so it might be impractical.  One reason I don't like it is
> that only one driver can bind to a device, so portdrv prevents us from
> having drivers for device-specific bridge features like performance
> monitors.  For that reason, I'd like to have the architected services
> handled directly in the core.

Hm, well in that case we can't use devm_*() at all, as Benjamin
Herrenschmidt has just correctly pointed out, devres_release_all()
is called both on driver unbind in __device_release_driver() as well
as on device destruction in device_release().

Thus, if the port services are handled in the core to allow a driver
to bind to the port and devm_kzalloc() is used by the port services,
those allocations will be freed when that driver unbinds, and any
further use of the allocations is a use-after-free.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 11/12] PCI/AER: Use managed resource allocations
  2018-09-25  1:13   ` Benjamin Herrenschmidt
@ 2018-09-25 14:17     ` Keith Busch
  0 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-09-25 14:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linux PCI, Bjorn Helgaas, Sinan Kaya, Thomas Tai, poza,
	Lukas Wunner, Christoph Hellwig, Mika Westerberg

On Tue, Sep 25, 2018 at 11:13:42AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-09-18 at 17:58 -0600, Keith Busch wrote:
> > This uses the managed device resource allocations for the service data
> > so that the aer driver doesn't need to manage it, further simplifying
> > this driver.
> 
> Just be careful (it migh be ok, I haven't audited everything, but I got
> bitten by something like that in the past) that the devm stuff will get
> disposed of in two cases:
> 
>  - The owner device going away (so far so good)
> 
>  - The owner device's driver being unbound
> 
> The latter is something not completely obvious, ie, even if the owner
> device still has held references, the successful completion of
> ->remove() on the driver will be followed by a cleanup of the managed
> stuff.
> 
> As I said, it might be ok in the AER case, but you might want to at
> least keep the set_service_data(dev, NULL) to make sure you don't leave
> a stale pointer there.

Yes, these resource methods should be considered carefully. I think
we're okay here, and didn't want to set service data to NULL for a
couple reasons:

 1. The service data and its device are released together, so the device
    is already out of scope before it could hold a stale pointer.

 2. It is possible the IRQ handler may be invoked after 'remove', but
    before the managed irq is torn down. Leaving the service data set
    while it is allocated removes a requirement to check for NULL on
    each interrupt.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
                   ` (11 preceding siblings ...)
  2018-09-18 23:58 ` [PATCH 12/12] PCI/pciehp: Use device managed allocations Keith Busch
@ 2018-10-04 21:40 ` Bjorn Helgaas
  2018-10-04 22:11   ` Keith Busch
  12 siblings, 1 reply; 41+ messages in thread
From: Bjorn Helgaas @ 2018-10-04 21:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Sinan Kaya,
	Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg

On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> I ran into a lot of trouble testing error handling, and this series is
> just trying to simplify some things. The first 4 fix up aer_inject, and
> the rest are cleanup to make better use of kernel APIs.
> 
> Keith Busch (12):
>   PCI: Set PCI bus accessors to noinline
>   PCI/AER: Covertly inject errors
>   PCI/AER: Reuse existing service device lookup
>   PCI/AER: Abstract AER interrupt handling
>   PCI/AER: Remove dead code
>   PCI/AER: Remove error source from aer struct
>   PCI/AER: Use kfifo for tracking events
>   PCI/AER: Use kfifo helper inserting locked elements
>   PCI/AER: Don't read upstream ports below fatal errors
>   PCI/AER: Use threaded IRQ for bottom half
>   PCI/AER: Use managed resource allocations
>   PCI/pciehp: Use device managed allocations
> 
>  drivers/pci/access.c              |   4 +-
>  drivers/pci/hotplug/pciehp_core.c |  14 +-
>  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
>  drivers/pci/pcie/Kconfig          |   2 +-
>  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
>  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
>  drivers/pci/pcie/portdrv.h        |   4 -
>  drivers/pci/pcie/portdrv_core.c   |   1 +
>  8 files changed, 227 insertions(+), 371 deletions(-)

Thanks a lot for doing this!  I applied these to pci/hotplug for
v4.20, except for "PCI/AER: Don't read upstream ports below fatal
errors", which seems to be already there via another posting, and
"PCI/pciehp: Use device managed allocations", which needs a few
tweaks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-10-04 21:40 ` [PATCH 00/12] error handling and pciehp maintenance Bjorn Helgaas
@ 2018-10-04 22:11   ` Keith Busch
  2018-10-05 17:31       ` Bjorn Helgaas
  0 siblings, 1 reply; 41+ messages in thread
From: Keith Busch @ 2018-10-04 22:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Sinan Kaya,
	Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg

On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > I ran into a lot of trouble testing error handling, and this series is
> > just trying to simplify some things. The first 4 fix up aer_inject, and
> > the rest are cleanup to make better use of kernel APIs.
> > 
> > Keith Busch (12):
> >   PCI: Set PCI bus accessors to noinline
> >   PCI/AER: Covertly inject errors
> >   PCI/AER: Reuse existing service device lookup
> >   PCI/AER: Abstract AER interrupt handling
> >   PCI/AER: Remove dead code
> >   PCI/AER: Remove error source from aer struct
> >   PCI/AER: Use kfifo for tracking events
> >   PCI/AER: Use kfifo helper inserting locked elements
> >   PCI/AER: Don't read upstream ports below fatal errors
> >   PCI/AER: Use threaded IRQ for bottom half
> >   PCI/AER: Use managed resource allocations
> >   PCI/pciehp: Use device managed allocations
> > 
> >  drivers/pci/access.c              |   4 +-
> >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> >  drivers/pci/pcie/Kconfig          |   2 +-
> >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> >  drivers/pci/pcie/portdrv.h        |   4 -
> >  drivers/pci/pcie/portdrv_core.c   |   1 +
> >  8 files changed, 227 insertions(+), 371 deletions(-)
> 
> Thanks a lot for doing this!  I applied these to pci/hotplug for
> v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> errors", which seems to be already there via another posting, and
> "PCI/pciehp: Use device managed allocations", which needs a few
> tweaks.

Sounds good, and thanks for applying!

In case this went unnoticed, patch 2's aer_inject using ftrace hooks
to pci config accessors is really cool and fixes several kernel crashes
I encountered, but it may not work on every architecture. I'm not sure
how widely aer_inject is used, so maybe there are no concerns with the
DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
dependency in case there are valid objections.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-10-04 22:11   ` Keith Busch
@ 2018-10-05 17:31       ` Bjorn Helgaas
  0 siblings, 0 replies; 41+ messages in thread
From: Bjorn Helgaas @ 2018-10-05 17:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Sinan Kaya,
	Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

[+cc arm64 folks, LKML: This conversation is about this patch:

  https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com

which fixes some PCIe AER error injection bugs, but also makes the error
injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
support.  Note that this question is only about the error *injection*
module used for testing.  It doesn't affect AER support itself.]

On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > I ran into a lot of trouble testing error handling, and this series is
> > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > the rest are cleanup to make better use of kernel APIs.
> > > 
> > > Keith Busch (12):
> > >   PCI: Set PCI bus accessors to noinline
> > >   PCI/AER: Covertly inject errors
> > >   PCI/AER: Reuse existing service device lookup
> > >   PCI/AER: Abstract AER interrupt handling
> > >   PCI/AER: Remove dead code
> > >   PCI/AER: Remove error source from aer struct
> > >   PCI/AER: Use kfifo for tracking events
> > >   PCI/AER: Use kfifo helper inserting locked elements
> > >   PCI/AER: Don't read upstream ports below fatal errors
> > >   PCI/AER: Use threaded IRQ for bottom half
> > >   PCI/AER: Use managed resource allocations
> > >   PCI/pciehp: Use device managed allocations
> > > 
> > >  drivers/pci/access.c              |   4 +-
> > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > >  drivers/pci/pcie/Kconfig          |   2 +-
> > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > >  drivers/pci/pcie/portdrv.h        |   4 -
> > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > 
> > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > errors", which seems to be already there via another posting, and
> > "PCI/pciehp: Use device managed allocations", which needs a few
> > tweaks.
> 
> Sounds good, and thanks for applying!
> 
> In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> to pci config accessors is really cool and fixes several kernel crashes
> I encountered, but it may not work on every architecture. I'm not sure
> how widely aer_inject is used, so maybe there are no concerns with the
> DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> dependency in case there are valid objections.

Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
on these arches:

  arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
  powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
  riscv		# ARCH_RV64I only
  s390
  x86

Notably missing is arm64, which has DYNAMIC_FTRACE but not
DYNAMIC_FTRACE_WITH_REGS.

Bjorn

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-10-05 17:31       ` Bjorn Helgaas
  0 siblings, 0 replies; 41+ messages in thread
From: Bjorn Helgaas @ 2018-10-05 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

[+cc arm64 folks, LKML: This conversation is about this patch:

  https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch at intel.com

which fixes some PCIe AER error injection bugs, but also makes the error
injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
support.  Note that this question is only about the error *injection*
module used for testing.  It doesn't affect AER support itself.]

On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > I ran into a lot of trouble testing error handling, and this series is
> > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > the rest are cleanup to make better use of kernel APIs.
> > > 
> > > Keith Busch (12):
> > >   PCI: Set PCI bus accessors to noinline
> > >   PCI/AER: Covertly inject errors
> > >   PCI/AER: Reuse existing service device lookup
> > >   PCI/AER: Abstract AER interrupt handling
> > >   PCI/AER: Remove dead code
> > >   PCI/AER: Remove error source from aer struct
> > >   PCI/AER: Use kfifo for tracking events
> > >   PCI/AER: Use kfifo helper inserting locked elements
> > >   PCI/AER: Don't read upstream ports below fatal errors
> > >   PCI/AER: Use threaded IRQ for bottom half
> > >   PCI/AER: Use managed resource allocations
> > >   PCI/pciehp: Use device managed allocations
> > > 
> > >  drivers/pci/access.c              |   4 +-
> > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > >  drivers/pci/pcie/Kconfig          |   2 +-
> > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > >  drivers/pci/pcie/portdrv.h        |   4 -
> > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > 
> > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > errors", which seems to be already there via another posting, and
> > "PCI/pciehp: Use device managed allocations", which needs a few
> > tweaks.
> 
> Sounds good, and thanks for applying!
> 
> In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> to pci config accessors is really cool and fixes several kernel crashes
> I encountered, but it may not work on every architecture. I'm not sure
> how widely aer_inject is used, so maybe there are no concerns with the
> DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> dependency in case there are valid objections.

Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
on these arches:

  arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
  powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
  riscv		# ARCH_RV64I only
  s390
  x86

Notably missing is arm64, which has DYNAMIC_FTRACE but not
DYNAMIC_FTRACE_WITH_REGS.

Bjorn

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-10-05 17:31       ` Bjorn Helgaas
@ 2018-10-08 16:18         ` Keith Busch
  -1 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-10-08 16:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Sinan Kaya,
	Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> [+cc arm64 folks, LKML: This conversation is about this patch:
> 
>   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> 
> which fixes some PCIe AER error injection bugs, but also makes the error
> injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> support.  Note that this question is only about the error *injection*
> module used for testing.  It doesn't affect AER support itself.]
> 
> On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > I ran into a lot of trouble testing error handling, and this series is
> > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > the rest are cleanup to make better use of kernel APIs.
> > > > 
> > > > Keith Busch (12):
> > > >   PCI: Set PCI bus accessors to noinline
> > > >   PCI/AER: Covertly inject errors
> > > >   PCI/AER: Reuse existing service device lookup
> > > >   PCI/AER: Abstract AER interrupt handling
> > > >   PCI/AER: Remove dead code
> > > >   PCI/AER: Remove error source from aer struct
> > > >   PCI/AER: Use kfifo for tracking events
> > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > >   PCI/AER: Use threaded IRQ for bottom half
> > > >   PCI/AER: Use managed resource allocations
> > > >   PCI/pciehp: Use device managed allocations
> > > > 
> > > >  drivers/pci/access.c              |   4 +-
> > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > 
> > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > errors", which seems to be already there via another posting, and
> > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > tweaks.
> > 
> > Sounds good, and thanks for applying!
> > 
> > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > to pci config accessors is really cool and fixes several kernel crashes
> > I encountered, but it may not work on every architecture. I'm not sure
> > how widely aer_inject is used, so maybe there are no concerns with the
> > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > dependency in case there are valid objections.
> 
> Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> on these arches:
> 
>   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
>   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
>   riscv		# ARCH_RV64I only
>   s390
>   x86
> 
> Notably missing is arm64, which has DYNAMIC_FTRACE but not
> DYNAMIC_FTRACE_WITH_REGS.
> 
> Bjorn

Looks like the kbuild bot found an ARM kernel config that has
DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
there. I'll need to update this patch regardless, and I think the right
thing to do is maintain the "old" way with conditional compiling for
any arch specific features.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-10-08 16:18         ` Keith Busch
  0 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-10-08 16:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> [+cc arm64 folks, LKML: This conversation is about this patch:
> 
>   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch at intel.com
> 
> which fixes some PCIe AER error injection bugs, but also makes the error
> injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> support.  Note that this question is only about the error *injection*
> module used for testing.  It doesn't affect AER support itself.]
> 
> On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > I ran into a lot of trouble testing error handling, and this series is
> > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > the rest are cleanup to make better use of kernel APIs.
> > > > 
> > > > Keith Busch (12):
> > > >   PCI: Set PCI bus accessors to noinline
> > > >   PCI/AER: Covertly inject errors
> > > >   PCI/AER: Reuse existing service device lookup
> > > >   PCI/AER: Abstract AER interrupt handling
> > > >   PCI/AER: Remove dead code
> > > >   PCI/AER: Remove error source from aer struct
> > > >   PCI/AER: Use kfifo for tracking events
> > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > >   PCI/AER: Use threaded IRQ for bottom half
> > > >   PCI/AER: Use managed resource allocations
> > > >   PCI/pciehp: Use device managed allocations
> > > > 
> > > >  drivers/pci/access.c              |   4 +-
> > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > 
> > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > errors", which seems to be already there via another posting, and
> > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > tweaks.
> > 
> > Sounds good, and thanks for applying!
> > 
> > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > to pci config accessors is really cool and fixes several kernel crashes
> > I encountered, but it may not work on every architecture. I'm not sure
> > how widely aer_inject is used, so maybe there are no concerns with the
> > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > dependency in case there are valid objections.
> 
> Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> on these arches:
> 
>   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
>   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
>   riscv		# ARCH_RV64I only
>   s390
>   x86
> 
> Notably missing is arm64, which has DYNAMIC_FTRACE but not
> DYNAMIC_FTRACE_WITH_REGS.
> 
> Bjorn

Looks like the kbuild bot found an ARM kernel config that has
DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
there. I'll need to update this patch regardless, and I think the right
thing to do is maintain the "old" way with conditional compiling for
any arch specific features.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-10-08 16:18         ` Keith Busch
@ 2018-10-08 17:23           ` Bjorn Helgaas
  -1 siblings, 0 replies; 41+ messages in thread
From: Bjorn Helgaas @ 2018-10-08 17:23 UTC (permalink / raw)
  To: Keith Busch
  Cc: Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt, Sinan Kaya,
	Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote:
> On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> > [+cc arm64 folks, LKML: This conversation is about this patch:
> > 
> >   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> > 
> > which fixes some PCIe AER error injection bugs, but also makes the error
> > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> > support.  Note that this question is only about the error *injection*
> > module used for testing.  It doesn't affect AER support itself.]
> > 
> > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > > I ran into a lot of trouble testing error handling, and this series is
> > > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > > the rest are cleanup to make better use of kernel APIs.
> > > > > 
> > > > > Keith Busch (12):
> > > > >   PCI: Set PCI bus accessors to noinline
> > > > >   PCI/AER: Covertly inject errors
> > > > >   PCI/AER: Reuse existing service device lookup
> > > > >   PCI/AER: Abstract AER interrupt handling
> > > > >   PCI/AER: Remove dead code
> > > > >   PCI/AER: Remove error source from aer struct
> > > > >   PCI/AER: Use kfifo for tracking events
> > > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > > >   PCI/AER: Use threaded IRQ for bottom half
> > > > >   PCI/AER: Use managed resource allocations
> > > > >   PCI/pciehp: Use device managed allocations
> > > > > 
> > > > >  drivers/pci/access.c              |   4 +-
> > > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > > 
> > > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > > errors", which seems to be already there via another posting, and
> > > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > > tweaks.
> > > 
> > > Sounds good, and thanks for applying!
> > > 
> > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > > to pci config accessors is really cool and fixes several kernel crashes
> > > I encountered, but it may not work on every architecture. I'm not sure
> > > how widely aer_inject is used, so maybe there are no concerns with the
> > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > > dependency in case there are valid objections.
> > 
> > Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> > on these arches:
> > 
> >   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> >   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
> >   riscv		# ARCH_RV64I only
> >   s390
> >   x86
> > 
> > Notably missing is arm64, which has DYNAMIC_FTRACE but not
> > DYNAMIC_FTRACE_WITH_REGS.
> > 
> > Bjorn
> 
> Looks like the kbuild bot found an ARM kernel config that has
> DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
> there. I'll need to update this patch regardless, and I think the right
> thing to do is maintain the "old" way with conditional compiling for
> any arch specific features.

Sounds messy, but probably the best route.

I dropped these patches for now:

  PCI/AER: Covertly inject errors with ftrace hooks
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Abstract AER interrupt handling

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-10-08 17:23           ` Bjorn Helgaas
  0 siblings, 0 replies; 41+ messages in thread
From: Bjorn Helgaas @ 2018-10-08 17:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote:
> On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> > [+cc arm64 folks, LKML: This conversation is about this patch:
> > 
> >   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch at intel.com
> > 
> > which fixes some PCIe AER error injection bugs, but also makes the error
> > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> > support.  Note that this question is only about the error *injection*
> > module used for testing.  It doesn't affect AER support itself.]
> > 
> > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > > I ran into a lot of trouble testing error handling, and this series is
> > > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > > the rest are cleanup to make better use of kernel APIs.
> > > > > 
> > > > > Keith Busch (12):
> > > > >   PCI: Set PCI bus accessors to noinline
> > > > >   PCI/AER: Covertly inject errors
> > > > >   PCI/AER: Reuse existing service device lookup
> > > > >   PCI/AER: Abstract AER interrupt handling
> > > > >   PCI/AER: Remove dead code
> > > > >   PCI/AER: Remove error source from aer struct
> > > > >   PCI/AER: Use kfifo for tracking events
> > > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > > >   PCI/AER: Use threaded IRQ for bottom half
> > > > >   PCI/AER: Use managed resource allocations
> > > > >   PCI/pciehp: Use device managed allocations
> > > > > 
> > > > >  drivers/pci/access.c              |   4 +-
> > > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > > 
> > > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > > errors", which seems to be already there via another posting, and
> > > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > > tweaks.
> > > 
> > > Sounds good, and thanks for applying!
> > > 
> > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > > to pci config accessors is really cool and fixes several kernel crashes
> > > I encountered, but it may not work on every architecture. I'm not sure
> > > how widely aer_inject is used, so maybe there are no concerns with the
> > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > > dependency in case there are valid objections.
> > 
> > Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> > on these arches:
> > 
> >   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> >   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
> >   riscv		# ARCH_RV64I only
> >   s390
> >   x86
> > 
> > Notably missing is arm64, which has DYNAMIC_FTRACE but not
> > DYNAMIC_FTRACE_WITH_REGS.
> > 
> > Bjorn
> 
> Looks like the kbuild bot found an ARM kernel config that has
> DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
> there. I'll need to update this patch regardless, and I think the right
> thing to do is maintain the "old" way with conditional compiling for
> any arch specific features.

Sounds messy, but probably the best route.

I dropped these patches for now:

  PCI/AER: Covertly inject errors with ftrace hooks
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Abstract AER interrupt handling

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-10-05 17:31       ` Bjorn Helgaas
@ 2018-10-09 16:03         ` Will Deacon
  -1 siblings, 0 replies; 41+ messages in thread
From: Will Deacon @ 2018-10-09 16:03 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt,
	Sinan Kaya, Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, linux-arm-kernel, linux-kernel

Hi Bjorn,

On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> [+cc arm64 folks, LKML: This conversation is about this patch:
> 
>   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> 
> which fixes some PCIe AER error injection bugs, but also makes the error
> injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> support.  Note that this question is only about the error *injection*
> module used for testing.  It doesn't affect AER support itself.]
> 
> On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > to pci config accessors is really cool and fixes several kernel crashes
> > I encountered, but it may not work on every architecture. I'm not sure
> > how widely aer_inject is used, so maybe there are no concerns with the
> > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > dependency in case there are valid objections.
> 
> Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> on these arches:
> 
>   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
>   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
>   riscv		# ARCH_RV64I only
>   s390
>   x86
> 
> Notably missing is arm64, which has DYNAMIC_FTRACE but not
> DYNAMIC_FTRACE_WITH_REGS.

Thanks for the heads-up here. This feature is currently in development for
arm64:

http://lkml.kernel.org/r/20181001141648.1DBED68BDF@newverein.lst.de

but the latest review comments suggest that it's a fair way from ready yet,
so I wouldn't hold your breath.

Will

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-10-09 16:03         ` Will Deacon
  0 siblings, 0 replies; 41+ messages in thread
From: Will Deacon @ 2018-10-09 16:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Bjorn,

On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> [+cc arm64 folks, LKML: This conversation is about this patch:
> 
>   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch at intel.com
> 
> which fixes some PCIe AER error injection bugs, but also makes the error
> injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> support.  Note that this question is only about the error *injection*
> module used for testing.  It doesn't affect AER support itself.]
> 
> On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > to pci config accessors is really cool and fixes several kernel crashes
> > I encountered, but it may not work on every architecture. I'm not sure
> > how widely aer_inject is used, so maybe there are no concerns with the
> > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > dependency in case there are valid objections.
> 
> Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> on these arches:
> 
>   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
>   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
>   riscv		# ARCH_RV64I only
>   s390
>   x86
> 
> Notably missing is arm64, which has DYNAMIC_FTRACE but not
> DYNAMIC_FTRACE_WITH_REGS.

Thanks for the heads-up here. This feature is currently in development for
arm64:

http://lkml.kernel.org/r/20181001141648.1DBED68BDF at newverein.lst.de

but the latest review comments suggest that it's a fair way from ready yet,
so I wouldn't hold your breath.

Will

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-10-08 16:18         ` Keith Busch
@ 2018-11-06 16:34           ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 41+ messages in thread
From: Lorenzo Pieralisi @ 2018-11-06 16:34 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bjorn Helgaas, Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt,
	Sinan Kaya, Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote:
> On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> > [+cc arm64 folks, LKML: This conversation is about this patch:
> > 
> >   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com
> > 
> > which fixes some PCIe AER error injection bugs, but also makes the error
> > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> > support.  Note that this question is only about the error *injection*
> > module used for testing.  It doesn't affect AER support itself.]
> > 
> > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > > I ran into a lot of trouble testing error handling, and this series is
> > > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > > the rest are cleanup to make better use of kernel APIs.
> > > > > 
> > > > > Keith Busch (12):
> > > > >   PCI: Set PCI bus accessors to noinline
> > > > >   PCI/AER: Covertly inject errors
> > > > >   PCI/AER: Reuse existing service device lookup
> > > > >   PCI/AER: Abstract AER interrupt handling
> > > > >   PCI/AER: Remove dead code
> > > > >   PCI/AER: Remove error source from aer struct
> > > > >   PCI/AER: Use kfifo for tracking events
> > > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > > >   PCI/AER: Use threaded IRQ for bottom half
> > > > >   PCI/AER: Use managed resource allocations
> > > > >   PCI/pciehp: Use device managed allocations
> > > > > 
> > > > >  drivers/pci/access.c              |   4 +-
> > > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > > 
> > > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > > errors", which seems to be already there via another posting, and
> > > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > > tweaks.
> > > 
> > > Sounds good, and thanks for applying!
> > > 
> > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > > to pci config accessors is really cool and fixes several kernel crashes
> > > I encountered, but it may not work on every architecture. I'm not sure
> > > how widely aer_inject is used, so maybe there are no concerns with the
> > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > > dependency in case there are valid objections.
> > 
> > Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> > on these arches:
> > 
> >   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> >   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
> >   riscv		# ARCH_RV64I only
> >   s390
> >   x86
> > 
> > Notably missing is arm64, which has DYNAMIC_FTRACE but not
> > DYNAMIC_FTRACE_WITH_REGS.
> > 
> > Bjorn
> 
> Looks like the kbuild bot found an ARM kernel config that has
> DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
> there. I'll need to update this patch regardless, and I think the right
> thing to do is maintain the "old" way with conditional compiling for
> any arch specific features.

The question is whether we really need to dynamically patch the kernel
with ftrace to achieve what that patch does.

Furthermore, it would also be good to report what bugs we are actually
fixing, from what you are writing falling back to the current method if
!DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
fixing the current behaviour with something that does not depend on arch
features that may not even be implemented.

Anyway - please CC me in in the follow up series, I am happy to help you
test it.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-11-06 16:34           ` Lorenzo Pieralisi
  0 siblings, 0 replies; 41+ messages in thread
From: Lorenzo Pieralisi @ 2018-11-06 16:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote:
> On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote:
> > [+cc arm64 folks, LKML: This conversation is about this patch:
> > 
> >   https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch at intel.com
> > 
> > which fixes some PCIe AER error injection bugs, but also makes the error
> > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches
> > support.  Note that this question is only about the error *injection*
> > module used for testing.  It doesn't affect AER support itself.]
> > 
> > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote:
> > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote:
> > > > > I ran into a lot of trouble testing error handling, and this series is
> > > > > just trying to simplify some things. The first 4 fix up aer_inject, and
> > > > > the rest are cleanup to make better use of kernel APIs.
> > > > > 
> > > > > Keith Busch (12):
> > > > >   PCI: Set PCI bus accessors to noinline
> > > > >   PCI/AER: Covertly inject errors
> > > > >   PCI/AER: Reuse existing service device lookup
> > > > >   PCI/AER: Abstract AER interrupt handling
> > > > >   PCI/AER: Remove dead code
> > > > >   PCI/AER: Remove error source from aer struct
> > > > >   PCI/AER: Use kfifo for tracking events
> > > > >   PCI/AER: Use kfifo helper inserting locked elements
> > > > >   PCI/AER: Don't read upstream ports below fatal errors
> > > > >   PCI/AER: Use threaded IRQ for bottom half
> > > > >   PCI/AER: Use managed resource allocations
> > > > >   PCI/pciehp: Use device managed allocations
> > > > > 
> > > > >  drivers/pci/access.c              |   4 +-
> > > > >  drivers/pci/hotplug/pciehp_core.c |  14 +-
> > > > >  drivers/pci/hotplug/pciehp_hpc.c  |  48 ++----
> > > > >  drivers/pci/pcie/Kconfig          |   2 +-
> > > > >  drivers/pci/pcie/aer.c            | 219 ++++++---------------------
> > > > >  drivers/pci/pcie/aer_inject.c     | 306 ++++++++++++++++++++------------------
> > > > >  drivers/pci/pcie/portdrv.h        |   4 -
> > > > >  drivers/pci/pcie/portdrv_core.c   |   1 +
> > > > >  8 files changed, 227 insertions(+), 371 deletions(-)
> > > > 
> > > > Thanks a lot for doing this!  I applied these to pci/hotplug for
> > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal
> > > > errors", which seems to be already there via another posting, and
> > > > "PCI/pciehp: Use device managed allocations", which needs a few
> > > > tweaks.
> > > 
> > > Sounds good, and thanks for applying!
> > > 
> > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks
> > > to pci config accessors is really cool and fixes several kernel crashes
> > > I encountered, but it may not work on every architecture. I'm not sure
> > > how widely aer_inject is used, so maybe there are no concerns with the
> > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that
> > > dependency in case there are valid objections.
> > 
> > Oh, indeed, I hadn't noticed this arch dependency.  AFAICT, the new
> > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only
> > on these arches:
> > 
> >   arm		# if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU
> >   powerpc	# if PPC64 && CPU_LITTLE_ENDIAN
> >   riscv		# ARCH_RV64I only
> >   s390
> >   x86
> > 
> > Notably missing is arm64, which has DYNAMIC_FTRACE but not
> > DYNAMIC_FTRACE_WITH_REGS.
> > 
> > Bjorn
> 
> Looks like the kbuild bot found an ARM kernel config that has
> DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile
> there. I'll need to update this patch regardless, and I think the right
> thing to do is maintain the "old" way with conditional compiling for
> any arch specific features.

The question is whether we really need to dynamically patch the kernel
with ftrace to achieve what that patch does.

Furthermore, it would also be good to report what bugs we are actually
fixing, from what you are writing falling back to the current method if
!DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
fixing the current behaviour with something that does not depend on arch
features that may not even be implemented.

Anyway - please CC me in in the follow up series, I am happy to help you
test it.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-11-06 16:34           ` Lorenzo Pieralisi
@ 2018-11-06 16:47             ` Keith Busch
  -1 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-11-06 16:47 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt,
	Sinan Kaya, Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote:
> The question is whether we really need to dynamically patch the kernel
> with ftrace to achieve what that patch does.
> 
> Furthermore, it would also be good to report what bugs we are actually
> fixing, from what you are writing falling back to the current method if
> !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
> fixing the current behaviour with something that does not depend on arch
> features that may not even be implemented.

There are two problems with the current method:

  1. It may dereference pci_dev after it was freed
  2. The pci_dev's children inherit its fake pci_bus's ops on
     enumeration

Both result in kernel panic.

The dynamic kernel patch just seemed like a cool way to inject errors
without messing with the driver's structures. But if there's a more
elegant way to do it, I'm all for it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-11-06 16:47             ` Keith Busch
  0 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-11-06 16:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote:
> The question is whether we really need to dynamically patch the kernel
> with ftrace to achieve what that patch does.
> 
> Furthermore, it would also be good to report what bugs we are actually
> fixing, from what you are writing falling back to the current method if
> !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
> fixing the current behaviour with something that does not depend on arch
> features that may not even be implemented.

There are two problems with the current method:

  1. It may dereference pci_dev after it was freed
  2. The pci_dev's children inherit its fake pci_bus's ops on
     enumeration

Both result in kernel panic.

The dynamic kernel patch just seemed like a cool way to inject errors
without messing with the driver's structures. But if there's a more
elegant way to do it, I'm all for it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-11-06 16:47             ` Keith Busch
@ 2018-11-06 17:21               ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 41+ messages in thread
From: Lorenzo Pieralisi @ 2018-11-06 17:21 UTC (permalink / raw)
  To: Keith Busch
  Cc: Bjorn Helgaas, Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt,
	Sinan Kaya, Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

On Tue, Nov 06, 2018 at 09:47:52AM -0700, Keith Busch wrote:
> On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote:
> > The question is whether we really need to dynamically patch the kernel
> > with ftrace to achieve what that patch does.
> > 
> > Furthermore, it would also be good to report what bugs we are actually
> > fixing, from what you are writing falling back to the current method if
> > !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
> > fixing the current behaviour with something that does not depend on arch
> > features that may not even be implemented.
> 
> There are two problems with the current method:
> 
>   1. It may dereference pci_dev after it was freed
>   2. The pci_dev's children inherit its fake pci_bus's ops on
>      enumeration
> 
> Both result in kernel panic.

That's my point, current test module is not robust, I wanted to ask if
there is a way to fix it that does not depend on arch features, because
if there is a dependency that is not met we are still not fixing the
current code, using it as a fallback can still create issues.

> The dynamic kernel patch just seemed like a cool way to inject errors
> without messing with the driver's structures. But if there's a more
> elegant way to do it, I'm all for it.

If you have a simple reproducer for the bugs I am happy to help you test
it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that
new code path if that's the final direction we are taking).

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-11-06 17:21               ` Lorenzo Pieralisi
  0 siblings, 0 replies; 41+ messages in thread
From: Lorenzo Pieralisi @ 2018-11-06 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 06, 2018 at 09:47:52AM -0700, Keith Busch wrote:
> On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote:
> > The question is whether we really need to dynamically patch the kernel
> > with ftrace to achieve what that patch does.
> > 
> > Furthermore, it would also be good to report what bugs we are actually
> > fixing, from what you are writing falling back to the current method if
> > !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with
> > fixing the current behaviour with something that does not depend on arch
> > features that may not even be implemented.
> 
> There are two problems with the current method:
> 
>   1. It may dereference pci_dev after it was freed
>   2. The pci_dev's children inherit its fake pci_bus's ops on
>      enumeration
> 
> Both result in kernel panic.

That's my point, current test module is not robust, I wanted to ask if
there is a way to fix it that does not depend on arch features, because
if there is a dependency that is not met we are still not fixing the
current code, using it as a fallback can still create issues.

> The dynamic kernel patch just seemed like a cool way to inject errors
> without messing with the driver's structures. But if there's a more
> elegant way to do it, I'm all for it.

If you have a simple reproducer for the bugs I am happy to help you test
it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that
new code path if that's the final direction we are taking).

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 00/12] error handling and pciehp maintenance
  2018-11-06 17:21               ` Lorenzo Pieralisi
@ 2018-11-06 17:26                 ` Keith Busch
  -1 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-11-06 17:26 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Bjorn Helgaas, Linux PCI, Bjorn Helgaas, Benjamin Herrenschmidt,
	Sinan Kaya, Thomas Tai, poza, Lukas Wunner, Christoph Hellwig,
	Mika Westerberg, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

On Tue, Nov 06, 2018 at 05:21:00PM +0000, Lorenzo Pieralisi wrote:
> If you have a simple reproducer for the bugs I am happy to help you test
> it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that
> new code path if that's the final direction we are taking).

The easiest way to reproduce is load the aer_inject module, inject an
error into a bridge, then remove the bridge and rescan.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 00/12] error handling and pciehp maintenance
@ 2018-11-06 17:26                 ` Keith Busch
  0 siblings, 0 replies; 41+ messages in thread
From: Keith Busch @ 2018-11-06 17:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 06, 2018 at 05:21:00PM +0000, Lorenzo Pieralisi wrote:
> If you have a simple reproducer for the bugs I am happy to help you test
> it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that
> new code path if that's the final direction we are taking).

The easiest way to reproduce is load the aer_inject module, inject an
error into a bridge, then remove the bridge and rescan.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-11-06 17:28 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-18 23:58 [PATCH 00/12] error handling and pciehp maintenance Keith Busch
2018-09-18 23:58 ` [PATCH 01/12] PCI: Set PCI bus accessors to noinline Keith Busch
2018-09-18 23:58 ` [PATCH 02/12] PCI/AER: Covertly inject errors Keith Busch
2018-09-18 23:58 ` [PATCH 03/12] PCI/AER: Reuse existing service device lookup Keith Busch
2018-09-18 23:58 ` [PATCH 04/12] PCI/AER: Abstract AER interrupt handling Keith Busch
2018-09-18 23:58 ` [PATCH 05/12] PCI/AER: Remove dead code Keith Busch
2018-09-18 23:58 ` [PATCH 06/12] PCI/AER: Remove error source from aer struct Keith Busch
2018-09-18 23:58 ` [PATCH 07/12] PCI/AER: Use kfifo for tracking events Keith Busch
2018-09-18 23:58 ` [PATCH 08/12] PCI/AER: Use kfifo helper inserting locked elements Keith Busch
2018-09-18 23:58 ` [PATCH 09/12] PCI/AER: Don't read upstream ports below fatal errors Keith Busch
2018-09-18 23:58 ` [PATCH 10/12] PCI/AER: Use threaded IRQ for bottom half Keith Busch
2018-09-18 23:58 ` [PATCH 11/12] PCI/AER: Use managed resource allocations Keith Busch
2018-09-19 16:29   ` Sinan Kaya
2018-09-19 17:25     ` Keith Busch
2018-09-19 17:36       ` Sinan Kaya
2018-09-25  1:13   ` Benjamin Herrenschmidt
2018-09-25 14:17     ` Keith Busch
2018-09-18 23:58 ` [PATCH 12/12] PCI/pciehp: Use device managed allocations Keith Busch
2018-09-19 15:11   ` Sinan Kaya
2018-09-19 16:17     ` Keith Busch
2018-09-22 18:10   ` Lukas Wunner
2018-09-24 23:43     ` Bjorn Helgaas
2018-09-25  7:13       ` Lukas Wunner
2018-10-04 21:40 ` [PATCH 00/12] error handling and pciehp maintenance Bjorn Helgaas
2018-10-04 22:11   ` Keith Busch
2018-10-05 17:31     ` Bjorn Helgaas
2018-10-05 17:31       ` Bjorn Helgaas
2018-10-08 16:18       ` Keith Busch
2018-10-08 16:18         ` Keith Busch
2018-10-08 17:23         ` Bjorn Helgaas
2018-10-08 17:23           ` Bjorn Helgaas
2018-11-06 16:34         ` Lorenzo Pieralisi
2018-11-06 16:34           ` Lorenzo Pieralisi
2018-11-06 16:47           ` Keith Busch
2018-11-06 16:47             ` Keith Busch
2018-11-06 17:21             ` Lorenzo Pieralisi
2018-11-06 17:21               ` Lorenzo Pieralisi
2018-11-06 17:26               ` Keith Busch
2018-11-06 17:26                 ` Keith Busch
2018-10-09 16:03       ` Will Deacon
2018-10-09 16:03         ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.