linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* pci: kernel crash in bus_find_device
@ 2014-05-20 19:17 Francesco Ruggeri
  2014-05-20 19:50 ` Guenter Roeck
  0 siblings, 1 reply; 16+ messages in thread
From: Francesco Ruggeri @ 2014-05-20 19:17 UTC (permalink / raw)
  To: linux-pci, linux-kernel; +Cc: Francesco Ruggeri

I posted this about a week ago but I did not get any replies.
Re-trying.

While traversing devices on pci_bus_type I ran into the crash below.
The immediate cause of the crash is that bus_find_device is trying to resume
a scan starting from a device that has been unregistered (and whose knode_bus
has already been klist_del' ed).
The main issue seems to be that when resuming a scan the caller should
be holding a
reference to the klist_node, but instead it relies on holding a
reference to the device.
I played with a couple of narrow fixes, but a clean solution would
affect quite a bit of code.

Has anybody run into this before?

Thanks,
Francesco Ruggeri


------------[ cut here ]------------
WARNING: at /bld/EosKernel/Artools-rpmbuild/linux-3.4/include/linux/kref.h:41
klist_iter_init_node+0x30/0x38()
Modules linked in: pci_scan(O) sch_prio sand_dma(PO) arista_bde(PO)
macvlan ip6table_mangle iptable_mangle msr nf_conntrack_ipv6
nf_defrag_ipv6 ip6t_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_LOG
xt_limit ipt_REJECT xt_hl xt_state xt_multiport xt_tcpudp kbfd(O)
8021q garp stp llc tun scd_em_driver(O) nf_conntrack_tftp iptable_raw
iptable_filter ip_tables xt_NOTRACK nf_conntrack xt_mark ip6table_raw
ip6table_filter ip6_tables x_tables scd(O) k8temp amd64_edac_mod hwmon
kvm_amd kvm
Pid: 6861, comm: pci_scan_0 Tainted: P           O
3.4.43.Ar-1797671.flbocafruggeri #1
Call Trace:
 [<ffffffff81029dc4>] warn_slowpath_common+0x80/0x98
 [<ffffffff811b57f1>] ? pci_do_find_bus+0x49/0x49
 [<ffffffff81029df1>] warn_slowpath_null+0x15/0x17
 [<ffffffff813a43ce>] klist_iter_init_node+0x30/0x38
 [<ffffffff8120e57e>] bus_find_device+0x48/0x90
 [<ffffffff811b5908>] pci_get_dev_by_id+0x5e/0x81
 [<ffffffff811b5a6a>] pci_get_subsys+0x5c/0x7f
 [<ffffffff811b5a9e>] pci_get_device+0x11/0x13
 [<ffffffffa00b2087>] pci_scan+0x39/0x8a [pci_scan]
 [<ffffffffa00b204e>] ? init_module+0x3c/0x3c [pci_scan]
 [<ffffffff81040e6e>] kthread+0x84/0x8c
 [<ffffffff813c8b14>] kernel_thread_helper+0x4/0x10
 [<ffffffff81040dea>] ? __init_kthread_worker+0x37/0x37
 [<ffffffff813c8b10>] ? gs_change+0xb/0xb
---[ end trace 79cea1ec476672fe ]---
------------[ cut here ]------------
WARNING: at /bld/EosKernel/Artools-rpmbuild/linux-3.4/lib/klist.c:189
klist_release+0x2b/0xeb()
Modules linked in: pci_scan(O) sch_prio sand_dma(PO) arista_bde(PO)
macvlan ip6table_mangle iptable_mangle msr nf_conntrack_ipv6
nf_defrag_ipv6 ip6t_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_LOG
xt_limit ipt_REJECT xt_hl xt_state xt_multiport xt_tcpudp kbfd(O)
8021q garp stp llc tun scd_em_driver(O) nf_conntrack_tftp iptable_raw
iptable_filter ip_tables xt_NOTRACK nf_conntrack xt_mark ip6table_raw
ip6table_filter ip6_tables x_tables scd(O) k8temp amd64_edac_mod hwmon
kvm_amd kvm
Pid: 6861, comm: pci_scan_0 Tainted: P        W  O
3.4.43.Ar-1797671.flbocafruggeri #1
Call Trace:
 [<ffffffff81029dc4>] warn_slowpath_common+0x80/0x98
 [<ffffffff8120de13>] ? bus_get_device_klist+0x10/0x10
 [<ffffffff81029df1>] warn_slowpath_null+0x15/0x17
 [<ffffffff813a440e>] klist_release+0x2b/0xeb
 [<ffffffff813a44ec>] klist_dec_and_del+0x1e/0x25
 [<ffffffff813a4528>] klist_next+0x35/0xc9
 [<ffffffff811b57f1>] ? pci_do_find_bus+0x49/0x49
 [<ffffffff8120deb3>] next_device+0x9/0x19
 [<ffffffff8120e5a2>] bus_find_device+0x6c/0x90
 [<ffffffff811b5908>] pci_get_dev_by_id+0x5e/0x81
 [<ffffffff811b5a6a>] pci_get_subsys+0x5c/0x7f
 [<ffffffff811b5a9e>] pci_get_device+0x11/0x13
 [<ffffffffa00b2087>] pci_scan+0x39/0x8a [pci_scan]
 [<ffffffffa00b204e>] ? init_module+0x3c/0x3c [pci_scan]
 [<ffffffff81040e6e>] kthread+0x84/0x8c
 [<ffffffff813c8b14>] kernel_thread_helper+0x4/0x10
 [<ffffffff81040dea>] ? __init_kthread_worker+0x37/0x37
 [<ffffffff813c8b10>] ? gs_change+0xb/0xb
---[ end trace 79cea1ec476672ff ]---
general protection fault: 0000 [#1] PREEMPT SMP
CPU 1
Modules linked in: pci_scan(O) sch_prio sand_dma(PO) arista_bde(PO)
macvlan ip6table_mangle iptable_mangle msr nf_conntrack_ipv6
nf_defrag_ipv6 ip6t_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_LOG
xt_limit ipt_REJECT xt_hl xt_state xt_multiport xt_tcpudp kbfd(O)
8021q garp stp llc tun scd_em_driver(O) nf_conntrack_tftp iptable_raw
iptable_filter ip_tables xt_NOTRACK nf_conntrack xt_mark ip6table_raw
ip6table_filter ip6_tables x_tables scd(O) k8temp amd64_edac_mod hwmon
kvm_amd kvm

Pid: 6861, comm: pci_scan_0 Tainted: P        W  O
3.4.43.Ar-1797671.flbocafruggeri #1
RIP: 0010:[<ffffffff813a442c>]  [<ffffffff813a442c>] klist_release+0x49/0xeb
RSP: 0018:ffff88001c55bd50  EFLAGS: 00010293
RAX: dead000000200200 RBX: ffff880030949e78 RCX: ffff880000000010
RDX: dead000000100100 RSI: 0000000000000000 RDI: dead000000200200
RBP: ffff88001c55bd70 R08: dead000000100100 R09: 000000000000000a
R10: 0000000000000000 R11: ffffffff81619920 R12: ffff880030949e90
R13: ffff880030949e78 R14: ffffffff8120de13 R15: ffff880027e717e0
FS:  0000000000000000(0000) GS:ffff88013fb00000(0000) knlGS:00000000f73bc6d0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000009012644 CR3: 0000000069f9e000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process pci_scan_0 (pid: 6861, threadinfo ffff88001c55a000, task
ffff880032ffd340)
Stack:
 ffff880030949e78 ffff88001c55bde0 dead000000100100 ffff880030949e78
 ffff88001c55bd80 ffffffff813a44ec ffff88001c55bdc0 ffffffff813a4528
 ffff88001c55bde0 ffff880027e717e0 ffffffff811b57f1 ffff88001c55bde0
Call Trace:
 [<ffffffff813a44ec>] klist_dec_and_del+0x1e/0x25
 [<ffffffff813a4528>] klist_next+0x35/0xc9
 [<ffffffff811b57f1>] ? pci_do_find_bus+0x49/0x49
 [<ffffffff8120deb3>] next_device+0x9/0x19
 [<ffffffff8120e5a2>] bus_find_device+0x6c/0x90
 [<ffffffff811b5908>] pci_get_dev_by_id+0x5e/0x81
 [<ffffffff811b5a6a>] pci_get_subsys+0x5c/0x7f
 [<ffffffff811b5a9e>] pci_get_device+0x11/0x13
 [<ffffffffa00b2087>] pci_scan+0x39/0x8a [pci_scan]
 [<ffffffffa00b204e>] ? init_module+0x3c/0x3c [pci_scan]
 [<ffffffff81040e6e>] kthread+0x84/0x8c
 [<ffffffff813c8b14>] kernel_thread_helper+0x4/0x10
 [<ffffffff81040dea>] ? __init_kthread_worker+0x37/0x37
 [<ffffffff813c8b10>] ? gs_change+0xb/0xb
Code: 00 48 c7 c7 a1 01 51 81 e8 ce 59 c8 ff 49 8b 54 24 f0 49 8b 44
24 f8 49 b8 00 01 10 00 00 00 ad de 48 bf 00 02 20 00 00 00 ad de <48>
89 42 08 48 89 10 49 89 7c 24 f8 4d 89 44 24 f0 48 c7 c7 30
RIP  [<ffffffff813a442c>] klist_release+0x49/0xeb
 RSP <ffff88001c55bd50>

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: pci: kernel crash in bus_find_device
@ 2014-06-03 22:55 Francesco Ruggeri
  2014-06-03 23:21 ` Greg KH
  2014-06-03 23:23 ` Greg KH
  0 siblings, 2 replies; 16+ messages in thread
From: Francesco Ruggeri @ 2014-06-03 22:55 UTC (permalink / raw)
  To: linux-kernel, hare, gregkh, linux; +Cc: fruggeri

In-Reply-To: <20140523023141.GC13900@kroah.com>


Hi Guenter,
I got back to looking into this crash.
Just as an example, the attached diffs also fix my bus_find_device problem for
traversals that start from the head of the list and traverse it completely.
They are very specific to the case of bus_find_device, and a complete solution
would affect a lot of code.
The main issue seems to be that when a device is found in a klist by say
bus_find_device the klist_node reference should be returned to the caller,
who should then decide whether to use it for the next klist search, drop it or
maybe exchange it for a struct device reference. When resuming a search one
should already hold a klist_node reference from the previous search.
This model is broken by several functions using struct devices such as
bus_find_device, which resume klist searches on the implicit assumption that
holding a reference to the struct device is enough to acquire one on the
klist_node. 
The only reason that this has not been a big issue so far is probably that
on most systems struct devices are not destroyed and created very often.
Thanks,
Francesco


Index: linux-3.4/lib/klist.c
===================================================================
--- linux-3.4.orig/lib/klist.c
+++ linux-3.4/lib/klist.c
@@ -318,6 +318,51 @@ void klist_iter_exit(struct klist_iter *
 }
 EXPORT_SYMBOL_GPL(klist_iter_exit);
 
+/**
+ * klist_iter_init_node_noref - Initialize a klist_iter structure without
+ *				taking a kref.
+ * @k: klist we're iterating.
+ * @i: klist_iter we're filling.
+ * @n: node to start with.
+ *
+ * Similar to klist_iter_init_noref(), but starts the action off with @n,
+ * instead of with the list head.
+ */
+void klist_iter_init_node_noref(struct klist *k, struct klist_iter *i,
+			  struct klist_node *n)
+{
+	i->i_klist = k;
+	i->i_cur = n;
+}
+EXPORT_SYMBOL_GPL(klist_iter_init_node_noref);
+
+/**
+ * klist_iter_init_noref - Initalize a klist_iter structure without taking
+ *			   a kref.
+ * @k: klist we're iterating.
+ * @i: klist_iter structure we're filling.
+ *
+ * Similar to klist_iter_init_node_noref(), but start with the list head.
+ */
+void klist_iter_init_noref(struct klist *k, struct klist_iter *i)
+{
+	klist_iter_init_node_noref(k, i, NULL);
+}
+EXPORT_SYMBOL_GPL(klist_iter_init_noref);
+
+/**
+ * klist_iter_exit_noref - Finish a list iteration without dropping a kref.
+ * @i: Iterator structure.
+ *
+ * Not really needed, but always good form.
+ */
+void klist_iter_exit_noref(struct klist_iter *i)
+{
+	if (i->i_cur)
+		i->i_cur = NULL;
+}
+EXPORT_SYMBOL_GPL(klist_iter_exit_noref);
+
 static struct klist_node *to_klist_node(struct list_head *n)
 {
 	return container_of(n, struct klist_node, n_node);
Index: linux-3.4/drivers/base/bus.c
===================================================================
--- linux-3.4.orig/drivers/base/bus.c
+++ linux-3.4/drivers/base/bus.c
@@ -341,6 +341,37 @@ struct device *bus_find_device(struct bu
 }
 EXPORT_SYMBOL_GPL(bus_find_device);
 
+/**
+ * bus_find_device_noref - device iterator for locating a particular device.
+ * @bus: bus type
+ * @start: Device to begin with
+ * @data: Data to pass to match function
+ * @match: Callback function to check device
+ *
+ * Same as bus_find_device, but it assumes caller already has klist_node ref,
+ * and it returns to caller a struct device with the new kref.
+ */
+
+struct device *bus_find_device_noref(struct bus_type *bus,
+				     struct device *start, void *data,
+				     int (*match)(struct device *dev, void *data))
+{
+	struct klist_iter i;
+	struct device *dev;
+
+	if (!bus || !bus->p)
+		return NULL;
+
+	klist_iter_init_node_noref(&bus->p->klist_devices, &i,
+				   (start ? &start->p->knode_bus : NULL));
+	while ((dev = next_device(&i)))
+		if (match(dev, data) && get_device(dev))
+			break;
+	klist_iter_exit_noref(&i);
+	return dev;
+}
+EXPORT_SYMBOL_GPL(bus_find_device_noref);
+
 static int match_name(struct device *dev, void *data)
 {
 	const char *name = data;
Index: linux-3.4/drivers/pci/search.c
===================================================================
--- linux-3.4.orig/drivers/pci/search.c
+++ linux-3.4/drivers/pci/search.c
@@ -289,6 +289,94 @@ pci_get_device(unsigned int vendor, unsi
 	return pci_get_subsys(vendor, device, PCI_ANY_ID, PCI_ANY_ID, from);
 }
 
+/*
+ * pci_get_dev_by_id_noref - begin or continue searching for a PCI device by id
+ * @id: pointer to struct pci_device_id to match for the device
+ * @from: Previous PCI device found in search, or %NULL for new search.
+ *
+ * Same as pci_get_dev_by_id, but it expects caller to hold a kref on knode_bus
+ * in "from" from previous search, and it holds a kref on knode_bus in returned
+ * pdev.
+ * This is an internal function for use by the other search functions in
+ * this file.
+ */
+static struct pci_dev *pci_get_dev_by_id_noref(const struct pci_device_id *id,
+					       struct pci_dev *from)
+{
+	struct device *dev;
+	struct device *dev_start = NULL;
+	struct pci_dev *pdev = NULL;
+
+	WARN_ON(in_interrupt());
+	if (from)
+		dev_start = &from->dev;
+	dev = bus_find_device_noref(&pci_bus_type, dev_start, (void *)id,
+				    match_pci_dev_by_id);
+	if (dev)
+		pdev = to_pci_dev(dev);
+	if (from)
+		pci_dev_put(from);
+	return pdev;
+}
+
+/**
+ * pci_get_subsys_noref - begin or continue searching for a PCI device by vendor/subvendor/device/subdevice id
+ * @vendor: PCI vendor id to match, or %PCI_ANY_ID to match all vendor ids
+ * @device: PCI device id to match, or %PCI_ANY_ID to match all device ids
+ * @ss_vendor: PCI subsystem vendor id to match, or %PCI_ANY_ID to match all vendor ids
+ * @ss_device: PCI subsystem device id to match, or %PCI_ANY_ID to match all device ids
+ * @from: Previous PCI device found in search, or %NULL for new search.
+ *
+ * Same as pci_get_subsys, but it expects caller to hold a kref on knode_bus
+ * in "from" from previous search, and it holds a kref on knode_bus in returned
+ * pdev.
+ */
+struct pci_dev *pci_get_subsys_noref(unsigned int vendor, unsigned int device,
+				     unsigned int ss_vendor, unsigned int ss_device,
+				     struct pci_dev *from)
+{
+	struct pci_dev *pdev;
+	struct pci_device_id *id;
+
+	/*
+	 * pci_find_subsys() can be called on the ide_setup() path,
+	 * super-early in boot.  But the down_read() will enable local
+	 * interrupts, which can cause some machines to crash.  So here we
+	 * detect and flag that situation and bail out early.
+	 */
+	if (unlikely(no_pci_devices()))
+		return NULL;
+
+	id = kzalloc(sizeof(*id), GFP_KERNEL);
+	if (!id)
+		return NULL;
+	id->vendor = vendor;
+	id->device = device;
+	id->subvendor = ss_vendor;
+	id->subdevice = ss_device;
+
+	pdev = pci_get_dev_by_id_noref(id, from);
+	kfree(id);
+
+	return pdev;
+}
+
+/**
+ * pci_get_device_noref - begin or continue searching for a PCI device by vendor/device id
+ * @vendor: PCI vendor id to match, or %PCI_ANY_ID to match all vendor ids
+ * @device: PCI device id to match, or %PCI_ANY_ID to match all device ids
+ * @from: Previous PCI device found in search, or %NULL for new search.
+ *
+ * Same as pci_get_subsys, but it expects caller to hold a kref on knode_bus
+ * in "from" from previous search, and it holds a kref on knode_bus in returned
+ * pdev.
+ */
+struct pci_dev *
+pci_get_device_noref(unsigned int vendor, unsigned int device, struct pci_dev *from)
+{
+	return pci_get_subsys_noref(vendor, device, PCI_ANY_ID, PCI_ANY_ID, from);
+}
+
 /**
  * pci_get_class - begin or continue searching for a PCI device by class
  * @class: search for a PCI device with this class designation
@@ -355,5 +443,7 @@ EXPORT_SYMBOL(pci_find_next_bus);
 /* For everyone */
 EXPORT_SYMBOL(pci_get_device);
 EXPORT_SYMBOL(pci_get_subsys);
+EXPORT_SYMBOL(pci_get_device_noref);
+EXPORT_SYMBOL(pci_get_subsys_noref);
 EXPORT_SYMBOL(pci_get_slot);
 EXPORT_SYMBOL(pci_get_class);
Index: linux-3.4/include/linux/pci.h
===================================================================
--- linux-3.4.orig/include/linux/pci.h
+++ linux-3.4/include/linux/pci.h
@@ -719,6 +719,11 @@ struct pci_dev *pci_get_device(unsigned 
 struct pci_dev *pci_get_subsys(unsigned int vendor, unsigned int device,
 				unsigned int ss_vendor, unsigned int ss_device,
 				struct pci_dev *from);
+struct pci_dev *pci_get_device_noref(unsigned int vendor, unsigned int device,
+				     struct pci_dev *from);
+struct pci_dev *pci_get_subsys_noref(unsigned int vendor, unsigned int device,
+				     unsigned int ss_vendor, unsigned int ss_device,
+				     struct pci_dev *from);
 struct pci_dev *pci_get_slot(struct pci_bus *bus, unsigned int devfn);
 struct pci_dev *pci_get_domain_bus_and_slot(int domain, unsigned int bus,
 					    unsigned int devfn);
Index: linux-3.4/include/linux/device.h
===================================================================
--- linux-3.4.orig/include/linux/device.h
+++ linux-3.4/include/linux/device.h
@@ -140,6 +140,9 @@ int bus_for_each_dev(struct bus_type *bu
 struct device *bus_find_device(struct bus_type *bus, struct device *start,
 			       void *data,
 			       int (*match)(struct device *dev, void *data));
+struct device *bus_find_device_noref(struct bus_type *bus, struct device *start,
+				     void *data,
+				     int (*match)(struct device *dev, void *data));
 struct device *bus_find_device_by_name(struct bus_type *bus,
 				       struct device *start,
 				       const char *name);
Index: linux-3.4/include/linux/klist.h
===================================================================
--- linux-3.4.orig/include/linux/klist.h
+++ linux-3.4/include/linux/klist.h
@@ -63,6 +63,10 @@ extern void klist_iter_init(struct klist
 extern void klist_iter_init_node(struct klist *k, struct klist_iter *i,
 				 struct klist_node *n);
 extern void klist_iter_exit(struct klist_iter *i);
+extern void klist_iter_init_noref(struct klist *k, struct klist_iter *i);
+extern void klist_iter_init_node_noref(struct klist *k, struct klist_iter *i,
+				       struct klist_node *n);
+extern void klist_iter_exit_noref(struct klist_iter *i);
 extern struct klist_node *klist_next(struct klist_iter *i);
 
 #endif

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-06-04  6:22 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-20 19:17 pci: kernel crash in bus_find_device Francesco Ruggeri
2014-05-20 19:50 ` Guenter Roeck
2014-05-20 22:35   ` Francesco Ruggeri
2014-05-20 23:38     ` Guenter Roeck
     [not found]       ` <CA+HUmGge7AEpAnwAG_VJD2CKTtRBoC2bCGVU_t4qm-x6+OCr-g@mail.gmail.com>
     [not found]         ` <20140521193010.GA1721@roeck-us.net>
     [not found]           ` <CA+HUmGhm1VLTvMKW1TUUPqStUhD11M5u0VyTZyXyWz_ZS8uSVw@mail.gmail.com>
2014-05-21 22:59             ` Guenter Roeck
2014-05-22  7:14               ` Greg Kroah-Hartmann
2014-05-22  7:22                 ` Guenter Roeck
2014-05-22 16:19                   ` Francesco Ruggeri
2014-05-22 17:57                     ` Guenter Roeck
2014-05-23  2:31                   ` Greg Kroah-Hartmann
2014-05-21 17:39     ` Guenter Roeck
2014-06-03 22:55 Francesco Ruggeri
2014-06-03 23:21 ` Greg KH
2014-06-04  3:25   ` Guenter Roeck
2014-06-04  6:22     ` Francesco Ruggeri
2014-06-03 23:23 ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).