linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH(v3) 2.6.24] Fix fakephp deadlock
@ 2008-01-25 16:23 Ian Abbott
  2008-01-27  6:01 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Ian Abbott @ 2008-01-25 16:23 UTC (permalink / raw)
  To: linux-kernel, linux-pci, pcihpd-discuss, gregkh, kristen.c.accardi

From: Ian Abbott <abbotti@mev.co.uk>

This is the third version of my patch to fix the problem of a process
deadlocking itself when it uses the fakephp driver to fake the removal
of a PCI device.

This section describes the changes since the first version of the
patch.  Skip to the next section (after the "---") for the original
rationale and (slightly modified) patch description, and for the patch
itself.

Greg KH hinted at a race condition in the first version of the patch.
In the first version, the enable_slot() function checked that the slot
was still valid and then flushed the work queue to perform any pending
removals of slots from sysfs before it finally rescanned the PCI buses.
Another task could have come in and marked the slot as disabled after
the validity check but before flushing the work queue, and this would
cause the task to deadlock while flushing the work queue (for similar
reasons to the problem I was initially trying to fix).  (The sysfs
attribute file's buffer semaphore wouldn't necessarily protect against
that, because writing "0" to the "power" file can trigger a cascade of
other slot removals that don't have that protection.)

The second version of the patch rewrote the enable_slot() function so
it no longer flushes the work queue.  To maintain the proper sequencing,
it allocated and queued a work queue item to perform the PCI rescan (the
work queue is single-threaded).  I also removed the slot validity check
as we don't really care which slot was used to trigger a rescan.

This third version of the patch uses a single statically allocated work
queue item to perform the PCI rescan instead of allocating one
dynamically in the enable_slot() function.  When enable_slot() is
called, it first cancels any queued PCI rescan work item and then
requeues it.  This means a previously scheduled PCI rescan may be
skipped, but that shouldn't matter.  The important thing is that the
new PCI rescan is performed after any previously scheduled removals
of PCI slots from sysfs.

Greg KH also seemed to have a problem with my use of unsigned long for
the 'removed' member of 'struct dummy_slot'.  I need that for the
test_and_set_bit() and a narrower type wouldn't make 'struct dummy_slot'
any smaller (and there are other ways to save space in 'struct
dummy_slot' if it is a problem, but they make the code trickier).

On with the rationale and patch description....

---
From: Ian Abbott <abbotti@mev.co.uk>

If the fakephp driver is used to emulate removal of a PCI device by
writing text string "0" to the "power" sysfs attribute file, this causes
its parent directory and its contents (including the "power" file) to be
deleted before the write operation returns.  Unfortunately, it ends up
in a deadlock waiting for itself to complete.

The deadlock is as follows: sysfs_write_file calls flush_write_buffer
which calls sysfs_get_active_two before calling power_write_file in
pci_hotplug_core.c via the sysfs store operation. The power_write_file
function calls disable_slot in fakephp.c via the slot operation.  The
disable_slot function calls remove_slot which calls pci_hp_deregister
(back in pci_hotplug_core.c) which calls fs_remove_slot which calls
sysfs_remove_file to remove the "power" file. The sysfs_remove_file
function calls sysfs_hash_and_remove which calls sysfs_addrm_finish
which calls sysfs_deactivate. The sysfs_deactivate function sees that
something has an active reference on the sysfs_dirent (from the
previous call to sysfs_get_active_two back up the call stack somewhere)
so waits for the active reference to go away, which is of course
impossible.

The problem has been present since 2.6.21.

This patch breaks the deadlock by queuing work queue items on a single-
threaded work queue to remove a slot from sysfs, and to rescan the PCI
buses.  There is also some protection against disabling a slot that is
already being removed.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
---

--- linux-2.6.24/drivers/pci/hotplug/fakephp.c.orig	2008-01-24 22:58:37.000000000 +0000
+++ linux-2.6.24/drivers/pci/hotplug/fakephp.c	2008-01-25 15:58:21.000000000 +0000
@@ -39,6 +39,7 @@
 #include <linux/init.h>
 #include <linux/string.h>
 #include <linux/slab.h>
+#include <linux/workqueue.h>
 #include "../pci.h"

 #if !defined(MODULE)
@@ -63,10 +64,16 @@ struct dummy_slot {
 	struct list_head node;
 	struct hotplug_slot *slot;
 	struct pci_dev *dev;
+	struct work_struct remove_work;
+	unsigned long removed;
 };

 static int debug;
 static LIST_HEAD(slot_list);
+static struct workqueue_struct *dummyphp_wq;
+
+static void pci_rescan_worker(struct work_struct *work);
+static DECLARE_WORK(pci_rescan_work, pci_rescan_worker);

 static int enable_slot (struct hotplug_slot *slot);
 static int disable_slot (struct hotplug_slot *slot);
@@ -109,7 +116,7 @@ static int add_slot(struct pci_dev *dev)
 	slot->name = &dev->dev.bus_id[0];
 	dbg("slot->name = %s\n", slot->name);

-	dslot = kmalloc(sizeof(struct dummy_slot), GFP_KERNEL);
+	dslot = kzalloc(sizeof(struct dummy_slot), GFP_KERNEL);
 	if (!dslot)
 		goto error_info;

@@ -164,6 +171,14 @@ static void remove_slot(struct dummy_slo
 		err("Problem unregistering a slot %s\n", dslot->slot->name);
 }

+/* called from the single-threaded workqueue handler to remove a slot */
+static void remove_slot_worker(struct work_struct *work)
+{
+	struct dummy_slot *dslot =
+		container_of(work, struct dummy_slot, remove_work);
+	remove_slot(dslot);
+}
+
 /**
  * pci_rescan_slot - Rescan slot
  * @temp: Device template. Should be set: bus and devfn.
@@ -267,11 +282,17 @@ static inline void pci_rescan(void) {
 	pci_rescan_buses(&pci_root_buses);
 }

+/* called from the single-threaded workqueue handler to rescan all pci buses */
+static void pci_rescan_worker(struct work_struct *work)
+{
+	pci_rescan();
+}

 static int enable_slot(struct hotplug_slot *hotplug_slot)
 {
 	/* mis-use enable_slot for rescanning of the pci bus */
-	pci_rescan();
+	cancel_work_sync(&pci_rescan_work);
+	queue_work(dummyphp_wq, &pci_rescan_work);
 	return -ENODEV;
 }

@@ -306,6 +327,10 @@ static int disable_slot(struct hotplug_s
 		err("Can't remove PCI devices with other PCI devices behind it yet.\n");
 		return -ENODEV;
 	}
+	if (test_and_set_bit(0, &dslot->removed)) {
+		dbg("Slot already scheduled for removal\n");
+		return -ENODEV;
+	}
 	/* search for subfunctions and disable them first */
 	if (!(dslot->dev->devfn & 7)) {
 		for (func = 1; func < 8; func++) {
@@ -328,8 +353,9 @@ static int disable_slot(struct hotplug_s
 	/* remove the device from the pci core */
 	pci_remove_bus_device(dslot->dev);

-	/* blow away this sysfs entry and other parts. */
-	remove_slot(dslot);
+	/* queue work item to blow away this sysfs entry and other parts. */
+	INIT_WORK(&dslot->remove_work, remove_slot_worker);
+	queue_work(dummyphp_wq, &dslot->remove_work);

 	return 0;
 }
@@ -340,6 +366,7 @@ static void cleanup_slots (void)
 	struct list_head *next;
 	struct dummy_slot *dslot;

+	destroy_workqueue(dummyphp_wq);
 	list_for_each_safe (tmp, next, &slot_list) {
 		dslot = list_entry (tmp, struct dummy_slot, node);
 		remove_slot(dslot);
@@ -351,6 +378,10 @@ static int __init dummyphp_init(void)
 {
 	info(DRIVER_DESC "\n");

+	dummyphp_wq = create_singlethread_workqueue(MY_NAME);
+	if (!dummyphp_wq)
+		return -ENOMEM;
+
 	return pci_scan_buses();
 }


-- 
-=( Ian Abbott @ MEV Ltd.    E-mail: <abbotti@mev.co.uk>        )=-
-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587         )=-

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH(v3) 2.6.24] Fix fakephp deadlock
  2008-01-25 16:23 [PATCH(v3) 2.6.24] Fix fakephp deadlock Ian Abbott
@ 2008-01-27  6:01 ` Andrew Morton
  2008-01-28 10:33   ` Ian Abbott
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2008-01-27  6:01 UTC (permalink / raw)
  To: Ian Abbott
  Cc: linux-kernel, linux-pci, pcihpd-discuss, gregkh, kristen.c.accardi

> On Fri, 25 Jan 2008 16:23:56 +0000 Ian Abbott <abbotti@mev.co.uk> wrote:
> From: Ian Abbott <abbotti@mev.co.uk>
> 
> If the fakephp driver is used to emulate removal of a PCI device by
> writing text string "0" to the "power" sysfs attribute file, this causes
> its parent directory and its contents (including the "power" file) to be
> deleted before the write operation returns.  Unfortunately, it ends up
> in a deadlock waiting for itself to complete.

Thansk for working on this, but....

> The deadlock is as follows: sysfs_write_file calls flush_write_buffer
> which calls sysfs_get_active_two before calling power_write_file in
> pci_hotplug_core.c via the sysfs store operation. The power_write_file
> function calls disable_slot in fakephp.c via the slot operation.  The
> disable_slot function calls remove_slot which calls pci_hp_deregister
> (back in pci_hotplug_core.c) which calls fs_remove_slot which calls
> sysfs_remove_file to remove the "power" file. The sysfs_remove_file
> function calls sysfs_hash_and_remove which calls sysfs_addrm_finish
> which calls sysfs_deactivate. The sysfs_deactivate function sees that
> something has an active reference on the sysfs_dirent (from the
> previous call to sysfs_get_active_two back up the call stack somewhere)
> so waits for the active reference to go away, which is of course

     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  <- this always wrong
> impossible.
> 
> The problem has been present since 2.6.21.
> 
> This patch breaks the deadlock by queuing work queue items on a single-
> threaded work queue to remove a slot from sysfs, and to rescan the PCI
> buses.  There is also some protection against disabling a slot that is
> already being removed.

Adding a deferred-work like this just because we can't get the locking and
refcounting correct is a really sad hack.

Can we get the locking and refcounting right please?  Start by making that
wait-for-refcount-to-go-away go away.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH(v3) 2.6.24] Fix fakephp deadlock
  2008-01-27  6:01 ` Andrew Morton
@ 2008-01-28 10:33   ` Ian Abbott
  0 siblings, 0 replies; 3+ messages in thread
From: Ian Abbott @ 2008-01-28 10:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-pci, pcihpd-discuss, gregkh, kristen.c.accardi

On 27/01/08 06:01, Andrew Morton wrote:
> Adding a deferred-work like this just because we can't get the locking and
> refcounting correct is a really sad hack.

True, but some would argue that the fakephp driver itself was a really 
sad hack. :-)

> Can we get the locking and refcounting right please?  Start by making that
> wait-for-refcount-to-go-away go away.

I guess that stuff was added for a good reason and fixed other potential 
problems - it just broke fakephp.  It's an unusual requirement to try 
and support -- a file that self-destructs when you write to it.  Are 
there any other cases in sysfs that do that?

-- 
-=( Ian Abbott @ MEV Ltd.    E-mail: <abbotti@mev.co.uk>        )=-
-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587         )=-

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-01-28 10:33 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-25 16:23 [PATCH(v3) 2.6.24] Fix fakephp deadlock Ian Abbott
2008-01-27  6:01 ` Andrew Morton
2008-01-28 10:33   ` Ian Abbott

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).