All of lore.kernel.org
 help / color / mirror / Atom feed
* System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
@ 2010-03-03  0:47 Hugh Daschbach
  2010-03-03  4:54 ` Greg KH
  2010-03-03 15:50 ` Alan Stern
  0 siblings, 2 replies; 11+ messages in thread
From: Hugh Daschbach @ 2010-03-03  0:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Kay Sievers, Alan Stern, Jan Blunck, David Vrabel
  Cc: linux-kernel, linux-scsi

The system may fail to boot when the kernel's devices_kset->list gets
written by another thread while device_shutdown() is traversing the
list.  Though not common, this is fairly reproducible for some SCSI
Fibre Channel topologies; particularly so with FCoE configurations.

The reboot thread calls device_shutdown() as part of system shutdown.
device_shutdown() loops through devices_kset->list, shutting down each
system device.  But devices_kset->list isn't protected from other
writers while device_shutdown() traverses the list.

One such secondary writer is the SCI Fibre Channel workqueue.  When
fc_wq_N removes a device that device_shutdown() holds in it's "devn"
(list traversal iterator) variable, device_shutdown() stalls, chasing
what is essentially a broken link.

This is not a common occurrence.  But FC SCSI devices associated with a
link that has gone down cause a race between device_shutdown() running
in reboot's process and scsi_remove_target() running in a SCSI FC
workqueue (fc_wq_N).

Network attached FC devices are particularly vulnerable because SysV
init scripts shut network interfaces down before proceeding with the
reboot request.  So by the time reboot is called, the link to the FC
devices is already down.

When the link is down device_shutdown() stalls (in sd_shutdown() --
which issues cache flush CDBs to what are, by that time, inaccessible
devices).  The stall ends when the fc rport timer expires.  But the
timer expiration also initiates fc_starget_delete() in the fc workqueue,
causing the race with device_shutdown().

The attached patch detects and attempts to recover from the
corruption.  But this can hardly be considered a fix, as it does not
address the race between device_shutdown() and scsi_remove_target().

Perhaps converting the list_for_each_entry_safe_reverse() to something
like.

        while (!list_empty(&devices_kset->list)) {
                dev = list_last_entry(...);
                ...
        }

might be appropriate.  But I have no idea if any devices don't fully
remove themselves from the list when shutdown.

Does anyone have any guidance for what would make a more appropriate
fix?

Thanks,
Hugh

>From ff59be003a016ed1f638f89658bcbb17d69b2983 Mon Sep 17 00:00:00 2001
From: Hugh Daschbach <hdasch@broadcom.com>
Date: Mon, 1 Mar 2010 17:01:48 -0800
Subject: [PATCH] Fix Cont00045892: bnx2fc 0.2.3: Cannot shutdown a system

Detect and attempt recovery from devices_kset->list corruption.

The list gets corrupted during reboot by a race between
device_shutdown() and scsi_remove_target().

Signed-off-by: Hugh Daschbach <hdasch@broadcom.com>
---
 drivers/base/core.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2820257..07851e9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1733,6 +1733,7 @@ void device_shutdown(void)
 {
 	struct device *dev, *devn;

+retry:
 	list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
 				kobj.entry) {
 		if (dev->bus && dev->bus->shutdown) {
@@ -1742,6 +1743,9 @@ void device_shutdown(void)
 			dev_dbg(dev, "shutdown\n");
 			dev->driver->shutdown(dev);
 		}
+		if (devn->kobj.entry.next == devn->kobj.entry.prev &&
+		    devn->kobj.entry.next != &devices_kset->list)
+			goto retry;
 	}
 	async_synchronize_full();
 }
-- 
1.7.0.rc0.48.gdace5



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-03  0:47 System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue Hugh Daschbach
@ 2010-03-03  4:54 ` Greg KH
  2010-03-03 19:16   ` Hugh Daschbach
  2010-03-03 15:50 ` Alan Stern
  1 sibling, 1 reply; 11+ messages in thread
From: Greg KH @ 2010-03-03  4:54 UTC (permalink / raw)
  To: Hugh Daschbach
  Cc: Kay Sievers, Alan Stern, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi

On Tue, Mar 02, 2010 at 04:47:01PM -0800, Hugh Daschbach wrote:
> The system may fail to boot when the kernel's devices_kset->list gets
> written by another thread while device_shutdown() is traversing the
> list.  Though not common, this is fairly reproducible for some SCSI
> Fibre Channel topologies; particularly so with FCoE configurations.

Really?  What a mess :(

> The reboot thread calls device_shutdown() as part of system shutdown.
> device_shutdown() loops through devices_kset->list, shutting down each
> system device.  But devices_kset->list isn't protected from other
> writers while device_shutdown() traverses the list.

Can't we just protect the list?  What is wanting to write to the list
while shutdown is happening?

> One such secondary writer is the SCI Fibre Channel workqueue.  When
> fc_wq_N removes a device that device_shutdown() holds in it's "devn"
> (list traversal iterator) variable, device_shutdown() stalls, chasing
> what is essentially a broken link.
> 
> This is not a common occurrence.  But FC SCSI devices associated with a
> link that has gone down cause a race between device_shutdown() running
> in reboot's process and scsi_remove_target() running in a SCSI FC
> workqueue (fc_wq_N).
> 
> Network attached FC devices are particularly vulnerable because SysV
> init scripts shut network interfaces down before proceeding with the
> reboot request.  So by the time reboot is called, the link to the FC
> devices is already down.
> 
> When the link is down device_shutdown() stalls (in sd_shutdown() --
> which issues cache flush CDBs to what are, by that time, inaccessible
> devices).  The stall ends when the fc rport timer expires.  But the
> timer expiration also initiates fc_starget_delete() in the fc workqueue,
> causing the race with device_shutdown().

Can't you just not do this?

> The attached patch detects and attempts to recover from the
> corruption.  But this can hardly be considered a fix, as it does not
> address the race between device_shutdown() and scsi_remove_target().

I agree, this patch isn't ok, it should be handled in the scsi core as
it looks like a scsi problem, not a driver core problem, right?

> Perhaps converting the list_for_each_entry_safe_reverse() to something
> like.
> 
>         while (!list_empty(&devices_kset->list)) {
>                 dev = list_last_entry(...);
>                 ...
>         }
> 
> might be appropriate.  But I have no idea if any devices don't fully
> remove themselves from the list when shutdown.

That shouldn't really solve the problem, right?

> Does anyone have any guidance for what would make a more appropriate
> fix?

So the scsi core is trying to remove a device at the same time shutdown
is happening, right?  So we need to protect the list somehow, maybe just
switch it over to use a klist which should handle this for us instead?
Can you try that?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-03  0:47 System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue Hugh Daschbach
  2010-03-03  4:54 ` Greg KH
@ 2010-03-03 15:50 ` Alan Stern
  1 sibling, 0 replies; 11+ messages in thread
From: Alan Stern @ 2010-03-03 15:50 UTC (permalink / raw)
  To: Hugh Daschbach
  Cc: Greg Kroah-Hartman, Kay Sievers, Jan Blunck, David Vrabel,
	linux-kernel, linux-scsi

On Tue, 2 Mar 2010, Hugh Daschbach wrote:

> The system may fail to boot when the kernel's devices_kset->list gets
> written by another thread while device_shutdown() is traversing the
> list.  Though not common, this is fairly reproducible for some SCSI
> Fibre Channel topologies; particularly so with FCoE configurations.
> 
> The reboot thread calls device_shutdown() as part of system shutdown.
> device_shutdown() loops through devices_kset->list, shutting down each
> system device.  But devices_kset->list isn't protected from other
> writers while device_shutdown() traverses the list.
> 
> One such secondary writer is the SCI Fibre Channel workqueue.  When
> fc_wq_N removes a device that device_shutdown() holds in it's "devn"
> (list traversal iterator) variable, device_shutdown() stalls, chasing
> what is essentially a broken link.
> 
> This is not a common occurrence.  But FC SCSI devices associated with a
> link that has gone down cause a race between device_shutdown() running
> in reboot's process and scsi_remove_target() running in a SCSI FC
> workqueue (fc_wq_N).
> 
> Network attached FC devices are particularly vulnerable because SysV
> init scripts shut network interfaces down before proceeding with the
> reboot request.  So by the time reboot is called, the link to the FC
> devices is already down.
> 
> When the link is down device_shutdown() stalls (in sd_shutdown() --
> which issues cache flush CDBs to what are, by that time, inaccessible
> devices).  The stall ends when the fc rport timer expires.  But the
> timer expiration also initiates fc_starget_delete() in the fc workqueue,
> causing the race with device_shutdown().
> 
> The attached patch detects and attempts to recover from the
> corruption.  But this can hardly be considered a fix, as it does not
> address the race between device_shutdown() and scsi_remove_target().
> 
> Perhaps converting the list_for_each_entry_safe_reverse() to something
> like.
> 
>         while (!list_empty(&devices_kset->list)) {
>                 dev = list_last_entry(...);
>                 ...
>         }
> 
> might be appropriate.  But I have no idea if any devices don't fully
> remove themselves from the list when shutdown.

You can't make any assumptions about that.  Probably most of them 
don't.

> Does anyone have any guidance for what would make a more appropriate
> fix?

Your suggestion above ought to work out okay, if you remove each device
from the list yourself as you come to it.  (I don't think that will
cause problems elsewhere, but I could be wrong.)  However, struct kset
contains a spinlock which is supposed to protect the list.  This loop
should be using the spinlock.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-03  4:54 ` Greg KH
@ 2010-03-03 19:16   ` Hugh Daschbach
  2010-03-03 20:25     ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Daschbach @ 2010-03-03 19:16 UTC (permalink / raw)
  To: Greg KH
  Cc: Kay Sievers, Alan Stern, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi, james Bottomley, James Smart

+ James Bottomley
+ James Smart

As the discussion tilts toward SCSI device shutdown.

Greg KH [mailto:gregkh@suse.de] writes:

> On Tue, Mar 02, 2010 at 04:47:01PM -0800, Hugh Daschbach wrote:
>> The system may fail to boot when the kernel's devices_kset->list gets
>> written by another thread while device_shutdown() is traversing the
>> list.  Though not common, this is fairly reproducible for some SCSI
>> Fibre Channel topologies; particularly so with FCoE configurations.
>
> Really?  What a mess :(
>
>> The reboot thread calls device_shutdown() as part of system shutdown.
>> device_shutdown() loops through devices_kset->list, shutting down each
>> system device.  But devices_kset->list isn't protected from other
>> writers while device_shutdown() traverses the list.
>
> Can't we just protect the list?  What is wanting to write to the list
> while shutdown is happening?

Indeed, Alan suggested holding the kset spinlock while iterating the
list.  Unfortunately, the device shutdown routines may sleep.  At least
the SCSI sd_shutdown routine issues I/O to the device as part of
flushing device caches.  I would guess other subsystems sleep as well.

>> One such secondary writer is the SCI Fibre Channel workqueue.  When
>> fc_wq_N removes a device that device_shutdown() holds in it's "devn"
>> (list traversal iterator) variable, device_shutdown() stalls, chasing
>> what is essentially a broken link.
>> 
>> This is not a common occurrence.  But FC SCSI devices associated with a
>> link that has gone down cause a race between device_shutdown() running
>> in reboot's process and scsi_remove_target() running in a SCSI FC
>> workqueue (fc_wq_N).
>> 
>> Network attached FC devices are particularly vulnerable because SysV
>> init scripts shut network interfaces down before proceeding with the
>> reboot request.  So by the time reboot is called, the link to the FC
>> devices is already down.
>> 
>> When the link is down device_shutdown() stalls (in sd_shutdown() --
>> which issues cache flush CDBs to what are, by that time, inaccessible
>> devices).  The stall ends when the fc rport timer expires.  But the
>> timer expiration also initiates fc_starget_delete() in the fc workqueue,
>> causing the race with device_shutdown().
>
> Can't you just not do this?

I'm not sure.  I'd punt this question to the SCSI maintainers.  From the
FC transport point of view, the rport timeout simply looks like a device
unplug event.  Should the unplug be handled differently if the system
is already shutting down?

Presumably any other subsystem that supports device unplug (usb, for
example) could enter the same race.  But, FCoE devices seem uniquely
poised to provoke the issue in a semi-repeatable fashion.

...

>> Does anyone have any guidance for what would make a more appropriate
>> fix?
>
> So the scsi core is trying to remove a device at the same time shutdown
> is happening, right?  So we need to protect the list somehow, maybe just
> switch it over to use a klist which should handle this for us instead?
> Can you try that?

I'll try klist.  That looks like a good mediator between traversal and
removal.

Thanks,
Hugh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-03 19:16   ` Hugh Daschbach
@ 2010-03-03 20:25     ` Alan Stern
  2010-03-04  3:25       ` Hugh Daschbach
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2010-03-03 20:25 UTC (permalink / raw)
  To: Hugh Daschbach
  Cc: Greg KH, Kay Sievers, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi, james Bottomley, James Smart

On Wed, 3 Mar 2010, Hugh Daschbach wrote:

> > Can't we just protect the list?  What is wanting to write to the list
> > while shutdown is happening?
> 
> Indeed, Alan suggested holding the kset spinlock while iterating the
> list.  Unfortunately, the device shutdown routines may sleep.  At least
> the SCSI sd_shutdown routine issues I/O to the device as part of
> flushing device caches.  I would guess other subsystems sleep as well.

What I meant was that you should hold the spinlock while finding and 
unlinking the last device on the list.  Clearly you shouldn't hold it 
while calling the device shutdown routine.

> I'll try klist.  That looks like a good mediator between traversal and
> removal.

Yes, it removes a lot of difficulties.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-03 20:25     ` Alan Stern
@ 2010-03-04  3:25       ` Hugh Daschbach
  2010-03-04 15:18         ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Daschbach @ 2010-03-04  3:25 UTC (permalink / raw)
  To: Alan Stern, Greg KH
  Cc: Kay Sievers, Jan Blunck, David Vrabel, linux-kernel, linux-scsi,
	james Bottomley, James Smart

Alan Stern [mailto:stern@rowland.harvard.edu] writes:

> On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>
>> > Can't we just protect the list?  What is wanting to write to the list
>> > while shutdown is happening?
>> 
>> Indeed, Alan suggested holding the kset spinlock while iterating the
>> list.  Unfortunately, the device shutdown routines may sleep.  At least
>> the SCSI sd_shutdown routine issues I/O to the device as part of
>> flushing device caches.  I would guess other subsystems sleep as well.
>
> What I meant was that you should hold the spinlock while finding and 
> unlinking the last device on the list.  Clearly you shouldn't hold it 
> while calling the device shutdown routine.

I misunderstood.  But I believe insertion and deletion is properly
serliaized.  It looks to me like the list structure is intact.  It's the
iterator that's been driven off into the weeds.

>> I'll try klist.  That looks like a good mediator between traversal and
>> removal.
>
> Yes, it removes a lot of difficulties.
>
> Alan Stern

Just to be clear, the list we're talking about is "list" in "struct
kset"  And the nodes of the list are chained by "entry" in "struct
kobject".  I just want to make sure that this is what's intended before
I get too far down the road.

At a minimum the change looks something like the patch below.  This
doesn't work yet.  And I'll need extensive testing in device plug and
unplug.  So I'm not looking for a detailed review.  But if I'm obviously
way off base, I'd like to know earlier rather than later.

Thanks for the guidance.

Hugh

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 58ae8e0..2c9b14a 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -18,6 +18,7 @@
 
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/klist.h>
 #include <linux/sysfs.h>
 #include <linux/compiler.h>
 #include <linux/spinlock.h>
@@ -58,7 +59,7 @@ enum kobject_action {
 
 struct kobject {
 	const char		*name;
-	struct list_head	entry;
+	struct klist_node	entry;
 	struct kobject		*parent;
 	struct kset		*kset;
 	struct kobj_type	*ktype;
@@ -152,7 +153,7 @@ extern struct sysfs_ops kobj_sysfs_ops;
  * desired.
  */
 struct kset {
-	struct list_head list;
+	struct klist list;
 	spinlock_t list_lock;
 	struct kobject kobj;
 	struct kset_uevent_ops *uevent_ops;
diff --git a/include/linux/klist.h b/include/linux/klist.h
index e91a4e5..274fa91 100644
--- a/include/linux/klist.h
+++ b/include/linux/klist.h
@@ -64,5 +64,6 @@ extern void klist_iter_init_node(struct klist *k, struct klist_iter *i,
 				 struct klist_node *n);
 extern void klist_iter_exit(struct klist_iter *i);
 extern struct klist_node *klist_next(struct klist_iter *i);
+extern struct klist_node *klist_prev(struct klist_iter *i);
 
 #endif
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 07851e9..4ea25b0 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1726,15 +1726,24 @@ out:
 }
 EXPORT_SYMBOL_GPL(device_move);
 
+static struct device *klist_node_to_dev(struct klist_node *node)
+{
+	struct kobject *kobj = container_of(node, struct kobject, entry);
+	return container_of(kobj, struct device, kobj);
+}
+
 /**
  * device_shutdown - call ->shutdown() on each device to shutdown.
  */
 void device_shutdown(void)
 {
-	struct device *dev, *devn;
+	struct device *dev;
+	struct klist_iter iter;
+	struct klist_node *node;
 
-	list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
-				kobj.entry) {
+	klist_iter_init(&devices_kset->list, &iter);
+	while ((node = klist_prev(&iter)) != NULL) {
+		dev = klist_node_to_dev(node);
 		if (dev->bus && dev->bus->shutdown) {
 			dev_dbg(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
@@ -1743,6 +1751,7 @@ retry:
 			dev_dbg(dev, "shutdown\n");
 			dev->driver->shutdown(dev);
 		}
 	}
+	klist_iter_exit(&iter);
 	async_synchronize_full();
 }
diff --git a/drivers/base/sys.c b/drivers/base/sys.c
index 0d90390..5d77600 100644
--- a/drivers/base/sys.c
+++ b/drivers/base/sys.c
@@ -160,6 +160,30 @@ EXPORT_SYMBOL_GPL(sysdev_class_unregister);
 
 static DEFINE_MUTEX(sysdev_drivers_lock);
 
+static struct sysdev_class *sysdev_next_class(struct klist_iter *iter)
+{
+	struct klist_node *node = klist_next(iter);
+	return node
+		? container_of(node, struct sysdev_class, kset.kobj.entry)
+		: NULL;
+}
+
+static struct sysdev_class *sysdev_prev_class(struct klist_iter *iter)
+{
+	struct klist_node *node = klist_prev(iter);
+	return node
+		? container_of(node, struct sysdev_class, kset.kobj.entry)
+		: NULL;
+}
+
+static struct sys_device *sysdev_next_sysdev(struct klist_iter *iter)
+{
+	struct klist_node *node = klist_prev(iter);
+	return node
+		? container_of(node, struct sys_device, kobj.entry)
+		: NULL;
+}
+
 /**
  *	sysdev_driver_register - Register auxillary driver
  *	@cls:	Device class driver belongs to.
@@ -193,8 +217,12 @@ int sysdev_driver_register(struct sysdev_class *cls, struct sysdev_driver *drv)
 		/* If devices of this class already exist, tell the driver */
 		if (drv->add) {
 			struct sys_device *dev;
-			list_for_each_entry(dev, &cls->kset.list, kobj.entry)
+			struct klist_iter device_iter;
+
+			klist_iter_init(&cls->kset.list, &device_iter);
+			while ((dev = sysdev_next_sysdev(&device_iter)) != NULL)
 				drv->add(dev);
+			klist_iter_exit(&device_iter);
 		}
 	} else {
 		err = -EINVAL;
@@ -218,8 +246,12 @@ void sysdev_driver_unregister(struct sysdev_class *cls,
 	if (cls) {
 		if (drv->remove) {
 			struct sys_device *dev;
-			list_for_each_entry(dev, &cls->kset.list, kobj.entry)
+			struct klist_iter device_iter;
+
+			klist_iter_init(&cls->kset.list, &device_iter);
+			while ((dev = sysdev_next_sysdev(&device_iter)) != NULL)
 				drv->remove(dev);
+			klist_iter_exit(&device_iter);
 		}
 		kset_put(&cls->kset);
 	}
@@ -312,18 +344,23 @@ void sysdev_unregister(struct sys_device *sysdev)
  */
 void sysdev_shutdown(void)
 {
+	struct klist_iter class_iter;
 	struct sysdev_class *cls;
 
 	pr_debug("Shutting Down System Devices\n");
 
 	mutex_lock(&sysdev_drivers_lock);
-	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
+
+	klist_iter_init(&system_kset->list, &class_iter);
+	while ((cls = sysdev_prev_class(&class_iter)) != NULL) {
+		struct klist_iter device_iter;
 		struct sys_device *sysdev;
 
 		pr_debug("Shutting down type '%s':\n",
 			 kobject_name(&cls->kset.kobj));
 
-		list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
+		klist_iter_init(&cls->kset.list, &device_iter);
+		while ((sysdev = sysdev_next_sysdev(&device_iter)) != NULL) {
 			struct sysdev_driver *drv;
 			pr_debug(" %s\n", kobject_name(&sysdev->kobj));
 
@@ -337,7 +374,9 @@ void sysdev_shutdown(void)
 			if (cls->shutdown)
 				cls->shutdown(sysdev);
 		}
+		klist_iter_exit(&device_iter);
 	}
+	klist_iter_exit(&class_iter);
 	mutex_unlock(&sysdev_drivers_lock);
 }
 
@@ -375,6 +414,9 @@ static void __sysdev_resume(struct sys_device *dev)
  */
 int sysdev_suspend(pm_message_t state)
 {
+	struct klist_iter class_iter;
+	struct klist_iter device_iter;
+	struct klist_iter err_device_iter;
 	struct sysdev_class *cls;
 	struct sys_device *sysdev, *err_dev;
 	struct sysdev_driver *drv, *err_drv;
@@ -392,15 +434,17 @@ int sysdev_suspend(pm_message_t state)
 
 	pr_debug("Suspending System Devices\n");
 
-	list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
+	klist_iter_init(&system_kset->list, &class_iter);
+	while ((cls = sysdev_prev_class(&class_iter)) != NULL) {
 		pr_debug("Suspending type '%s':\n",
 			 kobject_name(&cls->kset.kobj));
 
-		list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
+		klist_iter_init(&cls->kset.list, &device_iter);
+		while ((sysdev = sysdev_next_sysdev(&device_iter)) != NULL) {
 			pr_debug(" %s\n", kobject_name(&sysdev->kobj));
 
 			/* Call auxillary drivers first */
-			list_for_each_entry(drv, &cls->drivers, entry) {
+			list_for_each_entry (drv, &cls->drivers, entry) {
 				if (drv->suspend) {
 					ret = drv->suspend(sysdev, state);
 					if (ret)
@@ -421,7 +465,9 @@ int sysdev_suspend(pm_message_t state)
 					cls->suspend);
 			}
 		}
+		klist_iter_exit(&device_iter);
 	}
+	klist_iter_exit(&class_iter);
 	return 0;
 	/* resume current sysdev */
 cls_driver:
@@ -441,20 +487,26 @@ aux_driver:
 	}
 
 	/* resume other sysdevs in current class */
-	list_for_each_entry(err_dev, &cls->kset.list, kobj.entry) {
+	klist_iter_init(&cls->kset.list, &err_device_iter);
+	while ((err_dev = sysdev_next_sysdev(&err_device_iter)) != NULL) {
 		if (err_dev == sysdev)
 			break;
 		pr_debug(" %s\n", kobject_name(&err_dev->kobj));
 		__sysdev_resume(err_dev);
 	}
+	klist_iter_exit(&err_device_iter);
 
 	/* resume other classes */
-	list_for_each_entry_continue(cls, &system_kset->list, kset.kobj.entry) {
-		list_for_each_entry(err_dev, &cls->kset.list, kobj.entry) {
+	while ((cls = sysdev_next_class(&class_iter)) != NULL) {
+		klist_iter_init(&cls->kset.list, &err_device_iter);
+		while ((err_dev = sysdev_next_sysdev(&err_device_iter))) {
 			pr_debug(" %s\n", kobject_name(&err_dev->kobj));
 			__sysdev_resume(err_dev);
 		}
+		klist_iter_exit(&err_device_iter);
 	}
+	klist_iter_exit(&device_iter);
+	klist_iter_exit(&class_iter);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(sysdev_suspend);
@@ -469,6 +521,8 @@ EXPORT_SYMBOL_GPL(sysdev_suspend);
  */
 int sysdev_resume(void)
 {
+	struct klist_iter class_iter;
+	struct klist_iter device_iter;
 	struct sysdev_class *cls;
 
 	WARN_ONCE(!irqs_disabled(),
@@ -476,18 +530,21 @@ int sysdev_resume(void)
 
 	pr_debug("Resuming System Devices\n");
 
-	list_for_each_entry(cls, &system_kset->list, kset.kobj.entry) {
+	while ((cls = sysdev_next_class(&class_iter)) != NULL) {
 		struct sys_device *sysdev;
 
 		pr_debug("Resuming type '%s':\n",
 			 kobject_name(&cls->kset.kobj));
 
-		list_for_each_entry(sysdev, &cls->kset.list, kobj.entry) {
+		klist_iter_init(&cls->kset.list, &device_iter);
+		while ((sysdev = sysdev_next_sysdev(&device_iter)) != NULL) {
 			pr_debug(" %s\n", kobject_name(&sysdev->kobj));
 
 			__sysdev_resume(sysdev);
 		}
+		klist_iter_exit(&device_iter);
 	}
+	klist_iter_exit(&class_iter);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sysdev_resume);
diff --git a/lib/klist.c b/lib/klist.c
index 573d606..fcb19c8 100644
--- a/lib/klist.c
+++ b/lib/klist.c
@@ -363,3 +363,44 @@ struct klist_node *klist_next(struct klist_iter *i)
 	return i->i_cur;
 }
 EXPORT_SYMBOL_GPL(klist_next);
+
+/**
+ * klist_prev - Ante up next previous in list.
+ * @i: Iterator structure.
+ *
+ * First grab list lock. Decrement the reference count of the last iterated
+ * node, if there was one. Grab the previous node, increment its reference
+ * count, drop the lock, and return that previous node.
+ */
+struct klist_node *klist_prev(struct klist_iter *i)
+{
+	void (*put)(struct klist_node *) = i->i_klist->put;
+	struct klist_node *last = i->i_cur;
+	struct klist_node *prev;
+
+	spin_lock(&i->i_klist->k_lock);
+
+	if (last) {
+		prev = to_klist_node(last->n_node.prev);
+		if (!klist_dec_and_del(last))
+			put = NULL;
+	} else
+		prev = to_klist_node(i->i_klist->k_list.prev);
+
+	i->i_cur = NULL;
+	while (prev != to_klist_node(&i->i_klist->k_list)) {
+		if (likely(!knode_dead(prev))) {
+			kref_get(&prev->n_ref);
+			i->i_cur = prev;
+			break;
+		}
+		prev = to_klist_node(prev->n_node.prev);
+	}
+
+	spin_unlock(&i->i_klist->k_lock);
+
+	if (put && last)
+		put(last);
+	return i->i_cur;
+}
+EXPORT_SYMBOL_GPL(klist_prev);
diff --git a/lib/kobject.c b/lib/kobject.c
index b512b74..96808c3 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -126,7 +126,7 @@ static void kobj_kset_join(struct kobject *kobj)
 
 	kset_get(kobj->kset);
 	spin_lock(&kobj->kset->list_lock);
-	list_add_tail(&kobj->entry, &kobj->kset->list);
+	klist_add_tail(&kobj->entry, &kobj->kset->list);
 	spin_unlock(&kobj->kset->list_lock);
 }
 
@@ -137,7 +137,7 @@ static void kobj_kset_leave(struct kobject *kobj)
 		return;
 
 	spin_lock(&kobj->kset->list_lock);
-	list_del_init(&kobj->entry);
+	klist_del(&kobj->entry);
 	spin_unlock(&kobj->kset->list_lock);
 	kset_put(kobj->kset);
 }
@@ -147,7 +147,6 @@ static void kobject_init_internal(struct kobject *kobj)
 	if (!kobj)
 		return;
 	kref_init(&kobj->kref);
-	INIT_LIST_HEAD(&kobj->entry);
 	kobj->state_in_sysfs = 0;
 	kobj->state_add_uevent_sent = 0;
 	kobj->state_remove_uevent_sent = 0;
@@ -671,7 +670,7 @@ EXPORT_SYMBOL_GPL(kobject_create_and_add);
 void kset_init(struct kset *k)
 {
 	kobject_init_internal(&k->kobj);
-	INIT_LIST_HEAD(&k->list);
+	klist_init(&k->list, NULL, NULL);
 	spin_lock_init(&k->list_lock);
 }
 
@@ -735,6 +734,14 @@ void kset_unregister(struct kset *k)
 	kobject_put(&k->kobj);
 }
 
+static struct kobject *kset_next_kobject(struct klist_iter *iter)
+{
+	struct klist_node *node = klist_next(iter);
+	return node
+		? container_of(node, struct kobject, entry)
+		: NULL;
+}
+
 /**
  * kset_find_obj - search for object in kset.
  * @kset: kset we're looking in.
@@ -746,17 +753,18 @@ void kset_unregister(struct kset *k)
  */
 struct kobject *kset_find_obj(struct kset *kset, const char *name)
 {
+	struct klist_iter i;
 	struct kobject *k;
 	struct kobject *ret = NULL;
 
-	spin_lock(&kset->list_lock);
-	list_for_each_entry(k, &kset->list, entry) {
+	klist_iter_init(&kset->list, &i);
+	while ((k = kset_next_kobject(&i))) {
 		if (kobject_name(k) && !strcmp(kobject_name(k), name)) {
 			ret = kobject_get(k);
 			break;
 		}
 	}
-	spin_unlock(&kset->list_lock);
+	klist_iter_exit(&i);
 	return ret;
 }
 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-04  3:25       ` Hugh Daschbach
@ 2010-03-04 15:18         ` Alan Stern
  2010-03-04 19:09           ` Hugh Daschbach
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2010-03-04 15:18 UTC (permalink / raw)
  To: Hugh Daschbach
  Cc: Greg KH, Kay Sievers, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi, james Bottomley, James Smart

On Wed, 3 Mar 2010, Hugh Daschbach wrote:

> Alan Stern [mailto:stern@rowland.harvard.edu] writes:
> 
> > On Wed, 3 Mar 2010, Hugh Daschbach wrote:
> >
> >> > Can't we just protect the list?  What is wanting to write to the list
> >> > while shutdown is happening?
> >> 
> >> Indeed, Alan suggested holding the kset spinlock while iterating the
> >> list.  Unfortunately, the device shutdown routines may sleep.  At least
> >> the SCSI sd_shutdown routine issues I/O to the device as part of
> >> flushing device caches.  I would guess other subsystems sleep as well.
> >
> > What I meant was that you should hold the spinlock while finding and 
> > unlinking the last device on the list.  Clearly you shouldn't hold it 
> > while calling the device shutdown routine.
> 
> I misunderstood.  But I believe insertion and deletion is properly
> serliaized.  It looks to me like the list structure is intact.  It's the
> iterator that's been driven off into the weeds.
> 
> >> I'll try klist.  That looks like a good mediator between traversal and
> >> removal.
> >
> > Yes, it removes a lot of difficulties.
> >
> > Alan Stern
> 
> Just to be clear, the list we're talking about is "list" in "struct
> kset"  And the nodes of the list are chained by "entry" in "struct
> kobject".  I just want to make sure that this is what's intended before
> I get too far down the road.
> 
> At a minimum the change looks something like the patch below.  This
> doesn't work yet.  And I'll need extensive testing in device plug and
> unplug.  So I'm not looking for a detailed review.  But if I'm obviously
> way off base, I'd like to know earlier rather than later.
> 
> Thanks for the guidance.

If you really want to do this then you should remove the lock member 
from struct kset.  However this seems like an awful lot of work 
compared to my original suggestion -- something like this (untested, 
and you'll want to add comments):

Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -1719,10 +1719,14 @@ EXPORT_SYMBOL_GPL(device_move);
  */
 void device_shutdown(void)
 {
-	struct device *dev, *devn;
+	struct device *dev;
+
+	spin_lock(&devices_kset->lock);
+	while (!list_empty(&devices_kset->list)) {
+		dev = list_entry(devices_kset->list.prev, struct device,
+				kobj.entry);
+		spin_unlock(&devices_kset->lock);
 
-	list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
-				kobj.entry) {
 		if (dev->bus && dev->bus->shutdown) {
 			dev_dbg(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
@@ -1730,6 +1734,10 @@ void device_shutdown(void)
 			dev_dbg(dev, "shutdown\n");
 			dev->driver->shutdown(dev);
 		}
+
+		spin_lock(&devices_kset->lock);
+		list_del_init(&dev->kobj.entry);
 	}
+	spin_unlock(&devices_kset->lock);
 	async_synchronize_full();
 }

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-04 15:18         ` Alan Stern
@ 2010-03-04 19:09           ` Hugh Daschbach
  2010-03-04 19:22             ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Daschbach @ 2010-03-04 19:09 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, Kay Sievers, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi, james Bottomley, James Smart

Alan Stern [mailto:stern@rowland.harvard.edu] writes:

> On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>
>> Alan Stern [mailto:stern@rowland.harvard.edu] writes:
>> 
>> > On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>> >
>> >> > Can't we just protect the list?  What is wanting to write to the list
>> >> > while shutdown is happening?
>> >> 
>> >> Indeed, Alan suggested holding the kset spinlock while iterating the
...
>> > What I meant was that you should hold the spinlock while finding and 
>> > unlinking the last device on the list.  Clearly you shouldn't hold it 
>> > while calling the device shutdown routine.
>> 
>> I misunderstood.  But I believe insertion and deletion is properly
>> serliaized.  It looks to me like the list structure is intact.  It's the
>> iterator that's been driven off into the weeds.
...
>> Just to be clear, the list we're talking about is "list" in "struct
>> kset"  And the nodes of the list are chained by "entry" in "struct
>> kobject".
...
>> At a minimum the change looks something like the patch below.
...
> If you really want to do this then you should remove the lock member 
> from struct kset.  However this seems like an awful lot of work 
> compared to my original suggestion -- something like this (untested, 
> and you'll want to add comments):
...

I'm not sure I do want to pursue this.  It does seem particularly
invasive at a fundamental level of a core data structure.

Apparently I still don't understand your original suggestion.  I'd
prefer to, especially if it leads to a simpler fix.  The loop in
device_shutdown() looks something like:

       struct device *dev, *devn;

        list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
                                kobj.entry) {
                if (dev->bus && dev->bus->shutdown) {
                        dev->bus->shutdown(dev);
                } else if (dev->driver && dev->driver->shutdown) {
                        dev->driver->shutdown(dev);
                }
        }

*dev gets delinked kobj_kset_leave() indirectly called from
dev->*->shutdown(dev).  This is protected by the spinlock.

The secondary thread similarly calls kobj_kset_leave().  But when the
secondary thread calls the shutdown routine for the device that devn
points to, the loop hangs.

Is there some way I can detect that devn no longer points to a valid
device upon return from dev->*->shutdown(dev)?  Or, where else can I
look to better understand your suggestion?

Thanks,
Hugh


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-04 19:09           ` Hugh Daschbach
@ 2010-03-04 19:22             ` Alan Stern
  2010-03-04 22:32               ` Hugh Daschbach
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2010-03-04 19:22 UTC (permalink / raw)
  To: Hugh Daschbach
  Cc: Greg KH, Kay Sievers, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi, james Bottomley, James Smart

On Thu, 4 Mar 2010, Hugh Daschbach wrote:

> Apparently I still don't understand your original suggestion.  I'd
> prefer to, especially if it leads to a simpler fix.  The loop in
> device_shutdown() looks something like:
> 
>        struct device *dev, *devn;
> 
>         list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
>                                 kobj.entry) {
>                 if (dev->bus && dev->bus->shutdown) {
>                         dev->bus->shutdown(dev);
>                 } else if (dev->driver && dev->driver->shutdown) {
>                         dev->driver->shutdown(dev);
>                 }
>         }
> 
> *dev gets delinked kobj_kset_leave() indirectly called from
> dev->*->shutdown(dev).  This is protected by the spinlock.
> 
> The secondary thread similarly calls kobj_kset_leave().  But when the
> secondary thread calls the shutdown routine for the device that devn
> points to, the loop hangs.
> 
> Is there some way I can detect that devn no longer points to a valid
> device upon return from dev->*->shutdown(dev)?  Or, where else can I
> look to better understand your suggestion?

Did you read the patch in my previous message?  You didn't quote it.  
It removes the devn variable, so the problem you're worried about
cannot occur.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-04 19:22             ` Alan Stern
@ 2010-03-04 22:32               ` Hugh Daschbach
  2010-03-05 14:31                 ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Daschbach @ 2010-03-04 22:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg KH, Kay Sievers, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi, james Bottomley, James Smart

Alan Stern [mailto:stern@rowland.harvard.edu] writes:

> On Thu, 4 Mar 2010, Hugh Daschbach wrote:
>
>> Is there some way I can detect that devn no longer points to a valid
>> device upon return from dev->*->shutdown(dev)?  Or, where else can I
>> look to better understand your suggestion?
>
> Did you read the patch in my previous message?  You didn't quote it.  
> It removes the devn variable, so the problem you're worried about
> cannot occur.
>
> Alan Stern

Mea culpa.  I looked but did not see.  I have tested your patch.  It
does solve my problem.  I've attached my version, with comments as you
suggested, below.

I have added get_device()/put_device() to ensure there's the device
hasn't fully disappeared before calling list_del_init().  Is this
needed?  If so, there's "might_sleep()" commented out in put_device().
Do I need to release the lock before calling put_device()?

Thanks for you patience.

Hugh

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2820257..1133d7a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1731,10 +1731,20 @@ EXPORT_SYMBOL_GPL(device_move);
  */
 void device_shutdown(void)
 {
-	struct device *dev, *devn;
+	struct device *dev;
+
+	spin_lock(&devices_kset->list_lock);
+	/* 
+	 * Walk the devices list backward, shutting down each in turn.
+	 * Beware that device unplug events may also start pulling
+	 * devices offline, even as the system is shutting down.
+	 */
+	while (!list_empty(&devices_kset->list)) {
+		dev = list_entry(devices_kset->list.prev, struct device,
+				kobj.entry);
+		get_device(dev);
+		spin_unlock(&devices_kset->list_lock);
 
-	list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
-				kobj.entry) {
 		if (dev->bus && dev->bus->shutdown) {
 			dev_dbg(dev, "shutdown\n");
 			dev->bus->shutdown(dev);
@@ -1742,6 +1752,15 @@ void device_shutdown(void)
 			dev_dbg(dev, "shutdown\n");
 			dev->driver->shutdown(dev);
 		}
+
+		/*
+		 * Make sure the device iss off the kset list, in the
+		 * event that dev->*->shutdown() didn't remove it.
+		 */
+		spin_lock(&devices_kset->list_lock);
+		list_del_init(&dev->kobj.entry);
+		put_device(dev);
 	}
+	spin_unlock(&devices_kset->list_lock);
 	async_synchronize_full();
 }


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue
  2010-03-04 22:32               ` Hugh Daschbach
@ 2010-03-05 14:31                 ` Alan Stern
  0 siblings, 0 replies; 11+ messages in thread
From: Alan Stern @ 2010-03-05 14:31 UTC (permalink / raw)
  To: Hugh Daschbach
  Cc: Greg KH, Kay Sievers, Jan Blunck, David Vrabel, linux-kernel,
	linux-scsi, james Bottomley, James Smart

On Thu, 4 Mar 2010, Hugh Daschbach wrote:

> I have added get_device()/put_device() to ensure there's the device
> hasn't fully disappeared before calling list_del_init().  Is this
> needed?

Yes, it is.  In fact, it's necessary to insure the device doesn't
disappear before you call the shutdown routines!

>  If so, there's "might_sleep()" commented out in put_device().
> Do I need to release the lock before calling put_device()?

A better approach might be to do the list_del_init() while still 
holding the initial spinlock, before you call the shutdown methods.  
Then do the put_device() before reacquiring the spinlock at the end of 
the loop.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-03-05 14:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-03  0:47 System reboot hangs due to race against devices_kset->list triggered by SCSI FC workqueue Hugh Daschbach
2010-03-03  4:54 ` Greg KH
2010-03-03 19:16   ` Hugh Daschbach
2010-03-03 20:25     ` Alan Stern
2010-03-04  3:25       ` Hugh Daschbach
2010-03-04 15:18         ` Alan Stern
2010-03-04 19:09           ` Hugh Daschbach
2010-03-04 19:22             ` Alan Stern
2010-03-04 22:32               ` Hugh Daschbach
2010-03-05 14:31                 ` Alan Stern
2010-03-03 15:50 ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.