linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: sbp2.c on SMP
@ 2001-11-12  4:49 Douglas Gilbert
  0 siblings, 0 replies; 14+ messages in thread
From: Douglas Gilbert @ 2001-11-12  4:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-scsi

Andrew Morton wrote:

> drivers/ieee1394/sbp2.c deadlocks immediately on SMP, because
> io_request_lock is not held over its call to scsi_old_done().
> 
> I don't know why scsi_old_done() actually requires io_request_lock;
> perhaps Jens could comment on whether I've taken the lock in the
> appropriate place?
> 
> With the appended patch I can confirm that the driver happily runs
> `dbench 40' for half an hour on dual x86.   Even when you kick the
> disk onto the floor (sorry, HJ).
> 
> The games which scsi_old_done() plays with spinlocks and interrupt
> enabling are really foul.  If someone calls it with interrupts disabled
> (sbp2 does this) then scsi_old_done() will enable interrupts for you.
> If, for example, you call scsi_old_done() under spin_lock_irqsave(), 
> the reenabling of interrupts will expose you to deadlocks.  Perhaps
> scsi_old_done() should just use spin_unlock()/spin_lock()?
> 
> I tried enabling SBP2_USE_REAL_SPINLOCKS in sbp2.c and that appears to
> work just fine, although I haven't left that change in place here.
> 
> You don't actually _need_ SMP hardware to test SMP code, BTW.  You
> can just build an SMP kernel and run that on a uniprocessor box.
> This will still catch a wide range of bugs - it certainly detects
> the lockup which was occurring because of the scsi_old_done() thing.
> 
> Incidentally, it would be nice to be able to get this driver working
> properly when linked into the kernel - it makes debugging much easier :)

It would also be useful to replace the old style scsi error
handling with the improved "eh" error handling. The old style
was deprecated in the lk 2.2 series but lives on.

> 
> --- linux-2.4.15-pre2/drivers/ieee1394/sbp2.c   Wed Oct 17 14:19:20 2001
> +++ linux-akpm/drivers/ieee1394/sbp2.c  Sun Nov 11 17:22:47 2001
> @@ -2767,7 +2767,9 @@ static void sbp2scsi_complete_command(st
>         /*
>          * Tell scsi stack that we're done with this command
>          */
> +       spin_lock_irq(&io_request_lock);
>         done (SCpnt);
> +       spin_unlock_irq(&io_request_lock);
>  
>         return;
>  }

In the meantime, this patch is obviously required.

Doug Gilbert

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-16 21:25         ` H . J . Lu
@ 2001-11-16 22:40           ` Kristian Hogsberg
  0 siblings, 0 replies; 14+ messages in thread
From: Kristian Hogsberg @ 2001-11-16 22:40 UTC (permalink / raw)
  To: H . J . Lu
  Cc: Kristian Hogsberg, Andrew Morton, jamesg, linux-1394devel, lkml

"H . J . Lu" <hjl@lucon.org> writes:

> On Fri, Nov 16, 2001 at 05:15:47PM +0100, Kristian Hogsberg wrote:
> > This is true, but only because the struct list_head is the first
> > element in struct node_entry.  If it wasn't, lh would have been -16 or
> > so, as Andrew says.
> > 
> > In any case, it's the wrong fix, because the error is elsewhere:
> > neither the host_info list or the node list should contain NULL
> > entries.  This is just curing the symptoms.  HJ, could you provide
> > some details on the crash?  Do you have the sbp2 module loaded when
> > you insmod/rmmod ohci1394, and if so, does it crash without sbp2
> > loaded?
> > 
> 
> Found it. You have to use list_for_each_safe when you remove things.
> Here is a patch.

Thanks for looking into this, however these bugs have been fixed in
the cvs version of the drivers, except for the loop in video1394.c
(which wouldn't cause problems, since there's a break immediately
after the call to remove_card).

Kristian


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-16 16:15       ` Kristian Hogsberg
  2001-11-16 16:30         ` H . J . Lu
@ 2001-11-16 21:25         ` H . J . Lu
  2001-11-16 22:40           ` Kristian Hogsberg
  1 sibling, 1 reply; 14+ messages in thread
From: H . J . Lu @ 2001-11-16 21:25 UTC (permalink / raw)
  To: Kristian Hogsberg; +Cc: Andrew Morton, hogsberg, jamesg, linux-1394devel, lkml

On Fri, Nov 16, 2001 at 05:15:47PM +0100, Kristian Hogsberg wrote:
> This is true, but only because the struct list_head is the first
> element in struct node_entry.  If it wasn't, lh would have been -16 or
> so, as Andrew says.
> 
> In any case, it's the wrong fix, because the error is elsewhere:
> neither the host_info list or the node list should contain NULL
> entries.  This is just curing the symptoms.  HJ, could you provide
> some details on the crash?  Do you have the sbp2 module loaded when
> you insmod/rmmod ohci1394, and if so, does it crash without sbp2
> loaded?
> 

Found it. You have to use list_for_each_safe when you remove things.
Here is a patch.


H.J.
----
--- linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c.rmmod	Tue Nov 13 19:15:44 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c	Fri Nov 16 13:14:22 2001
@@ -558,7 +558,7 @@ done_reset_host:
 
 static void nodemgr_remove_host(struct hpsb_host *host)
 {
-	struct list_head *lh;
+	struct list_head *lh, *spare;
 	struct host_info *hi = NULL;
 	struct node_entry *ne;
 	unsigned long flags;
@@ -568,7 +568,7 @@ static void nodemgr_remove_host(struct h
 
 	/* First remove all node entries for this host */
 	write_lock_irqsave(&node_lock, flags);
-	list_for_each(lh, &node_list) {
+	list_for_each_safe(lh, spare, &node_list) {
 		ne = list_entry(lh, struct node_entry, list);
 
 		/* Only checking this host */
--- linux-2.4.9-12.2mod/drivers/ieee1394/sbp2.c.rmmod	Thu Nov 15 17:08:49 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/sbp2.c	Fri Nov 16 13:14:37 2001
@@ -754,13 +754,13 @@ static int sbp2util_create_command_orb_p
 static void sbp2util_remove_command_orb_pool(struct scsi_id_instance_data *scsi_id,
 					     struct sbp2scsi_host_info *hi)
 {
-	struct list_head *lh;
+	struct list_head *lh, *spare;
 	struct sbp2_command_info *command;
 	unsigned long flags;
         
 	sbp2_spin_lock(&scsi_id->sbp2_command_orb_lock, flags);
 	if (!list_empty(&scsi_id->sbp2_command_orb_completed)) {
-		list_for_each(lh, &scsi_id->sbp2_command_orb_completed) {
+		list_for_each_safe(lh, spare, &scsi_id->sbp2_command_orb_completed) {
 			command = list_entry(lh, struct sbp2_command_info, list);
 
 			/* Release our generic DMA's */
--- linux-2.4.9-12.2mod/drivers/ieee1394/video1394.c.rmmod	Sun Nov 11 21:06:48 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/video1394.c	Fri Nov 16 13:15:28 2001
@@ -1592,7 +1592,7 @@ static void video1394_remove_host (struc
 {
 	struct ti_ohci *ohci;
 	unsigned long flags;
-	struct list_head *lh;
+	struct list_head *lh, *spare;
 
 	/* We only work with the OHCI-1394 driver */
 	if (strcmp(host->template->name, OHCI1394_DRIVER_NAME))
@@ -1603,7 +1603,7 @@ static void video1394_remove_host (struc
         spin_lock_irqsave(&video1394_cards_lock, flags);
 	if (!list_empty(&video1394_cards)) {
 		struct video_card *p;
-		list_for_each(lh, &video1394_cards) {
+		list_for_each_safe(lh, spare, &video1394_cards) {
 			p = list_entry(lh, struct video_card, list);
 			if (p ->ohci == ohci) {
 				remove_card(p);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-16 16:15       ` Kristian Hogsberg
@ 2001-11-16 16:30         ` H . J . Lu
  2001-11-16 21:25         ` H . J . Lu
  1 sibling, 0 replies; 14+ messages in thread
From: H . J . Lu @ 2001-11-16 16:30 UTC (permalink / raw)
  To: Kristian Hogsberg; +Cc: Andrew Morton, hogsberg, jamesg, linux-1394devel, lkml

On Fri, Nov 16, 2001 at 05:15:47PM +0100, Kristian Hogsberg wrote:
> In any case, it's the wrong fix, because the error is elsewhere:
> neither the host_info list or the node list should contain NULL
> entries.  This is just curing the symptoms.  HJ, could you provide
> some details on the crash?  Do you have the sbp2 module loaded when
> you insmod/rmmod ohci1394, and if so, does it crash without sbp2
> loaded?

No, sbp2 is not loaded.


H.J.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-16  3:32     ` H . J . Lu
@ 2001-11-16 16:15       ` Kristian Hogsberg
  2001-11-16 16:30         ` H . J . Lu
  2001-11-16 21:25         ` H . J . Lu
  0 siblings, 2 replies; 14+ messages in thread
From: Kristian Hogsberg @ 2001-11-16 16:15 UTC (permalink / raw)
  To: H . J . Lu; +Cc: Andrew Morton, hogsberg, jamesg, linux-1394devel, lkml

"H . J . Lu" <hjl@lucon.org> writes:

> On Tue, Nov 13, 2001 at 11:21:29PM -0800, Andrew Morton wrote:
> > "H . J . Lu" wrote:
> > > 
> > > Here is another patch. It fixes:
> > > 
> > > # modprobe ohci1394
> > > # rmmod ohci1394
> > > 
> > > H.J.
> > > --- linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c.rmmod        Tue Nov 13 19:15:44 2001
> > > +++ linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c      Tue Nov 13 19:11:38 2001
> > > @@ -570,7 +570,8 @@ static void nodemgr_remove_host(struct h
> > >         write_lock_irqsave(&node_lock, flags);
> > >         list_for_each(lh, &node_list) {
> > >                 ne = list_entry(lh, struct node_entry, list);
> > > -
> > > +               if (!ne)
> > > +                       break;
> > >                 /* Only checking this host */
> > >                 if (ne->host != host)
> > >                         continue;
> > > @@ -582,6 +583,8 @@ static void nodemgr_remove_host(struct h
> > >         spin_lock_irqsave (&host_info_lock, flags);
> > >         list_for_each(lh, &host_info_list) {
> > >                 struct host_info *myhi = list_entry(lh, struct host_info, list);
> > > +               if (!myhi)
> > > +                       break;
> > >                 if (myhi->host == host) {
> > >                         hi = myhi;
> > >                         break;
> > 
> > I don't think so.  Look at the definition of list_entry.
> > It's just a pointer offset.  So if `ne' is zero, then `lh'
> > must have been -16 or so.
> > 
> 
> I don't thik so. Zero `ne' means the `list' field which `lh' points
> to is NULL.

This is true, but only because the struct list_head is the first
element in struct node_entry.  If it wasn't, lh would have been -16 or
so, as Andrew says.

In any case, it's the wrong fix, because the error is elsewhere:
neither the host_info list or the node list should contain NULL
entries.  This is just curing the symptoms.  HJ, could you provide
some details on the crash?  Do you have the sbp2 module loaded when
you insmod/rmmod ohci1394, and if so, does it crash without sbp2
loaded?


> > There seems to be a problem with module reference counts:

[...]

This is a subtle issue... if you just want to prevent the ohci1394
driver from being unloaded while the sbp2 driver is loaded, it's a
matter of adding a hpsb_inc_host_usage call to sbp2_add_host and a
hpsb_dec_host_usage call to sbp2_remove_host.  However this goes
against the design of the 1394 subsystem, which is a bit like the scsi
subsystem: there's a middle layer which holds the hole thing together
and implements the core 1394 transaction logic and bus management,
then there's a low level layer where host controller drivers (ohci1394
and pcilynx) register and an upper layer where protocol level drivers
register (eg. sbp2, video1394, raw1394 and in the future ip1394).  The
design allows you to load and unload low level drivers independent of
each other and independent of whatever highlevel drivers currently
loaded, likewise for high level drivers.

Of course, you dont want to unload a low level driver when it's in
use, and for this you have the hpsb_inc_host_usage and
hpsb_dec_host_usage calls, which prevents unloading of the module
implementing the driver for the host given as argument.  This is used
by raw1394 when someone opens /dev/raw1394 and selects a host
controller to talk to.  So even if raw1394 is loaded, you can still
unload ohci1394 if nobody is using the card through raw1394.

So, if sbp2 unconditionally locks down any host controller it sees,
this flexibility is lost.  A better solution is to call
hpsb_inc_host_usage when sbp2 successfully logs into an sbp2 drive and
hpsb_dec_host_usage when a drive is disconnected.  It's still not
optimal, since now you cant unload a card driver if one of it's cards
has an sbp2 device attached that the sbp2 driver is logged into.  By
default, the sbp2 driver logs in to all devices on all buses if
possible and only logs out when you unload it, so to unload the
ohci1394 driver you must unplug all sbp2 devices from all ohci cards
or unload the sbp2 modules first.

Ideally, we want to hpsb_inc_host_usage for a host only when somebody
opens or mounts a device present on that hosts bus, but unfortunately
that event isn't passed down by the scsi subsystem, except that it
increases module usage count for sbp2.

Any ideas?  Is there another way to acomplish this or should we just
go with the simple solution, where we increase host usage when the
sbp2 driver detects and logs into a sbp2 device?

Kristian


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-14  7:21   ` Andrew Morton
@ 2001-11-16  3:32     ` H . J . Lu
  2001-11-16 16:15       ` Kristian Hogsberg
  0 siblings, 1 reply; 14+ messages in thread
From: H . J . Lu @ 2001-11-16  3:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hogsberg, jamesg, linux-1394devel, lkml

On Tue, Nov 13, 2001 at 11:21:29PM -0800, Andrew Morton wrote:
> "H . J . Lu" wrote:
> > 
> > Here is another patch. It fixes:
> > 
> > # modprobe ohci1394
> > # rmmod ohci1394
> > 
> > H.J.
> > --- linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c.rmmod        Tue Nov 13 19:15:44 2001
> > +++ linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c      Tue Nov 13 19:11:38 2001
> > @@ -570,7 +570,8 @@ static void nodemgr_remove_host(struct h
> >         write_lock_irqsave(&node_lock, flags);
> >         list_for_each(lh, &node_list) {
> >                 ne = list_entry(lh, struct node_entry, list);
> > -
> > +               if (!ne)
> > +                       break;
> >                 /* Only checking this host */
> >                 if (ne->host != host)
> >                         continue;
> > @@ -582,6 +583,8 @@ static void nodemgr_remove_host(struct h
> >         spin_lock_irqsave (&host_info_lock, flags);
> >         list_for_each(lh, &host_info_list) {
> >                 struct host_info *myhi = list_entry(lh, struct host_info, list);
> > +               if (!myhi)
> > +                       break;
> >                 if (myhi->host == host) {
> >                         hi = myhi;
> >                         break;
> 
> I don't think so.  Look at the definition of list_entry.
> It's just a pointer offset.  So if `ne' is zero, then `lh'
> must have been -16 or so.
> 

I don't thik so. Zero `ne' means the `list' field which `lh' points
to is NULL.

> There seems to be a problem with module reference counts:
> 
> mnm:/home/akpm> lsmod
> Module                  Size  Used by
> sbp2                   14656   0 (unused)
> ohci1394               22224   0 (unused)
> ieee1394               26768   0 [sbp2 ohci1394]
> eepro100               17664   1 (autoclean)
> mnm:/home/akpm> 0 mount /dev/sdb1 /mnt/sdb1
> mnm:/home/akpm> lsmod
> Module                  Size  Used by
> sbp2                   14656   1
> ohci1394               22224   0 (unused)
> ieee1394               26768   0 [sbp2 ohci1394]
> eepro100               17664   1 (autoclean)
> 
> So it's possible to unload ohci1394.o and ieee1394.o while they're
> in use.
> 
> The fix isn't pretty, because the logical place where the module
> refcount needs to be incremented in ieee1394.o is called from
> external modules, as well as from within ieee1394.o itself.
> The latter of course makes the module permanently unloadable.
> 
> So we introduce a `different_module' flag which is set when
> the caller is from a different module.  Ugh.  Perhaps the
> maintainers can do better?
> 
> Also use the hpsb_host_template.devctl() method in hl_all_hosts()
> when a new client is registered to bump the refcount of ohci1394.o.
> 
> Anyway, here's my aggregate patch.  It appears to work, which is
> about the best that I can say about it.
> 

I think you missed hpsb_register_lowlevel/hpsb_unregister_lowlevel.


H.J.
---
--- linux-2.4.9-12.2mod/drivers/ieee1394/csr.c.rmmod	Thu Jul 19 17:48:15 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/csr.c	Thu Nov 15 17:05:58 2001
@@ -425,7 +425,7 @@ static struct hpsb_highlevel *hl;
 
 void init_csr(void)
 {
-        hl = hpsb_register_highlevel("standard registers", &csr_ops);
+        hl = hpsb_register_highlevel("standard registers", &csr_ops, 0);
         if (hl == NULL) {
                 HPSB_ERR("out of memory during ieee1394 initialization");
                 return;
@@ -449,5 +449,5 @@ void init_csr(void)
 
 void cleanup_csr(void)
 {
-        hpsb_unregister_highlevel(hl);
+        hpsb_unregister_highlevel(hl, 0);
 }
--- linux-2.4.9-12.2mod/drivers/ieee1394/highlevel.c.rmmod	Mon Nov 12 16:46:35 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/highlevel.c	Thu Nov 15 18:30:47 2001
@@ -9,6 +9,7 @@
 
 #include <linux/config.h>
 #include <linux/slab.h>
+#include <linux/module.h>
 
 #include "ieee1394.h"
 #include "ieee1394_types.h"
@@ -28,7 +29,8 @@ static struct hpsb_address_ops dummy_ops
 static struct hpsb_address_serve dummy_zero_addr, dummy_max_addr;
 
 struct hpsb_highlevel *hpsb_register_highlevel(const char *name,
-                                               struct hpsb_highlevel_ops *ops)
+                                               struct hpsb_highlevel_ops *ops,
+						int different_module)
 {
         struct hpsb_highlevel *hl;
 
@@ -38,6 +40,9 @@ struct hpsb_highlevel *hpsb_register_hig
                 return NULL;
         }
 
+	if (different_module)
+		MOD_INC_USE_COUNT;
+
         INIT_LIST_HEAD(&hl->hl_list);
         INIT_LIST_HEAD(&hl->addr_list);
         hl->name = name;
@@ -51,7 +56,7 @@ struct hpsb_highlevel *hpsb_register_hig
         return hl;
 }
 
-void hpsb_unregister_highlevel(struct hpsb_highlevel *hl)
+void hpsb_unregister_highlevel(struct hpsb_highlevel *hl, int different_module)
 {
         struct list_head *entry;
         struct hpsb_address_serve *as;
@@ -77,6 +82,8 @@ void hpsb_unregister_highlevel(struct hp
         write_unlock_irq(&hl_drivers_lock);
 
         kfree(hl);
+	if (different_module)
+		MOD_DEC_USE_COUNT;
 }
 
 int hpsb_register_addrspace(struct hpsb_highlevel *hl,
--- linux-2.4.9-12.2mod/drivers/ieee1394/highlevel.h.rmmod	Thu Jul 19 17:48:15 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/highlevel.h	Thu Nov 15 17:05:58 2001
@@ -115,8 +115,9 @@ void highlevel_fcp_request(struct hpsb_h
  * because the string is not copied.
  */
 struct hpsb_highlevel *hpsb_register_highlevel(const char *name,
-                                               struct hpsb_highlevel_ops *ops);
-void hpsb_unregister_highlevel(struct hpsb_highlevel *hl);
+                                               struct hpsb_highlevel_ops *ops,
+						int different_module);
+void hpsb_unregister_highlevel(struct hpsb_highlevel *hl, int different_module);
 
 /*
  * Register handlers for host address spaces.  Start and end are 48 bit pointers
--- linux-2.4.9-12.2mod/drivers/ieee1394/hosts.c.rmmod	Sun Aug 12 12:39:02 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/hosts.c	Thu Nov 15 18:00:05 2001
@@ -11,6 +11,7 @@
  */
 
 #include <linux/config.h>
+#include <linux/module.h>
 
 #include <linux/types.h>
 #include <linux/init.h>
@@ -39,6 +40,8 @@ void hl_all_hosts(struct hpsb_highlevel 
 
         for (tmpl = templates; tmpl != NULL; tmpl = tmpl->next) {
                 for (host = tmpl->hosts; host != NULL; host = host->next) {
+			if (tmpl->devctl)
+				tmpl->devctl(host, MODIFY_USAGE, init);
                         if (host->initialized) {
                                 if (init) {
                                         if (hl->op->add_host) {
@@ -258,6 +261,7 @@ int hpsb_register_lowlevel(struct hpsb_h
 		init_hosts(tmpl);
 	}
 
+	MOD_INC_USE_COUNT;
         return 0;
 }
 
@@ -268,4 +272,6 @@ void hpsb_unregister_lowlevel(struct hps
         if (remove_template(tmpl)) {
                 HPSB_PANIC("remove_template failed on %s", tmpl->name);
         }
+
+	MOD_DEC_USE_COUNT;
 }
--- linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c.rmmod	Tue Nov 13 19:15:44 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c	Thu Nov 15 18:27:05 2001
@@ -570,7 +570,8 @@ static void nodemgr_remove_host(struct h
 	write_lock_irqsave(&node_lock, flags);
 	list_for_each(lh, &node_list) {
 		ne = list_entry(lh, struct node_entry, list);
-
+		if (!ne)
+			break;
 		/* Only checking this host */
 		if (ne->host != host)
 			continue;
@@ -582,6 +583,8 @@ static void nodemgr_remove_host(struct h
 	spin_lock_irqsave (&host_info_lock, flags);
 	list_for_each(lh, &host_info_list) {
 		struct host_info *myhi = list_entry(lh, struct host_info, list);
+		if (!myhi)
+			break;
 		if (myhi->host == host) {
 			hi = myhi;
 			break;
@@ -612,7 +615,7 @@ static struct hpsb_highlevel *hl;
 
 void init_ieee1394_nodemgr(void)
 {
-        hl = hpsb_register_highlevel("Node manager", &nodemgr_ops);
+        hl = hpsb_register_highlevel("Node manager", &nodemgr_ops, 0);
         if (!hl) {
                 HPSB_ERR("Out of memory during ieee1394 initialization");
         }
@@ -620,5 +623,5 @@ void init_ieee1394_nodemgr(void)
 
 void cleanup_ieee1394_nodemgr(void)
 {
-        hpsb_unregister_highlevel(hl);
+        hpsb_unregister_highlevel(hl, 0);
 }
--- linux-2.4.9-12.2mod/drivers/ieee1394/raw1394.c.rmmod	Sun Nov 11 21:06:48 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/raw1394.c	Thu Nov 15 17:05:58 2001
@@ -1002,7 +1002,7 @@ static struct file_operations file_ops =
 
 static int __init init_raw1394(void)
 {
-        hl_handle = hpsb_register_highlevel(RAW1394_DEVICE_NAME, &hl_ops);
+        hl_handle = hpsb_register_highlevel(RAW1394_DEVICE_NAME, &hl_ops, 1);
         if (hl_handle == NULL) {
                 HPSB_ERR("raw1394 failed to register with ieee1394 highlevel");
                 return -ENOMEM;
@@ -1026,7 +1026,7 @@ static void __exit cleanup_raw1394(void)
 {
         devfs_unregister_chrdev(RAW1394_DEVICE_MAJOR, RAW1394_DEVICE_NAME);
 	devfs_unregister(devfs_handle);
-        hpsb_unregister_highlevel(hl_handle);
+        hpsb_unregister_highlevel(hl_handle, 1);
 }
 
 module_init(init_raw1394);
--- linux-2.4.9-12.2mod/drivers/ieee1394/sbp2.c.rmmod	Thu Nov 15 17:08:49 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/sbp2.c	Thu Nov 15 18:31:56 2001
@@ -914,7 +914,7 @@ int sbp2_init(void)
 	/*
 	 * Register our high level driver with 1394 stack
 	 */
-	sbp2_hl_handle = hpsb_register_highlevel(SBP2_DEVICE_NAME, &sbp2_hl_ops);
+	sbp2_hl_handle = hpsb_register_highlevel(SBP2_DEVICE_NAME, &sbp2_hl_ops, 1);
 
 	if (sbp2_hl_handle == NULL) {
 		SBP2_ERR("sbp2: sbp2 failed to register with ieee1394 highlevel");
@@ -948,7 +948,7 @@ void sbp2_cleanup(void)
 	SBP2_DEBUG("sbp2: sbp2_cleanup");
 
 	if (sbp2_hl_handle) {
-		hpsb_unregister_highlevel(sbp2_hl_handle);
+		hpsb_unregister_highlevel(sbp2_hl_handle, 1);
 		sbp2_hl_handle = NULL;
 	}
 	return;
@@ -3454,7 +3456,7 @@ static int sbp2scsi_detect (Scsi_Host_Te
 	if (!sbp2_host_count) {
 		SBP2_ERR("sbp2: Please load the lower level IEEE-1394 driver (e.g. ohci1394) before sbp2...");
 		if (sbp2_hl_handle) {
-			hpsb_unregister_highlevel(sbp2_hl_handle);
+			hpsb_unregister_highlevel(sbp2_hl_handle, 1);
 			sbp2_hl_handle = NULL;
 		}
 	}
--- linux-2.4.9-12.2mod/drivers/ieee1394/video1394.c.rmmod	Sun Nov 11 21:06:48 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/video1394.c	Thu Nov 15 17:05:58 2001
@@ -1643,7 +1643,7 @@ MODULE_LICENSE("GPL");
 
 static void __exit video1394_exit_module (void)
 {
-	hpsb_unregister_highlevel (hl_handle);
+	hpsb_unregister_highlevel (hl_handle, 1);
 
 	devfs_unregister(devfs_handle);
 	devfs_unregister_chrdev(VIDEO1394_MAJOR, VIDEO1394_DRIVER_NAME);
@@ -1666,7 +1666,7 @@ static int __init video1394_init_module 
 	devfs_handle = devfs_mk_dir(NULL, VIDEO1394_DRIVER_NAME, NULL);
 #endif
 
-	hl_handle = hpsb_register_highlevel (VIDEO1394_DRIVER_NAME, &hl_ops);
+	hl_handle = hpsb_register_highlevel (VIDEO1394_DRIVER_NAME, &hl_ops, 1);
 	if (hl_handle == NULL) {
 		PRINT_G(KERN_ERR, "No more memory for driver\n");
 		devfs_unregister(devfs_handle);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-14  3:17 ` H . J . Lu
@ 2001-11-14  7:21   ` Andrew Morton
  2001-11-16  3:32     ` H . J . Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2001-11-14  7:21 UTC (permalink / raw)
  To: H . J . Lu; +Cc: hogsberg, jamesg, linux-1394devel, lkml

"H . J . Lu" wrote:
> 
> Here is another patch. It fixes:
> 
> # modprobe ohci1394
> # rmmod ohci1394
> 
> H.J.
> --- linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c.rmmod        Tue Nov 13 19:15:44 2001
> +++ linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c      Tue Nov 13 19:11:38 2001
> @@ -570,7 +570,8 @@ static void nodemgr_remove_host(struct h
>         write_lock_irqsave(&node_lock, flags);
>         list_for_each(lh, &node_list) {
>                 ne = list_entry(lh, struct node_entry, list);
> -
> +               if (!ne)
> +                       break;
>                 /* Only checking this host */
>                 if (ne->host != host)
>                         continue;
> @@ -582,6 +583,8 @@ static void nodemgr_remove_host(struct h
>         spin_lock_irqsave (&host_info_lock, flags);
>         list_for_each(lh, &host_info_list) {
>                 struct host_info *myhi = list_entry(lh, struct host_info, list);
> +               if (!myhi)
> +                       break;
>                 if (myhi->host == host) {
>                         hi = myhi;
>                         break;

I don't think so.  Look at the definition of list_entry.
It's just a pointer offset.  So if `ne' is zero, then `lh'
must have been -16 or so.

There seems to be a problem with module reference counts:

mnm:/home/akpm> lsmod
Module                  Size  Used by
sbp2                   14656   0 (unused)
ohci1394               22224   0 (unused)
ieee1394               26768   0 [sbp2 ohci1394]
eepro100               17664   1 (autoclean)
mnm:/home/akpm> 0 mount /dev/sdb1 /mnt/sdb1
mnm:/home/akpm> lsmod
Module                  Size  Used by
sbp2                   14656   1
ohci1394               22224   0 (unused)
ieee1394               26768   0 [sbp2 ohci1394]
eepro100               17664   1 (autoclean)

So it's possible to unload ohci1394.o and ieee1394.o while they're
in use.

The fix isn't pretty, because the logical place where the module
refcount needs to be incremented in ieee1394.o is called from
external modules, as well as from within ieee1394.o itself.
The latter of course makes the module permanently unloadable.

So we introduce a `different_module' flag which is set when
the caller is from a different module.  Ugh.  Perhaps the
maintainers can do better?

Also use the hpsb_host_template.devctl() method in hl_all_hosts()
when a new client is registered to bump the refcount of ohci1394.o.

Anyway, here's my aggregate patch.  It appears to work, which is
about the best that I can say about it.



--- linux-2.4.15-pre4/drivers/ieee1394/highlevel.h	Thu Jul 19 17:48:15 2001
+++ linux-akpm/drivers/ieee1394/highlevel.h	Tue Nov 13 22:36:40 2001
@@ -115,8 +115,9 @@ void highlevel_fcp_request(struct hpsb_h
  * because the string is not copied.
  */
 struct hpsb_highlevel *hpsb_register_highlevel(const char *name,
-                                               struct hpsb_highlevel_ops *ops);
-void hpsb_unregister_highlevel(struct hpsb_highlevel *hl);
+                                               struct hpsb_highlevel_ops *ops,
+						int different_module);
+void hpsb_unregister_highlevel(struct hpsb_highlevel *hl, int different_module);
 
 /*
  * Register handlers for host address spaces.  Start and end are 48 bit pointers
--- linux-2.4.15-pre4/drivers/ieee1394/csr.c	Thu Jul 19 17:48:15 2001
+++ linux-akpm/drivers/ieee1394/csr.c	Tue Nov 13 22:36:20 2001
@@ -425,7 +425,7 @@ static struct hpsb_highlevel *hl;
 
 void init_csr(void)
 {
-        hl = hpsb_register_highlevel("standard registers", &csr_ops);
+        hl = hpsb_register_highlevel("standard registers", &csr_ops, 0);
         if (hl == NULL) {
                 HPSB_ERR("out of memory during ieee1394 initialization");
                 return;
@@ -449,5 +449,5 @@ void init_csr(void)
 
 void cleanup_csr(void)
 {
-        hpsb_unregister_highlevel(hl);
+        hpsb_unregister_highlevel(hl, 0);
 }
--- linux-2.4.15-pre4/drivers/ieee1394/highlevel.c	Mon Oct  1 21:24:24 2001
+++ linux-akpm/drivers/ieee1394/highlevel.c	Tue Nov 13 23:12:33 2001
@@ -9,6 +9,7 @@
 
 #include <linux/config.h>
 #include <linux/slab.h>
+#include <linux/module.h>
 
 #include "ieee1394.h"
 #include "ieee1394_types.h"
@@ -28,7 +29,8 @@ static struct hpsb_address_ops dummy_ops
 static struct hpsb_address_serve dummy_zero_addr, dummy_max_addr;
 
 struct hpsb_highlevel *hpsb_register_highlevel(const char *name,
-                                               struct hpsb_highlevel_ops *ops)
+                                               struct hpsb_highlevel_ops *ops,
+						int different_module)
 {
         struct hpsb_highlevel *hl;
 
@@ -38,6 +40,9 @@ struct hpsb_highlevel *hpsb_register_hig
                 return NULL;
         }
 
+	if (different_module)
+		MOD_INC_USE_COUNT;
+
         INIT_LIST_HEAD(&hl->hl_list);
         INIT_LIST_HEAD(&hl->addr_list);
         hl->name = name;
@@ -51,15 +56,17 @@ struct hpsb_highlevel *hpsb_register_hig
         return hl;
 }
 
-void hpsb_unregister_highlevel(struct hpsb_highlevel *hl)
+void hpsb_unregister_highlevel(struct hpsb_highlevel *hl, int different_module)
 {
         struct list_head *entry;
         struct hpsb_address_serve *as;
 
         if (hl == NULL) {
+		printk(KERN_ERR __FUNCTION__ ": bug. null pointer.\n");
                 return;
         }
 
+	printk(__FUNCTION__ ": %s\n", hl->name);
         write_lock_irq(&addr_space_lock);
         entry = hl->addr_list.next;
 
@@ -77,6 +84,8 @@ void hpsb_unregister_highlevel(struct hp
         write_unlock_irq(&hl_drivers_lock);
 
         kfree(hl);
+	if (different_module)
+		MOD_DEC_USE_COUNT;
 }
 
 int hpsb_register_addrspace(struct hpsb_highlevel *hl,
--- linux-2.4.15-pre4/drivers/ieee1394/hosts.c	Mon Oct  1 21:24:24 2001
+++ linux-akpm/drivers/ieee1394/hosts.c	Tue Nov 13 23:05:35 2001
@@ -42,6 +42,8 @@ void hl_all_hosts(struct hpsb_highlevel 
                 tmpl = list_entry(tlh, struct hpsb_host_template, list);
 		list_for_each(hlh, &tmpl->hosts) {
 			host = list_entry(hlh, struct hpsb_host, list);
+			if (tmpl->devctl)
+				tmpl->devctl(host, MODIFY_USAGE, init);
                         if (host->initialized) {
                                 if (init) {
                                         if (hl->op->add_host) {
--- linux-2.4.15-pre4/drivers/ieee1394/nodemgr.c	Wed Oct 17 14:19:20 2001
+++ linux-akpm/drivers/ieee1394/nodemgr.c	Tue Nov 13 22:37:42 2001
@@ -891,7 +891,7 @@ static struct hpsb_highlevel *hl;
 
 void init_ieee1394_nodemgr(void)
 {
-        hl = hpsb_register_highlevel("Node manager", &nodemgr_ops);
+        hl = hpsb_register_highlevel("Node manager", &nodemgr_ops, 0);
         if (!hl) {
 		HPSB_ERR("NodeMgr: out of memory during ieee1394 initialization");
         }
@@ -899,5 +899,5 @@ void init_ieee1394_nodemgr(void)
 
 void cleanup_ieee1394_nodemgr(void)
 {
-        hpsb_unregister_highlevel(hl);
+        hpsb_unregister_highlevel(hl, 0);
 }
--- linux-2.4.15-pre4/drivers/ieee1394/raw1394.c	Mon Oct  1 21:24:24 2001
+++ linux-akpm/drivers/ieee1394/raw1394.c	Tue Nov 13 22:37:56 2001
@@ -1013,7 +1013,7 @@ static struct file_operations file_ops =
 
 static int __init init_raw1394(void)
 {
-        hl_handle = hpsb_register_highlevel(RAW1394_DEVICE_NAME, &hl_ops);
+        hl_handle = hpsb_register_highlevel(RAW1394_DEVICE_NAME, &hl_ops, 1);
         if (hl_handle == NULL) {
                 HPSB_ERR("raw1394 failed to register with ieee1394 highlevel");
                 return -ENOMEM;
@@ -1037,7 +1037,7 @@ static void __exit cleanup_raw1394(void)
 {
         devfs_unregister_chrdev(RAW1394_DEVICE_MAJOR, RAW1394_DEVICE_NAME);
 	devfs_unregister(devfs_handle);
-        hpsb_unregister_highlevel(hl_handle);
+        hpsb_unregister_highlevel(hl_handle, 1);
 }
 
 module_init(init_raw1394);
--- linux-2.4.15-pre4/drivers/ieee1394/sbp2.c	Wed Oct 17 14:19:20 2001
+++ linux-akpm/drivers/ieee1394/sbp2.c	Tue Nov 13 23:11:39 2001
@@ -836,7 +836,7 @@ int sbp2_init(void)
 	/*
 	 * Register our high level driver with 1394 stack
 	 */
-	sbp2_hl_handle = hpsb_register_highlevel(SBP2_DEVICE_NAME, &sbp2_hl_ops);
+	sbp2_hl_handle = hpsb_register_highlevel(SBP2_DEVICE_NAME, &sbp2_hl_ops, 1);
 
 	if (sbp2_hl_handle == NULL) {
 		SBP2_ERR("sbp2 failed to register with ieee1394 highlevel");
@@ -865,7 +865,7 @@ void sbp2_cleanup(void)
 	hpsb_unregister_protocol(&sbp2_driver);
 
 	if (sbp2_hl_handle) {
-		hpsb_unregister_highlevel(sbp2_hl_handle);
+		hpsb_unregister_highlevel(sbp2_hl_handle, 1);
 		sbp2_hl_handle = NULL;
 	}
 	return;
@@ -2767,7 +2767,9 @@ static void sbp2scsi_complete_command(st
 	/*
 	 * Tell scsi stack that we're done with this command
 	 */
+	spin_lock_irq(&io_request_lock);
 	done (SCpnt);
+	spin_unlock_irq(&io_request_lock);
 
 	return;
 }
--- linux-2.4.15-pre4/drivers/ieee1394/video1394.c	Mon Oct  1 21:24:25 2001
+++ linux-akpm/drivers/ieee1394/video1394.c	Tue Nov 13 22:38:12 2001
@@ -1647,7 +1647,7 @@ MODULE_LICENSE("GPL");
 
 static void __exit video1394_exit_module (void)
 {
-	hpsb_unregister_highlevel (hl_handle);
+	hpsb_unregister_highlevel (hl_handle, 1);
 
 	devfs_unregister(devfs_handle);
 	devfs_unregister_chrdev(VIDEO1394_MAJOR, VIDEO1394_DRIVER_NAME);
@@ -1670,7 +1670,7 @@ static int __init video1394_init_module 
 	devfs_handle = devfs_mk_dir(NULL, VIDEO1394_DRIVER_NAME, NULL);
 #endif
 
-	hl_handle = hpsb_register_highlevel (VIDEO1394_DRIVER_NAME, &hl_ops);
+	hl_handle = hpsb_register_highlevel (VIDEO1394_DRIVER_NAME, &hl_ops, 1);
 	if (hl_handle == NULL) {
 		PRINT_G(KERN_ERR, "No more memory for driver\n");
 		devfs_unregister(devfs_handle);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-12  1:37 Andrew Morton
                   ` (2 preceding siblings ...)
  2001-11-12 17:34 ` Jens Axboe
@ 2001-11-14  3:17 ` H . J . Lu
  2001-11-14  7:21   ` Andrew Morton
  3 siblings, 1 reply; 14+ messages in thread
From: H . J . Lu @ 2001-11-14  3:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hogsberg, jamesg, Jens Axboe, lkml

On Sun, Nov 11, 2001 at 05:37:21PM -0800, Andrew Morton wrote:
> Guys,
> 
> drivers/ieee1394/sbp2.c deadlocks immediately on SMP, because
> io_request_lock is not held over its call to scsi_old_done().
> 

Here is another patch. It fixes:

# modprobe ohci1394
# rmmod ohci1394


H.J.
--- linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c.rmmod	Tue Nov 13 19:15:44 2001
+++ linux-2.4.9-12.2mod/drivers/ieee1394/nodemgr.c	Tue Nov 13 19:11:38 2001
@@ -570,7 +570,8 @@ static void nodemgr_remove_host(struct h
 	write_lock_irqsave(&node_lock, flags);
 	list_for_each(lh, &node_list) {
 		ne = list_entry(lh, struct node_entry, list);
-
+		if (!ne)
+			break;
 		/* Only checking this host */
 		if (ne->host != host)
 			continue;
@@ -582,6 +583,8 @@ static void nodemgr_remove_host(struct h
 	spin_lock_irqsave (&host_info_lock, flags);
 	list_for_each(lh, &host_info_list) {
 		struct host_info *myhi = list_entry(lh, struct host_info, list);
+		if (!myhi)
+			break;
 		if (myhi->host == host) {
 			hi = myhi;
 			break;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-12  1:37 Andrew Morton
  2001-11-12  4:54 ` H . J . Lu
  2001-11-12  8:50 ` Alan Cox
@ 2001-11-12 17:34 ` Jens Axboe
  2001-11-14  3:17 ` H . J . Lu
  3 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2001-11-12 17:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hogsberg, jamesg, H . J . Lu, lkml

On Sun, Nov 11 2001, Andrew Morton wrote:
> Guys,
> 
> drivers/ieee1394/sbp2.c deadlocks immediately on SMP, because
> io_request_lock is not held over its call to scsi_old_done().
> 
> I don't know why scsi_old_done() actually requires io_request_lock;
> perhaps Jens could comment on whether I've taken the lock in the
> appropriate place?

Yes it looks fine, unfortunately the mid layer locking design is crappy
like that. Imposing locking downwards is just not good style :/

I've actually had quite good luck changing this for future kernels -- it
was required to remove io_request_lock anyways.

> The games which scsi_old_done() plays with spinlocks and interrupt
> enabling are really foul.  If someone calls it with interrupts disabled
> (sbp2 does this) then scsi_old_done() will enable interrupts for you.
> If, for example, you call scsi_old_done() under spin_lock_irqsave(), 
> the reenabling of interrupts will expose you to deadlocks.  Perhaps
> scsi_old_done() should just use spin_unlock()/spin_lock()?

Yep it's not nice either. I wouldn't change that without further
auditing though.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-12  1:37 Andrew Morton
  2001-11-12  4:54 ` H . J . Lu
@ 2001-11-12  8:50 ` Alan Cox
  2001-11-12 17:34 ` Jens Axboe
  2001-11-14  3:17 ` H . J . Lu
  3 siblings, 0 replies; 14+ messages in thread
From: Alan Cox @ 2001-11-12  8:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hogsberg, jamesg, H . J . Lu, Jens Axboe, lkml

> If, for example, you call scsi_old_done() under spin_lock_irqsave(), 
> the reenabling of interrupts will expose you to deadlocks.  Perhaps
> scsi_old_done() should just use spin_unlock()/spin_lock()?

scsi_old_done ceases to exist in 2.5 thankfully. Its basically emulating
old (1.2,2.0,..) scsi completion


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-12  5:14   ` Andrew Morton
@ 2001-11-12  5:28     ` H . J . Lu
  0 siblings, 0 replies; 14+ messages in thread
From: H . J . Lu @ 2001-11-12  5:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hogsberg, jamesg, Jens Axboe, lkml

On Sun, Nov 11, 2001 at 09:14:35PM -0800, Andrew Morton wrote:
> "H . J . Lu" wrote:
> > 
> > > Incidentally, it would be nice to be able to get this driver working
> > > properly when linked into the kernel - it makes debugging much easier :)
> > >
> > 
> > I guess I can try that. The only main issue will be the order of
> > initialization.
> > 
> 
> Actually, it almost works.  If you link the drivers into the kernel
> and, after bootup, attach a firewire drive and run rescan-scsi-bus.sh
> it will pick up the new devices.  It's just the bus scan at initcall
> time which fails.

I will look into that.

H.J.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-12  4:54 ` H . J . Lu
@ 2001-11-12  5:14   ` Andrew Morton
  2001-11-12  5:28     ` H . J . Lu
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2001-11-12  5:14 UTC (permalink / raw)
  To: H . J . Lu; +Cc: hogsberg, jamesg, Jens Axboe, lkml

"H . J . Lu" wrote:
> 
> > Incidentally, it would be nice to be able to get this driver working
> > properly when linked into the kernel - it makes debugging much easier :)
> >
> 
> I guess I can try that. The only main issue will be the order of
> initialization.
> 

Actually, it almost works.  If you link the drivers into the kernel
and, after bootup, attach a firewire drive and run rescan-scsi-bus.sh
it will pick up the new devices.  It's just the bus scan at initcall
time which fails.

-

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: sbp2.c on SMP
  2001-11-12  1:37 Andrew Morton
@ 2001-11-12  4:54 ` H . J . Lu
  2001-11-12  5:14   ` Andrew Morton
  2001-11-12  8:50 ` Alan Cox
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: H . J . Lu @ 2001-11-12  4:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hogsberg, jamesg, Jens Axboe, lkml

On Sun, Nov 11, 2001 at 05:37:21PM -0800, Andrew Morton wrote:
> 
> With the appended patch I can confirm that the driver happily runs
> `dbench 40' for half an hour on dual x86.   Even when you kick the

Thanks a lot.

> disk onto the floor (sorry, HJ).

That is a good stress test.

> 
> The games which scsi_old_done() plays with spinlocks and interrupt
> enabling are really foul.  If someone calls it with interrupts disabled
> (sbp2 does this) then scsi_old_done() will enable interrupts for you.
> If, for example, you call scsi_old_done() under spin_lock_irqsave(), 
> the reenabling of interrupts will expose you to deadlocks.  Perhaps
> scsi_old_done() should just use spin_unlock()/spin_lock()?
> 
> I tried enabling SBP2_USE_REAL_SPINLOCKS in sbp2.c and that appears to
> work just fine, although I haven't left that change in place here.
> 
> You don't actually _need_ SMP hardware to test SMP code, BTW.  You
> can just build an SMP kernel and run that on a uniprocessor box.
> This will still catch a wide range of bugs - it certainly detects
> the lockup which was occurring because of the scsi_old_done() thing.
> 
> Incidentally, it would be nice to be able to get this driver working
> properly when linked into the kernel - it makes debugging much easier :)
> 

I guess I can try that. The only main issue will be the order of
initialization.



H.J.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* sbp2.c on SMP
@ 2001-11-12  1:37 Andrew Morton
  2001-11-12  4:54 ` H . J . Lu
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Andrew Morton @ 2001-11-12  1:37 UTC (permalink / raw)
  To: hogsberg, jamesg; +Cc: H . J . Lu, Jens Axboe, lkml

Guys,

drivers/ieee1394/sbp2.c deadlocks immediately on SMP, because
io_request_lock is not held over its call to scsi_old_done().

I don't know why scsi_old_done() actually requires io_request_lock;
perhaps Jens could comment on whether I've taken the lock in the
appropriate place?

With the appended patch I can confirm that the driver happily runs
`dbench 40' for half an hour on dual x86.   Even when you kick the
disk onto the floor (sorry, HJ).

The games which scsi_old_done() plays with spinlocks and interrupt
enabling are really foul.  If someone calls it with interrupts disabled
(sbp2 does this) then scsi_old_done() will enable interrupts for you.
If, for example, you call scsi_old_done() under spin_lock_irqsave(), 
the reenabling of interrupts will expose you to deadlocks.  Perhaps
scsi_old_done() should just use spin_unlock()/spin_lock()?

I tried enabling SBP2_USE_REAL_SPINLOCKS in sbp2.c and that appears to
work just fine, although I haven't left that change in place here.

You don't actually _need_ SMP hardware to test SMP code, BTW.  You
can just build an SMP kernel and run that on a uniprocessor box.
This will still catch a wide range of bugs - it certainly detects
the lockup which was occurring because of the scsi_old_done() thing.

Incidentally, it would be nice to be able to get this driver working
properly when linked into the kernel - it makes debugging much easier :)



--- linux-2.4.15-pre2/drivers/ieee1394/sbp2.c	Wed Oct 17 14:19:20 2001
+++ linux-akpm/drivers/ieee1394/sbp2.c	Sun Nov 11 17:22:47 2001
@@ -2767,7 +2767,9 @@ static void sbp2scsi_complete_command(st
 	/*
 	 * Tell scsi stack that we're done with this command
 	 */
+	spin_lock_irq(&io_request_lock);
 	done (SCpnt);
+	spin_unlock_irq(&io_request_lock);
 
 	return;
 }

-

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2001-11-16 22:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-12  4:49 sbp2.c on SMP Douglas Gilbert
  -- strict thread matches above, loose matches on Subject: below --
2001-11-12  1:37 Andrew Morton
2001-11-12  4:54 ` H . J . Lu
2001-11-12  5:14   ` Andrew Morton
2001-11-12  5:28     ` H . J . Lu
2001-11-12  8:50 ` Alan Cox
2001-11-12 17:34 ` Jens Axboe
2001-11-14  3:17 ` H . J . Lu
2001-11-14  7:21   ` Andrew Morton
2001-11-16  3:32     ` H . J . Lu
2001-11-16 16:15       ` Kristian Hogsberg
2001-11-16 16:30         ` H . J . Lu
2001-11-16 21:25         ` H . J . Lu
2001-11-16 22:40           ` Kristian Hogsberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).