linux-kernel.vger.kernel.org archive mirror
* Re: OOPS in scsi generic stuff 2.4.10-pre6
@ 2001-09-16 15:55 Douglas Gilbert
  2001-09-16 16:22 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Douglas Gilbert @ 2001-09-16 15:55 UTC (permalink / raw)
  To: lkml; +Cc: linux-kernel, linux-scsi, Jens Axboe

lkml@krimedawg.org <<nameless>> wrote:
> The ksymoops output.  Let me know if there is anything else I can offer
> to help?  This happened when ripping a cd with cdparanoia on an IDE drive
> with the ide-scsi stuff.

....

> >>EIP; c01b7688 <generic_unplug_device+8/30>   <=====
> Trace; c01f8fe6 <sg_common_write+1d6/1f0>
> Trace; c01f9bc0 <sg_cmd_done_bh+0/280>    #### bizarre
> Trace; c01f8c16 <sg_write+256/280>

generic_unplug_device() was an addition into the sg driver
by Jens Axboe. Under heavy stress testing I have also received
an oops from this function.

It is there because the tentacles of the Linux block subsystem
have found their way into the SCSI midlevel. The st and sg
drivers are proof of why this is bad design as they are char
devices.

If the generic_unplug_device() call is removed then the
sg driver will periodically have its commands suspended
on the SCSI mid level queue until the block subsystem
decides to send something to the device in question.
This can be seconds (which isn't a pleasant thing to do
to a cdwriter).
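
To make the stall concrete, here is a minimal user-space sketch of a
plugged queue. It is not kernel code: demo_plug_queue, demo_enqueue and
demo_plug_unplug are hypothetical names invented for illustration, and
the model assumes only that a plugged queue holds commands until
something unplugs it.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_QUEUE_DEPTH 8   /* arbitrary limit for the sketch */

/* Hypothetical model of a request queue; not the kernel's request_queue_t. */
struct demo_plug_queue {
    bool plugged;     /* while true, queued commands are held back */
    int  pending;     /* commands waiting on the queue */
    int  dispatched;  /* commands handed to the device so far */
};

/* Dispatch everything that is pending, unless the queue is plugged. */
static void demo_run_queue(struct demo_plug_queue *q)
{
    if (q->plugged)
        return;
    q->dispatched += q->pending;
    q->pending = 0;
}

/* Queue one command; it only reaches the device if the queue is unplugged. */
static bool demo_enqueue(struct demo_plug_queue *q)
{
    if (q->pending >= DEMO_QUEUE_DEPTH)
        return false;
    q->pending++;
    demo_run_queue(q);
    return true;
}

/* Clear the plug and dispatch whatever was being held. */
static void demo_plug_unplug(struct demo_plug_queue *q)
{
    q->plugged = false;
    demo_run_queue(q);
}
```

In this model a command queued while plugged is set stays in pending
until demo_plug_unplug() runs, which is the role the explicit unplug
call plays for sg.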

Doug Gilbert


Oops output:
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Unable to handle kernel paging request at virtual address 0a080294
c01b7688
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01b7688>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210002
eax: 0a08021c   ebx: 00200202   ecx: 00000010   edx: 0a08021c
esi: e6634064   edi: e6634040   ebp: c1bc8940   esp: d55bfefc
ds: 0018   es: 0018   ss: 0018
Process cdparanoia (pid: 3380, stackpage=d55bf000)
Stack: ea8bfc80 c01f8fe7 0a08021c ea8bfc80 d55bff6c dd2f0000 00007770 c01f9bc0
       00001770 00000001 d55bff6c e6634000 e6634040 08058b64 c01f8c17 e6634000
       e6634040 d55bff6c 00001770 00000001 e3cae540 ffffffea 00000000 000077a0
Call Trace: [<c01f8fe7>] [<c01f9bc0>] [<c01f8c17>] [<c012ebe6>] [<c0106c2b>]
Code: 80 7a 78 00 74 15 c6 42 78 00 8d 42 28 39 42 28 74 09 52 8b

>>EIP; c01b7688 <generic_unplug_device+8/30>   <=====
Trace; c01f8fe6 <sg_common_write+1d6/1f0>
Trace; c01f9bc0 <sg_cmd_done_bh+0/280>
Trace; c01f8c16 <sg_write+256/280>
Trace; c012ebe6 <sys_write+96/d0>
Trace; c0106c2a <system_call+32/38>
Code;  c01b7688 <generic_unplug_device+8/30>
00000000 <_EIP>:
Code;  c01b7688 <generic_unplug_device+8/30>   <=====
   0:   80 7a 78 00               cmpb   $0x0,0x78(%edx)   <=====
Code;  c01b768c <generic_unplug_device+c/30>
   4:   74 15                     je     1b <_EIP+0x1b> c01b76a2 <generic_unplug_device+22/30>
Code;  c01b768e <generic_unplug_device+e/30>
   6:   c6 42 78 00               movb   $0x0,0x78(%edx)
Code;  c01b7692 <generic_unplug_device+12/30>
   a:   8d 42 28                  lea    0x28(%edx),%eax
Code;  c01b7694 <generic_unplug_device+14/30>
   d:   39 42 28                  cmp    %eax,0x28(%edx)
Code;  c01b7698 <generic_unplug_device+18/30>
  10:   74 09                     je     1b <_EIP+0x1b> c01b76a2 <generic_unplug_device+22/30>
Code;  c01b769a <generic_unplug_device+1a/30>
  12:   52                        push   %edx
Code;  c01b769a <generic_unplug_device+1a/30>
  13:   8b 00                     mov    (%eax),%eax
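
A quick check of the numbers above, assuming the faulting instruction is
the cmpb at offset 0: its operand is 0x78(%edx), and edx holds 0a08021c,
so the access lands on 0a080294, which is the address the oops reports.
On a default i386 configuration kernel addresses start at 0xc0000000
(an assumption about this machine), so the value in edx does not look
like a valid kernel pointer. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a base+displacement operand such as 0x78(%edx). */
static uint32_t demo_effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* Values taken from the oops: edx and the displacement in the cmpb. */
static uint32_t demo_fault_address(void)
{
    const uint32_t edx  = 0x0a08021cu;
    const uint32_t disp = 0x78u;
    return demo_effective_address(edx, disp);
}
```

demo_fault_address() evaluates to 0x0a080294, matching the "virtual
address 0a080294" line at the top of the oops.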


* Re: OOPS in scsi generic stuff 2.4.10-pre6
  2001-09-16 15:55 OOPS in scsi generic stuff 2.4.10-pre6 Douglas Gilbert
@ 2001-09-16 16:22 ` Jens Axboe
  2001-09-16 16:55   ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2001-09-16 16:22 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: lkml, linux-kernel, linux-scsi

On Sun, Sep 16 2001, Douglas Gilbert wrote:
> lkml@krimedawg.org <<nameless>> wrote:
> > The ksymoops output.  Let me know if there is anything else I can offer
> > to help?  This happened when ripping a cd with cdparanoia on an IDE drive
> > with the ide-scsi stuff.
> 
> ....
> 
> > >>EIP; c01b7688 <generic_unplug_device+8/30>   <=====
> > Trace; c01f8fe6 <sg_common_write+1d6/1f0>
> > Trace; c01f9bc0 <sg_cmd_done_bh+0/280>    #### bizarre
> > Trace; c01f8c16 <sg_write+256/280>
> 
> generic_unplug_device() was an addition into the sg driver
> by Jens Axboe. Under heavy stress testing I have also received
> an oops from this function.

It looks like a race in that sg_cmd_done_bh can be completed before
generic_unplug_device is called (and thus on a free'd scsi request). We
then pass an invalid queue to generic_unplug_device.
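
The ordering problem can be sketched in user space as follows. This is
an illustration under assumptions, not the sg code: every demo_* name is
hypothetical, and the submit function is assumed to complete (and free)
the request before it returns, which is the worst case for the race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct demo_queue   { bool plugged; int unplug_calls; };
struct demo_device  { struct demo_queue queue; };
struct demo_request { struct demo_device *dev; };

/* Completion handler: releases the request once the command is done. */
static void demo_done(struct demo_request *req)
{
    free(req);
}

/* Models a submit path that completes the command before returning. */
static void demo_submit(struct demo_request *req)
{
    demo_done(req);
}

static void demo_unplug(struct demo_queue *q)
{
    q->plugged = false;
    q->unplug_calls++;
}

/*
 * Safe ordering: read the queue pointer while the request is still
 * valid, then never touch the request again after submitting it.
 */
static int demo_common_write(struct demo_device *dev)
{
    struct demo_request *req = malloc(sizeof(*req));
    struct demo_queue *q;

    if (req == NULL)
        return -1;          /* allocation can fail; check before use */

    req->dev = dev;
    q = &req->dev->queue;   /* taken before submit, while req is live */
    demo_submit(req);       /* req may already be freed from here on */
    demo_unplug(q);         /* uses only the cached pointer */
    return 0;
}
```

Reading req->dev after demo_submit() would be the use-after-free; the
cached pointer q stays valid because the device outlives the request.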

> It is there because the tentacles of the Linux block subsystem
> have found their way into the SCSI midlevel. The st and sg
> drivers are proof of why this is bad design as they are char
> devices.

Not so. The SCSI mid level should just force this unplug, or have a
special do_req that forces immediate completion (or start of execution,
rather). The unplug in sg is needed because of that.

> If the generic_unplug_device() call is removed then the
> sg driver will periodically have its commands suspended
> on the SCSI mid level queue until the block subsystem
> decides to send something to the device in question.
> This can be seconds (which isn't a pleasant thing to do
> to a cdwriter).

Because the scsi_request_fn can quit with commands on the queue. That's
a SCSI internal issue. This is not a block layer bug.

--- drivers/scsi/sg.c~	Sun Sep 16 18:17:20 2001
+++ drivers/scsi/sg.c	Sun Sep 16 18:18:44 2001
@@ -645,6 +645,7 @@
     Scsi_Request        * SRpnt;
     Sg_device           * sdp = sfp->parentdp;
     sg_io_hdr_t         * hp = &srp->header;
+    request_queue_t	* q;
 
     srp->data.cmd_opcode = cmnd[0];  /* hold opcode of command */
     hp->status = 0;
@@ -673,6 +674,7 @@
     	return -ENODEV;
     }
     SRpnt = scsi_allocate_request(sdp->device);
+    q = &SRpnt->sr_device->request_queue;
     if(SRpnt == NULL) {
     	SCSI_LOG_TIMEOUT(1, printk("sg_write: no mem\n"));
     	sg_finish_rem_req(srp);
@@ -715,7 +717,7 @@
 		(void *)SRpnt->sr_buffer, hp->dxfer_len,
 		sg_cmd_done_bh, timeout, SG_DEFAULT_RETRIES);
     /* dxfer_len overwrites SRpnt->sr_bufflen, hence need for b_malloc_len */
-    generic_unplug_device(&SRpnt->sr_device->request_queue);
+    generic_unplug_device(q);
     return 0;
 }
 
-- 
Jens Axboe



* Re: OOPS in scsi generic stuff 2.4.10-pre6
  2001-09-16 16:22 ` Jens Axboe
@ 2001-09-16 16:55   ` Jens Axboe
  2001-09-17  4:01     ` Douglas Gilbert
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2001-09-16 16:55 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: lkml, linux-kernel, linux-scsi

On Sun, Sep 16 2001, Jens Axboe wrote:
> It looks like a race in that sg_cmd_done_bh can be completed before
> generic_unplug_device is called (and thus on a free'd scsi request). We
> then pass an invalid queue to generic_unplug_device.

(corrected version, scsi_allocate_request can of course fail)

--- drivers/scsi/sg.c~	Sun Sep 16 18:17:20 2001
+++ drivers/scsi/sg.c	Sun Sep 16 18:53:38 2001
@@ -645,6 +645,7 @@
     Scsi_Request        * SRpnt;
     Sg_device           * sdp = sfp->parentdp;
     sg_io_hdr_t         * hp = &srp->header;
+    request_queue_t	* q;
 
     srp->data.cmd_opcode = cmnd[0];  /* hold opcode of command */
     hp->status = 0;
@@ -680,6 +681,7 @@
     }
 
     srp->my_cmdp = SRpnt;
+    q = &SRpnt->sr_device->request_queue;
     SRpnt->sr_request.rq_dev = sdp->i_rdev;
     SRpnt->sr_request.rq_status = RQ_ACTIVE;
     SRpnt->sr_sense_buffer[0] = 0;
@@ -715,7 +717,7 @@
 		(void *)SRpnt->sr_buffer, hp->dxfer_len,
 		sg_cmd_done_bh, timeout, SG_DEFAULT_RETRIES);
     /* dxfer_len overwrites SRpnt->sr_bufflen, hence need for b_malloc_len */
-    generic_unplug_device(&SRpnt->sr_device->request_queue);
+    generic_unplug_device(q);
     return 0;
 }
 
-- 
Jens Axboe



* Re: OOPS in scsi generic stuff 2.4.10-pre6
  2001-09-16 16:55   ` Jens Axboe
@ 2001-09-17  4:01     ` Douglas Gilbert
  2001-09-17  7:59       ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Douglas Gilbert @ 2001-09-17  4:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: lkml, linux-kernel, linux-scsi

Jens Axboe wrote:
> 
> On Sun, Sep 16 2001, Jens Axboe wrote:
> > It looks like a race in that sg_cmd_done_bh can be completed before
> > generic_unplug_device is called (and thus on a free'd scsi request). We
> > then pass an invalid queue to generic_unplug_device.
> 
> (corrected version, scsi_allocate_request can of course fail)

Jens,
Prior to this patch (actually the first one you posted
today) sg_dd would frequently crash in generic_unplug_device
when tested against the scsi_debug adapter driver. [I have 
hacked up that driver to simulate a large number of (ram) 
disks to test Richard Gooch's 2000+ scsi disk patch.]

The way scsi_debug handles all its commands, the bottom
half handler in sg will be called before scsi_do_req()
completes. With this patch the problem goes away.

Doug Gilbert



* Re: OOPS in scsi generic stuff 2.4.10-pre6
  2001-09-17  4:01     ` Douglas Gilbert
@ 2001-09-17  7:59       ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2001-09-17  7:59 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: lkml, linux-kernel, linux-scsi

On Mon, Sep 17 2001, Douglas Gilbert wrote:
> Jens Axboe wrote:
> > 
> > On Sun, Sep 16 2001, Jens Axboe wrote:
> > > It looks like a race in that sg_cmd_done_bh can be completed before
> > > generic_unplug_device is called (and thus on a free'd scsi request). We
> > > then pass an invalid queue to generic_unplug_device.
> > 
> > (corrected version, scsi_allocate_request can of course fail)
> 
> Jens,
> Prior to this patch (actually the first one you posted
> today) sg_dd would frequently crash in generic_unplug_device
> when tested against the scsi_debug adapter driver. [I have 
> hacked up that driver to simulate a large number of (ram) 
> disks to test Richard Gooch's 2000+ scsi disk patch.]
> 
> The way scsi_debug handles all its commands, the bottom
> half handler in sg will be called before scsi_do_req()
> completes. With this patch the problem goes away.

Good, so at least that long-standing race has been closed now.

-- 
Jens Axboe



* OOPS in scsi generic stuff 2.4.10-pre6
@ 2001-09-16  7:57 lkml
  0 siblings, 0 replies; 6+ messages in thread
From: lkml @ 2001-09-16  7:57 UTC (permalink / raw)
  To: linux-kernel

The ksymoops output.  Let me know if there is anything else I can offer
to help?  This happened when ripping a cd with cdparanoia on an IDE drive
with the ide-scsi stuff.

Unable to handle kernel paging request at virtual address 0a080294
c01b7688
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01b7688>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210002
eax: 0a08021c   ebx: 00200202   ecx: 00000010   edx: 0a08021c
esi: e6634064   edi: e6634040   ebp: c1bc8940   esp: d55bfefc
ds: 0018   es: 0018   ss: 0018
Process cdparanoia (pid: 3380, stackpage=d55bf000)
Stack: ea8bfc80 c01f8fe7 0a08021c ea8bfc80 d55bff6c dd2f0000 00007770 c01f9bc0
       00001770 00000001 d55bff6c e6634000 e6634040 08058b64 c01f8c17 e6634000
       e6634040 d55bff6c 00001770 00000001 e3cae540 ffffffea 00000000 000077a0
Call Trace: [<c01f8fe7>] [<c01f9bc0>] [<c01f8c17>] [<c012ebe6>] [<c0106c2b>]
Code: 80 7a 78 00 74 15 c6 42 78 00 8d 42 28 39 42 28 74 09 52 8b

>>EIP; c01b7688 <generic_unplug_device+8/30>   <=====
Trace; c01f8fe6 <sg_common_write+1d6/1f0>
Trace; c01f9bc0 <sg_cmd_done_bh+0/280>
Trace; c01f8c16 <sg_write+256/280>
Trace; c012ebe6 <sys_write+96/d0>
Trace; c0106c2a <system_call+32/38>
Code;  c01b7688 <generic_unplug_device+8/30>
00000000 <_EIP>:
Code;  c01b7688 <generic_unplug_device+8/30>   <=====
   0:   80 7a 78 00               cmpb   $0x0,0x78(%edx)   <=====
Code;  c01b768c <generic_unplug_device+c/30>
   4:   74 15                     je     1b <_EIP+0x1b> c01b76a2 <generic_unplug_device+22/30>
Code;  c01b768e <generic_unplug_device+e/30>
   6:   c6 42 78 00               movb   $0x0,0x78(%edx)
Code;  c01b7692 <generic_unplug_device+12/30>
   a:   8d 42 28                  lea    0x28(%edx),%eax
Code;  c01b7694 <generic_unplug_device+14/30>
   d:   39 42 28                  cmp    %eax,0x28(%edx)
Code;  c01b7698 <generic_unplug_device+18/30>
  10:   74 09                     je     1b <_EIP+0x1b> c01b76a2 <generic_unplug_device+22/30>
Code;  c01b769a <generic_unplug_device+1a/30>
  12:   52                        push   %edx
Code;  c01b769a <generic_unplug_device+1a/30>
  13:   8b 00                     mov    (%eax),%eax




Thread overview: 6+ messages
2001-09-16 15:55 OOPS in scsi generic stuff 2.4.10-pre6 Douglas Gilbert
2001-09-16 16:22 ` Jens Axboe
2001-09-16 16:55   ` Jens Axboe
2001-09-17  4:01     ` Douglas Gilbert
2001-09-17  7:59       ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2001-09-16  7:57 lkml
