linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.4.21: kernel BUG at ide-iops.c:1262!
@ 2003-06-24 22:19 David Dillow
  2003-06-25  0:48 ` Scott McDermott
  0 siblings, 1 reply; 12+ messages in thread
From: David Dillow @ 2003-06-24 22:19 UTC (permalink / raw)
  To: linux-kernel

I've found a completely repeatable BUG/panic in 2.4.21 (final). The
kernel BUGS and then panics in an interrupt while beginning to burn a
DVD+R. I get the following in the logs, then the BUG/panic decoded at
the end of this message:

scsi : aborting command due to timeout : pid 390, scsi0, channel 0, id
0, lun 0
Test Unit Ready 00 00 00 00 00
scsi : aborting command due to timeout : pid 390, scsi0, channel 0, id
0, lun 0
Test Unit Ready 00 00 00 00 00
SCSI host 0 abort (pid 390) timed out - resetting
SCSI bus is being reset for host 0 channel 0.

Hardware is a dual PIII, 512MB memory, VIA vt82c686b IDE controller,
Sony DRU-500A DVD+/-RW burner, and Maxwell 4x DVD+R media. I have not
been able to reproduce with any other media, and the firmware on the
500A is the latest.

I was able to reproduce this with 2.4.21-rc5-ac1, but have not had media
available to try other kernels. I do have a few left, so I can help test
theories and solutions.

I was using growifs from dvd+rw-tools versions 5.6.4.4.4 and 5.9.4.4.4.
It will happen under either version.

Dave




ksymoops 2.4.5 on i686 2.4.21.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.21/ (default)
     -m /boot/System.map-2.4.21 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

kernel BUG at ide-iops.c:1262!
invalid operand: 0000
CPU:    1
EIP:    0010:[<c01b5551>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00013082
eax: c01d8ee0   ebx: 00000000   ecx: 00000036   edx: 00000002
esi: c15d7c00   edi: c032c450   ebp: c032c254   esp: dfff5e84
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=dfff5000)
Stack: 00000001 00003086 00003086 c15d7c00 c1597c60 00003086 00000000
c15d7c00 
       00000000 c15e64c0 c01b5767 c032c450 00000000 c01da110 c032c450
c01d427d 
       c15d7c00 00000002 00000000 c15d7c00 00003286 00000020 00000001
c01d3675 
Call Trace:    [<c01b5767>] [<c01da110>] [<c01d427d>] [<c01d3675>]
[<e0825fbc>]
  [<c01d35f0>] [<c0125c2e>] [<c0121b54>] [<c01219f7>] [<c01217b9>]
[<c0109399>]
  [<c01053e0>] [<c010bed8>] [<c01053e0>] [<c010540c>] [<c01054a2>]
[<c011c5a5>]
  [<c011c810>]
Code: 0f 0b ee 04 61 ef 25 c0 80 bf 09 01 00 00 20 74 0c 8b 74 24 


>>EIP; c01b5551 <do_reset1+31/230>   <=====

>>eax; c01d8ee0 <idescsi_pc_intr+0/360>
>>esi; c15d7c00 <_end+1298ac8/204cef28>
>>edi; c032c450 <ide_hwifs+670/2c88>
>>ebp; c032c254 <ide_hwifs+474/2c88>
>>esp; dfff5e84 <_end+1fcb6d4c/204cef28>

Trace; c01b5767 <ide_do_reset+17/20>
Trace; c01da110 <idescsi_reset+20/40>
Trace; c01d427d <scsi_reset+11d/370>
Trace; c01d3675 <scsi_old_times_out+85/160>
Trace; e0825fbc <[usb-uhci]rh_init_int_timer+5c/70>
Trace; c01d35f0 <scsi_old_times_out+0/160>
Trace; c0125c2e <run_timer_list+12e/1b7>
Trace; c0121b54 <bh_action+54/80>
Trace; c01219f7 <tasklet_hi_action+67/a0>
Trace; c01217b9 <do_softirq+d9/e0>
Trace; c0109399 <do_IRQ+e9/f0>
Trace; c01053e0 <default_idle+0/50>
Trace; c010bed8 <call_do_IRQ+5/d>
Trace; c01053e0 <default_idle+0/50>
Trace; c010540c <default_idle+2c/50>
Trace; c01054a2 <cpu_idle+52/70>
Trace; c011c5a5 <call_console_drivers+65/120>
Trace; c011c810 <printk+140/180>

Code;  c01b5551 <do_reset1+31/230>
00000000 <_EIP>:
Code;  c01b5551 <do_reset1+31/230>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c01b5553 <do_reset1+33/230>
   2:   ee                        out    %al,(%dx)
Code;  c01b5554 <do_reset1+34/230>
   3:   04 61                     add    $0x61,%al
Code;  c01b5556 <do_reset1+36/230>
   5:   ef                        out    %eax,(%dx)
Code;  c01b5557 <do_reset1+37/230>
   6:   25 c0 80 bf 09            and    $0x9bf80c0,%eax
Code;  c01b555c <do_reset1+3c/230>
   b:   01 00                     add    %eax,(%eax)
Code;  c01b555e <do_reset1+3e/230>
   d:   00 20                     add    %ah,(%eax)
Code;  c01b5560 <do_reset1+40/230>
   f:   74 0c                     je     1d <_EIP+0x1d>
Code;  c01b5562 <do_reset1+42/230>
  11:   8b 74 24 00               mov    0x0(%esp,1),%esi

 <0>Kernel panic: Aiee, killing interrupt handler!

1 warning issued.  Results may not be reliable.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-24 22:19 2.4.21: kernel BUG at ide-iops.c:1262! David Dillow
@ 2003-06-25  0:48 ` Scott McDermott
  2003-06-25  3:54   ` David Dillow
  2003-06-25  7:42   ` Roland Mas
  0 siblings, 2 replies; 12+ messages in thread
From: Scott McDermott @ 2003-06-25  0:48 UTC (permalink / raw)
  To: linux-kernel

David Dillow on Tue 24/06 18:19 -0400:
> I've found a completely repeatable BUG/panic in 2.4.21
> (final). The kernel BUGS and then panics in an interrupt
> while beginning to burn a DVD+R. I get the following in
> the logs, then the BUG/panic decoded at the end of this
> message:

There were recent threads about this, and a Bugzilla bug 829
I think.  Try killing magicdev.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-25  0:48 ` Scott McDermott
@ 2003-06-25  3:54   ` David Dillow
  2003-06-25 12:00     ` Alan Cox
       [not found]     ` <1056557323.1444.4.camel@dhcp22.swansea.linux.org.uk>
  2003-06-25  7:42   ` Roland Mas
  1 sibling, 2 replies; 12+ messages in thread
From: David Dillow @ 2003-06-25  3:54 UTC (permalink / raw)
  To: Scott McDermott; +Cc: linux-kernel

On Tue, 2003-06-24 at 20:48, Scott McDermott wrote:
> David Dillow on Tue 24/06 18:19 -0400:
> > I've found a completely repeatable BUG/panic in 2.4.21
> > (final). The kernel BUGS and then panics in an interrupt
> > while beginning to burn a DVD+R. I get the following in
> > the logs, then the BUG/panic decoded at the end of this
> > message:
> 
> There were recent threads about this, and a Bugzilla bug 829
> I think.  

Doh! Missed those, and my quick search didn't hit. Found one after your
message, though.

> Try killing magicdev.

I thought I had, but upon checking, it was still kicking. Killing it
allows the burn to complete uninterrupted.

I'll test patches to kill the kernel BUG, but I'm happy to be able to
burn without a reboot again.

Dave

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-25  0:48 ` Scott McDermott
  2003-06-25  3:54   ` David Dillow
@ 2003-06-25  7:42   ` Roland Mas
  2003-06-25 21:20     ` Adrian Bunk
  1 sibling, 1 reply; 12+ messages in thread
From: Roland Mas @ 2003-06-25  7:42 UTC (permalink / raw)
  To: linux-kernel

Scott McDermott, 2003-06-24 20:48:28 -0400 :

> There were recent threads about this, and a Bugzilla bug 829 I
> think.  Try killing magicdev.

Okay, I'll ask a LKML-newbie question.  Bugzilla says "Linux 2.5
kernel bugs only at this time"...  Does that mean this bug won't be
fixed in 2.4, or just that the fix will be written for 2.5 and then
backported to 2.4?

Roland.
-- 
Roland Mas

Au royaume des aveugles, les borgnes n'ont qu'un oeil.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-25  3:54   ` David Dillow
@ 2003-06-25 12:00     ` Alan Cox
  2003-06-25 22:47       ` Scott McDermott
       [not found]     ` <1056557323.1444.4.camel@dhcp22.swansea.linux.org.uk>
  1 sibling, 1 reply; 12+ messages in thread
From: Alan Cox @ 2003-06-25 12:00 UTC (permalink / raw)
  To: David Dillow; +Cc: Scott McDermott, Linux Kernel Mailing List


Its a known problem. There are two approaches. Either rewrite the ide
scsi reset code to use ide_abort infrastructure or switch ide-scsi back
to the old abort/reset code not new_eh and use SCSI_RESET_PUNT so that
the recovery is all handled by the ide layer.

For 2.4 the latter may be the best approach, for 2.5 it has to use
new_eh


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-25  7:42   ` Roland Mas
@ 2003-06-25 21:20     ` Adrian Bunk
  2003-06-25 21:37       ` Scott McDermott
  0 siblings, 1 reply; 12+ messages in thread
From: Adrian Bunk @ 2003-06-25 21:20 UTC (permalink / raw)
  To: Roland Mas; +Cc: linux-kernel

On Wed, Jun 25, 2003 at 09:42:38AM +0200, Roland Mas wrote:
> Scott McDermott, 2003-06-24 20:48:28 -0400 :
> 
> > There were recent threads about this, and a Bugzilla bug 829 I
> > think.  Try killing magicdev.
> 
> Okay, I'll ask a LKML-newbie question.  Bugzilla says "Linux 2.5
> kernel bugs only at this time"...  Does that mean this bug won't be
> fixed in 2.4, or just that the fix will be written for 2.5 and then
> backported to 2.4?

The Bugzilla tracks only 2.5 bugs.

Whether it will be fixed in 2.4 or a fix will be backported to 2.4 is 
beyond the scope of the Bugzilla (but of course Bugzilla does not 
affect how kernel developers write and submit fixes).

> Roland.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-25 21:20     ` Adrian Bunk
@ 2003-06-25 21:37       ` Scott McDermott
  0 siblings, 0 replies; 12+ messages in thread
From: Scott McDermott @ 2003-06-25 21:37 UTC (permalink / raw)
  To: linux-kernel

Adrian Bunk on Wed 25/06 23:20 +0200:
> > > There were recent threads about this, and a Bugzilla
> > > bug 829 I think.  Try killing magicdev.
> > 
> > Okay, I'll ask a LKML-newbie question.  Bugzilla says
> > "Linux 2.5 kernel bugs only at this time"...  Does that
> > mean this bug won't be fixed in 2.4, or just that the
> > fix will be written for 2.5 and then backported to 2.4?
> 
> The Bugzilla tracks only 2.5 bugs.
> 
> Whether it will be fixed in 2.4 or a fix will be
> backported to 2.4 is beyond the scope of the Bugzilla (but
> of course Bugzilla does not affect how kernel developers
> write and submit fixes).

it tracks 2.5 but I think the bug is the same; it appears
the relevant code has changed but I think the root issue
probably is the same (ide-scsi reset error handling causing
kernel to crash).  The bug submitter for the original bug
said he gets the same behavior on 2.4, but he didn't yet
respond as to whether killing magicdev fixes his problem.
Maybe they're different problems, but they appear related.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-25 12:00     ` Alan Cox
@ 2003-06-25 22:47       ` Scott McDermott
  2003-06-25 22:49         ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Scott McDermott @ 2003-06-25 22:47 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Dillow, Linux Kernel Mailing List

Alan Cox on Wed 25/06 13:00 +0100:
> Its a known problem. There are two approaches. Either
> rewrite the ide scsi reset code to use ide_abort
> infrastructure or switch ide-scsi back to the old
> abort/reset code not new_eh and use SCSI_RESET_PUNT so
> that the recovery is all handled by the ide layer.
> 
> For 2.4 the latter may be the best approach, for 2.5 it
> has to use new_eh

is this what

o       First crack at fixing the ide reset oopses      (me)

tries to fix?

The CDRW devices that have problems with Test Unit Ready
during finalization (like the GCC-4240N) are broken and this
won't fix that problem, but the fix you're talking about
will stop the kernel from crashing when it happens, do I
have that right?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-25 22:47       ` Scott McDermott
@ 2003-06-25 22:49         ` Alan Cox
  0 siblings, 0 replies; 12+ messages in thread
From: Alan Cox @ 2003-06-25 22:49 UTC (permalink / raw)
  To: Scott McDermott; +Cc: David Dillow, Linux Kernel Mailing List

On Mer, 2003-06-25 at 23:47, Scott McDermott wrote:
> is this what
> 
> o       First crack at fixing the ide reset oopses      (me)
> 
> tries to fix?
> 
> The CDRW devices that have problems with Test Unit Ready
> during finalization (like the GCC-4240N) are broken and this
> won't fix that problem, but the fix you're talking about
> will stop the kernel from crashing when it happens, do I
> have that right?

The TUA thing is a bit different but the general problem of 
reset being mishandled is part of it and needs fixing anyway


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
       [not found]     ` <1056557323.1444.4.camel@dhcp22.swansea.linux.org.uk>
@ 2003-06-26  3:17       ` David Dillow
  2003-06-26 21:34         ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: David Dillow @ 2003-06-26  3:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Wed, 2003-06-25 at 12:08, Alan Cox wrote:
> On Mer, 2003-06-25 at 04:54, David Dillow wrote:
> 
> > I thought I had, but upon checking, it was still kicking. Killing it
> > allows the burn to complete uninterrupted.
> > 
> > I'll test patches to kill the kernel BUG, but I'm happy to be able to
> > burn without a reboot again.
> 
> I've put a test fix into -ac3

Well, it no longer BUGS. Progress. :)

Unfortunately, it still oopses, then panics. Everything but the kernel
is the same as before. Forgot to get ksym/module info for ksymoops
before testing, can get that info if need be.

ksymoops 2.4.5 on i686 2.4.21.  Options used
     -V (default)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.21-ac3/ (specified)
     -m /boot/System.map-2.4.21-ac3 (specified)

No modules in ksyms, skipping objects
cpu: 0, clocks: 1339302, slice: 446434
cpu: 1, clocks: 1339302, slice: 446434
Unable to handle kernel NULL pointer dereference at virtual address
00000000
00000000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<00000000>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010096
eax: c029c480   ebx: c034b7fc   ecx: 00000033   edx: 00000002
esi: c15de400   edi: 00000000   ebp: c15e3440   esp: c02cded4
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02cd000)
Stack: c01de899 c034b7fc c026be65 00000000 c01d891d c15de400 00000002
00000000 
       c15de400 00000292 00000000 00000001 c01d7d15 c15de400 00000002
000003ac 
       00000000 c15de46c c01d7c90 c0126f1e c15de400 c15de46c c15de46c
00000001 
Call Trace:    [<c01de899>] [<c01d891d>] [<c01d7d15>] [<c01d7c90>]
[<c0126f1e>]
  [<c0122e24>] [<c0122cc2>] [<c0122a89>] [<c0109289>] [<c01053e0>]
[<c0105000>]
  [<c010bde8>] [<c01053e0>] [<c0105000>] [<c010540c>] [<c0105474>]
Code:  Bad EIP value.


>>EIP; 00000000 Before first symbol

>>eax; c029c480 <idescsi_driver+0/6c>
>>ebx; c034b7fc <ide_hwifs+67c/2cd8>
>>esi; c15de400 <END_OF_CODE+127f9c8/????>
>>ebp; c15e3440 <END_OF_CODE+1284a08/????>
>>esp; c02cded4 <init_task_union+1ed4/2000>

Trace; c01de899 <idescsi_reset+29/80>
Trace; c01d891d <scsi_reset+11d/370>
Trace; c01d7d15 <scsi_old_times_out+85/160>
Trace; c01d7c90 <scsi_old_times_out+0/160>
Trace; c0126f1e <run_timer_list+12e/1b7>
Trace; c0122e24 <bh_action+54/80>
Trace; c0122cc2 <tasklet_hi_action+62/a0>
Trace; c0122a89 <do_softirq+d9/e0>
Trace; c0109289 <do_IRQ+e9/f0>
Trace; c01053e0 <default_idle+0/50>
Trace; c0105000 <_stext+0/0>
Trace; c010bde8 <call_do_IRQ+5/d>
Trace; c01053e0 <default_idle+0/50>
Trace; c0105000 <_stext+0/0>
Trace; c010540c <default_idle+2c/50>
Trace; c0105474 <cpu_idle+24/30>

 <0>Kernel panic: Aiee, killing interrupt handler!



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-26  3:17       ` David Dillow
@ 2003-06-26 21:34         ` Alan Cox
  2003-06-27  2:58           ` David Dillow
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2003-06-26 21:34 UTC (permalink / raw)
  To: David Dillow; +Cc: Linux Kernel Mailing List

On Iau, 2003-06-26 at 04:17, David Dillow wrote:
> Trace; c01de899 <idescsi_reset+29/80>
> Trace; c01d891d <scsi_reset+11d/370>

Interesting trace.

Lets try a little sanity check first then. Replace that whole 
idescsi_reset handler with "return SCSI_RESET_SNOOZE"

(I think its snooze)



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.4.21: kernel BUG at ide-iops.c:1262!
  2003-06-26 21:34         ` Alan Cox
@ 2003-06-27  2:58           ` David Dillow
  0 siblings, 0 replies; 12+ messages in thread
From: David Dillow @ 2003-06-27  2:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On Thu, 2003-06-26 at 17:34, Alan Cox wrote:
> On Iau, 2003-06-26 at 04:17, David Dillow wrote:
> > Trace; c01de899 <idescsi_reset+29/80>
> > Trace; c01d891d <scsi_reset+11d/370>
> 
> Interesting trace.
> 
> Lets try a little sanity check first then. Replace that whole 
> idescsi_reset handler with "return SCSI_RESET_SNOOZE"

With that change, I've burned two disks back to back with no oopsen.

All I get in the logs are these (expected) messages:
scsi : aborting command due to timeout : pid 522, scsi0, channel 0, id
0, lun 0
scsi : aborting command due to timeout : pid 522, scsi0, channel 0, id
0, lun 0
SCSI host 0 abort (pid 522) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
SCSI host 0 channel 0 reset (pid 522) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
scsi : aborting command due to timeout : pid 144056, scsi0, channel 0,
id 0, lu
scsi : aborting command due to timeout : pid 144056, scsi0, channel 0,
id 0, lu
SCSI host 0 abort (pid 144056) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
SCSI host 0 channel 0 reset (pid 144056) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.

Don't know if you broke anything else, but I can burn with magicdev
running and it not lock up the system now.

I've got four pieces of this media left, so if you want me to test the
next iteration of the patch, give me a yell.

Thanks,
Dave



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2003-06-27  2:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-24 22:19 2.4.21: kernel BUG at ide-iops.c:1262! David Dillow
2003-06-25  0:48 ` Scott McDermott
2003-06-25  3:54   ` David Dillow
2003-06-25 12:00     ` Alan Cox
2003-06-25 22:47       ` Scott McDermott
2003-06-25 22:49         ` Alan Cox
     [not found]     ` <1056557323.1444.4.camel@dhcp22.swansea.linux.org.uk>
2003-06-26  3:17       ` David Dillow
2003-06-26 21:34         ` Alan Cox
2003-06-27  2:58           ` David Dillow
2003-06-25  7:42   ` Roland Mas
2003-06-25 21:20     ` Adrian Bunk
2003-06-25 21:37       ` Scott McDermott

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).