* 2.6.20.3 AMD64 oops in CFQ code
@ 2007-03-22 12:38 linux
  2007-03-22 18:41 ` Jens Axboe
  2007-03-22 18:43 ` Aristeu Sergio Rozanski Filho
  0 siblings, 2 replies; 16+ messages in thread
From: linux @ 2007-03-22 12:38 UTC (permalink / raw)
  To: linux-ide, linux-kernel; +Cc: axboe, linux

This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
stable for a long time, but has been acting up of late since I upgraded
to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
errors.

Kernel 2.6.20.3 + linuxpps patches (confined to drivers/serial, and not
actually in use as I stole the serial port for a console).

It takes half a day to reproduce the problem, so bisecting would be painful.

BackupPC_dump mostly writes to a large (1.7 TB) ext3 RAID5 partition.


Here are two oopses, a few minutes (16:31, to be precise) apart.
Unusually, it oopsed twice *without* locking up the system.  Usually,
I see this followed by an error from drivers/input/keyboard/atkbd.c:
                        printk(KERN_WARNING "atkbd.c: Spurious %s on %s. "
                               "Some program might be trying access hardware directly.\n",
emitted at 1 Hz with the keyboard LEDs flashing and the system
unresponsive to keyboard or pings.
(I think it was spurious ACK on serio/input0, but my memory may be faulty.)


If anyone has any suggestions, they'd be gratefully received.


Unable to handle kernel NULL pointer dereference at 0000000000000098 RIP: 
 [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
PGD 777e9067 PUD 78774067 PMD 0 
Oops: 0000 [1] 
CPU 0 
Modules linked in: ecb
Pid: 2837, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
RIP: 0010:[<ffffffff8031504a>]  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
RSP: 0018:ffff8100770bbaf8  EFLAGS: 00010092
RAX: ffff81007fb36c80 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 000000010003e4e7 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff81007fb37a00 R08: 00000000ffffffff R09: ffff81005d390298
R10: ffff81007fcb4f80 R11: ffff81007fcb4f80 R12: ffff81007facd280
R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000000
FS:  00002b322d120d30(0000) GS:ffffffff805de000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000098 CR3: 000000007bcf0000 CR4: 00000000000006e0
Process BackupPC_dump (pid: 2837, threadinfo ffff8100770ba000, task ffff81007fc5d8e0)
Stack:  0000000000000000 ffff8100770f39f0 0000000000000000 0000000000000004
 0000000000000001 ffffffff80315253 ffffffff803b2607 ffff81005da2bc40
 ffff81007fac3800 ffff81007facd280 ffff81007facd280 ffff81005d390298
Call Trace:
 [<ffffffff80315253>] cfq_dispatch_requests+0x152/0x512
 [<ffffffff803b2607>] scsi_done+0x0/0x18
 [<ffffffff8030d9f1>] elv_next_request+0x137/0x147
 [<ffffffff803b7ce0>] scsi_request_fn+0x6a/0x33a
 [<ffffffff8024d407>] generic_unplug_device+0xa/0xe
 [<ffffffff80407ced>] unplug_slaves+0x5b/0x94
 [<ffffffff80223d65>] sync_page+0x0/0x40
 [<ffffffff80223d9b>] sync_page+0x36/0x40
 [<ffffffff80256d45>] __wait_on_bit_lock+0x36/0x65
 [<ffffffff80237496>] __lock_page+0x5e/0x64
 [<ffffffff8028061d>] wake_bit_function+0x0/0x23
 [<ffffffff802074de>] find_get_page+0xe/0x2d
 [<ffffffff8020b38e>] do_generic_mapping_read+0x1c2/0x40d
 [<ffffffff8020bd80>] file_read_actor+0x0/0x118
 [<ffffffff8021422e>] generic_file_aio_read+0x15c/0x19e
 [<ffffffff8020bafa>] do_sync_read+0xc9/0x10c
 [<ffffffff80210342>] may_open+0x5b/0x1c6
 [<ffffffff802805ef>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8020a857>] vfs_read+0xaa/0x152
 [<ffffffff8020faf3>] sys_read+0x45/0x6e
 [<ffffffff8025041e>] system_call+0x7e/0x83


Code: 4c 8b ae 98 00 00 00 4c 8b 70 08 e8 63 fe ff ff 8b 43 28 4c 
RIP  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
 RSP <ffff8100770bbaf8>
CR2: 0000000000000098
 <1>Unable to handle kernel NULL pointer dereference at 0000000000000098 RIP: 
 [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
PGD 79bd2067 PUD 789f9067 PMD 0 
Oops: 0000 [2] 
CPU 0 
Modules linked in: ecb
Pid: 2834, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
RIP: 0010:[<ffffffff8031504a>]  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
RSP: 0018:ffff8100789b5af8  EFLAGS: 00010092
RAX: ffff81007fb36c80 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 000000010007ac16 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff81007fb37a00 R08: 00000000ffffffff R09: ffff810064dd45e0
R10: ffff81007fcb4f80 R11: ffff81007fcb4f80 R12: ffff81007facd280
R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000000
FS:  00002b0a7c680d30(0000) GS:ffffffff805de000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000098 CR3: 0000000079d36000 CR4: 00000000000006e0
Process BackupPC_dump (pid: 2834, threadinfo ffff8100789b4000, task ffff81007a235140)
Stack:  0000000000000000 ffff81007b9ebbd0 0000000000000000 0000000000000004
 0000000000000001 ffffffff80315253 ffffffff803b2607 ffff81000e67ba00
 ffff81007fac3800 ffff81007facd280 ffff81007facd280 ffff810064dd45e0
Call Trace:
 [<ffffffff80315253>] cfq_dispatch_requests+0x152/0x512
 [<ffffffff803b2607>] scsi_done+0x0/0x18
 [<ffffffff8030d9f1>] elv_next_request+0x137/0x147
 [<ffffffff803b7ce0>] scsi_request_fn+0x6a/0x33a
 [<ffffffff8024d407>] generic_unplug_device+0xa/0xe
 [<ffffffff80407ced>] unplug_slaves+0x5b/0x94
 [<ffffffff80223d65>] sync_page+0x0/0x40
 [<ffffffff80223d9b>] sync_page+0x36/0x40
 [<ffffffff80256d45>] __wait_on_bit_lock+0x36/0x65
 [<ffffffff80237496>] __lock_page+0x5e/0x64
 [<ffffffff8028061d>] wake_bit_function+0x0/0x23
 [<ffffffff802074de>] find_get_page+0xe/0x2d
 [<ffffffff8020b38e>] do_generic_mapping_read+0x1c2/0x40d
 [<ffffffff8020bd80>] file_read_actor+0x0/0x118
 [<ffffffff8021422e>] generic_file_aio_read+0x15c/0x19e
 [<ffffffff8020bafa>] do_sync_read+0xc9/0x10c
 [<ffffffff80210342>] may_open+0x5b/0x1c6
 [<ffffffff802805ef>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8020a857>] vfs_read+0xaa/0x152
 [<ffffffff8020faf3>] sys_read+0x45/0x6e
 [<ffffffff8025041e>] system_call+0x7e/0x83


Code: 4c 8b ae 98 00 00 00 4c 8b 70 08 e8 63 fe ff ff 8b 43 28 4c 
RIP  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
 RSP <ffff8100789b5af8>
CR2: 0000000000000098
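
For reference, a hedged decode of the faulting instruction from the Code:
lines (the meaning of offset 0x98 is a guess; the register values are taken
from the dumps above, which are identical in both oopses):

	/* Code: ... 4c 8b ae 98 00 00 00 ...
	 *   =>  mov    r13, QWORD PTR [rsi + 0x98]
	 *
	 * On x86-64, rsi carries the second argument, i.e. the
	 * struct request * passed to cfq_dispatch_insert(q, rq).
	 * RSI = 0 and CR2 = 0x98 in both dumps, so rq appears to
	 * have been NULL and the load of a member at offset 0x98
	 * (plausibly the elevator-private cfq_queue pointer) is
	 * what faulted.
	 */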
 


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-22 12:38 2.6.20.3 AMD64 oops in CFQ code linux
@ 2007-03-22 18:41 ` Jens Axboe
  2007-03-22 18:54   ` linux
  2007-03-22 18:43 ` Aristeu Sergio Rozanski Filho
  1 sibling, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2007-03-22 18:41 UTC (permalink / raw)
  To: linux; +Cc: linux-ide, linux-kernel

On Thu, Mar 22 2007, linux@horizon.com wrote:
> This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
> over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
> stable for a long time, but has been acting up of late since I upgraded
> to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
> errors.
> 
> Kernel 2.6.20.3 + linuxpps patches (confined to drivers/serial, and not
> actually in use as I stole the serial port for a console).
> 
> It takes half a day to reproduce the problem, so bisecting would be painful.
> 
> BackupPC_dump mostly writes to a large (1.7 TB) ext3 RAID5 partition.
> 
> 
> Here are two oopes, a few minutes (16:31, to be precise) apart.
> Unusually, it oopsed twice *without* locking up the system..  Usually,
> I see this followed by an error from drivers/input/keyboard/atkbd.c:
>                         printk(KERN_WARNING "atkbd.c: Spurious %s on %s. "
>                                "Some program might be trying access hardware directly.\n",
> emitted at 1 Hz with the keyboard LEDs flashing and the system
> unresponsive to keyboard or pings.
> (I think it was spurious ACK on serio/input0, but my memory may be faulty.)
> 
> 
> If anyone has any suggestions, they'd be gratefully received.
> 
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000098 RIP: 
>  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
> PGD 777e9067 PUD 78774067 PMD 0 
> Oops: 0000 [1] 
> CPU 0 
> Modules linked in: ecb
> Pid: 2837, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
> RIP: 0010:[<ffffffff8031504a>]  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
> RSP: 0018:ffff8100770bbaf8  EFLAGS: 00010092
> RAX: ffff81007fb36c80 RBX: 0000000000000000 RCX: 0000000000000001
> RDX: 000000010003e4e7 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff81007fb37a00 R08: 00000000ffffffff R09: ffff81005d390298
> R10: ffff81007fcb4f80 R11: ffff81007fcb4f80 R12: ffff81007facd280
> R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000000
> FS:  00002b322d120d30(0000) GS:ffffffff805de000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000098 CR3: 000000007bcf0000 CR4: 00000000000006e0
> Process BackupPC_dump (pid: 2837, threadinfo ffff8100770ba000, task ffff81007fc5d8e0)
> Stack:  0000000000000000 ffff8100770f39f0 0000000000000000 0000000000000004
>  0000000000000001 ffffffff80315253 ffffffff803b2607 ffff81005da2bc40
>  ffff81007fac3800 ffff81007facd280 ffff81007facd280 ffff81005d390298
> Call Trace:
>  [<ffffffff80315253>] cfq_dispatch_requests+0x152/0x512
>  [<ffffffff803b2607>] scsi_done+0x0/0x18
>  [<ffffffff8030d9f1>] elv_next_request+0x137/0x147
>  [<ffffffff803b7ce0>] scsi_request_fn+0x6a/0x33a
>  [<ffffffff8024d407>] generic_unplug_device+0xa/0xe
>  [<ffffffff80407ced>] unplug_slaves+0x5b/0x94
>  [<ffffffff80223d65>] sync_page+0x0/0x40
>  [<ffffffff80223d9b>] sync_page+0x36/0x40
>  [<ffffffff80256d45>] __wait_on_bit_lock+0x36/0x65
>  [<ffffffff80237496>] __lock_page+0x5e/0x64
>  [<ffffffff8028061d>] wake_bit_function+0x0/0x23
>  [<ffffffff802074de>] find_get_page+0xe/0x2d
>  [<ffffffff8020b38e>] do_generic_mapping_read+0x1c2/0x40d
>  [<ffffffff8020bd80>] file_read_actor+0x0/0x118
>  [<ffffffff8021422e>] generic_file_aio_read+0x15c/0x19e
>  [<ffffffff8020bafa>] do_sync_read+0xc9/0x10c
>  [<ffffffff80210342>] may_open+0x5b/0x1c6
>  [<ffffffff802805ef>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8020a857>] vfs_read+0xaa/0x152
>  [<ffffffff8020faf3>] sys_read+0x45/0x6e
>  [<ffffffff8025041e>] system_call+0x7e/0x83

3 (I think) separate instances of this, each involving raid5. Is your
array degraded or fully operational?


-- 
Jens Axboe



* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-22 12:38 2.6.20.3 AMD64 oops in CFQ code linux
  2007-03-22 18:41 ` Jens Axboe
@ 2007-03-22 18:43 ` Aristeu Sergio Rozanski Filho
  1 sibling, 0 replies; 16+ messages in thread
From: Aristeu Sergio Rozanski Filho @ 2007-03-22 18:43 UTC (permalink / raw)
  To: linux; +Cc: linux-ide, linux-kernel, axboe

> This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
> over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
> stable for a long time, but has been acting up of late since I upgraded
> to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
> errors.
Tried checking the memory with memtest86?

Do you have the k8_edac module loaded?
If you don't, I'd recommend using it to get reports of
recoverable/unrecoverable memory errors; check http://bluesmoke.sf.net/ for
the latest version.

-- 
Aristeu



* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-22 18:41 ` Jens Axboe
@ 2007-03-22 18:54   ` linux
  2007-03-22 19:00     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: linux @ 2007-03-22 18:54 UTC (permalink / raw)
  To: jens.axboe, linux; +Cc: linux-kernel, linux-raid

> 3 (I think) seperate instances of this, each involving raid5. Is your
> array degraded or fully operational?

Ding! A drive fell out the other day, which is why the problems only
appeared recently.

md5 : active raid5 sdf4[5] sdd4[3] sdc4[2] sdb4[1] sda4[0]
      1719155200 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUU_U]
      bitmap: 149/164 pages [596KB], 1024KB chunk

H'm... this means that my alarm scripts aren't working.  Well, that's
good to know.  The drive is being re-integrated now.


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-22 18:54   ` linux
@ 2007-03-22 19:00     ` Jens Axboe
  2007-03-22 23:59       ` Neil Brown
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2007-03-22 19:00 UTC (permalink / raw)
  To: linux; +Cc: linux-kernel, linux-raid, Neil Brown, linux-kernel, cebbert

On Thu, Mar 22 2007, linux@horizon.com wrote:
> > 3 (I think) seperate instances of this, each involving raid5. Is your
> > array degraded or fully operational?
> 
> Ding! A drive fell out the other day, which is why the problems only
> appeared recently.
> 
> md5 : active raid5 sdf4[5] sdd4[3] sdc4[2] sdb4[1] sda4[0]
>       1719155200 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUU_U]
>       bitmap: 149/164 pages [596KB], 1024KB chunk
> 
> H'm... this means that my alarm scripts aren't working.  Well, that's
> good to know.  The drive is being re-integrated now.

Heh, at least something good came out of this bug then :-)
But that's reaffirming. Neil, are you following this? It smells somewhat
fishy wrt raid5.

I wonder if this triggers anything?

--- linux-2.6.20.3/block/ll_rw_blk.c~	2007-03-22 19:59:17.128833635 +0100
+++ linux-2.6.20.3/block/ll_rw_blk.c	2007-03-22 19:59:28.850045490 +0100
@@ -1602,6 +1602,8 @@
  **/
 void generic_unplug_device(request_queue_t *q)
 {
+	WARN_ON(irqs_disabled());
+
 	spin_lock_irq(q->queue_lock);
 	__generic_unplug_device(q);
 	spin_unlock_irq(q->queue_lock);
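
As a minimal sketch of the hazard that WARN_ON is probing for (illustrative
only; buggy_caller and the extra lock are made up, this is not a real call
path in the kernel):

static void buggy_caller(request_queue_t *q, spinlock_t *lock)
{
	unsigned long flags;

	spin_lock_irqsave(lock, flags);	/* IRQs are now off */

	/*
	 * generic_unplug_device() does spin_lock_irq()/spin_unlock_irq()
	 * on q->queue_lock; the unlock re-enables interrupts
	 * unconditionally...
	 */
	generic_unplug_device(q);

	/*
	 * ...so from here on IRQs are back on while 'lock' is still
	 * held, and an interrupt handler can now race with (or
	 * deadlock against) this section.
	 */
	spin_unlock_irqrestore(lock, flags);
}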

-- 
Jens Axboe



* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-22 19:00     ` Jens Axboe
@ 2007-03-22 23:59       ` Neil Brown
  2007-03-23  0:31         ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2007-03-22 23:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux, linux-kernel, linux-raid, linux-kernel, cebbert

On Thursday March 22, jens.axboe@oracle.com wrote:
> On Thu, Mar 22 2007, linux@horizon.com wrote:
> > > 3 (I think) seperate instances of this, each involving raid5. Is your
> > > array degraded or fully operational?
> > 
> > Ding! A drive fell out the other day, which is why the problems only
> > appeared recently.
> > 
> > md5 : active raid5 sdf4[5] sdd4[3] sdc4[2] sdb4[1] sda4[0]
> >       1719155200 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUU_U]
> >       bitmap: 149/164 pages [596KB], 1024KB chunk
> > 
> > H'm... this means that my alarm scripts aren't working.  Well, that's
> > good to know.  The drive is being re-integrated now.
> 
> Heh, at least something good came out of this bug then :-)
> But that's reaffirming. Neil, are you following this? It smells somewhat
> fishy wrt raid5.

Yes, I've been trying to pay attention....

The evidence does seem to point to raid5 and degraded arrays being
implicated.  However I'm having trouble finding how the fact that an array
is degraded would be visible down in the elevator except for having a
slightly different distribution of reads and writes.

One possible way is that if an array is degraded, then some read
requests will go through the stripe cache rather than direct to the
device.  However I would sooner expect the direct-to-device path to
have problems, as it is much newer code.  Going through the cache for
reads is very well tested code - and reads come from the cache for
most writes anyway, so the elevator will still see lots of single-page
reads.  It only ever sees single-page writes.
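
Very schematically, that split looks something like this (the helper names
are made up purely for illustration; the real 2.6.20 raid5 code is
structured differently):

/* hypothetical helpers, for illustration only */
extern void read_direct_from_member_disk(struct bio *bi);
extern void read_via_stripe_cache(struct bio *bi);

static void raid5_read_sketch(struct bio *bi, int array_degraded)
{
	if (bio_data_dir(bi) == READ && !array_degraded) {
		/*
		 * Fast path: hand the bio straight to the member
		 * disk, so the elevator sees the original
		 * (possibly multi-page) read.
		 */
		read_direct_from_member_disk(bi);
		return;
	}

	/*
	 * Degraded reads (and writes) are serviced through the
	 * stripe cache, which only ever submits single-page
	 * requests to the member disks.
	 */
	read_via_stripe_cache(bi);
}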

There might be more pressure on the stripe cache when running
degraded, so we might call the ->unplug_fn a little more often, but I
doubt that would be noticeable.

As you seem to suggest with the patch, it does look like some sort of
unlocked access to the cfq_queue structure.  However, apart from the
comment before cfq_exit_single_io_context being in the wrong place
(it should be before __cfq_exit_single_io_context), I cannot see anything
obviously wrong with the locking around that structure.

So I'm afraid I'm stumped too. 

NeilBrown

> 
> I wonder if this triggers anything?
> 
> --- linux-2.6.20.3/block/ll_rw_blk.c~	2007-03-22 19:59:17.128833635 +0100
> +++ linux-2.6.20.3/block/ll_rw_blk.c	2007-03-22 19:59:28.850045490 +0100
> @@ -1602,6 +1602,8 @@
>   **/
>  void generic_unplug_device(request_queue_t *q)
>  {
> +	WARN_ON(irqs_disabled());
> +
>  	spin_lock_irq(q->queue_lock);
>  	__generic_unplug_device(q);
>  	spin_unlock_irq(q->queue_lock);
> 
> -- 
> Jens Axboe
> 


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-22 23:59       ` Neil Brown
@ 2007-03-23  0:31         ` Dan Williams
  2007-03-23  0:33           ` Dan Williams
  2007-03-23  0:44           ` Neil Brown
  0 siblings, 2 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  0:31 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jens Axboe, linux, linux-kernel, linux-raid, linux-kernel, cebbert

On 3/22/07, Neil Brown <neilb@suse.de> wrote:
> On Thursday March 22, jens.axboe@oracle.com wrote:
> > On Thu, Mar 22 2007, linux@horizon.com wrote:
> > > > 3 (I think) seperate instances of this, each involving raid5. Is your
> > > > array degraded or fully operational?
> > >
> > > Ding! A drive fell out the other day, which is why the problems only
> > > appeared recently.
> > >
> > > md5 : active raid5 sdf4[5] sdd4[3] sdc4[2] sdb4[1] sda4[0]
> > >       1719155200 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUU_U]
> > >       bitmap: 149/164 pages [596KB], 1024KB chunk
> > >
> > > H'm... this means that my alarm scripts aren't working.  Well, that's
> > > good to know.  The drive is being re-integrated now.
> >
> > Heh, at least something good came out of this bug then :-)
> > But that's reaffirming. Neil, are you following this? It smells somewhat
> > fishy wrt raid5.
>
> Yes, I've been trying to pay attention....
>
> The evidence does seem to point to raid5 and degraded arrays being
> implicated.  However I'm having trouble finding how the fact that an array
> is degraded would be visible down in the elevator except for having a
> slightly different distribution of reads and writes.
>
> One possible way is that if an array is degraded, then some read
> requests will go through the stripe cache rather than direct to the
> device.  However I would more expect the direct-to-device path to have
> problems as it is much newer code.  Going through the cache for reads
> is very well tested code - and reads come from the cache for most
> writes anyway, so the elevator will still see lots of single-page.
> reads.  It only ever sees single-page write.
>
> There might be more pressure on the stripe cache when running
> degraded, so we might call the ->unplug_fn a little more often, but I
> doubt that would be noticeable.
>
> As you seem to suggest by the patch, it does look like some sort of
> unlocked access to the cfq_queue structure.  However apart from the
> comment before cfq_exit_single_io_context being in the wrong place
> (should be before __cfq_exit_single_io_context) I cannot see anything
> obviously wrong with the locking around that structure.
>
> So I'm afraid I'm stumped too.
>
> NeilBrown

Not a cfq failure, but I have been able to reproduce a different oops
at array stop time while i/o's were pending.  I have not dug into it
enough to suggest a patch, but I wonder if it is somehow related to
the cfq failure since it involves congestion and drives going away:

md: md0: recovery done.
Unable to handle kernel NULL pointer dereference at virtual address 000000bc
pgd = 40004000
[000000bc] *pgd=00000000
Internal error: Oops: 17 [#1]
Modules linked in:
CPU: 0
PC is at raid5_congested+0x14/0x5c
LR is at sync_sb_inodes+0x278/0x2ec
pc : [<402801cc>]    lr : [<400a39e8>]    Not tainted
sp : 8a3e3ec4  ip : 8a3e3ed4  fp : 8a3e3ed0
r10: 40474878  r9 : 40474870  r8 : 40439710
r7 : 8a3e3f30  r6 : bfa76b78  r5 : 4161dc08  r4 : 40474800
r3 : 402801b8  r2 : 00000004  r1 : 00000001  r0 : 00000000
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
Control: 400397F
Table: 7B7D4018  DAC: 00000035
Process pdflush (pid: 1371, stack limit = 0x8a3e2250)
Stack: (0x8a3e3ec4 to 0x8a3e4000)
3ec0:          8a3e3f04 8a3e3ed4 400a39e8 402801c4 8a3e3f24 000129f9 40474800
3ee0: 4047483c 40439a44 8a3e3f30 40439710 40438a48 4045ae68 8a3e3f24 8a3e3f08
3f00: 400a3ca0 400a377c 8a3e3f30 00001162 00012bed 40438a48 8a3e3f78 8a3e3f28
3f20: 40069b58 400a3bfc 00011e41 8a3e3f38 00000000 00000000 8a3e3f28 00000400
3f40: 00000000 00000000 00000000 00000000 00000000 00000025 8a3e3f80 8a3e3f8c
3f60: 40439750 8a3e2000 40438a48 8a3e3fc0 8a3e3f7c 4006ab68 40069a8c 00000001
3f80: bfae2ac0 40069a80 00000000 8a3e3f8c 8a3e3f8c 00012805 00000000 8a3e2000
3fa0: 9e7e1f1c 4006aa40 00000001 00000000 fffffffc 8a3e3ff4 8a3e3fc4 4005461c
3fc0: 4006aa4c 00000001 ffffffff ffffffff 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 8a3e3ff8 40042320 40054520 00000000 00000000
Backtrace:
[<402801b8>] (raid5_congested+0x0/0x5c) from [<400a39e8>] (sync_sb_inodes+0x278/0x2ec)
[<400a3770>] (sync_sb_inodes+0x0/0x2ec) from [<400a3ca0>] (writeback_inodes+0xb0/0xb8)
[<400a3bf0>] (writeback_inodes+0x0/0xb8) from [<40069b58>] (wb_kupdate+0xd8/0x160)
 r7 = 40438A48  r6 = 00012BED  r5 = 00001162  r4 = 8A3E3F30
[<40069a80>] (wb_kupdate+0x0/0x160) from [<4006ab68>] (pdflush+0x128/0x204)
 r8 = 40438A48  r7 = 8A3E2000  r6 = 40439750  r5 = 8A3E3F8C
 r4 = 8A3E3F80
[<4006aa40>] (pdflush+0x0/0x204) from [<4005461c>] (kthread+0x108/0x134)
[<40054514>] (kthread+0x0/0x134) from [<40042320>] (do_exit+0x0/0x844)
Code: e92dd800 e24cb004 e5900000 e3a01001 (e59030bc)
md: md0 stopped.
md: unbind<sda>
md: export_rdev(sda)
md: unbind<sdd>
md: export_rdev(sdd)
md: unbind<sdc>
md: export_rdev(sdc)
md: unbind<sdb>
md: export_rdev(sdb)

2.6.20-rc3-iop1 on an iop348 platform.  SATA controller is sata_vsc.

--
Dan


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-23  0:31         ` Dan Williams
@ 2007-03-23  0:33           ` Dan Williams
  2007-03-23  0:44           ` Neil Brown
  1 sibling, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  0:33 UTC (permalink / raw)
  To: Neil Brown
  Cc: Jens Axboe, linux, linux-kernel, linux-raid, linux-kernel, cebbert

On 3/22/07, Dan Williams <dan.j.williams@intel.com> wrote:
> On 3/22/07, Neil Brown <neilb@suse.de> wrote:
> > On Thursday March 22, jens.axboe@oracle.com wrote:
> > > On Thu, Mar 22 2007, linux@horizon.com wrote:
> > > > > 3 (I think) seperate instances of this, each involving raid5. Is your
> > > > > array degraded or fully operational?
> > > >
> > > > Ding! A drive fell out the other day, which is why the problems only
> > > > appeared recently.
> > > >
> > > > md5 : active raid5 sdf4[5] sdd4[3] sdc4[2] sdb4[1] sda4[0]
> > > >       1719155200 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUU_U]
> > > >       bitmap: 149/164 pages [596KB], 1024KB chunk
> > > >
> > > > H'm... this means that my alarm scripts aren't working.  Well, that's
> > > > good to know.  The drive is being re-integrated now.
> > >
> > > Heh, at least something good came out of this bug then :-)
> > > But that's reaffirming. Neil, are you following this? It smells somewhat
> > > fishy wrt raid5.
> >
> > Yes, I've been trying to pay attention....
> >
> > The evidence does seem to point to raid5 and degraded arrays being
> > implicated.  However I'm having trouble finding how the fact that an array
> > is degraded would be visible down in the elevator except for having a
> > slightly different distribution of reads and writes.
> >
> > One possible way is that if an array is degraded, then some read
> > requests will go through the stripe cache rather than direct to the
> > device.  However I would more expect the direct-to-device path to have
> > problems as it is much newer code.  Going through the cache for reads
> > is very well tested code - and reads come from the cache for most
> > writes anyway, so the elevator will still see lots of single-page.
> > reads.  It only ever sees single-page write.
> >
> > There might be more pressure on the stripe cache when running
> > degraded, so we might call the ->unplug_fn a little more often, but I
> > doubt that would be noticeable.
> >
> > As you seem to suggest by the patch, it does look like some sort of
> > unlocked access to the cfq_queue structure.  However apart from the
> > comment before cfq_exit_single_io_context being in the wrong place
> > (should be before __cfq_exit_single_io_context) I cannot see anything
> > obviously wrong with the locking around that structure.
> >
> > So I'm afraid I'm stumped too.
> >
> > NeilBrown
>
> Not a cfq failure, but I have been able to reproduce a different oops
> at array stop time while i/o's were pending.  I have not dug into it
> enough to suggest a patch, but I wonder if it is somehow related to
> the cfq failure since it involves congestion and drives going away:
>
> md: md0: recovery done.
> Unable to handle kernel NULL pointer dereference at virtual address 000000bc
> pgd = 40004000
> [000000bc] *pgd=00000000
> Internal error: Oops: 17 [#1]
> Modules linked in:
> CPU: 0
> PC is at raid5_congested+0x14/0x5c
> LR is at sync_sb_inodes+0x278/0x2ec
> pc : [<402801cc>]    lr : [<400a39e8>]    Not tainted
> sp : 8a3e3ec4  ip : 8a3e3ed4  fp : 8a3e3ed0
> r10: 40474878  r9 : 40474870  r8 : 40439710
> r7 : 8a3e3f30  r6 : bfa76b78  r5 : 4161dc08  r4 : 40474800
> r3 : 402801b8  r2 : 00000004  r1 : 00000001  r0 : 00000000
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  Segment kernel
> Control: 400397F
> Table: 7B7D4018  DAC: 00000035
> Process pdflush (pid: 1371, stack limit = 0x8a3e2250)
> Stack: (0x8a3e3ec4 to 0x8a3e4000)
> 3ec0:          8a3e3f04 8a3e3ed4 400a39e8 402801c4 8a3e3f24 000129f9 40474800
> 3ee0: 4047483c 40439a44 8a3e3f30 40439710 40438a48 4045ae68 8a3e3f24 8a3e3f08
> 3f00: 400a3ca0 400a377c 8a3e3f30 00001162 00012bed 40438a48 8a3e3f78 8a3e3f28
> 3f20: 40069b58 400a3bfc 00011e41 8a3e3f38 00000000 00000000 8a3e3f28 00000400
> 3f40: 00000000 00000000 00000000 00000000 00000000 00000025 8a3e3f80 8a3e3f8c
> 3f60: 40439750 8a3e2000 40438a48 8a3e3fc0 8a3e3f7c 4006ab68 40069a8c 00000001
> 3f80: bfae2ac0 40069a80 00000000 8a3e3f8c 8a3e3f8c 00012805 00000000 8a3e2000
> 3fa0: 9e7e1f1c 4006aa40 00000001 00000000 fffffffc 8a3e3ff4 8a3e3fc4 4005461c
> 3fc0: 4006aa4c 00000001 ffffffff ffffffff 00000000 00000000 00000000 00000000
> 3fe0: 00000000 00000000 00000000 8a3e3ff8 40042320 40054520 00000000 00000000
> Backtrace:
> [<402801b8>] (raid5_congested+0x0/0x5c) from [<400a39e8>]
> (sync_sb_inodes+0x278/0x2ec)
> [<400a3770>] (sync_sb_inodes+0x0/0x2ec) from [<400a3ca0>]
> (writeback_inodes+0xb0/0xb8)
> [<400a3bf0>] (writeback_inodes+0x0/0xb8) from [<40069b58>]
> (wb_kupdate+0xd8/0x160)
>  r7 = 40438A48  r6 = 00012BED  r5 = 00001162  r4 = 8A3E3F30
> [<40069a80>] (wb_kupdate+0x0/0x160) from [<4006ab68>] (pdflush+0x128/0x204)
>  r8 = 40438A48  r7 = 8A3E2000  r6 = 40439750  r5 = 8A3E3F8C
>  r4 = 8A3E3F80
> [<4006aa40>] (pdflush+0x0/0x204) from [<4005461c>] (kthread+0x108/0x134)
> [<40054514>] (kthread+0x0/0x134) from [<40042320>] (do_exit+0x0/0x844)
> Code: e92dd800 e24cb004 e5900000 e3a01001 (e59030bc)
> md: md0 stopped.
> md: unbind<sda>
> md: export_rdev(sda)
> md: unbind<sdd>
> md: export_rdev(sdd)
> md: unbind<sdc>
> md: export_rdev(sdc)
> md: unbind<sdb>
> md: export_rdev(sdb)
>
> 2.6.20-rc3-iop1 on an iop348 platform.  SATA controller is sata_vsc.
Sorry, that's 2.6.21-rc3-iop1.


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-23  0:31         ` Dan Williams
  2007-03-23  0:33           ` Dan Williams
@ 2007-03-23  0:44           ` Neil Brown
  2007-03-23 17:46             ` linux
  1 sibling, 1 reply; 16+ messages in thread
From: Neil Brown @ 2007-03-23  0:44 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jens Axboe, linux, linux-kernel, linux-raid, linux-kernel, cebbert

On Thursday March 22, dan.j.williams@intel.com wrote:
> 
> Not a cfq failure, but I have been able to reproduce a different oops
> at array stop time while i/o's were pending.  I have not dug into it
> enough to suggest a patch, but I wonder if it is somehow related to
> the cfq failure since it involves congestion and drives going away:

Thanks.   I know about that one and have a patch about to be posted
which should fix it.  But I don't completely understand it.

When a raid5 array shuts down, it clears mddev->private, but doesn't
clear q->backing_dev_info.congested_fn.  So if someone tries to call
that congested_fn, it will try to dereference mddev->private and Oops.

Only, by the time raid5 is shutting down, no-one should have a
reference to the device any more, and no-one should be in a position
to call congested_fn!

Maybe pdflush is just trying to sync the block device, even though
there is no dirty data ... dunno....

But I don't think it is related to the cfq problem as this one is only
a problem when the array is being stopped.
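
A rough sketch of the shape of that stop-time problem, with member names
recalled from the 2.6.20-era md code (treat them as approximate; this is
not the actual patch):

/* raid5 registers this as the queue's congestion callback, with
 * congested_data pointing at the mddev: */
static int raid5_congested_sketch(void *data, int bits)
{
	mddev_t *mddev = data;
	raid5_conf_t *conf = mddev->private;	/* NULL once stopped */

	/* a late pdflush call lands here and dereferences NULL */
	return conf->inactive_blocked || conf->quiesce;
}

/* ...so one shape of a fix is to unhook the callback no later than
 * the point where ->private is cleared: */
static void raid5_stop_sketch(mddev_t *mddev)
{
	mddev->queue->backing_dev_info.congested_fn = NULL;
	mddev->queue->backing_dev_info.congested_data = NULL;
	mddev->private = NULL;
}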

Thanks,
NeilBrown


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-23  0:44           ` Neil Brown
@ 2007-03-23 17:46             ` linux
  2007-04-03  5:49               ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: linux @ 2007-03-23 17:46 UTC (permalink / raw)
  To: dan.j.williams, linux-ide, neilb
  Cc: cebbert, jens.axboe, linux-kernel, linux-kernel, linux-raid, linux

As an additional data point, here's a libata problem I'm having trying to
rebuild the array.

I have six identical 400 GB drives (ST3400832AS), and one is giving
me hassles.  I've run SMART short and long diagnostics, badblocks, and
Seagate's "seatools" diagnostic software, and none of these find problems.
It is the only one of the six with a non-zero reallocated sector count
(it's 26).

Anyway, the drive is partitioned into a 45G RAID-10 part and a 350G RAID-5
part.  The RAID-10 part integrated successfully, but the RAID-5 got to
about 60% and then puked:

ata5.00: exception Emask 0x0 SAct 0x1ef SErr 0x0 action 0x2 frozen
ata5.00: cmd 61/c0:00:d2:d0:b9/00:00:1c:00:00/40 tag 0 cdb 0x0 data 98304 out
         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 61/40:08:92:d1:b9/00:00:1c:00:00/40 tag 1 cdb 0x0 data 32768 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 61/00:10:d2:d1:b9/01:00:1c:00:00/40 tag 2 cdb 0x0 data 131072 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 61/00:18:d2:d2:b9/01:00:1c:00:00/40 tag 3 cdb 0x0 data 131072 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 61/00:28:d2:d3:b9/01:00:1c:00:00/40 tag 5 cdb 0x0 data 131072 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 61/00:30:d2:d4:b9/01:00:1c:00:00/40 tag 6 cdb 0x0 data 131072 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 61/00:38:d2:d5:b9/01:00:1c:00:00/40 tag 7 cdb 0x0 data 131072 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 61/00:40:d2:d6:b9/01:00:1c:00:00/40 tag 8 cdb 0x0 data 131072 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5: soft resetting port
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: configured for UDMA/100
ata5: EH complete
SCSI device sde: 781422768 512-byte hdwr sectors (400088 MB)
sde: Write Protect is off
SCSI device sde: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 
         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5: soft resetting port
ata5: softreset failed (timeout)
ata5: softreset failed, retrying in 5 secs
ata5: hard resetting port
ata5: softreset failed (timeout)
ata5: follow-up softreset failed, retrying in 5 secs
ata5: hard resetting port
ata5: softreset failed (timeout)
ata5: reset failed, giving up
ata5.00: disabled
ata5: EH complete
sd 4:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sde, sector 91795259
md: super_written gets error=-5, uptodate=0
raid10: Disk failure on sde3, disabling device. 
        Operation continuing on 5 devices
sd 4:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sde, sector 481942994
raid5: Disk failure on sde4, disabling device. Operation continuing on 5 devices
sd 4:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sde, sector 481944018
md: md5: recovery done.
RAID10 conf printout:
 --- wd:5 rd:6
 disk 0, wo:0, o:1, dev:sdb3
 disk 1, wo:0, o:1, dev:sdc3
 disk 2, wo:0, o:1, dev:sdd3
 disk 3, wo:1, o:0, dev:sde3
 disk 4, wo:0, o:1, dev:sdf3
 disk 5, wo:0, o:1, dev:sda3
RAID10 conf printout:
 --- wd:5 rd:6
 disk 0, wo:0, o:1, dev:sdb3
 disk 1, wo:0, o:1, dev:sdc3
 disk 2, wo:0, o:1, dev:sdd3
 disk 4, wo:0, o:1, dev:sdf3
 disk 5, wo:0, o:1, dev:sda3
RAID5 conf printout:
 --- rd:6 wd:5
 disk 0, o:1, dev:sda4
 disk 1, o:1, dev:sdb4
 disk 2, o:1, dev:sdc4
 disk 3, o:1, dev:sdd4
 disk 4, o:0, dev:sde4
 disk 5, o:1, dev:sdf4
RAID5 conf printout:
 --- rd:6 wd:5
 disk 0, o:1, dev:sda4
 disk 1, o:1, dev:sdb4
 disk 2, o:1, dev:sdc4
 disk 3, o:1, dev:sdd4
 disk 5, o:1, dev:sdf4

The first error address is just barely inside the RAID-10 part (which ends at
sector 91,795,410), while the second and third errors (at 481,942,994)
look like where the reconstruction was working.


Anyway, what's annoying is that I can't figure out how to bring the
drive back on line without resetting the box.  It's in a hot-swap enclosure,
but power cycling the drive doesn't seem to help.  I thought libata hotplug
was working?  (SiI3132 card, using the sil24 driver.)

(H'm... after rebooting, reallocated sectors jumped from 26 to 39.
Something is up with that drive.)


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-03-23 17:46             ` linux
@ 2007-04-03  5:49               ` Tejun Heo
  2007-04-03 13:03                 ` linux
  2007-04-04 23:22                 ` Bill Davidsen
  0 siblings, 2 replies; 16+ messages in thread
From: Tejun Heo @ 2007-04-03  5:49 UTC (permalink / raw)
  To: linux
  Cc: dan.j.williams, linux-ide, neilb, cebbert, jens.axboe,
	linux-kernel, linux-kernel, linux-raid

[resending.  my mail service was down for more than a week and this
message didn't get delivered.]

linux@horizon.com wrote:
> > Anyway, what's annoying is that I can't figure out how to bring the
> > drive back on line without resetting the box.  It's in a hot-swap enclosure,
> > but power cycling the drive doesn't seem to help.  I thought libata hotplug
> > was working?  (SiI3132 card, using the sil24 driver.)

Yeah, it's working, but failing resets are considered highly dangerous
(in that the controller status is unknown and may cause something
dangerous like screaming interrupts) and the port is muted after that.
The plan is to handle this with polling hotplug, such that libata tries
to revive the port if a PHY status change is detected by polling.
Patches are available but they need other things to be resolved to get
integrated.  I think it'll happen before the summer.

Anyways, you can tell libata to retry the port by manually triggering a
rescan (echo - - - > /sys/class/scsi_host/hostX/scan).

> > (H'm... after rebooting, reallocated sectors jumped from 26 to 39.
> > Something is up with that drive.)

Yeap, seems like a broken drive to me.

Thanks.

-- 
tejun


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-04-03  5:49               ` Tejun Heo
@ 2007-04-03 13:03                 ` linux
  2007-04-03 13:11                   ` Tejun Heo
  2007-04-04 23:22                 ` Bill Davidsen
  1 sibling, 1 reply; 16+ messages in thread
From: linux @ 2007-04-03 13:03 UTC (permalink / raw)
  To: linux, teheo
  Cc: cebbert, dan.j.williams, jens.axboe, linux-ide, linux-kernel,
	linux-kernel, linux-raid, neilb

linux@horizon.com wrote:
>> Anyway, what's annoying is that I can't figure out how to bring the
>> drive back on line without resetting the box.  It's in a hot-swap enclosure,
>> but power cycling the drive doesn't seem to help.  I thought libata hotplug
>> was working?  (SiI3132 card, using the sil24 driver.)

> Yeah, it's working but failing resets are considered highly dangerous
> (in that the controller status is unknown and may cause something
> dangerous like screaming interrupts) and port is muted after that.  The
> plan is to handle this with polling hotplug such that libata tries to
> revive the port if PHY status change is detected by polling.  Patches
> are available but they need other things to resolved to get integrated.
> I think it'll happen before the summer.

> Anyways, you can tell libata to retry the port by manually telling it to
> rescan the port (echo - - - > /sys/class/scsi_host/hostX/scan).

Ah, thank you!  I have to admit, that is at least as mysterious as any
Microsoft registry tweak.

>> (H'm... after rebooting, reallocated sectors jumped from 26 to 39.
>> Something is up with that drive.)

> Yeap, seems like a broken drive to me.

Actually, after a few rounds, the reallocated sectors stabilized at 56
and all is working well again.  It's like there was a major problem with
error handling.

The problem is that I don't know where the blame lies.

In case it helps, here is the syslog associated with the first failure:

07:17:56: ata5.00: exception Emask 0x0 SAct 0x7fffff SErr 0x0 action 0x2 frozen
07:20:15: ata5.00: cmd 61/30:00:a2:d8:b9/00:00:1c:00:00/40 tag 0 cdb 0x0 data 24576 out
07:20:15:          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:15: ata5.00: cmd 60/08:08:9a:7c:b9/00:00:1c:00:00/40 tag 1 cdb 0x0 data 4096 in
07:20:15:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:15: ata5.00: cmd 60/08:10:ca:7c:b9/00:00:1c:00:00/40 tag 2 cdb 0x0 data 4096 in
07:20:15:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:15: ata5.00: cmd 60/20:18:b2:d6:b9/00:00:1c:00:00/40 tag 3 cdb 0x0 data 16384 in
07:20:15:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:15: ata5.00: cmd 61/08:20:fa:a2:b6/00:00:1c:00:00/40 tag 4 cdb 0x0 data 4096 out
07:20:15:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:15: ata5.00: cmd 60/40:28:1a:d7:b9/00:00:1c:00:00/40 tag 5 cdb 0x0 data 32768 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/00:30:d2:11:c0/01:00:24:00:00/40 tag 6 cdb 0x0 data 131072 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/28:38:2a:0b:b9/00:00:1c:00:00/40 tag 7 cdb 0x0 data 20480 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/08:40:82:7c:b9/00:00:1c:00:00/40 tag 8 cdb 0x0 data 4096 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 61/58:48:02:a3:b6/00:00:1c:00:00/40 tag 9 cdb 0x0 data 45056 out
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/10:50:52:49:ba/00:00:1c:00:00/40 tag 10 cdb 0x0 data 8192 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/08:58:6a:49:ba/00:00:1c:00:00/40 tag 11 cdb 0x0 data 4096 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/08:60:7a:49:ba/00:00:1c:00:00/40 tag 12 cdb 0x0 data 4096 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/10:68:8a:49:ba/00:00:1c:00:00/40 tag 13 cdb 0x0 data 8192 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/08:70:a2:49:ba/00:00:1c:00:00/40 tag 14 cdb 0x0 data 4096 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/08:78:ba:49:ba/00:00:1c:00:00/40 tag 15 cdb 0x0 data 4096 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/18:80:0a:b0:b8/00:00:1c:00:00/40 tag 16 cdb 0x0 data 12288 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/10:88:2a:b0:b8/00:00:1c:00:00/40 tag 17 cdb 0x0 data 8192 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/08:90:52:0a:b9/00:00:1c:00:00/40 tag 18 cdb 0x0 data 4096 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 61/40:98:d2:d6:b9/00:00:1c:00:00/40 tag 19 cdb 0x0 data 32768 out
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 61/40:a0:92:d7:b9/00:00:1c:00:00/40 tag 20 cdb 0x0 data 32768 out
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 61/50:a8:52:d8:b9/00:00:1c:00:00/40 tag 21 cdb 0x0 data 40960 out
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 61/78:b0:da:d7:b9/00:00:1c:00:00/40 tag 22 cdb 0x0 data 61440 out
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5: soft resetting port
07:20:16: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
07:20:16: ata5.00: configured for UDMA/100
07:20:16: ata5: EH complete
07:20:16: SCSI device sde: 781422768 512-byte hdwr sectors (400088 MB)
07:20:16: sde: Write Protect is off
07:20:16: sde: Mode Sense: 00 3a 00 00
07:20:16: SCSI device sde: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
07:20:16: ata5.00: exception Emask 0x0 SAct 0xb0000 SErr 0x0 action 0x2 frozen
07:20:16: ata5.00: cmd 60/00:80:d2:11:c0/01:00:24:00:00/40 tag 16 cdb 0x0 data 131072 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/40:88:1a:d7:b9/00:00:1c:00:00/40 tag 17 cdb 0x0 data 32768 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5.00: cmd 60/20:98:b2:d6:b9/00:00:1c:00:00/40 tag 19 cdb 0x0 data 16384 in
07:20:16:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
07:20:16: ata5: soft resetting port
07:20:16: ata5: softreset failed (timeout)
07:20:16: ata5: softreset failed, retrying in 5 secs
07:20:16: ata5: hard resetting port
07:20:16: ata5: softreset failed (timeout)
07:20:16: ata5: follow-up softreset failed, retrying in 5 secs
07:20:16: ata5: hard resetting port
07:20:16: ata5: softreset failed (timeout)
07:20:16: ata5: reset failed, giving up
07:20:16: ata5.00: disabled
07:20:16: ata5: EH complete
07:20:16: sd 4:0:0:0: SCSI error: return code = 0x00040000
07:20:16: end_request: I/O error, dev sde, sector 481941170
07:20:16: sd 4:0:0:0: SCSI error: return code = 0x00040000
07:20:16: end_request: I/O error, dev sde, sector 481941274
07:20:16: sd 4:0:0:0: SCSI error: return code = 0x00040000
07:20:16: end_request: I/O error, dev sde, sector 616567250
07:20:16: sd 4:0:0:0: SCSI error: return code = 0x00040000
07:20:16: end_request: I/O error, dev sde, sector 779457538
07:20:16: md: super_written gets error=-5, uptodate=0
07:20:16: raid5: Disk failure on sde4, disabling device. Operation continuing on 5 devices
07:20:16: sd 4:0:0:0: SCSI error: return code = 0x00040000
07:20:16: end_request: I/O error, dev sde, sector 91795259
07:20:16: md: super_written gets error=-5, uptodate=0
07:20:16: raid10: Disk failure on sde3, disabling device.
07:20:16: ^IOperation continuing on 5 devices
07:20:16: RAID5 conf printout:
07:20:16:  --- rd:6 wd:5
07:20:16:  disk 0, o:1, dev:sda4
07:20:16:  disk 1, o:1, dev:sdb4
07:20:16:  disk 2, o:1, dev:sdc4
07:20:16:  disk 3, o:1, dev:sdd4
07:20:16:  disk 4, o:0, dev:sde4
07:20:16:  disk 5, o:1, dev:sdf4
07:20:16: RAID5 conf printout:
07:20:16:  --- rd:6 wd:5
07:20:16:  disk 0, o:1, dev:sda4
07:20:16:  disk 1, o:1, dev:sdb4
07:20:16:  disk 2, o:1, dev:sdc4
07:20:16:  disk 3, o:1, dev:sdd4
07:20:16:  disk 5, o:1, dev:sdf4
07:20:16: RAID10 conf printout:
07:20:16:  --- wd:5 rd:6
07:20:16:  disk 0, wo:0, o:1, dev:sdb3
07:20:16:  disk 1, wo:0, o:1, dev:sdc3
07:20:16:  disk 2, wo:0, o:1, dev:sdd3
07:20:16:  disk 3, wo:1, o:0, dev:sde3
07:20:16:  disk 4, wo:0, o:1, dev:sdf3
07:20:16:  disk 5, wo:0, o:1, dev:sda3
07:20:16: RAID10 conf printout:
07:20:16:  --- wd:5 rd:6
07:20:16:  disk 0, wo:0, o:1, dev:sdb3
07:20:16:  disk 1, wo:0, o:1, dev:sdc3
07:20:16:  disk 2, wo:0, o:1, dev:sdd3
07:20:16:  disk 4, wo:0, o:1, dev:sdf3
07:20:16:  disk 5, wo:0, o:1, dev:sda3

And another one while recovering.  I notice that the reset timeouts seem
to take much longer this time.  Or could the above have had similar
timeouts, but syslog was blocked by the error and didn't receive and
timestamp the messages until recovery was complete?

14:49:55: RAID5 conf printout:
14:49:55:  --- rd:6 wd:5
14:49:55:  disk 0, o:1, dev:sda4
14:49:55:  disk 1, o:1, dev:sdb4
14:49:55:  disk 2, o:1, dev:sdc4
14:49:55:  disk 3, o:1, dev:sdd4
14:49:55:  disk 4, o:1, dev:sde4
14:49:55:  disk 5, o:1, dev:sdf4
14:49:55: md: recovery of RAID array md5
14:49:55: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
14:49:55: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
14:49:55: md: using 128k window, over a total of 343831040 blocks.
14:56:13: ata5.00: exception Emask 0x0 SAct 0xff SErr 0x0 action 0x2 frozen
14:56:13: ata5.00: cmd 61/f8:00:da:d6:b9/00:00:1c:00:00/40 tag 0 cdb 0x0 data 126976 out
14:56:13:          res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5.00: cmd 61/90:08:d2:2f:ba/00:00:1c:00:00/40 tag 1 cdb 0x0 data 73728 out
14:56:13:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5.00: cmd 61/70:10:62:30:ba/01:00:1c:00:00/40 tag 2 cdb 0x0 data 188416 out
14:56:13:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5.00: cmd 61/00:18:d2:31:ba/01:00:1c:00:00/40 tag 3 cdb 0x0 data 131072 out
14:56:13:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5.00: cmd 61/00:20:d2:32:ba/01:00:1c:00:00/40 tag 4 cdb 0x0 data 131072 out
14:56:13:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5.00: cmd 61/00:28:d2:33:ba/01:00:1c:00:00/40 tag 5 cdb 0x0 data 131072 out
14:56:13:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5.00: cmd 61/00:30:d2:34:ba/01:00:1c:00:00/40 tag 6 cdb 0x0 data 131072 out
14:56:13:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5.00: cmd 61/00:38:d2:35:ba/01:00:1c:00:00/40 tag 7 cdb 0x0 data 131072 out
14:56:13:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
14:56:13: ata5: soft resetting port
14:56:43: ata5: softreset failed (timeout)
14:56:43: ata5: softreset failed, retrying in 5 secs
14:56:48: ata5: hard resetting port
14:57:20: ata5: softreset failed (timeout)
14:57:20: ata5: follow-up softreset failed, retrying in 5 secs
14:57:25: ata5: hard resetting port
14:57:58: ata5: softreset failed (timeout)
14:57:58: ata5: reset failed, giving up
14:57:58: ata5.00: disabled
14:57:58: ata5: EH complete
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481965522
14:57:58: raid5: Disk failure on sde4, disabling device. Operation continuing on 5 devices
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481965266
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481965010
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481964754
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481964498
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481964130
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481963986
14:57:58: sd 4:0:0:0: SCSI error: return code = 0x00040000
14:57:58: end_request: I/O error, dev sde, sector 481941210
14:57:58: md: md5: recovery done.
14:57:58: RAID5 conf printout:
14:57:58:  --- rd:6 wd:5
14:57:58:  disk 0, o:1, dev:sda4
14:57:58:  disk 1, o:1, dev:sdb4
14:57:58:  disk 2, o:1, dev:sdc4
14:57:58:  disk 3, o:1, dev:sdd4
14:57:58:  disk 4, o:0, dev:sde4
14:57:58:  disk 5, o:1, dev:sdf4
14:57:58: RAID5 conf printout:
14:57:58:  --- rd:6 wd:5
14:57:58:  disk 0, o:1, dev:sda4
14:57:58:  disk 1, o:1, dev:sdb4
14:57:58:  disk 2, o:1, dev:sdc4
14:57:58:  disk 3, o:1, dev:sdd4
14:57:58:  disk 5, o:1, dev:sdf4


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-04-03 13:03                 ` linux
@ 2007-04-03 13:11                   ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2007-04-03 13:11 UTC (permalink / raw)
  To: linux
  Cc: cebbert, dan.j.williams, jens.axboe, linux-ide, linux-kernel,
	linux-kernel, linux-raid, neilb

linux@horizon.com wrote:
> linux@horizon.com wrote:
>>> Anyway, what's annoying is that I can't figure out how to bring the
>>> drive back on line without resetting the box.  It's in a hot-swap enclosure,
>>> but power cycling the drive doesn't seem to help.  I thought libata hotplug
>>> was working?  (SiI3132 card, using the sil24 driver.)
> 
>> Yeah, it's working but failing resets are considered highly dangerous
>> (in that the controller status is unknown and may cause something
>> dangerous like screaming interrupts) and port is muted after that.  The
>> plan is to handle this with polling hotplug such that libata tries to
>> revive the port if PHY status change is detected by polling.  Patches
>> are available but they need other things to resolved to get integrated.
>> I think it'll happen before the summer.
> 
>> Anyways, you can tell libata to retry the port by manually telling it to
>> rescan the port (echo - - - > /sys/class/scsi_host/hostX/scan).
> 
> Ah, thank you!  I have to admit, that is at least as mysterious as any
> Microsoft registry tweak.

Polling hotplug should fix this.  I thought I would be able to merge it
much earlier.  I apparently was way too optimistic.  :-(

>>> (H'm... after rebooting, reallocated sectors jumped from 26 to 39.
>>> Something is up with that drive.)
> 
>> Yeap, seems like a broken drive to me.
> 
> Actually, after a few rounds, the reallocated sectors stabilized at 56
> and all is working well again.  It's like there was a major problem with
> error handling.
> 
> The problem is that I don't know where the blame lies.

I'm pretty sure it's the firmware's fault.  It's not supposed to go out
for lunch like that even when an internal error occurs.

-- 
tejun


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-04-03  5:49               ` Tejun Heo
  2007-04-03 13:03                 ` linux
@ 2007-04-04 23:22                 ` Bill Davidsen
  2007-04-05  4:13                   ` Lee Revell
  1 sibling, 1 reply; 16+ messages in thread
From: Bill Davidsen @ 2007-04-04 23:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux, dan.j.williams, linux-ide, neilb, cebbert, jens.axboe,
	linux-kernel, linux-kernel, linux-raid

Tejun Heo wrote:
> [resending.  my mail service was down for more than a week and this
> message didn't get delivered.]
>
> linux@horizon.com wrote:
>
>>> Anyway, what's annoying is that I can't figure out how to bring the
>>> drive back on line without resetting the box.  It's in a hot-swap enclosure,
>>> but power cycling the drive doesn't seem to help.  I thought libata hotplug
>>> was working?  (SiI3132 card, using the sil24 driver.)
>
> Yeah, it's working but failing resets are considered highly dangerous
> (in that the controller status is unknown and may cause something
> dangerous like screaming interrupts) and port is muted after that.  The
> plan is to handle this with polling hotplug such that libata tries to
> revive the port if PHY status change is detected by polling.  Patches
> are available but they need other things to resolved to get integrated.
>  I think it'll happen before the summer.
>
> Anyways, you can tell libata to retry the port by manually telling it to
> rescan the port (echo - - - > /sys/class/scsi_host/hostX/scan).
>   
I won't say that's voodoo, but if I ever did it I'd wipe down my 
keyboard with holy water afterward. ;-)

Well, I did save the message in my tricks file, but it sounds like a 
last-ditch effort after something has gone very wrong.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-04-04 23:22                 ` Bill Davidsen
@ 2007-04-05  4:13                   ` Lee Revell
  2007-04-05  4:29                     ` Tejun Heo
  0 siblings, 1 reply; 16+ messages in thread
From: Lee Revell @ 2007-04-05  4:13 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Tejun Heo, linux, dan.j.williams, linux-ide, neilb, cebbert,
	jens.axboe, linux-kernel, linux-kernel, linux-raid

On 4/4/07, Bill Davidsen <davidsen@tmr.com> wrote:
> I won't say that's voodoo, but if I ever did it I'd wipe down my
> keyboard with holy water afterward. ;-)
>
> Well, I did save the message in my tricks file, but it sounds like a
> last ditch effort after something get very wrong.

Would it really be an impediment to development if the kernel
maintainers simply refused to merge patches that add new sysfs entries
without corresponding documentation?

Lee


* Re: 2.6.20.3 AMD64 oops in CFQ code
  2007-04-05  4:13                   ` Lee Revell
@ 2007-04-05  4:29                     ` Tejun Heo
  0 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2007-04-05  4:29 UTC (permalink / raw)
  To: Lee Revell
  Cc: Bill Davidsen, linux, dan.j.williams, linux-ide, neilb, cebbert,
	jens.axboe, linux-kernel, linux-kernel, linux-raid

Lee Revell wrote:
> On 4/4/07, Bill Davidsen <davidsen@tmr.com> wrote:
>> I won't say that's voodoo, but if I ever did it I'd wipe down my
>> keyboard with holy water afterward. ;-)
>>
>> Well, I did save the message in my tricks file, but it sounds like a
>> last ditch effort after something get very wrong.

Which actually is true.  ATA ports failing to reset indicate something
is very wrong.  Either the attached device or the controller is broken
and libata shuts down the port to protect the rest of the system from
it.  A manual scan request tells libata to give it one more shot, and
polling hotplug can do that automatically.  Anyways, this shouldn't
happen unless you have a broken piece of hardware.

> Would it reallty be an impediment to development if the kernel
> maintainers simply refuse to merge patches that add new sysfs entries
> without corresponding documentation?

SCSI host scan nodes have been there for a long time.  I think it's
documented somewhere.

-- 
tejun

