* Weird RAID/SATA problem [ once was Re: 2.6.17-mm3 ]
From: Neil Brown @ 2006-07-01 10:51 UTC
  To: Reuben Farrelly; +Cc: linux-kernel, Andrew Morton

On Saturday July 1, reuben-lkml@reub.net wrote:
> >>
> >> md: super_written gets error=-5, uptodate=0
> >>
> >> messages on the console that didn't seem to want to stop...
> > 
> > '5' == EIO 
> > 
> > We try to write the superblock and we get EIO - something wrong somewhere.
> > 
> > What sort of device are we writing to here?  What controller, what
> > driver (if you know), what drives?
> 
> The two raid-1 disks are the Seagate ST380817AS SATA disks on the onboard
> controller.  The motherboard is an Intel D945GNT motherboard.  See dmesg..
> 
> > Can you write to the device without using md?
> 
> Yes.
> 

So... When md writes a superblock to this device, it reliably (or
close to reliably) gets EIO.  When mkfs writes, it works fine.

Only difference I can think of is still barriers... Does this patch
make any difference?

NeilBrown

(For readers on linux-kernel who are wondering where the history of
this thread is - you won't find it.  We were having a discussion in
private and finally got embarrassed about keeping such gems to
ourselves so we decided to share)


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-07-01 12:12:14.000000000 +1000
+++ ./drivers/md/md.c	2006-07-01 20:48:44.000000000 +1000
@@ -454,7 +454,7 @@ void md_super_write(mddev_t *mddev, mdk_
 	bio->bi_rw = rw;
 
 	atomic_inc(&mddev->pending_writes);
-	if (!test_bit(BarriersNotsupp, &rdev->flags)) {
+	if (0 && !test_bit(BarriersNotsupp, &rdev->flags)) {
 		struct bio *rbio;
 		rw |= (1<<BIO_RW_BARRIER);
 		rbio = bio_clone(bio, GFP_NOIO);


* Re: Weird RAID/SATA problem [ once was Re: 2.6.17-mm3 ]
From: Reuben Farrelly @ 2006-07-01 11:38 UTC
  To: Neil Brown; +Cc: linux-kernel, Andrew Morton



On 1/07/2006 10:51 p.m., Neil Brown wrote:
> On Saturday July 1, reuben-lkml@reub.net wrote:
>>>> md: super_written gets error=-5, uptodate=0
>>>>
>>>> messages on the console that didn't seem to want to stop...
>>> '5' == EIO 
>>>
>>> We try to write the superblock and we get EIO - something wrong somewhere.
>>>
>>> What sort of device are we writing to here?  What controller, what
>>> driver (if you know), what drives?
>> The two raid-1 disks are the Seagate ST380817AS SATA disks on the onboard
>> controller.  The motherboard is an Intel D945GNT motherboard.  See dmesg..
>>
>>> Can you write to the device without using md?
>> Yes.
>>
> 
> So... When md writes a superblock to this device, it reliably (or
> close to reliably) gets EIO.  When mkfs writes, it works fine.
> 
> Only difference I can think of is still barriers... Does this patch
> make any difference?

You will be happy to know that yes, it does make a difference.

Applied to -mm4, RAID-1 now comes up with all arrays in sync and everything 
looking good.  Tried it twice, and both times raid-1 came up perfectly with

md0 : active raid1 sdc2[1] sda2[0]
       24410688 blocks [2/2] [UU]
       bitmap: 0/187 pages [0KB], 64KB chunk

for each md.

reuben


* Re: Weird RAID/SATA problem [ once was Re: 2.6.17-mm3 ]
From: Neil Brown @ 2006-07-01 12:05 UTC
  To: Reuben Farrelly; +Cc: linux-kernel, Andrew Morton

On Saturday July 1, reuben-lkml@reub.net wrote:
> > 
> > Only difference I can think of is still barriers... Does this patch
> > make any difference?
> 
> You will be happy to know that yes, it does make a difference.
> 
> Applied to -mm4, RAID-1 now comes up with all arrays in sync and everything 
> looking good.  Tried it twice, and both times raid-1 came up perfectly with
> 
> md0 : active raid1 sdc2[1] sda2[0]
>        24410688 blocks [2/2] [UU]
>        bitmap: 0/187 pages [0KB], 64KB chunk
> 
> for each md.
> 

Cool... so who is giving us that EIO?  I'm guessing
end_that_request_first or end_that_request_last, but how is it
getting there?

What if you remove that last patch (so it still tries barrier writes)
but add this patch (so WARN_ON gives us a trace when it happens)?

Thanks,
NeilBrown


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    1 +
 1 file changed, 1 insertion(+)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-07-01 12:12:14.000000000 +1000
+++ ./drivers/md/md.c	2006-07-01 22:00:57.000000000 +1000
@@ -412,6 +412,7 @@ static int super_written_barrier(struct 
 	if (bio->bi_size)
 		return 1;
 
+	WARN_ON(error);
 	if (!test_bit(BIO_UPTODATE, &bio->bi_flags) &&
 	    error == -EOPNOTSUPP) {
 		unsigned long flags;


* Re: Weird RAID/SATA problem [ once was Re: 2.6.17-mm3 ]
From: Reuben Farrelly @ 2006-07-01 12:24 UTC
  To: Neil Brown; +Cc: linux-kernel, Andrew Morton



On 2/07/2006 12:05 a.m., Neil Brown wrote:
> On Saturday July 1, reuben-lkml@reub.net wrote:
>>> Only difference I can think of is still barriers... Does this patch
>>> make any difference?
>> You will be happy to know that yes, it does make a difference.
>>
>> Applied to -mm4, RAID-1 now comes up with all arrays in sync and everything 
>> looking good.  Tried it twice, and both times raid-1 came up perfectly with
>>
>> md0 : active raid1 sdc2[1] sda2[0]
>>        24410688 blocks [2/2] [UU]
>>        bitmap: 0/187 pages [0KB], 64KB chunk
>>
>> for each md.
>>
> 
> Cool... so who is giving us that EIO?  I'm guessing
> end_that_request_first or end_that_request_last, but how is it
> getting there?
> 
> What if you remove that last patch (so it still tries barrier writes)
> but add this patch (so WARN_ON gives us a trace when it happens)?

Bear in mind that this does not include Ingo's fix for the md stuff:


Bootdata ok (command line is ro root=/dev/md0 panic=60 console=ttyS0,57600 single)
Linux version 2.6.17-mm4 (root@tornado.reub.net) (gcc version 4.1.1 20060629 (Red Hat 4.1.1-6)) #3 SMP Sun Jul 2 00:11:13 NZST 2006

<snip>

mice: PS/2 mouse device common for all mice
md: raid1 personality registered for level 1
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
EDAC MC: Ver: 2.0.0 Jul  1 2006
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
BIOS EDD facility v0.16 2004-Jun-25, 3 devices found
Freeing unused kernel memory: 216k freed
Red Hat nash version 5.0.46 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Setting up hotplug.
Creating block device nodes.
Loading ide-disk.ko module
md: Autodetecting RAID arrays.
md: invalid raid superblock magic on sda10
md: sda10 has invalid sb, not importing!
BUG: warning at fs/block_dev.c:1109/__blkdev_put()

Call Trace:
  [<ffffffff802bf192>] __blkdev_put+0x9c/0x1bb
  [<ffffffff802bf2bf>] blkdev_put_partition+0xe/0x10
  [<ffffffff8041f434>] unlock_rdev+0x49/0x50
  [<ffffffff8041fc9a>] md_import_device+0x24a/0x2a0
  [<ffffffff8033af8a>] selinux_capable+0x24/0x29
  [<ffffffff80424a21>] md_ioctl+0xc1/0x154f
  [<ffffffff8033a693>] avc_has_perm+0x49/0x5b
  [<ffffffff80350762>] blkdev_driver_ioctl+0x62/0x80
  [<ffffffff80350e76>] blkdev_ioctl+0x6f6/0x75a
  [<ffffffff8023f674>] wake_up_inode+0x18/0x24
  [<ffffffff8033ae13>] inode_has_perm+0x62/0x71
  [<ffffffff802bfa62>] blkdev_open+0x0/0x61
  [<ffffffff802bfa8e>] blkdev_open+0x2c/0x61
  [<ffffffff8033aecc>] file_has_perm+0xaa/0xb9
  [<ffffffff802beb12>] block_ioctl+0x1b/0x1f
  [<ffffffff802432ea>] do_ioctl+0x2a/0x8f
  [<ffffffff8023185b>] vfs_ioctl+0x27b/0x2a0
  [<ffffffff8024e76a>] sys_ioctl+0x5f/0x82
  [<ffffffff8022fadd>] sys_fcntl+0x33d/0x34f
  [<ffffffff8026014a>] system_call+0x7e/0x83

BUG: warning at fs/block_dev.c:1128/__blkdev_put()

Call Trace:
  [<ffffffff803576f8>] kobject_put+0x19/0x21
  [<ffffffff802bf274>] __blkdev_put+0x17e/0x1bb
  [<ffffffff802bf2bf>] blkdev_put_partition+0xe/0x10
  [<ffffffff8041f434>] unlock_rdev+0x49/0x50
  [<ffffffff8041fc9a>] md_import_device+0x24a/0x2a0
  [<ffffffff8033af8a>] selinux_capable+0x24/0x29
  [<ffffffff80424a21>] md_ioctl+0xc1/0x154f
  [<ffffffff8033a693>] avc_has_perm+0x49/0x5b
  [<ffffffff80350762>] blkdev_driver_ioctl+0x62/0x80
  [<ffffffff80350e76>] blkdev_ioctl+0x6f6/0x75a
  [<ffffffff8023f674>] wake_up_inode+0x18/0x24
  [<ffffffff8033ae13>] inode_has_perm+0x62/0x71
  [<ffffffff802bfa62>] blkdev_open+0x0/0x61
  [<ffffffff802bfa8e>] blkdev_open+0x2c/0x61
  [<ffffffff8033aecc>] file_has_perm+0xaa/0xb9
  [<ffffffff802beb12>] block_ioctl+0x1b/0x1f
  [<ffffffff802432ea>] do_ioctl+0x2a/0x8f
  [<ffffffff8023185b>] vfs_ioctl+0x27b/0x2a0
  [<ffffffff8024e76a>] sys_ioctl+0x5f/0x82
  [<ffffffff8022fadd>] sys_fcntl+0x33d/0x34f
  [<ffffffff8026014a>] system_call+0x7e/0x83

md: invalid raid superblock magic on sdc10
md: sdc10 has invalid sb, not importing!
BUG: warning at fs/block_dev.c:1109/__blkdev_put()

Call Trace:
  [<ffffffff802bf192>] __blkdev_put+0x9c/0x1bb
  [<ffffffff802bf2bf>] blkdev_put_partition+0xe/0x10
  [<ffffffff8041f434>] unlock_rdev+0x49/0x50
  [<ffffffff8041fc9a>] md_import_device+0x24a/0x2a0
  [<ffffffff8033af8a>] selinux_capable+0x24/0x29
  [<ffffffff80424a21>] md_ioctl+0xc1/0x154f
  [<ffffffff8033a693>] avc_has_perm+0x49/0x5b
  [<ffffffff80350762>] blkdev_driver_ioctl+0x62/0x80
  [<ffffffff80350e76>] blkdev_ioctl+0x6f6/0x75a
  [<ffffffff8023f674>] wake_up_inode+0x18/0x24
  [<ffffffff8033ae13>] inode_has_perm+0x62/0x71
  [<ffffffff802bfa62>] blkdev_open+0x0/0x61
  [<ffffffff802bfa8e>] blkdev_open+0x2c/0x61
  [<ffffffff8033aecc>] file_has_perm+0xaa/0xb9
  [<ffffffff802beb12>] block_ioctl+0x1b/0x1f
  [<ffffffff802432ea>] do_ioctl+0x2a/0x8f
  [<ffffffff8023185b>] vfs_ioctl+0x27b/0x2a0
  [<ffffffff8024e76a>] sys_ioctl+0x5f/0x82
  [<ffffffff8022fadd>] sys_fcntl+0x33d/0x34f
  [<ffffffff8026014a>] system_call+0x7e/0x83

BUG: warning at fs/block_dev.c:1128/__blkdev_put()

Call Trace:
  [<ffffffff803576f8>] kobject_put+0x19/0x21
  [<ffffffff802bf274>] __blkdev_put+0x17e/0x1bb
  [<ffffffff802bf2bf>] blkdev_put_partition+0xe/0x10
  [<ffffffff8041f434>] unlock_rdev+0x49/0x50
  [<ffffffff8041fc9a>] md_import_device+0x24a/0x2a0
  [<ffffffff8033af8a>] selinux_capable+0x24/0x29
  [<ffffffff80424a21>] md_ioctl+0xc1/0x154f
  [<ffffffff8033a693>] avc_has_perm+0x49/0x5b
  [<ffffffff80350762>] blkdev_driver_ioctl+0x62/0x80
  [<ffffffff80350e76>] blkdev_ioctl+0x6f6/0x75a
  [<ffffffff8023f674>] wake_up_inode+0x18/0x24
  [<ffffffff8033ae13>] inode_has_perm+0x62/0x71
  [<ffffffff802bfa62>] blkdev_open+0x0/0x61
  [<ffffffff802bfa8e>] blkdev_open+0x2c/0x61
  [<ffffffff8033aecc>] file_has_perm+0xaa/0xb9
  [<ffffffff802beb12>] block_ioctl+0x1b/0x1f
  [<ffffffff802432ea>] do_ioctl+0x2a/0x8f
  [<ffffffff8023185b>] vfs_ioctl+0x27b/0x2a0
  [<ffffffff8024e76a>] sys_ioctl+0x5f/0x82
  [<ffffffff8022fadd>] sys_fcntl+0x33d/0x34f
  [<ffffffff8026014a>] system_call+0x7e/0x83

md: autorun ...
md: considering sdc11 ...
md:  adding sdc11 ...
md: sdc7 has different UUID to sdc11
md: sdc6 has different UUID to sdc11
md: sdc5 has different UUID to sdc11
md: sdc3 has different UUID to sdc11
md: sdc2 has different UUID to sdc11
md:  adding sda11 ...
md: sda7 has different UUID to sdc11
md: sda6 has different UUID to sdc11
md: sda5 has different UUID to sdc11
md: sda3 has different UUID to sdc11
md: sda2 has different UUID to sdc11
md: created md5
md: bind<sda11>
md: bind<sdc11>
md: running: <sdc11><sda11>
raid1: raid set md5 active with 2 out of 2 mirrors
md5: bitmap initialized from disk: read 10/10 pages, set 0 bits, status: 0
created bitmap (153 pages) for device md5
BUG: warning at drivers/md/md.c:411/super_written_barrier()

Call Trace:
  <IRQ> [<ffffffff80422231>] super_written_barrier+0x61/0x100
  [<ffffffff8023c000>] bio_endio+0x5a/0x6a
  [<ffffffff8022e24f>] __end_that_request_first+0x16f/0x4c9
  [<ffffffff8024afaa>] end_that_request_first+0xc/0xe
  [<ffffffff8034e825>] blk_ordered_complete_seq+0x7d/0x8c
  [<ffffffff8034e864>] post_flush_end_io+0x30/0x35
  [<ffffffff8034e748>] end_that_request_last+0xd8/0xf4
  [<ffffffff803d83a1>] scsi_end_request+0xb1/0xdd
  [<ffffffff803d87cd>] scsi_io_completion+0x3cd/0x3dc
  [<ffffffff803d8802>] scsi_blk_pc_done+0x26/0x28
  [<ffffffff803d3e53>] scsi_finish_command+0x66/0x73
  [<ffffffff803d8b71>] scsi_softirq_done+0xe1/0xf0
  [<ffffffff80239980>] blk_done_softirq+0x6e/0x7e
  [<ffffffff80211f5e>] __do_softirq+0x63/0xe5
  [<ffffffff80261322>] call_softirq+0x1e/0x28
  [<ffffffff8026ac58>] do_softirq+0x34/0x8b
  [<ffffffff802872c8>] irq_exit+0x48/0x4a
  [<ffffffff8026ac1a>] do_IRQ+0x6b/0x75
  [<ffffffff8025a3f1>] mwait_idle+0x0/0x4f
  [<ffffffff80260644>] ret_from_intr+0x0/0xa
  <EOI> [<ffffffff80268e00>] prepare_to_copy+0x32/0x3b
  [<ffffffff80268e04>] prepare_to_copy+0x36/0x3b
  [<ffffffff80268e03>] prepare_to_copy+0x35/0x3b
  [<ffffffff8026ee04>] mce_init+0x35/0xec
  [<ffffffff8026ee00>] mce_init+0x31/0xec
  [<ffffffff80268e00>] prepare_to_copy+0x32/0x3b
<repeats dozens of times>
  [<ffffffff80268e00>] prepare_to_copy+0x32/0x3b
  [<ffffffff8026ee00>] mce_init+0x31/0xec
  [<ffffffff80268e00>] prepare_to_copy+0x32/0x3b
<repeats dozens of times>
  [<ffffffff80268e00>] prepare_to_copy+0x32/0x3b
  [<ffffffff80208e00>] __d_lookup+0xa0/0x140
  [<ffffffff802a669f>] handle_bad_irq+0x0/0x1fd
  [<ffffffff802a669f>] handle_bad_irq+0x0/0x1fd
<repeats dozens of times>

reuben


* Re: Weird RAID/SATA problem [ once was Re: 2.6.17-mm3 ]
From: Neil Brown @ 2006-07-01 13:28 UTC
  To: Reuben Farrelly; +Cc: linux-kernel, Andrew Morton

On Sunday July 2, reuben-lkml@reub.net wrote:
> BUG: warning at drivers/md/md.c:411/super_written_barrier()
> 
> Call Trace:
>   <IRQ> [<ffffffff80422231>] super_written_barrier+0x61/0x100
>   [<ffffffff8023c000>] bio_endio+0x5a/0x6a
>   [<ffffffff8022e24f>] __end_that_request_first+0x16f/0x4c9
>   [<ffffffff8024afaa>] end_that_request_first+0xc/0xe
>   [<ffffffff8034e825>] blk_ordered_complete_seq+0x7d/0x8c
>   [<ffffffff8034e864>] post_flush_end_io+0x30/0x35
>   [<ffffffff8034e748>] end_that_request_last+0xd8/0xf4
>   [<ffffffff803d83a1>] scsi_end_request+0xb1/0xdd
>   [<ffffffff803d87cd>] scsi_io_completion+0x3cd/0x3dc

I think the key decision to return an error is happening here in
scsi_io_completion.
Poring over a disassembly might help here, but sticking in a
bunch of printks is probably easier (for someone like me who has never
seen this code before, anyway :-)

What does this patch produce?

NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>


diff .prev/drivers/scsi/scsi_lib.c ./drivers/scsi/scsi_lib.c
--- .prev/drivers/scsi/scsi_lib.c	2006-07-01 23:22:46.000000000 +1000
+++ ./drivers/scsi/scsi_lib.c	2006-07-01 23:26:18.000000000 +1000
@@ -952,6 +952,7 @@ void scsi_io_completion(struct scsi_cmnd
 				 * and quietly refuse further access.
 				 */
 				cmd->device->changed = 1;
+				printk("Unit Attention\n");
 				scsi_end_request(cmd, 0, this_count, 1);
 				return;
 			} else {
@@ -984,6 +985,8 @@ void scsi_io_completion(struct scsi_cmnd
 				scsi_requeue_command(q, cmd);
 				return;
 			} else {
+				printk("Illegal Request %d %d %d\n",
+				       sshdr.asc, sshdr.ascq, cmd->cmnd[0]);
 				scsi_end_request(cmd, 0, this_count, 1);
 				return;
 			}
@@ -1012,6 +1015,7 @@ void scsi_io_completion(struct scsi_cmnd
 					    "Device not ready: ");
 				scsi_print_sense_hdr("", &sshdr);
 			}
+			printk("Not ready\n");
 			scsi_end_request(cmd, 0, this_count, 1);
 			return;
 		case VOLUME_OVERFLOW:
@@ -1022,6 +1026,7 @@ void scsi_io_completion(struct scsi_cmnd
 				scsi_print_sense("", cmd);
 			}
 			/* See SSC3rXX or current. */
+			printk("Volume Overflow\n");
 			scsi_end_request(cmd, 0, this_count, 1);
 			return;
 		default:
@@ -1045,6 +1050,10 @@ void scsi_io_completion(struct scsi_cmnd
 				scsi_print_sense("", cmd);
 		}
 	}
+	printk("ouch. %d %d %d   %d %d %d  %d\n",
+	       good_bytes, sense_valid, sense_deferred,
+	       sshdr.sense_key, sshdr.asc, sshdr.ascq,
+	       result);
 	scsi_end_request(cmd, 0, this_count, !result);
 }
 EXPORT_SYMBOL(scsi_io_completion);


* Re: Weird RAID/SATA problem [ once was Re: 2.6.17-mm3 ]
From: Reuben Farrelly @ 2006-07-02  0:03 UTC
  To: Neil Brown; +Cc: linux-kernel, Andrew Morton



On 2/07/2006 1:28 a.m., Neil Brown wrote:
> On Sunday July 2, reuben-lkml@reub.net wrote:
>> BUG: warning at drivers/md/md.c:411/super_written_barrier()
>>
>> Call Trace:
>>   <IRQ> [<ffffffff80422231>] super_written_barrier+0x61/0x100
>>   [<ffffffff8023c000>] bio_endio+0x5a/0x6a
>>   [<ffffffff8022e24f>] __end_that_request_first+0x16f/0x4c9
>>   [<ffffffff8024afaa>] end_that_request_first+0xc/0xe
>>   [<ffffffff8034e825>] blk_ordered_complete_seq+0x7d/0x8c
>>   [<ffffffff8034e864>] post_flush_end_io+0x30/0x35
>>   [<ffffffff8034e748>] end_that_request_last+0xd8/0xf4
>>   [<ffffffff803d83a1>] scsi_end_request+0xb1/0xdd
>>   [<ffffffff803d87cd>] scsi_io_completion+0x3cd/0x3dc
> 
> I think the key decision to return an error is happening here in
> scsi_io_completion.
> Poring over a disassembly might help here, but sticking in a
> bunch of printks is probably easier (for someone like me who has never
> seen this code before, anyway :-)
> 
> What does this patch produce?
> 
> NeilBrown
> 
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> 
> diff .prev/drivers/scsi/scsi_lib.c ./drivers/scsi/scsi_lib.c
> --- .prev/drivers/scsi/scsi_lib.c	2006-07-01 23:22:46.000000000 +1000
> +++ ./drivers/scsi/scsi_lib.c	2006-07-01 23:26:18.000000000 +1000
> @@ -952,6 +952,7 @@ void scsi_io_completion(struct scsi_cmnd

Loading ide-disk.ko module
md: Autodetecting RAID arrays.
md: invalid raid superblock magic on sda10
md: sda10 has invalid sb, not importing!
md: invalid raid superblock magic on sdc10
md: sdc10 has invalid sb, not importing!
md: autorun ...
md: considering sdc11 ...
md:  adding sdc11 ...
md: sdc7 has different UUID to sdc11
md: sdc6 has different UUID to sdc11
md: sdc5 has different UUID to sdc11
md: sdc3 has different UUID to sdc11
md: sdc2 has different UUID to sdc11
md:  adding sda11 ...
md: sda7 has different UUID to sdc11
md: sda6 has different UUID to sdc11
md: sda5 has different UUID to sdc11
md: sda3 has different UUID to sdc11
md: sda2 has different UUID to sdc11
md: created md5
md: bind<sda11>
md: bind<sdc11>
md: running: <sdc11><sda11>
raid1: raid set md5 active with 2 out of 2 mirrors
md5: bitmap initialized from disk: read 10/10 pages, set 1 bits, status: 0
created bitmap (153 pages) for device md5
ouch. 0 0 0   78 98 128  0
ouch. 0 0 0   78 98 128  0
ouch. 0 0 0   207 248 1  0
raid1: Disk failure on sda11, disabling device.
         Operation continuing on 1 devices
ouch. 0 0 0   2 0 0  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
md: considering sdc7 ...
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:1, o:0, dev:sda11
  disk 1, wo:0, o:1, dev:sdc11
md:  adding sdc7 ...
md: sdc6 has different UUID to sdc7
RAID1 conf printout:
  --- wd:1 rd:2
  disk 1, wo:0, o:1, dev:sdc11
md: sdc5 has different UUID to sdc7
md: sdc3 has different UUID to sdc7
md: sdc2 has different UUID to sdc7
md:  adding sda7 ...
md: sda6 has different UUID to sdc7
md: sda5 has different UUID to sdc7
md: sda3 has different UUID to sdc7
md: sda2 has different UUID to sdc7
md: created md4
md: bind<sda7>
md: bind<sdc7>
md: running: <sdc7><sda7>
raid1: raid set md4 active with 2 out of 2 mirrors
md4: bitmap initialized from disk: read 4/4 pages, set 2 bits, status: 0
created bitmap (61 pages) for device md4
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
raid1: Disk failure on sda7, disabling device.
         Operation continuing on 1 devices
ouch. 0 0 0   2 0 0  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
md: considering sdc6 ...
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:1, o:0, dev:sda7
  disk 1, wo:0, o:1, dev:sdc7
md:  adding sdc6 ...
md: sdc5 has different UUID to sdc6
md: sdc3 has different UUID to sdc6
md: sdc2 has different UUID to sdc6
md:  adding sda6 ...
md: sda5 has different UUID to sdc6
md: sda3 has different UUID to sdc6
RAID1 conf printout:
  --- wd:1 rd:2
  disk 1, wo:0, o:1, dev:sdc7
md: sda2 has different UUID to sdc6
md: created md3
md: bind<sda6>
md: bind<sdc6>
md: running: <sdc6><sda6>
raid1: raid set md3 active with 2 out of 2 mirrors
md3: bitmap initialized from disk: read 1/1 pages, set 1 bits, status: 0
created bitmap (13 pages) for device md3
ouch. 0 0 0   78 98 128  0
ouch. 0 0 0   78 98 128  0
ouch. 0 0 0   143 248 1  0
raid1: Disk failure on sdc6, disabling device.
         Operation continuing on 1 devices
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
md: considering sdc5 ...
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:0, o:1, dev:sda6
  disk 1, wo:1, o:0, dev:sdc6
md:  adding sdc5 ...
md: sdc3 has different UUID to sdc5
md: sdc2 has different UUID to sdc5
md:  adding sda5 ...
md: sda3 has different UUID to sdc5
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:0, o:1, dev:sda6
md: sda2 has different UUID to sdc5
md: created md2
md: bind<sda5>
md: bind<sdc5>
md: running: <sdc5><sda5>
raid1: raid set md2 active with 2 out of 2 mirrors
md2: bitmap initialized from disk: read 10/10 pages, set 16 bits, status: 0
created bitmap (150 pages) for device md2
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
raid1: Disk failure on sda5, disabling device.
         Operation continuing on 1 devices
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
md: considering sdc3 ...
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:1, o:0, dev:sda5
  disk 1, wo:0, o:1, dev:sdc5
md:  adding sdc3 ...
md: sdc2 has different UUID to sdc3
md:  adding sda3 ...
md: sda2 has different UUID to sdc3
md: created md1
md: bind<sda3>
md: bind<sdc3>
RAID1 conf printout:
  --- wd:1 rd:2
  disk 1, wo:0, o:1, dev:sdc5
md: running: <sdc3><sda3>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 10/10 pages, set 2 bits, status: 0
created bitmap (150 pages) for device md1
ouch. 0 0 0   78 98 128  0
ouch. 0 0 0   78 98 128  0
ouch. 0 0 0   143 248 1  0
raid1: Disk failure on sdc3, disabling device.
         Operation continuing on 1 devices
ouch. 0 0 0   2 0 0  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
ouch. 0 0 0   207 248 1  0
md: considering sdc2 ...
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:0, o:1, dev:sda3
  disk 1, wo:1, o:0, dev:sdc3
md:  adding sdc2 ...
md:  adding sda2 ...
md: created md0
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:0, o:1, dev:sda3
md: bind<sda2>
md: bind<sdc2>
md: running: <sdc2><sda2>
raid1: raid set md0 active with 2 out of 2 mirrors
md0: bitmap initialized from disk: read 12/12 pages, set 100 bits, status: 0
created bitmap (187 pages) for device md0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   207 248 1  0
raid1: Disk failure on sda2, disabling device.
         Operation continuing on 1 devices
ouch. 0 0 0   2 0 0  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
ouch. 0 0 0   143 248 1  0
md: ... autorun DONE.
RAID1 conf printout:
  --- wd:1 rd:2
  disk 0, wo:1, o:0, dev:sda2
  disk 1, wo:0, o:1, dev:sdc2
Creating root device.
RAID1 conf printout:
  --- wd:1 rd:2
  disk 1, wo:0, o:1, dev:sdc2
Mounting root filesystem.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Setting up other filesystems.
Setting up new root fs
no fstab.sys, mounting internal defaults
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /proc/bus/usb
ERROR unmounting old /proc/bus/usb: No such file or directory
forcing unmount of /proc/bus/usb
unmounting old /sys
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
audit(1151798137.896:2): selinux=0 auid=4294967295
INIT: version 2.86 booting
                 Welcome to Fedora Core
                 Press 'I' to enter interactive startup.
Setting clock  (utc): Sun Jul  2 11:55:41 NZST 2006 [  OK  ]
Starting udev: [  OK  ]
Setting hostname tornado.reub.net:  [  OK  ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1
/dev/sda1: clean, 50/6024 files, 23798/24064 blocks (check in 3 mounts)
[  OK  ]
Remounting root filesystem in read-write mode:  [  OK  ]
Mounting local filesystems:  [  OK  ]
Enabling local swap partitions:  [  OK  ]
Enabling /etc/fstab swaps:  [  OK  ]
sh-3.1#
sh-3.1#
sh-3.1# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc3[2](F) sda3[0]
       4891712 blocks [2/1] [U_]
       bitmap: 1/150 pages [4KB], 16KB chunk

md2 : active raid1 sdc5[1] sda5[2](F)
       4891648 blocks [2/1] [_U]
       bitmap: 7/150 pages [28KB], 16KB chunk

md3 : active raid1 sdc6[2](F) sda6[0]
       104320 blocks [2/1] [U_]
       bitmap: 1/13 pages [4KB], 4KB chunk

md4 : active raid1 sdc7[1] sda7[2](F)
       497856 blocks [2/1] [_U]
       bitmap: 4/61 pages [16KB], 4KB chunk

md5 : active raid1 sdc11[1] sda11[2](F)
       20008832 blocks [2/1] [_U]
       bitmap: 1/153 pages [4KB], 64KB chunk

md0 : active raid1 sdc2[1] sda2[2](F)
       24410688 blocks [2/1] [_U]
       bitmap: 6/187 pages [24KB], 64KB chunk

unused devices: <none>
sh-3.1#
sh-3.1# mdadm --add /dev/md5 /dev/sda11
mdadm: Cannot open /dev/sda11: Device or resource busy
sh-3.1# mdadm --add /dev/md5 /dev/sdc11
mdadm: Cannot open /dev/sdc11: Device or resource busy
sh-3.1#
sh-3.1#
sh-3.1# mdadm --add /dev/md4 /dev/sda7
mdadm: Cannot open /dev/sda7: Device or resource busy
sh-3.1#
sh-3.1#
sh-3.1# mdadm --add /dev/md0 /dev/sda2
mdadm: Cannot open /dev/sda2: Device or resource busy
sh-3.1#

reuben
