* mdadm --grow failed
@ 2007-02-17  3:22 Marc Marais
  2007-02-17  8:40 ` Neil Brown
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Marc Marais @ 2007-02-17  3:22 UTC (permalink / raw)
  To: linux-raid

I'm trying to grow my raid 5 array as I've just added a new disk. The array 
was originally 3 drives; I've added a fourth using:

mdadm -a /dev/md6 /dev/sda1

Which added the new drive as a spare. I then did:

mdadm --grow /dev/md6 -n 4

Which started the reshape operation. 

Feb 16 23:51:40 xerces kernel: RAID5 conf printout:
Feb 16 23:51:40 xerces kernel:  --- rd:4 wd:4
Feb 16 23:51:40 xerces kernel:  disk 0, o:1, dev:sdb1
Feb 16 23:51:40 xerces kernel:  disk 1, o:1, dev:sdc1
Feb 16 23:51:40 xerces kernel:  disk 2, o:1, dev:sdd1
Feb 16 23:51:40 xerces kernel:  disk 3, o:1, dev:sda1
Feb 16 23:51:40 xerces kernel: md: reshape of RAID array md6
Feb 16 23:51:40 xerces kernel: md: minimum _guaranteed_  speed: 1000 
KB/sec/disk.
Feb 16 23:51:40 xerces kernel: md: using maximum available idle IO bandwidth 
(but not more than 200000 KB/sec) for reshape.
Feb 16 23:51:40 xerces kernel: md: using 128k window, over a total of 
156288256 blocks.

Unfortunately one of the drives timed out during the operation (not a read 
error - just a timeout - which I would've thought would be retried but 
anyway...):

Feb 17 00:19:16 xerces kernel: ata3: command timeout
Feb 17 00:19:16 xerces kernel: ata3: no sense translation for status: 0x40
Feb 17 00:19:16 xerces kernel: ata3: translated ATA stat/err 0x40/00 to SCSI 
SK/ASC/ASCQ 0xb/00/00
Feb 17 00:19:16 xerces kernel: ata3: status=0x40 { DriveReady }
Feb 17 00:19:16 xerces kernel: sd 3:0:0:0: SCSI error: return code = 
0x08000002
Feb 17 00:19:16 xerces kernel: sdc: Current [descriptor]: sense key: Aborted 
Command
Feb 17 00:19:16 xerces kernel:     Additional sense: No additional sense 
information
Feb 17 00:19:16 xerces kernel: Descriptor sense data with sense descriptors 
(in hex):
Feb 17 00:19:16 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 
00 00 00 00 
Feb 17 00:19:16 xerces kernel:         00 00 00 01 
Feb 17 00:19:16 xerces kernel: end_request: I/O error, dev sdc, sector 
24065423
Feb 17 00:19:16 xerces kernel: raid5: Disk failure on sdc1, disabling 
device. Operation continuing on 3 devices

Which then unfortunately aborted the reshape operation:

Feb 17 00:19:16 xerces kernel: md: md6: reshape done.
Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
Feb 17 00:19:17 xerces kernel:  disk 1, o:0, dev:sdc1
Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1
Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1

I re-added the failed disk (sdc), which btw is a brand new disk (so this 
seems to be a controller issue - high I/O load?), and the array then resynced.

At this point I'm confused as to the state of the array.

mdadm -D /dev/md6 gives:

/dev/md6:
        Version : 00.91.03
  Creation Time : Tue Aug  1 23:31:54 2006
     Raid Level : raid5
     Array Size : 312576512 (298.10 GiB 320.08 GB)
  Used Dev Size : 156288256 (149.05 GiB 160.04 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 6
    Persistence : Superblock is persistent

    Update Time : Sat Feb 17 12:14:22 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

  Delta Devices : 1, (3->4)

           UUID : 603e7ac0:de4df2d1:d44c6b9b:3d20ad32
         Events : 0.7215890

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1
       3       8        1        3      active sync   /dev/sda1

Although previously (before issuing the command below) it mentioned 
something about the reshape being at 1%, or something to that effect.
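For reference, an array that is mid-reshape shows a progress line in /proc/mdstat. The snippet below is a hypothetical reconstruction (modelled on this thread's md6, not captured from it), with a small awk filter to pull out the figure:

```shell
# Hypothetical /proc/mdstat excerpt for md6 mid-reshape (reconstructed, not
# Marc's real output); the awk line extracts the progress percentage.
mdstat='md6 : active raid5 sda1[3] sdd1[2] sdc1[1] sdb1[0]
      312576512 blocks super 0.91 level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  1.0% (1562882/156288256) finish=120.5min'
printf '%s\n' "$mdstat" | awk '/reshape/ { print $4 }'
```

On the real system, `cat /proc/mdstat` while the reshape is running would show a line of this shape; once the reshape line disappears while Delta Devices is still reported, the array is in the half-reshaped state described above.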

I've attempted to continue the reshape by issuing:

mdadm --grow /dev/md6 -n 4 

Which gives the error that the array can't be reshaped without increasing 
its size!

Is my array destroyed? Seeing as the sda disk wasn't completely synced, I 
wonder what it was using to resync the array when sdc went offline. I've got 
a bad feeling about this :|

Help appreciated. (I do have a full backup of course, but that's a last 
resort; with my luck I'd get a read error from the tape drive.)

Regards,
Marc




--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-17  3:22 mdadm --grow failed Marc Marais
@ 2007-02-17  8:40 ` Neil Brown
  2007-02-18  9:20   ` Marc Marais
  2007-02-17 18:27 ` Bill Davidsen
  2007-02-18 11:51 ` David Greaves
  2 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2007-02-17  8:40 UTC (permalink / raw)
  To: Marc Marais; +Cc: linux-raid

On Saturday February 17, marcm@liquid-nexus.net wrote:
> 
> Is my array destroyed? Seeing as the sda disk wasn't completely synced, I 
> wonder what it was using to resync the array when sdc went offline. I've got 
> a bad feeling about this :|

I can understand your bad feeling...
What happened there shouldn't happen, but obviously it did.  There is
evidence that all is not lost but obviously I cannot be sure yet.

Can you "fsck -n" the array?  Does the data still seem to be intact?

Can you report exactly what version of Linux kernel, and of mdadm you
are using, and give the output of "mdadm -E" on each drive.
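Neil's request can be gathered in one pass. In this sketch mdadm is stubbed with a shell function purely so the sequence is illustrative; drop the stub and run as root against the real devices from this thread:

```shell
# mdadm is stubbed so the sequence can be shown without the hardware;
# remove this function to run the real commands (as root).
mdadm() { echo "would run: mdadm $*"; }

uname -r                                   # kernel version
mdadm -V                                   # mdadm version
for d in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    mdadm -E "$d"                          # per-device superblock dump
done
```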

I'll try to work out what happened and how to go forward, but am
unlikely to get back to you for 24-48 hours (I have a busy weekend:-).

NeilBrown

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-17  3:22 mdadm --grow failed Marc Marais
  2007-02-17  8:40 ` Neil Brown
@ 2007-02-17 18:27 ` Bill Davidsen
  2007-02-17 19:16   ` Justin Piszcz
  2007-02-18 11:51 ` David Greaves
  2 siblings, 1 reply; 15+ messages in thread
From: Bill Davidsen @ 2007-02-17 18:27 UTC (permalink / raw)
  To: Marc Marais; +Cc: linux-raid

Marc Marais wrote:
> I'm trying to grow my raid 5 array as I've just added a new disk. The array 
> was originally 3 drives, I've added a fourth using:
>
> mdadm -a /dev/md6 /dev/sda1
>
> Which added the new drive as a spare. I then did:
>
> mdadm --grow /dev/md6 -n 4
>
> Which started the reshape operation. 
>
> Feb 16 23:51:40 xerces kernel: RAID5 conf printout:
> Feb 16 23:51:40 xerces kernel:  --- rd:4 wd:4
> Feb 16 23:51:40 xerces kernel:  disk 0, o:1, dev:sdb1
> Feb 16 23:51:40 xerces kernel:  disk 1, o:1, dev:sdc1
> Feb 16 23:51:40 xerces kernel:  disk 2, o:1, dev:sdd1
> Feb 16 23:51:40 xerces kernel:  disk 3, o:1, dev:sda1
> Feb 16 23:51:40 xerces kernel: md: reshape of RAID array md6
> Feb 16 23:51:40 xerces kernel: md: minimum _guaranteed_  speed: 1000 
> KB/sec/disk.
> Feb 16 23:51:40 xerces kernel: md: using maximum available idle IO bandwidth 
> (but not more than 200000 KB/sec) for reshape.
> Feb 16 23:51:40 xerces kernel: md: using 128k window, over a total of 
> 156288256 blocks.
>
> Unfortunately one of the drives timed out during the operation (not a read 
> error - just a timeout - which I would've thought would be retried but 
> anyway...):
>
> Feb 17 00:19:16 xerces kernel: ata3: command timeout
> Feb 17 00:19:16 xerces kernel: ata3: no sense translation for status: 0x40
> Feb 17 00:19:16 xerces kernel: ata3: translated ATA stat/err 0x40/00 to SCSI 
> SK/ASC/ASCQ 0xb/00/00
> Feb 17 00:19:16 xerces kernel: ata3: status=0x40 { DriveReady }
> Feb 17 00:19:16 xerces kernel: sd 3:0:0:0: SCSI error: return code = 
> 0x08000002
> Feb 17 00:19:16 xerces kernel: sdc: Current [descriptor]: sense key: Aborted 
> Command
> Feb 17 00:19:16 xerces kernel:     Additional sense: No additional sense 
> information
> Feb 17 00:19:16 xerces kernel: Descriptor sense data with sense descriptors 
> (in hex):
> Feb 17 00:19:16 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 
> 00 00 00 00 
> Feb 17 00:19:16 xerces kernel:         00 00 00 01 
> Feb 17 00:19:16 xerces kernel: end_request: I/O error, dev sdc, sector 
> 24065423
> Feb 17 00:19:16 xerces kernel: raid5: Disk failure on sdc1, disabling 
> device. Operation continuing on 3 devices
>
> Which then unfortunately aborted the reshape operation:
>
> Feb 17 00:19:16 xerces kernel: md: md6: reshape done.
> Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
> Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
> Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
> Feb 17 00:19:17 xerces kernel:  disk 1, o:0, dev:sdc1
> Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
> Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1
> Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
> Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
> Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
> Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
> Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1
>
> I re-added the failed disk (sdc) (which btw is a brand new disk - seems this 
> is a controller issue - high IO load?) which then resynced the array.
>
> At this point I'm confused as to the state of the array.
>
> mdadm -D /dev/md6 gives:
>
> /dev/md6:
>         Version : 00.91.03
>   Creation Time : Tue Aug  1 23:31:54 2006
>      Raid Level : raid5
>      Array Size : 312576512 (298.10 GiB 320.08 GB)
>   Used Dev Size : 156288256 (149.05 GiB 160.04 GB)
>    Raid Devices : 4
>   Total Devices : 4
> Preferred Minor : 6
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Feb 17 12:14:22 2007
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 128K
>
>   Delta Devices : 1, (3->4)
>
>            UUID : 603e7ac0:de4df2d1:d44c6b9b:3d20ad32
>          Events : 0.7215890
>
>     Number   Major   Minor   RaidDevice State
>        0       8       17        0      active sync   /dev/sdb1
>        1       8       33        1      active sync   /dev/sdc1
>        2       8       49        2      active sync   /dev/sdd1
>        3       8        1        3      active sync   /dev/sda1
>
> Although it previously (before issuing the command below) mentioned 
> something about reshape 1% or something to that effect.
>
> I've attempted to continue the reshape by issuing:
>
> mdadm --grow /dev/md6 -n 4 
>
> Which gives the error that the array can't be reshaped without increasing 
> its size!
>
> Is my array destroyed? Seeing as the sda disk wasn't completely synced, I 
> wonder what it was using to resync the array when sdc went offline. I've got 
> a bad feeling about this :|
>
> Help appreciated. (I do have a full backup of course but that's a last 
> resort with my luck I'd get a read error from the tape drive)
I have to think maybe a 'check' would have been good before the grow, 
but since Neil didn't suggest it, please don't now, unless he agrees 
that it's a valid attempt.

However, you certainly can run 'df' and see if the filesystem is resized.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-17 18:27 ` Bill Davidsen
@ 2007-02-17 19:16   ` Justin Piszcz
  2007-02-17 21:08     ` Neil Brown
  0 siblings, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2007-02-17 19:16 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Marc Marais, linux-raid



On Sat, 17 Feb 2007, Bill Davidsen wrote:

> Marc Marais wrote:
>> I'm trying to grow my raid 5 array as I've just added a new disk. The array 
>> was originally 3 drives, I've added a fourth using:
>> 
>> mdadm -a /dev/md6 /dev/sda1
>> 
>> Which added the new drive as a spare. I then did:
>> 
>> mdadm --grow /dev/md6 -n 4
>> 
>> Which started the reshape operation. 
>> Feb 16 23:51:40 xerces kernel: RAID5 conf printout:
>> Feb 16 23:51:40 xerces kernel:  --- rd:4 wd:4
>> Feb 16 23:51:40 xerces kernel:  disk 0, o:1, dev:sdb1
>> Feb 16 23:51:40 xerces kernel:  disk 1, o:1, dev:sdc1
>> Feb 16 23:51:40 xerces kernel:  disk 2, o:1, dev:sdd1
>> Feb 16 23:51:40 xerces kernel:  disk 3, o:1, dev:sda1
>> Feb 16 23:51:40 xerces kernel: md: reshape of RAID array md6
>> Feb 16 23:51:40 xerces kernel: md: minimum _guaranteed_  speed: 1000 
>> KB/sec/disk.
>> Feb 16 23:51:40 xerces kernel: md: using maximum available idle IO 
>> bandwidth (but not more than 200000 KB/sec) for reshape.
>> Feb 16 23:51:40 xerces kernel: md: using 128k window, over a total of 
>> 156288256 blocks.
>> 
>> Unfortunately one of the drives timed out during the operation (not a read 
>> error - just a timeout - which I would've thought would be retried but 
>> anyway...):
>> 
>> Feb 17 00:19:16 xerces kernel: ata3: command timeout
>> Feb 17 00:19:16 xerces kernel: ata3: no sense translation for status: 0x40
>> Feb 17 00:19:16 xerces kernel: ata3: translated ATA stat/err 0x40/00 to 
>> SCSI SK/ASC/ASCQ 0xb/00/00
>> Feb 17 00:19:16 xerces kernel: ata3: status=0x40 { DriveReady }
>> Feb 17 00:19:16 xerces kernel: sd 3:0:0:0: SCSI error: return code = 
>> 0x08000002
>> Feb 17 00:19:16 xerces kernel: sdc: Current [descriptor]: sense key: 
>> Aborted Command
>> Feb 17 00:19:16 xerces kernel:     Additional sense: No additional sense 
>> information
>> Feb 17 00:19:16 xerces kernel: Descriptor sense data with sense descriptors 
>> (in hex):
>> Feb 17 00:19:16 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 
>> 00 00 00 00 
>> Feb 17 00:19:16 xerces kernel:         00 00 00 01 
>> Feb 17 00:19:16 xerces kernel: end_request: I/O error, dev sdc, sector 24065423
>> Feb 17 00:19:16 xerces kernel: raid5: Disk failure on sdc1, disabling 
>> device. Operation continuing on 3 devices
>> 
>> Which then unfortunately aborted the reshape operation:
>> 
>> Feb 17 00:19:16 xerces kernel: md: md6: reshape done.
>> Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
>> Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
>> Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
>> Feb 17 00:19:17 xerces kernel:  disk 1, o:0, dev:sdc1
>> Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
>> Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1
>> Feb 17 00:19:17 xerces kernel: RAID5 conf printout:
>> Feb 17 00:19:17 xerces kernel:  --- rd:4 wd:3
>> Feb 17 00:19:17 xerces kernel:  disk 0, o:1, dev:sdb1
>> Feb 17 00:19:17 xerces kernel:  disk 2, o:1, dev:sdd1
>> Feb 17 00:19:17 xerces kernel:  disk 3, o:1, dev:sda1
>> 
>> I re-added the failed disk (sdc) (which btw is a brand new disk - seems 
>> this is a controller issue - high IO load?) which then resynced the array.
>> 
>> At this point I'm confused as to the state of the array.
>> 
>> mdadm -D /dev/md6 gives:
>> 
>> /dev/md6:
>>         Version : 00.91.03
>>   Creation Time : Tue Aug  1 23:31:54 2006
>>      Raid Level : raid5
>>      Array Size : 312576512 (298.10 GiB 320.08 GB)
>>   Used Dev Size : 156288256 (149.05 GiB 160.04 GB)
>>    Raid Devices : 4
>>   Total Devices : 4
>> Preferred Minor : 6
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Sat Feb 17 12:14:22 2007
>>           State : clean
>>  Active Devices : 4
>> Working Devices : 4
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : left-symmetric
>>      Chunk Size : 128K
>>
>>   Delta Devices : 1, (3->4)
>>
>>            UUID : 603e7ac0:de4df2d1:d44c6b9b:3d20ad32
>>          Events : 0.7215890
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       17        0      active sync   /dev/sdb1
>>        1       8       33        1      active sync   /dev/sdc1
>>        2       8       49        2      active sync   /dev/sdd1
>>        3       8        1        3      active sync   /dev/sda1
>> 
>> Although it previously (before issuing the command below) mentioned 
>> something about reshape 1% or something to that effect.
>> 
>> I've attempted to continue the reshape by issuing:
>> 
>> mdadm --grow /dev/md6 -n 4 
>> Which gives the error that the array can't be reshaped without increasing 
>> its size!
>> 
>> Is my array destroyed? Seeing as the sda disk wasn't completely synced, I 
>> wonder what it was using to resync the array when sdc went offline. I've got 
>> a bad feeling about this :|
>> 
>> Help appreciated. (I do have a full backup of course but that's a last 
>> resort with my luck I'd get a read error from the tape drive)
> I have to think maybe a 'check' would have been good before the grow, but 
> since Neil didn't suggest it, please don't now, unless he agrees that it's a 
> valid attempt.
>
> However, you certainly can run 'df' and see if the filesystem is resized.
>
> -- 
> bill davidsen <davidsen@tmr.com>
> CTO TMR Associates, Inc
> Doing interesting things with small computers since 1979
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Is growing an array with > 1 disk at a time permissible?  I've grown a 
raid 5 from 1.8tb to 3.3tb but always 1 disk at a time.

Justin.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-17 19:16   ` Justin Piszcz
@ 2007-02-17 21:08     ` Neil Brown
  2007-02-17 21:30       ` Justin Piszcz
  0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2007-02-17 21:08 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Bill Davidsen, Marc Marais, linux-raid

On Saturday February 17, jpiszcz@lucidpixels.com wrote:
> 
> Is growing an array with > 1 disk at a time permissible?  I've grown a 
> raid 5 from 1.8tb to 3.3tb but always 1 disk at a time.

Sure is.  >0 is the current requirement.
You can grow a 2-drive raid5 directly to a 10-drive one if you like.

NeilBrown

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-17 21:08     ` Neil Brown
@ 2007-02-17 21:30       ` Justin Piszcz
  0 siblings, 0 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-02-17 21:30 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bill Davidsen, Marc Marais, linux-raid



On Sun, 18 Feb 2007, Neil Brown wrote:

> On Saturday February 17, jpiszcz@lucidpixels.com wrote:
>>
>> Is growing an array with > 1 disk at a time permissible?  I've grown a
>> raid 5 from 1.8tb to 3.3tb but always 1 disk at a time.
>
> Sure is.  >0 is the current requirement.
> You can grow a 2 drive raid5 directly to a 10drive if you like.
>
> NeilBrown

Wow! Thanks for the info, was not aware of this.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-17  8:40 ` Neil Brown
@ 2007-02-18  9:20   ` Marc Marais
       [not found]     ` <17880.7869.963793.706096@notabene.brown>
  2007-02-19  0:50     ` Neil Brown
  0 siblings, 2 replies; 15+ messages in thread
From: Marc Marais @ 2007-02-18  9:20 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Ok, I understand the risks which is why I did a full backup before doing 
this. I have subsequently recreated the array and restored my data from 
backup.

Just for information, the e2fsck -n on the drive hung (unresponsive with no 
I/O) so I assume the filesystem was hosed. I suspect resyncing the array 
after the grow failed was a bad idea. 

I'm not sure how the grow operation is performed, but to me it seems that 
there is no fault tolerance during the operation, so any failure will cause a 
corrupt array. My 2c would be that if any drive fails during a grow 
operation, it should be aborted in such a way as to allow a restart 
later (if possible); as in my case, a retry would've probably worked. 

Anyway, if you need more info to help improve growing arrays let me know.

As a side note, either my hardware (a Promise TX4000 card) is acting up or 
there are still some unresolved issues with libata in general and/or 
sata_promise itself. 

Regards,
Marc

On Sat, 17 Feb 2007 19:40:17 +1100, Neil Brown wrote
> On Saturday February 17, marcm@liquid-nexus.net wrote:
> > 
> > Is my array destroyed? Seeing as the sda disk wasn't completely synced, I 
> > wonder what it was using to resync the array when sdc went offline. I've got 
> > a bad feeling about this :|
> 
> I can understand your bad feeling...
> What happened there shouldn't happen, but obviously it did.  There is
> evidence that all is not lost but obviously I cannot be sure yet.
> 
> Can you "fsck -n" the array?  does the data still seem to be intact?
> 
> Can you report exactly what version of Linux kernel, and of mdadm you
> are using, and give the output of "mdadm -E" on each drive.
> 
> I'll try to work out what happened and how to go forward, but am
> unlikely to get back to you for 24-48 hours (I have a busy weekend:-).
> 
> NeilBrown


--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-17  3:22 mdadm --grow failed Marc Marais
  2007-02-17  8:40 ` Neil Brown
  2007-02-17 18:27 ` Bill Davidsen
@ 2007-02-18 11:51 ` David Greaves
  2 siblings, 0 replies; 15+ messages in thread
From: David Greaves @ 2007-02-18 11:51 UTC (permalink / raw)
  To: Marc Marais; +Cc: linux-raid

Marc Marais wrote:
[snip]
> Unfortunately one of the drives timed out during the operation (not a read 
> error - just a timeout - which I would've thought would be retried but 
> anyway...):
> Help appreciated. (I do have a full backup of course but that's a last 
> resort with my luck I'd get a read error from the tape drive)

Hi Marc
It looks like you've since recreated the array and restored your data - good :)

It doesn't appear that you mentioned the kernel and distro you are using, or the
software versions.

I'm sure this is something people will need.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Fw: Re: mdadm --grow failed
       [not found]       ` <20070218105242.M29958@liquid-nexus.net>
@ 2007-02-18 11:57         ` Marc Marais
  2007-02-18 12:13           ` Justin Piszcz
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Marais @ 2007-02-18 11:57 UTC (permalink / raw)
  To: linux-raid

On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
> On Sunday February 18, marcm@liquid-nexus.net wrote:
> > Ok, I understand the risks which is why I did a full backup before doing 
> > this. I have subsequently recreated the array and restored my data from 
> > backup.
> 
> Could you still please tell me exactly what kernel/mdadm version you
> were using?
> 
> Thanks,
> NeilBrown

2.6.20 with the patch you supplied in response to the "md6_raid5 crash 
email" I posted in linux-raid a few days ago. Just as background, I replaced 
the failing drive and at the same time bought an additional drive in order 
to increase the array size.

mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).

Also, I've just noticed another drive failure with the new array with a 
similar error to what happened during the grow operation (although on a 
different drive) - I wonder if I should post this to linux-ide?

Feb 18 00:58:10 xerces kernel: ata4: command timeout
Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI 
SK/ASC/ASCQ 0xb/00/00
Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code = 
0x08000002
Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted 
Command
Feb 18 00:58:10 xerces kernel:     Additional sense: No additional sense 
information
Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors 
(in hex):
Feb 18 00:58:10 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 
00 00 00 00
Feb 18 00:58:10 xerces kernel:         00 00 00 00
Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector 
35666775
Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling 
device. Operation continuing on 3 devices

Regards,
Marc


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fw: Re: mdadm --grow failed
  2007-02-18 11:57         ` Fw: " Marc Marais
@ 2007-02-18 12:13           ` Justin Piszcz
  2007-02-18 12:32             ` Marc Marais
  2007-02-19  5:41             ` mdadm --grow failed Marc Marais
  0 siblings, 2 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-02-18 12:13 UTC (permalink / raw)
  To: Marc Marais; +Cc: linux-raid

On Sun, 18 Feb 2007, Marc Marais wrote:

> On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
>> On Sunday February 18, marcm@liquid-nexus.net wrote:
>>> Ok, I understand the risks which is why I did a full backup before doing
>>> this. I have subsequently recreated the array and restored my data from
>>> backup.
>>
>> Could you still please tell me exactly what kernel/mdadm version you
>> were using?
>>
>> Thanks,
>> NeilBrown
>
> 2.6.20 with the patch you supplied in response to the "md6_raid5 crash
> email" I posted in linux-raid a few days ago. Just as background, I replaced
> the failing drive and at the same time bought an additional drive in order
> to increase the array size.
>
> mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).
>
> Also, I've just noticed another drive failure with the new array with a
> similar error to what happened during the grow operation (although on a
> different drive) - I wonder if I should post this to linux-ide?
>
> Feb 18 00:58:10 xerces kernel: ata4: command timeout
> Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
> Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI
> SK/ASC/ASCQ 0xb/00/00
> Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
> Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code =
> 0x08000002
> Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted
> Command
> Feb 18 00:58:10 xerces kernel:     Additional sense: No additional sense
> information
> Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors
> (in hex):
> Feb 18 00:58:10 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00
> 00 00 00 00
> Feb 18 00:58:10 xerces kernel:         00 00 00 00
> Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector
> 35666775
> Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling
> device. Operation continuing on 3 devices
>
> Regards,
> Marc
>

Just out of curiosity:

Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector
35666775

Can you run:

smartctl -d ata -t short /dev/sdd
wait 5 min
smartctl -d ata -t long /dev/sdd
wait 2-3 hr
smartctl -d ata -a /dev/sdd

And then e-mail that output to the list?

Justin.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-18 12:13           ` Justin Piszcz
@ 2007-02-18 12:32             ` Marc Marais
  2007-02-19  4:43               ` sata_promise: random/intermittent errors Marc Marais
  2007-02-19  5:41             ` mdadm --grow failed Marc Marais
  1 sibling, 1 reply; 15+ messages in thread
From: Marc Marais @ 2007-02-18 12:32 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

On Sun, 18 Feb 2007 07:13:28 -0500 (EST), Justin Piszcz wrote
> On Sun, 18 Feb 2007, Marc Marais wrote:
> 
> > On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
> >> On Sunday February 18, marcm@liquid-nexus.net wrote:
> >>> Ok, I understand the risks which is why I did a full backup before doing
> >>> this. I have subsequently recreated the array and restored my data from
> >>> backup.
> >>
> >> Could you still please tell me exactly what kernel/mdadm version you
> >> were using?
> >>
> >> Thanks,
> >> NeilBrown
> >
> > 2.6.20 with the patch you supplied in response to the "md6_raid5 crash
> > email" I posted in linux-raid a few days ago. Just as background, I replaced
> > the failing drive and at the same time bought an additional drive in order
> > to increase the array size.
> >
> > mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).
> >
> > Also, I've just noticed another drive failure with the new array with a
> > similar error to what happened during the grow operation (although on a
> > different drive) - I wonder if I should post this to linux-ide?
> >
> > Feb 18 00:58:10 xerces kernel: ata4: command timeout
> > Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
> > Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI
> > SK/ASC/ASCQ 0xb/00/00
> > Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
> > Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code =
> > 0x08000002
> > Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted
> > Command
> > Feb 18 00:58:10 xerces kernel:     Additional sense: No additional sense
> > information
> > Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors
> > (in hex):
> > Feb 18 00:58:10 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00
> > 00 00 00 00
> > Feb 18 00:58:10 xerces kernel:         00 00 00 00
> > Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector
> > 35666775
> > Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling
> > device. Operation continuing on 3 devices
> >
> > Regards,
> > Marc
> >
> 
> Just out of curiosity:
> 
> Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd,
>  sector 35666775
> 
> Can you run:
> 
> smartctl -d ata -t short /dev/sdd
> wait 5 min
> smartctl -d ata -t long /dev/sdd
> wait 2-3 hr
> smartctl -d ata -a /dev/sdd
> 
> And then e-mail that output to the list?
> 
> Justin.

I have smartmontools performing regular short and long scans but I will run 
the tests immediately and send the output of smartctl -a when done. 

Note I'm getting similar errors on sdc too (as in 5 minutes ago). 
Interestingly the SMART error logs for sdc and sdd show no errors at all. 

ata3: command timeout
ata3: no sense translation for status: 0x40
ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x40 { DriveReady }
sd 3:0:0:0: SCSI error: return code = 0x08000002
sdd: Current [descriptor]: sense key: Aborted Command
     Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
         00 00 00 00
end_request: I/O error, dev sdc, sector 260419647
raid5:md6: read error corrected (8 sectors at 260419584 on sdc1)

Will post logs when done...

Marc

--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-18  9:20   ` Marc Marais
       [not found]     ` <17880.7869.963793.706096@notabene.brown>
@ 2007-02-19  0:50     ` Neil Brown
  1 sibling, 0 replies; 15+ messages in thread
From: Neil Brown @ 2007-02-19  0:50 UTC (permalink / raw)
  To: Marc Marais; +Cc: linux-raid

On Sunday February 18, marcm@liquid-nexus.net wrote:
> 
> I'm not sure how the grow operation is performed, but to me it seems that 
> there is no fault tolerance during the operation, so any failure will cause 
> a corrupt array. My 2c would be that if any drive fails during a grow 
> operation, the operation should be aborted in such a way as to allow a 
> restart later (if possible) - as in my case a retry would've probably worked. 

For what it's worth, the code does exactly what you suggest.  It does
fail gracefully.  The problem is that it doesn't restart quite the
way you would like.

Had you stopped the array and re-assembled it, it would have resumed
the reshape process (at least it did in my testing).
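
Concretely, that recovery path would look something like this (a sketch using
the device names from this thread; run only against the affected array):

```
mdadm --stop /dev/md6
mdadm --assemble /dev/md6 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
cat /proc/mdstat    # the reshape should pick up from its recorded position
```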

The following patch makes it retry a reshape straight away if it was
aborted due to a device failure (of course, if too many devices have
failed, the retry won't get anywhere, but you would expect that).

Thanks for the valuable feedback.

NeilBrown


Restart a (raid5) reshape that has been aborted due to a read/write error.

An error always aborts any resync/recovery/reshape on the understanding
that it will immediately be restarted if that still makes sense.
However a reshape currently doesn't get restarted.  With this patch
it does.
To avoid restarting when it is not possible to do work, we call 
in to the personality to check that a reshape is ok, and strengthen
raid5_check_reshape to fail if there are too many failed devices.

We also break some code out into a separate function: remove_and_add_spares,
as the indent level for that code was getting crazy.


### Diffstat output
 ./drivers/md/md.c    |   74 +++++++++++++++++++++++++++++++--------------------
 ./drivers/md/raid5.c |    2 +
 2 files changed, 47 insertions(+), 29 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-02-19 11:44:51.000000000 +1100
+++ ./drivers/md/md.c	2007-02-19 11:44:54.000000000 +1100
@@ -5343,6 +5343,44 @@ void md_do_sync(mddev_t *mddev)
 EXPORT_SYMBOL_GPL(md_do_sync);
 
 
+static int remove_and_add_spares(mddev_t *mddev)
+{
+	mdk_rdev_t *rdev;
+	struct list_head *rtmp;
+	int spares = 0;
+
+	ITERATE_RDEV(mddev,rdev,rtmp)
+		if (rdev->raid_disk >= 0 &&
+		    (test_bit(Faulty, &rdev->flags) ||
+		     ! test_bit(In_sync, &rdev->flags)) &&
+		    atomic_read(&rdev->nr_pending)==0) {
+			if (mddev->pers->hot_remove_disk(
+				    mddev, rdev->raid_disk)==0) {
+				char nm[20];
+				sprintf(nm,"rd%d", rdev->raid_disk);
+				sysfs_remove_link(&mddev->kobj, nm);
+				rdev->raid_disk = -1;
+			}
+		}
+
+	if (mddev->degraded) {
+		ITERATE_RDEV(mddev,rdev,rtmp)
+			if (rdev->raid_disk < 0
+			    && !test_bit(Faulty, &rdev->flags)) {
+				rdev->recovery_offset = 0;
+				if (mddev->pers->hot_add_disk(mddev,rdev)) {
+					char nm[20];
+					sprintf(nm, "rd%d", rdev->raid_disk);
+					sysfs_create_link(&mddev->kobj,
+							  &rdev->kobj, nm);
+					spares++;
+					md_new_event(mddev);
+				} else
+					break;
+			}
+	}
+	return spares;
+}
 /*
  * This routine is regularly called by all per-raid-array threads to
  * deal with generic issues like resync and super-block update.
@@ -5397,7 +5435,7 @@ void md_check_recovery(mddev_t *mddev)
 		return;
 
 	if (mddev_trylock(mddev)) {
-		int spares =0;
+		int spares = 0;
 
 		spin_lock_irq(&mddev->write_lock);
 		if (mddev->safemode && !atomic_read(&mddev->writes_pending) &&
@@ -5460,35 +5498,13 @@ void md_check_recovery(mddev_t *mddev)
 		 * Spare are also removed and re-added, to allow
 		 * the personality to fail the re-add.
 		 */
-		ITERATE_RDEV(mddev,rdev,rtmp)
-			if (rdev->raid_disk >= 0 &&
-			    (test_bit(Faulty, &rdev->flags) || ! test_bit(In_sync, &rdev->flags)) &&
-			    atomic_read(&rdev->nr_pending)==0) {
-				if (mddev->pers->hot_remove_disk(mddev, rdev->raid_disk)==0) {
-					char nm[20];
-					sprintf(nm,"rd%d", rdev->raid_disk);
-					sysfs_remove_link(&mddev->kobj, nm);
-					rdev->raid_disk = -1;
-				}
-			}
-
-		if (mddev->degraded) {
-			ITERATE_RDEV(mddev,rdev,rtmp)
-				if (rdev->raid_disk < 0
-				    && !test_bit(Faulty, &rdev->flags)) {
-					rdev->recovery_offset = 0;
-					if (mddev->pers->hot_add_disk(mddev,rdev)) {
-						char nm[20];
-						sprintf(nm, "rd%d", rdev->raid_disk);
-						sysfs_create_link(&mddev->kobj, &rdev->kobj, nm);
-						spares++;
-						md_new_event(mddev);
-					} else
-						break;
-				}
-		}
 
-		if (spares) {
+		if (mddev->reshape_position != MaxSector) {
+			if (mddev->pers->check_reshape(mddev) != 0)
+				/* Cannot proceed */
+				goto unlock;
+			set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+		} else if ((spares = remove_and_add_spares(mddev))) {
 			clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 			clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 		} else if (mddev->recovery_cp < MaxSector) {

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2007-02-19 11:44:48.000000000 +1100
+++ ./drivers/md/raid5.c	2007-02-19 11:44:54.000000000 +1100
@@ -3814,6 +3814,8 @@ static int raid5_check_reshape(mddev_t *
 	if (err)
 		return err;
 
+	if (mddev->degraded > conf->max_degraded)
+		return -EINVAL;
 	/* looks like we might be able to manage this */
 	return 0;
 }
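
In plain terms, the new guard in raid5_check_reshape behaves like this sketch
(illustrative Python, not kernel code; the function name and the hard-coded
max_degraded values of 1 for RAID5 and 2 for RAID6 are my rendering of the
check above):

```python
def check_reshape_ok(level: int, failed_devices: int) -> bool:
    """Mirror the patched check: refuse a reshape (re)start when more
    devices have failed than the level can tolerate."""
    max_degraded = 2 if level == 6 else 1  # raid5 -> 1, raid6 -> 2
    return failed_devices <= max_degraded  # kernel returns -EINVAL otherwise

# Marc's case: one drive kicked out of a RAID5 -> the retry is allowed.
print(check_reshape_ok(5, 1))  # True
print(check_reshape_ok(5, 2))  # False: too many failures
```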


* sata_promise: random/intermittent errors
  2007-02-18 12:32             ` Marc Marais
@ 2007-02-19  4:43               ` Marc Marais
  0 siblings, 0 replies; 15+ messages in thread
From: Marc Marais @ 2007-02-19  4:43 UTC (permalink / raw)
  To: linux-ide

I've decided to post this to the linux-ide list to see if I can get to the
bottom of this problem I'm experiencing with sata_promise and my PATA drives.

I've pasted a thread from the linux-raid list where I was trying to
troubleshoot/recover a destroyed raid5 array.

First a full history:

1) 2.6.17.13: 3 drive PATA raid5 array with one drive starting to give read
errors (legitimate according to SMART logs).
2) System lockups (no kernel panic seen) during load - I suspect due to the
read error on the failing drive. 
3) Decide to upgrade to 2.6.20
4) Raid5 issues occur (handling of read errors caused the md device to die). 
5) Patch from Neil to fix raid-5 error handling
6) Replace failed drive and add a new drive at the same time to create a 4
drive PATA array.
7) Attempt to grow the array from 3 -> 4 devices which failed due to an error
similar to this:

ata3: command timeout
ata3: no sense translation for status: 0x40
ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x40 { DriveReady }
sd 3:0:0:0: SCSI error: return code = 0x08000002
sdd: Current [descriptor]: sense key: Aborted Command
     Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
         00 00 00 00
end_request: I/O error, dev sdc, sector 260419647

8) Raid array is trashed, rebuild array and restore from backup.
9) From this point on the system is up and running - restored to working
state. However, I'm still getting errors similar to the above during array
accesses (read/write), not related to load. The array (being in sync) manages
to continue operation using another drive. My concern is that this may happen
on a degraded array in future.

Note that the error I'm getting (shown above) has happened on both sdc and sdd,
and at different sectors (i.e. not a consistent read error). Also, the SMART
logs for both drives show NO errors at all, and short and long SMART tests
complete successfully. I suspect this is an issue in the driver and/or the
physical TX4000 card.

If you could shed any light on this I would appreciate it.

Thanks.
Regards.

------------- BEGIN DMESG DUMP -----------------

Linux version 2.6.20 (root@xerces) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #2
SMP Mon Feb 12 09:28:29 GMT-9 2007 BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009c800 end:
000000000009c800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009c800 size: 0000000000003800 end:
00000000000a0000 type: 2
copy_e820_map() start: 00000000000f0000 size: 0000000000010000 end:
0000000000100000 type: 2
copy_e820_map() start: 0000000000100000 size: 000000007feec000 end:
000000007ffec000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000007ffec000 size: 0000000000003000 end:
000000007ffef000 type: 3
copy_e820_map() start: 000000007ffef000 size: 0000000000010000 end:
000000007ffff000 type: 2
copy_e820_map() start: 000000007ffff000 size: 0000000000001000 end:
0000000080000000 type: 4
copy_e820_map() start: 00000000fec00000 size: 0000000000001000 end:
00000000fec01000 type: 2
copy_e820_map() start: 00000000fee00000 size: 0000000000001000 end:
00000000fee01000 type: 2
copy_e820_map() start: 00000000ffff0000 size: 0000000000010000 end:
0000000100000000 type: 2
 BIOS-e820: 0000000000000000 - 000000000009c800 (usable)
 BIOS-e820: 000000000009c800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ffec000 (usable)
 BIOS-e820: 000000007ffec000 - 000000007ffef000 (ACPI data)
 BIOS-e820: 000000007ffef000 - 000000007ffff000 (reserved)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved) 1151MB HIGHMEM
available.
896MB LOWMEM available.
found SMP MP-table at 000f7ea0
Entering add_active_range(0, 0, 524268) 0 entries of 256 used Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->   229376
  HighMem    229376 ->   524268
early_node_map[1] active PFN ranges
    0:        0 ->   524268
On node 0 totalpages: 524268
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 1760 pages used for memmap
  Normal zone: 223520 pages, LIFO batch:31
  HighMem zone: 2303 pages used for memmap
  HighMem zone: 292589 pages, LIFO batch:31 DMI 2.3 present.
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: ASUS     Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 6:10 APIC version 16
Processor #1 6:10 APIC version 16
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 2
Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000)
Detected 2133.464 MHz processor.
Built 1 zonelists.  Total pages: 520173
Kernel command line: auto BOOT_IMAGE=Linux ro root=901 acpi=off pci=noacpi
elevator=as mapped APIC to ffffd000 (fee00000) mapped IOAPIC to ffffc000
(fec00000) Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x50
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache
hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2072936k/2097072k available (1539k kernel code, 22916k reserved, 593k
data, 200k init, 1179568k highmem) virtual kernel memory layout:
    fixmap  : 0xfffa2000 - 0xfffff000   ( 372 kB)
    pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
    vmalloc : 0xf8800000 - 0xff7fe000   ( 111 MB)
    lowmem  : 0xc0000000 - 0xf8000000   ( 896 MB)
      .init : 0xc031b000 - 0xc034d000   ( 200 kB)
      .data : 0xc0280c62 - 0xc0315230   ( 593 kB)
      .text : 0xc0100000 - 0xc0280c62   (1539 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 4269.42 BogoMIPS
(lpj=2134710) Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff c1cbfbff 00000000 00000000
00000000 00000000 00000000
CPU: CLK_CTL MSR was 60031223. Reprogramming to 20031223
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1cbfbff 00000000 00000420 00000000
00000000 00000000 Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 10k freed
CPU0: AMD Athlon(TM) MP 2800+ stepping 00 Booting processor 1/1 eip 2000
Initializing CPU#1 Calibrating delay using timer specific routine.. 4266.31
BogoMIPS (lpj=2133156)
CPU: After generic identify, caps: 0383fbff c1cbfbff 00000000 00000000
00000000 00000000 00000000
CPU: CLK_CTL MSR was 60031223. Reprogramming to 20031223
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1cbfbff 00000000 00000420 00000000
00000000 00000000 Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: AMD Athlon(TM) MP 2800+ stepping 00 Total of 2 processors activated
(8535.73 BogoMIPS).
ExtINT not setup in hardware but reported by MP table ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0 checking TSC
synchronization across 2 CPUs: passed.
Brought up 2 CPUs
migration_cost=1084
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf1f30, last bus=2
PCI: Using configuration type 1
Setting up standard PCI resources
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
Linux Plug and Play Support v0.97 (c) Adam Belay
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00fc5f0
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc620, dseg 0xf0000
PnPBIOS: 13 nodes reported by PnP BIOS; 13 recorded by driver SCSI subsystem
initialized libata version 2.00 loaded.
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:01:05.0
PCI: Using IRQ router AMD768 [1022/7443] at 0000:00:07.3
PCI->APIC IRQ transform: 0000:00:08.0[A] -> IRQ 16 APIC IRQ transform: 
PCI->0000:00:09.0[A] -> IRQ 17 APIC IRQ transform: 0000:01:05.0[A] -> 
PCI->IRQ 16 APIC IRQ transform: 0000:02:04.0[A] -> IRQ 17 APIC IRQ 
PCI->transform: 0000:02:05.0[A] -> IRQ 18 APIC IRQ transform: 
PCI->0000:02:05.1[B] -> IRQ 19 APIC IRQ transform: 0000:02:05.2[C] -> 
PCI->IRQ 16 APIC IRQ transform: 0000:02:06.0[A] -> IRQ 17 APIC IRQ 
PCI->transform: 0000:02:08.0[A] -> IRQ 19
pnp: 00:0f: ioport range 0xe400-0xe47f has been reserved
pnp: 00:0f: ioport range 0xe4e0-0xe4ff has been reserved
PCI: Bridge: 0000:00:01.0
  IO window: disabled.
  MEM window: ee000000-efcfffff
  PREFETCH window: eff00000-fb7fffff
PCI: Bridge: 0000:00:10.0
  IO window: a000-afff
  MEM window: e8800000-ebffffff
  PREFETCH window: efd00000-efdfffff
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP
established hash table entries: 131072 (order: 8, 1048576 bytes) TCP bind hash
table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536) TCP reno
registered checking if image is initramfs...it isn't (bad gzip magic numbers);
looks like an initrd Freeing initrd memory: 3072k freed Machine check
exception polling timer started.
highmem bounce pool size: 64 pages
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) io scheduler noop
registered io scheduler anticipatory registered (default) io scheduler
deadline registered io scheduler cfq registered BIOS failed to enable PCI
standards compliance, fixing this error.
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:03: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A RAMDISK driver initialized: 16
RAM disks of 8192K size 1024 blocksize
PNP: PS/2 Controller [PNP0303,PNP0f13] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice TCP cubic registered Starting
balanced_irq Using IPI Shortcut mode
input: AT Translated Set 2 keyboard as /class/input/input0
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 3072KiB [1 disk] into ram disk... 
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 200k freed
NET: Registered protocol family 1
md: raid1 personality registered for level 1 Uniform Multi-Platform E-IDE
driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD7441: IDE controller at PCI slot 0000:00:07.1
AMD7441: chipset revision 4
AMD7441: not 100% native mode: will probe irqs later
AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
    ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:DMA Probing IDE
interface ide0...
hda: WDC WD800BB-00JHC0, ATA DISK drive
hdb: WDC WD2500JB-00GVC0, ATA DISK drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD800BB-23DKA0, ATA DISK drive
hdd: HL-DT-STDVD-ROM GDR8163B, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 >
hdb: max request size: 512KiB
hdb: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63, UDMA(100)
hdb: cache flushes supported
 hdb: hdb1
hdc: max request size: 512KiB
hdc: 156312576 sectors (80032 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
hdc: cache flushes supported
 hdc: hdc1 hdc2 hdc3 hdc4 < hdc5 hdc6 hdc7 hdc8 >
md: md0 stopped.
md: bind<hda1>
md: bind<hdc1>
raid1: raid set md0 active with 2 out of 2 mirrors
md: md1 stopped.
md: bind<hda2>
md: bind<hdc2>
raid1: raid set md1 active with 2 out of 2 mirrors kjournald starting.  Commit
interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
hda: cache flushes supported
hdc: cache flushes supported
hdb: cache flushes supported
Adding 2007992k swap on /dev/md0.  Priority:-1 extents:1 across:2007992k
EXT3 FS on md1, internal journal
Real Time Clock Driver v1.12ac
hdd: ATAPI 52X DVD-ROM drive, 256kB Cache, UDMA(33) Uniform CD-ROM driver
Revision: 3.20
ieee1394: Initialized config rom entry `ip1394'
ieee1394: raw1394: /dev/raw1394 device initialized
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[17]  MMIO=[e9800000-e98007ff] 
Max Packet=[2048]  IR/IT contexts=[4/8]
video1394: Installed video1394 module
AMD768 RNG detected
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd 0000:02:05.0: OHCI Host Controller ohci_hcd 0000:02:05.0: new USB bus
registered, assigned bus number 1 ohci_hcd 0000:02:05.0: irq 18, io mem
0xeb000000 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB
hub found hub 1-0:1.0: 3 ports detected ohci_hcd 0000:02:05.1: OHCI Host
Controller ohci_hcd 0000:02:05.1: new USB bus registered, assigned bus number
2 ohci_hcd 0000:02:05.1: irq 19, io mem 0xea800000
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[005042f81010a4eb] usb usb2:
configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0:
2 ports detected
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
Intel(R) PRO/1000 Network Driver - version 7.3.15-k2 Copyright (c) 1999-2006
Intel Corporation.
e1000: 0000:00:09.0: e1000_probe: (PCI:66MHz:32-bit) 00:0e:0c:a0:04:dd
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection scsi0 : Adaptec
AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs

scsi 0:0:0:0: Sequential-Access SONY     SDX-500C         0101 PQ: 0 ANSI: 2
 target0:0:0: Beginning Domain Validation
 target0:0:0: wide asynchronous
 target0:0:0: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 8)
 target0:0:0: Domain Validation skipping write tests
 target0:0:0: Ending Domain Validation
sata_promise 0000:00:08.0: version 1.05
ata1: PATA max UDMA/133 cmd 0xF8AA6200 ctl 0xF8AA6238 bmdma 0x0 irq 16
ata2: PATA max UDMA/133 cmd 0xF8AA6280 ctl 0xF8AA62B8 bmdma 0x0 irq 16
ata3: PATA max UDMA/133 cmd 0xF8AA6300 ctl 0xF8AA6338 bmdma 0x0 irq 16
ata4: PATA max UDMA/133 cmd 0xF8AA6380 ctl 0xF8AA63B8 bmdma 0x0 irq 16
scsi1 : sata_promise
ata1.00: ATA-7, max UDMA/100, 312581808 sectors: LBA48
ata1.00: ata1: dev 0 multi count 0
ata1.00: configured for UDMA/100
scsi2 : sata_promise
ata2.00: ATA-7, max UDMA/100, 312581808 sectors: LBA48
ata2.00: ata2: dev 0 multi count 0
ata2.00: configured for UDMA/100
scsi3 : sata_promise
ata3.00: ATA-7, max UDMA/100, 312581808 sectors: LBA48
ata3.00: ata3: dev 0 multi count 0
ata3.00: configured for UDMA/100
scsi4 : sata_promise
ata4.00: ATA-6, max UDMA/100, 312581808 sectors: LBA48
ata4.00: ata4: dev 0 multi count 0
ata4.00: configured for UDMA/100
scsi 1:0:0:0: Direct-Access     ATA      WDC WD1600JB-00R 20.0 PQ: 0 ANSI: 5
scsi 2:0:0:0: Direct-Access     ATA      WDC WD1600JB-00R 20.0 PQ: 0 ANSI: 5
scsi 3:0:0:0: Direct-Access     ATA      WDC WD1600JB-00R 20.0 PQ: 0 ANSI: 5
scsi 4:0:0:0: Direct-Access     ATA      WDC WD1600JB-00E 15.0 PQ: 0 ANSI: 5
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
md: md2 stopped.
md: bind<hda3>
md: bind<hdc3>
raid1: raid set md2 active with 2 out of 2 mirrors
md: md3 stopped.
md: bind<hda5>
md: bind<hdc5>
raid1: raid set md3 active with 2 out of 2 mirrors
md: md4 stopped.
md: bind<hda6>
md: bind<hdc6>
raid1: raid set md4 active with 2 out of 2 mirrors
md: md5 stopped.
md: bind<hda7>
md: bind<hdc7>
raid1: raid set md5 active with 2 out of 2 mirrors
md: md6 stopped.
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA
 sda: sda1
sd 1:0:0:0: Attached scsi disk sda
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA
 sdb: sdb1
sd 2:0:0:0: Attached scsi disk sdb
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA
 sdc: sdc1
sd 3:0:0:0: Attached scsi disk sdc
SCSI device sdd: 312581808 512-byte hdwr sectors (160042 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA SCSI device sdd: 312581808 512-byte hdwr sectors (160042 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support
DPO or FUA
 sdd: sdd1
sd 4:0:0:0: Attached scsi disk sdd
md: bind<sdb1>
md: bind<sdc1>
md: bind<sda1>
raid5: automatically using best checksumming function: pIII_sse
   pIII_sse  :  4928.000 MB/sec
raid5: using function: pIII_sse (4928.000 MB/sec)
raid6: int32x1    855 MB/s
raid6: int32x2   1156 MB/s
raid6: int32x4    730 MB/s
raid6: int32x8    648 MB/s
raid6: mmxx1     1781 MB/s
raid6: mmxx2     3265 MB/s
raid6: sse1x1     464 MB/s
raid6: sse1x2     929 MB/s
raid6: using algorithm sse1x2 (929 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: device sda1 operational as raid disk 0
raid5: device sdc1 operational as raid disk 2
raid5: device sdb1 operational as raid disk 1
raid5: allocated 4204kB for md6
raid5: raid level 5 set md6 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
md: md7 stopped.
md: bind<hdc8>
md: bind<hda8>
raid1: raid set md7 active with 2 out of 2 mirrors
st: Version 20061107, fixed bufsize 32768, s/g segs 256 st 0:0:0:0: Attached
scsi tape st0 st 0:0:0:0: st0: try direct i/o: yes (alignment 512 B)
 target0:0:0: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 8)
st0: Block limits 2 - 16777215 bytes.
program stinit is using a deprecated SCSI ioctl, please convert it to SG_IO
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hdb1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
e1000: eth0: e1000_set_tso: TSO is Disabled
e1000: eth0: e1000_set_tso: TSO is Disabled
e1000: eth0: e1000_set_tso: TSO is Disabled process `syslogd' is using
obsolete setsockopt SO_BSDCOMPAT
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
st0: MTSETDRVBUFFER only allowed for root.
vmmon: module license 'unspecified' taints kernel.
/dev/vmmon[2331]: Module vmmon: registered with major=10 minor=165
/dev/vmmon[2331]: Module vmmon: initialized
/dev/vmnet: open called by PID 2366 (vmnet-bridge)
/dev/vmnet: hub 0 does not exist, allocating memory.
/dev/vmnet: port on hub 0 successfully opened
bridge-eth0: enabling the bridge
bridge-eth0: up
bridge-eth0: already up
bridge-eth0: attached
floppy0: no floppy controllers found
floppy0: no floppy controllers found
st 0:0:0:0: Attached scsi generic sg0 type 1
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 3:0:0:0: Attached scsi generic sg3 type 0
sd 4:0:0:0: Attached scsi generic sg4 type 0
/dev/vmnet: open called by PID 2723 (vmware-vmx)
device eth0 entered promiscuous mode
bridge-eth0: enabled promiscuous mode
/dev/vmnet: port on hub 0 successfully opened
/dev/vmmon[2744]: host clock rate change request 0 -> 1001
/dev/vmnet: open called by PID 2972 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
md: bind<sdd1>
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
 disk 3, o:1, dev:sdd1
md: recovery of RAID array md6
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for recovery.
md: using 128k window, over a total of 156288256 blocks.
md: md6: recovery done.
RAID5 conf printout:
 --- rd:4 wd:4
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
 disk 3, o:1, dev:sdd1
/dev/vmnet: open called by PID 2989 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
/dev/vmnet: open called by PID 2989 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
/dev/vmmon[2744]: host clock rate change request 1001 -> 1002
/dev/vmmon[2744]: host clock rate change request 1002 -> 83
/dev/vmmon[2744]: host clock rate change request 83 -> 1001
/dev/vmmon[2744]: host clock rate change request 1001 -> 1002
/dev/vmmon[2744]: host clock rate change request 1002 -> 1001
/dev/vmnet: open called by PID 2988 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
/dev/vmnet: open called by PID 2989 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ata3: command timeout
ata3: no sense translation for status: 0x40
ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x40 { DriveReady }
sd 3:0:0:0: SCSI error: return code = 0x08000002
sdc: Current [descriptor]: sense key: Aborted Command
    Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 01
end_request: I/O error, dev sdc, sector 260419647
raid5:md6: read error corrected (8 sectors at 260419584 on sdc1)
ata4: command timeout
ata4: no sense translation for status: 0x40
ata4: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x40 { DriveReady }
sd 4:0:0:0: SCSI error: return code = 0x08000002
sdd: Current [descriptor]: sense key: Aborted Command
    Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 00
end_request: I/O error, dev sdd, sector 277596095


------------- END DMESG DUMP -------------


---------- Forwarded Message -----------
On Sun, 18 Feb 2007 07:13:28 -0500 (EST), Justin Piszcz wrote
> On Sun, 18 Feb 2007, Marc Marais wrote:
> 
> > On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
> >> On Sunday February 18, marcm@liquid-nexus.net wrote:
> >>> Ok, I understand the risks which is why I did a full backup before doing
> >>> this. I have subsequently recreated the array and restored my data from
> >>> backup.
> >>
> >> Could you still please tell me exactly what kernel/mdadm version you
> >> were using?
> >>
> >> Thanks,
> >> NeilBrown
> >
> > 2.6.20 with the patch you supplied in response to the "md6_raid5 crash
> > email" I posted in linux-raid a few days ago. Just as background, I replaced
> > the failing drive and at the same time bought an additional drive in order
> > to increase the array size.
> >
> > mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).
> >
> > Also, I've just noticed another drive failure with the new array with a
> > similar error to what happened during the grow operation (although on a
> > different drive) - I wonder if I should post this to linux-ide?
> >
> > Feb 18 00:58:10 xerces kernel: ata4: command timeout
> > Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
> > Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI
> > SK/ASC/ASCQ 0xb/00/00
> > Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
> > Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code =
> > 0x08000002
> > Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted
> > Command
> > Feb 18 00:58:10 xerces kernel:     Additional sense: No additional sense
> > information
> > Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors
> > (in hex):
> > Feb 18 00:58:10 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00
> > 00 00 00 00
> > Feb 18 00:58:10 xerces kernel:         00 00 00 00
> > Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector
> > 35666775
> > Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling
> > device. Operation continuing on 3 devices
> >
> > Regards,
> > Marc
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> Just out of curiosity:
> 
> Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd,
>  sector 35666775
> 
> Can you run:
> 
> smartctl -d ata -t short /dev/sdd
> wait 5 min
> smartctl -d ata -t long /dev/sdd
> wait 2-3 hr
> smartctl -d ata -a /dev/sdd
> 
> And then e-mail that output to the list?
> 
> Justin.

I have smartmontools performing regular short and long scans but I will run 
the tests immediately and send the output of smartctl -a when done.
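
For reference, recurring self-tests like these are typically scheduled from
/etc/smartd.conf; a minimal sketch (device names and test times are
illustrative, not the actual config from this thread):

```
# /etc/smartd.conf sketch (illustrative):
# -a        monitor all SMART attributes and log changes
# -d ata    force the ATA device type, as with the smartctl runs above
# -s ...    short self-test daily at 02:00, long self-test Saturdays at 03:00
/dev/sdc -d ata -a -s (S/../.././02|L/../../6/03)
/dev/sdd -d ata -a -s (S/../.././02|L/../../6/03)
```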

Note I'm getting similar errors on sdc too (as in 5 minutes ago). 
Interestingly the SMART error logs for sdc and sdd show no errors at all.

ata3: command timeout
ata3: no sense translation for status: 0x40
ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x40 { DriveReady }
sd 3:0:0:0: SCSI error: return code = 0x08000002
sdd: Current [descriptor]: sense key: Aborted Command
     Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
         00 00 00 00
end_request: I/O error, dev sdc, sector 260419647
raid5:md6: read error corrected (8 sectors at 260419584 on sdc1)

Will post logs when done...

Marc

--
------- End of Forwarded Message -------


--

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-18 12:13           ` Justin Piszcz
  2007-02-18 12:32             ` Marc Marais
@ 2007-02-19  5:41             ` Marc Marais
  2007-02-19 13:25               ` Justin Piszcz
  1 sibling, 1 reply; 15+ messages in thread
From: Marc Marais @ 2007-02-19  5:41 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid

On Sun, 18 Feb 2007 07:13:28 -0500 (EST), Justin Piszcz wrote
> On Sun, 18 Feb 2007, Marc Marais wrote:
> 
> > On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
> >> On Sunday February 18, marcm@liquid-nexus.net wrote:
> >>> Ok, I understand the risks which is why I did a full backup before doing
> >>> this. I have subsequently recreated the array and restored my data from
> >>> backup.
> >>
> >> Could you still please tell me exactly what kernel/mdadm version you
> >> were using?
> >>
> >> Thanks,
> >> NeilBrown
> >
> > 2.6.20 with the patch you supplied in response to the "md6_raid5 crash
> > email" I posted in linux-raid a few days ago. Just as background, I replaced
> > the failing drive and at the same time bought an additional drive in order
> > to increase the array size.
> >
> > mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).
> >
> > Also, I've just noticed another drive failure with the new array with a
> > similar error to what happened during the grow operation (although on a
> > different drive) - I wonder if I should post this to linux-ide?
> >
> > Feb 18 00:58:10 xerces kernel: ata4: command timeout
> > Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
> > Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI
> > SK/ASC/ASCQ 0xb/00/00
> > Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
> > Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code =
> > 0x08000002
> > Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted
> > Command
> > Feb 18 00:58:10 xerces kernel:     Additional sense: No additional sense
> > information
> > Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors
> > (in hex):
> > Feb 18 00:58:10 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00
> > 00 00 00 00
> > Feb 18 00:58:10 xerces kernel:         00 00 00 00
> > Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector
> > 35666775
> > Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling
> > device. Operation continuing on 3 devices
> >
> > Regards,
> > Marc
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> Just out of curiosity:
> 
> Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd,
>  sector 35666775
> 
> Can you run:
> 
> smartctl -d ata -t short /dev/sdd
> wait 5 min
> smartctl -d ata -t long /dev/sdd
> wait 2-3 hr
> smartctl -d ata -a /dev/sdd
> 
> And then e-mail that output to the list?
> 
> Justin.

Ok here we go:

/dev/sdd:

smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1600JB-00EVA0
Serial Number:    WD-WMAEK2751794
Firmware Version: 15.05R15
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Feb 19 14:38:16 2007 GMT-9
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (5073) seconds.
Offline data collection
capabilities: 			 (0x79) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  67) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   148   144   021    Pre-fail  Always       -       3141
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       91
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5070
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       90
194 Temperature_Celsius     0x0022   116   253   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       691         -
# 2  Extended offline    Completed without error       00%       686         -
# 3  Short offline       Completed without error       00%       685         -
# 4  Short offline       Completed without error       00%       620         -
# 5  Extended offline    Completed without error       00%       598         -
# 6  Short offline       Completed without error       00%       597         -
# 7  Short offline       Completed without error       00%       573         -
# 8  Short offline       Completed without error       00%       549         -
# 9  Short offline       Completed without error       00%       525         -
#10  Short offline       Completed without error       00%       501         -
#11  Short offline       Completed without error       00%       477         -
#12  Short offline       Completed without error       00%       453         -
#13  Short offline       Completed without error       00%       382         -
#14  Short offline       Completed without error       00%       358         -
#15  Short offline       Completed without error       00%       334         -
#16  Short offline       Completed without error       00%       310         -
#17  Short offline       Completed without error       00%       286         -
#18  Extended offline    Completed without error       00%       264         -
#19  Short offline       Completed without error       00%       263         -
#20  Short offline       Completed without error       00%       239         -
#21  Short offline       Completed without error       00%       215         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

--
/dev/sdc:

smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1600JB-00REA0
Serial Number:    WD-WCANM4681863
Firmware Version: 20.00K20
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Feb 19 14:38:11 2007 GMT-9
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (4980) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  60) minutes.
Conveyance self-test routine
recommended polling time: 	 (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   184   184   021    Pre-fail  Always       -       3775
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4834
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
194 Temperature_Celsius     0x0022   114   095   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4823         -
# 2  Extended offline    Completed without error       00%      4819         -
# 3  Short offline       Completed without error       00%      4817         -
# 4  Short offline       Completed without error       00%      4799         -
# 5  Short offline       Completed without error       00%      4775         -
# 6  Short offline       Completed without error       00%      4751         -
# 7  Extended offline    Completed without error       00%      4728         -
# 8  Short offline       Completed without error       00%      4727         -
# 9  Short offline       Completed without error       00%      4703         -
#10  Short offline       Completed without error       00%      4679         -
#11  Short offline       Completed without error       00%      4655         -
#12  Short offline       Completed without error       00%      4631         -
#13  Short offline       Completed without error       00%      4607         -
#14  Short offline       Completed without error       00%      4583         -
#15  Short offline       Completed without error       00%      4511         -
#16  Short offline       Completed without error       00%      4487         -
#17  Short offline       Completed without error       00%      4463         -
#18  Short offline       Completed without error       00%      4439         -
#19  Short offline       Completed without error       00%      4415         -
#20  Extended offline    Completed without error       00%      4393         -
#21  Short offline       Completed without error       00%      4391         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
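
Both drives report PASSED overall health and empty error logs, so the values
worth machine-checking in dumps like the two above are the media-damage
counters. A sketch (not from the thread; the awk field positions assume
smartctl's standard 10-column `-A` attribute table, as shown above):

```shell
# check_attrs: read a smartctl -A attribute table on stdin and warn on any
# nonzero raw value for the attributes that track real media trouble.
# $2 is the attribute name, $10 the raw value in smartctl's 10-column layout.
check_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ {
        if ($10 + 0 > 0) printf "WARN %s raw=%s\n", $2, $10
    }'
}

# Hypothetical usage: smartctl -d ata -A /dev/sdc | check_attrs
```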

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mdadm --grow failed
  2007-02-19  5:41             ` mdadm --grow failed Marc Marais
@ 2007-02-19 13:25               ` Justin Piszcz
  0 siblings, 0 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-02-19 13:25 UTC (permalink / raw)
  To: Marc Marais; +Cc: linux-raid



On Mon, 19 Feb 2007, Marc Marais wrote:

> On Sun, 18 Feb 2007 07:13:28 -0500 (EST), Justin Piszcz wrote
>> On Sun, 18 Feb 2007, Marc Marais wrote:
>>
>>> On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
>>>> On Sunday February 18, marcm@liquid-nexus.net wrote:
>>>>> Ok, I understand the risks which is why I did a full backup before doing
>>>>> this. I have subsequently recreated the array and restored my data from
>>>>> backup.
>>>>
>>>> Could you still please tell me exactly what kernel/mdadm version you
>>>> were using?
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>
>>> 2.6.20 with the patch you supplied in response to the "md6_raid5 crash
>>> email" I posted in linux-raid a few days ago. Just as background, I replaced
>>> the failing drive and at the same time bought an additional drive in order
>>> to increase the array size.
>>>
>>> mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).
>>>
>>> Also, I've just noticed another drive failure with the new array with a
>>> similar error to what happened during the grow operation (although on a
>>> different drive) - I wonder if I should post this to linux-ide?
>>>
>>> Feb 18 00:58:10 xerces kernel: ata4: command timeout
>>> Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
>>> Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI
>>> SK/ASC/ASCQ 0xb/00/00
>>> Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
>>> Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code =
>>> 0x08000002
>>> Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted
>>> Command
>>> Feb 18 00:58:10 xerces kernel:     Additional sense: No additional sense
>>> information
>>> Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors
>>> (in hex):
>>> Feb 18 00:58:10 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00
>>> 00 00 00 00
>>> Feb 18 00:58:10 xerces kernel:         00 00 00 00
>>> Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector
>>> 35666775
>>> Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling
>>> device. Operation continuing on 3 devices
>>>
>>> Regards,
>>> Marc
>>>
>>> -
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> Just out of curiosity:
>>
>> Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd,
>>  sector 35666775
>>
>> Can you run:
>>
>> smartctl -d ata -t short /dev/sdd
>> wait 5 min
>> smartctl -d ata -t long /dev/sdd
>> wait 2-3 hr
>> smartctl -d ata -a /dev/sdd
>>
>> And then e-mail that output to the list?
>>
>> Justin.
>
> Ok here we go:
>
> /dev/sdd:
>
> smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model:     WDC WD1600JB-00EVA0
> Serial Number:    WD-WMAEK2751794
> Firmware Version: 15.05R15
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   6
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Mon Feb 19 14:38:16 2007 GMT-9
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x84)	Offline data collection activity
> 					was suspended by an interrupting command from host.
> 					Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0)	The previous self-test routine completed
> 					without error or no self-test has ever
> 					been run.
> Total time to complete Offline
> data collection: 		 (5073) seconds.
> Offline data collection
> capabilities: 			 (0x79) SMART execute Offline immediate.
> 					No Auto Offline data collection support.
> 					Suspend Offline collection upon new
> 					command.
> 					Offline surface scan supported.
> 					Self-test supported.
> 					Conveyance Self-test supported.
> 					Selective Self-test supported.
> SMART capabilities:            (0x0003)	Saves SMART data before entering
> 					power-saving mode.
> 					Supports SMART auto save timer.
> Error logging capability:        (0x01)	Error logging supported.
> 					No General Purpose Logging support.
> Short self-test routine
> recommended polling time: 	 (   2) minutes.
> Extended self-test routine
> recommended polling time: 	 (  67) minutes.
> Conveyance self-test routine
> recommended polling time: 	 (   5) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   148   144   021    Pre-fail  Always       -       3141
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       91
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5070
>  10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       90
> 194 Temperature_Celsius     0x0022   116   253   000    Old_age   Always       -       34
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%       691         -
> # 2  Extended offline    Completed without error       00%       686         -
> # 3  Short offline       Completed without error       00%       685         -
> # 4  Short offline       Completed without error       00%       620         -
> # 5  Extended offline    Completed without error       00%       598         -
> # 6  Short offline       Completed without error       00%       597         -
> # 7  Short offline       Completed without error       00%       573         -
> # 8  Short offline       Completed without error       00%       549         -
> # 9  Short offline       Completed without error       00%       525         -
> #10  Short offline       Completed without error       00%       501         -
> #11  Short offline       Completed without error       00%       477         -
> #12  Short offline       Completed without error       00%       453         -
> #13  Short offline       Completed without error       00%       382         -
> #14  Short offline       Completed without error       00%       358         -
> #15  Short offline       Completed without error       00%       334         -
> #16  Short offline       Completed without error       00%       310         -
> #17  Short offline       Completed without error       00%       286         -
> #18  Extended offline    Completed without error       00%       264         -
> #19  Short offline       Completed without error       00%       263         -
> #20  Short offline       Completed without error       00%       239         -
> #21  Short offline       Completed without error       00%       215         -
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>    1        0        0  Not_testing
>    2        0        0  Not_testing
>    3        0        0  Not_testing
>    4        0        0  Not_testing
>    5        0        0  Not_testing
> Selective self-test flags (0x0):
>  After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> --
> /dev/sdc:
>
> smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model:     WDC WD1600JB-00REA0
> Serial Number:    WD-WCANM4681863
> Firmware Version: 20.00K20
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   7
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Mon Feb 19 14:38:11 2007 GMT-9
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x85)	Offline data collection activity
> 					was aborted by an interrupting command from host.
> 					Auto Offline Data Collection: Enabled.
> Self-test execution status:      (   0)	The previous self-test routine completed
> 					without error or no self-test has ever
> 					been run.
> Total time to complete Offline
> data collection: 		 (4980) seconds.
> Offline data collection
> capabilities: 			 (0x7b) SMART execute Offline immediate.
> 					Auto Offline data collection on/off support.
> 					Suspend Offline collection upon new
> 					command.
> 					Offline surface scan supported.
> 					Self-test supported.
> 					Conveyance Self-test supported.
> 					Selective Self-test supported.
> SMART capabilities:            (0x0003)	Saves SMART data before entering
> 					power-saving mode.
> 					Supports SMART auto save timer.
> Error logging capability:        (0x01)	Error logging supported.
> 					General Purpose Logging supported.
> Short self-test routine
> recommended polling time: 	 (   2) minutes.
> Extended self-test routine
> recommended polling time: 	 (  60) minutes.
> Conveyance self-test routine
> recommended polling time: 	 (   6) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
>  3 Spin_Up_Time            0x0003   184   184   021    Pre-fail  Always       -       3775
>  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       19
>  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
>  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4834
> 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
> 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
> 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
> 194 Temperature_Celsius     0x0022   114   095   000    Old_age   Always       -       33
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed without error       00%      4823         -
> # 2  Extended offline    Completed without error       00%      4819         -
> # 3  Short offline       Completed without error       00%      4817         -
> # 4  Short offline       Completed without error       00%      4799         -
> # 5  Short offline       Completed without error       00%      4775         -
> # 6  Short offline       Completed without error       00%      4751         -
> # 7  Extended offline    Completed without error       00%      4728         -
> # 8  Short offline       Completed without error       00%      4727         -
> # 9  Short offline       Completed without error       00%      4703         -
> #10  Short offline       Completed without error       00%      4679         -
> #11  Short offline       Completed without error       00%      4655         -
> #12  Short offline       Completed without error       00%      4631         -
> #13  Short offline       Completed without error       00%      4607         -
> #14  Short offline       Completed without error       00%      4583         -
> #15  Short offline       Completed without error       00%      4511         -
> #16  Short offline       Completed without error       00%      4487         -
> #17  Short offline       Completed without error       00%      4463         -
> #18  Short offline       Completed without error       00%      4439         -
> #19  Short offline       Completed without error       00%      4415         -
> #20  Extended offline    Completed without error       00%      4393         -
> #21  Short offline       Completed without error       00%      4391         -
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>    1        0        0  Not_testing
>    2        0        0  Not_testing
>    3        0        0  Not_testing
>    4        0        0  Not_testing
>    5        0        0  Not_testing
> Selective self-test flags (0x0):
>  After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>

Strange, that sounds like an interrupt problem to me, then. What does cat 
/proc/interrupts say? What does dmesg say? Any errors there? Your disks 
appear to be fine.
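A quick sketch of those two checks (the grep patterns are just suggestions; 
adjust them to match whatever your controller's IRQ line is actually named):

```shell
#!/bin/sh
# Show per-CPU interrupt counts. Look for the IRQ line belonging to the
# SATA controller (e.g. sata_promise) and run this twice during I/O to
# confirm the count is actually increasing.
cat /proc/interrupts

# Scan the kernel log for IRQ routing or libata complaints around the
# time of the timeout. dmesg may need root, and grep returns nonzero
# when nothing matches, so tolerate both.
dmesg 2>/dev/null | grep -iE 'irq|ata' || true
```

If the controller's interrupt count is stuck while commands are outstanding, 
that would fit the "command timeout with no medium error" symptom.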

Justin.


end of thread, other threads:[~2007-02-19 13:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-17  3:22 mdadm --grow failed Marc Marais
2007-02-17  8:40 ` Neil Brown
2007-02-18  9:20   ` Marc Marais
     [not found]     ` <17880.7869.963793.706096@notabene.brown>
     [not found]       ` <20070218105242.M29958@liquid-nexus.net>
2007-02-18 11:57         ` Fw: " Marc Marais
2007-02-18 12:13           ` Justin Piszcz
2007-02-18 12:32             ` Marc Marais
2007-02-19  4:43               ` sata_promise: random/intermittent errors Marc Marais
2007-02-19  5:41             ` mdadm --grow failed Marc Marais
2007-02-19 13:25               ` Justin Piszcz
2007-02-19  0:50     ` Neil Brown
2007-02-17 18:27 ` Bill Davidsen
2007-02-17 19:16   ` Justin Piszcz
2007-02-17 21:08     ` Neil Brown
2007-02-17 21:30       ` Justin Piszcz
2007-02-18 11:51 ` David Greaves
