* Fw: sdc1 does not have a valid v0.90 superblock, not importing!
@ 2010-08-10 21:35 Jon Hardcastle
  2010-08-10 21:41 ` Jon Hardcastle
  2010-08-11 22:01 ` Stefan /*St0fF*/ Hübner
  0 siblings, 2 replies; 9+ messages in thread
From: Jon Hardcastle @ 2010-08-10 21:35 UTC (permalink / raw)
  To: linux-raid

Help!

Long story short - I was watching a movie off my RAID6 array. Got a SMART error warning:

'Device: /dev/sdc [SAT], ATA error count increased from 30 to 31'

I went to investigate and found:

Error 31 occurred at disk power-on lifetime: 8461 hours (352 days + 13 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 28 50 bd 49 47

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 38 08 3f bd 49 40 08      00:38:33.100  WRITE FPDMA QUEUED
  61 08 00 7f bd 49 40 08      00:38:33.100  WRITE FPDMA QUEUED
  61 08 00 97 bd 49 40 08      00:38:33.000  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08      00:38:33.000  FLUSH CACHE EXT
  61 08 00 bf 4b 38 40 08      00:38:33.000  WRITE FPDMA QUEUED
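
(That is the drive's ATA error log - something like this shows it, assuming smartmontools:)

  smartctl -l error /dev/sdc    # or 'smartctl -a /dev/sdc' for the full report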

I then emailed myself some error logs and shut the machine down. This drive has caused me problems before - the last time was when the cat knocked the computer over and dislodged the controller card. But several 'echo check > sync_action' runs and several weeks later, I have not had a peep out of it.
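
(By 'echo check', I mean this sort of thing, assuming the array is md4:)

  echo check > /sys/block/md4/md/sync_action   # kick off an md consistency check; progress in /proc/mdstat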

ANYWAYS... after the reboot the array won't assemble (is that normal?)

Aug 10 22:00:07 mangalore kernel: md: running: <sdg1><sdf1><sde1><sdd1><sdb1><sda1>
Aug 10 22:00:07 mangalore kernel: raid5: md4 is not clean -- starting background reconstruction
Aug 10 22:00:07 mangalore kernel: raid5: device sdg1 operational as raid disk 0
Aug 10 22:00:07 mangalore kernel: raid5: device sdf1 operational as raid disk 6
Aug 10 22:00:07 mangalore kernel: raid5: device sde1 operational as raid disk 2
Aug 10 22:00:07 mangalore kernel: raid5: device sdd1 operational as raid disk 4
Aug 10 22:00:07 mangalore kernel: raid5: device sdb1 operational as raid disk 5
Aug 10 22:00:07 mangalore kernel: raid5: device sda1 operational as raid disk 1
Aug 10 22:00:07 mangalore kernel: raid5: allocated 7343kB for md4
Aug 10 22:00:07 mangalore kernel: 0: w=1 pa=0 pr=7 m=2 a=2 r=7 op1=0 op2=0
Aug 10 22:00:07 mangalore kernel: 6: w=2 pa=0 pr=7 m=2 a=2 r=7 op1=0 op2=0
Aug 10 22:00:07 mangalore kernel: 2: w=3 pa=0 pr=7 m=2 a=2 r=7 op1=0 op2=0
Aug 10 22:00:07 mangalore kernel: 4: w=4 pa=0 pr=7 m=2 a=2 r=7 op1=0 op2=0
Aug 10 22:00:07 mangalore kernel: 5: w=5 pa=0 pr=7 m=2 a=2 r=7 op1=0 op2=0
Aug 10 22:00:07 mangalore kernel: 1: w=6 pa=0 pr=7 m=2 a=2 r=7 op1=0 op2=0
Aug 10 22:00:07 mangalore kernel: raid5: cannot start dirty degraded array for md4
Aug 10 22:00:07 mangalore kernel: RAID5 conf printout:
Aug 10 22:00:07 mangalore kernel: --- rd:7 wd:6
Aug 10 22:00:07 mangalore kernel: disk 0, o:1, dev:sdg1
Aug 10 22:00:07 mangalore kernel: disk 1, o:1, dev:sda1
Aug 10 22:00:07 mangalore kernel: disk 2, o:1, dev:sde1
Aug 10 22:00:07 mangalore kernel: disk 4, o:1, dev:sdd1
Aug 10 22:00:07 mangalore kernel: disk 5, o:1, dev:sdb1
Aug 10 22:00:07 mangalore kernel: disk 6, o:1, dev:sdf1
Aug 10 22:00:07 mangalore kernel: raid5: failed to run raid set md4
Aug 10 22:00:07 mangalore kernel: md: pers->run() failed ...
Aug 10 22:00:07 mangalore kernel: md: do_md_run() returned -5
Aug 10 22:00:07 mangalore kernel: md: md4 stopped.

It appears sdc has an invalid superblock?

This is the 'examine' output from sdc1 (note the checksum):

/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7438efd1:9e6ca2b5:d6b88274:7003b1d3
  Creation Time : Thu Oct 11 00:01:49 2007
     Raid Level : raid6
  Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 2441919680 (2328.80 GiB 2500.53 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 4

    Update Time : Tue Aug 10 21:39:49 2010
          State : active
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b335b4e3 - expected b735b4e3
         Events : 1860555

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       33        3      active sync   /dev/sdc1

   0     0       8       97        0      active sync   /dev/sdg1
   1     1       8        1        1      active sync   /dev/sda1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       33        3      active sync   /dev/sdc1
   4     4       8       49        4      active sync   /dev/sdd1
   5     5       8       17        5      active sync   /dev/sdb1
   6     6       8       81        6      active sync   /dev/sdf1

Anyways... I am ASSUMING mdadm has not assembled the array to be on the safe side? I have not done anything.. no --force... no --assume-clean.. I wanted to be sure?

Should I remove sdc1 from the array? It should then assemble? I have 2 spare drives that I am getting around to using to replace this drive and the other 500GB.. so should I remove sdc1... and try and re-add, or just put the new drive in?

At the moment I have 'stop'ped the array and got badblocks running....
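
(Roughly, assuming the suspect drive is still sdc - the badblocks run is the read-only kind:)

  mdadm --stop /dev/md4     # stop the inactive array
  badblocks -sv /dev/sdc    # non-destructive read-only surface scan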



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-10 21:35 Fw: sdc1 does not have a valid v0.90 superblock, not importing! Jon Hardcastle
@ 2010-08-10 21:41 ` Jon Hardcastle
  2010-08-11 22:01 ` Stefan /*St0fF*/ Hübner
  1 sibling, 0 replies; 9+ messages in thread
From: Jon Hardcastle @ 2010-08-10 21:41 UTC (permalink / raw)
  To: linux-raid, Jon

I should ADD to this story... that when I powered the machine down... it appeared to HANG, so I had to press and hold to get it to shut down!!!!




-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'Do not worry about tomorrow, for tomorrow will bring worries of its own.'

***********
Please note, I am phasing out jd_hardcastle AT yahoo.com and replacing it with jon AT eHardcastle.com
***********

-----------------------


--- On Tue, 10/8/10, Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:

> From: Jon Hardcastle <jd_hardcastle@yahoo.com>
> Subject: Fw: sdc1 does not have a valid v0.90 superblock, not importing!
> To: linux-raid@vger.kernel.org
> Date: Tuesday, 10 August, 2010, 22:35
> [...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fw: sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-10 21:35 Fw: sdc1 does not have a valid v0.90 superblock, not importing! Jon Hardcastle
  2010-08-10 21:41 ` Jon Hardcastle
@ 2010-08-11 22:01 ` Stefan /*St0fF*/ Hübner
  2010-08-11 22:56   ` Neil Brown
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2010-08-11 22:01 UTC (permalink / raw)
  To: Jon; +Cc: Jon Hardcastle, linux-raid

I had exactly the same problem this week with a customer raid.  Solved
it via:
- calculate the hardware block where the superblock resides
- dd if=/dev/sdXY of=superblock bs=512 skip=block_of_superblock count=8
- hexedit superblock checksum
- dd if=superblock of=/dev/sdXY bs=512 seek=block_of_superblock

this is not the correct way to go.  But noticing that only ONE BIT was
flipped in the checksum, while all the other EXAMINE information seemed
right, I thought it was the only way to go to get hold of the data on the
array.
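
For reference, a sketch of how that block can be calculated (the v0.90 superblock is 4 KiB, sitting on a 64 KiB boundary between 64 and 128 KiB before the end of the member device; bash and /dev/sdc1 are just examples):

  SIZE=$(blockdev --getsz /dev/sdc1)   # member size in 512-byte sectors
  SB=$(( (SIZE & ~127) - 128 ))        # round down to 64 KiB, back off 64 KiB
  dd if=/dev/sdc1 of=superblock bs=512 skip=$SB count=8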

hope it helps,
Stefan


On 10.08.2010 23:35, Jon Hardcastle wrote:
> [...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-11 22:01 ` Stefan /*St0fF*/ Hübner
@ 2010-08-11 22:56   ` Neil Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2010-08-11 22:56 UTC (permalink / raw)
  To: st0ff; +Cc: stefan.huebner, Jon, Jon Hardcastle, linux-raid

On Thu, 12 Aug 2010 00:01:45 +0200
Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de> wrote:

> I had exactly the same problem this week with a customer raid.  Solved
> it via:
> - calculate the hardware block where the superblock resides
> - dd if=/dev/sdXY of=superblock bs=512 skip=block_of_superblock count=8
> - hexedit superblock checksum
> - dd if=superblock of=/dev/sdXY bs=512 seek=block_of_superblock
> 
> this is not the correct way to go.  But noticing that only ONE BIT was
> flipped in the checksum, while all the other EXAMINE information seemed
> right, I thought it was the only way to go to get hold of the data on the
> array.

I hope you realise that if one bit is wrong in the checksum, it means there
is a very good chance that one bit is wrong somewhere else in the superblock.

Maybe this was a bit that was ignored.  Or maybe not.

I guess if you checked the output of --examine very thoroughly you should be
safe, but it is worth remembering that the checksum just shows the corruption;
it probably isn't the source of the corruption.

NeilBrown


> 
> hope it helps,
> Stefan
> 
> 
> On 10.08.2010 23:35, Jon Hardcastle wrote:
> > [...]


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re:  sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-11 11:34     ` Neil Brown
  2010-08-11 12:29       ` Jon Hardcastle
@ 2010-08-11 15:30       ` Jon Hardcastle
  1 sibling, 0 replies; 9+ messages in thread
From: Jon Hardcastle @ 2010-08-11 15:30 UTC (permalink / raw)
  To: Jon, Neil Brown; +Cc: linux-raid

--- On Wed, 11/8/10, Neil Brown <neilb@suse.de> wrote:

> From: Neil Brown <neilb@suse.de>
> Subject: Re:  sdc1 does not have a valid v0.90 superblock, not importing!
> To: Jon@eHardcastle.com
> Cc: jd_hardcastle@yahoo.com, linux-raid@vger.kernel.org
> Date: Wednesday, 11 August, 2010, 12:34
> [...]

For the benefit of those that follow! I assembled the array by specifying exactly the drives I wanted in it.. once assembled, I could confidently zero the superblock of the troublesome drive... without it rearing its ugly head again!
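
Roughly, what that amounted to (a sketch; member names as in my earlier logs):

  mdadm --assemble --force /dev/md4 /dev/sdg1 /dev/sda1 /dev/sde1 /dev/sdd1 /dev/sdb1 /dev/sdf1
  mdadm --zero-superblock /dev/sdc1   # only once the array is running without sdc1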

I now have a one-drive-down, degraded RAID6 array.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re:  sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-11 11:34     ` Neil Brown
@ 2010-08-11 12:29       ` Jon Hardcastle
  2010-08-11 15:30       ` Jon Hardcastle
  1 sibling, 0 replies; 9+ messages in thread
From: Jon Hardcastle @ 2010-08-11 12:29 UTC (permalink / raw)
  To: Jon, Neil Brown; +Cc: linux-raid

--- On Wed, 11/8/10, Neil Brown <neilb@suse.de> wrote:

> From: Neil Brown <neilb@suse.de>
> Subject: Re:  sdc1 does not have a valid v0.90 superblock, not importing!
> To: Jon@eHardcastle.com
> Cc: jd_hardcastle@yahoo.com, linux-raid@vger.kernel.org
> Date: Wednesday, 11 August, 2010, 12:34
> [...]
> 
> > Could this be a result of me forcing a power off when the drive was causing problems?
> 
> Probably not.  Forcing a power off may well have left the array 'dirty' so
> that it wouldn't assemble, but is fairly unlikely to corrupt data within a
> block.
> 
> > 
> > What are the dangers to removing it, zeroing the superblock and re-adding? Is it MORE dangerous than leaving a RAID6 degraded for a few days?
> 
> In general, I would say the chance of a known-bad drive causing problems is
> greater than the chance of fewer known-good drives causing problems.
> But then you seem to think it isn't the drive, it was the controller and that
> is fixed...
> 
> This is really about your level of trust in the hardware.
> If you trust sdc as much as the others, include it in the
> array.
> If you don't, then don't.
> 
> NeilBrown

Hmmm ok. It isn't worth the risk. I can thrash the drive after I have replaced it.

OK so now I want to mark the drive as 'removed' but it is proving problematic as the array is not active?

# mdadm /dev/md4 --fail /dev/sdc1
mdadm: cannot get array info for /dev/md4


# mdadm --detail /dev/md4
mdadm: md device /dev/md4 does not appear to be active.

# mdadm --assemble /dev/md4
mdadm: failed to add /dev/sdc1 to /dev/md4: Invalid argument
mdadm: /dev/md4 assembled from 6 drives - not enough to start the array while not clean - consider --force.

I really wanted to fail it before trying to assemble the rest?
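
(As it turned out - see my follow-up - the answer was to skip the --fail step, since an inactive array has no member state to update, and to assemble explicitly from the six good members, roughly:)

  mdadm --stop /dev/md4
  mdadm --assemble --force /dev/md4 /dev/sdg1 /dev/sda1 /dev/sde1 /dev/sdd1 /dev/sdb1 /dev/sdf1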



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re:  sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-11 11:19   ` Jon Hardcastle
@ 2010-08-11 11:34     ` Neil Brown
  2010-08-11 12:29       ` Jon Hardcastle
  2010-08-11 15:30       ` Jon Hardcastle
  0 siblings, 2 replies; 9+ messages in thread
From: Neil Brown @ 2010-08-11 11:34 UTC (permalink / raw)
  To: Jon; +Cc: jd_hardcastle, linux-raid

On Wed, 11 Aug 2010 04:19:07 -0700 (PDT)
Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:

> 
> --- On Wed, 11/8/10, Neil Brown <neilb@suse.de> wrote:
> 
> > [...]
> > Only do that if you are confident that your hardware is working correctly.
> 
> Well I am reasonably sure the controller came adrift the first time.. when I reseated it I stopped getting 100s of errors.. and it has survived 1.5 badblocks checks. It is being held in place by one of those bars you press down (does all the expansion cards in 1 go) except I don't think it is very good. I will screw it down.
> 
> 
> > > It appears sdc has an invalid superblock?
> > > 
> > > This is the 'examine' from sdc1 (note the checksum)
> > > 
> > > /dev/sdc1:
> > .....
> > >       Checksum : b335b4e3 - expected b735b4e3
> 
> > Single bit error.  That isn't good as it means some bit of memory or some bit
> > on some bus somewhere cannot be trusted.
> > It could be a transient thing and will never happen again.  Or maybe not.
> > Given the smart errors and the fact that you have had problems with the drive
> > before, it seems very likely that the problem is in that drive.  I suggest
> > unplugging it and leaving it unplugged.  Some memory buffer in the drive is
> > probably marginal.  I don't think they use ECC memory.
> 
> Could this be a result of me forcing a power off when the drive was causing problems?

Probably not.  Forcing a power off may well have left the array 'dirty' so
that it wouldn't assemble, but is fairly unlikely to corrupt data within a
block.

> 
> What are the dangers to removing it, zeroing the superblock and re-adding? Is it MORE dangerous than leaving a RAID6 degraded for a few days?

In general, I would say the chance of a known-bad drive causing problems is
greater than the chance of fewer known-good drives causing problems.
But then you seem to think it isn't the drive, it was the controller and that
is fixed...

This is really about your level of trust in the hardware.
If you trust sdc as much as the others, include it in the array.
If you don't, then don't.

NeilBrown


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re:  sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-11 11:06 ` Neil Brown
@ 2010-08-11 11:19   ` Jon Hardcastle
  2010-08-11 11:34     ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Hardcastle @ 2010-08-11 11:19 UTC (permalink / raw)
  To: Jon, Neil Brown; +Cc: linux-raid


--- On Wed, 11/8/10, Neil Brown <neilb@suse.de> wrote:

> From: Neil Brown <neilb@suse.de>
> Subject: Re:  sdc1 does not have a valid v0.90 superblock, not importing!
> To: Jon@eHardcastle.com
> Cc: jd_hardcastle@yahoo.com, linux-raid@vger.kernel.org
> Date: Wednesday, 11 August, 2010, 12:06
> On Wed, 11 Aug 2010 02:55:44 -0700 (PDT)
> Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:
> 
> > (my first attempt appears to have been bounced as the spam checker thought it had HTML in it?!)
> 
> odd... came through ok for me the first time.
> 
> > 
> > Help!
> > 
> > Long story short - I was watching a movie off my RAID6 array. Got a SMART error warning
> 
> > Aug 10 22:00:07 mangalore kernel: raid5: cannot start dirty degraded array for md4
> 
> This is the current problem.  The array is dirty and degraded so there could
> theoretically be undetectable corruption.  Chance is quite low but it is
> there so md won't start without you acknowledging the risk by giving the
> --force flag to mdadm --assemble.
> Only do that if you are confident that your hardware is working correctly.

Well I am reasonably sure the controller came adrift the first time.. when I reseated it I stopped getting 100s of errors.. and it has survived 1.5 badblocks checks. It is being held in place by one of those bars you press down (does all the expansion cards in 1 go) except I don't think it is very good. I will screw it down.

> 
> > It appears sdc has an invalid superblock?
> > 
> > This is the 'examine' from sdc1 (note the checksum)
> > 
> > /dev/sdc1:
> .....
> >       Checksum : b335b4e3 - expected b735b4e3
> 
> Single bit error.  That isn't good as it means some bit of memory or some bit
> on some bus somewhere cannot be trusted.
> It could be a transient thing and will never happen again.  Or maybe not.
> Given the smart errors and the fact that you have had problems with the drive
> before, it seems very likely that the problem is in that drive.  I suggest
> unplugging it and leaving it unplugged.  Some memory buffer in the drive is
> probably marginal.  I don't think they use ECC memory.

Could this be a result of me forcing a power off when the drive was causing problems?

What are the dangers to removing it, zeroing the superblock and re-adding? Is it MORE dangerous than leaving a RAID6 degraded for a few days?

> 
> > 
> > Anyways... I am ASSUMING mdadm has not assembled the array to be on the safe side? I have not done anything.. no force... no assume clean.. I wanted to be sure?
> 
> You assume correctly.
> 
> > 
> > Should I remove sdc1 from the array? It should then assemble? I have 2 spare drives that I am getting around to using to replace this drive and the other 500GB.. so should I remove sdc1... and try and re-add, or just put the new drive in?
> > 
> > atm I have 'stop'ped the array and got badblocks running....
> > 
> 
> Remove sdc and assemble the array with --force, and get a new device to
> replace /dev/sdc as soon as possible.

Thanks Neil - I panicked, as previously it has mounted the array in a degraded state... but previously the drive has disappeared completely... whereas in this case it is present... but wrong!

> 
> NeilBrown

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re:  sdc1 does not have a valid v0.90 superblock, not importing!
  2010-08-11  9:55 Sorry if Spamming! - " Jon Hardcastle
@ 2010-08-11 11:06 ` Neil Brown
  2010-08-11 11:19   ` Jon Hardcastle
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2010-08-11 11:06 UTC (permalink / raw)
  To: Jon; +Cc: jd_hardcastle, linux-raid

On Wed, 11 Aug 2010 02:55:44 -0700 (PDT)
Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:

> (my first attempt appears to have been bounced as the spam checker thought it had HTML in it?!)

odd... came through ok for me the first time.

> 
> Help!
> 
> Long story short - I was watching a movie off my RAID6 array. Got a smart error warning

> Aug 10 22:00:07 mangalore kernel: raid5: cannot start dirty degraded array for md4

This is the current problem.  The array is dirty and degraded so there could
theoretically be undetectable corruption.  Chance is quite low but it is
there so md won't start without you acknowledging the risk by giving the
--force flag to mdadm --assemble.
Only do that if you are confident that your hardware is working correctly.

> It appears sdc has an invalid superblock?
> 
> This is the 'examine' from sdc1 (note the checksum)
> 
> /dev/sdc1:
.....
>       Checksum : b335b4e3 - expected b735b4e3

Single bit error.  That isn't good as it means some bit of memory or some bit
on some bus somewhere cannot be trusted.
It could be a transient thing and will never happen again.  Or maybe not.
Given the smart errors and the fact that you have had problems with the drive
before, it seems very likely that the problem is in that drive.  I suggest
unplugging it and leaving it unplugged.  Some memory buffer in the drive is
probably marginal.  I don't think they use ECC memory.

> 
> Anyways... I am ASSUMING mdadm has not assembled the array to be on the safe side? I have not done anything.. no force... no assume clean.. I wanted to be sure?

You assume correctly.

> 
> Should I remove sdc1 from the array? It should then assemble? I have 2 spare drives that I am getting around to using to replace this drive and the other 500GB.. so should I remove sdc1... and try and re-add, or just put the new drive in?
> 
> atm I have 'stop'ped the array and got badblocks running....
> 

Remove sdc and assemble the array with --force, and get a new device to
replace /dev/sdc as soon as possible.
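
That is, something like this (a sketch; substitute your actual good member devices):

  mdadm --assemble --force /dev/md4 /dev/sdg1 /dev/sda1 /dev/sde1 /dev/sdd1 /dev/sdb1 /dev/sdf1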

NeilBrown

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-08-11 22:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-10 21:35 Fw: sdc1 does not have a valid v0.90 superblock, not importing! Jon Hardcastle
2010-08-10 21:41 ` Jon Hardcastle
2010-08-11 22:01 ` Stefan /*St0fF*/ Hübner
2010-08-11 22:56   ` Neil Brown
2010-08-11  9:55 Sorry if Spamming! - " Jon Hardcastle
2010-08-11 11:06 ` Neil Brown
2010-08-11 11:19   ` Jon Hardcastle
2010-08-11 11:34     ` Neil Brown
2010-08-11 12:29       ` Jon Hardcastle
2010-08-11 15:30       ` Jon Hardcastle
