* faulty array member
@ 2010-11-18 17:08 Roberto Nunnari
  2010-11-18 19:29 ` Tim Small
  2010-11-22  4:05 ` Neil Brown
  0 siblings, 2 replies; 4+ messages in thread
From: Roberto Nunnari @ 2010-11-18 17:08 UTC (permalink / raw)
  To: linux-raid

Hello.

I have a Linux file server with two 1TB SATA disks
in software RAID1.

As my drives are no longer in full health, md has put
one array member into the faulty state.

A bit about my environment:

# uname -rms
Linux 2.6.9-89.0.18.ELsmp i686

# cat /etc/redhat-release
CentOS release 4.8 (Final)


# parted /dev/sda print
Disk geometry for /dev/sda: 0.000-953869.710 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.031    251.015  primary   ext3        boot
2        251.016  40248.786  primary   ext3        raid
3      40248.787  42296.132  primary   linux-swap
4      42296.133 953867.219  primary   ext3        raid

# parted /dev/sdb print
Disk geometry for /dev/sdb: 0.000-953869.710 megabytes
Disk label type: msdos
Minor    Start       End     Type      Filesystem  Flags
1          0.031  39997.771  primary   ext3        boot, raid
2      39997.771  42045.117  primary   linux-swap
3      42045.117  42296.132  primary   ext3
4      42296.133 953867.219  primary   ext3        raid


# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb4[1] sda4[0]
       933448704 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda2[2](F)
       40957568 blocks [2/1] [_U]
unused devices: <none>


Don't ask me why the two drives are not partitioned identically
and why md0 is assembled from sdb1+sda2; I have no idea.
It was set up that way by anaconda using kickstart during the install.


So, I was using debugfs:
# debugfs
debugfs 1.35 (28-Feb-2004)
debugfs:  open /dev/md0
debugfs:  testb 1736947
Block 1736947 marked in use
debugfs:  icheck 1736947
Block   Inode number
1736947 <block not found>
debugfs:  icheck 1736947 10
Block   Inode number
1736947 <block not found>
10      7

in an attempt to locate the bad disk blocks, and shortly after that
md put sda2 into the faulty state.
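
In case it matters, the usual way to turn an LBA reported by SMART into
an ext3 block number for debugfs testb/icheck is roughly the following
(all numbers here are only examples, not the real ones from my drives):

    LBA of first error (smartctl -l selftest /dev/sda):  7090378
    start sector of sda2 (fdisk -lu /dev/sda):            514080
    ext3 block size (tune2fs -l /dev/md0):                  4096

# echo $(( (7090378 - 514080) * 512 / 4096 ))
822037

With 0.90 metadata the RAID1 data starts at offset 0 of the partition,
so a block number within the partition is the same block number within
/dev/md0.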


Now, as smartctl is telling me that there are errors spread
across the partitions used in both arrays, I would like to take
a full backup of at least /dev/md1 (which is still healthy).

The question is:
Is there a way, and is it safe, to put /dev/sda2 back into
/dev/md0 so that I can be sure to back up even the blocks
that are unreadable on the healthy array member but are
probably still readable on the failed device?

Thank you for your time and help!

Best regards.
Robi


* Re: faulty array member
  2010-11-18 17:08 faulty array member Roberto Nunnari
@ 2010-11-18 19:29 ` Tim Small
       [not found]   ` <4CE640C6.3090607@supsi.ch>
  2010-11-22  4:05 ` Neil Brown
  1 sibling, 1 reply; 4+ messages in thread
From: Tim Small @ 2010-11-18 19:29 UTC (permalink / raw)
  To: Roberto Nunnari; +Cc: linux-raid

On 18/11/10 17:08, Roberto Nunnari wrote:
>
> md0 : active raid1 sdb1[1] sda2[2](F)
>       40957568 blocks [2/1] [_U]
>
> The question is:
> Is there a way, and is it safe, to put /dev/sda2 back into
> /dev/md0 so that I can be sure to back up even the blocks

If there have been writes to md0 since sda2 was failed, then no.  Your
best bet is probably to find out which sectors are bad on sdb1, and
then copy those individual blocks (only) over from sda2 (e.g. using
dd).  Once sdb1 has no pending sectors left, you should then be able to
re-add sda2 to md0.
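
Something along these lines, for one 4KiB filesystem block (untested;
the block number is just a placeholder, and this assumes 0.90 metadata,
i.e. the filesystem starts at offset 0 of both sda2 and sdb1, and that
md0 has seen no writes to that block since sda2 was failed):

# dd if=/dev/sda2 of=/tmp/blk bs=4096 count=1 skip=1736947
# dd if=/tmp/blk of=/dev/sdb1 bs=4096 count=1 seek=1736947

Writing the block forces the drive to remap the underlying pending
sector; repeat for each bad block.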

I don't know what the bad-block handling is like on the CentOS 4 kernel,
so if the rebuild doesn't work with the native kernel, you may want to
carry it out from a more recent boot CD or similar.

Tim.




* Re: faulty array member
       [not found]   ` <4CE640C6.3090607@supsi.ch>
@ 2010-11-19 10:47     ` Tim Small
  0 siblings, 0 replies; 4+ messages in thread
From: Tim Small @ 2010-11-19 10:47 UTC (permalink / raw)
  To: Roberto Nunnari; +Cc: linux-raid


>> then copy those individual blocks (only) over from sda2 (e.g. using
>> dd).  Once sdb1 has no pending sectors left, you should then be able to
>> re-add sda2 back into md0.
>
> that is exactly what I was trying to do.. but as the partitioning of
> the two drives is not the same, I don't know how to find the
> corresponding blocks/sectors on the two drives.

Under later kernels, you can definitely do something like:

cat /sys/block/sda/sda2/start

to find each partition's offset from the start of the drive.
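
For example (the numbers are placeholders, and 4096-byte ext3 blocks are
assumed), the absolute 512-byte sector of filesystem block N on a disk
is the partition's start sector plus N * 8:

# cat /sys/block/sda/sda2/start
514080
# echo $(( 514080 + 1736947 * 8 ))
14409656

On your 2.6.9 kernel that sysfs file may not exist; fdisk -lu /dev/sda
will show the same start sectors.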

>> I don't know what the bad-block remapping is like on the CentOS4 kernel,
>
> I believe in this case, the bad-block remapping is done by the
> hd firmware.. please correct me if I'm wrong.

Yes it is, but the remap is triggered by a write to the block, so for it
to happen automatically via md, the following has to occur:

md tries to read a sector, the read fails, so md reads the same sector
from the other drive and then writes that data back to the original drive.
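
You can watch that happening in the SMART counters, e.g.:

# smartctl -A /dev/sdb | egrep -i 'Reallocated_Sector|Current_Pending'

Current_Pending_Sector should drop towards 0 (and Reallocated_Sector_Ct
may rise) as the bad sectors get rewritten.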


> By the way, why is it not possible to re-add /dev/sda2 into the raid?

You should be able to do that, but md would in that case start a rebuild
from the other drive (i.e. copy every block from sdb1), and as there are
currently pending sectors on that drive, the rebuild would fail.  So you
have to eliminate those faulty sectors on the source drive first.
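
In other words, once sdb1 reads cleanly, something like this (device
names as in your /proc/mdstat) should bring the array back to [UU]:

# mdadm /dev/md0 --remove /dev/sda2
# mdadm /dev/md0 --add /dev/sda2
# cat /proc/mdstat

The --add starts a full resync copied from sdb1, which is exactly the
rebuild that would fail today while sdb1 still has pending sectors.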

Tim.

-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.  
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309



* Re: faulty array member
  2010-11-18 17:08 faulty array member Roberto Nunnari
  2010-11-18 19:29 ` Tim Small
@ 2010-11-22  4:05 ` Neil Brown
  1 sibling, 0 replies; 4+ messages in thread
From: Neil Brown @ 2010-11-22  4:05 UTC (permalink / raw)
  To: Roberto Nunnari; +Cc: linux-raid

On Thu, 18 Nov 2010 18:08:46 +0100
Roberto Nunnari <roberto.nunnari@supsi.ch> wrote:

> Hello.
> 
> I have a Linux file server with two 1TB SATA disks
> in software RAID1.
> 
> As my drives are no longer in full health, md has put
> one array member into the faulty state.

More accurately: one of your drives reported a write error, so md/raid put it
into the 'faulty' state.

> 
> A bit about my environment:
> 
> # uname -rms
> Linux 2.6.9-89.0.18.ELsmp i686

Wow, that's old!!

> 
> [...]
> 
> Now, as smartctl is telling me that there are errors spread
> across the partitions used in both arrays, I would like to take
> a full backup of at least /dev/md1 (which is still healthy).
> 
> The question is:
> Is there a way, and is it safe, to put /dev/sda2 back into
> /dev/md0 so that I can be sure to back up even the blocks
> that are unreadable on the healthy array member but are
> probably still readable on the failed device?
> 

You should get 'ddrescue' and carefully read the documentation.

Then 'ddrescue' from the best device to a new device, making sure to keep the
log file.
Then 'ddrescue' from the second best device to the same new device using the
same log file.  This will only copy blocks that couldn't be read from the
first device.
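
For example (a minimal sketch for md1's members; /dev/sdc4 is a
hypothetical same-sized partition on a new disk, and whichever member
SMART says is healthier should go first, sdb4 here being only a guess):

# ddrescue -f /dev/sdb4 /dev/sdc4 md1-rescue.log
# ddrescue -f /dev/sda4 /dev/sdc4 md1-rescue.log

Because both runs share the same log file, the second pass only retries
the areas the first pass could not read.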

NeilBrown



> Thank you for your time and help!
> 
> Best regards.
> Robi


