* raid6 recovery with read errors
From: Matthias Urlichs @ 2009-08-03 8:36 UTC (permalink / raw)
To: linux-raid
Hi,
I have a rather large RAID6 array in which multiple disks have developed
read errors. The problem is that the RAID6 is built from LVM-mapped disks,
which (I think) isolated the MD driver from write errors: MD believed that
its re-writes of the bad sectors succeeded when in fact they did not, so
the bad disks were never unmapped.
The problem is that yesterday, three of these TByte disks failed in
_exactly_ the same 1k-sized spot. So, no more RAID6. :-(
So, how do I get the data back?
I've copied the individual partitions with ddrescue, which conveniently
left me a log file pointing to the sectors which need to be recovered.
However, I'm sure that there's no way to tell the kernel about individual
"bad spots".
Is there a standalone program that can do that?
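(An illustrative sketch, not part of the thread: the ddrescue log mentioned
above is a mapfile whose data lines are "pos size status" triples in hex,
with '-' marking unreadable sectors, so extracting the ranges that need
recovery is a few lines of Python. Names here are my own.)

```python
# Sketch: parse a GNU ddrescue mapfile and list the bad ranges.
# Assumes the usual mapfile layout: '#' comment lines, then one
# current-position/status line, then "pos size status" triples.
def bad_ranges(mapfile_text):
    ranges = []
    lines = [l for l in mapfile_text.splitlines()
             if l.strip() and not l.startswith('#')]
    for line in lines[1:]:          # skip the current-status line
        pos, size, status = line.split()[:3]
        if status == '-':           # '-' marks unreadable (bad) sectors
            ranges.append((int(pos, 16), int(size, 16)))
    return ranges

example = """# Mapfile. Created by GNU ddrescue
0x00000000     +
0x00000000  0x00100000  +
0x00100000  0x00000400  -
0x00100400  0x0FF00000  +
"""
print(bad_ranges(example))   # the 1k bad spot at the 1 MiB mark
```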
--
Matthias Urlichs
* Re: raid6 recovery with read errors
From: Matthias Urlichs @ 2009-08-12 8:14 UTC (permalink / raw)
To: linux-raid
On Mon, 03 Aug 2009 08:36:06 +0000, Matthias Urlichs wrote:
> [ block-level restoration of RAID6 data ]
> Is there a standalone program that can do that?
Seems that the answer is "no", at least as far as anybody on this list is
concerned.
*Sigh* So I'll have to write one, it seems.
* [PATCH] Re: raid6 recovery with read errors
From: Matthias Urlichs @ 2009-09-22 10:56 UTC (permalink / raw)
To: linux-raid
>> [ block-level restoration of RAID6 data ] Is there a standalone program
>> that can do that?
>
> Seems that the answer is "no", at least as far as anybody on this list
> is concerned.
>
> *Sigh* So I'll have to write one, it seems.
The code is available from http://netz.smurf.noris.de/git/mdadm.git
(branch "repair"). This is off mdadm's 3.1 development branch; the bugfix
changes since 3.0 have been merged in.
Restoration of RAID5 and RAID6 partitions is tested, given a workable
bad-block list (i.e. ddrescue's log). Restoration of RAID1, and purging
unrecoverable ranges (plus restoring the parity sectors), is untested.
The test suite includes a helper program which can find bad parity or
inconsistent RAID1. I didn't add an option to mdadm for that because I'm
lazy :-P and besides, the kernel can do that already.
Also included are some changes to the test suite which ensure that it
works on a machine that has udev running, or the loopback or raid
drivers already in use. :-/
Usage: mdadm --repair-blocks=0:bad-sda1.log,1:bad-sdb1.log /dev/sd[a-d]1
(The numbers correspond to the disk numbers within the RAID, not to the
n'th non-option argument.) Specifying a UUID instead of a bunch of
devices should also work, but is untested.
Any feedback is of course welcome.
Now for the difficult part -- how to do automagic remapping. I tend to
think that extending the staging/cowloop (copy-on-write) driver to be a
coeloop (copy-on-error) driver would be a good idea. (It should also be
possible to force a copy of the RAID superblocks, and overwrite that with
some sort of cowloop metadata on the original disk, so that the kernel
doesn't mount the underlying device with its obsolete blocks. :-/ )
TODO:
(a) containers: somebody else please uncomment and re-enable the code in
Repair.c;
(b) testing of RAID1 and uuid-etc.-based RAID selection;
(c) auto-selection among 'identical' RAID partitions, if scanning finds
more than one (the idea is to choose the partition that's _not_ on the
device where the superblock says it should be, assuming identical event
numbers).
I haven't tested this on my production system yet -- that's the next
step. But the test suite says it should work. Famous last words. :-P