* Replacing a failed disk "in advance"
From: Jan Kasprzak @ 2015-05-20 14:20 UTC
  To: linux-raid

	Hello,

I have a RAID-5 volume of 8 physical disks. One of these disks failed the SMART
self-test with an unreadable block error. Unfortunately, I have discovered
that there is _another_ bad block on another disk. It is a different block,
so the RAID-5 volume as a whole is still working, but it now has at least
two unreadable sectors on two different disks.

What is the best way to replace these two failing disks one by one without
losing data? I cannot mdadm --fail one of them, because the subsequent
rebuild onto a new disk would fail when reading the other bad block.

I would like to add a ninth drive to the RAID-5 volume and put a replica
of one of the failing drives onto it. Then remove the just-replicated drive
and do the same with the other failing drive.

Thanks,

-Yenya

-- 
| Jan "Yenya" Kasprzak   <kas at {fi.muni.cz - work | yenya.net - private}> |
| New GPG 4096R/A45477D5 -- see http://www.fi.muni.cz/~kas/pgp-rollover.txt |
| http://www.fi.muni.cz/~kas/     Journal: http://www.fi.muni.cz/~kas/blog/ |
           Smart data structures and dumb code works a lot better
           than the other way around.           --Eric S. Raymond

* Re: Replacing a failed disk "in advance"
From: Ladislav Mate @ 2015-05-20 14:40 UTC
  To: Jan Kasprzak; +Cc: linux-raid

On Wed, May 20, 2015 at 04:20:49PM +0200, Jan Kasprzak wrote:
> 	Hello,
Hi Jan,
> 
> I have a RAID-5 volume of 8 physical disks. One of these disks failed the SMART
> self-test with an unreadable block error. Unfortunately I have discovered
> that there is _another_ bad block on another disk. It is a different block,
> so the RAID-5 volume as a whole is still working. But as a whole, the
> RAID-5 volume has at least two unreadable sectors on two different disks.
> 
> What is the best way to replace these two failing disks one by one without
> the loss of data? I cannot mdadm --fail one of them, because the
> subsequent rebuild on a new disk would fail on reading the other bad block.
> 
> I would like to add the ninth drive to the RAID-5 volume, and put a replica
> of one of the failing drives to it. Then remove the just-replicated drive,
> and do the same with the other failing drive.
If you look at the mdadm man page and search for --replace, you'll find what you are looking for.

--replace
       Mark listed devices as requiring replacement. As soon as a spare is available, it will be rebuilt and will replace
       the marked device. This is similar to marking a device as faulty, but the device remains in service during the
       recovery process to increase resilience against multiple failures. When the replacement process finishes, the
       replaced device will be marked as faulty.

--with This can follow a list of --replace devices. The devices listed after --with will be preferentially used to
       replace the devices listed after --replace. These devices must already be spare devices in the array.
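A minimal sketch of the two steps this implies, assuming /dev/md0 is the array, /dev/sdb1 the failing member and /dev/sdj1 the new disk (all hypothetical names). Because the target of --with must already be a spare, the new disk is added first; on an array that is not degraded, --add turns it into a spare. The helper names run and replace_member are made up for illustration, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the mdadm commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# replace_member ARRAY OLD NEW (hypothetical helper)
#   ARRAY: md device, OLD: failing member, NEW: replacement disk or partition
replace_member() {
    md=$1 old=$2 new=$3
    # The device named after --with must already be a spare, so add it first.
    run mdadm --manage "$md" --add "$new"
    # Rebuild onto NEW while OLD stays in service; OLD is marked faulty when done.
    run mdadm --manage "$md" --replace "$old" --with "$new"
}
```

Usage: replace_member /dev/md0 /dev/sdb1 /dev/sdj1 prints the two commands. For the second failing disk, the same call with its member name and another new disk would follow once the first replacement has finished.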

HTH,

/lm

* Re: Replacing a failed disk "in advance"
From: Jan Kasprzak @ 2015-05-20 14:51 UTC
  To: Ladislav Mate; +Cc: linux-raid

	Hi Ladislav,

Ladislav Mate wrote:
: When you take a look in mdadm man page and search for --replace you'll find what you are looking for.
: 
: --replace
:        Mark listed devices as requiring replacement. As soon as a spare is available, it will be rebuilt and will replace
:        the marked device. This is similar to marking a device as faulty, but the device remains in service during the
:        recovery process to increase resilience against multiple failures. When the replacement process finishes, the
:        replaced device will be marked as faulty.
: 
: --with This can follow a list of --replace devices. The devices listed after --with will be preferentially used to
:        replace the devices listed after --replace. These devices must already be spare devices in the array.

	OK, this seems to be a way to go.

	Unfortunately, the system in question is too old,
and its mdadm does not know about the --replace option, according to
mdadm --manage --help. So I will look for a newer mdadm and hope that the
old kernel of that system supports mdadm --replace.
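The help-text check mentioned here can be scripted. A minimal sketch, assuming that an mdadm which supports the option mentions --replace in its manage-mode help (which is how its absence was noticed here); both function names are hypothetical, and stderr is merged in case a given mdadm version prints its help there.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the help text read on stdin mentions --replace.
help_mentions_replace() {
    # "--" stops grep from parsing the pattern "--replace" as an option.
    grep -q -- '--replace'
}

# Hypothetical wrapper around the check used in this thread.
mdadm_has_replace() {
    # Merge stderr in case this mdadm version prints its help there.
    mdadm --manage --help 2>&1 | help_mentions_replace
}
```

Usage: if mdadm_has_replace; then echo "mdadm knows --replace"; fi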

	Thanks!

-Yenya

* Re: Replacing a failed disk "in advance"
From: Caspar Smit @ 2015-05-20 15:12 UTC
  To: linux-raid

On 20 May 2015 16:51, "Jan Kasprzak" <kas@fi.muni.cz> wrote:
>         OK, this seems to be a way to go.
>
>         Unfortunately, the system in question is too old,
> and its mdadm does not know about the --replace option, according to
> mdadm --manage --help. So I will look for a newer mdadm and hope the old
> kernel of that system supports the mdadm --replace.
>

You'll need kernel version 3.3 or newer to support replacing a disk in advance.
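A quick way to compare the running kernel against that threshold; a minimal sketch with a hypothetical function name. It compares only the leading MAJOR.MINOR of the release string, so it cannot account for features a distribution may have backported into an older-numbered kernel; a negative result is a reason to check further, not proof.

```shell
#!/bin/sh
# kernel_at_least MAJOR MINOR [RELEASE]  (hypothetical helper)
#   Exit status 0 if RELEASE (default: output of uname -r) is at least MAJOR.MINOR.
#   Only the leading numeric MAJOR.MINOR is compared; vendor suffixes are ignored.
kernel_at_least() {
    req_major=$1 req_minor=$2
    rel=${3:-$(uname -r)}
    major=${rel%%.*}          # "3.10.0-229.el7.x86_64" -> "3"
    rest=${rel#*.}            # -> "10.0-229.el7.x86_64"
    minor=${rest%%[!0-9]*}    # -> "10"
    [ "$major" -gt "$req_major" ] ||
        { [ "$major" -eq "$req_major" ] && [ "$minor" -ge "$req_minor" ]; }
}
```

Usage: kernel_at_least 3 3 && echo "kernel is 3.3 or newer" checks the running kernel; passing a release string as the third argument checks another one.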

Kind regards,
Caspar

* Re: Replacing a failed disk "in advance"
From: Jan Kasprzak @ 2015-05-20 15:15 UTC
  To: Caspar Smit; +Cc: linux-raid

Caspar Smit wrote:
: On 20 May 2015 16:51, "Jan Kasprzak" <kas@fi.muni.cz> wrote:
: >
: >         OK, this seems to be a way to go.
: >
: >         Unfortunately, the system in question is too old,
: > and its mdadm does not know about the --replace option, according to
: > mdadm --manage --help. So I will look for a newer mdadm and hope the old
: > kernel of that system supports the mdadm --replace.
: >
: 
: You'll need kernel version 3.3+ to support advance replacement.

	OK, I have grabbed the Fedora 22 mdadm package, rebuilt it,
and apparently mdadm --manage /dev/mdXX --replace /dev/sdOLD1 --with /dev/sdNEW1
works as expected. Thanks!
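What typically follows once the replacement is running, sketched under assumptions: the array is /dev/md0 and the replaced member /dev/sdb1 (hypothetical names), and mdadm --wait is assumed to cover the recovery that --replace starts. Per the man page text quoted earlier in the thread, the old member is marked faulty when the replacement finishes, so it can then be removed before moving on to the second failing disk. The helper names are made up, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the mdadm commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# finish_replace ARRAY OLD (hypothetical helper)
#   Wait for the rebuild onto the spare, then drop the old (now faulty) member.
finish_replace() {
    md=$1 old=$2
    # Block until resync/recovery activity on ARRAY has finished.
    run mdadm --wait "$md"
    # The replaced member is marked faulty at the end; remove it from the array.
    run mdadm --manage "$md" --remove "$old"
    # Show the resulting array state for a final check.
    run mdadm --detail "$md"
}
```

Usage: finish_replace /dev/md0 /dev/sdb1; after that, the same --replace ... --with sequence can be repeated for the other failing drive.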

-Yenya.
