* Auto replace disk
@ 2017-03-08 11:28 Gandalf Corvotempesta
  2017-03-08 18:17 ` Wols Lists
  0 siblings, 1 reply; 7+ messages in thread
From: Gandalf Corvotempesta @ 2017-03-08 11:28 UTC (permalink / raw)
  To: linux-raid

Hi to all
I'm trying to configure mdadm to do an automatic replace/rebuild when a
disk is physically removed and replaced in a slot, but without success.

Is this possible? How?
Does the new disk need to be formatted first, or will mdadm replicate
the partition table on its own?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Auto replace disk
  2017-03-08 11:28 Auto replace disk Gandalf Corvotempesta
@ 2017-03-08 18:17 ` Wols Lists
  2017-03-08 21:32   ` Gandalf Corvotempesta
  0 siblings, 1 reply; 7+ messages in thread
From: Wols Lists @ 2017-03-08 18:17 UTC (permalink / raw)
  To: Gandalf Corvotempesta, linux-raid

On 08/03/17 11:28, Gandalf Corvotempesta wrote:
> Hi to all
> I'm trying to configure mdadm to do an automatic replace/rebuild when a
> disk is physically removed and replaced in a slot, but without success.

Do you mean you remove an old disk, and put a new blank disk in?
> 
> Is this possible? How?
> Does the new disk need to be formatted first, or will mdadm replicate
> the partition table on its own?

If that's what you mean, then no, it's not possible. mdadm doesn't have
a clue about disks, what it sees is "block devices".

If you stick a new disk in, you need to tell mdadm about it. At which
point you can add it as a spare (which means mdadm will use it to
replace a disk that fails), or you can tell mdadm to replace a failed disk.

You should not - if you can help it - ever remove a disk and then
replace it. Yes, in practice I know that's a luxury people often don't
have ... ideally you should have spares configured; failing that, you
put the new drive in, use --replace, and then remove the old one. The
last resort is to remove the broken drive and then replace it - this is
likely to trigger further failures and bring down the array.
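The replace-before-remove workflow above can be sketched with mdadm's own commands. The device names (array /dev/md0, failing disk /dev/sdb, new disk /dev/sdd) are hypothetical; with DRY_RUN=1 (the default here) the script only prints each command instead of executing it:

```shell
#!/bin/sh
# Sketch of the replace-before-remove workflow described above.
# Hypothetical names: array /dev/md0, failing disk /dev/sdb, new disk /dev/sdd.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

# 1. Add the new disk while the old one is still a working member.
run mdadm /dev/md0 --add /dev/sdd

# 2. Copy data straight from the failing disk onto the new one;
#    redundancy is preserved for the whole duration of the rebuild.
run mdadm /dev/md0 --replace /dev/sdb --with /dev/sdd

# 3. When the copy finishes, md marks the old disk faulty; remove it.
run mdadm /dev/md0 --remove /dev/sdb
```

The point of step 2 is that --replace reads from the old disk (falling back to parity/mirror only for unreadable sectors), so the array never runs degraded during the copy.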

Cheers,
Wol



* Re: Auto replace disk
  2017-03-08 18:17 ` Wols Lists
@ 2017-03-08 21:32   ` Gandalf Corvotempesta
  2017-03-09  1:31     ` Brad Campbell
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Gandalf Corvotempesta @ 2017-03-08 21:32 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

2017-03-08 19:17 GMT+01:00 Wols Lists <antlists@youngman.org.uk>:
> Do you mean you remove an old disk, and put a new blank disk in?

Yes

> If that's what you mean, then no, it's not possible. mdadm doesn't have
> a clue about disks, what it sees is "block devices".

OK, but the mdadm.conf man page seems to say the opposite:
https://linux.die.net/man/5/mdadm.conf

"POLICY
This is used to specify what automatic behavior is allowed on devices
newly appearing in the system and provides a way of marking spares
that can be moved to other arrays as well as the migration domains.

action=include, re-add, spare, spare-same-slot, or force-spare
auto= yes, no, or homehost.

The action item determines the automatic behavior allowed for devices
matching the path and type in the same line. If a device matches
several lines with different actions then the most permissive will
apply. The ordering of policy lines is irrelevant to the end result.

include: allows adding a disk to an array if metadata on that disk
matches that array.
re-add: will include the device in the array if it appears to be a
current member or a member that was recently removed.
spare: as above, and additionally: if the device is bare it can become
a spare if there is any array that it is a candidate for based on
domains and metadata.
spare-same-slot: as above, and additionally: if the given slot was used
by an array that went degraded recently and the device plugged in has
no metadata then it will be automatically added to that array (or its
container).
force-spare: as above, and the disk will become a spare in remaining
cases.
"
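For reference, a minimal stanza along those lines could look like this (the path pattern is hypothetical and must match the udev path of your controller's ports; it also relies on udev invoking mdadm --incremental on hotplug, as distribution udev rules normally do):

```
# Hypothetical /etc/mdadm/mdadm.conf fragment: a bare disk inserted into
# one of these ports may be auto-added as a spare, preferring the slot
# whose array recently went degraded.
POLICY domain=raid10slots path=pci-0000:00:1f.2-ata-* action=spare-same-slot
```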

> You should not - if you can help it - ever remove a disk and then
> replace it. Yes in practice I know that's a luxury people often don't
> have ... at best you should have spares configured

If you have a server with only 4 slots configured in a RAID10,
this workflow would be impossible.

> if you have to you
> put the new drive in, use --replace, and then remove the old one. The
> last resort is to remove the broken drive and then replace it - this is
> likely to trigger further failures and bring down the array.

Why? I've removed many, many, many disks before with no issue.
Why should removing a disk bring the whole array down? This seems like
a bug to me. If a disk crashes, the effect is the same as removing it
from its slot, and RAID is meant to protect against exactly this kind
of failure.


* Re: Auto replace disk
  2017-03-08 21:32   ` Gandalf Corvotempesta
@ 2017-03-09  1:31     ` Brad Campbell
  2017-03-09  9:07       ` Gandalf Corvotempesta
  2017-03-09  2:08     ` Edward Kuns
  2017-03-13 21:36     ` NeilBrown
  2 siblings, 1 reply; 7+ messages in thread
From: Brad Campbell @ 2017-03-09  1:31 UTC (permalink / raw)
  To: Gandalf Corvotempesta, Wols Lists; +Cc: linux-raid

On 09/03/17 05:32, Gandalf Corvotempesta wrote:

> Why? I've removed many, many, many disks before with no issue.
> Why should removing a disk bring the whole array down? This seems like
> a bug to me. If a disk crashes, the effect is the same as removing it
> from its slot, and RAID is meant to protect against exactly this kind
> of failure.

In general a good number of "help me my RAID is dead" requests that hit 
this list are due to not performing routine array or drive scrubs. So 
one drive dies, and one of the others has a previously unknown bad 
sector. When you put the new drive in, during the rebuild the bad sector 
is hit and the whole array comes tumbling down.

Doing a proactive replacement reduces the possibility of this occurring. 
Having said that, if your disk is dead then there's no other option 
anyway. Regular array scrubs go a long way to mitigating this risk, but 
it does happen frequently enough that you need to be warned against it.

Just because it hasn't happened to you *yet* does not mean you're 
immune, and it's certainly not a *bug*.
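The scrub Brad recommends can also be kicked off by hand through sysfs. A sketch (the array name md0 is hypothetical; Debian's mdadm package already schedules this monthly via /usr/share/mdadm/checkarray), where DRY_RUN=1 prints what would be written instead of touching sysfs:

```shell
#!/bin/sh
# Sketch: manually starting a scrub (read-check) of an md array.
# "md0" is a hypothetical array name; adjust to your system.
scrub() {
    md="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would write 'check' to /sys/block/$md/md/sync_action"
        return 0
    fi
    # Kicks off a background read of every member, so bad sectors are
    # found *before* a rebuild depends on reading them successfully.
    echo check > "/sys/block/$md/md/sync_action"
    cat /proc/mdstat                      # progress shows up as "check = x%"
    cat "/sys/block/$md/md/mismatch_cnt"  # inspect after the check completes
}
scrub md0
```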





* Re: Auto replace disk
  2017-03-08 21:32   ` Gandalf Corvotempesta
  2017-03-09  1:31     ` Brad Campbell
@ 2017-03-09  2:08     ` Edward Kuns
  2017-03-13 21:36     ` NeilBrown
  2 siblings, 0 replies; 7+ messages in thread
From: Edward Kuns @ 2017-03-09  2:08 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: Wols Lists, Linux-RAID

On Wed, Mar 8, 2017 at 3:32 PM, Gandalf Corvotempesta
<gandalf.corvotempesta@gmail.com> wrote:
>> The last resort is to remove the broken drive and then replace it - this is
>> likely to trigger further failures and bring down the array.
>
> Why? I've removed many, many, many disks before with no issue.
> Why should removing a disk bring the whole array down? This seems like a bug to me.

In a perfect world where you do scrubbing, your timeouts are properly
configured, and your other disks are all in good shape, you're right.
In the real world, where people often don't do that, where multiple bad
sectors accumulate on multiple disks before you get a failure, or where
you're just unlucky, a rebuild triggered by removing a bad disk and
adding a new one has a fair chance of experiencing a failure during the
rebuild.

If you've lost all redundancy due to a disk failure and, while doing a
rebuild, you experience another read failure or a second disk failure,
whether due to negligence or bad luck, you can lose the whole array. If
you have the option, therefore, it is better to --replace while you
still have redundancy.

           Eddie


* Re: Auto replace disk
  2017-03-09  1:31     ` Brad Campbell
@ 2017-03-09  9:07       ` Gandalf Corvotempesta
  0 siblings, 0 replies; 7+ messages in thread
From: Gandalf Corvotempesta @ 2017-03-09  9:07 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Wols Lists, linux-raid

2017-03-09 2:31 GMT+01:00 Brad Campbell <lists2009@fnarfbargle.com>:
> In general a good number of "help me my RAID is dead" requests that hit this
> list are due to not performing routine array or drive scrubs. So one drive
> dies, and one of the others has a previously unknown bad sector. When you
> put the new drive in, during the rebuild the bad sector is hit and the whole
> array comes tumbling down.
>
> Doing a proactive replacement reduces the possibility of this occurring.
> Having said that, if your disk is dead then there's no other option anyway.
> Regular array scrubs go a long way to mitigating this risk, but it does
> happen frequently enough that you need to be warned against it.

OK, I misunderstood.
You are right, scrubs are needed (and at least on Debian they are
scheduled automatically); I'm scrubbing my arrays monthly (I don't know
if that's enough) in addition to a weekly SMART long test.

I was referring to just removing a disk.
I thought you meant that removing a disk was a bad idea in itself, and
that MD wasn't able to handle it properly, bringing the whole RAID down.

Also, I always use 3-way mirrors or RAID6;
I never use a single-redundancy RAID.


* Re: Auto replace disk
  2017-03-08 21:32   ` Gandalf Corvotempesta
  2017-03-09  1:31     ` Brad Campbell
  2017-03-09  2:08     ` Edward Kuns
@ 2017-03-13 21:36     ` NeilBrown
  2 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2017-03-13 21:36 UTC (permalink / raw)
  To: Gandalf Corvotempesta, Wols Lists; +Cc: linux-raid


On Wed, Mar 08 2017, Gandalf Corvotempesta wrote:

> 2017-03-08 19:17 GMT+01:00 Wols Lists <antlists@youngman.org.uk>:
>> Do you mean you remove an old disk, and put a new blank disk in?
>
> Yes
>
>> If that's what you mean, then no, it's not possible. mdadm doesn't have
>> a clue about disks, what it sees is "block devices".
>
> Ok but mdadm.conf man page seems to say the opposite:
> https://linux.die.net/man/5/mdadm.conf
>
> "POLICY
> This is used to specify what automatic behavior is allowed on devices
> newly appearing in the system and provides a way of marking spares
> that can be moved to other arrays as well as the migration domains.
>
> action=include, re-add, spare, spare-same-slot, or force-spare
> auto= yes, no, or homehost.
>
> The action item determines the automatic behavior allowed for devices
> matching the path and type in the same line. If a device matches
> several lines with different actions then the most permissive will
> apply. The ordering of policy lines is irrelevant to the end result.
>
> include: allows adding a disk to an array if metadata on that disk
> matches that array.
> re-add: will include the device in the array if it appears to be a
> current member or a member that was recently removed.
> spare: as above, and additionally: if the device is bare it can become
> a spare if there is any array that it is a candidate for based on
> domains and metadata.
> spare-same-slot: as above, and additionally: if the given slot was used
> by an array that went degraded recently and the device plugged in has
> no metadata then it will be automatically added to that array (or its
> container).
> force-spare: as above, and the disk will become a spare in remaining
> cases.
> "

Clearly you have read the documentation - excellent!
What exactly are you asking?
Presumably you have tried something and it didn't work.  What (exactly)
did you try?

NeilBrown



Thread overview: 7+ messages
2017-03-08 11:28 Auto replace disk Gandalf Corvotempesta
2017-03-08 18:17 ` Wols Lists
2017-03-08 21:32   ` Gandalf Corvotempesta
2017-03-09  1:31     ` Brad Campbell
2017-03-09  9:07       ` Gandalf Corvotempesta
2017-03-09  2:08     ` Edward Kuns
2017-03-13 21:36     ` NeilBrown
