* RAID grow and disk failure
@ 2010-06-24 18:12 Piergiorgio Sartor
  2010-06-24 21:57 ` Neil Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Piergiorgio Sartor @ 2010-06-24 18:12 UTC (permalink / raw)
  To: linux-raid

Hi all,

I was wondering, let's say a RAID-6 has an HDD
added and a grow is performed.

What will happen if one of the HDDs in the RAID,
possibly the newly added one, fails, i.e. dies?

Will the RAID continue the grow using all the
available parity, or will it result in a
catastrophic failure for the array?

As a side question, assuming the above RAID volume
is a PV (LVM physical volume), what would be the
correct procedure to grow it and extend the PV:

1)
mdadm --grow ...
mdadm --wait
pvresize

2)
mdadm --grow
pvresize

Thanks,

bye,

-- 

piergiorgio


* Re: RAID grow and disk failure
  2010-06-24 18:12 RAID grow and disk failure Piergiorgio Sartor
@ 2010-06-24 21:57 ` Neil Brown
  2010-06-26 13:12   ` Piergiorgio Sartor
  0 siblings, 1 reply; 4+ messages in thread
From: Neil Brown @ 2010-06-24 21:57 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Thu, 24 Jun 2010 20:12:13 +0200
Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:

> Hi all,
> 
> I was wondering, let's say a RAID-6 has an HDD
> added and a grow is performed.
> 
> What will happen if one of the HDDs in the RAID,
> possibly the newly added one, fails, i.e. dies?
> 
> Will the RAID continue the grow using all the
> available parity, or will it result in a
> catastrophic failure for the array?

Assuming the code doesn't have any bugs, the reshape will stop, then
immediately restart picking up where it left off.
You will of course end up with a degraded array.

I have tested this so I think the code is fine.
It is fairly easy to set up a few smallish /dev/loop devices and experiment
yourself.  Not only will this give you confidence that it works, but it will
also give you familiarity with the mdadm commands, so there will be less
room for surprises once you do it for real.
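
For example, just as a rough sketch (the file names, sizes and md device
number below are only placeholders, and it assumes /dev/loop0..4 are free):

# create five small backing files and attach them to loop devices
for i in 0 1 2 3 4; do truncate -s 200M /tmp/md-test-$i; losetup /dev/loop$i /tmp/md-test-$i; done
# build a 4-device RAID-6 on the first four loops
mdadm --create /dev/md100 --level=6 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
# add the fifth device, start the grow, then fail a device mid-reshape
mdadm --add /dev/md100 /dev/loop4
mdadm --grow /dev/md100 --raid-devices=5 --backup-file=/tmp/md100.backup
mdadm /dev/md100 --fail /dev/loop4
# watch the reshape continue on the now-degraded array
cat /proc/mdstat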

It might be nice in these circumstances to abort the reshape and revert
to the previous number of devices - particularly if it was the new device
that failed.  However that currently isn't supported.

> 
> As a side question, assuming the above RAID volume
> is a PV (LVM physical volume), what would be the
> correct procedure to grow it and extend the PV:
> 
> 1)
> mdadm --grow ...
> mdadm --wait
> pvresize

Yes.

> 
> 2)
> mdadm --grow
> pvresize

No.
Until the reshape has completed, the extra space is not available.
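
So the full sequence would be something along these lines (the device name,
device count and backup-file path are only examples):

mdadm --grow /dev/md0 --raid-devices=9 --backup-file=/root/md0.backup
mdadm --wait /dev/md0
pvresize /dev/md0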

NeilBrown


> 
> Thanks,
> 
> bye,
> 



* Re: RAID grow and disk failure
  2010-06-24 21:57 ` Neil Brown
@ 2010-06-26 13:12   ` Piergiorgio Sartor
  2010-06-28 23:49     ` Neil Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Piergiorgio Sartor @ 2010-06-26 13:12 UTC (permalink / raw)
  To: Neil Brown; +Cc: Piergiorgio Sartor, linux-raid

Hi,

> Assuming the code doesn't have any bugs, the reshape will stop, then
> immediately restart picking up where it left off.

thanks, that's what I wanted to know.

> You will of course end up with a degraded array

Yes, that was clear.

> It might be nice in these circumstances to abort the reshape and revert
> to the previous number of devices - particularly if it was the new device
> that failed.  However that currently isn't supported.

Well, probably as an option, it could be interesting.

Actually, I would still be interested - we already
discussed the topic - in a RAID-5/6 with HDDs of
different sizes.
This would simplify many things...

> > 1)
> > mdadm --grow ...
> > mdadm --wait
> > pvresize
> 
> Yes.
> 
> > 
> > 2)
> > mdadm --grow
> > pvresize
> 
> No.
> Until the reshape has completed, the extra space is not available.

There seems to be an issue here, maybe.

Using the command line:

mdadm --grow /dev/md/vol02 --bitmap=none; mdadm --grow /dev/md/vol02 -n 9 --backup-file=/var/tmp/md125.backup; mdadm --wait /dev/md/vol02; mdadm --grow /dev/md/vol02 --bitmap=internal --bitmap-chunk=128

Note that /dev/md/vol02 is the usual symlink to /dev/md125,
which should make no difference for this purpose, I guess.

I got (in two independent tests):

mdadm: Need to backup 2688K of critical section..
mdadm: failed to set internal bitmap.

Re-issuing:

mdadm --wait /dev/md/vol02; mdadm --grow /dev/md/vol02 --bitmap=internal --bitmap-chunk=128

Does wait.

Could it be the devices (being USB) are so slow
that some race condition is uncovered and the
immediate "--wait" after the "--grow" does not work?

Thanks,

bye,

-- 

piergiorgio


* Re: RAID grow and disk failure
  2010-06-26 13:12   ` Piergiorgio Sartor
@ 2010-06-28 23:49     ` Neil Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2010-06-28 23:49 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Sat, 26 Jun 2010 15:12:35 +0200
Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> wrote:

> Hi,
> 
> > Assuming the code doesn't have any bugs, the reshape will stop, then
> > immediately restart picking up where it left off.
> 
> thanks, that's what I wanted to know.
> 
> > You will of course end up with a degraded array
> 
> Yes, that was clear.
> 
> > It might be nice in these circumstances to abort the reshape and revert
> > to the previous number of devices - particularly if it was the new device
> > that failed.  However that currently isn't supported.
> 
> Well, probably as an option, it could be interesting.
> 
> Actually, I would still be interested - we already
> discussed the topic - in a RAID-5/6 with HDDs of
> different sizes.
> This would simplify many things...
> 
> > > 1)
> > > mdadm --grow ...
> > > mdadm --wait
> > > pvresize
> > 
> > Yes.
> > 
> > > 
> > > 2)
> > > mdadm --grow
> > > pvresize
> > 
> > No.
> > Until the reshape has completed, the extra space is not available.
> 
> There seems to be an issue here, maybe.
> 
> Using the command line:
> 
> mdadm --grow /dev/md/vol02 --bitmap=none; mdadm --grow /dev/md/vol02 -n 9 --backup-file=/var/tmp/md125.backup; mdadm --wait /dev/md/vol02; mdadm --grow /dev/md/vol02 --bitmap=internal --bitmap-chunk=128
> 
> Note that /dev/md/vol02 is the usual symlink to /dev/md125,
> which should make no difference for this purpose, I guess.
> 
> I got (in two independent tests):
> 
> mdadm: Need to backup 2688K of critical section..
> mdadm: failed to set internal bitmap.
> 
> Re-issuing:
> 
> mdadm --wait /dev/md/vol02; mdadm --grow /dev/md/vol02 --bitmap=internal --bitmap-chunk=128
> 
> Does wait.
> 
> Could it be the devices (being USB) are so slow
> that some race condition is uncovered and the
> immediate "--wait" after the "--grow" does not work?
>

Yes, there is a race here.  The reshape doesn't quite start instantly and so
--wait doesn't notice.
I've added a note to my todo-list to look into this.
For now, a 'sleep 1' between the --grow and the --wait should be enough.
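
Applied to your command line, that would be roughly:

mdadm --grow /dev/md/vol02 --bitmap=none; mdadm --grow /dev/md/vol02 -n 9 --backup-file=/var/tmp/md125.backup; sleep 1; mdadm --wait /dev/md/vol02; mdadm --grow /dev/md/vol02 --bitmap=internal --bitmap-chunk=128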

Thanks,
NeilBrown

