* Hot-replace for RAID5
@ 2012-05-08  9:10 Patrik Horník
  2012-05-10  6:59 ` David Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-08  9:10 UTC (permalink / raw)
  To: linux-raid

Hello guys,

I need to replace a drive in a big production RAID5 array and I am
thinking about using the new hot-replace feature added in kernel 3.3.

Does someone have experience with it on big RAID5 arrays? Mine is 7 *
1.5 TB. What do you think about its status / stability / reliability?
Do you recommend it on production data?

Thanks.

Patrik Horník

* Re: Hot-replace for RAID5
  2012-05-08  9:10 Hot-replace for RAID5 Patrik Horník
@ 2012-05-10  6:59 ` David Brown
  2012-05-10  8:50   ` Patrik Horník
  2012-05-10 17:16   ` Patrik Horník
  0 siblings, 2 replies; 26+ messages in thread
From: David Brown @ 2012-05-10  6:59 UTC (permalink / raw)
  To: patrik; +Cc: linux-raid

(I accidentally sent my first reply directly to the OP, and forgot the 
mailing list - I'm adding it back now, because I don't want the OP to 
follow my advice until others have confirmed or corrected it!)

On 09/05/2012 21:53, Patrik Horník wrote:
 > Great suggestion, thanks.
 >
 > So I guess steps with exact parameters should be:
 > 1, add spare S to RAID5 array
 > 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
 > 3, remove faulty drive and add replacement, let it synchronize
 > 4, possibly remove added spare S
 > 5, mdadm --grow /dev/mdX --level 5 --raid-devices N


Yes, that's what I was thinking.  You are missing "2b - let it synchronise".
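
For reference, spelled out as actual commands the whole sequence might look 
roughly like this - only a sketch, where /dev/mdX, /dev/sdS1 (the temporary 
spare S), /dev/sdOLD1, /dev/sdNEW1 and N are placeholders for your array, 
devices and original device count:

   mdadm /dev/mdX --add /dev/sdS1                            # 1: add spare S
   mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
                                                             # 2: RAID5 -> RAID6, extra parity on S
   cat /proc/mdstat                                          # 2b: wait for the recovery to finish
   mdadm /dev/mdX --fail /dev/sdOLD1 --remove /dev/sdOLD1    # 3: drop the faulty drive,
   mdadm /dev/mdX --add /dev/sdNEW1                          #    add the replacement, let it sync
   mdadm /dev/mdX --fail /dev/sdS1 --remove /dev/sdS1        # 4: optionally drop spare S again
   mdadm --grow /dev/mdX --level 5 --raid-devices N          # 5: back to RAID5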

Of course, another possibility is that if you have the space in the 
system for another drive, you may want to convert to a full raid6 for 
the future.  That way you have the extra safety built-in in advance. 
But that will definitely lead to a re-shape.

 >
 > My questions:
 > - Are you sure steps 3, 4 and 5 would not cause reshaping?

I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is 
stuff that I only know about in theory, and have not tried in practice.

 >
 > - My array has now left-symmetric layout, so after migration to RAID6
 > it should be left-symmetric-6. Is RAID6 working without problem in
 > degraded mode with this layout, no matter which one or two drives are
 > missing?
 >

The layout will not affect the redundancy or the features of the raid - 
it will only (slightly) affect the speed of some operations.

 > - What happens in step 5 and how long does it take? (If it is without
 > reshaping, it should only upgrade superblocks and thats it.)

That is my understanding.

 >
 > - What happens if I dont remove spare S before migration back to
 > RAID5? Will the array be reshaped and which drive will it make into
 > spare? (If step 5 is instantaneous, there is no reason for that. But
 > if it takes time, it is probably safer.)
 >

I /think/ that the extra disk will turn into a hot spare.  But I am 
getting out of my depth here - it all depends on how the disks get 
numbered and how that affects the layout, and I don't know the details here.

 > So all and alll, what guys do you think is more reliable now, new
 > hot-replace or these steps?


I too am very curious to hear opinions.  Hot-replace will certainly be 
much simpler and faster than these sorts of re-shaping - it's exactly 
the sort of situation the feature was designed for.  But I don't know if 
it is considered stable and well-tested, or "bleeding edge".

mvh.,

David


 >
 > Thanks.
 >
 > Patrik
 >
 > On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no> wrote:
 >> On 08/05/12 11:10, Patrik Horník wrote:
 >>>
 >>> Hello guys,
 >>>
 >>> I need to replace drive in big production RAID5 array and I am
 >>> thinking about using new hot-replace feature added in kernel 3.3.
 >>>
 >>> Does someone have experience with it on big RAID5 arrays? Mine is 7 *
 >>> 1.5 TB. What do you think about its status / stability / reliability?
 >>> Do you recommend it on production data?
 >>>
 >>> Thanks.
 >>>
 >>
 >> If you don't want to play with the "bleeding edge" features, you could add
 >> the disk and extend the array to RAID6, then remove the old drive. I think
 >> if you want to do it all without doing any re-shapes, however, then you'd
 >> need a third drive (the extra drive could easily be an external USB disk if
 >> needed - it will only be used for writing, and not for reading unless
 >> there's another disk failure).  Start by adding the extra drive as a hot
 >> spare, then re-shape your raid5 to raid6 in raid5+extra parity layout.  Then
 >> fail and remove the old drive.  Put the new drive into the box and add it as
 >> a hot spare.  It should automatically take its place in the raid5, replacing
 >> the old one.  Once it has been rebuilt, you can fail and remove the extra
 >> drive, then re-shape back to raid5.
 >>
 >> If things go horribly wrong, the external drive gives you your parity
 >> protection.
 >>
 >> Of course, don't follow this plan until others here have commented 
on it,
 >> and either corrected or approved it.
 >>
 >> And make sure you have a good backup no matter what you decide to do.
 >>
 >> mvh.,
 >>
 >> David
 >>
 >
 >


* Re: Hot-replace for RAID5
  2012-05-10  6:59 ` David Brown
@ 2012-05-10  8:50   ` Patrik Horník
  2012-05-10 17:16   ` Patrik Horník
  1 sibling, 0 replies; 26+ messages in thread
From: Patrik Horník @ 2012-05-10  8:50 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On Thu, May 10, 2012 at 8:59 AM, David Brown <david.brown@hesbynett.no> wrote:
> (I accidentally sent my first reply directly to the OP, and forgot the
> mailing list - I'm adding it back now, because I don't want the OP to follow
> my advice until others have confirmed or corrected it!)
>

Thanks. I just hit reply all and did not notice that...

>
> On 09/05/2012 21:53, Patrik Horník wrote:
>> Great suggestion, thanks.
>>
>> So I guess steps with exact parameters should be:
>> 1, add spare S to RAID5 array
>> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
>> 3, remove faulty drive and add replacement, let it synchronize
>> 4, possibly remove added spare S
>> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
>
>
> Yes, that's what I was thinking.  You are missing "2b - let it synchronise".
>
> Of course, another possibility is that if you have the space in the system
> for another drive, you may want to convert to a full raid6 for the future.
>  That way you have the extra safety built-in in advance. But that will
> definitely lead to a re-shape.
>
>
>>
>> My questions:
>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>
> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
> stuff that I only know about in theory, and have not tried in practice.
>
>
>>
>> - My array has now left-symmetric layout, so after migration to RAID6
>> it should be left-symmetric-6. Is RAID6 working without problem in
>> degraded mode with this layout, no matter which one or two drives are
>> missing?
>>
>
> The layout will not affect the redundancy or the features of the raid - it
> will only (slightly) affect the speed of some operations.
>
>
>> - What happens in step 5 and how long does it take? (If it is without
>> reshaping, it should only upgrade superblocks and thats it.)
>
> That is my understanding.
>
>
>>
>> - What happens if I dont remove spare S before migration back to
>> RAID5? Will the array be reshaped and which drive will it make into
>> spare? (If step 5 is instantaneous, there is no reason for that. But
>> if it takes time, it is probably safer.)
>>
>
> I /think/ that the extra disk will turn into a hot spare.  But I am getting
> out of my depth here - it all depends on how the disks get numbered and how
> that affects the layout, and I don't know the details here.
>
>
>> So all and alll, what guys do you think is more reliable now, new
>> hot-replace or these steps?
>
>
> I too am very curious to hear opinions.  Hot-replace will certainly be much
> simpler and faster than these sorts of re-shaping - it's exactly the sort of
> situation the feature was designed for.  But I don't know if it is
> considered stable and well-tested, or "bleeding edge".
>
> mvh.,
>
> David
>
>
>
>>
>> Thanks.
>>
>> Patrik
>>
>> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no>
>>  wrote:
>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>
>>>> Hello guys,
>>>>
>>>> I need to replace drive in big production RAID5 array and I am
>>>> thinking about using new hot-replace feature added in kernel 3.3.
>>>>
>>>> Does someone have experience with it on big RAID5 arrays? Mine is 7 *
>>>> 1.5 TB. What do you think about its status / stability / reliability?
>>>> Do you recommend it on production data?
>>>>
>>>> Thanks.
>>>>
>>>
>>> If you don't want to play with the "bleeding edge" features, you could
>>> add
>>> the disk and extend the array to RAID6, then remove the old drive. I
>>> think
>>> if you want to do it all without doing any re-shapes, however, then you'd
>>> need a third drive (the extra drive could easily be an external USB disk
>>> if
>>> needed - it will only be used for writing, and not for reading unless
>>> there's another disk failure).  Start by adding the extra drive as a hot
>>> spare, then re-shape your raid5 to raid6 in raid5+extra parity layout.
>>>  Then
>>> fail and remove the old drive.  Put the new drive into the box and add it
>>> as
>>> a hot spare.  It should automatically take its place in the raid5,
>>> replacing
>>> the old one.  Once it has been rebuilt, you can fail and remove the extra
>>> drive, then re-shape back to raid5.
>>>
>>> If things go horribly wrong, the external drive gives you your parity
>>> protection.
>>>
>>> Of course, don't follow this plan until others here have commented on it,
>>> and either corrected or approved it.
>>>
>>> And make sure you have a good backup no matter what you decide to do.
>>>
>>> mvh.,
>>>
>>> David
>>>
>>
>>
>

* Re: Hot-replace for RAID5
  2012-05-10  6:59 ` David Brown
  2012-05-10  8:50   ` Patrik Horník
@ 2012-05-10 17:16   ` Patrik Horník
  2012-05-11  0:50     ` NeilBrown
  1 sibling, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-10 17:16 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid, NeilBrown

Neil, can you please comment on whether the separate operations mentioned in
this process behave as we expect and are stable enough? Thanks.

On Thu, May 10, 2012 at 8:59 AM, David Brown <david.brown@hesbynett.no> wrote:
> (I accidentally sent my first reply directly to the OP, and forgot the
> mailing list - I'm adding it back now, because I don't want the OP to follow
> my advice until others have confirmed or corrected it!)
>
>
> On 09/05/2012 21:53, Patrik Horník wrote:
>> Great suggestion, thanks.
>>
>> So I guess steps with exact parameters should be:
>> 1, add spare S to RAID5 array
>> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
>> 3, remove faulty drive and add replacement, let it synchronize
>> 4, possibly remove added spare S
>> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
>
>
> Yes, that's what I was thinking.  You are missing "2b - let it synchronise".

Sure :)

> Of course, another possibility is that if you have the space in the system
> for another drive, you may want to convert to a full raid6 for the future.
>  That way you have the extra safety built-in in advance. But that will
> definitely lead to a re-shape.

Actually I don't have free physical space; the array already has 7 drives.
For the process I need to place the additional drive on a table near the PC
and cool it with a fan standing by itself on the table... :)

>>
>> My questions:
>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>
> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
> stuff that I only know about in theory, and have not tried in practice.
>
>
>>
>> - My array has now left-symmetric layout, so after migration to RAID6
>> it should be left-symmetric-6. Is RAID6 working without problem in
>> degraded mode with this layout, no matter which one or two drives are
>> missing?
>>
>
> The layout will not affect the redundancy or the features of the raid - it
> will only (slightly) affect the speed of some operations.

I know it should work, but it is probably a configuration that is not
used much, so maybe it is not tested as thoroughly as the standard
layouts. So the question was aimed more at practical experience and
stability...

>> - What happens in step 5 and how long does it take? (If it is without
>> reshaping, it should only upgrade superblocks and thats it.)
>
> That is my understanding.
>
>
>>
>> - What happens if I dont remove spare S before migration back to
>> RAID5? Will the array be reshaped and which drive will it make into
>> spare? (If step 5 is instantaneous, there is no reason for that. But
>> if it takes time, it is probably safer.)
>>
>
> I /think/ that the extra disk will turn into a hot spare.  But I am getting
> out of my depth here - it all depends on how the disks get numbered and how
> that affects the layout, and I don't know the details here.
>
>
>> So all and alll, what guys do you think is more reliable now, new
>> hot-replace or these steps?
>
>
> I too am very curious to hear opinions.  Hot-replace will certainly be much
> simpler and faster than these sorts of re-shaping - it's exactly the sort of
> situation the feature was designed for.  But I don't know if it is
> considered stable and well-tested, or "bleeding edge".
>
> mvh.,
>
> David
>
>
>
>>
>> Thanks.
>>
>> Patrik
>>
>> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no>
>>  wrote:
>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>
>>>> Hello guys,
>>>>
>>>> I need to replace drive in big production RAID5 array and I am
>>>> thinking about using new hot-replace feature added in kernel 3.3.
>>>>
>>>> Does someone have experience with it on big RAID5 arrays? Mine is 7 *
>>>> 1.5 TB. What do you think about its status / stability / reliability?
>>>> Do you recommend it on production data?
>>>>
>>>> Thanks.
>>>>
>>>
>>> If you don't want to play with the "bleeding edge" features, you could
>>> add
>>> the disk and extend the array to RAID6, then remove the old drive. I
>>> think
>>> if you want to do it all without doing any re-shapes, however, then you'd
>>> need a third drive (the extra drive could easily be an external USB disk
>>> if
>>> needed - it will only be used for writing, and not for reading unless
>>> there's another disk failure).  Start by adding the extra drive as a hot
>>> spare, then re-shape your raid5 to raid6 in raid5+extra parity layout.
>>>  Then
>>> fail and remove the old drive.  Put the new drive into the box and add it
>>> as
>>> a hot spare.  It should automatically take its place in the raid5,
>>> replacing
>>> the old one.  Once it has been rebuilt, you can fail and remove the extra
>>> drive, then re-shape back to raid5.
>>>
>>> If things go horribly wrong, the external drive gives you your parity
>>> protection.
>>>
>>> Of course, don't follow this plan until others here have commented on it,
>>> and either corrected or approved it.
>>>
>>> And make sure you have a good backup no matter what you decide to do.
>>>
>>> mvh.,
>>>
>>> David
>>>
>>
>>
>

* Re: Hot-replace for RAID5
  2012-05-10 17:16   ` Patrik Horník
@ 2012-05-11  0:50     ` NeilBrown
  2012-05-11  2:44       ` Patrik Horník
  2012-05-16 23:34       ` Oliver Martin
  0 siblings, 2 replies; 26+ messages in thread
From: NeilBrown @ 2012-05-11  0:50 UTC (permalink / raw)
  To: patrik; +Cc: David Brown, linux-raid


On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> Neil, can you please comment if separate operations mentioned in this
> process are behaving and are stable enough as we expect? Thanks.

The conversion to and from RAID6 as described should work as expected, though
it requires having an extra device and requires two 'recovery' cycles.
Specifying the number of --raid-devices is not necessary.  When you convert
RAID5 to RAID6, mdadm assumes you are increasing the number of devices by 1
unless you say otherwise.  Similarly with RAID6->RAID5 the assumption is a
decrease by 1.
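
In other words, assuming the spare has already been added, the two conversions
can be written as simply as (a sketch; /dev/mdX is a placeholder):

   mdadm --grow /dev/mdX --level 6 --layout=preserve   # RAID5 -> RAID6, device count grows by 1
   mdadm --grow /dev/mdX --level 5                     # RAID6 -> RAID5, device count shrinks by 1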

Doing an in-place reshape with the new 3.3 code should work, though with a
softer "should" than above.  We will only know that it is "stable" when enough
people (such as yourself) try it and report success.  If anything does go
wrong I would of course help you to put the array back together but I can
never guarantee no data loss.  You wouldn't be the first to test the code on
live data, but you would be the second that I have heard of.

The in-place reshape is not yet supported by mdadm but it is very easy to
manage directly.  Just
   echo replaceable > /sys/block/mdXXX/md/dev-YYY/state
and as soon as a spare is available the replacement will happen.

NeilBrown


> 
> On Thu, May 10, 2012 at 8:59 AM, David Brown <david.brown@hesbynett.no> wrote:
> > (I accidentally sent my first reply directly to the OP, and forgot the
> > mailing list - I'm adding it back now, because I don't want the OP to follow
> > my advice until others have confirmed or corrected it!)
> >
> >
> > On 09/05/2012 21:53, Patrik Horník wrote:
> >> Great suggestion, thanks.
> >>
> >> So I guess steps with exact parameters should be:
> >> 1, add spare S to RAID5 array
> >> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
> >> 3, remove faulty drive and add replacement, let it synchronize
> >> 4, possibly remove added spare S
> >> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
> >
> >
> > Yes, that's what I was thinking.  You are missing "2b - let it synchronise".
> 
> Sure :)
> 
> > Of course, another possibility is that if you have the space in the system
> > for another drive, you may want to convert to a full raid6 for the future.
> >  That way you have the extra safety built-in in advance. But that will
> > definitely lead to a re-shape.
> 
> Actually I dont have free physical space, array already has 7 drives.
> For the process I need place the additional drive on table near the PC
> and cool it with fan standing by itself on table... :)
> 
> >>
> >> My questions:
> >> - Are you sure steps 3, 4 and 5 would not cause reshaping?
> >
> > I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
> > stuff that I only know about in theory, and have not tried in practice.
> >
> >
> >>
> >> - My array has now left-symmetric layout, so after migration to RAID6
> >> it should be left-symmetric-6. Is RAID6 working without problem in
> >> degraded mode with this layout, no matter which one or two drives are
> >> missing?
> >>
> >
> > The layout will not affect the redundancy or the features of the raid - it
> > will only (slightly) affect the speed of some operations.
> 
> I know it should work, but it is probably configuration that is not
> used much by users, so maybe it is not tested as much as standard
> layouts. So the question was aiming more at practical experience and
> stability...
> 
> >> - What happens in step 5 and how long does it take? (If it is without
> >> reshaping, it should only upgrade superblocks and thats it.)
> >
> > That is my understanding.
> >
> >
> >>
> >> - What happens if I dont remove spare S before migration back to
> >> RAID5? Will the array be reshaped and which drive will it make into
> >> spare? (If step 5 is instantaneous, there is no reason for that. But
> >> if it takes time, it is probably safer.)
> >>
> >
> > I /think/ that the extra disk will turn into a hot spare.  But I am getting
> > out of my depth here - it all depends on how the disks get numbered and how
> > that affects the layout, and I don't know the details here.
> >
> >
> >> So all and alll, what guys do you think is more reliable now, new
> >> hot-replace or these steps?
> >
> >
> > I too am very curious to hear opinions.  Hot-replace will certainly be much
> > simpler and faster than these sorts of re-shaping - it's exactly the sort of
> > situation the feature was designed for.  But I don't know if it is
> > considered stable and well-tested, or "bleeding edge".
> >
> > mvh.,
> >
> > David
> >
> >
> >
> >>
> >> Thanks.
> >>
> >> Patrik
> >>
> >> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no>
> >>  wrote:
> >>> On 08/05/12 11:10, Patrik Horník wrote:
> >>>>
> >>>> Hello guys,
> >>>>
> >>>> I need to replace drive in big production RAID5 array and I am
> >>>> thinking about using new hot-replace feature added in kernel 3.3.
> >>>>
> >>>> Does someone have experience with it on big RAID5 arrays? Mine is 7 *
> >>>> 1.5 TB. What do you think about its status / stability / reliability?
> >>>> Do you recommend it on production data?
> >>>>
> >>>> Thanks.
> >>>>
> >>>
> >>> If you don't want to play with the "bleeding edge" features, you could
> >>> add
> >>> the disk and extend the array to RAID6, then remove the old drive. I
> >>> think
> >>> if you want to do it all without doing any re-shapes, however, then you'd
> >>> need a third drive (the extra drive could easily be an external USB disk
> >>> if
> >>> needed - it will only be used for writing, and not for reading unless
> >>> there's another disk failure).  Start by adding the extra drive as a hot
> >>> spare, then re-shape your raid5 to raid6 in raid5+extra parity layout.
> >>>  Then
> >>> fail and remove the old drive.  Put the new drive into the box and add it
> >>> as
> >>> a hot spare.  It should automatically take its place in the raid5,
> >>> replacing
> >>> the old one.  Once it has been rebuilt, you can fail and remove the extra
> >>> drive, then re-shape back to raid5.
> >>>
> >>> If things go horribly wrong, the external drive gives you your parity
> >>> protection.
> >>>
> >>> Of course, don't follow this plan until others here have commented on it,
> >>> and either corrected or approved it.
> >>>
> >>> And make sure you have a good backup no matter what you decide to do.
> >>>
> >>> mvh.,
> >>>
> >>> David
> >>>
> >>
> >>
> >



* Re: Hot-replace for RAID5
  2012-05-11  0:50     ` NeilBrown
@ 2012-05-11  2:44       ` Patrik Horník
  2012-05-11  7:16         ` David Brown
  2012-05-16 23:34       ` Oliver Martin
  1 sibling, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-11  2:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

On Fri, May 11, 2012 at 2:50 AM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>
>> Neil, can you please comment if separate operations mentioned in this
>> process are behaving and are stable enough as we expect? Thanks.
>
> The conversion to and from RAID6 as described should work as expected, though
> it requires having an extra device and requires to 'recovery' cycles.
> Specifying the number of --raid-devices is not necessary.  When you convert
> RAID5 to RAID6, mdadm assumes you are increasing number of devices by 1
> unless you say otherwise.  Similarly with RAID6->RAID5 the assumption is a
> decrease by 1.
>
> Doing an in-place reshape with the new 3.3 code should work, though with a
> softer "should" than above.  We will only know that it is "stable" when enough
> people (such as yourself) try it and report success.  If anything does go
> wrong I would of course help you to put the array back together but I can
> never guarantee no data loss.  You wouldn't be the first to test the code on
> live data, but you would be the second that I have heard of.

Thanks Neil, this answers my questions. I don't like being second, so
RAID5 - RAID6 - RAID5 it is... :)

In addition my array has 0.9 metadata, so hot-replace would also
require a conversion of the metadata, so altogether it seems much riskier.
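
For reference, the metadata version is easy to check; /dev/mdX and /dev/sdY1
are just placeholders here:

   mdadm --detail /dev/mdX | grep Version      # 0.90/0.91 vs 1.x array metadata
   mdadm --examine /dev/sdY1 | grep Version    # metadata as recorded on a member device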

> The in-place reshape is not yet supported by mdadm but it is very easy to
> manage directly.  Just
>   echo replaceable > /sys/block/mdXXX/md/dev-YYY/state
> and as soon as a spare is available the replacement will happen.
>
> NeilBrown
>
>
>>
>> On Thu, May 10, 2012 at 8:59 AM, David Brown <david.brown@hesbynett.no> wrote:
>> > (I accidentally sent my first reply directly to the OP, and forgot the
>> > mailing list - I'm adding it back now, because I don't want the OP to follow
>> > my advice until others have confirmed or corrected it!)
>> >
>> >
>> > On 09/05/2012 21:53, Patrik Horník wrote:
>> >> Great suggestion, thanks.
>> >>
>> >> So I guess steps with exact parameters should be:
>> >> 1, add spare S to RAID5 array
>> >> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
>> >> 3, remove faulty drive and add replacement, let it synchronize
>> >> 4, possibly remove added spare S
>> >> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
>> >
>> >
>> > Yes, that's what I was thinking.  You are missing "2b - let it synchronise".
>>
>> Sure :)
>>
>> > Of course, another possibility is that if you have the space in the system
>> > for another drive, you may want to convert to a full raid6 for the future.
>> >  That way you have the extra safety built-in in advance. But that will
>> > definitely lead to a re-shape.
>>
>> Actually I dont have free physical space, array already has 7 drives.
>> For the process I need place the additional drive on table near the PC
>> and cool it with fan standing by itself on table... :)
>>
>> >>
>> >> My questions:
>> >> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>> >
>> > I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
>> > stuff that I only know about in theory, and have not tried in practice.
>> >
>> >
>> >>
>> >> - My array has now left-symmetric layout, so after migration to RAID6
>> >> it should be left-symmetric-6. Is RAID6 working without problem in
>> >> degraded mode with this layout, no matter which one or two drives are
>> >> missing?
>> >>
>> >
>> > The layout will not affect the redundancy or the features of the raid - it
>> > will only (slightly) affect the speed of some operations.
>>
>> I know it should work, but it is probably configuration that is not
>> used much by users, so maybe it is not tested as much as standard
>> layouts. So the question was aiming more at practical experience and
>> stability...
>>
>> >> - What happens in step 5 and how long does it take? (If it is without
>> >> reshaping, it should only upgrade superblocks and thats it.)
>> >
>> > That is my understanding.
>> >
>> >
>> >>
>> >> - What happens if I dont remove spare S before migration back to
>> >> RAID5? Will the array be reshaped and which drive will it make into
>> >> spare? (If step 5 is instantaneous, there is no reason for that. But
>> >> if it takes time, it is probably safer.)
>> >>
>> >
>> > I /think/ that the extra disk will turn into a hot spare.  But I am getting
>> > out of my depth here - it all depends on how the disks get numbered and how
>> > that affects the layout, and I don't know the details here.
>> >
>> >
>> >> So all and alll, what guys do you think is more reliable now, new
>> >> hot-replace or these steps?
>> >
>> >
>> > I too am very curious to hear opinions.  Hot-replace will certainly be much
>> > simpler and faster than these sorts of re-shaping - it's exactly the sort of
>> > situation the feature was designed for.  But I don't know if it is
>> > considered stable and well-tested, or "bleeding edge".
>> >
>> > mvh.,
>> >
>> > David
>> >
>> >
>> >
>> >>
>> >> Thanks.
>> >>
>> >> Patrik
>> >>
>> >> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no>
>> >>  wrote:
>> >>> On 08/05/12 11:10, Patrik Horník wrote:
>> >>>>
>> >>>> Hello guys,
>> >>>>
>> >>>> I need to replace drive in big production RAID5 array and I am
>> >>>> thinking about using new hot-replace feature added in kernel 3.3.
>> >>>>
>> >>>> Does someone have experience with it on big RAID5 arrays? Mine is 7 *
>> >>>> 1.5 TB. What do you think about its status / stability / reliability?
>> >>>> Do you recommend it on production data?
>> >>>>
>> >>>> Thanks.
>> >>>>
>> >>>
>> >>> If you don't want to play with the "bleeding edge" features, you could
>> >>> add
>> >>> the disk and extend the array to RAID6, then remove the old drive. I
>> >>> think
>> >>> if you want to do it all without doing any re-shapes, however, then you'd
>> >>> need a third drive (the extra drive could easily be an external USB disk
>> >>> if
>> >>> needed - it will only be used for writing, and not for reading unless
>> >>> there's another disk failure).  Start by adding the extra drive as a hot
>> >>> spare, then re-shape your raid5 to raid6 in raid5+extra parity layout.
>> >>>  Then
>> >>> fail and remove the old drive.  Put the new drive into the box and add it
>> >>> as
>> >>> a hot spare.  It should automatically take its place in the raid5,
>> >>> replacing
>> >>> the old one.  Once it has been rebuilt, you can fail and remove the extra
>> >>> drive, then re-shape back to raid5.
>> >>>
>> >>> If things go horribly wrong, the external drive gives you your parity
>> >>> protection.
>> >>>
>> >>> Of course, don't follow this plan until others here have commented on it,
>> >>> and either corrected or approved it.
>> >>>
>> >>> And make sure you have a good backup no matter what you decide to do.
>> >>>
>> >>> mvh.,
>> >>>
>> >>> David
>> >>>
>> >>
>> >>
>> >
>

* Re: Hot-replace for RAID5
  2012-05-11  2:44       ` Patrik Horník
@ 2012-05-11  7:16         ` David Brown
  2012-05-12  4:40           ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: David Brown @ 2012-05-11  7:16 UTC (permalink / raw)
  To: patrik; +Cc: NeilBrown, linux-raid

Just in case you missed it earlier...

Remember to take a backup before you start this!

Also make notes of things like the "mdadm --detail", version numbers, 
the exact commands executed, etc. (and store this information on another 
computer!)  If something does go wrong, then that information can make 
it much easier for Neil or others to advise you.
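
For example, something along these lines - only a sketch, with /dev/mdX and
/dev/sdY1 as placeholders - with the output copied to another machine:

   mdadm --detail /dev/mdX      # array state, level, layout and member list
   mdadm --examine /dev/sdY1    # superblock of each member device
   cat /proc/mdstat             # current sync/reshape status
   mdadm --version
   uname -a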

mvh.,

David


On 11/05/2012 04:44, Patrik Horník wrote:
> On Fri, May 11, 2012 at 2:50 AM, NeilBrown<neilb@suse.de>  wrote:
>> On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník<patrik@dsl.sk>  wrote:
>>
>>> Neil, can you please comment if separate operations mentioned in this
>>> process are behaving and are stable enough as we expect? Thanks.
>>
>> The conversion to and from RAID6 as described should work as expected, though
>> it requires having an extra device and requires to 'recovery' cycles.
>> Specifying the number of --raid-devices is not necessary.  When you convert
>> RAID5 to RAID6, mdadm assumes you are increasing number of devices by 1
>> unless you say otherwise.  Similarly with RAID6->RAID5 the assumption is a
>> decrease by 1.
>>
>> Doing an in-place reshape with the new 3.3 code should work, though with a
>> softer "should" than above.  We will only know that it is "stable" when enough
>> people (such as yourself) try it and report success.  If anything does go
>> wrong I would of course help you to put the array back together but I can
>> never guarantee no data loss.  You wouldn't be the first to test the code on
>> live data, but you would be the second that I have heard of.
>
> Thanks Neil, this answers my questions. I dont like being second, so
> RAID5 - RAID6 - RAID5 it is... :)
>
> In addition my array has 0.9 metadata so hot-replace would also
> require conversion of metadata, so all together it seems much riskier.
>
>> The in-place reshape is not yet supported by mdadm but it is very easy to
>> manage directly.  Just
>>    echo replaceable>  /sys/block/mdXXX/md/dev-YYY/state
>> and as soon as a spare is available the replacement will happen.
>>
>> NeilBrown
>>
>>
>>>
>>> On Thu, May 10, 2012 at 8:59 AM, David Brown<david.brown@hesbynett.no>  wrote:
>>>> (I accidentally sent my first reply directly to the OP, and forgot the
>>>> mailing list - I'm adding it back now, because I don't want the OP to follow
>>>> my advice until others have confirmed or corrected it!)
>>>>
>>>>
>>>> On 09/05/2012 21:53, Patrik Horník wrote:
>>>>> Great suggestion, thanks.
>>>>>
>>>>> So I guess steps with exact parameters should be:
>>>>> 1, add spare S to RAID5 array
>>>>> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
>>>>> 3, remove faulty drive and add replacement, let it synchronize
>>>>> 4, possibly remove added spare S
>>>>> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
>>>>
>>>>
>>>> Yes, that's what I was thinking.  You are missing "2b - let it synchronise".
>>>
>>> Sure :)
>>>
>>>> Of course, another possibility is that if you have the space in the system
>>>> for another drive, you may want to convert to a full raid6 for the future.
>>>>   That way you have the extra safety built-in in advance. But that will
>>>> definitely lead to a re-shape.
>>>
>>> Actually I dont have free physical space, array already has 7 drives.
>>> For the process I need place the additional drive on table near the PC
>>> and cool it with fan standing by itself on table... :)
>>>
>>>>>
>>>>> My questions:
>>>>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>>>>
>>>> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
>>>> stuff that I only know about in theory, and have not tried in practice.
>>>>
>>>>
>>>>>
>>>>> - My array has now left-symmetric layout, so after migration to RAID6
>>>>> it should be left-symmetric-6. Is RAID6 working without problem in
>>>>> degraded mode with this layout, no matter which one or two drives are
>>>>> missing?
>>>>>
>>>>
>>>> The layout will not affect the redundancy or the features of the raid - it
>>>> will only (slightly) affect the speed of some operations.
>>>
>>> I know it should work, but it is probably configuration that is not
>>> used much by users, so maybe it is not tested as much as standard
>>> layouts. So the question was aiming more at practical experience and
>>> stability...
>>>
>>>>> - What happens in step 5 and how long does it take? (If it is without
>>>>> reshaping, it should only upgrade superblocks and thats it.)
>>>>
>>>> That is my understanding.
>>>>
>>>>
>>>>>
>>>>> - What happens if I dont remove spare S before migration back to
>>>>> RAID5? Will the array be reshaped and which drive will it make into
>>>>> spare? (If step 5 is instantaneous, there is no reason for that. But
>>>>> if it takes time, it is probably safer.)
>>>>>
>>>>
>>>> I /think/ that the extra disk will turn into a hot spare.  But I am getting
>>>> out of my depth here - it all depends on how the disks get numbered and how
>>>> that affects the layout, and I don't know the details here.
>>>>
>>>>
>>>>> So all and alll, what guys do you think is more reliable now, new
>>>>> hot-replace or these steps?
>>>>
>>>>
>>>> I too am very curious to hear opinions.  Hot-replace will certainly be much
>>>> simpler and faster than these sorts of re-shaping - it's exactly the sort of
>>>> situation the feature was designed for.  But I don't know if it is
>>>> considered stable and well-tested, or "bleeding edge".
>>>>
>>>> mvh.,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Patrik
>>>>>
>>>>> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no>
>>>>>   wrote:
>>>>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>>>>
>>>>>>> Hello guys,
>>>>>>>
>>>>>>> I need to replace drive in big production RAID5 array and I am
>>>>>>> thinking about using new hot-replace feature added in kernel 3.3.
>>>>>>>
>>>>>>> Does someone have experience with it on big RAID5 arrays? Mine is 7 *
>>>>>>> 1.5 TB. What do you think about its status / stability / reliability?
>>>>>>> Do you recommend it on production data?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>> If you don't want to play with the "bleeding edge" features, you could
>>>>>> add
>>>>>> the disk and extend the array to RAID6, then remove the old drive. I
>>>>>> think
>>>>>> if you want to do it all without doing any re-shapes, however, then you'd
>>>>>> need a third drive (the extra drive could easily be an external USB disk
>>>>>> if
>>>>>> needed - it will only be used for writing, and not for reading unless
>>>>>> there's another disk failure).  Start by adding the extra drive as a hot
>>>>>> spare, then re-shape your raid5 to raid6 in raid5+extra parity layout.
>>>>>>   Then
>>>>>> fail and remove the old drive.  Put the new drive into the box and add it
>>>>>> as
>>>>>> a hot spare.  It should automatically take its place in the raid5,
>>>>>> replacing
>>>>>> the old one.  Once it has been rebuilt, you can fail and remove the extra
>>>>>> drive, then re-shape back to raid5.
>>>>>>
>>>>>> If things go horribly wrong, the external drive gives you your parity
>>>>>> protection.
>>>>>>
>>>>>> Of course, don't follow this plan until others here have commented on it,
>>>>>> and either corrected or approved it.
>>>>>>
>>>>>> And make sure you have a good backup no matter what you decide to do.
>>>>>>
>>>>>> mvh.,
>>>>>>
>>>>>> David
>>>>>>
>>>>>
>>>>>
>>>>
>>
>
>


* Re: Hot-replace for RAID5
  2012-05-11  7:16         ` David Brown
@ 2012-05-12  4:40           ` Patrik Horník
  2012-05-12 15:56             ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-12  4:40 UTC (permalink / raw)
  To: David Brown; +Cc: NeilBrown, linux-raid

Neil, the migration to RAID6 is unfortunately not working as expected.

I added a spare and used the command mdadm --grow /dev/md6 --level 6
--layout=preserve, but I guess it ignored --layout=preserve.

It asked for a backup_file and now it is writing the same amount of data
to all drives. I can maybe live with that, even if it is a little
risky because I suspect one of the drives is not OK. But the problem
is that I thought the backup_file was only for some critical section, so
I gave it a backup_file located on one of the drives used in the array.
It is of course not on a partition that is in the array, but it seems to
be the I/O bottleneck. The speed of reshaping is not constant, varies
between 100 KB/s and 1.6 MB/s, and it seems it will take more than a
week, maybe two.

It is kernel 3.2.0 amd64 and mdadm 3.2.2 from squeeze backports; it
was seven and is now eight drives.

What additional info do you need to diagnose the problem? I am not yet
100% sure the bottleneck is the backup file, but it looks like it from
iostat -d. Is there anything I can do about that? (Like stopping the
reshape and changing the backup file. To do that I need to restart the
server, and I need the operation to be 100% safe.)
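
For what it is worth, I am watching it roughly like this (nothing exotic,
just the standard tools; adjust device names as needed):

   cat /proc/mdstat                    # reshape progress and current speed
   iostat -dx 5                        # per-device utilization; the backup-file drive stands out
   cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max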

Here is output of detail:

        Version : 0.91
  Creation Time : Tue Aug 18 14:51:41 2009
     Raid Level : raid6
     Array Size : 2933388288 (2797.50 GiB 3003.79 GB)
  Used Dev Size : 488898048 (466.25 GiB 500.63 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 6
    Persistence : Superblock is persistent

    Update Time : Sat May 12 06:37:48 2012
          State : clean, degraded, reshaping
 Active Devices : 7
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric-6
     Chunk Size : 64K

 Reshape Status : 0% complete
     New Layout : left-symmetric

           UUID : d8e679a2:5d6fa7a7:2e406ee4:439be8d3
         Events : 0.983549

    Number   Major   Minor   RaidDevice State
       0       8      115        0      active sync   /dev/sdh3
       1       8       67        1      active sync   /dev/sde3
       2       8       99        2      active sync   /dev/sdg3
       3       8       83        3      active sync   /dev/sdf3
       4       8        3        4      active sync   /dev/sda3
       5       8       19        5      active sync   /dev/sdb3
       6       8       35        6      active sync   /dev/sdc3
       7       8       51        7      spare rebuilding   /dev/sdd3


Patrik


On Fri, May 11, 2012 at 9:16 AM, David Brown <david.brown@hesbynett.no> wrote:
> Just in case you missed it earlier...
>
> Remember to take a backup before you start this!
>
> Also make notes of things like the "mdadm --detail", version numbers, the
> exact commands executed, etc. (and store this information on another
> computer!)  If something does go wrong, then that information can make it
> much easier for Neil or others to advise you.
>
> mvh.,
>
> David
>
>
>
> On 11/05/2012 04:44, Patrik Horník wrote:
>>
>> On Fri, May 11, 2012 at 2:50 AM, NeilBrown<neilb@suse.de>  wrote:
>>>
>>> On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník<patrik@dsl.sk>  wrote:
>>>
>>>> Neil, can you please comment if separate operations mentioned in this
>>>> process are behaving and are stable enough as we expect? Thanks.
>>>
>>>
>>> The conversion to and from RAID6 as described should work as expected,
>>> though
>>> it requires having an extra device and requires to 'recovery' cycles.
>>> Specifying the number of --raid-devices is not necessary.  When you
>>> convert
>>> RAID5 to RAID6, mdadm assumes you are increasing number of devices by 1
>>> unless you say otherwise.  Similarly with RAID6->RAID5 the assumption is
>>> a
>>> decrease by 1.
>>>
>>> Doing an in-place reshape with the new 3.3 code should work, though with
>>> a
>>> softer "should" than above.  We will only know that it is "stable" when
>>> enough
>>> people (such as yourself) try it and report success.  If anything does go
>>> wrong I would of course help you to put the array back together but I can
>>> never guarantee no data loss.  You wouldn't be the first to test the code
>>> on
>>> live data, but you would be the second that I have heard of.
>>
>>
>> Thanks Neil, this answers my questions. I dont like being second, so
>> RAID5 - RAID6 - RAID5 it is... :)
>>
>> In addition my array has 0.9 metadata so hot-replace would also
>> require conversion of metadata, so all together it seems much riskier.
>>
>>> The in-place reshape is not yet supported by mdadm but it is very easy to
>>> manage directly.  Just
>>>   echo replaceable>  /sys/block/mdXXX/md/dev-YYY/state
>>> and as soon as a spare is available the replacement will happen.
>>>
>>> NeilBrown
>>>
>>>
>>>>
>>>> On Thu, May 10, 2012 at 8:59 AM, David Brown<david.brown@hesbynett.no>
>>>>  wrote:
>>>>>
>>>>> (I accidentally sent my first reply directly to the OP, and forgot the
>>>>> mailing list - I'm adding it back now, because I don't want the OP to
>>>>> follow
>>>>> my advice until others have confirmed or corrected it!)
>>>>>
>>>>>
>>>>> On 09/05/2012 21:53, Patrik Horník wrote:
>>>>>>
>>>>>> Great suggestion, thanks.
>>>>>>
>>>>>> So I guess steps with exact parameters should be:
>>>>>> 1, add spare S to RAID5 array
>>>>>> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1
>>>>>> --layout=preserve
>>>>>> 3, remove faulty drive and add replacement, let it synchronize
>>>>>> 4, possibly remove added spare S
>>>>>> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
>>>>>
>>>>>
>>>>>
>>>>> Yes, that's what I was thinking.  You are missing "2b - let it
>>>>> synchronise".
>>>>
>>>>
>>>> Sure :)
>>>>
>>>>> Of course, another possibility is that if you have the space in the
>>>>> system
>>>>> for another drive, you may want to convert to a full raid6 for the
>>>>> future.
>>>>>  That way you have the extra safety built-in in advance. But that will
>>>>> definitely lead to a re-shape.
>>>>
>>>>
>>>> Actually I dont have free physical space, array already has 7 drives.
>>>> For the process I need place the additional drive on table near the PC
>>>> and cool it with fan standing by itself on table... :)
>>>>
>>>>>>
>>>>>> My questions:
>>>>>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>>>>>
>>>>>
>>>>> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
>>>>> stuff that I only know about in theory, and have not tried in practice.
>>>>>
>>>>>
>>>>>>
>>>>>> - My array has now left-symmetric layout, so after migration to RAID6
>>>>>> it should be left-symmetric-6. Is RAID6 working without problem in
>>>>>> degraded mode with this layout, no matter which one or two drives are
>>>>>> missing?
>>>>>>
>>>>>
>>>>> The layout will not affect the redundancy or the features of the raid -
>>>>> it
>>>>> will only (slightly) affect the speed of some operations.
>>>>
>>>>
>>>> I know it should work, but it is probably configuration that is not
>>>> used much by users, so maybe it is not tested as much as standard
>>>> layouts. So the question was aiming more at practical experience and
>>>> stability...
>>>>
>>>>>> - What happens in step 5 and how long does it take? (If it is without
>>>>>> reshaping, it should only upgrade superblocks and thats it.)
>>>>>
>>>>>
>>>>> That is my understanding.
>>>>>
>>>>>
>>>>>>
>>>>>> - What happens if I dont remove spare S before migration back to
>>>>>> RAID5? Will the array be reshaped and which drive will it make into
>>>>>> spare? (If step 5 is instantaneous, there is no reason for that. But
>>>>>> if it takes time, it is probably safer.)
>>>>>>
>>>>>
>>>>> I /think/ that the extra disk will turn into a hot spare.  But I am
>>>>> getting
>>>>> out of my depth here - it all depends on how the disks get numbered and
>>>>> how
>>>>> that affects the layout, and I don't know the details here.
>>>>>
>>>>>
>>>>>> So all and alll, what guys do you think is more reliable now, new
>>>>>> hot-replace or these steps?
>>>>>
>>>>>
>>>>>
>>>>> I too am very curious to hear opinions.  Hot-replace will certainly be
>>>>> much
>>>>> simpler and faster than these sorts of re-shaping - it's exactly the
>>>>> sort of
>>>>> situation the feature was designed for.  But I don't know if it is
>>>>> considered stable and well-tested, or "bleeding edge".
>>>>>
>>>>> mvh.,
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Patrik
>>>>>>
>>>>>> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no>
>>>>>>  wrote:
>>>>>>>
>>>>>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hello guys,
>>>>>>>>
>>>>>>>> I need to replace drive in big production RAID5 array and I am
>>>>>>>> thinking about using new hot-replace feature added in kernel 3.3.
>>>>>>>>
>>>>>>>> Does someone have experience with it on big RAID5 arrays? Mine is 7
>>>>>>>> *
>>>>>>>> 1.5 TB. What do you think about its status / stability /
>>>>>>>> reliability?
>>>>>>>> Do you recommend it on production data?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>
>>>>>>> If you don't want to play with the "bleeding edge" features, you
>>>>>>> could
>>>>>>> add
>>>>>>> the disk and extend the array to RAID6, then remove the old drive. I
>>>>>>> think
>>>>>>> if you want to do it all without doing any re-shapes, however, then
>>>>>>> you'd
>>>>>>> need a third drive (the extra drive could easily be an external USB
>>>>>>> disk
>>>>>>> if
>>>>>>> needed - it will only be used for writing, and not for reading unless
>>>>>>> there's another disk failure).  Start by adding the extra drive as a
>>>>>>> hot
>>>>>>> spare, then re-shape your raid5 to raid6 in raid5+extra parity
>>>>>>> layout.
>>>>>>>  Then
>>>>>>> fail and remove the old drive.  Put the new drive into the box and
>>>>>>> add it
>>>>>>> as
>>>>>>> a hot spare.  It should automatically take its place in the raid5,
>>>>>>> replacing
>>>>>>> the old one.  Once it has been rebuilt, you can fail and remove the
>>>>>>> extra
>>>>>>> drive, then re-shape back to raid5.
>>>>>>>
>>>>>>> If things go horribly wrong, the external drive gives you your parity
>>>>>>> protection.
>>>>>>>
>>>>>>> Of course, don't follow this plan until others here have commented on
>>>>>>> it,
>>>>>>> and either corrected or approved it.
>>>>>>>
>>>>>>> And make sure you have a good backup no matter what you decide to do.
>>>>>>>
>>>>>>> mvh.,
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>

* Re: Hot-replace for RAID5
  2012-05-12  4:40           ` Patrik Horník
@ 2012-05-12 15:56             ` Patrik Horník
  2012-05-12 23:19               ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-12 15:56 UTC (permalink / raw)
  To: David Brown; +Cc: NeilBrown, linux-raid

Neil,

so I analyzed the behaviour further and found the following:

- The bottleneck of about 1.7 MB/s is probably caused by the backup file
on one of the drives; that drive is utilized almost 80% according to
iostat -x and its avg queue length is almost 4, while await stays under
50 ms.

- The variable speed and the drops down to 100 KB/s are caused by
problems on the drive I suspected as problematic. Its service time
sometimes goes above 1 second. Total average speed is about 0.8 MB/s.
(I tested the read speed on it by running a check of the array and it
managed 30 MB/s. And because --layout=preserve should only read from it,
I did not specifically test its write speed.)

So my questions are:

- Is there a way I can move the backup_file to another drive 100% safely?
To add another non-network drive I need to restart the server. I can then
boot it into some live distribution, for example, to completely prevent
automatic assembly. I think the speed should be a couple of times higher.

- Is it safe to fail and remove the problematic drive? The array will be
down to 6 of 8 drives in the part that is not yet reshaped. It should
double the speed.

- Why did mdadm ignore --layout=preserve? I have other arrays in that
server in which I need to replace a drive.

Thanks.

Patrik

On Sat, May 12, 2012 at 6:40 AM, Patrik Horník <patrik@dsl.sk> wrote:
> Neil, the migration to RAID6 is unfortunately not working as expected.
>
> I added spare and used command mdadm --grow /dev/md6 --level 6
> --layout=preserve, but I guess it ignored layout preserve.
>
> It asked for backup_file and now it is writing the same amount of data
> on all drives. I maybe can live with that, even if that is little
> risky because I suspect one of the drives is not OK. But the problem
> is I thought backup_file is only for some critical section, so I gave
> it backup_file located on one of the drives used in the array. It is
> of course not on a partition in the array, but it seems it is the I/O
> bottleneck. The speed of reshaping is not constant and varies between
> 100 K/s and 1.6 MB/s and it seems it will take more than a week maybe
> two.
>
> It is kernel 3.2.0 amd64 and mdadm 3.2.2 from squezee backports, it
> was seven and now it is eight drives.
>
> What additional info you need to diagnose the problem? I am not yet
> 100% sure the botlleneck is backup file, but it looks like it from
> iostat -d. Is there anything I can do about that? (Like stoping the
> reshaping and changing the backup file. To do that I need to restart
> server and I need the operation was 100% safe.)
>
> Here is output of detail:
>
>  Version : 0.91
>  Creation Time : Tue Aug 18 14:51:41 2009
>     Raid Level : raid6
>     Array Size : 2933388288 (2797.50 GiB 3003.79 GB)
>  Used Dev Size : 488898048 (466.25 GiB 500.63 GB)
>   Raid Devices : 8
>  Total Devices : 8
> Preferred Minor : 6
>    Persistence : Superblock is persistent
>
>    Update Time : Sat May 12 06:37:48 2012
>          State : clean, degraded, reshaping
>  Active Devices : 7
> Working Devices : 8
>  Failed Devices : 0
>  Spare Devices : 1
>
>         Layout : left-symmetric-6
>     Chunk Size : 64K
>
>  Reshape Status : 0% complete
>     New Layout : left-symmetric
>
>           UUID : d8e679a2:5d6fa7a7:2e406ee4:439be8d3
>         Events : 0.983549
>
>    Number   Major   Minor   RaidDevice State
>       0       8      115        0      active sync   /dev/sdh3
>       1       8       67        1      active sync   /dev/sde3
>       2       8       99        2      active sync   /dev/sdg3
>       3       8       83        3      active sync   /dev/sdf3
>       4       8        3        4      active sync   /dev/sda3
>       5       8       19        5      active sync   /dev/sdb3
>       6       8       35        6      active sync   /dev/sdc3
>       7       8       51        7      spare rebuilding   /dev/sdd3
>
>
> Patrik
>
>
> On Fri, May 11, 2012 at 9:16 AM, David Brown <david.brown@hesbynett.no> wrote:
>> Just in case you missed it earlier...
>>
>> Remember to take a backup before you start this!
>>
>> Also make notes of things like the "mdadm --detail", version numbers, the
>> exact commands executed, etc. (and store this information on another
>> computer!)  If something does go wrong, then that information can make it
>> much easier for Neil or others to advise you.
>>
>> mvh.,
>>
>> David
>>
>>
>>
>> On 11/05/2012 04:44, Patrik Horník wrote:
>>>
>>> On Fri, May 11, 2012 at 2:50 AM, NeilBrown<neilb@suse.de>  wrote:
>>>>
>>>> On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník<patrik@dsl.sk>  wrote:
>>>>
>>>>> Neil, can you please comment if separate operations mentioned in this
>>>>> process are behaving and are stable enough as we expect? Thanks.
>>>>
>>>>
>>>> The conversion to and from RAID6 as described should work as expected,
>>>> though
>>>> it requires having an extra device and requires to 'recovery' cycles.
>>>> Specifying the number of --raid-devices is not necessary.  When you
>>>> convert
>>>> RAID5 to RAID6, mdadm assumes you are increasing number of devices by 1
>>>> unless you say otherwise.  Similarly with RAID6->RAID5 the assumption is
>>>> a
>>>> decrease by 1.
>>>>
>>>> Doing an in-place reshape with the new 3.3 code should work, though with
>>>> a
>>>> softer "should" than above.  We will only know that it is "stable" when
>>>> enough
>>>> people (such as yourself) try it and report success.  If anything does go
>>>> wrong I would of course help you to put the array back together but I can
>>>> never guarantee no data loss.  You wouldn't be the first to test the code
>>>> on
>>>> live data, but you would be the second that I have heard of.
>>>
>>>
>>> Thanks Neil, this answers my questions. I dont like being second, so
>>> RAID5 - RAID6 - RAID5 it is... :)
>>>
>>> In addition my array has 0.9 metadata so hot-replace would also
>>> require conversion of metadata, so all together it seems much riskier.
>>>
>>>> The in-place reshape is not yet supported by mdadm but it is very easy to
>>>> manage directly.  Just
>>>>   echo replaceable>  /sys/block/mdXXX/md/dev-YYY/state
>>>> and as soon as a spare is available the replacement will happen.
>>>>
>>>> NeilBrown
>>>>
>>>>
>>>>>
>>>>> On Thu, May 10, 2012 at 8:59 AM, David Brown<david.brown@hesbynett.no>
>>>>>  wrote:
>>>>>>
>>>>>> (I accidentally sent my first reply directly to the OP, and forgot the
>>>>>> mailing list - I'm adding it back now, because I don't want the OP to
>>>>>> follow
>>>>>> my advice until others have confirmed or corrected it!)
>>>>>>
>>>>>>
>>>>>> On 09/05/2012 21:53, Patrik Horník wrote:
>>>>>>>
>>>>>>> Great suggestion, thanks.
>>>>>>>
>>>>>>> So I guess steps with exact parameters should be:
>>>>>>> 1, add spare S to RAID5 array
>>>>>>> 2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1
>>>>>>> --layout=preserve
>>>>>>> 3, remove faulty drive and add replacement, let it synchronize
>>>>>>> 4, possibly remove added spare S
>>>>>>> 5, mdadm --grow /dev/mdX --level 5 --raid-devices N
>>>>>>
>>>>>>
>>>>>>
>>>>>> Yes, that's what I was thinking.  You are missing "2b - let it
>>>>>> synchronise".
>>>>>
>>>>>
>>>>> Sure :)
>>>>>
>>>>>> Of course, another possibility is that if you have the space in the
>>>>>> system
>>>>>> for another drive, you may want to convert to a full raid6 for the
>>>>>> future.
>>>>>>  That way you have the extra safety built-in in advance. But that will
>>>>>> definitely lead to a re-shape.
>>>>>
>>>>>
>>>>> Actually I dont have free physical space, array already has 7 drives.
>>>>> For the process I need place the additional drive on table near the PC
>>>>> and cool it with fan standing by itself on table... :)
>>>>>
>>>>>>>
>>>>>>> My questions:
>>>>>>> - Are you sure steps 3, 4 and 5 would not cause reshaping?
>>>>>>
>>>>>>
>>>>>> I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
>>>>>> stuff that I only know about in theory, and have not tried in practice.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - My array has now left-symmetric layout, so after migration to RAID6
>>>>>>> it should be left-symmetric-6. Is RAID6 working without problem in
>>>>>>> degraded mode with this layout, no matter which one or two drives are
>>>>>>> missing?
>>>>>>>
>>>>>>
>>>>>> The layout will not affect the redundancy or the features of the raid -
>>>>>> it
>>>>>> will only (slightly) affect the speed of some operations.
>>>>>
>>>>>
>>>>> I know it should work, but it is probably configuration that is not
>>>>> used much by users, so maybe it is not tested as much as standard
>>>>> layouts. So the question was aiming more at practical experience and
>>>>> stability...
>>>>>
>>>>>>> - What happens in step 5 and how long does it take? (If it is without
>>>>>>> reshaping, it should only upgrade superblocks and thats it.)
>>>>>>
>>>>>>
>>>>>> That is my understanding.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - What happens if I dont remove spare S before migration back to
>>>>>>> RAID5? Will the array be reshaped and which drive will it make into
>>>>>>> spare? (If step 5 is instantaneous, there is no reason for that. But
>>>>>>> if it takes time, it is probably safer.)
>>>>>>>
>>>>>>
>>>>>> I /think/ that the extra disk will turn into a hot spare.  But I am
>>>>>> getting
>>>>>> out of my depth here - it all depends on how the disks get numbered and
>>>>>> how
>>>>>> that affects the layout, and I don't know the details here.
>>>>>>
>>>>>>
>>>>>>> So all and alll, what guys do you think is more reliable now, new
>>>>>>> hot-replace or these steps?
>>>>>>
>>>>>>
>>>>>>
>>>>>> I too am very curious to hear opinions.  Hot-replace will certainly be
>>>>>> much
>>>>>> simpler and faster than these sorts of re-shaping - it's exactly the
>>>>>> sort of
>>>>>> situation the feature was designed for.  But I don't know if it is
>>>>>> considered stable and well-tested, or "bleeding edge".
>>>>>>
>>>>>> mvh.,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> Patrik
>>>>>>>
>>>>>>> On Wed, May 9, 2012 at 8:09 AM, David Brown<david.brown@hesbynett.no>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> On 08/05/12 11:10, Patrik Horník wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello guys,
>>>>>>>>>
>>>>>>>>> I need to replace drive in big production RAID5 array and I am
>>>>>>>>> thinking about using new hot-replace feature added in kernel 3.3.
>>>>>>>>>
>>>>>>>>> Does someone have experience with it on big RAID5 arrays? Mine is 7
>>>>>>>>> *
>>>>>>>>> 1.5 TB. What do you think about its status / stability /
>>>>>>>>> reliability?
>>>>>>>>> Do you recommend it on production data?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>
>>>>>>>> If you don't want to play with the "bleeding edge" features, you
>>>>>>>> could
>>>>>>>> add
>>>>>>>> the disk and extend the array to RAID6, then remove the old drive. I
>>>>>>>> think
>>>>>>>> if you want to do it all without doing any re-shapes, however, then
>>>>>>>> you'd
>>>>>>>> need a third drive (the extra drive could easily be an external USB
>>>>>>>> disk
>>>>>>>> if
>>>>>>>> needed - it will only be used for writing, and not for reading unless
>>>>>>>> there's another disk failure).  Start by adding the extra drive as a
>>>>>>>> hot
>>>>>>>> spare, then re-shape your raid5 to raid6 in raid5+extra parity
>>>>>>>> layout.
>>>>>>>>  Then
>>>>>>>> fail and remove the old drive.  Put the new drive into the box and
>>>>>>>> add it
>>>>>>>> as
>>>>>>>> a hot spare.  It should automatically take its place in the raid5,
>>>>>>>> replacing
>>>>>>>> the old one.  Once it has been rebuilt, you can fail and remove the
>>>>>>>> extra
>>>>>>>> drive, then re-shape back to raid5.
>>>>>>>>
>>>>>>>> If things go horribly wrong, the external drive gives you your parity
>>>>>>>> protection.
>>>>>>>>
>>>>>>>> Of course, don't follow this plan until others here have commented on
>>>>>>>> it,
>>>>>>>> and either corrected or approved it.
>>>>>>>>
>>>>>>>> And make sure you have a good backup no matter what you decide to do.
>>>>>>>>
>>>>>>>> mvh.,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-12 15:56             ` Patrik Horník
@ 2012-05-12 23:19               ` NeilBrown
  2012-05-13  7:43                 ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2012-05-12 23:19 UTC (permalink / raw)
  To: patrik; +Cc: David Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2382 bytes --]

On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> Neil,

Hi Patrik,
 sorry about the "--layout=preserve" confusion.  I was a bit hasty.
 "--layout=left-symmetric-6" would probably have done what was wanted, but it
 is a bit late for that :-(

> 
> so I further analyzed the behaviour and I found following:
> 
> - The bottleneck cca 1.7 MB/s is probably caused by backup file on one
> of the drives, that drive is utilized almost 80% according to iostat
> -x and its avg queue length is almost 4 while having await under 50
> ms.
> 
> - The variable speed and low speeds down to 100 KB are caused by
> problems on drive I suspected as problematic. Its service time is
> sometimes going above 1 sec.. Total avg speed is about 0.8 MB/s. (I
> tested the read speed on it by running check of array and it worked
> with 30 MB/s. And because preserve should only read from it I did not
> specifically test its write speed )
> 
> So my questions are:
> 
> - Is there a way I can move backup_file to other drive 100% safely? To
> add another non-network drive I need to restart the server. I can boot
> it then to some live distribution for example to 100% prevent
> automatic assembly. I think speed should be couple of times higher.

Yes.
If you stop the array, then copy the backup file, then re-assemble the
array giving it the backup file in the new location, all should be well.
A reboot while the array is stopped is not a problem.
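
A rough sketch of that sequence (array name, device list and paths are
just placeholders here, not your real ones):

  mdadm --stop /dev/mdX
  cp /old-disk/reshape-backup /new-disk/reshape-backup
  mdadm --assemble /dev/mdX --backup-file=/new-disk/reshape-backup /dev/sd[a-h]3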

> 
> - Is it safe to fail and remove problematic drive? The array will be
> down to 6 from 8 drives in part where it is not reshaped. It should
> double the speed.

As safe as it ever is to fail a device in a non-degraded array.
i.e. it would not cause a problem directly but of course if you get an error
on another device, that would be awkward.

> 
> - Why mdadm did ignore layout=preserve? I have other arrays in that
> server in which I need replace the drive.

I'm not 100% sure - what version of mdadm are you using?
If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
I'll add a test for this to the test suite to make sure it doesn't break again.
But you are using 3.2.2 .... Not sure. I'd have to look more closely.

Using --layout=left-symmetric-6 should work, though testing on some
/dev/loop devices first is always a good idea.
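
A throwaway test could look something like this (sizes, device names and
paths are arbitrary):

  for i in 0 1 2 3; do
    dd if=/dev/zero of=/tmp/d$i bs=1M count=200
    losetup /dev/loop$i /tmp/d$i
  done
  mdadm --create /dev/md9 --level=5 --raid-devices=3 /dev/loop[0-2]
  # wait for the initial sync to finish (cat /proc/mdstat), then:
  mdadm --add /dev/md9 /dev/loop3
  mdadm --grow /dev/md9 --level=6 --layout=left-symmetric-6
  mdadm --detail /dev/md9
  mdadm --stop /dev/md9; for i in 0 1 2 3; do losetup -d /dev/loop$i; done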

NeilBrown



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-12 23:19               ` NeilBrown
@ 2012-05-13  7:43                 ` Patrik Horník
  2012-05-13 21:41                   ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-13  7:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

Hi Neil,

On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@suse.de> wrote:
> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>
>> Neil,
>
> Hi Patrik,
>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
>  -layout=left-symmetric-6" would probably have done what was wanted, but it
>  is a bit later for that :-(

--layout=preserve is also mentioned in the md / mdadm
documentation... So is it not the right one?

>>
>> so I further analyzed the behaviour and I found following:
>>
>> - The bottleneck cca 1.7 MB/s is probably caused by backup file on one
>> of the drives, that drive is utilized almost 80% according to iostat
>> -x and its avg queue length is almost 4 while having await under 50
>> ms.
>>
>> - The variable speed and low speeds down to 100 KB are caused by
>> problems on drive I suspected as problematic. Its service time is
>> sometimes going above 1 sec.. Total avg speed is about 0.8 MB/s. (I
>> tested the read speed on it by running check of array and it worked
>> with 30 MB/s. And because preserve should only read from it I did not
>> specifically test its write speed )
>>
>> So my questions are:
>>
>> - Is there a way I can move backup_file to other drive 100% safely? To
>> add another non-network drive I need to restart the server. I can boot
>> it then to some live distribution for example to 100% prevent
>> automatic assembly. I think speed should be couple of times higher.
>
> Yes.
> If you stop the array, then copy the backup file, then re-assemble the
> array giving it the backup file in the new location, all should be well.
> A reboot while the array is stopped is not a problem.

Should or will? :) I have 0.90 (now 0.91) metadata - is everything
needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
squeeze-backports work well? Or should I compile mdadm 3.2.4?

In case there is some risk involved I will need to choose between
waiting and risking a power outage sometime in the following
week (we have something like a storm season here) and risking this...

Do you recommend some live linux distro installable on USB which is
good for this? (One that has the newest versions and doesn't try to
assemble arrays.)

Or will automatic assembly fail and cause no problem at all
for sure? (According to the md or mdadm docs this should be the case.) In
that case can I use the distribution on the server, Debian stable plus
some packages from squeeze, for that? Possibly with raid=noautodetect
added? I have LVM on top of the raid arrays and I don't want to
cause a mess. The OS is not on LVM or raid.

>>
>> - Is it safe to fail and remove problematic drive? The array will be
>> down to 6 from 8 drives in part where it is not reshaped. It should
>> double the speed.
>
> As safe as it ever is to fail a device in a non-degraded array.
> i.e. it would not cause a problem directly but of course if you get an error
> on another device, that would be awkward.

I actually "check"-ed this raid array a couple of times a few days ago
and the data on the other drives were OK. The problematic drive reported
a couple of read errors, always corrected with data from the other drives
and by rewriting.

About that, should this reshaping work OK if it encounters
read errors on the problematic drive? Will it use data from the other
drives to correct them in this reshaping mode as well?

Thanks.

Patrik

>>
>> - Why mdadm did ignore layout=preserve? I have other arrays in that
>> server in which I need replace the drive.
>
> I'm not 100% sure - what version of mdadm are you using?
> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
> I'll add test for this to the test suit to make sure it doesn't break again.
> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
>
> Using --layout=left-symmetric-6 should work, though testing on some
> /dev/loop devices first is always a good idea.
>
> NeilBrown
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-13  7:43                 ` Patrik Horník
@ 2012-05-13 21:41                   ` Patrik Horník
  2012-05-13 22:15                     ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-13 21:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

Hi Neil,

I decided to move the backup file to another device. I stopped the array;
mdadm stopped it but wrote "mdadm: failed to unfreeze array". What
exactly does it mean? I don't want to proceed until I am sure it does
not signal an error.

I quickly checked the sources and it seems to be related to some sysfs
resources, but I am not sure. The array did disappear from
/sys/block/, though.

Thanks.

Patrik

On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@dsl.sk> wrote:
> Hi Neil,
>
> On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@suse.de> wrote:
>> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>>
>>> Neil,
>>
>> Hi Patrik,
>>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
>>  -layout=left-symmetric-6" would probably have done what was wanted, but it
>>  is a bit later for that :-(
>
> --layout=preserve is mentioned also in the md or mdadm
> documentation... So is it not the right one?
>
>>>
>>> so I further analyzed the behaviour and I found following:
>>>
>>> - The bottleneck cca 1.7 MB/s is probably caused by backup file on one
>>> of the drives, that drive is utilized almost 80% according to iostat
>>> -x and its avg queue length is almost 4 while having await under 50
>>> ms.
>>>
>>> - The variable speed and low speeds down to 100 KB are caused by
>>> problems on drive I suspected as problematic. Its service time is
>>> sometimes going above 1 sec.. Total avg speed is about 0.8 MB/s. (I
>>> tested the read speed on it by running check of array and it worked
>>> with 30 MB/s. And because preserve should only read from it I did not
>>> specifically test its write speed )
>>>
>>> So my questions are:
>>>
>>> - Is there a way I can move backup_file to other drive 100% safely? To
>>> add another non-network drive I need to restart the server. I can boot
>>> it then to some live distribution for example to 100% prevent
>>> automatic assembly. I think speed should be couple of times higher.
>>
>> Yes.
>> If you stop the array, then copy the backup file, then re-assemble the
>> array giving it the backup file in the new location, all should be well.
>> A reboot while the array is stopped is not a problem.
>
> Should or will? :) I have 0.90, now 0.91, metadata, is everything
> needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
> squeeze-backports work well? Or should I compile mdadm 3.2.4?
>
> In case there is some risk involved I will need to choose between
> waiting and risking power outage happening sometimes in the following
> week (we have something like storm season here) and risking this...
>
> Do you recommend some live linux distro installable on USB which is
> good for this? (One that has newest versions and dont try assemble
> arrays.)
>
> Or will automatic assemble fail and it will cause no problem at all
> for sure? (According to md or mdadm doc this should be the case.) In
> that case can I use distribution on the server, Debian stable plus
> some packages from squeeze, for that? Possibly with added
> raid=noautodetect? I have LVM on top of raid arrays and I dont want to
> cause mess. OS is not on LVM or raid.
>
>>>
>>> - Is it safe to fail and remove problematic drive? The array will be
>>> down to 6 from 8 drives in part where it is not reshaped. It should
>>> double the speed.
>>
>> As safe as it ever is to fail a device in a non-degraded array.
>> i.e. it would not cause a problem directly but of course if you get an error
>> on another device, that would be awkward.
>
> I actually "check"-ed this raid array couple of times few days ago and
> data on other drives were OK. Problematic drive reported couple of
> reading errors, always corrected with data from other drives and by
> rewriting.
>
> About that, shoud this reshaping work OK if it encounter possible
> reading errors on problematic drive? Will it use data from other
> drives to correct that also in this reshaping mode?
>
> Thanks.
>
> Patrik
>
>>>
>>> - Why mdadm did ignore layout=preserve? I have other arrays in that
>>> server in which I need replace the drive.
>>
>> I'm not 100% sure - what version of mdadm are you using?
>> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
>> I'll add test for this to the test suit to make sure it doesn't break again.
>> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
>>
>> Using --layout=left-symmetric-6 should work, though testing on some
>> /dev/loop devices first is always a good idea.
>>
>> NeilBrown
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-13 21:41                   ` Patrik Horník
@ 2012-05-13 22:15                     ` NeilBrown
  2012-05-14  0:52                       ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2012-05-13 22:15 UTC (permalink / raw)
  To: patrik; +Cc: David Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 6580 bytes --]

On Sun, 13 May 2012 23:41:35 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> Hi Neil,
> 
> I decided to move backup file on other device. I stopped the array,
> mdadm stopped it but wrote "mdadm: failed to unfreeze array". What
> does it exactly mean? I dont want to proceed until I am sure it does
> not signalize error.

That would appear to be a minor bug in mdadm - I've made a note.

When reshaping an array like this, the 'mdadm' which started the reshape
forks and continues in the background, managing the backup file.
When it exits, having completed, it makes sure that the array is 'unfrozen'
just to be safe.
However if it exits because  the array was stopped, there is no array to
unfreeze and it gets a little confused.
So it is a bug but it does not affect the data on the devices or indicate
that anything serious went wrong when stopping the array.

> 
> I quickly checked sources and it seems to be related to some sysfs
> resources, but I am not sure. But the array disappeared from
> /sys/block/.

Exactly.  And as the array disappeared, it really has stopped.


> 
> Thanks.
> 
> Patrik
> 
> On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@dsl.sk> wrote:
> > Hi Neil,
> >
> > On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@suse.de> wrote:
> >> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >>
> >>> Neil,
> >>
> >> Hi Patrik,
> >>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
> >>  -layout=left-symmetric-6" would probably have done what was wanted, but it
> >>  is a bit later for that :-(
> >
> > --layout=preserve is mentioned also in the md or mdadm
> > documentation... So is it not the right one?

It should be ... I think.  But it definitely seems not to work.  I only have
a vague memory of how it was meant to work so I'll have to review the code
and add some proper self-tests.

> >
> >>>
> >>> so I further analyzed the behaviour and I found following:
> >>>
> >>> - The bottleneck cca 1.7 MB/s is probably caused by backup file on one
> >>> of the drives, that drive is utilized almost 80% according to iostat
> >>> -x and its avg queue length is almost 4 while having await under 50
> >>> ms.
> >>>
> >>> - The variable speed and low speeds down to 100 KB are caused by
> >>> problems on drive I suspected as problematic. Its service time is
> >>> sometimes going above 1 sec.. Total avg speed is about 0.8 MB/s. (I
> >>> tested the read speed on it by running check of array and it worked
> >>> with 30 MB/s. And because preserve should only read from it I did not
> >>> specifically test its write speed )
> >>>
> >>> So my questions are:
> >>>
> >>> - Is there a way I can move backup_file to other drive 100% safely? To
> >>> add another non-network drive I need to restart the server. I can boot
> >>> it then to some live distribution for example to 100% prevent
> >>> automatic assembly. I think speed should be couple of times higher.
> >>
> >> Yes.
> >> If you stop the array, then copy the backup file, then re-assemble the
> >> array giving it the backup file in the new location, all should be well.
> >> A reboot while the array is stopped is not a problem.
> >
> > Should or will? :) I have 0.90, now 0.91, metadata, is everything
> > needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
> > squeeze-backports work well? Or should I compile mdadm 3.2.4?

"Will" requires clairvoyance :-)
0.91 is the same as 0.90, except that the array is in the middle of a reshape.
This makes sure that old kernels which don't know about reshape never try to
start the array.
Yes - everything you need is stored in the 0.91 metadata and the backup file.
After a clean shutdown, you could manage without the backup file if you had
to, but as you have it, that isn't an issue.
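
If you want to see what the superblock itself says, something like this on
any member device (the name is only an example) should report version 0.91
while the reshape is pending:

  mdadm --examine /dev/sda3 | grep -i version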

> >
> > In case there is some risk involved I will need to choose between
> > waiting and risking power outage happening sometimes in the following
> > week (we have something like storm season here) and risking this...

There is always risk.
I think you made a wise choice in choosing to move the backup file.

> >
> > Do you recommend some live linux distro installable on USB which is
> > good for this? (One that has newest versions and dont try assemble
> > arrays.)

No.  Best to use whatever you are familiar with.


> >
> > Or will automatic assemble fail and it will cause no problem at all
> > for sure? (According to md or mdadm doc this should be the case.) In
> > that case can I use distribution on the server, Debian stable plus
> > some packages from squeeze, for that? Possibly with added
> > raid=noautodetect? I have LVM on top of raid arrays and I dont want to
> > cause mess. OS is not on LVM or raid.
> >

raid=noautodetect is certainly a good idea. I'm not sure if the in-kernel
autodetect will try to start a reshaping raid - I hope not.
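
(That is just a kernel command line option, i.e. append it to the 'kernel'
line of your boot loader entry, something along the lines of:

  kernel /vmlinuz-... root=... ro raid=noautodetect

with your own kernel and root values.)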

> >>>
> >>> - Is it safe to fail and remove problematic drive? The array will be
> >>> down to 6 from 8 drives in part where it is not reshaped. It should
> >>> double the speed.
> >>
> >> As safe as it ever is to fail a device in a non-degraded array.
> >> i.e. it would not cause a problem directly but of course if you get an error
> >> on another device, that would be awkward.
> >
> > I actually "check"-ed this raid array couple of times few days ago and
> > data on other drives were OK. Problematic drive reported couple of
> > reading errors, always corrected with data from other drives and by
> > rewriting.

That is good!

> >
> > About that, shoud this reshaping work OK if it encounter possible
> > reading errors on problematic drive? Will it use data from other
> > drives to correct that also in this reshaping mode?

As long as there are enough working drives to be able to read and write the
data, the reshape will continue.

NeilBrown


> >
> > Thanks.
> >
> > Patrik
> >
> >>>
> >>> - Why mdadm did ignore layout=preserve? I have other arrays in that
> >>> server in which I need replace the drive.
> >>
> >> I'm not 100% sure - what version of mdadm are you using?
> >> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
> >> I'll add test for this to the test suit to make sure it doesn't break again.
> >> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
> >>
> >> Using --layout=left-symmetric-6 should work, though testing on some
> >> /dev/loop devices first is always a good idea.
> >>
> >> NeilBrown
> >>
> >>


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-13 22:15                     ` NeilBrown
@ 2012-05-14  0:52                       ` Patrik Horník
  2012-05-15 10:11                         ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-14  0:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

Well,

I used raid=noautodetect and the other arrays did start automatically.
I am not sure what started them, maybe the initscripts... But the one
which is reshaping thankfully did not start.

Unfortunately the speed is not much better. The top speed is up by
about a third to maybe 2.3 MB/s, which seems pretty low, and I am unable
to quickly pinpoint the exact reason. Do you have an idea what it can be
and how to improve the speed?

In addition the performance problem with the bad drive periodically kicks
in sooner, and thus the average speed is almost the same, around 0.8 to
0.9 MB/s. I am thinking about failing the problematic drive. Apart from
ending up without redundancy for the not-yet-reshaped part, should
failing it work as expected even in the state the array is in now?
(raid6 with 8 drives, 7 active devices in the not-yet-reshaped part,
stopped and re-started with the backup-file a couple of times.)
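
Concretely I would just do the usual, with the device name being only an
example:

  mdadm /dev/mdX --fail /dev/sdd3
  mdadm /dev/mdX --remove /dev/sdd3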

Thanks.

Patrik

On Mon, May 14, 2012 at 12:15 AM, NeilBrown <neilb@suse.de> wrote:
> On Sun, 13 May 2012 23:41:35 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>
>> Hi Neil,
>>
>> I decided to move backup file on other device. I stopped the array,
>> mdadm stopped it but wrote "mdadm: failed to unfreeze array". What
>> does it exactly mean? I dont want to proceed until I am sure it does
>> not signalize error.
>
> That would appear to be a minor bug in mdadm - I've made a note.
>
> When reshaping an array like this, the 'mdadm' which started the reshape
> forks and continues in the background managing the the  backup file.
> When it exits, having completed, it makes sure that the array is 'unfrozen'
> just to be safe.
> However if it exits because  the array was stopped, there is no array to
> unfreeze an it gets a little confused.
> So it is a bug but it does not affect the data on the devices or indicate
> that anything serious went wrong when stopping the array.
>
>>
>> I quickly checked sources and it seems to be related to some sysfs
>> resources, but I am not sure. But the array disappeared from
>> /sys/block/.
>
> Exactly.  And as the array disappeared, it really has stopped.
>
>
>>
>> Thanks.
>>
>> Patrik
>>
>> On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@dsl.sk> wrote:
>> > Hi Neil,
>> >
>> > On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@suse.de> wrote:
>> >> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>> >>
>> >>> Neil,
>> >>
>> >> Hi Patrik,
>> >>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
>> >>  -layout=left-symmetric-6" would probably have done what was wanted, but it
>> >>  is a bit later for that :-(
>> >
>> > --layout=preserve is mentioned also in the md or mdadm
>> > documentation... So is it not the right one?
>
> It should be ... I think.  But it definitely seems not to work.  I only have
> a vague memory of how it was meant to work so I'll have to review the code
> and add some proper self-tests.
>
>> >
>> >>>
>> >>> so I further analyzed the behaviour and I found following:
>> >>>
>> >>> - The bottleneck cca 1.7 MB/s is probably caused by backup file on one
>> >>> of the drives, that drive is utilized almost 80% according to iostat
>> >>> -x and its avg queue length is almost 4 while having await under 50
>> >>> ms.
>> >>>
>> >>> - The variable speed and low speeds down to 100 KB are caused by
>> >>> problems on drive I suspected as problematic. Its service time is
>> >>> sometimes going above 1 sec.. Total avg speed is about 0.8 MB/s. (I
>> >>> tested the read speed on it by running check of array and it worked
>> >>> with 30 MB/s. And because preserve should only read from it I did not
>> >>> specifically test its write speed )
>> >>>
>> >>> So my questions are:
>> >>>
>> >>> - Is there a way I can move backup_file to other drive 100% safely? To
>> >>> add another non-network drive I need to restart the server. I can boot
>> >>> it then to some live distribution for example to 100% prevent
>> >>> automatic assembly. I think speed should be couple of times higher.
>> >>
>> >> Yes.
>> >> If you stop the array, then copy the backup file, then re-assemble the
>> >> array giving it the backup file in the new location, all should be well.
>> >> A reboot while the array is stopped is not a problem.
>> >
>> > Should or will? :) I have 0.90, now 0.91, metadata, is everything
>> > needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
>> > squeeze-backports work well? Or should I compile mdadm 3.2.4?
>
> "Will" requires clairvoyance :-)
> 0.91 is the same as 0.90, except that the array is in the middle of a reshape.
> This make sure that old kernels which don't know about reshape never try to
> start the array.
> Yes - everything you need is stored in the 0.91 metadata and the backup file.
> After a clean shutdown, you could manage without the backup file if you had
> to, but as you have it, that isn't an issue.
>
>> >
>> > In case there is some risk involved I will need to choose between
>> > waiting and risking power outage happening sometimes in the following
>> > week (we have something like storm season here) and risking this...
>
> There is always risk.
> I think you made a wise choice in choosing the move the backup file.
>
>> >
>> > Do you recommend some live linux distro installable on USB which is
>> > good for this? (One that has newest versions and dont try assemble
>> > arrays.)
>
> No.  Best to use whatever you are familiar with.
>
>
>> >
>> > Or will automatic assemble fail and it will cause no problem at all
>> > for sure? (According to md or mdadm doc this should be the case.) In
>> > that case can I use distribution on the server, Debian stable plus
>> > some packages from squeeze, for that? Possibly with added
>> > raid=noautodetect? I have LVM on top of raid arrays and I dont want to
>> > cause mess. OS is not on LVM or raid.
>> >
>
> raid=noautodetect is certainly a good idea. I'm not sure if the in-kernel
> autodetect will try to start a reshaping raid - I hope not.
>
>> >>>
>> >>> - Is it safe to fail and remove problematic drive? The array will be
>> >>> down to 6 from 8 drives in part where it is not reshaped. It should
>> >>> double the speed.
>> >>
>> >> As safe as it ever is to fail a device in a non-degraded array.
>> >> i.e. it would not cause a problem directly but of course if you get an error
>> >> on another device, that would be awkward.
>> >
>> > I actually "check"-ed this raid array couple of times few days ago and
>> > data on other drives were OK. Problematic drive reported couple of
>> > reading errors, always corrected with data from other drives and by
>> > rewriting.
>
> That is good!
>
>> >
>> > About that, shoud this reshaping work OK if it encounter possible
>> > reading errors on problematic drive? Will it use data from other
>> > drives to correct that also in this reshaping mode?
>
> As long as there are enough working drives to be able to read and write the
> data, the reshape will continue.
>
> NeilBrown
>
>
>> >
>> > Thanks.
>> >
>> > Patrik
>> >
>> >>>
>> >>> - Why mdadm did ignore layout=preserve? I have other arrays in that
>> >>> server in which I need replace the drive.
>> >>
>> >> I'm not 100% sure - what version of mdadm are you using?
>> >> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
>> >> I'll add test for this to the test suit to make sure it doesn't break again.
>> >> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
>> >>
>> >> Using --layout=left-symmetric-6 should work, though testing on some
>> >> /dev/loop devices first is always a good idea.
>> >>
>> >> NeilBrown
>> >>
>> >>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-14  0:52                       ` Patrik Horník
@ 2012-05-15 10:11                         ` Patrik Horník
  2012-05-15 10:43                           ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-15 10:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

Neil,

did you have a chance to look at how to migrate from raid5 to raid6
without reshaping and/or why layout=preserve did not work?

Regarding failing a drive during reshape, I was worried because I found
some mentions of problems in mailing lists from 1-2 years ago, like a
non-functional backup-file after failing a drive, or worse... But I
tested it on a test array and it worked, so I did it.

Now I am getting a constant speed of 2.3 MB/s. Is that not too slow? It is
not CPU constrained, it is I/O. But nothing else is going on on the
drives, they are all modern drives, the backup is now on a different drive,
so if the I/O is sequential enough it should be much higher. What pattern
of I/O operations does it use? It is a 7 x HDD RAID5 to RAID6
migration, the chunk size is 64K, and the backup file is about 50M.
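
For reference, I am watching the progress just with /proc/mdstat and
iostat, e.g.:

  watch -n 5 cat /proc/mdstat
  iostat -x 5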

Thanks.

Patrik

On Mon, May 14, 2012 at 2:52 AM, Patrik Horník <patrik@dsl.sk> wrote:
> Well,
>
> I used raid=noautodetect and the other arrays did start automatically.
> I am not sure who started them, maybe  initscripts... But the one
> which is reshaping thankfully did not start.
>
> Unfortunately  the speed is not much better. The top speed is up by
> cca third to maybe 2.3 MB/s, which seems pretty small and I am unable
> to quickly pinpoint the exact reason.Do you have idea what can it be
> and how to improve speed?
>
> In addition the performance problem with bad drive periodically kicks
> in sooner and thus the average speed is almost the same, around 0.8 to
> 0.9 MB/s. I am thinking about failing the problematic drive. Except
> that I will end up without redundancy for yet not reshaped part,
> should the failing work as expected even in the situation array is
> now? (raid6 with 8 drives, 7 active devices in not yet reshaped part,
> couple of times stopped and start with backup-file.)
>
> Thanks.
>
> Patrik
>
> On Mon, May 14, 2012 at 12:15 AM, NeilBrown <neilb@suse.de> wrote:
>> On Sun, 13 May 2012 23:41:35 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>>
>>> Hi Neil,
>>>
>>> I decided to move backup file on other device. I stopped the array,
>>> mdadm stopped it but wrote "mdadm: failed to unfreeze array". What
>>> does it exactly mean? I dont want to proceed until I am sure it does
>>> not signalize error.
>>
>> That would appear to be a minor bug in mdadm - I've made a note.
>>
>> When reshaping an array like this, the 'mdadm' which started the reshape
>> forks and continues in the background managing the the  backup file.
>> When it exits, having completed, it makes sure that the array is 'unfrozen'
>> just to be safe.
>> However if it exits because  the array was stopped, there is no array to
>> unfreeze an it gets a little confused.
>> So it is a bug but it does not affect the data on the devices or indicate
>> that anything serious went wrong when stopping the array.
>>
>>>
>>> I quickly checked sources and it seems to be related to some sysfs
>>> resources, but I am not sure. But the array disappeared from
>>> /sys/block/.
>>
>> Exactly.  And as the array disappeared, it really has stopped.
>>
>>
>>>
>>> Thanks.
>>>
>>> Patrik
>>>
>>> On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@dsl.sk> wrote:
>>> > Hi Neil,
>>> >
>>> > On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@suse.de> wrote:
>>> >> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>>> >>
>>> >>> Neil,
>>> >>
>>> >> Hi Patrik,
>>> >>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
>>> >>  -layout=left-symmetric-6" would probably have done what was wanted, but it
>>> >>  is a bit later for that :-(
>>> >
>>> > --layout=preserve is mentioned also in the md or mdadm
>>> > documentation... So is it not the right one?
>>
>> It should be ... I think.  But it definitely seems not to work.  I only have
>> a vague memory of how it was meant to work so I'll have to review the code
>> and add some proper self-tests.
>>
>>> >
>>> >>>
>>> >>> so I further analyzed the behaviour and I found following:
>>> >>>
>>> >>> - The bottleneck cca 1.7 MB/s is probably caused by backup file on one
>>> >>> of the drives, that drive is utilized almost 80% according to iostat
>>> >>> -x and its avg queue length is almost 4 while having await under 50
>>> >>> ms.
>>> >>>
>>> >>> - The variable speed and low speeds down to 100 KB are caused by
>>> >>> problems on drive I suspected as problematic. Its service time is
>>> >>> sometimes going above 1 sec.. Total avg speed is about 0.8 MB/s. (I
>>> >>> tested the read speed on it by running check of array and it worked
>>> >>> with 30 MB/s. And because preserve should only read from it I did not
>>> >>> specifically test its write speed )
>>> >>>
>>> >>> So my questions are:
>>> >>>
>>> >>> - Is there a way I can move backup_file to other drive 100% safely? To
>>> >>> add another non-network drive I need to restart the server. I can boot
>>> >>> it then to some live distribution for example to 100% prevent
>>> >>> automatic assembly. I think speed should be couple of times higher.
>>> >>
>>> >> Yes.
>>> >> If you stop the array, then copy the backup file, then re-assemble the
>>> >> array giving it the backup file in the new location, all should be well.
>>> >> A reboot while the array is stopped is not a problem.
>>> >
>>> > Should or will? :) I have 0.90, now 0.91, metadata, is everything
>>> > needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
>>> > squeeze-backports work well? Or should I compile mdadm 3.2.4?
>>
>> "Will" requires clairvoyance :-)
>> 0.91 is the same as 0.90, except that the array is in the middle of a reshape.
>> This make sure that old kernels which don't know about reshape never try to
>> start the array.
>> Yes - everything you need is stored in the 0.91 metadata and the backup file.
>> After a clean shutdown, you could manage without the backup file if you had
>> to, but as you have it, that isn't an issue.
>>
>>> >
>>> > In case there is some risk involved I will need to choose between
>>> > waiting and risking power outage happening sometimes in the following
>>> > week (we have something like storm season here) and risking this...
>>
>> There is always risk.
>> I think you made a wise choice in choosing the move the backup file.
>>
>>> >
>>> > Do you recommend some live linux distro installable on USB which is
>>> > good for this? (One that has newest versions and dont try assemble
>>> > arrays.)
>>
>> No.  Best to use whatever you are familiar with.
>>
>>
>>> >
>>> > Or will automatic assemble fail and it will cause no problem at all
>>> > for sure? (According to md or mdadm doc this should be the case.) In
>>> > that case can I use distribution on the server, Debian stable plus
>>> > some packages from squeeze, for that? Possibly with added
>>> > raid=noautodetect? I have LVM on top of raid arrays and I dont want to
>>> > cause mess. OS is not on LVM or raid.
>>> >
>>
>> raid=noautodetect is certainly a good idea. I'm not sure if the in-kernel
>> autodetect will try to start a reshaping raid - I hope not.
>>
>>> >>>
>>> >>> - Is it safe to fail and remove problematic drive? The array will be
>>> >>> down to 6 from 8 drives in part where it is not reshaped. It should
>>> >>> double the speed.
>>> >>
>>> >> As safe as it ever is to fail a device in a non-degraded array.
>>> >> i.e. it would not cause a problem directly but of course if you get an error
>>> >> on another device, that would be awkward.
>>> >
>>> > I actually "check"-ed this raid array couple of times few days ago and
>>> > data on other drives were OK. Problematic drive reported couple of
>>> > reading errors, always corrected with data from other drives and by
>>> > rewriting.
>>
>> That is good!
>>
>>> >
>>> > About that, shoud this reshaping work OK if it encounter possible
>>> > reading errors on problematic drive? Will it use data from other
>>> > drives to correct that also in this reshaping mode?
>>
>> As long as there are enough working drives to be able to read and write the
>> data, the reshape will continue.
>>
>> NeilBrown
>>
>>
>>> >
>>> > Thanks.
>>> >
>>> > Patrik
>>> >
>>> >>>
>>> >>> - Why mdadm did ignore layout=preserve? I have other arrays in that
>>> >>> server in which I need replace the drive.
>>> >>
>>> >> I'm not 100% sure - what version of mdadm are you using?
>>> >> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
>>> >> I'll add test for this to the test suit to make sure it doesn't break again.
>>> >> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
>>> >>
>>> >> Using --layout=left-symmetric-6 should work, though testing on some
>>> >> /dev/loop devices first is always a good idea.
>>> >>
>>> >> NeilBrown
>>> >>
>>> >>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-15 10:11                         ` Patrik Horník
@ 2012-05-15 10:43                           ` NeilBrown
       [not found]                             ` <CAAOsTSmMrs2bHDbFrND4-iaxwrTA0WySd_AVaK+KXZ-XZsysag@mail.gmail.com>
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2012-05-15 10:43 UTC (permalink / raw)
  To: patrik; +Cc: David Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 10140 bytes --]

On Tue, 15 May 2012 12:11:28 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> Neil,
> 
> did you have a chance to look at how to migrate from raid5 to raid6
> without reshaping and/or why layout=preserve did not work?

Yes.
http://neil.brown.name/git?p=mdadm;a=commitdiff;h=385167f364122c9424aa3d56f00b8c0874ce78b8

fixes it.
--layout=preserve works properly after that patch.
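
If you don't want to wait for a release, a rough recipe (the patch file
name is just whatever you save that commit as, applied in an mdadm-3.2.4
source tree):

  patch -p1 < layout-preserve-fix.patch
  make
  ./mdadm --version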

> 
> Regarding failing drive during reshape I was worried, because I found
> some mentions of problems in mailing lists from 1-2 years ago, like
> non-functional backup-file after failing drive or worse... But I
> tested it on test array and it worked, so I did it.

testing == good !!

> 
> Now I am getting constant speed 2.3 MB/s. Is it not too slow? It is
> not CPU constrained, it is I/O. But nothing else is going on the
> drives, they are all modern drives, backup is now on different drive,
> so if it is enough sequential it should be much higher. What should be
> the pattern of I/O operations it uses? It is 7 x HDD RAID5 to RAID6
> migration, chunk size is 64K, backup file is about 50M.

Yes, it is painfully slow.
It reads from the array and writes to the backup.  Then it allows the
reshape to progress, which might read from the array again, and writes to
the array.  It is doing this in 50M blocks.

How big is the stripe cache - /sys/block/md0/md/stripe_cache_size ??
To hold 50M it needs 50M/4K/6 == 2133 entries.
And it might need to hold it twice - once for the old layout and once for the
new.
So try increasing it to about 5000 if it isn't there already.
That might reduce the reads and allow it to flow more smoothly.
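
i.e. something like (md0 standing for whichever array is reshaping):

  cat /sys/block/md0/md/stripe_cache_size
  echo 5000 > /sys/block/md0/md/stripe_cache_size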

NeilBrown


> 
> Thanks.
> 
> Patrik
> 
> On Mon, May 14, 2012 at 2:52 AM, Patrik Horník <patrik@dsl.sk> wrote:
> > Well,
> >
> > I used raid=noautodetect and the other arrays did start automatically.
> > I am not sure who started them, maybe  initscripts... But the one
> > which is reshaping thankfully did not start.
> >
> > Unfortunately  the speed is not much better. The top speed is up by
> > cca third to maybe 2.3 MB/s, which seems pretty small and I am unable
> > to quickly pinpoint the exact reason.Do you have idea what can it be
> > and how to improve speed?
> >
> > In addition the performance problem with bad drive periodically kicks
> > in sooner and thus the average speed is almost the same, around 0.8 to
> > 0.9 MB/s. I am thinking about failing the problematic drive. Except
> > that I will end up without redundancy for yet not reshaped part,
> > should the failing work as expected even in the situation array is
> > now? (raid6 with 8 drives, 7 active devices in not yet reshaped part,
> > couple of times stopped and start with backup-file.)
> >
> > Thanks.
> >
> > Patrik
> >
> > On Mon, May 14, 2012 at 12:15 AM, NeilBrown <neilb@suse.de> wrote:
> >> On Sun, 13 May 2012 23:41:35 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >>
> >>> Hi Neil,
> >>>
> >>> I decided to move backup file on other device. I stopped the array,
> >>> mdadm stopped it but wrote "mdadm: failed to unfreeze array". What
> >>> does it exactly mean? I dont want to proceed until I am sure it does
> >>> not signalize error.
> >>
> >> That would appear to be a minor bug in mdadm - I've made a note.
> >>
> >> When reshaping an array like this, the 'mdadm' which started the reshape
> >> forks and continues in the background managing the the  backup file.
> >> When it exits, having completed, it makes sure that the array is 'unfrozen'
> >> just to be safe.
> >> However if it exits because  the array was stopped, there is no array to
> >> unfreeze an it gets a little confused.
> >> So it is a bug but it does not affect the data on the devices or indicate
> >> that anything serious went wrong when stopping the array.
> >>
> >>>
> >>> I quickly checked sources and it seems to be related to some sysfs
> >>> resources, but I am not sure. But the array disappeared from
> >>> /sys/block/.
> >>
> >> Exactly.  And as the array disappeared, it really has stopped.
> >>
> >>
> >>>
> >>> Thanks.
> >>>
> >>> Patrik
> >>>
> >>> On Sun, May 13, 2012 at 9:43 AM, Patrik Horník <patrik@dsl.sk> wrote:
> >>> > Hi Neil,
> >>> >
> >>> > On Sun, May 13, 2012 at 1:19 AM, NeilBrown <neilb@suse.de> wrote:
> >>> >> On Sat, 12 May 2012 17:56:04 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >>> >>
> >>> >>> Neil,
> >>> >>
> >>> >> Hi Patrik,
> >>> >>  sorry about the "--layout=preserve" confusion.  I was a bit hasty.
> >>> >>  -layout=left-symmetric-6" would probably have done what was wanted, but it
> >>> >>  is a bit later for that :-(
> >>> >
> >>> > --layout=preserve is mentioned also in the md or mdadm
> >>> > documentation... So is it not the right one?
> >>
> >> It should be ... I think.  But it definitely seems not to work.  I only have
> >> a vague memory of how it was meant to work so I'll have to review the code
> >> and add some proper self-tests.
> >>
> >>> >
> >>> >>>
> >>> >>> so I further analyzed the behaviour and I found following:
> >>> >>>
> >>> >>> - The bottleneck cca 1.7 MB/s is probably caused by backup file on one
> >>> >>> of the drives, that drive is utilized almost 80% according to iostat
> >>> >>> -x and its avg queue length is almost 4 while having await under 50
> >>> >>> ms.
> >>> >>>
> >>> >>> - The variable speed and low speeds down to 100 KB are caused by
> >>> >>> problems on drive I suspected as problematic. Its service time is
> >>> >>> sometimes going above 1 sec.. Total avg speed is about 0.8 MB/s. (I
> >>> >>> tested the read speed on it by running check of array and it worked
> >>> >>> with 30 MB/s. And because preserve should only read from it I did not
> >>> >>> specifically test its write speed )
> >>> >>>
> >>> >>> So my questions are:
> >>> >>>
> >>> >>> - Is there a way I can move backup_file to other drive 100% safely? To
> >>> >>> add another non-network drive I need to restart the server. I can boot
> >>> >>> it then to some live distribution for example to 100% prevent
> >>> >>> automatic assembly. I think speed should be couple of times higher.
> >>> >>
> >>> >> Yes.
> >>> >> If you stop the array, then copy the backup file, then re-assemble the
> >>> >> array giving it the backup file in the new location, all should be well.
> >>> >> A reboot while the array is stopped is not a problem.
> >>> >
> >>> > Should or will? :) I have 0.90, now 0.91, metadata, is everything
> >>> > needed stored there? Should mdadm 3.2.2-1~bpo60+2 from
> >>> > squeeze-backports work well? Or should I compile mdadm 3.2.4?
> >>
> >> "Will" requires clairvoyance :-)
> >> 0.91 is the same as 0.90, except that the array is in the middle of a reshape.
> >> This make sure that old kernels which don't know about reshape never try to
> >> start the array.
> >> Yes - everything you need is stored in the 0.91 metadata and the backup file.
> >> After a clean shutdown, you could manage without the backup file if you had
> >> to, but as you have it, that isn't an issue.
> >>
> >>> >
> >>> > In case there is some risk involved I will need to choose between
> >>> > waiting and risking power outage happening sometimes in the following
> >>> > week (we have something like storm season here) and risking this...
> >>
> >> There is always risk.
> >> I think you made a wise choice in choosing the move the backup file.
> >>
> >>> >
> >>> > Do you recommend some live linux distro installable on USB which is
> >>> > good for this? (One that has newest versions and dont try assemble
> >>> > arrays.)
> >>
> >> No.  Best to use whatever you are familiar with.
> >>
> >>
> >>> >
> >>> > Or will automatic assemble fail and it will cause no problem at all
> >>> > for sure? (According to md or mdadm doc this should be the case.) In
> >>> > that case can I use distribution on the server, Debian stable plus
> >>> > some packages from squeeze, for that? Possibly with added
> >>> > raid=noautodetect? I have LVM on top of raid arrays and I dont want to
> >>> > cause mess. OS is not on LVM or raid.
> >>> >
> >>
> >> raid=noautodetect is certainly a good idea. I'm not sure if the in-kernel
> >> autodetect will try to start a reshaping raid - I hope not.
> >>
> >>> >>>
> >>> >>> - Is it safe to fail and remove problematic drive? The array will be
> >>> >>> down to 6 from 8 drives in part where it is not reshaped. It should
> >>> >>> double the speed.
> >>> >>
> >>> >> As safe as it ever is to fail a device in a non-degraded array.
> >>> >> i.e. it would not cause a problem directly but of course if you get an error
> >>> >> on another device, that would be awkward.
> >>> >
> >>> > I actually "check"-ed this raid array couple of times few days ago and
> >>> > data on other drives were OK. Problematic drive reported couple of
> >>> > reading errors, always corrected with data from other drives and by
> >>> > rewriting.
> >>
> >> That is good!
> >>
> >>> >
> >>> > About that, shoud this reshaping work OK if it encounter possible
> >>> > reading errors on problematic drive? Will it use data from other
> >>> > drives to correct that also in this reshaping mode?
> >>
> >> As long as there are enough working drives to be able to read and write the
> >> data, the reshape will continue.
> >>
> >> NeilBrown
> >>
> >>
> >>> >
> >>> > Thanks.
> >>> >
> >>> > Patrik
> >>> >
> >>> >>>
> >>> >>> - Why mdadm did ignore layout=preserve? I have other arrays in that
> >>> >>> server in which I need replace the drive.
> >>> >>
> >>> >> I'm not 100% sure - what version of mdadm are you using?
> >>> >> If it is 3.2.4, then maybe commit 0073a6e189c41c broke something.
> >>> >> I'll add test for this to the test suit to make sure it doesn't break again.
> >>> >> But you are using 3.2.2 .... Not sure. I'd have to look more closely.
> >>> >>
> >>> >> Using --layout=left-symmetric-6 should work, though testing on some
> >>> >> /dev/loop devices first is always a good idea.
> >>> >>
> >>> >> NeilBrown
> >>> >>
> >>> >>
> >>


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
       [not found]                               ` <20120515212820.14db2fd2@notabene.brown>
@ 2012-05-15 11:56                                 ` Patrik Horník
  2012-05-15 12:13                                   ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-15 11:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

Anyway, increasing it to 5K did not help and the drives don't seem to be
fully utilized.

Does the reshape work something like this:
- Read about X = 50M / ((N - 1) * chunk size) stripes from the drives and
write them to the backup-file
- Reshape those X stripes one after another sequentially
- Reshape each stripe by reading chunks from all drives, calculating Q,
writing all chunks back, and doing I/O for the next stripe only after
finishing the previous one?

So after increasing stripe_cache_size the cache should hold the stripes
after backing them up, and so the reshape should not need to read them
from the drives again?

Can't the slow speed be caused by some synchronization issues? How are
the stripes read for writing them to the backup-file? Is it done one by
one, so the I/Os for the next stripe are issued only after the previous
stripe has been read completely? Or are they issued in the most parallel
way possible?

Patrik


On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>
>> Can I increase it during reshape by echo N >
>> /sys/block/mdX/md/stripe_cache_size?
>
> Yes.
>
>
>>
>> How is the size determined? I have only 1027 while having 8 GB system memory...
>
> Not very well.
>
> It is set to 256, or the minimum size needed to allow the reshape to proceed
> (which means about 4 chunks worth).  I should probably add some auto-sizing
> but that sort of stuff is hard :-(
>
> NeilBrown
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-15 11:56                                 ` Patrik Horník
@ 2012-05-15 12:13                                   ` NeilBrown
  2012-05-15 19:39                                     ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2012-05-15 12:13 UTC (permalink / raw)
  To: patrik; +Cc: David Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2276 bytes --]

On Tue, 15 May 2012 13:56:58 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> Anyway increasing it to 5K did not help and drives don't seem to be
> fully utilized.
> 
> Does the reshape work something like this:
> - Read about X = (50M / N - 1 / stripe size) stripes from drives and
> write them to the backup-file
> - Reshape X stripes one by another sequentially
> - Reshaping stripe by reading chunks from all drives, calculate Q,
> writing all chunks back and doing I/O for next stripe only after
> finishing previous one?
> 
> So after increasing stripe_cache_size the cache should hold stripes
> after backing them and so reshaping should not need to read them from
> drives again?
> 
> Cant the slow speed be caused by some synchronization issues? How are
> the stripes read for writing them to backup-file? Is it done one by
> one, so I/Os for next stripe are issued only after having read the
> previous stripe completely? Are they issued in maximum parallel way
> possible?

There is as much parallelism as I could manage.
The backup file is divided into 2 sections.
Write to one,  then the other, then invalidate the first and write to it etc.
So while one half is being written, the data in the other half is being
reshaped in the array.
Also the stripe reads are scheduled asynchronously and as soon as a stripe is
fully available, the Q is calculated and they are scheduled for write.

The slowness is due to continually having to seek back a little way to
overwrite what has just been read, and also having to update the metadata
each time to record where we are up to.

NeilBrown


> 
> Patrik
> 
> 
> On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >
> >> Can I increase it during reshape by echo N >
> >> /sys/block/mdX/md/stripe_cache_size?
> >
> > Yes.
> >
> >
> >>
> >> How is the size determined? I have only 1027 while having 8 GB system memory...
> >
> > Not very well.
> >
> > It is set to 256, or the minimum size needed to allow the reshape to proceed
> > (which means about 4 chunks worth).  I should probably add some auto-sizing
> > but that sort of stuff is hard :-(
> >
> > NeilBrown
> >


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-15 12:13                                   ` NeilBrown
@ 2012-05-15 19:39                                     ` Patrik Horník
  2012-05-15 22:47                                       ` NeilBrown
  0 siblings, 1 reply; 26+ messages in thread
From: Patrik Horník @ 2012-05-15 19:39 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

BTW thank you very much for the fix for layout=preserve. As soon as the
current reshape finishes, I am moving on to the other arrays.

Are the regressions in 2.3.4 serious, and so to which version should I
apply the patch? Or, when you looked at the code, should
layout=left-symmetric-6 work in 2.3.2?

Regarding reshape speed, an estimate assuming things are done a lot more
sequentially gives much higher speeds. Let's say a 48 MB backup and 6 drives
with 80 MB/s sequential speed. If you do the reshaping like this:
- Read 8 MB sequentially from each drive in parallel: 0.1 s
- Then write it to the backup: 48/80 = 0.6 s
- Calculate Q for something like 48 MB (guessing 0.05 s) and write it
back to the different drives in parallel in 0.1 s. Because the data is in
the cache and you are only writing in this phase (?), there is no back
and forth seeking, and rotational latency applies only a couple of times
altogether, let's say 0.02 s.
- Update the superblock and move the head back: two worst-case seeks, 0.03 s
(I don't know how often you update the superblocks?)

you process 8 MB in about 0.9 s, so the speed in this scenario should be about 9 MB/s.

I guess the main real difference when you logically do it stripe by stripe
can be that, while you wait for the chunk writes to complete (are you
waiting for real completion of the writes?), the gap between the first and
last drive is often long enough to require one or more extra rotations
before the next stripe can be written. If that is the case, you need to add
about 128 * (let's say) 1.5 * 0.005 s = 0.96 s, and so we are down
to about 4.3 MB/s theoretically.
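
To double-check the arithmetic (all figures are the assumed values above,
nothing is measured):

    echo "scale=2; 8 / (0.1 + 0.6 + 0.05 + 0.1 + 0.02 + 0.03)" | bc   # 8.88, i.e. roughly 9 MB/s
    echo "scale=2; 8 / (0.9 + 128 * 1.5 * 0.005)" | bc                # 4.30, with the extra rotational waits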

Patrik

On Tue, May 15, 2012 at 2:13 PM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 15 May 2012 13:56:58 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>
>> Anyway increasing it to 5K did not help and drives don't seem to be
>> fully utilized.
>>
>> Does the reshape work something like this:
>> - Read about X = (50M / N - 1 / stripe size) stripes from drives and
>> write them to the backup-file
>> - Reshape X stripes one by another sequentially
>> - Reshaping stripe by reading chunks from all drives, calculate Q,
>> writing all chunks back and doing I/O for next stripe only after
>> finishing previous one?
>>
>> So after increasing stripe_cache_size the cache should hold stripes
>> after backing them and so reshaping should not need to read them from
>> drives again?
>>
>> Cant the slow speed be caused by some synchronization issues? How are
>> the stripes read for writing them to backup-file? Is it done one by
>> one, so I/Os for next stripe are issued only after having read the
>> previous stripe completely? Are they issued in maximum parallel way
>> possible?
>
> There is as much parallelism as I could manage.
> The backup file is divided into 2 sections.
> Write to one,  then the other, then invalidate the first and write to it etc.
> So while one half is being written, the data in the other half is being
> reshaped in the array.
> Also the stripe reads are scheduled asynchronously and as soon as a stripe is
> fully available, the Q is calculated and they are scheduled for write.
>
> The slowness is due to continually having to seek back a little way to over
> write what has just be read, and also having to update the metadata each time
> to record where we are up to.
>
> NeilBrown
>
>
>>
>> Patrik
>>
>>
>> On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@suse.de> wrote:
>> > On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>> >
>> >> Can I increase it during reshape by echo N >
>> >> /sys/block/mdX/md/stripe_cache_size?
>> >
>> > Yes.
>> >
>> >
>> >>
>> >> How is the size determined? I have only 1027 while having 8 GB system memory...
>> >
>> > Not very well.
>> >
>> > It is set to 256, or the minimum size needed to allow the reshape to proceed
>> > (which means about 4 chunks worth).  I should probably add some auto-sizing
>> > but that sort of stuff is hard :-(
>> >
>> > NeilBrown
>> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-15 19:39                                     ` Patrik Horník
@ 2012-05-15 22:47                                       ` NeilBrown
  2012-05-16  5:51                                         ` Patrik Horník
  0 siblings, 1 reply; 26+ messages in thread
From: NeilBrown @ 2012-05-15 22:47 UTC (permalink / raw)
  To: patrik; +Cc: David Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 4498 bytes --]

On Tue, 15 May 2012 21:39:10 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> BTW thank you very much for the fix for layout=preserve. As soon as
> current reshape finishes, I am going to other arrays.
> 
> Are regressions in 2.3.4 serious and so to which version I should
> apply the patch? Or when you looked at the code, should
> layout=left-symmetric-6 work in 2.3.2?

The regression isn't dangerous, just inconvenient (--add often doesn't work).
--layout=left-symmetric-6 will work on 2.3.2, provided the current layout
of the array is "left-symmetric", which I think is the default, but you should
check.
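
For example, something like this shows it (mdX being the array in question):

    mdadm --detail /dev/mdX | grep -i layout
    #        Layout : left-symmetric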

NeilBrown

> 
> In regard reshaping speed, estimation when doing things a lot more
> sequentially gives much higher speeds. Lets say 48 MB backup, 6 drives
> with 80 MB/s sequential speed. If you do reshaping like this:
> - Read 8 MB sequential from each drive in parallel, 0.1 s
> - Then write it to backup, 48/80 = 0.6 s
> - Calculate Q for something like 48 MB (guessing 0.05 s) and writing
> it back to diff drives in parallel in 0.1 s. Because it is in the
> cache and you are only writing  in this phase (?), there is not back
> and forth seeking and rotational latency applies only couple of times
> altogether, lets say 0.02.
> - Update superblock and move header back, two worst seeks, 0.03 s (I
> dont know how often do you update superblocks?)
> 
> you process 8 MB in cca 0.9 s, so speed in this scenario should be cca 9 MB/s.
> 
> I guess the main real difference when you logically doing it in
> stripes can be that when you waiting for completion of writing chunks
> (are you waiting for real completion of writes?), the difference
> between first and last drive is often long enough to need wait one or
> more rotations for writing another stripe. If that is the case, you
> need add cca 128 * lets say 1.5 * 0.005 s = 0.64 s and so we are down
> to cca 4.3 MB/s theoretically.
> 
> Patrik
> 
> On Tue, May 15, 2012 at 2:13 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 15 May 2012 13:56:58 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >
> >> Anyway increasing it to 5K did not help and drives don't seem to be
> >> fully utilized.
> >>
> >> Does the reshape work something like this:
> >> - Read about X = (50M / N - 1 / stripe size) stripes from drives and
> >> write them to the backup-file
> >> - Reshape X stripes one by another sequentially
> >> - Reshaping stripe by reading chunks from all drives, calculate Q,
> >> writing all chunks back and doing I/O for next stripe only after
> >> finishing previous one?
> >>
> >> So after increasing stripe_cache_size the cache should hold stripes
> >> after backing them and so reshaping should not need to read them from
> >> drives again?
> >>
> >> Cant the slow speed be caused by some synchronization issues? How are
> >> the stripes read for writing them to backup-file? Is it done one by
> >> one, so I/Os for next stripe are issued only after having read the
> >> previous stripe completely? Are they issued in maximum parallel way
> >> possible?
> >
> > There is as much parallelism as I could manage.
> > The backup file is divided into 2 sections.
> > Write to one,  then the other, then invalidate the first and write to it etc.
> > So while one half is being written, the data in the other half is being
> > reshaped in the array.
> > Also the stripe reads are scheduled asynchronously and as soon as a stripe is
> > fully available, the Q is calculated and they are scheduled for write.
> >
> > The slowness is due to continually having to seek back a little way to over
> > write what has just be read, and also having to update the metadata each time
> > to record where we are up to.
> >
> > NeilBrown
> >
> >
> >>
> >> Patrik
> >>
> >>
> >> On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@suse.de> wrote:
> >> > On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >> >
> >> >> Can I increase it during reshape by echo N >
> >> >> /sys/block/mdX/md/stripe_cache_size?
> >> >
> >> > Yes.
> >> >
> >> >
> >> >>
> >> >> How is the size determined? I have only 1027 while having 8 GB system memory...
> >> >
> >> > Not very well.
> >> >
> >> > It is set to 256, or the minimum size needed to allow the reshape to proceed
> >> > (which means about 4 chunks worth).  I should probably add some auto-sizing
> >> > but that sort of stuff is hard :-(
> >> >
> >> > NeilBrown
> >> >
> >


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-15 22:47                                       ` NeilBrown
@ 2012-05-16  5:51                                         ` Patrik Horník
  0 siblings, 0 replies; 26+ messages in thread
From: Patrik Horník @ 2012-05-16  5:51 UTC (permalink / raw)
  To: NeilBrown; +Cc: David Brown, linux-raid

On Wed, May 16, 2012 at 12:47 AM, NeilBrown <neilb@suse.de> wrote:
> On Tue, 15 May 2012 21:39:10 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>
>> BTW thank you very much for the fix for layout=preserve. As soon as
>> current reshape finishes, I am going to other arrays.
>>
>> Are regressions in 2.3.4 serious and so to which version I should
>> apply the patch? Or when you looked at the code, should
>> layout=left-symmetric-6 work in 2.3.2?
>
> Regression isn't dangerous, just inconvenient (--add often doesn't work).
> --layout=left-symmetric-6 will work on 2.3.2, providing the current layout
> of the array is "left symmetric" which I think is the default, but you should
> check.

OK, thanks. Yes, the layout of my arrays is left-symmetric.

>
> NeilBrown
>
>>
>> In regard reshaping speed, estimation when doing things a lot more
>> sequentially gives much higher speeds. Lets say 48 MB backup, 6 drives
>> with 80 MB/s sequential speed. If you do reshaping like this:
>> - Read 8 MB sequential from each drive in parallel, 0.1 s
>> - Then write it to backup, 48/80 = 0.6 s
>> - Calculate Q for something like 48 MB (guessing 0.05 s) and writing
>> it back to diff drives in parallel in 0.1 s. Because it is in the
>> cache and you are only writing  in this phase (?), there is not back
>> and forth seeking and rotational latency applies only couple of times
>> altogether, lets say 0.02.
>> - Update superblock and move header back, two worst seeks, 0.03 s (I
>> dont know how often do you update superblocks?)
>>
>> you process 8 MB in cca 0.9 s, so speed in this scenario should be cca 9 MB/s.
>>
>> I guess the main real difference when you logically doing it in
>> stripes can be that when you waiting for completion of writing chunks
>> (are you waiting for real completion of writes?), the difference
>> between first and last drive is often long enough to need wait one or
>> more rotations for writing another stripe. If that is the case, you
>> need add cca 128 * lets say 1.5 * 0.005 s = 0.64 s and so we are down
>> to cca 4.3 MB/s theoretically.
>>
>> Patrik
>>
>> On Tue, May 15, 2012 at 2:13 PM, NeilBrown <neilb@suse.de> wrote:
>> > On Tue, 15 May 2012 13:56:58 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>> >
>> >> Anyway increasing it to 5K did not help and drives don't seem to be
>> >> fully utilized.
>> >>
>> >> Does the reshape work something like this:
>> >> - Read about X = (50M / N - 1 / stripe size) stripes from drives and
>> >> write them to the backup-file
>> >> - Reshape X stripes one by another sequentially
>> >> - Reshaping stripe by reading chunks from all drives, calculate Q,
>> >> writing all chunks back and doing I/O for next stripe only after
>> >> finishing previous one?
>> >>
>> >> So after increasing stripe_cache_size the cache should hold stripes
>> >> after backing them and so reshaping should not need to read them from
>> >> drives again?
>> >>
>> >> Cant the slow speed be caused by some synchronization issues? How are
>> >> the stripes read for writing them to backup-file? Is it done one by
>> >> one, so I/Os for next stripe are issued only after having read the
>> >> previous stripe completely? Are they issued in maximum parallel way
>> >> possible?
>> >
>> > There is as much parallelism as I could manage.
>> > The backup file is divided into 2 sections.
>> > Write to one,  then the other, then invalidate the first and write to it etc.
>> > So while one half is being written, the data in the other half is being
>> > reshaped in the array.
>> > Also the stripe reads are scheduled asynchronously and as soon as a stripe is
>> > fully available, the Q is calculated and they are scheduled for write.
>> >
>> > The slowness is due to continually having to seek back a little way to over
>> > write what has just be read, and also having to update the metadata each time
>> > to record where we are up to.
>> >
>> > NeilBrown
>> >
>> >
>> >>
>> >> Patrik
>> >>
>> >>
>> >> On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@suse.de> wrote:
>> >> > On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>> >> >
>> >> >> Can I increase it during reshape by echo N >
>> >> >> /sys/block/mdX/md/stripe_cache_size?
>> >> >
>> >> > Yes.
>> >> >
>> >> >
>> >> >>
>> >> >> How is the size determined? I have only 1027 while having 8 GB system memory...
>> >> >
>> >> > Not very well.
>> >> >
>> >> > It is set to 256, or the minimum size needed to allow the reshape to proceed
>> >> > (which means about 4 chunks worth).  I should probably add some auto-sizing
>> >> > but that sort of stuff is hard :-(
>> >> >
>> >> > NeilBrown
>> >> >
>> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-11  0:50     ` NeilBrown
  2012-05-11  2:44       ` Patrik Horník
@ 2012-05-16 23:34       ` Oliver Martin
  2012-05-18  3:45         ` NeilBrown
  1 sibling, 1 reply; 26+ messages in thread
From: Oliver Martin @ 2012-05-16 23:34 UTC (permalink / raw)
  To: NeilBrown; +Cc: patrik, David Brown, linux-raid

Hi Neil,

Am 11.05.2012 02:50, schrieb NeilBrown:
> Doing an in-place reshape with the new 3.3 code should work, though with a
> softer "should" than above.  We will only know that it is "stable" when enough
> people (such as yourself) try it and report success.  If anything does go
> wrong I would of course help you to put the array back together but I can
> never guarantee no data loss.  You wouldn't be the first to test the code on
> live data, but you would be the second that I have heard of.

I guess I'll be taking 2nd place then. I just used it on three live 
raid6 arrays, and it worked perfectly.

Thanks for all your awesome work!

Oliver

PS: I wasn't subscribed to the list before, so I'm trying to reply to 
this via gmane. No idea if this preserves all the list headers - if I 
break the thread, that's probably the cause.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-16 23:34       ` Oliver Martin
@ 2012-05-18  3:45         ` NeilBrown
  2012-05-19 10:40           ` Patrik Horník
  2012-05-21  9:54           ` Asdo
  0 siblings, 2 replies; 26+ messages in thread
From: NeilBrown @ 2012-05-18  3:45 UTC (permalink / raw)
  To: Oliver Martin; +Cc: patrik, David Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1172 bytes --]

On Thu, 17 May 2012 01:34:15 +0200 Oliver Martin <oliver@volatilevoid.net>
wrote:

> Hi Neil,
> 
> Am 11.05.2012 02:50, schrieb NeilBrown:
> > Doing an in-place reshape with the new 3.3 code should work, though with a
> > softer "should" than above.  We will only know that it is "stable" when enough
> > people (such as yourself) try it and report success.  If anything does go
> > wrong I would of course help you to put the array back together but I can
> > never guarantee no data loss.  You wouldn't be the first to test the code on
> > live data, but you would be the second that I have heard of.
> 
> I guess I'll be taking 2nd place then. I just used it on three live 
> raid6 arrays, and it worked perfectly.

3 arrays - so you are 2nd, 3rd, and 4th :-)

Thanks.  I often get failure reports and only more rarely get success
reports, so I value them all the more.

NeilBrown


> 
> Thanks for your all your awesome work!
> 
> Oliver
> 
> PS: I wasn't subscribed to the list before, so I'm trying to reply to 
> this via gmane. No idea if this preserves all the list headers - if I 
> break the thread, that's probably the cause.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-18  3:45         ` NeilBrown
@ 2012-05-19 10:40           ` Patrik Horník
  2012-05-21  9:54           ` Asdo
  1 sibling, 0 replies; 26+ messages in thread
From: Patrik Horník @ 2012-05-19 10:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: Oliver Martin, David Brown, linux-raid

Neil, thanks for your assistance.

I successfully converted the 2nd and 3rd arrays to raid6 with the ls-6
layout; --layout=left-symmetric-6 worked as advertised.

Reshaping of the 1st array finished OK too, even though it did not avoid a
power outage after all and was hit by one a couple of hours before
finishing. But it restarted without problem, the array was consistent
according to a check, and I also compared important data with a backup and
they are OK.
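
For the record, the check was roughly the usual sync_action run (mdX standing
for each array in turn):

    echo check > /sys/block/mdX/md/sync_action
    cat /sys/block/mdX/md/mismatch_cnt    # 0 afterwards, so no mismatches were found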

Patrik

On Fri, May 18, 2012 at 5:45 AM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 17 May 2012 01:34:15 +0200 Oliver Martin <oliver@volatilevoid.net>
> wrote:
>
>> Hi Neil,
>>
>> Am 11.05.2012 02:50, schrieb NeilBrown:
>> > Doing an in-place reshape with the new 3.3 code should work, though with a
>> > softer "should" than above.  We will only know that it is "stable" when enough
>> > people (such as yourself) try it and report success.  If anything does go
>> > wrong I would of course help you to put the array back together but I can
>> > never guarantee no data loss.  You wouldn't be the first to test the code on
>> > live data, but you would be the second that I have heard of.
>>
>> I guess I'll be taking 2nd place then. I just used it on three live
>> raid6 arrays, and it worked perfectly.
>
> 3 arrays - so you are 2nd, 3rd, and 4th :-)
>
> Thanks.  I often get failure reports and only more rarely get success
> reports, so I value them all the more.
>
> NeilBrown
>
>
>>
>> Thanks for your all your awesome work!
>>
>> Oliver
>>
>> PS: I wasn't subscribed to the list before, so I'm trying to reply to
>> this via gmane. No idea if this preserves all the list headers - if I
>> break the thread, that's probably the cause.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-18  3:45         ` NeilBrown
  2012-05-19 10:40           ` Patrik Horník
@ 2012-05-21  9:54           ` Asdo
  2012-05-21 10:12             ` NeilBrown
  1 sibling, 1 reply; 26+ messages in thread
From: Asdo @ 2012-05-21  9:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: Oliver Martin, patrik, David Brown, linux-raid

On 05/18/12 05:45, NeilBrown wrote:
> On Thu, 17 May 2012 01:34:15 +0200 Oliver Martin<oliver@volatilevoid.net>
> wrote:
>
>> Hi Neil,
>>
>> Am 11.05.2012 02:50, schrieb NeilBrown:
>>> Doing an in-place reshape with the new 3.3 code should work, though with a
>>> softer "should" than above.  We will only know that it is "stable" when enough
>>> people (such as yourself) try it and report success.  If anything does go
>>> wrong I would of course help you to put the array back together but I can
>>> never guarantee no data loss.  You wouldn't be the first to test the code on
>>> live data, but you would be the second that I have heard of.
>> I guess I'll be taking 2nd place then. I just used it on three live
>> raid6 arrays, and it worked perfectly.
> 3 arrays - so you are 2nd, 3rd, and 4th :-)

Good to know that when all is good, hot-replace works.

I wonder if all "error paths" were considered and implemented (and maybe 
even tested, but we users could help with testing if we understand the 
intended behaviour), i.e.

What happens when the disk being hot-replaced shows read errors in
locations previously unknown to the bad-block list: does it
- immediately fall back to fail+rebuild, or
- first try a recompute + rewrite of the sector, and fall back to
fail+rebuild if the rewrite fails, or
- first try a recompute + rewrite of the sector, then if the rewrite fails
add the block to the bad-block list, and fall back to fail+rebuild only if
the list is out of space?

What happens if the destination of the hot-replace has *one* write 
error? And *lots* of write errors?

What happens if a hot-replace hits a sector for which both the disk
being replaced and another one have an entry in the bad-block list, so
there is not enough parity information to recompute? Does it proceed
anyway, marking the corresponding sector in the bad-block list for the
destination device (= non-valid strip), or does it fail the hot-replace, or what?

(this is actually more about bad-block lists)
What happens if a *different* disk shows bad sectors due to concomitant
reads (simultaneous with, but not caused by, the hot-replace):
- does it first recompute and rewrite, then if the rewrite fails add the
sector to the bad-block list, then if the list is full get failed? Or can
another hot-replace get started when one is already running?

Thank you


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Hot-replace for RAID5
  2012-05-21  9:54           ` Asdo
@ 2012-05-21 10:12             ` NeilBrown
  0 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2012-05-21 10:12 UTC (permalink / raw)
  To: Asdo; +Cc: Oliver Martin, patrik, David Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3384 bytes --]

On Mon, 21 May 2012 11:54:43 +0200 Asdo <asdo@shiftmail.org> wrote:

> On 05/18/12 05:45, NeilBrown wrote:
> > On Thu, 17 May 2012 01:34:15 +0200 Oliver Martin<oliver@volatilevoid.net>
> > wrote:
> >
> >> Hi Neil,
> >>
> >> Am 11.05.2012 02:50, schrieb NeilBrown:
> >>> Doing an in-place reshape with the new 3.3 code should work, though with a
> >>> softer "should" than above.  We will only know that it is "stable" when enough
> >>> people (such as yourself) try it and report success.  If anything does go
> >>> wrong I would of course help you to put the array back together but I can
> >>> never guarantee no data loss.  You wouldn't be the first to test the code on
> >>> live data, but you would be the second that I have heard of.
> >> I guess I'll be taking 2nd place then. I just used it on three live
> >> raid6 arrays, and it worked perfectly.
> > 3 arrays - so you are 2nd, 3rd, and 4th :-)
> 
> Good to know that when all is good, hot-replace works.
> 
> I wonder if all "error paths" were considered and implemented (and maybe 
> even tested, but we users could help with testing if we understand the 
> intended behaviour), i.e.

I hope I considered them...  but I do miss things sometimes :-)

> 
> what happens when the disk being hot-replaced shows read errors in 
> locations previously unknown to the bad-block list: does it
> - immediately fall back to fail+rebuild or
> - first tries a recompute + rewrite of the sector, then if rewrite fails 
> it falls back to fail+rebuild

This one if no bad-blocks list is configured.

> - first tries a recompute + rewrite of the sector, then if rewrite fails 
> it adds the block to bad block list, then if the list is out-of-space it 
> falls back to fail+rebuild

This one if a bad-blocks list is configured
> ?
> 
> What happens if the destination of the hot-replace has *one* write 
> error? And *lots* of write errors?

If the hot-replace destination has any write errors it is failed and removed
from the array.  Better the devil you know ....

> 
> What happens if one hot-replace hits a sector for which both the disk 
> being replaced and another one have an entry in the bad block list, and 
> so there is not enough parity information to recompute? Does it proceed 
> anyway marking the corresponding sector in the bad-block-list for the 
> destination device (=nonvalid strip), or it fails the hot-replace, or what?

If a bad-block list is configured for the target device, a bad block is
recorded there, else the device is failed.
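
If you want to see what is recorded, the per-device lists are visible
through sysfs; the device names below are only placeholders:

    cat /sys/block/mdX/md/dev-sdY/bad_blocks                 # acknowledged bad blocks, empty if none
    cat /sys/block/mdX/md/dev-sdY/unacknowledged_bad_blocks  # entries not yet stored in the metadata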

> 
> (this is actually more about bad block lists)
> What happens if a *different* disk shows back sectors due to concomitant 
> reads (simultaneous but not caused by hot-replace):
> - first recomputes and rewrites, then if rewrite fails it is added to 
> bad block list, then if list is full it gets failed? Or can another 
> hot-replace get started when already one is running?

The handling of bad blocks is independent of any hot-replace activity.
So if some other device gets a read error we try to recover as normal.
If that results in an error which would trigger a hot-replace, then at the
next opportunity when no resync/recovery/reshape/replace is running, and a
spare is available, a hot-replace will start.

Hope that clarifies the situation.

Thanks,
NeilBrown



> 
> Thank you


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-05-21 10:12 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-08  9:10 Hot-replace for RAID5 Patrik Horník
2012-05-10  6:59 ` David Brown
2012-05-10  8:50   ` Patrik Horník
2012-05-10 17:16   ` Patrik Horník
2012-05-11  0:50     ` NeilBrown
2012-05-11  2:44       ` Patrik Horník
2012-05-11  7:16         ` David Brown
2012-05-12  4:40           ` Patrik Horník
2012-05-12 15:56             ` Patrik Horník
2012-05-12 23:19               ` NeilBrown
2012-05-13  7:43                 ` Patrik Horník
2012-05-13 21:41                   ` Patrik Horník
2012-05-13 22:15                     ` NeilBrown
2012-05-14  0:52                       ` Patrik Horník
2012-05-15 10:11                         ` Patrik Horník
2012-05-15 10:43                           ` NeilBrown
     [not found]                             ` <CAAOsTSmMrs2bHDbFrND4-iaxwrTA0WySd_AVaK+KXZ-XZsysag@mail.gmail.com>
     [not found]                               ` <20120515212820.14db2fd2@notabene.brown>
2012-05-15 11:56                                 ` Patrik Horník
2012-05-15 12:13                                   ` NeilBrown
2012-05-15 19:39                                     ` Patrik Horník
2012-05-15 22:47                                       ` NeilBrown
2012-05-16  5:51                                         ` Patrik Horník
2012-05-16 23:34       ` Oliver Martin
2012-05-18  3:45         ` NeilBrown
2012-05-19 10:40           ` Patrik Horník
2012-05-21  9:54           ` Asdo
2012-05-21 10:12             ` NeilBrown
