* How to use --freeze-reshape  and is it safe?
@ 2014-08-14  5:38 Ram Ramesh
  2014-08-14  5:56 ` NeilBrown
  0 siblings, 1 reply; 6+ messages in thread
From: Ram Ramesh @ 2014-08-14  5:38 UTC (permalink / raw)
  To: linux-raid

I was browsing through mdadm man pages to check out --layout options 
when converting 3disk-raid5 to 4disk-raid6 and encountered 
--freeze-reshape switch/arg. I did a quick google and could not get much 
info. Can a user issue this to suspend reshape for a short while? 
Specifically

 1. Is the use (or frequent use) of this switch safe? recommended?
 2. Can the array be mounted when this switch is used?
 3. What is correct syntax for the usage?
 4. Can I use this to manage the reshape load on an array? Maybe to let
    the disk cool off after busy hours of seeking during a reshape?
 5. Can I use it as a safe method for shutting down  the machine?
 6. Is there a tutorial/faq/manual that explains in detail the use of
    other mdadm esoteric switches? (like --layout I was searching)

Regards
Ramesh



* Re: How to use --freeze-reshape  and is it safe?
  2014-08-14  5:38 How to use --freeze-reshape and is it safe? Ram Ramesh
@ 2014-08-14  5:56 ` NeilBrown
  2014-08-14  6:25   ` Ram Ramesh
  2014-08-14 13:51   ` Ethan Wilson
  0 siblings, 2 replies; 6+ messages in thread
From: NeilBrown @ 2014-08-14  5:56 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: linux-raid

On Thu, 14 Aug 2014 00:38:43 -0500 Ram Ramesh <rramesh2400@gmail.com> wrote:

> I was browsing through mdadm man pages to check out --layout options 
> when converting 3disk-raid5 to 4disk-raid6 and encountered 
> --freeze-reshape switch/arg. I did a quick google and could not get much 
> info. Can a user issue this to suspend reshape for a short while? 

As --freeze-reshape is only meaningful in combination with --assemble,
this question doesn't really make sense.

If you are using a sufficiently new kernel and mdadm so that "data_offset" is
adjusted during reshapes so that no 'backup' is needed, then you can
suspend a reshape for a period of time by:

  echo frozen > /sys/block/mdXXX/md/sync_action

This is perfectly safe.  When you want to unfreeze, write 'idle'
to 'sync_action'.  md will notice that a reshape is pending and will restart
where it was up to.
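
A minimal sketch of the freeze/resume steps above, wrapped in shell functions. The function names and the optional second argument (an alternate sysfs root, there only so the functions can be tried against a scratch directory) are assumptions for illustration and not part of md or mdadm. Writing to sync_action on a real array needs root.

```shell
#!/bin/sh
# Pause a running reshape by writing "frozen" to the array's sync_action.
# $1 = array name (e.g. md0)
# $2 = sysfs root, defaults to /sys (hypothetical parameter, for dry testing)
md_freeze() {
    echo frozen > "${2:-/sys}/block/$1/md/sync_action"
}

# Resume: after "idle" is written, md notices the pending reshape and
# restarts it from where it stopped.
md_unfreeze() {
    echo idle > "${2:-/sys}/block/$1/md/sync_action"
}

# Print the current sync_action value for the array.
md_sync_action() {
    cat "${2:-/sys}/block/$1/md/sync_action"
}
```

Typical use would be "md_freeze md0" before the busy period and "md_unfreeze md0" afterwards, with md0 replaced by the real array name.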


> Specifically
> 
>  1. Is the use (or frequent use) of this switch safe? recommended?
>  2. Can the array be mounted when this switch is used?
>  3. What is correct syntax for the usage?
>  4. Can I use this to manage the reshape load on an array? Maybe to let
>     the disk cool off after busy hours of seeking during a reshape?
>  5. Can I use it as a safe method for shutting down  the machine?
>  6. Is there a tutorial/faq/manual that explains in detail the use of
>     other mdadm esoteric switches? (like --layout I was searching)

Is it really that esoteric?
If you want to reshape an array, you run "mdadm --grow" and list all the
changes you want to make.  Set a new level, a new number of devices, a new
layout, a new chunk size, whatever.  mdadm will do it if it can and give an
error if it cannot.
If you want to test it out first then that is extremely sensible.  Make some
loop devices and experiment.
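
One way the loop-device experiment could look, as a sketch only: build a throwaway 3-device RAID5 from image files and grow it to a 4-device RAID6. The directory /tmp/mdtest, the array name /dev/md100, the image size and the function names are all made up for illustration. With DRY_RUN=1 (the default here) the commands are only printed, and the loop device names shown are placeholders.

```shell
#!/bin/sh
# DRY_RUN=1 prints the commands instead of running them (needs root otherwise).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = scratch directory (hypothetical default /tmp/mdtest)
# $2 = md device to create (hypothetical default /dev/md100)
raid_experiment() {
    exp_dir=${1:-/tmp/mdtest}
    exp_md=${2:-/dev/md100}
    exp_devs=""
    run mkdir -p "$exp_dir"
    for i in 0 1 2 3; do
        run truncate -s 200M "$exp_dir/disk$i.img"
        if [ "$DRY_RUN" = 1 ]; then
            echo "losetup --find --show $exp_dir/disk$i.img"
            exp_loop="/dev/loopN$i"    # placeholder name in dry-run mode
        else
            exp_loop=$(losetup --find --show "$exp_dir/disk$i.img")
        fi
        exp_devs="$exp_devs $exp_loop"
    done
    set -- $exp_devs
    # Three devices form the RAID5; wait for the initial sync to finish.
    run mdadm --create "$exp_md" --level=5 --raid-devices=3 "$1" "$2" "$3"
    run mdadm --wait "$exp_md"
    # The fourth device goes in as a spare, then level and count change together.
    run mdadm --add "$exp_md" "$4"
    run mdadm --grow "$exp_md" --level=6 --raid-devices=4 \
        --backup-file="$exp_dir/grow.bak"
}
```

Afterwards the array can be torn down with "mdadm --stop" and the loop devices released with "losetup -d".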

NeilBrown


* Re: How to use --freeze-reshape  and is it safe?
  2014-08-14  5:56 ` NeilBrown
@ 2014-08-14  6:25   ` Ram Ramesh
  2014-08-14  7:30     ` NeilBrown
  2014-08-14 13:51   ` Ethan Wilson
  1 sibling, 1 reply; 6+ messages in thread
From: Ram Ramesh @ 2014-08-14  6:25 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 08/14/2014 12:56 AM, NeilBrown wrote:
> On Thu, 14 Aug 2014 00:38:43 -0500 Ram Ramesh <rramesh2400@gmail.com> wrote:
>
>> I was browsing through mdadm man pages to check out --layout options
>> when converting 3disk-raid5 to 4disk-raid6 and encountered
>> --freeze-reshape switch/arg. I did a quick google and could not get much
>> info. Can a user issue this to suspend reshape for a short while?
> As --freeze-reshape is only meaningful in combination with --assemble,
> this question doesn't really make sense.
>
> If you are using a sufficiently new kernel and mdadm so that "data_offset" is
> adjusted during reshapes so that no 'backup' is needed, then you can
> suspend a reshape for a period of time by:
>
>    echo frozen > /sys/block/mdXXX/md/sync_action
>
> This is perfectly safe.  When you want to unfreeze, write 'idle'
> to 'sync_action'.  md will notice that a reshape is pending and will restart
> where it was up to.
>
>
>> Specifically
>>
>>   1. Is the use (or frequent use) of this switch safe? recommended?
>>   2. Can the array be mounted when this switch is used?
>>   3. What is correct syntax for the usage?
>>   4. Can I use this to manage the reshape load on an array? Maybe to let
>>      the disk cool off after busy hours of seeking during a reshape?
>>   5. Can I use it as a safe method for shutting down  the machine?
>>   6. Is there a tutorial/faq/manual that explains in detail the use of
>>      other mdadm esoteric switches? (like --layout I was searching)
> Is it really that esoteric?
> If you want to reshape an array, you run "mdadm --grow" and list all the
> changes you want to make.  Set a new level, a new number of devices, a new
> layout, a new chunk size, whatever.  mdadm will do it if it can and give an
> error if it cannot.
> If you want to test it out first then that is extremely sensible.  Make some
> loop devices and experiment.
>
> NeilBrown
Thanks. The name --freeze-reshape misled me into thinking that this is
a request to stop a reshape, just as --fail is a request to mark a drive
as failed. I used "esoteric" to mean not routinely used, or not
interpretable from the plain English meaning of the switch/arg name.

While I am at this, let me ask the --layout question also. Does
conversion from raid5 to raid6 do --layout=left-symmetric-6 first and
then distribute Q in a second pass with --layout=left-symmetric? If
not, will the reshape be faster if I do it in two phases?

Ramesh



* Re: How to use --freeze-reshape  and is it safe?
  2014-08-14  6:25   ` Ram Ramesh
@ 2014-08-14  7:30     ` NeilBrown
  2014-08-14 15:59       ` Ram Ramesh
  0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2014-08-14  7:30 UTC (permalink / raw)
  To: Ram Ramesh; +Cc: linux-raid

On Thu, 14 Aug 2014 01:25:28 -0500 Ram Ramesh <rramesh2400@gmail.com> wrote:

> On 08/14/2014 12:56 AM, NeilBrown wrote:
> > On Thu, 14 Aug 2014 00:38:43 -0500 Ram Ramesh <rramesh2400@gmail.com> wrote:
> >
> >> I was browsing through mdadm man pages to check out --layout options
> >> when converting 3disk-raid5 to 4disk-raid6 and encountered
> >> --freeze-reshape switch/arg. I did a quick google and could not get much
> >> info. Can a user issue this to suspend reshape for a short while?
> > As --freeze-reshape is only meaningful in combination with --assemble,
> > this question doesn't really make sense.
> >
> > If you are using a sufficiently new kernel and mdadm so that "data_offset" is
> > adjusted during reshapes so that no 'backup' is needed, then you can
> > suspend a reshape for a period of time by:
> >
> >    echo frozen > /sys/block/mdXXX/md/sync_action
> >
> > This is perfectly safe.  When you want to unfreeze, write 'idle'
> > to 'sync_action'.  md will notice that a reshape is pending and will restart
> > where it was up to.
> >
> >
> >> Specifically
> >>
> >>   1. Is the use (or frequent use) of this switch safe? recommended?
> >>   2. Can the array be mounted when this switch is used?
> >>   3. What is correct syntax for the usage?
> >>   4. Can I use this to manage the reshape load on an array? Maybe to let
> >>      the disk cool off after busy hours of seeking during a reshape?
> >>   5. Can I use it as a safe method for shutting down  the machine?
> >>   6. Is there a tutorial/faq/manual that explains in detail the use of
> >>      other mdadm esoteric switches? (like --layout I was searching)
> > Is it really that esoteric?
> > If you want to reshape an array, you run "mdadm --grow" and list all the
> > changes you want to make.  Set a new level, a new number of devices, a new
> > layout, a new chunk size, whatever.  mdadm will do it if it can and give an
> > error if it cannot.
> > If you want to test it out first then that is extremely sensible.  Make some
> > loop devices and experiment.
> >
> > NeilBrown
> Thanks. The name --freeze-reshape misled me into thinking that this is
> a request to stop a reshape, just as --fail is a request to mark a drive
> as failed. I used "esoteric" to mean not routinely used, or not
> interpretable from the plain English meaning of the switch/arg name.
>
> While I am at this, let me ask the --layout question also. Does
> conversion from raid5 to raid6 do --layout=left-symmetric-6 first and
> then distribute Q in a second pass with --layout=left-symmetric? If
> not, will the reshape be faster if I do it in two phases?

When you convert a raid5 to a raid6 it will assume that an extra drive is
being added as well.
Firstly the array is instantaneously converted from an optimal RAID5 in
left-symmetric layout to a degraded RAID6 in left-symmetric-6 layout.

Then the reshape process is started which reads each stripe in the
left-symmetric-6 layout and writes it back in the raid6:left-symmetric layout.

(if you specify a different number of final devices it all still works in one
pass, but the dance is more complex).
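
To make the two layouts concrete, here is a small sketch that prints which device holds P and Q in each stripe. The index formulas are an assumption based on a reading of the kernel's raid5.c, and the function names are made up; treat it as an illustration, not a reference.

```shell
#!/bin/sh
# left-symmetric-6: P rotates over the first n-1 devices exactly as in a
# left-symmetric RAID5, and Q always sits on the last device.
# $1 = stripe number, $2 = number of devices in the RAID6
layout_ls6() {
    ls6_stripe=$1
    ls6_n=$2
    ls6_p=$(( (ls6_n - 2) - ls6_stripe % (ls6_n - 1) ))
    ls6_q=$(( ls6_n - 1 ))
    echo "P=$ls6_p Q=$ls6_q"
}

# raid6 left-symmetric: P rotates over all n devices and Q follows P.
layout_ls() {
    ls_stripe=$1
    ls_n=$2
    ls_p=$(( (ls_n - 1) - ls_stripe % ls_n ))
    ls_q=$(( (ls_p + 1) % ls_n ))
    echo "P=$ls_p Q=$ls_q"
}

# Print both layouts side by side for the first $2 stripes of an $1-device array.
show_layouts() {
    sl_s=0
    while [ "$sl_s" -lt "$2" ]; do
        echo "stripe $sl_s: ls-6 $(layout_ls6 "$sl_s" "$1") | ls $(layout_ls "$sl_s" "$1")"
        sl_s=$(( sl_s + 1 ))
    done
}
```

Running "show_layouts 4 4" for the 3-disk to 4-disk case shows Q pinned to device 3 in the intermediate layout and rotating in the final one, which is why the instantaneous first step needs no data movement.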

If this is done without changing the data offset, then every stripe is
written on top of the old location of the same stripe so if the host crashed
in the middle of the write, data would be lost.
So mdadm copies each stripe to a backup-file before allowing the data to be
relocated.  This causes a lot more IO than required to move the data, but is
a lot safer.

With newer kernels (v3.5) and mdadm (v3.3) a reshape can move the data_offset
at the same time so that it is only ever writing to an unused area of the
devices.  This should be much faster.
However it requires that the data_offset is high enough that there is room to
move it backwards.  mdadm 3.3 creates arrays with a reasonably large
data_offset.  With arrays created earlier you might need to
 - shrink the filesystem
 - shrink the --size of the array

md can either increase or decrease the data offset.
The latter requires free space at the start of the array so data_offset must
be large.  The former requires free space at the end of the array, so size
must be less than the maximum.  "mdadm --examine" will report "Unused space"
both "before" and "after" which indicates how much data_offset can be moved.
If either of these is larger than 1 chunk, then mdadm will make use of it.

To answer your question: there is no "second pass".  The only way to make it
faster is to have a recent kernel and mdadm and make sure there is sufficient
Unused space, either "before" or "after".
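
A sketch of how the "Unused space" check might be scripted. It assumes the --examine output contains a line of the form "Unused Space : before=N sectors, after=M sectors" and that the chunk size is passed in KiB; the function names and that calling convention are hypothetical.

```shell
#!/bin/sh
# Read `mdadm --examine` output on stdin and print "<before> <after>" in sectors.
# Assumes a line like:  Unused Space : before=262056 sectors, after=0 sectors
unused_space() {
    sed -n 's/.*Unused Space *: *before=\([0-9]*\) sectors, after=\([0-9]*\) sectors.*/\1 \2/p'
}

# Succeed if either figure is larger than one chunk.
# $1 = before (sectors), $2 = after (sectors), $3 = chunk size in KiB
has_room() {
    chunk_sectors=$(( $3 * 2 ))    # 512-byte sectors per chunk
    [ "$1" -gt "$chunk_sectors" ] || [ "$2" -gt "$chunk_sectors" ]
}
```

For a member device this could be used as "mdadm --examine /dev/sdX1 | unused_space", where /dev/sdX1 is a placeholder, and the two numbers fed to has_room together with the array's chunk size.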

NeilBrown


* Re: How to use --freeze-reshape  and is it safe?
  2014-08-14  5:56 ` NeilBrown
  2014-08-14  6:25   ` Ram Ramesh
@ 2014-08-14 13:51   ` Ethan Wilson
  1 sibling, 0 replies; 6+ messages in thread
From: Ethan Wilson @ 2014-08-14 13:51 UTC (permalink / raw)
  To: linux-raid

On 14/08/2014 07:56, NeilBrown wrote:
> ....
>
> As --freeze-reshape is only meaningful in combination with --assemble,
> this question doesn't really make sense.
>
> If you are using a sufficiently new kernel and mdadm so that "data_offset" is
> adjusted during reshapes so that no 'backup' is needed, then you can
> suspend a reshape for a period of time by:
>
>    echo frozen > /sys/block/mdXXX/md/sync_action
>
> This is perfectly safe.  When you want to unfreeze, write 'idle'
> to 'sync_action'.  md will notice that a reshape is pending and will restart
> where it was up to.
>
> .....
>
> Is it really that esoteric?
> If you want to reshape an array, you run "mdadm --grow" and list all the
> changes you want to make.  Set a new level, a new number of devices, a new
> layout, a new chunk size, whatever.  mdadm will do it if it can and give an
> error if it cannot.
> If you want to test it out first then that is extremely sensible.  Make some
> loop devices and experiment.

I also was interested in the functioning of reshape, and it never was 
completely clear to me.

Can you confirm that the manpage for --freeze-reshape is correct?

        --freeze-reshape
               Option is intended to be used in start-up scripts during
               initrd boot phase.  When array under reshape is assembled
               during initrd phase, this option stops reshape after reshape
               critical section is being restored.  This happens before file
               system pivot operation and avoids loss of file system context.
               Losing file system context would cause reshape to be broken.
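
For illustration, the two-phase use that this text describes might be sketched as follows: assemble with --freeze-reshape in the initrd, then resume with --grow --continue once the real root is mounted. The function names and the DRY_RUN switch are made up, and where each call belongs in a given distro's boot scripts is an assumption.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Initrd phase: assemble arrays but leave any reshape frozen.
assemble_frozen() {
    run mdadm --assemble --scan --freeze-reshape
}

# After the root file system is in place: let the reshape continue.
# $1 = md device, $2 = optional backup file path
continue_reshape() {
    if [ -n "$2" ]; then
        run mdadm --grow --continue "$1" --backup-file="$2"
    else
        run mdadm --grow --continue "$1"
    fi
}
```

Keeping the resume step separate means any backup file is opened only after the final root file system is available.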

"would cause reshape to be broken" means complete data loss?

So, to be safe in case of power loss during reshape, do we have to make
absolutely sure that our Linux distribution implements
--freeze-reshape before pivot_root?

I think Ubuntu doesn't do that. Debian probably doesn't either.

Is the reshape operation supposed to be safe with regard to power loss
(supposing the distro implements --freeze-reshape)?

Is --freeze-reshape needed only when there is a backup file, or even
with newer kernels (>= 3.5) and mdadm (>= 3.3) and no backup file?

If mdadm lets me initiate a reshape without specifying a backup file, 
does it mean that it has checked that in my case it is safe?

Thank you
EW


* Re: How to use --freeze-reshape and is it safe?
  2014-08-14  7:30     ` NeilBrown
@ 2014-08-14 15:59       ` Ram Ramesh
  0 siblings, 0 replies; 6+ messages in thread
From: Ram Ramesh @ 2014-08-14 15:59 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>
> When you convert a raid5 to a raid6 it will assume that an extra drive is
> being added as well.
> Firstly the array is instantaneously converted from an optimal RAID5 in
> left-symmetric layout to a degraded RAID6 in left-symmetric-6 layout.
>
> Then the reshape process is started which reads each stripe in the
> left-symmetric-6 layout and writes it back in the raid6:left-symmetric layout.
>
> (if you specify a different number of final devices it all still works in one
> pass, but the dance is more complex).
>
> If this is done without changing the data offset, then every stripe is
> written on top of the old location of the same stripe so if the host crashed
> in the middle of the write, data would be lost.
> So mdadm copies each stripe to a backup-file before allowing the data to be
> relocated.  This causes a lot more IO than required to move the data, but is
> a lot safer.
>
> With newer kernels (v3.5) and mdadm (v3.3) a reshape can move the data_offset
> at the same time so that it is only ever writing to an unused area of the
> devices.  This should be much faster.
> However it requires that the data_offset is high enough that there is room to
> move it backwards.  mdadm 3.3 creates arrays with a reasonably large
> data_offset.  With arrays created earlier you might need to
>  - shrink the filesystem
>  - shrink the --size of the array
>
> md can either increase or decrease the data offset.
> The latter requires free space at the start of the array so data_offset must
> be large.  The former requires free space at the end of the array, so size
> must be less than the maximum.  "mdadm --examine" will report "Unused space"
> both "before" and "after" which indicates how much data_offset can be moved.
> If either of these is larger than 1 chunk, then mdadm will make use of it.
>
> To answer your question: there is no "second pass".  The only way to make it
> faster is to have a recent kernel and mdadm and make sure there is sufficient
> Unused space, either "before" or "after".
>
> NeilBrown

I may be wrong here, but wouldn't going back and forth on the same
disk make the operation slow? Computing Q and distributing it requires
a read followed by a write to several disks, which makes seeking the
bottleneck. Would it not be better to first build Q on the new disk and
do the distribution later, since you may then be able to read multiple
blocks, parallelize reads, and combine writes? I am not claiming deep
knowledge of disk internals here, just bouncing thoughts around.

Ramesh
