All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Robinson <john.robinson@anonymous.org.uk>
To: Neil Brown <neilb@suse.de>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: reshape success story
Date: Tue, 02 Nov 2010 01:14:27 +0000	[thread overview]
Message-ID: <4CCF65F3.3040307@anonymous.org.uk> (raw)
In-Reply-To: <20101031114655.36e627e3@notabene>

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

On 31/10/2010 15:46, Neil Brown wrote:
> On Sun, 31 Oct 2010 14:19:13 +0000
> John Robinson<john.robinson@anonymous.org.uk>  wrote:
[...]
>> Perhaps the man page needs updating then.
[...]
>> If I've got the above right (someone please correct me if I'm not)
>> perhaps I could make a modest contribution (for a change) by updating/
>> patching the man page...
>
> That would certainly be appreciated.   Your understanding appear to be
> correct!

Is the attached of any use? I started with 3.1.4. I've fixed a couple of 
typos as well as hopefully improving the explanations about backup files 
and reshapes, and added a couple of your remarks about metadata types 
from another thread. Some of the text was cribbed from your blog about 
reshaping.

Cheers,

John.

[-- Attachment #2: mdadm.8.in.patch --]
[-- Type: text/plain, Size: 6109 bytes --]

--- a/mdadm.8.in	2010-08-31 08:21:13.000000000 +0100
+++ b/mdadm.8.in	2010-11-02 01:05:44.000000000 +0000
@@ -322,16 +322,20 @@
 ..
 Use the original 0.90 format superblock.  This format limits arrays to
 28 component devices and limits component devices of levels 1 and
-greater to 2 terabytes.
+greater to 2 terabytes.  It is also possible for there to be confusion
+about whether the superblock applies to a whole device or just the
+last partition, if the partition starts on a 64K boundary.
 .ie '{DEFAULT_METADATA}'0.90'
 .IP "1, 1.0, 1.1, 1.2"
 .el
 .IP "1, 1.0, 1.1, 1.2 default"
 ..
 Use the new version-1 format superblock.  This has few restrictions.
-The different sub-versions store the superblock at different locations
-on the device, either at the end (for 1.0), at the start (for 1.1) or
-4K from the start (for 1.2).  "1" is equivalent to "1.0".
+It can easily be moved between hosts with different endian-ness, and a
+recovery operation can be checkpointed and restarted.  The different
+sub-versions store the superblock at different locations on the
+device, either at the end (for 1.0), at the start (for 1.1) or 4K from
+the start (for 1.2).  "1" is equivalent to "1.0".
 'if '{DEFAULT_METADATA}'1.2'  "default" is equivalent to "1.2".
 .IP ddf
 Use the "Industry Standard" DDF (Disk Data Format) format defined by
@@ -493,7 +497,7 @@
 The default is
 .BR left\-symmetric .
 
-It is also possibly to cause RAID5 to use a RAID4-like layout by
+It is also possible to cause RAID5 to use a RAID4-like layout by
 choosing
 .BR parity\-first ,
 or
@@ -660,11 +664,11 @@
 .BR \-\-backup\-file=
 This is needed when
 .B \-\-grow
-is used to increase the number of
-raid-devices in a RAID5 if there are no spare devices available.
-See the GROW MODE section below on RAID\-DEVICES CHANGES.  The file
-should be stored on a separate device, not on the RAID array being
-reshaped.
+is used to increase the number of raid-devices in a RAID5 or RAID6 if
+there are no spare devices available, or to shrink, change RAID level
+or layout.  See the GROW MODE section below on RAID\-DEVICES CHANGES.
+The file must be stored on a separate device, not on the RAID array
+being reshaped.
 
 .TP
 .BR \-\-array-size= ", " \-Z
@@ -883,12 +887,14 @@
 .BR \-\-backup\-file=
 If
 .B \-\-backup\-file
-was used to grow the number of raid-devices in a RAID5, and the system
-crashed during the critical section, then the same
+was used when requesting a grow, shrink, RAID level change or other
+reshape, and the system crashed during the critical section, then the
+same
 .B \-\-backup\-file
 must be presented to
 .B \-\-assemble
-to allow possibly corrupted data to be restored.
+to allow possibly corrupted data to be restored, and the reshape
+to be completed.
 
 .TP
 .BR \-U ", " \-\-update=
@@ -2171,27 +2177,36 @@
 inaccessible.  The integrity of any data can then be checked before
 the non-reversible reduction in the number of devices is request.
 
-When relocating the first few stripes on a RAID5, it is not possible
-to keep the data on disk completely consistent and crash-proof.  To
-provide the required safety, mdadm disables writes to the array while
-this "critical section" is reshaped, and takes a backup of the data
-that is in that section.  This backup is normally stored in any spare
-devices that the array has, however it can also be stored in a
-separate file specified with the
+When relocating the first few stripes on a RAID5 or RAID6, it is not
+possible to keep the data on disk completely consistent and
+crash-proof.  To provide the required safety, mdadm disables writes to
+the array while this "critical section" is reshaped, and takes a
+backup of the data that is in that section.  For grows, this backup may be
+stored in any spare devices that the array has, however it can also be
+stored in a separate file specified with the
 .B \-\-backup\-file
-option.  If this option is used, and the system does crash during the
-critical period, the same file must be passed to
+option, and is required to be specified for shrinks, RAID level
+changes and layout changes.  If this option is used, and the system
+does crash during the critical period, the same file must be passed to
 .B \-\-assemble
-to restore the backup and reassemble the array.
+to restore the backup and reassemble the array.  When shrinking rather
+than growing the array, the reshape is done from the end towards the
+beginning, so the "critical section" is at the end of the reshape.
 
 .SS LEVEL CHANGES
 
 Changing the RAID level of any array happens instantaneously.  However
-in the RAID to RAID6 case this requires a non-standard layout of the
+in the RAID5 to RAID6 case this requires a non-standard layout of the
 RAID6 data, and in the RAID6 to RAID5 case that non-standard layout is
-required before the change can be accomplish.  So while the level
+required before the change can be accomplished.  So while the level
 change is instant, the accompanying layout change can take quite a
-long time.
+long time.  A
+.B \-\-backup\-file
+is required.  If the array is not simultaneously being grown or
+shrunk, so that the array size will remain the same - for example,
+reshaping a 3-drive RAID5 into a 4-drive RAID6 - the backup file will
+be used not just for a "cricital section" but throughout the reshape
+operation, as described below under LAYOUT CHANGES.
 
 .SS CHUNK-SIZE AND LAYOUT CHANGES
 
@@ -2200,10 +2215,13 @@
 To ensure against data loss in the case of a crash, a
 .B --backup-file
 must be provided for these changes.  Small sections of the array will
-be copied to the backup file while they are being rearranged.
+be copied to the backup file while they are being rearranged.  This
+means that all the data is copied twice, once to the backup and once
+to the new layout on the array, so this type of reshape will go very
+slowly.
 
 If the reshape is interrupted for any reason, this backup file must be
-make available to
+made available to
 .B "mdadm --assemble"
 so the array can be reassembled.  Consequently the file cannot be
 stored on the device being reshaped.

  reply	other threads:[~2010-11-02  1:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-31 13:41 reshape success story Florian Dazinger
2010-10-31 14:19 ` John Robinson
2010-10-31 15:46   ` Neil Brown
2010-11-02  1:14     ` John Robinson [this message]
2010-11-02  6:11       ` Upgraded grub, now confused about mirrored /boot Guy Watkins
2010-11-02 14:20         ` Leslie Rhorer
2010-11-02 15:23         ` John Robinson
2010-11-05  1:42           ` Guy Watkins
2010-11-03  3:11       ` reshape success story Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CCF65F3.3040307@anonymous.org.uk \
    --to=john.robinson@anonymous.org.uk \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.