* reshape success story
@ 2010-10-31 13:41 Florian Dazinger
  2010-10-31 14:19 ` John Robinson
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Dazinger @ 2010-10-31 13:41 UTC (permalink / raw)
  To: linux-raid

hi,
because there are so many "help"/"didn't work" mails here, I want to
give positive feedback:

Reshape of a 4-disk RAID6 (250GB each) to RAID5 worked perfectly; I now
have a 3-disk RAID5 + hot spare.
the backup-file was on an NFS4 share ;)
duration: ca. 26h (cannot say for sure, it finished during the night)
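
for the curious, a conversion like this can be started with a command
along these lines (device name and backup path are just placeholders):

  mdadm --grow /dev/md0 --level=raid5 --raid-devices=3 \
        --backup-file=/mnt/nfs/md0-backup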

alas, I'm not able to tell the max. filesize of the backup-file (mdadm
deletes it afterwards), but when I looked at it in the middle of the
process, it was about 30MB. Unlike what I expected from the man-page,
I had network traffic during the *whole* reshape process, not just at
the beginning. It would be interesting to know how fast the reshape
process would have been if the backup-file were on a local disk...
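
if anyone wants to watch the backup-file and the reshape progress at
the same time, something like this should do (the path being wherever
the backup-file lives):

  watch -n 60 'ls -l /mnt/nfs/md0-backup; cat /proc/mdstat'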

kernel-2.6.36
mdadm-3.1.4

thank you very much for all the good work and user support on this list!!
cheers, florian


* Re: reshape success story
  2010-10-31 13:41 reshape success story Florian Dazinger
@ 2010-10-31 14:19 ` John Robinson
  2010-10-31 15:46   ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: John Robinson @ 2010-10-31 14:19 UTC (permalink / raw)
  To: Florian Dazinger; +Cc: linux-raid

On 31/10/2010 13:41, Florian Dazinger wrote:
[...]
> Unlike what I expected from the man-page,
> I had network traffic during the *whole* reshape process, not just at
> the beginning.

Perhaps the man page needs updating then. The backup file is only used 
at the beginning for grows, or at the end for shrinks, but a same-size 
reshape (as yours was, going from 4-disc RAID-6 to 3-disc RAID-5) needs 
to back up everything because there's no spare space.

Of course when the man page section on reshaping and the use of the
backup file was originally written, changing RAID level wasn't
supported, and nor were shrinks, so the backup file was only used for
grows, and hence only at the beginning.
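
In command terms (sketches only; device names and paths are
placeholders), I believe a grow such as

   mdadm --grow /dev/md2 --raid-devices=5 --backup-file=/root/md2-backup

only touches the backup file during its initial critical section, the
second step of a shrink such as

   mdadm --grow /dev/md2 --array-size=<new-size>
   mdadm --grow /dev/md2 --raid-devices=3 --backup-file=/root/md2-backup

only touches it at the end, and a same-size level or layout change uses
it throughout.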

If I've got the above right (someone please correct me if I'm not) 
perhaps I could make a modest contribution (for a change) by updating/ 
patching the man page...

Actually that makes me wonder: the man page says spare devices can be 
used for the backup if there are any. Is that still true with all the 
grows, shrinks and level-changing reshape options we have now? I'd 
expect that method of backup to be (slightly) faster than putting the 
backup on a filesystem (even on a local disc).

> thank you very much for all the good work and user support on this list!!

It is nice to hear positive feedback, but at the same time I tend to 
think that since there must be millions of users of md/mdadm, it's 
pretty encouraging that there are only one or two "arghs" per day...

Cheers,

John.


* Re: reshape success story
  2010-10-31 14:19 ` John Robinson
@ 2010-10-31 15:46   ` Neil Brown
  2010-11-02  1:14     ` John Robinson
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2010-10-31 15:46 UTC (permalink / raw)
  To: John Robinson; +Cc: Florian Dazinger, linux-raid

On Sun, 31 Oct 2010 14:19:13 +0000
John Robinson <john.robinson@anonymous.org.uk> wrote:

> On 31/10/2010 13:41, Florian Dazinger wrote:
> [...]
> > Unlike what I expected from the man-page,
> > I had network traffic during the *whole* reshape process, not just at
> > the beginning.
> 
> Perhaps the man page needs updating then. The backup file is only used 
> at the beginning for grows, or at the end for shrinks, but a same-size 
> reshape (as yours was, going from 4-disc RAID-6 to 3-disc RAID-5) needs 
> to back up everything because there's no spare space.
> 
> Of course when the man page section on reshaping and the use of the
> backup file was originally written, changing RAID level wasn't
> supported, and nor were shrinks, so the backup file was only used for
> grows, and hence only at the beginning.
> 
> If I've got the above right (someone please correct me if I'm not) 
> perhaps I could make a modest contribution (for a change) by updating/ 
> patching the man page...

That would certainly be appreciated.  Your understanding appears to be
correct!

> 
> Actually that makes me wonder: the man page says spare devices can be 
> used for the backup if there are any. Is that still true with all the 
> grows, shrinks and level-changing reshape options we have now? I'd 
> expect that method of backup to be (slightly) faster than putting the 
> backup on a filesystem (even on a local disc).

I think mdadm insists on a backup file for shrinks and same-size
transformations.  It probably could use a spare, but I think there is
generally less likely to be one - when you are growing an array there is
almost certainly a spare to grow onto.

Writing to a file on a local disk should be just as fast as writing to a
raw device... I guess there could be a little bit of filesystem
overhead, but I doubt you would be able to measure it.


> 
> > thank you very much for all the good work and user support on this list!!
> 
> It is nice to hear positive feedback, but at the same time I tend to 
> think that since there must be millions of users of md/mdadm, it's 
> pretty encouraging that there are only one or two "arghs" per day...

:-)

Certainly - positive feedback is always welcome!

NeilBrown



> 
> Cheers,
> 
> John.



* Re: reshape success story
  2010-10-31 15:46   ` Neil Brown
@ 2010-11-02  1:14     ` John Robinson
  2010-11-02  6:11       ` Upgraded grub, now confused about mirrored /boot Guy Watkins
  2010-11-03  3:11       ` reshape success story Neil Brown
  0 siblings, 2 replies; 9+ messages in thread
From: John Robinson @ 2010-11-02  1:14 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux RAID

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

On 31/10/2010 15:46, Neil Brown wrote:
> On Sun, 31 Oct 2010 14:19:13 +0000
> John Robinson<john.robinson@anonymous.org.uk>  wrote:
[...]
>> Perhaps the man page needs updating then.
[...]
>> If I've got the above right (someone please correct me if I'm not)
>> perhaps I could make a modest contribution (for a change) by updating/
>> patching the man page...
>
> > That would certainly be appreciated.  Your understanding appears to be
> correct!

Is the attached of any use? I started with 3.1.4. I've fixed a couple of 
typos as well as hopefully improving the explanations about backup files 
and reshapes, and added a couple of your remarks about metadata types 
from another thread. Some of the text was cribbed from your blog about 
reshaping.

Cheers,

John.

[-- Attachment #2: mdadm.8.in.patch --]
[-- Type: text/plain, Size: 6109 bytes --]

--- a/mdadm.8.in	2010-08-31 08:21:13.000000000 +0100
+++ b/mdadm.8.in	2010-11-02 01:05:44.000000000 +0000
@@ -322,16 +322,20 @@
 ..
 Use the original 0.90 format superblock.  This format limits arrays to
 28 component devices and limits component devices of levels 1 and
-greater to 2 terabytes.
+greater to 2 terabytes.  It is also possible for there to be confusion
+about whether the superblock applies to a whole device or just the
+last partition, if the partition starts on a 64K boundary.
 .ie '{DEFAULT_METADATA}'0.90'
 .IP "1, 1.0, 1.1, 1.2"
 .el
 .IP "1, 1.0, 1.1, 1.2 default"
 ..
 Use the new version-1 format superblock.  This has few restrictions.
-The different sub-versions store the superblock at different locations
-on the device, either at the end (for 1.0), at the start (for 1.1) or
-4K from the start (for 1.2).  "1" is equivalent to "1.0".
+It can easily be moved between hosts with different endian-ness, and a
+recovery operation can be checkpointed and restarted.  The different
+sub-versions store the superblock at different locations on the
+device, either at the end (for 1.0), at the start (for 1.1) or 4K from
+the start (for 1.2).  "1" is equivalent to "1.0".
 'if '{DEFAULT_METADATA}'1.2'  "default" is equivalent to "1.2".
 .IP ddf
 Use the "Industry Standard" DDF (Disk Data Format) format defined by
@@ -493,7 +497,7 @@
 The default is
 .BR left\-symmetric .
 
-It is also possibly to cause RAID5 to use a RAID4-like layout by
+It is also possible to cause RAID5 to use a RAID4-like layout by
 choosing
 .BR parity\-first ,
 or
@@ -660,11 +664,11 @@
 .BR \-\-backup\-file=
 This is needed when
 .B \-\-grow
-is used to increase the number of
-raid-devices in a RAID5 if there are no spare devices available.
-See the GROW MODE section below on RAID\-DEVICES CHANGES.  The file
-should be stored on a separate device, not on the RAID array being
-reshaped.
+is used to increase the number of raid-devices in a RAID5 or RAID6 if
+there are no spare devices available, or to shrink, change RAID level
+or layout.  See the GROW MODE section below on RAID\-DEVICES CHANGES.
+The file must be stored on a separate device, not on the RAID array
+being reshaped.
 
 .TP
 .BR \-\-array-size= ", " \-Z
@@ -883,12 +887,14 @@
 .BR \-\-backup\-file=
 If
 .B \-\-backup\-file
-was used to grow the number of raid-devices in a RAID5, and the system
-crashed during the critical section, then the same
+was used when requesting a grow, shrink, RAID level change or other
+reshape, and the system crashed during the critical section, then the
+same
 .B \-\-backup\-file
 must be presented to
 .B \-\-assemble
-to allow possibly corrupted data to be restored.
+to allow possibly corrupted data to be restored, and the reshape
+to be completed.
 
 .TP
 .BR \-U ", " \-\-update=
@@ -2171,27 +2177,36 @@
 inaccessible.  The integrity of any data can then be checked before
 the non-reversible reduction in the number of devices is request.
 
-When relocating the first few stripes on a RAID5, it is not possible
-to keep the data on disk completely consistent and crash-proof.  To
-provide the required safety, mdadm disables writes to the array while
-this "critical section" is reshaped, and takes a backup of the data
-that is in that section.  This backup is normally stored in any spare
-devices that the array has, however it can also be stored in a
-separate file specified with the
+When relocating the first few stripes on a RAID5 or RAID6, it is not
+possible to keep the data on disk completely consistent and
+crash-proof.  To provide the required safety, mdadm disables writes to
+the array while this "critical section" is reshaped, and takes a
+backup of the data that is in that section.  For grows, this backup may be
+stored in any spare devices that the array has, however it can also be
+stored in a separate file specified with the
 .B \-\-backup\-file
-option.  If this option is used, and the system does crash during the
-critical period, the same file must be passed to
+option, and is required to be specified for shrinks, RAID level
+changes and layout changes.  If this option is used, and the system
+does crash during the critical period, the same file must be passed to
 .B \-\-assemble
-to restore the backup and reassemble the array.
+to restore the backup and reassemble the array.  When shrinking rather
+than growing the array, the reshape is done from the end towards the
+beginning, so the "critical section" is at the end of the reshape.
 
 .SS LEVEL CHANGES
 
 Changing the RAID level of any array happens instantaneously.  However
-in the RAID to RAID6 case this requires a non-standard layout of the
+in the RAID5 to RAID6 case this requires a non-standard layout of the
 RAID6 data, and in the RAID6 to RAID5 case that non-standard layout is
-required before the change can be accomplish.  So while the level
+required before the change can be accomplished.  So while the level
 change is instant, the accompanying layout change can take quite a
-long time.
+long time.  A
+.B \-\-backup\-file
+is required.  If the array is not simultaneously being grown or
+shrunk, so that the array size will remain the same - for example,
+reshaping a 3-drive RAID5 into a 4-drive RAID6 - the backup file will
+be used not just for a "critical section" but throughout the reshape
+operation, as described below under LAYOUT CHANGES.
 
 .SS CHUNK-SIZE AND LAYOUT CHANGES
 
@@ -2200,10 +2215,13 @@
 To ensure against data loss in the case of a crash, a
 .B --backup-file
 must be provided for these changes.  Small sections of the array will
-be copied to the backup file while they are being rearranged.
+be copied to the backup file while they are being rearranged.  This
+means that all the data is copied twice, once to the backup and once
+to the new layout on the array, so this type of reshape will go very
+slowly.
 
 If the reshape is interrupted for any reason, this backup file must be
-make available to
+made available to
 .B "mdadm --assemble"
 so the array can be reassembled.  Consequently the file cannot be
 stored on the device being reshaped.


* Upgraded grub, now confused about mirrored /boot
  2010-11-02  1:14     ` John Robinson
@ 2010-11-02  6:11       ` Guy Watkins
  2010-11-02 14:20         ` Leslie Rhorer
  2010-11-02 15:23         ` John Robinson
  2010-11-03  3:11       ` reshape success story Neil Brown
  1 sibling, 2 replies; 9+ messages in thread
From: Guy Watkins @ 2010-11-02  6:11 UTC (permalink / raw)
  To: 'Linux RAID'

Hello,

	I upgraded my system from Red Hat FC10 to FC11.  The instructions
say to run this command:
/sbin/grub-install BOOTDEVICE

And if it fails, run this:
/sbin/grub-install --recheck /dev/sda

However, my boot disk (/boot) is mirrored on 4 disks and I think (or hope)
all 4 are bootable.  The mirrors were created at install time many years ago
when I installed FC5.  No idea if it really made more than 1 bootable.  I
have assumed that if sda failed, I could still boot from sdb, sdc or sdd.
And I do understand that I might need to remove sda first, depending on the
type of failure.  Lucky for me, no drive has failed yet and I don't recall
if I tested booting off of any other disks.

I do have this on the kernel line:
md-mod.start_dirty_degraded=1

So, what do I do now?  Run that command on all 4 disks?  Or run it on
/dev/md0?

Oh, 4 way mirror is not because I am paranoid.  I have 4 disks partitioned
alike, so I figured I would use all 4 disks just for the symmetry.  OCD
maybe, but not paranoid.  :)

# df -k /boot
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md0                256586    221526     21812  92% /boot

md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
      264960 blocks [4/4] [UUUU]
      bitmap: 0/33 pages [0KB], 4KB chunk

Thanks,
Guy



* RE: Upgraded grub, now confused about mirrored /boot
  2010-11-02  6:11       ` Upgraded grub, now confused about mirrored /boot Guy Watkins
@ 2010-11-02 14:20         ` Leslie Rhorer
  2010-11-02 15:23         ` John Robinson
  1 sibling, 0 replies; 9+ messages in thread
From: Leslie Rhorer @ 2010-11-02 14:20 UTC (permalink / raw)
  To: 'Guy Watkins', 'Linux RAID'

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Guy Watkins
> Sent: Tuesday, November 02, 2010 1:11 AM
> To: 'Linux RAID'
> Subject: Upgraded grub, now confused about mirrored /boot
> 
> Hello,
> 
> 	I upgraded my system from Red Hat FC10 to FC11.  The instructions
> say to run this command:
> /sbin/grub-install BOOTDEVICE
> 
> And if it fails, run this:
> /sbin/grub-install --recheck /dev/sda
> 
> However, my boot disk (/boot) is mirrored on 4 disks and I think (or hope)
> all 4 are bootable.  The mirrors were created at install time many years
> ago
> when I installed FC5.  No idea if it really made more than 1 bootable.  I
> have assumed that if sda failed, I could still boot from sdb, sdc or sdd.
> And I do understand that I might need to remove sda first, depending on
> the
> type of failure.  Lucky for me, no drive has failed yet and I don't recall
> if I tested booting off of any other disks.
> 
> I do have this on the kernel line:
> md-mod.start_dirty_degraded=1
> 
> So, what do I do now?  Run that command on all 4 disks?  Or run it on
> /dev/md0?
> 
> Oh, 4 way mirror is not because I am paranoid.  I have 4 disks partitioned
> alike, so I figured I would use all 4 disks just for the symmetry.  OCD
> maybe, but not paranoid.  :)
> 
> # df -k /boot
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/md0                256586    221526     21812  92% /boot
> 
> md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
>       264960 blocks [4/4] [UUUU]
>       bitmap: 0/33 pages [0KB], 4KB chunk

GRUB2 only supports RAID on 0.90 superblock arrays.  If your array has
anything other than 0.90 superblocks, it won't work, and you will need
to convert to 0.90 superblocks in order to continue.  Once that is done,
you should install GRUB2 to all four drives.  Test booting from all four.
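
For example, something like this (a sketch, assuming the four drives
are sda through sdd):

  for d in /dev/sd[abcd]; do /sbin/grub-install "$d"; done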



* Re: Upgraded grub, now confused about mirrored /boot
  2010-11-02  6:11       ` Upgraded grub, now confused about mirrored /boot Guy Watkins
  2010-11-02 14:20         ` Leslie Rhorer
@ 2010-11-02 15:23         ` John Robinson
  2010-11-05  1:42           ` Guy Watkins
  1 sibling, 1 reply; 9+ messages in thread
From: John Robinson @ 2010-11-02 15:23 UTC (permalink / raw)
  To: Guy Watkins; +Cc: 'Linux RAID'

On 02/11/2010 06:11, Guy Watkins wrote:
> Hello,
>
> 	I upgraded my system from Red Hat FC10 to FC11.  The instructions
> say to run this command:
> /sbin/grub-install BOOTDEVICE
>
> And if it fails, run this:
> /sbin/grub-install --recheck /dev/sda
>
> However, my boot disk (/boot) is mirrored on 4 disks and I think (or hope)
> all 4 are bootable.  The mirrors were created at install time many years ago
> when I installed FC5.  No idea if it really made more than 1 bootable.  I
> have assumed that if sda failed, I could still boot from sdb, sdc or sdd.
> And I do understand that I might need to remove sda first, depending on the
> type of failure.  Lucky for me, no drive has failed yet and I don't recall
> if I tested booting off of any other disks.
>
> I do have this on the kernel line:
> md-mod.start_dirty_degraded=1
>
> So, what do I do now?  Run that command on all 4 disks?  Or run it on
> /dev/md0?

I don't know which grub version you get in FC10/11, but in CentOS 5 
(with grub 0.97), grub-install is a clever little script which does all 
the work for you, so you just
   /sbin/grub-install /dev/md0
and it will install grub on all of /dev/md0's constituent drives and 
generally get everything right.

You should be fine with metadata 0.90 or 1.0 as both store the 
superblock at the end of the device. 1.1 and 1.2 probably won't work 
because their constituent partitions don't look like bare filesystems.
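
You can check which metadata version an array uses with something like

   mdadm --detail /dev/md0 | grep Version

(assuming md0 is the /boot array).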

What actually happens is that the BIOS boots off the first live disc,
and so does grub 0.97. Neither has any inherent support for RAID, but
they don't have to if it's a RAID-1 mirror, because each part of the
mirror can be used on its own, at least for reading enough to boot
with.

You should probably test booting up with your first drive disconnected, 
just to be sure!

Cheers,

John.



* Re: reshape success story
  2010-11-02  1:14     ` John Robinson
  2010-11-02  6:11       ` Upgraded grub, now confused about mirrored /boot Guy Watkins
@ 2010-11-03  3:11       ` Neil Brown
  1 sibling, 0 replies; 9+ messages in thread
From: Neil Brown @ 2010-11-03  3:11 UTC (permalink / raw)
  To: John Robinson; +Cc: Linux RAID

On Tue, 02 Nov 2010 01:14:27 +0000
John Robinson <john.robinson@anonymous.org.uk> wrote:

> On 31/10/2010 15:46, Neil Brown wrote:
> > On Sun, 31 Oct 2010 14:19:13 +0000
> > John Robinson<john.robinson@anonymous.org.uk>  wrote:
> [...]
> >> Perhaps the man page needs updating then.
> [...]
> >> If I've got the above right (someone please correct me if I'm not)
> >> perhaps I could make a modest contribution (for a change) by updating/
> >> patching the man page...
> >
> > That would certainly be appreciated.  Your understanding appears to be
> > correct!
> 
> Is the attached of any use? I started with 3.1.4. I've fixed a couple of 
> typos as well as hopefully improving the explanations about backup files 
> and reshapes, and added a couple of your remarks about metadata types 
> from another thread. Some of the text was cribbed from your blog about 
> reshaping.
> 
> Cheers,
> 
> John.

Great!  Thanks.

I made a couple of very minor changes - one of which was a nearby typo that I
spotted.

http://neil.brown.name/git?p=mdadm;a=commitdiff;h=cd19c0cf1c52ce765bf27791b5ef0ee4cdc4c8db

Thanks,
NeilBrown



* RE: Upgraded grub, now confused about mirrored /boot
  2010-11-02 15:23         ` John Robinson
@ 2010-11-05  1:42           ` Guy Watkins
  0 siblings, 0 replies; 9+ messages in thread
From: Guy Watkins @ 2010-11-05  1:42 UTC (permalink / raw)
  To: 'John Robinson'; +Cc: 'Linux RAID'

It worked!
/sbin/grub-install /dev/md0

Thanks,
Guy

} -----Original Message-----
} From: John Robinson [mailto:john.robinson@anonymous.org.uk]
} Sent: Tuesday, November 02, 2010 11:24 AM
} To: Guy Watkins
} Cc: 'Linux RAID'
} Subject: Re: Upgraded grub, now confused about mirrored /boot
} 
} On 02/11/2010 06:11, Guy Watkins wrote:
} > Hello,
} >
} > 	I upgraded my system from Red Hat FC10 to FC11.  The instructions
} > say to run this command:
} > /sbin/grub-install BOOTDEVICE
} >
} > And if it fails, run this:
} > /sbin/grub-install --recheck /dev/sda
} >
} > However, my boot disk (/boot) is mirrored on 4 disks and I think (or
} hope)
} > all 4 are bootable.  The mirrors were created at install time many years
} ago
} > when I installed FC5.  No idea if it really made more than 1 bootable.
} I
} > have assumed that if sda failed, I could still boot from sdb, sdc or
} sdd.
} > And I do understand that I might need to remove sda first, depending on
} the
} > type of failure.  Lucky for me, no drive has failed yet and I don't
} recall
} > if I tested booting off of any other disks.
} >
} > I do have this on the kernel line:
} > md-mod.start_dirty_degraded=1
} >
} > So, what do I do now?  Run that command on all 4 disks?  Or run it on
} > /dev/md0?
} 
} I don't know which grub version you get in FC10/11, but in CentOS 5
} (with grub 0.97), grub-install is a clever little script which does all
} the work for you, so you just
}    /sbin/grub-install /dev/md0
} and it will install grub on all of /dev/md0's constituent drives and
} generally get everything right.
} 
} You should be fine with metadata 0.90 or 1.0 as both store the
} superblock at the end of the device. 1.1 and 1.2 probably won't work
} because their constituent partitions don't look like bare filesystems.
} 
} What actually happens is that the BIOS boots off the first live disc,
} and so does grub 0.97. Neither has any inherent support for RAID, but
} they don't have to if it's a RAID-1 mirror, because each part of the
} mirror can be used on its own, at least for reading enough to boot
} with.
} 
} You should probably test booting up with your first drive disconnected,
} just to be sure!
} 
} Cheers,
} 
} John.


