From: "C. Morgan Hamill" <chamill@wesleyan.edu>
To: stan <stan@hardwarefreak.com>
Cc: xfs <xfs@oss.sgi.com>
Subject: Re: Question regarding XFS on LVM over hardware RAID.
Date: Fri, 31 Jan 2014 16:14:31 -0500
Message-ID: <1391202273-sup-9265@al.wesleyan.edu>
In-Reply-To: <52EB3B96.7000103@hardwarefreak.com>

Excerpts from Stan Hoeppner's message of 2014-01-31 00:58:46 -0500:
> RAID60 is a nested RAID level just like RAID10 and RAID50.  It is a
> stripe, or RAID0, across multiple primary array types, RAID6 in this
> case.  The stripe width of each 'inner' RAID6 becomes the stripe unit of
> the 'outer' RAID0 array:
> 
> RAID6 geometry     128KB * 12 = 1536KB
> RAID0 geometry  1536KB * 3  = 4608KB
> 
> If you are creating your RAID60 array with a proprietary hardware
> RAID/SAN management utility it may not be clearly showing you the
> resulting nested geometry I've demonstrated above, which is correct for
> your RAID60.
> 
> It is possible with software RAID to continue nesting stripe upon stripe
> to build infinitely large nested arrays.  It is not practical to do so
> for many reasons, but I'll not express those here as it is out of scope
> for this discussion.  I am simply attempting to explain how nested RAID
> levels are constructed.
> 
> > So optimised for sequential IO. The time-honoured method of setting
> > up XFS for this if the workload is large files is to use a stripe
> > unit that is equal to the width of the underlying RAID6 volumes with
> > a stripe width of 3. That way XFS tries to align files to the start
> > of each RAID6 volume, and allocate in full RAID6 stripe chunks. This
> > mostly avoids RMW cycles for large files and sequential IO. i.e. su
> > = 1536k, sw = 3.

Makes perfect sense.
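
Just to sanity-check the arithmetic for myself (a quick shell check,
nothing more):

  ~$ echo "inner: $((128 * 12))KB, outer: $((1536 * 3))KB"
  inner: 1536KB, outer: 4608KB

So each RAID6 contributes one 1536KB stripe unit to the outer RAID0,
which stripes across three of them for 4608KB in total.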

> As Dave demonstrates, your hardware geometry is 1536*3=4608KB.  Thus,
> when you create your logical volumes they each need to start and end on
> a 4608KB boundary, and be evenly divisible by 4608KB.  This will ensure
> that all of your logical volumes are aligned to the RAID60 geometry.
> When formatting the LVs with XFS you will use:
> 
> ~# mkfs.xfs -d su=1536k,sw=3 /dev/[lv_device_path]

Noted.
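
For my own notes: after mkfs I can verify the geometry with xfs_info.
Assuming the default 4KiB block size, su=1536k should show up as
sunit=384 and sw=3 as swidth=1152 (both in filesystem blocks),
something like:

  ~# xfs_info /dev/[lv_device_path]
  ...
  data     = bsize=4096  sunit=384  swidth=1152 blks
  ...

give or take the exact formatting of the output.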

> This aligns XFS to the RAID60 geometry.  Geometry alignment must be
> maintained throughout the entire storage stack.  If a single layer is
> not aligned properly, every layer will be misaligned.  When this occurs
> performance will suffer, and could suffer tremendously.
> 
> You'll want to add "inode64" to your fstab mount options for these
> filesystems.  This has nothing to do with geometry, but how XFS
> allocates inodes and how/where files are written to AGs.  It is the
> default in very recent kernels, but I don't know in which version that
> happened.

Yes, I was aware of this.
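
For the record, the fstab entries would look something like this (the
device and mount point names here are just placeholders):

  /dev/vg_data/lv_data  /export/data  xfs  defaults,inode64  0 0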

> LVM typically affords you much more flexibility here than your RAID/SAN
> controller.  Just be mindful that when you expand you need to keep your
> geometry, i.e. stripe width, the same.  Let's say some time in the
> future you want to expand but can only afford, or only need, one 14 disk
> chassis at the time, not another 3 for another RAID60.  Here you could
> create a single 14 drive RAID6 with stripe geometry 384KB * 12 = 4608KB.
> 
> You could then carve it up into 1-3 pieces, each aligned to the
> start/end of a 4608KB stripe and evenly divisible by 4608KB, and add
> them to one or more of your LVs/XFS filesystems.  This maintains the
> same overall stripe width geometry as the RAID60 to which all of your
> XFS filesystems are already aligned.

OK, so the upshot is that any additions to the volume group must be
arrays with su*sw=4608k, and all logical volumes and filesystems must
begin and end on multiples of 4608k from the start of the block device.
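
To make sure I follow, a future expansion onto such a new array might
look roughly like this (device, VG, LV, and mount point names invented
for illustration):

  ~# pvcreate --dataalignment 4608k /dev/mapper/new_raid6
  ~# vgextend vg_data /dev/mapper/new_raid6
  ~# lvextend -L +4608000k /dev/vg_data/lv_data  # +1000 full stripes
  ~# xfs_growfs /export/data

with every size kept a multiple of the 4608k stripe width.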

As long as these things hold true, is it all right for logical
volumes/filesystems to begin on one physical device and end on another?

> If you remember only 3 words of my post, remember:
> 
> Alignment, alignment, alignment.

Yes, I am hearing you. :-)

> For a RAID60 setup such as you're describing, you'll want to use LVM,
> and you must maintain consistent geometry throughout the stack, from
> array to filesystem.  This means every physical volume you create must
> start and end on a 4608KB stripe boundary.  Every volume group you
> create must do the same.  And every logical volume must also start and
> end on a 4608KB stripe boundary.  If you don't verify each layer is
> aligned all of your XFS filesystems will likely be unaligned.  And
> again, performance will suffer, possibly horribly so.

So, basically, --dataalignment is my friend during pvcreate and
lvcreate.
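
Sketching the initial setup for my own reference (names are
hypothetical, and I'd have to check whether my LVM version accepts a
non-power-of-two extent size):

  ~# pvcreate --dataalignment 4608k /dev/mapper/raid60
  ~# vgcreate -s 4608k vg_data /dev/mapper/raid60
  ~# lvcreate -L 9216000k -n lv_data vg_data  # 2000 full stripes

--dataalignment starts the PV data area on a stripe boundary, and a
4608k extent size keeps every LV start and size stripe-aligned.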

Thanks so much to you and Dave; this has been tremendously helpful.
--
Morgan Hamill

