* XFS over LVM over md RAID
@ 2010-09-09 22:58 Richard Scobie
  2010-09-10  0:25 ` Michael Monnerie
  2010-09-10  1:30 ` Dave Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-09 22:58 UTC (permalink / raw)
  To: xfs

Using the latest, stable versions of LVM2 and xfsprogs and the 2.6.35.4 
kernel, I am setting up lvm on a 16 drive, 256k chunk md RAID6, which 
has been used to date with XFS directly on the RAID.

mkfs.xfs directly on the RAID gives:

meta-data=/dev/md8               isize=256    agcount=32, agsize=106814656 blks
         =                       sectsz=4096  attr=2
data     =                       bsize=4096   blocks=3418068864, imaxpct=5
         =                       sunit=64     swidth=896 blks
naming   =version 2              bsize=4096   ascii-ci=0

which gives the correct sunit and swidth values for the array.
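
(For the record, that matches the arithmetic: sunit = 256k chunk / 4k 
block = 64 blocks, and swidth = 64 x 14 data drives (16 drives minus 2 
parity) = 896 blocks.)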

Creating an lv which uses the entire array and mkfs.xfs on that, gives:

meta-data=/dev/vg_local/Storage  isize=256    agcount=13, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=3418067968, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0

Limited testing using dd and bonnie++ shows no difference in write 
performance whether I use sunit=64/swidth=896 or sunit=0/swidth=0 on the lv.

My gut reaction is that I should be using 64/896 but maybe mkfs.xfs 
knows better?
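
If I do end up forcing it, I assume the invocation would be something 
along these lines (geometry taken from the array above - 256k chunk, 
14 data drives):

mkfs.xfs -d su=256k,sw=14 /dev/vg_local/Storage

which should be equivalent to sunit=64/swidth=896 in 4k blocks.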

Regards,

Richard


* Re: XFS over LVM over md RAID
  2010-09-09 22:58 XFS over LVM over md RAID Richard Scobie
@ 2010-09-10  0:25 ` Michael Monnerie
  2010-09-10  0:52   ` Richard Scobie
  2010-09-10  1:14   ` Richard Scobie
  2010-09-10  1:30 ` Dave Chinner
  1 sibling, 2 replies; 10+ messages in thread
From: Michael Monnerie @ 2010-09-10  0:25 UTC (permalink / raw)
  To: xfs; +Cc: Richard Scobie



On Friday, 10 September 2010, Richard Scobie wrote:
> Limited testing using dd and bonnie++ shows no difference in write 
> performance
 
For dd it should always look the same, as you are just writing one 
large file sequentially. Only with bonnie would you see differences due 
to the stripe geometry, as the speed only drops when doing I/O that is 
not on stripe boundaries and/or is smaller than the stripe size - the 
stripe size being effectively the smallest possible I/O for the RAID.

I don't know why you don't see any difference with bonnie though.

FWIW, a 256k chunk means each I/O reads or writes 256k from a single 
drive, then the next 256k from the next drive. I hope you have very few 
small accesses and mostly very large files - if you ran a database on 
that system it would crawl...
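
To put rough numbers on it for your array (assuming 16 drives in RAID6 
with a 256k chunk, i.e. 14 data drives):

  per-drive chunk    = 256 KiB
  full stripe width  = 14 x 256 KiB = 3.5 MiB

so any I/O much smaller than 3.5 MiB, or not aligned to it, only 
touches part of a stripe.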

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/


* Re: XFS over LVM over md RAID
  2010-09-10  0:25 ` Michael Monnerie
@ 2010-09-10  0:52   ` Richard Scobie
  2010-09-10  1:14   ` Richard Scobie
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-10  0:52 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

Michael Monnerie wrote:

> FWIW, a 256k chunk means each I/O reads or writes 256k from a single
> drive, then the next 256k from the next drive. I hope you have very few
> small accesses and mostly very large files - if you ran a database on
> that system it would crawl...

Mostly > 2MB and multi GB.

Regards,

Richard


* Re: XFS over LVM over md RAID
  2010-09-10  0:25 ` Michael Monnerie
  2010-09-10  0:52   ` Richard Scobie
@ 2010-09-10  1:14   ` Richard Scobie
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-10  1:14 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

Michael Monnerie wrote:

> I don't know why you don't see any difference with bonnie though.

Re-ran the bonnie tests using 1.2MB files instead of the 12MB files I 
used initially, and there is a substantial difference - sorry for the noise.

Regards,

Richard


* Re: XFS over LVM over md RAID
  2010-09-09 22:58 XFS over LVM over md RAID Richard Scobie
  2010-09-10  0:25 ` Michael Monnerie
@ 2010-09-10  1:30 ` Dave Chinner
  2010-09-10  2:29   ` Richard Scobie
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2010-09-10  1:30 UTC (permalink / raw)
  To: Richard Scobie; +Cc: xfs

On Fri, Sep 10, 2010 at 10:58:22AM +1200, Richard Scobie wrote:
> Using the latest, stable versions of LVM2 and xfsprogs and the
> 2.6.35.4 kernel, I am setting up lvm on a 16 drive, 256k chunk md
> RAID6, which has been used to date with XFS directly on the RAID.
> 
> mkfs.xfs directly on the RAID gives:
> 
> meta-data=/dev/md8               isize=256    agcount=32, agsize=106814656 blks
>          =                       sectsz=4096  attr=2
> data     =                       bsize=4096   blocks=3418068864, imaxpct=5
>          =                       sunit=64     swidth=896 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> 
> which gives the correct sunit and swidth values for the array.
> 
> Creating an lv which uses the entire array and mkfs.xfs on that, gives:
> 
> meta-data=/dev/vg_local/Storage  isize=256    agcount=13, agsize=268435455 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=3418067968, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0

Hmmm - it's treating MD very differently to the LVM volume -
different numbers of AGs, different sunit/swidth. Did you
build xfsprogs yourself? Is it linked against libblkid or libdisk?

Or it might be that LVM is not exporting the characteristics of the
underlying volume. Can you check if there are different parameter
values exported by the two devices in /sys/block/<dev>/queue?
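
Something along these lines should show it (device names assumed -
substitute the actual dm device or LV path for dm-0):

  grep . /sys/block/md8/queue/{minimum_io_size,optimal_io_size}
  grep . /sys/block/dm-0/queue/{minimum_io_size,optimal_io_size}

and for the libblkid/libdisk question, something like:

  ldd $(which mkfs.xfs) | grep blkid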

> Limited testing using dd and bonnie++ shows no difference in write
> performance whether I use sunit=64/swidth=896 or sunit=0/swidth=0 on
> the lv.

These benchmarks won't really show any difference on an empty
filesystem. It will have an impact on how the filesystems age and
how well aligned the IO will be to the underlying device under more
complex workloads...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS over LVM over md RAID
  2010-09-10  1:30 ` Dave Chinner
@ 2010-09-10  2:29   ` Richard Scobie
  2010-09-10 14:24     ` Eric Sandeen
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Scobie @ 2010-09-10  2:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

Dave Chinner wrote:
> On Fri, Sep 10, 2010 at 10:58:22AM +1200, Richard Scobie wrote:
>> Using the latest, stable versions of LVM2 and xfsprogs and the
>> 2.6.35.4 kernel, I am setting up lvm on a 16 drive, 256k chunk md
>> RAID6, which has been used to date with XFS directly on the RAID.
>>
>> mkfs.xfs directly on the RAID gives:
>>
>> meta-data=/dev/md8               isize=256    agcount=32, agsize=106814656 blks
>>          =                       sectsz=4096  attr=2
>> data     =                       bsize=4096   blocks=3418068864, imaxpct=5
>>          =                       sunit=64     swidth=896 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>>
>> which gives the correct sunit and swidth values for the array.
>>
>> Creating an lv which uses the entire array and mkfs.xfs on that, gives:
>>
>> meta-data=/dev/vg_local/Storage  isize=256    agcount=13, agsize=268435455 blks
>>          =                       sectsz=512   attr=2
>> data     =                       bsize=4096   blocks=3418067968, imaxpct=5
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>
> Hmmm - it's treating MD very differently to the LVM volume -
> different numbers of AGs, different sunit/swidth. Did you
> build xfsprogs yourself? Is it linked against libblkid or libdisk?

I should clarify - the first set was created with xfsprogs 3.0.0 and the 
second was done with xfsprogs 3.1.3, so I wondered if the default AG 
count had changed.

Given a 12TB array, would I be better off using 32 AGs?

I did build 3.1.3. Is libblkid preferable?  I note it is off by default 
in configure and I used the default configuration.

I appear to have the e2fsprogs and devel packages installed (Fedora 11) 
for libblkid, but when I enable it and try to build, it fails:

     [CC]     xfs_mkfs.o
xfs_mkfs.c: In function ‘check_overwrite’:
xfs_mkfs.c:298: error: ‘blkid_probe’ undeclared (first use in this function)
xfs_mkfs.c:298: error: (Each undeclared identifier is reported only once
xfs_mkfs.c:298: error: for each function it appears in.)
xfs_mkfs.c:298: error: expected ‘;’ before ‘pr’
xfs_mkfs.c:321: error: ‘pr’ undeclared (first use in this function)
xfs_mkfs.c:321: warning: implicit declaration of function ‘blkid_new_probe_from_filename’
xfs_mkfs.c:325: warning: implicit declaration of function ‘blkid_probe_enable_partitions’
xfs_mkfs.c:329: warning: implicit declaration of function ‘blkid_do_fullprobe’
xfs_mkfs.c:345: warning: implicit declaration of function ‘blkid_probe_lookup_value’
xfs_mkfs.c:362: warning: implicit declaration of function ‘blkid_free_probe’
xfs_mkfs.c: In function ‘blkid_get_topology’:
xfs_mkfs.c:372: error: ‘blkid_topology’ undeclared (first use in this function)
xfs_mkfs.c:372: error: expected ‘;’ before ‘tp’
xfs_mkfs.c:373: error: ‘blkid_probe’ undeclared (first use in this function)
xfs_mkfs.c:373: error: expected ‘;’ before ‘pr’
xfs_mkfs.c:381: error: ‘pr’ undeclared (first use in this function)
xfs_mkfs.c:385: error: ‘tp’ undeclared (first use in this function)
xfs_mkfs.c:385: warning: implicit declaration of function ‘blkid_probe_get_topology’
xfs_mkfs.c:397: warning: implicit declaration of function ‘blkid_topology_get_minimum_io_size’
xfs_mkfs.c:400: warning: implicit declaration of function ‘blkid_topology_get_optimal_io_size’
xfs_mkfs.c:403: warning: implicit declaration of function ‘blkid_probe_get_sectorsize’
xfs_mkfs.c:406: warning: implicit declaration of function ‘blkid_topology_get_alignment_offset’
gmake[2]: *** [xfs_mkfs.o] Error 1
gmake[1]: *** [mkfs] Error 2
make: *** [default] Error 2


> Or it might be that LVM is not exporting the characteristics of the
> underlying volume. Can you check if there are different parameter
> values exported by the two devices in /sys/block/<dev>/queue?

They look the same.

>> Limited testing using dd and bonnie++ shows no difference in write
>> performance whether I use sunit=64/swidth=896 or sunit=0/swidth=0 on
>> the lv.
>
> These benchmarks won't really show any difference on an empty
> filesystem. It will have an impact on how the filesystems age and
> how well aligned the IO will be to the underlying device under more
> complex workloads...

I figured I'd go with the geometry specified.

Regards,

Richard


* Re: XFS over LVM over md RAID
  2010-09-10  2:29   ` Richard Scobie
@ 2010-09-10 14:24     ` Eric Sandeen
  2010-09-10 21:42       ` Richard Scobie
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2010-09-10 14:24 UTC (permalink / raw)
  To: Richard Scobie; +Cc: xfs

Richard Scobie wrote:

> I did build 3.1.3. Is libblkid preferable?  I note it is off by default
> in configure and I used the default configuration.
> 
> I appear to have the e2fsprogs and devel packages installed (Fedora 11)
> for libblkid, but when I enable it and try to build, it fails:
> 
>     [CC]     xfs_mkfs.o
> xfs_mkfs.c: In function ‘check_overwrite’:
> xfs_mkfs.c:298: error: ‘blkid_probe’ undeclared (first use in this function)

yup F11 doesn't have all the nice blkid topology bits; F11 still has
libblkid in e2fsprogs, from before it was liberated to util-linux-ng.

(I wonder if we shouldn't just convert xfsprogs over to use the ioctls...)
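
(A quick way to check what the installed libblkid provides, before
flipping the configure switch - header location assumed:

  grep -c blkid_probe_get_topology /usr/include/blkid/blkid.h

A count of 0 means no topology API, which is what I'd expect on F11.)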

-Eric


* Re: XFS over LVM over md RAID
  2010-09-10 14:24     ` Eric Sandeen
@ 2010-09-10 21:42       ` Richard Scobie
  2010-09-10 22:19         ` Stan Hoeppner
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Scobie @ 2010-09-10 21:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric Sandeen wrote:

> yup F11 doesn't have all the nice blkid topology bits; F11 still has
> libblkid in e2fsprogs, from before it was liberated to util-linux-ng.
>
> (I wonder if we shouldn't just convert xfsprogs over to use the ioctls...)

Thanks to you both for your help. For various reasons, I'll just rely on 
using the non-topology-aware version.

In the future this LV will be grown in multiples of 256k-chunk, 
16-drive RAID6 arrays, so am I correct in thinking that the sunit/swidth 
parameters can stay the same as it is expanded?
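
The growth path I have in mind is roughly the following (the new md 
device name and mount point are just placeholders):

  pvcreate /dev/md9
  vgextend vg_local /dev/md9
  lvextend -l +100%FREE /dev/vg_local/Storage
  xfs_growfs /mnt/storage

leaving sunit/swidth as they are, since the new array would have the 
same geometry.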

I am thinking that XFS allocates AGs across all the space and that it 
will only be writing to any one array at a time - or would more AGs be 
configured if mkfs.xfs were aware that it was being created on, say, 
2 x 16-drive RAID6 arrays?

This is based on backing up the first array prior to expansion, adding 
the second and re-running mkfs.xfs, as I imagine just expanding the fs 
over the second would result in non-optimal performance.

Regards,

Richard


* Re: XFS over LVM over md RAID
  2010-09-10 21:42       ` Richard Scobie
@ 2010-09-10 22:19         ` Stan Hoeppner
  0 siblings, 0 replies; 10+ messages in thread
From: Stan Hoeppner @ 2010-09-10 22:19 UTC (permalink / raw)
  To: xfs

Richard Scobie put forth on 9/10/2010 4:42 PM:

> In the future this LV will be grown in multiples of 256k-chunk,
> 16-drive RAID6 arrays, so am I correct in thinking that the sunit/swidth
> parameters can stay the same as it is expanded?

What is the reasoning behind adding so many terabytes under a single
filesystem?

Do you _need_ all of it under a single mount point?  If not, or even if
you do, for many reasons, it may very well be better to put a single
filesystem directly on each RAID6 array without using LVM in the middle
and simply mount each filesystem at a different point, say:

/data
/data/array1
/data/array2
/data/array3
/data/array4
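
In fstab terms something like this, one line per array (device names 
are placeholders):

  /dev/md1   /data/array1   xfs   defaults   0 0
  /dev/md2   /data/array2   xfs   defaults   0 0
  /dev/md3   /data/array3   xfs   defaults   0 0
  /dev/md4   /data/array4   xfs   defaults   0 0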

This method can minimize damage and downtime when an entire array is
knocked offline.  We just had a post yesterday where a SATA cable was
kicked loose and took down 5 drives of a 15-drive md RAID6 set, killing
the entire filesystem.  If that OP had set up 3 x 5-drive arrays with 3
filesystems, the system could have continued to run in a degraded
fashion, depending on his application data layout across the
filesystems.  If done properly, you lose an app or two, not all of them.

This method also eliminates xfs_growfs performance issues such as the
one you're describing, because you're never changing the filesystem
layout when adding new arrays to the system.

In summary, every layer of complexity added to the storage stack
increases the probability of failure.  As my grandmother was fond of
saying, "Don't put all of your eggs in one basket."  It was salient
advice on the farm 80 years ago, and it's salient advice today with high
technology.

-- 
Stan


* Re: XFS over LVM over md RAID
@ 2010-09-10 23:08 Richard Scobie
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-10 23:08 UTC (permalink / raw)
  To: xfs

Stan Hoeppner wrote:

> What is the reasoning behind adding so many terabytes under a single
> filesystem?

Heavily scripted project environments, where initial storage estimates 
are exceeded and more needs to be added without the complications of 
managing separate filesystems partway through.

It is unlikely that more than 2 arrays would be involved, and I used the 
example to try to understand how XFS adapts to changing topologies.

Regards,

Richard


