* XFS over LVM over md RAID
@ 2010-09-09 22:58 Richard Scobie
2010-09-10 0:25 ` Michael Monnerie
2010-09-10 1:30 ` Dave Chinner
0 siblings, 2 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-09 22:58 UTC (permalink / raw)
To: xfs
Using the latest, stable versions of LVM2 and xfsprogs and the 2.6.35.4
kernel, I am setting up lvm on a 16 drive, 256k chunk md RAID6, which
has been used to date with XFS directly on the RAID.
mkfs.xfs directly on the RAID gives:
meta-data=/dev/md8               isize=256    agcount=32, agsize=106814656 blks
         =                       sectsz=4096  attr=2
data     =                       bsize=4096   blocks=3418068864, imaxpct=5
         =                       sunit=64     swidth=896 blks
naming   =version 2              bsize=4096   ascii-ci=0
which gives the correct sunit and swidth values for the array.
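For reference, those two numbers fall straight out of the array
geometry; a quick sketch of the arithmetic (assuming the 4k XFS block
size shown above and 16 - 2 = 14 data disks for RAID6):

```shell
# sunit/swidth as mkfs.xfs reports them, in 4 KiB filesystem blocks.
chunk_kib=256      # md chunk size per drive
block_kib=4        # XFS block size (bsize=4096)
drives=16
parity=2           # RAID6 keeps two parity chunks per stripe

sunit=$(( chunk_kib / block_kib ))         # blocks per chunk
swidth=$(( sunit * (drives - parity) ))    # blocks per full data stripe
echo "sunit=$sunit swidth=$swidth"         # sunit=64 swidth=896
```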
Creating an lv which uses the entire array and mkfs.xfs on that, gives:
meta-data=/dev/vg_local/Storage  isize=256    agcount=13, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=3418067968, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
Limited testing using dd and bonnie++ shows no difference in write
performance whether I use sunit=64/swidth=896 or sunit=0/swidth=0 on the lv.
My gut reaction is that I should be using 64/896 but maybe mkfs.xfs
knows better?
Regards,
Richard
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS over LVM over md RAID
2010-09-09 22:58 XFS over LVM over md RAID Richard Scobie
@ 2010-09-10 0:25 ` Michael Monnerie
2010-09-10 0:52 ` Richard Scobie
2010-09-10 1:14 ` Richard Scobie
2010-09-10 1:30 ` Dave Chinner
1 sibling, 2 replies; 10+ messages in thread
From: Michael Monnerie @ 2010-09-10 0:25 UTC (permalink / raw)
To: xfs; +Cc: Richard Scobie
On Friday, 10 September 2010, Richard Scobie wrote:
> Limited testing using dd and bonnie++ shows no difference in write
> performance
For dd it should always show the same, as you are just sequentially
writing one large file. Only with bonnie would you see differences due
to the stripe set: speed drops when I/O is not stripe-aligned and/or
smaller than the stripe size, since a full stripe is the smallest
write a parity RAID can do without a read-modify-write cycle.
I don't know why you don't see any difference with bonnie, though.
FWIW, a 256k chunk means each I/O reads/writes 256k from a single
drive, then the next 256k from the next drive. I hope you have very
few small accesses and mostly very large files. If you ran a database
on that system it would crawl...
--
with kind regards,
Michael Monnerie, Ing. BSc
it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31
****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html
// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
* Re: XFS over LVM over md RAID
2010-09-10 0:25 ` Michael Monnerie
@ 2010-09-10 0:52 ` Richard Scobie
2010-09-10 1:14 ` Richard Scobie
1 sibling, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-10 0:52 UTC (permalink / raw)
To: Michael Monnerie; +Cc: xfs
Michael Monnerie wrote:
> FWIW, a stripe set of 256k means you do read/write 256k from a single
> drive on each I/O, then the next 256k from the next drive. I hope you
> have very few small accesses and mostly very large files. If you'd use a
> database on that system it would crawl...
Mostly > 2MB and multi GB.
Regards,
Richard
* Re: XFS over LVM over md RAID
2010-09-10 0:25 ` Michael Monnerie
2010-09-10 0:52 ` Richard Scobie
@ 2010-09-10 1:14 ` Richard Scobie
1 sibling, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-10 1:14 UTC (permalink / raw)
To: Michael Monnerie; +Cc: xfs
Michael Monnerie wrote:
> I don't know why you don't see any difference with bonnie though.
Re-ran the bonnie tests using 1.2MB files instead of the 12MB files I
used initially and there is a substantial difference - sorry for the noise.
Regards,
Richard
* Re: XFS over LVM over md RAID
2010-09-09 22:58 XFS over LVM over md RAID Richard Scobie
2010-09-10 0:25 ` Michael Monnerie
@ 2010-09-10 1:30 ` Dave Chinner
2010-09-10 2:29 ` Richard Scobie
1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2010-09-10 1:30 UTC (permalink / raw)
To: Richard Scobie; +Cc: xfs
On Fri, Sep 10, 2010 at 10:58:22AM +1200, Richard Scobie wrote:
> Using the latest, stable versions of LVM2 and xfsprogs and the
> 2.6.35.4 kernel, I am setting up lvm on a 16 drive, 256k chunk md
> RAID6, which has been used to date with XFS directly on the RAID.
>
> mkfs.xfs directly on the RAID gives:
>
> meta-data=/dev/md8 isize=256 agcount=32,
> agsize=106814656 blks
> = sectsz=4096 attr=2
> data = bsize=4096 blocks=3418068864, imaxpct=5
> = sunit=64 swidth=896 blks
> naming =version 2 bsize=4096 ascii-ci=0
>
> which gives the correct sunit and swidth values for the array.
>
> Creating an lv which uses the entire array and mkfs.xfs on that, gives:
>
> meta-data=/dev/vg_local/Storage isize=256 agcount=13,
> agsize=268435455 blks
> = sectsz=512 attr=2
> data = bsize=4096 blocks=3418067968, imaxpct=5
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
Hmmm - it's treating MD very differently to the LVM volume -
different numbers of AGs, different sunit/swidth. Did you
build xfsprogs yourself? Is it linked against libblkid or libdisk?
Or it might be that LVM is not exporting the characteristics of the
underlying volume. Can you check whether there are different parameter
values exported by the two devices in /sys/block/<dev>/queue?
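One way to eyeball that (a sketch: md8 is the device from this
thread, but the dm-0 node for the LV is an assumption - `ls -l
/dev/vg_local/Storage` will reveal the real dm minor):

```shell
# Dump the I/O topology hints each device exports, side by side.
for dev in md8 dm-0; do
  echo "== /sys/block/$dev/queue =="
  for p in minimum_io_size optimal_io_size logical_block_size; do
    printf '%-18s %s\n' "$p" \
      "$(cat "/sys/block/$dev/queue/$p" 2>/dev/null || echo n/a)"
  done
done
```

On an md RAID6 with a 256k chunk, minimum_io_size should read 262144
and optimal_io_size 3670016 (14 x 256k); if the LV's dm node reports
512/0 instead, LVM is not passing the topology through.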
> Limited testing using dd and bonnie++ shows no difference in write
> performance whether I use sunit=64/swidth=896 or sunit=0/swidth=0 on
> the lv.
These benchmarks won't really show any difference on an empty
filesystem. It will have an impact on how the filesystems age and
how well aligned the IO will be to the underlying device under more
complex workloads...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: XFS over LVM over md RAID
2010-09-10 1:30 ` Dave Chinner
@ 2010-09-10 2:29 ` Richard Scobie
2010-09-10 14:24 ` Eric Sandeen
0 siblings, 1 reply; 10+ messages in thread
From: Richard Scobie @ 2010-09-10 2:29 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi Dave,
Dave Chinner wrote:
> On Fri, Sep 10, 2010 at 10:58:22AM +1200, Richard Scobie wrote:
>> Using the latest, stable versions of LVM2 and xfsprogs and the
>> 2.6.35.4 kernel, I am setting up lvm on a 16 drive, 256k chunk md
>> RAID6, which has been used to date with XFS directly on the RAID.
>>
>> mkfs.xfs directly on the RAID gives:
>>
>> meta-data=/dev/md8 isize=256 agcount=32,
>> agsize=106814656 blks
>> = sectsz=4096 attr=2
>> data = bsize=4096 blocks=3418068864, imaxpct=5
>> = sunit=64 swidth=896 blks
>> naming =version 2 bsize=4096 ascii-ci=0
>>
>> which gives the correct sunit and swidth values for the array.
>>
>> Creating an lv which uses the entire array and mkfs.xfs on that, gives:
>>
>> meta-data=/dev/vg_local/Storage isize=256 agcount=13,
>> agsize=268435455 blks
>> = sectsz=512 attr=2
>> data = bsize=4096 blocks=3418067968, imaxpct=5
>> = sunit=0 swidth=0 blks
>> naming =version 2 bsize=4096 ascii-ci=0
>
> Hmmm - it's treating MD very differently to the LVM volume -
> different numbers of AGs, different sunit/swidth. Did you
> build xfsprogs yourself? Is it linked against libblkid or libdisk?
I should clarify - the first set was created with xfsprogs 3.0.0 and
second was done with xfsprogs 3.1.3, so I wondered if the default ag
count had changed.
Given a 12TB array, would I be better using 32?
I did build 3.1.3. Is libblkid preferable? I note it is defaulted off
in configure and I used the default configuration.
I appear to have e2fsprogs and devel packages installed (Fedora 11)
for libblkid, but when I enable it and try to build, it fails:
[CC]    xfs_mkfs.o
xfs_mkfs.c: In function ‘check_overwrite’:
xfs_mkfs.c:298: error: ‘blkid_probe’ undeclared (first use in this function)
xfs_mkfs.c:298: error: (Each undeclared identifier is reported only once
xfs_mkfs.c:298: error: for each function it appears in.)
xfs_mkfs.c:298: error: expected ‘;’ before ‘pr’
xfs_mkfs.c:321: error: ‘pr’ undeclared (first use in this function)
xfs_mkfs.c:321: warning: implicit declaration of function ‘blkid_new_probe_from_filename’
xfs_mkfs.c:325: warning: implicit declaration of function ‘blkid_probe_enable_partitions’
xfs_mkfs.c:329: warning: implicit declaration of function ‘blkid_do_fullprobe’
xfs_mkfs.c:345: warning: implicit declaration of function ‘blkid_probe_lookup_value’
xfs_mkfs.c:362: warning: implicit declaration of function ‘blkid_free_probe’
xfs_mkfs.c: In function ‘blkid_get_topology’:
xfs_mkfs.c:372: error: ‘blkid_topology’ undeclared (first use in this function)
xfs_mkfs.c:372: error: expected ‘;’ before ‘tp’
xfs_mkfs.c:373: error: ‘blkid_probe’ undeclared (first use in this function)
xfs_mkfs.c:373: error: expected ‘;’ before ‘pr’
xfs_mkfs.c:381: error: ‘pr’ undeclared (first use in this function)
xfs_mkfs.c:385: error: ‘tp’ undeclared (first use in this function)
xfs_mkfs.c:385: warning: implicit declaration of function ‘blkid_probe_get_topology’
xfs_mkfs.c:397: warning: implicit declaration of function ‘blkid_topology_get_minimum_io_size’
xfs_mkfs.c:400: warning: implicit declaration of function ‘blkid_topology_get_optimal_io_size’
xfs_mkfs.c:403: warning: implicit declaration of function ‘blkid_probe_get_sectorsize’
xfs_mkfs.c:406: warning: implicit declaration of function ‘blkid_topology_get_alignment_offset’
gmake[2]: *** [xfs_mkfs.o] Error 1
gmake[1]: *** [mkfs] Error 2
make: *** [default] Error 2
> Or it might be that LVM is not exporting the characteristic of the
> underlying volume. Can you check if there are different parameter
> values exported by the two devices in /sys/block/<dev>/queue?
They look the same.
>> Limited testing using dd and bonnie++ shows no difference in write
>> performance whether I use sunit=64/swidth=896 or sunit=0/swidth=0 on
>> the lv.
>
> These benchmarks won't really show any difference on an empty
> filesystem. It will have an impact on how the filesystems age and
> how well aligned the IO will be to the underlying device under more
> complex workloads...
I figured I'd go with the geometry specified.
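For anyone following along, the geometry can also be handed to
mkfs.xfs explicitly so it doesn't matter what the LV probes as - a
sketch (commands echoed rather than run, since mkfs.xfs destroys the
target; su/sw reflect the 256k chunk and 14 data disks from this
thread):

```shell
# Two equivalent spellings of the geometry: su/sw take a size plus a
# count of data disks; sunit/swidth are in 512-byte sectors
# (256 KiB = 512 sectors, 512 * 14 = 7168).  Echoed, not executed.
echo mkfs.xfs -d su=256k,sw=14 /dev/vg_local/Storage
echo mkfs.xfs -d sunit=512,swidth=7168 /dev/vg_local/Storage
```

Note the units differ from the mkfs.xfs output above, which reports
sunit/swidth in 4k filesystem blocks (64/896).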
Regards,
Richard
* Re: XFS over LVM over md RAID
2010-09-10 2:29 ` Richard Scobie
@ 2010-09-10 14:24 ` Eric Sandeen
2010-09-10 21:42 ` Richard Scobie
0 siblings, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2010-09-10 14:24 UTC (permalink / raw)
To: Richard Scobie; +Cc: xfs
Richard Scobie wrote:
> I did build 3.1.3. Is libblkid preferable? I note it is defaulted off
> in configure and I used the default configuration.
>
> I appear to have e2fsprogs and devel packages installed (Fedora 11), for
> libblkid, but when I enable and try to build, it fails:
>
> [CC] xfs_mkfs.o
> xfs_mkfs.c: In function ‘check_overwrite’:
> xfs_mkfs.c:298: error: ‘blkid_probe’ undeclared (first use in this
> function)
yup F11 doesn't have all the nice blkid topology bits; F11 still has
libblkid in e2fsprogs, from before it was liberated to util-linux-ng.
(I wonder if we shouldn't just convert xfsprogs over to use the ioctls...)
-Eric
* Re: XFS over LVM over md RAID
2010-09-10 14:24 ` Eric Sandeen
@ 2010-09-10 21:42 ` Richard Scobie
2010-09-10 22:19 ` Stan Hoeppner
0 siblings, 1 reply; 10+ messages in thread
From: Richard Scobie @ 2010-09-10 21:42 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
Eric Sandeen wrote:
> yup F11 doesn't have all the nice blkid topology bits; F11 still has
> libblkid in e2fsprogs, from before it was liberated to util-linux-ng.
>
> (I wonder if we shouldn't just convert xfsprogs over to use the ioctls...)
Thanks both for your help. For various reasons, I'll just rely on using
the non-topology aware version.
In the future this lv will be grown in multiples of 256k chunk, 16
drive RAID6 arrays, so am I correct in thinking that the sunit/swidth
parameters can stay the same as it is expanded?
I am thinking that XFS allocates AGs across all the space and will
only be writing to any one array at a time - or would more AGs be
configured if mkfs.xfs were aware that it was being created on, say,
2 x 16 drive RAID6 arrays?
This is based on backing up the first array prior to expansion, adding
the second and re-running mkfs.xfs, as I imagine just growing the fs
over the second would result in non-optimal performance.
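The growth path being weighed up looks roughly like this (a sketch,
commands echoed rather than run; /dev/md9 and the mount point are
hypothetical stand-ins for the second array):

```shell
# Concatenate a second, identically laid out RAID6 onto the LV and
# grow XFS over it.  Echoed, not executed.
echo pvcreate /dev/md9
echo vgextend vg_local /dev/md9
echo lvextend -l +100%FREE /dev/vg_local/Storage
echo xfs_growfs /mnt/storage
```

Because the LV is concatenated rather than striped across the two
arrays, each allocation still lands on a single 14-data-disk stripe,
so sunit=64/swidth=896 remains the right geometry after the grow.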
Regards,
Richard
* Re: XFS over LVM over md RAID
2010-09-10 21:42 ` Richard Scobie
@ 2010-09-10 22:19 ` Stan Hoeppner
0 siblings, 0 replies; 10+ messages in thread
From: Stan Hoeppner @ 2010-09-10 22:19 UTC (permalink / raw)
To: xfs
Richard Scobie put forth on 9/10/2010 4:42 PM:
> In the future this lv will be grown in multiples of 256K chunk, 16
> drive RAID6 arrays, so am I correct in thinking that the sunit/swidth
> parameter can stay the same as it is expanded?
What is the reasoning behind adding so many terabytes under a single
filesystem?
Do you _need_ all of it under a single mount point? If not, or even if
you do, for many reasons, it may very well be better to put a single
filesystem directly on each RAID6 array without using LVM in the middle
and simply mount each filesystem at a different point, say:
/data
/data/array1
/data/array2
/data/array3
/data/array4
This method can minimize damage and downtime when an entire array is
knocked offline. We just had a post yesterday where a SATA cable was
kicked loose and took down 5 of the 15 drives in an md RAID6 set,
killing the entire filesystem. If that OP had set up 3 x 5-drive
arrays with 3 filesystems, the system could have continued to run in a
degraded fashion, depending on his application data layout across the
filesystems. If done properly, you lose an app or two, not all of them.
This method also eliminates xfs_growfs performance issues such as the
ones you're describing, because you never change the filesystem layout
when adding new arrays to the system.
In summary, every layer of complexity added to the storage stack
increases the probability of failure. As my grandmother was fond of
saying, "Don't put all of your eggs in one basket." It was salient
advice on the farm 80 years ago, and it's salient advice today with high
technology.
--
Stan
* Re: XFS over LVM over md RAID
@ 2010-09-10 23:08 Richard Scobie
0 siblings, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2010-09-10 23:08 UTC (permalink / raw)
To: xfs
Stan Hoeppner wrote:
> What is the reasoning behind adding so many terabytes under a single
filesystem?
Heavily scripted project environments, where initial storage estimates
are exceeded and more needs to be added without the complications of
managing separate filesystems part way through.
It is unlikely that more than 2 arrays would be involved and I used the
example to try and understand how XFS adapts to changing topologies.
Regards,
Richard