All of lore.kernel.org
 help / color / mirror / Atom feed
* Optimal XFS formatting options?
@ 2012-01-14 17:44 MikeJeezy
  2012-01-14 22:23 ` Stan Hoeppner
  2012-01-15  1:14 ` Peter Grandi
  0 siblings, 2 replies; 17+ messages in thread
From: MikeJeezy @ 2012-01-14 17:44 UTC (permalink / raw)
  To: xfs


Hi, I have a 4.9 TB iSCSI LUN on a RAID 6 array with twelve 2 TB SATA disks
(4.9T is only one of the logical volumes). It will contain several million
files of various sizes, but 80% of them will be less than 50 MB.  I'm a
novice at best and I usually just use the default #mkfs.xfs /dev/sdx1

This server will be write-heavy for about 8 hours a night, but every
morning there are many reads from the disk.  There is rarely a time when it
will be write-heavy and read-heavy at the same time.  Are there other XFS
format options that I could use to optimize performance?

Any input is greatly appreciated. Thank you.
-- 
View this message in context: http://old.nabble.com/Optimal-XFS-formatting-options--tp33140169p33140169.html
Sent from the Xfs - General mailing list archive at Nabble.com.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Optimal XFS formatting options?
  2012-01-14 17:44 Optimal XFS formatting options? MikeJeezy
@ 2012-01-14 22:23 ` Stan Hoeppner
  2012-01-16  0:27   ` MikeJeezy
  2012-01-15  1:14 ` Peter Grandi
  1 sibling, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2012-01-14 22:23 UTC (permalink / raw)
  To: xfs

On 1/14/2012 11:44 AM, MikeJeezy wrote:
> 
> Hi, I have a 4.9 TB iSCSI LUN on a RAID 6 array with twelve 2 TB SATA disks
> (4.9T is only one of the logical volumes). It will contain several million
> files of various sizes, but 80% of them will be less than 50 MB.  I'm a
> novice at best and I usually just use the default #mkfs.xfs /dev/sdx1
> 
> This server will be write-heavy for about 8 hours a night, but every
> morning there are many reads from the disk.  There is rarely a time when it
> will be write-heavy and read-heavy at the same time.  Are there other XFS
> format options that I could use to optimize performance?

    sunit=value

This is used to specify the stripe unit for a RAID device or a logical
volume. The value has to be specified in 512-byte block units. Use the
su suboption to specify the stripe unit size in bytes. This suboption
ensures that data allocations will be stripe unit aligned when the
current end of file is being extended and the file size is larger than
512KiB. Also inode allocations and the internal log will be stripe unit
aligned.

    su=value

This is an alternative to using sunit. The su suboption is used to
specify the stripe unit for a RAID device or a striped logical volume.
The value has to be specified in bytes, (usually using the m or g
suffixes). This value must be a multiple of the filesystem block size.

    swidth=value

This is used to specify the stripe width for a RAID device or a striped
logical volume. The value has to be specified in 512-byte block units.
Use the sw suboption to specify the stripe width size in bytes. This
suboption is required if -d sunit has been specified and it has to be a
multiple of the -d sunit suboption.

    sw=value

This is an alternative to using swidth. The sw suboption is used to
specify the stripe width for a RAID device or striped logical volume.
The value is expressed as a multiplier of the stripe unit, usually the
same as the number of stripe members in the logical volume
configuration, or data disks in a RAID device.


Using su and sw is often easier because fewer unit conversions are needed.
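As a quick arithmetic sketch, the two forms can be converted as below. The 64KiB chunk and 10 data spindles are the illustrative values used elsewhere in this thread, not measured values:

```shell
# Illustrative conversion between the byte-based (su/sw) and the
# 512-byte-unit (sunit/swidth) forms of the same RAID geometry.
su_bytes=65536                  # 64KiB chunk, an assumed example value
data_disks=10                   # 12-drive RAID6 -> 10 data spindles
sunit=$((su_bytes / 512))       # sunit is expressed in 512-byte units
swidth=$((sunit * data_disks))  # swidth must be a multiple of sunit
echo "sunit=$sunit swidth=$swidth"
# The two mkfs.xfs invocations below should be equivalent (not run here):
#   mkfs.xfs -d su=64k,sw=10 /dev/sdX
#   mkfs.xfs -d sunit=128,swidth=1280 /dev/sdX
```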

With a 12 drive RAID6 array your stripe width, or sw, is 10.  You will
need to consult the array controller admin interface and documentation
to discover the su value if you don't already know it.  Different
vendors call this parameter by different names.  It could be "chunk
size" or "strip size" or other.  Some/many vendors don't specify this
value at all, giving you only static pre-defined total stripe size
options for the array, such as 64KB, 128KB, 1MB, etc, only in power of 2
values.  In this case if you have 64KB stripe size and divide by 10
drives in the stripe you end up with a non filesystem block size
multiple:  6553.6 bytes.  This presents serious problems for alignment.
In this case you must dig deep to find out exactly how your vendor
controller handles this situation when your effective RAID spindle count
is not a power of 2.
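A two-line sketch of the division above (the values are this thread's example, not measurements):

```shell
# A 64KiB fixed *total* stripe split across 10 data spindles does not
# give a whole number of bytes per drive, let alone a multiple of the
# 4KiB filesystem block size.
stripe_bytes=65536   # vendor-fixed total stripe size (illustrative)
data_disks=10        # 12-drive RAID6 -> 10 data spindles
echo "per-drive strip = $((stripe_bytes / data_disks)) bytes (6553.6 truncated)"
echo "remainder       = $((stripe_bytes % data_disks)) bytes"
```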

So let's assume your vendor does the smart thing and allows you
flexibility in specifying per drive strip size.  Assume for example the
stripe unit (strip, chunk) of the array is 64KB, there are 10 stripe
spindles (12-2=10), and the local device name of the LUN is /dev/sdb.
To create an aligned XFS filesystem on this you would use something like:

$ mkfs.xfs -d su=64k,sw=10 /dev/sdb

When using vendor array hardware that only allows one to define what XFS
calls swidth, it is best to use a power of 2 stripe spindle count to get
proper alignment.  If you use a non power of 2 stripe spindle count the
vendor firmware will either round down or round up to create the stripe
unit size, and this formula is often not documented.

With such vendor hardware, for a RAID6 array you would want to have 6,
10, or 18 total drives in the array, giving you 4, 8, or 16 stripe
spindles.  Alternatively, you need to know exactly how the firmware
rounds up or down to arrive at the strip block size (sunit).

If you find yourself in such a situation, and are unable to determine
the strip size the array firmware is using, you may be better off using
the mkfs.xfs defaults, vs guessing and ending up with unaligned writes.
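One hedged way to check whether the kernel already knows the array geometry is to read the block queue limits in sysfs. The sketch below uses a mock directory so the commands can be tried anywhere; on a real system substitute /sys/block/<dev>/queue, and note that many arrays (iSCSI targets especially) report nothing useful there:

```shell
# Mock sysfs tree standing in for /sys/block/sdb/queue (illustrative
# values: 64KiB chunk, 10 data spindles).
Q=$(mktemp -d)
echo 65536  > "$Q/minimum_io_size"   # often the RAID chunk (su) in bytes
echo 655360 > "$Q/optimal_io_size"   # often the full stripe width in bytes
su=$(cat "$Q/minimum_io_size")
sw=$(( $(cat "$Q/optimal_io_size") / su ))
echo "su=${su} bytes, sw=${sw} data spindles"
```

If both values are zero or clearly bogus, you are back to consulting the controller documentation as described above.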

-- 
Stan


* Re: Optimal XFS formatting options?
  2012-01-14 17:44 Optimal XFS formatting options? MikeJeezy
  2012-01-14 22:23 ` Stan Hoeppner
@ 2012-01-15  1:14 ` Peter Grandi
  2012-01-20  9:03   ` Linda Walsh
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Grandi @ 2012-01-15  1:14 UTC (permalink / raw)
  To: Linux fs XFS

[ ... ]

> Hi, I have a 4.9 TB iSCSI LUN on a RAID 6 array with twelve 2
> TB SATA disks (4.9T is only one of the logical volumes). It
> will contain several million files of various sizes, but 80%
> of them will be less than 50 MB.  I'm a novice at best and I
> usually just use the default #mkfs.xfs /dev/sdx1

The default :-) advice in this list and in the XFS FAQ is that
with any recent edition of the XFS tools and of the XFS code in
the kernel, the defaults are usually best, unless you have a
special situation, for example if the kernel cannot get storage
geometry from the storage layer.

Also, "several million" files in an about 5,000,000MB filesystem
indicates an average file size of around 1MB. That's not too
small, fortunately. Anyhow, consider how long it will take to
'fsck' all that if it gets damaged, or the extra load to back up
the whole filetree if backups scan the tree (e.g. RSYNC based).

> This server will be write-heavy for about 8 hours a night,
> but every morning there are many reads from the disk.  There is
> rarely a time when it will be write-heavy and read-heavy at
> the same time.  Are there other XFS format options that I
> could use to optimize performance? Any input is greatly
> appreciated. Thank you.

As usual, the first note is that in general RAID6 is a bad idea,
with RMW and reliability (especially during rebuild) issues, but
salesmen and management usually love it because it embodies a
promise of something for nothing (let's say that the parity RAID
industry is the Wall Street of storage systems :->).

To mitigate these problems, if you are doing a lot of writing it
is very important that the filesystem align allocations to the
address/length of the full RAID stripe, but this should be
automatic if the relevant geometry is reported to the Linux
kernel. Otherwise there are many previous messages in this list
about that, and the FAQ etc.

Things that you might want to double check in case they matter
for you, as to not-'mkfs' options:

  * XFS has several limitations on 32b kernels. Just make sure
    you have a 64b kernel.

  * Make really sure your partitions (or LUNs if unpartitioned)
    are aligned, certainly to a multiple of the stripe size, ideally
    to something large, at least like 1MiB.

  * Recent (let's say at least 2.6.32, or EL5.7) kernels and
    editions of XFS tools and partitioning tools (if you use
    any) are much improved. The newer, usually the better.

  * Usually, just in case, explicitly specify at 'mount' (not
    'mkfs') time the 'inode64' option; and the 'barrier' option,
    unless you really know better (and pray hard that your
    storage layer supports it). The 'delaylog' option or its
    opposite is also something to look carefully into.

  * Check carefully whether your app is compatible with the
    'noatime' and 'nodiratime' options and enable them if
    possible, "just in case" :-).

  * Look very attentively at the kernel page cache flusher
    parameters to make it run more often (to prevent the
    accumulation of very large gulps of unwritten data) but not
    too often (to give a chance to the delayed allocator).
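For the flusher tuning mentioned above, a sysctl fragment might look like the following. The vm.dirty_* knob names are the standard Linux ones, but the numbers are placeholders to be tuned against your workload, not recommendations from this thread:

```
# Illustrative /etc/sysctl.conf fragment -- values are placeholders.
vm.dirty_background_ratio = 5     # start background writeback earlier
vm.dirty_ratio = 20               # cap dirty memory before writers block
vm.dirty_expire_centisecs = 1500  # write back dirty data older than 15s
```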

As to proper 'mkfs' you may want to look into:

  * Explicitly set the sector size because most storage layers
    lie. In general if possible you should set it to 4096, just
    in case :-). This also allegedly extends the range where
    inodes can be stored if you cannot specify 'inode64' at
    mount time.

  * If you have a critically high rate of metadata work (like
    file creation/deletion, and it seems your case overnight)
    you may want to ensure that your log is not only aligned,
    but perhaps on a separate device, and/or you have a host
    adapter with a large battery backed cache. Logs are small,
    so it should be easy either way.

  * Depending on the degree of multithreading of your
    application you may want more/fewer AGs, but usually on a
    4.9TB filetree there will be plenty.

  * You may want larger inodes than the default if you have lots
    of ACLs or your files are written slowly and thus have many
    extents. They are recommended also for small files, but I
    cannot remember whether XFS really stores small files or
    directories in the inode (I remember that directories of
    fewer than 8 entries are stored in the inode, but I don't
    know whether that depends on the inode size).

Run 'mkfs.xfs -N ...' first so it will print out which
parameters it would use without actually doing anything.


* Re: Optimal XFS formatting options?
  2012-01-14 22:23 ` Stan Hoeppner
@ 2012-01-16  0:27   ` MikeJeezy
  2012-01-16  4:56     ` Stan Hoeppner
  0 siblings, 1 reply; 17+ messages in thread
From: MikeJeezy @ 2012-01-16  0:27 UTC (permalink / raw)
  To: xfs


>So let's assume your vendor does the smart thing and allows you 
>flexibility in specifying per drive strip size.  Assume for example the 
>stripe unit (strip, chunk) of the array is 64KB, there are 10 stripe 
>spindles (12-2=10), and the local device name of the LUN is /dev/sdb. 
>To create an aligned XFS filesystem on this you would use something like: 

>$ mkfs.xfs -d su=64k,sw=10 /dev/sdb

Great explanations! (some of it I am still trying to understand :-)  In this
case on my HP P2000 G3, I do have a 64k chunk size so I will do:

$ mkfs.xfs -d su=64k,sw=10 /dev/sdd

Question: Does the above command assume I do not already have a partition
created?  I was reading here
http://www.fhgfs.com/wiki/wikka.php?wakka=PartitionAlignment
that the easiest way to achieve partition alignment is to create the file
system directly on the storage device without any partitions - such as $
mkfs.xfs /dev/sdd  (and your example above also hints at this)

When I created my current partition, I used the following commands:

$ parted -a optimal /dev/sdd
$ mklabel gpt
$ mkpart primary 0 -0
$ q

I would like to align the partition as well, but I am not sure how to achieve
this using parted.  This will be the only partition on the LUN, so not sure
if I even need to create one (although I do like to stay consistent with my
other volumes). 

When printing the partition info with parted I see:

# (parted) p                                                                
Model: HP P2000 G3 iSCSI (scsi)
Disk /dev/sdd: 4900GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  4900GB  4900GB  xfs          primary

but from reading, I suspect the Sector size should be more like: 
(logical/physical): 512B/65536B.  Any thoughts on partition alignment or
other thoughts in general?  Thank you.

-- 
View this message in context: http://old.nabble.com/Optimal-XFS-formatting-options--tp33140169p33145068.html
Sent from the Xfs - General mailing list archive at Nabble.com.


* Re: Optimal XFS formatting options?
  2012-01-16  0:27   ` MikeJeezy
@ 2012-01-16  4:56     ` Stan Hoeppner
  2012-01-16 23:11       ` Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2012-01-16  4:56 UTC (permalink / raw)
  To: xfs

On 1/15/2012 6:27 PM, MikeJeezy wrote:
> 
>> So let's assume your vendor does the smart thing and allows you 
>> flexibility in specifying per drive strip size.  Assume for example the 
>> stripe unit (strip, chunk) of the array is 64KB, there are 10 stripe 
>> spindles (12-2=10), and the local device name of the LUN is /dev/sdb. 
>> To create an aligned XFS filesystem on this you would use something like: 
> 
>> $ mkfs.xfs -d su=64k,sw=10 /dev/sdb
> 
> Great explanations! (some of it I am still trying to understand :-)  In this
> case on my HP P2000 G3, I do have a 64k chunk size so I will do:
> 
> $ mkfs.xfs -d su=64k,sw=10 /dev/sdd

That should be fine.

> Question: Does the above command assume I do not already have a partition
> created?  I was reading here
> http://www.fhgfs.com/wiki/wikka.php?wakka=PartitionAlignment
> that the easiest way to achieve partition alignment is to create the file
> system directly on the storage device without any partitions - such as $
> mkfs.xfs /dev/sdd  (and your example above also hints at this)

That example and command assume you're not using partitions.

> When I created my current partition, I used the following commands:
> 
> $ parted -a optimal /dev/sdd
> $ mklabel gpt
> $ mkpart primary 0 -0
> $ q
> 
> I would like to align the partition as well, but I am not sure how to achieve
> this using parted.  This will be the only partition on the LUN, so not sure
> if I even need to create one (although I do like to stay consistent with my
> other volumes). 

If your drives have 512 byte physical sectors (not advanced format
drives with 4096 byte sectors) then there is no need to worry about
partition alignment.  And in fact, if you plan to put a single
filesystem on this entire 4.9TB virtual drive, you don't need to
partition the disk device at all.  Recall the dictionary definition of
"partition".  You're not dividing the whole into smaller pieces here.

> When printing the partition info with parted I see:
> 
> # (parted) p                                                                
> Model: HP P2000 G3 iSCSI (scsi)
> Disk /dev/sdd: 4900GB
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
> 
> Number  Start   End     Size    File system  Name     Flags
>  1      1049kB  4900GB  4900GB  xfs          primary
> 
> but from reading, I suspect the Sector size should be more like: 
> (logical/physical): 512B/65536B.  

No, that 65536 figure is wrong.  There are only two possibilities for
sector size (logical/physical):  512/512 and 512/4096.  These are the
only two sector formats currently used on disk drives.
Partitioning utils look strictly at disk parameters, not RAID parameters.

Sectors deal with how many books (bytes) fit on each shelf (sector) in
the library, and which shelf (sector) we're going to store a given set
of books (bytes) on.  RAID parameters, such as stripe unit, deal with
how many shelves (sectors) worth of books (bytes) we can carry most
efficiently down the aisle and place on the shelves at one time.

In short, sectors are a destination where we store bytes, much like
books on a shelf.  A stripe unit acts as a book cart in which we carry a
fixed number of books, allowing us to fill a fixed number of shelves
most efficiently per cart transported down the aisle.

> Any thoughts on partition alignment or
> other thoughts in general?  Thank you.

Yes, don't use partitions if you don't need to divide your disk device
(LUN/virtual disk) into multiple pieces.  Now, if you need to make use
of snapshots or other volume management features, you may want to create
an LVM device on top of the disk device (LUN) and then make your XFS on
top of the LVM device.  If you have no need for LVM features, I'd say
directly format the LUN with XFS, no partition table necessary.

-- 
Stan


* Re: Optimal XFS formatting options?
  2012-01-16  4:56     ` Stan Hoeppner
@ 2012-01-16 23:11       ` Dave Chinner
  2012-01-17  3:31         ` Stan Hoeppner
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2012-01-16 23:11 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Sun, Jan 15, 2012 at 10:56:22PM -0600, Stan Hoeppner wrote:
> On 1/15/2012 6:27 PM, MikeJeezy wrote:
> > I would like to align the partition as well, but I am not sure how to achieve
> > this using parted.  This will be the only partition on the LUN, so not sure
> > if I even need to create one (although I do like to stay consistent with my
> > other volumes). 
> 
> If your drives have 512 byte physical sectors (not advanced format
> drives with 4096 byte sectors) then there is no need to worry about
> partition alignment.

That is incorrect. Partitions need to be aligned to the underlying
stripe configuration, regardless of the sector size of the drives
that make up the stripe. If you do not align the partition to the
stripe, then the filesystem will be unaligned no matter how you
configure it. Every layer of the storage stack under the filesystem
needs to be correctly aligned and sized for filesystem alignment to
make any difference to performance.

> > Any thoughts on partition alignment or
> > other thoughts in general?  Thank you.
> 
> Yes, don't use partitions if you don't need to divide your disk device
> (LUN/virtual disk) into multiple pieces.  Now, if you need to make use
> of snapshots or other volume management features, you may want to create
> an LVM device on top of the disk device (LUN) and then make your XFS on
> top of the LVM device.  If you have no need for LVM features, I'd say
> directly format the LUN with XFS, no partition table necessary.

If you use LVM, then you need to ensure that it is slicing up the
device in a manner that is aligned correctly to the underlying
stripe, just like if you are using partitions to provide the same
functionality. Different technologies, same problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Optimal XFS formatting options?
  2012-01-16 23:11       ` Dave Chinner
@ 2012-01-17  3:31         ` Stan Hoeppner
  2012-01-17  9:19           ` Michael Monnerie
  0 siblings, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2012-01-17  3:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 1/16/2012 5:11 PM, Dave Chinner wrote:
> On Sun, Jan 15, 2012 at 10:56:22PM -0600, Stan Hoeppner wrote:
>> On 1/15/2012 6:27 PM, MikeJeezy wrote:
>>> I would like to align the partition as well, but I am not sure how to achieve
>>> this using parted.  This will be the only partition on the LUN, so not sure
>>> if I even need to create one (although I do like to stay consistent with my
>>> other volumes). 
>>
>> If your drives have 512 byte physical sectors (not advanced format
>> drives with 4096 byte sectors) then there is no need to worry about
>> partition alignment.
> 
> That is incorrect. Partitions need to be aligned to the underlying
> stripe configuration, regardless of the sector size of the drives
> that make up the stripe. If you do not align the partition to the
> stripe, then the filesystem will be unaligned no matter how you
> configure it. Every layer of the storage stack under the filesystem
> needs to be correctly aligned and sized for filesystem alignment to
> make any difference to performance.

Thanks for the correction/reminder Dave.  So in this case the first
sector of the first partition would need to reside at LBA1280 in this
array (655360 byte stripe width, 1280 sectors/stripe), as the partition
table itself is going to occupy some sectors at the beginning of the
first stripe.  By creating the partition at LBA1280 we make sure the
first sector of the XFS filesystem is aligned with the first sector of
the 2nd stripe.
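The arithmetic above can be checked in a few lines; it also shows why a common 1MiB (LBA2048) partition start would not be aligned to this particular 640KiB stripe (values are the thread's 64KiB x 10-spindle example):

```shell
# Verify the stripe-width math: 64KiB su x 10 spindles = 655360 bytes,
# which is 1280 sectors of 512 bytes each.
su_bytes=65536
data_disks=10
stripe_sectors=$(( su_bytes * data_disks / 512 ))
echo "stripe = $stripe_sectors sectors"
for start in 2048 1280; do     # candidate partition start LBAs
  if [ $(( start % stripe_sectors )) -eq 0 ]; then
    echo "LBA$start: aligned"
  else
    echo "LBA$start: NOT aligned"
  fi
done
```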

This exercise demonstrates why it's often preferable to directly format
the LUN.  If you don't have a _need_ for a partition table, such as
cloning/backup software that works at the partition level, or something
of that nature, avoid partitions.

>>> Any thoughts on partition alignment or
>>> other thoughts in general?  Thank you.
>>
>> Yes, don't use partitions if you don't need to divide your disk device
>> (LUN/virtual disk) into multiple pieces.  Now, if you need to make use
>> of snapshots or other volume management features, you may want to create
>> an LVM device on top of the disk device (LUN) and then make your XFS on
>> top of the LVM device.  If you have no need for LVM features, I'd say
>> directly format the LUN with XFS, no partition table necessary.
> 
> If you use LVM, then you need to ensure that it is slicing up the
> device in a manner that is aligned correctly to the underlying
> stripe, just like if you are using partitions to provide the same
> functionality. Different technologies, same problem.

If he's doing a single LVM volume, then alignment should be automatic
during mkfs.xfs, shouldn't it?

-- 
Stan



* Re: Optimal XFS formatting options?
  2012-01-17  3:31         ` Stan Hoeppner
@ 2012-01-17  9:19           ` Michael Monnerie
  2012-01-17 11:17             ` Emmanuel Florac
  2012-01-17 11:34             ` Stan Hoeppner
  0 siblings, 2 replies; 17+ messages in thread
From: Michael Monnerie @ 2012-01-17  9:19 UTC (permalink / raw)
  To: xfs, stan



On Tuesday, 17 January 2012, Stan Hoeppner wrote:
> Thanks for the correction/reminder Dave.  So in this case the first
> sector of the first partition would need to reside at LBA1280 in this
> array (655360 byte stripe width, 1280 sectors/stripe), as the
> partition table itself is going to occupy some sectors at the
> beginning of the first stripe.  By creating the partition at LBA1280
> we make sure the first sector of the XFS filesystem is aligned with
> the first sector of the 2nd stripe.

There's one big problem with that: Many people will sooner or later
expand an existing array. If you add one drive, all your nice stripe
width alignment becomes bogus, and suddenly your performance will drop.

There's no real way out of that, but three solutions come to my mind:
- backup before expand/restore after expand with new alignment
- leave existing data, just change mount options so after expansion at 
least new files are going to be aligned to the new stripe width. 
- expand array by factors of two. So if you have 10 data drives, add 10 
data drives. But that creates other problems (probability of single 
drive failure + time to recover a single broken disk)

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531


* Re: Optimal XFS formatting options?
  2012-01-17  9:19           ` Michael Monnerie
@ 2012-01-17 11:17             ` Emmanuel Florac
  2012-01-17 11:34             ` Stan Hoeppner
  1 sibling, 0 replies; 17+ messages in thread
From: Emmanuel Florac @ 2012-01-17 11:17 UTC (permalink / raw)
  To: xfs



On Tue, 17 Jan 2012 10:19:55 +0100
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:

> - expand array by factors of two. So if you have 10 data drives, add
> 10 data drives. But that creates other problems (probability of
> single drive failure + time to recover a single broken disk)

From my experience 20 drives is OK for RAID-6. And rebuild time doesn't
change much with array size, anyway.

Misaligned partitions, on the other hand, can easily halve array
throughput from my own measurements.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Optimal XFS formatting options?
  2012-01-17  9:19           ` Michael Monnerie
  2012-01-17 11:17             ` Emmanuel Florac
@ 2012-01-17 11:34             ` Stan Hoeppner
  2012-01-20 15:52               ` Michael Monnerie
  1 sibling, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2012-01-17 11:34 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

On 1/17/2012 3:19 AM, Michael Monnerie wrote:
> On Tuesday, 17 January 2012, Stan Hoeppner wrote:
>> Thanks for the correction/reminder Dave.  So in this case the first
>> sector of the first partition would need to reside at LBA1280 in this
>> array (655360 byte stripe width, 1280 sectors/stripe), as the
>> partition table itself is going to occupy some sectors at the
>> beginning of the first stripe.  By creating the partition at LBA1280
>> we make sure the first sector of the XFS filesystem is aligned with
>> the first sector of the 2nd stripe.
> 
> There's one big problem with that: Many people will sooner or later 
> expand and existing array. If you add one drive, all your nice stripe 
> width alignment becomes bogus, and suddenly your performance will drop.

So to be clear, your issue with the above isn't with my partition
alignment math WRT the OP's P2000 array, but is with using XFS stripe
alignment in general, correct?

> There's no real way out of that, but three solutions come to my mind:
> - backup before expand/restore after expand with new alignment
> - leave existing data, just change mount options so after expansion at 
> least new files are going to be aligned to the new stripe width. 
> - expand array by factors of two. So if you have 10 data drives, add 10 
> data drives. But that creates other problems (probability of single 
> drive failure + time to recover a single broken disk)

There is one really simple way around this issue you describe: don't add
drives to an existing array.  Simply create another array with new
disks, create a new aligned XFS on the array, and mount the filesystem
in an appropriate location.  There is no 11th Commandment stating one
must have a single massive XFS atop all of one's disks. ;)

There is little to no application software today that can't be
configured to store its data files across multiple directories.  So
there's no need to box oneself into the corner you describe above.

-- 
Stan


* Re: Optimal XFS formatting options?
  2012-01-15  1:14 ` Peter Grandi
@ 2012-01-20  9:03   ` Linda Walsh
  2012-01-20 12:06     ` Peter Grandi
  0 siblings, 1 reply; 17+ messages in thread
From: Linda Walsh @ 2012-01-20  9:03 UTC (permalink / raw)
  To: Linux fs XFS



Peter Grandi wrote:
>
> 
>   * XFS has several limitations on 32b kernels. Just make sure
>     you have a 64b kernel.
----
	I was unaware that the block size was larger on 64b kernels.
Is that what you are referring to? (would be nice)...


One thing I have a Q on -- you (OP) said this was an 'iSCSI' box?

That means hookup over a network, right?

You are planning on using a 10Gbit or faster network fabric, right?

A 1Gb Ethernet will only get you 125MB/s max... it doesn't take much
tuning to hit that speed.


* Re: Optimal XFS formatting options?
  2012-01-20  9:03   ` Linda Walsh
@ 2012-01-20 12:06     ` Peter Grandi
  2012-01-20 15:55       ` Michael Monnerie
  2012-01-23  4:21       ` Dave Chinner
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Grandi @ 2012-01-20 12:06 UTC (permalink / raw)
  To: Linux fs XFS

[ ... ]
>> * XFS has several limitations on 32b kernels. Just make sure
>>   you have a 64b kernel.
[ ... ]
> I was unaware that the block size was larger on 64b kernels.
> Is that what you are referring to ? (would be nice)...

Not as such; the maximum block size is limited by the Linux page
cache, that is, the hw page size, which is the same 4KiB on the
IA32 and AMD64 architectures. However, other architectures
which are natively 64b allow bigger page sizes (notably IA64
[aka Itanium]), so the page cache, and thus XFS, can do larger
block sizes.

The limitations of XFS on 32b kernels come from limitations of
XFS itself in 32b mode, limitations of Linux in 32b mode, and
combined limitations. For example:

  * There are 32b inode numbers, which limit inodes to the first
    1TB of a filetree if the sector size is 512B.

  * The 32b block IO subsystem limits partition sizes to 16TiB.

  * XFS tools scanning a large filesystem, usually for repair,
    can run out of the available 32b address space (by default
    around 2GiB).

Pages 5 and 6 here list some limits:

  http://oss.sgi.com/projects/xfs/training/xfs_slides_02_overview.pdf


* Re: Optimal XFS formatting options?
  2012-01-17 11:34             ` Stan Hoeppner
@ 2012-01-20 15:52               ` Michael Monnerie
  2012-01-20 22:44                 ` Stan Hoeppner
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Monnerie @ 2012-01-20 15:52 UTC (permalink / raw)
  To: xfs, stan



On Tuesday, 17 January 2012 Stan Hoeppner wrote:
> So to be clear, your issue with the above isn't with my partition
> alignment math WRT the OP's P2000 array, but is with using XFS stripe
> alignment in general, correct?

Yes. I just wanted to document this, as people often expand RAIDs and
forget to update the stripe width accordingly.
 
> There is one really simple way around this issue you describe: don't
> add drives to an existing array.  Simply create another array with
> new disks, create a new aligned XFS on the array, and mount the
> filesystem in an appropriate location.  There is no 11th Commandment
> stating one must have a single massive XFS atop all of one's disks.
> ;)
> 
> There is little to no application software today that can't be
> configured to store its data files across multiple directories.  So
> there's no need to box oneself into the corner you describe above.

It's a management burden to do that. I've learned that systems are
usually strictly structured in their configuration, so it's often better
to extend a RAID and keep the config, as this is cheaper in the end. At
least at the salaries of good admins here in Europe ;-)
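
The stale-geometry problem above can also be fixed at mount time: XFS accepts `sunit`/`swidth` overrides, expressed in 512-byte sectors. A sketch with hypothetical numbers (64KiB chunk, 12-disk RAID6 giving 10 data disks — substitute your array's real geometry):

```shell
# Hypothetical geometry: 64KiB chunk, RAID6 of 12 drives = 10 data disks.
# The sunit/swidth mount options are expressed in 512-byte sectors.
chunk_kib=64
data_disks=10
sunit=$(( chunk_kib * 1024 / 512 ))    # 128 sectors per chunk
swidth=$(( sunit * data_disks ))       # 1280 sectors per full stripe
# Not executed here -- just the command to run after expanding the RAID:
echo "mount -o sunit=${sunit},swidth=${swidth} /dev/sdx1 /mnt/data"
```

Running `xfs_info` on the mounted filesystem shows which sunit/swidth values are actually in effect.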

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531



* Re: Optimal XFS formatting options?
  2012-01-20 12:06     ` Peter Grandi
@ 2012-01-20 15:55       ` Michael Monnerie
  2012-01-23  4:21       ` Dave Chinner
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Monnerie @ 2012-01-20 15:55 UTC (permalink / raw)
  To: xfs



On Friday, 20 January 2012 Peter Grandi wrote:
>   * There are 32b inode numbers, which limit inodes to the first
>     1TB of a filetree if the sector size is 512B.
> 
>   * The 32b block IO subsystem limits partition sizes to 16TiB.

I thought those two had been removed by some updates? I think I
remember reading that. Not that it's very interesting; I've been
running 64b Linux everywhere since AMD put it in their processors.
That should be 10+ years or so.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531



* Re: Optimal XFS formatting options?
  2012-01-20 15:52               ` Michael Monnerie
@ 2012-01-20 22:44                 ` Stan Hoeppner
  2012-01-24 10:31                   ` Michael Monnerie
  0 siblings, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2012-01-20 22:44 UTC (permalink / raw)
  To: xfs

On 1/20/2012 9:52 AM, Michael Monnerie wrote:
> On Tuesday, 17 January 2012 Stan Hoeppner wrote:
>> So to be clear, your issue with the above isn't with my partition
>> alignment math WRT the OP's P2000 array, but is with using XFS stripe
>> alignment in general, correct?
> 
> Yes. I just wanted to document this, as people often expand RAIDs and 
> forget to update the stripe width accordingly.
>  
>> There is one really simple way around this issue you describe: don't
>> add drives to an existing array.  Simply create another array with
>> new disks, create a new aligned XFS on the array, and mount the
>> filesystem in an appropriate location.  There is no 11th Commandment
>> stating one must have a single massive XFS atop all of one's disks.
>> ;)
>>
>> There is little to no application software today that can't be
>> configured to store its data files across multiple directories.  So
>> there's no need to box oneself into the corner you describe above.
> 
> It's a management burden to do that. I've learned that systems are 
> usually strictly structured in their configuration, so it's often better 
> to extend a RAID and keep the config, as this is cheaper in the end. At 
> least at the salaries of good admins here in Europe ;-)

If ease (or cost) of filesystem administration is of that much greater
priority than performance, then why are you using XFS in the first place
instead of EXT?

-- 
Stan



* Re: Optimal XFS formatting options?
  2012-01-20 12:06     ` Peter Grandi
  2012-01-20 15:55       ` Michael Monnerie
@ 2012-01-23  4:21       ` Dave Chinner
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2012-01-23  4:21 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux fs XFS

On Fri, Jan 20, 2012 at 12:06:31PM +0000, Peter Grandi wrote:
> [ ... ]
> >> * XFS has several limitations on 32b kernels. Just make sure
> >>   you have a 64b kernel.
> [ ... ]
> > I was unaware that the block size was larger on 64b kernels.
> > Is that what you are referring to? (would be nice)...
> 
> Not as such; the maximum block size is limited by the Linux page
> cache, i.e. the hardware page size, which is the same 4KiB on both
> the IA32 and AMD64 architectures. However, other natively 64b
> architectures allow bigger page sizes (notably IA64 [aka Itanium]),
> so the page cache, and thus XFS, can use larger block sizes.
> 
> The limitations of XFS on 32b kernels come from limitations of
> XFS itself in 32b mode, limitations of Linux in 32b mode, and
> combined limitations. For example:
> 
>   * There are 32b inode numbers, which limit inodes to the first
>     1TB of a filetree if the sector size is 512B.

Internally XFS still uses 64 bit inode numbers - the on-disk format
does not change just because the CPU arch has changed. If you use
the stat64() style interfaces, even on 32 bit machines you can
access the full 64 bit inode numbers.
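
This is easy to see from userspace: coreutils' stat(1) goes through the 64-bit interfaces on any modern distro, so it prints the full on-disk inode number (the path here is just an example):

```shell
# stat(1) reports the full inode number; on XFS filesystems where
# inodes sit past the first ~1TB, these values can exceed 32 bits.
stat --format='%n: inode %i' /
```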

>   * The 32b block IO subsystem limits partition sizes to 16TiB.

The sector_t is a 64 bit number even on 32 bit systems. The
problem is that the page cache cannot index past offsets of 16TB.
Given that XFS no longer uses the page cache for its metadata
indexing, we could remove this limit in the kernel code if we
wanted to. And given that the userspace tools use direct IO, the
page cache limitation doesn't cause problems there either, because
we bypass it.
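
As a quick arithmetic check, the 16TB figure falls straight out of the 32b page index times the 4KiB page size:

```shell
# 2^32 addressable pages * 4KiB per page = 2^44 bytes = 16TiB.
pages=4294967296               # 2^32, the largest 32b page index
page_bytes=4096
echo "$(( pages * page_bytes / 1099511627776 )) TiB"   # 2^40 B = 1TiB; prints: 16 TiB
```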

So in theory we could lift this limit, but there really isn't much
demand for >16TB filesystems on 32 bit, because....

>   * XFS tools scanning a large filesystem, usually for repair,
>     can run out of the available 32b address space (by default
>     around 2GiB).

.... you need a 64 bit system to handle the userspace memory
requirements of tools like xfs_check and xfs_repair. If the
filesystem is large enough that you can't run repair because it
needs more than 2GB of RAM, then you shouldn't be using a 32 bit
system.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: Optimal XFS formatting options?
  2012-01-20 22:44                 ` Stan Hoeppner
@ 2012-01-24 10:31                   ` Michael Monnerie
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Monnerie @ 2012-01-24 10:31 UTC (permalink / raw)
  To: xfs, stan



On Friday, 20 January 2012 Stan Hoeppner wrote:
> If ease (or cost) of filesystem administration is of that much
> greater priority than performance, then why are you using XFS in the
> first place instead of EXT?

I've had great experience recovering from disastrous filesystem
problems on XFS. Switching to another FS costs a lot of time, and why
switch if it works great? Administration comes down to mkfs, mount,
maybe xfs_fsr, xfs_repair in a disaster, and sometimes xfs_growfs.
Basically nothing.

Also, this list has been of great help during the years, whenever there 
were problems they got fixed. That's ease of administration :-)

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531



end of thread, other threads:[~2012-01-24 10:31 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-14 17:44 Optimal XFS formatting options? MikeJeezy
2012-01-14 22:23 ` Stan Hoeppner
2012-01-16  0:27   ` MikeJeezy
2012-01-16  4:56     ` Stan Hoeppner
2012-01-16 23:11       ` Dave Chinner
2012-01-17  3:31         ` Stan Hoeppner
2012-01-17  9:19           ` Michael Monnerie
2012-01-17 11:17             ` Emmanuel Florac
2012-01-17 11:34             ` Stan Hoeppner
2012-01-20 15:52               ` Michael Monnerie
2012-01-20 22:44                 ` Stan Hoeppner
2012-01-24 10:31                   ` Michael Monnerie
2012-01-15  1:14 ` Peter Grandi
2012-01-20  9:03   ` Linda Walsh
2012-01-20 12:06     ` Peter Grandi
2012-01-20 15:55       ` Michael Monnerie
2012-01-23  4:21       ` Dave Chinner
