* Help with XFS in VMs on VMFS
@ 2013-03-28 13:21 Jan Perci
  2013-03-28 14:59 ` Stefan Ring
  2013-03-28 19:50 ` Stan Hoeppner
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Perci @ 2013-03-28 13:21 UTC (permalink / raw)
  To: xfs


Hello.

I would like to use XFS in VMs with VMFS datastores on top of RAID-6.  The
RAID is a FC 14+2 x 4TB with 64K stripe.  There are 6 of these arrays.
 Each contains one aligned VMFS partition, and this VMFS partition is
shared by 4 ESXi hosts.  Each host runs 2-3 compute nodes, and some of
these nodes have multiple partitions consuming 20-50 TB.  The data consists
of files ranging from 100KB to 500KB, with a few outliers reaching many MB.
The directory hierarchy is such that no single directory contains more than
2,000 or so of these files.  The data is almost exclusively write-once:
files are written when added and read many times afterwards, and they
arrive in bursts of 1-20GB at a time.  As the partitions fill up, new
ones are added, but sometimes the existing partitions must be grown.

Normally I would use raw mappings and XFS directly on the volumes.  But
there is a hard requirement to support VM snapshots, so all the data must
reside within VMDK files on the VMFS datastores.  ESXi has a VMDK size
limit of 2TB.  So, I am forced to create many 2TB virtual disks and attach
them to the host, then use Linux LVM to group them into a single LV, then
create XFS on the LV.
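
For concreteness, the stack described above might be assembled roughly as
follows.  This is only a sketch: the device names (/dev/sdb through
/dev/sdf), volume group name, and mount point are hypothetical placeholders.

```shell
# Make each 2TB virtual disk presented by ESXi an LVM physical volume.
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Group them into one volume group.
vgcreate vg_data /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Create one logical volume spanning (concatenating) all of them.
lvcreate -l 100%FREE -n lv_data vg_data

# Create XFS on the LV and mount it.
mkfs.xfs /dev/vg_data/lv_data
mount /dev/vg_data/lv_data /data

# Later, to grow: add another PV, extend the LV, then grow XFS online.
vgextend vg_data /dev/sdf
lvextend -l +100%FREE /dev/vg_data/lv_data
xfs_growfs /data
```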

This setup is not optimal and has risks, but I must work within some
constraints.  There are a few things I can do to increase I/O performance,
such as distributing the VMDK files used by each LV across the 6 VMFS
datastores.  But can XFS be tuned as well?  Do stripe unit and stripe width
help?  Thanks for your help.

Jan.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with XFS in VMs on VMFS
  2013-03-28 13:21 Help with XFS in VMs on VMFS Jan Perci
@ 2013-03-28 14:59 ` Stefan Ring
  2013-03-28 19:50 ` Stan Hoeppner
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Ring @ 2013-03-28 14:59 UTC (permalink / raw)
  To: Jan Perci; +Cc: xfs

> This setup is not optimal and has risks, but I must work within some
> constraints.  There are a few things I can do to increase I/O performance,
> such as distributing the VMDK files used by each LV across the 6 VMFS
> datastores.  But can XFS be tuned as well?  Do stripe unit and stripe width
> help?  Thanks for your help.

I guess you should make the number of allocation groups equal to or a
multiple of the number of concatenated VMDK files (assuming they are
equally sized). Any more fiddling is probably not worth the effort.
But I'm sure you'll get lengthy answers from other people on the list
;).
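
Stefan's suggestion can be expressed at mkfs time with the agcount option;
the device path and VMDK count here are hypothetical.

```shell
# Hypothetical example: an LV concatenating 10 equally sized 2TB VMDKs.
# Setting agcount to a multiple of the member count (here 2 x 10 = 20)
# lets XFS spread allocation groups evenly across the underlying files.
mkfs.xfs -d agcount=20 /dev/vg_data/lv_data
```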


* Re: Help with XFS in VMs on VMFS
  2013-03-28 13:21 Help with XFS in VMs on VMFS Jan Perci
  2013-03-28 14:59 ` Stefan Ring
@ 2013-03-28 19:50 ` Stan Hoeppner
  2013-03-28 21:45   ` Ralf Gross
  1 sibling, 1 reply; 11+ messages in thread
From: Stan Hoeppner @ 2013-03-28 19:50 UTC (permalink / raw)
  To: Jan Perci; +Cc: xfs

On 3/28/2013 8:21 AM, Jan Perci wrote:

> Normally I would use raw mappings and XFS directly on the volumes.  But
> there is a hard requirement to support VM snapshots, so all the data must
> reside within VMDK files on the VMFS datastores.

Since when?  ESX has had LUN snapshot capability going back to 3.0, six
years or so.  It may have required the VCB add-on back then.

Is this simply a limitation of the freebie version?  If so, pony up and
pay for what you need, or switch to a FOSS solution which has no such
limitations.

VMFS volumes are not intended for high performance IO.  Unless things
have changed recently, VMware has always recommended housing only OS
images and the like in VMDKs, not user data.  They've always recommended
using RDMs for everything else.  IIRC VMDKs have a huge block (sector)
size, something like 1MB.  That's going to make XFS alignment difficult,
if not impossible.

I cannot stress emphatically enough that you should not stitch 2TB VMDKs
together and use them in the manner you described.  This is a recipe for
disaster.  Find another solution.

-- 
Stan


* Re: Help with XFS in VMs on VMFS
  2013-03-28 19:50 ` Stan Hoeppner
@ 2013-03-28 21:45   ` Ralf Gross
  2013-03-28 22:13     ` Emmanuel Florac
  2013-03-29  0:56     ` Stan Hoeppner
  0 siblings, 2 replies; 11+ messages in thread
From: Ralf Gross @ 2013-03-28 21:45 UTC (permalink / raw)
  To: xfs

Stan Hoeppner schrieb:
> On 3/28/2013 8:21 AM, Jan Perci wrote:
> 
> > Normally I would use raw mappings and XFS directly on the volumes.  But
> > there is a hard requirement to support VM snapshots, so all the data must
> > reside within VMDK files on the VMFS datastores.
> 
> Since when?  ESX has had LUN snapshot capability going back to 3.0, six
> years or so.  It may have required the VCB add-on back then.

Snapshots are possible with RDMs in virtual compatibility mode, but not in
physical mode (which > 2 TB disks require).

http://pubs.vmware.com/vsphere-51/topic/com.vmware.vsphere.storage.doc/GUID-0114693D-94BF-4D0E-9BA4-416D4A51A5A1.html
 
> Is this simply a limitation of the freebie version?  If so, pony up and
> pay for what you need, or switch to a FOSS solution which has no such
> limitations.

No, that's the limit for all versions.

 
> VMFS volumes are not intended for high performance IO.  Unless things
> have changed recently, VMware has always recommended housing only OS
> images and the like in VMDKs, not user data.  They've always recommended
> using RDMs for everything else.  IIRC VMDKs have a huge block (sector)
> size, something like 1MB.  That's going to make XFS alignment difficult,
> if not impossible.

I can't remember ever finding this recommendation on a VMware page.

http://blogs.vmware.com/vsphere/2013/01/vsphere-5-1-vmdk-versus-rdm.html

 
> I cannot stress emphatically enough that you should not stitch 2TB VMDKs
> together and use them in the manner you described.  This is a recipe for
> disaster.  Find another solution.

I'm seeing more and more requests for VMs with large disks lately in my
environment.  Right now the maximum is ~2 TB.  I'm also thinking about
where to go from here; > 2 TB is only possible with pRDMs, which can't be
snapshotted.  You have to use the snapshot features of your storage array.

Ralf


* Re: Help with XFS in VMs on VMFS
  2013-03-28 21:45   ` Ralf Gross
@ 2013-03-28 22:13     ` Emmanuel Florac
  2013-03-29 14:23       ` Ralf Gross
  2013-03-29  0:56     ` Stan Hoeppner
  1 sibling, 1 reply; 11+ messages in thread
From: Emmanuel Florac @ 2013-03-28 22:13 UTC (permalink / raw)
  To: Ralf Gross; +Cc: xfs

On Thu, 28 Mar 2013 22:45:50 +0100, you wrote:

> I'm seeing more and more requests for VMs with large disks lately in
> my env. Right now the max. is ~2 TB. I'm also thinking about where to
> go,
>  > 2 TB is only possible with pRDMs, which can't be snapshotted. You  
> have to use the snapshot features of your storage array.

Maybe you could give LVM snapshots another try.  They have gotten better recently.
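
A minimal sketch of what Emmanuel is suggesting, assuming an LVM layout
with free extents left in the volume group; the LV and snapshot names are
placeholders.

```shell
# Create a copy-on-write snapshot of the origin LV.  The snapshot needs
# enough space (-L) to absorb writes made while it exists.
lvcreate -s -L 50G -n lv_data_snap /dev/vg_data/lv_data

# To revert the origin to the snapshot's contents later:
lvconvert --merge /dev/vg_data/lv_data_snap

# Or simply discard the snapshot when it is no longer needed:
# lvremove /dev/vg_data/lv_data_snap
```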

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Help with XFS in VMs on VMFS
  2013-03-28 21:45   ` Ralf Gross
  2013-03-28 22:13     ` Emmanuel Florac
@ 2013-03-29  0:56     ` Stan Hoeppner
  2013-03-29  3:30       ` Jan Perci
  1 sibling, 1 reply; 11+ messages in thread
From: Stan Hoeppner @ 2013-03-29  0:56 UTC (permalink / raw)
  To: xfs

On 3/28/2013 4:45 PM, Ralf Gross wrote:
> Stan Hoeppner schrieb:

> Snapshots are possible with RDMs in virtual compatibility mode, not
> physical mode (> 2 TB).

So 2TB is the kicker here.  I haven't used ESX since 3.x, and none of
our RDMs back then were close to 2TB.  IIRC our largest was 500GB.

>> VMFS volumes are not intended for high performance IO.  Unless things
>> have changed recently, VMware has always recommended housing only OS
>> images and the like in VMDKs, not user data.  They've always recommended
>> using RDMs for everything else.  IIRC VMDKs have a huge block (sector)
>> size, something like 1MB.  That's going to make XFS alignment difficult,
>> if not impossible.
> 
> I can't remember that I've ever found this recommendation on a vmware
> page.
> 
> http://blogs.vmware.com/vsphere/2013/01/vsphere-5-1-vmdk-versus-rdm.html

If you drill down through that you find this:
http://www.vmware.com/files/pdf/performance_char_vmfs_rdm.pdf

RDMs have better large sequential performance, and lower CPU burn than
VMDKs.  The OP mentioned "compute node" in his post, which suggests an
HPC application workload, which suggests large sequential IO.

Also note that VMware is Microsoft centric so they always run their
tests using an MS Server guest.  Also note they always test with tiny
volumes, in this case 20GB.  NTFS isn't going to have any trouble at
this size, but at say 20TB it probably will and these published results
would likely be quite different at that scale.  XFS performance
characteristics on a 2TB or 20TB or ?? TB volume will likely be
substantially different than NTFS.  Their tests show 5-8% lower CPU burn
for RDM vs VMDK.  Not a huge difference, but again they're testing only
20GB.

>> I cannot stress emphatically enough that you should not stitch 2TB VMDKs
>> together and use them in the manner you described.  This is a recipe for
>> disaster.  Find another solution.
> 
> I'm seeing more and more requests for VMs with large disks lately in my
> env. Right now the max. is ~2 TB. I'm also thinking about where to go,
>  > 2 TB is only possible with pRDMs, which can't be snapshotted. You
> have to use the snapshot features of your storage array.

And more and more folks are using midrange FC/iSCSI arrays that don't
have snapshot features, others are using DAS with RAID HBAs, in both
cases forcing them to rely on ESX snapshots.  Sounds like VMware needs
to bump this artificial 2TB limit quite a bit higher.

-- 
Stan


* Re: Help with XFS in VMs on VMFS
  2013-03-29  0:56     ` Stan Hoeppner
@ 2013-03-29  3:30       ` Jan Perci
  2013-03-29 20:27         ` Ben Myers
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Perci @ 2013-03-29  3:30 UTC (permalink / raw)
  To: xfs


Thank you for your responses.  Since this list is for XFS, I do not wish to
go off topic too far into VMs.  But I will provide more context.

A key factor is the need for >2TB file systems that can be snapshotted and
reverted quickly.  We have other FC arrays attached to compute nodes
without this requirement, and they have XFS directly on the FC logical
volumes made accessible to native nodes and VM nodes via RDM.

Our FC arrays do not have native snapshot features, so we must use a
software layer whether that is Linux LVM, ESXi, or something else.  And
because of our unique usage patterns and constraints, we have settled on
VMware over other virtualization technologies.  We are using ESXi (free
version) but can upgrade to ESX if necessary.  However, the upgrade
wouldn't fix the 2TB snapshot limit.

We are certainly not in the true HPC realm, but we do have about 20
physical compute nodes that do both random and sequential I/O.  An example
query might identify a 10-500GB data set consisting of 100-500KB files.
Some work sets are processor bound, with disk I/O accounting for less than
5%.  However, others are spending about 50% on disk I/O, so improving
performance would be helpful - again in the context of the snapshot
requirement.

Point well understood about the risks of striping multiple 2TB VMDK files
together.  But because of the constraints, it's either 2TB VMDKs or 2TB
RDMs in virtual compatibility mode, and they both seem about equally
risky.  Do you have better suggestions?

Back to XFS, in this context, is there any benefit in tuning some
parameters to get better performance, or will it all be so overshadowed
by the poor performance of the VMDKs that tuning isn't worthwhile?

Jan.



* Re: Help with XFS in VMs on VMFS
  2013-03-28 22:13     ` Emmanuel Florac
@ 2013-03-29 14:23       ` Ralf Gross
  0 siblings, 0 replies; 11+ messages in thread
From: Ralf Gross @ 2013-03-29 14:23 UTC (permalink / raw)
  To: xfs

Emmanuel Florac schrieb:
> On Thu, 28 Mar 2013 22:45:50 +0100, you wrote:
> 
> > I'm seeing more and more requests for VMs with large disks lately in
> > my env. Right now the max. is ~2 TB. I'm also thinking about where to
> > go,
> >  > 2 TB is only possible with pRDMs, which can't be snapshotted. You  
> > have to use the snapshot features of your storage array.
> 
> Maybe you could give LVM snapshot a new try. They got better recently.

I need the bigger disks for win VMs ;)

Ralf


* Re: Help with XFS in VMs on VMFS
  2013-03-29  3:30       ` Jan Perci
@ 2013-03-29 20:27         ` Ben Myers
  2013-03-30 19:12           ` Stan Hoeppner
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Myers @ 2013-03-29 20:27 UTC (permalink / raw)
  To: Jan Perci; +Cc: xfs

Hi Jan,

On Thu, Mar 28, 2013 at 11:30:01PM -0400, Jan Perci wrote:
> Back to XFS, in this context, is there any benefit in tuning some
> parameters to get better performance, or will it all be so overshadowed
> by the poor performance of the VMDKs that tuning isn't worthwhile?

At least get your stripe unit and width correct.
http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance

Beyond that I suggest you stick with the defaults unless you have a specific
need.  e.g. heavy usage of extended attributes might prompt you to use a larger
inode size to keep them inline.

Regards,
Ben
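
For the array described at the top of the thread (RAID-6, 14 data + 2
parity spindles, 64 KiB stripe unit), the FAQ's arithmetic works out as
sketched below; the device path is a placeholder, and whether the alignment
actually survives the VMFS/VMDK layers is a separate question.

```shell
# RAID-6 14+2 with a 64 KiB stripe unit: 14 data-bearing disks.
stripe_unit_kib=64
data_disks=14

# mkfs.xfs takes sunit/swidth in 512-byte sectors.
sunit=$(( stripe_unit_kib * 1024 / 512 ))   # 64 KiB = 128 sectors
swidth=$(( sunit * data_disks ))            # 128 * 14 = 1792 sectors

echo "mkfs.xfs -d sunit=${sunit},swidth=${swidth} /dev/vg_data/lv_data"
```

Equivalently, mkfs.xfs accepts the byte-based forms su=64k and sw=14.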


* Re: Help with XFS in VMs on VMFS
  2013-03-29 20:27         ` Ben Myers
@ 2013-03-30 19:12           ` Stan Hoeppner
  2013-03-31  2:04             ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Stan Hoeppner @ 2013-03-30 19:12 UTC (permalink / raw)
  To: Ben Myers; +Cc: Jan Perci, xfs

On 3/29/2013 3:27 PM, Ben Myers wrote:
> Hi Jan,
> 
> On Thu, Mar 28, 2013 at 11:30:01PM -0400, Jan Perci wrote:
>> Back to XFS, in this context, is there any benefit in tuning some
>> parameters to get better performance, or will it all be so overshadowed
>> by the poor performance of the VMDKs that tuning isn't worthwhile?
> 
> At least get your stripe unit and width correct.
> http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance

Is this really a good idea given that XFS sits atop a virtual disk which
consists of multiple concatenated 2TB sparse files sitting on the VMFS
filesystem, which, IIRC, has a 1MB sector size?  Thus can one rely on
XFS being able to properly align to the physical RAID stripe, even if
the math is done 'properly' (if that's even possible here)?

In a complex stack like this I'd recommend defaults across the board.
Misalignment hurts performance far more than proper alignment increases
it.  Using no alignment is agnostic: 4KB I/Os only, so you neither gain
nor lose.
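
Whichever way you go, you can check what geometry mkfs actually recorded;
with defaults on a stack like this, sunit/swidth should come out as 0, i.e.
no alignment assumptions.  The mount point here is a placeholder.

```shell
# Report the filesystem geometry XFS recorded at mkfs time.  In the
# "no alignment" case, expect sunit=0 and swidth=0 in the output.
xfs_info /data
```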

> Beyond that I suggest you stick with the defaults unless you have a specific
> need.  e.g. heavy usage of extended attributes might prompt you to use a larger
> inode size to keep them inline.

-- 
Stan


* Re: Help with XFS in VMs on VMFS
  2013-03-30 19:12           ` Stan Hoeppner
@ 2013-03-31  2:04             ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2013-03-31  2:04 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Ben Myers, Jan Perci, xfs

On Sat, Mar 30, 2013 at 02:12:36PM -0500, Stan Hoeppner wrote:
> On 3/29/2013 3:27 PM, Ben Myers wrote:
> > Hi Jan,
> > 
> > On Thu, Mar 28, 2013 at 11:30:01PM -0400, Jan Perci wrote:
> >> Back to XFS, in this context, is there any benefit in tuning some
> >> parameters to get better performance, or will it all be so overshadowed
> >> by the poor performance of the VMDKs that tuning isn't worthwhile?
> > 
> > At least get your stripe unit and width correct.
> > http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance
> 
> Is this really a good idea given that XFS sits atop a virtual disk which
> consists of multiple concatenated 2TB sparse files sitting on the VMFS
> filesystem, which, IIRC, has a 1MB sector size?  Thus can one rely on
> XFS being able to properly align to the physical RAID stripe, even if
> the math is done 'properly' (if that's even possible here)?

No, because VMFS doesn't do any specific alignment to the underlying
storage geometry.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


