From: Dave Chinner <david@fromorbit.com>
To: tm@tao.ma
Cc: Christoph Hellwig <hch@infradead.org>,
	xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: XFS status update for January 2011
Date: Tue, 15 Feb 2011 10:55:17 +1100
Message-ID: <20110214235517.GC13052@dastard>
In-Reply-To: <04650f485d5814ddfe6893a3a7c8429b.squirrel@box585.bluehost.com>

On Mon, Feb 14, 2011 at 08:20:18AM -0700, tm@tao.ma wrote:
> Hi Dave,
> > On Mon, Feb 14, 2011 at 10:17:26AM +0800, Tao Ma wrote:
> >> Hi Christoph,
> >> On 02/14/2011 02:42 AM, Christoph Hellwig wrote:
> >> >On the 4th of January we saw the release of Linux 2.6.37, which contains a
> >> >large XFS update:
> >> >
> >> >     67 files changed, 1424 insertions(+), 1524 deletions(-)
> >> >
> >> >User visible changes are the new XFS_IOC_ZERO_RANGE ioctl which allows
> >> >converting already allocated space into unwritten extents that return
> >> >zeros on a read,
> >> Would you mind describing some scenarios in which this ioctl can be used? I am
> >> just wondering whether ocfs2 can implement it as well.
> >
> > Zeroing a file without doing IO or having to punch out the blocks
> > already allocated to the file.
> >
> > In this case, we had a couple of different people in cloud storage
> > land asking for such functionality to optimise record deletion
> > by avoiding disruption of their preallocated file layouts as a
> > punch-then-preallocate operation does.
> Thanks for the info. Yeah, ocfs2 is also used to host images in some cloud
> computing environments, so it looks helpful for us too.

Just to be clear, this optimisation isn't relevant for hosting VM
images in a cloud compute environment - this was added for
optimising the back end of distributed storage applications that
hold tens of millions of records and tens of TB of data per back end
storage host.

Hosting VM images is largely static, especially if you are
preallocating them - they never, ever get punched. Even if you are
using thin provisioning semantics and punching TRIMmed ranges, you
aren't converting the TRIMmed ranges back to preallocated state so
you wouldn't be using this interface. Hence I don't see this as
something that you would use in such an environment.

The distributed storage applications that this was added for
required atomic record deletes from the back end, and the fastest and
safest way to do that was to turn the record being deleted back into
unwritten extents. This allows that operation to be done atomically
by the filesystem whilst providing simple recovery semantics to the
application. Previously that was done with a punch-then-preallocate
operation, which fragmented the preallocated files; the
XFS_IOC_ZERO_RANGE ioctl simply prevents that fragmentation and
allows the back end to scale to much larger record stores...
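As an illustration only (not code from this thread or from the applications mentioned), the record delete described above might be sketched in C on a current Linux system. The sketch assumes the generic fallocate(2) flag FALLOC_FL_ZERO_RANGE (available since Linux 3.15, i.e. after this message) as a stand-in for XFS_IOC_ZERO_RANGE; the two are not guaranteed to be implemented identically, and how a filesystem handles the blocks underneath is up to that filesystem. The helper names delete_record, write_zeros and unsupported are hypothetical. If zero-range is rejected, it falls back to the punch-then-preallocate sequence, and finally to writing zeros.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>		/* fallocate() and FALLOC_FL_* with _GNU_SOURCE */
#include <linux/falloc.h>	/* FALLOC_FL_ZERO_RANGE with older libc headers */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: treat "mode not implemented" alike whichever errno says so. */
static int unsupported(int err)
{
	return err == EOPNOTSUPP || err == ENOSYS;
}

/* Hypothetical last resort: zero the range by actually doing the write IO. */
static int write_zeros(int fd, off_t off, off_t len)
{
	char buf[4096];

	memset(buf, 0, sizeof(buf));
	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
		ssize_t n = pwrite(fd, buf, chunk, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		off += n;
		len -= n;
	}
	return 0;
}

/*
 * Hypothetical helper: "delete" the record stored at [off, off + len) so
 * that it reads back as zeros while the file size stays the same.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int delete_record(int fd, off_t off, off_t len)
{
	/* 1. Let the filesystem zero the range itself. fallocate(2) says this
	 *    is preferably done by converting the range to unwritten extents;
	 *    the exact block handling is filesystem-specific. */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE, off, len) == 0)
		return 0;
	if (!unsupported(errno))
		return -1;

	/* 2. Punch-then-preallocate: free the blocks, then allocate new
	 *    unwritten ones. The new blocks may come from elsewhere on disk,
	 *    which is the kind of fragmentation the message describes. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len) == 0)
		return fallocate(fd, FALLOC_FL_KEEP_SIZE, off, len);
	if (!unsupported(errno))
		return -1;

	/* 3. Neither mode is available: zero the range with ordinary writes. */
	return write_zeros(fd, off, len);
}
```

The fallbacks are ordered so that punch-then-preallocate is only reached when the filesystem rejects zero-range; where zero-range is supported, the record is deleted with a single call.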

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
