From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
Christoph Hellwig <hch@infradead.org>,
Jane Chu <jane.chu@oracle.com>,
linux-xfs <linux-xfs@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 3/5] vfs: add a zero-initialization mode to fallocate
Date: Wed, 22 Sep 2021 14:27:25 -0700 [thread overview]
Message-ID: <20210922212725.GN570615@magnolia> (raw)
In-Reply-To: <20210922054931.GT1756565@dread.disaster.area>
On Wed, Sep 22, 2021 at 03:49:31PM +1000, Dave Chinner wrote:
> On Tue, Sep 21, 2021 at 09:13:54PM -0700, Darrick J. Wong wrote:
> > On Wed, Sep 22, 2021 at 01:59:07PM +1000, Dave Chinner wrote:
> > > On Tue, Sep 21, 2021 at 07:38:01PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Sep 21, 2021 at 07:16:26PM -0700, Dan Williams wrote:
> > > > > On Tue, Sep 21, 2021 at 1:32 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > > > >
> > > > > > On Tue, Sep 21, 2021 at 10:44:31AM +1000, Dave Chinner wrote:
> > > > > > > I think this wants to be a behavioural modifier for existing
> > > > > > > operations rather than an operation unto itself. i.e. similar to how
> > > > > > > KEEP_SIZE modifies ALLOC behaviour but doesn't fundamentally alter
> > > > > > > the guarantees ALLOC provides userspace.
> > > > > > >
> > > > > > > In this case, the change of behaviour over ZERO_RANGE is that we
> > > > > > > want physical zeros to be written instead of the filesystem
> > > > > > > optimising away the physical zeros by manipulating the layout
> > > > > > > of the file.
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > Then we have and API that looks like:
> > > > > > >
> > > > > > > ALLOC - allocate space efficiently
> > > > > > > ALLOC | INIT - allocate space by writing zeros to it
> > > > > > > ZERO - zero data and preallocate space efficiently
> > > > > > > ZERO | INIT - zero range by writing zeros to it
> > > > > > >
> > > > > > > Which seems to cater for all the cases I know of where physically
> > > > > > > writing zeros instead of allocating unwritten extents is the
> > > > > > > preferred behaviour of fallocate()....
> > > > > >
> > > > > > Agreed. I'm not sure INIT is really the right name, but I can't come
> > > > > > up with a better idea offhand.
> > > > >
> > > > > FUA? As in, this is a forced-unit-access zeroing all the way to media
> > > > > bypassing any mechanisms to emulate zero-filled payloads on future
> > > > > reads.
> > >
> > > Yes, that's the semantic we want, but FUA already defines specific
> > > data integrity behaviour in the storage stack w.r.t. volatile
> > > caches.
> > >
> > > Also, FUA is associated with devices - it's low level storage jargon
> > > and so is not really appropriate to call a user interface operation
> > > FUA where users have no idea what a "unit" or "access" actually
> > > means.
> > >
> > > Hence we should not overload this name with some other operation
> > > that does not have (and should not have) explicit data integrity
> > > requirements. That will just cause confusion for everyone.
> > >
> > > > FALLOC_FL_ZERO_EXISTING, because you want to zero the storage that
> > > > already exists at that file range?
> > >
> > > IMO that doesn't work as a behavioural modifier for ALLOC because
> > > the ALLOC semantics are explicitly "don't touch existing user
> > > data"...
> >
> > Well since you can't preallocate /and/ zerorange at the same time...
> >
> > /* For FALLOC_FL_ZERO_RANGE, write zeroes to pre-existing mapped storage. */
> > #define FALLOC_FL_ZERO_EXISTING (0x80)
>
> Except we also want the newly allocated regions (i.e. where holes
> were) in that range being zeroed to have zeroes written to them as
> well, yes? Otherwise we end up with a combination of unwritten
> extents and physical zeroes, and you can't use
> ZERORANGE|EXISTING as a replacement for PUNCH + ALLOC|INIT
>
> /*
> * For preallocation and zeroing operations, force the filesystem to
> * write zeroes rather than use unwritten extents to indicate the
> * range contains zeroes.
> *
> * For filesystems that support unwritten extents, this trades off
> * slow fallocate performance for faster first write performance as
> * unwritten extent conversion on the first write to each block in
> * the range is not needed.
> *
> * Care is required when using FALLOC_FL_ALLOC_INIT_DATA as it will
> * be much slower overall for large ranges and/or slow storage
> * compared to using unwritten extents.
> */
> #define FALLOC_FL_ALLOC_INIT_DATA (1 << 7)
I prefer FALLOC_FL_ZEROINIT_DATA here, because in the ZERO|INIT case
we're not allocating any new space, merely rewriting existing storage.
I also want to expand the description slightly:
/*
* For preallocation, force the filesystem to write zeroes rather than
* use unwritten extents to indicate the range contains zeroes. For
* zeroing operations, force the filesystem to write zeroes to existing
* written extents.
--D
>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
next prev parent reply other threads:[~2021-09-22 21:27 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-18 1:30 [PATCHSET RFC v2 jane 0/5] vfs: enable userspace to reset damaged file storage Darrick J. Wong
2021-09-18 1:30 ` [PATCH 1/5] dax: prepare pmem for use by zero-initializing contents and clearing poisons Darrick J. Wong
2021-09-18 16:54 ` riteshh
2021-09-20 17:22 ` Darrick J. Wong
2021-09-21 4:07 ` riteshh
2021-09-22 18:26 ` Darrick J. Wong
2021-09-22 19:47 ` riteshh
2021-09-22 20:26 ` Dan Williams
2021-09-21 8:34 ` Christoph Hellwig
2021-09-22 18:10 ` Darrick J. Wong
2021-09-18 1:30 ` [PATCH 2/5] iomap: use accelerated zeroing on a block device to zero a file range Darrick J. Wong
2021-09-18 16:55 ` riteshh
2021-09-21 8:29 ` Christoph Hellwig
2021-09-22 18:53 ` Darrick J. Wong
2021-09-21 22:33 ` Dave Chinner
2021-09-22 18:54 ` Darrick J. Wong
2021-09-18 1:31 ` [PATCH 3/5] vfs: add a zero-initialization mode to fallocate Darrick J. Wong
2021-09-18 16:58 ` riteshh
2021-09-20 17:52 ` Eric Biggers
2021-09-20 18:06 ` Darrick J. Wong
2021-09-21 0:44 ` Dave Chinner
2021-09-21 8:31 ` Christoph Hellwig
2021-09-22 2:16 ` Dan Williams
2021-09-22 2:38 ` Darrick J. Wong
2021-09-22 3:59 ` Dave Chinner
2021-09-22 4:13 ` Darrick J. Wong
2021-09-22 5:49 ` Dave Chinner
2021-09-22 21:27 ` Darrick J. Wong [this message]
2021-09-23 0:02 ` Darrick J. Wong
2021-09-23 0:44 ` Darrick J. Wong
2021-09-23 1:42 ` Dave Chinner
2021-09-23 2:43 ` Dan Williams
2021-09-23 5:42 ` Dan Williams
2021-09-23 22:54 ` Dave Chinner
2021-09-24 1:18 ` Dan Williams
2021-09-24 1:21 ` Jane Chu
2021-09-24 1:35 ` Darrick J. Wong
2021-09-27 21:07 ` Dave Chinner
2021-09-27 21:57 ` Jane Chu
2021-09-28 0:08 ` Dan Williams
2021-09-22 5:28 ` riteshh
2021-09-18 1:31 ` [PATCH 4/5] xfs: implement FALLOC_FL_ZEROINIT_RANGE Darrick J. Wong
2021-09-18 1:31 ` [PATCH 5/5] ext4: " Darrick J. Wong
2021-09-18 17:07 ` riteshh
2021-09-20 18:11 ` Darrick J. Wong
2021-09-21 6:10 ` riteshh
2021-09-18 18:05 ` [PATCHSET RFC v2 jane 0/5] vfs: enable userspace to reset damaged file storage Dan Williams
2021-09-23 0:51 ` Darrick J. Wong
2021-09-23 1:17 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210922212725.GN570615@magnolia \
--to=djwong@kernel.org \
--cc=dan.j.williams@intel.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=jane.chu@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).