All of lore.kernel.org
 help / color / mirror / Atom feed
From: riteshh <riteshh@linux.ibm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	jane.chu@oracle.com, linux-xfs@vger.kernel.org,
	hch@infradead.org, dan.j.williams@intel.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 3/5] vfs: add a zero-initialization mode to fallocate
Date: Wed, 22 Sep 2021 10:58:18 +0530	[thread overview]
Message-ID: <20210922052818.rszl76zkmx2tbgu2@riteshh-domain> (raw)
In-Reply-To: <20210921004431.GO1756565@dread.disaster.area>

On 21/09/21 10:44AM, Dave Chinner wrote:
> On Fri, Sep 17, 2021 at 06:31:01PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Add a new mode to fallocate to zero-initialize all the storage backing a
> > file.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/open.c                   |    5 +++++
> >  include/linux/falloc.h      |    1 +
> >  include/uapi/linux/falloc.h |    9 +++++++++
> >  3 files changed, 15 insertions(+)
> >
> >
> > diff --git a/fs/open.c b/fs/open.c
> > index daa324606a41..230220b8f67a 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -256,6 +256,11 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >  	    (mode & ~FALLOC_FL_INSERT_RANGE))
> >  		return -EINVAL;
> >
> > +	/* Zeroinit should only be used by itself and keep size must be set. */
> > +	if ((mode & FALLOC_FL_ZEROINIT_RANGE) &&
> > +	    (mode != (FALLOC_FL_ZEROINIT_RANGE | FALLOC_FL_KEEP_SIZE)))
> > +		return -EINVAL;
> > +
> >  	/* Unshare range should only be used with allocate mode. */
> >  	if ((mode & FALLOC_FL_UNSHARE_RANGE) &&
> >  	    (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
> > diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> > index f3f0b97b1675..4597b416667b 100644
> > --- a/include/linux/falloc.h
> > +++ b/include/linux/falloc.h
> > @@ -29,6 +29,7 @@ struct space_resv {
> >  					 FALLOC_FL_PUNCH_HOLE |		\
> >  					 FALLOC_FL_COLLAPSE_RANGE |	\
> >  					 FALLOC_FL_ZERO_RANGE |		\
> > +					 FALLOC_FL_ZEROINIT_RANGE |	\
> >  					 FALLOC_FL_INSERT_RANGE |	\
> >  					 FALLOC_FL_UNSHARE_RANGE)
> >
> > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> > index 51398fa57f6c..8144403b6102 100644
> > --- a/include/uapi/linux/falloc.h
> > +++ b/include/uapi/linux/falloc.h
> > @@ -77,4 +77,13 @@
> >   */
> >  #define FALLOC_FL_UNSHARE_RANGE		0x40
> >
> > +/*
> > + * FALLOC_FL_ZEROINIT_RANGE is used to reinitialize storage backing a file by
> > + * writing zeros to it.  Subsequent read and writes should not fail due to any
> > + * previous media errors.  Blocks must be not be shared or require copy on
> > + * write.  Holes and unwritten extents are left untouched.  This mode must be
> > + * used with FALLOC_FL_KEEP_SIZE.
> > + */
> > +#define FALLOC_FL_ZEROINIT_RANGE	0x80
>
> Hmmmm.
>
> I think this wants to be a behavioural modifier for existing
> operations rather than an operation unto itself. i.e. similar to how
> KEEP_SIZE modifies ALLOC behaviour but doesn't fundamentally alter
> the guarantees ALLOC provides userspace.
>
> In this case, the change of behaviour over ZERO_RANGE is that we
> want physical zeros to be written instead of the filesystem
> optimising away the physical zeros by manipulating the layout
> of the file.
>
> There's been requests in the past for a way to make ALLOC also
> behave like this - in the case that users want fully allocated space
> to be preallocated so their applications don't take unwritten extent
> conversion penalties on first writes. Databases are an example here,
> where setup of a new WAL file isn't performance critical, but writes
> to the WAL are and the WAL files are write-once. Hence they always
> take unwritten conversion penalties and the only way around that is
> to physically zero the files before use...
>
> So it seems to me what we actually need here is a "write zeroes"
> modifier to fallocate() operations to tell the filesystem that the
> application really wants it to write zeroes over that range, not
> just guarantee space has been physically allocated....
>
> Then we have and API that looks like:
>
> 	ALLOC		- allocate space efficiently
> 	ALLOC | INIT	- allocate space by writing zeros to it
> 	ZERO		- zero data and preallocate space efficiently
> 	ZERO | INIT	- zero range by writing zeros to it
>
> Which seems to cater for all the cases I know of where physically
> writing zeros instead of allocating unwritten extents is the
> preferred behaviour of fallocate()....
>

If that's the case we can just have FALLOC_FL_ZEROWRITE_RANGE?
Where FALLOC_FL_ZERO_RANGE & FALLOC_FL_ZEROWRITE_RANGE are mutually exclusive.

AFAIU,
/* FALLOC_FL_ZERO_RANGE may optimize the underlying blocks with unwritten
 * extents if the filesystem allows so, but with FALLOC_FL_ZEROWRITE_RANGE,
 * the underlying blocks are guranteed to be written with zeros.
 * In case of hole it will be preallocated with written extents and will be
 * initialized with zeroes. If FALLOC_FL_KEEP_SIZE is specified then the
 * inode size will remain the same.
 *
 * Essentially similar to FALLOC_FL_ZERO_RANGE but with gurantees that
 * underlying storage has written extents initialized with zeroes.
 */
#define FALLOC_FL_ZEROWRITE_RANGE		0x80

Does that make sense?

-ritesh

  parent reply	other threads:[~2021-09-22  5:29 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-18  1:30 [PATCHSET RFC v2 jane 0/5] vfs: enable userspace to reset damaged file storage Darrick J. Wong
2021-09-18  1:30 ` [PATCH 1/5] dax: prepare pmem for use by zero-initializing contents and clearing poisons Darrick J. Wong
2021-09-18 16:54   ` riteshh
2021-09-20 17:22     ` Darrick J. Wong
2021-09-21  4:07       ` riteshh
2021-09-22 18:26         ` Darrick J. Wong
2021-09-22 19:47           ` riteshh
2021-09-22 20:26           ` Dan Williams
2021-09-21  8:34   ` Christoph Hellwig
2021-09-22 18:10     ` Darrick J. Wong
2021-09-18  1:30 ` [PATCH 2/5] iomap: use accelerated zeroing on a block device to zero a file range Darrick J. Wong
2021-09-18 16:55   ` riteshh
2021-09-21  8:29   ` Christoph Hellwig
2021-09-22 18:53     ` Darrick J. Wong
2021-09-21 22:33   ` Dave Chinner
2021-09-22 18:54     ` Darrick J. Wong
2021-09-18  1:31 ` [PATCH 3/5] vfs: add a zero-initialization mode to fallocate Darrick J. Wong
2021-09-18 16:58   ` riteshh
2021-09-20 17:52   ` Eric Biggers
2021-09-20 18:06     ` Darrick J. Wong
2021-09-21  0:44   ` Dave Chinner
2021-09-21  8:31     ` Christoph Hellwig
2021-09-22  2:16       ` Dan Williams
2021-09-22  2:38         ` Darrick J. Wong
2021-09-22  3:59           ` Dave Chinner
2021-09-22  4:13             ` Darrick J. Wong
2021-09-22  5:49               ` Dave Chinner
2021-09-22 21:27                 ` Darrick J. Wong
2021-09-23  0:02                   ` Darrick J. Wong
2021-09-23  0:44                     ` Darrick J. Wong
2021-09-23  1:42                     ` Dave Chinner
2021-09-23  2:43                       ` Dan Williams
2021-09-23  5:42                         ` Dan Williams
2021-09-23 22:54                           ` Dave Chinner
2021-09-24  1:18                             ` Dan Williams
2021-09-24  1:21                               ` Jane Chu
2021-09-24  1:35                                 ` Darrick J. Wong
2021-09-27 21:07                                   ` Dave Chinner
2021-09-27 21:57                                     ` Jane Chu
2021-09-28  0:08                                       ` Dan Williams
2021-09-22  5:28     ` riteshh [this message]
2021-09-18  1:31 ` [PATCH 4/5] xfs: implement FALLOC_FL_ZEROINIT_RANGE Darrick J. Wong
2021-09-18  1:31 ` [PATCH 5/5] ext4: " Darrick J. Wong
2021-09-18 17:07   ` riteshh
2021-09-20 18:11     ` Darrick J. Wong
2021-09-21  6:10       ` riteshh
2021-09-18 18:05 ` [PATCHSET RFC v2 jane 0/5] vfs: enable userspace to reset damaged file storage Dan Williams
2021-09-23  0:51 ` Darrick J. Wong
2021-09-23  1:17   ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210922052818.rszl76zkmx2tbgu2@riteshh-domain \
    --to=riteshh@linux.ibm.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=jane.chu@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.