linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCHSET RFC v3 00/18] xfs: atomic file updates
Date: Thu, 1 Apr 2021 17:22:06 -0700	[thread overview]
Message-ID: <20210402002206.GF4090233@magnolia> (raw)
In-Reply-To: <CAOQ4uxhPzTK=DvUxTGV0KXnqWNsumjm1UbSiFgXbdogbzyd29w@mail.gmail.com>

On Thu, Apr 01, 2021 at 06:56:20AM +0300, Amir Goldstein wrote:
> On Thu, Apr 1, 2021 at 4:14 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > Hi all,
> >
> > This series creates a new FIEXCHANGE_RANGE system call to exchange
> > ranges of bytes between two files atomically.  This new functionality
> > enables data storage programs to stage and commit file updates such that
> > reader programs will see either the old contents or the new contents in
> > their entirety, with no chance of torn writes.  A successful call
> > completion guarantees that the new contents will be seen even if the
> > system fails.
> >
> > User programs will be able to update files atomically by opening an
> > O_TMPFILE, reflinking the source file to it, making whatever updates
> > they want to make, and exchange the relevant ranges of the temp file
> > with the original file.  If the updates are aligned with the file block
> > size, a new (since v2) flag provides for exchanging only the written
> > areas.  Callers can arrange for the update to be rejected if the
> > original file has been changed.
> >
> > The intent behind this new userspace functionality is to enable atomic
> > rewrites of arbitrary parts of individual files.  For years, application
> > programmers wanting to ensure the atomicity of a file update had to
> > write the changes to a new file in the same directory, fsync the new
> > file, rename the new file on top of the old filename, and then fsync the
> > directory.  People get it wrong all the time, and $fs hacks abound.
> > Here is the proposed manual page:
> >
> 
> I like the idea of modernizing FIEXCHANGE_RANGE very much and
> I think that the improved implementation and new(?) flags will be very
> useful just the way you designed them, but maybe something to consider...
> 
> Taking a step back and ignoring the existing xfs ioctl, all the use cases
> that you listed actually want MOVE_RANGE not exchange range.
> No listed use case does anything with the old data except dump it in the
> trash bin. Right?

The three listed in the manpage don't do anything with the blocks.

However, there is usecase #4: online filesystem repair, where we want to
be able to construct a new metadata file/directory/xattr tree, exchange
the new contents with the old, and still have the old contents attached
to the file so that we can (very carefully) tear down the internal
buffer caches and other.  For /that/ use case, we require truncation to
be a separate step.

> I do realize that implementing atomic extent exchange was easier back
> when that ioctl was implemented for xfs and ext4 and I realize that
> deferring inode unlink was much simpler to implement than deferred
> extent freeing, but seeing how punch hole and dedupe range already
> need to deal with freeing target inode extents, it is not obvious to me that
> atomic freeing the target inode extents instead of exchange is a bad idea
> (given the appropriate opt-in flags).
> 
> Is there a good reason for keeping the "freeing old blocks with unlink"
> strategy the only option?

Making userspace take the extra step of deciding what to do with the
tempfile (and when!) after the operation reduces the amount of work that
has to be done in the hot path, since we know that the only work we need
to do is switch the mappings (and the reverse mappings).

If this became a move operation where we drop the file2 blocks, it would
be necessary to traverse the refcount btree to see if the blocks are
shared, update the refcount btree, and possibly update the free space
btrees as well.  The current design permits us to skip all that, which
is all the more useful if the operation is synchronous.

Consider also that inactivation of inodes will soon become a background
operation in XFS, which means that userspace soon won't even have to
wait for that part.

--D

> 
> Thanks,
> Amir.

      reply	other threads:[~2021-04-02  0:22 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01  1:08 [PATCHSET RFC v3 00/18] xfs: atomic file updates Darrick J. Wong
2021-04-01  1:08 ` [PATCH 01/18] vfs: introduce new file range exchange ioctl Darrick J. Wong
2021-04-01  1:44   ` Al Viro
2021-04-01 21:18     ` Darrick J. Wong
2021-04-01  3:32   ` Amir Goldstein
2021-04-02  0:37     ` Darrick J. Wong
2021-04-01  1:08 ` [PATCH 02/18] xfs: support two inodes in the defer capture structure Darrick J. Wong
2021-04-02 23:20   ` Allison Henderson
2021-04-01  1:09 ` [PATCH 03/18] xfs: allow setting and clearing of log incompat feature flags Darrick J. Wong
2021-04-02 23:20   ` Allison Henderson
2021-04-01  1:09 ` [PATCH 04/18] xfs: clear log incompat feature bits when the log is idle Darrick J. Wong
2021-04-02 23:20   ` Allison Henderson
2021-04-01  1:09 ` [PATCH 05/18] xfs: create a log incompat flag for atomic extent swapping Darrick J. Wong
2021-04-02 23:21   ` Allison Henderson
2021-04-01  1:09 ` [PATCH 06/18] xfs: introduce a swap-extent log intent item Darrick J. Wong
2021-04-05 23:08   ` Allison Henderson
2021-04-01  1:09 ` [PATCH 07/18] xfs: create deferred log items for extent swapping Darrick J. Wong
2021-04-01  1:09 ` [PATCH 08/18] xfs: add a ->xchg_file_range handler Darrick J. Wong
2021-04-01  1:09 ` [PATCH 09/18] xfs: add error injection to test swapext recovery Darrick J. Wong
2021-04-01  1:09 ` [PATCH 10/18] xfs: port xfs_swap_extents_rmap to our new code Darrick J. Wong
2021-04-01  1:09 ` [PATCH 11/18] xfs: consolidate all of the xfs_swap_extent_forks code Darrick J. Wong
2021-04-01  1:09 ` [PATCH 12/18] xfs: refactor reflink flag handling in xfs_swap_extent_forks Darrick J. Wong
2021-04-01  1:09 ` [PATCH 13/18] xfs: allow xfs_swap_range to use older extent swap algorithms Darrick J. Wong
2021-04-01  1:10 ` [PATCH 14/18] xfs: remove old swap extents implementation Darrick J. Wong
2021-04-01  1:10 ` [PATCH 15/18] xfs: condense extended attributes after an atomic swap Darrick J. Wong
2021-04-01  1:10 ` [PATCH 16/18] xfs: condense directories " Darrick J. Wong
2021-04-01  1:10 ` [PATCH 17/18] xfs: make atomic extent swapping support realtime files Darrick J. Wong
2021-04-01  1:10 ` [PATCH 18/18] xfs: enable atomic swapext feature Darrick J. Wong
2021-04-01  3:56 ` [PATCHSET RFC v3 00/18] xfs: atomic file updates Amir Goldstein
2021-04-02  0:22   ` Darrick J. Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210402002206.GF4090233@magnolia \
    --to=djwong@kernel.org \
    --cc=amir73il@gmail.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).