From: Dave Chinner <firstname.lastname@example.org>
To: James Yonan <email@example.com>
Cc: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org,
Subject: Re: [PATCH v4 1/2] namei: implemented RENAME_NEWER_MTIME flag for renameat2() conditional replace
Date: Tue, 12 Jul 2022 09:10:27 +1000 [thread overview]
Message-ID: <20220711231027.GB3600936@dread.disaster.area> (raw)
On Mon, Jul 11, 2022 at 01:13:30PM -0600, James Yonan wrote:
> RENAME_NEWER_MTIME is a new userspace-visible flag for renameat2(), and
> stands alongside existing flags including RENAME_NOREPLACE,
> RENAME_EXCHANGE, and RENAME_WHITEOUT.
> RENAME_NEWER_MTIME is a conditional variation on RENAME_NOREPLACE, and
> indicates that if the target of the rename exists, the rename or exchange
> will only succeed if the source file is newer than the target (i.e.
> source mtime > target mtime). Otherwise, the rename will fail with
> -EEXIST instead of replacing the target. When the target doesn't exist,
> RENAME_NEWER_MTIME does a plain rename like RENAME_NOREPLACE.
> RENAME_NEWER_MTIME can also be combined with RENAME_EXCHANGE for
> conditional exchange, where the exchange only occurs if source mtime >
> target mtime. Otherwise, the operation will fail with -EEXIST.
> Some of the use cases for RENAME_NEWER_MTIME include (a) using a
> directory as a key-value store, or (b) maintaining a near-real-time
> mirror of a remote data source. A common design pattern for maintaining
> such a data store would be to create a file using a temporary pathname,
> setting the file mtime using utimensat(2) or futimens(2) based on the
> remote creation timestamp of the file content, then using
> RENAME_NEWER_MTIME to move the file into place in the target directory.
> If the operation returns an error with errno == EEXIST, then the source
> file is not up-to-date and can safely be deleted. The goal is to
> facilitate distributed systems having many concurrent writers and
> readers, where update notifications are possibly delayed, duplicated, or
> reordered, yet where readers see a consistent view of the target
> directory with predictable semantics and atomic updates.
> Note that RENAME_NEWER_MTIME depends on accurate, high-resolution
> timestamps for mtime, preferably approaching nanosecond resolution.
> RENAME_NEWER_MTIME is implemented in vfs_rename(), and we lock and deny
> write access to both source and target inodes before comparing their
> mtimes, to stabilize the comparison.
> The use case for RENAME_NEWER_MTIME doesn't really align with
> directories, so we return -EISDIR if either source or target is a
> directory. This makes the locking necessary to stabilize the mtime
> comparison (in vfs_rename()) much more straightforward.
> Like RENAME_NOREPLACE, the RENAME_NEWER_MTIME implementation lives in
> the VFS, however the individual fs implementations do strict flags
> checking and will return -EINVAL for any flag they don't recognize.
> At this time, I have enabled and tested RENAME_NEWER_MTIME on ext2, ext3,
> ext4, xfs, btrfs, and tmpfs.
> I did not notice a general self-test for renameat2() at the VFS
> layer (outside of fs-specific tests),
We have a whole bunch of renameat2() tests in fstests that cover all
the functionality of renameat2(), and fsstress will also exercise it
in stress workloads, too:
$ git grep -l renameat2
> so I created one, though
> at the moment it only exercises RENAME_NEWER_MTIME and RENAME_EXCHANGE.
> The self-test is written to be portable to the Linux Test Project,
> and the advantage of running it there is that it automatically runs
> tests on multiple filesystems. See comments at the beginning of
> renameat2_tests.c for more info.
Ideally, new renameat2 correctness tests should be added to fstests
as per the existing tests (as this is the primary test suite a lot
of fs developers use) so that we don't end up with partial test
coverage fragmented across different test suites. It does us no
favors to have non-overlapping partial coverage in different test
suites - we are better to implement complete coverage in one test
suite and focus our efforts there...
prev parent reply other threads:[~2022-07-11 23:10 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-27 22:11 [PATCH] namei: implemented RENAME_NEWER flag for renameat2() conditional replace James Yonan
2022-06-28 9:46 ` Amir Goldstein
2022-06-28 21:56 ` James Yonan
2022-06-29 5:15 ` Amir Goldstein
2022-06-30 16:18 ` James Yonan
2022-06-28 17:34 ` Al Viro
2022-06-28 18:34 ` Amir Goldstein
2022-06-28 23:19 ` James Yonan
2022-06-29 1:43 ` Dave Chinner
2022-06-29 2:07 ` NeilBrown
2022-06-29 2:35 ` Dave Chinner
2022-06-29 2:49 ` NeilBrown
2022-06-30 16:39 ` James Yonan
2022-07-01 9:23 ` [PATCH v2] namei: implemented RENAME_NEWER_MTIME " James Yonan
2022-07-01 10:34 ` Amir Goldstein
2022-07-01 20:06 ` James Yonan
2022-07-02 8:07 ` Dave Chinner
2022-07-05 13:30 ` [PATCH v3 1/2] RENAME_NEWER_MTIME is a new userspace-visible flag for renameat2(), and stands alongside existing flags including RENAME_NOREPLACE, RENAME_EXCHANGE, and RENAME_WHITEOUT James Yonan
2022-07-05 13:30 ` [PATCH v3 2/2] selftests: added a new target renameat2 James Yonan
2022-07-05 13:30 ` [PATCH man-pages] rename.2: document new renameat2() flag RENAME_NEWER_MTIME James Yonan
2022-07-05 14:03 ` [RESEND PATCH v3 1/2] namei: implemented RENAME_NEWER_MTIME flag for renameat2() conditional replace James Yonan
2022-07-11 19:13 ` [PATCH v4 " James Yonan
2022-07-11 19:13 ` [PATCH v4 2/2] selftests: added a new target renameat2 James Yonan
2022-07-11 23:10 ` Dave Chinner [this message]
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.