All of lore.kernel.org
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: James Yonan <james@openvpn.net>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH] namei: implemented RENAME_NEWER flag for renameat2() conditional replace
Date: Tue, 28 Jun 2022 12:46:04 +0300	[thread overview]
Message-ID: <CAOQ4uxi5mKd1OuAcdFemx=h+1Ay-Ka4F6ddO5_fjk7m6G88MuQ@mail.gmail.com> (raw)
In-Reply-To: <20220627221107.176495-1-james@openvpn.net>

[+linux-api]

On Tue, Jun 28, 2022 at 1:58 AM James Yonan <james@openvpn.net> wrote:
>
> RENAME_NEWER is a new userspace-visible flag for renameat2(), and
> stands alongside existing flags such as RENAME_NOREPLACE,
> RENAME_EXCHANGE, and RENAME_WHITEOUT.
>
> RENAME_NEWER is a conditional variation on RENAME_NOREPLACE, and
> indicates that if the target of the rename exists, the rename will
> only succeed if the source file is newer than the target (i.e. source
> mtime > target mtime).  Otherwise, the rename will fail with -EEXIST
> instead of replacing the target.  When the target doesn't exist,
> RENAME_NEWER does a plain rename like RENAME_NOREPLACE.
>
> RENAME_NEWER is very useful in distributed systems that mirror a
> directory structure, or use a directory as a key/value store, and need
> to guarantee that files will only be overwritten by newer files, and
> that all updates are atomic.

This feature sounds very cool.
For adding a new API it is always useful if you bring forward a userland
tool (rsync?) that intend to use it, preferably with a POC patch.
A concrete prospective user is always better than a hypothetical one.
Some people hold the opinion that only new APIs with real prospective
users should be merged.

>
> While this patch may appear large at first glance, most of the changes
> deal with renameat2() flags validation, and the core logic is only
> 5 lines in the do_renameat2() function in fs/namei.c:
>
>         if ((flags & RENAME_NEWER)
>             && d_is_positive(new_dentry)
>             && timespec64_compare(&d_backing_inode(old_dentry)->i_mtime,
>                                   &d_backing_inode(new_dentry)->i_mtime) <= 0)
>                 goto exit5;
>

I have a few questions:
- Why mtime?
- Why not ctime?
- Shouldn't a feature like that protect from overwriting metadata changes
  to the destination file?

In any case, I would be much more comfortable with comparing ctime
because when it comes to user settable times, what if rsync *wants*
to update the destination file's mtime to an earlier time that was set in
the rsync source?

If comparing ctime does not fit your use case and you can convince
the community that comparing mtime is a justified use case, we would
need to use a flag name to reflect that like RENAME_NEWER_MTIME
so we don't block future use case of RENAME_NEWER_CTIME.
I hope that we can agree that ctime is enough and that mtime will not
make sense so we can settle with RENAME_NEWER that means ctime.

> It's pretty cool in a way that a new atomic file operation can even be
> implemented in just 5 lines of code, and it's thanks to the existing
> locking infrastructure around file rename/move that these operations
> become almost trivial.  Unfortunately, every fs must approve a new
> renameat2() flag, so it bloats the patch a bit.
>
> So one question to ask is could this functionality be implemented
> in userspace without adding a new renameat2() flag?  I think you
> could attempt it with iterative RENAME_EXCHANGE, but it's hackish,
> inefficient, and not atomic, because races could cause temporary
> mtime backtracks.  How about using file locking?  Probably not,
> because the problem we want to solve is maintaining file/directory
> atomicity for readers by creating files out-of-directory, setting
> their mtime, and atomically moving them into place.  The strategy
> to lock such an operation really requires more complex locking methods
> than are generally exposed to userspace.  And if you are using inotify
> on the directory to notify readers of changes, it certainly makes
> sense to reduce unnecessary churn by preventing a move operation
> based on the mtime check.
>
> While some people might question the utility of adding features to
> filesystems to make them more like databases, there is real value
> in the performance, atomicity, consistent VFS interface, multi-thread
> safety, and async-notify capabilities of modern filesystems that
> starts to blur the line, and actually make filesystem-based key-value
> stores a win for many applications.
>
> Like RENAME_NOREPLACE, the RENAME_NEWER implementation lives in
> the VFS, however the individual fs implementations do strict flags
> checking and will return -EINVAL for any flag they don't recognize.
> For this reason, my general approach with flags is to accept
> RENAME_NEWER wherever RENAME_NOREPLACE is also accepted, since
> RENAME_NEWER is simply a conditional variant of RENAME_NOREPLACE.

You are not taking into account that mtime may be cached attribute that is not
uptodate in network filesystems (fuse too) so behavior can be a bit
unpredictable,
unless filesystem code compares also the cache coherency of the attributes
on both files and even then, without extending the network protocol this is
questionable behavior for client side.
So I think your filter of which filesystems to enable is way too wide.

How many of them did you test, I'll take a wild guess that not all of them.

Please do not enable RENAME_NEWER is any filesystem that you did
not test or that was not tested by some other fs developer using the
tests that you write.

>
> I noticed only one fs driver (cifs) that treated RENAME_NOREPLACE
> in a non-generic way, because no-replace is the natural behavior
> for CIFS, and it therefore considers RENAME_NOREPLACE as a hint that
> no replacement can occur.  Aside from this special case, it seems
> safe to assume that any fs that supports RENAME_NOREPLACE should
> also be able to support RENAME_NEWER out of the box.
>
> I did not notice a general self-test for renameat2() at the VFS
> layer (outside of fs-specific tests), so I created one, though
> at the moment it only exercises RENAME_NEWER.  Build and run with:
>
>   make -C tools/testing/selftests TARGETS=renameat2 run_tests

Please make sure to also add test coverage in fstests and/or LTP.
See fstest generic/024 for RENAME_NOREPLACE and LTP test
renameat201 as examples.
renameat201 can and should be converted to newlib and run with
.all_filesystems = 1.
This is the easiest and fastest way for you to verify your patch
on the common fs that are installed on your system.
LTP maintainers can help you with that.

Thanks,
Amir.

  reply	other threads:[~2022-06-28  9:46 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-27 22:11 [PATCH] namei: implemented RENAME_NEWER flag for renameat2() conditional replace James Yonan
2022-06-28  9:46 ` Amir Goldstein [this message]
2022-06-28 21:56   ` James Yonan
2022-06-29  5:15     ` Amir Goldstein
2022-06-30 16:18       ` James Yonan
2022-06-28 17:34 ` Al Viro
2022-06-28 18:34   ` Amir Goldstein
2022-06-28 23:19     ` James Yonan
2022-06-29  1:43       ` Dave Chinner
2022-06-29  2:07         ` NeilBrown
2022-06-29  2:35           ` Dave Chinner
2022-06-29  2:49             ` NeilBrown
2022-06-30 16:39             ` James Yonan
2022-07-01  9:23               ` [PATCH v2] namei: implemented RENAME_NEWER_MTIME " James Yonan
2022-07-01 10:34                 ` Amir Goldstein
2022-07-01 20:06                   ` James Yonan
2022-07-02  8:07                 ` Dave Chinner
2022-07-05 13:30                   ` [PATCH v3 1/2] RENAME_NEWER_MTIME is a new userspace-visible flag for renameat2(), and stands alongside existing flags including RENAME_NOREPLACE, RENAME_EXCHANGE, and RENAME_WHITEOUT James Yonan
2022-07-05 13:30                     ` [PATCH v3 2/2] selftests: added a new target renameat2 James Yonan
2022-07-05 13:30                     ` [PATCH man-pages] rename.2: document new renameat2() flag RENAME_NEWER_MTIME James Yonan
2022-07-05 14:03                   ` [RESEND PATCH v3 1/2] namei: implemented RENAME_NEWER_MTIME flag for renameat2() conditional replace James Yonan
2022-07-11 19:13                   ` [PATCH v4 " James Yonan
2022-07-11 19:13                     ` [PATCH v4 2/2] selftests: added a new target renameat2 James Yonan
2022-07-11 23:10                     ` [PATCH v4 1/2] namei: implemented RENAME_NEWER_MTIME flag for renameat2() conditional replace Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOQ4uxi5mKd1OuAcdFemx=h+1Ay-Ka4F6ddO5_fjk7m6G88MuQ@mail.gmail.com' \
    --to=amir73il@gmail.com \
    --cc=james@openvpn.net \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.