All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: NeilBrown <neilb@suse.de>
Cc: Miklos Szeredi <mszeredi@redhat.com>,
	Xavier Roche <xavier.roche@algolia.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2] vfs: fix link vs. rename race
Date: Wed, 14 Sep 2022 02:30:53 +0100	[thread overview]
Message-ID: <YyEuzf33chhNXZwD@ZenIV> (raw)
In-Reply-To: <166311448437.20483.2299581036245030695@noble.neil.brown.name>

On Wed, Sep 14, 2022 at 10:14:44AM +1000, NeilBrown wrote:

> But they can race.  The open completes the lookup of /tmp/share-dir and
> holds the dentry, but the rename succeeds with inode_lock(target) in the
> code fragment you provided above before the open() can get the lock.
> By the time open() does get the lock, the dentry it holds will be marked
> S_DEAD and the __lookup_hash() will fail.
> So the open() returns -ENOENT - unexpected.
> 
> Test code below, based on the test code for the link race.
> 
> I wonder if S_DEAD should result in -ESTALE rather than -ENOENT.
> That would cause the various retry_estale() loops to retry the whole
> operation.  I suspect we'd actually need more subtlety than just that
> simple change, but it might be worth exploring.

	I don't think that wording in POSIX prohibits that ENOENT.
It does not (and no UNIX I've ever seen provides) atomicity of all
link traversals involved in pathname resolution.

root@sonny:/tmp# mkdir fubar
root@sonny:/tmp# cd fubar/
root@sonny:/tmp/fubar# rmdir ../fubar
root@sonny:/tmp/fubar# touch ./foo
touch: cannot touch './foo': No such file or directory

Do you agree that this is a reasonable behaviour?  That, BTW, is why
"retry on S_DEAD" is wrong - it can be a persistent state.

Now, replace rmdir ../fubar with rename("/tmp/barf", "/tmp/fubar") and
you'll get pretty much your testcase.  You are asking not just for
atomicity of rename vs. traversal of "fubar" in /tmp - that we already
have.  You are asking for the atomicity of the entire pathname resolution
of open() argument wrt rename().  And that is a much scarier can of worms.

Basically, pathname resolution happens on per-component basis and it's
*NOT* guaranteed that filesystem won't be changed between those or that
we won't observe such modifications.  If you check POSIX, you'll see
that it avoids any such promises - all it says (for directories; non-directory
case has similar verbiage) is this:

	If the directory named by the new argument exists, it shall be removed
	and old renamed to new. In this case, a link named new shall exist
	throughout the renaming operation and shall refer either to the directory
	referred to by new or old before the operation began.

It carefully stays the hell out of any pathname resolution atomicity promises -
all it's talking about is resolution of a single link.  It's enough to
guarantee that rename("old", "new") won't race with mkdir("new") allowing the
latter to succeed, with similar for creat(2), etc. - "new" is promised to
refer to an existing object at all times.  And that's what rename() atomicity
is normally for.  Operation on the link in question will observe either old
or new state; operation that passed through that link on the way to whatever
it will act upon has *also* observed either of those states - and the next
pathname component had been looked up in either old or new directory, but
there is no promise that nothing has happened to that directory since we got
there.

  reply	other threads:[~2022-09-14  1:31 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-21  8:20 [PATCH v2] vfs: fix link vs. rename race Miklos Szeredi
2022-02-21  8:56 ` Xavier Roche
2022-09-13  2:04 ` Al Viro
2022-09-13  4:29   ` Al Viro
2022-09-13  8:02     ` Amir Goldstein
2022-09-13 10:03       ` Miklos Szeredi
2022-09-13  4:41 ` NeilBrown
2022-09-13  5:20   ` Al Viro
2022-09-13  5:40     ` Al Viro
2022-09-14  0:14       ` NeilBrown
2022-09-14  1:30         ` Al Viro [this message]
2022-09-13 23:52     ` NeilBrown
2022-09-14  0:13       ` Al Viro
2022-09-16  6:13         ` [PATCH RFC] VFS: lock source directory for link to avoid " NeilBrown
2022-09-16  6:28           ` Miklos Szeredi
2022-09-16  6:45             ` NeilBrown
2022-09-16  6:49             ` Al Viro
2022-09-16 14:32           ` Amir Goldstein
2022-09-19  8:28             ` Christian Brauner
2022-09-19 22:56               ` NeilBrown
2022-09-23  3:02           ` [VFS] 3fb4ec6faa: ltp.linkat02.fail kernel test robot
2022-09-23  3:02             ` kernel test robot
2022-09-23  3:02             ` [LTP] " kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YyEuzf33chhNXZwD@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mszeredi@redhat.com \
    --cc=neilb@suse.de \
    --cc=xavier.roche@algolia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.