linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* fs: race between vfs_rename and do_linkat (mv and link)
@ 2022-02-14 21:07 Xavier Roche
  2022-02-15  9:56 ` Miklos Szeredi
  0 siblings, 1 reply; 13+ messages in thread
From: Xavier Roche @ 2022-02-14 21:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Viro, linux-fsdevel, Xavier Roche, Aneesh Kumar K.V

There has been a longstanding race condition between vfs_rename and do_linkat,
when those operations are done in parallel:

1. Moving a file to an existing target file (eg. mv file target)
2. Creating a link from the target file  to a third file (eg. ln target link)

A typical example would be (1) a regular process putting a new version
of a database in place and (2) a regular process backuping the live
database by hardlinking it.

My understanding is that as the target file is never erased on client
side, but just replaced, the link should never fail.

The issue seem to lie inside vfs_link (fs/namei.c):
       inode_lock(inode);
       /* Make sure we don't allow creating hardlink to an unlinked file */
       if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
               error =  -ENOENT;

The possible answer is that the inode refcount is zero because the
file has just been replaced concurrently, old file being erased, and
as such, the link operation is failing.

The race appears to have been introduced by aae8a97d3ec30, to fix
_another_ race between unlink and link (but I'm not sure to understand
what were the implications).

Reverting the inode->i_nlink == 0 section "fixes" the issue, but would
probably reintroduce this another issue.

At this point I don't know what would be the best way to fix this issue.

Trivial case that will lead to ENOENT: (reproduced on 5.16.5)
Note that the race _seems_ to last while some IO are pending (getting the
race on tmpfs is typically much harder)

========== Cut here ==========
#!/bin/bash
#

rm -f link file target
touch target

# Link target -> link in loop
while ln target link && rm link; do :; done &

# Overwrite file -> target in loop until we fail
while touch file && mv file target; do :; done &

wait
========== Cut here ==========

Kudos to Xavier Grand from Algolia for spotting the issue with a
reproducible case.

The issue was reported three years ago, but only on the fsdevel
mailing-list, where it might have been overlooked.
It was also reported at https://bugzilla.kernel.org/show_bug.cgi?id=204705

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-14 21:07 fs: race between vfs_rename and do_linkat (mv and link) Xavier Roche
@ 2022-02-15  9:56 ` Miklos Szeredi
  2022-02-15 13:37   ` Al Viro
  0 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2022-02-15  9:56 UTC (permalink / raw)
  To: Xavier Roche
  Cc: linux-kernel, Alexander Viro, linux-fsdevel, Aneesh Kumar K.V

On Mon, 14 Feb 2022 at 22:07, Xavier Roche <xavier.roche@algolia.com> wrote:
>
> There has been a longstanding race condition between vfs_rename and do_linkat,
> when those operations are done in parallel:
>
> 1. Moving a file to an existing target file (eg. mv file target)
> 2. Creating a link from the target file  to a third file (eg. ln target link)
>
> A typical example would be (1) a regular process putting a new version
> of a database in place and (2) a regular process backuping the live
> database by hardlinking it.
>
> My understanding is that as the target file is never erased on client
> side, but just replaced, the link should never fail.
>
> The issue seem to lie inside vfs_link (fs/namei.c):
>        inode_lock(inode);
>        /* Make sure we don't allow creating hardlink to an unlinked file */
>        if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
>                error =  -ENOENT;
>
> The possible answer is that the inode refcount is zero because the
> file has just been replaced concurrently, old file being erased, and
> as such, the link operation is failing.
>
> The race appears to have been introduced by aae8a97d3ec30, to fix
> _another_ race between unlink and link (but I'm not sure to understand
> what were the implications).
>
> Reverting the inode->i_nlink == 0 section "fixes" the issue, but would
> probably reintroduce this another issue.
>
> At this point I don't know what would be the best way to fix this issue.

Doing "lock_rename() + lookup last components" would fix this race.
If this was only done on retry, then that would prevent possible
performance regressions, at the cost of extra complexity.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-15  9:56 ` Miklos Szeredi
@ 2022-02-15 13:37   ` Al Viro
  2022-02-15 16:06     ` Al Viro
  0 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2022-02-15 13:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Xavier Roche, linux-kernel, linux-fsdevel, Aneesh Kumar K.V

On Tue, Feb 15, 2022 at 10:56:29AM +0100, Miklos Szeredi wrote:

> Doing "lock_rename() + lookup last components" would fix this race.

No go - thanks to the possibility of AT_SYMLINK_FOLLOW there.
Think of it - we'd need to
	* lock parents (both at the same time)
	* look up the last component of source
	* if it turns a symlink - unlock parents and repeat the entire
thing for its body, except when asked not to.
	* when we are done with the source, look the last component of
target up

... and then there is sodding -ESTALE handling, with all the elegance
that brings in.

> If this was only done on retry, then that would prevent possible
> performance regressions, at the cost of extra complexity.

Extra compared to the above, that is.  How delightful...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-15 13:37   ` Al Viro
@ 2022-02-15 16:06     ` Al Viro
  2022-02-15 16:17       ` Matthew Wilcox
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Al Viro @ 2022-02-15 16:06 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Xavier Roche, linux-kernel, linux-fsdevel, Aneesh Kumar K.V

On Tue, Feb 15, 2022 at 01:37:40PM +0000, Al Viro wrote:
> On Tue, Feb 15, 2022 at 10:56:29AM +0100, Miklos Szeredi wrote:
> 
> > Doing "lock_rename() + lookup last components" would fix this race.
> 
> No go - thanks to the possibility of AT_SYMLINK_FOLLOW there.
> Think of it - we'd need to
> 	* lock parents (both at the same time)
> 	* look up the last component of source
> 	* if it turns a symlink - unlock parents and repeat the entire
> thing for its body, except when asked not to.
> 	* when we are done with the source, look the last component of
> target up
> 
> ... and then there is sodding -ESTALE handling, with all the elegance
> that brings in.
> 
> > If this was only done on retry, then that would prevent possible
> > performance regressions, at the cost of extra complexity.
> 
> Extra compared to the above, that is.  How delightful...

Actually, it's even viler than that: lock_rename() relies upon the
directories being locked sitting on the same fs.  Now, surely link(2)
would fail if source and target are on the different filesystem,
wouldn't it?  Alas, with AT_SYMLINK_FOLLOW it's quite possible to have
the source resolving to a symlink that does lead to the same fs as the
target, while the symlink itself is on a different fs.  So it's not
even straight lock_rename() - it has to be a special version that would
handle cross-fs invocations (somehow - e.g. ordering them on superblock
or mount in-core address in such case; ordering between dentries could
be arbitrary for cross-fs cases).

Worse, you need to deal with the corner cases.  "/" or anything ending on
"." or ".." can be rejected (no links to directories) and thankfully we
do not allow AT_EMPTY for linkat(2), but...  procfs symlinks are in the
game, since AT_SYMLINK_FOLLOW is there.

And _that_ is a real bitch - what "parent" would you lock for (followed)
/proc/self/fd/0?  It can change right under you; one solution would
be to grab ->vfs_rename_mutex first, same parent or not, then do what
lock_rename() does, relying upon ->d_parent having been stabilized
by ->vfs_rename_mutex.  But that would have to be conditional upon
running into that case - you don't want to serialize the shit out of
(same-directory) link(2) on given filesystem.  Which makes the entire
thing even harder to follow and reason about.

And to make it even more fun, you'll need to either duplicate pick_link()
guts, or try and make it usable in this situation.  Might or might not
be easy - I hadn't tried to go into that.

"Fucking ugly" is inadequate for the likely results of that approach.
It's guaranteed to be a source of headache for pretty much ever after.

Does POSIX actually make any promises in that area?  That would affect
how high a cost we ought to pay for that - I agree that it would be nicer
to have atomicity from userland point of view, but there's a difference
between hard bug and QoI issue.

Again, what really makes it painful is AT_SYMLINK_FOLLOW support in
linkat(2).  For plain link(2) it would be easier to deal with.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-15 16:06     ` Al Viro
@ 2022-02-15 16:17       ` Matthew Wilcox
  2022-02-15 16:20         ` Al Viro
  2022-02-15 16:18       ` Al Viro
  2022-02-15 16:56       ` Xavier Roche
  2 siblings, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2022-02-15 16:17 UTC (permalink / raw)
  To: Al Viro
  Cc: Miklos Szeredi, Xavier Roche, linux-kernel, linux-fsdevel,
	Aneesh Kumar K.V

On Tue, Feb 15, 2022 at 04:06:06PM +0000, Al Viro wrote:
> On Tue, Feb 15, 2022 at 01:37:40PM +0000, Al Viro wrote:
> > On Tue, Feb 15, 2022 at 10:56:29AM +0100, Miklos Szeredi wrote:
> > 
> > > Doing "lock_rename() + lookup last components" would fix this race.
>
> "Fucking ugly" is inadequate for the likely results of that approach.
> It's guaranteed to be a source of headache for pretty much ever after.
> 
> Does POSIX actually make any promises in that area?  That would affect
> how high a cost we ought to pay for that - I agree that it would be nicer
> to have atomicity from userland point of view, but there's a difference
> between hard bug and QoI issue.

As I understand the original report, it relies on us hitting the nlink ==
0 at exactly the wrong moment.  Can't we just restart the entire path
resolution if we find a target with nlink == 0?  Sure, it's a lot of
extra work, but you've got to be trying hard to hit it in the first place.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-15 16:06     ` Al Viro
  2022-02-15 16:17       ` Matthew Wilcox
@ 2022-02-15 16:18       ` Al Viro
  2022-02-15 16:56       ` Xavier Roche
  2 siblings, 0 replies; 13+ messages in thread
From: Al Viro @ 2022-02-15 16:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Xavier Roche, linux-kernel, linux-fsdevel, Aneesh Kumar K.V

On Tue, Feb 15, 2022 at 04:06:06PM +0000, Al Viro wrote:

> Worse, you need to deal with the corner cases.  "/" or anything ending on
> "." or ".." can be rejected (no links to directories) and thankfully we
> do not allow AT_EMPTY for linkat(2), but...  procfs symlinks are in the

That'd be AT_EMPTY_PATH, obviously, and unfortunately we do allow that.
Which brings another fun case to deal with.  Same problem with "what's the
parent of that thing and how do we make it stable?"...

Oh, and you need to cope with O_TMPFILE ones as well - both for that and
for procfs symlinks to them.  Which is fine from the vfs_link() POV (see
I_LINKABLE check in there), but the locking is outside of that, so we
need to deal with that joy.  And _there_ you have no parent at all - what
could it be, anyway?  So we'd need to decide what to lock.  *AND* we have
the possibility of another thread doing link(2) on what used to be
O_TMPFILE, which would give it a parent by the time we get to doing
the actual operation...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-15 16:17       ` Matthew Wilcox
@ 2022-02-15 16:20         ` Al Viro
  2022-02-16  9:28           ` Miklos Szeredi
  0 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2022-02-15 16:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Miklos Szeredi, Xavier Roche, linux-kernel, linux-fsdevel,
	Aneesh Kumar K.V

On Tue, Feb 15, 2022 at 04:17:11PM +0000, Matthew Wilcox wrote:
> On Tue, Feb 15, 2022 at 04:06:06PM +0000, Al Viro wrote:
> > On Tue, Feb 15, 2022 at 01:37:40PM +0000, Al Viro wrote:
> > > On Tue, Feb 15, 2022 at 10:56:29AM +0100, Miklos Szeredi wrote:
> > > 
> > > > Doing "lock_rename() + lookup last components" would fix this race.
> >
> > "Fucking ugly" is inadequate for the likely results of that approach.
> > It's guaranteed to be a source of headache for pretty much ever after.
> > 
> > Does POSIX actually make any promises in that area?  That would affect
> > how high a cost we ought to pay for that - I agree that it would be nicer
> > to have atomicity from userland point of view, but there's a difference
> > between hard bug and QoI issue.
> 
> As I understand the original report, it relies on us hitting the nlink ==
> 0 at exactly the wrong moment.  Can't we just restart the entire path
> resolution if we find a target with nlink == 0?  Sure, it's a lot of
> extra work, but you've got to be trying hard to hit it in the first place.

touch /tmp/blah
exec 42</tmp/blah
rm /tmp/blah
... call linkat() with AT_SYMLINK_FOLLOW and /proc/self/fd/42 for source

Your variant will loop indefinitely on that...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-15 16:06     ` Al Viro
  2022-02-15 16:17       ` Matthew Wilcox
  2022-02-15 16:18       ` Al Viro
@ 2022-02-15 16:56       ` Xavier Roche
  2 siblings, 0 replies; 13+ messages in thread
From: Xavier Roche @ 2022-02-15 16:56 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, Aneesh Kumar K.V

On Tue, Feb 15, 2022 at 5:06 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> Does POSIX actually make any promises in that area?

My understanding is that we inherit from the mandatory atomicity of
all rename calls
(https://pubs.opengroup.org/onlinepubs/000095399/functions/rename.html)
> That specification requires that the action of the function be atomic.

We also inherit from the link call that is required to be atomic
(https://pubs.opengroup.org/onlinepubs/009695399/functions/link.html)
> The link() function shall atomically create a new link for the existing file and the link count of the file shall be incremented by one

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-15 16:20         ` Al Viro
@ 2022-02-16  9:28           ` Miklos Szeredi
  2022-02-16 10:28             ` Miklos Szeredi
  0 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2022-02-16  9:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Xavier Roche, linux-kernel, linux-fsdevel,
	Aneesh Kumar K.V

On Tue, 15 Feb 2022 at 17:20, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Feb 15, 2022 at 04:17:11PM +0000, Matthew Wilcox wrote:
> > On Tue, Feb 15, 2022 at 04:06:06PM +0000, Al Viro wrote:
> > > On Tue, Feb 15, 2022 at 01:37:40PM +0000, Al Viro wrote:
> > > > On Tue, Feb 15, 2022 at 10:56:29AM +0100, Miklos Szeredi wrote:
> > > >
> > > > > Doing "lock_rename() + lookup last components" would fix this race.
> > >
> > > "Fucking ugly" is inadequate for the likely results of that approach.
> > > It's guaranteed to be a source of headache for pretty much ever after.

So this is a fairly special situation.  How about adding a new rwsem
(could possibly be global or per-fs)?

 - acquired for read in lock_rename() before inode locks
 - acquired for write in do_linkat before inode locks, but only on retry

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-16  9:28           ` Miklos Szeredi
@ 2022-02-16 10:28             ` Miklos Szeredi
  2022-02-16 13:18               ` Xavier Roche
  0 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2022-02-16 10:28 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Wilcox, Xavier Roche, linux-kernel, linux-fsdevel,
	Aneesh Kumar K.V

On Wed, Feb 16, 2022 at 10:28:20AM +0100, Miklos Szeredi wrote:

> So this is a fairly special situation.  How about adding a new rwsem
> (could possibly be global or per-fs)?
> 
>  - acquired for read in lock_rename() before inode locks
>  - acquired for write in do_linkat before inode locks, but only on retry

Something like this:

diff --git a/fs/namei.c b/fs/namei.c
index 3f1829b3ab5b..dd6908cee49d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -122,6 +122,8 @@
  * PATH_MAX includes the nul terminator --RR.
  */
 
+static DECLARE_RWSEM(link_rwsem);
+
 #define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))
 
 struct filename *
@@ -2961,6 +2963,8 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 {
 	struct dentry *p;
 
+	down_read(&link_rwsem);
+
 	if (p1 == p2) {
 		inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
 		return NULL;
@@ -2995,6 +2999,8 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
 		inode_unlock(p2->d_inode);
 		mutex_unlock(&p1->d_sb->s_vfs_rename_mutex);
 	}
+
+	up_read(&link_rwsem);
 }
 EXPORT_SYMBOL(unlock_rename);
 
@@ -4456,6 +4462,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 	struct path old_path, new_path;
 	struct inode *delegated_inode = NULL;
 	int how = 0;
+	bool lock = false;
 	int error;
 
 	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
@@ -4474,10 +4481,13 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
+retry_lock:
+	if (lock)
+		down_write(&link_rwsem);
 retry:
 	error = filename_lookup(olddfd, old, how, &old_path, NULL);
 	if (error)
-		goto out_putnames;
+		goto out_unlock_link;
 
 	new_dentry = filename_create(newdfd, new, &new_path,
 					(how & LOOKUP_REVAL));
@@ -4511,8 +4521,16 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
+	if (!lock && error == -ENOENT) {
+		path_put(&old_path);
+		lock = true;
+		goto retry_lock;
+	}
 out_putpath:
 	path_put(&old_path);
+out_unlock_link:
+	if (lock)
+		up_write(&link_rwsem);
 out_putnames:
 	putname(old);
 	putname(new);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-16 10:28             ` Miklos Szeredi
@ 2022-02-16 13:18               ` Xavier Roche
  2022-02-16 13:37                 ` Miklos Szeredi
  0 siblings, 1 reply; 13+ messages in thread
From: Xavier Roche @ 2022-02-16 13:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Al Viro, Matthew Wilcox, linux-kernel, linux-fsdevel, Aneesh Kumar K.V

On Wed, Feb 16, 2022 at 11:28:18AM +0100, Miklos Szeredi wrote:
> Something like this:
> diff --git a/fs/namei.c b/fs/namei.c
> index 3f1829b3ab5b..dd6908cee49d 100644

Tested-by: Xavier Roche <xavier.roche@algolia.com>

I confirm this completely fixes at least the specific race. Tested on a
unpatched and then patched 5.16.5, with the trivial bash test, and then
with a C++ torture test.

Before:
-------

$ time ./linkbug
Failed after 4 with No such file or directory

real	0m0,004s
user	0m0,000s
sys	0m0,004s

After:
------

(no error after ten minutes of running the program)

Torture test program:
---------------------

/* Linux rename vs. linkat race condition.
 * Rationale: both (1) moving a file to a target and (2) linking the target to a file in parallel leads to a race
 * on Linux kernel.
 * Sample file courtesy of Xavier Grand at Algolia
 * g++ -pthread linkbug.c -o linkbug
 */

#include <thread>
#include <unistd.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <iostream>
#include <string.h>

static const char* producedDir = "/tmp";
static const char* producedFile = "/tmp/file.txt";
static const char* producedTmpFile = "/tmp/file.txt.tmp";
static const char* producedThreadDir = "/tmp/tmp";
static const char* producedThreadFile = "/tmp/file.txt.tmp.2";

bool createFile(const char* path)
{
    const int fdOut = open(path,
                           O_WRONLY | O_CREAT | O_TRUNC | O_EXCL | O_CLOEXEC,
                           S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    assert(fdOut != -1);
    assert(write(fdOut, "Foo", 4) == 4);
    assert(close(fdOut) == 0);
    return true;
}

void func()
{
    int nbSuccess = 0;
    // Loop producedThread a hardlink of the file
    while (true) {
        if (link(producedFile, producedThreadFile) != 0) {
            std::cout << "Failed after " << nbSuccess << " with " << strerror(errno) << std::endl;
            exit(EXIT_FAILURE);
        } else {
            nbSuccess++;
        }
        assert(unlink(producedThreadFile) == 0);
    }
}

int main()
{
    // Setup env
    unlink(producedTmpFile);
    unlink(producedFile);
    unlink(producedThreadFile);
    createFile(producedFile);
    mkdir(producedThreadDir, 0777);

    // Async thread doing a hardlink and moving it
    std::thread t(func);
    // Loop creating a .tmp and moving it
    while (true) {
        assert(createFile(producedTmpFile));
        assert(rename(producedTmpFile, producedFile) == 0);
    }
    return 0;
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-16 13:18               ` Xavier Roche
@ 2022-02-16 13:37                 ` Miklos Szeredi
  2022-02-18 15:37                   ` Miklos Szeredi
  0 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2022-02-16 13:37 UTC (permalink / raw)
  To: Xavier Roche
  Cc: Al Viro, Matthew Wilcox, linux-kernel, linux-fsdevel, Aneesh Kumar K.V

On Wed, 16 Feb 2022 at 14:18, Xavier Roche <xavier.roche@algolia.com> wrote:
>
> On Wed, Feb 16, 2022 at 11:28:18AM +0100, Miklos Szeredi wrote:
> > Something like this:
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 3f1829b3ab5b..dd6908cee49d 100644
>
> Tested-by: Xavier Roche <xavier.roche@algolia.com>
>
> I confirm this completely fixes at least the specific race. Tested on a
> unpatched and then patched 5.16.5, with the trivial bash test, and then
> with a C++ torture test.

Thanks for testing.

One issue with the patch is nesting of lock_rename() calls in stacked
fs (rwsem is not allowed to recurse even for read locks).

So the lock needs to be per-sb, but then do_linkat() becomes more
complex due to not being able to use the filename_create() helper.
But it's still much simpler than the special lookup loop described by
Al.

Thanks,
Miklos
>
> Before:
> -------
>
> $ time ./linkbug
> Failed after 4 with No such file or directory
>
> real    0m0,004s
> user    0m0,000s
> sys     0m0,004s
>
> After:
> ------
>
> (no error after ten minutes of running the program)
>
> Torture test program:
> ---------------------
>
> /* Linux rename vs. linkat race condition.
>  * Rationale: both (1) moving a file to a target and (2) linking the target to a file in parallel leads to a race
>  * on Linux kernel.
>  * Sample file courtesy of Xavier Grand at Algolia
>  * g++ -pthread linkbug.c -o linkbug
>  */
>
> #include <thread>
> #include <unistd.h>
> #include <assert.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <iostream>
> #include <string.h>
>
> static const char* producedDir = "/tmp";
> static const char* producedFile = "/tmp/file.txt";
> static const char* producedTmpFile = "/tmp/file.txt.tmp";
> static const char* producedThreadDir = "/tmp/tmp";
> static const char* producedThreadFile = "/tmp/file.txt.tmp.2";
>
> bool createFile(const char* path)
> {
>     const int fdOut = open(path,
>                            O_WRONLY | O_CREAT | O_TRUNC | O_EXCL | O_CLOEXEC,
>                            S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
>     assert(fdOut != -1);
>     assert(write(fdOut, "Foo", 4) == 4);
>     assert(close(fdOut) == 0);
>     return true;
> }
>
> void func()
> {
>     int nbSuccess = 0;
>     // Loop producedThread a hardlink of the file
>     while (true) {
>         if (link(producedFile, producedThreadFile) != 0) {
>             std::cout << "Failed after " << nbSuccess << " with " << strerror(errno) << std::endl;
>             exit(EXIT_FAILURE);
>         } else {
>             nbSuccess++;
>         }
>         assert(unlink(producedThreadFile) == 0);
>     }
> }
>
> int main()
> {
>     // Setup env
>     unlink(producedTmpFile);
>     unlink(producedFile);
>     unlink(producedThreadFile);
>     createFile(producedFile);
>     mkdir(producedThreadDir, 0777);
>
>     // Async thread doing a hardlink and moving it
>     std::thread t(func);
>     // Loop creating a .tmp and moving it
>     while (true) {
>         assert(createFile(producedTmpFile));
>         assert(rename(producedTmpFile, producedFile) == 0);
>     }
>     return 0;
> }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: race between vfs_rename and do_linkat (mv and link)
  2022-02-16 13:37                 ` Miklos Szeredi
@ 2022-02-18 15:37                   ` Miklos Szeredi
  0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2022-02-18 15:37 UTC (permalink / raw)
  To: Xavier Roche
  Cc: Al Viro, Matthew Wilcox, linux-kernel, linux-fsdevel, Aneesh Kumar K.V

On Wed, 16 Feb 2022 at 14:37, Miklos Szeredi <miklos@szeredi.hu> wrote:

> One issue with the patch is nesting of lock_rename() calls in stacked
> fs (rwsem is not allowed to recurse even for read locks).

Scratch that, there's no recursion since do_linkat() is not called
from stacked fs.  And taking link_rwsem is optional for the link
operation, so this is fine.  For stacked fs the race is hopefully
taken care by the top layer locking, no need to repeat it in lower
layers.

I've now sent this patch with a proper header comment to Al.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-02-18 15:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-14 21:07 fs: race between vfs_rename and do_linkat (mv and link) Xavier Roche
2022-02-15  9:56 ` Miklos Szeredi
2022-02-15 13:37   ` Al Viro
2022-02-15 16:06     ` Al Viro
2022-02-15 16:17       ` Matthew Wilcox
2022-02-15 16:20         ` Al Viro
2022-02-16  9:28           ` Miklos Szeredi
2022-02-16 10:28             ` Miklos Szeredi
2022-02-16 13:18               ` Xavier Roche
2022-02-16 13:37                 ` Miklos Szeredi
2022-02-18 15:37                   ` Miklos Szeredi
2022-02-15 16:18       ` Al Viro
2022-02-15 16:56       ` Xavier Roche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).