All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: John Ogness <john.ogness@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: dcache: remove trylock loops (was Re: [BUG] lock_parent() breakage when used from shrink_dentry_list())
Date: Mon, 12 Mar 2018 18:52:31 -0500	[thread overview]
Message-ID: <87tvtk97i8.fsf@xmission.com> (raw)
In-Reply-To: <87woygan6p.fsf@xmission.com> (Eric W. Biederman's message of "Mon, 12 Mar 2018 18:28:30 -0500")

ebiederm@xmission.com (Eric W. Biederman) writes:

> Al Viro <viro@ZenIV.linux.org.uk> writes:
>
>> On Mon, Mar 12, 2018 at 03:23:44PM -0500, Eric W. Biederman wrote:
>>
>>> Of the two code paths you are concert about:
>>> 
>>> For path path_connected looking at s_root is a heuristic to avoid
>>> calling is_subdir every time we need to do that check.  If the heuristic
>>> fails we still have is_subdir which should remain accurate.  If
>>> is_subdir fails the path is genuinely not connected at that moment
>>> and failing is the correct thing to do.
>>  
>> Umm...  That might be not good enough - the logics is "everything's
>> reachable from ->s_root anyway, so we might as well not bother checking".
>> For NFS it's simply not true.
>
> If I am parsing the code correctly path_connected is broken for nfsv2
> and nfsv3 when NFS_MOUNT_UNSHARED is not set.   nfsv4 appears to make
> a kernel mount of the real root of the filesystem properly setting
> s_root and then finds the child it is mounting.
>
>> We can mount server:/foo/bar/baz on /tmp/a, then server:/foo on /tmp/b
>> and we'll have ->s_root pointing to a subtree of what's reachable at
>> /tmp/b.  Play with renames under /tmp/b and you just might end up with
>> a problem.  And mount on /tmp/a will be (mistakenly) considered to
>> be safe, since it satisfies the heuristics in path_connected().
>
> Agreed.
>
> Which means that if you mount server:/foo/bar/baz first and then
> mount server:/foo with an appropriate rename you might be able
> to see all of server:/foo or possibly server:/
>
> Hmm..
>
> Given that nfs_kill_super calls generic_shutdown_super and
> generic_shutdown_super calls shrink_dcache_for_umount
> I would argue that nfsv2 and nfsv3 are buggy in the same
> case, as shrink_dcache_for_umount is called on something
> that is not the root of the filesystem's dentry tree.

Ah.  I see now there is now the s_roots list that handles
that bit of strangeness.

So one path is to simply remove the heuristic from
path_connected.

Another path is to have nfsv2 and nfsv3 not set s_root at all.
Leaving the heuristic working for the rest of the filesystems,
and generally simplifying the code.

Something like the diff below I should think.

Eric

 fs/dcache.c      |  3 ++-
 fs/nfs/getroot.c | 35 +----------------------------------
 fs/nfs/super.c   |  2 --
 3 files changed, 3 insertions(+), 37 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 7c38f39958bc..ebe63ca026da 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1501,7 +1501,8 @@ void shrink_dcache_for_umount(struct super_block *sb)
 
 	dentry = sb->s_root;
 	sb->s_root = NULL;
-	do_one_tree(dentry);
+	if (dentry)
+		do_one_tree(dentry);
 
 	while (!hlist_bl_empty(&sb->s_roots)) {
 		dentry = dget(hlist_bl_entry(hlist_bl_first(&sb->s_roots), struct dentry, d_hash));
diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c
index 391dafaf9182..f609d583fd2b 100644
--- a/fs/nfs/getroot.c
+++ b/fs/nfs/getroot.c
@@ -36,35 +36,6 @@
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
 
-/*
- * Set the superblock root dentry.
- * Note that this function frees the inode in case of error.
- */
-static int nfs_superblock_set_dummy_root(struct super_block *sb, struct inode *inode)
-{
-	/* The mntroot acts as the dummy root dentry for this superblock */
-	if (sb->s_root == NULL) {
-		sb->s_root = d_make_root(inode);
-		if (sb->s_root == NULL)
-			return -ENOMEM;
-		ihold(inode);
-		/*
-		 * Ensure that this dentry is invisible to d_find_alias().
-		 * Otherwise, it may be spliced into the tree by
-		 * d_splice_alias if a parent directory from the same
-		 * filesystem gets mounted at a later time.
-		 * This again causes shrink_dcache_for_umount_subtree() to
-		 * Oops, since the test for IS_ROOT() will fail.
-		 */
-		spin_lock(&d_inode(sb->s_root)->i_lock);
-		spin_lock(&sb->s_root->d_lock);
-		hlist_del_init(&sb->s_root->d_u.d_alias);
-		spin_unlock(&sb->s_root->d_lock);
-		spin_unlock(&d_inode(sb->s_root)->i_lock);
-	}
-	return 0;
-}
-
 /*
  * get an NFS2/NFS3 root dentry from the root filehandle
  */
@@ -102,11 +73,7 @@ struct dentry *nfs_get_root(struct super_block *sb, struct nfs_fh *mntfh,
 		goto out;
 	}
 
-	error = nfs_superblock_set_dummy_root(sb, inode);
-	if (error != 0) {
-		ret = ERR_PTR(error);
-		goto out;
-	}
+	/* Leave nfsv2 and nfsv3 s_root == NULL */
 
 	/* root dentries normally start off anonymous and get spliced in later
 	 * if the dentry tree reaches them; however if the dentry already
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 29bacdc56f6a..30b45ea4dfe9 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2625,9 +2625,7 @@ struct dentry *nfs_fs_mount_common(struct nfs_server *server,
 		}
 		s->s_bdi->ra_pages = server->rpages * NFS_MAX_READAHEAD;
 		server->super = s;
-	}
 
-	if (!s->s_root) {
 		/* initial superblock/root creation */
 		mount_info->fill_super(s, mount_info);
 		nfs_get_cache_cookie(s, mount_info->parsed, mount_info->cloned);

  reply	other threads:[~2018-03-12 23:53 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-22 23:50 [PATCH v2 0/6] fs/dcache: avoid trylock loops John Ogness
2018-02-22 23:50 ` [PATCH v2 1/6] fs/dcache: Remove stale comment from dentry_kill() John Ogness
2018-02-22 23:50 ` [PATCH v2 2/6] fs/dcache: Move dentry_kill() below lock_parent() John Ogness
2018-02-22 23:50 ` [PATCH v2 3/6] fs/dcache: Avoid the try_lock loop in d_delete() John Ogness
2018-02-23  2:08   ` Al Viro
2018-02-22 23:50 ` [PATCH v2 4/6] fs/dcache: Avoid the try_lock loops in dentry_kill() John Ogness
2018-02-23  2:22   ` Al Viro
2018-02-23  3:12     ` Al Viro
2018-02-23  3:16       ` Al Viro
2018-02-23  5:46       ` Al Viro
2018-02-22 23:50 ` [PATCH v2 5/6] fs/dcache: Avoid a try_lock loop in shrink_dentry_list() John Ogness
2018-02-23  3:48   ` Al Viro
2018-02-22 23:50 ` [PATCH v2 6/6] fs/dcache: Avoid remaining " John Ogness
2018-02-23  3:58   ` Al Viro
2018-02-23  4:08     ` Al Viro
2018-02-23 13:57       ` John Ogness
2018-02-23 15:09         ` Al Viro
2018-02-23 17:42           ` Al Viro
2018-02-23 20:13             ` [BUG] lock_parent() breakage when used from shrink_dentry_list() (was Re: [PATCH v2 6/6] fs/dcache: Avoid remaining try_lock loop in shrink_dentry_list()) Al Viro
2018-02-23 21:35               ` Linus Torvalds
2018-02-24  0:22                 ` Al Viro
2018-02-25  7:40                   ` Al Viro
2018-02-27  5:16                     ` dcache: remove trylock loops (was Re: [BUG] lock_parent() breakage when used from shrink_dentry_list()) John Ogness
2018-03-12 19:13                       ` Al Viro
2018-03-12 20:05                         ` Al Viro
2018-03-12 20:33                           ` Al Viro
2018-03-13  1:12                           ` NeilBrown
2018-04-28  0:10                             ` Al Viro
2018-03-12 20:23                         ` Eric W. Biederman
2018-03-12 20:39                           ` Al Viro
2018-03-12 23:28                             ` Eric W. Biederman
2018-03-12 23:52                               ` Eric W. Biederman [this message]
2018-03-13  0:37                                 ` Al Viro
2018-03-13  0:50                                   ` Al Viro
2018-03-13  4:02                                     ` Eric W. Biederman
2018-03-14 23:20                                     ` [PATCH] fs: Teach path_connected to handle nfs filesystems with multiple roots Eric W. Biederman
2018-03-15 22:34                                       ` Al Viro
2018-03-13  0:36                               ` dcache: remove trylock loops (was Re: [BUG] lock_parent() breakage when used from shrink_dentry_list()) Al Viro
2018-03-12 22:14                         ` Thomas Gleixner
2018-03-13 20:46                         ` John Ogness
2018-03-13 21:05                           ` John Ogness
2018-03-13 23:59                             ` Al Viro
2018-03-14  2:58                               ` Matthew Wilcox
2018-03-14  8:18                               ` John Ogness
2018-03-02  9:04                     ` [BUG] lock_parent() breakage when used from shrink_dentry_list() (was Re: [PATCH v2 6/6] fs/dcache: Avoid remaining try_lock loop in shrink_dentry_list()) Sebastian Andrzej Siewior
2018-02-23  0:59 ` [PATCH v2 0/6] fs/dcache: avoid trylock loops Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tvtk97i8.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=bigeasy@linutronix.de \
    --cc=hch@lst.de \
    --cc=john.ogness@linutronix.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.