Re: [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list

From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list
Date: Thu, 9 Nov 2017 20:50:29 +0000
Message-ID: <20171109205029.GD21978@ZenIV.linux.org.uk>
In-Reply-To: <CA+55aFwxFCi5mDEadCf8xxK-LQ-aXUhd1Ox6m51vsXt3uexFpw@mail.gmail.com>

On Thu, Nov 09, 2017 at 11:52:48AM -0800, Linus Torvalds wrote:
> Honestly, looking at the code, the whole s_anon thing seems entirely
> broken. There doesn't even seem to be much reason for it. In pretty
> much all cases, we could just hash the damn dentry,
> 
> The only reason for actually having s_anon seems to be that we want
> some per-superblock list of these unconnected dentries for
> shrink_dcache_for_umount().
> 
> Everything else would actually be *much* happier with just having the
> dentry on the regular hash table. It would entirely get rid of this
> stupid performance problem, and it would actually simplify all the
> code elsewhere, because it would remove special cases like this
> 
>                 if (unlikely(IS_ROOT(dentry)))
>                         b = &dentry->d_sb->s_anon;
>                 else
>                         b = d_hash(dentry->d_name.hash);
> 
> and just turn them into
> 
>                 b = d_hash(dentry->d_name.hash);
> 
> so I really wonder if we could just get rid of s_anon entirely.
> 
> Yes, getting rid of s_anon might involve crazy things like "let's just
> walk all the dentries at umount time", but honestly, that sounds
> preferable. Especially if we can just then do something like
> 
>  - set a special flag in the superblock if we ever use __d_obtain_alias()

That flag would get set automatically for a lot of NFS mounts (whenever you
mount more than one tree from the same server, IIRC)...

>  - only scan all the dentries on umount if that flag is set.
> 
> Hmm?
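
The two bucket-selection variants quoted above, plus the flag-gated umount walk, can be sketched in user space. Everything here is hypothetical: toy_sb, toy_dentry, toy_d_hash() and the used_obtain_alias flag only stand in for the superblock, the dentry, d_hash() and the proposed "ever used __d_obtain_alias()" flag. None of this is real fs/dcache.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
#define TOY_HASH_BUCKETS 64u
#define TOY_S_ANON (-1)              /* stands in for &dentry->d_sb->s_anon */

struct toy_sb {
	bool used_obtain_alias;      /* proposed "ever used __d_obtain_alias()" flag */
};

struct toy_dentry {
	struct toy_sb *sb;
	unsigned int name_hash;      /* stands in for dentry->d_name.hash */
	bool is_root;                /* stands in for IS_ROOT(dentry) */
};

/* Stand-in for d_hash(): map a name hash to a bucket index. */
static int toy_d_hash(unsigned int hash)
{
	return (int)(hash % TOY_HASH_BUCKETS);
}

/* Today: IS_ROOT dentries are special-cased onto the per-superblock list. */
static int toy_bucket_with_s_anon(const struct toy_dentry *d)
{
	if (d->is_root)
		return TOY_S_ANON;
	return toy_d_hash(d->name_hash);
}

/* Proposal: every dentry goes on the regular hash table. */
static int toy_bucket_without_s_anon(const struct toy_dentry *d)
{
	return toy_d_hash(d->name_hash);
}

/* Proposal: the obtain-alias path marks the superblock... */
static void toy_obtain_alias(struct toy_dentry *d)
{
	d->is_root = true;           /* disconnected: no parent yet */
	d->sb->used_obtain_alias = true;
}

/* ...and umount walks every dentry only if that mark was ever set. */
static bool toy_umount_needs_full_scan(const struct toy_sb *sb)
{
	return sb->used_obtain_alias;
}
```

Per the reply just above, that flag would be set on many NFS mounts anyway, so on those the gate would not spare the full walk.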

That looks like a bloody painful approach, IMO.  I'm not saying I like
Neil's patch, but I doubt that "let's just walk the entire dcache on
umount" is a good idea.

I wonder if separating the d_obtain_alias() and d_obtain_root() cases would
be a good idea; the former outnumber the latter by many orders of magnitude.
The tricky part is that we could have a disconnected directory from
d_obtain_alias() with children already connected to it (and thus normally
hashed by d_splice_alias()) and then fail to connect the whole thing to its
parent.

That leaves us with an orphaned tree that might stick around past the
time when we drop all references to dentries in it.  And we want to
get those hunted down and shot on umount.  Could we
	* make s_anon hold d_obtain_root ones + orphans from such
failed reconnects
	* make the final dput() treat hashed IS_ROOT as "don't retain it"
	* have d_obtain_alias() put dentries into the normal hash, leaving the
"move to s_anon" part to reconnect failures
	* keep the umount side of things unchanged?
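
The four bullets above can be modelled as a tiny state machine, assuming a dentry only needs to track which list it sits on and whether it is IS_ROOT. This is a hypothetical user-space sketch: the toy_* names and the enum are invented for illustration, and real code would also have to deal with locking, refcounts and the children hanging off an orphan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the four bullets; not kernel code. */
enum toy_home {
	TOY_UNHASHED,
	TOY_HASH_TABLE,              /* the regular dentry hash */
	TOY_S_ANON,                  /* per-superblock list: roots + orphans */
};

struct toy_dentry {
	enum toy_home home;
	bool is_root;                /* stands in for IS_ROOT() */
};

/* Bullet 1: d_obtain_root() results still live on s_anon. */
static void toy_obtain_root(struct toy_dentry *d)
{
	d->is_root = true;
	d->home = TOY_S_ANON;
}

/* Bullet 3: d_obtain_alias() results go on the normal hash. */
static void toy_obtain_alias(struct toy_dentry *d)
{
	d->is_root = true;           /* disconnected until reconnected */
	d->home = TOY_HASH_TABLE;
}

/* Reconnect outcome: success gives it a parent, failure makes it an orphan. */
static void toy_reconnect(struct toy_dentry *d, bool connected)
{
	if (connected)
		d->is_root = false;      /* real code would rehash under its name */
	else
		d->home = TOY_S_ANON;    /* bullet 1: orphans move to s_anon */
}

/*
 * Bullet 2: on the final dput(), a hashed IS_ROOT dentry is not retained.
 * s_anon membership counts as hashed here, since s_anon entries are
 * linked through the same d_hash node as regular hash entries.
 */
static bool toy_final_dput_retains(const struct toy_dentry *d)
{
	bool hashed = d->home != TOY_UNHASHED;
	return !(hashed && d->is_root);
}

/* Bullet 4: umount is unchanged and still sweeps whatever is on s_anon. */
static bool toy_umount_sweeps(const struct toy_dentry *d)
{
	return d->home == TOY_S_ANON;
}
```

In this model the failed-reconnect orphan is dropped by its final dput() like any other hashed IS_ROOT dentry; the umount sweep matters for orphans that are still pinned, for example by children connected under them.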

I agree that temporary insertions into ->s_anon are bogus; hell, I'm not
even sure we want to put it on _any_ list initially - we want it to look
like it's hashed, so we could set ->next to NULL and have ->pprev point
back at the node's own ->next.  Then the normal case for d_obtain_alias()
would not touch the hash chains at all at allocation time; the dentry
would be put into the right hash chain on reconnect.  And on reconnect
failure the caller would move it to the orphan list (i.e. ->s_anon).
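
The "looks hashed but sits on no list" trick can be shown with a plain doubly linked hlist. This is a minimal user-space sketch, assuming simplified hnode/hhead types (hypothetical; dentries actually use the bit-locked hlist_bl variant, whose lock bit is left out here). Pointing ->pprev at the node's own ->next makes the unhashed check fail and lets the ordinary delete run unchanged; for plain hlists, include/linux/list.h already provides the same trick as hlist_add_fake(). toy_finish_reconnect() is a hypothetical helper for the reconnect step.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified hlist types (hypothetical; no lock bit as in hlist_bl). */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

struct hhead {
	struct hnode *first;
};

static int hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

/* Look hashed while sitting on no chain: ->pprev points at our own ->next. */
static void hnode_add_fake(struct hnode *n)
{
	n->next = NULL;
	n->pprev = &n->next;
}

static void hnode_add_head(struct hnode *n, struct hhead *h)
{
	struct hnode *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Ordinary delete; for a fake node it only writes NULL into n->next. */
static void hnode_del_init(struct hnode *n)
{
	if (hnode_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical reconnect step: real hash chain on success, orphans otherwise. */
static void toy_finish_reconnect(struct hnode *n, struct hhead *bucket,
				 struct hhead *orphans, int connected)
{
	hnode_del_init(n);           /* leave the fake single-node "list" */
	hnode_add_head(n, connected ? bucket : orphans);
}
```

Deleting a fake-hashed node only rewrites its own ->next, which is why no hash chain (and no chain lock) gets touched until reconnect time.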