linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Al Viro <viro@ZenIV.linux.org.uk>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list
Date: Fri, 10 Nov 2017 10:09:09 +1100	[thread overview]
Message-ID: <87zi7vdp7u.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20171109205029.GD21978@ZenIV.linux.org.uk>

[-- Attachment #1: Type: text/plain, Size: 4514 bytes --]

On Thu, Nov 09 2017, Al Viro wrote:

> On Thu, Nov 09, 2017 at 11:52:48AM -0800, Linus Torvalds wrote:
>> Honestly, looking at the code, the whole s_anon thing seems entirely
>> broken. There doesn't even seem to be much reason for it. In pretty
>> much all cases, we could just hash the damn dentry,
>> 
>> The only reason for actually having s_anon seems to be that we want
>> some per-superblock list of these unconnected dentries for
>> shrink_dcache_for_umount().
>> 
>> Everything else would actually be *much* happier with just having the
>> dentry on the regular hash table. It would entirely get rid of this
>> stupid performance problem, and it would actually simplify all the
>> code elsewhere, because it would remove special cases like this
>> 
>>                 if (unlikely(IS_ROOT(dentry)))
>>                         b = &dentry->d_sb->s_anon;
>>                 else
>>                         b = d_hash(dentry->d_name.hash);
>> 
>> and just turn them into
>> 
>>                 b = d_hash(dentry->d_name.hash);
>> 
>> so I really wonder if we could just get rid of s_anon entirely.
>> 
>> Yes, getting rid of s_anon might involve crazy things like "let's just
>> walk all the dentries at umount time", but honestly, that sounds
>> preferable. Especially if we can just then do something like
>> 
>>  - set a special flag in the superblock if we ever use __d_obtain_alias()
>
> Automatically set for a lot of NFS mounts (whenever you mount more than one
> tree from the same server, IIRC)...
>
>>  - only scan all the dentries on umount if that flag is set.
>> 
>> Hmm?
>
> That looks like a bloody painful approach, IMO.  I'm not saying I like
> Neil's patch, but I doubt that "let's just walk the entire dcache on
> umount" is a good idea.
>
> I wonder if separating the d_obtain_alias() and d_obtain_root() would be
> a good idea; the former outnumber the latter by many orders of magnitude.
> The tricky part is that we could have a disconnected directory from
> d_obtain_alias() with children already connected to it (and thus normally
> hashed by d_splice_alias()) and fail to connect the whole thing to parent.
>
> That leaves us with an orphaned tree that might stick around past the
> time when we drop all references to dentries in it.  And we want to
> get those hunted down and shot on umount.  Could we
> 	* make s_anon hold d_obtain_root ones + orphans from such
> failed reconnects
> 	* make final dput() treat hashed IS_ROOT as "don't retain it"
> 	* have d_obtain_alias() put into normal hash, leaving the
> "move to s_anon" part to reconnect failures.
> 	* keep umount side of things unchanged.
>
> I agree that temporary insertions into ->s_anon are bogus; hell, I'm not
> even sure we want to put it on _any_ list initially - we want it to look
> like it's hashed, so we could set ->next to NULL and have ->pprev point
> to itself.  Then normal case for d_obtain_alias() would not bother
> the hash chains at all at allocation time, then have it put into the
> right hash chain on reconnect.  And on reconnect failure the caller
> would've moved it to orphan list (i.e. ->s_anon).

I looked back at the original bug report, and it was d_obtain_alias()
that was particularly suffering.  As this holds two spinlocks while
waiting for the bl_lock, a delay can affect other code that doesn't
touch s_anon.
So improving d_obtain_alias() would be a good goal (as long as we don't
just move the problem of course).

It isn't only "reconnect failure" that leaves things on s_anon.
We normally don't bother trying to connect non-directories.
So if an NFS server is getting lots of read/write request without opens
or other pathname lookups, it could easily have lots of disconnected
files being repeatedly accessed.  Keeping the dentries on d_anon means
we don't need to keep allocating new ones for every request.

So I'm not keen on dropping an IS_ROOT() dentry at final dput(), but
it might make sense to add the dentry to the per-fs list of IS_ROOT
dentries at that time.

One possible approach would be to use d_child rather than d_hash to link
together dentries that don't have a parent.
We could assign a random number to d_name.hash so it could appear to be
hashed without imposing on any one hash chain.  We would still need a
spinlock in the superblock to manage the d_anon list that links the
d_child's together...
I might try to see how the code looks.

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2017-11-09 23:09 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09  3:22 [PATCH 0/3] Three VFS patch resends NeilBrown
2017-11-09  3:22 ` [PATCH 1/3] VFS: use synchronize_rcu_expedited() in namespace_unlock() NeilBrown
2017-11-09  3:22 ` [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list NeilBrown
2017-11-09 19:52   ` Linus Torvalds
2017-11-09 20:50     ` Al Viro
2017-11-09 23:09       ` NeilBrown [this message]
2017-11-09 23:19         ` Al Viro
2017-11-10  0:02       ` Linus Torvalds
2017-11-10  8:50     ` Christoph Hellwig
2017-11-09  3:22 ` [PATCH 3/3] VFS: close race between getcwd() and d_move() NeilBrown
2017-11-09 11:41   ` Nikolay Borisov
2017-11-09 13:08     ` Matthew Wilcox
2017-11-09 16:02       ` Nikolay Borisov
2017-11-09 20:23   ` Linus Torvalds
2017-11-09 22:14     ` NeilBrown
2017-11-10  1:40       ` Linus Torvalds
2017-11-10  4:45         ` NeilBrown
2017-11-10 19:52           ` Linus Torvalds
2017-11-10 20:53           ` Al Viro
2017-11-21 23:50             ` Al Viro
2017-11-22  1:31               ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zi7vdp7u.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).