linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list
Date: Thu, 9 Nov 2017 20:50:29 +0000	[thread overview]
Message-ID: <20171109205029.GD21978@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFwxFCi5mDEadCf8xxK-LQ-aXUhd1Ox6m51vsXt3uexFpw@mail.gmail.com>

On Thu, Nov 09, 2017 at 11:52:48AM -0800, Linus Torvalds wrote:
> Honestly, looking at the code, the whole s_anon thing seems entirely
> broken. There doesn't even seem to be much reason for it. In pretty
> much all cases, we could just hash the damn dentry,
> 
> The only reason for actually having s_anon seems to be that we want
> some per-superblock list of these unconnected dentries for
> shrink_dcache_for_umount().
> 
> Everything else would actually be *much* happier with just having the
> dentry on the regular hash table. It would entirely get rid of this
> stupid performance problem, and it would actually simplify all the
> code elsewhere, because it would remove special cases like this
> 
>                 if (unlikely(IS_ROOT(dentry)))
>                         b = &dentry->d_sb->s_anon;
>                 else
>                         b = d_hash(dentry->d_name.hash);
> 
> and just turn them into
> 
>                 b = d_hash(dentry->d_name.hash);
> 
> so I really wonder if we could just get rid of s_anon entirely.
> 
> Yes, getting rid of s_anon might involve crazy things like "let's just
> walk all the dentries at umount time", but honestly, that sounds
> preferable. Especially if we can just then do something like
> 
>  - set a special flag in the superblock if we ever use __d_obtain_alias()

Automatically set for a lot of NFS mounts (whenever you mount more than one
tree from the same server, IIRC)...

>  - only scan all the dentries on umount if that flag is set.
> 
> Hmm?

That looks like a bloody painful approach, IMO.  I'm not saying I like
Neil's patch, but I doubt that "let's just walk the entire dcache on
umount" is a good idea.

I wonder if separating the d_obtain_alias() and d_obtain_root() would be
a good idea; the former outnumber the latter by many orders of magnitude.
The tricky part is that we could have a disconnected directory from
d_obtain_alias() with children already connected to it (and thus normally
hashed by d_splice_alias()) and fail to connect the whole thing to parent.

That leaves us with an orphaned tree that might stick around past the
time when we drop all references to dentries in it.  And we want to
get those hunted down and shot on umount.  Could we
	* make s_anon hold d_obtain_root ones + orphans from such
failed reconnects
	* make final dput() treat hashed IS_ROOT as "don't retain it"
	* have d_obtain_alias() put into normal hash, leaving the
"move to s_anon" part to reconnect failures.
	* keep umount side of things unchanged.

I agree that temporary insertions into ->s_anon are bogus; hell, I'm not
even sure we want to put it on _any_ list initially - we want it to look
like it's hashed, so we could set ->next to NULL and have ->pprev point
to itself.  Then normal case for d_obtain_alias() would not bother
the hash chains at all at allocation time, then have it put into the
right hash chain on reconnect.  And on reconnect failure the caller
would've moved it to orphan list (i.e. ->s_anon).

  reply	other threads:[~2017-11-09 20:50 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-09  3:22 [PATCH 0/3] Three VFS patch resends NeilBrown
2017-11-09  3:22 ` [PATCH 1/3] VFS: use synchronize_rcu_expedited() in namespace_unlock() NeilBrown
2017-11-09  3:22 ` [PATCH 2/3] Improve fairness when locking the per-superblock s_anon list NeilBrown
2017-11-09 19:52   ` Linus Torvalds
2017-11-09 20:50     ` Al Viro [this message]
2017-11-09 23:09       ` NeilBrown
2017-11-09 23:19         ` Al Viro
2017-11-10  0:02       ` Linus Torvalds
2017-11-10  8:50     ` Christoph Hellwig
2017-11-09  3:22 ` [PATCH 3/3] VFS: close race between getcwd() and d_move() NeilBrown
2017-11-09 11:41   ` Nikolay Borisov
2017-11-09 13:08     ` Matthew Wilcox
2017-11-09 16:02       ` Nikolay Borisov
2017-11-09 20:23   ` Linus Torvalds
2017-11-09 22:14     ` NeilBrown
2017-11-10  1:40       ` Linus Torvalds
2017-11-10  4:45         ` NeilBrown
2017-11-10 19:52           ` Linus Torvalds
2017-11-10 20:53           ` Al Viro
2017-11-21 23:50             ` Al Viro
2017-11-22  1:31               ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171109205029.GD21978@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).