From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Stephen Brennan <stephen.s.brennan@oracle.com>,
	Matthew Wilcox <willy@infradead.org>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Gao Xiang <hsiangkao@linux.alibaba.com>,
	Dave Chinner <david@fromorbit.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Colin Walters <walters@verbum.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/2] fs/dcache: Per directory amortized negative dentry pruning
Date: Thu, 31 Mar 2022 12:08:25 -0700
Message-ID: <20220331190827.48241-1-stephen.s.brennan@oracle.com>

Hi Al et al.

I wanted to share this idea for a way to approach reducing negative dentry
bloat. I think it's got some flaws that still need to be worked out, but I like
this approach more than the previous ones we've seen.

Previous attempts to reduce the bloat have looked at dentries from the hash
bucket perspective, and from the per-sb LRU perspective. This series looks at
them from the directory perspective. It's hard to look at a hash bucket or LRU
list and design a heuristic for an acceptable number of negative dentries: it
won't scale well from small to large systems. But setting up heuristics on a
per-directory basis will scale better, and it's easier to reason about.

This patch creates a heuristic sysctl, fs.negative-dentry-ratio, which defines
the acceptable ratio of negative to positive dentries in a directory, and
defaults it to 5 negative dentries per positive. Of course, right now we don't
track the number of children of a dentry (let alone whether they are negative or
positive) so applying a heuristic is difficult. We also don't maintain a
per-directory LRU list, so identifying candidates to prune is difficult as well.
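To make the ratio concrete, here is a minimal userspace C sketch of the check
it implies. It assumes per-directory positive/negative counts are available,
which (as noted above) the dcache does not track today. struct dir_counts,
child_became_negative() and too_many_negatives() are hypothetical names for
illustration, and negative_dentry_ratio only stands in for the proposed sysctl.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the proposed fs.negative-dentry-ratio sysctl (default 5). */
static unsigned long negative_dentry_ratio = 5;

/*
 * Hypothetical per-directory child counts. The real dcache does not keep
 * these; this sketch simply assumes they exist.
 */
struct dir_counts {
	unsigned long positive;
	unsigned long negative;
};

/* Hypothetical hook: a positive child turns negative (e.g. unlink). */
static void child_became_negative(struct dir_counts *c)
{
	assert(c->positive > 0);
	c->positive--;
	c->negative++;
}

/* True when negatives exceed ratio * positives in this directory. */
static bool too_many_negatives(const struct dir_counts *c)
{
	return c->negative > negative_dentry_ratio * c->positive;
}
```

With the default of 5, a directory holding 10 positive and 50 negative
children sits exactly at the limit; a single unlink (9 positive, 51 negative)
pushes it over. A directory with no positive children trips the check with just
one negative, which is the small-directory issue raised further down.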

The approach I took is inspired by the way cursors iterate slowly through a
directory. Dentries maintain a cursor that points into their d_subdirs list, and
as dentries are created or become negative, we scan a few dentries of the parent
directory, killing a negative dentry if we see too many. The hope is that this
is fairer: there will be a performance cost, but now the tasks which create
more dentries do more of the scanning, rather than some workqueue or an
unrelated task calling dput(). And, since the amount of scanning is tied to the
creation of dentries, it scales naturally as workloads start creating
ridiculous numbers of negative dentries.
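A rough userspace model of the cursor idea, under stated assumptions: siblings
live in a circular doubly linked list with a sentinel (like d_subdirs), each new
child is inserted at the front as list_add() would, and the creating task then
examines a few siblings starting where the previous scan stopped. The pruning
rule (prune a negative once more than negative_dentry_ratio negatives have been
seen since the last positive, resetting the count when the cursor wraps) is one
reading of the behavior described in gripe (1) below, not the code from the
patch. All names, including the per-event scan_batch, are hypothetical. The
model ignores locking and reference counts, and omits the "dentry becomes
negative" hook, which would simply call dir_scan() as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the proposed fs.negative-dentry-ratio sysctl (default 5). */
static unsigned int negative_dentry_ratio = 5;
/* Hypothetical: how many siblings a creating task examines per event. */
static unsigned int scan_batch = 3;

struct child {
	bool positive;
	struct child *prev, *next;
};

struct dir {
	struct child head;     /* sentinel of a circular list, like d_subdirs */
	struct child *cursor;  /* next child to examine; &head means "wrap" */
	unsigned int neg_run;  /* negatives seen since the last positive */
	unsigned long nr_pos, nr_neg;
};

static void dir_init(struct dir *d)
{
	d->head.prev = d->head.next = &d->head;
	d->cursor = &d->head;
	d->neg_run = 0;
	d->nr_pos = d->nr_neg = 0;
}

/* Examine up to `budget` children, resuming where the last scan stopped. */
static void dir_scan(struct dir *d, unsigned int budget)
{
	for (; budget > 0 && d->head.next != &d->head; budget--) {
		struct child *c = d->cursor;

		if (c == &d->head) {
			/* Wrapped around: start a new pass from the front. */
			c = d->head.next;
			d->neg_run = 0;
		}
		d->cursor = c->next;  /* advance before c might be freed */

		if (c->positive) {
			d->neg_run = 0;
		} else if (++d->neg_run > negative_dentry_ratio) {
			/* Too many negatives in a row: prune this one. */
			c->prev->next = c->next;
			c->next->prev = c->prev;
			free(c);
			assert(d->nr_neg > 0);
			d->nr_neg--;
			d->neg_run = negative_dentry_ratio;
		}
	}
}

/* A new child appears; the creating task pays for a little scanning. */
static void dir_add(struct dir *d, bool positive)
{
	struct child *c = malloc(sizeof(*c));

	if (!c)
		abort();
	c->positive = positive;
	/* Insert at the front of the sibling list, as list_add() would. */
	c->prev = &d->head;
	c->next = d->head.next;
	d->head.next->prev = c;
	d->head.next = c;
	if (positive)
		d->nr_pos++;
	else
		d->nr_neg++;
	dir_scan(d, scan_batch);
}
```

For example, adding 100 negative children to an empty directory and then
calling dir_scan(&d, 1000) leaves exactly 5 negatives, while adding 100 rounds
of one positive followed by five negatives never prunes anything.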

Some other advantages are:

(1) By relying on the siblings list (not the LRU) we avoid nasty contention
    issues on the LRU lists. The parent dentry lock was already going to be
    taken during d_alloc, so there's nothing new here.
(2) By keeping pruning on a per-directory basis, we're not forced to evict
    potentially useful dentries elsewhere. For instance, if /tmp/foo has a
    workload producing lots of negative dentries, it won't start evicting useful
    cached negative dentries in /usr/bin.

It's not perfect. I have a few gripes with this approach that I want to improve:

(1) The pruning behavior is based on the ordering of dentries in the d_subdirs
    list. If you have 100 positive dentries all adjacent to each other, the
    pruner will see them, but only allow 5 negative dentries to come after them
    before it starts pruning. On the other hand, the pruner would not prune a
    list of dentries containing 500 negative dentries and 100 positive dentries,
    assuming that they are evenly shuffled together. This workload-dependence is
    bad, full stop. I hope to improve this.
(2) The ratio approach is a bit aggressive for small directories. For the
    default case, only 5 negative dentries are allowed. There could be a default
    lower bound to allow small directories a more reasonable number. But this
    would require knowing the number of children ahead of time.
(3) The cursor approach fails to do much in the way of prioritizing older /
    less used dentries.
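The order dependence in (1) can be reproduced with an even smaller model: a
single pass over the children in list order, written as a string of 'P'
(positive) and 'N' (negative). The rule is an assumed one (keep a negative only
while fewer than `ratio` negatives have been seen since the last positive), and
surviving_negatives() is a hypothetical helper, not something in the series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one pruning pass. Walk children in list order and
 * count negatives since the last positive; a negative arriving once that run
 * has reached `ratio` is pruned. Returns how many negatives survive.
 */
static size_t surviving_negatives(const char *order, size_t n,
				  unsigned int ratio)
{
	size_t kept = 0;
	unsigned int run = 0;

	for (size_t i = 0; i < n; i++) {
		assert(order[i] == 'P' || order[i] == 'N');
		if (order[i] == 'P') {
			run = 0;          /* a positive resets the run */
		} else if (run < ratio) {
			run++;            /* still within the allowance */
			kept++;
		}
		/* otherwise this negative is pruned and run stays at ratio */
	}
	return kept;
}
```

With a ratio of 5, 100 'P' followed by 500 'N' keeps only 5 negatives, whereas
"PNNNNN" repeated 100 times (the same 100 positives and 500 negatives) keeps
all 500.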

I based the series on current master, and did some light testing with parallel
fstat() workloads to see how much I could bloat up a directory (the series gave
a major reduction in dentry cache size, even in the worst-case scenario). I've also got a
simulation script which can create different workloads and simulate a
directory's negative/positive dentry count over time.

See also this LSF/MM discussion[1] regarding negative dentry handling, which
motivated me to think a bit more about approaches. Also this[2] past series
tries to tame negative dentry bloat by reordering the d_subdirs list, which is
what got me thinking about the cursor-based approach. I think some improved
version of this RFC, if accepted, would eliminate any need for [2].

[1]: https://lore.kernel.org/linux-fsdevel/YjDvRPuxPN0GsxLB@casper.infradead.org/
[2]: https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/

Stephen Brennan (2):
  fs/dcache: make cond_resched in __dentry_kill optional
  fs/dcache: Add negative-dentry-ratio config

 fs/dcache.c            | 108 ++++++++++++++++++++++++++++++++++++++---
 include/linux/dcache.h |   1 +
 2 files changed, 101 insertions(+), 8 deletions(-)

-- 
2.30.2


Thread overview: 5+ messages
2022-03-31 19:08 Stephen Brennan [this message]
2022-03-31 19:08 ` [RFC PATCH 1/2] fs/dcache: make cond_resched in __dentry_kill optional Stephen Brennan
2022-03-31 19:08 ` [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config Stephen Brennan
2022-03-31 19:45   ` Al Viro
2022-03-31 20:37     ` Stephen Brennan
