All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] fs/dcache: Per directory amortized negative dentry pruning
@ 2022-03-31 19:08 Stephen Brennan
  2022-03-31 19:08 ` [RFC PATCH 1/2] fs/dcache: make cond_resched in __dentry_kill optional Stephen Brennan
  2022-03-31 19:08 ` [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config Stephen Brennan
  0 siblings, 2 replies; 5+ messages in thread
From: Stephen Brennan @ 2022-03-31 19:08 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Luis Chamberlain, Arnd Bergmann, Stephen Brennan,
	Matthew Wilcox, James Bottomley, Gao Xiang, Dave Chinner,
	Roman Gushchin, Colin Walters, linux-fsdevel, linux-kernel

Hi Al et al.

I wanted to share this idea for a way to approach reducing negative dentry
bloat. I think it's got some flaws that still need to be worked out, but I like
this approach more than the previous ones we've seen.

Previous attempts to reduce the bloat have looked at dentries from the hash
bucket perspective, and from the per-sb LRU perspective. This series looks at
them from the directory perspective. It's hard to look at a hash bucket or LRU
list and design a heuristic for an acceptable amount of negative dentries: it
won't scale from small to large systems well. But setting up heuristics on a
per-directory basis will scale better, and it's easier to reason about.

This patch creates a heuristic sysctl, fs.negative-dentry-ratio, which defines
the acceptable ratio of negative to positive dentries in a directory, and
defaults it to 5 negative dentries per positive. Of course, right now we don't
track the number of children of a dentry (let alone whether they are negative or
positive) so applying a heuristic is difficult. We also don't maintain a
per-directory LRU list, so identifying candidates to prune is difficult as well.

The approach I took is inspired by the way cursors iterate slowly through a
directory. Dentries maintain a cursor that points into their d_subdirs list, and
as dentries are created or become negative, we scan a few dentries of the parent
directory, killing a negative dentry if we see too many. The hope is that this
is more fair: there will be a performance cost, but now tasks which create more
dentries are tasked with the scanning, rather than some workqueue or an
unrelated task calling dput(). And, since the amount of scanning is tied to the
creation of dentries, it scales easily as workloads start creating ridiculous
amount of negative dentries.

Some other advantages are:

(1) By relying on the siblings list (not the LRU) we avoid nasty contention
    issues on the LRU lists. The parent dentry lock was already going to be
    taken during d_alloc, so there's nothing new here.
(2) By keeping pruning on a per-directory basis, we're not forced to evict
    potentially useful dentries elsewhere. For instance, if /tmp/foo has a
    workload producing lots of negative dentries, it won't start evicting useful
    cached negative dentries in /usr/bin.

It's not perfect. I have a few gripes with this approach that I want to improve:

(1) The pruning behavior is based on the ordering of dentries in the d_subdirs
    list. If you have 100 positive dentries all adjacent to each other, the
    pruner will see them, but only allow 5 negative dentries to come after them
    before it starts pruning. On the other hand, the pruner would not prune a
    list of dentries containing 500 negative dentries and 100 positive dentries,
    assuming that they are evenly shuffled together. This workload-dependence is
    bad, full stop. I hope to improve this.
(2) The ratio approach is a bit aggressive for small directories. For the
    default case, only 5 negative dentries are allowed. There could be a default
    lower-bound to allow small directories a more reasonable number.  But this
    would require knowing the number of children ahead of time.
(3) The cursor approach fails to do much in the way of a prioritizing older /
    less used dentries.

I based the series on current master, and did some light testing with parallel
fstat() workloads to see how much I could bloat up a directory (major
reduction of dentry cache size even in the worst case scenario). I've also got a
simulation script which can create different workloads and simulate a
directory's negative/positive dentry count over time.

See also this LSF/MM discussion[1] regarding negative dentry handling, which
motivated me to think a bit more about approaches. Also this[2] past series
tries to tame negative dentry bloat by reordering the d_subdirs list, which is
what got me thinking about the cursor-based approach. I think some improved
version of this RFC, if accepted, would eliminate any need for [2].

[1]: https://lore.kernel.org/linux-fsdevel/YjDvRPuxPN0GsxLB@casper.infradead.org/
[2]: https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@oracle.com/

Stephen Brennan (2):
  fs/dcache: make cond_resched in __dentry_kill optional
  fs/dcache: Add negative-dentry-ratio config

 fs/dcache.c            | 108 ++++++++++++++++++++++++++++++++++++++---
 include/linux/dcache.h |   1 +
 2 files changed, 101 insertions(+), 8 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH 1/2] fs/dcache: make cond_resched in __dentry_kill optional
  2022-03-31 19:08 [RFC PATCH 0/2] fs/dcache: Per directory amortized negative dentry pruning Stephen Brennan
@ 2022-03-31 19:08 ` Stephen Brennan
  2022-03-31 19:08 ` [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config Stephen Brennan
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen Brennan @ 2022-03-31 19:08 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Luis Chamberlain, Stephen Brennan, Arnd Bergmann,
	Matthew Wilcox, James Bottomley, Gao Xiang, Dave Chinner,
	Roman Gushchin, Colin Walters, linux-fsdevel, linux-kernel

Some callers of __dentry_kill may be killing single dentries, where a
cond_resched is not strictly necessary (and may result in undesirable
sleeps). Add a check_resched flag so the caller may request that we
do not call cond_resched().

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 fs/dcache.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 93f4f5ee07bf..b1480433ddc5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -575,7 +575,7 @@ static inline void dentry_unlist(struct dentry *dentry, struct dentry *parent)
 	}
 }
 
-static void __dentry_kill(struct dentry *dentry)
+static void __dentry_kill(struct dentry *dentry, bool check_resched)
 {
 	struct dentry *parent = NULL;
 	bool can_free = true;
@@ -619,7 +619,8 @@ static void __dentry_kill(struct dentry *dentry)
 	spin_unlock(&dentry->d_lock);
 	if (likely(can_free))
 		dentry_free(dentry);
-	cond_resched();
+	if (check_resched)
+		cond_resched();
 }
 
 static struct dentry *__lock_parent(struct dentry *dentry)
@@ -730,7 +731,7 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 			goto slow_positive;
 		}
 	}
-	__dentry_kill(dentry);
+	__dentry_kill(dentry, true);
 	return parent;
 
 slow_positive:
@@ -742,7 +743,7 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 	if (unlikely(dentry->d_lockref.count != 1)) {
 		dentry->d_lockref.count--;
 	} else if (likely(!retain_dentry(dentry))) {
-		__dentry_kill(dentry);
+		__dentry_kill(dentry, true);
 		return parent;
 	}
 	/* we are keeping it, after all */
@@ -1109,7 +1110,7 @@ void d_prune_aliases(struct inode *inode)
 		if (!dentry->d_lockref.count) {
 			struct dentry *parent = lock_parent(dentry);
 			if (likely(!dentry->d_lockref.count)) {
-				__dentry_kill(dentry);
+				__dentry_kill(dentry, true);
 				dput(parent);
 				goto restart;
 			}
@@ -1198,7 +1199,7 @@ void shrink_dentry_list(struct list_head *list)
 		parent = dentry->d_parent;
 		if (parent != dentry)
 			__dput_to_list(parent, list);
-		__dentry_kill(dentry);
+		__dentry_kill(dentry, true);
 	}
 }
 
@@ -1645,7 +1646,7 @@ void shrink_dcache_parent(struct dentry *parent)
 				parent = data.victim->d_parent;
 				if (parent != data.victim)
 					__dput_to_list(parent, &data.dispose);
-				__dentry_kill(data.victim);
+				__dentry_kill(data.victim, true);
 			}
 		}
 		if (!list_empty(&data.dispose))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config
  2022-03-31 19:08 [RFC PATCH 0/2] fs/dcache: Per directory amortized negative dentry pruning Stephen Brennan
  2022-03-31 19:08 ` [RFC PATCH 1/2] fs/dcache: make cond_resched in __dentry_kill optional Stephen Brennan
@ 2022-03-31 19:08 ` Stephen Brennan
  2022-03-31 19:45   ` Al Viro
  1 sibling, 1 reply; 5+ messages in thread
From: Stephen Brennan @ 2022-03-31 19:08 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Luis Chamberlain, Stephen Brennan, Arnd Bergmann,
	Matthew Wilcox, James Bottomley, Gao Xiang, Dave Chinner,
	Roman Gushchin, Colin Walters, linux-fsdevel, linux-kernel

Negative dentry bloat is a well-known problem. For systems without
memory pressure, some workloads (like repeated stat calls) can create an
unbounded amount of negative dentries quite quickly. In the best case,
these dentries could speed up a subsequent name lookup, but in the worst
case, they are never used and their memory never freed.

While systems without memory pressure may not need that memory for other
purposes, negative dentry bloat can have other side-effects, such as
soft lockups when traversing the d_subdirs list or general slowness with
managing them. It is a good idea to have some sort of mechanism for
controlling negative dentries, even outside memory pressure.

This patch attempts to do so in a fair way. Workloads which create many
negative dentries must create many dentries, or convert dentries from
positive to negative. Thus, negative dentry management is best done
during these same operations, as it will amortize its cost, and
distribute the cost to the perpetrators of the dentry bloat. We
introduce a sysctl "negative-dentry-ratio" which sets a maximum number
of negative dentries per positive dentry, N:1. When a dentry is created
or unlinked, the next N+1 dentries of the parent are scanned. If no
positive dentries are found, then a candidate negative dentry is killed.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 fs/dcache.c            | 93 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/dcache.h |  1 +
 2 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b1480433ddc5..ba79107a8268 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -74,6 +74,8 @@
 int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
 
+unsigned int sysctl_negative_dentry_ratio = 5;
+
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
 
 EXPORT_SYMBOL(rename_lock);
@@ -183,6 +185,7 @@ static int proc_nr_dentry(struct ctl_table *table, int write, void *buffer,
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 
+const unsigned int max_negative_dentry_ratio = 20;
 static struct ctl_table fs_dcache_sysctls[] = {
 	{
 		.procname	= "dentry-state",
@@ -191,6 +194,15 @@ static struct ctl_table fs_dcache_sysctls[] = {
 		.mode		= 0444,
 		.proc_handler	= proc_nr_dentry,
 	},
+	{
+		.procname	= "negative-dentry-ratio",
+		.data		= &sysctl_negative_dentry_ratio,
+		.maxlen	= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1	= SYSCTL_ZERO,
+		.extra2	= (void *)&max_negative_dentry_ratio,
+	},
 	{ }
 };
 
@@ -547,6 +559,8 @@ static inline void dentry_unlist(struct dentry *dentry, struct dentry *parent)
 	dentry->d_flags |= DCACHE_DENTRY_KILLED;
 	if (unlikely(list_empty(&dentry->d_child)))
 		return;
+	if (!IS_ROOT(dentry) && unlikely(dentry->d_parent->d_scan_cursor == &dentry->d_child))
+		dentry->d_parent->d_scan_cursor = dentry->d_child.next;
 	__list_del_entry(&dentry->d_child);
 	/*
 	 * Cursors can move around the list of children.  While we'd been
@@ -1751,6 +1765,71 @@ void d_invalidate(struct dentry *dentry)
 }
 EXPORT_SYMBOL(d_invalidate);
 
+/**
+ * Enforce a requirement that a directory never contains more than N negative
+ * dentries per positive dentry. Scan N + 1 dentries. If no positive dentries
+ * are found, then kill one negative dentry. parent's lock must be held, and it
+ * will be released by this function.
+ */
+static void dentry_incremental_scan(struct dentry *parent)
+{
+	struct dentry *dentry, *candidate = NULL;
+	struct list_head *cursor, *start;
+	unsigned int flags;
+	int refcount;
+	int nr_to_scan = sysctl_negative_dentry_ratio + 1;
+	int nr_positive = 0;
+
+	if (nr_to_scan == 1)
+		goto out_unlock;
+
+	cursor = start = parent->d_scan_cursor ?: parent->d_subdirs.next;
+
+	rcu_read_lock();
+	while (nr_to_scan--) {
+		if (cursor == &parent->d_subdirs)
+			goto next;
+		dentry = container_of(cursor, struct dentry, d_child);
+		flags = READ_ONCE(dentry->d_flags);
+		refcount = READ_ONCE(dentry->d_lockref.count);
+		if (d_flags_negative(flags)) {
+			if (!refcount && !dentry->d_inode && !candidate)
+				candidate = dentry;
+		} else {
+			nr_positive++;
+		}
+	next:
+		cursor = cursor->next;
+		if (cursor == start) {
+			rcu_read_unlock();
+			goto out_unlock;
+		}
+	}
+
+	parent->d_scan_cursor = cursor;
+	if (nr_positive || !candidate) {
+		rcu_read_unlock();
+		goto out_unlock;
+	}
+
+	spin_lock(&candidate->d_lock);
+	rcu_read_unlock();
+
+	/* Need to re-read candidate's flags and inode. */
+	if (!d_is_negative(candidate) || candidate->d_inode ||
+	    candidate->d_lockref.count) {
+		spin_unlock(&candidate->d_lock);
+		goto out_unlock;
+	}
+	/* No need to cond_resched(); we're not repeating this operation */
+	__dentry_kill(candidate, false);
+	/* parent->d_lock is now unlocked */
+	return;
+
+out_unlock:
+	spin_unlock(&parent->d_lock);
+}
+
 /**
  * __d_alloc	-	allocate a dcache entry
  * @sb: filesystem it will belong to
@@ -1814,6 +1893,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_sb = sb;
 	dentry->d_op = NULL;
 	dentry->d_fsdata = NULL;
+	dentry->d_scan_cursor = NULL;
 	INIT_HLIST_BL_NODE(&dentry->d_hash);
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
@@ -1858,7 +1938,7 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	__dget_dlock(parent);
 	dentry->d_parent = parent;
 	list_add(&dentry->d_child, &parent->d_subdirs);
-	spin_unlock(&parent->d_lock);
+	dentry_incremental_scan(parent); /* unlocks parent */
 
 	return dentry;
 }
@@ -2521,6 +2601,7 @@ EXPORT_SYMBOL(d_hash_and_lookup);
 void d_delete(struct dentry * dentry)
 {
 	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
 
 	spin_lock(&inode->i_lock);
 	spin_lock(&dentry->d_lock);
@@ -2529,7 +2610,17 @@ void d_delete(struct dentry * dentry)
 	 */
 	if (dentry->d_lockref.count == 1) {
 		dentry->d_flags &= ~DCACHE_CANT_MOUNT;
+		if (!IS_ROOT(dentry))
+			parent = dentry->d_parent;
 		dentry_unlink_inode(dentry);
+		/*
+		 * Since we have created a negative dentry, continue the
+		 * incremental scan to keep enforcing negative dentry ratio.
+		 */
+		if (parent) {
+			spin_lock(&parent->d_lock);
+			dentry_incremental_scan(parent); /* unlocks parent */
+		}
 	} else {
 		__d_drop(dentry);
 		spin_unlock(&dentry->d_lock);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index f5bba51480b2..59c240d8c493 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -102,6 +102,7 @@ struct dentry {
 	};
 	struct list_head d_child;	/* child of parent list */
 	struct list_head d_subdirs;	/* our children */
+	struct list_head *d_scan_cursor;
 	/*
 	 * d_alias and d_rcu can share memory
 	 */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config
  2022-03-31 19:08 ` [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config Stephen Brennan
@ 2022-03-31 19:45   ` Al Viro
  2022-03-31 20:37     ` Stephen Brennan
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2022-03-31 19:45 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Andrew Morton, Luis Chamberlain, Arnd Bergmann, Matthew Wilcox,
	James Bottomley, Gao Xiang, Dave Chinner, Roman Gushchin,
	Colin Walters, linux-fsdevel, linux-kernel

On Thu, Mar 31, 2022 at 12:08:27PM -0700, Stephen Brennan wrote:
> Negative dentry bloat is a well-known problem. For systems without
> memory pressure, some workloads (like repeated stat calls) can create an
> unbounded amount of negative dentries quite quickly. In the best case,
> these dentries could speed up a subsequent name lookup, but in the worst
> case, they are never used and their memory never freed.
> 
> While systems without memory pressure may not need that memory for other
> purposes, negative dentry bloat can have other side-effects, such as
> soft lockups when traversing the d_subdirs list or general slowness with
> managing them. It is a good idea to have some sort of mechanism for
> controlling negative dentries, even outside memory pressure.
> 
> This patch attempts to do so in a fair way. Workloads which create many
> negative dentries must create many dentries, or convert dentries from
> positive to negative. Thus, negative dentry management is best done
> during these same operations, as it will amortize its cost, and
> distribute the cost to the perpetrators of the dentry bloat. We
> introduce a sysctl "negative-dentry-ratio" which sets a maximum number
> of negative dentries per positive dentry, N:1. When a dentry is created
> or unlinked, the next N+1 dentries of the parent are scanned. If no
> positive dentries are found, then a candidate negative dentry is killed.

Er...  So what's to stop d_move() from leaving you with your cursor
pointer poiting into the list of children of another parent?

What's more, your dentry_unlist() logics will be defeated by that -
if victim used to have a different parent, got moved, then evicted,
it looks like you could end up with old parent cursor pointing
to the victim and left unmodified by dentry_unlist() (since it looks
only at the current parent's cursor).  Wait for it to be freed and
voila - access to old parent's cursor will do unpleasant things.

What am I missing here?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config
  2022-03-31 19:45   ` Al Viro
@ 2022-03-31 20:37     ` Stephen Brennan
  0 siblings, 0 replies; 5+ messages in thread
From: Stephen Brennan @ 2022-03-31 20:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Luis Chamberlain, Arnd Bergmann, Matthew Wilcox,
	James Bottomley, Gao Xiang, Dave Chinner, Roman Gushchin,
	Colin Walters, linux-fsdevel, linux-kernel

On 3/31/22 12:45, Al Viro wrote:
> On Thu, Mar 31, 2022 at 12:08:27PM -0700, Stephen Brennan wrote:
>> Negative dentry bloat is a well-known problem. For systems without
>> memory pressure, some workloads (like repeated stat calls) can create an
>> unbounded amount of negative dentries quite quickly. In the best case,
>> these dentries could speed up a subsequent name lookup, but in the worst
>> case, they are never used and their memory never freed.
>>
>> While systems without memory pressure may not need that memory for other
>> purposes, negative dentry bloat can have other side-effects, such as
>> soft lockups when traversing the d_subdirs list or general slowness with
>> managing them. It is a good idea to have some sort of mechanism for
>> controlling negative dentries, even outside memory pressure.
>>
>> This patch attempts to do so in a fair way. Workloads which create many
>> negative dentries must create many dentries, or convert dentries from
>> positive to negative. Thus, negative dentry management is best done
>> during these same operations, as it will amortize its cost, and
>> distribute the cost to the perpetrators of the dentry bloat. We
>> introduce a sysctl "negative-dentry-ratio" which sets a maximum number
>> of negative dentries per positive dentry, N:1. When a dentry is created
>> or unlinked, the next N+1 dentries of the parent are scanned. If no
>> positive dentries are found, then a candidate negative dentry is killed.
> 
> Er...  So what's to stop d_move() from leaving you with your cursor
> pointer poiting into the list of children of another parent?
> 
> What's more, your dentry_unlist() logics will be defeated by that -
> if victim used to have a different parent, got moved, then evicted,
> it looks like you could end up with old parent cursor pointing
> to the victim and left unmodified by dentry_unlist() (since it looks
> only at the current parent's cursor).  Wait for it to be freed and
> voila - access to old parent's cursor will do unpleasant things.
> 
> What am I missing here?

Thanks for this catch. Since d_move holds the parent's lock, it should
be possible to include the same condition as dentry_unlist() to ensure
the cursor gets advanced if necessary. I could make it a small inline
helper to make things easier to read. I will go ahead and fix that.

Thanks,
Stephen

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-03-31 20:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-31 19:08 [RFC PATCH 0/2] fs/dcache: Per directory amortized negative dentry pruning Stephen Brennan
2022-03-31 19:08 ` [RFC PATCH 1/2] fs/dcache: make cond_resched in __dentry_kill optional Stephen Brennan
2022-03-31 19:08 ` [RFC PATCH 2/2] fs/dcache: Add negative-dentry-ratio config Stephen Brennan
2022-03-31 19:45   ` Al Viro
2022-03-31 20:37     ` Stephen Brennan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.