* [PATCH v4 0/6] fs/dcache: Limit # of negative dentries
@ 2017-09-18 18:20 Waiman Long
  2017-09-18 18:20 ` [PATCH v4 1/6] fs/dcache: Relocate dentry_kill() after lock_parent() Waiman Long
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Waiman Long @ 2017-09-18 18:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

 v3->v4:
  - Remove the config option "neg_dentry_pc" to reduce system admin
    overload.
  - Allow referenced negative dentries to be recycled in the list
    instead of being killed in pruning.
  - Enable auto-tuning of the negative dentry limit to match the
    positive dentry count.
  - Remove the umount racing patch but take an active reference on
    SB while pruning to prevent it from vanishing.
  - Separate out the dentry_kill() code relocation in patch 1 to a
    separate patch.
  - Move the negative dentry tracking patch in front of the limiting
    patch.
  - Decrease the default negative dentry percentage from 5% to 2%.

 v2->v3:
  - Add a faster pruning rate when the free pool is close to depletion.
  - As suggested by James Bottomley, add an artificial delay waiting
    loop before killing a negative dentry and properly clear the
    DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
  - Add a new patch to track the number of negative dentries that are
    forcibly killed.

 v1->v2:
  - Move the new nr_negative field to the end of dentry_stat_t structure
    as suggested by Matthew Wilcox.
  - With the help of Miklos Szeredi, fix incorrect locking order in
    dentry_kill() by using lock_parent() instead of locking the parent's
    d_lock directly.
  - Correctly account for positive to negative dentry transitions.
  - Automatic pruning of negative dentries now ignores the reference
    bit in negative dentries; regular shrinking still honors it.

A rogue application can potentially create a large number of negative
dentries, consuming most of the available memory even if a memory
controller is enabled to limit memory usage. This can impact the
performance of other applications running on the system.

We have customers seeing soft lockups and unresponsive systems when
tearing down a container, because of the large number of negative
dentries accumulated during its uptime that had to be cleaned up when
the container's filesystem was unmounted at exit. So we need to do
something about it.

This patchset introduces changes to the dcache subsystem to limit
the number of negative dentries allowed to be created, thus limiting
the amount of memory that can be consumed by negative dentries.

Patch 1 just relocates the position of the dentry_kill() function.

Patch 2 tracks the number of negative dentries present in the LRU
lists and reports it in /proc/sys/fs/dentry-state.

Patch 3 sets a limit on the number of negative dentries allowable as a
small percentage (2%) of total system memory. So the larger the system,
the more negative dentries can be allowed. Once the limit is reached,
new negative dentries will be killed after use.

Patch 4 enables automatic pruning of least recently used negative
dentries when the count gets close to the limit, so that we won't end
up killing recently used negative dentries.

Patch 5 shows the number of forced negative dentry killings in
/proc/sys/fs/dentry-state.

Patch 6 enables auto-tuning of the free pool negative dentry count to
no more than the maximum number of positive dentries ever used.

With a 4.13 based kernel, the positive & negative dentries lookup rates
(lookups per second) after initial boot on a 36-core 50GB memory VM
with and without the patch were as follows:

  Metric                    w/o patch        with patch
  ------                    ---------        ----------
  Positive dentry lookup      840269           845762
  Negative dentry lookup     1903405          1962514
  Negative dentry creation   6817957          6928768

The last row refers to the creation rate of 1 million negative
dentries. With 50GB of memory, 1 million negative dentries can be
created with the patched kernel without any pruning or dentry killing.

Ignoring some inherent noise in the test results, there wasn't any
noticeable difference in terms of lookup and negative dentry creation
performance with or without this patch.

When 10 million negative dentries were created, however, the
performance differed.

  Metric                    w/o patch        with patch
  ------                    ---------        ----------
  Negative dentry creation   651663            190105


For the patched kernel, the corresponding dentry-state was:

  1608833	1590416	45	0	1579878	8286952

This was expected as negative dentry creation throttling with forced
dentry deletion happened in this case.

Running the AIM7 high-systime workload on the same VM, the baseline
performance was 186770 jobs/min. With a single-threaded rogue negative
dentry creation program running in the background until the patched
kernel (with the 2% limit) started throttling, the performance was
183746 jobs/min. On an unpatched kernel, with memory almost exhausted
and the memory shrinker kicked in, the performance was 148997 jobs/min.

So the patch does protect the system from significant performance
degradation when a rogue negative dentry creation program is running
in the background.

Waiman Long (6):
  fs/dcache: Relocate dentry_kill() after lock_parent()
  fs/dcache: Track & report number of negative dentries
  fs/dcache: Limit numbers of negative dentries
  fs/dcache: Enable automatic pruning of negative dentries
  fs/dcache: Track count of negative dentries forcibly killed
  fs/dcache: Autotuning of negative dentry limit

 fs/dcache.c              | 462 +++++++++++++++++++++++++++++++++++++++++++----
 include/linux/dcache.h   |   8 +-
 include/linux/list_lru.h |   1 +
 mm/list_lru.c            |   4 +-
 4 files changed, 439 insertions(+), 36 deletions(-)

-- 
1.8.3.1


* [PATCH v4 1/6] fs/dcache: Relocate dentry_kill() after lock_parent()
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
@ 2017-09-18 18:20 ` Waiman Long
  2017-09-18 18:20 ` [PATCH v4 2/6] fs/dcache: Track & report number of negative dentries Waiman Long
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-09-18 18:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

This patch just relocates the dentry_kill() function to after
lock_parent() so that dentry_kill() can call lock_parent() in a later
patch. There is no code change.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c | 64 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f901413..8979dd8 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -586,38 +586,6 @@ static void __dentry_kill(struct dentry *dentry)
 		dentry_free(dentry);
 }
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *dentry_kill(struct dentry *dentry)
-	__releases(dentry->d_lock)
-{
-	struct inode *inode = dentry->d_inode;
-	struct dentry *parent = NULL;
-
-	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
-		goto failed;
-
-	if (!IS_ROOT(dentry)) {
-		parent = dentry->d_parent;
-		if (unlikely(!spin_trylock(&parent->d_lock))) {
-			if (inode)
-				spin_unlock(&inode->i_lock);
-			goto failed;
-		}
-	}
-
-	__dentry_kill(dentry);
-	return parent;
-
-failed:
-	spin_unlock(&dentry->d_lock);
-	return dentry; /* try again with same dentry */
-}
-
 static inline struct dentry *lock_parent(struct dentry *dentry)
 {
 	struct dentry *parent = dentry->d_parent;
@@ -653,6 +621,38 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 }
 
 /*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
+	return parent;
+
+failed:
+	spin_unlock(&dentry->d_lock);
+	return dentry; /* try again with same dentry */
+}
+
+/*
  * Try to do a lockless dput(), and return whether that was successful.
  *
  * If unsuccessful, we return false, having already taken the dentry lock.
-- 
1.8.3.1


* [PATCH v4 2/6] fs/dcache: Track & report number of negative dentries
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
  2017-09-18 18:20 ` [PATCH v4 1/6] fs/dcache: Relocate dentry_kill() after lock_parent() Waiman Long
@ 2017-09-18 18:20 ` Waiman Long
  2017-09-18 18:20 ` [PATCH v4 3/6] fs/dcache: Limit numbers " Waiman Long
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-09-18 18:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.

As an excessive number of negative dentries can have an impact on
system performance, it is wise to track the number of positive and
negative dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the /proc/sys/fs/dentry-state
file.  The number of positive dentries in the LRU lists can be found
by subtracting the number of negative dentries from the total.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c            | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |  7 ++++---
 2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 8979dd8..e13668c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -132,6 +132,7 @@ struct dentry_stat_t dentry_stat = {
 
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -165,11 +166,22 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }
 
+static long get_nr_dentry_neg(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(nr_dentry_neg, i);
+	return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
 {
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
+	dentry_stat.nr_negative = get_nr_dentry_neg();
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -227,6 +239,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
+static inline void __neg_dentry_dec(struct dentry *dentry)
+{
+	this_cpu_dec(nr_dentry_neg);
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_dec(dentry);
+}
+
+static inline void __neg_dentry_inc(struct dentry *dentry)
+{
+	this_cpu_inc(nr_dentry_neg);
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -329,6 +363,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_inc(dentry);
 }
 
 static void dentry_free(struct dentry *dentry)
@@ -396,6 +432,7 @@ static void d_lru_add(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -404,6 +441,7 @@ static void d_lru_del(struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_dec(dentry);
 }
 
 static void d_shrink_del(struct dentry *dentry)
@@ -434,6 +472,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	list_lru_isolate(lru, &dentry->d_lru);
+	neg_dentry_dec(dentry);
 }
 
 static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
@@ -442,6 +481,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
 	D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
 	dentry->d_flags |= DCACHE_SHRINK_LIST;
 	list_lru_isolate_move(lru, &dentry->d_lru, list);
+	neg_dentry_dec(dentry);
 }
 
 /*
@@ -1820,6 +1860,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));
 
 	spin_lock(&dentry->d_lock);
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_dec(dentry);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index ed1a7cf..c8dcbd3 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -63,9 +63,10 @@ struct qstr {
 struct dentry_stat_t {
 	long nr_dentry;
 	long nr_unused;
-	long age_limit;          /* age in seconds */
-	long want_pages;         /* pages requested by system */
-	long dummy[2];
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long nr_negative;	/* # of negative dentries */
+	long dummy;
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1


* [PATCH v4 3/6] fs/dcache: Limit numbers of negative dentries
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
  2017-09-18 18:20 ` [PATCH v4 1/6] fs/dcache: Relocate dentry_kill() after lock_parent() Waiman Long
  2017-09-18 18:20 ` [PATCH v4 2/6] fs/dcache: Track & report number of negative dentries Waiman Long
@ 2017-09-18 18:20 ` Waiman Long
  2017-09-18 18:20 ` [PATCH v4 4/6] fs/dcache: Enable automatic pruning " Waiman Long
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-09-18 18:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

The number of positive dentries is limited by the number of files in
the filesystems. The number of negative dentries, however, has no limit
other than the total amount of memory available in the system. So
a rogue application that generates a lot of negative dentries can
potentially exhaust most of the memory available in the system even
if a memory controller is enabled to limit memory usage. This can have
a big performance impact on other running applications.

To prevent this from happening, the dcache code is now updated to
limit the number of negative dentries kept in the LRU lists to a
percentage of total available system memory. The current limit is 2%,
which corresponds to about 100k dentries on a 64-bit system with 1GB
of memory. This limit should be big enough for most circumstances.

If the negative dentry limit is exceeded, newly created negative
dentries will be killed right after use to avoid adding unpredictable
latency to the directory lookup operation.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c            | 141 +++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/dcache.h |   1 +
 2 files changed, 137 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index e13668c..6fb65b8 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -130,6 +130,30 @@ struct dentry_stat_t dentry_stat = {
 	.age_limit = 45,
 };
 
+/*
+ * There is a system-wide limit to the amount of negative dentries allowed
+ * in the super blocks' LRU lists. The current limit is 2% of the total
+ * system memory. On a 64-bit system with 1G memory, that translated to
+ * about 100k dentries which is quite a lot.
+ *
+ * To avoid performance problem with a global counter on an SMP system,
+ * the tracking is done mostly on a per-cpu basis. The total limit is
+ * distributed in a 80/20 ratio to per-cpu counters and a global free pool.
+ *
+ * If a per-cpu counter runs out of negative dentries, it can borrow extra
+ * ones from the global free pool. If it has more than its percpu limit,
+ * the extra ones will be returned back to the global pool.
+ */
+#define NEG_DENTRY_PC		2
+#define NEG_DENTRY_BATCH	(1 << 8)
+
+static long neg_dentry_percpu_limit __read_mostly;
+static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
+static struct {
+	raw_spinlock_t nfree_lock;
+	long nfree;			/* Negative dentry free pool */
+} ndblk ____cacheline_aligned_in_smp;
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_neg);
@@ -173,6 +197,7 @@ static long get_nr_dentry_neg(void)
 
 	for_each_possible_cpu(i)
 		sum += per_cpu(nr_dentry_neg, i);
+	sum += neg_dentry_nfree_init - ndblk.nfree;
 	return sum < 0 ? 0 : sum;
 }
 
@@ -239,9 +264,21 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
-static inline void __neg_dentry_dec(struct dentry *dentry)
+/*
+ * Decrement negative dentry count if applicable.
+ */
+static void __neg_dentry_dec(struct dentry *dentry)
 {
-	this_cpu_dec(nr_dentry_neg);
+	if (unlikely(this_cpu_dec_return(nr_dentry_neg) < 0)) {
+		long *pcnt = get_cpu_ptr(&nr_dentry_neg);
+
+		if ((*pcnt < 0) && raw_spin_trylock(&ndblk.nfree_lock)) {
+			ACCESS_ONCE(ndblk.nfree) += NEG_DENTRY_BATCH;
+			*pcnt += NEG_DENTRY_BATCH;
+			raw_spin_unlock(&ndblk.nfree_lock);
+		}
+		put_cpu_ptr(&nr_dentry_neg);
+	}
 }
 
 static inline void neg_dentry_dec(struct dentry *dentry)
@@ -250,9 +287,45 @@ static inline void neg_dentry_dec(struct dentry *dentry)
 		__neg_dentry_dec(dentry);
 }
 
-static inline void __neg_dentry_inc(struct dentry *dentry)
+/*
+ * Try to decrement the negative dentry free pool by NEG_DENTRY_BATCH.
+ * The actual decrement returned by the function may be smaller.
+ */
+static long __neg_dentry_nfree_dec(void)
 {
-	this_cpu_inc(nr_dentry_neg);
+	long cnt = NEG_DENTRY_BATCH;
+
+	raw_spin_lock(&ndblk.nfree_lock);
+	if (ndblk.nfree < cnt)
+		cnt = ndblk.nfree;
+	ACCESS_ONCE(ndblk.nfree) -= cnt;
+	raw_spin_unlock(&ndblk.nfree_lock);
+	return cnt;
+}
+
+/*
+ * Increment negative dentry count if applicable.
+ */
+static void __neg_dentry_inc(struct dentry *dentry)
+{
+	long cnt = 0, *pcnt;
+
+	if (likely(this_cpu_inc_return(nr_dentry_neg) <=
+		   neg_dentry_percpu_limit))
+		return;
+
+	pcnt = get_cpu_ptr(&nr_dentry_neg);
+	if (READ_ONCE(ndblk.nfree) && (*pcnt > neg_dentry_percpu_limit)) {
+		cnt = __neg_dentry_nfree_dec();
+		*pcnt -= cnt;
+	}
+	put_cpu_ptr(&nr_dentry_neg);
+	/*
+	 * If there are too many negative dentries, set DCACHE_KILL_NEGATIVE
+	 * flag to indicate that the dentry should be killed.
+	 */
+	if (!cnt)
+		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -677,7 +750,40 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 
 	if (!IS_ROOT(dentry)) {
 		parent = dentry->d_parent;
-		if (unlikely(!spin_trylock(&parent->d_lock))) {
+		if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
+			/*
+			 * Force the killing of this negative dentry when
+			 * DCACHE_KILL_NEGATIVE flag is set. This is to
+			 * protect against the worst case situation where
+			 * the parent d_lock is kept busy most of the time
+			 * while the number of negative dentries has
+			 * exceeded the limit. Leaving the dentry as is
+			 * will cause the number of negative entries to
+			 * continue to increase way past the limit under
+			 * the worst case situation.
+			 *
+			 * As the reference count is kept at 1 before
+			 * calling dentry_kill(), the dentry won't
+			 * disappear under the hood even if the dentry
+			 * lock is temporarily released.
+			 */
+			unsigned int dflags;
+
+			dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
+			dflags = dentry->d_flags;
+			parent = lock_parent(dentry);
+			/*
+			 * Abort the killing if the reference count or
+			 * d_flags changed.
+			 */
+			if ((dentry->d_flags != dflags) ||
+			    (dentry->d_lockref.count != 1)) {
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				goto failed;
+			}
+
+		} else if (unlikely(!spin_trylock(&parent->d_lock))) {
 			if (inode)
 				spin_unlock(&inode->i_lock);
 			goto failed;
@@ -855,6 +961,9 @@ void dput(struct dentry *dentry)
 
 	dentry_lru_add(dentry);
 
+	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
+		goto kill_it;
+
 	dentry->d_lockref.count--;
 	spin_unlock(&dentry->d_lock);
 	return;
@@ -3611,6 +3720,26 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+static void __init neg_dentry_init(void)
+{
+	/* Rough estimate of # of dentries allocated per page */
+	unsigned int nr_dentry_page = PAGE_SIZE/sizeof(struct dentry) - 1;
+	unsigned long cnt;
+
+	raw_spin_lock_init(&ndblk.nfree_lock);
+
+	/* 20% in global pool & 80% in percpu free */
+	ndblk.nfree = neg_dentry_nfree_init
+		    = totalram_pages * nr_dentry_page * NEG_DENTRY_PC / 500;
+	cnt = ndblk.nfree * 4 / num_possible_cpus();
+	if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
+		cnt = 2 * NEG_DENTRY_BATCH;
+	neg_dentry_percpu_limit = cnt;
+
+	pr_info("Negative dentry: percpu limit = %ld, free pool = %ld\n",
+		neg_dentry_percpu_limit, ndblk.nfree);
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3651,6 +3780,8 @@ static void __init dcache_init(void)
 	dentry_cache = KMEM_CACHE(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
 
+	neg_dentry_init();
+
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c8dcbd3..9052522 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -219,6 +219,7 @@ struct dentry_operations {
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
+#define DCACHE_KILL_NEGATIVE		0x40000000 /* Kill negative dentry */
 
 extern seqlock_t rename_lock;
 
-- 
1.8.3.1


* [PATCH v4 4/6] fs/dcache: Enable automatic pruning of negative dentries
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (2 preceding siblings ...)
  2017-09-18 18:20 ` [PATCH v4 3/6] fs/dcache: Limit numbers " Waiman Long
@ 2017-09-18 18:20 ` Waiman Long
  2017-09-18 18:20 ` [PATCH v4 5/6] fs/dcache: Track count of negative dentries forcibly killed Waiman Long
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-09-18 18:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

Having a limit on the number of negative dentries has an undesirable
side effect: no new negative dentries are allowed once the limit is
reached. This may have a performance impact on some workloads.

To prevent this from happening, we need a way to prune the negative
dentries so that new ones can be created before it is too late. This
is done by using the workqueue API to do the pruning gradually when a
threshold is reached to minimize performance impact on other running
tasks.

The current threshold is 1/4 of the initial value of the free pool
count. Once the threshold is reached, the automatic pruning process
kicks in to replenish the free pool. Each pruning run scans 64 dentries
per LRU list and can remove up to 256 negative dentries, to minimize
LRU lock hold time. The pruning rate is 50 Hz if the free pool count
is less than 1/8 of the original, and 10 Hz otherwise.

A short artificial delay loop is added to wait for changes in the
negative dentry count before killing a negative dentry. Sleeping here
could be problematic, as the callers of dput() may not be in a
sleepable state.

Allowing tasks that need negative dentries to do the pruning
synchronously themselves could cause lock and cacheline contention.
The end result may not be better than killing recently created
negative dentries.

The dentry pruning operation will also free some of the least
recently used positive dentries.

In the unlikely event that a superblock is being umount'ed while in
negative dentry pruning mode, the umount may face an additional delay
of up to 0.1s.

With automatic pruning in place, forced negative dentry killing should
not happen unless there are pathological tasks present that generate
a large number of negative dentries in a very short time.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c              | 190 +++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/list_lru.h |   1 +
 mm/list_lru.c            |   4 +-
 3 files changed, 189 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 6fb65b8..dd1adda 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -146,14 +146,27 @@ struct dentry_stat_t dentry_stat = {
  */
 #define NEG_DENTRY_PC		2
 #define NEG_DENTRY_BATCH	(1 << 8)
+#define NEG_PRUNING_SIZE	(1 << 6)
+#define NEG_PRUNING_SLOW_RATE	(HZ/10)
+#define NEG_PRUNING_FAST_RATE	(HZ/50)
+#define NEG_IS_SB_UMOUNTING(sb)	\
+	unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
+	int niter;			/* Pruning iteration count */
+	int lru_count;			/* Per-LRU pruning count */
+	long n_neg;			/* # of negative dentries pruned */
+	long n_pos;			/* # of positive dentries pruned */
 	long nfree;			/* Negative dentry free pool */
+	struct super_block *prune_sb;	/* Super_block for pruning */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_neg);
@@ -326,6 +339,25 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	 */
 	if (!cnt)
 		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+
+	/*
+	 * Initiate negative dentry pruning if free pool has less than
+	 * 1/4 of its initial value.
+	 */
+	if ((READ_ONCE(ndblk.nfree) < READ_ONCE(neg_dentry_nfree_init)/4) &&
+	    !READ_ONCE(ndblk.prune_sb) &&
+	    !cmpxchg(&ndblk.prune_sb, NULL, dentry->d_sb)) {
+		/*
+		 * Abort if umounting is in progress, otherwise take a
+		 * reference and move on.
+		 */
+		if (NEG_IS_SB_UMOUNTING(ndblk.prune_sb)) {
+			WRITE_ONCE(ndblk.prune_sb, NULL);
+		} else {
+			atomic_inc(&ndblk.prune_sb->s_active);
+			schedule_delayed_work(&prune_neg_dentry_work, 1);
+		}
+	}
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -767,10 +799,8 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 			 * disappear under the hood even if the dentry
 			 * lock is temporarily released.
 			 */
-			unsigned int dflags;
+			unsigned int dflags = dentry->d_flags;
 
-			dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
-			dflags = dentry->d_flags;
 			parent = lock_parent(dentry);
 			/*
 			 * Abort the killing if the reference count or
@@ -961,8 +991,35 @@ void dput(struct dentry *dentry)
 
 	dentry_lru_add(dentry);
 
-	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
-		goto kill_it;
+	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
+		/*
+		 * Kill the dentry if it is really negative and the per-cpu
+		 * negative dentry count has still exceeded the limit even
+		 * after a short artificial delay.
+		 */
+		if (d_is_negative(dentry) &&
+		   (this_cpu_read(nr_dentry_neg) > neg_dentry_percpu_limit)) {
+			int loop = 256;
+
+			/*
+			 * Waiting to transfer free negative dentries from the
+			 * free pool to the percpu count.
+			 */
+			while (--loop) {
+				if (READ_ONCE(ndblk.nfree)) {
+					long cnt = __neg_dentry_nfree_dec();
+
+					this_cpu_sub(nr_dentry_neg, cnt);
+					if (cnt)
+						break;
+				}
+				cpu_relax();
+			}
+			if (!loop)
+				goto kill_it;
+		}
+		dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
+	}
 
 	dentry->d_lockref.count--;
 	spin_unlock(&dentry->d_lock);
@@ -1323,6 +1380,129 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+/*
+ * A modified version that attempts to remove a limited number of negative
+ * dentries as well as some other non-negative dentries at the front.
+ */
+static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
+		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *freeable = arg;
+	struct dentry	*dentry = container_of(item, struct dentry, d_lru);
+	enum lru_status	status = LRU_SKIP;
+
+	/*
+	 * Limit amount of dentry walking in each LRU list.
+	 */
+	if (ndblk.lru_count >= NEG_PRUNING_SIZE) {
+		ndblk.lru_count = 0;
+		return LRU_STOP;
+	}
+	ndblk.lru_count++;
+
+	/*
+	 * we are inverting the lru lock/dentry->d_lock here,
+	 * so use a trylock. If we fail to get the lock, just skip
+	 * it
+	 */
+	if (!spin_trylock(&dentry->d_lock))
+		return LRU_SKIP;
+
+	/*
+	 * Referenced dentries are still in use. If they have active
+	 * counts, just remove them from the LRU. Otherwise give them
+	 * another pass through the LRU.
+	 */
+	if (dentry->d_lockref.count) {
+		d_lru_isolate(lru, dentry);
+		status = LRU_REMOVED;
+		goto out;
+	}
+
+	/*
+	 * Dentries with reference bit on are moved back to the tail.
+	 */
+	if (dentry->d_flags & DCACHE_REFERENCED) {
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		status = LRU_ROTATE;
+		goto out;
+	}
+
+	status = LRU_REMOVED;
+	d_lru_shrink_move(lru, dentry, freeable);
+	if (d_is_negative(dentry))
+		ndblk.n_neg++;
+out:
+	spin_unlock(&dentry->d_lock);
+	return status;
+}
+
+/*
+ * A workqueue function to prune negative dentries.
+ *
+ * The pruning is done gradually over time so as to have as little
+ * performance impact as possible.
+ */
+static void prune_negative_dentry(struct work_struct *work)
+{
+	int freed, last_n_neg;
+	long nfree;
+	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	LIST_HEAD(dispose);
+
+	if (!sb)
+		return;
+	if (NEG_IS_SB_UMOUNTING(sb))
+		goto stop_pruning;
+
+	ndblk.niter++;
+	ndblk.lru_count = 0;
+	last_n_neg = ndblk.n_neg;
+	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
+			      &dispose, NEG_DENTRY_BATCH);
+
+	if (freed)
+		shrink_dentry_list(&dispose);
+	ndblk.n_pos += freed - (ndblk.n_neg - last_n_neg);
+
+	/*
+	 * Continue delayed pruning until negative dentry free pool is at
+	 * least 1/2 of the initial value, the super_block has no more
+	 * negative dentries left at the front, or unmounting is in
+	 * progress.
+	 *
+	 * The pruning rate depends on the size of the free pool. The
+	 * faster rate is used when there is less than 1/8 left.
+	 * Otherwise, the slower rate will be used.
+	 */
+	nfree = READ_ONCE(ndblk.nfree);
+	if ((ndblk.n_neg == last_n_neg) ||
+	    (nfree >= neg_dentry_nfree_init/2) || NEG_IS_SB_UMOUNTING(sb))
+		goto stop_pruning;
+
+	schedule_delayed_work(&prune_neg_dentry_work,
+			     (nfree < neg_dentry_nfree_init/8)
+			     ? NEG_PRUNING_FAST_RATE : NEG_PRUNING_SLOW_RATE);
+	return;
+
+stop_pruning:
+#ifdef CONFIG_DEBUG_KERNEL
+	/*
+	 * Report large negative dentry pruning event.
+	 */
+	if (ndblk.n_neg > NEG_PRUNING_SIZE) {
+		pr_info("Negative dentry pruning (SB=%s):\n\t"
+			"%d iterations, %ld/%ld neg/pos dentries freed.\n",
+			ndblk.prune_sb->s_id, ndblk.niter, ndblk.n_neg,
+			ndblk.n_pos);
+	}
+#endif
+	ndblk.niter = 0;
+	ndblk.n_neg = ndblk.n_pos = 0;
+	deactivate_super(sb);
+	WRITE_ONCE(ndblk.prune_sb, NULL);
+}
+
 /**
  * enum d_walk_ret - action to take during tree walk
  * @D_WALK_CONTINUE:	continue walk
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index fa7fd03..06c9d15 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -22,6 +22,7 @@ enum lru_status {
 	LRU_SKIP,		/* item cannot be locked, skip */
 	LRU_RETRY,		/* item not freeable. May drop the lock
 				   internally, but has to return locked. */
+	LRU_STOP,		/* stop walking the list */
 };
 
 struct list_lru_one {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 7a40fa2..f6e7796 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -244,11 +244,13 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			 */
 			assert_spin_locked(&nlru->lock);
 			goto restart;
+		case LRU_STOP:
+			goto out;
 		default:
 			BUG();
 		}
 	}
-
+out:
 	spin_unlock(&nlru->lock);
 	return isolated;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 5/6] fs/dcache: Track count of negative dentries forcibly killed
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (3 preceding siblings ...)
  2017-09-18 18:20 ` [PATCH v4 4/6] fs/dcache: Enable automatic pruning " Waiman Long
@ 2017-09-18 18:20 ` Waiman Long
  2017-09-18 18:20 ` [PATCH v4 6/6] fs/dcache: Autotuning of negative dentry limit Waiman Long
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-09-18 18:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

There is a performance concern about killing recently created negative
dentries. This should rarely happen under normal working conditions. To
understand how often this negative dentry killing actually happens, the
/proc/sys/fs/dentry-state file is extended to track this number. This
allows us to see whether additional measures will be needed to reduce
the chance of negative dentry killing.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c            | 4 ++++
 include/linux/dcache.h | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index dd1adda..f46603e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -162,6 +162,7 @@ struct dentry_stat_t dentry_stat = {
 	long n_pos;			/* # of positive dentries pruned */
 	long nfree;			/* Negative dentry free pool */
 	struct super_block *prune_sb;	/* Super_block for pruning */
+	atomic_long_t nr_neg_killed;	/* # of negative entries killed */
 } ndblk ____cacheline_aligned_in_smp;
 
 static void prune_negative_dentry(struct work_struct *work);
@@ -220,6 +221,7 @@ int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
 	dentry_stat.nr_negative = get_nr_dentry_neg();
+	dentry_stat.nr_killed = atomic_long_read(&ndblk.nr_neg_killed);
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -812,6 +814,7 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 					spin_unlock(&parent->d_lock);
 				goto failed;
 			}
+			atomic_long_inc(&ndblk.nr_neg_killed);
 
 		} else if (unlikely(!spin_trylock(&parent->d_lock))) {
 			if (inode)
@@ -3907,6 +3910,7 @@ static void __init neg_dentry_init(void)
 	unsigned long cnt;
 
 	raw_spin_lock_init(&ndblk.nfree_lock);
+	atomic_long_set(&ndblk.nr_neg_killed, 0);
 
 	/* 20% in global pool & 80% in percpu free */
 	ndblk.nfree = neg_dentry_nfree_init
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 9052522..16d0244 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -66,7 +66,7 @@ struct dentry_stat_t {
 	long age_limit;		/* age in seconds */
 	long want_pages;	/* pages requested by system */
 	long nr_negative;	/* # of negative dentries */
-	long dummy;
+	long nr_killed;		/* # of negative dentries killed */
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 6/6] fs/dcache: Autotuning of negative dentry limit
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (4 preceding siblings ...)
  2017-09-18 18:20 ` [PATCH v4 5/6] fs/dcache: Track count of negative dentries forcibly killed Waiman Long
@ 2017-09-18 18:20 ` Waiman Long
  2017-10-05 15:41 ` [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
  2017-10-10 22:54 ` Andrew Morton
  7 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-09-18 18:20 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

To have a proper balance of positive and negative dentries, the free
pool negative dentry limit can be dynamically tuned up to no more than
the number of positive dentries that have ever been used. The total
positive dentry count is computed at a frequency of no more than once
every 5 minutes.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index f46603e..ac68fbd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -143,12 +143,19 @@ struct dentry_stat_t dentry_stat = {
  * If a per-cpu counter runs out of negative dentries, it can borrow extra
  * ones from the global free pool. If it has more than its percpu limit,
  * the extra ones will be returned back to the global pool.
+ *
+ * In order to have a proper balance between negative and positive dentries,
+ * the free pool negative dentry limit can be tuned up dynamically at run
+ * time to no more than the highest number of positive dentries that were
+ * ever used. Total positive dentry count is checked no more than once
+ * every 5 mins to reduce performance impact.
  */
 #define NEG_DENTRY_PC		2
 #define NEG_DENTRY_BATCH	(1 << 8)
 #define NEG_PRUNING_SIZE	(1 << 6)
 #define NEG_PRUNING_SLOW_RATE	(HZ/10)
 #define NEG_PRUNING_FAST_RATE	(HZ/50)
+#define NEG_COUNT_CHECK_PERIOD	(HZ*5*60)
 #define NEG_IS_SB_UMOUNTING(sb)	\
 	unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
@@ -163,6 +170,7 @@ struct dentry_stat_t dentry_stat = {
 	long nfree;			/* Negative dentry free pool */
 	struct super_block *prune_sb;	/* Super_block for pruning */
 	atomic_long_t nr_neg_killed;	/* # of negative entries killed */
+	unsigned long last_jiffies;	/* Last time +ve count is checked */
 } ndblk ____cacheline_aligned_in_smp;
 
 static void prune_negative_dentry(struct work_struct *work);
@@ -1469,6 +1477,36 @@ static void prune_negative_dentry(struct work_struct *work)
 	ndblk.n_pos += freed - (ndblk.n_neg - last_n_neg);
 
 	/*
+	 * Also check the total negative & positive dentries count to see
+	 * if we can increase the free pool negative dentry limit.
+	 */
+	if (time_after(jiffies, ndblk.last_jiffies + NEG_COUNT_CHECK_PERIOD)) {
+		unsigned long pos = 0, neg = 0;
+		int i;
+
+		/*
+		 * The positive count may include a small number of
+		 * negative dentries in transit, but that should be
+		 * negligible.
+		 */
+		for_each_possible_cpu(i) {
+			pos += per_cpu(nr_dentry, i);
+			neg += per_cpu(nr_dentry_neg, i);
+		}
+		pos -= neg;
+
+		if (unlikely(pos > neg_dentry_nfree_init)) {
+			unsigned long inc = pos - neg_dentry_nfree_init;
+
+			raw_spin_lock(&ndblk.nfree_lock);
+			ndblk.nfree += inc;
+			raw_spin_unlock(&ndblk.nfree_lock);
+			WRITE_ONCE(neg_dentry_nfree_init, pos);
+		}
+		ndblk.last_jiffies = jiffies;
+	}
+
+	/*
 	 * Continue delayed pruning until negative dentry free pool is at
 	 * least 1/2 of the initial value, the super_block has no more
 	 * negative dentries left at the front, or unmounting is in
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/6] fs/dcache: Limit # of negative dentries
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (5 preceding siblings ...)
  2017-09-18 18:20 ` [PATCH v4 6/6] fs/dcache: Autotuning of negative dentry limit Waiman Long
@ 2017-10-05 15:41 ` Waiman Long
  2017-10-10 22:54 ` Andrew Morton
  7 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-10-05 15:41 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C)

On 09/18/2017 02:20 PM, Waiman Long wrote:
>  
> A rogue application can potentially create a large number of negative
> dentries in the system consuming most of the memory available even if
> memory controller is enabled to limit memory usage. This can impact
> performance of other applications running on the system.
>
> We have customers seeing soft lockup and unresponsive system when
> tearing down a container because of the large number of negative
> dentries accumulated during its up time that had to be cleaned up at
> exit time when the container's filesystem was unmounted. So we need
> to do something about it.
>
> This patchset introduces changes to the dcache subsystem to limit
> the number of negative dentries allowed to be created thus limiting
> the amount of memory that can be consumed by negative dentries.
>
> Patch 1 just relocates the dentry_kill() function.
>
> Patch 2 tracks the number of negative dentries present in the LRU
> lists and reports it in /proc/sys/fs/dentry-state.
>
> Patch 3 sets a limit on the number of negative dentries allowable as a
> small percentage (2%) of total system memory. So the larger the system,
> the more negative dentries can be allowed. Once the limit is reached,
> new negative dentries will be killed after use.
>
> Patch 4 enables automatic pruning of least recently used negative
> dentries when it is close to the limit so that we won't end up killing
> recently used negative dentries.
>
> Patch 5 shows the number of forced negative dentry killings in
> /proc/sys/fs/dentry-state.
>
> Patch 6 enables auto-tuneup of free pool negative dentry count to
> no more than the maximum number of positive dentries ever used.
>
> With a 4.13 based kernel, the positive & negative dentries lookup rates
> (lookups per second) after initial boot on a 36-core 50GB memory VM
> with and without the patch were as follows:
>
>   Metric                    w/o patch        with patch
>   ------                    ---------        ----------
>   Positive dentry lookup      840269           845762
>   Negative dentry lookup     1903405          1962514
>   Negative dentry creation   6817957          6928768
>
> The last row refers to the creation rate of 1 million negative
> dentries. With 50GB of memory, 1 million negative dentries can be
> created with the patched kernel without any pruning or dentry killing.
>
> Ignoring some inherent noise in the test results, there wasn't any
> noticeable difference in terms of lookup and negative dentry creation
> performance with or without this patch.
>
> By creating 10 million negative dentries, however, the performance
> differed.
>
>   Metric                    w/o patch        with patch
>   ------                    ---------        ----------
>   Negative dentry creation   651663            190105
>
>
> For the patched kernel, the corresponding dentry-state was:
>
>   1608833	1590416	45	0	1579878	8286952
>
> This was expected as negative dentry creation throttling with forced
> dentry deletion happened in this case.
>
> Running the AIM7 high-systime workload on the same VM, the baseline
> performance was 186770 jobs/min. By running a single-thread rogue
> negative dentry creation program in the background until the patched
> kernel with 2% limit started throttling, the performance was 183746
> jobs/min. On an unpatched kernel with memory almost exhausted and
> the memory shrinker kicked in, the performance was 148997 jobs/min.
>
> So the patch does protect the system from suffering significant
> performance degradation in case a negative dentry creation rogue
> program is running in the background.
>
> Waiman Long (6):
>   fs/dcache: Relocate dentry_kill() after lock_parent()
>   fs/dcache: Track & report number of negative dentries
>   fs/dcache: Limit numbers of negative dentries
>   fs/dcache: Enable automatic pruning of negative dentries
>   fs/dcache: Track count of negative dentries forcibly killed
>   fs/dcache: Autotuning of negative dentry limit
>
>  fs/dcache.c              | 462 +++++++++++++++++++++++++++++++++++++++++++----
>  include/linux/dcache.h   |   8 +-
>  include/linux/list_lru.h |   1 +
>  mm/list_lru.c            |   4 +-
>  4 files changed, 439 insertions(+), 36 deletions(-)
>
Ping!

Any comments on this v4 patchset? We have customers seeing problems
because of the unlimited growth of negative dentries. I am hoping that
at least part of this patchset, in some form, can be merged eventually.

Thanks,
Longman

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/6] fs/dcache: Limit # of negative dentries
  2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (6 preceding siblings ...)
  2017-10-05 15:41 ` [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
@ 2017-10-10 22:54 ` Andrew Morton
  2017-10-11 20:47   ` Waiman Long
  7 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2017-10-10 22:54 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, linux-kernel, linux-fsdevel, Linus Torvalds,
	Jan Kara, Paul E. McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C)

On Mon, 18 Sep 2017 14:20:28 -0400 Waiman Long <longman@redhat.com> wrote:

> A rogue application can potentially create a large number of negative
> dentries in the system consuming most of the memory available even if
> memory controller is enabled to limit memory usage. This can impact
> performance of other applications running on the system.

It does seem that under these circumstances it is pretty silly of us to
reclaim useful things in order to instantiate zillions of -ve dentries.

Dentries are subject to kmemcg handling.  Does this not help avoid
"impacting performance of other applications"?

> We have customers seeing soft lockup and unresponsive system when
> tearing down a container because of the large number of negative
> dentries accumulated during its up time that had to be cleaned up at
> exit time when the container's filesystem was unmounted. So we need
> to do something about it.

It's a somewhat separate issue, but maybe we're missing a cond_resched
somewhere?  Seeing such a softlockup's output would help.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/6] fs/dcache: Limit # of negative dentries
  2017-10-10 22:54 ` Andrew Morton
@ 2017-10-11 20:47   ` Waiman Long
  2017-10-11 20:56     ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Waiman Long @ 2017-10-11 20:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, linux-kernel, linux-fsdevel, Linus Torvalds,
	Jan Kara, Paul E. McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C)

On 10/10/2017 06:54 PM, Andrew Morton wrote:
> On Mon, 18 Sep 2017 14:20:28 -0400 Waiman Long <longman@redhat.com> wrote:
>
>> A rogue application can potentially create a large number of negative
>> dentries in the system consuming most of the memory available even if
>> memory controller is enabled to limit memory usage. This can impact
>> performance of other applications running on the system.
> It does seem that under these circumstances it is pretty silly of us to
> reclaim useful things in order to instantiate zillions of -ve dentries.

I am talking about a misbehaving program due to a bug or an intentional
rogue program.

>
> Dentries are subject to kmemcg handling.  Does this not help avoid
> "impacting performance of other applications"?

AFAIK, the dentry kmem_cache isn't memcg aware. So memcg can't really
constrain the dentry allocation.

>> We have customers seeing soft lockup and unresponsive system when
>> tearing down a container because of the large number of negative
>> dentries accumulated during its up time that had to be cleaned up at
>> exit time when the container's filesystem was unmounted. So we need
>> to do something about it.
> It's a somewhat separate issue, but maybe we're missing a cond_resched
> somewhere?  Seeing such a softlockup's output would help.
>
I don't have a console log. I got this information indirectly via some
of our customer-facing engineers.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/6] fs/dcache: Limit # of negative dentries
  2017-10-11 20:47   ` Waiman Long
@ 2017-10-11 20:56     ` Dave Chinner
  2017-10-11 21:08       ` Waiman Long
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2017-10-11 20:56 UTC (permalink / raw)
  To: Waiman Long
  Cc: Andrew Morton, Alexander Viro, linux-kernel, linux-fsdevel,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, James Bottomley,
	Wangkai (Kevin C)

On Wed, Oct 11, 2017 at 04:47:05PM -0400, Waiman Long wrote:
> On 10/10/2017 06:54 PM, Andrew Morton wrote:
> > On Mon, 18 Sep 2017 14:20:28 -0400 Waiman Long <longman@redhat.com> wrote:
> >
> >> A rogue application can potentially create a large number of negative
> >> dentries in the system consuming most of the memory available even if
> >> memory controller is enabled to limit memory usage. This can impact
> >> performance of other applications running on the system.
> > It does seem that under these circumstances it is pretty silly of us to
> > reclaim useful things in order to instantiate zillions of -ve dentries.
> 
> I am talking about a misbehaving program due to bug or an intentional
> rogue program.
> 
> >
> > Dentries are subject to kmemcg handling.  Does this not help avoid
> > "impacting performance of other applications"?
> 
> AFAIK, the dentry kmem_cache isn't memcg aware.

The dentry cache most definitely is memcg aware. It (and the
inode cache) were the primary targets for the memcg slab reclaim
infrastructure.

#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
# define SLAB_ACCOUNT           0x04000000UL    /* Account to memcg */
#else
# define SLAB_ACCOUNT           0x00000000UL
#endif

dcache_init():

        dentry_cache = KMEM_CACHE(dentry,
                SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);


Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 0/6] fs/dcache: Limit # of negative dentries
  2017-10-11 20:56     ` Dave Chinner
@ 2017-10-11 21:08       ` Waiman Long
  0 siblings, 0 replies; 12+ messages in thread
From: Waiman Long @ 2017-10-11 21:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andrew Morton, Alexander Viro, linux-kernel, linux-fsdevel,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, James Bottomley,
	Wangkai (Kevin C)

On 10/11/2017 04:56 PM, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 04:47:05PM -0400, Waiman Long wrote:
>> On 10/10/2017 06:54 PM, Andrew Morton wrote:
>>> On Mon, 18 Sep 2017 14:20:28 -0400 Waiman Long <longman@redhat.com> wrote:
>>>
>>>> A rogue application can potentially create a large number of negative
>>>> dentries in the system consuming most of the memory available even if
>>>> memory controller is enabled to limit memory usage. This can impact
>>>> performance of other applications running on the system.
>>> It does seem that under these circumstances it is pretty silly of us to
>>> reclaim useful things in order to instantiate zillions of -ve dentries.
>> I am talking about a misbehaving program due to bug or an intentional
>> rogue program.
>>
>>> Dentries are subject to kmemcg handling.  Does this not help avoid
>>> "impacting performance of other applications"?
>> AFAIK, the dentry kmem_cache isn't memcg aware.
> The dentry cache most definitely is memcg aware. It (and the
> inode cache) were the primary targets for the memcg slab reclaim
> infrastructure.
>
> #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
> # define SLAB_ACCOUNT           0x04000000UL    /* Account to memcg */
> #else
> # define SLAB_ACCOUNT           0x00000000UL
> #endif
>
> dcache_init():
>
>         dentry_cache = KMEM_CACHE(dentry,
>                 SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
>

Oh, my bad! You are right.

However, it is still a problem for applications that are not being under
any memcg.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-11 21:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-18 18:20 [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
2017-09-18 18:20 ` [PATCH v4 1/6] fs/dcache: Relocate dentry_kill() after lock_parent() Waiman Long
2017-09-18 18:20 ` [PATCH v4 2/6] fs/dcache: Track & report number of negative dentries Waiman Long
2017-09-18 18:20 ` [PATCH v4 3/6] fs/dcache: Limit numbers " Waiman Long
2017-09-18 18:20 ` [PATCH v4 4/6] fs/dcache: Enable automatic pruning " Waiman Long
2017-09-18 18:20 ` [PATCH v4 5/6] fs/dcache: Track count of negative dentries forcibly killed Waiman Long
2017-09-18 18:20 ` [PATCH v4 6/6] fs/dcache: Autotuning of negative dentry limit Waiman Long
2017-10-05 15:41 ` [PATCH v4 0/6] fs/dcache: Limit # of negative dentries Waiman Long
2017-10-10 22:54 ` Andrew Morton
2017-10-11 20:47   ` Waiman Long
2017-10-11 20:56     ` Dave Chinner
2017-10-11 21:08       ` Waiman Long
