All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
@ 2017-07-28 18:34 Waiman Long
  2017-07-28 18:34 ` [PATCH v3 1/5] fs/dcache: Limit numbers " Waiman Long
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Waiman Long @ 2017-07-28 18:34 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Waiman Long

 v2->v3:
  - Add a faster pruning rate when the free pool is closed to depletion.
  - As suggested by James Bottomley, add an artificial delay waiting
    loop before killing a negative dentry and properly clear the
    DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
  - Add a new patch to track number of negative dentries that are
    forcifully killed.

 v1->v2:
  - Move the new nr_negative field to the end of dentry_stat_t structure
    as suggested by Matthew Wilcox.
  - With the help of Miklos Szeredi, fix incorrect locking order in
    dentry_kill() by using lock_parent() instead of locking the parent's
    d_lock directly.
  - Correctly account for positive to negative dentry transitions.
  - Automatic pruning of negative dentries will now ignore the reference
    bit in negative dentries but not the regular shrinking.

A rogue application can potentially create a large number of negative
dentries in the system consuming most of the memory available. This
can impact performance of other applications running on the system.

This patchset introduces changes to the dcache subsystem to limit
the number of negative dentries allowed to be created thus limiting
the amount of memory that can be consumed by negative dentries.

Patch 1 tracks the number of negative dentries used and disallow
the creation of more when the limit is reached.

Patch 2 enables /proc/sys/fs/dentry-state to report the number of
negative dentries in the system.

Patch 3 enables automatic pruning of negative dentries when it is
close to the limit so that we won't end up killing recently used
negative dentries.

Patch 4 prevents racing between negative dentry pruning and umount
operation.

Patch 5 shows the number of forced negative dentry killings in
/proc/sys/fs/dentry-state. End users can then tune the neg_dentry_pc=
kernel boot parameter if they want to reduce forced negative dentry
killings.

Waiman Long (5):
  fs/dcache: Limit numbers of negative dentries
  fs/dcache: Report negative dentry number in dentry-state
  fs/dcache: Enable automatic pruning of negative dentries
  fs/dcache: Protect negative dentry pruning from racing with umount
  fs/dcache: Track count of negative dentries forcibly killed

 Documentation/admin-guide/kernel-parameters.txt |   7 +
 fs/dcache.c                                     | 451 ++++++++++++++++++++++--
 include/linux/dcache.h                          |   8 +-
 include/linux/list_lru.h                        |   1 +
 mm/list_lru.c                                   |   4 +-
 5 files changed, 435 insertions(+), 36 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 1/5] fs/dcache: Limit numbers of negative dentries
  2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
@ 2017-07-28 18:34 ` Waiman Long
  2017-07-28 18:34 ` [PATCH v3 2/5] fs/dcache: Report negative dentry number in dentry-state Waiman Long
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-07-28 18:34 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Waiman Long

The number of positive dentries is limited by the number of files
in the filesystems. The number of negative dentries, however,
has no limit other than the total amount of memory available in
the system. So a rogue application that generates a lot of negative
dentries can potentially exhaust most of the memory available in the
system impacting performance on other running applications.

To prevent this from happening, the dcache code is now updated to limit
the amount of the negative dentries in the LRU lists that can be kept
as a percentage of total available system memory. The default is 5%
and can be changed by specifying the "neg_dentry_pc=" kernel command
line option.

If the negative dentry limit is exceeded, newly created negative
dentries will be killed right after use to avoid adding unpredictable
latency to the directory lookup operation.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   7 +
 fs/dcache.c                                     | 251 +++++++++++++++++++++---
 include/linux/dcache.h                          |   1 +
 3 files changed, 227 insertions(+), 32 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 372cc66..7f5497b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2383,6 +2383,13 @@
 
 	n2=		[NET] SDL Inc. RISCom/N2 synchronous serial card
 
+	neg_dentry_pc=	[KNL]
+			Range: 1-50
+			Default: 5
+			This parameter specifies the amount of negative
+			dentries allowed in the system as a percentage of
+			total system memory.
+
 	netdev=		[NET] Network devices parameters
 			Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
 			Note that mem_start is often overloaded to mean
diff --git a/fs/dcache.c b/fs/dcache.c
index f901413..ab10b96 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -130,8 +130,19 @@ struct dentry_stat_t dentry_stat = {
 	.age_limit = 45,
 };
 
+/*
+ * Macros and variables to manage and count negative dentries.
+ */
+#define NEG_DENTRY_BATCH	(1 << 8)
+static long neg_dentry_percpu_limit __read_mostly;
+static struct {
+	raw_spinlock_t nfree_lock;
+	long nfree;			/* Negative dentry free pool */
+} ndblk ____cacheline_aligned_in_smp;
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -227,6 +238,92 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
+/*
+ * There is a system-wide limit to the amount of negative dentries allowed
+ * in the super blocks' LRU lists. The default limit is 5% of the total
+ * system memory. This limit can be changed by using the kernel command line
+ * option "neg_dentry_pc=" to specify the percentage of the total memory
+ * that can be used for negative dentries. That percentage must be in the
+ * 1-50% range.
+ *
+ * To avoid performance problem with a global counter on an SMP system,
+ * the tracking is done mostly on a per-cpu basis. The total limit is
+ * distributed in a 80/20 ratio to per-cpu counters and a global free pool.
+ *
+ * If a per-cpu counter runs out of negative dentries, it can borrow extra
+ * ones from the global free pool. If it has more than its percpu limit,
+ * the extra ones will be returned back to the global pool.
+ */
+
+/*
+ * Decrement negative dentry count if applicable.
+ */
+static void __neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(this_cpu_dec_return(nr_dentry_neg) < 0)) {
+		long *pcnt = get_cpu_ptr(&nr_dentry_neg);
+
+		if ((*pcnt < 0) && raw_spin_trylock(&ndblk.nfree_lock)) {
+			ACCESS_ONCE(ndblk.nfree) += NEG_DENTRY_BATCH;
+			*pcnt += NEG_DENTRY_BATCH;
+			raw_spin_unlock(&ndblk.nfree_lock);
+		}
+		put_cpu_ptr(&nr_dentry_neg);
+	}
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_dec(dentry);
+}
+
+/*
+ * Decrement the negative dentry free pool by NEG_DENTRY_BATCH & return
+ * the actual number decremented.
+ */
+static long __neg_dentry_nfree_dec(void)
+{
+	long cnt = NEG_DENTRY_BATCH;
+
+	raw_spin_lock(&ndblk.nfree_lock);
+	if (ndblk.nfree < cnt)
+		cnt = ndblk.nfree;
+	ACCESS_ONCE(ndblk.nfree) -= cnt;
+	raw_spin_unlock(&ndblk.nfree_lock);
+	return cnt;
+}
+
+/*
+ * Increment negative dentry count if applicable.
+ */
+static void __neg_dentry_inc(struct dentry *dentry)
+{
+	long cnt = 0, *pcnt;
+
+	if (this_cpu_inc_return(nr_dentry_neg) <= neg_dentry_percpu_limit)
+		return;
+
+	pcnt = get_cpu_ptr(&nr_dentry_neg);
+	if (READ_ONCE(ndblk.nfree) && (*pcnt > neg_dentry_percpu_limit)) {
+		cnt = __neg_dentry_nfree_dec();
+		*pcnt -= cnt;
+	}
+	put_cpu_ptr(&nr_dentry_neg);
+	/*
+	 * If there are too many negative dentries, set DCACHE_KILL_NEGATIVE
+	 * flag to indicate that the dentry should be killed.
+	 */
+	if (!cnt)
+		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -329,6 +426,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_inc(dentry);
 }
 
 static void dentry_free(struct dentry *dentry)
@@ -396,6 +495,7 @@ static void d_lru_add(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -404,6 +504,7 @@ static void d_lru_del(struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_dec(dentry);
 }
 
 static void d_shrink_del(struct dentry *dentry)
@@ -434,6 +535,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	list_lru_isolate(lru, &dentry->d_lru);
+	neg_dentry_dec(dentry);
 }
 
 static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
@@ -442,6 +544,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
 	D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
 	dentry->d_flags |= DCACHE_SHRINK_LIST;
 	list_lru_isolate_move(lru, &dentry->d_lru, list);
+	neg_dentry_dec(dentry);
 }
 
 /*
@@ -586,38 +689,6 @@ static void __dentry_kill(struct dentry *dentry)
 		dentry_free(dentry);
 }
 
-/*
- * Finish off a dentry we've decided to kill.
- * dentry->d_lock must be held, returns with it unlocked.
- * If ref is non-zero, then decrement the refcount too.
- * Returns dentry requiring refcount drop, or NULL if we're done.
- */
-static struct dentry *dentry_kill(struct dentry *dentry)
-	__releases(dentry->d_lock)
-{
-	struct inode *inode = dentry->d_inode;
-	struct dentry *parent = NULL;
-
-	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
-		goto failed;
-
-	if (!IS_ROOT(dentry)) {
-		parent = dentry->d_parent;
-		if (unlikely(!spin_trylock(&parent->d_lock))) {
-			if (inode)
-				spin_unlock(&inode->i_lock);
-			goto failed;
-		}
-	}
-
-	__dentry_kill(dentry);
-	return parent;
-
-failed:
-	spin_unlock(&dentry->d_lock);
-	return dentry; /* try again with same dentry */
-}
-
 static inline struct dentry *lock_parent(struct dentry *dentry)
 {
 	struct dentry *parent = dentry->d_parent;
@@ -653,6 +724,71 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 }
 
 /*
+ * Finish off a dentry we've decided to kill.
+ * dentry->d_lock must be held, returns with it unlocked.
+ * If ref is non-zero, then decrement the refcount too.
+ * Returns dentry requiring refcount drop, or NULL if we're done.
+ */
+static struct dentry *dentry_kill(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct inode *inode = dentry->d_inode;
+	struct dentry *parent = NULL;
+
+	if (inode && unlikely(!spin_trylock(&inode->i_lock)))
+		goto failed;
+
+	if (!IS_ROOT(dentry)) {
+		parent = dentry->d_parent;
+		if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
+			/*
+			 * Force the killing of this negative dentry when
+			 * DCACHE_KILL_NEGATIVE flag is set. This is to
+			 * protect against the worst case situation where
+			 * the parent d_lock is kept busy most of the time
+			 * while the number of negative dentries has
+			 * exceeded the limit. Leaving the dentry as is
+			 * will cause the number of negative entries to
+			 * continue to increase way past the limit under
+			 * the worst case situation.
+			 *
+			 * As the reference count is kept at 1 before
+			 * calling dentry_kill(), the dentry won't
+			 * disappear under the hood even if the dentry
+			 * lock is temporarily released.
+			 */
+			unsigned int dflags;
+
+			dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
+			dflags = dentry->d_flags;
+			parent = lock_parent(dentry);
+			/*
+			 * Abort the killing if the reference count or
+			 * d_flags changed.
+			 */
+			if ((dentry->d_flags != dflags) ||
+			    (dentry->d_lockref.count != 1)) {
+				if (parent)
+					spin_unlock(&parent->d_lock);
+				goto failed;
+			}
+
+		} else if (unlikely(!spin_trylock(&parent->d_lock))) {
+			if (inode)
+				spin_unlock(&inode->i_lock);
+			goto failed;
+		}
+	}
+
+	__dentry_kill(dentry);
+	return parent;
+
+failed:
+	spin_unlock(&dentry->d_lock);
+	return dentry; /* try again with same dentry */
+}
+
+/*
  * Try to do a lockless dput(), and return whether that was successful.
  *
  * If unsuccessful, we return false, having already taken the dentry lock.
@@ -815,6 +951,9 @@ void dput(struct dentry *dentry)
 
 	dentry_lru_add(dentry);
 
+	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
+		goto kill_it;
+
 	dentry->d_lockref.count--;
 	spin_unlock(&dentry->d_lock);
 	return;
@@ -1820,6 +1959,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));
 
 	spin_lock(&dentry->d_lock);
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_dec(dentry);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
@@ -3566,6 +3710,47 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+static long neg_dentry_pc __initdata = 5;
+static bool neg_dentry_warn __initdata;
+static int __init set_neg_dentry_pc(char *str)
+{
+	ssize_t ret;
+	long new_pc = neg_dentry_pc;
+
+	if (!str)
+		return 0;
+	ret = kstrtol(str, 0, &new_pc);
+	if (ret || (new_pc < 1) || (new_pc > 50))
+		ret = 1;
+	else
+		neg_dentry_pc = new_pc;
+	if (ret)
+		neg_dentry_warn = true;
+	return ret ? 0 : 1;
+}
+__setup("neg_dentry_pc=", set_neg_dentry_pc);
+
+static void __init neg_dentry_init(void)
+{
+	/* Rough estimate of # of dentries allocated per page */
+	unsigned int nr_dentry_page = PAGE_SIZE/sizeof(struct dentry) - 1;
+	unsigned long cnt;
+
+	raw_spin_lock_init(&ndblk.nfree_lock);
+
+	/* 20% in global pool & 80% in percpu free */
+	ndblk.nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
+	cnt = ndblk.nfree * 4 / num_possible_cpus();
+	if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
+		cnt = 2 * NEG_DENTRY_BATCH;
+	neg_dentry_percpu_limit = cnt;
+
+	if (neg_dentry_warn)
+		pr_warn("Warning: neg_dentry_pc must be within 1-50 range.\n");
+	pr_info("Negative dentry: percpu limit = %ld, free pool = %ld\n",
+		neg_dentry_percpu_limit, ndblk.nfree);
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3606,6 +3791,8 @@ static void __init dcache_init(void)
 	dentry_cache = KMEM_CACHE(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
 
+	neg_dentry_init();
+
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index aae1cdb..5ffcc46 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -218,6 +218,7 @@ struct dentry_operations {
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
+#define DCACHE_KILL_NEGATIVE		0x40000000 /* Kill negative dentry */
 
 extern seqlock_t rename_lock;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/5] fs/dcache: Report negative dentry number in dentry-state
  2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
  2017-07-28 18:34 ` [PATCH v3 1/5] fs/dcache: Limit numbers " Waiman Long
@ 2017-07-28 18:34 ` Waiman Long
  2017-07-28 18:34 ` [PATCH v3 3/5] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-07-28 18:34 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Waiman Long

The number of negative dentries currently in the system is now reported
in the /proc/sys/fs/dentry-state file.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c            | 16 +++++++++++++++-
 include/linux/dcache.h |  7 ++++---
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ab10b96..fb7e041 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -135,6 +135,7 @@ struct dentry_stat_t dentry_stat = {
  */
 #define NEG_DENTRY_BATCH	(1 << 8)
 static long neg_dentry_percpu_limit __read_mostly;
+static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
 	long nfree;			/* Negative dentry free pool */
@@ -176,11 +177,23 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }
 
+static long get_nr_dentry_neg(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(nr_dentry_neg, i);
+	sum += neg_dentry_nfree_init - ndblk.nfree;
+	return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
 {
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
+	dentry_stat.nr_negative = get_nr_dentry_neg();
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -3739,7 +3752,8 @@ static void __init neg_dentry_init(void)
 	raw_spin_lock_init(&ndblk.nfree_lock);
 
 	/* 20% in global pool & 80% in percpu free */
-	ndblk.nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
+	ndblk.nfree = neg_dentry_nfree_init
+		    = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
 	cnt = ndblk.nfree * 4 / num_possible_cpus();
 	if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
 		cnt = 2 * NEG_DENTRY_BATCH;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 5ffcc46..e42c8fc 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -63,9 +63,10 @@ struct qstr {
 struct dentry_stat_t {
 	long nr_dentry;
 	long nr_unused;
-	long age_limit;          /* age in seconds */
-	long want_pages;         /* pages requested by system */
-	long dummy[2];
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long nr_negative;	/* # of negative dentries */
+	long dummy;
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/5] fs/dcache: Enable automatic pruning of negative dentries
  2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
  2017-07-28 18:34 ` [PATCH v3 1/5] fs/dcache: Limit numbers " Waiman Long
  2017-07-28 18:34 ` [PATCH v3 2/5] fs/dcache: Report negative dentry number in dentry-state Waiman Long
@ 2017-07-28 18:34 ` Waiman Long
  2017-07-28 18:34 ` [PATCH v3 4/5] fs/dcache: Protect negative dentry pruning from racing with umount Waiman Long
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-07-28 18:34 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Waiman Long

Having a limit for the number of negative dentries may have an
undesirable side effect that no new negative dentries will be allowed
when the limit is reached. This may have a performance impact on some
workloads.

To prevent this from happening, we need a way to prune the negative
dentries so that new ones can be created before it is too late. This
is done by using the workqueue API to do the pruning gradually when a
threshold is reached to minimize performance impact on other running
tasks.

The current threshold is 1/4 of the initial value of the free pool
count. Once the threshold is reached, the automatic pruning process
will be kicked in to replenish the free pool. Each pruning run will
scan at most 256 LRU dentries and 64 dentries per node to minimize
the LRU locks hold time. The pruning rate will be 50 Hz if the free
pool count is less than 1/8 of the original and 10 Hz otherwise.

A short artificial delay loop is added to wait for changes in the
negative dentry count before killing the negative dentry. Sleeping
in this case may be problematic as the callers of dput() may not
be in a state that is sleepable.

Allowing tasks needing negative dentries to potentially go to do
the pruning synchronously themselves can cause lock and cacheline
contention. The end result may not be better than that of killing
recently created negative dentries.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c              | 156 +++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/list_lru.h |   1 +
 mm/list_lru.c            |   4 +-
 3 files changed, 155 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index fb7e041..3482972 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -134,13 +134,21 @@ struct dentry_stat_t dentry_stat = {
  * Macros and variables to manage and count negative dentries.
  */
 #define NEG_DENTRY_BATCH	(1 << 8)
+#define NEG_PRUNING_SIZE	(1 << 6)
+#define NEG_PRUNING_SLOW_RATE	(HZ/10)
+#define NEG_PRUNING_FAST_RATE	(HZ/50)
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
 	long nfree;			/* Negative dentry free pool */
+	struct super_block *prune_sb;	/* Super_block for pruning */
+	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_neg);
@@ -329,6 +337,15 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	 */
 	if (!cnt)
 		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+
+	/*
+	 * Initiate negative dentry pruning if free pool has less than
+	 * 1/4 of its initial value.
+	 */
+	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
+		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
+		schedule_delayed_work(&prune_neg_dentry_work, 1);
+	}
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -770,10 +787,8 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 			 * disappear under the hood even if the dentry
 			 * lock is temporarily released.
 			 */
-			unsigned int dflags;
+			unsigned int dflags = dentry->d_flags;
 
-			dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
-			dflags = dentry->d_flags;
 			parent = lock_parent(dentry);
 			/*
 			 * Abort the killing if the reference count or
@@ -964,8 +979,35 @@ void dput(struct dentry *dentry)
 
 	dentry_lru_add(dentry);
 
-	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
-		goto kill_it;
+	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
+		/*
+		 * Kill the dentry if it is really negative and the per-cpu
+		 * negative dentry count has still exceeded the limit even
+		 * after a short artificial delay.
+		 */
+		if (d_is_negative(dentry) &&
+		   (this_cpu_read(nr_dentry_neg) > neg_dentry_percpu_limit)) {
+			int loop = 256;
+
+			/*
+			 * Waiting to transfer free negative dentries from the
+			 * free pool to the percpu count.
+			 */
+			while (--loop) {
+				if (READ_ONCE(ndblk.nfree)) {
+					long cnt = __neg_dentry_nfree_dec();
+
+					this_cpu_sub(nr_dentry_neg, cnt);
+					if (cnt)
+						break;
+				}
+				cpu_relax();
+			}
+			if (!loop)
+				goto kill_it;
+		}
+		dentry->d_flags &= ~DCACHE_KILL_NEGATIVE;
+	}
 
 	dentry->d_lockref.count--;
 	spin_unlock(&dentry->d_lock);
@@ -1326,6 +1368,110 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+/*
+ * A modified version that attempts to remove a limited number of negative
+ * dentries as well as some other non-negative dentries at the front.
+ */
+static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
+		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *freeable = arg;
+	struct dentry	*dentry = container_of(item, struct dentry, d_lru);
+	enum lru_status	status = LRU_SKIP;
+
+	/*
+	 * Stop further list walking for the current node list to limit
+	 * performance impact, but allow further walking in the next node
+	 * list.
+	 */
+	if ((ndblk.neg_count >= NEG_PRUNING_SIZE) ||
+	    (ndblk.prune_count >= NEG_PRUNING_SIZE)) {
+		ndblk.prune_count = 0;
+		return LRU_STOP;
+	}
+
+	/*
+	 * we are inverting the lru lock/dentry->d_lock here,
+	 * so use a trylock. If we fail to get the lock, just skip
+	 * it
+	 */
+	if (!spin_trylock(&dentry->d_lock)) {
+		ndblk.prune_count++;
+		return LRU_SKIP;
+	}
+
+	/*
+	 * Referenced dentries are still in use. If they have active
+	 * counts, just remove them from the LRU. Otherwise give them
+	 * another pass through the LRU.
+	 */
+	if (dentry->d_lockref.count) {
+		d_lru_isolate(lru, dentry);
+		status = LRU_REMOVED;
+		goto out;
+	}
+
+	/*
+	 * Dentries with reference bit on are moved back to the tail
+	 * except for the negative ones.
+	 */
+	if ((dentry->d_flags & DCACHE_REFERENCED) && !d_is_negative(dentry)) {
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		status = LRU_ROTATE;
+		goto out;
+	}
+
+	status = LRU_REMOVED;
+	d_lru_shrink_move(lru, dentry, freeable);
+	if (d_is_negative(dentry))
+		ndblk.neg_count++;
+out:
+	spin_unlock(&dentry->d_lock);
+	ndblk.prune_count++;
+	return status;
+}
+
+/*
+ * A workqueue function to prune negative dentry.
+ *
+ * The pruning is done gradually over time so as to have as little
+ * performance impact as possible.
+ */
+static void prune_negative_dentry(struct work_struct *work)
+{
+	int freed;
+	long nfree;
+	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	LIST_HEAD(dispose);
+
+	if (!sb)
+		return;
+
+	ndblk.neg_count = ndblk.prune_count = 0;
+	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
+			      &dispose, NEG_DENTRY_BATCH);
+
+	if (freed)
+		shrink_dentry_list(&dispose);
+	/*
+	 * Continue delayed pruning until negative dentry free pool is at
+	 * least 1/2 of the initial value or the super_block has no more
+	 * negative dentries left at the front.
+	 *
+	 * The pruning rate depends on the size of the free pool. The
+	 * faster rate is used when there is less than 1/8 left.
+	 * Otherwise, the slower rate will be used.
+	 */
+	nfree = READ_ONCE(ndblk.nfree);
+	if (ndblk.neg_count && (nfree < neg_dentry_nfree_init/2))
+		schedule_delayed_work(&prune_neg_dentry_work,
+				     (nfree < neg_dentry_nfree_init/8)
+				     ? NEG_PRUNING_FAST_RATE
+				     : NEG_PRUNING_SLOW_RATE);
+	else
+		WRITE_ONCE(ndblk.prune_sb, NULL);
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index fa7fd03..06c9d15 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -22,6 +22,7 @@ enum lru_status {
 	LRU_SKIP,		/* item cannot be locked, skip */
 	LRU_RETRY,		/* item not freeable. May drop the lock
 				   internally, but has to return locked. */
+	LRU_STOP,		/* stop walking the list */
 };
 
 struct list_lru_one {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 7a40fa2..f6e7796 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -244,11 +244,13 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			 */
 			assert_spin_locked(&nlru->lock);
 			goto restart;
+		case LRU_STOP:
+			goto out;
 		default:
 			BUG();
 		}
 	}
-
+out:
 	spin_unlock(&nlru->lock);
 	return isolated;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 4/5] fs/dcache: Protect negative dentry pruning from racing with umount
  2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (2 preceding siblings ...)
  2017-07-28 18:34 ` [PATCH v3 3/5] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
@ 2017-07-28 18:34 ` Waiman Long
  2017-07-28 18:34 ` [PATCH v3 5/5] fs/dcache: Track count of negative dentries forcibly killed Waiman Long
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-07-28 18:34 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Waiman Long

The negative dentry pruning is done on a specific super_block set
in the ndblk.prune_sb variable. If the super_block is also being
un-mounted concurrently, the content of the super_block may no longer
be valid.

To protect against such racing condition, a new lock is added to
the ndblk structure to synchronize the negative dentry pruning and
umount operation. This is a regular spinlock as the pruning operation
can be quite time consuming.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 3482972..360185e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -141,11 +141,13 @@ struct dentry_stat_t dentry_stat = {
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
+	spinlock_t prune_lock;		/* Lock for protecting pruning */
 	long nfree;			/* Negative dentry free pool */
 	struct super_block *prune_sb;	/* Super_block for pruning */
 	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void clear_prune_sb_for_umount(struct super_block *sb);
 static void prune_negative_dentry(struct work_struct *work);
 static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
 
@@ -1355,6 +1357,7 @@ void shrink_dcache_sb(struct super_block *sb)
 {
 	long freed;
 
+	clear_prune_sb_for_umount(sb);
 	do {
 		LIST_HEAD(dispose);
 
@@ -1385,7 +1388,8 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
 	 * list.
 	 */
 	if ((ndblk.neg_count >= NEG_PRUNING_SIZE) ||
-	    (ndblk.prune_count >= NEG_PRUNING_SIZE)) {
+	    (ndblk.prune_count >= NEG_PRUNING_SIZE) ||
+	    !READ_ONCE(ndblk.prune_sb)) {
 		ndblk.prune_count = 0;
 		return LRU_STOP;
 	}
@@ -1441,15 +1445,24 @@ static void prune_negative_dentry(struct work_struct *work)
 {
 	int freed;
 	long nfree;
-	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	struct super_block *sb;
 	LIST_HEAD(dispose);
 
-	if (!sb)
+	/*
+	 * The prune_lock is used to protect negative dentry pruning from
+	 * racing with concurrent umount operation.
+	 */
+	spin_lock(&ndblk.prune_lock);
+	sb = READ_ONCE(ndblk.prune_sb);
+	if (!sb) {
+		spin_unlock(&ndblk.prune_lock);
 		return;
+	}
 
 	ndblk.neg_count = ndblk.prune_count = 0;
 	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
 			      &dispose, NEG_DENTRY_BATCH);
+	spin_unlock(&ndblk.prune_lock);
 
 	if (freed)
 		shrink_dentry_list(&dispose);
@@ -1472,6 +1485,27 @@ static void prune_negative_dentry(struct work_struct *work)
 		WRITE_ONCE(ndblk.prune_sb, NULL);
 }
 
+/*
+ * This is called before an umount to clear ndblk.prune_sb if it
+ * matches the given super_block.
+ */
+static void clear_prune_sb_for_umount(struct super_block *sb)
+{
+	if (likely(READ_ONCE(ndblk.prune_sb) != sb))
+		return;
+	WRITE_ONCE(ndblk.prune_sb, NULL);
+	/*
+	 * Need to wait until an ongoing pruning operation, if present,
+	 * is completed.
+	 *
+	 * Clearing ndblk.prune_sb will hasten the completion of pruning.
+	 * In the unlikely event that ndblk.prune_sb is set to another
+	 * super_block, the waiting will last the complete pruning operation
+	 * which shouldn't be that long either.
+	 */
+	spin_unlock_wait(&ndblk.prune_lock);
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
@@ -1794,6 +1828,7 @@ void shrink_dcache_for_umount(struct super_block *sb)
 
 	WARN(down_read_trylock(&sb->s_umount), "s_umount should've been locked");
 
+	clear_prune_sb_for_umount(sb);
 	dentry = sb->s_root;
 	sb->s_root = NULL;
 	do_one_tree(dentry);
@@ -3896,6 +3931,7 @@ static void __init neg_dentry_init(void)
 	unsigned long cnt;
 
 	raw_spin_lock_init(&ndblk.nfree_lock);
+	spin_lock_init(&ndblk.prune_lock);
 
 	/* 20% in global pool & 80% in percpu free */
 	ndblk.nfree = neg_dentry_nfree_init
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 5/5] fs/dcache: Track count of negative dentries forcibly killed
  2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (3 preceding siblings ...)
  2017-07-28 18:34 ` [PATCH v3 4/5] fs/dcache: Protect negative dentry pruning from racing with umount Waiman Long
@ 2017-07-28 18:34 ` Waiman Long
  2017-08-15 17:15 ` [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
  2017-08-28 17:58 ` Waiman Long
  6 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-07-28 18:34 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Waiman Long

There is performance concern about killing recently created negative
dentries. This should rarely happen under normal working condition. To
understand the extent of how often this negative dentry killing is
happening, the /proc/sys/fs/denty-state file is extended to track this
number. This allows us to see if additional measures will be needed
to reduce the chance of negative dentries killing.

One possible measure is to increase the percentage of system
memory allowed for negative dentries by adding or adjusting the
"neg_dentry_pc=" parameter in the kernel boot command line.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c            | 4 ++++
 include/linux/dcache.h | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 360185e..3796c3f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -145,6 +145,7 @@ struct dentry_stat_t dentry_stat = {
 	long nfree;			/* Negative dentry free pool */
 	struct super_block *prune_sb;	/* Super_block for pruning */
 	int neg_count, prune_count;	/* Pruning counts */
+	atomic_long_t nr_neg_killed;	/* # of negative entries killed */
 } ndblk ____cacheline_aligned_in_smp;
 
 static void clear_prune_sb_for_umount(struct super_block *sb);
@@ -204,6 +205,7 @@ int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
 	dentry_stat.nr_negative = get_nr_dentry_neg();
+	dentry_stat.nr_killed = atomic_long_read(&ndblk.nr_neg_killed);
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -802,6 +804,7 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 					spin_unlock(&parent->d_lock);
 				goto failed;
 			}
+			atomic_long_inc(&ndblk.nr_neg_killed);
 
 		} else if (unlikely(!spin_trylock(&parent->d_lock))) {
 			if (inode)
@@ -3932,6 +3935,7 @@ static void __init neg_dentry_init(void)
 
 	raw_spin_lock_init(&ndblk.nfree_lock);
 	spin_lock_init(&ndblk.prune_lock);
+	atomic_long_set(&ndblk.nr_neg_killed, 0);
 
 	/* 20% in global pool & 80% in percpu free */
 	ndblk.nfree = neg_dentry_nfree_init
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index e42c8fc..227ed83 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -66,7 +66,7 @@ struct dentry_stat_t {
 	long age_limit;		/* age in seconds */
 	long want_pages;	/* pages requested by system */
 	long nr_negative;	/* # of negative dentries */
-	long dummy;
+	long nr_killed;		/* # of negative dentries killed */
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (4 preceding siblings ...)
  2017-07-28 18:34 ` [PATCH v3 5/5] fs/dcache: Track count of negative dentries forcibly killed Waiman Long
@ 2017-08-15 17:15 ` Waiman Long
  2017-08-16 10:33   ` Wangkai (Kevin,C)
  2017-08-28 17:58 ` Waiman Long
  6 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-08-15 17:15 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley

On 07/28/2017 02:34 PM, Waiman Long wrote:
>  v2->v3:
>   - Add a faster pruning rate when the free pool is closed to depletion.
>   - As suggested by James Bottomley, add an artificial delay waiting
>     loop before killing a negative dentry and properly clear the
>     DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
>   - Add a new patch to track number of negative dentries that are
>     forcifully killed.
>
>  v1->v2:
>   - Move the new nr_negative field to the end of dentry_stat_t structure
>     as suggested by Matthew Wilcox.
>   - With the help of Miklos Szeredi, fix incorrect locking order in
>     dentry_kill() by using lock_parent() instead of locking the parent's
>     d_lock directly.
>   - Correctly account for positive to negative dentry transitions.
>   - Automatic pruning of negative dentries will now ignore the reference
>     bit in negative dentries but not the regular shrinking.
>
> A rogue application can potentially create a large number of negative
> dentries in the system consuming most of the memory available. This
> can impact performance of other applications running on the system.
>
> This patchset introduces changes to the dcache subsystem to limit
> the number of negative dentries allowed to be created thus limiting
> the amount of memory that can be consumed by negative dentries.
>
> Patch 1 tracks the number of negative dentries used and disallow
> the creation of more when the limit is reached.
>
> Patch 2 enables /proc/sys/fs/dentry-state to report the number of
> negative dentries in the system.
>
> Patch 3 enables automatic pruning of negative dentries when it is
> close to the limit so that we won't end up killing recently used
> negative dentries.
>
> Patch 4 prevents racing between negative dentry pruning and umount
> operation.
>
> Patch 5 shows the number of forced negative dentry killings in
> /proc/sys/fs/dentry-state. End users can then tune the neg_dentry_pc=
> kernel boot parameter if they want to reduce forced negative dentry
> killings.
>
> Waiman Long (5):
>   fs/dcache: Limit numbers of negative dentries
>   fs/dcache: Report negative dentry number in dentry-state
>   fs/dcache: Enable automatic pruning of negative dentries
>   fs/dcache: Protect negative dentry pruning from racing with umount
>   fs/dcache: Track count of negative dentries forcibly killed
>
>  Documentation/admin-guide/kernel-parameters.txt |   7 +
>  fs/dcache.c                                     | 451 ++++++++++++++++++++++--
>  include/linux/dcache.h                          |   8 +-
>  include/linux/list_lru.h                        |   1 +
>  mm/list_lru.c                                   |   4 +-
>  5 files changed, 435 insertions(+), 36 deletions(-)
>
I haven't received any comment on this v3 patch for over 2 weeks. Is
there anything I can do to make it more ready to be merged?

Thanks,
Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-15 17:15 ` [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
@ 2017-08-16 10:33   ` Wangkai (Kevin,C)
  2017-08-16 13:29     ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Wangkai (Kevin,C) @ 2017-08-16 10:33 UTC (permalink / raw)
  To: Waiman Long, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley

> -----Original Message-----
> From: linux-fsdevel-owner@vger.kernel.org
> [mailto:linux-fsdevel-owner@vger.kernel.org] On Behalf Of Waiman Long
> Sent: Wednesday, August 16, 2017 1:15 AM
> To: Alexander Viro; Jonathan Corbet
> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
> linux-fsdevel@vger.kernel.org; Paul E. McKenney; Andrew Morton; Ingo Molnar;
> Miklos Szeredi; Matthew Wilcox; Larry Woodman; James Bottomley
> Subject: Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
> 
> On 07/28/2017 02:34 PM, Waiman Long wrote:
> >  v2->v3:
> >   - Add a faster pruning rate when the free pool is closed to depletion.
> >   - As suggested by James Bottomley, add an artificial delay waiting
> >     loop before killing a negative dentry and properly clear the
> >     DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
> >   - Add a new patch to track number of negative dentries that are
> >     forcifully killed.
> >
> >  v1->v2:
> >   - Move the new nr_negative field to the end of dentry_stat_t structure
> >     as suggested by Matthew Wilcox.
> >   - With the help of Miklos Szeredi, fix incorrect locking order in
> >     dentry_kill() by using lock_parent() instead of locking the parent's
> >     d_lock directly.
> >   - Correctly account for positive to negative dentry transitions.
> >   - Automatic pruning of negative dentries will now ignore the reference
> >     bit in negative dentries but not the regular shrinking.
> >
> > A rogue application can potentially create a large number of negative
> > dentries in the system consuming most of the memory available. This
> > can impact performance of other applications running on the system.
> >
> > This patchset introduces changes to the dcache subsystem to limit the
> > number of negative dentries allowed to be created thus limiting the
> > amount of memory that can be consumed by negative dentries.
> >
> > Patch 1 tracks the number of negative dentries used and disallow the
> > creation of more when the limit is reached.
> >
> > Patch 2 enables /proc/sys/fs/dentry-state to report the number of
> > negative dentries in the system.
> >
> > Patch 3 enables automatic pruning of negative dentries when it is
> > close to the limit so that we won't end up killing recently used
> > negative dentries.
> >
> > Patch 4 prevents racing between negative dentry pruning and umount
> > operation.
> >
> > Patch 5 shows the number of forced negative dentry killings in
> > /proc/sys/fs/dentry-state. End users can then tune the neg_dentry_pc=
> > kernel boot parameter if they want to reduce forced negative dentry
> > killings.
> >
> > Waiman Long (5):
> >   fs/dcache: Limit numbers of negative dentries
> >   fs/dcache: Report negative dentry number in dentry-state
> >   fs/dcache: Enable automatic pruning of negative dentries
> >   fs/dcache: Protect negative dentry pruning from racing with umount
> >   fs/dcache: Track count of negative dentries forcibly killed
> >
> >  Documentation/admin-guide/kernel-parameters.txt |   7 +
> >  fs/dcache.c                                     | 451
> ++++++++++++++++++++++--
> >  include/linux/dcache.h                          |   8 +-
> >  include/linux/list_lru.h                        |   1 +
> >  mm/list_lru.c                                   |   4 +-
> >  5 files changed, 435 insertions(+), 36 deletions(-)
> >
> I haven't received any comment on this v3 patch for over 2 weeks. Is there
> anything I can do to make it more ready to be merged?
> 
> Thanks,
> Longman

Hi Longman,
I am a fresher of fsdevel, about 2 weeks before, I have joined this mail list, recently I have met the same problem of negative dentries, 
in my opinion, the dentries should be remove together with the files or directories, I don't know you have submit this patch, I have
another patch about this:

http://marc.info/?l=linux-fsdevel&m=150209902215266&w=2

maybe this is a foo idea...

regards
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-16 10:33   ` Wangkai (Kevin,C)
@ 2017-08-16 13:29     ` Waiman Long
  2017-08-17  4:00       ` Wangkai (Kevin,C)
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-08-16 13:29 UTC (permalink / raw)
  To: Wangkai (Kevin,C), Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley

On 08/16/2017 06:33 AM, Wangkai (Kevin,C) wrote:
>> -----Original Message-----
>> From: linux-fsdevel-owner@vger.kernel.org
>> [mailto:linux-fsdevel-owner@vger.kernel.org] On Behalf Of Waiman Long
>> Sent: Wednesday, August 16, 2017 1:15 AM
>> To: Alexander Viro; Jonathan Corbet
>> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
>> linux-fsdevel@vger.kernel.org; Paul E. McKenney; Andrew Morton; Ingo Molnar;
>> Miklos Szeredi; Matthew Wilcox; Larry Woodman; James Bottomley
>> Subject: Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
>>
>> On 07/28/2017 02:34 PM, Waiman Long wrote:
>>>  v2->v3:
>>>   - Add a faster pruning rate when the free pool is closed to depletion.
>>>   - As suggested by James Bottomley, add an artificial delay waiting
>>>     loop before killing a negative dentry and properly clear the
>>>     DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
>>>   - Add a new patch to track number of negative dentries that are
>>>     forcifully killed.
>>>
>>>  v1->v2:
>>>   - Move the new nr_negative field to the end of dentry_stat_t structure
>>>     as suggested by Matthew Wilcox.
>>>   - With the help of Miklos Szeredi, fix incorrect locking order in
>>>     dentry_kill() by using lock_parent() instead of locking the parent's
>>>     d_lock directly.
>>>   - Correctly account for positive to negative dentry transitions.
>>>   - Automatic pruning of negative dentries will now ignore the reference
>>>     bit in negative dentries but not the regular shrinking.
>>>
>>> A rogue application can potentially create a large number of negative
>>> dentries in the system consuming most of the memory available. This
>>> can impact performance of other applications running on the system.
>>>
>>> This patchset introduces changes to the dcache subsystem to limit the
>>> number of negative dentries allowed to be created thus limiting the
>>> amount of memory that can be consumed by negative dentries.
>>>
>>> Patch 1 tracks the number of negative dentries used and disallow the
>>> creation of more when the limit is reached.
>>>
>>> Patch 2 enables /proc/sys/fs/dentry-state to report the number of
>>> negative dentries in the system.
>>>
>>> Patch 3 enables automatic pruning of negative dentries when it is
>>> close to the limit so that we won't end up killing recently used
>>> negative dentries.
>>>
>>> Patch 4 prevents racing between negative dentry pruning and umount
>>> operation.
>>>
>>> Patch 5 shows the number of forced negative dentry killings in
>>> /proc/sys/fs/dentry-state. End users can then tune the neg_dentry_pc=
>>> kernel boot parameter if they want to reduce forced negative dentry
>>> killings.
>>>
>>> Waiman Long (5):
>>>   fs/dcache: Limit numbers of negative dentries
>>>   fs/dcache: Report negative dentry number in dentry-state
>>>   fs/dcache: Enable automatic pruning of negative dentries
>>>   fs/dcache: Protect negative dentry pruning from racing with umount
>>>   fs/dcache: Track count of negative dentries forcibly killed
>>>
>>>  Documentation/admin-guide/kernel-parameters.txt |   7 +
>>>  fs/dcache.c                                     | 451
>> ++++++++++++++++++++++--
>>>  include/linux/dcache.h                          |   8 +-
>>>  include/linux/list_lru.h                        |   1 +
>>>  mm/list_lru.c                                   |   4 +-
>>>  5 files changed, 435 insertions(+), 36 deletions(-)
>>>
>> I haven't received any comment on this v3 patch for over 2 weeks. Is there
>> anything I can do to make it more ready to be merged?
>>
>> Thanks,
>> Longman
> Hi Longman,
> I am a fresher of fsdevel, about 2 weeks before, I have joined this mail list, recently I have met the same problem of negative dentries, 
> in my opinion, the dentries should be remove together with the files or directories, I don't know you have submit this patch, I have
> another patch about this:
>
> http://marc.info/?l=linux-fsdevel&m=150209902215266&w=2
>
> maybe this is a foo idea...
>
> regards
> Kevin

If you look at the code, the front dentries of the LRU list are removed
when there are too many negative dentries. That includes positive
dentries as well as it is not practical to just remove the negative
dentries.

I have looked at your patch. The dentry of a removed file becomes a
negative dentry. The kernel can keep track of those negative entries and
there is no need to add an additional flag for that.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-16 13:29     ` Waiman Long
@ 2017-08-17  4:00       ` Wangkai (Kevin,C)
  2017-08-17 13:04         ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Wangkai (Kevin,C) @ 2017-08-17  4:00 UTC (permalink / raw)
  To: Waiman Long, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley



> -----Original Message-----
> From: Waiman Long [mailto:longman@redhat.com]
> Sent: Wednesday, August 16, 2017 9:29 PM
> To: Wangkai (Kevin,C); Alexander Viro; Jonathan Corbet
> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
> linux-fsdevel@vger.kernel.org; Paul E. McKenney; Andrew Morton; Ingo Molnar;
> Miklos Szeredi; Matthew Wilcox; Larry Woodman; James Bottomley
> Subject: Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
> 
> On 08/16/2017 06:33 AM, Wangkai (Kevin,C) wrote:
> >> -----Original Message-----
> >> From: linux-fsdevel-owner@vger.kernel.org
> >> [mailto:linux-fsdevel-owner@vger.kernel.org] On Behalf Of Waiman Long
> >> Sent: Wednesday, August 16, 2017 1:15 AM
> >> To: Alexander Viro; Jonathan Corbet
> >> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
> >> linux-fsdevel@vger.kernel.org; Paul E. McKenney; Andrew Morton; Ingo
> >> Molnar; Miklos Szeredi; Matthew Wilcox; Larry Woodman; James
> >> Bottomley
> >> Subject: Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
> >>
> >> On 07/28/2017 02:34 PM, Waiman Long wrote:
> >>>  v2->v3:
> >>>   - Add a faster pruning rate when the free pool is closed to depletion.
> >>>   - As suggested by James Bottomley, add an artificial delay waiting
> >>>     loop before killing a negative dentry and properly clear the
> >>>     DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
> >>>   - Add a new patch to track number of negative dentries that are
> >>>     forcifully killed.
> >>>
> >>>  v1->v2:
> >>>   - Move the new nr_negative field to the end of dentry_stat_t structure
> >>>     as suggested by Matthew Wilcox.
> >>>   - With the help of Miklos Szeredi, fix incorrect locking order in
> >>>     dentry_kill() by using lock_parent() instead of locking the parent's
> >>>     d_lock directly.
> >>>   - Correctly account for positive to negative dentry transitions.
> >>>   - Automatic pruning of negative dentries will now ignore the reference
> >>>     bit in negative dentries but not the regular shrinking.
> >>>
> >>> A rogue application can potentially create a large number of
> >>> negative dentries in the system consuming most of the memory
> >>> available. This can impact performance of other applications running on the
> system.
> >>>
> >>> This patchset introduces changes to the dcache subsystem to limit
> >>> the number of negative dentries allowed to be created thus limiting
> >>> the amount of memory that can be consumed by negative dentries.
> >>>
> >>> Patch 1 tracks the number of negative dentries used and disallow the
> >>> creation of more when the limit is reached.
> >>>
> >>> Patch 2 enables /proc/sys/fs/dentry-state to report the number of
> >>> negative dentries in the system.
> >>>
> >>> Patch 3 enables automatic pruning of negative dentries when it is
> >>> close to the limit so that we won't end up killing recently used
> >>> negative dentries.
> >>>
> >>> Patch 4 prevents racing between negative dentry pruning and umount
> >>> operation.
> >>>
> >>> Patch 5 shows the number of forced negative dentry killings in
> >>> /proc/sys/fs/dentry-state. End users can then tune the
> >>> neg_dentry_pc= kernel boot parameter if they want to reduce forced
> >>> negative dentry killings.
> >>>
> >>> Waiman Long (5):
> >>>   fs/dcache: Limit numbers of negative dentries
> >>>   fs/dcache: Report negative dentry number in dentry-state
> >>>   fs/dcache: Enable automatic pruning of negative dentries
> >>>   fs/dcache: Protect negative dentry pruning from racing with umount
> >>>   fs/dcache: Track count of negative dentries forcibly killed
> >>>
> >>>  Documentation/admin-guide/kernel-parameters.txt |   7 +
> >>>  fs/dcache.c                                     | 451
> >> ++++++++++++++++++++++--
> >>>  include/linux/dcache.h                          |   8 +-
> >>>  include/linux/list_lru.h                        |   1 +
> >>>  mm/list_lru.c                                   |   4 +-
> >>>  5 files changed, 435 insertions(+), 36 deletions(-)
> >>>
> >> I haven't received any comment on this v3 patch for over 2 weeks. Is
> >> there anything I can do to make it more ready to be merged?
> >>
> >> Thanks,
> >> Longman
> > Hi Longman,
> > I am a fresher of fsdevel, about 2 weeks before, I have joined this
> > mail list, recently I have met the same problem of negative dentries,
> > in my opinion, the dentries should be remove together with the files or
> directories, I don't know you have submit this patch, I have another patch
> about this:
> >
> > http://marc.info/?l=linux-fsdevel&m=150209902215266&w=2
> >
> > maybe this is a foo idea...
> >
> > regards
> > Kevin
> 
> If you look at the code, the front dentries of the LRU list are removed when
> there are too many negative dentries. That includes positive dentries as well as
> it is not practical to just remove the negative dentries.
> 
> I have looked at your patch. The dentry of a removed file becomes a negative
> dentry. The kernel can keep track of those negative entries and there is no need
> to add an additional flag for that.
> 
> Cheers,
> Longman

One comment about your patch:
In the patch 1/5 function dentry_kill first get dentry->d_flags, after lock parent and
Compare d_flags again, is this needed? The d_flags was changed under lock.

In my patch the DCACHE_FILE_REMOVED flag was to distinguish the removed file and
The closed file, I found there was no difference of a dentry between the removed file and the closed
File, they all on the lru list.

Regards,
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-17  4:00       ` Wangkai (Kevin,C)
@ 2017-08-17 13:04         ` Waiman Long
  2017-08-18  9:59           ` Wangkai (Kevin,C)
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-08-17 13:04 UTC (permalink / raw)
  To: Wangkai (Kevin,C), Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley

On 08/17/2017 12:00 AM, Wangkai (Kevin,C) wrote:
>
>>>
>>> Hi Longman,
>>> I am a fresher of fsdevel, about 2 weeks before, I have joined this
>>> mail list, recently I have met the same problem of negative dentries,
>>> in my opinion, the dentries should be remove together with the files or
>> directories, I don't know you have submit this patch, I have another patch
>> about this:
>>> http://marc.info/?l=linux-fsdevel&m=150209902215266&w=2
>>>
>>> maybe this is a foo idea...
>>>
>>> regards
>>> Kevin
>> If you look at the code, the front dentries of the LRU list are removed when
>> there are too many negative dentries. That includes positive dentries as well as
>> it is not practical to just remove the negative dentries.
>>
>> I have looked at your patch. The dentry of a removed file becomes a negative
>> dentry. The kernel can keep track of those negative entries and there is no need
>> to add an additional flag for that.
>>
>> Cheers,
>> Longman
> One comment about your patch:
> In the patch 1/5 function dentry_kill first get dentry->d_flags, after lock parent and
> Compare d_flags again, is this needed? The d_flags was changed under lock.

Yes, it is necessary. We are talking about an SMP system with multiple
threads running concurrently. If you look at the lock parent code, it
may release the current dentry lock before taking the parent's and then
the dentry lock again. As soon as the lock is released, anything can
happen to the dentry including changes in d_flags.

> In my patch the DCACHE_FILE_REMOVED flag was to distinguish the removed file and
> The closed file, I found there was no difference of a dentry between the removed file and the closed
> File, they all on the lru list.

There is a difference between removed file and closed file. The type
field of d_flags will be empty for a removed file which indicate a
negative dentry. Anything else is a positive dentry. Look at the inline
function d_is_negative() [d_is_miss()] and you will see how it is done.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-17 13:04         ` Waiman Long
@ 2017-08-18  9:59           ` Wangkai (Kevin,C)
  2017-08-18 14:10             ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Wangkai (Kevin,C) @ 2017-08-18  9:59 UTC (permalink / raw)
  To: Waiman Long, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley



> -----Original Message-----
> From: Waiman Long [mailto:longman@redhat.com]
> Sent: Thursday, August 17, 2017 9:04 PM
> To: Wangkai (Kevin,C); Alexander Viro; Jonathan Corbet
> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
> linux-fsdevel@vger.kernel.org; Paul E. McKenney; Andrew Morton; Ingo Molnar;
> Miklos Szeredi; Matthew Wilcox; Larry Woodman; James Bottomley
> Subject: Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
> 
> On 08/17/2017 12:00 AM, Wangkai (Kevin,C) wrote:
> >
> >>>
> >>> Hi Longman,
> >>> I am a fresher of fsdevel, about 2 weeks before, I have joined this
> >>> mail list, recently I have met the same problem of negative
> >>> dentries, in my opinion, the dentries should be remove together with
> >>> the files or
> >> directories, I don't know you have submit this patch, I have another
> >> patch about this:
> >>> http://marc.info/?l=linux-fsdevel&m=150209902215266&w=2
> >>>
> >>> maybe this is a foo idea...
> >>>
> >>> regards
> >>> Kevin
> >> If you look at the code, the front dentries of the LRU list are
> >> removed when there are too many negative dentries. That includes
> >> positive dentries as well as it is not practical to just remove the negative
> dentries.
> >>
> >> I have looked at your patch. The dentry of a removed file becomes a
> >> negative dentry. The kernel can keep track of those negative entries
> >> and there is no need to add an additional flag for that.
> >>
> >> Cheers,
> >> Longman
> > One comment about your patch:
> > In the patch 1/5 function dentry_kill first get dentry->d_flags, after
> > lock parent and Compare d_flags again, is this needed? The d_flags was
> changed under lock.
> 
> Yes, it is necessary. We are talking about an SMP system with multiple threads
> running concurrently. If you look at the lock parent code, it may release the
> current dentry lock before taking the parent's and then the dentry lock again.
> As soon as the lock is released, anything can happen to the dentry including
> changes in d_flags.

Yes, I am not check the lock parent code, it is necessary.

> > In my patch the DCACHE_FILE_REMOVED flag was to distinguish the
> > removed file and The closed file, I found there was no difference of a
> > dentry between the removed file and the closed File, they all on the lru list.
> 
> There is a difference between removed file and closed file. The type field of
> d_flags will be empty for a removed file which indicate a negative dentry.
> Anything else is a positive dentry. Look at the inline function d_is_negative()
> [d_is_miss()] and you will see how it is done.

After the file was removed, the dentry flag was not MISS, the flag was:
DCACHE_REFERENCED | DCACHE_RCUACCESS | DCACHE_LRU_LIST | DCACHE_REGULAR_TYPE
So, the dentry never be freed, until the kernel reclaim the slab memory.

Regards,
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-18  9:59           ` Wangkai (Kevin,C)
@ 2017-08-18 14:10             ` Waiman Long
  2017-08-21  3:23               ` Wangkai (Kevin,C)
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-08-18 14:10 UTC (permalink / raw)
  To: Wangkai (Kevin,C), Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley

On 08/18/2017 05:59 AM, Wangkai (Kevin,C) wrote:
>
>>> In my patch the DCACHE_FILE_REMOVED flag was to distinguish the
>>> removed file and The closed file, I found there was no difference of a
>>> dentry between the removed file and the closed File, they all on the lru list.
>> There is a difference between removed file and closed file. The type field of
>> d_flags will be empty for a removed file which indicate a negative dentry.
>> Anything else is a positive dentry. Look at the inline function d_is_negative()
>> [d_is_miss()] and you will see how it is done.
> After the file was removed, the dentry flag was not MISS, the flag was:
> DCACHE_REFERENCED | DCACHE_RCUACCESS | DCACHE_LRU_LIST | DCACHE_REGULAR_TYPE
> So, the dentry never be freed, until the kernel reclaim the slab memory.

The dentry_unlink_inode() function will clear DCACHE_REGULAR_TYPE.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-18 14:10             ` Waiman Long
@ 2017-08-21  3:23               ` Wangkai (Kevin,C)
  2017-08-21 13:34                 ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Wangkai (Kevin,C) @ 2017-08-21  3:23 UTC (permalink / raw)
  To: Waiman Long, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley



> -----Original Message-----
> From: Waiman Long [mailto:longman@redhat.com]
> Sent: Friday, August 18, 2017 10:10 PM
> To: Wangkai (Kevin,C); Alexander Viro; Jonathan Corbet
> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
> linux-fsdevel@vger.kernel.org; Paul E. McKenney; Andrew Morton; Ingo Molnar;
> Miklos Szeredi; Matthew Wilcox; Larry Woodman; James Bottomley
> Subject: Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
> 
> On 08/18/2017 05:59 AM, Wangkai (Kevin,C) wrote:
> >
> >>> In my patch the DCACHE_FILE_REMOVED flag was to distinguish the
> >>> removed file and The closed file, I found there was no difference of
> >>> a dentry between the removed file and the closed File, they all on the lru
> list.
> >> There is a difference between removed file and closed file. The type
> >> field of d_flags will be empty for a removed file which indicate a negative
> dentry.
> >> Anything else is a positive dentry. Look at the inline function
> >> d_is_negative() [d_is_miss()] and you will see how it is done.
> > After the file was removed, the dentry flag was not MISS, the flag was:
> > DCACHE_REFERENCED | DCACHE_RCUACCESS | DCACHE_LRU_LIST |
> > DCACHE_REGULAR_TYPE So, the dentry never be freed, until the kernel
> reclaim the slab memory.
> 
> The dentry_unlink_inode() function will clear DCACHE_REGULAR_TYPE.
> 

Yes, I have add some trace info for the dentry state changed, with dentry flag and reference count:

File create:
[   42.636675] dentry [xxxx_1234] 0xffff880230be8180 flag 0x0 ref 1 ev dentry alloc
File close:
[   42.637421] dentry [xxxx_1234] 0xffff880230be8180 flag 0x4800c0 ref 0 ev dput called

Unlink lookup:
[  244.658086] dentry [xxxx_1234] 0xffff880230be8180 flag 0x4800c0 ref 1 ev d_lookup
Unlink d_delete:
[  244.658254] dentry [xxxx_1234] 0xffff880230be8180 flag 0x800c0 ref 1 ev d_lockref ref 1
Unlink dput:
[  244.658438] dentry [xxxx_1234] 0xffff880230be8180 flag 0x800c0 ref 0 ev dput called

The end, dentry's flag stay at 0x800c0, but this dentry was not freed, keeped by the dcache as unused,
After tens of thousands of the dentries slow down the dentry lookup performance, kernel memory usage
Keep high.

Regards,
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-21  3:23               ` Wangkai (Kevin,C)
@ 2017-08-21 13:34                 ` Waiman Long
  2017-08-22  2:59                   ` Wangkai (Kevin,C)
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-08-21 13:34 UTC (permalink / raw)
  To: Wangkai (Kevin,C), Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley

On 08/20/2017 11:23 PM, Wangkai (Kevin,C) wrote:
>
> Yes, I have add some trace info for the dentry state changed, with dentry flag and reference count:
>
> File create:
> [   42.636675] dentry [xxxx_1234] 0xffff880230be8180 flag 0x0 ref 1 ev dentry alloc
> File close:
> [   42.637421] dentry [xxxx_1234] 0xffff880230be8180 flag 0x4800c0 ref 0 ev dput called
>
> Unlink lookup:
> [  244.658086] dentry [xxxx_1234] 0xffff880230be8180 flag 0x4800c0 ref 1 ev d_lookup
> Unlink d_delete:
> [  244.658254] dentry [xxxx_1234] 0xffff880230be8180 flag 0x800c0 ref 1 ev d_lockref ref 1
> Unlink dput:
> [  244.658438] dentry [xxxx_1234] 0xffff880230be8180 flag 0x800c0 ref 0 ev dput called
>
> The end, dentry's flag stay at 0x800c0, but this dentry was not freed, keeped by the dcache as unused,
> After tens of thousands of the dentries slow down the dentry lookup performance, kernel memory usage
> Keep high.
>
> Regards,
> Kevin

That is expected. The kernel does not get rid of negative dentries until
the shrinker is called because of memory pressure. Negative dentries do
help to improve file lookup performance. However, too much of negative
dentries suppress the amount of free memory available for other use.
That is why I send out my patch to limit the number of negative dentries
outstanding.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-21 13:34                 ` Waiman Long
@ 2017-08-22  2:59                   ` Wangkai (Kevin,C)
  0 siblings, 0 replies; 18+ messages in thread
From: Wangkai (Kevin,C) @ 2017-08-22  2:59 UTC (permalink / raw)
  To: Waiman Long, Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley



> -----Original Message-----
> From: Waiman Long [mailto:longman@redhat.com]
> Sent: Monday, August 21, 2017 9:35 PM
> To: Wangkai (Kevin,C); Alexander Viro; Jonathan Corbet
> Cc: linux-kernel@vger.kernel.org; linux-doc@vger.kernel.org;
> linux-fsdevel@vger.kernel.org; Paul E. McKenney; Andrew Morton; Ingo Molnar;
> Miklos Szeredi; Matthew Wilcox; Larry Woodman; James Bottomley
> Subject: Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
> 
> On 08/20/2017 11:23 PM, Wangkai (Kevin,C) wrote:
> >
> > Yes, I have add some trace info for the dentry state changed, with dentry flag
> and reference count:
> >
> > File create:
> > [   42.636675] dentry [xxxx_1234] 0xffff880230be8180 flag 0x0 ref 1 ev
> dentry alloc
> > File close:
> > [   42.637421] dentry [xxxx_1234] 0xffff880230be8180 flag 0x4800c0 ref 0
> ev dput called
> >
> > Unlink lookup:
> > [  244.658086] dentry [xxxx_1234] 0xffff880230be8180 flag 0x4800c0 ref
> > 1 ev d_lookup Unlink d_delete:
> > [  244.658254] dentry [xxxx_1234] 0xffff880230be8180 flag 0x800c0 ref
> > 1 ev d_lockref ref 1 Unlink dput:
> > [  244.658438] dentry [xxxx_1234] 0xffff880230be8180 flag 0x800c0 ref
> > 0 ev dput called
> >
> > The end, dentry's flag stay at 0x800c0, but this dentry was not freed,
> > keeped by the dcache as unused, After tens of thousands of the
> > dentries slow down the dentry lookup performance, kernel memory usage
> Keep high.
> >
> > Regards,
> > Kevin
> 
> That is expected. The kernel does not get rid of negative dentries until the
> shrinker is called because of memory pressure. Negative dentries do help to
> improve file lookup performance. However, too much of negative dentries
> suppress the amount of free memory available for other use.
> That is why I send out my patch to limit the number of negative dentries
> outstanding.
> 
I think there are two issue:
1. when a file was removed, the dentry should be deleted, this is as a bug, if the
Dentry memory cannot be reclaimed, there is a memory leak.

2. limit the dentries number can improve when there were lots of files operations,
And all the files were valid.

Regards,
Kevin 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (5 preceding siblings ...)
  2017-08-15 17:15 ` [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
@ 2017-08-28 17:58 ` Waiman Long
  2017-08-28 18:59   ` Waiman Long
  6 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-08-28 17:58 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C)

On 07/28/2017 02:34 PM, Waiman Long wrote:
>  v2->v3:
>   - Add a faster pruning rate when the free pool is closed to depletion.
>   - As suggested by James Bottomley, add an artificial delay waiting
>     loop before killing a negative dentry and properly clear the
>     DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
>   - Add a new patch to track number of negative dentries that are
>     forcifully killed.
>
>  v1->v2:
>   - Move the new nr_negative field to the end of dentry_stat_t structure
>     as suggested by Matthew Wilcox.
>   - With the help of Miklos Szeredi, fix incorrect locking order in
>     dentry_kill() by using lock_parent() instead of locking the parent's
>     d_lock directly.
>   - Correctly account for positive to negative dentry transitions.
>   - Automatic pruning of negative dentries will now ignore the reference
>     bit in negative dentries but not the regular shrinking.
>
> A rogue application can potentially create a large number of negative
> dentries in the system consuming most of the memory available. This
> can impact performance of other applications running on the system.
>
> This patchset introduces changes to the dcache subsystem to limit
> the number of negative dentries allowed to be created thus limiting
> the amount of memory that can be consumed by negative dentries.
>
> Patch 1 tracks the number of negative dentries used and disallow
> the creation of more when the limit is reached.
>
> Patch 2 enables /proc/sys/fs/dentry-state to report the number of
> negative dentries in the system.
>
> Patch 3 enables automatic pruning of negative dentries when it is
> close to the limit so that we won't end up killing recently used
> negative dentries.
>
> Patch 4 prevents racing between negative dentry pruning and umount
> operation.
>
> Patch 5 shows the number of forced negative dentry killings in
> /proc/sys/fs/dentry-state. End users can then tune the neg_dentry_pc=
> kernel boot parameter if they want to reduce forced negative dentry
> killings.
>
> Waiman Long (5):
>   fs/dcache: Limit numbers of negative dentries
>   fs/dcache: Report negative dentry number in dentry-state
>   fs/dcache: Enable automatic pruning of negative dentries
>   fs/dcache: Protect negative dentry pruning from racing with umount
>   fs/dcache: Track count of negative dentries forcibly killed
>
>  Documentation/admin-guide/kernel-parameters.txt |   7 +
>  fs/dcache.c                                     | 451 ++++++++++++++++++++++--
>  include/linux/dcache.h                          |   8 +-
>  include/linux/list_lru.h                        |   1 +
>  mm/list_lru.c                                   |   4 +-
>  5 files changed, 435 insertions(+), 36 deletions(-)
>
With a 4.13 based kernel, the positive & negative dentries lookup rates
(lookups per second) after initial boot on a 32GB memory VM with and
without the patch were as follows:

  Metric                    w/o patch        with patch
  ------                    ---------        ----------
  Positive dentry lookup      844881           842618
  Negative dentry lookup     1865158          1901875
  Negative dentry creation    609887           617215

The last row refers to the creation rate of 10 millions negative
dentries with the negative dentry limit set to 50% (> 80M dentries).
Ignoring some inherent noise in the test results, there wasn't any
noticeable difference in term of lookup and negative dentry creation
performance with or without this patch.

If the limit was set to 5% (the default), the 10M negative dentry
creation rate dropped to 199565 and the dentry-state was:

2344754 2326486 45      0       2316533 7494261

This was expected as negative dentry creation throttling with forced
dentry deletion happened in this case.

IOW, this patch does not cause any regression in term of lookup and
negative dentry creation performance as long as the limit hasn't been
reached.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/5] fs/dcache: Limit # of negative dentries
  2017-08-28 17:58 ` Waiman Long
@ 2017-08-28 18:59   ` Waiman Long
  0 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-08-28 18:59 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C)

On 08/28/2017 01:58 PM, Waiman Long wrote:
> On 07/28/2017 02:34 PM, Waiman Long wrote:
>>  v2->v3:
>>   - Add a faster pruning rate when the free pool is closed to depletion.
>>   - As suggested by James Bottomley, add an artificial delay waiting
>>     loop before killing a negative dentry and properly clear the
>>     DCACHE_KILL_NEGATIVE flag if killing doesn't happen.
>>   - Add a new patch to track number of negative dentries that are
>>     forcifully killed.
>>
>>  v1->v2:
>>   - Move the new nr_negative field to the end of dentry_stat_t structure
>>     as suggested by Matthew Wilcox.
>>   - With the help of Miklos Szeredi, fix incorrect locking order in
>>     dentry_kill() by using lock_parent() instead of locking the parent's
>>     d_lock directly.
>>   - Correctly account for positive to negative dentry transitions.
>>   - Automatic pruning of negative dentries will now ignore the reference
>>     bit in negative dentries but not the regular shrinking.
>>
>> A rogue application can potentially create a large number of negative
>> dentries in the system consuming most of the memory available. This
>> can impact performance of other applications running on the system.
>>
>> This patchset introduces changes to the dcache subsystem to limit
>> the number of negative dentries allowed to be created thus limiting
>> the amount of memory that can be consumed by negative dentries.
>>
>> Patch 1 tracks the number of negative dentries used and disallow
>> the creation of more when the limit is reached.
>>
>> Patch 2 enables /proc/sys/fs/dentry-state to report the number of
>> negative dentries in the system.
>>
>> Patch 3 enables automatic pruning of negative dentries when it is
>> close to the limit so that we won't end up killing recently used
>> negative dentries.
>>
>> Patch 4 prevents racing between negative dentry pruning and umount
>> operation.
>>
>> Patch 5 shows the number of forced negative dentry killings in
>> /proc/sys/fs/dentry-state. End users can then tune the neg_dentry_pc=
>> kernel boot parameter if they want to reduce forced negative dentry
>> killings.
>>
>> Waiman Long (5):
>>   fs/dcache: Limit numbers of negative dentries
>>   fs/dcache: Report negative dentry number in dentry-state
>>   fs/dcache: Enable automatic pruning of negative dentries
>>   fs/dcache: Protect negative dentry pruning from racing with umount
>>   fs/dcache: Track count of negative dentries forcibly killed
>>
>>  Documentation/admin-guide/kernel-parameters.txt |   7 +
>>  fs/dcache.c                                     | 451 ++++++++++++++++++++++--
>>  include/linux/dcache.h                          |   8 +-
>>  include/linux/list_lru.h                        |   1 +
>>  mm/list_lru.c                                   |   4 +-
>>  5 files changed, 435 insertions(+), 36 deletions(-)
>>
> With a 4.13 based kernel, the positive & negative dentries lookup rates
> (lookups per second) after initial boot on a 32GB memory VM with and
> without the patch were as follows:
>
>   Metric                    w/o patch        with patch
>   ------                    ---------        ----------
>   Positive dentry lookup      844881           842618
>   Negative dentry lookup     1865158          1901875
>   Negative dentry creation    609887           617215
>
> The last row refers to the creation rate of 10 millions negative
> dentries with the negative dentry limit set to 50% (> 80M dentries).
> Ignoring some inherent noise in the test results, there wasn't any
> noticeable difference in term of lookup and negative dentry creation
> performance with or without this patch.
>
> If the limit was set to 5% (the default), the 10M negative dentry
> creation rate dropped to 199565 and the dentry-state was:
>
> 2344754 2326486 45      0       2316533 7494261
>
> This was expected as negative dentry creation throttling with forced
> dentry deletion happened in this case.
>
> IOW, this patch does not cause any regression in term of lookup and
> negative dentry creation performance as long as the limit hasn't been
> reached.

Another performance data point about running the AIM7 highsystime
workload on a 36-core 32G VM is as follows:

Running the AIM7 high-systime workload on the VM, the baseline
performance was 186770 jobs/min. By running a single-thread rogue
negative dentry creation program in the background until the patched
kernel with 5% limit started throttling, the performance was 183746
jobs/min. On an unpatched kernel with memory almost exhausted and
memory shrinker was kicked in, the performance was 148997 jobs/min.

So the patch does protect the system from suffering significant
performance degradation in case a negative dentry creation rogue
program is runninig in the background.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-08-28 18:59 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-28 18:34 [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
2017-07-28 18:34 ` [PATCH v3 1/5] fs/dcache: Limit numbers " Waiman Long
2017-07-28 18:34 ` [PATCH v3 2/5] fs/dcache: Report negative dentry number in dentry-state Waiman Long
2017-07-28 18:34 ` [PATCH v3 3/5] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
2017-07-28 18:34 ` [PATCH v3 4/5] fs/dcache: Protect negative dentry pruning from racing with umount Waiman Long
2017-07-28 18:34 ` [PATCH v3 5/5] fs/dcache: Track count of negative dentries forcibly killed Waiman Long
2017-08-15 17:15 ` [PATCH v3 0/5] fs/dcache: Limit # of negative dentries Waiman Long
2017-08-16 10:33   ` Wangkai (Kevin,C)
2017-08-16 13:29     ` Waiman Long
2017-08-17  4:00       ` Wangkai (Kevin,C)
2017-08-17 13:04         ` Waiman Long
2017-08-18  9:59           ` Wangkai (Kevin,C)
2017-08-18 14:10             ` Waiman Long
2017-08-21  3:23               ` Wangkai (Kevin,C)
2017-08-21 13:34                 ` Waiman Long
2017-08-22  2:59                   ` Wangkai (Kevin,C)
2017-08-28 17:58 ` Waiman Long
2017-08-28 18:59   ` Waiman Long

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.