* [PATCH 0/4] fs/dcache: Limit # of negative dentries
@ 2017-07-17 13:39 Waiman Long
  2017-07-17 13:39 ` [PATCH 1/4] fs/dcache: Limit numbers " Waiman Long
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-17 13:39 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Waiman Long

A rogue application can potentially create a large number of negative
dentries in the system, consuming most of the available memory. This
can degrade the performance of other applications running on the system.

This patchset introduces changes to the dcache subsystem to limit
the number of negative dentries that can be created, thus capping the
amount of memory that negative dentries can consume.

Patch 1 tracks the number of negative dentries in use and disallows
the creation of more once the limit is reached.

Patch 2 enables /proc/sys/fs/dentry-state to report the number of
negative dentries in the system.

Patch 3 enables automatic pruning of negative dentries when their
count gets close to the limit.

Patch 4 prevents races between negative dentry pruning and the umount
operation.
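
For reference, a rogue workload of the kind described above can be as
simple as repeatedly looking up names that do not exist: every failed
lookup of a distinct name leaves a negative dentry behind in the dcache.
A minimal user-space sketch (hypothetical path names, for illustration
only):

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
        char name[64];
        struct stat st;
        unsigned long i;

        /* Each lookup of a distinct missing name adds one negative dentry. */
        for (i = 0; i < 100000000UL; i++) {
                snprintf(name, sizeof(name), "/tmp/no-such-file-%lu", i);
                stat(name, &st);        /* fails with ENOENT */
        }
        return 0;
}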

Waiman Long (4):
  fs/dcache: Limit numbers of negative dentries
  fs/dcache: Report negative dentry number in dentry-state
  fs/dcache: Enable automatic pruning of negative dentries
  fs/dcache: Protect negative dentry pruning from racing with umount

 Documentation/admin-guide/kernel-parameters.txt |   7 +
 fs/dcache.c                                     | 312 +++++++++++++++++++++++-
 include/linux/dcache.h                          |   8 +-
 include/linux/list_lru.h                        |   1 +
 mm/list_lru.c                                   |   4 +-
 5 files changed, 327 insertions(+), 5 deletions(-)

-- 
1.8.3.1


* [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-17 13:39 [PATCH 0/4] fs/dcache: Limit # of negative dentries Waiman Long
@ 2017-07-17 13:39 ` Waiman Long
  2017-07-17 17:49   ` Matthew Wilcox
                     ` (2 more replies)
  2017-07-17 13:39 ` [PATCH 2/4] fs/dcache: Report negative dentry number in dentry-state Waiman Long
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-17 13:39 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Waiman Long

The number of positive dentries is limited by the number of files
in the filesystems. The number of negative dentries, however,
has no limit other than the total amount of memory available in
the system. So a rogue application that generates a lot of negative
dentries can potentially exhaust most of the memory available in the
system, impacting the performance of other running applications.

To prevent this from happening, the dcache code is now updated to limit
the number of negative dentries that can be kept in the LRU lists,
expressed as a percentage of total system memory. The default is 5%
and can be changed with the "neg_dentry_pc=" kernel command line
option.
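
As a rough illustration of what the default sizing works out to (the
numbers below are hypothetical and assume 4 KB pages, a struct dentry of
roughly 192 bytes, 16 GB of RAM and 8 possible CPUs; the exact figures
depend on the kernel configuration):

  dentries per page        = 4096 / 192 - 1          ~=   20
  total dentry capacity    = 4,194,304 pages * 20    ~=   84M dentries
  negative dentry budget   = 5% of 84M               ~=  4.2M dentries
  global free pool (20%)   = 84M * 5 / 500           ~= 0.84M dentries
  per-cpu limit (80% / 8)  = 0.84M * 4 / 8           ~= 0.42M dentries/cpu

That is, on such a machine roughly 4.2 million negative dentries (about
0.8 GB of dentry memory, i.e. 5% of RAM) can exist before the limit
kicks in.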

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   7 ++
 fs/dcache.c                                     | 154 +++++++++++++++++++++++-
 include/linux/dcache.h                          |   1 +
 3 files changed, 161 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f701430..fc3c937 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2372,6 +2372,13 @@
 
 	n2=		[NET] SDL Inc. RISCom/N2 synchronous serial card
 
+	neg_dentry_pc=	[KNL]
+			Range: 1-50
+			Default: 5
+			This parameter specifies the amount of negative
+			dentries allowed in the system as a percentage of
+			total system memory.
+
 	netdev=		[NET] Network devices parameters
 			Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
 			Note that mem_start is often overloaded to mean
diff --git a/fs/dcache.c b/fs/dcache.c
index f901413..6a0a844 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -130,8 +130,19 @@ struct dentry_stat_t dentry_stat = {
 	.age_limit = 45,
 };
 
+/*
+ * Macros and variables to manage and count negative dentries.
+ */
+#define NEG_DENTRY_BATCH	(1 << 8)
+static long neg_dentry_percpu_limit __read_mostly;
+static struct {
+	raw_spinlock_t nfree_lock;
+	long nfree;			/* Negative dentry free pool */
+} ndblk ____cacheline_aligned_in_smp;
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -227,6 +238,86 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
+/*
+ * There is a system-wide limit to the amount of negative dentries allowed
+ * in the super blocks' LRU lists. The default limit is 5% of the total
+ * system memory. This limit can be changed by using the kernel command line
+ * option "neg_dentry_pc=" to specify the percentage of the total memory
+ * that can be used for negative dentries. That percentage must be in the
+ * 1-50% range.
+ *
+ * To avoid performance problem with a global counter on an SMP system,
+ * the tracking is done mostly on a per-cpu basis. The total limit is
+ * distributed in a 80/20 ratio to per-cpu counters and a global free pool.
+ *
+ * If a per-cpu counter runs out of negative dentries, it can borrow extra
+ * ones from the global free pool. If it has more than its percpu limit,
+ * the extra ones will be returned back to the global pool.
+ */
+
+/*
+ * Decrement negative dentry count if applicable.
+ */
+static void __neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(this_cpu_dec_return(nr_dentry_neg) < 0)) {
+		long *pcnt = get_cpu_ptr(&nr_dentry_neg);
+
+		if ((*pcnt < 0) && raw_spin_trylock(&ndblk.nfree_lock)) {
+			ACCESS_ONCE(ndblk.nfree) += NEG_DENTRY_BATCH;
+			*pcnt += NEG_DENTRY_BATCH;
+			raw_spin_unlock(&ndblk.nfree_lock);
+		}
+		put_cpu_ptr(&nr_dentry_neg);
+	}
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_dec(dentry);
+}
+
+/*
+ * Increment negative dentry count if applicable.
+ */
+static void __neg_dentry_inc(struct dentry *dentry)
+{
+	long cnt, *pcnt;
+
+	if (this_cpu_inc_return(nr_dentry_neg) <= neg_dentry_percpu_limit)
+		return;
+
+	pcnt = get_cpu_ptr(&nr_dentry_neg);
+	cnt  = (READ_ONCE(ndblk.nfree) &&
+	       (*pcnt > neg_dentry_percpu_limit)) ? NEG_DENTRY_BATCH : 0;
+
+	if (cnt && raw_spin_trylock(&ndblk.nfree_lock)) {
+		long val = READ_ONCE(ndblk.nfree);
+
+		if (val < cnt)
+			cnt = val;
+		ACCESS_ONCE(ndblk.nfree) -= cnt;
+		*pcnt -= cnt;
+		raw_spin_unlock(&ndblk.nfree_lock);
+	} else {
+		cnt = 0;
+	}
+	put_cpu_ptr(&nr_dentry_neg);
+	/*
+	 * If there are too many negative dentries, set DCACHE_KILL_NEGATIVE
+	 * flag to indicate that the dentry should be killed.
+	 */
+	if (!cnt)
+		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -396,6 +487,7 @@ static void d_lru_add(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -404,6 +496,7 @@ static void d_lru_del(struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_dec(dentry);
 }
 
 static void d_shrink_del(struct dentry *dentry)
@@ -434,6 +527,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	list_lru_isolate(lru, &dentry->d_lru);
+	neg_dentry_dec(dentry);
 }
 
 static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
@@ -442,6 +536,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
 	D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
 	dentry->d_flags |= DCACHE_SHRINK_LIST;
 	list_lru_isolate_move(lru, &dentry->d_lru, list);
+	neg_dentry_dec(dentry);
 }
 
 /*
@@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 
 	if (!IS_ROOT(dentry)) {
 		parent = dentry->d_parent;
-		if (unlikely(!spin_trylock(&parent->d_lock))) {
+		/*
+		 * Force the killing of this negative dentry when
+		 * DCACHE_KILL_NEGATIVE flag is set.
+		 */
+		if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
+			spin_lock(&parent->d_lock);
+		} else if (unlikely(!spin_trylock(&parent->d_lock))) {
 			if (inode)
 				spin_unlock(&inode->i_lock);
 			goto failed;
@@ -815,6 +916,9 @@ void dput(struct dentry *dentry)
 
 	dentry_lru_add(dentry);
 
+	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
+		goto kill_it;
+
 	dentry->d_lockref.count--;
 	spin_unlock(&dentry->d_lock);
 	return;
@@ -1820,6 +1924,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));
 
 	spin_lock(&dentry->d_lock);
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		neg_dentry_dec(dentry);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
@@ -3566,6 +3675,47 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+static long neg_dentry_pc __initdata = 5;
+static bool neg_dentry_warn __initdata;
+static int __init set_neg_dentry_pc(char *str)
+{
+	ssize_t ret;
+	long new_pc = neg_dentry_pc;
+
+	if (!str)
+		return 0;
+	ret = kstrtol(str, 0, &new_pc);
+	if (ret || (new_pc < 1) || (new_pc > 50))
+		ret = 1;
+	else
+		neg_dentry_pc = new_pc;
+	if (ret)
+		neg_dentry_warn = true;
+	return ret ? 0 : 1;
+}
+__setup("neg_dentry_pc=", set_neg_dentry_pc);
+
+static void __init neg_dentry_init(void)
+{
+	/* Rough estimate of # of dentries allocated per page */
+	unsigned int nr_dentry_page = PAGE_SIZE/sizeof(struct dentry) - 1;
+	unsigned long cnt;
+
+	raw_spin_lock_init(&ndblk.nfree_lock);
+
+	/* 20% in global pool & 80% in percpu free */
+	ndblk.nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
+	cnt = ndblk.nfree * 4 / num_possible_cpus();
+	if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
+		cnt = 2 * NEG_DENTRY_BATCH;
+	neg_dentry_percpu_limit = cnt;
+
+	if (neg_dentry_warn)
+		pr_warn("Warning: neg_dentry_pc must be within 1-50 range.\n");
+	pr_info("Negative dentry: percpu limit = %ld, free pool = %ld\n",
+		neg_dentry_percpu_limit, ndblk.nfree);
+}
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3606,6 +3756,8 @@ static void __init dcache_init(void)
 	dentry_cache = KMEM_CACHE(dentry,
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
 
+	neg_dentry_init();
+
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
 		return;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 3f3ff4c..498233b 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -218,6 +218,7 @@ struct dentry_operations {
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
+#define DCACHE_KILL_NEGATIVE		0x40000000 /* Kill negative dentry */
 
 extern seqlock_t rename_lock;
 
-- 
1.8.3.1


* [PATCH 2/4] fs/dcache: Report negative dentry number in dentry-state
  2017-07-17 13:39 [PATCH 0/4] fs/dcache: Limit # of negative dentries Waiman Long
  2017-07-17 13:39 ` [PATCH 1/4] fs/dcache: Limit numbers " Waiman Long
@ 2017-07-17 13:39 ` Waiman Long
  2017-07-17 14:09   ` Matthew Wilcox
  2017-07-17 13:39 ` [PATCH 3/4] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
  2017-07-17 13:39 ` [PATCH 4/4] fs/dcache: Protect negative dentry pruning from racing with umount Waiman Long
  3 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2017-07-17 13:39 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Waiman Long

The number of negative dentries currently in the system is now reported
in the /proc/sys/fs/dentry-state file.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c            | 16 +++++++++++++++-
 include/linux/dcache.h |  7 ++++---
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 6a0a844..bb0a519 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -135,6 +135,7 @@ struct dentry_stat_t dentry_stat = {
  */
 #define NEG_DENTRY_BATCH	(1 << 8)
 static long neg_dentry_percpu_limit __read_mostly;
+static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
 	long nfree;			/* Negative dentry free pool */
@@ -176,11 +177,23 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }
 
+static long get_nr_dentry_neg(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(nr_dentry_neg, i);
+	sum += neg_dentry_nfree_init - ndblk.nfree;
+	return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
 {
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
+	dentry_stat.nr_negative = get_nr_dentry_neg();
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -3704,7 +3717,8 @@ static void __init neg_dentry_init(void)
 	raw_spin_lock_init(&ndblk.nfree_lock);
 
 	/* 20% in global pool & 80% in percpu free */
-	ndblk.nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
+	ndblk.nfree = neg_dentry_nfree_init
+		    = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
 	cnt = ndblk.nfree * 4 / num_possible_cpus();
 	if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
 		cnt = 2 * NEG_DENTRY_BATCH;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 498233b..184669f 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -63,9 +63,10 @@ struct qstr {
 struct dentry_stat_t {
 	long nr_dentry;
 	long nr_unused;
-	long age_limit;          /* age in seconds */
-	long want_pages;         /* pages requested by system */
-	long dummy[2];
+	long nr_negative;	/* # of negative dentries */
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long dummy;
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1


* [PATCH 3/4] fs/dcache: Enable automatic pruning of negative dentries
  2017-07-17 13:39 [PATCH 0/4] fs/dcache: Limit # of negative dentries Waiman Long
  2017-07-17 13:39 ` [PATCH 1/4] fs/dcache: Limit numbers " Waiman Long
  2017-07-17 13:39 ` [PATCH 2/4] fs/dcache: Report negative dentry number in dentry-state Waiman Long
@ 2017-07-17 13:39 ` Waiman Long
  2017-07-17 13:39 ` [PATCH 4/4] fs/dcache: Protect negative dentry pruning from racing with umount Waiman Long
  3 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-17 13:39 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Waiman Long

Having a limit on the number of negative dentries has the undesirable
side effect that no new negative dentries will be allowed once the
limit is reached. This has performance implications for some types
of workloads.

So we need a way to prune negative dentries so that new ones can
be created. This is done by using the workqueue API to do the pruning
gradually once a threshold is reached, minimizing the performance impact
on other running tasks. The pruning is done at a frequency of 10 runs
per second. Each run scans at most 256 LRU dentries in each node LRU
list of the given superblock. Some non-negative dentries that happen
to be at the front of the LRU lists will also be pruned.
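
The scheduling behind this is the usual self-rearming delayed-work
pattern; a simplified sketch is shown below (this is not the actual
patch code, and scan_one_batch()/above_prune_threshold() are
hypothetical helpers standing in for the LRU walk and the free-pool
check):

#include <linux/jiffies.h>
#include <linux/workqueue.h>

#define PRUNE_DELAY	(HZ / 10)	/* 10 runs per second */
#define PRUNE_BATCH	256		/* LRU entries scanned per run */

/* Hypothetical helpers, not part of this patch. */
void scan_one_batch(int nr_to_scan);
bool above_prune_threshold(void);

static void prune_work_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(prune_work, prune_work_fn);

static void prune_work_fn(struct work_struct *work)
{
	/* Scan and free up to PRUNE_BATCH dentries from the LRU lists. */
	scan_one_batch(PRUNE_BATCH);

	/* Re-arm only while there is still too much to prune. */
	if (above_prune_threshold())
		schedule_delayed_work(&prune_work, PRUNE_DELAY);
}

/* Kicked from the allocation path when the free pool runs low. */
static void maybe_start_pruning(void)
{
	if (above_prune_threshold())
		schedule_delayed_work(&prune_work, PRUNE_DELAY);
}

At 10 runs per second and 256 entries per run, the worker touches at
most about 2,560 LRU entries per second per node list, which bounds how
intrusive the background pruning can be.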

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c              | 109 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/list_lru.h |   1 +
 mm/list_lru.c            |   4 +-
 3 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index bb0a519..6c7d86f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -134,13 +134,19 @@ struct dentry_stat_t dentry_stat = {
  * Macros and variables to manage and count negative dentries.
  */
 #define NEG_DENTRY_BATCH	(1 << 8)
+#define NEG_PRUNING_DELAY	(HZ/10)
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
 	long nfree;			/* Negative dentry free pool */
+	struct super_block *prune_sb;	/* Super_block for pruning */
+	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_neg);
@@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	 */
 	if (!cnt)
 		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+
+	/*
+	 * Initiate negative dentry pruning if free pool has less than
+	 * 1/4 of its initial value.
+	 */
+	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
+		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
+		schedule_delayed_work(&prune_neg_dentry_work,
+				      NEG_PRUNING_DELAY);
+	}
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -1291,6 +1307,99 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+/*
+ * A modified version that attempts to remove a limited number of negative
+ * dentries as well as some other non-negative dentries at the front.
+ */
+static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
+		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *freeable = arg;
+	struct dentry	*dentry = container_of(item, struct dentry, d_lru);
+	enum lru_status	status = LRU_SKIP;
+
+	/*
+	 * Stop further list walking for the current node list to limit
+	 * performance impact, but allow further walking in the next node
+	 * list.
+	 */
+	if ((ndblk.neg_count >= NEG_DENTRY_BATCH) ||
+	    (ndblk.prune_count >= NEG_DENTRY_BATCH)) {
+		ndblk.prune_count = 0;
+		return LRU_STOP;
+	}
+
+	/*
+	 * we are inverting the lru lock/dentry->d_lock here,
+	 * so use a trylock. If we fail to get the lock, just skip
+	 * it
+	 */
+	if (!spin_trylock(&dentry->d_lock)) {
+		ndblk.prune_count++;
+		return LRU_SKIP;
+	}
+
+	/*
+	 * Referenced dentries are still in use. If they have active
+	 * counts, just remove them from the LRU. Otherwise give them
+	 * another pass through the LRU.
+	 */
+	if (dentry->d_lockref.count) {
+		d_lru_isolate(lru, dentry);
+		status = LRU_REMOVED;
+		goto out;
+	}
+
+	if (dentry->d_flags & DCACHE_REFERENCED) {
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		status = LRU_ROTATE;
+		goto out;
+	}
+
+	status = LRU_REMOVED;
+	d_lru_shrink_move(lru, dentry, freeable);
+	if (d_is_negative(dentry))
+		ndblk.neg_count++;
+out:
+	spin_unlock(&dentry->d_lock);
+	ndblk.prune_count++;
+	return status;
+}
+
+/*
+ * A workqueue function to prune negative dentry.
+ *
+ * The pruning is done gradually over time so as not to have noticeable
+ * performance impact.
+ */
+static void prune_negative_dentry(struct work_struct *work)
+{
+	int freed;
+	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	LIST_HEAD(dispose);
+
+	if (!sb)
+		return;
+
+	ndblk.neg_count = ndblk.prune_count = 0;
+	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
+			      &dispose, NEG_DENTRY_BATCH);
+
+	if (freed)
+		shrink_dentry_list(&dispose);
+	/*
+	 * Continue delayed pruning once every second until negative dentry
+	 * free pool is at least 1/2 of the initial value or the super_block
+	 * has no more negative dentries left at the front.
+	 */
+	if (ndblk.neg_count &&
+	   (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/2))
+		schedule_delayed_work(&prune_neg_dentry_work,
+				      NEG_PRUNING_DELAY);
+	else
+		WRITE_ONCE(ndblk.prune_sb, NULL);
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index fa7fd03..06c9d15 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -22,6 +22,7 @@ enum lru_status {
 	LRU_SKIP,		/* item cannot be locked, skip */
 	LRU_RETRY,		/* item not freeable. May drop the lock
 				   internally, but has to return locked. */
+	LRU_STOP,		/* stop walking the list */
 };
 
 struct list_lru_one {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 7a40fa2..f6e7796 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -244,11 +244,13 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			 */
 			assert_spin_locked(&nlru->lock);
 			goto restart;
+		case LRU_STOP:
+			goto out;
 		default:
 			BUG();
 		}
 	}
-
+out:
 	spin_unlock(&nlru->lock);
 	return isolated;
 }
-- 
1.8.3.1


* [PATCH 4/4] fs/dcache: Protect negative dentry pruning from racing with umount
  2017-07-17 13:39 [PATCH 0/4] fs/dcache: Limit # of negative dentries Waiman Long
                   ` (2 preceding siblings ...)
  2017-07-17 13:39 ` [PATCH 3/4] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
@ 2017-07-17 13:39 ` Waiman Long
  3 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-17 13:39 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-doc, linux-fsdevel, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Miklos Szeredi, Waiman Long

Negative dentry pruning is done on a specific super_block recorded
in the ndblk.prune_sb variable. If that super_block is being
unmounted concurrently, its contents may no longer be valid.

To protect against such a race condition, a new lock is added to
the ndblk structure to synchronize negative dentry pruning with the
umount operation.
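
A simplified view of the ordering this establishes (a sketch of the
scheme, not the literal patch code):

  pruning worker                        umount path
  --------------                        -----------
  raw_spin_lock(&ndblk.prune_lock)      WRITE_ONCE(ndblk.prune_sb, NULL)
  sb = READ_ONCE(ndblk.prune_sb)        raw_spin_unlock_wait(&ndblk.prune_lock)
  if (!sb) unlock and return            ... tear down the super_block ...
  walk sb->s_dentry_lru, prune
  raw_spin_unlock(&ndblk.prune_lock)

The umount side first clears prune_sb so that no new pruning pass can
pick the super_block up, and then waits for any pass that is already
holding prune_lock to finish before the super_block is torn down.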

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 6c7d86f..babfa05 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -139,11 +139,13 @@ struct dentry_stat_t dentry_stat = {
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
+	raw_spinlock_t prune_lock;	/* Lock for protecting pruning */
 	long nfree;			/* Negative dentry free pool */
 	struct super_block *prune_sb;	/* Super_block for pruning */
 	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void clear_prune_sb_for_umount(struct super_block *sb);
 static void prune_negative_dentry(struct work_struct *work);
 static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
 
@@ -1294,6 +1296,7 @@ void shrink_dcache_sb(struct super_block *sb)
 {
 	long freed;
 
+	clear_prune_sb_for_umount(sb);
 	do {
 		LIST_HEAD(dispose);
 
@@ -1324,7 +1327,8 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
 	 * list.
 	 */
 	if ((ndblk.neg_count >= NEG_DENTRY_BATCH) ||
-	    (ndblk.prune_count >= NEG_DENTRY_BATCH)) {
+	    (ndblk.prune_count >= NEG_DENTRY_BATCH) ||
+	    !READ_ONCE(ndblk.prune_sb)) {
 		ndblk.prune_count = 0;
 		return LRU_STOP;
 	}
@@ -1375,15 +1379,24 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
 static void prune_negative_dentry(struct work_struct *work)
 {
 	int freed;
-	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	struct super_block *sb;
 	LIST_HEAD(dispose);
 
-	if (!sb)
+	/*
+	 * The prune_lock is used to protect negative dentry pruning from
+	 * racing with concurrent umount operation.
+	 */
+	raw_spin_lock(&ndblk.prune_lock);
+	sb = READ_ONCE(ndblk.prune_sb);
+	if (!sb) {
+		raw_spin_unlock(&ndblk.prune_lock);
 		return;
+	}
 
 	ndblk.neg_count = ndblk.prune_count = 0;
 	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
 			      &dispose, NEG_DENTRY_BATCH);
+	raw_spin_unlock(&ndblk.prune_lock);
 
 	if (freed)
 		shrink_dentry_list(&dispose);
@@ -1400,6 +1413,27 @@ static void prune_negative_dentry(struct work_struct *work)
 		WRITE_ONCE(ndblk.prune_sb, NULL);
 }
 
+/*
+ * This is called before an umount to clear ndblk.prune_sb if it
+ * matches the given super_block.
+ */
+static void clear_prune_sb_for_umount(struct super_block *sb)
+{
+	if (likely(READ_ONCE(ndblk.prune_sb) != sb))
+		return;
+	WRITE_ONCE(ndblk.prune_sb, NULL);
+	/*
+	 * Need to wait until an ongoing pruning operation, if present,
+	 * is completed.
+	 *
+	 * Clearing ndblk.prune_sb will hasten the completion of pruning.
+	 * In the unlikely event that ndblk.prune_sb is set to another
+	 * super_block, the waiting will last the complete pruning operation
+	 * which shouldn't be that long either.
+	 */
+	raw_spin_unlock_wait(&ndblk.prune_lock);
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
@@ -1722,6 +1756,7 @@ void shrink_dcache_for_umount(struct super_block *sb)
 
 	WARN(down_read_trylock(&sb->s_umount), "s_umount should've been locked");
 
+	clear_prune_sb_for_umount(sb);
 	dentry = sb->s_root;
 	sb->s_root = NULL;
 	do_one_tree(dentry);
-- 
1.8.3.1


* Re: [PATCH 2/4] fs/dcache: Report negative dentry number in dentry-state
  2017-07-17 13:39 ` [PATCH 2/4] fs/dcache: Report negative dentry number in dentry-state Waiman Long
@ 2017-07-17 14:09   ` Matthew Wilcox
  2017-07-17 14:39     ` Waiman Long
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2017-07-17 14:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-doc,
	linux-fsdevel, Paul E. McKenney, Andrew Morton, Ingo Molnar,
	Miklos Szeredi

On Mon, Jul 17, 2017 at 09:39:31AM -0400, Waiman Long wrote:
> @@ -63,9 +63,10 @@ struct qstr {
>  struct dentry_stat_t {
>  	long nr_dentry;
>  	long nr_unused;
> -	long age_limit;          /* age in seconds */
> -	long want_pages;         /* pages requested by system */
> -	long dummy[2];
> +	long nr_negative;	/* # of negative dentries */
> +	long age_limit;		/* age in seconds */
> +	long want_pages;	/* pages requested by system */
> +	long dummy;
>  };
>  extern struct dentry_stat_t dentry_stat;

You can't just insert a field in the middle like that.  It'll break any code
parsing /proc/sys/fs/dentry-state.  You have to put it at the end:

 	long age_limit;          /* age in seconds */
 	long want_pages;         /* pages requested by system */
-	long dummy[2];
+	long nr_negative;	/* # of negative dentries */
+	long dummy;
 };
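
For illustration, a user-space consumer of the kind that would break
might look like the sketch below (hypothetical code, not taken from any
real tool), written against the existing six-field layout:

#include <stdio.h>

int main(void)
{
	long nr_dentry, nr_unused, age_limit, want_pages, dummy1, dummy2;
	FILE *f = fopen("/proc/sys/fs/dentry-state", "r");

	if (!f)
		return 1;
	/* Assumes the historical field order of dentry-state. */
	if (fscanf(f, "%ld %ld %ld %ld %ld %ld",
		   &nr_dentry, &nr_unused, &age_limit,
		   &want_pages, &dummy1, &dummy2) == 6)
		printf("dentries: %ld, unused: %ld, age limit: %ld\n",
		       nr_dentry, nr_unused, age_limit);
	fclose(f);
	return 0;
}

With nr_negative inserted as the third field, this still parses six
longs without reporting any error, but silently reads the negative
dentry count as age_limit.  Appending the new field at the end keeps
the existing positions stable.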


* Re: [PATCH 2/4] fs/dcache: Report negative dentry number in dentry-state
  2017-07-17 14:09   ` Matthew Wilcox
@ 2017-07-17 14:39     ` Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-17 14:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-doc,
	linux-fsdevel, Paul E. McKenney, Andrew Morton, Ingo Molnar,
	Miklos Szeredi

On 07/17/2017 10:09 AM, Matthew Wilcox wrote:
> On Mon, Jul 17, 2017 at 09:39:31AM -0400, Waiman Long wrote:
>> @@ -63,9 +63,10 @@ struct qstr {
>>  struct dentry_stat_t {
>>  	long nr_dentry;
>>  	long nr_unused;
>> -	long age_limit;          /* age in seconds */
>> -	long want_pages;         /* pages requested by system */
>> -	long dummy[2];
>> +	long nr_negative;	/* # of negative dentries */
>> +	long age_limit;		/* age in seconds */
>> +	long want_pages;	/* pages requested by system */
>> +	long dummy;
>>  };
>>  extern struct dentry_stat_t dentry_stat;
> You can't just insert a field in the middle like that.  It'll break any code
> parsing /proc/sys/fs/dentry-state.  You have to put it at the end:
>
>  	long age_limit;          /* age in seconds */
>  	long want_pages;         /* pages requested by system */
> -	long dummy[2];
> +	long nr_negative;	/* # of negative dentries */
> +	long dummy;
>  };

My mistake. I will send out updated patches later on, after collecting
more feedback, and make the new field go at the end.

Cheers,
Longman


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-17 13:39 ` [PATCH 1/4] fs/dcache: Limit numbers " Waiman Long
@ 2017-07-17 17:49   ` Matthew Wilcox
  2017-07-17 18:31     ` Waiman Long
  2017-07-19 14:39   ` Miklos Szeredi
  2017-07-19 20:24   ` Miklos Szeredi
  2 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2017-07-17 17:49 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-doc,
	linux-fsdevel, Paul E. McKenney, Andrew Morton, Ingo Molnar,
	Miklos Szeredi

On Mon, Jul 17, 2017 at 09:39:30AM -0400, Waiman Long wrote:
> The number of positive dentries is limited by the number of files
> in the filesystems. The number of negative dentries, however,
> has no limit other than the total amount of memory available in
> the system. So a rogue application that generates a lot of negative
> dentries can potentially exhaust most of the memory available in the
> system impacting performance on other running applications.
> 
> To prevent this from happening, the dcache code is now updated to limit
> the amount of the negative dentries in the LRU lists that can be kept
> as a percentage of total available system memory. The default is 5%
> and can be changed by specifying the "neg_dentry_pc=" kernel command
> line option.

I see the problem, but rather than restricting the number of negative
dentries to be a fraction of the total amount of memory in the machine,
wouldn't it make more sense to limit the number of negative dentries to be
some multiple of the number of positive dentries currently in the system?

Or make negative dentries more easily prunable.  For example, we could
allocate them from a separate slab and use the existing reclaim mechanism
to just throw them away.  Since they can't be pinned by an inode, they're
much easier to get rid of than positive dentries.  Might make changing
a dentry from positive to negative or vice versa a bit more expensive ...


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-17 17:49   ` Matthew Wilcox
@ 2017-07-17 18:31     ` Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-17 18:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-doc,
	linux-fsdevel, Paul E. McKenney, Andrew Morton, Ingo Molnar,
	Miklos Szeredi

On 07/17/2017 01:49 PM, Matthew Wilcox wrote:
> On Mon, Jul 17, 2017 at 09:39:30AM -0400, Waiman Long wrote:
>> The number of positive dentries is limited by the number of files
>> in the filesystems. The number of negative dentries, however,
>> has no limit other than the total amount of memory available in
>> the system. So a rogue application that generates a lot of negative
>> dentries can potentially exhaust most of the memory available in the
>> system impacting performance on other running applications.
>>
>> To prevent this from happening, the dcache code is now updated to limit
>> the amount of the negative dentries in the LRU lists that can be kept
>> as a percentage of total available system memory. The default is 5%
>> and can be changed by specifying the "neg_dentry_pc=" kernel command
>> line option.
> I see the problem, but rather than restricting the number of negative
> dentries to be a fraction of the total amount of memory in the machine,
> wouldn't it make more sense to limit the number of negative dentries to be
> some multiple of the number of positive dentries currently in the system?

The number of positive dentries will be a rapidly changing number, so we
can't use a __read_mostly variable for the limits. That may have a certain
performance impact. I chose to use a fixed number because of simplicity
and performance. I can compromise on simplicity, but not on performance.
I am open to adjusting the free pool count in some way as long as
the performance impact is negligible.
 
> Or make negative dentries more easily prunable.  For example, we could
> allocate them from a separate slab and use the existing reclaim mechanism
> to just throw them away.  Since they can't be pinned by an inode, they're
> much easier to get rid of than positive dentries.  Might make changing
> a dentry from positive to negative or vice versa a bit more expensive ...

I don't quite understand what you mean by having two separate slabs. The
current reclaim mechanism works by scanning the LRU lists.

I had been thinking about having a separate LRU list for negative
dentries. Given the complexity of the current per-node/per-memcg LRU
lists, maintaining 2 separate LRU lists in each super_block may be
error-prone.

It is true that positive dentries will also be pruned in the process. By
the time automatic pruning happens, there should already be a lot of
negative dentries in the LRU lists. We can skip over positive dentries
during the scan, but then we either have to allow scanning more entries
in each pass, prolonging the interruption, or do no pruning at all if
the LRU lists are front-loaded with a bunch of positive dentries.

BTW, you remind me that I should have accounted for the
positive-to-negative dentry transitions, which are missing in the
current patch.

Cheers,
Longman


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-17 13:39 ` [PATCH 1/4] fs/dcache: Limit numbers " Waiman Long
  2017-07-17 17:49   ` Matthew Wilcox
@ 2017-07-19 14:39   ` Miklos Szeredi
  2017-07-19 15:02     ` Waiman Long
  2017-07-19 20:24   ` Miklos Szeredi
  2 siblings, 1 reply; 17+ messages in thread
From: Miklos Szeredi @ 2017-07-19 14:39 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Mon, Jul 17, 2017 at 3:39 PM, Waiman Long <longman@redhat.com> wrote:
> The number of positive dentries is limited by the number of files
> in the filesystems. The number of negative dentries, however,
> has no limit other than the total amount of memory available in
> the system. So a rogue application that generates a lot of negative
> dentries can potentially exhaust most of the memory available in the
> system impacting performance on other running applications.
>
> To prevent this from happening, the dcache code is now updated to limit
> the amount of the negative dentries in the LRU lists that can be kept
> as a percentage of total available system memory. The default is 5%
> and can be changed by specifying the "neg_dentry_pc=" kernel command
> line option.

AFAICS the implementation is counter to the concept of LRU since it
will get rid of the most recently used negative dentry after passing
the limit.  Which in itself is a source of DoS (keep rogue negative
dentries at just about the limit, so normal applications are prevented
from getting their negatives cached).

Thanks,
Miklos

>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |   7 ++
>  fs/dcache.c                                     | 154 +++++++++++++++++++++++-
>  include/linux/dcache.h                          |   1 +
>  3 files changed, 161 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f701430..fc3c937 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2372,6 +2372,13 @@
>
>         n2=             [NET] SDL Inc. RISCom/N2 synchronous serial card
>
> +       neg_dentry_pc=  [KNL]
> +                       Range: 1-50
> +                       Default: 5
> +                       This parameter specifies the amount of negative
> +                       dentries allowed in the system as a percentage of
> +                       total system memory.
> +
>         netdev=         [NET] Network devices parameters
>                         Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
>                         Note that mem_start is often overloaded to mean
> diff --git a/fs/dcache.c b/fs/dcache.c
> index f901413..6a0a844 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -130,8 +130,19 @@ struct dentry_stat_t dentry_stat = {
>         .age_limit = 45,
>  };
>
> +/*
> + * Macros and variables to manage and count negative dentries.
> + */
> +#define NEG_DENTRY_BATCH       (1 << 8)
> +static long neg_dentry_percpu_limit __read_mostly;
> +static struct {
> +       raw_spinlock_t nfree_lock;
> +       long nfree;                     /* Negative dentry free pool */
> +} ndblk ____cacheline_aligned_in_smp;
> +
>  static DEFINE_PER_CPU(long, nr_dentry);
>  static DEFINE_PER_CPU(long, nr_dentry_unused);
> +static DEFINE_PER_CPU(long, nr_dentry_neg);
>
>  #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
>
> @@ -227,6 +238,86 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
>
>  #endif
>
> +/*
> + * There is a system-wide limit to the amount of negative dentries allowed
> + * in the super blocks' LRU lists. The default limit is 5% of the total
> + * system memory. This limit can be changed by using the kernel command line
> + * option "neg_dentry_pc=" to specify the percentage of the total memory
> + * that can be used for negative dentries. That percentage must be in the
> + * 1-50% range.
> + *
> + * To avoid performance problem with a global counter on an SMP system,
> + * the tracking is done mostly on a per-cpu basis. The total limit is
> + * distributed in a 80/20 ratio to per-cpu counters and a global free pool.
> + *
> + * If a per-cpu counter runs out of negative dentries, it can borrow extra
> + * ones from the global free pool. If it has more than its percpu limit,
> + * the extra ones will be returned back to the global pool.
> + */
> +
> +/*
> + * Decrement negative dentry count if applicable.
> + */
> +static void __neg_dentry_dec(struct dentry *dentry)
> +{
> +       if (unlikely(this_cpu_dec_return(nr_dentry_neg) < 0)) {
> +               long *pcnt = get_cpu_ptr(&nr_dentry_neg);
> +
> +               if ((*pcnt < 0) && raw_spin_trylock(&ndblk.nfree_lock)) {
> +                       ACCESS_ONCE(ndblk.nfree) += NEG_DENTRY_BATCH;
> +                       *pcnt += NEG_DENTRY_BATCH;
> +                       raw_spin_unlock(&ndblk.nfree_lock);
> +               }
> +               put_cpu_ptr(&nr_dentry_neg);
> +       }
> +}
> +
> +static inline void neg_dentry_dec(struct dentry *dentry)
> +{
> +       if (unlikely(d_is_negative(dentry)))
> +               __neg_dentry_dec(dentry);
> +}
> +
> +/*
> + * Increment negative dentry count if applicable.
> + */
> +static void __neg_dentry_inc(struct dentry *dentry)
> +{
> +       long cnt, *pcnt;
> +
> +       if (this_cpu_inc_return(nr_dentry_neg) <= neg_dentry_percpu_limit)
> +               return;
> +
> +       pcnt = get_cpu_ptr(&nr_dentry_neg);
> +       cnt  = (READ_ONCE(ndblk.nfree) &&
> +              (*pcnt > neg_dentry_percpu_limit)) ? NEG_DENTRY_BATCH : 0;
> +
> +       if (cnt && raw_spin_trylock(&ndblk.nfree_lock)) {
> +               long val = READ_ONCE(ndblk.nfree);
> +
> +               if (val < cnt)
> +                       cnt = val;
> +               ACCESS_ONCE(ndblk.nfree) -= cnt;
> +               *pcnt -= cnt;
> +               raw_spin_unlock(&ndblk.nfree_lock);
> +       } else {
> +               cnt = 0;
> +       }
> +       put_cpu_ptr(&nr_dentry_neg);
> +       /*
> +        * If there are too many negative dentries, set DCACHE_KILL_NEGATIVE
> +        * flag to indicate that the dentry should be killed.
> +        */
> +       if (!cnt)
> +               dentry->d_flags |= DCACHE_KILL_NEGATIVE;
> +}
> +
> +static inline void neg_dentry_inc(struct dentry *dentry)
> +{
> +       if (unlikely(d_is_negative(dentry)))
> +               __neg_dentry_inc(dentry);
> +}
> +
>  static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
>  {
>         /*
> @@ -396,6 +487,7 @@ static void d_lru_add(struct dentry *dentry)
>         dentry->d_flags |= DCACHE_LRU_LIST;
>         this_cpu_inc(nr_dentry_unused);
>         WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> +       neg_dentry_inc(dentry);
>  }
>
>  static void d_lru_del(struct dentry *dentry)
> @@ -404,6 +496,7 @@ static void d_lru_del(struct dentry *dentry)
>         dentry->d_flags &= ~DCACHE_LRU_LIST;
>         this_cpu_dec(nr_dentry_unused);
>         WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> +       neg_dentry_dec(dentry);
>  }
>
>  static void d_shrink_del(struct dentry *dentry)
> @@ -434,6 +527,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
>         dentry->d_flags &= ~DCACHE_LRU_LIST;
>         this_cpu_dec(nr_dentry_unused);
>         list_lru_isolate(lru, &dentry->d_lru);
> +       neg_dentry_dec(dentry);
>  }
>
>  static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
> @@ -442,6 +536,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
>         D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
>         dentry->d_flags |= DCACHE_SHRINK_LIST;
>         list_lru_isolate_move(lru, &dentry->d_lru, list);
> +       neg_dentry_dec(dentry);
>  }
>
>  /*
> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
>
>         if (!IS_ROOT(dentry)) {
>                 parent = dentry->d_parent;
> -               if (unlikely(!spin_trylock(&parent->d_lock))) {
> +               /*
> +                * Force the killing of this negative dentry when
> +                * DCACHE_KILL_NEGATIVE flag is set.
> +                */
> +               if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
> +                       spin_lock(&parent->d_lock);
> +               } else if (unlikely(!spin_trylock(&parent->d_lock))) {
>                         if (inode)
>                                 spin_unlock(&inode->i_lock);
>                         goto failed;
> @@ -815,6 +916,9 @@ void dput(struct dentry *dentry)
>
>         dentry_lru_add(dentry);
>
> +       if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
> +               goto kill_it;
> +
>         dentry->d_lockref.count--;
>         spin_unlock(&dentry->d_lock);
>         return;
> @@ -1820,6 +1924,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
>         WARN_ON(d_in_lookup(dentry));
>
>         spin_lock(&dentry->d_lock);
> +       /*
> +        * Decrement negative dentry count if it was in the LRU list.
> +        */
> +       if (dentry->d_flags & DCACHE_LRU_LIST)
> +               neg_dentry_dec(dentry);
>         hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
>         raw_write_seqcount_begin(&dentry->d_seq);
>         __d_set_inode_and_type(dentry, inode, add_flags);
> @@ -3566,6 +3675,47 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
>  }
>  EXPORT_SYMBOL(d_tmpfile);
>
> +static long neg_dentry_pc __initdata = 5;
> +static bool neg_dentry_warn __initdata;
> +static int __init set_neg_dentry_pc(char *str)
> +{
> +       ssize_t ret;
> +       long new_pc = neg_dentry_pc;
> +
> +       if (!str)
> +               return 0;
> +       ret = kstrtol(str, 0, &new_pc);
> +       if (ret || (new_pc < 1) || (new_pc > 50))
> +               ret = 1;
> +       else
> +               neg_dentry_pc = new_pc;
> +       if (ret)
> +               neg_dentry_warn = true;
> +       return ret ? 0 : 1;
> +}
> +__setup("neg_dentry_pc=", set_neg_dentry_pc);
> +
> +static void __init neg_dentry_init(void)
> +{
> +       /* Rough estimate of # of dentries allocated per page */
> +       unsigned int nr_dentry_page = PAGE_SIZE/sizeof(struct dentry) - 1;
> +       unsigned long cnt;
> +
> +       raw_spin_lock_init(&ndblk.nfree_lock);
> +
> +       /* 20% in global pool & 80% in percpu free */
> +       ndblk.nfree = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
> +       cnt = ndblk.nfree * 4 / num_possible_cpus();
> +       if (unlikely(cnt < 2 * NEG_DENTRY_BATCH))
> +               cnt = 2 * NEG_DENTRY_BATCH;
> +       neg_dentry_percpu_limit = cnt;
> +
> +       if (neg_dentry_warn)
> +               pr_warn("Warning: neg_dentry_pc must be within 1-50 range.\n");
> +       pr_info("Negative dentry: percpu limit = %ld, free pool = %ld\n",
> +               neg_dentry_percpu_limit, ndblk.nfree);
> +}
> +
>  static __initdata unsigned long dhash_entries;
>  static int __init set_dhash_entries(char *str)
>  {
> @@ -3606,6 +3756,8 @@ static void __init dcache_init(void)
>         dentry_cache = KMEM_CACHE(dentry,
>                 SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
>
> +       neg_dentry_init();
> +
>         /* Hash may have been set up in dcache_init_early */
>         if (!hashdist)
>                 return;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 3f3ff4c..498233b 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -218,6 +218,7 @@ struct dentry_operations {
>
>  #define DCACHE_PAR_LOOKUP              0x10000000 /* being looked up (with parent locked shared) */
>  #define DCACHE_DENTRY_CURSOR           0x20000000
> +#define DCACHE_KILL_NEGATIVE           0x40000000 /* Kill negative dentry */
>
>  extern seqlock_t rename_lock;
>
> --
> 1.8.3.1
>


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-19 14:39   ` Miklos Szeredi
@ 2017-07-19 15:02     ` Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-19 15:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On 07/19/2017 10:39 AM, Miklos Szeredi wrote:
> On Mon, Jul 17, 2017 at 3:39 PM, Waiman Long <longman@redhat.com> wrote:
>> The number of positive dentries is limited by the number of files
>> in the filesystems. The number of negative dentries, however,
>> has no limit other than the total amount of memory available in
>> the system. So a rogue application that generates a lot of negative
>> dentries can potentially exhaust most of the memory available in the
>> system impacting performance on other running applications.
>>
>> To prevent this from happening, the dcache code is now updated to limit
>> the amount of the negative dentries in the LRU lists that can be kept
>> as a percentage of total available system memory. The default is 5%
>> and can be changed by specifying the "neg_dentry_pc=" kernel command
>> line option.
> AFAICS the implementation is counter to the concept of LRU since it
> will get rid of the most recently used negative dentry after passing
> the limit.  Which in itself is a source of DoS (keep rouge negative
> dentries at just about the limit, so normal application are prevented
> from getting their negatives cached).
>
> Thanks,
> Miklos

Yes, you are right. That is exactly the problem with patch 1 alone. That
is why I have patches 3 & 4 to enable automatic trimming to decrease the
number of negative dentries before the limit is reached, assuming the
rate of increase of negative dentries isn't faster than the reduction
rate of the automatic trimming process.

Cheers,
Longman


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-17 13:39 ` [PATCH 1/4] fs/dcache: Limit numbers " Waiman Long
  2017-07-17 17:49   ` Matthew Wilcox
  2017-07-19 14:39   ` Miklos Szeredi
@ 2017-07-19 20:24   ` Miklos Szeredi
  2017-07-19 20:42     ` Waiman Long
  2 siblings, 1 reply; 17+ messages in thread
From: Miklos Szeredi @ 2017-07-19 20:24 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Mon, Jul 17, 2017 at 3:39 PM, Waiman Long <longman@redhat.com> wrote:
> The number of positive dentries is limited by the number of files
> in the filesystems. The number of negative dentries, however,
> has no limit other than the total amount of memory available in
> the system. So a rogue application that generates a lot of negative
> dentries can potentially exhaust most of the memory available in the
> system impacting performance on other running applications.
>
> To prevent this from happening, the dcache code is now updated to limit
> the amount of the negative dentries in the LRU lists that can be kept
> as a percentage of total available system memory. The default is 5%
> and can be changed by specifying the "neg_dentry_pc=" kernel command
> line option.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---

[...]

> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
>
>         if (!IS_ROOT(dentry)) {
>                 parent = dentry->d_parent;
> -               if (unlikely(!spin_trylock(&parent->d_lock))) {
> +               /*
> +                * Force the killing of this negative dentry when
> +                * DCACHE_KILL_NEGATIVE flag is set.
> +                */
> +               if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
> +                       spin_lock(&parent->d_lock);

This looks like a d_lock ordering problem (should be parent first, child
second).  Why is this needed, anyway?

> +               } else if (unlikely(!spin_trylock(&parent->d_lock))) {
>                         if (inode)
>                                 spin_unlock(&inode->i_lock);
>                         goto failed;

Thanks,
Miklos


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-19 20:24   ` Miklos Szeredi
@ 2017-07-19 20:42     ` Waiman Long
  2017-07-20  7:20       ` Miklos Szeredi
  0 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2017-07-19 20:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On 07/19/2017 04:24 PM, Miklos Szeredi wrote:
> On Mon, Jul 17, 2017 at 3:39 PM, Waiman Long <longman@redhat.com> wrote:
>> The number of positive dentries is limited by the number of files
>> in the filesystems. The number of negative dentries, however,
>> has no limit other than the total amount of memory available in
>> the system. So a rogue application that generates a lot of negative
>> dentries can potentially exhaust most of the memory available in the
>> system impacting performance on other running applications.
>>
>> To prevent this from happening, the dcache code is now updated to limit
>> the amount of the negative dentries in the LRU lists that can be kept
>> as a percentage of total available system memory. The default is 5%
>> and can be changed by specifying the "neg_dentry_pc=" kernel command
>> line option.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
> [...]
>
>> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
>>
>>         if (!IS_ROOT(dentry)) {
>>                 parent = dentry->d_parent;
>> -               if (unlikely(!spin_trylock(&parent->d_lock))) {
>> +               /*
>> +                * Force the killing of this negative dentry when
>> +                * DCACHE_KILL_NEGATIVE flag is set.
>> +                */
>> +               if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
>> +                       spin_lock(&parent->d_lock);
> This looks like d_lock ordering problem (should be parent first, child
> second).  Why is this needed, anyway?
>

Yes, that is a bug. I should have used lock_parent() instead.

I have a test program that generates a lot of negative dentries
continuously. Using spin_trylock(), it failed most of the time when that
test program was running. So I need to actually acquire the parent's
d_lock to make sure that the offending negative dentry was really
killed. It was there to protect against the worst-case situation. I will
update the patch to correct that.

Thanks for spotting this.

Cheers,
Longman


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-19 20:42     ` Waiman Long
@ 2017-07-20  7:20       ` Miklos Szeredi
  2017-07-20 14:21         ` Waiman Long
  0 siblings, 1 reply; 17+ messages in thread
From: Miklos Szeredi @ 2017-07-20  7:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long <longman@redhat.com> wrote:
> On 07/19/2017 04:24 PM, Miklos Szeredi wrote:
>> On Mon, Jul 17, 2017 at 3:39 PM, Waiman Long <longman@redhat.com> wrote:
>>> The number of positive dentries is limited by the number of files
>>> in the filesystems. The number of negative dentries, however,
>>> has no limit other than the total amount of memory available in
>>> the system. So a rogue application that generates a lot of negative
>>> dentries can potentially exhaust most of the memory available in the
>>> system impacting performance on other running applications.
>>>
>>> To prevent this from happening, the dcache code is now updated to limit
>>> the amount of the negative dentries in the LRU lists that can be kept
>>> as a percentage of total available system memory. The default is 5%
>>> and can be changed by specifying the "neg_dentry_pc=" kernel command
>>> line option.
>>>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>> [...]
>>
>>> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
>>>
>>>         if (!IS_ROOT(dentry)) {
>>>                 parent = dentry->d_parent;
>>> -               if (unlikely(!spin_trylock(&parent->d_lock))) {
>>> +               /*
>>> +                * Force the killing of this negative dentry when
>>> +                * DCACHE_KILL_NEGATIVE flag is set.
>>> +                */
>>> +               if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
>>> +                       spin_lock(&parent->d_lock);
>> This looks like d_lock ordering problem (should be parent first, child
>> second).  Why is this needed, anyway?
>>
>
> Yes, that is a bug. I should have used lock_parent() instead.

lock_parent() can release dentry->d_lock, which means it's perfectly
useless for this.

I still feel forcing the free is wrong here.  Why not just block until
the number of negatives goes below the limit (start reclaim if not
already doing so, etc...)?

Thanks,
Miklos


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-20  7:20       ` Miklos Szeredi
@ 2017-07-20 14:21         ` Waiman Long
  2017-07-20 15:08           ` Miklos Szeredi
  0 siblings, 1 reply; 17+ messages in thread
From: Waiman Long @ 2017-07-20 14:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On 07/20/2017 03:20 AM, Miklos Szeredi wrote:
> On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long <longman@redhat.com> wrote:
>>
>>>> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
>>>>
>>>>         if (!IS_ROOT(dentry)) {
>>>>                 parent = dentry->d_parent;
>>>> -               if (unlikely(!spin_trylock(&parent->d_lock))) {
>>>> +               /*
>>>> +                * Force the killing of this negative dentry when
>>>> +                * DCACHE_KILL_NEGATIVE flag is set.
>>>> +                */
>>>> +               if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
>>>> +                       spin_lock(&parent->d_lock);
>>> This looks like d_lock ordering problem (should be parent first, child
>>> second).  Why is this needed, anyway?
>>>
>> Yes, that is a bug. I should have used lock_parent() instead.
> lock_parent() can release dentry->d_lock, which means it's perfectly
> useless for this.

As the reference count is kept at 1 in dentry_kill(), the dentry won't
go away even if the dentry lock is temporarily released.

> I still feel forcing  free is wrong here.  Why not just block until
> the number of negatives goes below the limit (start reclaim if not
> already doing so, etc...)?

Force freeing is the simplest. Any other way would require adding more
code and increasing code complexity.

One reason why I prefer this is to avoid adding unpredictable latency to
regular directory lookups and other dentry-related operations. We can
always change the code later on if there is a better way of doing it.

Cheers,
Longman


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-20 14:21         ` Waiman Long
@ 2017-07-20 15:08           ` Miklos Szeredi
  2017-07-20 15:46             ` Waiman Long
  0 siblings, 1 reply; 17+ messages in thread
From: Miklos Szeredi @ 2017-07-20 15:08 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Thu, Jul 20, 2017 at 4:21 PM, Waiman Long <longman@redhat.com> wrote:
> On 07/20/2017 03:20 AM, Miklos Szeredi wrote:
>> On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long <longman@redhat.com> wrote:
>>>
>>>>> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
>>>>>
>>>>>         if (!IS_ROOT(dentry)) {
>>>>>                 parent = dentry->d_parent;
>>>>> -               if (unlikely(!spin_trylock(&parent->d_lock))) {
>>>>> +               /*
>>>>> +                * Force the killing of this negative dentry when
>>>>> +                * DCACHE_KILL_NEGATIVE flag is set.
>>>>> +                */
>>>>> +               if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
>>>>> +                       spin_lock(&parent->d_lock);
>>>> This looks like d_lock ordering problem (should be parent first, child
>>>> second).  Why is this needed, anyway?
>>>>
>>> Yes, that is a bug. I should have used lock_parent() instead.
>> lock_parent() can release dentry->d_lock, which means it's perfectly
>> useless for this.
>
> As the reference count is kept at 1 in dentry_kill(), the dentry won't
> go away even if the dentry lock is temporarily released.

It won't go away, but anything else might happen to it (ref grabbed by
somebody else, instantiated, etc).  Don't see how it's going to be
better than the existing trylock.

Thanks,
Miklos


* Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries
  2017-07-20 15:08           ` Miklos Szeredi
@ 2017-07-20 15:46             ` Waiman Long
  0 siblings, 0 replies; 17+ messages in thread
From: Waiman Long @ 2017-07-20 15:46 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Jonathan Corbet, lkml, linux-doc, linux-fsdevel,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On 07/20/2017 11:08 AM, Miklos Szeredi wrote:
> On Thu, Jul 20, 2017 at 4:21 PM, Waiman Long <longman@redhat.com> wrote:
>> On 07/20/2017 03:20 AM, Miklos Szeredi wrote:
>>> On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long <longman@redhat.com> wrote:
>>>>>> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry *dentry)
>>>>>>
>>>>>>         if (!IS_ROOT(dentry)) {
>>>>>>                 parent = dentry->d_parent;
>>>>>> -               if (unlikely(!spin_trylock(&parent->d_lock))) {
>>>>>> +               /*
>>>>>> +                * Force the killing of this negative dentry when
>>>>>> +                * DCACHE_KILL_NEGATIVE flag is set.
>>>>>> +                */
>>>>>> +               if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
>>>>>> +                       spin_lock(&parent->d_lock);
>>>>> This looks like d_lock ordering problem (should be parent first, child
>>>>> second).  Why is this needed, anyway?
>>>>>
>>>> Yes, that is a bug. I should have used lock_parent() instead.
>>> lock_parent() can release dentry->d_lock, which means it's perfectly
>>> useless for this.
>> As the reference count is kept at 1 in dentry_kill(), the dentry won't
>> go away even if the dentry lock is temporarily released.
> It won't go away, but anything else might happen to it (ref grabbed by
> somebody else, instantiated, etc).  Don't see how it's going to be
> better than the existing trylock.
>
> Thanks,
> Miklos

In the unlikely event that the reference count or the d_flags changes,
we can abort the killing.

Cheers,
Longman


