linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
@ 2018-07-02  5:51 Waiman Long
  2018-07-02  5:51 ` [PATCH v5 1/6] fs/dcache: Track & report number " Waiman Long
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-02  5:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

 v4->v5:
  - Backed to the latest 4.18 kernel and modify the code
    accordingly. Patch 1 "Relocate dentry_kill() after lock_parent()"
    is now no longer necessary.
  - Make tracking and limiting of negative dentries an user configurable
    option (CONFIG_DCACHE_TRACK_NEG_ENTRY) so that users can decide if
    they want to include this capability in the kernel.
  - Make killing excess negative dentries an optional feature that can be
    enabled via a boot command line option or a sysctl parameter.
  - Spread negative dentry pruning across multiple CPUs.

 v4: https://lkml.org/lkml/2017/9/18/739

A rogue application can potentially create a large number of negative
dentries in the system consuming most of the memory available if it
is not under the direct control of a memory controller that enforce
kernel memory limit.

This patchset introduces changes to the dcache subsystem to track and
optionally limit the number of negative dentries allowed to be created by
background pruning of excess negative dentries or even kill it after use.
This capability will help to limit the amount of memory that can be
consumed by negative dentries.

Patch 1 tracks the number of negative dentries present in the LRU
lists and reports it in /proc/sys/fs/dentry-state.

Patch 2 makes negative dentry tracking a user configurable option
(CONFIG_DCACHE_TRACK_NEG_ENTRY) as well as adding a "neg_dentry_pc=" boot
command line option to specify a soft limit on the number of negative
allowed as a percentage of total system memory. The default is 2%.

Patch 3 enables automatic pruning of least recently used negative
dentries when the total number is close to the preset limit.

Patch 4 spreads the negative dentry pruning effort to multiple CPUs to
make it more fair.

Patch 5 extends the "neg_dentry_pc=" boot command line option to
optionally enable enforcing the limit by killing off excess negative
dentries immediately after use.

Patch 6 makes the limit enforcing option a sysctl parameter so that it
can be dynamically enabled at run time if the need arises, for example,
when a rogue application generating a lot of negative dentries is
detected.

Waiman Long (6):
  fs/dcache: Track & report number of negative dentries
  fs/dcache: Make negative dentry tracking configurable
  fs/dcache: Enable automatic pruning of negative dentries
  fs/dcache: Spread negative dentry pruning across multiple CPUs
  fs/dcache: Allow optional enforcement of negative dentry limit
  fs/dcache: Make negative dentry limit enforcement sysctl parameter

 Documentation/admin-guide/kernel-parameters.txt |  12 +
 Documentation/sysctl/fs.txt                     |  30 +-
 fs/Kconfig                                      |  10 +
 fs/dcache.c                                     | 452 +++++++++++++++++++++++-
 include/linux/dcache.h                          |  13 +-
 include/linux/list_lru.h                        |   1 +
 kernel/sysctl.c                                 |  11 +
 mm/list_lru.c                                   |   4 +-
 8 files changed, 519 insertions(+), 14 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v5 1/6] fs/dcache: Track & report number of negative dentries
  2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
@ 2018-07-02  5:51 ` Waiman Long
  2018-07-02  5:51 ` [PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable Waiman Long
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-02  5:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.

As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the /proc/sys/fs/dentry-state
file.  The number of positive dentries in the LRU lists can be found
by subtracting the number of negative dentries from the total.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/sysctl/fs.txt | 19 +++++++++++++------
 fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h      |  7 ++++---
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 6c00c1e..a8e3f1f 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -61,19 +61,26 @@ struct {
         int nr_unused;
         int age_limit;         /* age in seconds */
         int want_pages;        /* pages requested by system */
-        int dummy[2];
+        int nr_negative;       /* # of unused negative dentries */
+        int dummy;
 } dentry_stat = {0, 0, 45, 0,};
--------------------------------------------------------------- 
+--------------------------------------------------------------
+
+Dentries are dynamically allocated and deallocated.
+
+nr_dentry shows the total number of dentries allocated (active
++ unused). nr_unused shows the number of dentries that are not
+actively used, but are saved in the LRU list for future reuse.
 
-Dentries are dynamically allocated and deallocated, and
-nr_dentry seems to be 0 all the time. Hence it's safe to
-assume that only nr_unused, age_limit and want_pages are
-used. Nr_unused seems to be exactly what its name says.
 Age_limit is the age in seconds after which dcache entries
 can be reclaimed when memory is short and want_pages is
 nonzero when shrink_dcache_pages() has been called and the
 dcache isn't pruned yet.
 
+nr_negative shows the number of unused dentries that are also
+negative dentries which do not mapped to actual files if negative
+dentries tracking is enabled.
+
 ==============================================================
 
 dquot-max & dquot-nr:
diff --git a/fs/dcache.c b/fs/dcache.c
index 0e8e5de..dbab6c2 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -119,6 +119,7 @@ struct dentry_stat_t dentry_stat = {
 
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -152,11 +153,22 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }
 
+static long get_nr_dentry_neg(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(nr_dentry_neg, i);
+	return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
 {
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
+	dentry_stat.nr_negative = get_nr_dentry_neg();
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
+static inline void __neg_dentry_dec(struct dentry *dentry)
+{
+	this_cpu_dec(nr_dentry_neg);
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_dec(dentry);
+}
+
+static inline void __neg_dentry_inc(struct dentry *dentry)
+{
+	this_cpu_inc(nr_dentry_neg);
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -330,6 +364,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_inc(dentry);
 }
 
 static void dentry_free(struct dentry *dentry)
@@ -397,6 +433,7 @@ static void d_lru_add(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -405,6 +442,7 @@ static void d_lru_del(struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_dec(dentry);
 }
 
 static void d_shrink_del(struct dentry *dentry)
@@ -435,6 +473,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	list_lru_isolate(lru, &dentry->d_lru);
+	neg_dentry_dec(dentry);
 }
 
 static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
@@ -443,6 +482,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
 	D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
 	dentry->d_flags |= DCACHE_SHRINK_LIST;
 	list_lru_isolate_move(lru, &dentry->d_lru, list);
+	neg_dentry_dec(dentry);
 }
 
 /**
@@ -1842,6 +1882,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));
 
 	spin_lock(&dentry->d_lock);
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_dec(dentry);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 66c6e17..6e06d91 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -62,9 +62,10 @@ struct qstr {
 struct dentry_stat_t {
 	long nr_dentry;
 	long nr_unused;
-	long age_limit;          /* age in seconds */
-	long want_pages;         /* pages requested by system */
-	long dummy[2];
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long nr_negative;	/* # of unused negative dentries */
+	long dummy;
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable
  2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
  2018-07-02  5:51 ` [PATCH v5 1/6] fs/dcache: Track & report number " Waiman Long
@ 2018-07-02  5:51 ` Waiman Long
  2018-07-02 21:12   ` Andrew Morton
  2018-07-02  5:52 ` [PATCH v5 3/6] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2018-07-02  5:51 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

The negative dentry tracking is made a configurable option so that
users who don't care about negative dentry tracking will have the
option to disable it. The new config option DCACHE_TRACK_NEG_ENTRY
is disabled by default.

If this option is enabled, a new kernel parameter "neg_dentry_pc=<%>"
allows users to set the soft limit on how many negative dentries are
allowed as a percentage of the total system memory. The default is 2%
and this new parameter accept a range of 0-10% where 0% means there
is no limit.

When the soft limit is reached, a warning message will be printed to
the console to alert the system administrator.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   9 ++
 fs/Kconfig                                      |  10 ++
 fs/dcache.c                                     | 170 +++++++++++++++++++++++-
 3 files changed, 184 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index efc7aa7..b7ab98a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2462,6 +2462,15 @@
 
 	n2=		[NET] SDL Inc. RISCom/N2 synchronous serial card
 
+	neg_dentry_pc=
+			With "CONFIG_DCACHE_TRACK_NEG_ENTRY=y", specify
+			the limit for the number negative dentries
+			allowable in a system as a percentage of the
+			total system memory. The default is 2% and the
+			valid range is 0-10 where 0 means no limit.
+
+			Format: <pc>
+
 	netdev=		[NET] Network devices parameters
 			Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
 			Note that mem_start is often overloaded to mean
diff --git a/fs/Kconfig b/fs/Kconfig
index ac474a6..2e81637 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -113,6 +113,16 @@ source "fs/autofs/Kconfig"
 source "fs/fuse/Kconfig"
 source "fs/overlayfs/Kconfig"
 
+#
+# Track and limit the number of negative dentries allowed in the system.
+#
+config DCACHE_TRACK_NEG_ENTRY
+	bool "Track & limit negative dcache entries"
+	default n
+	help
+	  This option enables the tracking and limiting of the total
+	  number of negative dcache entries in the filesystem.
+
 menu "Caches"
 
 source "fs/fscache/Kconfig"
diff --git a/fs/dcache.c b/fs/dcache.c
index dbab6c2..889d3bb 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -14,6 +14,8 @@
  * the dcache entry is deleted or garbage collected.
  */
 
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
 #include <linux/ratelimit.h>
 #include <linux/string.h>
 #include <linux/mm.h>
@@ -117,9 +119,37 @@ struct dentry_stat_t dentry_stat = {
 	.age_limit = 45,
 };
 
+/*
+ * There is a system-wide soft limit to the number of negative dentries
+ * allowed in the super blocks' LRU lists, if enabled. The default limit
+ * is 2% of the total system memory. On a 64-bit system with 1G memory,
+ * that translated to about 100k dentries which is quite a lot. The limit
+ * can be changed by using the "neg_dentry_pc" kernel parameter.
+ *
+ * To avoid performance problem with a global counter on an SMP system,
+ * the tracking is done mostly on a per-cpu basis. The total limit is
+ * distributed in a 80/20 ratio to per-cpu counters and a global free pool.
+ *
+ * If a per-cpu counter runs out of negative dentries, it can borrow extra
+ * ones from the global free pool. If it has more than its percpu limit,
+ * the extra ones will be returned back to the global pool.
+ */
+#define NEG_DENTRY_PC_DEFAULT	2
+#define NEG_DENTRY_BATCH	(1 << 8)
+
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
+static long neg_dentry_percpu_limit __read_mostly;
+static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
+static struct {
+	raw_spinlock_t nfree_lock;
+	long nfree;			/* Negative dentry free pool */
+} ndblk ____cacheline_aligned_in_smp;
+static DEFINE_PER_CPU(long, nr_dentry_neg);
+#endif
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
-static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -153,6 +183,7 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }
 
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
 static long get_nr_dentry_neg(void)
 {
 	int i;
@@ -160,8 +191,12 @@ static long get_nr_dentry_neg(void)
 
 	for_each_possible_cpu(i)
 		sum += per_cpu(nr_dentry_neg, i);
+	sum += neg_dentry_nfree_init - ndblk.nfree;
 	return sum < 0 ? 0 : sum;
 }
+#else
+static long get_nr_dentry_neg(void)	{ return 0L; }
+#endif
 
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
@@ -226,9 +261,23 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
-static inline void __neg_dentry_dec(struct dentry *dentry)
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+/*
+ * Decrement negative dentry count if applicable.
+ */
+static void __neg_dentry_dec(struct dentry *dentry)
 {
-	this_cpu_dec(nr_dentry_neg);
+	if (unlikely((this_cpu_dec_return(nr_dentry_neg) < 0) &&
+	    neg_dentry_pc)) {
+		long *pcnt = get_cpu_ptr(&nr_dentry_neg);
+
+		if ((*pcnt < 0) && raw_spin_trylock(&ndblk.nfree_lock)) {
+			WRITE_ONCE(ndblk.nfree, ndblk.nfree + NEG_DENTRY_BATCH);
+			*pcnt += NEG_DENTRY_BATCH;
+			raw_spin_unlock(&ndblk.nfree_lock);
+		}
+		put_cpu_ptr(&nr_dentry_neg);
+	}
 }
 
 static inline void neg_dentry_dec(struct dentry *dentry)
@@ -237,9 +286,50 @@ static inline void neg_dentry_dec(struct dentry *dentry)
 		__neg_dentry_dec(dentry);
 }
 
-static inline void __neg_dentry_inc(struct dentry *dentry)
+/*
+ * Try to decrement the negative dentry free pool by NEG_DENTRY_BATCH.
+ * The actual decrement returned by the function may be smaller.
+ */
+static long __neg_dentry_nfree_dec(void)
 {
-	this_cpu_inc(nr_dentry_neg);
+	long cnt = NEG_DENTRY_BATCH;
+
+	raw_spin_lock(&ndblk.nfree_lock);
+	if (ndblk.nfree < cnt)
+		cnt = ndblk.nfree;
+	WRITE_ONCE(ndblk.nfree, ndblk.nfree - cnt);
+	raw_spin_unlock(&ndblk.nfree_lock);
+	return cnt;
+}
+
+/*
+ * Increment negative dentry count if applicable.
+ */
+static void __neg_dentry_inc(struct dentry *dentry)
+{
+	long cnt = 0, *pcnt;
+
+	if (likely((this_cpu_inc_return(nr_dentry_neg) <=
+		    neg_dentry_percpu_limit) || !neg_dentry_pc))
+		return;
+
+	/*
+	 * Try to move some negative dentry quota from the global free
+	 * pool to the percpu count to allow more negative dentries to
+	 * be added to the LRU.
+	 */
+	pcnt = get_cpu_ptr(&nr_dentry_neg);
+	if (READ_ONCE(ndblk.nfree) && (*pcnt > neg_dentry_percpu_limit)) {
+		cnt = __neg_dentry_nfree_dec();
+		*pcnt -= cnt;
+	}
+	put_cpu_ptr(&nr_dentry_neg);
+
+	/*
+	 * Put out a warning if there are too many negative dentries.
+	 */
+	if (!cnt)
+		pr_warn_once("Too many negative dentries.");
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -248,6 +338,26 @@ static inline void neg_dentry_inc(struct dentry *dentry)
 		__neg_dentry_inc(dentry);
 }
 
+#else /* CONFIG_DCACHE_TRACK_NEG_ENTRY */
+
+static inline void __neg_dentry_dec(struct dentry *dentry)
+{
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+}
+
+static inline void __neg_dentry_inc(struct dentry *dentry)
+{
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+}
+
+#endif /* CONFIG_DCACHE_TRACK_NEG_ENTRY */
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -3149,6 +3259,54 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 }
 EXPORT_SYMBOL(d_tmpfile);
 
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+static void __init neg_dentry_init(void)
+{
+	/* Rough estimate of # of dentries allocated per page */
+	unsigned int nr_dentry_page = PAGE_SIZE/sizeof(struct dentry) - 1;
+	unsigned long cnt;
+
+	raw_spin_lock_init(&ndblk.nfree_lock);
+
+	/* 20% in global pool & 80% in percpu free */
+	ndblk.nfree = neg_dentry_nfree_init
+		    = totalram_pages * nr_dentry_page * neg_dentry_pc / 500;
+	cnt = ndblk.nfree * 4 / num_possible_cpus();
+	if (unlikely((cnt < 2 * NEG_DENTRY_BATCH) && neg_dentry_pc))
+		cnt = 2 * NEG_DENTRY_BATCH;
+	neg_dentry_percpu_limit = cnt;
+
+	pr_info("Negative dentry: percpu limit = %ld, free pool = %ld\n",
+		neg_dentry_percpu_limit, ndblk.nfree);
+}
+
+static int __init set_neg_dentry_pc(char *str)
+{
+	int err = -EINVAL;
+	unsigned long pc;
+
+	if (str) {
+		err = kstrtoul(str, 0, &pc);
+		if (err)
+			return err;
+
+		/*
+		 * Valid negative dentry percentage: 0-10%
+		 */
+		if ((pc >= 0) && (pc <= 10)) {
+			neg_dentry_pc = pc;
+			return 0;
+		}
+		err = -ERANGE;
+	}
+	return err;
+}
+early_param("neg_dentry_pc", set_neg_dentry_pc);
+#else
+static inline void neg_dentry_init(void) { }
+#endif
+
+
 static __initdata unsigned long dhash_entries;
 static int __init set_dhash_entries(char *str)
 {
@@ -3191,6 +3349,8 @@ static void __init dcache_init(void)
 		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
 		d_iname);
 
+	neg_dentry_init();
+
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
 		return;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 3/6] fs/dcache: Enable automatic pruning of negative dentries
  2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
  2018-07-02  5:51 ` [PATCH v5 1/6] fs/dcache: Track & report number " Waiman Long
  2018-07-02  5:51 ` [PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable Waiman Long
@ 2018-07-02  5:52 ` Waiman Long
  2018-07-02  5:52 ` [PATCH v5 4/6] fs/dcache: Spread negative dentry pruning across multiple CPUs Waiman Long
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-02  5:52 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

It is not good enough to have a soft limit for the number of
negative dentries in the system and print a warning if that limit is
exceeded. We need to do something about it when this happens.

This patch enables automatic pruning of negative dentries when
the soft limit is going to be exceeded.  This is done by using the
workqueue API to do the pruning gradually when a threshold is reached
to minimize performance impact on other running tasks.

The current threshold is 1/4 of the initial value of the free pool
count. Once the threshold is reached, the automatic pruning process
will be kicked in to replenish the free pool. Each pruning run will
scan 64 dentries per LRU list and can remove up to 256 negative
dentries to minimize the LRU locks hold time. The pruning rate will
be 50 Hz if the free pool count is less than 1/8 of the original and
10 Hz otherwise.

The dentry pruning operation may also free some least recently used
positive dentries.

In the unlikely event that a superblock is being umount'ed while in
negative dentry pruning mode, the umount may face an additional delay
of up to 0.1s.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c              | 158 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/list_lru.h |   1 +
 mm/list_lru.c            |   4 +-
 3 files changed, 162 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 889d3bb..6d00f52 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -136,6 +136,11 @@ struct dentry_stat_t dentry_stat = {
  */
 #define NEG_DENTRY_PC_DEFAULT	2
 #define NEG_DENTRY_BATCH	(1 << 8)
+#define NEG_PRUNING_SIZE	(1 << 6)
+#define NEG_PRUNING_SLOW_RATE	(HZ/10)
+#define NEG_PRUNING_FAST_RATE	(HZ/50)
+#define NEG_IS_SB_UMOUNTING(sb)	\
+	unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
 static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
@@ -143,8 +148,17 @@ struct dentry_stat_t dentry_stat = {
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
+	int niter;			/* Pruning iteration count */
+	int lru_count;			/* Per-LRU pruning count */
+	long n_neg;			/* # of negative dentries pruned */
+	long n_pos;			/* # of positive dentries pruned */
 	long nfree;			/* Negative dentry free pool */
+	struct super_block *prune_sb;	/* Super_block for pruning */
 } ndblk ____cacheline_aligned_in_smp;
+
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
+
 static DEFINE_PER_CPU(long, nr_dentry_neg);
 #endif
 
@@ -330,6 +344,25 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	 */
 	if (!cnt)
 		pr_warn_once("Too many negative dentries.");
+
+	/*
+	 * Initiate negative dentry pruning if free pool has less than
+	 * 1/4 of its initial value.
+	 */
+	if ((READ_ONCE(ndblk.nfree) < READ_ONCE(neg_dentry_nfree_init)/4) &&
+	    !READ_ONCE(ndblk.prune_sb) &&
+	    !cmpxchg(&ndblk.prune_sb, NULL, dentry->d_sb)) {
+		/*
+		 * Abort if umounting is in progress, otherwise take a
+		 * reference and move on.
+		 */
+		if (NEG_IS_SB_UMOUNTING(ndblk.prune_sb)) {
+			WRITE_ONCE(ndblk.prune_sb, NULL);
+		} else {
+			atomic_inc(&ndblk.prune_sb->s_active);
+			schedule_delayed_work(&prune_neg_dentry_work, 1);
+		}
+	}
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -1368,6 +1401,131 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+/*
+ * A modified version that attempts to remove a limited number of negative
+ * dentries as well as some other non-negative dentries at the front.
+ */
+static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
+		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *freeable = arg;
+	struct dentry	*dentry = container_of(item, struct dentry, d_lru);
+	enum lru_status	status = LRU_SKIP;
+
+	/*
+	 * Limit amount of dentry walking in each LRU list.
+	 */
+	if (ndblk.lru_count >= NEG_PRUNING_SIZE) {
+		ndblk.lru_count = 0;
+		return LRU_STOP;
+	}
+	ndblk.lru_count++;
+
+	/*
+	 * we are inverting the lru lock/dentry->d_lock here,
+	 * so use a trylock. If we fail to get the lock, just skip
+	 * it
+	 */
+	if (!spin_trylock(&dentry->d_lock))
+		return LRU_SKIP;
+
+	/*
+	 * Referenced dentries are still in use. If they have active
+	 * counts, just remove them from the LRU. Otherwise give them
+	 * another pass through the LRU.
+	 */
+	if (dentry->d_lockref.count) {
+		d_lru_isolate(lru, dentry);
+		status = LRU_REMOVED;
+		goto out;
+	}
+
+	/*
+	 * Dentries with reference bit on are moved back to the tail.
+	 */
+	if (dentry->d_flags & DCACHE_REFERENCED) {
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		status = LRU_ROTATE;
+		goto out;
+	}
+
+	status = LRU_REMOVED;
+	d_lru_shrink_move(lru, dentry, freeable);
+	if (d_is_negative(dentry))
+		ndblk.n_neg++;
+out:
+	spin_unlock(&dentry->d_lock);
+	return status;
+}
+
+/*
+ * A workqueue function to prune negative dentry.
+ *
+ * The pruning is done gradually over time so as to have as little
+ * performance impact as possible.
+ */
+static void prune_negative_dentry(struct work_struct *work)
+{
+	int freed, last_n_neg;
+	long nfree;
+	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	LIST_HEAD(dispose);
+
+	if (!sb)
+		return;
+	if (NEG_IS_SB_UMOUNTING(sb))
+		goto stop_pruning;
+
+	ndblk.niter++;
+	ndblk.lru_count = 0;
+	last_n_neg = ndblk.n_neg;
+	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
+			      &dispose, NEG_DENTRY_BATCH);
+
+	if (freed)
+		shrink_dentry_list(&dispose);
+	ndblk.n_pos += freed - (ndblk.n_neg - last_n_neg);
+
+	/*
+	 * Continue delayed pruning until negative dentry free pool is at
+	 * least 1/2 of the initial value, the super_block has no more
+	 * negative dentries left at the front, or unmounting is in
+	 * progress.
+	 *
+	 * The pruning rate depends on the size of the free pool. The
+	 * faster rate is used when there is less than 1/8 left.
+	 * Otherwise, the slower rate will be used.
+	 */
+	nfree = READ_ONCE(ndblk.nfree);
+	if ((ndblk.n_neg == last_n_neg) ||
+	    (nfree >= neg_dentry_nfree_init/2) || NEG_IS_SB_UMOUNTING(sb))
+		goto stop_pruning;
+
+	schedule_delayed_work(&prune_neg_dentry_work,
+			     (nfree < neg_dentry_nfree_init/8)
+			     ? NEG_PRUNING_FAST_RATE : NEG_PRUNING_SLOW_RATE);
+	return;
+
+stop_pruning:
+#ifdef CONFIG_DEBUG_KERNEL
+	/*
+	 * Report large negative dentry pruning event.
+	 */
+	if (ndblk.n_neg > NEG_PRUNING_SIZE) {
+		pr_info("Negative dentry pruning (SB=%s):\n\t"
+			"%d iterations, %ld/%ld neg/pos dentries freed.\n",
+			ndblk.prune_sb->s_id, ndblk.niter, ndblk.n_neg,
+			ndblk.n_pos);
+	}
+#endif
+	ndblk.niter = 0;
+	ndblk.n_neg = ndblk.n_pos = 0;
+	deactivate_super(sb);
+	WRITE_ONCE(ndblk.prune_sb, NULL);
+}
+#endif /* CONFIG_DCACHE_TRACK_NEG_ENTRY */
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 96def9d..a9598a0 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -23,6 +23,7 @@ enum lru_status {
 	LRU_SKIP,		/* item cannot be locked, skip */
 	LRU_RETRY,		/* item not freeable. May drop the lock
 				   internally, but has to return locked. */
+	LRU_STOP,		/* stop walking the list */
 };
 
 struct list_lru_one {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index fcfb6c8..2ee5d3a 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -246,11 +246,13 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			 */
 			assert_spin_locked(&nlru->lock);
 			goto restart;
+		case LRU_STOP:
+			goto out;
 		default:
 			BUG();
 		}
 	}
-
+out:
 	spin_unlock(&nlru->lock);
 	return isolated;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 4/6] fs/dcache: Spread negative dentry pruning across multiple CPUs
  2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
                   ` (2 preceding siblings ...)
  2018-07-02  5:52 ` [PATCH v5 3/6] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
@ 2018-07-02  5:52 ` Waiman Long
  2018-07-02  5:52 ` [PATCH v5 5/6] fs/dcache: Allow optional enforcement of negative dentry limit Waiman Long
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-02  5:52 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

Doing negative dentry pruning using schedule_delayed_work() will
typically concentrate the pruning effort on one particular CPU. That is
not fair to the tasks running on that CPU. In addition, it is possible
that one CPU can have all its negative dentries pruned away while the
others can still have more negative dentries than the percpu limit.

To be fair, negative dentries pruning is now done across all the online
CPUs, if they all have close to the percpu limit of negative dentries.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c | 43 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 6d00f52..4f34f53 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -360,7 +360,8 @@ static void __neg_dentry_inc(struct dentry *dentry)
 			WRITE_ONCE(ndblk.prune_sb, NULL);
 		} else {
 			atomic_inc(&ndblk.prune_sb->s_active);
-			schedule_delayed_work(&prune_neg_dentry_work, 1);
+			schedule_delayed_work_on(smp_processor_id(),
+						&prune_neg_dentry_work, 1);
 		}
 	}
 }
@@ -1467,8 +1468,9 @@ static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
  */
 static void prune_negative_dentry(struct work_struct *work)
 {
+	int cpu = smp_processor_id();
 	int freed, last_n_neg;
-	long nfree;
+	long nfree, excess;
 	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
 	LIST_HEAD(dispose);
 
@@ -1502,9 +1504,40 @@ static void prune_negative_dentry(struct work_struct *work)
 	    (nfree >= neg_dentry_nfree_init/2) || NEG_IS_SB_UMOUNTING(sb))
 		goto stop_pruning;
 
-	schedule_delayed_work(&prune_neg_dentry_work,
-			     (nfree < neg_dentry_nfree_init/8)
-			     ? NEG_PRUNING_FAST_RATE : NEG_PRUNING_SLOW_RATE);
+	/*
+	 * If the negative dentry count in the current cpu is less than the
+	 * per_cpu limit, schedule the pruning in the next cpu if it has
+	 * more negative dentries. This will make the negative dentry count
+	 * reduction spread more evenly across multiple per-cpu counters.
+	 */
+	excess = neg_dentry_percpu_limit - __this_cpu_read(nr_dentry_neg);
+	if (excess > 0) {
+		int next_cpu = cpumask_next(cpu, cpu_online_mask);
+
+		if (next_cpu >= nr_cpu_ids)
+			next_cpu = cpumask_first(cpu_online_mask);
+		if (per_cpu(nr_dentry_neg, next_cpu) >
+		    __this_cpu_read(nr_dentry_neg)) {
+			cpu = next_cpu;
+
+			/*
+			 * Transfer some of the excess negative dentry count
+			 * to the free pool if the current percpu pool is less
+			 * than 3/4 of the limit.
+			 */
+			if ((excess > neg_dentry_percpu_limit/4) &&
+			    raw_spin_trylock(&ndblk.nfree_lock)) {
+				WRITE_ONCE(ndblk.nfree,
+					   ndblk.nfree + NEG_DENTRY_BATCH);
+				__this_cpu_add(nr_dentry_neg, NEG_DENTRY_BATCH);
+				raw_spin_unlock(&ndblk.nfree_lock);
+			}
+		}
+	}
+
+	schedule_delayed_work_on(cpu, &prune_neg_dentry_work,
+			(nfree < neg_dentry_nfree_init/8)
+			? NEG_PRUNING_FAST_RATE : NEG_PRUNING_SLOW_RATE);
 	return;
 
 stop_pruning:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 5/6] fs/dcache: Allow optional enforcement of negative dentry limit
  2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
                   ` (3 preceding siblings ...)
  2018-07-02  5:52 ` [PATCH v5 4/6] fs/dcache: Spread negative dentry pruning across multiple CPUs Waiman Long
@ 2018-07-02  5:52 ` Waiman Long
  2018-07-02  5:52 ` [PATCH v5 6/6] fs/dcache: Make negative dentry limit enforcement sysctl parameter Waiman Long
  2018-07-02 19:34 ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Linus Torvalds
  6 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-02  5:52 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

If a rogue application that generates a large number of negative
dentries is running, the automatic negative dentries pruning process
may not be fast enough to clear up the negative dentries in time. In
this case, it is possible that negative dentries will use up most
of the available memory in the system when that application is not
under the control of a memory cgroup that limit kernel memory.

The lack of available memory may significantly affect the operation
of other applications running in the system. It may even lead to OOM
kill of useful applications.

To allow system administrators the option to prevent this extreme
situation from happening, the "enforce" option can now be added to
the "neg_dentry_pc" kernel parameter to enforce the negative dentry
limit. When the limit is enforced, extra negative dentries that exceed
the limit will be killed after use instead of leaving them in the LRU.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  5 +-
 fs/dcache.c                                     | 94 +++++++++++++++++++------
 include/linux/dcache.h                          |  2 +-
 3 files changed, 76 insertions(+), 25 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b7ab98a..05531a8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2468,8 +2468,11 @@
 			allowable in a system as a percentage of the
 			total system memory. The default is 2% and the
 			valid range is 0-10 where 0 means no limit.
+			The optional "enforce" option can be added to
+			enforce the limit by killing excessive negative
+			dentries.
 
-			Format: <pc>
+			Format: <pc>[,enforce]
 
 	netdev=		[NET] Network devices parameters
 			Format: <irq>,<io>,<mem_start>,<mem_end>,<name>
diff --git a/fs/dcache.c b/fs/dcache.c
index 4f34f53..77910c9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -124,7 +124,10 @@ struct dentry_stat_t dentry_stat = {
  * allowed in the super blocks' LRU lists, if enabled. The default limit
  * is 2% of the total system memory. On a 64-bit system with 1G memory,
  * that translated to about 100k dentries which is quite a lot. The limit
- * can be changed by using the "neg_dentry_pc" kernel parameter.
+ * can be changed by using the "neg_dentry_pc" kernel parameter. An
+ * optional "enforce" option can be added to enforce the limit by
+ * destroying extra negative dentries after use when the limit is
+ * exceeded.
  *
  * To avoid performance problem with a global counter on an SMP system,
  * the tracking is done mostly on a per-cpu basis. The total limit is
@@ -143,6 +146,7 @@ struct dentry_stat_t dentry_stat = {
 	unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+static int enforce_neg_dentry_limit __read_mostly;
 static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
@@ -276,6 +280,9 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 #endif
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+
+static void d_lru_del(struct dentry *dentry);
+
 /*
  * Decrement negative dentry count if applicable.
  */
@@ -318,8 +325,12 @@ static long __neg_dentry_nfree_dec(void)
 
 /*
  * Increment negative dentry count if applicable.
+ *
+ * The retain flag will only be set when calling from
+ * __d_clear_type_and_inode() so as to retain the entry even
+ * if the negative dentry limit has been exceeded.
  */
-static void __neg_dentry_inc(struct dentry *dentry)
+static void __neg_dentry_inc(struct dentry *dentry, bool retain)
 {
 	long cnt = 0, *pcnt;
 
@@ -340,10 +351,18 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	put_cpu_ptr(&nr_dentry_neg);
 
 	/*
-	 * Put out a warning if there are too many negative dentries.
+	 * Put out a warning if there are too many negative dentries or
+	 * kill it by removing it from the LRU and set the
+	 * DCACHE_KILL_NEGATIVE flag if the enforce option is on.
 	 */
-	if (!cnt)
-		pr_warn_once("Too many negative dentries.");
+	if (!cnt) {
+		if (enforce_neg_dentry_limit && !retain) {
+			dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+			d_lru_del(dentry);
+		} else {
+			pr_warn_once("Too many negative dentries.");
+		}
+	}
 
 	/*
 	 * Initiate negative dentry pruning if free pool has less than
@@ -369,7 +388,7 @@ static void __neg_dentry_inc(struct dentry *dentry)
 static inline void neg_dentry_inc(struct dentry *dentry)
 {
 	if (unlikely(d_is_negative(dentry)))
-		__neg_dentry_inc(dentry);
+		__neg_dentry_inc(dentry, false);
 }
 
 #else /* CONFIG_DCACHE_TRACK_NEG_ENTRY */
@@ -382,7 +401,7 @@ static inline void neg_dentry_dec(struct dentry *dentry)
 {
 }
 
-static inline void __neg_dentry_inc(struct dentry *dentry)
+static inline void __neg_dentry_inc(struct dentry *dentry, bool retain)
 {
 }
 
@@ -509,7 +528,7 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
 	if (dentry->d_flags & DCACHE_LRU_LIST)
-		__neg_dentry_inc(dentry);
+		__neg_dentry_inc(dentry, true);	/* Always retain it */
 }
 
 static void dentry_free(struct dentry *dentry)
@@ -816,16 +835,27 @@ static inline bool retain_dentry(struct dentry *dentry)
 	if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
 		return false;
 
+	if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE))
+		return false;
+
 	if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
 		if (dentry->d_op->d_delete(dentry))
 			return false;
 	}
 	/* retain; LRU fodder */
 	dentry->d_lockref.count--;
-	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
+	if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST))) {
 		d_lru_add(dentry);
-	else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED)))
+		/*
+		 * If DCACHE_LRU_LIST flag isn't set after d_lru_add(),
+		 * it means that it is a negative dentry that has to
+		 * be killed.
+		 */
+		if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST)))
+			return false;
+	} else if (unlikely(!(dentry->d_flags & DCACHE_REFERENCED))) {
 		dentry->d_flags |= DCACHE_REFERENCED;
+	}
 	return true;
 }
 
@@ -865,7 +895,8 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 	spin_lock(&dentry->d_lock);
 	parent = lock_parent(dentry);
 got_locks:
-	if (unlikely(dentry->d_lockref.count != 1)) {
+	if (unlikely((dentry->d_lockref.count != 1) &&
+		    !(dentry->d_flags & DCACHE_KILL_NEGATIVE))) {
 		dentry->d_lockref.count--;
 	} else if (likely(!retain_dentry(dentry))) {
 		__dentry_kill(dentry);
@@ -3451,6 +3482,8 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode)
 EXPORT_SYMBOL(d_tmpfile);
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+#include <linux/ctype.h>
+
 static void __init neg_dentry_init(void)
 {
 	/* Rough estimate of # of dentries allocated per page */
@@ -3473,23 +3506,38 @@ static void __init neg_dentry_init(void)
 
 static int __init set_neg_dentry_pc(char *str)
 {
-	int err = -EINVAL;
+	int err = 0;
+	int enforce = false;
 	unsigned long pc;
 
-	if (str) {
-		err = kstrtoul(str, 0, &pc);
-		if (err)
-			return err;
+	if (!str)
+		return -EINVAL;
 
-		/*
-		 * Valid negative dentry percentage: 0-10%
-		 */
-		if ((pc >= 0) && (pc <= 10)) {
-			neg_dentry_pc = pc;
-			return 0;
+	while (*str && !err) {
+		if (isdigit(*str)) {
+			err = kstrtoul(str, 0, &pc);
+			if (err)
+				break;
+			/*
+			 * Valid negative dentry percentage: 0-10%
+			 */
+			if ((pc >= 0) && (pc <= 10)) {
+				neg_dentry_pc = pc;
+				while (isxdigit(*str))
+					str++;
+			} else {
+				err = -ERANGE;
+			}
+		} else if (isspace(*str) || (*str == ',')) {
+			str++;
+		} else if (*str && !strncmp("enforce", str, 7)) {
+			str += 7;
+			enforce = true;
+		} else {
+			err = -EINVAL;
 		}
-		err = -ERANGE;
 	}
+	enforce_neg_dentry_limit = enforce;
 	return err;
 }
 early_param("neg_dentry_pc", set_neg_dentry_pc);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 6e06d91..69b8cb3 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -215,7 +215,7 @@ struct dentry_operations {
 #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
 #define DCACHE_ENCRYPTED_WITH_KEY	0x02000000 /* dir is encrypted with a valid key */
 #define DCACHE_OP_REAL			0x04000000
-
+#define DCACHE_KILL_NEGATIVE		0x08000000 /* Kill negative dentry */
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v5 6/6] fs/dcache: Make negative dentry limit enforcement sysctl parameter
  2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
                   ` (4 preceding siblings ...)
  2018-07-02  5:52 ` [PATCH v5 5/6] fs/dcache: Allow optional enforcement of negative dentry limit Waiman Long
@ 2018-07-02  5:52 ` Waiman Long
  2018-07-02 19:34 ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Linus Torvalds
  6 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-02  5:52 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-fsdevel, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Waiman Long

It can be useful to make negative dentry limit enformcement a runtime
tuning parameter instead of just a boot time option. This allows system
administrator to disable limit enforcement in normal use, but turn it
on under certain circumstances.

A new /proc/sys/fs/enforce-neg-dentry-limit sysctl parameter is now
added. This is a boolean flag that accept a value of either 0 or 1
for disabling and enabling enforcement of negative dentry limit
respectively.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/sysctl/fs.txt | 11 +++++++++++
 fs/dcache.c                 |  4 +++-
 include/linux/dcache.h      |  4 ++++
 kernel/sysctl.c             | 11 +++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index a8e3f1f..ef8fd32 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -24,6 +24,7 @@ Currently, these files are in /proc/sys/fs:
 - dentry-state
 - dquot-max
 - dquot-nr
+- enforce-neg-dentry-limit
 - file-max
 - file-nr
 - inode-max
@@ -97,6 +98,16 @@ you might want to raise the limit.
 
 ==============================================================
 
+enforce-neg-dentry-limit:
+
+The file enforce-neg-dentry-limit, if present, contains a boolean
+flag (0 or 1) indicating if the negative dentries limit set by
+the "neg_dentry_pc" kernel parameter should be enforced or not.
+If enforced, excess negative dentries over the limit will be killed
+immediately after use.
+
+==============================================================
+
 file-max & file-nr:
 
 The value in file-max denotes the maximum number of file-
diff --git a/fs/dcache.c b/fs/dcache.c
index 77910c9..f50886e 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -146,7 +146,9 @@ struct dentry_stat_t dentry_stat = {
 	unlikely(!(sb)->s_root || !((sb)->s_flags & MS_ACTIVE))
 
 #ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
-static int enforce_neg_dentry_limit __read_mostly;
+int enforce_neg_dentry_limit __read_mostly;
+EXPORT_SYMBOL_GPL(enforce_neg_dentry_limit);
+
 static int neg_dentry_pc __read_mostly = NEG_DENTRY_PC_DEFAULT;
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 69b8cb3..bd7238a0 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -610,4 +610,8 @@ struct name_snapshot {
 void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
 void release_dentry_name_snapshot(struct name_snapshot *);
 
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+extern int enforce_neg_dentry_limit;
+#endif
+
 #endif	/* __LINUX_DCACHE_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2d9837c..94f6f6c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1849,6 +1849,17 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &one,
 	},
+#ifdef CONFIG_DCACHE_TRACK_NEG_ENTRY
+	{
+		.procname	= "enforce-neg-dentry-limit",
+		.data		= &enforce_neg_dentry_limit,
+		.maxlen		= sizeof(enforce_neg_dentry_limit),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 	{ }
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
                   ` (5 preceding siblings ...)
  2018-07-02  5:52 ` [PATCH v5 6/6] fs/dcache: Make negative dentry limit enforcement sysctl parameter Waiman Long
@ 2018-07-02 19:34 ` Linus Torvalds
  2018-07-02 21:18   ` Andrew Morton
  2018-07-03  0:46   ` Waiman Long
  6 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-07-02 19:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin,C)

On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
>
> A rogue application can potentially create a large number of negative
> dentries in the system consuming most of the memory available if it
> is not under the direct control of a memory controller that enforce
> kernel memory limit.

I certainly don't mind the patch series, but I would like it to be
accompanied with some actual example numbers, just to make it all a
bit more concrete.

Maybe even performance numbers showing "look, I've filled the dentry
lists with nasty negative dentries, now it's all slower because we
walk those less interesting entries".

              Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable
  2018-07-02  5:51 ` [PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable Waiman Long
@ 2018-07-02 21:12   ` Andrew Morton
  2018-07-03  0:59     ` Waiman Long
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2018-07-02 21:12 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, linux-kernel, linux-fsdevel, Linus Torvalds,
	Jan Kara, Paul E. McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C)

On Mon,  2 Jul 2018 13:51:59 +0800 Waiman Long <longman@redhat.com> wrote:

> The negative dentry tracking is made a configurable option so that
> users who don't care about negative dentry tracking will have the
> option to disable it. The new config option DCACHE_TRACK_NEG_ENTRY
> is disabled by default.
> 
> If this option is enabled, a new kernel parameter "neg_dentry_pc=<%>"
> allows users to set the soft limit on how many negative dentries are
> allowed as a percentage of the total system memory. The default is 2%
> and this new parameter accept a range of 0-10% where 0% means there
> is no limit.
> 
> When the soft limit is reached, a warning message will be printed to
> the console to alert the system administrator.

It would be much more convenient if this was tunable at runtime via yet
another /proc knob.  Is there any particular reason why we can't do this?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 19:34 ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Linus Torvalds
@ 2018-07-02 21:18   ` Andrew Morton
  2018-07-02 22:21     ` Matthew Wilcox
                       ` (2 more replies)
  2018-07-03  0:46   ` Waiman Long
  1 sibling, 3 replies; 29+ messages in thread
From: Andrew Morton @ 2018-07-02 21:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, Al Viro, Linux Kernel Mailing List, linux-fsdevel,
	Jan Kara, Paul McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
> >
> > A rogue application can potentially create a large number of negative
> > dentries in the system consuming most of the memory available if it
> > is not under the direct control of a memory controller that enforce
> > kernel memory limit.
> 
> I certainly don't mind the patch series, but I would like it to be
> accompanied with some actual example numbers, just to make it all a
> bit more concrete.
> 
> Maybe even performance numbers showing "look, I've filled the dentry
> lists with nasty negative dentries, now it's all slower because we
> walk those less interesting entries".
> 

(Please cc linux-mm@kvack.org on this work)

Yup.  The description of the user-visible impact of current behavior is
far too vague.

In the [5/6] changelog it is mentioned that a large number of -ve
dentries can lead to oom-killings.  This sounds bad - -ve dentries
should be trivially reclaimable and we shouldn't be oom-killing in such
a situation.

Dumb question: do we know that negative dentries are actually
worthwhile?  Has anyone checked in the past couple of decades?  Perhaps
our lookups are so whizzy nowadays that we don't need them?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 21:18   ` Andrew Morton
@ 2018-07-02 22:21     ` Matthew Wilcox
  2018-07-02 22:31       ` Linus Torvalds
  2018-07-02 22:34     ` James Bottomley
  2018-07-03  1:11     ` Waiman Long
  2 siblings, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2018-07-02 22:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 02, 2018 at 02:18:11PM -0700, Andrew Morton wrote:
> In the [5/6] changelog it is mentioned that a large number of -ve
> dentries can lead to oom-killings.  This sounds bad - -ve dentries
> should be trivially reclaimable and we shouldn't be oom-killing in such
> a situation.
> 
> Dumb question: do we know that negative dentries are actually
> worthwhile?  Has anyone checked in the past couple of decades?  Perhaps
> our lookups are so whizzy nowadays that we don't need them?

I can't believe that's true.  Have you looked at strace of a typical
program startup recently?

$ strace -o ls.out ls 
$ grep -c ENOENT ls.out 
10

There's a few duplicates in there (6 accesses to /etc/ld.so.nohwcap), so
we definitely want those negative entries.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:21     ` Matthew Wilcox
@ 2018-07-02 22:31       ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-07-02 22:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 3:21 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Jul 02, 2018 at 02:18:11PM -0700, Andrew Morton wrote:
> >
> > Dumb question: do we know that negative dentries are actually
> > worthwhile?  Has anyone checked in the past couple of decades?  Perhaps
> > our lookups are so whizzy nowadays that we don't need them?
>
> I can't believe that's true.

Yeah, I'm with Matthew.

Negative dentries are absolutely *critical*. We have a shit-ton of
stuff that walks various PATH-like things, trying to open a file in
one directory after another.

They also happen to be really fundamental to how the dentry cache
itself works, with operations like "rename()" fundamentally depending
on negative dentries.

Sure, that "fundamental to rename()" could still be something that
isn't actually ever *cached*, but the thing about many filesystems is
that it's actually much more expensive to look up a file that doesn't
exist than it is to look up one that does.

That can be true even with things like hashed lookups, although it's
more obviously true with legacy filesystems.

Calling down to the filesystem every time you wonder "do I have a
/usr/local/bin/cat" binary would be absolutely horrid.

Also, honestly, I think the oom-killing thing says more about the
issues we've had with the memory freeing code than about negative
dentries. But the one problem with negative dentries is that it's
fairly easy to create a shit-ton of them, so while we absolutely don't
want to get rid of the concept, I do agree that having a limiter is a
fine fine idea.

                Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 21:18   ` Andrew Morton
  2018-07-02 22:21     ` Matthew Wilcox
@ 2018-07-02 22:34     ` James Bottomley
  2018-07-02 22:54       ` Linus Torvalds
  2018-07-02 23:19       ` Andrew Morton
  2018-07-03  1:11     ` Waiman Long
  2 siblings, 2 replies; 29+ messages in thread
From: James Bottomley @ 2018-07-02 22:34 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds
  Cc: Waiman Long, Al Viro, Linux Kernel Mailing List, linux-fsdevel,
	Jan Kara, Paul McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
> dation.org> wrote:
> 
> > On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
> > wrote:
> > > 
> > > A rogue application can potentially create a large number of
> > > negative
> > > dentries in the system consuming most of the memory available if
> > > it
> > > is not under the direct control of a memory controller that
> > > enforce
> > > kernel memory limit.
> > 
> > I certainly don't mind the patch series, but I would like it to be
> > accompanied with some actual example numbers, just to make it all a
> > bit more concrete.
> > 
> > Maybe even performance numbers showing "look, I've filled the
> > dentry
> > lists with nasty negative dentries, now it's all slower because we
> > walk those less interesting entries".
> > 
> 
> (Please cc linux-mm@kvack.org on this work)
> 
> Yup.  The description of the user-visible impact of current behavior
> is far too vague.
> 
> In the [5/6] changelog it is mentioned that a large number of -ve
> dentries can lead to oom-killings.  This sounds bad - -ve dentries
> should be trivially reclaimable and we shouldn't be oom-killing in
> such a situation.

If you're old enough, it's déjà vu; Andrea went on a negative dentry
rampage about 15 years ago:

https://lkml.org/lkml/2002/5/24/71

I think the summary of the thread is that it's not worth it because
dentries are a clean cache, so they're immediately shrinkable.

> Dumb question: do we know that negative dentries are actually
> worthwhile?  Has anyone checked in the past couple of
> decades?  Perhaps our lookups are so whizzy nowadays that we don't
> need them?

There are still a lot of applications that keep looking up non-existent 
files, so I think it's still beneficial to keep them.  Apparently
apache still looks for a .htaccess file in every directory it
traverses, for instance.  Round tripping every one of these to disk
instead of caching it as a negative dentry would seem to be a
performance loser here.

However, actually measuring this again might be useful.

James

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:34     ` James Bottomley
@ 2018-07-02 22:54       ` Linus Torvalds
  2018-07-02 23:03         ` Linus Torvalds
  2018-07-02 23:19       ` Andrew Morton
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2018-07-02 22:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 3:34 PM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> There are still a lot of applications that keep looking up non-existent
> files, so I think it's still beneficial to keep them.  Apparently
> apache still looks for a .htaccess file in every directory it
> traverses, for instance.

.. or git looking for ".gitignore" files in every directory, or any
number of similar things.

Lookie here, for example:

  [torvalds@i7 linux]$ strace -e trace=%file -c git status
  On branch master
  Your branch is up to date with 'origin/master'.

  nothing to commit, working tree clean
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   73.23    0.009066           2      4056         6 open
   23.33    0.002888           2      1294      1189 openat
    1.60    0.000198          13        15         8 access
    0.80    0.000099           2        36        31 lstat
    0.53    0.000066           1        40         6 stat
    0.27    0.000033           8         4           getcwd
    0.11    0.000014          14         1           execve
    0.11    0.000014          14         1           chdir
    0.02    0.000003           3         1         1 readlink
    0.00    0.000000           0         1           unlink
  ------ ----------- ----------- --------- --------- ----------------
  100.00    0.012381                  5449      1241 total

so almost a quarter (1241 of 5449) of the file accesses resulted in
errors (and I think they are all ENOENT).

Negative lookups are *not* some unusual thing.

                   Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:54       ` Linus Torvalds
@ 2018-07-02 23:03         ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-07-02 23:03 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 3:54 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Lookie here, for example:
>
>   [torvalds@i7 linux]$ strace -e trace=%file -c git status

So in the name of honestly, that's slightly misleading.

"git" will happily thread the actual index file up-to-date testing.

And that's hidden in the above numbers (because I didn't use "-f" to
follow threads), and they are all successful (because git will go an
'lstat()' on every single entry in the index file, and the index file
obviously is all valid filenames).

So the numbers quoted are closer to the

        git ls-files -o --exclude-standard

command (which doesn't check the index state, it only checks "what
non-tracked files do I have that aren't the ones I explicitly
exclude").

            Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 22:34     ` James Bottomley
  2018-07-02 22:54       ` Linus Torvalds
@ 2018-07-02 23:19       ` Andrew Morton
  2018-07-02 23:28         ` Linus Torvalds
                           ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Andrew Morton @ 2018-07-02 23:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linus Torvalds, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, 02 Jul 2018 15:34:40 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
> > On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
> > dation.org> wrote:
> > 
> > > On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
> > > wrote:
> > > > 
> > > > A rogue application can potentially create a large number of
> > > > negative
> > > > dentries in the system consuming most of the memory available if
> > > > it
> > > > is not under the direct control of a memory controller that
> > > > enforce
> > > > kernel memory limit.
> > > 
> > > I certainly don't mind the patch series, but I would like it to be
> > > accompanied with some actual example numbers, just to make it all a
> > > bit more concrete.
> > > 
> > > Maybe even performance numbers showing "look, I've filled the
> > > dentry
> > > lists with nasty negative dentries, now it's all slower because we
> > > walk those less interesting entries".
> > > 
> > 
> > (Please cc linux-mm@kvack.org on this work)
> > 
> > Yup.  The description of the user-visible impact of current behavior
> > is far too vague.
> > 
> > In the [5/6] changelog it is mentioned that a large number of -ve
> > dentries can lead to oom-killings.  This sounds bad - -ve dentries
> > should be trivially reclaimable and we shouldn't be oom-killing in
> > such a situation.
> 
> If you're old enough, it's déjà vu; Andrea went on a negative dentry
> rampage about 15 years ago:
> 
> https://lkml.org/lkml/2002/5/24/71

That's kinda funny.

> I think the summary of the thread is that it's not worth it because
> dentries are a clean cache, so they're immediately shrinkable.

Yes, "should be".  I could understand that the presence of huge
nunmbers of -ve dentries could result in undesirable reclaim of
pagecache, etc.  Triggering oom-killings is very bad, and presumably
has the same cause.

Before we go and add a large amount of code to do the shrinker's job
for it, we should get a full understanding of what's going wrong.  Is
it because the dentry_lru had a mixture of +ve and -ve dentries? 
Should we have a separate LRU for -ve dentries?  Are we appropriately
aging the various dentries?  etc.

It could be that tuning/fixing the current code will fix whatever
problems inspired this patchset.

> > Dumb question: do we know that negative dentries are actually
> > worthwhile?  Has anyone checked in the past couple of
> > decades?  Perhaps our lookups are so whizzy nowadays that we don't
> > need them?
> 
> There are still a lot of applications that keep looking up non-existent 
> files, so I think it's still beneficial to keep them.  Apparently
> apache still looks for a .htaccess file in every directory it
> traverses, for instance.  Round tripping every one of these to disk
> instead of caching it as a negative dentry would seem to be a
> performance loser here.
> 
> However, actually measuring this again might be useful.

Yup.  I don't know how hard it would be to disable the -ve dentries
(the rename thing makes it sounds harder than I expected) but having
real numbers to justify continuing presence might be a fun project for
someone.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 23:19       ` Andrew Morton
@ 2018-07-02 23:28         ` Linus Torvalds
  2018-07-03  1:38         ` Waiman Long
  2018-07-03  9:18         ` Jan Kara
  2 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-07-02 23:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, Waiman Long, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon, Jul 2, 2018 at 4:19 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Before we go and add a large amount of code to do the shrinker's job
> for it, we should get a full understanding of what's going wrong.  Is
> it because the dentry_lru had a mixture of +ve and -ve dentries?
> Should we have a separate LRU for -ve dentries?  Are we appropriately
> aging the various dentries?  etc.
>
> It could be that tuning/fixing the current code will fix whatever
> problems inspired this patchset.

So I do think that the shrinker is likely the culprit behind the oom
issues. I think it's likely worse when you try to do some kind of
containerization, and dentries are shared.

That said, I think there are likely good reasons to limit excessive
negative dentries even outside the oom issue. Even if we did a perfect
job at shrinking them and took no time at all doing so, the fact that
you can generate an effecitvely infinite amount of negative dentries
and then polluting the dentry hash chains with them _could_ be a
performance problem.

No sane application does that, and we handle the "obvious" cases
already: ie if you create a lot of files in a deep subdirectory and
then do "rm -rf dir", we *will* throw the negative dentries away as we
remove the directories they are in. So it is unlikely to be much of a
problem in practice. But at least in theory you can generate many
millions of negative dentries just to mess with the system, and slow
down good people.

Probably not even remotely to the point of a DoS attack, but certainly
to the point of "we're wasting time".

So I do think that restricting negative dentries is a fine concept.
They are useful, but that doesn't mean that it makes sense to fill
memory with them.
                 Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 19:34 ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Linus Torvalds
  2018-07-02 21:18   ` Andrew Morton
@ 2018-07-03  0:46   ` Waiman Long
  1 sibling, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-03  0:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin,C)

On 07/03/2018 03:34 AM, Linus Torvalds wrote:
> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
>> A rogue application can potentially create a large number of negative
>> dentries in the system consuming most of the memory available if it
>> is not under the direct control of a memory controller that enforce
>> kernel memory limit.
> I certainly don't mind the patch series, but I would like it to be
> accompanied with some actual example numbers, just to make it all a
> bit more concrete.
>
> Maybe even performance numbers showing "look, I've filled the dentry
> lists with nasty negative dentries, now it's all slower because we
> walk those less interesting entries".
>
>               Linus

I did have performance numbers in the previous version of the patchset.
I will rerun the performance test and post the numbers later on.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable
  2018-07-02 21:12   ` Andrew Morton
@ 2018-07-03  0:59     ` Waiman Long
  0 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-03  0:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, linux-kernel, linux-fsdevel, Linus Torvalds,
	Jan Kara, Paul E. McKenney, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C)

On 07/03/2018 05:12 AM, Andrew Morton wrote:
> On Mon,  2 Jul 2018 13:51:59 +0800 Waiman Long <longman@redhat.com> wrote:
>
>> The negative dentry tracking is made a configurable option so that
>> users who don't care about negative dentry tracking will have the
>> option to disable it. The new config option DCACHE_TRACK_NEG_ENTRY
>> is disabled by default.
>>
>> If this option is enabled, a new kernel parameter "neg_dentry_pc=<%>"
>> allows users to set the soft limit on how many negative dentries are
>> allowed as a percentage of the total system memory. The default is 2%
>> and this new parameter accept a range of 0-10% where 0% means there
>> is no limit.
>>
>> When the soft limit is reached, a warning message will be printed to
>> the console to alert the system administrator.
> It would be much more convenient if this was tunable at runtime via yet
> another /proc knob.  Is there any particular reason why we can't do this?
>
The percpu accounting of negative dentries cannot be dynamically turn on
and off or the count won't be accurate. Fortunately that part shouldn't
introduce any noticeable overhead. Everything else can be dynamically
turn on or off, if desired. I will look into making this patchset more
dynamic in the next version.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 21:18   ` Andrew Morton
  2018-07-02 22:21     ` Matthew Wilcox
  2018-07-02 22:34     ` James Bottomley
@ 2018-07-03  1:11     ` Waiman Long
  2018-07-03 13:48       ` Vlastimil Babka
  2 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2018-07-03  1:11 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/03/2018 05:18 AM, Andrew Morton wrote:
> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
>>> A rogue application can potentially create a large number of negative
>>> dentries in the system consuming most of the memory available if it
>>> is not under the direct control of a memory controller that enforce
>>> kernel memory limit.
>> I certainly don't mind the patch series, but I would like it to be
>> accompanied with some actual example numbers, just to make it all a
>> bit more concrete.
>>
>> Maybe even performance numbers showing "look, I've filled the dentry
>> lists with nasty negative dentries, now it's all slower because we
>> walk those less interesting entries".
>>
> (Please cc linux-mm@kvack.org on this work)
>
> Yup.  The description of the user-visible impact of current behavior is
> far too vague.
>
> In the [5/6] changelog it is mentioned that a large number of -ve
> dentries can lead to oom-killings.  This sounds bad - -ve dentries
> should be trivially reclaimable and we shouldn't be oom-killing in such
> a situation.

The OOM situation was observed in an older distro kernel. It may not be
the case with the upstream kernel. I will double check that.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 23:19       ` Andrew Morton
  2018-07-02 23:28         ` Linus Torvalds
@ 2018-07-03  1:38         ` Waiman Long
  2018-07-03  9:18         ` Jan Kara
  2 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-03  1:38 UTC (permalink / raw)
  To: Andrew Morton, James Bottomley
  Cc: Linus Torvalds, Al Viro, Linux Kernel Mailing List,
	linux-fsdevel, Jan Kara, Paul McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/03/2018 07:19 AM, Andrew Morton wrote:
> On Mon, 02 Jul 2018 15:34:40 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
>> On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
>>> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
>>> dation.org> wrote:
>>>
>>>> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
>>>> wrote:
>>>>> A rogue application can potentially create a large number of
>>>>> negative
>>>>> dentries in the system consuming most of the memory available if
>>>>> it
>>>>> is not under the direct control of a memory controller that
>>>>> enforce
>>>>> kernel memory limit.
>>>> I certainly don't mind the patch series, but I would like it to be
>>>> accompanied with some actual example numbers, just to make it all a
>>>> bit more concrete.
>>>>
>>>> Maybe even performance numbers showing "look, I've filled the
>>>> dentry
>>>> lists with nasty negative dentries, now it's all slower because we
>>>> walk those less interesting entries".
>>>>
>>> (Please cc linux-mm@kvack.org on this work)
>>>
>>> Yup.  The description of the user-visible impact of current behavior
>>> is far too vague.
>>>
>>> In the [5/6] changelog it is mentioned that a large number of -ve
>>> dentries can lead to oom-killings.  This sounds bad - -ve dentries
>>> should be trivially reclaimable and we shouldn't be oom-killing in
>>> such a situation.
>> If you're old enough, it's déjà vu; Andrea went on a negative dentry
>> rampage about 15 years ago:
>>
>> https://lkml.org/lkml/2002/5/24/71
> That's kinda funny.
>
>> I think the summary of the thread is that it's not worth it because
>> dentries are a clean cache, so they're immediately shrinkable.
> Yes, "should be".  I could understand that the presence of huge
> nunmbers of -ve dentries could result in undesirable reclaim of
> pagecache, etc.  Triggering oom-killings is very bad, and presumably
> has the same cause.
>
> Before we go and add a large amount of code to do the shrinker's job
> for it, we should get a full understanding of what's going wrong.  Is
> it because the dentry_lru had a mixture of +ve and -ve dentries? 
> Should we have a separate LRU for -ve dentries?  Are we appropriately
> aging the various dentries?  etc.

I have actually investigated having a separate LRU for negative
dentries. That will result in a far more invasive patch that will be
more disruptive.

Another change that was suggested by a colleague is to put a newly
created -ve dentry to the tail (LRU end) of the LRU and move it to the
head only if it is accessed a second time. That will put most of the
negative dentries at the tail that will be more easily trimmed away.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-02 23:19       ` Andrew Morton
  2018-07-02 23:28         ` Linus Torvalds
  2018-07-03  1:38         ` Waiman Long
@ 2018-07-03  9:18         ` Jan Kara
  2018-07-14 17:35           ` Pavel Machek
  2 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2018-07-03  9:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, Linus Torvalds, Waiman Long, Al Viro,
	Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Mon 02-07-18 16:19:25, Andrew Morton wrote:
> On Mon, 02 Jul 2018 15:34:40 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > On Mon, 2018-07-02 at 14:18 -0700, Andrew Morton wrote:
> > > On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foun
> > > dation.org> wrote:
> > > 
> > > > On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com>
> > > > wrote:
> > > > > 
> > > > > A rogue application can potentially create a large number of
> > > > > negative
> > > > > dentries in the system consuming most of the memory available if
> > > > > it
> > > > > is not under the direct control of a memory controller that
> > > > > enforce
> > > > > kernel memory limit.
> > > > 
> > > > I certainly don't mind the patch series, but I would like it to be
> > > > accompanied with some actual example numbers, just to make it all a
> > > > bit more concrete.
> > > > 
> > > > Maybe even performance numbers showing "look, I've filled the
> > > > dentry
> > > > lists with nasty negative dentries, now it's all slower because we
> > > > walk those less interesting entries".
> > > > 
> > > 
> > > (Please cc linux-mm@kvack.org on this work)
> > > 
> > > Yup.��The description of the user-visible impact of current behavior
> > > is far too vague.
> > > 
> > > In the [5/6] changelog it is mentioned that a large number of -ve
> > > dentries can lead to oom-killings.��This sounds bad - -ve dentries
> > > should be trivially reclaimable and we shouldn't be oom-killing in
> > > such a situation.
> > 
> > If you're old enough, it's d�j� vu; Andrea went on a negative dentry
> > rampage about 15 years ago:
> > 
> > https://lkml.org/lkml/2002/5/24/71
> 
> That's kinda funny.
> 
> > I think the summary of the thread is that it's not worth it because
> > dentries are a clean cache, so they're immediately shrinkable.
> 
> Yes, "should be".  I could understand that the presence of huge
> nunmbers of -ve dentries could result in undesirable reclaim of
> pagecache, etc.  Triggering oom-killings is very bad, and presumably
> has the same cause.
> 
> Before we go and add a large amount of code to do the shrinker's job
> for it, we should get a full understanding of what's going wrong.  Is
> it because the dentry_lru had a mixture of +ve and -ve dentries? 
> Should we have a separate LRU for -ve dentries?  Are we appropriately
> aging the various dentries?  etc.
> 
> It could be that tuning/fixing the current code will fix whatever
> problems inspired this patchset.

What I think is contributing to the problems and could lead to reclaim
oddities is the internal fragmentation of dentry slab cache. Dentries are
relatively small, you get 21 per page on my system, so if trivial to
reclaim negative dentries get mixed with a small amount of unreclaimable
positive dentries, you can get a lot of pages in dentry slab cache that are
unreclaimable.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-03  1:11     ` Waiman Long
@ 2018-07-03 13:48       ` Vlastimil Babka
  0 siblings, 0 replies; 29+ messages in thread
From: Vlastimil Babka @ 2018-07-03 13:48 UTC (permalink / raw)
  To: Waiman Long, Andrew Morton, Linus Torvalds
  Cc: Al Viro, Linux Kernel Mailing List, linux-fsdevel, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/03/2018 03:11 AM, Waiman Long wrote:
> On 07/03/2018 05:18 AM, Andrew Morton wrote:
>> On Mon, 2 Jul 2018 12:34:00 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>> On Sun, Jul 1, 2018 at 10:52 PM Waiman Long <longman@redhat.com> wrote:
>>>> A rogue application can potentially create a large number of negative
>>>> dentries in the system consuming most of the memory available if it
>>>> is not under the direct control of a memory controller that enforce
>>>> kernel memory limit.
>>> I certainly don't mind the patch series, but I would like it to be
>>> accompanied with some actual example numbers, just to make it all a
>>> bit more concrete.
>>>
>>> Maybe even performance numbers showing "look, I've filled the dentry
>>> lists with nasty negative dentries, now it's all slower because we
>>> walk those less interesting entries".
>>>
>> (Please cc linux-mm@kvack.org on this work)
>>
>> Yup.  The description of the user-visible impact of current behavior is
>> far too vague.
>>
>> In the [5/6] changelog it is mentioned that a large number of -ve
>> dentries can lead to oom-killings.  This sounds bad - -ve dentries
>> should be trivially reclaimable and we shouldn't be oom-killing in such
>> a situation.
> 
> The OOM situation was observed in an older distro kernel. It may not be
> the case with the upstream kernel. I will double check that.

Note that dentries with externally allocated (long) names might have
been the factor here, until recent commits f1782c9bc547 ("dcache:
account external names as indirectly reclaimable memory") and
d79f7aa496fc ("mm: treat indirectly reclaimable memory as free in
overcommit logic").

Vlastimil

> Cheers,
> Longman
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-03  9:18         ` Jan Kara
@ 2018-07-14 17:35           ` Pavel Machek
  2018-07-14 18:00             ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2018-07-14 17:35 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, James Bottomley, Linus Torvalds, Waiman Long,
	Al Viro, Linux Kernel Mailing List, linux-fsdevel, Paul McKenney,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	Wangkai (Kevin,C),
	linux-mm, Michal Hocko

> > Yes, "should be".  I could understand that the presence of huge
> > nunmbers of -ve dentries could result in undesirable reclaim of
> > pagecache, etc.  Triggering oom-killings is very bad, and presumably
> > has the same cause.
> > 
> > Before we go and add a large amount of code to do the shrinker's job
> > for it, we should get a full understanding of what's going wrong.  Is
> > it because the dentry_lru had a mixture of +ve and -ve dentries? 
> > Should we have a separate LRU for -ve dentries?  Are we appropriately
> > aging the various dentries?  etc.
> > 
> > It could be that tuning/fixing the current code will fix whatever
> > problems inspired this patchset.
> 
> What I think is contributing to the problems and could lead to reclaim
> oddities is the internal fragmentation of dentry slab cache. Dentries are
> relatively small, you get 21 per page on my system, so if trivial to
> reclaim negative dentries get mixed with a small amount of unreclaimable
> positive dentries, you can get a lot of pages in dentry slab cache that are
> unreclaimable.

Could we allocate -ve entries from separate slab?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 17:35           ` Pavel Machek
@ 2018-07-14 18:00             ` Linus Torvalds
  2018-07-14 18:34               ` Al Viro
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2018-07-14 18:00 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jan Kara, Andrew Morton, James Bottomley, Waiman Long, Al Viro,
	Linux Kernel Mailing List, linux-fsdevel, Paul McKenney,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
>
> Could we allocate -ve entries from separate slab?

No, because negative dentrires don't stay negative.

Every single positive dentry starts out as a negative dentry that is
passed in to "lookup()" to maybe be made positive.

And most of the time they <i>do</i> turn positive, because most of the
time people actually open files that exist.

But then occasionally you don't, because you're just blindly opening a
filename whether it exists or not (to _check_ whether it's there).

              Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:00             ` Linus Torvalds
@ 2018-07-14 18:34               ` Al Viro
  2018-07-14 18:36                 ` Al Viro
  2018-07-18 16:01                 ` Waiman Long
  0 siblings, 2 replies; 29+ messages in thread
From: Al Viro @ 2018-07-14 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Waiman Long, Linux Kernel Mailing List, linux-fsdevel,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 11:00:32AM -0700, Linus Torvalds wrote:
> On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
> >
> > Could we allocate -ve entries from separate slab?
> 
> No, because negative dentrires don't stay negative.
> 
> Every single positive dentry starts out as a negative dentry that is
> passed in to "lookup()" to maybe be made positive.
> 
> And most of the time they <i>do</i> turn positive, because most of the
> time people actually open files that exist.
> 
> But then occasionally you don't, because you're just blindly opening a
> filename whether it exists or not (to _check_ whether it's there).

BTW, one point that might not be realized by everyone: negative dentries
are *not* the hard case.
mount -t tmpfs none /mnt
touch /mnt/a
for i in `seq 100000`; do ln /mnt/a /mnt/$i; done

and you've got 100000 *unevictable* dentries, with the time per iteration
being not all that high (especially if you just call link(2) in a loop).
They are all positive and all pinned.  And you've got only one inode
there and no persistently opened files, so rlimit and quota won't help
any.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:34               ` Al Viro
@ 2018-07-14 18:36                 ` Al Viro
  2018-07-14 18:43                   ` Linus Torvalds
  2018-07-18 16:01                 ` Waiman Long
  1 sibling, 1 reply; 29+ messages in thread
From: Al Viro @ 2018-07-14 18:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Waiman Long, Linux Kernel Mailing List, linux-fsdevel,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 07:34:45PM +0100, Al Viro wrote:
> On Sat, Jul 14, 2018 at 11:00:32AM -0700, Linus Torvalds wrote:
> > On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
> > >
> > > Could we allocate -ve entries from separate slab?
> > 
> > No, because negative dentrires don't stay negative.
> > 
> > Every single positive dentry starts out as a negative dentry that is
> > passed in to "lookup()" to maybe be made positive.
> > 
> > And most of the time they <i>do</i> turn positive, because most of the
> > time people actually open files that exist.
> > 
> > But then occasionally you don't, because you're just blindly opening a
> > filename whether it exists or not (to _check_ whether it's there).
> 
> BTW, one point that might not be realized by everyone: negative dentries
> are *not* the hard case.
> mount -t tmpfs none /mnt
> touch /mnt/a
> for i in `seq 100000`; do ln /mnt/a /mnt/$i; done
> 
> and you've got 100000 *unevictable* dentries, with the time per iteration
> being not all that high (especially if you just call link(2) in a loop).
> They are all positive and all pinned.  And you've got only one inode
> there and no persistently opened files, so rlimit and quota won't help
> any.

OK, this
        /*   
         * No ordinary (disk based) filesystem counts links as inodes;
         * but each new link needs a new dentry, pinning lowmem, and
         * tmpfs dentries cannot be pruned until they are unlinked.
         */
        ret = shmem_reserve_inode(inode->i_sb);
        if (ret)
                goto out;
will probably help (on ramfs it won't, though).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:36                 ` Al Viro
@ 2018-07-14 18:43                   ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-07-14 18:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Waiman Long, Linux Kernel Mailing List, linux-fsdevel,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Matthew Wilcox,
	Larry Woodman, Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On Sat, Jul 14, 2018 at 11:36 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> OK, this
>         /*
>          * No ordinary (disk based) filesystem counts links as inodes;
>          * but each new link needs a new dentry, pinning lowmem, and
>          * tmpfs dentries cannot be pruned until they are unlinked.
>          */
>         ret = shmem_reserve_inode(inode->i_sb);
>         if (ret)
>                 goto out;
> will probably help (on ramfs it won't, though).

Nobody who cares about memory use would use ramfs and then allow
random users on it.

I think you can exhaust memory more easily on ramfs by just writing a
huge file. Do we have any limits at all?

ramfs is fine for things like initramfs, but I think the comment says it all:

 * NOTE! This filesystem is probably most useful
 * not as a real filesystem, but as an example of
 * how virtual filesystems can be written.

and even that comment may have been more correct back in 2000 than it is today.

             Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries
  2018-07-14 18:34               ` Al Viro
  2018-07-14 18:36                 ` Al Viro
@ 2018-07-18 16:01                 ` Waiman Long
  1 sibling, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-07-18 16:01 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Pavel Machek, Jan Kara, Andrew Morton, James Bottomley,
	Linux Kernel Mailing List, linux-fsdevel, Paul McKenney,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	Wangkai (Kevin,C),
	linux-mm, Michal Hocko

On 07/14/2018 02:34 PM, Al Viro wrote:
> On Sat, Jul 14, 2018 at 11:00:32AM -0700, Linus Torvalds wrote:
>> On Sat, Jul 14, 2018 at 10:35 AM Pavel Machek <pavel@ucw.cz> wrote:
>>> Could we allocate -ve entries from separate slab?
>> No, because negative dentrires don't stay negative.
>>
>> Every single positive dentry starts out as a negative dentry that is
>> passed in to "lookup()" to maybe be made positive.
>>
>> And most of the time they <i>do</i> turn positive, because most of the
>> time people actually open files that exist.
>>
>> But then occasionally you don't, because you're just blindly opening a
>> filename whether it exists or not (to _check_ whether it's there).
> BTW, one point that might not be realized by everyone: negative dentries
> are *not* the hard case.
> mount -t tmpfs none /mnt
> touch /mnt/a
> for i in `seq 100000`; do ln /mnt/a /mnt/$i; done
>
> and you've got 100000 *unevictable* dentries, with the time per iteration
> being not all that high (especially if you just call link(2) in a loop).
> They are all positive and all pinned.  And you've got only one inode
> there and no persistently opened files, so rlimit and quota won't help
> any.

Normally you need to be root or have privileges to mount a filesystem.
Right?

I am aware there is effort going on to allow non-privilege user mount in
container. That can open a can of worms if it is not done properly.

With privileges, there is a lot of ways one can screw up the system. So
I am not less concern about this particular issue.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-07-18 16:01 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-02  5:51 [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Waiman Long
2018-07-02  5:51 ` [PATCH v5 1/6] fs/dcache: Track & report number " Waiman Long
2018-07-02  5:51 ` [PATCH v5 2/6] fs/dcache: Make negative dentry tracking configurable Waiman Long
2018-07-02 21:12   ` Andrew Morton
2018-07-03  0:59     ` Waiman Long
2018-07-02  5:52 ` [PATCH v5 3/6] fs/dcache: Enable automatic pruning of negative dentries Waiman Long
2018-07-02  5:52 ` [PATCH v5 4/6] fs/dcache: Spread negative dentry pruning across multiple CPUs Waiman Long
2018-07-02  5:52 ` [PATCH v5 5/6] fs/dcache: Allow optional enforcement of negative dentry limit Waiman Long
2018-07-02  5:52 ` [PATCH v5 6/6] fs/dcache: Make negative dentry limit enforcement sysctl parameter Waiman Long
2018-07-02 19:34 ` [PATCH v5 0/6] fs/dcache: Track & limit # of negative dentries Linus Torvalds
2018-07-02 21:18   ` Andrew Morton
2018-07-02 22:21     ` Matthew Wilcox
2018-07-02 22:31       ` Linus Torvalds
2018-07-02 22:34     ` James Bottomley
2018-07-02 22:54       ` Linus Torvalds
2018-07-02 23:03         ` Linus Torvalds
2018-07-02 23:19       ` Andrew Morton
2018-07-02 23:28         ` Linus Torvalds
2018-07-03  1:38         ` Waiman Long
2018-07-03  9:18         ` Jan Kara
2018-07-14 17:35           ` Pavel Machek
2018-07-14 18:00             ` Linus Torvalds
2018-07-14 18:34               ` Al Viro
2018-07-14 18:36                 ` Al Viro
2018-07-14 18:43                   ` Linus Torvalds
2018-07-18 16:01                 ` Waiman Long
2018-07-03  1:11     ` Waiman Long
2018-07-03 13:48       ` Vlastimil Babka
2018-07-03  0:46   ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).