linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] fs/dcache: Track # of negative dentries
@ 2018-08-28 17:19 Waiman Long
  2018-08-28 17:19 ` [PATCH 1/2] fs/dcache: Track & report number " Waiman Long
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-28 17:19 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-doc,
	Luis R. Rodriguez, Kees Cook, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Michal Hocko, Waiman Long

This patchset is a reduced scope version of the
patchset "fs/dcache: Track & limit # of negative dentries"
(https://lkml.org/lkml/2018/7/12/586). Only the first 2 patches are
included to track the number of negative dentries in the system as well
as making negative dentries more easily reclaimed than positive ones.

There are controversies on limiting number of negative dentries as it may
make negative dentries special in term of how memory resources are to
be managed in the kernel. However, I don't believe I heard any concern
about tracking the number of negative dentries in the system. So it is
better to separate that out and get it done with. We can deal with the
controversial part later on.

Patch 1 adds tracking to the number of negative dentries in the LRU list.

Patch 2 makes negative dentries to be added at the head end of the LRU
list so that they are first to go when a shrinker is running if those
negative dentries are never referenced again.

Waiman Long (2):
  fs/dcache: Track & report number of negative dentries
  fs/dcache: Make negative dentries easier to be reclaimed

 Documentation/sysctl/fs.txt | 19 ++++++++++-----
 fs/dcache.c                 | 56 ++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/dcache.h      |  8 ++++---
 include/linux/list_lru.h    | 17 ++++++++++++++
 mm/list_lru.c               | 16 +++++++++++--
 5 files changed, 104 insertions(+), 12 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/2] fs/dcache: Track & report number of negative dentries
  2018-08-28 17:19 [PATCH 0/2] fs/dcache: Track # of negative dentries Waiman Long
@ 2018-08-28 17:19 ` Waiman Long
  2018-08-29  0:11   ` Dave Chinner
  2018-08-28 17:19 ` [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed Waiman Long
  2018-08-28 22:50 ` [PATCH 0/2] fs/dcache: Track # of negative dentries Andrew Morton
  2 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2018-08-28 17:19 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-doc,
	Luis R. Rodriguez, Kees Cook, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Michal Hocko, Waiman Long

The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.

As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the /proc/sys/fs/dentry-state
file. The number, however, does not include negative dentries that are
in flight but not in the LRU yet.

The number of positive dentries in the LRU lists can be roughly found
by subtracting the number of negative dentries from the total.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/sysctl/fs.txt | 19 +++++++++++++------
 fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h      |  7 ++++---
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 819caf8..118bb93 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -63,19 +63,26 @@ struct {
         int nr_unused;
         int age_limit;         /* age in seconds */
         int want_pages;        /* pages requested by system */
-        int dummy[2];
+        int nr_negative;       /* # of unused negative dentries */
+        int dummy;
 } dentry_stat = {0, 0, 45, 0,};
--------------------------------------------------------------- 
+--------------------------------------------------------------
+
+Dentries are dynamically allocated and deallocated.
+
+nr_dentry shows the total number of dentries allocated (active
++ unused). nr_unused shows the number of dentries that are not
+actively used, but are saved in the LRU list for future reuse.
 
-Dentries are dynamically allocated and deallocated, and
-nr_dentry seems to be 0 all the time. Hence it's safe to
-assume that only nr_unused, age_limit and want_pages are
-used. Nr_unused seems to be exactly what its name says.
 Age_limit is the age in seconds after which dcache entries
 can be reclaimed when memory is short and want_pages is
 nonzero when shrink_dcache_pages() has been called and the
 dcache isn't pruned yet.
 
+nr_negative shows the number of unused dentries that are also
+negative dentries which do not mapped to actual files if negative
+dentries tracking is enabled.
+
 ==============================================================
 
 dquot-max & dquot-nr:
diff --git a/fs/dcache.c b/fs/dcache.c
index 2e7e8d8..69f5541 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -119,6 +119,7 @@ struct dentry_stat_t dentry_stat = {
 
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
+static DEFINE_PER_CPU(long, nr_dentry_neg);
 
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
@@ -152,11 +153,22 @@ static long get_nr_dentry_unused(void)
 	return sum < 0 ? 0 : sum;
 }
 
+static long get_nr_dentry_neg(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(nr_dentry_neg, i);
+	return sum < 0 ? 0 : sum;
+}
+
 int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer,
 		   size_t *lenp, loff_t *ppos)
 {
 	dentry_stat.nr_dentry = get_nr_dentry();
 	dentry_stat.nr_unused = get_nr_dentry_unused();
+	dentry_stat.nr_negative = get_nr_dentry_neg();
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 #endif
@@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
 
 #endif
 
+static inline void __neg_dentry_dec(struct dentry *dentry)
+{
+	this_cpu_dec(nr_dentry_neg);
+}
+
+static inline void neg_dentry_dec(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_dec(dentry);
+}
+
+static inline void __neg_dentry_inc(struct dentry *dentry)
+{
+	this_cpu_inc(nr_dentry_neg);
+}
+
+static inline void neg_dentry_inc(struct dentry *dentry)
+{
+	if (unlikely(d_is_negative(dentry)))
+		__neg_dentry_inc(dentry);
+}
+
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -331,6 +365,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_inc(dentry);
 }
 
 static void dentry_free(struct dentry *dentry)
@@ -395,6 +431,7 @@ static void d_lru_add(struct dentry *dentry)
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -403,6 +440,7 @@ static void d_lru_del(struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	WARN_ON_ONCE(!list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+	neg_dentry_dec(dentry);
 }
 
 static void d_shrink_del(struct dentry *dentry)
@@ -433,6 +471,7 @@ static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
 	dentry->d_flags &= ~DCACHE_LRU_LIST;
 	this_cpu_dec(nr_dentry_unused);
 	list_lru_isolate(lru, &dentry->d_lru);
+	neg_dentry_dec(dentry);
 }
 
 static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
@@ -441,6 +480,7 @@ static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
 	D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
 	dentry->d_flags |= DCACHE_SHRINK_LIST;
 	list_lru_isolate_move(lru, &dentry->d_lru, list);
+	neg_dentry_dec(dentry);
 }
 
 /**
@@ -1840,6 +1880,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));
 
 	spin_lock(&dentry->d_lock);
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (dentry->d_flags & DCACHE_LRU_LIST)
+		__neg_dentry_dec(dentry);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index ef4b70f..df942e5 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -62,9 +62,10 @@ struct qstr {
 struct dentry_stat_t {
 	long nr_dentry;
 	long nr_unused;
-	long age_limit;          /* age in seconds */
-	long want_pages;         /* pages requested by system */
-	long dummy[2];
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long nr_negative;	/* # of unused negative dentries */
+	long dummy;
 };
 extern struct dentry_stat_t dentry_stat;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 17:19 [PATCH 0/2] fs/dcache: Track # of negative dentries Waiman Long
  2018-08-28 17:19 ` [PATCH 1/2] fs/dcache: Track & report number " Waiman Long
@ 2018-08-28 17:19 ` Waiman Long
  2018-08-28 22:13   ` Matthew Wilcox
                     ` (3 more replies)
  2018-08-28 22:50 ` [PATCH 0/2] fs/dcache: Track # of negative dentries Andrew Morton
  2 siblings, 4 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-28 17:19 UTC (permalink / raw)
  To: Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-doc,
	Luis R. Rodriguez, Kees Cook, Linus Torvalds, Jan Kara,
	Paul E. McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Michal Hocko, Waiman Long

For negative dentries that are accessed once and never used again, they
should be removed first before other dentries when shrinker is running.
This is done by putting negative dentries at the head of the LRU list
instead at the tail.

A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
is initially created. When such a dentry is added to the LRU, it will be
added to the head so that it will be the first to go when a shrinker is
running if it is never accessed again (DCACHE_REFERENCED bit not set).
The flag is cleared after the LRU list addition.

Suggested-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/dcache.c              | 25 +++++++++++++++++--------
 include/linux/dcache.h   |  1 +
 include/linux/list_lru.h | 17 +++++++++++++++++
 mm/list_lru.c            | 16 ++++++++++++++--
 4 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 69f5541..ab6a4cf 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -242,12 +242,6 @@ static inline void __neg_dentry_inc(struct dentry *dentry)
 	this_cpu_inc(nr_dentry_neg);
 }
 
-static inline void neg_dentry_inc(struct dentry *dentry)
-{
-	if (unlikely(d_is_negative(dentry)))
-		__neg_dentry_inc(dentry);
-}
-
 static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
 {
 	/*
@@ -353,7 +347,7 @@ static inline void __d_set_inode_and_type(struct dentry *dentry,
 
 	dentry->d_inode = inode;
 	flags = READ_ONCE(dentry->d_flags);
-	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
+	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU | DCACHE_NEW_NEGATIVE);
 	flags |= type_flags;
 	WRITE_ONCE(dentry->d_flags, flags);
 }
@@ -430,8 +424,20 @@ static void d_lru_add(struct dentry *dentry)
 	D_FLAG_VERIFY(dentry, 0);
 	dentry->d_flags |= DCACHE_LRU_LIST;
 	this_cpu_inc(nr_dentry_unused);
+	if (d_is_negative(dentry)) {
+		__neg_dentry_inc(dentry);
+		if (dentry->d_flags & DCACHE_NEW_NEGATIVE) {
+			/*
+			 * Add the negative dentry to the head once, it
+			 * will be added to the tail next time.
+			 */
+			WARN_ON_ONCE(!list_lru_add_head(
+				&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
+			dentry->d_flags &= ~DCACHE_NEW_NEGATIVE;
+			return;
+		}
+	}
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
-	neg_dentry_inc(dentry);
 }
 
 static void d_lru_del(struct dentry *dentry)
@@ -2620,6 +2626,9 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
 		__d_set_inode_and_type(dentry, inode, add_flags);
 		raw_write_seqcount_end(&dentry->d_seq);
 		fsnotify_update_flags(dentry);
+	} else {
+		/* It is a negative dentry, add it to LRU head initially. */
+		dentry->d_flags |= DCACHE_NEW_NEGATIVE;
 	}
 	__d_rehash(dentry);
 	if (dir)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index df942e5..03a1918 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -214,6 +214,7 @@ struct dentry_operations {
 #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
 #define DCACHE_ENCRYPTED_WITH_KEY	0x02000000 /* dir is encrypted with a valid key */
 #define DCACHE_OP_REAL			0x04000000
+#define DCACHE_NEW_NEGATIVE		0x08000000 /* New negative dentry */
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index aa5efd9..bfac057 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -90,6 +90,23 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
 bool list_lru_add(struct list_lru *lru, struct list_head *item);
 
 /**
+ * list_lru_add_head: add an element to the lru list's head
+ * @list_lru: the lru pointer
+ * @item: the item to be added.
+ *
+ * This is similar to list_lru_add(). The only difference is the location
+ * where the new item will be added. The list_lru_add() function will add
+ * the new item to the tail as it is the most recently used one. The
+ * list_lru_add_head() will add the new item into the head so that it
+ * will the first to go if a shrinker is running. So this function should
+ * only be used for less important item that can be the first to go if
+ * the system is under memory pressure.
+ *
+ * Return value: true if the list was updated, false otherwise
+ */
+bool list_lru_add_head(struct list_lru *lru, struct list_head *item);
+
+/**
  * list_lru_del: delete an element to the lru list
  * @list_lru: the lru pointer
  * @item: the item to be deleted.
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 5b30625..133f41c 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -124,7 +124,8 @@ static inline bool list_lru_memcg_aware(struct list_lru *lru)
 }
 #endif /* CONFIG_MEMCG_KMEM */
 
-bool list_lru_add(struct list_lru *lru, struct list_head *item)
+static inline bool __list_lru_add(struct list_lru *lru, struct list_head *item,
+				  const bool add_tail)
 {
 	int nid = page_to_nid(virt_to_page(item));
 	struct list_lru_node *nlru = &lru->node[nid];
@@ -134,7 +135,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 	spin_lock(&nlru->lock);
 	if (list_empty(item)) {
 		l = list_lru_from_kmem(nlru, item, &memcg);
-		list_add_tail(item, &l->list);
+		(add_tail ? list_add_tail : list_add)(item, &l->list);
 		/* Set shrinker bit if the first element was added */
 		if (!l->nr_items++)
 			memcg_set_shrinker_bit(memcg, nid,
@@ -146,8 +147,19 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 	spin_unlock(&nlru->lock);
 	return false;
 }
+
+bool list_lru_add(struct list_lru *lru, struct list_head *item)
+{
+	return __list_lru_add(lru, item, true);
+}
 EXPORT_SYMBOL_GPL(list_lru_add);
 
+bool list_lru_add_head(struct list_lru *lru, struct list_head *item)
+{
+	return __list_lru_add(lru, item, false);
+}
+EXPORT_SYMBOL_GPL(list_lru_add_head);
+
 bool list_lru_del(struct list_lru *lru, struct list_head *item)
 {
 	int nid = page_to_nid(virt_to_page(item));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 17:19 ` [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed Waiman Long
@ 2018-08-28 22:13   ` Matthew Wilcox
  2018-08-28 22:29     ` Waiman Long
  2018-08-28 23:01   ` Andrew Morton
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2018-08-28 22:13 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko

On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
> @@ -134,7 +135,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
>  	spin_lock(&nlru->lock);
>  	if (list_empty(item)) {
>  		l = list_lru_from_kmem(nlru, item, &memcg);
> -		list_add_tail(item, &l->list);
> +		(add_tail ? list_add_tail : list_add)(item, &l->list);
>  		/* Set shrinker bit if the first element was added */
>  		if (!l->nr_items++)
>  			memcg_set_shrinker_bit(memcg, nid,

That's not OK.  Write it out properly, ie:

		if (add_tail)
			list_add_tail(item, &l->list);
		else
			list_add(item, &l->list);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 22:13   ` Matthew Wilcox
@ 2018-08-28 22:29     ` Waiman Long
  2018-08-28 23:10       ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2018-08-28 22:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko

On 08/28/2018 06:13 PM, Matthew Wilcox wrote:
> On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
>> @@ -134,7 +135,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
>>  	spin_lock(&nlru->lock);
>>  	if (list_empty(item)) {
>>  		l = list_lru_from_kmem(nlru, item, &memcg);
>> -		list_add_tail(item, &l->list);
>> +		(add_tail ? list_add_tail : list_add)(item, &l->list);
>>  		/* Set shrinker bit if the first element was added */
>>  		if (!l->nr_items++)
>>  			memcg_set_shrinker_bit(memcg, nid,
> That's not OK.  Write it out properly, ie:
>
> 		if (add_tail)
> 			list_add_tail(item, &l->list);
> 		else
> 			list_add(item, &l->list);
>
Yes, I can rewrite it. What is the problem with the abbreviated form?

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] fs/dcache: Track # of negative dentries
  2018-08-28 17:19 [PATCH 0/2] fs/dcache: Track # of negative dentries Waiman Long
  2018-08-28 17:19 ` [PATCH 1/2] fs/dcache: Track & report number " Waiman Long
  2018-08-28 17:19 ` [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed Waiman Long
@ 2018-08-28 22:50 ` Andrew Morton
  2018-08-28 22:54   ` Waiman Long
  2 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2018-08-28 22:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko

On Tue, 28 Aug 2018 13:19:38 -0400 Waiman Long <longman@redhat.com> wrote:

> This patchset is a reduced scope version of the
> patchset "fs/dcache: Track & limit # of negative dentries"
> (https://lkml.org/lkml/2018/7/12/586). Only the first 2 patches are
> included to track the number of negative dentries in the system as well
> as making negative dentries more easily reclaimed than positive ones.
> 
> There are controversies on limiting number of negative dentries as it may
> make negative dentries special in term of how memory resources are to
> be managed in the kernel. However, I don't believe I heard any concern
> about tracking the number of negative dentries in the system. So it is
> better to separate that out and get it done with. We can deal with the
> controversial part later on.

Seems reasonable.

It would be nice to see testing results please.  Quite comprehensive
ones.

And again, an apparently permanent feature of this patchset is that the
changelogs fail to provide descriptions of real-world problems with the
existing code.  Please do provide those (comprehensive) descriptions and
demonstrate that these changes resolve those problems.

Also, a grumpynit: with 100% uniformity, the vfs presently refers to
negative dentries with the string "negative" in the identifier.  This
patchset abbreviates that to "neg".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/2] fs/dcache: Track # of negative dentries
  2018-08-28 22:50 ` [PATCH 0/2] fs/dcache: Track # of negative dentries Andrew Morton
@ 2018-08-28 22:54   ` Waiman Long
  0 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-28 22:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko

On 08/28/2018 06:50 PM, Andrew Morton wrote:
> On Tue, 28 Aug 2018 13:19:38 -0400 Waiman Long <longman@redhat.com> wrote:
>
>> This patchset is a reduced scope version of the
>> patchset "fs/dcache: Track & limit # of negative dentries"
>> (https://lkml.org/lkml/2018/7/12/586). Only the first 2 patches are
>> included to track the number of negative dentries in the system as well
>> as making negative dentries more easily reclaimed than positive ones.
>>
>> There are controversies on limiting number of negative dentries as it may
>> make negative dentries special in term of how memory resources are to
>> be managed in the kernel. However, I don't believe I heard any concern
>> about tracking the number of negative dentries in the system. So it is
>> better to separate that out and get it done with. We can deal with the
>> controversial part later on.
> Seems reasonable.
>
> It would be nice to see testing results please.  Quite comprehensive
> ones.
>
> And again, an apparently permanent feature of this patchset is that the
> changelogs fail to provide descriptions of real-world problems with the
> existing code.  Please do provide those (comprehensive) descriptions and
> demonstrate that these changes resolve those problems.
>
> Also, a grumpynit: with 100% uniformity, the vfs presently refers to
> negative dentries with the string "negative" in the identifier.  This
> patchset abbreviates that to "neg".
>
Will do.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 17:19 ` [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed Waiman Long
  2018-08-28 22:13   ` Matthew Wilcox
@ 2018-08-28 23:01   ` Andrew Morton
  2018-08-29 17:54     ` Paul E. McKenney
  2018-08-29  1:02   ` Dave Chinner
  2018-08-29  7:51   ` Michal Hocko
  3 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2018-08-28 23:01 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Ingo Molnar,
	Miklos Szeredi, Matthew Wilcox, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko


Another pet peeve ;)

On Tue, 28 Aug 2018 13:19:40 -0400 Waiman Long <longman@redhat.com> wrote:

>  /**
> + * list_lru_add_head: add an element to the lru list's head
> + * @list_lru: the lru pointer
> + * @item: the item to be added.
> + *
> + * This is similar to list_lru_add(). The only difference is the location
> + * where the new item will be added. The list_lru_add() function will add

People often use the term "the foo() function".  I don't know why -
just say "foo()"!

> + * the new item to the tail as it is the most recently used one. The
> + * list_lru_add_head() will add the new item into the head so that it

Ditto.

"to the head"

> + * will the first to go if a shrinker is running. So this function should

"will be the"

> + * only be used for less important item that can be the first to go if

"items"

> + * the system is under memory pressure.
> + *
> + * Return value: true if the list was updated, false otherwise
> + */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 22:29     ` Waiman Long
@ 2018-08-28 23:10       ` Linus Torvalds
  2018-08-28 23:22         ` Andrew Morton
  2018-08-29  1:18         ` Waiman Long
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2018-08-28 23:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Matthew Wilcox, Al Viro, Jonathan Corbet,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm,
	open list:DOCUMENTATION, Luis R. Rodriguez, Kees Cook, Jan Kara,
	Paul McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	Michal Hocko

On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
>
> Yes, I can rewrite it. What is the problem with the abbreviated form?

Either gcc rewrites it for you, or you end up _actually_ using a
function pointer and calling through it.

The latter would be absolutely horribly bad for something like
"list_add()", which should expand to just a couple of instructions.

And the former would be ok, except for the "you wrote code the garbage
way, and then depended on the compiler fixing it up". Which we
generally try to avoid in the kernel.

(Don't get me wrong - we definitely depend on the compiler doing a
good job at CSE and dead code elimination etc, but generally we try to
avoid the whole "compiler has to rewrite code to be good" model).

                 Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 23:10       ` Linus Torvalds
@ 2018-08-28 23:22         ` Andrew Morton
  2018-08-29  1:18           ` Waiman Long
  2018-08-29  1:18         ` Waiman Long
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2018-08-28 23:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Waiman Long, Matthew Wilcox, Al Viro, Jonathan Corbet,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm,
	open list:DOCUMENTATION, Luis R. Rodriguez, Kees Cook, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Larry Woodman,
	James Bottomley, Wangkai (Kevin,C),
	Michal Hocko

On Tue, 28 Aug 2018 16:10:24 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
> >
> > Yes, I can rewrite it. What is the problem with the abbreviated form?
> 
> Either gcc rewrites it for you, or you end up _actually_ using a
> function pointer and calling through it.
> 
> The latter would be absolutely horribly bad for something like
> "list_add()", which should expand to just a couple of instructions.
> 
> And the former would be ok, except for the "you wrote code the garbage
> way, and then depended on the compiler fixing it up". Which we
> generally try to avoid in the kernel.
> 
> (Don't get me wrong - we definitely depend on the compiler doing a
> good job at CSE and dead code elimination etc, but generally we try to
> avoid the whole "compiler has to rewrite code to be good" model).
> 

And the "abbreviated form" will surely explode if one or both of those
"functions" happens to be implemented (or later reimplemented) as a macro.
It's best not to unnecessarily make such assumptions.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] fs/dcache: Track & report number of negative dentries
  2018-08-28 17:19 ` [PATCH 1/2] fs/dcache: Track & report number " Waiman Long
@ 2018-08-29  0:11   ` Dave Chinner
  2018-08-29 17:11     ` Waiman Long
  2018-08-31 14:31     ` Matthew Wilcox
  0 siblings, 2 replies; 29+ messages in thread
From: Dave Chinner @ 2018-08-29  0:11 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
> The current dentry number tracking code doesn't distinguish between
> positive & negative dentries. It just reports the total number of
> dentries in the LRU lists.
> 
> As excessive number of negative dentries can have an impact on system
> performance, it will be wise to track the number of positive and
> negative dentries separately.
> 
> This patch adds tracking for the total number of negative dentries in
> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
> file. The number, however, does not include negative dentries that are
> in flight but not in the LRU yet.
> 
> The number of positive dentries in the LRU lists can be roughly found
> by subtracting the number of negative dentries from the total.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  Documentation/sysctl/fs.txt | 19 +++++++++++++------
>  fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dcache.h      |  7 ++++---
>  3 files changed, 62 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index 819caf8..118bb93 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -63,19 +63,26 @@ struct {
>          int nr_unused;
>          int age_limit;         /* age in seconds */
>          int want_pages;        /* pages requested by system */
> -        int dummy[2];
> +        int nr_negative;       /* # of unused negative dentries */
> +        int dummy;
>  } dentry_stat = {0, 0, 45, 0,};

That's not a backwards compatible ABI change. Those dummy fields
used to represent some metric we no longer calculate, and there are
probably still monitoring apps out there that think they still have
the old meaning. i.e. they are still visible to userspace:

$ cat /proc/sys/fs/dentry-state 
83090	67661	45	0	0	0
$

IOWs, you can add new fields for new metrics to the end of the
structure, but you can't re-use existing fields even if they
aren't calculated anymore.

[....]

> @@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
>  
>  #endif
>  
> +static inline void __neg_dentry_dec(struct dentry *dentry)
> +{
> +	this_cpu_dec(nr_dentry_neg);
> +}
> +
> +static inline void neg_dentry_dec(struct dentry *dentry)
> +{
> +	if (unlikely(d_is_negative(dentry)))
> +		__neg_dentry_dec(dentry);

unlikely() considered harmful.

The workload you are trying to optimise is whe negative dentries are
the common case. IOWs, static branch prediction hints like this will
be wrong exactly when we want the branch to be predicted correctly
by the hardware.

> +}
> +
> +static inline void __neg_dentry_inc(struct dentry *dentry)
> +{
> +	this_cpu_inc(nr_dentry_neg);
> +}
> +
> +static inline void neg_dentry_inc(struct dentry *dentry)
> +{
> +	if (unlikely(d_is_negative(dentry)))
> +		__neg_dentry_inc(dentry);
> +}

These wrappers obfuscate the code - they do not do what the
name suggests and instead are conditional on dentry state.

I'd just open code this stuff - the code is much better without
the wrappers.

> +
>  static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
>  {
>  	/*
> @@ -331,6 +365,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
>  	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>  	WRITE_ONCE(dentry->d_flags, flags);
>  	dentry->d_inode = NULL;
> +	if (dentry->d_flags & DCACHE_LRU_LIST)
> +		__neg_dentry_inc(dentry);
>  }
>  
>  static void dentry_free(struct dentry *dentry)
> @@ -395,6 +431,7 @@ static void d_lru_add(struct dentry *dentry)
>  	dentry->d_flags |= DCACHE_LRU_LIST;
>  	this_cpu_inc(nr_dentry_unused);
>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> +	neg_dentry_inc(dentry);

Like this - why on earth would we increment the negative dentry
count for every dentry that is added to LRU? Open coding

 	this_cpu_inc(nr_dentry_unused);
+	if (d_is_negative(dentry))
+		this_cpu_inc(nr_dentry_neg);
 	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));

That's obvious to the reader what we are doing, and it aggregates
all the accounting in a single location. Same for the rest of the
code.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 17:19 ` [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed Waiman Long
  2018-08-28 22:13   ` Matthew Wilcox
  2018-08-28 23:01   ` Andrew Morton
@ 2018-08-29  1:02   ` Dave Chinner
  2018-08-29 19:34     ` Waiman Long
  2018-08-29  7:51   ` Michal Hocko
  3 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2018-08-29  1:02 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
> For negative dentries that are accessed once and never used again, they
> should be removed first before other dentries when shrinker is running.
> This is done by putting negative dentries at the head of the LRU list
> instead at the tail.
> 
> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
> is initially created. When such a dentry is added to the LRU, it will be
> added to the head so that it will be the first to go when a shrinker is
> running if it is never accessed again (DCACHE_REFERENCED bit not set).
> The flag is cleared after the LRU list addition.

This exposes internal LRU list functionality to callers. I carefully
hid what end of the list was the most or least recent from
subsystems interacting with LRUs precisely because "head" and "tail"
are completely confusing when interacting with a LRU.

LRUs are about tracking relative object age, not heads and tails.

IOWs, "track this object as most recently used", not "add this
object to the list tail". The opposite is "track this object is
least recently used", not "add object at head of list".

IOWs, the interface should be list_lru_add_oldest() or maybe
list_lru_add_least_recent() to indicate that these objects are
considered to be the oldest and therefore prime immediate reclaim
candidates.

Which points out a problem. That the most recent negative dentry
will be the first to be reclaimed. That's not LRU behaviour, and
prevents a working set of negative dentries from being retained
when the shrinker rotates through the dentries in LRU order. i.e.
this patch turns the LRU into a MRU list for negative dentries.

And then there's shrinker behaviour. What happens if the shrinker
isolate callback returns LRU_ROTATE on a negative dentry? It gets
moved to the most recent end of the list, so it won't have an
attempt to reclaim it again until it's tried to reclaim all the real
dentries. IOWs, it goes back to behaving like LRUs are supposed to
behaving.

IOWs, reclaim behaviour of negative dentries will be
highly unpredictable, it will not easily retain a working set, nor
will the working set it does retain be predictable or easy to eject
from memory when the workload changes.

Is this the behavour what we want for negative dentries?

Perhaps a second internal LRU list in the list_lru for "immediate
reclaim" objects would be a better solution. i.e. similar to the
active/inactive lists used for prioritising the working set iover
single use pages in page reclaim. negative dentries go onto the
immediate list, real dentries go on the existing list. Both are LRU,
and the shrinker operates on the immediate list first. When we
rotate referenced negative dentries on the immediate list, promote
them to the active list with all the real dentries so they age out
with the rest of the working set. That way single use negative
dentries get removed in LRU order in preference to the working set
of real dentries.

Being able to make changes to the list implementation easily was one
of the reasons I hid the implementation of the list_lru from the
interface callers use....

[...]

> @@ -430,8 +424,20 @@ static void d_lru_add(struct dentry *dentry)
>  	D_FLAG_VERIFY(dentry, 0);
>  	dentry->d_flags |= DCACHE_LRU_LIST;
>  	this_cpu_inc(nr_dentry_unused);
> +	if (d_is_negative(dentry)) {
> +		__neg_dentry_inc(dentry);

/me notes this patch now open codes this, like suggested in the
previous patch.

> +		if (dentry->d_flags & DCACHE_NEW_NEGATIVE) {
> +			/*
> +			 * Add the negative dentry to the head once, it
> +			 * will be added to the tail next time.
> +			 */
> +			WARN_ON_ONCE(!list_lru_add_head(
> +				&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> +			dentry->d_flags &= ~DCACHE_NEW_NEGATIVE;
> +			return;
> +		}
> +	}
>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
> -	neg_dentry_inc(dentry);
>  }
>  
>  static void d_lru_del(struct dentry *dentry)
> @@ -2620,6 +2626,9 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
>  		__d_set_inode_and_type(dentry, inode, add_flags);
>  		raw_write_seqcount_end(&dentry->d_seq);
>  		fsnotify_update_flags(dentry);
> +	} else {
> +		/* It is a negative dentry, add it to LRU head initially. */
> +		dentry->d_flags |= DCACHE_NEW_NEGATIVE;

Comments about LRU behaviour should not be put anywhere but in the
lru code. Otherwise it ends up stale and wrong, assuming it is
correct to start with....

>  	}
>  	__d_rehash(dentry);
>  	if (dir)
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index df942e5..03a1918 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -214,6 +214,7 @@ struct dentry_operations {
>  #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
>  #define DCACHE_ENCRYPTED_WITH_KEY	0x02000000 /* dir is encrypted with a valid key */
>  #define DCACHE_OP_REAL			0x04000000
> +#define DCACHE_NEW_NEGATIVE		0x08000000 /* New negative dentry */
>  
>  #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
>  #define DCACHE_DENTRY_CURSOR		0x20000000
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> index aa5efd9..bfac057 100644
> --- a/include/linux/list_lru.h
> +++ b/include/linux/list_lru.h
> @@ -90,6 +90,23 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
>  bool list_lru_add(struct list_lru *lru, struct list_head *item);
>  
>  /**
> + * list_lru_add_head: add an element to the lru list's head
> + * @list_lru: the lru pointer
> + * @item: the item to be added.
> + *
> + * This is similar to list_lru_add(). The only difference is the location
> + * where the new item will be added. The list_lru_add() function will add
> + * the new item to the tail as it is the most recently used one. The
> + * list_lru_add_head() will add the new item into the head so that it
> + * will the first to go if a shrinker is running. So this function should
> + * only be used for less important item that can be the first to go if
> + * the system is under memory pressure.

As mentioned above, this API needs reference object ages, not
internal list ordering and shrinker implementation details.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 23:10       ` Linus Torvalds
  2018-08-28 23:22         ` Andrew Morton
@ 2018-08-29  1:18         ` Waiman Long
  1 sibling, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-29  1:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthew Wilcox, Al Viro, Jonathan Corbet,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm,
	open list:DOCUMENTATION, Luis R. Rodriguez, Kees Cook, Jan Kara,
	Paul McKenney, Andrew Morton, Ingo Molnar, Miklos Szeredi,
	Larry Woodman, James Bottomley, Wangkai (Kevin,C),
	Michal Hocko

On 08/28/2018 07:10 PM, Linus Torvalds wrote:
> On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
>> Yes, I can rewrite it. What is the problem with the abbreviated form?
> Either gcc rewrites it for you, or you end up _actually_ using a
> function pointer and calling through it.

Yes, function pointer will be really bad.
>
> The latter would be absolutely horribly bad for something like
> "list_add()", which should expand to just a couple of instructions.
>
> And the former would be ok, except for the "you wrote code the garbage
> way, and then depended on the compiler fixing it up". Which we
> generally try to avoid in the kernel.
>
> (Don't get me wrong - we definitely depend on the compiler doing a
> good job at CSE and dead code elimination etc, but generally we try to
> avoid the whole "compiler has to rewrite code to be good" model).
>
>                  Linus

I see your point here. I will rewrite to use the regular if-then-else.

Thanks,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 23:22         ` Andrew Morton
@ 2018-08-29  1:18           ` Waiman Long
  0 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-29  1:18 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds
  Cc: Matthew Wilcox, Al Viro, Jonathan Corbet,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm,
	open list:DOCUMENTATION, Luis R. Rodriguez, Kees Cook, Jan Kara,
	Paul McKenney, Ingo Molnar, Miklos Szeredi, Larry Woodman,
	James Bottomley, Wangkai (Kevin,C),
	Michal Hocko

On 08/28/2018 07:22 PM, Andrew Morton wrote:
> On Tue, 28 Aug 2018 16:10:24 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>> On Tue, Aug 28, 2018 at 3:29 PM Waiman Long <longman@redhat.com> wrote:
>>> Yes, I can rewrite it. What is the problem with the abbreviated form?
>> Either gcc rewrites it for you, or you end up _actually_ using a
>> function pointer and calling through it.
>>
>> The latter would be absolutely horribly bad for something like
>> "list_add()", which should expand to just a couple of instructions.
>>
>> And the former would be ok, except for the "you wrote code the garbage
>> way, and then depended on the compiler fixing it up". Which we
>> generally try to avoid in the kernel.
>>
>> (Don't get me wrong - we definitely depend on the compiler doing a
>> good job at CSE and dead code elimination etc, but generally we try to
>> avoid the whole "compiler has to rewrite code to be good" model).
>>
> And the "abbreviated form" will surely explode if one or both of those
> "functions" happens to be implemented (or later reimplemented) as a macro.
> It's best not to unnecessarily make such assumptions.
>
Yes,  that is true.

Thanks,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 17:19 ` [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed Waiman Long
                     ` (2 preceding siblings ...)
  2018-08-29  1:02   ` Dave Chinner
@ 2018-08-29  7:51   ` Michal Hocko
  2018-08-29 19:58     ` Waiman Long
  3 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2018-08-29  7:51 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C)

On Tue 28-08-18 13:19:40, Waiman Long wrote:
> For negative dentries that are accessed once and never used again, they
> should be removed first before other dentries when shrinker is running.
> This is done by putting negative dentries at the head of the LRU list
> instead at the tail.
> 
> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
> is initially created. When such a dentry is added to the LRU, it will be
> added to the head so that it will be the first to go when a shrinker is
> running if it is never accessed again (DCACHE_REFERENCED bit not set).
> The flag is cleared after the LRU list addition.

Placing object to the head of the LRU list can be really tricky as Dave
pointed out. I am not familiar with the dentry cache reclaim so my
comparison below might not apply. Let me try anyway.

Negative dentries sound very similar to MADV_FREE pages from the reclaim
POV. They are primary candidate for reclaim, yet you want to preserve
aging to other easily reclaimable objects (including other MADV_FREE
pages). What we do for those pages is to move them from the anonymous
LRU list to the inactive file LRU list. Now you obviously do not have
anon/file LRUs but something similar to active/inactive LRU lists might
be a reasonably good match. Have easily reclaimable dentries on the
inactive list including negative dentries. If negative entries are
heavily used then they can promote to the active list because there is
no reason to reclaim them soon.

Just my 2c
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] fs/dcache: Track & report number of negative dentries
  2018-08-29  0:11   ` Dave Chinner
@ 2018-08-29 17:11     ` Waiman Long
  2018-08-30  1:43       ` Dave Chinner
  2018-08-31 14:31     ` Matthew Wilcox
  1 sibling, 1 reply; 29+ messages in thread
From: Waiman Long @ 2018-08-29 17:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On 08/28/2018 08:11 PM, Dave Chinner wrote:
> On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
>> The current dentry number tracking code doesn't distinguish between
>> positive & negative dentries. It just reports the total number of
>> dentries in the LRU lists.
>>
>> As excessive number of negative dentries can have an impact on system
>> performance, it will be wise to track the number of positive and
>> negative dentries separately.
>>
>> This patch adds tracking for the total number of negative dentries in
>> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
>> file. The number, however, does not include negative dentries that are
>> in flight but not in the LRU yet.
>>
>> The number of positive dentries in the LRU lists can be roughly found
>> by subtracting the number of negative dentries from the total.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  Documentation/sysctl/fs.txt | 19 +++++++++++++------
>>  fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dcache.h      |  7 ++++---
>>  3 files changed, 62 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
>> index 819caf8..118bb93 100644
>> --- a/Documentation/sysctl/fs.txt
>> +++ b/Documentation/sysctl/fs.txt
>> @@ -63,19 +63,26 @@ struct {
>>          int nr_unused;
>>          int age_limit;         /* age in seconds */
>>          int want_pages;        /* pages requested by system */
>> -        int dummy[2];
>> +        int nr_negative;       /* # of unused negative dentries */
>> +        int dummy;
>>  } dentry_stat = {0, 0, 45, 0,};
> That's not a backwards compatible ABI change. Those dummy fields
> used to represent some metric we no longer calculate, and there are
> probably still monitoring apps out there that think they still have
> the old meaning. i.e. they are still visible to userspace:
>
> $ cat /proc/sys/fs/dentry-state 
> 83090	67661	45	0	0	0
> $
>
> IOWs, you can add new fields for new metrics to the end of the
> structure, but you can't re-use existing fields even if they
> aren't calculated anymore.
>
> [....]

I looked up the git history and the state of the dentry_stat structure
hadn't changed since it was first put into git in 2.6.12-rc2 on Apr 16,
2005. That was over 13 years ago. Even adding an extra argument can have
the potential of breaking old applications depending on how the parsing
code was written.

Given that systems that are still using some very old tools are not
likely to upgrade to the latest kernel anyway. I don't see that as a big
problem.

>> @@ -214,6 +226,28 @@ static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char
>>  
>>  #endif
>>  
>> +static inline void __neg_dentry_dec(struct dentry *dentry)
>> +{
>> +	this_cpu_dec(nr_dentry_neg);
>> +}
>> +
>> +static inline void neg_dentry_dec(struct dentry *dentry)
>> +{
>> +	if (unlikely(d_is_negative(dentry)))
>> +		__neg_dentry_dec(dentry);
> unlikely() considered harmful.
>
> The workload you are trying to optimise is whe negative dentries are
> the common case. IOWs, static branch prediction hints like this will
> be wrong exactly when we want the branch to be predicted correctly
> by the hardware.

You are right. I will remove the branch hints.

>> +}
>> +
>> +static inline void __neg_dentry_inc(struct dentry *dentry)
>> +{
>> +	this_cpu_inc(nr_dentry_neg);
>> +}
>> +
>> +static inline void neg_dentry_inc(struct dentry *dentry)
>> +{
>> +	if (unlikely(d_is_negative(dentry)))
>> +		__neg_dentry_inc(dentry);
>> +}
> These wrappers obfuscate the code - they do not do what the
> name suggests and instead are conditional on dentry state.
>
> I'd just open code this stuff - the code is much better without
> the wrappers.

Sure. I will open code the counter updates.

>> +
>>  static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
>>  {
>>  	/*
>> @@ -331,6 +365,8 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
>>  	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>>  	WRITE_ONCE(dentry->d_flags, flags);
>>  	dentry->d_inode = NULL;
>> +	if (dentry->d_flags & DCACHE_LRU_LIST)
>> +		__neg_dentry_inc(dentry);
>>  }
>>  
>>  static void dentry_free(struct dentry *dentry)
>> @@ -395,6 +431,7 @@ static void d_lru_add(struct dentry *dentry)
>>  	dentry->d_flags |= DCACHE_LRU_LIST;
>>  	this_cpu_inc(nr_dentry_unused);
>>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>> +	neg_dentry_inc(dentry);
> Like this - why on earth would we increment the negative dentry
> count for every dentry that is added to LRU? Open coding
>
>  	this_cpu_inc(nr_dentry_unused);
> +	if (d_is_negative(dentry))
> +		this_cpu_inc(nr_dentry_neg);
>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>
> That's obvious to the reader what we are doing, and it aggregates
> all the accounting in a single location. Same for the rest of the
> code.
>
> Cheers,
>
> Dave.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-28 23:01   ` Andrew Morton
@ 2018-08-29 17:54     ` Paul E. McKenney
  2018-08-29 20:03       ` Waiman Long
  2018-08-29 21:04       ` Matthew Wilcox
  0 siblings, 2 replies; 29+ messages in thread
From: Paul E. McKenney @ 2018-08-29 17:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Waiman Long, Alexander Viro, Jonathan Corbet, linux-kernel,
	linux-fsdevel, linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On Tue, Aug 28, 2018 at 04:01:50PM -0700, Andrew Morton wrote:
> 
> Another pet peeve ;)
> 
> On Tue, 28 Aug 2018 13:19:40 -0400 Waiman Long <longman@redhat.com> wrote:
> 
> >  /**
> > + * list_lru_add_head: add an element to the lru list's head
> > + * @list_lru: the lru pointer
> > + * @item: the item to be added.
> > + *
> > + * This is similar to list_lru_add(). The only difference is the location
> > + * where the new item will be added. The list_lru_add() function will add
> 
> People often use the term "the foo() function".  I don't know why -
> just say "foo()"!

For whatever it is worth...

I tend to use "The foo() function ..." instead of "foo() ..." in order
to properly capitalize the first word of the sentence.  So I might say
"The call_rcu() function enqueues an RCU callback." rather than something
like "call_rcu() enqueues an RCU callback."  Or I might use some other
trick to keep "call_rcu()" from being the first word of the sentence.
But if the end of the previous sentence introduced call_rcu(), you
usually want the next sentence's first use of "call_rcu()" to be very
early in the sentence, because otherwise the flow will seem choppy.

And no, I have no idea what I would do if I were writing in German,
where nouns are capitalized, given that function names tend to be used
as nouns.  Probably I would get yelled at a lot for capitalizing my
function names.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-29  1:02   ` Dave Chinner
@ 2018-08-29 19:34     ` Waiman Long
  2018-08-30  1:12       ` Dave Chinner
  0 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2018-08-29 19:34 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On 08/28/2018 09:02 PM, Dave Chinner wrote:
> On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
>> For negative dentries that are accessed once and never used again, they
>> should be removed first before other dentries when shrinker is running.
>> This is done by putting negative dentries at the head of the LRU list
>> instead at the tail.
>>
>> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
>> is initially created. When such a dentry is added to the LRU, it will be
>> added to the head so that it will be the first to go when a shrinker is
>> running if it is never accessed again (DCACHE_REFERENCED bit not set).
>> The flag is cleared after the LRU list addition.
> This exposes internal LRU list functionality to callers. I carefully
> hid what end of the list was the most or least recent from
> subsystems interacting with LRUs precisely because "head" and "tail"
> are completely confusing when interacting with a LRU.
>
> LRUs are about tracking relative object age, not heads and tails.
>
> IOWs, "track this object as most recently used", not "add this
> object to the list tail". The opposite is "track this object is
> least recently used", not "add object at head of list".
>
> IOWs, the interface should be list_lru_add_oldest() or maybe
> list_lru_add_least_recent() to indicate that these objects are
> considered to be the oldest and therefore prime immediate reclaim
> candidates.

Good point. I will use age as function name instead.

> Which points out a problem. That the most recent negative dentry
> will be the first to be reclaimed. That's not LRU behaviour, and
> prevents a working set of negative dentries from being retained
> when the shrinker rotates through the dentries in LRU order. i.e.
> this patch turns the LRU into a MRU list for negative dentries.
>
> And then there's shrinker behaviour. What happens if the shrinker
> isolate callback returns LRU_ROTATE on a negative dentry? It gets
> moved to the most recent end of the list, so it won't have an
> attempt to reclaim it again until it's tried to reclaim all the real
> dentries. IOWs, it goes back to behaving like LRUs are supposed to
> behaving.
>
> IOWs, reclaim behaviour of negative dentries will be
> highly unpredictable, it will not easily retain a working set, nor
> will the working set it does retain be predictable or easy to eject
> from memory when the workload changes.
>
> Is this the behavour what we want for negative dentries?

I am aware that the behavior is not strictly LRU for negative dentries.
This is a compromise for using one LRU list for 2 different classes of
dentries. The basic idea is that negative dentries that are used only
once will go first irrespective of their age.

> Perhaps a second internal LRU list in the list_lru for "immediate
> reclaim" objects would be a better solution. i.e. similar to the
> active/inactive lists used for prioritising the working set iover
> single use pages in page reclaim. negative dentries go onto the
> immediate list, real dentries go on the existing list. Both are LRU,
> and the shrinker operates on the immediate list first. When we
> rotate referenced negative dentries on the immediate list, promote
> them to the active list with all the real dentries so they age out
> with the rest of the working set. That way single use negative
> dentries get removed in LRU order in preference to the working set
> of real dentries.
>
> Being able to make changes to the list implementation easily was one
> of the reasons I hid the implementation of the list_lru from the
> interface callers use....
>
> [...]

I have thought about using 2 lists for dentries. That will require much
more extensive changes to the code and hence much more testing will be
needed to verify their correctness. That is the main reason why I try to
avoid doing that.

As you have suggested, we could implement this 2-level LRU list in the
list_lru API. But it is used by other subsystems as well. Extensive
change like that will have similar issue in term of testing and
verification effort.

>> @@ -430,8 +424,20 @@ static void d_lru_add(struct dentry *dentry)
>>  	D_FLAG_VERIFY(dentry, 0);
>>  	dentry->d_flags |= DCACHE_LRU_LIST;
>>  	this_cpu_inc(nr_dentry_unused);
>> +	if (d_is_negative(dentry)) {
>> +		__neg_dentry_inc(dentry);
> /me notes this patch now open codes this, like suggested in the
> previous patch.
>
>> +		if (dentry->d_flags & DCACHE_NEW_NEGATIVE) {
>> +			/*
>> +			 * Add the negative dentry to the head once, it
>> +			 * will be added to the tail next time.
>> +			 */
>> +			WARN_ON_ONCE(!list_lru_add_head(
>> +				&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>> +			dentry->d_flags &= ~DCACHE_NEW_NEGATIVE;
>> +			return;
>> +		}
>> +	}
>>  	WARN_ON_ONCE(!list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
>> -	neg_dentry_inc(dentry);
>>  }
>>  
>>  static void d_lru_del(struct dentry *dentry)
>> @@ -2620,6 +2626,9 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
>>  		__d_set_inode_and_type(dentry, inode, add_flags);
>>  		raw_write_seqcount_end(&dentry->d_seq);
>>  		fsnotify_update_flags(dentry);
>> +	} else {
>> +		/* It is a negative dentry, add it to LRU head initially. */
>> +		dentry->d_flags |= DCACHE_NEW_NEGATIVE;
> Comments about LRU behaviour should not be put anywhere but in the
> lru code. Otherwise it ends up stale and wrong, assuming it is
> correct to start with....
>

I will document all that in the LRU code.

>>  	}
>>  	__d_rehash(dentry);
>>  	if (dir)
>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>> index df942e5..03a1918 100644
>> --- a/include/linux/dcache.h
>> +++ b/include/linux/dcache.h
>> @@ -214,6 +214,7 @@ struct dentry_operations {
>>  #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
>>  #define DCACHE_ENCRYPTED_WITH_KEY	0x02000000 /* dir is encrypted with a valid key */
>>  #define DCACHE_OP_REAL			0x04000000
>> +#define DCACHE_NEW_NEGATIVE		0x08000000 /* New negative dentry */
>>  
>>  #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
>>  #define DCACHE_DENTRY_CURSOR		0x20000000
>> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
>> index aa5efd9..bfac057 100644
>> --- a/include/linux/list_lru.h
>> +++ b/include/linux/list_lru.h
>> @@ -90,6 +90,23 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware,
>>  bool list_lru_add(struct list_lru *lru, struct list_head *item);
>>  
>>  /**
>> + * list_lru_add_head: add an element to the lru list's head
>> + * @list_lru: the lru pointer
>> + * @item: the item to be added.
>> + *
>> + * This is similar to list_lru_add(). The only difference is the location
>> + * where the new item will be added. The list_lru_add() function will add
>> + * the new item to the tail as it is the most recently used one. The
>> + * list_lru_add_head() will add the new item into the head so that it
>> + * will the first to go if a shrinker is running. So this function should
>> + * only be used for less important item that can be the first to go if
>> + * the system is under memory pressure.
> As mentioned above, this API needs reference object ages, not
> internal list ordering and shrinker implementation details.

Sure. Thanks for the suggestion.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-29  7:51   ` Michal Hocko
@ 2018-08-29 19:58     ` Waiman Long
  2018-08-30  7:20       ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Waiman Long @ 2018-08-29 19:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C)

On 08/29/2018 03:51 AM, Michal Hocko wrote:
> On Tue 28-08-18 13:19:40, Waiman Long wrote:
>> For negative dentries that are accessed once and never used again, they
>> should be removed first before other dentries when shrinker is running.
>> This is done by putting negative dentries at the head of the LRU list
>> instead at the tail.
>>
>> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
>> is initially created. When such a dentry is added to the LRU, it will be
>> added to the head so that it will be the first to go when a shrinker is
>> running if it is never accessed again (DCACHE_REFERENCED bit not set).
>> The flag is cleared after the LRU list addition.
> Placing object to the head of the LRU list can be really tricky as Dave
> pointed out. I am not familiar with the dentry cache reclaim so my
> comparison below might not apply. Let me try anyway.
>
> Negative dentries sound very similar to MADV_FREE pages from the reclaim
> POV. They are primary candidate for reclaim, yet you want to preserve
> aging to other easily reclaimable objects (including other MADV_FREE
> pages). What we do for those pages is to move them from the anonymous
> LRU list to the inactive file LRU list. Now you obviously do not have
> anon/file LRUs but something similar to active/inactive LRU lists might
> be a reasonably good match. Have easily reclaimable dentries on the
> inactive list including negative dentries. If negative entries are
> heavily used then they can promote to the active list because there is
> no reason to reclaim them soon.
>
> Just my 2c

As mentioned in my reply to Dave, I did considered using a 2 LRU list
solution. However, that will add more complexity to the dcache LRU
management code than my current approach and probably more potential for
slowdown.

One point that I forgot to mention is that the current dcache LRU list
will only update the order of the dentries when the list is being
shrinked. If a dentry was referenced again before (the DCACHE_REFERENCED
bit set), it will rotated to the tail of the list via LRU rotation
instead of being removed. This is not strictly LRU anyway. The way that
my patch work will just make it move further away from true LRU behavior.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-29 17:54     ` Paul E. McKenney
@ 2018-08-29 20:03       ` Waiman Long
  2018-08-29 21:04       ` Matthew Wilcox
  1 sibling, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-29 20:03 UTC (permalink / raw)
  To: paulmck, Andrew Morton
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Ingo Molnar, Miklos Szeredi,
	Matthew Wilcox, Larry Woodman, James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On 08/29/2018 01:54 PM, Paul E. McKenney wrote:
> On Tue, Aug 28, 2018 at 04:01:50PM -0700, Andrew Morton wrote:
>> Another pet peeve ;)
>>
>> On Tue, 28 Aug 2018 13:19:40 -0400 Waiman Long <longman@redhat.com> wrote:
>>
>>>  /**
>>> + * list_lru_add_head: add an element to the lru list's head
>>> + * @list_lru: the lru pointer
>>> + * @item: the item to be added.
>>> + *
>>> + * This is similar to list_lru_add(). The only difference is the location
>>> + * where the new item will be added. The list_lru_add() function will add
>> People often use the term "the foo() function".  I don't know why -
>> just say "foo()"!
> For whatever it is worth...
>
> I tend to use "The foo() function ..." instead of "foo() ..." in order
> to properly capitalize the first word of the sentence.  So I might say
> "The call_rcu() function enqueues an RCU callback." rather than something
> like "call_rcu() enqueues an RCU callback."  Or I might use some other
> trick to keep "call_rcu()" from being the first word of the sentence.
> But if the end of the previous sentence introduced call_rcu(), you
> usually want the next sentence's first use of "call_rcu()" to be very
> early in the sentence, because otherwise the flow will seem choppy.
>
> And no, I have no idea what I would do if I were writing in German,
> where nouns are capitalized, given that function names tend to be used
> as nouns.  Probably I would get yelled at a lot for capitalizing my
> function names.  ;-)
>
> 							Thanx, Paul
>
Yes, doing proper capitalization of the first letter of a sentence is
the main reason I used "The foo() function" in a sentence.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-29 17:54     ` Paul E. McKenney
  2018-08-29 20:03       ` Waiman Long
@ 2018-08-29 21:04       ` Matthew Wilcox
  1 sibling, 0 replies; 29+ messages in thread
From: Matthew Wilcox @ 2018-08-29 21:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrew Morton, Waiman Long, Alexander Viro, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc,
	Luis R. Rodriguez, Kees Cook, Linus Torvalds, Jan Kara,
	Ingo Molnar, Miklos Szeredi, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko

On Wed, Aug 29, 2018 at 10:54:05AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 28, 2018 at 04:01:50PM -0700, Andrew Morton wrote:
> > People often use the term "the foo() function".  I don't know why -
> > just say "foo()"!
> 
> For whatever it is worth...
> 
> I tend to use "The foo() function ..." instead of "foo() ..." in order
> to properly capitalize the first word of the sentence.  So I might say
> "The call_rcu() function enqueues an RCU callback." rather than something
> like "call_rcu() enqueues an RCU callback."  Or I might use some other
> trick to keep "call_rcu()" from being the first word of the sentence.

I tend to write 'Use call_rcu() to enqueue a callback." or "When
call_rcu() returns, the callback will have been enqueued".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-29 19:34     ` Waiman Long
@ 2018-08-30  1:12       ` Dave Chinner
  2018-08-30 21:51         ` Waiman Long
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2018-08-30  1:12 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On Wed, Aug 29, 2018 at 03:34:05PM -0400, Waiman Long wrote:
> On 08/28/2018 09:02 PM, Dave Chinner wrote:
> > On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
> > And then there's shrinker behaviour. What happens if the shrinker
> > isolate callback returns LRU_ROTATE on a negative dentry? It gets
> > moved to the most recent end of the list, so it won't have an
> > attempt to reclaim it again until it's tried to reclaim all the real
> > dentries. IOWs, it goes back to behaving like LRUs are supposed to
> > behaving.
> >
> > IOWs, reclaim behaviour of negative dentries will be
> > highly unpredictable, it will not easily retain a working set, nor
> > will the working set it does retain be predictable or easy to eject
> > from memory when the workload changes.
> >
> > Is this the behavour what we want for negative dentries?
> 
> I am aware that the behavior is not strictly LRU for negative dentries.
> This is a compromise for using one LRU list for 2 different classes of
> dentries.

Thus demonstrating just enough knowledge to be dangerous.

We already 3 different classes of dentries on the LRU list:

	- in use dentries (because we use lazy removal to avoid
	  lru list lock contention on cache hit lookups)
	- unused, referenced dentries
	- unused, unreferenced dentries.

Each of these classes of dentries are treated differently by the
shrinker, but the key point is that they are all aged the same way
and so there's consistent maintenance of the working set under
memory pressure. Putting negative dentries at the head of the list
doesn't create a new "class" of object on the LRU, it just changes
the ordering of the lru list. This will cause unpredictable
behaviour because objects haven't had a chance to age gracefully
before they are reclaimed.

FYI, the inode cache has the same list_lru setup, object types and
shrinker algorithm as the dentry cache, so this isn't a one-off.
Indeed, the XFS buffer cache has a multi-reference heirarchy of 13
different types of {unused, referenced} buffers in it's list_lru to
implement a quasi aging-NFU reclaim algorithm in it's shrinker.

i.e. the list_lru infrastructure has never provided or enforced a
pure LRU algorithm. It is common infrastructure intended to provide
a scalable, flexible and memcg-aware FIFO-like object tracking
system that interates tightly with memory reclaim to allow
subsystems to implement cache reclaim algorithms that are optimal
for that subsystem.

IOWs, the list_lru doesn't define the reclaim algorithm the
subsystem uses and there's no reason why we can't extend the
infrastructure to support more complex algorithms without impacting
existing subsystem reclaim algorithms at all. Only the subsystems
that use the new infrastructure and algorithms would need careful
examination.  Of course, the overall system cache balancing
behaviour under different and changing workloads would still need to
be verified, but you have to do that for any cache reclaim algorithm
change that is made....

> The basic idea is that negative dentries that are used only
> once will go first irrespective of their age.

Using MRU for negative dentries, as I've previously explained, is a
flawed concept. It might be expedient to solve your specific
problem, but it's not a solution to the general problem of negative
dentry management.

> > Perhaps a second internal LRU list in the list_lru for "immediate
> > reclaim" objects would be a better solution. i.e. similar to the
> > active/inactive lists used for prioritising the working set iover
> > single use pages in page reclaim. negative dentries go onto the
> > immediate list, real dentries go on the existing list. Both are LRU,
> > and the shrinker operates on the immediate list first. When we
> > rotate referenced negative dentries on the immediate list, promote
> > them to the active list with all the real dentries so they age out
> > with the rest of the working set. That way single use negative
> > dentries get removed in LRU order in preference to the working set
> > of real dentries.
> >
> > Being able to make changes to the list implementation easily was one
> > of the reasons I hid the implementation of the list_lru from the
> > interface callers use....
> >
> > [...]
> 
> I have thought about using 2 lists for dentries. That will require much
> more extensive changes to the code and hence much more testing will be
> needed to verify their correctness.  That is the main reason why I try to
> avoid doing that.

i.e. expediency.

However, you're changing the behaviour of core caching and memory
reclaim algorithms. The amount and level of testing and verification
you need to do is the same regardless of whether it's a small change
or a large change.  Sure, you've shown that *one* artificial
micro-benchmark improves, but what about everything else?

> As you have suggested, we could implement this 2-level LRU list in the
> list_lru API. But it is used by other subsystems as well. Extensive
> change like that will have similar issue in term of testing and
> verification effort.

I know what changing reclaim algorithm involves, how difficult
it is to balance the competing caches quickly and to the desired
ratios for acceptable performance, how difficult it is to measure
and control the system reacts to transient and impulse memory
pressure events, etc.

I also know that "simple" and/or "obviously correct" subsystem
changes can cause very unexepected system level effects, and that
it's almost never what you think it is that caused the unexpected
behaviour.  IOWs, getting anything even slightly wrong in these
algorithms will adversely affect system performance and balance
significantly.  Hence the bar is /always/ set high for core caching
algorithm changes like this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] fs/dcache: Track & report number of negative dentries
  2018-08-29 17:11     ` Waiman Long
@ 2018-08-30  1:43       ` Dave Chinner
  2018-08-30 21:49         ` Waiman Long
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2018-08-30  1:43 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On Wed, Aug 29, 2018 at 01:11:08PM -0400, Waiman Long wrote:
> On 08/28/2018 08:11 PM, Dave Chinner wrote:
> > On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
> >> The current dentry number tracking code doesn't distinguish between
> >> positive & negative dentries. It just reports the total number of
> >> dentries in the LRU lists.
> >>
> >> As excessive number of negative dentries can have an impact on system
> >> performance, it will be wise to track the number of positive and
> >> negative dentries separately.
> >>
> >> This patch adds tracking for the total number of negative dentries in
> >> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
> >> file. The number, however, does not include negative dentries that are
> >> in flight but not in the LRU yet.
> >>
> >> The number of positive dentries in the LRU lists can be roughly found
> >> by subtracting the number of negative dentries from the total.
> >>
> >> Signed-off-by: Waiman Long <longman@redhat.com>
> >> ---
> >>  Documentation/sysctl/fs.txt | 19 +++++++++++++------
> >>  fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/dcache.h      |  7 ++++---
> >>  3 files changed, 62 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> >> index 819caf8..118bb93 100644
> >> --- a/Documentation/sysctl/fs.txt
> >> +++ b/Documentation/sysctl/fs.txt
> >> @@ -63,19 +63,26 @@ struct {
> >>          int nr_unused;
> >>          int age_limit;         /* age in seconds */
> >>          int want_pages;        /* pages requested by system */
> >> -        int dummy[2];
> >> +        int nr_negative;       /* # of unused negative dentries */
> >> +        int dummy;
> >>  } dentry_stat = {0, 0, 45, 0,};
> > That's not a backwards compatible ABI change. Those dummy fields
> > used to represent some metric we no longer calculate, and there are
> > probably still monitoring apps out there that think they still have
> > the old meaning. i.e. they are still visible to userspace:
> >
> > $ cat /proc/sys/fs/dentry-state 
> > 83090	67661	45	0	0	0
> > $
> >
> > IOWs, you can add new fields for new metrics to the end of the
> > structure, but you can't re-use existing fields even if they
> > aren't calculated anymore.
> >
> > [....]
> 
> I looked up the git history and the state of the dentry_stat structure
> hadn't changed since it was first put into git in 2.6.12-rc2 on Apr 16,
> 2005. That was over 13 years ago. Even adding an extra argument can have
> the potential of breaking old applications depending on how the parsing
> code was written.

I'm pretty we've had this discussion many times before  w.r.t.
/proc/self/mount* and other multi-field proc files. 

IIRC, The answer has always been that it's OK to extend lines with
new fields as existing apps /should/ ignore them, but it's not OK to
remove or redefine existing fields in the line because existing apps
/will/ misinterpret what that field means.

> Given that systems that are still using some very old tools are not
> likely to upgrade to the latest kernel anyway. I don't see that as a big
> problem.

I don't think that matters when it comes to changing what
information we expose in proc files.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-29 19:58     ` Waiman Long
@ 2018-08-30  7:20       ` Michal Hocko
  2018-08-30 21:48         ` Waiman Long
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2018-08-30  7:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C)

On Wed 29-08-18 15:58:52, Waiman Long wrote:
> On 08/29/2018 03:51 AM, Michal Hocko wrote:
> > On Tue 28-08-18 13:19:40, Waiman Long wrote:
> >> For negative dentries that are accessed once and never used again, they
> >> should be removed first before other dentries when shrinker is running.
> >> This is done by putting negative dentries at the head of the LRU list
> >> instead at the tail.
> >>
> >> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
> >> is initially created. When such a dentry is added to the LRU, it will be
> >> added to the head so that it will be the first to go when a shrinker is
> >> running if it is never accessed again (DCACHE_REFERENCED bit not set).
> >> The flag is cleared after the LRU list addition.
> > Placing object to the head of the LRU list can be really tricky as Dave
> > pointed out. I am not familiar with the dentry cache reclaim so my
> > comparison below might not apply. Let me try anyway.
> >
> > Negative dentries sound very similar to MADV_FREE pages from the reclaim
> > POV. They are primary candidate for reclaim, yet you want to preserve
> > aging to other easily reclaimable objects (including other MADV_FREE
> > pages). What we do for those pages is to move them from the anonymous
> > LRU list to the inactive file LRU list. Now you obviously do not have
> > anon/file LRUs but something similar to active/inactive LRU lists might
> > be a reasonably good match. Have easily reclaimable dentries on the
> > inactive list including negative dentries. If negative entries are
> > heavily used then they can promote to the active list because there is
> > no reason to reclaim them soon.
> >
> > Just my 2c
> 
> As mentioned in my reply to Dave, I did considered using a 2 LRU list
> solution. However, that will add more complexity to the dcache LRU
> management code than my current approach and probably more potential for
> slowdown.

I completely agree with Dave here. This is not easy but trying to sneak
in something that works for an _artificial_ workload is simply a no go.
So if it takes to come with a more complex solution to cover more
general workloads then be it. Someone has to bite a bullet and explore
that direction. It won't be a simple project but well, if negative
dentries really matter then it is worth making the reclaim design robust
and comprehensible rather than adhoc and unpredictable.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-30  7:20       ` Michal Hocko
@ 2018-08-30 21:48         ` Waiman Long
  0 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-30 21:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C)

On 08/30/2018 03:20 AM, Michal Hocko wrote:
> On Wed 29-08-18 15:58:52, Waiman Long wrote:
>> On 08/29/2018 03:51 AM, Michal Hocko wrote:
>>> On Tue 28-08-18 13:19:40, Waiman Long wrote:
>>>> For negative dentries that are accessed once and never used again, they
>>>> should be removed first before other dentries when shrinker is running.
>>>> This is done by putting negative dentries at the head of the LRU list
>>>> instead at the tail.
>>>>
>>>> A new DCACHE_NEW_NEGATIVE flag is now added to a negative dentry when it
>>>> is initially created. When such a dentry is added to the LRU, it will be
>>>> added to the head so that it will be the first to go when a shrinker is
>>>> running if it is never accessed again (DCACHE_REFERENCED bit not set).
>>>> The flag is cleared after the LRU list addition.
>>> Placing object to the head of the LRU list can be really tricky as Dave
>>> pointed out. I am not familiar with the dentry cache reclaim so my
>>> comparison below might not apply. Let me try anyway.
>>>
>>> Negative dentries sound very similar to MADV_FREE pages from the reclaim
>>> POV. They are primary candidate for reclaim, yet you want to preserve
>>> aging to other easily reclaimable objects (including other MADV_FREE
>>> pages). What we do for those pages is to move them from the anonymous
>>> LRU list to the inactive file LRU list. Now you obviously do not have
>>> anon/file LRUs but something similar to active/inactive LRU lists might
>>> be a reasonably good match. Have easily reclaimable dentries on the
>>> inactive list including negative dentries. If negative entries are
>>> heavily used then they can promote to the active list because there is
>>> no reason to reclaim them soon.
>>>
>>> Just my 2c
>> As mentioned in my reply to Dave, I did considered using a 2 LRU list
>> solution. However, that will add more complexity to the dcache LRU
>> management code than my current approach and probably more potential for
>> slowdown.
> I completely agree with Dave here. This is not easy but trying to sneak
> in something that works for an _artificial_ workload is simply a no go.
> So if it takes to come with a more complex solution to cover more
> general workloads then be it. Someone has to bite a bullet and explore
> that direction. It won't be a simple project but well, if negative
> dentries really matter then it is worth making the reclaim design robust
> and comprehensible rather than adhoc and unpredictable.

OK, I will need to spend more time to think about a better way of doing
that.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] fs/dcache: Track & report number of negative dentries
  2018-08-30  1:43       ` Dave Chinner
@ 2018-08-30 21:49         ` Waiman Long
  0 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-30 21:49 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On 08/29/2018 09:43 PM, Dave Chinner wrote:
> On Wed, Aug 29, 2018 at 01:11:08PM -0400, Waiman Long wrote:
>> On 08/28/2018 08:11 PM, Dave Chinner wrote:
>>> On Tue, Aug 28, 2018 at 01:19:39PM -0400, Waiman Long wrote:
>>>> The current dentry number tracking code doesn't distinguish between
>>>> positive & negative dentries. It just reports the total number of
>>>> dentries in the LRU lists.
>>>>
>>>> As excessive number of negative dentries can have an impact on system
>>>> performance, it will be wise to track the number of positive and
>>>> negative dentries separately.
>>>>
>>>> This patch adds tracking for the total number of negative dentries in
>>>> the system LRU lists and reports it in the /proc/sys/fs/dentry-state
>>>> file. The number, however, does not include negative dentries that are
>>>> in flight but not in the LRU yet.
>>>>
>>>> The number of positive dentries in the LRU lists can be roughly found
>>>> by subtracting the number of negative dentries from the total.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>  Documentation/sysctl/fs.txt | 19 +++++++++++++------
>>>>  fs/dcache.c                 | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/dcache.h      |  7 ++++---
>>>>  3 files changed, 62 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
>>>> index 819caf8..118bb93 100644
>>>> --- a/Documentation/sysctl/fs.txt
>>>> +++ b/Documentation/sysctl/fs.txt
>>>> @@ -63,19 +63,26 @@ struct {
>>>>          int nr_unused;
>>>>          int age_limit;         /* age in seconds */
>>>>          int want_pages;        /* pages requested by system */
>>>> -        int dummy[2];
>>>> +        int nr_negative;       /* # of unused negative dentries */
>>>> +        int dummy;
>>>>  } dentry_stat = {0, 0, 45, 0,};
>>> That's not a backwards compatible ABI change. Those dummy fields
>>> used to represent some metric we no longer calculate, and there are
>>> probably still monitoring apps out there that think they still have
>>> the old meaning. i.e. they are still visible to userspace:
>>>
>>> $ cat /proc/sys/fs/dentry-state 
>>> 83090	67661	45	0	0	0
>>> $
>>>
>>> IOWs, you can add new fields for new metrics to the end of the
>>> structure, but you can't re-use existing fields even if they
>>> aren't calculated anymore.
>>>
>>> [....]
>> I looked up the git history and the state of the dentry_stat structure
>> hadn't changed since it was first put into git in 2.6.12-rc2 on Apr 16,
>> 2005. That was over 13 years ago. Even adding an extra argument can have
>> the potential of breaking old applications depending on how the parsing
>> code was written.
> I'm pretty we've had this discussion many times before  w.r.t.
> /proc/self/mount* and other multi-field proc files. 
>
> IIRC, The answer has always been that it's OK to extend lines with
> new fields as existing apps /should/ ignore them, but it's not OK to
> remove or redefine existing fields in the line because existing apps
> /will/ misinterpret what that field means.
>
>> Given that systems that are still using some very old tools are not
>> likely to upgrade to the latest kernel anyway. I don't see that as a big
>> problem.
> I don't think that matters when it comes to changing what
> information we expose in proc files.
>
> Cheers,
>
> Dave.

I am not against appending the new count to the end. I just want to make
sure that it is the right thing to do.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed
  2018-08-30  1:12       ` Dave Chinner
@ 2018-08-30 21:51         ` Waiman Long
  0 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-30 21:51 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Matthew Wilcox, Larry Woodman,
	James Bottomley, Wangkai (Kevin C),
	Michal Hocko

On 08/29/2018 09:12 PM, Dave Chinner wrote:
> On Wed, Aug 29, 2018 at 03:34:05PM -0400, Waiman Long wrote:
>> On 08/28/2018 09:02 PM, Dave Chinner wrote:
>>> On Tue, Aug 28, 2018 at 01:19:40PM -0400, Waiman Long wrote:
>>> And then there's shrinker behaviour. What happens if the shrinker
>>> isolate callback returns LRU_ROTATE on a negative dentry? It gets
>>> moved to the most recent end of the list, so it won't have an
>>> attempt to reclaim it again until it's tried to reclaim all the real
>>> dentries. IOWs, it goes back to behaving like LRUs are supposed to
>>> behaving.
>>>
>>> IOWs, reclaim behaviour of negative dentries will be
>>> highly unpredictable, it will not easily retain a working set, nor
>>> will the working set it does retain be predictable or easy to eject
>>> from memory when the workload changes.
>>>
>>> Is this the behavour what we want for negative dentries?
>> I am aware that the behavior is not strictly LRU for negative dentries.
>> This is a compromise for using one LRU list for 2 different classes of
>> dentries.
> Thus demonstrating just enough knowledge to be dangerous.
>
> We already 3 different classes of dentries on the LRU list:
>
> 	- in use dentries (because we use lazy removal to avoid
> 	  lru list lock contention on cache hit lookups)
> 	- unused, referenced dentries
> 	- unused, unreferenced dentries.
>
> Each of these classes of dentries are treated differently by the
> shrinker, but the key point is that they are all aged the same way
> and so there's consistent maintenance of the working set under
> memory pressure. Putting negative dentries at the head of the list
> doesn't create a new "class" of object on the LRU, it just changes
> the ordering of the lru list. This will cause unpredictable
> behaviour because objects haven't had a chance to age gracefully
> before they are reclaimed.
>
> FYI, the inode cache has the same list_lru setup, object types and
> shrinker algorithm as the dentry cache, so this isn't a one-off.
> Indeed, the XFS buffer cache has a multi-reference heirarchy of 13
> different types of {unused, referenced} buffers in it's list_lru to
> implement a quasi aging-NFU reclaim algorithm in it's shrinker.
>
> i.e. the list_lru infrastructure has never provided or enforced a
> pure LRU algorithm. It is common infrastructure intended to provide
> a scalable, flexible and memcg-aware FIFO-like object tracking
> system that interates tightly with memory reclaim to allow
> subsystems to implement cache reclaim algorithms that are optimal
> for that subsystem.
>
> IOWs, the list_lru doesn't define the reclaim algorithm the
> subsystem uses and there's no reason why we can't extend the
> infrastructure to support more complex algorithms without impacting
> existing subsystem reclaim algorithms at all. Only the subsystems
> that use the new infrastructure and algorithms would need careful
> examination.  Of course, the overall system cache balancing
> behaviour under different and changing workloads would still need to
> be verified, but you have to do that for any cache reclaim algorithm
> change that is made....
>
>> The basic idea is that negative dentries that are used only
>> once will go first irrespective of their age.
> Using MRU for negative dentries, as I've previously explained, is a
> flawed concept. It might be expedient to solve your specific
> problem, but it's not a solution to the general problem of negative
> dentry management.
>
>>> Perhaps a second internal LRU list in the list_lru for "immediate
>>> reclaim" objects would be a better solution. i.e. similar to the
>>> active/inactive lists used for prioritising the working set iover
>>> single use pages in page reclaim. negative dentries go onto the
>>> immediate list, real dentries go on the existing list. Both are LRU,
>>> and the shrinker operates on the immediate list first. When we
>>> rotate referenced negative dentries on the immediate list, promote
>>> them to the active list with all the real dentries so they age out
>>> with the rest of the working set. That way single use negative
>>> dentries get removed in LRU order in preference to the working set
>>> of real dentries.
>>>
>>> Being able to make changes to the list implementation easily was one
>>> of the reasons I hid the implementation of the list_lru from the
>>> interface callers use....
>>>
>>> [...]
>> I have thought about using 2 lists for dentries. That will require much
>> more extensive changes to the code and hence much more testing will be
>> needed to verify their correctness.  That is the main reason why I try to
>> avoid doing that.
> i.e. expediency.
>
> However, you're changing the behaviour of core caching and memory
> reclaim algorithms. The amount and level of testing and verification
> you need to do is the same regardless of whether it's a small change
> or a large change.  Sure, you've shown that *one* artificial
> micro-benchmark improves, but what about everything else?
>
>> As you have suggested, we could implement this 2-level LRU list in the
>> list_lru API. But it is used by other subsystems as well. Extensive
>> change like that will have similar issue in term of testing and
>> verification effort.
> I know what changing reclaim algorithm involves, how difficult
> it is to balance the competing caches quickly and to the desired
> ratios for acceptable performance, how difficult it is to measure
> and control the system reacts to transient and impulse memory
> pressure events, etc.
>
> I also know that "simple" and/or "obviously correct" subsystem
> changes can cause very unexepected system level effects, and that
> it's almost never what you think it is that caused the unexpected
> behaviour.  IOWs, getting anything even slightly wrong in these
> algorithms will adversely affect system performance and balance
> significantly.  Hence the bar is /always/ set high for core caching
> algorithm changes like this.
>
> Cheers,
>
> Dave.

Thanks for the comments. I will need more time to think about it.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] fs/dcache: Track & report number of negative dentries
  2018-08-29  0:11   ` Dave Chinner
  2018-08-29 17:11     ` Waiman Long
@ 2018-08-31 14:31     ` Matthew Wilcox
  2018-08-31 15:03       ` Waiman Long
  1 sibling, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2018-08-31 14:31 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Waiman Long, Alexander Viro, Jonathan Corbet, linux-kernel,
	linux-fsdevel, linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko

On Wed, Aug 29, 2018 at 10:11:53AM +1000, Dave Chinner wrote:
> > +++ b/Documentation/sysctl/fs.txt
> > @@ -63,19 +63,26 @@ struct {
> >          int nr_unused;
> >          int age_limit;         /* age in seconds */
> >          int want_pages;        /* pages requested by system */
> > -        int dummy[2];
> > +        int nr_negative;       /* # of unused negative dentries */
> > +        int dummy;
> >  } dentry_stat = {0, 0, 45, 0,};
> 
> That's not a backwards compatible ABI change. Those dummy fields
> used to represent some metric we no longer calculate, and there are
> probably still monitoring apps out there that think they still have
> the old meaning. i.e. they are still visible to userspace:

I believe you are incorrect.  dentry_stat was introduced in 2.1.60 with
this hunk:

+struct {
+       int nr_dentry;
+       int nr_unused;
+       int age_limit;          /* age in seconds */
+       int want_pages;         /* pages requested by system */
+       int dummy[2];
+} dentry_stat = {0, 0, 45, 0,};
+

Looking through the rest of the dentry_stat changes in the 2.1.60 release,
it's not replacing anything, it's adding new information.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] fs/dcache: Track & report number of negative dentries
  2018-08-31 14:31     ` Matthew Wilcox
@ 2018-08-31 15:03       ` Waiman Long
  0 siblings, 0 replies; 29+ messages in thread
From: Waiman Long @ 2018-08-31 15:03 UTC (permalink / raw)
  To: Matthew Wilcox, Dave Chinner
  Cc: Alexander Viro, Jonathan Corbet, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc, Luis R. Rodriguez, Kees Cook,
	Linus Torvalds, Jan Kara, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Miklos Szeredi, Larry Woodman, James Bottomley,
	Wangkai (Kevin C),
	Michal Hocko

On 08/31/2018 10:31 AM, Matthew Wilcox wrote:
> On Wed, Aug 29, 2018 at 10:11:53AM +1000, Dave Chinner wrote:
>>> +++ b/Documentation/sysctl/fs.txt
>>> @@ -63,19 +63,26 @@ struct {
>>>          int nr_unused;
>>>          int age_limit;         /* age in seconds */
>>>          int want_pages;        /* pages requested by system */
>>> -        int dummy[2];
>>> +        int nr_negative;       /* # of unused negative dentries */
>>> +        int dummy;
>>>  } dentry_stat = {0, 0, 45, 0,};
>> That's not a backwards compatible ABI change. Those dummy fields
>> used to represent some metric we no longer calculate, and there are
>> probably still monitoring apps out there that think they still have
>> the old meaning. i.e. they are still visible to userspace:
> I believe you are incorrect.  dentry_stat was introduced in 2.1.60 with
> this hunk:
>
> +struct {
> +       int nr_dentry;
> +       int nr_unused;
> +       int age_limit;          /* age in seconds */
> +       int want_pages;         /* pages requested by system */
> +       int dummy[2];
> +} dentry_stat = {0, 0, 45, 0,};
> +
>
> Looking through the rest of the dentry_stat changes in the 2.1.60 release,
> it's not replacing anything, it's adding new information.

Thanks for looking up earlier non-git source tree. If that is the case,
the dummy[2] was there just for future extension purpose. It should be
perfectly fine to reuse one of the dummy entry for negative dentry count
then as no sane application would have checked the last 2 entries of
dentry-state and do dummy things if they are non-zero.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-08-31 15:03 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-28 17:19 [PATCH 0/2] fs/dcache: Track # of negative dentries Waiman Long
2018-08-28 17:19 ` [PATCH 1/2] fs/dcache: Track & report number " Waiman Long
2018-08-29  0:11   ` Dave Chinner
2018-08-29 17:11     ` Waiman Long
2018-08-30  1:43       ` Dave Chinner
2018-08-30 21:49         ` Waiman Long
2018-08-31 14:31     ` Matthew Wilcox
2018-08-31 15:03       ` Waiman Long
2018-08-28 17:19 ` [PATCH 2/2] fs/dcache: Make negative dentries easier to be reclaimed Waiman Long
2018-08-28 22:13   ` Matthew Wilcox
2018-08-28 22:29     ` Waiman Long
2018-08-28 23:10       ` Linus Torvalds
2018-08-28 23:22         ` Andrew Morton
2018-08-29  1:18           ` Waiman Long
2018-08-29  1:18         ` Waiman Long
2018-08-28 23:01   ` Andrew Morton
2018-08-29 17:54     ` Paul E. McKenney
2018-08-29 20:03       ` Waiman Long
2018-08-29 21:04       ` Matthew Wilcox
2018-08-29  1:02   ` Dave Chinner
2018-08-29 19:34     ` Waiman Long
2018-08-30  1:12       ` Dave Chinner
2018-08-30 21:51         ` Waiman Long
2018-08-29  7:51   ` Michal Hocko
2018-08-29 19:58     ` Waiman Long
2018-08-30  7:20       ` Michal Hocko
2018-08-30 21:48         ` Waiman Long
2018-08-28 22:50 ` [PATCH 0/2] fs/dcache: Track # of negative dentries Andrew Morton
2018-08-28 22:54   ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).